1
00:00:00,880 --> 00:00:05,110
The RSA crypto systems is one of
the lovely and really important

2
00:00:05,110 --> 00:00:08,440
applications of number
theory in computer science.

3
00:00:08,440 --> 00:00:12,140
So let's start talking about it.

4
00:00:12,140 --> 00:00:13,950
The RSA crypto system
is what is known

5
00:00:13,950 --> 00:00:18,130
as a public key cryptosystem,
which has the following really

6
00:00:18,130 --> 00:00:23,500
amazing properties--
namely, anyone

7
00:00:23,500 --> 00:00:25,820
can send a secret
encrypted message

8
00:00:25,820 --> 00:00:29,340
to a designated receiver.

9
00:00:29,340 --> 00:00:32,860
This is without there being
any prior contact using only

10
00:00:32,860 --> 00:00:36,304
publicly available information.

11
00:00:36,304 --> 00:00:37,720
Now, if you think
about that, it's

12
00:00:37,720 --> 00:00:40,080
really terrific because
it means that you

13
00:00:40,080 --> 00:00:44,020
can send a secret message to
Amazon that nobody but Amazon

14
00:00:44,020 --> 00:00:46,670
can read even though
the entire world knows

15
00:00:46,670 --> 00:00:50,050
what you know and can see
what you sent to Amazon.

16
00:00:50,050 --> 00:00:52,380
And Amazon knows that
it's the only one can

17
00:00:52,380 --> 00:00:55,350
decrypt the message you sent.

18
00:00:55,350 --> 00:00:58,900
This in fact is hard to
believe if you think about it.

19
00:00:58,900 --> 00:01:00,350
It sounds paradoxical.

20
00:01:00,350 --> 00:01:04,739
How can secrecy be possible
using only public info?

21
00:01:04,739 --> 00:01:07,890
And in fact, the existence of
this public key cryptosystem

22
00:01:07,890 --> 00:01:12,010
has some genuinely paradoxical
consequence, which kind of

23
00:01:12,010 --> 00:01:12,840
are a mind bender.

24
00:01:12,840 --> 00:01:14,935
So let me tell you
about one of them.

25
00:01:14,935 --> 00:01:16,810
I don't know if you've
heard of mental chess,

26
00:01:16,810 --> 00:01:18,690
but it's a standard
thing in the chess world.

27
00:01:18,690 --> 00:01:23,180
Chess masters are so talented
and have such deep insight

28
00:01:23,180 --> 00:01:25,205
into the game that they
don't need a chessboard,

29
00:01:25,205 --> 00:01:26,580
and they don't
need chess pieces.

30
00:01:26,580 --> 00:01:29,850
They can just go for
walk on a country

31
00:01:29,850 --> 00:01:31,690
lane talking to each
other and saying

32
00:01:31,690 --> 00:01:36,420
pond to king 4 and
knight to bishop 3

33
00:01:36,420 --> 00:01:40,030
and just talking chess code
and play an entire chess

34
00:01:40,030 --> 00:01:40,635
game that way.

35
00:01:40,635 --> 00:01:42,284
That's known as mental chess.

36
00:01:42,284 --> 00:01:43,200
It's quite impressive.

37
00:01:43,200 --> 00:01:46,710
In fact, the grand masters
can play multiple games

38
00:01:46,710 --> 00:01:48,220
of mental chess
against opponents

39
00:01:48,220 --> 00:01:49,950
who are staring
at the chessboard

40
00:01:49,950 --> 00:01:52,560
and win the great
majority of the games.

41
00:01:52,560 --> 00:01:56,180
Of course, these are not against
other grand masters, but still.

42
00:01:56,180 --> 00:01:56,680
OK.

43
00:01:56,680 --> 00:01:58,470
So now, this is what I propose.

44
00:01:58,470 --> 00:02:00,000
How about playing mental poker?

45
00:02:00,000 --> 00:02:02,000
If you know how to play
poker, we deal our cards

46
00:02:02,000 --> 00:02:04,620
and we bet and so on.

47
00:02:04,620 --> 00:02:08,160
And my only condition
is that I'll deal.

48
00:02:08,160 --> 00:02:11,330
Now, that sounds like a
joke and an absurd thing

49
00:02:11,330 --> 00:02:14,090
for you to agree to
do, but it's amazing.

50
00:02:14,090 --> 00:02:15,640
It's actually possible.

51
00:02:15,640 --> 00:02:20,960
One of the famous papers
of Rivest and Shamir

52
00:02:20,960 --> 00:02:26,200
was how to play mental poker
using public key crypto.

53
00:02:26,200 --> 00:02:33,720
So I once tried to persuade
an eminent MIT dean who's

54
00:02:33,720 --> 00:02:35,507
a physicist
researcher about this,

55
00:02:35,507 --> 00:02:36,840
and he just wouldn't believe it.

56
00:02:36,840 --> 00:02:40,750
He argued that it was
just impossible logically.

57
00:02:40,750 --> 00:02:42,990
And what he was
thinking about was

58
00:02:42,990 --> 00:02:46,940
that if you know how to compute
a function, then of course

59
00:02:46,940 --> 00:02:50,090
you can figure out
how to invert it.

60
00:02:50,090 --> 00:02:54,170
That is to say if I know how
to compute some function f

61
00:02:54,170 --> 00:02:58,010
of a number and let's
say that the function is

62
00:02:58,010 --> 00:03:00,870
one arrow in-- that
is an injection-- then

63
00:03:00,870 --> 00:03:04,010
if I know what f of n, there's
a unique n that it came from.

64
00:03:04,010 --> 00:03:08,000
So how can I not
be able to find n?

65
00:03:08,000 --> 00:03:11,840
And it's an insight of computer
science and complexity theory

66
00:03:11,840 --> 00:03:14,750
that says it's quite possible.

67
00:03:14,750 --> 00:03:18,430
It's not that you can't find
the n that produced f of n.

68
00:03:18,430 --> 00:03:21,670
It's that the search for
it will be prohibitive.

69
00:03:21,670 --> 00:03:24,550
There are, in short, one-way.

70
00:03:24,550 --> 00:03:29,870
That is, functions that are
easy to compute in one direction

71
00:03:29,870 --> 00:03:31,960
but hard to invert.

72
00:03:31,960 --> 00:03:33,950
They're easy to compute
but hard to invert.

73
00:03:33,950 --> 00:03:37,522
In particular, we're
thinking about multiplying

74
00:03:37,522 --> 00:03:38,105
and factoring.

75
00:03:40,896 --> 00:03:43,270
It's an observation that it's
easy to compute the product

76
00:03:43,270 --> 00:03:44,720
of two large prime numbers.

77
00:03:44,720 --> 00:03:45,886
We all know how to multiply.

78
00:03:45,886 --> 00:03:49,950
And in fact, there are faster
ways to multiply than you know.

79
00:03:49,950 --> 00:03:53,070
But the current state
of our knowledge

80
00:03:53,070 --> 00:03:55,230
of number theory and
complexity theory

81
00:03:55,230 --> 00:03:58,570
is that given a
number n that happens

82
00:03:58,570 --> 00:04:00,470
to be the product
of two primes, it

83
00:04:00,470 --> 00:04:05,300
seems to be hopelessly
hard in general to factor

84
00:04:05,300 --> 00:04:07,750
n into the components p and q.

85
00:04:07,750 --> 00:04:09,960
Now, this is an open problem.

86
00:04:09,960 --> 00:04:12,660
It's similar to the p
equals np question--

87
00:04:12,660 --> 00:04:13,880
that famous open problem.

88
00:04:13,880 --> 00:04:19,450
It's actually a weaker--
it's quite possible

89
00:04:19,450 --> 00:04:22,840
that you could factor, and
np would not equal to np.

90
00:04:22,840 --> 00:04:24,950
But nevertheless, it's
the same kind of problem.

91
00:04:24,950 --> 00:04:26,640
And more generally,
the existence

92
00:04:26,640 --> 00:04:29,110
of one way functions is
closely related to that p

93
00:04:29,110 --> 00:04:30,700
equals np question.

94
00:04:30,700 --> 00:04:33,050
Nevertheless, even though
it's an open problem

95
00:04:33,050 --> 00:04:36,170
and theoretically has not
been settled either way,

96
00:04:36,170 --> 00:04:41,195
it's widely believed-- the
banks, the governments,

97
00:04:41,195 --> 00:04:46,480
and the commercial world have
really bet the family jewels

98
00:04:46,480 --> 00:04:52,690
on the difficulty of factoring
when they use the RSA protocol.

99
00:04:52,690 --> 00:04:58,880
So I like to make the joke that
my most important contribution

100
00:04:58,880 --> 00:05:03,630
to MIT was being involved in
the hiring of our S and A.

101
00:05:03,630 --> 00:05:12,130
So this is A, Adi Shamir, R,
Ron Rivest, and A, Len Adleman

102
00:05:12,130 --> 00:05:17,100
back in the late '70s when they
first came up with these ideas.

103
00:05:17,100 --> 00:05:24,470
So let's look at the way this
RSA protocol actually works.

104
00:05:24,470 --> 00:05:25,780
So here's what happens.

105
00:05:25,780 --> 00:05:28,850
To begin with, you have to
make some information public

106
00:05:28,850 --> 00:05:31,570
so that people can
communicate with you.

107
00:05:31,570 --> 00:05:34,700
We're looking at
two players here.

108
00:05:34,700 --> 00:05:37,690
There's a receiver who's going
to get encrypted messages,

109
00:05:37,690 --> 00:05:41,270
and there's a
sender who is trying

110
00:05:41,270 --> 00:05:43,780
to send an encrypted
message to the receiver.

111
00:05:43,780 --> 00:05:46,400
So what the receiver
does before hand is

112
00:05:46,400 --> 00:05:49,670
generates two primes, p and q.

113
00:05:49,670 --> 00:05:52,310
Now, in practice, you want
these to be pretty big primes--

114
00:05:52,310 --> 00:05:54,170
hundreds of digits.

115
00:05:54,170 --> 00:05:56,120
And we'll examine it in
a moment, the question

116
00:05:56,120 --> 00:05:56,995
of how you find them.

117
00:05:56,995 --> 00:06:02,350
But the receiver's job is to
find two quite substantial

118
00:06:02,350 --> 00:06:05,580
large primes, p and
q, chosen more or less

119
00:06:05,580 --> 00:06:09,050
randomly because if you have any
kind of predictable procedure

120
00:06:09,050 --> 00:06:11,860
for how you got them, that
would be a vulnerability.

121
00:06:11,860 --> 00:06:13,900
But if you just
choose them at random,

122
00:06:13,900 --> 00:06:18,190
then there's enough primes
in the hundreds of digits

123
00:06:18,190 --> 00:06:20,180
that it's hopeless
that people would guess

124
00:06:20,180 --> 00:06:22,020
which one you wound up with.

125
00:06:22,020 --> 00:06:22,560
OK.

126
00:06:22,560 --> 00:06:25,300
What do you do to begin with
is multiply p and q together,

127
00:06:25,300 --> 00:06:26,260
which is easy to do.

128
00:06:26,260 --> 00:06:28,920
Let's call that number n.

129
00:06:28,920 --> 00:06:31,790
And now the other thing
the receiver is going to do

130
00:06:31,790 --> 00:06:35,240
is find a number e
that's relatively

131
00:06:35,240 --> 00:06:39,330
prime to this peculiar
number p minus 1, q minus 1.

132
00:06:39,330 --> 00:06:43,080
Now as a hint, you might notice
that p minus 1, q minus 1

133
00:06:43,080 --> 00:06:45,930
is in fact Euler's
function of n-- phi of n.

134
00:06:45,930 --> 00:06:49,330
But for now, we don't
need to understand

135
00:06:49,330 --> 00:06:50,890
that this is Euler's function.

136
00:06:50,890 --> 00:06:54,550
It's just the recipe of
what the receiver has to do.

137
00:06:54,550 --> 00:06:57,350
Find a number e that's
relatively prime to p minus 1,

138
00:06:57,350 --> 00:06:58,000
q minus 1.

139
00:06:58,000 --> 00:06:59,870
Again, you don't want
e to be too small,

140
00:06:59,870 --> 00:07:03,390
and we'll discuss in a moment
how do you find such an e.

141
00:07:03,390 --> 00:07:07,960
But the receiver's job
is to find such an e.

142
00:07:07,960 --> 00:07:12,150
This pair of
numbers e and n will

143
00:07:12,150 --> 00:07:17,270
be the public key which the
receiver publishes widely

144
00:07:17,270 --> 00:07:23,285
where it can easily
be found by anyone

145
00:07:23,285 --> 00:07:24,430
who cares to look for it.

146
00:07:24,430 --> 00:07:26,410
Basically there's
a phone directory

147
00:07:26,410 --> 00:07:29,140
where if you want to know how to
send somebody a secret message,

148
00:07:29,140 --> 00:07:32,070
you look them up, and you find
the receivers name in there.

149
00:07:32,070 --> 00:07:34,159
And then you see
his public e and n,

150
00:07:34,159 --> 00:07:36,075
and that's what you use
to send him a message.

151
00:07:36,790 --> 00:07:41,375
Now, how do you use it
to send him a message?

152
00:07:41,375 --> 00:07:42,875
Well, I'll explain
that in a minute,

153
00:07:42,875 --> 00:07:47,440
but let's look at one more
thing that the receiver needs

154
00:07:47,440 --> 00:07:49,340
to do to set himself up.

155
00:07:49,340 --> 00:07:53,700
The receiver is going to find
an inverse of this number e that

156
00:07:53,700 --> 00:08:00,080
he's published-- the part of
his public -- modulo p minus 1,

157
00:08:00,080 --> 00:08:01,450
q minus 1.

158
00:08:01,450 --> 00:08:05,730
That is, this e since it's
relatively prime to p minus 1,

159
00:08:05,730 --> 00:08:09,520
q minus 1, it will have
an inverse in Z star p

160
00:08:09,520 --> 00:08:11,260
minus 1, q minus 1.

161
00:08:11,260 --> 00:08:13,060
Let's let that inverse be d.

162
00:08:13,060 --> 00:08:15,910
And of course, we know
how to find d because you

163
00:08:15,910 --> 00:08:17,760
can do that with a Pulverizer.

164
00:08:17,760 --> 00:08:19,090
D is the private key.

165
00:08:19,090 --> 00:08:21,390
That's this crucial
piece of information

166
00:08:21,390 --> 00:08:23,810
that the receiver has and
that the receiver is not

167
00:08:23,810 --> 00:08:25,520
going to tell anybody.

168
00:08:25,520 --> 00:08:28,450
Only the receiver knows
that because the receiver

169
00:08:28,450 --> 00:08:32,340
chose the p and the q and
the e more or less randomly--

170
00:08:32,340 --> 00:08:34,500
maybe even as randomly
as they can manage--

171
00:08:34,500 --> 00:08:35,679
and then they find the d.

172
00:08:35,679 --> 00:08:37,130
And that's their secret.

173
00:08:37,130 --> 00:08:37,770
OK.

174
00:08:37,770 --> 00:08:39,970
That's what the receiver does.

175
00:08:39,970 --> 00:08:42,720
How does the sender
send a message?

176
00:08:42,720 --> 00:08:48,560
Well, to send a message,
what the sender wants to do

177
00:08:48,560 --> 00:08:52,390
is choose a message that is
in fact a number in the range

178
00:08:52,390 --> 00:08:56,961
from 1 to n where-- we're
thinking again, of n,

179
00:08:56,961 --> 00:08:59,210
if it's a product of two
primes of a couple of hundred

180
00:08:59,210 --> 00:09:03,850
digits each, then the
product is around 400 digits.

181
00:09:03,850 --> 00:09:07,070
And so you can
pick any message m

182
00:09:07,070 --> 00:09:11,142
that can be represented
by a 400 digit number.

183
00:09:11,142 --> 00:09:12,600
Now, there's a lot
of messages that

184
00:09:12,600 --> 00:09:14,125
will fit within 400 digits.

185
00:09:14,125 --> 00:09:15,750
And of course, if
it's bigger, you just

186
00:09:15,750 --> 00:09:18,200
break it up into
400 digit pieces.

187
00:09:18,200 --> 00:09:21,070
So that's the kind of
message you're going to send.

188
00:09:21,070 --> 00:09:22,900
So the message is
going to be a number

189
00:09:22,900 --> 00:09:25,260
in this range from 1 to n.

190
00:09:25,260 --> 00:09:31,280
And what the sender is going to
do is look up the public key e

191
00:09:31,280 --> 00:09:33,440
and the other part
of the public key

192
00:09:33,440 --> 00:09:41,100
n and raise the secret
message to the power e in Z n.

193
00:09:41,100 --> 00:09:43,960
So we're going to
compute m to the e in Zn

194
00:09:43,960 --> 00:09:47,020
and send that encoded
message m hat.

195
00:09:47,020 --> 00:09:50,980
So m hat is what we think
of as the encrypted version

196
00:09:50,980 --> 00:09:54,459
of the message m.

197
00:09:54,459 --> 00:09:56,000
So then we have the
problem if that's

198
00:09:56,000 --> 00:09:58,580
what the sender sends
to the receiver,

199
00:09:58,580 --> 00:10:01,900
how does the receiver
decode the m hat,

200
00:10:01,900 --> 00:10:04,540
and the answer is the
receiver just computes

201
00:10:04,540 --> 00:10:09,190
m hat to the power d--
the secret key-- also

202
00:10:09,190 --> 00:10:10,860
in the ring Zn.

203
00:10:10,860 --> 00:10:14,240
And the claim is that in
fact, that's equal to m.

204
00:10:14,240 --> 00:10:16,430
Now, you can check
in class problem,

205
00:10:16,430 --> 00:10:19,950
and it's easy to see
that the reason why

206
00:10:19,950 --> 00:10:23,600
that method of decrypting
works is precisely

207
00:10:23,600 --> 00:10:26,180
an application of Euler's
theorem-- at least

208
00:10:26,180 --> 00:10:29,600
when m happens to be
relatively prime to n.

209
00:10:29,600 --> 00:10:32,400
Now, the odds of
finding an m that's

210
00:10:32,400 --> 00:10:35,620
not relatively prime
to n are basically

211
00:10:35,620 --> 00:10:38,599
negligible because if
you'd find such an m,

212
00:10:38,599 --> 00:10:40,057
it would enable
you to factor them.

213
00:10:40,057 --> 00:10:42,320
And we believe
factoring is very hard.

214
00:10:42,320 --> 00:10:45,280
But in fact, it actually
works for all m, which

215
00:10:45,280 --> 00:10:46,740
is a nice theoretical results.

216
00:10:46,740 --> 00:10:50,080
And you'll work this
out in class problem.

217
00:10:50,080 --> 00:10:51,310
OK.

218
00:10:51,310 --> 00:10:54,410
That's how it works.

219
00:10:54,410 --> 00:11:00,230
The receiver publishers e
and n, keeps a secret key d.

220
00:11:00,230 --> 00:11:03,980
The sender exponentiates
the message to the power e.

221
00:11:03,980 --> 00:11:07,620
The receiver simply
decodes by raising

222
00:11:07,620 --> 00:11:09,170
the received
message to the power

223
00:11:09,170 --> 00:11:12,930
d and reads off what
the original was.

224
00:11:12,930 --> 00:11:13,640
OK.

225
00:11:13,640 --> 00:11:16,650
So we need to think about the
feasibility of all of this

226
00:11:16,650 --> 00:11:21,740
because we believe that
it's impossible to decrypt,

227
00:11:21,740 --> 00:11:24,535
but there's a lot of other stuff
going on there that the players

228
00:11:24,535 --> 00:11:25,535
have to be able perform.

229
00:11:25,535 --> 00:11:28,550
And let's examine what their
responsibilities and abilities

230
00:11:28,550 --> 00:11:29,710
have to be.

231
00:11:29,710 --> 00:11:31,340
So the receiver
to begin with has

232
00:11:31,340 --> 00:11:33,320
to be able to find large primes.

233
00:11:33,320 --> 00:11:35,900
And how on earth
do they do that?

234
00:11:35,900 --> 00:11:39,820
Well, without going
into too much detail,

235
00:11:39,820 --> 00:11:44,650
we can make the remark that
there are lots of primes.

236
00:11:44,650 --> 00:11:47,960
That is to say by appealing
to the prime number theorem,

237
00:11:47,960 --> 00:11:55,490
we know that among the n digit
numbers, about log n of them

238
00:11:55,490 --> 00:11:58,530
are going to be
primes so that you

239
00:11:58,530 --> 00:12:02,590
don't have to go
too long before you

240
00:12:02,590 --> 00:12:04,950
stumble upon a random prime.

241
00:12:04,950 --> 00:12:08,740
That is, if you're
dealing with a 200 digit n

242
00:12:08,740 --> 00:12:13,670
and you're searching for a
prime of around that size,

243
00:12:13,670 --> 00:12:15,200
you're not going
to have to search

244
00:12:15,200 --> 00:12:17,350
more than a few hundred
numbers before you're

245
00:12:17,350 --> 00:12:20,440
likely to stumble on a prime.

246
00:12:20,440 --> 00:12:22,950
And of course, how do you know
that you stumbled on a prime?

247
00:12:22,950 --> 00:12:25,590
Well, you need to be able to
check whether a number is prime

248
00:12:25,590 --> 00:12:29,450
or not-- and efficientlY--
in order for this whole thing

249
00:12:29,450 --> 00:12:30,230
to be feasible.

250
00:12:30,230 --> 00:12:31,855
So we'll have to
discuss that brieflY--

251
00:12:31,855 --> 00:12:34,570
how do you test whether
or not a number is

252
00:12:34,570 --> 00:12:37,850
prime in an efficient way?

253
00:12:37,850 --> 00:12:39,550
The other thing the
receiver has to do

254
00:12:39,550 --> 00:12:43,980
is find an e that's relatively
prime to p minus 1, q minus 1.

255
00:12:43,980 --> 00:12:45,460
But that's easy.

256
00:12:45,460 --> 00:12:47,120
Well, it's easy
because first of all,

257
00:12:47,120 --> 00:12:53,230
if you just kind of randomly
guess a medium sized e

258
00:12:53,230 --> 00:12:56,970
and then search consecutively
from some random number you've

259
00:12:56,970 --> 00:13:00,690
chosen somewhere in the
middle of the interval

260
00:13:00,690 --> 00:13:03,550
up to p minus 1, q minus 1.

261
00:13:03,550 --> 00:13:10,180
Again, you're very likely to
find in a few steps a number

262
00:13:10,180 --> 00:13:13,720
e that is relatively prime
to p minus 1, q minus 1.

263
00:13:13,720 --> 00:13:16,590
How do you recognize that
it's relatively prime?

264
00:13:16,590 --> 00:13:19,060
Well, you just
compute the GCD, which

265
00:13:19,060 --> 00:13:20,970
we know how to do using
Euclid's algorithm.

266
00:13:20,970 --> 00:13:22,540
So that's really
quite efficient.

267
00:13:22,540 --> 00:13:25,842
Recognizing that it's
relatively prime is easy,

268
00:13:25,842 --> 00:13:27,800
you just don't have to
search very many numbers

269
00:13:27,800 --> 00:13:28,936
until you stumble on an e.

270
00:13:28,936 --> 00:13:30,040
OK.

271
00:13:30,040 --> 00:13:33,640
The other thing you have to do
is find the d that an e inverse

272
00:13:33,640 --> 00:13:36,350
modulo p minus 1, q minus 1.

273
00:13:36,350 --> 00:13:41,940
And again, that is the
extended Euclidean algorithm,

274
00:13:41,940 --> 00:13:45,060
the extended GCD,
namely the Pulverizer.

275
00:13:45,060 --> 00:13:49,270
So those are the pieces
that the receiver has to do.

276
00:13:49,270 --> 00:13:51,210
Now, let's look at
this a little bit more

277
00:13:51,210 --> 00:13:53,237
and think about the
information about the prime.

278
00:13:53,237 --> 00:13:54,820
So the famous theorem
about the primes

279
00:13:54,820 --> 00:13:58,630
is their density, which
is if you let a pi of n

280
00:13:58,630 --> 00:14:02,590
be the number of primes
less than or equal to n,

281
00:14:02,590 --> 00:14:05,560
then it's a deep
theorem of number theory

282
00:14:05,560 --> 00:14:10,380
that pi event
actually approaches

283
00:14:10,380 --> 00:14:13,230
a limit in an asymptotic sense--
which we'll discuss in more

284
00:14:13,230 --> 00:14:16,680
detail-- that pi of n
as n grows gets to be

285
00:14:16,680 --> 00:14:19,230
very close to n over log n.

286
00:14:19,230 --> 00:14:21,220
That's the natural log of n.

287
00:14:24,440 --> 00:14:25,597
Now, that's a deep theorem.

288
00:14:25,597 --> 00:14:27,680
But in fact, if we want a
self-contained treatment

289
00:14:27,680 --> 00:14:30,650
for our purposes,
there's an exercise

290
00:14:30,650 --> 00:14:32,920
that will be in
the text where we

291
00:14:32,920 --> 00:14:37,250
can derive Chebyshev's bound,
which is weaker than they tight

292
00:14:37,250 --> 00:14:38,130
prime number theorem.

293
00:14:38,130 --> 00:14:39,910
But Chebyshev's
bound, which can be

294
00:14:39,910 --> 00:14:44,390
proved by more elementary means
that's within our own ability

295
00:14:44,390 --> 00:14:46,620
at this point with the
number theory we have--

296
00:14:46,620 --> 00:14:50,380
to be able to show
that n over 4 log n

297
00:14:50,380 --> 00:14:53,010
is a lower bound on pi of n.

298
00:14:53,010 --> 00:14:55,720
So basically that
says that if you're

299
00:14:55,720 --> 00:14:58,940
dealing with numbers
of size n, which

300
00:14:58,940 --> 00:15:01,750
means they're of length
log n a few hundred digits,

301
00:15:01,750 --> 00:15:06,300
then you only have
to search maybe 1,000

302
00:15:06,300 --> 00:15:09,530
digits before your very
likely to stumble on a prime.

303
00:15:09,530 --> 00:15:14,100
And if you search 2,000 digits,
it becomes extremely likely

304
00:15:14,100 --> 00:15:15,870
that you'll stumble on a prime.

305
00:15:15,870 --> 00:15:18,140
So the primes are
dense enough that we

306
00:15:18,140 --> 00:15:20,780
can afford to look
for them, providing

307
00:15:20,780 --> 00:15:22,870
we can have a reasonably
fast way to recognize

308
00:15:22,870 --> 00:15:24,100
when a number is prime.

309
00:15:24,100 --> 00:15:27,070
Well, one simple
way that it almost

310
00:15:27,070 --> 00:15:30,030
is perfect-- but
works pragmatically

311
00:15:30,030 --> 00:15:32,670
pretty well-- is
called the Fermat test.

312
00:15:32,670 --> 00:15:35,750
But let me just reemphasize
this -- I got ahead of myself--

313
00:15:35,750 --> 00:15:38,650
that if I'm dealing
with 200 digit numbers,

314
00:15:38,650 --> 00:15:41,440
then about one in 1,000 is
prime using just the weaker

315
00:15:41,440 --> 00:15:42,680
Chebyshev's bound.

316
00:15:42,680 --> 00:15:44,500
And that says that I
don't have to search

317
00:15:44,500 --> 00:15:47,490
too long-- only a
few thousand numbers

318
00:15:47,490 --> 00:15:48,910
to be able to find a prime.

319
00:15:48,910 --> 00:15:50,540
And a few thousand
numbers is well

320
00:15:50,540 --> 00:15:54,330
within the ability of a
computer to carry out,

321
00:15:54,330 --> 00:15:57,270
providing that the test for
recognizing that a number is

322
00:15:57,270 --> 00:15:59,960
prime isn't too time consuming.

323
00:15:59,960 --> 00:16:03,520
So one naive way that
the really almost

324
00:16:03,520 --> 00:16:06,390
works to be a reliable
primality test

325
00:16:06,390 --> 00:16:09,837
is to check whether
Fermat's theorem is obeyed.

326
00:16:09,837 --> 00:16:12,170
Fermat's theorem-- the special
case of Euler's theorem--

327
00:16:12,170 --> 00:16:15,540
says that if n is
prime, then if I

328
00:16:15,540 --> 00:16:18,830
compute a number a
to the n minus 1,

329
00:16:18,830 --> 00:16:22,310
it's going to equal 1 in Z n.

330
00:16:22,310 --> 00:16:24,320
And that's going to
be the case for all a

331
00:16:24,320 --> 00:16:28,570
that are not 0 if n is prime.

332
00:16:28,570 --> 00:16:33,610
Now that means that if
this equality fails in Z n,

333
00:16:33,610 --> 00:16:35,700
then I immediately
know a is not prime.

334
00:16:35,700 --> 00:16:36,200
Go on.

335
00:16:36,200 --> 00:16:38,200
Search for another one.

336
00:16:38,200 --> 00:16:38,890
OK.

337
00:16:38,890 --> 00:16:42,300
So suppose I'm
unlucky-- or lucky--

338
00:16:42,300 --> 00:16:45,870
and I choose an a to
test and it turns out

339
00:16:45,870 --> 00:16:49,760
that a to the n minus 1 is 1,
does that mean that n is prime?

340
00:16:49,760 --> 00:16:50,920
Unfortunately not.

341
00:16:50,920 --> 00:16:53,620
It might be that I
just hit an n that

342
00:16:53,620 --> 00:16:59,580
happened to satisfy Fermat's
equation even though n was not

343
00:16:59,580 --> 00:17:00,480
prime.

344
00:17:00,480 --> 00:17:05,089
But it's not a very
hard thing to prove

345
00:17:05,089 --> 00:17:11,890
that if n is not prime, then
half of the numbers from 1 to n

346
00:17:11,890 --> 00:17:14,589
are not going to
pass the Fermat test.

347
00:17:14,589 --> 00:17:16,569
So if half of the
numbers are not

348
00:17:16,569 --> 00:17:19,849
going to pass the Fermat
test, then what I can do

349
00:17:19,849 --> 00:17:22,800
is just choose a random
nonzero number in the interval

350
00:17:22,800 --> 00:17:26,280
from 1 to n, raise it to
the n minus first power,

351
00:17:26,280 --> 00:17:28,160
and see what happens.

352
00:17:28,160 --> 00:17:31,730
And if n is not
prime, the probability

353
00:17:31,730 --> 00:17:36,210
that this random numbers that
I've chosen fails this test

354
00:17:36,210 --> 00:17:37,650
is at least a half.

355
00:17:37,650 --> 00:17:40,010
So I try it 50 times.

356
00:17:40,010 --> 00:17:44,400
And if in fact 50 randomly
chosen a's in the interval 1

357
00:17:44,400 --> 00:17:48,370
to n all satisfy
Fermat's theorem,

358
00:17:48,370 --> 00:17:58,850
then there's one chance in 2 to
the 50th that n is not prime.

359
00:17:58,850 --> 00:17:59,860
That's a great bet.

360
00:17:59,860 --> 00:18:01,500
Leap for it.

361
00:18:01,500 --> 00:18:05,660
So that basically is the idea of
a probabilistic primarily test.

362
00:18:05,660 --> 00:18:07,770
Now, there's a
small complication

363
00:18:07,770 --> 00:18:10,540
which is that there are
certain numbers n where

364
00:18:10,540 --> 00:18:14,240
this property that half the
numbers will fail to satisfy

365
00:18:14,240 --> 00:18:17,244
Fermat's theorem doesn't hold.

366
00:18:17,244 --> 00:18:18,910
They're known as the
Carmichael numbers,

367
00:18:18,910 --> 00:18:21,140
and they're known
to be pretty sparse.

368
00:18:21,140 --> 00:18:23,030
So that really if
you're choosing an n

369
00:18:23,030 --> 00:18:26,800
at random, which is kind of
what we're doing when we choose

370
00:18:26,800 --> 00:18:29,130
random primes p and
q, the likelihood

371
00:18:29,130 --> 00:18:31,010
that you'll stumble
on a Carmichael number

372
00:18:31,010 --> 00:18:33,600
is another thing that you just
don't have to worry about.

373
00:18:33,600 --> 00:18:35,780
So really, the
Fermat primality test

374
00:18:35,780 --> 00:18:38,680
is a plausible
pragmatic test that you

375
00:18:38,680 --> 00:18:42,550
could use to pretty reliably
detect whether or not

376
00:18:42,550 --> 00:18:45,370
a number was prime-- what
was the last component

377
00:18:45,370 --> 00:18:49,450
of the powers that we
needed the receiver to have.

378
00:18:49,450 --> 00:18:49,990
OK.

379
00:18:49,990 --> 00:18:53,110
So now we come to
the question of why

380
00:18:53,110 --> 00:18:55,930
do we believe that the
RSA protocol is secure?

381
00:18:55,930 --> 00:19:00,330
And the first thing to notice
is that if you could factor n,

382
00:19:00,330 --> 00:19:02,590
then it's easy to break.

383
00:19:02,590 --> 00:19:06,690
Because if you can factor n,
then you have the p and the q.

384
00:19:06,690 --> 00:19:10,170
And that means you know what
p minus 1 times q minus 1 is.

385
00:19:10,170 --> 00:19:13,090
And therefore you can
use the Pulverizer

386
00:19:13,090 --> 00:19:14,870
in exactly the same
way the receiver did

387
00:19:14,870 --> 00:19:18,250
to find the inverse
of the public key e.

388
00:19:18,250 --> 00:19:19,840
You could find d easily.

389
00:19:19,840 --> 00:19:23,790
So surely if you can
factor, then RSA breaks.

390
00:19:23,790 --> 00:19:25,890
No question about that.

391
00:19:25,890 --> 00:19:27,250
What about the converse?

392
00:19:27,250 --> 00:19:30,740
Well, what you can approve--
and there's an argument that's

393
00:19:30,740 --> 00:19:34,900
sketched in class problem,
not fully, in the book--

394
00:19:34,900 --> 00:19:40,610
is that if I could find the
private key d, then in fact,

395
00:19:40,610 --> 00:19:42,030
I can also factor n.

396
00:19:42,030 --> 00:19:45,200
So if I believe that
factoring is hard,

397
00:19:45,200 --> 00:19:49,110
then in fact finding the
secret key is also hard.

398
00:19:49,110 --> 00:19:51,860
And we could try to be confident
that our secret key is not

399
00:19:51,860 --> 00:19:54,810
going to be found
even given the public.

400
00:19:54,810 --> 00:19:58,850
Now, unfortunately this
is not the strongest kind

401
00:19:58,850 --> 00:20:04,700
of security guaranteed
you'd like because there's

402
00:20:04,700 --> 00:20:06,310
a logical possibility
that you might

403
00:20:06,310 --> 00:20:09,050
be able to decrypt messages
without knowing the secret key.

404
00:20:09,050 --> 00:20:11,800
Maybe there's some other
walk around whereby

405
00:20:11,800 --> 00:20:15,650
you can decrypt the secret
message m hat by a method other

406
00:20:15,650 --> 00:20:18,060
than raising it
to the dth power.

407
00:20:18,060 --> 00:20:22,160
And what you'd really like
is a theorem of security

408
00:20:22,160 --> 00:20:25,590
that said that
breaking RSA-- reading

409
00:20:25,590 --> 00:20:28,160
RSA messages by any
means whatsoever--

410
00:20:28,160 --> 00:20:29,440
would be as hard as factoring.

411
00:20:29,440 --> 00:20:30,690
That's not known for RSA.

412
00:20:30,690 --> 00:20:32,480
It's an open problem.

413
00:20:32,480 --> 00:20:37,570
And so RSA doesn't have the
theoretically most desirable

414
00:20:37,570 --> 00:20:41,510
security assurance, but
we really believe in it.

415
00:20:41,510 --> 00:20:43,100
And the reason we
really believe in it

416
00:20:43,100 --> 00:20:47,300
is that for 100 or more years,
mathematicians and number

417
00:20:47,300 --> 00:20:50,320
theorists have been trying to
find efficient ways to factor.

418
00:20:50,320 --> 00:20:55,970
And more pragmatically, the most
sophisticated cryptographers

419
00:20:55,970 --> 00:20:59,960
and decoders in the world using
the most powerful networks

420
00:20:59,960 --> 00:21:04,830
of supercomputers have been
attacking RSA for 35 years

421
00:21:04,830 --> 00:21:07,080
and have yet to crack it.

422
00:21:07,080 --> 00:21:10,130
Now, the truth is that in
the course of the 35 years,

423
00:21:10,130 --> 00:21:12,170
various kinds of
glitches were found

424
00:21:12,170 --> 00:21:18,050
that required some added
rules about how you found

425
00:21:18,050 --> 00:21:20,870
the p and the q and
how you found the e,

426
00:21:20,870 --> 00:21:24,620
but they were easily
identified and fixed.

427
00:21:24,620 --> 00:21:30,280
And RSA really is a robust
public key encryption

428
00:21:30,280 --> 00:21:36,460
method that has withstood
attack for all these years.

429
00:21:36,460 --> 00:21:38,990
That's why we believe in it.