1 00:00:00,880 --> 00:00:05,110 The RSA crypto systems is one of the lovely and really important 2 00:00:05,110 --> 00:00:08,440 applications of number theory in computer science. 3 00:00:08,440 --> 00:00:12,140 So let's start talking about it. 4 00:00:12,140 --> 00:00:13,950 The RSA crypto system is what is known 5 00:00:13,950 --> 00:00:18,130 as a public key cryptosystem, which has the following really 6 00:00:18,130 --> 00:00:23,500 amazing properties-- namely, anyone 7 00:00:23,500 --> 00:00:25,820 can send a secret encrypted message 8 00:00:25,820 --> 00:00:29,340 to a designated receiver. 9 00:00:29,340 --> 00:00:32,860 This is without there being any prior contact using only 10 00:00:32,860 --> 00:00:36,304 publicly available information. 11 00:00:36,304 --> 00:00:37,720 Now, if you think about that, it's 12 00:00:37,720 --> 00:00:40,080 really terrific because it means that you 13 00:00:40,080 --> 00:00:44,020 can send a secret message to Amazon that nobody but Amazon 14 00:00:44,020 --> 00:00:46,670 can read even though the entire world knows 15 00:00:46,670 --> 00:00:50,050 what you know and can see what you sent to Amazon. 16 00:00:50,050 --> 00:00:52,380 And Amazon knows that it's the only one can 17 00:00:52,380 --> 00:00:55,350 decrypt the message you sent. 18 00:00:55,350 --> 00:00:58,900 This in fact is hard to believe if you think about it. 19 00:00:58,900 --> 00:01:00,350 It sounds paradoxical. 20 00:01:00,350 --> 00:01:04,739 How can secrecy be possible using only public info? 21 00:01:04,739 --> 00:01:07,890 And in fact, the existence of this public key cryptosystem 22 00:01:07,890 --> 00:01:12,010 has some genuinely paradoxical consequence, which kind of 23 00:01:12,010 --> 00:01:12,840 are a mind bender. 24 00:01:12,840 --> 00:01:14,935 So let me tell you about one of them. 25 00:01:14,935 --> 00:01:16,810 I don't know if you've heard of mental chess, 26 00:01:16,810 --> 00:01:18,690 but it's a standard thing in the chess world. 27 00:01:18,690 --> 00:01:23,180 Chess masters are so talented and have such deep insight 28 00:01:23,180 --> 00:01:25,205 into the game that they don't need a chessboard, 29 00:01:25,205 --> 00:01:26,580 and they don't need chess pieces. 30 00:01:26,580 --> 00:01:29,850 They can just go for walk on a country 31 00:01:29,850 --> 00:01:31,690 lane talking to each other and saying 32 00:01:31,690 --> 00:01:36,420 pond to king 4 and knight to bishop 3 33 00:01:36,420 --> 00:01:40,030 and just talking chess code and play an entire chess 34 00:01:40,030 --> 00:01:40,635 game that way. 35 00:01:40,635 --> 00:01:42,284 That's known as mental chess. 36 00:01:42,284 --> 00:01:43,200 It's quite impressive. 37 00:01:43,200 --> 00:01:46,710 In fact, the grand masters can play multiple games 38 00:01:46,710 --> 00:01:48,220 of mental chess against opponents 39 00:01:48,220 --> 00:01:49,950 who are staring at the chessboard 40 00:01:49,950 --> 00:01:52,560 and win the great majority of the games. 41 00:01:52,560 --> 00:01:56,180 Of course, these are not against other grand masters, but still. 42 00:01:56,180 --> 00:01:56,680 OK. 43 00:01:56,680 --> 00:01:58,470 So now, this is what I propose. 44 00:01:58,470 --> 00:02:00,000 How about playing mental poker? 45 00:02:00,000 --> 00:02:02,000 If you know how to play poker, we deal our cards 46 00:02:02,000 --> 00:02:04,620 and we bet and so on. 47 00:02:04,620 --> 00:02:08,160 And my only condition is that I'll deal. 48 00:02:08,160 --> 00:02:11,330 Now, that sounds like a joke and an absurd thing 49 00:02:11,330 --> 00:02:14,090 for you to agree to do, but it's amazing. 50 00:02:14,090 --> 00:02:15,640 It's actually possible. 51 00:02:15,640 --> 00:02:20,960 One of the famous papers of Rivest and Shamir 52 00:02:20,960 --> 00:02:26,200 was how to play mental poker using public key crypto. 53 00:02:26,200 --> 00:02:33,720 So I once tried to persuade an eminent MIT dean who's 54 00:02:33,720 --> 00:02:35,507 a physicist researcher about this, 55 00:02:35,507 --> 00:02:36,840 and he just wouldn't believe it. 56 00:02:36,840 --> 00:02:40,750 He argued that it was just impossible logically. 57 00:02:40,750 --> 00:02:42,990 And what he was thinking about was 58 00:02:42,990 --> 00:02:46,940 that if you know how to compute a function, then of course 59 00:02:46,940 --> 00:02:50,090 you can figure out how to invert it. 60 00:02:50,090 --> 00:02:54,170 That is to say if I know how to compute some function f 61 00:02:54,170 --> 00:02:58,010 of a number and let's say that the function is 62 00:02:58,010 --> 00:03:00,870 one arrow in-- that is an injection-- then 63 00:03:00,870 --> 00:03:04,010 if I know what f of n, there's a unique n that it came from. 64 00:03:04,010 --> 00:03:08,000 So how can I not be able to find n? 65 00:03:08,000 --> 00:03:11,840 And it's an insight of computer science and complexity theory 66 00:03:11,840 --> 00:03:14,750 that says it's quite possible. 67 00:03:14,750 --> 00:03:18,430 It's not that you can't find the n that produced f of n. 68 00:03:18,430 --> 00:03:21,670 It's that the search for it will be prohibitive. 69 00:03:21,670 --> 00:03:24,550 There are, in short, one-way. 70 00:03:24,550 --> 00:03:29,870 That is, functions that are easy to compute in one direction 71 00:03:29,870 --> 00:03:31,960 but hard to invert. 72 00:03:31,960 --> 00:03:33,950 They're easy to compute but hard to invert. 73 00:03:33,950 --> 00:03:37,522 In particular, we're thinking about multiplying 74 00:03:37,522 --> 00:03:38,105 and factoring. 75 00:03:40,896 --> 00:03:43,270 It's an observation that it's easy to compute the product 76 00:03:43,270 --> 00:03:44,720 of two large prime numbers. 77 00:03:44,720 --> 00:03:45,886 We all know how to multiply. 78 00:03:45,886 --> 00:03:49,950 And in fact, there are faster ways to multiply than you know. 79 00:03:49,950 --> 00:03:53,070 But the current state of our knowledge 80 00:03:53,070 --> 00:03:55,230 of number theory and complexity theory 81 00:03:55,230 --> 00:03:58,570 is that given a number n that happens 82 00:03:58,570 --> 00:04:00,470 to be the product of two primes, it 83 00:04:00,470 --> 00:04:05,300 seems to be hopelessly hard in general to factor 84 00:04:05,300 --> 00:04:07,750 n into the components p and q. 85 00:04:07,750 --> 00:04:09,960 Now, this is an open problem. 86 00:04:09,960 --> 00:04:12,660 It's similar to the p equals np question-- 87 00:04:12,660 --> 00:04:13,880 that famous open problem. 88 00:04:13,880 --> 00:04:19,450 It's actually a weaker-- it's quite possible 89 00:04:19,450 --> 00:04:22,840 that you could factor, and np would not equal to np. 90 00:04:22,840 --> 00:04:24,950 But nevertheless, it's the same kind of problem. 91 00:04:24,950 --> 00:04:26,640 And more generally, the existence 92 00:04:26,640 --> 00:04:29,110 of one way functions is closely related to that p 93 00:04:29,110 --> 00:04:30,700 equals np question. 94 00:04:30,700 --> 00:04:33,050 Nevertheless, even though it's an open problem 95 00:04:33,050 --> 00:04:36,170 and theoretically has not been settled either way, 96 00:04:36,170 --> 00:04:41,195 it's widely believed-- the banks, the governments, 97 00:04:41,195 --> 00:04:46,480 and the commercial world have really bet the family jewels 98 00:04:46,480 --> 00:04:52,690 on the difficulty of factoring when they use the RSA protocol. 99 00:04:52,690 --> 00:04:58,880 So I like to make the joke that my most important contribution 100 00:04:58,880 --> 00:05:03,630 to MIT was being involved in the hiring of our S and A. 101 00:05:03,630 --> 00:05:12,130 So this is A, Adi Shamir, R, Ron Rivest, and A, Len Adleman 102 00:05:12,130 --> 00:05:17,100 back in the late '70s when they first came up with these ideas. 103 00:05:17,100 --> 00:05:24,470 So let's look at the way this RSA protocol actually works. 104 00:05:24,470 --> 00:05:25,780 So here's what happens. 105 00:05:25,780 --> 00:05:28,850 To begin with, you have to make some information public 106 00:05:28,850 --> 00:05:31,570 so that people can communicate with you. 107 00:05:31,570 --> 00:05:34,700 We're looking at two players here. 108 00:05:34,700 --> 00:05:37,690 There's a receiver who's going to get encrypted messages, 109 00:05:37,690 --> 00:05:41,270 and there's a sender who is trying 110 00:05:41,270 --> 00:05:43,780 to send an encrypted message to the receiver. 111 00:05:43,780 --> 00:05:46,400 So what the receiver does before hand is 112 00:05:46,400 --> 00:05:49,670 generates two primes, p and q. 113 00:05:49,670 --> 00:05:52,310 Now, in practice, you want these to be pretty big primes-- 114 00:05:52,310 --> 00:05:54,170 hundreds of digits. 115 00:05:54,170 --> 00:05:56,120 And we'll examine it in a moment, the question 116 00:05:56,120 --> 00:05:56,995 of how you find them. 117 00:05:56,995 --> 00:06:02,350 But the receiver's job is to find two quite substantial 118 00:06:02,350 --> 00:06:05,580 large primes, p and q, chosen more or less 119 00:06:05,580 --> 00:06:09,050 randomly because if you have any kind of predictable procedure 120 00:06:09,050 --> 00:06:11,860 for how you got them, that would be a vulnerability. 121 00:06:11,860 --> 00:06:13,900 But if you just choose them at random, 122 00:06:13,900 --> 00:06:18,190 then there's enough primes in the hundreds of digits 123 00:06:18,190 --> 00:06:20,180 that it's hopeless that people would guess 124 00:06:20,180 --> 00:06:22,020 which one you wound up with. 125 00:06:22,020 --> 00:06:22,560 OK. 126 00:06:22,560 --> 00:06:25,300 What do you do to begin with is multiply p and q together, 127 00:06:25,300 --> 00:06:26,260 which is easy to do. 128 00:06:26,260 --> 00:06:28,920 Let's call that number n. 129 00:06:28,920 --> 00:06:31,790 And now the other thing the receiver is going to do 130 00:06:31,790 --> 00:06:35,240 is find a number e that's relatively 131 00:06:35,240 --> 00:06:39,330 prime to this peculiar number p minus 1, q minus 1. 132 00:06:39,330 --> 00:06:43,080 Now as a hint, you might notice that p minus 1, q minus 1 133 00:06:43,080 --> 00:06:45,930 is in fact Euler's function of n-- phi of n. 134 00:06:45,930 --> 00:06:49,330 But for now, we don't need to understand 135 00:06:49,330 --> 00:06:50,890 that this is Euler's function. 136 00:06:50,890 --> 00:06:54,550 It's just the recipe of what the receiver has to do. 137 00:06:54,550 --> 00:06:57,350 Find a number e that's relatively prime to p minus 1, 138 00:06:57,350 --> 00:06:58,000 q minus 1. 139 00:06:58,000 --> 00:06:59,870 Again, you don't want e to be too small, 140 00:06:59,870 --> 00:07:03,390 and we'll discuss in a moment how do you find such an e. 141 00:07:03,390 --> 00:07:07,960 But the receiver's job is to find such an e. 142 00:07:07,960 --> 00:07:12,150 This pair of numbers e and n will 143 00:07:12,150 --> 00:07:17,270 be the public key which the receiver publishes widely 144 00:07:17,270 --> 00:07:23,285 where it can easily be found by anyone 145 00:07:23,285 --> 00:07:24,430 who cares to look for it. 146 00:07:24,430 --> 00:07:26,410 Basically there's a phone directory 147 00:07:26,410 --> 00:07:29,140 where if you want to know how to send somebody a secret message, 148 00:07:29,140 --> 00:07:32,070 you look them up, and you find the receivers name in there. 149 00:07:32,070 --> 00:07:34,159 And then you see his public e and n, 150 00:07:34,159 --> 00:07:36,075 and that's what you use to send him a message. 151 00:07:36,790 --> 00:07:41,375 Now, how do you use it to send him a message? 152 00:07:41,375 --> 00:07:42,875 Well, I'll explain that in a minute, 153 00:07:42,875 --> 00:07:47,440 but let's look at one more thing that the receiver needs 154 00:07:47,440 --> 00:07:49,340 to do to set himself up. 155 00:07:49,340 --> 00:07:53,700 The receiver is going to find an inverse of this number e that 156 00:07:53,700 --> 00:08:00,080 he's published-- the part of his public -- modulo p minus 1, 157 00:08:00,080 --> 00:08:01,450 q minus 1. 158 00:08:01,450 --> 00:08:05,730 That is, this e since it's relatively prime to p minus 1, 159 00:08:05,730 --> 00:08:09,520 q minus 1, it will have an inverse in Z star p 160 00:08:09,520 --> 00:08:11,260 minus 1, q minus 1. 161 00:08:11,260 --> 00:08:13,060 Let's let that inverse be d. 162 00:08:13,060 --> 00:08:15,910 And of course, we know how to find d because you 163 00:08:15,910 --> 00:08:17,760 can do that with a Pulverizer. 164 00:08:17,760 --> 00:08:19,090 D is the private key. 165 00:08:19,090 --> 00:08:21,390 That's this crucial piece of information 166 00:08:21,390 --> 00:08:23,810 that the receiver has and that the receiver is not 167 00:08:23,810 --> 00:08:25,520 going to tell anybody. 168 00:08:25,520 --> 00:08:28,450 Only the receiver knows that because the receiver 169 00:08:28,450 --> 00:08:32,340 chose the p and the q and the e more or less randomly-- 170 00:08:32,340 --> 00:08:34,500 maybe even as randomly as they can manage-- 171 00:08:34,500 --> 00:08:35,679 and then they find the d. 172 00:08:35,679 --> 00:08:37,130 And that's their secret. 173 00:08:37,130 --> 00:08:37,770 OK. 174 00:08:37,770 --> 00:08:39,970 That's what the receiver does. 175 00:08:39,970 --> 00:08:42,720 How does the sender send a message? 176 00:08:42,720 --> 00:08:48,560 Well, to send a message, what the sender wants to do 177 00:08:48,560 --> 00:08:52,390 is choose a message that is in fact a number in the range 178 00:08:52,390 --> 00:08:56,961 from 1 to n where-- we're thinking again, of n, 179 00:08:56,961 --> 00:08:59,210 if it's a product of two primes of a couple of hundred 180 00:08:59,210 --> 00:09:03,850 digits each, then the product is around 400 digits. 181 00:09:03,850 --> 00:09:07,070 And so you can pick any message m 182 00:09:07,070 --> 00:09:11,142 that can be represented by a 400 digit number. 183 00:09:11,142 --> 00:09:12,600 Now, there's a lot of messages that 184 00:09:12,600 --> 00:09:14,125 will fit within 400 digits. 185 00:09:14,125 --> 00:09:15,750 And of course, if it's bigger, you just 186 00:09:15,750 --> 00:09:18,200 break it up into 400 digit pieces. 187 00:09:18,200 --> 00:09:21,070 So that's the kind of message you're going to send. 188 00:09:21,070 --> 00:09:22,900 So the message is going to be a number 189 00:09:22,900 --> 00:09:25,260 in this range from 1 to n. 190 00:09:25,260 --> 00:09:31,280 And what the sender is going to do is look up the public key e 191 00:09:31,280 --> 00:09:33,440 and the other part of the public key 192 00:09:33,440 --> 00:09:41,100 n and raise the secret message to the power e in Z n. 193 00:09:41,100 --> 00:09:43,960 So we're going to compute m to the e in Zn 194 00:09:43,960 --> 00:09:47,020 and send that encoded message m hat. 195 00:09:47,020 --> 00:09:50,980 So m hat is what we think of as the encrypted version 196 00:09:50,980 --> 00:09:54,459 of the message m. 197 00:09:54,459 --> 00:09:56,000 So then we have the problem if that's 198 00:09:56,000 --> 00:09:58,580 what the sender sends to the receiver, 199 00:09:58,580 --> 00:10:01,900 how does the receiver decode the m hat, 200 00:10:01,900 --> 00:10:04,540 and the answer is the receiver just computes 201 00:10:04,540 --> 00:10:09,190 m hat to the power d-- the secret key-- also 202 00:10:09,190 --> 00:10:10,860 in the ring Zn. 203 00:10:10,860 --> 00:10:14,240 And the claim is that in fact, that's equal to m. 204 00:10:14,240 --> 00:10:16,430 Now, you can check in class problem, 205 00:10:16,430 --> 00:10:19,950 and it's easy to see that the reason why 206 00:10:19,950 --> 00:10:23,600 that method of decrypting works is precisely 207 00:10:23,600 --> 00:10:26,180 an application of Euler's theorem-- at least 208 00:10:26,180 --> 00:10:29,600 when m happens to be relatively prime to n. 209 00:10:29,600 --> 00:10:32,400 Now, the odds of finding an m that's 210 00:10:32,400 --> 00:10:35,620 not relatively prime to n are basically 211 00:10:35,620 --> 00:10:38,599 negligible because if you'd find such an m, 212 00:10:38,599 --> 00:10:40,057 it would enable you to factor them. 213 00:10:40,057 --> 00:10:42,320 And we believe factoring is very hard. 214 00:10:42,320 --> 00:10:45,280 But in fact, it actually works for all m, which 215 00:10:45,280 --> 00:10:46,740 is a nice theoretical results. 216 00:10:46,740 --> 00:10:50,080 And you'll work this out in class problem. 217 00:10:50,080 --> 00:10:51,310 OK. 218 00:10:51,310 --> 00:10:54,410 That's how it works. 219 00:10:54,410 --> 00:11:00,230 The receiver publishers e and n, keeps a secret key d. 220 00:11:00,230 --> 00:11:03,980 The sender exponentiates the message to the power e. 221 00:11:03,980 --> 00:11:07,620 The receiver simply decodes by raising 222 00:11:07,620 --> 00:11:09,170 the received message to the power 223 00:11:09,170 --> 00:11:12,930 d and reads off what the original was. 224 00:11:12,930 --> 00:11:13,640 OK. 225 00:11:13,640 --> 00:11:16,650 So we need to think about the feasibility of all of this 226 00:11:16,650 --> 00:11:21,740 because we believe that it's impossible to decrypt, 227 00:11:21,740 --> 00:11:24,535 but there's a lot of other stuff going on there that the players 228 00:11:24,535 --> 00:11:25,535 have to be able perform. 229 00:11:25,535 --> 00:11:28,550 And let's examine what their responsibilities and abilities 230 00:11:28,550 --> 00:11:29,710 have to be. 231 00:11:29,710 --> 00:11:31,340 So the receiver to begin with has 232 00:11:31,340 --> 00:11:33,320 to be able to find large primes. 233 00:11:33,320 --> 00:11:35,900 And how on earth do they do that? 234 00:11:35,900 --> 00:11:39,820 Well, without going into too much detail, 235 00:11:39,820 --> 00:11:44,650 we can make the remark that there are lots of primes. 236 00:11:44,650 --> 00:11:47,960 That is to say by appealing to the prime number theorem, 237 00:11:47,960 --> 00:11:55,490 we know that among the n digit numbers, about log n of them 238 00:11:55,490 --> 00:11:58,530 are going to be primes so that you 239 00:11:58,530 --> 00:12:02,590 don't have to go too long before you 240 00:12:02,590 --> 00:12:04,950 stumble upon a random prime. 241 00:12:04,950 --> 00:12:08,740 That is, if you're dealing with a 200 digit n 242 00:12:08,740 --> 00:12:13,670 and you're searching for a prime of around that size, 243 00:12:13,670 --> 00:12:15,200 you're not going to have to search 244 00:12:15,200 --> 00:12:17,350 more than a few hundred numbers before you're 245 00:12:17,350 --> 00:12:20,440 likely to stumble on a prime. 246 00:12:20,440 --> 00:12:22,950 And of course, how do you know that you stumbled on a prime? 247 00:12:22,950 --> 00:12:25,590 Well, you need to be able to check whether a number is prime 248 00:12:25,590 --> 00:12:29,450 or not-- and efficientlY-- in order for this whole thing 249 00:12:29,450 --> 00:12:30,230 to be feasible. 250 00:12:30,230 --> 00:12:31,855 So we'll have to discuss that brieflY-- 251 00:12:31,855 --> 00:12:34,570 how do you test whether or not a number is 252 00:12:34,570 --> 00:12:37,850 prime in an efficient way? 253 00:12:37,850 --> 00:12:39,550 The other thing the receiver has to do 254 00:12:39,550 --> 00:12:43,980 is find an e that's relatively prime to p minus 1, q minus 1. 255 00:12:43,980 --> 00:12:45,460 But that's easy. 256 00:12:45,460 --> 00:12:47,120 Well, it's easy because first of all, 257 00:12:47,120 --> 00:12:53,230 if you just kind of randomly guess a medium sized e 258 00:12:53,230 --> 00:12:56,970 and then search consecutively from some random number you've 259 00:12:56,970 --> 00:13:00,690 chosen somewhere in the middle of the interval 260 00:13:00,690 --> 00:13:03,550 up to p minus 1, q minus 1. 261 00:13:03,550 --> 00:13:10,180 Again, you're very likely to find in a few steps a number 262 00:13:10,180 --> 00:13:13,720 e that is relatively prime to p minus 1, q minus 1. 263 00:13:13,720 --> 00:13:16,590 How do you recognize that it's relatively prime? 264 00:13:16,590 --> 00:13:19,060 Well, you just compute the GCD, which 265 00:13:19,060 --> 00:13:20,970 we know how to do using Euclid's algorithm. 266 00:13:20,970 --> 00:13:22,540 So that's really quite efficient. 267 00:13:22,540 --> 00:13:25,842 Recognizing that it's relatively prime is easy, 268 00:13:25,842 --> 00:13:27,800 you just don't have to search very many numbers 269 00:13:27,800 --> 00:13:28,936 until you stumble on an e. 270 00:13:28,936 --> 00:13:30,040 OK. 271 00:13:30,040 --> 00:13:33,640 The other thing you have to do is find the d that an e inverse 272 00:13:33,640 --> 00:13:36,350 modulo p minus 1, q minus 1. 273 00:13:36,350 --> 00:13:41,940 And again, that is the extended Euclidean algorithm, 274 00:13:41,940 --> 00:13:45,060 the extended GCD, namely the Pulverizer. 275 00:13:45,060 --> 00:13:49,270 So those are the pieces that the receiver has to do. 276 00:13:49,270 --> 00:13:51,210 Now, let's look at this a little bit more 277 00:13:51,210 --> 00:13:53,237 and think about the information about the prime. 278 00:13:53,237 --> 00:13:54,820 So the famous theorem about the primes 279 00:13:54,820 --> 00:13:58,630 is their density, which is if you let a pi of n 280 00:13:58,630 --> 00:14:02,590 be the number of primes less than or equal to n, 281 00:14:02,590 --> 00:14:05,560 then it's a deep theorem of number theory 282 00:14:05,560 --> 00:14:10,380 that pi event actually approaches 283 00:14:10,380 --> 00:14:13,230 a limit in an asymptotic sense-- which we'll discuss in more 284 00:14:13,230 --> 00:14:16,680 detail-- that pi of n as n grows gets to be 285 00:14:16,680 --> 00:14:19,230 very close to n over log n. 286 00:14:19,230 --> 00:14:21,220 That's the natural log of n. 287 00:14:24,440 --> 00:14:25,597 Now, that's a deep theorem. 288 00:14:25,597 --> 00:14:27,680 But in fact, if we want a self-contained treatment 289 00:14:27,680 --> 00:14:30,650 for our purposes, there's an exercise 290 00:14:30,650 --> 00:14:32,920 that will be in the text where we 291 00:14:32,920 --> 00:14:37,250 can derive Chebyshev's bound, which is weaker than they tight 292 00:14:37,250 --> 00:14:38,130 prime number theorem. 293 00:14:38,130 --> 00:14:39,910 But Chebyshev's bound, which can be 294 00:14:39,910 --> 00:14:44,390 proved by more elementary means that's within our own ability 295 00:14:44,390 --> 00:14:46,620 at this point with the number theory we have-- 296 00:14:46,620 --> 00:14:50,380 to be able to show that n over 4 log n 297 00:14:50,380 --> 00:14:53,010 is a lower bound on pi of n. 298 00:14:53,010 --> 00:14:55,720 So basically that says that if you're 299 00:14:55,720 --> 00:14:58,940 dealing with numbers of size n, which 300 00:14:58,940 --> 00:15:01,750 means they're of length log n a few hundred digits, 301 00:15:01,750 --> 00:15:06,300 then you only have to search maybe 1,000 302 00:15:06,300 --> 00:15:09,530 digits before your very likely to stumble on a prime. 303 00:15:09,530 --> 00:15:14,100 And if you search 2,000 digits, it becomes extremely likely 304 00:15:14,100 --> 00:15:15,870 that you'll stumble on a prime. 305 00:15:15,870 --> 00:15:18,140 So the primes are dense enough that we 306 00:15:18,140 --> 00:15:20,780 can afford to look for them, providing 307 00:15:20,780 --> 00:15:22,870 we can have a reasonably fast way to recognize 308 00:15:22,870 --> 00:15:24,100 when a number is prime. 309 00:15:24,100 --> 00:15:27,070 Well, one simple way that it almost 310 00:15:27,070 --> 00:15:30,030 is perfect-- but works pragmatically 311 00:15:30,030 --> 00:15:32,670 pretty well-- is called the Fermat test. 312 00:15:32,670 --> 00:15:35,750 But let me just reemphasize this -- I got ahead of myself-- 313 00:15:35,750 --> 00:15:38,650 that if I'm dealing with 200 digit numbers, 314 00:15:38,650 --> 00:15:41,440 then about one in 1,000 is prime using just the weaker 315 00:15:41,440 --> 00:15:42,680 Chebyshev's bound. 316 00:15:42,680 --> 00:15:44,500 And that says that I don't have to search 317 00:15:44,500 --> 00:15:47,490 too long-- only a few thousand numbers 318 00:15:47,490 --> 00:15:48,910 to be able to find a prime. 319 00:15:48,910 --> 00:15:50,540 And a few thousand numbers is well 320 00:15:50,540 --> 00:15:54,330 within the ability of a computer to carry out, 321 00:15:54,330 --> 00:15:57,270 providing that the test for recognizing that a number is 322 00:15:57,270 --> 00:15:59,960 prime isn't too time consuming. 323 00:15:59,960 --> 00:16:03,520 So one naive way that the really almost 324 00:16:03,520 --> 00:16:06,390 works to be a reliable primality test 325 00:16:06,390 --> 00:16:09,837 is to check whether Fermat's theorem is obeyed. 326 00:16:09,837 --> 00:16:12,170 Fermat's theorem-- the special case of Euler's theorem-- 327 00:16:12,170 --> 00:16:15,540 says that if n is prime, then if I 328 00:16:15,540 --> 00:16:18,830 compute a number a to the n minus 1, 329 00:16:18,830 --> 00:16:22,310 it's going to equal 1 in Z n. 330 00:16:22,310 --> 00:16:24,320 And that's going to be the case for all a 331 00:16:24,320 --> 00:16:28,570 that are not 0 if n is prime. 332 00:16:28,570 --> 00:16:33,610 Now that means that if this equality fails in Z n, 333 00:16:33,610 --> 00:16:35,700 then I immediately know a is not prime. 334 00:16:35,700 --> 00:16:36,200 Go on. 335 00:16:36,200 --> 00:16:38,200 Search for another one. 336 00:16:38,200 --> 00:16:38,890 OK. 337 00:16:38,890 --> 00:16:42,300 So suppose I'm unlucky-- or lucky-- 338 00:16:42,300 --> 00:16:45,870 and I choose an a to test and it turns out 339 00:16:45,870 --> 00:16:49,760 that a to the n minus 1 is 1, does that mean that n is prime? 340 00:16:49,760 --> 00:16:50,920 Unfortunately not. 341 00:16:50,920 --> 00:16:53,620 It might be that I just hit an n that 342 00:16:53,620 --> 00:16:59,580 happened to satisfy Fermat's equation even though n was not 343 00:16:59,580 --> 00:17:00,480 prime. 344 00:17:00,480 --> 00:17:05,089 But it's not a very hard thing to prove 345 00:17:05,089 --> 00:17:11,890 that if n is not prime, then half of the numbers from 1 to n 346 00:17:11,890 --> 00:17:14,589 are not going to pass the Fermat test. 347 00:17:14,589 --> 00:17:16,569 So if half of the numbers are not 348 00:17:16,569 --> 00:17:19,849 going to pass the Fermat test, then what I can do 349 00:17:19,849 --> 00:17:22,800 is just choose a random nonzero number in the interval 350 00:17:22,800 --> 00:17:26,280 from 1 to n, raise it to the n minus first power, 351 00:17:26,280 --> 00:17:28,160 and see what happens. 352 00:17:28,160 --> 00:17:31,730 And if n is not prime, the probability 353 00:17:31,730 --> 00:17:36,210 that this random numbers that I've chosen fails this test 354 00:17:36,210 --> 00:17:37,650 is at least a half. 355 00:17:37,650 --> 00:17:40,010 So I try it 50 times. 356 00:17:40,010 --> 00:17:44,400 And if in fact 50 randomly chosen a's in the interval 1 357 00:17:44,400 --> 00:17:48,370 to n all satisfy Fermat's theorem, 358 00:17:48,370 --> 00:17:58,850 then there's one chance in 2 to the 50th that n is not prime. 359 00:17:58,850 --> 00:17:59,860 That's a great bet. 360 00:17:59,860 --> 00:18:01,500 Leap for it. 361 00:18:01,500 --> 00:18:05,660 So that basically is the idea of a probabilistic primarily test. 362 00:18:05,660 --> 00:18:07,770 Now, there's a small complication 363 00:18:07,770 --> 00:18:10,540 which is that there are certain numbers n where 364 00:18:10,540 --> 00:18:14,240 this property that half the numbers will fail to satisfy 365 00:18:14,240 --> 00:18:17,244 Fermat's theorem doesn't hold. 366 00:18:17,244 --> 00:18:18,910 They're known as the Carmichael numbers, 367 00:18:18,910 --> 00:18:21,140 and they're known to be pretty sparse. 368 00:18:21,140 --> 00:18:23,030 So that really if you're choosing an n 369 00:18:23,030 --> 00:18:26,800 at random, which is kind of what we're doing when we choose 370 00:18:26,800 --> 00:18:29,130 random primes p and q, the likelihood 371 00:18:29,130 --> 00:18:31,010 that you'll stumble on a Carmichael number 372 00:18:31,010 --> 00:18:33,600 is another thing that you just don't have to worry about. 373 00:18:33,600 --> 00:18:35,780 So really, the Fermat primality test 374 00:18:35,780 --> 00:18:38,680 is a plausible pragmatic test that you 375 00:18:38,680 --> 00:18:42,550 could use to pretty reliably detect whether or not 376 00:18:42,550 --> 00:18:45,370 a number was prime-- what was the last component 377 00:18:45,370 --> 00:18:49,450 of the powers that we needed the receiver to have. 378 00:18:49,450 --> 00:18:49,990 OK. 379 00:18:49,990 --> 00:18:53,110 So now we come to the question of why 380 00:18:53,110 --> 00:18:55,930 do we believe that the RSA protocol is secure? 381 00:18:55,930 --> 00:19:00,330 And the first thing to notice is that if you could factor n, 382 00:19:00,330 --> 00:19:02,590 then it's easy to break. 383 00:19:02,590 --> 00:19:06,690 Because if you can factor n, then you have the p and the q. 384 00:19:06,690 --> 00:19:10,170 And that means you know what p minus 1 times q minus 1 is. 385 00:19:10,170 --> 00:19:13,090 And therefore you can use the Pulverizer 386 00:19:13,090 --> 00:19:14,870 in exactly the same way the receiver did 387 00:19:14,870 --> 00:19:18,250 to find the inverse of the public key e. 388 00:19:18,250 --> 00:19:19,840 You could find d easily. 389 00:19:19,840 --> 00:19:23,790 So surely if you can factor, then RSA breaks. 390 00:19:23,790 --> 00:19:25,890 No question about that. 391 00:19:25,890 --> 00:19:27,250 What about the converse? 392 00:19:27,250 --> 00:19:30,740 Well, what you can approve-- and there's an argument that's 393 00:19:30,740 --> 00:19:34,900 sketched in class problem, not fully, in the book-- 394 00:19:34,900 --> 00:19:40,610 is that if I could find the private key d, then in fact, 395 00:19:40,610 --> 00:19:42,030 I can also factor n. 396 00:19:42,030 --> 00:19:45,200 So if I believe that factoring is hard, 397 00:19:45,200 --> 00:19:49,110 then in fact finding the secret key is also hard. 398 00:19:49,110 --> 00:19:51,860 And we could try to be confident that our secret key is not 399 00:19:51,860 --> 00:19:54,810 going to be found even given the public. 400 00:19:54,810 --> 00:19:58,850 Now, unfortunately this is not the strongest kind 401 00:19:58,850 --> 00:20:04,700 of security guaranteed you'd like because there's 402 00:20:04,700 --> 00:20:06,310 a logical possibility that you might 403 00:20:06,310 --> 00:20:09,050 be able to decrypt messages without knowing the secret key. 404 00:20:09,050 --> 00:20:11,800 Maybe there's some other walk around whereby 405 00:20:11,800 --> 00:20:15,650 you can decrypt the secret message m hat by a method other 406 00:20:15,650 --> 00:20:18,060 than raising it to the dth power. 407 00:20:18,060 --> 00:20:22,160 And what you'd really like is a theorem of security 408 00:20:22,160 --> 00:20:25,590 that said that breaking RSA-- reading 409 00:20:25,590 --> 00:20:28,160 RSA messages by any means whatsoever-- 410 00:20:28,160 --> 00:20:29,440 would be as hard as factoring. 411 00:20:29,440 --> 00:20:30,690 That's not known for RSA. 412 00:20:30,690 --> 00:20:32,480 It's an open problem. 413 00:20:32,480 --> 00:20:37,570 And so RSA doesn't have the theoretically most desirable 414 00:20:37,570 --> 00:20:41,510 security assurance, but we really believe in it. 415 00:20:41,510 --> 00:20:43,100 And the reason we really believe in it 416 00:20:43,100 --> 00:20:47,300 is that for 100 or more years, mathematicians and number 417 00:20:47,300 --> 00:20:50,320 theorists have been trying to find efficient ways to factor. 418 00:20:50,320 --> 00:20:55,970 And more pragmatically, the most sophisticated cryptographers 419 00:20:55,970 --> 00:20:59,960 and decoders in the world using the most powerful networks 420 00:20:59,960 --> 00:21:04,830 of supercomputers have been attacking RSA for 35 years 421 00:21:04,830 --> 00:21:07,080 and have yet to crack it. 422 00:21:07,080 --> 00:21:10,130 Now, the truth is that in the course of the 35 years, 423 00:21:10,130 --> 00:21:12,170 various kinds of glitches were found 424 00:21:12,170 --> 00:21:18,050 that required some added rules about how you found 425 00:21:18,050 --> 00:21:20,870 the p and the q and how you found the e, 426 00:21:20,870 --> 00:21:24,620 but they were easily identified and fixed. 427 00:21:24,620 --> 00:21:30,280 And RSA really is a robust public key encryption 428 00:21:30,280 --> 00:21:36,460 method that has withstood attack for all these years. 429 00:21:36,460 --> 00:21:38,990 That's why we believe in it.