The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: So finally, it's our last recitation. And finally, it's my favorite topic, cryptography, because I work in this area. So I probably have a little bit more to tell you than what's required.

So this recitation is about more primitives: we'll introduce several more primitives that may be useful in your future work or study. And the first one is digital signatures.

We briefly mentioned digital signatures in lecture, but mainly as an application of hash functions, so now I'm going to introduce them as a standalone primitive. As you may already know, a digital signature is used for verifying message authenticity. It's a pair of functions, sign and verify. Sign takes a secret key and a message, and it outputs a signature, which we refer to as sigma.
And verify takes a public key, a message, and a signature, and outputs either true or false: it either accepts or rejects the signature. So we use the secret key to sign and the public key to verify. That means that if I want to send a message, I should be the only one who is able to sign it, and everyone can verify that this message indeed comes from me.

So what properties do we want from digital signatures? Any thoughts on that? Go ahead.

AUDIENCE: [INAUDIBLE] given the signature and a message [INAUDIBLE] to get the secret key.

PROFESSOR: OK, that's definitely one. I'll put up a more general description of what you just said. Any other answers?

AUDIENCE: And only one signature [INAUDIBLE], like for any message coming out, can you have only one signature?

PROFESSOR: OK, so what's your name?

AUDIENCE: Hugo.

PROFESSOR: Hugo says a message should have only one signature. Let's think about whether that's necessary. Suppose my algorithm is a randomized one, so that for the same message it outputs many possible signatures. Why is that bad?
They will all verify, if that's how my algorithm works. I think that's fine. It's not a bad thing. Actually, randomized signatures are considered more secure, though they are less efficient.

Any other thoughts?

AUDIENCE: Do you care about speed at all, like how long it takes to sign and verify?

PROFESSOR: That's definitely one, but we don't have any scheme yet, so we care about functionality first. There are faster signatures and slower ones.

So the first property is actually very trivial: we want correctness. What does that mean? It means that if sigma is indeed generated by the sign function, then verify should output 1. Otherwise, it should output 0. That's the first and most basic property we want. I won't write it down. So, the other one.

Your answer is very close: you don't want anyone to be able to extract the secret key. But to make it more general, what we really want is unforgeability.
That means I have the secret key, and someone else, an adversary who does not have the secret key, should not be able to sign a message to pretend to be me. So they should not be able to produce a pair m star, sigma star, such that it verifies. Make sense?

So what you said is a special case of this. If they can somehow extract the secret key, then of course they can forge my signature on any other message. But we also want to prevent attacks where they cannot extract the secret key but can still somehow forge a signature.

In general, we want to make the adversary more powerful, because then we have higher confidence that we won't be attacked. So it's totally reasonable for the adversary to see a bunch of messages from me, because I am signing messages and putting them out to the world. An adversary may have seen some of the message-signature pairs I generated, but we still do not want it to be able to create a forgery. Now, how is that defined? Because, see, you can definitely send one of these back. That's a valid message-signature pair.
So our unforgeability requirement is defined as follows: the adversary should not be able to send such a pair where m star is different from every message he has already seen. There is no way to prevent the adversary from resending one of the message-signature pairs he has seen before.

So far, pretty straightforward.

Now, how can we get digital signatures? In the early days, researchers, and actually great computer scientists, proposed that a digital signature can be implemented as the inverse of public key encryption. What does that mean? I'll use RSA as the example. RSA encryption is m to the e mod n. Decryption is c to the d mod n.

So the first attempt is to just use decryption as our sign function and encryption as our verify function. Now the symbols are a little bit confusing, since I'm signing a message [INAUDIBLE] c. Let me actually change it. This is RSA encryption.
I'm going to transform it into a signature scheme where sign computes sigma as m to the d mod n, and verify raises the signature sigma to the power of e and checks whether or not I get back my message.

This actually makes a lot of sense. Why? Think of m as a ciphertext. If I decrypt it and then re-encrypt it, I should get back my ciphertext. So we have correctness.

And why is it unforgeable? Because an attacker does not have the secret key, so he should not be able to decrypt this m here. He cannot run this algorithm. That's the reasoning behind it. So far so good?

But, unfortunately, it is broken. So I'll give you, say, seven minutes to think about it. Can you come up with an attack, a forgery? You can see a bunch of messages and then output a forgery for a message you haven't seen before.

So is the algorithm clear?

AUDIENCE: [INAUDIBLE].

PROFESSOR: Can you speak louder?

AUDIENCE: Is it just the product of any messages?

PROFESSOR: Exactly.
Suppose an adversary has seen a bunch of messages. RSA has a property that is sometimes good and sometimes bad: it is multiplicatively homomorphic, or, to use a less fancy word, malleable. So if an adversary sees these message-signature pairs, it can set m star to be m1 times m2 and sigma star to be sigma 1 times sigma 2. You can check that this is a valid message-signature pair: if you raise m1 times m2 to the d, each factor is raised to d individually and then multiplied together, and that's exactly sigma 1 times sigma 2.

That's attack one. OK. There's actually another attack that's even simpler and tells you this scheme is even more broken. All I want to do is come up with a sigma that, when raised to e, is equal to m. So I'm going to select sigma and compute m as sigma raised to e mod n, which I can do because e is the public key. And then I output the pair m, sigma. I select the signature first and raise it to the power of e. I get a very strange message, but it doesn't matter. That's my forgery.
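Both attacks can be checked mechanically. The sketch below is a hedged illustration, not code from the recitation: the toy key (the known primes 2^61 - 1 and 2^31 - 1 with e = 65537) is an assumption chosen so the numbers fit, and the names `adversary_wins`, `multiplicative_attack`, `no_message_attack`, and `replay_attack` are hypothetical. It encodes the unforgeability game described above: the adversary may ask for signatures on any messages, but wins only with a valid pair on a message it never asked about, so replaying a pair it was given does not count.

```python
from typing import Callable, Set, Tuple

# Toy textbook RSA key: far too small and well known for real use.
P, Q = 2**61 - 1, 2**31 - 1          # two known (Mersenne) primes
N, E = P * Q, 65537                  # public key (N, E)
D = pow(E, -1, (P - 1) * (Q - 1))    # secret exponent; pow(x, -1, m) needs Python 3.8+


def sign(m: int) -> int:
    """Textbook RSA 'signature': the decryption map, sigma = m^d mod N."""
    return pow(m, D, N)


def verify(m: int, sigma: int) -> bool:
    """The encryption map: accept if sigma^e mod N gives back m."""
    return 0 <= m < N and pow(sigma, E, N) == m


Oracle = Callable[[int], int]
Adversary = Callable[[Oracle], Tuple[int, int]]


def adversary_wins(adversary: Adversary) -> bool:
    """Unforgeability game: a forgery counts only for a message never signed."""
    queried: Set[int] = set()

    def oracle(m: int) -> int:       # signs whatever the adversary asks for
        queried.add(m)
        return sign(m)

    m_star, sigma_star = adversary(oracle)
    return m_star not in queried and verify(m_star, sigma_star)


def multiplicative_attack(oracle: Oracle) -> Tuple[int, int]:
    """Attack 1: (m1*m2)^d = m1^d * m2^d mod N, so multiply two observed pairs."""
    m1, m2 = 1234, 5678
    return (m1 * m2) % N, (oracle(m1) * oracle(m2)) % N


def no_message_attack(oracle: Oracle) -> Tuple[int, int]:
    """Attack 2: pick sigma first; its 'message' is sigma^e mod N (public key only)."""
    sigma = 987654321
    return pow(sigma, E, N), sigma


def replay_attack(oracle: Oracle) -> Tuple[int, int]:
    """Resending a pair you were given is valid, but by definition not a win."""
    return 42, oracle(42)
```

Neither attack touches the secret exponent `D`; it appears only inside `sign`, which stands in for the pairs the adversary observed.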
OK, so now you can see this scheme is basically totally broken. But it actually came from several renowned scientists. Why is that the case? Because that definition of unforgeability didn't exist when they were working on this problem. That definition looks obvious today, but it's actually not obvious at all. I think this algorithm came in the 70s, '78. And in '82, Goldwasser and Micali, two professors from MIT, proposed the definitions for signatures, encryption, and basically everything in cryptography. And they won another Turing Award for that.

OK, so let's try to fix it. Any ideas? We do not want to change the framework. Let's still use RSA, but combine it with some other primitive you have seen to try to fix it.

What do we want to do? We want to break this multiplicative property, and we want to break this second attack, whatever it's called.

Go ahead.

AUDIENCE: Can we change the n, maybe?

PROFESSOR: Change this n?

AUDIENCE: Yeah.
PROFESSOR: Right now, just to remind you, it's a product of two primes, p times q. That's how RSA works. What's your idea? Go ahead.

AUDIENCE: Before we raise it to the power, we can take the hash of it.

PROFESSOR: Exactly. Let's just make a small change. Sign will be the hash of m, raised to d. And verify will just check whether the hash of m equals the signature raised to e.

This indeed fixes these attacks. Why? For the first attack, hash of m1 times hash of m2 is not going to be the hash of m star, because the hash is supposed to be [INAUDIBLE] random. That's not going to work. For the second attack, the attacker needs to find a hash of m that equals sigma raised to e. It can still pick sigma and compute sigma raised to e, but it does not know which message hashes to that value, because of the one-wayness of the hash function. If we use a good hash function there, then it indeed fixes both attacks. But we saw in lecture that this hash function also needs to be collision resistant.
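A sketch of the hash-then-sign change, under stated assumptions: SHA-256, read as an integer and reduced mod N, stands in for "a good hash function", and the toy primes 2^61 - 1 and 2^31 - 1 with e = 65537 are illustrative choices that are far too small to be secure (reducing a 256-bit digest mod such a small N also weakens collision resistance). It shows the structure of the fix only; it is not a vetted or standardized scheme.

```python
import hashlib

# Toy RSA key (illustrative only; real keys use large random primes).
P, Q = 2**61 - 1, 2**31 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))    # needs Python 3.8+


def h(message: bytes) -> int:
    """Toy stand-in for the hash: SHA-256 as a big-endian integer, reduced mod N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Hash first, then apply the secret exponent: sigma = H(m)^d mod N."""
    return pow(h(message), D, N)


def verify(message: bytes, sigma: int) -> bool:
    """Accept if sigma^e mod N equals H(m)."""
    return pow(sigma, E, N) == h(message)
```

Multiplying two observed signatures now gives a sigma whose e-th power is H(m1) times H(m2) mod N, and the adversary would still have to find a message that hashes to that product.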
Remember that? A question?

AUDIENCE: Isn't the message public?

PROFESSOR: Yeah, the message... oh, OK. Good point. Oh, no, but, OK, you are talking about the second attack, right? The message is public, but all the attacker can do is select sigma and raise it to e. That's going to be the hash of m, and he cannot figure out what this m is.

But what about the other way?

AUDIENCE: I mean, if he has two messages, he can still get m star, and then get the hash of m star.

PROFESSOR: OK, so he gets the hash of m1 and the hash of m2. But he needs to find an m star whose hash is the product of these two. And he does not know how to find that message.

So if the hash is not multiplicative, one-way, and collision resistant, then it seems that we have fixed all the attacks we know. However, how do we know there are no other attacks? Actually, this is indeed a good idea, and we have several national standards that use it, but slightly differently. This is just for your information. There's a standard called ANSI X9.31. It uses RSA with a padding scheme: it takes the hash of the message, prepends one fixed hex string, and appends another. Why do they do that? They don't know either; they just think it's probably more secure than using only a hash. There's another standard that uses a different string here and a different string there, and it doesn't matter.

So that's indeed a weakness of these types of approaches: their security is what we call ad hoc. We do not know how to break them, but we do not know how to prove they are secure either. Yet that's what people do in practice.

So, unfortunately, that's all I can tell you today: how not to construct a digital signature. I cannot tell you how to construct a secure digital signature, because that's out of the scope of this class, and it's a major topic in cryptography.

Any questions so far? Go ahead.

AUDIENCE: The hash function here is the one-way one, yeah?

PROFESSOR: Yes, it's one-way, collision resistant, and...

AUDIENCE: So what is the use of RSA? Couldn't we just use only the hash function then?
PROFESSOR: OK, good question. So, OK, let's be clear about what you're saying.

AUDIENCE: OK, never mind.

PROFESSOR: Can you answer your own question?

AUDIENCE: So my question was, why do we have to use RSA when we have the hash function? You want me... so [INAUDIBLE] couldn't create the forgery. [INAUDIBLE]

PROFESSOR: How does it create a forgery? Just answer your own question. Let everyone else know; maybe they have the same question.

AUDIENCE: So my answer is, the adversary can just choose a random message and hash it and [INAUDIBLE].

PROFESSOR: Yeah. What's the problem? The problem is that a hash function is a public function that everybody can compute. The attacker just chooses a message and computes its hash, so a hash alone is not a signature. But good point. I'm actually coming to that.

So far we have seen three major primitives: private key encryption, public key encryption, and digital signatures.
If we categorize them a little bit: public key encryption and digital signatures are asymmetric key, with a public key and a secret key, while private key encryption is symmetric key. And the two encryption schemes are for secrecy: they try to hide the message. Digital signatures are for integrity, meaning the message is what the sender sent.

So you can see we are missing one primitive here. What if the two parties do share a secret key, and one party wants to verify that a message indeed comes from the other party? We do have a primitive for that. It's called a message authentication code, or MAC.

Its definition is basically exactly the same as a digital signature; I'm just going to change it here. The difference is that it has only one key. So the sign function is replaced by MAC, and there's no notion of secret key and public key; we have only one key. And how do we verify? The verify function basically becomes: the other party also recomputes the MAC of the message and checks whether it equals the tag that was sent, which plays the role of the signature.
So the verifier just recomputes and compares. We also want correctness, and we also want unforgeability, defined in exactly the same way.

Now, actually, I would have asked this question here: is a hash a valid MAC? The answer is still no, because a hash is a public function that everyone can compute, and it's trivial to come up with a forgery. So thank you for asking that question.

But the hash is actually very close. How can we get a message authentication code? Several ideas. Can we just hash the key concatenated with the message? Then some other random attacker who doesn't have the key does not know how to compute this thing. That's a reasonable idea. But if we can do it that way, how about the message concatenated with the key? Or, if you want, the key concatenated with the message and then concatenated with the key again.

It turns out that putting the key after the message doesn't work, for some very advanced reasons. And putting the key first may or may not work. For SHA1, it doesn't work, unfortunately. For SHA3, the successor to SHA1 and SHA2, it actually works.
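A minimal sketch of the key-first SHA3 MAC with verification by recomputation, as just described. `hashlib.sha3_256` is Python's SHA3-256; `hmac.compare_digest` is used for a constant-time comparison, a common precaution that the recitation does not discuss. One well-known reason the key-first construction fails for SHA1 (and SHA2) is the length-extension property of those hash functions, which SHA3 does not have; for SHA2 the usual standardized MAC is HMAC, which this sketch does not use.

```python
import hashlib
import hmac


def mac(key: bytes, message: bytes) -> bytes:
    """tag = SHA3-256(key || message): key first, then the message."""
    return hashlib.sha3_256(key + message).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """The verifier shares the key, recomputes the tag, and compares."""
    return hmac.compare_digest(mac(key, message), tag)
```

Keys should be random and secret, for example `secrets.token_bytes(32)`; anyone who lacks the key cannot recompute the tag.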
So the simplest MAC we can imagine is to choose SHA3 as the hash function, with the key and then the message as input. Not the other way around. This is just for your information.

By the way, there's another reasonable thought: how about we encrypt the hash? Everyone can compute the hash, but they don't know how to encrypt it if I use, say, secret key encryption. This turns out to be wrong as well.

That's digital signatures and MACs. But one caveat here: our unforgeability is defined in this way, which is a little bit strange but makes sense. It does have a weakness in some applications. Imagine I send you a message, "Today's recitation is canceled," and it has my signature on it, so you can verify it indeed comes from me. But once I send that message, every one of you has it. So next week, one of you can send that message again, saying, "Today's recitation is canceled." Then you have no idea whether it's indeed me sending the message again or someone doing an April Fool's Day joke.

So how do we prevent that?
Well, of course, one thing I can do, if I'm smart, is say, "Today (May 8th), recitation is canceled." Then that message cannot be reused next week. But we want to protect against human stupidity. That's the whole point of cryptography.

So one thing we could do, let's see, is a very simple modification. When I sign a message, I'll sign 1 concatenated with my message. Next time I sign 2 concatenated with my message, then 3, 4, and so on, with a counter that keeps incrementing. That naturally fixes the problem: when you verify, if you receive a message with a counter you have already seen, you know it's someone else resending it. So that's one thing we need to do for signatures in practical use.

Now, consider a totally different application. I think everyone uses Google Drive, Dropbox, or something like that. You store a bunch of files on this cloud server. Now you are here, with, say, a cell phone, and you can access your files. But how do you know that when you read a file, it is indeed your file, unmodified?
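A sketch of the counter fix, using the key-first SHA3 MAC in place of a signature so that it runs with only the standard library; the `Sender` and `Receiver` classes are hypothetical. Encoding the counter as a fixed 8-byte prefix is an assumption made here so that "counter concatenated with message" is unambiguous, and the receiver accepts only counters larger than the last one it accepted.

```python
import hashlib
import hmac
from typing import Tuple


def mac(key: bytes, data: bytes) -> bytes:
    """Key-first SHA3-256 MAC, standing in for a signature."""
    return hashlib.sha3_256(key + data).digest()


def encode(counter: int, message: bytes) -> bytes:
    """counter || message, with the counter as a fixed 8-byte big-endian prefix."""
    return counter.to_bytes(8, "big") + message


class Sender:
    def __init__(self, key: bytes):
        self.key = key
        self.counter = 0

    def send(self, message: bytes) -> Tuple[int, bytes, bytes]:
        self.counter += 1                          # a fresh counter for every message
        return self.counter, message, mac(self.key, encode(self.counter, message))


class Receiver:
    def __init__(self, key: bytes):
        self.key = key
        self.last_counter = 0

    def accept(self, counter: int, message: bytes, tag: bytes) -> bool:
        if not 0 < counter < 2**64:
            return False                           # cannot be an honest counter
        if not hmac.compare_digest(mac(self.key, encode(counter, message)), tag):
            return False                           # forged or modified
        if counter <= self.last_counter:
            return False                           # replay of an already-seen message
        self.last_counter = counter
        return True
```

Resending last week's "recitation is canceled" packet now fails, because its counter is not newer than the last one accepted, while a genuinely new announcement with the same text carries a new counter and is accepted.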
Maybe Google messes with you, or there's someone in the middle who changes your file. Usually, most people do not care about that, but in cryptography, we do.

In that case, MACs and signatures alone do not help us. Why? If you just store a MAC alongside each file, what goes wrong? Go ahead.

AUDIENCE: You need to modify the MAC too.

PROFESSOR: But if they modify the file, they do not know how to generate a MAC for their version of the file. Here is what they can do: you have this file, then you come and write a new version, and you generate a new MAC. When you read it, they give you the old version. That has a valid signature or MAC on it, because you generated it.

Do you all see the problem? You haven't seen the problem?

AUDIENCE: What do you mean they give you the old version?

PROFESSOR: OK, so you have this file, and you generate a MAC. At some point, you want to update this file to file prime, and you generate a new MAC. Maybe then file double prime, MAC double prime.
453 00:32:44,590 --> 00:32:46,660 In this application, we want freshness. 454 00:32:46,660 --> 00:32:48,990 When you read this file, you want the latest version 455 00:32:48,990 --> 00:32:49,740 of the file. 456 00:32:49,740 --> 00:32:54,162 So it should be what you wrote there last time. 457 00:32:54,162 --> 00:32:55,870 But when you are trying to read the file, 458 00:32:55,870 --> 00:32:59,210 an attacker can give you this old pair. 459 00:32:59,210 --> 00:33:03,390 If you check the MAC, it's going to match. 460 00:33:03,390 --> 00:33:06,018 This is also a valid message-MAC pair. 461 00:33:08,969 --> 00:33:10,260 Now everyone sees the problem. 462 00:33:13,600 --> 00:33:17,090 OK, so what can we do? 463 00:33:17,090 --> 00:33:24,870 Well, one thing we could do is store all these MACs here 464 00:33:24,870 --> 00:33:25,660 on your phone. 465 00:33:25,660 --> 00:33:31,960 MAC1, MAC2-- a MAC for every single file. 466 00:33:31,960 --> 00:33:35,770 But if you do that, in fact, we do not need MACs anymore. 467 00:33:35,770 --> 00:33:36,800 We can just use hashes. 468 00:33:41,970 --> 00:33:49,370 So I'll say sigma-- I'll use sigmas, but they mean hashes. 469 00:33:52,850 --> 00:33:58,710 This is probably good enough in practice. 470 00:33:58,710 --> 00:34:05,190 I'll say these files are x1, x2, x3, x4. 471 00:34:05,190 --> 00:34:07,000 Now you just create a hash for each of them 472 00:34:07,000 --> 00:34:08,360 and store them locally. 473 00:34:08,360 --> 00:34:14,400 And the model here is that an attacker cannot modify files 474 00:34:14,400 --> 00:34:17,102 on your own computer or on your own phone. 475 00:34:17,102 --> 00:34:18,560 And then you can download the file, 476 00:34:18,560 --> 00:34:21,860 hash it, compare it with the latest version of the hash, 477 00:34:21,860 --> 00:34:25,880 and then you're convinced that it's the latest version. 478 00:34:25,880 --> 00:34:28,050 This is probably a good enough solution.
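The rollback attack and the keep-the-hashes-locally fix can be sketched as follows. Assumptions for illustration only: HMAC-SHA256 plays the MAC, SHA-256 plays the hash, the server is modeled as simply handing back an old (file, MAC) pair, and `KEY`, `mac`, `h`, and `local_hashes` are hypothetical names.

```python
import hashlib
import hmac

KEY = b"hypothetical-client-key"   # assumption: known only to the client

def mac(data: bytes) -> bytes:
    return hmac.new(KEY, data, hashlib.sha256).digest()

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# The client writes version 1 of file x1, then overwrites it with version 2.
v1, v2 = b"x1, version 1", b"x1, version 2"
stale_pair = (v1, mac(v1))           # a malicious server can keep this pair around
local_hashes = {"x1": h(v2)}         # fix: the phone stores the latest hash per file

# On the next read, the server answers with the stale pair (a rollback).
served_file, served_tag = stale_pair
mac_ok = hmac.compare_digest(mac(served_file), served_tag)   # True: the old MAC is genuine
hash_ok = h(served_file) == local_hashes["x1"]               # False: rollback detected
print(mac_ok, hash_ok)   # True False
```

The local table holds one hash per file, which is the O(n) local storage that the next part of the recitation tries to reduce.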
479 00:34:28,050 --> 00:34:34,400 The only downside is that we do have to store a lot of hashes 480 00:34:34,400 --> 00:34:37,659 if you have a lot of files. 481 00:34:37,659 --> 00:34:43,590 Or in our algorithmic terminology, 482 00:34:43,590 --> 00:34:50,012 we say your space complexity is O(n). 483 00:34:52,670 --> 00:34:54,355 Here, I mean your local space. 484 00:34:59,930 --> 00:35:03,330 So can we somehow reduce the local space complexity? 485 00:35:07,690 --> 00:35:11,270 Well, one thing we could do is to concatenate all the files 486 00:35:11,270 --> 00:35:15,275 together, generate a single hash, and store that one hash. 487 00:35:25,900 --> 00:35:29,860 So hash everything in one go. 488 00:35:29,860 --> 00:35:36,020 Then we do have O(1) space, but there's a bigger problem. 489 00:35:39,190 --> 00:35:40,130 Can anyone tell me? 490 00:35:43,581 --> 00:35:45,553 AUDIENCE: You don't know which file to modify? 491 00:35:49,500 --> 00:35:52,610 PROFESSOR: Oh, OK, I think you are 492 00:35:52,610 --> 00:35:55,320 thinking along the right lines. 493 00:35:55,320 --> 00:35:57,860 So how do I verify? 494 00:35:57,860 --> 00:35:59,290 I cannot verify a single file. 495 00:35:59,290 --> 00:36:04,990 I have to download all the files and recompute the hash to verify. 496 00:36:04,990 --> 00:36:08,680 So the time complexity is O(n). 497 00:36:11,670 --> 00:36:13,990 And, also, if I want to update this file, 498 00:36:13,990 --> 00:36:15,640 I have to recompute the hash. 499 00:36:15,640 --> 00:36:19,180 That involves, again, downloading all the files 500 00:36:19,180 --> 00:36:20,460 and feeding them into that hash. 501 00:36:26,150 --> 00:36:31,120 And we do have a better solution than both of them, which 502 00:36:31,120 --> 00:36:34,332 is called a hash tree or a Merkle tree, 503 00:36:34,332 --> 00:36:35,540 which was invented by Merkle.
504 00:36:38,720 --> 00:36:43,650 What we will do is, first, for every file, 505 00:36:43,650 --> 00:36:44,880 we're going to create a hash. 506 00:36:51,020 --> 00:36:54,220 Let me, again, use sigma, because with h it's unclear 507 00:36:54,220 --> 00:36:57,290 whether it's a hash value or a hash function. 508 00:36:57,290 --> 00:37:00,206 Sigma 1, sigma 2, sigma 3, sigma 4. 509 00:37:04,070 --> 00:37:05,480 So I said a hash tree. 510 00:37:05,480 --> 00:37:09,613 Can you guess what the next step is? 511 00:37:09,613 --> 00:37:11,134 AUDIENCE: Cross the hashes. 512 00:37:11,134 --> 00:37:11,800 PROFESSOR: Yeah. 513 00:37:11,800 --> 00:37:13,485 Exactly. 514 00:37:13,485 --> 00:37:21,820 We're going to create a sigma 5, which is the hash of sigma 1 515 00:37:21,820 --> 00:37:25,160 concatenated with sigma 2. 516 00:37:25,160 --> 00:37:26,541 So we do the same thing here. 517 00:37:31,351 --> 00:37:33,010 And so you all know what it is, right? 518 00:37:33,010 --> 00:37:34,840 I don't need to write it. 519 00:37:34,840 --> 00:37:39,915 And keep going until we get a root hash. 520 00:37:45,280 --> 00:37:49,560 And now we're going to store this thing locally, on our side. 521 00:37:58,725 --> 00:38:00,350 So what's the local storage complexity? 522 00:38:02,950 --> 00:38:06,172 O(1)-- we're only storing one hash locally. 523 00:38:06,172 --> 00:38:07,505 Now, what's the time complexity? 524 00:38:12,970 --> 00:38:15,900 OK, so how do I verify? 525 00:38:15,900 --> 00:38:17,260 Yeah. 526 00:38:17,260 --> 00:38:20,060 Log n-- how do I verify? 527 00:38:20,060 --> 00:38:24,450 I first need to verify whether this hash matches, 528 00:38:24,450 --> 00:38:26,540 and then read this hash, and verify 529 00:38:26,540 --> 00:38:30,030 whether this next one matches, and verify whether this one 530 00:38:30,030 --> 00:38:32,600 matches, and then I'm done.
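A minimal Python sketch of the hash tree above, assuming SHA-256 as the collision-resistant hash and, for brevity, a power-of-two number of files. Leaves are sigma_i = H(x_i) and each parent is H(left || right), so sigma_5 = H(sigma_1 || sigma_2). The names `build_levels`, `auth_path`, `verify`, and `update` are hypothetical. The recitation checks top-down from the stored root; this sketch does the equivalent bottom-up check, recomputing the root from one file plus its log n sibling hashes (the values that would be read from the cloud server). `update` rehashes only the leaf-to-root path, as the recitation describes for updates.

```python
import hashlib
from typing import List, Tuple

def H(data: bytes) -> bytes:
    # Assumed collision-resistant hash (SHA-256 here).
    return hashlib.sha256(data).digest()

def build_levels(files: List[bytes]) -> List[List[bytes]]:
    """All levels of the tree, leaves first and root last.
    Assumes len(files) is a power of two."""
    level = [H(x) for x in files]                # sigma_1 .. sigma_n
    levels = [level]
    while len(level) > 1:
        # Parent = H(left || right), e.g. sigma_5 = H(sigma_1 || sigma_2).
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling is a left child."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path

def verify(root: bytes, file: bytes, path: List[Tuple[bytes, bool]]) -> bool:
    """Recompute the root from one file plus log2(n) sibling hashes."""
    node = H(file)
    for sibling, sibling_is_left in path:
        node = H(sibling + node) if sibling_is_left else H(node + sibling)
    return node == root

def update(levels: List[List[bytes]], index: int, new_file: bytes) -> int:
    """Replace one file and rehash only its leaf-to-root path.
    Returns the number of hashes recomputed (log2(n) + 1)."""
    levels[0][index] = H(new_file)
    recomputed = 1
    for depth in range(1, len(levels)):
        index //= 2
        left, right = levels[depth - 1][2 * index], levels[depth - 1][2 * index + 1]
        levels[depth][index] = H(left + right)
        recomputed += 1
    return recomputed

files = [b"x1", b"x2", b"x3", b"x4"]
levels = build_levels(files)
root = levels[-1][0]                 # the only hash kept on the phone
proof = auth_path(levels, 3)         # sigma_3 and sigma_5, read from the server
print(verify(root, b"x4", proof))            # True
print(verify(root, b"x4 tampered", proof))   # False
```

For simplicity the whole tree sits in memory here; in the cloud setting only the root would be on the phone, and the client would fetch and check the old path before recomputing the new root.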
531 00:38:32,600 --> 00:38:35,640 If I want to update a file, I also need to update this hash, 532 00:38:35,640 --> 00:38:37,640 which causes this hash to change 533 00:38:37,640 --> 00:38:39,941 and then that hash to change. 534 00:38:39,941 --> 00:38:43,580 But it's always a single path in that tree. 535 00:38:43,580 --> 00:38:45,761 It doesn't affect anything globally. 536 00:38:45,761 --> 00:38:46,260 Question. 537 00:38:46,260 --> 00:38:48,705 AUDIENCE: But you're not storing like sigma 5? 538 00:38:48,705 --> 00:38:49,580 PROFESSOR: Say again. 539 00:38:49,580 --> 00:38:52,017 AUDIENCE: You're not storing sigma 5. 540 00:38:52,017 --> 00:38:52,850 PROFESSOR: I am not. 541 00:38:52,850 --> 00:38:55,124 I have to go ahead and read it. 542 00:38:55,124 --> 00:38:56,360 AUDIENCE: From where? 543 00:38:56,360 --> 00:39:00,558 PROFESSOR: From Google Drive or Dropbox. 544 00:39:00,558 --> 00:39:03,020 AUDIENCE: So are we sure that that is secure? 545 00:39:03,020 --> 00:39:04,970 PROFESSOR: OK, yeah, so that's the next thing 546 00:39:04,970 --> 00:39:06,280 we're going to do. 547 00:39:06,280 --> 00:39:07,580 Is this secure? 548 00:39:07,580 --> 00:39:10,540 Or in other words, can the adversary 549 00:39:10,540 --> 00:39:13,090 change one of the files, and somehow 550 00:39:13,090 --> 00:39:18,010 maintain the same root hash? 551 00:39:18,010 --> 00:39:19,174 That's your question then. 552 00:39:28,560 --> 00:39:32,210 Of course, we assume the hash is collision resistant. 553 00:39:32,210 --> 00:39:34,930 Or I should say, if the hash is collision resistant, 554 00:39:34,930 --> 00:39:37,560 then this hash tree is collision resistant. 555 00:39:41,590 --> 00:39:42,210 Any intuition? 556 00:39:58,014 --> 00:39:59,180 Or does anyone want to prove it? 557 00:40:04,548 --> 00:40:05,048 Go ahead. 558 00:40:05,048 --> 00:40:09,449 AUDIENCE: So like if the root eventually one of the leaves 559 00:40:09,449 --> 00:40:11,405 will be different because it changes.
560 00:40:11,405 --> 00:40:12,383 PROFESSOR: Yep. 561 00:40:12,383 --> 00:40:14,341 AUDIENCE: If you want the root to be the same, 562 00:40:14,341 --> 00:40:17,273 then the other hash has to be different, 563 00:40:17,273 --> 00:40:18,740 but there are no collisions. 564 00:40:18,740 --> 00:40:22,200 I mean, it's hard to find the other hash. 565 00:40:22,200 --> 00:40:25,670 PROFESSOR: Correct, so I'll just repeat what you said, 566 00:40:25,670 --> 00:40:28,125 but I'll start with the leaf because that's easier for me 567 00:40:28,125 --> 00:40:28,750 to think about. 568 00:40:28,750 --> 00:40:32,160 So say I change this one, this block. 569 00:40:32,160 --> 00:40:35,800 Now, I claim this hash here will change. 570 00:40:35,800 --> 00:40:39,090 If it doesn't, then I have found a collision, 571 00:40:39,090 --> 00:40:44,700 because this x4 prime has the same hash as the original x4. 572 00:40:44,700 --> 00:40:48,860 So if this sigma 4 changes, then sigma 6 will change. 573 00:40:48,860 --> 00:40:51,960 Otherwise, I have found a collision, 574 00:40:51,960 --> 00:40:54,720 because sigma 3 concatenated with the new sigma 575 00:40:54,720 --> 00:40:56,780 4 collides with sigma 3 concatenated with the old sigma 4. 576 00:40:56,780 --> 00:40:59,570 So same argument-- either this one changes, or I 577 00:40:59,570 --> 00:41:00,970 have found a collision. 578 00:41:00,970 --> 00:41:03,720 I repeat the argument all the way to the root. 579 00:41:11,500 --> 00:41:13,775 Any questions about that? 580 00:41:13,775 --> 00:41:17,735 AUDIENCE: What if, like, the adversary changes two 581 00:41:17,735 --> 00:41:21,695 files-- for example, x1 and x2-- 582 00:41:21,695 --> 00:41:27,640 so sigma 1 and sigma 2 change, but sigma 5 stays the same? 583 00:41:27,640 --> 00:41:29,310 PROFESSOR: OK, so then we have found 584 00:41:29,310 --> 00:41:33,910 a collision, namely the old sigma 1 concatenated with the old sigma 2.
585 00:41:33,910 --> 00:41:36,625 That's a collision with the new sigma 1 concatenated 586 00:41:36,625 --> 00:41:37,500 with the new sigma 2. 587 00:41:41,910 --> 00:41:42,410 Make sense? 588 00:41:48,793 --> 00:41:52,475 AUDIENCE: If the concatenation stayed the same, like sigma 1 589 00:41:52,475 --> 00:41:55,015 and sigma 2 concatenation. 590 00:41:55,015 --> 00:41:57,140 PROFESSOR: So if the concatenation stayed the same, 591 00:41:57,140 --> 00:41:59,100 that means both of them are the same. 592 00:41:59,100 --> 00:42:03,431 AUDIENCE: They had to make sure they are changed? 593 00:42:03,431 --> 00:42:05,680 PROFESSOR: So I'm not sure I understand your question. 594 00:42:05,680 --> 00:42:08,180 So concatenation is basically just a bunch 595 00:42:08,180 --> 00:42:10,660 of bits followed by another bunch of bits. 596 00:42:10,660 --> 00:42:12,120 If this entire thing is the same, 597 00:42:12,120 --> 00:42:15,360 that means this part is the same and this part is the same. 598 00:42:15,360 --> 00:42:19,200 And if your new sigma 1 is the same as your old sigma 599 00:42:19,200 --> 00:42:21,595 1, that means I have found a collision here, 600 00:42:21,595 --> 00:42:23,803 because we changed x1, but sigma 1 didn't change. 601 00:42:46,500 --> 00:42:50,940 So lastly, I'm going to do a quick review of the knapsack 602 00:42:50,940 --> 00:42:55,447 problem because I think in the lecture, we ran out of time 603 00:42:55,447 --> 00:42:56,780 and I didn't mention everything. 604 00:43:10,660 --> 00:43:14,110 So if you recall the knapsack cryptosystem, 605 00:43:14,110 --> 00:43:18,170 it says you start with a knapsack problem. 606 00:43:21,700 --> 00:43:23,950 I'll call its elements u1 to un. 607 00:43:23,950 --> 00:43:26,520 And then we're going to transform it. 608 00:43:26,520 --> 00:43:30,860 OK, this is a super-increasing sequence.
609 00:43:30,860 --> 00:43:39,190 I'm going to transform it into a general one by multiplying by n 610 00:43:39,190 --> 00:43:43,125 and then taking mod m. 611 00:43:43,125 --> 00:43:46,860 So this is an easy problem, and that is a hard problem. 612 00:43:46,860 --> 00:43:48,310 So how do I encrypt? 613 00:43:48,310 --> 00:43:57,350 I'm going to take a subset sum, which is the sum of mi Wi, where mi 614 00:43:57,350 --> 00:43:59,252 is the i-th bit in the message. 615 00:44:02,200 --> 00:44:04,880 So how do I decrypt? 616 00:44:04,880 --> 00:44:12,880 I'll take this s and transform it back to the super-increasing 617 00:44:12,880 --> 00:44:18,660 domain by multiplying by the inverse of n. 618 00:44:18,660 --> 00:44:28,630 So that's going to be the inverse of n multiplied by this sum of mi Wi. 619 00:44:28,630 --> 00:44:30,940 That's how I encrypted it. 620 00:44:30,940 --> 00:44:38,000 And then each Wi is n times ui. 621 00:44:42,000 --> 00:44:44,200 So far so good. 622 00:44:44,200 --> 00:44:50,140 So that gives me the sum of mi times ui. 623 00:44:53,720 --> 00:44:56,315 Of course, every step is modulo m. 624 00:44:59,500 --> 00:45:02,260 So the first thing I'm going to claim 625 00:45:02,260 --> 00:45:08,040 is that m has to be larger than the sum of the ui. 626 00:45:08,040 --> 00:45:13,950 If that's the case, then the t-- my t is just this subset sum. 627 00:45:13,950 --> 00:45:17,380 So if I solve this knapsack problem, 628 00:45:17,380 --> 00:45:20,310 I get the same answer as solving the original, 629 00:45:20,310 --> 00:45:22,970 the general knapsack problem. 630 00:45:22,970 --> 00:45:26,100 If my m is not that large, if the m is too small, 631 00:45:26,100 --> 00:45:27,340 then I have a problem. 632 00:45:27,340 --> 00:45:30,360 Because then my t will be the subset sum 633 00:45:30,360 --> 00:45:33,160 minus some multiple of m. 634 00:45:33,160 --> 00:45:34,512 Then it's a different problem. 635 00:45:34,512 --> 00:45:35,970 I do not get the same message back.
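The encryption and decryption steps above can be sketched in Python. Assumptions: the multiplier the recitation calls n is named `mult` here, to avoid clashing with n as the number of items; the parameters u = (2, 3, 7, 14, 30, 57, 120, 251), mult = 41, m = 491 are toy values, far too small to be secure; `pow(mult, -1, m)` computes the modular inverse and needs Python 3.8 or later. The checks in the hypothetical `keygen` encode the claim that m must be larger than the sum of the ui.

```python
from math import gcd
from typing import List

def keygen(u: List[int], mult: int, m: int) -> List[int]:
    """Public weights W_i = mult * u_i mod m from a super-increasing private u."""
    if any(u[i] <= sum(u[:i]) for i in range(len(u))):
        raise ValueError("u must be super-increasing")
    if m <= sum(u):
        raise ValueError("m must exceed the sum of the u_i, or decryption breaks")
    if gcd(mult, m) != 1:
        raise ValueError("mult needs an inverse mod m")
    return [(mult * ui) % m for ui in u]

def encrypt(bits: List[int], W: List[int]) -> int:
    # Subset sum: s = sum of m_i * W_i, where m_i is the i-th message bit.
    return sum(b * w for b, w in zip(bits, W))

def decrypt(s: int, u: List[int], mult: int, m: int) -> List[int]:
    # Back to the easy domain: t = mult^{-1} * s mod m = sum of m_i * u_i,
    # which holds exactly (not just mod m) because m > sum(u).
    t = (pow(mult, -1, m) * s) % m
    bits = [0] * len(u)
    for i in reversed(range(len(u))):   # greedy works for super-increasing u
        if t >= u[i]:
            bits[i] = 1
            t -= u[i]
    if t != 0:
        raise ValueError("not a valid ciphertext")
    return bits

u = [2, 3, 7, 14, 30, 57, 120, 251]     # toy private key, sum = 484
mult, m = 41, 491
W = keygen(u, mult, m)                 # [82, 123, 287, 83, 248, 373, 10, 471]
msg = [0, 1, 1, 0, 0, 0, 0, 1]
print(decrypt(encrypt(msg, W), u, mult, m) == msg)   # True
```

If m were at most the sum of the ui, t would come out as the subset sum minus some multiple of m and the greedy step would recover the wrong bits, which is why `keygen` rejects such parameters.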
636 00:45:42,080 --> 00:45:47,060 OK, then we have a problem. 637 00:45:47,060 --> 00:45:58,775 Recall that we defined density to be n over the log of the max Wi. 638 00:45:58,775 --> 00:46:00,150 Does everyone remember this part? 639 00:46:03,050 --> 00:46:10,560 So each Wi is in the range of 1 to m, or maybe 0 to m. 640 00:46:10,560 --> 00:46:15,377 This is not super rigorous, 641 00:46:15,377 --> 00:46:17,710 but if I have a bunch of them, chances are that some of them 642 00:46:17,710 --> 00:46:21,960 are very close to m, 643 00:46:21,960 --> 00:46:25,700 because it's unlikely that all of them are small. 644 00:46:25,700 --> 00:46:32,260 So this thing is roughly n over log of m. 645 00:46:38,450 --> 00:46:41,070 So then we have a dilemma. 646 00:46:41,070 --> 00:46:46,420 If we set m to be a small number, 647 00:46:46,420 --> 00:46:48,790 then my density is fine, but that 648 00:46:48,790 --> 00:46:50,950 means all of my ui's need to be small 649 00:46:50,950 --> 00:46:54,730 because m needs to be greater than the sum of them. 650 00:46:54,730 --> 00:46:57,140 If all the ui's are small, then I 651 00:46:57,140 --> 00:46:59,680 have very limited choices for them, 652 00:46:59,680 --> 00:47:01,582 and then an attacker can just 653 00:47:01,582 --> 00:47:06,380 guess what ui I chose by a brute force algorithm or something 654 00:47:06,380 --> 00:47:07,210 like that. 655 00:47:07,210 --> 00:47:10,030 And if I choose m to be large, or if I 656 00:47:10,030 --> 00:47:12,550 choose all the ui's to be large, to choose them 657 00:47:12,550 --> 00:47:16,980 from a large range, then my m is going to be very large, 658 00:47:16,980 --> 00:47:19,740 and the density is low. 659 00:47:19,740 --> 00:47:23,510 And that's vulnerable to low-density attacks. 660 00:47:23,510 --> 00:47:25,740 And so how low a density is considered low?
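The density formula and the dilemma above can be explored numerically. This is a rough experiment, not a proof, and it adds assumptions not in the recitation: density is measured on the public weights Wi; each ui adds `slack_bits` random bits on top of the running sum, as a stand-in for how much freedom there is in choosing the key; m is taken as sum(u) + 1; and the multiplier is drawn at random until it is invertible mod m. The helper `public_key_density` is hypothetical. With no slack the ui are forced to be 1, 2, 4, and so on, and the density is about 1; with 100 bits of slack and n = 100, m grows to roughly 2^200 and the density drops to about 0.5.

```python
import math
import random
from typing import List

def density(weights: List[int]) -> float:
    """Density n / log2(max weight), as defined in the recitation."""
    return len(weights) / math.log2(max(weights))

def public_key_density(n: int, slack_bits: int, rng: random.Random) -> float:
    """Hypothetical experiment: random super-increasing u, public W_i = mult * u_i mod m."""
    u, total = [], 0
    for _ in range(n):
        extra = rng.getrandbits(slack_bits) if slack_bits > 0 else 0
        ui = total + 1 + extra          # exceeds the sum of all earlier u's
        u.append(ui)
        total += ui
    m = total + 1                       # smallest modulus with m > sum(u)
    mult = rng.randrange(2, m)
    while math.gcd(mult, m) != 1:       # mult must be invertible mod m
        mult = rng.randrange(2, m)
    return density([(mult * ui) % m for ui in u])

rng = random.Random(0)
print(round(density([82, 123, 287, 83, 248, 373, 10, 471]), 2))   # 0.9
print(public_key_density(100, 0, rng))     # about 1.0: no freedom in choosing u
print(public_key_density(100, 100, rng))   # about 0.5: lots of freedom, low density
```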
661 00:47:25,740 --> 00:47:32,430 So several people proposed, based on heuristics, 662 00:47:32,430 --> 00:47:37,410 that if this density is less than 0.45, 663 00:47:37,410 --> 00:47:39,440 then it's considered low density, 664 00:47:39,440 --> 00:47:41,320 and it can be attacked. 665 00:47:41,320 --> 00:47:44,520 And this threshold has since been improved. 666 00:47:47,840 --> 00:47:52,380 But while most of the knapsack cryptosystems 667 00:47:52,380 --> 00:47:55,140 are broken, there are a few that have 668 00:47:55,140 --> 00:47:57,280 so far stood the test of time. 669 00:47:57,280 --> 00:47:59,940 And they are still interesting because 670 00:47:59,940 --> 00:48:04,400 knapsack cryptosystems would be much faster than RSA 671 00:48:04,400 --> 00:48:06,940 or any number-theory-based scheme, because we are just 672 00:48:06,940 --> 00:48:08,220 adding numbers here. 673 00:48:08,220 --> 00:48:12,500 RSA has this operation, m to the e, where m is a 1,000-bit number, 674 00:48:12,500 --> 00:48:14,620 and e is also a 1,000-bit number. 675 00:48:14,620 --> 00:48:19,400 And taking this exponentiation is actually very slow. 676 00:48:19,400 --> 00:48:22,620 So knapsack cryptosystems are still interesting. 677 00:48:22,620 --> 00:48:27,440 However, the original motivation turned out to be unsuccessful. 678 00:48:27,440 --> 00:48:32,890 The original motivation was to base cryptography 679 00:48:32,890 --> 00:48:34,490 on an NP-complete problem. 680 00:48:34,490 --> 00:48:39,370 That's not going to work because NP-complete problems are only guaranteed to be hard 681 00:48:39,370 --> 00:48:40,520 in the worst case. 682 00:48:40,520 --> 00:48:45,540 And we need cryptography to be hard in the average case. 683 00:48:45,540 --> 00:48:47,550 Because if they are only hard in the worst case, 684 00:48:47,550 --> 00:48:49,133 that means there may be only a few instances 685 00:48:49,133 --> 00:48:51,160 of this problem that are hard.
686 00:48:51,160 --> 00:48:54,680 So either you pick a secret key that 687 00:48:54,680 --> 00:48:57,260 doesn't correspond to a hard problem, 688 00:48:57,260 --> 00:49:00,910 or you pick a secret key that corresponds to a hard problem. 689 00:49:00,910 --> 00:49:04,290 But then everyone else picks the same secret key 690 00:49:04,290 --> 00:49:07,430 because everyone wants to be secure. 691 00:49:07,430 --> 00:49:13,520 That's the reason why it's unlikely to get cryptography 692 00:49:13,520 --> 00:49:14,520 from NP-hard problems. 693 00:49:18,444 --> 00:49:19,860 That's all for today's recitation. 694 00:49:19,860 --> 00:49:22,660 And thanks everyone for the entire semester. 695 00:49:22,660 --> 00:49:24,510 Thank you for participating.