1 00:00:00,500 --> 00:00:02,420 Just one brief announcement. 2 00:00:02,420 --> 00:00:05,189 HKN reviews are going to be done in class next Monday, 3 00:00:05,189 --> 00:00:07,730 so you guys should make sure you come, give us your feedback, 4 00:00:07,730 --> 00:00:09,813 let us know what you like and what you don't like. 5 00:00:11,562 --> 00:00:13,770 And, with that, we'll start talking about protection. 6 00:00:21,630 --> 00:00:25,039 Protection is like fault-tolerance 7 00:00:25,039 --> 00:00:25,830 and recoverability. 8 00:00:25,830 --> 00:00:28,040 It's one of these properties of systems: 9 00:00:28,040 --> 00:00:30,800 building secure and protected systems 10 00:00:30,800 --> 00:00:33,910 has implications for the entire design of the system. 11 00:00:33,910 --> 00:00:36,440 And it's going to be sort of a set of cross-cutting issues 12 00:00:36,440 --> 00:00:38,970 that are going to affect the way that, for example, 13 00:00:38,970 --> 00:00:40,590 the networking protocols are designed 14 00:00:40,590 --> 00:00:44,660 or the way that the modules that make up your bigger computer 15 00:00:44,660 --> 00:00:45,500 system are designed. 16 00:00:45,500 --> 00:00:47,390 So it's going to be a whole set of issues 17 00:00:47,390 --> 00:00:49,306 that we're going to look at through the course 18 00:00:49,306 --> 00:00:51,200 of this discussion about protection 19 00:00:51,200 --> 00:00:54,600 that are going to affect the system at all levels. 20 00:00:54,600 --> 00:00:58,210 In 6.033, we use the words protection and security 21 00:00:58,210 --> 00:00:59,579 essentially synonymously. 22 00:00:59,579 --> 00:01:02,120 Oftentimes we'll talk about a secure system or a system that 23 00:01:02,120 --> 00:01:07,250 has security or certain security goals that we have, 24 00:01:07,250 --> 00:01:09,815 so we're going to use those words interchangeably.
25 00:01:17,425 --> 00:01:18,800 Security is one of these topics 26 00:01:18,800 --> 00:01:21,650 that you guys are familiar with to some extent already. 27 00:01:21,650 --> 00:01:24,340 You've heard about various things 28 00:01:24,340 --> 00:01:27,560 going on on the Internet where people's information has been 29 00:01:27,560 --> 00:01:31,940 stolen on laptops or a website has been cracked into 30 00:01:31,940 --> 00:01:35,930 or some worm or new virus, the new I 31 00:01:35,930 --> 00:01:37,490 Love You virus, is spreading around 32 00:01:37,490 --> 00:01:38,730 and infecting people's computers. 33 00:01:38,730 --> 00:01:40,396 So you guys are sort of familiar with it 34 00:01:40,396 --> 00:01:45,430 in a colloquial sort of way, I'm sure. 35 00:01:45,430 --> 00:01:47,890 You're also familiar with many of the tools 36 00:01:47,890 --> 00:01:49,590 that we're going to talk about, or at least 37 00:01:49,590 --> 00:01:51,759 the applied versions of many of the tools 38 00:01:51,759 --> 00:01:53,050 that we're going to talk about. 39 00:01:53,050 --> 00:01:55,840 You've all used a password to log into a computer before 40 00:01:55,840 --> 00:01:58,600 or you've used a website that uses 41 00:01:58,600 --> 00:02:02,700 SSL to encrypt its communication with your browser. 42 00:02:02,700 --> 00:02:04,920 So you're going to be familiar with some 43 00:02:04,920 --> 00:02:06,970 of the higher-level instances of many of the tools 44 00:02:06,970 --> 00:02:08,430 that we'll talk about in this session, 45 00:02:08,430 --> 00:02:10,889 but what we're going to try to delve into in 6.033 46 00:02:10,889 --> 00:02:12,847 is how those systems are actually put together, 47 00:02:12,847 --> 00:02:14,626 what the design principles are behind 48 00:02:14,626 --> 00:02:15,875 building these secure systems.
49 00:02:19,680 --> 00:02:22,710 As I said, you guys are presumably very familiar with, 50 00:02:22,710 --> 00:02:24,550 and have heard about, these various kinds 51 00:02:24,550 --> 00:02:26,300 of attacks that are going on. 52 00:02:26,300 --> 00:02:30,220 And one of the things that's happened in the last few years, 53 00:02:30,220 --> 00:02:33,130 as the Internet has become more and more commercial and larger 54 00:02:33,130 --> 00:02:41,300 and larger, is that the security of computers 55 00:02:41,300 --> 00:02:43,010 has become much, much more of a problem. 56 00:02:43,010 --> 00:02:51,440 So the growth of the Internet has spawned additional attacks. 57 00:02:54,130 --> 00:02:56,590 If you go look at websites, for example, 58 00:02:56,590 --> 00:02:58,340 there are several security websites 59 00:02:58,340 --> 00:03:02,840 that track recent security breaches. This is one example. 60 00:03:02,840 --> 00:03:04,714 It's called SecurityFocus.com. 61 00:03:04,714 --> 00:03:06,130 I don't know if you can see these, 62 00:03:06,130 --> 00:03:07,504 but this is just a list of things 63 00:03:07,504 --> 00:03:10,687 that have happened in just the last few days on the Internet. 64 00:03:10,687 --> 00:03:12,770 Somebody is reporting that web server hacks are up 65 00:03:12,770 --> 00:03:16,760 by one-third, some IT conference was hacked, 66 00:03:16,760 --> 00:03:19,860 Windows says "trusted Windows" is still coming. 67 00:03:22,470 --> 00:03:25,167 [LAUGHTER] So it just goes on and on; 68 00:03:25,167 --> 00:03:27,750 these are things that have happened in the last few days. 69 00:03:27,750 --> 00:03:29,250 There is this huge number of things. 70 00:03:29,250 --> 00:03:33,012 You may have heard recently about how 71 00:03:33,012 --> 00:03:35,220 there have been several large companies recently that 72 00:03:35,220 --> 00:03:37,920 have had big privacy problems where databases of customer 73 00:03:37,920 --> 00:03:39,170 information have been stolen.
74 00:03:39,170 --> 00:03:40,824 Ameritrade just had this happen. 75 00:03:40,824 --> 00:03:42,490 The University of California at Berkeley 76 00:03:42,490 --> 00:03:45,630 had something like several hundred thousand applicant 77 00:03:45,630 --> 00:03:47,310 and graduate student records 78 00:03:47,310 --> 00:03:48,950 on a laptop that was stolen. 79 00:03:48,950 --> 00:03:55,160 These are the kinds of things, the kinds of attacks that 80 00:03:55,160 --> 00:03:56,700 happen in the world, and these are 81 00:03:56,700 --> 00:03:58,200 the kinds of things that we're going 82 00:03:58,200 --> 00:03:59,491 to talk about how to mitigate. 83 00:04:06,420 --> 00:04:08,870 The objective, really, of security, 84 00:04:08,870 --> 00:04:11,390 one simple way to look at the objective of security, 85 00:04:11,390 --> 00:04:15,400 is that we want to sort of protect 86 00:04:15,400 --> 00:04:18,660 our computer from bad guys. 87 00:04:18,660 --> 00:04:26,270 The definition of bad guy depends on what you mean. 88 00:04:26,270 --> 00:04:28,740 It could be the 16-year-old kid in his dorm room hacking 89 00:04:28,740 --> 00:04:30,010 into people's computers. 90 00:04:30,010 --> 00:04:34,680 It could be somebody out to steal hundreds of thousands 91 00:04:34,680 --> 00:04:36,697 of dollars from a corporation, but let's assume 92 00:04:36,697 --> 00:04:38,530 that there are some bad people out there who 93 00:04:38,530 --> 00:04:42,660 want to sort of take over your computer. 94 00:04:42,660 --> 00:04:45,370 But, at the same time, the objective of security 95 00:04:45,370 --> 00:04:50,120 is also to allow access to the good guys. 96 00:04:50,120 --> 00:04:55,849 So one way to make sure that the bad guys don't get at your data 97 00:04:55,849 --> 00:04:57,640 is simply to turn your computer off, right? 98 00:04:57,640 --> 00:05:00,020 But that's not really a good option.
99 00:05:00,020 --> 00:05:03,656 We want the data to be available to the people who need the data 100 00:05:03,656 --> 00:05:05,280 and have the rights to access the data. 101 00:05:08,750 --> 00:05:12,860 Oftentimes we can sort of frame our discussion 102 00:05:12,860 --> 00:05:14,769 by saying that we're trying to protect 103 00:05:14,769 --> 00:05:17,310 some set of information that we want to keep private. 104 00:05:17,310 --> 00:05:22,920 So sort of a goal of a secure system, in some sense, 105 00:05:22,920 --> 00:05:27,370 is providing privacy. 106 00:05:27,370 --> 00:05:29,790 So we have some set of data that's on our computer system 107 00:05:29,790 --> 00:05:31,380 or that's being transmitted over the network 108 00:05:31,380 --> 00:05:32,740 that we want to keep private; we want 109 00:05:32,740 --> 00:05:34,865 to keep other people from being able to have access 110 00:05:34,865 --> 00:05:35,780 to it or tamper with it. 111 00:05:38,970 --> 00:05:41,440 And, throughout, the notion 112 00:05:41,440 --> 00:05:44,080 of what it means for a computer system to be secure 113 00:05:44,080 --> 00:05:47,240 is application-dependent. 114 00:05:47,240 --> 00:05:49,421 It depends very much on what the computer 115 00:05:49,421 --> 00:05:50,670 system we're talking about is. 116 00:05:50,670 --> 00:05:53,420 It may be the case that in your file system, 117 00:05:53,420 --> 00:05:55,670 you have a set of files that you want the entire world 118 00:05:55,670 --> 00:05:58,110 to have access to, stuff that's on your webpage that you're 119 00:05:58,110 --> 00:05:59,240 willing to make public. 120 00:05:59,240 --> 00:06:03,830 You don't have any real security concerns about who can read it. 121 00:06:03,830 --> 00:06:06,031 But, at the same time, you might have banking data 122 00:06:06,031 --> 00:06:08,280 that you really don't want people in the outside world 123 00:06:08,280 --> 00:06:09,300 to be able to access.
124 00:06:09,300 --> 00:06:11,920 So you have, in your head, 125 00:06:11,920 --> 00:06:13,690 some sort of set of policies or rules 126 00:06:13,690 --> 00:06:15,190 about what it is that you would like 127 00:06:15,190 --> 00:06:16,810 users to be able to access. 128 00:06:16,810 --> 00:06:21,940 So almost any system has some policies 129 00:06:21,940 --> 00:06:25,740 associated with what data should be accessible to other people, 130 00:06:25,740 --> 00:06:27,420 some notion of what data they want 131 00:06:27,420 --> 00:06:29,730 to keep private that can be sort of translated 132 00:06:29,730 --> 00:06:32,270 into this set of policies. 133 00:06:32,270 --> 00:06:36,270 In 6.033 it's sort of hard to study in a systematic way 134 00:06:36,270 --> 00:06:38,680 different types of policy. 135 00:06:38,680 --> 00:06:41,950 Policy is something that we're not going to focus on; 136 00:06:41,950 --> 00:06:44,820 we could sit around and have an informal discussion 137 00:06:44,820 --> 00:06:47,270 about different possible policies that you might want. 138 00:06:47,270 --> 00:06:49,410 But instead, in 6.033, what we're going to do, 139 00:06:49,410 --> 00:06:51,530 as we've done in much of the rest of the class, 140 00:06:51,530 --> 00:06:57,450 is talk about mechanisms that we can 141 00:06:57,450 --> 00:07:00,060 use to enforce these different security policies that someone 142 00:07:00,060 --> 00:07:00,560 might have. 143 00:07:00,560 --> 00:07:02,143 So we're going to talk about the tools 144 00:07:02,143 --> 00:07:03,880 that we use to protect data.
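As an illustration of the split between policy and mechanism, here is a minimal Python sketch. It assumes a hypothetical file system whose policy is written down as a table; the file names and the `POLICY` and `may_read` names are made up for this example. The policy says a public web page may be read by anyone and banking data only by its owner, and the mechanism is a single check that enforces whatever the table says and denies anything the table does not mention.

```python
from typing import Dict, Set

# Policy (hypothetical), written down as data: which users may read each file.
# "*" means "anyone", e.g. pages you are willing to make public.
POLICY: Dict[str, Set[str]] = {
    "public_html/index.html": {"*"},
    "finances/bank_statements.pdf": {"sam"},
}


def may_read(user: str, path: str) -> bool:
    """Mechanism: enforce POLICY, denying any file the policy does not mention."""
    readers = POLICY.get(path, set())
    return "*" in readers or user in readers


if __name__ == "__main__":
    print(may_read("visitor", "public_html/index.html"))        # True
    print(may_read("visitor", "finances/bank_statements.pdf"))  # False
    print(may_read("sam", "finances/bank_statements.pdf"))      # True
```

In this sketch, changing who may read what only means editing the table; the enforcement code stays the same.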
145 00:07:06,810 --> 00:07:10,620 In thinking about security and in thinking 146 00:07:10,620 --> 00:07:13,410 about these mechanisms, it's useful to start off maybe 147 00:07:13,410 --> 00:07:16,060 by thinking about what we mean by security and protection 148 00:07:16,060 --> 00:07:18,950 in the real world, and to compare the mechanisms that we 149 00:07:18,950 --> 00:07:21,474 have in the real world for protecting data with, say, 150 00:07:21,474 --> 00:07:23,140 the mechanisms that we have on the computer. 151 00:07:30,125 --> 00:07:31,500 From the point of view of what we 152 00:07:31,500 --> 00:07:34,360 might want to accomplish with a secure computer system, 153 00:07:34,360 --> 00:07:36,740 some of the goals and objectives are 154 00:07:36,740 --> 00:07:38,500 similar to what we have in the real world. 155 00:07:41,800 --> 00:07:45,810 Clearly, we have this same objective, 156 00:07:45,810 --> 00:07:47,070 which is to protect data. 157 00:07:49,850 --> 00:07:51,670 We can say that, just like in the real world 158 00:07:51,670 --> 00:07:53,902 we have a lock on a door that keeps somebody 159 00:07:53,902 --> 00:07:55,360 from getting access to something we 160 00:07:55,360 --> 00:07:57,330 don't want them to have access to, 161 00:07:57,330 --> 00:08:01,580 in the world of computers we can encrypt data 162 00:08:01,580 --> 00:08:03,390 in order to make sure somebody who we don't 163 00:08:03,390 --> 00:08:06,920 want to have access to our data doesn't have access 164 00:08:06,920 --> 00:08:10,430 to that data. 165 00:08:10,430 --> 00:08:13,790 Similarly, there is also, for example, in the real world 166 00:08:13,790 --> 00:08:17,330 a set of laws that regulate who can access what data and what's 167 00:08:17,330 --> 00:08:18,570 legal and what's not legal. 168 00:08:18,570 --> 00:08:20,320 It's not legal for me to break into your house 169 00:08:20,320 --> 00:08:21,460 and take something from it.
170 00:08:21,460 --> 00:08:23,060 Similarly, it's not legal for somebody 171 00:08:23,060 --> 00:08:25,190 to hack into a computer and steal a bunch of files. 172 00:08:27,292 --> 00:08:28,625 There are also some differences. 173 00:08:32,360 --> 00:08:34,840 The obvious one is one that has been true of almost all 174 00:08:34,840 --> 00:08:38,640 of our comparisons between the real world, 175 00:08:38,640 --> 00:08:41,280 say, normal engineering disciplines that 176 00:08:41,280 --> 00:08:44,959 involve building bridges and buildings, and computer systems. 177 00:08:44,959 --> 00:08:46,750 And that's this issue that computer systems 178 00:08:46,750 --> 00:08:49,830 have a very high d(tech)/dt. 179 00:08:49,830 --> 00:08:51,540 So computer systems change quickly. 180 00:08:51,540 --> 00:08:54,836 And that means there are always new ways in which 181 00:08:54,836 --> 00:08:56,710 computer systems are connected to the world, 182 00:08:56,710 --> 00:08:58,817 there are new and faster computers 183 00:08:58,817 --> 00:09:00,900 that are capable of breaking encryption algorithms 184 00:09:00,900 --> 00:09:02,984 that maybe we didn't think could be broken before, 185 00:09:02,984 --> 00:09:05,483 there are new algorithms being developed for protecting 186 00:09:05,483 --> 00:09:06,920 data, and there are new strategies 187 00:09:06,920 --> 00:09:09,840 that people are adopting to sort of attack computers. 188 00:09:09,840 --> 00:09:12,700 In recent years we've seen this thing where people are 189 00:09:12,700 --> 00:09:14,658 doing what they call phishing, P-H-I-S-H-I-N-G, 190 00:09:14,658 --> 00:09:17,640 where people are putting up fake websites that look like real 191 00:09:17,640 --> 00:09:18,470 websites.
192 00:09:18,470 --> 00:09:21,637 So all these emails that you get from Bank One 193 00:09:21,637 --> 00:09:23,970 or whoever it is saying come to our website, click on it 194 00:09:23,970 --> 00:09:26,535 and give us your social security number and your credit card 195 00:09:26,535 --> 00:09:28,660 number, I hope you guys aren't responding to those. 196 00:09:28,660 --> 00:09:30,590 Those are fake websites and people are trying 197 00:09:30,590 --> 00:09:32,510 to steal your information. 198 00:09:32,510 --> 00:09:34,600 We see new attacks emerging over time. 199 00:09:37,750 --> 00:09:42,220 The other kinds of things that we see that are different: 200 00:09:42,220 --> 00:09:46,210 clearly, attacks on computer systems 201 00:09:46,210 --> 00:09:50,690 can be both very fast and very cheap. 202 00:09:50,690 --> 00:09:53,470 So, unlike in the real world, in a computer system 203 00:09:53,470 --> 00:09:56,050 you don't have to physically break into something. 204 00:09:56,050 --> 00:09:58,580 You can do this very quickly over the network. 205 00:09:58,580 --> 00:10:00,310 So you see, for example, with some 206 00:10:00,310 --> 00:10:03,390 of these worms and viruses, these things spread literally 207 00:10:03,390 --> 00:10:04,990 across tens of thousands of computers 208 00:10:04,990 --> 00:10:06,080 in a matter of seconds. 209 00:10:06,080 --> 00:10:08,170 So these are very efficient and effective attacks 210 00:10:08,170 --> 00:10:10,180 that can take over a huge number of computers 211 00:10:10,180 --> 00:10:12,490 in a very short period of time. 212 00:10:12,490 --> 00:10:16,370 And that leads to us wanting to have a different set 213 00:10:16,370 --> 00:10:19,810 of mechanisms for dealing with these kinds of problems.
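Worms can reach tens of thousands of machines in seconds because their spread compounds: every newly infected machine starts attacking others. The Python sketch below is a back-of-the-envelope model under deliberately simple, hypothetical assumptions: one initial victim, each infected host infects `fanout` new hosts per round, no host is infected twice, and the supply of vulnerable hosts never runs out. It is not a model of any particular worm, and the function names are made up for illustration.

```python
def hosts_infected(initial: int, fanout: int, rounds: int) -> int:
    """Hosts infected after `rounds` rounds if every infected host infects
    `fanout` new hosts per round (no reinfection, no saturation)."""
    return initial * (1 + fanout) ** rounds


def rounds_to_reach(target: int, initial: int = 1, fanout: int = 1) -> int:
    """Smallest number of rounds for which hosts_infected(...) >= target."""
    rounds = 0
    while hosts_infected(initial, fanout, rounds) < target:
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Hypothetical parameters: one initial host, and each infected host finds
    # one new victim per round, so the infected population doubles every round.
    r = rounds_to_reach(50_000)
    print(f"{r} rounds to pass 50,000 hosts")  # 16 rounds to pass 50,000 hosts
```

If a round took about a second, that would be roughly 16 seconds; with `fanout=3` each round multiplies the infected population by 4, and the count drops to 8 rounds.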
214 00:10:19,810 --> 00:10:26,200 And then, finally, it is also the case 215 00:10:26,200 --> 00:10:29,360 that there are some differences between the laws for computer 216 00:10:29,360 --> 00:10:32,120 systems and the laws in the real world. 217 00:10:32,120 --> 00:10:34,730 In particular, because computer systems 218 00:10:34,730 --> 00:10:37,960 change so fast, because new technologies develop so fast, 219 00:10:37,960 --> 00:10:40,250 the legal system tends to lag significantly 220 00:10:40,250 --> 00:10:42,486 behind the state of the art in technology. 221 00:10:42,486 --> 00:10:43,860 So the legal system often doesn't 222 00:10:43,860 --> 00:10:46,800 have regulations or rules that specifically 223 00:10:46,800 --> 00:10:49,960 govern whether or not something is OK or is not OK. 224 00:10:49,960 --> 00:10:51,510 And that means sometimes it's unclear 225 00:10:51,510 --> 00:10:52,968 whether it's legal to do something. 226 00:10:52,968 --> 00:10:56,870 So, for example, right now it's not clear 227 00:10:56,870 --> 00:10:58,980 whether it's legal for you to take your laptop out 228 00:10:58,980 --> 00:11:01,410 in the City of Cambridge, open it up and try to connect 229 00:11:01,410 --> 00:11:04,079 to somebody's open wireless network. 230 00:11:04,079 --> 00:11:06,120 Certainly, you can do that, it's very easy to do, 231 00:11:06,120 --> 00:11:07,700 probably many of us have done this, 232 00:11:07,700 --> 00:11:11,120 but from a legal standpoint there 233 00:11:11,120 --> 00:11:14,130 is still some debate as to whether this is OK or not.
234 00:11:14,130 --> 00:11:16,130 What this suggests, the fact that laws 235 00:11:16,130 --> 00:11:18,950 are often unclear, ambiguous or simply unspecified 236 00:11:18,950 --> 00:11:20,700 about a particular thing, is that we're 237 00:11:20,700 --> 00:11:22,640 going to need additional mechanisms: 238 00:11:22,640 --> 00:11:24,997 if you really want to make sure your data is secure, 239 00:11:24,997 --> 00:11:26,830 if you want to enforce a particular security 240 00:11:26,830 --> 00:11:30,150 policy, you're going to need to rely more on 241 00:11:30,150 --> 00:11:33,661 real mechanisms in the computer software to do this 242 00:11:33,661 --> 00:11:36,160 rather than on the legal system to, for example, protect 243 00:11:36,160 --> 00:11:39,010 you from something that might be happening in the outside world. 244 00:11:47,120 --> 00:11:49,110 Designing computer systems is hard. 245 00:11:49,110 --> 00:11:52,420 Designing a secure computer system, in particular, is hard. 246 00:11:52,420 --> 00:11:55,930 And the reason for that is that in security, often 247 00:11:55,930 --> 00:11:59,120 times the things we want to do in secure systems, 248 00:11:59,120 --> 00:12:02,740 the things we want to enforce, are so-called negative goals. 249 00:12:02,740 --> 00:12:03,860 So what do I mean by that? 250 00:12:13,370 --> 00:12:16,140 An example of a positive goal is, 251 00:12:16,140 --> 00:12:27,060 for example, I might say Sam can access file F. 252 00:12:27,060 --> 00:12:29,550 That, presumably, is something that is relatively easy 253 00:12:29,550 --> 00:12:30,900 to verify. 254 00:12:30,900 --> 00:12:32,350 I can log onto my computer system. 255 00:12:32,350 --> 00:12:34,850 And if I can access this file F then, 256 00:12:34,850 --> 00:12:36,400 well, great, I can access the file F. 257 00:12:36,400 --> 00:12:37,470 We know that's true. 258 00:12:37,470 --> 00:12:38,700 And that was easy to check.
259 00:12:41,330 --> 00:12:45,880 Furthermore, if I cannot access the file and I think that I 260 00:12:45,880 --> 00:12:47,560 should have the rights to access it, 261 00:12:47,560 --> 00:12:49,990 I'm going to email my system administrator and say, hey, 262 00:12:49,990 --> 00:12:52,406 I think I should be able to access this file, why can't I, 263 00:12:52,406 --> 00:12:55,050 will you please give me access to it? 264 00:12:55,050 --> 00:13:03,210 An example of a negative goal is that Sam 265 00:13:03,210 --> 00:13:10,240 shouldn't be able to access F. 266 00:13:10,240 --> 00:13:13,730 At first it may seem that this is just the same problem 267 00:13:13,730 --> 00:13:17,050 as saying Sam can access F, just the inverse of it, 268 00:13:17,050 --> 00:13:19,900 so it seems like it should be just as easy or hard to verify. 269 00:13:19,900 --> 00:13:22,930 But it turns out that when you're thinking about computer 270 00:13:22,930 --> 00:13:25,180 systems, this sort of a problem is very hard 271 00:13:25,180 --> 00:13:28,410 to verify because, while it may be true that when 272 00:13:28,410 --> 00:13:31,790 I log into my machine and try to open the file, 273 00:13:31,790 --> 00:13:33,290 or I connect to some remote machine 274 00:13:33,290 --> 00:13:35,370 and try to access that file, it may be the case 275 00:13:35,370 --> 00:13:38,180 that the file system denies me access to that file, 276 00:13:38,180 --> 00:13:40,800 I may have many other avenues for obtaining access 277 00:13:40,800 --> 00:13:41,490 to that file. 278 00:13:41,490 --> 00:13:43,790 For example, suppose I have a key 279 00:13:43,790 --> 00:13:46,905 to the room in which the server that hosts that file is stored. 280 00:13:46,905 --> 00:13:48,280 I can walk into that room and may 281 00:13:48,280 --> 00:13:50,880 be able to sit down in front of the console on this machine 282 00:13:50,880 --> 00:13:53,747 and obtain superuser root access to that machine.
283 00:13:53,747 --> 00:13:56,080 Or, I may be able to pull the hard drive off the machine 284 00:13:56,080 --> 00:13:58,910 and put it into my computer and read files off of it. 285 00:13:58,910 --> 00:14:01,320 Or, I may be able to bribe my system administrator 286 00:14:01,320 --> 00:14:03,280 and give him a hundred dollars in exchange 287 00:14:03,280 --> 00:14:05,304 for him letting me have access to this file. 288 00:14:05,304 --> 00:14:06,720 So there are lots and lots of ways 289 00:14:06,720 --> 00:14:10,310 in which users can get unauthorized or unintended 290 00:14:10,310 --> 00:14:12,510 access to files or other information 291 00:14:12,510 --> 00:14:13,930 in computer systems. 292 00:14:13,930 --> 00:14:15,970 And verifying that none of those avenues 293 00:14:15,970 --> 00:14:20,490 are available to a particular user is very hard. 294 00:14:20,490 --> 00:14:23,110 Worse, or similarly, this is hard 295 00:14:23,110 --> 00:14:24,820 because it's very unlikely that a user is 296 00:14:24,820 --> 00:14:27,080 going to complain about having access to some file 297 00:14:27,080 --> 00:14:29,110 that they shouldn't have access to. 298 00:14:29,110 --> 00:14:31,520 I'm not going to call up my system administrator and say, 299 00:14:31,520 --> 00:14:33,210 hey, I have access to this file, I don't think 300 00:14:33,210 --> 00:14:34,350 I should have access to it. 301 00:14:34,350 --> 00:14:35,474 Nobody is going to do that. 302 00:14:35,474 --> 00:14:37,326 So, even though I'm not a malicious user, 303 00:14:37,326 --> 00:14:38,700 I don't really have any incentive 304 00:14:38,700 --> 00:14:40,570 to go to my system administrator and tell them 305 00:14:40,570 --> 00:14:42,153 that there's this problem with the way 306 00:14:42,153 --> 00:14:44,510 that things are configured on the computer. 307 00:14:44,510 --> 00:14:46,700 And that extends also to people who, 308 00:14:46,700 --> 00:14:48,027 in fact, are malicious users. 
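The asymmetry between the positive goal "Sam can access F" and the negative goal "Sam shouldn't be able to access F" can be made concrete as a reachability question. In the hypothetical Python sketch below, each edge means "whoever has X can obtain Y", and the edges mirror the avenues from the lecture: the machine-room key, the console, the hard drive, and the bribed administrator. The node names, `AVENUES`, and `can_reach` are made up for illustration. A positive goal needs just one path. A negative goal needs the search to come up empty, and that answer is only as good as the list of avenues someone thought to write down.

```python
from collections import deque
from typing import Dict, List

# Hypothetical model: an edge a -> b means "whoever controls a can obtain b".
AVENUES: Dict[str, List[str]] = {
    "sam": ["login_as_sam", "machine_room_key", "cash"],
    "login_as_sam": [],                 # the file system denies Sam access to F
    "machine_room_key": ["console"],    # walk up to the server's console
    "console": ["root", "hard_drive"],  # get root, or pull the disk
    "root": ["file_F"],
    "hard_drive": ["file_F"],
    "cash": ["bribed_admin"],
    "bribed_admin": ["file_F"],
}


def can_reach(graph: Dict[str, List[str]], start: str, goal: str) -> bool:
    """Breadth-first search: is there ANY chain of avenues from start to goal?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    # Checking only the front door: the file system says no.
    print(can_reach({"sam": ["login_as_sam"], "login_as_sam": []}, "sam", "file_F"))  # False
    # Checking every modeled avenue: Sam can still get F.
    print(can_reach(AVENUES, "sam", "file_F"))  # True
```

A real system's graph is never complete, which is the point of the lecture: an avenue nobody modeled (say, an old backup tape) is invisible to this check, so an empty search proves the negative goal only for the avenues that were written down.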
309 00:14:48,027 --> 00:14:49,860 When somebody breaks into a computer system, 310 00:14:49,860 --> 00:14:51,749 they don't usually send out 311 00:14:51,749 --> 00:14:54,040 an advertisement to everybody in the world saying, hey, 312 00:14:54,040 --> 00:14:55,810 by the way, I got access to this file 313 00:14:55,810 --> 00:14:57,320 that I shouldn't have access to. 314 00:14:57,320 --> 00:14:59,710 It's possible for them to log in, read the file, log out, 315 00:14:59,710 --> 00:15:05,450 and nobody would be any the wiser. 316 00:15:05,450 --> 00:15:09,150 Many of our security goals are negative, 317 00:15:09,150 --> 00:15:12,990 and that means building secure computer systems is hard. 318 00:15:26,780 --> 00:15:31,740 What we're going to do in 6.033, in order to get at this, 319 00:15:31,740 --> 00:15:34,340 to get at building secure systems 320 00:15:34,340 --> 00:15:37,070 in the face of these negative goals, 321 00:15:37,070 --> 00:15:43,350 is look at a set of different security functions 322 00:15:43,350 --> 00:15:45,200 that we can use to protect information 323 00:15:45,200 --> 00:15:48,450 and access to computers in different sorts of ways. 324 00:15:48,450 --> 00:15:53,070 And we're typically going to talk about, 325 00:15:53,070 --> 00:15:57,550 throughout this, a client-server sort of system 326 00:15:57,550 --> 00:16:00,620 where you have some client and some server that are separated, 327 00:16:00,620 --> 00:16:03,040 typically over the Internet, that are sort of trying 328 00:16:03,040 --> 00:16:05,760 to exchange information or obtain information 329 00:16:05,760 --> 00:16:08,430 from each other in a way such that that information 330 00:16:08,430 --> 00:16:14,110 exchange is protected and secure and so on. 331 00:16:14,110 --> 00:16:18,940 Suppose we have a client sending some information out 332 00:16:18,940 --> 00:16:24,130 over the Internet to some server.
333 00:16:24,130 --> 00:16:26,600 What are the kinds of things that we want to make sure, 334 00:16:26,600 --> 00:16:29,320 what are the sorts of security goals 335 00:16:29,320 --> 00:16:33,900 that we might want to enforce in this environment? 336 00:16:33,900 --> 00:16:42,430 One thing we might want to do is authenticate the client. 337 00:16:42,430 --> 00:16:45,920 The server might like to know for sure 338 00:16:45,920 --> 00:16:47,990 that the person who issued this request 339 00:16:47,990 --> 00:16:49,450 is, in fact, the client. 340 00:16:49,450 --> 00:16:52,160 They'd like to have some way of knowing that the client issued 341 00:16:52,160 --> 00:16:57,210 this request and that the request that was sent 342 00:16:57,210 --> 00:17:00,120 is, in fact, what the client intended to be sent. 343 00:17:00,120 --> 00:17:02,480 For example, that somebody didn't intercept this message 344 00:17:02,480 --> 00:17:04,410 as it was being transmitted over the network, 345 00:17:04,410 --> 00:17:06,410 change it a little bit and then send it 346 00:17:06,410 --> 00:17:10,540 on making it look as though it came from the client. 347 00:17:10,540 --> 00:17:12,609 There might be different kinds of attackers that 348 00:17:12,609 --> 00:17:14,280 are sitting in this Internet. 349 00:17:14,280 --> 00:17:18,349 For example, there might be say an intermediate router 350 00:17:18,349 --> 00:17:20,540 along the path between the client and the server 351 00:17:20,540 --> 00:17:23,700 where the system administrator of the router is malicious. 352 00:17:27,390 --> 00:17:29,520 Oftentimes, in the security literature, 353 00:17:29,520 --> 00:17:31,020 there are these funny names that are 354 00:17:31,020 --> 00:17:33,420 attached to the different people in different places. 355 00:17:33,420 --> 00:17:37,540 Oftentimes the client is called Alice and the server 356 00:17:37,540 --> 00:17:39,960 is called Bob, the person receiving the request. 
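A standard way for the server to check that a request really came from the client and was not changed in transit is a message authentication code. The Python sketch below uses HMAC-SHA256 from the standard library (`hmac.new`, `hmac.compare_digest`). It assumes that Alice and Bob already share a secret key that an attacker on the path, such as the malicious router administrator, does not know; how they agree on that key is a separate problem. The `tag` and `verify` names are illustrative, and this is not necessarily the mechanism the course goes on to present.

```python
import hashlib
import hmac
import secrets

# Assumption: Alice and Bob already share this key; the attacker does not know it.
SHARED_KEY = secrets.token_bytes(32)


def tag(key: bytes, message: bytes) -> bytes:
    """Alice computes an HMAC-SHA256 tag over the request she sends."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Bob recomputes the tag and compares the two in constant time."""
    return hmac.compare_digest(tag(key, message), received_tag)


if __name__ == "__main__":
    request = b"transfer 10 dollars to bob"
    t = tag(SHARED_KEY, request)
    print(verify(SHARED_KEY, request, t))  # True: the request is unmodified
    # An attacker rewrites the request in transit but cannot compute a new tag.
    forged = b"transfer 1000 dollars to lucifer"
    print(verify(SHARED_KEY, forged, t))   # False: the tampering is detected
```

On its own this does not stop an attacker from replaying an old, correctly tagged request. Real protocols also put something like a sequence number or timestamp inside the tagged message.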
357 00:17:39,960 --> 00:17:41,770 And we talk about two different attackers. 358 00:17:41,770 --> 00:17:45,160 We talk about Eve, who is an eavesdropper, who 359 00:17:45,160 --> 00:17:48,450 listens to what's going on and tries to acquire information 360 00:17:48,450 --> 00:17:49,950 that she's not authorized to have, 361 00:17:49,950 --> 00:17:52,890 and we talk about Lucifer, who is the bad guy, who 362 00:17:52,890 --> 00:17:55,710 not only is trying to overhear information 363 00:17:55,710 --> 00:17:57,870 but may do arbitrarily bad things. 364 00:17:57,870 --> 00:18:00,200 He's trying to take over the data 365 00:18:00,200 --> 00:18:04,700 and corrupt it in any way he possibly can. 366 00:18:04,700 --> 00:18:07,077 One goal is that we want to prevent, say, 367 00:18:07,077 --> 00:18:09,660 Lucifer from being able to interfere with packets 368 00:18:09,660 --> 00:18:12,560 coming from Alice to Bob. 369 00:18:12,560 --> 00:18:14,640 We want to make sure that packets the server 370 00:18:14,640 --> 00:18:16,834 receives actually originated from Alice 371 00:18:16,834 --> 00:18:18,750 and were the original request that Alice sent, 372 00:18:18,750 --> 00:18:21,420 so that's authentication. 373 00:18:21,420 --> 00:18:27,880 We also might want to authorize, at the server, that Alice is, 374 00:18:27,880 --> 00:18:30,550 in fact, allowed to access the things that she's 375 00:18:30,550 --> 00:18:31,660 trying to access. 376 00:18:31,660 --> 00:18:33,420 If Alice tries to read a file, we 377 00:18:33,420 --> 00:18:35,740 need some way of understanding whether Alice is allowed 378 00:18:35,740 --> 00:18:36,990 to access this file or not. 379 00:18:45,565 --> 00:18:47,690 We also need to keep some information confidential. 380 00:18:54,760 --> 00:18:56,410 We may want it to be the case that Eve, 381 00:18:56,410 --> 00:18:59,450 who overhears a packet, cannot tell what the contents of that 382 00:18:59,450 --> 00:19:00,027 packet are.
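To illustrate the confidentiality goal, that Eve can overhear a packet without learning its contents, here is a toy one-time pad in Python: the message is XORed with a random key at least as long as the message, which only Alice and Bob hold and which is never reused. This is a sketch of the idea of encryption, not what SSL or any other real protocol uses; a one-time pad needs as much shared key material as data, which is why practical systems use other ciphers. The function names are made up for illustration.

```python
import secrets


def otp_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """One-time pad: XOR each plaintext byte with the matching key byte."""
    if len(key) < len(plaintext):
        raise ValueError("one-time pad key must be at least as long as the message")
    return bytes(p ^ k for p, k in zip(plaintext, key))


# XOR is its own inverse, so decryption is the same operation.
otp_decrypt = otp_encrypt


if __name__ == "__main__":
    message = b"grade: A"
    key = secrets.token_bytes(len(message))  # shared by Alice and Bob only, used once
    ciphertext = otp_encrypt(key, message)   # what Eve sees on the wire
    print(ciphertext.hex())                  # random-looking bytes
    print(otp_decrypt(key, ciphertext))      # b'grade: A'
```

The pad hides the contents but does nothing to stop tampering. Flipping a bit of the ciphertext flips the same bit of the decrypted message, which is one way to see that confidentiality and authentication are separate goals.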
383 00:19:00,027 --> 00:19:02,360 We might want to be able to protect the contents of that 384 00:19:02,360 --> 00:19:04,620 thing so that Eve cannot even see what's going over 385 00:19:04,620 --> 00:19:05,142 the network. 386 00:19:05,142 --> 00:19:06,600 So notice that this is a little bit 387 00:19:06,600 --> 00:19:08,070 different from authenticating. 388 00:19:08,070 --> 00:19:09,380 Authenticating says we just want to make 389 00:19:09,380 --> 00:19:11,080 sure the packet, in fact, came from the client. 390 00:19:11,080 --> 00:19:12,300 But we're not saying anything about 391 00:19:12,300 --> 00:19:13,800 whether or not somebody else can see 392 00:19:13,800 --> 00:19:15,064 the contents of that packet. 393 00:19:15,064 --> 00:19:16,980 Keeping information confidential says we want to make sure 394 00:19:16,980 --> 00:19:19,200 that nobody, except for the intended recipients, 395 00:19:19,200 --> 00:19:22,770 can, in fact, see the contents of this. 396 00:19:22,770 --> 00:19:24,350 There are a couple other properties 397 00:19:24,350 --> 00:19:25,615 that we want as well. 398 00:19:25,615 --> 00:19:27,365 One thing we might want is accountability. 399 00:19:32,759 --> 00:19:34,800 We're going to talk about this a little bit more. 400 00:19:34,800 --> 00:19:38,730 This says we need to assume that it's always possible 401 00:19:38,730 --> 00:19:39,980 that something could go wrong. 402 00:19:39,980 --> 00:19:41,617 It's always possible that I might 403 00:19:41,617 --> 00:19:43,200 have bribed my system administrator 404 00:19:43,200 --> 00:19:44,910 and he might have given me access to the computer. 405 00:19:44,910 --> 00:19:46,450 And, in the end, there is not much you're going 406 00:19:46,450 --> 00:19:47,780 to be able to do about that.
407 00:19:47,780 --> 00:19:50,580 What you want to do is make sure that when situations 408 00:19:50,580 --> 00:19:54,714 like that occur, there's some log of what happened, 409 00:19:54,714 --> 00:19:56,130 you have some way of understanding 410 00:19:56,130 --> 00:19:57,530 what it was that happened, why it happened 411 00:19:57,530 --> 00:19:59,863 and how it happened so you can try and prevent it later. 412 00:19:59,863 --> 00:20:03,310 You want to make sure you do accounting, 413 00:20:03,310 --> 00:20:05,675 you keep track of what's been going on. 414 00:20:05,675 --> 00:20:07,425 And, finally, you might want availability. 415 00:20:11,580 --> 00:20:14,609 This says you might want to make sure that Lucifer, 416 00:20:14,609 --> 00:20:17,150 who is sitting here between the client and the server, cannot, 417 00:20:17,150 --> 00:20:19,920 for example, send a huge number of packets at the server 418 00:20:19,920 --> 00:20:22,290 and make it unavailable, just swamp it with a denial 419 00:20:22,290 --> 00:20:24,160 of service attack. 420 00:20:24,160 --> 00:20:26,280 Availability means that this system, in fact, 421 00:20:26,280 --> 00:20:28,380 functions and provides the functionality 422 00:20:28,380 --> 00:20:31,310 that it was intended to provide to the client. 423 00:20:34,770 --> 00:20:38,677 We are going to spend a while in 6.033 especially focusing 424 00:20:38,677 --> 00:20:40,010 on these first three goals. 425 00:20:40,010 --> 00:20:41,510 Essentially, the next three lectures 426 00:20:41,510 --> 00:20:43,460 are going to be talking about how we guarantee, 427 00:20:43,460 --> 00:20:46,800 how we authenticate and authorize users 428 00:20:46,800 --> 00:20:50,540 and how we keep information confidential and private.
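For accountability, the minimum is a record of who did what. The hypothetical Python sketch below goes one small step further than a plain log and chains the entries: each entry stores the SHA-256 hash of the previous one, so quietly editing or deleting an earlier record is detectable. The class and field names are made up for illustration, and the hash chain is an assumption added here, not something the lecture prescribes. An intruder who controls the machine could rebuild the whole chain, so in practice the log, or at least its latest hash, would need to be kept somewhere the intruder cannot reach.

```python
import hashlib
import json
import time
from typing import Any, Dict, List

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


class AuditLog:
    """Append-only log; each entry records the hash of the entry before it."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    @staticmethod
    def _digest(entry: Dict[str, Any]) -> str:
        # Hash every field except the stored hash itself, in a stable order.
        fields = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(fields, sort_keys=True).encode()).hexdigest()

    def append(self, user: str, action: str, obj: str) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry: Dict[str, Any] = {"time": time.time(), "user": user,
                                 "action": action, "object": obj, "prev": prev}
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)

    def verify(self) -> bool:
        """True if no entry was edited, removed, or reordered since it was written."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.append("admin", "grant_read", "file_F")
    log.append("sam", "read", "file_F")
    print(log.verify())                # True
    log.entries[0]["user"] = "nobody"  # someone tries to cover their tracks
    print(log.verify())                # False
```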
429 00:20:50,540 --> 00:20:55,200 For all of these goals there 430 00:20:55,200 --> 00:20:57,450 is a set of technical techniques that we 431 00:20:57,450 --> 00:20:58,950 can talk about for trying to provide 432 00:20:58,950 --> 00:20:59,991 each one of these things. 433 00:20:59,991 --> 00:21:02,130 But when you think about building a secure system, 434 00:21:02,130 --> 00:21:04,460 it's not enough to simply say we're 435 00:21:04,460 --> 00:21:07,450 going to employ, you know, authentication to make 436 00:21:07,450 --> 00:21:11,160 sure that Alice is, in fact, Alice when she talks to Bob. 437 00:21:11,160 --> 00:21:14,500 What you want to do, when you build a secure system, 438 00:21:14,500 --> 00:21:16,720 is get your mindset 439 00:21:16,720 --> 00:21:19,170 around 440 00:21:19,170 --> 00:21:22,400 sort of the general ideas behind a secure system. 441 00:21:22,400 --> 00:21:25,560 And so, in 6.033, we have this set of principles 442 00:21:25,560 --> 00:21:28,650 that we advocate called the safety net approach. 443 00:21:28,650 --> 00:21:30,650 The idea is that the safety net approach is a way 444 00:21:30,650 --> 00:21:33,550 to help you sort of in general think about building 445 00:21:33,550 --> 00:21:36,464 a secure system, as opposed to these specific techniques 446 00:21:36,464 --> 00:21:38,380 that we're going to see how to apply later on. 447 00:21:49,370 --> 00:21:52,490 The safety net approach advocates 448 00:21:52,490 --> 00:21:57,190 sort of a set of ways of thinking about your system. 449 00:21:57,190 --> 00:21:59,065 The first one is: be paranoid. 450 00:22:03,840 --> 00:22:06,750 This is sort of the Murphy's Law of security. 451 00:22:06,750 --> 00:22:08,760 It says assume that anything that can go wrong 452 00:22:08,760 --> 00:22:10,190 will go wrong.
453 00:22:10,190 --> 00:22:18,410 Don't just assume that because you're authenticating 454 00:22:18,410 --> 00:22:21,350 the communication between Alice and Bob 455 00:22:21,350 --> 00:22:23,780 there is no way that somebody else will pretend 456 00:22:23,780 --> 00:22:24,290 to be Alice. 457 00:22:24,290 --> 00:22:28,220 You should always have something, a safety net, 458 00:22:28,220 --> 00:22:33,090 some backup to make sure that your system is really secure. 459 00:22:33,090 --> 00:22:35,234 A good example of being paranoid and applying 460 00:22:35,234 --> 00:22:36,650 the safety net approach is the following. 461 00:22:36,650 --> 00:22:44,160 Suppose your router has a firewall on it; 462 00:22:44,160 --> 00:22:46,984 suppose your home router has a firewall that's 463 00:22:46,984 --> 00:22:49,650 supposed to prevent unauthorized users from being able to access 464 00:22:49,650 --> 00:22:52,974 your computer. Does that mean that you should then turn off 465 00:22:52,974 --> 00:22:54,640 all password protection on your computer 466 00:22:54,640 --> 00:22:56,117 so that anybody can log in? 467 00:22:56,117 --> 00:22:57,450 No, you're not going to do that. 468 00:22:57,450 --> 00:22:58,500 You're going to continue to protect 469 00:22:58,500 --> 00:23:00,583 the information on your computer with the password, 470 00:23:00,583 --> 00:23:03,550 because you may have a laptop and may not 471 00:23:03,550 --> 00:23:04,690 be using it from home. 472 00:23:04,690 --> 00:23:06,670 Or, you may be worried about somebody 473 00:23:06,670 --> 00:23:09,580 breaking into your computer from inside of your house, 474 00:23:09,580 --> 00:23:11,550 say. 475 00:23:11,550 --> 00:23:14,040 That's an example of sort of thinking about the safety net 476 00:23:14,040 --> 00:23:14,320 approach. 477 00:23:14,320 --> 00:23:15,810 You have multiple layers of protection 478 00:23:15,810 --> 00:23:16,685 for your information.
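The router-plus-password example can be sketched as two independent checks that must both pass. This is a minimal Python illustration, assuming a hypothetical home network of `192.168.1.0/24` and a single stored password; `firewall_allows`, `password_ok`, and `allow_login` are illustrative names, not part of any real router or OS API.

```python
import hashlib
import hmac
import ipaddress
import os

# Layer 1 (hypothetical): the home router's firewall only admits the local network.
HOME_NET = ipaddress.ip_network("192.168.1.0/24")


def firewall_allows(src_ip: str) -> bool:
    return ipaddress.ip_address(src_ip) in HOME_NET


# Layer 2: the computer still requires a password, stored only as a salted hash.
def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


SALT = os.urandom(16)
STORED_HASH = hash_password("correct horse battery staple", SALT)


def password_ok(password: str) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(hash_password(password, SALT), STORED_HASH)


def allow_login(src_ip: str, password: str) -> bool:
    # Both layers must agree: getting past the firewall is not enough by itself.
    return firewall_allows(src_ip) and password_ok(password)


print(allow_login("192.168.1.20", "correct horse battery staple"))  # insider with the password
print(allow_login("192.168.1.20", "password"))                      # somebody inside the house guessing
print(allow_login("203.0.113.7", "correct horse battery staple"))   # request from outside the home network
```

The second call is the case the lecture worries about: the firewall layer is satisfied, and only the password layer stops the login.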
479 00:23:19,540 --> 00:23:21,260 There are some sort of sub-approaches, 480 00:23:21,260 --> 00:23:24,220 there are some sort of sub-techniques 481 00:23:24,220 --> 00:23:27,200 that we can talk about in the context of being paranoid. 482 00:23:27,200 --> 00:23:30,750 One of them is: accept feedback from users. 483 00:23:30,750 --> 00:23:32,730 If you are designing a big computer system 484 00:23:32,730 --> 00:23:35,150 and somebody tells you that something is insecure 485 00:23:35,150 --> 00:23:37,030 or that there is a problem, be prepared 486 00:23:37,030 --> 00:23:39,500 to accept that feedback, have a way to accept that feedback 487 00:23:39,500 --> 00:23:41,140 and respond to that feedback. 488 00:23:41,140 --> 00:23:44,090 Don't simply say, oh, that's not a real security problem, that's 489 00:23:44,090 --> 00:23:47,792 not a concern, and don't, for example, 490 00:23:47,792 --> 00:23:50,375 make it so the users don't have a way to give you information. 491 00:23:53,390 --> 00:23:54,530 Defend in depth. 492 00:23:54,530 --> 00:23:58,720 This means having multiple layers of security, 493 00:23:58,720 --> 00:24:02,610 like passwords plus a firewall. 494 00:24:02,610 --> 00:24:07,825 And, finally, minimize what is trusted. 495 00:24:12,780 --> 00:24:15,746 This 496 00:24:15,746 --> 00:24:17,370 is sort of a good example of something we've talked 497 00:24:17,370 --> 00:24:19,802 about as part of system design before. 498 00:24:19,802 --> 00:24:22,010 We want to try and keep things as simple as possible, 499 00:24:22,010 --> 00:24:25,650 but this is especially important in the context of security, 500 00:24:25,650 --> 00:24:29,780 because you want to make sure, for example, that you try 501 00:24:29,780 --> 00:24:31,390 and keep the protocols that you use 502 00:24:31,390 --> 00:24:33,500 to interact with people from the outside world as 503 00:24:33,500 --> 00:24:34,896 simple as possible.
504 00:24:34,896 --> 00:24:36,270 The more interfaces that you have 505 00:24:36,270 --> 00:24:37,800 with the outside world that users can connect 506 00:24:37,800 --> 00:24:39,890 to your computer over, the more places you have 507 00:24:39,890 --> 00:24:41,744 where your computer system is vulnerable. 508 00:24:41,744 --> 00:24:43,410 So you want to try and minimize them, make it 509 00:24:43,410 --> 00:24:46,830 so that your computer system has as few openings 510 00:24:46,830 --> 00:24:50,130 as possible, each of which you have to verify is secure. 511 00:24:52,660 --> 00:24:54,700 Another part of the safety net approach: 512 00:24:54,700 --> 00:24:57,210 you need to consider the environment. 513 00:25:05,170 --> 00:25:07,492 This means, for example, 514 00:25:07,492 --> 00:25:08,950 suppose that I have this connection 515 00:25:08,950 --> 00:25:12,210 here, this connection over the Internet between Alice and Bob; 516 00:25:12,210 --> 00:25:14,980 I may assume that I'm only worried about attackers 517 00:25:14,980 --> 00:25:18,930 who are coming in, say, for example, over the Internet. 518 00:25:18,930 --> 00:25:20,769 But if this is a server, the server 519 00:25:20,769 --> 00:25:22,560 may have other connections available to it. 520 00:25:22,560 --> 00:25:25,480 So it may be the case that this is a server 521 00:25:25,480 --> 00:25:27,040 inside of some corporate environment 522 00:25:27,040 --> 00:25:30,650 and this server has a dialup modem connected to it. 523 00:25:30,650 --> 00:25:32,550 Nowadays this is less common, 524 00:25:32,550 --> 00:25:35,830 but it used to be the case that almost all companies had 525 00:25:35,830 --> 00:25:38,290 a way that you could dial in and get access to a computer 526 00:25:38,290 --> 00:25:39,990 when you didn't have a wired Internet 527 00:25:39,990 --> 00:25:41,300 connection available to you.
528 00:25:41,300 --> 00:25:43,716 And oftentimes this dial-in access was a separate place 529 00:25:43,716 --> 00:25:44,840 where people could connect. 530 00:25:44,840 --> 00:25:47,790 So, for example, it didn't have the same interface 531 00:25:47,790 --> 00:25:49,940 as the sort of main connection to the Internet. 532 00:25:49,940 --> 00:25:53,420 And that meant that there was a sort of side channel 533 00:25:53,420 --> 00:25:56,850 in the environment that people could use 534 00:25:56,850 --> 00:25:58,100 to get access to the computer. 535 00:25:58,100 --> 00:26:00,725 Similarly, this means you should think about all the people who 536 00:26:00,725 --> 00:26:01,930 have access to the computer. 537 00:26:01,930 --> 00:26:05,550 Suppose this is a computer in your office 538 00:26:05,550 --> 00:26:08,380 and the janitor who works in your office comes 539 00:26:08,380 --> 00:26:10,530 into the office every night. Would he or she 540 00:26:10,530 --> 00:26:12,405 be able to sit down in front of your computer 541 00:26:12,405 --> 00:26:15,110 and get access to the system when they shouldn't be able to? 542 00:26:15,110 --> 00:26:16,880 And this may sound paranoid. 543 00:26:16,880 --> 00:26:18,700 It is being paranoid, but if you really 544 00:26:18,700 --> 00:26:20,637 want to build a secure computer system 545 00:26:20,637 --> 00:26:22,720 you need to sort of keep all these things in mind. 546 00:26:26,420 --> 00:26:28,870 You need to plan for iteration. 547 00:26:28,870 --> 00:26:31,810 This is, again, just a good system design principle, 548 00:26:31,810 --> 00:26:33,320 but it's especially true here.
549 00:26:33,320 --> 00:26:35,420 Assume that there will be security violations 550 00:26:35,420 --> 00:26:36,961 in your computer system, assume there 551 00:26:36,961 --> 00:26:39,160 will be security problems, and plan 552 00:26:39,160 --> 00:26:40,810 to be able to address those problems, 553 00:26:40,810 --> 00:26:43,615 and also have a way to verify that once you've addressed 554 00:26:43,615 --> 00:26:45,490 those problems the other parts of your system 555 00:26:45,490 --> 00:26:49,780 that are supposed to be secure continue to be secure. 556 00:26:49,780 --> 00:26:53,660 And, finally, keep audit trails. 557 00:26:58,070 --> 00:27:02,390 This gets at our goal of accountability. 558 00:27:02,390 --> 00:27:04,150 This just means keep track of everything: 559 00:27:04,150 --> 00:27:06,360 all of the authentication and authorization requests 560 00:27:06,360 --> 00:27:09,570 that are made in your system. When a user logs in, 561 00:27:09,570 --> 00:27:12,222 keep track of where they logged in from, and maybe 562 00:27:12,222 --> 00:27:14,680 even keep track of what they did, so that 563 00:27:14,680 --> 00:27:17,167 if it turns out that this person was unauthorized, 564 00:27:17,167 --> 00:27:19,250 or you later discover that they were up to no good, 565 00:27:19,250 --> 00:27:26,050 you can come back and understand what it is that they did. 566 00:27:26,050 --> 00:27:27,980 What all this sort of discussion, 567 00:27:27,980 --> 00:27:30,880 especially this discussion about the safety net approach, 568 00:27:30,880 --> 00:27:34,179 illustrates or gets at is that many of these issues 569 00:27:34,179 --> 00:27:35,220 that we're talking about, 570 00:27:35,220 --> 00:27:37,511 for example, the janitor breaking into your computer 571 00:27:37,511 --> 00:27:39,200 or me bribing my system administrator, 572 00:27:39,200 --> 00:27:41,590 aren't really computer system issues, right? 573 00:27:41,590 --> 00:27:42,710 These are human issues.
574 00:27:42,710 --> 00:27:45,960 These are cases where I'm bypassing 575 00:27:45,960 --> 00:27:47,530 any sort of authentication I might 576 00:27:47,530 --> 00:27:50,100 have in the computer or any kind of security 577 00:27:50,100 --> 00:27:52,010 that I might have built into the computer 578 00:27:52,010 --> 00:27:55,352 because, for example, my system administrator is authorized 579 00:27:55,352 --> 00:27:57,310 to access anything on the computer he wants to. 580 00:27:57,310 --> 00:28:00,250 And so I have, essentially, from the computer system's 581 00:28:00,250 --> 00:28:02,350 point of view, made it look like I have rights 582 00:28:02,350 --> 00:28:03,920 to get at anything I want to if I can 583 00:28:03,920 --> 00:28:05,670 bribe the system administrator. 584 00:28:05,670 --> 00:28:12,860 So this suggests that sort of in many computer systems humans 585 00:28:12,860 --> 00:28:13,690 are the weak link. 586 00:28:19,890 --> 00:28:23,100 Not only are people bribable, but people make mistakes. 587 00:28:23,100 --> 00:28:24,870 People don't read dialog boxes. 588 00:28:24,870 --> 00:28:27,400 People do things hastily. 589 00:28:27,400 --> 00:28:28,604 People don't pay attention. 590 00:28:28,604 --> 00:28:31,020 The reason that these phishing attacks work, where somebody 591 00:28:31,020 --> 00:28:33,190 pretends to be Washington Mutual or eBay 592 00:28:33,190 --> 00:28:36,126 and sends you an email that asks you to log in 593 00:28:36,126 --> 00:28:37,750 and type in your social security number, 594 00:28:37,750 --> 00:28:39,770 is that people don't think. 595 00:28:39,770 --> 00:28:41,940 They just see this email and say, oh, I 596 00:28:41,940 --> 00:28:44,630 guess eBay wants me to give them my social security number. 597 00:28:44,630 --> 00:28:45,700 OK. 598 00:28:45,700 --> 00:28:46,910 People make mistakes.
599 00:28:46,910 --> 00:28:50,850 And this is the way that many, many security vulnerabilities 600 00:28:50,850 --> 00:28:54,170 or security problems happen: through people's mistakes. 601 00:28:54,170 --> 00:28:56,140 So we're going to talk in most of 6.033 602 00:28:56,140 --> 00:28:58,129 about technical solutions to security problems. 603 00:28:58,129 --> 00:29:00,670 But, when you're actually out in the world building a system, 604 00:29:00,670 --> 00:29:02,550 you need to be thinking about the people who 605 00:29:02,550 --> 00:29:03,740 are going to be using this system 606 00:29:03,740 --> 00:29:05,115 almost as much as you're thinking 607 00:29:05,115 --> 00:29:07,780 about the sort of security and cryptographic protocols 608 00:29:07,780 --> 00:29:09,890 that you design. 609 00:29:09,890 --> 00:29:11,450 That means, for example, you should 610 00:29:11,450 --> 00:29:13,420 think about the user interface in your system. 611 00:29:13,420 --> 00:29:15,870 Does the user interface promote 612 00:29:15,870 --> 00:29:18,160 users thinking securely? 613 00:29:18,160 --> 00:29:20,770 The classic example of sort of a bad user interface 614 00:29:20,770 --> 00:29:24,650 is your web browser popping up this dialog 615 00:29:24,650 --> 00:29:28,760 box every time you access a site that isn't SSL encrypted, saying 616 00:29:28,760 --> 00:29:30,950 "This website is not SSL encrypted; are you sure 617 00:29:30,950 --> 00:29:32,340 you wish to continue?" 618 00:29:32,340 --> 00:29:34,260 And, after it has done this about ten times, 619 00:29:34,260 --> 00:29:36,330 and these are sites that you know and believe 620 00:29:36,330 --> 00:29:38,862 are safe, like almost any site on the Internet, 621 00:29:38,862 --> 00:29:40,820 you just click the option in the dialog box that says "Never 622 00:29:40,820 --> 00:29:42,570 show me this alert again."
623 00:29:42,570 --> 00:29:45,670 There is almost no useful information 624 00:29:45,670 --> 00:29:50,370 that the system is giving you by complaining in that way. 625 00:29:50,370 --> 00:29:53,560 Another thing you want to make sure you do 626 00:29:53,560 --> 00:29:55,265 is have good defaults. 627 00:29:59,010 --> 00:30:01,060 In particular, don't default to a state 628 00:30:01,060 --> 00:30:03,410 where the system is open. 629 00:30:03,410 --> 00:30:06,680 When an error occurs, don't leave the system in some state 630 00:30:06,680 --> 00:30:09,550 where anybody can have access to anything that they want to. 631 00:30:09,550 --> 00:30:12,750 Instead, default to a state where people don't have access 632 00:30:12,750 --> 00:30:14,940 to things, so that a failure doesn't end up exposing 633 00:30:14,940 --> 00:30:16,660 the system. 634 00:30:16,660 --> 00:30:21,790 Don't default to a password that anybody can guess. 635 00:30:21,790 --> 00:30:23,610 When you create a new user for your system, 636 00:30:23,610 --> 00:30:27,460 don't make the default password PASSWORD; that's a bad idea. 637 00:30:27,460 --> 00:30:29,880 Make it some random string that you email to the user 638 00:30:29,880 --> 00:30:32,676 so that they have to come back and type it in. 639 00:30:35,920 --> 00:30:40,810 Finally, give users the least privilege 640 00:30:40,810 --> 00:30:44,920 needed to do whatever it is they need to do. 641 00:30:44,920 --> 00:30:47,300 This means don't, by default, make users 642 00:30:47,300 --> 00:30:48,700 administrators of the system. 643 00:30:48,700 --> 00:30:50,690 If the user doesn't need to be an administrator of the system, 644 00:30:50,690 --> 00:30:52,440 they shouldn't have administrative access; 645 00:30:52,440 --> 00:30:55,480 they shouldn't be able to change anything that they want.
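Two of the good defaults above, a random initial password instead of PASSWORD and failing closed when something goes wrong, can be sketched in a few lines of Python. This assumes a hypothetical access-control table represented as a dict mapping each resource to a set of user names; `initial_password` and `can_access` are illustrative names only.

```python
import secrets


def initial_password() -> str:
    # 16 random bytes from the OS CSPRNG, URL-safe encoded (22 characters).
    # Compare with a fixed default such as "PASSWORD", which anybody can guess.
    return secrets.token_urlsafe(16)


def can_access(acl, user: str, resource: str) -> bool:
    """Fail closed: a missing entry or an unexpected error means 'no'."""
    try:
        return user in acl.get(resource, frozenset())
    except Exception:
        # E.g. the table failed to load (None) or is corrupted: deny access
        # rather than leaving the system open.
        return False


acl = {"/grades": {"alice"}}
print(can_access(acl, "alice", "/grades"))   # listed user
print(can_access(acl, "bob", "/grades"))     # unlisted user
print(can_access(acl, "alice", "/payroll"))  # resource with no entry at all
print(can_access(None, "alice", "/grades"))  # broken table
print(initial_password())
```

The design choice is that every path the code did not explicitly anticipate (unknown resource, bad data, exception) ends in a denial.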
646 00:30:55,480 --> 00:30:57,550 A good example of least privilege being violated 647 00:30:57,550 --> 00:31:01,100 is that many versions of Microsoft Windows, by default, 648 00:31:01,100 --> 00:31:03,550 make the first user who is created an administrator user 649 00:31:03,550 --> 00:31:04,930 who has access to everything. 650 00:31:04,930 --> 00:31:06,320 And this is a problem because now 651 00:31:06,320 --> 00:31:08,364 when somebody breaks into that user's 652 00:31:08,364 --> 00:31:10,780 account, they have access to everything on the machine, 653 00:31:10,780 --> 00:31:12,390 as opposed to simply having access 654 00:31:12,390 --> 00:31:15,980 to just that user's files. 655 00:31:15,980 --> 00:31:23,390 In general, what this just means is keep your systems simple, 656 00:31:23,390 --> 00:31:25,640 keep them understandable, keep the complexity down. 657 00:31:25,640 --> 00:31:28,300 And that's sort of the safety net. 658 00:31:28,300 --> 00:31:30,800 There are these two principles that you want to think about. 659 00:31:30,800 --> 00:31:32,360 One, be paranoid. 660 00:31:32,360 --> 00:31:34,700 Apply this notion of having a safety net in a computer 661 00:31:34,700 --> 00:31:35,350 system. 662 00:31:35,350 --> 00:31:38,340 And, two, don't just think about the technical protocols 663 00:31:38,340 --> 00:31:40,830 that you're going to use to enforce access to the computer, 664 00:31:40,830 --> 00:31:44,162 but think about who is using this system, 665 00:31:44,162 --> 00:31:45,870 think about the humans who have access to it, 666 00:31:45,870 --> 00:31:48,170 and think about how to prevent those humans from being 667 00:31:48,170 --> 00:31:52,204 able to do stupid things that break the security 668 00:31:52,204 --> 00:31:53,120 goals for your system. 669 00:31:59,405 --> 00:32:01,530 This was a very sort of high-level fuzzy discussion 670 00:32:01,530 --> 00:32:02,150 about security.
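The least-privilege point, and the Windows first-user example in particular, can be illustrated with a tiny account model in which every new account, including the first, starts unprivileged and admin rights must be granted explicitly. This is a hypothetical Python sketch; `Role`, `User`, `may_write`, and `promote` are invented for illustration and do not model any real operating system.

```python
import posixpath
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    STANDARD = "standard"
    ADMIN = "admin"


@dataclass
class User:
    name: str
    # Least privilege by default: every new account, including the very
    # first one, starts as a standard user.
    role: Role = Role.STANDARD

    @property
    def home(self) -> str:
        return f"/home/{self.name}"


def may_write(user: User, path: str) -> bool:
    if user.role is Role.ADMIN:
        return True
    # Normalize first so "/home/alice/../bob/x" cannot sneak out of the home dir.
    path = posixpath.normpath(path)
    # A standard user may only modify files under their own home directory,
    # so a break-in to this account exposes just that user's files.
    return path.startswith(user.home + "/")


def promote(actor: User, target: User) -> None:
    # Administrative rights are granted explicitly, and only by an admin.
    if actor.role is not Role.ADMIN:
        raise PermissionError(f"{actor.name} cannot grant admin rights")
    target.role = Role.ADMIN


first = User("alice")
print(first.role, may_write(first, "/home/alice/notes.txt"), may_write(first, "/etc/passwd"))
```

If an attacker breaks into `alice`'s account here, the damage is limited to `/home/alice`, which is exactly the contrast the Windows example draws.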
671 00:32:02,150 --> 00:32:03,020 Now we're 672 00:32:03,020 --> 00:32:05,020 going to drill in on some of these more 673 00:32:05,020 --> 00:32:07,490 specific technical protocols. 674 00:32:07,490 --> 00:32:13,550 And today we're going to look at a way in which you can think 675 00:32:13,550 --> 00:32:17,580 of most secure systems as consisting of a set of layers, 676 00:32:17,580 --> 00:32:20,800 and those layers are basically as follows. 677 00:32:20,800 --> 00:32:23,690 We have, at the top, some application 678 00:32:23,690 --> 00:32:26,220 which we want to secure. 679 00:32:26,220 --> 00:32:29,680 And then underneath this application 680 00:32:29,680 --> 00:32:31,505 we can talk about three layers. 681 00:32:34,910 --> 00:32:41,390 So we have some kind of functionality 682 00:32:41,390 --> 00:32:44,760 that we want to provide, we have some set of primitives 683 00:32:44,760 --> 00:32:47,060 that we're going to use to provide that, 684 00:32:47,060 --> 00:32:51,740 and then, at the very bottom, we have cryptography, which 685 00:32:51,740 --> 00:32:57,630 is the set of mathematics and algorithms 686 00:32:57,630 --> 00:33:02,110 that we generally use, in modern computer 687 00:33:02,110 --> 00:33:06,860 systems, to make sure that the computer system is secure. 688 00:33:06,860 --> 00:33:09,349 The application may have high-level functions that 689 00:33:09,349 --> 00:33:10,890 correspond to these things over here: 690 00:33:10,890 --> 00:33:15,300 we may want to be able to authenticate users, 691 00:33:15,300 --> 00:33:19,740 or we may want to be able to authorize 692 00:33:19,740 --> 00:33:22,140 that users have access to something, 693 00:33:22,140 --> 00:33:27,920 or we may want to provide confidentiality. 694 00:33:27,920 --> 00:33:30,770 That is, we may want to be able to make sure 695 00:33:30,770 --> 00:33:34,480 that nobody can read information they don't have access to.
696 00:33:34,480 --> 00:33:37,290 So you're going to have a set of primitives for doing 697 00:33:37,290 --> 00:33:39,840 these different things. 698 00:33:39,840 --> 00:33:43,030 For authentication, we're going to need to talk about something 699 00:33:43,030 --> 00:33:45,099 called an access control list. 700 00:33:45,099 --> 00:33:47,390 For authorization, we're going to talk about primitives 701 00:33:47,390 --> 00:33:50,822 called sign and verify. 702 00:33:50,822 --> 00:33:53,280 And for confidentiality we will talk about these primitives 703 00:33:53,280 --> 00:33:56,715 called encrypt and decrypt. 704 00:34:00,860 --> 00:34:02,360 And these topics, this stuff that's 705 00:34:02,360 --> 00:34:03,735 in this middle set of primitives, 706 00:34:03,735 --> 00:34:06,800 we're going to describe these in more detail in later lectures. 707 00:34:06,800 --> 00:34:09,010 What I want to do with the rest of the lecture today 708 00:34:09,010 --> 00:34:11,659 is to talk about this bottom layer, this cryptography layer. 709 00:34:19,820 --> 00:34:22,060 We have some set of cryptographic ciphers 710 00:34:22,060 --> 00:34:23,210 and hashes. 711 00:34:23,210 --> 00:34:26,179 And so a cipher is just something 712 00:34:26,179 --> 00:34:30,580 that takes in, say, a message that the user 713 00:34:30,580 --> 00:34:32,679 wants to send that is not protected at all. 714 00:34:32,679 --> 00:34:36,340 And it flips the bytes in that message around 715 00:34:36,340 --> 00:34:40,139 in order to create something that is not understandable 716 00:34:40,139 --> 00:34:43,190 unless you have some piece of information that allows 717 00:34:43,190 --> 00:34:46,870 you to decipher that message. 718 00:34:46,870 --> 00:34:48,949 You can say ciphering and deciphering 719 00:34:48,949 --> 00:34:50,699 is sort of like encrypting and decrypting, 720 00:34:50,699 --> 00:34:52,260 words you may be familiar with. 721 00:34:52,260 --> 00:34:53,679 We also talk about hashes. 
722 00:34:53,679 --> 00:34:57,970 Hashes we're going to use to authenticate or to authorize 723 00:34:57,970 --> 00:35:01,310 a particular message to make sure that a message is -- 724 00:35:01,310 --> 00:35:04,020 I'm sorry. 725 00:35:04,020 --> 00:35:05,990 Just for your notes, I got this backwards. 726 00:35:05,990 --> 00:35:13,560 This should be: we authenticate with sign and verify, 727 00:35:13,560 --> 00:35:15,250 and we authorize with an ACL. 728 00:35:15,250 --> 00:35:17,820 We'll talk about these things more in a minute, 729 00:35:17,820 --> 00:35:23,010 but it was just confusion over two words that start with auth. 730 00:35:23,010 --> 00:35:25,260 So we're going to use hashes to basically authenticate 731 00:35:25,260 --> 00:35:27,634 that a user is, in fact, who they claimed that they were. 732 00:35:27,634 --> 00:35:30,700 And, again, we'll see in more detail how this works 733 00:35:30,700 --> 00:35:32,110 over the next couple of days. 734 00:35:42,190 --> 00:35:45,970 Early cryptography 735 00:35:45,970 --> 00:35:48,590 relied on this idea that we're 736 00:35:48,590 --> 00:35:50,320 going to try and keep the protocol that's 737 00:35:50,320 --> 00:35:55,410 used for encoding the information secret. 738 00:35:55,410 --> 00:35:58,710 A simple example of an early encryption method 739 00:35:58,710 --> 00:36:01,049 might be something that many of you 740 00:36:01,049 --> 00:36:02,590 played with when you were a kid, where 741 00:36:02,590 --> 00:36:03,910 you substitute one letter for another. 742 00:36:03,910 --> 00:36:08,310 You have some map where A maps to C and B maps to F 743 00:36:08,310 --> 00:36:10,019 and so on, some mapping like that, 744 00:36:10,019 --> 00:36:11,310 and you use that to encrypt the message.
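The childhood letter-mapping scheme is a substitution cipher, and it shows what "closed design" means: the whole secret is the mapping itself. A minimal Python sketch, assuming uppercase English letters and a mapping generated by Python's `random` module (which is not a cryptographic generator, so this is a toy only); `make_key`, `encrypt`, and `decrypt` are illustrative names.

```python
import random
import string


def make_key(seed=None) -> dict[str, str]:
    # The secret: a random permutation of the alphabet (e.g. A -> C, B -> F, ...).
    # random is NOT a cryptographic generator; fine for a toy, not for real keys.
    letters = list(string.ascii_uppercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))


def encrypt(key: dict[str, str], plaintext: str) -> str:
    # Non-letters (spaces, punctuation) pass through unchanged.
    return "".join(key.get(c, c) for c in plaintext.upper())


def decrypt(key: dict[str, str], ciphertext: str) -> str:
    # Anyone who learns the mapping can invert it.
    inverse = {v: k for k, v in key.items()}
    return "".join(inverse.get(c, c) for c in ciphertext)


key = make_key(seed=6033)
ct = encrypt(key, "ATTACK AT DAWN")
print(ct)
print(decrypt(key, ct))
```

Note that repeated letters in the plaintext stay repeated in the ciphertext (both Ts in ATTACK map to the same letter), which is what makes the newspaper-style puzzles solvable from a few known letters.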
745 00:36:11,310 --> 00:36:13,000 And there are these puzzles where 746 00:36:13,000 --> 00:36:14,750 you get told a few of the letters 747 00:36:14,750 --> 00:36:16,583 and you try and guess what the other letters are 748 00:36:16,583 --> 00:36:17,750 and decode a message. 749 00:36:17,750 --> 00:36:19,910 That's a simple example of a kind of encryption 750 00:36:19,910 --> 00:36:21,170 that you might apply. 751 00:36:21,170 --> 00:36:22,950 And that encryption relies on the fact 752 00:36:22,950 --> 00:36:25,872 that this transform is essentially secret. 753 00:36:25,872 --> 00:36:27,330 If you know the transform, obviously 754 00:36:27,330 --> 00:36:32,350 you can decrypt the message. 755 00:36:32,350 --> 00:36:39,640 These schemes are often called closed design crypto schemes. 756 00:36:39,640 --> 00:36:44,920 And the idea is that because the attacker doesn't 757 00:36:44,920 --> 00:36:47,200 know what scheme was used to encode the information, 758 00:36:47,200 --> 00:36:51,010 it can be very, very hard for them to go about decoding it. 759 00:36:51,010 --> 00:36:53,100 For example, the architecture for this 760 00:36:53,100 --> 00:36:56,330 might look like this: the message goes into some encryption box which 761 00:36:56,330 --> 00:36:57,890 is secured from the outside world. 762 00:36:57,890 --> 00:36:59,640 It is just hidden from the outside world. 763 00:36:59,640 --> 00:37:02,450 Nobody else knows what that is. 764 00:37:02,450 --> 00:37:07,510 And this goes over the Internet to some other decryption 765 00:37:07,510 --> 00:37:13,000 box, and it comes out on the other end as a message. 766 00:37:13,000 --> 00:37:15,080 This is a closed design. 767 00:37:15,080 --> 00:37:20,970 And these designs, in general, sort of 768 00:37:20,970 --> 00:37:25,480 turn out not to be a very good idea, because the problem is, 769 00:37:25,480 --> 00:37:28,471 if somebody does discover what this function is, now 770 00:37:28,471 --> 00:37:29,220 you're in trouble.
771 00:37:29,220 --> 00:37:32,400 Now this whole system is no longer secure. 772 00:37:32,400 --> 00:37:35,121 And, worse than that, when you keep these things secret, 773 00:37:35,121 --> 00:37:37,370 suppose you're going to put this system out 774 00:37:37,370 --> 00:37:39,240 in the world with a hidden protocol 775 00:37:39,240 --> 00:37:41,950 that nobody in the world knows; now it's 776 00:37:41,950 --> 00:37:43,747 sort of you against the whole world. 777 00:37:43,747 --> 00:37:46,080 Whereas, if you had told everybody what the protocol was 778 00:37:46,080 --> 00:37:48,040 to begin with and said, "This is the protocol, 779 00:37:48,040 --> 00:37:49,520 and there is this little bit of information 780 00:37:49,520 --> 00:37:51,170 that we keep secret within the protocol, 781 00:37:51,170 --> 00:37:52,700 that the two parties in the protocol keep secret, 782 00:37:52,700 --> 00:37:55,050 but here's the algorithm that's in the protocol; 783 00:37:55,050 --> 00:37:56,780 let's let the entire world verify 784 00:37:56,780 --> 00:37:58,829 whether this protocol is, in fact, secure or not," 785 00:37:58,829 --> 00:38:01,120 you'd have a much better chance of developing something 786 00:38:01,120 --> 00:38:02,330 that was secure. 787 00:38:02,330 --> 00:38:04,390 It is sort of accepted wisdom that these kinds 788 00:38:04,390 --> 00:38:09,110 of closed design systems are a bad idea, and they tend not to be widely used anymore. 789 00:38:09,110 --> 00:38:11,360 The systems that we are going to talk about, 790 00:38:11,360 --> 00:38:14,740 for the rest of this talk today, are 791 00:38:14,740 --> 00:38:17,690 going to be so-called open design systems. 792 00:38:17,690 --> 00:38:19,430 Of course, there are many, many systems 793 00:38:19,430 --> 00:38:22,520 that were designed this way, and there are many, many closed 794 00:38:22,520 --> 00:38:23,400 cryptography systems.
795 00:38:23,400 --> 00:38:25,070 And, in some ways, they're sort of the most natural ones 796 00:38:25,070 --> 00:38:26,570 and the things you'd think of first. 797 00:38:26,570 --> 00:38:28,790 And they've been very effective throughout history. 798 00:38:28,790 --> 00:38:30,998 It's just the case that modern cryptography typically 799 00:38:30,998 --> 00:38:35,620 doesn't rely on a secret design. 800 00:38:35,620 --> 00:38:52,680 Open design systems have an architecture that 801 00:38:52,680 --> 00:38:54,310 typically looks as follows. 802 00:38:54,310 --> 00:38:58,530 It is pretty similar to what we showed before. 803 00:38:58,530 --> 00:38:59,810 We have m going into E. 804 00:38:59,810 --> 00:39:05,180 But this time, with this protocol E, the world 805 00:39:05,180 --> 00:39:08,320 knows what the algorithm E is, and instead this E 806 00:39:08,320 --> 00:39:10,590 has some piece of secret information coming into it: 807 00:39:10,590 --> 00:39:12,030 a key, k. 808 00:39:12,030 --> 00:39:20,620 And this key is usually not known to the world. 809 00:39:20,620 --> 00:39:23,390 And now we go through the Internet, for example, 810 00:39:23,390 --> 00:39:27,050 and we come out to a decryption box which 811 00:39:27,050 --> 00:39:29,250 also has a key going into it. 812 00:39:29,250 --> 00:39:33,430 And these keys may or may not be the same on the two boxes. 813 00:39:33,430 --> 00:39:35,530 And we'll talk about the difference 814 00:39:35,530 --> 00:39:38,720 between making them the same or not making them the same. 815 00:39:38,720 --> 00:39:41,520 Now we have this message that comes out. 816 00:39:41,520 --> 00:39:43,390 The message comes in, gets encrypted, 817 00:39:43,390 --> 00:39:45,720 goes over the Internet, goes into the decryption box 818 00:39:45,720 --> 00:39:48,000 and gets decrypted. 819 00:39:48,000 --> 00:39:54,000 If k1 is equal to k2, we say this system 820 00:39:54,000 --> 00:40:00,230 is a shared secret system.
821 00:40:00,230 --> 00:40:06,330 And if k1 is not equal to k2, we say that this 822 00:40:06,330 --> 00:40:08,160 is a public key system. 823 00:40:12,900 --> 00:40:17,130 If k1 is equal to k2, these two keys are the same. 824 00:40:17,130 --> 00:40:20,230 And, say, Alice and Bob on the two ends of this 825 00:40:20,230 --> 00:40:22,590 have exchanged information about what this key is 826 00:40:22,590 --> 00:40:25,830 before the protocol started. 827 00:40:25,830 --> 00:40:28,740 Alice called up Bob on the phone or saw Bob in the hallway 828 00:40:28,740 --> 00:40:30,500 and said, "Hey, the key is X." 829 00:40:30,500 --> 00:40:32,290 And they agreed on this beforehand. 830 00:40:32,290 --> 00:40:34,456 And now that they've agreed on what this key is, they 831 00:40:34,456 --> 00:40:36,470 can exchange information. 832 00:40:36,470 --> 00:40:38,809 In a public key system (we will see 833 00:40:38,809 --> 00:40:41,350 the design of how one public key system works in a little bit 834 00:40:41,350 --> 00:40:44,900 more detail), typically it's the case 835 00:40:44,900 --> 00:40:47,630 that, for example, the person who is sending the message 836 00:40:47,630 --> 00:40:50,367 has a private key that nobody else knows, only they 837 00:40:50,367 --> 00:40:52,450 know, and then there's a public key that everybody 838 00:40:52,450 --> 00:40:54,949 else in the world knows and is sort of distributed publicly. 839 00:40:54,949 --> 00:40:56,830 And these two keys are not equal, but there 840 00:40:56,830 --> 00:40:58,490 is some mathematical operation that you 841 00:40:58,490 --> 00:41:02,820 can apply such that, given something that's been encrypted with k1, 842 00:41:02,820 --> 00:41:04,850 you can later decrypt it with k2. 843 00:41:04,850 --> 00:41:06,472 And we will look at one example of one 844 00:41:06,472 --> 00:41:07,805 of those mathematical functions.
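To make "the two keys are not equal, yet one undoes the other" concrete, here is textbook RSA with tiny primes in Python. It is one well-known public key function, not necessarily the one the lecture goes on to present, and these parameters (p = 61, q = 53, e = 17) are the usual classroom example, trivially breakable. `apply_key` is an illustrative helper name; `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Textbook RSA with tiny primes: an illustration only, trivially breakable.
p, q = 61, 53
n = p * q                   # modulus, part of both keys (3233)
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent: public key k2 = (e, n)
d = pow(e, -1, phi)         # private exponent: private key k1 = (d, n)


def apply_key(exponent: int, modulus: int, m: int) -> int:
    # Both directions are the same operation: m^exponent mod modulus.
    return pow(m, exponent, modulus)


m = 65                                  # a "message" encoded as an integer < n
c = apply_key(d, n, m)                  # transform with the private key k1
print(d, c, apply_key(e, n, c))         # anyone holding k2 recovers m
```

The direction described in the lecture (the sender transforms with a private key and anyone can undo it with the public key) is essentially how signatures work; for confidentiality you would instead encrypt with the recipient's public key, which the same `apply_key` supports in the other order. Real RSA uses moduli of 2048 bits or more plus padding schemes such as OAEP or PSS.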
845 00:41:14,070 --> 00:41:18,500 The point here is that closed design says the algorithm itself 846 00:41:18,500 --> 00:41:20,350 is unknown to the world. 847 00:41:20,350 --> 00:41:22,010 What shared secret simply says is 848 00:41:22,010 --> 00:41:24,640 that there is some little bit of information, a little key, 849 00:41:24,640 --> 00:41:26,600 like a number that we've exchanged, 850 00:41:26,600 --> 00:41:29,680 but it's not the algorithm, it's not the protocol, 851 00:41:29,680 --> 00:41:32,140 it's just this one little bit of, say, 852 00:41:32,140 --> 00:41:35,530 several hundred bits of key information that we established 853 00:41:35,530 --> 00:41:36,900 beforehand. 854 00:41:36,900 --> 00:41:39,820 It still is the case that, for example, 855 00:41:39,820 --> 00:41:45,590 this protocol in a shared secret system is published. 856 00:41:45,590 --> 00:41:49,620 And so people can go and try and analyze 857 00:41:49,620 --> 00:41:50,970 the security of this thing. 858 00:41:50,970 --> 00:41:56,416 The community can look at what the math is that is going on. 859 00:42:00,330 --> 00:42:08,220 Let's look at a simple example of a shared key system. 860 00:42:16,360 --> 00:42:18,800 This is an approach called a one-time pad. 861 00:42:23,440 --> 00:42:25,990 One-time pad is a very simple example 862 00:42:25,990 --> 00:42:31,070 of an encryption protocol that is essentially 863 00:42:31,070 --> 00:42:32,924 cryptographically unbreakable. 864 00:42:32,924 --> 00:42:35,340 That is, it may be possible to break it through other means, 865 00:42:35,340 --> 00:42:38,680 but it is sort of provably not possible to break this 866 00:42:38,680 --> 00:42:42,810 through some sort of mathematical analysis attack. 867 00:42:42,810 --> 00:42:44,500 One-time pad depends on the availability 868 00:42:44,500 --> 00:42:46,492 of a source of truly random bits.
869 00:42:46,492 --> 00:42:48,450 I need some way to generate a set of bits which 870 00:42:48,450 --> 00:42:50,590 are sort of completely random. 871 00:42:50,590 --> 00:42:52,720 And doing that is tricky, but let's suppose 872 00:42:52,720 --> 00:42:54,760 that we can do it. 873 00:42:54,760 --> 00:42:56,390 What one-time pad says is, we're going 874 00:42:56,390 --> 00:42:59,440 to call this sequence of bits k, that's going to be our key, 875 00:42:59,440 --> 00:43:01,900 and this key in the one-time pad approach 876 00:43:01,900 --> 00:43:03,054 is not going to be short. 877 00:43:03,054 --> 00:43:04,720 This key is going to be very, very long. 878 00:43:04,720 --> 00:43:07,110 It's going to be as long as all of the messages 879 00:43:07,110 --> 00:43:10,029 that we possibly want to send over all of time. 880 00:43:10,029 --> 00:43:12,070 Suppose that the way Alice and Bob generate 881 00:43:12,070 --> 00:43:15,500 this is that Alice writes a set of random bits onto a CD, 882 00:43:15,500 --> 00:43:18,210 writes 650 megabytes of random bytes onto a CD, 883 00:43:18,210 --> 00:43:19,910 and gives that to Bob, and they agree 884 00:43:19,910 --> 00:43:23,070 that this is the key that they are going to use over time. 885 00:43:23,070 --> 00:43:28,890 We have m and k coming in. 886 00:43:28,890 --> 00:43:32,470 They are being combined together. 887 00:43:32,470 --> 00:43:36,680 The result is transmitted over the Internet. 888 00:43:36,680 --> 00:43:39,540 And then, on the other side, it 889 00:43:39,540 --> 00:43:44,300 is being decoded, also with k, where 890 00:43:44,300 --> 00:43:46,400 these two ks are equal in this case, 891 00:43:46,400 --> 00:43:48,320 and we've got m coming out. 892 00:43:48,320 --> 00:43:52,290 This plus operation here is an XOR. 893 00:43:52,290 --> 00:43:55,200 This plus with a circle is XOR.
894 00:43:55,200 --> 00:43:58,140 If you remember what XOR does, the definition of XOR 895 00:43:58,140 --> 00:44:04,390 is: given two bits, zero or one, two things that 896 00:44:04,390 --> 00:44:07,180 we're XORing together, two bits that are either zero or one, 897 00:44:07,180 --> 00:44:10,960 the XOR of two zeros is zero, the XOR of zero and one is one, 898 00:44:10,960 --> 00:44:14,560 and the XOR of two ones is zero. 899 00:44:14,560 --> 00:44:17,970 XOR has this nice property. 900 00:44:17,970 --> 00:44:20,500 This XOR is a bitwise operation. 901 00:44:20,500 --> 00:44:24,350 This says the first bit in k is XORed with the first bit in m, 902 00:44:24,350 --> 00:44:26,850 and the second bit in k is XORed with the second bit in m, 903 00:44:26,850 --> 00:44:27,349 and so on. 904 00:44:27,349 --> 00:44:29,550 And the same thing happens over here. 905 00:44:29,550 --> 00:44:40,550 XOR has the nice property that m XOR k XOR k is equal to m. 906 00:44:40,550 --> 00:44:42,310 You can verify that that's true if you 907 00:44:42,310 --> 00:44:45,520 look at a simple example. 908 00:44:45,520 --> 00:44:49,110 What happens is when the stream m comes in and gets XORed 909 00:44:49,110 --> 00:44:51,980 with k, the encrypted message that is traveling over here 910 00:44:51,980 --> 00:44:53,890 essentially looks like a random byte string 911 00:44:53,890 --> 00:44:55,920 because k is a random byte string. 912 00:44:55,920 --> 00:44:59,089 And this m, no matter what m is, when 913 00:44:59,089 --> 00:45:00,630 it is XORed with a random byte string, 914 00:45:00,630 --> 00:45:02,270 you're going to get something that looks like a random byte 915 00:45:02,270 --> 00:45:03,361 string coming out. 916 00:45:03,361 --> 00:45:04,860 And then on this other side, though, 917 00:45:04,860 --> 00:45:07,276 we also have access to the same byte string that was used, 918 00:45:07,276 --> 00:45:09,260 and now we can decrypt the message.
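The one-time pad just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `secrets` module (the operating system's cryptographic generator) stands in for a source of truly random bits, a 1024-byte pad stands in for Alice's CD, and `xor_bytes` is a hypothetical helper name. It shows the XOR truth table and the property that m XOR k XOR k equals m.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("key must be exactly as long as the message")
    return bytes(x ^ y for x, y in zip(a, b))


# Single-bit XOR truth table: 0^0 = 0, 0^1 = 1, 1^0 = 1, 1^1 = 0.
truth_table = {(x, y): x ^ y for x in (0, 1) for y in (0, 1)}

# Shared pad: a small stand-in for Alice's CD of random bytes.
# Assumption: secrets (OS cryptographic RNG) approximates truly random bits.
pad = secrets.token_bytes(1024)

m = b"attack at dawn"
k = pad[:len(m)]             # next unused slice of the pad; never reuse it
c = xor_bytes(m, k)          # Alice sends c = m XOR k over the Internet
recovered = xor_bytes(c, k)  # Bob computes c XOR k = m XOR k XOR k = m
```

Note that each message consumes a fresh slice of the pad. Reusing pad bytes for a second message would break the guarantee, which is why the key has to be as long as everything Alice and Bob will ever send.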
919 00:45:09,260 --> 00:45:12,140 So only if somebody knows exactly what k is can they 920 00:45:12,140 --> 00:45:14,570 decrypt the message. 921 00:45:14,570 --> 00:45:16,980 This approach is hard to make work in practice 922 00:45:16,980 --> 00:45:19,490 because it requires the availability of a large amount 923 00:45:19,490 --> 00:45:21,060 of random data. 924 00:45:21,060 --> 00:45:24,140 One thing that you can do is use a random number generator that 925 00:45:24,140 --> 00:45:25,970 is seeded with some number and then 926 00:45:25,970 --> 00:45:29,390 use that random number generator to begin to generate 927 00:45:29,390 --> 00:45:30,612 the sequence of bytes. 928 00:45:30,612 --> 00:45:32,070 Of course, that now has the problem 929 00:45:32,070 --> 00:45:35,020 that somebody can discover the seed, or somebody can guess 930 00:45:35,020 --> 00:45:37,530 which seed you used because the random number generator used 931 00:45:37,530 --> 00:45:38,550 is not very good. 932 00:45:38,550 --> 00:45:40,234 They may be able to break your protocol. 933 00:45:40,234 --> 00:45:42,150 But one-time pad is a nice example of something 934 00:45:42,150 --> 00:45:44,066 which is sort of a cryptographic protocol that 935 00:45:44,066 --> 00:45:51,155 is known to work pretty well. 936 00:45:51,155 --> 00:45:53,280 What I want to do, just with the last five minutes, 937 00:45:53,280 --> 00:46:02,590 is talk about one specific cryptographic protocol which 938 00:46:02,590 --> 00:46:04,430 is called the RSA protocol. 939 00:46:04,430 --> 00:46:07,450 And we probably won't have time to go 940 00:46:07,450 --> 00:46:09,760 through the details of how RSA actually works, 941 00:46:09,760 --> 00:46:16,823 but RSA is a public key protocol. 942 00:46:23,120 --> 00:46:29,970 And let me just quickly show you what RSA uses 943 00:46:29,970 --> 00:46:32,220 and then we will skip over DES here.
944 00:46:36,630 --> 00:46:38,490 What RSA says is, we are going to take 945 00:46:38,490 --> 00:46:40,080 two numbers, p and q, which are prime. 946 00:46:40,080 --> 00:46:42,560 This protocol is in the book. 947 00:46:42,560 --> 00:46:45,760 It is in appendix one of the chapter, 948 00:46:45,760 --> 00:46:48,100 so you don't need to copy it down word for word 949 00:46:48,100 --> 00:46:49,850 if you don't want to. 950 00:46:49,850 --> 00:46:51,750 In other words, we 951 00:46:51,750 --> 00:46:53,630 pick two numbers p and q which are prime, 952 00:46:53,630 --> 00:46:55,400 and then we generate some number n 953 00:46:55,400 --> 00:46:58,280 which is equal to p times q and another number 954 00:46:58,280 --> 00:47:01,530 z which is equal to p minus one times q minus one. 955 00:47:01,530 --> 00:47:05,180 And then pick another number e which is relatively prime to z. 956 00:47:05,180 --> 00:47:06,930 Relatively prime just means 957 00:47:06,930 --> 00:47:10,440 that e and z share no common factor other than one. 958 00:47:10,440 --> 00:47:13,650 In particular, z is not divisible by e. 959 00:47:13,650 --> 00:47:17,950 And we pick a number d such that e times d is equal to one 960 00:47:17,950 --> 00:47:23,790 when the product is taken modulo z. 961 00:47:23,790 --> 00:47:26,570 If you take e times d and you take the modulus with z, 962 00:47:26,570 --> 00:47:29,840 the value should be one. 963 00:47:29,840 --> 00:47:33,540 If you pick a set of numbers that satisfy this property, then 964 00:47:33,540 --> 00:47:36,090 we can define the public key to be e, 965 00:47:36,090 --> 00:47:39,180 n and the private key to be d, n. 966 00:47:39,180 --> 00:47:42,642 And these numbers have this magic property. 967 00:47:42,642 --> 00:47:44,600 And then we are going to be able to, with this, 968 00:47:44,600 --> 00:47:47,180 encrypt any message whose value is less than n.
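The key-generation steps above can be sketched as a small Python function. This is a toy sketch, not a production implementation. The function name `rsa_keygen` and the example primes 61 and 53 are illustrative choices. Where the lecture describes searching for d, this sketch instead computes the modular inverse directly with Python's built-in `pow(e, -1, z)`, which requires Python 3.8 or later.

```python
from math import gcd


def rsa_keygen(p: int, q: int, e: int):
    """Toy RSA key generation (hypothetical helper following the steps above).

    Assumes p and q are distinct primes (not checked here).
    Returns (public_key, private_key) as ((e, n), (d, n)).
    """
    n = p * q
    z = (p - 1) * (q - 1)
    # e must be relatively prime to z: their only common factor is 1.
    if gcd(e, z) != 1:
        raise ValueError("e must be relatively prime to z")
    # d is the modular inverse of e, so (e * d) % z == 1.
    # pow with exponent -1 computes it directly (Python 3.8+).
    d = pow(e, -1, z)
    return (e, n), (d, n)


# Small illustrative primes; real keys use primes hundreds of digits long.
public_key, private_key = rsa_keygen(61, 53, 17)
```

With these inputs, n is 3233 and z is 3120. The function raises an error if the chosen e shares a factor with z, for example e = 3, because 3 divides 3120.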
969 00:47:47,180 --> 00:47:49,680 So, in general, we are going to want n and p 970 00:47:49,680 --> 00:47:52,117 and q to be some set of very large numbers. 971 00:47:52,117 --> 00:47:53,950 They are going to be hundreds of bytes long. 972 00:47:53,950 --> 00:47:55,366 We are going to be able to encrypt 973 00:47:55,366 --> 00:47:56,960 any message that, read as a number, 974 00:47:56,960 --> 00:48:01,005 is smaller than n. 975 00:48:01,005 --> 00:48:02,630 So we may have to break the messages up 976 00:48:02,630 --> 00:48:05,890 into chunks whose values are smaller than n. 977 00:48:05,890 --> 00:48:10,890 This has this magic property now that if we encrypt the data 978 00:48:10,890 --> 00:48:14,060 in the following way, to encrypt a message 979 00:48:14,060 --> 00:48:16,000 we take m to the power of e and then take 980 00:48:16,000 --> 00:48:17,480 the whole thing modulo n. 981 00:48:17,480 --> 00:48:19,910 Now, transmit that encrypted message c across the network. 982 00:48:19,910 --> 00:48:22,487 Now, to decrypt, we take that encrypted thing 983 00:48:22,487 --> 00:48:24,820 and take it to the power d and then take the whole thing 984 00:48:24,820 --> 00:48:30,005 modulo n, and we get the original message out at the end. 985 00:48:30,005 --> 00:48:32,380 The mathematics of why this works turns out 986 00:48:32,380 --> 00:48:34,217 to be fairly subtle and sophisticated. 987 00:48:34,217 --> 00:48:36,050 There is a brief outline of it in the paper, 988 00:48:36,050 --> 00:48:37,430 but if you really want to understand this, 989 00:48:37,430 --> 00:48:39,260 it's the kind of thing that requires an additional course 990 00:48:39,260 --> 00:48:39,980 in cryptography.
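Encryption and decryption, as described above, are each a single modular exponentiation. Here is a minimal sketch using Python's three-argument `pow`. The helper names and the toy key (e, n) = (17, 3233), (d, n) = (2753, 3233), built from the primes 61 and 53, are illustrative assumptions. Splitting the message into one-byte blocks is a simplification that keeps every block below n. This is textbook RSA with no padding, so it is insecure and intended only as a teaching aid.

```python
def rsa_apply(x: int, exponent: int, n: int) -> int:
    """Compute x**exponent mod n with fast modular exponentiation."""
    if not 0 <= x < n:
        raise ValueError("each message block must satisfy 0 <= m < n")
    return pow(x, exponent, n)


def encrypt_bytes(data: bytes, public_key) -> list:
    """Encrypt one byte per block: c = m^e mod n (toy, unpadded)."""
    e, n = public_key
    return [rsa_apply(b, e, n) for b in data]


def decrypt_blocks(blocks, private_key) -> bytes:
    """Decrypt each block: m = c^d mod n."""
    d, n = private_key
    return bytes(rsa_apply(c, d, n) for c in blocks)


# Hypothetical toy key pair: n = 61 * 53 = 3233, e * d = 1 mod 3120.
public_key = (17, 3233)
private_key = (2753, 3233)

blocks = encrypt_bytes(b"hi", public_key)
plain = decrypt_blocks(blocks, private_key)
```

Encrypting single bytes keeps the example short, but it also makes identical bytes encrypt to identical blocks. Real systems use large moduli, random padding, and usually encrypt only a symmetric key.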
991 00:48:39,980 --> 00:48:42,396 We're not going to go into the details of the mathematics, 992 00:48:42,396 --> 00:48:44,580 but the idea is, suppose we pick p to be 47 993 00:48:44,580 --> 00:48:44,660 and q to be 59, two prime numbers, 994 00:48:44,660 --> 00:48:44,810 generate n to be 2773, just the product of those, 995 00:48:44,810 --> 00:48:45,435 z then is 2668. 996 00:48:45,435 --> 00:48:55,780 And now we pick two numbers e and d. 997 00:48:55,780 --> 00:48:59,400 We find these numbers by searching 998 00:48:59,400 --> 00:49:02,160 over the set of all possible numbers 999 00:49:02,160 --> 00:49:03,040 we could have picked. 1000 00:49:03,040 --> 00:49:04,600 So we have these two numbers, e equal to 17 and d equal to 157, 1001 00:49:04,600 --> 00:49:06,290 which satisfy this property. 1002 00:49:06,290 --> 00:49:14,890 That is, 2668 is not divisible by 17, and 17 times 157 modulo 1003 00:49:14,890 --> 00:49:17,830 z, modulo 2668, is equal to one. 1004 00:49:17,830 --> 00:49:20,490 You can check that it's true: 17 times 157 is 2669. 1005 00:49:20,490 --> 00:49:22,920 Now, suppose our message is equal to 31. 1006 00:49:22,920 --> 00:49:25,980 If we now compute C, which is 31 1007 00:49:25,980 --> 00:49:26,090 to the 17th modulo 2773, we get 587. 1008 00:49:26,090 --> 00:49:26,220 And then, sort of magically at the end, we reverse this thing, 587 to the 157th modulo 2773, 1009 00:49:26,220 --> 00:49:27,261 and out pops our message. 1010 00:49:27,261 --> 00:49:44,240 What we see here is a public key protocol. 1011 00:49:44,240 --> 00:49:45,860 What we say here is the public key 1012 00:49:45,860 --> 00:49:48,660 is equal to this combination of e and n, and the private key 1013 00:49:48,660 --> 00:49:52,520 is equal to this combination of d and n.
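The arithmetic in this worked example can be checked by hand with repeated squaring. The derivation below uses n = 2773 and z = 2668, which correspond to p = 47 and q = 59. The intermediate squarings are added here as a check and are not part of the lecture:

```latex
\begin{aligned}
n &= p\,q = 47 \cdot 59 = 2773, \qquad z = (p-1)(q-1) = 46 \cdot 58 = 2668,\\
e\,d &= 17 \cdot 157 = 2669 = 2668 + 1 \equiv 1 \pmod{2668},\\
31^{2} &= 961, \quad 31^{4} \equiv 961^{2} \equiv 112, \quad 31^{8} \equiv 112^{2} \equiv 1452, \quad 31^{16} \equiv 1452^{2} \equiv 824 \pmod{2773},\\
c &= 31^{17} = 31^{16} \cdot 31 \equiv 824 \cdot 31 = 25544 \equiv 587 \pmod{2773},\\
m &= 587^{157} \bmod 2773 = 31.
\end{aligned}
```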
1014 00:49:52,520 --> 00:49:55,760 Suppose that only Alice knows the private key 1015 00:49:55,760 --> 00:49:58,499 and that Bob knows the public key and everybody 1016 00:49:58,499 --> 00:50:01,040 else in the world, for example, knows what the public key is. 1017 00:50:01,040 --> 00:50:03,030 Now, what we can do is Alice can encrypt 1018 00:50:03,030 --> 00:50:06,840 the message with her private key which nobody else knows. 1019 00:50:06,840 --> 00:50:09,750 And then using the public key everybody else 1020 00:50:09,750 --> 00:50:13,490 can go ahead and decrypt that message. 1021 00:50:13,490 --> 00:50:17,780 And by decrypting that message they 1022 00:50:17,780 --> 00:50:20,280 can be assured that the only person that could have actually 1023 00:50:20,280 --> 00:50:21,696 created this message to begin with 1024 00:50:21,696 --> 00:50:24,870 is somebody who had access to Alice's private key. 1025 00:50:24,870 --> 00:50:29,370 So they can authenticate that this message came from Alice. 1026 00:50:29,370 --> 00:50:31,470 This protocol also has a nice side effect 1027 00:50:31,470 --> 00:50:33,630 which is that it is reversible. 1028 00:50:36,350 --> 00:50:40,890 If Bob encrypts a message using Alice's public key, 1029 00:50:40,890 --> 00:50:43,582 Alice can decrypt that message, and only Alice 1030 00:50:43,582 --> 00:50:45,540 can decrypt that message using her private key. 1031 00:50:45,540 --> 00:50:48,180 We will talk more about these properties next time. 1032 00:50:48,180 --> 00:50:50,790 And take care.
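Both uses of the key pair described above can be sketched with the example key from the lecture: public key (17, 2773) and private key (157, 2773). In the first, Alice transforms a message with her private key and anyone can reverse it with the public key. In the second, Bob encrypts with Alice's public key and only her private key recovers the message. This is a minimal textbook-RSA sketch, and the names `apply_key`, `PUBLIC_KEY`, and `PRIVATE_KEY` are hypothetical. Real signature and encryption schemes also hash and pad the message, and this sketch omits both.

```python
# Example key pair from the lecture: p = 47, q = 59, e = 17, d = 157.
N = 2773
PUBLIC_KEY = (17, N)    # (e, n): known to everyone
PRIVATE_KEY = (157, N)  # (d, n): known only to Alice


def apply_key(m: int, key) -> int:
    """Raise m to the key's exponent modulo n (textbook RSA, no padding)."""
    exponent, n = key
    return pow(m, exponent, n)


m = 31

# Authentication: Alice transforms m with her private key. Anyone holding
# the public key can reverse it, and only the private-key holder could
# have produced a value that reverses to m.
signed = apply_key(m, PRIVATE_KEY)
verified = apply_key(signed, PUBLIC_KEY) == m

# Secrecy, the reverse direction: Bob encrypts with Alice's public key,
# and only Alice's private key recovers the message.
c = apply_key(m, PUBLIC_KEY)
recovered = apply_key(c, PRIVATE_KEY)
```

The same exponentiation function serves both directions because e times d is congruent to one modulo z. Applying the two exponents in either order returns the original value modulo n.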