1 00:00:00,080 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,820 Commons license. 3 00:00:03,820 --> 00:00:06,060 Your support will help MIT OpenCourseWare 4 00:00:06,060 --> 00:00:10,150 continue to offer high quality educational resources for free. 5 00:00:10,150 --> 00:00:12,700 To make a donation or to view additional materials 6 00:00:12,700 --> 00:00:16,600 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,600 --> 00:00:17,255 at ocw.mit.edu. 8 00:00:27,595 --> 00:00:28,720 PROFESSOR: All right, guys. 9 00:00:28,720 --> 00:00:29,470 Let's get started. 10 00:00:29,470 --> 00:00:31,137 So today, we're going to talk about Tor. 11 00:00:31,137 --> 00:00:32,761 And we actually have one of the authors 12 00:00:32,761 --> 00:00:35,090 of the paper you guys read for today, Nick Mathewson. 13 00:00:35,090 --> 00:00:37,080 He's also one of the main developers of Tor. 14 00:00:37,080 --> 00:00:38,065 He's going to tell you more about it. 15 00:00:38,065 --> 00:00:39,148 NICK MATHEWSON: Thank you. 16 00:00:39,148 --> 00:00:42,315 So at this point, I could start out 17 00:00:42,315 --> 00:00:44,420 by saying, please put your hands up 18 00:00:44,420 --> 00:00:48,490 if you didn't read the paper, but that wouldn't work. 19 00:00:48,490 --> 00:00:50,920 Because it's embarrassing not to have read a paper 20 00:00:50,920 --> 00:00:52,940 you're supposed to have read. 21 00:00:52,940 --> 00:00:56,020 So instead, what I will ask is, think of your birthday. 22 00:00:56,020 --> 00:00:57,610 Think of the date of your birth. 23 00:00:57,610 --> 00:01:01,000 If the last digit of the date of your birth is odd, 24 00:01:01,000 --> 00:01:06,080 or you didn't read the paper, please raise your hand. 25 00:01:06,080 --> 00:01:09,000 OK, that's not far from half. 26 00:01:09,000 --> 00:01:11,810 So I'm guessing most people read the paper. 
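The show-of-hands trick above is a form of randomized response: nobody who raises a hand reveals which of the two reasons applies. As a rough sketch, assuming the last digit of a birth date is odd about half the time and is independent of whether someone read the paper, the share of non-readers can be backed out of the hand count. The function names and the 0.5 default are illustrative assumptions, not something stated in the lecture.

```python
import random


def estimate_unread_fraction(raised, total, p_odd=0.5):
    """Estimate the fraction of people who did not read the paper.

    A hand goes up if the birthday digit is odd OR the paper was not read, so
    P(raised) = p_odd + (1 - p_odd) * q, where q is the unread fraction.
    Solving for q gives the estimate; it is clamped to the range [0, 1].
    """
    f = raised / total
    q = (f - p_odd) / (1.0 - p_odd)
    return min(1.0, max(0.0, q))


def simulate_hands(total, true_unread, p_odd=0.5, seed=0):
    """Simulate the show of hands and return how many hands go up."""
    rng = random.Random(seed)
    raised = 0
    for _ in range(total):
        odd_digit = rng.random() < p_odd
        unread = rng.random() < true_unread
        if odd_digit or unread:
            raised += 1
    return raised
```

With exactly half the room raising a hand, the estimate is zero non-readers, which matches the speaker's reading that "not far from half" means most people read the paper.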
27 00:01:14,720 --> 00:01:19,190 Means of communicating that preserve our privacy enable 28 00:01:19,190 --> 00:01:23,620 us to communicate more honestly to gather better 29 00:01:23,620 --> 00:01:26,570 information about the world when we are less 30 00:01:26,570 --> 00:01:32,210 inhibited from speaking because of possibly justified, 31 00:01:32,210 --> 00:01:37,540 possibly unjustified social and other consequences. 32 00:01:37,540 --> 00:01:41,570 So this brings us to Tor, which is an anonymity 33 00:01:41,570 --> 00:01:44,210 network that I've been working on for the last 10 years 34 00:01:44,210 --> 00:01:48,080 with some friends and colleagues and so on. 35 00:01:48,080 --> 00:01:51,170 [INAUDIBLE] there's a set of volunteer operating servers, 36 00:01:51,170 --> 00:01:52,930 about 6,000 of them. 37 00:01:52,930 --> 00:01:55,290 At first, it was just friends of ours 38 00:01:55,290 --> 00:01:58,310 that Roger Dingledine and I knew from MIT. 39 00:01:58,310 --> 00:02:01,660 After that, we built up more publicity. 40 00:02:01,660 --> 00:02:04,810 More people started running servers. 41 00:02:04,810 --> 00:02:08,360 Now it's run by nonprofits, private individuals, 42 00:02:08,360 --> 00:02:12,370 some university teams, possibly some of you here today, 43 00:02:12,370 --> 00:02:17,820 and no doubt some very sketchy people. 44 00:02:17,820 --> 00:02:19,140 We've got about 6,000 nodes. 45 00:02:19,140 --> 00:02:21,540 We're serving on the order of hundreds of thousands 46 00:02:21,540 --> 00:02:24,060 to millions of users depending on how you count. 47 00:02:24,060 --> 00:02:26,310 It's kind of hard to count, because they're anonymous. 48 00:02:26,310 --> 00:02:29,142 So you have to use statistical techniques to estimate. 49 00:02:29,142 --> 00:02:30,850 And we're doing on the order of terabytes 50 00:02:30,850 --> 00:02:34,500 per second worth of traffic. 51 00:02:34,500 --> 00:02:39,190 Lots of people need anonymity for their regular work.
52 00:02:39,190 --> 00:02:40,670 Not everyone who needs anonymity, 53 00:02:40,670 --> 00:02:43,980 though, thinks of it as anonymity. 54 00:02:43,980 --> 00:02:46,380 Some people say, I don't need anonymity. 55 00:02:46,380 --> 00:02:48,520 I'm perfectly fine identifying myself. 56 00:02:48,520 --> 00:02:52,590 But there's broad perceptions that the privacy 57 00:02:52,590 --> 00:02:55,330 is necessary or useful. 58 00:02:55,330 --> 00:02:57,982 And when regular citizens use anonymity stuff, 59 00:02:57,982 --> 00:03:00,750 they tend to be doing it because they want privacy 60 00:03:00,750 --> 00:03:04,455 in search results, privacy in doing research on the internet. 61 00:03:04,455 --> 00:03:07,900 They want to be able to engage in local politics 62 00:03:07,900 --> 00:03:12,180 while not offending local politicians, and so on. 63 00:03:12,180 --> 00:03:15,210 Researchers frequently use anonymizing tools 64 00:03:15,210 --> 00:03:21,800 to avoid gathering biased data, biased by geolocation based 65 00:03:21,800 --> 00:03:23,685 services that might be serving them 66 00:03:23,685 --> 00:03:26,500 in particular different versions of things. 67 00:03:26,500 --> 00:03:29,700 Companies use anonymity technologies 68 00:03:29,700 --> 00:03:32,620 for protection of sensitive data. 69 00:03:32,620 --> 00:03:38,730 For instance, if I can track all of the movements 70 00:03:38,730 --> 00:03:42,600 of the legal team for some major internet company, 71 00:03:42,600 --> 00:03:49,360 I can probably, just by tracking when they're visiting their web 72 00:03:49,360 --> 00:03:52,085 server from different places around the world, 73 00:03:52,085 --> 00:03:54,126 or where they're visiting the company [INAUDIBLE] 74 00:03:54,126 --> 00:03:56,000 different places around the world, 75 00:03:56,000 --> 00:03:58,996 learn a lot about which teams are collaborating with which. 
76 00:03:58,996 --> 00:04:00,370 And this is information companies 77 00:04:00,370 --> 00:04:02,370 would like to keep private. 78 00:04:02,370 --> 00:04:07,540 Companies use also the anonymity technology for doing research. 79 00:04:07,540 --> 00:04:12,130 So a major router manufacturer for a while-- 80 00:04:12,130 --> 00:04:13,800 I don't know if this is still the case-- 81 00:04:13,800 --> 00:04:17,200 would regularly serve different versions of its product sheets 82 00:04:17,200 --> 00:04:20,200 to IP addresses associated with its competitors 83 00:04:20,200 --> 00:04:23,851 in order to make reverse engineering trickier. 84 00:04:23,851 --> 00:04:26,142 And they found this out by using our software and said, 85 00:04:26,142 --> 00:04:28,308 hey, wait a minute, we got a different product sheet 86 00:04:28,308 --> 00:04:32,407 when we came in from Tor than we did coming directly. 87 00:04:32,407 --> 00:04:34,365 And it's also kind of normal for some companies 88 00:04:34,365 --> 00:04:36,679 to serve other companies versions of their websites 89 00:04:36,679 --> 00:04:38,720 to emphasize the employment opportunity sections. 90 00:04:41,660 --> 00:04:46,910 Regular law enforcement needs anonymity technologies as well 91 00:04:46,910 --> 00:04:49,900 to avoid tipping off people during investigations. 92 00:04:49,900 --> 00:04:51,955 You do not want the local police station 93 00:04:51,955 --> 00:04:57,290 to appear in the web logs of somebody you're investigating. 94 00:04:57,290 --> 00:05:00,960 And regular folks need it, as I said, 95 00:05:00,960 --> 00:05:04,640 for avoiding harassment because of online activities, 96 00:05:04,640 --> 00:05:07,600 to research stuff that might be embarrassing. 
97 00:05:07,600 --> 00:05:13,390 If you live in a country with uncertain health care laws, 98 00:05:13,390 --> 00:05:16,420 you might want to avoid creating too much public record 99 00:05:16,420 --> 00:05:19,070 of what diseases you think you might have and so on, 100 00:05:19,070 --> 00:05:21,920 or what dangerous hobbies you might have. 101 00:05:21,920 --> 00:05:27,400 And also lots of criminal or bad folks use anonymity technology. 102 00:05:27,400 --> 00:05:28,710 It's not their only option. 103 00:05:28,710 --> 00:05:33,650 But if you are willing to purchase time on a botnet, 104 00:05:33,650 --> 00:05:35,360 you can buy some pretty good privacy 105 00:05:35,360 --> 00:05:38,329 that is not available to people who 106 00:05:38,329 --> 00:05:39,620 think that botnets are immoral. 107 00:05:39,620 --> 00:05:43,890 And Tor, and anonymity stuff in general, 108 00:05:43,890 --> 00:05:49,482 are not the only multi-use technology out there. 109 00:05:49,482 --> 00:05:51,690 Let's see, the average age of a graduate is about 20. 110 00:05:51,690 --> 00:05:56,706 So around when you were born-- have you talked 111 00:05:56,706 --> 00:05:58,327 about crypto wars at all? 112 00:05:58,327 --> 00:05:59,181 PROFESSOR: No. 113 00:05:59,181 --> 00:06:00,270 NICK MATHEWSON: No. 114 00:06:00,270 --> 00:06:02,700 During the 1990s, it was sort of an up-in-the-air question 115 00:06:02,700 --> 00:06:06,590 in the United States about to what extent civilian use 116 00:06:06,590 --> 00:06:09,120 of non-backdoor cryptography should be legal, 117 00:06:09,120 --> 00:06:11,320 and to what extent it should be exported. 118 00:06:11,320 --> 00:06:13,200 That kind of came down pretty decisively 119 00:06:13,200 --> 00:06:17,090 on the side of cryptography should be legal and exportable 120 00:06:17,090 --> 00:06:20,310 during the '90s and early 2000s.
121 00:06:20,310 --> 00:06:24,350 And although there's some debate about anonymity technology, 122 00:06:24,350 --> 00:06:27,100 it's more or less the same debate. 123 00:06:27,100 --> 00:06:30,750 And I think it's going to end in more or less the same way. 124 00:06:30,750 --> 00:06:33,264 So here's an outline of my talk. 125 00:06:33,264 --> 00:06:35,680 I'm going to give you that little introduction I gave you, 126 00:06:35,680 --> 00:06:37,721 talk a little bit about what we mean by anonymity 127 00:06:37,721 --> 00:06:40,235 in a technical sense, talk a little about our motivations 128 00:06:40,235 --> 00:06:41,068 for getting into it. 129 00:06:41,068 --> 00:06:44,970 Then I'm going to kind of walk you through step 130 00:06:44,970 --> 00:06:47,450 by step how you start with the idea of, 131 00:06:47,450 --> 00:06:50,445 we ought to have some anonymity, and how 132 00:06:50,445 --> 00:06:52,902 do you wind up with the design of Tor from that point. 133 00:06:52,902 --> 00:06:54,660 And I'll mention some branching off 134 00:06:54,660 --> 00:06:56,990 points where you might wind up with other designs. 135 00:06:56,990 --> 00:06:59,780 I'll pause to answer some of the cool questions 136 00:06:59,780 --> 00:07:04,220 that everyone has sent in for their class assignment. 137 00:07:04,220 --> 00:07:06,710 I'll talk a little bit about how node discovery works, 138 00:07:06,710 --> 00:07:08,394 which is an important topic. 139 00:07:08,394 --> 00:07:10,150 And then I'll sort of by show of hands 140 00:07:10,150 --> 00:07:12,856 pick which of these advanced topics to cover. 141 00:07:12,856 --> 00:07:15,230 I guess we're calling them advanced because they're later 142 00:07:15,230 --> 00:07:16,360 in the lecture. 143 00:07:16,360 --> 00:07:19,750 And I can't read them all, but they're all really cool. 
144 00:07:19,750 --> 00:07:21,655 I'll mention some related systems 145 00:07:21,655 --> 00:07:23,905 whose designs you ought to check out 146 00:07:23,905 --> 00:07:26,370 if this is a topic that interests you and you'd like 147 00:07:26,370 --> 00:07:27,286 to know more about it. 148 00:07:27,286 --> 00:07:30,340 I'll talk about future work that we want to have done at Tor 149 00:07:30,340 --> 00:07:32,860 and I hope that we'll have time to do some day. 150 00:07:32,860 --> 00:07:35,870 And if there's time for questions, then I'll take them. 151 00:07:35,870 --> 00:07:38,930 And I've got nowhere I need to be for the next hour or so. 152 00:07:38,930 --> 00:07:43,307 So I and my colleague David over there-- can you wave your hand, 153 00:07:43,307 --> 00:07:47,340 David-- will be hanging around somewhere and talking 154 00:07:47,340 --> 00:07:48,613 to anyone who wants to talk. 155 00:07:48,613 --> 00:07:52,750 So right, anonymity-- what do we mean 156 00:07:52,750 --> 00:07:54,183 when we talk about anonymity? 157 00:07:54,183 --> 00:07:57,210 There are lots of informal notions 158 00:07:57,210 --> 00:08:03,390 that get used in informal discussions, in online, and so 159 00:08:03,390 --> 00:08:03,890 on. 160 00:08:03,890 --> 00:08:05,390 Some people use anonymous to mean, 161 00:08:05,390 --> 00:08:06,598 I didn't write my name on it. 162 00:08:06,598 --> 00:08:10,900 Some people use anonymous to mean, well, 163 00:08:10,900 --> 00:08:12,290 no one can actually prove it's me 164 00:08:12,290 --> 00:08:15,230 even if you suspect strongly. 165 00:08:15,230 --> 00:08:18,200 What we mean is a number of notions expressed 166 00:08:18,200 --> 00:08:25,560 in terms of the ability of an observer 167 00:08:25,560 --> 00:08:32,590 or attacker on a network to link participants to actions. 
168 00:08:32,590 --> 00:08:35,870 These notions come out of a terminology paper 169 00:08:35,870 --> 00:08:38,659 by [INAUDIBLE] that you find a link 170 00:08:38,659 --> 00:08:43,929 to on freehaven.net/anonbib/, the anonymity bibliography that 171 00:08:43,929 --> 00:08:46,790 I help maintain. 172 00:08:46,790 --> 00:08:49,423 It should list most of the good papers in the field. 173 00:08:49,423 --> 00:08:51,550 We need to bring it up to date to 2014, 174 00:08:51,550 --> 00:08:53,390 but it's pretty useful. 175 00:08:53,390 --> 00:08:55,840 So when I say anonymity, generally what I mean 176 00:08:55,840 --> 00:09:01,080 is Alice is doing some activity. 177 00:09:01,080 --> 00:09:05,132 She's-- what should Alice be doing? 178 00:09:05,132 --> 00:09:06,215 Alice is buying new socks. 179 00:09:10,270 --> 00:09:11,820 And there's some attacker here. 180 00:09:11,820 --> 00:09:14,770 Let's call her Eve for now. 181 00:09:14,770 --> 00:09:18,999 Eve can tell that Alice is doing something. 182 00:09:18,999 --> 00:09:21,040 Preventing that is not what we mean by anonymity. 183 00:09:21,040 --> 00:09:22,890 That's called unobservability. 184 00:09:22,890 --> 00:09:26,550 Eve can tell possibly that someone is buying socks. 185 00:09:26,550 --> 00:09:28,850 Again, that's not what we mean by anonymity. 186 00:09:28,850 --> 00:09:33,480 But what we hope is that Eve cannot tell that Alice 187 00:09:33,480 --> 00:09:36,310 in particular is buying socks. 188 00:09:36,310 --> 00:09:40,190 And we mean that both on a categorical level-- 189 00:09:40,190 --> 00:09:42,935 Eve should not be able to conclude 190 00:09:42,935 --> 00:09:45,060 through rigorous mathematical proof, this is Alice, 191 00:09:45,060 --> 00:09:48,430 she's buying socks-- but also, Eve should not 192 00:09:48,430 --> 00:09:52,180 be able to conclude probabilistically it's likelier 193 00:09:52,180 --> 00:09:56,030 that Alice is buying socks than some randomly selected person. 
194 00:09:56,030 --> 00:09:59,080 And also, we would like Eve not to be 195 00:09:59,080 --> 00:10:02,250 able to conclude after observing many Alice activities, 196 00:10:02,250 --> 00:10:05,280 Alice sometimes buys socks, even if I 197 00:10:05,280 --> 00:10:08,260 don't know some particular activity of Alice's is 198 00:10:08,260 --> 00:10:09,045 a socks purchase. 199 00:10:12,650 --> 00:10:14,560 There are other ideas that are related. 200 00:10:14,560 --> 00:10:17,030 One is unlinkability. 201 00:10:17,030 --> 00:10:23,876 Unlinkability is it's like a long-term profile of Alice. 202 00:10:23,876 --> 00:10:26,210 So for instance, Alice has been posting 203 00:10:26,210 --> 00:10:33,050 as-- I'm never good at picking names for my example. 204 00:10:33,050 --> 00:10:39,060 Alice has been posting as Bob and writing a political blog 205 00:10:39,060 --> 00:10:43,315 that would disrupt her career, that 206 00:10:43,315 --> 00:10:45,490 would offend her department head and disrupt 207 00:10:45,490 --> 00:10:49,820 her career as a computer security [INAUDIBLE]. 208 00:10:49,820 --> 00:10:53,280 So she's been writing as Bob. 209 00:10:53,280 --> 00:10:59,650 Unlinkability is Eve's inability to link Alice 210 00:10:59,650 --> 00:11:01,950 to a particular profile. 211 00:11:01,950 --> 00:11:05,910 Final notion-- unobservability, some systems 212 00:11:05,910 --> 00:11:12,540 try to make it impossible to even tell that Alice is online, 213 00:11:12,540 --> 00:11:15,620 that Alice is connecting to anybody at all, that Alice 214 00:11:15,620 --> 00:11:17,190 is doing any activity. 215 00:11:17,190 --> 00:11:20,660 These are rather hard to build. 216 00:11:20,660 --> 00:11:22,660 I'll talk a little bit more about to what extent 217 00:11:22,660 --> 00:11:25,650 that they are useful later.
218 00:11:25,650 --> 00:11:27,610 Something that is useful in that area 219 00:11:27,610 --> 00:11:29,745 is you might want to conceal that Alice 220 00:11:29,745 --> 00:11:32,240 is using an anonymity system, but not 221 00:11:32,240 --> 00:11:33,630 that she is on the internet. 222 00:11:33,630 --> 00:11:35,910 That's more achievable than concealing the fact 223 00:11:35,910 --> 00:11:39,070 that Alice is on the internet entirely. 224 00:11:39,070 --> 00:11:42,710 So why did I start working on this in the first place? 225 00:11:42,710 --> 00:11:45,177 Well, partially because of the engineer's itch. 226 00:11:45,177 --> 00:11:46,010 It's a cool problem. 227 00:11:46,010 --> 00:11:47,870 It's an interesting problem. 228 00:11:47,870 --> 00:11:50,805 Nobody else was actually working on it. 229 00:11:50,805 --> 00:11:52,480 And my friend Roger got a contract 230 00:11:52,480 --> 00:11:56,940 to finish up a stalled research project 231 00:11:56,940 --> 00:11:58,490 before the grant expired. 232 00:11:58,490 --> 00:12:03,585 And he did it well enough that I said, hey, I'll join up. 233 00:12:03,585 --> 00:12:05,200 And [INAUDIBLE]. 234 00:12:05,200 --> 00:12:06,720 I'll join in. 235 00:12:06,720 --> 00:12:09,740 After a while, we formed a nonprofit 236 00:12:09,740 --> 00:12:13,310 and released everything as open source. 237 00:12:13,310 --> 00:12:14,870 So that's part of it. 238 00:12:14,870 --> 00:12:18,810 But for deeper motivations, I think 239 00:12:18,810 --> 00:12:21,800 humanity has got a lot of problems that can only 240 00:12:21,800 --> 00:12:25,760 be solved through better and more dedicated 241 00:12:25,760 --> 00:12:30,530 communication, freer expression, and more freedom of thought. 242 00:12:30,530 --> 00:12:33,890 And I don't know how to solve these problems. 
243 00:12:33,890 --> 00:12:37,360 All I think I can do is try to make sure 244 00:12:37,360 --> 00:12:40,880 that what I see as inhibiting discussion, 245 00:12:40,880 --> 00:12:44,780 thought, speech, becomes harder to do. 246 00:12:44,780 --> 00:12:46,275 So that's [INAUDIBLE]. 247 00:12:46,275 --> 00:12:47,188 Yeah. 248 00:12:47,188 --> 00:12:49,604 STUDENT: So I know there are many good reasons to use Tor. 249 00:12:49,604 --> 00:12:51,062 Please don't see this as criticism. 250 00:12:51,062 --> 00:12:53,036 I'm just curious, what is your opinion 251 00:12:53,036 --> 00:12:55,297 as far as criminal activity? 252 00:12:55,297 --> 00:12:57,630 NICK MATHEWSON: What is my opinion on criminal activity? 253 00:12:57,630 --> 00:12:58,430 Some laws are good. 254 00:12:58,430 --> 00:12:59,532 Some laws are bad. 255 00:12:59,532 --> 00:13:01,490 My lawyers would tell me never to advise anyone 256 00:13:01,490 --> 00:13:02,860 to break the law. 257 00:13:05,750 --> 00:13:08,253 My goal was not to enable criminal activity against most 258 00:13:08,253 --> 00:13:10,550 of the laws I agree with. 259 00:13:10,550 --> 00:13:13,140 In places where criticising the government is illegal, 260 00:13:13,140 --> 00:13:17,399 then I'm in favor of criminal activity of that kind. 261 00:13:17,399 --> 00:13:19,190 So in that case, I suppose I was supporting 262 00:13:19,190 --> 00:13:21,330 that kind of criminal activity. 263 00:13:21,330 --> 00:13:24,946 My stance on whether it's a problem that an anonymity 264 00:13:24,946 --> 00:13:26,570 network gets used for criminal activity 265 00:13:26,570 --> 00:13:29,215 in general, to the extent that there are good laws, 266 00:13:29,215 --> 00:13:31,660 I would prefer that people not break them. 
267 00:13:31,660 --> 00:13:36,980 I would, however, think that any computer security system that 268 00:13:36,980 --> 00:13:40,830 does not get used by criminals is probably a very bad computer 269 00:13:40,830 --> 00:13:43,720 security system if the criminals are 270 00:13:43,720 --> 00:13:46,770 making any kind of good decision making policy. 271 00:13:46,770 --> 00:13:49,820 I think that if we go around banning security 272 00:13:49,820 --> 00:13:54,619 that works for criminals, we wind up with insecure systems. 273 00:13:54,619 --> 00:13:56,410 So that's more or less where I stand on it. 274 00:13:56,410 --> 00:13:58,284 I'm not really the philosopher of it, though. 275 00:13:58,284 --> 00:13:59,620 I'm more of the programmer. 276 00:13:59,620 --> 00:14:01,760 So I'm going to be giving really trite answers 277 00:14:01,760 --> 00:14:03,362 to philosophical and legal questions. 278 00:14:03,362 --> 00:14:05,570 Also, I'm not a lawyer and cannot offer legal advice. 279 00:14:05,570 --> 00:14:08,510 Do not take anything I say as legal advice. 280 00:14:08,510 --> 00:14:14,464 That said, [INAUDIBLE], a lot of these research problems 281 00:14:14,464 --> 00:14:15,880 that I'm going to be talking about 282 00:14:15,880 --> 00:14:17,484 weren't even close to being solved. 283 00:14:17,484 --> 00:14:19,650 So why do we start anyway instead of going straight 284 00:14:19,650 --> 00:14:21,099 into research? 285 00:14:21,099 --> 00:14:23,140 One of the reasons, we thought that a lot of them 286 00:14:23,140 --> 00:14:27,300 wouldn't get solved unless there was a test bed to work on. 287 00:14:27,300 --> 00:14:29,250 And that's kind of been borne out. 288 00:14:29,250 --> 00:14:33,590 Because Tor has kind of become the research platform of choice 289 00:14:33,590 --> 00:14:36,530 for lots of work on low latency anonymity systems. 290 00:14:36,530 --> 00:14:38,580 And it's helped the field a lot in that way.
291 00:14:38,580 --> 00:14:41,120 But also, 10 years on, a lot of the big problems 292 00:14:41,120 --> 00:14:42,650 still aren't solved. 293 00:14:42,650 --> 00:14:45,740 So if we had waited 10 years for everything to get fixed, 294 00:14:45,740 --> 00:14:48,290 we would have been waiting in vain. 295 00:14:48,290 --> 00:14:51,760 So why do it then? 296 00:14:51,760 --> 00:14:58,740 Partially because we thought that having a system out there 297 00:14:58,740 --> 00:15:03,041 would improve long-term outcomes for the world. 298 00:15:03,041 --> 00:15:05,290 That is, it's really easy to argue that something that 299 00:15:05,290 --> 00:15:08,250 doesn't exist should be banned. 300 00:15:08,250 --> 00:15:10,440 Arguments against civilian use of cryptography 301 00:15:10,440 --> 00:15:13,230 were much easier to make in public in 1990 302 00:15:13,230 --> 00:15:14,361 than they are today. 303 00:15:14,361 --> 00:15:15,860 Because there was almost no civilian 304 00:15:15,860 --> 00:15:18,240 use of strong cryptography then. 305 00:15:18,240 --> 00:15:23,050 And you could argue that if anything stronger than DES 306 00:15:23,050 --> 00:15:28,010 is legal, then civilization will collapse. 307 00:15:28,010 --> 00:15:34,900 Criminals will never be caught, and organized crime 308 00:15:34,900 --> 00:15:36,525 will take over everything. 309 00:15:36,525 --> 00:15:38,150 But you couldn't really argue that that 310 00:15:38,150 --> 00:15:41,410 was the inevitable consequence of cryptography in 2000. 311 00:15:41,410 --> 00:15:43,440 Because cryptography had already been out there, 312 00:15:43,440 --> 00:15:46,160 and it turned out not to end the world. 313 00:15:46,160 --> 00:15:49,420 Further, it was harder to argue for a cryptography ban in 2000 314 00:15:49,420 --> 00:15:54,270 because there was a large constituency in favor 315 00:15:54,270 --> 00:15:56,090 of the use of cryptography. 
316 00:15:56,090 --> 00:15:59,150 That is, if someone in 1985 says, 317 00:15:59,150 --> 00:16:01,630 let's ban strong cryptography, well, banks 318 00:16:01,630 --> 00:16:02,880 are using strong cryptography. 319 00:16:02,880 --> 00:16:04,860 So they'll ask for an exemption. 320 00:16:04,860 --> 00:16:05,580 But other than that, there weren't 321 00:16:05,580 --> 00:16:07,121 a lot of users of strong cryptography 322 00:16:07,121 --> 00:16:08,384 in the civilian space. 323 00:16:08,384 --> 00:16:09,800 But if someone in 2000 said, let's 324 00:16:09,800 --> 00:16:12,180 ban strong cryptography, that would 325 00:16:12,180 --> 00:16:14,900 be every internet company. 326 00:16:14,900 --> 00:16:18,885 Everyone running an HTTPS page would start waving their hands 327 00:16:18,885 --> 00:16:20,050 and shouting about it. 328 00:16:20,050 --> 00:16:21,690 And nowadays, strong cryptography bans 329 00:16:21,690 --> 00:16:24,610 are probably unfeasible, although people 330 00:16:24,610 --> 00:16:26,000 keep bringing back the idea. 331 00:16:26,000 --> 00:16:27,470 And again, I'm not the philosopher 332 00:16:27,470 --> 00:16:29,980 or political scientist of the movement. 333 00:16:29,980 --> 00:16:34,860 So some folks ask me, what's your threat model? 334 00:16:34,860 --> 00:16:37,390 It's good to be thinking in terms of threat models. 335 00:16:37,390 --> 00:16:40,280 Unfortunately, our threat model is kind of weird. 336 00:16:40,280 --> 00:16:43,570 We started not with an adversary requirement. 337 00:16:43,570 --> 00:16:46,202 But we started with a usability requirement. 338 00:16:46,202 --> 00:16:48,700 The usability requirement we gave ourselves to begin 339 00:16:48,700 --> 00:16:52,395 is, this has to be useful for web browsing. 340 00:16:52,395 --> 00:16:58,910 This has to be useful for interactive protocols. 341 00:16:58,910 --> 00:17:01,110 And it actually needs to see use. 342 00:17:01,110 --> 00:17:04,800 Subject to that, we want to maximize security. 
343 00:17:04,800 --> 00:17:07,369 So our threat model has lots of weird corners 344 00:17:07,369 --> 00:17:10,050 in it if you actually write it out as, 345 00:17:10,050 --> 00:17:13,410 what can an attacker do, under what circumstances, and how? 346 00:17:13,410 --> 00:17:15,780 And that's because we've set ourselves the goal of, 347 00:17:15,780 --> 00:17:17,810 it has to work for the web. 348 00:17:17,810 --> 00:17:20,443 And I'll return to that in a minute or two. 349 00:17:20,443 --> 00:17:23,180 But let's sort of talk about now how 350 00:17:23,180 --> 00:17:29,810 we can use forward anonymity, how we build forward anonymity. 351 00:17:29,810 --> 00:17:32,580 So here's Alice. 352 00:17:32,580 --> 00:17:35,890 She wants to buy socks. 353 00:17:35,890 --> 00:17:42,170 So OK, let's say that Alice runs a computer. 354 00:17:42,170 --> 00:17:43,820 Let's call it R for relay. 355 00:17:43,820 --> 00:17:47,670 And this computer relays her traffic to-- I 356 00:17:47,670 --> 00:17:50,600 want to say socks.com, but I'm afraid that'll 357 00:17:50,600 --> 00:17:53,690 turn out to be something horrible, so zappos.com. 358 00:17:53,690 --> 00:17:55,000 Yeah, they sell socks, too. 359 00:17:55,000 --> 00:17:58,470 All right, so Alice wants to buy some socks from zappos.com. 360 00:17:58,470 --> 00:18:00,930 And she's going through a relay. 361 00:18:00,930 --> 00:18:04,530 Well, I said Alice runs a relay. 362 00:18:04,530 --> 00:18:07,910 Any eavesdropper who's looking at this will say, 363 00:18:07,910 --> 00:18:09,240 that's Alice's computer. 364 00:18:09,240 --> 00:18:11,097 It's probably Alice. 365 00:18:11,097 --> 00:18:13,180 All right, so let's have somebody else run a relay 366 00:18:13,180 --> 00:18:17,340 and have lots of other users all visit it. 
367 00:18:17,340 --> 00:18:20,200 I'll call them A2 and A3, because there 368 00:18:20,200 --> 00:18:26,720 aren't enough standard cryptography person names-- buy 369 00:18:26,720 --> 00:18:34,332 books, tweet cat pictures. 370 00:18:37,600 --> 00:18:42,670 This is like 80% of what people do on the internet, right? 371 00:18:42,670 --> 00:18:46,650 So now we have three people all going into this relay, three 372 00:18:46,650 --> 00:18:47,600 streams exiting. 373 00:18:47,600 --> 00:18:51,090 Someone who's watching the relay can't easily correlate-- 374 00:18:51,090 --> 00:18:54,290 should not be, we hope, but we return to that later-- 375 00:18:54,290 --> 00:18:58,300 that this Alice is buying socks, this Alice, buying books, 376 00:18:58,300 --> 00:19:00,860 this Alice is tweeting cat pix. 377 00:19:00,860 --> 00:19:06,090 Well, except if they're watching this side of the connections, 378 00:19:06,090 --> 00:19:08,530 they can see Alice telling the relay, 379 00:19:08,530 --> 00:19:10,554 please connect me to zappos.com. 380 00:19:10,554 --> 00:19:12,220 All right, so we'll add some encryption. 381 00:19:12,220 --> 00:19:15,200 We'll maybe do TLS on all of these links. 382 00:19:15,200 --> 00:19:18,015 So to the extent that you can't break TLS, to the extent 383 00:19:18,015 --> 00:19:20,200 you can't correlate this to this, 384 00:19:20,200 --> 00:19:22,630 then they get some privacy. 385 00:19:22,630 --> 00:19:25,830 Well, that's still not good enough, though. 386 00:19:25,830 --> 00:19:31,040 Because first off, we're assuming that this relay 387 00:19:31,040 --> 00:19:32,619 is fully trusted. 388 00:19:32,619 --> 00:19:34,410 I assume you know the definition of trusted 389 00:19:34,410 --> 00:19:36,460 and why it doesn't actually mean trusted. 390 00:19:36,460 --> 00:19:37,940 OK, good. 
391 00:19:37,940 --> 00:19:39,440 This is trusted in the sense that it 392 00:19:39,440 --> 00:19:41,720 can break the whole system, trusted in the sense 393 00:19:41,720 --> 00:19:44,845 that you can't help but trust it, not trusted in the sense 394 00:19:44,845 --> 00:19:46,620 that it's actually trustworthy. 395 00:19:46,620 --> 00:19:49,720 So all right, we can introduce multiple relays. 396 00:19:49,720 --> 00:19:53,410 We can have different relays run by different people. 397 00:19:53,410 --> 00:20:00,120 We can have-- this is not actually the topology we use. 398 00:20:00,120 --> 00:20:01,885 But my blackboard technique is terrible, 399 00:20:01,885 --> 00:20:04,225 and I don't want to redraw anything. 400 00:20:07,190 --> 00:20:09,720 We can imagine tumbling these connections 401 00:20:09,720 --> 00:20:11,680 through multiple relays, each of which 402 00:20:11,680 --> 00:20:14,170 removes a single layer of encryption. 403 00:20:14,170 --> 00:20:19,770 So all this relay sees is Alice is doing something. 404 00:20:19,770 --> 00:20:23,610 All this relay sees is someone is buying socks. 405 00:20:23,610 --> 00:20:26,240 But this one just sees someone is buying socks. 406 00:20:26,240 --> 00:20:28,562 The connection came from this relay. 407 00:20:28,562 --> 00:20:30,395 This one just sees Alice is doing something, 408 00:20:30,395 --> 00:20:32,320 and it forwards onto this relay. 409 00:20:32,320 --> 00:20:35,505 And no single party ought to be able to correlate 410 00:20:35,505 --> 00:20:37,450 the whole thing. 411 00:20:37,450 --> 00:20:42,780 Now we come to a major design point. 412 00:20:42,780 --> 00:20:50,090 Let's suppose that Eve is watching here and here. 413 00:20:50,090 --> 00:20:52,250 Nothing I've said so far does anything 414 00:20:52,250 --> 00:20:57,860 to obscure the timing and volume of Alice's packets. 
415 00:20:57,860 --> 00:21:01,140 Oh sure, there'll be some trivial noise 416 00:21:01,140 --> 00:21:03,690 added from all the computation and decryption 417 00:21:03,690 --> 00:21:06,220 these things do from network latency and so on. 418 00:21:06,220 --> 00:21:11,600 But ultimately, if Alice is sending a kilobyte in, 419 00:21:11,600 --> 00:21:13,500 then the design I've sketched out so far, 420 00:21:13,500 --> 00:21:16,315 a kilobyte is coming out. 421 00:21:16,315 --> 00:21:21,650 And if the socks web page is 64k long, 422 00:21:21,650 --> 00:21:26,340 and is served by this web server at 11:26, 423 00:21:26,340 --> 00:21:27,870 then Alice is going to get something 424 00:21:27,870 --> 00:21:33,460 about 64k long at 11:26 or 11:27 or so. 425 00:21:33,460 --> 00:21:38,400 Now, with some statistics, Eve can 426 00:21:38,400 --> 00:21:42,540 correlate some of these streams if we don't obscure 427 00:21:42,540 --> 00:21:44,726 volume and timing information. 428 00:21:44,726 --> 00:21:46,850 There are designs that do obscure volume and timing 429 00:21:46,850 --> 00:21:48,190 information. 430 00:21:48,190 --> 00:21:52,230 The good ones usually come out of [INAUDIBLE], 431 00:21:52,230 --> 00:21:55,140 although there's some work on DC-nets. 432 00:21:55,140 --> 00:21:58,040 You could have something where each of these nodes 433 00:21:58,040 --> 00:22:00,600 received a large number of requests, just [INAUDIBLE] 434 00:22:00,600 --> 00:22:03,030 up all the requests they got for an hour, 435 00:22:03,030 --> 00:22:06,970 reordered them, and transmitted them all at once. 436 00:22:06,970 --> 00:22:10,260 And you could also say all requests must be the same size. 437 00:22:10,260 --> 00:22:13,670 Requests are 1k, responses are 1 megabyte. 
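The batching idea just described (collect requests, reorder them, send them all at once, all the same size) might look something like the sketch below. It assumes the 1k request size mentioned in the talk; the 2-byte length prefix, the zero padding, and the `BatchingMix` class are illustrative choices, not a description of any deployed mix. Encryption is omitted, and `random.Random` stands in for the cryptographic randomness a real mix would need.

```python
import random

REQUEST_SIZE = 1024  # "Requests are 1k" in the talk; the exact framing is assumed


def pad(msg: bytes, size: int = REQUEST_SIZE) -> bytes:
    """Pad msg to a fixed-size cell: 2-byte length prefix, then zero filler."""
    if len(msg) > size - 2:
        raise ValueError("message too large for one cell")
    return len(msg).to_bytes(2, "big") + msg + b"\x00" * (size - 2 - len(msg))


def unpad(cell: bytes) -> bytes:
    """Recover the original message from a padded cell."""
    n = int.from_bytes(cell[:2], "big")
    return cell[2:2 + n]


class BatchingMix:
    """Hold incoming messages, then release them together in shuffled order."""

    def __init__(self, rng=None):
        self.pool = []
        self.rng = rng or random.Random()

    def receive(self, msg: bytes) -> None:
        # Every message is padded on arrival so all cells look alike in size.
        self.pool.append(pad(msg))

    def flush(self) -> list:
        # Called once per batch interval (an hour in the talk's example).
        batch, self.pool = self.pool, []
        self.rng.shuffle(batch)
        return batch
```

An observer watching the output sees a burst of equal-sized cells in an order unrelated to arrival order, at the cost of the latency the speaker goes on to describe.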
438 00:22:13,670 --> 00:22:15,680 And with some more work on that, we 439 00:22:15,680 --> 00:22:22,440 get something that would let you send an email that would arrive 440 00:22:22,440 --> 00:22:29,220 on the order of hours, or get a web page in order of to end time, 441 00:22:29,220 --> 00:22:32,610 assuming that you optimize it to a single round trip. 442 00:22:32,610 --> 00:22:36,500 These systems exist, and existed when we started doing Tor. 443 00:22:36,500 --> 00:22:38,675 They don't get a lot of use, though. 444 00:22:38,675 --> 00:22:40,740 I actually wrote one called Mixminion 445 00:22:40,740 --> 00:22:44,010 that was a successor to the Mixmaster remailer. 446 00:22:44,010 --> 00:22:46,510 I have not gotten a remailer message in the last three 447 00:22:46,510 --> 00:22:47,010 years. 448 00:22:49,620 --> 00:22:51,350 Tor has millions of users. 449 00:22:51,350 --> 00:22:54,293 Remailers, it's unclear whether they've got more than 450 00:22:54,293 --> 00:22:55,477 on the order of hundreds. 451 00:22:55,477 --> 00:22:57,310 So you might think, well, still though, it's 452 00:22:57,310 --> 00:22:59,830 better anonymity for the people who really need it. 453 00:22:59,830 --> 00:23:03,120 Except if you've only got on the order of hundreds of users, 454 00:23:03,120 --> 00:23:05,655 then you're not really providing them 455 00:23:05,655 --> 00:23:08,630 all that much anonymity against this kind of adversary anyway. 456 00:23:08,630 --> 00:23:10,260 Because this adversary can simply go, 457 00:23:10,260 --> 00:23:12,250 OK, there's 100 people. 458 00:23:12,250 --> 00:23:14,080 Well, the message I want to investigate 459 00:23:14,080 --> 00:23:15,630 was looking at a Bulgarian website. 460 00:23:15,630 --> 00:23:17,040 How many of them speak Bulgarian? 461 00:23:17,040 --> 00:23:20,170 OK, that's five. 462 00:23:20,170 --> 00:23:22,950 The saying is, anonymity loves company.
463 00:23:22,950 --> 00:23:25,615 Unless you have a large user base, 464 00:23:25,615 --> 00:23:28,230 no system can actually provide anonymity. 465 00:23:28,230 --> 00:23:31,970 And that's why also in this design, if these Alices all 466 00:23:31,970 --> 00:23:33,770 belong to an organization, they ought 467 00:23:33,770 --> 00:23:38,830 to have a shared public system rather than a private one. 468 00:23:38,830 --> 00:23:45,130 If they all work for MIT legal, and they're 469 00:23:45,130 --> 00:23:50,120 investigating some fake MIT website that's 470 00:23:50,120 --> 00:23:54,663 offering fake diplomas, then if they're just 471 00:23:54,663 --> 00:23:58,800 using the MIT legal anonymizer, then it's not really 472 00:23:58,800 --> 00:24:00,370 concealing who they are. 473 00:24:00,370 --> 00:24:02,495 But if you have a large number of different parties 474 00:24:02,495 --> 00:24:06,590 all using this, then it actually can provide some privacy. 475 00:24:06,590 --> 00:24:13,830 So we'll return one more time to resisting these correlation 476 00:24:13,830 --> 00:24:14,330 attacks. 477 00:24:14,330 --> 00:24:16,996 But for now let's say that we're not resisting these correlation 478 00:24:16,996 --> 00:24:17,720 attacks. 479 00:24:17,720 --> 00:24:23,070 And instead, we assume that an attacker who sees both ends 480 00:24:23,070 --> 00:24:25,850 wins, and we're trying to minimize the probability 481 00:24:25,850 --> 00:24:28,220 that that happens over time. 482 00:24:28,220 --> 00:24:31,150 All right, so I've just talked about message passing. 483 00:24:35,464 --> 00:24:37,880 The way you would build that with something like a mix net 484 00:24:37,880 --> 00:24:45,630 is you give each of these relays a public key-- K3, K2, K1. 
485 00:24:45,630 --> 00:24:48,480 And when Alice wants to send something through here, 486 00:24:48,480 --> 00:24:55,110 she would say, encrypt with K3, socks, 487 00:24:55,110 --> 00:24:59,350 and then encrypt that with K2-- I'm 488 00:24:59,350 --> 00:25:01,430 leaving off routing information for now-- 489 00:25:01,430 --> 00:25:04,320 and then encrypt with K1. 490 00:25:04,320 --> 00:25:05,894 But public key, as you know, is kind 491 00:25:05,894 --> 00:25:08,310 of expensive enough that you don't want to use it for bulk 492 00:25:08,310 --> 00:25:10,000 traffic. 493 00:25:10,000 --> 00:25:17,610 So instead what you do is you negotiate 494 00:25:17,610 --> 00:25:20,110 a set of keys with each server. 495 00:25:20,110 --> 00:25:23,350 So Alice shares a symmetric key with this relay, 496 00:25:23,350 --> 00:25:25,100 a different symmetric key with this relay, 497 00:25:25,100 --> 00:25:28,395 and a different symmetric key with this relay associated 498 00:25:28,395 --> 00:25:32,110 in what we call a circuit, which is a path through the network. 499 00:25:32,110 --> 00:25:38,677 And after the initial public key setup to create those keys, 500 00:25:38,677 --> 00:25:40,135 Alice can then use symmetric crypto 501 00:25:40,135 --> 00:25:41,551 to send stuff through the network. 502 00:25:41,551 --> 00:25:43,920 If you stop at that point, then you 503 00:25:43,920 --> 00:25:47,250 have onion routing as it was designed in the 1990s 504 00:25:47,250 --> 00:25:51,955 by Syverson, Goldschlag, and Reed. 505 00:25:51,955 --> 00:25:54,811 And I hope I get the names right. 506 00:25:54,811 --> 00:25:56,060 Paul Syverson is still active. 507 00:25:56,060 --> 00:25:59,210 The other two are working on other things.
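The layering described here, one symmetric key per relay with each relay removing a single layer, can be sketched roughly as follows. This is an illustration only and not Tor's actual cell format or cipher: the keys `s1`, `s2`, `s3`, the `keystream` helper, and the use of SHAKE-256 as a toy stream cipher are all assumptions made for the example.

```python
import hashlib


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream derived from the key (illustration only, not vetted crypto)."""
    return hashlib.shake_256(key).digest(n)


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# Hypothetical symmetric keys Alice negotiated with the three relays on her circuit.
s1, s2, s3 = b"key-for-relay-1", b"key-for-relay-2", b"key-for-relay-3"

message = b"buy socks"

# Alice wraps innermost first: the last relay's layer goes on first.
cell = xor_layer(s1, xor_layer(s2, xor_layer(s3, message)))

# Each relay, in path order, strips exactly one layer.
after_r1 = xor_layer(s1, cell)
after_r2 = xor_layer(s2, after_r1)
after_r3 = xor_layer(s3, after_r2)
```

Only after the third relay removes its layer does the plaintext appear; the first relay sees Alice's address but only ciphertext. In this toy the XOR layers happen to commute, which a real design would not rely on.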
508 00:25:59,210 --> 00:26:03,390 Also, once you've added circuits like that, medium term paths 509 00:26:03,390 --> 00:26:06,910 through the network, you can have an easy reply channel 510 00:26:06,910 --> 00:26:09,310 where things sent back this way get 511 00:26:09,310 --> 00:26:13,155 to Alice being encrypted at each step instead of decrypted 512 00:26:13,155 --> 00:26:15,770 at each step. 513 00:26:15,770 --> 00:26:21,660 And of course you need some kind of integrity checking, 514 00:26:21,660 --> 00:26:24,430 either node by node or end to end. 515 00:26:24,430 --> 00:26:26,280 Because if you don't do integrity checking, 516 00:26:26,280 --> 00:26:31,855 then-- well, let's say you're using an XOR based stream 517 00:26:31,855 --> 00:26:33,622 cypher for your encryption. 518 00:26:33,622 --> 00:26:35,080 If you don't do integrity checking, 519 00:26:35,080 --> 00:26:39,230 then this node can XOR in Alice, Alice, Alice, Alice, 520 00:26:39,230 --> 00:26:42,410 Alice to the encrypted message. 521 00:26:42,410 --> 00:26:44,970 And then when it's finally decrypted over here, 522 00:26:44,970 --> 00:26:47,310 because that's a malleable crypto 523 00:26:47,310 --> 00:26:56,410 scheme, if the same attacker is controlling this node as well, 524 00:26:56,410 --> 00:26:58,970 or if the attacker is observing it here, 525 00:26:58,970 --> 00:27:01,870 the attacker will see Alice, Alice, Alice, Alice, Alice 526 00:27:01,870 --> 00:27:03,820 XORed with a reasonable plain text 527 00:27:03,820 --> 00:27:05,320 and be able to use that to identify, 528 00:27:05,320 --> 00:27:08,580 ah, this is the stream that came from Alice. 529 00:27:08,580 --> 00:27:12,370 So let's do a little more about how the protocol works. 530 00:27:12,370 --> 00:27:14,870 Because it would be a shame to have everybody read the paper 531 00:27:14,870 --> 00:27:16,245 and then not talk about the stuff 532 00:27:16,245 --> 00:27:17,680 that the paper is focused on. 
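The tagging attack just described, XORing "Alice, Alice, Alice" into a malleable stream cipher's output, can be shown in a few lines. This is a sketch under stated assumptions: the single key, the `keystream` helper, and the request text are made up for the example, and only one encryption layer is modeled.

```python
import hashlib


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream (illustration only, not vetted crypto)."""
    return hashlib.shake_256(key).digest(n)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = b"hypothetical-circuit-key"
plaintext = b"GET /socks HTTP/1.0\r\n"
ciphertext = xor_bytes(plaintext, keystream(key, len(plaintext)))

# A malicious relay XORs a recognizable tag into the ciphertext in transit.
tag = (b"Alice" * 5)[: len(ciphertext)]
tampered = xor_bytes(ciphertext, tag)

# With no integrity check, the far end decrypts without noticing anything.
decrypted = xor_bytes(tampered, keystream(key, len(tampered)))

# An observer at the exit who can guess the plaintext recovers the tag.
recovered_tag = xor_bytes(decrypted, plaintext)
```

Because XOR passes straight through the decryption, the tag survives intact, which is why some integrity check (node by node or end to end) is needed.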
533 00:27:24,011 --> 00:27:26,840 Again, I apologize for my blackboard technique. 534 00:27:26,840 --> 00:27:32,120 Most of the time, I'm sitting at home on a desktop. 535 00:27:32,120 --> 00:27:35,385 This is alien tech. 536 00:27:35,385 --> 00:27:38,315 So here's a relay. 537 00:27:38,315 --> 00:27:41,580 Here's Alice. 538 00:27:41,580 --> 00:27:43,610 Here's another relay. 539 00:27:43,610 --> 00:27:44,270 Here's Bob. 540 00:27:44,270 --> 00:27:45,843 Now Alice wants to talk to Bob. 541 00:27:48,460 --> 00:27:52,720 So first thing Alice has to do is build a circuit 542 00:27:52,720 --> 00:27:55,210 through these relays to Bob. 543 00:27:55,210 --> 00:27:57,130 Let's say she's picked these two, R1 and R2. 544 00:27:59,900 --> 00:28:08,050 So Alice first makes a TLS link to R1. 545 00:28:08,050 --> 00:28:10,660 R1, let's say, already has a TLS link to R2. 546 00:28:13,550 --> 00:28:16,335 First thing Alice does is she does 547 00:28:16,335 --> 00:28:25,250 a one-way authenticated one-way anonymous key negotiation. 548 00:28:25,250 --> 00:28:28,340 The old one in Tor is called TAP. 549 00:28:28,340 --> 00:28:30,280 The new one is called NTor. 550 00:28:30,280 --> 00:28:31,980 They both have proofs. 551 00:28:35,032 --> 00:28:36,490 They both even have correct proofs, 552 00:28:36,490 --> 00:28:41,540 although the original proof in the paper had a flaw in it. 553 00:28:41,540 --> 00:28:45,780 But when that's done, she sends a create cell. 554 00:28:45,780 --> 00:28:47,690 And she picks a circuit ID. 555 00:28:47,690 --> 00:28:52,023 Let's say she picks 3, and says, create 3. 556 00:28:54,650 --> 00:28:55,650 The relay says, created. 557 00:29:00,010 --> 00:29:05,575 And now R1 and Alice share a secret key, a symmetric key, 558 00:29:05,575 --> 00:29:06,866 which they're going to call S1. 559 00:29:10,280 --> 00:29:16,234 And they both have this stored as 3 with respect to this link. 560 00:29:19,020 --> 00:29:23,810 Now Alice can use that key to send messages to R1. 
561 00:29:23,810 --> 00:29:27,265 So she says, on 3-- that's the circuit ID that everything 562 00:29:27,265 --> 00:29:38,760 was talking about in the paper-- send a relay extend 563 00:29:38,760 --> 00:29:41,210 with some contents. 564 00:29:41,210 --> 00:29:44,326 The extend cell basically contains the first half 565 00:29:44,326 --> 00:29:47,130 of the create handshake. 566 00:29:47,130 --> 00:29:50,965 But this time, it's not encrypted with R1's public key. 567 00:29:50,965 --> 00:29:53,070 It's encrypted with R2's public key. 568 00:29:53,070 --> 00:29:56,130 And it also says, and this one goes to R2. 569 00:29:56,130 --> 00:30:01,941 So R1 knows to open a new circuit to R2, and says, 570 00:30:01,941 --> 00:30:02,440 create. 571 00:30:05,770 --> 00:30:09,480 And it passes the initial part of the handshake 572 00:30:09,480 --> 00:30:12,120 as it came from Alice along. 573 00:30:12,120 --> 00:30:14,550 And it picks its own circuit ID. 574 00:30:14,550 --> 00:30:17,185 Because circuit IDs identify the different circuits 575 00:30:17,185 --> 00:30:19,122 on this TLS connection. 576 00:30:19,122 --> 00:30:20,830 And Alice doesn't know what other circuit 577 00:30:20,830 --> 00:30:22,120 IDs are in use on this one. 578 00:30:22,120 --> 00:30:24,390 Because this one is private to R1 and R2. 579 00:30:24,390 --> 00:30:28,270 So it might pick 95. 580 00:30:28,270 --> 00:30:30,020 It actually is very unlikely to pick that, 581 00:30:30,020 --> 00:30:36,270 because they're randomly chosen from a 4 byte space. 582 00:30:36,270 --> 00:30:40,780 But I don't want to write out any 32-bit numbers today. 583 00:30:40,780 --> 00:30:43,975 And this says, created in response. 584 00:30:43,975 --> 00:30:48,590 So this one sends back an extended encrypted with S1. 585 00:30:48,590 --> 00:30:58,480 And now Alice and R2 share S2.
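The circuit ID bookkeeping at R1 can be pictured as a small table keyed by link and circuit ID. The sketch below is an assumption-laden illustration, not Tor's data structures: the `Relay` class, its method names, and the link labels are hypothetical, and all encryption is left out so that only the ID mapping (3 on Alice's link, 95 on the R1 to R2 link) is visible.

```python
from dataclasses import dataclass, field


@dataclass
class Relay:
    name: str
    # (incoming link, circuit ID) -> (outgoing link, circuit ID)
    routes: dict = field(default_factory=dict)

    def extend(self, in_link: str, in_circ: int, out_link: str, out_circ: int) -> None:
        """Record that a circuit on one link continues as another ID on the next link."""
        self.routes[(in_link, in_circ)] = (out_link, out_circ)

    def forward(self, in_link: str, in_circ: int, payload: bytes):
        """Look up where a cell goes next; circuit IDs are only meaningful per link."""
        out_link, out_circ = self.routes[(in_link, in_circ)]
        return out_link, out_circ, payload


r1 = Relay("R1")
# Alice chose 3 on her link; R1 chose 95 on its link to R2.
r1.extend("link-to-Alice", 3, "link-to-R2", 95)

hop = r1.forward("link-to-Alice", 3, b"relay cell body")
```

Alice never learns the value 95, and R2 never learns the value 3, since each ID is private to the TLS link it lives on.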
586 00:30:58,480 --> 00:31:01,050 So now Alice can send messages encrypted 587 00:31:01,050 --> 00:31:06,480 first with S2, and then with S1 as relay cells. 588 00:31:06,480 --> 00:31:08,000 So she sends a message like that. 589 00:31:08,000 --> 00:31:12,960 R1 removes the S1 encryption and forwards it on. 590 00:31:12,960 --> 00:31:17,750 It says, OK, it came in on circuit 3. 591 00:31:17,750 --> 00:31:20,370 I know that 3 goes to 95 on this one. 592 00:31:20,370 --> 00:31:23,075 So I send it on 95. 593 00:31:23,075 --> 00:31:25,852 And I say whatever I got after decrypting. 594 00:31:25,852 --> 00:31:28,980 OK, and this one says, ah, I came on 95. 595 00:31:28,980 --> 00:31:33,290 95 corresponds to the shared key S2. 596 00:31:33,290 --> 00:31:34,740 So I'll decrypt with that. 597 00:31:34,740 --> 00:31:38,340 Oh, that says, open a connection to Bob. 598 00:31:38,340 --> 00:31:41,650 And relay 2 opens a TCP connection to Bob 599 00:31:41,650 --> 00:31:45,270 and tells Alice that it did it through the same process. 600 00:31:45,270 --> 00:31:47,150 And Alice says, great. 601 00:31:47,150 --> 00:31:58,440 Tell Bob HTTP 1.0 GET /index.html, and the world goes on. 602 00:31:58,440 --> 00:32:00,120 Let's see, what did I leave out? 603 00:32:00,120 --> 00:32:03,040 I'll skip that, skip that, skip that. 604 00:32:03,040 --> 00:32:04,930 So what do we actually relay? 605 00:32:04,930 --> 00:32:07,210 Some designs in this area say, well, you should 606 00:32:07,210 --> 00:32:08,980 send IP packets back and forth. 607 00:32:08,980 --> 00:32:12,006 This should just be a way to transmit IP packets. 608 00:32:12,006 --> 00:32:15,980 One of the problems with that is we 609 00:32:15,980 --> 00:32:19,070 want to support as many users as possible, which 610 00:32:19,070 --> 00:32:21,580 means we have to run on all kinds of operating systems. 611 00:32:21,580 --> 00:32:23,920 And operating system TCP stacks do not 612 00:32:23,920 --> 00:32:26,020 act anything like each other.
613 00:32:26,020 --> 00:32:27,960 If you've ever used Nmap, or if you've ever 614 00:32:27,960 --> 00:32:30,610 used any kind of network traffic analysis tool, 615 00:32:30,610 --> 00:32:34,635 you can trivially tell Windows TCP from FreeBSD 616 00:32:34,635 --> 00:32:36,880 from Linux TCP. 617 00:32:36,880 --> 00:32:38,990 And you can even tell different versions apart. 618 00:32:38,990 --> 00:32:41,870 And moreover, if you can send raw IP packets 619 00:32:41,870 --> 00:32:45,560 to a chosen host, you can provoke 620 00:32:45,560 --> 00:32:49,810 different responses in part based 621 00:32:49,810 --> 00:32:51,637 on what the host is doing. 622 00:32:51,637 --> 00:32:53,458 So if you're doing IP, you would actually 623 00:32:53,458 --> 00:32:55,900 need an IP normalization layer if IP is what 624 00:32:55,900 --> 00:32:58,630 you transport back and forth. 625 00:32:58,630 --> 00:33:03,730 And it seems that anything less than a full IP stack is not 626 00:33:03,730 --> 00:33:07,017 actually going to work for IP normalization. 627 00:33:07,017 --> 00:33:08,350 So you wouldn't want to do that. 628 00:33:10,880 --> 00:33:13,560 Instead, what we just chose is-- and this is largely 629 00:33:13,560 --> 00:33:15,960 because this is the easiest way-- you take 630 00:33:15,960 --> 00:33:18,230 the contents of TCP streams. 631 00:33:18,230 --> 00:33:25,390 So you just assume each of these things 632 00:33:25,390 --> 00:33:27,610 is reliable and in order. 633 00:33:27,610 --> 00:33:31,430 You have the computer on Alice's end, the program Alice is 634 00:33:31,430 --> 00:33:35,400 running to do all this stuff for her, 635 00:33:35,400 --> 00:33:38,120 accept TCP connections from Alice's applications, 636 00:33:38,120 --> 00:33:40,720 and then just relay their contents 637 00:33:40,720 --> 00:33:44,229 and don't do anything trickier on the network level.
638 00:33:44,229 --> 00:33:46,020 You might be able to get better performance 639 00:33:46,020 --> 00:33:46,970 by trying some other means. 640 00:33:46,970 --> 00:33:48,428 And there are some papers examining 641 00:33:48,428 --> 00:33:49,880 how you would do that. 642 00:33:49,880 --> 00:33:52,820 But this is the one that we could actually implement. 643 00:33:52,820 --> 00:33:54,392 Because we paid a lot more attention 644 00:33:54,392 --> 00:33:56,100 in security and compilers classes than we 645 00:33:56,100 --> 00:33:58,860 did in networking classes. 646 00:33:58,860 --> 00:34:00,760 Now we have networking people. 647 00:34:00,760 --> 00:34:04,285 But in 2003, 2004, we did not have any networking experts. 648 00:34:07,250 --> 00:34:09,030 TCP also seems like the right level. 649 00:34:09,030 --> 00:34:11,594 Higher level protocols-- like in some 650 00:34:11,594 --> 00:34:13,210 of the original [INAUDIBLE] designs, 651 00:34:13,210 --> 00:34:16,389 there were separate proxies at this end for HTTP, 652 00:34:16,389 --> 00:34:19,000 for FTP, and so on. 653 00:34:19,000 --> 00:34:21,889 That seems to be mostly a bad idea. 654 00:34:21,889 --> 00:34:24,060 Because any interesting protocol is 655 00:34:24,060 --> 00:34:26,880 going to have end to end encryption from Alice 656 00:34:26,880 --> 00:34:28,650 all the way to Bob. 657 00:34:28,650 --> 00:34:32,800 That is if we're lucky, Alice is doing a TLS connection 658 00:34:32,800 --> 00:34:40,800 over this to Bob so that TLS properties get her integrity 659 00:34:40,800 --> 00:34:44,110 and secrecy. 660 00:34:44,110 --> 00:34:46,909 But if that's the case, then any kind of anonymizing 661 00:34:46,909 --> 00:34:50,840 transformations you want to apply to the encrypted data 662 00:34:50,840 --> 00:34:53,139 need to happen in the application 663 00:34:53,139 --> 00:34:56,710 Alice is using before the TLS happens entirely. 664 00:34:56,710 --> 00:34:58,637 So you can't really do that in a proxy.
665 00:34:58,637 --> 00:35:00,220 And that's kind of why we came out to, 666 00:35:00,220 --> 00:35:03,370 OK, the sweet spot is TCP contents. 667 00:35:03,370 --> 00:35:08,070 Somebody asked me, OK, but where are your security proofs? 668 00:35:08,070 --> 00:35:11,530 We do have security proofs for a lot of the cryptography that we 669 00:35:11,530 --> 00:35:15,760 use, standard reductions. 670 00:35:15,760 --> 00:35:19,510 For the protocol as a whole, there 671 00:35:19,510 --> 00:35:23,069 are proofs in the field about certain aspects of onion 672 00:35:23,069 --> 00:35:23,710 routing. 673 00:35:23,710 --> 00:35:27,310 But the models that they have to use in order 674 00:35:27,310 --> 00:35:31,170 to prove that this provides anonymity 675 00:35:31,170 --> 00:35:36,890 make assumptions about the universe, the network, 676 00:35:36,890 --> 00:35:41,930 or the attacker's abilities that are so weird as 677 00:35:41,930 --> 00:35:45,710 to satisfy no one but certain program committees of more 678 00:35:45,710 --> 00:35:49,070 theoretical conferences. 679 00:35:49,070 --> 00:35:54,580 The kind of things you can prove is that an attacker who sees 680 00:35:54,580 --> 00:36:02,890 this, who sees a number of streams here all of equal 681 00:36:02,890 --> 00:36:07,140 volume and equal timing, cannot tell which one goes to which 682 00:36:07,140 --> 00:36:11,650 Bob simply by looking at the bytes coming out. 683 00:36:11,650 --> 00:36:14,630 But that's hardly a useful result. 684 00:36:14,630 --> 00:36:17,880 Also, the kind of guarantee you can get from anonymity systems 685 00:36:17,880 --> 00:36:20,319 that we know how to build today-- OK, 686 00:36:20,319 --> 00:36:21,360 I should be careful here. 687 00:36:21,360 --> 00:36:24,780 There are some where you have very strong guarantees 688 00:36:24,780 --> 00:36:26,930 that we do know how to build that you would never 689 00:36:26,930 --> 00:36:28,010 actually want to use.
690 00:36:28,010 --> 00:36:32,490 Like classical [INAUDIBLE] DC-nets, 691 00:36:32,490 --> 00:36:35,200 for instance, provide guaranteed anonymity. 692 00:36:35,200 --> 00:36:37,450 Except any participant can shut down the whole network 693 00:36:37,450 --> 00:36:39,550 by not participating. 694 00:36:39,550 --> 00:36:41,400 That does not scale. 695 00:36:41,400 --> 00:36:42,820 But for the things that we do want 696 00:36:42,820 --> 00:36:46,880 to build these days, for the most part, 697 00:36:46,880 --> 00:36:49,960 the anonymity properties are probabilistic rather 698 00:36:49,960 --> 00:36:52,670 than categorically guarantee-able. 699 00:36:52,670 --> 00:36:56,070 So instead of asking, does this protect 700 00:36:56,070 --> 00:36:58,650 Alice, the kind of questions you could ask 701 00:36:58,650 --> 00:37:02,600 are, under this assumption about attacker capabilities, how 702 00:37:02,600 --> 00:37:04,260 much traffic can Alice safely send 703 00:37:04,260 --> 00:37:10,370 if she wants a 99% chance of not being linked to her activities? 704 00:37:10,370 --> 00:37:13,070 So will anyone actually run these things? 705 00:37:13,070 --> 00:37:15,430 That was an open question when we started. 706 00:37:15,430 --> 00:37:17,430 We didn't know whether the system would actually 707 00:37:17,430 --> 00:37:18,320 take off or not. 708 00:37:18,320 --> 00:37:25,450 So the only [INAUDIBLE] try to see what happens. 709 00:37:25,450 --> 00:37:28,920 We got a fair amount of volunteer operators. 710 00:37:28,920 --> 00:37:33,410 A fair number of non-profits have formed whose sole purpose 711 00:37:33,410 --> 00:37:36,440 is just to take donations and use it to buy bandwidth and run 712 00:37:36,440 --> 00:37:38,890 Tor nodes. 713 00:37:38,890 --> 00:37:40,450 And there are also universities. 714 00:37:40,450 --> 00:37:42,609 There's also private companies.
715 00:37:42,609 --> 00:37:44,650 For a while, [INAUDIBLE] was running a Tor server 716 00:37:44,650 --> 00:37:47,689 out of their security team because they 717 00:37:47,689 --> 00:37:48,480 thought it was fun. 718 00:37:52,360 --> 00:37:54,760 The legal issues there-- again, I'm not a lawyer. 719 00:37:54,760 --> 00:37:55,910 I can't offer legal advice. 720 00:37:55,910 --> 00:37:58,035 But five different people asked about legal issues. 721 00:38:00,192 --> 00:38:01,900 As far as I can tell, in the US at least, 722 00:38:01,900 --> 00:38:04,800 there's no legal impediment to running a Tor server. 723 00:38:04,800 --> 00:38:07,690 And that seems to be the case throughout most of Europe 724 00:38:07,690 --> 00:38:09,580 as far as I'm aware. 725 00:38:09,580 --> 00:38:12,970 In places that generally have less internet freedom, 726 00:38:12,970 --> 00:38:14,670 it's a dicier proposition. 727 00:38:14,670 --> 00:38:16,670 The issues to be concerned about are not, 728 00:38:16,670 --> 00:38:19,180 is it illegal to run a Tor server, 729 00:38:19,180 --> 00:38:24,635 but if somebody does something illegal or undesirable 730 00:38:24,635 --> 00:38:28,180 with my Tor server, will my ISP shut me down, 731 00:38:28,180 --> 00:38:32,846 and will law enforcement believe, oh, 732 00:38:32,846 --> 00:38:34,220 you're just running a Tor server, 733 00:38:34,220 --> 00:38:37,336 or will they seize the computer to make sure? 734 00:38:37,336 --> 00:38:39,710 For those, I would suggest not running the Tor server out 735 00:38:39,710 --> 00:38:42,720 of your dorm room. 736 00:38:42,720 --> 00:38:45,670 Excuse me, don't run an exit out of your dorm room, 737 00:38:45,670 --> 00:38:48,460 or really out of your dorm room, assuming the network policy 738 00:38:48,460 --> 00:38:49,460 allows that. 739 00:38:49,460 --> 00:38:50,650 I have no idea. 740 00:38:50,650 --> 00:38:52,400 They've changed so much since I was a kid. 
741 00:38:55,266 --> 00:38:57,890 Running an exit out of your dorm room could get you in trouble. 742 00:38:57,890 --> 00:39:01,620 But running a non-exit relay that doesn't deliver traffic 743 00:39:01,620 --> 00:39:05,282 to the internet is less likely to create those issues 744 00:39:05,282 --> 00:39:05,865 in particular. 745 00:39:10,140 --> 00:39:12,010 But if you do it in a nice co-lo site, 746 00:39:12,010 --> 00:39:14,730 and you get your ISP's permission, 747 00:39:14,730 --> 00:39:19,840 then it's a pretty reasonable thing to do. 748 00:39:19,840 --> 00:39:23,311 Let's see, someone asked, well, what if users 749 00:39:23,311 --> 00:39:24,560 don't trust a particular node? 750 00:39:24,560 --> 00:39:29,670 And this brings me to my next topic. 751 00:39:29,670 --> 00:39:32,750 So the software the clients use, you can tell it, 752 00:39:32,750 --> 00:39:35,780 don't use this one, don't use this one, only use this one. 753 00:39:35,780 --> 00:39:39,130 But remember that anonymity loves company principle. 754 00:39:39,130 --> 00:39:43,631 If I'm only using three nodes, and you're 755 00:39:43,631 --> 00:39:45,256 using three different nodes, and you're 756 00:39:45,256 --> 00:39:49,550 using three different nodes, our traffic will not mix at all. 757 00:39:49,550 --> 00:39:52,280 To the extent that we partition off which parts of the network 758 00:39:52,280 --> 00:39:55,740 we use, we are distinguishable from one another. 759 00:39:55,740 --> 00:39:57,800 Now, if I just exclude one or two nodes, 760 00:39:57,800 --> 00:40:00,040 and you just exclude one or two nodes, 761 00:40:00,040 --> 00:40:03,120 that's not a big partitioning, and that doesn't help 762 00:40:03,120 --> 00:40:05,270 distinguish-ability that much. 763 00:40:05,270 --> 00:40:08,700 But it would be good to the extent possible to have 764 00:40:08,700 --> 00:40:12,290 everyone using the same nodes. 765 00:40:12,290 --> 00:40:14,880 So all right, how do we accomplish that?
766 00:40:14,880 --> 00:40:16,780 So version one, in the first version of Tor, 767 00:40:16,780 --> 00:40:18,730 we just shipped a list of all of the nodes. 768 00:40:18,730 --> 00:40:21,525 I think there were three of them, or five, or something. 769 00:40:21,525 --> 00:40:22,900 No, I think there were about six, 770 00:40:22,900 --> 00:40:25,910 of which three were all running on the same computer 771 00:40:25,910 --> 00:40:30,142 in a closet at LCS in Tech Square. 772 00:40:30,142 --> 00:40:32,560 All right, so that wasn't a good idea. 773 00:40:32,560 --> 00:40:34,090 Because nodes can go up and down. 774 00:40:34,090 --> 00:40:35,067 Nodes change. 775 00:40:35,067 --> 00:40:36,442 You don't want to have to put out 776 00:40:36,442 --> 00:40:39,005 a new release of your software every time somebody 777 00:40:39,005 --> 00:40:41,160 joins or leaves the network. 778 00:40:41,160 --> 00:40:44,260 So you could just have every node keep 779 00:40:44,260 --> 00:40:46,677 a list of all the other nodes that are connected to it 780 00:40:46,677 --> 00:40:48,010 and all advertise to each other. 781 00:40:48,010 --> 00:40:50,193 And then when a client connects, a client just 782 00:40:50,193 --> 00:40:51,790 has to know one node and then says, 783 00:40:51,790 --> 00:40:53,189 hey, who's on the network? 784 00:40:53,189 --> 00:40:54,730 And actually, a lot of designs people 785 00:40:54,730 --> 00:40:57,320 have built work this way. 786 00:40:57,320 --> 00:40:59,500 A lot of early peer to peer anonymity designs work 787 00:40:59,500 --> 00:41:00,360 this way. 788 00:41:00,360 --> 00:41:01,771 But it's a terrible idea.
789 00:41:01,771 --> 00:41:04,270 Because if you go to one node and say, who's on the network, 790 00:41:04,270 --> 00:41:07,240 and you believe them, well, if I'm that node, I can say, 791 00:41:07,240 --> 00:41:11,070 yes, I'm on the network, and my friend over here 792 00:41:11,070 --> 00:41:14,130 is on the network, and my friend over here is on the network, 793 00:41:14,130 --> 00:41:15,920 and no one else is on the network. 794 00:41:15,920 --> 00:41:18,895 And I can tell you any number of fake nodes 795 00:41:18,895 --> 00:41:22,790 that are all operated by me and capture all of your traffic 796 00:41:22,790 --> 00:41:25,160 that way with what's called a route capture attack. 797 00:41:25,160 --> 00:41:28,480 OK, so maybe we just have a single directory 798 00:41:28,480 --> 00:41:30,470 operated by a trusted party. 799 00:41:30,470 --> 00:41:33,730 That's not so good as a single point of failure. 800 00:41:33,730 --> 00:41:38,210 So OK, let's have multiple trusted parties. 801 00:41:38,210 --> 00:41:41,750 And clients go to these multiple trusted parties 802 00:41:41,750 --> 00:41:43,990 and get a list of all of the nodes from all of them 803 00:41:43,990 --> 00:41:47,010 and combine those lists. 804 00:41:47,010 --> 00:41:49,813 Then you're actually-- first off, 805 00:41:49,813 --> 00:41:51,560 you're partitioned in that case. 806 00:41:51,560 --> 00:41:54,060 If I choose these three, and you choose those three, 807 00:41:54,060 --> 00:41:55,975 and they say anything different, then we'll 808 00:41:55,975 --> 00:41:57,350 be using different sets of nodes. 809 00:41:57,350 --> 00:41:58,820 So that's still not good. 810 00:41:58,820 --> 00:42:01,800 Also, there's still a [INAUDIBLE] 811 00:42:01,800 --> 00:42:08,820 where if I use the intersection of the sets they tell me, 812 00:42:08,820 --> 00:42:11,520 then any one of them can keep me from using a node they 813 00:42:11,520 --> 00:42:13,360 don't like by not listing it.
814 00:42:13,360 --> 00:42:16,700 If I use the union, anyone can flood me 815 00:42:16,700 --> 00:42:21,630 by making 20,000 fake servers that are all on the list. 816 00:42:21,630 --> 00:42:24,545 I might compute the result of some sort of vote 817 00:42:24,545 --> 00:42:26,930 on them, which would solve those two problems. 818 00:42:26,930 --> 00:42:28,890 But I'd still be partitioned from everyone 819 00:42:28,890 --> 00:42:32,580 who's using different trusted parties. 820 00:42:32,580 --> 00:42:35,270 We could do a magical DHT. 821 00:42:35,270 --> 00:42:36,859 Have we done distributed hash tables? 822 00:42:36,859 --> 00:42:39,150 All right, we could do some sort of magical distributed 823 00:42:39,150 --> 00:42:43,930 structure run across all of the nodes. 824 00:42:43,930 --> 00:42:50,140 I say magical, because although there are designs in this area, 825 00:42:50,140 --> 00:42:54,320 and some better than others, none of them 826 00:42:54,320 --> 00:42:58,624 really seem to have a solid security evidence for it 827 00:42:58,624 --> 00:43:00,040 at this point to the point where I 828 00:43:00,040 --> 00:43:04,260 would be comfortable in saying, yes, this is actually secure. 829 00:43:04,260 --> 00:43:06,900 So the solution we wound up with is 830 00:43:06,900 --> 00:43:10,610 have multiple hardened trusted authorities run 831 00:43:10,610 --> 00:43:14,040 by trusted parties that collect lists of nodes 832 00:43:14,040 --> 00:43:17,690 that vote hourly on which nodes are running 833 00:43:17,690 --> 00:43:21,870 that can vote to exclude nodes that seem to be misbehaving 834 00:43:21,870 --> 00:43:25,920 that are all running on the same /16 that are doing 835 00:43:25,920 --> 00:43:29,120 strange things to traffic, and have 836 00:43:29,120 --> 00:43:34,190 them form a consensus that's a result of their votes. 837 00:43:34,190 --> 00:43:36,017 And everybody signs the consensus.
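The difference between taking the intersection, the union, or a vote over the authorities' lists can be sketched in a few lines. This is a simplified illustration, assuming a plain majority rule and made-up relay names; the real voting procedure has more to it than this, and signatures are omitted entirely.

```python
from collections import Counter


def majority_consensus(votes):
    """Keep a relay only if more than half of the authorities listed it."""
    counts = Counter(relay for vote in votes for relay in set(vote))
    threshold = len(votes) // 2
    return sorted(r for r, c in counts.items() if c > threshold)


# Hypothetical votes from three directory authorities.
votes = [
    {"relayA", "relayB", "relayC"},
    {"relayA", "relayB", "fake1", "fake2"},  # this authority tries to flood the list
    {"relayA", "relayC"},                    # this authority omits relayB
]

consensus = majority_consensus(votes)
union = sorted(set().union(*votes))
intersection = sorted(set.intersection(*votes))
```

Here the union admits the fake entries, the intersection lets a single authority veto `relayB` and `relayC`, and the majority vote resists both. Every client that uses the same authorities computes the same list, which avoids the partitioning problem.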
838 00:43:36,017 --> 00:43:37,517 And clients don't use it unless it's 839 00:43:37,517 --> 00:43:39,490 signed by enough authorities. 840 00:43:39,490 --> 00:43:40,940 This is not the final design. 841 00:43:40,940 --> 00:43:44,670 But it's the best we've managed to come up with so far. 842 00:43:44,670 --> 00:43:46,630 And this way, all you need to distribute 843 00:43:46,630 --> 00:43:51,880 with clients is a list of all of the authorities' public keys 844 00:43:51,880 --> 00:43:54,210 and some places to get the directories. 845 00:43:54,210 --> 00:43:58,120 You want to have all the nodes cache these directory things. 846 00:43:58,120 --> 00:44:00,604 Because if you don't, the bandwidth load on authorities 847 00:44:00,604 --> 00:44:01,270 is catastrophic. 848 00:44:04,320 --> 00:44:06,050 So I'm going to skip over that. 849 00:44:06,050 --> 00:44:11,260 Because I would love to talk about how 850 00:44:11,260 --> 00:44:13,295 clients should choose which paths 851 00:44:13,295 --> 00:44:14,800 to build through the network. 852 00:44:14,800 --> 00:44:17,560 I would love to talk about application issues 853 00:44:17,560 --> 00:44:20,382 and making applications not deanonymize themselves. 854 00:44:20,382 --> 00:44:21,590 I'd love to talk about abuse. 855 00:44:21,590 --> 00:44:24,470 I'd love to talk about hidden services and how they work. 856 00:44:24,470 --> 00:44:27,210 I'd love to talk about censorship resistance. 857 00:44:27,210 --> 00:44:30,540 And I'd like to talk about attacks and defenses. 858 00:44:30,540 --> 00:44:34,230 But I've only got 35 minutes. 859 00:44:34,230 --> 00:44:36,280 And I can't possibly cover all of these. 860 00:44:36,280 --> 00:44:38,490 So show of hands for how many people 861 00:44:38,490 --> 00:44:42,500 think the most important-- think about what you think 862 00:44:42,500 --> 00:44:45,584 are the two most important topics on this list.
863 00:44:45,584 --> 00:44:47,250 If one of your two most important topics 864 00:44:47,250 --> 00:44:49,041 is path selection and how you choose nodes, 865 00:44:49,041 --> 00:44:51,500 please raise your hand. 866 00:44:51,500 --> 00:44:53,550 If one of your two most important topics 867 00:44:53,550 --> 00:44:57,370 is application issues and how to make applications not 868 00:44:57,370 --> 00:45:00,044 bust your anonymity, please raise your hand. 869 00:45:00,044 --> 00:45:02,020 If one of your most important issues 870 00:45:02,020 --> 00:45:05,700 is abuse and what kind of abuse we see, how you can prevent it, 871 00:45:05,700 --> 00:45:08,294 and how that works out, please raise your hand. 872 00:45:08,294 --> 00:45:11,651 OK, that one's popular. 873 00:45:11,651 --> 00:45:13,150 If one of your most important topics 874 00:45:13,150 --> 00:45:14,566 is how hidden services work and how 875 00:45:14,566 --> 00:45:17,280 they can be made to work better, please raise your hand. 876 00:45:17,280 --> 00:45:19,530 Wow, that's much more popular on this side of the room 877 00:45:19,530 --> 00:45:20,654 than that side of the room. 878 00:45:20,654 --> 00:45:23,162 What's going on? 879 00:45:23,162 --> 00:45:24,820 You guys in a club? 880 00:45:24,820 --> 00:45:26,926 Are you up to something? 881 00:45:26,926 --> 00:45:29,610 Censorship, who's interested in censorship? 882 00:45:29,610 --> 00:45:32,880 OK, that's fairly popular. 883 00:45:32,880 --> 00:45:36,170 Attacks and defenses? 884 00:45:36,170 --> 00:45:39,530 OK, so we're not doing paths and we're not doing apps. 885 00:45:39,530 --> 00:45:44,600 So paths-- guard nodes, guard nodes, see guard node designs, 886 00:45:44,600 --> 00:45:46,240 select by bandwidth. 887 00:45:46,240 --> 00:45:48,230 You need to actually weight by bandwidth, 888 00:45:48,230 --> 00:45:51,200 but you also need a trusted way to measure bandwidth.
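The "weight by bandwidth" point can be sketched roughly as follows. This is a minimal illustration, not Tor's actual path-selection code: the relay names, the bandwidth figures, and the helpers `pick_relay` and `pick_counts` are all hypothetical, and the sketch simply assumes the bandwidth numbers have already been measured by someone trustworthy.

```python
import random

# Hypothetical relay list: (nickname, measured bandwidth in KB/s).
# In a real network these numbers would have to come from a trusted measurement.
RELAYS = [("relayA", 10_000), ("relayB", 1_000), ("relayC", 100)]


def pick_relay(relays, rng=random):
    """Pick one relay with probability proportional to its bandwidth."""
    names = [name for name, _ in relays]
    weights = [bandwidth for _, bandwidth in relays]
    return rng.choices(names, weights=weights, k=1)[0]


def pick_counts(relays, n, seed=0):
    """Count how often each relay is chosen over n independent picks."""
    rng = random.Random(seed)
    counts = {name: 0 for name, _ in relays}
    for _ in range(n):
        counts[pick_relay(relays, rng)] += 1
    return counts


if __name__ == "__main__":
    # Faster relays should be chosen far more often than slow ones.
    print(pick_counts(RELAYS, 10_000))
```

A relay that lies about its bandwidth would attract more traffic under this scheme, which is why the measurement itself has to be trusted.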
889 00:45:51,200 --> 00:45:55,025 And that's the too long, didn't lecture of what 890 00:45:55,025 --> 00:45:56,150 would be on path selection. 891 00:45:56,150 --> 00:45:59,555 For application issues, almost no protocol 892 00:45:59,555 --> 00:46:03,630 is actually designed to provide anonymity. 893 00:46:03,630 --> 00:46:06,530 Because almost every protocol that's widely used 894 00:46:06,530 --> 00:46:08,324 has the assumption in it, well, you 895 00:46:08,324 --> 00:46:09,740 know, anyone who wants to can just 896 00:46:09,740 --> 00:46:12,500 see the IPs on this traffic. 897 00:46:12,500 --> 00:46:16,030 So there's no point in trying to conceal identity. 898 00:46:16,030 --> 00:46:18,900 So in a particularly complex protocol, 899 00:46:18,900 --> 00:46:22,320 like the whole stack of protocols a web browser uses, 900 00:46:22,320 --> 00:46:24,020 there's no real way to anonymize that 901 00:46:24,020 --> 00:46:27,400 just by anonymizing the traffic with something like Tor. 902 00:46:27,400 --> 00:46:30,150 You need to hack the web browser pretty hard 903 00:46:30,150 --> 00:46:32,810 to make it stop doing things like leaking the list of fonts 904 00:46:32,810 --> 00:46:34,830 that are installed on your system, 905 00:46:34,830 --> 00:46:38,540 leaking your exact window size, allowing 906 00:46:38,540 --> 00:46:41,780 all kinds of permanent cookie-like structures, 907 00:46:41,780 --> 00:46:44,740 leaking what's in the cache and what's not in the cache, 908 00:46:44,740 --> 00:46:46,250 and so on. 909 00:46:46,250 --> 00:46:48,680 So your choices there are basically 910 00:46:48,680 --> 00:46:52,180 isolate everything and restart from a fresh VM all the time, 911 00:46:52,180 --> 00:46:53,514 or rewrite the browser, or both. 912 00:46:53,514 --> 00:46:55,513 Other things are a lot easier than web browsers, 913 00:46:55,513 --> 00:46:56,460 but still problematic. 914 00:46:56,460 --> 00:47:00,780 That's all I'm going to say about app issues.
915 00:47:00,780 --> 00:47:02,850 Let's see, I think I got the most 916 00:47:02,850 --> 00:47:05,142 hands-- did you see what I got the most hands for, 917 00:47:05,142 --> 00:47:06,624 any opinions? 918 00:47:06,624 --> 00:47:08,083 STUDENT: Abuse and hidden services? 919 00:47:08,083 --> 00:47:09,832 NICK MATHEWSON: Abuse and hidden services. 920 00:47:09,832 --> 00:47:12,277 All right, I'll talk about abuse and hidden services. 921 00:47:12,277 --> 00:47:15,200 And if I've still got time, I'll do censorship and attacks. 922 00:47:15,200 --> 00:47:19,185 So let's go to abuse-- abuse, abuse, abuse. 923 00:47:22,420 --> 00:47:26,960 So one problem that we've fortunately not 924 00:47:26,960 --> 00:47:30,707 had all that much of-- so when we were working on this stuff, 925 00:47:30,707 --> 00:47:32,490 the problem that everybody was afraid of 926 00:47:32,490 --> 00:47:34,698 was this horrible stuff that would get you kicked off 927 00:47:34,698 --> 00:47:37,580 of any ISP, and it would create tremendous legal issues 928 00:47:37,580 --> 00:47:38,750 and ruin your lives. 929 00:47:38,750 --> 00:47:41,360 I speak of course of file sharing. 930 00:47:41,360 --> 00:47:43,540 We were terrified that people would 931 00:47:43,540 --> 00:47:48,200 try to BitTorrent or Gnutella or whatever over this thing. 932 00:47:48,200 --> 00:47:49,760 Yes, it was a long time ago. 933 00:47:49,760 --> 00:47:52,990 And we thought about how we'd do that. 934 00:47:52,990 --> 00:47:55,470 Well, you'll see in the paper that we talk a lot about exit 935 00:47:55,470 --> 00:47:58,140 policies, about letting exit nodes say, 936 00:47:58,140 --> 00:48:03,040 I only allow connections to port 80 and port 443. 937 00:48:03,040 --> 00:48:05,850 This doesn't actually help with abuse at all. 938 00:48:05,850 --> 00:48:15,800 Because you can try to spread worms over port 80. 
939 00:48:15,800 --> 00:48:21,897 You can post abusive stuff to IRC channels over web 940 00:48:21,897 --> 00:48:23,710 to IRC interfaces. 941 00:48:23,710 --> 00:48:26,140 Everything's got a web interface these days. 942 00:48:26,140 --> 00:48:29,340 So you can't really say, it's only web. 943 00:48:29,340 --> 00:48:30,400 It's safe. 944 00:48:30,400 --> 00:48:33,040 If it's useful, it can be abused. 945 00:48:33,040 --> 00:48:35,450 That said, there are people who are 946 00:48:35,450 --> 00:48:39,000 willing to run exits that deliver 80 and 443 947 00:48:39,000 --> 00:48:42,547 who would not be willing to run exits delivering all ports. 948 00:48:42,547 --> 00:48:43,880 So it did turn out to be useful. 949 00:48:43,880 --> 00:48:45,588 It just didn't turn out to be a solution. 950 00:48:49,010 --> 00:48:54,699 Another thing that creates problems is criminal activity 951 00:48:54,699 --> 00:48:56,740 generally doesn't create problems for the network 952 00:48:56,740 --> 00:48:58,560 operators so much. 953 00:48:58,560 --> 00:49:01,750 From time to time, somebody's server gets seized and returned 954 00:49:01,750 --> 00:49:04,550 six months later, and they have to wipe the thing. 955 00:49:04,550 --> 00:49:07,430 That's still an infrequent enough occurrence 956 00:49:07,430 --> 00:49:12,950 that it's somewhat surprising when it happens. 957 00:49:12,950 --> 00:49:16,050 And so yeah, don't run an exit node on a server 958 00:49:16,050 --> 00:49:19,185 that you need to graduate. 959 00:49:23,165 --> 00:49:23,665 What else? 
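The exit-policy idea mentioned above (an exit that only allows ports 80 and 443) can be sketched as a small first-match-wins rule list. This is an illustrative toy, not Tor's parser: the helpers `parse_policy` and `allows` are hypothetical, only wildcard addresses are handled, and rejecting when no rule matches is a choice made for this sketch.

```python
def parse_policy(lines):
    """Parse rules such as 'accept *:80' into (is_accept, port) pairs.

    Only the wildcard address '*' is handled in this sketch.
    """
    rules = []
    for line in lines:
        action, target = line.split()
        _address, port = target.rsplit(":", 1)
        rules.append((action == "accept", port))
    return rules


def allows(rules, port):
    """Return the verdict of the first rule whose port matches."""
    for is_accept, rule_port in rules:
        if rule_port == "*" or int(rule_port) == port:
            return is_accept
    return False  # sketch-level assumption: no matching rule means reject


# A web-only exit policy: ports 80 and 443, nothing else.
WEB_ONLY = parse_policy(["accept *:80", "accept *:443", "reject *:*"])

if __name__ == "__main__":
    for port in (80, 443, 6667):
        print(port, allows(WEB_ONLY, port))
```

As the lecture notes, a policy like this limits which ports an exit will carry, but it says nothing about what is sent over port 80, so it does not by itself stop abuse.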
960 00:49:27,670 --> 00:49:31,210 The biggest problem that we have for abuse of stuff 961 00:49:31,210 --> 00:49:34,260 is that many websites around the world, 962 00:49:34,260 --> 00:49:36,200 and many IRC services and so on, 963 00:49:36,200 --> 00:49:42,210 use IP-based blocking in order to deter and mitigate 964 00:49:42,210 --> 00:49:50,680 abusive behavior-- people posting road kill pictures 965 00:49:50,680 --> 00:49:56,160 on My Little Pony sites, people flaming everybody 966 00:49:56,160 --> 00:49:59,690 on IRC channels, people making love, 967 00:49:59,690 --> 00:50:05,300 leave, join requests, people replacing entire Wikipedia 968 00:50:05,300 --> 00:50:08,896 pages with racial slurs. 969 00:50:08,896 --> 00:50:09,770 This stuff, it's real. 970 00:50:09,770 --> 00:50:10,478 It's problematic. 971 00:50:10,478 --> 00:50:13,560 It's unacceptable to the websites and services 972 00:50:13,560 --> 00:50:15,580 that use IP-based blocking. 973 00:50:15,580 --> 00:50:18,140 They need a way to keep this from happening. 974 00:50:18,140 --> 00:50:21,950 And IP-based blocking is a cheap way for them to do that. 975 00:50:21,950 --> 00:50:27,230 So it's pretty frequent that Tor users get banned completely 976 00:50:27,230 --> 00:50:30,340 from some sites. 977 00:50:30,340 --> 00:50:36,370 There's some work on trying to say, well, why does IP-based 978 00:50:36,370 --> 00:50:37,330 blocking really work? 979 00:50:37,330 --> 00:50:40,690 Is it because IPs are people? 980 00:50:40,690 --> 00:50:41,310 No. 981 00:50:41,310 --> 00:50:44,295 Everybody in this room knows how to get a different IP 982 00:50:44,295 --> 00:50:45,710 if they need one. 983 00:50:45,710 --> 00:50:49,540 Everybody in this room knows how to get like tens of thousands 984 00:50:49,540 --> 00:50:51,550 of different IPs if they need one, 985 00:50:51,550 --> 00:50:53,180 if they need tens of thousands.
986 00:50:53,180 --> 00:50:56,680 But for most people, getting more IPs 987 00:50:56,680 --> 00:50:59,720 is at least a little time consuming and at least 988 00:50:59,720 --> 00:51:03,265 a little challenging to the extent that it imposes a rate 989 00:51:03,265 --> 00:51:05,660 limit and a resource cost on abuse 990 00:51:05,660 --> 00:51:08,940 if you don't want a botnet and if they've already blocked 991 00:51:08,940 --> 00:51:12,110 Tor and all the other proxy services. 992 00:51:12,110 --> 00:51:16,850 So for that, you need to look at different ways 993 00:51:16,850 --> 00:51:20,380 to provide other resource costs. 994 00:51:20,380 --> 00:51:24,970 You can either say, well-- have you done blind signatures? 995 00:51:24,970 --> 00:51:28,740 Oh, you can construct things so that you 996 00:51:28,740 --> 00:51:31,210 need an IP to make an account. 997 00:51:31,210 --> 00:51:33,620 But what account you make with an IP 998 00:51:33,620 --> 00:51:37,250 is not linkable to your IP. 999 00:51:37,250 --> 00:51:39,277 And then later on if the account gets banned, 1000 00:51:39,277 --> 00:51:41,670 you need to create a new account from a different IP. 1001 00:51:41,670 --> 00:51:44,211 That's something you can build, and we're working with people 1002 00:51:44,211 --> 00:51:47,890 to work on it, although it needs more hacking on the integration 1003 00:51:47,890 --> 00:51:48,630 side. 1004 00:51:48,630 --> 00:51:51,213 Something else that needs more hacking on the integration side 1005 00:51:51,213 --> 00:51:54,387 is anonymous blacklistable credentials. 1006 00:51:54,387 --> 00:51:55,470 They're a little esoteric. 1007 00:51:55,470 --> 00:52:02,220 But the idea is that you get something 1008 00:52:02,220 --> 00:52:05,780 that allows you to participate on an IRC server, for example. 1009 00:52:05,780 --> 00:52:08,080 You can use this as many times as you want. 1010 00:52:08,080 --> 00:52:12,380 Your using it is not linkable until you are banned.
1011 00:52:12,380 --> 00:52:14,580 Once you are banned, future attempts 1012 00:52:14,580 --> 00:52:18,000 from the same person with the same credential don't work. 1013 00:52:18,000 --> 00:52:21,840 But past activities do not become linkable to one another. 1014 00:52:21,840 --> 00:52:24,090 These can be built pretty easily. 1015 00:52:24,090 --> 00:52:26,730 The problem is convincing people who are more or less satisfied 1016 00:52:26,730 --> 00:52:29,300 with IP blocking to actually use them 1017 00:52:29,300 --> 00:52:32,965 and actually integrating them with services. 1018 00:52:32,965 --> 00:52:36,170 Someone inevitably asks me-- it's kind of neat. 1019 00:52:36,170 --> 00:52:43,310 So I started these lecture notes based on my lecture from 2013. 1020 00:52:43,310 --> 00:52:46,110 And there was something about the inevitable question 1021 00:52:46,110 --> 00:52:48,660 about Silk Road 1 getting busted. 1022 00:52:48,660 --> 00:52:50,885 There's the inevitable question about Silk Road 2 1023 00:52:50,885 --> 00:52:51,510 getting busted. 1024 00:52:51,510 --> 00:52:55,880 Silk Road 2 was a hidden service operating on the Tor network 1025 00:52:55,880 --> 00:52:58,650 where people would get together to buy and sell 1026 00:52:58,650 --> 00:53:03,480 illegal things, mostly illegal drugs. 1027 00:53:03,480 --> 00:53:06,360 So as far as we know, as far as we can find out, 1028 00:53:06,360 --> 00:53:10,050 the guy got busted through bad OPSEC. 1029 00:53:10,050 --> 00:53:13,810 Like he made a public posting with his actual name, 1030 00:53:13,810 --> 00:53:17,430 and then went and deleted it and put his pseudonym on it. 1031 00:53:17,430 --> 00:53:20,363 Tor can't help people against that kind of stuff. 
1032 00:53:20,363 --> 00:53:23,520 On the other hand, if you've been looking at the NSA leaks, 1033 00:53:23,520 --> 00:53:26,640 you know that law enforcement has been getting information 1034 00:53:26,640 --> 00:53:29,620 from intelligence and then sanitizing it 1035 00:53:29,620 --> 00:53:33,495 through a process called dual construction where 1036 00:53:33,495 --> 00:53:36,120 the intelligence agency will say to the law enforcement agency, 1037 00:53:36,120 --> 00:53:38,390 OK, look, it's Fred over there. 1038 00:53:38,390 --> 00:53:39,480 He did it. 1039 00:53:39,480 --> 00:53:41,482 But that's not admissible in a court, 1040 00:53:41,482 --> 00:53:43,190 and you can never admit that we told you. 1041 00:53:43,190 --> 00:53:46,125 Just find some other way to find out that Fred did it, 1042 00:53:46,125 --> 00:53:48,120 but Fred did it. 1043 00:53:48,120 --> 00:53:50,210 According to some of the Snowden leaks 1044 00:53:50,210 --> 00:53:52,380 and some of the leaks from the other guy, who 1045 00:53:52,380 --> 00:53:59,910 has still not been caught, that's done sometimes. 1046 00:53:59,910 --> 00:54:05,960 So OK, at this point, you use your basic Bayesian reasoning 1047 00:54:05,960 --> 00:54:08,850 skills, and you say, well OK, would I 1048 00:54:08,850 --> 00:54:11,090 see this evidence if the guy actually 1049 00:54:11,090 --> 00:54:13,040 got caught because of OPSEC? 1050 00:54:13,040 --> 00:54:14,490 Yes, I would. 1051 00:54:14,490 --> 00:54:15,720 I would see bad OPSEC. 1052 00:54:15,720 --> 00:54:19,880 I would see reports that he got caught because of bad OPSEC. 1053 00:54:19,880 --> 00:54:24,410 But what would I see if it were a dual construction case? 1054 00:54:24,410 --> 00:54:27,100 I would also see reports that the guy 1055 00:54:27,100 --> 00:54:29,450 got caught by bad OPSEC. 1056 00:54:29,450 --> 00:54:32,100 Because the evidence that would be available to 1057 00:54:32,100 --> 00:54:33,970 us is the same in either case.
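The Bayesian step here can be written out in odds form. The symbols are introduced only for this illustration: $E$ is the evidence (public reports of bad OPSEC), $O$ is the hypothesis that he was really caught through bad OPSEC, and $D$ is the hypothesis that it was a dual construction case.

```latex
\[
\frac{P(D \mid E)}{P(O \mid E)}
  = \frac{P(E \mid D)}{P(E \mid O)} \cdot \frac{P(D)}{P(O)}
\]
% If the same reports would appear under either hypothesis,
% then P(E | D) is approximately P(E | O), the likelihood ratio is about 1, and
\[
\frac{P(D \mid E)}{P(O \mid E)} \approx \frac{P(D)}{P(O)} .
\]
```

In other words, when the likelihood ratio is close to 1, the posterior odds stay close to the prior odds, so the reports barely shift belief in either direction.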
1058 00:54:33,970 --> 00:54:38,185 We can't really conclude much from any public reports 1059 00:54:38,185 --> 00:54:39,940 of that. 1060 00:54:39,940 --> 00:54:44,521 That said, it does look like the guy got busted by bad OPSEC. 1061 00:54:44,521 --> 00:54:46,145 It does look like the kind of bad OPSEC 1062 00:54:46,145 --> 00:54:48,000 that you would be looking for if you 1063 00:54:48,000 --> 00:54:51,210 were trying to catch somebody running something like this. 1064 00:54:51,210 --> 00:54:54,620 Nevertheless, earlier I suggested that please do not 1065 00:54:54,620 --> 00:54:58,130 use my software to break any laws. 1066 00:54:58,130 --> 00:55:05,380 Also if your life or freedom is at stake from using 1067 00:55:05,380 --> 00:55:09,665 Tor or any security product, do not 1068 00:55:09,665 --> 00:55:11,180 use that product in isolation. 1069 00:55:11,180 --> 00:55:14,810 Think of ways to use it to construct 1070 00:55:14,810 --> 00:55:21,330 a series of redundant defenses for yourself 1071 00:55:21,330 --> 00:55:23,830 if your life or freedom is at stake, 1072 00:55:23,830 --> 00:55:27,050 or if having the system broken is 1073 00:55:27,050 --> 00:55:28,780 completely unacceptable to you. 1074 00:55:28,780 --> 00:55:30,024 And I'll say that about Tor. 1075 00:55:30,024 --> 00:55:31,190 And I'll say that about TLS. 1076 00:55:31,190 --> 00:55:33,590 And I'll say that about PGP. 1077 00:55:33,590 --> 00:55:38,620 Software is a work in progress. 1078 00:55:38,620 --> 00:55:41,065 So that's the abuse section. 1079 00:55:41,065 --> 00:55:44,870 I've got 25 minutes-- hidden services. 1080 00:55:47,750 --> 00:55:50,490 Where's hidden services? 1081 00:55:50,490 --> 00:55:53,620 So responder anonymity is a much harder problem 1082 00:55:53,620 --> 00:55:55,640 than initiator anonymity.
1083 00:55:55,640 --> 00:55:57,300 Initiator anonymity is what you get 1084 00:55:57,300 --> 00:56:00,210 when Alice wants to buy socks, and Alice 1085 00:56:00,210 --> 00:56:02,580 wants to stay anonymous from the sock vendor. 1086 00:56:02,580 --> 00:56:05,200 Responder anonymity is when Alice 1087 00:56:05,200 --> 00:56:09,300 wants to publish her poetry online and run a web 1088 00:56:09,300 --> 00:56:11,190 server that has her poetry on it, 1089 00:56:11,190 --> 00:56:14,150 but not let anyone know where that web server is 1090 00:56:14,150 --> 00:56:16,680 because the poetry is so embarrassing. 1091 00:56:16,680 --> 00:56:19,360 And yes there actually is a hidden service 1092 00:56:19,360 --> 00:56:21,710 out there of mine with bad poetry on it. 1093 00:56:21,710 --> 00:56:24,070 No, I don't think anybody's actually published it yet. 1094 00:56:24,070 --> 00:56:26,490 No, I'm not going to tell anybody where it is. 1095 00:56:26,490 --> 00:56:27,990 I'm waiting for it to go public. 1096 00:56:31,390 --> 00:56:37,920 So all right, one thing you could do is-- let's 1097 00:56:37,920 --> 00:56:39,351 see, how much time? 1098 00:56:39,351 --> 00:56:43,650 OK, I can do this. 1099 00:56:43,650 --> 00:56:46,622 So now Alice wants to publish her poetry. 1100 00:56:46,622 --> 00:56:48,205 So I'm going to put Alice on this end, 1101 00:56:48,205 --> 00:56:49,450 because she's the responder. 1102 00:56:49,450 --> 00:56:54,080 Alice could build a path-- this represents a lot of relays-- 1103 00:56:54,080 --> 00:56:59,052 through the Tor network, and then just say to this relay, 1104 00:56:59,052 --> 00:57:00,135 please accept connections. 1105 00:57:02,660 --> 00:57:05,600 So now anyone who goes to this relay could say, 1106 00:57:05,600 --> 00:57:07,770 hey, I want to talk to Alice. 1107 00:57:07,770 --> 00:57:10,180 And there have been designs that work this way. 1108 00:57:10,180 --> 00:57:12,620 It has some challenges, though. 
1109 00:57:12,620 --> 00:57:15,185 One challenge is this relay could man in the middle 1110 00:57:15,185 --> 00:57:19,920 all the traffic unless there is a well known TLS key. 1111 00:57:19,920 --> 00:57:22,400 Another thing is maybe this relay 1112 00:57:22,400 --> 00:57:24,396 is also embarrassed by the poetry 1113 00:57:24,396 --> 00:57:26,020 and doesn't want to be a public contact 1114 00:57:26,020 --> 00:57:31,160 point for poetry so terrible. 1115 00:57:31,160 --> 00:57:35,280 So this relay could also be pressured by other people who 1116 00:57:35,280 --> 00:57:37,760 hate the poetry to censor it. 1117 00:57:37,760 --> 00:57:41,940 This relay could also make itself an attack target. 1118 00:57:41,940 --> 00:57:45,130 So you want some way where Alice can go to different relays over 1119 00:57:45,130 --> 00:57:51,170 time and no single relay is touching unencrypted traffic 1120 00:57:51,170 --> 00:57:52,480 of Alice's. 1121 00:57:52,480 --> 00:57:56,620 All right, that's doable. 1122 00:57:56,620 --> 00:57:58,510 But once you have a lot of different relays, 1123 00:57:58,510 --> 00:58:01,790 what does Alice actually tell people? 1124 00:58:01,790 --> 00:58:04,490 It's kind of got to be a public key. 1125 00:58:04,490 --> 00:58:08,250 Because if she just says, relay x, relay y, relay z, but x, y, 1126 00:58:08,250 --> 00:58:11,530 and z are changing every five minutes, 1127 00:58:11,530 --> 00:58:13,920 that's kind of challenging to know you actually 1128 00:58:13,920 --> 00:58:15,570 got the right relay. 1129 00:58:15,570 --> 00:58:17,590 So let's say she tells everybody a public key, 1130 00:58:17,590 --> 00:58:22,550 and once she gets over here, she says, hey, this is Alice. 1131 00:58:22,550 --> 00:58:24,090 I'll prove it with my public key. 1132 00:58:24,090 --> 00:58:33,960 So this relay knows that public key z is 1133 00:58:33,960 --> 00:58:35,380 running a hidden service here. 
1134 00:58:35,380 --> 00:58:38,330 And so if anyone else says, hey, connect me to public key z, 1135 00:58:38,330 --> 00:58:41,130 they can do a handshake and wind up 1136 00:58:41,130 --> 00:58:43,170 with a shared key with Alice. 1137 00:58:43,170 --> 00:58:46,260 And it's the same handshake as the Tor circuit extension uses. 1138 00:58:46,260 --> 00:58:48,590 And now Bob can read Alice's poetry 1139 00:58:48,590 --> 00:58:52,190 by going another path through the Tor network over here. 1140 00:58:52,190 --> 00:58:57,045 Bob has to know PKz, and Bob can say, hey, connect me with PKz. 1141 00:58:57,045 --> 00:58:59,170 Send this thing that's sort of like a create cell-- 1142 00:58:59,170 --> 00:59:01,380 really it's an introduce cell, but let's 1143 00:59:01,380 --> 00:59:03,380 forget that-- over to Alice. 1144 00:59:03,380 --> 00:59:05,820 They do the same handshake that relays do. 1145 00:59:05,820 --> 00:59:07,913 And now they have a shared key that they can 1146 00:59:07,913 --> 00:59:10,100 use for end to end encryption. 1147 00:59:10,100 --> 00:59:11,915 Well, there's something I left out, though, 1148 00:59:11,915 --> 00:59:15,120 which is, how does Bob know how to go here? 1149 00:59:15,120 --> 00:59:17,082 And can we do anything about the fact 1150 00:59:17,082 --> 00:59:22,480 that this relay has to learn this public key? 1151 00:59:22,480 --> 00:59:23,070 Well, we can. 1152 00:59:23,070 --> 00:59:27,730 We can add some [INAUDIBLE] directory system 1153 00:59:27,730 --> 00:59:32,745 where Alice uploads a signed statement anonymously over Tor 1154 00:59:32,745 --> 00:59:38,725 saying PKz is at a relay x. 1155 00:59:41,590 --> 00:59:44,620 And then Bob says, hey, give me a signed statement 1156 00:59:44,620 --> 00:59:46,520 to ask the directory system, hey, give me 1157 00:59:46,520 --> 00:59:49,940 a signed statement about PKz. 1158 00:59:49,940 --> 00:59:52,376 And Bob finds out where to go.
1159 00:59:52,376 --> 00:59:56,740 And we could even do one better and have Alice give 1160 00:59:56,740 --> 00:59:59,250 a different public key here. 1161 00:59:59,250 --> 01:00:00,890 So this could be PKw. 1162 01:00:04,660 --> 01:00:09,840 And the statement she uploads to the directory can say, 1163 01:00:09,840 --> 01:00:12,730 if you want to talk to the service with public key z, 1164 01:00:12,730 --> 01:00:16,560 then go to relay x and use public key w. 1165 01:00:16,560 --> 01:00:21,820 And now public key z isn't published here. 1166 01:00:21,820 --> 01:00:26,590 You could even go one farther and encrypt this 1167 01:00:26,590 --> 01:00:29,480 with some shared secret known to Alice and Bob. 1168 01:00:29,480 --> 01:00:32,330 And if you do that, then the directory service 1169 01:00:32,330 --> 01:00:34,990 and people who can contact the directory service 1170 01:00:34,990 --> 01:00:39,530 can't learn how to connect to Alice with that. 1171 01:00:39,530 --> 01:00:40,030 Yeah. 1172 01:00:40,030 --> 01:00:42,190 STUDENT: Just a quick question there. 1173 01:00:42,190 --> 01:00:44,850 If that's not encrypted, then Rx can still 1174 01:00:44,850 --> 01:00:48,010 find out that it's running a service for Alice, right? 1175 01:00:48,010 --> 01:00:48,890 NICK MATHEWSON: Yep. 1176 01:00:48,890 --> 01:00:49,934 Well, not for Alice. 1177 01:00:49,934 --> 01:00:51,475 It can find out that it's running PKz 1178 01:00:51,475 --> 01:00:53,060 if this is not encrypted. 1179 01:00:53,060 --> 01:00:55,680 We have a design for that that I'm actually going 1180 01:00:55,680 --> 01:00:56,950 to get to at the end of this. 1181 01:00:56,950 --> 01:00:58,740 But it's not built yet. 1182 01:00:58,740 --> 01:01:01,040 But it's pretty cool. 1183 01:01:01,040 --> 01:01:03,535 So OK, and you don't want to use a centralized directory 1184 01:01:03,535 --> 01:01:04,460 for this. 
1185 01:01:04,460 --> 01:01:12,280 So we actually do use a DHT, which is, again, not perfect, 1186 01:01:12,280 --> 01:01:14,370 and has some censorship opportunities. 1187 01:01:14,370 --> 01:01:16,966 But we are trying to make those less and less. 1188 01:01:16,966 --> 01:01:19,700 And I might cover more stuff, so I can't do the whole details. 1189 01:01:22,510 --> 01:01:24,860 So one of the problems there though 1190 01:01:24,860 --> 01:01:28,090 is if you are running one of these directory services, 1191 01:01:28,090 --> 01:01:35,960 you've got a complete list of these keys pretty-- over time, 1192 01:01:35,960 --> 01:01:37,800 you run a directory service [INAUDIBLE]. 1193 01:01:37,800 --> 01:01:39,551 You get a complete list of all these keys, 1194 01:01:39,551 --> 01:01:41,300 and you can try connecting to all the ones 1195 01:01:41,300 --> 01:01:43,830 that don't have encrypted stuff to find out what's there. 1196 01:01:43,830 --> 01:01:45,509 That's called an enumeration attack. 1197 01:01:45,509 --> 01:01:47,050 And we didn't list that in our paper, 1198 01:01:47,050 --> 01:01:49,690 because we weren't thinking of that. 1199 01:01:49,690 --> 01:01:51,270 We didn't. 1200 01:01:51,270 --> 01:01:53,630 But it is something we'd like to resist. 1201 01:01:53,630 --> 01:01:57,680 So in the design I hope to be hacking together 1202 01:01:57,680 --> 01:02:02,020 sometime in 2014, we're going to move towards a key blinding 1203 01:02:02,020 --> 01:02:18,770 approach where Alice and Bob share PKz, 1204 01:02:18,770 --> 01:02:22,190 but this statement is not signed with PKz. 1205 01:02:22,190 --> 01:02:24,780 This statement is signed with PKz prime 1206 01:02:24,780 --> 01:02:33,380 where PKz prime is derived from PKz 1207 01:02:33,380 --> 01:02:44,490 and, say, the date such that if you know PKz and the date, 1208 01:02:44,490 --> 01:02:47,240 you can derive PKz prime. 
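The derivation of PKz prime from PKz and the date can be illustrated with a toy discrete-log group. This is only a sketch of the general key-blinding idea under stated assumptions, not the construction from the design being described: the tiny parameters `P`, `Q`, `G`, the hash-based `blinding_factor`, and all function names are hypothetical, and the group is far too small to be secure.

```python
import hashlib

# Toy parameters (assumptions for illustration only):
# P = 2*Q + 1 is a small safe prime, and G generates the subgroup of order Q.
P = 2039
Q = 1019
G = 4


def public_key(secret):
    """Long-term public key PKz = G^secret mod P."""
    return pow(G, secret, P)


def blinding_factor(pk, date):
    """Derive a nonzero exponent from the public key and the date."""
    digest = hashlib.sha256(f"{pk}|{date}".encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1


def blind_public(pk, date):
    """Anyone who knows PKz and the date can compute PKz' = PKz^h."""
    return pow(pk, blinding_factor(pk, date), P)


def blind_secret(secret, date):
    """Only the holder of the secret can compute the matching blinded secret."""
    pk = public_key(secret)
    return (secret * blinding_factor(pk, date)) % Q


if __name__ == "__main__":
    secret = 123                      # hypothetical secret key of the service
    date = "2014-01-01"               # hypothetical time period label
    pk = public_key(secret)
    # The blinded secret matches the blinded public key.
    print(pow(G, blind_secret(secret, date), P) == blind_public(pk, date))
```

Because the blinding exponent depends on PKz itself, someone who sees only the blinded key and the date has no direct way to compute the exponent, which is the intuition behind resisting enumeration; a real design needs a proper group and a security proof.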
1209 01:02:47,240 --> 01:02:51,810 If like Alice you know secret Kz, 1210 01:02:51,810 --> 01:02:56,550 you can generate messages that are signed by PKz prime. 1211 01:02:56,550 --> 01:03:01,410 But if you only see PKz prime, even knowing the date, 1212 01:03:01,410 --> 01:03:04,440 you cannot re-derive PKz. 1213 01:03:04,440 --> 01:03:06,170 We've got a proof. 1214 01:03:06,170 --> 01:03:10,495 And if you'd like to find out how this works, then ping me 1215 01:03:10,495 --> 01:03:11,960 and I'll send you the paper. 1216 01:03:11,960 --> 01:03:15,700 It's a cool trick. 1217 01:03:15,700 --> 01:03:18,590 We weren't the first ones to invent this idea. 1218 01:03:18,590 --> 01:03:22,900 But that is how we're going to solve enumeration attacks 1219 01:03:22,900 --> 01:03:26,790 sometime this coming year assuming that I can actually 1220 01:03:26,790 --> 01:03:29,253 get the time to build it. 1221 01:03:29,253 --> 01:03:30,336 So that's hidden services. 1222 01:03:34,730 --> 01:03:41,630 Attacks and defenses-- so so far, 1223 01:03:41,630 --> 01:03:44,600 the biggest category of attacks we've seen 1224 01:03:44,600 --> 01:03:47,370 is attacks at the application level. 1225 01:03:47,370 --> 01:03:50,810 So if you're running an application over Tor, 1226 01:03:50,810 --> 01:03:56,146 and it's sending unencrypted traffic, like regular HTTP, 1227 01:03:56,146 --> 01:03:59,450 then a hostile exit node, just like anyone 1228 01:03:59,450 --> 01:04:02,470 else who touches HTTP traffic, can observe and modify 1229 01:04:02,470 --> 01:04:04,830 the traffic. 1230 01:04:04,830 --> 01:04:08,240 This is the number one attack on the whole system. 1231 01:04:08,240 --> 01:04:10,120 The solution is encrypted traffic. 1232 01:04:10,120 --> 01:04:13,060 Fortunately, we're kind of in an encryption renaissance 1233 01:04:13,060 --> 01:04:14,520 over the last few years. 
1234 01:04:14,520 --> 01:04:16,650 And more and more traffic is getting 1235 01:04:16,650 --> 01:04:21,520 encrypted with the nifty free certificate authority 1236 01:04:21,520 --> 01:04:25,550 that EFF and Mozilla and Cisco and I forget who else announced 1237 01:04:25,550 --> 01:04:26,740 a day or two ago. 1238 01:04:26,740 --> 01:04:29,632 There will be even less excuse for unencrypted traffic in 2015 1239 01:04:29,632 --> 01:04:31,420 than there was this year. 1240 01:04:31,420 --> 01:04:33,210 So that solves that. 1241 01:04:33,210 --> 01:04:37,580 More interesting attacks include things like traffic tagging. 1242 01:04:37,580 --> 01:04:44,090 So we made a mistake in our early integrity checking 1243 01:04:44,090 --> 01:04:44,870 implementation. 1244 01:04:44,870 --> 01:04:47,870 Our early integrity checking implementation 1245 01:04:47,870 --> 01:04:55,098 did end to end checking between Alice's program and the exit 1246 01:04:55,098 --> 01:04:56,410 node. 1247 01:04:56,410 --> 01:04:58,900 But it turns out that that's not enough. 1248 01:04:58,900 --> 01:05:02,410 Because if the first relay messes 1249 01:05:02,410 --> 01:05:07,290 with the traffic in a way that creates a pattern that the exit 1250 01:05:07,290 --> 01:05:10,330 node can detect, then that's an easy way 1251 01:05:10,330 --> 01:05:12,800 for the first relay and the last relay 1252 01:05:12,800 --> 01:05:17,860 to learn that they are on the same path and identify Alice. 1253 01:05:17,860 --> 01:05:20,220 Of course, if the first relay and the last relay 1254 01:05:20,220 --> 01:05:23,390 happen to be on the same path, happen 1255 01:05:23,390 --> 01:05:25,950 to be collaborating anyway, then they can already 1256 01:05:25,950 --> 01:05:30,000 identify Alice through traffic correlation, we believe. 1257 01:05:30,000 --> 01:05:34,944 But perhaps it should not be so easy for them as that. 
1258 01:05:34,944 --> 01:05:36,610 Perhaps traffic correlation will someday 1259 01:05:36,610 --> 01:05:38,330 be harder than we think. 1260 01:05:38,330 --> 01:05:41,460 It would be good to actually solve that attack for real. 1261 01:05:41,460 --> 01:05:43,700 We've got two solutions for that. 1262 01:05:43,700 --> 01:05:46,220 One is the expected result of this attack 1263 01:05:46,220 --> 01:05:48,350 is that periodically circuits will fail. 1264 01:05:48,350 --> 01:05:50,750 Because the attacker on the first hop 1265 01:05:50,750 --> 01:05:53,570 guessed wrong about controlling the last hop. 1266 01:05:53,570 --> 01:05:59,130 So every Tor client checks for weird failure rates. 1267 01:05:59,130 --> 01:06:00,910 The real long-term fix is to make it 1268 01:06:00,910 --> 01:06:04,570 so that messing with the pattern on the first hop 1269 01:06:04,570 --> 01:06:07,890 doesn't create more than 1 bit of information on the last hop. 1270 01:06:07,890 --> 01:06:10,790 You can't avoid sending 1 bit of information, 1271 01:06:10,790 --> 01:06:13,830 because the first hop can always just shut down the connection. 1272 01:06:13,830 --> 01:06:17,097 But you can limit it to 1 bit-- OK, 2 bits. 1273 01:06:17,097 --> 01:06:19,430 Because then they'll have the choice to corrupt the data 1274 01:06:19,430 --> 01:06:20,740 or shut down the connection. 1275 01:06:23,700 --> 01:06:25,716 Oh, I had an idea of how to make that better. 1276 01:06:25,716 --> 01:06:28,980 I'll have to think about that. 1277 01:06:28,980 --> 01:06:32,610 Let's see, DOS is actually pretty important. 1278 01:06:32,610 --> 01:06:34,610 There was a paper the other year about something 1279 01:06:34,610 --> 01:06:36,640 that the authors called the sniper attack 1280 01:06:36,640 --> 01:06:39,986 where you see traffic coming from a Tor node 1281 01:06:39,986 --> 01:06:41,850 that you don't control. 1282 01:06:41,850 --> 01:06:44,230 You want to kick everybody off that Tor node. 
1283 01:06:44,230 --> 01:06:45,490 So you connect to it. 1284 01:06:45,490 --> 01:06:50,217 You fill up all its memory buffers, and it crashes. 1285 01:06:50,217 --> 01:06:52,050 Then you see whether the traffic in question 1286 01:06:52,050 --> 01:06:54,055 gets rerouted to a node you control or not, 1287 01:06:54,055 --> 01:06:55,235 and you repeat as necessary. 1288 01:06:59,020 --> 01:07:02,575 For that, our best options are first off, 1289 01:07:02,575 --> 01:07:05,310 no longer have memory DOSes. 1290 01:07:05,310 --> 01:07:10,550 I think we have all of the good memory DOSes fixed now. 1291 01:07:10,550 --> 01:07:13,080 There are some bad ones that still needed to get addressed. 1292 01:07:13,080 --> 01:07:16,430 But they're screamingly inefficient. 1293 01:07:16,430 --> 01:07:19,770 The other option for resolving this kind of thing 1294 01:07:19,770 --> 01:07:23,020 is make sure relays are high capacity. 1295 01:07:23,020 --> 01:07:25,720 Don't accept low capacity relays on the network. 1296 01:07:25,720 --> 01:07:26,700 We do that, too. 1297 01:07:26,700 --> 01:07:30,130 If you're trying to run a relay on your phone, 1298 01:07:30,130 --> 01:07:31,570 the authorities won't list it. 1299 01:07:35,950 --> 01:07:39,350 And another thing is to try to pick our circuit scheduling 1300 01:07:39,350 --> 01:07:45,710 algorithms so that it's hard to starve out circuits 1301 01:07:45,710 --> 01:07:46,820 that you don't control. 1302 01:07:46,820 --> 01:07:50,605 That's very hard, though, and it's as yet 1303 01:07:50,605 --> 01:07:52,830 an unsolved problem. 1304 01:07:52,830 --> 01:07:55,660 Let's see, should I do an interesting attack 1305 01:07:55,660 --> 01:07:58,342 or an important attack? 1306 01:07:58,342 --> 01:07:59,216 STUDENT: Interesting. 1307 01:07:59,216 --> 01:08:01,600 NICK MATHEWSON: Interesting, OK. 
1308 01:08:01,600 --> 01:08:03,094 So show of hands, how many people 1309 01:08:03,094 --> 01:08:04,510 might like to write a program that 1310 01:08:04,510 --> 01:08:07,130 uses cryptography some day? 1311 01:08:07,130 --> 01:08:08,770 Cool, here's what you must learn. 1312 01:08:08,770 --> 01:08:12,540 Never trust your cryptography implementation. 1313 01:08:12,540 --> 01:08:15,670 So even when it's correct, it's wrong. 1314 01:08:15,670 --> 01:08:21,825 So long ago-- I think this may be one of the worst security 1315 01:08:21,825 --> 01:08:24,430 bugs that we've had. 1316 01:08:24,430 --> 01:08:25,805 Any relay could man in the middle 1317 01:08:25,805 --> 01:08:32,420 any circuit because we assumed that a correct Diffie-Hellman 1318 01:08:32,420 --> 01:08:38,120 implementation would verify that it was not being passed 0 1319 01:08:38,120 --> 01:08:40,600 as one of the inputs. 1320 01:08:40,600 --> 01:08:42,770 The authors of our Diffie-Hellman implementation 1321 01:08:42,770 --> 01:08:44,758 assumed the proper application would never 1322 01:08:44,758 --> 01:08:49,470 pass zero to a Diffie-Hellman implementation. 1323 01:08:49,470 --> 01:08:56,229 So Diffie-Hellman, when I say g to the x, you say g to the y. 1324 01:08:56,229 --> 01:08:57,340 I know x. 1325 01:08:57,340 --> 01:08:58,310 You know y. 1326 01:08:58,310 --> 01:09:01,332 And we can both compute g to the xy now. 1327 01:09:01,332 --> 01:09:02,540 You tend to feel me? 1328 01:09:02,540 --> 01:09:03,100 Good. 1329 01:09:03,100 --> 01:09:06,640 Well, if instead the man in the middle 1330 01:09:06,640 --> 01:09:10,990 replaces my g to the x with 0 and your g to the y with 0, 1331 01:09:10,990 --> 01:09:13,100 and then I happily compute 0 to the x, 1332 01:09:13,100 --> 01:09:16,890 and you compute 0 to the y, we will have the same key. 1333 01:09:16,890 --> 01:09:18,719 We will happily talk to each other. 1334 01:09:18,719 --> 01:09:22,740 But this will be a key that the attacker knows, because it's 0.
1335 01:09:22,740 --> 01:09:25,149 1 also works. 1336 01:09:25,149 --> 01:09:27,290 p also works. 1337 01:09:27,290 --> 01:09:29,729 p plus 1 also works. 1338 01:09:29,729 --> 01:09:33,110 So you basically just need to make sure that your values here 1339 01:09:33,110 --> 01:09:37,120 are within range 2 and p minus 1 if you're doing Diffie-Hellman 1340 01:09:37,120 --> 01:09:38,439 in z sub p. 1341 01:09:41,010 --> 01:09:47,090 OK, let's see, I would love to talk more about censorship. 1342 01:09:47,090 --> 01:09:49,609 Because actually, it's one of the areas 1343 01:09:49,609 --> 01:09:51,460 where we can do the most good. 1344 01:09:51,460 --> 01:09:55,260 Generally, the summarized version of that 1345 01:09:55,260 --> 01:09:57,240 was, in the earliest paper you read, 1346 01:09:57,240 --> 01:09:59,880 and in some of the updates, we were still on the idea 1347 01:09:59,880 --> 01:10:01,880 that we would try to make Tor look just 1348 01:10:01,880 --> 01:10:05,275 like a web client talking to a web server over HTTPS 1349 01:10:05,275 --> 01:10:06,869 and make that hard to block. 1350 01:10:06,869 --> 01:10:08,660 It turns out that's fantastically difficult 1351 01:10:08,660 --> 01:10:10,820 and probably not worth doing. 1352 01:10:10,820 --> 01:10:12,250 Instead, the approach we take now 1353 01:10:12,250 --> 01:10:15,190 is using different plug-in programs 1354 01:10:15,190 --> 01:10:21,030 that a non-listed relay called a bridge can use, 1355 01:10:21,030 --> 01:10:23,930 and a client can use to do different traffic 1356 01:10:23,930 --> 01:10:25,440 transformations. 1357 01:10:25,440 --> 01:10:28,675 And we manage to keep adding new ones of those 1358 01:10:28,675 --> 01:10:30,800 faster than the censors have been able to implement 1359 01:10:30,800 --> 01:10:32,380 blocking for them. 1360 01:10:32,380 --> 01:10:38,560 And that's actually a case where none of the solutions 1361 01:10:38,560 --> 01:10:42,320 are categorically workable. 
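The Diffie-Hellman range check described above can be sketched in a few lines. This is a minimal illustration, not Tor's actual code: the toy parameters `P = 23` and `G = 5` and the helper names `is_valid_public` and `shared_secret` are assumptions made up for this example. The check here is slightly stricter than the spoken range and also rejects p minus 1, because p minus 1 squares to 1 and so forces the shared key into the set {1, p minus 1}.

```python
import secrets

# Toy parameters, assumed for illustration only; real systems use large primes.
P = 23  # small prime modulus
G = 5   # base


def is_valid_public(value: int, p: int) -> bool:
    """Reject degenerate public values such as 0, 1, p - 1, p and p + 1."""
    return 2 <= value <= p - 2


def shared_secret(peer_public: int, private: int, p: int) -> int:
    """Plain modular exponentiation, with no validation of the peer's value."""
    return pow(peer_public, private, p)


if __name__ == "__main__":
    x = secrets.randbelow(P - 3) + 2  # Alice's private exponent in [2, p - 2]
    y = secrets.randbelow(P - 3) + 2  # Bob's private exponent in [2, p - 2]

    # A man in the middle swaps both public values for a degenerate one.
    for forged in (0, 1, P, P + 1):
        alice_key = shared_secret(forged, x, P)
        bob_key = shared_secret(forged, y, P)
        print(forged, alice_key, bob_key, is_valid_public(forged, P))

    # An honest public value should pass the check.
    print(pow(G, x, P), is_valid_public(pow(G, x, P), P))
```

For each forged value both sides compute the same key, which is 0 or 1, so the attacker knows it without learning x or y. Calling the check before the exponentiation is what closes the hole.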
1362 01:10:42,320 --> 01:10:44,030 That's not a well-formed sentence. 1363 01:10:44,030 --> 01:10:47,170 None of these plug-ins are inherently 1364 01:10:47,170 --> 01:10:50,651 unblockable by any imaginable technique so far. 1365 01:10:50,651 --> 01:10:53,150 But they're good enough to keep traffic unblocked for a year 1366 01:10:53,150 --> 01:10:56,390 or two in most places, and six or seven 1367 01:10:56,390 --> 01:10:59,460 months at a time in China. 1368 01:10:59,460 --> 01:11:02,760 China currently has the most competent censors in the world, 1369 01:11:02,760 --> 01:11:04,580 largely because China doesn't outsource. 1370 01:11:04,580 --> 01:11:08,330 Most other censoring countries outsource their censorship 1371 01:11:08,330 --> 01:11:12,680 to dishonest European, American, and Asian companies whose 1372 01:11:12,680 --> 01:11:15,410 incentives are not actually to sell them good censorship, 1373 01:11:15,410 --> 01:11:17,820 but to keep them on an upgrade treadmill. 1374 01:11:17,820 --> 01:11:21,130 So if you were buying your censorship software 1375 01:11:21,130 --> 01:11:24,470 from the United States-- which technically speaking 1376 01:11:24,470 --> 01:11:27,220 US companies aren't allowed to make censorship software 1377 01:11:27,220 --> 01:11:29,140 for nations. 1378 01:11:29,140 --> 01:11:32,620 But they just make corporate firewall software 1379 01:11:32,620 --> 01:11:34,650 that happens to scale to 10 million people. 1380 01:11:37,240 --> 01:11:39,116 Yeah, I think that's unethical. 1381 01:11:39,116 --> 01:11:41,900 But again, I'm not the political scientist of the organization, 1382 01:11:41,900 --> 01:11:43,729 or the philosopher. 1383 01:11:43,729 --> 01:11:46,020 Paul Syverson, one of the original [INAUDIBLE] authors, 1384 01:11:46,020 --> 01:11:47,790 does have a degree in philosophy, 1385 01:11:47,790 --> 01:11:50,090 for what that's worth, which means that he can't 1386 01:11:50,090 --> 01:11:50,886 answer these questions either. 
1387 01:11:50,886 --> 01:11:52,761 But he takes a lot longer not to answer them. 1388 01:11:56,720 --> 01:11:58,550 Right, where was I? 1389 01:11:58,550 --> 01:12:01,380 90 minutes is a long time. 1390 01:12:01,380 --> 01:12:05,200 Censorship-- right, so what the censorware providers 1391 01:12:05,200 --> 01:12:10,020 do is once Tor gets around their censorship, 1392 01:12:10,020 --> 01:12:13,510 they will block the most recent version of Tor. 1393 01:12:13,510 --> 01:12:17,480 But they do it in a way that is the weakest possible block. 1394 01:12:17,480 --> 01:12:20,470 So if we change 1 bit in one identifier somewhere, 1395 01:12:20,470 --> 01:12:22,150 we get around it. 1396 01:12:22,150 --> 01:12:25,050 We can't prove that they're doing this on purpose 1397 01:12:25,050 --> 01:12:30,890 to ensure that Tor will evade their version so that they can 1398 01:12:30,890 --> 01:12:34,370 sell Tor blocking and then have it not work so they can sell 1399 01:12:34,370 --> 01:12:36,360 the upgrade, and then sell the next upgrade, 1400 01:12:36,360 --> 01:12:37,640 and sell the next upgrade. 1401 01:12:37,640 --> 01:12:39,480 But it sure does seem that way. 1402 01:12:39,480 --> 01:12:42,614 So that's another reason not to work for censorship providers. 1403 01:12:42,614 --> 01:12:44,530 They're tremendously unethical, and they don't 1404 01:12:44,530 --> 01:12:45,654 provide very good software. 1405 01:12:48,180 --> 01:12:50,920 If you're interested in writing any 1406 01:12:50,920 --> 01:12:52,584 of these pluggable transport things, 1407 01:12:52,584 --> 01:12:54,000 that is an excellent kind of thing 1408 01:12:54,000 --> 01:12:56,877 to do as a student project-- loads of fun, 1409 01:12:56,877 --> 01:12:58,460 learn a little bit about crypto, learn 1410 01:12:58,460 --> 01:13:00,076 a little bit about networking. 1411 01:13:00,076 --> 01:13:02,200 And so long as you do it in a memory-safe language, 1412 01:13:02,200 --> 01:13:04,240 you can't screw it up that badly.
1413 01:13:04,240 --> 01:13:06,350 The worst thing that happens is it 1414 01:13:06,350 --> 01:13:10,496 gets censored after a month instead of after a year. 1415 01:13:10,496 --> 01:13:17,600 And that's what I want to-- oh, the addenda related to work. 1416 01:13:17,600 --> 01:13:21,680 Tor is the most popular system of its kind, 1417 01:13:21,680 --> 01:13:23,110 but it's not the only one. 1418 01:13:23,110 --> 01:13:24,740 Lots of others have really good ideas, 1419 01:13:24,740 --> 01:13:26,820 and you should check them out too 1420 01:13:26,820 --> 01:13:29,822 if you're interested in learning all 1421 01:13:29,822 --> 01:13:31,280 of the stuff I'm not thinking about 1422 01:13:31,280 --> 01:13:33,770 and all the reasons I'm wrong. 1423 01:13:33,770 --> 01:13:37,330 freehaven.net/anonbib/ lists the academic research 1424 01:13:37,330 --> 01:13:39,290 and publications in this area. 1425 01:13:39,290 --> 01:13:42,240 But not all the research in this area is academic. 1426 01:13:42,240 --> 01:13:48,680 You should also look at I2P; Gnunet; 1427 01:13:48,680 --> 01:13:52,090 Freedom, which is currently defunct, 1428 01:13:52,090 --> 01:14:09,640 no pun intended; Mixmaster; Mixminion; Sphynx with a Y, 1429 01:14:09,640 --> 01:14:17,280 Sphinx with an I is something different; DC-nets, 1430 01:14:17,280 --> 01:14:25,950 particularly the work of Bryan Ford, and also of the team 1431 01:14:25,950 --> 01:14:28,645 at Technical University Dresden, in trying 1432 01:14:28,645 --> 01:14:30,240 to make DC-nets practical. 1433 01:14:30,240 --> 01:14:32,770 They're very strong [INAUDIBLE], not actually deployable 1434 01:14:32,770 --> 01:14:35,245 yet-- and many others. 1435 01:14:41,040 --> 01:14:44,230 Why these get less use or attention than Tor 1436 01:14:44,230 --> 01:14:48,270 is an open topic of some interest 1437 01:14:48,270 --> 01:14:50,910 that I don't have a solid answer for.
1438 01:14:50,910 --> 01:14:55,120 Future work-- so one of the reasons 1439 01:14:55,120 --> 01:14:58,940 I do these is not just because I would like everybody 1440 01:14:58,940 --> 01:15:00,700 to know about the cool software I work on. 1441 01:15:00,700 --> 01:15:02,820 But also because I know students have 1442 01:15:02,820 --> 01:15:05,090 lots and lots of free time. 1443 01:15:05,090 --> 01:15:07,360 And I'm kind of looking to recruit. 1444 01:15:07,360 --> 01:15:09,180 OK, you may think I'm joking. 1445 01:15:09,180 --> 01:15:12,730 But when I was just getting started in this field, 1446 01:15:12,730 --> 01:15:16,790 I was complaining about how I was so busy reviewing papers 1447 01:15:16,790 --> 01:15:19,254 for one conference, writing software, fixing a bug, 1448 01:15:19,254 --> 01:15:19,920 answering email. 1449 01:15:19,920 --> 01:15:21,920 I was complaining to some senior faculty member. 1450 01:15:21,920 --> 01:15:27,060 And he told me, you will never have so much free time 1451 01:15:27,060 --> 01:15:27,850 as you do today. 1452 01:15:29,955 --> 01:15:31,330 You actually have a lot more free 1453 01:15:31,330 --> 01:15:33,050 time now than you will in 10 years. 1454 01:15:33,050 --> 01:15:37,580 So this is a great time to work on crazy software projects. 1455 01:15:37,580 --> 01:15:39,680 So let me tell you about future work in Tor. 1456 01:15:39,680 --> 01:15:43,710 There's this key blinding thing and a complete revamp 1457 01:15:43,710 --> 01:15:45,670 of our hidden services system, which 1458 01:15:45,670 --> 01:15:47,900 was the best we could design when we came up with it. 1459 01:15:47,900 --> 01:15:49,816 But there's been a lot of research since then. 1460 01:15:49,816 --> 01:15:52,710 Maybe some of it will turn out to be a good idea. 1461 01:15:52,710 --> 01:15:54,480 We're also revamping most of our crypto. 
1462 01:15:54,480 --> 01:15:58,320 We chose schemes that seemed like a good security 1463 01:15:58,320 --> 01:16:03,140 performance trade-off in 2003, like RSA-1024. 1464 01:16:03,140 --> 01:16:05,720 We've replaced the really important uses 1465 01:16:05,720 --> 01:16:09,580 of RSA-1024 with stronger stuff, currently [INAUDIBLE] 25519. 1466 01:16:09,580 --> 01:16:11,080 But there's still some cases that we 1467 01:16:11,080 --> 01:16:14,797 want to replace in the protocol that we need some work on. 1468 01:16:14,797 --> 01:16:16,630 I didn't talk too much about path selection, 1469 01:16:16,630 --> 01:16:19,910 so I can't talk too much about improvements in that selection. 1470 01:16:19,910 --> 01:16:24,410 But our path selection algorithms were [INAUDIBLE]. 1471 01:16:24,410 --> 01:16:26,140 And there's been some awesome research 1472 01:16:26,140 --> 01:16:31,750 in the past five or six years on that that we need to integrate. 1473 01:16:31,750 --> 01:16:33,900 There's a little work that's been 1474 01:16:33,900 --> 01:16:38,500 done on mixing high latency and low latency traffic so 1475 01:16:38,500 --> 01:16:41,345 that the low latency traffic can provide cover 1476 01:16:41,345 --> 01:16:44,270 for the high latency traffic in terms of providing lots 1477 01:16:44,270 --> 01:16:47,960 of users while the high latency traffic is still very well 1478 01:16:47,960 --> 01:16:50,500 anonymized. 1479 01:16:50,500 --> 01:16:53,970 It's not clear whether this would work or not. 1480 01:16:53,970 --> 01:16:57,600 It's not clear whether anyone would use this or not. 1481 01:16:57,600 --> 01:17:01,080 And it is clear that unless something changes, 1482 01:17:01,080 --> 01:17:03,879 or unless some major funding for that particularly shows up, 1483 01:17:03,879 --> 01:17:05,920 I'm not going to have time to work on it in 2015. 1484 01:17:05,920 --> 01:17:08,045 But if somebody else wants to hack on that, my god, 1485 01:17:08,045 --> 01:17:08,860 that would be fun. 
1486 01:17:08,860 --> 01:17:10,920 Our congestion control algorithms 1487 01:17:10,920 --> 01:17:15,030 were chosen questionably based on what we could hack together 1488 01:17:15,030 --> 01:17:17,050 in a week. 1489 01:17:17,050 --> 01:17:20,360 We've improved them, but they could use a bigger revamp. 1490 01:17:20,360 --> 01:17:23,070 There's some research on scaling to hundreds of thousands 1491 01:17:23,070 --> 01:17:24,170 of nodes. 1492 01:17:24,170 --> 01:17:26,800 So in the current design, we can probably 1493 01:17:26,800 --> 01:17:29,630 get up to 10,000 or 20,000 with no problem. 1494 01:17:29,630 --> 01:17:33,330 But because we assume that every client knows about every node, 1495 01:17:33,330 --> 01:17:35,670 and every node may be connected to every other node, 1496 01:17:35,670 --> 01:17:38,350 that's going to stop scaling before 100,000. 1497 01:17:38,350 --> 01:17:41,250 And we need to do something about that. 1498 01:17:41,250 --> 01:17:43,070 That opens up some classes of attacks 1499 01:17:43,070 --> 01:17:47,680 based on attackers learning which clients know which nodes 1500 01:17:47,680 --> 01:17:50,574 and using that to distinguish clients. 1501 01:17:50,574 --> 01:17:52,740 So most of the naive approaches are a bad idea here. 1502 01:17:52,740 --> 01:17:56,960 But it may be that less naive approaches might work out. 1503 01:17:56,960 --> 01:17:59,585 Another thing you might want to do if you're increasing to 100,000 1504 01:17:59,585 --> 01:18:02,230 nodes is get rid of those centralized directory 1505 01:18:02,230 --> 01:18:05,840 authorities and go to some kind of peer to peer design. 1506 01:18:05,840 --> 01:18:10,010 I don't have extremely high confidence 1507 01:18:10,010 --> 01:18:12,530 in the peer to peer designs I know of so far. 1508 01:18:12,530 --> 01:18:16,940 But it could be that somebody's about to advance 1509 01:18:16,940 --> 01:18:17,690 the next good one.
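To put rough numbers on the scaling concern above: if every node may hold a connection to every other node, the worst-case number of node-to-node links grows with the square of the node count. The sketch below is only back-of-the-envelope arithmetic; the function name `full_mesh_links` is made up for this example, and it says nothing about how many links a real network actually keeps open at once.

```python
def full_mesh_links(nodes: int) -> int:
    """Worst-case count of distinct node pairs, i.e. n * (n - 1) / 2."""
    return nodes * (nodes - 1) // 2


if __name__ == "__main__":
    # Node counts taken from the talk: today's network, the comfortable
    # upper range, and the size where the design is expected to break down.
    for n in (6_000, 20_000, 100_000):
        print(f"{n:>7} nodes -> up to {full_mesh_links(n):,} pairwise links")
```

Going from 6,000 to 100,000 nodes multiplies the node count by about 17 but the possible link count by nearly 280, which is one way to see why the all-to-all assumption stops scaling.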
1510 01:18:20,230 --> 01:18:23,400 Let's see, I don't know what that means. 1511 01:18:23,400 --> 01:18:26,566 Oh, somebody asked a question about adding 1512 01:18:26,566 --> 01:18:33,013 padding traffic or fake traffic to try to deceive end 1513 01:18:33,013 --> 01:18:34,360 to end traffic correlation. 1514 01:18:34,360 --> 01:18:36,150 This is an exciting research field 1515 01:18:36,150 --> 01:18:40,422 that needs someone smarter to work on it or someone 1516 01:18:40,422 --> 01:18:42,630 with a more practical attitude to work on it than has 1517 01:18:42,630 --> 01:18:44,230 previously worked on it. 1518 01:18:44,230 --> 01:18:47,260 Too many of the results in the research literature 1519 01:18:47,260 --> 01:18:51,345 there are only about distinguishing the traffic 1520 01:18:51,345 --> 01:18:55,229 of two users on a network containing one relay, 1521 01:18:55,229 --> 01:18:57,020 because that's how the math was easy to do. 1522 01:19:00,230 --> 01:19:02,230 So because of this kind of stuff, 1523 01:19:02,230 --> 01:19:04,240 all of the traffic analysis defenses 1524 01:19:04,240 --> 01:19:06,200 that we know of in this area that are still 1525 01:19:06,200 --> 01:19:10,109 compatible with broad browsing, they sound good 1526 01:19:10,109 --> 01:19:11,150 if you read the abstract. 1527 01:19:11,150 --> 01:19:14,790 You'll say, hooray, this one forces the attacker 1528 01:19:14,790 --> 01:19:17,510 to gather three times as much traffic before they 1529 01:19:17,510 --> 01:19:19,020 can correlate users. 1530 01:19:19,020 --> 01:19:20,950 Except when you actually read the paper, 1531 01:19:20,950 --> 01:19:23,510 previously the attacker needed two seconds worth of traffic, 1532 01:19:23,510 --> 01:19:24,485 and then they won. 1533 01:19:24,485 --> 01:19:26,940 Now they need six seconds.
1534 01:19:26,940 --> 01:19:29,430 That's not really a defense in this model, 1535 01:19:29,430 --> 01:19:33,699 although perhaps against a real network, 1536 01:19:33,699 --> 01:19:35,740 the numbers would be different and it might work. 1537 01:19:35,740 --> 01:19:38,930 So we would actually like to see some stuff done 1538 01:19:38,930 --> 01:19:40,470 with padding and fake traffic. 1539 01:19:40,470 --> 01:19:43,645 But we don't like to add voodoo defenses 1540 01:19:43,645 --> 01:19:45,580 that we conjecture to maybe do some good, 1541 01:19:45,580 --> 01:19:47,340 although we can't do that. 1542 01:19:47,340 --> 01:19:48,715 We actually like to have evidence 1543 01:19:48,715 --> 01:19:50,590 that any changes we're going to make 1544 01:19:50,590 --> 01:19:51,790 are going to help something. 1545 01:19:51,790 --> 01:19:53,240 I think I'm out of time. 1546 01:19:53,240 --> 01:19:55,439 And there may be a class in here after us? 1547 01:19:55,439 --> 01:19:55,980 There is not? 1548 01:19:55,980 --> 01:19:58,104 All right, so I'm going to hang around for a while. 1549 01:19:58,104 --> 01:20:00,140 And thanks for coming to listen. 1550 01:20:00,140 --> 01:20:02,290 I would take questions now. 1551 01:20:02,290 --> 01:20:06,089 But it's 12:25, and folks may have another class. 1552 01:20:06,089 --> 01:20:07,380 But I'll be around [INAUDIBLE]. 1553 01:20:07,380 --> 01:20:08,880 Thank you very much for coming. 1554 01:20:08,880 --> 01:20:11,952 [APPLAUSE]