1
00:00:00,080 --> 00:00:02,430
The following content is
provided under a Creative

2
00:00:02,430 --> 00:00:03,820
Commons license.

3
00:00:03,820 --> 00:00:06,060
Your support will help
MIT OpenCourseWare

4
00:00:06,060 --> 00:00:10,150
continue to offer high quality
educational resources for free.

5
00:00:10,150 --> 00:00:12,700
To make a donation or to
view additional materials

6
00:00:12,700 --> 00:00:16,600
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:16,600 --> 00:00:17,255
at ocw.mit.edu.

8
00:00:27,595 --> 00:00:28,720
PROFESSOR: All right, guys.

9
00:00:28,720 --> 00:00:29,470
Let's get started.

10
00:00:29,470 --> 00:00:31,137
So today, we're going
to talk about Tor.

11
00:00:31,137 --> 00:00:32,761
And we actually have
one of the authors

12
00:00:32,761 --> 00:00:35,090
of the paper you guys read
for today, Nick Mathewson.

13
00:00:35,090 --> 00:00:37,080
He's also one of the
main developers of Tor.

14
00:00:37,080 --> 00:00:38,065
He's going to tell
you more about it.

15
00:00:38,065 --> 00:00:39,148
NICK MATHEWSON: Thank you.

16
00:00:39,148 --> 00:00:42,315
So at this point,
I could start out

17
00:00:42,315 --> 00:00:44,420
by saying, please
put your hands up

18
00:00:44,420 --> 00:00:48,490
if you didn't read the paper,
but that wouldn't work.

19
00:00:48,490 --> 00:00:50,920
Because it's embarrassing
not to have read a paper

20
00:00:50,920 --> 00:00:52,940
you're supposed to have read.

21
00:00:52,940 --> 00:00:56,020
So instead, what I will ask
is, think of your birthday.

22
00:00:56,020 --> 00:00:57,610
Think of the date of your birth.

23
00:00:57,610 --> 00:01:01,000
If the last digit of the
date of your birth is odd,

24
00:01:01,000 --> 00:01:06,080
or you didn't read the paper,
please raise your hand.

25
00:01:06,080 --> 00:01:09,000
OK, that's not far from half.

26
00:01:09,000 --> 00:01:11,810
So I'm guessing most
people read the paper.
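
The show of hands works like a randomized-response survey: a raised hand could mean an odd birthday or an unread paper, so no individual is exposed, yet the total still carries information. A minimal sketch of that arithmetic, assuming (the talk does not state this) that about half of all birth dates end in an odd digit and that birthdays are independent of reading habits. The function name estimate_nonreaders and the parameter p_odd are hypothetical.

```python
def estimate_nonreaders(fraction_raised: float, p_odd: float = 0.5) -> float:
    """Estimate the fraction of the room that did not read the paper.

    A hand goes up if the birthday digit is odd OR the paper was not read:
        fraction_raised = p_odd + (1 - p_odd) * q
    Solving for q gives the estimate. p_odd = 0.5 is an assumption.
    """
    if not 0.0 <= p_odd < 1.0:
        raise ValueError("p_odd must be in [0, 1)")
    q = (fraction_raised - p_odd) / (1.0 - p_odd)
    # Sampling noise can push the raw estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, q))
```

If, say, 55% of the room raised a hand, the estimate comes out near 0.1, which is consistent with reading "not far from half" as "most people read the paper."
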

27
00:01:14,720 --> 00:01:19,190
Means of communicating that
preserve our privacy enable

28
00:01:19,190 --> 00:01:23,620
us to communicate more
honestly to gather better

29
00:01:23,620 --> 00:01:26,570
information about the
world when we are less

30
00:01:26,570 --> 00:01:32,210
inhibited from speaking
because of possibly justified

31
00:01:32,210 --> 00:01:37,540
possibly unjustified social
and other consequences.

32
00:01:37,540 --> 00:01:41,570
So this brings us to
Tor, which is an anonymity

33
00:01:41,570 --> 00:01:44,210
network that I've been working
on for the last 10 years

34
00:01:44,210 --> 00:01:48,080
with some friends and
colleagues and so on.

35
00:01:48,080 --> 00:01:51,170
[INAUDIBLE] there's a set of
volunteer-operated servers,

36
00:01:51,170 --> 00:01:52,930
about 6,000 of them.

37
00:01:52,930 --> 00:01:55,290
At first, it was
just friends of ours

38
00:01:55,290 --> 00:01:58,310
that Roger Dingledine
and I knew from MIT.

39
00:01:58,310 --> 00:02:01,660
After that, we built
up more publicity.

40
00:02:01,660 --> 00:02:04,810
More people started
running servers.

41
00:02:04,810 --> 00:02:08,360
Now it's run by nonprofits,
private individuals,

42
00:02:08,360 --> 00:02:12,370
some university teams, possibly
some of you here today,

43
00:02:12,370 --> 00:02:17,820
and no doubt some
very sketchy people.

44
00:02:17,820 --> 00:02:19,140
We've got about 6,000 nodes.

45
00:02:19,140 --> 00:02:21,540
We're serving on the order
of hundreds of thousands

46
00:02:21,540 --> 00:02:24,060
to millions of users
depending on how you count.

47
00:02:24,060 --> 00:02:26,310
It's kind of hard to count,
because they're anonymous.

48
00:02:26,310 --> 00:02:29,142
So you have to use statistical
techniques to estimate.

49
00:02:29,142 --> 00:02:30,850
And we're doing on
the order of terabytes

50
00:02:30,850 --> 00:02:34,500
per second worth of traffic.

51
00:02:34,500 --> 00:02:39,190
Lots of people need anonymity
for their regular work.

52
00:02:39,190 --> 00:02:40,670
Not everyone who
needs anonymity,

53
00:02:40,670 --> 00:02:43,980
though, thinks of
it as anonymity.

54
00:02:43,980 --> 00:02:46,380
Some people say, I
don't need anonymity.

55
00:02:46,380 --> 00:02:48,520
I'm perfectly fine
identifying myself.

56
00:02:48,520 --> 00:02:52,590
But there's a broad
perception that privacy

57
00:02:52,590 --> 00:02:55,330
is necessary or useful.

58
00:02:55,330 --> 00:02:57,982
And when regular citizens
use anonymity stuff,

59
00:02:57,982 --> 00:03:00,750
they tend to be doing it
because they want privacy

60
00:03:00,750 --> 00:03:04,455
in search results, privacy in
doing research on the internet.

61
00:03:04,455 --> 00:03:07,900
They want to be able to
engage in local politics

62
00:03:07,900 --> 00:03:12,180
while not offending local
politicians, and so on.

63
00:03:12,180 --> 00:03:15,210
Researchers frequently
use anonymizing tools

64
00:03:15,210 --> 00:03:21,800
to avoid gathering biased data,
biased by geolocation-based

65
00:03:21,800 --> 00:03:23,685
services that might
be serving them

66
00:03:23,685 --> 00:03:26,500
in particular different
versions of things.

67
00:03:26,500 --> 00:03:29,700
Companies use
anonymity technologies

68
00:03:29,700 --> 00:03:32,620
for protection of
sensitive data.

69
00:03:32,620 --> 00:03:38,730
For instance, if I can
track all of the movements

70
00:03:38,730 --> 00:03:42,600
of the legal team for some
major internet company,

71
00:03:42,600 --> 00:03:49,360
I can probably, just by tracking
when they're visiting their web

72
00:03:49,360 --> 00:03:52,085
server from different
places around the world,

73
00:03:52,085 --> 00:03:54,126
or where they're visiting
the company [INAUDIBLE]

74
00:03:54,126 --> 00:03:56,000
different places
around the world,

75
00:03:56,000 --> 00:03:58,996
learn a lot about which teams
are collaborating with which.

76
00:03:58,996 --> 00:04:00,370
And this is
information companies

77
00:04:00,370 --> 00:04:02,370
would like to keep private.

78
00:04:02,370 --> 00:04:07,540
Companies also use anonymity
technology for doing research.

79
00:04:07,540 --> 00:04:12,130
So a major router
manufacturer for a while--

80
00:04:12,130 --> 00:04:13,800
I don't know if this
is still the case--

81
00:04:13,800 --> 00:04:17,200
would regularly serve different
versions of its product sheets

82
00:04:17,200 --> 00:04:20,200
to IP addresses associated
with its competitors

83
00:04:20,200 --> 00:04:23,851
in order to make reverse
engineering trickier.

84
00:04:23,851 --> 00:04:26,142
And they found this out by
using our software and said,

85
00:04:26,142 --> 00:04:28,308
hey, wait a minute, we got
a different product sheet

86
00:04:28,308 --> 00:04:32,407
when we came in from Tor
than we did coming directly.

87
00:04:32,407 --> 00:04:34,365
And it's also kind of
normal for some companies

88
00:04:34,365 --> 00:04:36,679
to serve other companies
versions of their websites

89
00:04:36,679 --> 00:04:38,720
to emphasize the employment
opportunity sections.

90
00:04:41,660 --> 00:04:46,910
Regular law enforcement needs
anonymity technologies as well

91
00:04:46,910 --> 00:04:49,900
to avoid tipping off people
during investigations.

92
00:04:49,900 --> 00:04:51,955
You do not want the
local police station

93
00:04:51,955 --> 00:04:57,290
to appear in the web logs of
somebody you're investigating.

94
00:04:57,290 --> 00:05:00,960
And regular folks
need it, as I said,

95
00:05:00,960 --> 00:05:04,640
for avoiding harassment
because of online activities,

96
00:05:04,640 --> 00:05:07,600
to research stuff that
might be embarrassing.

97
00:05:07,600 --> 00:05:13,390
If you live in a country with
uncertain health care laws,

98
00:05:13,390 --> 00:05:16,420
you might want to avoid
creating too much public record

99
00:05:16,420 --> 00:05:19,070
of what diseases you think
you might have and so on,

100
00:05:19,070 --> 00:05:21,920
or what dangerous
hobbies you might have.

101
00:05:21,920 --> 00:05:27,400
And also lots of criminal or bad
folks use anonymity technology.

102
00:05:27,400 --> 00:05:28,710
It's not their only option.

103
00:05:28,710 --> 00:05:33,650
But if you are willing to
purchase time on a botnet,

104
00:05:33,650 --> 00:05:35,360
you can buy some
pretty good privacy

105
00:05:35,360 --> 00:05:38,329
that is not available
to people who

106
00:05:38,329 --> 00:05:39,620
think that botnets are immoral.

107
00:05:39,620 --> 00:05:43,890
And Tor, and anonymity
stuff in general,

108
00:05:43,890 --> 00:05:49,482
are not the only multi-use
technology out there.

109
00:05:49,482 --> 00:05:51,690
Let's see, the average age
of an undergraduate is about 20.

110
00:05:51,690 --> 00:05:56,706
So around when you were
born-- have you talked

111
00:05:56,706 --> 00:05:58,327
about crypto wars at all?

112
00:05:58,327 --> 00:05:59,181
PROFESSOR: No.

113
00:05:59,181 --> 00:06:00,270
NICK MATHEWSON: No.

114
00:06:00,270 --> 00:06:02,700
During the 1990s, it was sort
of an up-in-the-air question

115
00:06:02,700 --> 00:06:06,590
in the United States about
to what extent civilian use

116
00:06:06,590 --> 00:06:09,120
of non-backdoored cryptography
should be legal,

117
00:06:09,120 --> 00:06:11,320
and to what extent it
should be exported.

118
00:06:11,320 --> 00:06:13,200
That kind of came
down pretty decisively

119
00:06:13,200 --> 00:06:17,090
on the side of cryptography
should be legal and exportable

120
00:06:17,090 --> 00:06:20,310
during the '90s and early 2000s.

121
00:06:20,310 --> 00:06:24,350
And although there's some debate
about anonymity technology,

122
00:06:24,350 --> 00:06:27,100
it's more or less
the same debate.

123
00:06:27,100 --> 00:06:30,750
And I think it's going to end
in more or less the same way.

124
00:06:30,750 --> 00:06:33,264
So here's an outline of my talk.

125
00:06:33,264 --> 00:06:35,680
I'm going to give you that
little introduction I gave you,

126
00:06:35,680 --> 00:06:37,721
talk a little bit about
what we mean by anonymity

127
00:06:37,721 --> 00:06:40,235
in a technical sense, talk a
little about our motivations

128
00:06:40,235 --> 00:06:41,068
for getting into it.

129
00:06:41,068 --> 00:06:44,970
Then I'm going to kind
of walk you through step

130
00:06:44,970 --> 00:06:47,450
by step how you start
with the idea of,

131
00:06:47,450 --> 00:06:50,445
we ought to have some
anonymity, and how

132
00:06:50,445 --> 00:06:52,902
do you wind up with the
design of Tor from that point.

133
00:06:52,902 --> 00:06:54,660
And I'll mention
some branching off

134
00:06:54,660 --> 00:06:56,990
points where you might
wind up with other designs.

135
00:06:56,990 --> 00:06:59,780
I'll pause to answer some
of the cool questions

136
00:06:59,780 --> 00:07:04,220
that everyone has sent in
for their class assignment.

137
00:07:04,220 --> 00:07:06,710
I'll talk a little bit about
how node discovery works,

138
00:07:06,710 --> 00:07:08,394
which is an important topic.

139
00:07:08,394 --> 00:07:10,150
And then I'll sort
of by show of hands

140
00:07:10,150 --> 00:07:12,856
pick which of these
advanced topics to cover.

141
00:07:12,856 --> 00:07:15,230
I guess we're calling them
advanced because they're later

142
00:07:15,230 --> 00:07:16,360
in the lecture.

143
00:07:16,360 --> 00:07:19,750
And I can't read them all,
but they're all really cool.

144
00:07:19,750 --> 00:07:21,655
I'll mention some
related systems

145
00:07:21,655 --> 00:07:23,905
whose designs you
ought to check out

146
00:07:23,905 --> 00:07:26,370
if this is a topic that
interests you and you'd like

147
00:07:26,370 --> 00:07:27,286
to know more about it.

148
00:07:27,286 --> 00:07:30,340
I'll talk about future work
that we want to have done at Tor

149
00:07:30,340 --> 00:07:32,860
and I hope that we'll
have time to do some day.

150
00:07:32,860 --> 00:07:35,870
And if there's time for
questions, then I'll take them.

151
00:07:35,870 --> 00:07:38,930
And I've got nowhere I need
to be for the next hour or so.

152
00:07:38,930 --> 00:07:43,307
So I and my colleague David over
there-- can you wave your hand,

153
00:07:43,307 --> 00:07:47,340
David-- will be hanging
around somewhere and talking

154
00:07:47,340 --> 00:07:48,613
to anyone who wants to talk.

155
00:07:48,613 --> 00:07:52,750
So right, anonymity--
what do we mean

156
00:07:52,750 --> 00:07:54,183
when we talk about anonymity?

157
00:07:54,183 --> 00:07:57,210
There are lots of
informal notions

158
00:07:57,210 --> 00:08:03,390
that get used in informal
discussions, online, and so

159
00:08:03,390 --> 00:08:03,890
on.

160
00:08:03,890 --> 00:08:05,390
Some people use
anonymous to mean,

161
00:08:05,390 --> 00:08:06,598
I didn't write my name on it.

162
00:08:06,598 --> 00:08:10,900
Some people use
anonymous to mean, well,

163
00:08:10,900 --> 00:08:12,290
no one can actually
prove it's me

164
00:08:12,290 --> 00:08:15,230
even if you suspect strongly.

165
00:08:15,230 --> 00:08:18,200
What we mean is a number
of notions expressed

166
00:08:18,200 --> 00:08:25,560
in terms of the
ability of an observer

167
00:08:25,560 --> 00:08:32,590
or attacker on a network to
link participants to actions.

168
00:08:32,590 --> 00:08:35,870
These notions come out
of a terminology paper

169
00:08:35,870 --> 00:08:38,659
by [INAUDIBLE] that
you can find a link

170
00:08:38,659 --> 00:08:43,929
to on freehaven.net/anonbib/,
the anonymity bibliography that

171
00:08:43,929 --> 00:08:46,790
I help maintain.

172
00:08:46,790 --> 00:08:49,423
It should list most of the
good papers in the field.

173
00:08:49,423 --> 00:08:51,550
We need to bring it
up to date to 2014,

174
00:08:51,550 --> 00:08:53,390
but it's pretty useful.

175
00:08:53,390 --> 00:08:55,840
So when I say anonymity,
generally what I mean

176
00:08:55,840 --> 00:09:01,080
is Alice is doing some activity.

177
00:09:01,080 --> 00:09:05,132
She's-- what should
Alice be doing?

178
00:09:05,132 --> 00:09:06,215
Alice is buying new socks.

179
00:09:10,270 --> 00:09:11,820
And there's some attacker here.

180
00:09:11,820 --> 00:09:14,770
Let's call her Eve for now.

181
00:09:14,770 --> 00:09:18,999
Eve can tell that Alice
is doing something.

182
00:09:18,999 --> 00:09:21,040
Preventing that is not
what we mean by anonymity.

183
00:09:21,040 --> 00:09:22,890
That's called unobservability.

184
00:09:22,890 --> 00:09:26,550
Eve can tell possibly that
someone is buying socks.

185
00:09:26,550 --> 00:09:28,850
Again, that's not what
we mean by anonymity.

186
00:09:28,850 --> 00:09:33,480
But what we hope is that
Eve cannot tell that Alice

187
00:09:33,480 --> 00:09:36,310
in particular is buying socks.

188
00:09:36,310 --> 00:09:40,190
And we mean that both
on a categorical level--

189
00:09:40,190 --> 00:09:42,935
Eve should not be
able to conclude

190
00:09:42,935 --> 00:09:45,060
through rigorous mathematical
proof, this is Alice,

191
00:09:45,060 --> 00:09:48,430
she's buying socks--
but also, Eve should not

192
00:09:48,430 --> 00:09:52,180
be able to conclude
probabilistically it's likelier

193
00:09:52,180 --> 00:09:56,030
that Alice is buying socks than
some randomly selected person.
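
The probabilistic requirement can be made concrete by looking at Eve's belief about who is behind an action. A small sketch under stated assumptions: Eve's belief is a dictionary of probabilities over a fixed set of candidate users, and the entropy-based "effective anonymity set" is borrowed from the anonymity-metrics literature rather than from this talk. Both function names are hypothetical.

```python
import math

def advantage_over_guessing(posterior: dict, suspect: str) -> float:
    """How much likelier Eve thinks `suspect` is than a randomly selected user.

    Zero means Eve has learned nothing that singles the suspect out.
    """
    uniform = 1.0 / len(posterior)
    return posterior[suspect] - uniform

def effective_anonymity_set(posterior: dict) -> float:
    """2**H, where H is the Shannon entropy (in bits) of Eve's belief.

    Equals the number of users when Eve's belief is uniform, and shrinks
    toward 1 as her belief concentrates on one person.
    """
    h = -sum(p * math.log2(p) for p in posterior.values() if p > 0)
    return 2 ** h
```

With four equally likely users the advantage is 0 and the effective set is 4. If Eve puts 0.7 on Alice and 0.1 on each of the others, the advantage is about 0.45 and the effective set drops to roughly 2.6 users.
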

194
00:09:56,030 --> 00:09:59,080
And also, we would
like Eve not to be

195
00:09:59,080 --> 00:10:02,250
able to conclude after
observing many Alice activities,

196
00:10:02,250 --> 00:10:05,280
Alice sometimes buys
socks, even if I

197
00:10:05,280 --> 00:10:08,260
don't know some particular
activity of Alice's is

198
00:10:08,260 --> 00:10:09,045
a socks purchase.

199
00:10:12,650 --> 00:10:14,560
There are other ideas
that are related.

200
00:10:14,560 --> 00:10:17,030
One is unlinkability.

201
00:10:17,030 --> 00:10:23,876
Unlinkability is about a
long-term profile of Alice.

202
00:10:23,876 --> 00:10:26,210
So for instance,
Alice has been posting

203
00:10:26,210 --> 00:10:33,050
as-- I'm never good at
picking names for my example.

204
00:10:33,050 --> 00:10:39,060
Alice has been posting as Bob
and writing a political blog

205
00:10:39,060 --> 00:10:43,315
that would disrupt
her career, that

206
00:10:43,315 --> 00:10:45,490
would offend her
department head and disrupt

207
00:10:45,490 --> 00:10:49,820
her career as a computer
security [INAUDIBLE].

208
00:10:49,820 --> 00:10:53,280
So she's been writing as Bob.

209
00:10:53,280 --> 00:10:59,650
Unlinkability is Eve's
inability to link Alice

210
00:10:59,650 --> 00:11:01,950
to a particular profile.

211
00:11:01,950 --> 00:11:05,910
Final notion--
unobservability, some systems

212
00:11:05,910 --> 00:11:12,540
try to make it impossible to
even tell that Alice is online,

213
00:11:12,540 --> 00:11:15,620
that Alice is connecting to
anybody at all, that Alice

214
00:11:15,620 --> 00:11:17,190
is doing any activity.

215
00:11:17,190 --> 00:11:20,660
These are rather hard to build.

216
00:11:20,660 --> 00:11:22,660
I'll talk a little bit
more about to what extent

217
00:11:22,660 --> 00:11:25,650
they are useful later.

218
00:11:25,650 --> 00:11:27,610
Something that is
useful in that area

219
00:11:27,610 --> 00:11:29,745
is you might want to
conceal that Alice

220
00:11:29,745 --> 00:11:32,240
is using an anonymity
system, but not

221
00:11:32,240 --> 00:11:33,630
that she is on the internet.

222
00:11:33,630 --> 00:11:35,910
That's more achievable
than concealing the fact

223
00:11:35,910 --> 00:11:39,070
that Alice is on the
internet entirely.

224
00:11:39,070 --> 00:11:42,710
So why did I start working
on this in the first place?

225
00:11:42,710 --> 00:11:45,177
Well, partially because
of the engineer's itch.

226
00:11:45,177 --> 00:11:46,010
It's a cool problem.

227
00:11:46,010 --> 00:11:47,870
It's an interesting problem.

228
00:11:47,870 --> 00:11:50,805
Nobody else was
actually working on it.

229
00:11:50,805 --> 00:11:52,480
And my friend Roger
got a contract

230
00:11:52,480 --> 00:11:56,940
to finish up a stalled
research project

231
00:11:56,940 --> 00:11:58,490
before the grant expired.

232
00:11:58,490 --> 00:12:03,585
And he did it well enough that
I said, hey, I'll join up.

233
00:12:03,585 --> 00:12:05,200
And [INAUDIBLE].

234
00:12:05,200 --> 00:12:06,720
I'll join in.

235
00:12:06,720 --> 00:12:09,740
After a while, we
formed a nonprofit

236
00:12:09,740 --> 00:12:13,310
and released everything
as open source.

237
00:12:13,310 --> 00:12:14,870
So that's part of it.

238
00:12:14,870 --> 00:12:18,810
But for deeper
motivations, I think

239
00:12:18,810 --> 00:12:21,800
humanity has got a lot
of problems that can only

240
00:12:21,800 --> 00:12:25,760
be solved through better
and more dedicated

241
00:12:25,760 --> 00:12:30,530
communication, freer expression,
and more freedom of thought.

242
00:12:30,530 --> 00:12:33,890
And I don't know how to
solve these problems.

243
00:12:33,890 --> 00:12:37,360
All I think I can do
is try to make sure

244
00:12:37,360 --> 00:12:40,880
that what I see as
inhibiting discussion,

245
00:12:40,880 --> 00:12:44,780
thought, speech,
becomes harder to do.

246
00:12:44,780 --> 00:12:46,275
So that's [INAUDIBLE].

247
00:12:46,275 --> 00:12:47,188
Yeah.

248
00:12:47,188 --> 00:12:49,604
STUDENT: So I know there are
many good reasons to use Tor.

249
00:12:49,604 --> 00:12:51,062
Please don't see
this as criticism.

250
00:12:51,062 --> 00:12:53,036
I'm just curious,
what is your opinion

251
00:12:53,036 --> 00:12:55,297
as far as criminal activity?

252
00:12:55,297 --> 00:12:57,630
NICK MATHEWSON: What is my
opinion on criminal activity?

253
00:12:57,630 --> 00:12:58,430
Some laws are good.

254
00:12:58,430 --> 00:12:59,532
Some laws are bad.

255
00:12:59,532 --> 00:13:01,490
My lawyers would tell me
never to advise anyone

256
00:13:01,490 --> 00:13:02,860
to break the law.

257
00:13:05,750 --> 00:13:08,253
My goal was not to enable
criminal activity against most

258
00:13:08,253 --> 00:13:10,550
of the laws I agree with.

259
00:13:10,550 --> 00:13:13,140
In places where criticising
the government is illegal,

260
00:13:13,140 --> 00:13:17,399
then I'm in favor of criminal
activity of that kind.

261
00:13:17,399 --> 00:13:19,190
So in that case, I
suppose I was supporting

262
00:13:19,190 --> 00:13:21,330
that kind of criminal activity.

263
00:13:21,330 --> 00:13:24,946
My stance on whether it's
a problem that an anonymity

264
00:13:24,946 --> 00:13:26,570
network gets used
for criminal activity

265
00:13:26,570 --> 00:13:29,215
in general, to the extent
that there are good laws,

266
00:13:29,215 --> 00:13:31,660
I would prefer that
people not break them.

267
00:13:31,660 --> 00:13:36,980
I would, however, think that any
computer security system that

268
00:13:36,980 --> 00:13:40,830
does not get used by criminals
is probably a very bad computer

269
00:13:40,830 --> 00:13:43,720
security system if
the criminals are

270
00:13:43,720 --> 00:13:46,770
making any kind of good
decision-making policy.

271
00:13:46,770 --> 00:13:49,820
I think that if we go
around banning security

272
00:13:49,820 --> 00:13:54,619
that works for criminals, we
wind up with insecure systems.

273
00:13:54,619 --> 00:13:56,410
So that's more or less
where I stand on it.

274
00:13:56,410 --> 00:13:58,284
I'm not really the
philosopher of it, though.

275
00:13:58,284 --> 00:13:59,620
I'm more of the programmer.

276
00:13:59,620 --> 00:14:01,760
So I'm going to be giving
really trite answers

277
00:14:01,760 --> 00:14:03,362
to philosophical
and legal questions.

278
00:14:03,362 --> 00:14:05,570
Also, I'm not a lawyer and
cannot offer legal advice.

279
00:14:05,570 --> 00:14:08,510
Do not take anything
I say as legal advice.

280
00:14:08,510 --> 00:14:14,464
That said, [INAUDIBLE], a lot
of these research problems

281
00:14:14,464 --> 00:14:15,880
that I'm going to
be talking about

282
00:14:15,880 --> 00:14:17,484
weren't even close
to being solved.

283
00:14:17,484 --> 00:14:19,650
So why did we start anyway
instead of going straight

284
00:14:19,650 --> 00:14:21,099
into research?

285
00:14:21,099 --> 00:14:23,140
One of the reasons, we
thought that a lot of them

286
00:14:23,140 --> 00:14:27,300
wouldn't get solved unless
there was a test bed to work on.

287
00:14:27,300 --> 00:14:29,250
And that's kind
of been borne out.

288
00:14:29,250 --> 00:14:33,590
Because Tor has kind of become
the research platform of choice

289
00:14:33,590 --> 00:14:36,530
for lots of work on
low-latency anonymity systems.

290
00:14:36,530 --> 00:14:38,580
And it's helped the
field a lot in that way.

291
00:14:38,580 --> 00:14:41,120
But also, 10 years on, a
lot of the big problems

292
00:14:41,120 --> 00:14:42,650
still aren't solved.

293
00:14:42,650 --> 00:14:45,740
So if we had waited 10 years
for everything to get fixed,

294
00:14:45,740 --> 00:14:48,290
we would have been
waiting in vain.

295
00:14:48,290 --> 00:14:51,760
So why do it then?

296
00:14:51,760 --> 00:14:58,740
Partially because we thought
that having a system out there

297
00:14:58,740 --> 00:15:03,041
would improve long-term
outcomes for the world.

298
00:15:03,041 --> 00:15:05,290
That is, it's really easy
to argue that something that

299
00:15:05,290 --> 00:15:08,250
doesn't exist should be banned.

300
00:15:08,250 --> 00:15:10,440
Arguments against civilian
use of cryptography

301
00:15:10,440 --> 00:15:13,230
were much easier to
make in public in 1990

302
00:15:13,230 --> 00:15:14,361
than they are today.

303
00:15:14,361 --> 00:15:15,860
Because there was
almost no civilian

304
00:15:15,860 --> 00:15:18,240
use of strong cryptography then.

305
00:15:18,240 --> 00:15:23,050
And you could argue that if
anything stronger than DES

306
00:15:23,050 --> 00:15:28,010
is legal, then
civilization will collapse.

307
00:15:28,010 --> 00:15:34,900
Criminals will never be
caught, and organized crime

308
00:15:34,900 --> 00:15:36,525
will take over everything.

309
00:15:36,525 --> 00:15:38,150
But you couldn't
really argue that that

310
00:15:38,150 --> 00:15:41,410
was the inevitable consequence
of cryptography in 2000.

311
00:15:41,410 --> 00:15:43,440
Because cryptography had
already been out there,

312
00:15:43,440 --> 00:15:46,160
and it turned out
not to end the world.

313
00:15:46,160 --> 00:15:49,420
Further, it was harder to argue
for a cryptography ban in 2000

314
00:15:49,420 --> 00:15:54,270
because there was a large
constituency in favor

315
00:15:54,270 --> 00:15:56,090
of the use of cryptography.

316
00:15:56,090 --> 00:15:59,150
That is, if someone
in 1985 says,

317
00:15:59,150 --> 00:16:01,630
let's ban strong
cryptography, well, banks

318
00:16:01,630 --> 00:16:02,880
are using strong cryptography.

319
00:16:02,880 --> 00:16:04,860
So they'll ask for an exemption.

320
00:16:04,860 --> 00:16:05,580
But other than
that, there weren't

321
00:16:05,580 --> 00:16:07,121
a lot of users of
strong cryptography

322
00:16:07,121 --> 00:16:08,384
in the civilian space.

323
00:16:08,384 --> 00:16:09,800
But if someone in
2000 said, let's

324
00:16:09,800 --> 00:16:12,180
ban strong
cryptography, that would

325
00:16:12,180 --> 00:16:14,900
be every internet company.

326
00:16:14,900 --> 00:16:18,885
Everyone running an HTTPS page
would start waving their hands

327
00:16:18,885 --> 00:16:20,050
and shouting about it.

328
00:16:20,050 --> 00:16:21,690
And nowadays, strong
cryptography bans

329
00:16:21,690 --> 00:16:24,610
are probably unfeasible,
although people

330
00:16:24,610 --> 00:16:26,000
keep bringing back the idea.

331
00:16:26,000 --> 00:16:27,470
And again, I'm not
the philosopher

332
00:16:27,470 --> 00:16:29,980
or political scientist
of the movement.

333
00:16:29,980 --> 00:16:34,860
So some folks ask me,
what's your threat model?

334
00:16:34,860 --> 00:16:37,390
It's good to be thinking
in terms of threat models.

335
00:16:37,390 --> 00:16:40,280
Unfortunately, our threat
model is kind of weird.

336
00:16:40,280 --> 00:16:43,570
We started not with an
adversary requirement.

337
00:16:43,570 --> 00:16:46,202
But we started with a
usability requirement.

338
00:16:46,202 --> 00:16:48,700
The usability requirement
we gave ourselves to begin

339
00:16:48,700 --> 00:16:52,395
is, this has to be
useful for web browsing.

340
00:16:52,395 --> 00:16:58,910
This has to be useful for
interactive protocols.

341
00:16:58,910 --> 00:17:01,110
And it actually
needs to see use.

342
00:17:01,110 --> 00:17:04,800
Subject to that, we want
to maximize security.

343
00:17:04,800 --> 00:17:07,369
So our threat model has
lots of weird corners

344
00:17:07,369 --> 00:17:10,050
in it if you actually
write it out as,

345
00:17:10,050 --> 00:17:13,410
what can an attacker do, under
what circumstances, and how?

346
00:17:13,410 --> 00:17:15,780
And that's because we've
set ourselves the goal of,

347
00:17:15,780 --> 00:17:17,810
it has to work for the web.

348
00:17:17,810 --> 00:17:20,443
And I'll return to that
in a minute or two.

349
00:17:20,443 --> 00:17:23,180
But let's sort of
talk about now how

350
00:17:23,180 --> 00:17:29,810
we can use forward anonymity,
how we build forward anonymity.

351
00:17:29,810 --> 00:17:32,580
So here's Alice.

352
00:17:32,580 --> 00:17:35,890
She wants to buy socks.

353
00:17:35,890 --> 00:17:42,170
So OK, let's say that
Alice runs a computer.

354
00:17:42,170 --> 00:17:43,820
Let's call it R for relay.

355
00:17:43,820 --> 00:17:47,670
And this computer relays
her traffic to-- I

356
00:17:47,670 --> 00:17:50,600
want to say socks.com,
but I'm afraid that'll

357
00:17:50,600 --> 00:17:53,690
turn out to be something
horrible, so zappos.com.

358
00:17:53,690 --> 00:17:55,000
Yeah, they sell socks, too.

359
00:17:55,000 --> 00:17:58,470
All right, so Alice wants to
buy some socks from zappos.com.

360
00:17:58,470 --> 00:18:00,930
And she's going through a relay.

361
00:18:00,930 --> 00:18:04,530
Well, I said Alice runs a relay.

362
00:18:04,530 --> 00:18:07,910
Any eavesdropper who's
looking at this will say,

363
00:18:07,910 --> 00:18:09,240
that's Alice's computer.

364
00:18:09,240 --> 00:18:11,097
It's probably Alice.

365
00:18:11,097 --> 00:18:13,180
All right, so let's have
somebody else run a relay

366
00:18:13,180 --> 00:18:17,340
and have lots of other
users all visit it.

367
00:18:17,340 --> 00:18:20,200
I'll call them A2
and A3, because there

368
00:18:20,200 --> 00:18:26,720
aren't enough standard
cryptography person names-- buy

369
00:18:26,720 --> 00:18:34,332
books, tweet cat pictures.

370
00:18:37,600 --> 00:18:42,670
This is like 80% of what people
do on the internet, right?

371
00:18:42,670 --> 00:18:46,650
So now we have three people all
going into this relay, three

372
00:18:46,650 --> 00:18:47,600
streams exiting.

373
00:18:47,600 --> 00:18:51,090
Someone who's watching the
relay can't easily correlate--

374
00:18:51,090 --> 00:18:54,290
should not be able to, we hope, but
we'll return to that later--

375
00:18:54,290 --> 00:18:58,300
that this Alice is buying
socks, this Alice is buying books,

376
00:18:58,300 --> 00:19:00,860
this Alice is tweeting cat pix.

377
00:19:00,860 --> 00:19:06,090
Well, except if they're watching
this side of the connections,

378
00:19:06,090 --> 00:19:08,530
they can see Alice
telling the relay,

379
00:19:08,530 --> 00:19:10,554
please connect me to zappos.com.

380
00:19:10,554 --> 00:19:12,220
All right, so we'll
add some encryption.

381
00:19:12,220 --> 00:19:15,200
We'll maybe do TLS on
all of these links.

382
00:19:15,200 --> 00:19:18,015
So to the extent that you
can't break TLS, to the extent

383
00:19:18,015 --> 00:19:20,200
you can't correlate
this to this,

384
00:19:20,200 --> 00:19:22,630
then they get some privacy.

385
00:19:22,630 --> 00:19:25,830
Well, that's still not
good enough, though.

386
00:19:25,830 --> 00:19:31,040
Because first off, we're
assuming that this relay

387
00:19:31,040 --> 00:19:32,619
is fully trusted.

388
00:19:32,619 --> 00:19:34,410
I assume you know the
definition of trusted

389
00:19:34,410 --> 00:19:36,460
and why it doesn't
actually mean trusted.

390
00:19:36,460 --> 00:19:37,940
OK, good.

391
00:19:37,940 --> 00:19:39,440
This is trusted in
the sense that it

392
00:19:39,440 --> 00:19:41,720
can break the whole system,
trusted in the sense

393
00:19:41,720 --> 00:19:44,845
that you can't help but trust
it, not trusted in the sense

394
00:19:44,845 --> 00:19:46,620
that it's actually trustworthy.

395
00:19:46,620 --> 00:19:49,720
So all right, we can
introduce multiple relays.

396
00:19:49,720 --> 00:19:53,410
We can have different relays
run by different people.

397
00:19:53,410 --> 00:20:00,120
We can have-- this is not
actually the topology we use.

398
00:20:00,120 --> 00:20:01,885
But my blackboard
technique is terrible,

399
00:20:01,885 --> 00:20:04,225
and I don't want
to redraw anything.

400
00:20:07,190 --> 00:20:09,720
We can imagine tumbling
these connections

401
00:20:09,720 --> 00:20:11,680
through multiple
relays, each of which

402
00:20:11,680 --> 00:20:14,170
removes a single
layer of encryption.

403
00:20:14,170 --> 00:20:19,770
So all this relay sees is
Alice is doing something.

404
00:20:19,770 --> 00:20:23,610
All this relay sees is
someone is buying socks.

405
00:20:23,610 --> 00:20:26,240
But this one just sees
someone is buying socks.

406
00:20:26,240 --> 00:20:28,562
The connection came
from this relay.

407
00:20:28,562 --> 00:20:30,395
This one just sees Alice
is doing something,

408
00:20:30,395 --> 00:20:32,320
and it forwards onto this relay.

409
00:20:32,320 --> 00:20:35,505
And no single party ought
to be able to correlate

410
00:20:35,505 --> 00:20:37,450
the whole thing.
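
The layering idea can be sketched in a few lines. This is a toy illustration only, not Tor's actual cell format or cryptography: the XOR keystream built from SHA-256 is insecure and stands in for a real cipher, the keys are assumed to be shared with each relay in advance, and the names keystream_xor, wrap, and peel are hypothetical.

```python
import hashlib
import json

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT secure): XOR data with a SHA-256 counter keystream.

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def wrap(path, destination: str, payload: bytes) -> bytes:
    """Build the layers from the inside out; path is [(relay_name, key), ...]."""
    names = [name for name, _ in path]
    next_hops = names[1:] + [destination]
    blob = payload
    for (_name, key), nxt in zip(reversed(path), reversed(next_hops)):
        layer = json.dumps({"next": nxt, "body": blob.hex()}).encode()
        blob = keystream_xor(key, layer)
    return blob

def peel(key: bytes, blob: bytes):
    """What one relay does: remove a single layer, learn only the next hop."""
    layer = json.loads(keystream_xor(key, blob).decode())
    return layer["next"], bytes.fromhex(layer["body"])
```

The first relay's peel yields the name of the second relay plus a blob it cannot read, and only the last relay's peel reveals the destination and the request. No single relay sees both Alice and the sock shop.
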

411
00:20:37,450 --> 00:20:42,780
Now we come to a
major design point.

412
00:20:42,780 --> 00:20:50,090
Let's suppose that Eve is
watching here and here.

413
00:20:50,090 --> 00:20:52,250
Nothing I've said
so far does anything

414
00:20:52,250 --> 00:20:57,860
to obscure the timing and
volume of Alice's packets.

415
00:20:57,860 --> 00:21:01,140
Oh sure, there'll be
some trivial noise

416
00:21:01,140 --> 00:21:03,690
added from all the
computation and decryption

417
00:21:03,690 --> 00:21:06,220
these things do, from
network latency, and so on.

418
00:21:06,220 --> 00:21:11,600
But ultimately, if Alice
is sending a kilobyte in,

419
00:21:11,600 --> 00:21:13,500
then in the design I've
sketched out so far,

420
00:21:13,500 --> 00:21:16,315
a kilobyte is coming out.

421
00:21:16,315 --> 00:21:21,650
And if the socks web
page is 64k long,

422
00:21:21,650 --> 00:21:26,340
and is served by this
web server at 11:26,

423
00:21:26,340 --> 00:21:27,870
then Alice is going
to get something

424
00:21:27,870 --> 00:21:33,460
about 64k long at
11:26 or 11:27 or so.

425
00:21:33,460 --> 00:21:38,400
Now, with some
statistics, Eve can

426
00:21:38,400 --> 00:21:42,540
correlate some of these
streams if we don't obscure

427
00:21:42,540 --> 00:21:44,726
volume and timing information.
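
To make the correlation idea concrete, here is a minimal sketch of what Eve might do with per-second byte counts seen at the two observation points. The traces, the stream names, and the choice of Pearson correlation with a small lag search are all assumptions for illustration; the talk does not say which statistics an attacker would use.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant trace carries no timing signal
    return cov / (var_x * var_y) ** 0.5

# Hypothetical bytes-per-second observed on Alice's link into the network.
alice_in = [0, 1024, 0, 0, 512, 0, 2048, 0]

# Hypothetical bytes-per-second observed on streams leaving the last relay.
exit_streams = {
    "stream_a": [300, 300, 300, 300, 300, 300, 300, 300],
    "stream_b": [0, 0, 1030, 0, 0, 520, 0, 2050],  # Alice's traffic, one second later
    "stream_c": [900, 0, 0, 700, 0, 0, 10, 400],
}

def best_match(entry, exits, max_lag=2):
    """Return (name, score) for the exit stream most correlated with entry,
    trying lags of 0..max_lag seconds to allow for network latency."""
    best = (None, -1.0)
    for name, trace in exits.items():
        for lag in range(max_lag + 1):
            n = len(entry) - lag
            score = pearson(entry[:n], trace[lag:lag + n])
            if score > best[1]:
                best = (name, score)
    return best

name, score = best_match(alice_in, exit_streams)
print(name, round(score, 3))
```

Even with a few bytes of noise and a one-second delay, the matching stream stands out, which is why latency alone does not hide volume and timing.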

428
00:21:44,726 --> 00:21:46,850
There are designs that do
obscure volume and timing

429
00:21:46,850 --> 00:21:48,190
information.

430
00:21:48,190 --> 00:21:52,230
The good ones usually
come out of [INAUDIBLE],

431
00:21:52,230 --> 00:21:55,140
although there's
some work on DC-nets.

432
00:21:55,140 --> 00:21:58,040
You could have something
where each of these nodes

433
00:21:58,040 --> 00:22:00,600
received a large number of
requests, just [INAUDIBLE]

434
00:22:00,600 --> 00:22:03,030
up all the requests
they got for an hour,

435
00:22:03,030 --> 00:22:06,970
reordered them, and
transmitted them all at once.

436
00:22:06,970 --> 00:22:10,260
And you could also say all
requests must be the same size.

437
00:22:10,260 --> 00:22:13,670
Requests are 1k,
responses are 1 megabyte.
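
The batching idea just described can be sketched as follows. This is a toy under stated assumptions: the class name, the length-prefix padding format, and the flush trigger are made up for illustration, and the sketch leaves out the per-hop encryption that a real mix needs so that inputs and outputs cannot be matched by content.

```python
import secrets

REQUEST_SIZE = 1024  # every request is padded to exactly 1k, as in the talk

def pad(message: bytes, size: int = REQUEST_SIZE) -> bytes:
    """Prefix a 2-byte length, then pad with zero bytes to a fixed size."""
    if len(message) > size - 2:
        raise ValueError("message too long for one fixed-size request")
    filler = b"\x00" * (size - 2 - len(message))
    return len(message).to_bytes(2, "big") + message + filler

def unpad(cell: bytes) -> bytes:
    """Recover the original message from a padded request."""
    length = int.from_bytes(cell[:2], "big")
    return cell[2:2 + length]

class BatchingMix:
    """Hypothetical mix node: collect requests, then reorder and send together."""

    def __init__(self):
        self.pool = []

    def accept(self, message: bytes) -> None:
        # All stored requests have the same size, hiding volume.
        self.pool.append(pad(message))

    def flush(self) -> list:
        # Called once per batching interval (an hour in the talk's example).
        batch, self.pool = self.pool, []
        secrets.SystemRandom().shuffle(batch)  # hide arrival order
        return batch

mix = BatchingMix()
for msg in [b"buy socks", b"read news", b"send mail"]:
    mix.accept(msg)
outgoing = mix.flush()
print([len(cell) for cell in outgoing])
```

The cost is visible in the design itself: nothing leaves until the interval ends, so latency is bounded below by the batching period.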

438
00:22:13,670 --> 00:22:15,680
And with some more
work on that, we

439
00:22:15,680 --> 00:22:22,440
get something that would let you
send an email that would arrive

440
00:22:22,440 --> 00:22:29,220
on the order of hours, or get a web
page on the order of [INAUDIBLE],

441
00:22:29,220 --> 00:22:32,610
assuming that you optimize
it to a single round trip.

442
00:22:32,610 --> 00:22:36,500
These systems exist, and existed
when we started doing Tor.

443
00:22:36,500 --> 00:22:38,675
They don't get a
lot of use, though.

444
00:22:38,675 --> 00:22:40,740
I actually wrote
one called Mixminion

445
00:22:40,740 --> 00:22:44,010
that was a successor to
the Mixmaster remailer.

446
00:22:44,010 --> 00:22:46,510
I have not gotten a remailer
message in the last three

447
00:22:46,510 --> 00:22:47,010
years.

448
00:22:49,620 --> 00:22:51,350
Tor has millions of users.

449
00:22:51,350 --> 00:22:54,293
Remailers, it's unclear
whether they've got more than

450
00:22:54,293 --> 00:22:55,477
on the order of hundreds.

451
00:22:55,477 --> 00:22:57,310
So you might think,
well, still though, it's

452
00:22:57,310 --> 00:22:59,830
better anonymity for the
people who really need it.

453
00:22:59,830 --> 00:23:03,120
Except if you've only got on
the order of hundreds of users,

454
00:23:03,120 --> 00:23:05,655
then you're not
really providing them

455
00:23:05,655 --> 00:23:08,630
all that much anonymity against
this kind of adversary anyway.

456
00:23:08,630 --> 00:23:10,260
Because this adversary
can simply go,

457
00:23:10,260 --> 00:23:12,250
OK, there's 100 people.

458
00:23:12,250 --> 00:23:14,080
Well, the message I
want to investigate

459
00:23:14,080 --> 00:23:15,630
was looking at a
Bulgarian website.

460
00:23:15,630 --> 00:23:17,040
How many of them
speak Bulgarian?

461
00:23:17,040 --> 00:23:20,170
OK, that's five.

462
00:23:20,170 --> 00:23:22,950
The saying is,
anonymity loves company.

463
00:23:22,950 --> 00:23:25,615
Unless you have a
large user base,

464
00:23:25,615 --> 00:23:28,230
no system can actually
provide anonymity.

465
00:23:28,230 --> 00:23:31,970
And that's why also in this
design, if these Alices all

466
00:23:31,970 --> 00:23:33,770
belong to an
organization, they ought

467
00:23:33,770 --> 00:23:38,830
to have a shared public system
rather than a private one.

468
00:23:38,830 --> 00:23:45,130
If they all work for
MIT legal, and they're

469
00:23:45,130 --> 00:23:50,120
investigating some
fake MIT website that's

470
00:23:50,120 --> 00:23:54,663
offering fake diplomas,
then if they're just

471
00:23:54,663 --> 00:23:58,800
using the MIT legal anonymizer,
then it's not really

472
00:23:58,800 --> 00:24:00,370
concealing who they are.

473
00:24:00,370 --> 00:24:02,495
But if you have a large
number of different parties

474
00:24:02,495 --> 00:24:06,590
all using this, then it actually
can provide some privacy.

475
00:24:06,590 --> 00:24:13,830
So we'll return one more time
to resisting these correlation

476
00:24:13,830 --> 00:24:14,330
attacks.

477
00:24:14,330 --> 00:24:16,996
But for now let's say that we're
not resisting these correlation

478
00:24:16,996 --> 00:24:17,720
attacks.

479
00:24:17,720 --> 00:24:23,070
And instead, we assume that
an attacker who sees both ends

480
00:24:23,070 --> 00:24:25,850
wins, and we're trying to
minimize the probability

481
00:24:25,850 --> 00:24:28,220
that that happens over time.
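
As a rough worked example of "the probability that that happens over time", here is a sketch under a deliberately simple model. The assumptions are mine, not the talk's: each circuit independently has its first hop observed with probability `p_entry` and its last hop observed with probability `p_exit`, and Alice builds a fresh, independent circuit each time. Real path selection is not this simple.

```python
def p_compromised_once(p_entry: float, p_exit: float) -> float:
    """Chance that one circuit has both ends observed (independence assumed)."""
    return p_entry * p_exit

def p_compromised_ever(p_entry: float, p_exit: float, circuits: int) -> float:
    """Chance that at least one of `circuits` independent circuits is observed
    at both ends: 1 - (1 - p)^n."""
    p = p_compromised_once(p_entry, p_exit)
    return 1.0 - (1.0 - p) ** circuits

# Hypothetical attacker who sees 10% of entry points and 10% of exit points.
for n in (1, 10, 100, 1000):
    print(n, round(p_compromised_ever(0.1, 0.1, n), 4))
```

Under these assumptions a 1% per-circuit risk grows to roughly 63% after 100 circuits, which is why the number of times Alice rolls the dice matters as much as the per-circuit odds.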

482
00:24:28,220 --> 00:24:31,150
All right, so I've just
talked about message passing.

483
00:24:35,464 --> 00:24:37,880
The way you would build that
with something like a mix net

484
00:24:37,880 --> 00:24:45,630
is you give each of these relays
a public key-- K3, K2, K1.

485
00:24:45,630 --> 00:24:48,480
And when Alice wants to
send something through here,

486
00:24:48,480 --> 00:24:55,110
she would say, encrypt
with K3, socks,

487
00:24:55,110 --> 00:24:59,350
and then encrypt
that with K2-- I'm

488
00:24:59,350 --> 00:25:01,430
leaving off routing
information for now--

489
00:25:01,430 --> 00:25:04,320
and then encrypt with K1.

490
00:25:04,320 --> 00:25:05,894
But public key, as
you know, is kind

491
00:25:05,894 --> 00:25:08,310
of expensive enough that you
don't want to use it for bulk

492
00:25:08,310 --> 00:25:10,000
traffic.

493
00:25:10,000 --> 00:25:17,610
So instead what you
do is you negotiate

494
00:25:17,610 --> 00:25:20,110
a set of keys with each server.

495
00:25:20,110 --> 00:25:23,350
So Alice shares a symmetric
key with this relay,

496
00:25:23,350 --> 00:25:25,100
a different symmetric
key with this relay,

497
00:25:25,100 --> 00:25:28,395
and a different symmetric key
with this relay associated

498
00:25:28,395 --> 00:25:32,110
in what we call a circuit, which
is a path through the network.

499
00:25:32,110 --> 00:25:38,677
And after the initial public-key
setup to create those keys,

500
00:25:38,677 --> 00:25:40,135
Alice can then use
symmetric crypto

501
00:25:40,135 --> 00:25:41,551
to send stuff
through the network.
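
The layering over a circuit can be sketched as below. This assumes the three symmetric keys already exist, and it uses a toy SHA-256 keystream XOR as the cipher purely so the example is self-contained; the key values and function names are hypothetical, and this is not the cipher Tor uses.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key || counter. For illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def crypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call adds or removes one layer."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

def wrap(payload: bytes, path_keys: list) -> bytes:
    """Alice adds one layer per relay; the last relay's layer is innermost."""
    cell = payload
    for key in reversed(path_keys):
        cell = crypt(key, cell)
    return cell

def unwrap_reply(cell: bytes, path_keys: list) -> bytes:
    """Alice removes every layer that the relays added on the way back."""
    for key in path_keys:
        cell = crypt(key, cell)
    return cell

# Hypothetical keys, one shared between Alice and each relay on the circuit.
path = [b"key with relay 1", b"key with relay 2", b"key with relay 3"]

# Forward direction: each relay in turn strips exactly one layer.
cell = wrap(b"buy socks", path)
for key in path:
    cell = crypt(key, cell)
forward_result = cell  # what comes out of the last relay

# Reply direction: each relay adds a layer, starting with the last relay.
reply = b"socks page"
for key in reversed(path):
    reply = crypt(key, reply)
reply_result = unwrap_reply(reply, path)
print(forward_result, reply_result)
```

The same per-hop keys serve both directions, which is what makes the reply channel cheap once the circuit exists.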

502
00:25:41,551 --> 00:25:43,920
If you stop at that
point, then you

503
00:25:43,920 --> 00:25:47,250
have onion routing as it
was designed in the 1990s

504
00:25:47,250 --> 00:25:51,955
by Syverson,
Goldschlag, and Reed.

505
00:25:51,955 --> 00:25:54,811
And I hope I get
the names right.

506
00:25:54,811 --> 00:25:56,060
Paul Syverson is still active.

507
00:25:56,060 --> 00:25:59,210
The other two are
working on other things.

508
00:25:59,210 --> 00:26:03,390
Also, once you've added circuits
like that, medium term paths

509
00:26:03,390 --> 00:26:06,910
through the network, you can
have an easy reply channel

510
00:26:06,910 --> 00:26:09,310
where things sent
back this way get

511
00:26:09,310 --> 00:26:13,155
to Alice being encrypted at
each step instead of decrypted

512
00:26:13,155 --> 00:26:15,770
at each step.

513
00:26:15,770 --> 00:26:21,660
And of course you need some
kind of integrity checking,

514
00:26:21,660 --> 00:26:24,430
either node by
node or end to end.

515
00:26:24,430 --> 00:26:26,280
Because if you don't
do integrity checking,

516
00:26:26,280 --> 00:26:31,855
then-- well, let's say you're
using an XOR based stream

517
00:26:31,855 --> 00:26:33,622
cipher for your encryption.

518
00:26:33,622 --> 00:26:35,080
If you don't do
integrity checking,

519
00:26:35,080 --> 00:26:39,230
then this node can XOR in
Alice, Alice, Alice, Alice,

520
00:26:39,230 --> 00:26:42,410
Alice to the encrypted message.

521
00:26:42,410 --> 00:26:44,970
And then when it's finally
decrypted over here,

522
00:26:44,970 --> 00:26:47,310
because that's a
malleable crypto

523
00:26:47,310 --> 00:26:56,410
scheme, if the same attacker is
controlling this node as well,

524
00:26:56,410 --> 00:26:58,970
or if the attacker
is observing it here,

525
00:26:58,970 --> 00:27:01,870
the attacker will see Alice,
Alice, Alice, Alice, Alice

526
00:27:01,870 --> 00:27:03,820
XORed with a
reasonable plain text

527
00:27:03,820 --> 00:27:05,320
and be able to use
that to identify,

528
00:27:05,320 --> 00:27:08,580
ah, this is the stream
that came from Alice.
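
The tagging attack just described can be reproduced in a few lines. As before, the SHA-256 keystream XOR is a stand-in cipher chosen only to make the sketch runnable, and the keys, plaintext, and function names are hypothetical.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key || counter. For illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def layer(key: bytes, cell: bytes) -> bytes:
    """Add or remove one stream-cipher layer (the operation is its own inverse)."""
    return xor(cell, keystream(key, len(cell)))

def run_circuit(plaintext: bytes, keys: list, tag: bytes = None) -> bytes:
    """Send plaintext through the relays with no integrity check.
    If tag is given, the first relay XORs it into the cell it forwards."""
    cell = plaintext
    for key in reversed(keys):      # Alice adds all layers
        cell = layer(key, cell)
    cell = layer(keys[0], cell)     # first relay strips its layer
    if tag is not None:
        cell = xor(cell, tag)       # malicious first relay marks the stream
    for key in keys[1:]:            # remaining relays strip their layers
        cell = layer(key, cell)
    return cell                     # what is visible at the far end

keys = [b"S1", b"S2", b"S3"]
plaintext = b"GET /socks HTTP/1.0\r\n"
TAG = (b"Alice" * 5)[:len(plaintext)]

seen_at_exit = run_circuit(plaintext, keys, TAG)
# The far end sees plaintext XOR tag; removing the tag yields sensible text,
# which confirms that this stream passed through the tagging relay.
print(xor(seen_at_exit, TAG))
```

An integrity check, whether node by node or end to end, would make the modified cell fail verification instead of decrypting to something the attacker can recognize.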

529
00:27:08,580 --> 00:27:12,370
So let's do a little more
about how the protocol works.

530
00:27:12,370 --> 00:27:14,870
Because it would be a shame to
have everybody read the paper

531
00:27:14,870 --> 00:27:16,245
and then not talk
about the stuff

532
00:27:16,245 --> 00:27:17,680
that the paper is focused on.

533
00:27:24,011 --> 00:27:26,840
Again, I apologize for
my blackboard technique.

534
00:27:26,840 --> 00:27:32,120
Most of the time, I'm
sitting at home on a desktop.

535
00:27:32,120 --> 00:27:35,385
This is alien tech.

536
00:27:35,385 --> 00:27:38,315
So here's a relay.

537
00:27:38,315 --> 00:27:41,580
Here's Alice.

538
00:27:41,580 --> 00:27:43,610
Here's another relay.

539
00:27:43,610 --> 00:27:44,270
Here's Bob.

540
00:27:44,270 --> 00:27:45,843
Now Alice wants to talk to Bob.

541
00:27:48,460 --> 00:27:52,720
So first thing Alice has
to do is build a circuit

542
00:27:52,720 --> 00:27:55,210
through these relays to Bob.

543
00:27:55,210 --> 00:27:57,130
Let's say she's picked
these two, R1 and R2.

544
00:27:59,900 --> 00:28:08,050
So Alice first makes
a TLS link to R1.

545
00:28:08,050 --> 00:28:10,660
R1, let's say, already
has a TLS link to R2.

546
00:28:13,550 --> 00:28:16,335
First thing Alice
does is she does

547
00:28:16,335 --> 00:28:25,250
a one-way authenticated, one-way
anonymous key negotiation.

548
00:28:25,250 --> 00:28:28,340
The old one in
Tor is called TAP.

549
00:28:28,340 --> 00:28:30,280
The new one is called NTor.

550
00:28:30,280 --> 00:28:31,980
They both have proofs.

551
00:28:35,032 --> 00:28:36,490
They both even have
correct proofs,

552
00:28:36,490 --> 00:28:41,540
although the original proof
in the paper had a flaw in it.

553
00:28:41,540 --> 00:28:45,780
But when that's done,
she sends a create cell.

554
00:28:45,780 --> 00:28:47,690
And she picks a circuit ID.

555
00:28:47,690 --> 00:28:52,023
Let's say she picks
3, and says, create 3.

556
00:28:54,650 --> 00:28:55,650
The relay says, created.

557
00:29:00,010 --> 00:29:05,575
And now R1 and Alice share a
secret key, a symmetric key,

558
00:29:05,575 --> 00:29:06,866
which they're going to call S1.

559
00:29:10,280 --> 00:29:16,234
And they both have this stored
as 3 with respect to this link.

560
00:29:19,020 --> 00:29:23,810
Now Alice can use that key
to send messages to R1.

561
00:29:23,810 --> 00:29:27,265
So she says, on 3-- that's
the circuit ID that everything

562
00:29:27,265 --> 00:29:38,760
was talking about in the
paper-- send a relay extend

563
00:29:38,760 --> 00:29:41,210
with some contents.

564
00:29:41,210 --> 00:29:44,326
The extend cell basically
contains the first half

565
00:29:44,326 --> 00:29:47,130
of the create handshake.

566
00:29:47,130 --> 00:29:50,965
But this time, it's not
encrypted with R1's public key.

567
00:29:50,965 --> 00:29:53,070
It's encrypted with
R2's public key.

568
00:29:53,070 --> 00:29:56,130
And it also says, and
this one goes to R2.

569
00:29:56,130 --> 00:30:01,941
So R1 knows to open a new
circuit to R2, and says,

570
00:30:01,941 --> 00:30:02,440
create.

571
00:30:05,770 --> 00:30:09,480
And it passes the initial
part of the handshake

572
00:30:09,480 --> 00:30:12,120
as it came from Alice along.

573
00:30:12,120 --> 00:30:14,550
And it picks its own circuit ID.

574
00:30:14,550 --> 00:30:17,185
Because circuit IDs identify
the different circuits

575
00:30:17,185 --> 00:30:19,122
on this TLS connection.

576
00:30:19,122 --> 00:30:20,830
And Alice doesn't know
what other circuit

577
00:30:20,830 --> 00:30:22,120
IDs are in use on this one.

578
00:30:22,120 --> 00:30:24,390
Because this one is
private to R1 and R2.

579
00:30:24,390 --> 00:30:28,270
So it might pick 95.

580
00:30:28,270 --> 00:30:30,020
It actually is very
unlikely to pick that,

581
00:30:30,020 --> 00:30:36,270
because they're randomly
chosen from a 4 byte space.

582
00:30:36,270 --> 00:30:40,780
But I don't want to write
out any 32-bit numbers today.

583
00:30:40,780 --> 00:30:43,975
And this says,
created in response.

584
00:30:43,975 --> 00:30:48,590
So this one sends back an
extended cell encrypted with S1.

585
00:30:48,590 --> 00:30:58,480
And now Alice and
R2 share S2.

586
00:30:58,480 --> 00:31:01,050
So now Alice can send
messages encrypted

587
00:31:01,050 --> 00:31:06,480
first with S2, and then
with S1 as relay cells.

588
00:31:06,480 --> 00:31:08,000
So she sends a
message like that.

589
00:31:08,000 --> 00:31:12,960
R1 removes the S1 encryption
and forwards it on.

590
00:31:12,960 --> 00:31:17,750
It says, OK, it came
in on circuit 3.

591
00:31:17,750 --> 00:31:20,370
I know that 3 goes
to 95 on this one.

592
00:31:20,370 --> 00:31:23,075
So I send it on 95.

593
00:31:23,075 --> 00:31:25,852
And I say whatever I
got after decrypting.

594
00:31:25,852 --> 00:31:28,980
OK, and this one says,
ah, it came in on 95.

595
00:31:28,980 --> 00:31:33,290
95 corresponds to
the shared key S2.

596
00:31:33,290 --> 00:31:34,740
So I'll decrypt with that.

597
00:31:34,740 --> 00:31:38,340
Oh, that says, open
a connection to Bob.

598
00:31:38,340 --> 00:31:41,650
And relay 2 opens a
TCP connection to Bob

599
00:31:41,650 --> 00:31:45,270
and tells Alice that it did
it through the same process.
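
The bookkeeping in this walkthrough, where circuit 3 on one link maps to circuit 95 on the next, can be sketched as a per-relay table. The class layout, link names, and toy cipher below are assumptions for illustration only and say nothing about how the Tor code base is actually organized.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

def crypt(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (SHA-256 keystream); stands in for the real one."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, stream))

@dataclass
class Hop:
    key: bytes                    # symmetric key shared with Alice
    next_link: Optional[str]      # outgoing link, or None at the last relay
    next_circ_id: Optional[int]   # circuit ID used on the outgoing link

class Relay:
    def __init__(self, name: str):
        self.name = name
        # Circuit IDs are only meaningful per link, so the key is (link, id).
        self.circuits: Dict[Tuple[str, int], Hop] = {}

    def add_circuit(self, link, circ_id, key, next_link=None, next_circ_id=None):
        self.circuits[(link, circ_id)] = Hop(key, next_link, next_circ_id)

    def handle(self, link: str, circ_id: int, payload: bytes):
        """Strip one layer, then say where the result goes next."""
        hop = self.circuits[(link, circ_id)]
        inner = crypt(hop.key, payload)
        if hop.next_link is None:
            return ("exit", None, inner)  # last relay acts on the contents
        return (hop.next_link, hop.next_circ_id, inner)

S1, S2 = b"shared with R1", b"shared with R2"
r1, r2 = Relay("R1"), Relay("R2")
r1.add_circuit("alice-r1", 3, S1, next_link="r1-r2", next_circ_id=95)
r2.add_circuit("r1-r2", 95, S2)

# Alice encrypts first with S2, then with S1, and sends on circuit 3.
cell = crypt(S1, crypt(S2, b"open a connection to Bob"))
link, circ_id, cell = r1.handle("alice-r1", 3, cell)
print(link, circ_id)                 # r1-r2 95
print(r2.handle(link, circ_id, cell))
```

Keying the table on the link as well as the ID is what lets R1 pick 95 without coordinating with Alice, since she never sees the IDs in use between R1 and R2.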

600
00:31:45,270 --> 00:31:47,150
And Alice says, great.

601
00:31:47,150 --> 00:31:58,440
Tell Bob HTTP 1.0 GET /index.html,
and the world goes on.

602
00:31:58,440 --> 00:32:00,120
Let's see, what did I leave out?

603
00:32:00,120 --> 00:32:03,040
I'll skip that, skip
that, skip that.

604
00:32:03,040 --> 00:32:04,930
So what do we actually relay?

605
00:32:04,930 --> 00:32:07,210
Some designs in this area
say, well, you should

606
00:32:07,210 --> 00:32:08,980
send IP packets back and forth.

607
00:32:08,980 --> 00:32:12,006
This should just be a way
to transmit IP packets.

608
00:32:12,006 --> 00:32:15,980
One of the problems
with that is we

609
00:32:15,980 --> 00:32:19,070
want to support as many
users as possible, which

610
00:32:19,070 --> 00:32:21,580
means we have to run on all
kinds of operating systems.

611
00:32:21,580 --> 00:32:23,920
And operating system
TCP stacks do not

612
00:32:23,920 --> 00:32:26,020
act anything like each other.

613
00:32:26,020 --> 00:32:27,960
If you've ever used
Nmap, or if you've ever

614
00:32:27,960 --> 00:32:30,610
used any kind of network
traffic analysis tool,

615
00:32:30,610 --> 00:32:34,635
you can trivially tell
Windows TCP from FreeBSD

616
00:32:34,635 --> 00:32:36,880
from Linux TCP.

617
00:32:36,880 --> 00:32:38,990
And you can even tell
different versions apart.

618
00:32:38,990 --> 00:32:41,870
And moreover, if you
can send raw IP packets

619
00:32:41,870 --> 00:32:45,560
to a chosen host,
you can provoke

620
00:32:45,560 --> 00:32:49,810
different responses
in part based

621
00:32:49,810 --> 00:32:51,637
on what the host is doing.

622
00:32:51,637 --> 00:32:53,458
So if you're doing
IP, you would actually

623
00:32:53,458 --> 00:32:55,900
need an IP normalization
layer if IP is what

624
00:32:55,900 --> 00:32:58,630
you transport back and forth.

625
00:32:58,630 --> 00:33:03,730
And it seems that anything less
than a full IP stack is not

626
00:33:03,730 --> 00:33:07,017
actually going to work
for IP normalization.

627
00:33:07,017 --> 00:33:08,350
So you wouldn't want to do that.

628
00:33:10,880 --> 00:33:13,560
Instead, what we just chose
is-- and this is largely

629
00:33:13,560 --> 00:33:15,960
because this is the
easiest way-- you take

630
00:33:15,960 --> 00:33:18,230
the contents of TCP streams.

631
00:33:18,230 --> 00:33:25,390
So you just assume
each of these things

632
00:33:25,390 --> 00:33:27,610
is reliable and in order.

633
00:33:27,610 --> 00:33:31,430
You have the computer at Alice's
end, the program Alice is

634
00:33:31,430 --> 00:33:35,400
running to do all
this stuff for her,

635
00:33:35,400 --> 00:33:38,120
accept TCP connections
from Alice's applications,

636
00:33:38,120 --> 00:33:40,720
and then just relay
their contents

637
00:33:40,720 --> 00:33:44,229
and don't do anything
trickier on the network level.

638
00:33:44,229 --> 00:33:46,020
You might be able to
get better performance

639
00:33:46,020 --> 00:33:46,970
by trying some other means.

640
00:33:46,970 --> 00:33:48,428
And there are some
papers examining

641
00:33:48,428 --> 00:33:49,880
how you would do that.

642
00:33:49,880 --> 00:33:52,820
But this is the one that we
could actually implement.

643
00:33:52,820 --> 00:33:54,392
Because we paid a
lot more attention

644
00:33:54,392 --> 00:33:56,100
in security and
compilers classes than we

645
00:33:56,100 --> 00:33:58,860
did in networking classes.

646
00:33:58,860 --> 00:34:00,760
Now we have networking people.

647
00:34:00,760 --> 00:34:04,285
But in 2003, 2004, we did not
have any networking experts.

648
00:34:07,250 --> 00:34:09,030
TCP also seems like
the right level.

649
00:34:09,030 --> 00:34:11,594
Higher level
protocols-- like in some

650
00:34:11,594 --> 00:34:13,210
of the original
[INAUDIBLE] designs,

651
00:34:13,210 --> 00:34:16,389
there were separate proxies
at this end for HTTP,

652
00:34:16,389 --> 00:34:19,000
for FTP, and so on.

653
00:34:19,000 --> 00:34:21,889
That seems to be
mostly a bad idea.

654
00:34:21,889 --> 00:34:24,060
Because any
interesting protocol is

655
00:34:24,060 --> 00:34:26,880
going to have end to end
encryption from Alice

656
00:34:26,880 --> 00:34:28,650
all the way to Bob.

657
00:34:28,650 --> 00:34:32,800
That is if we're lucky, Alice
is doing a TLS connection

658
00:34:32,800 --> 00:34:40,800
over this to Bob so that TLS
properties get her integrity

659
00:34:40,800 --> 00:34:44,110
and secrecy.

660
00:34:44,110 --> 00:34:46,909
But if that's the case,
then any kind of anonymizing

661
00:34:46,909 --> 00:34:50,840
transformations you want to
apply to the encrypted data

662
00:34:50,840 --> 00:34:53,139
need to happen in
the application

663
00:34:53,139 --> 00:34:56,710
Alice is using before
the TLS happens entirely.

664
00:34:56,710 --> 00:34:58,637
So you can't really
do that in a proxy.

665
00:34:58,637 --> 00:35:00,220
And that's kind of
why we came out to,

666
00:35:00,220 --> 00:35:03,370
OK, the sweet spot
is TCP contents.

667
00:35:03,370 --> 00:35:08,070
Somebody asked me, OK, but
where are your security proofs?

668
00:35:08,070 --> 00:35:11,530
We do have security proofs for a
lot of the cryptography that we

669
00:35:11,530 --> 00:35:15,760
use, standard reductions.

670
00:35:15,760 --> 00:35:19,510
For the protocol
as a whole, there

671
00:35:19,510 --> 00:35:23,069
are proofs in the field about
certain aspects of onion

672
00:35:23,069 --> 00:35:23,710
routing.

673
00:35:23,710 --> 00:35:27,310
But the models that they
have to use in order

674
00:35:27,310 --> 00:35:31,170
to prove that this
provides anonymity

675
00:35:31,170 --> 00:35:36,890
make assumptions about
the universe, the network,

676
00:35:36,890 --> 00:35:41,930
or the attacker's abilities
that are so weird as

677
00:35:41,930 --> 00:35:45,710
to satisfy no one but certain
program committees of more

678
00:35:45,710 --> 00:35:49,070
theoretical conferences.

679
00:35:49,070 --> 00:35:54,580
The kind of things you can prove
is that an attacker who sees

680
00:35:54,580 --> 00:36:02,890
this, who sees a number of
streams here, all of equal

681
00:36:02,890 --> 00:36:07,140
volume and equal timing, cannot
tell which one goes to which

682
00:36:07,140 --> 00:36:11,650
Bob simply by looking
at the bytes coming out.

683
00:36:11,650 --> 00:36:14,630
But that's hardly
a useful result.

684
00:36:14,630 --> 00:36:17,880
Also, the kind of guarantee you
can get from anonymity systems

685
00:36:17,880 --> 00:36:20,319
that we know how to
build today-- OK,

686
00:36:20,319 --> 00:36:21,360
I should be careful here.

687
00:36:21,360 --> 00:36:24,780
There are some where you
have very strong guarantees

688
00:36:24,780 --> 00:36:26,930
that we do know how to
build that you would never

689
00:36:26,930 --> 00:36:28,010
actually want to use.

690
00:36:28,010 --> 00:36:32,490
Like classical
[INAUDIBLE] DC-nets,

691
00:36:32,490 --> 00:36:35,200
for instance, provide
guaranteed anonymity.

692
00:36:35,200 --> 00:36:37,450
Except any participant can
shut down the whole network

693
00:36:37,450 --> 00:36:39,550
by not participating.

694
00:36:39,550 --> 00:36:41,400
That does not scale.

695
00:36:41,400 --> 00:36:42,820
But for the things
that we do want

696
00:36:42,820 --> 00:36:46,880
to build these days,
for the most part,

697
00:36:46,880 --> 00:36:49,960
the anonymity properties
are probabilistic rather

698
00:36:49,960 --> 00:36:52,670
than categorically
guarantee-able.

699
00:36:52,670 --> 00:36:56,070
So instead of asking,
does this protect

700
00:36:56,070 --> 00:36:58,650
Alice, the kind of
questions you could ask

701
00:36:58,650 --> 00:37:02,600
are, under this assumption
about attacker capabilities, how

702
00:37:02,600 --> 00:37:04,260
much traffic can
Alice safely send

703
00:37:04,260 --> 00:37:10,370
if she wants a 99% chance of not
being linked to her activities?

704
00:37:10,370 --> 00:37:13,070
So will anyone actually
run these things?

705
00:37:13,070 --> 00:37:15,430
That was an open
question when we started.

706
00:37:15,430 --> 00:37:17,430
We didn't know whether
the system would actually

707
00:37:17,430 --> 00:37:18,320
take off or not.

708
00:37:18,320 --> 00:37:25,450
So the only [INAUDIBLE]
try to see what happens.

709
00:37:25,450 --> 00:37:28,920
We got a fair amount
of volunteer operators.

710
00:37:28,920 --> 00:37:33,410
A fair number of non-profits
have formed whose sole purpose

711
00:37:33,410 --> 00:37:36,440
is just to take donations and
use it to buy bandwidth and run

712
00:37:36,440 --> 00:37:38,890
Tor nodes.

713
00:37:38,890 --> 00:37:40,450
And there are also universities.

714
00:37:40,450 --> 00:37:42,609
There's also private companies.

715
00:37:42,609 --> 00:37:44,650
For a while, [INAUDIBLE]
was running a Tor server

716
00:37:44,650 --> 00:37:47,689
out of their security
team because they

717
00:37:47,689 --> 00:37:48,480
thought it was fun.

718
00:37:52,360 --> 00:37:54,760
The legal issues there--
again, I'm not a lawyer.

719
00:37:54,760 --> 00:37:55,910
I can't offer legal advice.

720
00:37:55,910 --> 00:37:58,035
But five different people
asked about legal issues.

721
00:38:00,192 --> 00:38:01,900
As far as I can tell,
in the US at least,

722
00:38:01,900 --> 00:38:04,800
there's no legal impediment
to running a Tor server.

723
00:38:04,800 --> 00:38:07,690
And that seems to be the case
throughout most of Europe

724
00:38:07,690 --> 00:38:09,580
as far as I'm aware.

725
00:38:09,580 --> 00:38:12,970
In places that generally
have less internet freedom,

726
00:38:12,970 --> 00:38:14,670
it's a dicier proposition.

727
00:38:14,670 --> 00:38:16,670
The issues to be
concerned about are not,

728
00:38:16,670 --> 00:38:19,180
is it illegal to
run a Tor server,

729
00:38:19,180 --> 00:38:24,635
but if somebody does something
illegal or undesirable

730
00:38:24,635 --> 00:38:28,180
with my Tor server, will
my ISP shut me down,

731
00:38:28,180 --> 00:38:32,846
and will law
enforcement believe, oh,

732
00:38:32,846 --> 00:38:34,220
you're just running
a Tor server,

733
00:38:34,220 --> 00:38:37,336
or will they seize the
computer to make sure?

734
00:38:37,336 --> 00:38:39,710
For those, I would suggest
not running the Tor server out

735
00:38:39,710 --> 00:38:42,720
of your dorm room.

736
00:38:42,720 --> 00:38:45,670
Excuse me, don't run an
exit out of your dorm room,

737
00:38:45,670 --> 00:38:48,460
or really out of your dorm room,
assuming the network policy

738
00:38:48,460 --> 00:38:49,460
allows that.

739
00:38:49,460 --> 00:38:50,650
I have no idea.

740
00:38:50,650 --> 00:38:52,400
They've changed so
much since I was a kid.

741
00:38:55,266 --> 00:38:57,890
Running an exit out of your dorm
room could get you in trouble.

742
00:38:57,890 --> 00:39:01,620
But running a non-exit relay
that doesn't deliver traffic

743
00:39:01,620 --> 00:39:05,282
to the internet is less
likely to create those issues

744
00:39:05,282 --> 00:39:05,865
in particular.

745
00:39:10,140 --> 00:39:12,010
But if you do it in
a nice co-lo site,

746
00:39:12,010 --> 00:39:14,730
and you get your
ISP's permission,

747
00:39:14,730 --> 00:39:19,840
then it's a pretty
reasonable thing to do.

748
00:39:19,840 --> 00:39:23,311
Let's see, someone asked,
well, what if users

749
00:39:23,311 --> 00:39:24,560
don't trust a particular node?

750
00:39:24,560 --> 00:39:29,670
And this brings me
to my next topic.

751
00:39:29,670 --> 00:39:32,750
So the software the clients
use, you can tell it,

752
00:39:32,750 --> 00:39:35,780
don't use this one, don't use
this one, only use this one.

753
00:39:35,780 --> 00:39:39,130
But remember that anonymity
loves company principle.

754
00:39:39,130 --> 00:39:43,631
If I'm only using
three nodes, and you're

755
00:39:43,631 --> 00:39:45,256
using three different
nodes, and you're

756
00:39:45,256 --> 00:39:49,550
using three different nodes,
our traffic will not mix at all.

757
00:39:49,550 --> 00:39:52,280
To the extent that we partition
off which parts of the network

758
00:39:52,280 --> 00:39:55,740
we use, we are distinguishable
from one another.

759
00:39:55,740 --> 00:39:57,800
Now, if I just exclude
one or two nodes,

760
00:39:57,800 --> 00:40:00,040
and you just exclude
one or two nodes,

761
00:40:00,040 --> 00:40:03,120
that's not a big partitioning,
and that doesn't help

762
00:40:03,120 --> 00:40:05,270
distinguishability that much.

763
00:40:05,270 --> 00:40:08,700
But it would be good to
the extent possible to have

764
00:40:08,700 --> 00:40:12,290
everyone using the same nodes.

765
00:40:12,290 --> 00:40:14,880
So all right, how do
we accomplish that?

766
00:40:14,880 --> 00:40:16,780
So version one, in the
first version of Tor,

767
00:40:16,780 --> 00:40:18,730
we just shipped a list
of all of the nodes.

768
00:40:18,730 --> 00:40:21,525
I think there were three of
them, or five, or something.

769
00:40:21,525 --> 00:40:22,900
No, I think there
were about six,

770
00:40:22,900 --> 00:40:25,910
of which three were all
running on the same computer

771
00:40:25,910 --> 00:40:30,142
in a closet at LCS
in Tech Square.

772
00:40:30,142 --> 00:40:32,560
All right, so that
wasn't a good idea.

773
00:40:32,560 --> 00:40:34,090
Because nodes can
go up and down.

774
00:40:34,090 --> 00:40:35,067
Nodes change.

775
00:40:35,067 --> 00:40:36,442
You don't want to
have to put out

776
00:40:36,442 --> 00:40:39,005
a new release of your
software every time somebody

777
00:40:39,005 --> 00:40:41,160
joins or leaves the network.

778
00:40:41,160 --> 00:40:44,260
So you could just
have every node keep

779
00:40:44,260 --> 00:40:46,677
a list of all the other nodes
that are connected to it

780
00:40:46,677 --> 00:40:48,010
and all advertise to each other.

781
00:40:48,010 --> 00:40:50,193
And then when a client
connects, a client just

782
00:40:50,193 --> 00:40:51,790
has to know one
node and then says,

783
00:40:51,790 --> 00:40:53,189
hey, who's on the network?

784
00:40:53,189 --> 00:40:54,730
And actually, a lot
of designs people

785
00:40:54,730 --> 00:40:57,320
have built work this way.

786
00:40:57,320 --> 00:40:59,500
A lot of early peer to
peer anonymity designs work

787
00:40:59,500 --> 00:41:00,360
this way.

788
00:41:00,360 --> 00:41:01,771
But it's a terrible idea.

789
00:41:01,771 --> 00:41:04,270
Because if you go to one node
and say, who's on the network,

790
00:41:04,270 --> 00:41:07,240
and you believe them, well,
if I'm that node, I can say,

791
00:41:07,240 --> 00:41:11,070
yes, I'm on the network,
and my friend over here

792
00:41:11,070 --> 00:41:14,130
is on the network, and my friend
over here is on the network,

793
00:41:14,130 --> 00:41:15,920
and no one else
is on the network.

794
00:41:15,920 --> 00:41:18,895
And I can tell you any
number of fake nodes

795
00:41:18,895 --> 00:41:22,790
that are all operated by me
and capture all of your traffic

796
00:41:22,790 --> 00:41:25,160
that way with what's called
a route capture attack.

797
00:41:25,160 --> 00:41:28,480
OK, so maybe we just
have a single directory

798
00:41:28,480 --> 00:41:30,470
operated by a trusted party.

799
00:41:30,470 --> 00:41:33,730
That's not so good as a
single point of failure.

800
00:41:33,730 --> 00:41:38,210
So OK, let's have
multiple trusted parties.

801
00:41:38,210 --> 00:41:41,750
And clients go to these
multiple trusted parties

802
00:41:41,750 --> 00:41:43,990
and get a list of all of
the nodes from all of them

803
00:41:43,990 --> 00:41:47,010
and combine those lists.

804
00:41:47,010 --> 00:41:49,813
Then you're
actually-- first off,

805
00:41:49,813 --> 00:41:51,560
you're partitioned in that case.

806
00:41:51,560 --> 00:41:54,060
If I choose these three,
and you choose those three,

807
00:41:54,060 --> 00:41:55,975
and they say anything
different, then we'll

808
00:41:55,975 --> 00:41:57,350
be using different
sets of nodes.

809
00:41:57,350 --> 00:41:58,820
So that's still not good.

810
00:41:58,820 --> 00:42:01,800
Also, there's
still a [INAUDIBLE]

811
00:42:01,800 --> 00:42:08,820
where if I use the intersection
of the sets they tell me,

812
00:42:08,820 --> 00:42:11,520
then any one of them can keep
me from using a node they

813
00:42:11,520 --> 00:42:13,360
don't like by not listing it.

814
00:42:13,360 --> 00:42:16,700
If I use the union,
anyone can flood me

815
00:42:16,700 --> 00:42:21,630
by making 20,000 fake servers
that are all on the list.

816
00:42:21,630 --> 00:42:24,545
I might compute the result
of some sort of vote

817
00:42:24,545 --> 00:42:26,930
on them, which would
solve those two problems.

818
00:42:26,930 --> 00:42:28,890
But I'd still be
partitioned from everyone

819
00:42:28,890 --> 00:42:32,580
who's using different
trusted parties.
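
The three ways of combining directory lists can be compared directly. This sketch assumes three hypothetical authorities and a plain strict-majority rule; the relay names are made up, and the real voting rules are more involved than this.

```python
from collections import Counter

def intersection_view(lists):
    """Keep only nodes that every authority lists."""
    return set.intersection(*(set(nodes) for nodes in lists))

def union_view(lists):
    """Keep every node that any authority lists."""
    return set.union(*(set(nodes) for nodes in lists))

def majority_view(lists):
    """Keep nodes listed by more than half of the authorities."""
    counts = Counter(node for nodes in lists for node in set(nodes))
    return {node for node, votes in counts.items() if votes > len(lists) / 2}

authority_lists = [
    ["relayA", "relayB", "relayC"],                             # honest
    ["relayA", "relayB"],                                       # withholds relayC
    ["relayA", "relayB", "relayC", "fake1", "fake2", "fake3"],  # floods fakes
]

print(sorted(intersection_view(authority_lists)))  # relayC is vetoed
print(sorted(union_view(authority_lists)))         # fakes get in
print(sorted(majority_view(authority_lists)))      # neither attack works alone
```

A vote fixes the veto and flooding problems, but two clients who consult different sets of authorities can still end up with different results, which is the partitioning problem that remains.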

820
00:42:32,580 --> 00:42:35,270
We could do a magical DHT.

821
00:42:35,270 --> 00:42:36,859
Have we done
distributed hash tables?

822
00:42:36,859 --> 00:42:39,150
All right, we could do some
sort of magical distributed

823
00:42:39,150 --> 00:42:43,930
structure run across
all of the nodes.

824
00:42:43,930 --> 00:42:50,140
I say magical, because although
there are designs in this area,

825
00:42:50,140 --> 00:42:54,320
and some better than
others, none of them

826
00:42:54,320 --> 00:42:58,624
really seem to have a solid
security evidence for it

827
00:42:58,624 --> 00:43:00,040
at this point to
the point where I

828
00:43:00,040 --> 00:43:04,260
would be comfortable in saying,
yes, this is actually secure.

829
00:43:04,260 --> 00:43:06,900
So the solution we
wound up with is

830
00:43:06,900 --> 00:43:10,610
have multiple hardened
trusted authorities run

831
00:43:10,610 --> 00:43:14,040
by trusted parties that
collect lists of nodes

832
00:43:14,040 --> 00:43:17,690
that vote hourly on
which nodes are running

833
00:43:17,690 --> 00:43:21,870
that can vote to exclude nodes
that seem to be misbehaving

834
00:43:21,870 --> 00:43:25,920
that are all running on the
same /16, that are doing

835
00:43:25,920 --> 00:43:29,120
strange things to
traffic, and have

836
00:43:29,120 --> 00:43:34,190
them form a consensus that's
a result of their votes.

837
00:43:34,190 --> 00:43:36,017
And everybody signs
the consensus.

838
00:43:36,017 --> 00:43:37,517
And clients don't
use it unless it's

839
00:43:37,517 --> 00:43:39,490
signed by enough authorities.

840
00:43:39,490 --> 00:43:40,940
This is not the final design.

841
00:43:40,940 --> 00:43:44,670
But it's the best we've
managed to come up with so far.

842
00:43:44,670 --> 00:43:46,630
And this way, all you
need to distribute

843
00:43:46,630 --> 00:43:51,880
with clients is a list of all
of the authorities' public keys

844
00:43:51,880 --> 00:43:54,210
and some places to
get the directories.

845
00:43:54,210 --> 00:43:58,120
You want to have all the nodes
cache these directory things.

846
00:43:58,120 --> 00:44:00,604
Because if you don't, the
bandwidth load on authorities

847
00:44:00,604 --> 00:44:01,270
is catastrophic.

848
00:44:04,320 --> 00:44:06,050
So I'm going to skip over that.

849
00:44:06,050 --> 00:44:11,260
Because I would love
to talk about how

850
00:44:11,260 --> 00:44:13,295
clients should
choose which paths

851
00:44:13,295 --> 00:44:14,800
to build through the network.

852
00:44:14,800 --> 00:44:17,560
I would love to talk
about application issues

853
00:44:17,560 --> 00:44:20,382
and making applications
not deanonymize themselves.

854
00:44:20,382 --> 00:44:21,590
I'd love to talk about abuse.

855
00:44:21,590 --> 00:44:24,470
I'd love to talk about hidden
services and how they work.

856
00:44:24,470 --> 00:44:27,210
I'd love to talk about
censorship resistance.

857
00:44:27,210 --> 00:44:30,540
And I'd like to talk about
attacks and defenses.

858
00:44:30,540 --> 00:44:34,230
But I've only got 35 minutes.

859
00:44:34,230 --> 00:44:36,280
And I can't possibly
cover all of these.

860
00:44:36,280 --> 00:44:38,490
So show of hands
for how many people

861
00:44:38,490 --> 00:44:42,500
think the most important--
think about what you think

862
00:44:42,500 --> 00:44:45,584
are the two most important
topics on this list.

863
00:44:45,584 --> 00:44:47,250
If one of your two
most important topics

864
00:44:47,250 --> 00:44:49,041
is path selection and
how you choose nodes,

865
00:44:49,041 --> 00:44:51,500
please raise your hand.

866
00:44:51,500 --> 00:44:53,550
If one of your two
most important topics

867
00:44:53,550 --> 00:44:57,370
is application issues and
how to make applications not

868
00:44:57,370 --> 00:45:00,044
bust your anonymity,
please raise your hand.

869
00:45:00,044 --> 00:45:02,020
If one of your most
important issues

870
00:45:02,020 --> 00:45:05,700
is abuse and what kind of abuse
we see, how you can prevent it,

871
00:45:05,700 --> 00:45:08,294
and how that works out,
please raise your hand.

872
00:45:08,294 --> 00:45:11,651
OK, that one's popular.

873
00:45:11,651 --> 00:45:13,150
If one of your most
important topics

874
00:45:13,150 --> 00:45:14,566
is how hidden
services work and how

875
00:45:14,566 --> 00:45:17,280
they can be made to work
better, please raise your hand.

876
00:45:17,280 --> 00:45:19,530
Wow, that's much more popular
on this side of the room

877
00:45:19,530 --> 00:45:20,654
than that side of the room.

878
00:45:20,654 --> 00:45:23,162
What's going on?

879
00:45:23,162 --> 00:45:24,820
You guys in a club?

880
00:45:24,820 --> 00:45:26,926
Are you up to something?

881
00:45:26,926 --> 00:45:29,610
Censorship, who's
interested in censorship?

882
00:45:29,610 --> 00:45:32,880
OK, that's fairly popular.

883
00:45:32,880 --> 00:45:36,170
Attacks and defenses?

884
00:45:36,170 --> 00:45:39,530
OK, so we're not doing paths
and we're not doing apps.

885
00:45:39,530 --> 00:45:44,600
So paths-- guard nodes, guard
nodes, see guard node designs,

886
00:45:44,600 --> 00:45:46,240
select by bandwidth.

887
00:45:46,240 --> 00:45:48,230
You need to actually
weight by bandwidth,

888
00:45:48,230 --> 00:45:51,200
but you also need a trusted
way to measure bandwidth.

889
00:45:51,200 --> 00:45:55,025
And that's the too long,
didn't lecture of what

890
00:45:55,025 --> 00:45:56,150
would be on path selection.

891
00:45:56,150 --> 00:45:59,555
For application issues,
almost no protocol

892
00:45:59,555 --> 00:46:03,630
is actually designed
to provide anonymity.

893
00:46:03,630 --> 00:46:06,530
Because almost every
protocol that's widely used

894
00:46:06,530 --> 00:46:08,324
has the assumption
in it, well, you

895
00:46:08,324 --> 00:46:09,740
know, anyone who
wants to can just

896
00:46:09,740 --> 00:46:12,500
see the IPs on this traffic.

897
00:46:12,500 --> 00:46:16,030
So there's no point in
trying to conceal identity.

898
00:46:16,030 --> 00:46:18,900
So in a particularly
complex protocol,

899
00:46:18,900 --> 00:46:22,320
like the whole stack of
protocols a web browser uses,

900
00:46:22,320 --> 00:46:24,020
there's no real way
to anonymize that

901
00:46:24,020 --> 00:46:27,400
just by anonymizing the traffic
with something like Tor.

902
00:46:27,400 --> 00:46:30,150
You need to hack the
web browser pretty hard

903
00:46:30,150 --> 00:46:32,810
to make it stop doing things
like leaking the list of fonts

904
00:46:32,810 --> 00:46:34,830
that are installed
on your system,

905
00:46:34,830 --> 00:46:38,540
leaking your exact
window size, allowing

906
00:46:38,540 --> 00:46:41,780
all kinds of permanent
cookie-like structures,

907
00:46:41,780 --> 00:46:44,740
leaking what's in the cache
and what's not in the cache,

908
00:46:44,740 --> 00:46:46,250
and so on.

909
00:46:46,250 --> 00:46:48,680
So your choices
there are basically

910
00:46:48,680 --> 00:46:52,180
isolate everything and restart
from a fresh VM all the time,

911
00:46:52,180 --> 00:46:53,514
or rewrite the browser, or both.

912
00:46:53,514 --> 00:46:55,513
Other things are a lot
easier than web browsers,

913
00:46:55,513 --> 00:46:56,460
but still problematic.

914
00:46:56,460 --> 00:47:00,780
That's all I'm going to
say about app issues.

915
00:47:00,780 --> 00:47:02,850
Let's see, I think
I got the most

916
00:47:02,850 --> 00:47:05,142
hands-- did you see what
I got the most hands for,

917
00:47:05,142 --> 00:47:06,624
any opinions?

918
00:47:06,624 --> 00:47:08,083
STUDENT: Abuse and
hidden services?

919
00:47:08,083 --> 00:47:09,832
NICK MATHEWSON: Abuse
and hidden services.

920
00:47:09,832 --> 00:47:12,277
All right, I'll talk about
abuse and hidden services.

921
00:47:12,277 --> 00:47:15,200
And if I've still got time,
I'll do censorship and attacks.

922
00:47:15,200 --> 00:47:19,185
So let's go to abuse--
abuse, abuse, abuse.

923
00:47:22,420 --> 00:47:26,960
So one problem that
we've fortunately not

924
00:47:26,960 --> 00:47:30,707
had all that much of-- so when
we were working on this stuff,

925
00:47:30,707 --> 00:47:32,490
the problem that
everybody was afraid of

926
00:47:32,490 --> 00:47:34,698
was this horrible stuff that
would get you kicked off

927
00:47:34,698 --> 00:47:37,580
of any ISP, and it would
create tremendous legal issues

928
00:47:37,580 --> 00:47:38,750
and ruin your lives.

929
00:47:38,750 --> 00:47:41,360
I speak of course
of file sharing.

930
00:47:41,360 --> 00:47:43,540
We were terrified
that people would

931
00:47:43,540 --> 00:47:48,200
try to BitTorrent or Gnutella
or whatever over this thing.

932
00:47:48,200 --> 00:47:49,760
Yes, it was a long time ago.

933
00:47:49,760 --> 00:47:52,990
And we thought about
how we'd do that.

934
00:47:52,990 --> 00:47:55,470
Well, you'll see in the paper
that we talk a lot about exit

935
00:47:55,470 --> 00:47:58,140
policies, about
letting exit nodes say,

936
00:47:58,140 --> 00:48:03,040
I only allow connections
to port 80 and port 443.

937
00:48:03,040 --> 00:48:05,850
This doesn't actually
help with abuse at all.

938
00:48:05,850 --> 00:48:15,800
Because you can try to
spread worms over port 80.

939
00:48:15,800 --> 00:48:21,897
You can post abusive stuff
to IRC channels over web

940
00:48:21,897 --> 00:48:23,710
to IRC interfaces.

941
00:48:23,710 --> 00:48:26,140
Everything's got a web
interface these days.

942
00:48:26,140 --> 00:48:29,340
So you can't really
say, it's only web.

943
00:48:29,340 --> 00:48:30,400
It's safe.

944
00:48:30,400 --> 00:48:33,040
If it's useful,
it can be abused.

945
00:48:33,040 --> 00:48:35,450
That said, there
are people who are

946
00:48:35,450 --> 00:48:39,000
willing to run exits
that deliver 80 and 443

947
00:48:39,000 --> 00:48:42,547
who would not be willing to
run exits delivering all ports.

948
00:48:42,547 --> 00:48:43,880
So it did turn out to be useful.

949
00:48:43,880 --> 00:48:45,588
It just didn't turn
out to be a solution.

950
00:48:49,010 --> 00:48:54,699
Another thing that creates
problems-- criminal activity

951
00:48:54,699 --> 00:48:56,740
generally doesn't create
problems for the network

952
00:48:56,740 --> 00:48:58,560
operators so much.

953
00:48:58,560 --> 00:49:01,750
From time to time, somebody's
server gets seized and returned

954
00:49:01,750 --> 00:49:04,550
six months later, and they
have to wipe the thing.

955
00:49:04,550 --> 00:49:07,430
That's still an infrequent
enough occurrence

956
00:49:07,430 --> 00:49:12,950
that it's somewhat
surprising when it happens.

957
00:49:12,950 --> 00:49:16,050
And so yeah, don't run
an exit node on a server

958
00:49:16,050 --> 00:49:19,185
that you need to graduate.

959
00:49:23,165 --> 00:49:23,665
What else?

960
00:49:27,670 --> 00:49:31,210
The biggest problem that
we have for abusive stuff

961
00:49:31,210 --> 00:49:34,260
is that many websites
around the world,

962
00:49:34,260 --> 00:49:36,200
and many IRC
services and so on,

963
00:49:36,200 --> 00:49:42,210
use IP-based blocking in
order to deter and mitigate

964
00:49:42,210 --> 00:49:50,680
abusive behavior-- people
posting road kill pictures

965
00:49:50,680 --> 00:49:56,160
on My Little Pony sites,
people flaming everybody

966
00:49:56,160 --> 00:49:59,690
on IRC channels,
people making love,

967
00:49:59,690 --> 00:50:05,300
leave, join requests, people
replacing entire Wikipedia

968
00:50:05,300 --> 00:50:08,896
pages with racial slurs.

969
00:50:08,896 --> 00:50:09,770
This stuff, it's real.

970
00:50:09,770 --> 00:50:10,478
It's problematic.

971
00:50:10,478 --> 00:50:13,560
It's unacceptable to the
websites and services

972
00:50:13,560 --> 00:50:15,580
that use IP-based blocking.

973
00:50:15,580 --> 00:50:18,140
They need a way to keep
this from happening.

974
00:50:18,140 --> 00:50:21,950
And IP-based blocking is a
cheap way for them to do that.

975
00:50:21,950 --> 00:50:27,230
So it's pretty frequent that
Tor users get banned completely

976
00:50:27,230 --> 00:50:30,340
from some sites.

977
00:50:30,340 --> 00:50:36,370
There's some work on trying to
say, well, why does IP-based

978
00:50:36,370 --> 00:50:37,330
blocking really work?

979
00:50:37,330 --> 00:50:40,690
Is it because IPs are people?

980
00:50:40,690 --> 00:50:41,310
No.

981
00:50:41,310 --> 00:50:44,295
Everybody in this room knows
how to get a different IP

982
00:50:44,295 --> 00:50:45,710
if they need one.

983
00:50:45,710 --> 00:50:49,540
Everybody in this room knows how
to get like tens of thousands

984
00:50:49,540 --> 00:50:51,550
of different IPs
if they need one,

985
00:50:51,550 --> 00:50:53,180
if they need tens of thousands.

986
00:50:53,180 --> 00:50:56,680
But for most people,
getting more IPs

987
00:50:56,680 --> 00:50:59,720
is at least a little time
consuming and at least

988
00:50:59,720 --> 00:51:03,265
a little challenging to the
extent that it imposes a rate

989
00:51:03,265 --> 00:51:05,660
limit and a resource
cost on abuse

990
00:51:05,660 --> 00:51:08,940
if you don't have a botnet
and if they've already blocked

991
00:51:08,940 --> 00:51:12,110
Tor and all the
other proxy services.

992
00:51:12,110 --> 00:51:16,850
So for that, you need to
look at different ways

993
00:51:16,850 --> 00:51:20,380
to provide other resource costs.

994
00:51:20,380 --> 00:51:24,970
You can either say, well--
have you done blind signatures?

995
00:51:24,970 --> 00:51:28,740
Oh, you can construct
things so that you

996
00:51:28,740 --> 00:51:31,210
need an IP to make an account.

997
00:51:31,210 --> 00:51:33,620
But what account
you make with an IP

998
00:51:33,620 --> 00:51:37,250
is not linkable to your IP.

999
00:51:37,250 --> 00:51:39,277
And then later on if
the account gets banned,

1000
00:51:39,277 --> 00:51:41,670
you need to create a new
account from a different IP.

1001
00:51:41,670 --> 00:51:44,211
That's something you can build,
and we're working with people

1002
00:51:44,211 --> 00:51:47,890
to work on it, although it needs
more hacking on the integration

1003
00:51:47,890 --> 00:51:48,630
side.

1004
00:51:48,630 --> 00:51:51,213
Something else that needs more
hacking on the integration side

1005
00:51:51,213 --> 00:51:54,387
is anonymous
blacklistable credentials.

1006
00:51:54,387 --> 00:51:55,470
They're a little esoteric.

1007
00:51:55,470 --> 00:52:02,220
But the idea is that
you get something

1008
00:52:02,220 --> 00:52:05,780
that allows you to participate
on an IRC server, for example.

1009
00:52:05,780 --> 00:52:08,080
You can use this as
many times as you want.

1010
00:52:08,080 --> 00:52:12,380
Your using it is not linkable
until you are banned.

1011
00:52:12,380 --> 00:52:14,580
Once you are banned,
future attempts

1012
00:52:14,580 --> 00:52:18,000
from the same person with the
same credential don't work.

1013
00:52:18,000 --> 00:52:21,840
But past activities do not
become linkable to one another.

1014
00:52:21,840 --> 00:52:24,090
These can be built
pretty easily.

1015
00:52:24,090 --> 00:52:26,730
The problem is convincing people
who are more or less satisfied

1016
00:52:26,730 --> 00:52:29,300
with IP blocking to
actually use them

1017
00:52:29,300 --> 00:52:32,965
and actually integrating
them with services.

1018
00:52:32,965 --> 00:52:36,170
Someone inevitably asks
me-- it's kind of neat.

1019
00:52:36,170 --> 00:52:43,310
So I started these lecture notes
based on my lecture from 2013.

1020
00:52:43,310 --> 00:52:46,110
And there was something
about the inevitable question

1021
00:52:46,110 --> 00:52:48,660
about Silk Road
1 getting busted.

1022
00:52:48,660 --> 00:52:50,885
There's the inevitable
question about Silk Road 2

1023
00:52:50,885 --> 00:52:51,510
getting busted.

1024
00:52:51,510 --> 00:52:55,880
Silk Road 2 was a hidden service
operating on the Tor network

1025
00:52:55,880 --> 00:52:58,650
where people would get
together to buy and sell

1026
00:52:58,650 --> 00:53:03,480
illegal things,
mostly illegal drugs.

1027
00:53:03,480 --> 00:53:06,360
So as far as we know, as
far as we can find out,

1028
00:53:06,360 --> 00:53:10,050
the guy got busted
through bad OPSEC.

1029
00:53:10,050 --> 00:53:13,810
Like he made a public
posting with his actual name,

1030
00:53:13,810 --> 00:53:17,430
and then went and deleted it
and put his pseudonym on it.

1031
00:53:17,430 --> 00:53:20,363
Tor can't help people
against that kind of stuff.

1032
00:53:20,363 --> 00:53:23,520
On the other hand, if you've
been looking at the NSA leaks,

1033
00:53:23,520 --> 00:53:26,640
you know that law enforcement
has been getting information

1034
00:53:26,640 --> 00:53:29,620
from intelligence and
then sanitizing it

1035
00:53:29,620 --> 00:53:33,495
through a process called
parallel construction where

1036
00:53:33,495 --> 00:53:36,120
the intelligence agency will say
to the law enforcement agency,

1037
00:53:36,120 --> 00:53:38,390
OK, look, it's Fred over there.

1038
00:53:38,390 --> 00:53:39,480
He did it.

1039
00:53:39,480 --> 00:53:41,482
But that's not
admissible in a court,

1040
00:53:41,482 --> 00:53:43,190
and you can never
admit that we told you.

1041
00:53:43,190 --> 00:53:46,125
Just find some other way to
find out that Fred did it,

1042
00:53:46,125 --> 00:53:48,120
but Fred did it.

1043
00:53:48,120 --> 00:53:50,210
According to some
of the Snowden leaks

1044
00:53:50,210 --> 00:53:52,380
and some of the leaks
from the other guy, who

1045
00:53:52,380 --> 00:53:59,910
has still not been caught,
that's done sometimes.

1046
00:53:59,910 --> 00:54:05,960
So OK, at this point, you use
your basic Bayesian reasoning

1047
00:54:05,960 --> 00:54:08,850
skills, and you say,
well OK, would I

1048
00:54:08,850 --> 00:54:11,090
see this evidence
if the guy actually

1049
00:54:11,090 --> 00:54:13,040
got caught because of OPSEC?

1050
00:54:13,040 --> 00:54:14,490
Yes, I would.

1051
00:54:14,490 --> 00:54:15,720
I would see bad OPSEC.

1052
00:54:15,720 --> 00:54:19,880
I would see reports that he got
caught because of bad OPSEC.

1053
00:54:19,880 --> 00:54:24,410
But what would I see if it
were a parallel construction case?

1054
00:54:24,410 --> 00:54:27,100
I would also see
reports that the guy

1055
00:54:27,100 --> 00:54:29,450
got caught by bad OPSEC.

1056
00:54:29,450 --> 00:54:32,100
Because the evidence that
would be available to

1057
00:54:32,100 --> 00:54:33,970
us is the same in either case.

1058
00:54:33,970 --> 00:54:38,185
We can't really conclude
much from any public reports

1059
00:54:38,185 --> 00:54:39,940
of that.

1060
00:54:39,940 --> 00:54:44,521
That said, it does look like
the guy got busted by bad OPSEC.

1061
00:54:44,521 --> 00:54:46,145
It does look like
the kind of bad OPSEC

1062
00:54:46,145 --> 00:54:48,000
that you would be
looking for if you

1063
00:54:48,000 --> 00:54:51,210
were trying to catch somebody
running something like this.

1064
00:54:51,210 --> 00:54:54,620
Nevertheless, earlier I
suggested that please do not

1065
00:54:54,620 --> 00:54:58,130
use my software to break any laws.

1066
00:54:58,130 --> 00:55:05,380
Also if your life or
freedom is at stake from using

1067
00:55:05,380 --> 00:55:09,665
Tor or any security
product, do not

1068
00:55:09,665 --> 00:55:11,180
use that product in isolation.

1069
00:55:11,180 --> 00:55:14,810
Think of ways to
use it to construct

1070
00:55:14,810 --> 00:55:21,330
a series of redundant
defenses for yourself

1071
00:55:21,330 --> 00:55:23,830
if your life or
freedom is at stake,

1072
00:55:23,830 --> 00:55:27,050
or if having the
system broken is

1073
00:55:27,050 --> 00:55:28,780
completely unacceptable to you.

1074
00:55:28,780 --> 00:55:30,024
And I'll say that about Tor.

1075
00:55:30,024 --> 00:55:31,190
And I'll say that about TLS.

1076
00:55:31,190 --> 00:55:33,590
And I'll say that about PGP.

1077
00:55:33,590 --> 00:55:38,620
Software is a work in progress.

1078
00:55:38,620 --> 00:55:41,065
So that's the abuse section.

1079
00:55:41,065 --> 00:55:44,870
I've got 25 minutes--
hidden services.

1080
00:55:47,750 --> 00:55:50,490
Where's hidden services?

1081
00:55:50,490 --> 00:55:53,620
So responder anonymity
is a much harder problem

1082
00:55:53,620 --> 00:55:55,640
than initiator anonymity.

1083
00:55:55,640 --> 00:55:57,300
Initiator anonymity
is what you get

1084
00:55:57,300 --> 00:56:00,210
when Alice wants to
buy socks, and Alice

1085
00:56:00,210 --> 00:56:02,580
wants to stay anonymous
from the sock vendor.

1086
00:56:02,580 --> 00:56:05,200
Responder anonymity
is when Alice

1087
00:56:05,200 --> 00:56:09,300
wants to publish her
poetry online and run a web

1088
00:56:09,300 --> 00:56:11,190
server that has
her poetry on it,

1089
00:56:11,190 --> 00:56:14,150
but not let anyone know
where that web server is

1090
00:56:14,150 --> 00:56:16,680
because the poetry
is so embarrassing.

1091
00:56:16,680 --> 00:56:19,360
And yes there actually
is a hidden service

1092
00:56:19,360 --> 00:56:21,710
out there of mine
with bad poetry on it.

1093
00:56:21,710 --> 00:56:24,070
No, I don't think anybody's
actually published it yet.

1094
00:56:24,070 --> 00:56:26,490
No, I'm not going to
tell anybody where it is.

1095
00:56:26,490 --> 00:56:27,990
I'm waiting for it to go public.

1096
00:56:31,390 --> 00:56:37,920
So all right, one thing
you could do is-- let's

1097
00:56:37,920 --> 00:56:39,351
see, how much time?

1098
00:56:39,351 --> 00:56:43,650
OK, I can do this.

1099
00:56:43,650 --> 00:56:46,622
So now Alice wants to
publish her poetry.

1100
00:56:46,622 --> 00:56:48,205
So I'm going to put
Alice on this end,

1101
00:56:48,205 --> 00:56:49,450
because she's the responder.

1102
00:56:49,450 --> 00:56:54,080
Alice could build a path-- this
represents a lot of relays--

1103
00:56:54,080 --> 00:56:59,052
through the Tor network, and
then just say to this relay,

1104
00:56:59,052 --> 00:57:00,135
please accept connections.

1105
00:57:02,660 --> 00:57:05,600
So now anyone who goes
to this relay could say,

1106
00:57:05,600 --> 00:57:07,770
hey, I want to talk to Alice.

1107
00:57:07,770 --> 00:57:10,180
And there have been
designs that work this way.

1108
00:57:10,180 --> 00:57:12,620
It has some challenges, though.

1109
00:57:12,620 --> 00:57:15,185
One challenge is this relay
could man-in-the-middle

1110
00:57:15,185 --> 00:57:19,920
all the traffic unless there
is a well known TLS key.

1111
00:57:19,920 --> 00:57:22,400
Another thing is
maybe this relay

1112
00:57:22,400 --> 00:57:24,396
is also embarrassed
by the poetry

1113
00:57:24,396 --> 00:57:26,020
and doesn't want to
be a public contact

1114
00:57:26,020 --> 00:57:31,160
point for poetry so terrible.

1115
00:57:31,160 --> 00:57:35,280
So this relay could also be
pressured by other people who

1116
00:57:35,280 --> 00:57:37,760
hate the poetry to censor it.

1117
00:57:37,760 --> 00:57:41,940
This relay could also make
itself an attack target.

1118
00:57:41,940 --> 00:57:45,130
So you want some way where Alice
can go to different relays over

1119
00:57:45,130 --> 00:57:51,170
time and no single relay is
touching unencrypted traffic

1120
00:57:51,170 --> 00:57:52,480
of Alice's.

1121
00:57:52,480 --> 00:57:56,620
All right, that's doable.

1122
00:57:56,620 --> 00:57:58,510
But once you have a lot
of different relays,

1123
00:57:58,510 --> 00:58:01,790
what does Alice
actually tell people?

1124
00:58:01,790 --> 00:58:04,490
It's kind of got
to be a public key.

1125
00:58:04,490 --> 00:58:08,250
Because if she just says, relay
x, relay y, relay z, but x, y,

1126
00:58:08,250 --> 00:58:11,530
and z are changing
every five minutes,

1127
00:58:11,530 --> 00:58:13,920
that's kind of challenging
to know you actually

1128
00:58:13,920 --> 00:58:15,570
got the right relay.

1129
00:58:15,570 --> 00:58:17,590
So let's say she tells
everybody a public key,

1130
00:58:17,590 --> 00:58:22,550
and once she gets over here,
she says, hey, this is Alice.

1131
00:58:22,550 --> 00:58:24,090
I'll prove it with
my public key.

1132
00:58:24,090 --> 00:58:33,960
So this relay knows
that public key z is

1133
00:58:33,960 --> 00:58:35,380
running a hidden service here.

1134
00:58:35,380 --> 00:58:38,330
And so if anyone else says,
hey, connect me to public key z,

1135
00:58:38,330 --> 00:58:41,130
they can do a
handshake and wind up

1136
00:58:41,130 --> 00:58:43,170
with a shared key with Alice.

1137
00:58:43,170 --> 00:58:46,260
And it's the same handshake as
the Tor circuit extension uses.

1138
00:58:46,260 --> 00:58:48,590
And now Bob can
read Alice's poetry

1139
00:58:48,590 --> 00:58:52,190
by going another path through
the Tor network over here.

1140
00:58:52,190 --> 00:58:57,045
Bob has to know PKz, and Bob can
say, hey, connect me with PKz.

1141
00:58:57,045 --> 00:58:59,170
Send this thing that's sort
of like a create cell--

1142
00:58:59,170 --> 00:59:01,380
really it's an introduce
cell, but let's

1143
00:59:01,380 --> 00:59:03,380
forget that-- over to Alice.

1144
00:59:03,380 --> 00:59:05,820
They do the same
handshake that relays do.

1145
00:59:05,820 --> 00:59:07,913
And now they have a
shared key that they can

1146
00:59:07,913 --> 00:59:10,100
use for end to end encryption.

1147
00:59:10,100 --> 00:59:11,915
Well, there's something
I left out, though,

1148
00:59:11,915 --> 00:59:15,120
which is, how does Bob
know how to go here?

1149
00:59:15,120 --> 00:59:17,082
And can we do anything
about the fact

1150
00:59:17,082 --> 00:59:22,480
that this relay has to
learn this public key?

1151
00:59:22,480 --> 00:59:23,070
Well, we can.

1152
00:59:23,070 --> 00:59:27,730
We can add some [INAUDIBLE]
directory system

1153
00:59:27,730 --> 00:59:32,745
where Alice uploads a signed
statement anonymously over Tor

1154
00:59:32,745 --> 00:59:38,725
saying PKz is at relay x.

1155
00:59:41,590 --> 00:59:44,620
And then Bob says, hey,
give me a signed statement

1156
00:59:44,620 --> 00:59:46,520
to ask the directory
system, hey, give me

1157
00:59:46,520 --> 00:59:49,940
a signed statement about PKz.

1158
00:59:49,940 --> 00:59:52,376
And Bob finds out where to go.

1159
00:59:52,376 --> 00:59:56,740
And we could even do one
better and have Alice give

1160
00:59:56,740 --> 00:59:59,250
a different public key here.

1161
00:59:59,250 --> 01:00:00,890
So this could be PKw.

1162
01:00:04,660 --> 01:00:09,840
And the statement she uploads
to the directory can say,

1163
01:00:09,840 --> 01:00:12,730
if you want to talk to the
service with public key z,

1164
01:00:12,730 --> 01:00:16,560
then go to relay x
and use public key w.

1165
01:00:16,560 --> 01:00:21,820
And now public key z
isn't published here.

1166
01:00:21,820 --> 01:00:26,590
You could even go one
farther and encrypt this

1167
01:00:26,590 --> 01:00:29,480
with some shared secret
known to Alice and Bob.

1168
01:00:29,480 --> 01:00:32,330
And if you do that, then
the directory service

1169
01:00:32,330 --> 01:00:34,990
and people who can contact
the directory service

1170
01:00:34,990 --> 01:00:39,530
can't learn how to connect
to Alice with that.

1171
01:00:39,530 --> 01:00:40,030
Yeah.

1172
01:00:40,030 --> 01:00:42,190
STUDENT: Just a
quick question there.

1173
01:00:42,190 --> 01:00:44,850
If that's not encrypted,
then Rx can still

1174
01:00:44,850 --> 01:00:48,010
find out that it's running
a service for Alice, right?

1175
01:00:48,010 --> 01:00:48,890
NICK MATHEWSON: Yep.

1176
01:00:48,890 --> 01:00:49,934
Well, not for Alice.

1177
01:00:49,934 --> 01:00:51,475
It can find out that
it's running PKz

1178
01:00:51,475 --> 01:00:53,060
if this is not encrypted.

1179
01:00:53,060 --> 01:00:55,680
We have a design for that
that I'm actually going

1180
01:00:55,680 --> 01:00:56,950
to get to at the end of this.

1181
01:00:56,950 --> 01:00:58,740
But it's not built yet.

1182
01:00:58,740 --> 01:01:01,040
But it's pretty cool.

1183
01:01:01,040 --> 01:01:03,535
So OK, and you don't want to
use a centralized directory

1184
01:01:03,535 --> 01:01:04,460
for this.

1185
01:01:04,460 --> 01:01:12,280
So we actually do use a DHT,
which is, again, not perfect,

1186
01:01:12,280 --> 01:01:14,370
and has some censorship
opportunities.

1187
01:01:14,370 --> 01:01:16,966
But we are trying to
make those less and less.

1188
01:01:16,966 --> 01:01:19,700
And I might cover more stuff,
so I can't do the whole details.

1189
01:01:22,510 --> 01:01:24,860
So one of the
problems there though

1190
01:01:24,860 --> 01:01:28,090
is if you are running one
of these directory services,

1191
01:01:28,090 --> 01:01:35,960
you've got a complete list of
these keys pretty-- over time,

1192
01:01:35,960 --> 01:01:37,800
you run a directory
service [INAUDIBLE].

1193
01:01:37,800 --> 01:01:39,551
You get a complete
list of all these keys,

1194
01:01:39,551 --> 01:01:41,300
and you can try
connecting to all the ones

1195
01:01:41,300 --> 01:01:43,830
that don't have encrypted
stuff to find out what's there.

1196
01:01:43,830 --> 01:01:45,509
That's called an
enumeration attack.

1197
01:01:45,509 --> 01:01:47,050
And we didn't list
that in our paper,

1198
01:01:47,050 --> 01:01:49,690
because we weren't
thinking of that.

1199
01:01:49,690 --> 01:01:51,270
We didn't.

1200
01:01:51,270 --> 01:01:53,630
But it is something
we'd like to resist.

1201
01:01:53,630 --> 01:01:57,680
So in the design I hope
to be hacking together

1202
01:01:57,680 --> 01:02:02,020
sometime in 2014, we're going
to move towards a key blinding

1203
01:02:02,020 --> 01:02:18,770
approach where Alice
and Bob share PKz,

1204
01:02:18,770 --> 01:02:22,190
but this statement is
not signed with PKz.

1205
01:02:22,190 --> 01:02:24,780
This statement is
signed with PKz prime

1206
01:02:24,780 --> 01:02:33,380
where PKz prime is
derived from PKz

1207
01:02:33,380 --> 01:02:44,490
and, say, the date such that
if you know PKz and the date,

1208
01:02:44,490 --> 01:02:47,240
you can derive PKz prime.

1209
01:02:47,240 --> 01:02:51,810
If like Alice you
know secret Kz,

1210
01:02:51,810 --> 01:02:56,550
you can generate messages
that are signed by PKz prime.

1211
01:02:56,550 --> 01:03:01,410
But if you only see PKz
prime, even knowing the date,

1212
01:03:01,410 --> 01:03:04,440
you cannot re-derive PKz.

1213
01:03:04,440 --> 01:03:06,170
We've got a proof.

1214
01:03:06,170 --> 01:03:10,495
And if you'd like to find out
how this works, then ping me

1215
01:03:10,495 --> 01:03:11,960
and I'll send you the paper.

1216
01:03:11,960 --> 01:03:15,700
It's a cool trick.

1217
01:03:15,700 --> 01:03:18,590
We weren't the first
ones to invent this idea.

1218
01:03:18,590 --> 01:03:22,900
But that is how we're going
to solve enumeration attacks

1219
01:03:22,900 --> 01:03:26,790
sometime this coming year
assuming that I can actually

1220
01:03:26,790 --> 01:03:29,253
get the time to build it.

1221
01:03:29,253 --> 01:03:30,336
So that's hidden services.

1222
01:03:34,730 --> 01:03:41,630
Attacks and
defenses-- so so far,

1223
01:03:41,630 --> 01:03:44,600
the biggest category
of attacks we've seen

1224
01:03:44,600 --> 01:03:47,370
is attacks at the
application level.

1225
01:03:47,370 --> 01:03:50,810
So if you're running an
application over Tor,

1226
01:03:50,810 --> 01:03:56,146
and it's sending unencrypted
traffic, like regular HTTP,

1227
01:03:56,146 --> 01:03:59,450
then a hostile exit
node, just like anyone

1228
01:03:59,450 --> 01:04:02,470
else who touches HTTP traffic,
can observe and modify

1229
01:04:02,470 --> 01:04:04,830
the traffic.

1230
01:04:04,830 --> 01:04:08,240
This is the number one
attack on the whole system.

1231
01:04:08,240 --> 01:04:10,120
The solution is
encrypted traffic.

1232
01:04:10,120 --> 01:04:13,060
Fortunately, we're kind of
in an encryption renaissance

1233
01:04:13,060 --> 01:04:14,520
over the last few years.

1234
01:04:14,520 --> 01:04:16,650
And more and more
traffic is getting

1235
01:04:16,650 --> 01:04:21,520
encrypted with the nifty
free certificate authority

1236
01:04:21,520 --> 01:04:25,550
that EFF and Mozilla and Cisco
and I forget who else announced

1237
01:04:25,550 --> 01:04:26,740
a day or two ago.

1238
01:04:26,740 --> 01:04:29,632
There will be even less excuse
for unencrypted traffic in 2015

1239
01:04:29,632 --> 01:04:31,420
than there was this year.

1240
01:04:31,420 --> 01:04:33,210
So that solves that.

1241
01:04:33,210 --> 01:04:37,580
More interesting attacks include
things like traffic tagging.

1242
01:04:37,580 --> 01:04:44,090
So we made a mistake in our
early integrity checking

1243
01:04:44,090 --> 01:04:44,870
implementation.

1244
01:04:44,870 --> 01:04:47,870
Our early integrity
checking implementation

1245
01:04:47,870 --> 01:04:55,098
did end to end checking between
Alice's program and the exit

1246
01:04:55,098 --> 01:04:56,410
node.

1247
01:04:56,410 --> 01:04:58,900
But it turns out that
that's not enough.

1248
01:04:58,900 --> 01:05:02,410
Because if the
first relay messes

1249
01:05:02,410 --> 01:05:07,290
with the traffic in a way that
creates a pattern that the exit

1250
01:05:07,290 --> 01:05:10,330
node can detect, then
that's an easy way

1251
01:05:10,330 --> 01:05:12,800
for the first relay
and the last relay

1252
01:05:12,800 --> 01:05:17,860
to learn that they are on the
same path and identify Alice.

1253
01:05:17,860 --> 01:05:20,220
Of course, if the first
relay and the last relay

1254
01:05:20,220 --> 01:05:23,390
happen to be on the
same path, happen

1255
01:05:23,390 --> 01:05:25,950
to be collaborating anyway,
then they can already

1256
01:05:25,950 --> 01:05:30,000
identify Alice through traffic
correlation, we believe.

1257
01:05:30,000 --> 01:05:34,944
But perhaps it should not
be so easy for them as that.

1258
01:05:34,944 --> 01:05:36,610
Perhaps traffic
correlation will someday

1259
01:05:36,610 --> 01:05:38,330
be harder than we think.

1260
01:05:38,330 --> 01:05:41,460
It would be good to actually
solve that attack for real.

1261
01:05:41,460 --> 01:05:43,700
We've got two
solutions for that.

1262
01:05:43,700 --> 01:05:46,220
One is the expected
result of this attack

1263
01:05:46,220 --> 01:05:48,350
is that periodically
circuits will fail.

1264
01:05:48,350 --> 01:05:50,750
Because the attacker
on the first hop

1265
01:05:50,750 --> 01:05:53,570
guessed wrong about
controlling the last hop.

1266
01:05:53,570 --> 01:05:59,130
So every Tor client checks
for weird failure rates.
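
A minimal sketch of the failure-rate check described here, assuming a client that counts circuit outcomes per first hop. The class name, the method names, and both thresholds are hypothetical illustrations, not Tor's actual code or parameters.

```python
from collections import defaultdict

# Hypothetical thresholds, chosen only for illustration.
MIN_CIRCUITS = 20        # do not judge a first hop on too few samples
MAX_FAILURE_RATE = 0.3   # above this fraction, treat the first hop as suspect


class FailureMonitor:
    """Counts circuit attempts and failures per first hop."""

    def __init__(self) -> None:
        self.attempts = defaultdict(int)
        self.failures = defaultdict(int)

    def record(self, first_hop: str, succeeded: bool) -> None:
        # One call per circuit built through this first hop.
        self.attempts[first_hop] += 1
        if not succeeded:
            self.failures[first_hop] += 1

    def is_suspicious(self, first_hop: str) -> bool:
        # A tagging first hop that guessed wrong about the last hop
        # shows up as an unusually high failure rate.
        total = self.attempts[first_hop]
        if total < MIN_CIRCUITS:
            return False
        return self.failures[first_hop] / total > MAX_FAILURE_RATE
```

The sample-size floor keeps one unlucky circuit from flagging an honest relay; what a client should do once a first hop is flagged is left open in this sketch.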

1267
01:05:59,130 --> 01:06:00,910
The real long-term
fix is to make it

1268
01:06:00,910 --> 01:06:04,570
so that messing with the
pattern on the first hop

1269
01:06:04,570 --> 01:06:07,890
doesn't create more than 1 bit
of information on the last hop.

1270
01:06:07,890 --> 01:06:10,790
You can't avoid sending
1 bit of information,

1271
01:06:10,790 --> 01:06:13,830
because the first hop can always
just shut down the connection.

1272
01:06:13,830 --> 01:06:17,097
But you can limit it
to 1 bit-- OK, 2 bits.

1273
01:06:17,097 --> 01:06:19,430
Because then they'll have the
choice to corrupt the data

1274
01:06:19,430 --> 01:06:20,740
or shut down the connection.

1275
01:06:23,700 --> 01:06:25,716
Oh, I had an idea of
how to make that better.

1276
01:06:25,716 --> 01:06:28,980
I'll have to think about that.

1277
01:06:28,980 --> 01:06:32,610
Let's see, DoS is
actually pretty important.

1278
01:06:32,610 --> 01:06:34,610
There was a paper the
other year about something

1279
01:06:34,610 --> 01:06:36,640
that the authors called
the sniper attack

1280
01:06:36,640 --> 01:06:39,986
where you see traffic
coming from a Tor node

1281
01:06:39,986 --> 01:06:41,850
that you don't control.

1282
01:06:41,850 --> 01:06:44,230
You want to kick everybody
off that Tor node.

1283
01:06:44,230 --> 01:06:45,490
So you connect to it.

1284
01:06:45,490 --> 01:06:50,217
You fill up all its memory
buffers, and it crashes.

1285
01:06:50,217 --> 01:06:52,050
Then you see whether
the traffic in question

1286
01:06:52,050 --> 01:06:54,055
gets rerouted to a node
you control or not,

1287
01:06:54,055 --> 01:06:55,235
and you repeat as necessary.

1288
01:06:59,020 --> 01:07:02,575
For that, our best
options are first off,

1289
01:07:02,575 --> 01:07:05,310
no longer have memory DoSes.

1290
01:07:05,310 --> 01:07:10,550
I think we have all of the
good memory DoSes fixed now.

1291
01:07:10,550 --> 01:07:13,080
There are some bad ones that
still need to get addressed.

1292
01:07:13,080 --> 01:07:16,430
But they're screamingly
inefficient.

1293
01:07:16,430 --> 01:07:19,770
The other option for
resolving this kind of thing

1294
01:07:19,770 --> 01:07:23,020
is make sure relays
are high capacity.

1295
01:07:23,020 --> 01:07:25,720
Don't accept low capacity
relays on the network.

1296
01:07:25,720 --> 01:07:26,700
We do that, too.

1297
01:07:26,700 --> 01:07:30,130
If you're trying to run
a relay on your phone,

1298
01:07:30,130 --> 01:07:31,570
the authorities won't list it.

1299
01:07:35,950 --> 01:07:39,350
And another thing is to try
to pick our circuit scheduling

1300
01:07:39,350 --> 01:07:45,710
algorithms so that it's
hard to starve out circuits

1301
01:07:45,710 --> 01:07:46,820
that you don't control.

1302
01:07:46,820 --> 01:07:50,605
That's very hard,
though, and it's as yet

1303
01:07:50,605 --> 01:07:52,830
an unsolved problem.

1304
01:07:52,830 --> 01:07:55,660
Let's see, should I do
an interesting attack

1305
01:07:55,660 --> 01:07:58,342
or an important attack?

1306
01:07:58,342 --> 01:07:59,216
STUDENT: Interesting.

1307
01:07:59,216 --> 01:08:01,600
NICK MATHEWSON: Interesting, OK.

1308
01:08:01,600 --> 01:08:03,094
So show of hands,
how many people

1309
01:08:03,094 --> 01:08:04,510
might like to write
a program that

1310
01:08:04,510 --> 01:08:07,130
uses cryptography some day?

1311
01:08:07,130 --> 01:08:08,770
Cool, here's what
you must learn.

1312
01:08:08,770 --> 01:08:12,540
Never trust your
cryptography implementation.

1313
01:08:12,540 --> 01:08:15,670
So even when it's
correct, it's wrong.

1314
01:08:15,670 --> 01:08:21,825
So long ago-- I think this may
be one of the worst security

1315
01:08:21,825 --> 01:08:24,430
bugs that we've had.

1316
01:08:24,430 --> 01:08:25,805
Any relay could
man in the middle

1317
01:08:25,805 --> 01:08:32,420
any circuit because we assumed
that a correct Diffie-Hellman

1318
01:08:32,420 --> 01:08:38,120
implementation would verify
that it was not being passed 0

1319
01:08:38,120 --> 01:08:40,600
as one of the inputs.

1320
01:08:40,600 --> 01:08:42,770
The authors of our
Diffie-Hellman implementation

1321
01:08:42,770 --> 01:08:44,758
assumed the proper
application would never

1322
01:08:44,758 --> 01:08:49,470
pass zero to a Diffie-Hellman
implementation.

1323
01:08:49,470 --> 01:08:56,229
So Diffie-Hellman, when I say
g to the x, you say g to the y.

1324
01:08:56,229 --> 01:08:57,340
I know x.

1325
01:08:57,340 --> 01:08:58,310
You know y.

1326
01:08:58,310 --> 01:09:01,332
And we can both compute
g to the xy now.

1327
01:09:01,332 --> 01:09:02,540
You tend to feel me?

1328
01:09:02,540 --> 01:09:03,100
Good.

1329
01:09:03,100 --> 01:09:06,640
Well, if instead the
man in the middle

1330
01:09:06,640 --> 01:09:10,990
replaces my g to the x with
0 and your g to the y with 0,

1331
01:09:10,990 --> 01:09:13,100
and then I happily
compute 0 to the x,

1332
01:09:13,100 --> 01:09:16,890
and you compute 0 to the y,
we will have the same key.

1333
01:09:16,890 --> 01:09:18,719
We will happily
talk to each other.

1334
01:09:18,719 --> 01:09:22,740
But this will be a key that the
attacker knows, because it's 0.

1335
01:09:22,740 --> 01:09:25,149
1 also works.

1336
01:09:25,149 --> 01:09:27,290
p also works.

1337
01:09:27,290 --> 01:09:29,729
p plus 1 also works.

1338
01:09:29,729 --> 01:09:33,110
So you basically just need to
make sure that your values here

1339
01:09:33,110 --> 01:09:37,120
are within range 2 and p minus
1 if you're doing Diffie-Hellman

1340
01:09:37,120 --> 01:09:38,439
in Z sub p.
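
The degenerate-value problem above can be sketched with a toy prime. The parameters `P = 23` and `G = 5` and both function names are assumptions made only for illustration; real Diffie-Hellman uses a large prime. The range check uses the upper bound p - 2, which is slightly stricter than the spoken "p minus 1": that stricter bound is an assumption of this sketch, made because powers of p - 1 can only be 1 or p - 1.

```python
# Toy parameters for illustration only; real deployments use a large prime.
P = 23
G = 5


def shared_secret(received: int, private: int, p: int = P) -> int:
    """Compute received^private mod p, as each side does in Diffie-Hellman."""
    return pow(received, private, p)


def is_valid_public(value: int, p: int = P) -> bool:
    """Reject degenerate public values: anything outside [2, p - 2]."""
    return 2 <= value <= p - 2


x, y = 6, 15  # private exponents of the two honest parties

# Honest exchange: both sides derive the same secret from g^x and g^y.
gx, gy = pow(G, x, P), pow(G, y, P)
assert shared_secret(gy, x) == shared_secret(gx, y)
assert is_valid_public(gx) and is_valid_public(gy)

# Man in the middle swaps both public values for a degenerate one.
for forged in (0, 1, P, P + 1):
    alice_key = shared_secret(forged, x)
    bob_key = shared_secret(forged, y)
    # Both sides agree on a key, but it is always 0 or 1, which the
    # attacker knows without learning x or y.
    assert alice_key == bob_key and alice_key in (0, 1)
    # The range check catches every one of these forged values.
    assert not is_valid_public(forged)
```

The check has to run on the received value before the exponentiation, and the application should do it itself rather than assume the library does.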

1341
01:09:41,010 --> 01:09:47,090
OK, let's see, I would love
to talk more about censorship.

1342
01:09:47,090 --> 01:09:49,609
Because actually,
it's one of the areas

1343
01:09:49,609 --> 01:09:51,460
where we can do the most good.

1344
01:09:51,460 --> 01:09:55,260
Generally, the summarized
version of that

1345
01:09:55,260 --> 01:09:57,240
was, in the earliest
paper you read,

1346
01:09:57,240 --> 01:09:59,880
and in some of the updates,
we were still on the idea

1347
01:09:59,880 --> 01:10:01,880
that we would try to
make Tor look just

1348
01:10:01,880 --> 01:10:05,275
like a web client talking
to a web server over HTTPS

1349
01:10:05,275 --> 01:10:06,869
and make that hard to block.

1350
01:10:06,869 --> 01:10:08,660
It turns out that's
fantastically difficult

1351
01:10:08,660 --> 01:10:10,820
and probably not worth doing.

1352
01:10:10,820 --> 01:10:12,250
Instead, the
approach we take now

1353
01:10:12,250 --> 01:10:15,190
is using different
plug-in programs

1354
01:10:15,190 --> 01:10:21,030
that a non-listed relay
called a bridge can use,

1355
01:10:21,030 --> 01:10:23,930
and a client can use
to do different traffic

1356
01:10:23,930 --> 01:10:25,440
transformations.

1357
01:10:25,440 --> 01:10:28,675
And we manage to keep
adding new ones of those

1358
01:10:28,675 --> 01:10:30,800
faster than the censors
have been able to implement

1359
01:10:30,800 --> 01:10:32,380
blocking for them.

1360
01:10:32,380 --> 01:10:38,560
And that's actually a case
where none of the solutions

1361
01:10:38,560 --> 01:10:42,320
are categorically workable.

1362
01:10:42,320 --> 01:10:44,030
That's not a
well-formed sentence.

1363
01:10:44,030 --> 01:10:47,170
None of these plug-ins
are inherently

1364
01:10:47,170 --> 01:10:50,651
unblockable by any
imaginable technique so far.

1365
01:10:50,651 --> 01:10:53,150
But they're good enough to keep
traffic unblocked for a year

1366
01:10:53,150 --> 01:10:56,390
or two in most places,
and six or seven

1367
01:10:56,390 --> 01:10:59,460
months at a time in China.

1368
01:10:59,460 --> 01:11:02,760
China currently has the most
competent censors in the world,

1369
01:11:02,760 --> 01:11:04,580
largely because China
doesn't outsource.

1370
01:11:04,580 --> 01:11:08,330
Most other censoring countries
outsource their censorship

1371
01:11:08,330 --> 01:11:12,680
to dishonest European, American,
and Asian companies whose

1372
01:11:12,680 --> 01:11:15,410
incentives are not actually
to sell them good censorship,

1373
01:11:15,410 --> 01:11:17,820
but to keep them on
an upgrade treadmill.

1374
01:11:17,820 --> 01:11:21,130
So if you were buying
your censorship software

1375
01:11:21,130 --> 01:11:24,470
from the United States--
which technically speaking

1376
01:11:24,470 --> 01:11:27,220
US companies aren't allowed
to make censorship software

1377
01:11:27,220 --> 01:11:29,140
for nations.

1378
01:11:29,140 --> 01:11:32,620
But they just make
corporate firewall software

1379
01:11:32,620 --> 01:11:34,650
that happens to scale
to 10 million people.

1380
01:11:37,240 --> 01:11:39,116
Yeah, I think that's unethical.

1381
01:11:39,116 --> 01:11:41,900
But again, I'm not the political
scientist of the organization,

1382
01:11:41,900 --> 01:11:43,729
or the philosopher.

1383
01:11:43,729 --> 01:11:46,020
Paul Syverson, one of the
original [INAUDIBLE] authors,

1384
01:11:46,020 --> 01:11:47,790
does have a degree
in philosophy,

1385
01:11:47,790 --> 01:11:50,090
for what that's worth,
which means that he can't

1386
01:11:50,090 --> 01:11:50,886
answer these questions either.

1387
01:11:50,886 --> 01:11:52,761
But he takes a lot longer
not to answer them.

1388
01:11:56,720 --> 01:11:58,550
Right, where was I?

1389
01:11:58,550 --> 01:12:01,380
90 minutes is a long time.

1390
01:12:01,380 --> 01:12:05,200
Censorship-- right, so what
the censorware providers

1391
01:12:05,200 --> 01:12:10,020
do is once Tor gets
around their censorship,

1392
01:12:10,020 --> 01:12:13,510
they will block the most
recent version of Tor.

1393
01:12:13,510 --> 01:12:17,480
But they do it in a way that
is the weakest possible block.

1394
01:12:17,480 --> 01:12:20,470
So if we change 1 bit in
one identifier somewhere,

1395
01:12:20,470 --> 01:12:22,150
we get around it.

1396
01:12:22,150 --> 01:12:25,050
We can't prove that they're
doing this on purpose

1397
01:12:25,050 --> 01:12:30,890
to ensure that Tor will evade
their version so that they can

1398
01:12:30,890 --> 01:12:34,370
sell Tor blocking and then have
it not work so they can sell

1399
01:12:34,370 --> 01:12:36,360
the upgrade, and then
sell the next upgrade,

1400
01:12:36,360 --> 01:12:37,640
and sell the next upgrade.

1401
01:12:37,640 --> 01:12:39,480
But it sure does seem that way.

1402
01:12:39,480 --> 01:12:42,614
So that's another reason not to
work for censorship providers.

1403
01:12:42,614 --> 01:12:44,530
They're tremendously
unethical, and they don't

1404
01:12:44,530 --> 01:12:45,654
provide very good software.

1405
01:12:48,180 --> 01:12:50,920
If you're interested
in writing any

1406
01:12:50,920 --> 01:12:52,584
of these pluggable
transport things,

1407
01:12:52,584 --> 01:12:54,000
that is an excellent
kind of thing

1408
01:12:54,000 --> 01:12:56,877
to do as a student
project-- loads of fun,

1409
01:12:56,877 --> 01:12:58,460
learn a little bit
about crypto, learn

1410
01:12:58,460 --> 01:13:00,076
a little bit about networking.

1411
01:13:00,076 --> 01:13:02,200
And so long as you do it
in a memory-safe language,

1412
01:13:02,200 --> 01:13:04,240
you can't screw
it up that badly.

1413
01:13:04,240 --> 01:13:06,350
The worst thing
that happens is it

1414
01:13:06,350 --> 01:13:10,496
gets censored after a month
instead of after a year.

1415
01:13:10,496 --> 01:13:17,600
And that's what I want to-- oh,
the addenda: related work.

1416
01:13:17,600 --> 01:13:21,680
Tor is the most popular
system of its kind,

1417
01:13:21,680 --> 01:13:23,110
but it's not the only one.

1418
01:13:23,110 --> 01:13:24,740
Lots of others have
really good ideas,

1419
01:13:24,740 --> 01:13:26,820
and you should
check them out too

1420
01:13:26,820 --> 01:13:29,822
if you're interested
in learning all

1421
01:13:29,822 --> 01:13:31,280
of the stuff I'm
not thinking about

1422
01:13:31,280 --> 01:13:33,770
and all the reasons I'm wrong.

1423
01:13:33,770 --> 01:13:37,330
freehaven.net/anonbib/
lists the academic research

1424
01:13:37,330 --> 01:13:39,290
and publications in this area.

1425
01:13:39,290 --> 01:13:42,240
But not all the research
in this area is academic.

1426
01:13:42,240 --> 01:13:48,680
You should also
look at I2P; GNUnet;

1427
01:13:48,680 --> 01:13:52,090
Freedom, which is
currently defunct,

1428
01:13:52,090 --> 01:14:09,640
no pun intended; Mixmaster;
Mixminion; Sphynx with a Y,

1429
01:14:09,640 --> 01:14:17,280
Sphinx with an I is
something different; DC-nets,

1430
01:14:17,280 --> 01:14:25,950
particularly the work of Bryan
Ford, and also of the team

1431
01:14:25,950 --> 01:14:28,645
at Technical University
Dresden, in trying

1432
01:14:28,645 --> 01:14:30,240
to make DC-nets practical.

1433
01:14:30,240 --> 01:14:32,770
They're very strong [INAUDIBLE],
not actually deployable

1434
01:14:32,770 --> 01:14:35,245
yet-- and many others.

1435
01:14:41,040 --> 01:14:44,230
Why these get less use
or attention than Tor

1436
01:14:44,230 --> 01:14:48,270
is an open topic
of some interest

1437
01:14:48,270 --> 01:14:50,910
that I don't have
a solid answer for.

1438
01:14:50,910 --> 01:14:55,120
Future work-- so
one of the reasons

1439
01:14:55,120 --> 01:14:58,940
I do these is not just
because I would like everybody

1440
01:14:58,940 --> 01:15:00,700
to know about the cool
software I work on.

1441
01:15:00,700 --> 01:15:02,820
But also because I
know students have

1442
01:15:02,820 --> 01:15:05,090
lots and lots of free time.

1443
01:15:05,090 --> 01:15:07,360
And I'm kind of
looking to recruit.

1444
01:15:07,360 --> 01:15:09,180
OK, you may think I'm joking.

1445
01:15:09,180 --> 01:15:12,730
But when I was just getting
started in this field,

1446
01:15:12,730 --> 01:15:16,790
I was complaining about how I
was so busy reviewing papers

1447
01:15:16,790 --> 01:15:19,254
for one conference, writing
software, fixing a bug,

1448
01:15:19,254 --> 01:15:19,920
answering email.

1449
01:15:19,920 --> 01:15:21,920
I was complaining to some
senior faculty member.

1450
01:15:21,920 --> 01:15:27,060
And he told me, you will
never have so much free time

1451
01:15:27,060 --> 01:15:27,850
as you do today.

1452
01:15:29,955 --> 01:15:31,330
You actually have
a lot more free

1453
01:15:31,330 --> 01:15:33,050
time now than you
will in 10 years.

1454
01:15:33,050 --> 01:15:37,580
So this is a great time to work
on crazy software projects.

1455
01:15:37,580 --> 01:15:39,680
So let me tell you about
future work in Tor.

1456
01:15:39,680 --> 01:15:43,710
There's this key blinding
thing and a complete revamp

1457
01:15:43,710 --> 01:15:45,670
of our hidden
services system, which

1458
01:15:45,670 --> 01:15:47,900
was the best we could design
when we came up with it.

1459
01:15:47,900 --> 01:15:49,816
But there's been a lot
of research since then.

1460
01:15:49,816 --> 01:15:52,710
Maybe some of it will turn
out to be a good idea.

1461
01:15:52,710 --> 01:15:54,480
We're also revamping
most of our crypto.

1462
01:15:54,480 --> 01:15:58,320
We chose schemes that
seemed like a good security

1463
01:15:58,320 --> 01:16:03,140
performance trade-off
in 2003, like RSA-1024.

1464
01:16:03,140 --> 01:16:05,720
We've replaced the
really important uses

1465
01:16:05,720 --> 01:16:09,580
of RSA-1024 with stronger stuff,
currently [INAUDIBLE] 25519.

1466
01:16:09,580 --> 01:16:11,080
But there's still
some cases that we

1467
01:16:11,080 --> 01:16:14,797
want to replace in the protocol
that we need some work on.

1468
01:16:14,797 --> 01:16:16,630
I didn't talk too much
about path selection,

1469
01:16:16,630 --> 01:16:19,910
so I can't talk too much about
improvements in that selection.

1470
01:16:19,910 --> 01:16:24,410
But our path selection
algorithms were [INAUDIBLE].

1471
01:16:24,410 --> 01:16:26,140
And there's been
some awesome research

1472
01:16:26,140 --> 01:16:31,750
in the past five or six years on
that that we need to integrate.

1473
01:16:31,750 --> 01:16:33,900
There's a little
work that's been

1474
01:16:33,900 --> 01:16:38,500
done on mixing high latency
and low latency traffic so

1475
01:16:38,500 --> 01:16:41,345
that the low latency
traffic can provide cover

1476
01:16:41,345 --> 01:16:44,270
for the high latency traffic
in terms of providing lots

1477
01:16:44,270 --> 01:16:47,960
of users while the high latency
traffic is still very well

1478
01:16:47,960 --> 01:16:50,500
anonymized.

1479
01:16:50,500 --> 01:16:53,970
It's not clear whether
this would work or not.

1480
01:16:53,970 --> 01:16:57,600
It's not clear whether
anyone would use this or not.

1481
01:16:57,600 --> 01:17:01,080
And it is clear that
unless something changes,

1482
01:17:01,080 --> 01:17:03,879
or unless some major funding
for that particularly shows up,

1483
01:17:03,879 --> 01:17:05,920
I'm not going to have time
to work on it in 2015.

1484
01:17:05,920 --> 01:17:08,045
But if somebody else wants
to hack on that, my god,

1485
01:17:08,045 --> 01:17:08,860
that would be fun.

1486
01:17:08,860 --> 01:17:10,920
Our congestion
control algorithms

1487
01:17:10,920 --> 01:17:15,030
were chosen questionably based
on what we could hack together

1488
01:17:15,030 --> 01:17:17,050
in a week.

1489
01:17:17,050 --> 01:17:20,360
We've improved them, but they
could use a bigger revamp.

1490
01:17:20,360 --> 01:17:23,070
There's some research on
scaling to hundreds of thousands

1491
01:17:23,070 --> 01:17:24,170
of nodes.

1492
01:17:24,170 --> 01:17:26,800
So in the current
design, we can probably

1493
01:17:26,800 --> 01:17:29,630
get up to 10,000 or
20,000 with no problem.

1494
01:17:29,630 --> 01:17:33,330
But because we assume that every
client knows about every node,

1495
01:17:33,330 --> 01:17:35,670
and every node may be
connected to every other node,

1496
01:17:35,670 --> 01:17:38,350
that's going to stop
scaling before 100,000.

1497
01:17:38,350 --> 01:17:41,250
And we need to do
something about that.
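
A rough back-of-the-envelope sketch of why the all-to-all assumption stops scaling. The function name is hypothetical, and the count is an upper bound, since nodes only "may be" connected to every other node.

```python
def full_mesh_links(nodes: int) -> int:
    """Upper bound on relay-to-relay connections if every pair is linked."""
    return nodes * (nodes - 1) // 2


for n in (10_000, 20_000, 100_000):
    # Each client also needs an entry for every one of the n nodes.
    print(f"{n:>7} nodes: up to {full_mesh_links(n):,} links")
```

Going from 10,000 to 100,000 nodes multiplies the possible links by roughly 100, while the directory each client must hold grows tenfold.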

1498
01:17:41,250 --> 01:17:43,070
That opens up some
classes of attacks

1499
01:17:43,070 --> 01:17:47,680
based on attackers learning
which clients know which nodes

1500
01:17:47,680 --> 01:17:50,574
and using that to
distinguish clients.

1501
01:17:50,574 --> 01:17:52,740
So most of the naive
approaches are a bad idea here.

1502
01:17:52,740 --> 01:17:56,960
But it may be that less naive
approaches might work out.

1503
01:17:56,960 --> 01:17:59,585
Another thing you might want to
do if you're increasing to 100,000

1504
01:17:59,585 --> 01:18:02,230
nodes is get rid of those
centralized directory

1505
01:18:02,230 --> 01:18:05,840
authorities and go to some
kind of peer to peer design.

1506
01:18:05,840 --> 01:18:10,010
I don't have extremely
high confidence

1507
01:18:10,010 --> 01:18:12,530
in the peer to peer
designs I know of so far.

1508
01:18:12,530 --> 01:18:16,940
But it could be that
somebody's about to advance

1509
01:18:16,940 --> 01:18:17,690
the next good one.

1510
01:18:20,230 --> 01:18:23,400
Let's see, I don't
know what that means.

1511
01:18:23,400 --> 01:18:26,566
Oh, somebody asked a
question about adding

1512
01:18:26,566 --> 01:18:33,013
padding traffic or fake
traffic to try to deceive end

1513
01:18:33,013 --> 01:18:34,360
to end traffic correlation.

1514
01:18:34,360 --> 01:18:36,150
This is an exciting
research field

1515
01:18:36,150 --> 01:18:40,422
that needs someone smarter
to work on it or someone

1516
01:18:40,422 --> 01:18:42,630
with a more practical attitude
to work on it than has

1517
01:18:42,630 --> 01:18:44,230
previously worked on it.

1518
01:18:44,230 --> 01:18:47,260
Too many of the results
in the research literature

1519
01:18:47,260 --> 01:18:51,345
there are only about
distinguishing the traffic

1520
01:18:51,345 --> 01:18:55,229
of two users on a number
containing one relay,

1521
01:18:55,229 --> 01:18:57,020
because that's how the
math was easy to do.

1522
01:19:00,230 --> 01:19:02,230
So because of this
kind of stuff,

1523
01:19:02,230 --> 01:19:04,240
all of the traffic
analysis defenses

1524
01:19:04,240 --> 01:19:06,200
that we know of in this
area that are still

1525
01:19:06,200 --> 01:19:10,109
compatible with broad
browsing, they sound good

1526
01:19:10,109 --> 01:19:11,150
if you read the abstract.

1527
01:19:11,150 --> 01:19:14,790
You'll say, hooray, this
one forces the attacker

1528
01:19:14,790 --> 01:19:17,510
to gather three times as
much traffic before they

1529
01:19:17,510 --> 01:19:19,020
can correlate users.

1530
01:19:19,020 --> 01:19:20,950
Except when you
actually read the paper,

1531
01:19:20,950 --> 01:19:23,510
previously the attacker needed
two seconds worth of traffic,

1532
01:19:23,510 --> 01:19:24,485
and then they won.

1533
01:19:24,485 --> 01:19:26,940
Now they need six seconds.

1534
01:19:26,940 --> 01:19:29,430
That's not really a
defense in this model,

1535
01:19:29,430 --> 01:19:33,699
although perhaps
against a real network,

1536
01:19:33,699 --> 01:19:35,740
the numbers would be
different and it might work.

1537
01:19:35,740 --> 01:19:38,930
So we would actually like
to see some stuff done

1538
01:19:38,930 --> 01:19:40,470
with padding and fake traffic.

1539
01:19:40,470 --> 01:19:43,645
But we don't like to
add voodoo defenses

1540
01:19:43,645 --> 01:19:45,580
that we conjecture to
maybe do some good,

1541
01:19:45,580 --> 01:19:47,340
although we can't do that.

1542
01:19:47,340 --> 01:19:48,715
We actually like
to have evidence

1543
01:19:48,715 --> 01:19:50,590
that any changes
we're going to make

1544
01:19:50,590 --> 01:19:51,790
are going to help something.

1545
01:19:51,790 --> 01:19:53,240
I think I'm out of time.

1546
01:19:53,240 --> 01:19:55,439
And there may be a
class in here after us?

1547
01:19:55,439 --> 01:19:55,980
There is not?

1548
01:19:55,980 --> 01:19:58,104
All right, so I'm going to
hang around for a while.

1549
01:19:58,104 --> 01:20:00,140
And thanks for coming to listen.

1550
01:20:00,140 --> 01:20:02,290
I would take questions now.

1551
01:20:02,290 --> 01:20:06,089
But it's 12:25, and folks
may have another class.

1552
01:20:06,089 --> 01:20:07,380
But I'll be around [INAUDIBLE].

1553
01:20:07,380 --> 01:20:08,880
Thank you very much for coming.

1554
01:20:08,880 --> 01:20:11,952
[APPLAUSE]