1
00:00:00,070 --> 00:00:02,430
The following content is
provided under a Creative

2
00:00:02,430 --> 00:00:03,810
Commons license.

3
00:00:03,810 --> 00:00:06,060
Your support will help
MIT OpenCourseWare

4
00:00:06,060 --> 00:00:10,150
continue to offer high-quality
educational resources for free.

5
00:00:10,150 --> 00:00:12,700
To make a donation or to
view additional materials

6
00:00:12,700 --> 00:00:16,600
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:16,600 --> 00:00:17,310
at ocw.mit.edu.

8
00:00:26,169 --> 00:00:27,210
PROFESSOR: Hey, everyone.

9
00:00:27,210 --> 00:00:28,076
Good on that?

10
00:00:28,076 --> 00:00:29,480
All right, cool.

11
00:00:29,480 --> 00:00:34,477
So today we're going to talk
about the economics of spam

12
00:00:34,477 --> 00:00:35,740
and security in general.

13
00:00:35,740 --> 00:00:37,355
And so up to this
point in the class,

14
00:00:37,355 --> 00:00:40,540
we've mainly talked about the
technical aspects of security.

15
00:00:40,540 --> 00:00:42,680
So we've looked at things
like buffer overflows,

16
00:00:42,680 --> 00:00:46,540
the same-origin policy, Tor, and
all kinds of things like that.

17
00:00:46,540 --> 00:00:49,550
And so the context
for that discussion

18
00:00:49,550 --> 00:00:53,780
was that we were looking at
how an adversary can compromise

19
00:00:53,780 --> 00:00:54,560
a system.

20
00:00:54,560 --> 00:00:56,570
We tried to devise
a threat model that

21
00:00:56,570 --> 00:00:58,820
would describe the types of
things we want to prevent,

22
00:00:58,820 --> 00:01:00,320
and then we tried
to think about how

23
00:01:00,320 --> 00:01:03,400
we could design systems
that would help us to defend

24
00:01:03,400 --> 00:01:05,129
against that threat model.

25
00:01:05,129 --> 00:01:07,560
So today we're going to look
at an alternate perspective.

26
00:01:07,560 --> 00:01:09,950
And the perspective
that we'll look at today

27
00:01:09,950 --> 00:01:13,520
is, why is the attacker trying
to compromise your system?

28
00:01:13,520 --> 00:01:17,189
Why is the attacker trying to
do these evil things to us?

29
00:01:17,189 --> 00:01:18,730
And so there's a
bunch of reasons

30
00:01:18,730 --> 00:01:20,750
you can imagine why
attackers might be

31
00:01:20,750 --> 00:01:22,510
trying to do these evil things.

32
00:01:22,510 --> 00:01:25,805
So some of these attacks are
done for ideological reasons.

33
00:01:25,805 --> 00:01:27,804
So think about
people who perceive

34
00:01:27,804 --> 00:01:30,220
themselves to be political
activists, or things like that.

35
00:01:30,220 --> 00:01:32,950
Or if you think about
Stuxnet, for example.

36
00:01:32,950 --> 00:01:35,490
Sometimes it's like governments
attacking other governments.

37
00:01:35,490 --> 00:01:38,470
And so for these
types of attacks

38
00:01:38,470 --> 00:01:41,265
money, economics, is not
the primary motivation

39
00:01:41,265 --> 00:01:42,816
for the attack to take place.

40
00:01:42,816 --> 00:01:45,050
And what's interesting
is that it's actually

41
00:01:45,050 --> 00:01:48,540
hard to make these attacks go
away, other than generically

42
00:01:48,540 --> 00:01:51,357
making computers more secure.

43
00:01:51,357 --> 00:01:53,190
There's not really some
financial thumbscrew

44
00:01:53,190 --> 00:01:57,010
you can turn to make these
attackers disincentivized

45
00:01:57,010 --> 00:01:57,940
to do things.

46
00:01:57,940 --> 00:02:02,170
However, there are
some types of attacks

47
00:02:02,170 --> 00:02:04,900
that do involve a strong
economic component,

48
00:02:04,900 --> 00:02:07,690
and those are some of the things
we're going to look at today.

49
00:02:07,690 --> 00:02:08,990
One of the interesting
things, though,

50
00:02:08,990 --> 00:02:09,929
is that for a lot
of these attacks

51
00:02:09,929 --> 00:02:12,099
that don't have an economic
component, in that we

52
00:02:12,099 --> 00:02:14,640
can't use regulations and things
like that to try and prevent

53
00:02:14,640 --> 00:02:15,139
them.

54
00:02:15,139 --> 00:02:17,426
It can sometimes be
difficult to figure out

55
00:02:17,426 --> 00:02:19,800
how we'd be able to stop them
at all beyond, like I said,

56
00:02:19,800 --> 00:02:21,549
just trying to make
computers more secure.

57
00:02:21,549 --> 00:02:23,570
For example, Stuxnet's
a great example.

58
00:02:23,570 --> 00:02:26,850
So this is the malware
that was attacking

59
00:02:26,850 --> 00:02:30,740
some of the industrial software
in Iran, with the centrifuges.

60
00:02:30,740 --> 00:02:34,430
So we all kind of know where
Stuxnet came from, right?

61
00:02:34,430 --> 00:02:36,850
We basically know it was the
Americans and the Israelis.

62
00:02:36,850 --> 00:02:37,370
Basically.

63
00:02:37,370 --> 00:02:40,000
But can we prove that
in a court of law?

64
00:02:40,000 --> 00:02:43,344
Like, who can we sue, to say
You put Stuxnet on our machine?

65
00:02:43,344 --> 00:02:44,885
So it becomes a
little bit murky when

66
00:02:44,885 --> 00:02:47,100
you have some of these
attacks, where it's not clear

67
00:02:47,100 --> 00:02:49,720
you can sue the Federal
Reserve, or you can sue Israel,

68
00:02:49,720 --> 00:02:50,770
for something like this.

69
00:02:50,770 --> 00:02:52,000
And furthermore, no
one's gone on the record

70
00:02:52,000 --> 00:02:53,750
as officially claiming
that it was them.

71
00:02:53,750 --> 00:02:56,660
So there's some very interesting
legal and financial issues

72
00:02:56,660 --> 00:02:58,243
that get involved
when you look at how

73
00:02:58,243 --> 00:02:59,460
to prevent these attacks.

74
00:02:59,460 --> 00:03:01,770
So there are many
kinds of computer crime

75
00:03:01,770 --> 00:03:04,440
that are driven by
economic motivations.

76
00:03:04,440 --> 00:03:07,050
So for example, state-sponsored
industrial espionage,

77
00:03:07,050 --> 00:03:07,819
for instance.

78
00:03:07,819 --> 00:03:10,110
So this is one thing that
some of our previous speakers

79
00:03:10,110 --> 00:03:10,660
have talked about.

80
00:03:10,660 --> 00:03:12,230
Sometimes governments
try to hack

81
00:03:12,230 --> 00:03:14,540
into other governments
or other industries

82
00:03:14,540 --> 00:03:17,562
to steal intellectual
property, or things like that.

83
00:03:17,562 --> 00:03:20,020
And what's interesting is that,
like the attacks that we'll

84
00:03:20,020 --> 00:03:21,840
look at today, which
are spam, you'll

85
00:03:21,840 --> 00:03:24,750
see that it actually takes some
money to make some money.

86
00:03:24,750 --> 00:03:27,770
Spammers actually have to
invest in an infrastructure

87
00:03:27,770 --> 00:03:30,100
before they can actually
send these messages out.

88
00:03:30,100 --> 00:03:32,630
And so if you have these
attacks where it takes money

89
00:03:32,630 --> 00:03:34,290
to make money, and
you can figure out

90
00:03:34,290 --> 00:03:37,314
what that financial sort of
tool chain looks like, then

91
00:03:37,314 --> 00:03:38,730
maybe you can think
about applying

92
00:03:38,730 --> 00:03:43,580
upstream financial pressure to
stop those downstream malware

93
00:03:43,580 --> 00:03:46,470
attacks or security problems.

94
00:03:46,470 --> 00:03:47,900
And so I think the
take-home point

95
00:03:47,900 --> 00:03:50,840
is that if we look at the
context of spam in particular,

96
00:03:50,840 --> 00:03:54,550
spammers will stop sending spam
if it becomes unprofitable.

97
00:03:54,550 --> 00:03:56,980
One of the sad truths of
the world is that we continue

98
00:03:56,980 --> 00:03:59,260
to get spam messages
because it's cheap for them

99
00:03:59,260 --> 00:04:02,465
to send them, and 2% to 3%
of our fellow human beings

100
00:04:02,465 --> 00:04:05,050
will actually click on
links and look at stuff.

101
00:04:05,050 --> 00:04:08,430
And so as long as these costs
for sending these messages out

102
00:04:08,430 --> 00:04:10,775
are so low, then even if
the hit rates are low,

103
00:04:10,775 --> 00:04:12,900
people can still make money
off that kind of stuff.

104
00:04:12,900 --> 00:04:19,200
So for today we're
going to look at attacks

105
00:04:19,200 --> 00:04:24,266
that have a significant
economic component to them.

106
00:04:27,020 --> 00:04:30,110
And so one interesting
example which I actually just

107
00:04:30,110 --> 00:04:33,490
read about takes place in China.

108
00:04:33,490 --> 00:04:37,710
And so in China they
have this problem

109
00:04:37,710 --> 00:04:41,680
with what they call
text message cars.

110
00:04:41,680 --> 00:04:46,350
So the basic idea here is
that people drive around

111
00:04:46,350 --> 00:04:49,790
with these cars that have
these radio antennas attached

112
00:04:49,790 --> 00:04:50,730
to the side.

113
00:04:50,730 --> 00:04:52,770
And they can essentially
do-- think of it

114
00:04:52,770 --> 00:04:55,520
almost like a man in the middle
between people's mobile cell

115
00:04:55,520 --> 00:04:57,850
phones and the actual
cellphone tower.

116
00:04:57,850 --> 00:05:00,360
And so they can basically run
around in these troll cars,

117
00:05:00,360 --> 00:05:02,420
and they can get all of
these cell phone numbers,

118
00:05:02,420 --> 00:05:06,600
and then use that car to
send spam messages directly

119
00:05:06,600 --> 00:05:09,190
to the numbers that
they've collected using

120
00:05:09,190 --> 00:05:12,040
this sort of vehicle attack.

121
00:05:12,040 --> 00:05:13,850
So these text message
cars can actually

122
00:05:13,850 --> 00:05:21,440
send upward of 200,000
messages a day,

123
00:05:21,440 --> 00:05:23,100
which is an incredibly
high number.

124
00:05:23,100 --> 00:05:25,630
And the cost of labor over
there is actually very cheap.

125
00:05:25,630 --> 00:05:28,134
So it's very inexpensive
to hire a driver,

126
00:05:28,134 --> 00:05:29,800
drive around one of
these cars, and just

127
00:05:29,800 --> 00:05:32,070
snoop on people's traffic
and send them spam.

128
00:05:32,070 --> 00:05:33,970
So let's look at the
economics of this.

129
00:05:33,970 --> 00:05:40,530
So what is the cost
of the evil antenna,

130
00:05:40,530 --> 00:05:43,350
this thing that
allows people to take

131
00:05:43,350 --> 00:05:45,630
these messages off the air?

132
00:05:45,630 --> 00:05:50,530
Roughly speaking, it's
somewhere in the order of about

133
00:05:50,530 --> 00:05:53,790
1600 bucks, give or take.

134
00:05:53,790 --> 00:05:59,760
So how much profit can
these people make a day?

135
00:05:59,760 --> 00:06:01,470
So in a hilarious
coincidence, this

136
00:06:01,470 --> 00:06:06,074
is also roughly 1600 dollars.

137
00:06:06,074 --> 00:06:07,240
So this is very interesting.

138
00:06:07,240 --> 00:06:10,230
What this means is that once
you buy one of these things,

139
00:06:10,230 --> 00:06:12,872
then in a day essentially
you've made back your money.

140
00:06:12,872 --> 00:06:16,260
So that's great, from the
perspective of being a spammer.

141
00:06:16,260 --> 00:06:18,835
Now you might say, OK, but you
might get caught by the police

142
00:06:18,835 --> 00:06:21,210
and then you might get put in
jail or have to pay a fine.

143
00:06:21,210 --> 00:06:29,650
So in the case of the fines,
the fines for getting caught

144
00:06:29,650 --> 00:06:32,100
are less than 5K.

145
00:06:35,220 --> 00:06:37,810
And people rarely get caught.

146
00:06:37,810 --> 00:06:40,215
And so these are the
types of calculations

147
00:06:40,215 --> 00:06:41,715
we have to look at
when we're trying

148
00:06:41,715 --> 00:06:44,360
to think about how
to economically deter

149
00:06:44,360 --> 00:06:45,620
these spammers.

150
00:06:45,620 --> 00:06:47,060
Because if these
spammers only get

151
00:06:47,060 --> 00:06:49,870
caught a couple times a
year, and they basically

152
00:06:49,870 --> 00:06:52,570
make back their hardware
costs in a single day,

153
00:06:52,570 --> 00:06:54,360
it's very tricky
to figure out how

154
00:06:54,360 --> 00:06:56,605
we can use financial
disincentives to make them

155
00:06:56,605 --> 00:06:58,330
stop doing this kind of stuff.

156
00:06:58,330 --> 00:07:02,790
And what's interesting is that
in China the mobile carriers

157
00:07:02,790 --> 00:07:05,160
are also somewhat
complicit in this scheme.

158
00:07:05,160 --> 00:07:06,740
So every time you
send a spam, you're

159
00:07:06,740 --> 00:07:09,540
going to send some small amount
of money to the mobile carrier,

160
00:07:09,540 --> 00:07:09,720
right?

161
00:07:09,720 --> 00:07:10,420
A couple cents.

162
00:07:10,420 --> 00:07:11,970
It works that way
over here as well.

163
00:07:11,970 --> 00:07:14,280
Now over here and in
Europe in many cases,

164
00:07:14,280 --> 00:07:16,450
the mobile carriers
have decided that they

165
00:07:16,450 --> 00:07:18,610
don't want angry customers
contacting them saying,

166
00:07:18,610 --> 00:07:20,970
I'm getting hit by these
spam messages all the time.

167
00:07:20,970 --> 00:07:23,410
But apparently a lot of the
Chinese mobile carriers,

168
00:07:23,410 --> 00:07:24,910
at least the top
three ones, they're

169
00:07:24,910 --> 00:07:26,780
actually seeing
these spam messages

170
00:07:26,780 --> 00:07:29,070
as a source of revenue.

171
00:07:29,070 --> 00:07:31,970
They actually think this
is a nice way for them

172
00:07:31,970 --> 00:07:32,950
to get some free money.

173
00:07:32,950 --> 00:07:36,810
So in fact these telcos
have set up these things

174
00:07:36,810 --> 00:07:41,414
called 106 prefix numbers.

175
00:07:41,414 --> 00:07:44,190
I don't know if you've
heard of these before.

176
00:07:44,190 --> 00:07:44,849
[BANGING]

177
00:07:44,849 --> 00:07:48,521
But the original-- there's
apparently a ghost in the room.

178
00:07:48,521 --> 00:07:50,810
The original purpose
of these numbers

179
00:07:50,810 --> 00:07:53,710
was to do things for
non-commercial reasons.

180
00:07:53,710 --> 00:07:56,180
For example, imagine
that you run a company,

181
00:07:56,180 --> 00:07:58,120
and you want to send a
bunch of text messages

182
00:07:58,120 --> 00:07:59,540
to all of your employees.

183
00:07:59,540 --> 00:08:02,205
You can use one of
these 106 numbers,

184
00:08:02,205 --> 00:08:05,730
and you would basically be
able to send things in bulk.

185
00:08:05,730 --> 00:08:08,510
You'd be able to avoid some
of the built-in rate-limiting

186
00:08:08,510 --> 00:08:10,840
mechanisms they had
in the cell network.

187
00:08:10,840 --> 00:08:12,630
So there's this nice
thing sitting around

188
00:08:12,630 --> 00:08:14,820
that spammers can actually use.

189
00:08:14,820 --> 00:08:16,800
And so as it turns
out, I think it's

190
00:08:16,800 --> 00:08:26,050
something like 55% of the mobile
spam that gets sent in China

191
00:08:26,050 --> 00:08:30,620
comes from one of
these 106 numbers.

192
00:08:30,620 --> 00:08:32,900
So this is a really
interesting case study

193
00:08:32,900 --> 00:08:36,180
of how these financial
numbers work out,

194
00:08:36,180 --> 00:08:37,710
and how sometimes
you can actually

195
00:08:37,710 --> 00:08:41,630
have these sort of perverse
incentives, where in this case

196
00:08:41,630 --> 00:08:44,160
the cellphone carriers
are just going along

197
00:08:44,160 --> 00:08:47,223
with these scams
and these schemes.

198
00:08:47,223 --> 00:08:49,056
And there'll be a link
in the lecture notes.

199
00:08:49,056 --> 00:08:50,968
There's an interesting
Economist article about this.

200
00:08:50,968 --> 00:08:51,759
[BANGING CONTINUES]

201
00:08:51,759 --> 00:08:55,800
There is like a pan-African
drum circle back there.

202
00:08:55,800 --> 00:08:57,450
This is super exciting, though.

203
00:08:57,450 --> 00:08:57,950
I like it.

204
00:08:57,950 --> 00:08:59,430
I am being
adversarially attacked.

205
00:08:59,430 --> 00:09:00,383
That's OK.

206
00:09:00,383 --> 00:09:02,130
We will play through the pain.

207
00:09:02,130 --> 00:09:03,780
Perhaps this is the Mossad.

208
00:09:03,780 --> 00:09:06,920
They don't want me to
talk about Stuxnet.

209
00:09:06,920 --> 00:09:09,040
Another interesting
thing about security

210
00:09:09,040 --> 00:09:12,400
is that there are actually
many companies that

211
00:09:12,400 --> 00:09:14,470
deal in cyber arms.

212
00:09:14,470 --> 00:09:17,802
So this is kind of
something out of G.I. Joe,

213
00:09:17,802 --> 00:09:20,260
but there are actually these
companies that will sit around

214
00:09:20,260 --> 00:09:22,976
and they will actually
sell you malware,

215
00:09:22,976 --> 00:09:24,350
they will sell
you exploits, they

216
00:09:24,350 --> 00:09:26,300
will sell you things like this.

217
00:09:26,300 --> 00:09:34,430
So one example is this
company that's called Endgame.

218
00:09:34,430 --> 00:09:42,210
And so for example for
about $1.5 million,

219
00:09:42,210 --> 00:09:45,940
Endgame will give
you IP addresses

220
00:09:45,940 --> 00:09:53,195
and the physical locations of
millions of unpatched machines.

221
00:09:57,460 --> 00:10:00,450
So they have sort of vantage
points all over the internet,

222
00:10:00,450 --> 00:10:02,620
and they know all kinds
of interesting information

223
00:10:02,620 --> 00:10:04,690
about machines that
you may or may not

224
00:10:04,690 --> 00:10:07,690
want to attack if, for
example, you're a government,

225
00:10:07,690 --> 00:10:09,890
or if you're another agency
or something like that.

226
00:10:09,890 --> 00:10:15,650
For about $2.5 million,
they will give you

227
00:10:15,650 --> 00:10:22,990
what is delightfully called a
zero-day subscription package.

228
00:10:22,990 --> 00:10:28,170
And so if you sign up for
this, then basically you

229
00:10:28,170 --> 00:10:30,800
will get 25 exploits
a year, they

230
00:10:30,800 --> 00:10:33,130
claim, for that much money.

231
00:10:33,130 --> 00:10:36,060
And so you'll get those exploits
in your inbox or whatever.

232
00:10:36,060 --> 00:10:39,880
Once again, you can do with
these things whatever you want.

233
00:10:39,880 --> 00:10:41,565
You've clearly got
2.5 million dollars,

234
00:10:41,565 --> 00:10:43,660
so you've got a lot of spare
time to think about this stuff,

235
00:10:43,660 --> 00:10:44,320
presumably.

236
00:10:44,320 --> 00:10:46,240
And so what's
interesting is that a lot

237
00:10:46,240 --> 00:10:48,420
of people who work in
these cyber arms dealers,

238
00:10:48,420 --> 00:10:50,850
they're actually ex
three-letter agencies.

239
00:10:50,850 --> 00:10:53,662
They're ex-CIA, or ex-NSA,
or things like this.

240
00:10:53,662 --> 00:10:55,120
It's interesting
to think about who

241
00:10:55,120 --> 00:10:57,867
are the actual customers of
these cyber arms dealers.

242
00:10:57,867 --> 00:10:59,450
Some of them are
actually governments,

243
00:10:59,450 --> 00:11:01,199
like the American
government, for example.

244
00:11:01,199 --> 00:11:03,310
And they use these things
to attack other nations,

245
00:11:03,310 --> 00:11:04,070
or whatever.

246
00:11:04,070 --> 00:11:06,337
But some of the people
who buy this stuff

247
00:11:06,337 --> 00:11:07,920
are actually,
increasingly, companies.

248
00:11:07,920 --> 00:11:09,670
So one thing we'll
talk about a little bit

249
00:11:09,670 --> 00:11:12,260
at the end of the lecture is
how sometimes companies are now

250
00:11:12,260 --> 00:11:13,968
taking cybersecurity
into their own hands

251
00:11:13,968 --> 00:11:17,000
and sometimes doing
what's called hackbacks.

252
00:11:17,000 --> 00:11:19,026
So without getting the
government involved,

253
00:11:19,026 --> 00:11:20,900
companies that are
attacked by cybercriminals

254
00:11:20,900 --> 00:11:22,680
will sometimes go
back and explicitly

255
00:11:22,680 --> 00:11:24,810
try to take out people
who tried to steal

256
00:11:24,810 --> 00:11:26,070
their intellectual property.

257
00:11:26,070 --> 00:11:28,430
And they've used some very
inventive legal arguments

258
00:11:28,430 --> 00:11:30,140
to justify this, and
so far it's actually

259
00:11:30,140 --> 00:11:31,098
been fairly successful.

260
00:11:31,098 --> 00:11:33,395
So this is an interesting
aspect of cyber warfare.

261
00:11:33,395 --> 00:11:35,135
AUDIENCE: How is
any of that legal?

262
00:11:38,910 --> 00:11:39,910
PROFESSOR: Well, so.

263
00:11:39,910 --> 00:11:42,181
I mean, information
wants to be free, dude.

264
00:11:42,181 --> 00:11:42,680
Right?

265
00:11:42,680 --> 00:11:46,910
So if you think about stuff
like this, for example.

266
00:11:46,910 --> 00:11:49,895
Just telling you stuff
isn't necessarily illegal.

267
00:11:49,895 --> 00:11:52,020
I mean, it gets a
little bit gray.

268
00:11:52,020 --> 00:11:54,860
But for example, if I tell
you that look over there,

269
00:11:54,860 --> 00:11:59,550
there's a house, and the lock
doesn't work on that door.

270
00:11:59,550 --> 00:12:00,730
Can I have 20 bucks?

271
00:12:00,730 --> 00:12:02,540
That's not necessarily illegal.

272
00:12:02,540 --> 00:12:04,730
Because as it turns
out, these companies

273
00:12:04,730 --> 00:12:06,880
have, like, hordes
of lawyers that

274
00:12:06,880 --> 00:12:08,880
look into things like this.

275
00:12:08,880 --> 00:12:10,654
But in many cases, if
you think about it,

276
00:12:10,654 --> 00:12:12,320
you can search for
stuff on the internet

277
00:12:12,320 --> 00:12:14,986
and go to websites that tell you
things like how to build bombs,

278
00:12:14,986 --> 00:12:16,460
for example.

279
00:12:16,460 --> 00:12:19,170
Just posting that
information typically

280
00:12:19,170 --> 00:12:21,392
is not illegal, because
you're just learning.

281
00:12:21,392 --> 00:12:22,850
What if I'm a
chemist, for example?

282
00:12:22,850 --> 00:12:24,680
Or something like this.

283
00:12:24,680 --> 00:12:27,200
So a lot of times, just
giving someone knowledge

284
00:12:27,200 --> 00:12:29,045
is not necessarily illegal.

285
00:12:29,045 --> 00:12:31,290
But you're right that
there's some gray areas here,

286
00:12:31,290 --> 00:12:34,220
and as we'll talk about with
some of these hackbacks,

287
00:12:34,220 --> 00:12:35,250
it's not always clear.

288
00:12:35,250 --> 00:12:38,730
For example, if I am a bank, I'm
not a government, I'm a bank.

289
00:12:38,730 --> 00:12:39,500
I get hacked.

290
00:12:39,500 --> 00:12:40,600
It's not always
clear that I actually

291
00:12:40,600 --> 00:12:42,058
have the legal
authority to go back

292
00:12:42,058 --> 00:12:44,690
and, let's say, try to shut down
a botnet or things like that.

293
00:12:44,690 --> 00:12:46,680
Companies have done
stuff like that.

294
00:12:46,680 --> 00:12:50,670
But I think this is an
example where the law is

295
00:12:50,670 --> 00:12:54,610
lagging behind practice.

296
00:12:54,610 --> 00:12:56,170
And so people have
used things like,

297
00:12:56,170 --> 00:12:57,970
we will use copyright
infringement law

298
00:12:57,970 --> 00:12:59,880
to attack botnets as a company.

299
00:12:59,880 --> 00:13:02,260
Because they're selling
legal goods of ours,

300
00:13:02,260 --> 00:13:04,470
so we'll use IP infringement.

301
00:13:04,470 --> 00:13:06,470
Like, this is probably
not what Thomas Jefferson

302
00:13:06,470 --> 00:13:07,845
was thinking when
he was thinking

303
00:13:07,845 --> 00:13:09,360
about how these laws work.

304
00:13:09,360 --> 00:13:11,370
So this is a little bit
of a cat-and-mouse game.

305
00:13:11,370 --> 00:13:15,650
So we'll do a little bit of
that later in the lecture.

306
00:13:15,650 --> 00:13:17,940
So, yes, this is
very interesting.

307
00:13:17,940 --> 00:13:21,130
Basically what this all
means is that there's

308
00:13:21,130 --> 00:13:28,760
this marketplace for all kinds
of computational resources

309
00:13:28,760 --> 00:13:32,500
that you might use as someone
who wants to launch attacks.

310
00:13:32,500 --> 00:13:34,070
So for example,
there's a marketplace

311
00:13:34,070 --> 00:13:39,270
for compromised systems.

312
00:13:39,270 --> 00:13:40,980
So, for example, you
can go to the darker

313
00:13:40,980 --> 00:13:43,820
places of the internet,
you can purchase

314
00:13:43,820 --> 00:13:47,460
entire compromised machines
that might be part of a botnet.

315
00:13:47,460 --> 00:13:51,130
You can actually buy access
to a compromised website,

316
00:13:51,130 --> 00:13:52,070
for example.

317
00:13:52,070 --> 00:13:55,130
You might use that website
to post spam, or put up

318
00:13:55,130 --> 00:13:57,630
evil links, or things like that.

319
00:13:57,630 --> 00:14:00,810
You can also get access to
compromised email accounts,

320
00:14:00,810 --> 00:14:02,265
like Gmail or Yahoo accounts.

321
00:14:02,265 --> 00:14:03,640
As we'll talk about
later, those things

322
00:14:03,640 --> 00:14:05,726
are very very powerful
for an attacker.

323
00:14:05,726 --> 00:14:08,190
And you may also just buy
sort of a subscription

324
00:14:08,190 --> 00:14:09,472
service for a botnet.

325
00:14:09,472 --> 00:14:11,180
You'll just have this
thing lying around.

326
00:14:11,180 --> 00:14:13,280
You can use it to send denial
of service attacks or things

327
00:14:13,280 --> 00:14:13,780
like that.

328
00:14:13,780 --> 00:14:15,350
So there's a
marketplace for that.

329
00:14:15,350 --> 00:14:18,650
There's a marketplace for tools.

330
00:14:18,650 --> 00:14:22,170
So you can get, as an attacker,
off-the-shelf malware kits,

331
00:14:22,170 --> 00:14:23,470
for example.

332
00:14:23,470 --> 00:14:26,370
You can use perhaps
arms dealers like this

333
00:14:26,370 --> 00:14:27,893
to get access to
zero-day exploits

334
00:14:27,893 --> 00:14:30,510
so you can write your own
malware, so on and so forth.

335
00:14:30,510 --> 00:14:32,620
And there's also
a big marketplace

336
00:14:32,620 --> 00:14:38,150
for stolen user information.

337
00:14:38,150 --> 00:14:40,480
So this is stuff like
Social Security numbers,

338
00:14:40,480 --> 00:14:44,040
credit card numbers, email
addresses, so on and so forth.

339
00:14:44,040 --> 00:14:45,710
So it's all out
there on the internet

340
00:14:45,710 --> 00:14:47,717
if you're just willing
to look for it.

341
00:14:47,717 --> 00:14:49,350
And so the paper
that we're going

342
00:14:49,350 --> 00:14:52,550
to look at today
basically focused

343
00:14:52,550 --> 00:14:56,990
on one aspect of this,
which is the spam ecosystem.

344
00:15:00,110 --> 00:15:02,020
And so in particular,
they look at the sale

345
00:15:02,020 --> 00:15:06,850
of pharmaceuticals, of
knockoff goods, and software.

346
00:15:06,850 --> 00:15:09,420
And so they basically
break this spam ecosystem

347
00:15:09,420 --> 00:15:11,100
into three parts.

348
00:15:11,100 --> 00:15:15,230
They break it into advertising.

349
00:15:15,230 --> 00:15:18,020
So this is the
process of somehow

350
00:15:18,020 --> 00:15:22,570
getting a user to click
on a spam link somehow.

351
00:15:22,570 --> 00:15:25,300
And then once they've
done that, there's

352
00:15:25,300 --> 00:15:29,890
this issue of click support.

353
00:15:29,890 --> 00:15:33,665
So this is the notion that
once the user clicks the link,

354
00:15:33,665 --> 00:15:36,165
there has to be some type of
web server, DNS infrastructure,

355
00:15:36,165 --> 00:15:38,220
so on and so forth
on the back end that

356
00:15:38,220 --> 00:15:40,790
actually presents the spam
website that the user goes to.

357
00:15:40,790 --> 00:15:43,076
And then the final
part is realization.

358
00:15:45,820 --> 00:15:48,910
So this is actually
allowing the user

359
00:15:48,910 --> 00:15:51,650
to say they want
to buy something.

360
00:15:51,650 --> 00:15:53,950
The user sends money
to the spammers,

361
00:15:53,950 --> 00:15:57,230
and the user's going to get some
product back in the back end.

362
00:15:57,230 --> 00:16:01,450
And so this is where all
of the money gets made.

363
00:16:01,450 --> 00:16:04,160
And so a lot of this
stuff is actually

364
00:16:04,160 --> 00:16:10,070
outsourced to what the paper
calls affiliate programs.

365
00:16:13,050 --> 00:16:15,650
And so you can think of
these affiliate programs

366
00:16:15,650 --> 00:16:20,030
as essentially doing a
lot of the back-end grunt

367
00:16:20,030 --> 00:16:23,130
work of talking to banks
and Visa and MasterCard

368
00:16:23,130 --> 00:16:24,200
and things like this.

369
00:16:24,200 --> 00:16:26,044
And so a lot of
times, the spammers,

370
00:16:26,044 --> 00:16:27,710
they don't want to
deal with that stuff.

371
00:16:27,710 --> 00:16:29,640
They just want to
create the links

372
00:16:29,640 --> 00:16:32,520
and do-- you can think of it
as the advertising component.

373
00:16:32,520 --> 00:16:34,230
And so a lot of
times the spammers

374
00:16:34,230 --> 00:16:37,920
themselves, they will
work on a commission.

375
00:16:37,920 --> 00:16:42,340
So they will get, let's
say, anywhere between 30%

376
00:16:42,340 --> 00:16:49,890
and maybe 50% of the final
sale that they deliver to one

377
00:16:49,890 --> 00:16:52,670
of these back-end affiliates.

378
00:16:52,670 --> 00:16:55,541
So does that all make
sense at a high level?

379
00:16:55,541 --> 00:16:56,040
OK.

380
00:16:56,040 --> 00:17:02,570
So what we'll do is we'll look
at each component of this spam

381
00:17:02,570 --> 00:17:05,230
trajectory, and then see how
it works, and then maybe think

382
00:17:05,230 --> 00:17:07,505
about how we'd be able
to shut down spammers

383
00:17:07,505 --> 00:17:11,540
at different levels
of this [INAUDIBLE].

384
00:17:11,540 --> 00:17:14,609
So the first thing we'll look
at is the advertising component.

385
00:17:18,992 --> 00:17:21,450
And so, like I mentioned, the
basic idea of the advertising

386
00:17:21,450 --> 00:17:29,440
is, how do you get the
user to click on a link?

387
00:17:34,180 --> 00:17:36,630
That's the primary question
we'll be concerned with here.

388
00:17:36,630 --> 00:17:39,320
And so the typical
thing, as we all know,

389
00:17:39,320 --> 00:17:42,457
is you're going to use email
spam, although as we discussed

390
00:17:42,457 --> 00:17:43,915
at the beginning
of lecture, people

391
00:17:43,915 --> 00:17:45,670
are starting to use
text messages and some

392
00:17:45,670 --> 00:17:48,890
of these other forms
of communication.

393
00:17:48,890 --> 00:17:50,760
You could also imagine
maybe here we're

394
00:17:50,760 --> 00:17:53,305
going to start using
social networks as well.

395
00:17:53,305 --> 00:17:54,763
So now when you go
to Facebook, not

396
00:17:54,763 --> 00:17:56,929
only are you polluted by
your real friends' content,

397
00:17:56,929 --> 00:17:58,940
you're also polluted
by spam messages too.

398
00:17:58,940 --> 00:18:03,390
So this is about
economics, this discussion.

399
00:18:03,390 --> 00:18:05,190
So one interesting
question is, how much

400
00:18:05,190 --> 00:18:08,350
does it cost to actually
send out these spam messages.

401
00:18:08,350 --> 00:18:12,250
And so as it turns out, it's
not very expensive at all.

402
00:18:12,250 --> 00:18:18,454
For about 60 bucks, you can
send a million spam messages.

403
00:18:21,150 --> 00:18:23,760
So that's a super,
super low cost.

404
00:18:23,760 --> 00:18:26,190
And this cost is
actually much lower

405
00:18:26,190 --> 00:18:28,220
if you're directly
operating a botnet.

406
00:18:28,220 --> 00:18:29,990
You can cut out the middleman.

407
00:18:29,990 --> 00:18:32,570
But even if you are
renting one of the botnets

408
00:18:32,570 --> 00:18:35,890
from one of these marketplaces,
this is still super, super low.

409
00:18:35,890 --> 00:18:38,154
AUDIENCE: So how many of
those are actually effective?

410
00:18:38,154 --> 00:18:40,072
As in, they don't get filtered?

411
00:18:40,072 --> 00:18:41,780
PROFESSOR: Ah, so
that's a good question.

412
00:18:41,780 --> 00:18:44,300
So that leads to my next point.

413
00:18:44,300 --> 00:18:46,299
So you're sending
a million spams,

414
00:18:46,299 --> 00:18:47,840
but then they're
going to get dropped

415
00:18:47,840 --> 00:18:49,174
at various points along the way.

416
00:18:49,174 --> 00:18:51,006
They're going to get
caught in spam filters,

417
00:18:51,006 --> 00:18:53,048
people will-- they see it
but they just delete it

418
00:18:53,048 --> 00:18:55,005
because they know that
an email that has, like,

419
00:18:55,005 --> 00:18:56,700
18 dollar signs should
just be deleted.

420
00:18:56,700 --> 00:18:58,940
So if you look at
the conversion rate,

421
00:18:58,940 --> 00:19:00,870
you'll see that the
click rates are actually

422
00:19:00,870 --> 00:19:04,320
very low because of
things like spam filters

423
00:19:04,320 --> 00:19:05,290
and stuff like that.

424
00:19:05,290 --> 00:19:10,200
And also many users are
trained to avoid these things.

425
00:19:10,200 --> 00:19:11,950
Click rates are low.

426
00:19:11,950 --> 00:19:15,170
And this is why
sending spam has to be

427
00:19:15,170 --> 00:19:18,800
super, super cheap,
because you will not

428
00:19:18,800 --> 00:19:20,040
get a lot of conversions.

429
00:19:20,040 --> 00:19:21,850
So for example, there have been
some empirical studies that

430
00:19:21,850 --> 00:19:23,016
looked at these click rates.

431
00:19:23,016 --> 00:19:31,030
And one study found that they
looked at 350 million spam

432
00:19:31,030 --> 00:19:34,650
messages, and they
found that out

433
00:19:34,650 --> 00:19:37,650
of those 350 million
messages, there

434
00:19:37,650 --> 00:19:44,960
was only about 10,000
clicks on those messages.

435
00:19:44,960 --> 00:19:46,710
So there's a massive
dropoff here.

436
00:19:46,710 --> 00:19:49,750
And then out of
these 10,000 clicks

437
00:19:49,750 --> 00:19:52,567
there were only 28
purchase attempts.

438
00:19:55,680 --> 00:19:58,430
So that's super, super low.

439
00:19:58,430 --> 00:20:01,010
And so that's why it's
extremely important

440
00:20:01,010 --> 00:20:04,275
for this entire ecosystem to be
very cheap from the perspective

441
00:20:04,275 --> 00:20:04,820
of a spammer.

442
00:20:04,820 --> 00:20:06,653
Because I mean, look
at these dropoffs here.

443
00:20:06,653 --> 00:20:08,780
These are multiple
orders of magnitude.

444
00:20:08,780 --> 00:20:13,636
And so that's why one might
hope that at least in theory we

445
00:20:13,636 --> 00:20:15,010
could squeeze--
like for example,

446
00:20:15,010 --> 00:20:17,880
we could drive this
number up maybe just $10.

447
00:20:17,880 --> 00:20:20,280
Maybe that has some
catastrophic knock-on effect

448
00:20:20,280 --> 00:20:22,440
on how profitable this stuff is.

449
00:20:22,440 --> 00:20:24,995
So it's very important
for the spammers

450
00:20:24,995 --> 00:20:26,880
that everything be
as cheap as possible.

451
00:20:26,880 --> 00:20:28,848
AUDIENCE: So those
10,000 clicks.

452
00:20:28,848 --> 00:20:33,768
Again, how many of
those 350 million emails

453
00:20:33,768 --> 00:20:35,911
were filtered out of the inbox?

454
00:20:35,911 --> 00:20:39,854
I'm just trying to get a sense
of out of how many emails

455
00:20:39,854 --> 00:20:41,270
those clicks were
out of, to gauge

456
00:20:41,270 --> 00:20:45,577
how effective spam filtering is
versus how silly us humans are.

457
00:20:45,577 --> 00:20:47,410
PROFESSOR: Yeah, that
I'm not actually sure.

458
00:20:47,410 --> 00:20:49,960
That's a good question.

459
00:20:49,960 --> 00:20:52,870
AUDIENCE: So I was just
listening to a talk

460
00:20:52,870 --> 00:20:55,490
by Geoff Voelker on
Friday about this stuff,

461
00:20:55,490 --> 00:20:59,350
and he says that on
the order of 20% to 40%

462
00:20:59,350 --> 00:21:02,990
of clicks going to one of
these websites actually

463
00:21:02,990 --> 00:21:04,425
come from a user's spam folder.

464
00:21:04,425 --> 00:21:07,363
So users go in their spam
folder, looking for this stuff,

465
00:21:07,363 --> 00:21:08,238
and they click on it.

466
00:21:08,238 --> 00:21:10,070
So presumably there's
a class of customers

467
00:21:10,070 --> 00:21:11,842
that are looking for
this, and if they're

468
00:21:11,842 --> 00:21:14,300
looking for it-- oh, yeah, I'll
just go into my spam folder

469
00:21:14,300 --> 00:21:15,340
to find this.

470
00:21:15,340 --> 00:21:17,850
So it's not clear that things
going into spam folders

471
00:21:17,850 --> 00:21:19,324
are getting zero clicks.

472
00:21:19,324 --> 00:21:21,740
PROFESSOR: Yeah, I've heard
anecdotal reports of that too.

473
00:21:21,740 --> 00:21:24,900
Some people, even for
legitimate emails,

474
00:21:24,900 --> 00:21:26,980
they'll mark it as spam
just so that if there's

475
00:21:26,980 --> 00:21:29,512
a shoulder-surfer,
like at work, who's

476
00:21:29,512 --> 00:21:30,970
seeing them go to
Gmail, let's say,

477
00:21:30,970 --> 00:21:33,440
they won't come and see that
you've subscribed to, you know,

478
00:21:33,440 --> 00:21:33,940
whatever.

479
00:21:33,940 --> 00:21:35,950
And then they can secretly
go into the spam folder,

480
00:21:35,950 --> 00:21:37,910
they know it's not deleted,
and look at this stuff.

481
00:21:37,910 --> 00:21:38,890
This is actually a
really interesting point.

482
00:21:38,890 --> 00:21:41,020
There's this whole
psychology of who

483
00:21:41,020 --> 00:21:42,804
it is that actually
clicks on these links.

484
00:21:42,804 --> 00:21:45,470
And so I think one of the papers
that I linked to in the lecture

485
00:21:45,470 --> 00:21:49,830
notes talks about why these
Nigerian scams still work.

486
00:21:49,830 --> 00:21:52,810
Because you'd think that
anyone who basically

487
00:21:52,810 --> 00:21:54,440
has either common
sense themselves,

488
00:21:54,440 --> 00:21:56,270
or a friend who
has common sense,

489
00:21:56,270 --> 00:21:59,120
would never click on one of
these Nigerian email scams.

490
00:21:59,120 --> 00:21:59,620
Right?

491
00:21:59,620 --> 00:22:04,470
But it turns out that the
Nigerian meme is actually

492
00:22:04,470 --> 00:22:08,450
useful for spammers
to filter out idiots.

493
00:22:08,450 --> 00:22:12,260
In other words, if you are so
foolish that you would still

494
00:22:12,260 --> 00:22:15,210
click on a Nigerian
email, then oh, OK, you're

495
00:22:15,210 --> 00:22:19,230
going to do one of these
conversion things here.

496
00:22:19,230 --> 00:22:21,540
When you think about it,
that's one of the key things

497
00:22:21,540 --> 00:22:22,350
that spammers need.

498
00:22:22,350 --> 00:22:24,370
They need people
who are gullible

499
00:22:24,370 --> 00:22:28,370
enough or idealistic enough to
click through on these things.

500
00:22:28,370 --> 00:22:31,490
There's a whole sort of
psychology behind this.

501
00:22:31,490 --> 00:22:32,833
It's very interesting.

502
00:22:32,833 --> 00:22:36,037
AUDIENCE: So each of these
purchases, about how much

503
00:22:36,037 --> 00:22:37,254
are they worth?

504
00:22:37,254 --> 00:22:38,670
PROFESSOR: That's
a good question.

505
00:22:38,670 --> 00:22:41,560
So it actually depends
on the type of thing

506
00:22:41,560 --> 00:22:42,850
that you're looking at.

507
00:22:42,850 --> 00:22:45,930
A lot of these purchases are not
actually super high in value.

508
00:22:45,930 --> 00:22:48,500
So you're thinking that
someone's buying herbal Viagra

509
00:22:48,500 --> 00:22:50,530
or they're buying like
a knockoff Windows

510
00:22:50,530 --> 00:22:51,870
license or things like that.

511
00:22:51,870 --> 00:22:54,453
And in fact, a lot of times when
they're buying these knockoff

512
00:22:54,453 --> 00:22:55,924
products, presumably
the price is

513
00:22:55,924 --> 00:22:58,215
lower than what they'd actually
get in the real market,

514
00:22:58,215 --> 00:23:00,510
because otherwise you could
just go down to your local mall

515
00:23:00,510 --> 00:23:01,480
and buy these things.

516
00:23:01,480 --> 00:23:03,521
So a lot of times these
purchases you're actually

517
00:23:03,521 --> 00:23:05,850
making are less
than 1,000 dollars,

518
00:23:05,850 --> 00:23:09,205
and oftentimes a
lot less than that.

519
00:23:09,205 --> 00:23:11,310
Any other questions?

520
00:23:11,310 --> 00:23:12,180
OK.

521
00:23:12,180 --> 00:23:14,515
So these conversion rates
are super, super low.

522
00:23:14,515 --> 00:23:16,430
So like I said, one
of the key things

523
00:23:16,430 --> 00:23:22,680
to do as a defender is to
try to basically make spam

524
00:23:22,680 --> 00:23:29,380
more expensive for the spammer.

525
00:23:29,380 --> 00:23:31,020
So there's a couple
different ways

526
00:23:31,020 --> 00:23:32,920
you might think
about doing that.

527
00:23:32,920 --> 00:23:40,170
One way you might think about
doing that is IP blacklists.

528
00:23:40,170 --> 00:23:43,540
So maybe ISPs or
someone else basically

529
00:23:43,540 --> 00:23:45,545
collects this list
of IPs that are

530
00:23:45,545 --> 00:23:48,125
known to be bad, that are
known to come from spammers.

531
00:23:48,125 --> 00:23:51,630
And then we just don't let
these people send traffic.

532
00:23:51,630 --> 00:23:54,430
So this kinda-sorta used
to work for a while.

533
00:23:54,430 --> 00:23:58,470
But now it's so much
easier for the attackers

534
00:23:58,470 --> 00:24:00,756
to use techniques like
DNS redirection and stuff

535
00:24:00,756 --> 00:24:02,260
like that, that we'll talk
about in a little bit,

536
00:24:02,260 --> 00:24:04,200
this doesn't actually
work out very well.

537
00:24:04,200 --> 00:24:06,420
Because now there's a much
larger set of addresses

538
00:24:06,420 --> 00:24:08,890
that spammers can
send spam from,

539
00:24:08,890 --> 00:24:10,970
and they can also
dynamically switch

540
00:24:10,970 --> 00:24:15,480
the binding between
hostnames and web servers

541
00:24:15,480 --> 00:24:18,250
and all these types of things. So
this doesn't work out so well.

542
00:24:18,250 --> 00:24:20,760
Another idea that's been
around for a long time

543
00:24:20,760 --> 00:24:27,600
is charging for email in some
way, so each email you send,

544
00:24:27,600 --> 00:24:30,840
you have to pay
some micropayment.

545
00:24:30,840 --> 00:24:33,024
So that currency could be
a couple different things.

546
00:24:33,024 --> 00:24:34,565
So you might imagine
that if I wanted

547
00:24:34,565 --> 00:24:36,360
to send you an email,
maybe I'd have to pay

548
00:24:36,360 --> 00:24:38,390
a tenth of a tenth of a penny.

549
00:24:38,390 --> 00:24:41,174
And that's no big deal
for me, because I don't

550
00:24:41,174 --> 00:24:42,340
send that many emails a day.

551
00:24:42,340 --> 00:24:44,798
But if you're a spammer trying
to operate at these volumes,

552
00:24:44,798 --> 00:24:46,000
then that quickly adds up.

553
00:24:46,000 --> 00:24:48,360
That destroys their value chain.

554
00:24:48,360 --> 00:24:49,945
Another idea that
people have had

555
00:24:49,945 --> 00:24:53,590
is, what if you used
computation as a currency?

556
00:24:53,590 --> 00:24:55,740
This is the idea
that before my email

557
00:24:55,740 --> 00:24:57,370
server will accept
an email from me,

558
00:24:57,370 --> 00:24:58,842
I have to solve some puzzle.

559
00:24:58,842 --> 00:25:01,680
I have to do some math trick,
or something like that.

560
00:25:01,680 --> 00:25:03,840
Once again, that
cuts down the rate

561
00:25:03,840 --> 00:25:07,642
at which these bulk
mailers can send messages.

562
00:25:07,642 --> 00:25:10,215
Also, we're all familiar
with CAPTCHAs, too.

563
00:25:10,215 --> 00:25:11,590
This is basically
the idea that I

564
00:25:11,590 --> 00:25:14,750
have to look at some
picture of nine animals

565
00:25:14,750 --> 00:25:16,260
and find the cat
instead of the dog,

566
00:25:16,260 --> 00:25:18,074
or type in some weird
squiggly number that

567
00:25:18,074 --> 00:25:19,990
looks like a migraine,
or something like that.

568
00:25:19,990 --> 00:25:24,280
So there have been
all kinds of ideas

569
00:25:24,280 --> 00:25:26,772
for charging for email to
stop this kind of stuff

570
00:25:26,772 --> 00:25:28,680
from happening.

571
00:25:28,680 --> 00:25:31,180
One of the classic problems,
though, with all these schemes,

572
00:25:31,180 --> 00:25:35,120
is who's going to be the
first one to implement it.

573
00:25:35,120 --> 00:25:37,172
And if all the email
providers don't move forward

574
00:25:37,172 --> 00:25:38,880
at the same time, then
of course spammers

575
00:25:38,880 --> 00:25:41,088
are just going to migrate
to the email providers that

576
00:25:41,088 --> 00:25:42,682
don't require these techniques.

577
00:25:42,682 --> 00:25:44,890
So there's been the problem
of how do we get everyone

578
00:25:44,890 --> 00:25:47,010
to upgrade en masse.

579
00:25:47,010 --> 00:25:48,930
And there's this
issue of, well, what

580
00:25:48,930 --> 00:25:52,360
would happen if a user
device is compromised?

581
00:25:52,360 --> 00:25:54,900
So maybe if someone breaks
into my Gmail account,

582
00:25:54,900 --> 00:25:56,275
then maybe they're
going to force

583
00:25:56,275 --> 00:26:00,330
me to pay 350 million
micropayments, which

584
00:26:00,330 --> 00:26:02,555
could individually bankrupt me.

585
00:26:02,555 --> 00:26:04,805
And so it's not quite clear
that some of these schemes

586
00:26:04,805 --> 00:26:06,335
are ready for
primetime, but they

587
00:26:06,335 --> 00:26:07,920
do represent an interesting
thought experiment

588
00:26:07,920 --> 00:26:09,700
about how you might be able
to stop some of this stuff

589
00:26:09,700 --> 00:26:10,658
from the senders' side.

590
00:26:10,658 --> 00:26:13,582
AUDIENCE: So how do they work
with mailing lists, where you

591
00:26:13,582 --> 00:26:14,790
have these big mailing lists?

592
00:26:14,790 --> 00:26:15,340
PROFESSOR: Yeah,
so there's problems

593
00:26:15,340 --> 00:26:17,820
with that, and with
mailing list aggregation.

594
00:26:17,820 --> 00:26:20,050
So it's very, very tricky,
because there are actually

595
00:26:20,050 --> 00:26:22,722
some bulk mails that
you do want to send.

596
00:26:22,722 --> 00:26:24,930
I mean, you might imagine
having some heuristic where

597
00:26:24,930 --> 00:26:27,010
you look at the size
of the mailing list

598
00:26:27,010 --> 00:26:29,702
and maybe you scale the
payment according to that.

599
00:26:29,702 --> 00:26:31,160
So for example,
maybe heuristically

600
00:26:31,160 --> 00:26:33,950
you think it's reasonable
to send email to 1000 folks

601
00:26:33,950 --> 00:26:36,790
but not to 350 million folks,
or something like this.

602
00:26:36,790 --> 00:26:39,331
But you're right that there are
a lot of practical implementation

603
00:26:39,331 --> 00:26:42,280
issues that come out
with this kind of stuff.

604
00:26:42,280 --> 00:26:51,080
So what can the adversary do
to get around some of this?

605
00:26:51,080 --> 00:26:54,070
There are basically
three workarounds

606
00:26:54,070 --> 00:26:58,170
that adversaries might try.

607
00:26:58,170 --> 00:27:02,820
So one thing they can
do is just use botnets,

608
00:27:02,820 --> 00:27:11,512
because botnets have a lot of
IPs that the attacker can use.

609
00:27:11,512 --> 00:27:12,970
And so for example,
even if someone

610
00:27:12,970 --> 00:27:15,340
were trying to do something
like IP blacklists,

611
00:27:15,340 --> 00:27:17,960
then maybe the attacker can
cycle through a bunch of IPs

612
00:27:17,960 --> 00:27:19,650
in this botnet and
maybe get around

613
00:27:19,650 --> 00:27:22,510
some of that
blacklist filtering.

614
00:27:22,510 --> 00:27:28,320
They can also try to use
compromised webmail accounts

615
00:27:28,320 --> 00:27:29,260
to send spam.

616
00:27:32,210 --> 00:27:35,560
So the reason why
these are super useful

617
00:27:35,560 --> 00:27:38,870
is because sites
like Gmail or Yahoo

618
00:27:38,870 --> 00:27:43,170
or Hotmail, those services can't
be blacklisted, because they're

619
00:27:43,170 --> 00:27:44,095
super, super popular.

620
00:27:44,095 --> 00:27:46,230
So if you blacklisted
the entire service,

621
00:27:46,230 --> 00:27:48,188
then you're probably
going to shut down service

622
00:27:48,188 --> 00:27:50,020
for tens of millions of people.

623
00:27:50,020 --> 00:27:54,320
Now of course, these individual
services can shut you down.

624
00:27:54,320 --> 00:27:56,654
And so that will
actually happen once they

625
00:27:56,654 --> 00:27:59,070
have these heuristics running
that see that you're sending

626
00:27:59,070 --> 00:28:00,570
to a lot of people
you've never sent to

627
00:28:00,570 --> 00:28:01,980
before, and so on and so forth.

628
00:28:01,980 --> 00:28:05,660
A lot of AI strategy takes
place on the webmail server side

629
00:28:05,660 --> 00:28:07,324
to try to predict these things.

630
00:28:07,324 --> 00:28:09,490
But these things can be
very valuable to an attacker

631
00:28:09,490 --> 00:28:13,100
because even if your
compromised account is not

632
00:28:13,100 --> 00:28:16,250
used to send a lot of emails,
it can be used to send emails

633
00:28:16,250 --> 00:28:18,210
to people that you know.

634
00:28:18,210 --> 00:28:20,170
So maybe it allows the
attacker to do things

635
00:28:20,170 --> 00:28:22,740
like spear phishing more
easily, or things like that.

636
00:28:22,740 --> 00:28:24,250
People are more likely
to click on an email that

637
00:28:24,250 --> 00:28:26,000
comes from an address
that they recognize.

638
00:28:26,000 --> 00:28:29,500
So that's a very
powerful technique there.

639
00:28:29,500 --> 00:28:31,210
And then attackers
can also try to do

640
00:28:31,210 --> 00:28:38,530
things like hijack IP addresses
from legitimate owners.

641
00:28:38,530 --> 00:28:42,790
So as was mentioned
briefly in Mark's talk,

642
00:28:42,790 --> 00:28:45,380
there's this protocol
called BGP that

643
00:28:45,380 --> 00:28:48,130
basically is used to control
routing on the internet.

644
00:28:48,130 --> 00:28:49,960
So there are these
attacks that people

645
00:28:49,960 --> 00:28:52,120
can do whereby they
will essentially say,

646
00:28:52,120 --> 00:28:55,905
hey, I'm actually the owner of
some prefix of IP addresses,

647
00:28:55,905 --> 00:28:57,530
even though they
don't actually own it.

648
00:28:57,530 --> 00:28:59,734
So all the traffic that's
involving those addresses

649
00:28:59,734 --> 00:29:01,650
will go in towards the
attacker, and then they

650
00:29:01,650 --> 00:29:04,520
can actually use those addresses
to send out spam from there.

651
00:29:04,520 --> 00:29:05,790
Then once they're
done with their evil,

652
00:29:05,790 --> 00:29:07,373
they can release the
BGP advertisement

653
00:29:07,373 --> 00:29:10,220
and then go try to do
this somewhere else.

654
00:29:10,220 --> 00:29:12,940
There's a lot of research
in how you can essentially

655
00:29:12,940 --> 00:29:15,810
think of ways to authenticate
BGP advertisements

656
00:29:15,810 --> 00:29:18,290
or otherwise prevent
these IP address hijacks.

657
00:29:18,290 --> 00:29:19,440
So there's a bunch of
different techniques

658
00:29:19,440 --> 00:29:21,398
that attackers can do to
try to get around some

659
00:29:21,398 --> 00:29:24,840
of these defensive techniques.

660
00:29:24,840 --> 00:29:28,030
So this can all be done,
but still, these workarounds,

661
00:29:28,030 --> 00:29:28,739
they're not free.

662
00:29:28,739 --> 00:29:31,279
So presumably the attacker has
to pay for the botnet somehow,

663
00:29:31,279 --> 00:29:33,590
they have to get inside
these webmail accounts.

664
00:29:33,590 --> 00:29:36,330
And so any of these
defenses that you can do

665
00:29:36,330 --> 00:29:39,856
will help to drive the cost
up of generating these spams.

666
00:29:39,856 --> 00:29:41,230
So as such, they're
still useful,

667
00:29:41,230 --> 00:29:45,610
even though they are
not perfect defenses.

668
00:29:45,610 --> 00:29:48,760
So what do these
botnets look like?

669
00:29:48,760 --> 00:29:55,770
So at a high level, you
have the proverbial cloud

670
00:29:55,770 --> 00:29:56,785
from your cloud diagram.

671
00:29:56,785 --> 00:29:59,780
You have your command and
control infrastructure up here,

672
00:29:59,780 --> 00:30:01,760
and this is the
thing that actually

673
00:30:01,760 --> 00:30:08,220
sends commands to all of the
individual bots down here.

674
00:30:08,220 --> 00:30:11,490
So the spammer will talk to
the C&C and will say hey,

675
00:30:11,490 --> 00:30:14,130
here's my new spam
messages I want to send,

676
00:30:14,130 --> 00:30:17,445
and then maybe these bots will
act on behalf of their command

677
00:30:17,445 --> 00:30:19,570
and control infrastructure
and start sending emails

678
00:30:19,570 --> 00:30:21,460
to a bunch of people.

679
00:30:21,460 --> 00:30:23,030
So let's see here.

680
00:30:23,030 --> 00:30:25,230
So why are these bots useful?

681
00:30:25,230 --> 00:30:27,592
Well, as I mentioned here,
they have IP addresses,

682
00:30:27,592 --> 00:30:28,550
which are super useful.

683
00:30:28,550 --> 00:30:31,050
But of course they also have
the associated bandwidth there.

684
00:30:31,050 --> 00:30:32,551
They also have
computational cycles.

685
00:30:32,551 --> 00:30:33,925
Sometimes these
bots are actually

686
00:30:33,925 --> 00:30:35,240
used as web servers themselves.

687
00:30:35,240 --> 00:30:37,610
So these things are
very, very useful.

688
00:30:37,610 --> 00:30:40,905
And they also serve as
a layer of indirection.

689
00:30:40,905 --> 00:30:43,740
So, as we're going to discuss in
more detail in a second,

690
00:30:43,740 --> 00:30:46,460
indirection is very
useful for attackers.

691
00:30:46,460 --> 00:30:49,590
That means that if law
enforcement or whatnot shuts

692
00:30:49,590 --> 00:30:51,724
down this level, well, if
the command and control

693
00:30:51,724 --> 00:30:53,890
infrastructure's still
alive, then maybe the spammer

694
00:30:53,890 --> 00:30:55,672
can just attach this
command and control

695
00:30:55,672 --> 00:30:57,380
infrastructure to a
different set of bots

696
00:30:57,380 --> 00:30:59,040
and keep on running.

697
00:30:59,040 --> 00:31:01,670
So that's one reason why
these bots are very useful.

698
00:31:01,670 --> 00:31:04,370
And these bots can scale
to the order of magnitude

699
00:31:04,370 --> 00:31:06,860
of millions of IP addresses.

700
00:31:06,860 --> 00:31:09,550
So as it turns out, people
will click random links

701
00:31:09,550 --> 00:31:11,700
involving malware all the time.

702
00:31:11,700 --> 00:31:13,796
So these things can get
very, very, very large.

703
00:31:13,796 --> 00:31:15,920
And so some of these
takedowns that these companies

704
00:31:15,920 --> 00:31:18,253
get involved in, with trying
to take down these botnets,

705
00:31:18,253 --> 00:31:20,561
they involve millions
upon millions of machines.

706
00:31:20,561 --> 00:31:22,900
So they're very
technically challenging.

707
00:31:22,900 --> 00:31:25,780
So how much does it cost to
get your malware installed

708
00:31:25,780 --> 00:31:27,060
on all these bots?

709
00:31:27,060 --> 00:31:29,080
Remember, these
are all typically

710
00:31:29,080 --> 00:31:30,680
regular end-user machines.

711
00:31:30,680 --> 00:31:34,300
So the cost for getting
your malware on one of these

712
00:31:34,300 --> 00:31:45,640
machines, so price per host,
is about $0.10 for U.S.

713
00:31:45,640 --> 00:31:58,370
hosts and on the order of
$0.01 for hosts in Asia.

714
00:31:58,370 --> 00:32:00,640
So it's interesting there's
this differential here.

715
00:32:00,640 --> 00:32:01,620
There might a couple
of different reasons

716
00:32:01,620 --> 00:32:03,020
we can imagine for why that is.

717
00:32:03,020 --> 00:32:09,240
It might be that people are
prone to think that connections

718
00:32:09,240 --> 00:32:11,860
originating from the U.S. are
more likely to be trustworthy.

719
00:32:11,860 --> 00:32:14,390
It may also be that
because there's

720
00:32:14,390 --> 00:32:15,890
pirated software
running here, stuff

721
00:32:15,890 --> 00:32:18,430
that's not actively up to
date with respect to patches,

722
00:32:18,430 --> 00:32:21,100
it's actually easier to
get botnet hosts over here.

723
00:32:21,100 --> 00:32:24,000
So you'll see some very
interesting statistics

724
00:32:24,000 --> 00:32:27,180
about how some of these rates
might fluctuate, for example,

725
00:32:27,180 --> 00:32:29,410
as you see companies
like Microsoft go out

726
00:32:29,410 --> 00:32:32,169
and try to stamp down on
piracy and things like that.

727
00:32:32,169 --> 00:32:33,710
But anyway, this is
a rough estimate.

728
00:32:33,710 --> 00:32:38,260
Suffice it to say, this
is not super expensive.

729
00:32:38,260 --> 00:32:41,480
So what does-- any questions
before we continue?

730
00:32:41,480 --> 00:32:41,980
OK.

731
00:32:41,980 --> 00:32:45,340
So what does this command
and control infrastructure

732
00:32:45,340 --> 00:32:46,060
look like?

733
00:32:46,060 --> 00:32:49,580
So you can imagine that in one
instantiation, the simplest

734
00:32:49,580 --> 00:32:55,090
instantiation, this is
just some centralized setup.

735
00:32:55,090 --> 00:32:58,185
And so this is maybe
one machine or maybe

736
00:32:58,185 --> 00:32:59,840
some small number of machines.

737
00:32:59,840 --> 00:33:01,990
The attacker gets to
log into those machines

738
00:33:01,990 --> 00:33:04,490
and essentially just send these
commands out to the botnets

739
00:33:04,490 --> 00:33:05,195
from there.

740
00:33:05,195 --> 00:33:06,653
So if it's going
to be centralized,

741
00:33:06,653 --> 00:33:10,890
then it's going to be very
useful for the attacker to have

742
00:33:10,890 --> 00:33:13,160
what's known as
bulletproof hosting.

743
00:33:17,480 --> 00:33:19,545
So the idea behind
bulletproof hosting

744
00:33:19,545 --> 00:33:23,980
is that you want to put
this command and control

745
00:33:23,980 --> 00:33:31,750
infrastructure on servers that
reside in ISPs that ignore

746
00:33:31,750 --> 00:33:33,570
requests from banks or
from law enforcement

747
00:33:33,570 --> 00:33:35,980
to take down servers.

748
00:33:35,980 --> 00:33:38,200
So there are actually
bulletproof servers that exist.

749
00:33:38,200 --> 00:33:40,699
They charge a premium, because
there is a little bit of risk

750
00:33:40,699 --> 00:33:41,500
involved there.

751
00:33:41,500 --> 00:33:44,041
But if you can manage to host
one of your command and control

752
00:33:44,041 --> 00:33:45,819
centers there, it's
going to be very nice.

753
00:33:45,819 --> 00:33:47,860
Because then when the
American government or when

754
00:33:47,860 --> 00:33:50,220
Goldman Sachs or whoever
says hey, shut this guy down,

755
00:33:50,220 --> 00:33:52,500
they're running spam,
the provider will say,

756
00:33:52,500 --> 00:33:53,390
how can you make me?

757
00:33:53,390 --> 00:33:55,199
I run in a different
legal jurisdiction.

758
00:33:55,199 --> 00:33:57,490
I don't have to follow your
intellectual property laws.

759
00:33:57,490 --> 00:33:58,922
So on and so forth.

760
00:33:58,922 --> 00:33:59,880
So this is very useful.

761
00:33:59,880 --> 00:34:02,330
Like I said, these
types of hosts

762
00:34:02,330 --> 00:34:05,300
actually charge a risk
premium for running

763
00:34:05,300 --> 00:34:06,850
that kind of service.

764
00:34:06,850 --> 00:34:09,489
And so the other alternative for
running the C&C infrastructure

765
00:34:09,489 --> 00:34:13,639
is, this could be a
peer-to-peer network.

766
00:34:17,280 --> 00:34:22,042
And so the idea here is that
maybe this is sort of-- you

767
00:34:22,042 --> 00:34:24,250
can almost think of it as
a mini-botnet up there too.

768
00:34:24,250 --> 00:34:25,965
So the entire control
infrastructure

769
00:34:25,965 --> 00:34:28,250
is spread across many
different machines,

770
00:34:28,250 --> 00:34:30,010
and maybe at any
given time there's

771
00:34:30,010 --> 00:34:32,270
a different machine that's
responsible for sending

772
00:34:32,270 --> 00:34:34,484
commands to all of these
worker nodes down here.

773
00:34:34,484 --> 00:34:36,109
And so this is nice,
because it doesn't

774
00:34:36,109 --> 00:34:39,370
require you to have access to
one of these bulletproof hosts.

775
00:34:39,370 --> 00:34:42,040
You can construct the
C&C infrastructure

776
00:34:42,040 --> 00:34:44,900
using regular bots.

777
00:34:44,900 --> 00:34:47,179
The P2P aspect of it
makes it a little more

778
00:34:47,179 --> 00:34:49,820
difficult to provide guarantees
about the availability

779
00:34:49,820 --> 00:34:52,047
of the hosts that are up
here, but it does have

780
00:34:52,047 --> 00:34:53,255
some other nice advantages.

781
00:34:53,255 --> 00:34:55,428
At a high level, those
are the two approaches

782
00:34:55,428 --> 00:34:57,610
that people can use.

783
00:34:57,610 --> 00:35:08,130
So what happens if the hosting
service gets taken down?

784
00:35:12,590 --> 00:35:17,740
Well, there's a couple things
that the adversary can do.

785
00:35:17,740 --> 00:35:23,895
So they can use DNS to
essentially redirect requests.

786
00:35:30,440 --> 00:35:34,060
So let's say that
someone attacks,

787
00:35:34,060 --> 00:35:36,610
or someone issues a takedown
for the DNS infrastructure

788
00:35:36,610 --> 00:35:37,870
for something like this.

789
00:35:37,870 --> 00:35:39,870
As long as the back-end
servers are still alive,

790
00:35:39,870 --> 00:35:44,750
what the attacker
can do is basically--

791
00:35:44,750 --> 00:35:51,330
the attacker creates lists
of server IP addresses.

792
00:35:55,114 --> 00:35:58,285
And there may be hundreds or
thousands of these IP addresses

793
00:35:58,285 --> 00:35:59,600
that it collects.

794
00:35:59,600 --> 00:36:08,210
And then it will bind
each one to a host

795
00:36:08,210 --> 00:36:13,090
name for a very
short period of time.

796
00:36:13,090 --> 00:36:16,610
So let's say maybe
for 300 seconds.

797
00:36:20,317 --> 00:36:22,400
And so what's nice about
this is that if someone's

798
00:36:22,400 --> 00:36:24,370
trying to run
heuristics that say,

799
00:36:24,370 --> 00:36:28,140
if I see some particular
server sending

800
00:36:28,140 --> 00:36:32,197
more than 1,000 spam-like
messages in a given period

801
00:36:32,197 --> 00:36:34,780
I'm going to try to issue some
kind of takedown to them, well,

802
00:36:34,780 --> 00:36:37,795
these types of techniques will
maybe help the attacker fly

803
00:36:37,795 --> 00:36:40,086
under the radar of those
types of detection techniques.

804
00:36:40,086 --> 00:36:41,990
Because essentially every
300 seconds they're saying,

805
00:36:41,990 --> 00:36:43,245
OK, I'm going to be
serving spam from here,

806
00:36:43,245 --> 00:36:45,620
then I'm going to be serving
spam from here, serving spam

807
00:36:45,620 --> 00:36:46,960
from here, so on and so forth.

808
00:36:46,960 --> 00:36:49,292
So this is a nice use
of indirection, at least

809
00:36:49,292 --> 00:36:50,960
from the attacker's perspective.

810
00:36:50,960 --> 00:36:55,640
And so, as I mentioned earlier,
these types of indirection

811
00:36:55,640 --> 00:36:58,000
are one of the key
ways that attackers

812
00:36:58,000 --> 00:37:02,710
try to evade law enforcement
and these detection heuristics.

813
00:37:02,710 --> 00:37:05,540
So you might think about,
well, what if we just

814
00:37:05,540 --> 00:37:07,480
take down the DNS server?

815
00:37:07,480 --> 00:37:09,337
How hard is it to do that?

816
00:37:09,337 --> 00:37:10,795
Well, as the paper
describes, there

817
00:37:10,795 --> 00:37:12,160
are a couple different
layers on which

818
00:37:12,160 --> 00:37:13,500
you can attack these spammers.

819
00:37:13,500 --> 00:37:17,409
So you can try to take down the
attacker's domain registration.

820
00:37:17,409 --> 00:37:18,950
That's basically
the thing that says,

821
00:37:18,950 --> 00:37:25,050
like, hey, if you're looking
for russianpharma.rx.biz.org,

822
00:37:25,050 --> 00:37:27,299
then here's the DNS
server that you talk to.

823
00:37:27,299 --> 00:37:29,090
You can imagine attacking
it at that level.

824
00:37:29,090 --> 00:37:30,548
You could also
imagine attacking it

825
00:37:30,548 --> 00:37:34,060
at the level of taking down
the spammer's DNS server,

826
00:37:34,060 --> 00:37:36,120
the thing to which you'll
be redirected once you

827
00:37:36,120 --> 00:37:38,552
look at that top-level domain.

828
00:37:38,552 --> 00:37:40,260
And so what's tricky
is that the attacker

829
00:37:40,260 --> 00:37:43,540
can use these sort of
fast flux techniques

830
00:37:43,540 --> 00:37:44,800
at every different level.

831
00:37:44,800 --> 00:37:47,600
So, for example, they
can rotate the servers

832
00:37:47,600 --> 00:37:49,360
they use to act as
their DNS servers.

833
00:37:49,360 --> 00:37:54,970
They can rotate the web servers
they use to send out the spam.

834
00:37:54,970 --> 00:37:56,388
And so on and so forth.

835
00:37:56,388 --> 00:37:58,221
So that's just a
high-level review

836
00:37:58,221 --> 00:37:59,846
of how people can
use multiple machines

837
00:37:59,846 --> 00:38:03,810
to try to avoid detection.

838
00:38:03,810 --> 00:38:09,540
So as I mentioned earlier,
you can use compromised

839
00:38:09,540 --> 00:38:14,660
webmail accounts to send spam.

840
00:38:20,900 --> 00:38:25,190
And the power of that is
that if you can get access

841
00:38:25,190 --> 00:38:27,065
to someone's account,
then you don't actually

842
00:38:27,065 --> 00:38:28,773
have to install malware
on their machine.

843
00:38:28,773 --> 00:38:30,374
You can actually
access their account

844
00:38:30,374 --> 00:38:32,290
from the privacy of your
own machine, wherever

845
00:38:32,290 --> 00:38:33,373
it is that you're located.

846
00:38:33,373 --> 00:38:36,074
And as we were
discussing earlier,

847
00:38:36,074 --> 00:38:37,740
this is useful for
spear phishing attacks,

848
00:38:37,740 --> 00:38:40,690
because you can send this spam
message as the person whose

849
00:38:40,690 --> 00:38:42,570
account it actually belongs to.

850
00:38:42,570 --> 00:38:44,280
And so as a result
the webmail providers

851
00:38:44,280 --> 00:38:47,714
are very motivated to shut
this kind of thing down.

852
00:38:47,714 --> 00:38:49,380
Because if they don't
do that, then they

853
00:38:49,380 --> 00:38:51,600
risk being blacklisted
as a whole.

854
00:38:51,600 --> 00:38:54,880
All the users risk being flagged
as spam, which they don't want.

855
00:38:54,880 --> 00:38:58,140
And also the provider actually
needs to somehow monetize

856
00:38:58,140 --> 00:38:58,870
their service.

857
00:38:58,870 --> 00:39:01,750
They actually need real
users to be doing things

858
00:39:01,750 --> 00:39:03,550
like clicking on ads
in the right-hand bar

859
00:39:03,550 --> 00:39:04,880
of their webmail account.

860
00:39:04,880 --> 00:39:08,380
So the higher the proportion of
their users which are spamming,

861
00:39:08,380 --> 00:39:10,535
the less likely advertisers
are to advertise

862
00:39:10,535 --> 00:39:11,920
in their webmail system.

863
00:39:11,920 --> 00:39:13,972
So the webmail
account providers are

864
00:39:13,972 --> 00:39:17,280
very incentivized to shut
down this kind of stuff.

865
00:39:17,280 --> 00:39:20,330
So how do they try to
detect this type of spam?

866
00:39:20,330 --> 00:39:21,450
They use those heuristics.

867
00:39:21,450 --> 00:39:24,350
They might try to use CAPTCHAs.

868
00:39:24,350 --> 00:39:27,180
If they suspect that you've
sent some spam-like messages,

869
00:39:27,180 --> 00:39:28,835
let's say five
times in a row, they

870
00:39:28,835 --> 00:39:30,960
might ask you to type in
one of those fuzzy letters

871
00:39:30,960 --> 00:39:32,950
or whatever.

872
00:39:32,950 --> 00:39:35,150
Suffice it to say, though,
a lot of these techniques

873
00:39:35,150 --> 00:39:36,520
don't work very well.

874
00:39:36,520 --> 00:39:41,650
If you look at the
price per account,

875
00:39:41,650 --> 00:39:43,425
so how much you
as a spammer would

876
00:39:43,425 --> 00:39:45,880
have to pay to get
one of these things,

877
00:39:45,880 --> 00:39:47,590
it's still super, super cheap.

878
00:39:47,590 --> 00:39:54,860
So it's on the order of $0.01 to
$0.05 for an account on Yahoo,

879
00:39:54,860 --> 00:39:56,770
Gmail, Hotmail,
something like that.

880
00:39:56,770 --> 00:39:59,030
So once again, this
is very, very low.

881
00:39:59,030 --> 00:40:01,580
And so this does not act as
an effective disincentive

882
00:40:01,580 --> 00:40:04,670
for spammers to try to
do these types of things.

883
00:40:04,670 --> 00:40:08,590
So this maybe is a
little bit disappointing,

884
00:40:08,590 --> 00:40:10,740
because it seems
like everywhere we

885
00:40:10,740 --> 00:40:13,160
go, we have to solve
these CAPTCHAs if we

886
00:40:13,160 --> 00:40:15,459
want to buy things
or send emails or do

887
00:40:15,459 --> 00:40:16,250
that kind of stuff.

888
00:40:16,250 --> 00:40:20,480
So basically, what
happened to CAPTCHAs?

889
00:40:20,480 --> 00:40:24,660
They were supposed to make
all this bad stuff go away.

890
00:40:24,660 --> 00:40:29,580
And as it turns
out, the attacker

891
00:40:29,580 --> 00:40:34,250
can build services
to solve CAPTCHAs.

892
00:40:37,580 --> 00:40:41,210
So this can be automated,
just like anything else.

893
00:40:44,380 --> 00:40:46,860
As it turns out, the
economics for this

894
00:40:46,860 --> 00:40:49,440
is that if you want
to solve one CAPTCHA,

895
00:40:49,440 --> 00:40:57,521
then it's approximately $0.001
to solve a CAPTCHA.

896
00:40:57,521 --> 00:40:59,930
Which is nothing.
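
To put the quoted figure in perspective, here is a minimal arithmetic sketch. The $0.001 price is the approximate number from the lecture; the function name and the batch sizes are assumptions chosen only for illustration.

```python
from fractions import Fraction

# Approximate price per solved CAPTCHA quoted in the lecture, in dollars.
COST_PER_CAPTCHA = Fraction(1, 1000)


def solving_cost(num_captchas):
    """Total cost in dollars of paying for num_captchas solutions."""
    return COST_PER_CAPTCHA * num_captchas


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, "CAPTCHAs cost about $", float(solving_cost(n)))
```

Exact fractions are used so the totals are not distorted by floating-point rounding.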

897
00:40:59,930 --> 00:41:02,830
And this can be done with
very, very low latency, too.

898
00:41:02,830 --> 00:41:05,400
So CAPTCHAs essentially
are not presenting

899
00:41:05,400 --> 00:41:08,620
most large-scale spammers
with a high barrier

900
00:41:08,620 --> 00:41:10,200
for sending these spams.

901
00:41:10,200 --> 00:41:12,630
And so how is this being done?

902
00:41:12,630 --> 00:41:14,290
If it's this cheap,
you might think,

903
00:41:14,290 --> 00:41:17,182
maybe it's being done all
by computers, by software.

904
00:41:17,182 --> 00:41:18,140
But it's not, actually.

905
00:41:18,140 --> 00:41:21,434
So a lot of this
is done by humans.

906
00:41:25,780 --> 00:41:29,903
In particular, the attacker
can outsource this in one

907
00:41:29,903 --> 00:41:30,650
of two ways.

908
00:41:30,650 --> 00:41:32,191
So first of all the
attacker can just

909
00:41:32,191 --> 00:41:34,570
find a labor market
where the cost of labor

910
00:41:34,570 --> 00:41:36,340
is very, very cheap.

911
00:41:36,340 --> 00:41:39,740
So you can employ humans
to essentially act

912
00:41:39,740 --> 00:41:42,154
as CAPTCHA solvers for you.

913
00:41:42,154 --> 00:41:44,070
You, the spammer, are
presented with a CAPTCHA

914
00:41:44,070 --> 00:41:45,240
by Gmail or whatever.

915
00:41:45,240 --> 00:41:47,470
You, the spammer,
then send that CAPTCHA

916
00:41:47,470 --> 00:41:49,290
over to some human
sitting somewhere.

917
00:41:49,290 --> 00:41:51,690
They solve it for you, they've
earned some small amount

918
00:41:51,690 --> 00:41:54,340
of money, and then
you send their answer

919
00:41:54,340 --> 00:41:56,410
to the legitimate site.

920
00:41:56,410 --> 00:42:02,160
You could also do this
with Mechanical Turk.

921
00:42:02,160 --> 00:42:05,064
Have you guys heard
of Mechanical Turk?

922
00:42:05,064 --> 00:42:07,800
I've asked the question, my
back is turned, [INAUDIBLE].

923
00:42:07,800 --> 00:42:11,230
OK, so Mechanical
Turk is pretty neat,

924
00:42:11,230 --> 00:42:12,980
I mean neat if you're
trying to do evil.

925
00:42:12,980 --> 00:42:13,880
So what's nice about
that is that you

926
00:42:13,880 --> 00:42:16,192
can post these tasks on
Mechanical Turk and say,

927
00:42:16,192 --> 00:42:18,650
hey, I have a picture-solving
game, or something like this.

928
00:42:18,650 --> 00:42:20,390
Or you can just come
out and say straight up,

929
00:42:20,390 --> 00:42:22,015
I've got some CAPTCHAs
I want to solve.

930
00:42:22,015 --> 00:42:23,990
You post a price, and
then basically the market

931
00:42:23,990 --> 00:42:26,466
will match you with people who
are willing to do that task.

932
00:42:26,466 --> 00:42:28,840
And then they'll do it for
you, they'll post the answers.

933
00:42:28,840 --> 00:42:34,060
So this actually automates
a lot of actually finding

934
00:42:34,060 --> 00:42:37,180
the labor pool for the spammer.

935
00:42:37,180 --> 00:42:38,907
The problem with
this is that you

936
00:42:38,907 --> 00:42:40,365
have more overhead
for the spammer,

937
00:42:40,365 --> 00:42:43,955
because Amazon has to take
some cut of that profit that's

938
00:42:43,955 --> 00:42:44,890
generated from that.

939
00:42:44,890 --> 00:42:48,410
But that's very nice there.

940
00:42:48,410 --> 00:42:50,780
Another thing that
attackers can do

941
00:42:50,780 --> 00:42:55,530
is they can actually reuse
CAPTCHAs on legitimate sites.

942
00:42:55,530 --> 00:42:58,610
So there's some CAPTCHA that
the attacker wants to solve.

943
00:42:58,610 --> 00:43:00,590
They then have some
legitimate site

944
00:43:00,590 --> 00:43:03,590
on the side where they present
that exact same CAPTCHA,

945
00:43:03,590 --> 00:43:06,510
and get a real visitor to
figure out what that CAPTCHA is.

946
00:43:06,510 --> 00:43:08,680
Then they come back
over to the first site

947
00:43:08,680 --> 00:43:11,590
and then use that
answer as the answer.

948
00:43:11,590 --> 00:43:14,001
And like all these
crowdsourcing-type things,

949
00:43:14,001 --> 00:43:15,626
if you don't trust
your users, then you

950
00:43:15,626 --> 00:43:17,540
can maybe replicate the work.

951
00:43:17,540 --> 00:43:19,880
So you send the CAPTCHA to
maybe two or three people.

952
00:43:19,880 --> 00:43:21,963
And then you come back in
and use majority voting,

953
00:43:21,963 --> 00:43:25,430
take whatever that majority
vote was as your CAPTCHA answer.
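
The replicate-and-vote idea can be sketched in a few lines. This is an assumed implementation, not one taken from the paper: the function name is hypothetical, and treating answers as case-insensitive is an extra assumption.

```python
from collections import Counter


def majority_answer(answers):
    """Return the answer given by a strict majority of solvers, else None."""
    if not answers:
        return None
    # Assumption: answers that differ only in case or padding count as equal.
    normalized = [a.strip().lower() for a in answers]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer if count > len(normalized) / 2 else None


if __name__ == "__main__":
    print(majority_answer(["x7Kq", "x7kq", "x7kg"]))
```

With three solvers, one careless or dishonest answer is outvoted; if no answer reaches a majority, the caller can send the CAPTCHA out again.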

954
00:43:25,430 --> 00:43:27,190
And so these are
some of the reasons

955
00:43:27,190 --> 00:43:29,270
why the CAPTCHA
defenses don't work

956
00:43:29,270 --> 00:43:31,130
as well as you might think.

957
00:43:31,130 --> 00:43:34,590
So the providers, so for example
Gmail or Yahoo or whatever,

958
00:43:34,590 --> 00:43:37,840
can try to implement
more frequent CAPTCHAs

959
00:43:37,840 --> 00:43:42,200
to try to push the friction
level up for the spammer.

960
00:43:42,200 --> 00:43:44,320
The problem there is
that then regular users

961
00:43:44,320 --> 00:43:45,490
will get irritated.

962
00:43:45,490 --> 00:43:47,960
So a good example
of this is Gmail's

963
00:43:47,960 --> 00:43:49,210
two-factor authentication.

964
00:43:49,210 --> 00:43:51,610
It's actually a super good idea.

965
00:43:51,610 --> 00:43:53,585
Whenever Gmail will
detect that you're

966
00:43:53,585 --> 00:43:55,320
trying to use Gmail
from a machine

967
00:43:55,320 --> 00:43:57,580
that it doesn't
know about, it'll

968
00:43:57,580 --> 00:44:00,025
basically send you a text
message saying hey, enter

969
00:44:00,025 --> 00:44:02,940
this verification
code into Gmail

970
00:44:02,940 --> 00:44:05,170
before you can actually
continue to use the service.

971
00:44:05,170 --> 00:44:07,336
And so what's funny is that
it's a super great idea,

972
00:44:07,336 --> 00:44:09,370
but at least for me,
I get super irritated

973
00:44:09,370 --> 00:44:11,044
when I have to get
that text message.

974
00:44:11,044 --> 00:44:13,210
Like, I know it's good for
me, but I just get angry.

975
00:44:13,210 --> 00:44:13,918
It's frictionful.

976
00:44:13,918 --> 00:44:15,479
And so I'll do it
if I don't migrate

977
00:44:15,479 --> 00:44:17,020
to a lot of different
machines a lot,

978
00:44:17,020 --> 00:44:19,640
but if I had to do it any
more than I did right now,

979
00:44:19,640 --> 00:44:22,800
it's unclear that I'd feel
as happy about it as I do.

980
00:44:22,800 --> 00:44:24,690
So there's this very
interesting sort

981
00:44:24,690 --> 00:44:27,060
of tradeoff between the
security that people

982
00:44:27,060 --> 00:44:29,660
say that they want and the
security measures that they're

983
00:44:29,660 --> 00:44:30,740
willing to put up with.

984
00:44:30,740 --> 00:44:32,490
So as a result,
it's very difficult

985
00:44:32,490 --> 00:44:35,485
for the webmail providers to
increase the amount of CAPTCHAs

986
00:44:35,485 --> 00:44:38,620
and still keep users happy.

987
00:44:38,620 --> 00:44:40,490
OK, so any other questions
before we move on

988
00:44:40,490 --> 00:44:41,360
to click support?

989
00:44:41,360 --> 00:44:45,824
AUDIENCE: So is one of the
reasons for the non-adoption

990
00:44:45,824 --> 00:44:49,296
of encrypted emails,
besides the [INAUDIBLE]

991
00:44:49,296 --> 00:44:52,770
is that spam filters have
a very, very big part?

992
00:44:52,770 --> 00:44:56,374
PROFESSOR: Ah, because then they
can't inspect messages and see

993
00:44:56,374 --> 00:44:57,040
what's going on.

994
00:44:57,040 --> 00:44:57,998
That's a good question.

995
00:44:57,998 --> 00:44:59,530
I think it's
actually hard to say.

996
00:44:59,530 --> 00:45:01,820
I don't know, because it's a
little bit of a chicken and egg

997
00:45:01,820 --> 00:45:02,320
problem.

998
00:45:02,320 --> 00:45:05,260
So because there isn't a huge
volume of encrypted email,

999
00:45:05,260 --> 00:45:07,977
it's unclear whether
spammers are actually trying

1000
00:45:07,977 --> 00:45:09,060
to take advantage of that.

1001
00:45:09,060 --> 00:45:11,130
But I could see that
maybe being a problem.

1002
00:45:11,130 --> 00:45:12,810
I mean, people
have looked at ways

1003
00:45:12,810 --> 00:45:16,880
to do computation
over encrypted data.

1004
00:45:16,880 --> 00:45:19,430
So maybe you could think
about doing something there.

1005
00:45:19,430 --> 00:45:20,880
But it's always tricky.

1006
00:45:20,880 --> 00:45:22,560
So for example,
with spam, people

1007
00:45:22,560 --> 00:45:25,730
have these spam filters that
were based on Markov models

1008
00:45:25,730 --> 00:45:26,810
and things like that.

1009
00:45:26,810 --> 00:45:27,935
So what do the spammers do?

1010
00:45:27,935 --> 00:45:30,950
They start making these
images that basically

1011
00:45:30,950 --> 00:45:32,480
can't be seen by
the text scanners,

1012
00:45:32,480 --> 00:45:34,313
but then have the
spamming content in there.

1013
00:45:34,313 --> 00:45:38,290
So it's always an arms race.

1014
00:45:38,290 --> 00:45:38,995
All right.

1015
00:45:38,995 --> 00:45:44,100
So let's move on
to click support.

1016
00:45:44,100 --> 00:45:49,000
So what is this about?

1017
00:45:49,000 --> 00:45:51,870
So once the advertising
step has succeeded

1018
00:45:51,870 --> 00:45:54,930
and the user is given a link, so
these are clicks on that link,

1019
00:45:54,930 --> 00:46:01,630
so the user contacts
some DNS server

1020
00:46:01,630 --> 00:46:09,010
after clicking on that
link to basically translate

1021
00:46:09,010 --> 00:46:18,130
some hostname that was
in that link to some IP.

1022
00:46:18,130 --> 00:46:21,940
And then after that
translation takes place,

1023
00:46:21,940 --> 00:46:34,980
the user has to contact some
web server that has that IP.

1024
00:46:34,980 --> 00:46:37,080
So to make all this
work, the spammer

1025
00:46:37,080 --> 00:46:44,838
has to register a domain name.

1026
00:46:44,838 --> 00:46:54,570
And then the spammer
has to run a DNS server,

1027
00:46:54,570 --> 00:46:56,920
and then they have
to run a web server.

1028
00:47:02,930 --> 00:47:05,010
So this is essentially
what the spammer

1029
00:47:05,010 --> 00:47:07,950
has to do to make this click
support thing work out.

1030
00:47:07,950 --> 00:47:10,376
So one question you
might have is, well,

1031
00:47:10,376 --> 00:47:13,380
why wouldn't the
spammer just use

1032
00:47:13,380 --> 00:47:18,101
raw IP addresses, for example,
like in these spam URLs?

1033
00:47:18,101 --> 00:47:20,100
And so does anyone have
any thoughts about that?

1034
00:47:20,100 --> 00:47:25,046
Why wouldn't you just
have 183.4.4 dot whatever,

1035
00:47:25,046 --> 00:47:27,530
instead of having something
like russianjewels.biz?

1036
00:47:27,530 --> 00:47:29,495
AUDIENCE: Because
it looks sketchy,

1037
00:47:29,495 --> 00:47:30,694
it makes it easier to tell.

1038
00:47:30,694 --> 00:47:31,360
PROFESSOR: Yeah.

1039
00:47:31,360 --> 00:47:34,814
So one thing, one would
hope, is that a user would

1040
00:47:34,814 --> 00:47:37,230
look at this thing that just
has a bunch of numbers in it,

1041
00:47:37,230 --> 00:47:39,962
and they'd say, well,
this clearly seems weird.

1042
00:47:39,962 --> 00:47:42,420
As it turns out, this will only
weed out some of the users,

1043
00:47:42,420 --> 00:47:43,461
but you're exactly right.

1044
00:47:43,461 --> 00:47:46,225
There's a subset of people you
would lose just because nobody

1045
00:47:46,225 --> 00:47:47,730
wants to click on that.

1046
00:47:47,730 --> 00:47:50,210
Another reason is
that once again,

1047
00:47:50,210 --> 00:47:53,580
having this sort of DNS
infrastructure up here

1048
00:47:53,580 --> 00:47:56,220
gives the attacker another
level of indirection.

1049
00:47:56,220 --> 00:47:59,900
So once again, if the legal
authorities or whoever

1050
00:47:59,900 --> 00:48:02,280
shut down the DNS
infrastructure but they somehow

1051
00:48:02,280 --> 00:48:05,400
don't manage to shut down
that back-end web server,

1052
00:48:05,400 --> 00:48:07,524
then the spammer can
conjure up a different sort

1053
00:48:07,524 --> 00:48:09,190
of front end for their
service and maybe

1054
00:48:09,190 --> 00:48:11,930
try to use that same web
server on the back end.

1055
00:48:11,930 --> 00:48:13,450
So that's another
reason, I think,

1056
00:48:13,450 --> 00:48:16,960
that people don't
typically put these raw IP

1057
00:48:16,960 --> 00:48:21,020
addresses in their spam URLs.

1058
00:48:21,020 --> 00:48:27,400
So another example of how this
redirection comes into play--

1059
00:48:27,400 --> 00:48:29,790
how this indirection
comes into play, sorry--

1060
00:48:29,790 --> 00:48:37,445
is that these spam URLs often
point to redirection sites.

1061
00:48:43,070 --> 00:48:48,660
And so these are sites like
bit.ly, or things like that.

1062
00:48:48,660 --> 00:48:52,793
And so in addition to
things like bit.ly,

1063
00:48:52,793 --> 00:48:55,870
you could also imagine
that a compromised

1064
00:48:55,870 --> 00:49:02,515
website can actually
also act as a redirector.

1065
00:49:05,310 --> 00:49:09,134
You just put the appropriate
HTML or JavaScript in there

1066
00:49:09,134 --> 00:49:10,675
that when the user
goes to that site,

1067
00:49:10,675 --> 00:49:13,520
it's then going to
redirect the user's browser

1068
00:49:13,520 --> 00:49:15,674
to some other different site.

1069
00:49:15,674 --> 00:49:17,590
So once again, this is
useful because it provides

1070
00:49:17,590 --> 00:49:19,320
that level of indirection.

1071
00:49:19,320 --> 00:49:21,585
And it actually acts
as a force multiplier,

1072
00:49:21,585 --> 00:49:25,770
so you have a single
spamming web server back end,

1073
00:49:25,770 --> 00:49:29,180
but then you can name it
using different things.

1074
00:49:29,180 --> 00:49:32,480
And that will allow
you to maybe confuse

1075
00:49:32,480 --> 00:49:35,980
filters who have blacklisted,
let's say, 10% of your URLs,

1076
00:49:35,980 --> 00:49:37,970
but not the other 90% of them.
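
The force-multiplier effect can be illustrated with a toy model. Everything named here is hypothetical: the back-end address comes from a documentation range, `redirect.example` is a placeholder host, and the filter is modeled as a plain set of blacklisted URLs.

```python
BACKEND = "http://192.0.2.50/"  # hypothetical single spam web server


def build_redirects(n):
    """Create n hypothetical front-end URLs that all lead to the one back end."""
    return {f"http://redirect.example/r/{i}": BACKEND for i in range(n)}


def surviving_urls(redirects, blacklist):
    """Return the front-end URLs that a filter with this blacklist still allows."""
    return [url for url in redirects if url not in blacklist]


if __name__ == "__main__":
    redirects = build_redirects(10)
    blacklist = {"http://redirect.example/r/0"}  # the filter caught 1 of 10
    alive = surviving_urls(redirects, blacklist)
    print(len(alive), "of", len(redirects), "URLs still reach", BACKEND)
```

Blocking individual names leaves the back end untouched, which is why URL blacklisting alone tends to lag behind the spammer.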

1077
00:49:37,970 --> 00:49:40,290
So this is a very,
very common technique.

1078
00:49:40,290 --> 00:49:45,770
And then another thing is
that sometimes the spammers

1079
00:49:45,770 --> 00:49:58,070
can use botnets as web servers
or maybe as proxies, as DNS

1080
00:49:58,070 --> 00:50:01,800
servers, and so on and so forth.

1081
00:50:01,800 --> 00:50:04,990
We mentioned this a
little bit earlier,

1082
00:50:04,990 --> 00:50:07,860
but this is another example of
how the more machines you have

1083
00:50:07,860 --> 00:50:10,237
as an attacker, the more
defense that gives you.

1084
00:50:10,237 --> 00:50:12,320
Because you can hide your
evil amongst a watershed

1085
00:50:12,320 --> 00:50:12,870
of machines.

1086
00:50:16,758 --> 00:50:20,160
All right.

1087
00:50:20,160 --> 00:50:22,802
So in some cases, one of the
things the paper talks about

1088
00:50:22,802 --> 00:50:24,010
is these affiliate providers.

1089
00:50:24,010 --> 00:50:29,290
These affiliate providers kind
of act as evil clearinghouses.

1090
00:50:29,290 --> 00:50:31,905
They will help to automate some
of the tedium of interacting

1091
00:50:31,905 --> 00:50:34,020
with the banks, and
things like this,

1092
00:50:34,020 --> 00:50:35,730
on behalf of you, the spammer.

1093
00:50:35,730 --> 00:50:37,610
So one thing you
might wonder is, well,

1094
00:50:37,610 --> 00:50:39,800
why can't the law
enforcement just take down

1095
00:50:39,800 --> 00:50:41,090
the affiliate providers?

1096
00:50:41,090 --> 00:50:43,152
They seem kind of
like a choke point.

1097
00:50:43,152 --> 00:50:45,110
And the thing is that
these affiliate providers

1098
00:50:45,110 --> 00:50:48,310
are kind of like SPECTRE
from the James Bond movies.

1099
00:50:48,310 --> 00:50:50,220
They're very
decentralized themselves.

1100
00:50:50,220 --> 00:50:53,184
So it's very difficult to
point to an affiliate provider

1101
00:50:53,184 --> 00:50:55,350
at this particular machine,
and we'll just shut down

1102
00:50:55,350 --> 00:50:56,530
that particular machine.

1103
00:50:56,530 --> 00:50:58,000
Oftentimes the
affiliate providers

1104
00:50:58,000 --> 00:50:59,640
are distributed themselves.

1105
00:50:59,640 --> 00:51:01,800
So that means that it's
actually pretty tricky for,

1106
00:51:01,800 --> 00:51:04,770
let's say, the FBI, to just
go to some affiliate program

1107
00:51:04,770 --> 00:51:07,840
and say, thou shalt
not do this anymore.

1108
00:51:07,840 --> 00:51:09,420
Another interesting
thing, too, is

1109
00:51:09,420 --> 00:51:12,830
that the paper mentions
that in many countries

1110
00:51:12,830 --> 00:51:14,640
IP laws are different,
for example.

1111
00:51:14,640 --> 00:51:17,600
So the FBI may not be able to
enforce intellectual property laws

1112
00:51:17,600 --> 00:51:19,430
that we have with
other countries.

1113
00:51:19,430 --> 00:51:21,520
And also, according
to the paper,

1114
00:51:21,520 --> 00:51:23,755
in many of these spam
forums, the spammers

1115
00:51:23,755 --> 00:51:26,790
claim they are providing a
useful, legitimate service

1116
00:51:26,790 --> 00:51:28,370
to Western countries.

1117
00:51:28,370 --> 00:51:30,720
They say that
essentially, prices

1118
00:51:30,720 --> 00:51:32,380
are too high for
some of these things,

1119
00:51:32,380 --> 00:51:34,900
in these Western countries,
and that the fact that people

1120
00:51:34,900 --> 00:51:37,850
are clicking on demand indicates
there's a legitimate need

1121
00:51:37,850 --> 00:51:41,970
to buy Windows copies that
may be riddled with malware.

1122
00:51:41,970 --> 00:51:44,399
So a lot of times the
spammers themselves

1123
00:51:44,399 --> 00:51:46,190
don't feel that they're
doing anything bad.

1124
00:51:46,190 --> 00:51:48,050
And as we'll discuss
a little bit later,

1125
00:51:48,050 --> 00:51:50,430
the spammers do often
actually give you

1126
00:51:50,430 --> 00:51:52,476
the stuff that you've
paid money for,

1127
00:51:52,476 --> 00:51:54,642
which for me was one of the
most surprising outcomes

1128
00:51:54,642 --> 00:51:55,790
of the paper.

1129
00:51:55,790 --> 00:51:59,610
And so we'll discuss why
that is in a little bit.

1130
00:51:59,610 --> 00:52:02,030
So one thing that
the paper talks about

1131
00:52:02,030 --> 00:52:05,380
is various takedown
strategies that you

1132
00:52:05,380 --> 00:52:09,680
can imagine employing to
try to stop a spammer.

1133
00:52:09,680 --> 00:52:11,420
So one thing it
talked about, they

1134
00:52:11,420 --> 00:52:24,900
said that only a small
number of registrars host

1135
00:52:24,900 --> 00:52:27,955
domains for many affiliates.

1136
00:52:32,330 --> 00:52:37,195
And so what that means is
that most of these affiliate

1137
00:52:37,195 --> 00:52:40,900
programs are-- there's sort
of this one-to-one binding

1138
00:52:40,900 --> 00:52:43,350
between affiliates and
the registrars that

1139
00:52:43,350 --> 00:52:45,950
are dealing with their domain
name and infrastructure.

1140
00:52:45,950 --> 00:52:48,360
It's very rare that you
have a single domain name

1141
00:52:48,360 --> 00:52:51,280
registrar who's going
to be associated

1142
00:52:51,280 --> 00:52:53,390
with a bunch of different
affiliate programs.

1143
00:52:53,390 --> 00:52:55,056
So what that means
is that in many cases

1144
00:52:55,056 --> 00:52:57,240
there's not this, like,
master decapitation strike

1145
00:52:57,240 --> 00:52:58,520
you could launch,
where you'd take out

1146
00:52:58,520 --> 00:53:00,603
this particular registrar
and then all of a sudden

1147
00:53:00,603 --> 00:53:03,360
the entire spam
infrastructure falls down.

1148
00:53:03,360 --> 00:53:09,670
They found similar results
for things like web servers.

1149
00:53:09,670 --> 00:53:12,330
It's very rare that
one ISP will actually

1150
00:53:12,330 --> 00:53:16,230
host a ton of web servers for
a ton of affiliate programs.

1151
00:53:16,230 --> 00:53:17,910
This distributed
nature, once again,

1152
00:53:17,910 --> 00:53:20,000
makes it very difficult
to say, if we just

1153
00:53:20,000 --> 00:53:23,050
take out these three things
then the whole ecosystem just

1154
00:53:23,050 --> 00:53:25,560
crumbles.

1155
00:53:25,560 --> 00:53:27,300
So that's a little
bit disappointing,

1156
00:53:27,300 --> 00:53:29,130
because one would
hope that there'd

1157
00:53:29,130 --> 00:53:34,000
be one web server in Evildonia,
where if we could just

1158
00:53:34,000 --> 00:53:36,865
take down Evildonia, then people
would stop sending us spam.

1159
00:53:36,865 --> 00:53:38,290
That's actually not true.

1160
00:53:38,290 --> 00:53:40,490
As we'll see later,
though, that may

1161
00:53:40,490 --> 00:53:42,470
be true to some extent
at the banking back end.

1162
00:53:42,470 --> 00:53:44,990
And so maybe we can actually
put the squeeze on there.

1163
00:53:44,990 --> 00:53:48,580
So anyway, I was alluding to
earlier about this realization

1164
00:53:48,580 --> 00:53:51,320
phase.

1165
00:53:51,320 --> 00:53:57,220
So the realization phase is what
happens after you, the user,

1166
00:53:57,220 --> 00:54:00,050
have decided to buy something.

1167
00:54:00,050 --> 00:54:03,660
So the realization phase
consists of two parts.

1168
00:54:03,660 --> 00:54:07,770
The user pays for whatever
goods they've bought,

1169
00:54:07,770 --> 00:54:14,140
or they want to buy, and
then the user hopefully

1170
00:54:14,140 --> 00:54:17,700
will receive those goods.

1171
00:54:17,700 --> 00:54:20,450
So either in the
mail because they're

1172
00:54:20,450 --> 00:54:23,180
buying some type
of knockoff drug,

1173
00:54:23,180 --> 00:54:25,489
or they get some
software download

1174
00:54:25,489 --> 00:54:27,780
because they want to get some
fake version of Photoshop

1175
00:54:27,780 --> 00:54:28,780
or something like that.

1176
00:54:28,780 --> 00:54:33,870
And so the money flow
looks something like this.

1177
00:54:33,870 --> 00:54:38,840
We start with the
customer here, and they're

1178
00:54:38,840 --> 00:54:44,180
going to tell the merchant hey,
I want to go buy something.

1179
00:54:44,180 --> 00:54:47,430
They will send some
credit card info here,

1180
00:54:47,430 --> 00:54:50,050
and then the merchant is
going to talk to the payment

1181
00:54:50,050 --> 00:54:52,800
processor.

1182
00:54:52,800 --> 00:54:54,840
And this is
essentially a middleman

1183
00:54:54,840 --> 00:54:58,650
that helps the
merchant, the spammer,

1184
00:54:58,650 --> 00:55:00,710
deal with some of the
intricacies of interacting

1185
00:55:00,710 --> 00:55:03,160
with the credit card system.

1186
00:55:03,160 --> 00:55:07,320
The payment processor will
talk to the acquiring bank.

1187
00:55:10,097 --> 00:55:12,180
So the acquiring bank,
that's the merchant's bank.

1188
00:55:17,630 --> 00:55:20,000
And then the acquiring bank--
running out of space here.

1189
00:55:20,000 --> 00:55:24,120
So, violating all
good design standards,

1190
00:55:24,120 --> 00:55:25,880
we will come up here.

1191
00:55:25,880 --> 00:55:28,860
So the acquiring bank is
then going to talk to-- they

1192
00:55:28,860 --> 00:55:33,400
call them in the paper
the association network,

1193
00:55:33,400 --> 00:55:35,940
but just think of this as Visa.

1194
00:55:35,940 --> 00:55:40,170
This is the credit
card network up here.

1195
00:55:40,170 --> 00:55:42,290
And then finally the
association network,

1196
00:55:42,290 --> 00:55:48,460
Visa or MasterCard or whatever,
talks to the issuing bank.

1197
00:55:48,460 --> 00:55:52,070
So that issuing bank
is the customer's bank.

1198
00:55:52,070 --> 00:55:57,067
And essentially
the Visa or whoever

1199
00:55:57,067 --> 00:55:59,150
is going to go to the
customer's bank and say hey,

1200
00:55:59,150 --> 00:56:00,191
is this a legit purchase?

1201
00:56:00,191 --> 00:56:01,570
Is this a legit transaction?

1202
00:56:01,570 --> 00:56:03,280
And if this is a
legit transaction,

1203
00:56:03,280 --> 00:56:04,970
then the money
will actually flow

1204
00:56:04,970 --> 00:56:06,255
through this entire system.

1205
00:56:06,255 --> 00:56:11,810
So this is what the end-to-end
financial workflow looks like.

1206
00:56:11,810 --> 00:56:13,992
And so this workflow
can actually

1207
00:56:13,992 --> 00:56:14,950
process a lot of money.

1208
00:56:14,950 --> 00:56:18,030
So one of the papers that we
mentioned in the lecture notes

1209
00:56:18,030 --> 00:56:20,090
shows that a single
affiliate can

1210
00:56:20,090 --> 00:56:23,530
get more than $10 million
through this workflow here.

1211
00:56:23,530 --> 00:56:26,580
And so in practice, you
might think that oh,

1212
00:56:26,580 --> 00:56:29,610
why wouldn't the acquiring
bank or the issuing

1213
00:56:29,610 --> 00:56:31,980
bank say, something
looks kind of fishy here?

1214
00:56:31,980 --> 00:56:35,740
As it turns out, in many
cases, they don't.

1215
00:56:35,740 --> 00:56:37,960
And so this gets into this
interesting discussion

1216
00:56:37,960 --> 00:56:45,580
about why is it that these
workflows are often tolerated

1217
00:56:45,580 --> 00:56:46,790
by the financial system.

1218
00:56:46,790 --> 00:56:54,480
For example, why do
spammers properly

1219
00:56:54,480 --> 00:56:55,650
classify their transactions?

1220
00:56:58,930 --> 00:57:05,160
So if you want to send
something through this system,

1221
00:57:05,160 --> 00:57:08,942
you have to tag that transaction
with some type of type.

1222
00:57:08,942 --> 00:57:10,650
You have to say, this
is pharmaceuticals,

1223
00:57:10,650 --> 00:57:13,250
this is software, this is
whatever, this is whatever.

1224
00:57:13,250 --> 00:57:15,300
So you might think
that as a spammer,

1225
00:57:15,300 --> 00:57:18,390
you wouldn't actually
want to do this.

1226
00:57:18,390 --> 00:57:22,157
If you were selling fake
Flintstones vitamins,

1227
00:57:22,157 --> 00:57:23,990
maybe you don't want
to say this is actually

1228
00:57:23,990 --> 00:57:25,810
a pharmaceutical transaction.

1229
00:57:25,810 --> 00:57:28,170
And what's interesting is
that spammers do actually

1230
00:57:28,170 --> 00:57:30,840
properly classify these
transactions in many cases.

1231
00:57:30,840 --> 00:57:37,660
And the reason is that there are
high fines if you misclassify.

1232
00:57:40,520 --> 00:57:46,590
So essentially what happens is
that these association networks

1233
00:57:46,590 --> 00:57:50,440
like Visa or Mastercard,
in many cases

1234
00:57:50,440 --> 00:57:52,985
they are OK, perhaps,
with transactions

1235
00:57:52,985 --> 00:57:54,730
that are slightly shady.

1236
00:57:54,730 --> 00:57:57,810
But they don't want to be blamed
for being a money launderer,

1237
00:57:57,810 --> 00:58:00,330
or for trying to
deceive the authorities.

1238
00:58:00,330 --> 00:58:04,480
So as long as you properly
classify what you do, then

1239
00:58:04,480 --> 00:58:06,970
in a certain sense this
gives the association

1240
00:58:06,970 --> 00:58:08,790
networks a little
bit of, well, listen,

1241
00:58:08,790 --> 00:58:10,700
they told us what was going on.

1242
00:58:10,700 --> 00:58:12,540
Maybe the law was a
little bit unclear.

1243
00:58:12,540 --> 00:58:14,410
But we, at least,
Visa or MasterCard,

1244
00:58:14,410 --> 00:58:18,140
did not try to hide the
intent of this transaction.

1245
00:58:18,140 --> 00:58:20,195
So spammers do oftentimes
properly classify

1246
00:58:20,195 --> 00:58:22,750
their transactions.

1247
00:58:22,750 --> 00:58:23,879
So that's interesting.

1248
00:58:23,879 --> 00:58:25,920
It seems like they're
playing within the confines

1249
00:58:25,920 --> 00:58:27,520
of the system a little bit.

1250
00:58:27,520 --> 00:58:30,450
So another question
I mentioned earlier

1251
00:58:30,450 --> 00:58:33,970
is, why send anything to users?

1252
00:58:38,240 --> 00:58:41,400
Because presumably you're a
spammer, so you're a criminal,

1253
00:58:41,400 --> 00:58:41,900
right?

1254
00:58:41,900 --> 00:58:45,545
So why wouldn't it just be cool
if you just took people's money

1255
00:58:45,545 --> 00:58:46,340
and then ran?

1256
00:58:46,340 --> 00:58:48,050
I mean, that'd be
the ultimate crime.

1257
00:58:48,050 --> 00:58:53,260
So as it turns out, they
actually send things to users

1258
00:58:53,260 --> 00:58:59,150
because, surprise surprise,
high fines if they don't.

1259
00:58:59,150 --> 00:59:00,780
So it's this very
entertaining system

1260
00:59:00,780 --> 00:59:03,660
whereby spammers kind of want
to do things that are legal,

1261
00:59:03,660 --> 00:59:06,000
when they actually
can't use Bitcoins yet.

1262
00:59:06,000 --> 00:59:08,634
They actually have to work
within the constraints

1263
00:59:08,634 --> 00:59:09,800
of this pre-existing system.

1264
00:59:09,800 --> 00:59:12,485
So as it turns out, there
are these high fines

1265
00:59:12,485 --> 00:59:19,000
if you, and by you
I mean the spammer,

1266
00:59:19,000 --> 00:59:20,160
have too many chargebacks.

1267
00:59:24,050 --> 00:59:29,370
So a chargeback is
essentially when a customer

1268
00:59:29,370 --> 00:59:31,280
tells their credit
card company, hey,

1269
00:59:31,280 --> 00:59:34,805
I didn't get the thing that
I was supposed to get that I

1270
00:59:34,805 --> 00:59:36,040
bought with your credit card.

1271
00:59:36,040 --> 00:59:38,120
Or I got it, but
I didn't like it.

1272
00:59:38,120 --> 00:59:41,400
So if you're a spammer and you
have too many customers saying

1273
00:59:41,400 --> 00:59:43,150
things like this,
then you will actually

1274
00:59:43,150 --> 00:59:45,580
get charged very,
very high fines.

1275
00:59:45,580 --> 00:59:50,550
And as we saw earlier, the
clickthrough rates for spam

1276
00:59:50,550 --> 00:59:52,285
are super, super low.

1277
00:59:52,285 --> 00:59:55,070
The conversion rates
are super, super low.

1278
00:59:55,070 --> 00:59:58,290
So even just one or two
fines might wipe out

1279
00:59:58,290 --> 01:00:00,302
your entire profit
for a month, let's

1280
01:00:00,302 --> 01:00:01,510
say, for something like this.

1281
01:00:01,510 --> 01:00:03,860
So spammers are really
motivated to avoid these fines

1282
01:00:03,860 --> 01:00:04,850
in both cases.

1283
01:00:04,850 --> 01:00:07,920
AUDIENCE: Would using
PayPal obscure any of that,

1284
01:00:07,920 --> 01:00:10,590
like the relationship
with the bank?

1285
01:00:10,590 --> 01:00:13,690
PROFESSOR: Well,
typically, yes and no.

1286
01:00:13,690 --> 01:00:17,930
So you can think of those--
PayPal is in many respects

1287
01:00:17,930 --> 01:00:20,410
very similar to
Visa or MasterCard.

1288
01:00:20,410 --> 01:00:24,420
So it has very similar
regulations that oversee it,

1289
01:00:24,420 --> 01:00:27,080
because it bears many of
the same types of risks.

1290
01:00:27,080 --> 01:00:31,122
I do think that Visa
has slightly stricter

1291
01:00:31,122 --> 01:00:32,580
restrictions on
some of this stuff,

1292
01:00:32,580 --> 01:00:34,000
as we'll talk about in a second.

1293
01:00:34,000 --> 01:00:35,375
But for all intents
and purposes,

1294
01:00:35,375 --> 01:00:37,012
PayPal looks very similar.

1295
01:00:37,012 --> 01:00:39,200
AUDIENCE: Is there
any sort of idea

1296
01:00:39,200 --> 01:00:42,405
of having a group where you
make some sort of account

1297
01:00:42,405 --> 01:00:44,520
and then intentionally go
to a bunch of spammers,

1298
01:00:44,520 --> 01:00:48,180
buy a bunch of things, and then
ask for a bunch of chargebacks

1299
01:00:48,180 --> 01:00:50,590
whether or not they
send it to you?

1300
01:00:50,590 --> 01:00:52,470
So that they incur these fines.

1301
01:00:52,470 --> 01:00:55,110
Or report them for
misclassifying things,

1302
01:00:55,110 --> 01:00:57,540
in order to just make
them pay these fines.

1303
01:00:57,540 --> 01:00:59,830
PROFESSOR: That's interesting.

1304
01:00:59,830 --> 01:01:00,706
It's like vigilantes.

1305
01:01:00,706 --> 01:01:01,871
AUDIENCE: Spam the spammers.

1306
01:01:01,871 --> 01:01:03,030
PROFESSOR: Yeah, exactly.

1307
01:01:03,030 --> 01:01:04,988
I don't know if I've
heard anything about that.

1308
01:01:04,988 --> 01:01:09,630
I do know that the
spammers do try to detect

1309
01:01:09,630 --> 01:01:11,350
people who are trolling them.

1310
01:01:11,350 --> 01:01:14,710
So for example, one thing that
they talked about in the paper

1311
01:01:14,710 --> 01:01:18,160
a little bit is that
spammers-- so how

1312
01:01:18,160 --> 01:01:21,519
did the authors of the
paper determine all this?

1313
01:01:21,519 --> 01:01:23,310
They actually got a
bunch of spam messages,

1314
01:01:23,310 --> 01:01:24,685
they clicked on
a bunch of stuff.

1315
01:01:24,685 --> 01:01:26,230
They got a special
Visa card they

1316
01:01:26,230 --> 01:01:28,870
used to purchase this stuff,
and then so on and so forth.

1317
01:01:28,870 --> 01:01:31,250
So spammers obviously
don't like this.

1318
01:01:31,250 --> 01:01:33,810
And so in the paper they
call this test buys.

1319
01:01:33,810 --> 01:01:35,620
Spammers want to
prevent these test buys

1320
01:01:35,620 --> 01:01:38,430
from researchers who are trying
to figure out what's going on.

1321
01:01:38,430 --> 01:01:41,990
So one thing that some spammers
did-- do, I should say--

1322
01:01:41,990 --> 01:01:45,330
is they actually require
proof of your identity

1323
01:01:45,330 --> 01:01:46,730
before you can buy something.

1324
01:01:46,730 --> 01:01:49,820
So they might ask you to send
a picture of your photo ID,

1325
01:01:49,820 --> 01:01:51,470
or something like that.

1326
01:01:51,470 --> 01:01:53,790
In particular,
some people started

1327
01:01:53,790 --> 01:01:58,000
doing this after Visa tightened
up some of their rules

1328
01:01:58,000 --> 01:01:58,720
about spam.

1329
01:01:58,720 --> 01:02:04,500
Now, the problem with this
is that most people who

1330
01:02:04,500 --> 01:02:07,000
would click on spam
apparently are still

1331
01:02:07,000 --> 01:02:10,470
reluctant to send their photo
ID to just some random person.

1332
01:02:10,470 --> 01:02:12,527
So there's a bunch
of-- I've linked

1333
01:02:12,527 --> 01:02:14,027
one of these articles
in the lecture

1334
01:02:14,027 --> 01:02:15,460
notes-- there's a bunch
of hilarious commentary

1335
01:02:15,460 --> 01:02:18,200
from a spammer bulletin board,
where they say oh no, Visa's

1336
01:02:18,200 --> 01:02:19,260
cracking down on us.

1337
01:02:19,260 --> 01:02:21,390
We try to ask for
people's photo IDs,

1338
01:02:21,390 --> 01:02:23,820
but they don't want to send
it to us for some reason.

1339
01:02:23,820 --> 01:02:25,840
And it's so weird that people
wouldn't want to do that,

1340
01:02:25,840 --> 01:02:27,490
but they will give them
their credit card number.

1341
01:02:27,490 --> 01:02:29,198
But anyway, so long
story short, spammers

1342
01:02:29,198 --> 01:02:33,375
are highly incentivized to try
to detect that kind of stuff.

1343
01:02:33,375 --> 01:02:36,854
AUDIENCE: So for chargebacks,
if you don't necessarily

1344
01:02:36,854 --> 01:02:40,333
want your bank to know that you
were buying these completely

1345
01:02:40,333 --> 01:02:44,309
shady items, do a lot of
users actually do chargebacks

1346
01:02:44,309 --> 01:02:45,800
if they don't get the item?

1347
01:02:45,800 --> 01:02:47,800
Or are they too embarrassed?

1348
01:02:47,800 --> 01:02:49,466
PROFESSOR: Yeah,
that's a good question.

1349
01:02:49,466 --> 01:02:52,540
I don't know what
fraction of people

1350
01:02:52,540 --> 01:02:54,890
are in the set of
people who bought

1351
01:02:54,890 --> 01:02:56,830
herbal Flintstones
vitamins, were disappointed

1352
01:02:56,830 --> 01:02:58,290
by herbal Flintstones
vitamins, and then,

1353
01:02:58,290 --> 01:03:00,706
yeah, told their bank-- but
what's interesting, though, is

1354
01:03:00,706 --> 01:03:03,016
that the bank has to
know in the first place

1355
01:03:03,016 --> 01:03:04,390
that they're going
to this place,

1356
01:03:04,390 --> 01:03:06,120
right, because the
thing went through.

1357
01:03:06,120 --> 01:03:09,634
So avoiding the chargeback, I
don't think you're going to--

1358
01:03:09,634 --> 01:03:11,300
but by doing the
chargeback, let me say,

1359
01:03:11,300 --> 01:03:13,799
I don't think you'd reveal any
extra information to the bank

1360
01:03:13,799 --> 01:03:15,000
that they wouldn't already know.

1361
01:03:15,000 --> 01:03:17,291
Because they had to clear
the transaction first for you

1362
01:03:17,291 --> 01:03:19,000
to actually get it
and be disappointed.

1363
01:03:19,000 --> 01:03:22,320
AUDIENCE: So then roughly how
many chargebacks is too much?

1364
01:03:22,320 --> 01:03:24,410
PROFESSOR: So some of the
figures I've heard here

1365
01:03:24,410 --> 01:03:26,862
are greater than 1%.

1366
01:03:26,862 --> 01:03:28,445
So in other words,
if you're a spammer

1367
01:03:28,445 --> 01:03:30,890
and you have more than 1%
of your transactions causing

1368
01:03:30,890 --> 01:03:33,142
these problems,
you get in trouble.

1369
01:03:33,142 --> 01:03:35,475
And I wouldn't be surprised
if it was a little bit lower

1370
01:03:35,475 --> 01:03:37,794
than that, but 1% is the
number that I've heard.

1371
01:03:41,220 --> 01:03:41,720
All right.

1372
01:03:41,720 --> 01:03:44,540
So to me, like I
said, this was one

1373
01:03:44,540 --> 01:03:46,607
of the most interesting
parts of the paper.

1374
01:03:46,607 --> 01:03:48,940
Because I would have thought
that a lot of spamming just

1375
01:03:48,940 --> 01:03:50,234
involved straight-up fraud.

1376
01:03:50,234 --> 01:03:52,150
That people clicked on
links, they sent money,

1377
01:03:52,150 --> 01:03:53,149
they never got anything.

1378
01:03:53,149 --> 01:03:55,272
But as it turns out,
because these spammers have

1379
01:03:55,272 --> 01:03:58,130
to go through this
network which has

1380
01:03:58,130 --> 01:04:02,330
all these mechanisms
to prevent fraud,

1381
01:04:02,330 --> 01:04:06,892
they end up having to actually
ship things over to users.

1382
01:04:06,892 --> 01:04:10,030
So that's kind of neat.

1383
01:04:10,030 --> 01:04:12,400
And so another
reason why spammers

1384
01:04:12,400 --> 01:04:14,940
want to do these things,
properly classify transactions

1385
01:04:14,940 --> 01:04:16,610
and actually send
things to users,

1386
01:04:16,610 --> 01:04:24,650
is that only a few
banks are actually

1387
01:04:24,650 --> 01:04:28,320
willing to interact
with spammers.

1388
01:04:32,590 --> 01:04:38,894
And so what this means
is that if the spammer is

1389
01:04:38,894 --> 01:04:40,560
getting a lot of
chargebacks, or getting

1390
01:04:40,560 --> 01:04:42,685
in trouble with the bank
or the credit card company

1391
01:04:42,685 --> 01:04:44,549
or whatever, and
some bank decides,

1392
01:04:44,549 --> 01:04:46,090
I can't do business
with you anymore,

1393
01:04:46,090 --> 01:04:49,030
there's not a really
large set of other banks

1394
01:04:49,030 --> 01:04:53,120
that the spammer could go to
to continue their chicanery.

1395
01:04:53,120 --> 01:04:57,440
So one study of this stuff
found that there are basically

1396
01:04:57,440 --> 01:05:06,290
only 30 acquiring banks that
spammers were seen to use over

1397
01:05:06,290 --> 01:05:07,530
some two-year period.

1398
01:05:07,530 --> 01:05:09,360
That's actually not very high.

1399
01:05:09,360 --> 01:05:14,166
So there is this
other incentive to not

1400
01:05:14,166 --> 01:05:15,790
be too goofy with
the financial system,

1401
01:05:15,790 --> 01:05:18,165
because you don't really have
too many other places to go

1402
01:05:18,165 --> 01:05:20,300
if you break those
relationships.

1403
01:05:20,300 --> 01:05:25,140
So it seems like maybe
this is a good choke point

1404
01:05:25,140 --> 01:05:26,910
to try to cut down on spam.

1405
01:05:26,910 --> 01:05:29,075
So we've already discussed
how things like botnets

1406
01:05:29,075 --> 01:05:31,140
give the attacker a
lot of IP addresses.

1407
01:05:31,140 --> 01:05:33,919
There's a lot of
different types of hosts

1408
01:05:33,919 --> 01:05:36,210
who are willing to run web
servers, so on and so forth.

1409
01:05:36,210 --> 01:05:37,751
But this number
actually seems small.

1410
01:05:37,751 --> 01:05:41,660
So maybe we can actually
attack spamming here.

1411
01:05:41,660 --> 01:05:43,920
But as I alluded to earlier,
it's a little bit tricky

1412
01:05:43,920 --> 01:05:46,900
to do this because of things
like differing IP laws,

1413
01:05:46,900 --> 01:05:50,290
because of things
like the fact that it

1414
01:05:50,290 --> 01:05:54,830
can be sort of tricky to
actually say that spammers

1415
01:05:54,830 --> 01:05:57,560
are doing something illegal.

1416
01:05:57,560 --> 01:06:00,230
So if you are
using spam messages

1417
01:06:00,230 --> 01:06:03,220
to sell someone-- let's make
this up, let's say sugar,

1418
01:06:03,220 --> 01:06:04,130
sugar's delicious.

1419
01:06:04,130 --> 01:06:07,252
It's not illegal to sell
sugar, even at cut-rate prices.

1420
01:06:07,252 --> 01:06:08,710
So even though the
way that you may

1421
01:06:08,710 --> 01:06:11,400
have drawn the user
to that purchase

1422
01:06:11,400 --> 01:06:13,970
was sort of
duplicitous or gross,

1423
01:06:13,970 --> 01:06:17,180
it is not in and of itself
illegal to sell someone sugar.

1424
01:06:17,180 --> 01:06:18,860
And so as it turns
out, a lot of spam

1425
01:06:18,860 --> 01:06:21,647
sort of falls into
this gray area,

1426
01:06:21,647 --> 01:06:23,480
where the things that
the spammers are doing

1427
01:06:23,480 --> 01:06:26,510
are distasteful, but maybe
not necessarily as illegal

1428
01:06:26,510 --> 01:06:27,370
as you'd think.

1429
01:06:27,370 --> 01:06:30,350
Now, for stuff like
pirated software,

1430
01:06:30,350 --> 01:06:31,742
there it's much more clear-cut.

1431
01:06:31,742 --> 01:06:33,700
But suffice it to say,
it's not always the case

1432
01:06:33,700 --> 01:06:35,710
that you can just point to one
of these banks and say hey,

1433
01:06:35,710 --> 01:06:36,918
your customers are criminals.

1434
01:06:36,918 --> 01:06:38,220
Because that's not always true.

1435
01:06:38,220 --> 01:06:44,870
Particularly if there's not a
very strong paper trail that

1436
01:06:44,870 --> 01:06:48,230
attaches the financial
transaction to some spam

1437
01:06:48,230 --> 01:06:51,160
URL that was the origin
of the transaction.

1438
01:06:51,160 --> 01:06:55,050
It's often very difficult to
prove those types of links.

1439
01:06:55,050 --> 01:06:58,260
OK, so since this
paper was published,

1440
01:06:58,260 --> 01:07:00,952
the credit card networks
have taken some actions.

1441
01:07:00,952 --> 01:07:02,910
So this paper actually
made a pretty big splash

1442
01:07:02,910 --> 01:07:04,100
when it came out.

1443
01:07:04,100 --> 01:07:07,430
And so the association networks
like Visa and MasterCard

1444
01:07:07,430 --> 01:07:09,560
and all of them were
wondering, what can we

1445
01:07:09,560 --> 01:07:13,510
do to cut down on
some of this spam?

1446
01:07:13,510 --> 01:07:15,360
So interestingly, after
the paper came out,

1447
01:07:15,360 --> 01:07:18,710
some pharmaceutical companies
and software vendors actually

1448
01:07:18,710 --> 01:07:21,000
lodged complaints with Visa.

1449
01:07:21,000 --> 01:07:22,450
So if you remember
from the paper,

1450
01:07:22,450 --> 01:07:25,790
Visa was the association
network the researchers used

1451
01:07:25,790 --> 01:07:28,640
to make these test
buys, these dummy buys.

1452
01:07:28,640 --> 01:07:30,890
So it's a little
bit unfortunate,

1453
01:07:30,890 --> 01:07:33,600
but that then showed
some of these companies

1454
01:07:33,600 --> 01:07:37,510
that hey, Visa can be used
as the association network

1455
01:07:37,510 --> 01:07:39,280
to fund some of this
spam, or to translate

1456
01:07:39,280 --> 01:07:41,590
some of this spam traffic.

1457
01:07:41,590 --> 01:07:44,700
So some people
complained about that.

1458
01:07:44,700 --> 01:07:51,270
So Visa made some policy
changes in response

1459
01:07:51,270 --> 01:07:53,600
to some of the issues
that were brought up

1460
01:07:53,600 --> 01:07:56,460
in the paper and some
of the complaints

1461
01:07:56,460 --> 01:07:59,120
that they got as
a result. So now,

1462
01:07:59,120 --> 01:08:07,090
for example, all
pharmaceutical sales are now

1463
01:08:07,090 --> 01:08:11,780
labeled by Visa as high-risk.

1464
01:08:14,990 --> 01:08:19,439
So what this means is
that if a bank acts

1465
01:08:19,439 --> 01:08:27,859
as an acquirer for these
high-risk transactions,

1466
01:08:27,859 --> 01:08:31,569
then Visa will have some more
stringent regulations they will

1467
01:08:31,569 --> 01:08:34,460
put on that merchant-side bank.

1468
01:08:34,460 --> 01:08:36,729
For example, they
will require that bank

1469
01:08:36,729 --> 01:08:38,920
to engage in a risk
management program,

1470
01:08:38,920 --> 01:08:40,970
and they may be audited
more frequently,

1471
01:08:40,970 --> 01:08:42,229
and so on and so forth.

1472
01:08:42,229 --> 01:08:45,410
So Visa made that change.

1473
01:08:45,410 --> 01:08:52,430
And Visa also changed
its operating guidelines.

1474
01:08:52,430 --> 01:08:58,720
So its operating
guidelines, now they

1475
01:08:58,720 --> 01:09:07,220
explicitly enumerate and
forbid illegal sales of drugs

1476
01:09:07,220 --> 01:09:08,970
and trademark-infringing goods.

1477
01:09:12,050 --> 01:09:14,689
So the reason why they did
this is that by tightening up

1478
01:09:14,689 --> 01:09:17,270
this language, it is
now easier for them

1479
01:09:17,270 --> 01:09:21,737
to issue more aggressive fines
against banks and merchants

1480
01:09:21,737 --> 01:09:25,680
that they feel are doing
things like selling

1481
01:09:25,680 --> 01:09:29,859
illegal pharmaceuticals or
selling knockoff versions

1482
01:09:29,859 --> 01:09:32,065
of watches or things like that.

1483
01:09:32,065 --> 01:09:33,815
So once again, there's
still a lot of spam

1484
01:09:33,815 --> 01:09:36,590
that's in that gray area where
it's not necessarily illegal.

1485
01:09:36,590 --> 01:09:37,624
It's just that the
customers were required

1486
01:09:37,624 --> 01:09:38,665
to do certain techniques.

1487
01:09:38,665 --> 01:09:40,459
And this is very
useful because now Visa

1488
01:09:40,459 --> 01:09:44,450
can drop some much
bigger hammers on folks.

1489
01:09:44,450 --> 01:09:46,450
And as I mentioned before,
some of the spammers

1490
01:09:46,450 --> 01:09:48,420
tried to react to
this by saying,

1491
01:09:48,420 --> 01:09:50,880
well, let's just
prevent these test buys.

1492
01:09:50,880 --> 01:09:52,796
Because not only do
security researchers do

1493
01:09:52,796 --> 01:09:54,902
these test buys, but the
association networks can

1494
01:09:54,902 --> 01:09:55,860
do these test buys too.

1495
01:09:55,860 --> 01:09:58,160
So they did some things like
the photo ID type stuff,

1496
01:09:58,160 --> 01:10:01,820
and that tended not to
work out super well.

1497
01:10:01,820 --> 01:10:04,460
And so at least a few years
after these changes were made,

1498
01:10:04,460 --> 01:10:05,900
this did have an impact.

1499
01:10:05,900 --> 01:10:09,160
I'm not sure what the latest
state-of-the-art is with

1500
01:10:09,160 --> 01:10:12,014
respect to trolling these
Visa policy changes,

1501
01:10:12,014 --> 01:10:14,430
but it was kind of cool to see
this paper have this impact

1502
01:10:14,430 --> 01:10:16,574
in real life.

1503
01:10:16,574 --> 01:10:18,740
So one interesting thing
they mentioned in the paper

1504
01:10:18,740 --> 01:10:21,825
is they talked about
the ethical aspects

1505
01:10:21,825 --> 01:10:23,260
of doing security research.

1506
01:10:23,260 --> 01:10:27,960
And in particular, doing this
research about the spam chain.

1507
01:10:27,960 --> 01:10:31,530
To actually understand how some
of this banking stuff worked,

1508
01:10:31,530 --> 01:10:34,700
these researchers actually
had to make purchases.

1509
01:10:34,700 --> 01:10:37,890
They actually had to
give money to people

1510
01:10:37,890 --> 01:10:39,310
in exchange for these products.

1511
01:10:39,310 --> 01:10:41,420
And so in the paper they
go through this kind

1512
01:10:41,420 --> 01:10:44,857
of semi-hilarious defensive
section where they say,

1513
01:10:44,857 --> 01:10:46,690
we totally burned
everything that we bought.

1514
01:10:46,690 --> 01:10:47,398
We didn't use it.

1515
01:10:47,398 --> 01:10:49,972
We talked to the companies
whose pirated software we

1516
01:10:49,972 --> 01:10:51,320
were buying before we got it.

1517
01:10:51,320 --> 01:10:53,240
But these things are actually
pretty important to go through,

1518
01:10:53,240 --> 01:10:55,100
particularly if you're
within a university setting.

1519
01:10:55,100 --> 01:10:56,600
Because as you may
know, if you want

1520
01:10:56,600 --> 01:10:59,174
to do anything that involves--
particularly human research,

1521
01:10:59,174 --> 01:11:01,590
but anything that might have
these ethical sort of aspects

1522
01:11:01,590 --> 01:11:04,060
to it, you have to get things
cleared by lawyers, sometimes

1523
01:11:04,060 --> 01:11:06,121
by an IRB, and things like that.

1524
01:11:06,121 --> 01:11:07,870
So it's actually pretty
important for them

1525
01:11:07,870 --> 01:11:10,820
to jump through these hoops,
because at the end of the day

1526
01:11:10,820 --> 01:11:13,090
they have to at least be
somewhat confident that they

1527
01:11:13,090 --> 01:11:16,170
weren't supporting some
deeply nefarious activity

1528
01:11:16,170 --> 01:11:18,130
in some far-flung
corner of the world.

1529
01:11:18,130 --> 01:11:20,640
So that was another interesting
part of the paper, too.

1530
01:11:20,640 --> 01:11:23,390
And other people have talked in
this class about things like,

1531
01:11:23,390 --> 01:11:27,610
what are the ethics of releasing
zero-day exploits if you

1532
01:11:27,610 --> 01:11:29,360
know they haven't been
patched by someone?

1533
01:11:29,360 --> 01:11:30,818
So it's a really
interesting aspect

1534
01:11:30,818 --> 01:11:32,075
of doing security research.

1535
01:11:32,075 --> 01:11:36,350
AUDIENCE: Is there any sort of
oversight on security ethics?

1536
01:11:36,350 --> 01:11:39,042
Because in the paper, they
said the IRB wasn't interested.

1537
01:11:39,042 --> 01:11:41,000
PROFESSOR: Yeah, so that
was super interesting.

1538
01:11:41,000 --> 01:11:41,500
Yes.

1539
01:11:41,500 --> 01:11:44,470
They said the IRB wasn't
interested, I think,

1540
01:11:44,470 --> 01:11:48,940
because there was no
obvious human subject.

1541
01:11:48,940 --> 01:11:50,890
But I think that at
most universities,

1542
01:11:50,890 --> 01:11:53,015
you couldn't just
say, oh, there's

1543
01:11:53,015 --> 01:11:54,515
no direct human
subject, let me just

1544
01:11:54,515 --> 01:11:58,220
go buy some stuff from somebody
at the end of a spam link.

1545
01:11:58,220 --> 01:12:00,170
And what they describe
in the paper, actually

1546
01:12:00,170 --> 01:12:01,240
in the acknowledgment
section, they

1547
01:12:01,240 --> 01:12:02,730
thank this whole set of people.

1548
01:12:02,730 --> 01:12:06,024
Like, Sally at Legal,
so-and-so at the Philosophers

1549
01:12:06,024 --> 01:12:07,440
For Ethical Computing
Association,

1550
01:12:07,440 --> 01:12:09,440
and stuff like that.

1551
01:12:09,440 --> 01:12:12,650
I don't think there's
actually a, how would

1552
01:12:12,650 --> 01:12:16,820
you say it, an
America-wide standard

1553
01:12:16,820 --> 01:12:18,420
for doing this type of research.

1554
01:12:18,420 --> 01:12:20,070
I know that each
university's IRB

1555
01:12:20,070 --> 01:12:22,640
has slightly different policies
of what they do and do not

1556
01:12:22,640 --> 01:12:26,639
allow, but I don't think
there's a blanket policy.

1557
01:12:26,639 --> 01:12:29,477
AUDIENCE: Out of the
350 million spam URLs

1558
01:12:29,477 --> 01:12:33,840
they tracked, of the 28 that
actually responded, is there

1559
01:12:33,840 --> 01:12:37,554
any chance that an appreciable
number of those 28 spam

1560
01:12:37,554 --> 01:12:39,637
responses were coming from
researchers researching

1561
01:12:39,637 --> 01:12:42,332
on spam?

1562
01:12:42,332 --> 01:12:44,540
PROFESSOR: Well, it's true
that this type of calculus

1563
01:12:44,540 --> 01:12:46,320
is actually one
reason why I think

1564
01:12:46,320 --> 01:12:49,302
the authors went to such
lengths to defend themselves.

1565
01:12:49,302 --> 01:12:51,780
Because if you think
about it, the reason

1566
01:12:51,780 --> 01:12:53,680
why those statistics
are so hilarious

1567
01:12:53,680 --> 01:12:56,210
is that it means that if you
were to add five or remove

1568
01:12:56,210 --> 01:12:58,340
five, that's the difference
between a spammer being

1569
01:12:58,340 --> 01:13:00,090
able to give their
kids, like, a real gift

1570
01:13:00,090 --> 01:13:01,952
versus a piece of coal.

1571
01:13:01,952 --> 01:13:03,410
Because those
numbers are so small.

1572
01:13:06,712 --> 01:13:08,587
So with regard to that
particular [INAUDIBLE]

1573
01:13:08,587 --> 01:13:10,545
that I gave you, I don't
know how many of those

1574
01:13:10,545 --> 01:13:12,190
were researchers.

1575
01:13:12,190 --> 01:13:15,420
But I do think in general--
like I said, the spammers,

1576
01:13:15,420 --> 01:13:17,010
they want to take your money.

1577
01:13:17,010 --> 01:13:19,460
And so if they could
find some equilibrium

1578
01:13:19,460 --> 01:13:23,200
whereby security researchers
could do test buys,

1579
01:13:23,200 --> 01:13:25,650
but that had no impact
on their overall sales,

1580
01:13:25,650 --> 01:13:26,949
they'd be fine with that.

1581
01:13:26,949 --> 01:13:27,990
They just want the money.

1582
01:13:27,990 --> 01:13:29,615
But the tricky thing
is that, let's say

1583
01:13:29,615 --> 01:13:32,520
that-- let's make some
number up-- half of those 35

1584
01:13:32,520 --> 01:13:34,560
were test buys, and
that resulted in people

1585
01:13:34,560 --> 01:13:37,490
putting pressure on the banks,
and then instead of 35 they'd

1586
01:13:37,490 --> 01:13:38,470
be getting two.

1587
01:13:38,470 --> 01:13:39,380
That they don't want.

1588
01:13:39,380 --> 01:13:41,956
So that's why they're so
motivated to stop that stuff.

1589
01:13:41,956 --> 01:13:44,436
AUDIENCE: How much of
this is blind emailing

1590
01:13:44,436 --> 01:13:45,924
versus any sort of filtering?

1591
01:13:45,924 --> 01:13:48,652
Because I'm sure they
could run some models

1592
01:13:48,652 --> 01:13:51,380
and get that 350 million
down to, like, one page.

1593
01:13:51,380 --> 01:13:54,350
PROFESSOR: Yeah, so it's all
about the cost-benefit analysis

1594
01:13:54,350 --> 01:13:56,350
from the perspective
of the spammer.

1595
01:13:56,350 --> 01:13:59,660
So I think that you're right,
and there are actually--

1596
01:13:59,660 --> 01:14:02,922
there's a marketplace
for more targeted stuff.

1597
01:14:02,922 --> 01:14:05,380
In particular, that's where
some of those compromised email

1598
01:14:05,380 --> 01:14:07,650
accounts can become very useful.

1599
01:14:07,650 --> 01:14:10,170
But I think what you
see is that people

1600
01:14:10,170 --> 01:14:14,774
tend to go for the more
focused stuff, like the more

1601
01:14:14,774 --> 01:14:16,190
focused spam emails,
for what they

1602
01:14:16,190 --> 01:14:17,960
view as higher-reward targets.

1603
01:14:17,960 --> 01:14:21,240
So for example,
political groups.

1604
01:14:21,240 --> 01:14:24,010
People associated with the
Dalai Lama, for instance.

1605
01:14:24,010 --> 01:14:26,620
There, the perceived
value of being

1606
01:14:26,620 --> 01:14:28,260
able to get into that
system is so high

1607
01:14:28,260 --> 01:14:30,958
that people will spend the
time to do this kind of stuff.

1608
01:14:30,958 --> 01:14:32,333
AUDIENCE: It would
be interesting

1609
01:14:32,333 --> 01:14:33,940
if there was one
company dedicated

1610
01:14:33,940 --> 01:14:35,788
to finding all the
gullible grandmas

1611
01:14:35,788 --> 01:14:37,640
and putting their
emails into stuff.

1612
01:14:37,640 --> 01:14:38,270
PROFESSOR: Oh, interesting.

1613
01:14:38,270 --> 01:14:38,780
I see.

1614
01:14:38,780 --> 01:14:40,154
So basically having
some database

1615
01:14:40,154 --> 01:14:42,660
where it's like, totally send
spam to this person, because--

1616
01:14:42,660 --> 01:14:43,700
AUDIENCE: It works.

1617
01:14:43,700 --> 01:14:45,908
PROFESSOR: I wouldn't be
surprised if stuff like that

1618
01:14:45,908 --> 01:14:49,310
existed, but I don't
know if they do.

1619
01:14:49,310 --> 01:14:52,110
So one last thing that I
wanted to mention is that,

1620
01:14:52,110 --> 01:14:54,730
and I alluded to this a
bit earlier in the lecture,

1621
01:14:54,730 --> 01:14:57,970
that some companies have taken
to doing these things they

1622
01:14:57,970 --> 01:14:59,357
call hackbacks.

1623
01:14:59,357 --> 01:15:01,440
So the idea is that, let's
say that you're a bank,

1624
01:15:01,440 --> 01:15:02,981
someone tries to
break into your bank

1625
01:15:02,981 --> 01:15:04,440
and steal your information.

1626
01:15:04,440 --> 01:15:07,040
That bank will then,
of their own volition,

1627
01:15:07,040 --> 01:15:10,780
go back to those hackers
and try to do something.

1628
01:15:10,780 --> 01:15:13,116
Where something may be as
quote-unquote innocuous

1629
01:15:13,116 --> 01:15:15,090
as shutting down
the botnet, or maybe

1630
01:15:15,090 --> 01:15:16,920
they try to steal
their information back,

1631
01:15:16,920 --> 01:15:17,794
and things like that.

1632
01:15:17,794 --> 01:15:20,940
This has actually
become very much more

1633
01:15:20,940 --> 01:15:22,550
common than it used to be.

1634
01:15:22,550 --> 01:15:26,910
And one reason for this is that
because the legal system has

1635
01:15:26,910 --> 01:15:30,261
been a little bit slow in adapting
to some of these threats,

1636
01:15:30,261 --> 01:15:32,760
some of these institutions, in
particular software companies

1637
01:15:32,760 --> 01:15:34,852
and banks, are tired of
waiting for government--

1638
01:15:34,852 --> 01:15:36,560
like, their national
government-- to deal

1639
01:15:36,560 --> 01:15:37,540
with stuff.

1640
01:15:37,540 --> 01:15:40,630
So what ends up happening
is that, for example, there

1641
01:15:40,630 --> 01:15:43,000
was this big botnet
in 2013 that was

1642
01:15:43,000 --> 01:15:45,690
hosting all kinds of pirated
goods and things like that.

1643
01:15:45,690 --> 01:15:51,010
And so this huge coalition of
Microsoft, American Express,

1644
01:15:51,010 --> 01:15:53,350
PayPal, a bunch of them
launched an operation

1645
01:15:53,350 --> 01:15:55,379
to take down a botnet.

1646
01:15:55,379 --> 01:15:56,920
They themselves took
down the botnet.

1647
01:15:56,920 --> 01:15:58,586
They lurked around
for a while, they

1648
01:15:58,586 --> 01:16:01,210
learned about where the command
and control infrastructure was.

1649
01:16:01,210 --> 01:16:02,690
They actually went
in there, took

1650
01:16:02,690 --> 01:16:04,773
control of the command and
control infrastructure,

1651
01:16:04,773 --> 01:16:06,685
identified where all
the end-user bots were.

1652
01:16:06,685 --> 01:16:08,590
And they could send
them messages saying,

1653
01:16:08,590 --> 01:16:10,630
you need to patch your machine.

1654
01:16:10,630 --> 01:16:13,790
And so it's a very interesting
area of intersection

1655
01:16:13,790 --> 01:16:15,960
between security and the law.

1656
01:16:15,960 --> 01:16:17,850
Because what part
of American law,

1657
01:16:17,850 --> 01:16:21,810
for example, gave those
companies the right to do that?

1658
01:16:21,810 --> 01:16:24,880
So what Microsoft
lawyers said, at least,

1659
01:16:24,880 --> 01:16:26,530
is that these
botnets were

1660
01:16:26,530 --> 01:16:29,380
violating Microsoft trademarks.

1661
01:16:29,380 --> 01:16:31,450
So for example, if you
sell pirated goods,

1662
01:16:31,450 --> 01:16:34,222
and you're saying this
is Windows, for example,

1663
01:16:34,222 --> 01:16:36,180
but it's not actually
Windows or it didn't come

1664
01:16:36,180 --> 01:16:38,630
from an official channel,
then Microsoft says OK,

1665
01:16:38,630 --> 01:16:40,340
you're violating our trademark.

1666
01:16:40,340 --> 01:16:43,330
Therefore we can
hack your botnet.

1667
01:16:43,330 --> 01:16:46,980
It's a little interesting to
see how that leap of logic

1668
01:16:46,980 --> 01:16:47,760
took place.

1669
01:16:47,760 --> 01:16:49,280
But the courts allowed it.

1670
01:16:49,280 --> 01:16:51,440
And this is increasingly
happening more and more.

1671
01:16:51,440 --> 01:16:54,440
And the banks in particular seem
to be pretty upset about this,

1672
01:16:54,440 --> 01:16:57,386
because there seems to be a
lot of state-level sponsorship

1673
01:16:57,386 --> 01:16:58,835
of some of these banking hacks.

1674
01:16:58,835 --> 01:17:00,840
And the bankers care
about the money,

1675
01:17:00,840 --> 01:17:02,350
and so when they
lose this money,

1676
01:17:02,350 --> 01:17:04,000
they get very upset about that.

1677
01:17:04,000 --> 01:17:06,470
And so it's
interesting to see how

1678
01:17:06,470 --> 01:17:09,630
some of the burden
for doing cyber

1679
01:17:09,630 --> 01:17:11,940
security, in particular
offensive operations,

1680
01:17:11,940 --> 01:17:14,800
has now shifted a little bit
more to the private sector.

1681
01:17:14,800 --> 01:17:17,750
So it's not quite clear what
the long-term implications are.

1682
01:17:17,750 --> 01:17:18,250
OK.

1683
01:17:18,250 --> 01:17:19,770
That's the end of
the lecture, and I

1684
01:17:19,770 --> 01:17:21,890
guess we will see
you on Wednesday

1685
01:17:21,890 --> 01:17:25,240
and we'll go through
the class projects.