1 00:00:00,070 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,810 Commons license. 3 00:00:03,810 --> 00:00:06,060 Your support will help MIT OpenCourseWare 4 00:00:06,060 --> 00:00:10,150 continue to offer high-quality educational resources for free. 5 00:00:10,150 --> 00:00:12,700 To make a donation or to view additional materials 6 00:00:12,700 --> 00:00:16,600 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,600 --> 00:00:17,310 at ocw.mit.edu. 8 00:00:26,169 --> 00:00:27,210 PROFESSOR: Hey, everyone. 9 00:00:27,210 --> 00:00:28,076 Good on that? 10 00:00:28,076 --> 00:00:29,480 All right, cool. 11 00:00:29,480 --> 00:00:34,477 So today we're going to talk about the economics of spam 12 00:00:34,477 --> 00:00:35,740 and security in general. 13 00:00:35,740 --> 00:00:37,355 And so up to this point in the class, 14 00:00:37,355 --> 00:00:40,540 we've mainly talked about the technical aspects of security. 15 00:00:40,540 --> 00:00:42,680 So we've looked at things like buffer overflows, 16 00:00:42,680 --> 00:00:46,540 the same-origin policy, Tor, and all kinds of things like that. 17 00:00:46,540 --> 00:00:49,550 And so the context for that discussion 18 00:00:49,550 --> 00:00:53,780 was that we were looking at how an adversary can compromise 19 00:00:53,780 --> 00:00:54,560 a system. 20 00:00:54,560 --> 00:00:56,570 We tried to devise a threat model that 21 00:00:56,570 --> 00:00:58,820 would describe the types of things we want to prevent, 22 00:00:58,820 --> 00:01:00,320 and then we tried to think about how 23 00:01:00,320 --> 00:01:03,400 we could design systems that would help us to defend 24 00:01:03,400 --> 00:01:05,129 against that threat model. 25 00:01:05,129 --> 00:01:07,560 So today we're going to look at an altered perspective. 
26 00:01:07,560 --> 00:01:09,950 And the perspective that we'll look at today 27 00:01:09,950 --> 00:01:13,520 is, why is the attacker trying to compromise your system? 28 00:01:13,520 --> 00:01:17,189 Why is the attacker trying to do these evil things to us? 29 00:01:17,189 --> 00:01:18,730 And so there's a bunch of the reasons 30 00:01:18,730 --> 00:01:20,750 you can imagine why attackers might be 31 00:01:20,750 --> 00:01:22,510 trying to do these evil things. 32 00:01:22,510 --> 00:01:25,805 So some of these attacks are done for ideological reasons. 33 00:01:25,805 --> 00:01:27,804 So think about people who perceive 34 00:01:27,804 --> 00:01:30,220 themselves to be political activists, or things like that. 35 00:01:30,220 --> 00:01:32,950 Or if you think about Stuxnet, for example. 36 00:01:32,950 --> 00:01:35,490 Sometimes it's like governments attacking other governments. 37 00:01:35,490 --> 00:01:38,470 And so for these types of attacks 38 00:01:38,470 --> 00:01:41,265 money, economics, is not the primary motivation 39 00:01:41,265 --> 00:01:42,816 for the attack to take place. 40 00:01:42,816 --> 00:01:45,050 And what's interesting is that it's actually 41 00:01:45,050 --> 00:01:48,540 hard to make these attacks go away, other than generically 42 00:01:48,540 --> 00:01:51,357 making computers more secure. 43 00:01:51,357 --> 00:01:53,190 There's not really some financial thumbscrew 44 00:01:53,190 --> 00:01:57,010 you can turn to make these attackers disincentivized 45 00:01:57,010 --> 00:01:57,940 to do things. 46 00:01:57,940 --> 00:02:02,170 However, there are some types of attacks 47 00:02:02,170 --> 00:02:04,900 that do involve a strong economic component, 48 00:02:04,900 --> 00:02:07,690 and those are some of the things we're going to look at today. 
49 00:02:07,690 --> 00:02:08,990 One of the interesting things, though, 50 00:02:08,990 --> 00:02:09,929 is that for a lot of these attacks 51 00:02:09,929 --> 00:02:12,099 that don't have an economic component, in that we 52 00:02:12,099 --> 00:02:14,640 can't use regulations and things like that to try and prevent 53 00:02:14,640 --> 00:02:15,139 them, 54 00:02:15,139 --> 00:02:17,426 it can sometimes be difficult to figure out 55 00:02:17,426 --> 00:02:19,800 how we'd be able to stop them at all beyond, like I said, 56 00:02:19,800 --> 00:02:21,549 just trying to make computers more secure. 57 00:02:21,549 --> 00:02:23,570 For instance, Stuxnet's a great example. 58 00:02:23,570 --> 00:02:26,850 So this is the malware that was attacking 59 00:02:26,850 --> 00:02:30,740 some of the industrial software in Iran, with the centrifuges. 60 00:02:30,740 --> 00:02:34,430 So we all kind of know where Stuxnet came from, right? 61 00:02:34,430 --> 00:02:36,850 We basically know it was the Americans and the Israelis. 62 00:02:36,850 --> 00:02:37,370 Basically. 63 00:02:37,370 --> 00:02:40,000 But can we prove that in a court of law? 64 00:02:40,000 --> 00:02:43,344 Like, who can we sue, to say, "You put Stuxnet on our machine"? 65 00:02:43,344 --> 00:02:44,885 So it becomes a little bit murky when 66 00:02:44,885 --> 00:02:47,100 you have some of these attacks, where it's not clear 67 00:02:47,100 --> 00:02:49,720 you can sue the federal government, or you can sue Israel, 68 00:02:49,720 --> 00:02:50,770 for something like this. 69 00:02:50,770 --> 00:02:52,000 And furthermore, no one's gone on the record 70 00:02:52,000 --> 00:02:53,750 as officially claiming that it was them. 71 00:02:53,750 --> 00:02:56,660 So there's some very interesting legal and financial issues 72 00:02:56,660 --> 00:02:58,243 that get involved when you look at how 73 00:02:58,243 --> 00:02:59,460 to prevent these attacks. 
74 00:02:59,460 --> 00:03:01,770 So there are many kinds of computer crime 75 00:03:01,770 --> 00:03:04,440 that are driven by economic motivations. 76 00:03:04,440 --> 00:03:07,050 Take state-sponsored industrial espionage, 77 00:03:07,050 --> 00:03:07,819 for instance. 78 00:03:07,819 --> 00:03:10,110 So this is one thing that some of our previous speakers 79 00:03:10,110 --> 00:03:10,660 have talked about. 80 00:03:10,660 --> 00:03:12,230 Sometimes governments try to hack 81 00:03:12,230 --> 00:03:14,540 into other governments or other industries 82 00:03:14,540 --> 00:03:17,562 to steal intellectual property, or things like that. 83 00:03:17,562 --> 00:03:20,020 And what's interesting is that, like the attacks that we'll 84 00:03:20,020 --> 00:03:21,840 look at today, which are spam, you'll 85 00:03:21,840 --> 00:03:24,750 see that it actually takes some money to make some money. 86 00:03:24,750 --> 00:03:27,770 Spammers actually have to invest in an infrastructure 87 00:03:27,770 --> 00:03:30,100 before they can actually send these messages out. 88 00:03:30,100 --> 00:03:32,630 And so if you have these attacks where it takes money 89 00:03:32,630 --> 00:03:34,290 to make money, and you can figure out 90 00:03:34,290 --> 00:03:37,314 what that financial sort of tool chain looks like, then 91 00:03:37,314 --> 00:03:38,730 maybe you can think about applying 92 00:03:38,730 --> 00:03:43,580 upstream financial pressure to stop those downstream malware 93 00:03:43,580 --> 00:03:46,470 attacks or security problems. 94 00:03:46,470 --> 00:03:47,900 And so I think the take-home point 95 00:03:47,900 --> 00:03:50,840 is that if we look at the context of spam in particular, 96 00:03:50,840 --> 00:03:54,550 spammers will stop sending spam if it becomes unprofitable. 
97 00:03:54,550 --> 00:03:56,980 One of the sad truths of the world is that we continue 98 00:03:56,980 --> 00:03:59,260 to get spam messages because it's cheap for them 99 00:03:59,260 --> 00:04:02,465 to send them, and 2% to 3% of our fellow human beings 100 00:04:02,465 --> 00:04:05,050 will actually click on links and look at stuff. 101 00:04:05,050 --> 00:04:08,430 And so as long as these costs for sending these messages out 102 00:04:08,430 --> 00:04:10,775 are so low, then even if the hit rates are low, 103 00:04:10,775 --> 00:04:12,900 people can still make money off that kind of stuff. 104 00:04:12,900 --> 00:04:19,200 So for today we're going to look at attacks 105 00:04:19,200 --> 00:04:24,266 that have a significant economic component to them. 106 00:04:27,020 --> 00:04:30,110 And so one interesting example which I actually just 107 00:04:30,110 --> 00:04:33,490 read about takes place in China. 108 00:04:33,490 --> 00:04:37,710 And so in China they have this problem 109 00:04:37,710 --> 00:04:41,680 with what they call text message cars. 110 00:04:41,680 --> 00:04:46,350 So the basic idea here is that people drive around 111 00:04:46,350 --> 00:04:49,790 with these cars that have these radio antennas attached 112 00:04:49,790 --> 00:04:50,730 to the side. 113 00:04:50,730 --> 00:04:52,770 And they can essentially do-- think of it 114 00:04:52,770 --> 00:04:55,520 almost like a man in the middle between people's mobile cell 115 00:04:55,520 --> 00:04:57,850 phones and the actual cellphone tower. 116 00:04:57,850 --> 00:05:00,360 And so they can basically run around in these troll cars, 117 00:05:00,360 --> 00:05:02,420 and they can get all of these cell phone numbers, 118 00:05:02,420 --> 00:05:06,600 and then use that car to send spam messages directly 119 00:05:06,600 --> 00:05:09,190 to the numbers that they've collected using 120 00:05:09,190 --> 00:05:12,040 this sort of vehicle take. 
121 00:05:12,040 --> 00:05:13,850 So these text message cars can actually 122 00:05:13,850 --> 00:05:21,440 send upward of 200,000 messages a day, 123 00:05:21,440 --> 00:05:23,100 which is an incredibly high number. 124 00:05:23,100 --> 00:05:25,630 And the cost of labor over there is actually very cheap. 125 00:05:25,630 --> 00:05:28,134 So it's very inexpensive to hire a driver, 126 00:05:28,134 --> 00:05:29,800 drive around one of these cars, and just 127 00:05:29,800 --> 00:05:32,070 snoop on people's traffic and send them spam. 128 00:05:32,070 --> 00:05:33,970 So let's look at the economics of this. 129 00:05:33,970 --> 00:05:40,530 So what is the cost of the evil antenna, 130 00:05:40,530 --> 00:05:43,350 this thing that allows people to take 131 00:05:43,350 --> 00:05:45,630 these messages off the air? 132 00:05:45,630 --> 00:05:50,530 Roughly speaking, it's somewhere in the order of about 133 00:05:50,530 --> 00:05:53,790 1600 bucks, give or take. 134 00:05:53,790 --> 00:05:59,760 So how much profit can these people make a day? 135 00:05:59,760 --> 00:06:01,470 So in a hilarious coincidence, this 136 00:06:01,470 --> 00:06:06,074 is also roughly 1600 dollars. 137 00:06:06,074 --> 00:06:07,240 So this is very interesting. 138 00:06:07,240 --> 00:06:10,230 What this means is that once you buy one of these things, 139 00:06:10,230 --> 00:06:12,872 then in a day essentially you've made back your money. 140 00:06:12,872 --> 00:06:16,260 So that's great, from the perspective of being a spammer. 141 00:06:16,260 --> 00:06:18,835 Now you might say, OK, but you might get caught by the police 142 00:06:18,835 --> 00:06:21,210 and then you might get put in jail or have to pay a fine. 143 00:06:21,210 --> 00:06:29,650 So in the case of the fines, the fines for getting caught 144 00:06:29,650 --> 00:06:32,100 are less than 5K. 145 00:06:35,220 --> 00:06:37,810 And people rarely get caught. 
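The arithmetic behind the text message car can be sketched in a few lines of Python. This is only an illustration: the antenna cost, daily profit, and fine ceiling are the rough figures quoted in the lecture, while the number of times an operator is caught per year and the number of operating days are assumptions made up for the example, and the function names are hypothetical.

```python
# Rough figures quoted in the lecture (US dollars).
ANTENNA_COST = 1600   # one-time hardware cost, "give or take"
DAILY_PROFIT = 1600   # approximate profit per day of operation
MAX_FINE = 5000       # fines are "less than 5K", so this is an upper bound

# Assumptions for illustration only; the lecture gives no exact values.
CATCHES_PER_YEAR = 2
OPERATING_DAYS = 365


def payback_days(hardware_cost, daily_profit):
    """Days of operation needed to recover the hardware cost."""
    return hardware_cost / daily_profit


def expected_annual_profit(daily_profit, hardware_cost, fine, catches, days):
    """Yearly profit after paying for hardware once and each fine in full."""
    return daily_profit * days - hardware_cost - fine * catches


def break_even_fine(daily_profit, hardware_cost, catches, days):
    """Fine per catch at which a year of spamming earns exactly nothing."""
    return (daily_profit * days - hardware_cost) / catches


if __name__ == "__main__":
    print(payback_days(ANTENNA_COST, DAILY_PROFIT))
    print(expected_annual_profit(DAILY_PROFIT, ANTENNA_COST, MAX_FINE,
                                 CATCHES_PER_YEAR, OPERATING_DAYS))
    print(break_even_fine(DAILY_PROFIT, ANTENNA_COST,
                          CATCHES_PER_YEAR, OPERATING_DAYS))
```

Under these assumed inputs the fine would have to be dozens of times larger than 5K before it cancelled out a year of profit, which is one way to see why fines at this level are a weak deterrent.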
146 00:06:37,810 --> 00:06:40,215 And so these are the types of calculations 147 00:06:40,215 --> 00:06:41,715 we have to look at when we're trying 148 00:06:41,715 --> 00:06:44,360 to think about how to economically deter 149 00:06:44,360 --> 00:06:45,620 these spammers. 150 00:06:45,620 --> 00:06:47,060 Because if these spammers only get 151 00:06:47,060 --> 00:06:49,870 caught a couple times a year, and they basically 152 00:06:49,870 --> 00:06:52,570 make back their hardware costs in a single day, 153 00:06:52,570 --> 00:06:54,360 it's very tricky to figure out how 154 00:06:54,360 --> 00:06:56,605 we can use financial disincentives to make them 155 00:06:56,605 --> 00:06:58,330 stop doing this kind of stuff. 156 00:06:58,330 --> 00:07:02,790 And what's interesting is that in China the mobile carriers 157 00:07:02,790 --> 00:07:05,160 are also somewhat complicit in this scheme. 158 00:07:05,160 --> 00:07:06,740 So every time you send a spam, you're 159 00:07:06,740 --> 00:07:09,540 going to send some small amount of money to the mobile carrier, 160 00:07:09,540 --> 00:07:09,720 right? 161 00:07:09,720 --> 00:07:10,420 A couple cents. 162 00:07:10,420 --> 00:07:11,970 It works that way over here as well. 163 00:07:11,970 --> 00:07:14,280 Now over here, in many cases, 164 00:07:14,280 --> 00:07:16,450 the mobile carriers have decided that they 165 00:07:16,450 --> 00:07:18,610 don't want angry customers contacting them saying, 166 00:07:18,610 --> 00:07:20,970 I'm getting hit by these spam messages all the time. 167 00:07:20,970 --> 00:07:23,410 But apparently a lot of the Chinese mobile carriers, 168 00:07:23,410 --> 00:07:24,910 at least the top three ones, they're 169 00:07:24,910 --> 00:07:26,780 actually seeing these spam messages 170 00:07:26,780 --> 00:07:29,070 as a source of revenue. 171 00:07:29,070 --> 00:07:31,970 They actually think this is a nice way for them 172 00:07:31,970 --> 00:07:32,950 to get some free money. 
173 00:07:32,950 --> 00:07:36,810 So in fact these telcos have set up these things 174 00:07:36,810 --> 00:07:41,414 called 106 prefix numbers. 175 00:07:41,414 --> 00:07:44,190 I don't know if you've heard of these before. 176 00:07:44,190 --> 00:07:44,849 [BANGING] 177 00:07:44,849 --> 00:07:48,521 But the original-- there's apparently a ghost in the room. 178 00:07:48,521 --> 00:07:50,810 The original purpose of these numbers 179 00:07:50,810 --> 00:07:53,710 was to do things for non-commercial reasons. 180 00:07:53,710 --> 00:07:56,180 For example, imagine that you run a company, 181 00:07:56,180 --> 00:07:58,120 and you want to send a bunch of text messages 182 00:07:58,120 --> 00:07:59,540 to all of your employees. 183 00:07:59,540 --> 00:08:02,205 You can use one of these 106 numbers, 184 00:08:02,205 --> 00:08:05,730 and you would basically be able to send things in bulk. 185 00:08:05,730 --> 00:08:08,510 You'd be able to avoid some of the built-in rate-limiting 186 00:08:08,510 --> 00:08:10,840 mechanisms they had in the cell network. 187 00:08:10,840 --> 00:08:12,630 So there's this nice thing sitting around 188 00:08:12,630 --> 00:08:14,820 that spammers can actually use. 189 00:08:14,820 --> 00:08:16,800 And so as it turns out, I think it's 190 00:08:16,800 --> 00:08:26,050 something like 55% of the mobile spam that gets sent in China 191 00:08:26,050 --> 00:08:30,620 comes from one of these 106 numbers. 192 00:08:30,620 --> 00:08:32,900 So this is a really interesting case study 193 00:08:32,900 --> 00:08:36,180 of how these financial numbers work out, 194 00:08:36,180 --> 00:08:37,710 and how sometimes you can actually 195 00:08:37,710 --> 00:08:41,630 have these sort of perverse incentives, where in this case 196 00:08:41,630 --> 00:08:44,160 the cellphone carriers are just going along 197 00:08:44,160 --> 00:08:47,223 with these scams and these schemes. 198 00:08:47,223 --> 00:08:49,056 And there'll be a link in the lecture notes. 
199 00:08:49,056 --> 00:08:50,968 There's an interesting Economist article about this. 200 00:08:50,968 --> 00:08:51,759 [BANGING CONTINUES] 201 00:08:51,759 --> 00:08:55,800 There is like a pan-African drum circle back there. 202 00:08:55,800 --> 00:08:57,450 This is super exciting, though. 203 00:08:57,450 --> 00:08:57,950 I like it. 204 00:08:57,950 --> 00:08:59,430 I am being adversarially attacked. 205 00:08:59,430 --> 00:09:00,383 That's OK. 206 00:09:00,383 --> 00:09:02,130 We will play through the pain. 207 00:09:02,130 --> 00:09:03,780 Perhaps this is the Mossad. 208 00:09:03,780 --> 00:09:06,920 They don't want me to talk about Stuxnet. 209 00:09:06,920 --> 00:09:09,040 Another interesting thing about security 210 00:09:09,040 --> 00:09:12,400 is that there are actually many companies that 211 00:09:12,400 --> 00:09:14,470 deal in cyber arms. 212 00:09:14,470 --> 00:09:17,802 So this is kind of something out of G.I. Joe, 213 00:09:17,802 --> 00:09:20,260 but there are actually these companies that will sit around 214 00:09:20,260 --> 00:09:22,976 and they will actually sell you malware, 215 00:09:22,976 --> 00:09:24,350 they will sell you exploits, they 216 00:09:24,350 --> 00:09:26,300 will sell you things like this. 217 00:09:26,300 --> 00:09:34,430 So one example is this company that's called Endgame. 218 00:09:34,430 --> 00:09:42,210 And so for example for about $1.5 million, 219 00:09:42,210 --> 00:09:45,940 Endgame will give you IP addresses 220 00:09:45,940 --> 00:09:53,195 and the physical locations of millions of unpatched machines. 
221 00:09:57,460 --> 00:10:00,450 So they have sort of vantage points all over the internet, 222 00:10:00,450 --> 00:10:02,620 and they know all kinds of interesting information 223 00:10:02,620 --> 00:10:04,690 about machines that you may or may not 224 00:10:04,690 --> 00:10:07,690 want to attack if, for example, you're a government, 225 00:10:07,690 --> 00:10:09,890 or if you're another agency or something like that. 226 00:10:09,890 --> 00:10:15,650 For about $2.5 million, they will give you 227 00:10:15,650 --> 00:10:22,990 what is delightfully called a zero-day subscription package. 228 00:10:22,990 --> 00:10:28,170 And so if you sign up for this, then basically you 229 00:10:28,170 --> 00:10:30,800 will get 25 exploits a year, they 230 00:10:30,800 --> 00:10:33,130 claim, for that much money. 231 00:10:33,130 --> 00:10:36,060 And so you'll get those exploits in your inbox or whatever. 232 00:10:36,060 --> 00:10:39,880 Once again, you can do with these things whatever you want. 233 00:10:39,880 --> 00:10:41,565 You've clearly got 2.5 million dollars, 234 00:10:41,565 --> 00:10:43,660 so you've got a lot of spare time to think about this stuff, 235 00:10:43,660 --> 00:10:44,320 presumably. 236 00:10:44,320 --> 00:10:46,240 And so what's interesting is that a lot 237 00:10:46,240 --> 00:10:48,420 of people who work in these cyber arms dealers, 238 00:10:48,420 --> 00:10:50,850 they're actually ex three-letter agencies. 239 00:10:50,850 --> 00:10:53,662 They're ex-CIA, or ex-NSA, or things like this. 240 00:10:53,662 --> 00:10:55,120 It's interesting to think about who 241 00:10:55,120 --> 00:10:57,867 are the actual customers of these cyber arms dealers. 242 00:10:57,867 --> 00:10:59,450 Some of them are actually governments, 243 00:10:59,450 --> 00:11:01,199 like the American government, for example. 244 00:11:01,199 --> 00:11:03,310 And they use these things to attack other nations, 245 00:11:03,310 --> 00:11:04,070 or whatever. 
246 00:11:04,070 --> 00:11:06,337 But some of the people who buy this stuff 247 00:11:06,337 --> 00:11:07,920 are actually, increasingly, companies. 248 00:11:07,920 --> 00:11:09,670 So one thing we'll talk about a little bit 249 00:11:09,670 --> 00:11:12,260 at the end of the lecture is how sometimes companies are now 250 00:11:12,260 --> 00:11:13,968 taking cybersecurity into their own hands 251 00:11:13,968 --> 00:11:17,000 and sometimes doing what's called hackbacks. 252 00:11:17,000 --> 00:11:19,026 So without getting the government involved, 253 00:11:19,026 --> 00:11:20,900 companies that are attacked by cybercriminals 254 00:11:20,900 --> 00:11:22,680 will sometimes go back and explicitly 255 00:11:22,680 --> 00:11:24,810 try to take out people who tried to steal 256 00:11:24,810 --> 00:11:26,070 their intellectual property. 257 00:11:26,070 --> 00:11:28,430 And they've used some very inventive legal arguments 258 00:11:28,430 --> 00:11:30,140 to justify this, and so far it's actually 259 00:11:30,140 --> 00:11:31,098 been fairly successful. 260 00:11:31,098 --> 00:11:33,395 So this is an interesting aspect of cyber warfare. 261 00:11:33,395 --> 00:11:35,135 AUDIENCE: How is any of that legal? 262 00:11:38,910 --> 00:11:39,910 PROFESSOR: Well, so. 263 00:11:39,910 --> 00:11:42,181 I mean, information wants to be free, dude. 264 00:11:42,181 --> 00:11:42,680 Right? 265 00:11:42,680 --> 00:11:46,910 So if you think about stuff like this, for example. 266 00:11:46,910 --> 00:11:49,895 Just telling you stuff isn't necessarily illegal. 267 00:11:49,895 --> 00:11:52,020 I mean, it gets a little bit gray. 268 00:11:52,020 --> 00:11:54,860 But for example, if I tell you that look over there, 269 00:11:54,860 --> 00:11:59,550 there's a house, and the lock doesn't work on that door. 270 00:11:59,550 --> 00:12:00,730 Can I have 20 bucks? 271 00:12:00,730 --> 00:12:02,540 That's not necessarily illegal. 
272 00:12:02,540 --> 00:12:04,730 Because as it turns out, these companies 273 00:12:04,730 --> 00:12:06,880 have, like, hordes of lawyers that 274 00:12:06,880 --> 00:12:08,880 look into things like this. 275 00:12:08,880 --> 00:12:10,654 But in many cases, if you think about it, 276 00:12:10,654 --> 00:12:12,320 you can search for stuff on the internet 277 00:12:12,320 --> 00:12:14,986 and go to websites that tell you things like how to build bombs, 278 00:12:14,986 --> 00:12:16,460 for example. 279 00:12:16,460 --> 00:12:19,170 Just posting that information typically 280 00:12:19,170 --> 00:12:21,392 is not illegal, because you're just learning. 281 00:12:21,392 --> 00:12:22,850 What if I'm a chemist, for example? 282 00:12:22,850 --> 00:12:24,680 Or something like this. 283 00:12:24,680 --> 00:12:27,200 So a lot of times, just giving someone knowledge 284 00:12:27,200 --> 00:12:29,045 is not necessarily illegal. 285 00:12:29,045 --> 00:12:31,290 But you're right that there's some gray areas here, 286 00:12:31,290 --> 00:12:34,220 and as we'll talk about with some of these hackbacks, 287 00:12:34,220 --> 00:12:35,250 it's not always clear. 288 00:12:35,250 --> 00:12:38,730 For example, if I am a bank, I'm not a government, I'm a bank. 289 00:12:38,730 --> 00:12:39,500 I get hacked. 290 00:12:39,500 --> 00:12:40,600 It's not always clear that I actually 291 00:12:40,600 --> 00:12:42,058 have the legal authority to go back 292 00:12:42,058 --> 00:12:44,690 and, let's say, try to shut down a botnet or things like that. 293 00:12:44,690 --> 00:12:46,680 Companies have done stuff like that. 294 00:12:46,680 --> 00:12:50,670 But I think this is an example where the law is 295 00:12:50,670 --> 00:12:54,610 lagging behind practice. 296 00:12:54,610 --> 00:12:56,170 And so people have used things like, 297 00:12:56,170 --> 00:12:57,970 we will use copyright infringement law 298 00:12:57,970 --> 00:12:59,880 to attack botnets as a company. 
299 00:12:59,880 --> 00:13:02,260 Because they're selling illegal goods of ours, 300 00:13:02,260 --> 00:13:04,470 so we'll use IP infringement. 301 00:13:04,470 --> 00:13:06,470 Like, this is probably not what Thomas Jefferson 302 00:13:06,470 --> 00:13:07,845 was thinking when he was thinking 303 00:13:07,845 --> 00:13:09,360 about how these laws work. 304 00:13:09,360 --> 00:13:11,370 So this is a little bit of a cat-and-mouse game. 305 00:13:11,370 --> 00:13:15,650 So we'll do a little bit of that later in the lecture. 306 00:13:15,650 --> 00:13:17,940 So, yes, this is very interesting. 307 00:13:17,940 --> 00:13:21,130 Basically what this all means is that there's 308 00:13:21,130 --> 00:13:28,760 this marketplace for all kinds of computational resources 309 00:13:28,760 --> 00:13:32,500 that you might use as someone who wants to launch attacks. 310 00:13:32,500 --> 00:13:34,070 So for example, there's a marketplace 311 00:13:34,070 --> 00:13:39,270 for compromised systems. 312 00:13:39,270 --> 00:13:40,980 So, for example, you can go to the darker 313 00:13:40,980 --> 00:13:43,820 places of the internet, you can purchase 314 00:13:43,820 --> 00:13:47,460 entire compromised machines that might be part of a botnet. 315 00:13:47,460 --> 00:13:51,130 You can actually buy access to a compromised website, 316 00:13:51,130 --> 00:13:52,070 for example. 317 00:13:52,070 --> 00:13:55,130 You might use that website to post spam, or put up 318 00:13:55,130 --> 00:13:57,630 evil links, or things like that. 319 00:13:57,630 --> 00:14:00,810 You can also get access to compromised email accounts, 320 00:14:00,810 --> 00:14:02,265 like Gmail or Yahoo accounts. 321 00:14:02,265 --> 00:14:03,640 As we'll talk about later, those things 322 00:14:03,640 --> 00:14:05,726 are very very powerful for an attacker. 323 00:14:05,726 --> 00:14:08,190 And you may also just buy sort of a subscription 324 00:14:08,190 --> 00:14:09,472 service for a botnet. 
325 00:14:09,472 --> 00:14:11,180 You'll just have this thing lying around. 326 00:14:11,180 --> 00:14:13,280 You can use it to send denial of service attacks or things 327 00:14:13,280 --> 00:14:13,780 like that. 328 00:14:13,780 --> 00:14:15,350 So there's a marketplace for that. 329 00:14:15,350 --> 00:14:18,650 There's a marketplace for tools. 330 00:14:18,650 --> 00:14:22,170 So you can get, as an attacker, off-the-shelf malware kits, 331 00:14:22,170 --> 00:14:23,470 for example. 332 00:14:23,470 --> 00:14:26,370 You can use perhaps arms dealers like this 333 00:14:26,370 --> 00:14:27,893 to get access to zero-day exploits 334 00:14:27,893 --> 00:14:30,510 so you can write your own malware, so on and so forth. 335 00:14:30,510 --> 00:14:32,620 And there's also a big marketplace 336 00:14:32,620 --> 00:14:38,150 for stolen user information. 337 00:14:38,150 --> 00:14:40,480 So this is stuff like Social Security numbers, 338 00:14:40,480 --> 00:14:44,040 credit card numbers, email addresses, so on and so forth. 339 00:14:44,040 --> 00:14:45,710 So it's all out there on the internet 340 00:14:45,710 --> 00:14:47,717 if you're just willing to look for it. 341 00:14:47,717 --> 00:14:49,350 And so the paper that we're going 342 00:14:49,350 --> 00:14:52,550 to look at today basically focused 343 00:14:52,550 --> 00:14:56,990 on one aspect of this, which is the spam ecosystem. 344 00:15:00,110 --> 00:15:02,020 And so in particular, they look at the sale 345 00:15:02,020 --> 00:15:06,850 of pharmaceuticals, of knockoff goods, and software. 346 00:15:06,850 --> 00:15:09,420 And so they basically break this spam ecosystem 347 00:15:09,420 --> 00:15:11,100 into three parts. 348 00:15:11,100 --> 00:15:15,230 They break it into advertising. 349 00:15:15,230 --> 00:15:18,020 So this is the process of somehow 350 00:15:18,020 --> 00:15:22,570 getting a user to click on a spam link somehow. 
351 00:15:22,570 --> 00:15:25,300 And then once they've done that, there's 352 00:15:25,300 --> 00:15:29,890 this issue of click support. 353 00:15:29,890 --> 00:15:33,665 So this is the notion that once the user clicks the link, 354 00:15:33,665 --> 00:15:36,165 there has to be some type of web server, DNS infrastructure, 355 00:15:36,165 --> 00:15:38,220 so on and so forth on the back end that 356 00:15:38,220 --> 00:15:40,790 actually presents the spam website that the user goes to. 357 00:15:40,790 --> 00:15:43,076 And then the final part is realization. 358 00:15:45,820 --> 00:15:48,910 So this is actually allowing the user 359 00:15:48,910 --> 00:15:51,650 to say they want to buy something. 360 00:15:51,650 --> 00:15:53,950 The user sends money to the spammers, 361 00:15:53,950 --> 00:15:57,230 and the user's going to get some product back from the back end. 362 00:15:57,230 --> 00:16:01,450 And so this is where all of the money changes hands. 363 00:16:01,450 --> 00:16:04,160 And so a lot of this stuff is actually 364 00:16:04,160 --> 00:16:10,070 outsourced to what the paper calls affiliate programs. 365 00:16:13,050 --> 00:16:15,650 And so you can think of these affiliate programs 366 00:16:15,650 --> 00:16:20,030 as essentially doing a lot of the back-end grunt 367 00:16:20,030 --> 00:16:23,130 work of talking to banks and Visa and MasterCard 368 00:16:23,130 --> 00:16:24,200 and things like this. 369 00:16:24,200 --> 00:16:26,044 And so a lot of times, the spammers, 370 00:16:26,044 --> 00:16:27,710 they don't want to deal with that stuff. 371 00:16:27,710 --> 00:16:29,640 They just want to create the links 372 00:16:29,640 --> 00:16:32,520 and do-- you can think of it as the advertising component. 373 00:16:32,520 --> 00:16:34,230 And so a lot of times the spammers 374 00:16:34,230 --> 00:16:37,920 themselves, they will work on a commission. 
375 00:16:37,920 --> 00:16:42,340 So they will get, let's say, anywhere between 30% 376 00:16:42,340 --> 00:16:49,890 and maybe 50% of the final sale that they deliver to one 377 00:16:49,890 --> 00:16:52,670 of these back-end affiliates. 378 00:16:52,670 --> 00:16:55,541 So does that all make sense at a high level? 379 00:16:55,541 --> 00:16:56,040 OK. 380 00:16:56,040 --> 00:17:02,570 So what we'll do is we'll look at each component of this spam 381 00:17:02,570 --> 00:17:05,230 trajectory, and then see how it works, and then maybe think 382 00:17:05,230 --> 00:17:07,505 about how we'd be able to shut down spammers 383 00:17:07,505 --> 00:17:11,540 at different levels of this [INAUDIBLE]. 384 00:17:11,540 --> 00:17:14,609 So the first thing we'll look at is the advertising component. 385 00:17:18,992 --> 00:17:21,450 And so, like I mentioned, the basic idea of the advertising 386 00:17:21,450 --> 00:17:29,440 is, how do you get the user to click on a link? 387 00:17:34,180 --> 00:17:36,630 That's the primary question we'll be concerned with here. 388 00:17:36,630 --> 00:17:39,320 And so the typical thing, as we all know, 389 00:17:39,320 --> 00:17:42,457 is you're going to send email spam, although as we discussed 390 00:17:42,457 --> 00:17:43,915 at the beginning of lecture, people 391 00:17:43,915 --> 00:17:45,670 are starting to use text messages and some 392 00:17:45,670 --> 00:17:48,890 of these other forms of communication. 393 00:17:48,890 --> 00:17:50,760 You could also imagine maybe here we're 394 00:17:50,760 --> 00:17:53,305 going to start using social networks as well. 395 00:17:53,305 --> 00:17:54,763 So now when you go to Facebook, not 396 00:17:54,763 --> 00:17:56,929 only are you polluted by your real friends' content, 397 00:17:56,929 --> 00:17:58,940 you're also polluted by spam messages too. 398 00:17:58,940 --> 00:18:03,390 So this is about economics, this discussion. 
399 00:18:03,390 --> 00:18:05,190 So one interesting question is, how much 400 00:18:05,190 --> 00:18:08,350 does it cost to actually send out these spam messages. 401 00:18:08,350 --> 00:18:12,250 And so as it turns out, it's not very expensive at all. 402 00:18:12,250 --> 00:18:18,454 For about 60 bucks, you can send a million spam messages. 403 00:18:21,150 --> 00:18:23,760 So that's a super, super low cost. 404 00:18:23,760 --> 00:18:26,190 And this cost is actually much lower 405 00:18:26,190 --> 00:18:28,220 if you're directly operating a botnet. 406 00:18:28,220 --> 00:18:29,990 You can cut out the middleman. 407 00:18:29,990 --> 00:18:32,570 But even if you are renting one of the botnets 408 00:18:32,570 --> 00:18:35,890 from one of these marketplaces, this is still super, super low. 409 00:18:35,890 --> 00:18:38,154 AUDIENCE: So how many of those are actually effective? 410 00:18:38,154 --> 00:18:40,072 As in, they don't get filtered? 411 00:18:40,072 --> 00:18:41,780 PROFESSOR: Ah, so that's a good question. 412 00:18:41,780 --> 00:18:44,300 So that leads to my next point. 413 00:18:44,300 --> 00:18:46,299 So you're sending a million spams, 414 00:18:46,299 --> 00:18:47,840 but then they're going to get dropped 415 00:18:47,840 --> 00:18:49,174 at various points along the way. 416 00:18:49,174 --> 00:18:51,006 They're going to get caught in spam filters, 417 00:18:51,006 --> 00:18:53,048 people will-- they see it but they just delete it 418 00:18:53,048 --> 00:18:55,005 because they know that an email that has, like, 419 00:18:55,005 --> 00:18:56,700 18 dollar signs should just be deleted. 420 00:18:56,700 --> 00:18:58,940 So if you look at the conversion rate, 421 00:18:58,940 --> 00:19:00,870 you'll see that the click rates are actually 422 00:19:00,870 --> 00:19:04,320 very low because of things like spam filters 423 00:19:04,320 --> 00:19:05,290 and stuff like that. 
424 00:19:05,290 --> 00:19:10,200 And also many users are trained to avoid these things. 425 00:19:10,200 --> 00:19:11,950 Click rates are low. 426 00:19:11,950 --> 00:19:15,170 And this is why sending spam has to be 427 00:19:15,170 --> 00:19:18,800 super, super cheap, because you will not 428 00:19:18,800 --> 00:19:20,040 get a lot of conversions. 429 00:19:20,040 --> 00:19:21,850 So for example, there have been some empirical studies that 430 00:19:21,850 --> 00:19:23,016 looked at these click rates. 431 00:19:23,016 --> 00:19:31,030 And one study found that they looked at 350 million spam 432 00:19:31,030 --> 00:19:34,650 messages, and they found that out 433 00:19:34,650 --> 00:19:37,650 of those 350 million messages, there 434 00:19:37,650 --> 00:19:44,960 was only about 10,000 clicks on those messages. 435 00:19:44,960 --> 00:19:46,710 So there's a massive dropoff here. 436 00:19:46,710 --> 00:19:49,750 And then out of these 10,000 clicks 437 00:19:49,750 --> 00:19:52,567 there were only 28 purchase attempts. 438 00:19:55,680 --> 00:19:58,430 So that's super, super low. 439 00:19:58,430 --> 00:20:01,010 And so that's why it's extremely important 440 00:20:01,010 --> 00:20:04,275 for this entire ecosystem to be very cheap from the perspective 441 00:20:04,275 --> 00:20:04,820 of a spammer. 442 00:20:04,820 --> 00:20:06,653 Because I mean, look at these dropoffs here. 443 00:20:06,653 --> 00:20:08,780 These are multiple orders of magnitude. 444 00:20:08,780 --> 00:20:13,636 And so that's why one might hope that at least in theory we 445 00:20:13,636 --> 00:20:15,010 could squeeze-- like for example, 446 00:20:15,010 --> 00:20:17,880 we could drive this number up maybe just $10. 447 00:20:17,880 --> 00:20:20,280 Maybe that has some catastrophic knockdown effect 448 00:20:20,280 --> 00:20:22,440 on how profitable this stuff is. 
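The dropoffs quoted above can be turned into rough numbers. The following is a minimal sketch, not the study's own analysis: it assumes the roughly $60-per-million rental price applies to the 350 million messages in the study, which the lecture does not claim (operators who run their own botnet pay much less). The function and variable names are made up for illustration.

```python
# Figures quoted in the lecture; combining them this way is an assumption.
COST_PER_MILLION_USD = 60      # quoted rental price for sending a million spams
MESSAGES_SENT = 350_000_000    # spam messages examined in the study
CLICKS = 10_000                # approximate clicks on those messages
PURCHASE_ATTEMPTS = 28         # purchase attempts that followed the clicks


def funnel(messages, clicks, purchases, cost_per_million):
    """Return conversion rates and the sending cost per purchase attempt."""
    sending_cost = messages / 1_000_000 * cost_per_million
    return {
        "click_rate": clicks / messages,
        "purchases_per_click": purchases / clicks,
        "sending_cost_usd": sending_cost,
        "cost_per_purchase_usd": sending_cost / purchases,
    }


stats = funnel(MESSAGES_SENT, CLICKS, PURCHASE_ATTEMPTS, COST_PER_MILLION_USD)
for name, value in stats.items():
    print(f"{name}: {value:.6g}")

# The "drive this number up by $10" thought experiment from the lecture.
raised = funnel(MESSAGES_SENT, CLICKS, PURCHASE_ATTEMPTS, COST_PER_MILLION_USD + 10)
print("cost per purchase after +$10:", raised["cost_per_purchase_usd"])
```

Under these assumptions the sending cost per purchase attempt is already in the hundreds of dollars, which suggests why even a small increase in the per-message price could matter to a spammer who rents capacity.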
449 00:20:22,440 --> 00:20:24,995 So it's very important for the spammers 450 00:20:24,995 --> 00:20:26,880 that everything be as cheap as possible. 451 00:20:26,880 --> 00:20:28,848 AUDIENCE: So those 10,000 clicks. 452 00:20:28,848 --> 00:20:33,768 Again, how many of those 350 million emails 453 00:20:33,768 --> 00:20:35,911 were filtered out of the inbox? 454 00:20:35,911 --> 00:20:39,854 I'm just trying to get a sense of out of how many emails 455 00:20:39,854 --> 00:20:41,270 those clicks were out of, to gauge 456 00:20:41,270 --> 00:20:45,577 how effective spam filtering is versus how silly us humans are. 457 00:20:45,577 --> 00:20:47,410 PROFESSOR: Yeah, that I'm not actually sure. 458 00:20:47,410 --> 00:20:49,960 That's a good question. 459 00:20:49,960 --> 00:20:52,870 AUDIENCE: So I was just listening to a talk 460 00:20:52,870 --> 00:20:55,490 by Jeff Walker on Friday about this stuff, 461 00:20:55,490 --> 00:20:59,350 and he says that on the order of 20% to 40% 462 00:20:59,350 --> 00:21:02,990 of clicks going to one of these websites actually 463 00:21:02,990 --> 00:21:04,425 goes from a user's spam folder. 464 00:21:04,425 --> 00:21:07,363 So users go in their spam folder, looking for this stuff, 465 00:21:07,363 --> 00:21:08,238 and they click on it. 466 00:21:08,238 --> 00:21:10,070 So presumably there's a class of customers 467 00:21:10,070 --> 00:21:11,842 that are looking for this, and if they're 468 00:21:11,842 --> 00:21:14,300 looking for it-- oh, yeah, I'll just go into my spam folder 469 00:21:14,300 --> 00:21:15,340 to find this. 470 00:21:15,340 --> 00:21:17,850 So it's not clear that things going into spam folders 471 00:21:17,850 --> 00:21:19,324 are getting zero clicks. 472 00:21:19,324 --> 00:21:21,740 PROFESSOR: Yeah, I've heard anecdotal reports of that too. 
473 00:21:21,740 --> 00:21:24,900 Some people, even for legitimate emails, 474 00:21:24,900 --> 00:21:26,980 they'll mark it as spam just so that if there's 475 00:21:26,980 --> 00:21:29,512 a shoulder-surfer, like at work, who's 476 00:21:29,512 --> 00:21:30,970 seeing them go to Gmail, let's say, 477 00:21:30,970 --> 00:21:33,440 they won't come and see that you've subscribed to, you know, 478 00:21:33,440 --> 00:21:33,940 whatever. 479 00:21:33,940 --> 00:21:35,950 And then they can secretly go into the spam folder, 480 00:21:35,950 --> 00:21:37,910 they know it's not deleted, and look at this stuff. 481 00:21:37,910 --> 00:21:38,890 This is actually a really interesting point. 482 00:21:38,890 --> 00:21:41,020 There's this whole psychology of who 483 00:21:41,020 --> 00:21:42,804 it is that actually clicks on these links. 484 00:21:42,804 --> 00:21:45,470 And so I think one of the papers that I linked to in the lecture 485 00:21:45,470 --> 00:21:49,830 notes talks about why these Nigerian scams still work. 486 00:21:49,830 --> 00:21:52,810 Because you'd think that anyone who basically 487 00:21:52,810 --> 00:21:54,440 has either common sense themselves, 488 00:21:54,440 --> 00:21:56,270 or a friend who has common sense, 489 00:21:56,270 --> 00:21:59,120 would never click on one of these Nigerian email scams. 490 00:21:59,120 --> 00:21:59,620 Right? 491 00:21:59,620 --> 00:22:04,470 But it turns out that the Nigerian meme is actually 492 00:22:04,470 --> 00:22:08,450 useful for spammers to filter out idiots. 493 00:22:08,450 --> 00:22:12,260 In other words, if you are so foolish that you would still 494 00:22:12,260 --> 00:22:15,210 click on a Nigerian email, then oh, OK, you're 495 00:22:15,210 --> 00:22:19,230 going to do one of these conversion things here. 496 00:22:19,230 --> 00:22:21,540 When you think about it, that's one of the key things 497 00:22:21,540 --> 00:22:22,350 that spammers need. 
498 00:22:22,350 --> 00:22:24,370 They need people who are gullible 499 00:22:24,370 --> 00:22:28,370 enough or idealistic enough to click through on these things. 500 00:22:28,370 --> 00:22:31,490 There's a whole sort of psychology behind this. 501 00:22:31,490 --> 00:22:32,833 It's very interesting. 502 00:22:32,833 --> 00:22:36,037 AUDIENCE: So each of these purchases, about how much 503 00:22:36,037 --> 00:22:37,254 are they worth? 504 00:22:37,254 --> 00:22:38,670 PROFESSOR: That's a good question. 505 00:22:38,670 --> 00:22:41,560 So it actually depends on the type of thing 506 00:22:41,560 --> 00:22:42,850 that you're looking at. 507 00:22:42,850 --> 00:22:45,930 A lot of these purchases are not actually super high in value. 508 00:22:45,930 --> 00:22:48,500 So you're thinking that someone's buying herbal Viagra 509 00:22:48,500 --> 00:22:50,530 or they're buying like a knockoff Windows 510 00:22:50,530 --> 00:22:51,870 license or things like that. 511 00:22:51,870 --> 00:22:54,453 And in fact, a lot of times when they're buying these knockoff 512 00:22:54,453 --> 00:22:55,924 products, presumably the price is 513 00:22:55,924 --> 00:22:58,215 lower than what they'd actually get in the real market, 514 00:22:58,215 --> 00:23:00,510 because otherwise you could just go down to your local mall 515 00:23:00,510 --> 00:23:01,480 and buy these things. 516 00:23:01,480 --> 00:23:03,521 So a lot of times these purchases you're actually 517 00:23:03,521 --> 00:23:05,850 making are less than 1,000 dollars, 518 00:23:05,850 --> 00:23:09,205 and oftentimes a lot less than that. 519 00:23:09,205 --> 00:23:11,310 Any other questions? 520 00:23:11,310 --> 00:23:12,180 OK. 521 00:23:12,180 --> 00:23:14,515 So these conversion rates are super, super low. 522 00:23:14,515 --> 00:23:16,430 So like I said, one of the key things 523 00:23:16,430 --> 00:23:22,680 to do as a defender is to try to basically make spam 524 00:23:22,680 --> 00:23:29,380 more expensive for the spammer. 
525 00:23:29,380 --> 00:23:31,020 So there's a couple different ways 526 00:23:31,020 --> 00:23:32,920 you might think about doing that. 527 00:23:32,920 --> 00:23:40,170 One way you might think about doing that is IP blacklists. 528 00:23:40,170 --> 00:23:43,540 So maybe ISPs or someone else basically 529 00:23:43,540 --> 00:23:45,545 collects this list of IPs that are 530 00:23:45,545 --> 00:23:48,125 known to be bad, that are known to come from spammers. 531 00:23:48,125 --> 00:23:51,630 And then we just don't let these people send traffic. 532 00:23:51,630 --> 00:23:54,430 So this kinda-sorta used to work for a while. 533 00:23:54,430 --> 00:23:58,470 But now it's so much easier for the attackers 534 00:23:58,470 --> 00:24:00,756 to use techniques like DNS redirection and stuff 535 00:24:00,756 --> 00:24:02,260 like that, that we'll talk about in a little bit, 536 00:24:02,260 --> 00:24:04,200 this doesn't actually work out very well. 537 00:24:04,200 --> 00:24:06,420 Because now there's a much larger set of addresses 538 00:24:06,420 --> 00:24:08,890 that spammers can send spam from, 539 00:24:08,890 --> 00:24:10,970 and they can also dynamically switch 540 00:24:10,970 --> 00:24:15,480 the binding between hostnames and web servers 541 00:24:15,480 --> 00:24:18,250 and all these types of things. So this doesn't work out so well. 542 00:24:18,250 --> 00:24:20,760 Another idea that's been around for a long time 543 00:24:20,760 --> 00:24:27,600 is charging for email in some way, so each email you send, 544 00:24:27,600 --> 00:24:30,840 you have to pay some micropayment. 545 00:24:30,840 --> 00:24:33,024 So that currency could be a couple different things. 546 00:24:33,024 --> 00:24:34,565 So you might imagine that if I wanted 547 00:24:34,565 --> 00:24:36,360 to send you an email, maybe I'd have to pay 548 00:24:36,360 --> 00:24:38,390 a tenth of a tenth of a penny.
549 00:24:38,390 --> 00:24:41,174 And that's no big deal for me, because I don't 550 00:24:41,174 --> 00:24:42,340 send that many emails a day. 551 00:24:42,340 --> 00:24:44,798 But if you're a spammer trying to operate at these volumes, 552 00:24:44,798 --> 00:24:46,000 then that quickly adds up. 553 00:24:46,000 --> 00:24:48,360 That destroys their value chain. 554 00:24:48,360 --> 00:24:49,945 Another idea that people have had 555 00:24:49,945 --> 00:24:53,590 is, what if you used computation as a currency? 556 00:24:53,590 --> 00:24:55,740 This is the idea that before my email 557 00:24:55,740 --> 00:24:57,370 server will accept an email from me, 558 00:24:57,370 --> 00:24:58,842 I have to solve some puzzle. 559 00:24:58,842 --> 00:25:01,680 I have to do some math trick, or something like that. 560 00:25:01,680 --> 00:25:03,840 Once again, that cuts down the rate 561 00:25:03,840 --> 00:25:07,642 at which these bulk mailers can send messages. 562 00:25:07,642 --> 00:25:10,215 Also, we're all familiar with CAPTCHAs, too. 563 00:25:10,215 --> 00:25:11,590 This is basically the idea that I 564 00:25:11,590 --> 00:25:14,750 have to look at some picture of nine animals 565 00:25:14,750 --> 00:25:16,260 and find the cat instead of the dog, 566 00:25:16,260 --> 00:25:18,074 or type in some weird squiggly number that 567 00:25:18,074 --> 00:25:19,990 looks like a migraine, or something like that. 568 00:25:19,990 --> 00:25:24,280 So there have been all kinds of ideas 569 00:25:24,280 --> 00:25:26,772 for charging for email to stop this kind of stuff 570 00:25:26,772 --> 00:25:28,680 from happening. 571 00:25:28,680 --> 00:25:31,180 One of the classic problems, though, with all these schemes, 572 00:25:31,180 --> 00:25:35,120 is who's going to be the first one to implement it. 
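The "computation as a currency" idea above can be sketched as a hash puzzle in the style of proof-of-work schemes. This is only an illustration under stated assumptions: the lecture does not specify a puzzle, so the use of SHA-256, the `DIFFICULTY_BITS` setting, and the function names `solve_puzzle` and `verify_puzzle` are all hypothetical choices.

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 12  # hypothetical setting: about 2**12 hashes per message on average


def _digest_as_int(message: bytes, nonce: int) -> int:
    """Hash the message together with a nonce and read the digest as an integer."""
    digest = hashlib.sha256(message + b":" + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big")


def solve_puzzle(message: bytes, bits: int = DIFFICULTY_BITS) -> int:
    """Sender side: search for a nonce whose digest has `bits` leading zero bits."""
    target = 1 << (256 - bits)
    for nonce in count():
        if _digest_as_int(message, nonce) < target:
            return nonce


def verify_puzzle(message: bytes, nonce: int, bits: int = DIFFICULTY_BITS) -> bool:
    """Server side: a single hash is enough to check the sender's work."""
    return _digest_as_int(message, nonce) < (1 << (256 - bits))


email = b"To: alice@example.com\nSubject: lunch?\n\nNoon works for me."
nonce = solve_puzzle(email)
print("nonce:", nonce, "accepted:", verify_puzzle(email, nonce))
```

The asymmetry is the point of the design: the sender pays thousands of hash evaluations per message while the receiver pays one, so the cost is negligible for someone sending a few emails a day and adds up quickly at bulk-mail volumes.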
573 00:25:35,120 --> 00:25:37,172 And if all the email providers don't move forward 574 00:25:37,172 --> 00:25:38,880 at the same time, then of course spammers 575 00:25:38,880 --> 00:25:41,088 are just going to migrate to the email providers that 576 00:25:41,088 --> 00:25:42,682 don't require these techniques. 577 00:25:42,682 --> 00:25:44,890 So there's been the problem of how do we get everyone 578 00:25:44,890 --> 00:25:47,010 to upgrade en masse. 579 00:25:47,010 --> 00:25:48,930 And there's this issue of, well, what 580 00:25:48,930 --> 00:25:52,360 would happen if a user device is compromised? 581 00:25:52,360 --> 00:25:54,900 So maybe if someone breaks into my Gmail account, 582 00:25:54,900 --> 00:25:56,275 then maybe they're going to force 583 00:25:56,275 --> 00:26:00,330 me to pay 350 million micropayments, which 584 00:26:00,330 --> 00:26:02,555 could individually bankrupt me. 585 00:26:02,555 --> 00:26:04,805 And so it's not quite clear that some of these schemes 586 00:26:04,805 --> 00:26:06,335 are ready for primetime, but they 587 00:26:06,335 --> 00:26:07,920 do represent an interesting thought experiment 588 00:26:07,920 --> 00:26:09,700 about how you might be able to stop some of this stuff 589 00:26:09,700 --> 00:26:10,658 from the senders' side. 590 00:26:10,658 --> 00:26:13,582 AUDIENCE: So how do they work with mailing lists, where you 591 00:26:13,582 --> 00:26:14,790 have these big mailing lists? 592 00:26:14,790 --> 00:26:15,340 PROFESSOR: Yeah, so there's problems 593 00:26:15,340 --> 00:26:17,820 with that, and with mailing list aggregation. 594 00:26:17,820 --> 00:26:20,050 So it's very, very tricky, because there are actually 595 00:26:20,050 --> 00:26:22,722 some bulk mails that you do want to send. 
596 00:26:22,722 --> 00:26:24,930 I mean, you might imagine having some heuristic where 597 00:26:24,930 --> 00:26:27,010 you look at the size of the mailing list 598 00:26:27,010 --> 00:26:29,702 and maybe you scale the payment according to that. 599 00:26:29,702 --> 00:26:31,160 So for example, maybe heuristically 600 00:26:31,160 --> 00:26:33,950 you think it's reasonable to send email to 1000 folks 601 00:26:33,950 --> 00:26:36,790 but not to 350 million folks, or something like this. 602 00:26:36,790 --> 00:26:39,331 But you're right that there are a lot of practical limitation 603 00:26:39,331 --> 00:26:42,280 issues that come out with this kind of stuff. 604 00:26:42,280 --> 00:26:51,080 So what can the adversary do to get around some of this? 605 00:26:51,080 --> 00:26:54,070 There are basically three workarounds 606 00:26:54,070 --> 00:26:58,170 that adversaries might try. 607 00:26:58,170 --> 00:27:02,820 So one thing they can do is just use botnets, 608 00:27:02,820 --> 00:27:11,512 because botnets have a lot of IPs that the attacker can use. 609 00:27:11,512 --> 00:27:12,970 And so for example, even if someone 610 00:27:12,970 --> 00:27:15,340 were trying to do something like IP blacklists, 611 00:27:15,340 --> 00:27:17,960 then maybe the attacker can cycle through a bunch of IPs 612 00:27:17,960 --> 00:27:19,650 in this botnet and maybe get around 613 00:27:19,650 --> 00:27:22,510 some of that blacklist filtering. 614 00:27:22,510 --> 00:27:28,320 They can also try to use compromised webmail accounts 615 00:27:28,320 --> 00:27:29,260 to send spam. 616 00:27:32,210 --> 00:27:35,560 So the reason why these are super useful 617 00:27:35,560 --> 00:27:38,870 is because sites like Gmail or Yahoo 618 00:27:38,870 --> 00:27:43,170 or Hotmail, those services can't be blacklisted, because they're 619 00:27:43,170 --> 00:27:44,095 super, super powerful.
620 00:27:44,095 --> 00:27:46,230 So if you blacklisted the entire service, 621 00:27:46,230 --> 00:27:48,188 then you're probably going to shut down service 622 00:27:48,188 --> 00:27:50,020 for tens of millions of people. 623 00:27:50,020 --> 00:27:54,320 Now of course, these individual services can shut down you. 624 00:27:54,320 --> 00:27:56,654 And so that will actually happen once they 625 00:27:56,654 --> 00:27:59,070 have these heuristics running that see that you're sending 626 00:27:59,070 --> 00:28:00,570 to a lot of people you've never sent 627 00:28:00,570 --> 00:28:01,980 before, and so on and so forth. 628 00:28:01,980 --> 00:28:05,660 A lot of AI strategy takes place on the webmail server side 629 00:28:05,660 --> 00:28:07,324 to try to predict these things. 630 00:28:07,324 --> 00:28:09,490 But these things can be very valuable to an attacker 631 00:28:09,490 --> 00:28:13,100 because even if your compromised account is not 632 00:28:13,100 --> 00:28:16,250 used to send a lot of emails, it can be used to send emails 633 00:28:16,250 --> 00:28:18,210 to people that you know. 634 00:28:18,210 --> 00:28:20,170 So maybe it allows the attacker to do things 635 00:28:20,170 --> 00:28:22,740 like spear phishing more easily, or things like that. 636 00:28:22,740 --> 00:28:24,250 People are more likely to click on an email that 637 00:28:24,250 --> 00:28:26,000 comes from an address that they recognize. 638 00:28:26,000 --> 00:28:29,500 So that's a very powerful technique there. 639 00:28:29,500 --> 00:28:31,210 And then attackers can also try to do 640 00:28:31,210 --> 00:28:38,530 things like hijack IP addresses from legitimate owners. 641 00:28:38,530 --> 00:28:42,790 So as was mentioned briefly in Mark's talk, 642 00:28:42,790 --> 00:28:45,380 there's this protocol called BGP that 643 00:28:45,380 --> 00:28:48,130 basically is used to control routing on the internet.
644 00:28:48,130 --> 00:28:49,960 So there are these attacks that people 645 00:28:49,960 --> 00:28:52,120 can do whereby they will essentially say, 646 00:28:52,120 --> 00:28:55,905 hey, I'm actually the owner of some prefix of IP addresses, 647 00:28:55,905 --> 00:28:57,530 even though they don't actually own it. 648 00:28:57,530 --> 00:28:59,734 So all the traffic that's involving those addresses 649 00:28:59,734 --> 00:29:01,650 will go in towards the attacker, and then they 650 00:29:01,650 --> 00:29:04,520 can actually use those addresses to send out spam from there. 651 00:29:04,520 --> 00:29:05,790 Then once they're done with their evil, 652 00:29:05,790 --> 00:29:07,373 they can release the BGP advertisement 653 00:29:07,373 --> 00:29:10,220 and then go try to do this somewhere else. 654 00:29:10,220 --> 00:29:12,940 There's a lot of research in how you can essentially 655 00:29:12,940 --> 00:29:15,810 think of ways to authenticate BGP advertisements 656 00:29:15,810 --> 00:29:18,290 or otherwise prevent these IP address hijacks. 657 00:29:18,290 --> 00:29:19,440 So there's a bunch of different techniques 658 00:29:19,440 --> 00:29:21,398 that attackers can do to try to get around some 659 00:29:21,398 --> 00:29:24,840 of these defensive techniques. 660 00:29:24,840 --> 00:29:28,030 So this can all be done, but still, these defenses, 661 00:29:28,030 --> 00:29:28,739 they're not free. 662 00:29:28,739 --> 00:29:31,279 So presumably the attacker has to pay for the botnet somehow, 663 00:29:31,279 --> 00:29:33,590 they have to get inside these webmail accounts. 664 00:29:33,590 --> 00:29:36,330 And so any of these defenses that you can do 665 00:29:36,330 --> 00:29:39,856 will help to drive the cost up of generating these spams. 666 00:29:39,856 --> 00:29:41,230 So as such, they're still useful, 667 00:29:41,230 --> 00:29:45,610 even though they are not perfect defenses. 668 00:29:45,610 --> 00:29:48,760 So what do these botnets look like?
669 00:29:48,760 --> 00:29:55,770 So at a high level, you have the proverbial cloud 670 00:29:55,770 --> 00:29:56,785 from your cloud diagram. 671 00:29:56,785 --> 00:29:59,780 You have your command and control infrastructure up here, 672 00:29:59,780 --> 00:30:01,760 and this is the thing that actually 673 00:30:01,760 --> 00:30:08,220 sends commands to all of the individual bots down here. 674 00:30:08,220 --> 00:30:11,490 So the spammer will talk to the C&C and will say hey, 675 00:30:11,490 --> 00:30:14,130 here's my new spam messages I want to send, 676 00:30:14,130 --> 00:30:17,445 and then maybe these bots will act on behalf of their command 677 00:30:17,445 --> 00:30:19,570 and control infrastructure and start sending emails 678 00:30:19,570 --> 00:30:21,460 to a bunch of people. 679 00:30:21,460 --> 00:30:23,030 So let's see here. 680 00:30:23,030 --> 00:30:25,230 So why are these bots useful? 681 00:30:25,230 --> 00:30:27,592 Well, as I mentioned here, they have IP addresses, 682 00:30:27,592 --> 00:30:28,550 which are super useful. 683 00:30:28,550 --> 00:30:31,050 But of course they also have the associated bandwidth there. 684 00:30:31,050 --> 00:30:32,551 They also have computational cycles. 685 00:30:32,551 --> 00:30:33,925 Sometimes these bots are actually 686 00:30:33,925 --> 00:30:35,240 used as web servers themselves. 687 00:30:35,240 --> 00:30:37,610 So these things are very, very useful. 688 00:30:37,610 --> 00:30:40,905 And they also serve as a layer of indirection. 689 00:30:40,905 --> 00:30:43,740 So, as we're going to discuss in more detail in a second, 690 00:30:43,740 --> 00:30:46,460 indirection is very useful for attackers.
691 00:30:46,460 --> 00:30:49,590 That means that if law enforcement or whatnot shuts 692 00:30:49,590 --> 00:30:51,724 down this level, well, if the command and control 693 00:30:51,724 --> 00:30:53,890 infrastructure's still alive, then maybe the spammer 694 00:30:53,890 --> 00:30:55,672 can just attach this command and control 695 00:30:55,672 --> 00:30:57,380 infrastructure to a different set of bots 696 00:30:57,380 --> 00:30:59,040 and keep on running. 697 00:30:59,040 --> 00:31:01,670 So that's one reason why these bots are very useful. 698 00:31:01,670 --> 00:31:04,370 And these bots can scale to the order of magnitude 699 00:31:04,370 --> 00:31:06,860 of millions of IP addresses. 700 00:31:06,860 --> 00:31:09,550 So as it turns out, people will click random links 701 00:31:09,550 --> 00:31:11,700 involving malware all the time. 702 00:31:11,700 --> 00:31:13,796 So these things can get very, very, very large. 703 00:31:13,796 --> 00:31:15,920 And so some of these takedowns that these companies 704 00:31:15,920 --> 00:31:18,253 get involved in, with trying to take down these botnets, 705 00:31:18,253 --> 00:31:20,561 they involve millions upon millions of machines. 706 00:31:20,561 --> 00:31:22,900 So they're very technically challenging. 707 00:31:22,900 --> 00:31:25,780 So how much does it cost to get your malware installed 708 00:31:25,780 --> 00:31:27,060 on all these bots? 709 00:31:27,060 --> 00:31:29,080 Remember, these are all typically 710 00:31:29,080 --> 00:31:30,680 regular end-user machines. 711 00:31:30,680 --> 00:31:34,300 So the cost for getting your malware on one of these 712 00:31:34,300 --> 00:31:45,640 machines, so price per host, is about $0.10 for U.S. 713 00:31:45,640 --> 00:31:58,370 hosts and on the order of $0.01 for hosts in Asia. 714 00:31:58,370 --> 00:32:00,640 So it's interesting there's this differential here.
715 00:32:00,640 --> 00:32:01,620 There might be a couple of different reasons 716 00:32:01,620 --> 00:32:03,020 we can imagine for why that is. 717 00:32:03,020 --> 00:32:09,240 It might be that people are prone to think that connections 718 00:32:09,240 --> 00:32:11,860 originating from the U.S. are more likely to be trustworthy. 719 00:32:11,860 --> 00:32:14,390 It may also be that because there's 720 00:32:14,390 --> 00:32:15,890 pirated software running here, stuff 721 00:32:15,890 --> 00:32:18,430 that's not actively up to date with respect to patches, 722 00:32:18,430 --> 00:32:21,100 it's actually easier to get botnet hosts over here. 723 00:32:21,100 --> 00:32:24,000 So you'll see some very interesting statistics 724 00:32:24,000 --> 00:32:27,180 about how some of these rates might fluctuate, for example, 725 00:32:27,180 --> 00:32:29,410 as you see companies like Microsoft go out 726 00:32:29,410 --> 00:32:32,169 and try to stamp down on piracy and things like that. 727 00:32:32,169 --> 00:32:33,710 But anyway, this is a rough estimate. 728 00:32:33,710 --> 00:32:38,260 Suffice it to say, this is not super expensive. 729 00:32:38,260 --> 00:32:41,480 So what does-- any questions before we continue? 730 00:32:41,480 --> 00:32:41,980 OK. 731 00:32:41,980 --> 00:32:45,340 So what does this command and control infrastructure 732 00:32:45,340 --> 00:32:46,060 look like? 733 00:32:46,060 --> 00:32:49,580 So you can imagine that in one instantiation, the simplest 734 00:32:49,580 --> 00:32:55,090 instantiation, this is just some centralized setup. 735 00:32:55,090 --> 00:32:58,185 And so this is maybe one machine or maybe 736 00:32:58,185 --> 00:32:59,840 some small number of machines. 737 00:32:59,840 --> 00:33:01,990 The attacker gets to log into those machines 738 00:33:01,990 --> 00:33:04,490 and essentially just send these commands out to the botnets 739 00:33:04,490 --> 00:33:05,195 from there.
740 00:33:05,195 --> 00:33:06,653 So if it's going to be centralized, 741 00:33:06,653 --> 00:33:10,890 then it's going to be very useful for the attacker to have 742 00:33:10,890 --> 00:33:13,160 what's known as bulletproof hosting. 743 00:33:17,480 --> 00:33:19,545 So the idea behind bulletproof hosting 744 00:33:19,545 --> 00:33:23,980 is that you want to put this command and control 745 00:33:23,980 --> 00:33:31,750 infrastructure on servers that reside in ISPs that ignore 746 00:33:31,750 --> 00:33:33,570 requests from banks or from law enforcement 747 00:33:33,570 --> 00:33:35,980 to take down servers. 748 00:33:35,980 --> 00:33:38,200 So there are actually bulletproof servers that exist. 749 00:33:38,200 --> 00:33:40,699 They charge a premium, because there is a little bit of risk 750 00:33:40,699 --> 00:33:41,500 involved there. 751 00:33:41,500 --> 00:33:44,041 But if you can manage to host one of your command and control 752 00:33:44,041 --> 00:33:45,819 centers there, it's going to be very nice. 753 00:33:45,819 --> 00:33:47,860 Because then when the American government or when 754 00:33:47,860 --> 00:33:50,220 Goldman Sachs or whoever says hey, shut this guy down, 755 00:33:50,220 --> 00:33:52,500 they're running spam, the provider will say, 756 00:33:52,500 --> 00:33:53,390 how can you make me? 757 00:33:53,390 --> 00:33:55,199 I run in a different legal jurisdiction. 758 00:33:55,199 --> 00:33:57,490 I don't have to follow your intellectual property laws. 759 00:33:57,490 --> 00:33:58,922 So on and so forth. 760 00:33:58,922 --> 00:33:59,880 So this is very useful. 761 00:33:59,880 --> 00:34:02,330 Like I said, these types of hosts 762 00:34:02,330 --> 00:34:05,300 actually charge a risk premium for running 763 00:34:05,300 --> 00:34:06,850 that kind of service. 764 00:34:06,850 --> 00:34:09,489 And so the other alternative for running the C&C infrastructure 765 00:34:09,489 --> 00:34:13,639 is, this could be a peer-to-peer network. 
766 00:34:17,280 --> 00:34:22,042 And so the idea here is that maybe this is sort of-- you 767 00:34:22,042 --> 00:34:24,250 can almost think of it as a mini-botnet up there too. 768 00:34:24,250 --> 00:34:25,965 So the entire control infrastructure 769 00:34:25,965 --> 00:34:28,250 is spread across many different machines, 770 00:34:28,250 --> 00:34:30,010 and maybe at any given time there's 771 00:34:30,010 --> 00:34:32,270 a different machine that's responsible for sending 772 00:34:32,270 --> 00:34:34,484 commands to all of these worker nodes down here. 773 00:34:34,484 --> 00:34:36,109 And so this is nice, because it doesn't 774 00:34:36,109 --> 00:34:39,370 require you to have access to one of these bulletproof hosts. 775 00:34:39,370 --> 00:34:42,040 You can construct the C&C infrastructure 776 00:34:42,040 --> 00:34:44,900 using regular bots. 777 00:34:44,900 --> 00:34:47,179 The P2P aspect of it makes it a little more 778 00:34:47,179 --> 00:34:49,820 difficult to provide guarantees about the availability 779 00:34:49,820 --> 00:34:52,047 of the hosts that are up here, but it does have 780 00:34:52,047 --> 00:34:53,255 some other nice advantages. 781 00:34:53,255 --> 00:34:55,428 At a high level, those are the two approaches 782 00:34:55,428 --> 00:34:57,610 that people can use. 783 00:34:57,610 --> 00:35:08,130 So what happens if the hosting service gets taken down? 784 00:35:12,590 --> 00:35:17,740 Well, there's a couple things that the adversary can do. 785 00:35:17,740 --> 00:35:23,895 So they can use DNS to essentially redirect requests. 786 00:35:30,440 --> 00:35:34,060 So let's say that someone attacks, 787 00:35:34,060 --> 00:35:36,610 or someone issues a takedown for the DNS infrastructure 788 00:35:36,610 --> 00:35:37,870 for something like this. 
789 00:35:37,870 --> 00:35:39,870 As long as the back-end servers are still alive, 790 00:35:39,870 --> 00:35:44,750 what the attacker can do is basically-- 791 00:35:44,750 --> 00:35:51,330 the attacker creates lists of server IP addresses. 792 00:35:55,114 --> 00:35:58,285 And there may be hundreds or thousands of these IP addresses 793 00:35:58,285 --> 00:35:59,600 that it collects. 794 00:35:59,600 --> 00:36:08,210 And then it will bind each one to a host 795 00:36:08,210 --> 00:36:13,090 name for a very short period of time. 796 00:36:13,090 --> 00:36:16,610 So let's say maybe for 300 seconds. 797 00:36:20,317 --> 00:36:22,400 And so what's nice about this is that if someone's 798 00:36:22,400 --> 00:36:24,370 trying to run heuristics that say, 799 00:36:24,370 --> 00:36:28,140 if I see some particular server sending 800 00:36:28,140 --> 00:36:32,197 more than 1,000 spam-like messages in a given period 801 00:36:32,197 --> 00:36:34,780 I'm going to try to issue some kind of takedown to them, well, 802 00:36:34,780 --> 00:36:37,795 these types of techniques will maybe help the attacker fly 803 00:36:37,795 --> 00:36:40,086 under the radar of those types of detection techniques. 804 00:36:40,086 --> 00:36:41,990 Because essentially every 300 seconds they're saying, 805 00:36:41,990 --> 00:36:43,245 OK, I'm going to be serving spam from here, 806 00:36:43,245 --> 00:36:45,620 then I'm going to be serving spam from here, serving spam 807 00:36:45,620 --> 00:36:46,960 from here, so on and so forth. 808 00:36:46,960 --> 00:36:49,292 So this is a nice use of indirection, at least 809 00:36:49,292 --> 00:36:50,960 from the attacker's perspective. 810 00:36:50,960 --> 00:36:55,640 And so, as I mentioned earlier, these types of indirection 811 00:36:55,640 --> 00:36:58,000 are one of the key ways that attackers 812 00:36:58,000 --> 00:37:02,710 try to evade law enforcement and these detection heuristics.
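To make the rotation described above concrete, here is a small simulation. It is a sketch under stated assumptions rather than a model of any real botnet or detector: the round-robin rotation, the traffic rate, the one-hour window, and the names `active_ip` and `messages_per_ip` are all hypothetical. The addresses come from a range reserved for documentation. Only the 300-second binding and the 1,000-message threshold come from the lecture.

```python
from collections import Counter

TTL_SECONDS = 300   # how long one hostname-to-IP binding lasts (from the lecture)
THRESHOLD = 1_000   # per-server message count that triggers the example heuristic


def active_ip(ip_pool, now_seconds, ttl=TTL_SECONDS):
    """Return the address the hostname is bound to at a given time (round-robin)."""
    slot = now_seconds // ttl
    return ip_pool[slot % len(ip_pool)]


def messages_per_ip(ip_pool, duration_seconds, messages_per_second, ttl=TTL_SECONDS):
    """Count how many messages a per-server detector attributes to each address."""
    counts = Counter()
    for t in range(duration_seconds):
        counts[active_ip(ip_pool, t, ttl)] += messages_per_second
    return counts


ONE_HOUR = 3_600
RATE = 3  # hypothetical: three spam-like messages per second

single = messages_per_ip(["192.0.2.1"], ONE_HOUR, RATE)
pool = [f"192.0.2.{i}" for i in range(1, 21)]  # 20 placeholder addresses
rotated = messages_per_ip(pool, ONE_HOUR, RATE)

print("fixed server flagged:", [ip for ip, n in single.items() if n > THRESHOLD])
print("rotating servers flagged:", [ip for ip, n in rotated.items() if n > THRESHOLD])
```

With these made-up numbers the total traffic is identical in both runs, but the fixed server crosses the threshold while no single rotating address does, which is the sense in which the rotation helps the attacker stay under a per-server heuristic.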
813 00:37:02,710 --> 00:37:05,540 So you might think about, well, what if we just 814 00:37:05,540 --> 00:37:07,480 take down the DNS server? 815 00:37:07,480 --> 00:37:09,337 How hard is it to do that? 816 00:37:09,337 --> 00:37:10,795 Well, as the paper describes, there 817 00:37:10,795 --> 00:37:12,160 are a couple different layers on which 818 00:37:12,160 --> 00:37:13,500 you can attack these spammers. 819 00:37:13,500 --> 00:37:17,409 So you can try to take down the attacker's domain registration. 820 00:37:17,409 --> 00:37:18,950 That's basically the thing that says, 821 00:37:18,950 --> 00:37:25,050 like, hey, if you're looking for russianpharma.rx.biz.org, 822 00:37:25,050 --> 00:37:27,299 then here's the DNS server that you talk to. 823 00:37:27,299 --> 00:37:29,090 You can imagine attacking it at that level. 824 00:37:29,090 --> 00:37:30,548 You could also imagine attacking it 825 00:37:30,548 --> 00:37:34,060 at the level of taking down the spammer's DNS server, 826 00:37:34,060 --> 00:37:36,120 the thing to which you'll be redirected once you 827 00:37:36,120 --> 00:37:38,552 look at that top-level domain. 828 00:37:38,552 --> 00:37:40,260 And so what's tricky is that the attacker 829 00:37:40,260 --> 00:37:43,540 can use these sort of fast flux techniques 830 00:37:43,540 --> 00:37:44,800 at every different level. 831 00:37:44,800 --> 00:37:47,600 So, for example, they can rotate the servers 832 00:37:47,600 --> 00:37:49,360 they use to act as their DNS servers. 833 00:37:49,360 --> 00:37:54,970 They can rotate the web servers they use to send out the spam. 834 00:37:54,970 --> 00:37:56,388 And so on and so forth. 835 00:37:56,388 --> 00:37:58,221 So that's just a high-level review 836 00:37:58,221 --> 00:37:59,846 of how people can use multiple machines 837 00:37:59,846 --> 00:38:03,810 to try to avoid detection. 
838 00:38:03,810 --> 00:38:09,540 So as I mentioned earlier, you can use compromised 839 00:38:09,540 --> 00:38:14,660 webmail accounts to send spam. 840 00:38:20,900 --> 00:38:25,190 And the power of that is that if you can get access 841 00:38:25,190 --> 00:38:27,065 to someone's account, then you don't actually 842 00:38:27,065 --> 00:38:28,773 have to install malware on their machine. 843 00:38:28,773 --> 00:38:30,374 You can actually access their account 844 00:38:30,374 --> 00:38:32,290 from the privacy of your own machine, wherever 845 00:38:32,290 --> 00:38:33,373 it is that you're located. 846 00:38:33,373 --> 00:38:36,074 And as we were discussing earlier, 847 00:38:36,074 --> 00:38:37,740 this is useful for spear phishing attacks, 848 00:38:37,740 --> 00:38:40,690 because you can send this spam message as the person whose 849 00:38:40,690 --> 00:38:42,570 account it actually belongs to. 850 00:38:42,570 --> 00:38:44,280 And so as a result the webmail providers 851 00:38:44,280 --> 00:38:47,714 are very motivated to shut this kind of thing down. 852 00:38:47,714 --> 00:38:49,380 Because if they don't do that, then they 853 00:38:49,380 --> 00:38:51,600 risk being blacklisted as a whole. 854 00:38:51,600 --> 00:38:54,880 All the users risk being flagged as spam, which they don't want. 855 00:38:54,880 --> 00:38:58,140 And also the provider actually needs to somehow monetize 856 00:38:58,140 --> 00:38:58,870 their service. 857 00:38:58,870 --> 00:39:01,750 They actually need real users to be doing things 858 00:39:01,750 --> 00:39:03,550 like clicking on ads in the righthand bar 859 00:39:03,550 --> 00:39:04,880 of their webmail account. 860 00:39:04,880 --> 00:39:08,380 So the higher the proportion of their users which are spamming, 861 00:39:08,380 --> 00:39:10,535 the less likely advertisers are to advertise 862 00:39:10,535 --> 00:39:11,920 in their webmail system.
863 00:39:11,920 --> 00:39:13,972 So the webmail account providers are 864 00:39:13,972 --> 00:39:17,280 very incentivized to shut down this kind of stuff. 865 00:39:17,280 --> 00:39:20,330 So how do they try to detect this type of spam? 866 00:39:20,330 --> 00:39:21,450 They use those heuristics. 867 00:39:21,450 --> 00:39:24,350 They might try to use CAPTCHAs. 868 00:39:24,350 --> 00:39:27,180 If they suspect that you've sent some spam-like messages, 869 00:39:27,180 --> 00:39:28,835 let's say five times in a row, they 870 00:39:28,835 --> 00:39:30,960 might ask you to type in one of those fuzzy letters 871 00:39:30,960 --> 00:39:32,950 or whatever. 872 00:39:32,950 --> 00:39:35,150 Suffice it to say, though, a lot of these techniques 873 00:39:35,150 --> 00:39:36,520 don't work very well. 874 00:39:36,520 --> 00:39:41,650 If you look at the price per account, 875 00:39:41,650 --> 00:39:43,425 so how much you as a spammer would 876 00:39:43,425 --> 00:39:45,880 have to pay to get one of these things, 877 00:39:45,880 --> 00:39:47,590 it's still super, super cheap. 878 00:39:47,590 --> 00:39:54,860 So it's on the order of $0.01 to $0.05 for an account on Yahoo, 879 00:39:54,860 --> 00:39:56,770 Gmail, Hotmail, something like that. 880 00:39:56,770 --> 00:39:59,030 So once again, this is very, very low. 881 00:39:59,030 --> 00:40:01,580 And so this does not act as an effective disincentive 882 00:40:01,580 --> 00:40:04,670 for spammers to try to do these types of things. 883 00:40:04,670 --> 00:40:08,590 So this maybe is a little bit disappointing, 884 00:40:08,590 --> 00:40:10,740 because it seems like everywhere we 885 00:40:10,740 --> 00:40:13,160 go, we have to solve these CAPTCHAs if we 886 00:40:13,160 --> 00:40:15,459 want to buy things or send emails or do 887 00:40:15,459 --> 00:40:16,250 that kind of stuff. 888 00:40:16,250 --> 00:40:20,480 So basically, what happened to CAPTCHAs? 
889 00:40:20,480 --> 00:40:24,660 They were supposed to make all this bad stuff go away. 890 00:40:24,660 --> 00:40:29,580 And as it turns out, the attacker 891 00:40:29,580 --> 00:40:34,250 can build services to solve CAPTCHAs. 892 00:40:37,580 --> 00:40:41,210 So this can be automated, just like anything else. 893 00:40:44,380 --> 00:40:46,860 As it turns out, the economics for this 894 00:40:46,860 --> 00:40:49,440 is that if you want to solve one CAPTCHA, 895 00:40:49,440 --> 00:40:57,521 then it's approximately $0.001 to solve a CAPTCHA. 896 00:40:57,521 --> 00:40:59,930 Which is nothing. 897 00:40:59,930 --> 00:41:02,830 And this can be done with very, very low latency, too. 898 00:41:02,830 --> 00:41:05,400 So CAPTCHAs essentially are not presenting 899 00:41:05,400 --> 00:41:08,620 most large-scale spammers with a high barrier 900 00:41:08,620 --> 00:41:10,200 for sending these spams. 901 00:41:10,200 --> 00:41:12,630 And so how is this being done? 902 00:41:12,630 --> 00:41:14,290 If it's this cheap, you might think, 903 00:41:14,290 --> 00:41:17,182 maybe it's being done all by computers, by software. 904 00:41:17,182 --> 00:41:18,140 But it's not, actually. 905 00:41:18,140 --> 00:41:21,434 So a lot of this is done by humans. 906 00:41:25,780 --> 00:41:29,903 In particular, the attacker can outsource this in one 907 00:41:29,903 --> 00:41:30,650 of two ways. 908 00:41:30,650 --> 00:41:32,191 So first of all the attacker can just 909 00:41:32,191 --> 00:41:34,570 find a labor market where the cost of labor 910 00:41:34,570 --> 00:41:36,340 is very, very cheap. 911 00:41:36,340 --> 00:41:39,740 So you can employ humans to essentially act 912 00:41:39,740 --> 00:41:42,154 as CAPTCHA solvers for you. 913 00:41:42,154 --> 00:41:44,070 You, the spammer, are presented with a CAPTCHA 914 00:41:44,070 --> 00:41:45,240 by Gmail or whatever.
915 00:41:45,240 --> 00:41:47,470 You, the spammer, then send that CAPTCHA 916 00:41:47,470 --> 00:41:49,290 over to some human sitting somewhere. 917 00:41:49,290 --> 00:41:51,690 They solve for you, they've earned some small amount 918 00:41:51,690 --> 00:41:54,340 of money, and then you send their answer 919 00:41:54,340 --> 00:41:56,410 to the legitimate site. 920 00:41:56,410 --> 00:42:02,160 You could also do this with Mechanical Turk. 921 00:42:02,160 --> 00:42:05,064 Have you guys heard of Mechanical Turk? 922 00:42:05,064 --> 00:42:07,800 I've asked the question, my back is turned, [INAUDIBLE]. 923 00:42:07,800 --> 00:42:11,230 OK, so Mechanical Turk is pretty neat, 924 00:42:11,230 --> 00:42:12,980 I mean neat if you're trying to do evil. 925 00:42:12,980 --> 00:42:13,880 So what's nice about that is that you 926 00:42:13,880 --> 00:42:16,192 can post these tasks on Mechanical Turk and say, 927 00:42:16,192 --> 00:42:18,650 hey, I have a picture-solving game, or something like this. 928 00:42:18,650 --> 00:42:20,390 Or you can just come out and say straight up, 929 00:42:20,390 --> 00:42:22,015 I've got some CAPTCHAs I want to solve. 930 00:42:22,015 --> 00:42:23,990 You post a price, and then basically the market 931 00:42:23,990 --> 00:42:26,466 will match you with people who are willing to do that task. 932 00:42:26,466 --> 00:42:28,840 And then they'll do it for you, they'll post the answers. 933 00:42:28,840 --> 00:42:34,060 So this actually automates a lot of actually finding 934 00:42:34,060 --> 00:42:37,180 the labor pool for the spammer. 935 00:42:37,180 --> 00:42:38,907 The problem with this is that you 936 00:42:38,907 --> 00:42:40,365 have more overhead for the spammer, 937 00:42:40,365 --> 00:42:43,955 because Amazon has to take some cut of that profit that's 938 00:42:43,955 --> 00:42:44,890 generated from that. 939 00:42:44,890 --> 00:42:48,410 But that's very nice there. 
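As a rough back-of-the-envelope sketch of the prices quoted above (about $0.001 per solved CAPTCHA, and $0.01 to $0.05 per webmail account), the following Python snippet estimates what a spammer would pay at a given volume. The function names and the volumes used in the example are illustrative assumptions, not figures from the lecture; the Mechanical Turk cut mentioned above is not modeled.

```python
from decimal import Decimal

# Prices quoted in the lecture, in dollars. Decimal avoids float rounding noise.
PRICE_PER_CAPTCHA = Decimal("0.001")
ACCOUNT_PRICE_LOW = Decimal("0.01")
ACCOUNT_PRICE_HIGH = Decimal("0.05")


def captcha_cost(num_captchas):
    """Total cost in dollars of outsourcing `num_captchas` CAPTCHA solves."""
    return PRICE_PER_CAPTCHA * num_captchas


def account_cost_range(num_accounts):
    """(low, high) cost in dollars of buying `num_accounts` webmail accounts."""
    return (ACCOUNT_PRICE_LOW * num_accounts, ACCOUNT_PRICE_HIGH * num_accounts)


if __name__ == "__main__":
    # Hypothetical volumes, chosen only to make the scale visible.
    print("1,000,000 CAPTCHAs cost $", captcha_cost(1_000_000))
    print("1,000 accounts cost between $%s and $%s" % account_cost_range(1_000))
```

Even at a million solves the bill stays around a thousand dollars, which is the sense in which these prices do not act as much of a barrier.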
940 00:42:48,410 --> 00:42:50,780 Another thing that attackers can do 941 00:42:50,780 --> 00:42:55,530 is they can actually reuse CAPTCHAs on legitimate sites. 942 00:42:55,530 --> 00:42:58,610 So there's some CAPTCHA that the attacker wants to solve. 943 00:42:58,610 --> 00:43:00,590 They then have some legitimate site 944 00:43:00,590 --> 00:43:03,590 on the side where they present that exact same CAPTCHA, 945 00:43:03,590 --> 00:43:06,510 and get a real visitor to figure out what that CAPTCHA is. 946 00:43:06,510 --> 00:43:08,680 Then they come back over to the first site 947 00:43:08,680 --> 00:43:11,590 and then use that answer as the answer. 948 00:43:11,590 --> 00:43:14,001 And like all these crowdsourcing-type things, 949 00:43:14,001 --> 00:43:15,626 if you don't trust your users, then you 950 00:43:15,626 --> 00:43:17,540 can maybe replicate the work. 951 00:43:17,540 --> 00:43:19,880 So you send the CAPTCHA to maybe two or three people. 952 00:43:19,880 --> 00:43:21,963 And then you come back in and use majority voting, 953 00:43:21,963 --> 00:43:25,430 take whatever that majority vote was as your CAPTCHA answer. 954 00:43:25,430 --> 00:43:27,190 And so these are some of the reasons 955 00:43:27,190 --> 00:43:29,270 why the CAPTCHA defenses don't work 956 00:43:29,270 --> 00:43:31,130 as well as you might think. 957 00:43:31,130 --> 00:43:34,590 So the providers, so for example Gmail or Yahoo or whatever, 958 00:43:34,590 --> 00:43:37,840 can try to implement more frequent CAPTCHAs 959 00:43:37,840 --> 00:43:42,200 to try to push the friction level up for the spammer. 960 00:43:42,200 --> 00:43:44,320 The problem there is that then regular users 961 00:43:44,320 --> 00:43:45,490 will get irritated. 962 00:43:45,490 --> 00:43:47,960 So a good example of this is Gmail's 963 00:43:47,960 --> 00:43:49,210 two-factor authentication. 964 00:43:49,210 --> 00:43:51,610 It's actually a super good idea.
965 00:43:51,610 --> 00:43:53,585 Whenever Gmail will detect that you're 966 00:43:53,585 --> 00:43:55,320 trying to use Gmail from a machine 967 00:43:55,320 --> 00:43:57,580 that it doesn't know about, it'll 968 00:43:57,580 --> 00:44:00,025 basically send you a text message saying hey, enter 969 00:44:00,025 --> 00:44:02,940 this verification code into Gmail 970 00:44:02,940 --> 00:44:05,170 before you can actually continue to use the service. 971 00:44:05,170 --> 00:44:07,336 And so what's funny is that it's a super great idea, 972 00:44:07,336 --> 00:44:09,370 but at least for me, I get super irritated 973 00:44:09,370 --> 00:44:11,044 when I have to get that text message. 974 00:44:11,044 --> 00:44:13,210 Like, I know it's good for me, but I just get angry. 975 00:44:13,210 --> 00:44:13,918 It's frictionful. 976 00:44:13,918 --> 00:44:15,479 And so I'll do it if I don't migrate 977 00:44:15,479 --> 00:44:17,020 to a lot of different machines a lot, 978 00:44:17,020 --> 00:44:19,640 but if I had to do it any more than I did right now, 979 00:44:19,640 --> 00:44:22,800 it's unclear that I'd feel as happy about it as I do. 980 00:44:22,800 --> 00:44:24,690 So there's this very interesting sort 981 00:44:24,690 --> 00:44:27,060 of tradeoff between the security that people 982 00:44:27,060 --> 00:44:29,660 say that they want and the security measures that they're 983 00:44:29,660 --> 00:44:30,740 willing to put up with. 984 00:44:30,740 --> 00:44:32,490 So as a result, it's very difficult 985 00:44:32,490 --> 00:44:35,485 for the webmail providers to increase the amount of CAPTCHAs 986 00:44:35,485 --> 00:44:38,620 and still keep users happy. 987 00:44:38,620 --> 00:44:40,490 OK, so any other questions before we move on 988 00:44:40,490 --> 00:44:41,360 to click support? 
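The majority-voting trick described earlier for crowdsourced CAPTCHA answers (send the same CAPTCHA to two or three people and keep the answer most of them agree on) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, answers are compared as exact strings after trimming whitespace, and a result is only accepted when more than half of the solvers agree.

```python
from collections import Counter


def majority_answer(answers):
    """Return the answer given by a strict majority of solvers, or None.

    `answers` is a list of strings, one per solver who saw the same CAPTCHA.
    """
    cleaned = [a.strip() for a in answers]
    if not cleaned:
        return None
    # most_common(1) yields the single (answer, count) pair with the top count.
    answer, count = Counter(cleaned).most_common(1)[0]
    return answer if count > len(cleaned) / 2 else None


if __name__ == "__main__":
    # Two of three hypothetical solvers agree, so their answer wins.
    print(majority_answer(["x7Kq", "x7Kq ", "x7Kg"]))
    # No agreement at all: the caller would re-send the CAPTCHA.
    print(majority_answer(["abc", "abd", "abe"]))
```

Requiring a strict majority rather than just the most frequent answer is a design choice made here for the sketch; with three solvers it means at least two must agree.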
989 00:44:41,360 --> 00:44:45,824 AUDIENCE: So is one of the reasons for the non-adoption 990 00:44:45,824 --> 00:44:49,296 of encrypted emails, besides the [INAUDIBLE] 991 00:44:49,296 --> 00:44:52,770 is that spam filters have a very, very big part? 992 00:44:52,770 --> 00:44:56,374 PROFESSOR: Ah, because then they can't inspect messages and see 993 00:44:56,374 --> 00:44:57,040 what's going on. 994 00:44:57,040 --> 00:44:57,998 That's a good question. 995 00:44:57,998 --> 00:44:59,530 I think it's actually hard to say. 996 00:44:59,530 --> 00:45:01,820 I don't know, because it's a little bit of a chicken and egg 997 00:45:01,820 --> 00:45:02,320 problem. 998 00:45:02,320 --> 00:45:05,260 So because there isn't a huge volume of encrypted email, 999 00:45:05,260 --> 00:45:07,977 it's unclear whether spammers are actually trying 1000 00:45:07,977 --> 00:45:09,060 to take advantage of that. 1001 00:45:09,060 --> 00:45:11,130 But I could see that maybe being a problem. 1002 00:45:11,130 --> 00:45:12,810 I mean, people have looked at ways 1003 00:45:12,810 --> 00:45:16,880 to do computation over encrypted data. 1004 00:45:16,880 --> 00:45:19,430 So maybe you could think about doing something there. 1005 00:45:19,430 --> 00:45:20,880 But it's always tricky. 1006 00:45:20,880 --> 00:45:22,560 So for example, with spam, people 1007 00:45:22,560 --> 00:45:25,730 have these spam filters that were based on Markov models 1008 00:45:25,730 --> 00:45:26,810 and things like that. 1009 00:45:26,810 --> 00:45:27,935 So what do the spammers do? 1010 00:45:27,935 --> 00:45:30,950 They start making these images that basically 1011 00:45:30,950 --> 00:45:32,480 can't be seen by the text scanners, 1012 00:45:32,480 --> 00:45:34,313 but then have the spamming content in there. 1013 00:45:34,313 --> 00:45:38,290 So it's always an arms race. 1014 00:45:38,290 --> 00:45:38,995 All right. 1015 00:45:38,995 --> 00:45:44,100 So let's move on to click support. 
1016 00:45:44,100 --> 00:45:49,000 So what is this about? 1017 00:45:49,000 --> 00:45:51,870 So once the advertising step has succeeded 1018 00:45:51,870 --> 00:45:54,930 and the user is given a link, so these are clicks on that link, 1019 00:45:54,930 --> 00:46:01,630 so the user contacts some DNS server 1020 00:46:01,630 --> 00:46:09,010 after clicking on that link to basically translate 1021 00:46:09,010 --> 00:46:18,130 some hostname that was in that link to some IP. 1022 00:46:18,130 --> 00:46:21,940 And then after that translation takes place, 1023 00:46:21,940 --> 00:46:34,980 the user has to contact some web server that has that IP. 1024 00:46:34,980 --> 00:46:37,080 So to make all this work, the spammer 1025 00:46:37,080 --> 00:46:44,838 has to register a domain name. 1026 00:46:44,838 --> 00:46:54,570 And then the spammer has to run a DNS server, 1027 00:46:54,570 --> 00:46:56,920 and then they have to run a web server. 1028 00:47:02,930 --> 00:47:05,010 So this is essentially what the spammer 1029 00:47:05,010 --> 00:47:07,950 has to do to make this click support thing work out. 1030 00:47:07,950 --> 00:47:10,376 So one question you might have is, well, 1031 00:47:10,376 --> 00:47:13,380 why wouldn't the spammer just use 1032 00:47:13,380 --> 00:47:18,101 raw IP addresses, for example, like in these spam URLs? 1033 00:47:18,101 --> 00:47:20,100 And so does anyone have any thoughts about that? 1034 00:47:20,100 --> 00:47:25,046 Why wouldn't you just have 183.4.4 dot whatever, 1035 00:47:25,046 --> 00:47:27,530 instead of having something like russianjewels.biz? 1036 00:47:27,530 --> 00:47:29,495 AUDIENCE: Because it looks sketchy, 1037 00:47:29,495 --> 00:47:30,694 it makes it easier to tell. 1038 00:47:30,694 --> 00:47:31,360 PROFESSOR: Yeah. 
1039 00:47:31,360 --> 00:47:34,814 So one thing, one would hope, is that a user would 1040 00:47:34,814 --> 00:47:37,230 look at this thing that just has a bunch of numbers in it, 1041 00:47:37,230 --> 00:47:39,962 and they'd say, well, this clearly seems weird. 1042 00:47:39,962 --> 00:47:42,420 As it turns out, this will only weed out some of the users, 1043 00:47:42,420 --> 00:47:43,461 but you're exactly right. 1044 00:47:43,461 --> 00:47:46,225 There's a subset of people you would lose just because nobody 1045 00:47:46,225 --> 00:47:47,730 wants to click on that. 1046 00:47:47,730 --> 00:47:50,210 Another reason is that once again, 1047 00:47:50,210 --> 00:47:53,580 having this sort of DNS infrastructure up here 1048 00:47:53,580 --> 00:47:56,220 gives the attacker another level of indirection. 1049 00:47:56,220 --> 00:47:59,900 So once again, if the legal authorities or whoever 1050 00:47:59,900 --> 00:48:02,280 shut down the DNS infrastructure but they somehow 1051 00:48:02,280 --> 00:48:05,400 don't manage to shut down that back-end web server, 1052 00:48:05,400 --> 00:48:07,524 then the spammer can conjure up a different sort 1053 00:48:07,524 --> 00:48:09,190 of front end for their service and maybe 1054 00:48:09,190 --> 00:48:11,930 try to use that same web server on the back end. 1055 00:48:11,930 --> 00:48:13,450 So that's another reason, I think, 1056 00:48:13,450 --> 00:48:16,960 that people don't typically put these raw IP 1057 00:48:16,960 --> 00:48:21,020 addresses in their spam URLs. 1058 00:48:21,020 --> 00:48:27,400 So another example of how this redirection comes into play-- 1059 00:48:27,400 --> 00:48:29,790 how this indirection comes into play, sorry-- 1060 00:48:29,790 --> 00:48:37,445 is that these spam URLs often point to redirection sites. 1061 00:48:43,070 --> 00:48:48,660 And so these are sites like bit.ly, or things like that. 
1062 00:48:48,660 --> 00:48:52,793 And so in addition to things like bit.ly, 1063 00:48:52,793 --> 00:48:55,870 you could also imagine that a compromised 1064 00:48:55,870 --> 00:49:02,515 website can actually also act as a redirector. 1065 00:49:05,310 --> 00:49:09,134 You just put the appropriate HTML or JavaScript in there 1066 00:49:09,134 --> 00:49:10,675 that when the user goes to that site, 1067 00:49:10,675 --> 00:49:13,520 it's then going to redirect the user's browser 1068 00:49:13,520 --> 00:49:15,674 to some other different site. 1069 00:49:15,674 --> 00:49:17,590 So once again, this is useful because it provides 1070 00:49:17,590 --> 00:49:19,320 that level of indirection. 1071 00:49:19,320 --> 00:49:21,585 And it actually acts as a force multiplier, 1072 00:49:21,585 --> 00:49:25,770 so you have a single spamming web server back end, 1073 00:49:25,770 --> 00:49:29,180 but then you can name it using different things. 1074 00:49:29,180 --> 00:49:32,480 And that will allow you to maybe confuse 1075 00:49:32,480 --> 00:49:35,980 filters that have blacklisted, let's say, 10% of your URLs, 1076 00:49:35,980 --> 00:49:37,970 but not the other 90% of them. 1077 00:49:37,970 --> 00:49:40,290 So this is a very, very common technique. 1078 00:49:40,290 --> 00:49:45,770 And then another thing is that sometimes the spammers 1079 00:49:45,770 --> 00:49:58,070 can use botnets as web servers or maybe as proxies, as DNS 1080 00:49:58,070 --> 00:50:01,800 servers, and so on and so forth. 1081 00:50:01,800 --> 00:50:04,990 We mentioned this a little bit earlier, 1082 00:50:04,990 --> 00:50:07,860 but this is another example of how the more machines you have 1083 00:50:07,860 --> 00:50:10,237 as an attacker, the more defense that gives you. 1084 00:50:10,237 --> 00:50:12,320 Because you can hide your evil amongst a watershed 1085 00:50:12,320 --> 00:50:12,870 of machines. 1086 00:50:16,758 --> 00:50:20,160 All right.
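The "force multiplier" point above, where many redirector names all lead to one back-end web server and a filter has blacklisted only some of them, can be illustrated with a small Python sketch. Everything here is hypothetical: the `.example` hostnames, the choice of ten redirectors, and the function name are assumptions made for illustration, and no real network requests are made.

```python
# A single hypothetical back-end server that actually serves the spam site.
BACKEND = "http://spam-backend.example/"

# Ten hypothetical redirector URLs (URL shorteners, compromised sites, ...)
# that all send the user's browser on to the same back end.
REDIRECTS = {f"http://redirector{i}.example/r": BACKEND for i in range(10)}


def surviving_urls(redirects, blacklist):
    """Return the redirector URLs that a URL blacklist has not yet caught."""
    return [url for url in redirects if url not in blacklist]


if __name__ == "__main__":
    # The filter has blacklisted 10% of the names (one of ten).
    blacklist = {"http://redirector0.example/r"}
    alive = surviving_urls(REDIRECTS, blacklist)
    print(len(alive), "of", len(REDIRECTS), "names still reach", BACKEND)
```

Blacklisting individual names leaves the back end untouched, so the spammer only needs to keep minting new names faster than filters can list them.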
1087 00:50:20,160 --> 00:50:22,802 So in some cases, one of the things the paper talks about 1088 00:50:22,802 --> 00:50:24,010 is these affiliate providers. 1089 00:50:24,010 --> 00:50:29,290 These affiliate providers kind of act as evil clearinghouses. 1090 00:50:29,290 --> 00:50:31,905 They will help to automate some of the tedium of interacting 1091 00:50:31,905 --> 00:50:34,020 with the banks, and things like this, 1092 00:50:34,020 --> 00:50:35,730 on behalf of you, the spammer. 1093 00:50:35,730 --> 00:50:37,610 So one thing you might wonder is, well, 1094 00:50:37,610 --> 00:50:39,800 why can't the law enforcement just take down 1095 00:50:39,800 --> 00:50:41,090 the affiliate providers? 1096 00:50:41,090 --> 00:50:43,152 They seem kind of like a choke point. 1097 00:50:43,152 --> 00:50:45,110 And the thing is that these affiliate providers 1098 00:50:45,110 --> 00:50:48,310 are kind of like SPECTRE from the James Bond movies. 1099 00:50:48,310 --> 00:50:50,220 They're very decentralized themselves. 1100 00:50:50,220 --> 00:50:53,184 So it's very difficult to point to an affiliate provider 1101 00:50:53,184 --> 00:50:55,350 at this particular machine, and we'll just shut down 1102 00:50:55,350 --> 00:50:56,530 that particular machine. 1103 00:50:56,530 --> 00:50:58,000 Oftentimes the affiliate providers 1104 00:50:58,000 --> 00:50:59,640 are distributed themselves. 1105 00:50:59,640 --> 00:51:01,800 So that means that it's actually pretty tricky for, 1106 00:51:01,800 --> 00:51:04,770 let's say, the FBI, to just go to some affiliate program 1107 00:51:04,770 --> 00:51:07,840 and say, thou shalt not do this anymore. 1108 00:51:07,840 --> 00:51:09,420 Another interesting thing, too, is 1109 00:51:09,420 --> 00:51:12,830 that the paper mentions that in many countries 1110 00:51:12,830 --> 00:51:14,640 IP laws are different, for example. 
1111 00:51:14,640 --> 00:51:17,600 So the FBI may not be able to enforce intellectual properties 1112 00:51:17,600 --> 00:51:19,430 that we have with other countries. 1113 00:51:19,430 --> 00:51:21,520 And also, according to the paper, 1114 00:51:21,520 --> 00:51:23,755 in many of these spam forums, the spammers 1115 00:51:23,755 --> 00:51:26,790 claim they are providing a useful, legitimate service 1116 00:51:26,790 --> 00:51:28,370 to Western countries. 1117 00:51:28,370 --> 00:51:30,720 They say that essentially, prices 1118 00:51:30,720 --> 00:51:32,380 are too high for some of these things, 1119 00:51:32,380 --> 00:51:34,900 in these Western countries, and that the fact that people 1120 00:51:34,900 --> 00:51:37,850 are clicking on demand indicates there's a legitimate need 1121 00:51:37,850 --> 00:51:41,970 to buy Windows copies that may be riddled with malware. 1122 00:51:41,970 --> 00:51:44,399 So a lot of times the spammers themselves 1123 00:51:44,399 --> 00:51:46,190 don't feel that they're doing anything bad. 1124 00:51:46,190 --> 00:51:48,050 And as we'll discuss a little bit later, 1125 00:51:48,050 --> 00:51:50,430 the spammers do often actually give you 1126 00:51:50,430 --> 00:51:52,476 the stuff that you've paid money for, 1127 00:51:52,476 --> 00:51:54,642 which for me was one of the most surprising outcomes 1128 00:51:54,642 --> 00:51:55,790 of the paper. 1129 00:51:55,790 --> 00:51:59,610 And so we'll discuss why that is in a little bit. 1130 00:51:59,610 --> 00:52:02,030 So one thing that the paper talks about 1131 00:52:02,030 --> 00:52:05,380 is various takedown strategies that you 1132 00:52:05,380 --> 00:52:09,680 can imagine employing to try to stop a spammer. 1133 00:52:09,680 --> 00:52:11,420 So one thing it talked about, they 1134 00:52:11,420 --> 00:52:24,900 said that only a few number of registrars host 1135 00:52:24,900 --> 00:52:27,955 domains for many affiliates. 
1136 00:52:32,330 --> 00:52:37,195 And so what that means is that most of these affiliate 1137 00:52:37,195 --> 00:52:40,900 programs are-- there's sort of this one-to-one binding 1138 00:52:40,900 --> 00:52:43,350 between affiliates and the registrars that 1139 00:52:43,350 --> 00:52:45,950 are dealing with their domain name and infrastructure. 1140 00:52:45,950 --> 00:52:48,360 It's very rare that you have a single domain name 1141 00:52:48,360 --> 00:52:51,280 registrar who's going to be associated 1142 00:52:51,280 --> 00:52:53,390 with a bunch of different affiliate programs. 1143 00:52:53,390 --> 00:52:55,056 So what that means is that in many cases 1144 00:52:55,056 --> 00:52:57,240 there's not this, like, master decapitation strike 1145 00:52:57,240 --> 00:52:58,520 you could launch, where you'd take out 1146 00:52:58,520 --> 00:53:00,603 this particular registrar and then all of a sudden 1147 00:53:00,603 --> 00:53:03,360 the entire spam infrastructure falls down. 1148 00:53:03,360 --> 00:53:09,670 They found similar results for things like web servers. 1149 00:53:09,670 --> 00:53:12,330 It's very rare that one ISP will actually 1150 00:53:12,330 --> 00:53:16,230 host a ton of web servers for a ton of affiliate programs. 1151 00:53:16,230 --> 00:53:17,910 This distributed nature, once again, 1152 00:53:17,910 --> 00:53:20,000 makes it very difficult to say, if we just 1153 00:53:20,000 --> 00:53:23,050 take out these three things then the whole ecosystem just 1154 00:53:23,050 --> 00:53:25,560 crumbles. 1155 00:53:25,560 --> 00:53:27,300 So that's a little bit disappointing, 1156 00:53:27,300 --> 00:53:29,130 because one would hope that there'd 1157 00:53:29,130 --> 00:53:34,000 be one web server in Evildonia, where if we could just 1158 00:53:34,000 --> 00:53:36,865 take down Evildonia, then people would stop sending us spam. 1159 00:53:36,865 --> 00:53:38,290 That's actually not true. 
1160 00:53:38,290 --> 00:53:40,490 As we'll see later, though, that may 1161 00:53:40,490 --> 00:53:42,470 be true to some extent at the banking back end. 1162 00:53:42,470 --> 00:53:44,990 And so maybe we can actually put the squeeze on there. 1163 00:53:44,990 --> 00:53:48,580 So anyway, I was alluding to earlier about this realization 1164 00:53:48,580 --> 00:53:51,320 phase. 1165 00:53:51,320 --> 00:53:57,220 So the realization phase is what happens after you, the user, 1166 00:53:57,220 --> 00:54:00,050 have decided to buy something. 1167 00:54:00,050 --> 00:54:03,660 So the realization phase consists of two parts. 1168 00:54:03,660 --> 00:54:07,770 The user pays for whatever goods they've bought, 1169 00:54:07,770 --> 00:54:14,140 or they want to buy, and then the user hopefully 1170 00:54:14,140 --> 00:54:17,700 will receive those goods. 1171 00:54:17,700 --> 00:54:20,450 So either in the mail because they're 1172 00:54:20,450 --> 00:54:23,180 buying some type of knockoff drug, 1173 00:54:23,180 --> 00:54:25,489 or they get some software download 1174 00:54:25,489 --> 00:54:27,780 because they want to get some fake version of Photoshop 1175 00:54:27,780 --> 00:54:28,780 or something like that. 1176 00:54:28,780 --> 00:54:33,870 And so the money flow looks something like this. 1177 00:54:33,870 --> 00:54:38,840 We start with the customer here, and they're 1178 00:54:38,840 --> 00:54:44,180 going to tell the merchant hey, I want to go buy something. 1179 00:54:44,180 --> 00:54:47,430 They will send some credit card info here, 1180 00:54:47,430 --> 00:54:50,050 and then the merchant is going to talk to the payment 1181 00:54:50,050 --> 00:54:52,800 processor. 1182 00:54:52,800 --> 00:54:54,840 And this is essentially a middleman 1183 00:54:54,840 --> 00:54:58,650 that helps the merchant, the spammer, 1184 00:54:58,650 --> 00:55:00,710 deal with some of the intricacies of interacting 1185 00:55:00,710 --> 00:55:03,160 with the credit card system. 
1186 00:55:03,160 --> 00:55:07,320 The payment processor will talk to the acquiring bank. 1187 00:55:10,097 --> 00:55:12,180 So the acquiring bank, that's the merchant's bank. 1188 00:55:17,630 --> 00:55:20,000 And then the acquiring bank-- running out of space here. 1189 00:55:20,000 --> 00:55:24,120 So, violating all good design standards, 1190 00:55:24,120 --> 00:55:25,880 we will come up here. 1191 00:55:25,880 --> 00:55:28,860 So the acquiring bank is then going to talk to-- they 1192 00:55:28,860 --> 00:55:33,400 call them in the paper the association network, 1193 00:55:33,400 --> 00:55:35,940 but just think of this as Visa. 1194 00:55:35,940 --> 00:55:40,170 This is the credit card network up here. 1195 00:55:40,170 --> 00:55:42,290 And then finally the association network, 1196 00:55:42,290 --> 00:55:48,460 Visa or MasterCard or whatever, talks to the issuing bank. 1197 00:55:48,460 --> 00:55:52,070 So that issuing bank is the customer's bank. 1198 00:55:52,070 --> 00:55:57,067 And essentially the Visa or whoever 1199 00:55:57,067 --> 00:55:59,150 is going to go to the customer's bank and say hey, 1200 00:55:59,150 --> 00:56:00,191 is this a legit purchase? 1201 00:56:00,191 --> 00:56:01,570 Is this a legit transaction? 1202 00:56:01,570 --> 00:56:03,280 And if this is a legit transaction, 1203 00:56:03,280 --> 00:56:04,970 then the money will actually flow 1204 00:56:04,970 --> 00:56:06,255 through this entire system. 1205 00:56:06,255 --> 00:56:11,810 So this is what the end-to-end financial workflow looks like. 1206 00:56:11,810 --> 00:56:13,992 And so this workflow can actually 1207 00:56:13,992 --> 00:56:14,950 process a lot of money. 1208 00:56:14,950 --> 00:56:18,030 So one of the papers that we mentioned in the lecture notes 1209 00:56:18,030 --> 00:56:20,090 shows that a single affiliate can 1210 00:56:20,090 --> 00:56:23,530 get more than $10 million through this workflow here.
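Since the diagram being drawn on the board is not visible in the transcript, here is a minimal Python sketch of the chain of parties as described above: customer, merchant, payment processor, acquiring bank, association network, issuing bank. The names of the list and function are hypothetical, and the real card-network protocol is far more involved; this only records who talks to whom and that money flows only if the issuing bank approves.

```python
# Parties in the order the authorization request travels, per the lecture.
PAYMENT_CHAIN = [
    "customer",
    "merchant",             # the spammer's storefront
    "payment processor",    # middleman for the merchant
    "acquiring bank",       # the merchant's bank
    "association network",  # e.g. Visa or MasterCard
    "issuing bank",         # the customer's bank
]


def authorization_hops(chain=PAYMENT_CHAIN):
    """Return the (sender, receiver) pairs a purchase request passes through."""
    return list(zip(chain, chain[1:]))


def money_flows(issuer_approves):
    """Money moves through the system only if the issuing bank says 'legit'."""
    return bool(issuer_approves)


if __name__ == "__main__":
    for sender, receiver in authorization_hops():
        print(f"{sender} -> {receiver}")
    print("money flows:", money_flows(issuer_approves=True))
```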
1211 00:56:23,530 --> 00:56:26,580 And so in practice, you might think that oh, 1212 00:56:26,580 --> 00:56:29,610 why wouldn't the acquiring bank or the issuing 1213 00:56:29,610 --> 00:56:31,980 bank say, something looks kind of fishy here? 1214 00:56:31,980 --> 00:56:35,740 As it turns out, in many cases, they don't. 1215 00:56:35,740 --> 00:56:37,960 And so this gets into this interesting discussion 1216 00:56:37,960 --> 00:56:45,580 about why is it that these workflows are often tolerated 1217 00:56:45,580 --> 00:56:46,790 by the financial system. 1218 00:56:46,790 --> 00:56:54,480 For example, why do spammers properly 1219 00:56:54,480 --> 00:56:55,650 classify their transactions? 1220 00:56:58,930 --> 00:57:05,160 So if you want to send something through this system, 1221 00:57:05,160 --> 00:57:08,942 you have to tag that transaction with some type of type. 1222 00:57:08,942 --> 00:57:10,650 You have to say, this is pharmaceuticals, 1223 00:57:10,650 --> 00:57:13,250 this is software, this is whatever, this is whatever. 1224 00:57:13,250 --> 00:57:15,300 So you might think that as a spammer, 1225 00:57:15,300 --> 00:57:18,390 you wouldn't actually want to do this. 1226 00:57:18,390 --> 00:57:22,157 If you were selling fake Flintstones vitamins, 1227 00:57:22,157 --> 00:57:23,990 maybe you don't want to say this is actually 1228 00:57:23,990 --> 00:57:25,810 a pharmaceutical transaction. 1229 00:57:25,810 --> 00:57:28,170 And what's interesting is that spammers do actually 1230 00:57:28,170 --> 00:57:30,840 properly classify these transactions in many cases. 1231 00:57:30,840 --> 00:57:37,660 And the reason is that there are high fines if you misclassify. 1232 00:57:40,520 --> 00:57:46,590 So essentially what happens is that these association networks 1233 00:57:46,590 --> 00:57:50,440 like Visa or MasterCard, in many cases 1234 00:57:50,440 --> 00:57:52,985 they are OK, perhaps, with transactions 1235 00:57:52,985 --> 00:57:54,730 that are slightly shady.
1236 00:57:54,730 --> 00:57:57,810 But they don't want to be blamed for being a money launderer, 1237 00:57:57,810 --> 00:58:00,330 or for trying to deceive the authorities. 1238 00:58:00,330 --> 00:58:04,480 So as long as you properly classify what you do, then 1239 00:58:04,480 --> 00:58:06,970 in a certain sense this gives the association 1240 00:58:06,970 --> 00:58:08,790 networks a little bit of, well, listen, 1241 00:58:08,790 --> 00:58:10,700 they told us what was going on. 1242 00:58:10,700 --> 00:58:12,540 Maybe the law was a little bit unclear. 1243 00:58:12,540 --> 00:58:14,410 But we, at least, Visa or MasterCard, 1244 00:58:14,410 --> 00:58:18,140 did not try to hide the intent of this transaction. 1245 00:58:18,140 --> 00:58:20,195 So spammers do oftentimes properly classify 1246 00:58:20,195 --> 00:58:22,750 their transactions. 1247 00:58:22,750 --> 00:58:23,879 So that's interesting. 1248 00:58:23,879 --> 00:58:25,920 It seems like they're playing within the confines 1249 00:58:25,920 --> 00:58:27,520 of the system a little bit. 1250 00:58:27,520 --> 00:58:30,450 So another question I mentioned earlier 1251 00:58:30,450 --> 00:58:33,970 is, why send anything to users? 1252 00:58:38,240 --> 00:58:41,400 Because presumably you're a spammer, so you're a criminal, 1253 00:58:41,400 --> 00:58:41,900 right? 1254 00:58:41,900 --> 00:58:45,545 So why wouldn't it just be cool if you just took people's money 1255 00:58:45,545 --> 00:58:46,340 and then ran? 1256 00:58:46,340 --> 00:58:48,050 I mean, that'd be the ultimate crime. 1257 00:58:48,050 --> 00:58:53,260 So as it turns out, they actually send things to users 1258 00:58:53,260 --> 00:58:59,150 because, surprise surprise, high fines if they don't. 1259 00:58:59,150 --> 00:59:00,780 So it's this very entertaining system 1260 00:59:00,780 --> 00:59:03,660 whereby spammers kind of want to do things that are legal, 1261 00:59:03,660 --> 00:59:06,000 when they actually can't use Bitcoins yet. 
1262 00:59:06,000 --> 00:59:08,634 They actually have to work within the constraints 1263 00:59:08,634 --> 00:59:09,800 of this pre-existing system. 1264 00:59:09,800 --> 00:59:12,485 So as it turns out, there are these high fines 1265 00:59:12,485 --> 00:59:19,000 if you, and by you I mean the spammer, 1266 00:59:19,000 --> 00:59:20,160 have too many chargebacks. 1267 00:59:24,050 --> 00:59:29,370 So a chargeback is essentially when a customer 1268 00:59:29,370 --> 00:59:31,280 tells their credit card company, hey, 1269 00:59:31,280 --> 00:59:34,805 I didn't get the thing that I was supposed to get that I 1270 00:59:34,805 --> 00:59:36,040 bought with your credit card. 1271 00:59:36,040 --> 00:59:38,120 Or I got it, but I didn't like it. 1272 00:59:38,120 --> 00:59:41,400 So if you're a spammer and you have too many customers saying 1273 00:59:41,400 --> 00:59:43,150 things like this, then you will actually 1274 00:59:43,150 --> 00:59:45,580 get charged very, very high fines. 1275 00:59:45,580 --> 00:59:50,550 And as we saw earlier, the clickthrough rates for spam 1276 00:59:50,550 --> 00:59:52,285 are super, super low. 1277 00:59:52,285 --> 00:59:55,070 The conversion rates are super, super low. 1278 00:59:55,070 --> 00:59:58,290 So even just one or two fines might wipe out 1279 00:59:58,290 --> 01:00:00,302 your entire profit for a month, let's 1280 01:00:00,302 --> 01:00:01,510 say, for something like this. 1281 01:00:01,510 --> 01:00:03,860 So spammers are really motivated to avoid these fines 1282 01:00:03,860 --> 01:00:04,850 in both cases. 1283 01:00:04,850 --> 01:00:07,920 AUDIENCE: Would using PayPal obscure any of that, 1284 01:00:07,920 --> 01:00:10,590 like the relationship with the bank? 1285 01:00:10,590 --> 01:00:13,690 PROFESSOR: Well, typically, yes and no. 1286 01:00:13,690 --> 01:00:17,930 So you can think of those-- PayPal is in many respects 1287 01:00:17,930 --> 01:00:20,410 very similar to Visa or MasterCard.
1288 01:00:20,410 --> 01:00:24,420 So it has very similar regulations that oversee it, 1289 01:00:24,420 --> 01:00:27,080 because it bears many of the same types of risks. 1290 01:00:27,080 --> 01:00:31,122 I do think that Visa has slightly stricter 1291 01:00:31,122 --> 01:00:32,580 restrictions on some of this stuff, 1292 01:00:32,580 --> 01:00:34,000 as we'll talk about in a second. 1293 01:00:34,000 --> 01:00:35,375 But for all intents and purposes, 1294 01:00:35,375 --> 01:00:37,012 Paypal looks very similar. 1295 01:00:37,012 --> 01:00:39,200 AUDIENCE: Is there any sort of idea 1296 01:00:39,200 --> 01:00:42,405 of having a group where you make some sort of account 1297 01:00:42,405 --> 01:00:44,520 and then intentionally go to a bunch of spammers, 1298 01:00:44,520 --> 01:00:48,180 buy a bunch of things, and then ask for a bunch of chargebacks 1299 01:00:48,180 --> 01:00:50,590 whether or not they send it to you? 1300 01:00:50,590 --> 01:00:52,470 So that they incur these fines. 1301 01:00:52,470 --> 01:00:55,110 Or report them for misclassifying things, 1302 01:00:55,110 --> 01:00:57,540 in order to just make them pay these fines. 1303 01:00:57,540 --> 01:00:59,830 PROFESSOR: That's interesting. 1304 01:00:59,830 --> 01:01:00,706 It's like vigilantes. 1305 01:01:00,706 --> 01:01:01,871 AUDIENCE: Spam the spammers. 1306 01:01:01,871 --> 01:01:03,030 PROFESSOR: Yeah, exactly. 1307 01:01:03,030 --> 01:01:04,988 I don't know if I've heard anything about that. 1308 01:01:04,988 --> 01:01:09,630 I do know that the spammers do try to detect 1309 01:01:09,630 --> 01:01:11,350 people who are trolling them. 1310 01:01:11,350 --> 01:01:14,710 So for example, one thing that they talked about in the paper 1311 01:01:14,710 --> 01:01:18,160 a little bit is that spammers-- so how 1312 01:01:18,160 --> 01:01:21,519 did the authors of the paper determine all this? 
1313 01:01:21,519 --> 01:01:23,310 They actually got a bunch of spam messages, 1314 01:01:23,310 --> 01:01:24,685 they clicked on a bunch of stuff. 1315 01:01:24,685 --> 01:01:26,230 They got a special Visa card they 1316 01:01:26,230 --> 01:01:28,870 used to purchase this stuff, and then so on and so forth. 1317 01:01:28,870 --> 01:01:31,250 So spammers obviously don't like this. 1318 01:01:31,250 --> 01:01:33,810 And so in the paper they call this test buys. 1319 01:01:33,810 --> 01:01:35,620 Spammers want to prevent these test buys 1320 01:01:35,620 --> 01:01:38,430 from researchers who are trying to figure out what's going on. 1321 01:01:38,430 --> 01:01:41,990 So one thing that some spammers did-- do, I should say-- 1322 01:01:41,990 --> 01:01:45,330 is they actually require proof of your identity 1323 01:01:45,330 --> 01:01:46,730 before you can buy something. 1324 01:01:46,730 --> 01:01:49,820 So they might ask you to send a picture of your photo ID, 1325 01:01:49,820 --> 01:01:51,470 or something like that. 1326 01:01:51,470 --> 01:01:53,790 In particular, some people started 1327 01:01:53,790 --> 01:01:58,000 doing this after Visa tightened up some of their rules 1328 01:01:58,000 --> 01:01:58,720 about spam. 1329 01:01:58,720 --> 01:02:04,500 Now, the problem with this is that most people who 1330 01:02:04,500 --> 01:02:07,000 would click on spam apparently are still 1331 01:02:07,000 --> 01:02:10,470 reluctant to send their photo ID to just some random person. 1332 01:02:10,470 --> 01:02:12,527 So there's a bunch of-- I've linked 1333 01:02:12,527 --> 01:02:14,027 one of these articles in the lecture 1334 01:02:14,027 --> 01:02:15,460 notes-- there's a bunch of hilarious commentary 1335 01:02:15,460 --> 01:02:18,200 from a spammer bulletin board, where they say oh no, Visa's 1336 01:02:18,200 --> 01:02:19,260 cracking down on us. 
1337 01:02:19,260 --> 01:02:21,390 We try to ask for people's photo IDs, 1338 01:02:21,390 --> 01:02:23,820 but they don't want to send it to us for some reason. 1339 01:02:23,820 --> 01:02:25,840 And it's so weird that people wouldn't want to do that, 1340 01:02:25,840 --> 01:02:27,490 but they will give them their credit card number. 1341 01:02:27,490 --> 01:02:29,198 But anyway, so long story short, spammers 1342 01:02:29,198 --> 01:02:33,375 are highly incentivized to try to detect that kind of stuff. 1343 01:02:33,375 --> 01:02:36,854 AUDIENCE: So for chargebacks, if you don't necessarily 1344 01:02:36,854 --> 01:02:40,333 want your bank to know that you were buying these completely 1345 01:02:40,333 --> 01:02:44,309 shady items, do a lot of users actually do chargebacks 1346 01:02:44,309 --> 01:02:45,800 if they don't get the item? 1347 01:02:45,800 --> 01:02:47,800 Or are they too embarrassed? 1348 01:02:47,800 --> 01:02:49,466 PROFESSOR: Yeah, that's a good question. 1349 01:02:49,466 --> 01:02:52,540 I don't know what fraction of people 1350 01:02:52,540 --> 01:02:54,890 are in the set of people who bought 1351 01:02:54,890 --> 01:02:56,830 herbal Flintstones vitamins, were disappointed 1352 01:02:56,830 --> 01:02:58,290 by herbal Flintstones vitamins, and then, 1353 01:02:58,290 --> 01:03:00,706 yeah, told their bank-- but what's interesting, though, is 1354 01:03:00,706 --> 01:03:03,016 that the bank has to know in the first place 1355 01:03:03,016 --> 01:03:04,390 that they're going to this place, 1356 01:03:04,390 --> 01:03:06,120 right, because the thing went through. 1357 01:03:06,120 --> 01:03:09,634 So avoiding the chargeback, I don't think you're going to-- 1358 01:03:09,634 --> 01:03:11,300 but by doing the chargeback, let me say, 1359 01:03:11,300 --> 01:03:13,799 I don't think you'd reveal any extra information to the bank 1360 01:03:13,799 --> 01:03:15,000 that they wouldn't already know. 
1361 01:03:15,000 --> 01:03:17,291 Because they had to clear the transaction first for you 1362 01:03:17,291 --> 01:03:19,000 to actually get it and be disappointed. 1363 01:03:19,000 --> 01:03:22,320 AUDIENCE: So then roughly how many chargebacks is too much? 1364 01:03:22,320 --> 01:03:24,410 PROFESSOR: So some of the figures I've heard here 1365 01:03:24,410 --> 01:03:26,862 are greater than 1%. 1366 01:03:26,862 --> 01:03:28,445 So in other words, if you're a spammer 1367 01:03:28,445 --> 01:03:30,890 and you have more than 1% of your transactions causing 1368 01:03:30,890 --> 01:03:33,142 these problems, you get in trouble. 1369 01:03:33,142 --> 01:03:35,475 And I wouldn't be surprised if it was a little bit lower 1370 01:03:35,475 --> 01:03:37,794 than that, but 1% is the number that I've heard. 1371 01:03:41,220 --> 01:03:41,720 All right. 1372 01:03:41,720 --> 01:03:44,540 So to me, like I said, this was one 1373 01:03:44,540 --> 01:03:46,607 of the most interesting parts of the paper. 1374 01:03:46,607 --> 01:03:48,940 Because I would have thought that a lot of spamming just 1375 01:03:48,940 --> 01:03:50,234 involved straight-up fraud. 1376 01:03:50,234 --> 01:03:52,150 That people clicked on links, they sent money, 1377 01:03:52,150 --> 01:03:53,149 they never got anything. 1378 01:03:53,149 --> 01:03:55,272 But as it turns out, because these spammers have 1379 01:03:55,272 --> 01:03:58,130 to go through this network which has 1380 01:03:58,130 --> 01:04:02,330 all these mechanisms to prevent fraud, 1381 01:04:02,330 --> 01:04:06,892 they end up having to actually ship things over to users. 1382 01:04:06,892 --> 01:04:10,030 So that's kind of neat. 
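The chargeback rule described above can be sketched as a simple ratio check. This is a minimal illustration, not a card network's actual policy: the 1% threshold is the figure quoted in the lecture, while the function names and the sample counts are hypothetical.

```python
# Hypothetical sketch of the chargeback-ratio check described in the lecture.
# The 1% threshold is the number the professor quotes; the real rules used by
# card networks are not given in the lecture and may differ.
CHARGEBACK_THRESHOLD = 0.01


def chargeback_rate(chargebacks: int, transactions: int) -> float:
    """Fraction of a merchant's transactions that ended in a chargeback."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return chargebacks / transactions


def exceeds_threshold(chargebacks: int, transactions: int,
                      threshold: float = CHARGEBACK_THRESHOLD) -> bool:
    """True if the merchant's chargeback rate is above the threshold."""
    return chargeback_rate(chargebacks, transactions) > threshold


if __name__ == "__main__":
    # Made-up numbers: with so few sales, a handful of unhappy customers
    # is enough to push a merchant over the line.
    print(exceeds_threshold(chargebacks=1, transactions=200))
    print(exceeds_threshold(chargebacks=3, transactions=200))
```

With sales volumes as small as the ones discussed in the lecture, the check shows why each individual chargeback matters so much to a spammer.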
1383 01:04:10,030 --> 01:04:12,400 And so another reason why spammers 1384 01:04:12,400 --> 01:04:14,940 want to do these things, properly classify transactions 1385 01:04:14,940 --> 01:04:16,610 and actually send things to users, 1386 01:04:16,610 --> 01:04:24,650 is that only a few banks are actually 1387 01:04:24,650 --> 01:04:28,320 willing to interact with spammers. 1388 01:04:32,590 --> 01:04:38,894 And so what this means is that if the spammer is 1389 01:04:38,894 --> 01:04:40,560 getting a lot of chargebacks, or getting 1390 01:04:40,560 --> 01:04:42,685 in trouble with the bank or the credit card company 1391 01:04:42,685 --> 01:04:44,549 or whatever, and some bank decides, 1392 01:04:44,549 --> 01:04:46,090 I can't do business with you anymore, 1393 01:04:46,090 --> 01:04:49,030 there's not a really large set of other banks 1394 01:04:49,030 --> 01:04:53,120 that the spammer could go to to continue their chicanery. 1395 01:04:53,120 --> 01:04:57,440 So one study of this stuff found that there are basically 1396 01:04:57,440 --> 01:05:06,290 only 30 acquiring banks that spammers were seen to use over 1397 01:05:06,290 --> 01:05:07,530 some two-year period. 1398 01:05:07,530 --> 01:05:09,360 That's actually not very high. 1399 01:05:09,360 --> 01:05:14,166 So there is this other incentive to not 1400 01:05:14,166 --> 01:05:15,790 be too goofy with the financial system, 1401 01:05:15,790 --> 01:05:18,165 because you don't really have too many other places to go 1402 01:05:18,165 --> 01:05:20,300 if you break those relationships. 1403 01:05:20,300 --> 01:05:25,140 So it seems like maybe this is a good choke point 1404 01:05:25,140 --> 01:05:26,910 to try to cut down on spam. 1405 01:05:26,910 --> 01:05:29,075 So we've already discussed how things like botnets 1406 01:05:29,075 --> 01:05:31,140 give the attacker a lot of IP addresses. 
1407 01:05:31,140 --> 01:05:33,919 There's a lot of different types of hosts 1408 01:05:33,919 --> 01:05:36,210 who are willing to run web servers, so on and so forth. 1409 01:05:36,210 --> 01:05:37,751 But this number actually seems small. 1410 01:05:37,751 --> 01:05:41,660 So maybe we can actually attack spamming here. 1411 01:05:41,660 --> 01:05:43,920 But as I alluded to earlier, it's a little bit tricky 1412 01:05:43,920 --> 01:05:46,900 to do this because of things like differing IP laws, 1413 01:05:46,900 --> 01:05:50,290 because of things like the fact that it 1414 01:05:50,290 --> 01:05:54,830 can be sort of tricky to actually say that spammers 1415 01:05:54,830 --> 01:05:57,560 are doing something illegal. 1416 01:05:57,560 --> 01:06:00,230 So if you are using spam messages 1417 01:06:00,230 --> 01:06:03,220 to sell someone-- let's make this up, let's say sugar, 1418 01:06:03,220 --> 01:06:04,130 sugar's delicious. 1419 01:06:04,130 --> 01:06:07,252 It's not illegal to sell sugar, even at cut-rate prices. 1420 01:06:07,252 --> 01:06:08,710 So even though the way that you may 1421 01:06:08,710 --> 01:06:11,400 have drawn the user to that purchase 1422 01:06:11,400 --> 01:06:13,970 was sort of duplicitous or gross, 1423 01:06:13,970 --> 01:06:17,180 it is not in and of itself illegal to sell someone sugar. 1424 01:06:17,180 --> 01:06:18,860 And so as it turns out, a lot of spam 1425 01:06:18,860 --> 01:06:21,647 sort of falls into this gray area, 1426 01:06:21,647 --> 01:06:23,480 where the things that the spammers are doing 1427 01:06:23,480 --> 01:06:26,510 are distasteful, but maybe not necessarily as illegal 1428 01:06:26,510 --> 01:06:27,370 as you'd think. 1429 01:06:27,370 --> 01:06:30,350 Now, for stuff like pirated software, 1430 01:06:30,350 --> 01:06:31,742 there it's much more clear-cut. 
1431 01:06:31,742 --> 01:06:33,700 But suffice it to say, it's not always the case 1432 01:06:33,700 --> 01:06:35,710 that you can just point to one of these banks and say hey, 1433 01:06:35,710 --> 01:06:36,918 your customers are criminals. 1434 01:06:36,918 --> 01:06:38,220 Because that's not always true. 1435 01:06:38,220 --> 01:06:44,870 Particularly if there's not a very strong paper trail that 1436 01:06:44,870 --> 01:06:48,230 attaches the financial transaction to some spam 1437 01:06:48,230 --> 01:06:51,160 URL that was the origin of the transaction. 1438 01:06:51,160 --> 01:06:55,050 It's often very difficult to prove those types of links. 1439 01:06:55,050 --> 01:06:58,260 OK, so since this paper was published, 1440 01:06:58,260 --> 01:07:00,952 the credit card networks have taken some actions. 1441 01:07:00,952 --> 01:07:02,910 So this paper actually made a pretty big splash 1442 01:07:02,910 --> 01:07:04,100 when it came out. 1443 01:07:04,100 --> 01:07:07,430 And so the association networks like Visa and MasterCard 1444 01:07:07,430 --> 01:07:09,560 and all of them were wondering, what can we 1445 01:07:09,560 --> 01:07:13,510 do to cut down on some of this spam? 1446 01:07:13,510 --> 01:07:15,360 So interestingly, after the paper came out, 1447 01:07:15,360 --> 01:07:18,710 some pharmaceutical companies and software vendors actually 1448 01:07:18,710 --> 01:07:21,000 lodged complaints with Visa. 1449 01:07:21,000 --> 01:07:22,450 So if you remember from the paper, 1450 01:07:22,450 --> 01:07:25,790 Visa was the association network the researchers used 1451 01:07:25,790 --> 01:07:28,640 to make these test buys, these dummy buys. 
1452 01:07:28,640 --> 01:07:30,890 So it's a little bit unfortunate, 1453 01:07:30,890 --> 01:07:33,600 but that then showed some of these companies 1454 01:07:33,600 --> 01:07:37,510 that hey, Visa can be used as the association network 1455 01:07:37,510 --> 01:07:39,280 to fund some of this spam, or to translate 1456 01:07:39,280 --> 01:07:41,590 some of this spam traffic. 1457 01:07:41,590 --> 01:07:44,700 So some people complained about that. 1458 01:07:44,700 --> 01:07:51,270 So Visa made some policy changes in response 1459 01:07:51,270 --> 01:07:53,600 to some of the issues that were brought up 1460 01:07:53,600 --> 01:07:56,460 in the paper and some of the complaints 1461 01:07:56,460 --> 01:07:59,120 that they got as a result. So now, 1462 01:07:59,120 --> 01:08:07,090 for example, all pharmaceutical sales are now 1463 01:08:07,090 --> 01:08:11,780 labeled by Visa as high-risk. 1464 01:08:14,990 --> 01:08:19,439 So what this means is that if a bank acts 1465 01:08:19,439 --> 01:08:27,859 as an acquirer for these high-risk transactions, 1466 01:08:27,859 --> 01:08:31,569 then Visa will have some more stringent regulations they will 1467 01:08:31,569 --> 01:08:34,460 put on that merchant-side bank. 1468 01:08:34,460 --> 01:08:36,729 For example, they will require that bank 1469 01:08:36,729 --> 01:08:38,920 to engage in a risk management program, 1470 01:08:38,920 --> 01:08:40,970 and they may be audited more frequently, 1471 01:08:40,970 --> 01:08:42,229 and so on and so forth. 1472 01:08:42,229 --> 01:08:45,410 So Visa made that change. 1473 01:08:45,410 --> 01:08:52,430 And Visa also changed its operating guidelines. 1474 01:08:52,430 --> 01:08:58,720 So its operating guidelines, now they 1475 01:08:58,720 --> 01:09:07,220 explicitly enumerate and forbid illegal sales of drugs 1476 01:09:07,220 --> 01:09:08,970 and trademark-infringing goods. 
1477 01:09:12,050 --> 01:09:14,689 So the reason why they did this is that by tightening up 1478 01:09:14,689 --> 01:09:17,270 this language, it is now easier for them 1479 01:09:17,270 --> 01:09:21,737 to issue more aggressive fines against banks and merchants 1480 01:09:21,737 --> 01:09:25,680 that they feel are doing things like selling 1481 01:09:25,680 --> 01:09:29,859 illegal pharmaceuticals or selling knockoff versions 1482 01:09:29,859 --> 01:09:32,065 of watches or things like that. 1483 01:09:32,065 --> 01:09:33,815 So once again, there's still a lot of spam 1484 01:09:33,815 --> 01:09:36,590 that's in that gray area where it's not necessarily illegal. 1485 01:09:36,590 --> 01:09:37,624 It's just that the customers were required 1486 01:09:37,624 --> 01:09:38,665 to do certain techniques. 1487 01:09:38,665 --> 01:09:40,459 And this is very useful because now Visa 1488 01:09:40,459 --> 01:09:44,450 can drop some much bigger hammers on folks. 1489 01:09:44,450 --> 01:09:46,450 And as I mentioned before, some of the spammers 1490 01:09:46,450 --> 01:09:48,420 tried to react to this by saying, 1491 01:09:48,420 --> 01:09:50,880 well, let's just prevent these test buys. 1492 01:09:50,880 --> 01:09:52,796 Because not only do security researchers do 1493 01:09:52,796 --> 01:09:54,902 these test buys, but the association networks can 1494 01:09:54,902 --> 01:09:55,860 do these test buys too. 1495 01:09:55,860 --> 01:09:58,160 So they did some things like the photo ID type stuff, 1496 01:09:58,160 --> 01:10:01,820 and that tended not to work out super well. 1497 01:10:01,820 --> 01:10:04,460 And so at least a few years after these changes were made, 1498 01:10:04,460 --> 01:10:05,900 this did have an impact. 
1499 01:10:05,900 --> 01:10:09,160 I'm not sure what the latest state-of-the-art is with 1500 01:10:09,160 --> 01:10:12,014 respect to trolling these Visa policy changes, 1501 01:10:12,014 --> 01:10:14,430 but it was kind of cool to see this paper have this impact 1502 01:10:14,430 --> 01:10:16,574 in real life. 1503 01:10:16,574 --> 01:10:18,740 So one interesting thing they mentioned in the paper 1504 01:10:18,740 --> 01:10:21,825 is they talked about the ethical aspects 1505 01:10:21,825 --> 01:10:23,260 of doing security research. 1506 01:10:23,260 --> 01:10:27,960 And in particular, doing this research about the spam chain. 1507 01:10:27,960 --> 01:10:31,530 To actually understand how some of this banking stuff worked, 1508 01:10:31,530 --> 01:10:34,700 these researchers actually had to make purchases. 1509 01:10:34,700 --> 01:10:37,890 They actually had to give money to people 1510 01:10:37,890 --> 01:10:39,310 in exchange for these products. 1511 01:10:39,310 --> 01:10:41,420 And so in the paper they go through this kind 1512 01:10:41,420 --> 01:10:44,857 of semi-hilarious defensive section where they say, 1513 01:10:44,857 --> 01:10:46,690 we totally burned everything that we bought. 1514 01:10:46,690 --> 01:10:47,398 We didn't use it. 1515 01:10:47,398 --> 01:10:49,972 We talked to the companies whose pirated software we 1516 01:10:49,972 --> 01:10:51,320 were buying before we got it. 1517 01:10:51,320 --> 01:10:53,240 But these things are actually pretty important to go through, 1518 01:10:53,240 --> 01:10:55,100 particularly if you're within a university setting. 
1519 01:10:55,100 --> 01:10:56,600 Because as you may know, if you want 1520 01:10:56,600 --> 01:10:59,174 to do anything that involves-- particularly human research, 1521 01:10:59,174 --> 01:11:01,590 but anything that might have these ethical sort of aspects 1522 01:11:01,590 --> 01:11:04,060 to it, you have to get things cleared by lawyers, sometimes 1523 01:11:04,060 --> 01:11:06,121 by an IRB, and things like that. 1524 01:11:06,121 --> 01:11:07,870 So it's actually pretty important for them 1525 01:11:07,870 --> 01:11:10,820 to jump through these hoops, because at the end of the day 1526 01:11:10,820 --> 01:11:13,090 they have to at least be somewhat confident that they 1527 01:11:13,090 --> 01:11:16,170 weren't supporting some deeply nefarious activity 1528 01:11:16,170 --> 01:11:18,130 in some far-flung corner of the world. 1529 01:11:18,130 --> 01:11:20,640 So that was another interesting part of the paper, too. 1530 01:11:20,640 --> 01:11:23,390 And other people have talked in this class about things like, 1531 01:11:23,390 --> 01:11:27,610 what are the ethics of releasing zero-day exploits if you 1532 01:11:27,610 --> 01:11:29,360 know they haven't been patched by someone? 1533 01:11:29,360 --> 01:11:30,818 So it's a really interesting aspect 1534 01:11:30,818 --> 01:11:32,075 of doing security research. 1535 01:11:32,075 --> 01:11:36,350 AUDIENCE: Is there any sort of oversight on security ethics? 1536 01:11:36,350 --> 01:11:39,042 Because in the paper, they said the IRB wasn't interested. 1537 01:11:39,042 --> 01:11:41,000 PROFESSOR: Yeah, so that was super interesting. 1538 01:11:41,000 --> 01:11:41,500 Yes. 1539 01:11:41,500 --> 01:11:44,470 They said the IRB wasn't interested, I think, 1540 01:11:44,470 --> 01:11:48,940 because there was no obvious human subject. 
1541 01:11:48,940 --> 01:11:50,890 But I think that at most universities, 1542 01:11:50,890 --> 01:11:53,015 you couldn't just say, oh, there's 1543 01:11:53,015 --> 01:11:54,515 no direct human subject, let me just 1544 01:11:54,515 --> 01:11:58,220 go buy some stuff from somebody at the end of a spam link. 1545 01:11:58,220 --> 01:12:00,170 And what they describe in the paper, actually 1546 01:12:00,170 --> 01:12:01,240 in the acknowledgment section, they 1547 01:12:01,240 --> 01:12:02,730 thank this whole set of people. 1548 01:12:02,730 --> 01:12:06,024 Like, Sally at Legal, so-and-so at the Philosophers 1549 01:12:06,024 --> 01:12:07,440 For Ethical Computing Association, 1550 01:12:07,440 --> 01:12:09,440 and stuff like that. 1551 01:12:09,440 --> 01:12:12,650 I don't think there's actually a, how would 1552 01:12:12,650 --> 01:12:16,820 you say it, an America-wide standard 1553 01:12:16,820 --> 01:12:18,420 for doing this type of research. 1554 01:12:18,420 --> 01:12:20,070 I know that each university's IRB 1555 01:12:20,070 --> 01:12:22,640 has slightly different policies of what they do and do not 1556 01:12:22,640 --> 01:12:26,639 allow, but I don't think there's a blanket policy. 1557 01:12:26,639 --> 01:12:29,477 AUDIENCE: Out of the 350 million spam URLs 1558 01:12:29,477 --> 01:12:33,840 they tracked, of the 28 that actually responded, is there 1559 01:12:33,840 --> 01:12:37,554 any chance that an appreciable number of those 28 spam 1560 01:12:37,554 --> 01:12:39,637 responses were coming from researchers researching 1561 01:12:39,637 --> 01:12:42,332 on spam? 1562 01:12:42,332 --> 01:12:44,540 PROFESSOR: Well, it's true that this type of calculus 1563 01:12:44,540 --> 01:12:46,320 is actually one reason why I think 1564 01:12:46,320 --> 01:12:49,302 the authors went to such lengths to defend themselves. 
1565 01:12:49,302 --> 01:12:51,780 Because if you think about it, the reason 1566 01:12:51,780 --> 01:12:53,680 why those statistics are so hilarious 1567 01:12:53,680 --> 01:12:56,210 is that it means that if you were to add five or remove 1568 01:12:56,210 --> 01:12:58,340 five, that's the difference between a spammer being 1569 01:12:58,340 --> 01:13:00,090 able to give their kids, like, a real gift 1570 01:13:00,090 --> 01:13:01,952 versus a piece of coal. 1571 01:13:01,952 --> 01:13:03,410 Because those numbers are so small. 1572 01:13:06,712 --> 01:13:08,587 So with regard to that particular [INAUDIBLE] 1573 01:13:08,587 --> 01:13:10,545 that I gave you, I don't know how many of those 1574 01:13:10,545 --> 01:13:12,190 were researchers. 1575 01:13:12,190 --> 01:13:15,420 But I do think in general-- like I said, the spammers, 1576 01:13:15,420 --> 01:13:17,010 they want to take your money. 1577 01:13:17,010 --> 01:13:19,460 And so if they could find some equilibrium 1578 01:13:19,460 --> 01:13:23,200 whereby security researchers could do test buys, 1579 01:13:23,200 --> 01:13:25,650 but that had no impact on their overall sales, 1580 01:13:25,650 --> 01:13:26,949 they'd be fine with that. 1581 01:13:26,949 --> 01:13:27,990 They just want the money. 1582 01:13:27,990 --> 01:13:29,615 But the tricky thing is that, let's say 1583 01:13:29,615 --> 01:13:32,520 that-- let's make some number up-- half of those 35 1584 01:13:32,520 --> 01:13:34,560 were test buys, and that resulted in people 1585 01:13:34,560 --> 01:13:37,490 putting pressure on the banks, and then instead of 35 they'd 1586 01:13:37,490 --> 01:13:38,470 be getting two. 1587 01:13:38,470 --> 01:13:39,380 That they don't want. 1588 01:13:39,380 --> 01:13:41,956 So that's why they're so motivated to stop that stuff. 1589 01:13:41,956 --> 01:13:44,436 AUDIENCE: How much of this is blind emailing 1590 01:13:44,436 --> 01:13:45,924 versus any sort of filtering? 
1591 01:13:45,924 --> 01:13:48,652 Because I'm sure they could run some models 1592 01:13:48,652 --> 01:13:51,380 and get that 350 million down to, like, one page. 1593 01:13:51,380 --> 01:13:54,350 PROFESSOR: Yeah, so it's all about the cost-benefit analysis 1594 01:13:54,350 --> 01:13:56,350 from the perspective of the spammer. 1595 01:13:56,350 --> 01:13:59,660 So I think that you're right, and there are actually-- 1596 01:13:59,660 --> 01:14:02,922 there's a marketplace for more targeted stuff. 1597 01:14:02,922 --> 01:14:05,380 In particular, that's where some of those compromised email 1598 01:14:05,380 --> 01:14:07,650 accounts can become very useful. 1599 01:14:07,650 --> 01:14:10,170 But I think what you see is that people 1600 01:14:10,170 --> 01:14:14,774 tend to go for the more focused stuff, like the more 1601 01:14:14,774 --> 01:14:16,190 focused spam emails, for what they 1602 01:14:16,190 --> 01:14:17,960 view as higher-reward targets. 1603 01:14:17,960 --> 01:14:21,240 So for example, political groups. 1604 01:14:21,240 --> 01:14:24,010 People associated with the Dalai Lama, for instance. 1605 01:14:24,010 --> 01:14:26,620 There, the perceived value of being 1606 01:14:26,620 --> 01:14:28,260 able to get into that system is so high 1607 01:14:28,260 --> 01:14:30,958 that people will spend the time to do this kind of stuff. 1608 01:14:30,958 --> 01:14:32,333 AUDIENCE: It would be interesting 1609 01:14:32,333 --> 01:14:33,940 if there was one company dedicated 1610 01:14:33,940 --> 01:14:35,788 to finding all the gullible grandmas 1611 01:14:35,788 --> 01:14:37,640 and putting their emails into stuff. 1612 01:14:37,640 --> 01:14:38,270 PROFESSOR: Oh, interesting. 1613 01:14:38,270 --> 01:14:38,780 I see. 1614 01:14:38,780 --> 01:14:40,154 So basically having some database 1615 01:14:40,154 --> 01:14:42,660 where it's like, totally send spam to this person, because-- 1616 01:14:42,660 --> 01:14:43,700 AUDIENCE: It works. 
1617 01:14:43,700 --> 01:14:45,908 PROFESSOR: I wouldn't be surprised if stuff like that 1618 01:14:45,908 --> 01:14:49,310 existed, but I don't know if it does. 1619 01:14:49,310 --> 01:14:52,110 So one last thing that I wanted to mention is that, 1620 01:14:52,110 --> 01:14:54,730 and I alluded to this a bit earlier in the lecture, 1621 01:14:54,730 --> 01:14:57,970 that some companies have taken to doing these things they 1622 01:14:57,970 --> 01:14:59,357 call hackbacks. 1623 01:14:59,357 --> 01:15:01,440 So the idea is that, let's say that you're a bank, 1624 01:15:01,440 --> 01:15:02,981 someone tries to break into your bank 1625 01:15:02,981 --> 01:15:04,440 and steal your information. 1626 01:15:04,440 --> 01:15:07,040 That bank will then, of their own volition, 1627 01:15:07,040 --> 01:15:10,780 go back to those hackers and try to do something. 1628 01:15:10,780 --> 01:15:13,116 Where something may be as quote-unquote innocuous 1629 01:15:13,116 --> 01:15:15,090 as shutting down the botnet, or maybe 1630 01:15:15,090 --> 01:15:16,920 they try to steal their information back, 1631 01:15:16,920 --> 01:15:17,794 and things like that. 1632 01:15:17,794 --> 01:15:20,940 This has actually become much more 1633 01:15:20,940 --> 01:15:22,550 common than it used to be. 1634 01:15:22,550 --> 01:15:26,910 And one reason for this is that because the legal system has 1635 01:15:26,910 --> 01:15:30,261 been a little bit slow in adapting to some of these threats, 1636 01:15:30,261 --> 01:15:32,760 some of these institutions, in particular software companies 1637 01:15:32,760 --> 01:15:34,852 and banks, are tired of waiting for government-- 1638 01:15:34,852 --> 01:15:36,560 like, their national government-- to deal 1639 01:15:36,560 --> 01:15:37,540 with stuff. 
1640 01:15:37,540 --> 01:15:40,630 So what ends up happening is that, for example, there 1641 01:15:40,630 --> 01:15:43,000 was this big botnet in 2013 that was 1642 01:15:43,000 --> 01:15:45,690 hosting all kinds of pirated goods and things like that. 1643 01:15:45,690 --> 01:15:51,010 And so this huge coalition of Microsoft, American Express, 1644 01:15:51,010 --> 01:15:53,350 Paypal, a bunch of them launched an operation 1645 01:15:53,350 --> 01:15:55,379 to take down a botnet. 1646 01:15:55,379 --> 01:15:56,920 They themselves took down the botnet. 1647 01:15:56,920 --> 01:15:58,586 They lurked around for a while, they 1648 01:15:58,586 --> 01:16:01,210 learned about where the command and control infrastructure was. 1649 01:16:01,210 --> 01:16:02,690 They actually went in there, took 1650 01:16:02,690 --> 01:16:04,773 control of the command and control infrastructure, 1651 01:16:04,773 --> 01:16:06,685 identified where all the end-user bots were. 1652 01:16:06,685 --> 01:16:08,590 And they could send them messages saying, 1653 01:16:08,590 --> 01:16:10,630 you need to patch your machine. 1654 01:16:10,630 --> 01:16:13,790 And so it's a very interesting area of intersection 1655 01:16:13,790 --> 01:16:15,960 between security and the law. 1656 01:16:15,960 --> 01:16:17,850 Because what part of American law, 1657 01:16:17,850 --> 01:16:21,810 for example, gave those companies the right to do that? 1658 01:16:21,810 --> 01:16:24,880 So what Microsoft lawyers said, at least, 1659 01:16:24,880 --> 01:16:26,530 is that they said these botnets were 1660 01:16:26,530 --> 01:16:29,380 violating Microsoft trademarks. 
1661 01:16:29,380 --> 01:16:31,450 So for example, if you sell pirated goods, 1662 01:16:31,450 --> 01:16:34,222 and you're saying this is Windows, for example, 1663 01:16:34,222 --> 01:16:36,180 but it's not actually Windows or it didn't come 1664 01:16:36,180 --> 01:16:38,630 from an official channel, then Microsoft says OK, 1665 01:16:38,630 --> 01:16:40,340 you're violating our trademark. 1666 01:16:40,340 --> 01:16:43,330 Therefore we can hack your botnet. 1667 01:16:43,330 --> 01:16:46,980 It's a little interesting to see how that leap of logic 1668 01:16:46,980 --> 01:16:47,760 took place. 1669 01:16:47,760 --> 01:16:49,280 But the courts allowed it. 1670 01:16:49,280 --> 01:16:51,440 And this is increasingly happening more and more. 1671 01:16:51,440 --> 01:16:54,440 And the banks in particular seem to be pretty upset about this, 1672 01:16:54,440 --> 01:16:57,386 because there seems to be a lot of state-level sponsorship 1673 01:16:57,386 --> 01:16:58,835 of some of these banking hacks. 1674 01:16:58,835 --> 01:17:00,840 And the bankers care about the money, 1675 01:17:00,840 --> 01:17:02,350 and so when they lose this money, 1676 01:17:02,350 --> 01:17:04,000 they get very upset about that. 1677 01:17:04,000 --> 01:17:06,470 And so it's interesting to see how 1678 01:17:06,470 --> 01:17:09,630 some of the burden for doing cyber 1679 01:17:09,630 --> 01:17:11,940 security, in particular offensive operations, 1680 01:17:11,940 --> 01:17:14,800 has now shifted a little bit more to the private sector. 1681 01:17:14,800 --> 01:17:17,750 So it's not quite clear what the long-term implications are. 1682 01:17:17,750 --> 01:17:18,250 OK. 1683 01:17:18,250 --> 01:17:19,770 That's the end of the lecture, and I 1684 01:17:19,770 --> 01:17:21,890 guess we will see you on Wednesday 1685 01:17:21,890 --> 01:17:25,240 and we'll go through the class projects.