1 00:00:00,080 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,810 Commons license. 3 00:00:03,810 --> 00:00:06,060 Your support will help MIT OpenCourseWare 4 00:00:06,060 --> 00:00:10,150 continue to offer high quality educational resources for free. 5 00:00:10,150 --> 00:00:12,690 To make a donation or to view additional materials 6 00:00:12,690 --> 00:00:16,600 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,600 --> 00:00:17,260 at ocw.mit.edu. 8 00:00:26,640 --> 00:00:29,142 PROFESSOR: All right, guys, let's get started. 9 00:00:29,142 --> 00:00:31,350 So today, we're going to talk about network security. 10 00:00:31,350 --> 00:00:32,933 And in particular, we're going to talk 11 00:00:32,933 --> 00:00:35,550 about this paper on TCP/IP security by this guy 12 00:00:35,550 --> 00:00:40,027 Steve Bellovin, who used to be at AT&T and now is at Columbia. 13 00:00:40,027 --> 00:00:41,610 One interesting thing about this paper 14 00:00:41,610 --> 00:00:43,276 is it's actually a relatively old paper. 15 00:00:43,276 --> 00:00:44,584 It's more than 10 years old. 16 00:00:44,584 --> 00:00:46,209 And in fact, it's commentary on a paper 17 00:00:46,209 --> 00:00:48,520 that was 10 years before that. 18 00:00:48,520 --> 00:00:51,840 And many of you guys actually ask, why are we reading this 19 00:00:51,840 --> 00:00:54,500 if many of these problems have been solved in today's TCP 20 00:00:54,500 --> 00:00:55,980 protocol stacks? 21 00:00:55,980 --> 00:00:57,429 So one interesting point-- so it's 22 00:00:57,429 --> 00:00:59,220 true that some of these problems that Steve 23 00:00:59,220 --> 00:01:02,061 describes in this paper have been solved since then. 24 00:01:02,061 --> 00:01:04,019 Some of them are still actually problems today. 25 00:01:04,019 --> 00:01:07,145 We'll sort of look at that and see what's going on. 
26 00:01:07,145 --> 00:01:09,925 But you might actually wonder, why didn't people 27 00:01:09,925 --> 00:01:11,970 solve all these problems in the first place 28 00:01:11,970 --> 00:01:13,730 when they were designing TCP? 29 00:01:13,730 --> 00:01:15,480 What were they thinking? 30 00:01:15,480 --> 00:01:17,507 And it's actually not clear. 31 00:01:17,507 --> 00:01:18,590 So what do you guys think? 32 00:01:18,590 --> 00:01:21,440 Why wasn't TCP designed to be secure with all 33 00:01:21,440 --> 00:01:23,916 these considerations up front? 34 00:01:23,916 --> 00:01:26,596 Yeah, any guesses? 35 00:01:26,596 --> 00:01:27,760 All right, anyone else? 36 00:01:27,760 --> 00:01:28,260 Yeah. 37 00:01:28,260 --> 00:01:30,052 AUDIENCE: The internet was a much more trusting place 38 00:01:30,052 --> 00:01:30,500 back then. 39 00:01:30,500 --> 00:01:32,250 PROFESSOR: Yeah, this was almost literally 40 00:01:32,250 --> 00:01:33,515 a quote from this guy's paper. 41 00:01:33,515 --> 00:01:36,445 Yeah, at the time-- the whole internet set of protocols 42 00:01:36,445 --> 00:01:39,700 was designed I guess about 40 years ago now. 43 00:01:39,700 --> 00:01:41,366 The requirements were totally different. 44 00:01:41,366 --> 00:01:44,450 It was to connect a bunch of relatively trusting sites 45 00:01:44,450 --> 00:01:47,070 that all knew each other by name. 46 00:01:47,070 --> 00:01:50,510 And I think this is often the case in any system that 47 00:01:50,510 --> 00:01:51,420 becomes successful. 48 00:01:51,420 --> 00:01:52,600 The requirements change. 49 00:01:52,600 --> 00:01:54,864 So it used to be that this was a protocol 50 00:01:54,864 --> 00:01:56,030 for a small number of sites. 51 00:01:56,030 --> 00:01:57,530 Now it's the entire world. 52 00:01:57,530 --> 00:01:58,990 And you don't know all the people 53 00:01:58,990 --> 00:02:00,175 connected to the internet by name anymore. 
54 00:02:00,175 --> 00:02:01,633 You can't call them up on the phone 55 00:02:01,633 --> 00:02:03,470 if they do something bad, et cetera. 56 00:02:03,470 --> 00:02:05,650 So I think this is a story for many of the protocols 57 00:02:05,650 --> 00:02:06,170 we look at. 58 00:02:06,170 --> 00:02:08,520 And many of you guys have questions, like, what the hell 59 00:02:08,520 --> 00:02:09,561 were these guys thinking? 60 00:02:09,561 --> 00:02:10,312 This is so broken. 61 00:02:10,312 --> 00:02:12,811 But in fact, they were designing a totally different system. 62 00:02:12,811 --> 00:02:13,660 It got adopted. 63 00:02:13,660 --> 00:02:15,243 Same for the web, like we were looking 64 00:02:15,243 --> 00:02:17,060 at in the last couple of weeks. 65 00:02:17,060 --> 00:02:18,950 It was designed for a very different goal. 66 00:02:18,950 --> 00:02:19,944 And it expanded. 67 00:02:19,944 --> 00:02:21,610 And you sort of have these growing pains 68 00:02:21,610 --> 00:02:24,760 you have to figure out how to make the protocol adapt 69 00:02:24,760 --> 00:02:26,750 to new requirements. 70 00:02:26,750 --> 00:02:29,370 And another thing that somewhat suddenly happened 71 00:02:29,370 --> 00:02:31,370 is I think people also in the process 72 00:02:31,370 --> 00:02:32,814 gained a much greater appreciation 73 00:02:32,814 --> 00:02:34,230 for the kinds of problems you have 74 00:02:34,230 --> 00:02:35,970 to worry about in security. 75 00:02:35,970 --> 00:02:38,520 And it used to be the case that you didn't really 76 00:02:38,520 --> 00:02:39,960 understand all the things that you 77 00:02:39,960 --> 00:02:42,870 should worry about an attacker doing to your system. 
78 00:02:42,870 --> 00:02:44,520 And I think it's partly for this reason 79 00:02:44,520 --> 00:02:46,400 that it's sort of interesting to look 80 00:02:46,400 --> 00:02:48,960 at what happened to TCP security, what went wrong, 81 00:02:48,960 --> 00:02:51,639 how could we fix it, et cetera, to both figure out 82 00:02:51,639 --> 00:02:54,180 what kinds of problems you might want to avoid when designing 83 00:02:54,180 --> 00:02:56,980 your own protocols, and also what's 84 00:02:56,980 --> 00:02:59,765 the right mindset for thinking about these kinds of attacks. 85 00:02:59,765 --> 00:03:02,410 How do you figure out what an attacker might 86 00:03:02,410 --> 00:03:03,840 be able to do in your own protocol 87 00:03:03,840 --> 00:03:08,290 when you're designing it so you can avoid similar pitfalls? 88 00:03:08,290 --> 00:03:10,152 All right, so with that preamble aside, 89 00:03:10,152 --> 00:03:12,610 let's actually start talking about what the paper is about. 90 00:03:12,610 --> 00:03:15,770 So how should we think about security in a network? 91 00:03:15,770 --> 00:03:18,620 So I guess we could try to start from first principles 92 00:03:18,620 --> 00:03:22,350 and try to figure out, what is our threat model? 93 00:03:22,350 --> 00:03:24,220 So what do we think the attacker is 94 00:03:24,220 --> 00:03:28,210 going to be able to do in our network? 95 00:03:28,210 --> 00:03:29,980 Well, relatively straightforwardly, 96 00:03:29,980 --> 00:03:36,340 there's presumably being able to intercept packets, 97 00:03:36,340 --> 00:03:38,110 and probably being able to modify them. 
98 00:03:41,000 --> 00:03:42,960 So if you send a packet over the network, 99 00:03:42,960 --> 00:03:45,800 it might be prudent to assume that some bad guy out there is 100 00:03:45,800 --> 00:03:48,540 going to see your packet and might be able to change it 101 00:03:48,540 --> 00:03:49,980 before it reaches the destination, 102 00:03:49,980 --> 00:03:52,440 might be able to drop it, and in fact might 103 00:03:52,440 --> 00:03:54,740 be able to inject packets of their own 104 00:03:54,740 --> 00:03:59,170 that you never sent with arbitrary contents. 105 00:03:59,170 --> 00:04:02,605 And probably-- so this you can sort of 106 00:04:02,605 --> 00:04:04,480 come up with fairly straightforwardly by just 107 00:04:04,480 --> 00:04:07,010 thinking, well, if you don't trust the network, 108 00:04:07,010 --> 00:04:09,490 some bad guy is going to send arbitrary packets, 109 00:04:09,490 --> 00:04:11,470 see yours, modify them, et cetera. 110 00:04:11,470 --> 00:04:15,240 Somewhat more worryingly, as this paper talks about, 111 00:04:15,240 --> 00:04:17,720 the bad guy can also participate in your protocols. 112 00:04:17,720 --> 00:04:19,178 They have their own machine, right? 113 00:04:19,178 --> 00:04:22,490 So the attacker has their own computer 114 00:04:22,490 --> 00:04:23,990 that they have full control over. 115 00:04:23,990 --> 00:04:29,450 So even if all the computers that you trust 116 00:04:29,450 --> 00:04:32,187 are reasonably maintained, they all behave correctly, 117 00:04:32,187 --> 00:04:34,020 the bad guy has his own computer that he can 118 00:04:34,020 --> 00:04:35,700 make do whatever he wants. 119 00:04:35,700 --> 00:04:37,840 And in fact, he can participate in a protocol 120 00:04:37,840 --> 00:04:39,743 or distributed system.
121 00:04:45,150 --> 00:04:47,000 So if you have a routing protocol, which 122 00:04:47,000 --> 00:04:49,835 involves many people talking to each other, at some scale, 123 00:04:49,835 --> 00:04:51,710 it's probably going to be impractical to keep 124 00:04:51,710 --> 00:04:52,484 the bad guys out. 125 00:04:52,484 --> 00:04:54,900 If you're running a routing protocol with 10 participants, 126 00:04:54,900 --> 00:04:57,170 then maybe you can just call all them up and say, well, yeah, 127 00:04:57,170 --> 00:04:58,380 yeah, I know all you guys. 128 00:04:58,380 --> 00:05:00,720 But at the scale of the internet today, it's 129 00:05:00,720 --> 00:05:04,300 unfeasible to have sort of direct knowledge of what 130 00:05:04,300 --> 00:05:07,236 everyone else or who everyone else in this protocol is. 131 00:05:07,236 --> 00:05:08,610 So probably some bad guy is going 132 00:05:08,610 --> 00:05:11,160 to be participating in your protocols or distributed 133 00:05:11,160 --> 00:05:11,660 systems. 134 00:05:11,660 --> 00:05:13,868 And it's important to design distributed systems that 135 00:05:13,868 --> 00:05:17,955 can nonetheless do something reasonable with that. 136 00:05:17,955 --> 00:05:19,580 All right, so what are the implications 137 00:05:19,580 --> 00:05:20,280 of all these things? 138 00:05:20,280 --> 00:05:21,680 I guess we'll go down the list. 139 00:05:21,680 --> 00:05:26,570 So intercepting is-- it's on the whole easy to understand. 140 00:05:26,570 --> 00:05:29,170 Well, you shouldn't send any important data 141 00:05:29,170 --> 00:05:32,380 over the network if you expect a bad guy to intercept them, 142 00:05:32,380 --> 00:05:33,680 or at least not in clear text. 143 00:05:33,680 --> 00:05:35,480 Maybe you should encrypt your data. 144 00:05:35,480 --> 00:05:37,785 So that seems relatively straightforward to sort 145 00:05:37,785 --> 00:05:38,565 of figure out. 
146 00:05:38,565 --> 00:05:41,106 Although still you should sort of keep it in mind, of course, 147 00:05:41,106 --> 00:05:43,070 when designing protocols. 148 00:05:43,070 --> 00:05:46,770 Now, injecting packets turns out to lead 149 00:05:46,770 --> 00:05:50,380 to a much wider range of interesting problems 150 00:05:50,380 --> 00:05:51,970 that this paper talks about. 151 00:05:51,970 --> 00:05:55,160 And in particular, attackers can inject 152 00:05:55,160 --> 00:05:58,860 packets that can pretend to be from any other sender. 153 00:05:58,860 --> 00:06:02,295 Because the way this works in IP is that the IP packet itself 154 00:06:02,295 --> 00:06:04,420 has a header that contains the source of the packet 155 00:06:04,420 --> 00:06:06,080 and the destination. 156 00:06:06,080 --> 00:06:08,830 And it's up to whoever creates the packet 157 00:06:08,830 --> 00:06:11,750 to fill in the right values for the source and destination. 158 00:06:11,750 --> 00:06:14,100 And no one checks that the source is necessarily 159 00:06:14,100 --> 00:06:16,100 the correct one. 160 00:06:16,100 --> 00:06:18,050 There's some filtering going on these days. 161 00:06:18,050 --> 00:06:21,430 But it's sort of fairly spotty, and it's hard to rely on. 162 00:06:21,430 --> 00:06:23,115 So to a first approximation, an attacker 163 00:06:23,115 --> 00:06:25,340 could fill in any IP address as the source, 164 00:06:25,340 --> 00:06:29,180 and it will get to the destination correctly. 165 00:06:29,180 --> 00:06:32,070 And it's interesting to try to figure out 166 00:06:32,070 --> 00:06:35,540 what could an attacker do with such a capability of sending 167 00:06:35,540 --> 00:06:37,670 arbitrary packets.
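To make the point about the source field concrete, here is a minimal sketch of how an IPv4 header is assembled. It only builds bytes in memory and sends nothing. The function names `build_ipv4_header` and `ipv4_checksum` are made up for this illustration, and the addresses come from the ranges reserved for documentation. The thing to notice is that the source address is just another value the sender writes in; the checksum covers it but does not authenticate it.

```python
import socket
import struct

IPV4_FORMAT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header without options


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum: one's-complement sum of the 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int) -> bytes:
    """Build a header whose source is whatever the caller claims it is."""
    fields = [
        (4 << 4) | 5,            # version 4, header length 5 words
        0,                       # type of service
        20 + payload_len,        # total length
        0,                       # identification
        0,                       # flags and fragment offset
        64,                      # time to live
        socket.IPPROTO_TCP,      # payload protocol
        0,                       # checksum placeholder
        socket.inet_aton(src),   # source: filled in by the sender, unchecked
        socket.inet_aton(dst),   # destination
    ]
    header = struct.pack(IPV4_FORMAT, *fields)
    fields[7] = ipv4_checksum(header)
    return struct.pack(IPV4_FORMAT, *fields)


# Nothing here ties the claimed source to the machine that built the header.
header = build_ipv4_header("192.0.2.10", "198.51.100.20", payload_len=0)
```

Whether such a packet actually reaches its destination depends on the filtering mentioned above, which this sketch does not model.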
168 00:06:37,670 --> 00:06:41,771 Now, in the several weeks up to this, 169 00:06:41,771 --> 00:06:43,520 like in buffer overflows and web security, 170 00:06:43,520 --> 00:06:46,110 we looked at, to a large extent, implementation bugs, like, 171 00:06:46,110 --> 00:06:48,440 how could you exploit a buffer overflow? 172 00:06:48,440 --> 00:06:50,420 And interestingly, the author of this paper 173 00:06:50,420 --> 00:06:53,010 is actually not at all interested in implementation 174 00:06:53,010 --> 00:06:53,510 bugs. 175 00:06:53,510 --> 00:06:56,770 He's really interested in protocol errors or protocol 176 00:06:56,770 --> 00:06:57,562 mistakes. 177 00:06:57,562 --> 00:06:58,520 So what's the big deal? 178 00:06:58,520 --> 00:07:00,580 Why is he down on implementation bugs, 179 00:07:00,580 --> 00:07:02,705 even though we spent several weeks looking at them? 180 00:07:02,705 --> 00:07:03,510 Why does it matter? 181 00:07:03,510 --> 00:07:04,010 Yeah. 182 00:07:04,010 --> 00:07:06,828 AUDIENCE: Because we have to keep those bugs [INAUDIBLE]. 183 00:07:06,828 --> 00:07:09,800 PROFESSOR: Yeah, so this is the really big bummer about a bug 184 00:07:09,800 --> 00:07:11,550 in your protocol design. 185 00:07:11,550 --> 00:07:13,589 Because it's hard to change. 186 00:07:13,589 --> 00:07:15,130 So if you have an implementation bug, 187 00:07:15,130 --> 00:07:17,350 well, you had a memcpy or a printf 188 00:07:17,350 --> 00:07:19,254 of some sort that didn't check the range. 189 00:07:19,254 --> 00:07:21,420 OK, well, you add a range check, and it still works, 190 00:07:21,420 --> 00:07:22,461 and now it's also secure. 191 00:07:22,461 --> 00:07:23,510 So that's great.
192 00:07:23,510 --> 00:07:25,995 But if you have some bug in the protocol specification, 193 00:07:25,995 --> 00:07:29,140 in how the protocol has to work, then fixing a bug 194 00:07:29,140 --> 00:07:31,190 is going to require fixing a protocol, which 195 00:07:31,190 --> 00:07:33,820 means potentially affecting all the systems that 196 00:07:33,820 --> 00:07:35,460 are out there speaking this protocol. 197 00:07:35,460 --> 00:07:37,340 So if we find some problem in TCP, 198 00:07:37,340 --> 00:07:38,910 it's potentially quite devastating. 199 00:07:38,910 --> 00:07:42,229 Because every machine that uses TCP is going to have to change. 200 00:07:42,229 --> 00:07:43,770 Because it's going to be hard to make 201 00:07:43,770 --> 00:07:45,790 it potentially backwards compatible. 202 00:07:45,790 --> 00:07:48,530 We'll see exactly what these bugs are. 203 00:07:48,530 --> 00:07:51,640 But this is the real reason he's so excited about looking 204 00:07:51,640 --> 00:07:52,460 at protocol bugs. 205 00:07:52,460 --> 00:07:56,555 Because they're fairly fundamental to the TCP protocol 206 00:07:56,555 --> 00:07:58,982 that everyone agrees to speak. 207 00:07:58,982 --> 00:08:00,440 So let's look at one of these guys. 208 00:08:00,440 --> 00:08:03,950 So the first example he points out 209 00:08:03,950 --> 00:08:07,140 has to do with how TCP sequence numbers work. 210 00:08:09,720 --> 00:08:13,169 So just to re-explain-- yeah, question. 211 00:08:13,169 --> 00:08:14,294 AUDIENCE: I'm just curious. 212 00:08:14,294 --> 00:08:15,502 This is a tiny bit off topic. 213 00:08:15,502 --> 00:08:17,606 But let's say you do find a bug in TCP. 214 00:08:17,606 --> 00:08:19,976 How do you make the change to it? 215 00:08:19,976 --> 00:08:21,398 How do you tell all the computers 216 00:08:21,398 --> 00:08:23,736 in the world to change that? 217 00:08:23,736 --> 00:08:25,610 PROFESSOR: Yeah, I think it's a huge problem. 218 00:08:25,610 --> 00:08:26,900 What if you find a bug in TCP? 
219 00:08:26,900 --> 00:08:29,280 Well, it's unclear what to do. 220 00:08:29,280 --> 00:08:32,530 And I think the authors here struggle a lot with that. 221 00:08:32,530 --> 00:08:36,165 And in many ways, if you could redesign TCP, 222 00:08:36,165 --> 00:08:37,539 many of these bugs are relatively 223 00:08:37,539 --> 00:08:41,620 easy to fix if you knew what to look for ahead of time. 224 00:08:41,620 --> 00:08:46,410 But because TCP is sort of relatively hard 225 00:08:46,410 --> 00:08:51,350 to fix or change, what ends up happening 226 00:08:51,350 --> 00:08:54,457 is that people or designers try to look 227 00:08:54,457 --> 00:08:58,135 for backwards compatible tweaks that either allow 228 00:08:58,135 --> 00:09:01,170 old implementations to coexist with the new implementation 229 00:09:01,170 --> 00:09:04,970 or to add some optional field that if it's there, then 230 00:09:04,970 --> 00:09:08,590 the communication is more secure in some way. 231 00:09:08,590 --> 00:09:10,370 But it is a big problem. 232 00:09:10,370 --> 00:09:14,030 If it's some security issue that's deeply ingrained in TCP, 233 00:09:14,030 --> 00:09:17,750 then it's going to be a pretty humongous issue for everyone 234 00:09:17,750 --> 00:09:23,930 to just pack up and move onto a TCP version whatever, n plus 1. 235 00:09:23,930 --> 00:09:27,296 And you can look at IPv6 as one example of this not happening. 236 00:09:27,296 --> 00:09:29,170 We've known this problem was going to come up 237 00:09:29,170 --> 00:09:31,025 for like 15 years or 20 years. 238 00:09:31,025 --> 00:09:34,759 IPv6 has been around for well over 10 years now. 239 00:09:34,759 --> 00:09:37,300 And it's just hard to convince people to move away from IPv4. 240 00:09:37,300 --> 00:09:38,060 It's good enough. 241 00:09:38,060 --> 00:09:39,040 It sort of works. 242 00:09:39,040 --> 00:09:41,465 It's a lot of overhead to move over. 
243 00:09:41,465 --> 00:09:43,019 And no one else is speaking IPv6, so 244 00:09:43,019 --> 00:09:45,060 why should I start speaking this bizarre protocol 245 00:09:45,060 --> 00:09:47,220 that no one else is going to speak to me in? 246 00:09:47,220 --> 00:09:48,520 So it's sort of moving along. 247 00:09:48,520 --> 00:09:49,895 But I think it takes a long time. 248 00:09:49,895 --> 00:09:53,290 And there's going to be really some motivation to migrate. 249 00:09:53,290 --> 00:09:56,940 And backwards compatibility helps a lot. 250 00:09:56,940 --> 00:09:58,700 Not good enough for, I guess, IPv6-- IPv6 251 00:09:58,700 --> 00:10:01,205 has lots of backwards compatibility plans in it. 252 00:10:01,205 --> 00:10:04,720 You can talk to an IPv4 host from IPv6. 253 00:10:04,720 --> 00:10:07,770 So they try to engineer all this support. 254 00:10:07,770 --> 00:10:11,840 But still, it's hard to convince people to upgrade. 255 00:10:11,840 --> 00:10:15,194 All right, but yeah, looking back at the TCP sequence 256 00:10:15,194 --> 00:10:17,610 numbers, we're going to look at actually two problems that 257 00:10:17,610 --> 00:10:20,990 have to do with how the TCP handshake works. 258 00:10:20,990 --> 00:10:24,260 So let's just spend a little bit of time working out 259 00:10:24,260 --> 00:10:27,060 what are the details of how a TCP connection gets initially 260 00:10:27,060 --> 00:10:28,510 established. 261 00:10:28,510 --> 00:10:30,650 So there's actually three packets 262 00:10:30,650 --> 00:10:33,900 that have to get sent in order for a new TCP connection 263 00:10:33,900 --> 00:10:34,690 to be established. 264 00:10:34,690 --> 00:10:37,960 So our client generates a packet to connect to a server. 265 00:10:37,960 --> 00:10:41,140 And it says, well, here's my IP address, C, client. 266 00:10:41,140 --> 00:10:42,940 I'm sending this to the server. 267 00:10:42,940 --> 00:10:44,534 And there's various fields. 
268 00:10:44,534 --> 00:10:46,575 But the ones that are interesting for the purpose 269 00:10:46,575 --> 00:10:49,590 of this discussion is going to be a sequence number. 270 00:10:49,590 --> 00:10:51,610 So there's going to be a syn flag saying, 271 00:10:51,610 --> 00:10:54,350 I want to synchronize state and establish a new connection. 272 00:10:54,350 --> 00:10:57,480 And you include a client sequence number 273 00:10:57,480 --> 00:11:00,680 in the initial syn packet. 274 00:11:00,680 --> 00:11:02,370 Then when the server receives this, 275 00:11:02,370 --> 00:11:04,167 the server is going to look and say, 276 00:11:04,167 --> 00:11:05,750 well, a client wants to connect to me, 277 00:11:05,750 --> 00:11:07,459 so I'll send a packet back to whatever 278 00:11:07,459 --> 00:11:09,000 this address is, whoever said they're 279 00:11:09,000 --> 00:11:10,060 trying to connect to me. 280 00:11:10,060 --> 00:11:13,740 So it'll send a packet from the server to the client 281 00:11:13,740 --> 00:11:17,170 and include its own synchronization number, SN 282 00:11:17,170 --> 00:11:18,080 server. 283 00:11:18,080 --> 00:11:19,910 And it'll acknowledge the client's number. 284 00:11:23,750 --> 00:11:28,040 And finally, the client replies back, 285 00:11:28,040 --> 00:11:30,260 acknowledging the server synchronization 286 00:11:30,260 --> 00:11:37,070 number-- acknowledge SNS. 287 00:11:37,070 --> 00:11:40,110 And now the client can actually start sending data. 288 00:11:40,110 --> 00:11:42,290 So in order to send data, the client 289 00:11:42,290 --> 00:11:46,770 has to include some data in the packet, 290 00:11:46,770 --> 00:11:51,780 and also put in the sequence number of the client 291 00:11:51,780 --> 00:11:53,275 to indicate that this is actually 292 00:11:53,275 --> 00:11:55,858 sort of legitimate client data at the start of the connection. 
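The three packets described here can be written down as a small model. This is only a sketch: `Segment` and `handshake` are names invented for the illustration, and the sequence numbers are passed in by hand. One detail the sketch makes explicit is that real TCP acknowledges the peer's number plus one, because the SYN itself uses up one sequence number.

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around


@dataclass
class Segment:
    src: str
    dst: str
    seq: int
    ack: Optional[int] = None
    syn: bool = False
    data: bytes = b""


def handshake(client: str, server: str, snc: int, sns: int) -> List[Segment]:
    """Return the SYN, SYN-ACK and ACK of a normal connection setup."""
    # 1. Client -> server: SYN carrying the client's initial number SNC.
    syn = Segment(client, server, seq=snc, syn=True)
    # 2. Server -> whoever the SYN claims to be from: its own SNS, plus ACK.
    syn_ack = Segment(server, syn.src, seq=sns,
                      ack=(syn.seq + 1) % SEQ_MOD, syn=True)
    # 3. Client -> server: acknowledges SNS. Only someone who knows SNS
    #    can produce this value.
    ack = Segment(client, server, seq=syn_ack.ack,
                  ack=(syn_ack.seq + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


segments = handshake("C", "S", snc=1000, sns=5000)
```

After the third segment the client can send data, starting at the sequence number that the server acknowledged in step 2.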
293 00:11:55,858 --> 00:11:57,650 It's not some data from later on, 294 00:11:57,650 --> 00:12:00,168 for example, that just happens to arrive now 295 00:12:00,168 --> 00:12:02,542 because the server missed some initial parts of the data. 296 00:12:02,542 --> 00:12:04,735 So generally, all these sequence numbers 297 00:12:04,735 --> 00:12:08,240 were meant for ensuring in order delivery of packets. 298 00:12:08,240 --> 00:12:11,225 So if the client sends two packets, the one that 299 00:12:11,225 --> 00:12:14,370 has the initial sequence number, that's the first chunk of data. 300 00:12:14,370 --> 00:12:16,078 And the one with the next sequence number 301 00:12:16,078 --> 00:12:17,410 is the next chunk of data. 302 00:12:17,410 --> 00:12:20,790 But it turns out to also be useful for providing 303 00:12:20,790 --> 00:12:22,424 some security properties. 304 00:12:22,424 --> 00:12:24,465 Here's an example of these requirements changing. 305 00:12:24,465 --> 00:12:25,840 So initially, no one was thinking 306 00:12:25,840 --> 00:12:27,465 TCP provides any security properties. 307 00:12:27,465 --> 00:12:29,940 But then applications started using TCP and sort 308 00:12:29,940 --> 00:12:32,470 of relying on these TCP connections 309 00:12:32,470 --> 00:12:35,800 not being able to be broken by some arbitrary attacker, 310 00:12:35,800 --> 00:12:39,150 or an attacker not being able to inject data into your existing 311 00:12:39,150 --> 00:12:40,360 TCP connection. 312 00:12:40,360 --> 00:12:43,060 And all of a sudden, this mechanism that was initially 313 00:12:43,060 --> 00:12:46,240 meant for just packet ordering now 314 00:12:46,240 --> 00:12:49,160 gets used to guarantee some semblance of security 315 00:12:49,160 --> 00:12:52,110 for these connections. 316 00:12:52,110 --> 00:12:59,810 So in this case, I guess the problem 317 00:12:59,810 --> 00:13:03,710 stems from what could a server assume about this TCP 318 00:13:03,710 --> 00:13:04,520 connection. 
319 00:13:04,520 --> 00:13:08,180 So typically, the server assumes-- implicitly, 320 00:13:08,180 --> 00:13:12,480 you might imagine-- that this connection is established 321 00:13:12,480 --> 00:13:17,195 with the right client at this IP address C. It seems 322 00:13:17,195 --> 00:13:18,650 like a natural thing to assume. 323 00:13:18,650 --> 00:13:20,990 Is there any basis for making this assumption? 324 00:13:20,990 --> 00:13:24,780 If a server gets this message saying, here's 325 00:13:24,780 --> 00:13:27,500 some data on this connection from a client to a server, 326 00:13:27,500 --> 00:13:32,950 and it has sequence number C, why might the server 327 00:13:32,950 --> 00:13:35,759 conclude that this was actually the real client sending this? 328 00:13:35,759 --> 00:13:38,050 AUDIENCE: Because the sequence number is hard to guess. 329 00:13:38,050 --> 00:13:39,940 PROFESSOR: Right, so that's sort of the implicit thing going on, 330 00:13:39,940 --> 00:13:42,640 that it has to have the right sequence number C here. 331 00:13:42,640 --> 00:13:46,430 And in order for this connection to get established, 332 00:13:46,430 --> 00:13:49,870 the client must have acknowledged the server 333 00:13:49,870 --> 00:13:51,570 sequence number S here. 334 00:13:51,570 --> 00:13:55,520 And the server sequence number S was only sent by the server 335 00:13:55,520 --> 00:13:59,386 to the intended client IP address. 336 00:13:59,386 --> 00:13:59,886 Yeah. 337 00:13:59,886 --> 00:14:01,362 AUDIENCE: How many bits are available for the sequence 338 00:14:01,362 --> 00:14:01,862 number? 339 00:14:01,862 --> 00:14:05,831 PROFESSOR: So sequence numbers in TCP are 32 bits long. 340 00:14:05,831 --> 00:14:10,359 That's not entirely easy to guess. 341 00:14:10,359 --> 00:14:12,025 If it was really a random 32 bit number, 342 00:14:12,025 --> 00:14:14,040 it would be hard to just guess. 
343 00:14:14,040 --> 00:14:16,699 And you'd probably waste a lot of bandwidth 344 00:14:16,699 --> 00:14:17,637 trying to guess this. 345 00:14:17,637 --> 00:14:19,044 Yeah, question. 346 00:14:19,044 --> 00:14:20,607 AUDIENCE: The data sequence number 347 00:14:20,607 --> 00:14:23,150 is higher than the initial sequence number? 348 00:14:23,150 --> 00:14:25,650 PROFESSOR: Yeah, so basically, these things get incremented. 349 00:14:25,650 --> 00:14:27,500 So every time you send a syn, that 350 00:14:27,500 --> 00:14:29,660 counts as one byte against your sequence number. 351 00:14:29,660 --> 00:14:31,010 So this is SNC. 352 00:14:31,010 --> 00:14:34,670 I think actually what happens is this is SNC plus 1. 353 00:14:34,670 --> 00:14:36,140 And then it goes on from there. 354 00:14:36,140 --> 00:14:40,117 So if you send 5 bytes, then the next one is SNC initial plus 6. 355 00:14:40,117 --> 00:14:42,200 So this just counts the bytes that you're sending. 356 00:14:42,200 --> 00:14:44,920 SYNs count as 1 byte each. 357 00:14:44,920 --> 00:14:45,720 Make sense? 358 00:14:45,720 --> 00:14:48,050 Other questions about this? 359 00:14:48,050 --> 00:14:54,650 All right, so typically, or at least the way 360 00:14:54,650 --> 00:14:56,370 the TCP specification recommended 361 00:14:56,370 --> 00:14:58,550 that people choose these sequence numbers, 362 00:14:58,550 --> 00:15:02,750 was to increment them at some roughly fixed rate. 363 00:15:02,750 --> 00:15:06,300 So the initial RFCs suggested that you increment these things 364 00:15:06,300 --> 00:15:12,350 at something like 250,000 units per second.
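The byte-counting rule from the answer above (the SYN counts as one, then every data byte counts as one) is simple enough to check by hand. A small sketch, with `next_seq` as a made-up helper name:

```python
SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def next_seq(initial_seq: int, data_bytes_sent: int) -> int:
    """Sequence number of the next byte after the SYN and the data so far."""
    syn_cost = 1  # the SYN consumes one sequence number
    return (initial_seq + syn_cost + data_bytes_sent) % SEQ_MOD


snc = 1000
first_data_seq = next_seq(snc, 0)   # SNC + 1
after_five = next_seq(snc, 5)       # SNC + 6, as in the example above
```

The modulo matters only near the top of the 32-bit range, where the counter wraps back to zero.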
365 00:15:12,350 --> 00:15:14,920 And the reason that it wasn't entirely random 366 00:15:14,920 --> 00:15:17,060 is that these sequence numbers are actually 367 00:15:17,060 --> 00:15:20,690 used to prevent out of order packets, or packets 368 00:15:20,690 --> 00:15:22,534 from previous connections, from interfering 369 00:15:22,534 --> 00:15:23,917 with new connections. 370 00:15:23,917 --> 00:15:26,870 So if every time you established a new connection 371 00:15:26,870 --> 00:15:29,740 you chose a completely random sequence number, 372 00:15:29,740 --> 00:15:32,590 then there's some chance if you establish lots of connections 373 00:15:32,590 --> 00:15:35,510 over and over that some packet from a previous connection 374 00:15:35,510 --> 00:15:37,290 is going to have a similar enough sequence 375 00:15:37,290 --> 00:15:38,790 number to your new connection and is 376 00:15:38,790 --> 00:15:41,490 going to be accepted as a valid piece of data 377 00:15:41,490 --> 00:15:42,602 on that new connection. 378 00:15:42,602 --> 00:15:45,000 So this is something that the TCP designers worried a lot 379 00:15:45,000 --> 00:15:48,916 about-- these out of order packets or delayed packets. 380 00:15:48,916 --> 00:15:51,290 So as a result, they really wanted these sequence numbers 381 00:15:51,290 --> 00:15:55,210 to progress in a roughly monotonic manner over time, 382 00:15:55,210 --> 00:15:56,900 even across connections. 383 00:15:56,900 --> 00:15:58,580 If I opened one connection, it might 384 00:15:58,580 --> 00:16:01,060 have the same source and destination, port numbers, 385 00:16:01,060 --> 00:16:02,610 IP addresses, et cetera. 386 00:16:02,610 --> 00:16:04,920 But because I established this connection now instead 387 00:16:04,920 --> 00:16:07,640 of earlier, packets from earlier hopefully 388 00:16:07,640 --> 00:16:10,030 aren't going to match up with the sequence numbers 389 00:16:10,030 --> 00:16:12,310 I have for my new connection.
390 00:16:12,310 --> 00:16:14,280 So this was a mechanism to prevent confusion 391 00:16:14,280 --> 00:16:18,431 across repeated connection establishments. 392 00:16:18,431 --> 00:16:18,930 Yeah. 393 00:16:18,930 --> 00:16:22,983 AUDIENCE: So if you don't know exactly how much your other 394 00:16:22,983 --> 00:16:25,566 grid that you're talking to is going to improve the sequencing 395 00:16:25,566 --> 00:16:26,845 pack, how do you know that the packet you're getting is 396 00:16:26,845 --> 00:16:29,120 the next packet if there wasn't [INAUDIBLE] immediate packet 397 00:16:29,120 --> 00:16:29,530 that you-- 398 00:16:29,530 --> 00:16:31,870 PROFESSOR: So typically you'll remember the last packet 399 00:16:31,870 --> 00:16:33,260 that you received. 400 00:16:33,260 --> 00:16:35,680 And if the next sequence number is exactly that, 401 00:16:35,680 --> 00:16:38,267 then this is the next packet in sequence. 402 00:16:38,267 --> 00:16:39,850 So for example, here, the server knows 403 00:16:39,850 --> 00:16:43,937 that I've seen exactly SNC plus 1 worth of data. 404 00:16:43,937 --> 00:16:46,020 If the next packet has sequence number SNC plus 1, 405 00:16:46,020 --> 00:16:47,618 that's the next one. 406 00:16:47,618 --> 00:16:49,117 AUDIENCE: So you're saying that when 407 00:16:49,117 --> 00:16:51,790 you establish a sequence number, then even after that you're 408 00:16:51,790 --> 00:16:52,762 committing it-- 409 00:16:52,762 --> 00:16:54,428 PROFESSOR: Well, absolutely, yeah, yeah. 410 00:16:54,428 --> 00:16:56,611 So these sequence numbers, initially when 411 00:16:56,611 --> 00:16:58,986 you establish it, they get picked according to some plan. 412 00:16:58,986 --> 00:17:00,135 We'll talk about that plan. 413 00:17:00,135 --> 00:17:01,940 You can sort of think they might be random. 414 00:17:01,940 --> 00:17:05,310 But over time, they have to have some flow for initial sequence 415 00:17:05,310 --> 00:17:07,034 numbers for connection. 
416 00:17:07,034 --> 00:17:09,575 But within a connection, once they're established, that's it. 417 00:17:09,575 --> 00:17:10,280 They're fixed. 418 00:17:10,280 --> 00:17:11,890 And they just tick along as the data 419 00:17:11,890 --> 00:17:15,630 gets sent on the connection, exactly. 420 00:17:15,630 --> 00:17:17,270 Make sense? 421 00:17:17,270 --> 00:17:19,619 All right, so there were some plans 422 00:17:19,619 --> 00:17:22,020 suggested for how to manage these sequence numbers. 423 00:17:22,020 --> 00:17:23,650 And it was actually a reasonable plan 424 00:17:23,650 --> 00:17:27,970 for avoiding duplicate packets in the network causing trouble. 425 00:17:27,970 --> 00:17:31,730 But the problem, of course, showed up 426 00:17:31,730 --> 00:17:37,511 that attackers were able to sort of guess these sequence 427 00:17:37,511 --> 00:17:38,010 numbers. 428 00:17:38,010 --> 00:17:42,090 Because there wasn't a lot of randomness being chosen. 429 00:17:42,090 --> 00:17:44,750 So the way that the host machine would choose these sequence 430 00:17:44,750 --> 00:17:47,290 numbers is they have just a running counter in memory. 431 00:17:47,290 --> 00:17:50,080 Every second they bump it by 250,000. 432 00:17:50,080 --> 00:17:51,790 And every time a new connection comes 433 00:17:51,790 --> 00:17:55,530 in, they also bump it by some constant like 64k or 128k. 434 00:17:55,530 --> 00:17:57,930 I forget the exact number. 435 00:17:57,930 --> 00:18:00,180 So this was relatively easy to guess, as you can tell. 436 00:18:00,180 --> 00:18:02,030 You send them a connection request, 437 00:18:02,030 --> 00:18:04,116 and you see what sequence number comes back. 438 00:18:04,116 --> 00:18:05,741 And then you know the next one is going 439 00:18:05,741 --> 00:18:07,950 to be 64k higher than that. 440 00:18:07,950 --> 00:18:12,352 So there wasn't a huge amount of randomness in this protocol. 441 00:18:12,352 --> 00:18:14,310 So we can just sketch out what this looks like.
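The counter scheme just described is easy to model, and the model shows why it is guessable. This is a sketch under stated assumptions: the class and function names are invented, and the per-connection step is taken as 64,000 only because the lecture says "64k or 128k" without settling on a value.

```python
SEQ_MOD = 2 ** 32


class CounterIsnGenerator:
    """Initial sequence numbers from a running counter, as described above."""

    PER_SECOND = 250_000      # rate quoted in the lecture
    PER_CONNECTION = 64_000   # assumed value; the lecture says 64k or 128k

    def __init__(self, start: int = 0) -> None:
        self.counter = start % SEQ_MOD

    def tick(self, seconds: float) -> None:
        """Advance the counter for elapsed wall-clock time."""
        bump = int(seconds * self.PER_SECOND)
        self.counter = (self.counter + bump) % SEQ_MOD

    def new_connection(self) -> int:
        """Hand out the current value, then bump by the fixed step."""
        isn = self.counter
        self.counter = (self.counter + self.PER_CONNECTION) % SEQ_MOD
        return isn


def predict_next_isn(observed: int, elapsed_seconds: float = 0.0) -> int:
    """What an observer of one ISN expects the next connection to get."""
    drift = int(elapsed_seconds * CounterIsnGenerator.PER_SECOND)
    return (observed + CounterIsnGenerator.PER_CONNECTION + drift) % SEQ_MOD


server = CounterIsnGenerator(start=123_456)
observed = server.new_connection()   # attacker's own, legitimate connection
guess = predict_next_isn(observed)   # no randomness to get in the way
actual = server.new_connection()
```

In practice the attacker does not know the elapsed time exactly, so a real guess is a small range of values rather than a single number; the sketch ignores that.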
442 00:18:14,310 --> 00:18:17,190 So if I'm an attacker that wants to connect to a server 443 00:18:17,190 --> 00:18:20,690 but pretend to be from a particular IP address, 444 00:18:20,690 --> 00:18:23,920 then what I might do is send a request to the server, 445 00:18:23,920 --> 00:18:26,490 very much like the first step there, 446 00:18:26,490 --> 00:18:30,470 include some initial sequence number that I choose. 447 00:18:30,470 --> 00:18:31,995 At this point, any sequence number 448 00:18:31,995 --> 00:18:33,660 is just as good, because the server shouldn't 449 00:18:33,660 --> 00:18:35,970 have any assumptions about what the client's sequence 450 00:18:35,970 --> 00:18:37,042 number is. 451 00:18:37,042 --> 00:18:38,250 Now, what does the server do? 452 00:18:38,250 --> 00:18:40,760 The server gets the same packet as before. 453 00:18:40,760 --> 00:18:42,550 So it performs the same way as before. 454 00:18:42,550 --> 00:18:47,500 It sends a packet back to the client with some server 455 00:18:47,500 --> 00:18:53,000 sequence number and acknowledges SNC. 456 00:18:53,000 --> 00:18:55,800 And now the attacker, if the attacker 457 00:18:55,800 --> 00:18:58,620 wants to establish a connection, needs to somehow synthesize 458 00:18:58,620 --> 00:19:01,540 a packet that looks exactly like the third packet over there. 459 00:19:01,540 --> 00:19:04,779 So it needs to send a packet from the client to the server. 460 00:19:04,779 --> 00:19:05,570 That's easy enough. 461 00:19:05,570 --> 00:19:07,403 You just fill in these values in the header. 462 00:19:07,403 --> 00:19:12,820 But you have to acknowledge this server sequence number SNS. 463 00:19:12,820 --> 00:19:15,090 And this is where sort of the problems start. 464 00:19:15,090 --> 00:19:18,950 If the SNS value is relatively easy to guess, 465 00:19:18,950 --> 00:19:21,050 then the attacker is good to go. 
466 00:19:21,050 --> 00:19:22,910 And now the server thinks they have 467 00:19:22,910 --> 00:19:26,400 an established connection with a client coming from this IP 468 00:19:26,400 --> 00:19:27,990 address. 469 00:19:27,990 --> 00:19:31,920 And now an attacker could inject data into this connection 470 00:19:31,920 --> 00:19:33,174 just as before. 471 00:19:33,174 --> 00:19:34,590 They just synthesize a packet that 472 00:19:34,590 --> 00:19:37,680 looks like this, has the data, and it 473 00:19:37,680 --> 00:19:41,030 has the client sequence number that in fact the adversary 474 00:19:41,030 --> 00:19:41,540 chose. 475 00:19:41,540 --> 00:19:43,870 Maybe it's plus 1 here. 476 00:19:43,870 --> 00:19:48,170 But it all hinges on being able to guess this particular server 477 00:19:48,170 --> 00:19:51,400 supplied sequence number. 478 00:19:51,400 --> 00:19:52,820 All right, does this make sense? 479 00:19:52,820 --> 00:19:53,320 Yeah. 480 00:19:53,320 --> 00:19:54,280 AUDIENCE: What's the reason that the server sequence 481 00:19:54,280 --> 00:19:56,374 number isn't completely random? 482 00:19:56,374 --> 00:19:57,790 PROFESSOR: So there's two reasons. 483 00:19:57,790 --> 00:20:00,940 One, as I was describing earlier, 484 00:20:00,940 --> 00:20:05,330 the server wants to make sure that packets 485 00:20:05,330 --> 00:20:07,720 from different connections over time 486 00:20:07,720 --> 00:20:09,315 don't get confused for one another. 487 00:20:09,315 --> 00:20:12,280 So if you establish a connection from one source port 488 00:20:12,280 --> 00:20:14,864 to another destination port, and then you close the connection 489 00:20:14,864 --> 00:20:17,363 and establish another one of the same source and destination 490 00:20:17,363 --> 00:20:20,000 port, you want to make sure the packets from one connection 491 00:20:20,000 --> 00:20:24,121 don't appear to be valid in another connection. 
492 00:20:24,121 --> 00:20:26,109 AUDIENCE: So the server sequence number 493 00:20:26,109 --> 00:20:28,525 is incremented for every one of their packets? 494 00:20:28,525 --> 00:20:33,397 PROFESSOR: Well, so the sequence numbers within a connection, 495 00:20:33,397 --> 00:20:35,550 as I was describing, get bumped with all the data 496 00:20:35,550 --> 00:20:36,217 in a connection. 497 00:20:36,217 --> 00:20:37,591 But there's also the question of, 498 00:20:37,591 --> 00:20:39,820 how do you choose the initial sequence number here? 499 00:20:39,820 --> 00:20:42,530 And that gets bumped every time a new connection is 500 00:20:42,530 --> 00:20:43,440 established. 501 00:20:43,440 --> 00:20:47,190 So the hope is that by the time it wraps around 2 to the 32 502 00:20:47,190 --> 00:20:50,600 and comes back, there's been enough time 503 00:20:50,600 --> 00:20:52,270 so that old packets in the network 504 00:20:52,270 --> 00:20:54,283 have actually been dropped and will not 505 00:20:54,283 --> 00:20:56,006 appear as duplicates anymore. 506 00:20:56,006 --> 00:20:57,630 So that's the reason why you don't just 507 00:20:57,630 --> 00:20:59,920 choose random points, or they didn't initially 508 00:20:59,920 --> 00:21:01,374 choose random points. 509 00:21:01,374 --> 00:21:01,874 Yeah. 510 00:21:01,874 --> 00:21:04,309 AUDIENCE: So this is a problem between connections, 511 00:21:04,309 --> 00:21:06,910 for a connection between the same guide, the same client, 512 00:21:06,910 --> 00:21:09,410 the same server, the same source port, the same destination. 513 00:21:09,410 --> 00:21:11,012 And we're worried about old packets-- 514 00:21:11,012 --> 00:21:13,565 PROFESSOR: So this is what the original, yeah, TCP designers 515 00:21:13,565 --> 00:21:14,810 were worried about, which is why they 516 00:21:14,810 --> 00:21:16,696 prescribed this way of picking these initial sequence numbers. 
517 00:21:16,696 --> 00:21:18,094 AUDIENCE: If you have different new connections, 518 00:21:18,094 --> 00:21:19,135 you could differentiate. 519 00:21:19,135 --> 00:21:19,900 PROFESSOR: That's right, yeah. 520 00:21:19,900 --> 00:21:22,375 AUDIENCE: So then I don't see why the incrementing stuff 521 00:21:22,375 --> 00:21:24,355 and not just take randomly. 522 00:21:24,355 --> 00:21:26,850 PROFESSOR: So I think the reason they don't pick randomly 523 00:21:26,850 --> 00:21:29,100 is that if you did pick randomly, and you established, 524 00:21:29,100 --> 00:21:31,320 I don't know, 1,000 connections within a short amount 525 00:21:31,320 --> 00:21:34,040 of time from the same source to the same destination, 526 00:21:34,040 --> 00:21:37,780 then, well, every one of them is some random value modulo 2 527 00:21:37,780 --> 00:21:38,990 to the 32. 528 00:21:38,990 --> 00:21:40,920 And now there's a nontrivial chance 529 00:21:40,920 --> 00:21:42,420 that some packet from one connection 530 00:21:42,420 --> 00:21:45,600 will be delayed in the network, and eventually show up again, 531 00:21:45,600 --> 00:21:48,760 and will get confused for a packet from another connection. 532 00:21:48,760 --> 00:21:50,840 This is just sort of nothing to do with security. 533 00:21:50,840 --> 00:21:52,465 This is just their design consideration 534 00:21:52,465 --> 00:21:55,290 initially for reliable delivery. 535 00:21:55,290 --> 00:21:58,720 AUDIENCE: [INAUDIBLE] some other client to the server, right? 536 00:21:58,720 --> 00:21:59,595 PROFESSOR: Sorry? 537 00:21:59,595 --> 00:22:01,830 AUDIENCE: This is [INAUDIBLE] some other client? 538 00:22:01,830 --> 00:22:03,080 PROFESSOR: That's right, yeah. 539 00:22:03,080 --> 00:22:04,830 So we haven't actually said why this is interesting at all 540 00:22:04,830 --> 00:22:06,190 for the attacker to do. 541 00:22:06,190 --> 00:22:06,880 Why bother? 542 00:22:06,880 --> 00:22:09,055 You could just go from his own IP address, right?
543 00:22:09,055 --> 00:22:12,855 AUDIENCE: So what happens for the server [INAUDIBLE]? 544 00:22:17,346 --> 00:22:19,720 PROFESSOR: Yes, this is actually an interesting question. 545 00:22:19,720 --> 00:22:20,440 What happens here? 546 00:22:20,440 --> 00:22:22,106 So this packet doesn't just get dropped. 547 00:22:22,106 --> 00:22:24,430 It actually goes to this computer. 548 00:22:24,430 --> 00:22:26,426 And what happens? 549 00:22:26,426 --> 00:22:28,926 AUDIENCE: [INAUDIBLE], they just mentioned you try and do it 550 00:22:28,926 --> 00:22:31,116 like they would try and do it when 551 00:22:31,116 --> 00:22:34,019 the other computer was updating or rebooting or off, 552 00:22:34,019 --> 00:22:34,560 or something. 553 00:22:34,560 --> 00:22:35,682 PROFESSOR: Right, certainly they felt, oh, 554 00:22:35,682 --> 00:22:36,192 that computer is offline. 555 00:22:36,192 --> 00:22:37,670 The packet will just get dropped, 556 00:22:37,670 --> 00:22:39,720 and you don't have to worry about it too much. 557 00:22:39,720 --> 00:22:43,010 If a computer is actually listening on that IP address, 558 00:22:43,010 --> 00:22:45,065 then in the TCP protocol, you're supposed 559 00:22:45,065 --> 00:22:47,790 to send a reset packet resetting the connection. 560 00:22:47,790 --> 00:22:51,630 Because this is not a connection that computer C knows about. 561 00:22:51,630 --> 00:22:54,730 And in TCP, this is presumed to be because, oh, this 562 00:22:54,730 --> 00:22:57,640 is some old packet that I requested long ago, 563 00:22:57,640 --> 00:22:59,320 but I've since forgotten about it. 564 00:22:59,320 --> 00:23:04,850 So the machine C here might send a packet to the server saying, 565 00:23:04,850 --> 00:23:07,710 I want a reset. 566 00:23:07,710 --> 00:23:10,503 I actually forget exactly which sequence number goes in there. 
567 00:23:10,503 --> 00:23:13,582 But the client C here knows all the sequence numbers 568 00:23:13,582 --> 00:23:15,290 and can send any sequence number as necessary 569 00:23:15,290 --> 00:23:17,580 and reset this connection. 570 00:23:17,580 --> 00:23:20,229 So if this computer C is going to do this, 571 00:23:20,229 --> 00:23:21,812 then it might interfere with your plan 572 00:23:21,812 --> 00:23:22,948 to establish a connection. 573 00:23:22,948 --> 00:23:24,406 Because when S gets this packet, it 574 00:23:24,406 --> 00:23:25,993 says, oh, sure, if you don't want it, 575 00:23:25,993 --> 00:23:27,840 I'll reset your connection. 576 00:23:27,840 --> 00:23:30,610 There's some implementation-ish bugs 577 00:23:30,610 --> 00:23:34,075 that you might exploit, or at least the author talks about 578 00:23:34,075 --> 00:23:38,215 potentially exploiting, that would prevent 579 00:23:38,215 --> 00:23:39,990 client C from responding. 580 00:23:39,990 --> 00:23:42,758 So for example, if you flood C with lots of packets, 581 00:23:42,758 --> 00:23:44,633 it's an easy way to get him to drop this one. 582 00:23:44,633 --> 00:23:46,924 It turns out there are other more interesting bugs that 583 00:23:46,924 --> 00:23:49,520 don't require flooding C with lots of packets that still get 584 00:23:49,520 --> 00:23:51,200 C to drop this packet, or at least it 585 00:23:51,200 --> 00:23:53,436 used to on some implementations of TCP stacks. 586 00:23:53,436 --> 00:23:53,936 Yeah. 587 00:23:53,936 --> 00:23:55,436 AUDIENCE: Presumably, most firewalls 588 00:23:55,436 --> 00:23:57,888 would also [INAUDIBLE]. 589 00:23:57,888 --> 00:23:59,370 PROFESSOR: This one? 590 00:23:59,370 --> 00:24:00,852 AUDIENCE: No, the SYN. 591 00:24:00,852 --> 00:24:01,840 PROFESSOR: This one. 592 00:24:01,840 --> 00:24:02,828 AUDIENCE: That came into a client, 593 00:24:02,828 --> 00:24:05,203 and a client didn't originally send a SYN to that server.
594 00:24:05,203 --> 00:24:07,188 And the firewall is going to drop it. 595 00:24:07,188 --> 00:24:08,980 PROFESSOR: It depends, yeah. 596 00:24:08,980 --> 00:24:12,590 So certainly if you have a very sophisticated stateful firewall 597 00:24:12,590 --> 00:24:15,150 that keeps track of all existing connections, or for example 598 00:24:15,150 --> 00:24:17,110 if you have a NAT, then this might happen. 599 00:24:17,110 --> 00:24:20,640 On the other hand, a NAT might actually send the RST 600 00:24:20,640 --> 00:24:22,810 on behalf of the client. 601 00:24:22,810 --> 00:24:23,720 So it's not clear. 602 00:24:23,720 --> 00:24:26,450 I think this is not as common. 603 00:24:26,450 --> 00:24:29,730 So for example, on a Comcast network, 604 00:24:29,730 --> 00:24:32,500 I certainly don't have anyone intercepting these packets 605 00:24:32,500 --> 00:24:34,916 and maintaining state for me and sending RSTs on my behalf 606 00:24:34,916 --> 00:24:35,750 or anything like that. 607 00:24:35,750 --> 00:24:36,250 Yeah. 608 00:24:36,250 --> 00:24:38,250 AUDIENCE: So why can't the server 609 00:24:38,250 --> 00:24:40,206 have independent sequence numbers 610 00:24:40,206 --> 00:24:42,380 for each possible source? 611 00:24:42,380 --> 00:24:46,260 PROFESSOR: Right, so this is in fact what TCP stacks do today. 612 00:24:46,260 --> 00:24:49,910 This is one example of how you fix this problem in a backwards 613 00:24:49,910 --> 00:24:50,660 compatible manner. 614 00:24:50,660 --> 00:24:52,285 So we'll get to exactly the formulation 615 00:24:52,285 --> 00:24:53,330 of how you arrange this. 616 00:24:53,330 --> 00:24:55,910 But yeah, it turns out that if you look at this carefully, 617 00:24:55,910 --> 00:24:59,630 as you're doing, you don't need to have this initial sequence 618 00:24:59,630 --> 00:25:00,950 number be global. 619 00:25:00,950 --> 00:25:04,626 You just scope it to every source/destination pair. 
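Scoping the initial sequence number to each source/destination pair can be sketched as a shared clock plus a keyed hash of the connection's addresses and ports. This is a minimal illustration, not any particular stack's code: the use of HMAC-SHA256, the string encoding of the addresses, and the function names are all assumptions made for the example.

```python
import hashlib
import hmac

MOD = 2 ** 32  # sequence numbers are 32 bits wide


def connection_offset(secret, src_ip, src_port, dst_ip, dst_port):
    """Per-connection offset derived from a secret and the address/port tuple."""
    msg = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")  # keep 32 bits


def initial_sequence_number(clock, secret, src_ip, src_port, dst_ip, dst_port):
    """Clock keeps ISNs advancing per tuple; the offset hides them across tuples."""
    offset = connection_offset(secret, src_ip, src_port, dst_ip, dst_port)
    return (clock + offset) % MOD


secret = b"example-only-secret"  # hypothetical key, kept private by the server
first = initial_sequence_number(1_000, secret, "10.0.0.5", 40000, "10.0.0.1", 513)
later = initial_sequence_number(2_000, secret, "10.0.0.5", 40000, "10.0.0.1", 513)
print((later - first) % MOD)  # same tuple: differs only by elapsed clock
```

For a fixed tuple the numbers still advance with the clock, so old duplicates are avoided as before, while an ISN observed from one source address says nothing useful about the offset for another unless the secret is known.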
620 00:25:04,626 --> 00:25:06,500 And then you have all the duplicate avoidance 621 00:25:06,500 --> 00:25:11,560 properties we had before, and you have some security as well. 622 00:25:11,560 --> 00:25:15,590 So just to sort of write this out 623 00:25:15,590 --> 00:25:18,550 on the board of how the attacker is getting 624 00:25:18,550 --> 00:25:21,425 this initial sequence number, the attacker would probably 625 00:25:21,425 --> 00:25:23,770 just send a connection from its own IP address 626 00:25:23,770 --> 00:25:27,400 to the server saying, I want to establish a new connection, 627 00:25:27,400 --> 00:25:30,506 and the server would send a response 628 00:25:30,506 --> 00:25:33,920 back to the attacker containing its own sequence number S. 629 00:25:33,920 --> 00:25:36,590 And if the SNS for this connection 630 00:25:36,590 --> 00:25:39,492 and the SNS for this connection are related, 631 00:25:39,492 --> 00:25:40,450 then this is a problem. 632 00:25:40,450 --> 00:25:42,540 But you're saying, let's make them not related. 633 00:25:42,540 --> 00:25:44,310 Because this is from a different address. 634 00:25:44,310 --> 00:25:45,960 Then this is not a problem anymore. 635 00:25:45,960 --> 00:25:47,630 You can't guess what this SNS is going 636 00:25:47,630 --> 00:25:52,298 to be based on this SNS for a different connection. 637 00:25:52,298 --> 00:25:52,798 Yeah. 638 00:25:52,798 --> 00:25:54,797 AUDIENCE: So you still have a collision problem, 639 00:25:54,797 --> 00:25:56,533 because you could engage the 32 bits 640 00:25:56,533 --> 00:25:58,774 by the addresses of your peers. 641 00:25:58,774 --> 00:26:01,264 So you have a lot of ports for each one of these. 642 00:26:01,264 --> 00:26:04,252 So you still have conflicting sequence numbers 643 00:26:04,252 --> 00:26:06,660 for all of these connections that you're getting, right? 
644 00:26:06,660 --> 00:26:08,160 PROFESSOR: So these sequence numbers 645 00:26:08,160 --> 00:26:11,156 are specific, as it turns out, to an IP 646 00:26:11,156 --> 00:26:14,530 address and a port number source/destination duple. 647 00:26:14,530 --> 00:26:16,360 So if it's different ports, then they 648 00:26:16,360 --> 00:26:17,372 don't interfere with each other at all. 649 00:26:17,372 --> 00:26:19,130 AUDIENCE: Oh, because you're using the port-- 650 00:26:19,130 --> 00:26:20,205 PROFESSOR: That's right, yeah, you also use 651 00:26:20,205 --> 00:26:21,504 the port in this as well. 652 00:26:21,504 --> 00:26:22,746 AUDIENCE: Because I thought those ports-- 653 00:26:22,746 --> 00:26:25,246 PROFESSOR: Yeah, so the ports are sort of below the sequence 654 00:26:25,246 --> 00:26:27,987 numbers in some way of thinking about it. 655 00:26:27,987 --> 00:26:28,486 Question? 656 00:26:28,486 --> 00:26:31,060 AUDIENCE: If the sequence numbers are global, 657 00:26:31,060 --> 00:26:33,910 then doesn't the attacker [INAUDIBLE]? 658 00:26:36,774 --> 00:26:37,940 PROFESSOR: Yeah, good point. 659 00:26:37,940 --> 00:26:40,780 So in fact, if the server increments the sequence number 660 00:26:40,780 --> 00:26:43,180 by, I don't know, 64k I think it is, 661 00:26:43,180 --> 00:26:46,885 or it was, for every connection, then, well, you connect. 662 00:26:46,885 --> 00:26:49,090 And then maybe five other people connect. 663 00:26:49,090 --> 00:26:51,120 And then you have to do this attack. 664 00:26:51,120 --> 00:26:54,790 So to some extent, you're right, this is a little troublesome. 665 00:26:54,790 --> 00:26:56,420 On the other hand, you could probably 666 00:26:56,420 --> 00:27:01,860 arrange it for your packet here to be delivered just 667 00:27:01,860 --> 00:27:02,740 before this packet. 
668 00:27:02,740 --> 00:27:05,870 So if you send these guys back to back, 669 00:27:05,870 --> 00:27:08,200 then there's a good chance they'll arrive at the server 670 00:27:08,200 --> 00:27:08,920 back to back. 671 00:27:08,920 --> 00:27:10,710 The server will get this one, respond 672 00:27:10,710 --> 00:27:12,180 with this sequence number. 673 00:27:12,180 --> 00:27:13,580 It'll get the next one, this one, 674 00:27:13,580 --> 00:27:16,754 respond with the sequence number right afterwards. 675 00:27:16,754 --> 00:27:19,170 And then you know exactly what to put in this third packet 676 00:27:19,170 --> 00:27:21,450 in your sequence. 677 00:27:21,450 --> 00:27:24,620 So I think this is not a foolproof method 678 00:27:24,620 --> 00:27:25,790 of connecting to a server. 679 00:27:25,790 --> 00:27:27,260 There's some guessing involved. 680 00:27:27,260 --> 00:27:29,720 But if you carefully arrange your packets right, 681 00:27:29,720 --> 00:27:32,230 then it's quite easy to make the right guess. 682 00:27:32,230 --> 00:27:34,720 Or maybe you try several times, and you'll get lucky. 683 00:27:34,720 --> 00:27:35,480 Yeah. 684 00:27:35,480 --> 00:27:38,246 AUDIENCE: So even if it's totally random, 685 00:27:38,246 --> 00:27:39,912 and you have to guess it, there are only 686 00:27:39,912 --> 00:27:40,998 like 4 billion possibilities. 687 00:27:40,998 --> 00:27:42,248 It's not a huge number, right? 688 00:27:42,248 --> 00:27:44,372 I feel like in the course of a year, 689 00:27:44,372 --> 00:27:46,300 you should be able to probably get through. 690 00:27:46,300 --> 00:27:48,660 PROFESSOR: Right, yeah, so you're absolutely right. 691 00:27:48,660 --> 00:27:53,500 You shouldn't really be relying on TCP to provide security 692 00:27:53,500 --> 00:27:54,817 very strongly. 693 00:27:54,817 --> 00:27:56,900 Because you're right, it's only 4 billion guesses. 
694 00:27:56,900 --> 00:27:59,370 And you can probably send that many packets 695 00:27:59,370 --> 00:28:02,780 certainly within a day if you have a fast enough connection. 696 00:28:05,510 --> 00:28:07,562 So it's sort of an interesting argument 697 00:28:07,562 --> 00:28:09,645 we're having here in the sense that at some level, 698 00:28:09,645 --> 00:28:11,120 TCP is hopelessly insecure. 699 00:28:11,120 --> 00:28:12,270 Because it's only 32 bits. 700 00:28:12,270 --> 00:28:14,090 There's no way we could make it secure. 701 00:28:14,090 --> 00:28:15,631 But I think many applications rely on 702 00:28:15,631 --> 00:28:18,740 it enough that not providing any security at all 703 00:28:18,740 --> 00:28:22,368 is so much of a nuisance that it really becomes a problem. 704 00:28:22,368 --> 00:28:24,060 But you're absolutely right. 705 00:28:24,060 --> 00:28:26,790 In practice, you do want to do some sort of encryption 706 00:28:26,790 --> 00:28:29,350 on top of this that will provide stronger 707 00:28:29,350 --> 00:28:31,610 guarantees that no one tampered with your data, 708 00:28:31,610 --> 00:28:33,710 but where the keys are more than 32 bits long. 709 00:28:39,120 --> 00:28:41,242 It still turns out to be useful to prevent people 710 00:28:41,242 --> 00:28:45,812 from tampering with TCP connections in most cases. 711 00:28:45,812 --> 00:28:48,140 All right, other questions? 712 00:28:48,140 --> 00:28:50,435 All right, so let's see what actually goes wrong. 713 00:28:50,435 --> 00:28:53,360 Why is it a bad thing if people are 714 00:28:53,360 --> 00:28:56,990 able to spoof TCP connections from arbitrary addresses? 715 00:28:56,990 --> 00:29:00,335 So one reason why this is bad is if there 716 00:29:00,335 --> 00:29:03,460 is any kind of IP-based authorization.
717 00:29:08,240 --> 00:29:11,600 So if some server decides whether an operation is going 718 00:29:11,600 --> 00:29:14,170 to be allowed or not based on the IP address it comes from, 719 00:29:14,170 --> 00:29:16,000 then this is potentially going to be 720 00:29:16,000 --> 00:29:18,950 a problem for an attacker who spoofed connections 721 00:29:18,950 --> 00:29:21,440 from an arbitrary source address. 722 00:29:21,440 --> 00:29:24,040 So one example where this was a problem-- 723 00:29:24,040 --> 00:29:26,350 and it largely isn't anymore-- is 724 00:29:26,350 --> 00:29:30,910 this family of r commands, things like rlogin. 725 00:29:30,910 --> 00:29:33,160 So it used to be the case that you could run something 726 00:29:33,160 --> 00:29:34,984 like rlogin into a machine, let's say 727 00:29:34,984 --> 00:29:35,900 athena.dialup.mit.edu. 728 00:29:41,360 --> 00:29:45,600 And if your connection was coming from a host at MIT, 729 00:29:45,600 --> 00:29:49,000 then this rlogin command would succeed if you say, oh yeah, 730 00:29:49,000 --> 00:29:51,174 I'm user Alice on this machine. 731 00:29:51,174 --> 00:29:53,340 Let me log in as user Alice onto this other machine. 732 00:29:53,340 --> 00:29:55,548 And it'll just trust that all the machines at mit.edu 733 00:29:55,548 --> 00:29:58,410 are trustworthy to make these statements. 734 00:29:58,410 --> 00:30:00,700 I should say I think dial-up never actually 735 00:30:00,700 --> 00:30:01,500 had this problem. 736 00:30:01,500 --> 00:30:03,416 It was using Kerberos from the very beginning. 737 00:30:03,416 --> 00:30:07,360 But other systems certainly did have such problems. 738 00:30:07,360 --> 00:30:10,610 And this is an example of using the IP address where 739 00:30:10,610 --> 00:30:15,170 the connection is coming from as some sort of authentication 740 00:30:15,170 --> 00:30:19,190 mechanism for whether the caller or the client 741 00:30:19,190 --> 00:30:20,470 is trustworthy or not.
742 00:30:20,470 --> 00:30:22,650 So this certainly used to be a problem, 743 00:30:22,650 --> 00:30:23,730 isn't a problem anymore. 744 00:30:23,730 --> 00:30:27,120 So relying on IP seems like such a clearly bad plan. 745 00:30:27,120 --> 00:30:28,980 Yet, this actually is still the case. 746 00:30:28,980 --> 00:30:30,160 So rlogin is gone. 747 00:30:30,160 --> 00:30:33,574 It was recently replaced by SSH now, which is good. 748 00:30:33,574 --> 00:30:34,990 On the other hand, there are still 749 00:30:34,990 --> 00:30:38,470 many other examples of protocols that rely 750 00:30:38,470 --> 00:30:40,530 on IP-based authentication. 751 00:30:40,530 --> 00:30:41,890 One of them is SMTP. 752 00:30:41,890 --> 00:30:45,555 So when you send email, you use SMTP to talk to some mail 753 00:30:45,555 --> 00:30:47,650 server to send a message. 754 00:30:47,650 --> 00:30:51,120 And to prevent spam, many SMTP servers 755 00:30:51,120 --> 00:30:53,470 will only accept incoming messages 756 00:30:53,470 --> 00:30:55,710 from a particular source IP address. 757 00:30:55,710 --> 00:30:57,460 So for example, Comcast's mail server 758 00:30:57,460 --> 00:31:00,280 will only accept mail from Comcast IP addresses. 759 00:31:00,280 --> 00:31:02,570 Same for MIT mail servers-- will only accept mail 760 00:31:02,570 --> 00:31:03,500 from MIT IP addresses. 761 00:31:03,500 --> 00:31:06,260 Or there was at least one server that IS&T 762 00:31:06,260 --> 00:31:08,910 runs that has this property. 763 00:31:08,910 --> 00:31:11,090 So this is the case where it's still 764 00:31:11,090 --> 00:31:13,026 using IP-based authentication. 765 00:31:13,026 --> 00:31:14,905 Here it's not so bad. 766 00:31:14,905 --> 00:31:16,775 Worst case, you'll send some piece of spam 767 00:31:16,775 --> 00:31:17,775 through the mail server.
768 00:31:17,775 --> 00:31:19,890 So that's probably why they're still using it, 769 00:31:19,890 --> 00:31:23,710 whereas things that allow you to log into an arbitrary account 770 00:31:23,710 --> 00:31:27,280 stopped using IP-based authentication. 771 00:31:27,280 --> 00:31:29,820 So does this make sense, why this is a bad plan? 772 00:31:29,820 --> 00:31:33,500 And just to double check, suppose that some server 773 00:31:33,500 --> 00:31:34,554 was using rlogin. 774 00:31:34,554 --> 00:31:35,845 What would you do to attack it? 775 00:31:35,845 --> 00:31:39,640 What bad thing would happen? 776 00:31:39,640 --> 00:31:41,590 Suggestions? 777 00:31:41,590 --> 00:31:42,090 Yeah. 778 00:31:42,090 --> 00:31:44,026 AUDIENCE: Just getting into your computer, 779 00:31:44,026 --> 00:31:46,482 and then make a user that you want to log into, 780 00:31:46,482 --> 00:31:47,898 and then you get into the network. 781 00:31:47,898 --> 00:31:50,450 PROFESSOR: Yeah, so basically you get your computer. 782 00:31:50,450 --> 00:31:53,955 You synthesize this data to look like a legitimate set of rlogin 783 00:31:53,955 --> 00:31:56,170 commands that say, log in as this user 784 00:31:56,170 --> 00:31:58,980 and run this command in my Unix shell there. 785 00:31:58,980 --> 00:32:01,780 You sort of synthesize this data and you mount this whole attack 786 00:32:01,780 --> 00:32:04,295 and send this data as if a legitimate user was interacting 787 00:32:04,295 --> 00:32:09,275 with an rlogin client, and then you're good to go. 788 00:32:09,275 --> 00:32:11,280 OK, so this is one reason why you probably 789 00:32:11,280 --> 00:32:15,560 don't want your TCP sequence numbers to be so guessable. 790 00:32:15,560 --> 00:32:17,340 Another problem is these reset attacks. 791 00:32:17,340 --> 00:32:23,120 So much like we were able to send a SYN packet, 792 00:32:23,120 --> 00:32:25,222 if you know someone's sequence number, 793 00:32:25,222 --> 00:32:26,680 you could also send a reset packet. 
794 00:32:26,680 --> 00:32:27,810 We sort of briefly talked about it 795 00:32:27,810 --> 00:32:29,750 here as the legitimate client potentially 796 00:32:29,750 --> 00:32:32,975 sending a reset to reset the fake connection 797 00:32:32,975 --> 00:32:35,200 that the attacker is establishing. 798 00:32:35,200 --> 00:32:36,960 But in a similar vein, the adversary 799 00:32:36,960 --> 00:32:40,180 could try to send reset packets for an existing connection 800 00:32:40,180 --> 00:32:42,400 if there's some way that the adversary knows 801 00:32:42,400 --> 00:32:46,060 what your sequence number is on that connection. 802 00:32:46,060 --> 00:32:48,850 So this is actually not clear if this is such a big problem 803 00:32:48,850 --> 00:32:49,750 or not. 804 00:32:49,750 --> 00:32:51,240 At some level, maybe you should be 805 00:32:51,240 --> 00:32:52,490 assuming that all your TCP connections could 806 00:32:52,490 --> 00:32:53,450 be broken at any time anyway. 807 00:32:53,450 --> 00:32:55,210 It's not like the network is reliable. 808 00:32:55,210 --> 00:32:57,920 So maybe you should be expecting your connections to drop. 809 00:32:57,920 --> 00:32:59,710 But one place where this turned out 810 00:32:59,710 --> 00:33:03,000 to be particularly not a good assumption to make 811 00:33:03,000 --> 00:33:06,060 is in the case of routers talking to one another. 812 00:33:06,060 --> 00:33:08,340 So if you have multiple routers that 813 00:33:08,340 --> 00:33:10,505 speak some routing protocol, then they're 814 00:33:10,505 --> 00:33:13,590 connected, of course, by some physical links. 815 00:33:13,590 --> 00:33:16,480 But over some physical links, they actually 816 00:33:16,480 --> 00:33:18,000 speak some network protocol. 817 00:33:18,000 --> 00:33:19,671 And that network protocol runs over TCP.
818 00:33:19,671 --> 00:33:21,170 So there's actually some TCP session 819 00:33:21,170 --> 00:33:22,878 running over each of these physical links 820 00:33:22,878 --> 00:33:26,672 that the routers use to exchange routing information. 821 00:33:26,672 --> 00:33:28,630 So this is certainly the case for this protocol 822 00:33:28,630 --> 00:33:32,250 called BGP we'll talk about a bit more in a second. 823 00:33:32,250 --> 00:33:36,050 And BGP uses the fact that the TCP connection 824 00:33:36,050 --> 00:33:39,580 is alive to also infer that the link is alive. 825 00:33:39,580 --> 00:33:41,750 So if the TCP connection breaks, then the routers 826 00:33:41,750 --> 00:33:43,090 assume the link broke. 827 00:33:43,090 --> 00:33:46,350 And they recompute all their routing tables. 828 00:33:46,350 --> 00:33:47,875 So if an adversary wants to mount 829 00:33:47,875 --> 00:33:49,819 some sort of a denial of service attack here, 830 00:33:49,819 --> 00:33:51,319 they could try to guess the sequence 831 00:33:51,319 --> 00:33:54,520 numbers of these routers and reset these sessions. 832 00:33:54,520 --> 00:33:57,876 So if the TCP session between two routers goes down, 833 00:33:57,876 --> 00:33:59,750 both routers are like, oh, this link is dead. 834 00:33:59,750 --> 00:34:01,583 We have to recompute all the routing tables, 835 00:34:01,583 --> 00:34:02,510 and the routes change. 836 00:34:02,510 --> 00:34:05,235 And then you might shoot down another link, and so on. 837 00:34:05,235 --> 00:34:07,300 So this is a bit of a worrisome attack, 838 00:34:07,300 --> 00:34:12,489 not because it violates someone's secrecy, et cetera, 839 00:34:12,489 --> 00:34:15,510 or at least not directly, but more because it really 840 00:34:15,510 --> 00:34:19,310 causes a lot of availability problems 841 00:34:19,310 --> 00:34:21,079 for other users in the system. 842 00:34:21,079 --> 00:34:21,734 Yeah. 
843 00:34:21,734 --> 00:34:23,960 AUDIENCE: So if you're an attacker, 844 00:34:23,960 --> 00:34:26,170 and you wanted to target one particular user, 845 00:34:26,170 --> 00:34:29,740 could you just keep sending connection requests 846 00:34:29,740 --> 00:34:32,594 to a server on behalf of his IP and make 847 00:34:32,594 --> 00:34:33,969 him keep dropping his connections 848 00:34:33,969 --> 00:34:38,679 to the servers and so you just [INAUDIBLE]? 849 00:34:38,679 --> 00:34:41,540 PROFESSOR: Well, so it requires you guessing. 850 00:34:41,540 --> 00:34:43,460 So you're saying, suppose I'm using Gmail, 851 00:34:43,460 --> 00:34:45,835 and you want to stop me from learning something in Gmail, 852 00:34:45,835 --> 00:34:48,292 so just send packets to my machine 853 00:34:48,292 --> 00:34:49,530 pretending to be from Gmail. 854 00:34:49,530 --> 00:34:51,980 Well, you have to guess the right source and destination 855 00:34:51,980 --> 00:34:52,787 port numbers. 856 00:34:52,787 --> 00:34:54,620 The destination port number is probably 443, 857 00:34:54,620 --> 00:34:55,912 because I'm using HTTPS. 858 00:34:55,912 --> 00:34:57,370 But the source port number is going 859 00:34:57,370 --> 00:34:59,390 to be some random 16-bit thing. 860 00:34:59,390 --> 00:35:02,040 And it's also going to be the case that probably the sequence 861 00:35:02,040 --> 00:35:03,070 numbers are going to be different. 862 00:35:03,070 --> 00:35:04,903 So unless you guess a sequence number that's 863 00:35:04,903 --> 00:35:09,650 within my TCP window, which is in order of probably 864 00:35:09,650 --> 00:35:11,400 tens of kilobytes, you're also going 865 00:35:11,400 --> 00:35:13,280 to be not successful in that regard. 866 00:35:13,280 --> 00:35:17,350 So you have to guess a fair amount of stuff. 867 00:35:17,350 --> 00:35:19,168 There's no sort of oracle access. 
868 00:35:19,168 --> 00:35:21,293 You can't just query the server and say, well, what 869 00:35:21,293 --> 00:35:23,130 is that guy's sequence number? 870 00:35:23,130 --> 00:35:27,890 So that's the reason why that doesn't work out as well. 871 00:35:27,890 --> 00:35:30,040 So again, many of these issues were 872 00:35:30,040 --> 00:35:31,880 fixed, including this RST-based thing, 873 00:35:31,880 --> 00:35:33,065 especially for BGP routers. 874 00:35:35,890 --> 00:35:38,460 There were actually two sort of amusing fixes. 875 00:35:38,460 --> 00:35:41,389 One really shows you how you can carefully 876 00:35:41,389 --> 00:35:43,430 exploit existing things or take advantage of them 877 00:35:43,430 --> 00:35:45,670 to fix particular problems. 878 00:35:45,670 --> 00:35:47,747 Here, the insight is that these routers only 879 00:35:47,747 --> 00:35:49,497 want to talk to each other, not to someone 880 00:35:49,497 --> 00:35:50,980 else over the network. 881 00:35:50,980 --> 00:35:52,990 And as a result, if the packet is 882 00:35:52,990 --> 00:35:55,817 coming not from the immediate router next across the link, 883 00:35:55,817 --> 00:35:58,442 but from someone else, I want to drop this packet altogether. 884 00:35:58,442 --> 00:36:01,730 And what the designers of these routing protocols realized 885 00:36:01,730 --> 00:36:04,370 is that there's this wonderful field in a packet called time 886 00:36:04,370 --> 00:36:05,390 to live. 887 00:36:05,390 --> 00:36:08,665 It's an 8-bit field that gets decremented by every router 888 00:36:08,665 --> 00:36:11,840 to make sure that packets don't go into an infinite loop. 889 00:36:11,840 --> 00:36:15,180 So the highest this TTL value could ever be is 255. 890 00:36:15,180 --> 00:36:17,630 And then it'll get decremented from there.
891 00:36:17,630 --> 00:36:19,165 So what these routing protocols do-- 892 00:36:19,165 --> 00:36:23,000 it's sort of a clever hack-- is they 893 00:36:23,000 --> 00:36:27,220 reject any packet with a TTL value that's not 255. 894 00:36:27,220 --> 00:36:29,660 Because if a packet has a value of 255, 895 00:36:29,660 --> 00:36:31,430 it must have come from the router 896 00:36:31,430 --> 00:36:33,530 just on the other side of this link. 897 00:36:33,530 --> 00:36:35,630 And if an adversary tries to inject any packet 898 00:36:35,630 --> 00:36:37,900 to tamper with this existing BGP connection, 899 00:36:37,900 --> 00:36:39,852 it'll have a TTL value less than 255, 900 00:36:39,852 --> 00:36:41,935 because it'll be decremented by some other routers 901 00:36:41,935 --> 00:36:44,520 along the path, including this one. 902 00:36:44,520 --> 00:36:47,766 And then it'll just get rejected by the recipient. 903 00:36:47,766 --> 00:36:50,762 So this is one example of a clever combination 904 00:36:50,762 --> 00:36:52,470 of techniques that's backwards compatible 905 00:36:52,470 --> 00:36:54,658 and solves this very specific problem. 906 00:36:54,658 --> 00:36:55,158 Yeah. 907 00:36:55,158 --> 00:36:56,866 AUDIENCE: Doesn't the bottom right router 908 00:36:56,866 --> 00:36:58,616 also send something with a TTL of 255? 909 00:36:58,616 --> 00:37:00,670 PROFESSOR: Yeah, so these routers are actually-- 910 00:37:00,670 --> 00:37:01,830 this is a physical router. 911 00:37:01,830 --> 00:37:03,780 And it knows these are separate links. 912 00:37:03,780 --> 00:37:07,130 So it looks at the TTL and which link it came on. 913 00:37:07,130 --> 00:37:09,150 So if a packet came in on this link, 914 00:37:09,150 --> 00:37:12,840 it will not accept it for this TCP connection. 915 00:37:12,840 --> 00:37:13,896 But you're right. 916 00:37:13,896 --> 00:37:17,630 For the most part, these routers trust 917 00:37:17,630 --> 00:37:19,450 their immediate neighbors.
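The TTL check just described can be sketched in a few lines of Python. This is only an illustration of the idea, not any router's actual code: the function names and the link labels ("eth0", "eth1") are made up for the example, and it assumes the sender always sets the TTL to 255 and that the count of forwarding routers in between is the only thing that lowers it.

```python
MAX_TTL = 255  # largest value the 8-bit TTL field can hold


def ttl_on_arrival(initial_ttl: int, routers_crossed: int) -> int:
    """TTL seen by the recipient: every forwarding router subtracts one."""
    return max(initial_ttl - routers_crossed, 0)


def accept_bgp_packet(ttl: int, arrival_link: str, peer_link: str) -> bool:
    """Accept a packet for the BGP session only if it still has TTL 255
    and it arrived on the physical link where that peer actually lives."""
    return ttl == MAX_TTL and arrival_link == peer_link


# A direct neighbor crosses no forwarding routers, so the TTL is intact.
print(accept_bgp_packet(ttl_on_arrival(255, 0), "eth0", "eth0"))
# A remote attacker's packet has been forwarded at least once.
print(accept_bgp_packet(ttl_on_arrival(255, 3), "eth0", "eth0"))
# A different neighbor can send TTL 255, but it shows up on another link.
print(accept_bgp_packet(255, "eth1", "eth0"))
```

The third case is the one the audience question is about: the check has to combine the TTL with the arrival link, since any directly attached neighbor can produce a TTL of 255.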
918 00:37:19,450 --> 00:37:20,950 It need not necessarily be the case. 919 00:37:20,950 --> 00:37:22,016 But if you keep seeing this problem, 920 00:37:22,016 --> 00:37:23,740 and you know you've implemented this hack, 921 00:37:23,740 --> 00:37:24,720 then it must be one of your neighbors. 922 00:37:24,720 --> 00:37:25,594 You're going to look, 923 00:37:25,594 --> 00:37:27,162 tcpdump these interfaces. 924 00:37:27,162 --> 00:37:29,210 Why are you sending me these reset packets? 925 00:37:29,210 --> 00:37:31,120 This problem is not as big. 926 00:37:31,120 --> 00:37:34,736 You can manage it by some out-of-band mechanism. 927 00:37:34,736 --> 00:37:36,005 Make sense? 928 00:37:36,005 --> 00:37:38,450 All right, there are other fixes for BGP 929 00:37:38,450 --> 00:37:41,550 where they implemented some form of header authentication, 930 00:37:41,550 --> 00:37:43,480 MD5 header authentication as well. 931 00:37:43,480 --> 00:37:46,220 But they're really targeting this particular application 932 00:37:46,220 --> 00:37:48,340 where this reset attack is particularly bad. 933 00:37:48,340 --> 00:37:49,975 This is still a problem today. 934 00:37:49,975 --> 00:37:51,975 If there's some long-lived connection 935 00:37:51,975 --> 00:37:53,766 out there that I really want to shoot down, 936 00:37:53,766 --> 00:37:58,480 I just have to send some large number of RST packets, 937 00:37:58,480 --> 00:38:00,730 probably on the order of hundreds of thousands 938 00:38:00,730 --> 00:38:04,770 or so, but probably not exactly 4 billion. 939 00:38:04,770 --> 00:38:07,930 Because the servers are actually somewhat 940 00:38:07,930 --> 00:38:10,520 lax in terms of which sequence number they accept for a reset. 941 00:38:10,520 --> 00:38:13,360 It can be any packet within a certain window.
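As a rough back-of-the-envelope check on "hundreds of thousands, not 4 billion", here is a small Python sketch. It assumes the attacker already knows both IP addresses and both port numbers, that a reset is accepted anywhere inside the receive window, and that the window is 16 KB; the window size and the function name are assumptions for illustration, not figures from the lecture.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits


def rst_guesses_needed(window_bytes: int) -> int:
    """Evenly spaced guesses needed so that one is sure to land in the window."""
    return -(-SEQ_SPACE // window_bytes)  # ceiling division


print(rst_guesses_needed(16 * 1024))  # 262144 guesses for a 16 KB window
print(rst_guesses_needed(1))          # 4294967296 if only an exact match counted
```

The larger the window the receiver tolerates, the fewer packets the attacker has to send, which is why the cost is far below the full 32-bit space.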
942 00:38:13,360 --> 00:38:16,730 And in that case, I could probably, or any attacker, 943 00:38:16,730 --> 00:38:19,500 reset an existing connection with a modest 944 00:38:19,500 --> 00:38:21,262 but not a huge amount of effort. 945 00:38:21,262 --> 00:38:22,738 That's still a problem. 946 00:38:22,738 --> 00:38:25,460 And people haven't really found any great solution for that. 947 00:38:28,640 --> 00:38:31,576 All right, and I guess the sort of last bad thing that 948 00:38:31,576 --> 00:38:33,700 happens because these sequence numbers are somewhat 949 00:38:33,700 --> 00:38:36,331 predictable is just data injection 950 00:38:36,331 --> 00:38:39,280 into existing connections. 951 00:38:39,280 --> 00:38:43,550 So suppose there is some protocol like rlogin, 952 00:38:43,550 --> 00:38:47,650 but maybe rlogin doesn't-- suppose we have some 953 00:38:47,650 --> 00:38:49,710 hypothetical protocol that's kind of like rlogin, 954 00:38:49,710 --> 00:38:51,990 but actually it doesn't do IP-based authentication. 955 00:38:51,990 --> 00:38:53,364 You have to type in your password 956 00:38:53,364 --> 00:38:55,392 to log in, all this great stuff. 957 00:38:55,392 --> 00:38:57,350 The problem is once you've typed your password, 958 00:38:57,350 --> 00:38:59,225 maybe your TCP connection is just established 959 00:38:59,225 --> 00:39:01,060 and can accept arbitrary data. 960 00:39:01,060 --> 00:39:03,392 So wait for one of you guys to log into a machine, type 961 00:39:03,392 --> 00:39:04,100 in your password. 962 00:39:04,100 --> 00:39:05,235 I don't know what that password is. 963 00:39:05,235 --> 00:39:07,026 But once you've established TCP connection, 964 00:39:07,026 --> 00:39:09,120 I'll just try to guess your sequence number 965 00:39:09,120 --> 00:39:11,332 and inject some data into your existing connection. 
966 00:39:11,332 --> 00:39:13,415 So if I can guess your sequence numbers correctly, 967 00:39:13,415 --> 00:39:16,005 then this allows me to make it pretend 968 00:39:16,005 --> 00:39:18,255 like you've typed some command after you authenticated 969 00:39:18,255 --> 00:39:19,977 correctly with your password. 970 00:39:19,977 --> 00:39:21,810 So this all sort of suggests that you really 971 00:39:21,810 --> 00:39:28,260 don't want to rely on these 32-bit sequence numbers 972 00:39:28,260 --> 00:39:30,430 for providing security. 973 00:39:30,430 --> 00:39:33,660 But let's actually see what modern TCP stacks actually 974 00:39:33,660 --> 00:39:35,650 do to try to mitigate this problem. 975 00:39:35,650 --> 00:39:37,650 So as we were sort of discussing, 976 00:39:37,650 --> 00:39:41,170 I guess one approach that we'll look at in the next two 977 00:39:41,170 --> 00:39:44,780 lectures is how to implement some security 978 00:39:44,780 --> 00:39:45,910 at the application level. 979 00:39:45,910 --> 00:39:50,160 So we'll use cryptography to authenticate and encrypt 980 00:39:50,160 --> 00:39:54,460 and sign and verify messages at the application level 981 00:39:54,460 --> 00:39:57,420 without really involving TCP so much. 982 00:39:57,420 --> 00:39:59,606 But there are some existing applications 983 00:39:59,606 --> 00:40:03,950 that would benefit from making this slightly better, 984 00:40:03,950 --> 00:40:07,160 at least not make it so easy to exploit these problems. 985 00:40:07,160 --> 00:40:09,360 And the way that I guess people do 986 00:40:09,360 --> 00:40:13,200 this in practice today-- for example Linux and Windows-- 987 00:40:13,200 --> 00:40:17,360 is they implement the suggestion that John gave earlier, 988 00:40:17,360 --> 00:40:20,920 that we maintain different initial sequence 989 00:40:20,920 --> 00:40:22,820 numbers for every source destination pair. 
990 00:40:22,820 --> 00:40:29,110 So what most TCP stack implementations do 991 00:40:29,110 --> 00:40:33,200 is they still compute this initial sequence number 992 00:40:33,200 --> 00:40:34,590 as we were computing before. 993 00:40:34,590 --> 00:40:39,600 So this is the old style ISN, let's say. 994 00:40:39,600 --> 00:40:42,000 And in order to actually generate 995 00:40:42,000 --> 00:40:44,650 the actual ISN for any particular connection, 996 00:40:44,650 --> 00:40:48,130 we're going to add a random 32-bit offset. 997 00:40:48,130 --> 00:40:51,005 So we're going to include some sort of a function. 998 00:40:51,005 --> 00:40:54,010 Think of it as a hash function like SHA-1 999 00:40:54,010 --> 00:40:56,295 or something maybe better. 1000 00:40:56,295 --> 00:40:59,190 And this is going to be a function of the source 1001 00:40:59,190 --> 00:41:05,140 IP, the source port number, the destination IP 1002 00:41:05,140 --> 00:41:10,820 address, destination port, and some sort of a secret key 1003 00:41:10,820 --> 00:41:14,900 that only the server knows in this case. 1004 00:41:14,900 --> 00:41:17,290 So this has the nice property that 1005 00:41:17,290 --> 00:41:19,350 within any particular connection, 1006 00:41:19,350 --> 00:41:24,147 as identified by a source and destination IP port pair, 1007 00:41:24,147 --> 00:41:25,980 it still preserves all these nice properties 1008 00:41:25,980 --> 00:41:30,000 that this old style sequence number algorithm had. 1009 00:41:30,000 --> 00:41:34,780 But if you have connections from different source/destination 1010 00:41:34,780 --> 00:41:37,230 tuples, then there's nothing you can 1011 00:41:37,230 --> 00:41:41,600 learn about the exact value of another connection tuple's 1012 00:41:41,600 --> 00:41:42,780 sequence number. 1013 00:41:42,780 --> 00:41:45,450 And in fact, you'll have to guess this key in order 1014 00:41:45,450 --> 00:41:47,680 to infer that value.
1015 00:41:47,680 --> 00:41:50,580 And hopefully the server, presumably the OS kernel, 1016 00:41:50,580 --> 00:41:52,290 stores this key somewhere in its memory 1017 00:41:52,290 --> 00:41:54,670 and doesn't give it out to anyone else. 1018 00:41:54,670 --> 00:41:56,420 So this is how pretty much most TCP stacks 1019 00:41:56,420 --> 00:41:58,365 deal with this particular problem 1020 00:41:58,365 --> 00:42:02,896 today to the extent allowed by the total 32-bit sequence 1021 00:42:02,896 --> 00:42:03,395 number. 1022 00:42:03,395 --> 00:42:04,050 It's not great, but sort of works. 1023 00:42:04,050 --> 00:42:04,543 Yeah. 1024 00:42:04,543 --> 00:42:06,126 AUDIENCE: Could you repeat that again? 1025 00:42:06,126 --> 00:42:09,480 Is the key unique to-- 1026 00:42:09,480 --> 00:42:11,380 PROFESSOR: So when my machine boots up, 1027 00:42:11,380 --> 00:42:13,780 or when any machine boots up, it generates a random key. 1028 00:42:13,780 --> 00:42:16,530 Every time you reboot it it generates a new key. 1029 00:42:16,530 --> 00:42:20,565 And this means that every time that 1030 00:42:20,565 --> 00:42:24,680 for a particular source/destination pair, 1031 00:42:24,680 --> 00:42:26,830 the sequence numbers advance at the same rate as 1032 00:42:26,830 --> 00:42:27,820 controlled by this. 1033 00:42:27,820 --> 00:42:29,444 So for a given source/destination pair, 1034 00:42:29,444 --> 00:42:30,850 this thing is fixed. 1035 00:42:30,850 --> 00:42:32,889 So you observe your sequence numbers 1036 00:42:32,889 --> 00:42:34,680 evolving according to your initial sequence 1037 00:42:34,680 --> 00:42:36,596 numbers for new connections evolving according 1038 00:42:36,596 --> 00:42:39,480 to a particular algorithm. 1039 00:42:39,480 --> 00:42:43,915 So that still provides all these defences against old packets 1040 00:42:43,915 --> 00:42:47,120 from previous connections being injected into new connections, 1041 00:42:47,120 --> 00:42:50,430 just like packet reordering problems. 
1042 00:42:50,430 --> 00:42:51,630 So that still works. 1043 00:42:51,630 --> 00:42:53,490 And that's the only real thing for which 1044 00:42:53,490 --> 00:42:56,030 we needed this sequence number choosing algorithms 1045 00:42:56,030 --> 00:42:58,800 to prevent these duplicate packets from causing problems. 1046 00:42:58,800 --> 00:43:01,660 However, the thing that we were exploiting before, 1047 00:43:01,660 --> 00:43:04,095 which is that if you get the sequence 1048 00:43:04,095 --> 00:43:08,130 number for one connection from A to S, then 1049 00:43:08,130 --> 00:43:10,715 from that you can infer the sequence number 1050 00:43:10,715 --> 00:43:12,360 for a different connection. 1051 00:43:12,360 --> 00:43:13,080 That's now gone. 1052 00:43:13,080 --> 00:43:14,750 Because every connection has a different 1053 00:43:14,750 --> 00:43:19,790 offset in this 32-bit space as implemented by its F function. 1054 00:43:19,790 --> 00:43:25,085 So this completely decouples the initial sequence numbers 1055 00:43:25,085 --> 00:43:27,611 seen by every connection. 1056 00:43:27,611 --> 00:43:28,110 Yeah. 1057 00:43:28,110 --> 00:43:31,300 AUDIENCE: What's the point in including the key? 1058 00:43:31,300 --> 00:43:33,300 PROFESSOR: Well, if you don't include the key, 1059 00:43:33,300 --> 00:43:35,050 then I can connect to you. 1060 00:43:35,050 --> 00:43:37,487 I'll compute the same function F. I'll subtract it out. 1061 00:43:37,487 --> 00:43:38,070 I'll get this. 1062 00:43:38,070 --> 00:43:40,170 I'll compute this function F for the connection I actually 1063 00:43:40,170 --> 00:43:40,910 want to fake. 1064 00:43:40,910 --> 00:43:42,615 And I'll guess what the initial sequence number for that one 1065 00:43:42,615 --> 00:43:43,360 is going to be. 
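The scheme discussed above, old style ISN plus F(source IP, source port, destination IP, destination port, key), can be sketched as follows. This is a minimal sketch under stated assumptions, not what any particular kernel does: HMAC-SHA256 is chosen here as the function F (the lecture only says "a hash function like SHA-1 or something maybe better"), and the names BOOT_KEY, connection_offset, and initial_sequence_number are made up for the example.

```python
import hashlib
import hmac
import os

SEQ_SPACE = 2 ** 32

# Hypothetical stand-in for the secret key generated once at boot.
BOOT_KEY = os.urandom(16)


def connection_offset(src_ip, src_port, dst_ip, dst_port, key=BOOT_KEY):
    """F(src IP, src port, dst IP, dst port, key), truncated to 32 bits."""
    four_tuple = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = hmac.new(key, four_tuple, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def initial_sequence_number(old_style_isn, src_ip, src_port,
                            dst_ip, dst_port, key=BOOT_KEY):
    """Old style ISN plus the per-connection offset, wrapped to 32 bits."""
    offset = connection_offset(src_ip, src_port, dst_ip, dst_port, key)
    return (old_style_isn + offset) % SEQ_SPACE


# Same four-tuple: the ISN still advances exactly as the old style ISN does.
first = initial_sequence_number(1000, "10.0.0.1", 40000, "10.0.0.2", 513)
later = initial_sequence_number(6000, "10.0.0.1", 40000, "10.0.0.2", 513)
print((later - first) % SEQ_SPACE)  # 5000
```

For a fixed four-tuple the offset is a constant, so the duplicate-packet protection is unchanged. An attacker who observes the ISN on their own connection learns only the old style ISN plus their own offset, and without the key they cannot compute the offset for the tuple they want to spoof.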
1066 00:43:43,360 --> 00:43:46,230 AUDIENCE: So can you-- because machines now restart 1067 00:43:46,230 --> 00:43:50,630 infrequently, can you still [INAUDIBLE] by reversing-- 1068 00:43:50,630 --> 00:43:53,569 PROFESSOR: I think typically this function F 1069 00:43:53,569 --> 00:43:55,610 is something like a cryptographically secure hash 1070 00:43:55,610 --> 00:44:01,557 function, which has a semi-proved property that it's 1071 00:44:01,557 --> 00:44:03,140 very difficult. It's cryptographically 1072 00:44:03,140 --> 00:44:04,480 hard to invert it. 1073 00:44:04,480 --> 00:44:07,160 So even if you were given the literal inputs and outputs 1074 00:44:07,160 --> 00:44:14,059 of this hash function except for this key part, 1075 00:44:14,059 --> 00:44:15,517 it would be very hard for you to guess 1076 00:44:15,517 --> 00:44:17,490 what this key is cryptographically, 1077 00:44:17,490 --> 00:44:19,324 even in an isolated setting. 1078 00:44:19,324 --> 00:44:21,740 So hopefully this will be at least as hard in this setting 1079 00:44:21,740 --> 00:44:24,324 as well. 1080 00:44:24,324 --> 00:44:26,615 We'll talk a little bit more about what these functions 1081 00:44:26,615 --> 00:44:30,460 F are a bit later on and how to use them correctly. 1082 00:44:30,460 --> 00:44:31,650 Make sense? 1083 00:44:31,650 --> 00:44:37,140 Other questions of this problem and solution? 1084 00:44:37,140 --> 00:44:41,252 All right, so in fact, this was mostly 1085 00:44:41,252 --> 00:44:44,255 sort of an example of these TCP sequence number attacks 1086 00:44:44,255 --> 00:44:46,790 that aren't as relevant anymore. 1087 00:44:46,790 --> 00:44:49,440 Because every operating system basically implements this plan 1088 00:44:49,440 --> 00:44:50,080 these days. 1089 00:44:50,080 --> 00:44:52,425 So it's hard to infer what someone's sequence 1090 00:44:52,425 --> 00:44:53,577 number is going to be. 1091 00:44:53,577 --> 00:44:55,910 On the other hand, people keep making the same mistakes.
1092 00:44:55,910 --> 00:44:59,755 So even after this was implemented for TCP, 1093 00:44:59,755 --> 00:45:01,490 there was this other protocol called 1094 00:45:01,490 --> 00:45:05,830 DNS that is hugely vulnerable to similar attacks. 1095 00:45:05,830 --> 00:45:10,060 And the reason is that DNS actually runs over UDP. 1096 00:45:10,060 --> 00:45:13,100 So UDP is a stateless protocol where you actually 1097 00:45:13,100 --> 00:45:16,340 don't do any connection establishment where 1098 00:45:16,340 --> 00:45:18,140 you exchange sequence numbers. 1099 00:45:18,140 --> 00:45:20,040 In UDP, you simply send a request 1100 00:45:20,040 --> 00:45:22,140 from your source address to the server. 1101 00:45:22,140 --> 00:45:24,950 And the server figures out what the reply should be and sends 1102 00:45:24,950 --> 00:45:28,780 it back to whatever source address appeared in the packet. 1103 00:45:28,780 --> 00:45:32,850 So it's a single round trip, so there's 1104 00:45:32,850 --> 00:45:34,350 no time to exchange sequence numbers 1105 00:45:34,350 --> 00:45:36,349 and to establish that, oh, yeah, you're actually 1106 00:45:36,349 --> 00:45:38,140 talking to the right guy. 1107 00:45:38,140 --> 00:45:44,330 So with DNS, as a result, for a while, 1108 00:45:44,330 --> 00:45:48,840 it was quite easy to fake responses from a DNS server. 1109 00:45:48,840 --> 00:45:51,464 So how would a query look like in DNS, typically? 1110 00:45:51,464 --> 00:45:53,130 Well, you send some queries-- so suppose 1111 00:45:53,130 --> 00:45:57,230 a client sends a packet from client to some DNS server 1112 00:45:57,230 --> 00:46:00,210 that knows the DNS server's IP address ahead of time, 1113 00:46:00,210 --> 00:46:04,260 maybe preconfigured somewhere, say, well, here's my query. 1114 00:46:04,260 --> 00:46:07,336 Maybe I'm looking for mit.edu. 1115 00:46:07,336 --> 00:46:09,860 And that's basically it. 
1116 00:46:09,860 --> 00:46:12,420 And the server's destination port number 1117 00:46:12,420 --> 00:46:14,600 is always 53 for DNS. 1118 00:46:14,600 --> 00:46:17,380 And the clients used to also run on the same port number 1119 00:46:17,380 --> 00:46:20,720 for ease of use or something. 1120 00:46:20,720 --> 00:46:23,003 So you send this packet from the client on this port 1121 00:46:23,003 --> 00:46:24,320 to the server on this port. 1122 00:46:24,320 --> 00:46:25,200 Here's the query. 1123 00:46:25,200 --> 00:46:30,460 And the server eventually sends back a reply saying, 1124 00:46:30,460 --> 00:46:38,580 mit.edu has a particular IP address, 18.9 dot something. 1125 00:46:38,580 --> 00:46:41,860 The problem is that some adversary could easily 1126 00:46:41,860 --> 00:46:43,615 send a similar response packet pretending 1127 00:46:43,615 --> 00:46:45,408 to be from the server. 1128 00:46:45,408 --> 00:46:47,366 And there's not a whole lot of randomness here. 1129 00:46:47,366 --> 00:46:50,180 So if I know that you're trying to connect to mit.edu, 1130 00:46:50,180 --> 00:46:53,126 I'll just send a lot of packets like this to your machine. 1131 00:46:53,126 --> 00:46:55,334 I know exactly what DNS server you're going to query. 1132 00:46:55,334 --> 00:46:57,400 I know exactly what your IP address is. 1133 00:46:57,400 --> 00:46:58,703 I know the port numbers. 1134 00:46:58,703 --> 00:47:00,036 I know what you're querying for. 1135 00:47:00,036 --> 00:47:02,530 I can just supply my own IP address here. 1136 00:47:02,530 --> 00:47:06,310 And if my packet gets there after you send this 1137 00:47:06,310 --> 00:47:08,630 but before you get the real response, 1138 00:47:08,630 --> 00:47:11,930 your client machine is going to use my packet. 
1139 00:47:11,930 --> 00:47:14,585 So this is another example where insufficient randomness 1140 00:47:14,585 --> 00:47:17,470 in this protocol makes it very easy to inject responses 1141 00:47:17,470 --> 00:47:20,027 or inject packets in general. 1142 00:47:20,027 --> 00:47:21,860 And this is actually in some ways even worse 1143 00:47:21,860 --> 00:47:23,113 than the previous attack. 1144 00:47:23,113 --> 00:47:25,220 Because here you could convince a client 1145 00:47:25,220 --> 00:47:28,150 to connect to another IP address altogether. 1146 00:47:28,150 --> 00:47:29,865 And it'll probably cache this result, 1147 00:47:29,865 --> 00:47:31,246 because DNS involves caching. 1148 00:47:31,246 --> 00:47:32,787 Maybe you can supply a very long time 1149 00:47:32,787 --> 00:47:36,550 to live in this response saying, this is valid for years. 1150 00:47:36,550 --> 00:47:38,620 And then your client, again till it reboots, 1151 00:47:38,620 --> 00:47:41,740 is going to keep using this IP address for mit.edu. 1152 00:47:41,740 --> 00:47:42,240 Yeah. 1153 00:47:42,240 --> 00:47:44,980 AUDIENCE: Could you fix this by having the client include 1154 00:47:44,980 --> 00:47:48,130 some random value in the query, and the server echo it back 1155 00:47:48,130 --> 00:47:48,630 exactly? 1156 00:47:48,630 --> 00:47:50,755 PROFESSOR: That's right, yeah, so this is typically 1157 00:47:50,755 --> 00:47:52,050 what people have done now. 1158 00:47:52,050 --> 00:47:55,167 The problem, as we were sort of talking about earlier, 1159 00:47:55,167 --> 00:47:56,250 is backward compatibility. 1160 00:47:56,250 --> 00:47:58,870 It's very hard to change the DNS server software 1161 00:47:58,870 --> 00:47:59,760 that everyone runs. 1162 00:47:59,760 --> 00:48:01,260 So you basically have to figure out, 1163 00:48:01,260 --> 00:48:02,700 where can you inject randomness? 1164 00:48:02,700 --> 00:48:04,450 And people have figured out two places. 1165 00:48:04,450 --> 00:48:05,860 It's not great.
1166 00:48:05,860 --> 00:48:08,340 But basically there's a source port number, 1167 00:48:08,340 --> 00:48:11,050 which is 16 bits of randomness. 1168 00:48:11,050 --> 00:48:13,700 So if you can choose the source port number randomly, 1169 00:48:13,700 --> 00:48:15,140 then you get 16 bits. 1170 00:48:15,140 --> 00:48:19,470 And there's also a query ID inside 1171 00:48:19,470 --> 00:48:22,460 of the packet, which is also 16 bits. 1172 00:48:22,460 --> 00:48:25,030 And the server does echo back the query ID. 1173 00:48:25,030 --> 00:48:27,040 So combining these two things together, 1174 00:48:27,040 --> 00:48:30,570 most resolvers these days get 32 bits of randomness 1175 00:48:30,570 --> 00:48:31,940 out of this protocol. 1176 00:48:31,940 --> 00:48:36,620 And it, again, makes it noticeably harder, but still 1177 00:48:36,620 --> 00:48:40,854 not cryptographically perfect, to fake this kind of response 1178 00:48:40,854 --> 00:48:44,340 and have it be accepted by the client. 1179 00:48:44,340 --> 00:48:47,450 But these problems keep coming up, unfortunately. 1180 00:48:47,450 --> 00:48:51,990 So even though it was well understood for TCP, 1181 00:48:51,990 --> 00:48:55,200 some people I guess suggested that this might be a problem. 1182 00:48:55,200 --> 00:48:59,350 But it wasn't actually fixed until only a few years ago. 1183 00:49:01,970 --> 00:49:04,220 Make sense? 1184 00:49:04,220 --> 00:49:06,080 All right, so I guess maybe as an aside, 1185 00:49:06,080 --> 00:49:08,910 there are solutions to this DNS problem 1186 00:49:08,910 --> 00:49:11,890 as well by enforcing security for DNS 1187 00:49:11,890 --> 00:49:13,270 at the application level. 1188 00:49:13,270 --> 00:49:16,750 So instead of relying on these randomness properties 1189 00:49:16,750 --> 00:49:19,611 of small numbers of bits in the packet, 1190 00:49:19,611 --> 00:49:23,680 you could try to use encryption in the DNS protocols. 
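The two sources of randomness just mentioned, the source port and the query ID, can be sketched from the resolver's side like this. It is a toy model using plain dictionaries instead of real DNS packets, and the field names are made up for the example. One assumption to flag: the sketch picks ports from 1024 to 65535, which is slightly less than a full 16 bits.

```python
import secrets


def new_query(name: str) -> dict:
    """Build a query with a random source port and a random 16-bit query ID."""
    return {
        "name": name,
        "src_port": 1024 + secrets.randbelow(65536 - 1024),
        "query_id": secrets.randbelow(2 ** 16),
    }


def accept_response(query: dict, response: dict) -> bool:
    """Accept a response only if it echoes the port, the ID, and the name."""
    return (
        response["dst_port"] == query["src_port"]
        and response["query_id"] == query["query_id"]
        and response["name"] == query["name"]
    )


query = new_query("mit.edu")
genuine = {"dst_port": query["src_port"], "query_id": query["query_id"],
           "name": "mit.edu", "address": "18.9.x.x"}
forged = dict(genuine, query_id=(query["query_id"] + 1) % 2 ** 16)

print(accept_response(query, genuine))  # True
print(accept_response(query, forged))   # False
```

An off-path attacker now has to hit both values at once before the real answer arrives, roughly one chance in 4 billion per forged packet, which is harder than before but still only a matter of guessing.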
1191 00:49:23,680 --> 00:49:26,040 So protocols like DNSSEC that the paper briefly 1192 00:49:26,040 --> 00:49:28,070 talks about try to do this. 1193 00:49:28,070 --> 00:49:30,770 So instead of relying on any network level security 1194 00:49:30,770 --> 00:49:34,590 properties, they require that all DNS names have signatures 1195 00:49:34,590 --> 00:49:36,290 attached to them. 1196 00:49:36,290 --> 00:49:37,820 That seems like a sensible plan. 1197 00:49:37,820 --> 00:49:39,695 But it turns out that working out the details 1198 00:49:39,695 --> 00:49:42,300 is actually quite difficult. So one example 1199 00:49:42,300 --> 00:49:47,100 of a problem that showed up is name enumeration. 1200 00:49:47,100 --> 00:49:51,130 Because in DNS, you want to get responses. 1201 00:49:51,130 --> 00:49:52,680 Well, this name has that IP address. 1202 00:49:52,680 --> 00:49:54,721 Or you could get a response saying, no, so sorry, 1203 00:49:54,721 --> 00:49:56,310 this name doesn't exist. 1204 00:49:56,310 --> 00:50:00,186 So you want to sign the "it doesn't exist" response as well. 1205 00:50:00,186 --> 00:50:01,560 Because otherwise, an adversary 1206 00:50:01,560 --> 00:50:04,760 could send back a "doesn't exist" response and pretend 1207 00:50:04,760 --> 00:50:07,420 that a name doesn't exist, even though it does. 1208 00:50:07,420 --> 00:50:09,930 So how do you sign responses that certain names 1209 00:50:09,930 --> 00:50:11,951 don't exist ahead of time? 1210 00:50:11,951 --> 00:50:13,450 I guess one possibility is you could 1211 00:50:13,450 --> 00:50:17,950 give your DNS server the key that signs all your records. 1212 00:50:17,950 --> 00:50:19,082 That seems like a bad plan. 1213 00:50:19,082 --> 00:50:21,248 Because then someone who compromises your DNS server 1214 00:50:21,248 --> 00:50:22,680 could walk away with this key.
1215 00:50:22,680 --> 00:50:25,700 So instead, the model that DNSSEC operates under 1216 00:50:25,700 --> 00:50:29,440 is that you sign all your names in your domain ahead of time, 1217 00:50:29,440 --> 00:50:32,315 and you give the signed blob to your DNS server. 1218 00:50:32,315 --> 00:50:34,866 And the DNS server can then respond to any queries. 1219 00:50:34,866 --> 00:50:36,990 But even if it's compromised, there's not much else 1220 00:50:36,990 --> 00:50:37,480 that that attacker can do. 1221 00:50:37,480 --> 00:50:39,600 All these things are signed, and the key 1222 00:50:39,600 --> 00:50:43,340 is not to be found on the DNS server itself. 1223 00:50:43,340 --> 00:50:49,150 So the DNSSEC protocol had this clever mechanism called NSEC 1224 00:50:49,150 --> 00:50:52,230 for signing nonexistent records. 1225 00:50:52,230 --> 00:50:55,390 And the way you would do this is by signing gaps 1226 00:50:55,390 --> 00:50:56,650 in the namespace. 1227 00:50:56,650 --> 00:51:00,490 So an NSEC record might say, well, there's 1228 00:51:00,490 --> 00:51:06,550 a name called foo.mit.edu, and the next name alphabetically 1229 00:51:06,550 --> 00:51:10,492 is maybe goo.mit.edu. 1230 00:51:10,492 --> 00:51:13,680 And there's nothing alphabetical in between these two names. 1231 00:51:13,680 --> 00:51:16,380 So if you query for a name between these two 1232 00:51:16,380 --> 00:51:17,921 names alphabetically sorted, then 1233 00:51:17,921 --> 00:51:20,170 the server could send back this signed message saying, 1234 00:51:20,170 --> 00:51:22,050 oh, there's nothing between these two names. 1235 00:51:22,050 --> 00:51:24,460 You can safely return, doesn't exist. 1236 00:51:24,460 --> 00:51:26,060 But then this allows some attacker 1237 00:51:26,060 --> 00:51:27,768 to completely enumerate your domain names. 1238 00:51:27,768 --> 00:51:31,410 You can just ask for some domain name and find this record 1239 00:51:31,410 --> 00:51:32,710 and say, oh, yeah, great.
1240 00:51:32,710 --> 00:51:34,330 So these two things exist. 1241 00:51:34,330 --> 00:51:36,520 Let me query for gooa.mit.edu. 1242 00:51:36,520 --> 00:51:38,200 That'll give me a response saying, 1243 00:51:38,200 --> 00:51:40,820 what's the next name in your domain, et cetera. 1244 00:51:40,820 --> 00:51:42,290 So it's actually a little bit hard 1245 00:51:42,290 --> 00:51:43,950 to come up with the right protocol 1246 00:51:43,950 --> 00:51:46,790 that both preserves all the nice properties of DNS 1247 00:51:46,790 --> 00:51:50,420 and prevents name enumeration and other problems. 1248 00:51:50,420 --> 00:51:52,020 There's actually a nice thing now 1249 00:51:52,020 --> 00:51:55,950 called NSEC3 that tries to solve this problem partially-- sort 1250 00:51:55,950 --> 00:51:56,875 of works, sort of not. 1251 00:51:56,875 --> 00:51:59,110 We'll see, I guess, what gets it [INAUDIBLE]. 1252 00:51:59,110 --> 00:51:59,880 Yeah. 1253 00:51:59,880 --> 00:52:01,550 AUDIENCE: Is there any kind of signing 1254 00:52:01,550 --> 00:52:03,915 of nonexistent top level domains? 1255 00:52:03,915 --> 00:52:05,540 PROFESSOR: Yeah, I think actually yeah. 1256 00:52:05,540 --> 00:52:07,600 The dot domain is just another domain. 1257 00:52:07,600 --> 00:52:10,250 And they similarly have this mechanism implemented as well. 1258 00:52:10,250 --> 00:52:13,120 So actually dot and dot com now implement DNSSEC, 1259 00:52:13,120 --> 00:52:15,842 and there's all these records there that say, well, 1260 00:52:15,842 --> 00:52:18,540 .in is a domain name that exists, 1261 00:52:18,540 --> 00:52:21,915 and dot something else exists, and there's nothing in between. 1262 00:52:21,915 --> 00:52:23,186 So there's all these things. 1263 00:52:23,186 --> 00:52:25,118 AUDIENCE: So other than denial of service, 1264 00:52:25,118 --> 00:52:27,533 why do we care so much about revealing 1265 00:52:27,533 --> 00:52:29,442 domain names within mit.edu?
1266 00:52:29,442 --> 00:52:30,900 PROFESSOR: Well, probably we don't. 1267 00:52:30,900 --> 00:52:33,200 Actually, there's a text file in AFS 1268 00:52:33,200 --> 00:52:35,210 that lists all these domain names at MIT anyway. 1269 00:52:35,210 --> 00:52:36,930 But I think in general, some companies 1270 00:52:36,930 --> 00:52:39,530 feel a little uneasy about revealing this. 1271 00:52:39,530 --> 00:52:41,735 They often have internal names that 1272 00:52:41,735 --> 00:52:46,245 sit in DNS that should never be exposed to the outside. 1273 00:52:46,245 --> 00:52:49,730 I think it's actually this fuzzy area where it was never 1274 00:52:49,730 --> 00:52:51,910 really formalized what guarantees DNS was providing 1275 00:52:51,910 --> 00:52:52,774 to you or was not. 1276 00:52:52,774 --> 00:52:54,690 And people started assuming things like, well, 1277 00:52:54,690 --> 00:52:57,390 if we stick some name, and it's not really publicized anywhere, 1278 00:52:57,390 --> 00:52:59,760 then it's probably secure here. 1279 00:52:59,760 --> 00:53:02,740 I think this is another place where this system doesn't have 1280 00:53:02,740 --> 00:53:04,740 a clear spec in terms of what it has and doesn't 1281 00:53:04,740 --> 00:53:05,930 have to provide. 1282 00:53:05,930 --> 00:53:08,224 And when you make some changes like this, then people 1283 00:53:08,224 --> 00:53:11,214 say, oh, yeah, I was sort of relying on that. 1284 00:53:11,214 --> 00:53:12,116 Yeah. 1285 00:53:12,116 --> 00:53:13,574 AUDIENCE: [INAUDIBLE] replay attack 1286 00:53:13,574 --> 00:53:16,595 where you could send in an old gap signature? 1287 00:53:16,595 --> 00:53:17,970 PROFESSOR: Yeah, there's actually 1288 00:53:17,970 --> 00:53:19,053 time outs on these things. 1289 00:53:19,053 --> 00:53:22,480 So when you sign this, you actually sign and say, 1290 00:53:22,480 --> 00:53:25,370 I'm signing that this set of names 1291 00:53:25,370 --> 00:53:27,710 is valid for, I don't know, a week.
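Going back to the NSEC gap records and the enumeration problem from a moment ago, the walk can be sketched as below. This is a toy model and makes several assumptions: the zone is a plain sorted list of made-up labels under one domain, signatures are left out entirely, and ordering is ordinary string order rather than the real DNS canonical ordering. Where the spoken example probes with an extra "a" (gooa), the sketch appends a NUL character, the smallest possible suffix, so that no name can be skipped.

```python
import bisect

# Hypothetical labels under one domain, kept sorted as NSEC requires.
ZONE = sorted(["bar", "foo", "goo", "www"])


def query(label: str):
    """Return None if the label exists, else the gap (previous, next)
    that covers it. The last gap wraps around to the first name."""
    i = bisect.bisect_left(ZONE, label)
    if i < len(ZONE) and ZONE[i] == label:
        return None
    return ZONE[i - 1], ZONE[i % len(ZONE)]


def walk_zone() -> list:
    """Enumerate every name using only 'does not exist' answers."""
    found = []
    probe = ""  # sorts before every real label
    while True:
        # Probes are never real labels (empty, or ending in NUL),
        # so the answer here is always a gap.
        _, next_name = query(probe)
        if next_name in found:  # wrapped around: the walk is complete
            return found
        found.append(next_name)
        probe = next_name + "\x00"  # smallest string sorting after next_name


print(query("gooa"))  # ('goo', 'www')
print(walk_zone())    # ['bar', 'foo', 'goo', 'www']
```

Each negative answer hands the attacker one more real name, so the whole zone falls out after as many queries as there are names.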
1292 00:53:27,710 --> 00:53:30,790 And then the clients, if they have a synchronized clock, 1293 00:53:30,790 --> 00:53:33,436 they can reject old signed messages. 1294 00:53:33,436 --> 00:53:36,770 Make sense? 1295 00:53:36,770 --> 00:53:43,290 All right, so this is on the TCP SYN guessing attacks. 1296 00:53:43,290 --> 00:53:47,850 Another interesting problem that also comes up in the TCP case 1297 00:53:47,850 --> 00:53:50,490 is a denial of service attack that 1298 00:53:50,490 --> 00:53:54,160 exploits the fact that the server has to store some state. 1299 00:53:54,160 --> 00:53:57,460 So if you look at this handshake that we 1300 00:53:57,460 --> 00:54:00,450 had on the board before, we'll see 1301 00:54:00,450 --> 00:54:04,230 that when a client establishes a connection to the server, 1302 00:54:04,230 --> 00:54:08,720 the server has to actually remember the sequence number 1303 00:54:08,720 --> 00:54:10,440 SNC. 1304 00:54:10,440 --> 00:54:13,060 So the server has to maintain some data structure 1305 00:54:13,060 --> 00:54:16,960 on the side that says, for this connection, 1306 00:54:16,960 --> 00:54:18,850 here's the sequence number. 1307 00:54:18,850 --> 00:54:21,140 And it's going to say, well, my connection from C to S 1308 00:54:21,140 --> 00:54:23,740 has the sequence number SNC. 1309 00:54:26,250 --> 00:54:28,510 And the reason the server has to store this table 1310 00:54:28,510 --> 00:54:33,545 is because the server needs to figure out what 1311 00:54:33,545 --> 00:54:37,340 SNC value to accept here later. 1312 00:54:37,340 --> 00:54:38,660 Does this make sense? 1313 00:54:38,660 --> 00:54:41,562 AUDIENCE: [INAUDIBLE] SNS? 1314 00:54:41,562 --> 00:54:43,936 PROFESSOR: Yeah, the server also needs SNS I guess, yeah. 1315 00:54:48,120 --> 00:54:51,770 But it turns out that-- well, yeah, you're right. 1316 00:54:51,770 --> 00:54:57,445 And the problem is that-- actually, yeah, you're right. 
1317 00:54:57,445 --> 00:54:58,945 SNS is actually much more important. 1318 00:54:58,945 --> 00:55:00,235 Sorry, yeah. 1319 00:55:00,235 --> 00:55:02,746 [INAUDIBLE] SNS is actually much more important. 1320 00:55:02,746 --> 00:55:04,371 Because SNS is how you know that you're 1321 00:55:04,371 --> 00:55:05,412 talking to the right guy. 1322 00:55:08,790 --> 00:55:12,075 The problem is that there's no real bound 1323 00:55:12,075 --> 00:55:13,710 on the size of this table. 1324 00:55:13,710 --> 00:55:16,317 So you might get packets from some machine. 1325 00:55:16,317 --> 00:55:17,650 You don't even know who sent it. 1326 00:55:17,650 --> 00:55:19,983 You just get a packet that looks like this with a source 1327 00:55:19,983 --> 00:55:21,610 address that claims to be C. 1328 00:55:21,610 --> 00:55:24,730 And in order to potentially accept a connection later 1329 00:55:24,730 --> 00:55:28,435 from this IP address, you have to create this table entry. 1330 00:55:28,435 --> 00:55:31,012 And these table entries are somewhat long lived. 1331 00:55:31,012 --> 00:55:33,345 Because maybe someone is connecting to you from a really 1332 00:55:33,345 --> 00:55:34,346 far away place. 1333 00:55:34,346 --> 00:55:35,690 There's lots of packet loss. 1334 00:55:35,690 --> 00:55:40,090 It might be not for maybe a minute until someone 1335 00:55:40,090 --> 00:55:42,730 finishes this TCP handshake in the worst case. 1336 00:55:42,730 --> 00:55:45,710 So you have to store this state in your TCP stack 1337 00:55:45,710 --> 00:55:47,980 for a relatively long time. 1338 00:55:47,980 --> 00:55:50,230 And there's no way to guess whether this 1339 00:55:50,230 --> 00:55:52,640 is a valid connection or not. 1340 00:55:52,640 --> 00:55:55,710 So one denial of service attack that people discovered 1341 00:55:55,710 --> 00:55:58,690 against most TCP stacks is to simply send 1342 00:55:58,690 --> 00:56:01,670 lots of packets like this. 
1343 00:56:01,670 --> 00:56:04,980 So if I'm an attacker, then I'll just send lots of SYN packets 1344 00:56:04,980 --> 00:56:08,930 to a particular server and get it to fill up its table. 1345 00:56:08,930 --> 00:56:12,810 And the problem is that in the best case, 1346 00:56:12,810 --> 00:56:15,410 maybe the attacker just always uses the same source IP 1347 00:56:15,410 --> 00:56:16,720 address. 1348 00:56:16,720 --> 00:56:18,800 In that case, you can just say, well, 1349 00:56:18,800 --> 00:56:21,710 every client machine is allowed two entries in my table, 1350 00:56:21,710 --> 00:56:23,340 or something like this. 1351 00:56:23,340 --> 00:56:25,870 And then the attacker can use up two table entries but not 1352 00:56:25,870 --> 00:56:26,745 much more. 1353 00:56:26,745 --> 00:56:28,667 The problem, of course, is that the attacker 1354 00:56:28,667 --> 00:56:30,125 can fake these client IP addresses, 1355 00:56:30,125 --> 00:56:31,832 make them look random. 1356 00:56:31,832 --> 00:56:33,290 And then for the server, it's going 1357 00:56:33,290 --> 00:56:34,885 to be very difficult to distinguish whether this 1358 00:56:34,885 --> 00:56:37,385 is an attacker trying to connect to me or some client 1359 00:56:37,385 --> 00:56:38,510 I've never heard of before. 1360 00:56:38,510 --> 00:56:41,320 So if you're some website that's supposed to accept connections 1361 00:56:41,320 --> 00:56:44,275 from anywhere in the world, this is going to be a big problem. 1362 00:56:44,275 --> 00:56:46,870 Because either you deny access to everyone, 1363 00:56:46,870 --> 00:56:51,080 or you have to store state for all these mostly fake 1364 00:56:51,080 --> 00:56:52,716 connection attempts. 1365 00:56:52,716 --> 00:56:55,020 Does that make sense?
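A toy model may help make the two cases concrete. This is only a sketch: the class `HalfOpenTable`, the capacity of 1000 entries, and the example addresses are made up for illustration and do not describe any real TCP stack. It shows that a per-source limit of two entries stops an attacker who reuses one address, but does nothing once the source addresses are spoofed.

```python
MAX_HALF_OPEN = 1000   # assumed table capacity, purely illustrative
PER_SOURCE_LIMIT = 2   # "two entries per client machine", as in the lecture


class HalfOpenTable:
    """Toy model of the server-side table of unfinished handshakes."""

    def __init__(self):
        self.entries = {}     # (src_ip, src_port) -> server sequence number
        self.per_source = {}  # src_ip -> number of entries currently held

    def on_syn(self, src_ip, src_port, sns):
        """Return True if state is held for this SYN, False if it is dropped."""
        key = (src_ip, src_port)
        if key in self.entries:
            return True
        if len(self.entries) >= MAX_HALF_OPEN:
            return False  # table full: nobody new gets in
        if self.per_source.get(src_ip, 0) >= PER_SOURCE_LIMIT:
            return False  # this source already holds its share
        self.entries[key] = sns
        self.per_source[src_ip] = self.per_source.get(src_ip, 0) + 1
        return True


def flood(table, sources):
    """Send one SYN per claimed source address; count how many took a slot."""
    stored = 0
    for i, ip in enumerate(sources):
        if table.on_syn(ip, 1024 + i, sns=i):
            stored += 1
    return stored


# Case 1: the attacker always uses the same source address.
same_ip = HalfOpenTable()
print(flood(same_ip, ["10.0.0.1"] * 5000))        # only the per-source limit is used up
print(same_ip.on_syn("192.0.2.7", 4000, sns=1))   # a legitimate client still gets in

# Case 2: the attacker claims a different source address on every SYN.
spoofed = HalfOpenTable()
fake_sources = [f"198.51.{i // 256}.{i % 256}" for i in range(5000)]
print(flood(spoofed, fake_sources))               # the whole table fills up
print(spoofed.on_syn("192.0.2.7", 4000, sns=1))   # the legitimate client is turned away
```

In this model the server cannot tell the spoofed entries from real first-time clients, which is exactly the dilemma described above: either turn everyone away or keep paying for state.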
1366 00:56:55,020 --> 00:56:57,480 So this is a bit of a problem for TCP, and in fact 1367 00:56:57,480 --> 00:57:01,990 for most protocols that allow some sort of connection 1368 00:57:01,990 --> 00:57:04,970 initiation, and the server has to store state. 1369 00:57:04,970 --> 00:57:05,890 So there's some fixes. 1370 00:57:05,890 --> 00:57:07,490 We'll talk about in a second what 1371 00:57:07,490 --> 00:57:10,285 workaround TCP implements to try to deal with this problem. 1372 00:57:10,285 --> 00:57:13,788 This is called SYN flooding in TCP. 1373 00:57:13,788 --> 00:57:15,162 But in general, this is a problem 1374 00:57:15,162 --> 00:57:17,030 that's worth knowing about and trying 1375 00:57:17,030 --> 00:57:19,975 to avoid in any protocol you design on top as well. 1376 00:57:19,975 --> 00:57:22,120 So you want to make sure that the server doesn't 1377 00:57:22,120 --> 00:57:24,830 have to keep state until it can actually 1378 00:57:24,830 --> 00:57:27,204 authenticate and identify, who is the client? 1379 00:57:27,204 --> 00:57:29,745 Because by that time, if you've identified who the client is, 1380 00:57:29,745 --> 00:57:31,340 you've authenticated them somehow, 1381 00:57:31,340 --> 00:57:32,290 then you can actually make a decision, 1382 00:57:32,290 --> 00:57:34,515 well, every client is allowed to only connect 1383 00:57:34,515 --> 00:57:35,920 once, or something. 1384 00:57:35,920 --> 00:57:37,780 And then I'm not going to keep more state. 1385 00:57:37,780 --> 00:57:40,240 Here, the problem is you're guaranteeing 1386 00:57:40,240 --> 00:57:42,938 that you're storing state before you have any idea who it 1387 00:57:42,938 --> 00:57:44,146 is that is connecting to you. 1388 00:57:46,670 --> 00:57:48,330 So let's look at how you can actually 1389 00:57:48,330 --> 00:57:53,070 solve this SYN flooding attack where the server accumulates 1390 00:57:53,070 --> 00:57:54,850 lots of state. 
1391 00:57:54,850 --> 00:57:57,530 So of course, if you could change TCP again, 1392 00:57:57,530 --> 00:58:00,810 you could fix this pretty easily by using cryptography 1393 00:58:00,810 --> 00:58:04,490 or something or changing exactly who's responsible for storing 1394 00:58:04,490 --> 00:58:05,130 what state. 1395 00:58:05,130 --> 00:58:07,100 The problem is we have TCP as is. 1396 00:58:07,100 --> 00:58:11,310 And could we fix this problem without changing the TCP wire 1397 00:58:11,310 --> 00:58:12,860 protocol? 1398 00:58:12,860 --> 00:58:15,657 So this is, again, an exercise in trying to figure out, well, 1399 00:58:15,657 --> 00:58:17,930 what exactly tricks we could play 1400 00:58:17,930 --> 00:58:21,470 or exactly what assumptions we could relax and still 1401 00:58:21,470 --> 00:58:24,715 stick to the TCP header format and other things. 1402 00:58:24,715 --> 00:58:28,900 And the trick is to in fact figure out a clever way 1403 00:58:28,900 --> 00:58:31,500 to make the server stateless without having 1404 00:58:31,500 --> 00:58:33,842 to-- so the server isn't going to have to keep 1405 00:58:33,842 --> 00:58:36,424 this table around in memory. 1406 00:58:36,424 --> 00:58:37,840 And the way we're going to do this 1407 00:58:37,840 --> 00:58:42,140 is by carefully choosing SNS. 1408 00:58:42,140 --> 00:58:44,840 Instead of using this formula we were looking at before, where 1409 00:58:44,840 --> 00:58:47,650 we were to add this function, we're 1410 00:58:47,650 --> 00:58:51,170 instead going to choose this sequence 1411 00:58:51,170 --> 00:58:52,710 number in a different way. 1412 00:58:52,710 --> 00:58:55,094 And I'll give you exactly the formula. 1413 00:58:55,094 --> 00:58:57,510 And then we'll talk about why this is actually interesting 1414 00:58:57,510 --> 00:58:59,530 and what nice properties it has.
1415 00:58:59,530 --> 00:59:02,192 So if the server detects that it's under this kind of attack, 1416 00:59:02,192 --> 00:59:03,650 it's going to switch into this mode 1417 00:59:03,650 --> 00:59:12,510 where it chooses SNS using this formula of applying 1418 00:59:12,510 --> 00:59:14,900 basically the same or similar kind of function F 1419 00:59:14,900 --> 00:59:15,490 we saw before. 1420 00:59:18,470 --> 00:59:20,100 And what it's going to apply it to 1421 00:59:20,100 --> 00:59:25,652 is the source IP, destination IP, the same things as before, 1422 00:59:25,652 --> 00:59:35,920 source port, destination port, and also timestamp, 1423 00:59:35,920 --> 00:59:39,420 and also a key in here as well. 1424 00:59:39,420 --> 00:59:45,374 And we're going to concatenate it with a timestamp as well. 1425 00:59:45,374 --> 00:59:47,665 So this timestamp is going to be fairly coarse grained. 1426 00:59:47,665 --> 00:59:49,206 It's going to go in order of minutes. 1427 00:59:49,206 --> 00:59:52,290 So every minute, the timestamp ticks off by one. 1428 00:59:52,290 --> 00:59:54,560 It's a very coarse grained time. 1429 00:59:54,560 --> 00:59:59,920 And there's probably some split between this part of the header 1430 00:59:59,920 --> 01:00:01,270 and this part of the header. 1431 01:00:01,270 --> 01:00:03,270 This timestamp doesn't need a whole lot of bits. 1432 01:00:03,270 --> 01:00:07,000 So I forget exactly what this protocol does in real machines. 1433 01:00:07,000 --> 01:00:09,730 But you could easily imagine maybe using 8 bits. 1434 01:00:09,730 --> 01:00:11,158 For the timestamp, I'm going to be 1435 01:00:11,158 --> 01:00:15,920 using 24 bits for this chunk of the sequence number. 1436 01:00:15,920 --> 01:00:18,830 All right, so why is this a good plan? 1437 01:00:18,830 --> 01:00:19,850 What's going on here? 1438 01:00:19,850 --> 01:00:21,990 Why this weird formula? 
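Written out, the formula on the board might look roughly like the following. This is a reconstruction from the spoken description, not a copy of the board. The 24/8 split is the illustrative one mentioned in the lecture and is not claimed to match real implementations, and the names $\operatorname{trunc}_{24}$, $K$, and $t$ are notation introduced here: $F_K$ is the keyed function $F$ with the server's secret key, and $t$ is the 8-bit counter that ticks once per minute.

```latex
\[
  \mathrm{SN}_S \;=\;
  \underbrace{\operatorname{trunc}_{24}\!\bigl(
      F_K(\text{src IP},\ \text{dst IP},\ \text{src port},\ \text{dst port},\ t)
  \bigr)}_{24\ \text{bits}}
  \;\Vert\;
  \underbrace{t}_{8\ \text{bits}},
  \qquad
  t = \left\lfloor \frac{\text{now in seconds}}{60} \right\rfloor \bmod 2^{8}
\]
```

The same $t$ appears twice: once inside $F_K$, where it is protected by the key, and once in the clear in the low bits of the sequence number.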
1439 01:00:21,990 --> 01:00:24,210 So I think you have to remember, what was the property 1440 01:00:24,210 --> 01:00:26,920 that we were trying to achieve of the sequence number. 1441 01:00:26,920 --> 01:00:28,580 So there's two things going on. 1442 01:00:28,580 --> 01:00:31,844 One is there's this defense against duplicated packets 1443 01:00:31,844 --> 01:00:35,041 that we were trying to achieve by-- maybe the formula is still 1444 01:00:35,041 --> 01:00:35,541 here. 1445 01:00:35,541 --> 01:00:37,030 Nope-- oh, yeah, yeah, here. 1446 01:00:37,030 --> 01:00:39,210 Right, so just to compare these guys-- so 1447 01:00:39,210 --> 01:00:42,100 when we're not under attack, we were previously 1448 01:00:42,100 --> 01:00:45,148 maintaining this old style sequence number scheme 1449 01:00:45,148 --> 01:00:47,606 to prevent duplicate packets from previous connections, all 1450 01:00:47,606 --> 01:00:49,495 this good stuff. 1451 01:00:49,495 --> 01:00:51,120 It turns out people couldn't figure out 1452 01:00:51,120 --> 01:00:53,800 a way to defend against these kinds of SYN flooding attacks 1453 01:00:53,800 --> 01:00:55,990 without giving up on this property, 1454 01:00:55,990 --> 01:00:57,370 so basically saying, well, here's 1455 01:00:57,370 --> 01:00:59,670 one plan that works well in some situations. 1456 01:00:59,670 --> 01:01:02,330 Here's a different plan where we'll give up on that ISN 1457 01:01:02,330 --> 01:01:03,760 old style component. 1458 01:01:03,760 --> 01:01:06,890 And instead, we'll focus on just ensuring 1459 01:01:06,890 --> 01:01:12,305 that if someone presents us this sequence number S in response 1460 01:01:12,305 --> 01:01:15,900 to a packet, like here, then we know it 1461 01:01:15,900 --> 01:01:18,150 must've been the right client. 1462 01:01:18,150 --> 01:01:22,434 So remember that in order to prevent IP spoofing attacks, 1463 01:01:22,434 --> 01:01:23,850 we sort of rely on this SNS value.
1464 01:01:23,850 --> 01:01:28,310 So if the server sends this SNS value to some client, then 1465 01:01:28,310 --> 01:01:30,800 hopefully only that client can send us back the correct SNS 1466 01:01:30,800 --> 01:01:32,985 value, finish establishing the connection. 1467 01:01:32,985 --> 01:01:36,220 And this is why you had to store it in this table over here. 1468 01:01:36,220 --> 01:01:37,730 Because otherwise, how do you know 1469 01:01:37,730 --> 01:01:40,610 if this is a real response or a fake response? 1470 01:01:40,610 --> 01:01:42,660 And the reason for using this function F here 1471 01:01:42,660 --> 01:01:47,670 is that now we can maybe not store this table in memory. 1472 01:01:47,670 --> 01:01:51,760 And instead, when a connection attempt arrives here, 1473 01:01:51,760 --> 01:01:53,480 we're going to compute SNS according 1474 01:01:53,480 --> 01:01:55,440 to this formula over here and just 1475 01:01:55,440 --> 01:01:58,058 send it back to whatever client pretends to have connected 1476 01:01:58,058 --> 01:01:59,250 to us. 1477 01:01:59,250 --> 01:02:01,960 And then we'll forget all about this connection. 1478 01:02:01,960 --> 01:02:05,040 And then if this third packet eventually comes through, 1479 01:02:05,040 --> 01:02:09,230 and its SNS value here matches what we would expect to see, 1480 01:02:09,230 --> 01:02:11,040 then we'll say, oh yeah, this must've 1481 01:02:11,040 --> 01:02:13,310 been someone got our response from step two 1482 01:02:13,310 --> 01:02:15,745 and finally sent it back to us. 1483 01:02:15,745 --> 01:02:17,495 And now we finally commit after step three 1484 01:02:17,495 --> 01:02:21,846 to storing a real entry for this TCP connection in memory. 1485 01:02:21,846 --> 01:02:25,350 So this is a way to sort of defer the storage of this state 1486 01:02:25,350 --> 01:02:28,820 at the server by requiring the server, the client, 1487 01:02:28,820 --> 01:02:30,420 to echo back this exact value. 
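The stateless check can be sketched in a few lines. This is a minimal sketch under stated assumptions, not what any real operating system does: HMAC-SHA256 stands in for the function F, the 24-bit/8-bit split is the illustrative one from the lecture, and `KEY`, `make_sns`, `check_sns`, and `max_age` are hypothetical names. It also follows the lecture's simplification that the client echoes SNS itself; in real TCP the third packet acknowledges SNS + 1.

```python
import hashlib
import hmac
import socket
import struct
import time

KEY = b"server-secret-key"  # hypothetical; a real server would use random secret bytes


def minute_counter(now=None):
    """Coarse 8-bit timestamp that ticks once per minute."""
    now = time.time() if now is None else now
    return int(now // 60) & 0xFF


def _f(src_ip, dst_ip, src_port, dst_port, ts):
    """Keyed function F over the 4-tuple and timestamp, cut down to 24 bits."""
    msg = struct.pack("!4s4sHHB",
                      socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
                      src_port, dst_port, ts)
    digest = hmac.new(KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big")


def make_sns(src_ip, dst_ip, src_port, dst_port, now=None):
    """Step two: compute SNS, send it, and store nothing."""
    ts = minute_counter(now)
    return (_f(src_ip, dst_ip, src_port, dst_port, ts) << 8) | ts


def check_sns(sns, src_ip, dst_ip, src_port, dst_port, now=None, max_age=2):
    """Step three: recompute F from the packet itself and compare."""
    sns &= 0xFFFFFFFF
    ts = sns & 0xFF                          # timestamp travels in the low 8 bits
    age = (minute_counter(now) - ts) & 0xFF  # minutes elapsed, modulo 256
    if age > max_age:
        return False                         # too old: refuse the connection
    expected = _f(src_ip, dst_ip, src_port, dst_port, ts)
    return hmac.compare_digest((sns >> 8).to_bytes(3, "big"),
                               expected.to_bytes(3, "big"))


# The server answers a SYN, forgets it, and later validates the echoed value.
t0 = 1_000_000.0
sns = make_sns("192.0.2.7", "198.51.100.1", 40000, 80, now=t0)
print(check_sns(sns, "192.0.2.7", "198.51.100.1", 40000, 80, now=t0 + 5))
```

Everything the check needs comes either from the third packet (addresses, ports, the echoed value with its timestamp bits) or from the server's key, so no per-connection entry exists until the check passes.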
1488 01:02:30,420 --> 01:02:32,716 And by constructing it in this careful way, 1489 01:02:32,716 --> 01:02:34,590 we can actually check whether the client just 1490 01:02:34,590 --> 01:02:38,598 made up this value, or if it's the real thing we're expecting. 1491 01:02:38,598 --> 01:02:40,486 Does that make sense? 1492 01:02:40,486 --> 01:02:43,320 AUDIENCE: [INAUDIBLE] SNC [INAUDIBLE]? 1493 01:02:43,320 --> 01:02:46,620 PROFESSOR: Yeah, so SNC now, we basically don't store it. 1494 01:02:46,620 --> 01:02:48,570 It's maybe not great. 1495 01:02:48,570 --> 01:02:52,134 But so it is. 1496 01:02:52,134 --> 01:02:54,470 So in fact, I guess what really happens 1497 01:02:54,470 --> 01:02:59,650 is in-- I didn't show it here. 1498 01:02:59,650 --> 01:03:05,435 But there's probably going to be sort of a null data field here 1499 01:03:05,435 --> 01:03:07,560 that says this packet has no data. 1500 01:03:07,560 --> 01:03:10,680 But it still includes the sequence number SNC just 1501 01:03:10,680 --> 01:03:12,790 because there's a field for it. 1502 01:03:12,790 --> 01:03:14,554 So this is how the server can reconstruct 1503 01:03:14,554 --> 01:03:15,857 what this SNC value is. 1504 01:03:15,857 --> 01:03:18,190 Because the client is going to include it in this packet 1505 01:03:18,190 --> 01:03:18,727 anyway. 1506 01:03:18,727 --> 01:03:19,810 It wasn't relevant before. 1507 01:03:19,810 --> 01:03:22,050 But it sort of is relevant now. 1508 01:03:22,050 --> 01:03:24,820 And we weren't going to check it against anything. 1509 01:03:24,820 --> 01:03:28,210 But it turns out to be pretty much good enough. 1510 01:03:28,210 --> 01:03:29,770 It has some unfortunate consequences. 1511 01:03:29,770 --> 01:03:33,785 Like if this is-- well, there's some complicated things 1512 01:03:33,785 --> 01:03:35,100 you might abuse here. 1513 01:03:35,100 --> 01:03:37,330 But it doesn't seem to be that bad. 
1514 01:03:37,330 --> 01:03:39,370 It seems certainly better than the server 1515 01:03:39,370 --> 01:03:41,495 filling up its memory and stopping serving requests 1516 01:03:41,495 --> 01:03:43,370 altogether. 1517 01:03:43,370 --> 01:03:45,630 And then we don't include SNC in this computation. 1518 01:03:45,630 --> 01:03:48,110 Because the only thing we care about here 1519 01:03:48,110 --> 01:03:50,099 is offloading the storage of this table 1520 01:03:50,099 --> 01:03:52,640 and making sure that the only connections that eventually 1521 01:03:52,640 --> 01:03:56,075 do get established are legitimate clients. 1522 01:03:56,075 --> 01:03:58,110 Because then, we can say, well, 1523 01:03:58,110 --> 01:04:00,990 if this client is establishing a million connections to me, 1524 01:04:00,990 --> 01:04:02,698 I'll stop accepting connections from him. 1525 01:04:02,698 --> 01:04:04,150 That's easy enough, finally. 1526 01:04:04,150 --> 01:04:06,710 The problem is that all these source addresses, 1527 01:04:06,710 --> 01:04:09,180 if they're spoofed, are hard to distinguish 1528 01:04:09,180 --> 01:04:11,630 from legitimate clients. 1529 01:04:11,630 --> 01:04:12,580 Make sense? 1530 01:04:12,580 --> 01:04:13,530 Yeah. 1531 01:04:13,530 --> 01:04:15,612 AUDIENCE: Would you need to store the timestamp? 1532 01:04:15,612 --> 01:04:17,570 PROFESSOR: Ahh, so the clever thing, the reason 1533 01:04:17,570 --> 01:04:20,280 this timestamp is sort of on the slide here, 1534 01:04:20,280 --> 01:04:23,920 is that when we receive this SNS value in step three, 1535 01:04:23,920 --> 01:04:26,190 we need to figure out, how do you 1536 01:04:26,190 --> 01:04:27,690 compute the input to this function F 1537 01:04:27,690 --> 01:04:28,951 to check whether it's correct? 1538 01:04:28,951 --> 01:04:30,367 So actually, we take the timestamp 1539 01:04:30,367 --> 01:04:33,510 from the end of the packet, and we use that inside 1540 01:04:33,510 --> 01:04:35,512 of this computation.
1541 01:04:35,512 --> 01:04:36,970 Everything else we can reconstruct. 1542 01:04:36,970 --> 01:04:39,330 We know who just sent us the third step and packet. 1543 01:04:39,330 --> 01:04:41,230 And we have all these fields. 1544 01:04:41,230 --> 01:04:43,542 And we have our key, which is, again, still secret. 1545 01:04:43,542 --> 01:04:46,000 And this timestamp just comes from the end of the sequence, 1546 01:04:46,000 --> 01:04:47,810 from the last 8 bits. 1547 01:04:47,810 --> 01:04:51,040 And then it might be that we'll reject 1548 01:04:51,040 --> 01:04:55,780 timestamps that are too old, just disallow old connections. 1549 01:04:55,780 --> 01:04:56,280 Yeah. 1550 01:04:56,280 --> 01:04:57,492 AUDIENCE: So I'm guessing the reason you only 1551 01:04:57,492 --> 01:04:58,867 use this when you're under attack 1552 01:04:58,867 --> 01:05:01,160 is because you lose 8 bits of security, or whatever? 1553 01:05:01,160 --> 01:05:02,630 PROFESSOR: Yes, it's not great. 1554 01:05:02,630 --> 01:05:04,120 It has many bad properties. 1555 01:05:04,120 --> 01:05:07,668 One is you sort of lose 8 bits of security in some sense. 1556 01:05:07,668 --> 01:05:09,880 Because now the unguessable part is just 1557 01:05:09,880 --> 01:05:13,250 24 bits instead of 32 bits. 1558 01:05:13,250 --> 01:05:18,750 Another problem is what happens if you lose certain packets? 1559 01:05:18,750 --> 01:05:26,163 So if this packet is lost-- so it's typically, in TCP, 1560 01:05:26,163 --> 01:05:28,580 there's someone responsible for retransmitting something 1561 01:05:28,580 --> 01:05:30,540 if a particular packet is lost. 1562 01:05:30,540 --> 01:05:33,870 And in TCP, if the third packet is lost, 1563 01:05:33,870 --> 01:05:36,490 then the client might not be waiting for anything. 
1564 01:05:36,490 --> 01:05:39,040 Or sorry, maybe the protocol we're 1565 01:05:39,040 --> 01:05:40,850 running on top of this TCP connection 1566 01:05:40,850 --> 01:05:42,308 is one where the server is supposed 1567 01:05:42,308 --> 01:05:43,900 to say something initially. 1568 01:05:43,900 --> 01:05:45,290 So I connect. 1569 01:05:45,290 --> 01:05:46,470 I just listen. 1570 01:05:46,470 --> 01:05:48,869 And in the SMTP, for example, the server 1571 01:05:48,869 --> 01:05:51,160 is supposed to send me some sort of an initial greeting 1572 01:05:51,160 --> 01:05:53,370 in the protocol. 1573 01:05:53,370 --> 01:05:55,446 So OK, suppose I'm connecting to an SMTP server. 1574 01:05:55,446 --> 01:05:57,160 I send my third packet. 1575 01:05:57,160 --> 01:05:58,120 I think I'm done. 1576 01:05:58,120 --> 01:06:00,795 I'm just waiting for the server to tell me, 1577 01:06:00,795 --> 01:06:02,190 greetings as an SMTP server. 1578 01:06:02,190 --> 01:06:04,316 Please send mail. 1579 01:06:04,316 --> 01:06:05,440 This packet could get lost. 1580 01:06:05,440 --> 01:06:08,340 And in real TCP, the way this gets handled 1581 01:06:08,340 --> 01:06:12,540 is that the server from step two remembers that, hey, I 1582 01:06:12,540 --> 01:06:13,860 sent this response. 1583 01:06:13,860 --> 01:06:15,867 I never heard back, this third thing. 1584 01:06:15,867 --> 01:06:17,283 So it's the server that's supposed 1585 01:06:17,283 --> 01:06:19,829 to resend this packet to trigger the client 1586 01:06:19,829 --> 01:06:22,489 to resend this third packet. 1587 01:06:22,489 --> 01:06:24,530 Of course, if the server isn't storing any state, 1588 01:06:24,530 --> 01:06:26,660 it has no idea what to resend. 
1589 01:06:26,660 --> 01:06:28,720 So this actually makes connection establishment 1590 01:06:28,720 --> 01:06:31,669 potentially problematic where you 1591 01:06:31,669 --> 01:06:33,710 could enter this weird state where both sides are 1592 01:06:33,710 --> 01:06:34,714 waiting for each other. 1593 01:06:34,714 --> 01:06:36,130 Well, the server doesn't even know 1594 01:06:36,130 --> 01:06:37,421 that it's waiting for anything. 1595 01:06:37,421 --> 01:06:39,180 And the client is waiting for the server. 1596 01:06:39,180 --> 01:06:41,138 And the server basically dropped responsibility 1597 01:06:41,138 --> 01:06:42,250 by not storing state. 1598 01:06:42,250 --> 01:06:44,512 So this is another reason why you 1599 01:06:44,512 --> 01:06:46,470 don't run this in production mode all the time. 1600 01:06:46,470 --> 01:06:47,216 Yeah. 1601 01:06:47,216 --> 01:06:49,950 AUDIENCE: Presumably also you could have data collisions 1602 01:06:49,950 --> 01:06:53,530 if you establish two very short-lived connections right 1603 01:06:53,530 --> 01:06:55,600 after each other from the same host. 1604 01:06:55,600 --> 01:06:56,105 PROFESSOR: Absolutely, yeah, yeah. 1605 01:06:56,105 --> 01:06:58,370 So another thing is, of course, because we gave up 1606 01:06:58,370 --> 01:07:01,340 on using this ISN old style part, 1607 01:07:01,340 --> 01:07:03,080 we now give up protection against 1608 01:07:03,080 --> 01:07:05,400 these multiple connections in a short time period being 1609 01:07:05,400 --> 01:07:07,400 independent from one another. 1610 01:07:07,400 --> 01:07:09,322 So I think there's a number of trade-offs. 1611 01:07:09,322 --> 01:07:10,770 We just talked about three. 1612 01:07:10,770 --> 01:07:12,830 There's several more things you worry about. 1613 01:07:12,830 --> 01:07:15,790 But it's not great.
1614 01:07:15,790 --> 01:07:18,150 If we could design a protocol from scratch to be better, 1615 01:07:18,150 --> 01:07:21,390 we could just have a separate nice 64-bit header for this 1616 01:07:21,390 --> 01:07:23,001 and a 64-bit value for this. 1617 01:07:23,001 --> 01:07:24,750 And then we could enable this all the time 1618 01:07:24,750 --> 01:07:26,458 without giving up the other stuff and all 1619 01:07:26,458 --> 01:07:28,148 these nice things. 1620 01:07:28,148 --> 01:07:28,648 Yeah. 1621 01:07:28,648 --> 01:07:31,133 AUDIENCE: I just had one quick question on the SNS. 1622 01:07:31,133 --> 01:07:36,432 In step two, [INAUDIBLE], do they have to be the same? 1623 01:07:36,432 --> 01:07:37,821 PROFESSOR: This SNS and this SNS? 1624 01:07:37,821 --> 01:07:38,750 AUDIENCE: Mhm. 1625 01:07:38,750 --> 01:07:41,400 PROFESSOR: Yeah, because otherwise, 1626 01:07:41,400 --> 01:07:45,685 the server has no way to conclude that this client got 1627 01:07:45,685 --> 01:07:47,330 our packet. 1628 01:07:47,330 --> 01:07:51,020 If the server didn't check that this SNS was the same value as 1629 01:07:51,020 --> 01:07:54,197 before, then this actually would be even worse. 1630 01:07:54,197 --> 01:07:56,405 Because I could fake a connection from some arbitrary 1631 01:07:56,405 --> 01:07:58,810 IP address, then get this response. 1632 01:07:58,810 --> 01:08:00,362 Maybe I don't even get it, because it 1633 01:08:00,362 --> 01:08:01,320 goes to a different IP. 1634 01:08:01,320 --> 01:08:04,114 Then I establish a connection from some other IP address. 1635 01:08:04,114 --> 01:08:05,530 And then the server is maintaining 1636 01:08:05,530 --> 01:08:06,812 a whole live connection. 1637 01:08:06,812 --> 01:08:09,020 Probably a server process on the other side is waiting for me 1638 01:08:09,020 --> 01:08:10,800 to send data and so on. 1639 01:08:10,800 --> 01:08:13,660 AUDIENCE: But the timestamp is going to be different, right?
1640 01:08:13,660 --> 01:08:15,288 So how can the server recalculate 1641 01:08:15,288 --> 01:08:17,708 that with a new timestamp and null the one before 1642 01:08:17,708 --> 01:08:19,443 if it doesn't store any state? 1643 01:08:19,443 --> 01:08:21,859 PROFESSOR: So the way this works is these timestamps, as I 1644 01:08:21,859 --> 01:08:23,150 was saying, are coarse grained. 1645 01:08:23,150 --> 01:08:24,899 So they're on a scale of minutes. 1646 01:08:24,899 --> 01:08:26,631 So if you connect within the same minute, 1647 01:08:26,631 --> 01:08:30,540 then you're in good shape. 1648 01:08:30,540 --> 01:08:33,820 And if you connect on the minute boundary, well, too bad. 1649 01:08:33,820 --> 01:08:35,569 Yet another problem with the scheme-- it's 1650 01:08:35,569 --> 01:08:37,155 imperfect in many ways. 1651 01:08:37,155 --> 01:08:39,180 But most operating systems, including Linux, 1652 01:08:39,180 --> 01:08:42,440 actually have ways of detecting if there's too many entries 1653 01:08:42,440 --> 01:08:44,689 building up in this table that aren't being completed. 1654 01:08:44,689 --> 01:08:46,750 It switches to this other scheme instead 1655 01:08:46,750 --> 01:08:48,590 to make sure it doesn't overflow this table. 1656 01:08:48,590 --> 01:08:49,071 Yeah. 1657 01:08:49,071 --> 01:08:50,737 AUDIENCE: So if the attacker has control 1658 01:08:50,737 --> 01:08:53,400 of a lot of IP addresses, and they do this, 1659 01:08:53,400 --> 01:08:55,324 and even if you switch it the same-- 1660 01:08:55,324 --> 01:08:57,032 PROFESSOR: Yeah, so then actually there's 1661 01:08:57,032 --> 01:08:58,644 not much you can do. 1662 01:08:58,644 --> 01:09:00,060 The reason that we were so worried 1663 01:09:00,060 --> 01:09:01,560 about this scheme in the first place 1664 01:09:01,560 --> 01:09:04,485 is because we wanted to filter out or somehow 1665 01:09:04,485 --> 01:09:06,828 distinguish between the attacker and the good guys.
1666 01:09:06,828 --> 01:09:09,290 And if the attacker has more IP addresses 1667 01:09:09,290 --> 01:09:11,420 and just controls more machines than the good guys, 1668 01:09:11,420 --> 01:09:14,003 then he can just connect to our server and request lots of web 1669 01:09:14,003 --> 01:09:16,210 pages or maintain connections. 1670 01:09:16,210 --> 01:09:18,100 And it's very hard then for the server 1671 01:09:18,100 --> 01:09:21,060 to distinguish whether these are legitimate clients or just 1672 01:09:21,060 --> 01:09:23,350 the attacker tying up resources of the server. 1673 01:09:23,350 --> 01:09:24,880 So you're absolutely right. 1674 01:09:24,880 --> 01:09:27,170 This only addresses the case where the attacker 1675 01:09:27,170 --> 01:09:29,060 has a small number of IP addresses 1676 01:09:29,060 --> 01:09:32,130 and wants to amplify his effect. 1677 01:09:32,130 --> 01:09:34,109 But it is a worry. 1678 01:09:34,109 --> 01:09:38,819 And in fact, today it might be that some attackers control 1679 01:09:38,819 --> 01:09:40,488 a large number of compromised machines, 1680 01:09:40,488 --> 01:09:42,529 like just desktop machines of someone that didn't 1681 01:09:42,529 --> 01:09:44,331 patch their machine correctly. 1682 01:09:44,331 --> 01:09:46,580 And then they can just mount denial of service attacks 1683 01:09:46,580 --> 01:09:48,960 from this distributed set of machines all over the world. 1684 01:09:48,960 --> 01:09:53,680 And that's pretty hard to defend against. 1685 01:09:53,680 --> 01:09:56,030 So another actually interesting thing I want to mention 1686 01:09:56,030 --> 01:10:02,200 is denial of service attacks, but in the particular way 1687 01:10:02,200 --> 01:10:05,049 that other protocols make them worse. 1688 01:10:05,049 --> 01:10:07,340 I guess other protocols allow denial of service attacks 1689 01:10:07,340 --> 01:10:08,131 in the first place. 1690 01:10:08,131 --> 01:10:08,692 I'm sorry. 
1691 01:10:08,692 --> 01:10:11,150 But there are some protocols that are particularly 1692 01:10:11,150 --> 01:10:13,370 susceptible to abuse. 1693 01:10:13,370 --> 01:10:16,510 And probably a good example of that 1694 01:10:16,510 --> 01:10:19,150 is, again, this DNS protocol that we were looking at before. 1695 01:10:19,150 --> 01:10:21,890 So the DNS protocol-- we still have it 1696 01:10:21,890 --> 01:10:24,990 here-- involves the client sending a request 1697 01:10:24,990 --> 01:10:27,540 to the server and the server sending a response back 1698 01:10:27,540 --> 01:10:29,300 to the client. 1699 01:10:29,300 --> 01:10:34,310 And in many cases, the response is larger than the request. 1700 01:10:34,310 --> 01:10:36,890 The request could be just, tell me about mit.edu. 1701 01:10:36,890 --> 01:10:38,710 And the response might be all the records 1702 01:10:38,710 --> 01:10:41,290 the server has about mit.edu-- the email address, 1703 01:10:41,290 --> 01:10:44,660 the mail server for mit.edu, the signed record if it's 1704 01:10:44,660 --> 01:10:46,030 using DNSSEC, and so on. 1705 01:10:46,030 --> 01:10:47,630 So the query might be 100 bytes. 1706 01:10:47,630 --> 01:10:50,946 The response could well be over 1,000 bytes. 1707 01:10:50,946 --> 01:10:53,120 So suppose that you want to flood 1708 01:10:53,120 --> 01:10:57,510 some guy with lots of packets or lots of bandwidth. 1709 01:10:57,510 --> 01:10:59,074 Well, you might only be able to send 1710 01:10:59,074 --> 01:11:00,240 a small amount of bandwidth. 1711 01:11:00,240 --> 01:11:03,030 But what you could do is you could fake queries to DNS 1712 01:11:03,030 --> 01:11:04,725 servers on behalf of that guy. 1713 01:11:04,725 --> 01:11:06,170 So you only have to send 100 bytes 1714 01:11:06,170 --> 01:11:10,360 to some DNS server pretending to be a query from that poor guy.
1715 01:11:10,360 --> 01:11:12,880 And the DNS server is going to send 1,000 bytes to him 1716 01:11:12,880 --> 01:11:14,260 on your behalf. 1717 01:11:14,260 --> 01:11:17,920 So this is a problematic feature of this protocol. 1718 01:11:17,920 --> 01:11:21,510 Because it allows you to amplify bandwidth attacks. 1719 01:11:21,510 --> 01:11:23,285 And partly for the same reason we 1720 01:11:23,285 --> 01:11:26,250 were talking about with TCP's SYN flooding attacks, 1721 01:11:26,250 --> 01:11:28,740 it's very hard for the server, for the DNS server, 1722 01:11:28,740 --> 01:11:32,110 in this case, to know whether this request is valid or not. 1723 01:11:32,110 --> 01:11:34,439 Because there's no authentication or no sort 1724 01:11:34,439 --> 01:11:35,980 of sequence number exchanges going on 1725 01:11:35,980 --> 01:11:38,188 to tell that this is the right guy connecting to you, 1726 01:11:38,188 --> 01:11:39,520 et cetera. 1727 01:11:39,520 --> 01:11:42,450 So in fact this is still a problem in DNS today. 1728 01:11:42,450 --> 01:11:45,180 And it gets used quite frequently 1729 01:11:45,180 --> 01:11:47,730 to attack people with bandwidth attacks. 1730 01:11:47,730 --> 01:11:50,184 So if you have a certain amount of bandwidth, 1731 01:11:50,184 --> 01:11:51,600 you'll be that much more effective 1732 01:11:51,600 --> 01:11:54,380 if you reflect your attack off of a DNS server. 1733 01:11:54,380 --> 01:11:57,400 And these DNS servers are very well provisioned. 1734 01:11:57,400 --> 01:11:59,460 And they basically have to respond to every query 1735 01:11:59,460 --> 01:12:00,127 out there. 1736 01:12:00,127 --> 01:12:01,960 Because if they stop responding to requests, 1737 01:12:01,960 --> 01:12:03,530 then probably some legitimate requests are going to get 1738 01:12:03,530 --> 01:12:04,030 dropped. 1739 01:12:04,030 --> 01:12:05,846 So this is a big problem in practice. 1740 01:12:05,846 --> 01:12:06,346 Yeah. 
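The arithmetic behind the amplification can be written out as a small sketch. The 100-byte query and 1,000-byte response are the round numbers from the lecture; the function names and the 1 Mbit/s attacker rate are assumptions chosen only for illustration.

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """Bytes delivered to the victim per byte the attacker sends."""
    return response_bytes / query_bytes


def reflected_bandwidth(attacker_bps: float, query_bytes: int, response_bytes: int) -> float:
    """Bandwidth arriving at the victim when every spoofed query is answered."""
    return attacker_bps * amplification_factor(query_bytes, response_bytes)


# Lecture's round numbers: a 100-byte query draws a 1,000-byte response.
factor = amplification_factor(100, 1000)

# Hypothetical attacker uplink of 1 Mbit/s, all of it spent on spoofed queries.
victim_bps = reflected_bandwidth(1_000_000, 100, 1000)

print(factor, victim_bps)
```

Under these assumptions the victim sees ten times the traffic the attacker actually sends, and that traffic arrives from the DNS servers rather than from the attacker.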
1741 01:12:06,346 --> 01:12:08,786 AUDIENCE: So if you can still see it on the DNS server, 1742 01:12:08,786 --> 01:12:15,140 [INAUDIBLE] requests and never reply to-- 1743 01:12:15,140 --> 01:12:17,820 PROFESSOR: Right, yeah, so it's possible to maybe modify 1744 01:12:17,820 --> 01:12:20,757 the DNS server to keep some sort of state like this. 1745 01:12:20,757 --> 01:12:22,965 AUDIENCE: That's the reason why this still works now, 1746 01:12:22,965 --> 01:12:24,170 because they don't store state? 1747 01:12:24,170 --> 01:12:25,878 PROFESSOR: Yeah, well I think some people 1748 01:12:25,878 --> 01:12:29,015 are starting to modify DNS servers to try to store state. 1749 01:12:29,015 --> 01:12:32,020 A lot of times, there's so many DNS servers out there 1750 01:12:32,020 --> 01:12:33,520 that it doesn't matter. 1751 01:12:33,520 --> 01:12:37,498 Even if you appear to do 10 queries against every DNS 1752 01:12:37,498 --> 01:12:38,900 server, still every packet 1753 01:12:38,900 --> 01:12:42,060 gets amplified by some significant factor. 1754 01:12:42,060 --> 01:12:43,450 And they have to respond. 1755 01:12:43,450 --> 01:12:46,125 Because maybe that client really is trying to issue this query. 1756 01:12:46,125 --> 01:12:47,000 So this is a problem. 1757 01:12:47,000 --> 01:12:49,190 Yeah, so you're right, if this was one DNS server, 1758 01:12:49,190 --> 01:12:51,170 then this would be maybe not as big of a deal. 1759 01:12:51,170 --> 01:12:53,870 The problem is also that the root servers for DNS, 1760 01:12:53,870 --> 01:12:55,430 for example, aren't a single machine. 1761 01:12:55,430 --> 01:12:57,360 It's actually racks and racks of servers. 1762 01:12:57,360 --> 01:12:59,120 Because they're so heavily used. 1763 01:12:59,120 --> 01:13:02,085 And trying to maintain state across all these machines 1764 01:13:02,085 --> 01:13:03,430 is probably nontrivial.
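The kind of per-server state discussed in this exchange might look roughly like the sketch below: a counter per claimed source address within a fixed time window. The class name, the limits, and the fixed-window policy are all assumptions for illustration, not a description of how any real DNS server does it. Note that this state lives on one machine only, which is exactly the limitation raised above: an attacker who spreads a few spoofed queries over many servers stays under every individual limit.

```python
class PerSourceLimiter:
    """Fixed-window query counter keyed by the (possibly spoofed) source address."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._windows = {}  # source address -> (window start time, queries seen)

    def allow(self, source_ip: str, now: float) -> bool:
        """Return True if the server should answer this query."""
        start, count = self._windows.get(source_ip, (now, 0))
        if now - start >= self.window_seconds:
            # The old window has expired, so start counting afresh.
            start, count = now, 0
        if count >= self.max_queries:
            self._windows[source_ip] = (start, count)
            return False
        self._windows[source_ip] = (start, count + 1)
        return True


# Hypothetical policy: at most 2 answers per second toward any one address.
limiter = PerSourceLimiter(max_queries=2, window_seconds=1.0)
decisions = [limiter.allow("10.0.0.1", t) for t in (0.0, 0.1, 0.2, 1.5)]
print(decisions)
```

Dropping the third query also shows the cost mentioned in the lecture: if that address belonged to a legitimate client, its request is lost too.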
1765 01:13:03,430 --> 01:13:05,850 So as it gets abused more, probably it 1766 01:13:05,850 --> 01:13:09,582 will be more worthwhile to maintain this state. 1767 01:13:09,582 --> 01:13:11,082 I guess a general principle you want 1768 01:13:11,082 --> 01:13:15,120 to follow in any protocol-- well, 1769 01:13:15,120 --> 01:13:17,340 might be a good principle-- is to make 1770 01:13:17,340 --> 01:13:19,855 the client do at least as much work as the server is doing. 1771 01:13:19,855 --> 01:13:22,450 So here, the problem is the client isn't doing as much work 1772 01:13:22,450 --> 01:13:23,310 as the server. 1773 01:13:23,310 --> 01:13:27,280 That's why the server can help the client amplify this effect. 1774 01:13:27,280 --> 01:13:29,120 If you were redesigning DNS from scratch, 1775 01:13:29,120 --> 01:13:30,970 and this was really your big concern, 1776 01:13:30,970 --> 01:13:33,510 then it'd probably be fairly straightforward to fix this. 1777 01:13:33,510 --> 01:13:36,200 The client has to send a request that 1778 01:13:36,200 --> 01:13:40,090 has extra padding bytes just there just wasting bandwidth. 1779 01:13:40,090 --> 01:13:42,610 And then the server is going to respond back 1780 01:13:42,610 --> 01:13:44,880 with a response that's at most as big as that. 1781 01:13:44,880 --> 01:13:46,400 And if you want a response that's bigger, maybe 1782 01:13:46,400 --> 01:13:48,858 the server will say, sorry, your padding wasn't big enough. 1783 01:13:48,858 --> 01:13:49,780 Send me more padding. 1784 01:13:49,780 --> 01:13:53,300 And this way, you guarantee that the DNS server cannot be used 1785 01:13:53,300 --> 01:13:58,676 ever to amplify these kinds of bandwidth attacks. 1786 01:13:58,676 --> 01:14:00,050 Actually, these kinds of problems 1787 01:14:00,050 --> 01:14:02,390 happen also at higher levels as well. 
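The padding idea for a redesigned DNS can be sketched as a single server-side check. This is a hypothetical protocol, not real DNS: the function name, the return tags, and the choice to report the required length back to the client are assumptions made for illustration.

```python
def respond(request_len: int, answer: bytes):
    """Answer only if the padded request was at least as large as the answer.

    Returns ("answer", data) when the answer fits within the request size,
    otherwise ("need_padding", required_len) so the client can retry with
    a bigger request.
    """
    if len(answer) <= request_len:
        return ("answer", answer)
    # The refusal carries only a number, so it is smaller than the request.
    return ("need_padding", len(answer))


records = b"x" * 1000  # stand-in for a 1,000-byte set of records

first_try = respond(100, records)    # 100-byte request, too little padding
second_try = respond(1000, records)  # request padded up to the answer size

print(first_try[0], second_try[0])
```

With this rule a spoofed 100-byte query can cause at most a short refusal to be sent to the victim, so the server never sends more bytes than it received.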
1788 01:14:02,390 --> 01:14:04,660 So in web applications, you often 1789 01:14:04,660 --> 01:14:07,505 have web services that do lots and lots of computation 1790 01:14:07,505 --> 01:14:08,825 on behalf of a single request. 1791 01:14:08,825 --> 01:14:11,200 And there's often denial of service attacks at that level 1792 01:14:11,200 --> 01:14:15,200 where adversaries know that a certain operation is very 1793 01:14:15,200 --> 01:14:17,290 expensive, and they'll just ask for that operation 1794 01:14:17,290 --> 01:14:18,940 to be done over and over again. 1795 01:14:18,940 --> 01:14:22,520 And unless you carefully design your protocol and application 1796 01:14:22,520 --> 01:14:24,610 to allow the client to prove that, oh, I'm 1797 01:14:24,610 --> 01:14:28,670 burning at least as much work as you, or something like this, 1798 01:14:28,670 --> 01:14:32,803 then it's hard to defend against these things as well. 1799 01:14:32,803 --> 01:14:34,760 Make sense? 1800 01:14:34,760 --> 01:14:36,950 All right, so I guess the last thing 1801 01:14:36,950 --> 01:14:38,990 I want to briefly touch on about the paper 1802 01:14:38,990 --> 01:14:41,150 we talked about as well is these routing attacks. 1803 01:14:41,150 --> 01:14:43,120 And the reason these attacks are interesting 1804 01:14:43,120 --> 01:14:46,740 is they're maybe popping up a level above these protocol 1805 01:14:46,740 --> 01:14:48,200 transport level issues. 1806 01:14:48,200 --> 01:14:50,710 And look at what goes wrong in an application. 1807 01:14:50,710 --> 01:14:52,960 And the routing protocol is a particularly interesting 1808 01:14:52,960 --> 01:14:53,460 example. 1809 01:14:53,460 --> 01:14:56,360 Because it's often the place where 1810 01:14:56,360 --> 01:14:58,670 trust and sort of initial configuration gets 1811 01:14:58,670 --> 01:15:01,230 bootstrapped in the first place. 1812 01:15:01,230 --> 01:15:04,200 And it's easy to sort of get that wrong. 
1813 01:15:04,200 --> 01:15:07,800 And even today, there aren't great authentication mechanisms 1814 01:15:07,800 --> 01:15:08,790 for that. 1815 01:15:08,790 --> 01:15:11,560 Perhaps the clearest example is the DHCP protocol 1816 01:15:11,560 --> 01:15:13,660 that all of you guys use when you open a computer 1817 01:15:13,660 --> 01:15:16,017 or connect to some wireless or wired network. 1818 01:15:16,017 --> 01:15:17,850 The computer just sends out a packet saying, 1819 01:15:17,850 --> 01:15:20,370 I want an IP address and other stuff. 1820 01:15:20,370 --> 01:15:23,900 And some DHCP server at MIT typically receives that packet 1821 01:15:23,900 --> 01:15:27,957 and sends you back, here's an IP address that you should use. 1822 01:15:27,957 --> 01:15:29,790 And also here's a DNS server you should use, 1823 01:15:29,790 --> 01:15:33,030 and other interesting configuration data. 1824 01:15:33,030 --> 01:15:35,980 And the problem is that the DHCP request packet is just 1825 01:15:35,980 --> 01:15:37,990 broadcast on the local network trying 1826 01:15:37,990 --> 01:15:39,040 to reach the DHCP server. 1827 01:15:39,040 --> 01:15:40,350 Because you actually don't know what 1828 01:15:40,350 --> 01:15:41,934 the DHCP server is going to be ahead of time. 1829 01:15:41,934 --> 01:15:44,433 You're just plugging into the network, the first time you've 1830 01:15:44,433 --> 01:15:45,310 been here, let's say. 1831 01:15:45,310 --> 01:15:47,970 And your client doesn't know what else to do 1832 01:15:47,970 --> 01:15:49,770 or who to trust. 1833 01:15:49,770 --> 01:15:52,890 And consequently, any machine on the local network 1834 01:15:52,890 --> 01:15:54,660 could intercept these DHCP requests 1835 01:15:54,660 --> 01:15:56,734 and respond back with any IP address 1836 01:15:56,734 --> 01:15:59,150 that the client could use, and also maybe tell the client, 1837 01:15:59,150 --> 01:16:01,525 hey you should use my DNS server instead of the real one.
1838 01:16:01,525 --> 01:16:03,774 And then you could intercept those future DNS requests 1839 01:16:03,774 --> 01:16:04,890 from the client and so on. 1840 01:16:04,890 --> 01:16:06,900 That make sense? 1841 01:16:06,900 --> 01:16:09,640 So I think these protocols are fairly tricky to get right. 1842 01:16:09,640 --> 01:16:12,300 And on a global scale, the protocols like BGP 1843 01:16:12,300 --> 01:16:14,940 allow any participant to announce a particular IP 1844 01:16:14,940 --> 01:16:18,505 address prefix for the world to sort of know about 1845 01:16:18,505 --> 01:16:21,200 and route packets toward the attacker. 1846 01:16:21,200 --> 01:16:25,053 There's certainly been attacks where some router participating 1847 01:16:25,053 --> 01:16:29,546 in BGP says, oh, I'm a very quick way 1848 01:16:29,546 --> 01:16:31,409 to reach this particular IP address range. 1849 01:16:31,409 --> 01:16:32,950 And then all the routers in the world 1850 01:16:32,950 --> 01:16:36,090 say, OK, sure, we'll send those packets to you. 1851 01:16:36,090 --> 01:16:40,330 And probably the most frequent abuse of this 1852 01:16:40,330 --> 01:16:42,432 is by spammers who want to send spam, 1853 01:16:42,432 --> 01:16:44,720 but their old IP addresses are blacklisted everywhere, 1854 01:16:44,720 --> 01:16:46,000 because they are sending spam. 1855 01:16:46,000 --> 01:16:47,910 So they just pick some random IP address. 1856 01:16:47,910 --> 01:16:50,332 They announce that, oh yeah, this IP address is now here. 1857 01:16:50,332 --> 01:16:52,290 And then they sort of announce this IP address, 1858 01:16:52,290 --> 01:16:54,080 send spam from it, and then disconnect. 1859 01:16:54,080 --> 01:16:57,935 And it gets abused a fair amount this way. 1860 01:16:57,935 --> 01:17:00,382 It's sort of getting less now. 1861 01:17:00,382 --> 01:17:01,590 But it's kind of hard to fix. 
1862 01:17:01,590 --> 01:17:04,560 Because in order to fix it, you have 1863 01:17:04,560 --> 01:17:07,434 to know whether someone really owns that IP address or not. 1864 01:17:07,434 --> 01:17:09,100 And it's hard to do without establishing 1865 01:17:09,100 --> 01:17:12,100 some global database of, maybe, cryptographic keys 1866 01:17:12,100 --> 01:17:13,890 for every ISP in the world. 1867 01:17:13,890 --> 01:17:16,510 And it takes quite a bit of effort by someone 1868 01:17:16,510 --> 01:17:18,080 to build this database. 1869 01:17:18,080 --> 01:17:20,640 The same actually applies to DNSSEC as well. 1870 01:17:20,640 --> 01:17:23,350 In order to know which signature to look for in DNS, 1871 01:17:23,350 --> 01:17:25,690 you have to have a cryptographic key associated 1872 01:17:25,690 --> 01:17:27,420 with every entity in the world. 1873 01:17:27,420 --> 01:17:28,677 And it's not there now. 1874 01:17:28,677 --> 01:17:30,010 Maybe it'll get built up slowly. 1875 01:17:30,010 --> 01:17:34,910 But it's certainly one big problem for adopting DNSSEC. 1876 01:17:34,910 --> 01:17:37,542 All right, so I guess the thing to take away from this 1877 01:17:37,542 --> 01:17:39,500 is maybe just a bunch of lessons about what not 1878 01:17:39,500 --> 01:17:41,200 to do in general in protocols. 1879 01:17:41,200 --> 01:17:43,074 But also actually one thing I want to mention 1880 01:17:43,074 --> 01:17:46,307 is that while probably secrecy and integrity are 1881 01:17:46,307 --> 01:17:48,390 good properties to enforce at higher levels 1882 01:17:48,390 --> 01:17:50,637 of abstraction, like in cryptographic protocols 1883 01:17:50,637 --> 01:17:53,220 in the application-- and we'll look at that in next lectures-- 1884 01:17:53,220 --> 01:17:55,303 one thing that you really do want from the network 1885 01:17:55,303 --> 01:17:57,384 is some sort of availability and DoS resistance.
1886 01:17:57,384 --> 01:17:59,050 Because these properties are much harder 1887 01:17:59,050 --> 01:18:00,850 to achieve at higher levels in the stack. 1888 01:18:00,850 --> 01:18:02,266 So you really want to avoid things 1889 01:18:02,266 --> 01:18:04,710 like maybe these amplification attacks, maybe 1890 01:18:04,710 --> 01:18:09,250 these SYN flooding attacks, maybe these RST attacks 1891 01:18:09,250 --> 01:18:11,680 where you can shoot down an arbitrary person's connection. 1892 01:18:11,680 --> 01:18:14,096 These are things that are really damaging at the low level 1893 01:18:14,096 --> 01:18:16,190 and that are hard to fix higher up. 1894 01:18:16,190 --> 01:18:19,155 But the integrity and confidentiality you 1895 01:18:19,155 --> 01:18:20,780 can more or less solve with encryption. 1896 01:18:20,780 --> 01:18:23,310 And we'll talk about how we do that in the next lecture 1897 01:18:23,310 --> 01:18:23,910 on Kerberos. 1898 01:18:23,910 --> 01:18:25,760 See you guys then.