So I'm back. I guess by your presence here you've shown that for you people, DP1 is well under control. And that's great.

OK, so today is the last lecture on this topic, the networking piece of 6.033. And the topic for today is something called congestion control. We're going to spend most of today talking about congestion control, but let me remind you of where we are in networking. If there's one thing you take away from the networking piece of 6.033, you should remember this picture.

So the way we're dealing with networking, and the way we always deal with networking in any complicated system, is to layer network protocols. And the particular layering model that we picked for 6.033 is a subset of what you'd see out in the real world, and it's mostly accurate. There is something called the link layer, which deals with transmitting packets on the link. And on top of that, you have the network layer. And the particular kind of network layer that we are talking about is a very popular one which provides a kind of service called a best-effort service.
And the easiest way to understand best-effort service is: it just tries to get packets through from one end to another. But it doesn't get all panicked if it can't get packets through. It just lets a higher layer, called the end-to-end layer, deal with any problems such as lost packets, corrupted packets, reordered packets, and so on.

Last time, when Sam gave the lecture, he talked about a few things that the end-to-end layer does. In particular, we talked about how the end-to-end layer achieves reliability using acknowledgments. The receiver acknowledges packets that it receives from the sender, and the sender, if it misses an acknowledgment, goes ahead and retransmits the packet. And in order to do that, we spent some time talking about timers, because you don't want to retransmit immediately. You have to wait some time, until you're pretty sure that the packet's actually lost, before you go ahead and retransmit the packet.
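That timer-and-retransmit idea can be sketched in a few lines. This is a minimal stop-and-wait model for illustration only; the function name, the loss probabilities, and the simple "ACK didn't come back this try, so the timer fired" model are made up, not from the lecture.

```python
import random

def stop_and_wait(packets, loss_rate=0.3, max_tries=10, seed=42):
    """Deliver packets reliably over a lossy link: send one packet,
    wait for its ACK, and retransmit on a timeout until it gets through."""
    rng = random.Random(seed)
    delivered = []
    for pkt in packets:
        for _ in range(max_tries):
            arrived = rng.random() > loss_rate            # data packet may be lost
            acked = arrived and rng.random() > loss_rate  # the ACK may be lost too
            if acked:
                delivered.append(pkt)
                break
            # Timer fires: the sender can't tell a lost packet from a
            # lost ACK, so it simply retransmits the same packet.
    return delivered

print(stop_and_wait(["p1", "p2", "p3"]))
```

Note that the sender retransmits on a missing ACK whether the data or the ACK was lost; it has no way to tell the difference, which is exactly why the timeout has to be conservative.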
The next concept at the end-to-end layer that we talked about is something called a sliding window. The idea was that if I sent you a packet, got an acknowledgment back, and only then sent you the next packet, things would be really slow. And all that a sliding window is, is really an idea that you've already seen in an earlier chapter, and probably in 6.004, called pipelining: have multiple outstanding things in the pipe, or in the network, at once, as a way to get higher performance.

And the last thing that we talked about last time was flow control. The idea here is to make sure that the sender doesn't send too fast, because if it sent really fast, it would swamp the receiver, which might be slow trying to keep up processing the sender's packets. So you don't want to swamp the receiver's buffer. And we talked about how, with every acknowledgment, the receiver can piggyback some information about how much space it has remaining in its buffer. And if that clamped down to zero, then the sender would automatically slow down. You would produce what's also known as back pressure back to the sender.
The receiver is telling the sender: I don't have any more space, so slow down. And the sender guarantees that it won't have more packets, more data, outstanding at any given time than what the receiver says it can handle in its buffer.

So those are the things we talked about so far for the end-to-end layer. And what we're going to do today is go back to one of the main things about networks that was mentioned during the first networking lecture, and talk about how we achieve it, which is sharing.

Ultimately, it's extremely inefficient to build networks where every computer is connected to every other computer in the world by a dedicated link or a dedicated path of its own. Fundamentally, networks are efficient only if they allow the computers connected to them to share paths underneath. And the moment you have sharing of links, you have to worry about sharing the resources, namely sharing the bandwidth of a link. And that's what we're going to do today.
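The sliding window, bounded by the receiver's advertised limit, can be sketched as a little trace. This is an idealized lossless model for illustration; the function name and the one-ACK-per-round assumption are mine, not the lecture's.

```python
def sliding_window_trace(n_packets, window):
    """Trace a sender that keeps at most `window` unacknowledged packets
    in flight over an idealized lossless path; each round, the oldest
    outstanding packet is acknowledged, sliding the window forward."""
    next_seq, acked, events = 0, 0, []
    while acked < n_packets:
        # Send while the window (the receiver's advertised limit) has room.
        while next_seq < n_packets and next_seq - acked < window:
            events.append(f"send {next_seq}")
            next_seq += 1
        # One round trip later, an ACK arrives and opens the window.
        events.append(f"ack {acked}")
        acked += 1
    return events

print(sliding_window_trace(4, window=2))
```

With window=1 this degenerates to the slow send-one-wait-one pattern; a larger window keeps the pipe full, which is the whole point of pipelining. And if the receiver advertises a window of zero, the inner loop never runs: that is the back pressure.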
Basically, the goal for today is to talk about the problems that arise if you don't do sharing properly, and then spend some time talking about how we solve those problems.

So imagine you have a network. I'm going to start with a simple example, and we're going to use that example throughout, because it will turn out that the simple example illustrates the essential problem with sharing. So imagine you have a bunch of computers connected to one end of the network, and at the other end you have other computers. Imagine that these are senders and these are receivers, and they share the network. A really simple form of sharing this network might be when all of these computers take their links, like their Ethernet connections, and hook them up to a switch. And maybe you hook it up to another switch. And then there are other paths that eventually take you to the receivers that you want to talk to.

And imagine, just for example, some link rates. So, for example, these might be 100 megabit per second links.
And then you go [SOUND OFF/THEN ON], and that might be, let's say, a megabit per second link. So this might be 1 megabit per second, and these things are 100 megabits per second. And of course you could have the receivers connected with extremely fast links as well. Which means that if the sender and the receiver just looked at their own access links, these 100 megabit per second links, and thought to themselves, well, I have a 100 megabit per second link, the sender has a 100 megabit per second link, so clearly we could exchange data at 100 megabits per second, that would be flawed, because all of these things go through this relatively thin pipe of a megabit per second.

So really, the goal here is to take all of these connections, all of these end-to-end transfers that might be happening at any given point in time, and share every link in the network properly. And I'll define what I mean by properly as we go along.

So let's write some notation first, because it will help us pose the problem clearly. Let's say that there is some offered load that these different senders offer to the network.
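The thin-pipe point can be stated in one line: an end-to-end transfer can go no faster than the slowest link on its path, no matter how fast the access links are. A trivial sketch (the helper name is made up for illustration):

```python
def path_throughput(link_rates_mbps):
    """An end-to-end transfer is capped by the slowest link on its
    path: the bottleneck link."""
    return min(link_rates_mbps)

# 100 Mb/s access links on both ends, a 1 Mb/s link in the middle:
print(path_throughput([100, 1, 100]))  # → 1
```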
So you've decided to download a bunch of files, music files and web pages. And each of those has a certain load; it's a certain size of file that a sender can actually push through at some rate. So that's the offered load on the network. And let's label the senders 1 through N, and say that the offered load of the first sender is L1, this is L2, all the way through LN. So in this simple picture, the total offered load on the network along this path is the summation of the Li, where i runs from 1 to N, right?

And now, in this simple picture, we have all this offered load going through this one common link, which we're going to call the bottleneck link. The senders don't actually know where the bottleneck is, because in general it's not near them. It could be, but in general, they don't know where it is. The receivers don't know where it is either. But there is some bottleneck link that's going to throttle the maximum rate at which you could send data. Let's call that link's rate C.
So what we would like to be able to do is ensure, at all points in time, that the sum of the load being offered by all senders that share any given link is less than the capacity of that link. And so, in this picture, this would be C. And notice that we don't actually know where these things are. For different connections, if you are surfing the web and downloading music and so on, the actual bottleneck for each of those transfers might in general be different. But the general goal for congestion control is: you look at any path between a sender and a receiver, and there is, in general, some bottleneck there. And you have to ensure that for every bottleneck link, in fact for every link, the total offered load presented to that link satisfies this relationship. OK, so that's the ultimate goal. [SOUND OFF/THEN ON], OK? [SOUND OFF/THEN ON] one other problem, that you want this.

This is not a trivial problem. One reason it's not trivial is something that might have been clear to you from the definition of the problem: you want to enforce this for all the links in the network.
And the network in general might have, or will have, millions of links and hundreds of millions of hosts. So any solution you come up with has to be scalable. In particular, it has to scale to really large networks, networks that are as big as the public Internet, and the whole public Internet ten years from now. So it has to be something that scales. It has to handle large networks. It has to handle large values of N, where N is the number of end-to-end connections transferring data on this network. And it's an unknown value of N: you don't really know what N is at any given point in time, and it could be extremely large. So you have to handle that.

And above all, and this is actually the most important point, one that's often missed by a lot of descriptions of congestion control and sharing: a good solution has to scale across many orders of magnitude of link properties. On the Internet, or in any decent packet-switched network, link rates vary by perhaps seven, or eight, or nine orders of magnitude.
A single network could have links that send data at five or ten kilobits per second on one extreme, and 10 gigabits per second, or 40 gigabits per second, on the other extreme. And a single path might actually go through links whose capacities vary by many orders of magnitude. It's extremely important that a solution work across this whole range, because otherwise it isn't all that general, and what you would like is a very general solution.

The second reason this problem is hard is that here we have this N, but really N is a function of time. You're surfing the web. You click on a link, and all of a sudden 18 embedded objects come through. In general, each of those is a different connection. And then it's gone. And the next time somebody else clicks on something, there's a bunch more objects. So N varies with time.

And as a consequence, the offered load on the system, both the total offered load as well as the load offered by a single connection, varies with time. So Li varies with time as well.
So it's really hard to even talk about the steady-state behavior of a large network like the Internet, because there is no steady-state behavior.

And [SOUND OFF/THEN ON] is the thing that we're going to come back to, and [SOUND OFF/THEN ON] the one who can control the rate at which data is being sent is the sender. And maybe the receiver can help it by advertising these flow-control windows. So the control happens at these points. But if you think about it, the resource that's being shared is somewhere else. It's far away, right? So there is a delay between the congestion points, the points in the network where overload happens, and the points where control can be exercised to deal with that congestion.

And this situation is extremely different from most other resource-management schemes in computer systems. If you talk about processor resource management, well, there's a scheduler that controls what runs on the processor, and the processor is right there. There's no long delay between the two.
So if the operating system decides to schedule another process, well, it doesn't take a long time before that happens. It happens pretty quickly. And nothing else related to some other process usually happens in the meantime. If you talk about disk scheduling, well, the disk scheduler is very, very close to the disk. It can make a decision, and it exercises control according to whatever policy it wants.

Here, if the network decides it's congested, and there is some plan by which the network wants to react to that load, well, it can't do very much by itself. It has to somehow arrange for feedback to reach the points of control, which are in general far away. And they're not just far away in terms of delay; that delay varies as well. That's what makes the problem hard: the resource is far from the control point. So these are the congestion points, which is where the resources are, and these are the control points, which is where you can exercise control. And the two are separated geographically.

I don't think I mentioned this: this is what you want.
And any situation where the summation of the Li, the offered load on a link, is larger than the capacity, that overload situation is what we're going to call congestion. OK, so any time you see the sum of the Li bigger than C, we're going to define that as congestion. I'll make this notion a little bit more precise later, and you'll see why we have to. But for now, just say that if the inequality is reversed, the link is congested.

OK, so that's the problem, and we're going to want to solve it. Now, every other problem that we've encountered in networking so far, we've solved by going back to that layered picture. For example, if you want to deliver packets across one link, we say, all right, we'll take this thing as a link-layer protocol. We'll define framing on top of it, and define a way by which we can do things like error correction if we need to on that link, and we'll solve the problem right there. Then we say, OK, we need to connect all these computers up together and build up routing tables so we can forward data. And we say, OK, that's the network layer's problem.
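The definition of congestion can be written down directly. A tiny sketch, with illustrative numbers; the function name is made up, and the units are megabits per second as in the running example:

```python
def is_congested(offered_loads, capacity):
    """Congestion, as defined here: the total offered load on a link,
    the sum of L_i for i = 1..N, exceeds the link's capacity C."""
    return sum(offered_loads) > capacity

# Three senders each offering 0.4 Mb/s share a 1 Mb/s bottleneck:
print(is_congested([0.4, 0.4, 0.4], capacity=1.0))  # → True
```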
We're going to solve the problem there. And then packets get lost? Well, we'll deal with it at the end-to-end layer. And that model has worked out extremely well, because it allows us to run arbitrary applications on top of this network layer, without having to build up forwarding tables anew for every application, and to run over all sorts of links. You can have paths containing a variety of different links, and everything works out like a charm.

But the problem with doing that for congestion control is that this layered picture is actually getting in the way of solving congestion control in a very clean manner. And the reason for that is that the end-to-end layer runs at the end points. And those are the points where control is exercised over the rate at which traffic is being sent onto the network. But the congestion, and any information about whether the network is overloaded, is deeply buried inside the network, at the network layer.
So what you need is a way by which information from the network layer, about whether congestion is occurring, or not occurring, or is likely to occur even though it hasn't yet occurred, can somehow percolate up to the end-to-end layer. And so far, we've modularized this very nicely by not having very much information propagate between the layers. But now, to solve this problem of congestion, precisely because of this separation between the resource and the control point, with the control point at the end-to-end layer (at least in the way we're going to solve the problem) and the resource at the network layer, we need a cross-layer solution.

So somehow we need information to move between the two layers. So here's the general plan. We're going to arrange for the end-to-end layer at the sender to send at a rate. This is going to be a rate that changes with time. But let's say that the sender at some point in time sends at a rate Ri, where Ri is measured in bits per second. And that's actually true here, too.
In case it wasn't clear, these loads are in bits per second, and the capacity is in bits per second as well. So the plan is for the sender to send at a varying rate Ri, and for all of the switches inside the network to keep track, in some fashion, of whether they are congested or not. And it's going to be really simple after that. If the senders are sending too fast, or if a given sender is sending too fast, then we're going to tell them to slow down. The network layer is somehow going to tell the sender to slow down. And likewise, if they are sending too slow (and we have yet to figure out how you know that you're sending too slow, and that you could send a little faster), there's going to be some plan by which the sender can speed up.

And the nature of our solution is that there is really not going to be a steady rate at which senders send. In fact, by the definition of the problem, a steady rate is a fool's errand, because N varies and the load varies. So that means on any given link, the traffic varies with time.
So you really don't want a static rate. What you would like is: if there's any extra capacity, and a sender has load to fill that capacity, you want that sender to use it. So this rate is going to adapt and change with time, in response to some feedback that we're going to obtain, essentially from the network layer, about whether people are sending too fast or too slow.

OK, and all congestion control schemes that people have come up with, all algorithms for solving this problem (and there have been dozens, if not a few hundred, of them, in various variants of various solutions), all of them follow this basic plan. The network layer gives some feedback. If it's too fast, slow down. If it's too slow, speed up.

The analogy is a little bit like a water pipeline network. Imagine you have all sorts of pipes feeding water in. You have this massive pipeline network, and what you can control are the valves at the end points. And by controlling the valves, you can decide whether to let water rush in or not.
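One concrete instance of the slow-down/speed-up plan can be sketched with made-up constants: decrease sharply when congestion is signaled, and otherwise probe gently for spare capacity. The halving and the additive step here are illustrative choices, though as it happens this additive-increase, multiplicative-decrease shape is the one TCP's congestion control uses in practice.

```python
def adapt_rate(rate, congested, step=0.5):
    """One step of the basic plan: on congestion feedback, slow down
    sharply (here, halve the rate); with no congestion, probe for
    spare capacity by adding a small increment."""
    return rate / 2 if congested else rate + step

rate, trace = 1.0, []
for congested in [False, False, False, True, False]:
    rate = adapt_rate(rate, congested)
    trace.append(rate)
print(trace)  # → [1.5, 2.0, 2.5, 1.25, 1.75]
```

Decreasing much faster than you increase is deliberate: overshooting the capacity hurts everyone sharing the bottleneck, while undershooting only costs the one sender a little throughput.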
420 00:19:39,970 --> 00:19:41,680 And anytime you're getting clogged, 421 00:19:41,680 --> 00:19:44,030 you have to slow down and close the valves. 422 00:19:47,605 --> 00:19:49,230 So the devil's actually in the details. 423 00:19:49,230 --> 00:19:51,720 So we're going to dive in to actually seeing how we're 424 00:19:51,720 --> 00:19:53,560 going to solve this problem. 425 00:19:53,560 --> 00:19:56,477 And the first component of any solution to congestion control 426 00:19:56,477 --> 00:19:57,810 is something we've already seen. 427 00:20:01,700 --> 00:20:04,090 And it's buffering. 428 00:20:04,090 --> 00:20:06,670 Any network that has asynchronous multiplexing, 429 00:20:06,670 --> 00:20:16,240 which Sam talked about the first time, any network like that 430 00:20:16,240 --> 00:20:20,330 has the following behavior: at any given point in time you 431 00:20:20,330 --> 00:20:24,552 might have multiple packets arrive into a given switch. 432 00:20:24,552 --> 00:20:26,260 And, the switch can only send one of them 433 00:20:26,260 --> 00:20:29,410 out on an outgoing link at any given point in time, which 434 00:20:29,410 --> 00:20:32,280 means if you weren't careful, you 435 00:20:32,280 --> 00:20:35,340 would have to drop the other packet that showed up. 436 00:20:35,340 --> 00:20:36,880 And so, almost everyone who builds 437 00:20:36,880 --> 00:20:39,530 a decent asynchronously multiplexed network 438 00:20:39,530 --> 00:20:43,050 puts in some queues to buffer packets 439 00:20:43,050 --> 00:20:47,300 until the link is free so they can then send the packet out 440 00:20:47,300 --> 00:20:49,380 onto the link. 441 00:20:49,380 --> 00:20:51,170 So the key question when it comes 442 00:20:51,170 --> 00:20:54,847 to buffering that you have to ask is: how much? 443 00:20:54,847 --> 00:20:57,180 I don't actually mean how much in terms of how expensive 444 00:20:57,180 --> 00:20:59,704 it is, but in terms of how much should the buffering be? 
445 00:20:59,704 --> 00:21:01,370 Should you have one packet of buffering? 446 00:21:01,370 --> 00:21:02,040 Two packets? 447 00:21:02,040 --> 00:21:03,140 Four packets? 448 00:21:03,140 --> 00:21:05,860 And how big should the buffer be? 449 00:21:05,860 --> 00:21:11,860 Well, one way to answer the question of how much is, 450 00:21:11,860 --> 00:21:14,592 first you can ask, what happens if it's too little? 451 00:21:14,592 --> 00:21:17,050 So what happens if the buffering is too little in a switch? 452 00:21:20,867 --> 00:21:22,450 Like you put one packet or two packets 453 00:21:22,450 --> 00:21:23,574 of buffering: what happens? 454 00:21:28,070 --> 00:21:31,270 I'm sorry, what? 455 00:21:31,270 --> 00:21:33,510 Well, congestion by definition has happened when 456 00:21:33,510 --> 00:21:34,830 the load exceeds the capacity. 457 00:21:34,830 --> 00:21:36,270 So you get congestion. 458 00:21:36,270 --> 00:21:38,300 But what's the effect of that? 459 00:21:38,300 --> 00:21:39,960 So a bunch of packets show up. 460 00:21:39,960 --> 00:21:42,370 You've got two packets of buffering. 461 00:21:42,370 --> 00:21:43,820 So you could lose packets. 462 00:21:43,820 --> 00:21:45,720 This is pretty clear, right? 463 00:21:45,720 --> 00:21:46,320 Good. 464 00:21:46,320 --> 00:21:47,780 So if it's too little, what ends up 465 00:21:47,780 --> 00:21:49,680 happening is that you drop packets, 466 00:21:49,680 --> 00:21:54,339 which suggests that you can't have too little buffering, OK? 467 00:21:54,339 --> 00:21:55,880 So, at the other end of the spectrum, 468 00:21:55,880 --> 00:21:57,570 you could just say, well, I'm going 469 00:21:57,570 --> 00:22:00,430 to design my network so it never drops a packet. 470 00:22:00,430 --> 00:22:01,240 Memory is cheap. 471 00:22:01,240 --> 00:22:02,740 We learned about Moore's Law. 472 00:22:02,740 --> 00:22:07,900 So let's just over provision, also called too much buffering. 
473 00:22:07,900 --> 00:22:11,324 OK, it's not really that expensive. 474 00:22:11,324 --> 00:22:13,240 So what happens if there's too much buffering? 475 00:22:13,240 --> 00:22:16,580 Well, if you think about it, the only thing 476 00:22:16,580 --> 00:22:19,400 that happens when you have too much buffering is, well, packets 477 00:22:19,400 --> 00:22:20,580 won't get lost. 478 00:22:20,580 --> 00:22:22,780 But all you've done is traded off 479 00:22:22,780 --> 00:22:24,580 packet loss for packet delay. 480 00:22:27,190 --> 00:22:31,115 Adding more buffering doesn't make your link go any faster. 481 00:22:31,115 --> 00:22:32,990 They should probably learn that at Disneyland 482 00:22:32,990 --> 00:22:33,939 and places like that. 483 00:22:33,939 --> 00:22:35,730 I mean, these lines there are just so long. 484 00:22:35,730 --> 00:22:36,660 I mean, they may as well tell people 485 00:22:36,660 --> 00:22:38,050 to go away and come back later. 486 00:22:38,050 --> 00:22:39,050 It's the same principle. 487 00:22:39,050 --> 00:22:41,590 I mean, just adding longer lines and longer queues 488 00:22:41,590 --> 00:22:44,960 doesn't mean that the link goes any faster. 489 00:22:44,960 --> 00:22:48,390 So it's really problematic to add excessive buffers. 490 00:22:48,390 --> 00:22:50,500 And the reason is quite subtle. 491 00:22:50,500 --> 00:22:53,250 The reason actually has to do with something 492 00:22:53,250 --> 00:22:55,500 we talked about the last time, or at least one reason, 493 00:22:55,500 --> 00:23:00,100 a critical reason, has to do with timers and retransmissions. 494 00:23:00,100 --> 00:23:03,070 Recall that all you do when you have too much buffering 495 00:23:03,070 --> 00:23:06,080 is you eliminate packet loss or at least reduce it greatly 496 00:23:06,080 --> 00:23:08,970 at the expense of increasing delays. 
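The loss-versus-delay tradeoff just described can be sketched with a toy drop-tail queue. This is only an illustration, not anything presented in the lecture: the burst pattern, drain rate, and buffer sizes are made-up numbers.

```python
from collections import deque

def simulate_link(burst_sizes, buffer_pkts, rate_pps):
    """Toy drop-tail queue: each time slot, a burst of packets arrives
    and the outgoing link drains up to rate_pps packets.
    Returns (packets dropped, worst-case queueing delay in seconds)."""
    q = deque()
    drops = 0
    max_q = 0
    for burst in burst_sizes:
        for _ in range(burst):
            if len(q) < buffer_pkts:
                q.append(1)
            else:
                drops += 1          # buffer full: the packet is lost
        max_q = max(max_q, len(q))  # longest queue seen so far
        for _ in range(min(rate_pps, len(q))):
            q.popleft()             # link sends one packet per drain step
    return drops, max_q / rate_pps  # delay = queue length / drain rate

bursts = [30, 30, 0, 0, 0, 0]       # two bursts, then silence
print(simulate_link(bursts, buffer_pkts=4, rate_pps=10))    # drops packets, tiny delay
print(simulate_link(bursts, buffer_pkts=1000, rate_pps=10)) # no drops, long delay
```

With the tiny buffer, most of each burst is lost but nothing waits long; with the huge buffer, nothing is lost but the last packet sits in the queue for seconds. That is exactly the tradeoff: buffering converts loss into delay.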
497 00:23:08,970 --> 00:23:10,700 But the problem with increasing delays 498 00:23:10,700 --> 00:23:15,090 is that your timers that you're trying to set up to figure out 499 00:23:15,090 --> 00:23:17,930 when to retransmit a packet, you would like them 500 00:23:17,930 --> 00:23:19,280 to adapt to increasing delays. 501 00:23:19,280 --> 00:23:20,790 So you build this exponentially weighted 502 00:23:20,790 --> 00:23:23,040 moving average, and you pick a timeout interval that's 503 00:23:23,040 --> 00:23:26,640 based on the mean value and the standard deviation. 504 00:23:26,640 --> 00:23:28,360 And the problem with too much buffering 505 00:23:28,360 --> 00:23:30,550 is that it makes these adaptive timers extremely 506 00:23:30,550 --> 00:23:35,190 hard to implement because your timeout 507 00:23:35,190 --> 00:23:38,662 value has to depend on both the mean and standard deviation. 508 00:23:38,662 --> 00:23:40,120 And if you have too much buffering, 509 00:23:40,120 --> 00:23:45,020 the range of round-trip time values is too high. 510 00:23:45,020 --> 00:23:47,840 And as a result of that you end up 511 00:23:47,840 --> 00:23:49,680 with this potential for something 512 00:23:49,680 --> 00:23:50,850 called congestion collapse. 513 00:23:55,620 --> 00:24:09,930 And let me explain this with a picture. 514 00:24:14,000 --> 00:24:17,360 So your adaptive timers are trying 515 00:24:17,360 --> 00:24:21,270 to estimate the round-trip time: the average round-trip time 516 00:24:21,270 --> 00:24:23,470 and the standard deviation, or rather the linear deviation, 517 00:24:23,470 --> 00:24:26,700 of the round-trip time. And at some point they're 518 00:24:26,700 --> 00:24:29,200 going to make a decision as to whether to retransmit 519 00:24:29,200 --> 00:24:33,120 a packet or not. 
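The adaptive timer being described can be sketched in a few lines. The particular gains (1/8 and 1/4) and the multiplier 4 are the classic TCP choices, assumed here for illustration; the lecture only says the timeout is based on the mean and the deviation.

```python
class AdaptiveTimer:
    """Exponentially weighted moving averages of the round-trip time
    and its (linear) deviation; timeout = smoothed mean + 4 * deviation.
    Gains 1/8 and 1/4 and the multiplier 4 are the classic TCP values,
    assumed for illustration."""
    def __init__(self, first_sample_s):
        self.srtt = first_sample_s          # smoothed RTT estimate
        self.rttvar = first_sample_s / 2    # smoothed deviation estimate

    def update(self, sample_s):
        err = sample_s - self.srtt
        self.srtt += err / 8                         # EWMA of the mean
        self.rttvar += (abs(err) - self.rttvar) / 4  # EWMA of the deviation
        return self.timeout()

    def timeout(self):
        return self.srtt + 4 * self.rttvar

# Steady 100 ms samples make the timeout tighten toward 100 ms.
# A bufferbloated path with wildly varying RTTs inflates rttvar,
# and the timeout balloons: exactly the tuning problem described above.
```

With steady round-trip times the deviation decays and the timeout hugs the mean; with the huge RTT range that excess buffering creates, the timeout either balloons or, worse, fires while the packet is merely queued.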
520 00:24:33,120 --> 00:24:36,490 What might happen, and what does happen 521 00:24:36,490 --> 00:24:39,670 and has happened when you have networks with too 522 00:24:39,670 --> 00:24:41,160 much buffering, is that you end up 523 00:24:41,160 --> 00:24:43,640 with a queue that's really big. 524 00:24:43,640 --> 00:24:47,440 OK, and this is some link on which the packet's going out, 525 00:24:47,440 --> 00:24:49,612 and you might have packet one sitting here, 526 00:24:49,612 --> 00:24:51,820 and two sitting here, and three, and all the way out. 527 00:24:51,820 --> 00:24:55,780 There is a large number of packets sitting there. 528 00:24:55,780 --> 00:24:59,150 Notice that the end to end sender 529 00:24:59,150 --> 00:25:01,460 still hasn't heard an acknowledgment 530 00:25:01,460 --> 00:25:04,050 for packet one. 531 00:25:04,050 --> 00:25:06,880 It's trying to decide whether one is still in transit, 532 00:25:06,880 --> 00:25:09,072 or has actually been dropped. 533 00:25:09,072 --> 00:25:10,530 And it should retransmit the packet 534 00:25:10,530 --> 00:25:12,842 only after one has been dropped. 535 00:25:12,842 --> 00:25:14,300 But if you have too much buffering, 536 00:25:14,300 --> 00:25:15,900 the range of these values is so high 537 00:25:15,900 --> 00:25:19,690 that it makes these adaptive timers quite difficult to tune. 538 00:25:19,690 --> 00:25:23,160 And the result often is that one is still sitting here. 539 00:25:23,160 --> 00:25:25,847 But it has been stuck behind a large number of packets. 540 00:25:25,847 --> 00:25:27,180 So the delay was extremely long. 541 00:25:27,180 --> 00:25:29,450 And the end to end sender timed out. 542 00:25:29,450 --> 00:25:34,250 And when it times out, it retransmits one into the queue. 
543 00:25:34,250 --> 00:25:36,280 And, soon after that, often it might 544 00:25:36,280 --> 00:25:38,750 retransmit two, and retransmit three, and retransmit four, 545 00:25:38,750 --> 00:25:39,190 and so on. 546 00:25:39,190 --> 00:25:41,440 And these packets are sort of just stuck in the queue. 547 00:25:41,440 --> 00:25:42,860 They're not actually lost. 548 00:25:42,860 --> 00:25:45,160 And if you think about what will happen then, 549 00:25:45,160 --> 00:25:48,467 this link, which is already a congested link, because queues 550 00:25:48,467 --> 00:25:51,050 have been building up here, long queues have been building up, 551 00:25:51,050 --> 00:25:56,160 this link is now starting to use more and more of its capacity 552 00:25:56,160 --> 00:26:01,190 to send the same packet twice, right, because it sent one out. 553 00:26:01,190 --> 00:26:03,150 And the sender retransmitted it thinking 554 00:26:03,150 --> 00:26:05,447 it was lost when it was just stuck behind a long queue. 555 00:26:05,447 --> 00:26:06,780 And now one is being sent again. 556 00:26:06,780 --> 00:26:08,405 And two is being sent again, and so on. 557 00:26:10,740 --> 00:26:13,490 And graphically, if you look at this, what you end up with 558 00:26:13,490 --> 00:26:15,970 is a picture that looks like the following. 559 00:26:15,970 --> 00:26:21,830 This picture plots the total offered load on the X axis, 560 00:26:21,830 --> 00:26:24,860 and the throughput of the system on the Y 561 00:26:24,860 --> 00:26:29,250 axis, where the throughput is defined 562 00:26:29,250 --> 00:26:32,770 as the number of useful bits per second that you get. 563 00:26:32,770 --> 00:26:36,590 So if you send the packet one twice, 564 00:26:36,590 --> 00:26:40,260 only one of those packets is actually useful. 
565 00:26:40,260 --> 00:26:42,226 Now, initially when the offered load is low 566 00:26:42,226 --> 00:26:43,600 and the network is not congested, 567 00:26:43,600 --> 00:26:46,100 and the offered load is less than the capacity of the link, 568 00:26:46,100 --> 00:26:49,370 this curve is just a straight line with slope one, 569 00:26:49,370 --> 00:26:52,260 right, because everything you offer 570 00:26:52,260 --> 00:26:53,500 is below the link's capacity. 571 00:26:53,500 --> 00:26:55,530 And it's going through. 572 00:26:55,530 --> 00:26:58,720 Now at some point, it hits the link's capacity, right, 573 00:26:58,720 --> 00:27:00,790 the bottleneck link's capacity. 574 00:27:00,790 --> 00:27:03,290 And after that, any extra offered load 575 00:27:03,290 --> 00:27:04,860 that you pump into the network is not 576 00:27:04,860 --> 00:27:06,249 going to go out any faster. 577 00:27:06,249 --> 00:27:08,290 The throughput is still going to remain the same. 578 00:27:08,290 --> 00:27:11,850 And, it's just going to be flat for a while. 579 00:27:11,850 --> 00:27:14,184 And the reason it's flat is that queues are building up. 580 00:27:14,184 --> 00:27:16,724 And, that's one reason that you can't send things any faster. 581 00:27:16,724 --> 00:27:19,340 The only thing that's going on is that queues are building up. 582 00:27:19,340 --> 00:27:21,550 So this curve remains flat. 583 00:27:21,550 --> 00:27:24,020 Now, in a decent network that doesn't have congestion 584 00:27:24,020 --> 00:27:27,720 collapse, if you don't do anything else, 585 00:27:27,720 --> 00:27:29,940 but somehow manage to keep the system working here, 586 00:27:29,940 --> 00:27:31,490 this curve might remain flat forever. 587 00:27:31,490 --> 00:27:33,280 No matter what the offered load, you know, 588 00:27:33,280 --> 00:27:35,196 you can pump more and more load in the system. 589 00:27:35,196 --> 00:27:38,820 And the throughput remains flat at the capacity. 
590 00:27:38,820 --> 00:27:40,682 But the problem is that it has interactions 591 00:27:40,682 --> 00:27:42,640 with things that the higher layers are doing, 592 00:27:42,640 --> 00:27:45,080 such as these retransmissions and timers. 593 00:27:45,080 --> 00:27:48,240 And, eventually, more and more of the capacity 594 00:27:48,240 --> 00:27:51,242 starts being used uselessly for these redundant transmissions. 595 00:27:51,242 --> 00:27:52,700 And you might end up in a situation 596 00:27:52,700 --> 00:27:55,850 where the throughput dies down. 597 00:27:55,850 --> 00:27:59,300 OK, and if that happens, this situation 598 00:27:59,300 --> 00:28:00,550 is called congestion collapse. 599 00:28:04,150 --> 00:28:08,670 There is more and more work being presented to the system. 600 00:28:08,670 --> 00:28:10,670 If you reach a situation where the actual amount 601 00:28:10,670 --> 00:28:12,940 of useful work that's being done starts 602 00:28:12,940 --> 00:28:15,730 reducing as more work gets presented to the system, 603 00:28:15,730 --> 00:28:18,631 that's the situation of congestion collapse. 604 00:28:18,631 --> 00:28:20,880 And this kind of thing shows up in many other systems, 605 00:28:20,880 --> 00:28:26,710 for example, in things like Web servers that are built out 606 00:28:26,710 --> 00:28:30,410 of many stages where you have a high load presented to 607 00:28:30,410 --> 00:28:32,490 the system. You might see congestion collapse 608 00:28:32,490 --> 00:28:35,376 in situations where, let's say, you get a Web request. 609 00:28:35,376 --> 00:28:37,000 And what you have to do is seven stages 610 00:28:37,000 --> 00:28:38,940 of processing on the Web request. 611 00:28:38,940 --> 00:28:40,570 But what you do with the processor 612 00:28:40,570 --> 00:28:42,620 is you take a request, and you process it 613 00:28:42,620 --> 00:28:44,920 for three of those stages. 
614 00:28:44,920 --> 00:28:47,240 And then you decide to go to the next request 615 00:28:47,240 --> 00:28:48,450 and process it at three stages. 616 00:28:48,450 --> 00:28:50,630 And, you go to the next request and process it at three stages. 617 00:28:50,630 --> 00:28:52,310 So, no request gets completed. 618 00:28:52,310 --> 00:28:53,986 Your CPU's 100% utilized. 619 00:28:53,986 --> 00:28:55,360 But the throughput is essentially 620 00:28:55,360 --> 00:28:56,810 diving down to zero. 621 00:28:56,810 --> 00:29:00,750 That's another situation where you have congestion collapse. 622 00:29:00,750 --> 00:29:04,380 But when it occurs in networks, it 623 00:29:04,380 --> 00:29:07,130 turns out it's more complicated to solve in networks because 624 00:29:07,130 --> 00:29:10,127 of these reasons that I outlined before. 625 00:29:16,190 --> 00:29:25,930 So has everyone seen this? 626 00:29:25,930 --> 00:29:27,360 So let's go back to our problem. 627 00:29:27,360 --> 00:29:31,810 Our goal is to, we want this thing to be true. 628 00:29:31,810 --> 00:29:33,700 We want the aggregate offered load to be smaller 629 00:29:33,700 --> 00:29:38,890 than the capacity of a link for every link in the system. 630 00:29:38,890 --> 00:29:41,170 But it turns out that's not enough of a specification 631 00:29:41,170 --> 00:29:44,910 because there are many ways to achieve this goal. 632 00:29:44,910 --> 00:29:47,210 For example, a really cheesy way to achieve this goal 633 00:29:47,210 --> 00:29:49,370 is to make sure that everybody remains quiet. 634 00:29:49,370 --> 00:29:51,150 If nobody sends any packets, you're 635 00:29:51,150 --> 00:29:53,052 going to get that to be true. 636 00:29:53,052 --> 00:29:55,260 So we're going to actually have to define the problem 637 00:29:55,260 --> 00:29:56,700 a little bit more completely. 638 00:29:56,700 --> 00:30:00,050 And let's first define what we don't want. 639 00:30:00,050 --> 00:30:07,250 The first thing we don't want is congestion collapse. 
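The Web-server analogy from a moment ago, where useful work collapses even though the CPU is fully busy, can be made concrete with a toy sketch. All the numbers here are made up for illustration.

```python
def finished(num_requests, stages, budget, round_robin):
    """Toy CPU model: `budget` stage-executions to spend; each request
    needs `stages` stages of processing. Returns how many requests
    finish completely. All numbers are illustrative only."""
    done = [0] * num_requests
    turn = 0
    for _ in range(budget):
        if round_robin:
            done[turn % num_requests] += 1   # a little work for everyone
            turn += 1
        else:
            for j in range(num_requests):    # run-to-completion: push the
                if done[j] < stages:         # first unfinished request
                    done[j] += 1
                    break
    return sum(d >= stages for d in done)

# 100 requests, 7 stages each, budget for only 300 stage-executions:
# run-to-completion finishes dozens of requests; round robin gives
# every request 3 of its 7 stages and finishes none, even though the
# CPU is 100% utilized in both cases.
print(finished(100, 7, 300, round_robin=False))
print(finished(100, 7, 300, round_robin=True))
```

The CPU burns the same budget either way; the difference is whether the work it does ever adds up to a completed request. That is the collapse: utilization stays at 100% while goodput goes to zero.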
640 00:30:07,250 --> 00:30:09,530 So any solution should have the property that it never 641 00:30:09,530 --> 00:30:11,490 gets into that situation. 642 00:30:11,490 --> 00:30:13,860 And in fact, good congestion control schemes 643 00:30:13,860 --> 00:30:16,297 operate at the left knee of the curve. 644 00:30:16,297 --> 00:30:17,880 But we're not going to get too hung up 645 00:30:17,880 --> 00:30:20,040 on that because if we operate a little bit in the middle, 646 00:30:20,040 --> 00:30:22,360 we're going to say that's fine because that'll turn out 647 00:30:22,360 --> 00:30:23,620 to be good enough in practice. 648 00:30:23,620 --> 00:30:25,050 And often it is just fine. 649 00:30:25,050 --> 00:30:28,070 And the additional complexity that you might have to bring 650 00:30:28,070 --> 00:30:30,730 to bear on a solution to work nearer the knee might not be 651 00:30:30,730 --> 00:30:31,850 worth it. 652 00:30:31,850 --> 00:30:33,340 OK, but really we're going to worry 653 00:30:33,340 --> 00:30:38,600 about not falling off the cliff at the right edge, OK? 654 00:30:38,600 --> 00:30:41,070 And then having done that, we're going 655 00:30:41,070 --> 00:30:47,360 to want reasonable utilization, also called efficiency. 656 00:30:47,360 --> 00:30:51,200 So, what this says is that if you have a network link that's 657 00:30:51,200 --> 00:30:53,530 often congested, you want to make sure 658 00:30:53,530 --> 00:30:56,370 that that link isn't underutilized when 659 00:30:56,370 --> 00:30:58,000 there is offered load around. 660 00:30:58,000 --> 00:31:00,350 So for example, people are presenting 661 00:31:00,350 --> 00:31:03,100 100 kb per second of load, and the network link 662 00:31:03,100 --> 00:31:06,830 has that capacity; you want that to be used. 663 00:31:06,830 --> 00:31:08,799 You don't want to shut them up too much. 
664 00:31:08,799 --> 00:31:10,340 So what this really means in practice 665 00:31:10,340 --> 00:31:13,940 is that if you have slowed down, and excess capacity 666 00:31:13,940 --> 00:31:15,900 has presented itself, then you have 667 00:31:15,900 --> 00:31:17,770 to make sure that you speed up. 668 00:31:17,770 --> 00:31:20,050 So that's what it means in practice. 669 00:31:22,670 --> 00:31:24,640 And the third part of the solution 670 00:31:24,640 --> 00:31:28,130 is there's another thing we need to specify. 671 00:31:28,130 --> 00:31:31,160 And the reason is that you can solve these two problems 672 00:31:31,160 --> 00:31:34,490 by making sure that only one person transmits 673 00:31:34,490 --> 00:31:35,780 in the network. 674 00:31:35,780 --> 00:31:37,770 OK, if that person has enough offered load, 675 00:31:37,770 --> 00:31:40,150 then you just shut everybody else out altogether, 676 00:31:40,150 --> 00:31:42,330 and essentially allocate that resource 677 00:31:42,330 --> 00:31:45,979 in sort of a monopolistic fashion to this one person. 678 00:31:45,979 --> 00:31:47,770 That's not going to be very good because we 679 00:31:47,770 --> 00:31:50,510 would like our network to be used by a number of people. 680 00:31:50,510 --> 00:31:54,760 So I'm going to define that as a goal of any good solution. 681 00:31:54,760 --> 00:31:56,510 I'm going to call it equitable allocation. 682 00:31:59,580 --> 00:32:01,710 I'm not going to say fair allocation because fair 683 00:32:01,710 --> 00:32:03,850 suggests a really strong condition: 684 00:32:03,850 --> 00:32:06,980 that every connection gets a roughly equal throughput if it 685 00:32:06,980 --> 00:32:08,850 has that offered load. 
686 00:32:08,850 --> 00:32:12,310 I mean, that turns out to be, I think, in my opinion, 687 00:32:12,310 --> 00:32:14,930 achieving perfect fairness to TCP connections 688 00:32:14,930 --> 00:32:18,840 is just a waste of time because in reality, fairness 689 00:32:18,840 --> 00:32:22,240 is governed by who's paying what for access to the network. 690 00:32:22,240 --> 00:32:24,340 So we're not going to get into that in this class. 691 00:32:24,340 --> 00:32:27,830 But we are going to want solutions that don't eliminate, 692 00:32:27,830 --> 00:32:29,562 don't starve certain connections out. 693 00:32:29,562 --> 00:32:31,270 And we'll be happy with that because that 694 00:32:31,270 --> 00:32:33,305 will turn out to work out just fine in practice. 695 00:32:37,820 --> 00:32:40,130 Now, to understand this problem a little bit better, 696 00:32:40,130 --> 00:32:44,220 we're going to want to understand this requirement 697 00:32:44,220 --> 00:32:47,500 a little bit more closely. 698 00:32:47,500 --> 00:32:50,030 And the problem is that this requirement 699 00:32:50,030 --> 00:32:54,620 of the aggregate rate specified by the offered load, 700 00:32:54,620 --> 00:32:57,380 being smaller than the link capacity is 701 00:32:57,380 --> 00:32:59,100 a condition on rates. 702 00:32:59,100 --> 00:33:01,600 It just says the offered load is 100 kb per second. 703 00:33:01,600 --> 00:33:04,150 The capacity is 150 kb per second. 704 00:33:04,150 --> 00:33:06,910 That means it's fine. 705 00:33:06,910 --> 00:33:09,440 The problem is that you have to really ask offered 706 00:33:09,440 --> 00:33:12,230 load over what timescale? 
707 00:33:12,230 --> 00:33:15,860 For example, if the overall offered load on your network 708 00:33:15,860 --> 00:33:19,010 is, let's say, a megabit per second, 709 00:33:19,010 --> 00:33:22,200 and the capacity of the network is half a megabit per second, 710 00:33:22,200 --> 00:33:24,870 and in that condition lasts for a whole day, 711 00:33:24,870 --> 00:33:28,240 OK, so for a whole day, your website got slash dotted. 712 00:33:28,240 --> 00:33:29,320 Take that example. 713 00:33:29,320 --> 00:33:31,890 And you have this little wimpy access link 714 00:33:31,890 --> 00:33:33,250 through your DSL line. 715 00:33:33,250 --> 00:33:36,100 And, ten times that much in terms of requests 716 00:33:36,100 --> 00:33:37,700 are being presented to your website. 717 00:33:37,700 --> 00:33:40,270 And, it lasts for the whole day. 718 00:33:40,270 --> 00:33:40,940 Guess what. 719 00:33:40,940 --> 00:33:42,100 Nothing we're going to talk about 720 00:33:42,100 --> 00:33:43,540 is really going to solve that problem 721 00:33:43,540 --> 00:33:45,581 in a way that allows every request to get through 722 00:33:45,581 --> 00:33:46,820 in a timely fashion. 723 00:33:46,820 --> 00:33:49,620 At some point, you throw up your hands and say, you know what? 724 00:33:49,620 --> 00:33:51,380 If I want my website to be popular, 725 00:33:51,380 --> 00:33:55,000 I'd better put it on a 10 Mb per second network 726 00:33:55,000 --> 00:33:57,602 and make sure that everybody can gain access to it. 727 00:33:57,602 --> 00:34:00,060 So we're not really going to solve the problem at that time 728 00:34:00,060 --> 00:34:01,040 scale. 
729 00:34:01,040 --> 00:34:04,140 On the other hand, if your website suddenly 730 00:34:04,140 --> 00:34:08,270 got a little bit popular, and a bunch of connections 731 00:34:08,270 --> 00:34:10,920 came to it, but it didn't last for a whole day, 732 00:34:10,920 --> 00:34:13,440 but it lasted for a few seconds, we're 733 00:34:13,440 --> 00:34:16,840 going to want to deal with that problem. 734 00:34:16,840 --> 00:34:19,300 So if you think about it, there are three timescales 735 00:34:19,300 --> 00:34:20,409 that matter here. 736 00:34:20,409 --> 00:34:22,840 And these timescales arise in a natural way 737 00:34:22,840 --> 00:34:26,655 because of this network where congestion happens here. 738 00:34:26,655 --> 00:34:28,780 And then we're going to have some feedback go back 739 00:34:28,780 --> 00:34:31,650 to the sender, and the sender exercises control. 740 00:34:31,650 --> 00:34:33,719 And the only timescale in this whole system 741 00:34:33,719 --> 00:34:35,690 is the round-trip time because that's 742 00:34:35,690 --> 00:34:37,476 the timescale, the order of magnitude 743 00:34:37,476 --> 00:34:39,350 of the timescale around which any feedback is 744 00:34:39,350 --> 00:34:42,030 going to come to us. 745 00:34:42,030 --> 00:34:45,659 So there are three timescales of interest. 746 00:34:45,659 --> 00:34:47,900 There is smaller than one round-trip time, 747 00:34:47,900 --> 00:34:53,480 which says this inequality is not satisfied for a really small 748 00:34:53,480 --> 00:34:56,449 time, where there is a little spurt, a burst, 749 00:34:56,449 --> 00:34:58,680 it's also called a burst, a burst of packets shows up, 750 00:34:58,680 --> 00:35:01,750 and then you have to handle them. 751 00:35:01,750 --> 00:35:05,565 But then after that, things get to be fine. 752 00:35:05,565 --> 00:35:07,440 So there is smaller than one round-trip time. 753 00:35:07,440 --> 00:35:11,180 So this is summation LI is greater than C, i.e. 
754 00:35:11,180 --> 00:35:12,450 the network is congested. 755 00:35:12,450 --> 00:35:14,810 It could be congested for really short durations that are 756 00:35:14,810 --> 00:35:16,840 smaller than a round-trip time. 757 00:35:16,840 --> 00:35:18,730 It could be between one and, I'm going 758 00:35:18,730 --> 00:35:20,355 to say, 100 round-trip times. 759 00:35:22,950 --> 00:35:25,710 And just for real numbers, a round-trip time 760 00:35:25,710 --> 00:35:28,310 is typically order of 100 ms. 761 00:35:28,310 --> 00:35:32,290 So we are talking here of less than 100 ms up 762 00:35:32,290 --> 00:35:37,010 to about ten seconds, and then bigger than this number, OK? 763 00:35:37,010 --> 00:35:40,647 Bigger than 100. 764 00:35:40,647 --> 00:35:42,230 And these are all orders of magnitude. 765 00:35:42,230 --> 00:35:43,630 I mean, it could be 500 RTT's. 766 00:35:43,630 --> 00:35:46,580 OK, those are the three time scales to worry about. 767 00:35:50,300 --> 00:35:53,470 When congestion happens at less than one round-trip time, 768 00:35:53,470 --> 00:35:56,470 we're going to solve that problem using this technique. 769 00:35:56,470 --> 00:35:59,575 We're going to solve it using buffering, OK? 770 00:35:59,575 --> 00:36:00,450 And, that's the plan. 771 00:36:00,450 --> 00:36:03,030 The reason is that it's really hard to do anything else 772 00:36:03,030 --> 00:36:05,930 because the congestion lasts for a certain burst. 773 00:36:05,930 --> 00:36:08,694 And by the time you figure that out and tell 774 00:36:08,694 --> 00:36:10,110 the sender of that, the congestion 775 00:36:10,110 --> 00:36:11,740 has gone away, which means telling the sender 776 00:36:11,740 --> 00:36:12,960 was sort of 777 00:36:12,960 --> 00:36:14,990 a waste 778 00:36:14,990 --> 00:36:16,920 because the sender would 779 00:36:16,920 --> 00:36:17,628 have slowed down. 
780 00:36:17,628 --> 00:36:19,480 But the congestion anyway went away. 781 00:36:19,480 --> 00:36:21,160 So, why bother, right? 782 00:36:21,160 --> 00:36:22,700 Why did you tell the sender that? 783 00:36:22,700 --> 00:36:24,199 So that's the kind of thing you want 784 00:36:24,199 --> 00:36:27,547 to solve at the network layer inside the switches. 785 00:36:27,547 --> 00:36:29,380 And that's going to be done using buffering. 786 00:36:32,130 --> 00:36:34,240 And that sort of suggests, and there's 787 00:36:34,240 --> 00:36:35,490 a bit of sleight-of-hand here. 788 00:36:35,490 --> 00:36:37,907 And we're not going to talk about why 789 00:36:37,907 --> 00:36:38,990 there's a sleight-of-hand. 790 00:36:38,990 --> 00:36:41,900 But this really suggests that if you design a network 791 00:36:41,900 --> 00:36:44,100 and you want to put buffering in the network, 792 00:36:44,100 --> 00:36:46,910 you'd better not put more than about a round-trip time's worth 793 00:36:46,910 --> 00:36:49,060 of buffering in that switch. 794 00:36:49,060 --> 00:36:51,382 If you put in buffering longer than a round-trip time, 795 00:36:51,382 --> 00:36:52,840 you're getting in the way of things 796 00:36:52,840 --> 00:36:54,790 that the higher layers are going to be doing. 797 00:36:54,790 --> 00:36:56,690 And it's going to confuse them. 798 00:36:56,690 --> 00:36:58,190 In fact, Sam showed you this picture 799 00:36:58,190 --> 00:37:00,000 of these round-trip times varying 800 00:37:00,000 --> 00:37:04,312 a lot in the last lecture, where the mean was 801 00:37:04,312 --> 00:37:05,270 two and a half seconds. 802 00:37:05,270 --> 00:37:07,759 And the standard deviation was one and a half seconds. 803 00:37:07,759 --> 00:37:09,050 That was not a made-up picture. 804 00:37:09,050 --> 00:37:10,591 That was from a real wireless network 805 00:37:10,591 --> 00:37:12,849 where the designers had incorrectly 806 00:37:12,849 --> 00:37:14,140 put in a huge amount of buffering. 
807 00:37:14,140 --> 00:37:15,610 And this is extremely common. 808 00:37:15,610 --> 00:37:19,836 Almost every modem that you buy, cellular modem or phone modem, 809 00:37:19,836 --> 00:37:21,210 has way too much buffering in it. 810 00:37:21,210 --> 00:37:23,626 And, the only thing that happens is these queues build up. 811 00:37:23,626 --> 00:37:25,170 It's just a mistake. 812 00:37:27,700 --> 00:37:30,640 So less than one round-trip time: deal with it using buffering. 813 00:37:34,120 --> 00:37:35,820 Between one and 100 round-trip times, 814 00:37:35,820 --> 00:37:40,010 we're going to deal with that problem using the techniques 815 00:37:40,010 --> 00:37:43,970 that we are going to talk about today, in the next five or ten 816 00:37:43,970 --> 00:37:45,370 minutes. 817 00:37:45,370 --> 00:37:48,134 And then bigger than 100 round-trip times or 1,000 818 00:37:48,134 --> 00:37:50,300 round-trip times are cases where congestion is just 819 00:37:50,300 --> 00:37:53,910 sort of persistent for many, many, many seconds or minutes 820 00:37:53,910 --> 00:37:54,740 or hours. 821 00:37:54,740 --> 00:38:00,280 And there are ways of dealing with this problem using 822 00:38:00,280 --> 00:38:02,410 protocols and using algorithms. 823 00:38:02,410 --> 00:38:05,090 But ultimately you have to ask yourself whether you are really 824 00:38:05,090 --> 00:38:06,465 under provisioned, and you really 825 00:38:06,465 --> 00:38:08,445 ought to be buying or provisioning your network 826 00:38:08,445 --> 00:38:10,320 to be higher, maybe put your site 827 00:38:10,320 --> 00:38:12,820 on a different network that has higher capacity. 828 00:38:12,820 --> 00:38:14,800 And these are longer-timescale congestion effects 829 00:38:14,800 --> 00:38:20,310 for which decisions based on provisioning 830 00:38:20,310 --> 00:38:23,480 probably have to come into play in solving the problem. 831 00:38:28,885 --> 00:38:31,010 So, provisioning is certainly an important problem. 
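The three timescales and the buffering rule of thumb can be summarized in a small sketch. The thresholds (1 and 100 round-trip times) are the orders of magnitude from the lecture, and the bandwidth-delay calculation is one way to read the "about one round-trip time of buffering" guideline; the example numbers are made up.

```python
def buffer_budget_bits(capacity_bps, rtt_s):
    """Rule of thumb from the lecture: no more than about one
    round-trip time's worth of buffering in a switch, i.e. the
    bandwidth-delay product of the outgoing link."""
    return capacity_bps * rtt_s

def mechanism(congestion_duration_rtts):
    """Which tool handles congestion lasting this many round-trip
    times; the thresholds are order-of-magnitude, not exact."""
    if congestion_duration_rtts < 1:
        return "buffering"             # absorb the burst in the queue
    if congestion_duration_rtts <= 100:
        return "end-to-end feedback"   # tell senders to adapt their rates
    return "provisioning"              # buy, or move to, a bigger link

# e.g. a 1 Mb/s link with a 100 ms round-trip time should buffer
# on the order of 100 kbits, not megabytes
print(buffer_budget_bits(1_000_000, 0.1))
```

The point of the classifier is just that each regime gets a different mechanism: queues hide sub-RTT bursts, feedback handles seconds of overload, and only provisioning fixes a persistently undersized link.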
832 00:38:31,010 --> 00:38:32,510 And when you have congestion lasting 833 00:38:32,510 --> 00:38:34,280 really long periods of time, that 834 00:38:34,280 --> 00:38:36,460 might be the right solution. 835 00:38:36,460 --> 00:38:42,780 But for everything else, there are 836 00:38:42,780 --> 00:38:45,910 solutions that are much easier to understand, at least 837 00:38:45,910 --> 00:38:49,840 using the tools that we've built so far in 6.033. 838 00:38:49,840 --> 00:38:51,790 So, the first component of the solution 839 00:38:51,790 --> 00:38:56,630 is some buffering to deal with the smaller than one 840 00:38:56,630 --> 00:38:58,460 round-trip time situation. 841 00:39:01,380 --> 00:39:05,540 And then, when congestion lasts for more than a round-trip 842 00:39:05,540 --> 00:39:08,800 time, your buffers start 843 00:39:08,800 --> 00:39:10,630 getting filled up. 844 00:39:10,630 --> 00:39:14,280 At that point, you're starting to see congestion that can't 845 00:39:14,280 --> 00:39:17,280 be hidden by pure buffering. 846 00:39:17,280 --> 00:39:19,730 And it can't be hidden because queues are building up, 847 00:39:19,730 --> 00:39:22,300 and that's causing increased delay to show up at the sender, 848 00:39:22,300 --> 00:39:24,240 and perhaps packets may start getting dropped 849 00:39:24,240 --> 00:39:25,990 when the queue overflows. 850 00:39:25,990 --> 00:39:31,170 And that causes the sender to observe this congestion, which 851 00:39:31,170 --> 00:39:33,400 means that what we want is a plan by which, 852 00:39:33,400 --> 00:39:38,090 when congestion happens lasting over half a round-trip time 853 00:39:38,090 --> 00:39:40,120 or close to a round-trip time, we're 854 00:39:40,120 --> 00:39:44,866 going to want to provide feedback to the sender, OK? 
855 00:39:44,866 --> 00:39:46,490 And then the third part of our solution 856 00:39:46,490 --> 00:39:52,380 is when the sender gets this feedback, what it's going to do 857 00:39:52,380 --> 00:39:52,950 is adapt. 858 00:39:55,209 --> 00:39:57,250 And the way it's going to adapt is actually easy. 859 00:39:57,250 --> 00:40:00,710 It's going to do it by modifying the rate at which it 860 00:40:00,710 --> 00:40:01,370 sends packets. 861 00:40:01,370 --> 00:40:02,495 And that's how it's set up. 862 00:40:02,495 --> 00:40:04,180 The sender sends at some rate R. 863 00:40:04,180 --> 00:40:10,370 What you do is you change the rate or the speed 864 00:40:10,370 --> 00:40:11,810 at which the sender sends. 865 00:40:14,480 --> 00:40:16,820 So, there are two things we have to figure out. 866 00:40:16,820 --> 00:40:19,810 One is, how does the network give feedback? 867 00:40:19,810 --> 00:40:21,941 And the second is, what exactly is going on 868 00:40:21,941 --> 00:40:22,940 with changing the speed? 869 00:40:22,940 --> 00:40:23,650 How does it work? 870 00:40:26,309 --> 00:40:27,850 There are many ways to give feedback, 871 00:40:27,850 --> 00:40:31,120 and sort of the first things you might think about 872 00:40:31,120 --> 00:40:33,010 are, well, when the queue starts to fill up, 873 00:40:33,010 --> 00:40:34,550 I'll send a message to the sender, 874 00:40:34,550 --> 00:40:37,644 or I'll set a bit in the packet header that gets to the endpoint, 875 00:40:37,644 --> 00:40:39,060 and it'll send this feedback back. 876 00:40:39,060 --> 00:40:42,380 And any variant that you can think of in the next five 877 00:40:42,380 --> 00:40:44,520 or ten minutes, I can assure you, has been thought of 878 00:40:44,520 --> 00:40:46,714 and investigated. 879 00:40:46,714 --> 00:40:48,380 And you might think of new ways of doing 880 00:40:48,380 --> 00:40:49,380 it, which would be great. 881 00:40:49,380 --> 00:40:51,510 I mean, this is a topic of active work. 
882 00:40:51,510 --> 00:40:53,900 People are working on this stuff still. 883 00:40:53,900 --> 00:40:56,390 But my opinion is that the best way 884 00:40:56,390 --> 00:40:58,240 to solve this problem of feedback 885 00:40:58,240 --> 00:41:00,690 is the simplest possible thing you could think of. 886 00:41:00,690 --> 00:41:03,110 The simplest technique that really works all the time 887 00:41:03,110 --> 00:41:05,560 is just to drop the packet. 888 00:41:05,560 --> 00:41:07,559 In particular, if the queue overflows, the packet is 889 00:41:07,559 --> 00:41:10,100 going to get dropped anyway, and that's a sign of congestion. 890 00:41:10,100 --> 00:41:12,516 But if at any point in time you decide that the network is 891 00:41:12,516 --> 00:41:15,354 getting congested and remains so for a long enough time scale 892 00:41:15,354 --> 00:41:16,770 that you want the sender to react, 893 00:41:16,770 --> 00:41:18,227 just throw the packet away. 894 00:41:18,227 --> 00:41:19,810 The nice thing about throwing a packet 895 00:41:19,810 --> 00:41:22,200 away is that it's extremely hard to implement wrong, 896 00:41:22,200 --> 00:41:23,594 because when the queue overflows, 897 00:41:23,594 --> 00:41:25,260 the packet's going to be dropped anyway. 898 00:41:25,260 --> 00:41:28,140 So that feedback is going to be made available to the sender. 899 00:41:32,970 --> 00:41:35,490 So what we're going to assume for now 900 00:41:35,490 --> 00:41:38,350 is that all packet drops that happen in networks 901 00:41:38,350 --> 00:41:41,870 are a sign of congestion, OK? 902 00:41:41,870 --> 00:41:44,952 And when a packet gets dropped, the sender gets that feedback. 903 00:41:44,952 --> 00:41:46,660 And, the reason it gets that feedback is, 904 00:41:46,660 --> 00:41:49,070 remember that every packet's being acknowledged. 905 00:41:49,070 --> 00:41:51,010 So if it misses an acknowledgment, 906 00:41:51,010 --> 00:41:54,060 then it knows that the packet's been dropped. 
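The drop-the-packet feedback described here is what a drop-tail router queue does naturally. The following is a minimal illustrative sketch, not code from the lecture or from a real router; the class and names are my own:

```python
from collections import deque

class DropTailQueue:
    """Sketch of a router queue whose only congestion 'signal'
    is dropping packets once the buffer is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = deque()
        self.drops = 0

    def enqueue(self, pkt):
        # A full buffer means congestion; the drop itself is the
        # feedback, since the sender will miss the acknowledgment.
        if len(self.buf) >= self.capacity:
            self.drops += 1
            return False
        self.buf.append(pkt)
        return True

    def dequeue(self):
        return self.buf.popleft() if self.buf else None
```

Note that nothing extra has to be implemented for the feedback path: the missing ack at the sender carries the signal.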
907 00:41:54,060 --> 00:41:56,650 And it says, ah, the network is congested. 908 00:41:56,650 --> 00:41:58,256 And then I'm going to change my speed. 909 00:41:58,256 --> 00:41:59,630 And, I'm going to change my speed 910 00:41:59,630 --> 00:42:04,520 by reducing the rate at which I'm going to send packets. 911 00:42:04,520 --> 00:42:07,110 So now, we have to ask how the sender adapts 912 00:42:07,110 --> 00:42:09,400 to congestion, how the speed thing actually works out. 913 00:42:16,480 --> 00:42:16,980 Well? 914 00:42:16,980 --> 00:42:23,794 It says up. 915 00:42:23,794 --> 00:42:24,460 It's already up. 916 00:42:24,460 --> 00:42:29,120 Yeah, it doesn't have a down button. 917 00:42:32,110 --> 00:42:32,610 What's that? 918 00:42:39,180 --> 00:42:39,680 Sam? 919 00:42:44,940 --> 00:43:09,950 How many professors does it take to -- 920 00:43:09,950 --> 00:43:12,050 He turned the thing off. 921 00:43:12,050 --> 00:43:13,900 He turned the light off on that. 922 00:43:13,900 --> 00:43:19,740 This might actually work. 923 00:43:19,740 --> 00:43:20,528 Can you see that? 924 00:43:24,200 --> 00:43:27,420 I bet it's a race condition. 925 00:43:27,420 --> 00:43:30,390 All right, so, can you see that? 926 00:43:30,390 --> 00:43:31,640 OK, let's just move from here. 927 00:43:31,640 --> 00:43:33,181 OK, the way we're going to figure out 928 00:43:33,181 --> 00:43:35,310 the rate at which we're going to send our packets 929 00:43:35,310 --> 00:43:37,100 is to use this idea of a sliding window 930 00:43:37,100 --> 00:43:39,030 that we talked about the last time. 931 00:43:39,030 --> 00:43:42,110 And since Sam did such a nice animation, 932 00:43:42,110 --> 00:43:46,510 I'm going to display that again, just a refresher. 933 00:43:46,510 --> 00:43:47,260 Wonderful. 934 00:43:47,260 --> 00:43:49,680 All right. 935 00:43:49,680 --> 00:43:54,580 [APPLAUSE] And since Sam made this nice animation, 936 00:43:54,580 --> 00:43:57,180 I'm going to play that back. 
937 00:43:57,180 --> 00:44:01,330 I've always wanted to say, play it again, Sam. 938 00:44:01,330 --> 00:44:05,260 So, what's going on here is that whenever 939 00:44:05,260 --> 00:44:07,550 the receiver receives a packet, it 940 00:44:07,550 --> 00:44:09,080 sends an acknowledgment back. 941 00:44:09,080 --> 00:44:11,432 And the sender's pipelining packets through. 942 00:44:11,432 --> 00:44:13,390 And whenever the sender gets an acknowledgment, 943 00:44:13,390 --> 00:44:14,560 it sends a new packet off. 944 00:44:14,560 --> 00:44:15,730 And that's what you see happening. 945 00:44:15,730 --> 00:44:17,200 Whenever it gets an acknowledgment, 946 00:44:17,200 --> 00:44:19,720 a new packet goes out onto the other end. 947 00:44:19,720 --> 00:44:22,196 And, the window's sliding by one every time 948 00:44:22,196 --> 00:44:23,320 it gets an acknowledgement. 949 00:44:23,320 --> 00:44:26,620 So this is sort of the way in which this steady-state thing 950 00:44:26,620 --> 00:44:31,290 works out for us in this network. 951 00:44:31,290 --> 00:44:33,470 So the main point to note about this 952 00:44:33,470 --> 00:44:36,960 is that this scheme has an effect 953 00:44:36,960 --> 00:44:39,350 that I'll call self-pacing. 954 00:44:39,350 --> 00:44:42,840 OK, it's self-pacing because the sender doesn't actually 955 00:44:42,840 --> 00:44:45,480 have to worry very much about when it should exactly 956 00:44:45,480 --> 00:44:47,829 send a packet, because 957 00:44:47,829 --> 00:44:49,370 every time an acknowledgment arrives, 958 00:44:49,370 --> 00:44:52,230 it's being told that the reason I got an acknowledgment is 959 00:44:52,230 --> 00:44:54,760 because one packet left the network, which means I can send 960 00:44:54,760 --> 00:44:56,102 one packet into the network. 
961 00:44:56,102 --> 00:44:58,060 And as long as things weren't congested before, 962 00:44:58,060 --> 00:44:59,840 they won't be congested now, right, 963 00:44:59,840 --> 00:45:01,770 because packets have left the pipe. 964 00:45:01,770 --> 00:45:04,590 For every packet that leaves, you're putting one packet in. 965 00:45:04,590 --> 00:45:08,530 This is an absolutely beautiful idea called self-pacing. 966 00:45:08,530 --> 00:45:11,880 And, the idea here is that acks essentially strobe 967 00:45:11,880 --> 00:45:14,380 data packets. 968 00:45:14,380 --> 00:45:15,820 And, like all really nice ideas, 969 00:45:15,820 --> 00:45:18,476 it's extremely simple to get. 970 00:45:18,476 --> 00:45:20,350 And it turns out to have lots of implications 971 00:45:20,350 --> 00:45:23,730 for why the congestion control system is actually 972 00:45:23,730 --> 00:45:24,820 stable in reality. 973 00:45:24,820 --> 00:45:26,750 So what this means, for example, is let's 974 00:45:26,750 --> 00:45:27,860 say that things are fine. 975 00:45:27,860 --> 00:45:29,527 And, all of a sudden a bunch of packets 976 00:45:29,527 --> 00:45:31,360 come in from somewhere else, and the network 977 00:45:31,360 --> 00:45:32,709 starts to get congested. 978 00:45:32,709 --> 00:45:34,250 What's going to happen is that queues 979 00:45:34,250 --> 00:45:36,330 are going to start building up. 
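The ack-clocked self-pacing just described can be sketched as a tiny sender model: packets go out only while the window has room, and each arriving ack retires one in-flight packet and releases the next. This is a hypothetical illustration with names of my own choosing, not the lecture's code:

```python
class SlidingWindowSender:
    """Sketch of self-pacing: one packet enters the network
    only when an ack says one has left it."""

    def __init__(self, window):
        self.window = window      # max packets allowed in flight
        self.in_flight = 0
        self.next_seq = 0

    def sendable(self):
        # Launch packets until the window is full; returns the
        # sequence numbers sent right now.
        pkts = []
        while self.in_flight < self.window:
            pkts.append(self.next_seq)
            self.next_seq += 1
            self.in_flight += 1
        return pkts

    def on_ack(self):
        # One packet left the pipe, so exactly one more may enter.
        self.in_flight -= 1
        return self.sendable()
```

If acks arrive slowly because queues are building up, `on_ack` is simply called less often, so the sender slows down automatically, which is the transient-congestion behavior the lecture points out.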
980 00:45:36,330 --> 00:45:38,490 But, when queues start building up, what happens 981 00:45:38,490 --> 00:45:42,020 is that transmissions that are ongoing for this connection 982 00:45:42,020 --> 00:45:44,470 are just going to slow down a little bit because they're 983 00:45:44,470 --> 00:45:46,475 going to get interleaved with other packets 984 00:45:46,475 --> 00:45:48,850 in the queue, which means the acks are going to come back 985 00:45:48,850 --> 00:45:51,150 slower, because how can the acks come back 986 00:45:51,150 --> 00:45:53,670 any faster than the network can deliver them? Which means 987 00:45:53,670 --> 00:45:56,520 automatically the sender has a slowing-down effect dealing 988 00:45:56,520 --> 00:45:58,170 with transient congestion. 989 00:45:58,170 --> 00:46:00,690 This is a really, really nice behavior. 990 00:46:00,690 --> 00:46:03,090 And it's a consequence of the way in which 991 00:46:03,090 --> 00:46:06,740 the self-pacing of the sliding window 992 00:46:06,740 --> 00:46:09,120 scheme works. 993 00:46:09,120 --> 00:46:17,610 So now, at some point the network might become congested, 994 00:46:17,610 --> 00:46:19,420 and a packet may get dropped. 995 00:46:19,420 --> 00:46:22,202 So, the way that manifests itself is: one packet gets 996 00:46:22,202 --> 00:46:23,910 dropped, and the corresponding acknowledgment 997 00:46:23,910 --> 00:46:26,109 doesn't get back to the sender. 998 00:46:26,109 --> 00:46:27,900 And when an acknowledgment doesn't get back 999 00:46:27,900 --> 00:46:30,300 to the sender, what the sender does 1000 00:46:30,300 --> 00:46:32,600 is reduce its window size by a factor of two. 1001 00:46:39,630 --> 00:46:42,480 OK, so I should mention here there are really two windows 1002 00:46:42,480 --> 00:46:43,510 going on. 
1003 00:46:43,510 --> 00:46:46,280 One window is the window that we talked about last time, 1004 00:46:46,280 --> 00:46:49,230 where the receiver tells the sender how much buffer space 1005 00:46:49,230 --> 00:46:50,450 it has. 1006 00:46:50,450 --> 00:46:53,410 And the other window is a variable 1007 00:46:53,410 --> 00:46:56,470 that we just introduced, which is maintained by the sender, 1008 00:46:56,470 --> 00:47:00,880 and it's called the congestion window, OK? 1009 00:47:00,880 --> 00:47:03,090 And this is a dynamic variable that's 1010 00:47:03,090 --> 00:47:06,160 constantly changing as the sender gets feedback 1011 00:47:06,160 --> 00:47:07,210 from the receiver. 1012 00:47:07,210 --> 00:47:08,930 And in response to a missing ack, when 1013 00:47:08,930 --> 00:47:10,430 you determine that a packet is lost, 1014 00:47:10,430 --> 00:47:11,900 you say congestion is going on, 1015 00:47:11,900 --> 00:47:13,210 and I want to reduce my window size. 1016 00:47:13,210 --> 00:47:14,751 And there are many ways to reduce it. 1017 00:47:14,751 --> 00:47:16,889 We're going to just multiplicatively decrease it. 1018 00:47:16,889 --> 00:47:18,930 And the reason for that has to do with the reason 1019 00:47:18,930 --> 00:47:20,880 why, for example, on the Ethernet 1020 00:47:20,880 --> 00:47:23,820 you found that you were doing exponential backoff. 1021 00:47:23,820 --> 00:47:25,520 You're sort of geometrically increasing 1022 00:47:25,520 --> 00:47:28,212 the spacing with which you were sending packets there. 1023 00:47:28,212 --> 00:47:29,670 You're doing the same kind of thing 1024 00:47:29,670 --> 00:47:33,530 here, reducing the rate by a factor of two. 1025 00:47:33,530 --> 00:47:35,320 So that's how you slow down. 1026 00:47:35,320 --> 00:47:37,330 At some point you might get in a situation 1027 00:47:37,330 --> 00:47:40,340 where no acknowledgments come back for a while. 1028 00:47:40,340 --> 00:47:42,744 Say, many packets get lost, right? 
1029 00:47:42,744 --> 00:47:43,660 And that could happen. 1030 00:47:43,660 --> 00:47:45,868 All of a sudden there's a lot of congestion going on. 1031 00:47:45,868 --> 00:47:47,180 You get nothing back. 1032 00:47:47,180 --> 00:47:50,380 At that point, you've lost this ability for acknowledgments 1033 00:47:50,380 --> 00:47:52,520 to strobe new data packets. 1034 00:47:52,520 --> 00:47:54,987 OK, so the acks have acted as a clock allowing you 1035 00:47:54,987 --> 00:47:56,070 to strobe packets through. 1036 00:47:56,070 --> 00:47:57,680 And you've lost that. 1037 00:47:57,680 --> 00:48:01,270 At this point, your best plan is just to stop. 1038 00:48:01,270 --> 00:48:02,440 And you halt for a while. 1039 00:48:02,440 --> 00:48:04,410 It's also called a timeout, where you just 1040 00:48:04,410 --> 00:48:07,760 stay quiet for a while, often on the order of a second or more. 1041 00:48:07,760 --> 00:48:11,087 And then you try to start over again. 1042 00:48:11,087 --> 00:48:13,670 Of course, now we have to figure out how you start over again. 1043 00:48:13,670 --> 00:48:15,195 And that's the same problem you have 1044 00:48:15,195 --> 00:48:17,070 when you start a connection in the beginning. 1045 00:48:17,070 --> 00:48:19,790 When a connection starts up, how does it know how fast to send? 1046 00:48:19,790 --> 00:48:22,590 It's the same problem as when many, many packet drops happen, 1047 00:48:22,590 --> 00:48:23,590 and you time out. 1048 00:48:23,590 --> 00:48:27,090 And, in TCP, that startup is done using 1049 00:48:27,090 --> 00:48:28,937 a technique called slow start. 1050 00:48:28,937 --> 00:48:30,270 And I'll describe the algorithm. 1051 00:48:30,270 --> 00:48:32,830 It's a really elegant algorithm, and it's very, very simple 1052 00:48:32,830 --> 00:48:34,414 to implement. 1053 00:48:34,414 --> 00:48:35,830 So you have a sender and receiver. 
1054 00:48:35,830 --> 00:48:37,210 And the plan is: initially, the sender 1055 00:48:37,210 --> 00:48:39,168 wants to initiate a connection to the receiver. 1056 00:48:39,168 --> 00:48:40,140 It sends a request. 1057 00:48:40,140 --> 00:48:42,610 It gets the response back with the flow control window. 1058 00:48:42,610 --> 00:48:44,670 That just says how much buffer space the receiver 1059 00:48:44,670 --> 00:48:45,690 currently has. 1060 00:48:45,690 --> 00:48:48,980 OK, we're not going to use that anymore in this picture. 1061 00:48:48,980 --> 00:48:50,670 But it's just there, saying 1062 00:48:50,670 --> 00:48:52,961 the sender can never send more than eight packets 1063 00:48:52,961 --> 00:48:55,460 within any given round trip. 1064 00:48:55,460 --> 00:48:57,880 Then what happens is the sender sends the first segment, 1065 00:48:57,880 --> 00:48:58,920 and it gets an ack. 1066 00:48:58,920 --> 00:49:02,206 So, initially this congestion window value starts off at one. 1067 00:49:02,206 --> 00:49:03,330 OK, so you send one packet. 1068 00:49:03,330 --> 00:49:05,930 One segment, you get an acknowledgment back. 1069 00:49:05,930 --> 00:49:09,150 And now the plan is that the algorithm is as follows. 1070 00:49:09,150 --> 00:49:11,210 For every acknowledgment you get back, 1071 00:49:11,210 --> 00:49:14,240 you increase the congestion window by one. 1072 00:49:14,240 --> 00:49:18,510 OK, so what this means is when you get this acknowledgment 1073 00:49:18,510 --> 00:49:22,140 back, you send out two packets. 1074 00:49:22,140 --> 00:49:24,460 And each of those two packets sends you 1075 00:49:24,460 --> 00:49:27,590 back an acknowledgment, so you get two acknowledgements. 1076 00:49:27,590 --> 00:49:29,520 Now, for each of those two acknowledgements, 1077 00:49:29,520 --> 00:49:31,140 you increase the window by one, which 1078 00:49:31,140 --> 00:49:33,500 means that after you got both those acknowledgements, 1079 00:49:33,500 --> 00:49:35,284 you've increased your window by two. 
1080 00:49:35,284 --> 00:49:36,700 Previously, the window was two. 1081 00:49:36,700 --> 00:49:38,364 So, now you can send four packets. 1082 00:49:38,364 --> 00:49:40,780 So what happens is in response to the red acknowledgment, 1083 00:49:40,780 --> 00:49:42,590 you send out two packets. 1084 00:49:42,590 --> 00:49:44,710 In response to the light blue acknowledgement, 1085 00:49:44,710 --> 00:49:46,970 you send out two more packets. 1086 00:49:46,970 --> 00:49:48,950 And now, when those get through to the other side, 1087 00:49:48,950 --> 00:49:50,625 they send back acknowledgments to you. 1088 00:49:50,625 --> 00:49:52,000 In response to each one of those, 1089 00:49:52,000 --> 00:49:53,550 you increase the window by one. 1090 00:49:53,550 --> 00:49:56,590 So increasing the window by one on each acknowledgment 1091 00:49:56,590 --> 00:49:58,420 means you can send two packets. 1092 00:49:58,420 --> 00:50:01,870 The reason is that one packet went through. 1093 00:50:01,870 --> 00:50:03,120 And you got an acknowledgment. 1094 00:50:03,120 --> 00:50:04,680 So in response to the acknowledgment, 1095 00:50:04,680 --> 00:50:06,400 you could send one packet to fill 1096 00:50:06,400 --> 00:50:10,330 in the space occupied by the packet that 1097 00:50:10,330 --> 00:50:11,480 just got acknowledged. 1098 00:50:11,480 --> 00:50:12,980 In addition, you increase the window 1099 00:50:12,980 --> 00:50:16,260 by one, which means you can send one more packet. 
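The per-ack accounting just described — each ack both refills the slot left by the acked packet and grows the window by one — is what makes the window double every round trip. A small sketch of that growth, assuming the idealized case where every packet sent in a round trip is acknowledged:

```python
def slow_start_growth(cwnd, rtts):
    """Sketch of slow start: the congestion window grows by one
    per ack, so with one ack per in-flight packet the window
    doubles each round trip. Returns the window per round trip."""
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd        # one ack for each packet in the window
        cwnd += acks       # +1 per ack => doubling per round trip
        history.append(cwnd)
    return history

# Starting from a window of one, four round trips of slow start:
# slow_start_growth(1, 4) -> [1, 2, 4, 8, 16]
```

So, despite the name, the window reaches the pipe's capacity in a logarithmic number of round trips.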
1100 00:50:16,260 --> 00:50:18,340 So really, what's going on with this algorithm is: 1101 00:50:18,340 --> 00:50:21,570 on each ack, you send two segments, which 1102 00:50:21,570 --> 00:50:23,260 means you increase the window, 1103 00:50:23,260 --> 00:50:25,390 the congestion window, by one; really what happens 1104 00:50:25,390 --> 00:50:28,010 is the rate at which this window opens is exponential in time, 1105 00:50:28,010 --> 00:50:30,700 because in every round trip, from one round trip to the next, 1106 00:50:30,700 --> 00:50:33,812 you're doubling the entire window, OK? 1107 00:50:33,812 --> 00:50:36,270 So although it's called slow start, it's really quite fast. 1108 00:50:36,270 --> 00:50:39,010 And, it's an exponential increase. 1109 00:50:39,010 --> 00:50:40,780 So if you put it all together, it 1110 00:50:40,780 --> 00:50:47,140 will turn out that the way in which algorithms like the TCP 1111 00:50:47,140 --> 00:50:49,870 algorithm work is they start off at some value of the congestion 1112 00:50:49,870 --> 00:50:50,370 window. 1113 00:50:50,370 --> 00:50:52,670 If this is the congestion window, and this is time, 1114 00:50:52,670 --> 00:50:54,917 it starts off at some small value like one. 1115 00:50:54,917 --> 00:50:56,250 And, it increases exponentially. 1116 00:50:58,780 --> 00:51:01,360 And then, if you continually increase exponentially, 1117 00:51:01,360 --> 00:51:02,760 this is a recipe for congestion. 1118 00:51:02,760 --> 00:51:05,120 It's surely going to overflow some buffer and slow down. 1119 00:51:05,120 --> 00:51:07,960 So what really happens in practice is after a while, 1120 00:51:07,960 --> 00:51:11,980 based on some threshold, it just automatically decides 1121 00:51:11,980 --> 00:51:15,060 it's probably not worth going exponentially fast. 1122 00:51:15,060 --> 00:51:18,805 And, it turns out, it slows down to a linear increase. 
1123 00:51:18,805 --> 00:51:20,930 So every round-trip time it's increasing the window 1124 00:51:20,930 --> 00:51:22,830 by a constant amount. 1125 00:51:22,830 --> 00:51:24,940 And then at some point, congestion might happen. 1126 00:51:24,940 --> 00:51:27,120 And when congestion happens, we drop the window 1127 00:51:27,120 --> 00:51:28,420 by a factor of two. 1128 00:51:28,420 --> 00:51:31,360 So, we cut down to half of the current window. 1129 00:51:31,360 --> 00:51:33,940 And then, we continue to increase linearly. 1130 00:51:33,940 --> 00:51:36,860 And then you might have congestion coming again. 1131 00:51:36,860 --> 00:51:40,950 For instance, we might have lost all of the acknowledgements, 1132 00:51:40,950 --> 00:51:42,790 not gotten any of the acknowledgements, 1133 00:51:42,790 --> 00:51:45,870 lost a lot of packets, which means we go down to this state 1134 00:51:45,870 --> 00:51:47,580 where we've lost the ack clock. 1135 00:51:47,580 --> 00:51:48,650 So, we have to time out. 1136 00:51:48,650 --> 00:51:50,160 And, it's like a new connection. 1137 00:51:50,160 --> 00:51:52,507 So, we remain silent for a while. 1138 00:51:52,507 --> 00:51:54,090 And this period is the timeout period 1139 00:51:54,090 --> 00:51:56,412 that's governed by the equations that we talked 1140 00:51:56,412 --> 00:51:58,370 about last time, based on the round-trip time 1141 00:51:58,370 --> 00:51:59,494 and the standard deviation. 1142 00:51:59,494 --> 00:52:02,224 We remain quiet for a while, and then we start over 1143 00:52:02,224 --> 00:52:03,390 as if it's a new connection. 1144 00:52:03,390 --> 00:52:05,949 And we increase exponentially. 1145 00:52:05,949 --> 00:52:07,490 And this time, we don't exponentially 1146 00:52:07,490 --> 00:52:09,960 increase as far as we did the very first time. 1147 00:52:09,960 --> 00:52:12,580 We actually exponentially increase only for a little bit, 1148 00:52:12,580 --> 00:52:14,720 because we knew that congestion happened here. 
1149 00:52:14,720 --> 00:52:17,860 So, we go up to some fraction of that, and then 1150 00:52:17,860 --> 00:52:19,420 increase linearly. 1151 00:52:19,420 --> 00:52:21,870 So the exact details are not important. 1152 00:52:21,870 --> 00:52:24,367 The thing that's important is just understanding that when 1153 00:52:24,367 --> 00:52:26,950 congestion occurs, you drop your window by 1154 00:52:26,950 --> 00:52:28,560 some multiplicative factor. 1155 00:52:28,560 --> 00:52:30,237 And at the beginning of a connection, 1156 00:52:30,237 --> 00:52:32,570 you're trying to quickly figure out what the sustainable 1157 00:52:32,570 --> 00:52:33,190 capacity is. 1158 00:52:33,190 --> 00:52:35,770 So here you are, going through this increase phase. 1159 00:52:35,770 --> 00:52:37,410 But after a while, you don't really 1160 00:52:37,410 --> 00:52:39,409 want to be exponentially increasing all the time, 1161 00:52:39,409 --> 00:52:41,740 because that's almost guaranteed to overrun buffers. 1162 00:52:41,740 --> 00:52:44,014 So you want to go into a more gingerly mode. 1163 00:52:44,014 --> 00:52:46,180 And, people have proposed congestion control schemes 1164 00:52:46,180 --> 00:52:48,400 that do all sorts of things, all sorts of functions. 1165 00:52:48,400 --> 00:52:49,770 But this is the thing that's just out 1166 00:52:49,770 --> 00:52:51,050 there on all of the computers. 1167 00:52:51,050 --> 00:52:53,529 And it works remarkably well. 1168 00:52:53,529 --> 00:52:56,070 So, there are two basic points you have to take home from here. 1169 00:52:56,070 --> 00:52:57,528 The main goal of congestion control 1170 00:52:57,528 --> 00:52:59,230 is to avoid congestion collapse, which 1171 00:52:59,230 --> 00:53:01,861 is the picture that's hidden behind that, which 1172 00:53:01,861 --> 00:53:03,360 is that as offered load increases, you 1173 00:53:03,360 --> 00:53:06,370 don't want throughput to drop like a cliff. 
1174 00:53:06,370 --> 00:53:09,170 And this is a cross-layer problem. 1175 00:53:09,170 --> 00:53:11,040 And the essence of all solutions is 1176 00:53:11,040 --> 00:53:14,800 to use a combination of network layer feedback and end system 1177 00:53:14,800 --> 00:53:16,670 control, end-to-end control. 1178 00:53:16,670 --> 00:53:18,880 And, good solutions work across eight orders 1179 00:53:18,880 --> 00:53:21,120 of magnitude of bandwidth, and four orders 1180 00:53:21,120 --> 00:53:22,900 of magnitude of delay. 1181 00:53:22,900 --> 00:53:24,220 So, with that, we'll stop. 1182 00:53:24,220 --> 00:53:25,530 DP1 is due tomorrow. 1183 00:53:25,530 --> 00:53:29,390 Have a great spring break, and see you after spring break.