1 00:00:00,000 --> 00:00:03,510 And if I keel over and fall down, 2 00:00:03,510 --> 00:00:06,820 somebody call the ambulance. 3 00:00:06,820 --> 00:00:09,790 OK, what we're going to do today is, 4 00:00:09,790 --> 00:00:13,390 this is actually the last lecture on a topic 5 00:00:13,390 --> 00:00:16,370 that some might consider part of networking, 6 00:00:16,370 --> 00:00:19,190 but some might consider part of a more general topic 7 00:00:19,190 --> 00:00:20,200 of distributed systems. 8 00:00:20,200 --> 00:00:24,000 So it actually forms a bridge between the stuff 9 00:00:24,000 --> 00:00:25,950 we learned about in networking, and what 10 00:00:25,950 --> 00:00:29,890 we're going to see from Wednesday over the next six 11 00:00:29,890 --> 00:00:33,600 or seven lectures of the class, and of the associated 12 00:00:33,600 --> 00:00:35,570 recitations having to do with fault 13 00:00:35,570 --> 00:00:37,860 tolerant, reliable computing. 14 00:00:37,860 --> 00:00:41,300 And most of the interesting aspects 15 00:00:41,300 --> 00:00:46,056 of what we're going to talk about involve techniques 16 00:00:46,056 --> 00:00:47,680 in fault tolerance, and more techniques 17 00:00:47,680 --> 00:00:50,180 have to do with redundancy and replication. 18 00:00:50,180 --> 00:00:52,650 And DNS, the domain name system, which 19 00:00:52,650 --> 00:00:54,390 is the system we're going to look 20 00:00:54,390 --> 00:00:56,990 at today in the context of distributed naming 21 00:00:56,990 --> 00:00:59,510 is a bridge because on the one hand 22 00:00:59,510 --> 00:01:01,566 it covers some of the aspects of networking 23 00:01:01,566 --> 00:01:02,440 that we talked about. 24 00:01:02,440 --> 00:01:05,140 On the other hand, it shows an example 25 00:01:05,140 --> 00:01:07,910 of how you can achieve replication 26 00:01:07,910 --> 00:01:11,280 to achieve fault tolerance. 
27 00:01:11,280 --> 00:01:15,516 And we're going to study these techniques systematically 28 00:01:15,516 --> 00:01:16,640 over the next few lectures. 29 00:01:16,640 --> 00:01:22,390 So getting an example in mind is usually a good idea. 30 00:01:22,390 --> 00:01:24,840 So you've already seen the network layer, 31 00:01:24,840 --> 00:01:26,430 and you've seen that, 32 00:01:26,430 --> 00:01:33,450 at the network layer, attachment points on the network 33 00:01:33,450 --> 00:01:37,310 are identified by IP addresses in the Internet, 34 00:01:37,310 --> 00:01:40,670 or more generally identified by network layer addresses. 35 00:01:40,670 --> 00:01:44,175 So an example on the Internet is IP addresses. 36 00:01:46,900 --> 00:01:50,790 And this is the name that's used by the network layer 37 00:01:50,790 --> 00:01:53,277 to identify attachment points anywhere on the network. 38 00:01:53,277 --> 00:01:55,360 And in fact, it's the name that's used by the end 39 00:01:55,360 --> 00:02:00,890 to end layer to name one endpoint of a connection. 40 00:02:00,890 --> 00:02:02,750 So in fact if you think about it and go back 41 00:02:02,750 --> 00:02:06,080 to our very early lecture on naming, 42 00:02:06,080 --> 00:02:08,350 an address is really just a name, 43 00:02:08,350 --> 00:02:10,830 but a name that's been overloaded with information 44 00:02:10,830 --> 00:02:15,540 that allows the user of that name to locate this object. 45 00:02:15,540 --> 00:02:18,740 An address, really, is a name that has information in it, 46 00:02:18,740 --> 00:02:20,950 overloaded information in it that actually 47 00:02:20,950 --> 00:02:22,730 allows for it to be located. 48 00:02:22,730 --> 00:02:24,980 And in fact, an IP address is nothing other 49 00:02:24,980 --> 00:02:30,090 than a name that tells you where in the Internet topology 50 00:02:30,090 --> 00:02:33,670 the entity being named by this IP address is located. 
51 00:02:33,670 --> 00:02:37,810 So my computer is something like 18.31.0.35. 52 00:02:37,810 --> 00:02:41,180 It doesn't mean anything other than the fact 53 00:02:41,180 --> 00:02:43,070 that somewhere on this Internet topology 54 00:02:43,070 --> 00:02:45,490 there is this big, complicated graph. 55 00:02:45,490 --> 00:02:48,780 And that address allows you to do 56 00:02:48,780 --> 00:02:50,990 routing in the geography of that topology. 57 00:02:50,990 --> 00:02:53,090 It has nothing to do with real-world geography. 58 00:02:53,090 --> 00:02:56,589 It just allows you to do routing in that topology. 59 00:02:56,589 --> 00:02:58,630 Now, in principle, you could build every Internet 60 00:02:58,630 --> 00:03:01,640 application, and have users interact with Internet 61 00:03:01,640 --> 00:03:05,079 applications purely with these network layer addresses. 62 00:03:05,079 --> 00:03:06,620 But that would be quite inconvenient. 63 00:03:06,620 --> 00:03:07,994 I mean, you would then have to be 64 00:03:07,994 --> 00:03:11,600 sending e-mail to your friends not with Joe at MIT.edu; 65 00:03:11,600 --> 00:03:14,080 you'd have to do Joe at some IP address. 66 00:03:14,080 --> 00:03:17,380 And it's pretty hard and complicated to remember. 67 00:03:17,380 --> 00:03:20,430 So the first problem with just using pure network layer 68 00:03:20,430 --> 00:03:22,590 addresses that we want to solve today, 69 00:03:22,590 --> 00:03:26,790 and we will solve to some degree, although not completely, 70 00:03:26,790 --> 00:03:29,629 is to come up with a better way of naming things 71 00:03:29,629 --> 00:03:30,670 that is more convenient. 72 00:03:33,177 --> 00:03:34,510 And you already know the answer. 73 00:03:34,510 --> 00:03:38,940 The answer is you send e-mail to Joe at MIT.edu. 74 00:03:38,940 --> 00:03:44,290 You go to the 6.033 website at MIT.edu/6.033 or some other 75 00:03:44,290 --> 00:03:47,150 equivalent thing that leads to the same page. 
76 00:03:47,150 --> 00:03:50,910 You don't actually think about names in terms of IP addresses. 77 00:03:50,910 --> 00:03:53,180 So in fact, to a large extent, the fact 78 00:03:53,180 --> 00:03:57,690 that these are human understandable and names 79 00:03:57,690 --> 00:04:01,530 that are mnemonics that you can easily remember 80 00:04:01,530 --> 00:04:03,170 is a good thing. 81 00:04:03,170 --> 00:04:04,940 And so we do actually want to come up 82 00:04:04,940 --> 00:04:07,950 with a way of naming things that's 83 00:04:07,950 --> 00:04:10,570 independent of IP addresses. 84 00:04:10,570 --> 00:04:14,840 The second goal here is to come up 85 00:04:14,840 --> 00:04:17,680 with a naming scheme with a solution that allows 86 00:04:17,680 --> 00:04:20,603 some degree of modularity. 87 00:04:20,603 --> 00:04:22,019 As you know, names provide a level 88 00:04:22,019 --> 00:04:25,770 of indirection between the thing that you want to get to, 89 00:04:25,770 --> 00:04:32,910 and the handle that you want to associate with it. 90 00:04:32,910 --> 00:04:35,850 And that level of indirection, if we come up 91 00:04:35,850 --> 00:04:37,450 with a good way of doing this, it 92 00:04:37,450 --> 00:04:40,760 will allow us to do a few things like, for example, I 93 00:04:40,760 --> 00:04:43,220 can tell you that the website for MIT.edu, 94 00:04:43,220 --> 00:04:47,620 for the institute's homepage is MIT.edu. 95 00:04:47,620 --> 00:04:52,850 And, behind that, I could change the actual computers on which 96 00:04:52,850 --> 00:04:54,790 the website is located. 
97 00:04:54,790 --> 00:04:56,340 And I could do that independently 98 00:04:56,340 --> 00:04:58,370 of telling other people of any change in it, 99 00:04:58,370 --> 00:05:01,310 whereas if I told them that the website was at a particular IP 100 00:05:01,310 --> 00:05:03,360 address, then every time I moved a page, 101 00:05:03,360 --> 00:05:05,430 moved the pages from one computer to another, 102 00:05:05,430 --> 00:05:06,990 I'd have to tell everybody in the world 103 00:05:06,990 --> 00:05:08,850 that the website has changed. 104 00:05:08,850 --> 00:05:13,270 And we'd like to minimize doing that kind of thing. 105 00:05:13,270 --> 00:05:15,934 And so, the domain name system provides a solution 106 00:05:15,934 --> 00:05:16,600 to this problem. 107 00:05:16,600 --> 00:05:18,505 DNS provides a solution to this problem. 108 00:05:18,505 --> 00:05:20,130 Most of you have already heard of this. 109 00:05:20,130 --> 00:05:24,994 It maps between what are formally called domain names, 110 00:05:24,994 --> 00:05:26,410 but what we are going to just call 111 00:05:26,410 --> 00:05:28,340 host names for convenience. 112 00:05:28,340 --> 00:05:36,850 It maps between host names and records. 113 00:05:36,850 --> 00:05:39,320 And it turns out there are many different kinds of records. 114 00:05:39,320 --> 00:05:42,470 And we're going to look at a few of them in this lecture today. 115 00:05:42,470 --> 00:05:45,020 But for your mental model right now, 116 00:05:45,020 --> 00:05:49,060 just assume that it maps between host names and IP addresses. 117 00:05:49,060 --> 00:05:51,350 So it turns out that an IP address 118 00:05:51,350 --> 00:05:55,010 is just one example of a record called an address record. 119 00:05:55,010 --> 00:05:56,880 But the general goal of DNS is to map 120 00:05:56,880 --> 00:05:58,620 between host names and records. 121 00:05:58,620 --> 00:06:01,130 And, there is a variety of them, as I said. 
122 00:06:01,130 --> 00:06:03,670 For now, just assume they are IP addresses. 123 00:06:03,670 --> 00:06:06,720 So for example, MIT.edu might be 18 dot whatever 124 00:06:06,720 --> 00:06:10,710 it is as an IP address. 125 00:06:10,710 --> 00:06:14,475 So what are the goals in designing the system? 126 00:06:14,475 --> 00:06:16,100 So, primarily we're going to be talking 127 00:06:16,100 --> 00:06:19,690 about how the domain name system is designed, and how it works. 128 00:06:19,690 --> 00:06:22,087 And as we go along, we'll see some things 129 00:06:22,087 --> 00:06:24,170 that it does that are different from other systems 130 00:06:24,170 --> 00:06:27,100 or other ways you can solve this problem. 131 00:06:27,100 --> 00:06:30,630 There are basically two goals in the design of the system. 132 00:06:30,630 --> 00:06:33,570 And the first goal is that it should scale. 133 00:06:33,570 --> 00:06:40,960 In fact, the original motivation for the domain name system when 134 00:06:40,960 --> 00:06:43,600 it came about in the early 80s was 135 00:06:43,600 --> 00:06:46,040 that it was becoming extremely inconvenient 136 00:06:46,040 --> 00:06:50,160 to manage the mappings between names of hosts and their IP 137 00:06:50,160 --> 00:06:53,190 addresses in a way that as the network grew, 138 00:06:53,190 --> 00:06:55,390 and it was growing pretty rapidly even then, 139 00:06:55,390 --> 00:06:56,970 as the network grew it turned out 140 00:06:56,970 --> 00:06:59,830 to be a management nightmare. 141 00:06:59,830 --> 00:07:01,890 And to understand this, basically there's 142 00:07:01,890 --> 00:07:04,070 three ways in which you can imagine; 143 00:07:04,070 --> 00:07:05,710 there are more than three ways. 
144 00:07:05,710 --> 00:07:07,874 But there are three more obvious ways 145 00:07:07,874 --> 00:07:09,290 in which you could imagine mapping 146 00:07:09,290 --> 00:07:11,210 between these names and these records, 147 00:07:11,210 --> 00:07:13,220 so the names and IP addresses. 148 00:07:13,220 --> 00:07:14,940 And the first one is, in fact, the way 149 00:07:14,940 --> 00:07:19,770 the Internet names were being managed until DNS came along. 150 00:07:19,770 --> 00:07:22,670 And that's to use a model that you might think 151 00:07:22,670 --> 00:07:23,980 of as the telephone book model. 152 00:07:28,730 --> 00:07:33,164 It's actually astonishing that the telephone companies use 153 00:07:33,164 --> 00:07:34,830 the telephone book model where every six 154 00:07:34,830 --> 00:07:37,070 months at your doorstep there are these three 155 00:07:37,070 --> 00:07:39,630 thick books that show up. 156 00:07:39,630 --> 00:07:41,380 And you're just looking at them and going, 157 00:07:41,380 --> 00:07:42,200 what do I do with this? 158 00:07:42,200 --> 00:07:44,033 And you lug them into your home, and you 159 00:07:44,033 --> 00:07:46,990 might use them a couple of times, and then the next three big 160 00:07:46,990 --> 00:07:47,736 books come along. 161 00:07:47,736 --> 00:07:49,360 You actually wonder why they don't just 162 00:07:49,360 --> 00:07:54,412 put their phone books on the Web and make it easy to get at. 163 00:07:54,412 --> 00:07:55,870 But the telephone book model really 164 00:07:55,870 --> 00:07:59,790 is, there's this central repository of information. 165 00:07:59,790 --> 00:08:02,600 And everybody gets this information 166 00:08:02,600 --> 00:08:03,800 from the central repository. 167 00:08:03,800 --> 00:08:06,240 But the repository actually pushes it to everybody. 168 00:08:06,240 --> 00:08:08,240 So it's not really a queryable repository. 169 00:08:08,240 --> 00:08:13,100 It's not something you go and contact on an as-needed basis. 
170 00:08:13,100 --> 00:08:15,610 So, in fact, every few months, or in the old days 171 00:08:15,610 --> 00:08:17,950 of the Internet, every day in the morning, 172 00:08:17,950 --> 00:08:21,530 every computer would go and pull the current mapping 173 00:08:21,530 --> 00:08:25,530 between names and addresses from a central site. 174 00:08:25,530 --> 00:08:28,590 And this model kind of worked for a little bit, 175 00:08:28,590 --> 00:08:29,990 but it stopped working. 176 00:08:29,990 --> 00:08:31,920 It stopped scaling on multiple levels. 177 00:08:31,920 --> 00:08:34,640 First of all, the resources required for everybody 178 00:08:34,640 --> 00:08:38,090 to have to go and collect this information every day 179 00:08:38,090 --> 00:08:39,981 turned out to be significant. 180 00:08:39,981 --> 00:08:41,980 But it also turned out to be a scaling bottleneck 181 00:08:41,980 --> 00:08:45,290 more fundamentally from the standpoint that any time 182 00:08:45,290 --> 00:08:49,151 you add a machine to your local organization, 183 00:08:49,151 --> 00:08:50,900 you have to go and tell the central person 184 00:08:50,900 --> 00:08:52,660 that you've added a machine. 185 00:08:52,660 --> 00:08:54,406 And so, at a human level it doesn't 186 00:08:54,406 --> 00:08:55,780 scale very well because you can't 187 00:08:55,780 --> 00:08:58,071 allow this to happen automatically, because then people 188 00:08:58,071 --> 00:09:00,530 would sort of willy-nilly do all sorts of, you know, 189 00:09:00,530 --> 00:09:04,250 just claim that they own various machines or various names 190 00:09:04,250 --> 00:09:06,600 at various places. 191 00:09:06,600 --> 00:09:08,519 So the telephone book model is something 192 00:09:08,519 --> 00:09:09,810 that the Internet used to have. 193 00:09:09,810 --> 00:09:11,590 It used to be a file called hosts.txt. 194 00:09:11,590 --> 00:09:14,790 And, every computer had a copy of this file that usually was 195 00:09:14,790 --> 00:09:17,050 current as of a 24 hour period. 
196 00:09:19,780 --> 00:09:21,700 So, the second approach that you could 197 00:09:21,700 --> 00:09:24,030 adopt, knowing this approach of just 198 00:09:24,030 --> 00:09:27,060 sort of pushing a telephone book periodically doesn't work: 199 00:09:27,060 --> 00:09:30,301 you might actually adopt a model, a centralized server 200 00:09:30,301 --> 00:09:30,800 model. 201 00:09:35,240 --> 00:09:38,960 And in the central server model, the main difference 202 00:09:38,960 --> 00:09:42,900 from the telephone book model is that nothing is pushed to you. 203 00:09:42,900 --> 00:09:44,880 Instead, imagine a search engine like, say, 204 00:09:44,880 --> 00:09:46,554 Google or something like that where 205 00:09:46,554 --> 00:09:48,470 if you want to know the mapping between a name 206 00:09:48,470 --> 00:09:52,320 and an IP address, you go and contact this name service, 207 00:09:52,320 --> 00:09:54,470 which turns out to be a central server. 208 00:09:54,470 --> 00:09:57,810 And it does essentially the same thing that Google might do. 209 00:09:57,810 --> 00:10:01,090 And this actually isn't that hard to implement 210 00:10:01,090 --> 00:10:02,880 from a technical standpoint because Google 211 00:10:02,880 --> 00:10:04,300 does much more than this. 212 00:10:04,300 --> 00:10:06,187 And clearly it works. 213 00:10:06,187 --> 00:10:08,270 But the real problem with the central server model 214 00:10:08,270 --> 00:10:10,270 is what I alluded to the last time. 215 00:10:10,270 --> 00:10:15,990 When you assign names to machines, 216 00:10:15,990 --> 00:10:18,050 you want to make sure that people don't conflict 217 00:10:18,050 --> 00:10:20,764 on these names because these are well-defined names for machines 218 00:10:20,764 --> 00:10:21,430 on the Internet. 
219 00:10:21,430 --> 00:10:23,120 And you need a model by which you 220 00:10:23,120 --> 00:10:28,150 can decide who is allowed to name a machine as foo.MIT.edu 221 00:10:28,150 --> 00:10:32,260 and who's allowed to take X.CNN.com, and so on. 222 00:10:32,260 --> 00:10:34,170 And so, for every computer in the Internet, 223 00:10:34,170 --> 00:10:38,300 having a central person deal with deciding whether that's OK 224 00:10:38,300 --> 00:10:40,510 or not isn't a scalable solution. 225 00:10:40,510 --> 00:10:44,840 And that's why we don't really adopt that model for DNS. 226 00:10:44,840 --> 00:10:46,590 The model that's adopted for DNS, 227 00:10:46,590 --> 00:10:49,840 and has a number of other attractive properties, which 228 00:10:49,840 --> 00:10:53,000 we'll talk about, is the distributed database model, 229 00:10:53,000 --> 00:11:00,970 or more generally, a distributed, federated model 230 00:11:00,970 --> 00:11:04,420 where every organization, and organizations could 231 00:11:04,420 --> 00:11:07,070 be recursively defined to have sub-organizations, 232 00:11:07,070 --> 00:11:09,030 every organization sort of manages 233 00:11:09,030 --> 00:11:12,200 a portion of this overall global namespace. 234 00:11:12,200 --> 00:11:13,790 And it manages everything about it. 235 00:11:13,790 --> 00:11:17,820 It manages all of the mappings between names and records 236 00:11:17,820 --> 00:11:20,790 for that part of the namespace. 237 00:11:20,790 --> 00:11:23,560 And the more names they have, the more work they have to do. 238 00:11:23,560 --> 00:11:26,430 But nobody else really has to do that much more work. 239 00:11:26,430 --> 00:11:28,400 And it's a pretty nice model because everybody 240 00:11:28,400 --> 00:11:31,480 does some amount of work in terms of technical resources, 241 00:11:31,480 --> 00:11:34,870 and more importantly in terms of human administrative resources. 
242 00:11:34,870 --> 00:11:39,710 And overall, that's one component 243 00:11:39,710 --> 00:11:41,204 of the scalability of the system. 244 00:11:41,204 --> 00:11:42,620 And that's one of the main reasons 245 00:11:42,620 --> 00:11:44,828 why the system turns out to scale and work very well. 246 00:11:47,480 --> 00:11:49,225 And the second goal is reliability. 247 00:11:54,386 --> 00:11:55,760 Once you have a system like this, 248 00:11:55,760 --> 00:11:57,260 and people start getting used to it, 249 00:11:57,260 --> 00:11:58,932 and applications start using names, 250 00:11:58,932 --> 00:12:00,640 it had better be the case that the system 251 00:12:00,640 --> 00:12:03,140 is in fact generally available. 252 00:12:03,140 --> 00:12:06,940 And it better not be the case that the DNS is the Achilles 253 00:12:06,940 --> 00:12:08,780 heel of the Internet in terms of, you know, 254 00:12:08,780 --> 00:12:10,590 the network infrastructure has a very, very 255 00:12:10,590 --> 00:12:13,270 high reliability because all this routing stuff works 256 00:12:13,270 --> 00:12:13,770 out great, 257 00:12:13,770 --> 00:12:16,200 and we find all these alternate paths, 258 00:12:16,200 --> 00:12:19,097 and things don't work because I'm not able to get to my DNS. 259 00:12:19,097 --> 00:12:20,430 That had better not be the case. 260 00:12:20,430 --> 00:12:22,740 So we'll see what techniques DNS uses 261 00:12:22,740 --> 00:12:25,250 to get pretty good reliability. 262 00:12:25,250 --> 00:12:29,200 But the jury is still out as to exactly how reliable it is. 263 00:12:29,200 --> 00:12:31,999 But overall most users will admit that generally it 264 00:12:31,999 --> 00:12:34,290 seems to work about as well as the rest of the network. 265 00:12:37,130 --> 00:12:38,550 So, what does DNS do? 266 00:12:38,550 --> 00:12:40,950 Fundamentally, it provides an abstraction 267 00:12:40,950 --> 00:12:44,199 called a lookup abstraction. 268 00:12:44,199 --> 00:12:44,990 You give it a name. 
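The telephone book model described above can be sketched in a few lines. This is only an illustration: the file contents, the line format, and the names `parse_hosts` and `lookup` are assumptions made for the example, not the historical file format. The point it shows is that every lookup is answered from a locally held copy, so a new machine is invisible until the central file is updated and fetched again.

```python
# Hypothetical hosts-file contents; the line format is an assumption made
# for illustration. Addresses come from the documentation range 192.0.2.0/24.
HOSTS_TXT = """\
# address      name
192.0.2.1      host-a.example
192.0.2.2      host-b.example
"""

def parse_hosts(text):
    """Build a name -> address table from a locally held copy of the file."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, name = line.split()
        table[name.lower()] = address
    return table

def lookup(table, name):
    """Answer purely from the local copy; no server is contacted."""
    return table.get(name.lower())

# Each machine would rebuild this table after fetching the central file.
local_copy = parse_hosts(HOSTS_TXT)
```

A name added at some organization after the last fetch simply returns `None` here until the next copy arrives, which is the staleness problem the lecture describes.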
269 00:12:44,990 --> 00:12:46,920 It returns a record to you. 270 00:12:46,920 --> 00:12:50,720 And you can ask for particular types of records that you want. 271 00:12:50,720 --> 00:12:53,390 We'll get into that in a second. 272 00:12:53,390 --> 00:12:57,370 So you ask a name, and you get back a record. 273 00:12:57,370 --> 00:13:01,190 And, there are many ways, and different operating systems 274 00:13:01,190 --> 00:13:04,320 have different ways in which this exact function is called. 275 00:13:04,320 --> 00:13:06,050 For example, gethostbyname might 276 00:13:06,050 --> 00:13:11,890 be a commonly used way of going from a name to an IP address. 277 00:13:11,890 --> 00:13:13,620 But, we are going to just say DNS 278 00:13:13,620 --> 00:13:15,115 resolve, just more abstractly. 279 00:13:21,360 --> 00:13:24,890 So an application can invoke DNS resolve on a DNS name, 280 00:13:24,890 --> 00:13:28,130 and get an IP address or some other record associated 281 00:13:28,130 --> 00:13:30,190 with it. 282 00:13:30,190 --> 00:13:37,190 Now if you think back at the way we did our idealized naming 283 00:13:37,190 --> 00:13:39,230 model or generic naming model, we actually 284 00:13:39,230 --> 00:13:42,550 had a resolve which took a name and a context as an argument. 285 00:13:42,550 --> 00:13:48,380 And, it returned back the value that was bound to that name. 286 00:13:48,380 --> 00:13:52,220 So you might ask what the context is for a DNS resolve. 287 00:13:52,220 --> 00:13:56,000 And there are actually multiple answers to this question. 288 00:13:56,000 --> 00:14:00,021 The first answer is that applications specify a context 289 00:14:00,021 --> 00:14:00,520 usually. 290 00:14:05,490 --> 00:14:08,370 So, for example, an application might specify that 291 00:14:08,370 --> 00:14:12,560 it wishes to know the IP address of this name. 
292 00:14:12,560 --> 00:14:14,680 For example, if it's a web browser that 293 00:14:14,680 --> 00:14:17,000 wants to know some server's IP address, 294 00:14:17,000 --> 00:14:19,770 so it can connect to it using a TCP connection. 295 00:14:19,770 --> 00:14:22,900 Alternatively, an application like an e-mail program 296 00:14:22,900 --> 00:14:26,080 might specify that it wants not the IP address of this machine, 297 00:14:26,080 --> 00:14:31,830 but the name of a machine that can handle mail 298 00:14:31,830 --> 00:14:33,390 on behalf of this name. 299 00:14:33,390 --> 00:14:38,590 So, for example, if I send somebody email at ABC.MIT.edu, 300 00:14:38,590 --> 00:14:42,760 my mail program, or somewhere along the way some server, would 301 00:14:42,760 --> 00:14:45,670 do a lookup for a special kind of record 302 00:14:45,670 --> 00:14:47,600 that turns out to be called the MX record, which 303 00:14:47,600 --> 00:14:52,300 is a mail record, which would then return to the caller 304 00:14:52,300 --> 00:14:56,230 not the IP address of MIT.edu, but, in fact, 305 00:14:56,230 --> 00:14:59,770 the name of a machine that's capable of handling mail 306 00:14:59,770 --> 00:15:00,614 for MIT.edu. 307 00:15:00,614 --> 00:15:02,030 And in general, that would be very 308 00:15:02,030 --> 00:15:06,200 different from the IP address associated with MIT.edu. 309 00:15:06,200 --> 00:15:12,030 So that's established as a context. 310 00:15:12,030 --> 00:15:15,290 And more generally, there is a DNS configuration file. 311 00:15:15,290 --> 00:15:19,780 There is some DNS configuration or registry, 312 00:15:19,780 --> 00:15:23,930 or whatever it is depending on the system that you have, that 313 00:15:23,930 --> 00:15:28,510 tells this resolve step the context in which 314 00:15:28,510 --> 00:15:29,850 the name must be resolved. 
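As a rough sketch of the lookup abstraction with a record type as part of the context: the table below is a toy in-memory stand-in for DNS, and every name, address, and helper in it (`RECORDS`, `dns_resolve`, `mail_server_addresses`) is hypothetical, chosen for illustration rather than taken from any real resolver library. It shows a browser-style address lookup next to a mail-style lookup that first asks for the MX record and then resolves the returned host name.

```python
# Toy record store: (name, record type) -> list of values.
# All names and addresses are made up (documentation range 192.0.2.0/24).
RECORDS = {
    ("example.edu", "A"): ["192.0.2.10"],
    ("example.edu", "MX"): ["mail.example.edu"],
    ("mail.example.edu", "A"): ["192.0.2.25"],
}

def dns_resolve(name, rtype="A"):
    """Return the records of the requested type bound to a name."""
    # Names are case-insensitive, and the trailing root dot is optional.
    key = (name.rstrip(".").lower(), rtype.upper())
    return RECORDS.get(key, [])

def mail_server_addresses(domain):
    """What a mail program might do: MX lookup, then an A lookup per host."""
    addresses = []
    for host in dns_resolve(domain, "MX"):
        addresses.extend(dns_resolve(host, "A"))
    return addresses
```

With this table, a web-style call `dns_resolve("example.edu")` and a mail-style call `mail_server_addresses("example.edu")` give different addresses for the same name, which is the distinction the lecture draws between the address record and the mail record.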
315 00:15:29,850 --> 00:15:32,310 So just for example, you might have a machine, 316 00:15:32,310 --> 00:15:36,040 cricket.MIT.edu, another machine cricket.berklee.edu. 317 00:15:36,040 --> 00:15:39,380 And an application might call DNS resolve of cricket. 318 00:15:42,940 --> 00:15:45,510 And when the application calls that, 319 00:15:45,510 --> 00:15:48,760 the program doing the resolution needs 320 00:15:48,760 --> 00:15:50,420 to know what context to do it in. 321 00:15:50,420 --> 00:15:52,860 And often, that's specified in the DNS configuration. 322 00:15:52,860 --> 00:15:55,560 So on Windows and UNIX, there is this thing 323 00:15:55,560 --> 00:15:56,660 called a search path. 324 00:15:56,660 --> 00:15:58,800 And you go through a set of search paths 325 00:15:58,800 --> 00:16:02,450 that provide this context. 326 00:16:02,450 --> 00:16:07,080 So we'll see this in more detail later today. 327 00:16:07,080 --> 00:16:09,780 Now users themselves, and there's 328 00:16:09,780 --> 00:16:13,080 an important point here about lookups, and how lookups 329 00:16:13,080 --> 00:16:15,250 are to be distinguished from something else that 330 00:16:15,250 --> 00:16:17,350 is called search. 331 00:16:17,350 --> 00:16:21,220 Now, these names are generally very useful for programs 332 00:16:21,220 --> 00:16:24,640 because they allow modularity to occur, 333 00:16:24,640 --> 00:16:27,400 where no longer are you worried about services being 334 00:16:27,400 --> 00:16:28,750 associated with IP addresses. 335 00:16:28,750 --> 00:16:30,750 You can move your website between IP addresses 336 00:16:30,750 --> 00:16:33,320 at the back without telling everybody about it. 337 00:16:33,320 --> 00:16:35,580 So from that standpoint this name 338 00:16:35,580 --> 00:16:39,760 to record mapping of DNS with lookups is very, very useful. 
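The search path idea mentioned above can be sketched as follows. This is a simplified model under stated assumptions: the table `KNOWN_NAMES`, the addresses, and the function `resolve_with_search_path` are hypothetical, and real resolvers apply additional rules about when to try a name as given that are not modeled here. It shows how the same short name, cricket, resolves differently depending on the configured context.

```python
# Toy name table; addresses are made up (documentation range 192.0.2.0/24).
KNOWN_NAMES = {
    "cricket.mit.edu": "192.0.2.1",
    "cricket.berklee.edu": "192.0.2.2",
}

def resolve_with_search_path(name, search_path):
    """Try each configured suffix in order, then the name as given.

    A trailing dot marks the name as already fully qualified, in which
    case the search path is skipped entirely.
    """
    if name.endswith("."):
        candidates = [name[:-1]]
    else:
        candidates = [f"{name}.{suffix}" for suffix in search_path] + [name]
    for candidate in candidates:
        address = KNOWN_NAMES.get(candidate.lower())
        if address is not None:
            return candidate.lower(), address
    return None
```

On a machine configured with the search path `["mit.edu"]`, the short name resolves to cricket.mit.edu; on a machine configured with `["berklee.edu"]`, the same call yields the other host.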
339 00:16:39,760 --> 00:16:44,120 In terms of convenience, the DNS is not the whole story 340 00:16:44,120 --> 00:16:47,400 because although you most often send email 341 00:16:47,400 --> 00:16:50,560 to people with foo@MIT.edu so you remember 342 00:16:50,560 --> 00:16:54,180 that or you have some other file in which you find 343 00:16:54,180 --> 00:16:56,640 that information very easily, more generally 344 00:16:56,640 --> 00:17:00,280 speaking people often don't; we need something in addition 345 00:17:00,280 --> 00:17:01,940 to just lookups. 346 00:17:01,940 --> 00:17:05,010 Human users need something in addition to lookups. 347 00:17:05,010 --> 00:17:09,010 And the general term given to that is search. 348 00:17:09,010 --> 00:17:12,157 So for example Google provides search on the Internet. 349 00:17:12,157 --> 00:17:13,990 And tomorrow's recitation talks a little bit 350 00:17:13,990 --> 00:17:16,000 about one aspect of that. 351 00:17:16,000 --> 00:17:18,589 And in fact, it's an interesting discussion 352 00:17:18,589 --> 00:17:20,910 because you will find from the reading 353 00:17:20,910 --> 00:17:25,079 tomorrow that users were using a search engine to essentially do 354 00:17:25,079 --> 00:17:28,780 a lookup task where they would go to Google and type CNN.com 355 00:17:28,780 --> 00:17:30,870 when in fact if they already knew to go to CNN.com 356 00:17:30,870 --> 00:17:33,760 they could just as well have typed it in their URL window. 357 00:17:33,760 --> 00:17:39,500 And that's sort of the way real users turned out 358 00:17:39,500 --> 00:17:41,630 to use the Web. 359 00:17:41,630 --> 00:17:44,870 But other peer-to-peer applications that most of you 360 00:17:44,870 --> 00:17:49,816 might be more familiar with than I am have a form of search. 
361 00:17:49,816 --> 00:17:51,940 And those applications are interesting because they 362 00:17:51,940 --> 00:17:54,190 do searches on all sorts of attributes of the content 363 00:17:54,190 --> 00:17:56,340 that you want, and by and large don't really use 364 00:17:56,340 --> 00:17:58,539 DNS in any particular way; if they use it 365 00:17:58,539 --> 00:17:59,830 at all it's sort of incidental. 366 00:17:59,830 --> 00:18:01,540 They probably don't even need to use it. 367 00:18:01,540 --> 00:18:03,470 So it's not like all Internet applications 368 00:18:03,470 --> 00:18:05,600 require DNS in order to work. 369 00:18:05,600 --> 00:18:07,480 In fact, there are plenty of applications 370 00:18:07,480 --> 00:18:10,630 that don't need DNS at all. 371 00:18:10,630 --> 00:18:14,590 And they benefit a lot from search. 372 00:18:14,590 --> 00:18:20,010 OK, so let's get back to DNS and talk a little bit 373 00:18:20,010 --> 00:18:21,630 about a few different topics. 374 00:18:21,630 --> 00:18:24,020 We're going to start first with the namespace, 375 00:18:24,020 --> 00:18:25,509 and how it works. 376 00:18:25,509 --> 00:18:27,050 And then we're going to talk about how 377 00:18:27,050 --> 00:18:28,330 name resolution works. 378 00:18:28,330 --> 00:18:30,746 And then we're going to talk about things like performance 379 00:18:30,746 --> 00:18:32,185 and scalability and robustness. 380 00:18:35,850 --> 00:18:38,030 So the DNS namespace has two properties to it. 381 00:18:38,030 --> 00:18:42,710 The first is, and both of these are nice ideas, 382 00:18:42,710 --> 00:18:45,770 and hit upon a theme, or at least one of which 383 00:18:45,770 --> 00:18:48,390 hits upon a theme we've covered in 6.033. 384 00:18:48,390 --> 00:18:50,390 DNS is a hierarchical system. 385 00:18:50,390 --> 00:18:53,471 And it's a structured namespace. 386 00:18:53,471 --> 00:18:55,470 I'll describe what structured means in a moment. 
387 00:18:58,460 --> 00:19:00,540 So the way our DNS namespace works 388 00:19:00,540 --> 00:19:03,770 is it's a hierarchical system which is structured as a tree. 389 00:19:03,770 --> 00:19:06,450 And at the top of the tree there is a little circle 390 00:19:06,450 --> 00:19:08,820 which really is a dot. 391 00:19:08,820 --> 00:19:11,790 And I'm going to call that the root. 392 00:19:11,790 --> 00:19:13,529 So, everything starts at the root. 393 00:19:13,529 --> 00:19:15,320 And the root is the root of this namespace. 394 00:19:15,320 --> 00:19:16,760 And it's a tree. 395 00:19:16,760 --> 00:19:22,930 The namespace is divided into a bunch of domains. 396 00:19:22,930 --> 00:19:24,650 And domains are divided into subdomains. 397 00:19:24,650 --> 00:19:26,320 And subdomains are recursively divided 398 00:19:26,320 --> 00:19:28,810 into sub-subdomains, and so on. 399 00:19:28,810 --> 00:19:30,390 And in fact, the depth is arbitrary. 400 00:19:30,390 --> 00:19:32,440 It could be arbitrarily deep. 401 00:19:32,440 --> 00:19:34,910 In practice, nobody really has need for depth 402 00:19:34,910 --> 00:19:36,480 more than four or five. 403 00:19:36,480 --> 00:19:39,790 And, in fact, 90% of the names, or more than 90% of the names 404 00:19:39,790 --> 00:19:41,610 are pretty flat. 405 00:19:41,610 --> 00:19:44,140 You don't go more than two levels down. 406 00:19:44,140 --> 00:19:47,240 So at the top level, this is pretty familiar to most of you. 407 00:19:47,240 --> 00:19:51,880 You would have com, and edu, and net, and org, 408 00:19:51,880 --> 00:19:55,450 and gov, and a few others. 409 00:19:55,450 --> 00:19:58,970 And these things, there are actually about 13 410 00:19:58,970 --> 00:19:59,880 of them right now. 411 00:19:59,880 --> 00:20:02,720 These things are called generic top-level domains, so generic 412 00:20:02,720 --> 00:20:04,720 in the sense that they are not really associated 413 00:20:04,720 --> 00:20:07,266 with any country, for example. 
414 00:20:07,266 --> 00:20:08,640 And then in addition to this, you 415 00:20:08,640 --> 00:20:10,860 have a whole bunch of country codes like dot US, 416 00:20:10,860 --> 00:20:12,990 and I don't know how many countries there are, 417 00:20:12,990 --> 00:20:15,000 but there's a large number of them. 418 00:20:15,000 --> 00:20:18,050 So those things are country code top level domains. 419 00:20:18,050 --> 00:20:21,390 And they are not that interesting in that there's 420 00:20:21,390 --> 00:20:23,390 nothing different about them from anything else. 421 00:20:23,390 --> 00:20:26,440 So we may as well just look at the generic top-level domains. 422 00:20:26,440 --> 00:20:32,040 OK, and edu gets divided into MIT and other places that don't 423 00:20:32,040 --> 00:20:36,910 matter and so on and so forth. 424 00:20:36,910 --> 00:20:40,000 So you might end up down here. 425 00:20:40,000 --> 00:20:44,199 You might have www, or website, or for whatever 426 00:20:44,199 --> 00:20:44,990 the student's from. 427 00:20:44,990 --> 00:20:47,440 I don't know who actually owns this. 428 00:20:47,440 --> 00:20:49,860 CSAIL might be here. 429 00:20:49,860 --> 00:20:51,010 EECS might be here. 430 00:20:51,010 --> 00:20:53,320 And I might have a machine underneath here, 431 00:20:53,320 --> 00:20:57,040 let's say, X, just a random machine. 432 00:20:57,040 --> 00:20:58,700 And, the thing about the DNS namespace 433 00:20:58,700 --> 00:21:04,990 is that every label here, this is a label, right? 434 00:21:04,990 --> 00:21:09,260 So the way you read this is if you start at any label 435 00:21:09,260 --> 00:21:13,260 here and go upward, you can read it out from bottom to top. 436 00:21:13,260 --> 00:21:15,870 So you would say com is an example of a label. 437 00:21:15,870 --> 00:21:18,260 MIT, edu, root is a label. 438 00:21:18,260 --> 00:21:21,490 So that's read as MIT.edu dot. 
439 00:21:21,490 --> 00:21:22,990 And usually people omit the trailing 440 00:21:22,990 --> 00:21:25,800 dot because it's implicit. 441 00:21:25,800 --> 00:21:31,660 Or X.CSAIL.MIT.edu dot is another fully formed, what's it 442 00:21:31,660 --> 00:21:33,570 called, fully qualified domain name. 443 00:21:33,570 --> 00:21:35,130 OK, so the correct way 444 00:21:35,130 --> 00:21:37,030 to read it out is from bottom to top. 445 00:21:37,030 --> 00:21:39,824 Now, every node here is associated 446 00:21:39,824 --> 00:21:40,740 with some information. 447 00:21:40,740 --> 00:21:42,560 And that's what this record means. 448 00:21:42,560 --> 00:21:44,364 OK, some nodes might have nothing in them, 449 00:21:44,364 --> 00:21:46,030 but in general, every node is associated 450 00:21:46,030 --> 00:21:47,180 with some information. 451 00:21:47,180 --> 00:21:49,990 So, for example, X might have information here. 452 00:21:49,990 --> 00:21:54,120 I'm just going to call it INFO for now. 453 00:21:54,120 --> 00:21:55,960 But, X might have associated with it an 454 00:21:55,960 --> 00:21:59,219 A record, which is an IP address record. 455 00:21:59,219 --> 00:22:00,510 I'll describe that in a moment. 456 00:22:00,510 --> 00:22:02,593 But it might have other things associated with it. 457 00:22:02,593 --> 00:22:04,040 In fact, DNS is pretty flexible. 458 00:22:04,040 --> 00:22:08,640 You could define your own record and put that into the system, 459 00:22:08,640 --> 00:22:10,369 and have applications that read from it. 460 00:22:10,369 --> 00:22:11,410 It's pretty opaque to it. 461 00:22:11,410 --> 00:22:14,741 It doesn't really require; if you have something new that 462 00:22:14,741 --> 00:22:17,240 comes up and you want to use it, you could stick it into DNS 463 00:22:17,240 --> 00:22:18,073 and retrieve it out. 464 00:22:18,073 --> 00:22:20,230 So people put all sorts of things now into DNS. 465 00:22:20,230 --> 00:22:22,710 For example, there are weird proposals.
466 00:22:22,710 --> 00:22:26,060 Put in the GPS coordinates of a name. 467 00:22:26,060 --> 00:22:29,280 So if you know MIT.edu isn't going to move very much, 468 00:22:29,280 --> 00:22:31,580 then put in the GPS coordinates. 469 00:22:31,580 --> 00:22:33,580 You might find applications that find it useful. 470 00:22:33,580 --> 00:22:36,569 For example, if you are some mobile computing application, 471 00:22:36,569 --> 00:22:37,610 you might find it useful. 472 00:22:37,610 --> 00:22:40,160 So there's all sorts of things you could stick into the DNS. 473 00:22:40,160 --> 00:22:42,660 And people have come up with all sorts of very wacky things, 474 00:22:42,660 --> 00:22:45,790 including telephone numbers in DNS. 475 00:22:45,790 --> 00:22:46,855 It's very flexible. 476 00:22:49,640 --> 00:22:51,760 So not only are the leaves associated 477 00:22:51,760 --> 00:22:54,210 with information, but you 478 00:22:54,210 --> 00:22:58,720 can have information at any level; any node in the tree 479 00:22:58,720 --> 00:23:00,980 has information associated with it. 480 00:23:00,980 --> 00:23:04,120 And the scale of this namespace today is extremely large. 481 00:23:04,120 --> 00:23:06,960 I mean, I don't know the exact number of things there are, 482 00:23:06,960 --> 00:23:08,830 but I've read that about 500 million, 483 00:23:08,830 --> 00:23:12,780 I don't know if it's 250 million or 500 million, 484 00:23:12,780 --> 00:23:17,320 somewhere in between, there are that many registered names 485 00:23:17,320 --> 00:23:19,390 in the system in aggregate. 486 00:23:19,390 --> 00:23:22,684 That's a pretty big number. 487 00:23:22,684 --> 00:23:23,850 Now, it's a very big number. 488 00:23:23,850 --> 00:23:26,780 So you need a way in which you can make the system scale. 489 00:23:26,780 --> 00:23:32,560 And that's done using a really nice idea called delegation.
490 00:23:32,560 --> 00:23:35,664 And more than any technical decision that was made in DNS, 491 00:23:35,664 --> 00:23:37,580 I mean, we're going to talk about some of them 492 00:23:37,580 --> 00:23:39,940 like caching and all this other stuff. 493 00:23:39,940 --> 00:23:42,450 But more than any technical decision, 494 00:23:42,450 --> 00:23:45,797 delegation is really the reason DNS scales. 495 00:23:45,797 --> 00:23:47,380 And, it's ultimately really the reason 496 00:23:47,380 --> 00:23:50,190 why DNS is pretty successful. 497 00:23:50,190 --> 00:23:51,230 So, what is delegation? 498 00:23:51,230 --> 00:23:54,140 The best way to understand it is recursively. 499 00:23:54,140 --> 00:23:59,960 So the root at the top is centrally owned, 500 00:23:59,960 --> 00:24:02,240 and it's owned by a trusted entity. 501 00:24:02,240 --> 00:24:04,840 So what that means is that any name that 502 00:24:04,840 --> 00:24:07,890 ends in root, ultimately the authority for that name 503 00:24:07,890 --> 00:24:10,200 rests with whoever owns and runs root. 504 00:24:10,200 --> 00:24:14,380 And if you paid attention to the press, and the newspapers, 505 00:24:14,380 --> 00:24:16,220 or magazines, you'll see that there's 506 00:24:16,220 --> 00:24:18,970 this fight for the root that's ongoing right now 507 00:24:18,970 --> 00:24:20,249 over the past few years. 508 00:24:20,249 --> 00:24:22,790 And, the current owner of the root and things associated with 509 00:24:22,790 --> 00:24:26,080 it, and who essentially controls the namespace, 510 00:24:26,080 --> 00:24:30,200 is an entity called ICANN, I-C-A-N-N, 511 00:24:30,200 --> 00:24:34,270 and there's a lot of politics associated with it. 512 00:24:34,270 --> 00:24:38,920 Now, continuing down on the delegation idea, 513 00:24:38,920 --> 00:24:43,260 the next layer down from the root, the technical term for it 514 00:24:43,260 --> 00:24:55,110 is top-level domain because it's at the top level: TLD, OK? 
515 00:24:55,110 --> 00:24:56,899 And, these top-level domains 516 00:24:56,899 --> 00:24:58,940 are delegated from the root, and it's really hard 517 00:24:58,940 --> 00:25:00,880 to come up with a new top-level domain. 518 00:25:00,880 --> 00:25:02,630 In relation to these, you have a few more. 519 00:25:02,630 --> 00:25:04,370 But you don't really come up with them willy-nilly. 520 00:25:04,370 --> 00:25:06,680 You kind of have to go through a long procedure 521 00:25:06,680 --> 00:25:11,260 before the root decides to delegate the top-level domain 522 00:25:11,260 --> 00:25:12,260 to somebody else. 523 00:25:12,260 --> 00:25:14,430 And now it's recursive from here on. 524 00:25:14,430 --> 00:25:18,130 Every label here can be sub-delegated arbitrarily 525 00:25:18,130 --> 00:25:20,770 by only contacting the owner of that label. 526 00:25:20,770 --> 00:25:23,600 So once you get to com, you don't have to go further up. 527 00:25:23,600 --> 00:25:27,840 You just have to go to whoever happens to own the com label 528 00:25:27,840 --> 00:25:29,690 and tell them that you want to register 529 00:25:29,690 --> 00:25:32,210 a new portion of the namespace with that. 530 00:25:32,210 --> 00:25:34,780 So, for example, MIT must have gone at some point 531 00:25:34,780 --> 00:25:37,240 to the owner of edu and said, I want MIT. 532 00:25:37,240 --> 00:25:39,210 And, there's some out-of-band human procedure 533 00:25:39,210 --> 00:25:42,130 that occurs before the other party is convinced 534 00:25:42,130 --> 00:25:45,670 that this MIT is a legitimate entity, 535 00:25:45,670 --> 00:25:48,440 and allocates this name to it, and likewise. 536 00:25:48,440 --> 00:25:54,590 Whoever wanted CSAIL went to the person who runs MIT's name.
537 00:25:54,590 --> 00:25:59,830 This is called a zone, this DNS namespace 538 00:25:59,830 --> 00:26:01,460 and told it that it wanted CSAIL, 539 00:26:01,460 --> 00:26:05,470 and established by human efforts rather than anything technical 540 00:26:05,470 --> 00:26:08,710 that it wanted that portion. 541 00:26:08,710 --> 00:26:10,840 Now, the reason it scales is you can kind of 542 00:26:10,840 --> 00:26:13,530 add machines, you know, names at the bottom, not machines 543 00:26:13,530 --> 00:26:15,500 but names anywhere here without really 544 00:26:15,500 --> 00:26:16,730 having to go all the way up. 545 00:26:16,730 --> 00:26:22,146 You just need to go up to whoever owns your parent label 546 00:26:22,146 --> 00:26:23,770 and convince them that you want a name. 547 00:26:23,770 --> 00:26:25,580 So if I want to add a machine, Y, I 548 00:26:25,580 --> 00:26:28,060 don't really have to go and talk to even anybody at MIT. 549 00:26:28,060 --> 00:26:29,210 I just have to talk to the person who 550 00:26:29,210 --> 00:26:30,880 runs the CSAIL namespace and tell them 551 00:26:30,880 --> 00:26:32,520 that I want a name, Y. 552 00:26:32,520 --> 00:26:34,450 And I can, then, sub-delegate that name. 553 00:26:34,450 --> 00:26:38,240 I could, today, connect a computer X.CSAIL.MIT.edu, 554 00:26:38,240 --> 00:26:40,320 associate an IP address with it, and tomorrow 555 00:26:40,320 --> 00:26:42,236 decide I don't really want X to be a computer. 556 00:26:42,236 --> 00:26:44,710 I want it to be the name of my research group or whatever, 557 00:26:44,710 --> 00:26:46,990 and then have machines underneath it which 558 00:26:46,990 --> 00:26:50,150 are Y.X.CSAIL.MIT.edu. 559 00:26:50,150 --> 00:26:51,860 What I do with the label is my business. 560 00:26:51,860 --> 00:26:54,170 And there's no rule that these are IP addresses. 561 00:26:54,170 --> 00:26:55,294 In fact, these are nothing. 562 00:26:55,294 --> 00:26:56,214 These are just labels.
563 00:26:56,214 --> 00:26:58,380 And I can associate arbitrary amounts of information 564 00:26:58,380 --> 00:27:01,640 I want with that label. 565 00:27:06,980 --> 00:27:10,420 So, domains can be formed anywhere in the tree. 566 00:27:17,890 --> 00:27:19,602 And that's really nice because you 567 00:27:19,602 --> 00:27:21,310 don't have to go back to a central entity 568 00:27:21,310 --> 00:27:22,660 in order to add these names. 569 00:27:22,660 --> 00:27:25,620 And that's the main reason why the central server 570 00:27:25,620 --> 00:27:28,000 model doesn't really fly. 571 00:27:35,580 --> 00:27:39,810 OK, so examples of records, we've 572 00:27:39,810 --> 00:27:41,750 already seen a few of these. 573 00:27:41,750 --> 00:27:44,457 So let's look in more detail at what this info could contain. 574 00:27:44,457 --> 00:27:45,790 Like I said, it's very flexible. 575 00:27:45,790 --> 00:27:47,831 You can have all sorts of things you add in here. 576 00:27:47,831 --> 00:27:49,500 But there's a few very common ones. 577 00:27:49,500 --> 00:27:51,290 The first one is called an A record, which 578 00:27:51,290 --> 00:27:52,607 stands for an address record. 579 00:27:52,607 --> 00:27:53,940 And, it's what you might expect. 580 00:27:53,940 --> 00:27:57,510 It's an IP version four address for a name. 581 00:27:57,510 --> 00:28:01,690 So, X.CSAIL.MIT.edu, whatever its IP address is. 582 00:28:01,690 --> 00:28:03,510 So, that's stuck in this database. 583 00:28:03,510 --> 00:28:06,680 It's really maintained in a file on the name 584 00:28:06,680 --> 00:28:09,770 server that handles that name. 585 00:28:09,770 --> 00:28:14,000 You might have MX, which stands, the X is for mail exchanger. 586 00:28:14,000 --> 00:28:15,200 So, it's mail exchanger. 587 00:28:15,200 --> 00:28:16,140 So, that's for email.
588 00:28:16,140 --> 00:28:19,802 So, when I send email to you@MIT.edu, somewhere 589 00:28:19,802 --> 00:28:21,260 along the way there's a lookup done 590 00:28:21,260 --> 00:28:24,350 for not the IP address of MIT.edu, 591 00:28:24,350 --> 00:28:26,690 but the MX record for MIT.edu. 592 00:28:26,690 --> 00:28:28,690 And in general, the MX record could be anywhere. 593 00:28:28,690 --> 00:28:31,220 If MIT decided to outsource its email functionality 594 00:28:31,220 --> 00:28:34,290 to some other company, the MX record 595 00:28:34,290 --> 00:28:36,450 would just point to some name of a mail 596 00:28:36,450 --> 00:28:37,680 server in that other company. 597 00:28:37,680 --> 00:28:40,762 So it doesn't even have to be local to us. 598 00:28:40,762 --> 00:28:42,970 There's another one that's interesting in the context 599 00:28:42,970 --> 00:28:46,450 of stuff we've seen before called a CNAME, which 600 00:28:46,450 --> 00:28:48,280 stands for a canonical name. 601 00:28:48,280 --> 00:28:50,040 But a CNAME is really a synonym. 602 00:28:53,850 --> 00:28:56,690 This is where you can say there are many names that 603 00:28:56,690 --> 00:28:58,230 really mean the same thing. 604 00:28:58,230 --> 00:29:01,010 So, for example, to take a very real example, 605 00:29:01,010 --> 00:29:06,470 there used to be AI Lab and LCS, and now they're 606 00:29:06,470 --> 00:29:08,150 the same lab, CSAIL. 607 00:29:08,150 --> 00:29:11,660 Now, there are a lot of subdomains from LCS.MIT.edu, 608 00:29:11,660 --> 00:29:13,000 and AI.MIT.edu. 609 00:29:13,000 --> 00:29:14,500 And it's sort of a nightmare to have 610 00:29:14,500 --> 00:29:16,610 to go and change all of the DNS entries 611 00:29:16,610 --> 00:29:17,934 for all of the machines. 612 00:29:17,934 --> 00:29:20,350 So the standard way in which you manage this kind of thing 613 00:29:20,350 --> 00:29:22,680 is to set up these CNAMEs, which say [UNINTELLIGIBLE] 614 00:29:22,680 --> 00:29:23,470 NMS.LCS.MIT.edu.
615 00:29:23,470 --> 00:29:27,060 That's what it was. 616 00:29:27,060 --> 00:29:31,410 You just set up a CNAME that says NMS.LCS.MIT.edu 617 00:29:31,410 --> 00:29:34,990 is a CNAME for NMS.CSAIL.MIT.edu. 618 00:29:34,990 --> 00:29:37,830 So, you don't have to do anything more other than set up 619 00:29:37,830 --> 00:29:38,490 these synonyms. 620 00:29:38,490 --> 00:29:41,520 And everything else just sort of continues to work out. 621 00:29:41,520 --> 00:29:45,280 There are other useful things you could do with CNAMEs. 622 00:29:45,280 --> 00:29:47,240 For example, say 623 00:29:47,240 --> 00:29:50,570 you're running your Web server on one machine, 624 00:29:50,570 --> 00:29:53,300 and then you want to change it over to another machine 625 00:29:53,300 --> 00:29:56,597 but not have to tell the whole world about it. What you 626 00:29:56,597 --> 00:29:58,930 do is you tell everybody that your Web server's running at, 627 00:29:58,930 --> 00:30:00,530 let's say, MIT.edu. 628 00:30:00,530 --> 00:30:04,320 And then you set up a CNAME from that MIT.edu 629 00:30:04,320 --> 00:30:06,490 to machineone.MIT.edu. 630 00:30:06,490 --> 00:30:07,830 It's another name. 631 00:30:07,830 --> 00:30:10,700 And then someday you decide to change from machineone.MIT.edu 632 00:30:10,700 --> 00:30:12,460 to machinetwo.MIT.edu. 633 00:30:12,460 --> 00:30:14,490 All you have to do is to change this one mapping 634 00:30:14,490 --> 00:30:18,920 in the back end in the DNS that maps from MIT.edu, 635 00:30:18,920 --> 00:30:21,030 set up a CNAME mapping to a different machine, 636 00:30:21,030 --> 00:30:22,794 and that's all you have to do. 637 00:30:22,794 --> 00:30:24,960 You don't have to tell anybody else about the change 638 00:30:24,960 --> 00:30:25,770 that you've made. 639 00:30:25,770 --> 00:30:26,840 So, it's very useful.
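[Editor's sketch] The synonym idea can be made concrete with a toy record table. This is a minimal sketch under stated assumptions: the names under `example.edu`, the addresses from the 192.0.2.0/24 documentation range, the table layout, and the helper `lookup_a` are all hypothetical and are not how any real name server stores its data.

```python
# Toy record table keyed by (name, record type). All entries are made up.
RECORDS = {
    ("www.example.edu", "CNAME"): "machineone.example.edu",
    ("machineone.example.edu", "A"): "192.0.2.10",
    ("machinetwo.example.edu", "A"): "192.0.2.20",
}


def lookup_a(name, records, max_hops=8):
    """Return the A record for name, following CNAME synonyms if needed."""
    for _ in range(max_hops):
        address = records.get((name, "A"))
        if address is not None:
            return address
        alias = records.get((name, "CNAME"))
        if alias is None:
            return None  # no such name in this toy table
        name = alias  # continue the lookup under the canonical name
    raise RuntimeError("CNAME chain too long")


before = lookup_a("www.example.edu", RECORDS)

# Moving the Web server means changing exactly one mapping.
RECORDS[("www.example.edu", "CNAME")] = "machinetwo.example.edu"
after = lookup_a("www.example.edu", RECORDS)
print(before, after)
```

Clients keep asking for the same public name throughout; only the one CNAME entry changes.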
640 00:30:26,840 --> 00:30:30,360 And this is just an example of a more general concept 641 00:30:30,360 --> 00:30:33,257 that we've seen already called a synonym. 642 00:30:33,257 --> 00:30:34,840 And the fourth thing, which is actually 643 00:30:34,840 --> 00:30:36,590 what we're going to spend most of our time 644 00:30:36,590 --> 00:30:39,670 on today, for the rest of today, is called 645 00:30:39,670 --> 00:30:42,750 an NS record, which is a name server record. 646 00:30:47,370 --> 00:30:48,870 And, this is the thing that's really 647 00:30:48,870 --> 00:30:53,490 going to help us figure out how to implement 648 00:30:53,490 --> 00:30:56,905 DNS resolve in a scalable way. 649 00:30:56,905 --> 00:30:59,280 I will describe what a name server record is in a moment. 650 00:31:03,200 --> 00:31:08,212 So if you look at this tree, associated 651 00:31:08,212 --> 00:31:09,670 with many of the labels in the tree 652 00:31:09,670 --> 00:31:13,560 are not just A records, which are generally associated, 653 00:31:13,560 --> 00:31:16,300 but also things called NS records. 654 00:31:16,300 --> 00:31:20,220 And, what an NS record says is that, let's say 655 00:31:20,220 --> 00:31:23,240 there's an NS record associated with MIT.edu. 656 00:31:23,240 --> 00:31:28,310 What it says is that that NS record 657 00:31:28,310 --> 00:31:31,370 gives the name of the machine that's responsible for managing 658 00:31:31,370 --> 00:31:34,780 all of the names that end with MIT.edu. 659 00:31:34,780 --> 00:31:38,980 So, for example, edu wouldn't know anything 660 00:31:38,980 --> 00:31:41,450 about, in general, edu may not know anything about how 661 00:31:41,450 --> 00:31:44,870 CSAIL.MIT.edu is really mapped.
662 00:31:44,870 --> 00:31:47,820 But all edu needs to know is what the NS record is 663 00:31:47,820 --> 00:31:51,890 for MIT.edu so that as you'll see in this procedure, 664 00:31:51,890 --> 00:31:55,010 you'll see that occasionally things at the top of the tree 665 00:31:55,010 --> 00:31:56,790 will get requests for a given name, 666 00:31:56,790 --> 00:31:59,689 and they need to know what to do with the full name. 667 00:31:59,689 --> 00:32:01,230 And, the way they'll do it is they'll 668 00:32:01,230 --> 00:32:03,688 find out that they don't know anything about the full name. 669 00:32:03,688 --> 00:32:07,040 But they know about other people who know more about that name. 670 00:32:07,040 --> 00:32:08,840 And that's going to be implemented using 671 00:32:08,840 --> 00:32:10,214 this thing called an NS record. 672 00:32:10,214 --> 00:32:11,630 So I'll show this with an example. 673 00:32:11,630 --> 00:32:14,530 I think it'll become pretty clear. 674 00:32:14,530 --> 00:32:20,264 So the way in which applications use DNS is you 675 00:32:20,264 --> 00:32:21,930 have an application that wants to call 676 00:32:21,930 --> 00:32:25,129 DNS resolve on a DNS name. 677 00:32:25,129 --> 00:32:26,670 And, there's a piece of software that 678 00:32:26,670 --> 00:32:28,919 usually runs on every computer called a stub resolver. 679 00:32:32,140 --> 00:32:34,130 OK, and a stub resolver is just something 680 00:32:34,130 --> 00:32:35,840 that allows applications to not have 681 00:32:35,840 --> 00:32:38,722 to worry about this whole RPC mechanism that DNS involves, 682 00:32:38,722 --> 00:32:39,430 that DNS entails. 683 00:32:39,430 --> 00:32:39,520 So, that's just [UNINTELLIGIBLE] way into the stub resolver. 684 00:32:39,520 --> 00:32:39,620 So the stub resolver really does all of the work.
685 00:32:39,620 --> 00:32:41,203 So the way the stub resolver works is, 686 00:32:41,203 --> 00:32:51,890 and the way DNS resolution works is 687 00:32:51,890 --> 00:32:57,070 that the stub resolver really can send a DNS request that it 688 00:32:57,070 --> 00:33:04,570 has from the application to any name server that it wants to. 689 00:33:04,570 --> 00:33:07,150 And later on we'll talk about how you pick this name server. 690 00:33:10,920 --> 00:33:12,930 And there are lots of these name servers around. 691 00:33:12,930 --> 00:33:15,510 On the internet, it's a massive distributed infrastructure. 692 00:33:15,510 --> 00:33:18,190 And, the infrastructure consists of people, 693 00:33:18,190 --> 00:33:20,380 of nodes that have responsibility 694 00:33:20,380 --> 00:33:22,580 for different portions of this namespace. 695 00:33:22,580 --> 00:33:24,630 So, you send a request to any name server, 696 00:33:24,630 --> 00:33:27,110 and they all participate in this system together, 697 00:33:27,110 --> 00:33:30,640 these name servers to help resolve names. 698 00:33:30,640 --> 00:33:31,890 So let's take an example here. 699 00:33:31,890 --> 00:33:33,181 Let's say it's X.CSAIL.MIT.edu. 700 00:33:38,657 --> 00:33:40,490 So, at this point in general the name server 701 00:33:40,490 --> 00:33:44,000 knows nothing about X.CSAIL.MIT.edu. 702 00:33:44,000 --> 00:33:46,590 And its plan is going to be that it's 703 00:33:46,590 --> 00:33:55,530 going to send this request out to the root name server, OK? 704 00:33:55,530 --> 00:33:58,730 And for now, assume that the root name server is well-known, 705 00:33:58,730 --> 00:34:00,880 that is, the IP address of the name 706 00:34:00,880 --> 00:34:02,500 server in charge of the root. 707 00:34:02,500 --> 00:34:04,050 So everybody has a name server that 708 00:34:04,050 --> 00:34:05,580 got associated with themselves.
709 00:34:05,580 --> 00:34:07,954 Assume that the IP address of the name server of the root 710 00:34:07,954 --> 00:34:09,780 is just well known to everybody, OK? 711 00:34:12,350 --> 00:34:15,590 So it sends X.CSAIL.MIT.edu all the way 712 00:34:15,590 --> 00:34:17,921 to the root name server. 713 00:34:17,921 --> 00:34:19,420 And the root name server is actually 714 00:34:19,420 --> 00:34:21,340 going to look at this thing and say, well, you 715 00:34:21,340 --> 00:34:22,650 want to know the A record. 716 00:34:22,650 --> 00:34:26,084 You might want to know the A record for X.CSAIL.MIT.edu. 717 00:34:26,084 --> 00:34:27,500 It says, but I don't actually know 718 00:34:27,500 --> 00:34:30,300 what the IP address associated with this name is. 719 00:34:30,300 --> 00:34:32,800 But I do know somebody who can tell you 720 00:34:32,800 --> 00:34:37,050 more because I do know who runs the name service for edu. 721 00:34:37,050 --> 00:34:39,239 So, it comes back with the response that's 722 00:34:39,239 --> 00:34:42,030 also called a referral saying, well, I don't know the answer, 723 00:34:42,030 --> 00:34:45,060 but here's where you need to go in order to find out more. 724 00:34:45,060 --> 00:34:49,539 And that's done by sending back the name server, or in fact 725 00:34:49,539 --> 00:34:51,080 more generally a set of name servers, 726 00:34:51,080 --> 00:34:53,219 but sending back the NS records for nodes 727 00:34:53,219 --> 00:34:55,610 that handle the edu domain. 728 00:34:55,610 --> 00:34:58,600 So that just comes back at you. 729 00:34:58,600 --> 00:35:01,680 And now, you now know one or more name servers 730 00:35:01,680 --> 00:35:03,130 for the edu domain. 731 00:35:03,130 --> 00:35:05,440 So, let me write that here. 732 00:35:05,440 --> 00:35:08,000 And this name server then goes back to this edu domain 733 00:35:08,000 --> 00:35:13,430 and says, OK, tell me what the IP address for X.CSAIL.MIT.edu 734 00:35:13,430 --> 00:35:13,930 is.
735 00:35:13,930 --> 00:35:15,430 And, notice that at all stages, it's 736 00:35:15,430 --> 00:35:17,990 sending the full name because there's always 737 00:35:17,990 --> 00:35:20,140 some chance that these nodes have the answer, 738 00:35:20,140 --> 00:35:22,980 and we'll get to why that might be in a few minutes. 739 00:35:22,980 --> 00:35:24,630 But you always send the full name. 740 00:35:24,630 --> 00:35:26,650 And the edu name server in this case 741 00:35:26,650 --> 00:35:28,180 is, in general, going to say, well, 742 00:35:28,180 --> 00:35:30,560 I don't know about X.CSAIL.MIT.edu. 743 00:35:30,560 --> 00:35:34,120 But I do know that MIT came and delegated from me. 744 00:35:34,120 --> 00:35:38,480 So I do know the name server record associated with MIT 745 00:35:38,480 --> 00:35:39,770 because it delegated from me. 746 00:35:39,770 --> 00:35:42,070 And in general, everybody has to know the name server 747 00:35:42,070 --> 00:35:44,432 records for the people one level down from them. 748 00:35:44,432 --> 00:35:46,390 So it sends back this information saying, well, 749 00:35:46,390 --> 00:35:47,390 I don't know what it is. 750 00:35:47,390 --> 00:35:52,720 But here's a referral to the name server for MIT.edu. 751 00:35:52,720 --> 00:35:54,710 And this procedure basically continues. 752 00:35:54,710 --> 00:35:58,250 So this is MIT. 753 00:35:58,250 --> 00:36:00,870 Actually it's just the MIT.edu name server. 754 00:36:00,870 --> 00:36:04,010 And you just go back and forth until eventually you 755 00:36:04,010 --> 00:36:07,000 get to the name server for CSAIL.MIT.edu, which 756 00:36:07,000 --> 00:36:10,200 by definition maintains the mapping for everything 757 00:36:10,200 --> 00:36:11,781 of the form NAME.CSAIL.MIT.edu. 758 00:36:14,670 --> 00:36:16,530 And so, you get the answer.
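[Editor's sketch] The back-and-forth of referrals can be simulated with an in-memory table of name servers. This is only a sketch under loud assumptions: the server names (`root-ns`, `edu-ns`, and so on), the address 192.0.2.7, and the dictionary layout are invented for illustration. Real DNS exchanges NS and A records over the network, and a referral usually carries a set of name servers rather than one.

```python
# Each toy server knows some final answers and, for zones delegated from
# it, the name of the server to refer the asker to. All data is made up.
SERVERS = {
    "root-ns": {"answers": {}, "referrals": {"edu": "edu-ns"}},
    "edu-ns": {"answers": {}, "referrals": {"mit.edu": "mit-ns"}},
    "mit-ns": {"answers": {}, "referrals": {"csail.mit.edu": "csail-ns"}},
    "csail-ns": {"answers": {"x.csail.mit.edu": "192.0.2.7"}, "referrals": {}},
}


def query(server, name):
    """Ask one server about the FULL name; it answers, refers, or gives up."""
    data = SERVERS[server]
    if name in data["answers"]:
        return ("answer", data["answers"][name])
    for zone, delegate in data["referrals"].items():
        if name == zone or name.endswith("." + zone):
            return ("referral", delegate)
    return ("no-such-domain", None)


def resolve(name, root="root-ns", max_steps=16):
    """Follow referrals from the root; return (address or None, servers asked)."""
    server, asked = root, []
    for _ in range(max_steps):
        asked.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, asked
        if kind == "no-such-domain":
            return None, asked
        server = value  # go where the referral points and ask again
    raise RuntimeError("too many referrals")


print(resolve("x.csail.mit.edu"))
```

Note that every hop receives the full name, as in the lecture, so any server that happens to hold the answer can return it directly.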
759 00:36:16,530 --> 00:36:19,350 Or you get something that says X is not actually 760 00:36:19,350 --> 00:36:23,020 registered in which case you get a no-such-domain error message. 761 00:36:23,020 --> 00:36:25,930 Actually it's a no-such-domain error code, which you then 762 00:36:25,930 --> 00:36:28,404 interpret as saying, OK, there is no such name that's 763 00:36:28,404 --> 00:36:29,070 been registered. 764 00:36:33,020 --> 00:36:35,090 Now there's a couple of things of note here. 765 00:36:35,090 --> 00:36:39,160 The name server records associated with the domain 766 00:36:39,160 --> 00:36:41,580 really have to have very little to do with that domain. 767 00:36:41,580 --> 00:36:43,930 In fact, the name server record for, 768 00:36:43,930 --> 00:36:47,580 forget the root names for a moment. 769 00:36:47,580 --> 00:36:53,190 The name server records for the edu domain 770 00:36:53,190 --> 00:36:55,730 don't actually have to end in edu, 771 00:36:55,730 --> 00:36:59,150 or don't have to end in anything that relates to edu. 772 00:36:59,150 --> 00:37:01,580 In practice, in fact, today they're 773 00:37:01,580 --> 00:37:04,610 of the form something.NSTLD.com. 774 00:37:04,610 --> 00:37:07,260 I mean, they have nothing to do with the edu domain. 775 00:37:07,260 --> 00:37:08,780 And, this is an important point. 776 00:37:08,780 --> 00:37:14,720 You could associate here any label, any name of a name 777 00:37:14,720 --> 00:37:20,310 server that's willing to manage the delegation, 778 00:37:20,310 --> 00:37:22,050 manage the entries in your database. 779 00:37:22,050 --> 00:37:24,750 They don't actually have to match the same domain name. 
780 00:37:24,750 --> 00:37:26,870 And, this is a very powerful feature 781 00:37:26,870 --> 00:37:29,520 of the namespace, of the way DNS works, 782 00:37:29,520 --> 00:37:33,470 because it's not like this thing has to be something.edu, 783 00:37:33,470 --> 00:37:36,620 and here the name servers for MIT.edu 784 00:37:36,620 --> 00:37:39,380 have to be actually something.MIT.edu. 785 00:37:39,380 --> 00:37:42,840 In practice, edu is not managed by anything.edu. 786 00:37:42,840 --> 00:37:45,230 It's something.NSTLD.com. 787 00:37:45,230 --> 00:37:47,750 In practice, it so happens that MIT.edu happens 788 00:37:47,750 --> 00:37:49,840 to run its own name servers; that's 789 00:37:49,840 --> 00:37:51,770 something like bitsy.MIT.edu. 790 00:37:51,770 --> 00:37:53,750 But that's by no means a requirement. 791 00:38:09,316 --> 00:38:10,940 So there's a couple of problems that we 792 00:38:10,940 --> 00:38:15,000 need to solve about this basic mechanism. What we saw 793 00:38:15,000 --> 00:38:17,340 was that once you know the root name server record, 794 00:38:17,340 --> 00:38:21,980 then you could go back and forth because everybody knows how 795 00:38:21,980 --> 00:38:24,460 the names one level down from them have been delegated 796 00:38:24,460 --> 00:38:28,350 and knows the name server entries for those delegates. 797 00:38:28,350 --> 00:38:31,390 And then you get the final answer. 798 00:38:31,390 --> 00:38:33,420 So there's a few things we need to solve. 799 00:38:33,420 --> 00:38:34,545 The first one is bootstrap. 800 00:38:38,219 --> 00:38:40,510 There's actually a couple of aspects of this bootstrap. 801 00:38:40,510 --> 00:38:46,590 The first one is, how does this "any name server" here know 802 00:38:46,590 --> 00:38:51,340 the name servers of the root? 803 00:38:51,340 --> 00:38:53,480 And, the answer here is that there's no magic. 804 00:38:53,480 --> 00:38:56,090 I mean, you just have to know.
805 00:38:56,090 --> 00:38:58,380 And, in some sense, in all naming systems, 806 00:38:58,380 --> 00:39:01,940 ultimately there is, at some level of this name discovery 807 00:39:01,940 --> 00:39:05,300 procedure, at some level there's out-of-band machinery that 808 00:39:05,300 --> 00:39:06,900 has to get in and be involved. 809 00:39:06,900 --> 00:39:10,804 And the way this works out is that people publish the name 810 00:39:10,804 --> 00:39:11,970 server records for the root. 811 00:39:11,970 --> 00:39:13,280 They post it on websites. 812 00:39:13,280 --> 00:39:15,020 They publish it on mailing lists. 813 00:39:15,020 --> 00:39:19,820 And DNS software is configured 814 00:39:19,820 --> 00:39:21,360 with those entries. 815 00:39:21,360 --> 00:39:24,290 And there are proposals and protocols for automatically 816 00:39:24,290 --> 00:39:25,316 updating it and so on. 817 00:39:25,316 --> 00:39:26,690 But you've got to be very careful 818 00:39:26,690 --> 00:39:28,106 with technical solutions like that 819 00:39:28,106 --> 00:39:31,550 because you want to make sure that malicious bodies don't 820 00:39:31,550 --> 00:39:35,140 pretend that they're telling you the correct name server entry 821 00:39:35,140 --> 00:39:37,930 because once they usurp DNS functioning, 822 00:39:37,930 --> 00:39:39,850 they could do a lot of damage. 823 00:39:39,850 --> 00:39:42,370 In practice, in fact, DNS is not particularly secure. 824 00:39:42,370 --> 00:39:44,970 It's a different discussion as to whether that's 825 00:39:44,970 --> 00:39:46,230 important or not. 826 00:39:46,230 --> 00:39:49,310 But the way this root name server mapping works 827 00:39:49,310 --> 00:39:51,694 is that everybody just knows. 828 00:39:51,694 --> 00:39:52,610 It's widely published. 829 00:39:52,610 --> 00:39:54,960 You go to Google and you'll just find the answer. 830 00:39:54,960 --> 00:39:59,130 It's also on all sorts of mailing lists.
831 00:39:59,130 --> 00:40:01,790 So the first one is root identity. 832 00:40:05,080 --> 00:40:08,780 The second issue is, how does the stub 833 00:40:08,780 --> 00:40:10,330 connect to any name server? 834 00:40:13,060 --> 00:40:14,480 Would it just pick at random? 835 00:40:14,480 --> 00:40:18,170 How does it find the name server that it can connect to? 836 00:40:18,170 --> 00:40:20,950 And the answer to this is that this is actually 837 00:40:20,950 --> 00:40:22,950 running on your computer. 838 00:40:22,950 --> 00:40:25,310 The stub resolver is running on your machine. 839 00:40:25,310 --> 00:40:27,410 It's a library in your machine. 840 00:40:27,410 --> 00:40:29,510 So, one approach, and the most common approach 841 00:40:29,510 --> 00:40:32,120 today is that you might obtain it 842 00:40:32,120 --> 00:40:35,530 when you obtain an IP address using a protocol like DHCP where 843 00:40:35,530 --> 00:40:38,430 you turn on your computer and you get an IP address using 844 00:40:38,430 --> 00:40:39,320 some protocol. 845 00:40:39,320 --> 00:40:41,700 With that protocol, some gateway upstream 846 00:40:41,700 --> 00:40:44,734 might tell you which name server to use. 847 00:40:44,734 --> 00:40:46,150 So if you have a computer at home, 848 00:40:46,150 --> 00:40:49,020 your Internet service provider would have something set up. 849 00:40:49,020 --> 00:40:54,870 So they would tell you what name server to use. 850 00:40:54,870 --> 00:40:57,440 So this can be done using a mechanism like DHCP. 851 00:40:57,440 --> 00:41:00,580 Or like in the old days, and these days 852 00:41:00,580 --> 00:41:04,510 if you want to do this kind of work, it's done manually. 853 00:41:07,707 --> 00:41:09,540 For example, you could go into some registry 854 00:41:09,540 --> 00:41:11,482 somewhere or a file like /etc/resolv.conf. 855 00:41:11,482 --> 00:41:12,690 Then you just stick stuff in.
856 00:41:17,760 --> 00:41:20,444 The second issue that we need to spend some time on, which 857 00:41:20,444 --> 00:41:22,610 we are going to spend the next five or so minutes on 858 00:41:22,610 --> 00:41:23,235 is performance. 859 00:41:28,580 --> 00:41:30,110 And the main point about performance 860 00:41:30,110 --> 00:41:33,020 is that if you look at that picture for how names are being 861 00:41:33,020 --> 00:41:36,340 resolved, if somebody at Berkeley 862 00:41:36,340 --> 00:41:39,110 wants to resolve X.CSAIL.MIT.edu, 863 00:41:39,110 --> 00:41:40,820 there's a huge number of roundtrips 864 00:41:40,820 --> 00:41:42,980 that they have to do to all sorts of name servers 865 00:41:42,980 --> 00:41:45,240 all over the world, or at least all over the country 866 00:41:45,240 --> 00:41:47,290 before they can figure out what the IP 867 00:41:47,290 --> 00:41:48,610 address is for a machine. 868 00:41:48,610 --> 00:41:50,869 And this is a little bit silly because often, they 869 00:41:50,869 --> 00:41:52,410 might want to connect to your webpage 870 00:41:52,410 --> 00:41:55,920 and get a 4 kB thing, which takes a couple of roundtrips 871 00:41:55,920 --> 00:41:56,520 to get. 872 00:41:56,520 --> 00:41:57,894 And, in order to do that, they're 873 00:41:57,894 --> 00:42:01,130 spending a huge number of roundtrips, 874 00:42:01,130 --> 00:42:05,520 a lot of latency, in getting this right answer. 875 00:42:05,520 --> 00:42:07,870 So the approach that we're going to take to solve 876 00:42:07,870 --> 00:42:10,540 the too-many-roundtrips problem is an approach 877 00:42:10,540 --> 00:42:11,436 you've already seen. 878 00:42:11,436 --> 00:42:12,560 We're going to use caching.
879 00:42:15,886 --> 00:42:17,510 And, in order for this caching to work, 880 00:42:17,510 --> 00:42:19,430 there is actually something in this picture 881 00:42:19,430 --> 00:42:22,220 that you have to understand in a little bit more detail 882 00:42:22,220 --> 00:42:24,500 about two different kinds of name resolutions 883 00:42:24,500 --> 00:42:26,560 that are really going on. 884 00:42:26,560 --> 00:42:28,010 The first kind of name resolution 885 00:42:28,010 --> 00:42:31,920 that's going on in DNS is the kind of resolution 886 00:42:31,920 --> 00:42:34,900 that the edu name server or the MIT.edu name server 887 00:42:34,900 --> 00:42:36,460 is doing in this picture, and that's 888 00:42:36,460 --> 00:42:38,270 called iterative resolution. 889 00:42:38,270 --> 00:42:40,110 Iterative resolution says the following. 890 00:42:40,110 --> 00:42:44,634 If I ask you to do some work for me to resolve a name, 891 00:42:44,634 --> 00:42:46,550 if you don't know the answer you just tell me, 892 00:42:46,550 --> 00:42:48,758 I don't know the answer, but you go here and find out 893 00:42:48,758 --> 00:42:49,504 the answer. 894 00:42:49,504 --> 00:42:51,420 So you're getting these pointers referring you 895 00:42:51,420 --> 00:42:53,215 to the right place. 896 00:42:53,215 --> 00:42:55,090 The second kind of resolution that's going on 897 00:42:55,090 --> 00:42:57,280 is the kind of resolution that this box is doing, 898 00:42:57,280 --> 00:42:59,580 this any name server box. 899 00:42:59,580 --> 00:43:01,890 And it's doing something called recursive resolution 900 00:43:01,890 --> 00:43:04,770 because what's going on here is the stub resolver's telling it, 901 00:43:04,770 --> 00:43:07,370 here's a name; resolve it for me. 902 00:43:07,370 --> 00:43:09,930 And what it's doing is basically saying, 903 00:43:09,930 --> 00:43:13,170 OK, I'll resolve it for you and get you the final answer.
904 00:43:13,170 --> 00:43:15,020 I'm not going to give you a referral back 905 00:43:15,020 --> 00:43:18,300 to other places, which means eventually 906 00:43:18,300 --> 00:43:20,990 once it figures out the answer, if there is one, 907 00:43:20,990 --> 00:43:24,170 it's going to know the answer even though it did not 908 00:43:24,170 --> 00:43:26,440 originate the query. 909 00:43:26,440 --> 00:43:28,570 And the moment you have recursive resolution, 910 00:43:28,570 --> 00:43:30,390 it means that you can cache the answer 911 00:43:30,390 --> 00:43:33,140 because if somebody else comes and asks you the same query, 912 00:43:33,140 --> 00:43:36,550 you already have the answer cached. 913 00:43:36,550 --> 00:43:39,460 And notice that this benefit doesn't 914 00:43:39,460 --> 00:43:41,810 accrue if you're doing purely iterative resolution. 915 00:43:41,810 --> 00:43:44,210 If everybody was just sending referrals back 916 00:43:44,210 --> 00:43:46,720 to the stub resolver, the only caching 917 00:43:46,720 --> 00:43:49,832 you'd really be gaining 918 00:43:49,832 --> 00:43:52,290 is for the requests that are common to this computer 919 00:43:52,290 --> 00:43:55,940 because nobody else is actually caching or getting 920 00:43:55,940 --> 00:43:57,100 any answers along the way. 921 00:43:57,100 --> 00:44:00,860 Nobody's even getting any referrals along the way. 922 00:44:00,860 --> 00:44:04,910 Only the node that's starting the name resolution 923 00:44:04,910 --> 00:44:07,100 is going to be getting any answers or any referrals 924 00:44:07,100 --> 00:44:09,190 at all. 925 00:44:09,190 --> 00:44:13,730 So the secret to getting good DNS performance is for certain 926 00:44:13,730 --> 00:44:16,660 nodes, for certain name servers, to agree to do recursive 927 00:44:16,660 --> 00:44:17,747 resolution -- 928 00:44:24,014 --> 00:44:26,430 And an example of that is any name server in this picture.
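To make the two kinds of resolution concrete, here is a minimal toy sketch, not the real DNS protocol. The classes `IterativeServer` and `RecursiveServer`, the zone layout, and the address 192.0.2.10 are all assumptions made up for illustration. An iterative server either answers or hands back a referral; the recursive server chases referrals on the client's behalf and, because it ends up holding the final answer, can cache it.

```python
class IterativeServer:
    """Answers from its own data or refers the asker elsewhere."""

    def __init__(self, answers=None, referrals=None):
        self.answers = answers or {}      # full name -> address
        self.referrals = referrals or {}  # zone suffix -> IterativeServer

    def query(self, name):
        if name in self.answers:
            return ("answer", self.answers[name])
        for suffix, server in self.referrals.items():
            if name == suffix or name.endswith("." + suffix):
                return ("referral", server)
        return ("nxdomain", None)


class RecursiveServer:
    """Follows referrals itself and returns only the final answer."""

    def __init__(self, root):
        self.root = root
        self.cache = {}            # full name -> address
        self.upstream_queries = 0  # how many iterative queries were sent

    def resolve(self, name):
        if name in self.cache:
            return self.cache[name]
        server = self.root
        while True:
            self.upstream_queries += 1
            kind, value = server.query(name)
            if kind == "referral":
                server = value     # go where the referral points
                continue
            if kind == "answer":
                self.cache[name] = value
            return value           # None if the name does not exist


# Hypothetical hierarchy: root -> edu -> mit.edu -> csail.mit.edu
csail = IterativeServer(answers={"x.csail.mit.edu": "192.0.2.10"})
mit = IterativeServer(referrals={"csail.mit.edu": csail})
edu = IterativeServer(referrals={"mit.edu": mit})
root = IterativeServer(referrals={"edu": edu})

any_ns = RecursiveServer(root)
first = any_ns.resolve("x.csail.mit.edu")
queries_after_first = any_ns.upstream_queries
second = any_ns.resolve("x.csail.mit.edu")
print(first, queries_after_first, any_ns.upstream_queries)
```

In this toy setup the first lookup costs one query per level of the hierarchy, and the repeat lookup costs none, which is the benefit that only shows up once some server agrees to resolve recursively.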
929 00:44:26,430 --> 00:44:29,251 So now if you have a lot of computers here 930 00:44:29,251 --> 00:44:31,500 with a lot of applications running on them, all of which 931 00:44:31,500 --> 00:44:33,840 use the same name server, and that name server 932 00:44:33,840 --> 00:44:36,380 is configured to recursively resolve names, then 933 00:44:36,380 --> 00:44:38,900 that name server benefits from being 934 00:44:38,900 --> 00:44:43,580 able to cache the answers to previous DNS lookups. 935 00:44:43,580 --> 00:44:46,020 But notice that they can cache two kinds of answers, 936 00:44:46,020 --> 00:44:48,975 one of which is much more important than the other. 937 00:44:48,975 --> 00:44:50,600 The first kind of answer they can cache 938 00:44:50,600 --> 00:44:52,141 is the final answer that you get back 939 00:44:52,141 --> 00:44:56,230 that says X.CSAIL.MIT.edu is at a particular IP address. 940 00:44:56,230 --> 00:44:59,690 So the next time somebody goes to X.CSAIL.MIT.edu, 941 00:44:59,690 --> 00:45:02,430 they have the answer right there. 942 00:45:02,430 --> 00:45:05,060 But if you look at the statistics of DNS requests, 943 00:45:05,060 --> 00:45:07,240 there is some commonality in that everybody 944 00:45:07,240 --> 00:45:11,360 wants to go to www.CNN.com or Google.com or Yahoo.com, 945 00:45:11,360 --> 00:45:12,120 and so on. 946 00:45:12,120 --> 00:45:13,890 But, there's a huge number of requests 947 00:45:13,890 --> 00:45:15,690 going to machines like yours and mine 948 00:45:15,690 --> 00:45:17,460 which aren't running anything interesting. 949 00:45:17,460 --> 00:45:18,876 And you are really the only people 950 00:45:18,876 --> 00:45:20,330 interested in those machines. 951 00:45:20,330 --> 00:45:22,010 So really what's going on, and why 952 00:45:22,010 --> 00:45:25,380 caching helps is that not only are the final answers being 953 00:45:25,380 --> 00:45:28,670 cached, these referrals are being cached.
954 00:45:28,670 --> 00:45:31,240 So for example, the any name server already knows the root, 955 00:45:31,240 --> 00:45:33,850 but now after the first request it knows the mapping 956 00:45:33,850 --> 00:45:35,670 for edu's name server. 957 00:45:35,670 --> 00:45:39,130 And after the first one to MIT.edu, 958 00:45:39,130 --> 00:45:42,590 it can cache the mapping for MIT.edu's name server as well. 959 00:45:42,590 --> 00:45:46,104 So, the next time somebody asks for anything.MIT.edu, 960 00:45:46,104 --> 00:45:48,270 this name server doesn't have to go all the way back 961 00:45:48,270 --> 00:45:48,800 to the root. 962 00:45:48,800 --> 00:45:51,560 In fact, it doesn't even have to go all the way back to edu. 963 00:45:51,560 --> 00:45:53,880 It just has to start with MIT.edu 964 00:45:53,880 --> 00:45:58,310 whose name server entry it already has in its cache. 965 00:45:58,310 --> 00:46:01,300 And, that really is the reason why 966 00:46:01,300 --> 00:46:03,610 the DNS scales very, very well. 967 00:46:03,610 --> 00:46:05,260 It's because it does caching. 968 00:46:05,260 --> 00:46:07,960 But the real key is that it's caching referrals. 969 00:46:07,960 --> 00:46:10,200 It's caching these name server entries 970 00:46:10,200 --> 00:46:12,135 associated with these labels. 971 00:46:12,135 --> 00:46:14,510 It's getting some benefit from caching the final answers, 972 00:46:14,510 --> 00:46:17,176 but it's really getting a lot of benefit from caching referrals. 973 00:46:19,657 --> 00:46:21,240 Now, of course, when you cache things, 974 00:46:21,240 --> 00:46:24,229 you have to worry about them being stale because you certainly 975 00:46:24,229 --> 00:46:25,520 don't want to cache it forever. 976 00:46:25,520 --> 00:46:30,040 If you cached it forever, then nobody could change anything.
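The referral caching just described can be sketched as follows. This is a toy model under stated assumptions, not how any particular name server is implemented: the classes `Server` and `CachingResolver`, the zone layout, and the addresses are hypothetical. The resolver remembers which server is responsible for each zone it has been referred to, and starts each new lookup at the most specific zone it already knows about instead of at the root.

```python
class Server:
    """Toy iterative server that counts how often it is asked."""

    def __init__(self, answers=None, referrals=None):
        self.answers = answers or {}      # full name -> address
        self.referrals = referrals or {}  # zone suffix -> Server
        self.hits = 0

    def query(self, name):
        self.hits += 1
        if name in self.answers:
            return ("answer", self.answers[name])
        for zone, server in self.referrals.items():
            if name.endswith("." + zone):
                return ("referral", (zone, server))
        return ("nxdomain", None)


class CachingResolver:
    """Recursive resolver that caches final answers and referrals."""

    def __init__(self, root):
        self.root = root
        self.answers = {}    # full name -> address
        self.referrals = {}  # zone suffix -> Server

    def _closest_server(self, name):
        labels = name.split(".")
        # Try the longest known suffix first: "mit.edu" before "edu".
        for i in range(1, len(labels)):
            zone = ".".join(labels[i:])
            if zone in self.referrals:
                return self.referrals[zone]
        return self.root

    def resolve(self, name):
        if name in self.answers:
            return self.answers[name]
        server = self._closest_server(name)
        while True:
            kind, value = server.query(name)
            if kind == "referral":
                zone, server = value
                self.referrals[zone] = server  # remember the referral
            elif kind == "answer":
                self.answers[name] = value
                return value
            else:
                return None


# Hypothetical hierarchy: root -> edu -> mit.edu
mit = Server(answers={"www.mit.edu": "192.0.2.1", "web.mit.edu": "192.0.2.2"})
edu = Server(referrals={"mit.edu": mit})
root = Server(referrals={"edu": edu})

resolver = CachingResolver(root)
resolver.resolve("www.mit.edu")  # walks root, edu, mit.edu
resolver.resolve("web.mit.edu")  # different name, starts at mit.edu
print(root.hits, edu.hits, mit.hits)
```

The second lookup is for a name nobody has asked about before, so the final-answer cache does not help, yet the root and edu servers are not contacted again. That is the sense in which caching referrals takes load off the top of the hierarchy.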
977 00:46:30,040 --> 00:46:32,150 So let's say you decide to change 978 00:46:32,150 --> 00:46:36,580 the mapping of www.MIT.edu's A record from one IP address 979 00:46:36,580 --> 00:46:37,080 to another. 980 00:46:37,080 --> 00:46:39,659 How do you tell the whole world that you've changed it? 981 00:46:39,659 --> 00:46:41,700 Well, there are two high-level strategies for this. 982 00:46:41,700 --> 00:46:43,616 One is to somehow keep track of all the people 983 00:46:43,616 --> 00:46:45,510 who have cached it and invalidate entries. 984 00:46:45,510 --> 00:46:47,300 And a few lectures from now, we'll 985 00:46:47,300 --> 00:46:49,320 look at ways in which that kind of approach 986 00:46:49,320 --> 00:46:51,770 might be made to work for different systems. 987 00:46:51,770 --> 00:46:54,970 DNS deals with it in sort of a different way. 988 00:46:54,970 --> 00:46:57,290 It doesn't worry about invalidation. 989 00:46:57,290 --> 00:47:03,200 Instead, it sets an expiration time on entries, 990 00:47:03,200 --> 00:47:06,370 also called a TTL, a time to live. 991 00:47:06,370 --> 00:47:10,162 That's an abused and overloaded term in networks. 992 00:47:10,162 --> 00:47:11,620 But it's really an expiration time. 993 00:47:11,620 --> 00:47:14,910 It says that here's the answer to any of these questions, 994 00:47:14,910 --> 00:47:16,700 to a referral or to the final answer. 995 00:47:16,700 --> 00:47:17,780 Here's the mapping. 996 00:47:17,780 --> 00:47:20,400 And, it's valid for such and such a time, 997 00:47:20,400 --> 00:47:23,710 OK, like it could be anywhere from 15 seconds or 30 seconds 998 00:47:23,710 --> 00:47:25,847 to three hours or a day. 999 00:47:25,847 --> 00:47:26,930 Usually it's a day or two. 1000 00:47:26,930 --> 00:47:29,410 It's usually never more than a couple of days 1001 00:47:29,410 --> 00:47:31,590 because you reach the point of diminishing returns. 1002 00:47:31,590 --> 00:47:34,110 One request every two days is not a big deal.
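A minimal sketch of expiration-based caching, assuming a hypothetical `TTLCache` class and a made-up authoritative table; real resolvers are more involved. The clock is passed in so the example is deterministic. In a real program one would likely pass something like `time.monotonic` instead.

```python
class TTLCache:
    """Cache whose entries stop being valid after their time to live."""

    def __init__(self, clock):
        self.clock = clock  # callable returning the current time in seconds
        self.entries = {}   # name -> (value, expires_at)

    def put(self, name, value, ttl_seconds):
        self.entries[name] = (value, self.clock() + ttl_seconds)

    def get(self, name):
        entry = self.entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[name]  # expired: forget it, no invalidation message needed
            return None
        return value


def lookup(name, cache, authoritative, stats):
    """Serve from cache if still valid, otherwise ask the responsible server."""
    value = cache.get(name)
    if value is None:
        stats["server_queries"] += 1
        value, ttl = authoritative[name]  # the server chooses the TTL
        cache.put(name, value, ttl)
    return value


# Hypothetical data: one record with a one-hour TTL.
now = [0]
cache = TTLCache(lambda: now[0])
authoritative = {"www.mit.edu": ("192.0.2.1", 3600)}
stats = {"server_queries": 0}

for t in (0, 10, 1800, 3599, 3600, 4000, 7300):
    now[0] = t
    lookup("www.mit.edu", cache, authoritative, stats)

print(stats["server_queries"])
```

Of the seven accesses in this run, only the ones at t=0, t=3600 and t=7300 reach the authoritative table; the rest are served from the cache. The price is that a changed mapping can stay invisible to a caching resolver for up to one TTL.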
1003 00:47:34,110 --> 00:47:37,580 So usually it's on the order of several seconds 1004 00:47:37,580 --> 00:47:39,690 if you want the mapping to be fine-grained, 1005 00:47:39,690 --> 00:47:41,630 or an hour, or a day. 1006 00:47:41,630 --> 00:47:45,820 And, after that, any access made after that to the same entry, 1007 00:47:45,820 --> 00:47:48,490 whether it be a referral or whether it be a final answer, 1008 00:47:48,490 --> 00:47:50,910 has to go back to the server that's authoritative, that's 1009 00:47:50,910 --> 00:47:52,410 responsible for that entry. 1010 00:47:52,410 --> 00:47:56,132 So, for example, you went for the first time doing 1011 00:47:56,132 --> 00:47:57,840 the lookup of this name, and you got back 1012 00:47:57,840 --> 00:48:00,940 that MIT.edu's name server was some name, 1013 00:48:00,940 --> 00:48:03,780 and that it was valid for an hour. 1014 00:48:03,780 --> 00:48:05,540 Then the first request that happens 1015 00:48:05,540 --> 00:48:10,650 for anything.MIT.edu from here after an hour has to go here. 1016 00:48:10,650 --> 00:48:12,450 I mean, assuming that edu is still 1017 00:48:12,450 --> 00:48:14,760 valid, hasn't yet expired. 1018 00:48:14,760 --> 00:48:16,490 This entry has to go here. 1019 00:48:16,490 --> 00:48:20,450 So, if you look at a time sequence of when you go, 1020 00:48:20,450 --> 00:48:24,270 when you actually go back to the server responsible for a name, 1021 00:48:24,270 --> 00:48:26,880 you can kind of divide up time into these chunks which 1022 00:48:26,880 --> 00:48:28,630 are the expiration time intervals assuming 1023 00:48:28,630 --> 00:48:29,440 they don't change. 1024 00:48:29,440 --> 00:48:31,000 They are set by the server. 1025 00:48:31,000 --> 00:48:33,300 And then, you might have accesses in between like this.
1026 00:48:37,480 --> 00:48:40,970 And the only accesses that go to the server 1027 00:48:40,970 --> 00:48:43,010 responsible for the name are the first ones 1028 00:48:43,010 --> 00:48:48,700 after every expiration. 1029 00:48:48,700 --> 00:48:53,221 So, this is the basic story for how 1030 00:48:53,221 --> 00:48:54,720 you get reasonably good performance, 1031 00:48:54,720 --> 00:48:57,934 and save a lot of roundtrips in DNS. 1032 00:48:57,934 --> 00:49:00,350 And the real reason for the scalability of the domain name 1033 00:49:00,350 --> 00:49:02,810 system is a combination of administrative delegation, 1034 00:49:02,810 --> 00:49:04,630 so you don't have some human being 1035 00:49:04,630 --> 00:49:08,160 involved in every name that's being added to the network. 1036 00:49:08,160 --> 00:49:11,250 It's distributed across different organizations. 1037 00:49:11,250 --> 00:49:14,000 And the fact that you have caching, 1038 00:49:14,000 --> 00:49:15,800 in particular you're caching these name 1039 00:49:15,800 --> 00:49:21,110 server records, which means that you can save a lot of load 1040 00:49:21,110 --> 00:49:25,800 on the root and on the edu or com name servers. 1041 00:49:25,800 --> 00:49:27,820 Originally, one thought that the designers 1042 00:49:27,820 --> 00:49:29,820 of DNS had was that DNS 1043 00:49:29,820 --> 00:49:32,970 would scale very well because the name space is extremely 1044 00:49:32,970 --> 00:49:35,940 divided and hierarchical, which means that you were gaining 1045 00:49:35,940 --> 00:49:37,240 a lot from the hierarchy. 1046 00:49:37,240 --> 00:49:39,240 And that's why it would scale. 1047 00:49:39,240 --> 00:49:42,500 But that's actually not true because more than 90% 1048 00:49:42,500 --> 00:49:46,220 of the domain namespace is not hierarchical in any deep sense. 1049 00:49:46,220 --> 00:49:47,760 Everybody is something.com.
1050 00:49:47,760 --> 00:49:50,000 Everybody of importance is something.com, 1051 00:49:50,000 --> 00:49:51,890 or wants to be something.com. 1052 00:49:51,890 --> 00:49:54,220 And so, most of the load gets here. 1053 00:49:54,220 --> 00:49:55,660 So, there's no real deep hierarchy 1054 00:49:55,660 --> 00:49:57,035 that you're benefiting from here. 1055 00:49:57,035 --> 00:49:59,640 That's usually some flatname.com. 1056 00:49:59,640 --> 00:50:01,580 But it's divided. 1057 00:50:01,580 --> 00:50:02,900 And it's delegated. 1058 00:50:02,900 --> 00:50:04,650 And that's the really nice thing about it. 1059 00:50:04,650 --> 00:50:06,030 So you are gaining much more from the fact 1060 00:50:06,030 --> 00:50:07,130 that it's delegated. 1061 00:50:07,130 --> 00:50:09,530 You'd probably gain the same scalability if everything 1062 00:50:09,530 --> 00:50:13,150 were just something.root, OK? 1063 00:50:13,150 --> 00:50:14,800 We didn't really need too much of this 1064 00:50:14,800 --> 00:50:17,049 in terms of scalability, although it's very convenient 1065 00:50:17,049 --> 00:50:18,640 to be able to do delegation. 1066 00:50:18,640 --> 00:50:20,760 But primarily it seems to be universities 1067 00:50:20,760 --> 00:50:23,560 that take advantage of this kind of depth here. 1068 00:50:23,560 --> 00:50:26,530 Most companies tend not to pay much attention to depth. 1069 00:50:26,530 --> 00:50:28,700 But yet, the system scales because it is, in fact, 1070 00:50:28,700 --> 00:50:31,830 administratively delegated. 1071 00:50:31,830 --> 00:50:33,770 So one word on replication. 1072 00:50:33,770 --> 00:50:37,770 The DNS name servers responsible for these names are replicated. 1073 00:50:37,770 --> 00:50:40,980 You can't set up a DNS name and have a single name server associated 1074 00:50:40,980 --> 00:50:41,590 with it.
1075 00:50:41,590 --> 00:50:44,180 A DNS name server record has to have at least two entries, 1076 00:50:44,180 --> 00:50:46,640 and they have to be on two different networks. 1077 00:50:46,640 --> 00:50:48,860 And that's so that if one of them is down 1078 00:50:48,860 --> 00:50:50,480 you can get to the other. 1079 00:50:50,480 --> 00:50:53,210 And unfortunately it turns out that that simple rule is often 1080 00:50:53,210 --> 00:50:54,310 violated. 1081 00:50:54,310 --> 00:50:56,060 Probably the most celebrated incident here 1082 00:50:56,060 --> 00:50:59,320 was when Microsoft's update site, or one of the sites, 1083 00:50:59,320 --> 00:51:00,800 was down for more than 24 hours. 1084 00:51:00,800 --> 00:51:03,030 And in the end it turned out, as in all these cases, 1085 00:51:03,030 --> 00:51:05,430 to be a complicated set of reasons for why it failed. 1086 00:51:05,430 --> 00:51:07,180 Nothing fails for a simple reason. 1087 00:51:07,180 --> 00:51:09,610 But one of the root causes was that they 1088 00:51:09,610 --> 00:51:11,829 had DNS name servers that were replicated 1089 00:51:11,829 --> 00:51:14,120 but that happened to be behind the same Ethernet switch 1090 00:51:14,120 --> 00:51:17,020 on the same subnet, which is not recommended. 1091 00:51:17,020 --> 00:51:18,190 But that's what they had. 1092 00:51:18,190 --> 00:51:20,180 So it's the kind of thing that you 1093 00:51:20,180 --> 00:51:22,227 need to be careful about doing. 1094 00:51:22,227 --> 00:51:23,310 So I'm going to stop here. 1095 00:51:23,310 --> 00:51:26,860 And from Wednesday, we will talk about fault tolerance. 1096 00:51:26,860 --> 00:51:29,470 The recitation for tomorrow is a very short paper 1097 00:51:29,470 --> 00:51:31,440 called Google and 9/11. 1098 00:51:31,440 --> 00:51:35,230 But tomorrow you'll see a little bit about how Google works.