1 00:00:00,070 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,820 Commons license. 3 00:00:03,820 --> 00:00:06,060 Your support will help MIT OpenCourseWare 4 00:00:06,060 --> 00:00:10,150 continue to offer high quality, educational resources for free. 5 00:00:10,150 --> 00:00:12,700 To make a donation, or to view additional materials 6 00:00:12,700 --> 00:00:16,600 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,600 --> 00:00:17,260 at ocw.mit.edu. 8 00:00:26,985 --> 00:00:27,860 PROFESSOR: All right. 9 00:00:27,860 --> 00:00:29,760 Let's get started. 10 00:00:29,760 --> 00:00:32,409 So today we're going to talk about capabilities, 11 00:00:32,409 --> 00:00:36,310 continue our discussion of how to do privilege separation. 12 00:00:36,310 --> 00:00:39,960 And remember last week we talked about how Unix provides 13 00:00:39,960 --> 00:00:41,910 some mechanisms for applications to use 14 00:00:41,910 --> 00:00:45,600 if they want to privilege separate the application's 15 00:00:45,600 --> 00:00:46,649 internal structure. 16 00:00:46,649 --> 00:00:48,940 And today we're going to talk about capabilities, which 17 00:00:48,940 --> 00:00:53,720 is a very different way of thinking about privileges 18 00:00:53,720 --> 00:00:56,220 that an application might have. 19 00:00:56,220 --> 00:00:59,030 And this is why we have actually these two somewhat distinct 20 00:00:59,030 --> 00:01:06,840 readings for today, one of which is this confused deputy problem 21 00:01:06,840 --> 00:01:10,982 and how to make your privileges much more explicit when you're 22 00:01:10,982 --> 00:01:12,940 writing software so that you don't accidentally 23 00:01:12,940 --> 00:01:14,595 use the wrong privileges. 
24 00:01:14,595 --> 00:01:16,470 And then the second paper is about the system 25 00:01:16,470 --> 00:01:20,700 called Capsicum, which is all about sandboxing and running 26 00:01:20,700 --> 00:01:22,930 some piece of code with fewer privileges 27 00:01:22,930 --> 00:01:26,420 so that it, very much like [INAUDIBLE], 28 00:01:26,420 --> 00:01:29,786 if it's compromised, the damage isn't that great. 29 00:01:29,786 --> 00:01:31,830 Now it turns out that the authors 30 00:01:31,830 --> 00:01:34,380 of both of these readings really think 31 00:01:34,380 --> 00:01:37,610 capabilities are the answer, because they let you manipulate 32 00:01:37,610 --> 00:01:42,540 privileges in a rather different way from how Unix, let's say, 33 00:01:42,540 --> 00:01:44,812 thinks about privileges. 34 00:01:44,812 --> 00:01:47,270 So to get started, maybe let's look at this confused deputy 35 00:01:47,270 --> 00:01:48,880 problem and try to understand what 36 00:01:48,880 --> 00:01:52,980 is this problem that Norman Hardy ran into and was 37 00:01:52,980 --> 00:01:54,590 so perplexed by. 38 00:01:54,590 --> 00:01:56,854 So the paper is written-- well, it 39 00:01:56,854 --> 00:01:58,395 was written quite a while ago, and it 40 00:01:58,395 --> 00:02:01,020 uses syntax for file names that's a bit surprising. 41 00:02:01,020 --> 00:02:04,480 But we can try to at least transcribe his problem 42 00:02:04,480 --> 00:02:07,690 into more familiar syntax with Unix-style path 43 00:02:07,690 --> 00:02:08,947 names, et cetera. 44 00:02:08,947 --> 00:02:10,530 So as far as I can tell, what is going 45 00:02:10,530 --> 00:02:13,880 on in their system is that they had a Fortran compiler, which 46 00:02:13,880 --> 00:02:16,310 sort of dates their design at some level, too. 
47 00:02:16,310 --> 00:02:22,030 But their Fortran compiler lived in /sysx/fort, 48 00:02:22,030 --> 00:02:26,150 and they wanted to change this Fortran compiler, 49 00:02:26,150 --> 00:02:29,554 so they would keep statistics about what was compiled, 50 00:02:29,554 --> 00:02:31,720 what parts of a compiler were particularly expensive 51 00:02:31,720 --> 00:02:33,410 presumably, et cetera. 52 00:02:33,410 --> 00:02:36,120 So he wanted to make sure this Fortran compiler would somehow 53 00:02:36,120 --> 00:02:39,110 end up writing to this file /sysx/stat, 54 00:02:39,110 --> 00:02:44,360 that it would record information about various invocations 55 00:02:44,360 --> 00:02:46,350 of the compiler. 56 00:02:46,350 --> 00:02:50,070 And the way they did this is, in their operating system, they 57 00:02:50,070 --> 00:02:52,170 had something kind of like the setuid 58 00:02:52,170 --> 00:02:54,040 that we talked about in Unix. 59 00:02:54,040 --> 00:02:57,360 Except there, they called it the home files license. 60 00:02:57,360 --> 00:03:01,380 And what it means is that if you ran /sysx/fort, 61 00:03:01,380 --> 00:03:05,710 and this program had this so-called home files license, 62 00:03:05,710 --> 00:03:09,860 then this process that you just ran would have extra privileges 63 00:03:09,860 --> 00:03:13,102 on being able to write everything in /sysx. 64 00:03:13,102 --> 00:03:15,310 So it would have these extra privileges on everything 65 00:03:15,310 --> 00:03:18,819 in /sysx/, basically, star. 66 00:03:18,819 --> 00:03:20,610 It could access all those files in addition 67 00:03:20,610 --> 00:03:22,985 to anything that it could access because the user ran it, 68 00:03:22,985 --> 00:03:25,190 for example. 69 00:03:25,190 --> 00:03:27,030 So the particular problem they ran into 70 00:03:27,030 --> 00:03:31,236 is that some clever user was able to do this. 
71 00:03:31,236 --> 00:03:32,860 So they would run the Fortran compiler, 72 00:03:32,860 --> 00:03:35,151 and the Fortran compiler would take arguments very much 73 00:03:35,151 --> 00:03:36,790 like GCC takes arguments. 74 00:03:36,790 --> 00:03:39,590 And they would compile something like foo.f. 75 00:03:39,590 --> 00:03:41,620 Here is my Fortran source code. 76 00:03:41,620 --> 00:03:48,120 And they'd say, well, put that output -o into /sysx/stat. 77 00:03:48,120 --> 00:03:50,700 Or more damagingly in their case, 78 00:03:50,700 --> 00:03:54,470 there was another file in /sysx that was the billing file 79 00:03:54,470 --> 00:03:56,390 for all the customers on the system. 80 00:03:56,390 --> 00:04:01,850 So you could similarly ask the Fortran compiler to compile 81 00:04:01,850 --> 00:04:05,800 the source file and put the output into some special file 82 00:04:05,800 --> 00:04:07,980 in /sysx. 83 00:04:07,980 --> 00:04:10,860 And in their case, this actually worked. 84 00:04:10,860 --> 00:04:12,570 Even though the user themselves didn't 85 00:04:12,570 --> 00:04:15,430 have access to write to this file or directory, 86 00:04:15,430 --> 00:04:18,620 because the compiler had this extra privilege-- 87 00:04:18,620 --> 00:04:21,660 this home files license, in their case-- 88 00:04:21,660 --> 00:04:24,590 it was able to override these files 89 00:04:24,590 --> 00:04:28,784 despite that not being really the developer's intention. 90 00:04:28,784 --> 00:04:29,450 This make sense? 91 00:04:29,450 --> 00:04:31,116 This is the rough problem they ran into? 92 00:04:31,116 --> 00:04:32,515 So who do they blame? 93 00:04:32,515 --> 00:04:33,765 What do they think went wrong? 94 00:04:40,995 --> 00:04:42,780 Or how would you design it differently 95 00:04:42,780 --> 00:04:46,150 to avoid running into such problems? 
96 00:04:46,150 --> 00:04:48,770 So the thing they sort of think about here, 97 00:04:48,770 --> 00:04:51,930 or they talk about in this write-up, 98 00:04:51,930 --> 00:04:55,240 is that they believe this Fortran compiler should 99 00:04:55,240 --> 00:04:57,990 be very careful when it's using its privileges. 100 00:04:57,990 --> 00:04:59,960 Because, at some level, the Fortran compiler 101 00:04:59,960 --> 00:05:01,570 has two types of privileges. 102 00:05:01,570 --> 00:05:05,660 It has one stemming from the fact the user invoked it, 103 00:05:05,660 --> 00:05:08,140 so the user should be able to access the source 104 00:05:08,140 --> 00:05:10,050 file, like foo.f. 105 00:05:10,050 --> 00:05:11,860 And if it was some other user, maybe 106 00:05:11,860 --> 00:05:14,680 it wouldn't be able to access the user's source code. 107 00:05:14,680 --> 00:05:17,590 And the other sort of privilege is from this home files 108 00:05:17,590 --> 00:05:20,830 license thing that allows it to write to these special files. 109 00:05:20,830 --> 00:05:23,480 And internally, in the source code of the compiler, 110 00:05:23,480 --> 00:05:25,920 when they open a file, the compiler 111 00:05:25,920 --> 00:05:28,900 should have been very explicit about which of these privileges 112 00:05:28,900 --> 00:05:31,910 it wants to exercise when opening a file 113 00:05:31,910 --> 00:05:34,372 or performing some privileged operation. 114 00:05:34,372 --> 00:05:36,330 But their compiler was not written in this way. 115 00:05:36,330 --> 00:05:38,140 It just called open, read, write, 116 00:05:38,140 --> 00:05:39,550 like any other program would do. 117 00:05:39,550 --> 00:05:42,440 And it would implicitly use all the privileges that it has, 118 00:05:42,440 --> 00:05:45,033 which combines-- well, in their system design, 119 00:05:45,033 --> 00:05:47,410 it was sort of the union of the user's privileges 120 00:05:47,410 --> 00:05:51,086 and these home files license privileges.
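The difference between implicitly using the union of privileges and naming the intended privilege can be sketched with a toy model. This is only an illustration under stated assumptions: the `Authority` class, the `/home/alice/` prefix, and the `/sysx/bill` file name are all made up here and are not taken from Hardy's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authority:
    """A named set of path prefixes that may be written (toy model)."""
    name: str
    writable_prefixes: tuple

    def allows_write(self, path: str) -> bool:
        return any(path.startswith(p) for p in self.writable_prefixes)


# Hypothetical authorities: one from the invoking user, one from the license.
USER = Authority("invoking user", ("/home/alice/",))
LICENSE = Authority("home files license", ("/sysx/",))


def ambient_write_allowed(path: str) -> bool:
    # Union of all privileges: any authority that permits the write will do.
    return USER.allows_write(path) or LICENSE.allows_write(path)


def explicit_write_allowed(path: str, authority: Authority) -> bool:
    # The caller states which authority it means to exercise.
    return authority.allows_write(path)


def compile_checks(output_path: str, stats_path: str = "/sysx/stat"):
    """Return (ambient_result, explicit_result) for one compile request."""
    ambient = (ambient_write_allowed(output_path)
               and ambient_write_allowed(stats_path))
    # Output file: the user's authority. Statistics file: the license.
    explicit = (explicit_write_allowed(output_path, USER)
                and explicit_write_allowed(stats_path, LICENSE))
    return ambient, explicit
```

With these assumptions, `compile_checks("/sysx/bill")` is allowed under the ambient check but refused under the explicit one, while an output path under `/home/alice/` passes both.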
121 00:05:51,086 --> 00:05:52,790 That make sense? 122 00:05:52,790 --> 00:05:55,390 So these guys were really interested in fixing 123 00:05:55,390 --> 00:05:56,180 this problem. 124 00:05:56,180 --> 00:05:59,240 And they were sort of calling this compiler this confused 125 00:05:59,240 --> 00:06:00,964 deputy, because it needs to disambiguate 126 00:06:00,964 --> 00:06:02,505 these multiple privileges that it has 127 00:06:02,505 --> 00:06:06,800 and carefully use them in the right instance. 128 00:06:06,800 --> 00:06:09,350 So I guess one thing we could try to look at 129 00:06:09,350 --> 00:06:15,120 is how would we design such a compiler in Unix? 130 00:06:15,120 --> 00:06:18,024 So in their system, they had this home files license thing, 131 00:06:18,024 --> 00:06:20,190 and other mechanisms, and then they introduced capabilities. 132 00:06:20,190 --> 00:06:21,750 We'll talk about them shortly. 133 00:06:21,750 --> 00:06:24,830 But could we solve this in a Unix system? 134 00:06:24,830 --> 00:06:27,080 Suppose you had to write this Fortran compiler in Unix 135 00:06:27,080 --> 00:06:29,566 and write to a special file and avoid this confused 136 00:06:29,566 --> 00:06:30,190 deputy problem. 137 00:06:30,190 --> 00:06:32,775 What would you do? 138 00:06:32,775 --> 00:06:33,275 Any ideas? 139 00:06:35,802 --> 00:06:37,760 I guess you could just declare this a bad plan. 140 00:06:37,760 --> 00:06:40,212 Like don't keep statistics. 141 00:06:40,212 --> 00:06:42,028 Yeah? 142 00:06:42,028 --> 00:06:44,649 AUDIENCE: [INAUDIBLE]. 143 00:06:44,649 --> 00:06:45,315 PROFESSOR: Sure. 144 00:06:45,315 --> 00:06:46,670 That could be, right? 145 00:06:46,670 --> 00:06:47,530 Well, yeah. 146 00:06:47,530 --> 00:06:50,530 So you could not support flags like -o.
147 00:06:50,530 --> 00:06:52,210 On the other hand, you might want 148 00:06:52,210 --> 00:06:55,980 to allow specifying which source code you want 149 00:06:55,980 --> 00:06:58,196 to compile so that maybe you could read the billing 150 00:06:58,196 --> 00:06:59,820 file or read the statistics file, which 151 00:06:59,820 --> 00:07:01,230 maybe should be secret. 152 00:07:01,230 --> 00:07:02,897 Or maybe the source code has-- maybe you 153 00:07:02,897 --> 00:07:04,646 can supply the source code on standard input, 154 00:07:04,646 --> 00:07:06,330 but it has include statements, so 155 00:07:06,330 --> 00:07:08,370 it needs to include other pieces of source code. 156 00:07:08,370 --> 00:07:09,354 So that's a little tricky. 157 00:07:09,354 --> 00:07:11,729 AUDIENCE: You could split up the application [INAUDIBLE]. 158 00:07:16,905 --> 00:07:17,530 PROFESSOR: Yes. 159 00:07:17,530 --> 00:07:20,270 So another potentially good design is to split it up, 160 00:07:20,270 --> 00:07:20,770 right? 161 00:07:20,770 --> 00:07:23,130 And realize that this fort compiler really 162 00:07:23,130 --> 00:07:25,525 doesn't need both of these privileges at the same time. 163 00:07:25,525 --> 00:07:33,420 So maybe we should have, in our Unix world, /bin/fortcc or something, 164 00:07:33,420 --> 00:07:36,570 the compiler, and then this guy is just a regular program with 165 00:07:36,570 --> 00:07:37,790 no extra privileges. 166 00:07:37,790 --> 00:07:41,980 And then we'll also maybe have a /bin/fortlog, 167 00:07:41,980 --> 00:07:44,350 which is going to be a special program with some extra 168 00:07:44,350 --> 00:07:47,640 privileges and it'll log some statistics about what's going 169 00:07:47,640 --> 00:07:49,410 on in the compiler. 170 00:07:49,410 --> 00:07:53,010 And fortcc is going to invoke this guy. 171 00:07:53,010 --> 00:07:56,020 So how do we give this guy extra privileges? 172 00:07:56,020 --> 00:07:56,520 Yeah?
173 00:07:56,520 --> 00:07:58,153 AUDIENCE: Well, maybe if you use something like setuid 174 00:07:58,153 --> 00:08:00,930 or something, like fortlog, then presumably any other user 175 00:08:00,930 --> 00:08:03,034 could also log arbitrary data through it. 176 00:08:03,034 --> 00:08:03,700 PROFESSOR: Yeah. 177 00:08:03,700 --> 00:08:04,719 So this is not so great. 178 00:08:04,719 --> 00:08:06,510 Because on fortlog, presumably the only way 179 00:08:06,510 --> 00:08:07,968 to give extra privileges in Unix is 180 00:08:07,968 --> 00:08:11,170 to in fact make it owned by, I don't know, maybe the fort UID, 181 00:08:11,170 --> 00:08:14,490 and that's also setuid. 182 00:08:14,490 --> 00:08:17,550 So every time you run it, it switches to this Fortran UID. 183 00:08:17,550 --> 00:08:19,580 And maybe there's some special stats file. 184 00:08:19,580 --> 00:08:23,170 But then in fact anyone can invoke this fortlog thingy. 185 00:08:23,170 --> 00:08:24,730 Which is maybe not great. 186 00:08:24,730 --> 00:08:26,940 Now anyone can write to the stats file. 187 00:08:26,940 --> 00:08:29,782 But maybe this example is not the biggest security concern 188 00:08:29,782 --> 00:08:31,490 about someone corrupting your statistics. 189 00:08:31,490 --> 00:08:33,220 But suppose this was a billing file. 190 00:08:33,220 --> 00:08:36,072 Then maybe the same problems would be slightly more acute. 191 00:08:36,072 --> 00:08:36,571 Yeah? 192 00:08:36,571 --> 00:08:39,674 AUDIENCE: But you can always make your [INAUDIBLE] stats 193 00:08:39,674 --> 00:08:40,340 you want, right? 194 00:08:40,340 --> 00:08:41,298 Instead of [INAUDIBLE]. 195 00:08:44,940 --> 00:08:46,930 PROFESSOR: So in some sense, yeah. 196 00:08:46,930 --> 00:08:48,960 If you're willing to live with arbitrary 197 00:08:48,960 --> 00:08:51,262 stuff in your statistics or logging file, 198 00:08:51,262 --> 00:08:52,220 then maybe that's true. 
199 00:08:52,220 --> 00:08:54,212 AUDIENCE: Even if you [INAUDIBLE], 200 00:08:54,212 --> 00:08:56,702 you can already make your C code have whatever statistics 201 00:08:56,702 --> 00:08:57,994 that you'd want to be recorded. 202 00:08:57,994 --> 00:08:58,868 PROFESSOR: You could. 203 00:08:58,868 --> 00:08:59,585 Yeah. 204 00:08:59,585 --> 00:09:00,084 Yeah. 205 00:09:00,084 --> 00:09:01,524 So it might be that in this case, 206 00:09:01,524 --> 00:09:03,940 it doesn't really matter that you can log arbitrary stuff. 207 00:09:03,940 --> 00:09:05,240 So that's true. 208 00:09:05,240 --> 00:09:06,040 Yeah. 209 00:09:06,040 --> 00:09:08,480 So if you cared about who can invoke this fortlog thing, 210 00:09:08,480 --> 00:09:10,063 could you really do something about it 211 00:09:10,063 --> 00:09:12,484 in Unix, or not so much? 212 00:09:12,484 --> 00:09:12,984 Yeah? 213 00:09:12,984 --> 00:09:14,892 AUDIENCE: [INAUDIBLE]. 214 00:09:14,892 --> 00:09:18,090 It would make both of them setuid. 215 00:09:18,090 --> 00:09:23,120 Now the fortcc would read that source files. 216 00:09:23,120 --> 00:09:26,430 It would switch back to the saved UID, just the user UID. 217 00:09:26,430 --> 00:09:31,060 Remote fortlog in a setuid, which has 218 00:09:31,060 --> 00:09:32,485 permissions to execute fortlog. 219 00:09:32,485 --> 00:09:37,812 And that fortlog would setuid again [INAUDIBLE]. 220 00:09:37,812 --> 00:09:38,520 PROFESSOR: Right. 221 00:09:38,520 --> 00:09:39,020 Yeah. 222 00:09:39,020 --> 00:09:42,710 So there is this rather elaborate mechanism in Unix 223 00:09:42,710 --> 00:09:46,280 that we skipped on last Monday's lecture, that 224 00:09:46,280 --> 00:09:48,780 actually allows an application to switch 225 00:09:48,780 --> 00:09:50,190 between multiple UIDs. 226 00:09:50,190 --> 00:09:53,800 if it was setuid to some user ID, then it could say, 227 00:09:53,800 --> 00:09:55,730 well, now I want to run with this user ID. 
228 00:09:55,730 --> 00:09:57,480 Now I want to run with this other user ID. 229 00:09:57,480 --> 00:10:00,820 And it could sort of carefully alternate between these. 230 00:10:00,820 --> 00:10:02,320 It's a little tricky to do it right, 231 00:10:02,320 --> 00:10:04,213 but it's probably doable. 232 00:10:04,213 --> 00:10:06,224 So that's one potential design. 233 00:10:06,224 --> 00:10:08,140 I guess another hack you could maybe try to do 234 00:10:08,140 --> 00:10:10,740 is make this fortlog binary executable only 235 00:10:10,740 --> 00:10:14,790 by a particular group and make fortcc a setgid binary 236 00:10:14,790 --> 00:10:15,622 to that group. 237 00:10:15,622 --> 00:10:17,830 It's not great, because it obliterates whatever group 238 00:10:17,830 --> 00:10:19,950 list the user had initially. 239 00:10:19,950 --> 00:10:21,200 But who knows? 240 00:10:21,200 --> 00:10:24,190 Maybe that's better than nothing. 241 00:10:24,190 --> 00:10:26,550 Anyway, so it's a fairly tricky problem 242 00:10:26,550 --> 00:10:29,600 to solve in an entirely satisfactory fashion 243 00:10:29,600 --> 00:10:31,812 with these Unix mechanisms. 244 00:10:31,812 --> 00:10:33,770 Although, maybe you should rethink your problem 245 00:10:33,770 --> 00:10:35,640 and not worry about your statistics 246 00:10:35,640 --> 00:10:38,970 file as much in the first place. 247 00:10:38,970 --> 00:10:45,150 But how do we think about what's going wrong in the design? 248 00:10:45,150 --> 00:10:47,920 Well, there's two things we could try to learn from this 249 00:10:47,920 --> 00:10:49,925 about, basically, what went wrong. 250 00:10:53,120 --> 00:10:58,180 And one interpretation that Norm Hardy wants us to take away 251 00:10:58,180 --> 00:11:01,560 is this notion that he calls ambient authority. 252 00:11:06,730 --> 00:11:08,300 So what is ambient authority? 253 00:11:08,300 --> 00:11:10,230 Can anyone figure out what they meant? 254 00:11:10,230 --> 00:11:12,230 They've never exactly defined this.
255 00:11:12,230 --> 00:11:12,730 Yeah? 256 00:11:12,730 --> 00:11:14,248 AUDIENCE: It means you have the authority given 257 00:11:14,248 --> 00:11:15,590 to you by the environment. 258 00:11:15,590 --> 00:11:19,464 So as if [INAUDIBLE] user with no limitations. 259 00:11:19,464 --> 00:11:20,130 PROFESSOR: Yeah. 260 00:11:20,130 --> 00:11:24,040 So you're making an operation, and you can specify 261 00:11:24,040 --> 00:11:25,177 what operation you want. 262 00:11:25,177 --> 00:11:27,760 But the decision of whether that operation is going to succeed 263 00:11:27,760 --> 00:11:30,850 comes from some extra implicit parameters in your process, 264 00:11:30,850 --> 00:11:31,660 for example. 265 00:11:31,660 --> 00:11:34,970 And in Unix, you can figure out what this ambient authority 266 00:11:34,970 --> 00:11:36,490 check might look like. 267 00:11:36,490 --> 00:11:38,860 So if you make a system call, then you probably 268 00:11:38,860 --> 00:11:41,080 supplied some sort of a name to the system call. 269 00:11:41,080 --> 00:11:43,340 And inside of the kernel, this gets 270 00:11:43,340 --> 00:11:45,570 mapped to some sort of an object. 271 00:11:45,570 --> 00:11:48,580 And the object presumably has some kind of an access control 272 00:11:48,580 --> 00:11:52,110 list on it, like the permissions on a file, et cetera. 273 00:11:52,110 --> 00:11:53,930 So there are some permissions that you 274 00:11:53,930 --> 00:11:56,460 can get from the object. 275 00:11:56,460 --> 00:11:58,770 And that should decide whether an operation 276 00:11:58,770 --> 00:12:00,480 is going to be allowed on this name 277 00:12:00,480 --> 00:12:02,180 that the application supplied. 278 00:12:02,180 --> 00:12:04,400 This is sort of what the application gets to see. 279 00:12:04,400 --> 00:12:06,850 Inside of the kernel, there's also 280 00:12:06,850 --> 00:12:09,780 the current user ID of the process making the call. 281 00:12:09,780 --> 00:12:12,600 So this is the current proc UID.
282 00:12:15,140 --> 00:12:18,250 And this thing goes into the decision 283 00:12:18,250 --> 00:12:22,710 of whether to allow a particular operation or not. 284 00:12:22,710 --> 00:12:24,770 So it's the current process's user ID 285 00:12:24,770 --> 00:12:27,210 that's this ambient privilege. 286 00:12:27,210 --> 00:12:29,240 Whatever operation you're going to try to do, 287 00:12:29,240 --> 00:12:31,540 the kernel will actually try, in some sense, 288 00:12:31,540 --> 00:12:35,815 as hard as possible to allow it by using your current UID, 289 00:12:35,815 --> 00:12:39,410 and your current GID and whatever other extra privileges 290 00:12:39,410 --> 00:12:40,500 you might have. 291 00:12:40,500 --> 00:12:43,120 And as long as there's some set of privileges that allow you 292 00:12:43,120 --> 00:12:45,690 to do it, it'll let you do it. 293 00:12:45,690 --> 00:12:47,065 Which is maybe not the best thing 294 00:12:47,065 --> 00:12:51,080 to do if you aren't fully aware of what all these privileges are. 295 00:12:51,080 --> 00:12:53,010 Maybe you don't want to use all of them 296 00:12:53,010 --> 00:12:57,910 to open a particular file or make some other operation. 297 00:12:57,910 --> 00:13:01,867 Does this make sense, roughly what ambient privilege is? 298 00:13:01,867 --> 00:13:03,325 In the case of an operating system, 299 00:13:03,325 --> 00:13:05,910 it basically ends up being the fact that a process has 300 00:13:05,910 --> 00:13:07,680 some sort of a user ID. 301 00:13:07,680 --> 00:13:11,570 Are there non-OS examples of ambient privilege 302 00:13:11,570 --> 00:13:12,710 you guys can think of? 303 00:13:12,710 --> 00:13:15,280 Like when you're making an operation, something 304 00:13:15,280 --> 00:13:17,525 about the identity of the caller determines 305 00:13:17,525 --> 00:13:18,900 whether it'll succeed or not. 306 00:13:21,640 --> 00:13:23,765 Like one example is probably firewalls, as well. 307 00:13:23,765 --> 00:13:25,610 So this is just an OS example.
308 00:13:25,610 --> 00:13:29,940 Another example of ambient privilege is firewalls on the network. 309 00:13:29,940 --> 00:13:32,570 Because any operation you do from a machine 310 00:13:32,570 --> 00:13:35,890 inside of a firewall is going to be allowed because, 311 00:13:35,890 --> 00:13:37,410 well, you just have that IP address, 312 00:13:37,410 --> 00:13:39,930 or you're on that side of a network. 313 00:13:39,930 --> 00:13:43,870 And if you're outside, the same operation will be disallowed. 314 00:13:43,870 --> 00:13:47,330 So it's also a similar problem. 315 00:13:47,330 --> 00:13:50,850 Say you visit some website, and the website includes a link 316 00:13:50,850 --> 00:13:53,794 to some different server, well, maybe you 317 00:13:53,794 --> 00:13:55,710 don't want to use the privileges that you have 318 00:13:55,710 --> 00:13:58,500 on the inside of your network to access that link. 319 00:13:58,500 --> 00:14:00,500 Because maybe it'll access your internal printer 320 00:14:00,500 --> 00:14:02,470 and exploit it in some way. 321 00:14:02,470 --> 00:14:05,021 And really, the guy that provided you the link 322 00:14:05,021 --> 00:14:06,396 shouldn't have been able to reach 323 00:14:06,396 --> 00:14:08,397 the printer in the first place, because they 324 00:14:08,397 --> 00:14:09,230 were on the outside. 325 00:14:09,230 --> 00:14:14,190 But behind the firewall, your browser, maybe by visiting a link, 326 00:14:14,190 --> 00:14:15,885 will be tricked into doing this. 327 00:14:15,885 --> 00:14:19,510 It's sort of a moral equivalent of this confused 328 00:14:19,510 --> 00:14:21,010 deputy problem in the network model. 329 00:14:21,010 --> 00:14:22,010 Yeah? 330 00:14:22,010 --> 00:14:25,344 AUDIENCE: [INAUDIBLE] permission are directly affected also. 331 00:14:25,344 --> 00:14:26,010 PROFESSOR: Yeah. 332 00:14:26,010 --> 00:14:28,070 AUDIENCE: Because it's essentially DAC, potentially, 333 00:14:28,070 --> 00:14:28,830 in the Capsicum.
334 00:14:28,830 --> 00:14:29,280 PROFESSOR: Yeah. 335 00:14:29,280 --> 00:14:31,250 So this is pretty much what the Capsicum guys 336 00:14:31,250 --> 00:14:33,550 think of as discretionary access control. 337 00:14:33,550 --> 00:14:35,800 And the fact that it's discretionary, well, 338 00:14:35,800 --> 00:14:38,697 this is not quite what discretionary access control 339 00:14:38,697 --> 00:14:39,470 means. 340 00:14:39,470 --> 00:14:41,790 But what discretionary access control means 341 00:14:41,790 --> 00:14:45,350 is that the user, or the owner of an object, 342 00:14:45,350 --> 00:14:48,609 can decide what security policy will look like to an object. 343 00:14:48,609 --> 00:14:51,025 Which seems very natural in a Unix setting. it's my files, 344 00:14:51,025 --> 00:14:51,970 I can decide what I want. 345 00:14:51,970 --> 00:14:54,386 I can give them to you, or I can keep them private, great. 346 00:14:55,960 --> 00:14:58,700 So almost all DAC systems do look 347 00:14:58,700 --> 00:15:01,300 like this, because they want to have some sort of permissions 348 00:15:01,300 --> 00:15:04,450 that a user could modify to control the security 349 00:15:04,450 --> 00:15:07,800 policy for their files. 350 00:15:07,800 --> 00:15:11,910 The flip side is mandatory access control. 351 00:15:11,910 --> 00:15:15,257 We'll talk about it in a little while, but at some level, 352 00:15:15,257 --> 00:15:17,340 they have this very philosophically different view 353 00:15:17,340 --> 00:15:17,881 of the world. 354 00:15:17,881 --> 00:15:20,000 They think, well, you're the user. 355 00:15:20,000 --> 00:15:22,240 But someone else will set the security policy 356 00:15:22,240 --> 00:15:24,460 for how you use this computer. 
357 00:15:24,460 --> 00:15:29,000 And this sort of came out of the military in the '70s or '80s, 358 00:15:29,000 --> 00:15:32,946 when they really wanted to have classified computer systems 359 00:15:32,946 --> 00:15:34,654 where, well, you're working on some stuff 360 00:15:34,654 --> 00:15:35,613 and it's marked secret. 361 00:15:35,613 --> 00:15:37,737 I'm working on some stuff that's marked top secret. 362 00:15:37,737 --> 00:15:39,113 So my stuff just can't go to you. 363 00:15:39,113 --> 00:15:41,112 It's not up to me whether to set the permissions 364 00:15:41,112 --> 00:15:42,000 on a file, et cetera. 365 00:15:42,000 --> 00:15:44,830 It's just not allowed by some guy in charge. 366 00:15:44,830 --> 00:15:46,630 So mandatory access control is really 367 00:15:46,630 --> 00:15:49,640 trying to enforce these different kinds of policies 368 00:15:49,640 --> 00:15:52,500 in the first place, where there's 369 00:15:52,500 --> 00:15:54,610 the user and the application developer. 370 00:15:54,610 --> 00:15:56,910 And then there's some guy separate from the user 371 00:15:56,910 --> 00:15:59,472 and the developer that sets the policy. 372 00:15:59,472 --> 00:16:02,492 And, as you can sort of guess, it doesn't always work out. 373 00:16:02,492 --> 00:16:03,950 Well, we'll talk about it in a bit. 374 00:16:03,950 --> 00:16:06,001 But that's what discretionary versus mandatory 375 00:16:06,001 --> 00:16:10,110 means in access control. 376 00:16:10,110 --> 00:16:11,310 All right. 377 00:16:11,310 --> 00:16:14,480 So there's many other examples that you could imagine where 378 00:16:14,480 --> 00:16:16,040 we have ambient authority. 379 00:16:16,040 --> 00:16:20,910 And it's not inherently bad, but it's just something 380 00:16:20,910 --> 00:16:22,637 that you have to be very careful about.
381 00:16:22,637 --> 00:16:24,470 If you have a system with ambient authority, 382 00:16:24,470 --> 00:16:27,020 you should probably be very careful 383 00:16:27,020 --> 00:16:29,595 if you're performing privileged operations. 384 00:16:29,595 --> 00:16:31,220 You should make sure that you're really 385 00:16:31,220 --> 00:16:35,980 using the right authority and not accidentally being 386 00:16:35,980 --> 00:16:39,146 tricked, very much like this Fortran compiler 20 years ago. 387 00:16:39,146 --> 00:16:41,580 25 now. 388 00:16:41,580 --> 00:16:42,450 All right. 389 00:16:42,450 --> 00:16:45,470 So this is one interpretation of what goes wrong. 390 00:16:45,470 --> 00:16:47,487 And it's not necessarily the only way 391 00:16:47,487 --> 00:16:49,070 to think about what goes wrong, right? 392 00:16:49,070 --> 00:16:51,192 Another possibility is that, well, 393 00:16:51,192 --> 00:16:53,400 wouldn't it be nice if it was easy for an application 394 00:16:53,400 --> 00:16:56,440 to tell whether it should access a file on behalf 395 00:16:56,440 --> 00:16:57,445 of some principal? 396 00:16:57,445 --> 00:17:00,700 So maybe another problem is that the access control 397 00:17:00,700 --> 00:17:02,024 checks are complicated. 398 00:17:07,381 --> 00:17:10,294 So in some sense, when the Fortran compiler is running, 399 00:17:10,294 --> 00:17:13,900 and it's opening a file on behalf of a user, 400 00:17:13,900 --> 00:17:16,660 it basically needs to replicate the same exact logic 401 00:17:16,660 --> 00:17:20,240 we see drawn out here, except that the Fortran compiler needs 402 00:17:20,240 --> 00:17:22,490 to plug in something else here. 403 00:17:22,490 --> 00:17:25,770 Instead of using its current privileges, and all of them, 404 00:17:25,770 --> 00:17:27,470 it should just replicate this check 405 00:17:27,470 --> 00:17:32,150 and try to make it with a different set of privileges.
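One existing Unix approximation of making the check with a different set of privileges is access(2), exposed in Python as `os.access`, which evaluates permissions against the real rather than the effective UID and GID. The sketch below is a minimal illustration for a setuid-style program; the helper name `open_output_for_invoker` is hypothetical, and the gap between the check and the open is a race, so this shows what replicating the check looks like rather than a recommended design.

```python
import os


def open_output_for_invoker(path: str) -> int:
    """Open `path` for writing only if the real (invoking) user may write it.

    Hypothetical helper: in a setuid program the real UID is the invoking
    user, so os.access asks whether that user could do the write.
    """
    parent = os.path.dirname(os.path.abspath(path))
    # An existing file must itself be writable; a new file needs a
    # writable parent directory.
    target = path if os.path.exists(path) else parent
    if not os.access(target, os.W_OK):
        raise PermissionError(f"invoking user may not write {path}")
    # Caveat: the file system can change between the check and this open.
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
```

The helper returns a raw file descriptor that the caller is expected to close with `os.close`.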
406 00:17:32,150 --> 00:17:34,110 So in Unix, this turns out to be fairly 407 00:17:34,110 --> 00:17:36,920 tricky to do, because there's many places 408 00:17:36,920 --> 00:17:38,500 where these security checks happen. 409 00:17:38,500 --> 00:17:41,020 If you have symbolic links, then the symbolic link 410 00:17:41,020 --> 00:17:43,660 gets looked up, and that path name also 411 00:17:43,660 --> 00:17:47,540 gets evaluated with someone's privileges, et cetera. 412 00:17:47,540 --> 00:17:50,220 But it might be that, in some system, 413 00:17:50,220 --> 00:17:51,940 you could simplify this access control 414 00:17:51,940 --> 00:17:55,632 check, so that you could do it yourself in an application. 415 00:17:55,632 --> 00:17:59,320 Does that seem like a reasonable plan to you guys? 416 00:17:59,320 --> 00:18:01,960 Would you go with that? 417 00:18:01,960 --> 00:18:03,640 Any dangers of replicating these checks? 418 00:18:03,640 --> 00:18:04,260 Yeah? 419 00:18:04,260 --> 00:18:06,865 AUDIENCE: Well, if you do the checks in the application, 420 00:18:06,865 --> 00:18:08,594 you could just not do the checks. 421 00:18:08,594 --> 00:18:09,260 PROFESSOR: Yeah. 422 00:18:09,260 --> 00:18:10,360 So you could easily miss the checks. 423 00:18:10,360 --> 00:18:11,360 That's absolutely right. 424 00:18:11,360 --> 00:18:13,680 So in some sense, what the Fortran compiler did here, 425 00:18:13,680 --> 00:18:15,370 well, they didn't even bother trying to do the checks, 426 00:18:15,370 --> 00:18:16,659 not that they screwed them up. 427 00:18:16,659 --> 00:18:18,950 Another possibility, in addition to missing the checks, 428 00:18:18,950 --> 00:18:21,589 is maybe the kernel will change over time, 429 00:18:21,589 --> 00:18:23,380 and it will have slightly different checks. 430 00:18:23,380 --> 00:18:25,100 It will introduce some extra security measure, 431 00:18:25,100 --> 00:18:26,766 and the application will be left behind.
432 00:18:26,766 --> 00:18:28,455 And it will implement old style checks. 433 00:18:28,455 --> 00:18:31,280 And probably not a great plan. 434 00:18:31,280 --> 00:18:34,862 So recall, one good idea in security 435 00:18:34,862 --> 00:18:36,590 is to have economy of mechanism. 436 00:18:36,590 --> 00:18:39,222 So there's only a small number of places that are enforcing 437 00:18:39,222 --> 00:18:40,180 your security policies. 438 00:18:40,180 --> 00:18:41,890 You probably don't want to replicate 439 00:18:41,890 --> 00:18:45,520 the same functionality in applications, in the kernel, 440 00:18:45,520 --> 00:18:46,020 et cetera. 441 00:18:46,020 --> 00:18:48,090 You really want to boil it down to one place. 442 00:18:48,090 --> 00:18:50,900 That roughly makes sense? 443 00:18:50,900 --> 00:18:52,070 OK. 444 00:18:52,070 --> 00:18:56,980 So what is this capability, I guess, 445 00:18:56,980 --> 00:19:02,220 idea that we're thinking might solve this authority problem? 446 00:19:02,220 --> 00:19:05,150 Well, there's some formal definition for the thing. 447 00:19:05,150 --> 00:19:08,570 But really, you can get very close by thinking of Unix file 448 00:19:08,570 --> 00:19:11,270 descriptors as a capability. 449 00:19:11,270 --> 00:19:15,210 So I guess the alternative to this picture, 450 00:19:15,210 --> 00:19:18,470 in capability world, is that instead 451 00:19:18,470 --> 00:19:20,510 of having the application supply a name, 452 00:19:20,510 --> 00:19:22,510 and you look up an object, you get a permission, 453 00:19:22,510 --> 00:19:24,180 you decide whether to allow it based 454 00:19:24,180 --> 00:19:25,910 on some ambient authority, instead, 455 00:19:25,910 --> 00:19:28,920 in the capability world, the picture looks very simple. 456 00:19:28,920 --> 00:19:32,230 You have a capability, and if you have a capability, 457 00:19:32,230 --> 00:19:35,270 it points to an object. 
458 00:19:35,270 --> 00:19:37,482 And maybe the capability has some small number 459 00:19:37,482 --> 00:19:40,450 of restrictions on what you can do with an object. 460 00:19:40,450 --> 00:19:43,340 But basically, if you have the capability to an object, 461 00:19:43,340 --> 00:19:44,830 you can access the object. 462 00:19:44,830 --> 00:19:46,420 It's actually very simple. 463 00:19:46,420 --> 00:19:49,280 So there's no ambient authority that 464 00:19:49,280 --> 00:19:51,470 decides whether an operation on a capability 465 00:19:51,470 --> 00:19:53,310 is going to be allowed. 466 00:19:53,310 --> 00:19:55,290 The only thing is that maybe the capability has 467 00:19:55,290 --> 00:19:57,623 a couple of extra bits, or this mask that they describe 468 00:19:57,623 --> 00:19:59,629 in the paper, which says, well, you 469 00:19:59,629 --> 00:20:02,240 have a capability for this file, but it's restricted 470 00:20:02,240 --> 00:20:03,470 to read operations only. 471 00:20:03,470 --> 00:20:07,440 Or it's restricted to write or append operations only. 472 00:20:07,440 --> 00:20:10,885 And then your security decisions are all of a sudden very easy. 473 00:20:10,885 --> 00:20:12,260 Because if you have a capability, 474 00:20:12,260 --> 00:20:13,410 you can do something. 475 00:20:13,410 --> 00:20:15,248 If you don't, you can't. 476 00:20:15,248 --> 00:20:17,940 Make sense? 477 00:20:17,940 --> 00:20:21,430 So I guess one important property of capabilities 478 00:20:21,430 --> 00:20:25,000 is that they should actually be unforgeable, 479 00:20:25,000 --> 00:20:27,396 as the papers talk about. 480 00:20:27,396 --> 00:20:29,020 So what does it mean to be unforgeable, 481 00:20:29,020 --> 00:20:31,900 or why do we want this in this capability world? 482 00:20:34,980 --> 00:20:37,700 Well, I guess this actually may be almost too obvious here. 
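[Editor's sketch, not part of the lecture.] The "capability plus a small rights mask" picture described above can be modeled in a few lines of Python. This is only a toy under stated assumptions: the names `Rights`, `Capability`, `restrict`, `cap_read`, and `cap_append` are hypothetical and do not come from either paper or from any real API. The point it illustrates is that the access decision looks only at the capability being presented, never at who the caller is.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Rights(Flag):
    # Hypothetical rights bits, standing in for the mask the lecture mentions.
    READ = auto()
    WRITE = auto()
    APPEND = auto()


@dataclass(frozen=True)
class Capability:
    """Toy model: a reference to an object plus a rights mask."""
    obj: bytearray
    mask: Rights

    def restrict(self, mask: Rights) -> "Capability":
        # A derived capability can only narrow the mask, never widen it.
        return Capability(self.obj, self.mask & mask)


def cap_read(cap: Capability) -> bytes:
    # The whole access decision: does this capability carry READ?
    if Rights.READ not in cap.mask:
        raise PermissionError("capability lacks READ")
    return bytes(cap.obj)


def cap_append(cap: Capability, data: bytes) -> None:
    # No user ID, no path lookup: only the presented capability matters.
    if Rights.APPEND not in cap.mask:
        raise PermissionError("capability lacks APPEND")
    cap.obj.extend(data)
```

A holder of a read-only capability derived with `restrict(Rights.READ)` can call `cap_read` but not `cap_append`, even though both capabilities point at the same underlying object. In plain Python nothing stops code from reaching into `cap.obj` directly, so this models the decision rule only, not unforgeability.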
483 00:20:37,700 --> 00:20:39,324 Well, if you can make up any capability 484 00:20:39,324 --> 00:20:41,275 you want-- I can make up a capability for any 485 00:20:41,275 --> 00:20:42,849 of your guys' files and go access it. 486 00:20:42,849 --> 00:20:44,640 So if I can make it up, and I'll access it, 487 00:20:44,640 --> 00:20:47,760 and there's nothing else in the security design that stops me 488 00:20:47,760 --> 00:20:54,030 from accessing an object once I can manufacture a capability. 489 00:20:54,030 --> 00:20:55,870 So it's important that these capabilities 490 00:20:55,870 --> 00:20:58,765 can't be made up out of thin air by the application 491 00:20:58,765 --> 00:21:01,340 or by whatever's running. 492 00:21:01,340 --> 00:21:05,170 How is this getting enforced, if we think of file descriptors 493 00:21:05,170 --> 00:21:07,249 as capabilities? 494 00:21:07,249 --> 00:21:09,040 So many of you guys actually submitted this 495 00:21:09,040 --> 00:21:11,300 as the big question about Capsicum. 496 00:21:11,300 --> 00:21:13,080 What do you think? 497 00:21:13,080 --> 00:21:17,490 What prevents an application from synthesizing a capability 498 00:21:17,490 --> 00:21:20,490 in this file descriptor world? 499 00:21:20,490 --> 00:21:24,310 Could you synthesize a capability? 500 00:21:24,310 --> 00:21:24,950 Yeah? 501 00:21:24,950 --> 00:21:26,949 AUDIENCE: Well, it was probably like a structure 502 00:21:26,949 --> 00:21:29,364 and a construct that says that they 503 00:21:29,364 --> 00:21:31,504 have a capability for certain file descriptors. 504 00:21:31,504 --> 00:21:32,170 PROFESSOR: Yeah. 505 00:21:32,170 --> 00:21:35,510 So it's actually fairly easy to see 506 00:21:35,510 --> 00:21:37,500 what goes on once you look at what exactly 507 00:21:37,500 --> 00:21:38,666 is a file descriptor, right? 508 00:21:38,666 --> 00:21:40,230 So a file descriptor is basically 509 00:21:40,230 --> 00:21:42,040 just some sort of an integer. 
510 00:21:42,040 --> 00:21:44,756 And this integer-- like in Unix, you 511 00:21:44,756 --> 00:21:46,880 have file descriptor 0, which refers to your input, 512 00:21:46,880 --> 00:21:48,796 file descriptor 1, which refers to your output. 513 00:21:48,796 --> 00:21:49,470 [INAUDIBLE] 514 00:21:49,470 --> 00:21:52,580 But really, these are just integers in user space. 515 00:21:52,580 --> 00:21:56,120 And this is what the application can presumably do, 516 00:21:56,120 --> 00:21:58,380 and it can choose any integer it wants. 517 00:21:58,380 --> 00:22:00,190 But whenever you try to do something 518 00:22:00,190 --> 00:22:02,570 to a file descriptor, which is one of these integers, 519 00:22:02,570 --> 00:22:05,640 the kernel will always interpret the integer 520 00:22:05,640 --> 00:22:08,680 according to your current process's file descriptor 521 00:22:08,680 --> 00:22:09,490 table. 522 00:22:09,490 --> 00:22:12,430 So for every PID-- let's say, well, this is PID, 523 00:22:12,430 --> 00:22:13,395 I don't know, 57. 524 00:22:13,395 --> 00:22:14,830 So some process running. 525 00:22:14,830 --> 00:22:18,750 It has an open file table, and each integer 526 00:22:18,750 --> 00:22:20,560 supplied from user space refers 527 00:22:20,560 --> 00:22:23,185 to some entry in this table. 528 00:22:23,185 --> 00:22:26,650 And of course, the kernel should check that the integer 529 00:22:26,650 --> 00:22:28,000 is in bounds in this table. 530 00:22:28,000 --> 00:22:29,630 It isn't negative. 531 00:22:29,630 --> 00:22:31,890 It doesn't go past the end of the table. 532 00:22:31,890 --> 00:22:34,050 Otherwise, it will have the usual buffer overflow 533 00:22:34,050 --> 00:22:35,630 problems, et cetera. 
534 00:22:35,630 --> 00:22:38,517 But if you carefully check that the integer is 535 00:22:38,517 --> 00:22:41,380 in bounds in the kernel implementation, 536 00:22:41,380 --> 00:22:44,670 then the only possible things that the application 537 00:22:44,670 --> 00:22:46,550 can refer to by a file descriptor 538 00:22:46,550 --> 00:22:48,910 are entries in this table. 539 00:22:48,910 --> 00:22:51,060 So presumably, the kernel will somehow 540 00:22:51,060 --> 00:22:54,640 make sure that you legitimately got a particular capability. 541 00:22:54,640 --> 00:22:58,810 So when you, for example, open a file outside of this capability 542 00:22:58,810 --> 00:23:03,240 model in Unix, well, the kernel, after the open call succeeds, 543 00:23:03,240 --> 00:23:07,420 it's going to change that file descriptor table 544 00:23:07,420 --> 00:23:10,090 entry to point to a particular open file, 545 00:23:10,090 --> 00:23:11,126 like maybe open /etc/pwd. 546 00:23:14,350 --> 00:23:17,380 And now, the entry at this slot on the table 547 00:23:17,380 --> 00:23:18,580 points to an open file. 548 00:23:18,580 --> 00:23:20,080 Some of them might actually be null. 549 00:23:20,080 --> 00:23:23,260 Maybe you don't have an open file with a particular index 550 00:23:23,260 --> 00:23:24,660 in this table. 551 00:23:24,660 --> 00:23:29,000 And as a result, what does it mean to forge a capability? 552 00:23:29,000 --> 00:23:30,700 The only thing you can do in user space 553 00:23:30,700 --> 00:23:32,460 is make up an integer. 554 00:23:32,460 --> 00:23:35,230 And the only integers that would make sense to make up 555 00:23:35,230 --> 00:23:38,560 would be entries that point to non-null entries in this table. 556 00:23:38,560 --> 00:23:42,910 And those guys are exactly the capabilities that you have. 
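[Editor's sketch, not part of the lecture.] The argument above, that a made-up integer is only useful if it happens to index a non-null entry in the process's own descriptor table, can be observed from user space on a Unix-like system. The sketch below uses only Python's standard `os`, `errno`, and `tempfile` modules; the helper names `try_read` and `demo_forged_fd`, the file name, and the invented descriptor number are illustrative assumptions.

```python
import errno
import os
import tempfile


def try_read(fd: int) -> str:
    """Attempt a read on an integer the caller claims is a descriptor."""
    try:
        os.read(fd, 1)
        return "ok"
    except OSError as exc:
        # Report the symbolic errno name, e.g. "EBADF".
        return errno.errorcode.get(exc.errno, str(exc.errno))


def demo_forged_fd() -> tuple[str, str, str]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "secret.txt")
        with open(path, "w") as f:
            f.write("data")

        real_fd = os.open(path, os.O_RDONLY)  # kernel fills a table slot
        legit = try_read(real_fd)             # integer refers to a live entry
        os.close(real_fd)                     # the slot is empty again
        stale = try_read(real_fd)             # same integer, no entry behind it
        made_up = try_read(987_654)           # an integer invented out of thin air
        return legit, stale, made_up
```

The integer itself carries no authority. The kernel-side table entry does, which is why guessing numbers only ever reaches files the process already has open.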
557 00:23:42,910 --> 00:23:45,620 So does that make sense why it's difficult, in this file 558 00:23:45,620 --> 00:23:47,750 descriptor world, to actually forge capabilities 559 00:23:47,750 --> 00:23:48,542 in the first place? 560 00:23:48,542 --> 00:23:49,708 So it's kind of cool, right? 561 00:23:49,708 --> 00:23:52,130 Like the only files that you have open are exactly 562 00:23:52,130 --> 00:23:53,420 the things you can operate on. 563 00:23:53,420 --> 00:23:56,740 And there's nothing else that you can potentially 564 00:23:56,740 --> 00:23:59,996 touch and affect. 565 00:23:59,996 --> 00:24:00,820 Make sense? 566 00:24:00,820 --> 00:24:01,403 Any questions? 567 00:24:05,630 --> 00:24:06,610 All right. 568 00:24:06,610 --> 00:24:07,110 OK. 569 00:24:07,110 --> 00:24:09,990 So I guess, how would capabilities 570 00:24:09,990 --> 00:24:12,540 help solve the ambient authority problem 571 00:24:12,540 --> 00:24:14,820 that Norman Hardy is excited about with his Fortran 572 00:24:14,820 --> 00:24:16,020 compiler? 573 00:24:16,020 --> 00:24:19,790 So what would be the file descriptor moral equivalent 574 00:24:19,790 --> 00:24:22,600 solution to this sysx/fort thing? 575 00:24:25,682 --> 00:24:27,140 Do they actually solve the problem? 576 00:24:27,140 --> 00:24:28,016 Yeah? 577 00:24:28,016 --> 00:24:31,590 AUDIENCE: Well, they just use the appropriate capabilities 578 00:24:31,590 --> 00:24:33,160 whenever they're needed. 579 00:24:33,160 --> 00:24:36,660 So when you have to access the output file, in the statistics, 580 00:24:36,660 --> 00:24:39,378 you use the capability [INAUDIBLE] file. 581 00:24:39,378 --> 00:24:42,320 But when you're accessing the file you're about to read, 582 00:24:42,320 --> 00:24:44,714 you don't use that capability. 583 00:24:44,714 --> 00:24:45,380 PROFESSOR: Yeah. 
584 00:24:45,380 --> 00:24:48,370 So I guess really what it boils down to is that somehow 585 00:24:48,370 --> 00:24:51,560 the Fortran compiler should just already have a file descriptor 586 00:24:51,560 --> 00:24:54,280 open for that /sysx/stat file. 587 00:24:54,280 --> 00:24:57,660 So they don't really describe, in their short paper, 588 00:24:57,660 --> 00:24:59,950 how you'd get that capability. 589 00:24:59,950 --> 00:25:02,340 But it basically means you shouldn't really 590 00:25:02,340 --> 00:25:04,250 pass file names around. 591 00:25:04,250 --> 00:25:05,925 You should instead pass file descriptors. 592 00:25:05,925 --> 00:25:08,270 So you could actually come up with a perhaps much more 593 00:25:08,270 --> 00:25:12,540 elegant design for our Unix replacement of the Fortran 594 00:25:12,540 --> 00:25:14,290 compiler using capabilities. 595 00:25:14,290 --> 00:25:19,530 So maybe the plan is we should just have a Fortran compiler 596 00:25:19,530 --> 00:25:22,310 front end that doesn't have any extra privileges, 597 00:25:22,310 --> 00:25:25,750 and it takes all these arguments you give it, and converts 598 00:25:25,750 --> 00:25:30,340 all the path names you supply to it into open file descriptors. 599 00:25:30,340 --> 00:25:33,540 So the alternative design I am thinking of here 600 00:25:33,540 --> 00:25:36,160 is that maybe we'd have a program 601 00:25:36,160 --> 00:25:38,200 fort1, which is the front end. 602 00:25:38,200 --> 00:25:40,345 And it would take some sort of a file, foo.f, 603 00:25:40,345 --> 00:25:45,390 and all the other arguments, -o, whatever. 604 00:25:45,390 --> 00:25:48,470 And it doesn't actually implement any of the compiler 605 00:25:48,470 --> 00:25:50,020 logic, anything else. 606 00:25:50,020 --> 00:25:52,080 All it looks for is path names in its arguments, 607 00:25:52,080 --> 00:25:54,870 and it's going to open them and establish 608 00:25:54,870 --> 00:25:55,991 file descriptors for them. 
609 00:25:56,471 --> 00:25:58,054 And the cool thing is that, because it 610 00:25:58,054 --> 00:26:01,570 has no extra privileges, if the user doesn't have access 611 00:26:01,570 --> 00:26:03,520 to some file name, then it will fail. 612 00:26:03,520 --> 00:26:04,720 So that's great. 613 00:26:04,720 --> 00:26:07,280 And then once this front end has opened all these file 614 00:26:07,280 --> 00:26:10,990 descriptors, it can execute some privileged extra component, 615 00:26:10,990 --> 00:26:14,500 like the actual setuid Fortran compiler. 616 00:26:14,500 --> 00:26:16,520 So maybe then it'll run fort. 617 00:26:16,520 --> 00:26:19,075 This guy's maybe setuid to some special user ID that 618 00:26:19,075 --> 00:26:21,230 has access to the stats file. 619 00:26:21,230 --> 00:26:23,750 But it doesn't actually accept any path names as input. 620 00:26:23,750 --> 00:26:27,250 All it's going to do is accept file descriptors. 621 00:26:27,250 --> 00:26:29,550 And, in that case, the file descriptors 622 00:26:29,550 --> 00:26:33,980 already prove that the caller had access to open them. 623 00:26:33,980 --> 00:26:35,845 Does the property make sense? 624 00:26:35,845 --> 00:26:37,800 So it of course doesn't solve every issue. 625 00:26:37,800 --> 00:26:40,570 I'm just sort of sketching out how capabilities might help. 626 00:26:40,570 --> 00:26:43,565 But that's roughly the plan, is that you should demonstrate 627 00:26:43,565 --> 00:26:45,760 the fact that you have access to a particular name 628 00:26:45,760 --> 00:26:49,190 by just opening it and passing a capability, instead of saying, 629 00:26:49,190 --> 00:26:51,140 why don't you try to open this file 630 00:26:51,140 --> 00:26:54,457 and maybe accidentally use some extra privileges. 631 00:26:54,457 --> 00:26:54,956 Yes. 632 00:26:54,956 --> 00:26:56,354 AUDIENCE: So does this generalize 633 00:26:56,354 --> 00:26:59,137 to having one process per capability? 
634 00:26:59,137 --> 00:27:00,470 PROFESSOR: Does this generalize? 635 00:27:00,470 --> 00:27:02,330 Well, of course you can have as many processes as you want. 636 00:27:02,330 --> 00:27:04,288 You can have multiple processes per capability, 637 00:27:04,288 --> 00:27:05,324 but I'm not sure-- 638 00:27:05,324 --> 00:27:06,240 AUDIENCE: [INAUDIBLE]. 639 00:27:12,930 --> 00:27:16,480 PROFESSOR: I'm still not sure what you mean by one process. 640 00:27:16,480 --> 00:27:19,222 AUDIENCE: So we have [INAUDIBLE] capabilities the user has. 641 00:27:19,801 --> 00:27:20,800 PROFESSOR: That's right. 642 00:27:20,800 --> 00:27:22,633 AUDIENCE: And then we have fort's access 643 00:27:22,633 --> 00:27:24,211 to this stats file. 644 00:27:24,211 --> 00:27:25,210 PROFESSOR: That's right. 645 00:27:25,210 --> 00:27:25,470 Yeah. 646 00:27:25,470 --> 00:27:27,595 So the way to think of it is, you don't necessarily 647 00:27:27,595 --> 00:27:31,516 need a separate process for every capability. 648 00:27:31,516 --> 00:27:35,140 Because here, the fort1 thing might open many files 649 00:27:35,140 --> 00:27:38,590 and might pass many capabilities to the privileged fort 650 00:27:38,590 --> 00:27:40,435 component. 651 00:27:40,435 --> 00:27:42,060 The problem here-- the reason that this 652 00:27:42,060 --> 00:27:44,030 might seem like you want a separate process 653 00:27:44,030 --> 00:27:48,427 for every capability is that we're 654 00:27:48,427 --> 00:27:51,010 sort of dealing with this weird interface between capabilities 655 00:27:51,010 --> 00:27:52,450 and ambient privileges. 656 00:27:52,450 --> 00:27:54,780 Because fort1 sort of does have ambient privilege. 657 00:27:54,780 --> 00:27:56,155 And what we're doing is basically 658 00:27:56,155 --> 00:27:59,100 we're converting this ambient privilege into capabilities 659 00:27:59,100 --> 00:28:00,890 in this fort1 process. 
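[Editor's sketch, not part of the lecture.] The fort1 design discussed above, an unprivileged front end that turns path names into open descriptors and a privileged component that accepts descriptors only, might be sketched roughly as follows. Everything here is a simplification under stated assumptions: both halves run in a single process, so no real privilege boundary exists, and the function names, the output contents, and the `stat` file name are hypothetical. In the design from the lecture the back end would be a separate setuid program that opens the statistics file with its own authority.

```python
import os


def fort1_front_end(src_path: str, out_path: str) -> tuple[int, int]:
    """Unprivileged front end: turn path names into descriptors.

    These opens run with the caller's own authority, so a path the
    caller cannot access fails here, before any privileged code runs.
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        out_fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    except OSError:
        os.close(src_fd)
        raise
    return src_fd, out_fd


def fort_back_end(src_fd: int, out_fd: int, stats_fd: int) -> None:
    """Stand-in for the privileged compiler: accepts descriptors only."""
    source = os.read(src_fd, 1 << 20)
    os.write(out_fd, b"compiled:" + source)  # placeholder for real output
    os.write(stats_fd, b"1 compile\n")       # stats touched only via its own fd


def run_demo(workdir: str) -> bytes:
    src = os.path.join(workdir, "foo.f")
    out = os.path.join(workdir, "foo.out")
    stats = os.path.join(workdir, "stat")
    with open(src, "wb") as f:
        f.write(b"PROGRAM FOO\n")
    # Assumption: in the real design the privileged side opens this itself.
    stats_fd = os.open(stats, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    src_fd, out_fd = fort1_front_end(src, out)
    try:
        fort_back_end(src_fd, out_fd, stats_fd)
    finally:
        for fd in (src_fd, out_fd, stats_fd):
            os.close(fd)
    with open(out, "rb") as f:
        return f.read()
```

Because `fort_back_end` never calls `open` on a caller-supplied name, there is no path through which a user-chosen output name could be resolved with the back end's extra authority, which is the confused-deputy failure the original compiler had.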
660 00:28:00,890 --> 00:28:02,580 So if you have multiple different kinds 661 00:28:02,580 --> 00:28:05,035 of ambient privilege, or multiple different privileges 662 00:28:05,035 --> 00:28:07,730 that you want to carefully use, then maybe what you want 663 00:28:07,730 --> 00:28:10,320 is a separate process holding that privilege. 664 00:28:10,320 --> 00:28:12,820 And whenever you want to use a particular set of privileges, 665 00:28:12,820 --> 00:28:14,520 you'll ask the corresponding process 666 00:28:14,520 --> 00:28:16,800 to please perform an operation. 667 00:28:16,800 --> 00:28:19,120 And if it succeeds, give me back the capability. 668 00:28:19,120 --> 00:28:21,210 So that's maybe one way to think of this. 669 00:28:24,000 --> 00:28:26,336 There's been actually some operating system designs that 670 00:28:26,336 --> 00:28:30,770 are entirely capability-based, where there are no ambient privileges 671 00:28:30,770 --> 00:28:31,564 whatsoever. 672 00:28:31,564 --> 00:28:32,480 And it's kind of cool. 673 00:28:32,480 --> 00:28:35,961 Unfortunately, it's more of sort of an interesting reading 674 00:28:35,961 --> 00:28:36,460 experience. 675 00:28:36,460 --> 00:28:37,905 Like oh, yeah, you can do it. 676 00:28:37,905 --> 00:28:38,920 That's pretty cool. 677 00:28:38,920 --> 00:28:42,680 But it's probably not really practical to use 678 00:28:42,680 --> 00:28:45,540 in a real system, unfortunately. 679 00:28:45,540 --> 00:28:48,300 It turns out that you really do want not so much 680 00:28:48,300 --> 00:28:51,200 ambient privilege but being able to name an object 681 00:28:51,200 --> 00:28:53,960 and tell someone about an object without conveying necessarily 682 00:28:53,960 --> 00:28:56,060 the rights to that object. 683 00:28:56,060 --> 00:28:57,670 So maybe I don't know what privileges 684 00:28:57,670 --> 00:29:00,599 you might have over some shared document, but I do 685 00:29:00,599 --> 00:29:02,890 want to tell you, hey, well, there's a shared document. 
686 00:29:02,890 --> 00:29:04,230 If you can read it, read it. 687 00:29:04,230 --> 00:29:05,605 If you write it, great, write it. 688 00:29:05,605 --> 00:29:07,830 But I don't want to necessarily convey any rights. 689 00:29:07,830 --> 00:29:10,960 I just want to tell you, hey, there's this thing, go try it. 690 00:29:10,960 --> 00:29:13,540 So it's a bit of a bummer in a capability world 691 00:29:13,540 --> 00:29:16,930 that it really forces you to never talk 692 00:29:16,930 --> 00:29:21,050 about objects without conveying rights to that object. 693 00:29:21,050 --> 00:29:24,910 So it's an important idea to know about, 694 00:29:24,910 --> 00:29:27,240 and to use it in some parts of a system, 695 00:29:27,240 --> 00:29:29,639 but probably not the be all end all solution 696 00:29:29,639 --> 00:29:31,930 to security, much like almost anything else [INAUDIBLE] 697 00:29:31,930 --> 00:29:33,419 about here. 698 00:29:33,419 --> 00:29:33,918 Make sense? 699 00:29:33,918 --> 00:29:34,810 Yeah? 700 00:29:34,810 --> 00:29:37,720 AUDIENCE: So if the process has capabilities given to it 701 00:29:37,720 --> 00:29:40,811 by some other process, and it happens 702 00:29:40,811 --> 00:29:43,395 to already have the capability to that object, that's greater. 703 00:29:43,395 --> 00:29:45,269 Can it compare them to make sure that they're 704 00:29:45,269 --> 00:29:46,629 about the same object? 705 00:29:46,629 --> 00:29:48,420 Or will it just use the one that's greater? 706 00:29:48,420 --> 00:29:50,919 PROFESSOR: So the thing is that a process doesn't implicitly 707 00:29:50,919 --> 00:29:51,794 use the capabilities. 708 00:29:51,794 --> 00:29:53,627 So that's the cool thing about capabilities. 709 00:29:53,627 --> 00:29:55,760 You have to explicitly name which one you're using. 710 00:29:55,760 --> 00:29:57,680 So think of it in terms of file descriptors. 
711 00:29:57,680 --> 00:30:01,820 Suppose that I give you an open file descriptor for some file, 712 00:30:01,820 --> 00:30:02,807 and it's read only. 713 00:30:02,807 --> 00:30:04,890 And then someone else gives you another capability 714 00:30:04,890 --> 00:30:07,431 for some other-- maybe the same file, maybe a different file, 715 00:30:07,431 --> 00:30:08,760 and it's read/write. 716 00:30:08,760 --> 00:30:10,390 It's not all of a sudden that if you're 717 00:30:10,390 --> 00:30:12,869 trying to write to the first file descriptor 718 00:30:12,869 --> 00:30:14,660 you had that was read only, all of a sudden 719 00:30:14,660 --> 00:30:16,390 those will start succeeding because you 720 00:30:16,390 --> 00:30:19,270 have this extra writeable file descriptor open. 721 00:30:19,270 --> 00:30:21,407 So that's sort of the cool thing. 722 00:30:21,407 --> 00:30:22,990 You don't want this ambient privilege. 723 00:30:22,990 --> 00:30:24,920 Because if you think of these capabilities 724 00:30:24,920 --> 00:30:27,245 as a bunch of privileges that just keep accumulating 725 00:30:27,245 --> 00:30:29,190 in your process, then you'll actually just 726 00:30:29,190 --> 00:30:30,690 end up with ambient privilege again. 727 00:30:30,690 --> 00:30:32,849 You just have all these magic capabilities, 728 00:30:32,849 --> 00:30:34,765 and people have actually built such libraries. 729 00:30:34,765 --> 00:30:37,197 Basically, well, they manage your capabilities for you. 730 00:30:37,197 --> 00:30:38,280 They sort of collect them. 731 00:30:38,280 --> 00:30:39,680 And when you try to perform an operation, 732 00:30:39,680 --> 00:30:40,670 they look for the capabilities and find 733 00:30:40,670 --> 00:30:42,250 the one that'll make it work. 734 00:30:42,250 --> 00:30:44,500 That exactly brings you back to this ambient authority 735 00:30:44,500 --> 00:30:45,890 that you were trying to avoid. 
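[Editor's sketch, not part of the lecture.] The read-only versus read/write descriptor example above can be tried directly on a Unix-like system. The sketch uses only standard `os`, `errno`, and `tempfile` calls; the helper names `write_via` and `demo_two_descriptors` and the file name are illustrative assumptions.

```python
import errno
import os
import tempfile


def write_via(fd: int, data: bytes) -> str:
    """Try to write through one specific descriptor and report the outcome."""
    try:
        os.write(fd, data)
        return "ok"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


def demo_two_descriptors() -> tuple[str, str, bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.txt")
        ro_fd = os.open(path, os.O_RDONLY | os.O_CREAT, 0o600)  # read-only
        rw_fd = os.open(path, os.O_RDWR)                        # read/write, same file
        try:
            first = write_via(ro_fd, b"nope")   # named the read-only descriptor
            second = write_via(rw_fd, b"yes")   # named the writable descriptor
            contents = os.pread(ro_fd, 16, 0)   # reading via ro_fd still works
        finally:
            os.close(ro_fd)
            os.close(rw_fd)
        return first, second, contents
```

Holding the writable descriptor does nothing for operations that name the read-only one. Each call states explicitly which capability it is exercising, which is the opposite of the capability-collecting libraries the lecture warns about.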
736 00:30:45,890 --> 00:30:47,390 So the cool thing about capabilities 737 00:30:47,390 --> 00:30:50,670 is that it's almost like a programming construct, 738 00:30:50,670 --> 00:30:52,875 where it makes it easy for you-- which 739 00:30:52,875 --> 00:30:54,875 is a rare thing in security-- it makes it easier 740 00:30:54,875 --> 00:30:56,950 for you to write code that specifies 741 00:30:56,950 --> 00:30:59,200 exactly what privileges you want to use from a security 742 00:30:59,200 --> 00:30:59,700 standpoint. 743 00:30:59,700 --> 00:31:02,570 And it's actually a fairly natural code to write. 744 00:31:02,570 --> 00:31:05,280 So if you get into that mindset of always carrying around 745 00:31:05,280 --> 00:31:07,450 this privilege with the object you're accessing, 746 00:31:07,450 --> 00:31:09,210 it seems like a cool thing to do. 747 00:31:09,210 --> 00:31:12,750 It doesn't always make sense, but sometimes it does. 748 00:31:12,750 --> 00:31:16,070 Any other questions? 749 00:31:16,070 --> 00:31:16,640 OK. 750 00:31:16,640 --> 00:31:20,150 So that's more on the ambient authority 751 00:31:20,150 --> 00:31:21,730 that we've looked at here. 752 00:31:21,730 --> 00:31:23,640 It turns out that capabilities are also 753 00:31:23,640 --> 00:31:26,100 great for other problems, as well. 754 00:31:26,100 --> 00:31:30,000 And in particular, the problem of managing privileges 755 00:31:30,000 --> 00:31:33,700 often shows up when you want to run some untrustworthy code. 756 00:31:33,700 --> 00:31:35,370 Because you want to really control 757 00:31:35,370 --> 00:31:37,280 which privileges you give it, because you 758 00:31:37,280 --> 00:31:40,590 think it will misuse any privileges you give it at all. 759 00:31:40,590 --> 00:31:44,150 And this is the slightly different point of view 760 00:31:44,150 --> 00:31:46,960 from which the authors of the Capsicum paper 761 00:31:46,960 --> 00:31:50,640 are coming at capabilities. 
762 00:31:50,640 --> 00:31:53,575 So they're of course clearly aware of this ambient authority 763 00:31:53,575 --> 00:31:55,450 problem, but it's sort of a different problem 764 00:31:55,450 --> 00:31:57,720 that you might or might not care about solving. 765 00:31:57,720 --> 00:32:00,960 But the particular thing they really care about 766 00:32:00,960 --> 00:32:04,776 is they have a really large privileged application, 767 00:32:04,776 --> 00:32:06,150 and they worry that there's going 768 00:32:06,150 --> 00:32:10,480 to be bugs in different parts of that application source code. 769 00:32:10,480 --> 00:32:12,900 So they would like to reduce the privileges 770 00:32:12,900 --> 00:32:16,380 of different components of that application. 771 00:32:16,380 --> 00:32:20,480 So in that sense, the story is very similar to OKWS. 772 00:32:20,480 --> 00:32:24,459 So you have-- for sandboxing, you 773 00:32:24,459 --> 00:32:27,000 have some large application, you break it up into components, 774 00:32:27,000 --> 00:32:30,270 and you will limit what privileges each component has. 775 00:32:30,270 --> 00:32:31,520 So where does this make sense? 776 00:32:31,520 --> 00:32:34,140 Like OKWS is clearly one example. 777 00:32:34,140 --> 00:32:36,010 What are other situations where you might 778 00:32:36,010 --> 00:32:40,280 care about privilege separation? 779 00:32:40,280 --> 00:32:43,707 Well, I guess in the paper they describe the examples I 780 00:32:43,707 --> 00:32:44,540 actually got to run. 781 00:32:44,540 --> 00:32:48,320 So things like tcpdump and other applications 782 00:32:48,320 --> 00:32:50,285 that parse network data. 783 00:32:50,285 --> 00:32:53,890 So why do they worry so much about applications 784 00:32:53,890 --> 00:32:56,000 that parse network inputs? 785 00:32:56,000 --> 00:32:57,580 What goes wrong in tcpdump? 786 00:32:57,580 --> 00:32:58,656 Why are they so paranoid? 
787 00:32:58,656 --> 00:33:01,036 AUDIENCE: Well, an attacker can control what's being sent 788 00:33:01,036 --> 00:33:01,988 and what's being called. 789 00:33:01,988 --> 00:33:02,470 PROFESSOR: Yeah. 790 00:33:02,470 --> 00:33:04,020 I think what they really worry about is, 791 00:33:04,020 --> 00:33:06,603 very much like with OKWS, they worry about that attack surface 792 00:33:06,603 --> 00:33:08,900 and how much can an attacker really control the inputs? 793 00:33:08,900 --> 00:33:11,970 And with these network parsing programs, 794 00:33:11,970 --> 00:33:14,698 there's a lot of control that the attacker has. 795 00:33:14,698 --> 00:33:16,100 They have the exact packet. 796 00:33:16,100 --> 00:33:18,355 And the reason that this was so problematic 797 00:33:18,355 --> 00:33:21,400 is that if you're writing code in C that 798 00:33:21,400 --> 00:33:23,920 has to parse data structures, you're presumably 799 00:33:23,920 --> 00:33:26,100 going to do lots of pointer manipulations, 800 00:33:26,100 --> 00:33:28,830 copying bytes into arrays, allocating memory. 801 00:33:28,830 --> 00:33:32,450 And as you are now experts, this is super fragile. 802 00:33:32,450 --> 00:33:34,875 And you can easily have memory management errors 803 00:33:34,875 --> 00:33:38,155 that lead to pretty disastrous consequences. 804 00:33:38,155 --> 00:33:39,530 So this is the reason why they're 805 00:33:39,530 --> 00:33:43,990 very excited about sandboxing various network protocol 806 00:33:43,990 --> 00:33:45,790 parsing things, et cetera. 807 00:33:45,790 --> 00:33:47,850 Another probably real world instance 808 00:33:47,850 --> 00:33:50,070 where you really care about this is in your browser. 809 00:33:50,070 --> 00:33:52,070 You probably want to sandbox your Flash plug-in, 810 00:33:52,070 --> 00:33:54,960 or your Java extension, or whatnot. 
811 00:33:54,960 --> 00:33:56,570 Because they're pretty large attack 812 00:33:56,570 --> 00:33:58,430 surfaces as well that have gotten 813 00:33:58,430 --> 00:34:01,352 exploited pretty aggressively. 814 00:34:01,352 --> 00:34:02,810 So it seems like a reasonable plan. 815 00:34:02,810 --> 00:34:04,726 Like if you're writing some piece of software, 816 00:34:04,726 --> 00:34:06,980 you want to sandbox different components of it. 817 00:34:06,980 --> 00:34:08,790 What about more generally, if you download something 818 00:34:08,790 --> 00:34:10,498 from the internet, and you want to run it 819 00:34:10,498 --> 00:34:12,889 with fewer privileges? 820 00:34:12,889 --> 00:34:16,989 Is this sort of Capsicum style isolation a good plan for that? 821 00:34:16,989 --> 00:34:19,500 I could download some random screensaver or some game 822 00:34:19,500 --> 00:34:20,290 from the internet. 823 00:34:20,290 --> 00:34:21,590 And I want to run it on my computer, 824 00:34:21,590 --> 00:34:23,381 and I want to make sure it doesn't screw up 825 00:34:23,381 --> 00:34:24,690 whatever I have laying around. 826 00:34:27,802 --> 00:34:28,760 Would you use Capsicum? 827 00:34:28,760 --> 00:34:31,588 Would this be a good plan? 828 00:34:31,588 --> 00:34:33,046 Yeah? 829 00:34:33,046 --> 00:34:35,476 AUDIENCE: You could write a sandboxing program, 830 00:34:35,476 --> 00:34:38,878 which you'd use Capsicum to sandbox [INAUDIBLE]. 831 00:34:42,652 --> 00:34:43,360 PROFESSOR: Right. 832 00:34:43,360 --> 00:34:44,900 You could try to use Capsicum. 833 00:34:44,900 --> 00:34:46,150 So how would you use Capsicum? 834 00:34:46,150 --> 00:34:49,380 Well, you'd just enter into the sandbox mode with cap_enter. 835 00:34:49,380 --> 00:34:53,330 And then you run the program. 836 00:34:53,330 --> 00:34:54,514 Would you expect it to work? 
837 00:34:56,887 --> 00:34:59,220 I guess the problem is that if the program wasn't really 838 00:34:59,220 --> 00:35:01,155 expecting to be sandboxed with Capsicum, 839 00:35:01,155 --> 00:35:04,920 then all of a sudden the program will try to open any 840 00:35:04,920 --> 00:35:07,460 simplified-- it'll open a shared library, 841 00:35:07,460 --> 00:35:09,430 and it can't open the shared library, 842 00:35:09,430 --> 00:35:11,570 because it can't open /lib/something. 843 00:35:11,570 --> 00:35:13,810 That's not allowed in capability mode. 844 00:35:13,810 --> 00:35:16,790 So it's a bit of a problem. 845 00:35:16,790 --> 00:35:18,800 So typically, these sandboxing techniques 846 00:35:18,800 --> 00:35:21,685 that we're going to look at here-- capability-style 847 00:35:21,685 --> 00:35:24,850 stuff, and so on-- really are best 848 00:35:24,850 --> 00:35:27,400 used when the developer is sort of building 849 00:35:27,400 --> 00:35:30,110 the application aware that the code is 850 00:35:30,110 --> 00:35:31,882 going to run in this mode. 851 00:35:31,882 --> 00:35:34,260 There's probably other kinds of sandboxing techniques 852 00:35:34,260 --> 00:35:36,550 that could be used for unmodified code, 853 00:35:36,550 --> 00:35:40,270 but then the focus, or the requirements, change a bit. 854 00:35:40,270 --> 00:35:42,410 So in Capsicum, they don't really 855 00:35:42,410 --> 00:35:43,910 worry about backwards compatibility. 856 00:35:43,910 --> 00:35:45,320 Well, we have to open files differently? 857 00:35:45,320 --> 00:35:46,770 Sure, we'll open them differently. 858 00:35:46,770 --> 00:35:48,820 Whereas, if you want to run existing code, 859 00:35:48,820 --> 00:35:51,330 you probably want something more like maybe 860 00:35:51,330 --> 00:35:52,450 a full virtual machine. 861 00:35:52,450 --> 00:35:55,040 So you could open a VM and run it there. 
862 00:35:55,040 --> 00:35:58,400 And it's very compatible, and there's 863 00:35:58,400 --> 00:36:03,440 no question that it'll just run, and probably not-- 864 00:36:03,440 --> 00:36:07,060 Well, it's actually a good thought exercise. 865 00:36:07,060 --> 00:36:11,970 Should we use virtual machines to sandbox instead of Capsicum? 866 00:36:11,970 --> 00:36:12,886 AUDIENCE: [INAUDIBLE]. 867 00:36:12,886 --> 00:36:13,690 PROFESSOR: Yeah. 868 00:36:13,690 --> 00:36:16,510 The overheads are probably quite significant. 869 00:36:16,510 --> 00:36:20,715 So the memory overhead is pretty bad. 870 00:36:20,715 --> 00:36:21,325 It could be. 871 00:36:21,325 --> 00:36:22,900 But what if we don't care about memory overhead? 872 00:36:22,900 --> 00:36:24,691 So maybe virtual machines gets really good, 873 00:36:24,691 --> 00:36:28,080 and they don't actually use that much memory. 874 00:36:28,080 --> 00:36:30,210 Is it still a bad plan? 875 00:36:30,210 --> 00:36:32,708 AUDIENCE: [INAUDIBLE]. 876 00:36:32,708 --> 00:36:33,374 PROFESSOR: Yeah. 877 00:36:33,374 --> 00:36:37,160 So it's kind of hard to control what happens on the network, 878 00:36:37,160 --> 00:36:40,150 because either you give the virtual machine no access 879 00:36:40,150 --> 00:36:42,570 to the network at all, or you connect to a network 880 00:36:42,570 --> 00:36:45,800 through NAT mode or something in Preview or VMware. 881 00:36:45,800 --> 00:36:47,550 And then it can access the whole internet. 882 00:36:47,550 --> 00:36:52,652 So you have to much more explicitly control network 883 00:36:52,652 --> 00:36:55,110 by maybe setting up firewall rules for the virtual machine, 884 00:36:55,110 --> 00:36:55,797 et cetera. 885 00:36:55,797 --> 00:36:56,880 That's maybe not so great. 886 00:36:56,880 --> 00:36:58,890 What if you don't care about network? 887 00:36:58,890 --> 00:37:04,240 What if you're some simple video or tcpdump parser. 888 00:37:04,240 --> 00:37:05,260 You just spin up a VM. 
889 00:37:05,260 --> 00:37:07,000 It's going to parse your tcpdump packets 890 00:37:07,000 --> 00:37:09,490 and spit back the presentation 891 00:37:09,490 --> 00:37:11,850 that tcpdump wants to print to the user. 892 00:37:11,850 --> 00:37:14,190 So there's no real network I/O. Maybe you're, 893 00:37:14,190 --> 00:37:20,820 for some reason [INAUDIBLE] still? 894 00:37:20,820 --> 00:37:23,340 AUDIENCE: Because the initialization overhead 895 00:37:23,340 --> 00:37:24,656 is still large. 896 00:37:24,656 --> 00:37:25,490 PROFESSOR: Yeah. 897 00:37:25,490 --> 00:37:27,823 So it's maybe like an initial overhead of starting a VM. 898 00:37:27,823 --> 00:37:28,620 So that's true. 899 00:37:28,620 --> 00:37:32,030 There's some performance stuff. 900 00:37:32,030 --> 00:37:32,530 Yeah. 901 00:37:32,530 --> 00:37:34,780 AUDIENCE: Well, you might want to have database writes 902 00:37:34,780 --> 00:37:35,762 and things like that. 903 00:37:35,762 --> 00:37:36,200 PROFESSOR: Yeah. 904 00:37:36,200 --> 00:37:38,158 But even more generally, what you're getting at 905 00:37:38,158 --> 00:37:41,140 is what if there's real data that you care about here? 906 00:37:41,140 --> 00:37:42,840 And it's really hard to share. 907 00:37:42,840 --> 00:37:45,990 So VMs are really much more a sort 908 00:37:45,990 --> 00:37:50,040 of separation mechanism, where you can't really share stuff 909 00:37:50,040 --> 00:37:51,970 across VMs very easily. 910 00:37:51,970 --> 00:37:53,640 So it's good for situations where 911 00:37:53,640 --> 00:37:57,090 you have a very isolated program you want to run, you basically 912 00:37:57,090 --> 00:37:59,470 don't want to share any files, any directories, 913 00:37:59,470 --> 00:38:01,830 any processes, any pipes even. 914 00:38:01,830 --> 00:38:03,640 And you just let it run separately. 915 00:38:03,640 --> 00:38:04,290 So it's great. 
916 00:38:04,290 --> 00:38:07,340 It's probably, in some ways, stronger isolation than what 917 00:38:07,340 --> 00:38:10,340 Capsicum provides, because there's probably fewer 918 00:38:10,340 --> 00:38:12,865 ways for things to go wrong. 919 00:38:12,865 --> 00:38:14,240 And, you know, all these problems 920 00:38:14,240 --> 00:38:15,640 we talked about so far. 921 00:38:15,640 --> 00:38:18,189 But it's also not applicable in many of the situations 922 00:38:18,189 --> 00:38:19,730 where you might want to use Capsicum, 923 00:38:19,730 --> 00:38:21,880 because in Capsicum, you can actually 924 00:38:21,880 --> 00:38:26,645 share files that have very fine granularity between sandbox 925 00:38:26,645 --> 00:38:30,342 [INAUDIBLE] by just giving it capability to [INAUDIBLE] file. 926 00:38:30,342 --> 00:38:32,550 This is something that's very easy to do in Capsicum, 927 00:38:32,550 --> 00:38:35,220 and would require quite a bit of machinery 928 00:38:35,220 --> 00:38:37,280 in a virtual machine setting. 929 00:38:37,280 --> 00:38:40,720 That makes sense? 930 00:38:40,720 --> 00:38:43,200 Questions? 931 00:38:43,200 --> 00:38:44,330 All right. 932 00:38:44,330 --> 00:38:47,600 So does that seem like a useful primitives 933 00:38:47,600 --> 00:38:49,340 to have to maybe sandbox stuff. 934 00:38:49,340 --> 00:38:53,040 So I guess we're going to talk about different ways 935 00:38:53,040 --> 00:38:54,900 to try to sandbox something. 936 00:38:54,900 --> 00:38:58,060 And Capsicum in particular is the new thing here 937 00:38:58,060 --> 00:38:59,270 that uses capabilities. 938 00:38:59,270 --> 00:39:05,810 But just by comparison, I guess, you 939 00:39:05,810 --> 00:39:08,350 can do some sandboxing in Unix, as we saw with OKWS. 940 00:39:08,350 --> 00:39:08,850 Right? 941 00:39:08,850 --> 00:39:13,170 It's just not great from several standpoints. 
942 00:39:13,170 --> 00:39:17,860 So let's maybe take the example of tcpdump 943 00:39:17,860 --> 00:39:24,530 and see why tcpdump is difficult to sandbox with Unix mechanisms. 944 00:39:24,530 --> 00:39:27,880 So remember, in the Capsicum paper, these guys took tcpdump. 945 00:39:27,880 --> 00:39:32,570 And the way tcpdump works is that it 946 00:39:32,570 --> 00:39:39,080 opens some special sockets and then runs basically parsing 947 00:39:39,080 --> 00:39:41,010 logic on network packets. 948 00:39:41,010 --> 00:39:44,860 And it proceeds and prints them out to the user's terminal. 949 00:39:44,860 --> 00:39:51,180 So what would it take to sandbox tcpdump with Unix primitives? 950 00:39:51,180 --> 00:39:54,066 So it has restricted privileges? 951 00:39:54,066 --> 00:39:55,870 So I guess the one problem with Unix 952 00:39:55,870 --> 00:39:59,300 is that you basically have to-- well, the only way 953 00:39:59,300 --> 00:40:01,890 to really change privileges is to change 954 00:40:01,890 --> 00:40:04,152 the inputs into the decision function that 955 00:40:04,152 --> 00:40:06,610 decides whether you can actually access some object or not. 956 00:40:06,610 --> 00:40:09,160 And the only things you can really change 957 00:40:09,160 --> 00:40:11,860 are, well, you can change the privileges of the process, 958 00:40:11,860 --> 00:40:14,300 which means setting its UID to something else. 959 00:40:14,300 --> 00:40:15,800 Or you could change the permissions 960 00:40:15,800 --> 00:40:21,510 on various objects that are lying around in your system. 961 00:40:21,510 --> 00:40:23,330 Or probably both, in fact, right? 962 00:40:23,330 --> 00:40:25,110 If you wanted to sandbox tcpdump, 963 00:40:25,110 --> 00:40:27,850 you'd probably have to pick some extra user ID 964 00:40:27,850 --> 00:40:31,612 and switch to that while you're running. 
965 00:40:31,612 --> 00:40:36,660 Probably not an ideal plan, because you probably 966 00:40:36,660 --> 00:40:39,340 don't mean for multiple instances of tcpdump 967 00:40:39,340 --> 00:40:41,049 to run as the same user ID. 968 00:40:41,049 --> 00:40:42,840 So if I compromise one instance of tcpdump, 969 00:40:42,840 --> 00:40:45,307 it doesn't really mean I want to allow that attacker 970 00:40:45,307 --> 00:40:47,515 to now control the other instances of tcpdump running 971 00:40:47,515 --> 00:40:49,070 on my machine. 972 00:40:49,070 --> 00:40:53,614 So that's potentially a bad part of using user IDs here. 973 00:40:53,614 --> 00:40:55,530 Another problem is that, in Unix, you actually 974 00:40:55,530 --> 00:40:58,924 have to be root in order to change the user 975 00:40:58,924 --> 00:41:01,215 ID of the process, or user privileges, 976 00:41:01,215 --> 00:41:03,200 or switch them to something else. 977 00:41:03,200 --> 00:41:05,060 That's not great either. 978 00:41:05,060 --> 00:41:08,080 And another problem is that, regardless 979 00:41:08,080 --> 00:41:11,700 of what your user ID is, there could be files 980 00:41:11,700 --> 00:41:13,830 that allow access to them. 981 00:41:13,830 --> 00:41:16,760 So there could be world writable or world readable files 982 00:41:16,760 --> 00:41:17,800 in your file system. 983 00:41:17,800 --> 00:41:19,730 Like your /etc/passwd file. 984 00:41:19,730 --> 00:41:22,370 Regardless of what your UID is, the process 985 00:41:22,370 --> 00:41:24,420 will still be able to read that password file. 986 00:41:24,420 --> 00:41:26,070 So that's not so nice. 
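The point about world readable and world writable files can be made concrete with a small audit script. This is only a sketch: the function name `world_accessible` is made up for illustration, and it looks at the "other" permission bits of regular files only, ignoring ACLs and directory permissions.

```python
import os
import stat


def world_accessible(root):
    """Yield (path, kind) for regular files under root that any UID can use.

    Hypothetical helper for illustration: it inspects only the "other"
    permission bits, which apply no matter which UID the sandbox runs as.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or cannot be inspected; skip it
            if not stat.S_ISREG(mode):
                continue  # ignore symlinks, devices, sockets, and so on
            if mode & stat.S_IWOTH:
                yield path, "world-writable"
            elif mode & stat.S_IROTH:
                yield path, "world-readable"
```

Running something like this over a whole file system is exactly the kind of audit that switching UIDs forces on you, which is part of why the UID approach is awkward for sandboxing.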
987 00:41:26,070 --> 00:41:29,850 So as a result, in order to sandbox in Unix, 988 00:41:29,850 --> 00:41:36,257 you probably have to do both-- some UID changing and maybe 989 00:41:36,257 --> 00:41:38,340 a careful look at the permissions of all the objects 990 00:41:38,340 --> 00:41:40,507 to convince yourself that there's no world writeable 991 00:41:40,507 --> 00:41:41,714 file that's really sensitive. 992 00:41:41,714 --> 00:41:43,130 Or there's no world readable file 993 00:41:43,130 --> 00:41:45,742 that you don't want that hacker to get access to. 994 00:41:45,742 --> 00:41:48,200 And I guess [INAUDIBLE] is yet another mechanism 995 00:41:48,200 --> 00:41:49,530 in Unix that you can use. 996 00:41:49,530 --> 00:41:50,920 But it all starts to add up. 997 00:41:50,920 --> 00:41:52,420 If you see it through, then it might 998 00:41:52,420 --> 00:41:56,681 be hard to share files or share directories and so on. 999 00:41:56,681 --> 00:41:57,680 So does that make sense? 1000 00:41:57,680 --> 00:42:00,466 Just in terms of contrast for what 1001 00:42:00,466 --> 00:42:02,393 Capsicum is trying to solve? 1002 00:42:02,393 --> 00:42:06,160 Any questions about Unix stuff? 1003 00:42:06,160 --> 00:42:06,950 All right. 1004 00:42:06,950 --> 00:42:10,650 So let's look at how Capsicum tries to solve this problem. 1005 00:42:10,650 --> 00:42:13,680 So in Capsicum, as we keep alluding to, 1006 00:42:13,680 --> 00:42:18,330 the plan is very much that once you enter the sandboxing mode, 1007 00:42:18,330 --> 00:42:20,879 everything is going to be accessed only 1008 00:42:20,879 --> 00:42:21,670 through capabilities. 1009 00:42:21,670 --> 00:42:23,490 So if you don't have a capability, 1010 00:42:23,490 --> 00:42:27,610 you simply cannot access any objects. 1011 00:42:27,610 --> 00:42:32,000 So these guys, in the paper, make a huge deal 1012 00:42:32,000 --> 00:42:34,870 about global namespaces. 
1013 00:42:34,870 --> 00:42:37,720 So what's this thing about a global namespace, 1014 00:42:37,720 --> 00:42:39,460 and why are they so worried about it? 1015 00:42:43,155 --> 00:42:44,780 What's an example of a global namespace 1016 00:42:44,780 --> 00:42:47,379 these guys worry about? 1017 00:42:47,379 --> 00:42:48,534 AUDIENCE: [INAUDIBLE]. 1018 00:42:48,534 --> 00:42:49,200 PROFESSOR: Yeah. 1019 00:42:49,200 --> 00:42:51,634 So a file system for them is sort of the prime example 1020 00:42:51,634 --> 00:42:52,550 of a global namespace. 1021 00:42:52,550 --> 00:42:55,420 You can start at slash, and you can basically enumerate 1022 00:42:55,420 --> 00:42:56,690 any file you could, right? 1023 00:42:56,690 --> 00:42:59,450 Like go to someone's home directory-- 1024 00:42:59,450 --> 00:43:03,748 /home/nickolai/ something, something. 1025 00:43:03,748 --> 00:43:04,860 Why is this bad? 1026 00:43:04,860 --> 00:43:08,470 Why are they against global namespaces in Capsicum? 1027 00:43:14,350 --> 00:43:15,100 What do you think? 1028 00:43:15,100 --> 00:43:15,460 Yeah? 1029 00:43:15,460 --> 00:43:17,543 AUDIENCE: Well, if you have the wrong permissions, 1030 00:43:17,543 --> 00:43:20,534 then use authorities, and then you can get in trouble. 1031 00:43:20,534 --> 00:43:21,200 PROFESSOR: Yeah. 1032 00:43:21,200 --> 00:43:23,116 So the problem is that this is Unix after all. 1033 00:43:23,116 --> 00:43:27,370 So there are still regular permissions on files. 1034 00:43:27,370 --> 00:43:29,790 So maybe you really want to sandbox some process 1035 00:43:29,790 --> 00:43:31,804 so it can't read anything at all in the system 1036 00:43:31,804 --> 00:43:32,970 and can't write to anything. 1037 00:43:32,970 --> 00:43:35,530 But if you can name a file starting from slash, 1038 00:43:35,530 --> 00:43:38,060 you'll find some stupid user that has a world writable 1039 00:43:38,060 --> 00:43:39,970 file in their home directory. 
1040 00:43:39,970 --> 00:43:43,874 And that would be not so great for the sandboxing plan. 1041 00:43:43,874 --> 00:43:46,290 And I guess more generally, the way they're thinking of it 1042 00:43:46,290 --> 00:43:50,430 is that, with capabilities, you could, in principle, enumerate 1043 00:43:50,430 --> 00:43:53,122 exactly all the objects that a process has. 1044 00:43:53,122 --> 00:43:56,030 Because you could just enumerate all the capabilities 1045 00:43:56,030 --> 00:43:58,350 in the file descriptor table, or whatever it is that's 1046 00:43:58,350 --> 00:44:00,250 storing capabilities for you. 1047 00:44:00,250 --> 00:44:03,970 And those are the only things that the process could ever 1048 00:44:03,970 --> 00:44:05,734 touch. 1049 00:44:05,734 --> 00:44:07,900 And if you ever have access to a global namespace, 1050 00:44:07,900 --> 00:44:09,090 then this is potentially unbounded. 1051 00:44:09,090 --> 00:44:10,540 Because you could-- even if you have 1052 00:44:10,540 --> 00:44:11,920 some limited set of capabilities, 1053 00:44:11,920 --> 00:44:14,850 maybe you'll start from slash again and find some new file, 1054 00:44:14,850 --> 00:44:16,510 and you'll never really know what 1055 00:44:16,510 --> 00:44:19,745 is the set of operations or objects 1056 00:44:19,745 --> 00:44:22,120 that a process could access. 1057 00:44:22,120 --> 00:44:25,370 So this is the reason they're so worried about global namespaces 1058 00:44:25,370 --> 00:44:28,775 because it goes against their goal of precisely controlling 1059 00:44:28,775 --> 00:44:33,880 all the things that a sandboxed process should have access to. 1060 00:44:33,880 --> 00:44:36,440 Make sense? 1061 00:44:36,440 --> 00:44:37,590 All right. 
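The idea that the file descriptor table enumerates everything a process can touch can be illustrated with a short sketch. It assumes the operating system exposes the descriptor table as `/dev/fd` (true on Linux, and on FreeBSD or macOS with fdescfs); the function name `open_descriptors` is hypothetical and this is not part of Capsicum's API.

```python
import os


def open_descriptors():
    """Return the sorted file descriptor numbers open in this process.

    Assumption: /dev/fd lists one entry per open descriptor. Listing the
    directory briefly opens a descriptor of its own, so each candidate is
    re-checked with fstat and dropped if it is already closed.
    """
    fds = []
    for entry in os.listdir("/dev/fd"):
        if not entry.isdigit():
            continue
        fd = int(entry)
        try:
            os.fstat(fd)
        except OSError:
            continue  # descriptor was only open while listing
        fds.append(fd)
    return sorted(fds)
```

In capability mode a list like this would bound what the process can reach; with a global namespace available, the same list says very little, because the process can always open something new by name.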
1062 00:44:37,590 --> 00:44:39,850 So they tried to eliminate global namespaces 1063 00:44:39,850 --> 00:44:44,590 with a bunch of kernel changes to the FreeBSD, in their case, 1064 00:44:44,590 --> 00:44:47,960 kernel to make sure that all the operations go 1065 00:44:47,960 --> 00:44:52,220 through some kind of capability, which is, in their case, 1066 00:44:52,220 --> 00:44:54,190 a file descriptor. 1067 00:44:54,190 --> 00:44:57,800 So just to double check, do we really need kernel changes? 1068 00:44:57,800 --> 00:45:00,350 What if we just do this in a library? 1069 00:45:00,350 --> 00:45:03,040 So we implement Capsicum, for which they already have a library. 1070 00:45:03,040 --> 00:45:05,700 And all we do is we change all these functions, 1071 00:45:05,700 --> 00:45:08,590 like open, read, and write, to all exclusively use 1072 00:45:08,590 --> 00:45:09,927 capabilities. 1073 00:45:09,927 --> 00:45:12,010 So all operations will go through some capability, 1074 00:45:12,010 --> 00:45:16,193 and look it up in the file table, et cetera. 1075 00:45:16,193 --> 00:45:17,140 Does that work? 1076 00:45:17,140 --> 00:45:17,640 Yeah? 1077 00:45:17,640 --> 00:45:19,730 AUDIENCE: You could always make a sys call. 1078 00:45:19,730 --> 00:45:20,010 PROFESSOR: Yeah. 1079 00:45:20,010 --> 00:45:22,551 So the problem is that there is this existing set of system 1080 00:45:22,551 --> 00:45:23,866 calls the kernel will accept. 1081 00:45:23,866 --> 00:45:25,866 And even if you implement a nice library, 1082 00:45:25,866 --> 00:45:28,240 it doesn't prevent a bad process or a compromised process 1083 00:45:28,240 --> 00:45:29,656 from making the sys call directly. 1084 00:45:29,656 --> 00:45:32,270 And then you have to have the kernel enforce 1085 00:45:32,270 --> 00:45:33,786 something or other. 1086 00:45:33,786 --> 00:45:34,286 Yeah? 1087 00:45:34,286 --> 00:45:36,724 AUDIENCE: [INAUDIBLE]. 1088 00:45:36,724 --> 00:45:37,390 PROFESSOR: Yeah. 
1089 00:45:37,390 --> 00:45:39,247 So I think it's a question of-- I guess 1090 00:45:39,247 --> 00:45:40,330 what is your threat model? 1091 00:45:40,330 --> 00:45:40,830 Exactly. 1092 00:45:40,830 --> 00:45:42,580 So for the compiler, the threat model 1093 00:45:42,580 --> 00:45:47,230 is that the programmer is maybe not paying attention 1094 00:45:47,230 --> 00:45:50,240 a whole lot, but it's not really a compromised compiler process, 1095 00:45:50,240 --> 00:45:51,710 not running arbitrary code. 1096 00:45:51,710 --> 00:45:54,750 So if we just help the well-meaning developer do 1097 00:45:54,750 --> 00:45:58,590 the right thing, then a library will probably suffice. 1098 00:45:58,590 --> 00:46:00,990 On the other hand, if we're talking about a process that 1099 00:46:00,990 --> 00:46:03,110 could be executing arbitrary code 1100 00:46:03,110 --> 00:46:05,610 and could be trying to bypass our mechanisms 1101 00:46:05,610 --> 00:46:07,210 in any possible way, then we have 1102 00:46:07,210 --> 00:46:09,370 to have a strong enforcement boundary. 1103 00:46:09,370 --> 00:46:12,160 And a library doesn't provide any kind of strong enforcement 1104 00:46:12,160 --> 00:46:12,660 guarantees. 1105 00:46:12,660 --> 00:46:16,311 Whereas a kernel, in our case, would do that. 1106 00:46:16,311 --> 00:46:16,810 OK. 1107 00:46:16,810 --> 00:46:20,805 So what do they actually make in terms of changes to the kernel? 1108 00:46:20,805 --> 00:46:25,270 So I guess the first thing is this system call 1109 00:46:25,270 --> 00:46:26,780 that they call cap_enter. 1110 00:46:30,750 --> 00:46:33,049 And what happens once you run cap_enter? 1111 00:46:33,049 --> 00:46:35,215 Once you've [INAUDIBLE] cap_enter from your process? 1112 00:46:38,309 --> 00:46:39,850 So as far as I can tell, what happens 1113 00:46:39,850 --> 00:46:44,950 is that the kernel will stop accepting any system calls that 1114 00:46:44,950 --> 00:46:47,635 refer to global namespaces. 
1115 00:46:47,635 --> 00:46:49,260 And the only thing you'll be able to do 1116 00:46:49,260 --> 00:46:52,650 is refer to existing file descriptors 1117 00:46:52,650 --> 00:46:54,810 that you have open in your process. 1118 00:46:54,810 --> 00:46:58,340 So cap_enter will put your process in a special mode where 1119 00:46:58,340 --> 00:47:02,265 you cannot use the regular system call open, 1120 00:47:02,265 --> 00:47:06,059 and instead you have to do things like openat. 1121 00:47:06,059 --> 00:47:07,475 So there's this new sort of family 1122 00:47:07,475 --> 00:47:10,830 of system calls, in Unix-like operating systems, where 1123 00:47:10,830 --> 00:47:13,280 instead of having open take a single path name, 1124 00:47:13,280 --> 00:47:15,850 you can actually use openat, where 1125 00:47:15,850 --> 00:47:17,560 you pass it a first argument which 1126 00:47:17,560 --> 00:47:20,110 is a file descriptor for a directory 1127 00:47:20,110 --> 00:47:23,640 and the second is some sort of a name. 1128 00:47:23,640 --> 00:47:27,610 And the openat system call will open this name 1129 00:47:27,610 --> 00:47:31,250 relative to whatever directory the file descriptor points to. 1130 00:47:31,250 --> 00:47:33,430 So this is a much more capability-like version 1131 00:47:33,430 --> 00:47:36,930 of open, where you can still have file descriptors pointing 1132 00:47:36,930 --> 00:47:42,580 to directories, but you can-- well, sorry. 1133 00:47:42,580 --> 00:47:44,795 You can still direct your operation. 1134 00:47:44,795 --> 00:47:46,170 But in order to do this, you have 1135 00:47:46,170 --> 00:47:47,872 to have a capability to the directory 1136 00:47:47,872 --> 00:47:49,830 in the form of an open file descriptor for that 1137 00:47:49,830 --> 00:47:51,200 [INAUDIBLE]. 1138 00:47:51,200 --> 00:47:53,944 Make sense? 1139 00:47:53,944 --> 00:47:55,290 OK. 1140 00:47:55,290 --> 00:47:58,480 So do they need any other kernel changes? 
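As a rough illustration of openat, Python exposes the same system call through the `dir_fd` argument of `os.open`. The wrapper name `open_relative` below is made up for this sketch.

```python
import os


def open_relative(dir_fd, name, flags=os.O_RDONLY):
    """Open `name` relative to an already-open directory descriptor.

    This maps to the openat() system call: dir_fd plays the role of the
    capability, and name is looked up starting from that directory.
    """
    return os.open(name, flags, dir_fd=dir_fd)
```

A caller would first obtain the directory descriptor, for example with `os.open(path, os.O_RDONLY | os.O_DIRECTORY)`, and then pass it in. Note that outside capability mode nothing in this call confines the lookup: an absolute name simply ignores the directory descriptor. The confinement described in the lecture comes from the kernel once the process is in capability mode, not from the wrapper.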
1141 00:47:58,480 --> 00:48:00,630 Is there anything else they worry about? 1142 00:48:04,520 --> 00:48:06,086 So I guess there's another-- yeah? 1143 00:48:06,086 --> 00:48:07,650 AUDIENCE: [INAUDIBLE]. 1144 00:48:07,650 --> 00:48:08,316 PROFESSOR: Yeah. 1145 00:48:08,316 --> 00:48:10,274 So what do they do about network access, right? 1146 00:48:10,274 --> 00:48:12,073 So what happens in capability mode? 1147 00:48:12,073 --> 00:48:14,281 AUDIENCE: I guess they have capabilities for security 1148 00:48:14,281 --> 00:48:17,365 packets [INAUDIBLE]. 1149 00:48:17,365 --> 00:48:17,990 PROFESSOR: Yes. 1150 00:48:17,990 --> 00:48:19,365 So I think the way they basically 1151 00:48:19,365 --> 00:48:22,682 do it is that they treat the network as a global namespace, 1152 00:48:22,682 --> 00:48:23,890 very much like a file system. 1153 00:48:23,890 --> 00:48:28,020 So I think once you enter capability mode, 1154 00:48:28,020 --> 00:48:30,660 you cannot create a new socket. 1155 00:48:30,660 --> 00:48:33,320 Or you cannot create a new socket and connect to some 1156 00:48:33,320 --> 00:48:36,321 arbitrary machine, or to some arbitrary address or port 1157 00:48:36,321 --> 00:48:36,820 number. 1158 00:48:36,820 --> 00:48:40,710 You have to basically create all the connections you want ahead 1159 00:48:40,710 --> 00:48:42,420 of time and fill them in as capabilities. 1160 00:48:42,420 --> 00:48:44,670 Or maybe you'd have to get them from someone that will 1161 00:48:44,670 --> 00:48:46,185 pass you a file descriptor. 1162 00:48:46,185 --> 00:48:48,655 But basically, once you're in capability mode, 1163 00:48:48,655 --> 00:48:51,280 the set of file descriptors you have open completely enumerates 1164 00:48:51,280 --> 00:48:52,821 all the machines you'll ever talk to. 1165 00:48:52,821 --> 00:48:54,430 So you can find open connections. 1166 00:48:54,430 --> 00:48:55,846 Maybe you're listening on a port. 1167 00:48:55,846 --> 00:48:57,050 That's OK. 
1168 00:48:57,050 --> 00:48:59,790 But you cannot connect to an address specified 1169 00:48:59,790 --> 00:49:02,453 by an absolute name, kind of like a global namespace would 1170 00:49:02,453 --> 00:49:03,866 allow you to do it. 1171 00:49:03,866 --> 00:49:05,150 That make sense? 1172 00:49:05,150 --> 00:49:09,310 So it's access through the networking namespace, as well. 1173 00:49:09,310 --> 00:49:11,840 What do they do for processes? 1174 00:49:11,840 --> 00:49:14,400 So another global namespace, I guess, in Unix, 1175 00:49:14,400 --> 00:49:16,670 is the PIDs themselves. 1176 00:49:16,670 --> 00:49:18,875 So the example of a system call that operates 1177 00:49:18,875 --> 00:49:20,090 in this namespace is "kill." 1178 00:49:20,090 --> 00:49:22,549 So I could kill PID 25. 1179 00:49:22,549 --> 00:49:24,840 And I could-- well, presumably I'll put a signal number 1180 00:49:24,840 --> 00:49:26,110 in there, too. 1181 00:49:26,110 --> 00:49:31,040 But I could actually kill a process by its PID number. 1182 00:49:31,040 --> 00:49:35,320 How do they fix this in Capsicum? 1183 00:49:35,320 --> 00:49:36,130 What's their plan? 1184 00:49:41,553 --> 00:49:42,269 Yeah? 1185 00:49:42,269 --> 00:49:44,018 AUDIENCE: File descriptors with processes. 1186 00:49:44,018 --> 00:49:44,520 PROFESSOR: Yeah. 1187 00:49:44,520 --> 00:49:45,130 It's actually kind of cool. 1188 00:49:45,130 --> 00:49:47,300 It's like, I wish Unix had this all along. 1189 00:49:47,300 --> 00:49:50,640 Which is that, instead of having these different kinds 1190 00:49:50,640 --> 00:49:54,630 of numbers or PIDs, instead, when you fork off a process, 1191 00:49:54,630 --> 00:49:56,620 they actually have a new variant of fork 1192 00:49:56,620 --> 00:50:01,300 called pdfork, or Process Descriptor Fork. 
1193 00:50:01,300 --> 00:50:04,560 And what it does is when it creates a child process, 1194 00:50:04,560 --> 00:50:07,700 it actually sticks a reference to that child process 1195 00:50:07,700 --> 00:50:10,320 into your file descriptor table somewhere. 1196 00:50:10,320 --> 00:50:11,730 And this is your new process. 1197 00:50:11,730 --> 00:50:13,700 And you can operate on a child process 1198 00:50:13,700 --> 00:50:15,409 by specifying the file descriptor number. 1199 00:50:15,409 --> 00:50:17,491 Well, it would be pretty cool, because you can now 1200 00:50:17,491 --> 00:50:19,550 pass your child process to someone else 1201 00:50:19,550 --> 00:50:21,580 and say, well, if you can go and kill them now, 1202 00:50:21,580 --> 00:50:24,230 or you can manage this process however you want, 1203 00:50:24,230 --> 00:50:26,560 you'll get notifications when the process dies. 1204 00:50:26,560 --> 00:50:31,000 It'll look like a readable file descriptor, et cetera. 1205 00:50:31,000 --> 00:50:34,530 So they really try to homogenize everything 1206 00:50:34,530 --> 00:50:38,930 into looking like a file descriptor of some sort here. 1207 00:50:38,930 --> 00:50:40,695 And with these kernel changes, you 1208 00:50:40,695 --> 00:50:43,300 can finally have all the functionalities 1209 00:50:43,300 --> 00:50:44,330 you might care about. 1210 00:50:44,330 --> 00:50:46,110 You have the support for sockets already, 1211 00:50:46,110 --> 00:50:48,160 process descriptors, et cetera. 1212 00:50:48,160 --> 00:50:52,350 And you have a way of constraining 1213 00:50:52,350 --> 00:50:53,840 what the process can do. 1214 00:50:53,840 --> 00:50:56,470 Because it cannot refer to any of the global names anymore 1215 00:50:56,470 --> 00:50:59,690 after [INAUDIBLE]. 1216 00:50:59,690 --> 00:51:00,610 All right. 1217 00:51:00,610 --> 00:51:03,050 Any questions? 1218 00:51:03,050 --> 00:51:05,700 So here's an interesting puzzle. 
1219 00:51:05,700 --> 00:51:07,820 I was trying to understand from the paper. 1220 00:51:07,820 --> 00:51:10,410 They make a big deal about dot dot 1221 00:51:10,410 --> 00:51:12,820 in looking up directory names. 1222 00:51:12,820 --> 00:51:16,210 So they basically say, well, once you're in capability mode, 1223 00:51:16,210 --> 00:51:19,430 when you pass a particular name to openat, 1224 00:51:19,430 --> 00:51:21,416 you cannot use dot dot in those names. 1225 00:51:21,416 --> 00:51:23,040 And presumably, if you have a symlink, 1226 00:51:23,040 --> 00:51:25,205 if a symlink's target contains dot dot, 1227 00:51:25,205 --> 00:51:28,780 they will reject it if you're in capability mode. 1228 00:51:28,780 --> 00:51:31,830 So is this strictly required? 1229 00:51:31,830 --> 00:51:33,980 Could you imagine a safe design in principle 1230 00:51:33,980 --> 00:51:35,610 that allows the use of dot dot? 1231 00:51:40,330 --> 00:51:41,040 Yeah. 1232 00:51:41,040 --> 00:51:43,664 AUDIENCE: Well, you'd need to be able to find whether they have 1233 00:51:43,664 --> 00:51:46,892 a file or a capability that allows access to the parent 1234 00:51:46,892 --> 00:51:47,880 directory. 1235 00:51:47,880 --> 00:51:48,640 PROFESSOR: Right. 1236 00:51:48,640 --> 00:51:50,181 AUDIENCE: So it's trivial to go down, 1237 00:51:50,181 --> 00:51:52,908 because any subdirectory-- you already have access to it 1238 00:51:52,908 --> 00:51:53,490 by having the capability. 1239 00:51:53,490 --> 00:51:54,190 PROFESSOR: That's right. 1240 00:51:54,190 --> 00:51:54,740 Yeah. 1241 00:51:54,740 --> 00:51:56,050 AUDIENCE: But going up, you need to see 1242 00:51:56,050 --> 00:51:58,050 whether you have any capabilities for the parent 1243 00:51:58,050 --> 00:51:58,810 directory. 1244 00:51:58,810 --> 00:51:59,810 PROFESSOR: That's right. 1245 00:51:59,810 --> 00:52:00,300 Yeah. 1246 00:52:00,300 --> 00:52:01,060 AUDIENCE: Search for it somehow. 1247 00:52:01,060 --> 00:52:01,220 PROFESSOR: Yeah. 
1248 00:52:01,220 --> 00:52:01,955 So that's a little bit tricky. 1249 00:52:01,955 --> 00:52:03,503 And also, it goes against the grain 1250 00:52:03,503 --> 00:52:06,490 of this whole explicit authority thing. 1251 00:52:06,490 --> 00:52:09,895 What about if you're using dot dot inside sort 1252 00:52:09,895 --> 00:52:11,415 of a single open call? 1253 00:52:11,415 --> 00:52:15,800 So for example, what if you call something like openat some 1254 00:52:15,800 --> 00:52:18,050 particular directory or file descriptor number, 1255 00:52:18,050 --> 00:52:20,332 and you open something like, I don't know, b/c/../..? 1256 00:52:26,690 --> 00:52:28,910 In principle, this might be safe, right? 1257 00:52:28,910 --> 00:52:31,290 Because you go down some directory, and then you just 1258 00:52:31,290 --> 00:52:33,770 climb back up out of it. 1259 00:52:33,770 --> 00:52:34,660 Yeah? 1260 00:52:34,660 --> 00:52:36,824 AUDIENCE: What if c is [INAUDIBLE]? 1261 00:52:36,824 --> 00:52:37,490 PROFESSOR: Yeah. 1262 00:52:37,490 --> 00:52:38,560 So it's a little bit tricky, of course, 1263 00:52:38,560 --> 00:52:40,570 to define exactly what it means to be safe. 1264 00:52:40,570 --> 00:52:41,070 Right? 1265 00:52:41,070 --> 00:52:44,350 You probably have to make sure that c isn't a symlink that 1266 00:52:44,350 --> 00:52:46,160 goes somewhere else and so on. 1267 00:52:46,160 --> 00:52:46,660 Yeah. 1268 00:52:46,660 --> 00:52:48,190 That's a fairly tricky proposition, to get this right. 1269 00:52:48,190 --> 00:52:50,106 And I think, in the paper, what they basically 1270 00:52:50,106 --> 00:52:52,000 argue is that it's actually 1271 00:52:52,000 --> 00:52:54,630 quite difficult in practice to implement a set of checks 1272 00:52:54,630 --> 00:52:57,990 that's sufficient and avoids all the possible race 1273 00:52:57,990 --> 00:52:59,640 conditions here. 
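The conservative rule for names passed to openat in capability mode, rejecting any dot dot component outright, can be sketched as a simple string check. The function name `is_conservatively_safe` is made up, and this is only an illustration of the rule: the real check is done by the kernel during path lookup, where it also sees symlink targets, and a userspace check like this one enforces nothing against a compromised process.

```python
def is_conservatively_safe(name):
    """Return True if a relative name is acceptable under the strict rule.

    Rejects absolute names (they would restart from the global root) and
    any name with a '..' component, even one like 'b/c/../..' that might
    resolve harmlessly. It does not look at symlinks at all.
    """
    if name.startswith("/"):
        return False
    return ".." not in name.split("/")
```

For example, `b/c` passes, while `b/c/../..` and `/etc/passwd` are rejected. A name such as `b/..c` passes, because `..c` is an ordinary file name rather than a parent reference.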
1274 00:52:59,640 --> 00:53:02,020 So they basically just do the conservative thing 1275 00:53:02,020 --> 00:53:04,190 and disallow any dot dot at any time 1276 00:53:04,190 --> 00:53:07,520 once you're in capability mode. 1277 00:53:07,520 --> 00:53:09,330 There's some interesting race conditions 1278 00:53:09,330 --> 00:53:10,496 you could come up with here. 1279 00:53:10,496 --> 00:53:14,000 The lecture notes have more details. 1280 00:53:14,000 --> 00:53:16,010 But basically I think these guys are 1281 00:53:16,010 --> 00:53:18,560 being extra cautious in defining what's allowed 1282 00:53:18,560 --> 00:53:22,700 and what's not allowed in capability mode. 1283 00:53:22,700 --> 00:53:23,567 OK. 1284 00:53:23,567 --> 00:53:25,620 So here, to answer your question, 1285 00:53:25,620 --> 00:53:27,036 once you enter capability mode, it 1286 00:53:27,036 --> 00:53:30,505 seems to be all controlled by your file table. 1287 00:53:30,505 --> 00:53:33,641 Does your UID still matter, once you enter capability mode? 1288 00:53:41,020 --> 00:53:43,340 [INAUDIBLE] 1289 00:53:43,340 --> 00:53:44,080 Yeah? 1290 00:53:44,080 --> 00:53:46,080 AUDIENCE: Well, you could still launch a process 1291 00:53:46,080 --> 00:53:48,077 that doesn't use capabilities. 1292 00:53:48,077 --> 00:53:48,660 PROFESSOR: No. 1293 00:53:48,660 --> 00:53:50,187 Actually, no, you can't. 1294 00:53:50,187 --> 00:53:52,520 You have to make sure that-- otherwise you could escape, 1295 00:53:52,520 --> 00:53:54,811 like well, I can't access-- why don't you run this guy? 1296 00:53:54,811 --> 00:53:56,258 [INAUDIBLE] 1297 00:53:56,258 --> 00:53:59,535 So yeah, cap_enter is inherited by all the children, which 1298 00:53:59,535 --> 00:54:01,550 is actually hugely important. 1299 00:54:01,550 --> 00:54:02,050 Yeah? 1300 00:54:06,000 --> 00:54:09,190 Anyone else? 1301 00:54:09,190 --> 00:54:10,990 So what if we kill the UID? 
1302 00:54:10,990 --> 00:54:13,591 So it's supposed to be like going to cap_enter, 1303 00:54:13,591 --> 00:54:15,590 and we just kill the UID of the current process. 1304 00:54:15,590 --> 00:54:17,476 We don't actually care what it is anymore. 1305 00:54:17,476 --> 00:54:19,225 And then the process tries to open a file. 1306 00:54:19,225 --> 00:54:22,350 What checks should apply? 1307 00:54:22,350 --> 00:54:22,850 Yeah? 1308 00:54:22,850 --> 00:54:25,191 AUDIENCE: Oh, I was thinking that the UID is 1309 00:54:25,191 --> 00:54:26,690 useful for logging purposes as well, 1310 00:54:26,690 --> 00:54:28,580 like being able to tell if you did something. 1311 00:54:28,580 --> 00:54:29,130 PROFESSOR: So yeah, you're right. 1312 00:54:29,130 --> 00:54:29,460 Actually, yeah. 1313 00:54:29,460 --> 00:54:30,930 So that would be actually kind of damaging, right? 1314 00:54:30,930 --> 00:54:33,500 Like I spawned some sandbox process on my machine 1315 00:54:33,500 --> 00:54:34,669 and it loses the UID. 1316 00:54:34,669 --> 00:54:36,460 I'm like I have a hundred processes running 1317 00:54:36,460 --> 00:54:38,730 on my machine, and I have no idea what they are. 1318 00:54:38,730 --> 00:54:40,400 So that's probably not a good plan for a management purpose. 1319 00:54:40,400 --> 00:54:41,555 You're absolutely right. 1320 00:54:41,555 --> 00:54:44,170 But I'm just sort of hypothetically saying, well, 1321 00:54:44,170 --> 00:54:45,920 do we need it for access control, I guess. 1322 00:54:45,920 --> 00:54:46,750 Yeah? 1323 00:54:46,750 --> 00:54:48,280 AUDIENCE: Maybe if this UID is only 1324 00:54:48,280 --> 00:54:50,790 supposed to be able to access this file by reading 1325 00:54:50,790 --> 00:54:54,075 or whatever, but you have the file descriptor for it, 1326 00:54:54,075 --> 00:54:55,450 but then if you lose the UID, you 1327 00:54:55,450 --> 00:54:57,960 might get permissions to write [INAUDIBLE] or something? 1328 00:54:57,960 --> 00:54:58,780 PROFESSOR: Yeah. 
1329 00:54:58,780 --> 00:55:03,410 I think actually where it shows up is in directories. 1330 00:55:03,410 --> 00:55:05,287 Because once you have a capability to a file, 1331 00:55:05,287 --> 00:55:06,120 that's basically it. 1332 00:55:06,120 --> 00:55:08,600 You have it open with particular privileges, et cetera. 1333 00:55:08,600 --> 00:55:11,519 But the problem is that they have this hybrid design where 1334 00:55:11,519 --> 00:55:13,560 they say, well, you can actually have capabilities 1335 00:55:13,560 --> 00:55:15,510 to directories, and you can open a new file 1336 00:55:15,510 --> 00:55:17,030 as you're running along. 1337 00:55:17,030 --> 00:55:19,375 And it might be the case that you have a capability 1338 00:55:19,375 --> 00:55:22,200 to a directory, like /etc. 1339 00:55:22,200 --> 00:55:24,450 And you don't have access to necessarily all the files 1340 00:55:24,450 --> 00:55:25,520 in /etc. 1341 00:55:25,520 --> 00:55:27,440 But once you enter capability mode, 1342 00:55:27,440 --> 00:55:29,860 you can now try to open those files by saying, well, 1343 00:55:29,860 --> 00:55:31,840 I have access to the /etc directory. 1344 00:55:31,840 --> 00:55:32,850 It's open already. 1345 00:55:32,850 --> 00:55:34,620 Why don't you give me the file named 1346 00:55:34,620 --> 00:55:36,060 password in that directory? 1347 00:55:36,060 --> 00:55:38,780 And the kernel still needs to make an access control decision 1348 00:55:38,780 --> 00:55:42,090 on whether to allow you to open a file in that directory 1349 00:55:42,090 --> 00:55:45,010 with either read mode or write mode or what have you.
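The point that holding a directory descriptor does not settle access to every file inside it can be tried out with ordinary POSIX calls. The sketch below is an illustration only. It uses Python's os.open with dir_fd (the openat path) on a regular Unix system, not Capsicum, and the file names public.txt and key are made up. It assumes the process is not running as root, because root skips the usual permission-bit check.

```python
import os
import tempfile


def try_open_in_dir(dir_fd: int, name: str) -> bool:
    """Try to open `name` for reading, relative to an already-open directory fd."""
    try:
        fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    except PermissionError:
        return False
    os.close(fd)
    return True


def demo():
    with tempfile.TemporaryDirectory() as d:
        # One world-readable file and one file with no permission bits at all.
        for name, mode in (("public.txt", 0o644), ("key", 0o000)):
            path = os.path.join(d, name)
            with open(path, "w") as f:
                f.write("data\n")
            os.chmod(path, mode)
        dir_fd = os.open(d, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # Same directory descriptor, two different outcomes: the kernel
            # still checks each file's own permissions against the caller.
            return (try_open_in_dir(dir_fd, "public.txt"),
                    try_open_in_dir(dir_fd, "key"))
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    print(demo())
```

For an unprivileged user the first lookup should succeed and the second should be refused, even though both go through the same open directory descriptor.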
1350 00:55:45,010 --> 00:55:47,490 So I think this is the one place where you still need 1351 00:55:47,490 --> 00:55:50,620 this ambient privilege, to some extent, because they're 1352 00:55:50,620 --> 00:55:53,140 trying to build this compatible design where 1353 00:55:53,140 --> 00:55:56,780 you can have semi-natural semantics for how directories 1354 00:55:56,780 --> 00:55:57,670 work. 1355 00:55:57,670 --> 00:55:59,410 Does that make sense? 1356 00:55:59,410 --> 00:56:02,920 It's like one leftover place, kind of for compatibility 1357 00:56:02,920 --> 00:56:05,884 reasons, or at least the way that Unix file systems 1358 00:56:05,884 --> 00:56:07,660 are typically set up. 1359 00:56:07,660 --> 00:56:09,649 AUDIENCE: Are there any other places? 1360 00:56:09,649 --> 00:56:10,690 PROFESSOR: Good question. 1361 00:56:10,690 --> 00:56:12,240 I couldn't think of one off hand, 1362 00:56:12,240 --> 00:56:14,531 but I guess I would have to get their FreeBSD source 1363 00:56:14,531 --> 00:56:17,980 code to really figure out what's going on. 1364 00:56:17,980 --> 00:56:20,150 I think most of the other situations 1365 00:56:20,150 --> 00:56:22,069 don't really require a UID check. 1366 00:56:22,069 --> 00:56:23,860 Because for networking, it doesn't show up. 1367 00:56:23,860 --> 00:56:27,406 I think for process descriptors it doesn't show up, either. 1368 00:56:27,406 --> 00:56:29,660 If you have it, then you just have it. 1369 00:56:29,660 --> 00:56:33,421 So I think it probably is just file system operations. 1370 00:56:33,421 --> 00:56:35,920 For shared memory, it's also-- once you have a shared memory 1371 00:56:35,920 --> 00:56:37,760 segment, you have it open. 1372 00:56:41,232 --> 00:56:41,841 Yeah? 1373 00:56:41,841 --> 00:56:43,216 AUDIENCE: Could you explain again 1374 00:56:43,216 --> 00:56:47,404 how exactly the user ID matters if you have a capability? 1375 00:56:47,404 --> 00:56:48,070 PROFESSOR: Yeah.
1376 00:56:48,070 --> 00:56:51,810 So I think where it matters is, you have 1377 00:56:51,810 --> 00:56:54,910 a capability to a directory. 1378 00:56:54,910 --> 00:56:57,770 The question is, what does the capability represent? 1379 00:56:57,770 --> 00:57:01,260 So one interpretation that-- for example, some capability 1380 00:57:01,260 --> 00:57:03,130 systems take, not Capsicum. 1381 00:57:03,130 --> 00:57:04,130 Pure capability systems. 1382 00:57:04,130 --> 00:57:06,870 They say, well, if you have a capability to a directory, 1383 00:57:06,870 --> 00:57:08,828 then of course you have access to all the files 1384 00:57:08,828 --> 00:57:11,392 in that directory, no questions about it. 1385 00:57:11,392 --> 00:57:13,225 And in Unix, this is typically not the case. 1386 00:57:13,225 --> 00:57:16,110 You can open a directory like /etc, 1387 00:57:16,110 --> 00:57:18,670 but there's lots of system files in there that are maybe 1388 00:57:18,670 --> 00:57:21,917 private, like the private key of your server is stored in there. 1389 00:57:21,917 --> 00:57:24,250 And just because you can look at a directory and open it 1390 00:57:24,250 --> 00:57:26,820 and list it doesn't mean that you can open the files 1391 00:57:26,820 --> 00:57:28,310 in that directory. 1392 00:57:28,310 --> 00:57:32,392 So in Capsicum, if you open a directory like /etc, 1393 00:57:32,392 --> 00:57:33,850 and then you enter capability mode. 1394 00:57:33,850 --> 00:57:35,190 And then you say, well, hey, I don't 1395 00:57:35,190 --> 00:57:36,200 know what this directory is. 1396 00:57:36,200 --> 00:57:37,658 I just have a file descriptor to it. 1397 00:57:37,658 --> 00:57:39,342 There's a file in there called "key." 1398 00:57:39,342 --> 00:57:41,390 Why don't you open that file "key"?
1399 00:57:41,390 --> 00:57:44,070 And at this point, you probably don't 1400 00:57:44,070 --> 00:57:46,270 want to allow this capability-based process 1401 00:57:46,270 --> 00:57:48,480 to just open it, because that wasn't the intent. 1402 00:57:48,480 --> 00:57:52,060 That would allow you to bypass the Unix permissions on a file. 1403 00:57:52,060 --> 00:57:54,250 So I think the authors of this paper 1404 00:57:54,250 --> 00:57:59,850 are careful to design a system which would not violate 1405 00:57:59,850 --> 00:58:01,600 existing security mechanisms. 1406 00:58:01,600 --> 00:58:04,462 AUDIENCE: So you're saying that you can, in some cases, 1407 00:58:04,462 --> 00:58:06,370 use a combination of the two? 1408 00:58:06,370 --> 00:58:08,760 So even though it'll be able to change it to directory, 1409 00:58:08,760 --> 00:58:10,760 inside the directory, which files you can access 1410 00:58:10,760 --> 00:58:11,839 depends on your user ID? 1411 00:58:11,839 --> 00:58:12,880 PROFESSOR: Yeah, exactly. 1412 00:58:12,880 --> 00:58:16,645 So in Capsicum, the way they get it to work in practice 1413 00:58:16,645 --> 00:58:19,890 is that, actually, before you enter capability mode, 1414 00:58:19,890 --> 00:58:20,666 you have to guess. 1415 00:58:20,666 --> 00:58:22,415 Well, what files am I going to need later? 1416 00:58:22,415 --> 00:58:23,970 I'm going to need some shared libraries. 1417 00:58:23,970 --> 00:58:25,060 I'll need some text files. 1418 00:58:25,060 --> 00:58:26,644 I'll need some templates. 1419 00:58:26,644 --> 00:58:28,560 I'll need some network connections, et cetera. 1420 00:58:28,560 --> 00:58:30,960 So you open all these things ahead of time. 1421 00:58:30,960 --> 00:58:33,970 And you don't always necessarily know which exact file you need. 1422 00:58:33,970 --> 00:58:35,754 So what these guys support is, well, 1423 00:58:35,754 --> 00:58:38,045 you can actually just open a directory file descriptor, 1424 00:58:38,045 --> 00:58:38,780 as well.
1425 00:58:38,780 --> 00:58:41,460 And then I can look up the particular files later. 1426 00:58:41,460 --> 00:58:42,960 But it might be that the files don't 1427 00:58:42,960 --> 00:58:44,209 have all the same permissions. 1428 00:58:44,209 --> 00:58:46,760 So that's exactly the reason, yeah. 1429 00:58:46,760 --> 00:58:49,610 Make sense? 1430 00:58:49,610 --> 00:58:50,940 All right. 1431 00:58:50,940 --> 00:58:55,560 So this is the kernel mechanism part of it. 1432 00:58:55,560 --> 00:59:01,830 Why do they also need this library, libcapsicum? 1433 00:59:01,830 --> 00:59:04,410 I guess there's two things that they support in that library, 1434 00:59:04,410 --> 00:59:07,330 as far as I can tell, or two main things. 1435 00:59:07,330 --> 00:59:15,342 One is that they implement this function they call lch_start 1436 00:59:15,342 --> 00:59:21,930 that you should use instead of cap_enter. 1437 00:59:21,930 --> 00:59:25,600 And the other sort of feature the library provides 1438 00:59:25,600 --> 00:59:31,120 in libcapsicum is this notion called fd lists 1439 00:59:31,120 --> 00:59:33,600 instead of passing file descriptors by number. 1440 00:59:33,600 --> 00:59:35,030 So this fd list thing is probably 1441 00:59:35,030 --> 00:59:36,460 the easiest thing to explain. 1442 00:59:36,460 --> 00:59:40,940 It's basically a generalization, or maybe a clean up, 1443 00:59:40,940 --> 00:59:43,520 of how Unix manages and passes file 1444 00:59:43,520 --> 00:59:46,220 descriptors between processes. 1445 00:59:46,220 --> 00:59:49,580 So in traditional Unix and Linux, 1446 00:59:49,580 --> 00:59:52,910 how you use it today, typically when you launch a process, 1447 00:59:52,910 --> 00:59:54,550 you can pass it some file descriptors. 1448 00:59:54,550 --> 00:59:56,020 You just open some file descriptors 1449 00:59:56,020 --> 00:59:58,485 at particular integer numbers in this table 1450 00:59:58,485 --> 01:00:00,610 and you run the child process that you want to run.
1451 01:00:00,610 --> 01:00:03,180 Or you run a particular binary, and it 1452 01:00:03,180 --> 01:00:08,000 inherits all these open slots in the fd table. 1453 01:00:08,000 --> 01:00:10,370 But there's no real good way to name these things 1454 01:00:10,370 --> 01:00:11,730 other than by number. 1455 01:00:11,730 --> 01:00:15,244 So the somewhat surprising convention, 1456 01:00:15,244 --> 01:00:16,660 if you haven't [INAUDIBLE] before, 1457 01:00:16,660 --> 01:00:18,750 is that, well, slot 0 is your input. 1458 01:00:18,750 --> 01:00:20,940 Slot 1 is your output. 1459 01:00:20,940 --> 01:00:24,010 Slot 2 is where you should print error messages to. 1460 01:00:24,010 --> 01:00:27,370 And that's how Unix sort of works. 1461 01:00:27,370 --> 01:00:32,240 And it sort of works OK if you are just passing these three 1462 01:00:32,240 --> 01:00:35,430 files or streams to a process. 1463 01:00:35,430 --> 01:00:37,570 But in Capsicum, what's happening 1464 01:00:37,570 --> 01:00:41,140 is that you're passing many more file descriptors around. 1465 01:00:41,140 --> 01:00:43,894 So you're passing a file descriptor for some files. 1466 01:00:43,894 --> 01:00:46,310 You're passing a file descriptor for a network connection, 1467 01:00:46,310 --> 01:00:49,320 for a shared library, what have you. 1468 01:00:49,320 --> 01:00:52,060 And it becomes much more tedious to manage all these numbers. 1469 01:00:52,060 --> 01:00:55,370 So basically, libcapsicum provides an abstraction 1470 01:00:55,370 --> 01:00:59,460 for naming these passed file descriptors between processes 1471 01:00:59,460 --> 01:01:01,810 by some sort of a hierarchical name, 1472 01:01:01,810 --> 01:01:06,980 instead of just these opaque integers, if you will. 1473 01:01:06,980 --> 01:01:08,410 So that's one sort of simple thing 1474 01:01:08,410 --> 01:01:10,240 that they provide in their library.
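To make the idea of naming inherited descriptors concrete, here is a rough sketch. It is not the libcapsicum fd list API. The environment variable SKETCH_FDLIST, the name "results/out", and the helper run_child_with_named_fds are all invented for this example. The parent hands the child a name-to-number table along with the descriptors themselves, and the child looks a descriptor up by name rather than relying on a fixed slot number.

```python
import json
import os
import subprocess
import sys

# Hypothetical variable name, chosen for this sketch only.
FDLIST_ENV = "SKETCH_FDLIST"

# The child finds its descriptor by name; it never hard-codes a number.
CHILD_CODE = r"""
import json, os
fds = json.loads(os.environ["SKETCH_FDLIST"])   # name -> fd number
os.write(fds["results/out"], b"hello from child")
"""


def run_child_with_named_fds(named_fds: dict) -> None:
    """Run a child that inherits the listed descriptors plus a name table."""
    env = dict(os.environ)
    env[FDLIST_ENV] = json.dumps(named_fds)
    subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        pass_fds=tuple(named_fds.values()),  # other non-standard fds are closed
        env=env,
        check=True,
    )


def demo() -> bytes:
    read_end, write_end = os.pipe()
    try:
        run_child_with_named_fds({"results/out": write_end})
    finally:
        os.close(write_end)
    data = os.read(read_end, 1024)
    os.close(read_end)
    return data


if __name__ == "__main__":
    print(demo())
```

With a table like this, the parent is free to pick whatever descriptor numbers happen to be available, and adding another descriptor does not disturb the names the child already uses.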
1475 01:01:10,240 --> 01:01:13,260 So I can pass a file descriptor to a process 1476 01:01:13,260 --> 01:01:14,100 and give it a name. 1477 01:01:14,100 --> 01:01:16,100 And it doesn't really matter what number it has, 1478 01:01:16,100 --> 01:01:16,982 a little easier. 1479 01:01:16,982 --> 01:01:17,968 That make sense? 1480 01:01:17,968 --> 01:01:19,450 OK. 1481 01:01:19,450 --> 01:01:21,120 So then they have this other mechanism, 1482 01:01:21,120 --> 01:01:25,906 this much more elaborate way to start a sandbox. 1483 01:01:25,906 --> 01:01:29,740 This lch, libcapsicum Host, API for starting a sandbox, 1484 01:01:29,740 --> 01:01:33,342 instead of just entering the capability mode. 1485 01:01:33,342 --> 01:01:34,050 So what happened? 1486 01:01:34,050 --> 01:01:36,396 Why do they need something more than just entering 1487 01:01:36,396 --> 01:01:37,392 capability mode? 1488 01:01:37,392 --> 01:01:39,950 What are you worried about on creating a sandbox? 1489 01:01:39,950 --> 01:01:40,810 Yeah? 1490 01:01:40,810 --> 01:01:43,502 AUDIENCE: It erases all the inherited stuff 1491 01:01:43,502 --> 01:01:45,524 to give you a clean start. 1492 01:01:45,524 --> 01:01:46,190 PROFESSOR: Yeah. 1493 01:01:46,190 --> 01:01:48,430 So I think they worry about trying 1494 01:01:48,430 --> 01:01:51,230 to enumerate what are all the things the sandbox has access 1495 01:01:51,230 --> 01:01:51,870 to. 1496 01:01:51,870 --> 01:01:56,160 And the problem is that if you just call cap_enter, 1497 01:01:56,160 --> 01:01:58,560 technically, at the kernel mechanism level, as we talked 1498 01:01:58,560 --> 01:01:59,285 about just now, it worked. 1499 01:01:59,285 --> 01:01:59,785 Right? 1500 01:01:59,785 --> 01:02:02,270 It just prevents you from opening any new capabilities. 1501 01:02:02,270 --> 01:02:05,230 But the problem is that there might be lots of existing stuff 1502 01:02:05,230 --> 01:02:08,780 that the process already has access to. 
1503 01:02:08,780 --> 01:02:11,256 So I guess the simplest example is maybe 1504 01:02:11,256 --> 01:02:13,930 there are some file descriptors that you forgot you had opened, 1505 01:02:13,930 --> 01:02:17,310 and it'll just get inherited by this process. 1506 01:02:17,310 --> 01:02:20,470 So one example is they were looking at tcpdump. 1507 01:02:20,470 --> 01:02:23,950 And they realized that-- well, first, they changed tcpdump 1508 01:02:23,950 --> 01:02:27,500 just by calling cap_enter at the point 1509 01:02:27,500 --> 01:02:30,594 just before they were about to parse all the network input. 1510 01:02:30,594 --> 01:02:32,760 So this works well, in some sense, because you can't 1511 01:02:32,760 --> 01:02:34,290 get any more capabilities. 1512 01:02:34,290 --> 01:02:36,331 But then they looked at the open file descriptors, 1513 01:02:36,331 --> 01:02:39,285 and they realized that you have complete access to the user's 1514 01:02:39,285 --> 01:02:41,720 terminal, because you have an open file descriptor to it. 1515 01:02:41,720 --> 01:02:43,145 So you can actually sniff all the keystrokes 1516 01:02:43,145 --> 01:02:45,225 that the user is typing and all that stuff. 1517 01:02:45,225 --> 01:02:48,602 So it's probably not a great plan for tcpdump. 1518 01:02:48,602 --> 01:02:51,060 If it's compromised, you probably don't want it sniffing everything 1519 01:02:51,060 --> 01:02:52,950 you're typing. 1520 01:02:52,950 --> 01:02:56,520 So instead they-- well, in tcpdump's case, 1521 01:02:56,520 --> 01:03:00,900 they manually changed these file descriptors 1522 01:03:00,900 --> 01:03:03,010 to add some capability bits to them, 1523 01:03:03,010 --> 01:03:05,360 to restrict what kinds of operations you can do. 1524 01:03:05,360 --> 01:03:07,990 So remember, the capability, at least in Capsicum, 1525 01:03:07,990 --> 01:03:11,030 has these extra bits that say, here's the class of operations 1526 01:03:11,030 --> 01:03:13,310 you can perform on a file descriptor.
1527 01:03:13,310 --> 01:03:17,650 So they basically take what used to be file descriptor 0. 1528 01:03:17,650 --> 01:03:20,700 It pointed to the user's terminal, tty. 1529 01:03:20,700 --> 01:03:23,670 And originally, this was just a direct pointer 1530 01:03:23,670 --> 01:03:25,880 to the tty structure in the kernel. 1531 01:03:25,880 --> 01:03:27,570 What they do is they actually-- in order 1532 01:03:27,570 --> 01:03:30,070 to limit the kind of operations you can perform on this file 1533 01:03:30,070 --> 01:03:31,930 descriptor, they basically introduced some extra data 1534 01:03:31,930 --> 01:03:32,930 structure in the middle. 1535 01:03:32,930 --> 01:03:34,810 This guy will point to the terminal. 1536 01:03:34,810 --> 01:03:36,730 And the file descriptor itself will 1537 01:03:36,730 --> 01:03:39,950 point to some sort of a capability structure. 1538 01:03:39,950 --> 01:03:43,040 And inside of it is the pointer to the real file 1539 01:03:43,040 --> 01:03:46,685 that you're trying to access, as well as some restricted bits 1540 01:03:46,685 --> 01:03:51,590 or permissions on that file descriptor 1541 01:03:51,590 --> 01:03:53,280 object that you can do. 1542 01:03:53,280 --> 01:03:55,740 In their case, they basically can say for tcpdump's standard 1543 01:03:55,740 --> 01:03:57,585 input, you cannot do anything on it. 1544 01:03:57,585 --> 01:03:59,602 You can just see that it exists, and that's it. 1545 01:03:59,602 --> 01:04:01,564 For the output file descriptor, they say, 1546 01:04:01,564 --> 01:04:03,980 well, you can write to it, but you maybe can't reposition. 1547 01:04:03,980 --> 01:04:07,710 You can't [INAUDIBLE] back and forth, et cetera. 1548 01:04:07,710 --> 01:04:10,280 Make sense? 1549 01:04:10,280 --> 01:04:11,900 So what else would you worry about, 1550 01:04:11,900 --> 01:04:12,570 in terms of starting a sandbox? 1551 01:04:12,570 --> 01:04:14,810 So there is, I guess, the file descriptor state.
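The extra structure in the middle, a pointer to the real file plus a set of allowed operations, can be modeled in a few lines. This is a toy user-space model and not the kernel's data structure or the Capsicum rights API. The Rights names and the Capability class are made up for illustration, and in-memory buffers stand in for the terminal.

```python
import io
from enum import Flag, auto


class Rights(Flag):
    """Made-up rights bits for this sketch."""
    NONE = 0
    READ = auto()
    WRITE = auto()
    SEEK = auto()


class Capability:
    """Wraps an underlying file object and checks a rights mask on every call."""

    def __init__(self, target, rights: Rights):
        self._target = target    # the "real file" behind the wrapper
        self._rights = rights    # the operations this capability allows

    def _require(self, right: Rights) -> None:
        if right not in self._rights:
            raise PermissionError(f"capability lacks {right.name}")

    def read(self, n=-1):
        self._require(Rights.READ)
        return self._target.read(n)

    def write(self, data):
        self._require(Rights.WRITE)
        return self._target.write(data)

    def seek(self, pos):
        self._require(Rights.SEEK)
        return self._target.seek(pos)


if __name__ == "__main__":
    # Standard input with no rights at all; output that allows writes only.
    stdin_cap = Capability(io.BytesIO(b"keystrokes"), Rights.NONE)
    stdout_cap = Capability(io.BytesIO(), Rights.WRITE)
    stdout_cap.write(b"packet summary\n")
    for attempt in (lambda: stdin_cap.read(), lambda: stdout_cap.seek(0)):
        try:
            attempt()
        except PermissionError as err:
            print("denied:", err)
```

Both the read on the input wrapper and the seek on the output wrapper are refused, while the write goes through, which mirrors the restrictions described for tcpdump's descriptors.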
1552 01:04:14,810 --> 01:04:16,234 Anything else that matters? 1553 01:04:21,448 --> 01:04:24,320 Well, I guess in Unix it's file descriptors and memory. 1554 01:04:24,320 --> 01:04:25,670 That's pretty much it. 1555 01:04:25,670 --> 01:04:29,400 So the other thing that these guys worry about 1556 01:04:29,400 --> 01:04:32,250 is that it might be that in your address space, 1557 01:04:32,250 --> 01:04:34,600 you previously allocated some sensitive data. 1558 01:04:34,600 --> 01:04:36,920 And the process that you sandbox 1559 01:04:36,920 --> 01:04:38,830 is going to be able to read all its memory. 1560 01:04:38,830 --> 01:04:40,205 So if there's maybe some password 1561 01:04:40,205 --> 01:04:42,420 that you checked before when the user was logging in, 1562 01:04:42,420 --> 01:04:44,150 and you haven't cleared that yet, 1563 01:04:44,150 --> 01:04:45,749 well, the sandbox process will be 1564 01:04:45,749 --> 01:04:47,165 able to read that and do something 1565 01:04:47,165 --> 01:04:49,050 maybe interesting to that. 1566 01:04:49,050 --> 01:04:50,920 So the way they solved this problem 1567 01:04:50,920 --> 01:04:55,100 is, in lch_start, you basically have to start a program fresh. 1568 01:04:55,100 --> 01:04:57,270 You basically take a program. 1569 01:04:57,270 --> 01:04:59,197 You explicitly package up all the arguments 1570 01:04:59,197 --> 01:05:00,030 you want to give it. 1571 01:05:00,030 --> 01:05:01,590 You explicitly package up all the file descriptors 1572 01:05:01,590 --> 01:05:02,860 you want to give it. 1573 01:05:02,860 --> 01:05:04,235 And then you start a new process, 1574 01:05:04,235 --> 01:05:06,410 or you would call exec to reinitialize 1575 01:05:06,410 --> 01:05:09,200 your whole virtual memory space. 1576 01:05:09,200 --> 01:05:11,080 And then there's no question about what 1577 01:05:11,080 --> 01:05:14,370 is the set of sensitive data or extra privileges 1578 01:05:14,370 --> 01:05:15,510 that this process has.
1579 01:05:15,510 --> 01:05:18,160 It's exactly what you passed to lch_start, 1580 01:05:18,160 --> 01:05:22,040 in terms of a program name, arguments, and capabilities. 1581 01:05:22,040 --> 01:05:24,540 Does that make sense? 1582 01:05:24,540 --> 01:05:27,160 AUDIENCE: What would happen if the process that you're 1583 01:05:27,160 --> 01:05:29,494 starting is a setuid 0 binary? 1584 01:05:29,494 --> 01:05:30,160 PROFESSOR: Yeah. 1585 01:05:30,160 --> 01:05:35,380 I think these guys say that they don't actually 1586 01:05:35,380 --> 01:05:38,020 allow setuid binaries in capability mode, 1587 01:05:38,020 --> 01:05:39,860 just to avoid some weird interactions that 1588 01:05:39,860 --> 01:05:40,905 would show up. 1589 01:05:40,905 --> 01:05:42,940 I think the rule that they implement 1590 01:05:42,940 --> 01:05:45,263 is that you could have a setuid program that 1591 01:05:45,263 --> 01:05:47,770 gets its privileges from a setuid binary, 1592 01:05:47,770 --> 01:05:50,950 and then it can call cap_enter or lch_start. 1593 01:05:50,950 --> 01:05:52,890 But once you're in capability mode, 1594 01:05:52,890 --> 01:05:54,640 you cannot regain extra privileges. 1595 01:05:54,640 --> 01:05:58,110 In principle, this could work, but it would be very weird. 1596 01:05:58,110 --> 01:06:00,680 Because remember, the only place where the UID matters, 1597 01:06:00,680 --> 01:06:02,275 once you're in capability mode, is 1598 01:06:02,275 --> 01:06:04,150 in opening these files inside of a directory. 1599 01:06:04,150 --> 01:06:07,080 So it's not clear this is really a great plan 1600 01:06:07,080 --> 01:06:10,850 for getting more privileges or [INAUDIBLE] there. 1601 01:06:10,850 --> 01:06:11,350 Make sense? 1602 01:06:11,350 --> 01:06:12,790 Yeah? 1603 01:06:12,790 --> 01:06:14,270 AUDIENCE: We talked about earlier 1604 01:06:14,270 --> 01:06:17,575 why the library doesn't really support strict separation 1605 01:06:17,575 --> 01:06:19,165 between those two.
1606 01:06:19,165 --> 01:06:21,390 And then we just mentioned all these problems 1607 01:06:21,390 --> 01:06:23,800 that you could use [INAUDIBLE], so we're still 1608 01:06:23,800 --> 01:06:26,680 not under a restriction to use lch_start necessarily, right? 1609 01:06:26,680 --> 01:06:27,680 PROFESSOR: That's right. 1610 01:06:27,680 --> 01:06:28,179 Yeah. 1611 01:06:28,179 --> 01:06:30,510 So lch_start, here's sort of the way to think of it. 1612 01:06:30,510 --> 01:06:32,960 So you have an application, like maybe tcpdump. 1613 01:06:32,960 --> 01:06:36,309 Or gzip is the other thing they work with. 1614 01:06:36,309 --> 01:06:37,725 And what you're basically assuming 1615 01:06:37,725 --> 01:06:40,390 is the application is probably not compromised, 1616 01:06:40,390 --> 01:06:42,960 and there is some core part of the application that you 1617 01:06:42,960 --> 01:06:44,730 worry about sandboxing. 1618 01:06:44,730 --> 01:06:47,570 In tcpdump's case, it's actually parsing packets 1619 01:06:47,570 --> 01:06:48,730 coming from the network. 1620 01:06:48,730 --> 01:06:50,660 In gzip's case, it's actually taking the file 1621 01:06:50,660 --> 01:06:51,915 and decompressing it. 1622 01:06:51,915 --> 01:06:54,250 And you're basically assuming, well, up until a point, 1623 01:06:54,250 --> 01:06:56,250 the process is probably doing all the right things. 1624 01:06:56,250 --> 01:06:57,041 It's not exploited. 1625 01:06:57,041 --> 01:06:59,420 There's probably not a bug yet for the [INAUDIBLE] even. 1626 01:06:59,420 --> 01:07:00,795 So at that point, you're trusting 1627 01:07:00,795 --> 01:07:04,210 that it will run lch_start correctly and correctly set up 1628 01:07:04,210 --> 01:07:06,580 the image, correctly set up all the capabilities, 1629 01:07:06,580 --> 01:07:09,870 and then restrict itself from making any further system calls 1630 01:07:09,870 --> 01:07:11,840 outside its capability mode. 1631 01:07:11,840 --> 01:07:13,490 And then you run the dangerous stuff.
1632 01:07:13,490 --> 01:07:16,590 And by then, this setup has happened correctly, 1633 01:07:16,590 --> 01:07:20,252 and there's no way to escape out of that sandbox. 1634 01:07:20,252 --> 01:07:22,570 Make sense? 1635 01:07:22,570 --> 01:07:23,690 All right. 1636 01:07:23,690 --> 01:07:28,230 So I guess let's look at how you actually use capability mode 1637 01:07:28,230 --> 01:07:30,584 to sandbox applications. 1638 01:07:30,584 --> 01:07:32,250 So we talked a little bit about tcpdump. 1639 01:07:32,250 --> 01:07:36,005 How do you isolate this process? 1640 01:07:36,005 --> 01:07:38,410 Another interesting example they had 1641 01:07:38,410 --> 01:07:44,660 was this gzip program that compresses, decompresses files. 1642 01:07:44,660 --> 01:07:47,010 So why do they worry about sandboxing it? 1643 01:07:47,010 --> 01:07:50,420 I guess they worry that the decompression code is going 1644 01:07:50,420 --> 01:07:52,740 to be potentially buggy, or maybe there's 1645 01:07:52,740 --> 01:07:54,880 some memory management errors in how 1646 01:07:54,880 --> 01:07:58,100 they manage the buffers during decompression, et cetera. 1647 01:07:58,100 --> 01:08:05,450 So could they-- well, one interesting question, I guess, 1648 01:08:05,450 --> 01:08:10,390 is why are the changes to gzip seemingly much more 1649 01:08:10,390 --> 01:08:16,109 complicated than for tcpdump? 1650 01:08:23,670 --> 01:08:24,170 Any guesses? 1651 01:08:26,655 --> 01:08:28,029 Well as far as you can tell, it's 1652 01:08:28,029 --> 01:08:31,640 mostly just a question of how the application is structured 1653 01:08:31,640 --> 01:08:32,439 internally, right? 1654 01:08:32,439 --> 01:08:39,170 So if you had an application that simply compressed 1655 01:08:39,170 --> 01:08:42,029 a single file, or decompressed a single file, 1656 01:08:42,029 --> 01:08:48,125 then it might be OK for us to just run it in capability mode 1657 01:08:48,125 --> 01:08:49,249 without really changing it.
1658 01:08:49,249 --> 01:08:52,540 You just give it a new standard in for something to decompress, 1659 01:08:52,540 --> 01:08:55,830 and the standard out goes to the decompressed output, 1660 01:08:55,830 --> 01:08:57,300 and that would work fine. 1661 01:08:57,300 --> 01:08:59,830 The problem, as is almost always the case 1662 01:08:59,830 --> 01:09:01,899 here with these kind of sandboxing techniques, 1663 01:09:01,899 --> 01:09:04,830 is that the application actually has much more complicated logic 1664 01:09:04,830 --> 01:09:05,330 around it. 1665 01:09:05,330 --> 01:09:07,359 So gzip, for example, can compress 1666 01:09:07,359 --> 01:09:09,490 multiple files, et cetera. 1667 01:09:09,490 --> 01:09:13,580 And in that case, you have some sort of a driver process on top 1668 01:09:13,580 --> 01:09:15,450 which actually has these extra privileges 1669 01:09:15,450 --> 01:09:18,899 to open multiple files, to create things, et cetera. 1670 01:09:18,899 --> 01:09:22,300 And the core logic often needs to be in another helper process. 1671 01:09:22,300 --> 01:09:24,600 And it just so happened in gzip 1672 01:09:24,600 --> 01:09:27,359 that the application wasn't structured 1673 01:09:27,359 --> 01:09:29,890 in a way where this was already a separate process doing 1674 01:09:29,890 --> 01:09:31,689 all the decompression or compression. 1675 01:09:31,689 --> 01:09:36,020 So they had to change gzip's core implementation, 1676 01:09:36,020 --> 01:09:42,050 and, well, some structure of the gzip application, instead 1677 01:09:42,050 --> 01:09:44,560 of just passing the data to the decompression 1678 01:09:44,560 --> 01:09:47,060 function to actually send it over an RPC call 1679 01:09:47,060 --> 01:09:49,859 or really just write it to some file descriptor 1680 01:09:49,859 --> 01:09:52,660 to a helper process that runs on the side 1681 01:09:52,660 --> 01:09:54,200 and performs all the decompression 1682 01:09:54,200 --> 01:09:55,940 with almost no privileges.
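The restructuring described for gzip, a driver that keeps the file-handling privileges and a helper that only turns bytes into bytes, might look roughly like the sketch below. It is an assumption-laden illustration and not the actual gzip patch. It uses zlib in place of gzip's own code, pipes in place of whatever RPC mechanism the authors used, and a hypothetical function name decompress_in_helper. It also does no sandboxing at all, and a comment marks where a real FreeBSD helper would enter capability mode.

```python
import subprocess
import sys
import zlib

# Code run by the helper process. It sees only its standard input and output.
HELPER_CODE = r"""
import sys, zlib
# On FreeBSD, a real helper would enter capability mode at this point,
# before touching untrusted input. This sketch does not sandbox anything.
data = sys.stdin.buffer.read()
sys.stdout.buffer.write(zlib.decompress(data))
"""


def decompress_in_helper(compressed: bytes) -> bytes:
    """Driver side: ship the bytes to a fresh helper and read the result back."""
    proc = subprocess.run(
        [sys.executable, "-c", HELPER_CODE],
        input=compressed,          # written to the helper's standard input
        stdout=subprocess.PIPE,    # decompressed bytes come back on a pipe
        close_fds=True,            # the helper inherits no other descriptors
        check=True,
    )
    return proc.stdout


if __name__ == "__main__":
    original = b"hello " * 1000
    assert decompress_in_helper(zlib.compress(original)) == original
    print("round trip ok")
```

The driver is the only part that would open files by name. The helper's whole interface is two pipes, so a bug in the decompression code has very little to reach for once the helper is actually confined.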
1683 01:09:55,940 --> 01:09:57,760 The only thing it can do is return 1684 01:09:57,760 --> 01:10:00,090 the decompressed data, or the compressed data, 1685 01:10:00,090 --> 01:10:02,670 back to the caller process. 1686 01:10:02,670 --> 01:10:03,670 That roughly make sense? 1687 01:10:03,670 --> 01:10:06,230 What's going on in gzip? 1688 01:10:06,230 --> 01:10:07,820 All right. 1689 01:10:07,820 --> 01:10:12,180 So I guess one thing we asked for the homework is how do you 1690 01:10:12,180 --> 01:10:13,667 actually use Capsicum in OKWS? 1691 01:10:13,667 --> 01:10:14,750 So what do you guys think? 1692 01:10:14,750 --> 01:10:17,025 Would it be useful? 1693 01:10:17,025 --> 01:10:19,385 Would the OKWS guys have been excited 1694 01:10:19,385 --> 01:10:23,980 and switched to FreeBSD because this was much easier to use? 1695 01:10:23,980 --> 01:10:25,590 Or is this a wash? 1696 01:10:25,590 --> 01:10:26,777 So what do you think? 1697 01:10:26,777 --> 01:10:28,360 How would you use Capsicum in FreeBSD? 1698 01:10:28,360 --> 01:10:30,954 Would this be much different? 1699 01:10:30,954 --> 01:10:31,890 Yeah. 1700 01:10:31,890 --> 01:10:33,765 AUDIENCE: So it means you can get rid of some 1701 01:10:33,765 --> 01:10:36,944 of the jailing [INAUDIBLE]. 1702 01:10:36,944 --> 01:10:37,610 PROFESSOR: Yeah. 1703 01:10:37,610 --> 01:10:38,109 That's true. 1704 01:10:38,109 --> 01:10:40,600 So chroot seems to be completely superseded by this plan 1705 01:10:40,600 --> 01:10:42,980 of having directory file descriptors and capabilities. 1706 01:10:42,980 --> 01:10:43,646 So that's great. 1707 01:10:43,646 --> 01:10:45,980 So you don't need the chroot setup. 1708 01:10:45,980 --> 01:10:46,770 That seems messy. 1709 01:10:46,770 --> 01:10:48,270 And this is much more precise, also.
1710 01:10:48,270 --> 01:10:49,996 Because you can-- instead of having 1711 01:10:49,996 --> 01:10:51,870 a chroot with lots of little things in there, 1712 01:10:51,870 --> 01:10:54,397 you have to maybe set the permissions on there carefully. 1713 01:10:54,397 --> 01:10:56,480 You can just open exactly the files that you need. 1714 01:10:56,480 --> 01:10:58,800 So that seems like a plus. 1715 01:10:58,800 --> 01:11:00,788 Any other benefits? 1716 01:11:00,788 --> 01:11:01,288 Yeah. 1717 01:11:01,288 --> 01:11:02,236 AUDIENCE: [INAUDIBLE]. 1718 01:11:06,502 --> 01:11:08,120 PROFESSOR: In OKWS, you mean? 1719 01:11:08,120 --> 01:11:09,036 AUDIENCE: [INAUDIBLE]. 1720 01:11:09,036 --> 01:11:09,438 PROFESSOR: Yeah. 1721 01:11:09,438 --> 01:11:11,880 So in OKWS, right, you have this OK launcher daemon that 1722 01:11:11,880 --> 01:11:14,150 had to launch all these guys. 1723 01:11:14,150 --> 01:11:15,870 And it was the parent process. 1724 01:11:15,870 --> 01:11:18,030 Only when they die, the signal goes back 1725 01:11:18,030 --> 01:11:22,197 to this okld to restart the crashed process. 1726 01:11:22,197 --> 01:11:24,155 And that thing had to run as root, because it 1727 01:11:24,155 --> 01:11:25,700 had to sandbox things. 1728 01:11:25,700 --> 01:11:28,140 There's actually a number of things you could do better 1729 01:11:28,140 --> 01:11:31,240 with Capsicum in OKWS. 1730 01:11:31,240 --> 01:11:33,200 So one example is you could probably 1731 01:11:33,200 --> 01:11:35,410 have okld have many fewer privileges. 1732 01:11:35,410 --> 01:11:39,410 Because it might need to be root initially to get port 80. 1733 01:11:39,410 --> 01:11:42,516 But after that, it could set up sandboxes for everyone else 1734 01:11:42,516 --> 01:11:43,640 without being root anymore. 1735 01:11:43,640 --> 01:11:44,670 So that's kind of cool.
1736 01:11:44,670 --> 01:11:46,620 And maybe you can even delegate the job 1737 01:11:46,620 --> 01:11:48,870 of respawning a process to someone else, 1738 01:11:48,870 --> 01:11:50,930 maybe a per-service monitor daemon 1739 01:11:50,930 --> 01:11:54,430 that just has this process descriptor handle, 1740 01:11:54,430 --> 01:11:56,950 or process descriptor for child process, 1741 01:11:56,950 --> 01:11:58,870 and whenever it crashes, starts a new one. 1742 01:11:58,870 --> 01:12:02,745 So I think this process [INAUDIBLE] helps things a lot. 1743 01:12:02,745 --> 01:12:06,160 And the fact that you can create a sandbox without being root 1744 01:12:06,160 --> 01:12:09,542 is also quite helpful, as well. 1745 01:12:09,542 --> 01:12:11,000 Any other stuff you could do? 1746 01:12:11,000 --> 01:12:11,440 Yeah? 1747 01:12:11,440 --> 01:12:12,320 AUDIENCE: You could give each one 1748 01:12:12,320 --> 01:12:14,387 a file descriptor with append-only mode to the log. 1749 01:12:14,387 --> 01:12:15,053 PROFESSOR: Yeah. 1750 01:12:15,053 --> 01:12:16,750 So that's pretty cool. 1751 01:12:16,750 --> 01:12:19,560 So as we were talking last time, in OKWS, 1752 01:12:19,560 --> 01:12:23,675 well, the oklogd maybe could tamper with the log file. 1753 01:12:23,675 --> 01:12:25,373 And who knows what the kernel will 1754 01:12:25,373 --> 01:12:27,710 allow it to do once it has a file descriptor on the log 1755 01:12:27,710 --> 01:12:28,670 file itself. 1756 01:12:28,670 --> 01:12:30,090 But here, the fact that we can do 1757 01:12:30,090 --> 01:12:33,010 much more of a precise capability mask 1758 01:12:33,010 --> 01:12:35,562 on a file descriptor, well, we could give it a log file 1759 01:12:35,562 --> 01:12:37,895 and say, well, you could just write to it, but not seek. 1760 01:12:37,895 --> 01:12:40,150 So that basically means append-only, 1761 01:12:40,150 --> 01:12:41,935 if you're the only writer to that file. 1762 01:12:41,935 --> 01:12:43,060 So that seems kind of nice.
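To make the "write but not seek" mask concrete, here is a toy model in Python. It is not the Capsicum API. The class name `LimitedFd`, the right names, and the log file name are all assumptions for illustration, and a real kernel would enforce the mask on the descriptor itself rather than in a wrapper that any holder of the raw fd could bypass.

```python
import os
import tempfile


class LimitedFd:
    """Toy stand-in for a file descriptor carrying a rights mask."""

    def __init__(self, fd: int, rights):
        self._fd = fd
        self._rights = frozenset(rights)

    def _require(self, right: str) -> None:
        if right not in self._rights:
            raise PermissionError(f"descriptor lacks the {right!r} right")

    def limit(self, rights) -> "LimitedFd":
        # Rights may only shrink; asking for more than you hold is refused.
        new = frozenset(rights)
        if not new <= self._rights:
            raise PermissionError("cannot add rights to a descriptor")
        return LimitedFd(self._fd, new)

    def write(self, data: bytes) -> int:
        self._require("write")
        return os.write(self._fd, data)

    def read(self, n: int) -> bytes:
        self._require("read")
        return os.read(self._fd, n)

    def seek(self, offset: int) -> int:
        self._require("seek")
        return os.lseek(self._fd, offset, os.SEEK_SET)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "access.log")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            full = LimitedFd(fd, {"read", "write", "seek"})
            log = full.limit({"write"})  # what the logger would be handed
            log.write(b"first\n")
            log.write(b"second\n")
            denied = []
            for name, op in (("seek", lambda: log.seek(0)),
                             ("read", lambda: log.read(10))):
                try:
                    op()
                except PermissionError:
                    denied.append(name)
            with open(path, "rb") as f:
                return f.read(), denied
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```

Without the seek right, the only place the holder can write is wherever the file offset already is, which is why the mask behaves like append-only when there is a single writer.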
1763 01:12:43,060 --> 01:12:45,270 And you could prevent it from reading a file. 1764 01:12:45,270 --> 01:12:47,140 You could say, well, you can only write, but not read, 1765 01:12:47,140 --> 01:12:48,270 which is something that's probably 1766 01:12:48,270 --> 01:12:50,519 difficult to do with Unix permissions alone right now. 1767 01:12:53,253 --> 01:12:54,630 Make sense? 1768 01:12:54,630 --> 01:12:57,120 Any other ideas for how Capsicum might help? 1769 01:12:59,680 --> 01:13:01,815 Would you wish there was more stuff in Capsicum? 1770 01:13:01,815 --> 01:13:03,670 I guess we always wish there was more stuff. 1771 01:13:03,670 --> 01:13:05,128 AUDIENCE: So one thing that perhaps 1772 01:13:05,128 --> 01:13:07,326 may be tricky is the service daemons need 1773 01:13:07,326 --> 01:13:11,617 to connect to their backend databases somehow. 1774 01:13:11,617 --> 01:13:13,470 Which might be remote. 1775 01:13:13,470 --> 01:13:15,235 But you don't want the launch daemon 1776 01:13:15,235 --> 01:13:17,235 to know about which services each service 1777 01:13:17,235 --> 01:13:18,722 is going to connect to. 1778 01:13:18,722 --> 01:13:19,680 PROFESSOR: Maybe, yeah. 1779 01:13:19,680 --> 01:13:20,930 That's a good question, right? 1780 01:13:20,930 --> 01:13:23,990 So in Capsicum, as we were talking about, 1781 01:13:23,990 --> 01:13:25,780 the network is a global namespace. 1782 01:13:25,780 --> 01:13:27,570 You have to have existing file descriptors 1783 01:13:27,570 --> 01:13:29,910 for all the outstanding connections ahead of time. 1784 01:13:29,910 --> 01:13:30,576 AUDIENCE: Right. 1785 01:13:30,576 --> 01:13:33,675 But you don't necessarily want okld to open up all the sockets 1786 01:13:33,675 --> 01:13:34,700 for all the services. 1787 01:13:34,700 --> 01:13:37,940 Because it might not know where the services are connected. 1788 01:13:37,940 --> 01:13:38,140 PROFESSOR: That's right. 1789 01:13:38,140 --> 01:13:38,510 Yeah.
1790 01:13:38,510 --> 01:13:39,960 So that's a little bit of an awkward thing. 1791 01:13:39,960 --> 01:13:40,850 I absolutely agree. 1792 01:13:40,850 --> 01:13:42,700 And this is part of the reason why 1793 01:13:42,700 --> 01:13:44,950 I think capabilities haven't completely 1794 01:13:44,950 --> 01:13:46,830 subsumed everything in the security world, 1795 01:13:46,830 --> 01:13:48,350 is because they are kind of awkward to use. 1796 01:13:48,350 --> 01:13:50,430 Because the guy that gives you all the privileges 1797 01:13:50,430 --> 01:13:52,638 has to know exactly what things you're going to need, 1798 01:13:52,638 --> 01:13:55,100 like these connections to backend servers. 1799 01:13:55,100 --> 01:13:58,150 So at some level, maybe this is not such a huge problem 1800 01:13:58,150 --> 01:13:58,650 in OKWS. 1801 01:13:58,650 --> 01:14:01,330 Because the launcher daemon has to read a config 1802 01:14:01,330 --> 01:14:03,610 file and is going to pass the token to the service 1803 01:14:03,610 --> 01:14:04,401 in the first place. 1804 01:14:04,401 --> 01:14:07,070 So maybe the token is going to contain the host and port 1805 01:14:07,070 --> 01:14:08,580 number to which you're connecting. 1806 01:14:08,580 --> 01:14:09,080 But I agree. 1807 01:14:09,080 --> 01:14:10,360 It's not great. 1808 01:14:10,360 --> 01:14:12,590 Because especially, suppose the database server 1809 01:14:12,590 --> 01:14:13,780 disconnects you. 1810 01:14:13,780 --> 01:14:15,150 Well, you're kind of stuck now. 1811 01:14:15,150 --> 01:14:17,135 The file descriptor is not connected anymore, 1812 01:14:17,135 --> 01:14:18,060 and you can't connect to a new one. 1813 01:14:18,060 --> 01:14:20,476 So basically, if the database server crashes, or restarts, 1814 01:14:20,476 --> 01:14:22,130 or the network breaks, you basically 1815 01:14:22,130 --> 01:14:24,500 have to terminate, get yourself respawned, 1816 01:14:24,500 --> 01:14:27,230 so you can get a new one of these connections passed to you.
1817 01:14:27,230 --> 01:14:29,104 So it's maybe not a great plan in that sense. 1818 01:14:29,104 --> 01:14:32,518 AUDIENCE: Could we wrap the system call, the function 1819 01:14:32,518 --> 01:14:35,144 [INAUDIBLE] to open a socket so that it faults 1820 01:14:35,144 --> 01:14:37,602 the middleman instead of the socket that the users send out 1821 01:14:37,602 --> 01:14:39,254 to [INAUDIBLE]? 1822 01:14:39,254 --> 01:14:39,920 PROFESSOR: Yeah. 1823 01:14:39,920 --> 01:14:43,130 This is what I think the FreeBSD guys have done since. 1824 01:14:43,130 --> 01:14:46,312 Well, there's a bunch of situations 1825 01:14:46,312 --> 01:14:48,770 like this, where you want to open some file after the fact, 1826 01:14:48,770 --> 01:14:50,728 or you want to connect to something after going 1827 01:14:50,728 --> 01:14:51,880 into capability mode. 1828 01:14:51,880 --> 01:14:54,060 So the FreeBSD developers have added 1829 01:14:54,060 --> 01:14:58,250 this daemon called Casper, that every capability-based process 1830 01:14:58,250 --> 01:14:59,470 has a handle on. 1831 01:14:59,470 --> 01:15:03,010 And this Casper daemon runs outside of capability mode, 1832 01:15:03,010 --> 01:15:04,470 and basically listens to requests 1833 01:15:04,470 --> 01:15:06,380 from sandboxed processes. 1834 01:15:06,380 --> 01:15:09,790 And if you want to open some file, 1835 01:15:09,790 --> 01:15:12,400 or if you want to send a network connection, or a packet, 1836 01:15:12,400 --> 01:15:14,980 or something, but you didn't have the right capability 1837 01:15:14,980 --> 01:15:18,250 beforehand, then this Casper daemon will do it for you. 1838 01:15:18,250 --> 01:15:21,022 But it carefully maintains a list of things 1839 01:15:21,022 --> 01:15:22,980 that every sandboxed process should or should not 1840 01:15:22,980 --> 01:15:24,010 be able to do. 1841 01:15:24,010 --> 01:15:25,870 So it's like a system service.
1842 01:15:25,870 --> 01:15:28,400 So when you start a capability process, 1843 01:15:28,400 --> 01:15:30,900 or enter capability mode, by default, 1844 01:15:30,900 --> 01:15:33,620 this Casper thing will not allow you to do anything extra funny. 1845 01:15:33,620 --> 01:15:35,250 But you could say, well, hey, I'm 1846 01:15:35,250 --> 01:15:37,050 going to start the sandboxed process. 1847 01:15:37,050 --> 01:15:40,750 And you can ask Casper, well, please allow my process 1848 01:15:40,750 --> 01:15:42,977 to do the following things later. 1849 01:15:42,977 --> 01:15:43,810 So you could, right? 1850 01:15:43,810 --> 01:15:46,240 And the cool thing is that you can pass file descriptors 1851 01:15:46,240 --> 01:15:48,700 or capabilities through fd passing in Unix. 1852 01:15:48,700 --> 01:15:51,520 So once you have a handle on this Casper guy, 1853 01:15:51,520 --> 01:15:55,120 you can get more capabilities later on. 1854 01:15:55,120 --> 01:15:58,680 So it's, again, a trade-off between being a pure capability 1855 01:15:58,680 --> 01:16:04,330 world versus actually being programmable or easy to use. 1856 01:16:04,330 --> 01:16:06,110 So it seems to be working out. 1857 01:16:06,110 --> 01:16:10,230 I think the particular thing they use it for in FreeBSD, 1858 01:16:10,230 --> 01:16:13,350 or the thing that shows up often, is making DNS queries. 1859 01:16:13,350 --> 01:16:15,600 So you want to be able to make DNS queries once you're 1860 01:16:15,600 --> 01:16:16,150 in a sandbox. 1861 01:16:16,150 --> 01:16:18,608 And actually, this is a problem they ran into with tcpdump. 1862 01:16:18,608 --> 01:16:20,850 Because when tcpdump is printing your packets, 1863 01:16:20,850 --> 01:16:22,580 it wants to print the host name for an IP address. 1864 01:16:22,580 --> 01:16:24,680 In order to do this, it has to talk to a DNS server.
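The fd passing that makes a helper like this possible can be sketched with Python's standard library. This is a rough sketch, not the Casper protocol or API: the function names, the one-line request format, and the file names are invented for the example. It assumes Python 3.9 or later on a Unix system for `socket.send_fds` and `socket.recv_fds`, and it runs both sides in one process only to stay short, where a real helper would be a separate unsandboxed process.

```python
import os
import socket
import tempfile


def helper_serve_one(sock: socket.socket, allowed: set) -> None:
    """Unsandboxed side: answer one open request against an allow-list."""
    path = sock.recv(4096).decode()
    if path in allowed:
        fd = os.open(path, os.O_RDONLY)
        try:
            # The descriptor travels as ancillary data next to the reply.
            socket.send_fds(sock, [b"ok"], [fd])
        finally:
            os.close(fd)  # the receiver ends up with its own copy
    else:
        sock.sendall(b"no")


def request_open(sandbox_sock, helper_sock, path: str, allowed: set):
    """Sandboxed side: ask for a path, get back a reply and maybe an fd."""
    sandbox_sock.sendall(path.encode())
    helper_serve_one(helper_sock, allowed)  # stands in for the other process
    msg, fds, _flags, _addr = socket.recv_fds(sandbox_sock, 16, 1)
    return msg, fds


def demo():
    with tempfile.TemporaryDirectory() as d:
        ok_path = os.path.join(d, "resolv.conf")
        secret = os.path.join(d, "secret")
        with open(ok_path, "wb") as f:
            f.write(b"nameserver 192.0.2.1\n")
        with open(secret, "wb") as f:
            f.write(b"x")
        sandbox_sock, helper_sock = socket.socketpair()
        try:
            _msg, fds = request_open(sandbox_sock, helper_sock, ok_path, {ok_path})
            granted = os.read(fds[0], 100)
            os.close(fds[0])
            refusal, no_fds = request_open(sandbox_sock, helper_sock, secret, {ok_path})
            return granted, refusal, len(no_fds)
        finally:
            sandbox_sock.close()
            helper_sock.close()


if __name__ == "__main__":
    print(demo())
```

The allow-list lives entirely on the helper side, so the sandboxed side can only ever receive descriptors the helper was told in advance to hand out.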
1865 01:16:24,680 --> 01:16:26,263 But you probably don't want to connect 1866 01:16:26,263 --> 01:16:28,940 to a DNS server ahead of time, or to every DNS server 1867 01:16:28,940 --> 01:16:30,320 you might ever need. 1868 01:16:30,320 --> 01:16:32,230 So instead, they use this helper daemon 1869 01:16:32,230 --> 01:16:35,440 that's going to make DNS queries for you. 1870 01:16:35,440 --> 01:16:37,388 Make sense? 1871 01:16:37,388 --> 01:16:38,750 All right. 1872 01:16:38,750 --> 01:16:42,905 So I guess the last thing I wanted to talk about 1873 01:16:42,905 --> 01:16:46,310 is what are the security guarantees that Capsicum 1874 01:16:46,310 --> 01:16:46,810 provides? 1875 01:16:46,810 --> 01:16:49,120 So should you trust it? 1876 01:16:49,120 --> 01:16:50,700 How could Capsicum go wrong? 1877 01:16:53,399 --> 01:16:55,440 Presumably you can always have security problems, 1878 01:16:55,440 --> 01:16:57,870 regardless of what mechanism you're using underneath. 1879 01:16:57,870 --> 01:16:59,370 But what particular things should we 1880 01:16:59,370 --> 01:17:01,930 worry about in Capsicum when we're 1881 01:17:01,930 --> 01:17:03,310 building some system here? 1882 01:17:06,710 --> 01:17:08,680 Suppose you have to attack this thing. 1883 01:17:08,680 --> 01:17:11,970 You have to attack this tcpdump thing, or gzip, 1884 01:17:11,970 --> 01:17:14,060 or whatever it is that they implemented. 1885 01:17:14,060 --> 01:17:18,039 What would you look at, in terms of bugs or problems? 1886 01:17:18,039 --> 01:17:19,872 AUDIENCE: Well, it depends on the developers 1887 01:17:19,872 --> 01:17:21,524 knowing what they're doing. 1888 01:17:21,524 --> 01:17:24,220 So they might give a bad capability. 1889 01:17:24,220 --> 01:17:25,220 PROFESSOR: That's right. 1890 01:17:25,220 --> 01:17:25,350 Yeah. 
1891 01:17:25,350 --> 01:17:27,710 So actually one interesting property of Capsicum 1892 01:17:27,710 --> 01:17:30,640 is that it's not a guarantee that the user of the system 1893 01:17:30,640 --> 01:17:31,430 gets. 1894 01:17:31,430 --> 01:17:33,290 It's really a tool that the developer 1895 01:17:33,290 --> 01:17:38,260 has to build more trustworthy or better application software. 1896 01:17:38,260 --> 01:17:40,095 But I, as a user of the system, have no idea 1897 01:17:40,095 --> 01:17:41,553 whether this is a good or bad thing 1898 01:17:41,553 --> 01:17:43,178 that the application is using Capsicum. 1899 01:17:43,178 --> 01:17:46,440 You could totally misuse it, as you're absolutely right. 1900 01:17:46,440 --> 01:17:49,170 So maybe one example is, as they show in the paper, 1901 01:17:49,170 --> 01:17:51,490 you could give too many privileges to the sandboxed 1902 01:17:51,490 --> 01:17:51,990 process. 1903 01:17:51,990 --> 01:17:53,810 Like the tcpdump helper, or maybe 1904 01:17:53,810 --> 01:17:55,030 it has access to my console. 1905 01:17:55,030 --> 01:17:57,900 And that's not so great, but it's hard for me 1906 01:17:57,900 --> 01:18:01,130 as a user to really tell this in a general purpose fashion. 1907 01:18:01,130 --> 01:18:01,828 Yeah? 1908 01:18:01,828 --> 01:18:05,443 AUDIENCE: It might also be that when you set the permissions 1909 01:18:05,443 --> 01:18:09,100 to the masks on any given file descriptor 1910 01:18:09,100 --> 01:18:11,304 that you set too-permissive masks. 1911 01:18:11,304 --> 01:18:11,970 PROFESSOR: Yeah. 1912 01:18:11,970 --> 01:18:12,170 Right. 1913 01:18:12,170 --> 01:18:13,720 So it's not just the file descriptors. 1914 01:18:13,720 --> 01:18:15,610 Also, what can you do with those file descriptors? 1915 01:18:15,610 --> 01:18:16,220 You're right. 1916 01:18:16,220 --> 01:18:16,450 Yes.
1917 01:18:16,450 --> 01:18:18,140 These masks are another part of the story 1918 01:18:18,140 --> 01:18:21,460 that you have to watch out for. 1919 01:18:21,460 --> 01:18:21,980 OK. 1920 01:18:21,980 --> 01:18:23,594 So suppose we got the masks right. 1921 01:18:23,594 --> 01:18:25,010 We got the file descriptors right. 1922 01:18:25,010 --> 01:18:26,120 We haven't used lch_start. 1923 01:18:26,120 --> 01:18:28,740 There's nothing extra in memory. 1924 01:18:28,740 --> 01:18:30,532 AUDIENCE: [INAUDIBLE]. 1925 01:18:30,532 --> 01:18:31,490 PROFESSOR: That's true. 1926 01:18:31,490 --> 01:18:31,990 Yes. 1927 01:18:31,990 --> 01:18:34,030 So maybe there's like something before you even 1928 01:18:34,030 --> 01:18:35,950 enter capability mode that's damaging. 1929 01:18:35,950 --> 01:18:39,030 So it only helps once you jump in. 1930 01:18:39,030 --> 01:18:42,240 And one slightly annoying thing is 1931 01:18:42,240 --> 01:18:47,360 that it seems like you can't do a whole lot inside of capability 1932 01:18:47,360 --> 01:18:51,560 mode, not in the sense that you can't run large computations, 1933 01:18:51,560 --> 01:18:55,010 but you can't really put a large part of a complicated system 1934 01:18:55,010 --> 01:18:55,900 into capability mode. 1935 01:18:55,900 --> 01:18:57,358 Because inevitably, in Unix, you'll 1936 01:18:57,358 --> 01:18:59,820 need to do something with new processes, 1937 01:18:59,820 --> 01:19:01,870 opening network connections, et cetera. 1938 01:19:01,870 --> 01:19:03,487 And you'll probably need to use some 1939 01:19:03,487 --> 01:19:05,130 of these global namespaces that are not 1940 01:19:05,130 --> 01:19:06,790 available in capability mode. 1941 01:19:06,790 --> 01:19:08,330 So it's probably going to be quite 1942 01:19:08,330 --> 01:19:12,790 difficult to put large chunks of logic or intricate system 1943 01:19:12,790 --> 01:19:15,370 code inside of capability mode.
1944 01:19:15,370 --> 01:19:19,760 So only well-defined chunks of an application 1945 01:19:19,760 --> 01:19:22,500 are likely to be running in capability mode. 1946 01:19:22,500 --> 01:19:23,000 It depends. 1947 01:19:23,000 --> 01:19:25,520 I don't know if this is entirely true or not. 1948 01:19:25,520 --> 01:19:27,180 In Chrome, for example, large processes 1949 01:19:27,180 --> 01:19:30,460 do run in capability mode in their design. 1950 01:19:30,460 --> 01:19:32,960 It might be that you basically have 1951 01:19:32,960 --> 01:19:37,190 to have non-capability mode chunks of your application 1952 01:19:37,190 --> 01:19:40,390 because you want it to interoperate nicely with Unix, 1953 01:19:40,390 --> 01:19:44,330 or whatever it is you're running alongside of it. 1954 01:19:44,330 --> 01:19:44,910 OK. 1955 01:19:44,910 --> 01:19:48,460 Any other thing you should worry about? 1956 01:19:48,460 --> 01:19:49,170 Yeah? 1957 01:19:49,170 --> 01:19:51,450 AUDIENCE: Well, whether they implemented capabilities 1958 01:19:51,450 --> 01:19:52,090 correctly. 1959 01:19:52,090 --> 01:19:53,012 PROFESSOR: Yeah. 1960 01:19:53,012 --> 01:19:55,320 AUDIENCE: Whether they've covered all the system calls. 1961 01:19:55,320 --> 01:19:55,750 PROFESSOR: That's right. 1962 01:19:55,750 --> 01:19:56,010 Yes. 1963 01:19:56,010 --> 01:19:58,220 So that's actually a huge problem, in some sense, 1964 01:19:58,220 --> 01:19:58,980 already. 1965 01:19:58,980 --> 01:20:01,230 If you think about it, there's probably 1966 01:20:01,230 --> 01:20:03,960 hundreds of system calls that the kernel provides you.
1967 01:20:03,960 --> 01:20:06,529 And they're not especially precisely documented, 1968 01:20:06,529 --> 01:20:08,695 so you probably have to look at their implementation 1969 01:20:08,695 --> 01:20:11,242 and see, for every system call, if there's 1970 01:20:11,242 --> 01:20:13,650 some way for the application to get 1971 01:20:13,650 --> 01:20:16,010 the system call to perform some operation 1972 01:20:16,010 --> 01:20:18,600 on some extra object that it didn't have a file descriptor to. 1973 01:20:18,600 --> 01:20:20,490 And most Unix system calls weren't 1974 01:20:20,490 --> 01:20:22,870 written with the expectation that everything 1975 01:20:22,870 --> 01:20:24,600 has to be an operation on a file descriptor. 1976 01:20:24,600 --> 01:20:27,160 So you really have to get every system call right. 1977 01:20:27,160 --> 01:20:30,100 And probably more worryingly is that the kernel has 1978 01:20:30,100 --> 01:20:32,300 to be free of bugs, like buffer overflows 1979 01:20:32,300 --> 01:20:34,884 or whatever other memory corruption like you guys 1980 01:20:34,884 --> 01:20:35,800 exploited [INAUDIBLE]. 1981 01:20:35,800 --> 01:20:37,940 Otherwise, all of this is complete nonsense. 1982 01:20:37,940 --> 01:20:40,300 You just run arbitrary assembly code in the kernel, 1983 01:20:40,300 --> 01:20:43,298 and you have full control of the machine. 1984 01:20:43,298 --> 01:20:44,214 AUDIENCE: [INAUDIBLE]. 1985 01:20:52,225 --> 01:20:53,300 PROFESSOR: Yeah. 1986 01:20:53,300 --> 01:20:54,100 I guess, yeah. 1987 01:20:54,100 --> 01:20:55,900 So the one thing I didn't get a chance to talk about 1988 01:20:55,900 --> 01:20:56,816 is alternative things. 1989 01:20:56,816 --> 01:20:58,150 So this is in FreeBSD. 1990 01:20:58,150 --> 01:20:59,990 Linux has this thing called [INAUDIBLE], 1991 01:20:59,990 --> 01:21:04,140 that allows you to specify which system calls you can make.
1992 01:21:04,140 --> 01:21:06,070 If you squint, it's kind of like Capsicum 1993 01:21:06,070 --> 01:21:08,190 but very different, in the sense that Capsicum 1994 01:21:08,190 --> 01:21:09,731 talks about specific file descriptors 1995 01:21:09,731 --> 01:21:11,010 that you can operate on. 1996 01:21:11,010 --> 01:21:12,812 And in Linux, the [INAUDIBLE] mechanism 1997 01:21:12,812 --> 01:21:14,520 lets you talk about specific system calls 1998 01:21:14,520 --> 01:21:16,040 that you could run. 1999 01:21:16,040 --> 01:21:18,670 So it's probably less fine grained, 2000 01:21:18,670 --> 01:21:22,110 but it's what's available in Linux today. 2001 01:21:22,110 --> 01:21:24,289 And it's actually probably a good idea 2002 01:21:24,289 --> 01:21:26,622 to look at your applications and see what system calls 2003 01:21:26,622 --> 01:21:29,450 you expect them to make and then code in a filter 2004 01:21:29,450 --> 01:21:31,770 and allow them to make only those system calls. 2005 01:21:31,770 --> 01:21:34,311 The problem is that if you have any interesting applications, 2006 01:21:34,311 --> 01:21:36,215 they'll probably run exec and open and write, 2007 01:21:36,215 --> 01:21:38,680 and that's probably enough to do quite a bit of damage 2008 01:21:38,680 --> 01:21:39,264 to the system. 2009 01:21:39,264 --> 01:21:41,763 So that's why you probably want the more fine-grained system 2010 01:21:41,763 --> 01:21:43,170 like Capsicum, where you can say, 2011 01:21:43,170 --> 01:21:45,630 well, you can run write, but only on this thing, 2012 01:21:45,630 --> 01:21:49,350 not on my entire home directory. 2013 01:21:49,350 --> 01:21:49,850 All right. 2014 01:21:49,850 --> 01:21:51,520 So I guess we're out of time to talk about Capsicum. 2015 01:21:51,520 --> 01:21:53,250 So let's talk about Native Client 2016 01:21:53,250 --> 01:21:56,840 on Wednesday and a different way to sandbox programs.