The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: So today is our third and probably final lecture on approximation algorithms. We're going to take a different approach to proving inapproximability of optimization problems, called gap problems, and we'll think about gap-preserving reductions. Along the way we will prove an optimal lower bound on MAX-3SAT, and other fun things.

So let's start with what a gap problem is. I haven't actually seen a generic definition of this, so this is new terminology, but I think it's helpful. A gap problem is a way of converting an optimization problem into a decision problem. Now, we already know one way to do that: if you have an NPO optimization problem, you convert it into the obvious decision problem, which is "OPT at most k" for a minimization problem, and that is NP-complete. But we want to get something useful from an approximability standpoint.

So the idea is, here's a different problem. Instead of just deciding whether OPT is at most k, we want to distinguish between OPT being at most k versus OPT being at least k over c, for some value c. The analogy here is with a c-approximation algorithm; c does not have to be constant, despite the name. It could be a function of n -- maybe it's log n, maybe it's n to the epsilon, whatever.

So for a minimization problem, normally-- did I get this the right way? Sorry, that should be c times k. For minimization it's c times k, and for maximization, it's going to be the reverse. And I think I'm going to use strict inequality here also. So for minimization, there's a gap here between k and c times k; we're imagining here that c is bigger than 1. So distinguishing between being less than k and being at least c times k leaves a hole.
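To pin down what's on the board, here is one way to write the two promise problems (a sketch of the convention being used; whether the inequalities are strict is a minor choice, and we take c > 1):

```latex
\begin{align*}
\text{$c$-gap minimization: } & \text{YES if } \mathrm{OPT}(x) \le k,
  && \text{NO if } \mathrm{OPT}(x) \ge c\,k;\\
\text{$c$-gap maximization: } & \text{YES if } \mathrm{OPT}(x) \ge k,
  && \text{NO if } \mathrm{OPT}(x) \le k/c.
\end{align*}
```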
And the point is you are promised that your input-- you have a question?

AUDIENCE: The second one, should it really be c over k?

PROFESSOR: Sorry, no. Thank you. C and k sound the same, so it's always easy to mix them up. Cool. So the idea is that you're-- so in both cases, there's a ratio gap here of c. And the idea is that you're promised that your input falls into one of these two categories. What does it mean to distinguish-- I mean, I tell you up front that the input either has this property or that property, and I want you to decide which one it is. And we'll call these the yes instances and those the no instances.

Normally with a decision problem, the no instance is that OPT is just one bigger or one smaller. Now we have a big gap between the no instances and the yes instances, and we're told that that's true. This is called a promise problem. Effectively what that means is, if you're trying to come up with an algorithm to solve this decision problem, you don't care what the algorithm does if OPT happens to fall in between. The algorithm can do whatever it wants -- it can output digits of pi in the middle -- as long as when OPT is at most k (or at least k, for maximization), it outputs yes, and when OPT is at least a factor of c away from that, it outputs no.

So that's an easier problem. And the cool thing is, if the c-gap version of a problem is NP-hard, then so is c-approximating the original problem (a one-line argument; see the sketch below). So this really is in direct analogy to c-approximation, and it lets us think about NP-hardness for a decision problem and prove an inapproximability result. This is nice because in the last two lectures, we were having to keep track of a lot more just in defining what inapproximability meant, and APX-hardness and so on. Here it's kind of back to regular NP-hardness. Now, the techniques are completely different in this world than our older NP-hardness proofs.
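A minimal sketch of that implication, for a minimization problem, assuming we had a hypothetical polynomial-time c-approximation `approx` (so its output satisfies OPT <= approx(x) <= c * OPT); the function name and the yes/no convention below are illustrative, not from the lecture:

```python
def decide_c_gap_min(instance, k, c, approx):
    """Decide the c-gap promise problem for a minimization problem,
    given a hypothetical c-approximation algorithm `approx`.

    Promise: YES means OPT <= k; NO means OPT > c*k.
    """
    alg = approx(instance)    # guaranteed OPT <= alg <= c * OPT
    # YES instance: OPT <= k  implies  alg <= c*k.
    # NO instance:  OPT > c*k implies  alg >= OPT > c*k.
    return "yes" if alg <= c * k else "no"

# So a polynomial-time c-approximation would solve the c-gap problem in
# polynomial time; if the gap problem is NP-hard, no such approximation
# exists unless P = NP.
```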
But still, it's kind of comforting to be back in decision land. Cool.

So because of this implication, whereas in previous lectures we were just worried about proving inapproximability, today we're going to be thinking about proving that gap problems are NP-hard, or some other kind of hardness. This is a stronger type of result. In general, inapproximability is kind of what you care about from the algorithmic standpoint, but gap results -- saying that hey, your problem is hard even if you have this huge gap between the yes instances and the no instances -- are also of independent interest; they say more about the structure of the problem. But one implies the other, so this is the stronger type of thing to go for. The practical reason to care about this stuff is that this gap idea lets you get stronger inapproximability results. The factor c you get by thinking about gaps in practice seems to be larger than the factors you get by L-reductions and the like.

So let me tell you about one other type of gap problem. This is a standard one, a little bit more precise. Consider MAX-SAT -- pick your favorite version of MAX-SAT, or MAX-CSP, which was the general form where you could have any type of clause. Instead of just a c-gap, we will define, slightly more precisely, an (a, b)-gap, which is to distinguish between "OPT is less than a times the number of clauses" and "OPT is at least b times the number of clauses."

So whereas before everything was relative to some input k that you want to decide about, with SAT there's a kind of absolute notion of what you'd like to achieve, which is that you satisfy everything -- all clauses are true. So typically we'll think about b being one. And then you're distinguishing between a satisfiable instance, where all clauses are satisfied, and something that's very unsatisfiable -- usually some constant fraction of the clauses are unsatisfiable.
We need this level of precision for thinking about when you're right up against 100% satisfied versus 1% satisfied, or something like that -- or 1% satisfiable. Cool.

AUDIENCE: Do you use the same notation for one [INAUDIBLE] or only for-- so the one problem, like the maximum number of ones you can get.

PROFESSOR: I haven't seen it, but yeah, that's a good question. Certainly valid to do it for-- we will see it for one other type of problem. For any problem, if you can define some absolute notion of how much you'd like to get, you can always measure relative to that and define this kind of (a, b)-gap problem. Cool. All right.

So how do we get these gaps? There are two, well maybe three, ways, I guess. In general, we're going to use reductions, like always. You could start from no gap and make a gap, or start from a gap of additive one and turn it into a big multiplicative gap -- those will be gap-producing reductions. You could start with some gap and then make it bigger -- that's a gap-amplifying reduction. Or you could just start with a gap and try to preserve it -- that would be a gap-preserving reduction. In general, once you have some gap, you try to keep it or make it bigger to get stronger hardness for your problem.

So the idea with a gap-producing reduction is that you have no assumption about your starting problem. In general, a reduction goes from some problem A to some problem B. And what we would like is that the output instance for problem B -- the output of the reduction -- has OPT equal to k or, for a minimization problem, OPT bigger than c times k (for a maximization problem, OPT less than k over c), depending on whether the input was a yes or a no instance. So that's just saying we have a gap in the output. We assume nothing about the input instance. That would be a gap-producing reduction.

Now, we have seen some of these before, or at least mentioned them. One of them, this is from lecture three, I think, for Tetris: we proved NP-hardness with this 3-partition reduction.
And the idea is that if you could satisfy that and open this thing, then you could get a zillion points down here. Most of the instance is down here -- we squeeze this part down to like an n to the epsilon, which is still hard -- and so n to the 1 minus epsilon of the instance is down here, and you're given a ton of pieces to fill in the space and get lots of points. If the answer was no here, then you won't get those points, and so OPT is very small -- at most, say, n to the epsilon. If you can solve this instance -- if we have a yes instance in the input -- then we get n points. So the gap there is n to the 1 minus epsilon.

So in the Tetris reduction, we assumed nothing about the 3-partition instance; it was just yes or no. And we produced an instance that had a gap of n to the 1 minus epsilon. We could set epsilon to any constant we want bigger than zero.

We also mentioned another such reduction. In general, for a lot of games and puzzles you can do this -- it's sort of an all-or-nothing deal -- and gap-producing reductions are a way to formalize that.

Another problem we talked about last class, I believe, was non-metric TSP. I just give you a complete graph; every edge has some number on it that's the length of that edge, and you want to find a TSP tour of minimum total length. This is really hard to approximate because, depending on your model, you can use, let's say, edge weights 0 and 1, to be really annoying. If I'm given a Hamiltonicity instance, wherever there's an edge, I put a weight of zero; wherever there's not an edge, I put a weight of one. Then if the input graph was Hamiltonian -- a yes instance -- the output will have a tour of length zero. And if the input was not Hamiltonian, then every tour in the output has weight at least one. The ratio between a positive number and zero is infinity. So this is an infinite gap creation if you allow weights of zero.
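A quick sketch of that construction in code (the graph representation here is my own choice for illustration, not from the lecture):

```python
def hamiltonicity_to_tsp(n, edges):
    """Sketch of the gap-producing reduction from Hamiltonicity to
    non-metric TSP: build a complete graph where original edges cost 0
    and non-edges cost 1.

    n      -- number of vertices, labeled 0..n-1
    edges  -- set of frozenset({u, v}) pairs giving the input graph
    Returns a dict mapping each unordered vertex pair to its TSP weight.
    """
    weight = {}
    for u in range(n):
        for v in range(u + 1, n):
            pair = frozenset({u, v})
            weight[pair] = 0 if pair in edges else 1
    return weight

# If the input graph is Hamiltonian, the optimal tour has total weight 0;
# otherwise every tour uses at least one non-edge and has weight >= 1,
# so any finite approximation factor would decide Hamiltonicity.
```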
If you say zero is cheating -- which we did, and some papers do -- you could instead use weights one and "infinity," where infinity is the largest representable number. That's going to be something like 2 to the n if you allow the usual binary encodings of numbers; if you don't -- the PB (polynomially bounded) case -- then it would be n to some constant. But you get a big gap in any case. So you get some gap that equals "huge."

So these are kind of trivial senses of inapproximability, but hey, that's one way to do it. What we're going to talk about today are other known ways to get gap production that are really cool and more broadly useful. This trick is useful when you have a sort of all-or-nothing problem. A lot of the time, it's not so clear: there's a constant-factor approximation, so some giant gap like this isn't going to be possible -- but still, gaps are possible.

Now, an important part of the story here is the PCP theorem. So this is not about drugs; this is about another complexity class. The complexity class is normally written PCP(O(log n), O(1)). I'm going to simplify this to just PCP as the class. The other notions make sense here, although the parameters don't turn out to matter too much, and it's rather lengthy to write that every time, so I'm just going to write PCP. Let me first tell you what this class is about briefly, and then we'll see why it's directly related to gap problems -- hence where a lot of these gap-producing reductions come from.

So PCP stands for Probabilistically Checkable Proof.
So in general, a problem in PCP has certificates of polynomial length, just like NP. And we have an algorithm for checking certificates, which is given the certificate, and it's given O(log n) bits of randomness. That's what the first parameter refers to: how much randomness the algorithm is given. So we restrict the amount of randomness to a very small amount. And the algorithm should tell you whether the instance is a yes instance or a no instance -- in some sense, if you're given the right certificate.

So in particular -- this is back to decision problems, just like NP; there's no optimization here, but we're going to apply this to gap problems, and that will relate us to optimization -- let's say there's no error on yes instances, although you could relax that; it won't make a big difference. So if you have a yes instance and you give the right certificate -- this is for some certificate -- the algorithm is guaranteed to say yes. So no error there.

Where we add some slack is if there's a no instance. Normally in NP, for a no instance there is no correct certificate. Here, the algorithm will sometimes say yes even though we give it a wrong certificate -- there is no right certificate -- but it will do so with at most constant probability. So let's say the probability that the algorithm says no is at least some constant, presumably less than one. If it's one, then that's NP. If it's a half, that would be fine. A tenth, a hundredth -- they'll all be the same, because once you have such an algorithm that achieves some constant probability, you can apply it O(log(1/epsilon)) times and reduce the error to epsilon. The probability of error goes down to epsilon if we just repeat this log(1/epsilon) times.

So in constant time -- I didn't say: the O(1) here refers to the running time of the algorithm. So this is an O(1)-time algorithm.
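The amplification arithmetic, spelled out (with p denoting that constant rejection probability on no instances):

```latex
% Run the verifier t times with fresh randomness; accept only if every run accepts.
% On a no instance each run accepts with probability at most 1 - p, so
\Pr[\text{all } t \text{ runs accept}] \le (1-p)^{t} \le e^{-pt} \le \varepsilon
\qquad\text{once } t \ge \tfrac{1}{p}\ln\tfrac{1}{\varepsilon} = O\!\left(\log\tfrac{1}{\varepsilon}\right),
% while yes instances are still accepted with probability 1.
```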
So the point is, the algorithm's super fast -- still constant time for constant epsilon -- and you can get arbitrarily small error probability, say one in 100 or one in a million, and it's still pretty good. And you're checking your proof super, super fast. Question?

AUDIENCE: Why is there a limit on the randomness?

PROFESSOR: This limit on randomness is not strictly necessary. For example, n bits of randomness turn out not to help you; that was proved later. But we're going to use this in a moment: it will help us simulate this algorithm without randomness, basically. Yeah.

AUDIENCE: If the verifier runs in constant time, can it even read what was written?

PROFESSOR: So this is constant time in a model of computation where you can read log n bits in one step. So your word, let's say, is log n bits long, so you have enough time to read the randomness. Obviously you don't have time to read the whole certificate, because that has polynomial length. But yeah, constant time. Cool. Other questions?

So that is the definition of PCP. Now let me relate it to gap problems. So let's say the first claim is that if we look at this gap SAT problem, where b equals one and a is some constant, presumably less than one -- in fact, that should be less than one; why did I write strictly less than one here? This is a constant less than one -- then I claim that problem is in PCP: there is a probabilistically checkable proof for such an instance. Namely, it's a satisfying variable assignment. Again, the instance either has the one property or the other -- in a yes instance, the entire thing is satisfiable. So just like before, I can have a certificate; just like in NP, a satisfying assignment to the variables is good. In a no instance, I now know that, let's say, at most half of the clauses are satisfiable, if this constant is one half. And so what is my algorithm going to do?
In order to get some at most constant probability of failure, it's going to choose a random clause -- uniformly at random -- and check that it is satisfied. So I've got log n bits; let's say there are n clauses, so I can choose one of them at random by flipping log n coins. And then I check it. This is 3SAT, so the clause only involves three variables; I check those three variable assignments in my certificate by random access into the certificate. In constant time, I determine whether that clause is satisfied. If the clause is satisfied, the algorithm returns yes; otherwise, it returns no.

Now, if it was a satisfying assignment, the algorithm will always say yes. So that's good. If it was not satisfiable, we know that, let's say, at most half of the clauses are satisfiable, which means for every certificate, the algorithm will say no at least half the time. And "half" is whatever that constant is. So that means the probability that the algorithm is wrong is less than one over the gap, whatever that ratio is. Cool? Yeah.

AUDIENCE: So does this [INAUDIBLE]? So for [INAUDIBLE].

PROFESSOR: Let me tell you: the PCP theorem is that NP equals PCP. This is proved. So all problems are in PCP. But this is some motivation for where this class came from.

I'm not going to prove this theorem. The original proof is super long; since then, there have been relatively short proofs -- I think the shortest proof currently is two pages long. Still not going to prove it, because it's a bit beside the point to some extent. It does use reductions and gap amplification, but it's technical to prove, let's say. But I will give you some more motivation for why it's true.

So for example, here's one claim. If one-- let's change this notation. If (<1, 1)-gap 3SAT -- that is, (a, 1)-gap 3SAT for some constant a < 1 -- is NP-hard, then NP equals PCP.
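Here is a toy rendering of the random-clause verifier just described, assuming clauses are given as triples of (variable index, negated?) pairs and the certificate is a list of bits -- representations of my own choosing, not the lecture's:

```python
import random

def gap_3sat_verifier(clauses, certificate):
    """Sketch of a PCP verifier for (a, 1)-gap 3SAT: pick one clause uniformly
    at random and check it against three certificate bits.

    clauses     -- list of clauses; each clause is a list of three
                   (variable_index, is_negated) pairs
    certificate -- claimed satisfying assignment, a list of 0/1 values
    """
    clause = random.choice(clauses)        # one random clause: ~log(#clauses) random bits
    satisfied = any(certificate[var] != neg  # a literal is true iff bit differs from its negation flag
                    for var, neg in clause)
    return "yes" if satisfied else "no"
    # A satisfying assignment is always accepted.  If at most an a-fraction of
    # the clauses are satisfiable, every certificate is rejected with
    # probability at least 1 - a.
```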
So we know that this is true. But before we knew that -- here, we just proved that this thing is in PCP. And if furthermore this problem-- we're going to prove this is NP-hard; that's the motivation. If you believe that it's NP-hard, then we know all problems in NP reduce to this thing, and this thing is in PCP. So that tells us that for all problems in NP, you can convert them into (<1, 1)-gap 3SAT and then get a PCP algorithm for them. So that would be one way to prove the PCP theorem.

In fact, the reverse is also true, and this is sort of more directly useful to us. If, let's say, 3SAT is in PCP, then the gap version of 3SAT is NP-hard. This is interesting because -- this is true because NP equals PCP, so in particular 3SAT is in PCP. And so we're going to be able to conclude, by a very short argument, that the gap version of 3SAT is also NP-hard. And this proves constant-factor inapproximability of 3SAT. We will see a tighter constant in a little bit, but this will be our first such bound. And this is a very general kind of algorithm -- it's kind of cool.

So PCP is easy for the gap version of 3SAT. But suppose there was a probabilistically checkable proof for just straight-up 3SAT, when you're not given any gap bound -- which is true; it does exist. So we're going to use that algorithm, and we're going to do a gap-preserving reduction.

The PCP algorithm we're given -- because we're looking at PCP(O(log n), O(1)) -- runs in constant time. A constant-time algorithm can't do very much. In particular, I can write the algorithm as a constant-size formula. It's really a distribution over such formulas, defined by the O(log n) random bits: let's say it's a random variable where each possible random choice gives a constant-size formula that evaluates to true or false, corresponding to whether the algorithm says yes or no. We know we can convert algorithms to formulas if they run for a short amount of time.
So we can make that a CNF formula -- why not? 3CNF if we want. My goal is: I want to reduce 3SAT to the gap version of 3SAT, because 3SAT we know is NP-hard. So if I can reduce it to the gap version of 3SAT, I'm happy; then I know the gap version of 3SAT is also hard.

So here is my reduction. I'm given the 3SAT formula, and the algorithm evaluates some formula on it and the certificate. What I'm going to do is try all of the random choices. Because there are only O(log n) random bits, there are only polynomially many possible choices for those bits -- order log n, so it's n to some constant. And I want to take this formula and take the conjunction over all of those choices (see the sketch below).

If the algorithm always says yes, then this formula will be satisfied. So in the yes-instance case, I get a satisfiable formula: yes implies satisfiable, 100% satisfiable. That corresponds to this number -- I want it to be 100% in the yes case. In the no case, I know that a constant fraction of these random choices give a no, meaning those terms will not be satisfied. For any choice of certificate, I know that a constant fraction of the terms which I'm conjoining will evaluate to false, because of the definition of PCP -- that's what the algorithm saying no with constant probability means. So a constant fraction of the terms are false. The terms are the things we're conjoining over. But each term here is a constant-size CNF formula, so when I AND those together, I really just get one giant AND of clauses -- only a constant factor more clauses than terms. And if a term is false, that means at least one of its clauses is false; and there's only a constant number of clauses in each term, so a constant fraction of the clauses in that giant conjunction are also false.

And that is essentially it. That is my reduction. So in the yes instance, I get a 100% satisfiable thing; in the no instance, I get something that is at most some constant strictly less than 1 satisfiable.
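Here is a minimal sketch of that enumeration step, assuming a hypothetical helper `verifier_formula(x, r)` that returns the constant-size 3CNF (a list of clauses over certificate positions) describing the verifier's decision on input x with random string r; none of these names come from the lecture:

```python
from itertools import product

def pcp_to_gap_3sat(x, num_random_bits, verifier_formula):
    """Sketch of the reduction from 3SAT (via its assumed PCP verifier) to
    gap-3SAT: enumerate all random strings and conjoin the constant-size
    3CNF formulas describing the verifier's behavior on each.

    verifier_formula(x, r) -> list of clauses, each clause a list of
    (certificate_position, is_negated) literals; a hypothetical interface.
    """
    clauses = []
    for r in product([0, 1], repeat=num_random_bits):  # 2^O(log n) = poly(n) choices
        clauses.extend(verifier_formula(x, r))         # O(1) clauses per choice
    return clauses
    # Yes instance: some certificate satisfies every term, hence 100% of clauses.
    # No instance: for every certificate a constant fraction of terms fail,
    # so a constant fraction of the clauses are unsatisfied -- a gap.
```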
And the point is that in any solution, a constant fraction of the clauses turn out to be false. Now, what the constant is, you'd have to work out -- you'd have to know how big your PCP algorithm is. But at least we get a constant lower bound, proving in particular that there's no PTAS for MAX-3SAT. This is what you might call a gap-amplifying reduction, in the sense that we started with no gap -- the instance of 3SAT was either true or false -- and we ended up with something with a significant gap.

So what we're going to talk about next is called gap-preserving reductions. Maybe before I get there: what we just showed is that the PCP theorem is equivalent to NP-hardness of the gap version of 3SAT. And in particular, we get gap problems being NP-hard. This is why we care about PCPs.

And then in general, once we have these kinds of gap hardness results, when we're thinking about reductions from A to B, because we know gap implies inapproximability, we could say, OK, 3SAT is inapproximable, and then do, say, an L-reduction from 3SAT to something else; the something else is therefore inapproximable also. That's all good. But instead of thinking about the inapproximability and how much of it carries from A to B, we can also think about the gap directly. And this is sort of the main approach in this lecture that I'm trying to demonstrate: by preserving the gap directly, (a) you get new gap bounds, and generally stronger gap bounds, and then those imply inapproximability results. The gap bounds are stronger than the inapproximability results, and they also tend to give larger constant factors in the inapproximability results.

So what do we want out of a gap-preserving reduction? Let's say we have an instance x of A. We convert that into an instance x prime of some problem B. We're just going to think about the OPT of x versus the OPT of x prime. And what we want, for, let's say, a minimization problem, is two properties.
One is that if the OPT of x is at most some k, then the OPT of x prime is at most some k prime. And conversely. So in general, OPT may not be preserved, but let's say it changes by some "prime" operation. In fact, you can think of k and k prime as functions of n: if I know that OPT of x is at most some function of n, then I get that OPT of x prime is at most some other function of n, with some known relation between the two.

What I care about is this gap c -- there should be a c prime here. So what this is saying is: suppose I had a gap, so I know that all the solutions are either less than k or more than c times k; I want that to be preserved, for some possibly different gap c prime, in the new problem. So this is pretty general, but this is the sort of thing we want to preserve: if we had a gap of c before, we get some gap c prime after. If c prime equals c, this would be a perfectly gap-preserving reduction; maybe we'll lose some constant factor. If c prime is greater than c, this is called gap amplification.

And gap amplification is essentially how the PCP theorem is shown -- by repeatedly growing the gap until it's something reasonable. And if you want to, say, prove that set cover is log n hard, it's a similar thing, where you start with a small constant-factor gap, and then you grow it, and you show you can grow it to log n before you run out of space to write it in your problem, essentially -- before your instance gets more than polynomial. Or if you want to prove that [INAUDIBLE] can't be solved better than, whatever, n to the 1 minus epsilon, then a similar trick of gap amplification works. Those amplification arguments are involved, and so I'm not going to show them here. But I will show you an example of a gap-preserving reduction next, unless there are questions.

Cool. So I'm going to reduce a problem which we have mentioned before, which is MAX-E3-XNOR-SAT -- max exactly-3 XOR/XNOR SAT.
This is linear equations mod two, where every equation has exactly three terms. So something like x_i XOR x_j XOR x_k equals one, or something; you can also have negations here. So I have a bunch of equations like that.

I'm going to just tell you a gap bound on this problem, and then we're going to reduce it to another problem, namely MAX-E3-SAT. So the claim here is that this problem is (1/2 + epsilon, 1 - epsilon)-gap hard for any epsilon, which in particular implies that it is (1/2 + epsilon)-inapproximable, unless P equals NP. But the gap statement is of course stronger: it says that even if you just look at instances where, let's say, 99% of the equations are satisfiable versus instances where only 51% are satisfiable, it's NP-hard to distinguish between those two.

Why one half here? Because there is a one-half approximation. I've kind of mentioned the general approach for approximation algorithms for SAT: take a random variable assignment. In this case, because these statements are about a parity, if you think of x_k as random, it doesn't matter what the other two are -- there's a 50% probability the equation will be satisfied. And so you can always satisfy at least half of the equations, because this randomized algorithm satisfies half in expectation, and therefore some assignment achieves at least half. Or, if you allow randomized approximation, this is directly a one-half approximation -- or a two-approximation, depending on your perspective.

So this is really tight -- that's good news. And this is essentially a form of the PCP theorem: the PCP theorem says that there's some verifier algorithm, and you can prove that in fact there is one that looks like this -- a bunch of linear equations with three terms per equation. So let's take that as given.

Now, what I want to show is a reduction from that problem to MAX-E3-SAT. So remember MAX-E3-SAT: you're given a CNF where every clause has exactly three distinct literals.
You want to maximize the number of satisfied clauses. So this is roughly the problem we were talking about up there.

So the first thing I'm going to do is reduce this to that, and this is the reduction. The first claim is just that it's an L-reduction -- that's something we're familiar with, so let's think about it that way first. Then we will think about it in a gap-preserving sense.

There are two types of equations we need to satisfy: the odd case or the even case. Again, each of these could be negated; I'm just going to say a double negation means unnegated over here. Each equation is going to be replaced with exactly four clauses in the E3-SAT instance.

And the idea is, well, if I want the parity of the three variables to be odd, it should be the case that at least one of them is true. And if you stare at it long enough -- also when you put two bars in there -- I don't want exactly two of them to be true; that's the parity constraint. If the equation is satisfied, all four of these clauses should be true. That's the first claim, just by the parity of the number of bars: there are either zero bars or two bars, that is, three positive literals or one positive literal -- those are the two cases. And in the situation where I want the parity to be even -- an even number of trues -- I have all the even-number-of-trues cases over here: here two of them, here none of them. And again, if the equation is satisfied, then all four of those clauses are satisfied.

Now, if the equation is not satisfied, by the same argument you can show that at least one of the clauses is violated -- but in fact, exactly one will be violated. So, for example -- this is just a case analysis -- let's say I set all of the variables to zero, so their XOR is zero and not one. If they're all false, then the all-positive clause will not be satisfied, but the other three will be. And in general, because we have, for example, x_i appearing both positively and negatively across the clauses, you will satisfy three out of four on the right when you don't satisfy on the left.
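For concreteness, here is the standard four-clause gadget this describes (my reconstruction of the board; each clause rules out exactly one parity-violating assignment, which is where "three out of four" comes from):

```latex
\begin{align*}
x_i \oplus x_j \oplus x_k = 1 \;\longrightarrow\;&
  (x_i \lor x_j \lor x_k)\wedge
  (x_i \lor \bar{x}_j \lor \bar{x}_k)\wedge
  (\bar{x}_i \lor x_j \lor \bar{x}_k)\wedge
  (\bar{x}_i \lor \bar{x}_j \lor x_k),\\
x_i \oplus x_j \oplus x_k = 0 \;\longrightarrow\;&
  (\bar{x}_i \lor \bar{x}_j \lor \bar{x}_k)\wedge
  (\bar{x}_i \lor x_j \lor x_k)\wedge
  (x_i \lor \bar{x}_j \lor x_k)\wedge
  (x_i \lor x_j \lor \bar{x}_k).
\end{align*}
```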
So the difference is three versus four: when the equation is satisfied, you satisfy four clauses on the right; when it's unsatisfied, you satisfy three on the right. That's all I'm claiming. So if the equation is satisfied, we get four in the 3SAT instance, and if it's unsatisfied, we turn out to get exactly three.

So I want to prove that this is an L-reduction. To prove an L-reduction, we need two things. One is about the additive gap: if I solve the 3SAT instance and convert it back into a corresponding solution to MAX-E3-XNOR-SAT -- which doesn't change anything; the variables are just what they were before -- then the additive gap from OPT on the right side is at most some constant times the additive gap on the left side, or vice versa. In this case, the gap is exactly preserved, because it's four versus three over here and one versus zero over here, so the additive gap remains one. And that constant is called beta, I think, in L-reduction land. So this was property two in the L-reduction: the additive error in this case is exactly preserved, so there's no scaling -- beta equals one. If there were some other gap -- if it were five versus three -- then we'd have beta equal to two.

Then there is the other property, which is that you need to show you don't blow up OPT too much: we want the OPT on the right-hand side to be at most some constant times OPT on the left-hand side. This requires a little bit more care, because we need to make sure OPT is linear, basically -- we did a lot of these arguments last lecture. Even when you don't satisfy things, you still get points, and the difference between zero and three is a big ratio; we want that to not happen too much. And it doesn't happen too much, because we know the left-hand-side OPT is at least half of all the equations. So it's not like there are very many unsatisfied equations: at most half of them are unsatisfied, because at least half are satisfiable in the case of OPT.

So here's the full argument.
In general, OPT for the 3SAT instance is going to be four times the number of satisfied equations plus three times the number of unsatisfied equations. This is the same thing as saying -- sorry -- you take three times the number of equations (every equation gets three points for free), and then if you also satisfy an equation, you get one more point. So this is an equation relating the two OPTs: OPT of the 3SAT instance equals OPT of the XNOR instance plus three times the number of equations. And because there is a one-half approximation, we know that the number of equations is at most two times OPT, because OPT is at least half the number of equations. So this whole thing is overall at most six plus one -- seven -- times OPT of E3-XNOR-SAT. And this is the thing called alpha in L-reductions.

I wanted to compute these explicitly because I want to see how much inapproximability I get. I started with a tight inapproximability bound: one half plus epsilon is impossible, whereas one half is possible -- tight up to this very tiny, arbitrary additive constant. And over here, we're going to lose something. We know from L-reductions that if you were inapproximable before, you get inapproximability, in this case for MAX-E3-SAT. So what is the factor?

There's one simplification here relative to what I presented before. A couple lectures ago, we always thought about a one-plus-epsilon approximation and how epsilon changes, and that works really well for minimization problems. For a maximization problem, an approximation factor of one plus epsilon means you are at least 1/(1 + epsilon) times OPT, and that gets awkward to work with. Equivalently, with a different notion of epsilon, you can just think of a one-minus-epsilon approximation and how epsilon changes. And in general, for a maximization problem, if you have one-minus-epsilon inapproximability before the L-reduction, then afterwards you have one minus epsilon-over-alpha-beta. So for maximization, we had one plus epsilon.
781 00:47:50,750 --> 00:47:53,270 And then we got one plus epsilon over alpha beta. 782 00:47:53,270 --> 00:47:54,900 With the minuses, it also works out. 783 00:47:54,900 --> 00:47:57,550 That's a cleaner way to do maximization. 784 00:47:57,550 --> 00:47:59,030 So this was a maximization problem. 785 00:47:59,030 --> 00:48:02,880 We had over here epsilon was-- sorry, 786 00:48:02,880 --> 00:48:04,160 different notions of epsilon. 787 00:48:04,160 --> 00:48:07,180 Here we have one half inapproximability. 788 00:48:07,180 --> 00:48:09,780 One half is also known as one minus one half. 789 00:48:09,780 --> 00:48:12,500 So epsilon here is a half. 790 00:48:12,500 --> 00:48:16,280 And alpha was seven. 791 00:48:16,280 --> 00:48:20,460 Beta was one. 792 00:48:20,460 --> 00:48:22,590 And so we just divide by seven. 793 00:48:22,590 --> 00:48:30,650 So in this case, we get that MAX-E3-SAT 794 00:48:30,650 --> 00:48:40,920 is one minus one half divided by seven, and one half over seven is 1/14. 795 00:48:40,920 --> 00:48:45,100 Technically there's a minus epsilon here. 796 00:48:45,100 --> 00:48:47,559 Sorry, bad overuse of epsilon. 797 00:48:47,559 --> 00:48:49,600 This is, again, for any epsilon greater than zero 798 00:48:49,600 --> 00:48:52,560 because we had some epsilon greater than zero here. 799 00:48:52,560 --> 00:48:54,890 Slightly less than one half is impossible. 800 00:48:54,890 --> 00:49:00,010 So over here we get slightly less than one minus 1/14 801 00:49:00,010 --> 00:49:00,850 is impossible. 802 00:49:00,850 --> 00:49:14,030 This is 13/14 minus epsilon, which is OK. 803 00:49:14,030 --> 00:49:15,370 It's a bound. 804 00:49:15,370 --> 00:49:17,090 But it's not a tight bound. 805 00:49:17,090 --> 00:49:20,710 The right answer for MAX-3SAT is 7/8. 806 00:49:20,710 --> 00:49:23,700 Because if you take, again, a uniform random assignment, 807 00:49:23,700 --> 00:49:28,100 every variable flips a coin, heads or tails, true or false. 808 00:49:28,100 --> 00:49:32,470 Then 7/8 of the clauses will be satisfied in expectation. 809 00:49:32,470 --> 00:49:36,170 Because if you look at a clause, if it has exactly three terms 810 00:49:36,170 --> 00:49:39,720 and it's an or of three things, you just need at least one head 811 00:49:39,720 --> 00:49:41,180 to satisfy this thing. 812 00:49:41,180 --> 00:49:43,770 So you get a 50% chance to do it in the first time, 813 00:49:43,770 --> 00:49:45,978 and then a quarter chance to do it in the second time, an eighth in the third, 814 00:49:45,978 --> 00:49:51,760 and in total a 7/8 chance to get it in one of the three times. 815 00:49:51,760 --> 00:49:57,050 7/8 is smaller than 13/14, so we're not quite there yet. 816 00:49:57,050 --> 00:50:01,060 But this reduction will do it if we 817 00:50:01,060 --> 00:50:02,940 think about it from the perspective 818 00:50:02,940 --> 00:50:05,390 of gap-preserving reductions. 819 00:50:05,390 --> 00:50:07,990 So from this general L-reduction black box 820 00:50:07,990 --> 00:50:13,540 that we only lose an alpha beta factor, yeah we get this bound. 821 00:50:13,540 --> 00:50:16,600 But from a gap perspective, we can do better. 822 00:50:16,600 --> 00:50:19,500 The reason we can do better is because gaps are always 823 00:50:19,500 --> 00:50:22,410 talking about yes instances where lots of things 824 00:50:22,410 --> 00:50:23,112 are satisfied.
825 00:50:23,112 --> 00:50:25,570 That means we're most of the time in the case where we have 826 00:50:25,570 --> 00:50:28,897 fours on the right hand side, or a situation where we have lots 827 00:50:28,897 --> 00:50:31,230 of things unsatisfied, that means we have lots of threes 828 00:50:31,230 --> 00:50:32,330 on the right hand side. 829 00:50:32,330 --> 00:50:35,230 It lets us get a slightly tighter bound. 830 00:50:35,230 --> 00:50:36,030 So let's do that. 831 00:50:57,340 --> 00:51:04,630 So here is a gap argument about the same reduction. 832 00:51:04,630 --> 00:51:15,820 What we're going to claim is that 7/8 minus epsilon gap 3SAT 833 00:51:15,820 --> 00:51:21,000 is NP-hard, which implies 7/8 inapproximability, 834 00:51:21,000 --> 00:51:23,040 but by looking at it from the gap perspective, 835 00:51:23,040 --> 00:51:28,110 we will get this stronger bound versus the 13/14 bound. 836 00:51:28,110 --> 00:51:33,430 So the proof is by a gap-preserving reduction, 837 00:51:33,430 --> 00:51:41,800 namely that reduction, from MAX-E3-XNOR-SAT to MAX-3SAT, 838 00:51:41,800 --> 00:51:43,450 E3-SAT I should say. 839 00:51:46,990 --> 00:51:49,420 And so the idea is the following. 840 00:51:49,420 --> 00:51:51,940 Either we have a yes instance or a no instance. 841 00:51:55,200 --> 00:52:01,940 If we have a yes instance to the equation problem, 842 00:52:01,940 --> 00:52:09,010 then we know that at least one minus epsilon of the equations 843 00:52:09,010 --> 00:52:11,650 are satisfiable. 844 00:52:11,650 --> 00:52:15,820 So we have one minus epsilon. 845 00:52:15,820 --> 00:52:17,445 Let's say m is the number of equations. 846 00:52:26,570 --> 00:52:28,030 In the no instance case, of course 847 00:52:28,030 --> 00:52:30,410 we know that not too many are satisfied. 848 00:52:30,410 --> 00:52:35,890 At most, one half plus epsilon fraction of the equations 849 00:52:35,890 --> 00:52:37,250 are satisfiable. 850 00:52:43,460 --> 00:52:48,060 So in both cases, I want to see what that converts into. 851 00:52:48,060 --> 00:52:53,457 So in the yes instance, we get all four 852 00:52:53,457 --> 00:52:56,690 of those things being satisfied. 853 00:52:56,690 --> 00:53:02,230 So that means we're going to have at least one 854 00:53:02,230 --> 00:53:08,200 minus epsilon times m times four clauses satisfied. 855 00:53:08,200 --> 00:53:11,460 We'll also have epsilon m times three. 856 00:53:11,460 --> 00:53:12,925 Those are the unsatisfied. 857 00:53:12,925 --> 00:53:14,950 And maybe some of them are actually satisfied, 858 00:53:14,950 --> 00:53:18,337 but this is a lower bound on how many clauses we get. 859 00:53:21,870 --> 00:53:24,190 On the other hand, in this situation 860 00:53:24,190 --> 00:53:25,860 where not too many are satisfied, 861 00:53:25,860 --> 00:53:28,060 that means we get a tighter upper bound. 862 00:53:28,060 --> 00:53:37,020 So we have one half plus epsilon times m times four. 863 00:53:37,020 --> 00:53:44,490 And then there's the rest, one half minus epsilon times three. 864 00:53:44,490 --> 00:53:46,620 And maybe some of these are not satisfied, 865 00:53:46,620 --> 00:53:51,090 but this is an upper bound on how many clauses are satisfied 866 00:53:51,090 --> 00:53:55,670 in the 3SAT instance versus equations in the 3x [INAUDIBLE] 867 00:53:55,670 --> 00:53:57,500 SAT instance. 868 00:53:57,500 --> 00:54:01,770 Now I just want to compute these. 869 00:54:01,770 --> 00:54:04,040 So everything's times m. 
870 00:54:04,040 --> 00:54:06,830 And over here we have four minus four epsilon. 871 00:54:06,830 --> 00:54:09,850 Over here we have plus three epsilon. 872 00:54:09,850 --> 00:54:14,390 So that is four minus epsilon m. 873 00:54:14,390 --> 00:54:17,420 And here we have again everything is times m. 874 00:54:17,420 --> 00:54:25,960 So we have 4/2, also known as two, plus four epsilon. 875 00:54:28,470 --> 00:54:33,600 Plus we have 3/2 minus three epsilon. 876 00:54:33,600 --> 00:54:35,805 So the epsilons add up to plus epsilon. 877 00:54:35,805 --> 00:54:36,680 Then I check and see. 878 00:54:36,680 --> 00:54:39,750 Four epsilon minus three epsilon. 879 00:54:39,750 --> 00:54:45,371 And then we have 4/2 plus 3/2, also known as 7/2. 880 00:54:45,371 --> 00:54:45,870 Yes. 881 00:54:54,120 --> 00:54:59,520 So we had a gap before, and we get this new gap after. 882 00:54:59,520 --> 00:55:01,135 When we have a yes instance, we know 883 00:55:01,135 --> 00:55:03,010 that there will be at least this many clauses 884 00:55:03,010 --> 00:55:04,650 satisfied in the 3SAT. 885 00:55:04,650 --> 00:55:07,520 And there'll be at most this many in the no instance. 886 00:55:07,520 --> 00:55:15,340 So what we proved is this bound that-- sorry, 887 00:55:15,340 --> 00:55:16,560 get them in the right order. 888 00:55:16,560 --> 00:55:18,740 7/2 is the smaller one. 889 00:55:18,740 --> 00:55:27,720 7/2 plus epsilon, comma four minus epsilon gap 3SAT, 890 00:55:27,720 --> 00:55:33,520 E3-SAT, is NP-hard. 891 00:55:33,520 --> 00:55:36,990 Because we had NP hardness of the gap before, 892 00:55:36,990 --> 00:55:38,570 we did this gap-preserving reduction, 893 00:55:38,570 --> 00:55:40,787 which ended up with this new gap, 894 00:55:40,787 --> 00:55:42,620 with this being for no instances, this being 895 00:55:42,620 --> 00:55:44,640 for yes instances. 896 00:55:44,640 --> 00:55:47,510 And so if we want to-- this is with the comma notation 897 00:55:47,510 --> 00:55:51,100 for the yes and no what fraction is satisfied. 898 00:55:51,100 --> 00:55:56,370 If you convert it back into the c gap notation, 899 00:55:56,370 --> 00:55:58,880 you just take the ratio between these two things. 900 00:55:58,880 --> 00:56:04,190 And ignoring the epsilons, this is like 4 divided by 7/2. 901 00:56:04,190 --> 00:56:12,140 So that is 7/8 or 8/7, depending on which way you're looking. 902 00:56:12,140 --> 00:56:18,080 So we get also 7/8 gap. 903 00:56:18,080 --> 00:56:24,590 Sorry, I guess it's 8/7 the way I was phrasing it before. 904 00:56:24,590 --> 00:56:26,200 It's also NP-hard. 905 00:56:26,200 --> 00:56:28,994 And so that proves-- there's also a minus epsilon. 906 00:56:28,994 --> 00:56:30,160 So I should have kept those. 907 00:56:30,160 --> 00:56:34,380 Slightly different epsilon, but minus two epsilon, whatever. 908 00:56:34,380 --> 00:56:37,800 And so this gives us the 8/7 is the best approximation 909 00:56:37,800 --> 00:56:39,180 factor we can hope for. 910 00:56:39,180 --> 00:56:40,800 AUDIENCE: In the first notation, isn't 911 00:56:40,800 --> 00:56:42,352 it the fraction of clauses? 912 00:56:42,352 --> 00:56:44,877 So between zero and one? 913 00:56:44,877 --> 00:56:45,710 PROFESSOR: Oh, yeah. 914 00:56:45,710 --> 00:56:47,520 Four is a little funny. 915 00:56:47,520 --> 00:56:48,020 Right. 916 00:56:48,020 --> 00:56:52,600 I needed to scale-- thank you-- because the number of clauses 917 00:56:52,600 --> 00:56:54,970 in the resulting thing is actually 4m, not m. 
918 00:56:54,970 --> 00:56:58,990 So everything here needs to be divided by four. 919 00:56:58,990 --> 00:57:01,950 It won't affect the final ratio, but this should really 920 00:57:01,950 --> 00:57:06,050 be over four and over four. 921 00:57:06,050 --> 00:57:15,190 So also known as 7/8 plus epsilon, 922 00:57:15,190 --> 00:57:18,190 comma one minus epsilon. 923 00:57:18,190 --> 00:57:21,470 Now it's a little clearer, 7/8. 924 00:57:21,470 --> 00:57:24,260 Cool. 925 00:57:24,260 --> 00:57:25,252 Yeah. 926 00:57:25,252 --> 00:57:30,220 AUDIENCE: So are there any [INAUDIBLE] randomness? 927 00:57:30,220 --> 00:57:34,470 AUDIENCE: So for [INAUDIBLE], you can be the randomness. 928 00:57:34,470 --> 00:57:36,780 Randomness would give you one half. 929 00:57:36,780 --> 00:57:40,026 [INAUDIBLE] algorithm gives you 1.8. 930 00:57:40,026 --> 00:57:42,150 PROFESSOR: So you can beat it by a constant factor. 931 00:57:42,150 --> 00:57:44,340 Probably not by more than a constant factor. 932 00:57:44,340 --> 00:57:48,280 MAX CUT is an example where you can beat it. 933 00:57:48,280 --> 00:57:51,720 I think I have the Goemans Williamson bound here. 934 00:57:55,270 --> 00:57:57,360 MAX CUT, the best approximation is 935 00:57:57,360 --> 00:58:01,190 0.878, which is better than what you get by random, 936 00:58:01,190 --> 00:58:03,620 which is a half I guess. 937 00:58:03,620 --> 00:58:05,870 Cool. 938 00:58:05,870 --> 00:58:06,370 All right. 939 00:58:08,740 --> 00:58:09,240 Cool. 940 00:58:09,240 --> 00:58:12,350 So we get optimal bound for MAX-E3-SAT, 941 00:58:12,350 --> 00:58:16,751 assuming an optimum bound for E3-XNOR-SAT, which is from PCP. 942 00:58:16,751 --> 00:58:17,250 Yeah. 943 00:58:17,250 --> 00:58:19,000 AUDIENCE: So I'm sorry, can you explain to me again why 944 00:58:19,000 --> 00:58:20,645 we don't get this from the L-reduction, 945 00:58:20,645 --> 00:58:22,270 but we do get it from the gap argument, 946 00:58:22,270 --> 00:58:24,510 even though the reduction is the same reduction? 947 00:58:24,510 --> 00:58:27,149 PROFESSOR: It just lets us give a tighter argument 948 00:58:27,149 --> 00:58:27,690 in this case. 949 00:58:27,690 --> 00:58:30,800 By thinking about yes instances and no instances separately, 950 00:58:30,800 --> 00:58:32,320 we get one thing. 951 00:58:32,320 --> 00:58:35,820 Because this reduction is designed to do different things 952 00:58:35,820 --> 00:58:37,354 for yes and no instances. 953 00:58:37,354 --> 00:58:39,270 Whereas the L-reduction just says generically, 954 00:58:39,270 --> 00:58:42,150 if you satisfy these parameters alpha and beta, 955 00:58:42,150 --> 00:58:44,460 you get some inapproximability result on the output, 956 00:58:44,460 --> 00:58:46,470 but it's conservative. 957 00:58:46,470 --> 00:58:47,610 It's a conservative bound. 958 00:58:47,610 --> 00:58:50,000 If you just use properties one and two up here, 959 00:58:50,000 --> 00:58:51,650 that's the best you could show. 960 00:58:51,650 --> 00:58:56,114 But by essentially reanalyzing property one, 961 00:58:56,114 --> 00:58:58,280 but thinking separately about yes and no instances-- 962 00:58:58,280 --> 00:59:00,370 this held for all instances. 963 00:59:00,370 --> 00:59:02,010 We got a bound of seven. 964 00:59:02,010 --> 00:59:03,705 But in the yes and the no cases, you 965 00:59:03,705 --> 00:59:05,705 can essentially get a slightly tighter constant. 966 00:59:09,541 --> 00:59:10,040 All right. 967 00:59:10,040 --> 00:59:14,000 I want to tell you about another cool problem. 
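Before that, the gap computation just finished fits in one display. With m equations and 4m clauses, in fraction-of-clauses terms:

```latex
\text{yes: } \frac{(1-\epsilon)\cdot 4m + \epsilon\cdot 3m}{4m}
   = 1 - \frac{\epsilon}{4} \;\ge\; 1 - \epsilon,
\qquad
\text{no: } \frac{(\tfrac12+\epsilon)\cdot 4m + (\tfrac12-\epsilon)\cdot 3m}{4m}
   = \frac{7}{8} + \frac{\epsilon}{4} \;\le\; \frac{7}{8} + \epsilon.
```

The ratio between the two cases tends to 8/7 as epsilon goes to zero, which is exactly the (7/8 plus epsilon, comma one minus epsilon) gap statement above.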
968 00:59:31,990 --> 00:59:46,040 Another gap hardness that you can get out of PCP analysis 969 00:59:46,040 --> 00:59:52,090 by some gap amplification essentially, which 970 00:59:52,090 --> 00:59:53,060 is called label cover. 971 01:00:00,110 --> 01:00:04,470 So this problem takes a little bit of time to define. 972 01:00:04,470 --> 01:00:08,020 But the basic point is there are very strong lower bounds 973 01:00:08,020 --> 01:00:09,270 on the approximation factor. 974 01:00:26,030 --> 01:00:28,300 So you're given a bipartite graph, no weights. 975 01:00:32,680 --> 01:00:37,620 The bipartition is A, B. And furthermore, A 976 01:00:37,620 --> 01:00:40,495 can be divided into k chunks. 977 01:00:44,120 --> 01:00:51,515 And so can B. And these are disjoint unions. 978 01:00:54,120 --> 01:01:01,480 And let's say size of A is n, size of B is n, 979 01:01:01,480 --> 01:01:08,010 and size of each Ai is also the same. 980 01:01:08,010 --> 01:01:11,736 We don't have to make these assumptions, but you can. 981 01:01:11,736 --> 01:01:13,670 So let's make it a little bit cleaner. 982 01:01:13,670 --> 01:01:17,340 So in general, A consists of k groups, each of size n over k. 983 01:01:17,340 --> 01:01:21,470 B consists of k groups, each of size n over k. 984 01:01:21,470 --> 01:01:26,010 So that's our-- we have A here with these little groups. 985 01:01:26,010 --> 01:01:29,560 We have B, these little groups. 986 01:01:29,560 --> 01:01:31,060 And there's some edges between them. 987 01:01:37,700 --> 01:01:43,890 In general, your goal is to choose some subset of A, 988 01:01:43,890 --> 01:01:49,900 let's call it A prime, and some subset of B, call it B prime. 989 01:01:49,900 --> 01:01:55,680 And one other thing I want to talk about 990 01:01:55,680 --> 01:01:57,680 is called a super edge. 991 01:02:02,550 --> 01:02:05,280 And then I'll say what we want out of these subsets 992 01:02:05,280 --> 01:02:06,720 that we choose. 993 01:02:06,720 --> 01:02:09,830 Imagine contracting each of these groups. 994 01:02:09,830 --> 01:02:13,250 There are n over k items here, and there 995 01:02:13,250 --> 01:02:16,330 are k different groups. 996 01:02:16,330 --> 01:02:20,200 Imagine contracting each group to a single vertex. 997 01:02:20,200 --> 01:02:22,170 This is A1. 998 01:02:22,170 --> 01:02:25,310 This is B3. 999 01:02:25,310 --> 01:02:27,820 I want to say that there's a super edge from the group 1000 01:02:27,820 --> 01:02:29,790 A1 to the group B3 because there's 1001 01:02:29,790 --> 01:02:31,790 at least one edge between them. 1002 01:02:31,790 --> 01:02:33,890 If I squashed A1 to a single vertex, 1003 01:02:33,890 --> 01:02:37,110 B3 down to a single vertex, I would get an edge between them. 1004 01:02:37,110 --> 01:02:45,100 So a super edge, Ai Bi-- Ai Bj, I should say-- 1005 01:02:45,100 --> 01:02:50,110 exists if there's at least one edge 1006 01:02:50,110 --> 01:02:56,840 in AI cross Bj, at least one edge connecting those groups. 1007 01:02:56,840 --> 01:03:02,110 And I'm going to call such a super edge covered by A prime B 1008 01:03:02,110 --> 01:03:09,910 prime if at least one of those edges is in this chosen set. 1009 01:03:09,910 --> 01:03:12,952 So if there's at least one edge-- sorry. 1010 01:03:16,150 --> 01:03:21,090 If this Ai cross Bj, these are all the possible edges 1011 01:03:21,090 --> 01:03:33,920 between those groups, intersects A prime cross B prime. 1012 01:03:33,920 --> 01:03:37,320 And in general, I want to cover all the hyper edges if I can. 
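In code, the two definitions so far -- super edges and coverage -- look like this. A minimal sketch on a made-up instance; the groups and edges below are hypothetical, not the ones drawn on the board.

```python
from itertools import product

# Toy instance: vertices are (group, index) pairs.
A_groups = {"A1": [("A1", 0), ("A1", 1)], "A2": [("A2", 0), ("A2", 1)]}
B_groups = {"B1": [("B1", 0), ("B1", 1)], "B2": [("B2", 0), ("B2", 1)]}
edges = {(("A1", 0), ("B1", 1)), (("A1", 1), ("B2", 0)), (("A2", 0), ("B1", 0))}

# Super edge (Ai, Bj): at least one edge of the bipartite graph runs
# between group Ai and group Bj.
super_edges = {(ai, bj) for ai, bj in product(A_groups, B_groups)
               if any((a, b) in edges
                      for a in A_groups[ai] for b in B_groups[bj])}

def covered(A_prime, B_prime):
    """Super edges having at least one edge with both endpoints chosen."""
    return {(ai, bj) for ai, bj in super_edges
            if any((a, b) in edges
                   for a in A_groups[ai] if a in A_prime
                   for b in B_groups[bj] if b in B_prime)}

print(super_edges)                                   # (A1,B1), (A1,B2), (A2,B1)
print(covered({("A1", 0), ("A2", 0)}, {("B1", 1)}))  # only (A1,B1) is covered
```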
1013 01:03:37,320 --> 01:03:40,460 So I would like to have a solution where, 1014 01:03:40,460 --> 01:03:44,070 if there is some edge between A1 and B3, 1015 01:03:44,070 --> 01:03:46,930 then in the set of vertices I choose, 1016 01:03:46,930 --> 01:03:51,600 A prime and B prime in the left, they induce at least one edge 1017 01:03:51,600 --> 01:03:54,980 from A1 to B3, and also from A2 to B3 1018 01:03:54,980 --> 01:03:57,900 because there is an edge that I drew here. 1019 01:03:57,900 --> 01:04:00,697 I want ideally to choose the endpoints of that edge, 1020 01:04:00,697 --> 01:04:02,780 or some other edge that connects those two groups. 1021 01:04:02,780 --> 01:04:03,125 Yeah. 1022 01:04:03,125 --> 01:04:05,310 AUDIENCE: So you're choosing subsets A prime of A. 1023 01:04:05,310 --> 01:04:07,434 Is there some restriction on the subset you choose? 1024 01:04:07,434 --> 01:04:08,879 Why don't you choose all of A? 1025 01:04:08,879 --> 01:04:09,545 PROFESSOR: Wait. 1026 01:04:09,545 --> 01:04:10,293 AUDIENCE: Oh, OK. 1027 01:04:10,293 --> 01:04:11,800 You're not done yet? 1028 01:04:11,800 --> 01:04:14,095 PROFESSOR: Nope. 1029 01:04:14,095 --> 01:04:15,595 That's about half of the definition. 1030 01:04:23,190 --> 01:04:26,130 It's a lot to say, but it's not that complicated of a problem. 1031 01:04:30,800 --> 01:04:33,180 So there's two versions. 1032 01:04:33,180 --> 01:04:35,410 That's part of what makes it longer. 1033 01:04:35,410 --> 01:04:37,350 We'll start with the maximization version, 1034 01:04:37,350 --> 01:04:39,520 which is called Max-Rep. 1035 01:04:39,520 --> 01:04:42,880 So we have two constraints on A prime and B prime. 1036 01:04:47,020 --> 01:04:50,180 First is that we choose exactly one vertex from each group. 1037 01:04:57,700 --> 01:05:02,920 So we've got A prime intersect Ai has size 1038 01:05:02,920 --> 01:05:11,600 one, and B prime intersect Bj has size one, for all i and j. 1039 01:05:11,600 --> 01:05:15,720 OK. And then subject to that constraint, 1040 01:05:15,720 --> 01:05:20,380 we want to maximize the number of covered super edges. 1041 01:05:28,140 --> 01:05:33,250 Intuition here is that the elements of those groups are labels. 1042 01:05:33,250 --> 01:05:35,440 And there's really one super vertex there, 1043 01:05:35,440 --> 01:05:37,860 and you want to choose one of those labels 1044 01:05:37,860 --> 01:05:39,720 to satisfy the instance. 1045 01:05:39,720 --> 01:05:42,380 So here you're only allowed to choose one label per vertex. 1046 01:05:42,380 --> 01:05:44,860 We choose one out of each of the groups. 1047 01:05:44,860 --> 01:05:48,330 Then you'd like to cover as many edges as you can. 1048 01:05:48,330 --> 01:05:53,736 If there is an edge in the super graph from Ai to Bj, 1049 01:05:53,736 --> 01:05:57,837 you would like to include an induced edge. 1050 01:05:57,837 --> 01:05:59,920 There should actually be an edge between the label 1051 01:05:59,920 --> 01:06:03,520 you assign to Ai and the label you assign to Bj. 1052 01:06:03,520 --> 01:06:05,690 That's this version. 1053 01:06:05,690 --> 01:06:09,060 The complementary problem is a minimization problem 1054 01:06:09,060 --> 01:06:13,400 where we switch what is relaxed, what constraint is relaxed, 1055 01:06:13,400 --> 01:06:15,990 and what constraint must hold. 1056 01:06:15,990 --> 01:06:19,270 So here we're going to allow multiple labels 1057 01:06:19,270 --> 01:06:21,980 for each super vertex, multiple vertices 1058 01:06:21,980 --> 01:06:24,560 to be chosen from each group.
1059 01:06:24,560 --> 01:06:28,970 Instead we force that everything is covered. 1060 01:06:28,970 --> 01:06:39,220 We want to cover every super edge that exists. 1061 01:06:39,220 --> 01:06:45,710 And our goal is to minimize the size of these sets, 1062 01:06:45,710 --> 01:06:48,560 A prime plus B prime. 1063 01:06:48,560 --> 01:06:51,320 So this is sort of the dual problem. 1064 01:06:51,320 --> 01:06:52,859 Here we force one label per vertex. 1065 01:06:52,859 --> 01:06:54,900 We want to maximize the number of covered things. 1066 01:06:54,900 --> 01:06:56,660 Here we force everything to be covered. 1067 01:06:56,660 --> 01:07:00,082 We want to essentially minimize the number of labels we assign. 1068 01:07:03,060 --> 01:07:05,980 So these problems are both very hard. 1069 01:07:05,980 --> 01:07:09,540 This should give you some more intuition. 1070 01:07:09,540 --> 01:07:13,360 Let me show you a puzzle which is basically 1071 01:07:13,360 --> 01:07:19,100 exactly this game, designed by MIT professor Dana Moshkovitz. 1072 01:07:19,100 --> 01:07:22,110 So here's a word puzzle. 1073 01:07:22,110 --> 01:07:24,980 Your goal is to put letters into each of these boxes-- this 1074 01:07:24,980 --> 01:07:30,260 is B, and this is A-- such that-- for example, this 1075 01:07:30,260 --> 01:07:33,740 is animal, which means these three things pointed 1076 01:07:33,740 --> 01:07:37,410 to by the red arrows, those letters should concatenate 1077 01:07:37,410 --> 01:07:41,480 to form an animal, like cat. 1078 01:07:41,480 --> 01:07:43,840 Bat is the example. 1079 01:07:43,840 --> 01:07:48,030 So if I write B, A, and T, animal is satisfied perfectly. 1080 01:07:48,030 --> 01:07:51,460 Because all three letters form a word, 1081 01:07:51,460 --> 01:07:54,590 I get three points so far. 1082 01:07:54,590 --> 01:07:57,290 Next let's think about transportation. 1083 01:07:57,290 --> 01:07:59,180 For example, cab is a three-letter word 1084 01:07:59,180 --> 01:08:00,410 that is transportation. 1085 01:08:00,410 --> 01:08:02,200 Notice there's always three over here. 1086 01:08:02,200 --> 01:08:07,870 This corresponds to some regularity constraint 1087 01:08:07,870 --> 01:08:09,210 on the bipartite graph. 1088 01:08:09,210 --> 01:08:12,010 There's always going to be three arrows going from left to right 1089 01:08:12,010 --> 01:08:16,960 for every group. 1090 01:08:16,960 --> 01:08:18,550 So transportation, fine. 1091 01:08:18,550 --> 01:08:22,630 We got C-A-B. That is happy. 1092 01:08:22,630 --> 01:08:25,330 We happen to reuse the A, so we get three more points, 1093 01:08:25,330 --> 01:08:26,689 total of six. 1094 01:08:26,689 --> 01:08:32,220 Furniture, we have B, blank, and T left. 1095 01:08:32,220 --> 01:08:33,720 This is going to be a little harder. 1096 01:08:33,720 --> 01:08:35,720 I don't know of any furniture that starts with B 1097 01:08:35,720 --> 01:08:38,160 and ends with T and is three letters long. 1098 01:08:38,160 --> 01:08:40,950 But if you, for example, write an E here, 1099 01:08:40,950 --> 01:08:44,005 that's pretty close to the word bed, which is furniture. 1100 01:08:44,005 --> 01:08:45,880 So in general, of course, each of these words 1101 01:08:45,880 --> 01:08:48,580 corresponds to a set of English words. 1102 01:08:48,580 --> 01:08:52,290 That's going to be the groups on the left. 1103 01:08:52,290 --> 01:08:54,630 So this Ai group for furniture is 1104 01:08:54,630 --> 01:08:57,229 the set of all words that are furniture and three letters 1105 01:08:57,229 --> 01:08:59,149 long.
1106 01:08:59,149 --> 01:09:02,104 And then for each such choice on the left, 1107 01:09:02,104 --> 01:09:03,520 for each such choice on the right, 1108 01:09:03,520 --> 01:09:04,978 you can say is, are they compatible 1109 01:09:04,978 --> 01:09:08,180 by either putting an edge or not. 1110 01:09:08,180 --> 01:09:12,300 And so this is-- we got two out of three of these edges. 1111 01:09:12,300 --> 01:09:13,649 These two are satisfied. 1112 01:09:13,649 --> 01:09:14,330 This one's not. 1113 01:09:14,330 --> 01:09:18,099 So we get two more points for a total of eight. 1114 01:09:18,099 --> 01:09:19,640 This is for the maximization problem. 1115 01:09:19,640 --> 01:09:22,830 Minimization would be different. 1116 01:09:22,830 --> 01:09:27,109 Here's a verb, where we almost get cry, C-B-Y. 1117 01:09:27,109 --> 01:09:29,600 So we get two more points. 1118 01:09:29,600 --> 01:09:30,870 Here is another. 1119 01:09:30,870 --> 01:09:32,170 We want a verb. 1120 01:09:32,170 --> 01:09:36,270 Blank, A, Y. There are multiple such verbs. 1121 01:09:36,270 --> 01:09:37,920 You can think of them. 1122 01:09:37,920 --> 01:09:40,800 And on the other hand, we have a food, which 1123 01:09:40,800 --> 01:09:44,970 is supposed to be blank, E, Y. So a pretty good choice 1124 01:09:44,970 --> 01:09:46,529 would be P for that top letter. 1125 01:09:46,529 --> 01:09:49,800 Then you get pay exactly and almost get pea. 1126 01:09:49,800 --> 01:09:52,510 So a total score of 15. 1127 01:09:52,510 --> 01:09:58,585 And so this would be a solution to Max-Rep of cost 15. 1128 01:09:58,585 --> 01:10:00,275 It's not the best. 1129 01:10:00,275 --> 01:10:02,150 And if you stare at this example long enough, 1130 01:10:02,150 --> 01:10:05,630 you can actually get a perfect solution of score 18, where 1131 01:10:05,630 --> 01:10:06,630 there are no violations. 1132 01:10:06,630 --> 01:10:12,440 Basically, in particular you do say here and get soy for food. 1133 01:10:12,440 --> 01:10:16,015 AUDIENCE: So the sets on the right are 26 letters? 1134 01:10:16,015 --> 01:10:16,640 PROFESSOR: Yes. 1135 01:10:16,640 --> 01:10:19,760 The Bis here are the alphabet A through Z, 1136 01:10:19,760 --> 01:10:22,019 and the sets on the left are a set of words. 1137 01:10:22,019 --> 01:10:24,310 And then you're going to connect two of them by an edge 1138 01:10:24,310 --> 01:10:30,530 if that letter happens to match on the right, [INAUDIBLE] 1139 01:10:30,530 --> 01:10:31,660 letter. 1140 01:10:31,660 --> 01:10:33,650 So it's a little-- I mean, the mapping 1141 01:10:33,650 --> 01:10:34,650 is slightly complicated. 1142 01:10:34,650 --> 01:10:37,430 But this is a particular instance of Max-Rep. 1143 01:10:41,080 --> 01:10:49,430 So what-- well, we get some super extreme hardness 1144 01:10:49,430 --> 01:10:51,260 for these problems. 1145 01:10:51,260 --> 01:11:05,710 So let's start with epsilon, comma one gap Max-Rep 1146 01:11:05,710 --> 01:11:06,350 is NP-hard. 1147 01:11:16,160 --> 01:11:20,500 So what I mean by this is in the best situation, 1148 01:11:20,500 --> 01:11:22,860 you cover all of the super edges. 1149 01:11:22,860 --> 01:11:26,021 So the one means 100% of the super edges are covered. 1150 01:11:26,021 --> 01:11:28,270 Epsilon means that at most an epsilon fraction of them 1151 01:11:28,270 --> 01:11:29,170 are covered. 1152 01:11:29,170 --> 01:11:30,297 So that problem is NP-hard. 1153 01:11:30,297 --> 01:11:32,255 This is a bit stronger than what we had before. 
1154 01:11:32,255 --> 01:11:36,270 Before we had a particular constant, comma one or one 1155 01:11:36,270 --> 01:11:38,200 minus epsilon or something. 1156 01:11:38,200 --> 01:11:41,170 Here, for any constant epsilon, this is true. 1157 01:11:43,842 --> 01:11:45,550 And there's a similar result for Min-Rep. 1158 01:11:45,550 --> 01:11:49,160 The gap is just from one to one over epsilon. 1159 01:11:49,160 --> 01:11:51,870 So this means there is no constant factor approximation. 1160 01:11:51,870 --> 01:11:55,490 Max-Rep is not in APX. 1161 01:11:55,490 --> 01:11:57,460 But it's worse than that. 1162 01:11:57,460 --> 01:12:01,580 We need to assume slightly more. 1163 01:12:01,580 --> 01:12:08,336 In general, what you can show is that there is a constant p 1164 01:12:08,336 --> 01:12:18,930 such that if you can solve this gap 1165 01:12:18,930 --> 01:12:22,700 problem-- one over p to the k, so a very tiny fraction of things 1166 01:12:22,700 --> 01:12:25,590 satisfied, versus all of the super edges 1167 01:12:25,590 --> 01:12:37,340 covered-- then NP can be solved in n to the order k time. 1168 01:12:42,300 --> 01:12:44,470 So we haven't usually used this class. 1169 01:12:44,470 --> 01:12:47,230 Usually we talk about p, which is the union of all these 1170 01:12:47,230 --> 01:12:48,442 for constant k. 1171 01:12:48,442 --> 01:12:50,150 But here k doesn't have to be a constant. 1172 01:12:50,150 --> 01:12:52,180 It could be some function of n. 1173 01:12:52,180 --> 01:12:54,640 And in particular, if p does not equal NP, 1174 01:12:54,640 --> 01:12:57,310 then k constant is not possible. 1175 01:12:57,310 --> 01:13:00,305 So this result implies this result. 1176 01:13:03,170 --> 01:13:09,570 But if we let k get bigger than a constant, like log-log n 1177 01:13:09,570 --> 01:13:15,570 or something, then we get some separation between-- we 1178 01:13:15,570 --> 01:13:18,170 get a somewhat weaker statement here. 1179 01:13:18,170 --> 01:13:20,470 We know if p does not equal NP, we know 1180 01:13:20,470 --> 01:13:23,000 that NP is not contained in p. 1181 01:13:23,000 --> 01:13:26,380 But if we furthermore assume that NP doesn't 1182 01:13:26,380 --> 01:13:31,970 have subexponential solutions, or even very subexponential 1183 01:13:31,970 --> 01:13:34,690 solutions, then we get various gap 1184 01:13:34,690 --> 01:13:36,460 inapproximability bounds on Max-Rep. 1185 01:13:36,460 --> 01:13:39,920 So a reasonable limit, for example, 1186 01:13:39,920 --> 01:13:53,060 is that-- let's say we assume NP is not in n to the polylog n. 1187 01:13:56,336 --> 01:13:59,760 n to the polylog n is usually called quasi-polynomial. 1188 01:13:59,760 --> 01:14:01,290 It's almost polynomial. 1189 01:14:01,290 --> 01:14:04,870 Log n is kind of close to constant-- ish. 1190 01:14:04,870 --> 01:14:08,987 This is the same as two to the polylog n, but n to the polylog n 1191 01:14:08,987 --> 01:14:10,070 is a little clearer. 1192 01:14:10,070 --> 01:14:13,700 This is obviously close to polynomial, quite far 1193 01:14:13,700 --> 01:14:17,250 from exponential, which is two to the n, not polylog. 1194 01:14:17,250 --> 01:14:18,800 So very different from exponential. 1195 01:14:18,800 --> 01:14:23,390 So almost everyone believes NP does not admit 1196 01:14:23,390 --> 01:14:24,550 quasi-polynomial solutions. 1197 01:14:24,550 --> 01:14:26,890 Otherwise, all problems in NP would have to admit that.
1198 01:14:26,890 --> 01:14:28,566 3SAT, for example, people don't think 1199 01:14:28,566 --> 01:14:30,482 you can do better than some constant to the n. 1200 01:14:33,150 --> 01:14:39,030 Then what do we get when we plug in that value of k? 1201 01:14:39,030 --> 01:14:47,390 That there is no 2 to the log to the one minus epsilon 1202 01:14:47,390 --> 01:14:49,545 n approximation. 1203 01:14:52,200 --> 01:14:56,080 Or also, the same thing, that gap is hard. 1204 01:14:56,080 --> 01:14:57,960 Now, it's not NP-hard. 1205 01:14:57,960 --> 01:14:59,419 But it's as hard as this problem. 1206 01:14:59,419 --> 01:15:01,210 If you believe this is not true, then there 1207 01:15:01,210 --> 01:15:02,830 will be no polynomial time algorithm 1208 01:15:02,830 --> 01:15:06,530 to solve this factor gap Max-Rep. 1209 01:15:06,530 --> 01:15:07,890 So this is very large. 1210 01:15:07,890 --> 01:15:12,425 We've seen this before in this table of various results. 1211 01:15:12,425 --> 01:15:14,800 Near the bottom, there is a lower bound of two to the log 1212 01:15:14,800 --> 01:15:16,240 to one minus epsilon n. 1213 01:15:16,240 --> 01:15:18,630 This is not assuming p does not equal NP. 1214 01:15:18,630 --> 01:15:20,720 It's assuming this statement, NP does not 1215 01:15:20,720 --> 01:15:23,260 have quasi-polynomial algorithms. 1216 01:15:23,260 --> 01:15:25,100 And you see here our friends Max-Rep 1217 01:15:25,100 --> 01:15:28,750 and Min-Rep, two versions of label cover. 1218 01:15:28,750 --> 01:15:32,690 So I'm not going to prove these theorems. 1219 01:15:32,690 --> 01:15:35,170 But again, they're PCP style arguments 1220 01:15:35,170 --> 01:15:38,890 with some gap boosting. 1221 01:15:38,890 --> 01:15:43,850 But I would say most or a lot of approximation lower 1222 01:15:43,850 --> 01:15:47,900 bounds in this world today start from Max-Rep or Min-Rep 1223 01:15:47,900 --> 01:15:51,020 and reduce to the problem, usually using some kind 1224 01:15:51,020 --> 01:15:52,850 of gap-preserving reduction. 1225 01:15:52,850 --> 01:15:55,020 Maybe they lose the gap, but we have such a huge gap 1226 01:15:55,020 --> 01:15:56,830 to start with that even if you lose gap, 1227 01:15:56,830 --> 01:15:59,490 you still get pretty good results. 1228 01:15:59,490 --> 01:16:03,150 So a couple of quick examples here on the slides. 1229 01:16:03,150 --> 01:16:05,110 Directed Steiner forest. 1230 01:16:05,110 --> 01:16:07,200 Remember, you have a directed graph, 1231 01:16:07,200 --> 01:16:10,360 and you have a bunch of terminal pairs. 1232 01:16:10,360 --> 01:16:14,210 And you want to, in particular, connect via directed path 1233 01:16:14,210 --> 01:16:17,990 some Ais and Bjs, let's say. 1234 01:16:17,990 --> 01:16:22,050 And you want to do so by choosing the fewest vertices 1235 01:16:22,050 --> 01:16:23,470 in this graph. 1236 01:16:23,470 --> 01:16:27,560 So what I'm going to do, if I'm given my bipartite graph here 1237 01:16:27,560 --> 01:16:29,904 for Min-Rep, I'm just going to add-- 1238 01:16:29,904 --> 01:16:31,320 to represent that this is a group, 1239 01:16:31,320 --> 01:16:34,290 I'm going to add a vertex here connected by directed edges here. 1240 01:16:34,290 --> 01:16:35,870 And there's a group down here, so I'm 1241 01:16:35,870 --> 01:16:37,860 going to have downward edges down there.
1242 01:16:37,860 --> 01:16:40,850 And whenever there's a super edge from, say, 1243 01:16:40,850 --> 01:16:43,980 A2, capital A2 to capital B1, then 1244 01:16:43,980 --> 01:16:46,510 I'm going to say in my directed Steiner forest problem, 1245 01:16:46,510 --> 01:16:49,926 I want a path from little a2 to little b1. 1246 01:16:49,926 --> 01:16:51,800 So in general, whenever there's a super edge, 1247 01:16:51,800 --> 01:16:53,460 I add that constraint. 1248 01:16:53,460 --> 01:16:56,020 And then any solution to directed Steiner forest 1249 01:16:56,020 --> 01:16:58,030 will exactly be a solution to Min-Rep. 1250 01:16:58,030 --> 01:17:02,040 You're just forcing the addition of the Ais and Bis. 1251 01:17:02,040 --> 01:17:03,560 It's again an L-reduction. 1252 01:17:03,560 --> 01:17:05,880 You're just offsetting by a fixed additive amount. 1253 01:17:05,880 --> 01:17:07,791 So your gap OPT will be the same. 1254 01:17:07,791 --> 01:17:10,290 And so you get that this problem is just as hard as Min-Rep. 1255 01:17:15,870 --> 01:17:18,040 Well, this is another one from set cover. 1256 01:17:18,040 --> 01:17:20,950 You can also show node weighted Steiner trees. 1257 01:17:20,950 --> 01:17:22,410 Log n hard to approximate. 1258 01:17:22,410 --> 01:17:25,250 That's not from Min-Rep, but threw it in there 1259 01:17:25,250 --> 01:17:27,661 while we're on the topic of Steiner trees. 1260 01:17:27,661 --> 01:17:28,160 All right. 1261 01:17:28,160 --> 01:17:31,797 I want to mention one more thing quickly 1262 01:17:31,797 --> 01:17:33,005 in my zero minutes remaining. 1263 01:17:45,530 --> 01:17:48,900 And that is unique games. 1264 01:17:54,500 --> 01:17:58,030 So unique games is a special case of, say, 1265 01:17:58,030 --> 01:18:08,920 Max-Rep, or either label cover problem, where the edges in Ai 1266 01:18:08,920 --> 01:18:12,427 cross Bj form a matching. 1267 01:18:17,780 --> 01:18:19,810 For every choice in the left, there's 1268 01:18:19,810 --> 01:18:25,120 a unique choice on the right and vice versa that matches. 1269 01:18:25,120 --> 01:18:27,170 Well, there's at most one choice, I guess. 1270 01:18:27,170 --> 01:18:30,960 And I think that corresponds to these games. 1271 01:18:30,960 --> 01:18:32,840 Once you choose a word over here, 1272 01:18:32,840 --> 01:18:34,490 there's unique letter that matches. 1273 01:18:34,490 --> 01:18:36,180 The reverse is not true. 1274 01:18:36,180 --> 01:18:38,500 So in this problem, it's more like a star, 1275 01:18:38,500 --> 01:18:39,350 left to right star. 1276 01:18:39,350 --> 01:18:41,980 Once you choose this word, it's fixed 1277 01:18:41,980 --> 01:18:43,730 what you have to choose on the right side. 1278 01:18:43,730 --> 01:18:45,563 But if you choose a single letter over here, 1279 01:18:45,563 --> 01:18:47,650 it does not uniquely determine the word over here. 1280 01:18:47,650 --> 01:18:49,630 So unique games is quite a bit stronger. 1281 01:18:49,630 --> 01:18:51,630 You choose either side, it forces the other one, 1282 01:18:51,630 --> 01:18:54,640 if you want to cover that edge. 1283 01:18:54,640 --> 01:18:58,870 OK So far so good. 1284 01:18:58,870 --> 01:19:03,750 Unique games conjecture is that the special case is also hard. 1285 01:19:03,750 --> 01:19:14,190 Unique games conjecture is that epsilon one minus epsilon gap 1286 01:19:14,190 --> 01:19:18,451 unique game is NP-hard. 
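To make the matching condition concrete, here is a minimal sketch of a toy unique game; the constraints below are hypothetical. Each edge carries a permutation of the k labels, so choosing a label for one endpoint forces the label on the other endpoint, and the objective is the fraction of edges whose constraint holds.

```python
from itertools import product

k = 3  # number of labels per vertex
# Each edge (u, v) carries a permutation pi; the constraint is
# labels[v] == pi[labels[u]], a perfect matching between the two label sets.
constraints = {
    ("u1", "v1"): (1, 2, 0),
    ("u1", "v2"): (0, 2, 1),
    ("u2", "v1"): (2, 0, 1),
}
vertices = ["u1", "u2", "v1", "v2"]

def satisfied_fraction(labels):
    good = sum(labels[v] == pi[labels[u]] for (u, v), pi in constraints.items())
    return good / len(constraints)

# Brute force over all k^|V| labelings -- fine only for toy instances.
best = max(satisfied_fraction(dict(zip(vertices, choice)))
           for choice in product(range(k), repeat=len(vertices)))
print("best fraction of constraints satisfied:", best)
```

The conjecture is about how hard it is to tell games where some labeling satisfies almost every constraint from games where no labeling satisfies more than an epsilon fraction.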
1287 01:19:18,451 --> 01:19:19,950 Of course, there are weaker versions 1288 01:19:19,950 --> 01:19:21,917 of this conjecture that don't say NP-hard, 1289 01:19:21,917 --> 01:19:24,000 maybe assuming some weaker assumption that there's 1290 01:19:24,000 --> 01:19:27,350 no polynomial time algorithm. 1291 01:19:27,350 --> 01:19:30,030 Unlike every other complexity theoretic assumption 1292 01:19:30,030 --> 01:19:31,960 I have mentioned in this class, this one 1293 01:19:31,960 --> 01:19:33,600 is the subject of much debate. 1294 01:19:33,600 --> 01:19:35,320 Not everyone believes that it's true. 1295 01:19:35,320 --> 01:19:37,380 Some people believe that it's false. 1296 01:19:37,380 --> 01:19:40,300 Many people believe-- basically people don't know 1297 01:19:40,300 --> 01:19:41,480 is the short answer. 1298 01:19:41,480 --> 01:19:45,085 There's some somewhat scary evidence that it's not true. 1299 01:19:45,085 --> 01:19:47,710 There are slightly stronger forms of this that are definitely not 1300 01:19:47,710 --> 01:19:50,140 true, which I won't get into. 1301 01:19:50,140 --> 01:19:53,100 There is a subexponential algorithm for this problem. 1302 01:19:53,100 --> 01:19:56,750 But it's still up in the air. 1303 01:19:56,750 --> 01:19:58,750 A lot of people like to assume that this is true 1304 01:19:58,750 --> 01:20:02,950 because it makes life a lot more beautiful, 1305 01:20:02,950 --> 01:20:05,390 especially from an inapproximability standpoint. 1306 01:20:05,390 --> 01:20:09,480 So for example, MAX-2SAT, the best approximation algorithm is 1307 01:20:09,480 --> 01:20:11,081 0.940. 1308 01:20:11,081 --> 01:20:12,580 If you assume unique games, you 1309 01:20:12,580 --> 01:20:14,360 can prove a matching lower bound. 1310 01:20:14,360 --> 01:20:17,770 That was MAX-2SAT. For MAX-CUT, as was mentioned, 1311 01:20:17,770 --> 01:20:22,990 0.878 is the best upper bound by Goemans Williamson. 1312 01:20:22,990 --> 01:20:25,220 If you assume unique games, then that's also tight. 1313 01:20:25,220 --> 01:20:27,915 There's a matching this minus epsilon 1314 01:20:27,915 --> 01:20:33,330 or plus epsilon inapproximability result. 1315 01:20:33,330 --> 01:20:36,060 And vertex cover, two. 1316 01:20:36,060 --> 01:20:37,790 You probably know how to do two. 1317 01:20:37,790 --> 01:20:41,060 If you assume unique games, two is the right answer. 1318 01:20:41,060 --> 01:20:42,900 If you don't assume anything, the best 1319 01:20:42,900 --> 01:20:45,350 we know how to prove using all of this stuff 1320 01:20:45,350 --> 01:20:51,400 is 0.857 versus 0.5. 1321 01:20:51,400 --> 01:20:56,450 So it's nice to assume unique games is true. 1322 01:20:56,450 --> 01:21:00,120 A very cool result is, if you look over all the different CSP 1323 01:21:00,120 --> 01:21:03,030 problems that we've seen, all the MAX-CSP problems, 1324 01:21:03,030 --> 01:21:05,400 and you try to solve it using a particular kind 1325 01:21:05,400 --> 01:21:10,490 of semi-definite programming, there's an SDP relaxation. 1326 01:21:10,490 --> 01:21:13,040 If you don't know SDPs, ignore this sentence. 1327 01:21:13,040 --> 01:21:15,940 There's an SDP relaxation of all CSP problems. 1328 01:21:15,940 --> 01:21:18,310 You do the obvious thing. 1329 01:21:18,310 --> 01:21:21,240 And that SDP will have an integrality gap. 1330 01:21:21,240 --> 01:21:23,870 And if you believe unique games conjecture, 1331 01:21:23,870 --> 01:21:27,670 then that integrality gap equals the approximability factor, 1332 01:21:27,670 --> 01:21:28,400 one for one.
1333 01:21:28,400 --> 01:21:30,780 And so in this sense, if you're trying to solve any CSP 1334 01:21:30,780 --> 01:21:32,810 problem, semi-definite programming 1335 01:21:32,810 --> 01:21:37,050 is the ultimate tool for all approximation algorithms. 1336 01:21:37,050 --> 01:21:39,500 Because if there's a gap in the SDP, 1337 01:21:39,500 --> 01:21:41,230 you can prove an inapproximability result 1338 01:21:41,230 --> 01:21:43,270 of that minus epsilon. 1339 01:21:43,270 --> 01:21:44,682 So this is amazingly powerful. 1340 01:21:44,682 --> 01:21:46,890 The only catch is, we don't know whether unique games 1341 01:21:46,890 --> 01:21:48,350 conjecture is true. 1342 01:21:48,350 --> 01:21:51,080 And for that reason, I'm not going to spend more time on it. 1343 01:21:51,080 --> 01:21:56,180 But this gives you a flavor of this side of the field, the gap 1344 01:21:56,180 --> 01:22:00,090 preservation approach to approximation. 1345 01:22:00,090 --> 01:22:02,284 Any final questions? 1346 01:22:02,284 --> 01:22:03,248 Yeah. 1347 01:22:03,248 --> 01:22:05,158 AUDIENCE: If there's a [INAUDIBLE] algorithm 1348 01:22:05,158 --> 01:22:05,658 [INAUDIBLE]? 1349 01:22:09,550 --> 01:22:12,530 PROFESSOR: It's fine for a problem 1350 01:22:12,530 --> 01:22:15,570 to be slightly subexponential. 1351 01:22:15,570 --> 01:22:19,970 It's like two to the n to the epsilon or something. 1352 01:22:19,970 --> 01:22:22,600 So when you do an NP reduction, you 1353 01:22:22,600 --> 01:22:24,380 can blow things up by a polynomial factor. 1354 01:22:24,380 --> 01:22:28,000 And so that n to the epsilon becomes n again. 1355 01:22:28,000 --> 01:22:29,890 So if you start from 3SAT where we 1356 01:22:29,890 --> 01:22:33,600 don't believe there's a subexponential thing, when you 1357 01:22:33,600 --> 01:22:36,910 reduce to this, you might end up putting it-- 1358 01:22:36,910 --> 01:22:38,870 you lose that polynomial factor. 1359 01:22:38,870 --> 01:22:42,489 And so it's not a contradiction. 1360 01:22:42,489 --> 01:22:43,030 A bit subtle. 1361 01:22:45,840 --> 01:22:46,620 Cool. 1362 01:22:46,620 --> 01:22:48,910 See you Thursday.