1 00:00:00,080 --> 00:00:01,770 The following content is provided 2 00:00:01,770 --> 00:00:04,010 under a Creative Commons license. 3 00:00:04,010 --> 00:00:06,860 Your support will help MIT OpenCourseWare continue 4 00:00:06,860 --> 00:00:10,720 to offer high quality educational resources for free. 5 00:00:10,720 --> 00:00:13,330 To make a donation or view additional materials 6 00:00:13,330 --> 00:00:17,226 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,226 --> 00:00:17,851 at ocw.mit.edu. 8 00:00:22,720 --> 00:00:26,270 PROFESSOR: Today, we are going to do computational complexity. 9 00:00:26,270 --> 00:00:28,989 This is rather different from every other thing 10 00:00:28,989 --> 00:00:30,030 we've seen in this class. 11 00:00:32,729 --> 00:00:36,120 This class is basically about polynomial time algorithms 12 00:00:36,120 --> 00:00:38,580 and problems where we can solve your problem 13 00:00:38,580 --> 00:00:40,290 in polynomial time. 14 00:00:40,290 --> 00:00:43,130 And today, it's about when you can't do that. 15 00:00:43,130 --> 00:00:44,880 Sometimes, we can prove you can't do that. 16 00:00:44,880 --> 00:00:47,020 Sometimes, we're pretty sure you can't do that. 17 00:00:47,020 --> 00:00:49,140 But it's all about negative results 18 00:00:49,140 --> 00:00:52,850 when your problems are really complex. 19 00:00:52,850 --> 00:00:55,040 And there's a lot of fun topics, here. 20 00:00:55,040 --> 00:00:59,280 This is the topic of entire classes, like 6045. 21 00:00:59,280 --> 00:01:02,950 We're just going to get a 1 hour flavor of it. 22 00:01:02,950 --> 00:01:04,519 So think of it as a high level intro. 23 00:01:04,519 --> 00:01:06,893 But we're going to prove real theorems and do real things 24 00:01:06,893 --> 00:01:09,580 and you'll get a sense of how all this works. 25 00:01:09,580 --> 00:01:13,680 So I'm going to start out with three complexity classes-- 26 00:01:13,680 --> 00:01:21,390 P, EXP, and R. How many people know what P is? 
27 00:01:21,390 --> 00:01:23,760 And it is? 28 00:01:23,760 --> 00:01:25,990 Polynomial time. 29 00:01:25,990 --> 00:01:28,510 More precisely, it's the set of all problems 30 00:01:28,510 --> 00:01:30,060 you can solve in polynomial time. 31 00:01:35,590 --> 00:01:37,090 This is what the class is all about. 32 00:01:39,736 --> 00:01:41,110 Almost every problem we have seen 33 00:01:41,110 --> 00:01:44,410 in this class-- there's one exception-- is 34 00:01:44,410 --> 00:01:48,960 in P. Does anyone know the exception? 35 00:01:48,960 --> 00:01:51,150 It's a good puzzle for you. 36 00:01:51,150 --> 00:01:51,940 Not NP. 37 00:01:51,940 --> 00:01:52,440 What's next? 38 00:01:52,440 --> 00:01:55,120 EXP. 39 00:01:55,120 --> 00:01:56,660 How many people know what EXP is? 40 00:01:56,660 --> 00:01:58,810 Or you can guess. 41 00:01:58,810 --> 00:02:00,400 Any guesses? 42 00:02:00,400 --> 00:02:01,569 Exponential. 43 00:02:01,569 --> 00:02:04,110 These are all the problems you can solve in exponential time. 44 00:02:21,190 --> 00:02:23,210 If you want to be formal about it, in this case, 45 00:02:23,210 --> 00:02:29,200 exponential means 2 to the n to some constant. 46 00:02:29,200 --> 00:02:31,700 So not just 2 to the n, but also 2 to the n squared, 2 to the n 47 00:02:31,700 --> 00:02:32,199 cubed. 48 00:02:32,199 --> 00:02:34,350 Those are all considered-- exponential 49 00:02:34,350 --> 00:02:38,050 of a polynomial is considered in the class EXP. 50 00:02:38,050 --> 00:02:42,010 Now, basically, almost every problem you can dream of you 51 00:02:42,010 --> 00:02:43,170 can solve in EXP. 52 00:02:43,170 --> 00:02:45,370 Exponential time is so much time. 53 00:02:45,370 --> 00:02:47,705 And this class has always been about taking things that 54 00:02:47,705 --> 00:02:51,460 are obviously in EXP and showing that they're actually in P.
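To make "2 to the n to some constant" concrete, here is a minimal Python sketch of the two kinds of running-time bounds just described. The function names `polynomial_steps` and `exponential_steps` are made up for illustration and are not from the lecture; the exponent c is the constant the professor mentions.

```python
def polynomial_steps(n: int, c: int) -> int:
    """A polynomial bound: n to a constant power (the kind of bound behind P)."""
    return n ** c


def exponential_steps(n: int, c: int) -> int:
    """An exponential bound in the EXP sense: 2 to the power (n to a constant)."""
    return 2 ** (n ** c)


# Compare n^3 against 2^n and 2^(n^2) for a few small input sizes.
for n in (2, 4, 8, 16):
    print(n, polynomial_steps(n, 3), exponential_steps(n, 1), exponential_steps(n, 2))
```

Any running time under some `polynomial_steps(n, c)` is also under `exponential_steps(n, c)` for large enough n, which is the sense in which everything in P is also in EXP.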
55 00:02:51,460 --> 00:02:53,120 So if you want to draw a picture, 56 00:02:53,120 --> 00:02:54,870 you could say, OK, here's all the problems 57 00:02:54,870 --> 00:02:57,070 we can solve in polynomial time. 58 00:02:57,070 --> 00:03:00,287 Here's all the problems we can solve in exponential time. 59 00:03:00,287 --> 00:03:01,620 And there are problems out here. 60 00:03:01,620 --> 00:03:03,390 These are different classes. 61 00:03:03,390 --> 00:03:06,070 And we want to sort of bring things 62 00:03:06,070 --> 00:03:09,620 into here as much as possible. 63 00:03:09,620 --> 00:03:11,940 I actually want to draw this picture 64 00:03:11,940 --> 00:03:16,050 in a different way, which is as a horizontal line. 65 00:03:20,760 --> 00:03:21,715 So an axis. 66 00:03:24,860 --> 00:03:27,949 I'm going to call this computational difficulty. 67 00:03:27,949 --> 00:03:29,740 You could call it computational complexity, 68 00:03:29,740 --> 00:03:30,930 but that's a bit of a loaded term that 69 00:03:30,930 --> 00:03:32,470 actually has formal meaning. 70 00:03:32,470 --> 00:03:34,000 Difficulty is nice and vague. 71 00:03:34,000 --> 00:03:36,040 So I can draw an abstract picture. 72 00:03:36,040 --> 00:03:38,460 This is not a true diagram, but it's 73 00:03:38,460 --> 00:03:40,770 a very good guideline of what's going on. 74 00:03:40,770 --> 00:03:46,905 So we have-- I'm going to draw-- I believe-- three notches. 75 00:03:50,200 --> 00:03:52,770 No, eventually four, so let me give myself some room. 76 00:03:56,480 --> 00:04:01,920 We have over here, the easy problems are P. Then, 77 00:04:01,920 --> 00:04:05,180 we have these problems, which are EXP. 78 00:04:05,180 --> 00:04:08,930 We're going to fill in something in the middle. 79 00:04:08,930 --> 00:04:11,910 And then this is something called R. 80 00:04:11,910 --> 00:04:13,410 So you've got P is everything, here. 81 00:04:13,410 --> 00:04:19,140 EXP is all the way out to here, in some abstract view. 
82 00:04:19,140 --> 00:04:23,290 The next thing is R. How many people know what R is? 83 00:04:23,290 --> 00:04:26,250 This one, I had to look up. 84 00:04:26,250 --> 00:04:29,780 It's not usually given a name. 85 00:04:29,780 --> 00:04:30,805 No one. 86 00:04:30,805 --> 00:04:31,430 Teaching staff? 87 00:04:31,430 --> 00:04:32,445 You guys know it? 88 00:04:36,500 --> 00:04:39,900 These are all problems solvable in finite time. 89 00:04:39,900 --> 00:04:40,910 R stands for finite. 90 00:04:49,980 --> 00:04:52,280 R stands for recursive. 91 00:04:52,280 --> 00:04:54,530 Recursive used to mean something completely different, 92 00:04:54,530 --> 00:04:56,863 back in the '30s, when people were thinking about what's 93 00:04:56,863 --> 00:04:58,470 computable, what's not computable. 94 00:04:58,470 --> 00:05:02,390 These are, basically, solvable problems, computable problems. 95 00:05:02,390 --> 00:05:04,890 Finite time is a reasonable requirement, I think, 96 00:05:04,890 --> 00:05:06,105 for all algorithms. 97 00:05:06,105 --> 00:05:09,690 And that's R. Now, I've drawn this arrow 98 00:05:09,690 --> 00:05:13,106 to keep going because there are problems out here. 99 00:05:13,106 --> 00:05:14,605 It's kind of discouraging, but there 100 00:05:14,605 --> 00:05:17,080 are problems that are unsolvable. 101 00:05:17,080 --> 00:05:19,977 In fact, most problems are unsolvable. 102 00:05:19,977 --> 00:05:21,060 We're going to prove that. 103 00:05:21,060 --> 00:05:23,340 It's actually really easy to prove. 104 00:05:23,340 --> 00:05:28,360 Kind of depressing, but true. 105 00:05:28,360 --> 00:05:31,820 Let me start with some examples before we get to that proof. 106 00:05:36,200 --> 00:05:40,730 So I'm writing examples of some things we've seen. 107 00:05:40,730 --> 00:05:44,170 So here's an example of a problem we've seen. 108 00:05:47,740 --> 00:05:49,095 Negative-weight cycle detection. 109 00:05:52,314 --> 00:05:54,440 I give you a graph-- a weighted graph. 
110 00:05:54,440 --> 00:05:58,090 I want to know does it have any negative-weight cycles? 111 00:05:58,090 --> 00:06:00,560 What classes is this problem in? 112 00:06:00,560 --> 00:06:03,050 P. We know how to solve this in polynomial time-- 113 00:06:03,050 --> 00:06:06,260 in VE time-- using Bellman-Ford. 114 00:06:06,260 --> 00:06:08,650 VE time-- well, that finds negative-weight cycles 115 00:06:08,650 --> 00:06:09,540 reachable from s. 116 00:06:09,540 --> 00:06:11,370 But, I guess, if you add a source that 117 00:06:11,370 --> 00:06:14,390 can reach anywhere-- zero weight-- then 118 00:06:14,390 --> 00:06:18,390 that'll tell you overall that it's in P. 119 00:06:18,390 --> 00:06:19,640 It's also in EXP, of course. 120 00:06:19,640 --> 00:06:21,322 Everything in P is also in EXP. 121 00:06:21,322 --> 00:06:23,280 Because if you can solve it in polynomial time, 122 00:06:23,280 --> 00:06:25,290 you can solve it in exponential time. 123 00:06:25,290 --> 00:06:29,230 This is at most exponential time. 124 00:06:29,230 --> 00:06:30,048 At most polynomial. 125 00:06:33,440 --> 00:06:35,680 Here's a problem we haven't seen. 126 00:06:35,680 --> 00:06:37,160 But it's pretty cool. 127 00:06:37,160 --> 00:06:39,190 N by n Chess. 128 00:06:39,190 --> 00:06:41,320 So this is the problem I give you. 129 00:06:41,320 --> 00:06:43,440 So we're on an n by n board, and I give you 130 00:06:43,440 --> 00:06:45,690 a whole bunch of pieces on the board, 131 00:06:45,690 --> 00:06:48,890 and I want to know does White win from here? 132 00:06:48,890 --> 00:06:51,660 I say it's White to move or Black to move, 133 00:06:51,660 --> 00:06:55,250 and who's going to win from this position? 134 00:06:55,250 --> 00:06:59,085 This problem can be solved in exponential time. 135 00:06:59,085 --> 00:07:02,080 You can sort of play out all possible strategies 136 00:07:02,080 --> 00:07:05,210 and see who wins. 137 00:07:05,210 --> 00:07:10,060 And it's not in P.
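Stepping back to the negative-weight cycle example from a moment ago: the "add a source that can reach anywhere with zero weight" trick can be sketched in Python as below. This is only an illustrative sketch, not code from the course. The function name `has_negative_cycle` and the edge-list format `(u, v, w)` are assumptions. Starting every distance at 0 plays the role of the extra zero-weight source.

```python
def has_negative_cycle(num_vertices, edges):
    """Decide whether a directed weighted graph has any negative-weight cycle.

    edges is a list of (u, v, w) triples with vertices numbered 0..num_vertices-1.
    Initializing every distance to 0 is equivalent to adding a virtual source
    with a zero-weight edge to every vertex, so cycles anywhere are detected.
    """
    dist = [0] * num_vertices

    # Bellman-Ford: V - 1 rounds of relaxing every edge, O(V * E) time overall.
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances converged, so no negative cycle exists

    # If some edge can still be relaxed, a negative-weight cycle exists.
    return any(dist[u] + w < dist[v] for u, v, w in edges)
```

The answer is a plain yes or no, and the running time is polynomial in the size of the graph, which is what places this problem in P.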
There's no polynomial time algorithm 138 00:07:10,060 --> 00:07:12,150 to play generalized Chess. 139 00:07:12,150 --> 00:07:15,250 This sort of captures why Chess-- even at eight by eight 140 00:07:15,250 --> 00:07:17,510 Chess-- is hard-- because there's no general way 141 00:07:17,510 --> 00:07:19,220 to do it. 142 00:07:19,220 --> 00:07:23,210 So there's no special way to do it, probably. 143 00:07:23,210 --> 00:07:25,780 Computational complexity is all about order of growth. 144 00:07:25,780 --> 00:07:27,770 So we can't analyze eight by eight Chess, 145 00:07:27,770 --> 00:07:29,290 but we can analyze n by n Chess. 146 00:07:29,290 --> 00:07:32,290 And that gives us a flavor of why 8 by 8 is so difficult. 147 00:07:32,290 --> 00:07:37,000 Go is also in EXP, but not in P-- lots of games 148 00:07:37,000 --> 00:07:41,430 are in this category, lots of complicated games, let's say. 149 00:07:41,430 --> 00:07:45,130 And so this is a first example of a problem that we know we 150 00:07:45,130 --> 00:07:48,930 cannot solve in polynomial time. 151 00:07:48,930 --> 00:07:50,880 Bad news. 152 00:07:50,880 --> 00:07:53,070 I also talked about Tetris a little bit. 153 00:07:56,940 --> 00:07:58,920 Unlike the Tetris training, which we saw, 154 00:07:58,920 --> 00:08:00,630 this is sort of realistic Tetris-- 155 00:08:00,630 --> 00:08:02,340 all the rules of Tetris. 156 00:08:02,340 --> 00:08:04,990 The only catch is that I tell you all the pieces that 157 00:08:04,990 --> 00:08:06,470 are going to come in advance. 158 00:08:06,470 --> 00:08:08,442 Because, otherwise, it's some random process 159 00:08:08,442 --> 00:08:11,025 and it's kind of hard to think about what's the best strategy. 160 00:08:11,025 --> 00:08:13,524 But if I tell you what's going to come-- 161 00:08:13,524 --> 00:08:14,940 say it's a pseudo-random generator 162 00:08:14,940 --> 00:08:16,365 and you know how it works. 163 00:08:16,365 --> 00:08:17,990 You know all the pieces that will come.
164 00:08:17,990 --> 00:08:22,830 I want to know can I survive from a given initial board mess 165 00:08:22,830 --> 00:08:24,980 and for a given sequence of pieces. 166 00:08:24,980 --> 00:08:27,740 This can also be solved in exponential time. 167 00:08:27,740 --> 00:08:29,575 Just try all the possibilities. 168 00:08:34,780 --> 00:08:45,760 We don't know whether it's in P. We're pretty sure 169 00:08:45,760 --> 00:08:47,920 it's not in P. And by the end of today's lecture, 170 00:08:47,920 --> 00:08:51,500 you'll understand why we think it's not in P. 171 00:08:51,500 --> 00:08:54,930 But it's going to be somewhere in between here. 172 00:08:54,930 --> 00:08:57,110 Tetris is actually right here. 173 00:08:57,110 --> 00:08:59,410 But I haven't defined what right here is yet. 174 00:09:06,040 --> 00:09:10,460 And then the next one is halting problem. 175 00:09:24,720 --> 00:09:27,140 So halting problem is particularly cool, 176 00:09:27,140 --> 00:09:29,320 as we'll see-- or interesting. 177 00:09:29,320 --> 00:09:34,710 It's the problem of given a computer program-- Python, 178 00:09:34,710 --> 00:09:37,540 whatever, it doesn't really matter what language. 179 00:09:37,540 --> 00:09:42,150 They're all the same in a theoretical sense-- 180 00:09:42,150 --> 00:09:43,005 does it ever halt? 181 00:09:46,680 --> 00:09:50,335 Does it ever stop running, return a result, whatever? 182 00:09:53,070 --> 00:09:55,600 This would be really handy-- you're writing some code, 183 00:09:55,600 --> 00:09:58,360 and you've run it for 5 hours, and you 184 00:09:58,360 --> 00:10:00,120 don't know is that because there's a bug 185 00:10:00,120 --> 00:10:01,453 and you've got an infinite loop? 186 00:10:01,453 --> 00:10:04,000 Or is it just because it's really slow? 187 00:10:04,000 --> 00:10:08,137 So you'd like to give it to some program-- checking 188 00:10:08,137 --> 00:10:09,845 program-- that says will this run forever 189 00:10:09,845 --> 00:10:11,532 or will it terminate. 
190 00:10:11,532 --> 00:10:13,160 That's the halting problem. 191 00:10:13,160 --> 00:10:17,080 And this problem is not in R. There 192 00:10:17,080 --> 00:10:20,260 is no correct algorithm for solving this problem. 193 00:10:20,260 --> 00:10:24,270 There's no way to tell, given an arbitrary program, 194 00:10:24,270 --> 00:10:25,840 whether it will halt. 195 00:10:25,840 --> 00:10:28,130 Now, in some situations-- take the empty program-- 196 00:10:28,130 --> 00:10:29,630 I can tell that it halts. 197 00:10:29,630 --> 00:10:33,580 Or I take some special simple class of programs, 198 00:10:33,580 --> 00:10:36,890 I can tell whether they halt or determine that they don't halt. 199 00:10:36,890 --> 00:10:40,890 But there's no algorithm that solves it for all programs, 200 00:10:40,890 --> 00:10:42,380 in finite time. 201 00:10:42,380 --> 00:10:44,330 In infinite time, I can solve it. 202 00:10:44,330 --> 00:10:46,540 Just run it. 203 00:10:46,540 --> 00:10:48,340 Run the program. 204 00:10:48,340 --> 00:10:50,340 Given finite time, there's no way to solve this. 205 00:10:50,340 --> 00:10:53,370 And so this is a little bit beyond what we can prove today. 206 00:10:53,370 --> 00:10:54,930 It's not that hard to prove, but it 207 00:10:54,930 --> 00:10:56,440 takes half an hour or something. 208 00:10:56,440 --> 00:10:57,690 I want to get to other things. 209 00:10:57,690 --> 00:11:02,020 But if you take 6045, they'll prove this. 210 00:11:02,020 --> 00:11:03,990 What I want to show you instead is an easier 211 00:11:03,990 --> 00:11:29,800 result-- that almost every problem is not in R. 212 00:11:29,800 --> 00:11:32,680 I need one term, though, which is decision problems. 213 00:11:32,680 --> 00:11:35,050 All of these problems, I set it up in a way 214 00:11:35,050 --> 00:11:37,556 that the answer is binary-- yes or no. 215 00:11:37,556 --> 00:11:38,930 Is there a negative-weight cycle? 216 00:11:38,930 --> 00:11:41,090 Yes or no? 
217 00:11:41,090 --> 00:11:43,950 Does White win from this position in Chess? 218 00:11:43,950 --> 00:11:46,000 Can you survive in Tetris? 219 00:11:46,000 --> 00:11:48,240 And does this program halt? 220 00:11:48,240 --> 00:11:51,430 For various reasons-- basically convenience-- 221 00:11:51,430 --> 00:11:53,320 the whole field of computational complexity 222 00:11:53,320 --> 00:11:56,550 focuses on decision problems. 223 00:11:56,550 --> 00:11:59,370 And, in fact-- so decision problems 224 00:11:59,370 --> 00:12:01,080 are ones where the answer is yes or no. 225 00:12:01,080 --> 00:12:02,920 That's all. 226 00:12:02,920 --> 00:12:03,880 Why? 227 00:12:03,880 --> 00:12:05,520 Essentially because it doesn't matter. 228 00:12:05,520 --> 00:12:07,920 If you take a problem you care about, 229 00:12:07,920 --> 00:12:10,200 you can convert it into a decision problem. 230 00:12:10,200 --> 00:12:12,760 We can see examples of that later. 231 00:12:12,760 --> 00:12:14,290 Decision problems are basically as 232 00:12:14,290 --> 00:12:17,989 hard as optimization problems or whatever. 233 00:12:17,989 --> 00:12:19,530 But let's focus on decision problems. 234 00:12:19,530 --> 00:12:20,920 The answer is yes or no. 235 00:12:20,920 --> 00:12:23,590 Claim that most of them are uncomputable. 236 00:12:23,590 --> 00:12:26,100 And we can prove this pretty easily 237 00:12:26,100 --> 00:12:30,390 if you know a bit of set theory, I guess. 238 00:12:35,309 --> 00:12:37,350 On the one hand, I have problems I want to solve. 239 00:12:37,350 --> 00:12:38,520 These are decision problems. 240 00:12:38,520 --> 00:12:41,220 And on the other hand, I have algorithms, 241 00:12:41,220 --> 00:12:42,820 or computer programs to solve them. 
242 00:12:42,820 --> 00:12:44,445 I'm going to think of computer programs 243 00:12:44,445 --> 00:12:46,640 because they're more precise. Algorithms can 244 00:12:46,640 --> 00:12:50,230 be a little bit nebulous when thinking about pseudocode-- 245 00:12:50,230 --> 00:12:51,460 what's valid, what's invalid. 246 00:12:51,460 --> 00:12:53,610 But computer programs are very clear. 247 00:12:53,610 --> 00:12:55,049 I give you some code. 248 00:12:55,049 --> 00:12:56,090 You throw it into Python. 249 00:12:56,090 --> 00:12:57,500 Either it works or it doesn't. 250 00:12:57,500 --> 00:12:59,552 And it does something. 251 00:12:59,552 --> 00:13:00,260 Runs for a while. 252 00:13:04,080 --> 00:13:08,750 How can I think about the space of all possible programs? 253 00:13:08,750 --> 00:13:12,020 Well, programs are things you type into a computer 254 00:13:12,020 --> 00:13:13,400 in ASCII, whatever. 255 00:13:13,400 --> 00:13:16,380 In the end, you can think of it as just a binary string. 256 00:13:16,380 --> 00:13:18,060 Somehow it gets encoded in binary. 257 00:13:18,060 --> 00:13:21,870 Everything is reduced to binary in the end, on a computer. 258 00:13:21,870 --> 00:13:27,340 So this is a binary string. 259 00:13:27,340 --> 00:13:29,430 Now, you can also think of a binary string 260 00:13:29,430 --> 00:13:33,280 as representing a number, in binary. 261 00:13:33,280 --> 00:13:34,870 So you can also think of a program, 262 00:13:34,870 --> 00:13:38,960 then, as a natural number-- some number between 0 and infinity. 263 00:13:38,960 --> 00:13:41,870 And an integer. 264 00:13:41,870 --> 00:13:45,850 So usually we represent this as math bold N. 265 00:13:45,850 --> 00:13:48,470 That's just 0, 1, 2, 3. 266 00:13:48,470 --> 00:13:50,670 You can think of every program as ultimately 267 00:13:50,670 --> 00:13:51,960 reducing to an integer. 268 00:13:51,960 --> 00:13:53,615 It's a big integer, but, hey. 269 00:13:53,615 --> 00:13:55,770 It's an integer.
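As a rough illustration of "every program reduces to an integer", the Python sketch below turns source text into a natural number and back. The function names are hypothetical, and the choice of UTF-8 with a leading marker byte is just one possible encoding, assumed here so that no two distinct programs map to the same number.

```python
def program_to_number(source: str) -> int:
    """Encode program text as a natural number via its binary representation."""
    data = source.encode("utf-8")
    # Prepend a 0x01 marker byte so leading zero bytes in the text are not lost.
    return int.from_bytes(b"\x01" + data, "big")


def number_to_program(n: int) -> str:
    """Invert program_to_number for any number that it produced."""
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")  # drop the marker byte


code = "print('hello')"
n = program_to_number(code)
print(n)                     # one (big) integer standing for the whole program
print(number_to_program(n))  # the original text again
```

Not every integer decodes to a valid program, but every program gets its own integer, which is all the counting argument needs.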
270 00:13:55,770 --> 00:13:57,647 So that's the space of all programs. 271 00:13:57,647 --> 00:14:00,230 Now, I want to think about the space of all decision problems. 272 00:14:02,900 --> 00:14:06,474 So how can I define a decision problem? 273 00:14:06,474 --> 00:14:08,640 Well, the natural way to think of a decision problem 274 00:14:08,640 --> 00:14:12,650 is as a function that maps inputs to yes or no. 275 00:14:19,350 --> 00:14:28,430 Function from inputs to yes or no. 276 00:14:28,430 --> 00:14:32,310 Or you can think of that as 1 and 0. 277 00:14:32,310 --> 00:14:34,310 So what's an input? 278 00:14:34,310 --> 00:14:36,220 Well, an input is a binary string. 279 00:14:36,220 --> 00:14:39,000 So an input is a number-- a natural number. 280 00:14:41,790 --> 00:14:51,580 Input is a binary string, which we can think of as being in N. 281 00:14:51,580 --> 00:14:58,570 So we've got a function from N to {0, 1}. 282 00:14:58,570 --> 00:15:03,000 So another way to represent one of these functions 283 00:15:03,000 --> 00:15:04,220 is as a table. 284 00:15:04,220 --> 00:15:06,070 I could just write down all the answers. 285 00:15:06,070 --> 00:15:10,110 So I've got, well, the input could be 0-- the number 0. 286 00:15:10,110 --> 00:15:11,969 And then, maybe it's a 0. 287 00:15:11,969 --> 00:15:14,260 Input could be 1 and then, maybe, output is 0. 288 00:15:14,260 --> 00:15:21,750 Then, the input could be 2, 3, 4, 5, 1, 0, 1, 1, whatever. 289 00:15:21,750 --> 00:15:24,510 So I could write the table of all answers. 290 00:15:24,510 --> 00:15:28,430 This is another way to write down such a function. 291 00:15:28,430 --> 00:15:32,110 What we have, here, is an infinite string of bits. 292 00:15:32,110 --> 00:15:34,290 Each of them could be 0 or 1. 293 00:15:34,290 --> 00:15:36,810 It would be a different problem. 294 00:15:36,810 --> 00:15:37,850 But they all exist. 295 00:15:37,850 --> 00:15:41,060 Any infinite string of bits represents a decision problem.
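Here is a small Python sketch of the table view just described: a decision problem as a function from natural numbers to 0 or 1, and the first few entries of its answer table read as bits after a binary point. The example problem `is_even` is an arbitrary stand-in chosen for illustration, not one from the lecture, and only a finite prefix of the infinite table can ever be printed.

```python
from fractions import Fraction


def is_even(n: int) -> int:
    """A sample decision problem: answer 1 (yes) if n is even, else 0 (no)."""
    return 1 if n % 2 == 0 else 0


def table_prefix(problem, length: int) -> str:
    """First `length` answers of the problem, for inputs 0, 1, 2, ..."""
    return "".join(str(problem(n)) for n in range(length))


def prefix_as_fraction(bits: str) -> Fraction:
    """Read the bits as the binary expansion 0.b0 b1 b2 ... of a number in [0, 1]."""
    return sum((Fraction(int(b), 2 ** (i + 1)) for i, b in enumerate(bits)), Fraction(0))


bits = table_prefix(is_even, 6)
print(bits)
print(prefix_as_fraction(bits))
```

Changing any single bit of the table gives a different function, so each infinite bit string is its own decision problem.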
296 00:15:41,060 --> 00:15:42,750 They're the same thing. 297 00:15:42,750 --> 00:15:45,950 So a decision problem is an infinite string of bits. 298 00:15:45,950 --> 00:15:49,676 A program is a finite string of bits. 299 00:15:49,676 --> 00:15:52,260 These are different things. 300 00:15:52,260 --> 00:15:54,170 One way to see that they're different 301 00:15:54,170 --> 00:15:57,640 is put a decimal point, here. 302 00:15:57,640 --> 00:15:59,630 Now, this infinite string of bits 303 00:15:59,630 --> 00:16:03,710 is a number-- a real number-- between 0 and 1. 304 00:16:03,710 --> 00:16:04,720 It's written in binary. 305 00:16:04,720 --> 00:16:07,627 You may not be used to binary point. 306 00:16:07,627 --> 00:16:08,960 This dot is not a decimal point. 307 00:16:08,960 --> 00:16:10,180 It's a binary point. 308 00:16:10,180 --> 00:16:12,970 But, hey. 309 00:16:12,970 --> 00:16:16,040 Any real number can be expressed by an infinite string of bits 310 00:16:16,040 --> 00:16:18,820 in this way-- any real number between 0 and 1. 311 00:16:22,210 --> 00:16:31,940 So a decision problem is basically 312 00:16:31,940 --> 00:16:35,260 something in R, the set of all real numbers, 313 00:16:35,260 --> 00:16:38,555 whereas a program is something in N, the set of all integers. 314 00:16:41,750 --> 00:16:45,360 And the thing is, the number of real numbers 315 00:16:45,360 --> 00:16:50,460 is much, much bigger than the number of integers. 316 00:16:50,460 --> 00:16:53,110 In a formal sense, we call this one uncountably infinite, 317 00:16:53,110 --> 00:16:55,060 and this one is countably infinite. 318 00:16:55,060 --> 00:16:56,830 I'm not going to prove that here, today. 319 00:16:56,830 --> 00:16:59,010 You may have seen that proof. 320 00:16:59,010 --> 00:17:01,320 It's pretty simple. 321 00:17:01,320 --> 00:17:02,350 And that's bad news. 
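The proof skipped here is the classic diagonal argument. A finite Python sketch of the idea, under the assumption that we are handed a list of programs that always halt and answer 0 or 1: build a decision problem that disagrees with the i-th program on input i, so no program in the list solves it. The names below are hypothetical, and the real argument applies the same construction to an infinite enumeration of all programs.

```python
def diagonal_problem(programs):
    """Return a decision problem that differs from programs[i] on input i.

    `programs` is a list of functions from natural numbers to 0 or 1.
    The returned function is only defined on inputs 0..len(programs)-1 here.
    """
    def problem(i: int) -> int:
        return 1 - programs[i](i)  # flip the i-th program's answer on input i
    return problem


# Three toy "programs", each computing some decision problem.
programs = [
    lambda n: 0,        # always says no
    lambda n: n % 2,    # says yes on odd inputs
    lambda n: 1,        # always says yes
]

d = diagonal_problem(programs)
print([d(i) for i in range(len(programs))])
```

Whatever list you start from, the diagonal problem is missing from it, which is why no enumeration of programs can cover every decision problem.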
322 00:17:02,350 --> 00:17:04,829 That means that there are way more problems 323 00:17:04,829 --> 00:17:07,550 than there are programs to solve them. 324 00:17:07,550 --> 00:17:12,890 So this means almost every problem that we could conceive 325 00:17:12,890 --> 00:17:16,525 of is unsolvable by every program. 326 00:17:30,600 --> 00:17:33,245 And this is pretty depressing the first time I saw it. 327 00:17:33,245 --> 00:17:35,120 That's why we put it at the end of the class. 328 00:17:37,950 --> 00:17:40,160 I think you get all existential. 329 00:17:40,160 --> 00:17:42,040 I mean the thing is every program only 330 00:17:42,040 --> 00:17:43,201 solves one problem. 331 00:17:43,201 --> 00:17:44,700 It takes some input, and it's either 332 00:17:44,700 --> 00:17:46,247 going to output yes or no. 333 00:17:46,247 --> 00:17:48,580 And if it's wrong on any of the inputs, then it's wrong. 334 00:17:48,580 --> 00:17:51,010 So it's going to give an answer. 335 00:17:51,010 --> 00:17:52,570 Say it's a deterministic algorithm. 336 00:17:52,570 --> 00:17:55,950 No random numbers or things. 337 00:17:55,950 --> 00:17:57,670 Then, there's just not enough programs 338 00:17:57,670 --> 00:18:01,402 to go around if each program only solves one problem. 339 00:18:01,402 --> 00:18:02,610 This is the end of the proof. 340 00:18:02,610 --> 00:18:05,110 Any questions about that? 341 00:18:05,110 --> 00:18:07,390 Kind of weird. 342 00:18:07,390 --> 00:18:10,390 Because yet somehow, most of the problems that we think about 343 00:18:10,390 --> 00:18:11,530 are computable. 344 00:18:11,530 --> 00:18:13,100 I don't know why that is. 345 00:18:13,100 --> 00:18:15,520 But mathematically, most problems 346 00:18:15,520 --> 00:18:17,425 that you could think of are uncomputable. 347 00:18:21,450 --> 00:18:22,974 Question? 348 00:18:22,974 --> 00:18:23,890 AUDIENCE: [INAUDIBLE]. 349 00:18:27,850 --> 00:18:28,750 PROFESSOR: Yeah. 
350 00:18:28,750 --> 00:18:32,270 It's something like, the way that we describe 351 00:18:32,270 --> 00:18:35,990 problems is usually almost algorithmic, anyway. 352 00:18:35,990 --> 00:18:39,610 And so, usually, most problems we think of are in EXP. 353 00:18:39,610 --> 00:18:42,300 And so they're definitely computable. 354 00:18:42,300 --> 00:18:43,820 There's some metatheorem about how 355 00:18:43,820 --> 00:18:46,385 we think about problems, not just programs. 356 00:18:51,110 --> 00:18:53,970 So that's all I'm going to say about R. So out here, 357 00:18:53,970 --> 00:18:57,972 we have halting problem and, actually, most problems. 358 00:18:57,972 --> 00:18:59,680 You can think of this as an infinite line 359 00:18:59,680 --> 00:19:01,980 and then there's just this small portion 360 00:19:01,980 --> 00:19:03,840 which are things you can solve. 361 00:19:03,840 --> 00:19:05,880 But we care about this portion because that's 362 00:19:05,880 --> 00:19:07,040 the interesting stuff. 363 00:19:07,040 --> 00:19:08,750 That's what algorithms are about. 364 00:19:08,750 --> 00:19:13,560 Out here kind of nothing happens. 365 00:19:13,560 --> 00:19:17,070 So I want to talk about this notch, which is NP. 366 00:19:22,747 --> 00:19:24,080 I imagine you've heard about NP. 367 00:19:27,252 --> 00:19:30,240 It's pretty cool, but also kind of confusing. 368 00:19:34,616 --> 00:19:37,890 But it's actually very closely related to something 369 00:19:37,890 --> 00:19:42,320 we've seen with dynamic programming, which is guessing. 370 00:19:42,320 --> 00:19:44,460 So I'm going to give you a couple of definitions 371 00:19:44,460 --> 00:19:48,285 of NP-- not formal definition, but high level definitions. 372 00:19:52,210 --> 00:19:57,350 So just like P, EXP, and R, it's a set of decision problems. 373 00:19:57,350 --> 00:20:03,650 And it's going to look very similar to P. NP does not 374 00:20:03,650 --> 00:20:05,640 stand for not a polynomial. 
375 00:20:05,640 --> 00:20:08,750 It stands for nondeterministic polynomial. 376 00:20:08,750 --> 00:20:12,330 We'll get to nondeterministic in a moment. 377 00:20:12,330 --> 00:20:14,436 The first line is the same. 378 00:20:14,436 --> 00:20:17,740 It's all decision problems you can solve in polynomial time. 379 00:20:17,740 --> 00:20:19,720 That sounds like P. But then, there's 380 00:20:19,720 --> 00:20:25,480 this extra line, which is via a "lucky" algorithm. 381 00:20:33,730 --> 00:20:36,840 Let me tell you-- at a high level what 382 00:20:36,840 --> 00:20:40,434 a lucky algorithm does is it can make guesses. 383 00:20:40,434 --> 00:20:42,475 But unlike the way that we've been making guesses 384 00:20:42,475 --> 00:20:44,360 with dynamic programming-- with dynamic programming 385 00:20:44,360 --> 00:20:45,443 we had to guess something. 386 00:20:45,443 --> 00:20:47,190 We tried all the possibilities. 387 00:20:47,190 --> 00:20:50,597 A lucky algorithm just needs to try one possibility 388 00:20:50,597 --> 00:20:51,680 because it's really lucky. 389 00:20:51,680 --> 00:20:54,860 It always guesses the right choice. 390 00:20:54,860 --> 00:20:56,440 It's like magic. 391 00:20:56,440 --> 00:20:59,310 This is not a realistic model of computation, 392 00:20:59,310 --> 00:21:04,470 but it is a model of computation called nondeterministic model. 393 00:21:09,040 --> 00:21:11,950 And it's going to sound crazy because it is crazy, 394 00:21:11,950 --> 00:21:15,080 but nonetheless it's actually really useful-- 395 00:21:15,080 --> 00:21:17,410 even though you could never really build 396 00:21:17,410 --> 00:21:19,576 this on a real computer. 397 00:21:19,576 --> 00:21:20,950 The nondeterministic model is not 398 00:21:20,950 --> 00:21:22,116 a model of real computation. 399 00:21:22,116 --> 00:21:25,410 It is a model of theoretical hypothetical computation. 
400 00:21:25,410 --> 00:21:27,300 It gets at the root-- at the core 401 00:21:27,300 --> 00:21:29,730 of what is possible to solve. 402 00:21:29,730 --> 00:21:32,800 You'll see why, in a little bit. 403 00:21:32,800 --> 00:21:39,800 So in this model, an algorithm-- it can compute stuff, 404 00:21:39,800 --> 00:21:43,682 but, in particular, it makes guesses. 405 00:21:43,682 --> 00:21:46,030 So should I do this or should I do this? 406 00:21:46,030 --> 00:21:48,790 And it just says-- It doesn't flip a coin. 407 00:21:48,790 --> 00:21:50,010 It's not random. 408 00:21:50,010 --> 00:21:53,990 It just thinks-- it just makes a guess. 409 00:21:53,990 --> 00:21:54,800 Well, I don't know. 410 00:21:54,800 --> 00:21:56,390 Let's go this way. 411 00:21:56,390 --> 00:21:58,182 And then it comes to another fork in the road. 412 00:21:58,182 --> 00:21:59,431 It's like, well, I don't know. 413 00:21:59,431 --> 00:22:00,410 I'll go this way. 414 00:22:00,410 --> 00:22:01,660 That's the guessing. 415 00:22:01,660 --> 00:22:04,160 You give it a list of choices and somehow a choice 416 00:22:04,160 --> 00:22:08,910 is determined, by magic-- nondeterministic magic. 417 00:22:08,910 --> 00:22:18,970 And then the fun part is-- I should say, at the end 418 00:22:18,970 --> 00:22:24,652 the algorithm either says yes or no. 419 00:22:24,652 --> 00:22:25,610 It gives you an output. 420 00:22:28,420 --> 00:22:34,590 The guesses are guaranteed-- this is the magic part-- 421 00:22:34,590 --> 00:22:43,780 to lead to a yes answer, if possible. 422 00:22:47,650 --> 00:22:50,800 So if you imagine the space of executions of this program, 423 00:22:50,800 --> 00:22:53,342 you start here, and you make some guess and you 424 00:22:53,342 --> 00:22:55,190 don't know which way to go. 425 00:22:55,190 --> 00:22:57,119 In dynamic programming, we try all of them. 426 00:22:57,119 --> 00:22:58,910 But this algorithm doesn't try all of them.
427 00:22:58,910 --> 00:23:01,580 It's like a branching universe model of the universe. 428 00:23:01,580 --> 00:23:03,930 So you make some choice, and then you 429 00:23:03,930 --> 00:23:06,430 make some other choice, and then you make some other choice. 430 00:23:06,430 --> 00:23:08,900 All of these are guesses. 431 00:23:08,900 --> 00:23:11,500 And some of these things will lead to yes. 432 00:23:11,500 --> 00:23:13,120 Some of these things will lead to no. 433 00:23:13,120 --> 00:23:17,040 And in this magical model, if there's any yes out there, 434 00:23:17,040 --> 00:23:19,660 you will follow a path to a yes. 435 00:23:19,660 --> 00:23:21,974 If all of the answers are no, then, of course, 436 00:23:21,974 --> 00:23:23,640 it doesn't matter what choices you make. 437 00:23:23,640 --> 00:23:25,100 You will output no. 438 00:23:25,100 --> 00:23:27,940 But if there's ever a yes, magically these guesses 439 00:23:27,940 --> 00:23:28,580 find it. 440 00:23:28,580 --> 00:23:30,390 This is the sense of lucky. 441 00:23:30,390 --> 00:23:33,940 If you're trying to find a yes-- that's your goal in life-- 442 00:23:33,940 --> 00:23:37,290 then this corresponds to luck. 443 00:23:37,290 --> 00:23:40,260 And NP is the class of all problems solvable 444 00:23:40,260 --> 00:23:43,800 in polynomial time by a really lucky algorithm. 445 00:23:43,800 --> 00:23:44,850 Crazy. 446 00:23:44,850 --> 00:23:45,350 I know. 447 00:23:50,450 --> 00:23:53,115 Let's talk about Tetris. 448 00:23:56,320 --> 00:23:59,871 Tetris, I claim, is in NP. 449 00:23:59,871 --> 00:24:01,870 And we know how to solve it in exponential time. 450 00:24:01,870 --> 00:24:04,280 Just try all the options. 451 00:24:04,280 --> 00:24:07,640 But, in fact, I don't need to try all the options. 452 00:24:07,640 --> 00:24:11,150 It would be enough just to use this nondeterministic magic.
453 00:24:11,150 --> 00:24:14,420 I could say, well, should I drop the piece here, here, here, 454 00:24:14,420 --> 00:24:15,739 here, here, or here. 455 00:24:15,739 --> 00:24:17,780 And should it be rotated like this, or like this, 456 00:24:17,780 --> 00:24:20,170 or like this, or like this? 457 00:24:20,170 --> 00:24:20,910 I don't know. 458 00:24:20,910 --> 00:24:22,210 So I guess. 459 00:24:22,210 --> 00:24:23,775 And I just place that piece. 460 00:24:23,775 --> 00:24:25,900 I make another guess where to place the next piece. 461 00:24:25,900 --> 00:24:27,550 Then I make another guess where to place the next piece. 462 00:24:27,550 --> 00:24:29,520 I implement the rules of Tetris, which 463 00:24:29,520 --> 00:24:32,426 is if there's a full line it clears. 464 00:24:32,426 --> 00:24:34,980 I figure out where these things fall. 465 00:24:34,980 --> 00:24:39,080 I can even think about, should I rotate at the last second. 466 00:24:39,080 --> 00:24:41,894 If I don't know, I'll guess. 467 00:24:41,894 --> 00:24:43,810 Any choice you have to make in playing Tetris, 468 00:24:43,810 --> 00:24:45,280 you can just guess. 469 00:24:45,280 --> 00:24:47,660 There's only polynomially many guesses you need to make. 470 00:24:47,660 --> 00:24:49,600 So it's still polynomial time. 471 00:24:49,600 --> 00:24:50,440 That's important. 472 00:24:50,440 --> 00:24:52,060 It's not like we can do anything. 473 00:24:52,060 --> 00:24:54,770 But we can make a polynomial number of these magic guesses. 474 00:24:54,770 --> 00:24:59,330 And then at the end, I determine did I die-- 475 00:24:59,330 --> 00:25:01,070 or rather, did I survive. 476 00:25:01,070 --> 00:25:02,120 It's important, actually. 477 00:25:02,120 --> 00:25:03,980 It only works one way. 478 00:25:03,980 --> 00:25:04,780 Did I survive? 479 00:25:04,780 --> 00:25:05,555 Yes or no? 480 00:25:05,555 --> 00:25:06,680 And that's easy to compute. 481 00:25:06,680 --> 00:25:11,120 I just see did I ever go above the top row. 
482 00:25:11,120 --> 00:25:13,640 So what this model says is if there is any way 483 00:25:13,640 --> 00:25:17,100 to survive-- if there is any way to get a yes answer, 484 00:25:17,100 --> 00:25:21,220 then, my guesses will find it, magically, in this model. 485 00:25:21,220 --> 00:25:22,370 Therefore, Tetris is in NP. 486 00:25:24,980 --> 00:25:28,200 If I had instead said, did I die, then, 487 00:25:28,200 --> 00:25:31,120 what this algorithm would tell me is there any way 488 00:25:31,120 --> 00:25:33,970 to die-- which, the answer's probably yes, 489 00:25:33,970 --> 00:25:36,360 unless you're given a really trivial input. 490 00:25:36,360 --> 00:25:39,710 So it's important you set up the yes versus no, correctly. 491 00:25:39,710 --> 00:25:43,980 But the Tetris decision problem "can I survive," is in NP. 492 00:25:43,980 --> 00:25:48,670 The decision problem "can I die," should not be in NP. 493 00:25:48,670 --> 00:25:49,494 But we don't know. 494 00:25:57,110 --> 00:25:58,430 Another way to think about NP. 495 00:26:01,382 --> 00:26:02,950 And you might find this intuitive 496 00:26:02,950 --> 00:26:04,800 because we've been doing lots of guessing. 497 00:26:04,800 --> 00:26:06,330 It's just a little crazy. 498 00:26:06,330 --> 00:26:11,490 There's another way that's more intuitive to many people. 499 00:26:11,490 --> 00:26:14,520 So if this doesn't make sense, don't worry, yet. 500 00:26:14,520 --> 00:26:16,110 This is another way to phrase it. 501 00:26:53,152 --> 00:26:55,110 Another way to think about NP-- which turns out 502 00:26:55,110 --> 00:27:01,450 to be equivalent-- is that don't think so much about algorithms 503 00:27:01,450 --> 00:27:04,300 for solving a problem, just think about algorithms 504 00:27:04,300 --> 00:27:07,437 for checking the solution to a problem. 505 00:27:07,437 --> 00:27:09,270 It's usually a lot easier to check your work 506 00:27:09,270 --> 00:27:11,980 than it is to solve a problem in the first place. 
507 00:27:11,980 --> 00:27:15,400 And NP is all about that issue. 508 00:27:15,400 --> 00:27:17,540 So think of decision problems and think 509 00:27:17,540 --> 00:27:21,270 about if you have a solution-- so let's say in Tetris, 510 00:27:21,270 --> 00:27:24,790 the solution is yes. 511 00:27:24,790 --> 00:27:27,660 In fact, I need to say this, probably. 512 00:27:27,660 --> 00:27:31,150 The more formal version is whenever 513 00:27:31,150 --> 00:27:37,900 the answer is yes, you can prove it. 514 00:27:41,880 --> 00:27:44,100 And you can check that proof in polynomial time. 515 00:27:49,890 --> 00:27:53,140 This is the more formal-- this is a little bit high level. 516 00:27:53,140 --> 00:27:54,130 What does check mean? 517 00:27:54,130 --> 00:27:56,080 Here's what check means. 518 00:27:56,080 --> 00:27:59,560 Whenever an answer is "yes," you can write down a proof 519 00:27:59,560 --> 00:28:00,900 that the answer is yes. 520 00:28:00,900 --> 00:28:02,400 And someone can come along and check 521 00:28:02,400 --> 00:28:04,370 that proof in polynomial time and be convinced 522 00:28:04,370 --> 00:28:06,310 that the answer is yes. 523 00:28:06,310 --> 00:28:07,600 What does convinced mean? 524 00:28:07,600 --> 00:28:09,800 It's not that hard. 525 00:28:09,800 --> 00:28:11,660 Think of it as a two player game. 526 00:28:11,660 --> 00:28:13,400 There's me trying to play Tetris, 527 00:28:13,400 --> 00:28:15,400 and there's you trying to be convinced 528 00:28:15,400 --> 00:28:18,100 that I'm really good at Tetris. 529 00:28:18,100 --> 00:28:23,060 It seems a little one sided, but-- it's an asymmetric game. 530 00:28:23,060 --> 00:28:27,420 So you want to prove Tetris is-- I want to show Tetris is in NP. 531 00:28:27,420 --> 00:28:29,790 Imagine I'm this magical creature. 532 00:28:29,790 --> 00:28:31,160 Actually, it's kind of funny. 533 00:28:31,160 --> 00:28:32,680 It reminds me of a story. 
534 00:28:32,680 --> 00:28:34,620 On the front of my office door, you 535 00:28:34,620 --> 00:28:37,540 may have seen there's an email I received, 536 00:28:37,540 --> 00:28:39,900 maybe 15 years ago-- oh no, I guess 537 00:28:39,900 --> 00:28:42,100 it can't be that long ago. 538 00:28:42,100 --> 00:28:43,710 Must've been about 7 years ago when 539 00:28:43,710 --> 00:28:47,040 we proved that Tetris is NP-complete. 540 00:28:47,040 --> 00:28:51,660 And the email says, "Dear Sir,"-- or whatever-- 541 00:28:51,660 --> 00:28:54,349 "I am NP-complete." 542 00:28:54,349 --> 00:28:55,890 We don't know what NP-complete means, yet, 543 00:28:55,890 --> 00:28:57,310 but it's a meaningless statement. 544 00:28:57,310 --> 00:28:59,640 So it doesn't matter that you don't know what it means. 545 00:28:59,640 --> 00:29:03,940 It might get funnier throughout the lecture today. 546 00:29:03,940 --> 00:29:07,860 And he's like, I can solve Tetris. 547 00:29:07,860 --> 00:29:09,534 I'm really good at playing Tetris. 548 00:29:09,534 --> 00:29:11,200 I'm really good at playing Minesweeper-- 549 00:29:11,200 --> 00:29:14,210 all these games that are thought to be intractable. 550 00:29:14,210 --> 00:29:15,810 He gave me his records and so on. 551 00:29:15,810 --> 00:29:20,230 It's like how can I apply my talent. 552 00:29:20,230 --> 00:29:26,230 So I will translate what he meant to say was, "I am lucky." 553 00:29:26,230 --> 00:29:29,017 And this is probably not true, but he 554 00:29:29,017 --> 00:29:30,100 thought that he was lucky. 555 00:29:30,100 --> 00:29:31,940 He wanted to convince me he was lucky. 556 00:29:31,940 --> 00:29:33,610 So how could we do it? 557 00:29:33,610 --> 00:29:36,540 Well, I could give him a really hard Tetris problem. 558 00:29:36,540 --> 00:29:38,870 And say, can you survive these pieces? 559 00:29:38,870 --> 00:29:41,450 And he says, "yes, I can survive." 560 00:29:41,450 --> 00:29:43,450 And how does he prove to me that he can survive? 
561 00:29:43,450 --> 00:29:45,150 Well, he just plays it. 562 00:29:45,150 --> 00:29:47,420 He shows me what to do. 563 00:29:47,420 --> 00:29:53,740 So the proof is the sequence of moves that you make. 564 00:29:53,740 --> 00:29:55,870 It's really easy to convince someone 565 00:29:55,870 --> 00:30:00,290 that you can survive a given level of Tetris. 566 00:30:00,290 --> 00:30:04,380 You just show what the sequence of moves is. 567 00:30:04,380 --> 00:30:07,860 And then I, as a mere mortal polynomial time algorithm 568 00:30:07,860 --> 00:30:09,780 can check that that sequence works. 569 00:30:09,780 --> 00:30:12,064 I just have to implement the rules of Tetris. 570 00:30:12,064 --> 00:30:13,980 So in Tetris, the rules are easy to implement. 571 00:30:13,980 --> 00:30:18,120 It's the knowing what thing to do that is hard. 572 00:30:18,120 --> 00:30:21,840 But in NP, knowing which way to go is easy. 573 00:30:21,840 --> 00:30:23,340 In this version, you don't even talk 574 00:30:23,340 --> 00:30:24,820 about how to find the solution. 575 00:30:24,820 --> 00:30:26,486 It's just a matter of can you write down 576 00:30:26,486 --> 00:30:29,280 a solution that can be checked. 577 00:30:29,280 --> 00:30:30,000 Can prove it. 578 00:30:30,000 --> 00:30:31,420 This is not in polynomial time. 579 00:30:31,420 --> 00:30:34,710 You get arbitrarily much time to prove it. 580 00:30:34,710 --> 00:30:37,540 But then, the check has to happen in polynomial time. 581 00:30:41,047 --> 00:30:41,630 Kind of clear? 582 00:30:44,290 --> 00:30:46,450 That's Tetris. 583 00:30:46,450 --> 00:30:49,220 And every problem that you can solve in polynomial 584 00:30:49,220 --> 00:30:51,049 time you can also, of course, check it. 585 00:30:51,049 --> 00:30:53,090 Because if you could solve it in polynomial time, 586 00:30:53,090 --> 00:30:54,590 you could just solve it and then see 587 00:30:54,590 --> 00:30:56,320 did you get the same answer that I did. 588 00:30:56,320 --> 00:30:59,790 So P is inside NP. 
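The "write down a proof, then check it in polynomial time" view can be sketched in a few lines of Python. Implementing the Tetris rules would be too long here, so this sketch assumes a toy stand-in problem that is not from the lecture: given a list of numbers, is there a subset that adds up to a target? The proof (certificate) is a list of 0/1 choices, playing the role of the sequence of moves. The function names are hypothetical.

```python
from itertools import product


def verify(numbers, target, certificate):
    """Polynomial-time check of a claimed proof.

    certificate holds one 0/1 choice per number (1 = take it), which
    plays the role of the "sequence of moves" in the Tetris story.
    """
    if len(certificate) != len(numbers):
        return False
    return sum(x for x, pick in zip(numbers, certificate) if pick) == target


def solve_by_trying_all_guesses(numbers, target):
    """Deterministic stand-in for the lucky machine.

    A lucky algorithm would guess the right choices directly; without
    luck, this tries all 2^n guess sequences and checks each one.
    """
    for certificate in product((0, 1), repeat=len(numbers)):
        if verify(numbers, target, certificate):
            return certificate
    return None
```

Checking one certificate takes linear time, while the search loop is exponential in the number of inputs. That gap between checking and finding is the one the lecture is describing.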
589 00:30:59,790 --> 00:31:04,910 But the big question is does P equal NP. 590 00:31:04,910 --> 00:31:08,600 And most people think no. 591 00:31:08,600 --> 00:31:12,060 P does not equal NP-- most sane people. 592 00:31:16,690 --> 00:31:18,910 So this is a big problem. 593 00:31:18,910 --> 00:31:21,900 It's one of the famous Millennium Prize problems. 594 00:31:21,900 --> 00:31:27,030 So in particular, if you solved it, you would get $1 million, 595 00:31:27,030 --> 00:31:29,080 and fame, and probably other fortune. 596 00:31:29,080 --> 00:31:31,060 You could do TV spots. 597 00:31:31,060 --> 00:31:34,160 I think that's how people mostly make their money. 598 00:31:34,160 --> 00:31:35,280 You could do a lot. 599 00:31:35,280 --> 00:31:38,020 You would become the most famous computer scientist in the world 600 00:31:38,020 --> 00:31:40,020 if you prove this. 601 00:31:40,020 --> 00:31:41,270 So a lot of people have tried. 602 00:31:41,270 --> 00:31:44,070 Every year, there's an attempt to prove either 603 00:31:44,070 --> 00:31:46,500 what everyone believes or, most often, 604 00:31:46,500 --> 00:31:49,742 people try to prove the reverse-- that they are equal. 605 00:31:49,742 --> 00:31:50,450 I don't know why. 606 00:31:50,450 --> 00:31:53,250 They should bet the other way. 607 00:31:53,250 --> 00:31:55,360 So what does P does not equal NP mean? 608 00:31:55,360 --> 00:32:00,040 It means that there are problems, here, that are in NP 609 00:32:00,040 --> 00:32:03,240 but not in P. Think about what this means. 610 00:32:03,240 --> 00:32:05,810 This is saying P are the problems that we can actually 611 00:32:05,810 --> 00:32:07,680 solve on a legitimate computer. 612 00:32:07,680 --> 00:32:10,950 NP are problems that we can solve in this magical fairy 613 00:32:10,950 --> 00:32:14,370 computer where all of our dreams are granted. 614 00:32:14,370 --> 00:32:16,120 You say, oh, I don't know which way to go. 
615 00:32:16,120 --> 00:32:19,510 It doesn't matter because the machine magically 616 00:32:19,510 --> 00:32:21,400 tells you which way to go. 617 00:32:21,400 --> 00:32:24,210 If your goal is to get to a yes. 618 00:32:24,210 --> 00:32:27,930 So NP is a really powerful model of computation. 619 00:32:27,930 --> 00:32:29,690 It's an insane model of computation. 620 00:32:29,690 --> 00:32:32,100 No one in their right mind would consider it legitimate. 621 00:32:32,100 --> 00:32:35,250 So obviously, it's more powerful than P, 622 00:32:35,250 --> 00:32:37,727 except we don't know how to prove it. 623 00:32:37,727 --> 00:32:38,310 Very annoying. 624 00:32:45,480 --> 00:32:47,450 Other phrasings of P does not equal 625 00:32:47,450 --> 00:32:50,870 NP are-- these are my phrasings, I made them up-- you 626 00:32:50,870 --> 00:32:53,090 can't engineer luck. 627 00:32:57,520 --> 00:32:59,160 You can believe in luck, if you want. 628 00:32:59,160 --> 00:33:01,410 But it's not something that we can build out 629 00:33:01,410 --> 00:33:03,960 of a regular computer. 630 00:33:03,960 --> 00:33:07,545 That's the meaning of this statement. 631 00:33:07,545 --> 00:33:09,485 And so I think most people believe that. 632 00:33:13,530 --> 00:33:19,520 Another phrasing would be that solving problems 633 00:33:19,520 --> 00:33:22,460 is harder than checking solutions. 634 00:33:27,300 --> 00:33:30,645 A more formal version is that generating solutions or proofs 635 00:33:30,645 --> 00:33:37,510 of solutions can be harder than checking them. 636 00:33:44,850 --> 00:33:47,860 Another phrasing is it's harder to generate 637 00:33:47,860 --> 00:33:49,550 a proof of a theorem than it is to check 638 00:33:49,550 --> 00:33:50,780 the proof of a theorem. 639 00:33:50,780 --> 00:33:53,400 We all know checking the proof of a theorem 640 00:33:53,400 --> 00:33:56,000 should be easy if you write it precisely. 641 00:33:56,000 --> 00:33:58,420 Just make sure each step follows from the previous ones. 
642 00:33:58,420 --> 00:34:00,152 Done. 643 00:34:00,152 --> 00:34:01,610 But proving a theorem, that's hard. 644 00:34:01,610 --> 00:34:02,550 You need inspiration. 645 00:34:02,550 --> 00:34:03,740 You need some clever idea. 646 00:34:03,740 --> 00:34:04,920 That's guessing. 647 00:34:04,920 --> 00:34:09,020 Inspiration equals luck equals guessing, in this model. 648 00:34:09,020 --> 00:34:10,370 And that's hard. 649 00:34:13,380 --> 00:34:15,880 The only way we know is to try all the proofs. 650 00:34:15,880 --> 00:34:17,270 See which of them work. 651 00:34:24,020 --> 00:34:26,350 So what the heck? 652 00:34:26,350 --> 00:34:27,510 What could we possibly say? 653 00:34:27,510 --> 00:34:30,020 This is all kind of weird. 654 00:34:30,020 --> 00:34:31,520 This would be the end of the lecture 655 00:34:31,520 --> 00:34:34,770 if you say, OK, well we don't know. 656 00:34:34,770 --> 00:34:37,350 That's it. 657 00:34:37,350 --> 00:34:41,524 But thankfully-- I kind of need this board. 658 00:34:41,524 --> 00:34:43,690 I also want this one, but I guess I'll go over here. 659 00:34:48,364 --> 00:34:50,280 Fortunately, this is not the end of the story. 660 00:34:50,280 --> 00:34:55,340 And we can say a lot about things like Tetris. 661 00:34:55,340 --> 00:34:57,930 See I drew Tetris not just in this regime. 662 00:34:57,930 --> 00:35:01,330 We're pretty sure Tetris is between NP and P. 663 00:35:01,330 --> 00:35:06,480 That it's in NP minus P. 664 00:35:06,480 --> 00:35:08,830 So let me write that down. 665 00:35:08,830 --> 00:35:16,640 Tetris is in NP minus P. We don't know that because we 666 00:35:16,640 --> 00:35:20,070 don't know-- this could be the empty set. 667 00:35:20,070 --> 00:35:26,040 What we do know is that if there's 668 00:35:26,040 --> 00:35:32,040 anything in NP minus P-- if they are different, 669 00:35:32,040 --> 00:35:35,900 then-- if there's anything in NP minus P, 670 00:35:35,900 --> 00:35:39,060 then Tetris is one of those things. 
671 00:35:39,060 --> 00:35:40,760 That's why I drew Tetris out there. 672 00:35:40,760 --> 00:35:45,800 It is, in a certain sense, the hardest problem in NP. 673 00:35:45,800 --> 00:35:47,690 Tetris. 674 00:35:47,690 --> 00:35:49,550 Why Tetris? 675 00:35:49,550 --> 00:35:50,809 Well, it's not just Tetris. 676 00:35:50,809 --> 00:35:53,100 There are a lot of problems right at that little notch. 677 00:35:53,100 --> 00:35:57,220 But this is pretty interesting because, while we can't figure 678 00:35:57,220 --> 00:35:59,920 this out, most people believe this is true. 679 00:35:59,920 --> 00:36:01,982 And so as long as you believe in that-- as long 680 00:36:01,982 --> 00:36:05,920 as you have faith-- then you can prove 681 00:36:05,920 --> 00:36:08,160 that Tetris is in NP minus P. 682 00:36:08,160 --> 00:36:09,650 And so it's hard. 683 00:36:09,650 --> 00:36:11,880 It's not in P, in this case. 684 00:36:11,880 --> 00:36:19,739 In particular, not in P. That's kind of cool. 685 00:36:19,739 --> 00:36:21,780 How in the world do we prove something like this? 686 00:36:21,780 --> 00:36:23,910 It's actually not that hard. 687 00:36:23,910 --> 00:36:25,830 I mean it took us several months, 688 00:36:25,830 --> 00:36:29,640 but that's just months, whereas this thing has been around 689 00:36:29,640 --> 00:36:33,170 since, I guess, the '70s. 690 00:36:33,170 --> 00:36:36,030 P versus NP. 691 00:36:36,030 --> 00:36:38,760 Why is this true? 692 00:36:38,760 --> 00:36:42,960 Because Tetris is NP-hard. 693 00:36:46,210 --> 00:36:48,420 What does NP-hard mean? 694 00:36:48,420 --> 00:36:54,640 This means as hard as every problem in NP. 695 00:36:59,340 --> 00:37:02,010 I can't say harder than because it's non-strict. 696 00:37:02,010 --> 00:37:04,910 So it's at least as hard as every problem in NP. 697 00:37:04,910 --> 00:37:07,580 And that's why I drew it at the far right. 698 00:37:07,580 --> 00:37:10,340 It's sort of the hardest extreme of NP. 
699 00:37:10,340 --> 00:37:13,030 Among everything in NP you can possibly imagine, 700 00:37:13,030 --> 00:37:16,000 Tetris is as hard as all of them. 701 00:37:16,000 --> 00:37:19,430 And therefore, if there's anything that's harder than P, 702 00:37:19,430 --> 00:37:22,350 then Tetris is going to be harder than P because it's 703 00:37:22,350 --> 00:37:23,700 as far to the right as possible. 704 00:37:23,700 --> 00:37:27,490 Either P equals NP, in which case the picture is like this. 705 00:37:27,490 --> 00:37:29,920 Here's P. Here's NP. 706 00:37:29,920 --> 00:37:32,300 Tetris is still at the right extreme, here. 707 00:37:32,300 --> 00:37:35,430 But it's less interesting because it's still in P. 708 00:37:35,430 --> 00:37:37,590 Or the picture looks like this, and NP is strictly 709 00:37:37,590 --> 00:37:41,020 bigger than P. And then, because Tetris is at the right extreme, 710 00:37:41,020 --> 00:37:45,290 it's outside of P. So we prove this in order 711 00:37:45,290 --> 00:37:47,110 to establish this claim. 712 00:37:51,010 --> 00:37:52,630 Just to get some terminology, what 713 00:37:52,630 --> 00:37:53,940 is this NP-complete business? 714 00:37:58,810 --> 00:38:09,550 Tetris is NP-complete, which means two things. 715 00:38:09,550 --> 00:38:11,470 One is that it's NP-hard. 716 00:38:11,470 --> 00:38:13,960 And the other is that it's in NP. 717 00:38:13,960 --> 00:38:16,340 So if you think of the intersection, NP intersect 718 00:38:16,340 --> 00:38:18,210 NP-hard, that's NP-complete. 719 00:38:18,210 --> 00:38:26,490 Let me draw on the picture here what this means. 720 00:38:26,490 --> 00:38:28,140 So I'm going to draw it on the top. 721 00:38:38,590 --> 00:38:39,720 This is NP-hard. 722 00:38:42,390 --> 00:38:46,040 Everything from here to the right is NP-hard. 723 00:38:46,040 --> 00:38:48,922 NP-hard means it's at least as hard as everything in NP. 
724 00:38:48,922 --> 00:38:50,380 That means it might be at this line 725 00:38:50,380 --> 00:38:52,390 or it might be to the right. 726 00:38:52,390 --> 00:38:55,130 But in the case of Tetris, we know that it's in NP. 727 00:38:55,130 --> 00:38:57,494 We proved that a couple of times. 728 00:38:57,494 --> 00:38:59,535 And so we know that Tetris is also in this range. 729 00:38:59,535 --> 00:39:01,850 And so if it's in this range and in this range, 730 00:39:01,850 --> 00:39:03,690 it's got to be right here. 731 00:39:03,690 --> 00:39:04,940 Completeness is nice. 732 00:39:04,940 --> 00:39:07,370 If you prove something is something complete-- 733 00:39:07,370 --> 00:39:09,920 prove a problem is some complexity class complete-- 734 00:39:09,920 --> 00:39:13,550 then you know sort of exactly where it falls on this line. 735 00:39:13,550 --> 00:39:15,750 NP-complete means right here. 736 00:39:15,750 --> 00:39:18,520 EXP-complete means right here. 737 00:39:18,520 --> 00:39:22,880 Turns out Chess is EXP-complete. 738 00:39:22,880 --> 00:39:27,710 EXP-hard is anything from here over. 739 00:39:27,710 --> 00:39:30,670 EXP is anything from here, over this way. 740 00:39:30,670 --> 00:39:32,335 Chess is right at that borderline. 741 00:39:32,335 --> 00:39:34,512 It is the hardest problem in EXP. 742 00:39:34,512 --> 00:39:35,970 And that's actually the only way we 743 00:39:35,970 --> 00:39:37,970 know to prove that it's not in P. 744 00:39:37,970 --> 00:39:39,970 It's pretty easy to show that EXP is bigger 745 00:39:39,970 --> 00:39:43,770 than P. And Chess is the farthest to the right in EXP-- 746 00:39:43,770 --> 00:39:47,800 of any problem in EXP-- and so, therefore, it's not in P. 747 00:39:47,800 --> 00:39:51,350 So whereas this one-- these two, we're not sure are they equal. 748 00:39:51,350 --> 00:39:55,190 This line we know is different from this one. 749 00:39:55,190 --> 00:39:58,720 We don't know about these two, though. 
750 00:39:58,720 --> 00:40:01,550 Does NP equal EXP? 751 00:40:01,550 --> 00:40:02,240 Not as famous. 752 00:40:02,240 --> 00:40:04,850 You won't get a million dollars, but still a very big, 753 00:40:04,850 --> 00:40:07,550 open question. 754 00:40:07,550 --> 00:40:09,590 What else do I wanna say? 755 00:40:09,590 --> 00:40:11,020 Tetris, Chess, EXP-hard. 756 00:40:11,020 --> 00:40:16,369 So these lines, here-- this is NP-complete. 757 00:40:16,369 --> 00:40:17,410 And this is EXP-complete. 758 00:40:35,980 --> 00:40:39,015 So the last thing I want to talk about is reductions. 759 00:40:43,770 --> 00:40:45,980 Reductions-- so how do you prove something like this? 760 00:40:45,980 --> 00:40:47,710 What does "as hard as" even mean? 761 00:40:47,710 --> 00:40:49,230 I haven't defined that. 762 00:40:49,230 --> 00:40:51,270 But it's not hard to define. 763 00:40:51,270 --> 00:40:53,130 In fact, it's a concept we've seen already. 764 00:41:18,610 --> 00:41:21,380 Reductions are actually a way to design algorithms 765 00:41:21,380 --> 00:41:24,354 that we've been using implicitly a lot. 766 00:41:24,354 --> 00:41:25,770 You may have even heard this term. 767 00:41:25,770 --> 00:41:28,010 A bunch of recitations have used the word reduction 768 00:41:28,010 --> 00:41:29,970 for graph reduction. 769 00:41:29,970 --> 00:41:31,770 You have some problem, you convert it 770 00:41:31,770 --> 00:41:34,590 into a graph problem, then you just call the graph algorithm. 771 00:41:34,590 --> 00:41:35,830 You're done. 772 00:41:35,830 --> 00:41:36,760 That's reduction. 773 00:41:36,760 --> 00:41:38,820 In general, you have some problem, A, 774 00:41:38,820 --> 00:41:40,830 that you want to solve. 775 00:41:40,830 --> 00:41:44,030 And you convert it into some other problem, B, 776 00:41:44,030 --> 00:41:46,182 that you already know how to solve. 
777 00:41:46,182 --> 00:41:47,890 It's a great tool because, in this class, 778 00:41:47,890 --> 00:41:50,630 you learn tons of algorithms for solving tons of problems. 779 00:41:50,630 --> 00:41:55,070 Now, someone gives you, in your job or whatever, 780 00:41:55,070 --> 00:41:56,950 or you think about some problem that you 781 00:41:56,950 --> 00:41:59,180 don't know how to solve, the first thing you should 782 00:41:59,180 --> 00:42:01,000 do is-- can I convert it into something 783 00:42:01,000 --> 00:42:02,930 I know how to solve because then you're done. 784 00:42:02,930 --> 00:42:04,721 Now it may not be the best way to solve it, 785 00:42:04,721 --> 00:42:06,410 but at least it's a way to solve it. 786 00:42:06,410 --> 00:42:09,015 Probably in polynomial time because we think of B as things 787 00:42:09,015 --> 00:42:10,390 you can solve in polynomial time. 788 00:42:10,390 --> 00:42:13,160 Great. 789 00:42:13,160 --> 00:42:20,730 So just convert problem A, which you 790 00:42:20,730 --> 00:42:27,615 want to solve, into some problem B that you know how to solve. 791 00:42:30,690 --> 00:42:32,370 That's reduction. 792 00:42:32,370 --> 00:42:35,460 Let me give you some examples that we've already seen, 793 00:42:35,460 --> 00:42:38,065 just to fit this into your mental map of the class. 794 00:42:42,640 --> 00:42:45,060 It's kind of a funny one but it's a very simple one. 795 00:42:52,470 --> 00:42:54,460 So how do you solve unweighted shortest paths? 796 00:42:58,300 --> 00:42:59,810 In general? 797 00:42:59,810 --> 00:43:00,670 Easy one. 798 00:43:00,670 --> 00:43:02,794 Give you a graph with no weights on the edges and I 799 00:43:02,794 --> 00:43:04,466 want the shortest path from s to t. 800 00:43:04,466 --> 00:43:05,390 AUDIENCE: BFS. 801 00:43:05,390 --> 00:43:06,180 PROFESSOR: BFS. 802 00:43:06,180 --> 00:43:07,600 Linear time, right? 
803 00:43:07,600 --> 00:43:10,050 Well, that's if you're smart or if you 804 00:43:10,050 --> 00:43:11,250 feel like implementing BFS. 805 00:43:11,250 --> 00:43:14,380 Suppose someone gave you Dijkstra. 806 00:43:14,380 --> 00:43:16,125 Said, here, look, I've got Dijkstra code. 807 00:43:16,125 --> 00:43:17,375 You don't have to do anything. 808 00:43:17,375 --> 00:43:18,940 There's Dijkstra code right there. 809 00:43:18,940 --> 00:43:21,100 But Dijkstra solves weighted shortest path. 810 00:43:21,100 --> 00:43:22,160 I don't have any weights. 811 00:43:22,160 --> 00:43:24,960 What do I do? 812 00:43:24,960 --> 00:43:28,140 Set the weights to 1. 813 00:43:28,140 --> 00:43:30,630 It's very easy, but this is a reduction-- 814 00:43:30,630 --> 00:43:32,460 a simple example of reduction. 815 00:43:32,460 --> 00:43:35,330 Not the smartest of reductions, but it's a reduction. 816 00:43:38,840 --> 00:43:40,780 So I can convert unweighted shortest paths 817 00:43:40,780 --> 00:43:43,750 into weighted shortest paths by adding weights of 1. 818 00:43:43,750 --> 00:43:44,320 Done. 819 00:43:44,320 --> 00:43:46,070 Adding weights of 0 would not work. 820 00:43:46,070 --> 00:43:47,170 But weights of 1. 821 00:43:47,170 --> 00:43:47,900 OK. 822 00:43:47,900 --> 00:43:49,492 Weights of 2 also works. 823 00:43:49,492 --> 00:43:51,950 Pick your favorite number, but as long as you're consistent 824 00:43:51,950 --> 00:43:52,780 about it. 825 00:43:52,780 --> 00:43:54,520 That's a reduction. 826 00:43:54,520 --> 00:43:56,570 Here's some more interesting ones. 827 00:43:56,570 --> 00:44:03,920 On the problem set-- problem set six-- 828 00:44:03,920 --> 00:44:08,205 there was this RenBook problem, "I Can Haz Moar Frendz?" 829 00:44:08,205 --> 00:44:09,580 That was the name of the problem. 830 00:44:09,580 --> 00:44:14,640 And the goal was to solve-- to find 831 00:44:14,640 --> 00:44:17,884 paths that minimize the product of weights. 
832 00:44:17,884 --> 00:44:19,300 But what we've covered in class is 833 00:44:19,300 --> 00:44:21,910 how to solve a problem when it's the sum of weights. 834 00:44:21,910 --> 00:44:23,890 How do you do it? 835 00:44:23,890 --> 00:44:26,070 In one word, or less? 836 00:44:26,070 --> 00:44:26,990 Logs. 837 00:44:26,990 --> 00:44:28,920 Just take logs. 838 00:44:28,920 --> 00:44:31,597 That converts products into sums. 839 00:44:31,597 --> 00:44:32,930 Now you start to get the flavor. 840 00:44:32,930 --> 00:44:37,150 This is a problem that you could take Dijkstra or Bellman-Ford, 841 00:44:37,150 --> 00:44:39,390 and change all the relaxation steps 842 00:44:39,390 --> 00:44:42,470 and change it to work directly with products. 843 00:44:42,470 --> 00:44:46,570 That would work, but it's more work. 844 00:44:46,570 --> 00:44:49,200 You have to prove that that's still correct. 845 00:44:49,200 --> 00:44:50,500 It's annoying to think about. 846 00:44:50,500 --> 00:44:52,660 And it's annoying to program. 847 00:44:52,660 --> 00:44:54,590 It's not modular, blah, blah, blah. 848 00:44:54,590 --> 00:44:56,720 Whereas if you just do this reduction, 849 00:44:56,720 --> 00:44:59,990 you can use exactly the code that you had before, 850 00:44:59,990 --> 00:45:01,960 at the end. 851 00:45:01,960 --> 00:45:03,220 So that's nice. 852 00:45:03,220 --> 00:45:04,670 This is why reductions are really 853 00:45:04,670 --> 00:45:07,562 the most common algorithm design technique because you don't 854 00:45:07,562 --> 00:45:10,020 want to implement an algorithm for every single problem you 855 00:45:10,020 --> 00:45:10,700 have. 856 00:45:10,700 --> 00:45:13,200 It would be nice if you could reuse some of those algorithms 857 00:45:13,200 --> 00:45:14,630 that you had before. 858 00:45:14,630 --> 00:45:17,100 Reductions let you do that. 
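The two reductions just described, weights of 1 and taking logs, can be sketched in Python as thin wrappers around one unmodified Dijkstra routine. This is a minimal sketch under stated assumptions, not the course's code: the graph representation and the function names are hypothetical, and the product version assumes every weight is at least 1 so that the logs are nonnegative, which Dijkstra needs. The actual problem set may have used different weights.

```python
import heapq
import math


def dijkstra(graph, source):
    """Weighted shortest paths; graph maps u -> list of (v, weight >= 0)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def unweighted_shortest_paths(adj, source):
    """Reduction: give every edge weight 1, then call the weighted solver."""
    weighted = {u: [(v, 1) for v in vs] for u, vs in adj.items()}
    return dijkstra(weighted, source)


def min_product_paths(graph, source):
    """Reduction: logs turn products into sums (assumes all weights >= 1)."""
    logged = {u: [(v, math.log(w)) for v, w in edges]
              for u, edges in graph.items()}
    # Undo the log at the end to report products instead of sums of logs.
    return {v: math.exp(d) for v, d in dijkstra(logged, source).items()}
```

Neither wrapper touches the relaxation step inside `dijkstra`. Each one only rewrites the input, and the second also rewrites the output.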
859 00:45:17,100 --> 00:45:21,680 Another one, which was on the quiz in the true-false-- quiz 860 00:45:21,680 --> 00:45:25,532 two-- was converting longest path into shortest path. 861 00:45:25,532 --> 00:45:26,990 We didn't phrase it as a reduction. 862 00:45:26,990 --> 00:45:29,730 It was just can you solve longest path using 863 00:45:29,730 --> 00:45:30,910 Bellman-Ford. 864 00:45:30,910 --> 00:45:31,832 And the answer is yes. 865 00:45:31,832 --> 00:45:33,165 You just negate all the weights. 866 00:45:33,165 --> 00:45:34,900 And that converts a longest path problem 867 00:45:34,900 --> 00:45:37,660 into a shortest path problem. 868 00:45:37,660 --> 00:45:40,310 Easy. 869 00:45:40,310 --> 00:45:43,030 Also on the quiz-- maybe I don't need to write all of these down 870 00:45:43,030 --> 00:45:45,200 because they're a little bit weird problems. 871 00:45:45,200 --> 00:45:46,370 We made them up. 872 00:45:46,370 --> 00:45:50,220 There was the-- what was the duck tour called? 873 00:45:50,220 --> 00:45:50,990 Bird tours? 874 00:45:50,990 --> 00:45:51,950 Bird tours? 875 00:45:51,950 --> 00:45:52,700 Aviation tours? 876 00:45:52,700 --> 00:45:53,610 Whatever. 877 00:45:53,610 --> 00:45:56,990 You want to visit a bunch of sites in some specified order. 878 00:45:56,990 --> 00:45:58,990 The point in that problem is you could reduce it 879 00:45:58,990 --> 00:46:02,900 to a single shortest paths query. 880 00:46:02,900 --> 00:46:05,682 And so if you already have shortest path code, 881 00:46:05,682 --> 00:46:06,890 you don't have to think much. 882 00:46:06,890 --> 00:46:08,400 You just do the graph application. 883 00:46:08,400 --> 00:46:09,970 Done. 884 00:46:09,970 --> 00:46:11,600 Then there's the leaky tank problem, 885 00:46:11,600 --> 00:46:14,570 which is also a graph reduction problem. 
886 00:46:14,570 --> 00:46:16,570 You could represent all these extra weird things 887 00:46:16,570 --> 00:46:18,640 that were happening in your car by just 888 00:46:18,640 --> 00:46:20,202 changing the graph a little bit. 889 00:46:20,202 --> 00:46:21,660 And it's a very powerful technique. 890 00:46:21,660 --> 00:46:24,860 In this class, we see it mostly in graph reductions. 891 00:46:24,860 --> 00:46:28,120 But it could apply all over the place. 892 00:46:28,120 --> 00:46:30,810 And while this is a powerful technique for coming up 893 00:46:30,810 --> 00:46:34,310 with new algorithms, it's also a powerful technique 894 00:46:34,310 --> 00:46:41,380 for proving things like Tetris is NP-hard. 895 00:46:41,380 --> 00:46:43,830 So what we proved is that a problem 896 00:46:43,830 --> 00:46:49,600 called 3-Partition can be reduced to Tetris. 897 00:46:57,810 --> 00:46:58,610 What's 3-Partition? 898 00:46:58,610 --> 00:47:01,000 3-Partition is I give you n numbers. 899 00:47:01,000 --> 00:47:03,930 I want to know can I divide them into triples, 900 00:47:03,930 --> 00:47:06,450 each of the same sum. 901 00:47:06,450 --> 00:47:07,780 So I have n numbers. 902 00:47:07,780 --> 00:47:10,170 Divide them into n over 3 groups of 3, 903 00:47:10,170 --> 00:47:14,030 such that the sum of each of the 3s is equal. 904 00:47:14,030 --> 00:47:15,780 Sounds like an easy enough problem. 905 00:47:15,780 --> 00:47:18,230 But it's an NP-complete problem. 906 00:47:18,230 --> 00:47:22,950 And people knew that since one of the first papers. 907 00:47:22,950 --> 00:47:26,790 I guess that was late '70s, early '80s, by Karp. 908 00:47:26,790 --> 00:47:28,800 So Karp already proved this is standing 909 00:47:28,800 --> 00:47:32,410 on the shoulders of giants. 910 00:47:32,410 --> 00:47:34,360 Karp proved 3-Partition is NP-complete, 911 00:47:34,360 --> 00:47:37,060 so I don't need to think about that. 
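To make the 3-Partition statement concrete, here is a rough Python sketch with two parts: a fast checker for a proposed grouping into triples, and an exhaustive search that finds one. It assumes the inputs are integers, and both function names are hypothetical. The search is exponential in the worst case, which is consistent with the problem being NP-complete, and it is not meant as a practical solver.

```python
from itertools import combinations


def check_three_partition(numbers, triples):
    """Polynomial-time check: triples is a list of index triples."""
    if len(numbers) % 3 != 0 or len(triples) != len(numbers) // 3:
        return False
    if any(len(t) != 3 for t in triples):
        return False
    used = sorted(i for t in triples for i in t)
    if used != list(range(len(numbers))):
        return False  # every index must be used exactly once
    sums = {sum(numbers[i] for i in t) for t in triples}
    return len(sums) <= 1  # all triples share the same sum


def find_three_partition(numbers):
    """Exhaustive search; returns a list of index triples, or None."""
    n = len(numbers)
    if n % 3 != 0:
        return None
    if n == 0:
        return []
    groups = n // 3
    total = sum(numbers)
    if total % groups != 0:
        return None
    target = total // groups

    def search(remaining):
        if not remaining:
            return []
        first, rest = remaining[0], remaining[1:]
        # The first unused index must go somewhere: try every pair with it.
        for a, b in combinations(rest, 2):
            if numbers[first] + numbers[a] + numbers[b] == target:
                left = [i for i in rest if i not in (a, b)]
                tail = search(left)
                if tail is not None:
                    return [(first, a, b)] + tail
        return None

    return search(list(range(n)))
```

For example, `find_three_partition([4, 5, 6, 4, 5, 6])` can pair each 4 with a 5 and a 6, while `[1, 1, 1, 2, 2, 5]` has no valid grouping even though its total divides evenly.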
912 00:47:37,060 --> 00:47:39,210 All I need to focus on is showing 913 00:47:39,210 --> 00:47:43,470 that Tetris is harder than 3-Partition. 914 00:47:43,470 --> 00:47:45,270 This is what I mean by harder. 915 00:47:45,270 --> 00:47:48,990 Harder means-- so when I can reduce A to B, 916 00:47:48,990 --> 00:48:02,090 we say B is at least as hard as A. Why's that? 917 00:48:02,090 --> 00:48:05,820 Because I can solve A by solving B. I just apply this reduction 918 00:48:05,820 --> 00:48:08,570 and then solve B. So if I had some good way to solve B, 919 00:48:08,570 --> 00:48:11,110 it would turn into a good way to solve A. 920 00:48:11,110 --> 00:48:14,940 Now 3-Partition-- which is A, here-- we're 921 00:48:14,940 --> 00:48:17,440 pretty sure there's no good algorithm for solving this. 922 00:48:17,440 --> 00:48:22,900 Pretty sure it's not in P. And so Tetris better not be in P 923 00:48:22,900 --> 00:48:25,430 either because if Tetris were in P, then 924 00:48:25,430 --> 00:48:27,140 we could just take our 3-Partition, 925 00:48:27,140 --> 00:48:30,990 reduce it to Tetris, and then 3-Partition would be in P. 926 00:48:30,990 --> 00:48:33,210 In fact, all of the NP-complete problems, 927 00:48:33,210 --> 00:48:36,470 you can reduce to each other. 928 00:48:36,470 --> 00:48:39,820 And so to show that something is at that little position, 929 00:48:39,820 --> 00:48:41,900 NP-complete, all you need to do is 930 00:48:41,900 --> 00:48:44,120 find some known NP-complete problem 931 00:48:44,120 --> 00:48:47,520 and reduce it to your problem. 932 00:48:47,520 --> 00:48:51,400 So reductions are super useful for getting positive results 933 00:48:51,400 --> 00:48:53,580 for making new algorithms, but also 934 00:48:53,580 --> 00:48:56,110 for proving negative results-- showing that one problem is 935 00:48:56,110 --> 00:48:57,310 harder than another.
936 00:48:57,310 --> 00:48:59,080 And if you already believe this is hard, 937 00:48:59,080 --> 00:49:00,621 then you should believe this is hard. 938 00:49:08,570 --> 00:49:12,060 I think that's all I really have time for. 939 00:49:12,060 --> 00:49:14,480 I'll give you a couple more NP-complete problems. 940 00:49:14,480 --> 00:49:15,930 Kind of fun. 941 00:49:15,930 --> 00:49:18,896 Traveling salesman problem, you may have heard of. 942 00:49:18,896 --> 00:49:20,020 Let's say you have a graph. 943 00:49:20,020 --> 00:49:22,040 And you want to find out the shortest path that 944 00:49:22,040 --> 00:49:25,770 visits all the vertices, not just one vertex. 945 00:49:25,770 --> 00:49:28,680 That's NP-complete. 946 00:49:28,680 --> 00:49:31,680 We solved longest common subsequence for two strings, 947 00:49:31,680 --> 00:49:33,280 but if I give you n strings that you 948 00:49:33,280 --> 00:49:35,238 need to find the longest common subsequence of, 949 00:49:35,238 --> 00:49:37,730 that's NP-complete. 950 00:49:37,730 --> 00:49:41,560 Minesweeper, Sudoku, most puzzles that are interesting 951 00:49:41,560 --> 00:49:43,990 are NP-complete. 952 00:49:43,990 --> 00:49:45,360 SAT. 953 00:49:45,360 --> 00:49:53,120 SAT is a-- I give you a Boolean formula like x or y AND NOT 954 00:49:53,120 --> 00:49:55,050 x-- something like that. 955 00:49:55,050 --> 00:49:57,499 I want to know is there some setting of the variables that 956 00:49:57,499 --> 00:49:58,790 makes this thing come out true? 957 00:49:58,790 --> 00:50:01,634 Is it possible to make this true? 958 00:50:01,634 --> 00:50:02,800 That's NP-complete. 959 00:50:02,800 --> 00:50:04,310 This was actually the first problem 960 00:50:04,310 --> 00:50:05,610 that was shown NP-complete. 961 00:50:05,610 --> 00:50:06,880 There's this issue, right? 962 00:50:06,880 --> 00:50:08,754 If I'm going to show everything's NP-complete 963 00:50:08,754 --> 00:50:10,910 by reduction, how the heck do I get started?
964 00:50:10,910 --> 00:50:12,360 What's the first problem? 965 00:50:12,360 --> 00:50:15,620 And this is the first problem. 966 00:50:15,620 --> 00:50:18,580 You could sort of prove it by definition, almost, of NP, 967 00:50:18,580 --> 00:50:19,480 here. 968 00:50:19,480 --> 00:50:22,760 But I won't do that. 969 00:50:22,760 --> 00:50:24,610 Three coloring a graph. 970 00:50:24,610 --> 00:50:25,280 Shortest paths. 971 00:50:25,280 --> 00:50:26,010 This is fun. 972 00:50:26,010 --> 00:50:27,840 Shortest paths in a graph is easy. 973 00:50:27,840 --> 00:50:30,620 But in the real world, we live in a three dimensional, 974 00:50:30,620 --> 00:50:31,880 geometric environment. 975 00:50:31,880 --> 00:50:33,338 What if I want to find the shortest 976 00:50:33,338 --> 00:50:35,620 path from this point, where I am, to that point, 977 00:50:35,620 --> 00:50:37,500 over on the ceiling or something. 978 00:50:37,500 --> 00:50:40,020 And I can fly. 979 00:50:40,020 --> 00:50:41,669 That's NP-complete. 980 00:50:41,669 --> 00:50:42,460 It's kind of weird. 981 00:50:42,460 --> 00:50:44,160 Shortest paths in a two dimensional environment 982 00:50:44,160 --> 00:50:44,743 is polynomial. 983 00:50:44,743 --> 00:50:47,532 It's a good thing that we are on ground because, then, we 984 00:50:47,532 --> 00:50:48,990 can model things by two dimensions. 985 00:50:48,990 --> 00:50:50,470 We can model things by graphs. 986 00:50:50,470 --> 00:50:53,500 But in 3D, shortest paths is NP-complete. 987 00:50:53,500 --> 00:50:56,139 So all these things where a problem-- knapsack, 988 00:50:56,139 --> 00:50:56,930 that's another one. 989 00:50:56,930 --> 00:50:58,221 We've already covered knapsack. 990 00:50:58,221 --> 00:50:59,990 We saw a pseudo-polynomial algorithm. 991 00:50:59,990 --> 00:51:02,390 Turns out, you can't do better than pseudo-polynomial 992 00:51:02,390 --> 00:51:07,030 unless P equals NP because knapsack is NP-complete. 993 00:51:07,030 --> 00:51:08,160 So there you go.
994 00:51:08,160 --> 00:51:11,313 Computational complexity in 50 minutes.
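As a closing illustration of the SAT problem described in the lecture, here is a hedged Python sketch of the obvious exponential-time check: try every setting of the variables and see whether the formula comes out true. The function name `brute_force_sat` and the representation of a formula as a Python callable over an assignment dictionary are assumptions for this example, and the spoken formula is read here as (x OR y) AND NOT x, which is one possible parenthesization.

```python
from itertools import product


def brute_force_sat(formula, variables):
    """Return a satisfying assignment as a dict, or None if none exists.

    formula: callable taking a dict {variable name: bool} (hypothetical format).
    Tries all 2^n assignments, so this runs in exponential time.
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return assignment
    return None


def example_formula(a):
    # Assumed reading of the lecture's example: (x OR y) AND NOT x
    return (a["x"] or a["y"]) and not a["x"]


satisfying = brute_force_sat(example_formula, ["x", "y"])
```

Checking one proposed assignment takes a single call to `formula`, which is the "easy to verify" side of NP; the loop over all assignments is the part nobody knows how to avoid in general.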