1 00:00:00,790 --> 00:00:03,130 The following content is provided under a Creative 2 00:00:03,130 --> 00:00:04,550 Commons license. 3 00:00:04,550 --> 00:00:06,760 Your support will help MIT OpenCourseWare 4 00:00:06,760 --> 00:00:10,850 continue to offer high quality educational resources for free. 5 00:00:10,850 --> 00:00:13,390 To make a donation or to view additional materials 6 00:00:13,390 --> 00:00:17,320 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,320 --> 00:00:18,570 at ocw.mit.edu. 8 00:00:30,962 --> 00:00:36,450 JOHN GUTTAG: All right, welcome to the 60002, 9 00:00:36,450 --> 00:00:40,270 or if you were in 600, the second half of 600. 10 00:00:40,270 --> 00:00:42,440 I'm John Guttag. 11 00:00:42,440 --> 00:00:44,590 Let me start with a few administrative things. 12 00:00:47,510 --> 00:00:48,770 What's the workload? 13 00:00:48,770 --> 00:00:50,900 There are problem sets. 14 00:00:50,900 --> 00:00:52,850 They'll all be programming problems 15 00:00:52,850 --> 00:00:56,210 much in the style of 60001. 16 00:00:56,210 --> 00:00:59,680 And the goal-- really twofold. 17 00:00:59,680 --> 00:01:03,470 60001 problem sets were mostly about you 18 00:01:03,470 --> 00:01:06,590 learning to be a programmer. 19 00:01:06,590 --> 00:01:08,360 A lot of that carries over. 20 00:01:08,360 --> 00:01:12,530 No one learns to be a programmer in half a semester. 21 00:01:12,530 --> 00:01:15,470 So a lot of it is to improve your skills, 22 00:01:15,470 --> 00:01:18,620 but also there's a lot more, I would say, 23 00:01:18,620 --> 00:01:24,380 conceptual, algorithmic material in 60002, 24 00:01:24,380 --> 00:01:26,270 and the problem sets are designed 25 00:01:26,270 --> 00:01:28,280 to help cement that as well as just 26 00:01:28,280 --> 00:01:31,260 to give you programming experience. 27 00:01:31,260 --> 00:01:34,530 Finger exercises, small things. 28 00:01:34,530 --> 00:01:39,150 If they're taking you more than 15 minutes, let us know. 
29 00:01:39,150 --> 00:01:42,090 They really shouldn't, and they're generally 30 00:01:42,090 --> 00:01:45,270 designed to help you learn a single concept, usually 31 00:01:45,270 --> 00:01:48,000 a programming concept. 32 00:01:48,000 --> 00:01:50,670 Reading assignments in the textbooks, 33 00:01:50,670 --> 00:01:54,450 I've already posted the first reading assignment, 34 00:01:54,450 --> 00:01:58,440 and essentially they should provide you a very different 35 00:01:58,440 --> 00:02:00,300 take on the same material we're covering 36 00:02:00,300 --> 00:02:03,540 in lectures and recitations. 37 00:02:03,540 --> 00:02:06,270 We've tried to choose different examples for lectures 38 00:02:06,270 --> 00:02:08,520 and from the textbooks for the most part, 39 00:02:08,520 --> 00:02:12,330 so you get to see things in two slightly different ways. 40 00:02:12,330 --> 00:02:16,680 There'll be a final exam based upon all of the above. 41 00:02:16,680 --> 00:02:18,600 All right, prerequisites-- experience 42 00:02:18,600 --> 00:02:22,830 writing object-oriented programs in Python, preferably 43 00:02:22,830 --> 00:02:27,570 Python 3.5. 44 00:02:27,570 --> 00:02:31,260 Familiarity with concepts of computational complexity. 45 00:02:31,260 --> 00:02:32,950 You'll see even in today's lecture, 46 00:02:32,950 --> 00:02:35,310 we'll be assuming that. 47 00:02:35,310 --> 00:02:37,650 Familiarity with some simple algorithms. 48 00:02:40,870 --> 00:02:45,280 If you took 60001 or you took the 60001 advanced 49 00:02:45,280 --> 00:02:49,360 standing exam, you'll be fine. 50 00:02:49,360 --> 00:02:51,910 Odds are you'll be fine anyway, but that's 51 00:02:51,910 --> 00:02:54,260 the safest way to do it. 
52 00:02:54,260 --> 00:02:56,890 So the programming assignments are 53 00:02:56,890 --> 00:02:59,500 going to be a bit easier, at least that's 54 00:02:59,500 --> 00:03:01,840 what students have reported in the past, 55 00:03:01,840 --> 00:03:04,930 because they'll be more focused on the problem to be solved 56 00:03:04,930 --> 00:03:07,240 than on the actual programming. 57 00:03:07,240 --> 00:03:10,000 The lecture content, more abstract. 58 00:03:10,000 --> 00:03:11,410 The lectures will be-- 59 00:03:11,410 --> 00:03:13,270 and maybe I'm speaking euphemistically-- 60 00:03:13,270 --> 00:03:15,260 a bit faster paced. 61 00:03:15,260 --> 00:03:18,670 So hang on to your seats. 62 00:03:18,670 --> 00:03:21,670 And the course is really less about programming 63 00:03:21,670 --> 00:03:25,690 and more about dipping your toe into the exotic world of data 64 00:03:25,690 --> 00:03:27,880 science. 65 00:03:27,880 --> 00:03:30,760 We do want you to hone your programming skills. 66 00:03:30,760 --> 00:03:33,100 There'll be a few additional bits of Python. 67 00:03:33,100 --> 00:03:37,870 Today, for example, we'll talk about lambda abstraction. 68 00:03:37,870 --> 00:03:40,420 Inevitably, some comments about software engineering, 69 00:03:40,420 --> 00:03:45,460 how to structure your code, more emphasis in using packages. 70 00:03:45,460 --> 00:03:47,290 Hopefully it will go a little bit smoother 71 00:03:47,290 --> 00:03:51,370 than in the last problem set in 60001. 72 00:03:51,370 --> 00:03:54,550 And finally, it's the old joke about programming 73 00:03:54,550 --> 00:04:00,390 that somebody walks up to a taxi driver in New York City 74 00:04:00,390 --> 00:04:01,830 and says, "I'm lost. 75 00:04:01,830 --> 00:04:03,930 How do I get to Carnegie Hall?" 76 00:04:03,930 --> 00:04:05,820 The taxi driver turns to the person 77 00:04:05,820 --> 00:04:09,790 and says, "practice, practice, practice." 
78 00:04:09,790 --> 00:04:12,090 And that's really the only way to learn to program 79 00:04:12,090 --> 00:04:14,135 is practice, practice, practice. 80 00:04:17,290 --> 00:04:19,990 The main topic of the course is what I think 81 00:04:19,990 --> 00:04:22,580 of as computational models. 82 00:04:22,580 --> 00:04:24,970 How do we use computation to understand 83 00:04:24,970 --> 00:04:28,770 the world in which we live? 84 00:04:28,770 --> 00:04:29,820 What is a model? 85 00:04:29,820 --> 00:04:33,180 To me I think of it as an experimental device 86 00:04:33,180 --> 00:04:35,880 that can help us to either understand something that 87 00:04:35,880 --> 00:04:40,980 has happened, to sort of build a model that explains phenomena 88 00:04:40,980 --> 00:04:44,250 we see every day, or a model that 89 00:04:44,250 --> 00:04:46,320 will allow us to predict the future, something 90 00:04:46,320 --> 00:04:48,720 that hasn't happened. 91 00:04:48,720 --> 00:04:51,150 So you can think of, for example, a climate change 92 00:04:51,150 --> 00:04:52,320 model. 93 00:04:52,320 --> 00:04:55,530 We can build models that sort of explain how the climate has 94 00:04:55,530 --> 00:04:58,380 changed over the millennia, and then we 95 00:04:58,380 --> 00:05:00,450 can build probably a slightly different model 96 00:05:00,450 --> 00:05:03,330 that might predict what it will be like in the future. 97 00:05:06,630 --> 00:05:10,620 So essentially what's happening is 98 00:05:10,620 --> 00:05:17,640 science is moving out of the wet lab and into the computer. 99 00:05:17,640 --> 00:05:19,830 Increasingly, I'm sure you all see this-- 100 00:05:19,830 --> 00:05:22,050 those of you who are science majors-- 101 00:05:22,050 --> 00:05:25,680 an increasing reliance on computation rather than 102 00:05:25,680 --> 00:05:28,650 traditional experimentation. 
103 00:05:28,650 --> 00:05:32,040 As we'll talk about, traditional experimentation 104 00:05:32,040 --> 00:05:34,980 is and will remain important, but now it 105 00:05:34,980 --> 00:05:39,300 has to really be supplemented by computation. 106 00:05:39,300 --> 00:05:41,790 We'll talk about three kinds of models-- 107 00:05:41,790 --> 00:05:48,240 optimization models, statistical models, and simulation models. 108 00:05:48,240 --> 00:05:52,610 So let's talk first about optimization models. 109 00:05:52,610 --> 00:05:56,220 An optimization model is a very simple thing. 110 00:05:56,220 --> 00:05:59,600 We start with an objective function that's either 111 00:05:59,600 --> 00:06:03,340 to be maximized or minimized. 112 00:06:03,340 --> 00:06:06,610 So for, example, if I'm going from New York to Boston, 113 00:06:06,610 --> 00:06:09,400 I might want to find a route by car or plane 114 00:06:09,400 --> 00:06:13,930 or train that minimizes the total travel time. 115 00:06:13,930 --> 00:06:15,610 So my objective function would be 116 00:06:15,610 --> 00:06:19,870 the number of minutes spent in transit getting from a to b. 117 00:06:23,380 --> 00:06:28,520 We then often have to layer on top of that objective function 118 00:06:28,520 --> 00:06:34,380 a set of constraints, sometimes empty, that we have to obey. 119 00:06:34,380 --> 00:06:38,570 So maybe the fastest way to get from New York to Boston 120 00:06:38,570 --> 00:06:42,570 is to take a plane, but I only have $100 to spend. 121 00:06:42,570 --> 00:06:44,710 So that option is off the table. 122 00:06:44,710 --> 00:06:47,690 So I have the constraints there on the amount 123 00:06:47,690 --> 00:06:50,030 of money I can spend. 124 00:06:50,030 --> 00:06:53,480 Or maybe I have to be in Boston before 5:00 PM 125 00:06:53,480 --> 00:06:58,220 and while the bus would get me there for $15, 126 00:06:58,220 --> 00:07:00,480 it won't get me there before 5:00. 
127 00:07:00,480 --> 00:07:04,460 And so maybe what I'm left with is driving, 128 00:07:04,460 --> 00:07:05,870 something like that. 129 00:07:05,870 --> 00:07:08,570 So objective function, something you're either 130 00:07:08,570 --> 00:07:12,950 minimizing or maximizing, and a set of constraints 131 00:07:12,950 --> 00:07:16,700 that eliminate some solutions. 132 00:07:16,700 --> 00:07:19,550 And as we'll see, there's an asymmetry here. 133 00:07:19,550 --> 00:07:22,791 We handle these two things differently. 134 00:07:26,850 --> 00:07:28,550 We use these things all the time. 135 00:07:31,220 --> 00:07:35,420 I commute to work using Waze, which essentially is solving-- 136 00:07:35,420 --> 00:07:38,660 not very well, I believe-- an optimization problem 137 00:07:38,660 --> 00:07:42,380 to minimize my time from home to here. 138 00:07:42,380 --> 00:07:47,140 When you travel, maybe you log into various advisory programs 139 00:07:47,140 --> 00:07:51,470 that try and optimize things for you. 140 00:07:51,470 --> 00:07:52,970 They're all over the place. 141 00:07:52,970 --> 00:07:58,280 Today you really can't avoid using optimization algorithm 142 00:07:58,280 --> 00:07:59,450 as you get through life. 143 00:08:03,200 --> 00:08:04,020 Pretty abstract. 144 00:08:04,020 --> 00:08:07,100 Let's talk about a specific optimization problem 145 00:08:07,100 --> 00:08:10,460 called the knapsack problem. 146 00:08:10,460 --> 00:08:12,950 The first time I talked about the knapsack problem 147 00:08:12,950 --> 00:08:15,890 I neglected to show a picture of a knapsack, 148 00:08:15,890 --> 00:08:17,930 and I was 10 minutes into it before I 149 00:08:17,930 --> 00:08:21,890 realized most of the class had no idea what a knapsack was. 150 00:08:21,890 --> 00:08:25,370 It's what we old people used to call a backpack, 151 00:08:25,370 --> 00:08:30,390 and they used to look more like that than they look today. 
152 00:08:30,390 --> 00:08:34,919 So the knapsack problem involves-- 153 00:08:34,919 --> 00:08:39,620 usually it's told in terms of a burglar who breaks into a house 154 00:08:39,620 --> 00:08:42,049 and wants to steal a bunch of stuff 155 00:08:42,049 --> 00:08:44,240 but has a knapsack that will only 156 00:08:44,240 --> 00:08:48,740 hold a finite amount of stuff that he or she wishes to steal. 157 00:08:48,740 --> 00:08:53,750 And so the burglar has to solve the optimization problem 158 00:08:53,750 --> 00:08:57,920 of stealing the stuff with the most value while obeying 159 00:08:57,920 --> 00:09:03,110 the constraint that it all has to fit in the knapsack. 160 00:09:03,110 --> 00:09:07,740 So we have an objective function. 161 00:09:07,740 --> 00:09:10,530 I'll get the most for this when I fence it. 162 00:09:10,530 --> 00:09:13,950 And a constraint, it has to fit in my backpack. 163 00:09:13,950 --> 00:09:17,100 And you can guess which of these might be 164 00:09:17,100 --> 00:09:18,580 the most valuable items here. 165 00:09:21,440 --> 00:09:27,890 So here is in words, written words what I just said orally. 166 00:09:27,890 --> 00:09:29,630 There's more stuff than you can carry, 167 00:09:29,630 --> 00:09:32,270 and you have to choose which stuff to take 168 00:09:32,270 --> 00:09:33,520 and which to leave behind. 169 00:09:36,210 --> 00:09:39,890 I should point out that there are two variants of it. 170 00:09:39,890 --> 00:09:46,740 There's the 0/1 knapsack problem and the continuous. 171 00:09:46,740 --> 00:09:52,160 The 0/1 would be illustrated by something like this. 172 00:09:52,160 --> 00:09:55,060 So the 0/1 knapsack problem means you either take 173 00:09:55,060 --> 00:09:56,790 the object or you don't. 174 00:09:56,790 --> 00:10:01,480 I take that whole gold bar or I take none of it. 175 00:10:01,480 --> 00:10:04,480 The continuous or so-called fractional knapsack problem 176 00:10:04,480 --> 00:10:07,100 says I can take pieces of it. 
177 00:10:07,100 --> 00:10:08,770 So maybe if I took my gold bar 178 00:10:08,770 --> 00:10:12,130 and shaved it into gold dust, I then can say, 179 00:10:12,130 --> 00:10:13,700 well, the whole thing won't fit in, 180 00:10:13,700 --> 00:10:16,800 but I can fit in a part of it. 181 00:10:16,800 --> 00:10:20,940 The continuous knapsack problem is really boring. 182 00:10:20,940 --> 00:10:22,837 It's easy to solve. 183 00:10:22,837 --> 00:10:25,170 How do you think you would solve the continuous problem? 184 00:10:29,650 --> 00:10:34,560 Suppose you had over here a pile of gold and a pile of silver 185 00:10:34,560 --> 00:10:40,010 and a pile of raisins, and you wanted to maximize your value. 186 00:10:40,010 --> 00:10:42,890 Well, you'd fill up your knapsack with gold 187 00:10:42,890 --> 00:10:45,905 until you either ran out of gold or ran out of space. 188 00:10:45,905 --> 00:10:48,230 If you haven't run out of space, you'll 189 00:10:48,230 --> 00:10:52,577 now put silver in until you run out of space. 190 00:10:52,577 --> 00:10:54,160 If you still haven't run out of space, 191 00:10:54,160 --> 00:10:57,660 well, then you'll take as many raisins as you can fit in. 192 00:10:57,660 --> 00:11:01,090 So you can solve it with what's called a greedy algorithm, 193 00:11:01,090 --> 00:11:03,410 and we'll talk much more about this as we go forward, 194 00:11:07,860 --> 00:11:10,920 where you take the best thing first as long as 195 00:11:10,920 --> 00:11:15,320 you can and then you move on to the next thing. 196 00:11:15,320 --> 00:11:18,920 As we'll see, the 0/1 knapsack problem 197 00:11:18,920 --> 00:11:22,820 is much more complicated because once you make a decision, 198 00:11:22,820 --> 00:11:26,950 it will affect the future decisions. 199 00:11:26,950 --> 00:11:30,790 Let's look at an example, and I should probably warn you, 200 00:11:30,790 --> 00:11:35,120 if you're hungry, this is not going to be a fun lecture. 
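The greedy strategy for the continuous problem described above (gold first, then silver, then raisins) can be sketched roughly as follows. This is only an illustration: the function name `fractional_knapsack`, the tuple layout, and all of the numbers are assumptions, not something shown in the lecture.

```python
def fractional_knapsack(piles, capacity):
    """Greedy sketch for the continuous (fractional) knapsack problem.

    piles: list of (name, value_per_unit, units_available) tuples (hypothetical layout).
    capacity: number of units the knapsack can hold.
    Returns (list of (name, units_taken), total_value).
    """
    taken = []
    total_value = 0.0
    remaining = capacity
    # Consider the material that is worth the most per unit first.
    for name, value_per_unit, available in sorted(
            piles, key=lambda pile: pile[1], reverse=True):
        if remaining <= 0:
            break  # the knapsack is full
        amount = min(available, remaining)  # take as much as fits
        taken.append((name, amount))
        total_value += amount * value_per_unit
        remaining -= amount
    return taken, total_value


# Made-up piles: raisins are cheap, gold is valuable.
piles = [('raisins', 1, 50), ('gold', 100, 3), ('silver', 10, 4)]
print(fractional_knapsack(piles, 10))
```

Because any pile can be split, taking the most valuable material first never blocks a better choice later, which is why this variant is easy.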
201 00:11:35,120 --> 00:11:38,020 So here is my least favorite because I always 202 00:11:38,020 --> 00:11:41,030 want to eat more than I'm supposed to eat. 203 00:11:41,030 --> 00:11:44,400 So the point is typically knapsack problems 204 00:11:44,400 --> 00:11:48,730 are not physical knapsacks but some conceptual idea. 205 00:11:48,730 --> 00:11:53,830 So let's say that I'm allowed 1,500 calories of food, 206 00:11:53,830 --> 00:11:56,810 and these are my options. 207 00:11:56,810 --> 00:12:00,580 I have to go about deciding, looking at this food-- 208 00:12:00,580 --> 00:12:02,830 and it's interesting, again, there's things showing up 209 00:12:02,830 --> 00:12:06,050 on your screen that are not showing up on my screen, 210 00:12:06,050 --> 00:12:10,030 but they're harmless, things like how my mouse works. 211 00:12:10,030 --> 00:12:17,000 Anyway, so I'm trying to take some fraction of this food, 212 00:12:17,000 --> 00:12:23,240 and it can't add up to more than 1,500 calories. 213 00:12:23,240 --> 00:12:27,920 The problem might be that once I take something that's 214 00:12:27,920 --> 00:12:30,830 1,485 calories, I can't take anything 215 00:12:30,830 --> 00:12:34,640 else, or maybe 1,200 calories and everything else is 216 00:12:34,640 --> 00:12:36,380 more than 300. 217 00:12:36,380 --> 00:12:40,760 So once I take one thing, it constrains possible solutions. 218 00:12:40,760 --> 00:12:42,540 A greedy algorithm, as we'll see, 219 00:12:42,540 --> 00:12:46,670 is not guaranteed to give me the best answer. 220 00:12:46,670 --> 00:12:49,670 Let's look at a formalization of it. 221 00:12:49,670 --> 00:12:55,510 So each item is represented by a pair, the value of the item 222 00:12:55,510 --> 00:12:56,670 and the weight of the item. 223 00:13:01,270 --> 00:13:04,540 And let's assume the knapsack can accommodate items 224 00:13:04,540 --> 00:13:10,000 with the total weight of no more than w. 
225 00:13:10,000 --> 00:13:12,560 I apologize for the short variable names, 226 00:13:12,560 --> 00:13:16,090 but they're easier to fit on a slide. 227 00:13:16,090 --> 00:13:19,540 Finally, we're going to have a vector l 228 00:13:19,540 --> 00:13:24,510 of length n representing the set of available items. 229 00:13:24,510 --> 00:13:29,210 This is assuming we have n items to choose from. 230 00:13:29,210 --> 00:13:31,840 So each element of the vector represents an item. 231 00:13:37,290 --> 00:13:39,600 So those are the items we have. 232 00:13:39,600 --> 00:13:43,140 And then another vector v is going 233 00:13:43,140 --> 00:13:47,600 to indicate whether or not an item was taken. 234 00:13:47,600 --> 00:13:49,850 So essentially I'm going to use a binary number 235 00:13:49,850 --> 00:13:54,700 to represent the set of items I choose to take. 236 00:13:54,700 --> 00:13:58,810 For item three say, if bit three is zero 237 00:13:58,810 --> 00:14:01,030 I'm not taking the item. 238 00:14:01,030 --> 00:14:06,270 If bit three is one, then I am taking the item. 239 00:14:06,270 --> 00:14:09,450 So it just shows I can now very nicely 240 00:14:09,450 --> 00:14:14,201 represent what I've done by a single vector of zeros 241 00:14:14,201 --> 00:14:14,700 and ones. 242 00:14:17,510 --> 00:14:20,000 Let me pause for a second. 243 00:14:20,000 --> 00:14:23,590 Does anyone have any questions about this setup? 244 00:14:23,590 --> 00:14:25,630 It's important to get this setup because what 245 00:14:25,630 --> 00:14:31,840 we're going to see now depends upon that setting in your head. 246 00:14:31,840 --> 00:14:35,590 So I've kind of used mathematics to describe the backpack 247 00:14:35,590 --> 00:14:36,860 problem. 248 00:14:36,860 --> 00:14:39,580 And that's typically the way we deal with these optimization 249 00:14:39,580 --> 00:14:40,690 problems. 
250 00:14:40,690 --> 00:14:43,870 We start with some informal description, 251 00:14:43,870 --> 00:14:48,920 and then we translate it into a mathematical representation. 252 00:14:48,920 --> 00:14:51,080 So here it is. 253 00:14:51,080 --> 00:14:52,850 We're going to try and find a vector 254 00:14:52,850 --> 00:15:02,430 V that maximizes the sum of V sub i times I sub i dot value. 255 00:15:05,760 --> 00:15:09,750 Now, remember I sub i dot value is the value of the item. 256 00:15:09,750 --> 00:15:17,740 V sub i is either zero or one. So if I didn't take the item, 257 00:15:17,740 --> 00:15:20,350 I'm multiplying its value by zero. 258 00:15:20,350 --> 00:15:23,410 So it contributes nothing to the sum. 259 00:15:23,410 --> 00:15:27,040 If I did take the item, I'm multiplying its value by one. 260 00:15:27,040 --> 00:15:31,270 So the value of the item gets added to the sum. 261 00:15:31,270 --> 00:15:35,910 So that tells me the value of V. And I 262 00:15:35,910 --> 00:15:38,160 want to get the most valuable V I 263 00:15:38,160 --> 00:15:43,290 can get subject to the constraint 264 00:15:43,290 --> 00:15:48,850 that if I look at each I sub i dot weight and multiply it by V sub i, 265 00:15:48,850 --> 00:15:54,670 the sum of the weights is no greater than w. 266 00:15:54,670 --> 00:15:56,940 So I'm playing the same trick as with the values, 267 00:15:56,940 --> 00:16:01,620 multiplying each one by zero or one, 268 00:16:01,620 --> 00:16:04,412 and that's my constraint. 269 00:16:08,480 --> 00:16:11,260 Make sense? 270 00:16:11,260 --> 00:16:16,960 All right, so now we have the problem formalized. 271 00:16:16,960 --> 00:16:19,370 How do we solve it? 272 00:16:19,370 --> 00:16:24,860 Well, the most obvious solution is brute force. 
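Written out, the formalization just described looks roughly like this. The symbols follow the spoken description: V is the 0/1 vector, I sub i is the i-th available item, and w is the maximum total weight. Indexing the items from 0 to n-1 is an assumption made here for concreteness.

```latex
\max_{V \in \{0,1\}^{n}} \; \sum_{i=0}^{n-1} V_i \cdot I_i.\mathrm{value}
\qquad \text{subject to} \qquad
\sum_{i=0}^{n-1} V_i \cdot I_i.\mathrm{weight} \;\le\; w
```

Both sums use the same trick: an item that is not taken has V sub i equal to zero, so it adds nothing to either the total value or the total weight.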
273 00:16:24,860 --> 00:16:27,510 I enumerate all possible combinations 274 00:16:27,510 --> 00:16:36,066 of items; that is to say, I generate all subsets 275 00:16:36,066 --> 00:16:37,440 of the items that are available-- 276 00:16:37,440 --> 00:16:40,020 I don't know why it says subjects here, 277 00:16:40,020 --> 00:16:41,520 but we should have said items. 278 00:16:41,520 --> 00:16:44,470 Let me fix that. 279 00:16:44,470 --> 00:16:47,170 This is called the power set. 280 00:16:47,170 --> 00:16:51,370 So the power set of a set includes the empty subset. 281 00:16:51,370 --> 00:16:54,670 It includes the set that includes everything 282 00:16:54,670 --> 00:16:58,730 and everything in between. 283 00:16:58,730 --> 00:17:03,550 So subsets of size one, subsets of size two, et cetera. 284 00:17:03,550 --> 00:17:07,450 So now I've generated all possible sets of items. 285 00:17:07,450 --> 00:17:10,960 I can now go through and sum up the weights 286 00:17:10,960 --> 00:17:16,460 and remove all those sets that weigh more than I'm allowed. 287 00:17:16,460 --> 00:17:18,780 And then from the remaining combinations, 288 00:17:18,780 --> 00:17:23,390 choose any one whose value is the largest. 289 00:17:23,390 --> 00:17:25,280 I say choose any one because there 290 00:17:25,280 --> 00:17:27,895 could be ties, in which case I don't care which I choose. 291 00:17:30,740 --> 00:17:34,800 So it's pretty obvious that this is going 292 00:17:34,800 --> 00:17:37,170 to give you a correct answer. 293 00:17:37,170 --> 00:17:39,540 You're considering all possibilities 294 00:17:39,540 --> 00:17:40,620 and choosing a winner. 295 00:17:43,290 --> 00:17:47,280 Unfortunately, it's usually not very practical. 296 00:17:47,280 --> 00:17:51,910 What we see here is how big the power 297 00:17:51,910 --> 00:17:54,420 set is if you have 100 items. 
298 00:17:54,420 --> 00:17:57,790 Not very practical, right, even for a fast computer 299 00:17:57,790 --> 00:18:01,170 generating that many possibilities is going 300 00:18:01,170 --> 00:18:04,200 to take a rather long time. 301 00:18:04,200 --> 00:18:05,910 So kind of disappointing. 302 00:18:05,910 --> 00:18:09,340 We look at it and say, well, we got a brute force algorithm. 303 00:18:09,340 --> 00:18:13,740 It will solve the problem, but it'll take too long. 304 00:18:13,740 --> 00:18:15,090 We can't actually do it. 305 00:18:15,090 --> 00:18:17,400 100 is a pretty small number, right. 306 00:18:17,400 --> 00:18:19,860 We often end up solving optimization problems 307 00:18:19,860 --> 00:18:22,830 where n is something closer to 1,000, sometimes 308 00:18:22,830 --> 00:18:25,020 even a million. 309 00:18:25,020 --> 00:18:27,970 Clearly, brute force isn't going to work. 310 00:18:27,970 --> 00:18:30,040 So that raises the next question, 311 00:18:30,040 --> 00:18:32,420 are we just being stupid? 312 00:18:32,420 --> 00:18:34,970 Is there a better algorithm that I should have showed you? 313 00:18:34,970 --> 00:18:35,970 I shouldn't say we. 314 00:18:35,970 --> 00:18:37,730 Am I just being stupid? 315 00:18:37,730 --> 00:18:42,950 Is there a better algorithm that would have given us the answer? 316 00:18:42,950 --> 00:18:48,960 The sad answer to that is no for the knapsack problem. 317 00:18:48,960 --> 00:18:52,820 And indeed many optimization problems 318 00:18:52,820 --> 00:18:56,240 are inherently exponential. 319 00:18:56,240 --> 00:19:00,590 What that means is there is no algorithm that 320 00:19:00,590 --> 00:19:04,880 provides an exact solution to this problem whose worst 321 00:19:04,880 --> 00:19:08,846 case running time is not exponential in the number 322 00:19:08,846 --> 00:19:09,345 of items. 323 00:19:12,640 --> 00:19:14,770 It is an exponentially hard problem. 324 00:19:17,470 --> 00:19:21,810 There is no really good solution. 
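The brute-force approach described above (generate every subset, discard the ones that are too heavy, keep the most valuable survivor) can be sketched as follows. The function name, the `(name, value, weight)` tuple layout, and the sample items are all hypothetical; the integer `mask` plays the role of the 0/1 vector V.

```python
def brute_force_knapsack(items, max_weight):
    """Try all 2**n subsets of items; each item is (name, value, weight)."""
    n = len(items)
    best_value, best_subset = 0, []
    for mask in range(2 ** n):
        # Bit i of mask says whether item i is taken (V[i] in the lecture).
        subset = [items[i] for i in range(n) if (mask >> i) & 1]
        weight = sum(w for _, _, w in subset)
        if weight > max_weight:
            continue  # violates the constraint, so drop this subset
        value = sum(v for _, v, _ in subset)
        if value > best_value:
            best_value, best_subset = value, subset
    return best_subset, best_value


# Made-up loot for illustration.
loot = [('clock', 175, 10), ('painting', 90, 9),
        ('radio', 20, 4), ('vase', 50, 2)]
print(brute_force_knapsack(loot, 20))
print(2 ** 100)  # how many subsets there would be with 100 items
```

With four items the loop runs 16 times; each additional item doubles that count, which is why this only works for small inputs.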
325 00:19:21,810 --> 00:19:28,010 But that should not make you sad because while there's 326 00:19:28,010 --> 00:19:32,060 no perfect solution, we're going to look at a couple of really 327 00:19:32,060 --> 00:19:36,020 very good solutions that will make this poor woman 328 00:19:36,020 --> 00:19:38,400 a happier person. 329 00:19:38,400 --> 00:19:40,220 So let's start with the greedy algorithm. 330 00:19:40,220 --> 00:19:44,360 I already talked to you about greedy algorithms. 331 00:19:44,360 --> 00:19:47,930 So it could hardly be simpler. 332 00:19:47,930 --> 00:19:50,600 We say while the knapsack is not full, 333 00:19:50,600 --> 00:19:52,770 put the best available item into the knapsack. 334 00:19:57,916 --> 00:19:59,040 When it's full, we're done. 335 00:20:03,710 --> 00:20:06,100 You do need to ask a question. 336 00:20:06,100 --> 00:20:09,290 What does best mean? 337 00:20:09,290 --> 00:20:14,100 Is the best item the most valuable? 338 00:20:14,100 --> 00:20:16,230 Is it the least expensive in terms 339 00:20:16,230 --> 00:20:20,190 of, say, the fewest calories, in my case? 340 00:20:20,190 --> 00:20:24,050 Or is it the highest ratio of value to units? 341 00:20:24,050 --> 00:20:27,120 Now, maybe I think a calorie in a glass of beer 342 00:20:27,120 --> 00:20:30,590 is worth more than a calorie in a bar of chocolate, 343 00:20:30,590 --> 00:20:33,330 maybe vice versa. 344 00:20:33,330 --> 00:20:37,380 Which gets me to a concrete example. 345 00:20:37,380 --> 00:20:40,680 So you're about to sit down to a meal. 346 00:20:40,680 --> 00:20:44,190 You know how much you value the various different foods. 347 00:20:44,190 --> 00:20:45,930 For example, maybe you like donuts 348 00:20:45,930 --> 00:20:48,360 more than you like apples. 
349 00:20:48,360 --> 00:20:50,070 You have a calorie budget, and here we're 350 00:20:50,070 --> 00:20:52,290 going to have a fairly austere budget-- 351 00:20:52,290 --> 00:20:54,980 it's only one meal; it's not the whole day-- 352 00:20:54,980 --> 00:20:58,920 of 750 calories, and we're going to have to go through menus 353 00:20:58,920 --> 00:21:01,470 and choose what to eat. 354 00:21:01,470 --> 00:21:04,720 That is as we've seen a knapsack problem. 355 00:21:04,720 --> 00:21:06,640 They should probably have a knapsack solver 356 00:21:06,640 --> 00:21:10,210 at every McDonald's and Burger King. 357 00:21:10,210 --> 00:21:16,690 So here's a menu I just made up of wine, beer, pizza, burger, 358 00:21:16,690 --> 00:21:19,930 fries, Coke, apples, and a donut, 359 00:21:19,930 --> 00:21:24,580 and the value I might place on each of these 360 00:21:24,580 --> 00:21:29,350 and the number of calories that actually are in each of these. 361 00:21:29,350 --> 00:21:32,050 And we're going to build a program that 362 00:21:32,050 --> 00:21:33,880 will find an optimal menu. 363 00:21:36,710 --> 00:21:40,150 And if you don't like this menu, you can run the program 364 00:21:40,150 --> 00:21:42,340 and change the values to be whatever you like. 365 00:21:46,720 --> 00:21:49,930 Well, as you saw if you took 60001, 366 00:21:49,930 --> 00:21:54,040 we like to start with an abstract data type, 367 00:21:54,040 --> 00:21:57,520 like to organize our program around data abstractions. 368 00:21:57,520 --> 00:22:00,580 So I've got this class food. 369 00:22:00,580 --> 00:22:02,890 I can initialize things. 370 00:22:02,890 --> 00:22:07,880 I have a getValue, a getCost, density, 371 00:22:07,880 --> 00:22:11,515 which is going to be the value divided by the cost, and then 372 00:22:11,515 --> 00:22:14,580 a string representation. 373 00:22:14,580 --> 00:22:19,480 So nothing here that you should not all be very familiar with. 
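The slide itself is not part of the transcript, so the following is only a sketch of what a class matching that description might look like. The method names getValue, getCost, and density come from the lecture; the attribute names, the string format, and the sample numbers are assumptions.

```python
class Food:
    """A menu item with a name, a subjective value, and a calorie cost."""

    def __init__(self, name, value, calories):
        self.name = name
        self.value = value
        self.calories = calories

    def getValue(self):
        return self.value

    def getCost(self):
        return self.calories

    def density(self):
        # Value per calorie: "the value divided by the cost".
        return self.getValue() / self.getCost()

    def __str__(self):
        # The exact format here is an assumption.
        return f'{self.name}: <{self.value}, {self.calories}>'


apple = Food('apple', 50, 95)  # hypothetical numbers
print(apple, apple.density())
```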
374 00:22:23,010 --> 00:22:26,950 Then I'm going to have a function called buildMenu, 375 00:22:26,950 --> 00:22:29,410 which will take in a list of names 376 00:22:29,410 --> 00:22:34,225 and a list of values of equal length and a list of calories. 377 00:22:34,225 --> 00:22:36,550 They're all the same length. 378 00:22:36,550 --> 00:22:37,730 And it will build the menu. 379 00:22:41,380 --> 00:22:44,740 So it's going to be a menu of tuples-- 380 00:22:44,740 --> 00:22:46,840 a menu of foods, rather. 381 00:22:46,840 --> 00:22:49,990 And I build each food by giving it its name, its value, 382 00:22:49,990 --> 00:22:53,140 and its caloric content. 383 00:22:53,140 --> 00:22:53,980 Now I have a menu. 384 00:22:57,210 --> 00:22:59,760 Now comes the fun part. 385 00:22:59,760 --> 00:23:03,090 Here is an implementation of a greedy algorithm. 386 00:23:03,090 --> 00:23:06,810 I called it a flexible greedy primarily because 387 00:23:06,810 --> 00:23:08,440 of this key function over here. 388 00:23:11,070 --> 00:23:15,590 So you'll notice in red there's a parameter called keyfunction. 389 00:23:18,450 --> 00:23:22,780 That's going to be-- map the elements of items to numbers. 390 00:23:25,310 --> 00:23:33,590 So it will be used to sort the items. 391 00:23:33,590 --> 00:23:37,110 So I want to sort them from best to worst, 392 00:23:37,110 --> 00:23:42,650 and this function will be used to tell me what I mean by best. 393 00:23:42,650 --> 00:23:47,660 So maybe keyfunction will just return the value or maybe 394 00:23:47,660 --> 00:23:50,000 it will return the weight or maybe it will return 395 00:23:50,000 --> 00:23:53,960 some function of the density. 396 00:23:53,960 --> 00:23:56,750 But the idea here is I want to use 397 00:23:56,750 --> 00:24:00,830 one greedy algorithm independently 398 00:24:00,830 --> 00:24:03,500 of my definition of best. 399 00:24:03,500 --> 00:24:07,025 So I use keyfunction to define what I mean by best. 
400 00:24:11,721 --> 00:24:12,720 So I'm going to come in. 401 00:24:12,720 --> 00:24:15,920 I'm going to sort it from best to worst. 402 00:24:15,920 --> 00:24:21,020 And then for i in range len of itemsCopy-- 403 00:24:21,020 --> 00:24:21,980 I'm being good. 404 00:24:21,980 --> 00:24:22,790 I've copied it. 405 00:24:22,790 --> 00:24:25,580 That's why I used sorted rather than sort. 406 00:24:25,580 --> 00:24:28,670 I don't want to have a side effect in the parameter. 407 00:24:28,670 --> 00:24:33,750 In general, it's not good hygiene to do that. 408 00:24:33,750 --> 00:24:37,610 And so for-- I'll go through it in order from best to worst. 409 00:24:37,610 --> 00:24:44,240 And if the cost is no more than the maximum cost, 410 00:24:44,240 --> 00:24:47,190 if putting it in would keep me under the cost or not 411 00:24:47,190 --> 00:24:50,310 over the cost, I put it in, and I just 412 00:24:50,310 --> 00:24:53,830 do that until I can't put anything else in. 413 00:24:56,610 --> 00:25:00,120 So I might skip a few because I might get to the point 414 00:25:00,120 --> 00:25:02,400 where there are only a few calories left, 415 00:25:02,400 --> 00:25:07,380 and the next best item is over that budget but maybe 416 00:25:07,380 --> 00:25:12,040 further down I'll find one that is not over it and put it in. 417 00:25:12,040 --> 00:25:16,210 That's why I can't exit as soon as I reach-- 418 00:25:16,210 --> 00:25:19,600 as soon as I find an item that won't fit. 419 00:25:19,600 --> 00:25:22,820 And then I'll just return. 420 00:25:22,820 --> 00:25:24,782 Does this make sense? 421 00:25:24,782 --> 00:25:28,010 Does anyone have any doubts about whether this algorithm 422 00:25:28,010 --> 00:25:28,775 actually works? 423 00:25:33,974 --> 00:25:35,640 I hope not because I think it does work. 424 00:25:38,280 --> 00:25:39,645 Let's ask the next question. 425 00:25:41,940 --> 00:25:44,550 How efficient do we think it is? 
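A sketch of the flexible greedy algorithm as described, reconstructed from the spoken walkthrough rather than copied from the slide. The names greedy, keyFunction, and itemsCopy echo the lecture; the `Item` tuple is a hypothetical stand-in for the Food objects, and the menu numbers are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for the lecture's Food objects.
Item = namedtuple('Item', ['name', 'value', 'cost'])


def greedy(items, maxCost, keyFunction):
    """Return (chosen items, total value), taking items best-first.

    keyFunction maps an item to a number; larger means "better".
    """
    # sorted() returns a new list, so the caller's list is not mutated.
    itemsCopy = sorted(items, key=keyFunction, reverse=True)
    result = []
    totalValue, totalCost = 0, 0
    for item in itemsCopy:
        # Skip an item that does not fit, but keep scanning: a later,
        # smaller item may still fit in the remaining budget.
        if totalCost + item.cost <= maxCost:
            result.append(item)
            totalCost += item.cost
            totalValue += item.value
    return result, totalValue


menu = [Item('burger', 100, 354), Item('pizza', 95, 258),
        Item('beer', 90, 154), Item('donut', 10, 195),
        Item('apple', 50, 95)]
chosen, total = greedy(menu, 750, lambda item: item.value)
print([item.name for item in chosen], total)
```

In this made-up menu the beer is skipped because it would exceed the 750-calorie budget, yet the apple further down the list still fits, which is the reason the loop cannot stop at the first item that does not fit.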
426 00:25:47,840 --> 00:25:50,455 What is the efficiency of this algorithm? 427 00:25:57,250 --> 00:26:00,836 Let's see where the time goes. 428 00:26:00,836 --> 00:26:04,340 That's the algorithm we just looked at. 429 00:26:04,340 --> 00:26:06,389 So I deleted the comment, so we'd 430 00:26:06,389 --> 00:26:07,930 have a little more room in the slide. 431 00:26:11,150 --> 00:26:13,765 Who wants to make a guess? 432 00:26:13,765 --> 00:26:15,140 By the way, this is the question. 433 00:26:15,140 --> 00:26:17,430 So please go answer the questions. 434 00:26:17,430 --> 00:26:19,890 We'll see how people do. 435 00:26:19,890 --> 00:26:21,765 But we can think about it as well together. 436 00:26:25,010 --> 00:26:30,110 Well, let's see where the time goes. 437 00:26:30,110 --> 00:26:33,560 The first thing is the sort. 438 00:26:33,560 --> 00:26:37,940 So I'm going to sort all the items. 439 00:26:37,940 --> 00:26:41,330 And we heard from Professor Grimson 440 00:26:41,330 --> 00:26:44,580 how long the sort takes. 441 00:26:44,580 --> 00:26:47,160 See who remembers. 442 00:26:47,160 --> 00:26:50,190 Python uses something called timsort, 443 00:26:50,190 --> 00:26:54,240 which is a hybrid of merge sort and insertion sort, and which 444 00:26:54,240 --> 00:26:59,940 has the same worst-case complexity as merge sort. 445 00:26:59,940 --> 00:27:09,260 And so we know that is n log n where n in this case 446 00:27:09,260 --> 00:27:10,520 would be the len of items. 447 00:27:17,600 --> 00:27:20,920 So we know we have that. 448 00:27:25,780 --> 00:27:27,390 Then we have a loop. 449 00:27:27,390 --> 00:27:29,140 How many times do we go through this loop? 450 00:27:34,670 --> 00:27:42,030 Well, we go through the loop n times, once for each item 451 00:27:42,030 --> 00:27:43,880 because we do end up looking at every item. 452 00:27:47,250 --> 00:27:51,276 And if we know that, what's the order? 453 00:27:51,276 --> 00:27:52,758 AUDIENCE: [INAUDIBLE].
454 00:27:59,674 --> 00:28:03,855 JOHN GUTTAG: N log n plus n-- 455 00:28:03,855 --> 00:28:09,570 I guess is order n log n, right? 456 00:28:09,570 --> 00:28:14,280 So it's pretty efficient. 457 00:28:14,280 --> 00:28:17,370 And we can do this for big numbers like a million. 458 00:28:20,220 --> 00:28:23,190 Log of a million times a million is not a very big number. 459 00:28:26,150 --> 00:28:29,411 So it's very efficient. 460 00:28:32,720 --> 00:28:34,400 Here's some code that uses greedy. 461 00:28:37,000 --> 00:28:41,390 Takes in the items, the constraint, in this case 462 00:28:41,390 --> 00:28:46,800 will be the weight, and just calls greedy, 463 00:28:46,800 --> 00:28:49,970 but with the keyfunction and prints what we have. 464 00:28:56,140 --> 00:28:58,510 So we're going to test greedy. 465 00:28:58,510 --> 00:29:02,280 I actually think I used 750 in the code, but we can use 800. 466 00:29:02,280 --> 00:29:03,750 It doesn't matter. 467 00:29:03,750 --> 00:29:08,430 And here's something we haven't seen before. 468 00:29:08,430 --> 00:29:10,980 So used greedy by value to allocate 469 00:29:10,980 --> 00:29:15,570 and calls testGreedy with food, maxUnits and Food.getValue. 470 00:29:15,570 --> 00:29:17,260 Notice it's passing the function. 471 00:29:17,260 --> 00:29:18,410 That's why it's not-- 472 00:29:18,410 --> 00:29:21,560 no closed parentheses after it. 473 00:29:21,560 --> 00:29:24,780 Used greedy to allocate. 474 00:29:24,780 --> 00:29:27,238 And then we have something pretty interesting. 475 00:29:29,930 --> 00:29:31,830 What's going on with this lambda? 476 00:29:35,660 --> 00:29:41,210 So here we're going to be using greedy by density to allocate-- 477 00:29:41,210 --> 00:29:44,570 actually, sorry, this is greedy by cost. 478 00:29:44,570 --> 00:29:47,030 And you'll notice what we're doing is-- 479 00:29:47,030 --> 00:29:50,180 we don't want to pass in the cost, 480 00:29:50,180 --> 00:29:56,730 right, because we really want the opposite of the cost. 
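To put a rough number on the claim about a million items, here is a small back-of-the-envelope calculation. It assumes base-2 logarithms and counts abstract "steps" rather than seconds, so it only indicates the order of magnitude; the variable names are made up for this sketch.

```python
import math

n = 10 ** 6                    # a million items
sortSteps = n * math.log2(n)   # the n log n term from the sort
loopSteps = n                  # the single pass over the sorted items

print('sort steps: about', round(sortSteps))
print('loop steps:', loopSteps)
print('loop as a fraction of sort:', loopSteps / sortSteps)
```

The sort term comes out at roughly 20 million steps and the loop adds only about 5 percent on top of that, which is why the sum n log n + n is simply called order n log n.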
481 00:29:56,730 --> 00:30:00,300 We want to reverse the sort because we want the cheaper 482 00:30:00,300 --> 00:30:02,610 items to get chosen first. 483 00:30:02,610 --> 00:30:04,950 The ones that have fewer calories, not the ones that 484 00:30:04,950 --> 00:30:07,080 have more calories. 485 00:30:07,080 --> 00:30:10,740 As it happens, when I define cost, 486 00:30:10,740 --> 00:30:15,130 I defined it in the obvious way, the total number of calories. 487 00:30:15,130 --> 00:30:19,020 So I could have gone and written another function to do it, 488 00:30:19,020 --> 00:30:24,300 but since it was so simple, I decided to do it in line. 489 00:30:24,300 --> 00:30:28,460 So let's talk about lambda and then come back to it. 490 00:30:28,460 --> 00:30:32,880 Lambda is used to create an anonymous function, 491 00:30:32,880 --> 00:30:37,560 anonymous in the sense that it has no name. 492 00:30:37,560 --> 00:30:41,110 So you start with the keyword lambda. 493 00:30:41,110 --> 00:30:45,040 You then give it a sequence of identifiers 494 00:30:45,040 --> 00:30:46,210 and then some expression. 495 00:30:50,210 --> 00:30:55,560 What lambda does is it builds a function 496 00:30:55,560 --> 00:31:01,410 that evaluates that expression on those parameters and returns 497 00:31:01,410 --> 00:31:05,010 the result of evaluating the expression. 498 00:31:05,010 --> 00:31:12,130 So instead of writing def, I have inline defined a function. 499 00:31:12,130 --> 00:31:17,190 So if we go back to it here, you can see that what I've done 500 00:31:17,190 --> 00:31:24,762 is lambda x one divided by Food.getCost of x. 501 00:31:29,250 --> 00:31:33,290 Notice food is the class name here. 502 00:31:33,290 --> 00:31:38,480 So I'm taking the function getCost from the class food, 503 00:31:38,480 --> 00:31:46,350 and I'm passing it the parameter x, which is going to be what? 504 00:31:46,350 --> 00:31:47,960 What's the type of x going to be? 505 00:31:56,170 --> 00:31:59,200 I can wait you out. 
506 00:31:59,200 --> 00:32:02,570 What does the type of x have to be for this lambda expression 507 00:32:02,570 --> 00:32:03,160 to make sense? 508 00:32:09,830 --> 00:32:14,240 Well, go back to the class Food. 509 00:32:14,240 --> 00:32:16,040 What's the type of the argument of getCost? 510 00:32:22,020 --> 00:32:24,487 What's the name of the argument to getCost? 511 00:32:24,487 --> 00:32:25,570 That's an easier question. 512 00:32:31,212 --> 00:32:32,670 We'll go back and we'll look at it. 513 00:32:40,979 --> 00:32:42,770 What's the type of the argument to getCost? 514 00:32:46,310 --> 00:32:47,810 AUDIENCE: Food. 515 00:32:47,810 --> 00:32:48,760 JOHN GUTTAG: Food. 516 00:32:48,760 --> 00:32:50,890 Thank you. 517 00:32:50,890 --> 00:32:55,140 So I do have-- speaking of food, we 518 00:32:55,140 --> 00:32:57,600 do have a tradition in this class 519 00:32:57,600 --> 00:33:00,490 that people who answer questions correctly get 520 00:33:00,490 --> 00:33:03,340 rewarded with food. 521 00:33:03,340 --> 00:33:07,790 Oh, Napoli would have caught that. 522 00:33:12,560 --> 00:33:15,750 So it has to be of type Food because it's 523 00:33:15,750 --> 00:33:18,460 self in the class Food. 524 00:33:28,190 --> 00:33:35,150 So if we go back to here, this x has to be of type Food, right. 525 00:33:40,450 --> 00:33:44,680 And sure enough, when we use it, it will be. 526 00:33:44,680 --> 00:33:48,240 Let's now use it. 527 00:33:48,240 --> 00:33:54,350 I should point out that lambda can be really handy as it 528 00:33:54,350 --> 00:33:58,250 is here, and it's possible to write 529 00:33:58,250 --> 00:34:02,270 amazing, beautiful, complicated lambda expressions. 530 00:34:02,270 --> 00:34:08,790 And back in the good old days of 6001 people learned to do that. 531 00:34:08,790 --> 00:34:12,800 And then they learned that they shouldn't.
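As an illustration of the point about lambda and the type of x, the sketch below defines the same inverse-cost function twice, once with def and once with lambda, and then uses the lambda form as a sort key. The cut-down Food class, the helper name inverseCost, and the two sample foods are assumptions for this example only.

```python
class Food(object):
    """Cut-down stand-in for the lecture's Food class (assumed layout)."""
    def __init__(self, name, value, calories):
        self.name = name
        self.value = value
        self.calories = calories

    def getCost(self):
        return self.calories


# A named function written with def ...
def inverseCost(x):
    # x must be a Food, because it is passed as self to Food.getCost
    return 1 / Food.getCost(x)


# ... and the equivalent anonymous function written inline
inverseCostLambda = lambda x: 1 / Food.getCost(x)

apple = Food('apple', 50, 100)   # hypothetical values
donut = Food('donut', 10, 200)

# Sorting from best to worst by 1/cost puts the cheaper item first
foods = sorted([donut, apple], key=lambda x: 1 / Food.getCost(x), reverse=True)
print([f.name for f in foods])
```

Calling Food.getCost(x) is the same as calling x.getCost() when x is a Food, which is why the lambda only makes sense for arguments of that type.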
532 00:34:12,800 --> 00:34:15,590 My view on lambda expressions is if I can't fit it 533 00:34:15,590 --> 00:34:17,330 in a single line, I just go write 534 00:34:17,330 --> 00:34:19,699 def and write a function definition 535 00:34:19,699 --> 00:34:22,580 because it's easier to debug. 536 00:34:22,580 --> 00:34:25,010 But for one-liners, lambda is great. 537 00:34:28,690 --> 00:34:31,159 Let's look at using greedy. 538 00:34:31,159 --> 00:34:34,340 So here's this function testGreedys, 539 00:34:34,340 --> 00:34:36,650 takes foods and the maximum number of units. 540 00:34:39,520 --> 00:34:42,420 And it's going to go through and it's 541 00:34:42,420 --> 00:34:45,792 going to test all three greedy algorithms. 542 00:34:48,750 --> 00:34:52,340 And we just saw that, and then here is the call of it. 543 00:34:52,340 --> 00:34:55,730 And so I picked up some names and the values. 544 00:34:55,730 --> 00:34:57,940 This is just the menu we saw. 545 00:34:57,940 --> 00:34:59,690 I'm going to build the menus, and then I'm 546 00:34:59,690 --> 00:35:02,900 going to call testGreedys. 547 00:35:02,900 --> 00:35:06,250 So let's go look at the code that does this. 548 00:35:10,950 --> 00:35:15,470 So here you have it or maybe you don't, because every time 549 00:35:15,470 --> 00:35:19,910 I switch applications Windows decides I don't want 550 00:35:19,910 --> 00:35:21,210 to show you the screen anyway. 551 00:35:27,720 --> 00:35:29,260 This really shouldn't be necessary. 552 00:35:35,230 --> 00:35:37,410 Keep changes. 553 00:35:37,410 --> 00:35:39,240 Why it keeps forgetting, I don't know. 554 00:35:39,240 --> 00:35:41,270 Anyway, so here's the code. 555 00:35:41,270 --> 00:35:42,860 It's all the code we just looked at. 556 00:35:42,860 --> 00:35:43,670 Now let's run it.
557 00:35:47,690 --> 00:35:51,830 Well, what we see here is that we 558 00:35:51,830 --> 00:35:56,000 use greedy by value to allocate 750 calories, 559 00:35:56,000 --> 00:35:57,770 and it chooses a burger, the pizza, 560 00:35:57,770 --> 00:36:00,980 and the wine for a total of-- 561 00:36:00,980 --> 00:36:07,140 a value of 284 happiness points, if you will. 562 00:36:07,140 --> 00:36:10,370 On the other hand, if we use greedy by cost, 563 00:36:10,370 --> 00:36:16,910 I get 318 happiness points and a different menu, the apple, 564 00:36:16,910 --> 00:36:19,280 the wine, the cola, the beer, and the donut. 565 00:36:19,280 --> 00:36:21,500 I've lost the pizza and the burger. 566 00:36:25,010 --> 00:36:27,620 I guess this is what I signed up for when 567 00:36:27,620 --> 00:36:28,910 I put my preferences on. 568 00:36:31,820 --> 00:36:42,330 And here's another solution with 318, apple, wine-- 569 00:36:42,330 --> 00:36:44,490 yeah, all right. 570 00:36:44,490 --> 00:36:47,290 So I actually got the same solution, 571 00:36:47,290 --> 00:36:49,760 but it just found them in a different order. 572 00:36:49,760 --> 00:36:51,510 Why did it find them in a different order? 573 00:36:51,510 --> 00:36:55,200 Because the sort order was different because in this case 574 00:36:55,200 --> 00:36:56,940 I was sorting by density. 575 00:37:00,390 --> 00:37:03,240 From this, we see an important point 576 00:37:03,240 --> 00:37:08,670 about greedy algorithms, right, that we used the algorithm 577 00:37:08,670 --> 00:37:10,260 and we got different answers. 578 00:37:13,110 --> 00:37:14,625 Why do we have different answers? 579 00:37:18,070 --> 00:37:20,860 The problem is that a greedy algorithm 580 00:37:20,860 --> 00:37:25,570 makes a sequence of local optimizations, 581 00:37:25,570 --> 00:37:29,410 chooses the locally optimal answer at every point, 582 00:37:29,410 --> 00:37:31,390 and that doesn't necessarily add up 583 00:37:31,390 --> 00:37:34,510 to a globally optimal answer. 
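The effect described above, one greedy routine giving different answers under different definitions of best, can be reproduced on a small scale. The sketch below is not the lecture's code or menu: the compact Food class, the greedy helper, and the four-item menu are assumptions chosen so that the three orderings disagree.

```python
class Food(object):
    """Compact stand-in for the lecture's Food class (assumed layout)."""
    def __init__(self, name, value, calories):
        self.name = name
        self.value = value
        self.calories = calories

    def getValue(self):
        return self.value

    def getCost(self):
        return self.calories

    def density(self):
        return self.getValue() / self.getCost()


def greedy(items, maxCost, keyFunction):
    """Take items from best to worst (as ranked by keyFunction)
    whenever they still fit within maxCost."""
    taken, totalValue, totalCost = [], 0, 0
    for item in sorted(items, key=keyFunction, reverse=True):
        if totalCost + item.getCost() <= maxCost:
            taken.append(item)
            totalCost += item.getCost()
            totalValue += item.getValue()
    return taken, totalValue


# Hypothetical menu: (name, value, calories)
foods = [Food('steak', 100, 90), Food('salad', 60, 50),
         Food('soup', 55, 48), Food('bread', 10, 10)]

byValue = greedy(foods, 100, Food.getValue)
byCost = greedy(foods, 100, lambda x: 1 / Food.getCost(x))
byDensity = greedy(foods, 100, Food.density)

for label, (taken, total) in [('value', byValue), ('cost', byCost),
                              ('density', byDensity)]:
    print('greedy by', label, '->', [f.name for f in taken], total)
```

On this made-up menu with a budget of 100, greedy by value reaches 110, greedy by cost reaches 65, and greedy by density reaches 115. A different menu or budget could easily reverse that ranking.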
584 00:37:34,510 --> 00:37:39,010 This is often illustrated by showing an example of, say, 585 00:37:39,010 --> 00:37:40,730 hill climbing. 586 00:37:40,730 --> 00:37:47,410 So imagine you're in a terrain that looks something like this, 587 00:37:47,410 --> 00:37:52,420 and you want to get to the highest point you can get. 588 00:37:52,420 --> 00:37:56,410 So you might choose as a greedy algorithm 589 00:37:56,410 --> 00:38:03,520 if you can go up, go up; if you can't go up, you stop. 590 00:38:03,520 --> 00:38:06,710 So whenever you get a choice, you go up. 591 00:38:06,710 --> 00:38:14,590 And so if I start here, I could right in the middle 592 00:38:14,590 --> 00:38:18,950 maybe say, all right, it's not up but it's not down either. 593 00:38:18,950 --> 00:38:20,510 So I'll go either left or right. 594 00:38:23,840 --> 00:38:27,120 And let's say I go right, so I come to here. 595 00:38:27,120 --> 00:38:30,500 Then I'll just make my way up to the top of the hill, 596 00:38:30,500 --> 00:38:34,670 making a locally optimal decision head up at each point, 597 00:38:34,670 --> 00:38:38,210 and I'll get here and I'll say, well, now any place I go 598 00:38:38,210 --> 00:38:40,190 takes me to a lower point. 599 00:38:40,190 --> 00:38:44,330 So I don't want to do it, right, because the greedy algorithm 600 00:38:44,330 --> 00:38:47,110 says never go backwards. 601 00:38:47,110 --> 00:38:49,540 So I'm here and I'm happy. 602 00:38:49,540 --> 00:38:56,560 On the other hand, if I had gone here for my first step, 603 00:38:56,560 --> 00:39:01,960 then my next step up would take me up, up, up, I'd get to here, 604 00:39:01,960 --> 00:39:08,250 and I'd stop and say, OK, no way to go but down. 605 00:39:08,250 --> 00:39:09,250 I don't want to go down. 606 00:39:09,250 --> 00:39:10,450 I'm done. 607 00:39:10,450 --> 00:39:14,170 And what I would find is I'm at a local maximum rather than 608 00:39:14,170 --> 00:39:14,985 a global maximum. 
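The hill-climbing picture can be made concrete with a one-dimensional terrain. The following is an illustrative sketch, not something shown in the lecture: the function name hillClimb, the rule of moving to the highest strictly higher neighbor, and the list of heights are all assumptions.

```python
def hillClimb(heights, start):
    """Greedy hill climbing on a 1-D terrain: keep moving to the highest
    strictly higher neighbor; stop when no neighbor is higher.
    Returns the index where the climb ends."""
    pos = start
    while True:
        best = pos
        for neighbor in (pos - 1, pos + 1):
            if 0 <= neighbor < len(heights) and heights[neighbor] > heights[best]:
                best = neighbor
        if best == pos:
            return pos      # every move from here is flat or downhill
        pos = best


# Hypothetical terrain with a small hill on the left and a taller one on the right
terrain = [1, 3, 5, 4, 2, 2, 6, 9, 7]

leftEnd = hillClimb(terrain, 4)    # start on the left side of the flat valley
rightEnd = hillClimb(terrain, 5)   # start one step to the right
print(leftEnd, terrain[leftEnd])
print(rightEnd, terrain[rightEnd])
```

Two starting points one step apart end on different peaks: the left start stops at height 5, a local maximum, while the right start reaches height 9, the global maximum. Neither run ever steps downhill, which is exactly what traps the first one.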
609 00:39:17,740 --> 00:39:21,160 And that's the problem with greedy algorithms, 610 00:39:21,160 --> 00:39:26,380 that you can get stuck at a locally optimal point 611 00:39:26,380 --> 00:39:30,590 and not get to the best one. 612 00:39:30,590 --> 00:39:37,020 Now, we could ask the question, can 613 00:39:37,020 --> 00:39:39,300 I just say, don't worry about it, density 614 00:39:39,300 --> 00:39:43,260 will always get me the best answer? 615 00:39:43,260 --> 00:39:45,520 Well, I've tried a different experiment. 616 00:39:45,520 --> 00:39:47,620 Let's say I'm feeling expansive and I'm going 617 00:39:47,620 --> 00:39:57,820 to allow myself 1,000 calories. 618 00:39:57,820 --> 00:40:09,800 Well, here what we see is the winner will be greedy by value, 619 00:40:09,800 --> 00:40:13,630 happens to find a better answer, 424 instead of 413. 620 00:40:15,990 --> 00:40:19,700 So there is no way to know in advance. 621 00:40:19,700 --> 00:40:22,520 Sometimes this definition of best might work. 622 00:40:22,520 --> 00:40:24,350 Sometimes that might work. 623 00:40:24,350 --> 00:40:27,950 Sometimes no definition of best will work, 624 00:40:27,950 --> 00:40:31,580 and you can't get to a good solution-- 625 00:40:31,580 --> 00:40:33,170 you get to a good solution. 626 00:40:33,170 --> 00:40:35,120 You can't get to an optimal solution 627 00:40:35,120 --> 00:40:37,370 with a greedy algorithm. 628 00:40:37,370 --> 00:40:40,940 On Wednesday, we'll talk about how do you actually 629 00:40:40,940 --> 00:40:44,600 guarantee finding an optimal solution in a better 630 00:40:44,600 --> 00:40:46,730 way than brute force. 631 00:40:46,730 --> 00:40:48,620 See you then.