1 00:00:00,000 --> 00:00:00,100 OPERATOR: -- 2 00:00:00,100 --> 00:00:02,400 The following content is provided under a Creative 3 00:00:02,400 --> 00:00:03,840 Commons license. 4 00:00:03,840 --> 00:00:06,840 Your support will help MIT OpenCourseWare continue to 5 00:00:06,840 --> 00:00:10,530 offer high quality educational resources for free. 6 00:00:10,530 --> 00:00:13,390 To make a donation, or view additional materials from 7 00:00:13,390 --> 00:00:17,490 hundreds of MIT courses, visit MIT OpenCourseWare at 8 00:00:17,490 --> 00:00:22,750 ocw.mit.edu. 9 00:00:22,750 --> 00:00:27,470 PROFESSOR: At the end of the lecture on Tuesday, a number 10 00:00:27,470 --> 00:00:30,310 of people asked me questions, asked Professor Grimson 11 00:00:30,310 --> 00:00:35,540 questions, which made it clear that I had been less than 12 00:00:35,540 --> 00:00:39,060 clear on at least a few things, so I want to come back 13 00:00:39,060 --> 00:00:43,400 and revisit a couple of the things we talked about at the 14 00:00:43,400 --> 00:00:45,930 end of the lecture. 15 00:00:45,930 --> 00:00:58,110 You'll remember that I had drawn this decision tree, in 16 00:00:58,110 --> 00:01:00,740 part because it's an important concept I want you to 17 00:01:00,740 --> 00:01:05,710 understand, the concept of decision trees, and also to 18 00:01:05,710 --> 00:01:10,430 illustrate, I hope, visually, some things related to dynamic 19 00:01:10,430 --> 00:01:11,910 programming. 20 00:01:11,910 --> 00:01:17,940 So what we had in that decision tree was the weight 21 00:01:17,940 --> 00:01:23,730 vector, and I'd just given a very simple one, [5,3,2], and 22 00:01:23,730 --> 00:01:32,820 we had a very simple value vector, [9,7,8]. 
23 00:01:32,820 --> 00:01:44,120 And then the way we drew the tree, was we started at the 24 00:01:44,120 --> 00:01:53,660 top, and said all right, we're going to first look at item 25 00:01:53,660 --> 00:01:57,510 number 2, which was the third item in our list of items, of 26 00:01:57,510 --> 00:02:04,200 course, and say that we had five pounds left of weight 27 00:02:04,200 --> 00:02:07,920 that our knapsack could hold, and currently had a 28 00:02:07,920 --> 00:02:11,850 value of 0. 29 00:02:11,850 --> 00:02:18,240 And then we made a decision, to not put that last item in 30 00:02:18,240 --> 00:02:23,780 the backpack, and said if we made that decision, the next 31 00:02:23,780 --> 00:02:28,870 item we had to consider was item 1, we still had five 32 00:02:28,870 --> 00:02:35,090 pounds available, and we still had a value of 0. 33 00:02:35,090 --> 00:02:40,140 Now, I said the next item to consider is item 1, but really 34 00:02:40,140 --> 00:02:45,640 what I meant is, 1 and all of the items preceding it in the 35 00:02:45,640 --> 00:02:51,790 list. This is my shorthand for saying the list up to and 36 00:02:51,790 --> 00:02:55,350 including item sub 1, kind of a normal way 37 00:02:55,350 --> 00:02:58,010 to think about it. 38 00:02:58,010 --> 00:03:01,750 And then we finished building the tree, left first, depth 39 00:03:01,750 --> 00:03:09,310 first, looking at all the no branches, 0,5,0 and then we 40 00:03:09,310 --> 00:03:15,920 were done, that was one branch. 41 00:03:15,920 --> 00:03:21,490 We then backed up, and said let's look at a yes, we'll 42 00:03:21,490 --> 00:03:25,480 include item number 1. 43 00:03:25,480 --> 00:03:33,940 Well, what happens here, if we've included that, it uses 44 00:03:33,940 --> 00:03:37,980 up all the available weight, and gives us a value of 9. 45 00:03:37,980 --> 00:03:41,340 STUDENT: [UNINTELLIGIBLE] 46 00:03:41,340 --> 00:03:44,210 PROFESSOR: Pardon? 
47 00:03:44,210 --> 00:03:49,410 STUDENT: -- want to be off the bottom branch. 48 00:03:49,410 --> 00:03:52,390 PROFESSOR: Yup, off by 1. 49 00:03:52,390 --> 00:03:54,980 Yeah, I wanted to come off this branch, because I've 50 00:03:54,980 --> 00:04:03,340 backtracked just 1, thank you. 51 00:04:03,340 --> 00:04:09,160 And then I backtrack up to this branch, and 52 00:04:09,160 --> 00:04:13,810 from here we got 0,2,7. 53 00:04:13,810 --> 00:04:16,170 And I'm not going to draw the rest of the tree for you here, 54 00:04:16,170 --> 00:04:19,270 because I drew it last time, and you don't need to see the 55 00:04:19,270 --> 00:04:20,860 whole tree. 56 00:04:20,860 --> 00:04:29,330 The point I wanted to make is that for every node, except 57 00:04:29,330 --> 00:04:34,220 the leaves, the leaves are the bottom of a tree in this case, 58 00:04:34,220 --> 00:04:36,810 computer scientists are weird, right, they draw trees where 59 00:04:36,810 --> 00:04:40,970 the root is at the top, and the leaves are at the bottom. 60 00:04:40,970 --> 00:04:44,524 And I don't know why, but since time immemorial that is 61 00:04:44,524 --> 00:04:49,540 the way computer scientists have drawn trees. 62 00:04:49,540 --> 00:04:52,260 That's why we're not biologists, I guess. 63 00:04:52,260 --> 00:04:54,970 We don't understand these things. 64 00:04:54,970 --> 00:04:58,650 But what I want you to notice is that for each node, except 65 00:04:58,650 --> 00:05:05,310 the leaves, the solution for that node can be computed from 66 00:05:05,310 --> 00:05:10,450 the solutions for its children. 67 00:05:10,450 --> 00:05:16,430 So in order to look at the solution of this node, I 68 00:05:16,430 --> 00:05:23,020 choose one of the solutions of its children, a or b, as the 69 00:05:23,020 --> 00:05:26,970 best solution if I'm here, and of course this is the better 70 00:05:26,970 --> 00:05:29,360 of the 2 solutions. 
71 00:05:29,360 --> 00:05:34,280 If I look at this node, I get to choose its solution as the 72 00:05:34,280 --> 00:05:39,890 better of the solution for this node, and this node. 73 00:05:39,890 --> 00:05:43,340 All the way up to the top, where when I have to choose 74 00:05:43,340 --> 00:05:47,020 the best solution to the whole problem, it's either the best 75 00:05:47,020 --> 00:05:49,140 solution to the left node, or the best solution 76 00:05:49,140 --> 00:05:51,120 to the right node. 77 00:05:51,120 --> 00:05:58,200 This happens to be a binary decision tree. 78 00:05:58,200 --> 00:06:03,080 There's nothing magic about there being only two nodes, 79 00:06:03,080 --> 00:06:05,410 for the knapsack problem, that's just the way it works 80 00:06:05,410 --> 00:06:09,350 out, but there are other problems where there might be 81 00:06:09,350 --> 00:06:13,920 multiple decisions to make, more than a yes or no, but 82 00:06:13,920 --> 00:06:19,790 it's always the case here that I have what we last time 83 00:06:19,790 --> 00:06:23,080 talked about as what? 84 00:06:23,080 --> 00:06:36,530 Optimal substructure. 85 00:06:36,530 --> 00:06:42,600 As I defined it last time, it means that I can solve a 86 00:06:42,600 --> 00:06:48,390 problem by finding the optimal solution to smaller 87 00:06:48,390 --> 00:06:54,340 sub-problems. Classic divide and conquer that we've seen over 88 00:06:54,340 --> 00:06:57,900 and over again in the term. 89 00:06:57,900 --> 00:07:02,360 Take a hard problem, say well, I can solve it by solving 2 90 00:07:02,360 --> 00:07:06,940 smaller problems and combining their solutions, and in this case, 91 00:07:06,940 --> 00:07:16,700 the combining is choosing the best, it's a or b. 
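To make the tree walk concrete, here is a minimal recursive sketch of the "take it or leave it" decision described above, using the lecture's weight vector [5,3,2] and value vector [9,7,8]. The function name `max_val` and its parameter names are illustrative assumptions; this is not the code from the handout, only one plausible way to write the same idea.

```python
def max_val(weights, values, i, avail):
    """Best total value using items 0..i with `avail` weight still free.

    Each call is one node of the decision tree; its answer is the better
    of its two children (leave item i out, or put item i in).
    """
    if i < 0:
        # No items left to consider: nothing more can be added.
        return 0
    # "No" branch: skip item i.
    without_i = max_val(weights, values, i - 1, avail)
    if weights[i] > avail:
        # Item i does not fit, so there is no "yes" branch.
        return without_i
    # "Yes" branch: take item i and shrink the available weight.
    with_i = values[i] + max_val(weights, values, i - 1, avail - weights[i])
    return max(with_i, without_i)


weights = [5, 3, 2]
values = [9, 7, 8]
best = max_val(weights, values, len(weights) - 1, 5)
print(best)
```

Starting from the last item mirrors the way the tree was drawn, with item number 2 at the root.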
92 00:07:16,700 --> 00:07:20,380 So then, I went directly from that way of thinking about the 93 00:07:20,380 --> 00:07:26,260 problem, to this straightforward, at the top of 94 00:07:26,260 --> 00:07:31,170 the slide here, also at the top of your handout, both 95 00:07:31,170 --> 00:07:35,090 yesterday and today, a straightforward implementation 96 00:07:35,090 --> 00:07:44,650 of max val, that basically just did this. 97 00:07:44,650 --> 00:07:48,100 And as you might have guessed, when you're doing this sort of 98 00:07:48,100 --> 00:07:54,050 thing, recursion is a very natural way to implement it. 99 00:07:54,050 --> 00:07:58,770 We then ran this, demonstrated that it got the right answer 100 00:07:58,770 --> 00:08:00,980 on problems that were small enough that we knew what the 101 00:08:00,980 --> 00:08:08,160 right answer was, ran it on a big problem, got what we hoped 102 00:08:08,160 --> 00:08:10,640 was the right answer, but we had no good way to check it in 103 00:08:10,640 --> 00:08:17,830 our heads, but noticed it took a long time to run. 104 00:08:17,830 --> 00:08:19,400 And then we asked ourselves, why did it 105 00:08:19,400 --> 00:08:21,690 take so long to run? 106 00:08:21,690 --> 00:08:24,800 And when we turned on the print statement, what we saw 107 00:08:24,800 --> 00:08:26,580 is that it was doing the same thing 108 00:08:26,580 --> 00:08:31,800 over and over again. 109 00:08:31,800 --> 00:08:37,740 Because a lot of the sub-problems were the same. 110 00:08:37,740 --> 00:08:40,450 It was as if, when we went through this search tree, we 111 00:08:40,450 --> 00:08:43,300 never remembered what we got at the bottom, and we just 112 00:08:43,300 --> 00:08:46,470 re-computed things over and over. 
113 00:08:46,470 --> 00:08:51,320 So that led us to look at memoization, the sort of key 114 00:08:51,320 --> 00:09:03,370 idea behind dynamic programming, which says let's 115 00:09:03,370 --> 00:09:08,470 remember the work we've done and not do it all over again. 116 00:09:08,470 --> 00:09:12,960 We used a dictionary to implement the memo, and that 117 00:09:12,960 --> 00:09:18,850 got us to the fast max val, which got called from max val 118 00:09:18,850 --> 00:09:23,200 0, because I wanted to make sure I didn't change the 119 00:09:23,200 --> 00:09:27,330 specification of max val by introducing this memo that 120 00:09:27,330 --> 00:09:30,050 users shouldn't know even exists, because it's part of 121 00:09:30,050 --> 00:09:35,090 the implementation, not part of the problem statement. 122 00:09:35,090 --> 00:09:40,340 We did that, and all I did was take the original code and 123 00:09:40,340 --> 00:09:43,060 keep track of what I've done, and say have I computed this 124 00:09:43,060 --> 00:09:50,420 value before, if so, don't compute it again. 125 00:09:50,420 --> 00:09:54,060 And that's the key idea that you'll see over and over again 126 00:09:54,060 --> 00:09:57,610 as you solve problems with dynamic programming, is you 127 00:09:57,610 --> 00:10:02,840 say, have I already solved this problem, if so, let me 128 00:10:02,840 --> 00:10:05,270 look up the answer. 129 00:10:05,270 --> 00:10:08,260 If I haven't solved the problem, let me solve it, and 130 00:10:08,260 --> 00:10:15,240 store the answer away for later reference. 131 00:10:15,240 --> 00:10:19,140 Very simple idea, and typically the beauty of 132 00:10:19,140 --> 00:10:22,370 dynamic programming as you've seen here, is not only is the 133 00:10:22,370 --> 00:10:26,470 idea simple, even the implementation is simple. 134 00:10:26,470 --> 00:10:30,700 There are a lot of complicated algorithmic ideas, dynamic 135 00:10:30,700 --> 00:10:32,680 programming is not one of them. 
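The "have I computed this before?" check can be sketched as follows. The names `fast_max_val` and `max_val0` echo the ones mentioned in the lecture, but the bodies are assumptions written for illustration, not the handout's code. The memo is a dictionary keyed by the (item index, available weight) pair, and the wrapper hides it from callers.

```python
def fast_max_val(weights, values, i, avail, memo):
    """Memoized knapsack: same recursion as before, but each
    (i, avail) sub-problem is solved at most once."""
    if i < 0:
        return 0
    key = (i, avail)
    if key in memo:
        # Already solved this sub-problem: just look up the answer.
        return memo[key]
    result = fast_max_val(weights, values, i - 1, avail, memo)
    if weights[i] <= avail:
        with_i = values[i] + fast_max_val(
            weights, values, i - 1, avail - weights[i], memo
        )
        result = max(result, with_i)
    # Store the answer away for later reference.
    memo[key] = result
    return result


def max_val0(weights, values, i, avail):
    """Wrapper with the original signature; the memo stays an
    implementation detail that callers never see."""
    return fast_max_val(weights, values, i, avail, {})


print(max_val0([5, 3, 2], [9, 7, 8], 2, 5))
```

Creating a fresh dictionary inside the wrapper, rather than using a default argument, keeps results from one problem instance from leaking into the next.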
136 00:10:32,680 --> 00:10:35,460 Which is one of the reasons we teach it here. 137 00:10:35,460 --> 00:10:37,810 The other reason we teach it here, in addition to it being 138 00:10:37,810 --> 00:10:42,550 simple, is that it's incredibly useful. 139 00:10:42,550 --> 00:10:46,700 It's probably among the most useful ideas there is for 140 00:10:46,700 --> 00:10:50,010 solving complicated problems. 141 00:10:50,010 --> 00:10:55,240 All right, now let's look at it. 142 00:10:55,240 --> 00:10:57,900 So here's the fast version, we looked at it last time, I'm 143 00:10:57,900 --> 00:11:00,610 not going to bore you by going through the details of it 144 00:11:00,610 --> 00:11:07,530 again, but we'll run it. 145 00:11:07,530 --> 00:11:15,670 This was the big example we looked at last time, where we 146 00:11:15,670 --> 00:11:22,560 had 30 items we could put in to choose from, so when we do 147 00:11:22,560 --> 00:11:25,100 it exponentially, it looks like it's 2 to the 30, which 148 00:11:25,100 --> 00:11:33,290 is a big number, but when we ran this, it found the answer, 149 00:11:33,290 --> 00:11:38,630 and it took only 1805 calls. 150 00:11:38,630 --> 00:11:42,870 Now I got really excited about this because, to me it's 151 00:11:42,870 --> 00:11:46,430 really amazing, that we've taken a problem that is 152 00:11:46,430 --> 00:11:51,290 apparently exponential, and solved it like that. 153 00:11:51,290 --> 00:11:56,240 And in fact, I could double the size of the items to 154 00:11:56,240 --> 00:11:59,350 choose from, and it would still run like. 155 00:11:59,350 --> 00:12:02,160 Eh - I'm not very good at snapping my fingers -- it 156 00:12:02,160 --> 00:12:04,690 would still run quickly. 157 00:12:04,690 --> 00:12:11,850 All right, so here's the question: have I found a way 158 00:12:11,850 --> 00:12:14,050 to solve an inherently exponential 159 00:12:14,050 --> 00:12:17,180 problem in linear time. 
160 00:12:17,180 --> 00:12:20,570 Because what we'll see here, and we saw a little of this 161 00:12:20,570 --> 00:12:26,670 last time, as I double the size of the items, I only 162 00:12:26,670 --> 00:12:29,960 really roughly double the running time. 163 00:12:29,960 --> 00:12:32,520 Quite amazing. 164 00:12:32,520 --> 00:12:34,900 So have I done that? 165 00:12:34,900 --> 00:12:39,250 Well, I wish I had, because then I would be really famous, 166 00:12:39,250 --> 00:12:42,320 and my department head would give me a big raise, and all 167 00:12:42,320 --> 00:12:45,120 sorts of wonderful things would follow. 168 00:12:45,120 --> 00:12:51,680 But, I'm not famous, and I didn't solve that problem. 169 00:12:51,680 --> 00:12:53,390 What's going on? 170 00:12:53,390 --> 00:12:58,890 Well this particular algorithm takes roughly, and I'll come 171 00:12:58,890 --> 00:13:12,930 back to the roughly question, order (n*s) time, where n is 172 00:13:12,930 --> 00:13:21,800 the number of items in the list and s, roughly speaking, 173 00:13:21,800 --> 00:13:26,980 is the size of the knapsack. 174 00:13:26,980 --> 00:13:37,280 We should also observe, that it takes order (n*s) space. 175 00:13:37,280 --> 00:13:46,030 Because it's not free to store all these values. 176 00:13:46,030 --> 00:13:51,810 So at one level what I'm doing is trading time for space. 177 00:13:51,810 --> 00:13:54,180 It can run faster because I'm using some 178 00:13:54,180 --> 00:13:59,900 space to save things. 179 00:13:59,900 --> 00:14:08,760 So in this case, we had 30 items and the weight was 40, 180 00:14:08,760 --> 00:14:11,250 and, you know, this gives us 1200 which is 181 00:14:11,250 --> 00:14:14,000 kind of where we were. 182 00:14:14,000 --> 00:14:18,160 And I'm really emphasizing kind of here, because really 183 00:14:18,160 --> 00:14:23,600 what I'm using the available size for, is as a proxy for 184 00:14:23,600 --> 00:14:28,940 the number of items that can fit in the knapsack. 
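One way to see where a bound of order n times s comes from is to fill in a table with one entry per (number of items considered, available weight) pair. Note that this is a bottom-up table, a different formulation from the memo dictionary used in lecture, and it assumes the weights and the capacity are non-negative integers. The name `knapsack_table` is made up for this sketch.

```python
def knapsack_table(weights, values, capacity):
    """Bottom-up 0/1 knapsack, assuming non-negative integer weights.

    table[i][a] holds the best value using only the first i items with
    a units of weight available. The table has (n + 1) * (capacity + 1)
    entries and each is filled in constant time, which is the
    order n*s time and space discussed above.
    """
    n = len(weights)
    table = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for a in range(capacity + 1):
            best = table[i - 1][a]          # leave item i-1 out
            if w <= a:                      # item fits: try taking it
                best = max(best, v + table[i - 1][a - w])
            table[i][a] = best
    return table[n][capacity]


print(knapsack_table([5, 3, 2], [9, 7, 8], 5))
```

The table always does the full n times s work, whereas the memoized version only touches the pairs it actually reaches, which is why the lecture's call count is described as "roughly" that size.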
185 00:14:28,940 --> 00:14:32,560 Because the actual running time of this, and the actual 186 00:14:32,560 --> 00:14:38,570 space of this algorithm, is governed, interestingly 187 00:14:38,570 --> 00:14:45,930 enough, not by the size of the problem alone, but by the size 188 00:14:45,930 --> 00:14:51,050 of the solution. 189 00:14:51,050 --> 00:14:58,430 And I'm going to come back to that. 190 00:14:58,430 --> 00:15:05,840 So how long it takes to run is related to how many items I 191 00:15:05,840 --> 00:15:08,000 end up being able to fit into the knapsack. 192 00:15:08,000 --> 00:15:16,070 If you think about it, this makes sense. 193 00:15:16,070 --> 00:15:22,280 An entry is made in the memo whenever an item, and an 194 00:15:22,280 --> 00:15:27,560 available size pair is considered. 195 00:15:27,560 --> 00:15:32,380 As soon as the available size goes to 0, I know I can't 196 00:15:32,380 --> 00:15:37,510 enter any more items into the memo, right? 197 00:15:37,510 --> 00:15:44,760 So the number of items I have to remember is related to how 198 00:15:44,760 --> 00:15:51,030 many items I can fit in the knapsack. 199 00:15:51,030 --> 00:15:54,790 And of course, the amount of running time is exactly the 200 00:15:54,790 --> 00:15:56,540 number of things I have to remember, 201 00:15:56,540 --> 00:15:58,950 almost exactly, right? 202 00:15:58,950 --> 00:16:06,330 So you can see if you think about it abstractly, why the 203 00:16:06,330 --> 00:16:11,220 amount of work I have to do here will be proportional to 204 00:16:11,220 --> 00:16:14,840 the number of items I can fit in, that is to say, the size 205 00:16:14,840 --> 00:16:17,480 of the solution. 206 00:16:17,480 --> 00:16:23,950 This is not the way we'd like to talk about complexity. 
207 00:16:23,950 --> 00:16:27,600 When we talk about the order, or big O, as we keep writing 208 00:16:27,600 --> 00:16:32,940 it, of a problem, we always prefer to talk about it in 209 00:16:32,940 --> 00:16:34,950 terms of the size of the problem. 210 00:16:34,950 --> 00:16:40,730 And that makes sense because in general we don't know the 211 00:16:40,730 --> 00:16:45,290 size of the solution until we've solved it. 212 00:16:45,290 --> 00:16:53,520 So we'd much rather define big O in terms of the inputs. 213 00:16:53,520 --> 00:16:57,700 What we have here is what's called a 214 00:16:57,700 --> 00:17:04,930 pseudo-polynomial algorithm. 215 00:17:04,930 --> 00:17:08,730 You remember a polynomial algorithm is an algorithm 216 00:17:08,730 --> 00:17:15,850 that's polynomial in the size of the inputs. 217 00:17:15,850 --> 00:17:18,940 Here we have an algorithm that's polynomial in the size 218 00:17:18,940 --> 00:17:28,790 of the solution, hence the qualifier pseudo. 219 00:17:28,790 --> 00:17:33,040 More formally, and again this is not crucial to get all the 220 00:17:33,040 --> 00:17:39,390 details on this, if we think about a numerical algorithm, a 221 00:17:39,390 --> 00:17:45,860 pseudo-polynomial algorithm has running time that's 222 00:17:45,860 --> 00:18:02,510 polynomial in the numeric value of the input. 223 00:18:02,510 --> 00:18:05,300 I'm using a numeric example because it's easier to talk 224 00:18:05,300 --> 00:18:11,000 about it that way. 225 00:18:11,000 --> 00:18:13,730 So you might look at, say, an implementation of 226 00:18:13,730 --> 00:18:19,600 factorial, and say its running time is proportional to the 227 00:18:19,600 --> 00:18:22,940 numerical value of the number whose factorial you're computing. 228 00:18:22,940 --> 00:18:29,470 If I'm computing factorial of 8, I'll do 8 operations, right? 229 00:18:29,470 --> 00:18:34,550 Factorial of 10, I'll do 10 operations. 
230 00:18:34,550 --> 00:18:39,870 Now the key issue to think about here, is that as we look 231 00:18:39,870 --> 00:18:48,770 at this kind of thing, what we'll see is that, if we look 232 00:18:48,770 --> 00:18:58,820 at a numeric value, we know that it's exponential 233 00:18:58,820 --> 00:19:12,190 in the number of digits. 234 00:19:12,190 --> 00:19:16,500 So that's the key thing to think about, right? That you 235 00:19:16,500 --> 00:19:22,750 can take a problem, and typically, when we're actually 236 00:19:22,750 --> 00:19:26,895 formally looking at computational complexity, big 237 00:19:26,895 --> 00:19:37,780 O, what we'll define it in terms of, is the size of the 238 00:19:37,780 --> 00:19:48,070 encoding of the input. 239 00:19:48,070 --> 00:19:51,930 The number of bits required to represent the 240 00:19:51,930 --> 00:19:59,180 input in the computer. 241 00:19:59,180 --> 00:20:03,110 And so when we say something is exponential, we're talking 242 00:20:03,110 --> 00:20:05,600 about in terms of the number of bits required 243 00:20:05,600 --> 00:20:10,510 to represent it. 244 00:20:10,510 --> 00:20:14,990 Now why am I going through all this, maybe I should use the 245 00:20:14,990 --> 00:20:18,050 word pseudo-theory? 246 00:20:18,050 --> 00:20:22,380 Only because I want you to understand that when we start 247 00:20:22,380 --> 00:20:27,830 talking about complexity, it can be really quite subtle. 248 00:20:27,830 --> 00:20:30,760 And you have to be very careful to think about what 249 00:20:30,760 --> 00:20:34,680 you mean, or you can be very surprised at how long 250 00:20:34,680 --> 00:20:38,980 something takes to run, or how much space it uses. 251 00:20:38,980 --> 00:20:43,450 And you have to understand the difference between, are you 252 00:20:43,450 --> 00:20:46,110 defining the performance in terms of the size of the 253 00:20:46,110 --> 00:20:49,330 problem, or the size of the solution. 
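The factorial example can be made concrete with a small sketch that counts multiplications and prints them next to the number of bits needed to write the input down. The helper name `factorial_with_count` is invented for illustration. The step count tracks the numeric value of n, which doubles each time one bit is added, so the work is linear in the value but exponential in the size of the encoding.

```python
def factorial_with_count(n):
    """Return (n!, number of multiplications performed)."""
    result = 1
    steps = 0
    for k in range(1, n + 1):
        result *= k
        steps += 1
    return result, steps


# Compare the work done with the number of bits in the input.
for n in (8, 16, 32, 64):
    _, steps = factorial_with_count(n)
    print(f"n = {n:3d}  bits = {n.bit_length()}  multiplications = {steps}")
```

Each row adds a single bit to the input while the multiplication count doubles, which is the sense in which a pseudo-polynomial algorithm can still be exponential in the input size.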
254 00:20:49,330 --> 00:20:53,020 When you talk about the size of the problem, what do you 255 00:20:53,020 --> 00:20:56,940 mean by that, is it the length of an array, is it the size of 256 00:20:56,940 --> 00:21:01,850 the elements of the array, and it can matter. 257 00:21:01,850 --> 00:21:05,620 So when we ask you to tell us something about the 258 00:21:05,620 --> 00:21:11,270 efficiency, on, for example, a quiz, we want you to be very 259 00:21:11,270 --> 00:21:15,660 careful not to just write something like, order n 260 00:21:15,660 --> 00:21:22,290 squared, but to tell us what n is. 261 00:21:22,290 --> 00:21:26,520 For example, the number of elements in the list. But if 262 00:21:26,520 --> 00:21:30,010 you have a list of lists, maybe it's not just the number 263 00:21:30,010 --> 00:21:32,120 of elements in the list, maybe it depends upon what 264 00:21:32,120 --> 00:21:37,670 the elements are. 265 00:21:37,670 --> 00:21:42,960 So just sort of a warning to try and be very careful as you 266 00:21:42,960 --> 00:21:51,740 think about these things, all right. 267 00:21:51,740 --> 00:21:55,900 So I haven't done magic, I've given you a really fast way to 268 00:21:55,900 --> 00:22:02,220 solve a knapsack problem, but it's still exponential deep 269 00:22:02,220 --> 00:22:05,640 down in its heart, in something. 270 00:22:05,640 --> 00:22:10,610 All right, in recitation you'll get a chance to look at 271 00:22:10,610 --> 00:22:13,850 yet another kind of problem that can be solved by dynamic 272 00:22:13,850 --> 00:22:17,640 programming, there are many of them. 273 00:22:17,640 --> 00:22:21,600 Before we leave the knapsack problem though, I want to take 274 00:22:21,600 --> 00:22:23,350 a couple of minutes to look at a slight 275 00:22:23,350 --> 00:22:26,680 variation of the problem. 276 00:22:26,680 --> 00:22:29,770 So let's look at this one. 
277 00:22:29,770 --> 00:22:35,120 Suppose I told you that not only was there a limit on the 278 00:22:35,120 --> 00:22:38,100 total weight of the items in the knapsack, 279 00:22:38,100 --> 00:22:41,580 but also on the volume. 280 00:22:41,580 --> 00:22:45,860 OK, if I gave you a box of balloons, the fact that they 281 00:22:45,860 --> 00:22:48,360 didn't weigh anything wouldn't mean 282 00:22:48,360 --> 00:22:52,380 you could put lots of them in the knapsack, right? 283 00:22:52,380 --> 00:22:56,940 Sometimes it's the volume not the weight that matters, 284 00:22:56,940 --> 00:22:59,290 sometimes it's both. 285 00:22:59,290 --> 00:23:02,080 So how would we go about solving this problem if I told 286 00:23:02,080 --> 00:23:04,650 you not only was there a maximum weight, but there was 287 00:23:04,650 --> 00:23:07,400 a maximum volume. 288 00:23:07,400 --> 00:23:12,300 Well, we want to go back and attack it exactly the way we 289 00:23:12,300 --> 00:23:14,880 attacked it the first time, which was write some 290 00:23:14,880 --> 00:23:17,990 mathematical formulas. 291 00:23:17,990 --> 00:23:21,630 So you'll remember that when we looked at it, we said that 292 00:23:21,630 --> 00:23:28,290 the problem was to maximize the sum from i equals 1 to n, 293 00:23:28,290 --> 00:23:34,790 of p sub i, x sub i, maybe it should be 0 to n minus 1, but 294 00:23:34,790 --> 00:23:38,820 we won't worry about that. 295 00:23:38,820 --> 00:23:45,510 And we had to do it subject to the constraint that the sum 296 00:23:45,510 --> 00:23:50,530 from 1 to n of the weight sub i times x sub i, remember x is 297 00:23:50,530 --> 00:23:54,580 1 if it was in, 0 if it wasn't, was less than or equal 298 00:23:54,580 --> 00:23:58,740 to the cost, as I wrote it this time, which was the 299 00:23:58,740 --> 00:24:02,690 maximum allowable weight. 300 00:24:02,690 --> 00:24:06,350 What do we do if we want to add volume as an issue? 301 00:24:06,350 --> 00:24:07,670 Does this change? 
302 00:24:07,670 --> 00:24:10,410 Does the goal change? 303 00:24:10,410 --> 00:24:11,070 You're answering. 304 00:24:11,070 --> 00:24:13,780 Answer out -- no one else can see you shake your head. 305 00:24:13,780 --> 00:24:14,180 STUDENT: No. 306 00:24:14,180 --> 00:24:15,120 PROFESSOR: No. 307 00:24:15,120 --> 00:24:18,150 The goal does not change, it's still the same goal. 308 00:24:18,150 --> 00:24:19,090 What changes? 309 00:24:19,090 --> 00:24:22,280 STUDENT: The constraints. 310 00:24:22,280 --> 00:24:24,320 PROFESSOR: Yeah, and you don't get another bar. 311 00:24:24,320 --> 00:24:25,850 The constraint has to change. 312 00:24:25,850 --> 00:24:28,020 I've added a constraint. 313 00:24:28,020 --> 00:24:31,830 And, what's the constraint I've added? 314 00:24:31,830 --> 00:24:33,290 Somebody else -- yeah? 315 00:24:33,290 --> 00:24:36,210 STUDENT: You can't exceed the volume that the 316 00:24:36,210 --> 00:24:37,310 knapsack can hold. 317 00:24:37,310 --> 00:24:38,810 PROFESSOR: Right, but can you state in this 318 00:24:38,810 --> 00:24:39,760 kind of formal way? 319 00:24:39,760 --> 00:24:43,620 STUDENT: [INAUDIBLE] 320 00:24:43,620 --> 00:24:44,660 PROFESSOR: -- sum from i equals 1 to n -- 321 00:24:44,660 --> 00:24:45,120 STUDENT: [INAUDIBLE] 322 00:24:45,120 --> 00:24:55,170 PROFESSOR: Let's say v sub i, x sub i, is less than or equal 323 00:24:55,170 --> 00:24:59,790 to, we'll write k for the total allowable volume. 324 00:24:59,790 --> 00:25:02,040 Exactly. 325 00:25:02,040 --> 00:25:05,450 So the thing to notice here, is it's actually quite a 326 00:25:05,450 --> 00:25:13,920 simple little change we've made. 327 00:25:13,920 --> 00:25:18,520 I've simply added this one extra constraint, nice thing 328 00:25:18,520 --> 00:25:20,930 about thinking about it this way is it's easy to think 329 00:25:20,930 --> 00:25:24,810 about it, and what do you think I'll have to do if I 330 00:25:24,810 --> 00:25:28,560 want to go change the code? 
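Written out, the formulation described on the board would look roughly like the following. Here $x_i = 1$ means item $i$ is taken and $x_i = 0$ means it is left out (the usual convention), $p_i$, $w_i$ and $v_i$ are the value, weight and volume of item $i$, and $K$ is the total allowable volume as named in the lecture. The symbol $C$ for the maximum allowable weight is an assumption, since the transcript only calls it "the cost".

```latex
\[
\begin{aligned}
\text{maximize}\quad   & \sum_{i=1}^{n} p_i x_i \\
\text{subject to}\quad & \sum_{i=1}^{n} w_i x_i \le C, \\
                       & \sum_{i=1}^{n} v_i x_i \le K, \\
                       & x_i \in \{0, 1\} \quad \text{for } i = 1, \dots, n.
\end{aligned}
\]
```

The objective is untouched; the volume limit shows up only as the one extra constraint line.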
331 00:25:28,560 --> 00:25:32,700 I'm not going to do it for you, but what would I think 332 00:25:32,700 --> 00:25:35,770 about doing when I change the code? 333 00:25:35,770 --> 00:25:38,360 Well, let's look at the simple version first, because it's 334 00:25:38,360 --> 00:25:39,800 easier to look at. 335 00:25:39,800 --> 00:25:42,000 At the top. 336 00:25:42,000 --> 00:25:45,440 Well basically, all I'd have to do is go through and find 337 00:25:45,440 --> 00:25:51,930 every place I checked the constraint, and change it. 338 00:25:51,930 --> 00:25:55,800 To incorporate the new constraint. 339 00:25:55,800 --> 00:25:58,180 And when I went to the dynamic programming problem, what 340 00:25:58,180 --> 00:26:02,960 would I have to do, what would change? 341 00:26:02,960 --> 00:26:07,990 The memo would have to change, as well as the checks, right? 342 00:26:07,990 --> 00:26:13,970 Because now, I not only would have to think about how much 343 00:26:13,970 --> 00:26:17,880 weight did I have available, but I have to think about how 344 00:26:17,880 --> 00:26:20,350 much volume did I have available. 345 00:26:20,350 --> 00:26:27,610 So whereas before, I had a mapping from the item and the 346 00:26:27,610 --> 00:26:31,950 weight available, now I would have to have it from a tuple 347 00:26:31,950 --> 00:26:35,270 of the weight and the volume. 348 00:26:35,270 --> 00:26:38,410 Very small changes. 349 00:26:38,410 --> 00:26:41,520 That's one of the things I want you to sort of understand 350 00:26:41,520 --> 00:26:47,040 as we look at algorithms, that they're very general, and once 351 00:26:47,040 --> 00:26:50,170 you've figured out how to solve one problem, you can 352 00:26:50,170 --> 00:26:53,770 often solve another problem by a very straightforward 353 00:26:53,770 --> 00:26:59,990 reduction of this kind of thing. 354 00:26:59,990 --> 00:27:03,360 All right, any questions about that. 355 00:27:03,360 --> 00:27:03,660 Yeah? 
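A hedged sketch of what that change might look like: the fit check now tests both limits, and the memo key grows to include the available volume alongside the item index and available weight. The function names, the `volumes` list, and the sample volumes [1,4,4] are all invented for illustration and do not come from the lecture.

```python
def fast_max_val_2(weights, volumes, values, i, avail_w, avail_v, memo):
    """Memoized knapsack with a weight limit and a volume limit."""
    if i < 0:
        return 0
    # The memo key now has to capture both remaining capacities.
    key = (i, avail_w, avail_v)
    if key in memo:
        return memo[key]
    result = fast_max_val_2(
        weights, volumes, values, i - 1, avail_w, avail_v, memo
    )
    # The constraint check changes: the item must fit on both counts.
    if weights[i] <= avail_w and volumes[i] <= avail_v:
        with_i = values[i] + fast_max_val_2(
            weights, volumes, values, i - 1,
            avail_w - weights[i], avail_v - volumes[i], memo,
        )
        result = max(result, with_i)
    memo[key] = result
    return result


def max_val_2(weights, volumes, values, max_w, max_v):
    """Wrapper that hides the memo from the caller."""
    last = len(weights) - 1
    return fast_max_val_2(weights, volumes, values, last, max_w, max_v, {})


# Same weights and values as before, with made-up volumes.
print(max_val_2([5, 3, 2], [1, 4, 4], [9, 7, 8], 5, 5))
```

With these sample volumes, the two light items no longer fit together, so the volume limit changes which choice is best even though the weight limit is the same.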
356 00:27:03,660 --> 00:27:05,860 STUDENT: I had a question about what you were talking 357 00:27:05,860 --> 00:27:06,310 about just before. 358 00:27:06,310 --> 00:27:06,660 PROFESSOR: The pseudo-polynomial? 359 00:27:06,660 --> 00:27:07,000 STUDENT: Yes. 360 00:27:07,000 --> 00:27:07,760 PROFESSOR: Ok. 361 00:27:07,760 --> 00:27:10,058 STUDENT: So, how do you come to a conclusion as to which 362 00:27:10,058 --> 00:27:15,464 you should use then, if you can determine the size based 363 00:27:15,464 --> 00:27:22,010 on solution, or based on input, so how do you decide? 364 00:27:22,010 --> 00:27:23,600 PROFESSOR: Great question. 365 00:27:23,600 --> 00:27:27,940 So the question is, how do you choose an algorithm, why would 366 00:27:27,940 --> 00:27:30,630 I choose to use a pseudo-polynomial algorithm 367 00:27:30,630 --> 00:27:33,960 when I don't know how big the solution is likely to be, I 368 00:27:33,960 --> 00:27:36,410 think that's one way to think about it. 369 00:27:36,410 --> 00:27:42,500 Well, so if we think about the knapsack problem, we can look 370 00:27:42,500 --> 00:27:45,870 at it, and we can ask ourselves, well first of all 371 00:27:45,870 --> 00:27:50,065 we know that the brute force exponential solution is going 372 00:27:50,065 --> 00:27:55,150 to be a loser if the number of items is large. 373 00:27:55,150 --> 00:27:58,090 Fundamentally in this case, what I could look at is the 374 00:27:58,090 --> 00:28:03,090 ratio of the number of items to the size of the knapsack, 375 00:28:03,090 --> 00:28:08,010 say well, I've got lots of items to choose from, I probably 376 00:28:08,010 --> 00:28:10,130 won't put them all in. 377 00:28:10,130 --> 00:28:13,400 But even if I did, it would still only 378 00:28:13,400 --> 00:28:17,730 be 30 of them, right? 379 00:28:17,730 --> 00:28:18,380 It's hard. 
380 00:28:18,380 --> 00:28:22,610 Typically what we'll discover is the pseudo-polynomial 381 00:28:22,610 --> 00:28:33,360 algorithms are usually better, and in this case, never worse. 382 00:28:33,360 --> 00:28:37,270 So this will never be worse than the brute force one. If I 383 00:28:37,270 --> 00:28:40,270 get really unlucky, I end up checking the same number of 384 00:28:40,270 --> 00:28:44,590 things, but I'd have to be really, it'd have to be a very 385 00:28:44,590 --> 00:28:47,780 strange structure to the problem. 386 00:28:47,780 --> 00:28:54,500 So it's almost always the case that, if you can find a 387 00:28:54,500 --> 00:29:01,930 solution that uses dynamic programming, it will be better 388 00:29:01,930 --> 00:29:05,710 than the brute force, and certainly not, well, maybe use 389 00:29:05,710 --> 00:29:09,420 more space, but not use more time. 390 00:29:09,420 --> 00:29:13,280 But there is no magic here, and so the question you asked 391 00:29:13,280 --> 00:29:15,570 is a very good question. 392 00:29:15,570 --> 00:29:20,660 And it's sometimes the case in real life that you don't know 393 00:29:20,660 --> 00:29:23,830 which is the better algorithm on the data you're actually 394 00:29:23,830 --> 00:29:26,440 going to be crunching. 395 00:29:26,440 --> 00:29:31,230 And you pays your money and you takes your chances, right? 396 00:29:31,230 --> 00:29:34,830 And if the data is not what you think it's going to be, 397 00:29:34,830 --> 00:29:37,120 you may be wrong in your choice, so you typically do 398 00:29:37,120 --> 00:29:40,850 have to spend some time thinking about it, what's the 399 00:29:40,850 --> 00:29:42,470 data going to actually look like. 400 00:29:42,470 --> 00:29:45,660 Very good question. 401 00:29:45,660 --> 00:29:48,940 Anything else? 402 00:29:48,940 --> 00:29:55,970 All right, a couple of closing points before we leave this, 403 00:29:55,970 --> 00:29:59,380 things I would like you to remember. 
404 00:29:59,380 --> 00:30:09,820 In dynamic programming, one of the things that's going on is 405 00:30:09,820 --> 00:30:17,900 we're trading space for time. 406 00:30:17,900 --> 00:30:21,690 Dynamic programming is not the only time we do that. 407 00:30:21,690 --> 00:30:26,680 We've solved a lot of problems that way, in fact, by trading 408 00:30:26,680 --> 00:30:30,220 space for time. 409 00:30:30,220 --> 00:30:32,910 Table look-up, for example, right, that if you're going to 410 00:30:32,910 --> 00:30:35,440 have trig tables, you may want to compute them all at once 411 00:30:35,440 --> 00:30:38,510 and then just look it up. 412 00:30:38,510 --> 00:30:40,570 So that's one thing. 413 00:30:40,570 --> 00:30:58,360 Two, don't be intimidated by exponential problems. There's 414 00:30:58,360 --> 00:31:01,580 a tendency for people to say, oh this problem's exponential, 415 00:31:01,580 --> 00:31:03,800 I can't solve it. 416 00:31:03,800 --> 00:31:06,870 Well, I solve 2 or 3 exponential problems before 417 00:31:06,870 --> 00:31:10,780 breakfast every day. 418 00:31:10,780 --> 00:31:13,750 You know things like, how to find my way to the bathroom is 419 00:31:13,750 --> 00:31:20,380 inherently exponential, but I manage to solve it anyway. 420 00:31:20,380 --> 00:31:21,800 Don't be intimidated. 421 00:31:21,800 --> 00:31:27,150 Even though it is apparently exponential, a lot of times 422 00:31:27,150 --> 00:31:31,470 you can actually solve it much, much faster. 423 00:31:31,470 --> 00:31:36,800 Other issues. 424 00:31:36,800 --> 00:31:53,990 Three: dynamic programming is broadly useful. 425 00:31:53,990 --> 00:31:58,170 Whenever you're looking at a problem that seems to have a 426 00:31:58,170 --> 00:32:02,780 natural recursive solution, think about whether you can 427 00:32:02,780 --> 00:32:07,070 attack it with dynamic programming.
428 00:32:07,070 --> 00:32:10,700 If you've got this optimal substructure, and overlapping 429 00:32:10,700 --> 00:32:15,180 sub-problems, you can use dynamic programming. 430 00:32:15,180 --> 00:32:19,380 So it's good for knapsacks, it's good for shortest paths, 431 00:32:19,380 --> 00:32:23,470 it's good for change-making, it's good for a whole variety 432 00:32:23,470 --> 00:32:28,260 of problems. And so keep it in your toolbox, and when you 433 00:32:28,260 --> 00:32:31,600 have a hard problem to solve, one of the first questions you 434 00:32:31,600 --> 00:32:35,490 should ask yourself is, can I use dynamic programming? 435 00:32:35,490 --> 00:32:40,510 It's great for string-matching problems of a whole variety. 436 00:32:40,510 --> 00:32:43,250 It's hugely useful. 437 00:32:43,250 --> 00:32:48,420 And finally, I want you to keep in mind the whole concept 438 00:32:48,420 --> 00:32:54,660 of problem reduction. 439 00:32:54,660 --> 00:33:00,340 I started with this silly description of a burglar, and 440 00:33:00,340 --> 00:33:03,930 said: Well this is really the knapsack problem, and now I 441 00:33:03,930 --> 00:33:06,550 can go Google the knapsack problem and find 442 00:33:06,550 --> 00:33:10,830 code to solve it. 443 00:33:10,830 --> 00:33:13,940 Any time you can reduce something to a previously 444 00:33:13,940 --> 00:33:19,760 solved problem, that's good. 445 00:33:19,760 --> 00:33:25,080 And this is a hugely important lesson to learn. 446 00:33:25,080 --> 00:33:28,560 People tend not to realize that the first question you 447 00:33:28,560 --> 00:33:31,790 should always ask yourself, is this really just something 448 00:33:31,790 --> 00:33:34,180 that's well-known in disguise? 449 00:33:34,180 --> 00:33:36,250 Is it a shortest path problem? 450 00:33:36,250 --> 00:33:38,380 Is it a nearest neighbor problem? 451 00:33:38,380 --> 00:33:42,250 Is it a "what string is this most similar to" problem?
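A minimal sketch of the dynamic-programming idea for the knapsack, assuming the weights [5,3,2], values [9,7,8], and five-pound limit from the decision-tree example. The name `max_val` and the dictionary `memo` are illustrative choices here, not necessarily the code used in lecture. The overlapping sub-problems show up as repeated `(i, avail)` pairs, and the memo spends space to avoid recomputing them.

```python
def max_val(weights, values, i, avail, memo=None):
    """Best total value using items 0..i with `avail` weight still free."""
    if memo is None:
        memo = {}
    # No items left to consider, or no room left: nothing more to gain.
    if i < 0 or avail == 0:
        return 0
    key = (i, avail)
    if key in memo:
        # Overlapping sub-problem: reuse the stored answer.
        return memo[key]
    # "No" branch: leave item i out.
    best = max_val(weights, values, i - 1, avail, memo)
    # "Yes" branch: take item i, but only if it fits.
    if weights[i] <= avail:
        with_item = values[i] + max_val(
            weights, values, i - 1, avail - weights[i], memo
        )
        best = max(best, with_item)
    memo[key] = best
    return best


weights = [5, 3, 2]
values = [9, 7, 8]
best = max_val(weights, values, len(weights) - 1, 5)
print(best)  # 15: take the items weighing 3 and 2
```

On three items the memo barely matters; it starts to pay off when many different choices lead to the same item index and remaining weight.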
452 00:33:42,250 --> 00:33:47,240 There are scores of well-understood problems, but 453 00:33:47,240 --> 00:33:52,930 only really scores, it's not thousands of them, and over 454 00:33:52,930 --> 00:33:56,120 time you'll build up a vocabulary of these problems, 455 00:33:56,120 --> 00:33:59,730 and when you see something in your domain, be it physics or 456 00:33:59,730 --> 00:34:04,100 biology or anything else, linguistics, the question you 457 00:34:04,100 --> 00:34:10,210 should ask is can I transform this into an existing problem? 458 00:34:10,210 --> 00:34:14,020 Ok, double line. 459 00:34:14,020 --> 00:34:16,570 If there are no questions I'm going to make a dramatic 460 00:34:16,570 --> 00:34:19,900 change in topic. 461 00:34:19,900 --> 00:34:24,370 We're going to temporarily get off of this more esoteric 462 00:34:24,370 --> 00:34:28,710 stuff, and go back to Python. 463 00:34:28,710 --> 00:34:33,430 And for the next, off and on for the next couple of weeks, 464 00:34:33,430 --> 00:34:51,190 we'll be talking about Python and program organization. 465 00:34:51,190 --> 00:34:58,880 And what I want to be talking about is modules of one sort, 466 00:34:58,880 --> 00:35:01,280 and of course that's because what we're interested in is 467 00:35:01,280 --> 00:35:06,510 modularity. 468 00:35:06,510 --> 00:35:10,550 How do we take a complex program, again, divide and 469 00:35:10,550 --> 00:35:14,050 conquer, I feel like a 1-trick pony, I keep repeating the 470 00:35:14,050 --> 00:35:16,400 same thing over and over again. 471 00:35:16,400 --> 00:35:21,290 Divide and conquer to make our programs modular so we can 472 00:35:21,290 --> 00:35:23,870 write them a little piece at a time and understand them a 473 00:35:23,870 --> 00:35:26,920 little piece at a time. 474 00:35:26,920 --> 00:35:29,810 Now I think of a module as a 475 00:35:29,810 --> 00:35:44,240 collection of related functions. 
476 00:35:44,240 --> 00:35:47,570 We've already seen these, and we're going to refer to the 477 00:35:47,570 --> 00:36:01,720 functions using dot notation. 478 00:36:01,720 --> 00:36:04,550 We've been doing this all term, right, probably 479 00:36:04,550 --> 00:36:12,700 somewhere around lecture 2, we said import math, and then 480 00:36:12,700 --> 00:36:17,310 somewhere in our program we wrote something like math dot 481 00:36:17,310 --> 00:36:25,570 sqrt of 11, or some other number. 482 00:36:25,570 --> 00:36:28,040 And the good news was we didn't have to worry about how 483 00:36:28,040 --> 00:36:31,260 math did square root or anything like that, we just 484 00:36:31,260 --> 00:36:33,850 got it and we used it. 485 00:36:33,850 --> 00:36:54,010 Now we have the dot notation to avoid name conflicts. 486 00:36:54,010 --> 00:36:59,230 Imagine, for example, that in my program I wrote something 487 00:36:59,230 --> 00:37:05,820 like import set, because somebody had written a module 488 00:37:05,820 --> 00:37:11,150 that implements mathematical sets, and somewhere else I'd 489 00:37:11,150 --> 00:37:16,520 written something like import table, because someone had 490 00:37:16,520 --> 00:37:19,940 something that implemented look-up tables of some sort, 491 00:37:19,940 --> 00:37:24,640 something like dictionaries, for example. 492 00:37:24,640 --> 00:37:28,590 And then, I wanted to ask something like membership. 493 00:37:28,590 --> 00:37:31,960 Is something in the set, or is something in the table? 494 00:37:31,960 --> 00:37:35,350 Well, what I would have written is something like 495 00:37:35,350 --> 00:37:43,120 table dot member. 496 00:37:43,120 --> 00:37:48,260 And then the element and maybe the table. 497 00:37:48,260 --> 00:37:51,920 And the dot notation was used to disambiguate, because I 498 00:37:51,920 --> 00:37:54,160 want the member operation from table, not the 499 00:37:54,160 --> 00:37:56,990 member one from set. 
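A small illustration of dot notation resolving a name conflict. The `set` and `table` modules mentioned above are hypothetical, so this sketch substitutes two real standard-library modules, `math` and `cmath`, which both happen to define a function named `sqrt`.

```python
import math
import cmath

# Same function name, two different modules; the prefix says which one we mean.
real_root = math.sqrt(16)        # square root over the real numbers
complex_root = cmath.sqrt(-16)   # square root over the complex numbers

print(math.sqrt(11))
print(real_root)
print(complex_root)
```

Calling `math.sqrt(-16)` would raise a `ValueError`, while `cmath.sqrt(-16)` returns a complex number, so the two functions really are different operations that share a name.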
500 00:37:56,990 --> 00:38:00,670 This was important because the people who implemented table 501 00:38:00,670 --> 00:38:04,170 and set might never have met each other, and so they can 502 00:38:04,170 --> 00:38:06,910 hardly have been expected not to have used the same name 503 00:38:06,910 --> 00:38:10,090 somewhere by accident. 504 00:38:10,090 --> 00:38:14,810 Hence the use of the dot notation. 505 00:38:14,810 --> 00:38:20,660 I now want to talk about a particular kind of module, and 506 00:38:20,660 --> 00:38:22,370 those are the modules that include 507 00:38:22,370 --> 00:38:32,730 classes or that are classes. 508 00:38:32,730 --> 00:38:36,590 This is a very important concept as we'll see, it's why 509 00:38:36,590 --> 00:38:39,780 MIT calls things like 6.00 subjects, so that they don't 510 00:38:39,780 --> 00:38:45,180 get confused with classes in Python, something we really 511 00:38:45,180 --> 00:38:46,900 need to remember here. 512 00:38:46,900 --> 00:38:49,400 Now they can be used in different ways, and they have 513 00:38:49,400 --> 00:38:52,290 been historically used in different ways. 514 00:38:52,290 --> 00:38:57,080 In this subject we're going to emphasize using classes in the 515 00:38:57,080 --> 00:39:05,890 context of what's called object-oriented programming. 516 00:39:05,890 --> 00:39:09,720 And if you go look up at Python books on the web, or 517 00:39:09,720 --> 00:39:13,520 Java books on the web, about 80% of them will include the 518 00:39:13,520 --> 00:39:17,040 word object-oriented in their title. 519 00:39:17,040 --> 00:39:20,400 Object-oriented Python programming for computer 520 00:39:20,400 --> 00:39:24,390 games, or who knows what else. 521 00:39:24,390 --> 00:39:30,710 And we're going to use this object-oriented programming, 522 00:39:30,710 --> 00:39:38,170 typically to create something called data abstractions. 
523 00:39:38,170 --> 00:39:41,000 And over the next couple of days, you'll see what we mean 524 00:39:41,000 --> 00:39:43,220 by this in detail. 525 00:39:43,220 --> 00:39:48,160 A synonym for this is an abstract data type. 526 00:39:48,160 --> 00:39:52,910 You'll see both terms used on the web, and the literature 527 00:39:52,910 --> 00:39:57,410 etc., and think of them as synonyms. Now these ideas of 528 00:39:57,410 --> 00:40:01,160 classes, object-oriented programming, data abstraction, 529 00:40:01,160 --> 00:40:05,930 are about 40 years old, they're not new ideas. 530 00:40:05,930 --> 00:40:09,290 But they've only been really widely accepted in practice 531 00:40:09,290 --> 00:40:12,640 for 10 to 15 years. 532 00:40:12,640 --> 00:40:16,850 It was in the mid-70's, people began to write articles 533 00:40:16,850 --> 00:40:22,530 advocating this style of programming, and actually 534 00:40:22,530 --> 00:40:25,930 building programming languages, notably Smalltalk 535 00:40:25,930 --> 00:40:30,450 and CLU at MIT in fact, that provided linguistic support 536 00:40:30,450 --> 00:40:32,410 for the ideas of data abstraction and 537 00:40:32,410 --> 00:40:34,870 object-oriented programming. 538 00:40:34,870 --> 00:40:40,490 But it really wasn't until, I would say, the arrival of Java 539 00:40:40,490 --> 00:40:42,230 that object-oriented programming 540 00:40:42,230 --> 00:40:47,570 caught the popular attention. 541 00:40:47,570 --> 00:40:58,810 And then Java, C++, Python, of course. 542 00:40:58,810 --> 00:41:02,790 And today nobody advocates a programming language that does 543 00:41:02,790 --> 00:41:07,720 not support it in some sort of way. 544 00:41:07,720 --> 00:41:10,970 So what is this all about? 545 00:41:10,970 --> 00:41:27,570 What is an object in object-oriented programming? 546 00:41:27,570 --> 00:41:49,620 An object is a collection of data and functions.
547 00:41:49,620 --> 00:41:53,580 In particular functions that operate on the data, perhaps 548 00:41:53,580 --> 00:41:58,300 on other data as well. 549 00:41:58,300 --> 00:42:05,000 The key idea here is to bind together the data and the 550 00:42:05,000 --> 00:42:10,630 functions that operate on that data as a single thing. 551 00:42:10,630 --> 00:42:12,800 Now typically that's probably not the way you've been 552 00:42:12,800 --> 00:42:14,840 thinking about things. 553 00:42:14,840 --> 00:42:18,460 When you think about an int or a float or a dictionary or a 554 00:42:18,460 --> 00:42:21,100 list, you knew that there were functions 555 00:42:21,100 --> 00:42:23,110 that operated on them. 556 00:42:23,110 --> 00:42:27,110 But when you pass a parameter say, a list, you didn't think 557 00:42:27,110 --> 00:42:30,120 that you were not only passing the list, you were also 558 00:42:30,120 --> 00:42:31,910 passing the functions that operate on the 559 00:42:31,910 --> 00:42:37,230 list. In fact you are. 560 00:42:37,230 --> 00:42:45,800 It often doesn't matter, but it sometimes really does. 561 00:42:45,800 --> 00:42:50,600 The advantage of that, is that when you pass an object to 562 00:42:50,600 --> 00:42:56,910 another part of the program, that part of the program also 563 00:42:56,910 --> 00:43:03,050 gets the ability to perform operations on the object. 564 00:43:03,050 --> 00:43:06,270 Now when the only types we're dealing with are the built-in 565 00:43:06,270 --> 00:43:10,910 types, the ones that came with the programming language, that 566 00:43:10,910 --> 00:43:12,980 doesn't really matter. 567 00:43:12,980 --> 00:43:16,040 Because, well, the programming language means everybody has 568 00:43:16,040 --> 00:43:19,300 access to those operations. 
569 00:43:19,300 --> 00:43:23,640 But the key idea here is that we're going to be generating 570 00:43:23,640 --> 00:43:33,860 user-defined types, we'll invent new types, and as we do 571 00:43:33,860 --> 00:43:38,780 that we can't assume that, as we pass objects of that 572 00:43:38,780 --> 00:43:42,660 type around, the programming language is giving 573 00:43:42,660 --> 00:43:51,210 us the appropriate operations on that type. 574 00:43:51,210 --> 00:43:56,320 This combining of data and functions on that data is the 575 00:43:56,320 --> 00:44:00,210 very essence of object-oriented programming. 576 00:44:00,210 --> 00:44:03,830 That's really what defines it. 577 00:44:03,830 --> 00:44:13,470 And the word that's often used for that is encapsulation. 578 00:44:13,470 --> 00:44:18,790 Think of it as we've got a capsule, like a pill or 579 00:44:18,790 --> 00:44:23,770 something, and in that capsule we've got data and a bunch of 580 00:44:23,770 --> 00:44:32,710 functions, which as we'll see are called methods. 581 00:44:32,710 --> 00:44:34,860 Don't worry, it doesn't matter that they're called methods, 582 00:44:34,860 --> 00:44:43,260 it's a historical artifact. 583 00:44:43,260 --> 00:44:46,770 All right, so what's an example of this? 584 00:44:46,770 --> 00:44:51,720 Well, we could create a circle object, that would store a 585 00:44:51,720 --> 00:44:56,750 representation of the circle and also provide methods to 586 00:44:56,750 --> 00:45:01,000 operate on it, for example, draw the circle on the screen, 587 00:45:01,000 --> 00:45:06,480 return the area of the circle, inscribe it in a square, who 588 00:45:06,480 --> 00:45:09,730 knows what you want to do with it.
589 00:45:09,730 --> 00:45:14,010 As we talk about this, as people talk about this, in the 590 00:45:14,010 --> 00:45:22,400 context of our object-oriented programming, they typically 591 00:45:22,400 --> 00:45:27,790 will talk about it in terms of message pass, a message 592 00:45:27,790 --> 00:45:42,620 passing metaphor. 593 00:45:42,620 --> 00:45:45,470 I want to mention it's just a metaphor, just a way of 594 00:45:45,470 --> 00:45:50,720 thinking about it, it's not anything very deep here. 595 00:45:50,720 --> 00:45:56,920 So, the way people will talk about this, is one object can 596 00:45:56,920 --> 00:46:01,690 pass a message to another object, and the receiving 597 00:46:01,690 --> 00:46:06,610 object responds by executing one of its 598 00:46:06,610 --> 00:46:08,440 methods on the object. 599 00:46:08,440 --> 00:46:13,180 So let's think about lists. 600 00:46:13,180 --> 00:46:17,620 So if l is a list, I can call something like s 601 00:46:17,620 --> 00:46:21,020 dot sort, l dot sort. 602 00:46:21,020 --> 00:46:22,690 You've seen this. 603 00:46:22,690 --> 00:46:32,260 This says, pass the object l the message sort, and that 604 00:46:32,260 --> 00:46:36,990 message says find the method sort, and apply it to the 605 00:46:36,990 --> 00:46:41,010 object l, in this case mutating the object so that 606 00:46:41,010 --> 00:46:47,710 the elements are now in sorted order. 607 00:46:47,710 --> 00:46:50,130 If c is a circle, I might write 608 00:46:50,130 --> 00:46:55,300 something like c dot area. 609 00:46:55,300 --> 00:46:58,890 And this would say, pass to the object denoted by the 610 00:46:58,890 --> 00:47:05,710 variable c, the message area, which says execute a method 611 00:47:05,710 --> 00:47:09,495 called area, and in this case the method might return a 612 00:47:09,495 --> 00:47:14,750 float, rather than have a side-effect. 
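A short sketch of the two calls just described, `l.sort()` and `c.area()`. The `Circle` class here is an assumption made for illustration: the lecture names a circle object with an area method but gives no code, so the `radius` attribute and the constructor are invented for this example.

```python
import math


class Circle:
    """Hypothetical circle type: bundles a radius with operations on it."""

    def __init__(self, radius):
        self.radius = radius

    def area(self):
        # Returns a float and leaves the object unchanged (no side effect).
        return math.pi * self.radius ** 2


l = [3, 1, 2]
result = l.sort()   # "send l the message sort": mutates l in place
print(l)            # the elements are now in sorted order
print(result)       # None, because sort works by side effect

c = Circle(2.0)
a = c.area()        # "send c the message area": returns a value
print(a)
```

The contrast is the one drawn above: `sort` changes the object it is applied to, while `area` computes and returns something without changing the circle.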
613 00:47:14,750 --> 00:47:18,850 Now again, don't get carried away, I almost didn't talk 614 00:47:18,850 --> 00:47:22,390 about this whole message-passing paradigm, but 615 00:47:22,390 --> 00:47:25,000 it's so pervasive in the world I felt you needed 616 00:47:25,000 --> 00:47:27,200 to hear about it. 617 00:47:27,200 --> 00:47:30,110 But it's nothing very deep, and if you want to not think 618 00:47:30,110 --> 00:47:34,940 about messages, and just think oh, c has a method area, a 619 00:47:34,940 --> 00:47:38,600 circle has a method area, and c as a circle will apply it 620 00:47:38,600 --> 00:47:44,570 and do what it says, you won't get in any trouble at all. 621 00:47:44,570 --> 00:47:57,100 Now the next concept to think about here, is the notion of 622 00:47:57,100 --> 00:48:03,750 an instance. 623 00:48:03,750 --> 00:48:08,440 So we've already thought about, we create instances of 624 00:48:08,440 --> 00:48:12,350 types, so when we looked at lists, and we looked at the 625 00:48:12,350 --> 00:48:16,280 notion of aliasing, we used the word instance, and said 626 00:48:16,280 --> 00:48:19,760 this is 1 object, this is another object, each of those 627 00:48:19,760 --> 00:48:28,070 objects is an instance of type list. So now that 628 00:48:28,070 --> 00:48:31,040 gets us to a class. 629 00:48:31,040 --> 00:48:51,730 A class is a collection of objects with 630 00:48:51,730 --> 00:49:05,740 characteristics in common. 631 00:49:05,740 --> 00:49:09,450 So you can think of class list. What is the 632 00:49:09,450 --> 00:49:12,700 characteristic that all objects of class list have in 633 00:49:12,700 --> 00:49:17,210 common, all instances of class list? It's the set of methods 634 00:49:17,210 --> 00:49:19,330 that can be applied to lists. 635 00:49:19,330 --> 00:49:26,740 Methods like sort, append, other things.
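A brief sketch of instances versus aliases for the built-in class list. The variable names `first`, `second`, and `alias` are arbitrary and chosen only for this example.

```python
first = [1, 2]
second = [1, 2]    # a different object with equal contents
alias = first      # another name for the same object as `first`

# Both objects are instances of the same class.
print(isinstance(first, list), isinstance(second, list))

# Equal contents, but two distinct instances.
print(first == second, first is second)

# What the instances have in common is the set of methods the class provides.
print(hasattr(list, "sort"), hasattr(list, "append"))

# A method applied through the alias mutates the one shared object.
alias.append(3)
print(first)
print(second)
```

After the `append`, `first` has three elements because `alias` names the same instance, while `second` is untouched because it is a separate instance of list.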
636 00:49:26,740 --> 00:49:30,430 So you should think of all of the built-in types we've 637 00:49:30,430 --> 00:49:35,990 talked about as actually just built-in classes, like 638 00:49:35,990 --> 00:49:38,680 dictionaries, lists, etc. 639 00:49:38,680 --> 00:49:43,840 The beauty of being able to define your own class is you 640 00:49:43,840 --> 00:49:47,850 can now extend the language. 641 00:49:47,850 --> 00:49:50,840 So if, for example, you're in the business, God forbid, of 642 00:49:50,840 --> 00:49:55,600 writing financial software today, you might decide, I'd 643 00:49:55,600 --> 00:50:00,240 really like to have a class called tanking stock, or bad 644 00:50:00,240 --> 00:50:03,550 mortgage, or something like that or mortgage, right? 645 00:50:03,550 --> 00:50:07,030 Which would have a bunch of operations, like, I won't go 646 00:50:07,030 --> 00:50:08,620 into what they might be. 647 00:50:08,620 --> 00:50:11,690 But you'd like to write your program not in terms of floats 648 00:50:11,690 --> 00:50:15,640 and ints and lists, but in terms of mortgages, and CDOs, 649 00:50:15,640 --> 00:50:18,400 and all of the objects that you read about in the paper, 650 00:50:18,400 --> 00:50:20,530 the types you read about. 651 00:50:20,530 --> 00:50:23,170 And so you get to build your own special purpose 652 00:50:23,170 --> 00:50:26,120 programming language that helps you solve your problems 653 00:50:26,120 --> 00:50:29,600 in biology or finance or whatever, and we'll pick up 654 00:50:29,600 --> 00:50:31,760 here again on Tuesday.