1 00:00:00,030 --> 00:00:01,690 The following content is provided 2 00:00:01,690 --> 00:00:03,830 under a Creative Commons license. 3 00:00:03,830 --> 00:00:06,250 Your support will help MIT OpenCourseWare 4 00:00:06,250 --> 00:00:10,520 continue to offer high-quality educational resources for free. 5 00:00:10,520 --> 00:00:13,230 To make a donation or view additional materials 6 00:00:13,230 --> 00:00:16,600 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,600 --> 00:00:21,520 at ocw.mit.edu. 8 00:00:21,520 --> 00:00:24,980 PROFESSOR: Last time we were talking about binary search 9 00:00:24,980 --> 00:00:29,370 and I sort of left a promise to you which I need to pick up. 10 00:00:29,370 --> 00:00:31,130 I want to remind you, we were talking 11 00:00:31,130 --> 00:00:35,999 about search, which is a very fundamental thing that we do 12 00:00:35,999 --> 00:00:37,290 in a whole lot of applications. 13 00:00:37,290 --> 00:00:39,930 We want to go find things in some data set. 14 00:00:39,930 --> 00:00:42,650 And I'll remind you that we sort of separated out two cases. 15 00:00:42,650 --> 00:00:48,090 We said if we had an ordered list, 16 00:00:48,090 --> 00:00:50,420 we could use binary search. 17 00:00:50,420 --> 00:00:54,660 And we said that was logarithmic, took log n time where 18 00:00:54,660 --> 00:00:56,140 n is the size of the list. 19 00:00:56,140 --> 00:01:01,140 If it was an unordered list, we were basically 20 00:01:01,140 --> 00:01:03,574 stuck with linear search. 21 00:01:03,574 --> 00:01:04,990 Got to walk through the whole list 22 00:01:04,990 --> 00:01:06,198 to see if the thing is there. 23 00:01:06,198 --> 00:01:08,289 So that was of order n. 
24 00:01:08,289 --> 00:01:10,080 And then one of the things that I suggested 25 00:01:10,080 --> 00:01:13,650 was that if we could figure out some way to order it, 26 00:01:13,650 --> 00:01:22,392 and in particular, if we could order it in n log n time, 27 00:01:22,392 --> 00:01:24,100 and we still haven't done that, but if we 28 00:01:24,100 --> 00:01:26,670 could do that, then we said the complexity changed 29 00:01:26,670 --> 00:01:27,880 a little bit. 30 00:01:27,880 --> 00:01:30,020 But it changed in a way that I want to remind you of. 31 00:01:30,020 --> 00:01:32,810 And the change was, that in this case, 32 00:01:32,810 --> 00:01:36,720 if I'm doing a single search, I've got a choice. 33 00:01:36,720 --> 00:01:40,860 I could still do the linear case, which is order n 34 00:01:40,860 --> 00:01:44,170 or I could say, look, take the list, let's sort it 35 00:01:44,170 --> 00:01:45,480 and then search it. 36 00:01:45,480 --> 00:01:47,500 But in that case, we said well to sort 37 00:01:47,500 --> 00:01:52,710 it was going to take n log n time, assuming I can do that. 38 00:01:52,710 --> 00:01:59,610 Once I have it sorted I can search it in log n time, 39 00:01:59,610 --> 00:02:02,860 but that still isn't as good as just doing n. 40 00:02:02,860 --> 00:02:08,220 And this led to this idea of amortization, 41 00:02:08,220 --> 00:02:11,172 which is I need to not only factor in the cost, 42 00:02:11,172 --> 00:02:12,380 but how am I going to use it? 43 00:02:12,380 --> 00:02:14,520 And typically, I'm not going to just search once in a list, 44 00:02:14,520 --> 00:02:16,250 I'm going to search multiple times. 45 00:02:16,250 --> 00:02:21,650 So if I have to do k searches, then in the linear case, 46 00:02:21,650 --> 00:02:23,370 I've got to do order n things k times. 47 00:02:23,370 --> 00:02:25,070 It's order k n. 
48 00:02:25,070 --> 00:02:30,390 Whereas in the ordered case, I need to get them sorted, 49 00:02:30,390 --> 00:02:34,870 which is still n log n, but then the search is only log n. 50 00:02:34,870 --> 00:02:38,880 I need to do k of those. 51 00:02:38,880 --> 00:02:44,600 And we suggested well this is better than that. 52 00:02:44,600 --> 00:02:47,820 This is certainly better than that. 53 00:02:47,820 --> 00:02:50,270 n plus k, all times log n, is in general going to be 54 00:02:50,270 --> 00:02:51,750 much better than k times n. 55 00:02:51,750 --> 00:02:53,300 It depends on n and k but obviously 56 00:02:53,300 --> 00:02:56,580 as n gets big, that one is going to be better. 57 00:02:56,580 --> 00:02:58,330 And that's just a way of reminding you 58 00:02:58,330 --> 00:03:00,959 that we want to think carefully about what 59 00:03:00,959 --> 00:03:02,750 are the things we're trying to measure when 60 00:03:02,750 --> 00:03:04,090 we talk about complexity here? 61 00:03:04,090 --> 00:03:06,480 It's both the size of the thing and how often 62 00:03:06,480 --> 00:03:07,540 are we going to use it? 63 00:03:07,540 --> 00:03:10,131 And there are some trade offs, but I still 64 00:03:10,131 --> 00:03:11,880 haven't said how I'm going to get an n log 65 00:03:11,880 --> 00:03:14,560 n sorting algorithm, and that's what I want to do today. 66 00:03:14,560 --> 00:03:17,030 One of the two things I want to do today. 67 00:03:17,030 --> 00:03:19,610 To set the stage for this, let's go back just 68 00:03:19,610 --> 00:03:22,600 for a second to binary search. 69 00:03:22,600 --> 00:03:25,710 At the end of the lecture I said binary search 70 00:03:25,710 --> 00:03:39,660 was an example of a divide and conquer algorithm. 71 00:03:39,660 --> 00:03:41,860 Sort of an Attila the Hun kind of approach 72 00:03:41,860 --> 00:03:43,940 to doing things if you like. 
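The amortization argument (k linear searches at roughly k times n, versus one sort plus k binary searches at roughly n plus k, all times log n) can be checked numerically. The following is a minimal sketch, not code from the lecture: the function names are made up for illustration, and the costs are idealized operation counts with all constant factors assumed to be 1.

```python
import math


def linear_cost(n, k):
    """Idealized cost of k linear searches over an unsorted list of n items."""
    return k * n


def sorted_cost(n, k):
    """Idealized cost of one n log n sort followed by k binary searches."""
    return n * math.log2(n) + k * math.log2(n)


def break_even_searches(n):
    """Smallest k for which sorting first is strictly cheaper (assumes n >= 1)."""
    k = 1
    while sorted_cost(n, k) >= linear_cost(n, k):
        k += 1
    return k


# For a single search, the plain linear scan is cheaper than sorting first.
single_search_favors_linear = linear_cost(1024, 1) < sorted_cost(1024, 1)

# With repeated searches, the up-front sort is amortized away.
k_needed = break_even_searches(1024)
```

Under these idealized counts, sorting a 1024-element list pays for itself after about a dozen searches; real constant factors would shift the exact break-even point.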
73 00:03:43,940 --> 00:03:46,140 So let me say -- boy, I could have made a really bad 74 00:03:46,140 --> 00:03:48,340 political joke there, which I will forego, right. 75 00:03:48,340 --> 00:03:50,760 Let's say what this actually means, divide and conquer. 76 00:03:50,760 --> 00:03:53,340 Divide and conquer says basically do the following: 77 00:03:53,340 --> 00:04:14,254 split the problem into several sub-problems of the same type. 78 00:04:14,254 --> 00:04:15,670 I'll come back in a second to how 79 00:04:15,670 --> 00:04:16,800 binary search matches that, but that's 80 00:04:16,800 --> 00:04:17,880 what we're going to do. 81 00:04:17,880 --> 00:04:21,920 For each of those sub-problems we're 82 00:04:21,920 --> 00:04:26,710 going to solve them independently, 83 00:04:26,710 --> 00:04:34,124 and then we're going to combine those solutions. 84 00:04:34,124 --> 00:04:36,540 And it's called divide and conquer for the obvious reason. 85 00:04:36,540 --> 00:04:39,670 I'm going to divide it up into sub-problems with the hope 86 00:04:39,670 --> 00:04:41,152 that those sub-problems get easier. 87 00:04:41,152 --> 00:04:43,110 It's going to be easier to conquer if you like, 88 00:04:43,110 --> 00:04:45,150 and then I'm going to merge them back. 89 00:04:45,150 --> 00:04:47,840 Now, in the binary search case, in some sense, 90 00:04:47,840 --> 00:04:51,140 this is a little bit trivial. 91 00:04:51,140 --> 00:04:52,480 What was the divide? 92 00:04:52,480 --> 00:04:56,715 The divide was breaking a big search up into half a search. 93 00:04:56,715 --> 00:04:58,340 We actually threw half of the list away 94 00:04:58,340 --> 00:05:01,380 and we kept dividing it down, until ultimately we 95 00:05:01,380 --> 00:05:03,000 got something of size one to search. 96 00:05:03,000 --> 00:05:04,770 That's really easy. 
97 00:05:04,770 --> 00:05:07,510 The combination was also sort of trivial in this case 98 00:05:07,510 --> 00:05:09,930 because the solution to the sub-problem 99 00:05:09,930 --> 00:05:13,300 was, in fact, the solution to the larger problem. 100 00:05:13,300 --> 00:05:15,940 But there's the idea of divide and conquer. 101 00:05:15,940 --> 00:05:18,820 I'm going to use exactly that same idea to tackle sort. 102 00:05:18,820 --> 00:05:20,840 Again, I've got an unordered list of n elements. 103 00:05:20,840 --> 00:05:24,010 I want to sort it into, obviously, a sorted list. 104 00:05:24,010 --> 00:05:27,270 And that particular algorithm is actually 105 00:05:27,270 --> 00:05:29,950 a really nice algorithm called merge sort. 106 00:05:29,950 --> 00:05:37,800 And it's actually a fairly old algorithm. 107 00:05:37,800 --> 00:05:43,720 It was invented in 1945 by John von Neumann, one of the pioneers 108 00:05:43,720 --> 00:05:46,060 of computer science. 109 00:05:46,060 --> 00:05:50,050 And here's the idea behind merge sort, 110 00:05:50,050 --> 00:05:53,960 actually I'm going to back into it in a funny way. 111 00:05:53,960 --> 00:05:56,340 Let's assume that I could somehow get to the stage 112 00:05:56,340 --> 00:05:59,560 where I've got two sorted lists. 113 00:05:59,560 --> 00:06:04,171 How much work do I have to do to actually merge them together? 114 00:06:04,171 --> 00:06:05,420 So let me give you an example. 115 00:06:05,420 --> 00:06:13,610 Suppose I want to merge two lists, and they're sorted. 116 00:06:13,610 --> 00:06:19,190 Just to give you an example, here's one list, 117 00:06:19,190 --> 00:06:25,830 3, 12, 17, 24. Here's another list, 1, 2, 4, 30. 118 00:06:25,830 --> 00:06:29,020 I haven't said how I'm going to get those sorted lists, 119 00:06:29,020 --> 00:06:31,790 but imagine I had two sorted lists like that. 120 00:06:31,790 --> 00:06:34,370 How hard is it to merge them? 121 00:06:34,370 --> 00:06:35,940 Well it's pretty easy, right? 
122 00:06:35,940 --> 00:06:37,540 I start at the beginning of each list, 123 00:06:37,540 --> 00:06:39,820 and I say is one less than three? 124 00:06:39,820 --> 00:06:41,170 Sure. 125 00:06:41,170 --> 00:06:45,470 So that says one should be the first element in my merge list. 126 00:06:45,470 --> 00:06:49,220 Now, compare the first element in each of these lists. 127 00:06:49,220 --> 00:06:52,810 Two is less than three, so two ought 128 00:06:52,810 --> 00:06:54,910 to be the next element of the list. 129 00:06:54,910 --> 00:06:55,920 And you get the idea. 130 00:06:55,920 --> 00:06:57,045 What am I going to do next? 131 00:06:57,045 --> 00:07:00,020 I'm going to compare three against four. 132 00:07:00,020 --> 00:07:02,070 Three is the smallest one, and I'm 133 00:07:02,070 --> 00:07:05,260 going to compare four against twelve, 134 00:07:05,260 --> 00:07:07,132 which is going to give me four. 135 00:07:07,132 --> 00:07:07,840 And then what do I do? 136 00:07:07,840 --> 00:07:13,860 I have to do twelve against thirty, twelve is smaller, 137 00:07:13,860 --> 00:07:15,590 take that out. 138 00:07:15,590 --> 00:07:22,250 Seventeen against thirty, twenty-four against thirty. 139 00:07:22,250 --> 00:07:27,100 And by this stage I've got nothing left in this list, 140 00:07:27,100 --> 00:07:31,010 so I just add the rest of that list in. 141 00:07:31,010 --> 00:07:33,300 Wow, I can sort two lists, so I can merge two lists. 142 00:07:33,300 --> 00:07:35,100 I said it poorly. 143 00:07:35,100 --> 00:07:37,110 What's the point? 144 00:07:37,110 --> 00:07:40,000 How many operations did it take me to do this? 145 00:07:40,000 --> 00:07:41,150 Seven comparisons, right? 146 00:07:41,150 --> 00:07:42,150 I've got eight elements. 147 00:07:42,150 --> 00:07:47,290 It took me seven comparisons, because I 148 00:07:47,290 --> 00:07:49,020 can take advantage of the fact I know 149 00:07:49,020 --> 00:07:52,627 I only ever have to look at the first element of each sub-list. 
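The merge walk-through above can be sketched in Python. This is an illustrative version rather than the handout code: the name `merge` and the extra comparison counter are assumptions added here so the count from the example can be checked.

```python
def merge(left, right):
    """Merge two already-sorted lists.

    Returns (merged_list, comparisons), where comparisons counts how many
    times the front elements of the two lists were compared.
    """
    result = []
    i = j = 0
    comparisons = 0
    # Only the current front element of each list ever needs to be compared.
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list has run out; the remainder of the other is already in order.
    result.extend(left[i:])
    result.extend(right[j:])
    return result, comparisons


# The two sorted lists from the example.
merged, count = merge([3, 12, 17, 24], [1, 2, 4, 30])
```

For these two lists the sketch performs the same seven comparisons as the walk-through, and in general it makes at most one comparison per element placed, which is why merging is linear in the total number of elements.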
150 00:07:52,627 --> 00:07:54,460 Those are the only things I need to compare, 151 00:07:54,460 --> 00:07:57,365 and when I run out of one list, I just add the rest of the list 152 00:07:57,365 --> 00:07:58,140 in. 153 00:07:58,140 --> 00:08:02,474 What's the order of complexity of merging? 154 00:08:02,474 --> 00:08:03,890 I heard it somewhere very quietly. 155 00:08:03,890 --> 00:08:05,217 STUDENT: n. 156 00:08:05,217 --> 00:08:06,550 PROFESSOR: Sorry, and thank you. 157 00:08:06,550 --> 00:08:08,290 Linear, absolutely right? 158 00:08:08,290 --> 00:08:10,517 And what's n by the way here? 159 00:08:10,517 --> 00:08:11,350 What's it measuring? 160 00:08:11,350 --> 00:08:14,017 STUDENT: [UNINTELLIGIBLE] 161 00:08:14,017 --> 00:08:15,350 PROFESSOR: In both lists, right. 162 00:08:15,350 --> 00:08:21,230 So this is linear, order n, and n is the sum of the elements, 163 00:08:21,230 --> 00:08:30,590 or sorry, the total number of elements in the two lists. 164 00:08:30,590 --> 00:08:32,740 I said I was going to back my way into this. 165 00:08:32,740 --> 00:08:37,410 That gives me a way to merge things. 166 00:08:37,410 --> 00:08:40,020 So here's what merge sort would do. 167 00:08:40,020 --> 00:08:47,150 Merge sort takes this idea of divide and conquer, 168 00:08:47,150 --> 00:08:49,610 and it does the following: it says 169 00:08:49,610 --> 00:08:58,440 let's divide the list in half. 170 00:08:58,440 --> 00:09:00,940 There's the divide and conquer. 171 00:09:00,940 --> 00:09:04,884 And let's keep dividing each of those lists in half 172 00:09:04,884 --> 00:09:07,300 until we get down to something that's really easy to sort. 173 00:09:07,300 --> 00:09:08,716 What's the simplest thing to sort? 174 00:09:08,716 --> 00:09:12,220 A list of size one, right? 175 00:09:12,220 --> 00:09:26,820 So continue until we have singleton lists. 176 00:09:26,820 --> 00:09:28,710 Once I've got lists of size one they're 177 00:09:28,710 --> 00:09:31,140 sorted, and then combine them. 
178 00:09:31,140 --> 00:09:36,060 Combine them by doing a merge of the sub-lists. 179 00:09:36,060 --> 00:09:42,540 And again, you see that flavor. 180 00:09:42,540 --> 00:09:44,640 I'm going to just keep dividing it up 181 00:09:44,640 --> 00:09:46,160 until I get something really easy, 182 00:09:46,160 --> 00:09:47,410 and then I'm going to combine. 183 00:09:47,410 --> 00:09:49,070 And this is different than binary search now, 184 00:09:49,070 --> 00:09:50,945 the combine is going to have to do some work. 185 00:09:50,945 --> 00:09:54,352 So, I'm giving you a piece of code that does this, 186 00:09:54,352 --> 00:09:56,310 and I'm going to come back to it in a second, 187 00:09:56,310 --> 00:09:57,405 but it's up there. 188 00:09:57,405 --> 00:09:58,780 But what I'd like to do is to try 189 00:09:58,780 --> 00:10:01,434 to show you sort of a little simulation of how 190 00:10:01,434 --> 00:10:02,100 this would work. 191 00:10:02,100 --> 00:10:03,850 And I was going to originally make the TAs come up here 192 00:10:03,850 --> 00:10:05,760 and do it, but I don't have enough TAs 193 00:10:05,760 --> 00:10:06,830 to do a full merge sort. 194 00:10:06,830 --> 00:10:10,380 So I'm hoping, so I also have these really high-tech props. 195 00:10:10,380 --> 00:10:12,242 I spent tons and tons of department money 196 00:10:12,242 --> 00:10:13,200 on them as you can see. 197 00:10:13,200 --> 00:10:15,250 I hope you can see this because I'm going to try 198 00:10:15,250 --> 00:10:16,750 and simulate what a merge sort does. 199 00:10:16,750 --> 00:10:18,950 I've got eight things I want to sort here, 200 00:10:18,950 --> 00:10:21,640 and those initially start out here at top level. 201 00:10:21,640 --> 00:10:23,920 The first step is divide them in half. 202 00:10:23,920 --> 00:10:27,440 All right? 203 00:10:27,440 --> 00:10:29,702 I'm not sure how to mark it here, 204 00:10:29,702 --> 00:10:31,160 remember I need to come back there. 
205 00:10:31,160 --> 00:10:33,189 I'm not yet done. 206 00:10:33,189 --> 00:10:33,730 What do I do? 207 00:10:33,730 --> 00:10:40,805 Divide them in half again. 208 00:10:40,805 --> 00:10:42,430 You know, if I had like shells and peas 209 00:10:42,430 --> 00:10:44,229 here I could make some more money. 210 00:10:44,229 --> 00:10:44,770 What do I do? 211 00:10:44,770 --> 00:10:50,980 I divide them in half one more time. 212 00:10:50,980 --> 00:10:53,450 Let me cluster them because really what I have, 213 00:10:53,450 --> 00:10:56,240 sorry, separate them out. 214 00:10:56,240 --> 00:10:58,980 I've gone from one problem of size eight down 215 00:10:58,980 --> 00:11:01,170 to eight problems of size one. 216 00:11:01,170 --> 00:11:03,540 At this stage I'm at my singleton case. 217 00:11:03,540 --> 00:11:04,229 So this is easy. 218 00:11:04,229 --> 00:11:04,770 What do I do? 219 00:11:04,770 --> 00:11:05,950 I merge. 220 00:11:05,950 --> 00:11:17,180 And the merge is, put them in order. 221 00:11:17,180 --> 00:11:18,030 What do I do next? 222 00:11:18,030 --> 00:11:19,510 Obvious thing, I merge these. 223 00:11:19,510 --> 00:11:22,800 And that as we saw was a nice linear operation. 224 00:11:22,800 --> 00:11:30,900 It's fun to do it upside down, and then one more merge 225 00:11:30,900 --> 00:11:34,040 which is I take the smallest elements of each one 226 00:11:34,040 --> 00:11:40,350 until I get to where I want. 227 00:11:40,350 --> 00:11:42,760 Wow, aren't you impressed? 228 00:11:42,760 --> 00:11:45,610 No, don't, please don't clap, not for that one. 229 00:11:45,610 --> 00:11:48,622 Now let me do it a second time to show you that -- 230 00:11:48,622 --> 00:11:49,580 I'm saying this poorly. 231 00:11:49,580 --> 00:11:50,790 Let me say it again. 232 00:11:50,790 --> 00:11:52,327 That's the general idea. 233 00:11:52,327 --> 00:11:53,660 What should you see out of that? 
234 00:11:53,660 --> 00:11:55,280 I just kept sub-dividing down until I 235 00:11:55,280 --> 00:11:59,210 got really easy problems, and then I combine them back. 236 00:11:59,210 --> 00:12:02,380 I actually misled you slightly there or maybe a lot, 237 00:12:02,380 --> 00:12:03,760 because I did it in parallel. 238 00:12:03,760 --> 00:12:06,180 In fact, let me just shuffle these up a little bit. 239 00:12:06,180 --> 00:12:08,180 Really what's going to happen here, because this 240 00:12:08,180 --> 00:12:10,650 is a sequential computer, is that we're 241 00:12:10,650 --> 00:12:13,940 going to start off up here, at top level, 242 00:12:13,940 --> 00:12:19,530 we're going to divide into half, then 243 00:12:19,530 --> 00:12:21,490 we're going to do the complete subdivision 244 00:12:21,490 --> 00:12:24,160 and merge here before we ever come back and do this one. 245 00:12:24,160 --> 00:12:30,030 We're going to do a division here and then a division there. 246 00:12:30,030 --> 00:12:32,850 At that stage we can merge these, and then take this down, 247 00:12:32,850 --> 00:12:35,150 do the division merge and bring them back up. 248 00:12:35,150 --> 00:12:41,880 Let me show you an example by running that. 249 00:12:41,880 --> 00:12:44,300 I've got a little list I've made here called test. 250 00:12:44,300 --> 00:12:53,360 Let's run merge sort on it, and then we'll look at the code. 251 00:12:53,360 --> 00:12:57,170 OK, what I would like you to see is I've been printing out, 252 00:12:57,170 --> 00:12:59,420 as I went along, actually let's back up slightly 253 00:12:59,420 --> 00:13:00,950 and look at the code. 254 00:13:00,950 --> 00:13:03,440 There's merge sort. 255 00:13:03,440 --> 00:13:04,140 Takes in a list. 256 00:13:04,140 --> 00:13:05,098 What does it say to do? 257 00:13:05,098 --> 00:13:07,880 It says check to see if I'm in that base case. 258 00:13:07,880 --> 00:13:10,050 Is the list of length less than two? 
259 00:13:10,050 --> 00:13:11,740 Is it one basically? 260 00:13:11,740 --> 00:13:16,090 In which case, just return a copy of the list. 261 00:13:16,090 --> 00:13:17,279 That's the simple case. 262 00:13:17,279 --> 00:13:18,820 Otherwise, notice what it says to do. 263 00:13:18,820 --> 00:13:24,640 It says find the mid-point and split the list in half. 264 00:13:24,640 --> 00:13:27,070 Copy of the back end, sorry, copy of the left side, 265 00:13:27,070 --> 00:13:28,380 copy of the right side. 266 00:13:28,380 --> 00:13:30,690 Run merge sort on those. 267 00:13:30,690 --> 00:13:32,430 By induction, if it does the right thing, 268 00:13:32,430 --> 00:13:34,740 I'm going to get back two lists, and I'm 269 00:13:34,740 --> 00:13:36,814 going to then merge them together. 270 00:13:36,814 --> 00:13:37,980 Notice what I'm going to do. 271 00:13:37,980 --> 00:13:40,370 I'm going to print here the list if we go into it, 272 00:13:40,370 --> 00:13:44,310 and print it when we're done and then just return that. 273 00:13:44,310 --> 00:13:45,160 Merge up here. 274 00:13:45,160 --> 00:13:46,190 There's a little more code there. 275 00:13:46,190 --> 00:13:47,565 I'll let you just grok it but you 276 00:13:47,565 --> 00:13:51,200 can see it's basically doing what I did over there. 277 00:13:51,200 --> 00:13:53,230 Setting up two indices for the two sub-lists, 278 00:13:53,230 --> 00:13:56,460 it's just walking down, finding the smallest element, 279 00:13:56,460 --> 00:13:57,890 putting it into a new list. 280 00:13:57,890 --> 00:14:00,700 When it gets to the end of one of the lists, 281 00:14:00,700 --> 00:14:03,339 it skips to the next part, and only one of these two pieces 282 00:14:03,339 --> 00:14:05,005 will get called because only one of them 283 00:14:05,005 --> 00:14:06,421 is going to have things left over. 284 00:14:06,421 --> 00:14:08,502 It's going to add the other pieces in. 
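The code on the handout is not reproduced in this transcript, so what follows is a minimal sketch of merge sort as it is described above, under assumed names (`merge_sort`, `merge`). It leaves out the print statements used in the demo.

```python
def merge(left, right):
    """Merge two sorted lists into one new sorted list."""
    result = []
    i = j = 0  # one index per sub-list
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # At most one of these two slices is non-empty.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def merge_sort(items):
    """Return a new sorted list; the input list is left unchanged."""
    # Base case: a list of length less than two is already sorted.
    if len(items) < 2:
        return items[:]
    # Divide: split at the mid-point and sort a copy of each half.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Combine: merge the two sorted halves.
    return merge(left, right)


test = [5, 2, 8, 1, 9, 3, 7, 4]
result = merge_sort(test)
```

Because each recursive call works on slices, the whole left half is divided and merged before the right half is touched, which matches the sequential order described in the simulation.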
285 00:14:08,502 --> 00:14:09,960 OK, if you look at that then, let's 286 00:14:09,960 --> 00:14:11,585 look at what happened when we ran this. 287 00:14:11,585 --> 00:14:16,480 We started off with a call with that list. 288 00:14:16,480 --> 00:14:19,030 Ah ha, split it in half. 289 00:14:19,030 --> 00:14:21,190 It's going down the left side of this. 290 00:14:21,190 --> 00:14:23,700 That got split in half, and that got split in half 291 00:14:23,700 --> 00:14:27,180 until I got to a list of one. 292 00:14:27,180 --> 00:14:28,620 Here's the first list of size one. 293 00:14:28,620 --> 00:14:30,120 There's the second list of size one. 294 00:14:30,120 --> 00:14:31,600 So I merged them. 295 00:14:31,600 --> 00:14:35,560 It's now in the right order, and that's coming from right there. 296 00:14:35,560 --> 00:14:37,320 Having done that, it goes back up 297 00:14:37,320 --> 00:14:42,210 and picks the second sub-list, which came from there. 298 00:14:42,210 --> 00:14:44,150 It goes down to the base case, merges it. 299 00:14:44,150 --> 00:14:46,770 When these two merges are done, we're 300 00:14:46,770 --> 00:14:48,690 basically at a stage in that branch 301 00:14:48,690 --> 00:14:52,730 where we can now merge those two together, which gives us that, 302 00:14:52,730 --> 00:14:56,330 and it goes through the rest of it. 303 00:14:56,330 --> 00:15:00,000 A really nice algorithm. 304 00:15:00,000 --> 00:15:03,210 As I said, an example of divide and conquer. 305 00:15:03,210 --> 00:15:06,600 Notice here that it's different than the binary search case. 306 00:15:06,600 --> 00:15:10,240 We're certainly dividing down, but the combination now 307 00:15:10,240 --> 00:15:12,370 actually takes some work. 308 00:15:12,370 --> 00:15:15,082 I'll have to actually figure out how to put them back together. 
309 00:15:15,082 --> 00:15:16,540 And that's a general thing you want 310 00:15:16,540 --> 00:15:18,040 to keep in mind when you're thinking 311 00:15:18,040 --> 00:15:21,159 about designing a divide and conquer kind of algorithm. 312 00:15:21,159 --> 00:15:23,450 You really want to get the power of dividing things up, 313 00:15:23,450 --> 00:15:26,645 but if you end up doing a ton of work at the combination stage, 314 00:15:26,645 --> 00:15:28,020 you may not have gained anything. 315 00:15:28,020 --> 00:15:31,290 So you really want to think about that trade off. 316 00:15:31,290 --> 00:15:37,682 All right, having said that, what's the complexity here? 317 00:15:37,682 --> 00:15:40,140 Boy, there's a dumb question, because I've been telling you 318 00:15:40,140 --> 00:15:42,540 for the last two lectures the complexity is n log n, 319 00:15:42,540 --> 00:15:43,790 but let's see if it really is. 320 00:15:43,790 --> 00:15:46,440 What's the complexity here? 321 00:15:46,440 --> 00:16:01,727 If we think about it, we start off with the problem of size n. 322 00:16:01,727 --> 00:16:02,310 What do we do? 323 00:16:02,310 --> 00:16:05,530 We split it into two problems of size n over 2. 324 00:16:05,530 --> 00:16:08,340 Those get split each into two problems of size n over 4, 325 00:16:08,340 --> 00:16:14,790 and we keep doing that until we get down to a level 326 00:16:14,790 --> 00:16:20,270 in this tree where we have only singletons left over. 327 00:16:20,270 --> 00:16:23,480 Once we're there, we have to do the merge. 328 00:16:23,480 --> 00:16:24,980 Notice what happens here. 329 00:16:24,980 --> 00:16:30,420 We said each of the merge operations was of order n. 330 00:16:30,420 --> 00:16:31,430 But n is different. 331 00:16:31,430 --> 00:16:31,630 Right? 
332 00:16:31,630 --> 00:16:33,450 Down here, I've just got two things to merge, 333 00:16:33,450 --> 00:16:34,950 and then I've got things of size two 334 00:16:34,950 --> 00:16:37,530 to merge and then things of size four to merge. 335 00:16:37,530 --> 00:16:38,690 But notice a trade off. 336 00:16:38,690 --> 00:16:43,960 I have n operations if you like down there of size one. 337 00:16:43,960 --> 00:16:47,140 Up here I have n over two operations of size two. 338 00:16:47,140 --> 00:16:50,680 Up here I've got n over four operations of size four. 339 00:16:50,680 --> 00:16:54,950 So I always have to do a merge of n elements. 340 00:16:54,950 --> 00:16:57,480 How much time does that take? 341 00:16:57,480 --> 00:17:01,210 Well, we said it, right? 342 00:17:01,210 --> 00:17:02,950 Where did I put it? 343 00:17:02,950 --> 00:17:04,900 Right there, order n. 344 00:17:04,900 --> 00:17:16,402 So I have order n operations at each level in the tree. 345 00:17:16,402 --> 00:17:17,860 And then how many levels deep am I? 346 00:17:17,860 --> 00:17:20,900 Well, that's the divide, right? 347 00:17:20,900 --> 00:17:26,480 So how many levels do I have? 348 00:17:26,480 --> 00:17:30,100 Log n, because at each stage I'm cutting the problem in half. 349 00:17:30,100 --> 00:17:31,570 So I start off with n then it's n 350 00:17:31,570 --> 00:17:33,460 over two n over four n over eight. 351 00:17:33,460 --> 00:17:40,806 So I have n operations log n times, there we go, n log n. 352 00:17:40,806 --> 00:17:42,180 Took us a long time to get there, 353 00:17:42,180 --> 00:17:45,340 but it's a nice algorithm to have. 354 00:17:45,340 --> 00:17:51,670 Let me generalize this slightly. 
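The level-by-level count for merge sort can also be written as a recurrence. This is a sketch under stated assumptions: n is taken to be a power of two, and c and c_0 are placeholder constants (not from the lecture) for the per-element merge cost and the cost of a singleton list.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + c\,n, \qquad T(1) = c_0 \\[4pt]
\text{work at level } i &= 2^{i} \cdot c\,\frac{n}{2^{i}} = c\,n
  \qquad \text{($2^{i}$ merges, each producing $n/2^{i}$ elements)} \\[4pt]
\text{number of merge levels} &= \log_2 n \\[4pt]
T(n) &= c\,n \log_2 n + c_0\,n = O(n \log n)
\end{align*}
```

Each level contributes the same linear amount of merging, and halving the problem gives a logarithmic number of levels, which is where the n log n total comes from.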
355 00:17:51,670 --> 00:17:55,150 When we get a problem, a standard tool 356 00:17:55,150 --> 00:17:56,900 to try and attack it with is to say, 357 00:17:56,900 --> 00:18:00,800 is there some way to break this problem down into simpler, 358 00:18:00,800 --> 00:18:05,820 I shouldn't say simpler, smaller versions of the same problem. 359 00:18:05,820 --> 00:18:07,512 If I can do that, it's a good candidate 360 00:18:07,512 --> 00:18:08,470 for divide and conquer. 361 00:18:08,470 --> 00:18:10,928 And then the thing I have to ask is how much of a division 362 00:18:10,928 --> 00:18:12,210 do I want to do? 363 00:18:12,210 --> 00:18:13,950 The obvious one is to divide it in half, 364 00:18:13,950 --> 00:18:16,366 but there may be cases where there are different divisions 365 00:18:16,366 --> 00:18:18,270 you want to have take place. 366 00:18:18,270 --> 00:18:21,632 The second question I want to ask is what's the base case? 367 00:18:21,632 --> 00:18:23,215 When do I get down to a problem that's 368 00:18:23,215 --> 00:18:26,740 small enough that it's basically trivial to solve? 369 00:18:26,740 --> 00:18:28,460 Here it was lists of size one. 370 00:18:28,460 --> 00:18:30,557 I could have stopped at lists of size two, right? 371 00:18:30,557 --> 00:18:31,640 That's an easy comparison. 372 00:18:31,640 --> 00:18:34,850 Do one comparison and return one of two possible orders on it, 373 00:18:34,850 --> 00:18:36,410 but I need to decide that. 374 00:18:36,410 --> 00:18:39,930 And the third thing I need to decide is how do I combine? 375 00:18:39,930 --> 00:18:42,260 You know, I pointed out to you in the binary search case, 376 00:18:42,260 --> 00:18:44,037 combination was trivial. 377 00:18:44,037 --> 00:18:46,120 The answer to the final search was just the answer 378 00:18:46,120 --> 00:18:47,270 all the way up. 379 00:18:47,270 --> 00:18:49,157 Here, a little more work, and that's 380 00:18:49,157 --> 00:18:50,490 why I'll come back to that idea. 
381 00:18:50,490 --> 00:18:53,081 If I'm basically just squeezing jello, 382 00:18:53,081 --> 00:18:55,080 that is, I'm trying to make the problem simpler, 383 00:18:55,080 --> 00:18:57,500 but the combination turns out to be really complex, 384 00:18:57,500 --> 00:18:59,050 I've not gained anything. 385 00:18:59,050 --> 00:19:02,100 So things that are good candidates for divide 386 00:19:02,100 --> 00:19:04,280 and conquer are problems where it's 387 00:19:04,280 --> 00:19:06,400 easy to figure out how to divide down, 388 00:19:06,400 --> 00:19:09,880 and the combination is of little complexity. 389 00:19:09,880 --> 00:19:11,792 It would be nice if it was less than linear, 390 00:19:11,792 --> 00:19:13,250 but linear is nice because then I'm 391 00:19:13,250 --> 00:19:15,950 going to get that n log n kind of behavior. 392 00:19:15,950 --> 00:19:17,880 And if you ask the TAs in recitation tomorrow, 393 00:19:17,880 --> 00:19:20,517 they'll tell you that you see a lot of n log n algorithms 394 00:19:20,517 --> 00:19:21,350 in computer science. 395 00:19:21,350 --> 00:19:23,220 It's a very common class of algorithms, 396 00:19:23,220 --> 00:19:28,210 and it's a very useful one to have. 397 00:19:28,210 --> 00:19:31,500 Now, one of the questions we could still ask 398 00:19:31,500 --> 00:19:34,660 is, right, we've got binary search, which 399 00:19:34,660 --> 00:19:36,880 has got this nice log behavior. 400 00:19:36,880 --> 00:19:41,130 If we can sort things, you know, we get this n log n behavior, 401 00:19:41,130 --> 00:19:43,490 and we got an n log n behavior overall. 402 00:19:43,490 --> 00:19:47,412 But can we actually do better in terms of searching? 403 00:19:47,412 --> 00:19:49,120 I'm going to show you one last technique. 
404 00:19:49,120 --> 00:19:53,470 And in fact, we're going to put quotes around the word better, 405 00:19:53,470 --> 00:19:58,580 but it does better than even this kind of binary search, 406 00:19:58,580 --> 00:20:04,090 and that's a method called hashing. 407 00:20:04,090 --> 00:20:08,402 You've actually seen hashing, you just don't know it. 408 00:20:08,402 --> 00:20:09,860 Hashing is the technique that's 409 00:20:09,860 --> 00:20:12,720 used in Python to represent dictionaries. 410 00:20:12,720 --> 00:20:14,840 Hashing is used when you actually 411 00:20:14,840 --> 00:20:17,460 come in to Logan Airport and Immigration or Homeland 412 00:20:17,460 --> 00:20:20,060 Security checks your picture against a database. 413 00:20:20,060 --> 00:20:24,880 Hashing is used every time you enter a password into a system. 414 00:20:24,880 --> 00:20:26,320 So what in the world is hashing? 415 00:20:26,320 --> 00:20:29,180 Well, let me start with a simple little example. 416 00:20:29,180 --> 00:20:35,850 Suppose I want to represent a collection of integers. 417 00:20:35,850 --> 00:20:38,690 This is an easy little example. 418 00:20:38,690 --> 00:20:41,080 And I promise you that the integers are never 419 00:20:41,080 --> 00:20:44,580 going to be anything other than between the range of zero 420 00:20:44,580 --> 00:20:45,607 to nine. 421 00:20:45,607 --> 00:20:47,690 OK, so it might be the collection of one and five. 422 00:20:47,690 --> 00:20:48,670 It might be two, three, four, eight. 423 00:20:48,670 --> 00:20:50,128 I mean some collection of integers, 424 00:20:50,128 --> 00:20:52,980 but I guarantee you it's between zero and nine. 425 00:20:52,980 --> 00:20:55,690 Here's the trick I can play. 426 00:20:55,690 --> 00:21:11,840 I can build -- I can't count -- I could build a list with spots 427 00:21:11,840 --> 00:21:14,390 for all of those elements, zero, one, two, three, four, five, 428 00:21:14,390 --> 00:21:16,040 six, seven, eight, nine. 
429 00:21:16,040 --> 00:21:18,750 And then when I want to create my set, 430 00:21:18,750 --> 00:21:24,230 I could simply put a one everywhere 431 00:21:24,230 --> 00:21:26,090 that that integer falls. 432 00:21:26,090 --> 00:21:27,940 So if I wanted to represent, for example, 433 00:21:27,940 --> 00:21:32,360 this is the set two, six and eight, 434 00:21:32,360 --> 00:21:35,279 I put a one in those slots. 435 00:21:35,279 --> 00:21:37,570 This seems a little weird, but bear with me for a second, 436 00:21:37,570 --> 00:21:40,940 in fact, I've given you a little piece of code to do it, 437 00:21:40,940 --> 00:21:45,890 which is the next piece of code on the handout. 438 00:21:45,890 --> 00:21:48,960 So let's take a look at it for a second. 439 00:21:48,960 --> 00:21:53,140 This little set of code here from create, insert, and member. 440 00:21:53,140 --> 00:21:53,897 What's create do? 441 00:21:53,897 --> 00:21:55,480 It says, given a low and a high range, 442 00:21:55,480 --> 00:21:57,230 in this case it would be zero to nine. 443 00:21:57,230 --> 00:22:00,197 I'm going to build a list. 444 00:22:00,197 --> 00:22:02,530 Right, you can see that little loop going through there. 445 00:22:02,530 --> 00:22:03,220 What am I doing? 446 00:22:03,220 --> 00:22:07,079 I'm creating a list with just that special symbol None in it. 447 00:22:07,079 --> 00:22:08,120 So I'm building the list. 448 00:22:08,120 --> 00:22:09,972 I'm returning that as my set. 449 00:22:09,972 --> 00:22:11,430 And then to create the object, I'll 450 00:22:11,430 --> 00:22:13,050 simply do a set of inserts. 451 00:22:13,050 --> 00:22:15,130 If I want the values two, six and eight in there, 452 00:22:15,130 --> 00:22:18,520 I would do an insert of two into that set, an insert of six 453 00:22:18,520 --> 00:22:20,827 into that set, and an insert of eight into the set. 454 00:22:20,827 --> 00:22:21,660 And what does it do? 455 00:22:21,660 --> 00:22:24,945 It marks a one in each of those spots. 
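The integer-set representation just described can be sketched as follows. The handout code is not shown in this transcript, so the function names, the `low` offset parameter, and the bounds check in the membership test are assumptions made for illustration.

```python
def create(low, high):
    """Build a table with one empty slot (None) per integer in [low, high]."""
    return [None for _ in range(low, high + 1)]


def insert(int_set, value, low=0):
    """Mark value as present by putting a 1 in its slot."""
    int_set[value - low] = 1


def member(int_set, value, low=0):
    """Membership test: a single index lookup, independent of the set's size."""
    index = value - low
    return 0 <= index < len(int_set) and int_set[index] == 1


# The set {2, 6, 8} drawn from the range 0..9.
s = create(0, 9)
for v in (2, 6, 8):
    insert(s, v)
```

The lookup costs the same no matter how many values are stored, because the value itself says which slot to inspect; the price is a table with one slot for every possible value.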
456 00:22:24,945 --> 00:22:26,070 Now, what did I want to do? 457 00:22:26,070 --> 00:22:27,650 I wanted to check membership. 458 00:22:27,650 --> 00:22:28,840 I want to do search. 459 00:22:28,840 --> 00:22:30,430 Well that's simple. 460 00:22:30,430 --> 00:22:32,390 Given that representation and some value, 461 00:22:32,390 --> 00:22:36,300 I just say gee is it there? 462 00:22:36,300 --> 00:22:43,500 What's the order complexity here? 463 00:22:43,500 --> 00:22:45,310 I know I drive you nuts asking questions. 464 00:22:45,310 --> 00:22:48,530 What's the order complexity here? 465 00:22:48,530 --> 00:22:57,870 Quadratic, linear, log, constant? 466 00:22:57,870 --> 00:22:58,500 Any takers? 467 00:22:58,500 --> 00:23:01,124 I know I have the wrong glasses on to see hands up too, but... 468 00:23:01,124 --> 00:23:04,320 STUDENT: [UNINTELLIGIBLE] 469 00:23:04,320 --> 00:23:05,350 PROFESSOR: Who said it? 470 00:23:05,350 --> 00:23:06,290 STUDENT: Constant. 471 00:23:06,290 --> 00:23:08,559 PROFESSOR: Constant, why? 472 00:23:08,559 --> 00:23:09,600 STUDENT: [UNINTELLIGIBLE] 473 00:23:09,600 --> 00:23:10,683 PROFESSOR: Yes, thank you. 474 00:23:10,683 --> 00:23:11,890 All right, it is constant. 475 00:23:11,890 --> 00:23:14,280 You keep sitting back there where I can't get to you. 476 00:23:14,280 --> 00:23:15,690 Thank you very much. 477 00:23:15,690 --> 00:23:17,480 It is constant. 478 00:23:17,480 --> 00:23:20,250 Remember we said we designed lists so 479 00:23:20,250 --> 00:23:22,850 that the access, no matter where it was on the list 480 00:23:22,850 --> 00:23:24,910 was of constant time. 481 00:23:24,910 --> 00:23:28,810 That is another way of saying that looking up this thing here 482 00:23:28,810 --> 00:23:29,960 is constant. 483 00:23:29,960 --> 00:23:35,009 So this is constant time, order one. 484 00:23:35,009 --> 00:23:37,050 Come on, you know, representing sets of integers, 485 00:23:37,050 --> 00:23:38,520 this is pretty dumb.
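A minimal sketch of the integer set just described, written for Python 3. The handout code itself is not reproduced in this transcript, so the function names, the `low` parameter, and the exact bodies below are assumptions made for illustration.

```python
# Illustrative sketch only: names and signatures are assumptions, not the handout's code.

def create(low, high):
    """Return a list with one slot per integer in [low, high], all marked empty."""
    return [None] * (high - low + 1)

def insert(int_set, value, low=0):
    """Mark value as present by storing a 1 in its slot."""
    int_set[value - low] = 1

def member(int_set, value, low=0):
    """Membership is a single index lookup, so it takes constant time."""
    return int_set[value - low] == 1

s = create(0, 9)
for v in (2, 6, 8):
    insert(s, v)

print(s)             # [None, None, 1, None, None, None, 1, None, 1, None]
print(member(s, 6))  # True
print(member(s, 3))  # False
```

The lookup cost does not depend on how many integers are stored, which is the constant-time property the class just identified.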
486 00:23:38,520 --> 00:23:41,557 Suppose I want to have a set of characters. 487 00:23:41,557 --> 00:23:42,390 How could I do that? 488 00:23:42,390 --> 00:23:44,320 Well the idea of a hash, in fact, 489 00:23:44,320 --> 00:23:47,880 what's called a hash function is to have some way of mapping 490 00:23:47,880 --> 00:23:50,060 any kind of data into integers. 491 00:23:50,060 --> 00:23:55,260 So let's look at the second example, all right, -- 492 00:23:55,260 --> 00:24:01,700 I keep doing that -- this piece of code from here to here gives 493 00:24:01,700 --> 00:24:06,860 me a way of now creating a hash table of size 256. 494 00:24:06,860 --> 00:24:09,252 Ord is a built-in Python function. 495 00:24:09,252 --> 00:24:11,460 There are lots of them around; it takes any character 496 00:24:11,460 --> 00:24:13,600 and gives you back an integer. 497 00:24:13,600 --> 00:24:16,760 In fact, just to show that to you, if I go down here 498 00:24:16,760 --> 00:24:32,250 and I type ord, sorry, I did that wrong. 499 00:24:32,250 --> 00:24:33,660 Let me try again. 500 00:24:33,660 --> 00:24:38,889 We'll get to exceptions in a second. 501 00:24:38,889 --> 00:24:39,930 I give it some character. 502 00:24:39,930 --> 00:24:42,706 It gives me back an integer representing it. 503 00:24:42,706 --> 00:24:43,330 It looks weird. 504 00:24:43,330 --> 00:24:44,870 Why does three come back as some other thing? 505 00:24:44,870 --> 00:24:46,360 That's the internal representation 506 00:24:46,360 --> 00:24:47,610 that Python uses for this. 507 00:24:47,610 --> 00:24:51,080 If I give it some other character, yeah, 508 00:24:51,080 --> 00:24:54,230 it would help if I could type, give it some other character. 509 00:24:54,230 --> 00:24:57,460 It gives me back a representation. 510 00:24:57,460 --> 00:24:59,430 So now here's the idea. 511 00:24:59,430 --> 00:25:03,280 I build a list 256 elements long, 512 00:25:03,280 --> 00:25:06,240 and I fill it up with that special symbol None.
513 00:25:06,240 --> 00:25:09,880 That's what create is going to do right here. 514 00:25:09,880 --> 00:25:14,130 And then hash character takes in any string or character, 515 00:25:14,130 --> 00:25:16,270 single character, gives me back a number. 516 00:25:16,270 --> 00:25:17,170 Notice what I do. 517 00:25:17,170 --> 00:25:20,220 If I want to create a set or a sequence 518 00:25:20,220 --> 00:25:24,430 representing these things, I simply insert into that list. 519 00:25:24,430 --> 00:25:26,990 It goes through and puts ones in the right place. 520 00:25:26,990 --> 00:25:29,587 And then, if I want to find out if something's there, 521 00:25:29,587 --> 00:25:30,420 I do the same thing. 522 00:25:30,420 --> 00:25:33,890 But notice now, hash is converting the input 523 00:25:33,890 --> 00:25:38,060 into an integer. 524 00:25:38,060 --> 00:25:40,130 So, what's the idea? 525 00:25:40,130 --> 00:25:42,470 If I know what my hash function does, 526 00:25:42,470 --> 00:25:46,850 it maps, in this case characters into a range of 256 527 00:25:46,850 --> 00:25:50,720 values, which is zero to 255, I create a list that long, 528 00:25:50,720 --> 00:25:52,350 and I simply mark things. 529 00:25:52,350 --> 00:25:55,164 And my look up is still constant. 530 00:25:55,164 --> 00:25:56,080 Characters are simple. 531 00:25:56,080 --> 00:25:58,790 Suppose you want to represent sets of strings, 532 00:25:58,790 --> 00:26:01,379 well you basically just generalize the hash function. 533 00:26:01,379 --> 00:26:03,170 I think one of the classic ones for strings 534 00:26:03,170 --> 00:26:06,080 is called the Rabin-Karp algorithm. 535 00:26:06,080 --> 00:26:07,870 And it's simply the same idea that you 536 00:26:07,870 --> 00:26:13,810 have a mapping from your input into a set of integers. 537 00:26:13,810 --> 00:26:17,880 Wow, OK, maybe not so wow, but this is now constant. 538 00:26:17,880 --> 00:26:19,920 This is constant time access.
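The character version can be sketched the same way, with `ord` playing the role of the hash function. The names below are illustrative rather than the handout's, and the sketch assumes every character has a code in the range 0 to 255 (in Python 3, `ord` can return larger values for other Unicode characters, which would fall outside this table).

```python
# Illustrative sketch only: assumes characters whose ord() value is below 256.

TABLE_SIZE = 256

def hash_char(c):
    """Map a single character to an integer using the built-in ord."""
    return ord(c)

def create_char_set():
    """Return a table with one empty slot per possible character code."""
    return [None] * TABLE_SIZE

def insert_char(char_set, c):
    """Mark the slot that the character hashes to."""
    char_set[hash_char(c)] = 1

def member_char(char_set, c):
    """Hash the character, then do one constant-time index lookup."""
    return char_set[hash_char(c)] == 1

print(ord('3'))  # 51, the character code, not the number three

letters = create_char_set()
for ch in "abc":
    insert_char(letters, ch)

print(member_char(letters, 'b'))  # True
print(member_char(letters, 'z'))  # False
```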
539 00:26:19,920 --> 00:26:24,000 So I can do searching in constant time which is great. 540 00:26:24,000 --> 00:26:26,070 Where's the penalty? 541 00:26:26,070 --> 00:26:28,970 What did I trade off here? 542 00:26:28,970 --> 00:26:30,950 Well I'm going to suggest that what I did 543 00:26:30,950 --> 00:26:40,080 was I really traded space for time. 544 00:26:40,080 --> 00:26:44,240 It makes me sound like an astrophysicist somehow right? 545 00:26:44,240 --> 00:26:45,860 What do I mean by that? 546 00:26:45,860 --> 00:26:49,000 I have constant time access which is great, 547 00:26:49,000 --> 00:26:52,660 but I paid a price, which is I had to use up some space. 548 00:26:52,660 --> 00:26:55,290 In the case of integers it was easy. 549 00:26:55,290 --> 00:26:58,060 In the case of characters, so I have to give up a list of 256, 550 00:26:58,060 --> 00:26:59,390 no big deal. 551 00:26:59,390 --> 00:27:01,450 Imagine now you want to do faces. 552 00:27:01,450 --> 00:27:03,190 You've got a picture of somebody's face, 553 00:27:03,190 --> 00:27:04,300 it's a million pixels. 554 00:27:04,300 --> 00:27:07,410 Each pixel has a range of values from zero to 255. 555 00:27:07,410 --> 00:27:11,477 I want to hash a face with some function into an integer. 556 00:27:11,477 --> 00:27:13,310 I may not want to do the full range of this, 557 00:27:13,310 --> 00:27:16,550 but I may decide I have to use a lot of gigabytes of space 558 00:27:16,550 --> 00:27:18,199 in order to do a trade-off. 559 00:27:18,199 --> 00:27:19,740 The reason I'm showing you this is 560 00:27:19,740 --> 00:27:22,880 that this is, again, a common trade-off in computer science. 561 00:27:22,880 --> 00:27:25,015 That in many cases, I can gain efficiency 562 00:27:25,015 --> 00:27:28,220 if I'm willing to give up space.
563 00:27:28,220 --> 00:27:30,470 Having said that though, there may still be a problem, 564 00:27:30,470 --> 00:27:32,719 or there ought to be a problem that may be bugging you 565 00:27:32,719 --> 00:27:36,320 slightly, which is how do I guarantee 566 00:27:36,320 --> 00:27:39,660 that my hash function takes any input into exactly one 567 00:27:39,660 --> 00:27:43,600 spot in the storage space? 568 00:27:43,600 --> 00:27:45,190 The answer is I can't. 569 00:27:45,190 --> 00:27:46,680 OK, in the simple case of integers 570 00:27:46,680 --> 00:27:49,480 I can, but in the case of something more complex 571 00:27:49,480 --> 00:27:52,790 like faces or fingerprints or passwords for that matter, 572 00:27:52,790 --> 00:27:54,850 it's hard to design a hash function that 573 00:27:54,850 --> 00:27:56,970 has completely even distribution, meaning 574 00:27:56,970 --> 00:28:00,950 that it takes any input into exactly one output spot. 575 00:28:00,950 --> 00:28:03,560 So what you typically do in a hash case is you 576 00:28:03,560 --> 00:28:05,300 design your code to deal with that. 577 00:28:05,300 --> 00:28:07,230 You try to design -- actually I'm going to come back to that 578 00:28:07,230 --> 00:28:07,290 in a second. 579 00:28:07,290 --> 00:28:09,820 It's like you're trying to use a hash function that spreads 580 00:28:09,820 --> 00:28:11,060 things out pretty evenly. 581 00:28:11,060 --> 00:28:13,280 But the places you store into in those lists 582 00:28:13,280 --> 00:28:16,082 may have to themselves have a small list in there, 583 00:28:16,082 --> 00:28:17,540 and when you go to check something, 584 00:28:17,540 --> 00:28:19,831 you may have to do a linear search through the elements 585 00:28:19,831 --> 00:28:20,930 in that list. 586 00:28:20,930 --> 00:28:23,050 The good news is the elements in any one spot 587 00:28:23,050 --> 00:28:26,090 in a hash table are likely to be a small number, three, four, 588 00:28:26,090 --> 00:28:26,670 five.
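The idea of keeping a small list in each spot can be sketched as follows. This is not code from the lecture: the toy hash function (sum of character codes modulo the table size) and all names are assumptions chosen only to make collisions easy to see, and a real program would use a library hash function instead.

```python
# Illustrative sketch only: a toy hash table of strings with one list (bucket) per slot.

def create_table(num_buckets):
    """Each slot starts as its own empty list."""
    return [[] for _ in range(num_buckets)]

def hash_string(s, num_buckets):
    """Toy hash: sum of character codes, folded into the table size."""
    return sum(ord(c) for c in s) % num_buckets

def insert_str(table, s):
    """Append s to the bucket it hashes to, unless it is already there."""
    bucket = table[hash_string(s, len(table))]
    if s not in bucket:
        bucket.append(s)

def member_str(table, s):
    """Hash to find the bucket, then do a short linear search inside it."""
    return s in table[hash_string(s, len(table))]

table = create_table(8)
for word in ("ab", "ba"):   # same letters, so the toy hash sends both to one bucket
    insert_str(table, word)

print(hash_string("ab", 8) == hash_string("ba", 8))  # True: a collision
print(member_str(table, "ba"))                        # True
print(member_str(table, "cd"))                        # False
```

The lookup is one hash computation plus a search of a single bucket, so it stays fast as long as the hash function keeps the buckets short.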
589 00:28:26,670 --> 00:28:27,790 So the search is really easy. 590 00:28:27,790 --> 00:28:29,280 You're not searching a million things. 591 00:28:29,280 --> 00:28:30,863 You're searching three or four things, 592 00:28:30,863 --> 00:28:34,090 but nonetheless, you have to do that trade off. 593 00:28:34,090 --> 00:28:36,250 The last thing I want to say about hashes 594 00:28:36,250 --> 00:28:46,057 is that they're actually really hard to create. 595 00:28:46,057 --> 00:28:48,390 There's been a lot of work done on these over the years, 596 00:28:48,390 --> 00:28:51,720 but in fact, it's pretty hard to invent a good hash function. 597 00:28:51,720 --> 00:28:53,305 So my advice to you is, if you want 598 00:28:53,305 --> 00:28:56,340 to use something with a hash, go to a library. 599 00:28:56,340 --> 00:28:57,580 Look up a good hash function. 600 00:28:57,580 --> 00:29:00,244 For strings, there's a classic set of them 601 00:29:00,244 --> 00:29:01,160 that work pretty well. 602 00:29:01,160 --> 00:29:02,930 For integers, there are some real simple ones. 603 00:29:02,930 --> 00:29:04,470 If there's something more complex, 604 00:29:04,470 --> 00:29:07,000 find a good hash function, but designing a really good hash 605 00:29:07,000 --> 00:29:10,070 function takes a lot of effort because you want it 606 00:29:10,070 --> 00:29:11,610 to have that even distribution. 607 00:29:11,610 --> 00:29:15,290 You'd like it to have as few duplicates 608 00:29:15,290 --> 00:29:17,730 if you like in each spot in the hash table for each one 609 00:29:17,730 --> 00:29:22,800 of the things that you use. 610 00:29:22,800 --> 00:29:25,760 Let me pull back for a second then. 611 00:29:25,760 --> 00:29:28,640 What have we done over the last three or four lectures? 612 00:29:28,640 --> 00:29:31,790 We've started introducing you to classes of algorithms.
613 00:29:31,790 --> 00:29:34,100 Things that I'd like you to be able to see 614 00:29:34,100 --> 00:29:37,310 are how to do some simple complexity analysis. 615 00:29:37,310 --> 00:29:40,670 Perhaps more importantly, how to recognize a kind of algorithm 616 00:29:40,670 --> 00:29:43,940 based on its properties and know what class it belongs to. 617 00:29:43,940 --> 00:29:44,640 This is a hint. 618 00:29:44,640 --> 00:29:46,560 If you like, leaning towards the next quiz, 619 00:29:46,560 --> 00:29:48,714 that you oughta be able to say that 620 00:29:48,714 --> 00:29:50,130 looks like a logarithmic algorithm 621 00:29:50,130 --> 00:29:51,780 because it's got a particular property. 622 00:29:51,780 --> 00:29:53,280 That looks like an n log n algorithm 623 00:29:53,280 --> 00:29:54,821 because it has a particular property. 624 00:29:54,821 --> 00:29:56,740 And the third thing we've done is 625 00:29:56,740 --> 00:30:00,190 we've given you now a set of sort of standard algorithms 626 00:30:00,190 --> 00:30:01,710 if you like. 627 00:30:01,710 --> 00:30:05,270 Brute force, just walk through every possible case. 628 00:30:05,270 --> 00:30:08,080 It works well if the problem sizes are small. 629 00:30:08,080 --> 00:30:11,090 We've had, there are a number of variants of guess and check 630 00:30:11,090 --> 00:30:14,540 or hypothesize and test, where you try to guess the solution 631 00:30:14,540 --> 00:30:17,270 and then check it and use that to refine your search. 632 00:30:17,270 --> 00:30:19,925 Successive approximation, Newton-Raphson 633 00:30:19,925 --> 00:30:21,300 was one nice example, but there's 634 00:30:21,300 --> 00:30:24,200 a whole class of things that get closer and closer, reducing 635 00:30:24,200 --> 00:30:26,550 your errors as you go along. 
636 00:30:26,550 --> 00:30:29,807 Divide and conquer and actually I 637 00:30:29,807 --> 00:30:31,890 guess in between there bi-section, which is really 638 00:30:31,890 --> 00:30:34,820 just a variant of successive approximation, 639 00:30:34,820 --> 00:30:37,140 but divide and conquer is a class of algorithm. 640 00:30:37,140 --> 00:30:39,620 These are tools that you want in your tool box. 641 00:30:39,620 --> 00:30:41,370 These are the kinds of algorithms that you 642 00:30:41,370 --> 00:30:42,810 should be able to recognize. 643 00:30:42,810 --> 00:30:45,990 And what I'd like you to begin to do is to look at a problem 644 00:30:45,990 --> 00:30:48,690 and say, gee, which kind of algorithm 645 00:30:48,690 --> 00:30:51,600 is most likely to be successful on this problem, 646 00:30:51,600 --> 00:30:54,340 and map it into that case. 647 00:30:54,340 --> 00:30:56,810 OK, starting next -- don't worry I'm not going to quit 36 648 00:30:56,810 --> 00:30:58,810 minutes after -- I got one more topic for today. 649 00:30:58,810 --> 00:31:01,780 But jumping ahead, I'm going to skip in a second 650 00:31:01,780 --> 00:31:04,880 now to talk about one last linguistic thing from Python, 651 00:31:04,880 --> 00:31:07,410 but I want to preface that Professor Guttag is going to pick up 652 00:31:07,410 --> 00:31:09,951 next week, and what we're going to start doing then is taking 653 00:31:09,951 --> 00:31:12,960 these classes of algorithms and start looking at much more 654 00:31:12,960 --> 00:31:13,870 complex algorithms. 655 00:31:13,870 --> 00:31:16,200 Things you're more likely to use in problems. 656 00:31:16,200 --> 00:31:19,780 Things like knapsack problems as we move ahead. 657 00:31:19,780 --> 00:31:21,780 But the tools you've seen so far are really 658 00:31:21,780 --> 00:31:23,155 the things that we're going to see 659 00:31:23,155 --> 00:31:24,860 as we build those algorithms.
660 00:31:24,860 --> 00:31:26,920 OK, I want to spend the last portion 661 00:31:26,920 --> 00:31:29,420 of this lecture doing one last piece of linguistics stuff. 662 00:31:29,420 --> 00:31:32,640 One last little thing from Python, and that's 663 00:31:32,640 --> 00:31:44,760 to talk about exceptions. 664 00:31:44,760 --> 00:31:48,597 OK, you've actually seen exceptions a lot, 665 00:31:48,597 --> 00:31:51,180 you just didn't know that's what they were, because exceptions 666 00:31:51,180 --> 00:31:52,689 show up everywhere in Python. 667 00:31:52,689 --> 00:31:54,230 Let me give you a couple of examples. 668 00:31:54,230 --> 00:31:58,480 I'm going to clear some space here. 669 00:31:58,480 --> 00:32:00,840 If I type in that expression, 670 00:32:00,840 --> 00:32:02,417 I get an error, right? 671 00:32:02,417 --> 00:32:03,250 So it's not defined. 672 00:32:03,250 --> 00:32:06,190 But in fact, what this did was it threw an exception. 673 00:32:06,190 --> 00:32:10,840 The exception is called a NameError exception. 674 00:32:10,840 --> 00:32:13,820 It says you gave me something I didn't know how to deal with. 675 00:32:13,820 --> 00:32:15,460 I'm going to throw it, or raise it, 676 00:32:15,460 --> 00:32:16,910 to use the right term to somebody 677 00:32:16,910 --> 00:32:18,410 in case they can handle it, but it's 678 00:32:18,410 --> 00:32:21,430 a particular kind of exception. 679 00:32:21,430 --> 00:32:24,500 I might do something like, remind you I have test. 680 00:32:24,500 --> 00:32:31,834 If I do this, try and get the 10th element of a list that's 681 00:32:31,834 --> 00:32:32,500 only eight long. 682 00:32:32,500 --> 00:32:34,210 I get what looks like an error, but it's actually 683 00:32:34,210 --> 00:32:35,130 throwing an exception. 684 00:32:35,130 --> 00:32:37,080 The exception is right there.
685 00:32:37,080 --> 00:32:39,730 It's an IndexError, that is it's 686 00:32:39,730 --> 00:32:42,720 trying to do something going beyond the range of what 687 00:32:42,720 --> 00:32:45,055 this thing could deal with. 688 00:32:45,055 --> 00:32:47,180 OK, you say, come on, I've seen these all the time. 689 00:32:47,180 --> 00:32:49,013 Every time I type something into my program, 690 00:32:49,013 --> 00:32:50,830 it does one of these things, right? 691 00:32:50,830 --> 00:32:54,450 When we're just interacting with IDLE, with the interactive 692 00:32:54,450 --> 00:32:57,150 editor or sorry, interactive environment if you like, 693 00:32:57,150 --> 00:32:58,589 that's what you expect. 694 00:32:58,589 --> 00:33:00,130 What's happening is that we're typing 695 00:33:00,130 --> 00:33:02,200 in something, an expression it doesn't know how to deal with. 696 00:33:02,200 --> 00:33:04,116 It's raising the exception, but it's simply 697 00:33:04,116 --> 00:33:07,600 bubbling up at the top level saying you've got a problem. 698 00:33:07,600 --> 00:33:11,210 Suppose instead you're in the middle 699 00:33:11,210 --> 00:33:15,830 of some deep piece of code and you get one of these cases. 700 00:33:15,830 --> 00:33:18,660 It's kind of annoying if it throws it all the way 701 00:33:18,660 --> 00:33:21,450 back up to top level for you to fix. 702 00:33:21,450 --> 00:33:23,880 If it's truly a bug, that's the right thing to do. 703 00:33:23,880 --> 00:33:25,095 You want to catch it. 704 00:33:25,095 --> 00:33:27,690 But in many cases exceptions are things that, in fact, 705 00:33:27,690 --> 00:33:30,280 you as a program designer could have handled. 706 00:33:30,280 --> 00:33:31,870 So I'm going to distinguish in fact 707 00:33:31,870 --> 00:33:36,160 between un-handled exceptions, which 708 00:33:36,160 --> 00:33:43,442 are the things that we saw there, and handled exceptions.
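The two errors shown at the prompt can be reproduced in a short script. Typed directly into the interpreter they would appear as unhandled exceptions with a traceback; here each is wrapped in a handler only so the sketch runs to completion and can report which exception was raised. The list contents and variable names are assumptions.

```python
# Illustrative sketch only: reproduces the two kinds of exception named in the lecture.

test = [1, 2, 3, 4, 5, 6, 7, 8]   # a list that is only eight long
seen = []

try:
    test[10]                       # indexing past the end of the list
except IndexError as err:
    seen.append(type(err).__name__)

try:
    some_undefined_name            # a name that was never bound to a value
except NameError as err:
    seen.append(type(err).__name__)

print(seen)  # ['IndexError', 'NameError']
```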
709 00:33:43,442 --> 00:33:45,650 I'm going to show you in a second how to handle them, 710 00:33:45,650 --> 00:33:46,858 but let's look at an example. 711 00:33:46,858 --> 00:33:50,200 What do I mean by a handled exception? 712 00:33:50,200 --> 00:33:54,230 Well let's look at the next piece of code. 713 00:33:54,230 --> 00:33:55,210 OK, it's right here. 714 00:33:55,210 --> 00:33:56,700 It's called read float. 715 00:33:56,700 --> 00:33:58,109 We'll look at it in a second. 716 00:33:58,109 --> 00:33:59,900 Let me sort of set the stage up for this -- 717 00:33:59,900 --> 00:34:03,210 suppose I want to input -- I'm sorry I want you as a user 718 00:34:03,210 --> 00:34:05,850 to input a floating point number. 719 00:34:05,850 --> 00:34:08,070 We talked about things you could do 720 00:34:08,070 --> 00:34:09,320 to try make sure that happens. 721 00:34:09,320 --> 00:34:10,778 You could run through a little loop 722 00:34:10,778 --> 00:34:12,370 to say keep trying until you get one. 723 00:34:12,370 --> 00:34:15,650 But one of the ways I could deal with it is what's shown here. 724 00:34:15,650 --> 00:34:17,830 And what's this little loop say to do? 725 00:34:17,830 --> 00:34:20,580 This little loop says I'm going to write 726 00:34:20,580 --> 00:34:23,484 a function or procedures that takes in two messages. 727 00:34:23,484 --> 00:34:25,150 I'm going to run through a loop, and I'm 728 00:34:25,150 --> 00:34:26,983 going to request some input, which I'm going 729 00:34:26,983 --> 00:34:28,150 to read in with raw input. 730 00:34:28,150 --> 00:34:30,040 I'm going to store that into val. 731 00:34:30,040 --> 00:34:31,880 And as you might expect, I'm going 732 00:34:31,880 --> 00:34:35,522 to then try and see if I can convert that into a float. 733 00:34:35,522 --> 00:34:37,730 Oh wait a minute, that's a little different than what 734 00:34:37,730 --> 00:34:38,730 we did last time, right? 
735 00:34:38,730 --> 00:34:40,682 Last time we checked the type and said 736 00:34:40,682 --> 00:34:41,890 if it is a float you're okay. 737 00:34:41,890 --> 00:34:42,680 If not, carry on. 738 00:34:42,680 --> 00:34:43,971 In this case what would happen? 739 00:34:43,971 --> 00:34:46,800 Well float is going to try and do the coercion. 740 00:34:46,800 --> 00:34:50,540 It's going to try and turn it into a floating point number. 741 00:34:50,540 --> 00:34:52,990 If it does, I'm great, right. 742 00:34:52,990 --> 00:34:55,100 And I'd like just to return val. 743 00:34:55,100 --> 00:34:58,700 If it doesn't, float's going to throw or raise, 744 00:34:58,700 --> 00:35:00,330 to use the right term, an exception. 745 00:35:00,330 --> 00:35:03,040 It's going to say something like a type error. 746 00:35:03,040 --> 00:35:04,650 In fact, let's try it over here. 747 00:35:04,650 --> 00:35:09,177 If I go over here, and I say float of three, 748 00:35:09,177 --> 00:35:10,510 it's going to do the conversion. 749 00:35:10,510 --> 00:35:14,830 But if I say turn this into a float, 750 00:35:14,830 --> 00:35:16,790 ah it throws a ValueError exception. 751 00:35:16,790 --> 00:35:19,610 It says it's a wrong kind of value that I've got. 752 00:35:19,610 --> 00:35:21,880 So I'm going to write a little piece of code that 753 00:35:21,880 --> 00:35:25,950 says if it gives me a float, I'm set, But if not, 754 00:35:25,950 --> 00:35:29,130 I'd like to have the code handle the exception. 755 00:35:29,130 --> 00:35:33,470 And that's what this funky try/except thing does. 756 00:35:33,470 --> 00:35:43,570 This is a try/except block and here's the flow of control 757 00:35:43,570 --> 00:35:45,170 that takes place in there. 758 00:35:45,170 --> 00:35:47,217 When I hit a try-block. 759 00:35:47,217 --> 00:35:48,550 It's going to literally do that. 760 00:35:48,550 --> 00:35:51,090 It's going to try and execute the instructions.
761 00:35:51,090 --> 00:35:54,310 If it can successfully execute the instructions, 762 00:35:54,310 --> 00:35:56,870 it's going to skip past the except block 763 00:35:56,870 --> 00:35:59,720 and just carry on with the rest of the code. 764 00:35:59,720 --> 00:36:03,920 If, however, it raises an exception, that exception, 765 00:36:03,920 --> 00:36:06,940 at least in this case where it's a pure except with no tags 766 00:36:06,940 --> 00:36:10,140 on it, is going to get, be like thrown directly 767 00:36:10,140 --> 00:36:12,927 to the except block, and it's going to try and execute that. 768 00:36:12,927 --> 00:36:14,760 So notice what's going to happen here, then. 769 00:36:14,760 --> 00:36:16,960 If I give it something that can be turned into a float, 770 00:36:16,960 --> 00:36:18,380 I come in here, I read the input, 771 00:36:18,380 --> 00:36:20,000 if it can be turned into a float, 772 00:36:20,000 --> 00:36:22,600 I'm going to just return the value and I'm set. 773 00:36:22,600 --> 00:36:25,070 If not, it's basically going to throw it 774 00:36:25,070 --> 00:36:27,570 to this point, in which case I'm going to print out an error 775 00:36:27,570 --> 00:36:30,240 message and oh yeah, I'm still in that while loop, 776 00:36:30,240 --> 00:36:31,960 so it's going to go around. 777 00:36:31,960 --> 00:36:35,320 So in fact, if I go here and, let me 778 00:36:35,320 --> 00:36:39,490 un-comment this and run the code. 779 00:36:39,490 --> 00:36:41,950 It says enter a float. 780 00:36:41,950 --> 00:36:48,940 And if I give it something that can be -- sorry, I've got, yes, 781 00:36:48,940 --> 00:36:51,480 never mind the grades crap. 782 00:36:51,480 --> 00:36:54,190 Where did I have that? 783 00:36:54,190 --> 00:36:58,014 Let me comment that out. 784 00:36:58,014 --> 00:37:00,180 Somehow it's appropriate in the middle of my lecture 785 00:37:00,180 --> 00:37:04,650 for it to say whoops at me but that wasn't what I intended.
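The read-float loop described above might look like the sketch below in Python 3, where `input` replaces the `raw_input` mentioned in the lecture. Two things here are assumptions rather than the handout's code: the handler names `ValueError` explicitly instead of using a bare except, and the optional `read` parameter exists only so the function can be exercised with scripted replies instead of a live keyboard.

```python
# Illustrative sketch only: a retry loop built on try/except.

def read_float(request_msg, error_msg, read=input):
    """Keep asking until the reply can be converted to a float, then return it."""
    while True:
        val = read(request_msg)
        try:
            # If float() succeeds, the except block is skipped and we return.
            return float(val)
        except ValueError:
            # float() raised: report the problem and go around the loop again.
            print(error_msg)

# Scripted replies stand in for a user who first types something invalid.
replies = iter(["abc", "3.5"])
result = read_float("Enter a float: ", "Not a float.", read=lambda prompt: next(replies))
print(result)  # 3.5
```

Called as `read_float("Enter a float: ", "Not a float.")`, the same function would prompt at the keyboard.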
786 00:37:04,650 --> 00:37:08,959 And we will try this again. 787 00:37:08,959 --> 00:37:10,250 OK, it says enter a float. 788 00:37:10,250 --> 00:37:12,541 I give it something that can be converted into a float, 789 00:37:12,541 --> 00:37:13,485 it says fine. 790 00:37:13,485 --> 00:37:15,360 I'm going to go back and run it again though. 791 00:37:15,360 --> 00:37:21,290 If I run it again, it says enter a float. 792 00:37:21,290 --> 00:37:24,750 Ah ha, it goes into that except portion, prints out a message, 793 00:37:24,750 --> 00:37:27,026 and goes back around the while loop to say try again. 794 00:37:27,026 --> 00:37:28,400 And it's going to keep doing this 795 00:37:28,400 --> 00:37:33,360 until I give it something that does serve as a float. 796 00:37:33,360 --> 00:37:36,810 Right, so an exception then has this format 797 00:37:36,810 --> 00:37:39,172 that I can control as a programmer. 798 00:37:39,172 --> 00:37:40,380 Why would I want to use this? 799 00:37:40,380 --> 00:37:44,070 Well some things I can actually expect may happen 800 00:37:44,070 --> 00:37:45,424 and I want to handle them. 801 00:37:45,424 --> 00:37:46,840 The float example is a simple one. 802 00:37:46,840 --> 00:37:47,780 I'm going to generalize in a second. 803 00:37:47,780 --> 00:37:48,910 Here's a better example. 804 00:37:48,910 --> 00:37:52,135 I'm writing a piece of code that wants to input a file. 805 00:37:52,135 --> 00:37:53,510 I can certainly imagine something 806 00:37:53,510 --> 00:37:54,740 that says give me the file name, I'm 807 00:37:54,740 --> 00:37:56,010 going to do something with it. 808 00:37:56,010 --> 00:37:59,270 I can't guarantee that the file will exist under that name, 809 00:37:59,270 --> 00:38:01,530 but I know that's something that might occur.
810 00:38:01,530 --> 00:38:03,415 So a nice way to handle it is to write it 811 00:38:03,415 --> 00:38:04,810 as an exception that says, here's 812 00:38:04,810 --> 00:38:06,480 what I want to do if I get the file. 813 00:38:06,480 --> 00:38:09,476 But just in case the file name is not there, 814 00:38:09,476 --> 00:38:11,600 here's what I want to do in that case to handle it. 815 00:38:11,600 --> 00:38:15,275 Let me specify what the exception should do. 816 00:38:15,275 --> 00:38:16,650 In the example I just wrote here, 817 00:38:16,650 --> 00:38:17,900 this is pretty trivial, right. 818 00:38:17,900 --> 00:38:19,590 OK, I'm trying to input floats. 819 00:38:19,590 --> 00:38:21,650 I could generalize this pretty nicely. 820 00:38:21,650 --> 00:38:23,210 Imagine the same kind of idea where 821 00:38:23,210 --> 00:38:25,576 I want to simply say I want to take input of anything 822 00:38:25,576 --> 00:38:28,200 and try and see how to make sure I get the right kind of thing. 823 00:38:28,200 --> 00:38:35,240 I want to make it polymorphic. 824 00:38:35,240 --> 00:38:38,250 Well that's pretty easy to do. 825 00:38:38,250 --> 00:38:43,410 That is basically the next example, right here. 826 00:38:43,410 --> 00:38:49,110 In fact, let me comment this one out. 827 00:38:49,110 --> 00:38:50,880 I can do exactly the same kind of thing. 828 00:38:50,880 --> 00:38:55,270 Now what I'm going to try and do is read in a set of values, 829 00:38:55,270 --> 00:38:59,190 but I'm going to give a type of value as well as the messages. 830 00:38:59,190 --> 00:39:00,220 The format is the same. 831 00:39:00,220 --> 00:39:02,290 I'm going to ask for some input, and then I 832 00:39:02,290 --> 00:39:05,840 am going to use that procedure to check, 833 00:39:05,840 --> 00:39:07,810 is this the right type of value. 834 00:39:07,810 --> 00:39:10,260 And I'm trying to use that to do the coercion if you like. 
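A sketch of the generalized reader, assuming Python 3. The name `read_val`, the argument order, and the `read` parameter used for scripted replies are assumptions; the handout's actual code is not shown in the transcript. The type passed in, such as `int` or `float`, is itself the procedure that attempts the coercion.

```python
# Illustrative sketch only: the same retry loop, parameterized by the target type.

def read_val(val_type, request_msg, error_msg, read=input):
    """Keep asking until val_type can convert the reply, then return the result."""
    while True:
        text = read(request_msg)
        try:
            return val_type(text)   # e.g. int("42") or float("2.5")
        except ValueError:
            print(error_msg)        # conversion failed, so ask again

replies = iter(["forty-two", "42"])
n = read_val(int, "Enter an int: ", "Not an int.", read=lambda prompt: next(replies))
print(n)  # 42
```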
835 00:39:10,260 --> 00:39:12,880 Same thing if it works, I'm going to skip that, if not, 836 00:39:12,880 --> 00:39:14,296 it's going to throw the exception. 837 00:39:14,296 --> 00:39:16,300 Why is this much nicer? 838 00:39:16,300 --> 00:39:18,119 Well, that's a handy piece of code. 839 00:39:18,119 --> 00:39:19,535 Because imagine I've got that now, 840 00:39:19,535 --> 00:39:23,960 and I can now store that away in some file name, input.py, 841 00:39:23,960 --> 00:39:27,150 and import into every one of my procedure functions, 842 00:39:27,150 --> 00:39:29,240 pardon me, my files of procedures, 843 00:39:29,240 --> 00:39:33,160 because it's a standard way of now giving me the input. 844 00:39:33,160 --> 00:39:35,060 OK, so far though, I've just shown you what 845 00:39:35,060 --> 00:39:36,680 happens inside a piece of code. 846 00:39:36,680 --> 00:39:37,740 It raises an exception. 847 00:39:37,740 --> 00:39:39,470 It goes to that except clause. 848 00:39:39,470 --> 00:39:42,140 We don't have to use it just inside of one place. 849 00:39:42,140 --> 00:39:44,430 We can actually use it more generally. 850 00:39:44,430 --> 00:39:47,480 And that gets me to the last example I wanted to show you. 851 00:39:47,480 --> 00:39:55,519 Let me uncomment this. 852 00:39:55,519 --> 00:39:56,810 Let's take a look at this code. 853 00:39:56,810 --> 00:40:00,010 This looks like a handy piece of code 854 00:40:00,010 --> 00:40:02,740 to have given what we just recently did to you. 855 00:40:02,740 --> 00:40:03,810 All right, get grades. 856 00:40:03,810 --> 00:40:06,230 It's a little function that's going to say give me a file 857 00:40:06,230 --> 00:40:09,570 name, and I'm going to go off and open that up 858 00:40:09,570 --> 00:40:11,870 and bind it to a local variable.
859 00:40:11,870 --> 00:40:13,744 And if it's successful, then I'd just 860 00:40:13,744 --> 00:40:16,160 like to go off and do some things like turn it into a list 861 00:40:16,160 --> 00:40:18,770 so I can compute average score or distributions or something 862 00:40:18,770 --> 00:40:19,270 else. 863 00:40:19,270 --> 00:40:21,700 I don't really care what's going on here. 864 00:40:21,700 --> 00:40:24,190 Notice though what I've done. 865 00:40:24,190 --> 00:40:27,300 Open, if it doesn't succeed, is going 866 00:40:27,300 --> 00:40:31,705 to raise a particular kind of exception called IOError. 867 00:40:31,705 --> 00:40:33,830 And so I've done a slightly different thing here 868 00:40:33,830 --> 00:40:39,519 which is I put the except part of the block with IOError. 869 00:40:39,519 --> 00:40:40,310 What does that say? 870 00:40:40,310 --> 00:40:43,840 It says if in the code up here I get an exception of that sort, 871 00:40:43,840 --> 00:40:46,920 I'm going to go to this place to handle it. 872 00:40:46,920 --> 00:40:48,960 On the other hand, if I'm inside this procedure 873 00:40:48,960 --> 00:40:51,360 and some other exception is raised, 874 00:40:51,360 --> 00:40:53,200 it's not tagged by that one, it's 875 00:40:53,200 --> 00:40:55,316 going to raise it up the chain. 876 00:40:55,316 --> 00:40:57,690 If that procedure was called by some other procedure it's 877 00:40:57,690 --> 00:41:00,040 going to say is there an exception block in there 878 00:41:00,040 --> 00:41:00,936 that can handle that. 879 00:41:00,936 --> 00:41:02,810 If not, I am going to keep going up the chain 880 00:41:02,810 --> 00:41:04,929 until eventually I get to the top level. 881 00:41:04,929 --> 00:41:06,220 And you can see that down here. 882 00:41:06,220 --> 00:41:07,320 I'm going to run this in a second.
883 00:41:07,320 --> 00:41:08,736 This is just a piece of code where 884 00:41:08,736 --> 00:41:10,700 I'm going to say, gee, if I can get the grades, 885 00:41:10,700 --> 00:41:15,020 do something, if not carry on. 886 00:41:15,020 --> 00:41:19,287 And if I go ahead and run this -- now it's going to say whoops, 887 00:41:19,287 --> 00:41:19,786 at me. 888 00:41:19,786 --> 00:41:25,760 What happened? 889 00:41:25,760 --> 00:41:27,170 I'm down here in the try, I'm trying 890 00:41:27,170 --> 00:41:30,120 to do get grades, which is a call to that function, which is not 891 00:41:30,120 --> 00:41:33,932 bound in my computer. 892 00:41:33,932 --> 00:41:34,890 That says it's in here. 893 00:41:34,890 --> 00:41:36,450 It's in this try-block. 894 00:41:36,450 --> 00:41:41,350 It raised an exception, but it wasn't an IOError. 895 00:41:41,350 --> 00:41:45,490 So it passes it back, past this exception, up to this level, 896 00:41:45,490 --> 00:41:48,910 which gets to that exception. 897 00:41:48,910 --> 00:41:51,210 Let me say this a little bit better then. 898 00:41:51,210 --> 00:41:54,240 I can write exceptions inside a piece of code. 899 00:41:54,240 --> 00:41:56,570 Try this, if it doesn't work I can 900 00:41:56,570 --> 00:41:59,240 have an exception that catches any error at that level. 901 00:41:59,240 --> 00:42:02,360 Or I can say catch only these kinds of errors at that level, 902 00:42:02,360 --> 00:42:05,190 otherwise pass them up the chain. 903 00:42:05,190 --> 00:42:07,250 And that exception will keep getting passed up 904 00:42:07,250 --> 00:42:08,680 the chain of calls until it either 905 00:42:08,680 --> 00:42:10,180 gets to the top level, in which case 906 00:42:10,180 --> 00:42:11,830 it looks like what you see all the time. 907 00:42:11,830 --> 00:42:14,330 It looks like an error, but it tells you what the error came 908 00:42:14,330 --> 00:42:18,080 from, or it gets to an exception handler that can deal with it.
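The passing-up-the-chain behaviour described here can be sketched as follows. This is a minimal illustration and not the code shown in the lecture: the names `get_grades`, `summarize` and `write_temp` are hypothetical, the file format (one score per line) is an assumption, and the syntax is Python 3, where `IOError` is an alias of `OSError`.

```python
import os
import tempfile


def get_grades(fname):
    """Return the scores in fname, assuming one number per line."""
    try:
        with open(fname, 'r') as grades_file:
            return [float(line) for line in grades_file]
    except IOError:
        # Only failures to open or read the file are handled at this level.
        print('Could not open ' + fname)
        return []


def summarize(fname):
    """Describe the grades in fname, handling errors get_grades lets through."""
    try:
        grades = get_grades(fname)
    except ValueError:
        # float() raised a ValueError inside get_grades. That is not an
        # IOError, so the handler there did not match and the exception
        # travelled up the call chain to this one.
        return 'bad data'
    if not grades:
        return 'no grades'
    return 'average %.1f' % (sum(grades) / len(grades))


def write_temp(text):
    """Hypothetical demo helper: write text to a temp file, return its path."""
    handle, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(handle, 'w') as f:
        f.write(text)
    return path
```

Calling `summarize` on a missing file exercises the `IOError` handler inside `get_grades`; calling it on a file containing a non-numeric line exercises the outer `ValueError` handler instead. With no outer handler at all, the `ValueError` would reach the top level and appear as an ordinary traceback.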
909 00:42:18,080 --> 00:42:20,580 OK, so the last thing to say about this 910 00:42:20,580 --> 00:42:37,940 is what's the difference between an exception and an assert? 911 00:42:37,940 --> 00:42:39,660 We introduced asserts earlier on. 912 00:42:39,660 --> 00:42:42,350 You've actually seen them in some pieces of code, 913 00:42:42,350 --> 00:42:47,150 so what's the difference between the two of them? 914 00:42:47,150 --> 00:42:49,920 Well here's my way of describing it. 915 00:42:49,920 --> 00:42:53,090 The goal of an assert, or an assert statement, 916 00:42:53,090 --> 00:42:55,600 is basically to say, look, you can make sure 917 00:42:55,600 --> 00:42:59,150 that my function is going to give this kind of result 918 00:42:59,150 --> 00:43:02,380 if you give me inputs of a particular type. 919 00:43:02,380 --> 00:43:03,630 Sorry, wrong way of saying it. 920 00:43:03,630 --> 00:43:05,220 If you give me inputs that satisfy 921 00:43:05,220 --> 00:43:07,302 some particular constraints. 922 00:43:07,302 --> 00:43:08,760 That was the kind of thing you saw. 923 00:43:08,760 --> 00:43:11,260 Asserts said here are some conditions to test. 924 00:43:11,260 --> 00:43:13,770 If they're true, I'm going to let the rest of the code run. 925 00:43:13,770 --> 00:43:16,840 If not, I'm going to throw an error. 926 00:43:16,840 --> 00:43:19,590 So the assertion is basically saying 927 00:43:19,590 --> 00:43:24,750 we've got some preconditions, those 928 00:43:24,750 --> 00:43:27,100 are the clauses inside the assert that have to be true, 929 00:43:27,100 --> 00:43:34,120 and there's a postcondition.
 and in essence, 930 00:43:34,120 --> 00:43:37,780 what the assert is saying is, or rather the programmer is saying 931 00:43:37,780 --> 00:43:40,370 using the assert is, if you give me input that satisfies 932 00:43:40,370 --> 00:43:41,995 the preconditions, I'm guaranteeing 933 00:43:41,995 --> 00:43:43,780 to you that my code is going to give you something 934 00:43:43,780 --> 00:43:45,050 that meets the postcondition. 935 00:43:45,050 --> 00:43:46,737 It's going to do the right thing. 936 00:43:46,737 --> 00:43:48,820 And as a consequence, as you saw with the asserts, 937 00:43:48,820 --> 00:43:52,560 if the preconditions aren't true, it throws an error. 938 00:43:52,560 --> 00:43:55,480 It goes back up to the top level saying stop operation 939 00:43:55,480 --> 00:43:59,100 immediately and goes back up to the top level. 940 00:43:59,100 --> 00:44:00,774 Asserts in fact are nice in the sense 941 00:44:00,774 --> 00:44:02,190 that they let you check conditions 942 00:44:02,190 --> 00:44:03,730 at debugging time or testing time. 943 00:44:03,730 --> 00:44:07,610 So you can actually use them to see where your code is going. 944 00:44:07,610 --> 00:44:10,070 An exception, when you use an exception, basically what 945 00:44:10,070 --> 00:44:12,030 you're saying is, look, you can do anything 946 00:44:12,030 --> 00:44:13,447 you want with my function, and you 947 00:44:13,447 --> 00:44:15,030 can be sure that I'm going to tell you 948 00:44:15,030 --> 00:44:16,340 if something is going wrong. 949 00:44:16,340 --> 00:44:19,840 And in many cases I'm going to handle it myself. 950 00:44:19,840 --> 00:44:22,300 So as much as possible, the exceptions 951 00:44:22,300 --> 00:44:26,296 are going to try to handle unexpected things, 952 00:44:26,296 --> 00:44:27,920 actually wrong term, you expected them, 953 00:44:27,920 --> 00:44:29,210 but not what the user did.
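The precondition and postcondition idea for asserts might look like the sketch below. The function `median` and its particular conditions are hypothetical, chosen only to illustrate the pattern; they are not taken from the lecture's code.

```python
def median(scores):
    """Return the middle element of scores once sorted (upper middle if even)."""
    # Preconditions: the caller promises a non-empty list of numbers.
    assert len(scores) > 0, 'scores must not be empty'
    assert all(isinstance(s, (int, float)) for s in scores), \
        'scores must be numbers'
    result = sorted(scores)[len(scores) // 2]
    # Postcondition: the answer is one of the values that was passed in.
    assert result in scores
    return result
```

If a precondition fails, Python raises an `AssertionError` and, unless something catches it, execution stops at the top level. Note that running Python with the `-O` flag strips assert statements, which fits their role as checks for debugging and testing time.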
954 00:44:29,210 --> 00:44:30,940 It's going to try to handle conditions 955 00:44:30,940 --> 00:44:33,760 other than the normal ones itself. 956 00:44:33,760 --> 00:44:36,790 So you can use the thing in any way. 957 00:44:36,790 --> 00:44:39,290 If it can't, it's going to try and throw it to somebody else 958 00:44:39,290 --> 00:44:41,190 to handle, and only if there is no handler 959 00:44:41,190 --> 00:44:45,270 for that unexpected condition, will it come up to top level. 960 00:44:45,270 --> 00:44:47,400 So, summarizing better, assert is 961 00:44:47,400 --> 00:44:49,360 something you put in to say to the user, 962 00:44:49,360 --> 00:44:51,790 make sure you're giving me input of this type, 963 00:44:51,790 --> 00:44:54,164 but I'm going to guarantee you the rest of the code works 964 00:44:54,164 --> 00:44:54,810 correctly. 965 00:44:54,810 --> 00:44:56,880 Exceptions and exception handlers are saying, 966 00:44:56,880 --> 00:44:59,400 here are the odd cases that I might see 967 00:44:59,400 --> 00:45:01,650 and here's what I'd like to do in those cases 968 00:45:01,650 --> 00:45:04,950 in order to try and be able to deal with them. 969 00:45:04,950 --> 00:45:10,970 Last thing to say is why would you want to have exceptions? 970 00:45:10,970 --> 00:45:14,140 Well, let's go back to that case of inputting 971 00:45:14,140 --> 00:45:16,570 a simple little floating point. 972 00:45:16,570 --> 00:45:18,270 If I'm expecting mostly numbers in, 973 00:45:18,270 --> 00:45:19,936 I can certainly try and do the coercion. 974 00:45:19,936 --> 00:45:22,910 I could have done that just doing the coercion. 975 00:45:22,910 --> 00:45:25,990 The problem is, I want to know if, in fact, I've 976 00:45:25,990 --> 00:45:28,020 got something that's not of the form I expect.
977 00:45:28,020 --> 00:45:30,490 I'm much better having an exception get handled 978 00:45:30,490 --> 00:45:33,525 at the time of input than to let that prop -- 979 00:45:33,525 --> 00:45:35,900 that value rather propagate through a whole bunch of code 980 00:45:35,900 --> 00:45:38,960 until eventually it hits an error 17 calls later, 981 00:45:38,960 --> 00:45:41,460 and you have no clue where it came from. 982 00:45:41,460 --> 00:45:43,190 So the exceptions are useful when 983 00:45:43,190 --> 00:45:45,420 you want to have the ability to say, 984 00:45:45,420 --> 00:45:48,032 I expect in general this kind of behavior, 985 00:45:48,032 --> 00:45:50,490 but I do know there are some other things that might happen 986 00:45:50,490 --> 00:45:53,510 and here's what I'd like to do in each one of those cases. 987 00:45:53,510 --> 00:45:56,100 But I do want to make sure that I don't let a value that I'm 988 00:45:56,100 --> 00:45:58,700 not expecting pass through. 989 00:45:58,700 --> 00:46:01,480 That goes back to that idea of sort of disciplined coding. 990 00:46:01,480 --> 00:46:03,530 It's easy to have assumptions about what 991 00:46:03,530 --> 00:46:06,730 you think are going to come into the program when you write it. 992 00:46:06,730 --> 00:46:09,250 If you really know what they are use them as asserts, 993 00:46:09,250 --> 00:46:11,500 but if you think there's going to be some flexibility, 994 00:46:11,500 --> 00:46:14,540 you want to prevent the user getting trapped in a bad spot, 995 00:46:14,540 --> 00:46:17,860 and exceptions as a consequence are a good thing to use.
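Handling the bad value at the time of input, rather than letting it propagate, can be sketched roughly like this. It is an assumption-laden illustration, not the lecture's own helper: the name `read_float` and its `read` parameter are hypothetical, and it uses Python 3's `input` where the course's Python 2 code would have used `raw_input`.

```python
def read_float(prompt, read=input):
    """Keep asking until the user types something float() accepts.

    `read` is any function that takes a prompt and returns a string; it
    defaults to the built-in input, and is a parameter only so the loop
    can be exercised without a keyboard.
    """
    while True:
        text = read(prompt)
        try:
            return float(text)
        except ValueError:
            # Deal with the bad value here, at the point of input, instead
            # of letting it surface as an error many calls later.
            print('%r is not a number, please try again' % text)
```

A function like this could live in one file and be imported wherever numeric input is needed, so every caller receives a float and never has to check the form of the value again.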