1 00:00:00,000 --> 00:00:02,400 ANNOUNCER: Open content is provided under a Creative 2 00:00:02,400 --> 00:00:03,830 Commons license. 3 00:00:03,830 --> 00:00:06,840 Your support will help MIT OpenCourseWare continue to 4 00:00:06,840 --> 00:00:10,520 offer high-quality educational resources for free. 5 00:00:10,520 --> 00:00:13,380 To make a donation, or view additional materials from 6 00:00:13,380 --> 00:00:17,490 hundreds of MIT courses, visit MIT OpenCourseWare at 7 00:00:17,490 --> 00:00:19,930 ocw.mit.edu. 8 00:00:19,930 --> 00:00:23,820 PROFESSOR ERIC GRIMSON: Let's recap where we were. 9 00:00:23,820 --> 00:00:26,640 Last lecture, we talked about, or started to talk about, 10 00:00:26,640 --> 00:00:27,770 efficiency. 11 00:00:27,770 --> 00:00:28,640 Orders of growth. 12 00:00:28,640 --> 00:00:30,050 Complexity. 13 00:00:30,050 --> 00:00:33,450 And I'll remind you, we saw a set of algorithms, and part of 14 00:00:33,450 --> 00:00:35,740 my goal was to get you to begin to recognize 15 00:00:35,740 --> 00:00:38,380 characteristics of algorithms that map into 16 00:00:38,380 --> 00:00:40,200 a particular class. 17 00:00:40,200 --> 00:00:40,820 So what did we see? 18 00:00:40,820 --> 00:00:44,480 We saw linear algorithms. Typical characterization, not 19 00:00:44,480 --> 00:00:47,100 all the time, but typical characterization, is an 20 00:00:47,100 --> 00:00:51,270 algorithm that reduces the size of a problem by one, or 21 00:00:51,270 --> 00:00:55,310 by some constant amount each time, is typically an example 22 00:00:55,310 --> 00:00:57,330 of a linear algorithm. 23 00:00:57,330 --> 00:01:00,140 And we saw a couple of examples of linear algorithms. 24 00:01:00,140 --> 00:01:03,620 We also saw a logarithmic algorithm, and we like log 25 00:01:03,620 --> 00:01:06,510 algorithms, because they're really fast. 
A typical 26 00:01:06,510 --> 00:01:09,490 characteristic of a log algorithm is a pro-- or sorry, 27 00:01:09,490 --> 00:01:13,850 an algorithm where it reduces the size of the problem by a 28 00:01:13,850 --> 00:01:14,820 constant factor. 29 00:01:14,820 --> 00:01:16,372 Obviously-- and that's a bad way of saying it, I said 30 00:01:16,372 --> 00:01:18,120 constant the previous time-- in the linear case, it's 31 00:01:18,120 --> 00:01:19,600 subtract by a certain amount. 32 00:01:19,600 --> 00:01:21,840 In the log case, it's divide by an amount. 33 00:01:21,840 --> 00:01:23,240 Cut the problem in half. 34 00:01:23,240 --> 00:01:25,300 Cut the problem in half again. 35 00:01:25,300 --> 00:01:26,110 And that's a typical 36 00:01:26,110 --> 00:01:28,310 characterization of a log algorithm. 37 00:01:28,310 --> 00:01:31,140 We saw some quadratic algorithms, typically those 38 00:01:31,140 --> 00:01:33,910 are things with multiple nested loops, or iterative or 39 00:01:33,910 --> 00:01:36,390 recursive calls, where you're doing, say, a linear amount of 40 00:01:36,390 --> 00:01:39,820 time but you're doing it a linear number of times, and so 41 00:01:39,820 --> 00:01:42,010 it becomes quadratic, and you'll see other polynomial 42 00:01:42,010 --> 00:01:43,100 kinds of algorithms. 43 00:01:43,100 --> 00:01:46,340 And finally, we saw an example of an exponential algorithm, 44 00:01:46,340 --> 00:01:48,510 those Towers of Hanoi. 45 00:01:48,510 --> 00:01:51,200 We don't like exponential algorithms, or at least you 46 00:01:51,200 --> 00:01:53,410 shouldn't like them, because they blow up quickly. 47 00:01:53,410 --> 00:01:55,490 And we saw some examples of that. 48 00:01:55,490 --> 00:01:58,210 And unfortunately, some problems are inherently 49 00:01:58,210 --> 00:02:00,470 exponential, you're sort of stuck with that, and then you 50 00:02:00,470 --> 00:02:03,450 just have to try to be as clever as you can. 51 00:02:03,450 --> 00:02:04,750 OK. 
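A rough way to see the difference between the linear and logarithmic cases recapped above is to count steps. The sketch below is not from the lecture handout; the function names are made up for illustration, and it only models "subtract a constant" versus "divide by a constant".

```python
def steps_subtracting(n, amount=1):
    """Count steps when each step shrinks the problem by a fixed amount (linear)."""
    steps = 0
    while n > 0:
        n -= amount
        steps += 1
    return steps


def steps_dividing(n, factor=2):
    """Count steps when each step divides the problem by a fixed factor (logarithmic)."""
    steps = 0
    while n > 1:
        n //= factor
        steps += 1
    return steps


# Print problem size, linear step count, and logarithmic step count side by side.
for size in (10, 1000, 1000000):
    print(size, steps_subtracting(size), steps_dividing(size))
```

For a problem of size 1000, the subtracting version takes 1000 steps while the halving version takes 9, which is the gap the lecture is pointing at.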
52 00:02:04,750 --> 00:02:07,180 At the end of the lecture last time, I also showed you an 53 00:02:07,180 --> 00:02:09,940 example of binary search. 54 00:02:09,940 --> 00:02:12,440 And I want to redo that in a little more detail today, 55 00:02:12,440 --> 00:02:15,170 because I felt like I did that a little more quickly than I 56 00:02:15,170 --> 00:02:18,780 wanted to, so, if you really got binary search, fall asleep 57 00:02:18,780 --> 00:02:21,060 for about ten minutes, just don't snore, your neighbors 58 00:02:21,060 --> 00:02:22,680 may not appreciate it, but we're going to go over it 59 00:02:22,680 --> 00:02:24,730 again, because it's a problem and an idea that we're going 60 00:02:24,730 --> 00:02:26,690 to come back to, and I really want to make sure that I do 61 00:02:26,690 --> 00:02:30,970 this in a way that makes real good sense to you. 62 00:02:30,970 --> 00:02:31,570 Again. 63 00:02:31,570 --> 00:02:33,820 Basic premise of binary search, or at least we set it 64 00:02:33,820 --> 00:02:37,720 up was, imagine I have a sorted list of elements. 65 00:02:37,720 --> 00:02:39,150 We'll get, in a second, to how we're going to get them 66 00:02:39,150 --> 00:02:42,330 sorted, and I want to know, is a particular 67 00:02:42,330 --> 00:02:44,390 element in that list. 68 00:02:44,390 --> 00:02:47,510 And the basic idea of binary search is to start with the 69 00:02:47,510 --> 00:02:50,790 full range of the list, pick the midpoint, 70 00:02:50,790 --> 00:02:53,160 and test that point. 71 00:02:53,160 --> 00:02:54,970 If it's the thing I'm looking for, I'm golden. 72 00:02:54,970 --> 00:02:58,350 If not, because the list is sorted, I can use the 73 00:02:58,350 --> 00:03:01,150 difference between what I'm looking for and that midpoint 74 00:03:01,150 --> 00:03:04,040 to decide, should I look in the top half of the list, or 75 00:03:04,040 --> 00:03:05,350 the bottom half of the list? 76 00:03:05,350 --> 00:03:06,960 And I keep chopping it down. 
77 00:03:06,960 --> 00:03:09,230 And I want to show you a little bit more detail of 78 00:03:09,230 --> 00:03:11,490 that, so let's create a simple little list here. 79 00:03:11,490 --> 00:03:22,880 All right? 80 00:03:22,880 --> 00:03:24,530 I don't care what's in there, but just assume that's my 81 00:03:24,530 --> 00:03:28,510 list. And just to remind you, on your handout, and there it 82 00:03:28,510 --> 00:03:30,930 is on the screen, I'm going to bring it back up, there's the 83 00:03:30,930 --> 00:03:32,140 little binary search algorithm. 84 00:03:32,140 --> 00:03:33,350 We're going to call search, which just 85 00:03:33,350 --> 00:03:34,550 calls binary search. 86 00:03:34,550 --> 00:03:39,140 And you can look at it, and let's in fact take a look at 87 00:03:39,140 --> 00:03:40,640 it to see what it does. 88 00:03:40,640 --> 00:03:42,670 We're going to call binary search, it's going to take the 89 00:03:42,670 --> 00:03:44,640 list to search and the element, but it's also going 90 00:03:44,640 --> 00:03:52,390 to say, here's the first part of the list, and there's the 91 00:03:52,390 --> 00:03:54,760 last part of the list, and what does it 92 00:03:54,760 --> 00:03:55,830 do inside that code? 93 00:03:55,830 --> 00:03:57,770 Well, it checks to see, is it bigger than two? 94 00:03:57,770 --> 00:03:59,280 Are there more than two elements there? 95 00:03:59,280 --> 00:04:01,920 If there are less than two elements there, I just check 96 00:04:01,920 --> 00:04:03,420 one or both of those to see if I'm looking 97 00:04:03,420 --> 00:04:04,530 for the right thing. 98 00:04:04,530 --> 00:04:06,790 Otherwise, what does that code say to do? 99 00:04:06,790 --> 00:04:10,370 It says find the midpoint, which says, take the start, 100 00:04:10,370 --> 00:04:15,270 which is pointing to that place right there, take last 101 00:04:15,270 --> 00:04:17,850 minus first, divide it by 2, and add it to start. 
102 00:04:17,850 --> 00:04:21,230 And that basically, somewhere about here, 103 00:04:21,230 --> 00:04:23,740 gives me the midpoint. 104 00:04:23,740 --> 00:04:25,590 Now I look at that element. 105 00:04:25,590 --> 00:04:26,830 Is it the thing I'm looking for? 106 00:04:26,830 --> 00:04:29,020 If I'm really lucky, it is. 107 00:04:29,020 --> 00:04:33,030 If not, I look at the value of that point here and the thing 108 00:04:33,030 --> 00:04:34,210 I'm looking for. 109 00:04:34,210 --> 00:04:36,450 And for sake of argument, let's assume that the thing 110 00:04:36,450 --> 00:04:39,380 I'm looking for is smaller than the value here. 111 00:04:39,380 --> 00:04:41,180 Here's what I do. 112 00:04:41,180 --> 00:04:42,470 I change-- oops! 113 00:04:42,470 --> 00:04:43,320 Let me do that this way-- 114 00:04:43,320 --> 00:04:51,570 I change last to here, and keep first there, and I throw 115 00:04:51,570 --> 00:04:54,330 away all of that. 116 00:04:54,330 --> 00:04:56,480 All right? 117 00:04:56,480 --> 00:04:59,730 That's just the those-- let me use my pointer-- that's just 118 00:04:59,730 --> 00:05:01,580 these two lines here. 119 00:05:01,580 --> 00:05:05,020 I checked the value, and in one case, I'm changing the 120 00:05:05,020 --> 00:05:09,350 last to be mid minus 1, which is the case I'm in here, and I 121 00:05:09,350 --> 00:05:10,050 just call again. 122 00:05:10,050 --> 00:05:12,060 All right? 123 00:05:12,060 --> 00:05:13,450 I'm going to call exactly the same thing. 124 00:05:13,450 --> 00:05:16,860 Now, first is pointing here, last is pointing there, again, 125 00:05:16,860 --> 00:05:18,880 I check to see, are there more than two things left? 126 00:05:18,880 --> 00:05:20,280 There are, in this case. 127 00:05:20,280 --> 00:05:21,000 So what do I do? 128 00:05:21,000 --> 00:05:24,060 I find the midpoint by taking last minus first, divide by 2, 129 00:05:24,060 --> 00:05:25,800 and add to start. 
130 00:05:25,800 --> 00:05:29,790 Just for sake of argument, we'll assume it's about there, 131 00:05:29,790 --> 00:05:31,330 and I do the same thing. 132 00:05:31,330 --> 00:05:33,950 Is this value what I'm looking for? 133 00:05:33,950 --> 00:05:35,970 Again, for sake of argument, let's assume it's not. 134 00:05:35,970 --> 00:05:38,130 Let's assume, for sake of argument, the thing I'm 135 00:05:38,130 --> 00:05:40,440 looking for is bigger than this. 136 00:05:40,440 --> 00:05:43,630 In that case, I'm going to throw away all of this, I'm 137 00:05:43,630 --> 00:05:46,790 going to hit that bottom line of that code. 138 00:05:46,790 --> 00:05:47,200 Ah. 139 00:05:47,200 --> 00:05:48,310 What does that do? 140 00:05:48,310 --> 00:05:49,440 It changes the call. 141 00:05:49,440 --> 00:05:54,530 So in this case, first now points 142 00:05:54,530 --> 00:06:00,310 there, last points there. 143 00:06:00,310 --> 00:06:01,110 And I cut around. 144 00:06:01,110 --> 00:06:07,660 And again, notice what I've done. 145 00:06:07,660 --> 00:06:09,980 I've thrown away most of the array-- most of the list, I 146 00:06:09,980 --> 00:06:12,780 shouldn't say array-- most of the list. All right? 147 00:06:12,780 --> 00:06:16,420 So it cuts it down quickly as we go along. 148 00:06:16,420 --> 00:06:18,170 OK. 149 00:06:18,170 --> 00:06:20,730 That's the basic idea of binary search. 150 00:06:20,730 --> 00:06:23,210 And let's just run a couple of examples to remind you of what 151 00:06:23,210 --> 00:06:27,470 happens if we do this. 152 00:06:27,470 --> 00:06:29,960 So if I call, let's [UNINTELLIGIBLE], let's set up 153 00:06:29,960 --> 00:06:37,950 s to be, I don't know, some big long list. OK. 154 00:06:37,950 --> 00:06:41,890 And I'm going to look to see, is a particular element inside 155 00:06:41,890 --> 00:06:47,400 of that list, and again, I'll remind you, that's just giving 156 00:06:47,400 --> 00:06:50,750 me the integers from zero up to 9999 something or other. 
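The walk-through above can be sketched in code. This is a reconstruction from the spoken description, not the handout itself: the names `bsearch` and `search` follow the lecture, but returning the call count instead of printing it, the empty-list guard, and the Python 3 integer division `//` are assumptions made here.

```python
def bsearch(s, e, first, last, calls=1):
    """Recursive binary search for e in the sorted list s, within s[first..last].

    Returns (found, calls), where calls is the number of bsearch calls made.
    """
    if last - first < 2:
        # At most two candidates remain: check them directly.
        return (s[first] == e or s[last] == e), calls
    # Midpoint: take last minus first, divide by 2, and add it to first.
    mid = first + (last - first) // 2
    if s[mid] == e:
        return True, calls
    if s[mid] > e:
        # Target is smaller: keep first, move last to mid - 1.
        return bsearch(s, e, first, mid - 1, calls + 1)
    # Target is bigger: move first to mid + 1, keep last.
    return bsearch(s, e, mid + 1, last, calls + 1)


def search(s, e):
    """Search the whole list; an empty list trivially contains nothing."""
    if not s:
        return False, 0
    return bsearch(s, e, 0, len(s) - 1)


s = list(range(1000))
print(search(s, -1))    # smaller than everything: chops from the back end
print(search(s, 1000))  # bigger than everything: chops from the front end
print(search(s, 500))
```

Because the recursive branch only runs when at least three elements remain, `mid - 1` and `mid + 1` always stay inside the current range.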
157 00:06:50,750 --> 00:06:56,740 If I look for, say, minus 1, you might go, gee, wait a 158 00:06:56,740 --> 00:06:58,680 minute, if I was just doing linear search, I would've 159 00:06:58,680 --> 00:07:01,030 known right away that minus one wasn't in this list, 160 00:07:01,030 --> 00:07:02,250 because it's sorted and it's smaller 161 00:07:02,250 --> 00:07:03,840 than the first element. 162 00:07:03,840 --> 00:07:06,070 So this looks like it's doing a little bit of extra work, 163 00:07:06,070 --> 00:07:09,340 but you can see, if you look at that, how it cuts it down 164 00:07:09,340 --> 00:07:09,990 at each stage. 165 00:07:09,990 --> 00:07:12,580 And I'll remind you, what I'm printing out there is, first 166 00:07:12,580 --> 00:07:16,920 and last, with the range I'm looking over, and then just 167 00:07:16,920 --> 00:07:20,230 how many times it's been called. 168 00:07:20,230 --> 00:07:22,590 So in this case, it just keeps chopping down from the back 169 00:07:22,590 --> 00:07:24,830 end, which kind of makes sense, all right? 170 00:07:24,830 --> 00:07:28,040 But in a fixed number, in fact, twenty-three calls, it 171 00:07:28,040 --> 00:07:29,440 gets down to the point of being able to say 172 00:07:29,440 --> 00:07:30,100 whether it's there. 173 00:07:30,100 --> 00:07:33,870 Let's go the other direction. 174 00:07:33,870 --> 00:07:39,490 And yes, I guess I'd better say s, not 2, or we're going to 175 00:07:39,490 --> 00:07:40,620 get an error here. 176 00:07:40,620 --> 00:07:48,320 Again, in twenty-three checks. 177 00:07:48,320 --> 00:07:50,650 In this case, it's cutting up from the bottom end, which 178 00:07:50,650 --> 00:07:52,610 makes sense because the thing I'm looking for is always 179 00:07:52,610 --> 00:07:55,950 bigger than the midpoint, and then, I don't know, let's pick 180 00:07:55,950 --> 00:07:58,940 something in between. 
181 00:07:58,940 --> 00:08:03,820 Somebody want-- ah, I keep doing that-- somebody like to 182 00:08:03,820 --> 00:08:05,740 give me a number? 183 00:08:05,740 --> 00:08:07,620 I know you'd like to give me other things, other 184 00:08:07,620 --> 00:08:10,810 expression, somebody give me a number. 185 00:08:10,810 --> 00:08:11,650 Anybody? 186 00:08:11,650 --> 00:08:12,750 No? 187 00:08:12,750 --> 00:08:13,940 Sorry. 188 00:08:13,940 --> 00:08:14,880 Thank you. 189 00:08:14,880 --> 00:08:15,430 Good number. 190 00:08:15,430 --> 00:08:23,480 OK, walks in very quickly. 191 00:08:23,480 --> 00:08:24,620 OK? 192 00:08:24,620 --> 00:08:26,780 And if you just look at the numbers, you can see how it 193 00:08:26,780 --> 00:08:29,290 cuts in from one side and then the other side as it keeps 194 00:08:29,290 --> 00:08:31,800 narrowing that range, until it gets down to the place where 195 00:08:31,800 --> 00:08:34,320 there are at most two things left, and then it just has to 196 00:08:34,320 --> 00:08:37,030 check those two to say whether it's there or not. 197 00:08:37,030 --> 00:08:39,350 Think about this compared to a linear search. 198 00:08:39,350 --> 00:08:39,440 All right? 199 00:08:39,440 --> 00:08:41,300 A linear search, I start at the beginning of the list and 200 00:08:41,300 --> 00:08:42,490 walk all the way through it. 201 00:08:42,490 --> 00:08:45,970 All right, if I'm lucky and it's at the low end, I'll find 202 00:08:45,970 --> 00:08:46,770 it pretty quickly. 203 00:08:46,770 --> 00:08:48,980 If it's not, if it's at the far end, I've got to go 204 00:08:48,980 --> 00:08:51,380 forever, and you saw that last time where this thing paused 205 00:08:51,380 --> 00:08:52,470 for a little while while it actually 206 00:08:52,470 --> 00:08:55,210 searched a list this big. 207 00:08:55,210 --> 00:08:55,470 OK. 208 00:08:55,470 --> 00:08:58,750 So, what do I want you to take away from this? 
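For comparison, a linear search can be sketched as follows. The function name and the comparison counter are assumptions added here for illustration; the lecture only describes walking the list from the beginning.

```python
def linear_search(s, e):
    """Walk the list from the front; return (found, comparisons made)."""
    comparisons = 0
    for item in s:
        comparisons += 1
        if item == e:
            return True, comparisons
    return False, comparisons


s = list(range(1000))
print(linear_search(s, 0))    # (True, 1): lucky, it is at the low end
print(linear_search(s, 999))  # (True, 1000): at the far end, every element is visited
print(linear_search(s, -1))   # (False, 1000): absent, so the whole list is walked
```

This version does not exploit the fact that the list is sorted. A sorted-aware variant could stop as soon as it passes the point where the element would have to be, but the worst case would still be linear.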
209 00:08:58,750 --> 00:09:00,950 This idea of binary search is going to be a 210 00:09:00,950 --> 00:09:02,720 really powerful tool. 211 00:09:02,720 --> 00:09:04,500 And it has this property, again, of 212 00:09:04,500 --> 00:09:06,440 chopping things into pieces. 213 00:09:06,440 --> 00:09:09,040 So in fact, what does that suggest about the order of 214 00:09:09,040 --> 00:09:09,400 growth here? 215 00:09:09,400 --> 00:09:12,860 What is the complexity of this? 216 00:09:12,860 --> 00:09:14,100 Yeah. 217 00:09:14,100 --> 00:09:14,750 Logarithmic. 218 00:09:14,750 --> 00:09:15,200 Why? 219 00:09:15,200 --> 00:09:17,810 STUDENT: [UNINTELLIGIBLE] 220 00:09:17,810 --> 00:09:18,450 PROFESSOR ERIC GRIMSON: Yeah. 221 00:09:18,450 --> 00:09:18,840 Thank you. 222 00:09:18,840 --> 00:09:20,830 I mean, I know I sort of said it to you, but you're right. 223 00:09:20,830 --> 00:09:21,750 It's logarithmic, right? 224 00:09:21,750 --> 00:09:24,460 It's got that property of, it cuts things in half. 225 00:09:24,460 --> 00:09:26,860 Here's another way to think about why is this log. 226 00:09:26,860 --> 00:09:28,460 Actually, let me ask a slightly different question. 227 00:09:28,460 --> 00:09:30,050 How do we know this always stops? 228 00:09:30,050 --> 00:09:33,190 I mean, I ran three trials here, and it did. 229 00:09:33,190 --> 00:09:36,560 But how would I reason about, does this always stop? 230 00:09:36,560 --> 00:09:37,190 Well let's see. 231 00:09:37,190 --> 00:09:40,180 Where's the end test on this thing? 232 00:09:40,180 --> 00:09:43,470 The end test-- and I've got the wrong glasses on-- but 233 00:09:43,470 --> 00:09:46,450 it's up here, where I'm looking to see, is last minus 234 00:09:46,450 --> 00:09:49,470 first less than or equal to 2? 235 00:09:49,470 --> 00:09:49,620 OK. 236 00:09:49,620 --> 00:09:52,330 So, soon as I get down to a list that has no more than two 237 00:09:52,330 --> 00:09:55,060 elements in it, I'm done. 
238 00:09:55,060 --> 00:09:55,600 Notice that. 239 00:09:55,600 --> 00:09:57,440 It's a less than or equal to. 240 00:09:57,440 --> 00:10:00,640 What if I just tested to see if it was only, say, one? 241 00:10:00,640 --> 00:10:01,730 There was one element in there. 242 00:10:01,730 --> 00:10:07,740 Would that have worked? 243 00:10:07,740 --> 00:10:09,480 I think it depends on whether the list is 244 00:10:09,480 --> 00:10:11,870 odd or even in length. 245 00:10:11,870 --> 00:10:12,760 Actually, that's probably not true. 246 00:10:12,760 --> 00:10:14,770 With one, it'll probably always get it down there, but 247 00:10:14,770 --> 00:10:17,440 if I've made it just equal to two, I might have lost. 248 00:10:17,440 --> 00:10:19,320 So first of all, I've got to be careful about the end test. 249 00:10:19,320 --> 00:10:22,270 But the second thing is, OK, if it stops whenever this is 250 00:10:22,270 --> 00:10:26,030 less than two, am I convinced that this will always halt? 251 00:10:26,030 --> 00:10:26,820 And the answer is sure. 252 00:10:26,820 --> 00:10:27,670 Because what do I do? 253 00:10:27,670 --> 00:10:32,820 At each stage, no matter which branch, here or here, I take, 254 00:10:32,820 --> 00:10:35,400 I'm cutting down the length of the list that I'm 255 00:10:35,400 --> 00:10:36,850 searching in half. 256 00:10:36,850 --> 00:10:38,160 All right? 257 00:10:38,160 --> 00:10:41,020 So if I start off with a list of length n, how many times 258 00:10:41,020 --> 00:10:43,610 can I divide it by 2, until I get to something no 259 00:10:43,610 --> 00:10:45,450 more than two left? 260 00:10:45,450 --> 00:10:46,520 Log times, right? 261 00:10:46,520 --> 00:10:47,630 Exactly as the gentleman said. 262 00:10:47,630 --> 00:10:48,490 Oh, I'm sorry. 263 00:10:48,490 --> 00:10:50,280 You're patiently waiting for me to reward. 264 00:10:50,280 --> 00:10:53,340 Or actually, maybe you're not. 265 00:10:53,340 --> 00:10:55,030 Thank you. 266 00:10:55,030 --> 00:10:56,080 OK. 
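One way to write down the counting argument just given, assuming each call leaves at most half of the remaining range (the symbols $n$ for the list length and $k$ for the number of halvings are introduced here for illustration):

```latex
\[
\frac{n}{2^{k}} \le 2
\quad\Longleftrightarrow\quad
2^{k} \ge \frac{n}{2}
\quad\Longleftrightarrow\quad
k \ge \log_2 n - 1 .
\]
```

So about $\log_2 n$ halvings are enough to get down to at most two elements, which is why the number of calls grows logarithmically in $n$.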
267 00:10:56,080 --> 00:11:08,690 So this is, in fact, log. 268 00:11:08,690 --> 00:11:10,830 Now, having said that, I actually snuck 269 00:11:10,830 --> 00:11:12,900 something by you. 270 00:11:12,900 --> 00:11:14,330 And I want to spend a couple of minutes 271 00:11:14,330 --> 00:11:16,300 again reinforcing that. 272 00:11:16,300 --> 00:11:19,690 So if we look at that code, and we were a little more 273 00:11:19,690 --> 00:11:21,480 careful about this, what did we say to do? 274 00:11:21,480 --> 00:11:22,840 We said look an-- sorry. 275 00:11:22,840 --> 00:11:26,600 Count the number of primitive operations in each step. 276 00:11:26,600 --> 00:11:26,860 OK. 277 00:11:26,860 --> 00:11:30,470 So if I look at this code, first of all I'm calling 278 00:11:30,470 --> 00:11:35,420 search, it just has one call, so looks like search is 279 00:11:35,420 --> 00:11:37,040 constant, except I don't know what 280 00:11:37,040 --> 00:11:38,180 happens inside of b search. 281 00:11:38,180 --> 00:11:39,210 So I've got to look at b search. 282 00:11:39,210 --> 00:11:39,900 So let's see. 283 00:11:39,900 --> 00:11:42,900 The first line, that print thing, is 284 00:11:42,900 --> 00:11:44,910 obviously constant, right? 285 00:11:44,910 --> 00:11:47,510 Just take it as a constant amount of operations. 286 00:11:47,510 --> 00:11:50,960 But let's look at the next one here, or is that second line? 287 00:11:50,960 --> 00:11:51,310 OK. 288 00:11:51,310 --> 00:11:54,900 If last minus first is greater than or equal to 2-- sorry, 289 00:11:54,900 --> 00:11:57,820 less than 2, then either look at this thing or 290 00:11:57,820 --> 00:11:58,590 look at that thing. 291 00:11:58,590 --> 00:12:02,310 And that's where I said we've got to be careful. 292 00:12:02,310 --> 00:12:05,900 That's accessing an element of a list. We have to make sure 293 00:12:05,900 --> 00:12:08,490 that, in fact, that operation is not linear. 
294 00:12:08,490 --> 00:12:12,460 So let me expand on that very slightly, and again, we did 295 00:12:12,460 --> 00:12:14,740 this last time but I want to do one more time. 296 00:12:14,740 --> 00:12:22,470 I have to be careful about how I'm actually implementing a 297 00:12:22,470 --> 00:12:22,840 list. 298 00:12:22,840 --> 00:12:38,410 So, for example: in this case, my list 299 00:12:38,410 --> 00:12:40,530 is a bunch of integers. 300 00:12:40,530 --> 00:12:42,740 And one of the things I could take advantage of, is I'm only 301 00:12:42,740 --> 00:12:44,540 going to need a finite amount of space to 302 00:12:44,540 --> 00:12:46,180 represent an integer. 303 00:12:46,180 --> 00:12:49,730 So, for example, if I want to allow for some fairly large 304 00:12:49,730 --> 00:12:52,880 range of integers, I might say, I need four memory cells 305 00:12:52,880 --> 00:12:54,350 in a row to represent an integer. 306 00:12:54,350 --> 00:12:56,660 All right, if it's a zero, it's going to be a whole bunch 307 00:12:56,660 --> 00:12:59,200 of ones-- of zeroes, so one, it may be a whole bunch of 308 00:12:59,200 --> 00:13:01,510 zeroes in the first three and then a one at the end of this 309 00:13:01,510 --> 00:13:04,020 thing, but one of the ways to think about this list in 310 00:13:04,020 --> 00:13:08,470 memory, is that I can decide in constant time how to find 311 00:13:08,470 --> 00:13:09,640 the i'th element of a list. 312 00:13:09,640 --> 00:13:12,460 So in particular, here's where the zero-th element of the 313 00:13:12,460 --> 00:13:15,320 list starts, there's where the first element starts, here's 314 00:13:15,320 --> 00:13:17,560 where the third element starts, these are just memory 315 00:13:17,560 --> 00:13:25,140 cells in a row, and to find the zero-th element, if start 316 00:13:25,140 --> 00:13:29,300 is pointing to that memory cell, it's just at start. 
317 00:13:29,300 --> 00:13:33,730 To find the first element, because I know I need four 318 00:13:33,730 --> 00:13:41,590 memory cells to represent an integer, it's at start plus 4. 319 00:13:41,590 --> 00:13:46,080 To get to the second element, I know that that's-- you get 320 00:13:46,080 --> 00:13:50,940 the idea-- at the start plus 2 times 4, and to get to the 321 00:13:50,940 --> 00:14:01,520 k'th element, I know that I want to take whatever the 322 00:14:01,520 --> 00:14:04,790 start is which points to that place in memory, take k, 323 00:14:04,790 --> 00:14:08,590 multiply by 4, and that tells me exactly where to go to find 324 00:14:08,590 --> 00:14:09,200 that location. 325 00:14:09,200 --> 00:14:13,860 This may sound like a nuance, but it's important. 326 00:14:13,860 --> 00:14:14,910 Why? 327 00:14:14,910 --> 00:14:16,900 Because that's a constant access, right? 328 00:14:16,900 --> 00:14:19,890 To get any location in memory, to get to any value of the 329 00:14:19,890 --> 00:14:23,120 list, I simply have to say which element do I want to 330 00:14:23,120 --> 00:14:25,780 get, I know that these things are stored in a particular 331 00:14:25,780 --> 00:14:29,210 size, multiply that index by 4, add it to start, and then 332 00:14:29,210 --> 00:14:31,880 it's in a constant amount of time I can go to that location 333 00:14:31,880 --> 00:14:33,870 and get out the cell. 334 00:14:33,870 --> 00:14:36,590 OK. 335 00:14:36,590 --> 00:14:42,670 That works nicely if I know that I have things stored in 336 00:14:42,670 --> 00:14:44,510 constant size. 337 00:14:44,510 --> 00:14:47,320 But what if I have a list of lists? 338 00:14:47,320 --> 00:14:49,640 What if I have a heterogeneous list, a list of integers and 339 00:14:49,640 --> 00:14:52,438 strings and floats and lists and lists of lists and lists 340 00:14:52,438 --> 00:14:53,860 of lists of lists and all that sort of cool stuff? 
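Before moving on to the mixed-size case, the fixed-size addressing described above can be sketched with Python's `struct` module. This is only a model of the idea: the helper names `make_block` and `element_at` are made up here, and a `bytes` object stands in for a row of memory cells.

```python
import struct

CELL = 4  # bytes per integer, matching the "four memory cells" in the lecture


def make_block(values):
    """Pack integers back to back, 4 bytes each, into one contiguous block."""
    return struct.pack("<%di" % len(values), *values)


def element_at(block, k, start=0):
    """Fetch the k'th integer by computing its address: start + k * 4."""
    (value,) = struct.unpack_from("<i", block, start + k * CELL)
    return value


block = make_block([10, 20, 30, 40])
print(element_at(block, 0))  # 10
print(element_at(block, 3))  # 40
```

The address computation is one multiplication and one addition regardless of `k`, which is what makes the access constant time.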
341 00:14:53,860 --> 00:15:04,190 In that case, I've got to be a lot more careful. 342 00:15:04,190 --> 00:15:07,450 So in this case, one of the standard ways to do this, is 343 00:15:07,450 --> 00:15:13,130 to use what's called a linked list. And I'm going to do it 344 00:15:13,130 --> 00:15:14,120 in the following way. 345 00:15:14,120 --> 00:15:21,570 Start again, we'll point to the beginning of the list. But 346 00:15:21,570 --> 00:15:23,790 now, because my elements are going to take different 347 00:15:23,790 --> 00:15:26,430 amounts of memory, I'm going to do the following thing. 348 00:15:26,430 --> 00:15:31,440 In the first spot, I'm going to store something that says, 349 00:15:31,440 --> 00:15:33,610 here's how far you have to jump to 350 00:15:33,610 --> 00:15:35,320 get to the next element. 351 00:15:35,320 --> 00:15:38,910 And then, I'm going to use the next sequence of things to 352 00:15:38,910 --> 00:15:40,750 represent the first element, or 353 00:15:40,750 --> 00:15:42,230 the zero-th element, if you like. 354 00:15:42,230 --> 00:15:44,120 In this case I might need five. 355 00:15:44,120 --> 00:15:47,550 And then in the next spot, I'm going to say how far you have 356 00:15:47,550 --> 00:15:49,070 to jump to get to the next element. 357 00:15:49,070 --> 00:15:53,220 All right, followed by whatever I need to represent 358 00:15:53,220 --> 00:15:54,880 it, which might only be a blank one. 359 00:15:54,880 --> 00:15:59,100 And in the next spot, maybe I've got a really long list, 360 00:15:59,100 --> 00:16:00,990 and I'm going to say how to jump to 361 00:16:00,990 --> 00:16:02,060 get to the next element. 362 00:16:02,060 --> 00:16:05,020 All right, this is actually kind of nice. 363 00:16:05,020 --> 00:16:07,800 This lets me have a way of representing things that could 364 00:16:07,800 --> 00:16:08,900 be arbitrary in size. 
365 00:16:08,900 --> 00:16:10,240 And some of these things could be huge, if 366 00:16:10,240 --> 00:16:12,370 they're themselves lists. 367 00:16:12,370 --> 00:16:13,970 Here's the problem. 368 00:16:13,970 --> 00:16:16,690 How do I get to the nth-- er, the k'th element in the list, 369 00:16:16,690 --> 00:16:17,860 in this case? 370 00:16:17,860 --> 00:16:21,380 Well I have to go to the zero-th element, and say OK, 371 00:16:21,380 --> 00:16:23,520 gee, to get to the next element, I've got 372 00:16:23,520 --> 00:16:24,700 to jump this here. 373 00:16:24,700 --> 00:16:27,490 And to get to the next element, I've got to jump to 374 00:16:27,490 --> 00:16:29,760 here, and to get to the next element, I've got to jump to 375 00:16:29,760 --> 00:16:32,990 here, until I get there. 376 00:16:32,990 --> 00:16:34,310 And so, I get some power. 377 00:16:34,310 --> 00:16:37,030 I get the ability to store arbitrary things, but what 378 00:16:37,030 --> 00:16:39,830 just happened to my complexity? 379 00:16:39,830 --> 00:16:41,270 How long does it take me to find the 380 00:16:41,270 --> 00:16:43,310 k'th element? 381 00:16:43,310 --> 00:16:44,270 Linear. 382 00:16:44,270 --> 00:16:45,990 Because I've got to walk my way down it. 383 00:16:45,990 --> 00:16:46,940 OK? 384 00:16:46,940 --> 00:16:56,730 So in this case, you have linear access. 385 00:16:56,730 --> 00:16:57,820 Oh fudge knuckle. 386 00:16:57,820 --> 00:16:58,780 Right? 387 00:16:58,780 --> 00:17:01,440 If that was the case in that code, then my complexity is no 388 00:17:01,440 --> 00:17:04,690 longer log, because I need linear access for each time 389 00:17:04,690 --> 00:17:06,260 I've got to go to the list, and it's going to be much 390 00:17:06,260 --> 00:17:06,940 worse than that. 391 00:17:06,940 --> 00:17:08,430 All right. 392 00:17:08,430 --> 00:17:08,650 Now. 393 00:17:08,650 --> 00:17:13,730 Some programming languages, primarily Lisp, actually store 394 00:17:13,730 --> 00:17:15,700 lists these ways. 
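A minimal sketch of why access in a linked list is linear. It models each cell with an object reference to the next cell rather than the jump distances drawn on the board, and the names `Node`, `from_values`, and `kth` are assumptions made for this example.

```python
class Node:
    """One cell of a linked list: a value plus a reference to the next cell."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


def from_values(values):
    """Build a linked list from a Python list and return its head node."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def kth(head, k):
    """Return (value, hops): the k'th value and how many links were followed."""
    node, hops = head, 0
    while node is not None and hops < k:
        node = node.next_node
        hops += 1
    if node is None:
        raise IndexError("list has fewer than k + 1 elements")
    return node.value, hops


# Elements of very different sizes are fine, but reaching one costs k hops.
head = from_values([7, "some text", [1, [2, 3]], 3.5])
print(kth(head, 0))  # (7, 0)
print(kth(head, 3))  # (3.5, 3)
```

If the binary search above had to call something like `kth` for every midpoint probe, each probe would cost time proportional to the index, and the overall search would no longer be logarithmic.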
395 00:17:15,700 --> 00:17:17,470 You might say, why? 396 00:17:17,470 --> 00:17:19,850 Well it turns out there's some trade-offs to it. 397 00:17:19,850 --> 00:17:22,340 It has some advantages in terms of power of storing 398 00:17:22,340 --> 00:17:24,420 things, it has some disadvantages, primarily in 399 00:17:24,420 --> 00:17:26,270 terms of access time. 400 00:17:26,270 --> 00:17:28,890 Fortunately for you, Python decided, or the inventors of 401 00:17:28,890 --> 00:17:30,840 Python decided, to store this a different way. 402 00:17:30,840 --> 00:17:34,720 And the different way is to say, look, if I redraw this, 403 00:17:34,720 --> 00:17:48,150 it's called a box and pointer diagram, what we really have 404 00:17:48,150 --> 00:17:49,800 for each element is two things. 405 00:17:49,800 --> 00:17:52,070 And I've actually just reversed the order here. 406 00:17:52,070 --> 00:17:55,070 We have a pointer to the location in memory that 407 00:17:55,070 --> 00:17:58,510 contains the actual value, which itself might be a bunch 408 00:17:58,510 --> 00:18:02,885 of pointers, and we have a pointer to the actual-- sorry, 409 00:18:02,885 --> 00:18:05,390 a pointer to the value and we have a pointer to the next 410 00:18:05,390 --> 00:18:08,000 element in the list. All right? 411 00:18:08,000 --> 00:18:10,090 And one of the things we could do if we look at that is, we 412 00:18:10,090 --> 00:18:12,440 say, gee, we could reorganize this in a pretty 413 00:18:12,440 --> 00:18:13,360 straightforward way. 414 00:18:13,360 --> 00:18:21,400 In particular, why don't we just take all of the first 415 00:18:21,400 --> 00:18:35,520 cells and stick them together? 416 00:18:35,520 --> 00:18:40,070 Where now, my list is a list of pointers, it's not a set of 417 00:18:40,070 --> 00:18:42,080 values but it's actually a pointer off to some other 418 00:18:42,080 --> 00:18:44,480 piece of memory that contains the value. 419 00:18:44,480 --> 00:18:46,380 Why is this nice? 
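A toy model of this list-of-pointers layout, to make the idea concrete. Everything here is illustrative: the `memory` dictionary, the made-up addresses, and the `slots` row are assumptions standing in for real memory, not a description of Python's internals.

```python
# Pretend memory: address -> stored value. The values differ wildly in size.
memory = {
    100: 7,
    205: "a fairly long string",
    317: [1, [2, 3], [[4]]],
    402: 3.14,
}

# The list itself is a row of equal-sized slots, each holding only an address.
slots = [100, 205, 317, 402]


def element_at(k):
    """Index the row of slots, then follow the pointer to the value."""
    address = slots[k]
    return memory[address]


print(element_at(0))  # 7
print(element_at(2))  # [1, [2, 3], [[4]]]
```

Because every slot has the same size, finding the k'th slot is the same start-plus-offset computation as in the integer case, no matter how large the values being pointed to are.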
420 00:18:46,380 --> 00:18:50,750 Well this is exactly like this. 421 00:18:50,750 --> 00:18:53,480 All right? 422 00:18:53,480 --> 00:18:57,160 It's now something that I can search in constant time. 423 00:18:57,160 --> 00:18:58,670 And that's what's going to allow me to keep this 424 00:18:58,670 --> 00:19:01,400 thing as being log. 425 00:19:01,400 --> 00:19:03,440 OK. 426 00:19:03,440 --> 00:19:07,530 With that in mind, let's go back to where we were. 427 00:19:07,530 --> 00:19:10,580 And where were we? 428 00:19:10,580 --> 00:19:15,040 We started off talking about binary search, and I suggested 429 00:19:15,040 --> 00:19:18,870 that this was a log algorithm, which it is, which is really 430 00:19:18,870 --> 00:19:21,060 kind of nice. 431 00:19:21,060 --> 00:19:32,970 Let's pull together what this algorithm actually does. 432 00:19:32,970 --> 00:19:35,780 If I generalize binary search, here's what I'm going to state 433 00:19:35,780 --> 00:19:37,070 that this thing does. 434 00:19:37,070 --> 00:19:45,580 It says one: pick the midpoint. 435 00:19:45,580 --> 00:19:56,360 Two: check to see if this is the answer, if this is the 436 00:19:56,360 --> 00:19:58,000 thing I'm looking for. 437 00:19:58,000 --> 00:20:05,910 And then, three: if not, reduce to a smaller problem, 438 00:20:05,910 --> 00:20:16,220 and repeat. 439 00:20:16,220 --> 00:20:18,260 OK, you're going, yeah, come on, that makes obvious sense. 440 00:20:18,260 --> 00:20:18,980 And it does. 441 00:20:18,980 --> 00:20:21,150 But I want you to keep that template in mind, because 442 00:20:21,150 --> 00:20:22,930 we're going to come back to that. 443 00:20:22,930 --> 00:20:25,310 It's an example of a very common tool that's going to be 444 00:20:25,310 --> 00:20:28,180 really useful to us, not just for doing search, but for 445 00:20:28,180 --> 00:20:30,900 doing a whole range of problems. That is, in essence, 446 00:20:30,900 --> 00:20:34,030 the template that describes a log-style algorithm. 
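The three-step template can be written as a loop with each step labeled. This iterative form is an alternative sketch, not the recursive code from the lecture, and the function name is an assumption.

```python
def binary_search_iterative(s, e):
    """Return True if e is in the sorted list s, following the three-step template."""
    first, last = 0, len(s) - 1
    while first <= last:
        mid = first + (last - first) // 2  # 1: pick the midpoint
        if s[mid] == e:                    # 2: check whether this is the answer
            return True
        if s[mid] > e:                     # 3: if not, reduce to a smaller problem...
            last = mid - 1
        else:
            first = mid + 1
        # ...and repeat on the half that could still contain e
    return False


print(binary_search_iterative([1, 3, 5, 7, 9, 11], 7))  # True
print(binary_search_iterative([1, 3, 5, 7, 9, 11], 4))  # False
```

Each pass through the loop discards at least half of the remaining range, which is the property that makes this template logarithmic.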
447 00:20:34,030 --> 00:20:37,030 And we're going to come back to it. 448 00:20:37,030 --> 00:20:38,620 OK. 449 00:20:38,620 --> 00:20:41,560 With that in mind though, didn't I cheat? 450 00:20:41,560 --> 00:20:45,340 I remind you, I know you're not really listening to me, 451 00:20:45,340 --> 00:20:45,940 but that's OK. 452 00:20:45,940 --> 00:20:47,635 I reminded you at the beginning of the lecture, I 453 00:20:47,635 --> 00:20:50,090 said, let's assume we have a sorted list, and then 454 00:20:50,090 --> 00:20:52,210 let's go search it. 455 00:20:52,210 --> 00:20:52,830 Where in the world 456 00:20:52,830 --> 00:20:54,790 did that sorted list come from? 457 00:20:54,790 --> 00:20:58,700 What if I just get a list of elements, what do I do? 458 00:20:58,700 --> 00:20:59,250 Well let's see. 459 00:20:59,250 --> 00:21:02,210 My fall back is, I could just do linear search, walk down 460 00:21:02,210 --> 00:21:04,400 the list one at a time, just comparing those things. 461 00:21:04,400 --> 00:21:04,580 OK. 462 00:21:04,580 --> 00:21:06,250 So that's sort of my base. 463 00:21:06,250 --> 00:21:08,390 But what if I wanted, you know, how do I want to get to 464 00:21:08,390 --> 00:21:09,300 that sorted list? 465 00:21:09,300 --> 00:21:12,250 All right? 466 00:21:12,250 --> 00:21:16,090 Now. 467 00:21:16,090 --> 00:21:18,160 One of the questions, before we get to doing the sorting, 468 00:21:18,160 --> 00:21:20,810 is even to ask, what should I do in a search case like that? 469 00:21:20,810 --> 00:21:26,380 All right, so in particular, does it make sense, if I'm 470 00:21:26,380 --> 00:21:29,990 given an unsorted list, to first sort it, 471 00:21:29,990 --> 00:21:31,450 and then search it? 472 00:21:31,450 --> 00:21:34,330 Or should I just use the basically linear case? 473 00:21:34,330 --> 00:21:34,580 All right? 474 00:21:34,580 --> 00:21:39,900 So, here's the question. 475 00:21:39,900 --> 00:21:47,560 Should we sort before we search? 
476 00:21:47,560 --> 00:21:47,820 OK. 477 00:21:47,820 --> 00:21:51,050 So let's see, if I'm going to do this, how fast could we 478 00:21:51,050 --> 00:21:53,560 sort a list? 479 00:21:53,560 --> 00:22:05,880 Can we sort a list in sublinear time? 480 00:22:05,880 --> 00:22:07,150 Sublinear meaning, something like log 481 00:22:07,150 --> 00:22:08,180 less than linear time? 482 00:22:08,180 --> 00:22:11,460 What do you think? 483 00:22:11,460 --> 00:22:17,630 It's possible? 484 00:22:17,630 --> 00:22:19,800 Any thoughts? 485 00:22:19,800 --> 00:22:22,150 Don't you hate professors who stand here waiting for you to 486 00:22:22,150 --> 00:22:25,660 answer, even when they have candy? 487 00:22:25,660 --> 00:22:28,370 Does it make sense to think we could do this in less than 488 00:22:28,370 --> 00:22:28,940 linear time? 489 00:22:28,940 --> 00:22:31,060 You know, it takes a little bit of thinking. 490 00:22:31,060 --> 00:22:31,760 What would it mean-- 491 00:22:31,760 --> 00:22:34,240 [UNINTELLIGIBLE PHRASE] do I see a hand, way at the back, 492 00:22:34,240 --> 00:22:37,500 yes please? 493 00:22:37,500 --> 00:22:39,770 Thank you. 494 00:22:39,770 --> 00:22:41,570 Man, you're going to really make me work here, I have no 495 00:22:41,570 --> 00:22:43,720 idea if I can get it that far, ah, your friend 496 00:22:43,720 --> 00:22:44,250 will help you out. 497 00:22:44,250 --> 00:22:45,090 Thank you. 498 00:22:45,090 --> 00:22:47,130 The gentleman has it exactly right. 499 00:22:47,130 --> 00:22:49,910 How could I possibly do it in sublinear time, I've got to 500 00:22:49,910 --> 00:22:52,700 look at least every element once. 501 00:22:52,700 --> 00:22:54,550 And that's the kind of instinct I'd like you to get 502 00:22:54,550 --> 00:22:55,130 into thinking about. 503 00:22:55,130 --> 00:22:58,620 So the answer here is no. 504 00:22:58,620 --> 00:23:00,180 OK. 505 00:23:00,180 --> 00:23:07,380 Can we sort it in linear time? 506 00:23:07,380 --> 00:23:07,830 Hmmm. 
507 00:23:07,830 --> 00:23:11,910 That one's not so obvious. 508 00:23:11,910 --> 00:23:13,440 So let's think about this for a second. 509 00:23:13,440 --> 00:23:18,190 To sort a list in linear time, would say, I have to look at 510 00:23:18,190 --> 00:23:20,530 each element in the list at most a 511 00:23:20,530 --> 00:23:21,770 constant number of times. 512 00:23:21,770 --> 00:23:23,030 It doesn't have to be just once, right? 513 00:23:23,030 --> 00:23:25,550 It could be two or three times. 514 00:23:25,550 --> 00:23:25,960 Hmm. 515 00:23:25,960 --> 00:23:26,590 Well, wait a minute. 516 00:23:26,590 --> 00:23:28,860 If I want to sort a list, I'll take one element, I've got to 517 00:23:28,860 --> 00:23:33,590 look at probably a lot of the other elements in the list in 518 00:23:33,590 --> 00:23:35,290 order to decide where it goes. 519 00:23:35,290 --> 00:23:37,370 And that suggests it's going to depend on how 520 00:23:37,370 --> 00:23:38,200 long the list is. 521 00:23:38,200 --> 00:23:41,640 All right, so that's a weak argument, but in fact, it's a 522 00:23:41,640 --> 00:23:49,010 way of suggesting, probably not. 523 00:23:49,010 --> 00:23:50,490 All right. 524 00:23:50,490 --> 00:23:52,890 So how fast could I sort a list? 525 00:23:52,890 --> 00:23:54,170 How fast can we sort it? 526 00:23:54,170 --> 00:24:03,020 And we're going to come back to this, probably next time if 527 00:24:03,020 --> 00:24:12,670 I time this right, but the answer is, we can do it in n 528 00:24:12,670 --> 00:24:14,830 log n time. 529 00:24:14,830 --> 00:24:15,810 We're going to come back to that. 530 00:24:15,810 --> 00:24:16,110 All right? 531 00:24:16,110 --> 00:24:18,956 And I'm going to say-- sort of set that stage here, so that-- 532 00:24:18,956 --> 00:24:21,740 It turns out that that's probably about the best we can 533 00:24:21,740 --> 00:24:25,560 do, where again n is the length of the list. 534 00:24:25,560 --> 00:24:27,510 OK, so that still comes back to my question. 
535 00:24:27,510 --> 00:24:30,220 If I want to search a list, should I sort it first and 536 00:24:30,220 --> 00:24:31,710 then search it? 537 00:24:31,710 --> 00:24:33,310 Hmmm. 538 00:24:33,310 --> 00:24:39,420 OK, so let's do the comparison. 539 00:24:39,420 --> 00:24:42,550 I'm just going to take an unsorted list and search it, I 540 00:24:42,550 --> 00:24:43,980 could do it in linear time, right? 541 00:24:43,980 --> 00:24:44,600 One at a time. 542 00:24:44,600 --> 00:24:46,400 Walk down the elements until I find it. 543 00:24:46,400 --> 00:24:48,480 That would be order n. 544 00:24:48,480 --> 00:24:53,020 On the other hand, if I want to sort it first, OK, if I 545 00:24:53,020 --> 00:24:59,520 want to do sort and search, I want to sort it, it's going to 546 00:24:59,520 --> 00:25:05,510 take n log n time to sort it, and having done that, then I 547 00:25:05,510 --> 00:25:09,170 can search it in log n time. 548 00:25:09,170 --> 00:25:10,610 Ah. 549 00:25:10,610 --> 00:25:15,020 So which one's better? 550 00:25:15,020 --> 00:25:20,200 Yeah. 551 00:25:20,200 --> 00:25:20,760 Ah-ha. 552 00:25:20,760 --> 00:25:21,690 Thank you. 553 00:25:21,690 --> 00:25:23,205 Hold on to that thought for a second, I'm going to 554 00:25:23,205 --> 00:25:23,890 come back to it. 555 00:25:23,890 --> 00:25:25,860 Let's just assume I'm running the search once, 556 00:25:25,860 --> 00:25:29,000 which one's better? 557 00:25:29,000 --> 00:25:30,760 The unsorted, and you have exactly the point I want to 558 00:25:30,760 --> 00:25:33,140 get to-- how come all the guys, sorry, all the people 559 00:25:33,140 --> 00:25:36,780 answering questions are way, way up in the back? 560 00:25:36,780 --> 00:25:40,430 Wow, that's a Tim Wakefield pitch right there, all right. 561 00:25:40,430 --> 00:25:42,160 Thank you. 562 00:25:42,160 --> 00:25:43,910 He has it exactly right. 563 00:25:43,910 --> 00:25:45,170 OK? 564 00:25:45,170 --> 00:25:48,330 Is this smaller than that? 
565 00:25:48,330 --> 00:25:49,190 No. 566 00:25:49,190 --> 00:25:50,340 Now that's a slight lie. 567 00:25:50,340 --> 00:25:52,730 Sorry, a slight misstatement, OK? 568 00:25:52,730 --> 00:25:54,570 I could run for office, couldn't I, if I can do that 569 00:25:54,570 --> 00:25:55,460 kind of talk. 570 00:25:55,460 --> 00:25:57,620 It's a slight misstatement in the sense that these should 571 00:25:57,620 --> 00:25:58,630 really be orders of growth. 572 00:25:58,630 --> 00:26:00,640 There are some constants in there, it depends on the size, 573 00:26:00,640 --> 00:26:05,800 but in general, n log n has to be bigger than n. 574 00:26:05,800 --> 00:26:08,330 So, as the gentleman back there said, if I'm searching 575 00:26:08,330 --> 00:26:11,920 it once, just use the linear search. 576 00:26:11,920 --> 00:26:15,370 On the other hand, am I likely to only search a list once? 577 00:26:15,370 --> 00:26:16,190 Probably not. 578 00:26:16,190 --> 00:26:17,710 There are going to be multiple elements I'm going to be 579 00:26:17,710 --> 00:26:24,120 looking for, so that suggests that in fact, I want to 580 00:26:24,120 --> 00:26:26,500 amortize the cost. 581 00:26:26,500 --> 00:26:30,970 And what does that say? 582 00:26:30,970 --> 00:26:33,000 It says, let's assume I want to do k 583 00:26:33,000 --> 00:26:41,700 searches of a list. OK. 584 00:26:41,700 --> 00:26:44,720 In the linear case, meaning in the unsorted case, what's the 585 00:26:44,720 --> 00:26:48,900 complexity of this? k times n, right? 586 00:26:48,900 --> 00:26:51,170 Order n to do the search, and I've got to do it k times, so 587 00:26:51,170 --> 00:26:55,830 this would be k times n. 588 00:26:55,830 --> 00:26:58,170 In the [GARBLED PHRASE] 589 00:26:58,170 --> 00:27:03,530 sort and search case, what's my cost? 
590 00:27:03,530 --> 00:27:05,690 I've got to sort it, and we said, and we'll come back to 591 00:27:05,690 --> 00:27:10,520 that next time, that I can do the sort in n log n, and then 592 00:27:10,520 --> 00:27:13,880 what's the search in this case? 593 00:27:13,880 --> 00:27:17,730 It's log n to do one search, I want to do k of them, that's 594 00:27:17,730 --> 00:27:26,090 k log n, ah-ha! 595 00:27:26,090 --> 00:27:28,780 Now I'm in better shape, right? 596 00:27:28,780 --> 00:27:31,730 Especially for really large n or for a lot of k, because now 597 00:27:31,730 --> 00:27:37,860 in general, this is going to be smaller than that. 598 00:27:37,860 --> 00:27:40,110 So this is a place where the amortized cost 599 00:27:40,110 --> 00:27:41,800 actually helps me out. 600 00:27:41,800 --> 00:27:43,900 And as the gentleman at the back said, the question he 601 00:27:43,900 --> 00:27:46,210 asked is right, it depends on what I'm trying to do. 602 00:27:46,210 --> 00:27:49,160 So when I do the analysis, I want to think about what am I 603 00:27:49,160 --> 00:27:51,370 doing here, am I capturing all the pieces of it? 604 00:27:51,370 --> 00:27:54,020 Here, the two variables that matter are what's the length 605 00:27:54,020 --> 00:27:57,030 of the list, and how many times I'm going to search it? 606 00:27:57,030 --> 00:28:04,010 So in this case, this one wins, whereas in this case, 607 00:28:04,010 --> 00:28:07,000 that one wins. 608 00:28:07,000 --> 00:28:08,960 OK. 609 00:28:08,960 --> 00:28:13,220 Having said that, let's look at doing some sorts. 610 00:28:13,220 --> 00:28:16,290 And I'm going to start with a couple of dumb sorting 611 00:28:16,290 --> 00:28:19,400 mechanisms. Actually, that's the wrong way of saying it, 612 00:28:19,400 --> 00:28:21,510 they're simply brain-damaged, they're not dumb, OK? 
613 00:28:21,510 --> 00:28:23,650 They are computationally challenged, meaning, at the 614 00:28:23,650 --> 00:28:25,700 time they were invented, they were perfectly good sorting 615 00:28:25,700 --> 00:28:27,170 algorithms, there are better ones, we're going to see a 616 00:28:27,170 --> 00:28:29,010 much better one next time around, but this is a good way 617 00:28:29,010 --> 00:28:30,900 to just start thinking about how to do the algorithm, or 618 00:28:30,900 --> 00:28:32,360 how to do the sort. 619 00:28:32,360 --> 00:28:33,060 Blah, try again. 620 00:28:33,060 --> 00:28:34,560 How to do this sort. 621 00:28:34,560 --> 00:28:38,640 So the first one I want to talk about is what's called 622 00:28:38,640 --> 00:28:40,940 selection sort. 623 00:28:40,940 --> 00:28:50,330 And it's on your handout, and I'm going to bring the code up 624 00:28:50,330 --> 00:28:53,310 here, you can see it, it's called sel sort, just for 625 00:28:53,310 --> 00:28:54,160 selection sort. 626 00:28:54,160 --> 00:28:59,060 And let's take a look at what this does. 627 00:28:59,060 --> 00:28:59,220 OK. 628 00:28:59,220 --> 00:29:01,010 And in fact I think the easy way to look at what this 629 00:29:01,010 --> 00:29:02,690 does-- boy. 630 00:29:02,690 --> 00:29:03,700 My jokes are that bad. 631 00:29:03,700 --> 00:29:04,510 Wow-- 632 00:29:04,510 --> 00:29:04,790 All right. 633 00:29:04,790 --> 00:29:07,535 I think the easiest way to look at what this does, is 634 00:29:07,535 --> 00:29:10,790 let's take a really simple example-- 635 00:29:10,790 --> 00:29:20,690 I want to make sure I put the right things out-- 636 00:29:20,690 --> 00:29:23,040 I've got a simple little list of values there. 
637 00:29:23,040 --> 00:29:25,660 And if I look at this code, I'm going to run over a loop, 638 00:29:25,660 --> 00:29:28,930 you can see that there, i is going to go from zero up to 639 00:29:28,930 --> 00:29:34,720 the length minus 1, and I'm going to keep track of a 640 00:29:34,720 --> 00:29:35,700 couple of variables. 641 00:29:35,700 --> 00:29:42,670 Min index, I think I called it min val. 642 00:29:42,670 --> 00:29:42,810 OK. 643 00:29:42,810 --> 00:29:43,780 Let's simulate the code. 644 00:29:43,780 --> 00:29:44,960 Let's see what it's doing here. 645 00:29:44,960 --> 00:29:47,780 All right, so we start off. 646 00:29:47,780 --> 00:29:53,110 Initially i-- ah, let me do it this way, i is going to point 647 00:29:53,110 --> 00:29:58,780 there, and I want to make sure I do it right, OK-- and min 648 00:29:58,780 --> 00:30:03,330 index is going to point to the value of i, which is there, 649 00:30:03,330 --> 00:30:06,780 and min value is initially going to have the value 1. 650 00:30:06,780 --> 00:30:09,490 So we're simply getting a hold of what's the first value 651 00:30:09,490 --> 00:30:10,160 we've got there. 652 00:30:10,160 --> 00:30:12,000 And then what do we do? 653 00:30:12,000 --> 00:30:18,660 We start with j pointing here, and we can see what this 654 00:30:18,660 --> 00:30:20,840 loop's going to do, right? j is just going to move up. 655 00:30:20,840 --> 00:30:23,180 So it's going to look at the rest of the list, walking 656 00:30:23,180 --> 00:30:25,990 along, and what does it do? 657 00:30:25,990 --> 00:30:27,780 It says, right. 658 00:30:27,780 --> 00:30:30,540 If j is-- well it says until j is less than the length 659 00:30:30,540 --> 00:30:37,050 of L-- it says, if min value is bigger than the thing I'm 660 00:30:37,050 --> 00:30:39,690 looking at, I'm going to do something, all right? 661 00:30:39,690 --> 00:30:40,850 So let's walk this. 662 00:30:40,850 --> 00:30:42,030 Min value is 1. 
663 00:30:42,030 --> 00:30:43,020 Is 1 bigger than 8? 664 00:30:43,020 --> 00:30:43,410 No. 665 00:30:43,410 --> 00:30:44,080 I move j up. 666 00:30:44,080 --> 00:30:44,990 Is 1 bigger than 3? 667 00:30:44,990 --> 00:30:45,510 No. 668 00:30:45,510 --> 00:30:46,440 1 bigger than 6? 669 00:30:46,440 --> 00:30:46,580 No. 670 00:30:46,580 --> 00:30:47,550 1 bigger than 4? 671 00:30:47,550 --> 00:30:47,850 No. 672 00:30:47,850 --> 00:30:50,640 I get to the end of the loop, and I actually do a little bit 673 00:30:50,640 --> 00:30:51,860 of wasted motion there. 674 00:30:51,860 --> 00:30:55,690 And the little bit of wasted motion is, I take the value at 675 00:30:55,690 --> 00:31:00,405 i, store it away temporarily, take the value where min index 676 00:31:00,405 --> 00:31:02,940 is pointing to, put it back in there, and 677 00:31:02,940 --> 00:31:04,900 then swap it around. 678 00:31:04,900 --> 00:31:05,210 OK. 679 00:31:05,210 --> 00:31:11,540 Having done that, let's move i up to here. i is now pointing 680 00:31:11,540 --> 00:31:12,070 at that thing. 681 00:31:12,070 --> 00:31:13,850 Go through the second round of the loop. 682 00:31:13,850 --> 00:31:14,890 OK. 683 00:31:14,890 --> 00:31:15,790 What does that say? 684 00:31:15,790 --> 00:31:21,360 I'm going to change min index to also point there, min value is 685 00:31:21,360 --> 00:31:26,720 8, j starts off here, and I say, OK, is the thing I'm 686 00:31:26,720 --> 00:31:30,330 looking at here smaller than that? 687 00:31:30,330 --> 00:31:31,550 Yes. 688 00:31:31,550 --> 00:31:32,530 Ah-ha. 689 00:31:32,530 --> 00:31:33,710 What does that say to do? 690 00:31:33,710 --> 00:31:41,200 It says, gee, make min index point to there, 691 00:31:41,200 --> 00:31:44,420 min value be 3. 692 00:31:44,420 --> 00:31:46,810 Change j. 693 00:31:46,810 --> 00:31:47,970 Is 6 bigger than 3? 694 00:31:47,970 --> 00:31:48,520 Yes. 695 00:31:48,520 --> 00:31:49,430 Is 4 bigger than 3? 696 00:31:49,430 --> 00:31:50,070 Yes. 
697 00:31:50,070 --> 00:31:51,420 Get to the end. 698 00:31:51,420 --> 00:31:55,210 And when I get to the end, what do I do? 699 00:31:55,210 --> 00:32:01,620 Well, you see, I say, take temp, and store away what's 700 00:32:01,620 --> 00:32:04,300 here, all right? 701 00:32:04,300 --> 00:32:07,290 Which is that value, and then take what min index is 702 00:32:07,290 --> 00:32:16,810 pointing to, and stick it in there, and finally, replace 703 00:32:16,810 --> 00:32:21,550 that value. 704 00:32:21,550 --> 00:32:23,240 OK. 705 00:32:23,240 --> 00:32:24,900 Aren't you glad I'm not a computer? 706 00:32:24,900 --> 00:32:26,890 Slow as hell. 707 00:32:26,890 --> 00:32:29,440 What's this thing doing? 708 00:32:29,440 --> 00:32:34,930 It's walking along the list, looking for the smallest thing 709 00:32:34,930 --> 00:32:37,980 in the back end of the list, keeping track of where it came 710 00:32:37,980 --> 00:32:41,220 from, and swapping it with that spot in 711 00:32:41,220 --> 00:32:42,850 the list. All right? 712 00:32:42,850 --> 00:32:45,340 So in the first case, I didn't have to do any swaps because 1 713 00:32:45,340 --> 00:32:46,150 was the smallest thing. 714 00:32:46,150 --> 00:32:49,700 In the second case, I found the next smallest element and 715 00:32:49,700 --> 00:32:52,550 moved it here, taking what was there and moving it on, in 716 00:32:52,550 --> 00:32:56,230 this case I would swap the 4 and the 8, and in the next case I 717 00:32:56,230 --> 00:32:58,300 wouldn't have to do anything. 718 00:32:58,300 --> 00:32:59,520 Let's check it out. 719 00:32:59,520 --> 00:33:02,650 I've written a little bit of a test script here, so if we 720 00:33:02,650 --> 00:33:07,080 test sel sort, and I've written this so that it's 721 00:33:07,080 --> 00:33:08,830 going to print out what the list is at the end 722 00:33:08,830 --> 00:33:13,480 of each round, OK. 723 00:33:13,480 --> 00:33:16,110 Ah-ha. 
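The walk-through above can be sketched in Python as follows. This is a reconstruction from the spoken description, not the handout code itself: the name `sel_sort`, the variable names, and the choice to print the list after each round are assumptions made for illustration, and the example list is the one implied by the comparisons in the walk-through.

```python
def sel_sort(values):
    """Selection sort in place; prints the list at the end of each round."""
    for i in range(len(values) - 1):
        # Start by assuming the element at position i is the smallest.
        min_index = i
        min_value = values[i]
        j = i + 1
        # Scan the rest of the list for anything smaller.
        while j < len(values):
            if min_value > values[j]:
                min_index = j
                min_value = values[j]
            j += 1
        # Swap the smallest remaining element into position i. When
        # min_index == i this is the "wasted motion" of swapping an
        # element with itself.
        temp = values[i]
        values[i] = values[min_index]
        values[min_index] = temp
        print(values)
    return values


sel_sort([1, 8, 3, 6, 4])
```

With that list, the first round leaves everything in place, the second swaps the 3 and the 8, the third swaps the 4 and the 8, and the last round changes nothing.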
724 00:33:16,110 --> 00:33:17,930 Notice what-- where am I, here-- notice what 725 00:33:17,930 --> 00:33:19,000 happened in this case. 726 00:33:19,000 --> 00:33:22,220 At the end of the first round, I've got the smallest element 727 00:33:22,220 --> 00:33:23,200 at the front. 728 00:33:23,200 --> 00:33:25,340 At the end of the second round, I've got the smallest 729 00:33:25,340 --> 00:33:27,280 two elements at the front, in fact I got all 730 00:33:27,280 --> 00:33:29,340 of them sorted out. 731 00:33:29,340 --> 00:33:31,810 And it actually runs through the loop multiple times, 732 00:33:31,810 --> 00:33:33,180 making sure that it's in the right form. 733 00:33:33,180 --> 00:33:36,710 Let's take another example. 734 00:33:36,710 --> 00:33:39,370 OK. 735 00:33:39,370 --> 00:33:40,950 Smallest element at the front. 736 00:33:40,950 --> 00:33:42,830 Smallest two elements at the front. 737 00:33:42,830 --> 00:33:44,330 Smallest three elements at the front. 738 00:33:44,330 --> 00:33:46,590 Smallest four elements at the front, you get the idea. 739 00:33:46,590 --> 00:33:49,500 Smallest five elements at the front. 740 00:33:49,500 --> 00:33:52,660 So this is a nice little search-- sorry, a nice little 741 00:33:52,660 --> 00:33:53,210 sort algorithm. 742 00:33:53,210 --> 00:33:56,880 And in fact, it's relying on something that we're going to 743 00:33:56,880 --> 00:33:59,200 come back to, called the loop invariant. 744 00:33:59,200 --> 00:34:16,350 Actually, let me put it on this board so you can see it. 745 00:34:16,350 --> 00:34:18,360 The loop invariant: what does the loop invariant mean? 746 00:34:18,360 --> 00:34:21,850 It says, here is a property that is true of this structure 747 00:34:21,850 --> 00:34:23,510 every time through the loop. 
748 00:34:23,510 --> 00:34:26,870 And the loop invariant here is the following: the list is 749 00:34:26,870 --> 00:34:37,150 split, into a prefix or a first part, and a suffix, the 750 00:34:37,150 --> 00:34:48,100 prefix is sorted, the suffix is not, and basically, the 751 00:34:48,100 --> 00:34:50,410 loop starts off with the prefix being nothing and it 752 00:34:50,410 --> 00:34:53,340 keeps increasing the size of the prefix by 1 until it gets 753 00:34:53,340 --> 00:34:55,890 through the entire list, at which point there's nothing in 754 00:34:55,890 --> 00:35:00,200 the suffix and the entire prefix is sorted. 755 00:35:00,200 --> 00:35:01,990 OK? 756 00:35:01,990 --> 00:35:04,114 So you can see that, it's just walking through it, and in 757 00:35:04,114 --> 00:35:06,250 fact if I look at a couple of another-- another couple of 758 00:35:06,250 --> 00:35:09,380 examples, it's been a long day, again, you 759 00:35:09,380 --> 00:35:12,680 can see that property. 760 00:35:12,680 --> 00:35:16,470 You'll also notice that this thing goes through the entire 761 00:35:16,470 --> 00:35:19,345 list, even if the list is sorted before it 762 00:35:19,345 --> 00:35:20,030 gets partway through. 763 00:35:20,030 --> 00:35:22,720 And that you might look at, for example, that first 764 00:35:22,720 --> 00:35:25,720 example, and say, man by this stage it was already sorted, 765 00:35:25,720 --> 00:35:28,230 yet it had to go through and check that the third element 766 00:35:28,230 --> 00:35:30,000 was in the right place, and then the fourth and then the 767 00:35:30,000 --> 00:35:32,430 fifth and then the sixth. 768 00:35:32,430 --> 00:35:34,460 OK. 769 00:35:34,460 --> 00:35:35,740 What order of growth? 770 00:35:35,740 --> 00:35:40,450 What's the complexity of this? 771 00:35:40,450 --> 00:35:43,200 I've got to get rid of this candy. 772 00:35:43,200 --> 00:35:44,050 Anybody help me out? 773 00:35:44,050 --> 00:35:46,690 What's the complexity of this? 
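The prefix/suffix invariant described above can be checked mechanically with assertions. The following is only a sketch under stated assumptions: `sel_sort_checked` is a hypothetical name, the minimum is found with the built-in `min` rather than the explicit scan from the lecture, and the second assertion (nothing in the prefix exceeds anything in the suffix) is an extra condition added here because it is what makes the sorted prefix final.

```python
def sel_sort_checked(values):
    """Selection sort that asserts the loop invariant at the top of each round."""
    for i in range(len(values) - 1):
        prefix, suffix = values[:i], values[i:]
        # Invariant from the lecture: the prefix is sorted.
        assert prefix == sorted(prefix)
        # Added assumption: every prefix element is <= every suffix element.
        assert all(p <= s for p in prefix for s in suffix)
        # Find the index of the smallest element in the suffix and swap it
        # into position i, which grows the sorted prefix by one.
        min_index = min(range(i, len(values)), key=values.__getitem__)
        values[i], values[min_index] = values[min_index], values[i]
    return values
```

If either assertion ever failed, the sort would stop with an `AssertionError` instead of silently returning a wrong answer.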
774 00:35:46,690 --> 00:35:49,010 Sorry, somebody at the back. 775 00:35:49,010 --> 00:35:49,680 n squared. 776 00:35:49,680 --> 00:35:52,810 Yeah, where n is what? 777 00:35:52,810 --> 00:35:54,940 Yeah, and I can't even see who's saying that. 778 00:35:54,940 --> 00:35:56,220 Thank you. 779 00:35:56,220 --> 00:35:57,900 Sorry, I've got the wrong glasses on, but you're 780 00:35:57,900 --> 00:36:00,030 absolutely right, and in case the rest of you didn't hear 781 00:36:00,030 --> 00:36:03,890 it, n squared. 782 00:36:03,890 --> 00:36:05,980 How do I figure that out? 783 00:36:05,980 --> 00:36:09,630 Well I'm looping down the list, right? 784 00:36:09,630 --> 00:36:12,660 I'm walking down the list. So it's certainly at least linear 785 00:36:12,660 --> 00:36:15,230 in the length of the list. For each starting 786 00:36:15,230 --> 00:36:15,940 point, what do I do? 787 00:36:15,940 --> 00:36:19,900 I look at the rest of the list to decide what's the element 788 00:36:19,900 --> 00:36:21,300 to swap into the next place. 789 00:36:21,300 --> 00:36:23,200 Now, you might say, well, wait a minute. 790 00:36:23,200 --> 00:36:26,310 As I keep moving down, that part gets smaller, it's not 791 00:36:26,310 --> 00:36:29,110 always the initial length of the list, and you're right. 792 00:36:29,110 --> 00:36:31,350 But if you do the sums, or if you want to think of it this 793 00:36:31,350 --> 00:36:34,180 way, if you think about this more generally, it's always on 794 00:36:34,180 --> 00:36:37,330 average on the order of the length of the list. So I've got to do n 795 00:36:37,330 --> 00:36:39,160 things n times. 796 00:36:39,160 --> 00:36:42,620 So it's quadratic, in terms of that sort. 797 00:36:42,620 --> 00:36:43,930 OK. 798 00:36:43,930 --> 00:36:45,680 That's one way to do this sort. 799 00:36:45,680 --> 00:36:50,510 Let's do another one. 800 00:36:50,510 --> 00:36:52,180 The second one we're going to do is called bubble sort. 801 00:36:52,180 --> 00:36:55,020 All right? 
802 00:36:55,020 --> 00:36:59,980 And bubble sort is also on your handout. 803 00:36:59,980 --> 00:37:07,970 And you want to take the first of these, let me-- sorry, for 804 00:37:07,970 --> 00:37:10,480 a second let me uncomment that, and let me 805 00:37:10,480 --> 00:37:11,510 comment this out-- 806 00:37:11,510 --> 00:37:19,950 All right, you can see the code for bubble sort there. 807 00:37:19,950 --> 00:37:21,770 Let's just look at it for a second, then we'll try some 808 00:37:21,770 --> 00:37:23,100 examples, and then we'll figure out what 809 00:37:23,100 --> 00:37:25,030 it's actually doing. 810 00:37:25,030 --> 00:37:27,890 So bubble sort, which is right up here. 811 00:37:27,890 --> 00:37:28,630 What's it going to do? 812 00:37:28,630 --> 00:37:32,530 It's going to let j run over the length of the list, all 813 00:37:32,530 --> 00:37:34,530 right, so it's going to start at some point to move down, 814 00:37:34,530 --> 00:37:38,870 and then it's going to let i run over range, that's just 815 00:37:38,870 --> 00:37:43,580 one smaller, and what's it doing there? 816 00:37:43,580 --> 00:37:45,520 It's looking at successive pairs, right? 817 00:37:45,520 --> 00:37:48,680 It's looking at the i'th and the i plus first element, and 818 00:37:48,680 --> 00:37:51,240 it's saying, gee, if the i'th element is bigger than the 819 00:37:51,240 --> 00:37:53,490 i'th plus first element, what's the next set of three 820 00:37:53,490 --> 00:37:55,730 things doing? 821 00:37:55,730 --> 00:37:57,640 Just swapping them, right? 822 00:37:57,640 --> 00:37:59,770 I temporarily hold on to what's in the i'th element so 823 00:37:59,770 --> 00:38:02,940 I can move the i plus first one in, and then replace that 824 00:38:02,940 --> 00:38:05,210 with the i'th element. 825 00:38:05,210 --> 00:38:06,450 OK. 826 00:38:06,450 --> 00:38:08,980 What's this thing doing then, in terms of sorting? 
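The code just described can be sketched in Python as follows. This is a reconstruction from the spoken description rather than the handout itself: the name `bubble_sort` and the printing of the list after each pass are assumptions made for illustration.

```python
def bubble_sort(values):
    """Bubble sort in place; prints the list at the end of each pass."""
    for j in range(len(values)):
        # Walk along successive pairs (i, i + 1).
        for i in range(len(values) - 1):
            if values[i] > values[i + 1]:
                # Swap: hold on to the i'th element, move the (i + 1)'th
                # element down, then put the held element in its place.
                temp = values[i]
                values[i] = values[i + 1]
                values[i + 1] = temp
        print(values)
    return values


bubble_sort([1, 8, 3, 6, 4])
```

Note that the outer variable `j` only counts passes; it is never used inside the loop body, so every pass scans the whole list.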
827 00:38:08,980 --> 00:38:13,230 At the end of the first pass, what could I say about the 828 00:38:13,230 --> 00:38:16,910 result of this thing? 829 00:38:16,910 --> 00:38:25,360 What's the last element in the list look like? 830 00:38:25,360 --> 00:38:28,050 I hate professors who do this. 831 00:38:28,050 --> 00:38:30,180 Well, let's try it. 832 00:38:30,180 --> 00:38:35,700 Let's try a little test. OK? 833 00:38:35,700 --> 00:38:40,850 Test bubble sort-- especially if I could type-- let's run it 834 00:38:40,850 --> 00:38:49,740 on the first list. OK, let's try it on another one. 835 00:38:49,740 --> 00:38:50,910 Oops sorry. 836 00:38:50,910 --> 00:38:53,520 Ah, I didn't want to do it this time, I forgot to do the 837 00:38:53,520 --> 00:38:56,580 following, bear with me. 838 00:38:56,580 --> 00:38:57,890 I gave away my punchline. 839 00:38:57,890 --> 00:38:58,820 Let's try it again. 840 00:38:58,820 --> 00:39:04,440 Test bubble sort. 841 00:39:04,440 --> 00:39:07,180 OK, there's the first run, I'm going to take a different 842 00:39:07,180 --> 00:39:18,720 list. Can you see a pattern there? 843 00:39:18,720 --> 00:39:18,970 Yeah. 844 00:39:18,970 --> 00:39:22,180 STUDENT: The last cell in the list is always going to 845 00:39:22,180 --> 00:39:22,510 [INAUDIBLE] 846 00:39:22,510 --> 00:39:23,400 PROFESSOR ERIC GRIMSON: Yeah. 847 00:39:23,400 --> 00:39:23,560 Why? 848 00:39:23,560 --> 00:39:24,380 You're right, but why? 849 00:39:24,380 --> 00:39:28,940 STUDENT: [UNINTELLIGIBLE PHRASE] 850 00:39:28,940 --> 00:39:29,910 PROFESSOR ERIC GRIMSON: Exactly right. 851 00:39:29,910 --> 00:39:30,670 Thank you. 852 00:39:30,670 --> 00:39:37,090 The observation is, thank you, on the first pass through, the 853 00:39:37,090 --> 00:39:40,110 last element is the biggest thing in the list. On the next 854 00:39:40,110 --> 00:39:43,200 pass through, the next largest element is at the second-to-last point 855 00:39:43,200 --> 00:39:43,280 in 856 00:39:43,280 --> 00:39:43,970 the list. 
OK? 857 00:39:43,970 --> 00:39:45,180 Because what am I doing? 858 00:39:45,180 --> 00:39:46,600 It's called bubble sort because it's literally 859 00:39:46,600 --> 00:39:47,900 bubbling along, right? 860 00:39:47,900 --> 00:39:51,610 I'm walking along the list once, taking two things, and 861 00:39:51,610 --> 00:39:53,510 saying, make sure the biggest one is next. 862 00:39:53,510 --> 00:39:55,810 So wherever the largest element started out in the 863 00:39:55,810 --> 00:39:59,800 list, by the time I get through it, it's at the end. 864 00:39:59,800 --> 00:40:01,760 And then I go back and I start again, and 865 00:40:01,760 --> 00:40:03,030 I do the same thing. 866 00:40:03,030 --> 00:40:03,250 OK. 867 00:40:03,250 --> 00:40:05,340 The next largest element has to end up in 868 00:40:05,340 --> 00:40:06,740 the second last spot. 869 00:40:06,740 --> 00:40:07,290 Et cetera. 870 00:40:07,290 --> 00:40:09,990 All right, so it's called bubble sort because it does 871 00:40:09,990 --> 00:40:12,340 this bubbling up until it gets there. 872 00:40:12,340 --> 00:40:14,070 Now. 873 00:40:14,070 --> 00:40:15,110 What's the order of growth here? 874 00:40:15,110 --> 00:40:19,810 What's the complexity? 875 00:40:19,810 --> 00:40:21,720 I haven't talked to the side of the room in a while, 876 00:40:21,720 --> 00:40:23,160 actually I have. This gentleman has helped me out. 877 00:40:23,160 --> 00:40:23,870 Somebody else help me out. 878 00:40:23,870 --> 00:40:27,970 What's the complexity here? 879 00:40:27,970 --> 00:40:31,700 I must have the wrong glasses on to see a hand. 880 00:40:31,700 --> 00:40:34,160 No help. 881 00:40:34,160 --> 00:40:36,260 Log? 882 00:40:36,260 --> 00:40:38,050 Linear? 883 00:40:38,050 --> 00:40:40,450 Exponential? 884 00:40:40,450 --> 00:40:41,470 Quadratic? 885 00:40:41,470 --> 00:40:43,020 Yeah. 886 00:40:43,020 --> 00:40:44,970 Log. 887 00:40:44,970 --> 00:40:50,160 It's a good think, but why do you think it's log? 
888 00:40:50,160 --> 00:40:50,980 Ah-ha. 889 00:40:50,980 --> 00:40:53,490 It's not a bad instinct, the length is getting shorter each 890 00:40:53,490 --> 00:40:54,400 time, but what's one of the 891 00:40:54,400 --> 00:40:56,260 characteristics of a log algorithm? 892 00:40:56,260 --> 00:40:58,920 It drops in half each time. 893 00:40:58,920 --> 00:41:00,900 So this isn't-- 894 00:41:00,900 --> 00:41:01,230 OK. 895 00:41:01,230 --> 00:41:02,120 And you're also close. 896 00:41:02,120 --> 00:41:04,200 It's going to be linear, but how many times do 897 00:41:04,200 --> 00:41:05,070 I go through this? 898 00:41:05,070 --> 00:41:08,980 All right, I've got to do one pass to bubble the last 899 00:41:08,980 --> 00:41:10,080 element to the end. 900 00:41:10,080 --> 00:41:12,470 I've got to do another pass to bubble the second last element 901 00:41:12,470 --> 00:41:12,720 to the end. 902 00:41:12,720 --> 00:41:14,730 I've got to do another pass. 903 00:41:14,730 --> 00:41:15,800 Huh. 904 00:41:15,800 --> 00:41:19,400 Sounds like a linear number of times I've got to do-- oh 905 00:41:19,400 --> 00:41:20,150 fudge knuckle. 906 00:41:20,150 --> 00:41:23,230 A linear number of things, quadratic. 907 00:41:23,230 --> 00:41:25,220 Right? 908 00:41:25,220 --> 00:41:25,600 OK. 909 00:41:25,600 --> 00:41:32,620 So this is again an example, this was quadratic, and this 910 00:41:32,620 --> 00:41:35,130 one was quadratic. 911 00:41:35,130 --> 00:41:40,690 And I have this, to write it out, this is order the length 912 00:41:40,690 --> 00:41:43,720 of the list squared, OK? 913 00:41:43,720 --> 00:41:44,820 Just to make it clear what we're 914 00:41:44,820 --> 00:41:48,250 actually measuring there. 915 00:41:48,250 --> 00:41:48,360 All 916 00:41:48,360 --> 00:41:48,720 right. 917 00:41:48,720 --> 00:41:50,870 Could we do better? 918 00:41:50,870 --> 00:41:52,110 Sure. 
919 00:41:52,110 --> 00:41:54,530 And in fact, next time we're going to show you that n log n 920 00:41:54,530 --> 00:41:57,050 algorithm, but even with bubble sort, we can do better. 921 00:41:57,050 --> 00:42:00,290 In particular, if I look at those traces, I can certainly 922 00:42:00,290 --> 00:42:03,950 see cases where, man, I already had the list sorted 923 00:42:03,950 --> 00:42:06,350 much earlier on, and yet I kept going back to see if 924 00:42:06,350 --> 00:42:08,620 there was anything else to bubble up. 925 00:42:08,620 --> 00:42:09,870 How would I keep track of that? 926 00:42:09,870 --> 00:42:12,600 Could I take advantage of that? 927 00:42:12,600 --> 00:42:13,860 Sure. 928 00:42:13,860 --> 00:42:16,550 Why don't I just keep track on each pass through the 929 00:42:16,550 --> 00:42:18,720 algorithm whether I have done any swaps? 930 00:42:18,720 --> 00:42:20,210 All right? 931 00:42:20,210 --> 00:42:22,480 Because if I don't do any swaps on a pass through the 932 00:42:22,480 --> 00:42:23,850 algorithm, then it says everything's 933 00:42:23,850 --> 00:42:24,820 in the right order. 934 00:42:24,820 --> 00:42:28,180 And so, in fact, the version that I commented out-- which 935 00:42:28,180 --> 00:42:29,880 is also in your handout and I'm now going to uncomment, 936 00:42:29,880 --> 00:42:38,820 let's get that one out, get rid of this one-- notice the 937 00:42:38,820 --> 00:42:39,810 only change. 938 00:42:39,810 --> 00:42:42,113 I'm going to keep track of a little variable called swap, 939 00:42:42,113 --> 00:42:46,225 it's initially true, and as long as it's true, I'm going 940 00:42:46,225 --> 00:42:49,010 to keep going, but inside of the loop I'm going to set it 941 00:42:49,010 --> 00:42:53,620 to false, and only if I do a swap will I set it to true. 942 00:42:53,620 --> 00:42:56,070 This says, if I go through an entire pass through the list 943 00:42:56,070 --> 00:42:58,410 and nothing gets changed, I'm done.
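[Editor's sketch: the early-exit idea just described can be written roughly as follows. This is a minimal illustration, not the code from the course handout. The function name bubble_sort and the returned pass count are assumptions added here so the effect of the swap flag is visible.]

```python
def bubble_sort(lst):
    """Sort lst in place; stop as soon as a full pass makes no swaps.

    Returns the number of passes made (an illustrative extra, not part
    of the algorithm as described in the lecture).
    """
    swap = True        # initially True so the loop runs at least once
    passes = 0
    while swap:
        swap = False   # assume this pass will change nothing
        passes += 1
        for j in range(len(lst) - 1):
            if lst[j] > lst[j + 1]:
                # neighbours out of order: exchange them and record it
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swap = True
    return passes
```

[On an already sorted list this makes a single pass and stops. On a list in reverse order it still needs about as many passes as there are elements, which is why the worst case stays quadratic.]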
944 00:42:58,410 --> 00:43:09,730 And in fact if I do that, and try test bubble sort, well, in 945 00:43:09,730 --> 00:43:13,080 the first case, looks the same. 946 00:43:13,080 --> 00:43:13,620 Ah. 947 00:43:13,620 --> 00:43:17,660 On the second case, I spot it right away. 948 00:43:17,660 --> 00:43:20,340 On the third case, it takes me the same amount of time. 949 00:43:20,340 --> 00:43:24,210 And the fourth case, when I set it up, I'm done. 950 00:43:24,210 --> 00:43:24,340 OK. 951 00:43:24,340 --> 00:43:25,670 So what's the lesson here? 952 00:43:25,670 --> 00:43:28,420 I can be a little more careful about keeping track of what 953 00:43:28,420 --> 00:43:30,000 goes on inside of that loop. 954 00:43:30,000 --> 00:43:31,930 If I don't have any more work to do, let me just stop. 955 00:43:31,930 --> 00:43:33,230 All right. 956 00:43:33,230 --> 00:43:36,940 Nonetheless, even with this change, what's the order of 957 00:43:36,940 --> 00:43:39,080 growth for bubble sort? 958 00:43:39,080 --> 00:43:40,380 Still quadratic, right? 959 00:43:40,380 --> 00:43:42,360 I'm looking for the worst case behavior, it's still 960 00:43:42,360 --> 00:43:44,630 quadratic, it's quadratic in the length of the list, so I'm 961 00:43:44,630 --> 00:43:47,180 sort of stuck with that. 962 00:43:47,180 --> 00:43:47,560 Now. 963 00:43:47,560 --> 00:43:49,120 Let me ask you one last question, and then 964 00:43:49,120 --> 00:43:51,070 we'll wrap this up. 965 00:43:51,070 --> 00:43:55,420 Which of these algorithms is better? 966 00:43:55,420 --> 00:43:57,020 Insertion sort or bubble sort? 967 00:43:57,020 --> 00:43:59,140 STUDENT: Bubble. 968 00:43:59,140 --> 00:43:59,520 PROFESSOR ERIC GRIMSON: Bubble. 969 00:43:59,520 --> 00:44:00,270 Bubble bubble toil and trouble. 970 00:44:00,270 --> 00:44:01,780 Who said bubble? 971 00:44:01,780 --> 00:44:02,140 Why?
972 00:44:02,140 --> 00:44:04,836 STUDENT: Well, the first one was too inefficient 973 00:44:04,836 --> 00:44:07,195 [UNINTELLIGIBLE] store and compare each one, so 974 00:44:07,195 --> 00:44:15,320 [UNINTELLIGIBLE] 975 00:44:15,320 --> 00:44:16,380 PROFESSOR ERIC GRIMSON: It's not a bad instinct. 976 00:44:16,380 --> 00:44:16,600 Right. 977 00:44:16,600 --> 00:44:19,680 So it-- so, your argument is, bubble is better because it's 978 00:44:19,680 --> 00:44:23,300 essentially not doing all these extra comparisons. 979 00:44:23,300 --> 00:44:25,150 Another way of saying it is, I can do this stop when 980 00:44:25,150 --> 00:44:25,900 I don't need to. 981 00:44:25,900 --> 00:44:26,450 All right? 982 00:44:26,450 --> 00:44:28,120 OK. 983 00:44:28,120 --> 00:44:30,660 Anybody have an opposing opinion? 984 00:44:30,660 --> 00:44:34,260 Wow, this sounds like a presidential debate. 985 00:44:34,260 --> 00:44:35,320 Sorry, I should reward you. 986 00:44:35,320 --> 00:44:37,380 Thank you for that statement. 987 00:44:37,380 --> 00:44:40,160 Anybody have an opposing opinion? 988 00:44:40,160 --> 00:44:41,730 Everybody's answering these things and sitting 989 00:44:41,730 --> 00:44:42,390 way up at the back. 990 00:44:42,390 --> 00:44:44,340 Nice catch. 991 00:44:44,340 --> 00:44:44,720 Yeah. 992 00:44:44,720 --> 00:44:55,160 STUDENT: [INAUDIBLE] 993 00:44:55,160 --> 00:44:55,990 PROFESSOR ERIC GRIMSON: I don't think so, right? 994 00:44:55,990 --> 00:44:57,690 I think selection sort, I still have to go through 995 00:44:57,690 --> 00:45:01,750 multiple times, it was still quadratic, OK, but I think 996 00:45:01,750 --> 00:45:03,540 you're heading towards a direction I want to get at, so 997 00:45:03,540 --> 00:45:05,150 let me prime this a little bit. 998 00:45:05,150 --> 00:45:10,120 How many swaps do I do in general in bubble sort, 999 00:45:10,120 --> 00:45:13,650 compared to selection sort? 1000 00:45:13,650 --> 00:45:14,340 God bless.
1001 00:45:14,340 --> 00:45:18,840 Oh, sorry, that wasn't a sneeze, it was a two? 1002 00:45:18,840 --> 00:45:23,460 How many swaps do I do in bubble sort? 1003 00:45:23,460 --> 00:45:24,430 A lot. 1004 00:45:24,430 --> 00:45:24,840 Right. 1005 00:45:24,840 --> 00:45:27,160 Potentially a lot because I'm constantly doing that, that 1006 00:45:27,160 --> 00:45:29,620 says I'm running that inner loop a whole bunch of times. 1007 00:45:29,620 --> 00:45:34,350 How many swaps do I do in selection sort? 1008 00:45:34,350 --> 00:45:36,190 Once each time. 1009 00:45:36,190 --> 00:45:36,320 Right? 1010 00:45:36,320 --> 00:45:39,130 I only do one swap potentially, it-- though not 1011 00:45:39,130 --> 00:45:40,750 one potentially, each time at the end of 1012 00:45:40,750 --> 00:45:42,450 the loop I do a swap. 1013 00:45:42,450 --> 00:45:45,480 So this actually suggests again, the orders of growth 1014 00:45:45,480 --> 00:45:49,060 are the same, but probably selection sort is a more 1015 00:45:49,060 --> 00:45:51,710 efficient algorithm, because I'm not doing that constant 1016 00:45:51,710 --> 00:45:53,010 amount of work every time around. 1017 00:45:53,010 --> 00:45:55,890 And in fact, if you go look up, you won't see bubble sort 1018 00:45:55,890 --> 00:45:56,780 used very much. 1019 00:45:56,780 --> 00:45:57,240 Most-- 1020 00:45:57,240 --> 00:45:59,140 I shouldn't say most, many computer scientists don't 1021 00:45:59,140 --> 00:46:00,700 think it should be taught, because it's just so 1022 00:46:00,700 --> 00:46:01,770 inefficient. 1023 00:46:01,770 --> 00:46:03,950 I disagree, because it's a clever idea, but it's still 1024 00:46:03,950 --> 00:46:06,310 something that we have to keep track of. 1025 00:46:06,310 --> 00:46:07,770 All right. 
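[Editor's sketch: the swap-count argument above can be checked by instrumenting both sorts. This is an illustration under assumptions, not the handout code: the names bubble_swaps and selection_swaps are hypothetical, and the selection sort here swaps unconditionally at the end of every outer pass, matching the description "each time at the end of the loop I do a swap".]

```python
def bubble_swaps(lst):
    """Return (sorted copy, number of swaps) using bubble sort."""
    lst = list(lst)            # work on a copy
    swaps = 0
    swapped = True
    while swapped:
        swapped = False
        for j in range(len(lst) - 1):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swaps += 1     # potentially many swaps per pass
                swapped = True
    return lst, swaps


def selection_swaps(lst):
    """Return (sorted copy, number of swaps) using selection sort."""
    lst = list(lst)            # work on a copy
    swaps = 0
    for i in range(len(lst) - 1):
        min_index = i
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[min_index]:
                min_index = j  # only remember where the smallest is
        lst[i], lst[min_index] = lst[min_index], lst[i]
        swaps += 1             # exactly one swap per outer pass
    return lst, swaps
```

[For a five-element list in reverse order, the bubble version performs 10 swaps and the selection version performs 4. Both still do a quadratic number of comparisons, so the order of growth is the same; the difference is in the constant amount of work per step.]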
1026 00:46:07,770 --> 00:46:10,140 We haven't gotten to our n log n algorithm, we're going to do 1027 00:46:10,140 --> 00:46:14,150 that next time, but I want to set the stage here by pulling 1028 00:46:14,150 --> 00:46:16,940 out one last piece. 1029 00:46:16,940 --> 00:46:17,330 OK. 1030 00:46:17,330 --> 00:46:19,130 Could we do better in terms of sorting? 1031 00:46:19,130 --> 00:46:20,270 Again, remember what our goal was. 1032 00:46:20,270 --> 00:46:23,300 If we could do sort, then we saw, if we amortized the cost, 1033 00:46:23,300 --> 00:46:25,955 that searching is a lot more efficient if we're searching a 1034 00:46:25,955 --> 00:46:27,060 sorted list. 1035 00:46:27,060 --> 00:46:29,030 How could we do better? 1036 00:46:29,030 --> 00:46:30,500 Let me set the stage. 1037 00:46:30,500 --> 00:46:34,690 I already said, back here, when I used this board, that 1038 00:46:34,690 --> 00:46:36,630 this idea was really important. 1039 00:46:36,630 --> 00:46:43,260 And that's because that is a version of a divide and 1040 00:46:43,260 --> 00:46:48,590 conquer algorithm. 1041 00:46:48,590 --> 00:46:48,800 OK. 1042 00:46:48,800 --> 00:46:51,460 Binary search is perhaps the simplest of the divide and 1043 00:46:51,460 --> 00:46:53,070 conquer algorithms, and what does that mean? 1044 00:46:53,070 --> 00:46:56,230 It says, in order to solve a problem, cut it down to a 1045 00:46:56,230 --> 00:46:58,770 smaller problem and try and solve that one. 1046 00:46:58,770 --> 00:47:01,740 So to just preface what we're going to do next time, what 1047 00:47:01,740 --> 00:47:04,810 would happen if I wanted to do sort, and rather than 1048 00:47:04,810 --> 00:47:09,210 sorting the entire list at once, I broke it into pieces, 1049 00:47:09,210 --> 00:47:12,150 and sorted the pieces, and then just figured out a very 1050 00:47:12,150 --> 00:47:15,240 efficient way to bring those two pieces and merge them back 1051 00:47:15,240 --> 00:47:16,420 together again?
1052 00:47:16,420 --> 00:47:19,310 Where those pieces, I would do the same thing with, I would 1053 00:47:19,310 --> 00:47:23,360 divide them up into smaller chunks, and sort those. 1054 00:47:23,360 --> 00:47:25,760 Is that going to give me a more efficient algorithm? 1055 00:47:25,760 --> 00:47:27,580 And if you come back on Thursday, 1056 00:47:27,580 --> 00:47:29,420 we'll answer that question.
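[Editor's sketch: the idea previewed above (break the list into pieces, sort the pieces the same way, then merge them) might look roughly like this. The lecture defers the actual algorithm and its analysis to the next session, so this is only an assumed illustration; the names merge and divide_and_sort are hypothetical.]

```python
def merge(left, right):
    """Combine two already sorted lists into one sorted list."""
    result = []
    i = j = 0
    # repeatedly take the smaller front element of the two lists
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # one list is exhausted; the rest of the other is already in order
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def divide_and_sort(lst):
    """Return a sorted copy of lst by splitting, sorting and merging."""
    if len(lst) <= 1:
        return list(lst)       # a list of 0 or 1 items is already sorted
    mid = len(lst) // 2
    # sort each half the same way, then merge the two sorted halves
    return merge(divide_and_sort(lst[:mid]), divide_and_sort(lst[mid:]))
```

[Note that merge walks each input list once, so combining two sorted pieces takes time linear in their total length. Whether the whole scheme beats quadratic is the question left for the next lecture.]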