The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR JOHN GUTTAG: In the example we looked at, we had a list of ints. That's actually quite easy to do in constant time. If you think about it, an int is always going to occupy the same amount of space, roughly speaking -- either 32 or 64 bits, depending upon how big an int the language wants to support. So let's just, for the sake of argument, assume an int occupies four units of memory. And I don't care what a unit is. Is a unit 8 bits, 16 bits? It doesn't matter. Four units.

How would we get to the i-th element of the list? What is the location in memory of L[i]? Well, if we know the location of the start of the list -- and certainly we can know that, because our identifier, say L in this case, will point to the start of the list -- then it's simply going to be start plus 4 times i.

My list looks like this. I point to the start. The first element is here, so that's start plus 4 times 0. Makes perfect sense. The second element is here, so that's going to be start plus 4 times 1. Sure enough, this would be location 4, relative to the start of the list, et cetera.

This is a very conventional way to implement lists. But what does its correctness depend upon? It depends upon the fact that each element of the list is of the same size. In this case it's 4, but I don't care if it's 4. If it's 2, it's 2 times i. If it's 58, it's 58 times i. It doesn't matter. What matters is that each element is the same size. So this trick would work for accessing elements of lists of floats, lists of ints -- anything that's of fixed size.
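To make that arithmetic concrete, here is a minimal sketch in Python that I've added; the start address and element width are made-up numbers, purely for illustration of the constant-time computation.

```python
# A minimal sketch of constant-time indexing when every element has a
# fixed size: location of element i = start + element_size * i.
# The start address (1000) and the element size (4 units) are made up.
def location_of(start, element_size, i):
    return start + element_size * i

print(location_of(1000, 4, 0))  # 1000: the first element
print(location_of(1000, 4, 1))  # 1004: the second element
print(location_of(1000, 4, 6))  # 1024: the seventh element
```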
But that's not the way lists are in Python. In Python, I can have a list that contains ints, and floats, and strings, and other lists, and dicts -- almost anything. So in Python it's not this nice picture where the lists are all homogeneous. In many languages they are, by the way, and those languages would implement lists exactly as I've outlined on the board here. But what about languages where they're not, like Python?

One possibility -- and this is probably the oldest way that people used to implement lists -- is the notion of a linked list. These were used way back in the 1960s, when Lisp was first invented. Effectively, what you do there is make every element of the list a pointer to the next element, plus the value. So what it looks like in memory is: we have the list, and this points to the next element, which maybe has a much bigger value field. But that's OK. This points to the next element. Let's say this one, maybe, has a tiny value field. And then at the end of the list I might write None, saying there is no next element. Or nil, in Lisp speak.

But what's the cost here of accessing the i-th element of the list? Somebody? How many steps does it take to find element i?

AUDIENCE: i.

AUDIENCE: i?

PROFESSOR JOHN GUTTAG: i steps, exactly. So for a linked list, finding the i-th element is order i. That's not very good. That won't help me with binary search. Because if this were the case for finding an element of a list in Python, binary search would not be order log of the length of the list; it would be order of the length of the list. Because in the worst case I'd have to visit every element of the list, say, to discover something isn't in it. So this is not what you want to do.
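To make that cost concrete, here is a rough sketch I've added (the class names are mine, not the handout's, and not how Lisp actually represented lists): a linked list in Python whose indexing has to walk i links, so accessing element i is order i.

```python
# A minimal linked-list sketch: each node holds a value and a reference
# (pointer) to the next node. Indexing must follow i links, so it is O(i).
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class LinkedList:
    def __init__(self, values):
        self.head = None
        for v in reversed(values):       # build the chain front to back
            self.head = Node(v, self.head)

    def __getitem__(self, i):
        node = self.head
        for _ in range(i):               # i steps to reach element i
            if node is None:
                raise IndexError(i)
            node = node.next
        if node is None:
            raise IndexError(i)
        return node.value

L = LinkedList([1, 'two', 3.0, [4]])     # heterogeneous values are fine here
print(L[2])                              # 3.0, reached after following 2 links
```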
Instead, Python uses something like the picture in your handout. And the key idea here is one of indirection. So in Python, what a list looks like is a contiguous section of memory holding objects that are all the same size -- because each of those objects is a pointer. We've now separated, in space, the values of the members of the list from the pointers to them. So now it can be very simple. The first element could be big, the second element could be small; we don't care.

Now I'm back to exactly the model we looked at here. If, say, a pointer to someplace in memory is 4 units long, then to find L[i], I use that trick to find the i-th pointer, and then it takes me only one step to follow it to get to the object. So I can now, in constant time, access any object in a list, even though the objects in the list are of varying size. This is the way it's done in all object-oriented programming languages. Does that make sense to everybody?

This concept of indirection is one of the most powerful programming techniques we have. It gets used a lot. My dictionary defines indirection as a lack of straightforwardness and openness, and as a synonym uses deceitfulness. It had this pejorative meaning until about 1950, when computer scientists discovered it and decided it was a wonderful thing. There's a line that's often quoted at people who do algorithms: "All problems in computer science can be solved by another level of indirection." So whenever you're stuck, you add another level of indirection. The caveat is that the one problem that can't be solved by adding another level of indirection is too many levels of indirection, which can be a problem. As you look at certain kinds of memory structures, the fact that you've separated the pointers from the value fields can lead to them being very far apart in memory, which can disturb the behavior of caches and things like that. So in some models of memory this can lead to surprising inefficiency. But most of the time it's really a great implementation technique. And I highly recommend it.
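A small illustration of that indirection, added here for reference (the exact byte counts depend on the interpreter build, but on a typical CPython the two printed sizes are equal): the list object itself stores only references, so its own footprint does not depend on how big the elements are.

```python
import sys

# Two lists of three elements each: one holds tiny ints, the other holds
# large strings. Because a Python list stores only references (pointers),
# the list objects themselves occupy the same amount of memory; the big
# strings live elsewhere and are reached through one level of indirection.
small_elements = [1, 2, 3]
big_elements = ["x" * 100_000, "y" * 100_000, "z" * 100_000]

print(sys.getsizeof(small_elements))  # size of the array of references
print(sys.getsizeof(big_elements))    # the same size on a typical build
print(big_elements[1][:3])            # 'yyy': still constant-time indexing
```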
So that's how we do the trick. Now we can convince ourselves that binary search is indeed order log n. And as we saw Tuesday, logarithmic growth is very slow. So it means we can use binary search to search enormous lists and get the answer very quickly.

All right, there's still one catch. And what's the catch? There's an assumption to binary search. Binary search works only when what assumption is true?

AUDIENCE: It's sorted.

PROFESSOR JOHN GUTTAG: The list is sorted, because binary search depends on that piece of knowledge. So that raises the question: how did it get sorted? Or the other question it raises: if I ask you to search for something, does it make sense to follow the algorithm of (1) sort L, (2) use binary search? Does that make sense? Well, what does it depend upon, whether this makes sense from an efficiency point of view? We know that the binary search step is order log(len(L)). We also know that if the list isn't sorted, we can do the search in order len(L) -- we can always use linear search. So whether or not this is a good idea depends upon whether we can do the sort fast enough. The question on the board is: is order of question mark -- whatever the sort costs -- plus order log(len(L)) less than order len(L)? If it's not, it doesn't make sense to sort first.

So what's the answer to this question? Do we think we can sort a list fast enough? And what would fast enough mean? What would it have to be? For sorting first to be better, we know that we have to be able to sort a list in sublinear time. Can we do that? Alas, the answer is provably no. No matter how clever we are, there is no algorithm that will sort a list in sublinear time. And if you think about it, that makes a lot of sense. Because how can you get a list into ascending or descending order without looking at every element in the list at least once? Logic says you just can't do it. If you're going to put something in order, you're going to have to look at it.
So we know that we have a lower bound on sorting, which is order len(L). And we know that order len(L) plus order log(len(L)) is the same as order len(L), which is not better than linear search. So why do we care? If this is true, why are we interested in things like binary search at all?

The reason is that we're often interested in something called amortized complexity. I know that there are some Course 15 students in the class who will know what amortization means. But maybe not everybody does. The idea here is that if we can sort the list once and end up searching it many times, the cost of the sort can be allocated, a little bit of it, to each of the searches. And if we do enough searches, then in fact it doesn't really matter how long the sort takes. So if we were going to search this list a million times, maybe we don't care about the one-time overhead of sorting it. This kind of amortized analysis is quite common, and it's what we really end up doing most of the time in practice.

So the real question we want to ask is: if we plan on performing k searches -- who knows how long it will take to sort the list -- the total cost will be order of whatever it costs to sort the list, plus k times log(len(L)). Is that less than k times len(L)? If I don't sort it, doing k searches will take this much time. If I do sort it, it will take this much time. The answer to this question, of course, depends upon the complexity of the sort and how big k is. Does that make sense?

In practice, k is often very big. The number of times we access, say, a student record is quite large compared to the number of times students enroll at MIT. So if at the start of each semester we produce a sorted list, it pays off to do the searches. In fact, we don't use a sorted list; we do something more complex. But you understand the concept, I hope.
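Here is a rough way to see that trade-off in code, added as a sketch rather than anything from the lecture; the sizes are arbitrary and the timings will vary by machine. It answers k membership queries either by k linear scans, or by sorting once and then using binary search via the standard bisect module.

```python
import bisect
import random
import time

# Arbitrary sizes for illustration: n list elements, k searches.
n, k = 10_000, 1_000
data = [random.randrange(10 * n) for _ in range(n)]
queries = [random.randrange(10 * n) for _ in range(k)]

# Option 1: k linear searches of the unsorted list, roughly k * n work.
start = time.perf_counter()
hits_linear = sum(q in data for q in queries)
linear_time = time.perf_counter() - start

# Option 2: sort once (n log n), then k binary searches (k log n).
start = time.perf_counter()
sorted_data = sorted(data)
hits_binary = 0
for q in queries:
    i = bisect.bisect_left(sorted_data, q)
    if i < len(sorted_data) and sorted_data[i] == q:
        hits_binary += 1
binary_time = time.perf_counter() - start

assert hits_linear == hits_binary
print(linear_time, binary_time)  # for large enough k, sorting once wins
```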
Now we have to ask: how well can we do that sort? That's what I want to spend most of the rest of today on: how we do sorting, because it is a very common operation.

First of all, let's look at a way we don't do sorting. There was a famous computer scientist who opined on this topic. We can look for him this way. A well-known technique is bubble sort.

Actually, stop. We're going to need sound for this. Do we have sound in the booth? Do we have somebody in the booth? Well, we either have sound or we don't. We'll find out shortly. Other way. Come on, you should know. Oh, there. Thank you.

[VIDEO PLAYBACK]

-Now, it's hard to get a job as President. And you're going through the rigors now. It's also hard to get a job at Google. We have questions, and we ask our candidates questions. And this one is from Larry Schwimmer.

[LAUGHTER]

-You guys think I'm kidding? It's right here. What is the most efficient way to sort a million 32-bit integers?

[LAUGHTER]

-Well, uh.

-I'm sorry, maybe that's not a--

-I think the bubble sort would be the wrong way to go.

[LAUGHTER]

-Come on, who told him this? I didn't see computer science in your background.

-We've got our spies in there.

-OK, let's ask a different interview--

[END VIDEO PLAYBACK]

PROFESSOR JOHN GUTTAG: All right. So, as he sometimes is, the President was correct. Bubble sort, though often discussed, is almost always the wrong answer. So we're not going to talk about bubble sort. I, by the way, know Larry Schwimmer and can believe he did ask that question. But yes, I'm surprised. Someone had obviously warned the President -- actually the then-future President, I think.

Let's look at a different one that's often used, and that's called selection sort. This is about as simple as it gets.
The basic idea of selection sort -- and it's not a very good way to sort, but it is a useful thing to look at because it introduces some ideas -- is that, like many algorithms, it depends upon establishing and maintaining an invariant. An invariant is something that's invariantly true. The invariant we're going to maintain here is this: we're going to have a pointer into the list, and that pointer is going to divide the list into a prefix and a suffix. The invariant we maintain is that the prefix is always sorted.

We'll start where the prefix is empty; it contains none of the list. Then, at each step through the algorithm, we'll decrease the size of the suffix by one element and increase the size of the prefix by one element, while maintaining the invariant. We'll be done when the size of the suffix is 0, and therefore the prefix contains all the elements. And because we've been maintaining this invariant, we know that we have now sorted the list.

So you can think about it. For example, if I have a list that looks like 4, 2, 3, I'll start pointing here, and the prefix, which contains nothing, obeys the invariant. I'll then go through the list, find the smallest element, and swap it with the first element. After that step, the list will look like 2, 4, 3. I'll now point here. My invariant is true: the prefix contains only one element, so it is in ascending order, and I've increased its size by 1. I don't have to look at this element again, because I know by construction it's the smallest. Now I move here, and I look for the smallest element in the suffix, which will be 3. I swap 3 and 4, and then I'm done.

Does that make sense? It's very straightforward. It's, in some sense, the most obvious way to sort a list. And if you look at the code, that's exactly what it does. I've stated the invariant here, and I just go through and sort it.
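The handout code itself isn't reproduced in this transcript, but a minimal sketch consistent with the description -- including the per-iteration print used in the demo that follows -- might look like this:

```python
def selection_sort(L):
    """Sort list L in place, in ascending order.

    Invariant: L[:i] is sorted and holds the i smallest elements of L.
    """
    for i in range(len(L)):
        # Find the index of the smallest element in the suffix L[i:].
        min_index = i
        for j in range(i + 1, len(L)):
            if L[j] < L[min_index]:
                min_index = j
        # Swap it into position i, growing the sorted prefix by one.
        L[i], L[min_index] = L[min_index], L[i]
        print(L)  # show the partially sorted list after each pass
```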
So we can run it. Let's do that. I'm going to sort the list 3, 4, 5, et cetera, 35, 45. I'm going to call selection sort. And I don't think this is in your handout, but just to make it obvious what's going on, at each iteration of the loop I'm going to print the partially sorted list, so we can see what's happening. At the first step, it finds 4 and puts that at the beginning -- actually, it finds 0 and puts it at the beginning, et cetera. All right? So, people see what's going on here? It's essentially doing exactly what I did on the board over there. And when we're done, we have the list completely sorted.

What's the complexity of this? What's the complexity of selection sort? There are two things going on: I'm doing a bunch of comparisons, and I'm doing a bunch of swaps. Since I do at most as many swaps as comparisons -- I never swap without doing a comparison -- we can calculate the complexity by looking at the number of comparisons I'm doing. You can see that in the code as well.

So how many comparisons might I have to do here? The key thing to notice is that on each iteration I'm looking at every element in what? In the list? No -- every element in the suffix. Let's just say n equals the length of the list. So the first time through, I'm going to look at n elements. Then I'm going to look at n minus 1. Then n minus 2. Until I'm done, right? So that's how many operations I'm doing. And what is the order of n plus n minus 1 plus n minus 2, and so on?

Exactly. Order n. So selection sort is order n. Is that right? Somebody said order n. Do you believe it's n? Is this really n? It's not n. What is it? Somebody raise your hand, so I can throw the candy out. Yeah.
AUDIENCE: [INAUDIBLE]

PROFESSOR JOHN GUTTAG: It's not n factorial.

AUDIENCE: n-squared?

PROFESSOR JOHN GUTTAG: You said that with a question mark at the end of your voice.

AUDIENCE: No, it's like the sum of the numbers, which is n times n minus 1 over 2, or something like that.

PROFESSOR JOHN GUTTAG: That's exactly right. It's a little smaller than n-squared, but it's order n-squared. I'm doing a lot of these additions, so I can't ignore all of the extra terms and say they don't matter. It's almost as bad as comparing every element to every other element. So selection sort is order n-squared. And you can see it either by understanding that sum or by looking at the code here -- that will also tip you off.

OK, so now, can we do better? There was a while where people were pretty unsure whether you could do better. But we can. The method we'll look at was invented by John von Neumann, a very famous guy. And back in the '40s, amazingly enough, he viewed this as a kind of divide and conquer algorithm. We've looked at divide and conquer before. What is the general form of divide and conquer? It's a phrase you've heard me use many times -- popularized, by the way, I think, by Machiavelli in The Prince, in a not very nice context.

So here's what we do -- and they're all of the same general shape. We start with step one. Let me get over here and get a full board for this. First, we have to choose a threshold size; let's call it n0, and that will be, essentially, the smallest problem. We can keep dividing, making our problem smaller -- this is what we saw with binary search, for example -- until it's small enough that we say, oh, the heck with it, we'll stop dividing and just solve it directly. So that's n0: the smallest piece we'll divide things down to. The next thing we have to ask ourselves is: how many instances do we create at each division? We have a big problem.
We divide it into smaller problems -- how many are we going to divide it into? -- and we keep dividing until we reach the threshold, where we can solve the pieces directly. And then the third, and most important, part is that we need some algorithm to combine the sub-solutions. It's no good solving the small problems if we don't have some way to combine them to solve the larger problem. We saw that before, and now we're going to see it again -- in particular, in the context of merge sort.

If I use this board, can people see it, or is the screen going to occlude it? Is there anyone who cannot see this board if I write on it? All right then, I will write on it.

Let's first look at this problem. What von Neumann observed in 1945 is that, given two sorted lists -- and amazingly enough, this is still the most popular sorting algorithm, or one of the two most popular, I should say -- you can merge them quickly.

Let's look at an example. I'll take the list 1, 5, 12, 18, 19, and 20. That's list one. And I'll try to merge it with the list 2, 3, 4, and 17. The way you do the merge is you start by comparing the first element of one list to the first element of the other. Then you choose: all right, 1 is smaller than 2, so that will be the first element of the merged list. I'm now done with 1, and I never have to look at it again. The next thing I do is compare 5 and 2, the heads of the two remaining lists, and I say, well, 2 is smaller than 5; I never have to look at 2 again. Then I compare 5 and 3; 3 is smaller; I never have to look at 3 again. I then compare 4 and 5; 4 is smaller. I then compare 5 and 17; 5 is smaller. Et cetera.
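A minimal sketch of that merge step, added here as a reconstruction rather than the handout code: repeatedly compare the heads of the two sorted lists and copy the smaller one to the result.

```python
def merge(left, right):
    """Merge two lists that are already sorted in ascending order.

    Returns a new sorted list containing all the elements of both.
    """
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # compare the two heads
            result.append(left[i])   # copy the smaller element
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])          # at most one of these is non-empty
    result.extend(right[j:])
    return result

print(merge([1, 5, 12, 18, 19, 20], [2, 3, 4, 17]))
# [1, 2, 3, 4, 5, 12, 17, 18, 19, 20]
```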
Now, how many comparisons am I going to do this time? Well, let's first ask: how many elements am I going to copy from these lists into the merged list? Each element gets copied once, right? So the number of copies is order of the length of the list. That's pretty good. That's linear -- that's right at the lower bound. But how many comparisons? That's a little trickier to think about.

AUDIENCE: [INAUDIBLE]

PROFESSOR JOHN GUTTAG: Pardon?

AUDIENCE: At most, the length of the longer list.

PROFESSOR JOHN GUTTAG: At most, the length of the longer list, which we could also claim to be order len of -- I sort of cheated using L when we have two lists, but just think of it as the longer list. So you do at most that many comparisons. Do you think we can do this whole thing in linear time? And the answer is yes. That's our merge. That's a good thing.

Now, that takes care of this step. But now we have to ask: how many times are we going to do a merge? Because remember, this worked because these lists were sorted, and so I only had to compare the front of each list. When I think about how I'm going to do the merge sort, what I'm going to do is take the original list and break it up, break it up, break it up, until I have lists of length 1. Well, those are all trivially sorted. At that point I'll have a bunch of lists of length 1. I'll merge pairs of those; now I'll have sorted lists of length 2. Then I'll merge those, getting sorted lists of length 4. Until, at the end, I'll be merging two lists, each half the length of the original list. Does that make sense to everybody?

Now I have to ask the question: how many rounds of merging am I going to do?

AUDIENCE: Base-2 log of the length of the list.

PROFESSOR JOHN GUTTAG: Yes, I'm going to do log of the length of the list rounds of merging. So, if each round of merging is order n, where n is the length of the list, and there are log n rounds, what's the total complexity of the merge sort?

AUDIENCE: nlog(n).

PROFESSOR JOHN GUTTAG: nlog(n). Thank you.
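One standard way to see that bound, added here as a worked note rather than something from the lecture: let T(n) be the cost of merge sort on a list of length n. Sorting the two halves and then doing a linear-time merge gives the recurrence

T(n) = 2*T(n/2) + c*n, with T(1) constant,

and unrolling it gives log n levels of splitting, with the merges at each level touching all n elements, so the total is about c*n*log(n) -- that is, order nlog(n).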
Let's see, I have to choose a heavy candy because they carry better. Not well enough, though. All right, you can relay it back.

Now let's look at an implementation. Here's the implementation. I don't think you need to look at it in detail -- it's doing exactly what I did on the board. Actually, you do need to look at it in detail, just not in real time. And then there's sort.

Now, there's a little complication here, because I wanted to show another feature to you. For the moment, we'll ignore the complication -- in principle it works, but it's not very bright. I'll use the mouse. What we see here is that whenever you do a sort, you're sorting by some ordering metric. It could be less than; it could be greater than; it could be anything you want. If you're sorting people, you could sort them by weight, or you could sort them by height. You could sort them by, God forbid, GPA -- whatever you want.

So I've written sort to take the ordering as an argument. I've used this funny thing called lambda, which you don't actually have to be responsible for; you're probably never going to need to use it in this course. But it's a way to dynamically build a function on the fly. The function I've built says that the default value of lt is x less than y. Lambda x, y says x and y are the parameters to a function, and the body of the function simply returns the value of x less than y. Nothing very exciting there. What is exciting is having a function as an argument. And that is something that you'll be doing in future problem sets, because it's one of the very powerful and most useful features in Python: using functions as arguments.
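As a tiny illustration I've added (the function name here is made up, not from the handout): a parameter whose default value is built with lambda, and a call that overrides it with a different ordering.

```python
# first_is_smaller is a made-up example name. lt is a function-valued
# parameter; its default is a function built on the fly with lambda.
def first_is_smaller(x, y, lt=lambda a, b: a < b):
    return lt(x, y)

print(first_is_smaller(3, 5))                      # True: uses the default <
print(first_is_smaller(3, 5, lambda a, b: a > b))  # False: a custom ordering
```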
Having got past that, what we see is: we first say, if the length of L is less than 2 -- that's my threshold -- then I just return L, actually a copy of L. Otherwise, I find roughly the middle of L, call sort recursively on the part to the left of the middle and the part to the right of the middle, and then merge them. So I'm going to go all the way down until I get to lists of length 1, and then bubble all the way back up, merging as I go.

So we can see that the depth of the recursion will be log(n), as observed before. This is exactly what we looked at when we looked at binary search: how many times can you divide something in half? log(n) times. And at each level of the recursion we're going to call merge. So this is consistent with the notion that the complexity of the overall algorithm is nlog(n).

Let's run it. And I'm going to print, as we go, what's getting merged. Get rid of this one -- this was our selection sort; we already looked at that.

So what we'll see here is that in the first example I was just sorting a list of integers. Maybe we'll look at that all by itself. I didn't pass in the second argument, so it used the default, less than. It first merged 4 and 5. Then it merged 35 with 4, 5; then 29 with 17; then 58 with 0. And then the longer lists: 17, 29 with 0, 58; and finally 4, 5, 35 with 0, 17, 29, 58. And then we were done. So, indeed, it did a logarithmic number of rounds of merging.

In the next piece of code, I'm taking advantage of the fact that this function can sort lists of different kinds. I'm calling it now with a list of floats, and I am passing in the second argument, which is going to be -- well, just for fun, I wonder what happens if I make this greater than. Let's see what we get. Now you'll note it's sorted in the other order. Because I passed in an ordering that said I want to use a different comparison -- not less than but greater than -- the same code did the sort the other way.

I can do more interesting things.
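Again, the handout code isn't in the transcript; a self-contained sketch consistent with what's described -- a recursive sort that takes the ordering as a functional argument, with less than as the default -- might look like this. The example lists are just illustrations.

```python
def merge(left, right, lt):
    """Merge two lists that are already sorted with respect to lt."""
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if lt(left[i], right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])               # at most one of these is non-empty
    result.extend(right[j:])
    return result

def merge_sort(L, lt=lambda x, y: x < y):
    """Return a new list with the elements of L, ordered by lt."""
    if len(L) < 2:                        # the threshold: nothing to split
        return L[:]                       # return a copy
    middle = len(L) // 2
    left = merge_sort(L[:middle], lt)     # sort the left half
    right = merge_sort(L[middle:], lt)    # sort the right half
    return merge(left, right, lt)         # combine the sub-solutions

print(merge_sort([4, 5, 35, 17, 29, 58, 0]))
print(merge_sort([1.5, 0.2, 3.75], lambda x, y: x > y))  # descending order
```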
So, here I'm assuming I have a list of names. And I've written two ordering functions myself: one that compares the last names first and then the first names, and a different one that compares the first names first and then the last names. And we can look at those. Just to avoid cluttering up the screen, let me get rid of this.

What we can see is that we divided things up the same way initially, but now we got different orderings. If we look at the first ordering I used, we start with Giselle Brady, and then Tom Brady, and then Chancellor Grimson, et cetera. And if we use the second ordering, we see, among other things, that I'm between Giselle and Tom. Not a bad outcome, from my perspective.

But again, there's a lot of flexibility here. By using this functional argument, I can define whatever ordering functions I want and, using the same sort, get lots of different behavior. And you will discover that, in fact, the built-in sort in Python has this kind of flexibility. You will also find, as you write your own programs, that increasingly you'll want to use functions as arguments, because it allows you to write a lot less code to accomplish the same tasks.
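To close with something runnable today, here is a sketch I've added rather than the lecture code: the names are just examples echoing the ones mentioned, and in modern Python 3 the built-in sorted expresses this flexibility through a key function rather than a comparison function.

```python
# Example names (made up, echoing the ones mentioned in lecture).
names = ['Tom Brady', 'Eric Grimson', 'Giselle Brady', 'John Guttag']

# Ordering 1: compare last names first, then first names.
def last_name_first_name(name):
    first, last = name.split()
    return (last, first)

# Ordering 2: compare first names first, then last names.
def first_name_last_name(name):
    first, last = name.split()
    return (first, last)

print(sorted(names, key=last_name_first_name))
# ['Giselle Brady', 'Tom Brady', 'Eric Grimson', 'John Guttag']
print(sorted(names, key=first_name_last_name))
# ['Eric Grimson', 'Giselle Brady', 'John Guttag', 'Tom Brady']
```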