1 00:00:00,790 --> 00:00:03,190 The following content is provided under a Creative 2 00:00:03,190 --> 00:00:04,730 Commons license. 3 00:00:04,730 --> 00:00:07,030 Your support will help MIT OpenCourseWare 4 00:00:07,030 --> 00:00:11,390 continue to offer high quality educational resources for free. 5 00:00:11,390 --> 00:00:13,990 To make a donation or view additional materials 6 00:00:13,990 --> 00:00:17,870 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,870 --> 00:00:18,840 at ocw.mit.edu. 8 00:00:30,550 --> 00:00:34,200 PROFESSOR: So, for the last two lectures 9 00:00:34,200 --> 00:00:36,990 we've been talking about analyzing algorithms, 10 00:00:36,990 --> 00:00:38,430 complexity, orders of growth. 11 00:00:38,430 --> 00:00:41,220 How do we estimate the cost of an algorithm 12 00:00:41,220 --> 00:00:43,320 as the size of the input grows? 13 00:00:43,320 --> 00:00:44,820 And as I've said several times, I'll 14 00:00:44,820 --> 00:00:46,710 say at least once more, how do we also 15 00:00:46,710 --> 00:00:47,910 turn it the other direction? 16 00:00:47,910 --> 00:00:52,230 How do we use thoughts about choices of pieces of algorithm 17 00:00:52,230 --> 00:00:54,000 in terms of implications on the cost 18 00:00:54,000 --> 00:00:56,700 it's going to take us to compute? 19 00:00:56,700 --> 00:01:01,230 We saw last time a set of examples-- constant algorithms, 20 00:01:01,230 --> 00:01:06,390 linear algorithms, logarithmic algorithms, linear algorithms, 21 00:01:06,390 --> 00:01:09,870 quadratic algorithms, exponential algorithms. 
22 00:01:09,870 --> 00:01:11,340 Today, what I'm going to do is fill 23 00:01:11,340 --> 00:01:14,400 in one more piece, a log linear algorithm-- something that's 24 00:01:14,400 --> 00:01:18,060 really a nice kind of algorithm to have-- and use it to talk 25 00:01:18,060 --> 00:01:20,550 about one last class of algorithms that 26 00:01:20,550 --> 00:01:23,520 are really valuable, and those are searching and sorting 27 00:01:23,520 --> 00:01:25,630 algorithms. 28 00:01:25,630 --> 00:01:27,880 So a search algorithm. 29 00:01:27,880 --> 00:01:29,410 Kind of an obvious statement. 30 00:01:29,410 --> 00:01:31,800 You use them all the time when you go to Google or Bing 31 00:01:31,800 --> 00:01:35,670 or whatever your favorite search mechanism on the web is. 32 00:01:35,670 --> 00:01:39,280 It's just a way to find an item or a group of items 33 00:01:39,280 --> 00:01:41,350 from a collection. 34 00:01:41,350 --> 00:01:43,380 If you think about it, that collection 35 00:01:43,380 --> 00:01:47,002 could be either implicit or explicit. 36 00:01:47,002 --> 00:01:48,710 So way back at the beginning of the term, 37 00:01:48,710 --> 00:01:50,660 we saw an example of a search algorithm 38 00:01:50,660 --> 00:01:53,600 when you were looking for square roots. 39 00:01:53,600 --> 00:01:56,590 And we saw simple things like exhaustive enumeration. 40 00:01:56,590 --> 00:01:58,550 We'd go through all the possibilities. 41 00:01:58,550 --> 00:02:00,770 We saw our first version of bisection search 42 00:02:00,770 --> 00:02:02,670 there, where you would do approximations. 43 00:02:02,670 --> 00:02:05,960 Newton-Raphson-- these are all examples of a search algorithm 44 00:02:05,960 --> 00:02:08,020 where the collection is implicit. 45 00:02:08,020 --> 00:02:11,900 So all the numbers between some point and some other point. 46 00:02:11,900 --> 00:02:13,610 More common is a search algorithm 47 00:02:13,610 --> 00:02:16,409 where the collection is explicit.
48 00:02:16,409 --> 00:02:16,950 I don't know. 49 00:02:16,950 --> 00:02:19,791 For example, I've got all the data records of students 50 00:02:19,791 --> 00:02:22,040 and I want to know how do I find a particular student, 51 00:02:22,040 --> 00:02:24,960 so I can record that A plus that everybody in this room 52 00:02:24,960 --> 00:02:27,855 is going to get next Tuesday on that exam? 53 00:02:27,855 --> 00:02:28,730 That's not a promise. 54 00:02:28,730 --> 00:02:29,230 Sorry. 55 00:02:29,230 --> 00:02:30,810 But we'll work on it. 56 00:02:30,810 --> 00:02:33,230 So could do it implicit, could do it explicit. 57 00:02:33,230 --> 00:02:36,950 Today I want to focus on doing search explicitly. 58 00:02:36,950 --> 00:02:39,320 And it could be on different kinds of collections, 59 00:02:39,320 --> 00:02:41,810 but I'm going to focus-- just as an example-- on search 60 00:02:41,810 --> 00:02:42,944 over lists. 61 00:02:42,944 --> 00:02:45,110 And to make it a little easier, let's just do search 62 00:02:45,110 --> 00:02:46,026 over lists of numbers. 63 00:02:46,026 --> 00:02:49,487 But it could obviously be other kinds of elements. 64 00:02:49,487 --> 00:02:51,320 Now you've already seen some of this, right? 65 00:02:51,320 --> 00:02:54,890 We did search where we said, we can do linear search. 66 00:02:54,890 --> 00:02:55,670 Brute force. 67 00:02:55,670 --> 00:02:57,950 Just walk down the list looking at everything 68 00:02:57,950 --> 00:03:00,170 till we either find the thing we're looking for 69 00:03:00,170 --> 00:03:02,120 or we get to the end of the list. 70 00:03:02,120 --> 00:03:04,340 Sometimes also called British Museum algorithm 71 00:03:04,340 --> 00:03:05,540 or exhaustive enumeration. 72 00:03:05,540 --> 00:03:07,860 I go through everything in the list. 73 00:03:07,860 --> 00:03:10,140 Nice news is, the list doesn't have to be sorted. 74 00:03:10,140 --> 00:03:12,420 It could be just in arbitrary order. 
75 00:03:12,420 --> 00:03:16,380 What we saw is that the expected-- sorry, not expected. 76 00:03:16,380 --> 00:03:18,599 The worst case behavior is linear. 77 00:03:18,599 --> 00:03:20,640 In the worst case, the element's not in the list. 78 00:03:20,640 --> 00:03:21,806 I got to look at everything. 79 00:03:21,806 --> 00:03:24,960 So it's going to be linear in terms of complexity. 80 00:03:24,960 --> 00:03:26,790 And then we looked at bisection search, 81 00:03:26,790 --> 00:03:29,740 where we said the list needs to be sorted. 82 00:03:29,740 --> 00:03:32,880 But if it is, we can actually be much more efficient 83 00:03:32,880 --> 00:03:36,270 because we can take advantage of the sorting to cut down 84 00:03:36,270 --> 00:03:38,054 the size of the problem. 85 00:03:38,054 --> 00:03:39,720 And I'll remind you about both of those. 86 00:03:39,720 --> 00:03:42,370 There was our simple little linear search. 87 00:03:42,370 --> 00:03:42,870 Right? 88 00:03:42,870 --> 00:03:45,020 Set a flag that says, I haven't yet found it. 89 00:03:45,020 --> 00:03:48,120 And then just loop over the indices into the list. 90 00:03:48,120 --> 00:03:50,310 I could have also just looped directly over the list 91 00:03:50,310 --> 00:03:53,560 itself, checking to see if the ith member of the list 92 00:03:53,560 --> 00:03:55,440 is the thing I'm looking for. 93 00:03:55,440 --> 00:03:57,150 If it is, change the flag to true 94 00:03:57,150 --> 00:03:59,280 so that when I come out of all of this 95 00:03:59,280 --> 00:04:01,470 I'll return the flag-- either false because it 96 00:04:01,470 --> 00:04:03,902 was set that way initially or true because I found it. 97 00:04:03,902 --> 00:04:06,360 And of course what we knew is we have to look at everything 98 00:04:06,360 --> 00:04:08,510 to see if it's there or not. 99 00:04:08,510 --> 00:04:12,750 I could speed this up by just returning true at this point. 
100 00:04:12,750 --> 00:04:15,630 While that would improve the average case, 101 00:04:15,630 --> 00:04:17,019 doesn't improve the worst case. 102 00:04:17,019 --> 00:04:18,630 And that's the thing we usually are concerned about, 103 00:04:18,630 --> 00:04:21,180 because in the worst case I've got to go through everything. 104 00:04:21,180 --> 00:04:22,830 And just to remind you, we said this 105 00:04:22,830 --> 00:04:24,960 is order length of the list. 106 00:04:24,960 --> 00:04:26,790 To go around this part-- the loop right 107 00:04:26,790 --> 00:04:30,090 here-- and inside the loop, it's constant work. 108 00:04:30,090 --> 00:04:32,460 I'm doing the same number of things each time. 109 00:04:32,460 --> 00:04:35,040 That's order n times order 1. 110 00:04:35,040 --> 00:04:36,790 And by our rules, that's just order n. 111 00:04:36,790 --> 00:04:41,020 So it's linear in the size of the problem. 112 00:04:41,020 --> 00:04:42,980 OK. 113 00:04:42,980 --> 00:04:45,910 We said we could do it on sorted lists. 114 00:04:45,910 --> 00:04:47,950 But just again, we'll walk down the list. 115 00:04:47,950 --> 00:04:50,350 Again, here I could loop over everything in the list, 116 00:04:50,350 --> 00:04:52,310 checking to see if it's the thing I want. 117 00:04:52,310 --> 00:04:53,690 Return true. 118 00:04:53,690 --> 00:04:56,530 And if I ever get to a point where the element of the list 119 00:04:56,530 --> 00:04:59,371 is bigger than the thing I'm looking for, 120 00:04:59,371 --> 00:05:01,120 I know it can't be in the rest of the list 121 00:05:01,120 --> 00:05:03,250 because all the things to the right are bigger yet. 122 00:05:03,250 --> 00:05:06,760 I could just Return false and drop out. 123 00:05:06,760 --> 00:05:08,500 In terms of average behavior, this 124 00:05:08,500 --> 00:05:10,307 is better because it's going to stop 125 00:05:10,307 --> 00:05:11,890 as soon as it gets to a point where it 126 00:05:11,890 --> 00:05:14,060 can rule everything else out. 
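The two linear searches recapped above can be sketched roughly as follows. This is a minimal sketch rather than the code on the lecture slides; the function names linear_search and search_sorted are assumptions chosen for illustration.

```python
def linear_search(L, e):
    """Return True if e is in L. L may be in arbitrary order."""
    found = False                  # flag: haven't found it yet
    for i in range(len(L)):        # loop over the indices into the list
        if L[i] == e:
            found = True           # could return True here to improve the average case
    return found


def search_sorted(L, e):
    """Return True if e is in L. Assumes L is sorted in increasing order."""
    for item in L:
        if item == e:
            return True
        if item > e:               # everything to the right is bigger yet
            return False
    return False                   # reached the end without finding e
```

Both functions go around the loop at most len(L) times with constant work inside, so both are order n in the worst case; the early exit in search_sorted only helps the average case.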
127 00:05:14,060 --> 00:05:18,490 But in terms of complexity, it's still order n. 128 00:05:18,490 --> 00:05:20,674 Because I still on average have-- not average. 129 00:05:20,674 --> 00:05:22,090 In the worst case, I'm still going 130 00:05:22,090 --> 00:05:24,640 to be looking n times through the loop 131 00:05:24,640 --> 00:05:27,490 before I get to a point where I can decide to bail out of it. 132 00:05:27,490 --> 00:05:30,080 So order n. 133 00:05:30,080 --> 00:05:35,260 And then finally-- last piece of recap-- bisection search. 134 00:05:35,260 --> 00:05:35,890 Repeat again. 135 00:05:35,890 --> 00:05:39,160 The idea here is, take the midpoint of the list. 136 00:05:39,160 --> 00:05:40,322 Look at that element. 137 00:05:40,322 --> 00:05:42,030 If it's the thing I'm looking for, great. 138 00:05:42,030 --> 00:05:43,570 I just won the lottery. 139 00:05:43,570 --> 00:05:45,940 If it isn't, decide is the thing I'm 140 00:05:45,940 --> 00:05:49,570 looking for bigger or less than that middle point. 141 00:05:49,570 --> 00:05:53,740 If it's bigger than that, I only use the upper half of the list. 142 00:05:53,740 --> 00:05:57,260 If it's less than that, I only use the lower half of the list. 143 00:05:57,260 --> 00:06:00,870 And the characteristic here was, at each step, 144 00:06:00,870 --> 00:06:03,070 I'm reducing the size of the problem in half. 145 00:06:03,070 --> 00:06:06,460 I'm throwing away half of the remaining list at each step. 146 00:06:06,460 --> 00:06:08,050 And I'll just remind you of that code. 147 00:06:08,050 --> 00:06:10,490 I know it's a lot here, but just to remind you. 148 00:06:10,490 --> 00:06:13,060 It said, down here if I've got an empty list, 149 00:06:13,060 --> 00:06:13,810 it can't be there. 150 00:06:13,810 --> 00:06:15,340 I'm going to Return false. 
151 00:06:15,340 --> 00:06:17,800 Otherwise call this little helper function 152 00:06:17,800 --> 00:06:21,100 with the list, the thing for which I'm searching, 153 00:06:21,100 --> 00:06:24,730 and the beginning and end point indices into the list. 154 00:06:24,730 --> 00:06:27,910 Initially the start and the very end. 155 00:06:27,910 --> 00:06:31,090 And this code up here basically says, 156 00:06:31,090 --> 00:06:34,920 if those two numbers are the same I'm down to a list of one. 157 00:06:34,920 --> 00:06:37,650 Just check to see if it's the thing I'm looking for. 158 00:06:37,650 --> 00:06:40,710 Otherwise, pick something halfway in between. 159 00:06:40,710 --> 00:06:42,210 And ignore this case for the moment. 160 00:06:42,210 --> 00:06:44,430 Basically then check to see, is the thing 161 00:06:44,430 --> 00:06:46,530 at that point bigger than e? 162 00:06:46,530 --> 00:06:48,540 In which case, I'm in general going 163 00:06:48,540 --> 00:06:51,920 to call this only with from the low point to the midpoint. 164 00:06:51,920 --> 00:06:55,230 Otherwise I'm going to call this with the midpoint to high. 165 00:06:55,230 --> 00:06:57,480 And that was just this idea of, keep cutting down 166 00:06:57,480 --> 00:07:00,600 in half the size of the list. 167 00:07:00,600 --> 00:07:02,100 Last piece of the recap-- the thing 168 00:07:02,100 --> 00:07:04,650 we wanted you to see here-- is there are the two recursive 169 00:07:04,650 --> 00:07:05,310 calls. 170 00:07:05,310 --> 00:07:08,640 I'm only going to do one because I'm making a decision. 171 00:07:08,640 --> 00:07:12,570 At each step, I'm cutting down the problem by half. 172 00:07:12,570 --> 00:07:15,360 And that says the number of steps, the number of times 173 00:07:15,360 --> 00:07:17,040 I'm going to iterate through here, 174 00:07:17,040 --> 00:07:19,822 will be log in the length of the list. 
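The bisection search described above might look something like this. It is a sketch under stated assumptions: the names bisect_search and helper are illustrative, and the low == mid check is one way to handle the case the recap asks you to ignore for the moment.

```python
def bisect_search(L, e):
    """Return True if e is in L. Assumes L is sorted in increasing order."""

    def helper(low, high):
        # low and high are indices into L; only these are passed, never a copy of L
        if high == low:                    # down to a list of one element
            return L[low] == e
        mid = (low + high) // 2            # pick something halfway in between
        if L[mid] == e:
            return True
        elif L[mid] > e:
            if low == mid:                 # nothing left below mid to search
                return False
            return helper(low, mid - 1)    # keep only the lower half
        else:
            return helper(mid + 1, high)   # keep only the upper half

    if len(L) == 0:                        # empty list: it can't be there
        return False
    return helper(0, len(L) - 1)
```

Only one of the two recursive calls runs at each step, and each call halves the range between low and high, which is where the logarithmic number of steps comes from.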
175 00:07:19,822 --> 00:07:22,030 And if that still doesn't make sense to you, it says, 176 00:07:22,030 --> 00:07:24,460 I need to know when n over 2 to the k-- where 177 00:07:24,460 --> 00:07:27,370 k is the number of steps-- is equal to 1. 178 00:07:27,370 --> 00:07:29,680 Because in each step, I'm reducing by half. 179 00:07:29,680 --> 00:07:31,850 And that's when k is log base 2 of n. 180 00:07:31,850 --> 00:07:34,970 So that's why it's logarithmic. 181 00:07:34,970 --> 00:07:36,210 And so this just reminds you. 182 00:07:36,210 --> 00:07:38,100 Again, that recap. 183 00:07:38,100 --> 00:07:39,710 Number of calls reduced-- or, sorry. 184 00:07:39,710 --> 00:07:42,000 The call gets reduced by a factor of two each time. 185 00:07:42,000 --> 00:07:44,400 I'm going to have a log n work going around it. 186 00:07:44,400 --> 00:07:46,740 And inside it's a constant amount of work 187 00:07:46,740 --> 00:07:48,660 because I'm just passing the pointers, 188 00:07:48,660 --> 00:07:50,790 I'm not actually copying the list. 189 00:07:50,790 --> 00:07:53,540 And that's a nice state to be in. 190 00:07:53,540 --> 00:07:57,420 OK, so-- sounds good. 191 00:07:57,420 --> 00:07:58,760 Could just use linear search. 192 00:07:58,760 --> 00:07:59,760 It's going to be linear. 193 00:07:59,760 --> 00:08:03,000 When you use binary search or bisection search, 194 00:08:03,000 --> 00:08:04,260 we can do it in log time. 195 00:08:04,260 --> 00:08:05,520 That's great. 196 00:08:05,520 --> 00:08:08,400 We assumed the list was sorted, but all right. 197 00:08:08,400 --> 00:08:11,860 So that then basically says, OK. 198 00:08:11,860 --> 00:08:15,880 So when does it make sense to sort the list 199 00:08:15,880 --> 00:08:17,540 and then do the search? 200 00:08:17,540 --> 00:08:18,370 Right? 201 00:08:18,370 --> 00:08:20,269 Because if I can sort the list cheaply, 202 00:08:20,269 --> 00:08:22,060 then the search is going to be logarithmic.
203 00:08:22,060 --> 00:08:24,680 That's really what I would like. 204 00:08:24,680 --> 00:08:27,210 This little expression basically says, 205 00:08:27,210 --> 00:08:30,020 let's let sort be the cost of sorting the list. 206 00:08:30,020 --> 00:08:33,169 I want to know when that cost plus something that's order 207 00:08:33,169 --> 00:08:36,200 log n-- which is what it's going to cost me to do this search. 208 00:08:36,200 --> 00:08:40,156 When is that less than something that's order n? 209 00:08:40,156 --> 00:08:42,530 Because then it's going to be better to do the sort first 210 00:08:42,530 --> 00:08:43,759 then do the search. 211 00:08:43,759 --> 00:08:45,050 And so I can just rearrange it. 212 00:08:45,050 --> 00:08:47,560 It needs to be, when does the cost of sorting-- when is it 213 00:08:47,560 --> 00:08:49,430 less than this expression? 214 00:08:49,430 --> 00:08:52,555 Which basically says, when is sorting 215 00:08:52,555 --> 00:08:54,555 going to be less expensive than the linear cost? 216 00:08:57,682 --> 00:08:59,192 Crud. 217 00:08:59,192 --> 00:09:00,650 Actually, good news for you, right? 218 00:09:00,650 --> 00:09:02,840 This is a really short lecture. 219 00:09:02,840 --> 00:09:05,180 Because it says it's never true. 220 00:09:05,180 --> 00:09:06,241 Ouch. 221 00:09:06,241 --> 00:09:06,740 Don't worry. 222 00:09:06,740 --> 00:09:09,056 We've got more to go on the lecture. 223 00:09:09,056 --> 00:09:11,180 The reason it can't be true-- if you think about it 224 00:09:11,180 --> 00:09:14,330 just informally-- is, if I've got a collection of n elements 225 00:09:14,330 --> 00:09:17,340 and I want to sort it, I've got to look 226 00:09:17,340 --> 00:09:20,160 at each one of those elements at least once. 227 00:09:20,160 --> 00:09:21,310 Right? 228 00:09:21,310 --> 00:09:23,760 I have to look at them to decide where they go. 229 00:09:23,760 --> 00:09:25,110 Oh, that's n elements.
230 00:09:25,110 --> 00:09:28,020 So sorting must be at least order n, 231 00:09:28,020 --> 00:09:29,940 because I got to look at everything. 232 00:09:29,940 --> 00:09:31,830 And in fact as it says there, I'm 233 00:09:31,830 --> 00:09:36,810 going to have to use at least linear time to do the sort. 234 00:09:36,810 --> 00:09:40,660 Sounds like we're stuck, but we're not. 235 00:09:40,660 --> 00:09:42,730 And the reason is, often when I want 236 00:09:42,730 --> 00:09:46,330 to search something I'm going to do multiple searches, 237 00:09:46,330 --> 00:09:48,160 but I may only want to sort the list once. 238 00:09:48,160 --> 00:09:51,370 In fact, I probably only want to sort the list once. 239 00:09:51,370 --> 00:09:53,860 So in that case, I'm spreading out the cost. 240 00:09:53,860 --> 00:09:57,390 I'm amortizing the expense of the sort. 241 00:09:57,390 --> 00:10:01,260 And now what I want to know is, if I'm going to do k searches, 242 00:10:01,260 --> 00:10:03,120 the cost of those k searches I know 243 00:10:03,120 --> 00:10:07,080 is going to be k log n-- because it's log to do the search. 244 00:10:07,080 --> 00:10:09,360 And I simply need to know, is the cost 245 00:10:09,360 --> 00:10:11,910 of sorting plus this-- can I have 246 00:10:11,910 --> 00:10:15,210 something where it's less than k searches just using 247 00:10:15,210 --> 00:10:16,930 linear search? 248 00:10:16,930 --> 00:10:19,190 And the answer is, yes. 249 00:10:19,190 --> 00:10:21,620 There are going to be, for large k's, ways 250 00:10:21,620 --> 00:10:24,560 in which we can do the sort where the sort time becomes 251 00:10:24,560 --> 00:10:26,360 irrelevant, that the cost is really 252 00:10:26,360 --> 00:10:28,664 dominated by this search. 253 00:10:28,664 --> 00:10:30,830 And so what I want to do now is look at-- all right. 254 00:10:30,830 --> 00:10:34,192 How could we do the sort reasonably efficiently? 255 00:10:34,192 --> 00:10:35,900 It's going to have to be at least linear. 
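The amortization argument above can be written out as a pair of inequalities, followed by a rough worked example. The sort cost of n log2 n and the unit constants in the example are assumptions for illustration only; at this point the lecture has said no more than that sorting is at least linear.

```latex
\begin{align*}
\text{one search:}\quad
  & \mathrm{SORT} + O(\log n) < O(n)
  && \text{never holds, because } \mathrm{SORT} \text{ is at least linear in } n \\
k \text{ searches:}\quad
  & \mathrm{SORT} + k \cdot O(\log n) < k \cdot O(n)
  && \text{can hold once } k \text{ is large enough} \\[6pt]
\text{example (assumed costs):}\quad
  & n = 2^{20},\ \log_2 n = 20,\ \mathrm{SORT} \approx n \log_2 n = 20n \\
  & 20n + 20k < kn
    \iff k > \frac{20n}{n - 20} \approx 20.0004
  && \text{so sorting first pays off from about } k = 21 \text{ searches}
\end{align*}
```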
256 00:10:35,900 --> 00:10:37,760 We're going to see it's going to be a little more than linear. 257 00:10:37,760 --> 00:10:40,730 But if I could do it reasonably, I'm going to be in good shape 258 00:10:40,730 --> 00:10:41,990 here. 259 00:10:41,990 --> 00:10:44,570 So what I want to do is show you a number of ways in which we 260 00:10:44,570 --> 00:10:47,150 can do sorting-- take a list of elements 261 00:10:47,150 --> 00:10:49,830 and sort them from, in this case, smaller to higher 262 00:10:49,830 --> 00:10:52,920 or increasing order. 263 00:10:52,920 --> 00:10:54,350 So here's my goal. 264 00:10:54,350 --> 00:10:56,430 I want to efficiently sort a list. 265 00:10:56,430 --> 00:11:00,930 I want to see if we can do this as efficiently as possible. 266 00:11:00,930 --> 00:11:03,300 I'm going to start, you might say, 267 00:11:03,300 --> 00:11:05,070 with a humorous version of sort. 268 00:11:05,070 --> 00:11:07,470 You're all convinced that my humor is non-existent. 269 00:11:07,470 --> 00:11:08,430 You're right. 270 00:11:08,430 --> 00:11:09,750 But it sets the stage for it. 271 00:11:09,750 --> 00:11:10,230 This is a sort. 272 00:11:10,230 --> 00:11:11,021 You can look it up. 273 00:11:11,021 --> 00:11:14,180 It's called monkey sort, BOGO sort, stupid sort, slow sort, 274 00:11:14,180 --> 00:11:15,990 permutation sort, shotgun sort. 275 00:11:15,990 --> 00:11:17,520 And here's how it works. 276 00:11:17,520 --> 00:11:20,700 Anna has nicely given me a set of numbers on cards here. 277 00:11:20,700 --> 00:11:24,130 Here's how you do BOGO sort. 278 00:11:24,130 --> 00:11:25,380 I got to do that better. 279 00:11:25,380 --> 00:11:29,360 I got to spread them out randomly, like this. 280 00:11:29,360 --> 00:11:29,860 Oh good. 281 00:11:29,860 --> 00:11:30,880 I'm going to have to-- sorry, Tom. 282 00:11:30,880 --> 00:11:31,546 I'm not walking. 283 00:11:31,546 --> 00:11:34,880 And now I pick them up, saying, is that less than this? 
284 00:11:34,880 --> 00:11:37,090 Which is less than-- oh, crud. 285 00:11:37,090 --> 00:11:38,470 They're not sorted. 286 00:11:38,470 --> 00:11:40,250 All right. 287 00:11:40,250 --> 00:11:45,100 I pick them all up and I do it again. 288 00:11:45,100 --> 00:11:47,410 A little brain damage, right? 289 00:11:47,410 --> 00:11:49,220 Now it's intended to get your attention. 290 00:11:49,220 --> 00:11:49,720 I did. 291 00:11:49,720 --> 00:11:50,830 I heard a couple of chuckles. 292 00:11:50,830 --> 00:11:52,204 Those are A students, by the way. 293 00:11:52,204 --> 00:11:54,820 I heard a couple of chuckles here. 294 00:11:54,820 --> 00:11:58,360 We could actually do this exhaustively. 295 00:11:58,360 --> 00:12:00,190 Basically it's called permutation sort 296 00:12:00,190 --> 00:12:03,610 because you could search through all possible permutations 297 00:12:03,610 --> 00:12:06,430 to see if you find something that's sorted. 298 00:12:06,430 --> 00:12:08,830 That, by the way-- the complexity of that 299 00:12:08,830 --> 00:12:12,910 is something like n factorial, which for large n 300 00:12:12,910 --> 00:12:16,480 is n to the nth power. 301 00:12:16,480 --> 00:12:19,040 And if n's anything bigger than about 2, don't do it. 302 00:12:19,040 --> 00:12:19,701 Right? 303 00:12:19,701 --> 00:12:21,700 But it would be a way to think about doing this. 304 00:12:21,700 --> 00:12:22,199 All right. 305 00:12:22,199 --> 00:12:24,700 Now, having caught the humorous version of this, how could 306 00:12:24,700 --> 00:12:27,716 we do this a little bit better? 307 00:12:27,716 --> 00:12:28,350 Oh sorry. 308 00:12:28,350 --> 00:12:29,850 I should say, what's the complexity? 309 00:12:29,850 --> 00:12:33,670 There's a nice crisp definition of BOGO sort. 310 00:12:33,670 --> 00:12:35,890 Its best case is order n, because I just 311 00:12:35,890 --> 00:12:37,870 need to check it's sorted. 
312 00:12:37,870 --> 00:12:41,230 Its average case is n factorial and its worst case, 313 00:12:41,230 --> 00:12:43,702 if I'm just doing it randomly, is God knows. 314 00:12:43,702 --> 00:12:45,410 Because I could be doing it here forever. 315 00:12:45,410 --> 00:12:48,050 So we're going to move on. 316 00:12:48,050 --> 00:12:50,857 Here's a second way to do it called bubble sort. 317 00:12:50,857 --> 00:12:52,940 I'm going to do this with a small version of this. 318 00:12:52,940 --> 00:12:54,064 I'm going to put out a set. 319 00:12:54,064 --> 00:12:57,759 I'll turn these up so you can see them in a second. 320 00:12:57,759 --> 00:12:59,300 The idea of bubble sort is, I'm going 321 00:12:59,300 --> 00:13:02,300 to start at-- I'm going to call this the front end of the list. 322 00:13:02,300 --> 00:13:05,630 And I'm going to walk down, comparing elements pairwise. 323 00:13:05,630 --> 00:13:08,270 And I'm always going to move the larger one over. 324 00:13:08,270 --> 00:13:11,720 So I start here and I say, 1 is less than 11. 325 00:13:11,720 --> 00:13:12,930 I'm OK. 326 00:13:12,930 --> 00:13:14,660 11's bigger than five. 327 00:13:14,660 --> 00:13:16,440 I'm going to bubble that up. 328 00:13:16,440 --> 00:13:17,534 11's bigger than 6. 329 00:13:17,534 --> 00:13:18,700 I'm going to bubble that up. 330 00:13:18,700 --> 00:13:20,750 11's bigger than 2. 331 00:13:20,750 --> 00:13:22,610 I've basically bubbled 11 to the end. 332 00:13:22,610 --> 00:13:23,450 Now I go back here. 333 00:13:23,450 --> 00:13:24,890 I say, 1 is less than 5. 334 00:13:24,890 --> 00:13:25,820 That's good. 335 00:13:25,820 --> 00:13:27,020 5 is less than 6. 336 00:13:27,020 --> 00:13:27,850 That's good. 337 00:13:27,850 --> 00:13:30,150 Ah, 6 is bigger than 2. 338 00:13:30,150 --> 00:13:31,190 Bubble that. 339 00:13:31,190 --> 00:13:33,470 6 is less than 11. 340 00:13:33,470 --> 00:13:37,370 You get the idea-- comparison, comparison, and swap. 
341 00:13:37,370 --> 00:13:40,040 Comparison, comparison. 342 00:13:40,040 --> 00:13:41,990 And now if I go back to this part and do it, 343 00:13:41,990 --> 00:13:44,580 you'll notice that's in the right order. 344 00:13:44,580 --> 00:13:46,290 That's in the right order. 345 00:13:46,290 --> 00:13:48,090 That's in the right order. 346 00:13:48,090 --> 00:13:49,370 That's in the right order. 347 00:13:49,370 --> 00:13:51,370 I'm done. 348 00:13:51,370 --> 00:13:52,760 Small round of applause, please. 349 00:13:52,760 --> 00:13:54,250 I was able to sort five elements. 350 00:13:54,250 --> 00:13:57,250 Thank you. 351 00:13:57,250 --> 00:13:59,210 The little video is showing the same thing. 352 00:13:59,210 --> 00:14:00,650 You can see the idea here. 353 00:14:00,650 --> 00:14:02,233 It's called bubble sort because you're 354 00:14:02,233 --> 00:14:05,020 literally bubbling things up to the end of the list. 355 00:14:05,020 --> 00:14:06,280 It's pretty simple to do. 356 00:14:06,280 --> 00:14:07,659 You're just swapping pairs. 357 00:14:07,659 --> 00:14:09,700 And as you saw, when I get to the end of the list 358 00:14:09,700 --> 00:14:11,860 I go back and do it until I have a pass where 359 00:14:11,860 --> 00:14:14,696 I go all the way through the list and I don't do any swaps. 360 00:14:14,696 --> 00:14:17,320 And in that case I know I'm done because everything's in order, 361 00:14:17,320 --> 00:14:18,860 and I can stop. 362 00:14:18,860 --> 00:14:21,760 One of the properties of it is that the largest unsorted 363 00:14:21,760 --> 00:14:25,480 element is always at the end after the pass. 364 00:14:25,480 --> 00:14:26,966 In other words, after the first one 365 00:14:26,966 --> 00:14:28,840 I know that the largest element's at the end. 366 00:14:28,840 --> 00:14:30,673 After the second one, the largest thing left 367 00:14:30,673 --> 00:14:32,140 is going to be in the next place. 
368 00:14:32,140 --> 00:14:34,540 And that tells me, among other things, 369 00:14:34,540 --> 00:14:37,540 that this is going to take no more than n times 370 00:14:37,540 --> 00:14:40,010 through the list to succeed. 371 00:14:40,010 --> 00:14:42,430 It might actually take fewer than that. 372 00:14:42,430 --> 00:14:44,239 OK. 373 00:14:44,239 --> 00:14:45,780 Again let's look at some code for it. 374 00:14:45,780 --> 00:14:48,155 Let's look at its complexity and let's actually run this. 375 00:14:48,155 --> 00:14:51,850 So here is a little simple version of bubble sort. 376 00:14:51,850 --> 00:14:53,850 I'm going to set a flag up here. 377 00:14:53,850 --> 00:14:56,460 I'm going to call it swap, initially set to false. 378 00:14:56,460 --> 00:14:58,110 That's going to let me tell when I'm 379 00:14:58,110 --> 00:15:00,235 done, when I've gone through everything in the list 380 00:15:00,235 --> 00:15:01,816 without doing a swap. 381 00:15:01,816 --> 00:15:02,940 And then I'm going to loop. 382 00:15:02,940 --> 00:15:05,700 As long as swap is false-- so the first time through it's 383 00:15:05,700 --> 00:15:07,220 going to do that loop. 384 00:15:07,220 --> 00:15:11,160 I set swap initially to true, and notice what I then do. 385 00:15:11,160 --> 00:15:15,690 I let j range from 1 up to the length of the list, 386 00:15:15,690 --> 00:15:19,320 and I look at the jth element and the previous element. 387 00:15:19,320 --> 00:15:23,430 If the previous element is bigger, I'm going to flip them. 388 00:15:23,430 --> 00:15:24,150 Right there. 389 00:15:24,150 --> 00:15:27,300 And that's just doing that swap, what I just did down here. 390 00:15:27,300 --> 00:15:30,840 And if that's the case, I'm going to set the flag to false. 391 00:15:30,840 --> 00:15:33,882 Which says, I've done at least one bubble as part of this.
392 00:15:33,882 --> 00:15:35,340 Which means when I come out of here 393 00:15:35,340 --> 00:15:37,980 and go back around to the loop, it's going to do it again. 394 00:15:37,980 --> 00:15:41,100 And it will do it until all of this 395 00:15:41,100 --> 00:15:43,440 succeeds without this ever being true, in which case 396 00:15:43,440 --> 00:15:45,150 that's true, which makes that false. 397 00:15:45,150 --> 00:15:47,872 And it will drop out. 398 00:15:47,872 --> 00:15:50,370 OK? 399 00:15:50,370 --> 00:15:52,230 Let's look at an example of this running. 400 00:15:52,230 --> 00:15:55,080 It's just to give you a sense of that, 401 00:15:55,080 --> 00:15:57,660 assuming I can find the right place here. 402 00:15:57,660 --> 00:16:01,200 So there is, again, a version of bubble sort on the side. 403 00:16:01,200 --> 00:16:03,630 And I'm going to bring this down to the bottom. I've 404 00:16:03,630 --> 00:16:05,160 got a little test list there. 405 00:16:05,160 --> 00:16:07,270 And I've put a print statement in it. 406 00:16:07,270 --> 00:16:09,630 So you can see each time through the loop, 407 00:16:09,630 --> 00:16:12,140 what's the form of the list as it starts. 408 00:16:12,140 --> 00:16:18,000 And assuming I've done this right-- here you go. 409 00:16:18,000 --> 00:16:20,270 There's the list the first time through. 410 00:16:20,270 --> 00:16:23,957 Notice after one pass, 25's at the end of the list-- 411 00:16:23,957 --> 00:16:24,790 the biggest element. 412 00:16:24,790 --> 00:16:25,680 Exactly what I like. 413 00:16:25,680 --> 00:16:27,920 But you can also see a few other things have flipped. 414 00:16:27,920 --> 00:16:28,420 Right? 415 00:16:28,420 --> 00:16:30,560 Right in there, there have been some other swaps 416 00:16:30,560 --> 00:16:32,180 as it bubbled through. 417 00:16:32,180 --> 00:16:35,790 And in fact, you can see it's-- well, you can see that idea. 418 00:16:35,790 --> 00:16:38,000 You can see 25 moving through.
419 00:16:38,000 --> 00:16:42,230 Notice on the next step, a whole bunch of the list 420 00:16:42,230 --> 00:16:44,041 is actually in the right order. 421 00:16:44,041 --> 00:16:45,290 It's just because I got lucky. 422 00:16:45,290 --> 00:16:47,450 All I can guarantee is that the second largest 423 00:16:47,450 --> 00:16:51,140 element is the second from the end of the list. 424 00:16:51,140 --> 00:16:52,040 But you can see here. 425 00:16:52,040 --> 00:16:53,415 Even though the list is, I think, 426 00:16:53,415 --> 00:16:57,050 nine long, it only took us four passes through. 427 00:16:57,050 --> 00:16:57,860 So this is nice. 428 00:16:57,860 --> 00:17:01,472 It says, at most, n times through the list. 429 00:17:01,472 --> 00:17:02,930 And at the end, we actually get out 430 00:17:02,930 --> 00:17:06,040 something that's in the right form. 431 00:17:06,040 --> 00:17:07,319 OK. 432 00:17:07,319 --> 00:17:09,960 So let's go back to this and basically say, 433 00:17:09,960 --> 00:17:13,650 what's the complexity? 434 00:17:13,650 --> 00:17:15,980 Well that's length n, right? 435 00:17:15,980 --> 00:17:16,480 Has to be. 436 00:17:16,480 --> 00:17:18,760 I'm going through the entire list. 437 00:17:18,760 --> 00:17:24,150 And inside of there is just constant work. 438 00:17:24,150 --> 00:17:25,079 Four operations. 439 00:17:25,079 --> 00:17:26,119 I'm doing a test. 440 00:17:26,119 --> 00:17:26,780 Sorry, five. 441 00:17:26,780 --> 00:17:27,774 I'm doing a test. 442 00:17:27,774 --> 00:17:29,940 And then depending whether that test is true or not, 443 00:17:29,940 --> 00:17:32,050 I'm setting a flag and doing some movement of things around. 444 00:17:32,050 --> 00:17:33,008 But it's just constant. 445 00:17:33,008 --> 00:17:35,210 I don't care about the five. 446 00:17:35,210 --> 00:17:39,190 And there, how many times do I go around the loop? 447 00:17:39,190 --> 00:17:42,120 In the worst case, n. 
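For reference while following the complexity argument, the bubble sort described above can be sketched as below. The swap flag follows the description in the lecture; the passes counter and its return value are additions for illustration and are not part of the code on the slide.

```python
def bubble_sort(L):
    """Sort L in place in increasing order; return the number of passes made."""
    passes = 0                     # illustrative counter, not in the lecture's version
    swap = False                   # False means "not yet known to be done"
    while not swap:
        swap = True                # assume this pass will need no swaps
        passes += 1
        for j in range(1, len(L)):             # inner loop: order n comparisons
            if L[j - 1] > L[j]:                # previous element is bigger
                swap = False                   # did a bubble, so go around again
                L[j - 1], L[j] = L[j], L[j - 1]
    return passes


cards = [1, 11, 5, 6, 2]           # the five cards from the demonstration
print(bubble_sort(cards), cards)   # 4 [1, 2, 5, 6, 11]
```

The outer while loop runs at most about n times and the inner for loop does order n constant-time steps on each pass, which is the nested-loop structure behind the order n squared bound.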
448 00:17:42,120 --> 00:17:44,175 All I can guarantee is, after the first pass 449 00:17:44,175 --> 00:17:45,536 the biggest thing is here. 450 00:17:45,536 --> 00:17:47,910 After the second pass, the second biggest thing is there. 451 00:17:47,910 --> 00:17:50,130 After the third pass-- you get the idea. 452 00:17:50,130 --> 00:17:54,450 So I've got order n things inside the loop, 453 00:17:54,450 --> 00:17:57,400 and I'm doing that loop n times. 454 00:17:57,400 --> 00:17:58,880 And I hope that looks familiar. 455 00:17:58,880 --> 00:18:00,070 We've talked about this. 456 00:18:00,070 --> 00:18:00,570 Right? 457 00:18:00,570 --> 00:18:01,750 This is nested loops. 458 00:18:01,750 --> 00:18:02,920 What's this? 459 00:18:02,920 --> 00:18:05,020 Quadratic. 460 00:18:05,020 --> 00:18:08,920 So it's order n squared, where n is the length of the list. 461 00:18:08,920 --> 00:18:11,420 Now as you also saw, on average, it could be less than that. 462 00:18:11,420 --> 00:18:14,450 But it's going to be order n squared. 463 00:18:14,450 --> 00:18:16,690 OK. 464 00:18:16,690 --> 00:18:18,810 That's one possibility. 465 00:18:18,810 --> 00:18:22,340 Here's a second nice, simple sort algorithm. 466 00:18:22,340 --> 00:18:24,872 It's called selection sort. 467 00:18:24,872 --> 00:18:27,080 You can kind of think of this as going the other way. 468 00:18:27,080 --> 00:18:28,530 Not completely, but going the other way. 469 00:18:28,530 --> 00:18:29,510 And when I say going the other way, 470 00:18:29,510 --> 00:18:32,120 the idea here is that I'm going to find the smallest 471 00:18:32,120 --> 00:18:33,337 element in the list. 472 00:18:33,337 --> 00:18:35,420 And I'm going to stick it at the front of the list 473 00:18:35,420 --> 00:18:38,840 when I'm done, and simply swap that place with whatever 474 00:18:38,840 --> 00:18:39,545 was there. 475 00:18:39,545 --> 00:18:40,739 Flip them.
476 00:18:40,739 --> 00:18:42,530 I might do a few other flips along the way, 477 00:18:42,530 --> 00:18:45,000 depending how I implement this. 478 00:18:45,000 --> 00:18:46,470 Next, pass. 479 00:18:46,470 --> 00:18:47,970 I'm just going to look at everything 480 00:18:47,970 --> 00:18:49,890 but the first element, because I know that one's done. 481 00:18:49,890 --> 00:18:51,181 I'm going to do the same thing. 482 00:18:51,181 --> 00:18:53,490 Find the smallest element remaining in the list, 483 00:18:53,490 --> 00:18:57,380 put it in the second spot, and keep doing that. 484 00:18:57,380 --> 00:18:59,990 What I know is, if I implement this correctly, 485 00:18:59,990 --> 00:19:04,220 after i steps the first i elements of the list 486 00:19:04,220 --> 00:19:05,540 will be sorted. 487 00:19:05,540 --> 00:19:07,130 And everything in the rest of the list 488 00:19:07,130 --> 00:19:09,140 has to be bigger than the largest thing 489 00:19:09,140 --> 00:19:11,600 in the first part of the list. 490 00:19:11,600 --> 00:19:12,660 OK. 491 00:19:12,660 --> 00:19:13,662 So we could build that. 492 00:19:13,662 --> 00:19:15,870 Before we do it, I'm going to show you a little video 493 00:19:15,870 --> 00:19:17,790 starring Professor Guttag. 494 00:19:17,790 --> 00:19:19,920 This is his cameo performance here. 495 00:19:19,920 --> 00:19:22,200 But I want to just show you an example of this 496 00:19:22,200 --> 00:19:24,540 using not numbers, but people. 497 00:19:24,540 --> 00:19:25,240 [VIDEO PLAYBACK] 498 00:19:25,240 --> 00:19:25,740 - All right. 499 00:19:25,740 --> 00:19:29,070 So now we're going to do selection sort. 500 00:19:29,070 --> 00:19:31,530 The idea here is that each step we're 501 00:19:31,530 --> 00:19:34,020 going to select the shortest person 502 00:19:34,020 --> 00:19:38,220 and put them next in line of the sorted group. 
503 00:19:38,220 --> 00:19:41,610 So we'll bring the leftmost person forward, 504 00:19:41,610 --> 00:19:44,290 and we will compare her to everybody else. 505 00:19:44,290 --> 00:19:47,301 So one at a time, step forward. 506 00:19:47,301 --> 00:19:48,300 You're still the winner. 507 00:19:48,300 --> 00:19:50,130 You go back. 508 00:19:50,130 --> 00:19:52,074 Please step forward. 509 00:19:52,074 --> 00:19:54,490 PROFESSOR: And watch the number of comparisons that go on, 510 00:19:54,490 --> 00:19:55,040 by the way. 511 00:19:55,040 --> 00:19:55,550 We're going to come back to that. 512 00:19:55,550 --> 00:19:56,050 - Next. 513 00:19:59,750 --> 00:20:01,930 Still the winner. 514 00:20:01,930 --> 00:20:03,744 Next. 515 00:20:03,744 --> 00:20:04,660 Ah. 516 00:20:04,660 --> 00:20:06,190 A new winner. 517 00:20:06,190 --> 00:20:07,040 All right. 518 00:20:07,040 --> 00:20:10,102 So you can take her place. 519 00:20:10,102 --> 00:20:12,310 PROFESSOR: So here, we're choosing to actually insert 520 00:20:12,310 --> 00:20:13,690 into the spot in the line. We could 521 00:20:13,690 --> 00:20:16,065 have put her back at the front, but either one will work. 522 00:20:16,065 --> 00:20:17,380 - Now we'll compare. 523 00:20:17,380 --> 00:20:18,675 Same old winner. 524 00:20:22,780 --> 00:20:23,530 Same winner. 525 00:20:28,090 --> 00:20:30,020 No change. 526 00:20:30,020 --> 00:20:33,670 It's getting kind of boring. 527 00:20:33,670 --> 00:20:37,050 Don't fall, that-- same winner. 528 00:20:37,050 --> 00:20:39,300 Please. 529 00:20:39,300 --> 00:20:42,450 PROFESSOR: This is a tough one. 530 00:20:42,450 --> 00:20:43,140 - Oh. 531 00:20:43,140 --> 00:20:45,540 Close, but I think you're still shorter. 532 00:20:45,540 --> 00:20:46,240 All right. 533 00:20:46,240 --> 00:20:46,740 Next. 534 00:20:49,810 --> 00:20:53,230 No change, which means you are the first in line. 535 00:20:53,230 --> 00:20:54,296 Congratulations.
536 00:20:54,296 --> 00:20:56,920 PROFESSOR: So, smallest element now going to be the first slot. 537 00:20:56,920 --> 00:20:59,110 - Now you step forward, and we'll compare you. 538 00:21:11,757 --> 00:21:13,340 PROFESSOR: I would invite you to watch 539 00:21:13,340 --> 00:21:14,730 the left hand of the list. 540 00:21:14,730 --> 00:21:17,700 Notice how it is slowly building up at each stage 541 00:21:17,700 --> 00:21:20,040 to have that portion sorted. 542 00:21:23,364 --> 00:21:24,780 And we deliberately admit students 543 00:21:24,780 --> 00:21:28,950 to be of different heights, so John can do this demo. 544 00:21:28,950 --> 00:21:30,090 - You are the winner. 545 00:21:30,090 --> 00:21:32,172 Take your place in line. 546 00:21:32,172 --> 00:21:34,502 Next. 547 00:21:34,502 --> 00:21:35,230 It's you. 548 00:21:41,640 --> 00:21:44,100 And once again, we have a lovely group 549 00:21:44,100 --> 00:21:46,260 of students sorted in height order. 550 00:21:46,260 --> 00:21:47,194 [END PLAYBACK] 551 00:21:47,194 --> 00:21:48,130 [APPLAUSE] 552 00:21:48,130 --> 00:21:49,504 PROFESSOR: And check out-- I want 553 00:21:49,504 --> 00:21:52,010 you to remember number of comparisons-- 55. 554 00:21:52,010 --> 00:21:54,510 Not that the [INAUDIBLE], but I want you to see a comparison 555 00:21:54,510 --> 00:21:56,800 as we go on in a second. 556 00:21:56,800 --> 00:21:58,330 So again, selection sort. 557 00:21:58,330 --> 00:22:00,700 This is this idea of, find the smallest element. 558 00:22:00,700 --> 00:22:02,150 Put it at the front. 559 00:22:02,150 --> 00:22:03,682 I might do a little number of flips, 560 00:22:03,682 --> 00:22:05,140 as you can see, here along the way. 561 00:22:05,140 --> 00:22:07,060 But this is the same animation of that. 562 00:22:07,060 --> 00:22:10,240 So let's first of all convince ourselves 563 00:22:10,240 --> 00:22:13,390 it will do the right thing, and then look at some code, 564 00:22:13,390 --> 00:22:15,814 and then run the code. 
565 00:22:15,814 --> 00:22:17,230 So to convince ourselves that this 566 00:22:17,230 --> 00:22:18,646 is going to do the right thing, we 567 00:22:18,646 --> 00:22:21,740 could talk about something that we often refer to as a loop 568 00:22:21,740 --> 00:22:22,240 invariant. 569 00:22:22,240 --> 00:22:23,320 We're going to write a loop, but we're 570 00:22:23,320 --> 00:22:24,810 going to walk through this. 571 00:22:24,810 --> 00:22:26,590 And the invariant here-- and we want 572 00:22:26,590 --> 00:22:28,720 to just demonstrate if it's true at the beginning 573 00:22:28,720 --> 00:22:29,845 and it's true at each step. 574 00:22:29,845 --> 00:22:32,020 Therefore, by induction as we did earlier, 575 00:22:32,020 --> 00:22:33,910 I can conclude it's true always. 576 00:22:33,910 --> 00:22:37,300 Is that if I'm given the prefix or the first part 577 00:22:37,300 --> 00:22:41,410 of a list from 0 up to i, and a suffix or a second part 578 00:22:41,410 --> 00:22:44,980 of the list from i plus 1 up to the end of the overall list-- 579 00:22:44,980 --> 00:22:48,220 given that, then I want to assert that the invariant is 580 00:22:48,220 --> 00:22:52,240 that the prefix is sorted and no element of the prefix 581 00:22:52,240 --> 00:22:54,799 is larger than the smallest element of the suffix. 582 00:22:54,799 --> 00:22:55,840 Just what I said earlier. 583 00:22:55,840 --> 00:22:59,110 It says, at any stage here-- if this is the amount of sort 584 00:22:59,110 --> 00:23:01,900 I've done so far-- I can guarantee, I'm going to claim, 585 00:23:01,900 --> 00:23:02,900 this will be sorted. 586 00:23:02,900 --> 00:23:07,440 And everything here is bigger than that thing there. 587 00:23:07,440 --> 00:23:09,110 How do I prove it? 588 00:23:09,110 --> 00:23:12,310 Well the base case is really easy. 589 00:23:12,310 --> 00:23:14,240 In the base case, the prefix is empty. 590 00:23:14,240 --> 00:23:16,260 I don't have anything, so it's obviously sorted. 
591 00:23:16,260 --> 00:23:17,843 And everything in the suffix is bigger 592 00:23:17,843 --> 00:23:19,080 than anything in the prefix. 593 00:23:19,080 --> 00:23:20,710 So I'm fine. 594 00:23:20,710 --> 00:23:23,460 And then I just want to say, as long as I write my code so 595 00:23:23,460 --> 00:23:26,730 that this step is true, then I'm going to move the smallest 596 00:23:26,730 --> 00:23:30,420 element from the suffix-- the second part of the list-- 597 00:23:30,420 --> 00:23:32,460 to the end of the prefix. 598 00:23:32,460 --> 00:23:35,820 Since the prefix was sorted, this is now sorted. 599 00:23:35,820 --> 00:23:38,850 And everything in the suffix is still 600 00:23:38,850 --> 00:23:41,940 going to be bigger than everything in the prefix. 601 00:23:41,940 --> 00:23:43,530 And as a consequence, by induction, 602 00:23:43,530 --> 00:23:45,900 this is going to give me something that says it's always 603 00:23:45,900 --> 00:23:48,980 going to be correct. 604 00:23:48,980 --> 00:23:52,600 So here's code that would do that. 605 00:23:52,600 --> 00:23:53,100 Here. 606 00:23:53,100 --> 00:23:55,308 I'm just going to set a little thing called the start 607 00:23:55,308 --> 00:23:56,840 of suffix, or suffix start. 608 00:23:56,840 --> 00:24:00,686 Initially it's going to point to the beginning of the list. 609 00:24:00,686 --> 00:24:02,060 And then I'm going to run a loop. 610 00:24:02,060 --> 00:24:03,920 And as long as I still have things 611 00:24:03,920 --> 00:24:06,110 to search in the list, that that pointer doesn't 612 00:24:06,110 --> 00:24:10,046 point to the end of the list, what am I going to do? 613 00:24:10,046 --> 00:24:11,420 I'm going to loop over everything 614 00:24:11,420 --> 00:24:14,240 from that point to the end of the list, 615 00:24:14,240 --> 00:24:19,020 comparing it to the thing at that point. 616 00:24:19,020 --> 00:24:22,750 If it's less than, I'm going to do a swap 617 00:24:22,750 --> 00:24:24,610 because I wanted to move it up.
618 00:24:24,610 --> 00:24:26,110 And you can see, by the time I get 619 00:24:26,110 --> 00:24:29,290 through this loop I will have found the smallest element 620 00:24:29,290 --> 00:24:31,040 in the remainder of the list. 621 00:24:31,040 --> 00:24:33,640 And I would have put it at that spot, whatever 622 00:24:33,640 --> 00:24:36,740 suffix start points to. 623 00:24:36,740 --> 00:24:41,084 And when I've done all of that, I just change this by one. 624 00:24:41,084 --> 00:24:42,500 Having found the smallest element, 625 00:24:42,500 --> 00:24:43,700 I've stuck it at spot zero. 626 00:24:43,700 --> 00:24:44,450 I'll do the same thing. 627 00:24:44,450 --> 00:24:46,075 Having found the next smallest element, 628 00:24:46,075 --> 00:24:47,480 I know it's at point one. 629 00:24:47,480 --> 00:24:51,084 And I'll just continue around. 630 00:24:51,084 --> 00:24:52,500 One of the things you can see here 631 00:24:52,500 --> 00:24:56,620 is, as opposed to bubble sort, this one 632 00:24:56,620 --> 00:25:00,630 is going to take n times around the loop 633 00:25:00,630 --> 00:25:03,595 because I'm only moving this pointer by one. 634 00:25:03,595 --> 00:25:05,970 So it starts at 0, and then 1, and then 2, all the way up 635 00:25:05,970 --> 00:25:08,650 to n minus 1. 636 00:25:08,650 --> 00:25:12,070 You can also see in this particular implementation, 637 00:25:12,070 --> 00:25:15,010 while I'm certainly ensuring that the smallest element goes 638 00:25:15,010 --> 00:25:19,590 into that spot, I may do a few other flips along the way. 639 00:25:19,590 --> 00:25:22,090 I'm going to find something I think is the smallest element, 640 00:25:22,090 --> 00:25:23,950 put it there and put that element here. 641 00:25:23,950 --> 00:25:25,824 And then when I find another smaller element, 642 00:25:25,824 --> 00:25:27,182 I may do that flip.
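The selection sort just walked through, together with the loop invariant stated earlier, might be sketched as follows. This is a sketch under assumptions rather than the code on the slide: `selection_sort`, `invariant_holds`, `suffix_start`, and the optional `check_invariant` flag are illustrative names, and the invariant check is added here purely to make the correctness argument concrete.

```python
def invariant_holds(values, suffix_start):
    """Check the loop invariant: the prefix values[:suffix_start] is sorted
    and no prefix element is larger than the smallest suffix element.
    (Hypothetical helper, added for illustration.)"""
    prefix = values[:suffix_start]
    suffix = values[suffix_start:]
    prefix_sorted = all(prefix[k] <= prefix[k + 1]
                        for k in range(len(prefix) - 1))
    if not prefix or not suffix:
        # With an empty prefix or suffix, only sortedness can fail.
        return prefix_sorted
    return prefix_sorted and max(prefix) <= min(suffix)


def selection_sort(values, check_invariant=False):
    """Sort values in place (ascending) using selection sort."""
    suffix_start = 0
    while suffix_start != len(values):
        if check_invariant:
            assert invariant_holds(values, suffix_start)
        # Scan the suffix; whenever a smaller element turns up, swap it
        # into position suffix_start (this may cause a few extra flips).
        for i in range(suffix_start, len(values)):
            if values[i] < values[suffix_start]:
                values[suffix_start], values[i] = values[i], values[suffix_start]
        # values[suffix_start] now holds the smallest suffix element.
        suffix_start += 1
```

The outer loop always runs n times, because `suffix_start` advances by exactly one per pass regardless of how sorted the list already is.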
643 00:25:27,182 --> 00:25:29,140 I could have implemented this where I literally 644 00:25:29,140 --> 00:25:32,290 search for the smallest element and only move that. 645 00:25:32,290 --> 00:25:36,190 Doesn't make any difference in terms of the complexity. 646 00:25:36,190 --> 00:25:36,690 All right. 647 00:25:36,690 --> 00:25:39,230 What's the complexity here? 648 00:25:39,230 --> 00:25:40,590 Already said this part. 649 00:25:40,590 --> 00:25:45,170 I will loop n times, because I start at 0 and then 1. 650 00:25:45,170 --> 00:25:46,730 You get the idea. 651 00:25:46,730 --> 00:25:51,260 Inside of the loop I'm going to walk down 652 00:25:51,260 --> 00:25:54,260 the remainder of the list, which is initially n. 653 00:25:54,260 --> 00:25:57,932 And then n minus 1, and then n minus 2 times. 654 00:25:57,932 --> 00:25:59,390 But we've seen that before as well. 655 00:25:59,390 --> 00:26:05,570 While they get shorter, that complexity is still quadratic. 656 00:26:05,570 --> 00:26:08,360 Order n times going through this process. 657 00:26:08,360 --> 00:26:12,850 Within the process, order n things that I have to compare. 658 00:26:12,850 --> 00:26:14,090 And yes, n gets smaller. 659 00:26:14,090 --> 00:26:17,000 But we know that that n term, if you like, dominates. 660 00:26:17,000 --> 00:26:20,980 So again, this is quadratic. 661 00:26:20,980 --> 00:26:21,480 OK. 662 00:26:21,480 --> 00:26:23,563 Before you believe that all sorting algorithms are 663 00:26:23,563 --> 00:26:26,340 quadratic, I want to show you the last one, 664 00:26:26,340 --> 00:26:29,220 the one that actually is one of the-- I think-- 665 00:26:29,220 --> 00:26:31,560 the prettiest algorithms around, and a great example 666 00:26:31,560 --> 00:26:33,180 of a more efficient algorithm. 667 00:26:33,180 --> 00:26:36,180 It's called merge sort. 668 00:26:36,180 --> 00:26:39,080 Merge sort takes an approach we've seen before. 669 00:26:39,080 --> 00:26:40,830 We talked about divide and conquer.
670 00:26:40,830 --> 00:26:44,010 Break the problem down into smaller versions 671 00:26:44,010 --> 00:26:45,540 of the same problem. 672 00:26:45,540 --> 00:26:47,310 And once you've got those solutions, 673 00:26:47,310 --> 00:26:49,670 bring the answer back together. 674 00:26:49,670 --> 00:26:51,460 For merge sort, that's pretty easy. 675 00:26:51,460 --> 00:26:54,950 It says, if I've got a list of 0 or 1 elements, it's sorted. 676 00:26:54,950 --> 00:26:55,940 Duh. 677 00:26:55,940 --> 00:26:56,890 OK. 678 00:26:56,890 --> 00:27:00,495 If I got a list of more than 1 element, here's my trick. 679 00:27:00,495 --> 00:27:03,160 I'm going to split it into two lists. 680 00:27:03,160 --> 00:27:04,740 I'm going to sort them. 681 00:27:04,740 --> 00:27:08,010 And when I'm done, I'm just going to merge those two lists 682 00:27:08,010 --> 00:27:09,510 into one list. 683 00:27:09,510 --> 00:27:11,164 And the merge is easy. 684 00:27:11,164 --> 00:27:13,080 Because if I've got two lists that are sorted, 685 00:27:13,080 --> 00:27:15,904 I just need to look at the first element of each, 686 00:27:15,904 --> 00:27:17,070 take the one that's smaller. 687 00:27:17,070 --> 00:27:19,874 Add it to my result. And keep doing that until one 688 00:27:19,874 --> 00:27:20,790 of the lists is empty. 689 00:27:20,790 --> 00:27:25,260 And then just copy the remainder of the other list. 690 00:27:25,260 --> 00:27:27,137 You can probably already get a sense 691 00:27:27,137 --> 00:27:29,220 of what the cost is going to be here, because this 692 00:27:29,220 --> 00:27:31,510 is cutting the problem in half. 693 00:27:31,510 --> 00:27:32,830 Now I've got two pieces. 694 00:27:32,830 --> 00:27:35,100 So I need to think about both of them. 695 00:27:35,100 --> 00:27:37,350 I want to give you a couple of visualizations of this. 696 00:27:37,350 --> 00:27:38,224 Here's the first one. 697 00:27:38,224 --> 00:27:41,060 It says, basically, I've got a big unsorted list. 
698 00:27:41,060 --> 00:27:42,660 I'm going to split it. 699 00:27:42,660 --> 00:27:43,890 And I'm going to split it. 700 00:27:43,890 --> 00:27:45,420 And I'm going to split it. 701 00:27:45,420 --> 00:27:49,620 Until I get down to just lists that are either 0 or 1, which 702 00:27:49,620 --> 00:27:52,830 by definition are sorted. 703 00:27:52,830 --> 00:27:55,320 And once I'm at that level, then I just 704 00:27:55,320 --> 00:27:58,680 have to merge them into a sorted list 705 00:27:58,680 --> 00:28:01,770 and then merge them pairwise into a sorted list. 706 00:28:01,770 --> 00:28:04,490 And you get the idea. 707 00:28:04,490 --> 00:28:05,780 So it's divide and conquer. 708 00:28:05,780 --> 00:28:08,360 The divide is dividing it up into smaller pieces. 709 00:28:08,360 --> 00:28:12,000 The conquer is merging them back together. 710 00:28:12,000 --> 00:28:15,220 And we have Professor Guttag back for an encore, 711 00:28:15,220 --> 00:28:16,480 together with his students. 712 00:28:16,480 --> 00:28:19,133 So let's show you an example of merge sort. 713 00:28:19,133 --> 00:28:19,940 [VIDEO PLAYBACK] 714 00:28:19,940 --> 00:28:23,120 - So we're about to demonstrate merge sort. 715 00:28:23,120 --> 00:28:26,320 And we're going to sort this rather motley collection of MIT 716 00:28:26,320 --> 00:28:29,410 students by height. 717 00:28:29,410 --> 00:28:32,380 So the first thing we need to do is, 718 00:28:32,380 --> 00:28:36,290 we're going to ask everyone to split into a group of two. 719 00:28:36,290 --> 00:28:38,590 So you split a little bit. 720 00:28:38,590 --> 00:28:40,310 You two are together. 721 00:28:40,310 --> 00:28:41,980 You two are together. 722 00:28:41,980 --> 00:28:43,870 You two are together. 723 00:28:43,870 --> 00:28:44,830 You two are together. 724 00:28:44,830 --> 00:28:46,910 And you are all by yourself. 725 00:28:46,910 --> 00:28:48,215 I'm sorry. 726 00:28:48,215 --> 00:28:49,090 PROFESSOR: Poor Anna. 
727 00:28:49,090 --> 00:28:50,470 - All right. 728 00:28:50,470 --> 00:28:53,830 So now let's take the first group. 729 00:28:53,830 --> 00:28:55,670 Take a step down. 730 00:28:55,670 --> 00:28:58,570 And what we do is, we sort this group by height, 731 00:28:58,570 --> 00:29:01,300 with the shortest on the left. 732 00:29:01,300 --> 00:29:02,020 And look at this. 733 00:29:02,020 --> 00:29:03,670 We don't have to do anything. 734 00:29:03,670 --> 00:29:04,340 Thank you. 735 00:29:04,340 --> 00:29:07,160 Feel free to go back up. 736 00:29:07,160 --> 00:29:09,460 We then sort the next pair. 737 00:29:09,460 --> 00:29:10,720 Please. 738 00:29:10,720 --> 00:29:13,900 And it looks to me like we need to switch. 739 00:29:13,900 --> 00:29:14,650 All right. 740 00:29:14,650 --> 00:29:15,525 Take a step back. 741 00:29:18,260 --> 00:29:26,640 Ladies-- OK. 742 00:29:26,640 --> 00:29:30,926 Ladies, gentlemen-- also OK. 743 00:29:34,640 --> 00:29:37,414 And again, OK. 744 00:29:37,414 --> 00:29:39,330 PROFESSOR: Notice each subgroup is now sorted. 745 00:29:39,330 --> 00:29:39,955 Which is great. 746 00:29:39,955 --> 00:29:42,110 - And I think you're in the correct order. 747 00:29:42,110 --> 00:29:47,690 Now what we do is, we take these groups and merge the groups. 748 00:29:47,690 --> 00:29:51,290 So let's have these two-- going to sort these groups, 749 00:29:51,290 --> 00:29:54,660 have them step forward. 750 00:29:54,660 --> 00:29:56,540 And now what we're doing is, we're 751 00:29:56,540 --> 00:30:00,330 doing a merge of the two sorted groups. 752 00:30:00,330 --> 00:30:03,620 So we start by merging them. 753 00:30:03,620 --> 00:30:06,590 We'll take the leftmost person in this group 754 00:30:06,590 --> 00:30:09,290 and compare her to the first person in this group, 755 00:30:09,290 --> 00:30:09,860 and decide. 756 00:30:09,860 --> 00:30:11,750 She's still the shortest. 757 00:30:11,750 --> 00:30:12,703 Take a step back. 
758 00:30:16,390 --> 00:30:20,450 Now we're going to look at you and say, 759 00:30:20,450 --> 00:30:23,450 you're actually taller than this fellow. 760 00:30:23,450 --> 00:30:25,610 So you now step up there. 761 00:30:29,070 --> 00:30:31,320 And we're good here. 762 00:30:31,320 --> 00:30:32,710 Both of you take a step back. 763 00:30:36,010 --> 00:30:40,330 Now we'll take these two groups and follow the same procedure. 764 00:30:40,330 --> 00:30:41,756 We'll merge them. 765 00:30:41,756 --> 00:30:42,400 Let's see. 766 00:30:42,400 --> 00:30:45,370 We'll compare you-- the first person 767 00:30:45,370 --> 00:30:48,220 in this group to the first person in this group. 768 00:30:48,220 --> 00:30:49,520 Now it's a little tricky. 769 00:30:49,520 --> 00:30:51,910 So let's see, the two of you compare. 770 00:30:51,910 --> 00:30:54,950 Let's see, back to back. 771 00:30:54,950 --> 00:30:56,800 We have a winner. 772 00:30:56,800 --> 00:30:58,120 Step back. 773 00:30:58,120 --> 00:31:01,790 And now we need to compare the shortest person in this group 774 00:31:01,790 --> 00:31:04,210 to the shortest person in this group. 775 00:31:04,210 --> 00:31:04,930 We have a winner. 776 00:31:04,930 --> 00:31:07,700 It's you. 777 00:31:07,700 --> 00:31:09,410 I'm sorry. 778 00:31:09,410 --> 00:31:12,880 And now we just-- we're OK. 779 00:31:12,880 --> 00:31:13,695 Please step back. 780 00:31:17,000 --> 00:31:21,170 Now we'll have these two groups come forward. 781 00:31:21,170 --> 00:31:23,210 We'll compare the shortest person in this group 782 00:31:23,210 --> 00:31:25,410 to the shortest person in that group. 783 00:31:25,410 --> 00:31:27,610 I actually need you guys to get back to back here. 784 00:31:30,450 --> 00:31:32,930 You are the winner. 785 00:31:32,930 --> 00:31:35,540 And it's pretty clear that the shortest person in this group 786 00:31:35,540 --> 00:31:38,550 is shorter than the shortest person in that group. 
787 00:31:38,550 --> 00:31:40,880 So you go there and you step back. 788 00:31:40,880 --> 00:31:42,110 PROFESSOR: Notice the groups. 789 00:31:42,110 --> 00:31:42,890 Now all sorted. 790 00:31:42,890 --> 00:31:44,431 - And now we repeat the same process. 791 00:31:51,840 --> 00:31:53,840 PROFESSOR: And notice how the whole subgroup now 792 00:31:53,840 --> 00:31:55,980 goes up once we know that one group is empty. 793 00:32:04,140 --> 00:32:06,780 - And you can see that we have a group of students 794 00:32:06,780 --> 00:32:09,462 sorted in order by height. 795 00:32:09,462 --> 00:32:10,386 [END PLAYBACK] 796 00:32:10,386 --> 00:32:12,577 [APPLAUSE] 797 00:32:12,577 --> 00:32:14,410 PROFESSOR: Remember the first number, right? 798 00:32:14,410 --> 00:32:16,070 55, 28. 799 00:32:16,070 --> 00:32:17,560 Now it's just numbers but you can 800 00:32:17,560 --> 00:32:20,740 see the expectation is, this is going to take less time. 801 00:32:20,740 --> 00:32:22,550 And it certainly did there. 802 00:32:22,550 --> 00:32:26,460 So again just to demo another way visually. 803 00:32:26,460 --> 00:32:27,700 I'm sorting-- sorry. 804 00:32:27,700 --> 00:32:29,820 I am splitting down until I get small things, 805 00:32:29,820 --> 00:32:31,125 and then just merging them up. 806 00:32:31,125 --> 00:32:33,780 I may have to do multiple passes through here, 807 00:32:33,780 --> 00:32:35,460 but it's going to be hopefully faster 808 00:32:35,460 --> 00:32:37,830 than the other methods we looked at. 809 00:32:37,830 --> 00:32:39,514 I'm going to show you code in a second, 810 00:32:39,514 --> 00:32:41,430 and then we're going to run it just to see it. 811 00:32:41,430 --> 00:32:45,515 But let me stress one more time just the idea of merging. 812 00:32:45,515 --> 00:32:46,390 You can see the idea. 813 00:32:46,390 --> 00:32:48,360 I keep splitting down till I got something small enough. 814 00:32:48,360 --> 00:32:49,420 And I want to merge them back. 
815 00:32:49,420 --> 00:32:52,050 The idea of merging-- you've seen it from Professor Guttag. 816 00:32:52,050 --> 00:32:55,690 But I just want to highlight why this is going to be efficient. 817 00:32:55,690 --> 00:32:59,990 If I've got two lists: list 1 and list 2, 818 00:32:59,990 --> 00:33:01,340 the things left there. 819 00:33:01,340 --> 00:33:03,320 Process is very simple. 820 00:33:03,320 --> 00:33:05,960 I pull out the smallest element of each. 821 00:33:05,960 --> 00:33:07,460 I compare them. 822 00:33:07,460 --> 00:33:11,060 And I simply put the smallest one into the result, 823 00:33:11,060 --> 00:33:13,130 move on in that first list. 824 00:33:13,130 --> 00:33:15,170 So the 1 disappears from that left list. 825 00:33:15,170 --> 00:33:18,260 And now again I pull up just the smallest element of each one, 826 00:33:18,260 --> 00:33:19,670 do the comparison. 827 00:33:19,670 --> 00:33:22,190 Smallest one goes to the end of my result. 828 00:33:22,190 --> 00:33:24,430 And I drop that element from its list. 829 00:33:24,430 --> 00:33:28,750 So I've now taken 1 from list 1 and one from list 2. 830 00:33:28,750 --> 00:33:30,470 You get the idea. 831 00:33:30,470 --> 00:33:33,460 The reason I want to give you this visualization-- sorry. 832 00:33:33,460 --> 00:33:34,460 Let me do the last step. 833 00:33:34,460 --> 00:33:37,610 Once I get to a place where one of the lists is empty, 834 00:33:37,610 --> 00:33:42,200 just copy the rest of the list onto the end. 835 00:33:42,200 --> 00:33:46,250 You can see already a hint of the code. 836 00:33:46,250 --> 00:33:49,040 And that is, that I'm only going to ever look 837 00:33:49,040 --> 00:33:54,110 at each element of each sublist once as I do the merge. 838 00:33:54,110 --> 00:33:55,280 And that's a nice property. 839 00:33:55,280 --> 00:33:56,810 Having had them sorted, I don't need 840 00:33:56,810 --> 00:33:58,460 to do lots of interior comparisons. 
841 00:33:58,460 --> 00:34:00,800 I'm only comparing the ends of the list. 842 00:34:00,800 --> 00:34:03,600 I only, therefore, look at each element-- 843 00:34:03,600 --> 00:34:05,720 the number of comparisons, rather, I should say. 844 00:34:05,720 --> 00:34:07,470 I may look at each element more than once. 845 00:34:07,470 --> 00:34:09,170 The number of comparisons is going 846 00:34:09,170 --> 00:34:12,819 to be, at most, the number of elements in both lists. 847 00:34:12,819 --> 00:34:14,360 And that's going to be a nice cue as we 848 00:34:14,360 --> 00:34:17,029 think about how to solve it. 849 00:34:17,029 --> 00:34:19,570 So here's the code to merge, and then we'll write Merge Sort. 850 00:34:19,570 --> 00:34:21,153 And I know there's a lot of code here, 851 00:34:21,153 --> 00:34:23,830 but we can walk through it and get a good sense of it. 852 00:34:23,830 --> 00:34:26,770 I'm going to set up a variable called Result that's 853 00:34:26,770 --> 00:34:29,260 going to hold my answer. 854 00:34:29,260 --> 00:34:31,480 And I'm going to set up two indices, i and j, that 855 00:34:31,480 --> 00:34:32,260 are initially 0. 856 00:34:32,260 --> 00:34:33,676 They're pointing to the beginning. 857 00:34:33,676 --> 00:34:35,770 And remember, the input here is two lists 858 00:34:35,770 --> 00:34:37,719 that we know are sorted-- or should be sorted, 859 00:34:37,719 --> 00:34:39,670 or we screwed up in some way. 860 00:34:39,670 --> 00:34:42,130 So initially, i and j are both pointing to the beginning 861 00:34:42,130 --> 00:34:44,172 of the left and right list. 862 00:34:44,172 --> 00:34:45,130 And look at what we do. 863 00:34:45,130 --> 00:34:48,491 We say, as long as there's still something in the left list 864 00:34:48,491 --> 00:34:50,199 and still something in the right list-- i 865 00:34:50,199 --> 00:34:51,670 is less than the length of left, j 866 00:34:51,670 --> 00:34:54,389 is less than the length of right.
867 00:34:54,389 --> 00:34:56,420 Do the comparison. 868 00:34:56,420 --> 00:35:00,870 If the left one's smaller, add it to the end of result. 869 00:35:00,870 --> 00:35:02,190 To the end of result, right? 870 00:35:02,190 --> 00:35:05,370 I'm appending it because I want it to be in that sorted order. 871 00:35:05,370 --> 00:35:07,220 And increase i. 872 00:35:07,220 --> 00:35:11,510 If it's not, add the right one to the end of result 873 00:35:11,510 --> 00:35:13,400 and increase j. 874 00:35:13,400 --> 00:35:15,310 And I'll just keep doing that until I 875 00:35:15,310 --> 00:35:16,660 exhaust one of the lists. 876 00:35:16,660 --> 00:35:19,480 And when I do I can basically say, 877 00:35:19,480 --> 00:35:23,110 if the right list is empty, I know if I get out of here 878 00:35:23,110 --> 00:35:24,170 they can't both be true. 879 00:35:24,170 --> 00:35:26,669 In other words, if there's still something in the left list, 880 00:35:26,669 --> 00:35:29,337 just put it on the end. 881 00:35:29,337 --> 00:35:31,670 Otherwise if the only things left are in the right list, 882 00:35:31,670 --> 00:35:34,360 just put them on the end. 883 00:35:34,360 --> 00:35:36,730 So I'm just walking down the list, doing the comparison, 884 00:35:36,730 --> 00:35:39,230 adding the smallest element to my result. And when I'm done, 885 00:35:39,230 --> 00:35:42,810 I just return result. 886 00:35:42,810 --> 00:35:45,137 Complexity we can already begin to see here, right? 887 00:35:45,137 --> 00:35:47,220 This says the left and right sublists are ordered, 888 00:35:47,220 --> 00:35:49,710 so I'm just moving the indices depending on which 889 00:35:49,710 --> 00:35:51,790 one holds the smaller element. 890 00:35:51,790 --> 00:35:56,740 And when I get done, I'm just returning the rest of the list. 891 00:35:56,740 --> 00:35:59,224 So what's the complexity here? 892 00:35:59,224 --> 00:36:01,140 I'm going to do this a little more informally.
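The merge step described above can be sketched along these lines. It is a minimal sketch that assumes both inputs are already sorted in ascending order; the names `merge`, `left`, `right`, `result`, `i`, and `j` follow the spoken description, but the exact code on the slide may differ.

```python
def merge(left, right):
    """Merge two ascending-sorted lists into a new ascending-sorted list.

    Illustrative sketch: assumes both inputs are already sorted.
    """
    result = []
    i, j = 0, 0
    # While both lists still have unmerged elements, compare their
    # current front elements and append the smaller one.
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # At most one of these slices is non-empty: copy whatever remains.
    result.extend(left[i:])
    result.extend(right[j:])
    return result
```

Every element is copied into `result` exactly once, and each comparison moves one element over, so the work is linear in `len(left) + len(right)`.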
893 00:36:01,140 --> 00:36:03,098 You could actually do that kind of relationship 894 00:36:03,098 --> 00:36:03,937 I did last time. 895 00:36:03,937 --> 00:36:04,770 But what am I doing? 896 00:36:04,770 --> 00:36:07,320 I'm going through the two lists, but only one time 897 00:36:07,320 --> 00:36:09,540 through each of those two lists. 898 00:36:09,540 --> 00:36:11,780 I'm only comparing the smallest elements. 899 00:36:11,780 --> 00:36:14,960 So as I already said, this says that the number of elements 900 00:36:14,960 --> 00:36:18,300 I copy will be everything in the left list and everything 901 00:36:18,300 --> 00:36:19,050 in the right list. 902 00:36:19,050 --> 00:36:21,420 So that order is just the length of left 903 00:36:21,420 --> 00:36:23,340 plus the length of right. 904 00:36:23,340 --> 00:36:26,440 And how many comparisons do I do? 905 00:36:26,440 --> 00:36:29,750 The most I have to do is however many are in the longer list. 906 00:36:29,750 --> 00:36:30,250 Right? 907 00:36:30,250 --> 00:36:33,250 That's the maximum number I need to have. 908 00:36:33,250 --> 00:36:33,970 Oh, that's nice. 909 00:36:33,970 --> 00:36:37,720 That says, if the lists are of order n-- I'm doing order n 910 00:36:37,720 --> 00:36:39,580 copies, because order n plus order 911 00:36:39,580 --> 00:36:41,860 n is just 2n, which is order n-- then 912 00:36:41,860 --> 00:36:44,590 I'm doing order n comparisons. 913 00:36:44,590 --> 00:36:47,770 So it's linear in the length of the lists. 914 00:36:47,770 --> 00:36:48,330 OK. 915 00:36:48,330 --> 00:36:50,580 Sounds good. 916 00:36:50,580 --> 00:36:52,320 That just does the merge. 917 00:36:52,320 --> 00:36:54,250 How do I do merge sort? 918 00:36:54,250 --> 00:36:55,740 Well we said it. 919 00:36:55,740 --> 00:36:56,910 Break the problem in half. 920 00:36:56,910 --> 00:36:58,980 Keep doing it until I get sorted lists. 921 00:36:58,980 --> 00:37:00,670 And then grow them back up. 
922 00:37:00,670 --> 00:37:01,710 So there's merge sort. 923 00:37:01,710 --> 00:37:04,770 It says, if the list is either empty or of length 1, 924 00:37:04,770 --> 00:37:07,480 just return a copy of the list. 925 00:37:07,480 --> 00:37:08,980 It's sorted. 926 00:37:08,980 --> 00:37:10,750 Otherwise find the middle point-- 927 00:37:10,750 --> 00:37:13,710 there's that integer division-- and split. 928 00:37:13,710 --> 00:37:16,630 Split the list: take everything up to the middle point 929 00:37:16,630 --> 00:37:17,800 and do merge sort on that. 930 00:37:17,800 --> 00:37:20,520 Split everything in the list from the middle point on. 931 00:37:20,520 --> 00:37:22,000 Do merge sort on that. 932 00:37:22,000 --> 00:37:28,160 And when I get back those two sorted lists, just merge them. 933 00:37:28,160 --> 00:37:30,260 Again, I hope you can see what the order of growth 934 00:37:30,260 --> 00:37:31,730 should be here. 935 00:37:31,730 --> 00:37:35,270 Cutting the problem down in half at each step. 936 00:37:35,270 --> 00:37:37,910 So the number of times I should have to go through this 937 00:37:37,910 --> 00:37:42,047 should be log n in the size of the original list. 938 00:37:42,047 --> 00:37:44,130 And you can see why we call it divide and conquer. 939 00:37:44,130 --> 00:37:45,866 I'm dividing it down into small pieces 940 00:37:45,866 --> 00:37:47,490 until I have a simple solution and then 941 00:37:47,490 --> 00:37:50,910 I'm growing that solution back up. 942 00:37:50,910 --> 00:37:54,720 So there is the base case, there's the divide, 943 00:37:54,720 --> 00:37:58,340 and there's the nice conquer [INAUDIBLE] piece of this. 944 00:37:58,340 --> 00:37:59,412 OK. 945 00:37:59,412 --> 00:38:01,120 I'm going to show you an example of that. 946 00:38:01,120 --> 00:38:03,520 But let's actually look at some code-- sorry about that. 947 00:38:03,520 --> 00:38:05,550 Let's look at some code to do this.
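The base case, divide, and conquer pieces just described might look like this in Python. It is a sketch under assumptions rather than the slide's exact code: `merge_sort` and the compact helper `_merge` are illustrative names, and the helper is included only so the example runs on its own.

```python
def _merge(left, right):
    """Hypothetical helper: linear-time merge of two sorted lists."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result + left[i:] + right[j:]


def merge_sort(L):
    """Return a new sorted list; L itself is not modified."""
    # Base case: an empty or one-element list is already sorted.
    if len(L) < 2:
        return L[:]
    # Divide: integer division finds the middle point.
    middle = len(L) // 2
    left = merge_sort(L[:middle])
    right = merge_sort(L[middle:])
    # Conquer: merge the two sorted halves.
    return _merge(left, right)


print(merge_sort([8, 4, 1, 6, 5, 9, 2, 0]))  # [0, 1, 2, 4, 5, 6, 8, 9]
```

Returning a copy in the base case keeps the function from ever handing back the caller's own list, which matches the "return a copy of the list" wording above.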
948 00:38:05,550 --> 00:38:08,990 And in fact I meant to do this earlier and didn't. 949 00:38:08,990 --> 00:38:13,120 I also have a version of bubble sort here. 950 00:38:13,120 --> 00:38:14,140 Sorry-- selection sort. 951 00:38:14,140 --> 00:38:15,389 I've already done bubble sort. 952 00:38:15,389 --> 00:38:16,590 There is selection sort. 953 00:38:16,590 --> 00:38:21,170 Let's uncomment this. 954 00:38:21,170 --> 00:38:22,970 And let's run both of those and just see 955 00:38:22,970 --> 00:38:25,470 the comparison between them. 956 00:38:25,470 --> 00:38:28,370 Yeah, sorry-- just make that a little easier to read. 957 00:38:28,370 --> 00:38:30,910 There we go. 958 00:38:30,910 --> 00:38:31,990 So we saw bubble sort. 959 00:38:31,990 --> 00:38:34,720 It only went through four times, so less than n times. 960 00:38:34,720 --> 00:38:36,930 There's selection sort. 961 00:38:36,930 --> 00:38:38,920 And as I said to you, it has to do 962 00:38:38,920 --> 00:38:41,890 n passes because it can only ever guarantee that it gets 963 00:38:41,890 --> 00:38:43,840 one element at the beginning. 964 00:38:43,840 --> 00:38:46,930 So you can in fact see, in this case, from the first 965 00:38:46,930 --> 00:38:50,230 or after the initial input until the end of the first step, 966 00:38:50,230 --> 00:38:52,180 it looks like it didn't do anything 967 00:38:52,180 --> 00:38:56,260 because it determined eventually that one was in the right spot. 968 00:38:56,260 --> 00:38:58,240 And similarly I think there's another one 969 00:38:58,240 --> 00:39:00,000 right there where it doesn't do any-- 970 00:39:00,000 --> 00:39:01,340 or appears not to do anything. 971 00:39:01,340 --> 00:39:03,298 All it's guaranteeing is that the next smallest 972 00:39:03,298 --> 00:39:05,242 element is in the right spot. 973 00:39:05,242 --> 00:39:06,700 As we get through to the end of it, 974 00:39:06,700 --> 00:39:09,220 it in fact ends up in the right place.
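To make the "n passes" behavior of selection sort concrete, here is a small sketch. It is not the code run in the lecture: the function name `selection_sort`, the pass counter, and the sample list are all assumptions chosen for illustration.

```python
def selection_sort(L):
    """Sort L in place and return the number of passes made."""
    passes = 0
    for start in range(len(L)):
        # Find the smallest element in the unsorted suffix L[start:].
        smallest = start
        for i in range(start + 1, len(L)):
            if L[i] < L[smallest]:
                smallest = i
        # Put it at the front of the suffix; this may swap an element with itself,
        # which is the "appears not to do anything" case.
        L[start], L[smallest] = L[smallest], L[start]
        passes += 1
    return passes


data = [5, 3, 8, 1]
print(selection_sort(data), data)  # 4 [1, 3, 5, 8]
```

Each pass guarantees only that one more element is in its final position, so the pass count always equals the length of the list, regardless of how sorted the input already is.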
975 00:39:09,220 --> 00:39:10,660 And then let's look at merge sort 976 00:39:10,660 --> 00:39:14,330 and do one more visualization of this. 977 00:39:14,330 --> 00:39:17,750 Again let me remove that. 978 00:39:17,750 --> 00:39:23,110 If we run it-- again, I've just put some print statements 979 00:39:23,110 --> 00:39:24,750 in there. 980 00:39:24,750 --> 00:39:28,100 Here you can see a nice behavior. 981 00:39:28,100 --> 00:39:30,850 I start off calling Merge Sort with that, 982 00:39:30,850 --> 00:39:33,184 which splits down into doing Merge Sort of this portion. 983 00:39:33,184 --> 00:39:35,350 Eventually it's going to come back down there and do 984 00:39:35,350 --> 00:39:35,980 the second one. 985 00:39:35,980 --> 00:39:39,460 It keeps doing it until it gets down to simple lists 986 00:39:39,460 --> 00:39:41,380 that it knows are sorted. 987 00:39:41,380 --> 00:39:43,090 And then it merges it. 988 00:39:43,090 --> 00:39:45,460 Does the smaller pieces and then merges it. 989 00:39:45,460 --> 00:39:47,530 And having now 2 merged things, it 990 00:39:47,530 --> 00:39:50,180 can do the next level of merge. 991 00:39:50,180 --> 00:39:53,000 So you can see that it gets this nice reduction of problems 992 00:39:53,000 --> 00:39:57,250 until it gets down to the smallest size. 993 00:39:57,250 --> 00:39:59,620 So let's just look at one more visualization of that 994 00:39:59,620 --> 00:40:01,670 and then get the complexity. 995 00:40:01,670 --> 00:40:05,873 So if I start out with this list-- sorry about that. 996 00:40:05,873 --> 00:40:09,050 What I need to do is split it. 997 00:40:09,050 --> 00:40:10,520 Take the first one, split it. 998 00:40:10,520 --> 00:40:13,670 Keep doing that until I get down to a base case 999 00:40:13,670 --> 00:40:17,420 where I know what those are and I simply merge them. 1000 00:40:17,420 --> 00:40:18,950 Pass it back up. 1001 00:40:18,950 --> 00:40:20,000 Take the second piece. 
1002 00:40:20,000 --> 00:40:22,040 Split it until I get down to base cases. 1003 00:40:22,040 --> 00:40:24,620 Do the merge, which is nice and linear. 1004 00:40:24,620 --> 00:40:25,610 Pass that back up. 1005 00:40:25,610 --> 00:40:29,470 Having done those two pieces, I do one more merge. 1006 00:40:29,470 --> 00:40:30,838 And I do the same thing. 1007 00:40:33,530 --> 00:40:35,330 I want you to see this, because again you 1008 00:40:35,330 --> 00:40:40,640 can notice how many levels are in this tree: log. 1009 00:40:40,640 --> 00:40:41,540 Log in the size. 1010 00:40:41,540 --> 00:40:44,000 Because at each stage here, I went from a problem 1011 00:40:44,000 --> 00:40:46,100 of 8 to two problems of 4. 1012 00:40:46,100 --> 00:40:47,850 Each of those went to two problems of 2, 1013 00:40:47,850 --> 00:40:52,090 and each of those went to two problems of size 1. 1014 00:40:52,090 --> 00:40:52,590 All right. 1015 00:40:52,590 --> 00:40:58,030 So the last piece is, what's the complexity? 1016 00:40:58,030 --> 00:41:00,970 Here's a simple way to think about it. 1017 00:41:00,970 --> 00:41:04,330 At the top level, I start off with n elements. 1018 00:41:04,330 --> 00:41:08,440 I've got two sorted lists of size n over 2. 1019 00:41:08,440 --> 00:41:13,520 And to merge them together, I need to do order n work. 1020 00:41:13,520 --> 00:41:17,110 Because as I said I got to do at least n comparisons where 1021 00:41:17,110 --> 00:41:19,040 n is the length of the list. 1022 00:41:19,040 --> 00:41:22,045 And then I've got to do n plus n copies, which is just order n. 1023 00:41:22,045 --> 00:41:24,610 So I'm doing order n work. 1024 00:41:24,610 --> 00:41:27,850 At the second level, it gets a little more complicated. 1025 00:41:27,850 --> 00:41:31,710 Now I've got problems of size n over 4. 1026 00:41:31,710 --> 00:41:33,942 But how many of them do I have? 1027 00:41:33,942 --> 00:41:35,770 4. 1028 00:41:35,770 --> 00:41:37,084 Oh, that's nice.
1029 00:41:37,084 --> 00:41:38,500 Because what do I know about this? 1030 00:41:38,500 --> 00:41:41,976 I know that I have to copy each element at least once. 1031 00:41:41,976 --> 00:41:42,850 So not at least once. 1032 00:41:42,850 --> 00:41:45,220 I will copy each element exactly once. 1033 00:41:45,220 --> 00:41:48,130 And I'll do comparisons that are equal to the length 1034 00:41:48,130 --> 00:41:49,910 of the longer list. 1035 00:41:49,910 --> 00:41:52,300 So I've got four sublists of length n over 4 1036 00:41:52,300 --> 00:41:54,310 that says n elements. 1037 00:41:54,310 --> 00:41:55,630 That's nice. 1038 00:41:55,630 --> 00:41:56,980 Order n. 1039 00:41:56,980 --> 00:42:00,090 At each step, the subproblems get smaller 1040 00:42:00,090 --> 00:42:01,090 but I have more of them. 1041 00:42:01,090 --> 00:42:03,220 But the total size of the problem is n. 1042 00:42:03,220 --> 00:42:07,550 So the cost at each step is order n. 1043 00:42:07,550 --> 00:42:09,530 How many times do I do it? 1044 00:42:09,530 --> 00:42:11,110 Log n. 1045 00:42:11,110 --> 00:42:16,840 So this is log n iterations with order n work at each step. 1046 00:42:16,840 --> 00:42:20,320 And this is a wonderful example of a log linear algorithm. 1047 00:42:20,320 --> 00:42:25,870 It's n log n, where n is the length of the list. 1048 00:42:25,870 --> 00:42:30,460 So what you end up with, then, is-- all right, a joke version, 1049 00:42:30,460 --> 00:42:32,500 some reasonable ways of doing sort 1050 00:42:32,500 --> 00:42:35,560 that are quick and easy to implement but are quadratic, 1051 00:42:35,560 --> 00:42:39,770 and then an elegant way of doing the sort that's n log n. 1052 00:42:39,770 --> 00:42:42,250 And I'll remind you I started by saying, as long as I 1053 00:42:42,250 --> 00:42:45,070 can make the cost of sorting small enough 1054 00:42:45,070 --> 00:42:46,600 I can amortize that cost.
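The "order n work at each of log n levels" argument can be checked with a small instrumented sketch. Everything here is an assumption made for illustration: the function name `merge_sort_counting`, the `depth` and `copies` parameters, and the choice to count only elements copied during merges (the base-case copies are not counted).

```python
from collections import defaultdict


def merge_sort_counting(L, depth=0, copies=None):
    """Merge sort that records how many elements are merged at each depth."""
    if copies is None:
        copies = defaultdict(int)
    if len(L) < 2:
        return L[:], copies
    middle = len(L) // 2
    left, _ = merge_sort_counting(L[:middle], depth + 1, copies)
    right, _ = merge_sort_counting(L[middle:], depth + 1, copies)
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    # Every element handled at this depth is copied exactly once.
    copies[depth] += len(result)
    return result, copies


data = [8, 7, 6, 5, 4, 3, 2, 1]
sorted_data, copies = merge_sort_counting(data)
print(sorted_data)             # [1, 2, 3, 4, 5, 6, 7, 8]
print(sorted(copies.items()))  # [(0, 8), (1, 8), (2, 8)]
```

For a list of 8 elements there are 3 merging levels (log base 2 of 8), and each level copies all 8 elements, giving 24 copies in total, which is n log n.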
1055 00:42:46,600 --> 00:42:48,820 And if you go back and look at last lecture's notes, 1056 00:42:48,820 --> 00:42:52,799 you'll see n log n grows pretty slowly. 1057 00:42:52,799 --> 00:42:54,340 And it's actually a nice thing to do. 1058 00:42:54,340 --> 00:42:56,470 It makes it reasonable to do the sort. 1059 00:42:56,470 --> 00:43:00,030 And then I can do the search in order log n time. 1060 00:43:00,030 --> 00:43:02,430 And here's the last punchline. 1061 00:43:02,430 --> 00:43:04,280 It's the fastest we can do. 1062 00:43:04,280 --> 00:43:05,900 I'm going to look at John again. 1063 00:43:05,900 --> 00:43:08,330 I don't think anybody has found a faster sort algorithm. 1064 00:43:08,330 --> 00:43:08,830 Right? 1065 00:43:08,830 --> 00:43:10,600 This is the best one can do. 1066 00:43:10,600 --> 00:43:13,831 Unless you do-- sorry, the best worst case. 1067 00:43:13,831 --> 00:43:14,330 I'm sorry. 1068 00:43:14,330 --> 00:43:15,380 John is absolutely right. 1069 00:43:15,380 --> 00:43:16,671 There are better average cases. 1070 00:43:16,671 --> 00:43:18,300 Again, our concern is worst case. 1071 00:43:18,300 --> 00:43:19,674 So this is as good as we're going 1072 00:43:19,674 --> 00:43:22,110 to do in terms of a worst case algorithm. 1073 00:43:22,110 --> 00:43:24,630 So there you now have sorting algorithms and searching 1074 00:43:24,630 --> 00:43:26,960 algorithms, and you've now seen-- 1075 00:43:26,960 --> 00:43:30,780 excuse me, sorry-- constant, log, linear, 1076 00:43:30,780 --> 00:43:34,730 log linear, quadratic, and exponential algorithms. 1077 00:43:34,730 --> 00:43:36,930 I'll remind you, we want things as high up 1078 00:43:36,930 --> 00:43:40,424 in that hierarchy as possible. 1079 00:43:40,424 --> 00:43:42,210 All right. 1080 00:43:42,210 --> 00:43:44,310 I have six minutes left. 1081 00:43:44,310 --> 00:43:45,780 Some of you are going to leave us. 1082 00:43:45,780 --> 00:43:47,020 We're going to miss you, but that's OK.
1083 00:43:47,020 --> 00:43:48,186 I'm sure we'll see you later on. 1084 00:43:48,186 --> 00:43:50,950 For those of you hanging around, this isn't a bad time just 1085 00:43:50,950 --> 00:43:52,890 to step back and say, so what have we seen? 1086 00:43:52,890 --> 00:43:55,660 And I want to do this just very quickly. 1087 00:43:55,660 --> 00:43:56,160 I'm sorry. 1088 00:43:56,160 --> 00:43:59,340 And I'll remind you, we started by in some sense giving you 1089 00:43:59,340 --> 00:44:00,957 a little bit of a contract of things 1090 00:44:00,957 --> 00:44:02,040 we were going to show you. 1091 00:44:02,040 --> 00:44:05,580 And I would simply suggest to you, what have we done? 1092 00:44:05,580 --> 00:44:08,120 We've given you a sense of how to represent knowledge 1093 00:44:08,120 --> 00:44:11,720 with data structures, tuples, lists, dictionaries, more 1094 00:44:11,720 --> 00:44:13,640 complicated structures. 1095 00:44:13,640 --> 00:44:17,270 We've shown you some good computational metaphors, 1096 00:44:17,270 --> 00:44:18,530 iteration, and loops. 1097 00:44:18,530 --> 00:44:21,170 Recursion as a great way of breaking problems down 1098 00:44:21,170 --> 00:44:23,809 into simpler versions of the same problem. 1099 00:44:23,809 --> 00:44:25,100 And they really are metaphors. 1100 00:44:25,100 --> 00:44:27,990 There are ways of thinking about problems. 1101 00:44:27,990 --> 00:44:31,950 We've given you abstraction, the idea of capture a computation, 1102 00:44:31,950 --> 00:44:33,450 bury it in a procedure. 1103 00:44:33,450 --> 00:44:34,620 You now have a contract. 1104 00:44:34,620 --> 00:44:36,953 You don't need to know what happens inside the procedure 1105 00:44:36,953 --> 00:44:39,520 as long as it delivers the answer it says it would. 1106 00:44:39,520 --> 00:44:41,520 Or another way of saying it, you can delegate it 1107 00:44:41,520 --> 00:44:43,145 to somebody and trust that you're going 1108 00:44:43,145 --> 00:44:44,880 to get what you like out of it.
1109 00:44:44,880 --> 00:44:47,460 We've seen classes and methods as a wonderful way 1110 00:44:47,460 --> 00:44:51,150 to modularize systems, to capture combinations 1111 00:44:51,150 --> 00:44:54,900 of data and things that operate on them in a nice, elegant way. 1112 00:44:54,900 --> 00:44:56,730 And we just spent a week and a half 1113 00:44:56,730 --> 00:44:59,250 talking about classes of algorithms 1114 00:44:59,250 --> 00:45:01,042 and their complexity. 1115 00:45:01,042 --> 00:45:05,985 If you step up a level, what we hope you've gotten out of this 1116 00:45:05,985 --> 00:45:07,870 are a couple of things. 1117 00:45:07,870 --> 00:45:10,070 You've begun to learn computational modes 1118 00:45:10,070 --> 00:45:11,210 of thinking. 1119 00:45:11,210 --> 00:45:13,260 How do I tackle a problem and divide and conquer? 1120 00:45:13,260 --> 00:45:15,440 How do I think about recursion as a tool 1121 00:45:15,440 --> 00:45:17,480 in dealing with something? 1122 00:45:17,480 --> 00:45:20,330 You've begun to-- begun, I will use that word deliberately-- 1123 00:45:20,330 --> 00:45:22,790 to master the art of computational problem solving. 1124 00:45:22,790 --> 00:45:26,310 How can you take a problem and turn it into an algorithm? 1125 00:45:26,310 --> 00:45:28,730 And especially, you've begun to have the ability 1126 00:45:28,730 --> 00:45:30,800 to make the computer do what you want it to. 1127 00:45:30,800 --> 00:45:33,740 To say, if I've got a problem from biology or chemistry 1128 00:45:33,740 --> 00:45:36,200 or math or physics or chemical engineering 1129 00:45:36,200 --> 00:45:39,507 or mechanical engineering, how do I take that problem and say, 1130 00:45:39,507 --> 00:45:41,090 here's how I would design an algorithm 1131 00:45:41,090 --> 00:45:45,654 to give me a simulation and a way of evaluating what it does. 
1132 00:45:45,654 --> 00:45:47,070 And so what we hope we've done is, 1133 00:45:47,070 --> 00:45:50,310 we've started you down the path to being able to think and act 1134 00:45:50,310 --> 00:45:52,170 like a computer scientist. 1135 00:45:52,170 --> 00:45:52,670 All right. 1136 00:45:52,670 --> 00:45:53,406 Don't panic. 1137 00:45:53,406 --> 00:45:55,280 That doesn't mean you stare at people's shoes 1138 00:45:55,280 --> 00:45:55,990 when you talk to them. 1139 00:45:55,990 --> 00:45:58,220 Not all computer scientists do that, just faculty. 1140 00:46:00,780 --> 00:46:02,160 Sorry, John. 1141 00:46:02,160 --> 00:46:04,022 So what do computer scientists do? 1142 00:46:04,022 --> 00:46:05,730 And this is actually meant to be serious. 1143 00:46:05,730 --> 00:46:08,100 And I put up two of my famous historical figures 1144 00:46:08,100 --> 00:46:10,050 of computer scientists. 1145 00:46:10,050 --> 00:46:11,820 They do think computationally. 1146 00:46:11,820 --> 00:46:15,420 They think about abstractions, about algorithms, 1147 00:46:15,420 --> 00:46:16,720 about automated execution. 1148 00:46:16,720 --> 00:46:18,720 So the three A's of computational thinking. 1149 00:46:18,720 --> 00:46:21,000 And in the same way that traditionally you 1150 00:46:21,000 --> 00:46:23,910 had the three R's of reading, writing, and arithmetic, 1151 00:46:23,910 --> 00:46:27,060 computational thinking we hope is becoming a fundamental 1152 00:46:27,060 --> 00:46:30,840 that every well-educated person is going to need. 1153 00:46:30,840 --> 00:46:35,130 And that says, you think about the right abstraction. 1154 00:46:35,130 --> 00:46:37,012 When you have a problem in your [INAUDIBLE] 1155 00:46:37,012 --> 00:46:38,280 what's the right abstraction? 1156 00:46:38,280 --> 00:46:40,740 How do I pull apart the pieces? 
1157 00:46:40,740 --> 00:46:43,890 How do I think about that in terms of decomposing things 1158 00:46:43,890 --> 00:46:48,080 into a relationship that I can use to solve problems? 1159 00:46:48,080 --> 00:46:49,910 How do I automate? 1160 00:46:49,910 --> 00:46:51,590 How do I mechanize that abstraction? 1161 00:46:51,590 --> 00:46:54,500 How do I use what I know happens inside of the machine 1162 00:46:54,500 --> 00:46:56,780 to write a sequence of steps in a language I'm 1163 00:46:56,780 --> 00:46:59,460 using to capture that process? 1164 00:46:59,460 --> 00:47:02,570 And then finally, how do I turn that into an algorithm? 1165 00:47:02,570 --> 00:47:06,020 And that not only means I need a language for describing 1166 00:47:06,020 --> 00:47:09,140 those automated processes, and if you 1167 00:47:09,140 --> 00:47:11,810 like allowing the abstraction of details, 1168 00:47:11,810 --> 00:47:14,420 but frankly also a way to communicate. 1169 00:47:14,420 --> 00:47:16,550 If you have to think crisply about how do I 1170 00:47:16,550 --> 00:47:19,010 describe an algorithm, it's actually 1171 00:47:19,010 --> 00:47:21,440 giving you a way to crystallize or clarify 1172 00:47:21,440 --> 00:47:23,510 your thinking about a problem. 1173 00:47:23,510 --> 00:47:26,090 This is not to say you should talk to your friends in Python. 1174 00:47:26,090 --> 00:47:27,500 I don't recommend it. 1175 00:47:27,500 --> 00:47:29,780 But it does say you should use that thinking 1176 00:47:29,780 --> 00:47:33,780 as a way of capturing your ideas of what you're going to do. 1177 00:47:33,780 --> 00:47:36,600 And that leads, then, to this idea of, 1178 00:47:36,600 --> 00:47:38,470 how difficult is a problem? 1179 00:47:38,470 --> 00:47:40,230 How best can I solve it? 
1180 00:47:40,230 --> 00:47:42,090 We've shown you these complexity classes 1181 00:47:42,090 --> 00:47:44,820 and we've hinted at the idea that in fact some problems 1182 00:47:44,820 --> 00:47:47,350 are inherently more difficult than others. 1183 00:47:47,350 --> 00:47:50,010 That's something I hope you come back to as you go along. 1184 00:47:50,010 --> 00:47:53,680 And especially we want you to start thinking recursively. 1185 00:47:53,680 --> 00:47:56,980 We want you to think about how do I take a hard problem, 1186 00:47:56,980 --> 00:47:59,520 break it up into simpler versions of the same problem, 1187 00:47:59,520 --> 00:48:02,380 and then construct the solution. 1188 00:48:02,380 --> 00:48:05,590 And that shows up lots of places. 1189 00:48:05,590 --> 00:48:06,090 Right? 1190 00:48:06,090 --> 00:48:08,100 Recursion is in all sorts of wonderful places. 1191 00:48:08,100 --> 00:48:12,030 So just to give you an example, I could say to you recursively, 1192 00:48:12,030 --> 00:48:13,950 "This lecture will end when I'm done 1193 00:48:13,950 --> 00:48:15,840 talking about this lecture, which 1194 00:48:15,840 --> 00:48:18,600 will end when I'm done talking about this lecture, which 1195 00:48:18,600 --> 00:48:19,670 will end when I'm done--" 1196 00:48:19,670 --> 00:48:20,170 All right. 1197 00:48:20,170 --> 00:48:21,630 You don't like infinite recursion. 1198 00:48:21,630 --> 00:48:23,750 Good luck on the exam.