1 00:00:00,070 --> 00:00:01,770 The following content is provided 2 00:00:01,770 --> 00:00:04,010 under a Creative Commons license. 3 00:00:04,010 --> 00:00:06,860 Your support will help MIT OpenCourseWare continue 4 00:00:06,860 --> 00:00:10,720 to offer high quality educational resources for free. 5 00:00:10,720 --> 00:00:13,330 To make a donation or view additional materials 6 00:00:13,330 --> 00:00:17,209 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,209 --> 00:00:17,834 at ocw.mit.edu. 8 00:00:21,141 --> 00:00:23,390 VICTOR COSTAN: Any questions about the sorting methods 9 00:00:23,390 --> 00:00:27,570 that you want me to go over in that while I revise? 10 00:00:32,440 --> 00:00:34,744 OK. 11 00:00:34,744 --> 00:00:35,535 All right, sorting. 12 00:00:40,520 --> 00:00:44,030 What sorting methods have we learned? 13 00:00:44,030 --> 00:00:46,395 Let's start from dumbest to smartest. 14 00:00:46,395 --> 00:00:47,700 AUDIENCE: Merge sorting. 15 00:00:47,700 --> 00:00:50,550 VICTOR COSTAN: OK, somewhere in the middle. 16 00:00:50,550 --> 00:00:52,020 Merge sort isn't very bad. 17 00:00:52,020 --> 00:00:54,401 What's the easiest method to sort? 18 00:00:54,401 --> 00:00:55,234 AUDIENCE: Insertion. 19 00:00:58,120 --> 00:00:59,370 VICTOR COSTAN: Insertion sort. 20 00:00:59,370 --> 00:01:00,370 Excellent. 21 00:01:00,370 --> 00:01:00,930 All right. 22 00:01:00,930 --> 00:01:01,520 What else? 23 00:01:06,730 --> 00:01:07,230 Heapsort. 24 00:01:11,070 --> 00:01:11,620 And? 25 00:01:11,620 --> 00:01:14,239 I gave two away now. 26 00:01:14,239 --> 00:01:15,030 AUDIENCE: Counting. 27 00:01:17,772 --> 00:01:18,980 VICTOR COSTAN: Counting sort. 28 00:01:18,980 --> 00:01:20,190 Very good. 29 00:01:20,190 --> 00:01:20,690 And? 30 00:01:25,800 --> 00:01:26,300 Oh, wow. 31 00:01:26,300 --> 00:01:29,130 If you don't even have the name of it. 32 00:01:29,130 --> 00:01:32,120 So the last one is radix sort.
33 00:01:32,120 --> 00:01:35,520 What are the running times for these three 34 00:01:35,520 --> 00:01:36,555 that you guys remember? 35 00:01:40,191 --> 00:01:44,530 AUDIENCE: Insertion sort is linearly one more. 36 00:01:44,530 --> 00:01:45,030 It's bad. 37 00:01:45,030 --> 00:01:47,696 VICTOR COSTAN: I want to see our pseudocode for insertion sorts. 38 00:01:47,696 --> 00:01:49,915 AUDIENCE: n squared. 39 00:01:49,915 --> 00:01:52,390 AUDIENCE: Now that's really bad. 40 00:01:52,390 --> 00:01:55,430 VICTOR COSTAN: So linear is as good as you could possibly get. 41 00:01:55,430 --> 00:01:58,770 So sorting takes an array of random stuff 42 00:01:58,770 --> 00:02:01,510 and outputs an array of things in a sorted order. 43 00:02:01,510 --> 00:02:05,170 The array is size n, so it has to output an array of size n. 44 00:02:05,170 --> 00:02:07,680 If you can do an algorithm that runs in order n time, 45 00:02:07,680 --> 00:02:09,979 then that's the best you could possibly accomplish, 46 00:02:09,979 --> 00:02:12,340 because you have output n elements. 47 00:02:12,340 --> 00:02:14,790 So the best possible time you could get for sorting 48 00:02:14,790 --> 00:02:17,070 is theta of n. 49 00:02:17,070 --> 00:02:17,570 All right. 50 00:02:17,570 --> 00:02:18,573 How about merge sort? 51 00:02:21,351 --> 00:02:22,549 AUDIENCE: [INAUDIBLE]. 52 00:02:22,549 --> 00:02:23,590 VICTOR COSTAN: Thank you. 53 00:02:26,260 --> 00:02:28,171 Heapsort. 54 00:02:28,171 --> 00:02:30,506 AUDIENCE: Order h. 55 00:02:30,506 --> 00:02:32,850 Order h is log n. 56 00:02:32,850 --> 00:02:34,680 VICTOR COSTAN: Order h where h is log n. 57 00:02:34,680 --> 00:02:35,340 OK. 58 00:02:35,340 --> 00:02:37,900 And you're missing a factor. 59 00:02:37,900 --> 00:02:41,300 So a heap operation takes order h, which is log n. 60 00:02:41,300 --> 00:02:43,540 So if I have to insert a numbering in a heap 61 00:02:43,540 --> 00:02:46,750 or extract a number from a heap, that's log n. 
62 00:02:46,750 --> 00:02:51,765 In order to sort an array, how many insertions do I do? 63 00:02:51,765 --> 00:02:54,140 AUDIENCE: I think-- now I don't know. 64 00:02:54,140 --> 00:02:55,090 VICTOR COSTAN: OK. 65 00:02:55,090 --> 00:02:56,580 Wild guess. 66 00:02:56,580 --> 00:02:57,390 AUDIENCE: n. 67 00:02:57,390 --> 00:02:58,610 VICTOR COSTAN: Very good. 68 00:02:58,610 --> 00:03:00,560 See, there you go. 69 00:03:00,560 --> 00:03:04,240 So you need to insert all your numbers in a heap 70 00:03:04,240 --> 00:03:05,740 and then extract them one by one. 71 00:03:05,740 --> 00:03:07,750 And you will get them in the correct order 72 00:03:07,750 --> 00:03:09,360 that gives you the sorted results. 73 00:03:09,360 --> 00:03:12,440 So n log n. 74 00:03:12,440 --> 00:03:16,240 Does anyone remember what's special about these three 75 00:03:16,240 --> 00:03:19,628 sorting methods that does not apply to the other two? 76 00:03:19,628 --> 00:03:23,942 AUDIENCE: They're in place. 77 00:03:23,942 --> 00:03:25,900 VICTOR COSTAN: Merge sort isn't quite in place. 78 00:03:25,900 --> 00:03:29,000 If it would be in place, it would be perfect. 79 00:03:29,000 --> 00:03:32,260 There is actually a way of making in place merge sort, 80 00:03:32,260 --> 00:03:35,820 but it requires a PhD degree to understand that. 81 00:03:35,820 --> 00:03:40,120 So we will not cover it in 6006, because I do not understand it. 82 00:03:40,120 --> 00:03:42,230 So I couldn't explain it. 83 00:03:42,230 --> 00:03:44,120 So merge sort is not quite in place. 84 00:03:44,120 --> 00:03:45,640 Which one is in place? 85 00:03:49,951 --> 00:03:51,337 AUDIENCE: Heapsort. 86 00:03:51,337 --> 00:03:52,170 VICTOR COSTAN: Good. 87 00:03:52,170 --> 00:03:54,210 So heapsort is in place. 88 00:03:54,210 --> 00:03:56,000 Merge sort is not in place. 89 00:03:56,000 --> 00:03:58,890 And insertion sort is really slow, 90 00:03:58,890 --> 00:04:00,800 so we don't care that much about it.
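The "insert everything into a heap, then extract one by one" description above can be sketched in a few lines. This is only an illustration, assuming Python's standard `heapq` module; the function name `heap_sort` is made up for this example, and because it builds a separate heap list it is not the in-place variant of heapsort mentioned in the discussion.

```python
import heapq


def heap_sort(keys):
    """Return a new sorted list using n heap insertions and n extractions."""
    heap = []
    for key in keys:
        heapq.heappush(heap, key)  # each insertion costs O(log n)
    # Each extraction also costs O(log n), so the total is O(n log n).
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

The two loops each do n heap operations of cost O(log n), which is where the n log n total comes from.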
91 00:04:00,800 --> 00:04:04,080 So what's special about these three 92 00:04:04,080 --> 00:04:05,550 that does not apply to these two? 93 00:04:11,200 --> 00:04:13,320 AUDIENCE: You don't have to use integers. 94 00:04:13,320 --> 00:04:14,070 VICTOR COSTAN: OK. 95 00:04:14,070 --> 00:04:15,720 You don't have to use integers. 96 00:04:15,720 --> 00:04:17,519 What do they want to know instead 97 00:04:17,519 --> 00:04:19,149 about the things you use? 98 00:04:19,149 --> 00:04:20,303 So we'll call them keys. 99 00:04:20,303 --> 00:04:22,219 AUDIENCE: You need to be able to compare them. 100 00:04:22,219 --> 00:04:22,674 VICTOR COSTAN: All right. 101 00:04:22,674 --> 00:04:24,729 AUDIENCE: You don't need to have a minimum 102 00:04:24,729 --> 00:04:27,806 and a maximum integer. 103 00:04:27,806 --> 00:04:30,430 VICTOR COSTAN: So turns out, if you have a comparison operator, 104 00:04:30,430 --> 00:04:32,720 you will have a minimum and a maximum. 105 00:04:32,720 --> 00:04:35,130 But that's complex abstract algebra 106 00:04:35,130 --> 00:04:37,230 that we don't need to worry about. 107 00:04:37,230 --> 00:04:39,060 So you gave me the good answer, which 108 00:04:39,060 --> 00:04:42,915 is we use something called a comparison model. 109 00:04:45,870 --> 00:04:47,970 And in that model, you do not need 110 00:04:47,970 --> 00:04:49,610 to know too much about your keys. 111 00:04:49,610 --> 00:04:52,640 So the elements in the array that you're sorting. 112 00:04:52,640 --> 00:04:54,000 Your keys are blobs. 113 00:04:54,000 --> 00:04:55,750 And all they have to be able to do 114 00:04:55,750 --> 00:04:57,380 is know-- if you have two of them-- 115 00:04:57,380 --> 00:05:01,670 you have to know which one's greater. 116 00:05:01,670 --> 00:05:02,190 That's it. 117 00:05:02,190 --> 00:05:03,820 Nothing else. 118 00:05:03,820 --> 00:05:05,720 What's the problem with the comparison model? 119 00:05:09,024 --> 00:05:11,384 AUDIENCE: It takes time to compare things.
120 00:05:11,384 --> 00:05:12,800 It's like with everything. 121 00:05:12,800 --> 00:05:13,633 VICTOR COSTAN: Yeah. 122 00:05:16,192 --> 00:05:17,650 So we learned in lecture that there 123 00:05:17,650 --> 00:05:20,260 is a lower bound for the comparison model. 124 00:05:20,260 --> 00:05:23,720 And if you want to sort using nothing but this information, 125 00:05:23,720 --> 00:05:28,680 that will take you at least n log n time. 126 00:05:28,680 --> 00:05:31,310 You cannot do better than n log n if all you're using is 127 00:05:31,310 --> 00:05:33,020 comparisons. 128 00:05:33,020 --> 00:05:37,260 So in that respect, merge sort and heap sort are optimal. 129 00:05:37,260 --> 00:05:39,170 If you want to stay within this model, 130 00:05:39,170 --> 00:05:42,290 this is the best time you're going to get. 131 00:05:42,290 --> 00:05:46,670 Does anyone know how you can implement this comparison model 132 00:05:46,670 --> 00:05:47,730 in Python? 133 00:05:47,730 --> 00:05:51,190 So numbers respond to these operators, right? 134 00:05:51,190 --> 00:05:54,020 Actually, in Python this is equals equals. 135 00:05:54,020 --> 00:05:56,080 What if I have a random object and I 136 00:05:56,080 --> 00:05:58,630 want to make it respond to these operators? 137 00:05:58,630 --> 00:06:00,370 So for example, I write merge sort. 138 00:06:00,370 --> 00:06:01,840 We wrote merge sort. 139 00:06:01,840 --> 00:06:03,952 And now I have my own objects, my own keys 140 00:06:03,952 --> 00:06:05,410 which are not necessarily integers, 141 00:06:05,410 --> 00:06:07,004 because that's why we like this. 142 00:06:07,004 --> 00:06:09,170 And we want to make them respond to these operators. 143 00:06:09,170 --> 00:06:11,490 So I can call merge sort on an array of them 144 00:06:11,490 --> 00:06:13,240 and it will crash. 145 00:06:13,240 --> 00:06:16,042 What do I have to do? 
146 00:06:16,042 --> 00:06:18,432 AUDIENCE: I mean, you have to give the keys 147 00:06:18,432 --> 00:06:21,300 values that can be compared. 148 00:06:21,300 --> 00:06:23,430 VICTOR COSTAN: So suppose this is my key class. 149 00:06:27,254 --> 00:06:31,119 AUDIENCE: This is lad, the lt, and gt. 150 00:06:31,119 --> 00:06:32,160 VICTOR COSTAN: All right. 151 00:06:32,160 --> 00:06:34,360 There's a magical method in Python. 152 00:06:34,360 --> 00:06:36,410 So there is the old school model, 153 00:06:36,410 --> 00:06:41,480 which you might see in legacy code, which only works 154 00:06:41,480 --> 00:06:45,000 in Python 2.x, which is you define the method called 155 00:06:45,000 --> 00:06:52,480 cmp that takes self and other. 156 00:06:52,480 --> 00:06:54,900 And it has to return a number that's 157 00:06:54,900 --> 00:06:58,780 either smaller than zero, equal to zero, or greater than zero. 158 00:06:58,780 --> 00:07:01,660 And this maps to this. 159 00:07:04,590 --> 00:07:06,130 So you'll see this in old code. 160 00:07:06,130 --> 00:07:08,340 But you shouldn't use it in new code. 161 00:07:08,340 --> 00:07:10,940 Unless you have a very good reason to. 162 00:07:10,940 --> 00:07:14,020 Instead, the new model says that you 163 00:07:14,020 --> 00:07:21,480 define special methods called lt, which stands for less than. 164 00:07:21,480 --> 00:07:22,470 So it's this guy. 165 00:07:24,980 --> 00:07:30,100 le, which is less or equal. 166 00:07:30,100 --> 00:07:31,800 gt, which is greater than. 167 00:07:31,800 --> 00:07:34,935 And ge, which is greater or equal. 168 00:07:37,630 --> 00:07:40,740 And if you look at our code for pieces two and three, 169 00:07:40,740 --> 00:07:43,640 we have some objects that pretend they're keys. 170 00:07:43,640 --> 00:07:47,780 And we have to define these methods. 171 00:07:47,780 --> 00:07:49,460 Also, when you define these, it's 172 00:07:49,460 --> 00:07:56,420 a good idea to define eq for equality comparison.
173 00:07:56,420 --> 00:08:01,250 And ne, which is this guy. 174 00:08:01,250 --> 00:08:06,620 So these also take self and other key 175 00:08:06,620 --> 00:08:08,790 that you're comparing with. 176 00:08:08,790 --> 00:08:10,370 And they return true or false. 177 00:08:13,440 --> 00:08:16,930 So this will help you understand the code better. 178 00:08:16,930 --> 00:08:18,680 All right, so with relatively little work, 179 00:08:18,680 --> 00:08:23,450 you can have any wild object you want act as a key. 180 00:08:23,450 --> 00:08:25,950 And then you have insertion sort, 181 00:08:25,950 --> 00:08:31,050 merge sort, heapsort, heaps, binary trees, AVLs. 182 00:08:31,050 --> 00:08:34,020 Everything works, because everything uses the comparison 183 00:08:34,020 --> 00:08:35,190 model. 184 00:08:35,190 --> 00:08:37,600 The problem is this n log n bound. 185 00:08:40,200 --> 00:08:43,890 It's not as fast as the best possible sorting algorithm 186 00:08:43,890 --> 00:08:45,410 you could come up with. 187 00:08:45,410 --> 00:08:47,550 This is slower than this. 188 00:08:47,550 --> 00:08:50,250 So that's why we have to break out of the comparison model. 189 00:08:50,250 --> 00:08:55,000 And we have to look into these boxes and get more information, 190 00:08:55,000 --> 00:08:58,200 so that we can write faster sorting algorithms. 191 00:08:58,200 --> 00:09:02,678 Does anyone remember the running time for counting sort? 192 00:09:02,678 --> 00:09:04,580 AUDIENCE: [INAUDIBLE] again? 193 00:09:04,580 --> 00:09:05,330 VICTOR COSTAN: OK. 194 00:09:09,524 --> 00:09:10,650 AUDIENCE: n plus e. 195 00:09:10,650 --> 00:09:11,400 VICTOR COSTAN: OK. 196 00:09:15,720 --> 00:09:18,790 Let's remember how counting sort looks like. 197 00:09:18,790 --> 00:09:27,710 Let's get this array that-- that should be enough-- four, one, 198 00:09:27,710 --> 00:09:30,970 three, two, three. 199 00:09:30,970 --> 00:09:33,180 How do we sort it using counting sort? 
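Going back to the comparison methods described above (lt, le, gt, ge, eq, ne), here is a minimal sketch of a class whose instances can act as keys. In Python source these are spelled with double underscores, for example `__lt__`. The class name `Key` and its `value` and `label` fields are hypothetical, chosen only for this illustration; they are not taken from the course code.

```python
class Key:
    """A hypothetical key object that takes part in the comparison model."""

    def __init__(self, value, label=""):
        self.value = value  # the part that comparisons look at
        self.label = label  # extra data that comparisons ignore

    # Each method takes self and the other key, and returns True or False.
    def __lt__(self, other):
        return self.value < other.value

    def __le__(self, other):
        return self.value <= other.value

    def __gt__(self, other):
        return self.value > other.value

    def __ge__(self, other):
        return self.value >= other.value

    def __eq__(self, other):
        return self.value == other.value

    def __ne__(self, other):
        return self.value != other.value
```

With these methods defined, any comparison-based routine, including the built-in `sorted`, can order a list of `Key` objects without knowing anything else about them.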
200 00:09:38,080 --> 00:09:43,929 AUDIENCE: We initialize an array of all the possible values. 201 00:09:43,929 --> 00:09:44,970 VICTOR COSTAN: Very good. 202 00:09:44,970 --> 00:09:45,850 Very good. 203 00:09:45,850 --> 00:09:48,370 So counting sort needs to know something about your values, 204 00:09:48,370 --> 00:09:48,870 right? 205 00:09:48,870 --> 00:09:49,870 It makes an assumption. 206 00:09:49,870 --> 00:09:51,630 And the assumption is that these values 207 00:09:51,630 --> 00:09:57,170 are integers from 0 to, say, k minus 1. 208 00:09:57,170 --> 00:10:00,340 So you have k possible values. 209 00:10:00,340 --> 00:10:02,380 And they don't really have to be these as long 210 00:10:02,380 --> 00:10:05,640 as you can map them to these numbers. 211 00:10:05,640 --> 00:10:10,090 So we are going to initialize an array. 212 00:10:10,090 --> 00:10:14,960 Let's say this is an array. 213 00:10:14,960 --> 00:10:16,760 And zero, one. 214 00:10:16,760 --> 00:10:20,500 So zero, one, two, three, four, five. 215 00:10:23,460 --> 00:10:26,090 So we're going to initialize it with-- 216 00:10:26,090 --> 00:10:27,440 AUDIENCE: Oh, zeroes. 217 00:10:27,440 --> 00:10:28,481 VICTOR COSTAN: All right. 218 00:10:31,020 --> 00:10:31,990 And then? 219 00:10:31,990 --> 00:10:37,380 AUDIENCE: Iterative over our list sort 220 00:10:37,380 --> 00:10:44,620 incrementing the corresponding value to each key in your-- 221 00:10:44,620 --> 00:10:46,890 VICTOR COSTAN: So which one am I incrementing here? 222 00:10:46,890 --> 00:10:47,360 AUDIENCE: Pardon? 223 00:10:47,360 --> 00:10:48,790 VICTOR COSTAN: Which one am I incrementing here? 224 00:10:48,790 --> 00:10:50,081 AUDIENCE: Zero ne through four. 225 00:10:53,450 --> 00:10:53,950 One. 226 00:10:56,740 --> 00:10:59,800 VICTOR COSTAN: Three, two. 227 00:10:59,800 --> 00:11:00,697 And then? 228 00:11:00,697 --> 00:11:02,040 AUDIENCE: Three n. 229 00:11:02,040 --> 00:11:03,970 So this becomes a two. 
230 00:11:06,630 --> 00:11:09,380 And what do I do now? 231 00:11:09,380 --> 00:11:14,482 AUDIENCE: Reiterate over that-- I don't know. 232 00:11:14,482 --> 00:11:16,940 I don't know what to call that identity [INAUDIBLE] almost? 233 00:11:16,940 --> 00:11:20,270 OK, an array. 234 00:11:20,270 --> 00:11:27,510 Printing into your output array one one, one two, two threes, 235 00:11:27,510 --> 00:11:28,210 one four. 236 00:11:28,210 --> 00:11:28,820 VICTOR COSTAN: All right. 237 00:11:28,820 --> 00:11:30,330 So there's no zeroes and no fives. 238 00:11:30,330 --> 00:11:36,240 So one one, one two, one three, and one four. 239 00:11:38,930 --> 00:11:39,830 OK, so far so good. 240 00:11:39,830 --> 00:11:42,180 This is great. 241 00:11:42,180 --> 00:11:44,310 There's one thing that's missing. 242 00:11:44,310 --> 00:11:47,200 For counting sort and for other sorting algorithms, 243 00:11:47,200 --> 00:11:50,300 we care about the property called stability. 244 00:11:50,300 --> 00:11:52,780 And stability means that if you have 245 00:11:52,780 --> 00:11:55,090 two equal keys, or at least two keys 246 00:11:55,090 --> 00:11:56,810 that look equal to the sorting algorithm, 247 00:11:56,810 --> 00:11:58,684 they might be different objects, because they 248 00:11:58,684 --> 00:12:00,800 might be implementing that. 249 00:12:00,800 --> 00:12:02,700 The one that shows up first in the input 250 00:12:02,700 --> 00:12:06,010 should also show up first in the output. 251 00:12:06,010 --> 00:12:07,710 And that requires particular care, 252 00:12:07,710 --> 00:12:10,267 because you can't just look at the keys 253 00:12:10,267 --> 00:12:11,850 from your sorting perspective and know 254 00:12:11,850 --> 00:12:13,224 which one's supposed to go where. 255 00:12:13,224 --> 00:12:15,660 You have to remember where they were in the input. 256 00:12:15,660 --> 00:12:19,841 So if this guy is 3a, and this guy is 3b, 257 00:12:19,841 --> 00:12:21,587 I can't use this approach anymore, right?
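The counting approach walked through so far (zero a count array, count each key, then write each value out as many times as it was counted) might look like the sketch below. The function name `counting_sort_values` is an assumption made for this example. Notice that the output is rebuilt purely from the counts, which is exactly why this version cannot tell 3a from 3b.

```python
def counting_sort_values(a, k):
    """Sort integers in the range 0..k-1 by counting occurrences."""
    counts = [0] * k          # one counter per possible value, all zero
    for x in a:
        counts[x] += 1        # e.g. [4, 1, 3, 2, 3] gives [0, 1, 1, 2, 1, 0]
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)  # emit `value` exactly c times
    return out
```

For the board example, `counting_sort_values([4, 1, 3, 2, 3], 6)` produces one one, one two, two threes, and one four.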
258 00:12:21,587 --> 00:12:23,420 Because when I'm outputting here, all I know 259 00:12:23,420 --> 00:12:24,520 is I have to output a three. 260 00:12:24,520 --> 00:12:25,936 I don't have any other information 261 00:12:25,936 --> 00:12:27,950 associated with the key. 262 00:12:27,950 --> 00:12:30,006 So instead, I have to do something smarter. 263 00:12:30,006 --> 00:12:34,200 AUDIENCE: Either replace your array with a 2-D array. 264 00:12:34,200 --> 00:12:36,920 Or I think better would be to replace 265 00:12:36,920 --> 00:12:39,601 each value with a linked list. 266 00:12:39,601 --> 00:12:40,350 VICTOR COSTAN: OK. 267 00:12:40,350 --> 00:12:45,700 So we can replace each value with a linked list, which 268 00:12:45,700 --> 00:12:48,030 would have the keys that map to it, right. 269 00:12:48,030 --> 00:12:49,700 So here I would have a one. 270 00:12:49,700 --> 00:12:51,970 Here I would have a two. 271 00:12:51,970 --> 00:12:55,450 Here I would have 3a, and then 3b. 272 00:12:55,450 --> 00:12:59,120 and here I would have a four. 273 00:12:59,120 --> 00:13:03,150 So then I can go through these and output them the right way. 274 00:13:03,150 --> 00:13:06,940 OK, now suppose I'm writing this in C. 275 00:13:06,940 --> 00:13:09,089 Suppose I'm in a low level language. 276 00:13:09,089 --> 00:13:10,880 And I'm in a low level language because I'm 277 00:13:10,880 --> 00:13:14,760 hired by one of these startups that are doing NoSQL databases. 278 00:13:14,760 --> 00:13:16,490 And they're writing everything in C 279 00:13:16,490 --> 00:13:18,660 to make their things really fast. 280 00:13:18,660 --> 00:13:20,660 So I'm writing an index that uses counting sort. 281 00:13:20,660 --> 00:13:23,870 I don't have linked lists, because if I'm writing in C, 282 00:13:23,870 --> 00:13:24,900 I have to write my own. 283 00:13:24,900 --> 00:13:26,220 And that's hard. 284 00:13:26,220 --> 00:13:28,430 So I want to implement this in another way.
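The per-value list idea suggested above can be sketched as follows. This is an illustration under assumptions: Python lists stand in for the per-value lists, the function name `counting_sort_buckets` is made up, and the `key` parameter is a hypothetical helper that maps each item to an integer in the range 0..k-1.

```python
def counting_sort_buckets(items, k, key=lambda item: item):
    """Stable counting sort that keeps one list of items per key value."""
    buckets = [[] for _ in range(k)]
    for item in items:
        # Appending preserves input order among items with equal keys.
        buckets[key(item)].append(item)
    # Concatenate the buckets from the smallest key to the largest.
    return [item for bucket in buckets for item in bucket]
```

Because each bucket is filled in input order, 3a is emitted before 3b, which is the stability property described earlier.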
285 00:13:32,240 --> 00:13:33,560 Linked lists are hard. 286 00:13:33,560 --> 00:13:36,070 What would I do instead? 287 00:13:36,070 --> 00:13:37,440 Can anyone think of another way? 288 00:13:37,440 --> 00:13:40,350 AUDIENCE: I think you can decrement the values 289 00:13:40,350 --> 00:13:43,906 for the C in the array that you have, 290 00:13:43,906 --> 00:13:46,354 where you have to type the culture of each anyway. 291 00:13:46,354 --> 00:13:48,270 VICTOR COSTAN: OK, so you have the right idea. 292 00:13:50,880 --> 00:13:51,910 You're missing one step. 293 00:13:51,910 --> 00:13:53,350 So I'll give everyone else a hint 294 00:13:53,350 --> 00:13:54,600 so that everyone can catch up. 295 00:13:54,600 --> 00:13:58,470 So what I want to do is I want to take this and transform it 296 00:13:58,470 --> 00:14:02,330 into something that allows me to go through the keys. 297 00:14:02,330 --> 00:14:05,020 So I know I have five keys here. 298 00:14:05,020 --> 00:14:07,970 I'm going to make an output array of five elements. 299 00:14:07,970 --> 00:14:11,100 And I want to be able to see four and know 300 00:14:11,100 --> 00:14:12,630 that it belongs here. 301 00:14:12,630 --> 00:14:15,180 See one, know that it belongs here. 302 00:14:15,180 --> 00:14:18,470 See 3a, know that it belongs here. 303 00:14:18,470 --> 00:14:21,410 Then probably update the value associated with three. 304 00:14:21,410 --> 00:14:23,040 See two, know that it belongs here. 305 00:14:23,040 --> 00:14:26,010 And then when I see 3b, know that it belongs here. 306 00:14:28,720 --> 00:14:33,270 So I want to look, when I get to 3a, I want to look inside here. 307 00:14:33,270 --> 00:14:39,090 And I want this to tell me that 3 belongs here, 308 00:14:39,090 --> 00:14:39,960 3a belongs here. 309 00:14:54,610 --> 00:14:58,570 So what would the position of 3a be? 310 00:14:58,570 --> 00:14:59,700 That's not good, right? 311 00:14:59,700 --> 00:15:02,760 Let's call this c instead so that I can say 3a be.
312 00:15:05,920 --> 00:15:09,470 So how would I define the position 313 00:15:09,470 --> 00:15:14,350 using the sorted property? 314 00:15:14,350 --> 00:15:19,780 3a should go in the index that is how many keys smaller than 3 315 00:15:19,780 --> 00:15:20,420 there are. 316 00:15:22,970 --> 00:15:25,750 So if I can look through here and see 317 00:15:25,750 --> 00:15:29,800 how many keys do I have that are smaller than 3, 318 00:15:29,800 --> 00:15:33,210 this is where 3a needs to go. 319 00:15:33,210 --> 00:15:35,180 If I look at four, there are four keys 320 00:15:35,180 --> 00:15:36,450 that are smaller than four. 321 00:15:36,450 --> 00:15:40,640 So it needs to go in position four. 322 00:15:40,640 --> 00:15:44,680 AUDIENCE: Well, that almost seems more like a compare. 323 00:15:44,680 --> 00:15:46,253 I'm guessing that makes it-- I think 324 00:15:46,253 --> 00:15:47,586 it's kind of a comparison model. 325 00:15:47,586 --> 00:15:51,530 But you're saying is it greater than. 326 00:15:51,530 --> 00:15:54,180 So it's not really counting sort anymore as much. 327 00:15:54,180 --> 00:15:55,680 VICTOR COSTAN: Well, I'm telling you 328 00:15:55,680 --> 00:15:58,860 I can compute that using this. 329 00:15:58,860 --> 00:16:00,650 So I can use the counting sort algorithm 330 00:16:00,650 --> 00:16:05,130 and change this array a little bit so that I can do this trick 331 00:16:05,130 --> 00:16:07,082 and know what goes where. 332 00:16:07,082 --> 00:16:09,860 AUDIENCE: You already mentioned using a 2-D array. 333 00:16:09,860 --> 00:16:14,470 VICTOR COSTAN: But a 2-D array would be too much. 334 00:16:14,470 --> 00:16:16,660 In the end, I will be changing this in place. 335 00:16:16,660 --> 00:16:22,910 So no extra space except for this array of size k. 336 00:16:22,910 --> 00:16:25,320 But let's not worry about changing it in place right now. 337 00:16:25,320 --> 00:16:28,920 Let's say we're going to make another array of size k. 
338 00:16:34,920 --> 00:16:39,320 So I want it to tell me that-- I guess I don't care about this-- 339 00:16:39,320 --> 00:16:42,360 but I want it to tell me that one, the first one 340 00:16:42,360 --> 00:16:44,340 should go here, the first two should go here, 341 00:16:44,340 --> 00:16:47,540 the first three should go here, the first four should go here. 342 00:16:47,540 --> 00:16:48,310 How do I do that? 343 00:16:51,472 --> 00:16:54,820 AUDIENCE: Well, you could make that array, right. 344 00:16:54,820 --> 00:16:56,620 VICTOR COSTAN: But how do I compute it? 345 00:16:56,620 --> 00:16:58,245 AUDIENCE: While you're making this one, 346 00:16:58,245 --> 00:17:01,120 you can start filling that one in. 347 00:17:01,120 --> 00:17:03,910 But while you're making the top one. 348 00:17:03,910 --> 00:17:05,269 VICTOR COSTAN: Can I? 349 00:17:05,269 --> 00:17:08,790 AUDIENCE: It would be like insertion sort though, kind of. 350 00:17:08,790 --> 00:17:11,655 So you come across the four. 351 00:17:11,655 --> 00:17:14,030 You put it in there, because you know how many there are. 352 00:17:14,030 --> 00:17:15,510 But that doesn't make a lot of sense. 353 00:17:15,510 --> 00:17:16,510 VICTOR COSTAN: Yeah, OK. 354 00:17:16,510 --> 00:17:17,980 So let's abandon that route. 355 00:17:17,980 --> 00:17:20,309 Let's think of something else. 356 00:17:20,309 --> 00:17:21,946 AUDIENCE: Could you populate the array 357 00:17:21,946 --> 00:17:26,250 with the number of elements that are less than that [INAUDIBLE]? 358 00:17:26,250 --> 00:17:28,270 VICTOR COSTAN: So intuitively, I want 359 00:17:28,270 --> 00:17:30,150 this to tell me how many elements there 360 00:17:30,150 --> 00:17:32,170 are that are smaller than two. 361 00:17:32,170 --> 00:17:34,084 This should tell me the number of elements 362 00:17:34,084 --> 00:17:36,500 there are that are smaller than three, so on and so forth. 363 00:17:39,520 --> 00:17:41,230 OK, how would I compute that? 
364 00:17:46,852 --> 00:17:48,310 Let's see what it's supposed to be. 365 00:17:48,310 --> 00:17:49,768 Let's fill it out with real values. 366 00:17:49,768 --> 00:17:50,847 AUDIENCE: Zero. 367 00:17:50,847 --> 00:17:51,680 VICTOR COSTAN: Zero. 368 00:17:51,680 --> 00:17:53,410 How many elements smaller than one? 369 00:17:53,410 --> 00:17:54,650 AUDIENCE: Zero. 370 00:17:54,650 --> 00:17:57,087 VICTOR COSTAN: How many elements smaller than two? 371 00:17:57,087 --> 00:17:58,032 AUDIENCE: One. 372 00:17:58,032 --> 00:18:00,198 VICTOR COSTAN: How many elements smaller than three? 373 00:18:00,198 --> 00:18:01,950 AUDIENCE: Two. 374 00:18:01,950 --> 00:18:03,770 It's a cumulative sum. 375 00:18:03,770 --> 00:18:04,800 VICTOR COSTAN: OK. 376 00:18:04,800 --> 00:18:07,240 AUDIENCE: On the array above. 377 00:18:07,240 --> 00:18:09,650 VICTOR COSTAN: So this is how many elements smaller 378 00:18:09,650 --> 00:18:11,236 than four? 379 00:18:11,236 --> 00:18:13,530 Or how many elements smaller than 5 4? 380 00:18:13,530 --> 00:18:14,810 OK. 381 00:18:14,810 --> 00:18:16,992 what's the difference between these two guys? 382 00:18:16,992 --> 00:18:18,350 AUDIENCE: One. 383 00:18:18,350 --> 00:18:19,724 VICTOR COSTAN: What's the difference between these two 384 00:18:19,724 --> 00:18:19,968 guys? 385 00:18:19,968 --> 00:18:20,551 AUDIENCE: One. 386 00:18:22,994 --> 00:18:24,410 VICTOR COSTAN: Yeah, you're right. 387 00:18:24,410 --> 00:18:26,410 Sorry. 388 00:18:26,410 --> 00:18:27,530 Thank you. 389 00:18:27,530 --> 00:18:29,982 What's the difference between these two guys? 390 00:18:29,982 --> 00:18:32,142 AUDIENCE: Two. 391 00:18:32,142 --> 00:18:32,643 One. 392 00:18:32,643 --> 00:18:35,058 VICTOR COSTAN: And what's the difference between these two 393 00:18:35,058 --> 00:18:36,116 guys? 394 00:18:36,116 --> 00:18:38,561 AUDIENCE: Zero. 395 00:18:38,561 --> 00:18:41,154 VICTOR COSTAN: OK, What did I just write here? 
396 00:18:41,154 --> 00:18:42,615 AUDIENCE: Same series up there. 397 00:18:42,615 --> 00:18:44,080 AUDIENCE: Array. 398 00:18:44,080 --> 00:18:45,660 VICTOR COSTAN: All right. 399 00:18:45,660 --> 00:18:49,610 So this guy is zero, right, because there's no element 400 00:18:49,610 --> 00:18:52,380 that-- there's nothing that's smaller than the smallest key. 401 00:18:52,380 --> 00:18:59,610 And then this guy is whatever was here plus this almost. 402 00:18:59,610 --> 00:19:02,140 So the difference between this guy and this guy is this. 403 00:19:05,633 --> 00:19:07,629 AUDIENCE: So why go through an array? 404 00:19:07,629 --> 00:19:09,967 I mean, why did you bother? 405 00:19:09,967 --> 00:19:11,835 Why do we make a new array? 406 00:19:11,835 --> 00:19:13,675 Because we could just get that information. 407 00:19:13,675 --> 00:19:15,050 VICTOR COSTAN: Making a new array 408 00:19:15,050 --> 00:19:18,870 so that we can see how to compute it. 409 00:19:18,870 --> 00:19:21,170 So now we're going to try to write pseudocode that 410 00:19:21,170 --> 00:19:23,950 does this in place. 411 00:19:23,950 --> 00:19:27,780 So suppose this array is a and this 412 00:19:27,780 --> 00:19:33,300 array is pos for position. 413 00:19:33,300 --> 00:19:35,770 And suppose-- sorry, not this array. 414 00:19:35,770 --> 00:19:36,640 This array is a. 415 00:19:39,460 --> 00:19:40,360 This array is pos. 416 00:19:40,360 --> 00:19:41,740 And I start with this. 417 00:19:41,740 --> 00:19:45,020 And I want to end up with this. 418 00:19:45,020 --> 00:19:48,510 So let's try to write the pseudocode for counting sort. 419 00:19:48,510 --> 00:19:51,250 Counting sort with an array a. 420 00:19:51,250 --> 00:19:57,300 I'm not going to write the first two lines that produce this. 421 00:19:57,300 --> 00:19:59,827 Let's transform this to this. 422 00:19:59,827 --> 00:20:00,660 How would I do that?
423 00:20:03,950 --> 00:20:06,770 AUDIENCE: Initialize an array of the same size.
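Before doing anything in place, the "how many keys are smaller" array from the discussion above can be built as a second array of size k and then used to place every item directly. The sketch below assumes the names `positions_from_counts` and `counting_sort_stable`, and a hypothetical `key` helper that maps an item to an integer in 0..k-1; none of these come from the course code.

```python
def positions_from_counts(counts):
    """pos[i] = number of keys smaller than i (a cumulative sum of counts)."""
    pos = [0] * len(counts)
    for i in range(1, len(counts)):
        pos[i] = pos[i - 1] + counts[i - 1]
    return pos


def counting_sort_stable(items, k, key=lambda item: item):
    """Stable counting sort using a position array instead of per-key lists."""
    counts = [0] * k
    for item in items:
        counts[key(item)] += 1
    pos = positions_from_counts(counts)
    out = [None] * len(items)
    for item in items:               # left to right keeps equal keys in order
        out[pos[key(item)]] = item
        pos[key(item)] += 1          # the next equal key goes one slot later
    return out
```

For the counts 0, 1, 1, 2, 1, 0 from the board example, the position array comes out as 0, 0, 1, 2, 4, 5, so 3a lands at index 2 and 3b at index 3.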
424 00:20:06,770 --> 00:20:08,597 VICTOR COSTAN: OK. 425 00:20:08,597 --> 00:20:09,805 Can we try to do it in place? 426 00:20:13,253 --> 00:20:13,878 AUDIENCE: Sure. 427 00:20:15,864 --> 00:20:17,530 VICTOR COSTAN: How do we do it in place? 428 00:20:22,450 --> 00:20:25,530 AUDIENCE: You could, well for four, you get the four. 429 00:20:25,530 --> 00:20:28,660 You're like, oh, I haven't encountered anything below me. 430 00:20:28,660 --> 00:20:31,745 So you put it in zero initially for four. 431 00:20:31,745 --> 00:20:32,704 And then you get a one. 432 00:20:32,704 --> 00:20:35,036 And you're like, oh, I haven't gotten anything below me. 433 00:20:35,036 --> 00:20:36,866 But I forget to keep track of the fact 434 00:20:36,866 --> 00:20:39,116 that you have to iterate a whole list ever single time 435 00:20:39,116 --> 00:20:40,260 you get a new input. 436 00:20:40,260 --> 00:20:42,010 VICTOR COSTAN: So I don't want to do that, 437 00:20:42,010 --> 00:20:43,051 because that's n squared. 438 00:20:43,051 --> 00:20:50,540 AUDIENCE: What you need to do is keep a running sum. 439 00:20:50,540 --> 00:20:51,440 Is it a register? 440 00:20:51,440 --> 00:20:52,340 Is that what you do call it? 441 00:20:52,340 --> 00:20:52,920 VICTOR COSTAN: Running sum. 442 00:20:52,920 --> 00:20:53,720 I like running sum. 443 00:20:53,720 --> 00:20:54,261 AUDIENCE: OK. 444 00:20:54,261 --> 00:20:55,480 Keep a running sum of-- 445 00:20:58,770 --> 00:21:00,780 VICTOR COSTAN: Sums always start at zero, right? 446 00:21:00,780 --> 00:21:01,820 AUDIENCE: Right. 447 00:21:01,820 --> 00:21:10,140 So you keep zero at-- you take the value 448 00:21:10,140 --> 00:21:14,920 in each index of that array and add it to sum. 449 00:21:14,920 --> 00:21:16,250 VICTOR COSTAN: OK. 450 00:21:16,250 --> 00:21:23,280 So for i iterating from zero to-- so you 451 00:21:23,280 --> 00:21:26,577 want each value in this array, right? 452 00:21:26,577 --> 00:21:27,160 AUDIENCE: Yes. 
453 00:21:27,160 --> 00:21:30,980 VICTOR COSTAN: So it's going to iterate from zero to what? 454 00:21:30,980 --> 00:21:34,635 How many elements do I have there? 455 00:21:34,635 --> 00:21:36,910 AUDIENCE: Length k. 456 00:21:36,910 --> 00:21:38,400 VICTOR COSTAN: OK, almost. 457 00:21:38,400 --> 00:21:43,220 So we're using Python numbering, which is zero-based indexing. 458 00:21:43,220 --> 00:21:45,290 The indices look like this. 459 00:21:45,290 --> 00:21:46,799 So it's zero to-- 460 00:21:46,799 --> 00:21:47,715 AUDIENCE: [INAUDIBLE]. 461 00:21:47,715 --> 00:21:48,756 VICTOR COSTAN: Very good. 462 00:21:48,756 --> 00:21:50,870 Thank you. 463 00:21:50,870 --> 00:21:53,790 And you said I'm going to add the elements to a sum. 464 00:21:53,790 --> 00:22:03,300 So sum is sum plus position of i. 465 00:22:03,300 --> 00:22:05,080 OK. 466 00:22:05,080 --> 00:22:05,590 And then? 467 00:22:08,265 --> 00:22:12,510 AUDIENCE: The replace is the [INAUDIBLE]. 468 00:22:12,510 --> 00:22:16,830 So zero should be zero still. 469 00:22:16,830 --> 00:22:23,665 One should be the sum after evaluating zero. 470 00:22:23,665 --> 00:22:25,610 You'll need a temp variable. 471 00:22:25,610 --> 00:22:26,400 VICTOR COSTAN: OK. 472 00:22:26,400 --> 00:22:31,360 AUDIENCE: You'll need to grab position i in temp. 473 00:22:31,360 --> 00:22:35,400 VICTOR COSTAN: Temp is position i. 474 00:22:35,400 --> 00:22:42,605 AUDIENCE: Then say position i is sum before incrementing sum. 475 00:22:45,985 --> 00:22:46,485 No. 476 00:22:46,485 --> 00:22:49,395 That's not it at all. 477 00:22:49,395 --> 00:22:51,335 VICTOR COSTAN: Really? 478 00:22:51,335 --> 00:22:53,670 AUDIENCE: We'll have to say that sum is sum plus temp. 479 00:23:01,542 --> 00:23:02,526 That is going to work. 480 00:23:05,480 --> 00:23:06,230 VICTOR COSTAN: OK. 481 00:23:06,230 --> 00:23:09,500 How does everyone else feel about this? 482 00:23:09,500 --> 00:23:11,396 Does it make sense?
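The running-sum proposal above (save position i in a temp, overwrite position i with the sum so far, then add the temp to the sum) can be written out as a short sketch. The function name `counts_to_positions_in_place` is an assumption for this example, and the running total is called `total` rather than `sum` only to avoid shadowing Python's built-in `sum`.

```python
def counts_to_positions_in_place(pos):
    """Turn per-value counts into 'keys smaller than i' offsets, in place."""
    total = 0                     # sums always start at zero
    for i in range(len(pos)):     # i runs from 0 to k - 1
        temp = pos[i]             # remember the count before overwriting it
        pos[i] = total            # number of keys strictly smaller than i
        total += temp             # now include the keys equal to i
    return pos
```

The temp variable matters because the count at index i is still needed after the slot has been overwritten with the running total. The loop makes a single pass over the k counters, so it costs O(k).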
483 00:23:11,396 --> 00:23:12,283 AUDIENCE: Not really. 484 00:23:12,283 --> 00:23:14,324 AUDIENCE: [INAUDIBLE] temporary blast [INAUDIBLE] 485 00:23:14,324 --> 00:23:19,500 previous iteration, because-- so when you first started, 486 00:23:19,500 --> 00:23:21,930 it's the very initial case that doesn't work. 487 00:23:21,930 --> 00:23:24,520 So like, if you're in the first column, everything's fine. 488 00:23:24,520 --> 00:23:27,283 Then you go to column one. 489 00:23:27,283 --> 00:23:29,919 You're looking at everything to the left of it. 490 00:23:29,919 --> 00:23:31,085 It's still going to be zero. 491 00:23:31,085 --> 00:23:32,560 Then you go to the second column, 492 00:23:32,560 --> 00:23:35,250 but you already overwrote the previous column. 493 00:23:35,250 --> 00:23:39,620 So you need to store somehow the-- I don't know. 494 00:23:39,620 --> 00:23:42,932 It's just the initial case from when it first 495 00:23:42,932 --> 00:23:46,047 goes from zero to an actual qualified number. 496 00:23:46,047 --> 00:23:47,547 Because otherwise, you're just going 497 00:23:47,547 --> 00:23:48,824 to get like zero, zero, zero. 498 00:23:48,824 --> 00:23:51,770 And you just overwrite. 499 00:23:51,770 --> 00:23:56,189 AUDIENCE: Can you start [INAUDIBLE]? 500 00:23:56,189 --> 00:23:59,293 Was that before you changed? 501 00:23:59,293 --> 00:24:00,043 VICTOR COSTAN: OK. 502 00:24:14,580 --> 00:24:16,650 Sorry, I'm getting confused. 503 00:24:24,770 --> 00:24:27,220 This is getting hard. 504 00:24:27,220 --> 00:24:29,750 I will show you a trick to make life easier. 505 00:24:29,750 --> 00:24:33,160 I'm going to put-- how many elements do I have here? 506 00:24:33,160 --> 00:24:35,050 Five, right? 507 00:24:35,050 --> 00:24:41,170 So I'm going to put a five here after the array. 508 00:24:41,170 --> 00:24:45,212 And then I'm going to ask you, what's this difference? 509 00:24:45,212 --> 00:24:47,110 AUDIENCE: Zero. 510 00:24:47,110 --> 00:24:48,490 VICTOR COSTAN: OK.
511 00:24:48,490 --> 00:24:50,380 So now we have this whole array. 512 00:24:57,650 --> 00:25:00,780 Can people see what's going on here? 513 00:25:00,780 --> 00:25:03,130 So instead of starting at the beginning, 514 00:25:03,130 --> 00:25:04,380 I'm going to start at the end. 515 00:25:04,380 --> 00:25:09,750 And I'm going to know-- I know for sure there are n elements. 516 00:25:09,750 --> 00:25:13,310 Therefore, the index of this guy is n minus-- 517 00:25:13,310 --> 00:25:17,850 so the index of the last key is n minus how many keys I 518 00:25:17,850 --> 00:25:18,770 have with this value. 519 00:25:21,720 --> 00:25:24,140 Does this make sense? 520 00:25:24,140 --> 00:25:26,334 AUDIENCE: But you're iterating over an order, right? 521 00:25:26,334 --> 00:25:27,875 So we can't just take the whole thing 522 00:25:27,875 --> 00:25:30,300 and say we're going to shift it over to the right. 523 00:25:30,300 --> 00:25:31,383 VICTOR COSTAN: How about-- 524 00:25:35,069 --> 00:25:37,110 AUDIENCE: And you're going through left to right. 525 00:25:37,110 --> 00:25:39,840 You'll only know what you see thus far. 526 00:25:39,840 --> 00:25:48,150 VICTOR COSTAN: How about doing it for i from n minus 1 to 0? 527 00:25:48,150 --> 00:25:51,230 Will it work then? 528 00:25:51,230 --> 00:25:52,370 So what would I write? 529 00:25:52,370 --> 00:25:54,510 AUDIENCE: But isn't that super inefficient? 530 00:25:54,510 --> 00:25:57,534 Because then you're starting looking at the whole list. 531 00:25:57,534 --> 00:25:59,430 And then you're sort of, rather than just 532 00:25:59,430 --> 00:26:02,955 looking at the previous sum that you just-- the cumulative. 533 00:26:02,955 --> 00:26:04,386 So your first iteration, you have 534 00:26:04,386 --> 00:26:05,817 to add up everything that you see. 535 00:26:05,817 --> 00:26:08,204 Like iteration, you have to add everything up.
536 00:26:08,204 --> 00:26:10,120 VICTOR COSTAN: So if I add everything up here, 537 00:26:10,120 --> 00:26:12,790 what's the result going to be? 538 00:26:12,790 --> 00:26:13,570 AUDIENCE: Five. 539 00:26:13,570 --> 00:26:14,320 VICTOR COSTAN: OK. 540 00:26:14,320 --> 00:26:14,880 What's five? 541 00:26:21,094 --> 00:26:24,070 So this counts how many zero keys 542 00:26:24,070 --> 00:26:26,086 I've seen, how many one keys I've seen, 543 00:26:26,086 --> 00:26:29,389 how many two keys I've seen, so on and so forth. 544 00:26:29,389 --> 00:26:29,930 So in total-- 545 00:26:29,930 --> 00:26:30,960 AUDIENCE: So you're subtracting-- 546 00:26:30,960 --> 00:26:32,793 VICTOR COSTAN: It's how many keys I've seen. 547 00:26:32,793 --> 00:26:36,040 All this, the sum of all these, is how many keys I've seen. 548 00:26:36,040 --> 00:26:37,540 How many keys do I have? 549 00:26:37,540 --> 00:26:38,450 AUDIENCE: Five. 550 00:26:38,450 --> 00:26:39,950 For each one you see, you can just-- 551 00:26:39,950 --> 00:26:42,846 VICTOR COSTAN: So who's five? 552 00:26:42,846 --> 00:26:45,320 It's the length of this guy, right? 553 00:26:45,320 --> 00:26:47,680 And we usually call that n. 554 00:26:47,680 --> 00:26:53,620 So when we're doing sorting, this is n. 555 00:26:53,620 --> 00:26:55,175 So maybe it's less confusing. 556 00:26:55,175 --> 00:26:57,180 Oh, I already used n in two places. 557 00:26:57,180 --> 00:26:59,930 So I guess that's it. 558 00:26:59,930 --> 00:27:04,245 I could say the length of a, but there you go. 559 00:27:07,327 --> 00:27:09,660 So I could do the thing that we were going through before. 560 00:27:09,660 --> 00:27:11,201 I could figure out my temp variables. 561 00:27:11,201 --> 00:27:14,060 And I could make it work. 562 00:27:14,060 --> 00:27:15,145 Or I could do this. 563 00:27:15,145 --> 00:27:16,440 AUDIENCE: I think it's the same though, isn't it? 564 00:27:16,440 --> 00:27:17,340 VICTOR COSTAN: Yup.
565 00:27:17,340 --> 00:27:20,616 It's the same thing, except I think this is easier to write. 566 00:27:20,616 --> 00:27:22,240 Does anyone want to help me write this? 567 00:27:28,636 --> 00:27:31,217 AUDIENCE: Maybe doing once you're 568 00:27:31,217 --> 00:27:34,052 starting with the top array, and then finding the bottom one. 569 00:27:34,052 --> 00:27:35,024 VICTOR COSTAN: Yeah. 570 00:27:35,024 --> 00:27:35,960 AUDIENCE: Oh, OK. 571 00:27:35,960 --> 00:27:37,835 Well, you just-- you start with the first one 572 00:27:37,835 --> 00:27:40,444 and the one ahead of it. 573 00:27:40,444 --> 00:27:43,240 And oh, I mean starting with the top right. 574 00:27:43,240 --> 00:27:43,820 Sorry. 575 00:27:43,820 --> 00:27:46,400 VICTOR COSTAN: OK, so I have this. 576 00:27:46,400 --> 00:27:49,182 And then what do I do? 577 00:27:49,182 --> 00:27:51,597 AUDIENCE: [INAUDIBLE]? 578 00:27:51,597 --> 00:27:53,530 Oh, so you're starting from the back. 579 00:27:53,530 --> 00:27:54,534 VICTOR COSTAN: Yep. 580 00:27:54,534 --> 00:28:00,720 AUDIENCE: Well, then you just compare that to-- I mean, 581 00:28:00,720 --> 00:28:03,445 you're going to start with zero difference. 582 00:28:03,445 --> 00:28:05,820 If you have-- well you don't have any of those last keys, 583 00:28:05,820 --> 00:28:07,434 so you'd be able to start with a zero. 584 00:28:07,434 --> 00:28:09,600 VICTOR COSTAN: So what's the difference between five 585 00:28:09,600 --> 00:28:12,780 here, which is n, and this guy? 586 00:28:12,780 --> 00:28:13,437 What is this? 587 00:28:13,437 --> 00:28:14,770 AUDIENCE: It's going to be zero. 588 00:28:14,770 --> 00:28:16,140 VICTOR COSTAN: But what is it? 589 00:28:16,140 --> 00:28:16,870 Why is it zero? 590 00:28:16,870 --> 00:28:19,350 So this one's zero, this one's one, this one's two. 591 00:28:19,350 --> 00:28:21,910 What is this? 592 00:28:21,910 --> 00:28:23,160 It's the last guy here, right? 593 00:28:23,160 --> 00:28:24,180 AUDIENCE: Yeah, yeah. 
594 00:28:24,180 --> 00:28:29,450 VICTOR COSTAN: So this is pos of n minus 1. 595 00:28:29,450 --> 00:28:34,090 And this is pos of n minus 2, so on and so forth. 596 00:28:34,090 --> 00:28:38,420 So to get from n to the value here, 597 00:28:38,420 --> 00:28:40,030 I have to subtract this guy. 598 00:28:45,417 --> 00:28:46,250 AUDIENCE: Pos of i. 599 00:28:48,859 --> 00:28:49,900 VICTOR COSTAN: Pos of i. 600 00:28:56,264 --> 00:28:57,180 AUDIENCE: [INAUDIBLE]. 601 00:29:02,681 --> 00:29:03,430 VICTOR COSTAN: OK. 602 00:29:03,430 --> 00:29:04,180 Very good. 603 00:29:04,180 --> 00:29:07,659 AUDIENCE: And then update sum. 604 00:29:07,659 --> 00:29:10,144 Sum equals the pos value. 605 00:29:13,140 --> 00:29:14,210 VICTOR COSTAN: Sweet. 606 00:29:14,210 --> 00:29:17,681 No temp variables, aside from this, I guess. 607 00:29:17,681 --> 00:29:18,680 How does this look? 608 00:29:18,680 --> 00:29:20,860 Do people get it? 609 00:29:20,860 --> 00:29:22,690 AUDIENCE: You're subtracting pos of i, 610 00:29:22,690 --> 00:29:25,450 or you're subtracting a of i? 611 00:29:25,450 --> 00:29:26,870 AUDIENCE: It's all one array. 612 00:29:26,870 --> 00:29:28,120 AUDIENCE: It's the same thing. 613 00:29:28,120 --> 00:29:28,661 That's right. 614 00:29:28,661 --> 00:29:32,230 VICTOR COSTAN: So a is this array. a is the input array. 615 00:29:32,230 --> 00:29:34,370 And pos is this guy. 616 00:29:34,370 --> 00:29:36,800 And this is pos before the for loop. 617 00:29:36,800 --> 00:29:39,320 And this is pos after the for loop. 618 00:29:39,320 --> 00:29:41,150 So I guess this is pos zero. 619 00:29:41,150 --> 00:29:43,050 And this is pos one. 620 00:29:43,050 --> 00:29:46,080 And here, we start with pos zero. 621 00:29:46,080 --> 00:29:48,910 This, we end up with pos one. 622 00:29:48,910 --> 00:29:53,300 OK. 623 00:29:53,300 --> 00:29:55,747 So we're able to compute this.
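The back-to-front version just described (start the sum at n, subtract each count, then update the sum) might look like this in Python. This is a sketch under assumptions: the function name is made up, the array the lecture calls pos is assumed to hold per-key counts on entry, and the loop runs over the entries of that array, whose length happens to equal n in the five-key board example.

```python
def counts_to_positions_backward(pos, n):
    """Convert per-key counts to starting positions, walking right to left.

    pos[i] initially holds the number of keys equal to i; n is the
    total number of keys. No temp variable is needed in this direction.
    """
    total = n                              # everything sits before index n
    for i in range(len(pos) - 1, -1, -1):  # last entry down to entry 0
        pos[i] = total - pos[i]            # first slot for keys of value i
        total = pos[i]                     # smaller keys must end before it
    return pos


# Same hypothetical counts as a five-key input with one 1, one 2, two 3s, one 4.
print(counts_to_positions_backward([0, 1, 1, 2, 1], 5))  # [0, 0, 1, 2, 4]
```

Both directions produce the same array; this one just avoids juggling a temporary.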
624 00:29:55,747 --> 00:29:57,830 There are many ways of doing this, but in the end, 625 00:29:57,830 --> 00:30:00,250 you want an array that looks like that. 626 00:30:00,250 --> 00:30:01,379 This is counting sort. 627 00:30:01,379 --> 00:30:03,420 This is the hard part of counting sort, coming up 628 00:30:03,420 --> 00:30:04,650 with that array. 629 00:30:04,650 --> 00:30:06,980 Once you come up with that array, you're golden. 630 00:30:06,980 --> 00:30:10,990 So let's see that we're golden and produce an output array 631 00:30:10,990 --> 00:30:12,500 with the keys in the right order. 632 00:30:15,760 --> 00:30:18,050 So say we have an array called output. 633 00:30:18,050 --> 00:30:21,530 And this is going to have these keys in the right order. 634 00:30:21,530 --> 00:30:24,956 What's the pseudocode for that? 635 00:30:24,956 --> 00:30:28,160 First, I'm going to create a new array. 636 00:30:28,160 --> 00:30:32,155 And I'm going to initialize it with n NIL values. 637 00:30:34,990 --> 00:30:35,740 Then what do I do? 638 00:30:40,390 --> 00:30:42,050 AUDIENCE: Iterate over a. 639 00:30:42,050 --> 00:30:44,470 VICTOR COSTAN: Very good. 640 00:30:44,470 --> 00:30:47,350 For-- nah, it's too low. 641 00:30:47,350 --> 00:30:48,458 Let's do it here. 642 00:30:48,458 --> 00:30:49,454 AUDIENCE: i of a. 643 00:30:53,440 --> 00:30:57,950 From zero to n minus 1. 644 00:30:57,950 --> 00:30:59,660 VICTOR COSTAN: OK. 645 00:30:59,660 --> 00:31:00,210 What do I do? 646 00:31:03,070 --> 00:31:12,962 AUDIENCE: Out of [INAUDIBLE] has to be-- oh, 647 00:31:12,962 --> 00:31:14,450 can we modify pos one as we go? 648 00:31:14,450 --> 00:31:15,350 VICTOR COSTAN: Yeah. 649 00:31:15,350 --> 00:31:18,750 AUDIENCE: So you could say, out of pos one-- 650 00:31:18,750 --> 00:31:20,500 VICTOR COSTAN: So by the way, this is pos. 651 00:31:20,500 --> 00:31:22,350 The reason I label them with zero and one, 652 00:31:22,350 --> 00:31:23,900 so we're doing the change in place.
653 00:31:23,900 --> 00:31:24,150 AUDIENCE: Right. 654 00:31:24,150 --> 00:31:25,608 VICTOR COSTAN: The reason I labeled 655 00:31:25,608 --> 00:31:27,825 them is to say that this is what pos 656 00:31:27,825 --> 00:31:29,590 is before going into the loop. 657 00:31:29,590 --> 00:31:31,600 This is what pos is afterwards. 658 00:31:31,600 --> 00:31:33,440 But it's a single array. 659 00:31:33,440 --> 00:31:34,580 So let's call it pos. 660 00:31:34,580 --> 00:31:36,455 So out of pos of-- 661 00:31:36,455 --> 00:31:40,330 AUDIENCE: Pos of a of i equals a of i. 662 00:31:43,281 --> 00:31:46,990 Pos of a of i plus equals one. 663 00:31:46,990 --> 00:31:48,830 VICTOR COSTAN: Yup. 664 00:31:48,830 --> 00:31:51,670 And I'm going to use the CLRS way, 665 00:31:51,670 --> 00:31:54,590 which makes me write more. 666 00:31:58,190 --> 00:31:59,470 So how does this work? 667 00:31:59,470 --> 00:32:01,710 I have this array here. 668 00:32:01,710 --> 00:32:04,370 I start at four. 669 00:32:04,370 --> 00:32:05,750 What's pos of four? 670 00:32:08,582 --> 00:32:10,000 AUDIENCE: Four. 671 00:32:10,000 --> 00:32:12,570 VICTOR COSTAN: All right, so I'm going 672 00:32:12,570 --> 00:32:14,520 to write this at position four. 673 00:32:14,520 --> 00:32:18,950 I should probably make this a proper array. 674 00:32:18,950 --> 00:32:21,510 One, two, three, four, five. 675 00:32:24,570 --> 00:32:26,740 So at four, I write four. 676 00:32:26,740 --> 00:32:30,420 And then I increment this guy to become five. 677 00:32:34,820 --> 00:32:36,070 Then I get to one. 678 00:32:36,070 --> 00:32:38,320 So I look at pos of-- 679 00:32:38,320 --> 00:32:39,680 AUDIENCE: One. 680 00:32:39,680 --> 00:32:42,800 VICTOR COSTAN: And that is zero. 681 00:32:42,800 --> 00:32:46,580 So I'm going to write one at position zero. 682 00:32:46,580 --> 00:32:50,470 And I'm going to increment it. 683 00:32:50,470 --> 00:32:51,755 Then I get to 3a. 684 00:32:51,755 --> 00:32:53,620 I look at pos of 3.
685 00:32:53,620 --> 00:32:54,600 It says 2. 686 00:32:54,600 --> 00:32:58,793 So I'm going to write 3a here and increment this. 687 00:33:03,140 --> 00:33:04,660 Then I get to two. 688 00:33:04,660 --> 00:33:06,707 Pos of two is-- 689 00:33:06,707 --> 00:33:07,290 AUDIENCE: One. 690 00:33:07,290 --> 00:33:08,240 VICTOR COSTAN: One. 691 00:33:08,240 --> 00:33:10,560 So I write two here. 692 00:33:10,560 --> 00:33:14,530 Pos of two becomes two. 693 00:33:14,530 --> 00:33:19,610 Then I have 3c, and pos of 3 is now 3. 694 00:33:19,610 --> 00:33:20,790 It's not two anymore. 695 00:33:20,790 --> 00:33:23,870 So yay, I'm not overwriting 3a. 696 00:33:23,870 --> 00:33:24,960 That's good. 697 00:33:24,960 --> 00:33:26,040 And this becomes four. 698 00:33:31,559 --> 00:33:33,350 Are people getting what just happened here? 699 00:33:35,990 --> 00:33:41,250 AUDIENCE: Wait, why didn't [INAUDIBLE] to just basically 700 00:33:41,250 --> 00:33:45,639 train the next array into an index binder? 701 00:33:45,639 --> 00:33:46,430 VICTOR COSTAN: Yep. 702 00:33:46,430 --> 00:33:50,090 So this guy tells me if I have a key, 703 00:33:50,090 --> 00:33:52,700 where do I write it in here? 704 00:33:52,700 --> 00:33:56,950 So these start out with pointers to the first element that 705 00:33:56,950 --> 00:33:58,240 would store that key value. 706 00:33:58,240 --> 00:34:01,910 And when I store a key, say when I store 3a, when I get to 3c, 707 00:34:01,910 --> 00:34:03,770 I don't want to store it in the same place. 708 00:34:03,770 --> 00:34:05,280 So I have to increment that. 709 00:34:05,280 --> 00:34:07,910 I have to say, yo, I wrote 3a at position two. 710 00:34:07,910 --> 00:34:10,510 So next time, write it-- next time you 711 00:34:10,510 --> 00:34:13,290 see a three, write it at the position following that. 712 00:34:13,290 --> 00:34:16,750 And that's what this guy does. 713 00:34:19,989 --> 00:34:22,710 So this is the relatively easy part.
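Putting the pieces of the walkthrough together, a complete counting sort could be sketched as below. This is one possible rendering, not the code from the board: the `key` parameter and the `(number, label)` tuples are assumptions added here so that the two 3s can be told apart, mirroring the 3a and 3c in the example.

```python
def counting_sort(a, k, key=lambda item: item):
    """Stable counting sort of a, where key(item) is an int in 0..k-1."""
    # Step 1: count how many items carry each key value.
    pos = [0] * k
    for item in a:
        pos[key(item)] += 1

    # Step 2: turn counts into starting positions, back to front.
    total = len(a)
    for i in range(k - 1, -1, -1):
        pos[i] = total - pos[i]
        total = pos[i]

    # Step 3: place each item at its slot, then bump the slot so the
    # next item with the same key lands right after it (stability).
    out = [None] * len(a)
    for item in a:
        out[pos[key(item)]] = item
        pos[key(item)] = pos[key(item)] + 1
    return out


board = [(4, ""), (1, ""), (3, "a"), (2, ""), (3, "c")]
print(counting_sort(board, 5, key=lambda pair: pair[0]))
# [(1, ''), (2, ''), (3, 'a'), (3, 'c'), (4, '')]
```

Note that 3a still comes before 3c in the output, which is exactly the property the increment in step 3 protects.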
714 00:34:22,710 --> 00:34:26,159 And this is the hard magic in counting sort. 715 00:34:30,270 --> 00:34:33,290 So how are people feeling about it now? 716 00:34:35,969 --> 00:34:39,426 Any nods, or is still confusing as hell? 717 00:34:39,426 --> 00:34:40,300 AUDIENCE: It's a lot. 718 00:34:40,300 --> 00:34:43,340 I'm confused. 719 00:34:43,340 --> 00:34:46,230 VICTOR COSTAN: OK. 720 00:34:46,230 --> 00:34:47,690 Well what should we do? 721 00:34:47,690 --> 00:34:49,690 Do you guys want to ask more questions? 722 00:34:49,690 --> 00:34:52,815 Do you want to run through another example? 723 00:34:52,815 --> 00:34:55,750 Do you want to try to see how this becomes useful in radix 724 00:34:55,750 --> 00:34:58,720 sort, so that you're motivated to figure it out on your own? 725 00:34:58,720 --> 00:35:00,570 What would make more sense? 726 00:35:00,570 --> 00:35:01,070 All right. 727 00:35:01,070 --> 00:35:03,710 Who wants to do more count sort? 728 00:35:03,710 --> 00:35:06,200 Who wants to do some radix sort. 729 00:35:06,200 --> 00:35:07,190 All right. 730 00:35:07,190 --> 00:35:07,950 Radix sort it is. 731 00:35:10,399 --> 00:35:12,440 Next time you want to move on, tell me understood 732 00:35:12,440 --> 00:35:13,434 and I'll believe you. 733 00:35:13,434 --> 00:35:14,600 And it'll look good on tape. 734 00:35:17,310 --> 00:35:18,390 Two, three-- 735 00:35:18,390 --> 00:35:20,056 AUDIENCE: You're not supposed to tell us 736 00:35:20,056 --> 00:35:21,630 that there's a camera in here. 737 00:35:21,630 --> 00:35:24,091 VICTOR COSTAN: One, four. 738 00:35:24,091 --> 00:35:26,340 I think you're supposed to know, because otherwise you 739 00:35:26,340 --> 00:35:30,410 don't know that we're violating your rights. 740 00:35:30,410 --> 00:35:31,330 Two, four-- 741 00:35:31,330 --> 00:35:34,250 AUDIENCE: This is out the door. 742 00:35:34,250 --> 00:35:44,090 VICTOR COSTAN: One, two, four, three, two, one, four, three. 743 00:35:44,090 --> 00:35:45,075 And one more. 
744 00:35:45,075 --> 00:35:48,120 One, two, three, four. 745 00:35:48,120 --> 00:35:50,150 So this is to refresh your memory. 746 00:35:50,150 --> 00:35:54,800 What do keys look like in counting and radix sort? 747 00:35:54,800 --> 00:35:58,580 So in counting sort, the keys have to be numbers from 0 to k minus 1. 748 00:35:58,580 --> 00:35:59,480 How about radix sort? 749 00:35:59,480 --> 00:36:00,508 What do keys look like? 750 00:36:10,560 --> 00:36:16,080 So radix sort says that a key is a sequence of digits. 751 00:36:16,080 --> 00:36:21,010 Say you have d digits in a key. 752 00:36:21,010 --> 00:36:23,880 But then each digit isn't necessarily a base 10 digit 753 00:36:23,880 --> 00:36:25,240 like we're used to. 754 00:36:25,240 --> 00:36:27,920 Each digit is in base k. 755 00:36:27,920 --> 00:36:31,940 So each digit can be from 0 to k minus 1. 756 00:36:31,940 --> 00:36:36,820 And we're using base k. 757 00:36:36,820 --> 00:36:39,230 How many keys can I represent this way? 758 00:36:41,940 --> 00:36:44,310 So if you have numbers of d digits in base k, 759 00:36:44,310 --> 00:36:46,310 what's the biggest number that we can represent, 760 00:36:46,310 --> 00:36:48,389 or how many numbers can we represent with that? 761 00:36:48,389 --> 00:36:49,796 AUDIENCE: n to the k. 762 00:36:49,796 --> 00:36:53,079 No, d to the k. 763 00:36:53,079 --> 00:36:54,774 Right? 764 00:36:54,774 --> 00:36:55,690 VICTOR COSTAN: Almost. 765 00:36:55,690 --> 00:36:56,856 AUDIENCE: [INAUDIBLE] the d. 766 00:37:00,479 --> 00:37:01,520 VICTOR COSTAN: All right. 767 00:37:01,520 --> 00:37:03,900 So if our base is two, like if we're using bits, 768 00:37:03,900 --> 00:37:05,630 then our base is two. 769 00:37:05,630 --> 00:37:08,010 And if I have eight bits, then two to the eight. 770 00:37:10,600 --> 00:37:11,330 Cool. 771 00:37:11,330 --> 00:37:14,490 So if I add one more digit, I get 772 00:37:14,490 --> 00:37:19,800 to multiply the number of keys I represent by k.
773 00:37:19,800 --> 00:37:21,850 How do I radix sort? 774 00:37:21,850 --> 00:37:24,610 Does anyone remember? 775 00:37:24,610 --> 00:37:28,692 AUDIENCE: We checked the log base k of everything. 776 00:37:28,692 --> 00:37:32,330 I guess log base d. 777 00:37:32,330 --> 00:37:33,260 Oh, k. 778 00:37:33,260 --> 00:37:34,190 It's based in-- 779 00:37:34,190 --> 00:37:34,330 VICTOR COSTAN: No. 780 00:37:34,330 --> 00:37:35,340 That would be hard math. 781 00:37:35,340 --> 00:37:36,820 We don't do hard math. 782 00:37:36,820 --> 00:37:40,340 In sorting, if you have integers going into your sort, 783 00:37:40,340 --> 00:37:41,690 you only do integer operations. 784 00:37:41,690 --> 00:37:43,690 You don't do any math beyond that. 785 00:37:51,830 --> 00:37:54,520 So what we do is we've broken up the keys 786 00:37:54,520 --> 00:37:55,950 into d digits for a reason. 787 00:37:55,950 --> 00:37:59,030 We're going to have d rounds in the sort. 788 00:37:59,030 --> 00:38:03,659 And in each round, we're going to take all the keys 789 00:38:03,659 --> 00:38:04,200 that we have. 790 00:38:04,200 --> 00:38:08,940 And we're going to sort them according to one of the digits. 791 00:38:08,940 --> 00:38:11,800 So in one round, we'll sort them according to this digit. 792 00:38:11,800 --> 00:38:13,540 In one round, we'll sort them according 793 00:38:13,540 --> 00:38:15,690 to this digit, this digit, this digit. 794 00:38:18,750 --> 00:38:20,580 Which digit do we start with? 795 00:38:20,580 --> 00:38:21,988 What do you guys think? 796 00:38:21,988 --> 00:38:23,900 AUDIENCE: The least significant digit, right? 797 00:38:23,900 --> 00:38:25,691 AUDIENCE: And most significant on the left. 798 00:38:29,830 --> 00:38:32,272 VICTOR COSTAN: So this or this? 799 00:38:32,272 --> 00:38:33,580 AUDIENCE: The right side. 800 00:38:33,580 --> 00:38:36,180 AUDIENCE: 100 is bigger than 1, even 801 00:38:36,180 --> 00:38:41,219 though the 1 is greater than the 0 in 100.
802 00:38:41,219 --> 00:38:42,760 VICTOR COSTAN: You're helping me out. 803 00:38:42,760 --> 00:38:44,730 So the point I'm trying to make here 804 00:38:44,730 --> 00:38:47,280 is radix sort is unintuitive. 805 00:38:47,280 --> 00:38:49,269 If we ask you on a quiz what do you start with, 806 00:38:49,269 --> 00:38:50,810 your intuition will tell you to start 807 00:38:50,810 --> 00:38:52,460 with the most significant digit. 808 00:38:52,460 --> 00:38:53,876 Go against it. 809 00:38:53,876 --> 00:38:56,726 In radix sort, you start with the least significant digit 810 00:38:56,726 --> 00:38:57,850 and then move your way out. 811 00:38:57,850 --> 00:39:02,333 So radix sort goes like this. 812 00:39:02,333 --> 00:39:03,874 AUDIENCE: I mean, it does make sense, 813 00:39:03,874 --> 00:39:05,962 because you don't have very much information unless you're 814 00:39:05,962 --> 00:39:06,766 looking at bits. 815 00:39:06,766 --> 00:39:09,176 You can get a bunch of twos, but that 816 00:39:09,176 --> 00:39:10,622 doesn't give you much information. 817 00:39:10,622 --> 00:39:12,400 The most information is the smallest bit. 818 00:39:12,400 --> 00:39:14,671 And then you move up from there. 819 00:39:14,671 --> 00:39:16,420 VICTOR COSTAN: It depends what information 820 00:39:16,420 --> 00:39:17,530 you're trying to get. 821 00:39:17,530 --> 00:39:20,810 But maybe you know the algorithm, so you're thinking, 822 00:39:20,810 --> 00:39:22,470 oh, by knowing the algorithm, I know 823 00:39:22,470 --> 00:39:24,700 that I'll have the most information 824 00:39:24,700 --> 00:39:26,300 by looking at it this way. 825 00:39:26,300 --> 00:39:30,050 All right, so let's sort these by the last digit. 826 00:39:30,050 --> 00:39:30,990 Sweet. 827 00:39:30,990 --> 00:39:33,660 Let's sort them by the digit, by the digit 828 00:39:33,660 --> 00:39:35,970 before the last digit. 829 00:39:35,970 --> 00:39:38,022 What do I have to do in my sorting? 
830 00:39:38,022 --> 00:39:39,480 What do I have to pay attention to? 831 00:39:43,380 --> 00:39:46,280 So the sorting method that I use has to have a property. 832 00:39:46,280 --> 00:39:47,780 It can't be any kind of sorting. 833 00:39:50,251 --> 00:39:50,750 Stable. 834 00:39:55,460 --> 00:39:57,990 So the reason we went through all this pain in counting sort 835 00:39:57,990 --> 00:40:00,840 is because we want to have a stable sort here. 836 00:40:00,840 --> 00:40:05,750 Now, let's try to sort these in a stable manner. 837 00:40:05,750 --> 00:40:09,110 This is the first one: two, four, one, three. 838 00:40:09,110 --> 00:40:15,854 Then I have two threes, so one, four, three, two, and one, two, 839 00:40:15,854 --> 00:40:16,780 three, four. 840 00:40:16,780 --> 00:40:20,160 And then I have three fours. 841 00:40:20,160 --> 00:40:22,160 Two, three, four, one. 842 00:40:22,160 --> 00:40:24,660 Two, four, one, three. 843 00:40:24,660 --> 00:40:27,320 Two, one, four, three. 844 00:40:27,320 --> 00:40:28,340 Wait, this isn't good. 845 00:40:32,860 --> 00:40:35,560 Two, three, four, one. 846 00:40:35,560 --> 00:40:36,735 One, two, four, three. 847 00:40:36,735 --> 00:40:38,718 AUDIENCE: You should cross them off if you write them down. 848 00:40:38,718 --> 00:40:39,717 VICTOR COSTAN: I should. 849 00:40:43,450 --> 00:40:47,020 I was hoping you guys would help me if I mess up. 850 00:40:47,020 --> 00:40:48,900 So now these are sorted stably. 851 00:40:48,900 --> 00:40:53,330 Let's look at these last three that have the same digit here. 852 00:40:53,330 --> 00:40:54,900 So they have the same four. 853 00:40:54,900 --> 00:40:57,360 If you look at the last digit, because I 854 00:40:57,360 --> 00:40:59,850 used a stable sort, they're also 855 00:40:59,850 --> 00:41:02,230 sorted according to this last digit. 856 00:41:02,230 --> 00:41:05,770 So they're sorted according to these last two digits, 857 00:41:05,770 --> 00:41:08,140 because the sorting that I used is stable.
858 00:41:08,140 --> 00:41:11,740 So now if I sort according to this digit, 859 00:41:11,740 --> 00:41:13,570 then if my sorting is stable, they're 860 00:41:13,570 --> 00:41:17,350 going to be sorted according to the last three digits. 861 00:41:17,350 --> 00:41:21,300 So as I go from my last digit to my first digit, 862 00:41:21,300 --> 00:41:23,300 the keys are going to be sorted according 863 00:41:23,300 --> 00:41:25,550 to the last digit, the last two digits, the last three 864 00:41:25,550 --> 00:41:28,190 digits, and then all the way up to everything. 865 00:41:28,190 --> 00:41:29,760 This is why I need a stable sort. 866 00:41:29,760 --> 00:41:32,034 And also, this is why I need to start from the end. 867 00:41:40,160 --> 00:41:42,870 Does this make some sense? 868 00:41:42,870 --> 00:41:46,750 What stable sort did we just learn? 869 00:41:46,750 --> 00:41:48,582 AUDIENCE: Counting. 870 00:41:48,582 --> 00:41:49,790 VICTOR COSTAN: Counting sort. 871 00:41:49,790 --> 00:41:50,290 All right. 872 00:41:50,290 --> 00:41:52,930 So we're going to use counting sort for this. 873 00:41:52,930 --> 00:41:54,680 What's the running time for one round? 874 00:41:54,680 --> 00:41:57,520 So for one sorting. 875 00:41:57,520 --> 00:41:59,220 One counting sort takes how much time? 876 00:42:02,094 --> 00:42:04,559 AUDIENCE: This is a radix sort. 877 00:42:04,559 --> 00:42:05,350 VICTOR COSTAN: Yes. 878 00:42:05,350 --> 00:42:08,620 So radix sort is d rounds of counting sort. 879 00:42:08,620 --> 00:42:10,320 Count sort this, count sort this, 880 00:42:10,320 --> 00:42:12,890 count sort this, count sort this. 881 00:42:12,890 --> 00:42:15,550 So one round, one counting sort, what's the running time? 882 00:42:18,256 --> 00:42:19,610 AUDIENCE: [INAUDIBLE]. 883 00:42:19,610 --> 00:42:20,651 VICTOR COSTAN: Thank you. 884 00:42:23,840 --> 00:42:31,750 Now how about d of these plus the running time? 885 00:42:31,750 --> 00:42:33,202 AUDIENCE: dn plus b. 
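The "d rounds of counting sort, least significant digit first" recipe can be sketched as follows. This is an illustrative version under assumptions: keys are taken to be non-negative Python integers, digits are extracted with integer division and modulo, and the sample keys are made up rather than copied from the board.

```python
def radix_sort(keys, d, k):
    """Sort non-negative ints that fit in d base-k digits.

    Runs d rounds of a stable counting sort, starting from the
    least significant digit and moving toward the most significant.
    """
    for round_index in range(d):
        divisor = k ** round_index

        def digit(x):
            # The base-k digit examined in this round (integer ops only).
            return (x // divisor) % k

        # Stable counting sort on that one digit.
        pos = [0] * k
        for x in keys:
            pos[digit(x)] += 1
        total = len(keys)
        for i in range(k - 1, -1, -1):
            pos[i] = total - pos[i]
            total = pos[i]
        out = [None] * len(keys)
        for x in keys:
            out[pos[digit(x)]] = x
            pos[digit(x)] += 1
        keys = out
    return keys


print(radix_sort([2314, 2413, 1243, 2143, 1234], d=4, k=10))
# [1234, 1243, 2143, 2314, 2413]
```

Each round costs one counting sort over n keys with a k-entry position array, so the whole thing is d such passes.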
886 00:42:40,480 --> 00:42:43,040 VICTOR COSTAN: OK, but I want to come back here. 887 00:42:43,040 --> 00:42:46,450 And I want to be able to say that radix sort is optimal. 888 00:42:46,450 --> 00:42:50,780 I want to be able to say that it is order n. 889 00:42:50,780 --> 00:42:53,635 So what do I have to do in order to be able to say that? 890 00:42:58,810 --> 00:43:02,550 AUDIENCE: [INAUDIBLE] k equal to m. 891 00:43:02,550 --> 00:43:05,450 VICTOR COSTAN: So you're going from-- you know the answer. 892 00:43:05,450 --> 00:43:07,616 You're going from the fact that you know the answer. 893 00:43:07,616 --> 00:43:08,660 AUDIENCE: [INAUDIBLE]. 894 00:43:08,660 --> 00:43:09,510 VICTOR COSTAN: OK, very good. 895 00:43:09,510 --> 00:43:11,010 What if we wouldn't know the answer? 896 00:43:11,010 --> 00:43:12,725 What do I need to do? 897 00:43:12,725 --> 00:43:14,808 AUDIENCE: Well, we know the first part is order n. 898 00:43:14,808 --> 00:43:16,189 So-- 899 00:43:16,189 --> 00:43:17,480 VICTOR COSTAN: So d has to be-- 900 00:43:17,480 --> 00:43:21,400 AUDIENCE: We want dn to be greater than dk, right? 901 00:43:21,400 --> 00:43:23,470 VICTOR COSTAN: Well, so dn. 902 00:43:23,470 --> 00:43:26,817 dn has to be, at most, o of n, right. 903 00:43:26,817 --> 00:43:28,900 Because otherwise, the whole thing would go above. 904 00:43:28,900 --> 00:43:30,150 So that wouldn't work. 905 00:43:30,150 --> 00:43:34,000 So then what can I say about d? 906 00:43:34,000 --> 00:43:35,340 AUDIENCE: Constant. 907 00:43:35,340 --> 00:43:36,381 VICTOR COSTAN: Very good. 908 00:43:36,381 --> 00:43:38,402 And how do you write constants in math mode? 909 00:43:38,402 --> 00:43:39,509 AUDIENCE: Order one. 910 00:43:39,509 --> 00:43:40,550 VICTOR COSTAN: Very good. 911 00:43:40,550 --> 00:43:42,060 So d has to be order one. 912 00:43:42,060 --> 00:43:44,980 Otherwise, it's not going to come out to that. 913 00:43:44,980 --> 00:43:46,490 Now, what else do we know? 
914 00:43:46,490 --> 00:43:50,700 We have this that's order n plus k. 915 00:43:50,700 --> 00:43:52,940 If I set k to be a lot smaller than n, 916 00:43:52,940 --> 00:43:56,820 if I set it to be log n, it's going to be order n. 917 00:43:56,820 --> 00:44:01,900 If I set k to be a constant, if I use bits, 918 00:44:01,900 --> 00:44:05,500 if I use base 2-- so I set k equal to 2-- this is still 919 00:44:05,500 --> 00:44:07,080 going to be order n. 920 00:44:07,080 --> 00:44:09,740 So if k goes way below n, this step 921 00:44:09,740 --> 00:44:11,360 is still going to be order n. 922 00:44:11,360 --> 00:44:13,975 So I might as well set k as high as possible. 923 00:44:17,550 --> 00:44:21,330 So k is order n, because that's the highest 924 00:44:21,330 --> 00:44:22,650 thing I could set it to. 925 00:44:22,650 --> 00:44:25,120 Now why do I want to do that? 926 00:44:25,120 --> 00:44:26,374 Yes, you have a ques-- 927 00:44:26,374 --> 00:44:29,146 AUDIENCE: [INAUDIBLE] represent in counting sort again? 928 00:44:29,146 --> 00:44:31,352 The length of what? 929 00:44:31,352 --> 00:44:32,810 VICTOR COSTAN: So in counting sort, 930 00:44:32,810 --> 00:44:37,230 n is your input size, how many keys you have. 931 00:44:37,230 --> 00:44:41,150 And k is the size of this array. 932 00:44:41,150 --> 00:44:41,859 AUDIENCE: Oh, OK. 933 00:44:41,859 --> 00:44:43,691 VICTOR COSTAN: So you have to be able to map 934 00:44:43,691 --> 00:44:45,090 your keys to numbers from 0 to k minus 1. 935 00:44:45,090 --> 00:44:46,962 AUDIENCE: It's set by n, basically. 936 00:44:46,962 --> 00:44:48,370 Or it's set by the elements. 937 00:44:48,370 --> 00:44:48,760 VICTOR COSTAN: Yeah. 938 00:44:48,760 --> 00:44:50,115 It's set by the nature of the keys. 939 00:44:50,115 --> 00:44:50,390 AUDIENCE: OK. 940 00:44:50,390 --> 00:44:50,910 Got it. 941 00:44:50,910 --> 00:44:52,451 VICTOR COSTAN: So in real life, we're 942 00:44:52,451 --> 00:44:55,520 thinking maybe we have some huge numbers that we want to sort.
943 00:44:55,520 --> 00:44:57,830 And we're going to chunk them up into-- when we're 944 00:44:57,830 --> 00:44:59,330 writing on the board, we always have 945 00:44:59,330 --> 00:45:00,829 to chunk them up in base 10 digits, 946 00:45:00,829 --> 00:45:02,870 because that's the only way we know how to write. 947 00:45:02,870 --> 00:45:05,890 But in computer memory, we can chunk them up into, say, 948 00:45:05,890 --> 00:45:08,440 base 10,000 digits. 949 00:45:08,440 --> 00:45:10,440 And the fewer digits you have, the faster 950 00:45:10,440 --> 00:45:11,380 this is going to run. 951 00:45:11,380 --> 00:45:14,480 So we have to figure out what's the base. 952 00:45:14,480 --> 00:45:15,940 And it turns out that if you want 953 00:45:15,940 --> 00:45:19,680 to have radix sort run in order n time, well, 954 00:45:19,680 --> 00:45:24,390 the number of digits has to be sort of constant. 955 00:45:24,390 --> 00:45:27,210 I know that k should be order n, because I 956 00:45:27,210 --> 00:45:30,320 have no interest in making it lower than that. 957 00:45:30,320 --> 00:45:32,050 So these two bounds together tell me 958 00:45:32,050 --> 00:45:36,666 that the keys that I can sort are from zero 959 00:45:36,666 --> 00:45:42,610 up to order n to the power of order one. 960 00:45:42,610 --> 00:45:45,940 And this looks terrible, but what it comes down to 961 00:45:45,940 --> 00:45:49,690 is that you can sort keys that look 962 00:45:49,690 --> 00:45:52,640 like n to some constant for any constant. 963 00:45:55,630 --> 00:45:59,690 So you can sort huge keys, as long as huge still 964 00:45:59,690 --> 00:46:00,260 means finite. 965 00:46:04,070 --> 00:46:06,780 And as long as you can figure out how to map them to numbers. 966 00:46:10,060 --> 00:46:13,170 Does this make some sense? 967 00:46:13,170 --> 00:46:15,950 Would we ever want to use merge sort instead of counting sort? 968 00:46:15,950 --> 00:46:17,870 Suppose we had a stable merge sort.
969 00:46:17,870 --> 00:46:24,071 Would we want to use that instead of counting sort here? 970 00:46:24,071 --> 00:46:24,820 What would happen? 971 00:46:32,300 --> 00:46:33,330 So suppose it's stable. 972 00:46:33,330 --> 00:46:34,010 So it's correct. 973 00:46:34,010 --> 00:46:35,551 The algorithm isn't going to blow up. 974 00:46:35,551 --> 00:46:37,273 What's the running time for merge sort? 975 00:46:44,380 --> 00:46:46,880 So if I use a merge sort. 976 00:46:46,880 --> 00:46:50,760 So if I use the merge sort, it's going to be d times n log n. 977 00:46:50,760 --> 00:46:52,980 So no matter how small d is, I'm still 978 00:46:52,980 --> 00:46:55,190 not running in linear time. 979 00:46:55,190 --> 00:46:57,657 So merge sort does not go well with radix sort. 980 00:47:01,340 --> 00:47:03,280 So from my end, we're pretty much done. 981 00:47:03,280 --> 00:47:04,730 We started with n log n. 982 00:47:04,730 --> 00:47:08,217 And we got to a sorting algorithm that's order n. 983 00:47:08,217 --> 00:47:10,300 We started at the beginning of [INAUDIBLE], saying 984 00:47:10,300 --> 00:47:13,090 that the best thing we can do is omega-- 985 00:47:13,090 --> 00:47:15,160 is that omega-- omega of n. 986 00:47:15,160 --> 00:47:16,051 We got to that limit. 987 00:47:16,051 --> 00:47:16,550 We're happy. 988 00:47:16,550 --> 00:47:17,950 We're going to be done with sorting. 989 00:47:17,950 --> 00:47:19,140 Any questions from you guys? 990 00:47:23,912 --> 00:47:25,495 That means everyone's confused, right? 991 00:47:25,495 --> 00:47:26,360 Yes, thank you. 992 00:47:26,360 --> 00:47:28,610 AUDIENCE: Can you explain what the stability criteria 993 00:47:28,610 --> 00:47:29,512 is again? 994 00:47:29,512 --> 00:47:30,345 VICTOR COSTAN: The-- 995 00:47:30,345 --> 00:47:34,990 AUDIENCE: Stability for these sorting algorithms. 996 00:47:34,990 --> 00:47:37,032 Which ones are stable and what makes it unstable? 997 00:47:37,032 --> 00:47:38,531 VICTOR COSTAN: All right, very good. 
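[Editor's sketch, not part of the recitation.] The running-time argument about radix sort can be tried out with a minimal Python version. It assumes non-negative integer keys and picks the base k equal to n, so each counting-sort pass costs order n plus k, which is order n, and keys below n to the c need at most c passes. The names `counting_sort_by_digit` and `radix_sort` are illustrative, not taken from the course code.

```python
def counting_sort_by_digit(keys, base, place):
    """Stable counting sort on the digit (key // place) % base."""
    counts = [0] * base
    for key in keys:
        counts[(key // place) % base] += 1

    # positions[d] is the first output index used by keys whose digit is d.
    positions = [0] * base
    for d in range(1, base):
        positions[d] = positions[d - 1] + counts[d - 1]

    output = [None] * len(keys)
    for key in keys:  # left-to-right scan keeps equal digits in input order
        d = (key // place) % base
        output[positions[d]] = key
        positions[d] += 1
    return output


def radix_sort(keys):
    """Least-significant-digit radix sort of non-negative integers, base k = n."""
    n = len(keys)
    if n < 2:
        return list(keys)
    base = n  # k = n, the largest base that keeps one pass at order n
    largest = max(keys)
    result = list(keys)
    place = 1
    while place <= largest:  # one pass per base-n digit of the largest key
        result = counting_sort_by_digit(result, base, place)
        place *= base
    return result
```

Swapping the counting-sort pass for a stable merge sort would still give a correct result, but every pass would cost n log n, which is the point made above about d times n log n.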
998 00:47:38,531 --> 00:47:39,160 Thank you. 999 00:47:39,160 --> 00:47:41,420 So I like especially the last part, 1000 00:47:41,420 --> 00:47:42,674 with which ones are stable. 1001 00:47:42,674 --> 00:47:43,840 I'd like to go through that. 1002 00:47:43,840 --> 00:47:46,410 So a stable sorting algorithm means 1003 00:47:46,410 --> 00:47:49,300 that if you have two keys that are equal, 1004 00:47:49,300 --> 00:47:51,480 the key that shows up first in the input 1005 00:47:51,480 --> 00:47:55,610 is the key that is produced first in the output. 1006 00:47:55,610 --> 00:47:58,920 So in this model, your keys are not necessarily integers. 1007 00:47:58,920 --> 00:48:00,950 Your keys might be those weird classes 1008 00:48:00,950 --> 00:48:04,880 that implement some method that maps them to integers. 1009 00:48:04,880 --> 00:48:08,665 So say there is a method there, __int__, 1010 00:48:08,665 --> 00:48:11,880 that gives you the integer for that. 1011 00:48:11,880 --> 00:48:14,750 So the sorting algorithm would only see a three here. 1012 00:48:14,750 --> 00:48:17,010 But in fact, this is a complex object. 1013 00:48:17,010 --> 00:48:18,830 And this is another complex object, 1014 00:48:18,830 --> 00:48:21,690 but the sorting only sees the three. 1015 00:48:21,690 --> 00:48:24,380 If this guy shows up before this guy in the input, 1016 00:48:24,380 --> 00:48:28,278 they have to show up in the same order in the output. 1017 00:48:28,278 --> 00:48:31,480 AUDIENCE: Why would that be bad if they're switched? 1018 00:48:31,480 --> 00:48:34,730 VICTOR COSTAN: It's not stable. 1019 00:48:34,730 --> 00:48:39,900 If they're switched, then when we're using a stable sorting 1020 00:48:39,900 --> 00:48:41,720 algorithm here. 1021 00:48:41,720 --> 00:48:46,050 So here, the key is this complicated object. 1022 00:48:46,050 --> 00:48:47,650 But say we're in the second round. 1023 00:48:47,650 --> 00:48:49,900 We're in this round, which we played with.
1024 00:48:49,900 --> 00:48:52,730 Even though the key is this whole complicated object, 1025 00:48:52,730 --> 00:48:55,730 the only thing that the counting sort sees is this number. 1026 00:48:58,400 --> 00:49:00,430 So this guy looks like three. 1027 00:49:00,430 --> 00:49:01,930 This guy looks like three. 1028 00:49:01,930 --> 00:49:04,310 And these three guys, although they're different, 1029 00:49:04,310 --> 00:49:06,430 they look like four. 1030 00:49:06,430 --> 00:49:09,480 If I don't output them in the right order-- 1031 00:49:09,480 --> 00:49:12,770 say I output this one all the way at the end-- then 1032 00:49:12,770 --> 00:49:16,550 I'm going to get two, three, four, one to be down here. 1033 00:49:16,550 --> 00:49:20,360 And now my numbers aren't sorted by the last two digits anymore. 1034 00:49:20,360 --> 00:49:23,440 So it breaks any algorithm that assumes stability. 1035 00:49:23,440 --> 00:49:25,910 So stability is something that you get from a sort, 1036 00:49:25,910 --> 00:49:27,940 because it's convenient to assume it 1037 00:49:27,940 --> 00:49:30,580 in some other algorithm that builds up on that sort. 1038 00:49:30,580 --> 00:49:32,496 If you don't need it, you don't care about it. 1039 00:49:32,496 --> 00:49:34,920 But in some cases, you need it. 1040 00:49:34,920 --> 00:49:37,915 And for the second part, which algorithms are stable. 1041 00:49:41,370 --> 00:49:42,675 Is insertion sort stable? 1042 00:49:46,968 --> 00:49:47,960 AUDIENCE: I assume so. 1043 00:49:47,960 --> 00:49:49,810 I mean, stable is being correct, right? 1044 00:49:49,810 --> 00:49:50,560 VICTOR COSTAN: No. 1045 00:49:50,560 --> 00:49:53,482 We mean that property there. 1046 00:49:53,482 --> 00:49:54,344 AUDIENCE: Oh, I see. 1047 00:49:54,344 --> 00:49:55,204 You mean in order. 1048 00:49:55,204 --> 00:49:55,995 VICTOR COSTAN: Yep. 1049 00:49:55,995 --> 00:49:56,905 AUDIENCE: Oh, OK. 1050 00:50:00,927 --> 00:50:03,463 Insertion sort goes in order. 
1051 00:50:03,463 --> 00:50:06,076 But I guess it could push other things out of order. 1052 00:50:06,076 --> 00:50:07,450 VICTOR COSTAN: So insertion sort, 1053 00:50:07,450 --> 00:50:09,730 you're doing swapping to move things to the left. 1054 00:50:09,730 --> 00:50:11,480 But if you find two things that are equal, 1055 00:50:11,480 --> 00:50:14,000 you're never going to swap them. 1056 00:50:14,000 --> 00:50:18,760 So insertion sort is in order, is stable. 1057 00:50:18,760 --> 00:50:23,130 Merge sort, the one we gave you in that list, is not stable. 1058 00:50:23,130 --> 00:50:25,907 But there is a one-character change that makes it stable. 1059 00:50:25,907 --> 00:50:27,740 And you should look at today's lecture notes 1060 00:50:27,740 --> 00:50:29,260 to find out what that is. 1061 00:50:29,260 --> 00:50:31,580 So merge sort can be stable. 1062 00:50:31,580 --> 00:50:33,075 Heapsort, stable or unstable? 1063 00:50:37,040 --> 00:50:37,630 Unstable. 1064 00:50:37,630 --> 00:50:39,070 And there's a really small example 1065 00:50:39,070 --> 00:50:41,180 that you should look at. 1066 00:50:41,180 --> 00:50:43,470 Counting sort, stable or unstable? 1067 00:50:43,470 --> 00:50:44,430 AUDIENCE: Stable. 1068 00:50:44,430 --> 00:50:45,471 VICTOR COSTAN: Thank you. 1069 00:50:45,471 --> 00:50:48,700 It would have broken my heart if this would have come out wrong. 1070 00:50:48,700 --> 00:50:50,878 And radix sort? 1071 00:50:50,878 --> 00:50:52,330 AUDIENCE: Probably. 1072 00:50:52,330 --> 00:50:54,417 Yes. 1073 00:50:54,417 --> 00:50:56,000 VICTOR COSTAN: Probably stable, right? 1074 00:50:56,000 --> 00:50:56,755 All right. 1075 00:50:56,755 --> 00:50:57,970 Any more questions? 1076 00:50:57,970 --> 00:51:00,980 I like that question by the way, because you made me do this. 1077 00:51:00,980 --> 00:51:01,480 I like that. 1078 00:51:01,480 --> 00:51:02,405 Any more questions? 1079 00:51:06,070 --> 00:51:08,390 All right, thank you guys.
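[Editor's sketch, not part of the recitation.] The stability property discussed above can be checked on a small example. The records here are hypothetical `(key, label)` pairs standing in for the "complex objects" whose sort only sees the integer. Both functions are simplified illustrations, not the course's reference implementations: the insertion sort swaps only on a strict greater-than comparison, and the counting sort appends items to per-key buckets in input order.

```python
def insertion_sort(items, key):
    """Stable insertion sort: equal keys are never swapped."""
    result = list(items)
    for i in range(1, len(result)):
        j = i
        # Strict '>' is what keeps equal keys in their input order.
        while j > 0 and key(result[j - 1]) > key(result[j]):
            result[j - 1], result[j] = result[j], result[j - 1]
            j -= 1
    return result


def counting_sort(items, key, k):
    """Stable counting sort for keys that map to integers in 0..k-1."""
    buckets = [[] for _ in range(k)]
    for item in items:  # appending preserves input order within a bucket
        buckets[key(item)].append(item)
    return [item for bucket in buckets for item in bucket]


# Two records that look like 3 and three that look like 4 to the sort.
records = [(4, "a"), (3, "b"), (4, "c"), (3, "d"), (4, "e")]


def first(record):
    """The only thing the sort sees: the integer key."""
    return record[0]


stable_by_insertion = insertion_sort(records, first)
stable_by_counting = counting_sort(records, first, 5)
```

In both results the labels within each key keep their input order: b before d among the threes, and a, c, e among the fours. Changing the comparison in `insertion_sort` to `>=` would still sort by key but would no longer guarantee that order.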