1 00:00:00,090 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,019 Commons license. 3 00:00:04,019 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,730 continue to offer high quality educational resources for free. 5 00:00:10,730 --> 00:00:13,340 To make a donation or view additional materials 6 00:00:13,340 --> 00:00:17,215 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,215 --> 00:00:17,840 at ocw.mit.edu. 8 00:00:20,706 --> 00:00:23,206 AMARTYA SHANKHA BISWAS: Feel free to populate the front row. 9 00:00:23,206 --> 00:00:26,030 I'm not that scary. 10 00:00:26,030 --> 00:00:29,010 So today, we're going to look at more greedy algorithms. 11 00:00:29,010 --> 00:00:31,420 So I think you went over Kruskal's algorithm 12 00:00:31,420 --> 00:00:34,040 and how you do the sorting in the lecture. 13 00:00:34,040 --> 00:00:38,970 So going back to make change from last recitation, 14 00:00:38,970 --> 00:00:41,220 so this is sort of a variant on that. 15 00:00:41,220 --> 00:00:43,080 So instead of discrete coins, we now 16 00:00:43,080 --> 00:00:46,440 have continuous coins, in the sense so the analogy here is, 17 00:00:46,440 --> 00:00:50,030 let's say, you have N metals, and each of the metals 18 00:00:50,030 --> 00:00:53,550 has some value given by Ci dollars per kilogram, 19 00:00:53,550 --> 00:00:55,990 or whatever units you prefer. 20 00:00:55,990 --> 00:00:58,660 And you want to achieve some value 21 00:00:58,660 --> 00:01:01,520 T. You want to give someone T dollars worth of metal. 22 00:01:01,520 --> 00:01:04,110 And you want to do this while minimizing-- oh, so I 23 00:01:04,110 --> 00:01:05,099 should mention this. 24 00:01:05,099 --> 00:01:07,580 ki is the weight of every metal that you 25 00:01:07,580 --> 00:01:09,720 will give to the person. 
26 00:01:09,720 --> 00:01:13,242 So you're taking ki of metal i, and you are going to-- 27 00:01:13,242 --> 00:01:14,950 and you have to ensure, so basically, you 28 00:01:14,950 --> 00:01:22,820 have to ensure that the summation of ki Ci over all i 29 00:01:22,820 --> 00:01:26,890 is equal to T. And in doing so you want to minimize 30 00:01:26,890 --> 00:01:28,570 the summation over all ki. 31 00:01:28,570 --> 00:01:29,570 So does that make sense? 32 00:01:29,570 --> 00:01:30,550 So you have a bunch of metals. 33 00:01:30,550 --> 00:01:32,383 Some of them are more expensive than others. 34 00:01:32,383 --> 00:01:34,740 And you want to measure them out and give someone 35 00:01:34,740 --> 00:01:36,660 a certain fixed value. 36 00:01:36,660 --> 00:01:39,590 So anyone have any ideas how to do this? 37 00:01:39,590 --> 00:01:49,780 Should be-- should be the first thing that comes to mind. 38 00:01:49,780 --> 00:01:52,340 So you have a bunch of metals-- some of them 39 00:01:52,340 --> 00:01:53,470 with certain costs. 40 00:01:53,470 --> 00:01:58,200 And you're trying to create a value T. 41 00:01:58,200 --> 00:02:00,660 So which metal would you want to pick? 42 00:02:00,660 --> 00:02:03,422 So it should seem intuitive that if you 43 00:02:03,422 --> 00:02:05,130 want to minimize the weight of the metal, 44 00:02:05,130 --> 00:02:07,540 you would want to pick the-- have the most 45 00:02:07,540 --> 00:02:10,130 expensive one per unit weight. 46 00:02:10,130 --> 00:02:17,760 So let's start by sorting by Ci. 47 00:02:17,760 --> 00:02:22,990 And we want to sort it in decreasing order. 48 00:02:22,990 --> 00:02:23,670 Does that make sense? 49 00:02:23,670 --> 00:02:25,920 So if you have the most expensive metal, 50 00:02:25,920 --> 00:02:28,480 you want to use as much of that as you can, 51 00:02:28,480 --> 00:02:30,510 so that your weight is minimized. 
52 00:02:30,510 --> 00:02:33,560 So once you sort by Ci, so let's say, you have your costs 53 00:02:33,560 --> 00:02:41,190 right now are-- let's call this one C1, C2, up to Cn. 54 00:02:41,190 --> 00:02:42,760 And these are in sorted order. 55 00:02:42,760 --> 00:02:44,120 So it's decreasing this way. 56 00:02:46,910 --> 00:02:52,970 So you now take your value T, and you look at T by C1. 57 00:02:52,970 --> 00:02:56,000 And that is the amount of weight you would need to generate T. 58 00:02:56,000 --> 00:02:58,900 So you look at how much you have here. 59 00:02:58,900 --> 00:03:02,400 So the amount of metal-- so a constraint I forgot to mention, 60 00:03:02,400 --> 00:03:05,301 you are given a limited amount of every metal. 61 00:03:05,301 --> 00:03:10,340 OK, that's-- it's not that trivial. 62 00:03:10,340 --> 00:03:13,280 So you have-- let's mention that. 63 00:03:13,280 --> 00:03:16,260 So you have-- is that used? 64 00:03:16,260 --> 00:03:19,940 No, it's not-- Wi is the amount. 65 00:03:32,136 --> 00:03:33,390 Does that make more sense? 66 00:03:36,210 --> 00:03:38,730 So you look at T over Ci. 67 00:03:38,730 --> 00:03:45,220 And if T over Ci is less than W of i, 68 00:03:45,220 --> 00:03:47,890 then you just use the amount you need to construct T, 69 00:03:47,890 --> 00:03:49,170 and you're done. 70 00:03:49,170 --> 00:03:52,100 Otherwise, you use all of metal i. 71 00:03:52,100 --> 00:03:57,320 So if it's less than Wi, in that case, 72 00:03:57,320 --> 00:03:59,420 you take T over Ci and stop. 73 00:03:59,420 --> 00:04:02,140 If it's greater than Wi, you use all of it. 74 00:04:02,140 --> 00:04:05,210 And then you move on to the next one with the value that is left, and so on. 75 00:04:05,210 --> 00:04:06,890 So that seems pretty intuitive. 76 00:04:06,890 --> 00:04:09,300 Let's actually do a formal proof of that. 
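[Editor's sketch] The greedy rule just described (sort by Ci in decreasing order, take as much of each metal as is needed or available, carry the leftover value to the next metal) can be written out roughly as follows. This is a minimal sketch, not code from the recitation: the function name min_weight_for_value, the (cost_per_kg, available_kg) tuple layout, and the error raised when the stock cannot reach T are all assumptions made for illustration, and costs are assumed to be positive.

```python
def min_weight_for_value(metals, target):
    """Greedy sketch for the continuous make-change problem.

    metals: list of (cost_per_kg, available_kg) pairs, i.e. (Ci, Wi).
    target: the value T to hand over.
    Returns (total_weight, amounts) where amounts[i] is ki for metal i.
    Assumes every cost is positive (hypothetical input format).
    """
    # Visit metals from most to least valuable per kilogram.
    order = sorted(range(len(metals)), key=lambda i: metals[i][0], reverse=True)
    amounts = [0.0] * len(metals)
    remaining = float(target)
    for i in order:
        if remaining <= 0:
            break
        cost, available = metals[i]
        needed = remaining / cost        # weight of metal i that would finish the job
        take = min(needed, available)    # but we cannot exceed the stock Wi
        amounts[i] = take
        remaining -= take * cost         # value still to be covered by cheaper metals
    if remaining > 1e-9:
        raise ValueError("not enough metal to reach the target value")
    return sum(amounts), amounts


if __name__ == "__main__":
    # Three metals given as (dollars per kg, kg available); target value 130.
    print(min_weight_for_value([(10, 1), (50, 2), (20, 5)], 130))
```

In the example, the 50-dollar metal is used up first (2 kg, worth 100), and the remaining 30 dollars come from 1.5 kg of the 20-dollar metal; the cheapest metal is never touched.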
77 00:04:09,300 --> 00:04:11,660 So how you go about proving this is 78 00:04:11,660 --> 00:04:15,201 that-- so let's say-- so it's what we call the cut-and-paste 79 00:04:15,201 --> 00:04:15,700 method. 80 00:04:15,700 --> 00:04:17,201 So basically what you have is, let's 81 00:04:17,201 --> 00:04:19,241 say you're not using the most expensive metal you 82 00:04:19,241 --> 00:04:20,079 have at this point. 83 00:04:20,079 --> 00:04:23,680 So let's say your most expensive metal has cost Ci, 84 00:04:23,680 --> 00:04:27,450 but instead, you decide to use Cj. 85 00:04:27,450 --> 00:04:33,350 So let's say you decide to use some kj amount of Cj. 86 00:04:33,350 --> 00:04:40,260 So the value you're getting from this is Cj kj. 87 00:04:40,260 --> 00:04:45,250 And instead, if you use Ci, how much metal would 88 00:04:45,250 --> 00:04:46,690 you need to get the same value? 89 00:04:46,690 --> 00:04:50,885 You would need Cj kj over Ci. 90 00:04:50,885 --> 00:04:52,860 Does that make sense? 91 00:04:52,860 --> 00:04:55,560 So this is the value you would obtain by using 92 00:04:55,560 --> 00:04:59,290 kj kilograms of this metal. 93 00:04:59,290 --> 00:05:01,830 So if you instead used this one, you'd get this value. 94 00:05:01,830 --> 00:05:04,480 And so this is the most expensive one. 95 00:05:04,480 --> 00:05:07,840 And Ci is greater than Cj, 96 00:05:07,840 --> 00:05:11,200 so this value is less than kj. 97 00:05:11,200 --> 00:05:14,150 So by using this metal instead of that one, 98 00:05:14,150 --> 00:05:16,830 you are decreasing the amount-- the weight you would need, 99 00:05:16,830 --> 00:05:19,110 so the total you're minimizing goes down. 100 00:05:19,110 --> 00:05:20,175 Make sense? 101 00:05:20,175 --> 00:05:23,860 So that's like a very simple greedy algorithm. 102 00:05:23,860 --> 00:05:26,350 And it's-- the algorithm is exactly what you'd expect, 103 00:05:26,350 --> 00:05:28,180 and the proof isn't very hard. 
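[Editor's sketch] The swap in this proof can be written as a single inequality. The symbol k_i' is introduced here only for illustration (it is not used in the recitation) and stands for the weight of metal i that replaces k_j kilograms of metal j; the step also assumes that at least k_i' of metal i is still available.

```latex
% Replace k_j kilograms of metal j by metal i, where C_i > C_j > 0 and k_j > 0.
% The value handed over is unchanged: C_i k_i' = C_j k_j.
\[
  k_i' \;=\; \frac{C_j\,k_j}{C_i} \;<\; k_j ,
  \qquad\text{so}\qquad
  \sum_{\ell} k_\ell \ \text{ changes by } \ k_i' - k_j \;<\; 0 .
\]
```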
104 00:05:28,180 --> 00:05:32,040 So let's move on to a slightly more interesting one. 105 00:05:36,130 --> 00:05:38,870 So this is process scheduling. 106 00:05:38,870 --> 00:05:40,250 So let's say you have a computer, 107 00:05:40,250 --> 00:05:41,780 and you're running n processes. 108 00:05:41,780 --> 00:05:45,560 And each of the processes has a time-- t1 through tn, 109 00:05:45,560 --> 00:05:46,670 for the n processes. 110 00:05:46,670 --> 00:05:48,890 And you want to order them in some way. 111 00:05:48,890 --> 00:05:50,560 So first, you will do process p1. 112 00:05:50,560 --> 00:05:53,550 Then you'll process p2, and so on, and so forth. 113 00:05:53,550 --> 00:05:55,190 Then you'll define a completion time. 114 00:05:55,190 --> 00:05:58,026 So completion time is simply when does process i end. 115 00:05:58,026 --> 00:05:59,150 So when does process i end? 116 00:05:59,150 --> 00:06:02,140 You just add-- it's like the time for p1 plus time for p2 117 00:06:02,140 --> 00:06:03,170 up to pi. 118 00:06:03,170 --> 00:06:07,230 So basically, you have all your processes. 119 00:06:07,230 --> 00:06:12,130 So let's say this is p1, this is p2, and so on. 120 00:06:12,130 --> 00:06:14,690 And the completion time for a certain process in the middle 121 00:06:14,690 --> 00:06:17,000 is just the sum of all the times up to and including it. 122 00:06:17,000 --> 00:06:18,150 That's completion time. 123 00:06:18,150 --> 00:06:18,920 And now what you want to do is you 124 00:06:18,920 --> 00:06:20,544 want to minimize the average completion 125 00:06:20,544 --> 00:06:22,860 time, which is summation over all the completion 126 00:06:22,860 --> 00:06:25,270 times over n. 127 00:06:25,270 --> 00:06:29,370 So any ideas what an algorithm for this would look like? 128 00:06:29,370 --> 00:06:31,780 Essentially, you want to minimize 129 00:06:31,780 --> 00:06:33,460 the sum of all these times. 130 00:06:33,460 --> 00:06:36,250 So all these times, you want to minimize the average of these. 
131 00:06:36,250 --> 00:06:37,333 So what do you want to do? 132 00:06:37,333 --> 00:06:40,440 Do you want to shift the slower-- the processes which 133 00:06:40,440 --> 00:06:41,680 take more time, do you want to keep them at the end, 134 00:06:41,680 --> 00:06:43,555 or do you want to keep them at the beginning? 135 00:06:50,310 --> 00:06:52,060 So if you have a bunch of small processes, 136 00:06:52,060 --> 00:06:53,220 what do you do with them at the end? 137 00:06:53,220 --> 00:06:54,553 What do you do at the beginning? 138 00:07:02,990 --> 00:07:04,520 Completion time is when does-- So 139 00:07:04,520 --> 00:07:06,400 let's say this is process pi. 140 00:07:06,400 --> 00:07:09,097 And completion time for process pi is like this distance. 141 00:07:09,097 --> 00:07:10,680 It's like, when does pi get completed? 142 00:07:10,680 --> 00:07:13,840 So it's summation of all the times-- so the time 143 00:07:13,840 --> 00:07:16,030 taken for p1, p2, up to the end, see? 144 00:07:16,030 --> 00:07:17,840 So you want to basically minimize 145 00:07:17,840 --> 00:07:20,629 the average of these values. 146 00:07:20,629 --> 00:07:22,420 So where do you put the smaller processes-- 147 00:07:22,420 --> 00:07:23,878 would you put the shorter processes 148 00:07:23,878 --> 00:07:25,790 at the end or the beginning? 149 00:07:25,790 --> 00:07:29,435 Which one would decrease your average? 150 00:07:29,435 --> 00:07:31,080 The beginning? 151 00:07:31,080 --> 00:07:32,307 Makes sense, right? 152 00:07:32,307 --> 00:07:33,390 So yeah, that makes sense. 153 00:07:33,390 --> 00:07:34,610 So you basically want to like scrunch 154 00:07:34,610 --> 00:07:36,026 these lines towards the beginning, 155 00:07:36,026 --> 00:07:37,740 so your average is smaller. 156 00:07:37,740 --> 00:07:41,000 Note that this total length is always the constant. 157 00:07:41,000 --> 00:07:44,640 It's like summation over all ti. 158 00:07:44,640 --> 00:07:49,060 So let's go about-- so OK, this is strategy. 
159 00:07:49,060 --> 00:07:58,590 Again, sort by ti, and this is increasing order, 160 00:07:58,590 --> 00:08:00,100 and that's it basically. 161 00:08:00,100 --> 00:08:02,170 So this is your algorithm, sort by ti, 162 00:08:02,170 --> 00:08:03,950 and run the processes in that order. 163 00:08:03,950 --> 00:08:06,120 So let's try to prove this. 164 00:08:06,120 --> 00:08:08,417 So the way you prove this is a pretty generic method. 165 00:08:08,417 --> 00:08:10,250 It is often used to prove greedy algorithms. 166 00:08:10,250 --> 00:08:11,430 So let's say that this is not the optimal. 167 00:08:11,430 --> 00:08:12,890 Let's say someone comes up to you 168 00:08:12,890 --> 00:08:15,490 and tells you, OK, I have a better sequence. 169 00:08:15,490 --> 00:08:17,100 I have a sequence, let's say, called-- 170 00:08:17,100 --> 00:08:19,760 let's say I have a sequence of p1 to pn. 171 00:08:19,760 --> 00:08:22,620 And that sequence does better than a sorted order. 172 00:08:22,620 --> 00:08:26,585 So you're like, OK, so if this is not sorted, 173 00:08:26,585 --> 00:08:28,335 then you have some elements in the middle. 174 00:08:28,335 --> 00:08:31,480 Let's say the time of pi is greater 175 00:08:31,480 --> 00:08:39,770 than the time of pj, with i less than j. 176 00:08:39,770 --> 00:08:43,500 So there's some pi here, and there's some pj here, such 177 00:08:43,500 --> 00:08:45,614 that this is greater than that. 178 00:08:45,614 --> 00:08:46,780 So it's not in sorted order. 179 00:08:46,780 --> 00:08:48,800 So you can always find a pair like that. 180 00:08:48,800 --> 00:08:53,140 So now I'm going to claim that if you swap these two 181 00:08:53,140 --> 00:08:57,770 values-- so you swap pi and pj-- that'll actually 182 00:08:57,770 --> 00:09:01,340 decrease whatever current average completion time you 183 00:09:01,340 --> 00:09:01,840 have. 184 00:09:04,820 --> 00:09:06,880 So initially, you had something like this. 
185 00:09:06,880 --> 00:09:10,980 So-- no, let's not draw a line there. 186 00:09:10,980 --> 00:09:16,670 So let's say you had something like you had this process, so 187 00:09:16,670 --> 00:09:21,130 pi-- actually, this is the bigger process, so this is pi, 188 00:09:21,130 --> 00:09:25,547 and this is pj. 189 00:09:25,547 --> 00:09:26,922 And now I'm saying that-- and you 190 00:09:26,922 --> 00:09:28,770 have some stuff in the middle. 191 00:09:28,770 --> 00:09:31,190 And my claim is that, no, this is not optimal. 192 00:09:31,190 --> 00:09:33,850 You'd do much better if you moved the pj over here. 193 00:09:33,850 --> 00:09:41,630 So you want to go from this to this, and big process. 194 00:09:41,630 --> 00:09:44,410 So let's see what changes when you go from there to there. 195 00:09:44,410 --> 00:09:46,760 So first of all, observe that the completion 196 00:09:46,760 --> 00:09:48,594 times of everything behind this is the same. 197 00:09:48,594 --> 00:09:50,218 They all have the same completion time; 198 00:09:50,218 --> 00:09:51,160 nothing is affected. 199 00:09:51,160 --> 00:09:53,044 And you're only changing these two things. 200 00:09:53,044 --> 00:09:54,710 Everything after this is also the same-- 201 00:09:54,710 --> 00:09:56,620 has the same completion time. 202 00:09:56,620 --> 00:09:58,120 So the only things that are changing 203 00:09:58,120 --> 00:10:00,490 are this one, this one, and all the ones up to this one. 204 00:10:00,490 --> 00:10:02,990 Even this one has the same completion time. 205 00:10:02,990 --> 00:10:03,900 Make sense? 206 00:10:03,900 --> 00:10:08,100 So how much is this changing by? 207 00:10:08,100 --> 00:10:08,990 So let's define this. 208 00:10:08,990 --> 00:10:18,110 Delta is equal to t of pi minus t of pj-- 209 00:10:18,110 --> 00:10:20,650 so the difference between these two processes. 210 00:10:20,650 --> 00:10:25,060 So the original completion time of pi was this. 
211 00:10:25,060 --> 00:10:26,600 And now the corresponding process 212 00:10:26,600 --> 00:10:29,290 down here, the completion time is decreased by delta. 213 00:10:29,290 --> 00:10:32,340 So the completion time for this goes down by delta. 214 00:10:32,340 --> 00:10:34,020 This is the summation of completion times. 215 00:10:34,020 --> 00:10:35,500 The average is this divided by n, and n is a constant. 216 00:10:35,500 --> 00:10:38,642 So you just want to minimize this. 217 00:10:38,642 --> 00:10:40,850 So first it goes-- so this one goes down by delta. 218 00:10:40,850 --> 00:10:42,266 So let's look at the next process. 219 00:10:42,266 --> 00:10:44,700 The next process is something like this. 220 00:10:44,700 --> 00:10:46,539 So again, these do not change. 221 00:10:46,539 --> 00:10:47,830 You're only swapping these two. 222 00:10:47,830 --> 00:10:50,820 So this completion time also goes down by delta, 223 00:10:50,820 --> 00:10:52,330 and so on, and so forth. 224 00:10:52,330 --> 00:10:54,880 So you just get a bunch of minus deltas, 225 00:10:54,880 --> 00:10:57,311 one for however many processes you have in between. 226 00:10:57,311 --> 00:10:58,560 But that's not even important. 227 00:10:58,560 --> 00:11:00,490 What is important is that just by swapping, 228 00:11:00,490 --> 00:11:02,810 you're going to get at least one minus delta. 229 00:11:02,810 --> 00:11:06,084 And delta is positive, because the assumption-- oops, sorry, 230 00:11:06,084 --> 00:11:11,347 t-- because the assumption was that t pi minus t pj is positive. 231 00:11:11,347 --> 00:11:13,680 So just by swapping, you're going to always decrease it. 232 00:11:13,680 --> 00:11:17,260 So the claim that that sequence was an optimal solution 233 00:11:17,260 --> 00:11:18,260 is wrong. 234 00:11:18,260 --> 00:11:21,040 So you can always do better by swapping an inversion. 235 00:11:21,040 --> 00:11:24,180 So a pair that is out of sorted order is called an inversion. 
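[Editor's sketch] The scheduling rule and the swap argument above can be checked numerically on a small case. This is a minimal sketch under assumptions: the function names average_completion_time and shortest_first are made up for illustration, and the brute-force comparison over all orderings is only practical for a handful of processes.

```python
from itertools import accumulate, permutations


def average_completion_time(order):
    """order: processing times in the order the processes are run."""
    # The completion time of the i-th process is the running sum up to it.
    completions = list(accumulate(order))
    return sum(completions) / len(order)


def shortest_first(times):
    """The greedy schedule: run processes in increasing order of time."""
    return sorted(times)


if __name__ == "__main__":
    times = [3, 1, 2]
    greedy = average_completion_time(shortest_first(times))
    best = min(average_completion_time(p) for p in permutations(times))
    print(greedy == best)

    # Fixing one inversion: the long job at the front trades places with the
    # short job at the back, and the average completion time drops.
    print(average_completion_time([5, 2, 1]), average_completion_time([1, 2, 5]))
```

The second print compares a schedule containing an inversion with the same schedule after the swap; the total length of the schedule is the same in both, only the intermediate completion times move earlier.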
236 00:11:24,180 --> 00:11:27,626 So if you fix an inversion, you always get a better result. 237 00:11:27,626 --> 00:11:28,750 Does that proof make sense? 238 00:11:33,060 --> 00:11:36,560 So that's a slightly more interesting greedy algorithm. 239 00:11:36,560 --> 00:11:40,410 So let's move on to the third one we have here. 240 00:11:40,410 --> 00:11:42,901 The third one is event overlap. 241 00:11:42,901 --> 00:11:43,900 So this is how it works. 242 00:11:46,520 --> 00:11:51,210 So you wake up in the morning, and you look at your calendar. 243 00:11:51,210 --> 00:11:54,450 And being an MIT student, your calendar looks pretty full. 244 00:11:54,450 --> 00:11:57,810 So let's say this is what it looks like. 245 00:11:57,810 --> 00:11:59,942 So these are your events. 246 00:11:59,942 --> 00:12:04,872 Let's use some colors, make it a little clearer possibly. 247 00:12:04,872 --> 00:12:08,140 And let's say you have another event over here. 248 00:12:10,680 --> 00:12:14,816 You have something here. 249 00:12:14,816 --> 00:12:18,956 You have something here. 250 00:12:18,956 --> 00:12:21,396 You have something here. 251 00:12:21,396 --> 00:12:26,907 And you have something here. 252 00:12:26,907 --> 00:12:28,839 So OK, let's move this down. 253 00:12:33,560 --> 00:12:36,780 So the problem is that you have this bunch 254 00:12:36,780 --> 00:12:37,920 of events planned out. 255 00:12:37,920 --> 00:12:39,720 Now clearly, they're overlapping, 256 00:12:39,720 --> 00:12:41,880 so you can't attend all of them. 257 00:12:41,880 --> 00:12:45,010 So the idea is you make a bunch of clones of yourself. 258 00:12:45,010 --> 00:12:49,140 And so in this case, look at the matching colors. 259 00:12:49,140 --> 00:12:51,850 So clone number 1 goes here, 260 00:12:51,850 --> 00:12:54,820 and clone number 2 goes to red, and clone number 3 261 00:12:54,820 --> 00:12:55,890 goes to blue. 262 00:12:55,890 --> 00:12:57,460 So then clone number 1 does this. 
263 00:12:57,460 --> 00:12:59,120 Clone number 3 does the blue one. 264 00:12:59,120 --> 00:13:01,770 Clone number 2 does red. 265 00:13:01,770 --> 00:13:03,760 I guess, we should move the red back a little 266 00:13:03,760 --> 00:13:06,288 or forward a little bit just to make it clear. 267 00:13:06,288 --> 00:13:07,780 Yeah, there we go. 268 00:13:07,780 --> 00:13:11,350 And now, you could easily see that this is optimal. 269 00:13:11,350 --> 00:13:14,417 So you can do this with three clones and no less. 270 00:13:14,417 --> 00:13:16,000 So you make three clones, and then you 271 00:13:16,000 --> 00:13:17,490 can go off to spring break. 272 00:13:17,490 --> 00:13:18,720 And your schedule is fine. 273 00:13:18,720 --> 00:13:22,480 So now, how would you approach this problem? 274 00:13:22,480 --> 00:13:26,210 So what is a greedy strategy to, given a number of intervals, 275 00:13:26,210 --> 00:13:29,200 how do you find the minimum number of clones 276 00:13:29,200 --> 00:13:32,750 you need to cover your day? 277 00:13:32,750 --> 00:13:33,250 Any ideas? 278 00:13:35,552 --> 00:13:37,010 What is a naive thing you could do? 279 00:13:39,705 --> 00:13:41,495 AUDIENCE: When you say to cover your day, 280 00:13:41,495 --> 00:13:42,620 then it's like the number-- 281 00:13:42,620 --> 00:13:45,480 AMARTYA SHANKHA BISWAS: So you want to do every event. 282 00:13:45,480 --> 00:13:48,710 But like so this clone can't-- so clone number 1 does this 283 00:13:48,710 --> 00:13:49,210 event. 284 00:13:49,210 --> 00:13:51,371 Then he can't do this event or this event. 285 00:13:51,371 --> 00:13:53,335 AUDIENCE: Sort of maximizing your events? 286 00:13:53,335 --> 00:13:55,635 AMARTYA SHANKHA BISWAS: You want to do all the events? 287 00:13:55,635 --> 00:13:57,385 You want to minimize the number of clones. 288 00:13:57,385 --> 00:13:58,375 AUDIENCE: [INAUDIBLE] 289 00:14:01,196 --> 00:14:03,570 AMARTYA SHANKHA BISWAS: So it's like interval scheduling. 
290 00:14:03,570 --> 00:14:07,240 But you want to do all the intervals, 291 00:14:07,240 --> 00:14:09,005 but you're allowed to use multiple people 292 00:14:09,005 --> 00:14:10,004 to do all the intervals. 293 00:14:12,944 --> 00:14:13,875 Yes? 294 00:14:13,875 --> 00:14:14,375 Yeah? 295 00:14:14,375 --> 00:14:16,760 AUDIENCE: You could sort by end time. 296 00:14:16,760 --> 00:14:17,330 AMARTYA SHANKHA BISWAS: By end time, OK. 297 00:14:17,330 --> 00:14:19,079 What do you do after you sort by end time? 298 00:14:19,079 --> 00:14:22,139 AUDIENCE: And then iterate over all the intervals 299 00:14:22,139 --> 00:14:25,369 once they're sorted and just count how many intervals there 300 00:14:25,369 --> 00:14:31,085 are between the [INAUDIBLE]. 301 00:14:31,085 --> 00:14:32,774 That would get complicated. 302 00:14:32,774 --> 00:14:34,440 AMARTYA SHANKHA BISWAS: So you're close. 303 00:14:34,440 --> 00:14:35,940 So you do begin by sorting. 304 00:14:35,940 --> 00:14:38,426 And you can actually do it by sorting by end time. 305 00:14:38,426 --> 00:14:40,550 It's easier to visualize if you sort by start time. 306 00:14:40,550 --> 00:14:44,602 So leading from that, anyone want to chip in? 307 00:14:44,602 --> 00:14:46,530 AUDIENCE: You sort by start time, 308 00:14:46,530 --> 00:14:50,386 and every time you get a clash, you get a new clone. 309 00:14:50,386 --> 00:14:52,584 AMARTYA SHANKHA BISWAS: So yeah, essentially, 310 00:14:52,584 --> 00:14:55,000 every time you can't add it to one of your current clones, 311 00:14:55,000 --> 00:14:55,810 you just create a new one. 312 00:14:55,810 --> 00:14:56,593 You could also do it by end time, 313 00:14:56,593 --> 00:14:57,340 because it's symmetrical, right? 314 00:14:57,340 --> 00:14:58,930 So if you sort by end time, then you 315 00:14:58,930 --> 00:15:01,700 start with the last end time and go backwards, 316 00:15:01,700 --> 00:15:03,870 exactly the same thing. 
317 00:15:03,870 --> 00:15:05,210 So let's write it down. 318 00:15:17,070 --> 00:15:19,330 So sort by start time, and so actually, 319 00:15:19,330 --> 00:15:21,210 let's work out this example. 320 00:15:21,210 --> 00:15:26,130 So in this case, you would go-- OK, actually, 321 00:15:26,130 --> 00:15:28,580 once you sort-- so first you have 1, 322 00:15:28,580 --> 00:15:35,120 then you have 2, 3, 4, 5, 6. 323 00:15:35,120 --> 00:15:37,260 So that's sorted by start time. 324 00:15:37,260 --> 00:15:40,920 And then you have-- so first you go for this one. 325 00:15:40,920 --> 00:15:45,160 Then you go for 2, and 2 intersects with 1. 326 00:15:45,160 --> 00:15:46,750 So you put 2 into it. 327 00:15:46,750 --> 00:15:48,670 So this is clone number 1. 328 00:15:48,670 --> 00:15:50,920 And then you have to create a new clone for 2, 329 00:15:50,920 --> 00:15:52,310 so you create the new clone. 330 00:15:52,310 --> 00:15:53,820 And there we go. 331 00:15:53,820 --> 00:15:55,300 So then you go to 3. 332 00:15:55,300 --> 00:15:56,950 3 clashes with both 1 and 2, so you 333 00:15:56,950 --> 00:15:59,110 have to create a new clone again. 334 00:15:59,110 --> 00:16:05,930 So in that case, you go forth and create 3. 335 00:16:05,930 --> 00:16:07,100 Then you go to 4. 336 00:16:07,100 --> 00:16:10,050 Now, 4, you see, it clashes with 2 and 3, 337 00:16:10,050 --> 00:16:11,620 but it is good with 1. 338 00:16:11,620 --> 00:16:15,224 So you just put 4 over here. 339 00:16:15,224 --> 00:16:17,140 And if you continue like this, you essentially 340 00:16:17,140 --> 00:16:22,120 get this and this. 341 00:16:22,120 --> 00:16:23,050 Make sense? 342 00:16:23,050 --> 00:16:24,970 So that's how you schedule it. 343 00:16:24,970 --> 00:16:28,330 So does that algorithm make sense? 344 00:16:28,330 --> 00:16:30,520 Let's try to prove its correctness. 
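[Editor's sketch] The clone assignment worked out on the board can be sketched as below. Several things here are assumptions rather than the recitation's own method: the name assign_clones and the (start, end) tuple layout are made up, an event that ends at time t is treated as not clashing with one that starts at t, and a heap is swapped in to find the clone that frees up earliest, whereas the board example checks the existing clones one by one. Either way a new clone is created only when every existing clone is busy.

```python
import heapq


def assign_clones(events):
    """events: list of (start, end) pairs.

    Returns (num_clones, assignment), where assignment[i] is the clone
    (numbered from 0) that attends events[i].
    """
    # Process events in increasing order of start time.
    order = sorted(range(len(events)), key=lambda i: events[i][0])
    free_at = []  # heap of (time the clone becomes free, clone id)
    assignment = [None] * len(events)
    num_clones = 0
    for i in order:
        start, end = events[i]
        if free_at and free_at[0][0] <= start:
            # Some existing clone has finished by now: reuse it.
            _, clone = heapq.heappop(free_at)
        else:
            # Every existing clone is busy at `start`: a new one is needed.
            clone = num_clones
            num_clones += 1
        assignment[i] = clone
        heapq.heappush(free_at, (end, clone))
    return num_clones, assignment


if __name__ == "__main__":
    # Six staggered events, loosely modeled on the board example.
    events = [(0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8)]
    print(assign_clones(events))
```

With the sample events, the fourth event starts exactly when the first one ends, so it goes back to the first clone, which mirrors the step where event 4 is placed with clone number 1.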
345 00:16:33,280 --> 00:16:36,820 So let's look at the instance where you're inserting 346 00:16:36,820 --> 00:16:41,995 the m-th clone-- so m-th clone. 347 00:16:52,090 --> 00:16:55,219 So when the m-th is created, you already 348 00:16:55,219 --> 00:16:56,260 have some values in here. 349 00:16:59,000 --> 00:17:03,740 So you have 1, 2, all the way up to m minus 1. 350 00:17:03,740 --> 00:17:06,650 So now, you bring in your interval, 351 00:17:06,650 --> 00:17:09,560 and you see that it collides with all of these values. 352 00:17:09,560 --> 00:17:13,430 So let's just draw the final interval for all these guys. 353 00:17:13,430 --> 00:17:16,664 So let's say the final interval for this guy was out here. 354 00:17:16,664 --> 00:17:18,645 Let's say the final interval for this guy 355 00:17:18,645 --> 00:17:23,010 was out here, and so on, and blah, blah, blah, blah, blah. 356 00:17:23,010 --> 00:17:25,859 And so when you create the m-th clone, 357 00:17:25,859 --> 00:17:27,000 you look at the start time. 358 00:17:27,000 --> 00:17:31,080 So what happens is that the start time is somewhere, 359 00:17:31,080 --> 00:17:32,350 let's say, here. 360 00:17:32,350 --> 00:17:34,450 And now, you know that because of this-- 361 00:17:34,450 --> 00:17:36,810 so you're only adding a new clone when you 362 00:17:36,810 --> 00:17:38,590 don't have an available slot. 363 00:17:38,590 --> 00:17:40,890 So that means that there is some interval here, 364 00:17:40,890 --> 00:17:43,716 which intersects with this guy. 365 00:17:43,716 --> 00:17:45,590 So how do you show that this is one interval? 366 00:17:45,590 --> 00:17:47,050 Well, it's like consider any level. 367 00:17:47,050 --> 00:17:49,258 But say there is no interval that intersects with it. 368 00:17:49,258 --> 00:17:51,860 So that means that there is either-- So 369 00:17:51,860 --> 00:17:57,480 if there were a gap here-- so let's say, at this location, 370 00:17:57,480 --> 00:17:58,980 this interval wasn't here. 
371 00:17:58,980 --> 00:18:02,430 Let's say if you extrapolate this line outward-- so this 372 00:18:02,430 --> 00:18:04,210 is your current starting value. 373 00:18:04,210 --> 00:18:06,730 And let's say you look at this line. 374 00:18:06,730 --> 00:18:10,510 And in this segment, you can't have 375 00:18:10,510 --> 00:18:12,250 something which starts after this, 376 00:18:12,250 --> 00:18:15,140 because this is the current highest starting time in sorted order. 377 00:18:15,140 --> 00:18:17,340 So there's no interval that starts after this. 378 00:18:17,340 --> 00:18:19,400 So the only intervals that are going to exist 379 00:18:19,400 --> 00:18:21,124 have already ended here. 380 00:18:21,124 --> 00:18:22,540 And if they've already ended here, 381 00:18:22,540 --> 00:18:24,560 that means you could have put it here. 382 00:18:24,560 --> 00:18:27,210 Does that make sense? 383 00:18:27,210 --> 00:18:29,370 So basically, then you can show that, OK, 384 00:18:29,370 --> 00:18:32,579 so at every existing-- if you're adding a new clone, that 385 00:18:32,579 --> 00:18:33,995 means at every existing level, you 386 00:18:33,995 --> 00:18:36,270 have something which intersects. 387 00:18:36,270 --> 00:18:40,220 So what that means is that you have 388 00:18:40,220 --> 00:18:42,672 a single point of time where there 389 00:18:42,672 --> 00:18:46,100 are m minus 1 plus 1 intervals. 390 00:18:46,100 --> 00:18:49,330 That means that you absolutely need m clones regardless 391 00:18:49,330 --> 00:18:51,010 of what your strategy is. 392 00:18:51,010 --> 00:18:53,690 So adding the m-th clone is necessary. 393 00:18:53,690 --> 00:18:55,640 So if you go on, continue the argument-- 394 00:18:55,640 --> 00:18:57,181 let's say your total number of clones 395 00:18:57,181 --> 00:18:59,190 was m-- so you can just do this argument for m. 
396 00:18:59,190 --> 00:19:01,648 There you will show that, oh, if I followed all these rules 397 00:19:01,648 --> 00:19:04,360 correctly, I can show that the start time for m 398 00:19:04,360 --> 00:19:07,300 intersects with m minus 1 other intervals. 399 00:19:07,300 --> 00:19:10,310 So there's no way I can create a scheduling 400 00:19:10,310 --> 00:19:12,962 with less than m clones. 401 00:19:12,962 --> 00:19:15,420 Did that argument make sense, or should I go over it again? 402 00:19:19,060 --> 00:19:25,316 So that's somewhat hand wavy, but that shouldn't be-- OK. 403 00:19:25,316 --> 00:19:28,390 In any case, well, that's the three problems. 404 00:19:28,390 --> 00:19:30,790 So I guess we can go back to this one 405 00:19:30,790 --> 00:19:32,510 and sort of give the motivation for this. 406 00:19:32,510 --> 00:19:37,055 So this could, for example, be used in scheduling processes 407 00:19:37,055 --> 00:19:38,550 for servers, for instance. 408 00:19:38,550 --> 00:19:42,010 So let's say your server gets a request to run n processes, 409 00:19:42,010 --> 00:19:43,570 and they have times like that. 410 00:19:43,570 --> 00:19:45,140 So this is like shortest time first. 411 00:19:45,140 --> 00:19:47,600 So you take all the short-- the smallest jobs, 412 00:19:47,600 --> 00:19:49,184 and you execute them in the beginning. 413 00:19:49,184 --> 00:19:50,349 And you wait for other jobs. 414 00:19:50,349 --> 00:19:51,860 And this can also be done online. 415 00:19:51,860 --> 00:19:55,780 So you can have an online version of this. 416 00:19:55,780 --> 00:19:58,890 So if you take this algorithm and you do it online-- so let's 417 00:19:58,890 --> 00:20:02,080 say your server is running jobs, and you get a new request. 418 00:20:02,080 --> 00:20:06,170 So you get a new request, so you already have some set of t1 419 00:20:06,170 --> 00:20:07,150 to tn. 
420 00:20:07,150 --> 00:20:09,820 And let's say, at the current moment, 421 00:20:09,820 --> 00:20:11,560 ti is your smallest job. 422 00:20:11,560 --> 00:20:16,200 And you're running it, and you're currently at this point. 423 00:20:16,200 --> 00:20:18,120 And then in the middle of running it, 424 00:20:18,120 --> 00:20:20,350 you can get new requests for jobs. 425 00:20:20,350 --> 00:20:23,205 So how would you modify this algorithm to handle that? 426 00:20:26,875 --> 00:20:30,290 So you still want to maintain this lowest average completion 427 00:20:30,290 --> 00:20:31,316 time thing. 428 00:20:31,316 --> 00:20:32,940 So how would you handle this situation. 429 00:20:32,940 --> 00:20:34,090 So let's say you're in the middle of a job 430 00:20:34,090 --> 00:20:36,010 and you get a bunch of new requests. 431 00:20:36,010 --> 00:20:39,390 So current set is all these existing jobs 432 00:20:39,390 --> 00:20:41,590 plus some other things you get in here. 433 00:20:44,910 --> 00:20:48,042 So would you consider switching to a different job here, 434 00:20:48,042 --> 00:20:49,250 or would you keep doing this? 435 00:20:56,520 --> 00:20:59,840 Let's say one of the new jobs you get is really small. 436 00:20:59,840 --> 00:21:02,460 So what you would do in that case 437 00:21:02,460 --> 00:21:04,910 is that instead of continuing with this, 438 00:21:04,910 --> 00:21:07,200 you would switch to current smallest job. 439 00:21:07,200 --> 00:21:08,670 So you would look at the remaining 440 00:21:08,670 --> 00:21:09,987 time, so that's important. 441 00:21:09,987 --> 00:21:11,820 So you could forget about the amount of time 442 00:21:11,820 --> 00:21:12,904 you already spent on this. 443 00:21:12,904 --> 00:21:14,403 You know what the remaining time is, 444 00:21:14,403 --> 00:21:16,070 and that is all that is relevant. 445 00:21:16,070 --> 00:21:17,803 So you can just consider this problem 446 00:21:17,803 --> 00:21:19,870 in a different framework. 
447 00:21:19,870 --> 00:21:21,620 It's the exact same question. 448 00:21:21,620 --> 00:21:24,210 You just look at remaining time, instead of total time. 449 00:21:24,210 --> 00:21:25,080 So if you're in the middle of a job, 450 00:21:25,080 --> 00:21:26,747 and a new one comes in which is smaller, 451 00:21:26,747 --> 00:21:28,371 you just switch to that, complete that, 452 00:21:28,371 --> 00:21:30,570 and then look at the remaining times for everything. 453 00:21:30,570 --> 00:21:32,028 So at some point of time, you might 454 00:21:32,028 --> 00:21:34,470 have a lot of half completed jobs just lying around. 455 00:21:34,470 --> 00:21:37,960 And for all of them, you'll update their ti values 456 00:21:37,960 --> 00:21:40,130 to remaining time rather than total time. 457 00:21:40,130 --> 00:21:42,860 And that gives you a nice way to decide 458 00:21:42,860 --> 00:21:44,227 which processes to do online. 459 00:21:44,227 --> 00:21:45,060 And that gives you-- 460 00:21:45,060 --> 00:21:49,280 So this is assuming that all of your tasks have equal weights. 461 00:21:49,280 --> 00:21:51,150 So all of them have equal reward. 462 00:21:51,150 --> 00:21:52,970 So obviously, that's not always the case. 463 00:21:52,970 --> 00:21:55,730 You might be pushing back a very long job forever, 464 00:21:55,730 --> 00:21:57,712 because smaller things keep coming in 465 00:21:57,712 --> 00:21:58,920 and that job might be important. 466 00:21:58,920 --> 00:22:01,092 But if everything is equally weighted, 467 00:22:01,092 --> 00:22:02,842 then this is the optimal thing you can do. 468 00:22:02,842 --> 00:22:06,040 And it's a very simple strategy that works. 469 00:22:06,040 --> 00:22:09,925 So those are the three problems I wanted to discuss. 470 00:22:09,925 --> 00:22:11,925 Do you guys have any other questions or comments 471 00:22:11,925 --> 00:22:14,900 or anything? 472 00:22:14,900 --> 00:22:16,322 Good? 473 00:22:16,322 --> 00:22:18,700 OK. 
474 00:22:18,700 --> 00:22:20,420 We finished pretty early, so I guess, 475 00:22:20,420 --> 00:22:22,540 have a great spring break.
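[Editor's sketch] The online variant discussed near the end (when a new request arrives, compare remaining times and switch to the job with the least time left) can be simulated roughly as below. This is a sketch under assumptions, not code from the recitation: the function name srpt_average_completion and the (arrival_time, length) input format are made up, job lengths are assumed positive, switching between jobs is assumed to cost nothing, and completion times are measured from time 0.

```python
import heapq


def srpt_average_completion(jobs):
    """jobs: list of (arrival_time, length) pairs, one machine, preemption allowed.

    Always runs the job with the shortest remaining time and returns the
    average completion time over all jobs.
    """
    pending = sorted(jobs)   # jobs not yet arrived, ordered by arrival time
    ready = []               # heap of remaining times of arrived, unfinished jobs
    now = 0.0
    total = 0.0
    i = 0
    n = len(pending)
    while i < n or ready:
        if not ready:
            # Machine is idle: jump ahead to the next arrival.
            now = max(now, pending[i][0])
        while i < n and pending[i][0] <= now:
            heapq.heappush(ready, pending[i][1])
            i += 1
        remaining = heapq.heappop(ready)
        next_arrival = pending[i][0] if i < n else float("inf")
        # Run the shortest job until it finishes or a new request shows up.
        run = min(remaining, next_arrival - now)
        now += run
        if run < remaining:
            heapq.heappush(ready, remaining - run)  # preempted: keep what is left
        else:
            total += now                            # finished: record completion time
    return total / n


if __name__ == "__main__":
    # A long job starts at time 0; a short request arrives at time 1.
    print(srpt_average_completion([(0, 10), (1, 2)]))
```

In the example the long job is paused at time 1 with 9 units left, the short job finishes at time 3, and the long job finishes at time 12. When every job arrives at time 0, the simulation reduces to the shortest-first order from the offline problem.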