1 00:00:00,150 --> 00:00:01,840 The following content is provided 2 00:00:01,840 --> 00:00:04,080 under a Creative Commons license. 3 00:00:04,080 --> 00:00:06,930 Your support will help MIT OpenCourseWare continue 4 00:00:06,930 --> 00:00:10,780 to offer high quality educational resources for free. 5 00:00:10,780 --> 00:00:13,390 To make a donation or view additional materials 6 00:00:13,390 --> 00:00:17,296 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,296 --> 00:00:17,921 at ocw.mit.edu. 8 00:00:21,680 --> 00:00:23,800 PROFESSOR: We're going to cover hashing next time. 9 00:00:23,800 --> 00:00:25,580 Now we're Going to focus on pset three 10 00:00:25,580 --> 00:00:27,880 because I heard pset three is hard 11 00:00:27,880 --> 00:00:32,216 so we're going to look at the code and try to understand it. 12 00:00:32,216 --> 00:00:33,560 AUDIENCE: This course is hard. 13 00:00:33,560 --> 00:00:35,143 PROFESSOR: I've tried to make it easy. 14 00:00:35,143 --> 00:00:36,040 I really tried. 15 00:00:40,140 --> 00:00:42,410 How many people recognize the code? 16 00:00:42,410 --> 00:00:44,220 Does this look familiar? 17 00:00:44,220 --> 00:00:44,950 One, two, three. 18 00:00:44,950 --> 00:00:48,190 Everyone else got stuck on question one? 19 00:00:48,190 --> 00:00:50,600 I'm not going to give you the answers to the psets. 20 00:00:50,600 --> 00:00:51,310 We are going to-- 21 00:00:51,310 --> 00:00:51,530 AUDIENCE: [LAUGHING] 22 00:00:51,530 --> 00:00:52,370 PROFESSOR: Sorry. 23 00:00:52,370 --> 00:00:54,382 We're going to look at the code and understand 24 00:00:54,382 --> 00:00:56,590 what it does because once you understand what it does 25 00:00:56,590 --> 00:01:00,270 you can understand how to modify it. 26 00:01:00,270 --> 00:01:02,444 All right, before I do that, any questions? 27 00:01:02,444 --> 00:01:04,069 And I might not answer them right away, 28 00:01:04,069 --> 00:01:08,190 but I will keep them in mind while I go through the code. 29 00:01:08,190 --> 00:01:10,333 Are there any specific pin-points. 30 00:01:10,333 --> 00:01:12,550 AUDIENCE: The types. 31 00:01:12,550 --> 00:01:14,410 PROFESSOR: Yep. 32 00:01:14,410 --> 00:01:15,520 Covered in five minutes. 33 00:01:21,510 --> 00:01:22,010 Oh well. 34 00:01:28,572 --> 00:01:30,280 The first thing that I want to talk about 35 00:01:30,280 --> 00:01:33,151 is sweep line algorithms. 36 00:01:33,151 --> 00:01:35,400 We're asking you to implement the sweep line algorithm 37 00:01:35,400 --> 00:01:39,140 and we're giving you one that is horribly inefficient. 38 00:01:39,140 --> 00:01:43,660 Suppose you have some line segments that look like this. 39 00:01:48,280 --> 00:01:49,530 Sorry, drawing isn't. 40 00:01:56,030 --> 00:01:56,920 They look like this. 41 00:02:01,190 --> 00:02:05,842 And suppose you want to find the intersections between them. 42 00:02:05,842 --> 00:02:07,730 Now let's figure out what the algorithm 43 00:02:07,730 --> 00:02:09,740 that we gave you does. 44 00:02:09,740 --> 00:02:12,400 Did people look at the trace for that? 45 00:02:12,400 --> 00:02:16,000 Not the good trace, the trace that's output by the algorithm 46 00:02:16,000 --> 00:02:18,620 that we gave you. 47 00:02:18,620 --> 00:02:21,598 Does anyone understand what that does. 48 00:02:21,598 --> 00:02:23,445 AUDIENCE: I kind of get it. 49 00:02:23,445 --> 00:02:24,028 PROFESSOR: OK. 50 00:02:24,028 --> 00:02:26,444 AUDIENCE: I think what it does is it just hits through all 51 00:02:26,444 --> 00:02:29,464 the horizontal lines and it goes through all the vertical ones. 52 00:02:29,464 --> 00:02:31,380 First it goes through all the horizontal lines 53 00:02:31,380 --> 00:02:32,890 and what does it do with them? 54 00:02:32,890 --> 00:02:34,730 AUDIENCE: [INAUDIBLE]. 55 00:02:34,730 --> 00:02:37,020 PROFESSOR: They get bolded in the visualizer which 56 00:02:37,020 --> 00:02:39,780 means they're added to the range index. 57 00:02:39,780 --> 00:02:44,060 So one, add all the horizontal lines. 58 00:02:44,060 --> 00:02:48,020 And two-- oh you said it goes through all the vertical lines 59 00:02:48,020 --> 00:02:50,220 after that. 60 00:02:50,220 --> 00:02:52,306 Any idea what it does? 61 00:02:52,306 --> 00:02:54,260 AUDIENCE: It looks for intersections. 62 00:02:54,260 --> 00:03:00,290 PROFESSOR: When he looks at a vertical line segment 63 00:03:00,290 --> 00:03:01,990 you'll see some blue square and you'll 64 00:03:01,990 --> 00:03:03,800 see some segments that are green. 65 00:03:03,800 --> 00:03:06,130 This is a query to the range index 66 00:03:06,130 --> 00:03:08,710 and the green segments are the answers 67 00:03:08,710 --> 00:03:13,516 and the blue square is the range that's queried. 68 00:03:13,516 --> 00:03:15,390 First add all the segments to the range index 69 00:03:15,390 --> 00:03:21,470 and then do a query. 70 00:03:21,470 --> 00:03:23,420 In principle sweep line algorithms 71 00:03:23,420 --> 00:03:25,740 have some geometric inputs. 72 00:03:25,740 --> 00:03:27,790 For example, a bunch of lines. 73 00:03:27,790 --> 00:03:30,610 And conceptually they have this vertical line 74 00:03:30,610 --> 00:03:33,990 that they sweep across the plane all the way from minus infinity 75 00:03:33,990 --> 00:03:35,870 to plus infinity. 76 00:03:35,870 --> 00:03:37,800 Now if you try to sweep a line continuously 77 00:03:37,800 --> 00:03:39,830 from minus infinity to plus infinity in code 78 00:03:39,830 --> 00:03:41,800 it might take forever. 79 00:03:41,800 --> 00:03:43,450 We don't really do it that way. 80 00:03:43,450 --> 00:03:46,182 The way we do it in code is you know 81 00:03:46,182 --> 00:03:48,910 when something interesting is going to happen 82 00:03:48,910 --> 00:03:51,640 and you only simulate what happens to the sweep line 83 00:03:51,640 --> 00:03:53,730 at that point. 84 00:03:53,730 --> 00:03:57,760 In this case the sweep line starts at minus infinity 85 00:03:57,760 --> 00:03:59,540 and then all the horizontal segments 86 00:03:59,540 --> 00:04:01,890 are added to our range index. 87 00:04:01,890 --> 00:04:06,420 And then as the sweep line goes across the plane, when 88 00:04:06,420 --> 00:04:09,010 it hits a vertical segment a query happens. 89 00:04:11,770 --> 00:04:14,165 The only times when the sweep line stops 90 00:04:14,165 --> 00:04:18,360 is first, at the beginning to add all the horizontal line 91 00:04:18,360 --> 00:04:24,920 segments, and then every time it hits a vertical line segment. 92 00:04:24,920 --> 00:04:27,020 We know this ahead of time so we can 93 00:04:27,020 --> 00:04:29,790 compute a list of all these coordinates 94 00:04:29,790 --> 00:04:31,670 of interesting things. 95 00:04:31,670 --> 00:04:33,920 And then we can sort them and then go through them 96 00:04:33,920 --> 00:04:37,990 and simulate the behavior of the sweep line that way. 97 00:04:37,990 --> 00:04:40,260 If you look at the code listing at events from there, 98 00:04:40,260 --> 00:04:41,660 this is the gist of it. 99 00:04:41,660 --> 00:04:43,260 This is what it does. 100 00:04:43,260 --> 00:04:45,790 It compiles a list of events. 101 00:04:45,790 --> 00:04:50,780 The first key in each event is the x, is that x? 102 00:04:50,780 --> 00:04:53,610 Yeah. x like this, y like this. 103 00:04:53,610 --> 00:04:56,230 The first case, the x-coordinate of an event 104 00:04:56,230 --> 00:04:59,260 and by sorting that list of events 105 00:04:59,260 --> 00:05:02,600 it will then process them in order. 106 00:05:02,600 --> 00:05:05,490 Events from there is responsible for looking at all the wires 107 00:05:05,490 --> 00:05:08,420 and compiling events. 108 00:05:08,420 --> 00:05:10,990 How does an event for a horizontal layer look like? 109 00:05:10,990 --> 00:05:12,480 Can anyone tell me? 110 00:05:12,480 --> 00:05:15,435 Sorry, for a horizontal wire. 111 00:05:15,435 --> 00:05:17,143 AUDIENCE: It will run across [INAUDIBLE]. 112 00:05:21,460 --> 00:05:24,330 PROFESSOR: First off it looks like a list, right? 113 00:05:24,330 --> 00:05:26,150 It's a regular python list. 114 00:05:26,150 --> 00:05:29,360 And I will try to call it vector so that we don't confuse it 115 00:05:29,360 --> 00:05:31,830 with a big list of all the events. 116 00:05:31,830 --> 00:05:34,610 You said the first thing is the leftmost point. 117 00:05:34,610 --> 00:05:36,860 Right? 118 00:05:36,860 --> 00:05:39,990 If this segment, for example, starts at minus 100, 119 00:05:39,990 --> 00:05:44,680 this one starts at minus 95, this one starts at minus 90, 120 00:05:44,680 --> 00:05:49,890 and this one starts at minus 50, by the way not to scale, 121 00:05:49,890 --> 00:05:52,535 then the leftmost point is minus 100. 122 00:05:52,535 --> 00:05:54,110 Right? 123 00:05:54,110 --> 00:05:58,230 This is the first element in the vector. 124 00:05:58,230 --> 00:06:00,795 And that's computed on line four where 125 00:06:00,795 --> 00:06:02,170 it goes through all the segments, 126 00:06:02,170 --> 00:06:07,200 looks at all the X-coordinates and chooses the minimum. 127 00:06:07,200 --> 00:06:09,227 What's the next thing in the vector? 128 00:06:09,227 --> 00:06:10,560 We don't have to understand why. 129 00:06:10,560 --> 00:06:12,180 We'll go through why in a bit. 130 00:06:12,180 --> 00:06:14,656 I'm just interested in what? 131 00:06:14,656 --> 00:06:15,280 AUDIENCE: Zero. 132 00:06:15,280 --> 00:06:15,946 PROFESSOR: Zero. 133 00:06:15,946 --> 00:06:17,440 Excellent. 134 00:06:17,440 --> 00:06:21,190 PROFESSOR: After that there is a wire ID. 135 00:06:21,190 --> 00:06:24,540 Each wire gets a unique ID, that's a number between zero 136 00:06:24,540 --> 00:06:27,140 and the number of wires. 137 00:06:27,140 --> 00:06:29,790 Say this is one, two, three. 138 00:06:29,790 --> 00:06:31,760 This is going to be one. 139 00:06:31,760 --> 00:06:34,780 And then I have the string add. 140 00:06:34,780 --> 00:06:36,510 And then I have the actual wire object. 141 00:06:39,400 --> 00:06:45,270 What does the vector look like for a query event? 142 00:06:45,270 --> 00:06:49,032 Sorry, what does it look like for a vertical wire? 143 00:06:49,032 --> 00:06:50,490 I'm giving away some of the answer. 144 00:06:50,490 --> 00:06:51,320 Bad, bad, bad. 145 00:06:55,120 --> 00:06:57,690 All right, vertical wire, look at the code and. 146 00:07:03,157 --> 00:07:06,312 AUDIENCE: It keeps the left [INAUDIBLE]. 147 00:07:06,312 --> 00:07:07,270 PROFESSOR: Of the wire. 148 00:07:07,270 --> 00:07:08,050 OK. 149 00:07:08,050 --> 00:07:09,379 And? 150 00:07:09,379 --> 00:07:11,378 AUDIENCE: It keeps the x-coordinate [INAUDIBLE]. 151 00:07:14,502 --> 00:07:16,960 PROFESSOR: A vertical wire will have the same X-coordinates 152 00:07:16,960 --> 00:07:18,543 for all the points on the wire, right? 153 00:07:18,543 --> 00:07:21,380 I don't really worry about which coordinate it is. 154 00:07:21,380 --> 00:07:23,320 This wire is at minus 50. 155 00:07:23,320 --> 00:07:27,430 The first element here is going to be minus 50. 156 00:07:27,430 --> 00:07:29,329 What's next? 157 00:07:29,329 --> 00:07:30,720 AUDIENCE: A 1. 158 00:07:30,720 --> 00:07:32,270 PROFESSOR: A 1. 159 00:07:32,270 --> 00:07:33,080 OK. 160 00:07:33,080 --> 00:07:36,600 And suppose this is wire ID 4. 161 00:07:36,600 --> 00:07:40,530 I'm going to have 4. 162 00:07:40,530 --> 00:07:41,450 AUDIENCE: Query. 163 00:07:41,450 --> 00:07:42,604 PROFESSOR: Query. 164 00:07:42,604 --> 00:07:45,160 AUDIENCE: And the wire. 165 00:07:45,160 --> 00:07:46,160 PROFESSOR: And the wire. 166 00:07:51,200 --> 00:07:54,250 All right, what happens with these in the init method, which 167 00:07:54,250 --> 00:07:56,330 is not shown in the code listing, 168 00:07:56,330 --> 00:08:01,750 is after the list is put together sort is called on it. 169 00:08:01,750 --> 00:08:04,980 These are all compared and then reordered 170 00:08:04,980 --> 00:08:09,520 to be sorted according to some ordering relationship. 171 00:08:09,520 --> 00:08:11,700 Does anyone know what the ordering relationship 172 00:08:11,700 --> 00:08:14,667 is for Python lists? 173 00:08:14,667 --> 00:08:16,134 Yeah. 174 00:08:16,134 --> 00:08:18,579 AUDIENCE: It starts from the first, from zero, 175 00:08:18,579 --> 00:08:24,367 and the next it goes to sort the same with as X 1. 176 00:08:24,367 --> 00:08:24,950 PROFESSOR: OK. 177 00:08:24,950 --> 00:08:33,645 Say if I have 1, 2, 3 and 1, 3, 2. 178 00:08:36,320 --> 00:08:42,950 Sorry, and 2, 1, 2. 179 00:08:42,950 --> 00:08:44,870 He's going to look at the first element 180 00:08:44,870 --> 00:08:47,340 and if they are different than the one with the smaller 181 00:08:47,340 --> 00:08:49,020 element is smaller. 182 00:08:49,020 --> 00:08:52,270 1, 2, 3 as a vector, sorry, as a list, 183 00:08:52,270 --> 00:08:54,470 is smaller than 2, 1, 2 as a list. 184 00:08:57,250 --> 00:09:00,430 If this guy becomes one then they're equal. 185 00:09:00,430 --> 00:09:02,280 Then it has to go to the next element, 186 00:09:02,280 --> 00:09:06,030 compare them and see if they're different we have 187 00:09:06,030 --> 00:09:07,460 an answer, if not we have to keep 188 00:09:07,460 --> 00:09:10,090 going all the way until the end of the list. 189 00:09:10,090 --> 00:09:12,170 This is called lexicographic comparison 190 00:09:12,170 --> 00:09:15,730 and this is the same ordering that you have for the words 191 00:09:15,730 --> 00:09:17,420 in a dictionary. 192 00:09:17,420 --> 00:09:19,040 Right, if you think of each letter 193 00:09:19,040 --> 00:09:22,450 as an element in a list and the word is the list 194 00:09:22,450 --> 00:09:26,820 then this is how you look up words in a dictionary. 195 00:09:26,820 --> 00:09:30,850 The reason events look like this is 196 00:09:30,850 --> 00:09:33,210 so that sort would output them in the right order 197 00:09:33,210 --> 00:09:35,150 for the next state. 198 00:09:35,150 --> 00:09:36,911 So, yes. 199 00:09:36,911 --> 00:09:38,994 AUDIENCE: Does that mean all the horizontal wires, 200 00:09:38,994 --> 00:09:41,683 does that mean that they start at same place even though that 201 00:09:41,683 --> 00:09:42,806 they don't? 202 00:09:42,806 --> 00:09:43,430 PROFESSOR: Yup. 203 00:09:43,430 --> 00:09:45,700 all the horizontal wires will be sorted. 204 00:09:45,700 --> 00:09:50,850 All these add events will be sorted before the query events. 205 00:09:50,850 --> 00:09:52,080 Yup, very good observation. 206 00:09:52,080 --> 00:09:53,990 This is what I'm trying to achieve. 207 00:09:53,990 --> 00:09:56,405 That's why I have this special value here. 208 00:09:56,405 --> 00:09:58,280 I go through some extra effort to compute it. 209 00:09:58,280 --> 00:09:59,780 I had to write an extra line of code 210 00:09:59,780 --> 00:10:01,894 so this is the motivation for it. 211 00:10:06,540 --> 00:10:09,730 The first key in the vector is the X element 212 00:10:09,730 --> 00:10:10,788 on the sweep line. 213 00:10:13,780 --> 00:10:17,370 I could've also had the very large negative value 214 00:10:17,370 --> 00:10:20,000 that something that would behave like minus infinity here 215 00:10:20,000 --> 00:10:21,790 and that would work just as well. 216 00:10:24,450 --> 00:10:26,920 By computing the left edge I don't have to deal with that. 217 00:10:26,920 --> 00:10:29,320 I don't have to worry of whether my negative infinity is 218 00:10:29,320 --> 00:10:32,390 small enough, because if it's not I will fail the test 219 00:10:32,390 --> 00:10:34,380 and that's bad. 220 00:10:34,380 --> 00:10:37,340 This is the x-coordinate at which the sweep line stops 221 00:10:37,340 --> 00:10:39,340 and something happens. 222 00:10:39,340 --> 00:10:41,630 So if all the events are different the x-coordinates 223 00:10:41,630 --> 00:10:43,390 then I have my ordering, I'm done. 224 00:10:43,390 --> 00:10:46,220 Now what if I have two events at the same time? 225 00:10:46,220 --> 00:10:49,800 For example, what if I have two vertical wires? 226 00:10:54,551 --> 00:10:56,300 The x-coordinates are going to be the same 227 00:10:56,300 --> 00:11:00,040 so I need to use something else to break the ties, right? 228 00:11:00,040 --> 00:11:01,920 The first thing that I use to break the ties 229 00:11:01,920 --> 00:11:05,120 is all add events have a zero here 230 00:11:05,120 --> 00:11:07,530 and all the query events have a one. 231 00:11:07,530 --> 00:11:09,720 And the point of that is if I have 232 00:11:09,720 --> 00:11:11,350 two events at the same x-coordinate, 233 00:11:11,350 --> 00:11:13,940 if I have a query and an add, I want 234 00:11:13,940 --> 00:11:16,300 the add to happen before the query. 235 00:11:16,300 --> 00:11:18,780 If this is the same this is going 236 00:11:18,780 --> 00:11:20,630 to be different for ad versus query. 237 00:11:24,200 --> 00:11:25,250 Does that make sense? 238 00:11:25,250 --> 00:11:27,610 This is how we order events relatively 239 00:11:27,610 --> 00:11:29,770 to each other on the same line. 240 00:11:29,770 --> 00:11:33,010 Now suppose I have these two wires, 241 00:11:33,010 --> 00:11:35,470 they're both vertical so they're both going to be queries. 242 00:11:35,470 --> 00:11:37,270 They're both at the same index. 243 00:11:37,270 --> 00:11:40,520 I need something else, and that other thing that I have 244 00:11:40,520 --> 00:11:44,370 is the wire ID which is guaranteed to be unique. 245 00:11:44,370 --> 00:11:47,620 I know for sure the comparison will stop here. 246 00:11:47,620 --> 00:11:49,410 And the comparison had better stop here 247 00:11:49,410 --> 00:11:52,290 because if the comparison gets all the way to here, 248 00:11:52,290 --> 00:11:54,740 wires aren't comparable so the code is going to crash. 249 00:11:59,390 --> 00:12:02,130 It stops here because the IDs are unique, everyone is happy. 250 00:12:04,770 --> 00:12:06,650 All right, do the events make more sense now? 251 00:12:09,350 --> 00:12:11,180 To complete the picture if you look 252 00:12:11,180 --> 00:12:14,360 at compute crossings, lines nine and ten. 253 00:12:14,360 --> 00:12:16,740 Line nine goes through the vector-- sorry, 254 00:12:16,740 --> 00:12:19,340 goes through the list of events that have been sorted 255 00:12:19,340 --> 00:12:21,750 and processes them in order so sort 256 00:12:21,750 --> 00:12:23,480 had better do the right job. 257 00:12:23,480 --> 00:12:27,135 And then it extracts the x-coordinate, that's here, 258 00:12:27,135 --> 00:12:30,120 it extracts the event type, that's here, 259 00:12:30,120 --> 00:12:32,804 and then it extracts the wire from here. 260 00:12:32,804 --> 00:12:35,220 These guys really are just there to help with the sorting, 261 00:12:35,220 --> 00:12:37,356 they're never read afterwards. 262 00:12:37,356 --> 00:12:39,230 AUDIENCE: Where are the events actually sort? 263 00:12:39,230 --> 00:12:42,110 I thought they just put into order. 264 00:12:42,110 --> 00:12:47,135 PROFESSOR: The events are sorted in an init method that's 265 00:12:47,135 --> 00:12:47,635 not here. 266 00:12:47,635 --> 00:12:49,990 It's in the piece of code but it's not here. 267 00:12:49,990 --> 00:12:54,741 And that may be a hint that you don't want to change it. 268 00:12:54,741 --> 00:12:55,740 As in you don't need to. 269 00:12:58,990 --> 00:13:00,790 All right, events look like this. 270 00:13:04,800 --> 00:13:08,480 Any questions about events? 271 00:13:08,480 --> 00:13:10,639 Everything make sense? 272 00:13:10,639 --> 00:13:12,930 Presumably when you write your own sweep line algorithm 273 00:13:12,930 --> 00:13:16,347 you're going to come up with your own events which 274 00:13:16,347 --> 00:13:17,430 are going to be different. 275 00:13:17,430 --> 00:13:20,830 You're going to change these methods to add 276 00:13:20,830 --> 00:13:22,564 your own events to the list. 277 00:13:22,564 --> 00:13:24,230 And then it's going to be sorted for you 278 00:13:24,230 --> 00:13:27,072 and then you're going to change compute crossings to process 279 00:13:27,072 --> 00:13:29,280 your events in the way that they should be processed. 280 00:13:36,730 --> 00:13:40,540 Let's look at the range index. 281 00:13:40,540 --> 00:13:41,995 What do we store in a range index? 282 00:13:44,745 --> 00:13:47,304 AUDIENCE: The horizontal wires. 283 00:13:47,304 --> 00:13:48,470 PROFESSOR: Horizontal wires. 284 00:13:48,470 --> 00:13:49,380 Very good. 285 00:13:49,380 --> 00:13:53,070 What's the point of storing horizontal wires in an index? 286 00:13:53,070 --> 00:13:55,920 AUDIENCE: That's when you want to query, 287 00:13:55,920 --> 00:13:57,820 the area has a line through it. 288 00:14:01,420 --> 00:14:03,890 PROFESSOR: For the algorithm that we gave you 289 00:14:03,890 --> 00:14:05,590 how does a query look like? 290 00:14:05,590 --> 00:14:10,070 Suppose I have this wire here and I'm doing a query for it. 291 00:14:10,070 --> 00:14:13,395 AUDIENCE: Just ask to list horizontal wires that 292 00:14:13,395 --> 00:14:17,670 are between that Y. 293 00:14:17,670 --> 00:14:20,020 PROFESSOR: Between this guy's top and this guy's bottom. 294 00:14:20,020 --> 00:14:20,520 Right? 295 00:14:20,520 --> 00:14:21,270 AUDIENCE: Yeah. 296 00:14:21,270 --> 00:14:22,936 PROFESSOR: A query would look like this. 297 00:14:27,500 --> 00:14:31,760 Basically all the horizontal wires 298 00:14:31,760 --> 00:14:35,450 that have their y-coordinates between this guy and this guy. 299 00:14:35,450 --> 00:14:40,510 Now if I have a wire that's way up here or way down here 300 00:14:40,510 --> 00:14:43,490 I know for sure that it's not going to intersect this wire. 301 00:14:43,490 --> 00:14:43,990 Right? 302 00:14:43,990 --> 00:14:46,680 I don't care about it. 303 00:14:46,680 --> 00:14:50,400 The range index helps me eliminate some wires. 304 00:14:50,400 --> 00:14:53,840 Now, will the range index eliminate all the wires 305 00:14:53,840 --> 00:14:58,020 that don't intersect with this wire? 306 00:14:58,020 --> 00:14:59,790 No, OK. 307 00:14:59,790 --> 00:15:01,080 Why not? 308 00:15:01,080 --> 00:15:04,636 AUDIENCE: Well in this code it doesn't. 309 00:15:04,636 --> 00:15:06,260 PROFESSOR: OK, in this code it doesn't. 310 00:15:06,260 --> 00:15:07,718 We're only talking about this code, 311 00:15:07,718 --> 00:15:09,360 I'm not talking about the solution code 312 00:15:09,360 --> 00:15:12,478 that you guys have to implement. 313 00:15:12,478 --> 00:15:15,852 AUDIENCE: It doesn't because it never 314 00:15:15,852 --> 00:15:18,750 removes the horizontal wires. 315 00:15:18,750 --> 00:15:21,420 PROFESSOR: OK. 316 00:15:21,420 --> 00:15:27,880 What's an example of a query that would give me 317 00:15:27,880 --> 00:15:32,095 some wires that don't intersect my wire? 318 00:15:32,095 --> 00:15:35,065 AUDIENCE: Like if they're in between the y-axis, 319 00:15:35,065 --> 00:15:38,287 they're, yeah but they're not actually. 320 00:15:38,287 --> 00:15:38,870 PROFESSOR: OK. 321 00:15:38,870 --> 00:15:40,709 So you mean. 322 00:15:40,709 --> 00:15:41,334 AUDIENCE: Yeah. 323 00:15:43,974 --> 00:15:46,224 AUDIENCE: Negative infinity and infinity on the x-axis 324 00:15:46,224 --> 00:15:49,633 would give you a bunch of wires and not many of them 325 00:15:49,633 --> 00:15:52,555 will intersect with the vertical wires. 326 00:15:52,555 --> 00:15:55,477 AUDIENCE: If the end were a vertical wire, [INAUDIBLE]. 327 00:15:58,910 --> 00:15:59,870 PROFESSOR: OK. 328 00:15:59,870 --> 00:16:03,580 I understood this because it matches what I drew. 329 00:16:03,580 --> 00:16:05,910 I didn't understand the minus infinity plus infinity. 330 00:16:05,910 --> 00:16:07,120 AUDIENCE: Well if you think about the x-axis on the bottom, 331 00:16:07,120 --> 00:16:09,722 if you go from all the way on the left to all the way 332 00:16:09,722 --> 00:16:12,400 on the right you're going to get a whole bunch of wires that are 333 00:16:12,400 --> 00:16:14,100 short that-- 334 00:16:14,100 --> 00:16:17,350 PROFESSOR: Are you talking about one wire 335 00:16:17,350 --> 00:16:18,880 that's like that, or many? 336 00:16:18,880 --> 00:16:21,260 AUDIENCE: If you're grabbing from your range index. 337 00:16:21,260 --> 00:16:28,940 PROFESSOR: The range index has horizontal wires in it. 338 00:16:28,940 --> 00:16:31,238 AUDIENCE: Right. 339 00:16:31,238 --> 00:16:34,099 AUDIENCE: Are you assuming that they're all short? 340 00:16:34,099 --> 00:16:34,682 AUDIENCE: Yes. 341 00:16:34,682 --> 00:16:36,650 But it's not necessarily true. 342 00:16:36,650 --> 00:16:38,140 Some of them will be short. 343 00:16:38,140 --> 00:16:40,730 PROFESSOR: You're saying a lot of short horizontal lines-- 344 00:16:40,730 --> 00:16:41,845 you mean like this, right? 345 00:16:41,845 --> 00:16:42,980 AUDIENCE: Yeah. 346 00:16:42,980 --> 00:16:43,860 PROFESSOR: OK. 347 00:16:43,860 --> 00:16:45,300 That's what you mean. 348 00:16:45,300 --> 00:16:46,146 AUDIENCE: You have one intersection and-- 349 00:16:46,146 --> 00:16:47,155 PROFESSOR: A ton of non-intersections. 350 00:16:47,155 --> 00:16:47,520 AUDIENCE: Exactly. 351 00:16:47,520 --> 00:16:48,080 PROFESSOR: OK, cool. 352 00:16:48,080 --> 00:16:48,580 Thank you. 353 00:16:51,680 --> 00:16:53,460 Doing a range query on this can give me 354 00:16:53,460 --> 00:16:55,680 a lot of false positives. 355 00:16:55,680 --> 00:16:57,620 That's why after I get them I have 356 00:16:57,620 --> 00:16:59,660 to make sure that they intersect and I 357 00:16:59,660 --> 00:17:02,830 can't just blindly output them. 358 00:17:02,830 --> 00:17:07,160 These are bad false positives, these are bad false positive. 359 00:17:07,160 --> 00:17:09,650 Now let's look at how the range index works. 360 00:17:09,650 --> 00:17:13,492 Can anyone remind me of what the range index does? 361 00:17:13,492 --> 00:17:14,700 It's a data structure, right? 362 00:17:14,700 --> 00:17:16,099 And it has some operations. 363 00:17:16,099 --> 00:17:17,390 What are the operations for it? 364 00:17:25,636 --> 00:17:27,761 AUDIENCE: It returns everything between two values. 365 00:17:30,430 --> 00:17:33,310 PROFESSOR: You have a list operation where you give it 366 00:17:33,310 --> 00:17:38,770 two values and it will return everything inside the index, 367 00:17:38,770 --> 00:17:44,250 in the interval between those two values. 368 00:17:44,250 --> 00:17:45,720 What else can I do? 369 00:17:50,540 --> 00:17:52,950 AUDIENCE: It can also tell you how many. 370 00:17:52,950 --> 00:17:57,770 It returns those keys and how many there are. 371 00:17:57,770 --> 00:17:59,470 PROFESSOR: I have the count and you 372 00:17:59,470 --> 00:18:03,330 said that I have some things in the index. 373 00:18:03,330 --> 00:18:03,830 Right? 374 00:18:03,830 --> 00:18:05,910 How do I get them in there? 375 00:18:05,910 --> 00:18:08,494 These are queries, I need updates. 376 00:18:08,494 --> 00:18:09,535 AUDIENCE: Add and remove. 377 00:18:20,360 --> 00:18:22,820 PROFESSOR: For the code that they gave you, 378 00:18:22,820 --> 00:18:25,190 what is the running time for each of these operations? 379 00:18:30,946 --> 00:18:35,287 AUDIENCE: It would be constant had to be removed. 380 00:18:35,287 --> 00:18:35,870 PROFESSOR: OK. 381 00:18:35,870 --> 00:18:36,731 Add calls the pend. 382 00:18:36,731 --> 00:18:37,230 Right? 383 00:18:37,230 --> 00:18:38,680 So that is constant. 384 00:18:38,680 --> 00:18:41,810 I will agree there. 385 00:18:41,810 --> 00:18:44,215 AUDIENCE: To remove it you had to shift everything. 386 00:18:44,215 --> 00:18:44,840 PROFESSOR: Yep. 387 00:18:48,000 --> 00:18:50,530 Let's start with an example. 388 00:18:50,530 --> 00:18:54,344 Say I have these keys in my range index. 389 00:18:54,344 --> 00:18:55,320 5, 8, 11, 13. 390 00:18:59,990 --> 00:19:03,260 Suppose they're magically inserted in this order. 391 00:19:03,260 --> 00:19:05,570 If I want to remove two, since this is an array 392 00:19:05,570 --> 00:19:07,377 I have to shift everything to the left. 393 00:19:07,377 --> 00:19:07,960 AUDIENCE: Yes. 394 00:19:07,960 --> 00:19:08,835 PROFESSOR: So that's. 395 00:19:08,835 --> 00:19:10,540 AUDIENCE: Order and time. 396 00:19:10,540 --> 00:19:13,140 PROFESSOR: OK. 397 00:19:13,140 --> 00:19:15,298 How about list and counts? 398 00:19:24,022 --> 00:19:25,480 What's the running time for counts? 399 00:19:25,480 --> 00:19:28,532 How does count work? 400 00:19:28,532 --> 00:19:31,500 AUDIENCE: It's not just through all of the keys. 401 00:19:31,500 --> 00:19:33,570 PROFESSOR: Count goes through everything and-- 402 00:19:33,570 --> 00:19:34,650 AUDIENCE: It has to be order [INAUDIBLE]. 403 00:19:34,650 --> 00:19:35,680 PROFESSOR: Cool. 404 00:19:35,680 --> 00:19:37,430 Count goes through all the keys, evaluates 405 00:19:37,430 --> 00:19:40,460 the predicate that checks whether they're in the range 406 00:19:40,460 --> 00:19:45,860 and then it adds up one to the sum for the right place. 407 00:19:45,860 --> 00:19:46,610 How about list? 408 00:19:49,604 --> 00:19:51,919 AUDIENCE: Also [INAUDIBLE]. 409 00:19:51,919 --> 00:19:53,710 PROFESSOR: OK, that's a list comprehension. 410 00:19:53,710 --> 00:19:55,687 The code is a bit tricky, but it pretty much 411 00:19:55,687 --> 00:19:56,520 does the same thing. 412 00:19:56,520 --> 00:19:59,100 It goes through the list and it puts all the keys 413 00:19:59,100 --> 00:20:01,214 that are in the interval in the list. 414 00:20:04,850 --> 00:20:06,570 Now let's look at another version 415 00:20:06,570 --> 00:20:09,720 on the next page of our range index that is slightly better. 416 00:20:12,440 --> 00:20:14,350 What's the rapid variance for this? 417 00:20:14,350 --> 00:20:18,554 How is it different from the first one? 418 00:20:18,554 --> 00:20:20,207 AUDIENCE: Sorted already. 419 00:20:20,207 --> 00:20:21,290 PROFESSOR: Sorted already. 420 00:20:21,290 --> 00:20:23,190 Very good. 421 00:20:23,190 --> 00:20:25,778 The keys look exactly like this. 422 00:20:25,778 --> 00:20:26,278 Right? 423 00:20:28,900 --> 00:20:33,678 What is the running time for-- let's start with add or remove, 424 00:20:33,678 --> 00:20:35,998 what's the running time for add and remove? 425 00:20:35,998 --> 00:20:37,860 AUDIENCE: N, order of N. 426 00:20:37,860 --> 00:20:40,508 PROFESSOR: Why is it order of N? 427 00:20:40,508 --> 00:20:42,410 AUDIENCE: No, oder log N. 428 00:20:42,410 --> 00:20:44,098 PROFESSOR: Why is it order log N. 429 00:20:44,098 --> 00:20:46,912 AUDIENCE: Because if it's sorted you can look in the middle 430 00:20:46,912 --> 00:20:49,930 and do a binary search. 431 00:20:49,930 --> 00:20:51,810 PROFESSOR: I can do a binary search in order 432 00:20:51,810 --> 00:20:53,351 to find out where I insert something. 433 00:20:53,351 --> 00:20:54,240 Right? 434 00:20:54,240 --> 00:20:56,990 But what if I want to insert zero in this? 435 00:20:56,990 --> 00:21:00,726 AUDIENCE: Is it log N plus order N? 436 00:21:00,726 --> 00:21:01,600 Does that make sense? 437 00:21:01,600 --> 00:21:05,360 Because you have to shift all of the [INAUDIBLE]. 438 00:21:05,360 --> 00:21:06,300 PROFESSOR: Yeah. 439 00:21:06,300 --> 00:21:08,270 So worst case if I have to insert zero here, 440 00:21:08,270 --> 00:21:11,090 I have to shift everything to the right to make room for it. 441 00:21:11,090 --> 00:21:11,590 Right? 442 00:21:11,590 --> 00:21:13,810 So your first instinct was good. 443 00:21:13,810 --> 00:21:18,230 Order N. And questions? 444 00:21:18,230 --> 00:21:18,870 No more? 445 00:21:18,870 --> 00:21:19,842 Cool. 446 00:21:19,842 --> 00:21:20,550 How about remove? 447 00:21:23,710 --> 00:21:26,070 AUDIENCE: Order N. 448 00:21:26,070 --> 00:21:28,816 PROFESSOR: Order N. Why is it order N? 449 00:21:28,816 --> 00:21:32,530 AUDIENCE: You have to shift left everything to the right. 450 00:21:32,530 --> 00:21:35,410 PROFESSOR: Is this what you were going to say too? 451 00:21:35,410 --> 00:21:37,590 Same reason as before, if I want to remove 2 452 00:21:37,590 --> 00:21:39,640 I still have to move everything to the left. 453 00:21:39,640 --> 00:21:42,180 I can find the key that I want to remove very fast 454 00:21:42,180 --> 00:21:44,640 but then shifting things is still 455 00:21:44,640 --> 00:21:50,400 order N. OK how about count? 456 00:21:50,400 --> 00:21:53,498 How does count work? 457 00:21:53,498 --> 00:21:56,480 AUDIENCE: It's the same thing, isn't it? 458 00:21:56,480 --> 00:21:58,970 AUDIENCE: Binary search so it's log n. 459 00:21:58,970 --> 00:22:01,460 PROFESSOR: Binary search, so it's log n. 460 00:22:01,460 --> 00:22:03,315 If I look at the count there it's-- 461 00:22:03,315 --> 00:22:04,690 AUDIENCE: It's not though, that's 462 00:22:04,690 --> 00:22:06,165 telling you how many keys between. 463 00:22:06,165 --> 00:22:07,748 AUDIENCE: So it's a simple subtraction 464 00:22:07,748 --> 00:22:09,490 to find the number of keys between. 465 00:22:12,800 --> 00:22:14,550 PROFESSOR: Count calls binary search twice 466 00:22:14,550 --> 00:22:18,104 and then it does a arithmetic operation, yes. 467 00:22:18,104 --> 00:22:20,724 AUDIENCE: It would be log n because with the count where 468 00:22:20,724 --> 00:22:22,974 the range is that's log n because you have to actually 469 00:22:22,974 --> 00:22:25,329 sift through it, but binary search you have two 470 00:22:25,329 --> 00:22:27,370 other operations and then you just do a surprise. 471 00:22:27,370 --> 00:22:28,453 PROFESSOR: Yep, thank you. 472 00:22:28,453 --> 00:22:30,330 Very good. 473 00:22:30,330 --> 00:22:35,570 This is log n because of good old binary search. 474 00:22:35,570 --> 00:22:36,930 How about lists? 475 00:22:36,930 --> 00:22:38,984 I think you were thinking ahead of lists. 476 00:22:38,984 --> 00:22:39,650 How about lists? 477 00:22:43,870 --> 00:22:46,370 AUDIENCE: You have to list every single thing that you read. 478 00:22:46,370 --> 00:22:46,754 AUDIENCE: You've got the values you might potentially 479 00:22:46,754 --> 00:22:48,290 have to [INAUDIBLE]. 480 00:22:48,290 --> 00:22:52,910 AUDIENCE: Is it log n plus the length of the list. 481 00:22:52,910 --> 00:22:54,330 PROFESSOR: Yeah. 482 00:22:54,330 --> 00:22:56,460 I was going to do a nice long analysis 483 00:22:56,460 --> 00:22:58,540 and there's the answer. 484 00:22:58,540 --> 00:23:03,850 Log n plus, suppose you return i values from the list, i. 485 00:23:03,850 --> 00:23:06,450 Why does this matter? 486 00:23:06,450 --> 00:23:11,820 If I have a range query from minus infinity to plus infinity 487 00:23:11,820 --> 00:23:14,565 I have to create a new list, copy all these keys in 488 00:23:14,565 --> 00:23:15,860 and return them. 489 00:23:15,860 --> 00:23:17,160 That's order n. 490 00:23:17,160 --> 00:23:19,810 I can't just say it's log n. 491 00:23:19,810 --> 00:23:22,620 On the other hand, if the answers to my list queries 492 00:23:22,620 --> 00:23:25,320 are small, if they're one element, 493 00:23:25,320 --> 00:23:29,470 then it's going to take log n time to do a binary search 494 00:23:29,470 --> 00:23:32,700 and then order one to produce the list if I 495 00:23:32,700 --> 00:23:34,640 have a constant size list. 496 00:23:34,640 --> 00:23:36,970 If I just say order n I'm selling myself short. 497 00:23:36,970 --> 00:23:41,500 It's a lot better if I want to have short queries. 498 00:23:45,860 --> 00:23:48,395 What would be the best possible running time for lists? 499 00:23:56,460 --> 00:24:02,480 If I had a magic algorithm that works as fast as possible would 500 00:24:02,480 --> 00:24:05,186 I be able to run in order one time? 501 00:24:05,186 --> 00:24:07,310 AUDIENCE: No. 502 00:24:07,310 --> 00:24:09,460 PROFESSOR: Super fast magic algorithm order one. 503 00:24:09,460 --> 00:24:11,645 No, why not? 504 00:24:11,645 --> 00:24:16,452 AUDIENCE: Because you still have to find l and h in your array. 505 00:24:16,452 --> 00:24:18,160 PROFESSOR: Suppose I have some other data 506 00:24:18,160 --> 00:24:21,324 structure, a super magical data structure. 507 00:24:21,324 --> 00:24:23,896 AUDIENCE: You can do searches in order one time, 508 00:24:23,896 --> 00:24:25,170 is what you're saying. 509 00:24:25,170 --> 00:24:27,705 AUDIENCE: You still have to have the contents of it 510 00:24:27,705 --> 00:24:29,664 for what you're listing. 511 00:24:29,664 --> 00:24:30,830 It would have to be order i. 512 00:24:36,309 --> 00:24:38,225 PROFESSOR: Even if I have a super magical data 513 00:24:38,225 --> 00:24:40,950 structure where I can do everything that I want inside 514 00:24:40,950 --> 00:24:44,110 in order one time then when I produce the output 515 00:24:44,110 --> 00:24:47,680 I still have to write the output. 516 00:24:47,680 --> 00:24:50,900 This thing is going to be omega i 517 00:24:50,900 --> 00:24:54,050 and I can't possibly do any better than that. 518 00:24:54,050 --> 00:24:56,390 No matter what running time you have 519 00:24:56,390 --> 00:24:59,410 for list it has to include this i. 520 00:24:59,410 --> 00:25:01,680 Unless you're running time is order i, in which case 521 00:25:01,680 --> 00:25:04,698 i is smaller or equal to n. 522 00:25:08,020 --> 00:25:09,530 Does this look right to you? 523 00:25:14,540 --> 00:25:16,120 There is a small bug in this code. 524 00:25:16,120 --> 00:25:19,990 What if I do count of 4, 11. 525 00:25:19,990 --> 00:25:22,210 Binary search returns, if there's a key 526 00:25:22,210 --> 00:25:24,670 there it will give me the position of that key. 527 00:25:24,670 --> 00:25:29,010 And if there's no key it will tell me 528 00:25:29,010 --> 00:25:31,870 where I should insert the key if the key doesn't 529 00:25:31,870 --> 00:25:33,360 exist in theory. 530 00:25:33,360 --> 00:25:39,520 So if this is my array then the positions are 0, 1, 2, 3, 4, 5. 531 00:25:43,250 --> 00:25:47,550 Count of 4, 11 would do a binary search of 11 and see 4 532 00:25:47,550 --> 00:25:51,430 and then it will do a binary search for 4 and return 2. 533 00:25:51,430 --> 00:25:51,930 Right? 534 00:25:51,930 --> 00:25:53,506 Because if I want to insert 4 I would 535 00:25:53,506 --> 00:25:56,450 have to put it here and then shift everything to the right. 536 00:25:56,450 --> 00:26:01,556 And 4 minus 2 is 2 therefore it's broken. 537 00:26:01,556 --> 00:26:03,410 AUDIENCE: Off by one block. 538 00:26:03,410 --> 00:26:05,580 PROFESSOR: Off by one block, right? 539 00:26:05,580 --> 00:26:08,870 Well the interesting thing about it is that I had this code, 540 00:26:08,870 --> 00:26:11,860 we actually shipped this code between Saturday morning 541 00:26:11,860 --> 00:26:14,620 and Sunday evening and it passed all the tests. 542 00:26:14,620 --> 00:26:16,440 As we go to the next section keep in mind 543 00:26:16,440 --> 00:26:19,000 that this is broken but it would still 544 00:26:19,000 --> 00:26:23,418 pass the test for our problem and we'll see why in a bit. 545 00:26:23,418 --> 00:26:26,100 AUDIENCE: [INAUDIBLE]. 546 00:26:26,100 --> 00:26:29,116 PROFESSOR: Count of 4, 11. 547 00:26:29,116 --> 00:26:30,824 AUDIENCE: So confined to them then 548 00:26:30,824 --> 00:26:34,970 it says that when you're down 5, right? 549 00:26:34,970 --> 00:26:35,470 Four. 550 00:26:35,470 --> 00:26:36,136 PROFESSOR: Four. 551 00:26:40,690 --> 00:26:42,360 Binary search, if it finds the key 552 00:26:42,360 --> 00:26:44,874 it returns the position of they key. 553 00:26:44,874 --> 00:26:46,707 AUDIENCE: [INAUDIBLE] 4 divided by x, and 4, 554 00:26:46,707 --> 00:26:54,150 4 would it have had two? 555 00:26:54,150 --> 00:26:56,643 PROFESSOR: Yeah, because it tells me where to insert it. 556 00:26:56,643 --> 00:26:58,535 AUDIENCE: OK, then it is. 557 00:27:02,319 --> 00:27:03,302 [INAUDIBLE] 558 00:27:03,302 --> 00:27:05,010 PROFESSOR: Well it says 4 minus 2, right? 559 00:27:05,010 --> 00:27:07,920 The last line is high index minus low index? 560 00:27:07,920 --> 00:27:10,320 AUDIENCE: Oh, well that's wrong. [INAUDIBLE]. 561 00:27:10,320 --> 00:27:11,400 AUDIENCE: So are you saying that it tells you where to insert it 562 00:27:11,400 --> 00:27:12,775 instead of actually inserting it. 563 00:27:14,844 --> 00:27:15,510 PROFESSOR: Yeah. 564 00:27:15,510 --> 00:27:17,640 Binary search doesn't insert the element. 565 00:27:17,640 --> 00:27:19,490 It just tells me this is where you put it. 566 00:27:19,490 --> 00:27:24,450 And then if you look at add, add inserts it right there. 567 00:27:24,450 --> 00:27:26,158 That's why binary search is done that way 568 00:27:26,158 --> 00:27:27,239 so that that would work. 569 00:27:27,239 --> 00:27:29,284 AUDIENCE: Can I ask a style question? 570 00:27:29,284 --> 00:27:29,950 PROFESSOR: Sure. 571 00:27:29,950 --> 00:27:31,866 AUDIENCE: The underscore before binary search, 572 00:27:31,866 --> 00:27:33,915 is that like saying private? 573 00:27:33,915 --> 00:27:34,540 PROFESSOR: Yep. 574 00:27:34,540 --> 00:27:36,492 AUDIENCE: OK. 575 00:27:36,492 --> 00:27:39,230 AUDIENCE: Wait, what does private mean? 576 00:27:39,230 --> 00:27:41,390 PROFESSOR: Private means that it's not 577 00:27:41,390 --> 00:27:43,540 part of the public interface of the class. 578 00:27:46,647 --> 00:27:48,730 If we're in a team and we're working on something, 579 00:27:48,730 --> 00:27:52,930 if I mark a method as private that means if you use my class, 580 00:27:52,930 --> 00:27:55,290 you're not supposed to call it because I might change 581 00:27:55,290 --> 00:27:57,470 it's name, I might delete it completely, 582 00:27:57,470 --> 00:27:59,230 I can do anything I want to it. 583 00:27:59,230 --> 00:28:02,590 But if the method is public, so add, remove, list, and count, 584 00:28:02,590 --> 00:28:06,510 those are the API or the public interface of the class. 585 00:28:06,510 --> 00:28:10,260 I guarantee that as long as the blit range index class exists 586 00:28:10,260 --> 00:28:12,650 it's going to have an add to remove a list 587 00:28:12,650 --> 00:28:15,802 and the conduct will behave in exactly in that way. 588 00:28:15,802 --> 00:28:17,510 Once you have a class with public methods 589 00:28:17,510 --> 00:28:18,770 you're not supposed to change them 590 00:28:18,770 --> 00:28:20,565 because other people might depend on them 591 00:28:20,565 --> 00:28:22,756 and that would make other people unhappy. 592 00:28:31,000 --> 00:28:36,880 We said that this range index holds wires. 593 00:28:36,880 --> 00:28:37,380 Right? 594 00:28:37,380 --> 00:28:43,337 So if you have these wires here they're not really numbers. 595 00:28:43,337 --> 00:28:44,920 I know how to compare numbers, I don't 596 00:28:44,920 --> 00:28:46,080 know how to compare wires. 597 00:28:46,080 --> 00:28:51,040 That's not something that's readily comparable, right? 598 00:28:51,040 --> 00:28:58,520 We have to bridge the gap between numbers or things 599 00:28:58,520 --> 00:29:00,830 that are comparable and wires. 600 00:29:00,830 --> 00:29:04,460 The way we do that is the key wire pair class. 601 00:29:04,460 --> 00:29:06,180 If you look at key wire pair class 602 00:29:06,180 --> 00:29:10,770 it has-- I will write it as KWP here. 603 00:29:10,770 --> 00:29:12,600 So it has a key and the wire. 604 00:29:12,600 --> 00:29:13,100 Right? 605 00:29:13,100 --> 00:29:16,850 That's why it's a key wire pair. 606 00:29:16,850 --> 00:29:19,580 And if you look at the comparison methods, 607 00:29:19,580 --> 00:29:23,040 lines 11 through 26, but if you look at 11 608 00:29:23,040 --> 00:29:26,510 through 13 pretty much everything else is the same. 609 00:29:26,510 --> 00:29:30,410 Then comparisons are done in a certain way. 610 00:29:30,410 --> 00:29:33,300 So the first criterion is the key. 611 00:29:33,300 --> 00:29:36,680 If you have two key wire pairs that have different keys, 612 00:29:36,680 --> 00:29:39,200 then the one with the bigger key wins. 613 00:29:39,200 --> 00:29:41,390 Now if they have the same key then 614 00:29:41,390 --> 00:29:43,820 this field called wire IDs is used to break ties. 615 00:29:46,490 --> 00:29:51,870 So first key and then wire ID. 616 00:29:51,870 --> 00:29:54,900 What's the wire ID set to? 617 00:29:54,900 --> 00:29:55,507 And where? 618 00:29:58,130 --> 00:30:01,100 Where would you set a field in a nice class? 619 00:30:01,100 --> 00:30:02,552 AUDIENCE: The constructor. 620 00:30:02,552 --> 00:30:04,260 PROFESSOR: In the constructor, very good. 621 00:30:04,260 --> 00:30:05,590 Where is it set? 622 00:30:05,590 --> 00:30:08,340 Line number? 623 00:30:08,340 --> 00:30:09,320 AUDIENCE: 10. 624 00:30:09,320 --> 00:30:10,111 PROFESSOR: Line 10. 625 00:30:10,111 --> 00:30:13,250 So the wire ID uses the same object IDs that we had earlier. 626 00:30:13,250 --> 00:30:16,780 So each wire has a unique ID that's between zero 627 00:30:16,780 --> 00:30:17,850 and the number of wires. 628 00:30:21,960 --> 00:30:25,330 This works when I insert my wires into the index. 629 00:30:25,330 --> 00:30:28,490 What's the key for wires, by the way? 630 00:30:28,490 --> 00:30:31,150 Where are wires inserted into the index in the algorithm 631 00:30:31,150 --> 00:30:32,900 that we have and what's the key? 632 00:30:35,770 --> 00:30:37,650 The algorithm that we gave you. 633 00:30:37,650 --> 00:30:38,450 Yes. 634 00:30:38,450 --> 00:30:39,675 AUDIENCE: The y-coordinates. 635 00:30:39,675 --> 00:30:40,300 PROFESSOR: The? 636 00:30:40,300 --> 00:30:41,592 Which one, I couldn't hear you. 637 00:30:41,592 --> 00:30:42,716 AUDIENCE: The y-coordinate. 638 00:30:42,716 --> 00:30:43,990 PROFESSOR: The y-coordinate. 639 00:30:43,990 --> 00:30:45,115 Can you tell me which line? 640 00:30:47,600 --> 00:30:48,830 The intuition is very good. 641 00:30:48,830 --> 00:30:52,660 You want the wires to be keyed by the y-coordinate 642 00:30:52,660 --> 00:30:56,430 so that when you a range query between this guy and this guy 643 00:30:56,430 --> 00:30:59,685 you get these wires. 644 00:30:59,685 --> 00:31:00,560 That's the intuition. 645 00:31:00,560 --> 00:31:03,380 Now what's the code? 646 00:31:03,380 --> 00:31:04,130 AUDIENCE: Line 13. 647 00:31:07,480 --> 00:31:08,296 PROFESSOR: Line? 648 00:31:08,296 --> 00:31:09,270 I didn't hear you well. 649 00:31:09,270 --> 00:31:09,979 Line? 650 00:31:09,979 --> 00:31:10,520 AUDIENCE: 13. 651 00:31:10,520 --> 00:31:12,520 PROFESSOR: 13. 652 00:31:12,520 --> 00:31:16,150 Wires are added to the index when an event of type 653 00:31:16,150 --> 00:31:17,390 add is processed. 654 00:31:17,390 --> 00:31:18,410 Right? 655 00:31:18,410 --> 00:31:20,457 Line 12 says if event type is add 656 00:31:20,457 --> 00:31:21,790 so that's probably what we want. 657 00:31:21,790 --> 00:31:24,890 And line 13 adds it to the index and builds the key wire repair 658 00:31:24,890 --> 00:31:28,320 with the key, the y-coordinate of the wire. 659 00:31:31,460 --> 00:31:35,270 Now let's look back at the keys and see 660 00:31:35,270 --> 00:31:40,810 that we have two special key wire repair classes. 661 00:31:40,810 --> 00:31:44,280 Key wire pair l and key wire pair h. 662 00:31:44,280 --> 00:31:47,340 These don't want the wire. 663 00:31:47,340 --> 00:31:51,460 Why would I make a wire key pair that doesn't want the wire? 664 00:31:51,460 --> 00:31:52,450 Why is that useful? 665 00:31:56,797 --> 00:32:02,751 AUDIENCE: You have the minimum and maximum wire at ease. 666 00:32:02,751 --> 00:32:04,125 PROFESSOR: Very good observation. 667 00:32:04,125 --> 00:32:08,080 The wire ID for the low key pair looks 668 00:32:08,080 --> 00:32:11,730 like minus infinity and the one for the high key pair 669 00:32:11,730 --> 00:32:14,020 look like plus infinity as long as I 670 00:32:14,020 --> 00:32:16,840 don't have more than a billion wires. 671 00:32:16,840 --> 00:32:19,050 What do these things not have? 672 00:32:22,860 --> 00:32:25,110 They don't have a wire. 673 00:32:25,110 --> 00:32:27,240 So if you look at key wire pair l and h, 674 00:32:27,240 --> 00:32:30,590 if you look at the initializers on lines two and eight, 675 00:32:30,590 --> 00:32:33,820 they only take a key they don't take a wire. 676 00:32:33,820 --> 00:32:37,260 This is useful in which situation? 677 00:32:37,260 --> 00:32:40,530 Where do I want to create an index key that's 678 00:32:40,530 --> 00:32:42,487 not associated with the real wire? 679 00:32:42,487 --> 00:32:44,362 AUDIENCE: When you want to make the list when 680 00:32:44,362 --> 00:32:46,540 you want to do a query. 681 00:32:46,540 --> 00:32:48,450 PROFESSOR: When I do a query. 682 00:32:48,450 --> 00:32:49,960 Very good. 683 00:32:49,960 --> 00:32:53,770 If I'm querying, if I have this vertical wire, which 684 00:32:53,770 --> 00:33:01,780 starts from minus 80 to plus 80 then I 685 00:33:01,780 --> 00:33:04,530 would like to make a low key that corresponds to minus 80 686 00:33:04,530 --> 00:33:07,300 and a high key that corresponds to plus 80, 687 00:33:07,300 --> 00:33:11,800 but I don't have any wire with a horizontal coordinate of 80. 688 00:33:11,800 --> 00:33:12,300 Right? 689 00:33:12,300 --> 00:33:13,800 A hackish solution would be to make 690 00:33:13,800 --> 00:33:15,250 fake wires with those coordinates 691 00:33:15,250 --> 00:33:18,960 but if I make fake wires God knows where my system is going 692 00:33:18,960 --> 00:33:20,460 to break elsewhere. 693 00:33:20,460 --> 00:33:23,720 Instead the clean solution is to have these key wire pairs 694 00:33:23,720 --> 00:33:25,880 so that when I insert wires I have a wire 695 00:33:25,880 --> 00:33:28,200 and when I don't the wire is set to none. 696 00:33:28,200 --> 00:33:30,380 And the wire IDs are these special values, 697 00:33:30,380 --> 00:33:32,920 minus infinity and plus infinity. 698 00:33:32,920 --> 00:33:36,250 What's cool about setting the wire IDs to minus infinity 699 00:33:36,250 --> 00:33:37,060 and plus infinity? 700 00:33:42,040 --> 00:33:44,032 AUDIENCE: It's so that if you have a y of 80 701 00:33:44,032 --> 00:33:46,024 this y is at negative infinity so it's 702 00:33:46,024 --> 00:33:53,924 going to take that y also because its y value is 703 00:33:53,924 --> 00:33:56,350 [INAUDIBLE]. 704 00:33:56,350 --> 00:33:58,070 PROFESSOR: All right. 705 00:33:58,070 --> 00:33:58,570 Cool. 706 00:33:58,570 --> 00:34:01,410 If I have a real key wire pair and it 707 00:34:01,410 --> 00:34:05,340 has a wire whose coordinate is 80 708 00:34:05,340 --> 00:34:12,090 this is going to be bigger than a key wire pair 709 00:34:12,090 --> 00:34:15,429 l with coordinate 80 and it's going 710 00:34:15,429 --> 00:34:22,159 to be smaller than a key wire pair h with coordinate 80. 711 00:34:22,159 --> 00:34:25,270 If you do a query using this and this 712 00:34:25,270 --> 00:34:28,000 it's going to grab all the wires with coordinate 80. 713 00:34:34,940 --> 00:34:36,860 Does this makes sense? 714 00:34:36,860 --> 00:34:38,840 So what's cool about this in terms of coding? 715 00:34:45,010 --> 00:34:47,719 Do I have equal keys in my range index? 716 00:34:47,719 --> 00:34:51,800 Do I have to handle multiple equal keys? 717 00:34:56,264 --> 00:34:58,744 AUDIENCE: No because, actually in this case yes. 718 00:35:02,720 --> 00:35:03,840 PROFESSOR: Do I? 719 00:35:03,840 --> 00:35:06,090 AUDIENCE: Because if you have all the horizontal wires 720 00:35:06,090 --> 00:35:09,765 then it's possible that you have two 721 00:35:09,765 --> 00:35:11,980 horizontal wires at the same-- 722 00:35:11,980 --> 00:35:14,050 PROFESSOR: If I have two horizontal wires, 723 00:35:14,050 --> 00:35:16,250 say this guy here and this other guy 724 00:35:16,250 --> 00:35:19,750 here, this guy here, what's the key value pair for it? 725 00:35:19,750 --> 00:35:20,470 Key wire pair? 726 00:35:25,640 --> 00:35:29,700 Let's say the coordinate is minus 95 and the wire ID is 2. 727 00:35:29,700 --> 00:35:34,600 The key wire pair is they key is minus 95 728 00:35:34,600 --> 00:35:36,480 and the wire ID is 2, right? 729 00:35:36,480 --> 00:35:37,260 AUDIENCE: Yeah. 730 00:35:37,260 --> 00:35:38,926 PROFESSOR: And then for this other wire, 731 00:35:38,926 --> 00:35:43,170 suppose the wire ID is 10, the coordinate is 732 00:35:43,170 --> 00:35:48,940 going to be minus 95 then the wire ID is going to be 10. 733 00:35:48,940 --> 00:35:51,580 So how are they going to compare? 734 00:35:51,580 --> 00:35:53,696 AUDIENCE: Then that one is less than that one. 735 00:35:53,696 --> 00:35:54,320 PROFESSOR: Yup. 736 00:35:54,320 --> 00:35:58,010 So even though I have two wires with the same y-coordinates 737 00:35:58,010 --> 00:36:01,410 of the key as far as the algorithm is concerned 738 00:36:01,410 --> 00:36:03,530 is the same, in the implementation 739 00:36:03,530 --> 00:36:07,140 the keys are artificially different because I introduce 740 00:36:07,140 --> 00:36:10,630 an artificial ordering on the wires using that wire ID. 741 00:36:13,880 --> 00:36:17,341 In Which case would I have the same key inserted in my index 742 00:36:17,341 --> 00:36:17,840 twice? 743 00:36:26,100 --> 00:36:26,770 Yes? 744 00:36:26,770 --> 00:36:27,686 AUDIENCE: [INAUDIBLE]. 745 00:36:31,890 --> 00:36:32,870 PROFESSOR: OK. 746 00:36:32,870 --> 00:36:36,290 If these two wires overlap, say this guy ends here 747 00:36:36,290 --> 00:36:39,155 and this guy ends here, and if I put their keys into the index 748 00:36:39,155 --> 00:36:40,774 they're still different. 749 00:36:40,774 --> 00:36:42,773 AUDIENCE: Only if they're the same wire ID would 750 00:36:42,773 --> 00:36:45,390 they have the same-- 751 00:36:45,390 --> 00:36:48,909 PROFESSOR: OK, and when do two wires have the same wire ID? 752 00:36:48,909 --> 00:36:50,867 AUDIENCE: If one of them was negative infinity. 753 00:36:56,179 --> 00:36:57,970 PROFESSOR: Let's assume that the infinities 754 00:36:57,970 --> 00:37:00,650 were in the right way. 755 00:37:00,650 --> 00:37:01,560 Yes? 756 00:37:01,560 --> 00:37:02,060 No? 757 00:37:04,610 --> 00:37:05,900 Wire IDs are unique. 758 00:37:05,900 --> 00:37:07,790 Every wire has it's own wire ID. 759 00:37:07,790 --> 00:37:10,220 Wires will never have the same wire ID. 760 00:37:10,220 --> 00:37:11,990 So the only case in which you have 761 00:37:11,990 --> 00:37:14,900 two equal keys in the index is if you insert the same wire 762 00:37:14,900 --> 00:37:17,160 twice. 763 00:37:17,160 --> 00:37:20,450 Would you ever want to do that? 764 00:37:20,450 --> 00:37:21,070 Probably not. 765 00:37:21,070 --> 00:37:23,700 You're going to the same wire twice from the range query 766 00:37:23,700 --> 00:37:26,510 and that's not useful. 767 00:37:26,510 --> 00:37:29,349 When implementing a data structure for the range index 768 00:37:29,349 --> 00:37:30,890 we don't have to deal with equal keys 769 00:37:30,890 --> 00:37:34,760 and that is nice because that's less thinking. 770 00:37:34,760 --> 00:37:38,570 Now when we do a query, will the keys in the query 771 00:37:38,570 --> 00:37:41,577 be in the range index? 772 00:37:41,577 --> 00:37:42,660 AUDIENCE: Not necessarily. 773 00:37:48,735 --> 00:37:50,110 PROFESSOR: How would I do a query 774 00:37:50,110 --> 00:37:52,846 if I want to do a query from minus 80 to plus 80? 775 00:37:52,846 --> 00:37:54,887 AUDIENCE: You grab all the horizontal wires whose 776 00:37:54,887 --> 00:37:58,540 y-coordinates are-- 777 00:37:58,540 --> 00:38:01,570 PROFESSOR: You're thinking in the range index. 778 00:38:01,570 --> 00:38:04,490 I'm thinking as the client of the range index. 779 00:38:04,490 --> 00:38:06,060 So you have this range index class, 780 00:38:06,060 --> 00:38:09,960 how would you issue a query to the range index? 781 00:38:09,960 --> 00:38:12,317 AUDIENCE: The list on this page. 782 00:38:12,317 --> 00:38:12,900 PROFESSOR: OK. 783 00:38:12,900 --> 00:38:16,690 I would call the list method. 784 00:38:16,690 --> 00:38:20,000 And-- oh crap, I thought I was going 785 00:38:20,000 --> 00:38:22,800 to avoid doing that today. 786 00:38:22,800 --> 00:38:24,520 l and h are keys, right? 787 00:38:24,520 --> 00:38:27,868 How do I construct my keys? 788 00:38:27,868 --> 00:38:31,110 AUDIENCE: Use key wire pair. 789 00:38:31,110 --> 00:38:34,250 PROFESSOR: Key wire pair left of 80-- 790 00:38:34,250 --> 00:38:39,740 sorry low, and key wire pair high of 80. 791 00:38:39,740 --> 00:38:42,330 So then I'm going to get to these-- 792 00:38:42,330 --> 00:38:44,890 I don't have wires for this 80 coordinate. 793 00:38:44,890 --> 00:38:46,990 Sorry, minus 80 to 80 because that's 794 00:38:46,990 --> 00:38:48,434 what the query looks like. 795 00:38:48,434 --> 00:38:50,100 I don't have wires for these coordinates 796 00:38:50,100 --> 00:38:53,024 so I'm going to use these special values 797 00:38:53,024 --> 00:38:55,440 and this one is smaller than all the wires with coordinate 798 00:38:55,440 --> 00:38:57,210 minus 80 and this one is bigger than all 799 00:38:57,210 --> 00:38:59,280 the wires with coordinate 80. 800 00:38:59,280 --> 00:39:00,870 Will these keys ever be in the index? 801 00:39:03,430 --> 00:39:05,130 Nope, we're only using them for updates. 802 00:39:05,130 --> 00:39:06,770 If you put this in the index then when 803 00:39:06,770 --> 00:39:09,666 you're getting it back and you try to get the wire associated 804 00:39:09,666 --> 00:39:11,040 with it you're going to get none, 805 00:39:11,040 --> 00:39:14,340 your code is going to crash. 806 00:39:14,340 --> 00:39:17,410 If I have a query will the keys in the query ever 807 00:39:17,410 --> 00:39:18,320 being the index? 808 00:39:18,320 --> 00:39:19,240 AUDIENCE: No. 809 00:39:19,240 --> 00:39:20,330 PROFESSOR: No. 810 00:39:20,330 --> 00:39:22,870 That means fewer border cases to consider 811 00:39:22,870 --> 00:39:25,565 when implementing query. 812 00:39:25,565 --> 00:39:27,440 All the keys in the range index are different 813 00:39:27,440 --> 00:39:31,025 and they keys in the query will never be in the range index. 814 00:39:31,025 --> 00:39:32,420 AUDIENCE: When you do the count l 815 00:39:32,420 --> 00:39:37,070 and h aren't in your range index you have to count that. 816 00:39:37,070 --> 00:39:37,870 PROFESSOR: Yeah. 817 00:39:37,870 --> 00:39:42,610 In this case that's why the code didn't fail any test. 818 00:39:42,610 --> 00:39:45,600 Because I'm not using numbers I'm actually using this. 819 00:39:45,600 --> 00:39:48,340 So for the number analogy this is 820 00:39:48,340 --> 00:39:54,120 equivalent to when I'm doing a count of 4,11 that 821 00:39:54,120 --> 00:40:05,360 would get rewritten as count of 3, 99 and 11.01 and this works. 822 00:40:05,360 --> 00:40:10,740 This is what these two values are doing. 823 00:40:10,740 --> 00:40:14,820 Except if you try to do 0.99 and 0.01 824 00:40:14,820 --> 00:40:19,107 that's brittle because you might actually have those keys. 825 00:40:19,107 --> 00:40:20,440 All right, does this make sense? 826 00:40:23,800 --> 00:40:25,321 Any questions on this part? 827 00:40:25,321 --> 00:40:25,820 Nope? 828 00:40:25,820 --> 00:40:27,590 Yes? 829 00:40:27,590 --> 00:40:28,250 Yes. 830 00:40:28,250 --> 00:40:30,250 AUDIENCE: Is this thing in a count off situation 831 00:40:30,250 --> 00:40:31,636 where you have a gap in the wires 832 00:40:31,636 --> 00:40:34,096 but you still have a [INAUDIBLE] associated with it 833 00:40:34,096 --> 00:40:37,060 as it's going through it. 834 00:40:37,060 --> 00:40:38,976 PROFESSOR: Sorry, say that again. 835 00:40:38,976 --> 00:40:40,767 AUDIENCE: So with that example right there. 836 00:40:40,767 --> 00:40:41,840 PROFESSOR: Yeah. 837 00:40:41,840 --> 00:40:42,960 PROFESSOR: Are you worried about the fact 838 00:40:42,960 --> 00:40:45,610 that I'll have two equal keys or are you worried about the fact 839 00:40:45,610 --> 00:40:46,980 that these wires are crossing? 840 00:40:46,980 --> 00:40:48,730 AUDIENCE: No that they're not overlapping. 841 00:40:51,930 --> 00:40:54,076 You have a wire and you have a wire coming down 842 00:40:54,076 --> 00:40:55,159 and you have another wire. 843 00:40:55,159 --> 00:40:56,930 PROFESSOR: OK. 844 00:40:56,930 --> 00:40:58,670 This is not going to account for that. 845 00:40:58,670 --> 00:41:01,102 You have to write the better sweep line algorithm. 846 00:41:01,102 --> 00:41:03,056 AUDIENCE: OK. 847 00:41:03,056 --> 00:41:04,430 PROFESSOR: What I'm talking about 848 00:41:04,430 --> 00:41:08,010 now will make your range index faster. 849 00:41:08,010 --> 00:41:11,827 Or it makes your range index simpler to implement. 850 00:41:11,827 --> 00:41:12,368 AUDIENCE: OK. 851 00:41:14,662 --> 00:41:16,620 PROFESSOR: This is magic behind the scenes that 852 00:41:16,620 --> 00:41:18,355 makes your code smaller. 853 00:41:18,355 --> 00:41:20,730 You still have to implement the sweep line algorithm that 854 00:41:20,730 --> 00:41:23,190 makes sure that when you a range query 855 00:41:23,190 --> 00:41:27,420 it will either return less false positives or ideally no false 856 00:41:27,420 --> 00:41:28,610 positives. 857 00:41:28,610 --> 00:41:30,870 And then the code will be faster. 858 00:41:30,870 --> 00:41:32,940 But then when you do that, you'll 859 00:41:32,940 --> 00:41:35,860 see that it's still slow because the range index 860 00:41:35,860 --> 00:41:37,565 that we have here isn't optimal. 861 00:41:37,565 --> 00:41:39,690 Why isn't the range index here optimal, by the way? 862 00:41:43,510 --> 00:41:44,910 What can I improve about it? 863 00:41:47,790 --> 00:41:49,230 AUDIENCE: Add and remove. 864 00:41:49,230 --> 00:41:49,850 PROFESSOR: Add and remove. 865 00:41:49,850 --> 00:41:50,350 OK. 866 00:41:55,290 --> 00:41:57,590 We're using the comparison model because it's 867 00:41:57,590 --> 00:41:59,340 convenient to use the comparison model. 868 00:41:59,340 --> 00:42:02,050 You have complex objects to implement those six methods 869 00:42:02,050 --> 00:42:04,270 that you see in key wire pair and bam. 870 00:42:04,270 --> 00:42:07,860 You can do comparisons, you can use any data structure 871 00:42:07,860 --> 00:42:10,730 or algorithm that assume a comparison model. 872 00:42:10,730 --> 00:42:13,330 What's the penalty of the comparison model? 873 00:42:13,330 --> 00:42:14,810 What's the problem with it? 874 00:42:14,810 --> 00:42:15,311 Yes. 875 00:42:15,311 --> 00:42:17,185 AUDIENCE: It will be slower, or it will never 876 00:42:17,185 --> 00:42:18,610 be faster than the [INAUDIBLE]. 877 00:42:18,610 --> 00:42:19,300 PROFESSOR: Yup. 878 00:42:19,300 --> 00:42:24,964 If I want to do a sort that is n log n. 879 00:42:27,750 --> 00:42:32,946 And if I want to do a search that is theta what? 880 00:42:32,946 --> 00:42:33,862 AUDIENCE: [INAUDIBLE]. 881 00:42:38,004 --> 00:42:40,670 PROFESSOR: Now what are the best bounds that they have for these 882 00:42:40,670 --> 00:42:42,657 if we can go outside the comparison model? 883 00:42:42,657 --> 00:42:44,990 Just to see if you guys are paying attention in lecture. 884 00:42:48,640 --> 00:42:50,150 So best running time for sort if we 885 00:42:50,150 --> 00:42:51,775 don't have to use the comparison model. 886 00:42:51,775 --> 00:42:52,740 AUDIENCE: n. 887 00:42:52,740 --> 00:42:53,450 PROFESSOR: n. 888 00:42:53,450 --> 00:42:54,620 OK. 889 00:42:54,620 --> 00:42:56,924 Best running time for search if we 890 00:42:56,924 --> 00:42:58,340 don't to use the comparison model. 891 00:43:00,880 --> 00:43:02,630 AUDIENCE: It's order one with [INAUDIBLE]. 892 00:43:02,630 --> 00:43:06,010 PROFESSOR: Order one, with hashing. 893 00:43:06,010 --> 00:43:08,796 Can you do any better than that? 894 00:43:08,796 --> 00:43:10,550 AUDIENCE: You can code it as zero. 895 00:43:10,550 --> 00:43:12,990 [LAUGHING] 896 00:43:12,990 --> 00:43:14,830 PROFESSOR: Sort has to output an array 897 00:43:14,830 --> 00:43:18,800 of elements so it can be faster than order n. 898 00:43:18,800 --> 00:43:21,690 Search has to output yes or no so it has to output 899 00:43:21,690 --> 00:43:23,940 something so it has to be order one. 900 00:43:23,940 --> 00:43:27,150 So this is as fast as it gets, we're 901 00:43:27,150 --> 00:43:30,030 definitely not going to go faster than that. 902 00:43:30,030 --> 00:43:33,360 We have this extra log n factor that we 903 00:43:33,360 --> 00:43:37,100 pay if we use the comparison model if we don't collect 904 00:43:37,100 --> 00:43:39,180 more information about our keys. 905 00:43:39,180 --> 00:43:40,730 But in return we have the convenience 906 00:43:40,730 --> 00:43:42,204 that all we have to do is establish 907 00:43:42,204 --> 00:43:43,245 an ordering relationship. 908 00:43:48,020 --> 00:43:50,140 Does it make sense? 909 00:43:50,140 --> 00:43:50,810 Cool. 910 00:43:50,810 --> 00:43:55,830 Let's try to talk about the list pseudocode, which 911 00:43:55,830 --> 00:43:58,340 you might have seen on the pset before. 912 00:43:58,340 --> 00:44:02,250 And that's on the last page here. 913 00:44:02,250 --> 00:44:04,160 I can't give you a proof of why it works, 914 00:44:04,160 --> 00:44:07,500 but we can work together to figure out how it works 915 00:44:07,500 --> 00:44:10,170 and get an intuitive understanding. 916 00:44:10,170 --> 00:44:11,850 Let's get the keys here and let's 917 00:44:11,850 --> 00:44:16,325 assume we're using a regular BSD for implementing a range index. 918 00:44:22,980 --> 00:44:25,085 I'm going to put my keys in a BSD. 919 00:44:28,240 --> 00:44:32,610 8, 3, 2, 5. 920 00:44:39,850 --> 00:44:41,460 OK. 921 00:44:41,460 --> 00:44:43,050 List those two things. 922 00:44:43,050 --> 00:44:46,670 LCA and no list. 923 00:44:46,670 --> 00:44:47,170 Right. 924 00:44:47,170 --> 00:44:48,330 What's LCA? 925 00:44:48,330 --> 00:44:52,370 Suppose I want to list all the keys between one and six. 926 00:44:57,860 --> 00:45:00,400 First I'm going to call LCA and get some answer 927 00:45:00,400 --> 00:45:02,170 and then I'm going to call node list. 928 00:45:02,170 --> 00:45:03,800 What's the answer for LCA? 929 00:45:06,790 --> 00:45:07,930 OK. 930 00:45:07,930 --> 00:45:08,930 AUDIENCE: Node at three. 931 00:45:08,930 --> 00:45:10,364 Key value three. 932 00:45:13,240 --> 00:45:14,350 PROFESSOR: What is this? 933 00:45:14,350 --> 00:45:15,650 What's LCA? 934 00:45:15,650 --> 00:45:19,254 It's OK to give out an answer to a dumb question on the pset. 935 00:45:22,090 --> 00:45:24,600 So LCA is lowest common ancestor. 936 00:45:24,600 --> 00:45:25,630 Very good. 937 00:45:25,630 --> 00:45:27,530 It's the lowest common ancestor of what? 938 00:45:31,420 --> 00:45:39,330 AUDIENCE: Subtree of values that should be returned nodeless. 939 00:45:39,330 --> 00:45:41,560 PROFESSOR: So usually you have two nodes in a tree 940 00:45:41,560 --> 00:45:43,560 and you want to return the LCA. 941 00:45:43,560 --> 00:45:47,940 If I want to find out the LCA of 2 and 5, that is 3. 942 00:45:47,940 --> 00:45:51,780 If I want to find out the LCA of 2 and 13, that's 8. 943 00:45:51,780 --> 00:45:56,254 Now if the keys are not in the tree what does the LCA give me? 944 00:45:56,254 --> 00:45:57,713 AUDIENCE: It's where they would be. 945 00:45:57,713 --> 00:46:00,129 PROFESSOR: It's where they would be if they were inserted. 946 00:46:00,129 --> 00:46:00,820 Right. 947 00:46:00,820 --> 00:46:06,885 So 1 would be inserted here and 6 would be inserted here. 948 00:46:09,650 --> 00:46:12,930 And then their LCA is indeed 3. 949 00:46:12,930 --> 00:46:13,980 What does that tell me? 950 00:46:13,980 --> 00:46:14,940 Why is that useful? 951 00:46:17,820 --> 00:46:20,220 AUDIENCE: You can cut off a bunch of the rest of the tree 952 00:46:20,220 --> 00:46:24,060 so you don't have to look at that. 953 00:46:24,060 --> 00:46:26,835 PROFESSOR: A bunch of, or the rest of the tree? 954 00:46:26,835 --> 00:46:28,287 AUDIENCE: The rest of the tree. 955 00:46:28,287 --> 00:46:29,620 PROFESSOR: The rest of the tree. 956 00:46:29,620 --> 00:46:33,620 So if 1 is here and 6 is here, then I 957 00:46:33,620 --> 00:46:41,730 know that all the keys-- sorry, so 3 is to the left of 8 958 00:46:41,730 --> 00:46:48,810 so all the keys to the right of 8 959 00:46:48,810 --> 00:46:51,280 are going to be bigger than 6, for example. 960 00:46:51,280 --> 00:46:51,780 Right. 961 00:46:51,780 --> 00:46:54,700 Because 6 is here and it's under 8. 962 00:46:54,700 --> 00:46:56,050 I can throw this away. 963 00:46:56,050 --> 00:46:58,900 Now if this would be to the left of something else, 964 00:46:58,900 --> 00:47:02,959 because 1 is here I know that all the keys 965 00:47:02,959 --> 00:47:05,250 to the left side of that thing would be smaller than 1. 966 00:47:10,560 --> 00:47:13,970 This subtree has to have all the keys between 1 and 6. 967 00:47:16,920 --> 00:47:17,915 Yes. 968 00:47:17,915 --> 00:47:19,665 AUDIENCE: You don't have to move 8, right? 969 00:47:19,665 --> 00:47:21,265 Where would you move 8? 970 00:47:21,265 --> 00:47:21,890 PROFESSOR: Yup. 971 00:47:21,890 --> 00:47:25,400 So we only look at the subtree rooted at LCA. 972 00:47:25,400 --> 00:47:28,410 LCA is passed as an argument to node list, 973 00:47:28,410 --> 00:47:31,845 and node list seems to do sort of an-- this 974 00:47:31,845 --> 00:47:34,330 is a pre-ordered traversal. 975 00:47:34,330 --> 00:47:36,590 You look at the key, you look at the left node, 976 00:47:36,590 --> 00:47:39,150 at the left subtree, you look at the right subtree. 977 00:47:39,150 --> 00:47:41,590 Except there is some magic there and in some cases 978 00:47:41,590 --> 00:47:44,999 it doesn't go and look. 979 00:47:44,999 --> 00:47:47,540 Let's see what would a good case where it doesn't go and look 980 00:47:47,540 --> 00:47:47,740 like. 981 00:47:47,740 --> 00:47:48,780 Let's get the same tree. 982 00:47:58,020 --> 00:48:04,016 Suppose I want to do a list between 4 and 11-- 983 00:48:04,016 --> 00:48:08,020 sorry between 4 and 10. 984 00:48:08,020 --> 00:48:10,002 We'll the LCA of 4 and 10? 985 00:48:10,002 --> 00:48:10,668 AUDIENCE: Eight. 986 00:48:13,970 --> 00:48:17,730 PROFESSOR: OK, so node list would have to start here. 987 00:48:17,730 --> 00:48:21,420 Suppose I'm here and I go to 3. 988 00:48:21,420 --> 00:48:23,920 I know 3 is smaller than 4. 989 00:48:23,920 --> 00:48:26,160 Do I want to go left? 990 00:48:26,160 --> 00:48:29,170 Everything to the left of 3 is smaller than 4. 991 00:48:29,170 --> 00:48:32,510 This entire subtree can be pruned. 992 00:48:32,510 --> 00:48:37,790 Now suppose I'm on the right side and I'm at the 11. 993 00:48:37,790 --> 00:48:42,550 11 is bigger than 10, do I want to go right? 994 00:48:42,550 --> 00:48:44,620 So if 11 is bigger than 10 I never 995 00:48:44,620 --> 00:48:46,220 want to go right because everything 996 00:48:46,220 --> 00:48:48,030 to the right of that is bigger. 997 00:48:50,600 --> 00:48:55,950 OK so this is how pruning works. 998 00:48:55,950 --> 00:48:58,010 Now if I'm at 3, can I stop? 999 00:49:01,060 --> 00:49:05,117 I can't stop and exit completely because on the right side of 3 1000 00:49:05,117 --> 00:49:06,950 I might have keys that are bigger than three 1001 00:49:06,950 --> 00:49:08,660 and that are in my interval. 1002 00:49:08,660 --> 00:49:11,060 And in this example that seems to be the case. 1003 00:49:15,290 --> 00:49:22,150 Let's look very quickly at the intuitive way 1004 00:49:22,150 --> 00:49:26,220 of analyzing the running time for this. 1005 00:49:26,220 --> 00:49:30,880 Suppose I have my LCA up here and suppose 1006 00:49:30,880 --> 00:49:36,500 I go on a path from LCA to L and then-- let's 1007 00:49:36,500 --> 00:49:39,810 not worry about the right, about h. h is somewhere here, 1008 00:49:39,810 --> 00:49:42,050 the case is going to be symmetric. 1009 00:49:42,050 --> 00:49:46,910 When I go down the path, after I go here would I ever want to go 1010 00:49:46,910 --> 00:49:47,410 left? 1011 00:49:50,370 --> 00:49:52,660 If this node is before a right turn then 1012 00:49:52,660 --> 00:49:56,410 I know that l is going to be bigger than that node. 1013 00:49:56,410 --> 00:49:57,080 Right? 1014 00:49:57,080 --> 00:50:00,190 So I'm never going to take a left turn. 1015 00:50:00,190 --> 00:50:03,420 This subtree be pruned, this subtree will be pruned, 1016 00:50:03,420 --> 00:50:05,950 this subtree will be pruned. 1017 00:50:05,950 --> 00:50:08,325 Now if I go right here, what happens? 1018 00:50:11,378 --> 00:50:12,365 AUDIENCE: It's node-- 1019 00:50:12,365 --> 00:50:13,740 PROFESSOR: All the nodes here are 1020 00:50:13,740 --> 00:50:16,370 going to be between l and LCA. 1021 00:50:16,370 --> 00:50:19,720 LCA is smaller or equal to h so all these nodes here 1022 00:50:19,720 --> 00:50:23,840 are guaranteed to be nodes that come out in my list. 1023 00:50:23,840 --> 00:50:26,170 All right, why am I doing this? 1024 00:50:26,170 --> 00:50:29,090 I was saying before that I absolutely 1025 00:50:29,090 --> 00:50:31,820 have to have an order i because I 1026 00:50:31,820 --> 00:50:34,670 have to produce the output list. 1027 00:50:34,670 --> 00:50:38,600 If we look at the lines three and four, 1028 00:50:38,600 --> 00:50:42,430 line four outputs a key right? 1029 00:50:42,430 --> 00:50:45,075 Line four is definitely order i and line three 1030 00:50:45,075 --> 00:50:47,030 is an if condition so we know for sure 1031 00:50:47,030 --> 00:50:48,700 that line four is only going to execute 1032 00:50:48,700 --> 00:50:52,720 four keys that are between l and h. 1033 00:50:52,720 --> 00:50:56,080 This algorithm is going to visit some nodes in the tree. 1034 00:50:56,080 --> 00:51:01,860 For example, if I have a list of 4 and 10 it's going to visit 8, 1035 00:51:01,860 --> 00:51:06,220 it's going to visit 3, 5, and 11. 1036 00:51:06,220 --> 00:51:09,020 And for some of the nodes that are visited 1037 00:51:09,020 --> 00:51:11,490 it's going to output them. 1038 00:51:11,490 --> 00:51:16,290 Right this guy gets output and this guy gets output. 1039 00:51:16,290 --> 00:51:20,300 The nodes that are outputs are order i. 1040 00:51:20,300 --> 00:51:22,407 Those are already covered in the i term. 1041 00:51:22,407 --> 00:51:24,740 And in order to get the running time what I need to know 1042 00:51:24,740 --> 00:51:27,090 is, how many nodes do I visit that are not 1043 00:51:27,090 --> 00:51:29,940 part of the output. 1044 00:51:29,940 --> 00:51:32,450 Because if I end up visiting the entire tree then 1045 00:51:32,450 --> 00:51:33,720 that's order n plus i. 1046 00:51:33,720 --> 00:51:35,160 So order n. 1047 00:51:35,160 --> 00:51:38,370 But if I only end up visiting a few nodes outside of the output 1048 00:51:38,370 --> 00:51:42,090 then that might be a lot better. 1049 00:51:42,090 --> 00:51:44,700 And here I'm trying to argue that I will only 1050 00:51:44,700 --> 00:51:48,090 visit a few notes the outside the path. 1051 00:51:51,010 --> 00:51:52,690 In fact, for every node I'm going 1052 00:51:52,690 --> 00:51:57,650 to visit at most of the node and the node at its left. 1053 00:51:57,650 --> 00:52:01,370 And then everything to the right of the path is between l and h, 1054 00:52:01,370 --> 00:52:04,460 so everything to the right is going to be output. 1055 00:52:04,460 --> 00:52:08,000 Everything here is in order i and everything here 1056 00:52:08,000 --> 00:52:10,130 is visited in that output. 1057 00:52:10,130 --> 00:52:14,600 And potentially everything on the path as well. 1058 00:52:14,600 --> 00:52:19,356 So what's the height of a path in a binary search tree? 1059 00:52:19,356 --> 00:52:20,480 Regular binary search tree? 1060 00:52:23,720 --> 00:52:25,990 OK, and worst case. 1061 00:52:25,990 --> 00:52:31,500 So worst case the height is order n. 1062 00:52:31,500 --> 00:52:35,160 But we usually call it height because on average, at least, 1063 00:52:35,160 --> 00:52:37,740 it's log n. 1064 00:52:37,740 --> 00:52:42,010 It's somewhere between n and log n. 1065 00:52:42,010 --> 00:52:45,450 What is the running time for lists in a binary search tree? 1066 00:52:48,410 --> 00:52:50,399 If you buy this argument here. 1067 00:52:50,399 --> 00:52:51,440 What is the running time? 1068 00:52:55,024 --> 00:52:55,940 AUDIENCE: [INAUDIBLE]. 1069 00:52:58,720 --> 00:53:01,440 PROFESSOR: You could say n plus i and you're perfectly correct. 1070 00:53:01,440 --> 00:53:04,200 But we can make it a bit better. 1071 00:53:04,200 --> 00:53:06,416 What's the height of this path? 1072 00:53:06,416 --> 00:53:08,660 AUDIENCE: h. 1073 00:53:08,660 --> 00:53:13,679 PROFESSOR: H. What's the running time? 1074 00:53:13,679 --> 00:53:14,470 AUDIENCE: h plus i. 1075 00:53:18,320 --> 00:53:19,460 PROFESSOR: OK. 1076 00:53:19,460 --> 00:53:20,900 Cool. 1077 00:53:20,900 --> 00:53:22,350 Does this make sense at all? 1078 00:53:22,350 --> 00:53:23,920 Is this too complicated? 1079 00:53:23,920 --> 00:53:25,920 AUDIENCE: That's the running time for what? 1080 00:53:25,920 --> 00:53:27,836 PROFESSOR: This is the running time for lists. 1081 00:53:33,150 --> 00:53:36,510 All right so, questions? 1082 00:53:36,510 --> 00:53:38,574 Is this too far out? 1083 00:53:38,574 --> 00:53:39,865 Did I lose you guys completely? 1084 00:53:43,710 --> 00:53:45,100 Makes perfect sense. 1085 00:53:45,100 --> 00:53:46,088 I like that. 1086 00:53:46,088 --> 00:53:48,528 AUDIENCE: It's oder n plus i because it 1087 00:53:48,528 --> 00:53:51,456 could have up to any nodes on it, right? 1088 00:53:51,456 --> 00:53:52,267 PROFESSOR: Yup. 1089 00:53:52,267 --> 00:53:54,058 AUDIENCE: And then it's order i because you 1090 00:53:54,058 --> 00:53:56,336 have to add that [INAUDIBLE] of nodes anyway. 1091 00:53:56,336 --> 00:53:58,417 It's order h. 1092 00:53:58,417 --> 00:54:01,000 PROFESSOR: I'm saying that it's a binary search tree of height 1093 00:54:01,000 --> 00:54:04,500 h and if h is better than n then I'm 1094 00:54:04,500 --> 00:54:06,230 going to do much better than n and that's 1095 00:54:06,230 --> 00:54:07,497 why I'm putting that h there. 1096 00:54:07,497 --> 00:54:08,788 AUDIENCE: OK, that makes sense. 1097 00:54:08,788 --> 00:54:09,218 I think it's just a ltitle confusing at that part, 1098 00:54:09,218 --> 00:54:12,613 but the [INAUDIBLE] makes sense because of course 1099 00:54:12,613 --> 00:54:14,196 that's a smaller node you're not going 1100 00:54:14,196 --> 00:54:15,612 to look at the left portion you're 1101 00:54:15,612 --> 00:54:18,090 going to look at the right because it's bigger 1102 00:54:18,090 --> 00:54:20,800 than your smaller [INAUDIBLE], right? 1103 00:54:20,800 --> 00:54:24,190 So if you go to the right, that means that-- 1104 00:54:24,190 --> 00:54:27,210 PROFESSOR: Here I know that l is on the left, right? 1105 00:54:27,210 --> 00:54:30,190 So l is smaller than this key. 1106 00:54:30,190 --> 00:54:35,630 Everything to the right of this key is between l and LCA. 1107 00:54:35,630 --> 00:54:38,170 LCA is smaller or equal to h. 1108 00:54:38,170 --> 00:54:39,780 Everything here is between l and h, 1109 00:54:39,780 --> 00:54:43,740 so all the nodes here are going to be output. 1110 00:54:43,740 --> 00:54:46,196 All the nodes here count as i. 1111 00:54:46,196 --> 00:54:48,010 AUDIENCE: OK. 1112 00:54:48,010 --> 00:54:58,820 PROFESSOR: Now when I turn left here, this is smaller than l. 1113 00:54:58,820 --> 00:55:02,340 This is smaller than l because l is on the right. 1114 00:55:02,340 --> 00:55:05,030 This thing is going to be pruned. 1115 00:55:05,030 --> 00:55:07,420 So I will visit the parent, this node, and that's it. 1116 00:55:07,420 --> 00:55:09,350 I'm not going to look down. 1117 00:55:09,350 --> 00:55:12,562 And this is how the nodes look like. 1118 00:55:12,562 --> 00:55:14,270 Everything that's to the left of the path 1119 00:55:14,270 --> 00:55:16,561 is not visited, everything that's to the right it open. 1120 00:55:29,170 --> 00:55:29,990 Any more questions? 1121 00:55:32,889 --> 00:55:34,430 I hope you get it right on pset then. 1122 00:55:37,720 --> 00:55:39,740 Thanks guys.