The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, folks. Welcome back. Hope you had a nice long weekend with no classes. You got caught up on all those problem sets that have been sneaking up on you. You enjoyed watching the Patriots and Tom Brady come back. Oh, sorry, I'm showing my local bias.

Before we talk about today's topic, I want to take a second to set the stage. And I want you to stop and think about what you've seen so far in this course. We're coming up on the end of the first section of the course, and you've already seen a lot. You've certainly learned about fundamentals of computation. You've seen different kinds of data structures, both mutable and immutable-- so tuples and lists, dictionaries, different ways of pulling things together. You've seen a range of algorithms, from simple linear code to loops, fors and whiles. You've seen iterative algorithms. You've seen recursive algorithms. You've seen classes of algorithms: divide and conquer, greedy algorithms, bisection search. A range of things. And then most recently, you started pulling things together with classes-- a way to group together data that belongs together, along with methods or procedures that are designed to manipulate that data.

So you've actually had fairly good coverage already of a lot of the fundamentals of computation. And you're starting to get geared up to tackle a pretty interesting range of problems. Today and Monday, we're going to take a little bit of a different look at computation. Because now that you've got the tools to start building up your own personal armamentarium of tools, we'd like to ask a couple of important questions.
The primary one of which is: how efficient are my algorithms? And by efficiency, we'll see, it refers both to space and time, but primarily to time. We'd like to know both how fast my algorithms are going to run and how we can reason about that performance. And that's what we're going to do with today's topics. We're going to talk about orders of growth. We'll define what that means in a few minutes. We're going to talk about what's called big O notation. And we're going to begin to explore different classes of algorithms.

Before we do that though, let's talk about why. And I want to suggest to you there are two reasons this is important to be considering. The first question is: how can we reason about an algorithm-- something you write-- in order to predict how much time it is going to need to solve a problem of a particular size? I might be testing my code on small scale examples. And I want to know, if I run it on a really big one, how long is it going to take? Can I predict that? Can I make guesses about how much time I'm going to need to solve this problem? Especially if it's in a real world circumstance where time is going to be crucial.

Equally important is going the other direction. We want you to begin to reason about the algorithms you write, to be able to say how certain choices in the design of an algorithm influence how much time it's going to take. If I choose to do this recursively, is that going to be different than iteratively? If I choose to do this with a particular kind of structure in my algorithm, what does that say about the amount of time I'm going to need? And you're going to see there's a nice association between classes of algorithms and the interior structure of them.

And in particular, we want to ask some fundamental questions. Are there fundamental limits to how much time it's going to take to solve a particular problem, no matter what kind of algorithm I design around it? And we'll see that there are some nice challenges about that. So that's what we're going to do over the next two days.
Before we do, though, let's maybe ask the obvious question-- why should we care? Could be on a quiz; might matter to you. A better answer is: because it actually makes a difference. And I say that because it may not be as obvious to you as it was in an earlier generation. So people with my gray hair, or what's left of my gray hair, like to tell stories. I'll make it short. But I started programming 41 years ago-- no, sorry, 45 years ago-- on punch cards. You don't know what those are unless you've been to a museum, on a machine that filled half a room and that took about five minutes to execute what you can do in a fraction of a second on your phone. Right? This is to tell you you're living in a great time, not independent of what's going to happen on November 8th. All right. We'll stay away from those topics as well, won't we?

My point is, yeah, I tell old stories. I'm an old guy. But you might argue, look, computers are getting so much faster. Does it really matter? And I want to say to you-- maybe it's obvious to you-- yes, absolutely it does. Because in conjunction with us getting faster computers, we're increasing the sizes of the problems. The data sets we want to analyze are getting massive.

And I'll give you an example. I just pulled this off of Google, of course. In 2014-- I don't have more recent numbers-- Google served-- I think I have that number right-- 30 trillion pages on the web. It's either 30 trillion or 30 quadrillion; I can't count that many zeros. It covered 100 million gigabytes of data. And I suggest to you, if you want to find a piece of information on the web, can you write a simple little search algorithm that's going to sequentially go through all the pages and find anything in any reasonable amount of time? Probably not. Right? It's just growing way too fast.
This, by the way, is of course why Google makes a lot of money off of their MapReduce algorithm for searching the web-- written, by the way, or co-written, by an MIT grad and the parent of a current MIT student. So there's a nice hook in there. Not that Google pays MIT royalties for that wonderful thing, by the way.

All right. Bad jokes aside: searching Google-- ton of time. Searching a genomics data set-- ton of time. The data sets are growing so fast. You're working for the US government; you want to track terrorists using image surveillance from around the world-- growing incredibly rapidly. Pick a problem. The data sets grow so quickly that even if the computers speed up, you still need to think about how to come up with efficient ways to solve those problems. So I want to suggest to you: while sometimes simple solutions are great-- they are the easy ones to write-- at times, you need to be more sophisticated. Therefore, we want to reason about how we measure efficiency, and how we relate algorithm design choices to the cost that's going to be associated with them.

OK. Even when we do that, we've got a choice to make. Because we could talk about efficiency both in terms of time and in terms of space, meaning how much storage do I have inside the computer? And the reason that's relevant is that there's actually, in many cases, a trade-off between those two. And you've actually seen an example, which you may or may not remember. You may recall when we introduced dictionaries, I showed you a variation where you could compute Fibonacci using the dictionary to keep track of intermediate values. And we'll see next week that it actually tremendously reduces the time complexity.
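[As a reminder of the idea, here is a minimal sketch of that technique, with standard base cases-- not necessarily the exact code from that earlier lecture:]

```python
def fib_memo(n, memo=None):
    """Fibonacci with a dictionary memo: spend space to save time."""
    if memo is None:
        memo = {0: 0, 1: 1}   # base cases (one common convention)
    if n not in memo:
        # store each intermediate value so it is computed only once
        memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]
```

The dictionary is the space we spend; the payoff is that each subproblem is solved once instead of exponentially many times.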
Storing those intermediate values is a trade-off, in the sense that sometimes I can pre-compute portions of the answer and store them away, so that when I try to compute a bigger version of the answer, I can just look up those portions. So there's going to be a trade-off here. We're going to focus, for purposes of this lecture and the next one, on time efficiency. How much time is it going to take our algorithms to solve a problem?

OK. What are the challenges in doing that, before we look at the actual tools? And in fact, this is going to lead into the tools. The first one is, even if I've decided on an algorithm, there are lots of ways to implement it. A while loop and a for loop might have slightly different behavior. I could choose to do it with temporary variables or using direct substitution. There are lots of little choices. So an algorithm could be implemented many different ways. How do I measure the actual efficiency of the algorithm? The second one is that I might, for a given problem, have different choices of algorithm: a recursive solution versus an iterative one; using divide and conquer versus straightforward search. We're going to see some examples of that.

So I've got to somehow separate those pieces out. And in particular, I'd like to separate out the choice of implementation from the choice of algorithm. I want to measure how hard the algorithm is, not whether I can come up with a slightly more efficient implementation.

So here are three ways I might do it, and we're going to look at each one of them very briefly. The obvious one is, we could be scientists-- time it. Write the code, run a bunch of test cases, run a timer, and use that to try and come up with a way of estimating efficiency. We'll see some challenges with that. Slightly more abstractly, we could count operations. We could say, here is the set of fundamental operations-- mathematical operations, comparisons, setting values, retrieving values.
And simply say: how many of those operations do I use in my algorithm, as a function of the size of the input? And that could be used to give us a sense of efficiency. We're going to see that both of those are flawed-- somewhat more in the first case than the second one. And so we're going to abstract that second one to a more abstract notion of something we call an order of growth. I'll come back to that in a couple of minutes. This is the one we're going to focus on. It's the one that computer scientists use, and it leads to what we call complexity classes. So order of growth, or big O notation, is a way of abstractly describing the behavior of an algorithm, and especially the equivalences of different algorithms.

But let's look at those. Timing. Python provides a timer for you. You can import the time module, and then you can call it, as you can see right down here. I might have defined a really simple little function-- convert Celsius to Fahrenheit. And in particular, I could invoke the clock method from the time module. What that does is give me a number, as some fraction of a second, currently there. Having done that, I could call the function, and then I could call the clock again and take the difference to tell me how much time it took to execute. It's going to be a tiny amount of time. And then I could certainly print out some statistics. I could do that over a large number of runs-- different sizes of the input-- and come up with a sense of how much time it takes.
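[Here is a minimal sketch of that timing experiment. The slides use time.clock(), which has since been removed from Python (3.8+); time.perf_counter() is the closest modern equivalent:]

```python
import time

def c_to_f(c):
    """Convert Celsius to Fahrenheit."""
    return c * 9 / 5 + 32

start = time.perf_counter()              # read the clock
c_to_f(100000)                           # run the function
elapsed = time.perf_counter() - start    # read it again, take the difference
print('c_to_f took', elapsed, 'seconds')
```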
Here's the problem with that. Not a bad idea. But again, my goal is to evaluate algorithms. Do different algorithms have different amounts of time associated with them? The good news is that if I measure running time, it will certainly vary as the algorithm changes. That's just what I want to measure. But one of the problems is that it will also vary as a function of the implementation. If I use a loop that's got a couple more steps inside it in one algorithm than another, it's going to change the time. And I don't really care about that difference. So I'm confounding, or conflating, implementation influence on time with algorithm influence on time. Not so good.

Worse, timing will depend on the computer. My Mac here is pretty old-- well, at least by computer standards; it's about five years old. I'm sure some of you have much more recent Macs or other kinds of machines. Your speeds may be different from mine. That's not going to help me in trying to measure this. And even if I could measure it on small sized problems, it doesn't necessarily predict what happens when I go to really large sized problems, because of issues like the time it takes to get things out of memory and bring them back into the computer. So what this says is that timing does vary based on what I'd like to measure, but it also varies with a lot of other factors. And it's really not all that valuable.

OK. Got rid of the first one. Let's abstract that. By abstract, I mean I'm going to make the following assumption. I'm going to identify a set of primitive operations. I kind of get to say what they are, but the obvious choice is to say: what does the machine do for me automatically? That would be things like arithmetic or mathematical operations-- multiplication, division, subtraction-- comparisons-- something equal to another thing, something greater than, something less than-- assignments-- set a name to a value-- and retrieval from memory. I'm going to assume that all of these operations take about the same amount of time inside my machine. The nice thing here is that then it doesn't matter which machine I'm using. I'm measuring how long the algorithm takes by counting how many operations of this type are done inside of the algorithm.
And I'm going to use that count to come up with a number of operations executed as a function of the size of the input. And if I'm lucky, that'll give me a sense of the efficiency of the algorithm.

So this one's pretty boring. It's got three steps, right? A multiplication, a division, and an addition-- four, if you count the return. But if I had a little thing here that added up the integers from 0 up to x, I've got a little loop inside, and I could count operations. So in the first case, it's just, as I said, three operations. Here, I've got one operation to start-- I'm doing an assignment. And then inside here, in essence, there's one operation to set i to a value from that iterator. Initially, it's going to be 0, then it's going to be 1, and you get the idea. And here, that's actually two operations. It's nice Python shorthand, but what does it do? It says take the value of total and the value of i, add them together-- that's one operation-- and then set the name total to that new value-- a second operation. So you can see that inside the loop I've got three operations. And what else do I have? Well, I'm going to go through this loop x times, right? I do it for i equals 0, then for i equal 1, and so on. So I'm going to run through that loop x times. And if I put that together, I get a nice little expression: 1 plus 3x. Actually, I probably cheated here-- I shouldn't say cheated-- I probably should have counted the return as one more operation, so that would be 1 plus 3x plus 1, or 3x plus 2 operations.

Why should you care? It's a little closer to what I'd like. Because now I've got an expression that tells me something about how much time this is going to take as I change the size of the problem. If x is equal to 10, it's going to take me 32 operations.
If x is equal to 100, 302 operations. If x is equal to 1,000, 3,002 operations. And if I wanted the actual time, I'd just multiply that by whatever the constant amount of time is for each operation. I've got a good estimate of that. Sounds pretty good. Not quite what we want, but it's close.

So if I was counting operations, what could I say about it? First of all, it certainly depends on the algorithm. That's great. The number of operations is going to directly relate to the algorithm I'm trying to measure, which is what I'm after. Unfortunately, it still depends a little bit on the implementation. Let me show you what I mean by backing up for a second. Suppose I were to change this for loop to a while loop. I'll set i equal to 0 outside of the loop, and then while i is less than x plus 1, I'll do the things inside of it. That would actually add one more operation inside the loop, because I both have to set the value of i and I have to test the value of i, as well as doing the other operations down here. And so rather than getting 3x plus 1, I would get 4x plus 1. Eh. As the government says, what's the difference between three and four when you're talking about really big numbers? The problem is, in terms of counting, it does depend. And I want to get rid of that in a second. So it still depends a little bit on the implementation. I remind you, I wanted to measure the impact of the algorithm.

But the other good news is that the count is independent of which computer I run on. As long as all my computers come with the same set of basic operations, I don't care what the speed of my computer is versus yours when measuring how much time it would take.
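[To make that concrete, here are the two versions side by side, with the operation counts from above as comments. This is a sketch; the exact constants depend on what you choose to count:]

```python
def my_sum_for(x):
    """Sum the integers from 0 to x; the lecture counts this as ~3x + 2 ops."""
    total = 0                   # 1 op (assignment)
    for i in range(x + 1):      # each pass: set i from the iterator (1 op)
        total += i              # ...plus an add and an assign (2 ops)
    return total                # 1 op

def my_sum_while(x):
    """Same algorithm as a while loop: one extra op per pass (~4x + 2
    in the same style of counting)."""
    total = 0
    i = 0
    while i < x + 1:            # now we also pay for the test each pass
        total += i
        i += 1                  # and we manage i ourselves
    return total
```

Both compute the same thing; only the constant in front of x shifts with the implementation, and that is exactly the dependence we want to get rid of.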
And I should say, by the way, one of the reasons I want to do this is not to know whether it's going to take 37.42 femtoseconds, but rather to say: if this algorithm has a particular behavior, and I double the size of the input, does that double the amount of time I need? Does it quadruple the amount of time I need? Does it increase it by a factor of 10? And here, what matters isn't the speed of the computer. It's the number of operations.

The last concern I'm not really going to worry about, but we'd have to think about which operations we want to count. I made an assumption that the amount of time it takes to retrieve something from memory is the same as the amount of time it takes to do a numerical computation. That may not be accurate. But this one could probably be dealt with by just agreeing on what the common operations are and then doing the measurement.

So this is closer. And certainly, that count varies for different inputs, and we can use it to come up with a relationship between the inputs and the count. And for the most part, it reflects the algorithm, not the implementation. But it's still got that last piece left, so I need to get rid of it.

So what can we say here? Timing and counting both reflect implementations. I don't want that. Timing also evaluates the machines. What I want to do is just evaluate the algorithm. And especially, I want to understand how it scales. I'm going to say what I said a few minutes ago again: if I were to take an algorithm, and I know what its complexity is, my question is, if I double the size of the input, what does that do to the time? Because that's going to tell me something about the algorithm. I want to say, what happens when I scale it? And in particular, I want to relate that to the size of the input.

So here's what we're going to do. We're going to introduce orders of growth.
It's a wonderful tool in computer science. What we're going to focus on is that idea of counting operations, but we're not going to worry about small variations-- whether it's three or four steps inside the loop. We're going to show that that doesn't matter. And if you think about my statement of whether the time doubles or not: whether it goes from three to six or four to eight, it's still a doubling. So I don't care about those pieces inside.

I'm going to focus on what happens when the size of the problem gets arbitrarily large. I don't care about counting things from 0 up to x when x is 10 or 20. What happens when it's a million? 100 million? What's the asymptotic behavior of this? And I want to relate the time needed against the size of the input, so I can make the comparison I suggested.

OK. To do that, we've got to do a couple of things. We have to decide what we're going to measure, and then we have to think about how to count without worrying about implementation details.

So we're going to express efficiency in terms of the size of the input. And usually, this is going to be obvious. If I've got a procedure that takes one argument that's an integer, the size of that integer is the thing I'm going to measure things in. If I double the size of the integer, what happens to the computation? If I'm computing something over a list, typically the length of the list is going to be the thing I use to characterize the size of the problem. If I've got-- and we'll see this in a second-- a function that takes more than one argument, I get to decide what parameter I want to use. If I'm searching to see whether this element is in that list, typically I'm going to care about the size of the list, not the size of the element. But we have to specify what it is we're measuring. And we're going to see examples of that in just a second.

OK.
So now, we start thinking-- that sounds great. Certainly for computing something numeric-- the sum of the integers from 0 up to x-- it's kind of obvious that x is the size of my problem, and I can count how many steps it takes. But in some cases, the amount of time the code takes is going to depend on the input.

So let's take this little piece of code here. And I do hope by now, even though we flash up code, you're already beginning to recognize what it does-- not least because of the clever name we chose. This is obviously just a little function. It runs through a for loop that takes i for each element in a list L, and it's checking to see whether i is equal to the element e I've provided. When it is, I'm going to return True. If I get all the way through the loop and I didn't find it, I'm going to return False. It's just asking: is e in my input list L?

How many steps is this going to take? Well, we can certainly count the number of steps in the loop, right? We've got to set i, we've got to compare i, and potentially we've got to return. So there are at most three steps inside the loop. But it depends on how lucky I'm feeling, right? If e happens to be the first element in the list, it goes through the loop once-- I'm done. Great. I'm not always that lucky. If e is not in the list, then it will go through this entire loop, through all the elements of L, before saying False. So the first of those is sort of a best case scenario, and the second is the worst case scenario. Again, I could run some trials-- do a bunch of examples and see how many steps it goes through-- and that would be the average case. On average, I'm likely to look at half the elements in the list before I find it, right? If I'm lucky, it's early on. If I'm not so lucky, it's later on.
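[For reference, here is a reconstruction of the kind of function being described-- the name is a guess, but the structure matches the description:]

```python
def linear_search(L, e):
    """Return True if e appears in list L, scanning left to right."""
    for i in L:          # best case: e is L[0]; worst case: scan all of L
        if i == e:
            return True
    return False         # reached only when e is not in L
```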
Which one do I use? Well, we're going to focus on the worst case, because that gives you an upper bound on the amount of time it's going to take. What happens in the worst case scenario? We will find at times that it's valuable to look at the average case, to give us a rough sense of what's going to happen on average. But usually, when we talk about complexity, we're going to focus on the worst case behavior.

To say it in a slightly different way, let's go back to my example. Suppose you gave it a list L of some length-- len(L), if you like. Then my best case would be the minimum running time, and in this case, it will be when e is the first element in the list. Notice that in that case, the number of steps I take is independent of the length of L. That's great: it doesn't matter how long the list is-- if I always find the first element, I'm done.

The average case would be the average over the number of steps I take, depending on the length of the list. It's going to grow linearly with the length of the list. It's a good practical measure. But the one I want to focus on is the worst case. And here, the amount of time, as we're going to see in a couple of slides, is linear in the size of the problem. Meaning: if I double the length of the list, in the worst case it's going to take me twice as much time to find that the element is not there. If I increase the length of the list by a factor of 10, in the worst case it's going to take me 10 times as much time as it did before to find that the element is not there. And that linear relationship is what I want to capture.

So I'm going to focus on the worst case behavior. And we're about ready to start talking about orders of growth, but here is what orders of growth are going to provide for me. I want to evaluate efficiency, particularly when the input is very large. What happens when I really scale this up?
I want to express the growth of the program's runtime as the input grows. Not the exact runtime, but that notion of: if I doubled the input, how much longer does it take? What's the relationship between increasing the size of the input and the increase in the amount of time it takes to solve it?

We're going to put an upper bound on that growth. If you haven't seen this in math, it basically says I want to come up with a description that is as big as, or bigger than, the actual amount of time it's going to take. And I'm not going to worry about being precise-- we're going to talk about the order of, rather than the exact, growth. I don't need to know to the femtosecond how long this is going to take, or to exactly one operation. But I want to say things like: this is going to grow linearly-- I double the size of the input, it doubles the amount of time. Or: this is going to grow quadratically-- I double the size of the input, it's going to take four times as much time to solve it. Or, if I'm really lucky: this has constant growth-- no matter how I change the input, it's not going to take any more time.

To do that, we're going to look at the largest factors in the runtime. Which piece of the program takes the most time? And so in orders of growth, we're going to look for as tight as possible an upper bound on the growth, as a function of the size of the input, in the worst case. Nice long definition.

Almost ready to look at some examples. So here's the notation we're going to use. It's called big O notation. I have to admit-- and John's not here today to remind me of the history-- I think it comes because we used omicron, God knows why. Sounds like something from Futurama. But we used omicron as our symbol to define this. I'm having such good luck with bad jokes today-- you're not even wincing when I throw those things out. But that's OK. It's called big O notation, and we're going to use it.
We're going to describe the rules of it, as is the tradition. It describes the worst case, because that's often the bottleneck we're after. And as we said, it's going to express the growth of the program relative to the input size.

OK. Let's see how we go from counting operations to getting to orders of growth. Then we're going to define some example orders of growth, and we're going to start looking at algorithms.

Here's a piece of code you've seen before. Again, hopefully you recognize, or can see fairly quickly, what it's doing: computing factorials the iterative way. Remember, n factorial is n times n minus 1 times n minus 2, all the way down to 1-- assuming n is a non-negative integer. Here, we set up an internal variable called answer. And then we just run over a loop: as long as n is bigger than 1, we multiply answer by n, store it back into answer, and decrease n by 1. We keep doing that until we get out of the loop, and then we return answer.

We'll start by counting steps. And that's, by the way, just to remind you that in fact there are two steps in each of those lines. So what do I have? I've got one step up there: set answer to 1. I'm going to test n. And then I'm going to do two steps here, because I multiply answer by n and then set answer to it. Similarly, there are two steps there, because I subtract 1 from n and then set n to it. So each time around the loop, I've got those four steps plus the test-- five. I've got one step outside at the start and one outside at the end. And I'm going to go through this loop n times. So I would suggest that if I count the number of steps, it's 1 plus 5n plus 1-- sort of what we did before. 5n plus 2 is the total number of steps that I use here.
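[Here is the iterative factorial with those step counts as comments-- a sketch of the code on the slide:]

```python
def fact_iter(n):
    """Iterative n! -- roughly 5n + 2 primitive operations."""
    answer = 1            # 1 op, done once
    while n > 1:          # 1 op (test), paid on every pass
        answer *= n       # 2 ops (multiply, then assign)
        n -= 1            # 2 ops (subtract, then assign)
    return answer         # 1 op, done once
```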
But now, I'm interested in the worst case behavior. Well, in this case, it is also the worst case behavior, because there are no decisions anywhere in here. But I just want to know: what's the asymptotic complexity? And that is to say, I could do this different ways. I could have done this with two steps like that, and that would have made it not 1 plus 5n plus 1, but 1 plus 6n plus 1, because I've got an extra step. I put that up because I want to remind you: I don't care about implementation differences. And so I want to know what captures both of those behaviors. In big O notation, I say that's order n. It grows linearly.

So I'm going to keep doing this to you until you really do wince at me. If I were to double the size of n, whether I use this version or that version, the number of steps is basically going to double. Now you say, wait a minute: 5n plus 2-- if n is 10, that's 52, and if n is 20, that's 102. That's not quite doubling. And you're right. But remember, we really care about this in the asymptotic case. When n gets really big, those extra little pieces don't matter. And so what we're going to do is ignore the additive constants, and ignore the multiplicative constants, when we talk about orders of growth.

So what does O(n) measure? Well, we're just summarizing here. We want to describe how the amount of time needed to compute a problem grows as the size of the problem itself grows. So we want an expression that captures that asymptotic behavior. And we're going to focus, as a consequence, on the term that grows most rapidly.

So here are some examples. And I know, if you're following along, you can already see the answers. But we're going to do this to simply give you a sense of it.
If I'm counting operations and I come up with an expression that has n squared plus 2n plus 2 operations, that expression, I say, is order n squared. The 2 and the 2n don't matter. Think about what happens if you make n really big: n squared is much more dominant than the other terms. We say that's order n squared.

Even this expression, we say, is order n squared. In this case, for lower values of n, the other term is going to be the big one in terms of number of steps. I have no idea how I wrote such an inefficient algorithm that it took 100,000 steps to do something. But if I had that expression, then for smaller values of n that constant matters a lot-- it's a really big number. But when I'm interested in the growth, then n squared is the term that dominates. And you begin to see the idea here: when I have a polynomial expression, it's the highest order term that captures the complexity. Both of these are quadratic.

This term is order n, because n grows faster than log of n. This funky looking term-- even though the constant looks like the big number there, and it is a big number-- that expression, we say, is order n log n. Because again, if I plot how this changes as I make n really large, the n log n term eventually takes over as the dominant term.

What about that one? What's the big term there? How many people think it's n to the 30th? Show of hands. How many people think it's 3 to the n? Show of hands. Thank you. You're following along. You're also paying attention. How many people think I should stop asking questions? No show of hands. All right. But you're right: exponentials are much worse than powers. Even something like this-- again, it's going to take a big value of n before it gets there, but it does get there.
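[You can check that last claim numerically with a quick sketch; the crossover point printed here is just whatever the loop finds, not a number from the lecture:]

```python
# Find where 3**n overtakes n**30 -- exponentials beat powers eventually.
n = 2
while 3 ** n < n ** 30:
    n += 1
print('3**n exceeds n**30 starting at n =', n)
```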
794 00:32:20,300 --> 00:32:21,850 And that, by the way, is important, 795 00:32:21,850 --> 00:32:23,470 because we're going to see later on in the term 796 00:32:23,470 --> 00:32:24,886 that there are some problems where 797 00:32:24,886 --> 00:32:29,410 it's believed that all of the solutions are exponential. 798 00:32:29,410 --> 00:32:30,940 And that's a pain, because it says 799 00:32:30,940 --> 00:32:32,815 it's always going to be expensive to compute. 800 00:32:32,815 --> 00:32:36,700 So that's how we're going to reason about these things. 801 00:32:36,700 --> 00:32:39,580 And to see it visually, here are the differences 802 00:32:39,580 --> 00:32:42,014 between those different classes. 803 00:32:42,014 --> 00:32:43,930 Something that's constant-- the amount of time 804 00:32:43,930 --> 00:32:47,210 doesn't change as I change the size of the input. 805 00:32:47,210 --> 00:32:49,460 Something that's linear grows as a straight line, 806 00:32:49,460 --> 00:32:50,990 as you would expect. 807 00:32:50,990 --> 00:32:51,620 Nice behavior. 808 00:32:51,620 --> 00:32:55,140 Quadratic starts to grow more quickly. 809 00:32:55,140 --> 00:32:56,930 The log is always better than linear 810 00:32:56,930 --> 00:33:01,300 because it slows down as we increase the size. 811 00:33:01,300 --> 00:33:03,860 n log n or log linear is a funky term, 812 00:33:03,860 --> 00:33:06,770 but we're going to see it's a very common complexity 813 00:33:06,770 --> 00:33:09,830 for really valuable algorithms in computer science. 814 00:33:09,830 --> 00:33:12,840 And it has a nice behavior, sort of between the linear 815 00:33:12,840 --> 00:33:14,230 and the quadratic. 816 00:33:14,230 --> 00:33:16,566 And exponential blows up. 817 00:33:16,566 --> 00:33:18,420 Just to remind you of that-- well, 818 00:33:18,420 --> 00:33:20,045 sorry-- let me show you how we're going 819 00:33:20,045 --> 00:33:22,480 to do the reasoning about this. 820 00:33:22,480 --> 00:33:25,114 So here's how we're going to reason about it. 821 00:33:25,114 --> 00:33:26,530 We've already seen some code where 822 00:33:26,530 --> 00:33:30,250 I started working through this process of counting operations. 823 00:33:30,250 --> 00:33:32,020 Here are the tools I want you to use. 824 00:33:32,020 --> 00:33:34,090 Given a piece of code, you're going 825 00:33:34,090 --> 00:33:37,660 to reason about each chunk of code separately. 826 00:33:37,660 --> 00:33:41,966 If you've got sequential pieces of code, 827 00:33:41,966 --> 00:33:43,840 then the rule-- it's called the law of addition 828 00:33:43,840 --> 00:33:47,530 for orders of growth-- is that the order of growth 829 00:33:47,530 --> 00:33:50,080 of the combination is the combination 830 00:33:50,080 --> 00:33:52,380 of the orders of growth. 831 00:33:52,380 --> 00:33:54,340 Say that quickly 10 times. 832 00:33:54,340 --> 00:33:56,130 But let's look at an example of that. 833 00:33:56,130 --> 00:33:58,200 Here are two loops. 834 00:33:58,200 --> 00:34:00,094 You've already seen examples of how 835 00:34:00,094 --> 00:34:01,260 to reason about those loops. 836 00:34:01,260 --> 00:34:05,524 For this one, it's linear in the size of n. 837 00:34:05,524 --> 00:34:08,190 I'm going to go through the loop n times doing a constant amount 838 00:34:08,190 --> 00:34:09,489 of things each time around. 839 00:34:09,489 --> 00:34:12,389 By what I just showed, that's order n.
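Here's a minimal sketch of two sequential loops like the ones on the slide, assuming simple accumulator bodies (the details of the bodies are mine):

    def sequential(n):
        total = 0
        # first chunk: runs n times, constant work per pass -- order n
        for i in range(n):
            total += 1
        # second chunk: runs n*n times, constant work per pass -- order n**2
        for i in range(n * n):
            total += 1
        # law of addition: order(n) + order(n**2) = order(n**2)
        return total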
840 00:34:12,389 --> 00:34:15,270 This one-- again, I'm doing just a constant number of things 841 00:34:15,270 --> 00:34:20,380 inside the loop-- but notice that it loops n squared times. 842 00:34:20,380 --> 00:34:23,360 So that's order n squared. 843 00:34:23,360 --> 00:34:24,870 n times n. 844 00:34:24,870 --> 00:34:27,719 The combination is I have to do this work and then that work. 845 00:34:27,719 --> 00:34:31,850 So I write that as saying that is order of n plus order of n 846 00:34:31,850 --> 00:34:33,260 squared. 847 00:34:33,260 --> 00:34:35,510 But by this up here, that is the same 848 00:34:35,510 --> 00:34:39,179 as saying what's the order of growth of n plus n squared. 849 00:34:39,179 --> 00:34:39,679 Oh yeah. 850 00:34:39,679 --> 00:34:41,060 We just saw that. 851 00:34:41,060 --> 00:34:43,150 Says it's n squared. 852 00:34:43,150 --> 00:34:44,794 So addition-- or the law of addition-- 853 00:34:44,794 --> 00:34:46,210 lets us reason about the fact 854 00:34:46,210 --> 00:34:50,139 that this will be an order n squared algorithm. 855 00:34:50,139 --> 00:34:52,310 Second one I'm going to use is called 856 00:34:52,310 --> 00:34:54,510 the law of multiplication. 857 00:34:54,510 --> 00:35:01,260 And this says when I have nested statements or nested loops, 858 00:35:01,260 --> 00:35:03,140 I need to reason about those. 859 00:35:03,140 --> 00:35:06,410 And in that case, what I want to argue-- or not argue-- state 860 00:35:06,410 --> 00:35:09,650 is that the order of growth here is a multiplication. 861 00:35:09,650 --> 00:35:11,266 That is, when I have nested things, 862 00:35:11,266 --> 00:35:12,890 I figure out what's the order of growth 863 00:35:12,890 --> 00:35:16,080 of the inner part, what's the order of growth of the outer part, 864 00:35:16,080 --> 00:35:18,890 and I'm going to multiply 865 00:35:18,890 --> 00:35:22,050 together those orders of growth 866 00:35:22,050 --> 00:35:24,150 to get the overall order of growth. 867 00:35:24,150 --> 00:35:26,120 If you think about it, it makes sense. 868 00:35:26,120 --> 00:35:27,710 Look at my little example here. 869 00:35:27,710 --> 00:35:28,960 It's a trivial little example. 870 00:35:28,960 --> 00:35:33,110 But I'm looping for i from 0 up to n. 871 00:35:33,110 --> 00:35:36,470 For every value of i, I'm looping for j from 0 up to n. 872 00:35:36,470 --> 00:35:39,250 And then I'm printing out A. I'm the Fonz. 873 00:35:39,250 --> 00:35:41,870 I'm saying heyyy a lot. 874 00:35:41,870 --> 00:35:42,370 Oh, come on. 875 00:35:42,370 --> 00:35:44,578 At least throw something, I mean, when it's that bad. 876 00:35:44,578 --> 00:35:45,179 Right? 877 00:35:45,179 --> 00:35:46,720 Want to make sure you're still awake. 878 00:35:46,720 --> 00:35:46,930 OK. 879 00:35:46,930 --> 00:35:47,650 You get the idea. 880 00:35:47,650 --> 00:35:52,360 But what I want to show you here is notice the order of growth. 881 00:35:52,360 --> 00:35:54,400 That's order n. 882 00:35:54,400 --> 00:35:55,240 Right? 883 00:35:55,240 --> 00:35:57,340 I'm doing that n times. 884 00:35:57,340 --> 00:35:59,650 But I'm doing that for each value of i. 885 00:35:59,650 --> 00:36:03,100 The outer piece here loops also n times. 886 00:36:03,100 --> 00:36:05,430 For each value of i, I'm doing order n things. 887 00:36:05,430 --> 00:36:10,770 So I'm doing order of n times order of n steps. 888 00:36:10,770 --> 00:36:13,930 And by that law, that is the same as order of n times 889 00:36:13,930 --> 00:36:16,100 n, or n squared.
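And a minimal sketch of that nested loop, as just described:

    def nested(n):
        # outer loop runs n times; for each i, the inner loop runs n times
        for i in range(n):
            for j in range(n):
                print('A')   # constant work in the innermost body
        # law of multiplication: order(n) * order(n) = order(n**2)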
890 00:36:16,100 --> 00:36:19,849 So this is a quadratic expression. 891 00:36:19,849 --> 00:36:21,140 You're going to see that a lot. 892 00:36:21,140 --> 00:36:25,590 Nested loops typically have that kind of behavior. 893 00:36:25,590 --> 00:36:29,117 Not always, but typically have that kind of behavior. 894 00:36:29,117 --> 00:36:30,700 So what you're going to see is there's 895 00:36:30,700 --> 00:36:32,110 a set of complexity classes. 896 00:36:32,110 --> 00:36:34,960 And we're about to start filling these in. 897 00:36:34,960 --> 00:36:37,810 Order one is constant. 898 00:36:37,810 --> 00:36:39,790 Says the amount of time it takes doesn't depend 899 00:36:39,790 --> 00:36:42,170 on the size of the problem. 900 00:36:42,170 --> 00:36:44,080 It's really rare that you get these. 901 00:36:44,080 --> 00:36:47,330 They tend to be trivial pieces of code, but they're valuable. 902 00:36:47,330 --> 00:36:50,307 Log n reflects logarithmic runtime. 903 00:36:50,307 --> 00:36:51,890 You can sort of read the rest of them. 904 00:36:51,890 --> 00:36:54,390 These are the kinds of things that we're going to deal with. 905 00:36:54,390 --> 00:36:58,494 We are going to see examples here, here, and here. 906 00:36:58,494 --> 00:37:00,410 And later on, we're going to come back and see 907 00:37:00,410 --> 00:37:03,660 these, which are really nice examples to have. 908 00:37:03,660 --> 00:37:05,732 Just to remind you why these orders of growth 909 00:37:05,732 --> 00:37:07,440 matter-- sorry, that's just reminding you 910 00:37:07,440 --> 00:37:08,273 what they look like. 911 00:37:08,273 --> 00:37:11,040 We've already done that. 912 00:37:11,040 --> 00:37:13,780 Here is the difference between constant, log, 913 00:37:13,780 --> 00:37:17,940 linear, log linear, squared, and exponential. 914 00:37:17,940 --> 00:37:23,520 When n is equal to 10, 100, 1,000, or a million. 915 00:37:23,520 --> 00:37:26,670 I know you know this, but I want to drive home the difference. 916 00:37:26,670 --> 00:37:29,190 Something that's constant is wonderful, no matter 917 00:37:29,190 --> 00:37:30,150 how big the problem is. 918 00:37:30,150 --> 00:37:31,920 Takes the same amount of time. 919 00:37:31,920 --> 00:37:34,710 Something that is log is pretty nice. 920 00:37:34,710 --> 00:37:36,840 Increase the size of the problem by a factor of 10, 921 00:37:36,840 --> 00:37:39,870 and it increases by a factor of 2. 922 00:37:39,870 --> 00:37:41,310 Another factor of 10, and it only increases 923 00:37:41,310 --> 00:37:43,290 by another 50%. 924 00:37:43,290 --> 00:37:44,910 It only increases a little bit. 925 00:37:44,910 --> 00:37:47,150 That's a gorgeous kind of problem to have. 926 00:37:47,150 --> 00:37:48,570 Linear-- not so bad. 927 00:37:48,570 --> 00:37:51,570 I go from 10 to 100 to 1,000 to a million. 928 00:37:51,570 --> 00:37:54,380 You can see log linear is not bad either. 929 00:37:54,380 --> 00:37:55,250 Right? 930 00:37:55,250 --> 00:37:58,600 A factor of 10 increase here is only a factor of 20 increase 931 00:37:58,600 --> 00:37:59,100 there. 932 00:37:59,100 --> 00:38:03,550 A factor of 10 increase there is only a factor of 30 increase 933 00:38:03,550 --> 00:38:04,050 there. 934 00:38:04,050 --> 00:38:07,450 So log linear doesn't grow that badly. 935 00:38:07,450 --> 00:38:09,890 But look at the difference between n squared and 2 936 00:38:09,890 --> 00:38:11,296 to the n. 937 00:38:11,296 --> 00:38:13,460 I actually did think of printing this out. 938 00:38:13,460 --> 00:38:15,320 By the way, Python will compute this.
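If you want to regenerate a table like that one yourself, here's a short sketch; the column choices are mine, and it reports the number of decimal digits in 2 to the n rather than printing the full value, since that would run to pages:

    import math

    for n in (10, 100, 1000, 10**6):
        digits_2n = int(n * math.log10(2)) + 1   # digits in 2**n, without computing it
        print(n, round(math.log2(n)), round(n * math.log2(n)), n**2, digits_2n)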
939 00:38:15,320 --> 00:38:17,360 But it would have taken pages and pages and pages. 940 00:38:17,360 --> 00:38:18,530 I didn't want to do it. 941 00:38:18,530 --> 00:38:20,420 You get the point. 942 00:38:20,420 --> 00:38:22,730 Exponential-- always much worse. 943 00:38:22,730 --> 00:38:26,780 Always much worse than a quadratic or a power 944 00:38:26,780 --> 00:38:27,620 expression. 945 00:38:27,620 --> 00:38:31,100 And you really see the difference here. 946 00:38:31,100 --> 00:38:32,600 All right. 947 00:38:32,600 --> 00:38:35,570 The reason I put this up is as you design algorithms, 948 00:38:35,570 --> 00:38:41,369 your goal is to be as high up in this listing as you can. 949 00:38:41,369 --> 00:38:43,160 The closer you are to the top of this list, 950 00:38:43,160 --> 00:38:44,720 the better off you are. 951 00:38:44,720 --> 00:38:47,900 If you have a solution that's down here, 952 00:38:47,900 --> 00:38:49,550 bring a sleeping bag and some coffee. 953 00:38:49,550 --> 00:38:50,690 You're going to be there for a while. 954 00:38:50,690 --> 00:38:51,290 Right? 955 00:38:51,290 --> 00:38:54,230 You really want to try and avoid that if you can. 956 00:38:54,230 --> 00:38:57,780 So now what we want to do, both for the rest of today 957 00:38:57,780 --> 00:39:01,900 in the last 15 minutes and then next week, 958 00:39:01,900 --> 00:39:06,200 is start identifying common algorithms 959 00:39:06,200 --> 00:39:07,991 and what their complexity is. 960 00:39:07,991 --> 00:39:09,740 As I said to you way back at the beginning 961 00:39:09,740 --> 00:39:11,180 of this lecture, which I'm sure you remember, 962 00:39:11,180 --> 00:39:13,760 it's not just to be able to identify the complexity. 963 00:39:13,760 --> 00:39:17,330 I want you to see how choices in algorithm 964 00:39:17,330 --> 00:39:20,120 design are going to lead to particular kinds 965 00:39:20,120 --> 00:39:22,970 of consequences in terms of what this is going to cost you. 966 00:39:22,970 --> 00:39:25,000 That's your goal here. 967 00:39:25,000 --> 00:39:26,110 All right. 968 00:39:26,110 --> 00:39:27,500 We've already seen some examples. 969 00:39:27,500 --> 00:39:28,750 I'm going to do one more here. 970 00:39:28,750 --> 00:39:30,670 But simple iterative loop algorithms 971 00:39:30,670 --> 00:39:33,760 are typically linear. 972 00:39:33,760 --> 00:39:35,830 Here's another version of searching. 973 00:39:35,830 --> 00:39:38,050 Imagine I have an unsorted list. 974 00:39:38,050 --> 00:39:39,340 Arbitrary order. 975 00:39:39,340 --> 00:39:41,380 Here's another way of doing the linear search. 976 00:39:41,380 --> 00:39:43,300 Looks a little bit faster. 977 00:39:43,300 --> 00:39:46,810 I'm going to set a flag initially to false. 978 00:39:46,810 --> 00:39:49,630 And then I'm going to loop for i from 0 979 00:39:49,630 --> 00:39:52,330 up to the length of L. I'm going to use that 980 00:39:52,330 --> 00:39:55,300 to index into the list, pull out each element of the list 981 00:39:55,300 --> 00:39:58,510 in turn, and check to see is it the thing I'm looking for. 982 00:39:58,510 --> 00:40:01,780 As soon as I find it, I'm going to 983 00:40:01,780 --> 00:40:04,650 set the flag to true. 984 00:40:04,650 --> 00:40:05,440 OK? 985 00:40:05,440 --> 00:40:08,350 So that when I return out of the loop, I can just return found. 986 00:40:08,350 --> 00:40:11,200 If I found it, found will be true; if I never found it, 987 00:40:11,200 --> 00:40:14,610 found will still be false, and I'll return it.
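A minimal sketch of that flag-based search, in Python as used in the course; the names are mine:

    def linear_search(L, e):
        found = False
        for i in range(len(L)):   # loop runs len(L) times
            if L[i] == e:
                found = True      # remember that we saw it, but keep looping
        return found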
988 00:40:14,610 --> 00:40:16,320 We could count the operations here, 989 00:40:16,320 --> 00:40:19,290 but you've already seen examples of doing that. 990 00:40:19,290 --> 00:40:23,270 This is linear, because I'm looping 991 00:40:23,270 --> 00:40:26,220 n times if n is the length of the list over there. 992 00:40:26,220 --> 00:40:30,080 And the number of things I do inside the loop is constant. 993 00:40:30,080 --> 00:40:32,360 Now, you might say, wait a minute. 994 00:40:32,360 --> 00:40:34,110 This is really brain damaged, or if you're 995 00:40:34,110 --> 00:40:37,391 being more politically correct, computationally challenged. 996 00:40:37,391 --> 00:40:37,890 OK? 997 00:40:37,890 --> 00:40:40,720 In the sense of once I've found it, 998 00:40:40,720 --> 00:40:43,110 why bother looking at the rest of the list? 999 00:40:43,110 --> 00:40:48,160 So in fact, I could just return true right here. 1000 00:40:48,160 --> 00:40:52,202 Does that change the order of growth of this algorithm? 1001 00:40:52,202 --> 00:40:54,170 No. 1002 00:40:54,170 --> 00:40:56,180 Changes the average time. 1003 00:40:56,180 --> 00:40:58,040 I'm going to stop faster. 1004 00:40:58,040 --> 00:41:00,260 But remember the order of growth captures 1005 00:41:00,260 --> 00:41:01,820 what's the worst case behavior. 1006 00:41:01,820 --> 00:41:04,017 And the worst case behavior is the element's 1007 00:41:04,017 --> 00:41:05,850 not in the list-- I've got to look at everything. 1008 00:41:05,850 --> 00:41:09,980 So this will be an example of a linear algorithm. 1009 00:41:09,980 --> 00:41:12,740 And you can see, I'm looping length of L times 1010 00:41:12,740 --> 00:41:14,490 over the loop inside of there. 1011 00:41:14,490 --> 00:41:17,060 It takes order one to do the test. 1012 00:41:17,060 --> 00:41:19,835 So it's order n. 1013 00:41:19,835 --> 00:41:22,370 And if I were to actually count it, there's the expression. 1014 00:41:22,370 --> 00:41:25,910 It's 1 plus 4n plus 1, which is 4n plus 2, which by my rule 1015 00:41:25,910 --> 00:41:28,280 says I don't care about the additive constant. 1016 00:41:28,280 --> 00:41:30,060 I only care about the dominant term. 1017 00:41:30,060 --> 00:41:32,650 And I don't care about that multiplicative constant. 1018 00:41:32,650 --> 00:41:35,250 It's order n. 1019 00:41:35,250 --> 00:41:38,790 An example of a template you're going to see a lot. 1020 00:41:38,790 --> 00:41:43,306 Now, order n where n is the length of the list-- 1021 00:41:43,306 --> 00:41:44,430 and I need to specify that. 1022 00:41:44,430 --> 00:41:46,860 That's the thing I'm after. 1023 00:41:46,860 --> 00:41:48,920 If you think about it, I cheated. 1024 00:41:48,920 --> 00:41:50,020 Sorry-- I never cheat. 1025 00:41:50,020 --> 00:41:50,530 I'm tenured. 1026 00:41:50,530 --> 00:41:51,340 I never cheat. 1027 00:41:51,340 --> 00:41:52,785 I just mislead you badly. 1028 00:41:55,300 --> 00:41:57,710 Not a chance. 1029 00:41:57,710 --> 00:42:02,250 How do I know that accessing an element of the list 1030 00:42:02,250 --> 00:42:03,930 takes constant time? 1031 00:42:03,930 --> 00:42:06,530 I made an assumption about that. 1032 00:42:06,530 --> 00:42:09,450 And this is a reasonable thing to ask about-- both 1033 00:42:09,450 --> 00:42:11,950 what am I assuming about the constant operations 1034 00:42:11,950 --> 00:42:14,249 and how do I know that's actually true? 1035 00:42:14,249 --> 00:42:16,290 Well, it gives me a chance to point out something 1036 00:42:16,290 --> 00:42:17,910 that Python does very effectively.
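The early-return variant just mentioned looks like this; same order of growth, better average case:

    def linear_search(L, e):
        for i in range(len(L)):
            if L[i] == e:
                return True   # stop as soon as we find it
        return False          # worst case still looks at every element: order n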
1037 00:42:17,910 --> 00:42:19,872 Not all languages do. 1038 00:42:19,872 --> 00:42:20,830 But think about a list. 1039 00:42:20,830 --> 00:42:22,710 Suppose I've got a list that's all integers. 1040 00:42:22,710 --> 00:42:24,390 I'm going to need some amount of memory 1041 00:42:24,390 --> 00:42:26,410 to represent each integer. 1042 00:42:26,410 --> 00:42:30,600 So if a byte is 8 bits, I might reserve 4 bytes, or 32 bits, 1043 00:42:30,600 --> 00:42:33,660 to cover any reasonable-sized integer. 1044 00:42:33,660 --> 00:42:36,450 When I represent a list, I could simply have each of them 1045 00:42:36,450 --> 00:42:37,000 in turn. 1046 00:42:37,000 --> 00:42:38,850 So what do I need to know? 1047 00:42:38,850 --> 00:42:40,980 I'm going to allocate out a particular length-- 1048 00:42:40,980 --> 00:42:44,940 say 4 bytes, 32 bits, 32 sequential bits of memory-- 1049 00:42:44,940 --> 00:42:46,860 to represent each integer. 1050 00:42:46,860 --> 00:42:49,794 And then I just need to know where's 1051 00:42:49,794 --> 00:42:51,210 the first part of the list-- what's 1052 00:42:51,210 --> 00:42:53,418 the address in memory of the first part of the list. 1053 00:42:53,418 --> 00:42:55,440 And to get to the i-th element, I take 1054 00:42:55,440 --> 00:42:59,930 that base plus 4 bytes times i. 1055 00:42:59,930 --> 00:43:02,190 And I can go straight to this point 1056 00:43:02,190 --> 00:43:04,680 without having to walk down the list. 1057 00:43:04,680 --> 00:43:05,750 That's nice. 1058 00:43:05,750 --> 00:43:06,570 OK? 1059 00:43:06,570 --> 00:43:09,660 It says, in fact, I can get to any element of memory-- 1060 00:43:09,660 --> 00:43:14,921 I'm sorry-- any element of the list in constant time. 1061 00:43:14,921 --> 00:43:15,420 OK. 1062 00:43:15,420 --> 00:43:17,740 Now, what if the things I'm representing aren't integers? 1063 00:43:17,740 --> 00:43:19,448 They're arbitrary things and they take up 1064 00:43:19,448 --> 00:43:21,820 a big chunk of space. 1065 00:43:21,820 --> 00:43:23,710 Well, if the list is heterogeneous, 1066 00:43:23,710 --> 00:43:26,520 we use a nice technique called indirection. 1067 00:43:26,520 --> 00:43:28,930 And that simply says we, again, have a list. 1068 00:43:28,930 --> 00:43:30,610 We know the address of this point. 1069 00:43:30,610 --> 00:43:33,730 We know the address there for the i-th element of this list. 1070 00:43:33,730 --> 00:43:38,005 But inside of here, we don't store the actual value. 1071 00:43:38,005 --> 00:43:40,907 We store a pointer to where it is in memory. 1072 00:43:40,907 --> 00:43:42,490 That's just what these things are indicating. 1073 00:43:42,490 --> 00:43:44,740 So they can be arbitrary size. 1074 00:43:44,740 --> 00:43:47,500 But again, I can get to any element in constant time, which 1075 00:43:47,500 --> 00:43:49,730 is exactly what I want. 1076 00:43:49,730 --> 00:43:52,550 So that's great. 1077 00:43:52,550 --> 00:43:53,350 OK. 1078 00:43:53,350 --> 00:43:55,730 Now, suppose I tell you that the list is sorted. 1079 00:43:55,730 --> 00:43:57,750 It's in increasing order. 1080 00:43:57,750 --> 00:44:00,190 I can be more clever about my algorithm. 1081 00:44:00,190 --> 00:44:01,720 Because now, as I loop through it, 1082 00:44:01,720 --> 00:44:05,680 I can say if it's the thing I'm looking for, just return true. 1083 00:44:05,680 --> 00:44:09,010 If the element of the list is bigger than the thing 1084 00:44:09,010 --> 00:44:10,795 I'm looking for, I'm done.
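The constant-time access claim is really just address arithmetic; here's a conceptual sketch of the idea, with made-up numbers (the interpreter does this bookkeeping for you under the hood):

    # conceptual address arithmetic for a homogeneous list of 4-byte integers
    base = 4000   # hypothetical address where the list's storage starts
    width = 4     # bytes per entry
    i = 7
    address_of_i = base + width * i   # one multiply and one add: constant time,
                                      # no matter how long the list is
    print(address_of_i)               # 4028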
1085 00:44:10,795 --> 00:44:12,670 I don't need to look at the rest of the list, 1086 00:44:12,670 --> 00:44:15,340 because I know it can't be there because it's ordered or sorted. 1087 00:44:15,340 --> 00:44:16,770 I can just return false. 1088 00:44:16,770 --> 00:44:19,735 If I get all the way through the loop, I can return false. 1089 00:44:19,735 --> 00:44:21,610 So I only have to look until I get to a point 1090 00:44:21,610 --> 00:44:23,151 where the thing in the list is bigger 1091 00:44:23,151 --> 00:44:24,572 than what I'm looking for. 1092 00:44:24,572 --> 00:44:27,180 What's the order of growth here? 1093 00:44:27,180 --> 00:44:31,670 Again, the average time behavior will be faster. 1094 00:44:31,670 --> 00:44:33,320 But the order of growth is I've got 1095 00:44:33,320 --> 00:44:34,970 to do order of length of the list 1096 00:44:34,970 --> 00:44:38,720 to go through the loop, order of one to do the test, 1097 00:44:38,720 --> 00:44:40,430 and in the worst case, again, I still 1098 00:44:40,430 --> 00:44:42,754 have to go through the entire list. 1099 00:44:42,754 --> 00:44:44,420 So the order of growth here is the same. 1100 00:44:44,420 --> 00:44:47,780 It is, again, linear in the length of the list, 1101 00:44:47,780 --> 00:44:49,950 even though the runtime will be different depending 1102 00:44:49,950 --> 00:44:52,742 whether it's sorted or not. 1103 00:44:52,742 --> 00:44:54,200 I want you to hold on to that idea, 1104 00:44:54,200 --> 00:44:56,360 because we're going to come back to the sorted list 1105 00:44:56,360 --> 00:44:57,860 next week to see that there actually 1106 00:44:57,860 --> 00:45:00,850 are much more efficient ways to use the fact that a list is 1107 00:45:00,850 --> 00:45:02,960 sorted to do the search. 1108 00:45:02,960 --> 00:45:07,710 But both of these versions have the same order of growth, order n. 1109 00:45:07,710 --> 00:45:08,820 OK. 1110 00:45:08,820 --> 00:45:11,130 So lurching through a list-- right, sorry-- 1111 00:45:11,130 --> 00:45:13,530 searching through a list in sequence 1112 00:45:13,530 --> 00:45:16,110 is linear because of that loop. 1113 00:45:16,110 --> 00:45:18,210 There are other things that have a similar flavor. 1114 00:45:18,210 --> 00:45:19,585 And I'm going to do these quickly 1115 00:45:19,585 --> 00:45:21,370 to get to the last example. 1116 00:45:21,370 --> 00:45:25,260 Imagine I give you a string of characters that are all assumed 1117 00:45:25,260 --> 00:45:26,760 to be composed of decimal digits. 1118 00:45:26,760 --> 00:45:28,710 I just want to add them all up. 1119 00:45:28,710 --> 00:45:31,464 This is also linear, because there's the loop. 1120 00:45:31,464 --> 00:45:33,630 I'm going to loop over the characters in the string. 1121 00:45:33,630 --> 00:45:35,730 I'm going to cast them into integers, 1122 00:45:35,730 --> 00:45:37,620 add them in, and return the value. 1123 00:45:37,620 --> 00:45:43,730 This is linear in the length of the input s. 1124 00:45:43,730 --> 00:45:44,990 Notice the pattern. 1125 00:45:44,990 --> 00:45:46,527 That loop-- that iterative loop-- 1126 00:45:46,527 --> 00:45:48,110 has got that linear behavior, because 1127 00:45:48,110 --> 00:45:49,943 inside of the loop there's a constant number of things 1128 00:45:49,943 --> 00:45:52,380 that I'm executing. 1129 00:45:52,380 --> 00:45:53,760 We already looked at fact iter. 1130 00:45:53,760 --> 00:45:55,569 Same idea.
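Here are minimal sketches of those two linear examples, in the course's style; the function names are mine:

    def search_sorted(L, e):
        # L is assumed to be sorted in increasing order
        for elem in L:
            if elem == e:
                return True
            if elem > e:
                return False   # everything after this is bigger: stop early
        return False           # worst case still walks the whole list: order len(L)

    def add_digits(s):
        # s is assumed to contain only decimal digit characters
        total = 0
        for c in s:            # loop runs len(s) times
            total += int(c)    # constant work per character
        return total           # order len(s)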
1131 00:45:55,569 --> 00:45:58,110 There's the loop. I'm going to do that n times, with 1132 00:45:58,110 --> 00:46:00,380 a constant amount of things inside the loop. 1133 00:46:00,380 --> 00:46:02,780 So looping around it is order n. 1134 00:46:02,780 --> 00:46:04,070 There's the actual expression. 1135 00:46:04,070 --> 00:46:06,170 But again, the pattern I want you to see here 1136 00:46:06,170 --> 00:46:09,620 is that this is order n. 1137 00:46:09,620 --> 00:46:11,600 OK. 1138 00:46:11,600 --> 00:46:13,250 Last example for today. 1139 00:46:13,250 --> 00:46:17,470 I know you're all secretly looking at your watches. 1140 00:46:17,470 --> 00:46:20,340 Standard loops, typically linear. 1141 00:46:20,340 --> 00:46:21,820 What about nested loops? 1142 00:46:21,820 --> 00:46:24,310 What about loops that have loops inside of them? 1143 00:46:24,310 --> 00:46:26,270 How long do they take? 1144 00:46:26,270 --> 00:46:28,830 I want to show you a couple of examples. 1145 00:46:28,830 --> 00:46:32,820 And mostly, I want to show you how to reason about them. 1146 00:46:32,820 --> 00:46:36,080 Suppose I gave you two lists composed of integers, 1147 00:46:36,080 --> 00:46:38,010 and I want to know is the first list 1148 00:46:38,010 --> 00:46:41,131 a subset of the second list. 1149 00:46:41,131 --> 00:46:43,630 Code's in the handout, by the way, if you want to go run it. 1150 00:46:43,630 --> 00:46:45,370 But basically, the simple idea would 1151 00:46:45,370 --> 00:46:48,965 be I'm going to loop over every element in the first list. 1152 00:46:48,965 --> 00:46:50,340 And for each one of those, I want 1153 00:46:50,340 --> 00:46:52,362 to say is it in the second list? 1154 00:46:52,362 --> 00:46:53,820 So I'll use the same kind of trick. 1155 00:46:53,820 --> 00:46:56,310 I'll set up a flag that's initially false. 1156 00:46:56,310 --> 00:46:59,550 And then I'm going to loop over everything in the second list. 1157 00:46:59,550 --> 00:47:03,350 And if that thing is equal to the thing I'm looking for, 1158 00:47:03,350 --> 00:47:05,750 I'll set match to true and break out of the loop-- 1159 00:47:05,750 --> 00:47:07,287 the inner loop. 1160 00:47:07,287 --> 00:47:09,120 If I get all the way through the second list 1161 00:47:09,120 --> 00:47:10,495 and I haven't found the thing I'm 1162 00:47:10,495 --> 00:47:14,160 looking for, when I break out or come out of this loop, 1163 00:47:14,160 --> 00:47:18,420 matched, in that case, will still be false, and I'll return false. 1164 00:47:18,420 --> 00:47:20,550 But if up here, I found something that matched, 1165 00:47:20,550 --> 00:47:21,400 match would be true. 1166 00:47:21,400 --> 00:47:22,560 I break out of it. 1167 00:47:22,560 --> 00:47:23,760 It's not false. 1168 00:47:23,760 --> 00:47:27,664 Therefore, I return true. 1169 00:47:27,664 --> 00:47:28,830 I want you look at the code. 1170 00:47:28,830 --> 00:47:30,246 You should be able to look at this 1171 00:47:30,246 --> 00:47:32,002 and realize what it's doing. 1172 00:47:32,002 --> 00:47:33,460 For each element in the first list, 1173 00:47:33,460 --> 00:47:36,100 I walk through the second list to say is that element there. 1174 00:47:36,100 --> 00:47:37,650 And if it is, I return true. 1175 00:47:37,650 --> 00:47:40,150 If that's true for all of the elements in the first list, 1176 00:47:40,150 --> 00:47:42,930 I return true overall. 1177 00:47:42,930 --> 00:47:44,214 OK. 1178 00:47:44,214 --> 00:47:44,880 Order of growth.
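A sketch of that subset test as just walked through, assuming plain list arguments; the names are mine:

    def is_subset(L1, L2):
        for e1 in L1:          # outer loop: len(L1) times
            matched = False
            for e2 in L2:      # inner loop: up to len(L2) times
                if e1 == e2:
                    matched = True
                    break      # found it; stop scanning L2
            if not matched:
                return False   # e1 isn't anywhere in L2
        return True            # every element of L1 was found in L2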
1179 00:47:47,790 --> 00:47:51,290 Outer loop-- this loop I'm going to execute 1180 00:47:51,290 --> 00:47:53,190 the length of L1 times. 1181 00:47:53,190 --> 00:47:53,690 Right? 1182 00:47:53,690 --> 00:47:55,950 I've got to walk down that first list. 1183 00:47:55,950 --> 00:47:57,770 If I call that n, it's going to take n 1184 00:47:57,770 --> 00:48:00,760 times through the outer loop. 1185 00:48:00,760 --> 00:48:03,180 But what about n here? 1186 00:48:03,180 --> 00:48:06,300 All of the earlier examples, we had a constant number 1187 00:48:06,300 --> 00:48:09,150 of operations inside of the loop. 1188 00:48:09,150 --> 00:48:09,780 Here, we don't. 1189 00:48:09,780 --> 00:48:13,980 We've got another loop that's looping over in principle 1190 00:48:13,980 --> 00:48:17,220 all the elements of the second list. 1191 00:48:17,220 --> 00:48:20,310 So each iteration is going to execute the inner loop up 1192 00:48:20,310 --> 00:48:24,240 to length of L2 times, where inside of this inner loop 1193 00:48:24,240 --> 00:48:27,910 there is a constant number of operations. 1194 00:48:27,910 --> 00:48:30,070 Ah, nice. 1195 00:48:30,070 --> 00:48:32,800 That's the multiplicative law of orders of growth. 1196 00:48:32,800 --> 00:48:35,309 It says if this is order length L1, 1197 00:48:35,309 --> 00:48:36,850 and we're going to do that order 1198 00:48:36,850 --> 00:48:41,941 length of L2 times, then the order of growth is the product. 1199 00:48:41,941 --> 00:48:44,260 And the most common or the worst case behavior 1200 00:48:44,260 --> 00:48:47,040 is going to be when the lists are of the same length 1201 00:48:47,040 --> 00:48:49,950 and none of the elements of L1 are in L2. 1202 00:48:49,950 --> 00:48:52,680 And in that case, we're going to get something that's 1203 00:48:52,680 --> 00:48:56,160 order n squared-- quadratic-- where n 1204 00:48:56,160 --> 00:49:00,900 is the length of the lists, in terms of number of operations. 1205 00:49:00,900 --> 00:49:03,280 I don't really care about subsets. 1206 00:49:03,280 --> 00:49:05,230 I've got one more example. 1207 00:49:05,230 --> 00:49:07,730 We could similarly do intersection. 1208 00:49:07,730 --> 00:49:10,440 If I wanted to say what is the intersection of two lists? 1209 00:49:10,440 --> 00:49:13,380 What elements are in both list 1 and list 2? 1210 00:49:13,380 --> 00:49:14,820 Same basic idea. 1211 00:49:14,820 --> 00:49:17,230 Here, I've got a pair of nested loops. 1212 00:49:17,230 --> 00:49:19,140 I'm looping over everything in L1. 1213 00:49:19,140 --> 00:49:22,050 For that, I'm looping over everything in L2. 1214 00:49:22,050 --> 00:49:24,930 And if they are the same, I'm going to put that 1215 00:49:24,930 --> 00:49:27,620 into a temporary variable. 1216 00:49:27,620 --> 00:49:29,596 Once I've done that, I need to clean things up. 1217 00:49:29,596 --> 00:49:31,220 So I'm going to write another loop that 1218 00:49:31,220 --> 00:49:34,070 sets up an internal variable and then runs through everything 1219 00:49:34,070 --> 00:49:36,410 in the list I accumulated, making sure 1220 00:49:36,410 --> 00:49:38,074 that it's not already there. 1221 00:49:38,074 --> 00:49:40,490 And as long as it isn't, I'm going to put it in the result 1222 00:49:40,490 --> 00:49:42,580 and return it. 1223 00:49:42,580 --> 00:49:43,380 I did it quickly. 1224 00:49:43,380 --> 00:49:44,200 You can look through it. 1225 00:49:44,200 --> 00:49:45,658 You'll see it does the right thing.
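And a sketch of that intersection version, in the same style and again with names of my choosing:

    def intersect(L1, L2):
        # first pass: collect everything that appears in both lists
        tmp = []
        for e1 in L1:              # len(L1) times
            for e2 in L2:          # up to len(L2) times each: quadratic
                if e1 == e2:
                    tmp.append(e1)
        # second pass: remove duplicates from what we collected
        result = []
        for e in tmp:
            if e not in result:    # 'in' walks result: a hidden second loop
                result.append(e)
        return result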
1226 00:49:45,658 --> 00:49:48,140 What I want you to see is what's the order of growth. 1227 00:49:48,140 --> 00:49:49,830 I need to look at this piece. 1228 00:49:49,830 --> 00:49:51,840 Then I need to look at that piece. 1229 00:49:51,840 --> 00:49:58,240 This piece-- well, it's order length L1 to do the outer loop. 1230 00:49:58,240 --> 00:50:01,120 For each value of e1, I've got to do 1231 00:50:01,120 --> 00:50:05,080 order of length L2 things inside to accumulate them. 1232 00:50:05,080 --> 00:50:08,400 So that's quadratic. 1233 00:50:08,400 --> 00:50:10,410 What about the second loop? 1234 00:50:10,410 --> 00:50:12,600 Well, this one is a little more subtle. 1235 00:50:12,600 --> 00:50:17,430 I'm only looping over temp, which is at most going 1236 00:50:17,430 --> 00:50:20,433 to be length L1 long. 1237 00:50:20,433 --> 00:50:27,730 But I'm checking to see is that element in a list? 1238 00:50:27,730 --> 00:50:29,400 And it depends on the implementation. 1239 00:50:29,400 --> 00:50:31,032 But typically, that's going to take up 1240 00:50:31,032 --> 00:50:32,490 to the length of the list to do it. 1241 00:50:32,490 --> 00:50:34,590 I've got to look to see if it's there or not. 1242 00:50:34,590 --> 00:50:36,930 And so that inner loop, if we assume 1243 00:50:36,930 --> 00:50:38,850 the lists are the same size, is also 1244 00:50:38,850 --> 00:50:42,280 going to take potentially up to length L1 steps. 1245 00:50:42,280 --> 00:50:45,174 And so this is, again, quadratic. 1246 00:50:45,174 --> 00:50:46,590 It's actually two quadratics-- one 1247 00:50:46,590 --> 00:50:49,560 for the first nested loop, one for the second one, 1248 00:50:49,560 --> 00:50:52,680 because there's an implicit second loop right there. 1249 00:50:52,680 --> 00:50:56,320 But overall, it's quadratic. 1250 00:50:56,320 --> 00:50:58,630 So what you see in general-- this 1251 00:50:58,630 --> 00:51:01,430 is a really dumb way to compute n squared. 1252 00:51:01,430 --> 00:51:03,730 When you have nested loops, typically, it's 1253 00:51:03,730 --> 00:51:06,100 going to be quadratic behavior. 1254 00:51:06,100 --> 00:51:07,760 And so what we've done then is we've 1255 00:51:07,760 --> 00:51:10,040 started to build up examples. 1256 00:51:10,040 --> 00:51:12,800 We've now seen simple looping mechanisms, simple iterative 1257 00:51:12,800 --> 00:51:14,480 mechanisms, nested loops. 1258 00:51:14,480 --> 00:51:17,870 They tend to naturally give rise to linear and quadratic 1259 00:51:17,870 --> 00:51:18,869 complexity. 1260 00:51:18,869 --> 00:51:20,660 And next time, we're going to start looking 1261 00:51:20,660 --> 00:51:22,670 at more interesting classes. 1262 00:51:22,670 --> 00:51:25,153 And we'll see you next time.