1 00:00:00,050 --> 00:00:01,770 The following content is provided 2 00:00:01,770 --> 00:00:04,010 under a Creative Commons license. 3 00:00:04,010 --> 00:00:06,860 Your support will help MIT OpenCourseWare continue 4 00:00:06,860 --> 00:00:10,720 to offer high quality educational resources for free. 5 00:00:10,720 --> 00:00:13,330 To make a donation or view additional materials 6 00:00:13,330 --> 00:00:17,207 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,207 --> 00:00:17,832 at ocw.mit.edu. 8 00:00:22,730 --> 00:00:23,390 PROFESSOR: Hi. 9 00:00:23,390 --> 00:00:24,750 I'm Srini Devadas. 10 00:00:24,750 --> 00:00:27,040 I'm a professor of electrical engineering and computer 11 00:00:27,040 --> 00:00:27,650 science. 12 00:00:27,650 --> 00:00:30,970 I'm going to be co-lecturing 6.006-- Introduction 13 00:00:30,970 --> 00:00:34,950 to Algorithms-- this term with Professor Erik Demaine. 14 00:00:34,950 --> 00:00:36,001 Erik, say hi. 15 00:00:36,001 --> 00:00:36,883 ERIK DEMAINE: Hi. 16 00:00:36,883 --> 00:00:38,650 [LAUGHTER] 17 00:00:38,650 --> 00:00:40,210 PROFESSOR: And we hope you're going 18 00:00:40,210 --> 00:00:43,710 to have a fun time in 6.006 learning 19 00:00:43,710 --> 00:00:45,760 a variety of algorithms. 20 00:00:45,760 --> 00:00:50,760 What I want to do today is spend literally a minute or so 21 00:00:50,760 --> 00:00:55,004 on administrative details, maybe even less. 22 00:00:55,004 --> 00:00:56,420 What I'd like to do is to tell you 23 00:00:56,420 --> 00:01:00,670 to go to the website that's listed up there and read it. 24 00:01:00,670 --> 00:01:02,250 And you'll get all information you 25 00:01:02,250 --> 00:01:06,430 need about what this class is about from a standpoint 26 00:01:06,430 --> 00:01:11,590 of syllabus; what's expected of you; the problem set 27 00:01:11,590 --> 00:01:15,660 schedule; the quiz schedule; and so on and so forth. 
28 00:01:15,660 --> 00:01:19,460 I want to dive right in and tell you about interesting things, 29 00:01:19,460 --> 00:01:24,550 like algorithms and complexity of algorithms. 30 00:01:24,550 --> 00:01:26,490 I want to spend some time giving you 31 00:01:26,490 --> 00:01:29,380 an overview of the course content. 32 00:01:29,380 --> 00:01:31,640 And then we're going to dive right 33 00:01:31,640 --> 00:01:35,230 in and look at a particular problem of peak 34 00:01:35,230 --> 00:01:38,360 finding-- both the one dimensional version and a two 35 00:01:38,360 --> 00:01:41,900 dimensional version-- and talk about algorithms to solve 36 00:01:41,900 --> 00:01:46,670 this peak finding problem-- both varieties of it. 37 00:01:46,670 --> 00:01:50,000 And you'll find that there's really 38 00:01:50,000 --> 00:01:53,090 a difference between these various algorithms 39 00:01:53,090 --> 00:01:56,480 that we'll look at in terms of their complexity. 40 00:01:56,480 --> 00:01:59,070 And what I mean by that is you're 41 00:01:59,070 --> 00:02:02,750 going to have different run times of these algorithms 42 00:02:02,750 --> 00:02:06,210 depending on input size, based on how 43 00:02:06,210 --> 00:02:08,600 efficient these algorithms are. 44 00:02:08,600 --> 00:02:14,370 And a prerequisite for this class is 6.042. 45 00:02:14,370 --> 00:02:18,620 And in 6.042 you learned about asymptotic complexity. 46 00:02:18,620 --> 00:02:21,240 And you'll see that in this lecture 47 00:02:21,240 --> 00:02:25,430 we'll analyze relatively simple algorithms today 48 00:02:25,430 --> 00:02:28,070 in terms of their asymptotic complexity. 49 00:02:28,070 --> 00:02:30,340 And you'll be able to compare and say 50 00:02:30,340 --> 00:02:33,940 that this algorithm is faster than this other one-- assuming 51 00:02:33,940 --> 00:02:37,320 that you have large inputs-- because it's 52 00:02:37,320 --> 00:02:40,840 asymptotically less complex. 
53 00:02:40,840 --> 00:02:43,185 So let's dive right in and talk about the class. 54 00:02:52,420 --> 00:02:54,550 So the one sentence summary of this class 55 00:02:54,550 --> 00:02:58,910 is that this is about efficient procedures 56 00:02:58,910 --> 00:03:04,850 for solving problems on large inputs. 57 00:03:04,850 --> 00:03:06,800 And when I say large inputs, I mean things 58 00:03:06,800 --> 00:03:10,720 like the US highway system, a map 59 00:03:10,720 --> 00:03:14,110 of all of the highways in the United States; 60 00:03:14,110 --> 00:03:17,850 the human genome, which has a billion letters 61 00:03:17,850 --> 00:03:23,170 in its alphabet; a social network corresponding to Facebook, 62 00:03:23,170 --> 00:03:26,840 that I guess has 500 million nodes or so. 63 00:03:26,840 --> 00:03:28,280 So these are large inputs. 64 00:03:28,280 --> 00:03:31,470 Now our definition of large has really changed with the times. 65 00:03:31,470 --> 00:03:35,440 And so really the 21st century definition of large 66 00:03:35,440 --> 00:03:36,971 is, I guess, a trillion. 67 00:03:36,971 --> 00:03:37,470 Right? 68 00:03:37,470 --> 00:03:40,680 Back when I was your age large was like 1,000. 69 00:03:40,680 --> 00:03:42,400 [LAUGHTER] 70 00:03:42,400 --> 00:03:44,844 I guess I'm dating myself here. 71 00:03:44,844 --> 00:03:46,760 Back when Erik was your age, it was a million. 72 00:03:46,760 --> 00:03:47,260 Right? 73 00:03:47,260 --> 00:03:48,650 [LAUGHTER] 74 00:03:48,650 --> 00:03:55,000 But what's happening really the world is moving faster, 75 00:03:55,000 --> 00:03:56,420 things are getting bigger. 76 00:03:56,420 --> 00:04:00,880 We have the capability of computing on large inputs, 77 00:04:00,880 --> 00:04:03,220 but that doesn't mean that efficiency 78 00:04:03,220 --> 00:04:05,760 isn't of paramount concern. 79 00:04:05,760 --> 00:04:08,690 The fact of matter is that you can, maybe, 80 00:04:08,690 --> 00:04:13,550 scan a billion elements in a matter of seconds. 
81 00:04:13,550 --> 00:04:17,750 But if you had an algorithm that required cubic complexity, 82 00:04:17,750 --> 00:04:19,899 suddenly you're not talking about 10 raised to 9, 83 00:04:19,899 --> 00:04:22,079 you're talking about 10 raised to 27. 84 00:04:22,079 --> 00:04:24,510 And even current computers can't really 85 00:04:24,510 --> 00:04:30,890 handle those kinds of numbers, so efficiency is a concern. 86 00:04:30,890 --> 00:04:34,820 And as inputs get larger, it becomes more of a concern. 87 00:04:34,820 --> 00:04:35,320 All right? 88 00:04:35,320 --> 00:04:39,398 So we're concerned about-- 89 00:04:43,760 --> 00:04:51,310 --efficient procedures-- for solving large scale problems 90 00:04:51,310 --> 00:04:51,940 in this class. 91 00:04:58,140 --> 00:05:01,640 And we're concerned about scalability, 92 00:05:01,640 --> 00:05:07,030 because-- just as, you know, 1,000 93 00:05:07,030 --> 00:05:09,600 was a big number a couple of decades ago, 94 00:05:09,600 --> 00:05:12,140 and now it's kind of a small number-- it's 95 00:05:12,140 --> 00:05:16,430 quite possible that by the time you guys are professors 96 00:05:16,430 --> 00:05:18,220 teaching this class in some university 97 00:05:18,220 --> 00:05:20,690 that a trillion is going to be a small number. 98 00:05:20,690 --> 00:05:24,430 And we're going to be talking about-- I don't know-- 99 00:05:24,430 --> 00:05:27,520 10 raised to 18 as being something 100 00:05:27,520 --> 00:05:32,620 that we're concerned with from a standpoint of a common case 101 00:05:32,620 --> 00:05:34,510 input for an algorithm. 102 00:05:34,510 --> 00:05:38,120 So scalability is important. 103 00:05:38,120 --> 00:05:41,480 And we want to be able to track how our algorithms are going 104 00:05:41,480 --> 00:05:44,000 to do as inputs get larger and larger. 105 00:05:47,210 --> 00:05:52,180 You're going to learn a bunch of different data structures. 
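A rough back-of-the-envelope check of the 10 raised to 9 versus 10 raised to 27 comparison above. This is only a sketch: the rate of one billion simple steps per second is an assumption made for illustration, not a figure from the lecture.

```python
# Assumption for illustration only: about 10**9 simple steps per second.
STEPS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

n = 10**9  # a billion elements

linear_steps = n      # one pass over the input
cubic_steps = n**3    # a cubic-time algorithm on the same input

linear_seconds = linear_steps // STEPS_PER_SECOND
cubic_years = cubic_steps // STEPS_PER_SECOND // SECONDS_PER_YEAR

print("linear scan, seconds:", linear_seconds)
print("cubic algorithm, years:", cubic_years)
```

Under that assumed rate the linear scan takes about a second, while the cubic algorithm would need tens of billions of years, which is the sense in which the input size, not the hardware, dominates.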
106 00:05:52,180 --> 00:05:56,650 We'll call them classic data structures, 107 00:05:56,650 --> 00:06:01,450 like binary search trees, hash tables-- that 108 00:06:01,450 --> 00:06:06,020 are called dictionaries in Python-- and data 109 00:06:06,020 --> 00:06:09,470 structures-- such as balanced binary search trees-- that 110 00:06:09,470 --> 00:06:12,975 are more efficient than just the regular binary search trees. 111 00:06:12,975 --> 00:06:14,350 And these are all data structures 112 00:06:14,350 --> 00:06:18,540 that were invented many decades ago. 113 00:06:18,540 --> 00:06:20,850 But they've stood the test of time, 114 00:06:20,850 --> 00:06:23,530 and they continue to be useful. 115 00:06:23,530 --> 00:06:26,210 We're going to augment these data structures in various ways 116 00:06:26,210 --> 00:06:30,330 to make them more efficient for certain kinds of problems. 117 00:06:30,330 --> 00:06:33,980 And while you're not going to be doing a whole lot of algorithm 118 00:06:33,980 --> 00:06:36,180 design in this class, you will be 119 00:06:36,180 --> 00:06:38,335 doing some design and a whole lot of analysis. 120 00:06:40,880 --> 00:06:46,060 The class following this one, 6.046 Design and Analysis 121 00:06:46,060 --> 00:06:48,530 of Algorithms, is a class that you 122 00:06:48,530 --> 00:06:52,080 should take if you like this one. 123 00:06:52,080 --> 00:06:57,180 And you can do a whole lot more design of algorithms in 6.046. 124 00:06:57,180 --> 00:06:59,880 But you will look at classic data structures 125 00:06:59,880 --> 00:07:06,260 and classical algorithms for these data structures, 126 00:07:06,260 --> 00:07:12,470 including things like sorting and matching, and so on. 127 00:07:12,470 --> 00:07:17,200 And one of the nice things about this class 128 00:07:17,200 --> 00:07:21,800 is that you'll be doing real implementations of these data 129 00:07:21,800 --> 00:07:25,130 structures and algorithms in Python. 
130 00:07:28,220 --> 00:07:30,880 And in particular, each of the problem 131 00:07:30,880 --> 00:07:38,680 sets in this class are going to have both a theory 132 00:07:38,680 --> 00:07:41,930 part to them, and a programming part to them. 133 00:07:41,930 --> 00:07:43,430 So hopefully it'll all tie together. 134 00:07:43,430 --> 00:07:46,060 The kinds of things we're going to be talking about in lectures 135 00:07:46,060 --> 00:07:51,200 and recitations are going to be directly connected 136 00:07:51,200 --> 00:07:53,260 to the theory parts of the problem sets. 137 00:07:53,260 --> 00:07:55,800 And you'll be programming the algorithms that we talk about 138 00:07:55,800 --> 00:07:58,680 in lecture, or augmenting them, running them. 139 00:07:58,680 --> 00:08:03,180 Figuring out whether they work well on large inputs or not. 140 00:08:06,510 --> 00:08:09,530 So let me talk a little bit about the modules 141 00:08:09,530 --> 00:08:11,462 in this class and the problem sets. 142 00:08:11,462 --> 00:08:12,920 And we hope that these problem sets 143 00:08:12,920 --> 00:08:15,470 are going to be fun for you. 144 00:08:15,470 --> 00:08:19,430 And by fun I don't mean easy. 145 00:08:19,430 --> 00:08:22,656 I mean challenging and worthwhile, so at the end of it 146 00:08:22,656 --> 00:08:24,280 you feel like you've learned something, 147 00:08:24,280 --> 00:08:26,870 and you had some fun along the way. 148 00:08:26,870 --> 00:08:28,580 All right? 149 00:08:28,580 --> 00:08:30,550 So content wise-- 150 00:08:37,350 --> 00:08:41,830 --we have eight modules in the class. 151 00:08:41,830 --> 00:08:44,490 Each of which, roughly speaking, has 152 00:08:44,490 --> 00:08:47,020 a problem set associated with it. 153 00:08:47,020 --> 00:08:51,950 The first of these is what we call algorithmic thinking. 154 00:08:55,710 --> 00:08:59,130 And we'll kick start that one today. 
155 00:08:59,130 --> 00:09:01,480 We'll look at a particular problem, as I mentioned, 156 00:09:01,480 --> 00:09:02,790 of peak finding. 157 00:09:02,790 --> 00:09:04,350 And as part of this, you're going 158 00:09:04,350 --> 00:09:07,960 to have a problem set that's going to go out today as well. 159 00:09:07,960 --> 00:09:12,320 And you'll find that in this problem set 160 00:09:12,320 --> 00:09:14,420 some of these algorithms I talk about today will 161 00:09:14,420 --> 00:09:17,090 be coded in Python and given to you. 162 00:09:17,090 --> 00:09:20,190 A couple of them are going to have bugs in them. 163 00:09:20,190 --> 00:09:24,340 You'll have to analyze the complexity of these algorithms; 164 00:09:24,340 --> 00:09:27,380 figure out which ones are correct and efficient; 165 00:09:27,380 --> 00:09:29,760 and write a proof for one of them. 166 00:09:29,760 --> 00:09:30,260 All right? 167 00:09:30,260 --> 00:09:33,320 So that's sort of an example problem set. 168 00:09:33,320 --> 00:09:37,600 And you can expect that most of the problem sets 169 00:09:37,600 --> 00:09:40,036 are going to follow that sort of template. 170 00:09:40,036 --> 00:09:40,750 All right. 171 00:09:40,750 --> 00:09:44,810 So you'll get a better sense of this 172 00:09:44,810 --> 00:09:46,690 by the end of the day today for sure. 173 00:09:46,690 --> 00:09:48,930 Or a concrete sense of this, because we'll 174 00:09:48,930 --> 00:09:52,850 be done with lecture and you'll see your first problem set. 175 00:09:52,850 --> 00:09:57,540 We're going to be doing a module on sorting and trees. 176 00:09:57,540 --> 00:10:00,619 Sorting you know about, sorting a bunch of numbers. 177 00:10:00,619 --> 00:10:02,160 Imagine if you had a trillion numbers 178 00:10:02,160 --> 00:10:04,250 and you wanted to sort them. 179 00:10:04,250 --> 00:10:07,610 What kind of algorithm could you use for that? 180 00:10:07,610 --> 00:10:10,280 Trees are a wonderful data structure. 
181 00:10:10,280 --> 00:10:14,760 There's different varieties, the most common being binary trees. 182 00:10:14,760 --> 00:10:17,580 And there's ways of doing all sorts of things, 183 00:10:17,580 --> 00:10:22,560 like scheduling, and sorting, using various kinds of trees, 184 00:10:22,560 --> 00:10:24,200 including binary trees. 185 00:10:24,200 --> 00:10:31,330 And we have a problem set on simulating a logic network 186 00:10:31,330 --> 00:10:36,660 using a particular kind of sorting algorithm in a data 187 00:10:36,660 --> 00:10:38,340 structure. 188 00:10:38,340 --> 00:10:41,150 That is going to be your second problem set. 189 00:10:41,150 --> 00:10:47,190 And more quickly, we're going to have modules on hashing, 190 00:10:47,190 --> 00:10:51,240 where we do things like genome comparison. 191 00:10:51,240 --> 00:10:56,330 In past terms we compared a human genome to a rat genome, 192 00:10:56,330 --> 00:10:59,350 and discovered they were pretty similar. 193 00:10:59,350 --> 00:11:01,860 99% similar, which is kind of amazing. 194 00:11:01,860 --> 00:11:04,960 But again, these things are so large that you 195 00:11:04,960 --> 00:11:07,590 have to have efficiency in the comparison methods 196 00:11:07,590 --> 00:11:08,460 that you use. 197 00:11:08,460 --> 00:11:11,690 And you'll find that if you don't get the complexity low 198 00:11:11,690 --> 00:11:15,300 enough, you just won't be able to complete-- 199 00:11:15,300 --> 00:11:19,950 your program won't be able to finish running within the time 200 00:11:19,950 --> 00:11:21,260 that your problem set is due. 201 00:11:21,260 --> 00:11:21,760 Right? 202 00:11:21,760 --> 00:11:24,660 Which is a bit of a problem. 203 00:11:24,660 --> 00:11:28,860 So that's something to keep in mind as you test your code. 204 00:11:28,860 --> 00:11:32,140 The fact is that you will get large inputs to run your code. 
205 00:11:32,140 --> 00:11:34,960 And you want to keep complexity in mind 206 00:11:34,960 --> 00:11:40,070 as you're coding and thinking about the pseudocode, 207 00:11:40,070 --> 00:11:43,624 if you will, of your algorithm itself. 208 00:11:43,624 --> 00:11:44,790 We will talk about numerics. 209 00:11:47,420 --> 00:11:50,840 A lot of the time we talk about such large numbers 210 00:11:50,840 --> 00:11:54,290 that 32 bits isn't enough. 211 00:11:54,290 --> 00:11:57,130 Or 64 bits isn't enough to represent these numbers. 212 00:11:57,130 --> 00:11:58,910 These numbers have thousands of bits. 213 00:11:58,910 --> 00:12:01,110 A good example is RSA encryption, 214 00:12:01,110 --> 00:12:05,140 that is used in SSL, for example. 215 00:12:05,140 --> 00:12:09,720 And when you go-- use https on websites, 216 00:12:09,720 --> 00:12:12,710 RSA is used at the back end. 217 00:12:12,710 --> 00:12:15,360 And typically you work with prime numbers 218 00:12:15,360 --> 00:12:18,510 that are thousands of bits long in RSA. 219 00:12:18,510 --> 00:12:19,930 So how do you handle that? 220 00:12:19,930 --> 00:12:21,270 How does Python handle that? 221 00:12:21,270 --> 00:12:22,950 How do you write algorithms that can 222 00:12:22,950 --> 00:12:26,270 deal with what are called infinite precision numbers? 223 00:12:26,270 --> 00:12:30,500 So we have a module on numerics in the middle of the term that 224 00:12:30,500 --> 00:12:31,850 talks about that. 225 00:12:31,850 --> 00:12:35,480 Graphs, really a fundamental data structure 226 00:12:35,480 --> 00:12:37,970 in all of computer science. 227 00:12:37,970 --> 00:12:42,610 You might have heard of the famous Rubik's cube assignment 228 00:12:42,610 --> 00:12:43,110 from 6.006-- 229 00:12:43,110 --> 00:12:46,850 a 2 by 2 by 2 Rubik's cube. 
230 00:12:46,850 --> 00:12:48,690 What's the minimum number of moves 231 00:12:48,690 --> 00:12:53,240 necessary to go from a given starting configuration 232 00:12:53,240 --> 00:12:56,640 to the final end configuration, where all of the faces-- each 233 00:12:56,640 --> 00:12:58,940 of the faces has uniform color? 234 00:12:58,940 --> 00:13:01,830 And that can be posed as a graph problem. 235 00:13:01,830 --> 00:13:04,052 We'll probably do that one this term. 236 00:13:04,052 --> 00:13:05,760 In previous terms we've done other things 237 00:13:05,760 --> 00:13:07,310 like the 15 puzzle. 238 00:13:07,310 --> 00:13:10,170 And so some of these are tentative. 239 00:13:10,170 --> 00:13:12,420 We definitely know what the first problem set is like, 240 00:13:12,420 --> 00:13:16,420 but the rest of them are, at this moment, tentative. 241 00:13:16,420 --> 00:13:20,340 And to finish up shortest paths. 242 00:13:20,340 --> 00:13:24,660 Again in terms past we've asked you 243 00:13:24,660 --> 00:13:27,380 to write code using a particular algorithm that 244 00:13:27,380 --> 00:13:30,984 finds the shortest path from Caltech to MIT. 245 00:13:30,984 --> 00:13:33,150 This time we may do things a little bit differently. 246 00:13:33,150 --> 00:13:37,150 We were thinking maybe we'll give you a street map of Boston 247 00:13:37,150 --> 00:13:41,360 and go figure out if Paul Revere used 248 00:13:41,360 --> 00:13:44,140 the shortest path to get to where he was going, 249 00:13:44,140 --> 00:13:45,025 or things like that. 250 00:13:45,025 --> 00:13:47,540 We'll try and make it fun. 251 00:13:47,540 --> 00:13:54,420 Dynamic programming is an important algorithm design 252 00:13:54,420 --> 00:14:00,690 technique that's used in many, many problems. 253 00:14:00,690 --> 00:14:04,510 And it can be used to do a variety of things, including 254 00:14:04,510 --> 00:14:06,600 image compression. 
255 00:14:06,600 --> 00:14:10,060 How do you compress an image so the number of pixels 256 00:14:10,060 --> 00:14:12,960 reduces, but it still looks like the image 257 00:14:12,960 --> 00:14:15,761 that you started out with, that had many more pixels? 258 00:14:15,761 --> 00:14:16,260 All right? 259 00:14:16,260 --> 00:14:18,970 So you could use dynamic programming for that. 260 00:14:18,970 --> 00:14:23,370 And finally, advanced topics, complexity theory, research 261 00:14:23,370 --> 00:14:25,760 in algorithms. 262 00:14:25,760 --> 00:14:28,590 Hopefully by now-- by this time in the course, 263 00:14:28,590 --> 00:14:30,330 you have been sold on algorithms. 264 00:14:30,330 --> 00:14:32,605 And most, if not all of you, would 265 00:14:32,605 --> 00:14:34,550 want to pursue a career in algorithms. 266 00:14:34,550 --> 00:14:37,680 And we'll give you a sense of what else is there. 267 00:14:37,680 --> 00:14:40,364 We're just scratching the surface in this class, 268 00:14:40,364 --> 00:14:42,530 and there's many, many classes that you can possibly 269 00:14:42,530 --> 00:14:47,650 take if you want to continue in-- to learn about algorithms, 270 00:14:47,650 --> 00:14:49,790 or to pursue a career in algorithms. 271 00:14:49,790 --> 00:14:51,580 All right? 272 00:14:51,580 --> 00:14:53,990 So that's the story of the class, 273 00:14:53,990 --> 00:14:55,840 or the synopsis of the class. 274 00:14:55,840 --> 00:15:01,950 And I encourage you to go spend a few minutes on the website. 275 00:15:01,950 --> 00:15:05,850 In particular please read the collaboration policy, and get 276 00:15:05,850 --> 00:15:08,440 a sense of what is expected of you. 277 00:15:08,440 --> 00:15:13,580 What the rules are in terms of doing the problem sets. 278 00:15:13,580 --> 00:15:17,100 And the course grading break down, 279 00:15:17,100 --> 00:15:20,860 the grading policies are all listed on the website as well. 280 00:15:20,860 --> 00:15:23,000 All right. 
281 00:15:23,000 --> 00:15:23,870 OK. 282 00:15:23,870 --> 00:15:26,210 So let's get started. 283 00:15:26,210 --> 00:15:28,930 I want to talk about a specific problem. 284 00:15:28,930 --> 00:15:32,000 And talk about algorithms for a specific problem. 285 00:15:32,000 --> 00:15:35,560 We picked this problem, because it's so easy to understand. 286 00:15:35,560 --> 00:15:38,790 And there are fairly straightforward algorithms 287 00:15:38,790 --> 00:15:41,280 that are not particularly efficient to solve 288 00:15:41,280 --> 00:15:42,530 this problem. 289 00:15:42,530 --> 00:15:45,060 And so this is a, kind of, a toy problem. 290 00:15:45,060 --> 00:15:49,660 But like a lot of toy problems, it's 291 00:15:49,660 --> 00:15:55,230 very evocative in that it points out the issues involved 292 00:15:55,230 --> 00:15:57,739 in designing efficient algorithms. 293 00:15:57,739 --> 00:15:59,280 So we'll start with a one dimensional 294 00:15:59,280 --> 00:16:02,395 version of what we call peak finding. 295 00:16:05,810 --> 00:16:10,635 And a peak finder is something in the one dimensional case. 296 00:16:14,180 --> 00:16:18,240 Runs on an array of numbers. 297 00:16:18,240 --> 00:16:22,770 And I'm just putting-- 298 00:16:22,770 --> 00:16:27,020 --symbols for each of these numbers here. 299 00:16:27,020 --> 00:16:31,546 And the numbers are positive, negative. 300 00:16:31,546 --> 00:16:33,170 We'll just assume they're all positive, 301 00:16:33,170 --> 00:16:34,480 it doesn't really matter. 302 00:16:34,480 --> 00:16:38,460 The algorithms we describe will work. 303 00:16:38,460 --> 00:16:41,330 And so we have this one dimensional array 304 00:16:41,330 --> 00:16:43,450 that has nine different positions. 305 00:16:43,450 --> 00:16:47,405 And a through i are numbers. 306 00:16:49,910 --> 00:16:53,030 And we want to find a peak. 307 00:16:53,030 --> 00:16:56,180 And so we have to define what we mean by a peak. 
308 00:16:56,180 --> 00:17:00,320 And so, in particular, as an example, 309 00:17:00,320 --> 00:17:07,369 position 2 is a peak if, and only 310 00:17:07,369 --> 00:17:16,520 if, b greater than or equal to a, and b greater than or equal 311 00:17:16,520 --> 00:17:18,020 to c. 312 00:17:18,020 --> 00:17:21,359 So it's really a very local property corresponding 313 00:17:21,359 --> 00:17:22,270 to a peak. 314 00:17:22,270 --> 00:17:25,020 In the one dimensional case, it's trivial. 315 00:17:25,020 --> 00:17:26,220 Look to your left. 316 00:17:26,220 --> 00:17:27,990 Look to your right. 317 00:17:27,990 --> 00:17:31,990 If you are equal or greater than both of the elements 318 00:17:31,990 --> 00:17:35,120 that you see on the left and the right, you're a peak. 319 00:17:35,120 --> 00:17:35,760 OK? 320 00:17:35,760 --> 00:17:38,690 And in the case of the edges, you only 321 00:17:38,690 --> 00:17:40,700 have to look to one side. 322 00:17:40,700 --> 00:17:53,567 So position 9 is a peak if i greater than or equal to h. 323 00:17:53,567 --> 00:17:55,400 So you just have to look to your left there, 324 00:17:55,400 --> 00:17:57,483 because you're all the way on the right hand side. 325 00:17:57,483 --> 00:17:58,270 All right? 326 00:17:58,270 --> 00:18:00,480 So that's it. 327 00:18:00,480 --> 00:18:03,920 And the statement of the problem, the one dimensional 328 00:18:03,920 --> 00:18:13,820 version, is find the peak if it exists. 329 00:18:19,490 --> 00:18:22,070 All right? 330 00:18:22,070 --> 00:18:24,510 That's all there is to it. 331 00:18:24,510 --> 00:18:27,890 I'm going to give you a straightforward algorithm. 332 00:18:27,890 --> 00:18:30,630 And then we'll see if we can improve it. 333 00:18:30,630 --> 00:18:31,270 All right? 334 00:18:31,270 --> 00:18:34,110 You can imagine that the straightforward algorithm is 335 00:18:34,110 --> 00:18:39,440 something that just, you know, walks across the array. 
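The peak definition just given (greater than or equal to each neighbor that exists, with only one neighbor to check at either edge) can be sketched as a small predicate. The function name `is_peak` and the sample numbers are made up for illustration, and Python indices start at 0, so the lecture's position 2 is index 1 and position 9 is index 8.

```python
def is_peak(a, i):
    """Return True if a[i] is at least as large as each neighbor that exists."""
    left_ok = i == 0 or a[i] >= a[i - 1]             # no left neighbor at the left edge
    right_ok = i == len(a) - 1 or a[i] >= a[i + 1]   # no right neighbor at the right edge
    return left_ok and right_ok


# Nine made-up numbers standing in for a through i on the board.
values = [1, 3, 2, 5, 5, 4, 0, 7, 9]

print(is_peak(values, 1))  # lecture's position 2: checks b >= a and b >= c
print(is_peak(values, 8))  # lecture's position 9: checks only i >= h
print(is_peak(values, 2))  # lecture's position 3: fails because c < b
```

Note that ties count: index 3 and index 4 both hold 5 here, and under the greater-than-or-equal definition both are peaks.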
336 00:18:39,440 --> 00:18:43,629 But we need that as a starting point for building something 337 00:18:43,629 --> 00:18:44,420 more sophisticated. 338 00:18:49,680 --> 00:18:57,340 So let's say we start from left and all 339 00:18:57,340 --> 00:19:01,500 we have is one traversal, really. 340 00:19:05,360 --> 00:19:07,930 So let's say we have 1, 2, and then we 341 00:19:07,930 --> 00:19:10,810 have n over 2 over here corresponding 342 00:19:10,810 --> 00:19:14,620 to the middle of this n element array. 343 00:19:14,620 --> 00:19:18,970 And then we have n minus 1, and n. 344 00:19:18,970 --> 00:19:21,090 What I'm interested in doing is, not only 345 00:19:21,090 --> 00:19:24,880 coming up with a straightforward algorithm, 346 00:19:24,880 --> 00:19:29,300 but also precisely characterizing 347 00:19:29,300 --> 00:19:32,030 what its complexity is in relation 348 00:19:32,030 --> 00:19:35,260 to n, which is the number of inputs. 349 00:19:35,260 --> 00:19:35,760 Yeah? 350 00:19:35,760 --> 00:19:36,915 Question? 351 00:19:36,915 --> 00:19:38,456 AUDIENCE: Why do you say if it exists 352 00:19:38,456 --> 00:19:40,348 when the criteria in the [INAUDIBLE] 353 00:19:40,348 --> 00:19:41,397 guarantees [INAUDIBLE]? 354 00:19:41,397 --> 00:19:42,730 PROFESSOR: That's exactly right. 355 00:19:42,730 --> 00:19:44,540 I was going to get to that. 356 00:19:44,540 --> 00:19:50,530 So if you look at the definition of the peak, 357 00:19:50,530 --> 00:19:55,210 then what I have here is greater than or equal to. 358 00:19:55,210 --> 00:19:56,010 OK? 359 00:19:56,010 --> 00:19:59,660 And so this-- That's a great question that was asked. 360 00:19:59,660 --> 00:20:04,470 Why is there "if it exists" in this problem? 
361 00:20:04,470 --> 00:20:08,440 Now in the case where I have greater than or equal to, 362 00:20:08,440 --> 00:20:12,310 then-- this is a homework question for you, 363 00:20:12,310 --> 00:20:18,240 and for the rest of you-- argue that any array will always 364 00:20:18,240 --> 00:20:19,610 have a peak. 365 00:20:19,610 --> 00:20:20,790 OK? 366 00:20:20,790 --> 00:20:24,300 Now if you didn't have the greater than or equal to, 367 00:20:24,300 --> 00:20:29,070 and you had a greater than, then can you make that argument? 368 00:20:29,070 --> 00:20:30,120 No, you can't. 369 00:20:30,120 --> 00:20:30,820 Right? 370 00:20:30,820 --> 00:20:33,230 So great question. 371 00:20:33,230 --> 00:20:35,859 In this case it's just a question-- 372 00:20:35,859 --> 00:20:37,400 You would want to modify this problem 373 00:20:37,400 --> 00:20:38,950 statement to find the peak. 374 00:20:38,950 --> 00:20:43,710 But if I had a different definition of a peak-- and this 375 00:20:43,710 --> 00:20:45,850 is part of algorithmic thinking. 376 00:20:45,850 --> 00:20:49,580 You want to be able to create algorithms that are general, 377 00:20:49,580 --> 00:20:52,130 so if the problem definition changes on you, 378 00:20:52,130 --> 00:20:54,300 you still have a starting point to go attack 379 00:20:54,300 --> 00:20:56,500 the second version of the problem. 380 00:20:56,500 --> 00:20:57,310 OK? 381 00:20:57,310 --> 00:21:01,479 So you could eliminate this in the case 382 00:21:01,479 --> 00:21:03,270 of the greater than or equal to definition. 383 00:21:03,270 --> 00:21:05,664 The "if it exists", because a peak will always exist. 384 00:21:05,664 --> 00:21:07,330 But you probably want to argue that when 385 00:21:07,330 --> 00:21:09,950 you want to show the correctness of your algorithm. 
386 00:21:09,950 --> 00:21:13,210 And if in fact you had a different definition, 387 00:21:13,210 --> 00:21:19,130 well you would have to create an algorithm that tells you 388 00:21:19,130 --> 00:21:22,310 for sure that a peak doesn't exist, or find 389 00:21:22,310 --> 00:21:23,900 a peak if it exists. 390 00:21:23,900 --> 00:21:24,400 All right? 391 00:21:24,400 --> 00:21:26,300 So that's really the general case. 392 00:21:26,300 --> 00:21:29,830 Many a time it's possible that you're asked to do something, 393 00:21:29,830 --> 00:21:34,990 and you can't actually give an answer to the question, 394 00:21:34,990 --> 00:21:39,335 or find something that satisfies all the constraints required. 395 00:21:39,335 --> 00:21:41,710 And in that case, you want to be able to put up your hand 396 00:21:41,710 --> 00:21:43,470 and say, you know what? 397 00:21:43,470 --> 00:21:44,870 I searched long and hard. 398 00:21:44,870 --> 00:21:46,730 I searched exhaustively. 399 00:21:46,730 --> 00:21:49,930 Here's my argument that I searched exhaustively, 400 00:21:49,930 --> 00:21:51,101 and I couldn't find it. 401 00:21:51,101 --> 00:21:51,600 Right? 402 00:21:51,600 --> 00:21:53,490 If you do that, you get to keep your job. 403 00:21:53,490 --> 00:21:54,580 Right? 404 00:21:54,580 --> 00:21:57,390 Otherwise there's always the case 405 00:21:57,390 --> 00:21:59,060 that you didn't search hard enough. 406 00:21:59,060 --> 00:22:02,310 So it's nice to have that argument. 407 00:22:02,310 --> 00:22:02,810 All right? 408 00:22:02,810 --> 00:22:03,080 Great. 409 00:22:03,080 --> 00:22:04,190 Thanks for the question. 410 00:22:04,190 --> 00:22:05,170 Feel free to interrupt. 411 00:22:05,170 --> 00:22:07,840 Raise your hand, and I'm watching you guys, 412 00:22:07,840 --> 00:22:11,550 and I'm happy to answer questions at any time. 413 00:22:11,550 --> 00:22:14,540 So let's talk about the straightforward algorithm. 
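To make the point about the two definitions concrete, here is a small sketch that lists the peaks of an array under either comparison. The helper name `peaks` and its `strict` flag are hypothetical, introduced only for this illustration. One way to see why the greater-than-or-equal version always succeeds on a non-empty array is that a largest element is never smaller than its neighbors; with a strict greater-than, a flat array has no peak at all.

```python
def peaks(a, strict=False):
    """Return the indices of all peaks; strict=True uses > in place of >=."""
    def beats(x, y):
        return x > y if strict else x >= y

    found = []
    for i in range(len(a)):
        left_ok = i == 0 or beats(a[i], a[i - 1])
        right_ok = i == len(a) - 1 or beats(a[i], a[i + 1])
        if left_ok and right_ok:
            found.append(i)
    return found


flat = [4, 4, 4]
print(peaks(flat))               # every position qualifies under >=
print(peaks(flat, strict=True))  # no position qualifies under >
```

So under the strict definition an algorithm has to be prepared to report that no peak exists, and to justify that it looked everywhere.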
414 00:22:14,540 --> 00:22:16,510 The straightforward algorithm is something 415 00:22:16,510 --> 00:22:20,940 that starts from the left and just walks across. 416 00:22:20,940 --> 00:22:24,285 And you might have something that looks like that. 417 00:22:24,285 --> 00:22:24,960 All right? 418 00:22:24,960 --> 00:22:27,830 By that-- By this I mean the numbers are increasing 419 00:22:27,830 --> 00:22:30,730 as you start from the left, the peak is somewhere 420 00:22:30,730 --> 00:22:33,620 in the middle, and then things start decreasing. 421 00:22:33,620 --> 00:22:34,120 Right? 422 00:22:34,120 --> 00:22:39,570 So in this case, you know, this might be the peak. 423 00:22:46,950 --> 00:22:49,550 You also may have a situation where 424 00:22:49,550 --> 00:22:51,240 the peak is all the way on the right, 425 00:22:51,240 --> 00:22:52,780 you started from the left. 426 00:22:52,780 --> 00:22:55,060 And it's 1, 2, 3, 4, 5, 6, literally 427 00:22:55,060 --> 00:22:56,390 in terms of the numbers. 428 00:22:56,390 --> 00:23:01,000 And you're going to look at n elements going all the way 429 00:23:01,000 --> 00:23:04,800 to the right in order to find the peak. 430 00:23:04,800 --> 00:23:07,310 So in the case of the middle you'd 431 00:23:07,310 --> 00:23:10,940 look at n over 2 elements. 432 00:23:13,770 --> 00:23:15,020 If it was right in the middle. 433 00:23:18,340 --> 00:23:26,340 And the complexity, worst case complexity-- 434 00:23:26,340 --> 00:23:29,830 --is what we call theta n. 435 00:23:29,830 --> 00:23:33,580 And it's theta n, because in the worst case, 436 00:23:33,580 --> 00:23:36,294 you may have to look at all n elements. 437 00:23:36,294 --> 00:23:38,710 And that would be the case where you started from the left 438 00:23:38,710 --> 00:23:40,860 and you had to go all the way to the right. 439 00:23:40,860 --> 00:23:43,850 Now remember theta n is essentially something 440 00:23:43,850 --> 00:23:45,830 that says of the order of n. 
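The left-to-right walk described above might be sketched as follows. The function name `find_peak_linear` is hypothetical, and returning the number of positions visited alongside the index is an addition made here only so the worst case is visible; it is not part of the lecture's algorithm.

```python
def find_peak_linear(a):
    """Walk from the left and return (index of first peak, positions visited)."""
    n = len(a)
    for i in range(n):
        left_ok = i == 0 or a[i] >= a[i - 1]
        right_ok = i == n - 1 or a[i] >= a[i + 1]
        if left_ok and right_ok:
            return i, i + 1
    return None, n  # reached only when the array is empty


# Peak in the middle: the walk stops about halfway.
print(find_peak_linear([1, 2, 3, 2, 1]))

# Strictly increasing input: the walk visits all n positions.
print(find_peak_linear([1, 2, 3, 4, 5, 6]))
```

The second call is the worst case from the board: the only peak is the last element, so every one of the n positions is visited before the walk stops.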
441 00:23:45,830 --> 00:23:49,400 So it gives you both a lower bound and an upper bound. 442 00:23:49,400 --> 00:23:52,470 Big O of n is just an upper bound. 443 00:23:52,470 --> 00:23:53,970 And what we're saying here is, we're 444 00:23:53,970 --> 00:23:58,110 saying this algorithm that starts from the left 445 00:23:58,110 --> 00:24:03,470 is going to, essentially, require in the worst case 446 00:24:03,470 --> 00:24:06,740 something that's a constant times n. 447 00:24:06,740 --> 00:24:07,880 OK? 448 00:24:07,880 --> 00:24:11,210 And you know that constant could be 1. 449 00:24:11,210 --> 00:24:13,400 You could certainly set things up that way. 450 00:24:13,400 --> 00:24:15,860 Or if you had a different kind of algorithm, 451 00:24:15,860 --> 00:24:18,460 maybe you could work on the constant. 452 00:24:18,460 --> 00:24:22,360 But bottom line, we're only concerned, at this moment, 453 00:24:22,360 --> 00:24:24,760 about asymptotic complexity. 454 00:24:24,760 --> 00:24:29,030 And the asymptotic complexity of this algorithm is linear. 455 00:24:29,030 --> 00:24:29,710 All right? 456 00:24:29,710 --> 00:24:32,150 That make sense? 457 00:24:32,150 --> 00:24:32,930 OK. 458 00:24:32,930 --> 00:24:38,950 So someone help me do better. 459 00:24:38,950 --> 00:24:39,890 How can we do better? 460 00:24:39,890 --> 00:24:43,040 How can we lower the asymptotic complexity 461 00:24:43,040 --> 00:24:46,700 of a one dimensional peak finder? 462 00:24:46,700 --> 00:24:48,450 Anybody want to take a stab at that? 463 00:24:48,450 --> 00:24:48,950 Yeah? 464 00:24:48,950 --> 00:24:50,086 Back there. 465 00:24:50,086 --> 00:24:52,078 AUDIENCE: Do a binary search subset. 466 00:24:52,078 --> 00:24:54,236 You look at the middle, and whatever 467 00:24:54,236 --> 00:24:58,552 is higher-- whichever side is higher, then cut that in half, 468 00:24:58,552 --> 00:25:00,290 because you know there's a peak.
469 00:25:00,290 --> 00:25:00,410 PROFESSOR: On-- 470 00:25:00,410 --> 00:25:01,578 AUDIENCE: For example if you're in the middle 471 00:25:01,578 --> 00:25:03,492 on the right side-- there's a higher number 472 00:25:03,492 --> 00:25:05,116 on the right side-- then you would just 473 00:25:05,116 --> 00:25:06,946 look at that, because you know that your peak's somewhere 474 00:25:06,946 --> 00:25:07,446 in there. 475 00:25:07,446 --> 00:25:08,900 And you continue cutting in half. 476 00:25:08,900 --> 00:25:09,360 PROFESSOR: Excellent! 477 00:25:09,360 --> 00:25:09,859 Excellent! 478 00:25:09,859 --> 00:25:11,200 That's exactly right. 479 00:25:11,200 --> 00:25:14,850 So you can-- You can do something different, which 480 00:25:14,850 --> 00:25:19,240 is essentially try and break up this problem. 481 00:25:19,240 --> 00:25:22,650 Use a divide and conquer strategy, and recursively break 482 00:25:22,650 --> 00:25:26,550 up this one dimensional array into smaller arrays. 483 00:25:26,550 --> 00:25:29,940 And try and get this complexity down. 484 00:25:29,940 --> 00:25:30,440 Yeah? 485 00:25:30,440 --> 00:25:33,239 AUDIENCE: Are we assuming that there's only one peak? 486 00:25:33,239 --> 00:25:34,280 PROFESSOR: No, we're not. 487 00:25:34,280 --> 00:25:34,980 AUDIENCE: OK. 488 00:25:34,980 --> 00:25:39,219 PROFESSOR: It's find a peak if it exists. 489 00:25:39,219 --> 00:25:40,760 And in this case it's, "find a peak", 490 00:25:40,760 --> 00:25:42,610 because of the definition. 491 00:25:42,610 --> 00:25:45,910 We don't really need this as it was discussed. 492 00:25:45,910 --> 00:25:46,660 All right? 493 00:25:46,660 --> 00:25:47,180 OK. 494 00:25:47,180 --> 00:25:49,080 So-- 495 00:25:49,080 --> 00:25:53,392 So that was a great answer, and-- You know this class 496 00:25:53,392 --> 00:25:54,850 after a while is going to get boring. 497 00:25:54,850 --> 00:25:55,770 Right? 498 00:25:55,770 --> 00:25:57,650 Every class gets boring.
499 00:25:57,650 --> 00:26:00,777 So we, you know, try and break the monotony here a bit. 500 00:26:00,777 --> 00:26:02,860 And so-- And then the other thing that we realized 501 00:26:02,860 --> 00:26:04,790 was that these seats you're sitting on-- this 502 00:26:04,790 --> 00:26:06,998 is a nice classroom-- but the seats you're sitting on 503 00:26:06,998 --> 00:26:07,750 are kind of hard. 504 00:26:07,750 --> 00:26:08,250 Right? 505 00:26:08,250 --> 00:26:10,787 So what Eric and I did was we decided 506 00:26:10,787 --> 00:26:12,620 we'll help you guys out, especially the ones 507 00:26:12,620 --> 00:26:15,870 who are-- who are interacting with us. 508 00:26:15,870 --> 00:26:17,580 And we have these-- 509 00:26:17,580 --> 00:26:18,610 [LAUGHTER] 510 00:26:18,610 --> 00:26:22,145 --cushions that are 6.006 cushions. 511 00:26:22,145 --> 00:26:25,170 And, you know, that's a 2 by 2 by 2 Rubik's cube here. 512 00:26:25,170 --> 00:26:28,410 And since you answered the first question, you get a cushion. 513 00:26:28,410 --> 00:26:31,510 This is kind of like a Frisbee, but not really. 514 00:26:31,510 --> 00:26:32,010 So-- 515 00:26:32,010 --> 00:26:32,510 [LAUGHTER] 516 00:26:32,510 --> 00:26:35,190 I'm not sure-- I'm not sure I'm going to get it to you. 517 00:26:35,190 --> 00:26:36,565 But the other thing I want to say 518 00:26:36,565 --> 00:26:37,970 is this is not a baseball game. 519 00:26:37,970 --> 00:26:38,469 Right? 520 00:26:38,469 --> 00:26:40,560 Where you just grab the ball as it comes by. 521 00:26:40,560 --> 00:26:43,670 This is meant for him, my friend in the red shirt. 522 00:26:43,670 --> 00:26:45,920 So here you go. 523 00:26:45,920 --> 00:26:46,820 Ah, too bad. 524 00:26:46,820 --> 00:26:47,620 All right. 525 00:26:47,620 --> 00:26:48,580 It is soft. 526 00:26:48,580 --> 00:26:51,255 So, you know, it won't-- it won't hurt you if it hits you. 527 00:26:51,255 --> 00:26:51,910 [LAUGHTER] 528 00:26:51,910 --> 00:26:52,540 All right.
529 00:26:52,540 --> 00:26:54,216 So we got a bunch of these. 530 00:26:54,216 --> 00:26:57,300 And raise your hands, you know, going 531 00:26:57,300 --> 00:27:01,025 to ask-- There's going to be-- I think-- There's 532 00:27:01,025 --> 00:27:03,150 some trivial questions that we're going to ask just 533 00:27:03,150 --> 00:27:05,180 to make sure you're awake. 534 00:27:05,180 --> 00:27:07,750 So an answer to that doesn't get you a cushion. 535 00:27:07,750 --> 00:27:10,514 But an answer like-- What's your name? 536 00:27:10,514 --> 00:27:11,180 AUDIENCE: Chase. 537 00:27:11,180 --> 00:27:11,890 PROFESSOR: Chase. 538 00:27:11,890 --> 00:27:15,134 An answer like Chase just gave is-- 539 00:27:15,134 --> 00:27:17,050 that's a good answer to a nontrivial question. 540 00:27:17,050 --> 00:27:18,500 That gets you a cushion. 541 00:27:18,500 --> 00:27:19,290 OK? 542 00:27:19,290 --> 00:27:20,300 All right, great. 543 00:27:20,300 --> 00:27:24,230 So let's put Chase's algorithm up here. 544 00:27:24,230 --> 00:27:26,510 I'm going to write it out for the 1D version. 545 00:27:41,390 --> 00:27:45,205 So what we have here is a recursive algorithm. 546 00:28:02,967 --> 00:28:04,800 So the picture you want to keep in your head 547 00:28:04,800 --> 00:28:06,860 is this picture that I put up there. 548 00:28:06,860 --> 00:28:11,010 And this is a divide and conquer algorithm. 549 00:28:11,010 --> 00:28:14,140 You're going to see this over and over-- this paradigm-- 550 00:28:14,140 --> 00:28:17,360 over and over in 6.006. 551 00:28:17,360 --> 00:28:22,745 We're going to look at the n over 2 position. 552 00:28:25,990 --> 00:28:28,700 And we're going to look to the left, 553 00:28:28,700 --> 00:28:31,010 and we're going to look to the right. 554 00:28:31,010 --> 00:28:33,420 And we're going to do that in sequence.
555 00:28:33,420 --> 00:28:33,920 So-- 556 00:28:36,680 --> 00:28:50,950 --if a n over 2 is less than a n over 2 minus 1, then-- 557 00:28:50,950 --> 00:28:54,380 --only look at the left half. 558 00:28:57,680 --> 00:29:04,410 1 through n over 2 minus 1 to look for peak-- for a peak. 559 00:29:08,381 --> 00:29:08,880 All right? 560 00:29:08,880 --> 00:29:10,295 So that's step one. 561 00:29:10,295 --> 00:29:12,170 And you know I could put it on the right hand 562 00:29:12,170 --> 00:29:15,990 side or the left hand side, doesn't really matter. 563 00:29:15,990 --> 00:29:20,311 I chose to do the left hand side first, the left half. 564 00:29:20,311 --> 00:29:24,570 And so what I've done is, through that one step, 565 00:29:24,570 --> 00:29:30,010 if in fact you have that condition-- a n over 2 566 00:29:30,010 --> 00:29:33,630 is less than a n over 2 minus 1-- then you move to your left 567 00:29:33,630 --> 00:29:37,490 and you work on one half of the problem. 568 00:29:37,490 --> 00:29:43,120 But if that's not the case, then if a n over 2 569 00:29:43,120 --> 00:29:48,170 is less than a n over 2 plus 1, 570 00:29:48,170 --> 00:29:57,520 then only look at n over 2 plus 1 through n for a peak. 571 00:29:57,520 --> 00:29:59,960 So I haven't bothered writing out all the words. 572 00:29:59,960 --> 00:30:03,480 They're exactly the same as the left hand side. 573 00:30:03,480 --> 00:30:06,160 You just look to the right hand side. 574 00:30:06,160 --> 00:30:10,430 Otherwise if both of these conditions don't fire, 575 00:30:10,430 --> 00:30:12,160 you're actually done. 576 00:30:12,160 --> 00:30:12,660 OK? 577 00:30:12,660 --> 00:30:16,130 That's actually the best case in terms of finishing early, 578 00:30:16,130 --> 00:30:18,340 at least in this recursive step. 579 00:30:18,340 --> 00:30:22,580 Because now the n over 2 position is a peak.
580 00:30:27,210 --> 00:30:30,500 Because what you found is that the n over 2 position 581 00:30:30,500 --> 00:30:34,740 is greater than or equal to both of its adjacent positions, 582 00:30:34,740 --> 00:30:36,850 and that's exactly the definition of a peak. 583 00:30:36,850 --> 00:30:38,430 So you're done. 584 00:30:38,430 --> 00:30:39,350 OK? 585 00:30:39,350 --> 00:30:44,500 So all of this is good. 586 00:30:44,500 --> 00:30:53,307 You want to write an argument that this algorithm is correct. 587 00:30:53,307 --> 00:30:54,890 And I'm not going to bother with that. 588 00:30:54,890 --> 00:30:59,530 I just wave my hands a bit, and you all nodded, 589 00:30:59,530 --> 00:31:01,230 so we're done with that. 590 00:31:01,230 --> 00:31:07,310 But the point being you will see in your problem set 591 00:31:07,310 --> 00:31:11,560 a precise argument for a more complicated algorithm, the 2D 592 00:31:11,560 --> 00:31:12,720 version of this. 593 00:31:12,720 --> 00:31:16,900 And that should be a template for you to go write a proof, 594 00:31:16,900 --> 00:31:19,200 or an argument, a formal argument, 595 00:31:19,200 --> 00:31:21,620 that a particular algorithm is correct. 596 00:31:21,620 --> 00:31:23,550 That it does what it claims to do. 597 00:31:23,550 --> 00:31:30,370 And in this case it's two, three lines of careful reasoning 598 00:31:30,370 --> 00:31:34,520 that essentially say, given the definition of the peak, 599 00:31:34,520 --> 00:31:38,600 that this is going to find a peak in the array 600 00:31:38,600 --> 00:31:39,860 that you're given. 601 00:31:39,860 --> 00:31:40,900 All right? 602 00:31:40,900 --> 00:31:44,910 So we all believe that this algorithm is correct. 603 00:31:44,910 --> 00:31:48,650 Let's talk now about the complexity of this algorithm. 
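The three-way case analysis above (go left, go right, or stop) might be sketched in Python like this. It is an illustrative version under a few assumptions: the list is 0-indexed rather than 1-indexed as on the board, the recursion is written as a loop over a shrinking index range, and the name `find_peak_binary` is hypothetical.

```python
def find_peak_binary(a):
    """Divide-and-conquer 1D peak finder; returns the index of some peak.

    Each step inspects the middle of the active range [lo, hi] and
    discards the half that is not needed, so about log2(n) steps suffice.
    """
    lo, hi = 0, len(a) - 1
    if hi < 0:
        return None  # empty input has no peak
    while True:
        mid = (lo + hi) // 2
        if mid > lo and a[mid] < a[mid - 1]:
            hi = mid - 1      # left neighbor is larger: search the left half
        elif mid < hi and a[mid] < a[mid + 1]:
            lo = mid + 1      # right neighbor is larger: search the right half
        else:
            return mid        # neither condition fired: mid is a peak
```

The range bounds only move past a position whose neighbor inside the kept half is strictly larger, which is why a peak of the kept half is also a peak of the whole array.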
604 00:31:48,650 --> 00:31:50,630 Because the whole point of this algorithm 605 00:31:50,630 --> 00:31:52,700 was because we didn't like this theta 606 00:31:52,700 --> 00:31:56,350 n complexity corresponding to the straightforward algorithm. 607 00:31:56,350 --> 00:31:57,470 So we'd like to do better. 608 00:32:08,350 --> 00:32:10,830 So what I'd like to do is ask one of you 609 00:32:10,830 --> 00:32:14,890 to give me a recurrence relation of the kind, you know, T of n 610 00:32:14,890 --> 00:32:18,040 equals blah, blah, blah. 611 00:32:18,040 --> 00:32:22,310 That would correspond to this recursive algorithm, 612 00:32:22,310 --> 00:32:24,020 this divide and conquer algorithm. 613 00:32:24,020 --> 00:32:29,050 And then using that, I'd like to get to the actual complexity 614 00:32:29,050 --> 00:32:33,280 in terms of what the theta of complexity corresponds to. 615 00:32:33,280 --> 00:32:33,780 Yeah? 616 00:32:33,780 --> 00:32:34,752 Back there? 617 00:32:34,752 --> 00:32:39,680 AUDIENCE: So the worst case scenario if T of n 618 00:32:39,680 --> 00:32:42,795 is going to be some constant amount of time-- 619 00:32:42,795 --> 00:32:43,420 PROFESSOR: Yep. 620 00:32:43,420 --> 00:32:47,116 AUDIENCE: --it takes to investigate whether a certain 621 00:32:47,116 --> 00:32:49,851 element is [INAUDIBLE], plus-- 622 00:32:49,851 --> 00:32:50,713 [COUGH] 623 00:32:50,713 --> 00:32:52,022 --T of n over 2? 624 00:32:52,022 --> 00:32:52,730 PROFESSOR: Great. 625 00:32:52,730 --> 00:32:53,550 Exactly right. 626 00:32:53,550 --> 00:32:54,460 That's exactly right. 627 00:32:54,460 --> 00:32:58,370 So if you look at this algorithm and you say, 628 00:32:58,370 --> 00:33:01,290 from a computation standpoint, can I 629 00:33:01,290 --> 00:33:05,510 write an equation corresponding to the execution 630 00:33:05,510 --> 00:33:06,570 of this algorithm?
631 00:33:06,570 --> 00:33:11,350 And you say, T of n is the work that this algorithm does 632 00:33:11,350 --> 00:33:13,630 on input of size n. 633 00:33:13,630 --> 00:33:14,130 OK? 634 00:33:25,390 --> 00:33:28,550 Then I can write this equation. 635 00:33:31,310 --> 00:33:34,530 And this theta 1 corresponds to the two comparisons 636 00:33:34,530 --> 00:33:37,697 that you do looking at-- potentially the two comparisons 637 00:33:37,697 --> 00:33:39,280 that you do-- looking at the left hand 638 00:33:39,280 --> 00:33:41,440 side and the right hand side. 639 00:33:41,440 --> 00:33:44,580 So that's-- 2 is a constant, so that's why we put theta 1. 640 00:33:44,580 --> 00:33:45,200 All right? 641 00:33:45,200 --> 00:33:47,060 So you get a cushion, too. 642 00:33:47,060 --> 00:33:49,630 Watch out guys. 643 00:33:49,630 --> 00:33:50,780 Whoa! 644 00:33:50,780 --> 00:33:52,192 Oh actually that wasn't so bad. 645 00:33:52,192 --> 00:33:54,000 Good. 646 00:33:54,000 --> 00:33:55,620 Veers left, Eric. 647 00:33:55,620 --> 00:33:57,420 Veers left. 648 00:33:57,420 --> 00:34:03,360 So if you take this and you start expanding it, 649 00:34:03,360 --> 00:34:05,180 eventually you're going to get to the base 650 00:34:05,180 --> 00:34:12,091 case, which is T of 1 is theta 1. 651 00:34:12,091 --> 00:34:12,590 Right? 652 00:34:12,590 --> 00:34:16,580 Because if you have a one element array, 653 00:34:16,580 --> 00:34:19,650 it's just going to return that element as a peak. 654 00:34:19,650 --> 00:34:23,130 And so if you do that, and you expand it all the way out, 655 00:34:23,130 --> 00:34:31,080 then you can write T of n equals theta 1 plus theta 1. 656 00:34:31,080 --> 00:34:39,300 And you're going to do this log to the base 2 of n times. 657 00:34:39,300 --> 00:34:43,660 And adding these all up, gives you 658 00:34:43,660 --> 00:34:46,360 a complexity theta log 2 of n. 659 00:34:46,360 --> 00:34:48,330 Right?
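Written out, the expansion described above looks roughly like this, taking T(n) to be the work on an input of size n and assuming for simplicity that n is a power of 2:

```latex
\begin{aligned}
T(n) &= T(n/2) + \Theta(1), \qquad T(1) = \Theta(1) \\
     &= T(n/4) + \Theta(1) + \Theta(1) \\
     &\;\;\vdots \\
     &= \underbrace{\Theta(1) + \Theta(1) + \cdots + \Theta(1)}_{\log_2 n \text{ times}} \\
     &= \Theta(\log_2 n)
\end{aligned}
```

For n around 10 million, log base 2 of n is a little over 23, so the divide and conquer version makes only a couple of dozen halving steps.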
660 00:34:48,330 --> 00:34:53,089 So now you compare this with that. 661 00:34:53,089 --> 00:34:54,630 And there's really a huge difference. 662 00:34:54,630 --> 00:34:57,440 There's an exponential difference. 663 00:34:57,440 --> 00:35:01,860 If you coded up this algorithm in Python-- 664 00:35:01,860 --> 00:35:06,170 and I did-- both these algorithms for the 1D version-- 665 00:35:06,170 --> 00:35:14,160 and if you run it on n being 10 million or so, 666 00:35:14,160 --> 00:35:17,820 then this algorithm takes 13 seconds. 667 00:35:17,820 --> 00:35:18,320 OK? 668 00:35:18,320 --> 00:35:21,880 The-- The theta n algorithm takes 13 seconds. 669 00:35:21,880 --> 00:35:26,070 And this one takes 0.001 seconds. 670 00:35:26,070 --> 00:35:26,570 OK? 671 00:35:26,570 --> 00:35:27,929 Huge difference. 672 00:35:27,929 --> 00:35:30,345 So there is a big difference between theta n and theta log 673 00:35:30,345 --> 00:35:31,970 n. 674 00:35:31,970 --> 00:35:35,840 It's literally the difference between 2 raised to n, and n. 675 00:35:35,840 --> 00:35:40,120 It makes sense to try and reduce complexity 676 00:35:40,120 --> 00:35:43,000 as you can see, especially if you're 677 00:35:43,000 --> 00:35:44,450 talking about large inputs. 678 00:35:44,450 --> 00:35:45,390 All right? 679 00:35:45,390 --> 00:35:48,860 And you'll see that more clearly as we 680 00:35:48,860 --> 00:35:51,300 go to a 2D version of this problem. 681 00:35:51,300 --> 00:35:52,202 All right? 682 00:35:52,202 --> 00:35:53,910 So you can't really do better for the 1D. 683 00:35:53,910 --> 00:35:56,750 The 1D is a straightforward problem. 684 00:35:56,750 --> 00:35:58,500 It gets a little more interesting-- 685 00:35:58,500 --> 00:36:01,080 the problems get a little-- excuse me, 686 00:36:01,080 --> 00:36:03,600 the algorithms get a little more sophisticated 687 00:36:03,600 --> 00:36:08,340 when we look at a 2D version of peak finding. 688 00:36:08,340 --> 00:36:10,535 So let's talk about the 2D version.
689 00:36:15,810 --> 00:36:18,250 So as you can imagine in the 2D version 690 00:36:18,250 --> 00:36:20,715 you have a matrix, or a two dimensional array. 691 00:36:23,490 --> 00:36:29,575 And we'll say this thing has n rows and m columns. 692 00:36:34,700 --> 00:36:37,190 And now we have to define what a peak is. 693 00:36:37,190 --> 00:36:38,350 And it's a hill. 694 00:36:38,350 --> 00:36:41,540 It's the obvious definition of a peak. 695 00:36:41,540 --> 00:36:50,490 So if you had a in here, c, b, d, e. 696 00:36:50,490 --> 00:37:02,250 Then as you can guess, a is a 2D peak if, and only if, 697 00:37:02,250 --> 00:37:08,830 a greater than or equal to b; a greater than or equal to d, c 698 00:37:08,830 --> 00:37:10,061 and e. 699 00:37:10,061 --> 00:37:10,560 All right? 700 00:37:10,560 --> 00:37:12,230 So it's a little hill up there. 701 00:37:12,230 --> 00:37:12,730 All right? 702 00:37:12,730 --> 00:37:15,120 And again I've used the greater than or equal to here, 703 00:37:15,120 --> 00:37:18,490 so that's similar to the 1D in the case 704 00:37:18,490 --> 00:37:21,345 that you'll always find a peak in any 2D matrix. 705 00:37:23,960 --> 00:37:29,210 Now again I'll give you the straightforward algorithm, 706 00:37:29,210 --> 00:37:31,640 and we'll call it the Greedy Ascent algorithm. 707 00:37:41,660 --> 00:37:45,820 And the Greedy Ascent algorithm essentially picks a direction 708 00:37:45,820 --> 00:37:50,560 and, you know, tries to follow that direction in order 709 00:37:50,560 --> 00:37:52,770 to find a peak. 710 00:37:52,770 --> 00:38:01,840 So for example, if I had this particular-- 711 00:38:01,840 --> 00:38:10,790 --matrix; 14, 13, 12, 15, 9, 11, 17-- 712 00:38:17,010 --> 00:38:20,850 Then what might happen is if I started at some arbitrary 713 00:38:20,850 --> 00:38:23,360 midpoint-- So the Greedy Ascent algorithm 714 00:38:23,360 --> 00:38:26,210 has to make choices as to where to start. 
715 00:38:26,210 --> 00:38:29,142 Just like we had different cases here, 716 00:38:29,142 --> 00:38:31,100 you have to make a choice as to where to start. 717 00:38:31,100 --> 00:38:32,770 You might want to start in the middle, 718 00:38:32,770 --> 00:38:35,560 and you might want to work your way left first. 719 00:38:35,560 --> 00:38:38,380 Or you're going to all-- You just keep going left, 720 00:38:38,380 --> 00:38:39,720 or keep going right. 721 00:38:39,720 --> 00:38:42,340 And if you hit an edge, you go down. 722 00:38:42,340 --> 00:38:46,450 So you make some choices as to what the default traversal 723 00:38:46,450 --> 00:38:47,810 directions are. 724 00:38:47,810 --> 00:38:50,820 And so if you say you want to start with 12, 725 00:38:50,820 --> 00:38:54,050 you are going to go look for something to the left. 726 00:38:54,050 --> 00:38:58,470 And if it's greater than, you're going to follow that direction. 727 00:38:58,470 --> 00:39:00,950 If it's not, if it's less, then you're 728 00:39:00,950 --> 00:39:04,200 going to go in the other direction, in this case, 729 00:39:04,200 --> 00:39:05,160 for example. 730 00:39:05,160 --> 00:39:13,120 So in this case you'll go to 12, 13, 14, 15, 16, 17, 19, 731 00:39:13,120 --> 00:39:14,230 and 20. 732 00:39:14,230 --> 00:39:17,765 And you'd find-- You'd find this peak. 733 00:39:17,765 --> 00:39:21,680 Now I haven't given you the specific details 734 00:39:21,680 --> 00:39:23,750 of a Greedy Ascent algorithm.
735 00:39:23,750 --> 00:39:33,400 But I think if you look at the worst case possibilities 736 00:39:33,400 --> 00:39:36,370 here, with respect to a given matrix, 737 00:39:36,370 --> 00:39:38,920 and for any given starting point, 738 00:39:38,920 --> 00:39:43,270 and for any given strategy-- in terms of choosing left first, 739 00:39:43,270 --> 00:39:48,630 versus right first, or down first versus up first-- 740 00:39:48,630 --> 00:39:51,370 you will have a situation where-- just 741 00:39:51,370 --> 00:39:55,450 like we had in the 1D case-- you may end up 742 00:39:55,450 --> 00:40:02,015 touching a large fraction of the elements in this 2D array. 743 00:40:02,015 --> 00:40:02,750 OK? 744 00:40:02,750 --> 00:40:05,190 So in this case, we ended up, you know, 745 00:40:05,190 --> 00:40:06,890 touching a bunch of different elements. 746 00:40:06,890 --> 00:40:10,529 And it's quite possible that you could end up touching-- 747 00:40:10,529 --> 00:40:12,820 starting from the midpoint-- you could end up touching half 748 00:40:12,820 --> 00:40:16,990 the elements, and in some cases, touching all the elements. 749 00:40:16,990 --> 00:40:23,000 So if you do a worst case analysis of this algorithm-- 750 00:40:23,000 --> 00:40:25,410 a particular algorithm with particular choices in terms 751 00:40:25,410 --> 00:40:30,370 of the starting point and the direction of search-- 752 00:40:30,370 --> 00:40:33,750 a Greedy Ascent algorithm would have theta n m complexity. 753 00:40:33,750 --> 00:40:34,320 All right? 754 00:40:34,320 --> 00:40:42,480 And in the case where n equals m, or m equals n, 755 00:40:42,480 --> 00:40:44,840 you'd have theta n squared complexity. 756 00:40:44,840 --> 00:40:46,290 OK? 757 00:40:46,290 --> 00:40:48,440 I won't spend very much time on this, 758 00:40:48,440 --> 00:40:52,150 because I want to talk to you about the divide 759 00:40:52,150 --> 00:40:58,020 and conquer versions of this algorithm for the 2D peak.
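Since the lecture deliberately leaves the details of Greedy Ascent open, here is one possible member of that family as a Python sketch. Everything specific in it is an assumption made for illustration: the default start at the middle cell, the fixed neighbor order (left, right, down, up), the rule of moving to the first strictly larger neighbor, and the name `greedy_ascent`.

```python
def greedy_ascent(matrix, start=None):
    """One hypothetical Greedy Ascent variant for 2D peak finding.

    From the current cell, step to the first neighbor (tried in the order
    left, right, down, up) that is strictly larger; stop when none is.
    Values strictly increase along the walk, so it always terminates.
    """
    n, m = len(matrix), len(matrix[0])
    i, j = start if start is not None else (n // 2, m // 2)
    while True:
        moved = False
        for di, dj in ((0, -1), (0, 1), (1, 0), (-1, 0)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < m and matrix[ni][nj] > matrix[i][j]:
                i, j = ni, nj
                moved = True
                break
        if not moved:
            return i, j  # no strictly larger neighbor: (i, j) is a 2D peak
```

On a matrix whose values increase along a long winding path, a walk like this can visit a constant fraction of all n times m cells, which is where the theta n m worst case comes from.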
760 00:40:58,020 --> 00:41:00,860 But hopefully you're all with me with respect 761 00:41:00,860 --> 00:41:03,530 to what the worst case complexity is. 762 00:41:03,530 --> 00:41:04,990 All right? 763 00:41:04,990 --> 00:41:06,070 People buy that? 764 00:41:06,070 --> 00:41:06,570 Yeah. 765 00:41:06,570 --> 00:41:07,390 Question back there. 766 00:41:07,390 --> 00:41:09,264 AUDIENCE: Can you-- Is that an approximation? 767 00:41:09,264 --> 00:41:14,630 Or can you actually get to n times m traversals? 768 00:41:14,630 --> 00:41:18,780 PROFESSOR: So there are specific Greedy Ascent algorithms, 769 00:41:18,780 --> 00:41:21,680 and specific matrices where, if I give you 770 00:41:21,680 --> 00:41:24,680 the code for the algorithm, and I give you a specific matrix, 771 00:41:24,680 --> 00:41:28,200 that I could make you touch all of these elements. 772 00:41:28,200 --> 00:41:28,870 That's correct. 773 00:41:28,870 --> 00:41:30,600 So we're talking about worst case. 774 00:41:30,600 --> 00:41:32,260 You're being very paranoid when you 775 00:41:32,260 --> 00:41:34,540 talk about worst case complexity. 776 00:41:34,540 --> 00:41:38,800 And so I'm-- hand waving a bit here, 777 00:41:38,800 --> 00:41:41,150 simply because I haven't given you the specifics 778 00:41:41,150 --> 00:41:42,150 of the algorithm yet. 779 00:41:42,150 --> 00:41:42,650 Right? 780 00:41:42,650 --> 00:41:44,669 This is really a set of algorithms, 781 00:41:44,669 --> 00:41:46,210 because I haven't given you the code, 782 00:41:46,210 --> 00:41:47,668 I haven't told you where it starts, 783 00:41:47,668 --> 00:41:49,050 and which direction it goes. 784 00:41:49,050 --> 00:41:52,380 But you go, do that, fix it, and I 785 00:41:52,380 --> 00:41:55,380 would be the person who tries to find the worst case complexity. 786 00:41:55,380 --> 00:41:58,250 Certainly it's very easy to get to theta n 787 00:41:58,250 --> 00:42:03,140 m in terms of having some constant multiplying n times m.
788 00:42:03,140 --> 00:42:05,810 But you can definitely get to that constant 789 00:42:05,810 --> 00:42:08,520 being very close to 1. 790 00:42:08,520 --> 00:42:09,910 OK? 791 00:42:09,910 --> 00:42:11,350 If not 1. 792 00:42:11,350 --> 00:42:12,190 All right. 793 00:42:12,190 --> 00:42:14,480 So let's talk about divide and conquer. 794 00:42:14,480 --> 00:42:18,720 And let's say that I did something 795 00:42:18,720 --> 00:42:22,770 like this, where I just tried to jam the binary search 796 00:42:22,770 --> 00:42:26,340 algorithm into the 2D version. 797 00:42:26,340 --> 00:42:26,840 All right? 798 00:42:37,780 --> 00:42:43,830 So what I'm going to do is-- 799 00:42:43,830 --> 00:42:55,430 --I'm going to pick the middle column, j equals m over 2. 800 00:42:55,430 --> 00:43:00,710 And I'm going to find a 1D peak using 801 00:43:00,710 --> 00:43:01,810 whatever algorithm I want. 802 00:43:01,810 --> 00:43:04,820 And I'll probably end up using the more efficient algorithm, 803 00:43:04,820 --> 00:43:07,850 the binary search version that's gone 804 00:43:07,850 --> 00:43:10,530 all the way to the left of the board there. 805 00:43:10,530 --> 00:43:14,000 And let's say I find a 1D peak at (i, j). 806 00:43:14,000 --> 00:43:17,060 Because I've picked a column, and I'm just finding a 1D peak. 807 00:43:20,320 --> 00:43:23,550 So this is j equals m over 2. 808 00:43:23,550 --> 00:43:25,690 That's i. 809 00:43:25,690 --> 00:43:29,850 Now I use (i,j). 810 00:43:29,850 --> 00:43:38,730 In particular row i as a start-- 811 00:43:38,730 --> 00:43:42,310 --to find a 1D peak on row i. 812 00:43:47,470 --> 00:43:50,041 And I stand up here, I'm really happy. 813 00:43:50,041 --> 00:43:50,540 OK? 814 00:43:50,540 --> 00:43:53,440 Because I say, wow. 815 00:43:53,440 --> 00:43:56,850 I picked a middle column, I found a 1D peak, 816 00:43:56,850 --> 00:44:01,350 that is theta m complexity to find a 1D peak as we argued.
817 00:44:01,350 --> 00:44:06,665 And one side-- the theta m-- 818 00:44:06,665 --> 00:44:07,659 AUDIENCE: Log n. 819 00:44:07,659 --> 00:44:08,700 PROFESSOR: Oh, I'm sorry. 820 00:44:08,700 --> 00:44:09,730 You're right. 821 00:44:09,730 --> 00:44:13,490 The log n complexity, that's what this was. 822 00:44:13,490 --> 00:44:15,031 So I do have that here. 823 00:44:15,031 --> 00:44:15,530 Yeah. 824 00:44:15,530 --> 00:44:16,470 Log n complexity. 825 00:44:16,470 --> 00:44:18,920 Thanks, Eric. 826 00:44:18,920 --> 00:44:26,130 And then once I do that, I can find a 1D peak on row i. 827 00:44:26,130 --> 00:44:28,690 In this case row i would be m wide, 828 00:44:28,690 --> 00:44:30,630 so it would be log m complexity. 829 00:44:30,630 --> 00:44:33,840 If n equals m, then I have a couple of steps of log n, 830 00:44:33,840 --> 00:44:35,050 and I'm done. 831 00:44:35,050 --> 00:44:36,030 All right? 832 00:44:36,030 --> 00:44:38,320 Am I done? 833 00:44:38,320 --> 00:44:39,640 No. 834 00:44:39,640 --> 00:44:42,770 Can someone tell me why I'm not done? 835 00:44:42,770 --> 00:44:43,270 Precisely? 836 00:44:43,270 --> 00:44:43,947 Yep. 837 00:44:43,947 --> 00:44:46,841 AUDIENCE: Because when you do the second part 838 00:44:46,841 --> 00:44:50,155 to find the peak in row i, you might not 839 00:44:50,155 --> 00:44:52,987 have that column peak-- There might not 840 00:44:52,987 --> 00:44:54,320 be a peak on the column anymore. 841 00:44:54,320 --> 00:44:56,240 PROFESSOR: That's exactly correct. 842 00:44:56,240 --> 00:44:59,280 So this algorithm is incorrect. 843 00:44:59,280 --> 00:44:59,780 OK? 844 00:44:59,780 --> 00:45:01,460 It doesn't work. 845 00:45:01,460 --> 00:45:04,380 It's efficient, but incorrect. 846 00:45:04,380 --> 00:45:05,390 OK? 847 00:45:05,390 --> 00:45:07,215 It's-- You want to be correct. 
848 00:45:07,215 --> 00:45:09,640 You know being correct and inefficient 849 00:45:09,640 --> 00:45:13,580 is definitely better than being inefficient-- I'm sorry. 850 00:45:13,580 --> 00:45:15,790 Being incorrect and efficient. 851 00:45:15,790 --> 00:45:17,870 So this is an efficient algorithm, 852 00:45:17,870 --> 00:45:22,077 in the sense that it will only take log n time, 853 00:45:22,077 --> 00:45:22,910 but it doesn't work. 854 00:45:22,910 --> 00:45:25,620 And I'll give you a simple example 855 00:45:25,620 --> 00:45:27,650 here where it doesn't work. 856 00:45:32,490 --> 00:45:35,680 The problem is-- 857 00:45:35,680 --> 00:45:39,960 --a 2D peak-- 858 00:45:39,960 --> 00:45:44,150 --may not exist-- 859 00:45:44,150 --> 00:45:46,090 --on row i. 860 00:45:46,090 --> 00:45:47,700 And here's an example of that. 861 00:45:53,640 --> 00:45:58,360 Actually this is-- This is exactly the example of that. 862 00:45:58,360 --> 00:46:02,690 Let's say that I started with this row. 863 00:46:02,690 --> 00:46:05,057 Since it's-- I'm starting with the middle row, 864 00:46:05,057 --> 00:46:06,890 and I could start with this one or that one. 865 00:46:06,890 --> 00:46:10,640 Let's say I started with that one. 866 00:46:10,640 --> 00:46:16,350 I end up finding a peak. 867 00:46:16,350 --> 00:46:22,330 And if this were 10 up here, I'd choose 12 as a peak. 868 00:46:22,330 --> 00:46:25,856 And it's quite possible that I return 12 as a peak. 869 00:46:25,856 --> 00:46:27,900 Even though 19 is bigger, because 12 870 00:46:27,900 --> 00:46:30,370 is a peak given 10 and 11 up here. 871 00:46:30,370 --> 00:46:33,060 And then when I choose this particular row, 872 00:46:33,060 --> 00:46:36,720 and I find a peak on this row, it would be 14. 873 00:46:36,720 --> 00:46:38,870 That is a 1D peak on this row. 874 00:46:38,870 --> 00:46:41,840 But 14 is not a 2D peak. 875 00:46:41,840 --> 00:46:42,790 OK? 876 00:46:42,790 --> 00:46:47,402 So this particular example would return 14.
877 00:46:47,402 --> 00:46:50,740 And 14 is not a 2D peak. 878 00:46:50,740 --> 00:46:53,730 All right? 879 00:46:53,730 --> 00:46:57,460 You can collect your cushion after the class. 880 00:46:57,460 --> 00:47:01,880 So not so good. 881 00:47:01,880 --> 00:47:05,430 Looks like an efficient algorithm, but doesn't work. 882 00:47:05,430 --> 00:47:06,180 All right? 883 00:47:06,180 --> 00:47:09,290 So how can we get to something that actually works? 884 00:47:09,290 --> 00:47:14,300 So the last algorithm that I'm going to show you-- 885 00:47:14,300 --> 00:47:16,920 And you'll see four different algorithms in your problem 886 00:47:16,920 --> 00:47:21,260 set-- 887 00:47:21,260 --> 00:47:24,340 --that you'll have to analyze the complexity for and decide 888 00:47:24,340 --> 00:47:28,180 if they're efficient, and if they're correct. 889 00:47:28,180 --> 00:47:33,440 But here's a-- a recursive version 890 00:47:33,440 --> 00:47:37,650 that is better, in terms of complexity, 891 00:47:37,650 --> 00:47:40,120 than the Greedy Ascent algorithm. 892 00:47:40,120 --> 00:47:43,410 And this one works. 893 00:47:43,410 --> 00:47:46,470 So what I'm going to do is pick a middle column. 894 00:47:49,750 --> 00:47:51,435 j equals m over 2 as before. 895 00:47:54,050 --> 00:48:02,480 I'm going to find the global maximum on column j. 896 00:48:05,316 --> 00:48:06,690 And that's going to be at (i, j). 897 00:48:09,580 --> 00:48:18,230 I'm going to compare (i comma j minus 1), (i comma j), 898 00:48:18,230 --> 00:48:20,440 and (i,j plus 1). 899 00:48:20,440 --> 00:48:23,620 Which means that once I've found the maximum in this column, 900 00:48:23,620 --> 00:48:25,890 I'm going to look to the left and the right, 901 00:48:25,890 --> 00:48:27,920 and compare. 902 00:48:27,920 --> 00:48:30,825 I'm going to pick the left columns. 903 00:48:33,520 --> 00:48:40,890 If (i comma j minus 1) is greater than (i comma j)-- 904 00:48:40,890 --> 00:48:42,420 and similarly for the right.
905 00:48:49,490 --> 00:48:55,720 And if in fact neither of these two conditions 906 00:48:55,720 --> 00:49:00,210 fires, and what I have is (i comma j) 907 00:49:00,210 --> 00:49:04,280 is greater than or equal to (i comma j minus 1) 908 00:49:04,280 --> 00:49:07,630 and (i comma j plus 1), then I'm done. 909 00:49:07,630 --> 00:49:12,760 Just like I had for the 1D version. 910 00:49:12,760 --> 00:49:17,500 If (i comma j) is greater than or equal to (i comma 911 00:49:17,500 --> 00:49:26,350 j minus 1), and (i comma j plus 1), that implies (i, j) 912 00:49:26,350 --> 00:49:28,591 is a 2D peak. 913 00:49:28,591 --> 00:49:29,212 OK? 914 00:49:29,212 --> 00:49:30,670 And the reason that is the case, is 915 00:49:30,670 --> 00:49:35,902 because (i comma j) was the maximum element in that column. 916 00:49:35,902 --> 00:49:37,360 So you know that you've compared it 917 00:49:37,360 --> 00:49:41,520 to all of the adjacent elements, looking up and looking down, 918 00:49:41,520 --> 00:49:43,000 that's the maximum element. 919 00:49:43,000 --> 00:49:45,150 Now you've looked at the left and the right, 920 00:49:45,150 --> 00:49:47,750 and in fact it's greater than or equal to the elements 921 00:49:47,750 --> 00:49:49,110 on the left and the right. 922 00:49:49,110 --> 00:49:51,290 And so therefore it's a 2D peak. 923 00:49:51,290 --> 00:49:52,270 OK? 924 00:49:52,270 --> 00:49:57,710 So in this case, when you pick the left or the right columns-- 925 00:49:57,710 --> 00:49:59,570 you'll pick one of them-- you're going 926 00:49:59,570 --> 00:50:08,025 to solve the new problem with half the number of columns. 927 00:50:16,540 --> 00:50:17,580 All right? 928 00:50:17,580 --> 00:50:20,965 And again, you have to go through an analysis, 929 00:50:20,965 --> 00:50:24,950 or an argument, to make sure that this algorithm is correct.
930 00:50:24,950 --> 00:50:29,740 But it's intuitively correct, simply because it matches 931 00:50:29,740 --> 00:50:33,190 the 1D version much more closely. 932 00:50:33,190 --> 00:50:37,870 And you also have your condition where you break away right 933 00:50:37,870 --> 00:50:41,160 here, where you have a 2D peak, just like the 1D version. 934 00:50:41,160 --> 00:50:43,930 And what you've done is break this matrix up 935 00:50:43,930 --> 00:50:46,190 into half the size. 936 00:50:46,190 --> 00:50:51,090 And that's essentially why this algorithm works. 937 00:50:51,090 --> 00:50:55,806 When you have a single column-- 938 00:51:01,070 --> 00:51:09,610 --find the global maximum and you're done. 939 00:51:09,610 --> 00:51:10,110 All right? 940 00:51:10,110 --> 00:51:12,570 So that's the base case. 941 00:51:12,570 --> 00:51:14,670 So let me end with just writing out 942 00:51:14,670 --> 00:51:17,870 what the recurrence relation for the complexity of this 943 00:51:17,870 --> 00:51:22,481 is, and argue what the overall complexity of this algorithm 944 00:51:22,481 --> 00:51:22,980 is. 945 00:51:25,221 --> 00:51:26,720 And then I'll give you the bad news. 946 00:51:30,781 --> 00:51:31,280 All right. 947 00:51:31,280 --> 00:51:36,260 So overall what you have is, you have something like T of (n, m) 948 00:51:36,260 --> 00:51:42,570 equals T of (n, m over 2) plus theta n. 949 00:51:42,570 --> 00:51:43,640 Why is that? 950 00:51:43,640 --> 00:51:47,830 Well n is the number of rows, m is the number of columns. 951 00:51:47,830 --> 00:51:51,430 In one case you'll be breaking things down 952 00:51:51,430 --> 00:51:54,630 into half the number of columns, which is m over 2. 953 00:51:54,630 --> 00:51:57,520 And in order to find the global maximum, 954 00:51:57,520 --> 00:52:00,220 you'll be doing theta n work, because you're 955 00:52:00,220 --> 00:52:01,495 finding the global maximum. 956 00:52:01,495 --> 00:52:01,995 Right?
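[Editor's sketch] The column-halving algorithm described above, including the single-column base case, might be written roughly as follows. This is a minimal sketch, not the problem-set code: the function name `find_2d_peak` and the `lo`/`hi` column bounds are assumptions, and the grid is assumed to be a non-empty list of equal-length rows.

```python
def find_2d_peak(grid, lo=0, hi=None):
    """Return (i, j) of a 2D peak, searching columns lo..hi inclusive.

    Hypothetical helper: the name and the lo/hi parameters are not from
    the lecture; they stand in for "the left columns" / "the right columns".
    """
    if hi is None:
        hi = len(grid[0]) - 1

    # Pick the middle column of the current range.
    j = (lo + hi) // 2

    # Theta(n) scan: row index of the global maximum on column j.
    i = max(range(len(grid)), key=lambda r: grid[r][j])

    # Recurse on the left columns if the left neighbor is larger.
    if j > lo and grid[i][j - 1] > grid[i][j]:
        return find_2d_peak(grid, lo, j - 1)

    # Similarly for the right columns.
    if j < hi and grid[i][j + 1] > grid[i][j]:
        return find_2d_peak(grid, j + 1, hi)

    # Neither condition fired: (i, j) is the column maximum and is >= both
    # horizontal neighbors in range, so it is returned as the peak. With a
    # single column (lo == hi) this is just the global maximum of that column.
    return i, j
```

Each call scans one column of n entries and then discards at least half of the remaining columns, which is where the recurrence T(n, m) = T(n, m/2) + Θ(n) comes from.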
957 00:52:01,995 --> 00:52:05,270 You just have to scan it-- 958 00:52:05,270 --> 00:52:08,840 that's what it's going to take. 959 00:52:08,840 --> 00:52:11,960 And so if you do that, and you go run it through-- 960 00:52:11,960 --> 00:52:16,210 and you know that T of (n, 1) is theta n-- which 961 00:52:16,210 --> 00:52:20,880 is this last part over here-- that's your base case. 962 00:52:20,880 --> 00:52:28,560 You get T of (n, m) is theta of n added to theta of n, 963 00:52:28,560 --> 00:52:34,820 log of m times-- log 2 of m times. 964 00:52:34,820 --> 00:52:42,250 Which is theta of n log 2 of m. 965 00:52:42,250 --> 00:52:43,640 All right? 966 00:52:43,640 --> 00:52:48,120 So you're not done with peak finding. 967 00:52:48,120 --> 00:52:53,082 What you'll see is four algorithms coded in Python-- 968 00:52:53,082 --> 00:52:55,290 I'm not going to give away what those algorithms are, 969 00:52:55,290 --> 00:52:57,090 but you'll have to recognize them. 970 00:52:57,090 --> 00:53:00,180 You will have seen versions of those algorithms 971 00:53:00,180 --> 00:53:01,850 already in lecture. 972 00:53:01,850 --> 00:53:06,210 And your job is going to be to analyze the algorithms, as I 973 00:53:06,210 --> 00:53:09,690 said before, prove that one of them is correct, 974 00:53:09,690 --> 00:53:12,784 and find counter-examples for the ones that aren't correct. 975 00:53:12,784 --> 00:53:14,200 The course staff will stick around 976 00:53:14,200 --> 00:53:17,110 here to answer questions-- logistical questions-- 977 00:53:17,110 --> 00:53:18,990 or questions about lecture. 978 00:53:18,990 --> 00:53:21,650 And I owe that gentleman a cushion.
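[Editor's note] The recurrence stated in the lecture can be unrolled step by step to reach the theta of n log m bound. The following is a sketch that assumes m is a power of 2, so that halving reaches a single column after exactly log 2 of m steps:

```latex
\begin{align*}
T(n, m) &= T(n, m/2) + \Theta(n) \\
        &= T(n, m/4) + \Theta(n) + \Theta(n) \\
        &\;\;\vdots \\
        &= T(n, 1) + \underbrace{\Theta(n) + \cdots + \Theta(n)}_{\log_2 m \text{ times}} \\
        &= \Theta(n) + \Theta(n \log_2 m) \\
        &= \Theta(n \log m) \qquad (m \ge 2).
\end{align*}
```

Here n is the number of rows, m is the number of columns, and T(n, 1) = Θ(n) is the single-column base case.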