1 00:00:01,040 --> 00:00:03,460 The following content is provided under a Creative 2 00:00:03,460 --> 00:00:04,870 Commons license. 3 00:00:04,870 --> 00:00:07,910 Your support will help MIT OpenCourseWare continue to 4 00:00:07,910 --> 00:00:11,560 offer high quality educational resources for free. 5 00:00:11,560 --> 00:00:14,460 To make a donation, or view additional materials from 6 00:00:14,460 --> 00:00:20,290 hundreds of MIT courses, visit MIT OpenCourseWare at 7 00:00:20,290 --> 00:00:21,540 ocw.mit.edu. 8 00:00:24,708 --> 00:00:28,230 PROFESSOR: OK, I want to remind you that there's a quiz 9 00:00:28,230 --> 00:00:29,480 one week from today. 10 00:00:32,060 --> 00:00:34,390 Yeah, I know it's soon. 11 00:00:34,390 --> 00:00:40,500 Open book, open notes, no computing or communication 12 00:00:40,500 --> 00:00:41,750 devices allowed. 13 00:00:44,050 --> 00:00:49,220 Between now and then, probably tomorrow in fact, or at least 14 00:00:49,220 --> 00:00:52,920 over the weekend, I'll send out a summary of what I think 15 00:00:52,920 --> 00:00:54,980 we've covered so far and what you'll be 16 00:00:54,980 --> 00:00:58,210 responsible for in the quiz. 17 00:00:58,210 --> 00:01:03,160 Roughly speaking, it's anything covered in lectures, 18 00:01:03,160 --> 00:01:06,850 problem sets, or recitations. 19 00:01:06,850 --> 00:01:09,280 I will also post some practice questions that 20 00:01:09,280 --> 00:01:10,880 you can work on. 21 00:01:10,880 --> 00:01:15,160 And I'll tell you now that we will not be posting answers to 22 00:01:15,160 --> 00:01:17,910 the practice questions. 23 00:01:17,910 --> 00:01:23,260 Instead, we'll be holding some quiz reviews. 24 00:01:23,260 --> 00:01:24,510 OK. 25 00:01:26,460 --> 00:01:30,060 I wanted to cover two different topics today. 26 00:01:30,060 --> 00:01:34,190 The first topic is just a tiny bit on floating 27 00:01:34,190 --> 00:01:37,030 point numbers in Python. 
28 00:01:37,030 --> 00:01:40,120 But in fact, what I'm going to tell you is true about all 29 00:01:40,120 --> 00:01:42,342 programming languages-- 30 00:01:42,342 --> 00:01:44,730 in fact, all computers really. 31 00:01:44,730 --> 00:01:47,655 And then after that we'll spend most of the lecture on 32 00:01:47,655 --> 00:01:50,470 the topic of debugging. 33 00:01:50,470 --> 00:01:54,030 So let me start with a quick review of binary numbers. 34 00:01:54,030 --> 00:01:56,940 Because you have to understand binary numbers to understand 35 00:01:56,940 --> 00:01:59,030 floating point. 36 00:01:59,030 --> 00:02:01,200 So when you first learned about numbers, you learned 37 00:02:01,200 --> 00:02:03,060 about base 10. 38 00:02:03,060 --> 00:02:06,430 And you learned that a decimal number is represented by some 39 00:02:06,430 --> 00:02:10,534 combination of the digits 0 through 9, the rightmost place 40 00:02:10,534 --> 00:02:14,200 is the 10 to the 0 place, and then it's the 10 to the 1 41 00:02:14,200 --> 00:02:18,180 place, the 10 to the 2 place, et cetera. 42 00:02:18,180 --> 00:02:23,790 So for example, the number 302 or the digits 3-0-2 represent 43 00:02:23,790 --> 00:02:30,530 3 times 100, plus 0 times 10, plus 2 times 1. 44 00:02:30,530 --> 00:02:32,000 Duh. 45 00:02:32,000 --> 00:02:36,300 All right, binary numbers are exactly the same except we 46 00:02:36,300 --> 00:02:39,680 only have two digits to choose from. 47 00:02:39,680 --> 00:02:44,490 Typically written as 0 and 1 and everything is represented 48 00:02:44,490 --> 00:02:47,290 by a sequence of those digits. 49 00:02:47,290 --> 00:02:53,600 The rightmost place is 2 to the 0, the next place is 2 to 50 00:02:53,600 --> 00:02:59,160 the 1, 2 to the 2, 2 to the 3, et cetera. 51 00:02:59,160 --> 00:03:04,730 So for example, if we look at the binary number 1-0-1, we 52 00:03:04,730 --> 00:03:14,400 see that's equal to 1 times 4, plus 0 times 2, plus 53 00:03:14,400 --> 00:03:17,180 1 times 1, or 5.
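The place-value rule just described is the same in both bases, and it can be written down as a short Python sketch. The helper name `positional_value` is made up for this illustration; the built-in `int(text, base)` does the same conversion.

```python
def positional_value(digits, base):
    """Sum digit * base**place, where place 0 is the rightmost digit."""
    total = 0
    for place, digit in enumerate(reversed(digits)):
        total += int(digit) * base ** place
    return total


print(positional_value("302", 10))  # 3*100 + 0*10 + 2*1
print(positional_value("101", 2))   # 1*4 + 0*2 + 1*1
print(int("101", 2))                # built-in conversion, same result
```

The only thing that changes between decimal and binary is the `base` argument; the loop is identical.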
54 00:03:20,350 --> 00:03:24,080 So one of the first things we'll notice is binary numbers 55 00:03:24,080 --> 00:03:27,030 take a lot more digits to represent them, or take more 56 00:03:27,030 --> 00:03:29,760 digits than decimal numbers. 57 00:03:29,760 --> 00:03:37,380 In fact, if I give you n digits, n binary digits, how 58 00:03:37,380 --> 00:03:39,810 many different binary numbers can I represent 59 00:03:39,810 --> 00:03:41,060 with those n digits? 60 00:03:47,653 --> 00:03:52,350 Well, if I gave you n decimal digits, how many different 61 00:03:52,350 --> 00:03:53,580 numbers can I represent? 62 00:03:53,580 --> 00:03:55,390 How many different values can I represent? 63 00:03:55,390 --> 00:03:56,310 AUDIENCE: 10 to the n. 64 00:03:56,310 --> 00:03:57,065 PROFESSOR: Pardon? 65 00:03:57,065 --> 00:03:58,712 AUDIENCE: 10 to the n. 66 00:03:58,712 --> 00:03:59,800 PROFESSOR: 10 to the n. 67 00:03:59,800 --> 00:04:02,705 And so, for a binary number it's going to be 2 to the n. 68 00:04:06,980 --> 00:04:10,620 That's important, because we'll see as we get to talking 69 00:04:10,620 --> 00:04:14,380 about the complexity of various algorithms how long 70 00:04:14,380 --> 00:04:18,300 they take to run, or how much space they use, we'll 71 00:04:18,300 --> 00:04:22,930 frequently be resorting to arguments of this sort to 72 00:04:22,930 --> 00:04:26,220 understand them. 73 00:04:26,220 --> 00:04:31,440 Now the reason floating point numbers cause problems for 74 00:04:31,440 --> 00:04:36,960 programmers is that people have learned to 75 00:04:36,960 --> 00:04:39,390 think in base 10. 76 00:04:39,390 --> 00:04:44,000 Computers do everything in base 2, and that causes a 77 00:04:44,000 --> 00:04:47,310 cognitive dissonance sometimes. 78 00:04:47,310 --> 00:04:49,460 Where people are thinking one thing, and the computer is 79 00:04:49,460 --> 00:04:53,980 doing something slightly different. 80 00:04:53,980 --> 00:04:58,370 So why do people work in base 10? 
81 00:04:58,370 --> 00:04:59,440 I don't know. 82 00:04:59,440 --> 00:05:02,190 Maybe it's because we have 10 fingers, but we 83 00:05:02,190 --> 00:05:03,500 also have 10 toes. 84 00:05:03,500 --> 00:05:06,620 So why didn't we work in base 20? 85 00:05:06,620 --> 00:05:08,670 We have one head, I don't know why. 86 00:05:08,670 --> 00:05:12,350 But we do it, we work in base 10. 87 00:05:12,350 --> 00:05:16,360 I do know why computers work in base 2. 88 00:05:16,360 --> 00:05:19,390 And that's because it's easy to build switches in 89 00:05:19,390 --> 00:05:21,480 electronic hardware. 90 00:05:21,480 --> 00:05:25,450 A switch is some physical device that has only two 91 00:05:25,450 --> 00:05:29,430 possible positions, on or off. 92 00:05:29,430 --> 00:05:33,700 We can build very efficient switches in hardware and so 93 00:05:33,700 --> 00:05:39,210 it's easy to represent a number as a sequence of on and 94 00:05:39,210 --> 00:05:44,120 off bits, which is either on or off. 95 00:05:44,120 --> 00:05:48,920 Originally they were relays, then they became transistors, 96 00:05:48,920 --> 00:05:51,090 now they're something altogether different. 97 00:05:51,090 --> 00:05:55,540 But, what they all had in common was they were stable in 98 00:05:55,540 --> 00:05:58,770 the off position, they were stable in the on position, and 99 00:05:58,770 --> 00:06:01,240 they never had to get in between. 100 00:06:01,240 --> 00:06:06,140 Hence, we represent everything in computers in binary. 101 00:06:06,140 --> 00:06:11,630 So now let's think about why that causes some confusion. 102 00:06:11,630 --> 00:06:17,140 And it does only for fractional numbers. 103 00:06:17,140 --> 00:06:22,680 So for whole numbers binary and decimal it doesn't matter. 104 00:06:22,680 --> 00:06:27,690 Ints are never confusing, they sort of do what God told us 105 00:06:27,690 --> 00:06:31,800 integers should do, or whoever told us integers. 
106 00:06:31,800 --> 00:06:35,480 All right, but now let's look at other things. 107 00:06:35,480 --> 00:06:46,450 So I want to start by looking at the decimal number 0.125. 108 00:06:46,450 --> 00:06:50,280 What's that as a fraction, by the way? 109 00:06:50,280 --> 00:06:52,145 Happens to be one what? 110 00:06:52,145 --> 00:06:52,700 AUDIENCE: 1/8. 111 00:06:52,700 --> 00:06:56,010 PROFESSOR: 1/8, we'll see why that actually 112 00:06:56,010 --> 00:06:58,550 matters in a minute. 113 00:06:58,550 --> 00:07:03,760 So, what does it mean, in some sense, in decimal? 114 00:07:03,760 --> 00:07:06,708 It's equal to 1 times 10 to the minus 1 plus 2 times 10 to 115 00:07:06,708 --> 00:07:07,958 the minus 2 plus 5 times 10 to the minus 3. 116 00:07:23,800 --> 00:07:26,510 So it works exactly the same way that things work on the 117 00:07:26,510 --> 00:07:31,530 other side of, in this case, the decimal point. 118 00:07:31,530 --> 00:07:35,070 Suppose we want to represent it in binary. 119 00:07:35,070 --> 00:07:38,150 So instead of a decimal point, we have a binary point. 120 00:07:41,040 --> 00:07:42,450 What does it look like then? 121 00:07:47,930 --> 00:07:56,550 Well it's equal to what? 122 00:07:56,550 --> 00:07:58,170 1 times-- 123 00:07:58,170 --> 00:08:00,550 if it's 1/8, what's it going to be? 124 00:08:00,550 --> 00:08:00,920 1 times what? 125 00:08:00,920 --> 00:08:03,320 AUDIENCE: 1 times 2 to the minus 3. 126 00:08:03,320 --> 00:08:04,570 PROFESSOR: 2 to the minus 3. 127 00:08:07,880 --> 00:08:14,722 Or, 0.001. 128 00:08:14,722 --> 00:08:17,110 Right? 129 00:08:17,110 --> 00:08:20,140 So, so far, so good. 130 00:08:20,140 --> 00:08:23,450 Not much difference between the two. 131 00:08:23,450 --> 00:08:26,960 Now let's take a different decimal number. 132 00:08:26,960 --> 00:08:35,490 What about the decimal 0.1?
133 00:08:35,490 --> 00:08:38,549 I have to tell you that it's a decimal because it could also 134 00:08:38,549 --> 00:08:41,690 be a binary with just 0's and 1's. 135 00:08:41,690 --> 00:08:45,100 Well, we know how to represent that in decimal. 136 00:08:49,430 --> 00:08:50,680 How about in binary? 137 00:08:53,430 --> 00:08:54,660 What's the equivalent? 138 00:08:54,660 --> 00:08:57,020 Now that's 1/10, of course. 139 00:08:57,020 --> 00:08:59,010 What does 1/10 look like in binary? 140 00:09:04,650 --> 00:09:05,900 Any takers? 141 00:09:10,900 --> 00:09:12,610 Well I'll give you a hint. 142 00:09:12,610 --> 00:09:18,078 It's so long, that I don't want to write it on the board. 143 00:09:23,540 --> 00:09:32,340 In fact, it's worse than long, it's infinite. 144 00:09:32,340 --> 00:09:35,630 I guess that's kind of long. 145 00:09:35,630 --> 00:09:44,100 It's this repeating binary fraction. 146 00:09:44,100 --> 00:09:49,580 There is no finite combination of binary digits that 147 00:09:49,580 --> 00:09:52,295 represent the decimal fraction 1/10. 148 00:09:55,510 --> 00:09:56,880 There's no way to do it. 149 00:09:59,940 --> 00:10:04,000 And that's why things get a little hairy. 150 00:10:04,000 --> 00:10:06,495 So we can stop at some finite number of bits. 151 00:10:09,185 --> 00:10:13,420 And in fact that's what happens in the internal 152 00:10:13,420 --> 00:10:16,630 representation in Python. 153 00:10:16,630 --> 00:10:22,090 It ends up representing 1/10 as something equivalent to 154 00:10:22,090 --> 00:10:25,490 this decimal fraction. 155 00:10:25,490 --> 00:10:31,390 If I take the number of binary bits that are inside the 156 00:10:31,390 --> 00:10:36,290 computer, and then I translate it back to decimal, it turns 157 00:10:36,290 --> 00:10:40,280 out that it's using this approximation for the decimal 158 00:10:40,280 --> 00:10:41,530 fraction 1/10. 
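The contrast between 1/8 and 1/10 can be checked directly. This is a minimal sketch that assumes the usual case where Python floats are IEEE 754 doubles; it uses the standard `fractions` and `decimal` modules to show the exact value that is actually stored.

```python
from decimal import Decimal
from fractions import Fraction

# 0.125 is 1 * 2**-3, so it is stored exactly.
print(Fraction(0.125))   # 1/8
print((0.125).hex())     # 0x1.0000000000000p-3

# 0.1 has no finite binary expansion, so the stored value is only close to 1/10.
print(Fraction(0.1) == Fraction(1, 10))   # False
print(Decimal(0.1))      # exact decimal expansion of the stored bits
print((0.1).hex())       # 0x1.999999999999ap-4, the repeating pattern cut off
```

The repeating 9s in the hexadecimal form are the repeating binary fraction mentioned above, truncated and rounded at the last available bit.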
159 00:10:44,780 --> 00:10:49,050 So for example, some of you in your problem sets -- where you 160 00:10:49,050 --> 00:10:54,060 were computing how much you had to pay on a credit card -- 161 00:10:54,060 --> 00:10:57,110 would get answers that were eventually off by a penny or 162 00:10:57,110 --> 00:11:00,290 something from what we expected in some, and that has 163 00:11:00,290 --> 00:11:04,350 to do with the fact that you were thinking in decimal. 164 00:11:04,350 --> 00:11:08,450 And in fact, you were writing your program in decimal, yet 165 00:11:08,450 --> 00:11:11,940 internally things were happening in binary, and when 166 00:11:11,940 --> 00:11:15,630 you thought you were writing 1/10 for example you were 167 00:11:15,630 --> 00:11:20,920 actually getting something like this inside the computer. 168 00:11:20,920 --> 00:11:24,870 Pretty close to 1/10, but not exactly 1/10. 169 00:11:29,350 --> 00:11:37,110 Now, when we print it, we get yet something else because the 170 00:11:37,110 --> 00:11:41,670 print statement uses an internal function that by 171 00:11:41,670 --> 00:11:46,700 default rounds these things to 17 digits. 172 00:11:46,700 --> 00:11:51,370 And so you end up getting something like that, or you 173 00:11:51,370 --> 00:11:53,560 might depending how you do it. 174 00:11:53,560 --> 00:11:55,270 So let's look at an example here. 175 00:12:07,090 --> 00:12:15,820 So I can do something like this, and it prints that 176 00:12:15,820 --> 00:12:19,130 because it's doing some rounding for me. 177 00:12:19,130 --> 00:12:25,430 But if I really look at what's under there, and look at the 178 00:12:25,430 --> 00:12:30,530 representation, the REPR function is convenient to get 179 00:12:30,530 --> 00:12:35,890 a sense of what's really going on inside, it tells me that 180 00:12:35,890 --> 00:12:40,420 well that's a 17-digit approximation. 181 00:12:40,420 --> 00:12:43,650 And now so that's what's really lurking there. 
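For readers following along in a current Python 3 interpreter, here is a hedged sketch of the same experiment. One difference to be aware of: modern versions of `repr` print the shortest string that converts back to the same float, so `repr(0.1)` shows `0.1` rather than 17 digits. The 17-digit view described above can still be requested explicitly with `format`.

```python
x = 0.1

print(x)                   # default display
print(repr(x))             # shortest string that round-trips to the same float
print(format(x, ".17g"))   # 17 significant digits: 0.10000000000000001
print(format(x, ".25f"))   # more digits of the value actually stored
```

Whichever form is printed, the stored value is the same approximation; only the number of digits shown changes.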
182 00:12:43,650 --> 00:12:47,530 So a hint, If you think something is going funny 183 00:12:47,530 --> 00:12:52,460 because of the way arithmetic is working, instead of just 184 00:12:52,460 --> 00:13:00,020 using print, you can use print of REPR to get a better idea 185 00:13:00,020 --> 00:13:02,780 about what's really going on. 186 00:13:02,780 --> 00:13:06,840 All right, now, does this matter? 187 00:13:06,840 --> 00:13:09,420 Usually it doesn't. 188 00:13:09,420 --> 00:13:12,260 Most of the time it's safe just to pretend that floating 189 00:13:12,260 --> 00:13:15,970 points work the way you learned about arithmetic when 190 00:13:15,970 --> 00:13:21,770 you were in third grade, or probably in kindergarten if 191 00:13:21,770 --> 00:13:24,820 you were educated in Europe or Asia. 192 00:13:24,820 --> 00:13:29,160 But now let's look at an example where 193 00:13:29,160 --> 00:13:31,470 you can get in trouble. 194 00:13:31,470 --> 00:13:33,970 So I've got a little program here. 195 00:13:33,970 --> 00:13:38,480 I initialize x to 0, then I'm going to go through a loop a 196 00:13:38,480 --> 00:13:44,570 lot of times, where I increment x by 1/10. 197 00:13:44,570 --> 00:13:47,180 And then I'm going to print x. 198 00:13:47,180 --> 00:13:49,920 And because it's going to do automatic rounding, it's going 199 00:13:49,920 --> 00:13:53,890 to print 10,000-- 200 00:13:53,890 --> 00:14:00,470 or actually, it should print 100,000, right? 201 00:14:00,470 --> 00:14:02,380 No 10,000, because I'm only incrementing it by 202 00:14:02,380 --> 00:14:04,450 1/10, excuse me. 203 00:14:04,450 --> 00:14:10,570 But then I'm going to print REPR of x, and then I'm going 204 00:14:10,570 --> 00:14:13,000 to do a comparison. 205 00:14:13,000 --> 00:14:16,280 Now if floating point arithmetic worked the way 206 00:14:16,280 --> 00:14:23,370 reals work, we would think that 10.0 times x should equal 207 00:14:23,370 --> 00:14:26,200 the number of iterations. 
208 00:14:26,200 --> 00:14:29,710 Because I'm starting at 0, each time I'm incrementing it 209 00:14:29,710 --> 00:14:35,790 by 1/10, and so if I multiply the result by 10 at the end, I 210 00:14:35,790 --> 00:14:38,920 should get the same as the number of iterations. 211 00:14:38,920 --> 00:14:41,270 Does that make sense to everybody? 212 00:14:41,270 --> 00:14:47,530 That's what you would normally get if you did this with 213 00:14:47,530 --> 00:14:48,320 pencil and paper. 214 00:14:48,320 --> 00:14:52,550 Of course, it would take you a really long time to do 100,000 215 00:14:52,550 --> 00:14:53,990 increments. 216 00:14:53,990 --> 00:14:55,240 Let's give it a shot. 217 00:14:58,090 --> 00:15:01,460 And what we'll see is that if I print it, it 218 00:15:01,460 --> 00:15:04,100 looks OK, it's 10,000. 219 00:15:04,100 --> 00:15:10,570 But if I print REPR of it, I see it's 10,000, a bunch of 220 00:15:10,570 --> 00:15:15,260 0's, and then 18848. 221 00:15:15,260 --> 00:15:19,650 And, of course, consequently when I compare it, I get 222 00:15:19,650 --> 00:15:22,280 something that says false. 223 00:15:22,280 --> 00:15:35,230 And that's because if I look at REPR of 10.0 times x-- 224 00:15:35,230 --> 00:15:41,420 well, that's interesting, what's going on here? 225 00:15:41,420 --> 00:15:43,500 It kind of looks like the same thing, doesn't it? 226 00:15:47,140 --> 00:15:50,490 But it's not, because way out there are some other digits 227 00:15:50,490 --> 00:15:53,440 we're not seeing, something different is happening. 228 00:15:57,190 --> 00:16:00,070 OK, what's the moral of this? 229 00:16:00,070 --> 00:16:01,960 It's not complicated. 230 00:16:01,960 --> 00:16:06,500 It's not, OK write your programs thinking deeply about 231 00:16:06,500 --> 00:16:10,380 what's going on in those bits way out there at the end. 232 00:16:10,380 --> 00:16:14,380 It's, don't ever test whether two floating point numbers are equal 233 00:16:14,380 --> 00:16:17,300 to each other.
234 00:16:17,300 --> 00:16:20,130 Instead, do something like this. 235 00:16:28,370 --> 00:16:32,920 Define a function called 'close', or whatever you want, 236 00:16:32,920 --> 00:16:37,760 that takes two floats and some epsilon. 237 00:16:37,760 --> 00:16:41,290 And I've given here epsilon a default value. 238 00:16:41,290 --> 00:16:44,330 And then just return whether the absolute value of x minus 239 00:16:44,330 --> 00:16:47,460 y is less than epsilon. 240 00:16:47,460 --> 00:16:49,880 So whenever you're comparing two floating numbers, the 241 00:16:49,880 --> 00:16:53,890 question shouldn't be are they identical, but are they close 242 00:16:53,890 --> 00:16:56,370 enough for your purposes. 243 00:16:56,370 --> 00:16:59,810 And if you do that, then you don't get tripped up by this 244 00:16:59,810 --> 00:17:01,780 kind of rounding and things like that. 245 00:17:05,359 --> 00:17:08,960 Not a complicated story, but keeping this in mind will get 246 00:17:08,960 --> 00:17:11,050 you out of trouble when you're doing floating point 247 00:17:11,050 --> 00:17:13,190 arithmetic. 248 00:17:13,190 --> 00:17:15,209 Let's run this, and see what happens. 249 00:17:18,660 --> 00:17:21,910 And indeed, they're not equal but they're good enough, close 250 00:17:21,910 --> 00:17:23,160 enough if you will. 251 00:17:25,650 --> 00:17:26,900 OK. 252 00:17:29,480 --> 00:17:32,660 One of the dangers, the reason this went wrong, is these 253 00:17:32,660 --> 00:17:35,770 little differences can accumulate if you go through a 254 00:17:35,770 --> 00:17:37,620 lot of iterations. 255 00:17:37,620 --> 00:17:40,990 Sometimes they balance out, sometimes it rounds up, 256 00:17:40,990 --> 00:17:43,870 sometimes it rounds down, but not always. 257 00:17:43,870 --> 00:17:46,180 So very simple answer. 258 00:17:46,180 --> 00:17:49,920 Just don't get caught up in this problem of 259 00:17:49,920 --> 00:17:51,780 floating point numbers. 
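The program and the 'close' function described above are shown on screen in the lecture but do not appear in the transcript, so what follows is a reconstruction rather than the original code. The default epsilon of 0.00001 and the variable names are assumptions made for this sketch.

```python
import math


def close(x, y, epsilon=0.00001):
    """Return True if x and y differ by less than epsilon (assumed default)."""
    return abs(x - y) < epsilon


num_iters = 100000
x = 0.0
for _ in range(num_iters):
    x += 0.1   # each addition carries a tiny rounding error

print(x)                            # slightly more than 10000
print(10.0 * x == num_iters)        # False: exact equality fails
print(close(10.0 * x, num_iters))   # True: close enough for our purposes

# Python 3.5 and later also ship math.isclose, which compares with a
# relative tolerance by default.
print(math.isclose(10.0 * x, num_iters))
```

Choosing epsilon is a judgment about the problem at hand: it should be larger than the rounding error you expect to accumulate and smaller than any difference you actually care about.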
260 00:17:51,780 --> 00:17:53,030 All right, any questions about that? 261 00:17:55,960 --> 00:17:58,540 All right, yes. 262 00:17:58,540 --> 00:18:00,990 AUDIENCE: Doesn't it change for Python 2.7? 263 00:18:00,990 --> 00:18:05,400 It's only returning 0.1 and not 0.100000. 264 00:18:05,400 --> 00:18:06,380 PROFESSOR: In 2.7? 265 00:18:06,380 --> 00:18:07,360 AUDIENCE: Yeah. 266 00:18:07,360 --> 00:18:08,610 PROFESSOR: Don't know, sorry. 267 00:18:12,560 --> 00:18:15,460 But the moral remains the same. 268 00:18:15,460 --> 00:18:18,110 Whatever is going on, don't test floating point numbers 269 00:18:18,110 --> 00:18:23,170 for equality because you'll have a high probability of 270 00:18:23,170 --> 00:18:26,274 getting false, when you should get true. 271 00:18:26,274 --> 00:18:27,260 OK. 272 00:18:27,260 --> 00:18:30,480 You almost never get true when you should get false. 273 00:18:30,480 --> 00:18:32,890 I now want to move on, if there are no more 274 00:18:32,890 --> 00:18:34,140 questions, to debugging. 275 00:18:37,220 --> 00:18:40,140 I never know when to give this lecture in the term. 276 00:18:40,140 --> 00:18:44,920 So what I usually do is I wait until the volume of email, and 277 00:18:44,920 --> 00:18:48,750 complaints, and office hours builds, and I realize people 278 00:18:48,750 --> 00:18:51,590 are ready to learn more about debugging. 279 00:18:51,590 --> 00:18:54,970 If I do it too early, people don't pay any attention 280 00:18:54,970 --> 00:18:57,490 because they don't realize it's a problem. 281 00:18:57,490 --> 00:19:00,960 And if I do it too late, they get irritated with me because 282 00:19:00,960 --> 00:19:02,950 they say well why didn't you tell me this earlier in the 283 00:19:02,950 --> 00:19:05,950 semester when it would've done me some good. 284 00:19:05,950 --> 00:19:07,900 So, I pick a time. 285 00:19:07,900 --> 00:19:11,070 And right now it looks like the need has built up enough 286 00:19:11,070 --> 00:19:13,840 that it's worth doing.
287 00:19:13,840 --> 00:19:18,640 There's a very charming urban legend about how the process 288 00:19:18,640 --> 00:19:23,320 of fixing flaws in software came to be known as debugging. 289 00:19:23,320 --> 00:19:25,860 It's one of those stories that's so nice that you just 290 00:19:25,860 --> 00:19:27,910 want it to be true. 291 00:19:27,910 --> 00:19:30,440 So let's look at this story, because it's fun. 292 00:19:45,370 --> 00:19:49,950 All right, what you see on the screen now is a photo of a 293 00:19:49,950 --> 00:19:54,760 book now at the Smithsonian Museum, of the lab book from 294 00:19:54,760 --> 00:19:59,170 the group working on the Mark II Aiken Relay computer at 295 00:19:59,170 --> 00:20:02,450 Harvard University. 296 00:20:02,450 --> 00:20:03,030 Pardon? 297 00:20:03,030 --> 00:20:06,210 Oh, I see it on my screen, now you see it on your screen. 298 00:20:06,210 --> 00:20:07,960 Thank you. 299 00:20:07,960 --> 00:20:10,100 So there it is. 300 00:20:10,100 --> 00:20:17,150 It was September 9, 1947, even before I was born, it 301 00:20:17,150 --> 00:20:20,190 was that long ago. 302 00:20:20,190 --> 00:20:22,040 And so you can see that they're running their 303 00:20:22,040 --> 00:20:27,130 computer, and they started to do an arctan computation, and 304 00:20:27,130 --> 00:20:31,460 it's kind of interesting that they started it at 8 o'clock 305 00:20:31,460 --> 00:20:33,790 in the morning, and it ran for two 306 00:20:33,790 --> 00:20:36,470 hours, and then it stopped. 307 00:20:36,470 --> 00:20:38,960 Wow, to do an arctan. 308 00:20:38,960 --> 00:20:42,060 Tells you something about how fast this computer was. 309 00:20:42,060 --> 00:20:43,550 Then it went on. 310 00:20:43,550 --> 00:20:49,150 Then they started the cosine tape, and started to do a 311 00:20:49,150 --> 00:20:52,220 multiple adder, and then something bad happened. 312 00:20:57,470 --> 00:20:58,720 It stopped working. 313 00:21:00,850 --> 00:21:02,100 Whoops. 
314 00:21:04,680 --> 00:21:07,450 All right, hold on a second. 315 00:21:16,730 --> 00:21:19,550 And they spent a long time trying to find out why it 316 00:21:19,550 --> 00:21:21,220 stopped working. 317 00:21:21,220 --> 00:21:25,040 And then they found out the problem. 318 00:21:25,040 --> 00:21:30,750 They found a moth stuck between one of the relays. 319 00:21:30,750 --> 00:21:34,140 So it had electromechanical relays for their switches, the 320 00:21:34,140 --> 00:21:35,480 on and off. 321 00:21:35,480 --> 00:21:39,010 And they were debugging, they didn't call it debugging. 322 00:21:39,010 --> 00:21:41,540 And they found the software had failed because the 323 00:21:41,540 --> 00:21:44,620 hardware had failed, and the hardware had failed because a 324 00:21:44,620 --> 00:21:47,710 bug had been stuck in one of the relays. 325 00:21:47,710 --> 00:21:51,490 They debugged it, as in removed the moth, and the 326 00:21:51,490 --> 00:21:54,770 program ran to successful completion. 327 00:21:54,770 --> 00:21:57,280 And as you can see, the comment was written in this 328 00:21:57,280 --> 00:22:01,830 book, first actual case of a bug being found. 329 00:22:01,830 --> 00:22:05,610 Hence, we call it debugging. 330 00:22:05,610 --> 00:22:10,490 This was, by the way, Grace Murray Hopper's lab book. 331 00:22:10,490 --> 00:22:15,190 She is often described as the first programmer. 332 00:22:15,190 --> 00:22:16,590 It's unclear if that's true. 333 00:22:16,590 --> 00:22:19,340 What is true, she was the first female 334 00:22:19,340 --> 00:22:21,700 Admiral in the US Navy. 335 00:22:21,700 --> 00:22:23,980 She was a Navy programmer who eventually rose 336 00:22:23,980 --> 00:22:25,230 to the rank of Admiral. 337 00:22:27,700 --> 00:22:29,590 So it's a charming story that this is 338 00:22:29,590 --> 00:22:31,140 why we call it debugging. 339 00:22:31,140 --> 00:22:33,720 Turns out it's not at all true. 
340 00:22:33,720 --> 00:22:38,580 That the phrase debugging had been used for a long time, and 341 00:22:38,580 --> 00:22:42,840 could easily be traced back to the 1800s when people were 342 00:22:42,840 --> 00:22:45,820 writing books about electronics and talking about 343 00:22:45,820 --> 00:22:48,780 debugging even in those days. 344 00:22:48,780 --> 00:22:52,310 And in fact, you can go back to Shakespeare who talks about 345 00:22:52,310 --> 00:22:57,940 a bugbear, meaning something causing needless exercise, 346 00:22:57,940 --> 00:23:01,550 needless or excessive fear or anxiety. 347 00:23:01,550 --> 00:23:04,760 Well that's a good description of a bug. 348 00:23:04,760 --> 00:23:09,020 And he actually called it a bug when he had Hamlet 349 00:23:09,020 --> 00:23:09,530 [UNINTELLIGIBLE] 350 00:23:09,530 --> 00:23:12,680 about to bugs and goblins in my life. 351 00:23:12,680 --> 00:23:15,770 All right, so I want to start now-- 352 00:23:15,770 --> 00:23:21,230 oh by the way, just for fun, this is what the Mark II 353 00:23:21,230 --> 00:23:23,640 looked like. 354 00:23:23,640 --> 00:23:27,940 This was the computer that took an hour or so to do an arctan. 355 00:23:27,940 --> 00:23:29,440 You see it filled-- 356 00:23:29,440 --> 00:23:31,430 maybe it's a little hard to see in this light-- 357 00:23:31,430 --> 00:23:33,340 but you can see it filled an entire room. 358 00:23:35,960 --> 00:23:37,610 Quite amazing. 359 00:23:37,610 --> 00:23:42,120 And, here's a picture of Admiral Hopper and some 360 00:23:42,120 --> 00:23:45,080 unidentified male. 361 00:23:45,080 --> 00:23:48,100 All right, if anyone knows who this is, it would be good to know 362 00:23:48,100 --> 00:23:50,670 so I can update my archives. 363 00:23:50,670 --> 00:23:54,280 All right, so now on to some 364 00:23:54,280 --> 00:23:57,090 practical aspects of debugging. 365 00:23:57,090 --> 00:23:59,180 The first thing I want to do is dispel 366 00:23:59,180 --> 00:24:01,660 some myths about debugging.
367 00:24:01,660 --> 00:24:05,030 There is this myth that bugs crawl 368 00:24:05,030 --> 00:24:07,700 unbidden into our programs. 369 00:24:07,700 --> 00:24:11,770 That we write perfect programs and somehow a bug just sneaks 370 00:24:11,770 --> 00:24:15,360 in, and ruins perfection. 371 00:24:15,360 --> 00:24:16,650 That's not true. 372 00:24:16,650 --> 00:24:18,200 In fact, if there's a bug in your 373 00:24:18,200 --> 00:24:21,260 program, you put it there. 374 00:24:21,260 --> 00:24:23,820 So it would be almost better not to call it a bug, which 375 00:24:23,820 --> 00:24:27,890 sort of sounds like it's not our fault, but it's a mistake, 376 00:24:27,890 --> 00:24:29,850 it's a screw up. 377 00:24:29,850 --> 00:24:31,630 So get that through your head. 378 00:24:31,630 --> 00:24:35,250 Similarly bugs do not breed in programs. 379 00:24:35,250 --> 00:24:38,600 If there are multiple bugs in your program, it's not because 380 00:24:38,600 --> 00:24:42,400 a couple of them got together and procreated, it's because 381 00:24:42,400 --> 00:24:43,650 you made a lot of mistakes. 382 00:24:46,590 --> 00:24:47,780 Keep that in mind. 383 00:24:47,780 --> 00:24:51,290 With that in mind, we should think about what the goal of 384 00:24:51,290 --> 00:24:52,900 debugging is-- 385 00:24:52,900 --> 00:25:14,250 and it's not to eliminate one bug quickly, it is to move 386 00:25:14,250 --> 00:25:16,070 towards a bug-free program. 387 00:25:21,730 --> 00:25:32,290 And I say this because it's not always the same strategy 388 00:25:32,290 --> 00:25:36,894 that you would follow for these different goals. 389 00:25:36,894 --> 00:25:40,660 And I also carefully say to move towards a bug-free 390 00:25:40,660 --> 00:25:45,010 program because truth be told we are hardly ever sure 391 00:25:45,010 --> 00:25:48,290 that we have no bugs left. 392 00:25:48,290 --> 00:25:50,840 Debugging is a learned skill. 393 00:25:50,840 --> 00:25:51,690 Don't despair.
394 00:25:51,690 --> 00:25:55,220 Nobody does it well instinctively. 395 00:25:55,220 --> 00:25:59,170 Evolution did not train us to be debuggers. 396 00:25:59,170 --> 00:26:02,290 So a large part, probably the largest part in many ways, of 397 00:26:02,290 --> 00:26:04,520 learning to be a good programmer 398 00:26:04,520 --> 00:26:06,850 is learning to debug. 399 00:26:06,850 --> 00:26:11,840 And what that has to do with is thinking systematically and 400 00:26:11,840 --> 00:26:16,990 efficiently about how to move towards a bug-free program. 401 00:26:16,990 --> 00:26:21,230 The good news is that it's not hard to learn, and it is a 402 00:26:21,230 --> 00:26:24,170 largely transferable skill. 403 00:26:24,170 --> 00:26:28,580 The same skills you use to debug software can be used to 404 00:26:28,580 --> 00:26:31,540 debug laboratory experiments. 405 00:26:31,540 --> 00:26:35,390 I actually give lectures sometimes to physicians about 406 00:26:35,390 --> 00:26:38,230 how to debug patients. 407 00:26:38,230 --> 00:26:40,350 How to use debugging techniques to find out what's 408 00:26:40,350 --> 00:26:42,350 wrong with people when they're sick. 409 00:26:42,350 --> 00:26:46,200 It's a very good and useful life skill. 410 00:26:46,200 --> 00:26:51,490 Now for four decades, maybe five decades, people have been 411 00:26:51,490 --> 00:26:56,450 building tools called debuggers-- 412 00:26:56,450 --> 00:27:00,432 and you'll find that built into IDLE there is a debugger-- 413 00:27:00,432 --> 00:27:04,320 that are designed to help people find out why their 414 00:27:04,320 --> 00:27:08,400 programs don't work, and fix them. 415 00:27:08,400 --> 00:27:13,300 Personally, I almost never use one. 416 00:27:13,300 --> 00:27:16,710 The tools are not that important. 417 00:27:16,710 --> 00:27:19,870 What's important is the skill of the 418 00:27:19,870 --> 00:27:22,170 craftsman, in this case.
419 00:27:22,170 --> 00:27:26,030 And in fact, most of the experienced programmers I know 420 00:27:26,030 --> 00:27:27,365 rely on print statements. 421 00:27:29,940 --> 00:27:34,930 So it's OK to use a debugger but I think the best debugging 422 00:27:34,930 --> 00:27:37,030 tool is print. 423 00:27:37,030 --> 00:27:41,550 And I have to say I've been surprised-- 424 00:27:41,550 --> 00:27:43,770 that's a mild word here-- 425 00:27:43,770 --> 00:27:48,670 at how few print statements you guys seem to use. 426 00:27:48,670 --> 00:27:52,820 I get these emails, or the staff gets these emails, kind 427 00:27:52,820 --> 00:27:56,500 of plaintive, why doesn't my program work? 428 00:27:56,500 --> 00:27:59,050 And then there's a little piece of code. 429 00:27:59,050 --> 00:28:02,150 And the answer I send back-- when I reply before one of the 430 00:28:02,150 --> 00:28:04,760 TAs does, and they usually get there first-- 431 00:28:04,760 --> 00:28:07,380 is usually, put in a print statement here 432 00:28:07,380 --> 00:28:09,600 and see what happens. 433 00:28:09,600 --> 00:28:12,300 And I'm just amazed that when the code arrives it doesn't 434 00:28:12,300 --> 00:28:15,490 have these statements in it. 435 00:28:15,490 --> 00:28:18,630 My favorite response was when I sent an email to a student, 436 00:28:18,630 --> 00:28:23,345 who shall go nameless, and he-- or maybe it was a she-- 437 00:28:23,345 --> 00:28:25,350 and I said, insert a print statement here 438 00:28:25,350 --> 00:28:26,010 and see what happens. 439 00:28:26,010 --> 00:28:30,040 And I got back a reply saying, no I don't need a 440 00:28:30,040 --> 00:28:31,930 print statement here, I know what the value of this 441 00:28:31,930 --> 00:28:34,220 variable is. 442 00:28:34,220 --> 00:28:38,550 Well, you know, my reply was that if all the values were 443 00:28:38,550 --> 00:28:40,680 what you thought they were, you wouldn't be sending an 444 00:28:40,680 --> 00:28:43,180 email saying, why doesn't my program work.
445 00:28:43,180 --> 00:28:46,330 Put in the darn print statement and see what happens. 446 00:28:46,330 --> 00:28:49,310 And then I got a gracious email back saying, more or 447 00:28:49,310 --> 00:28:52,490 less, oops, I see. 448 00:28:52,490 --> 00:28:57,230 But please, when you send us some code, you want some help, 449 00:28:57,230 --> 00:29:00,265 send us code with some print statements already in it to at 450 00:29:00,265 --> 00:29:01,770 least show us that you've tried to 451 00:29:01,770 --> 00:29:04,880 find the bug yourself. 452 00:29:04,880 --> 00:29:07,160 All right, so what we're essentially doing when we 453 00:29:07,160 --> 00:29:11,890 insert print statements in our code is searching for the 454 00:29:11,890 --> 00:29:14,020 place in our program where things have gone awry. 455 00:29:16,710 --> 00:29:22,420 And the key to being a good debugger is to be systematic 456 00:29:22,420 --> 00:29:24,580 in this search. 457 00:29:24,580 --> 00:29:27,530 So you saw that when we looked at algorithms for things like 458 00:29:27,530 --> 00:29:29,800 exhaustive enumeration. 459 00:29:29,800 --> 00:29:32,620 We said, well if we're searching for an answer, we 460 00:29:32,620 --> 00:29:36,480 have to search the space carefully one at a time. 461 00:29:36,480 --> 00:29:39,120 And then we said, if we want to search it efficiently, 462 00:29:39,120 --> 00:29:41,930 maybe instead of starting at the beginning and just going 463 00:29:41,930 --> 00:29:46,260 to the end, we should use something like binary search. 464 00:29:46,260 --> 00:29:49,960 The same techniques can be used when you're 465 00:29:49,960 --> 00:29:51,210 searching for bugs. 466 00:29:58,660 --> 00:30:04,240 So I recommend searching for bugs using some approximation 467 00:30:04,240 --> 00:30:05,490 to binary search.
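[Editor's sketch] The idea of binary search over a program can be illustrated roughly as follows. This is a minimal sketch, not something from the lecture: the helper `first_bad_step`, the toy `steps` list, and the check `is_ok` are all hypothetical, and it assumes that once the program state goes bad it stays bad.

```python
def first_bad_step(steps, initial, is_ok):
    """Return the index of the first step that makes the state bad, or None.

    Hypothetical helper. Assumes is_ok(initial) is True and that a bad
    state stays bad through all later steps.
    """
    # states[i] is the program state after running the first i steps,
    # i.e. what a print statement placed after step i would show.
    states = [initial]
    for step in steps:
        states.append(step(states[-1]))
    if is_ok(states[-1]):
        return None  # final state is fine, nothing to search for
    lo, hi = 0, len(steps)  # invariant: states[lo] is ok, states[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2  # look about halfway through
        if is_ok(states[mid]):
            lo = mid  # the bug is below this point
        else:
            hi = mid  # the bug is at or above this point
    return hi - 1

# Toy program: the running value is supposed to stay non-negative.
steps = [
    lambda v: v + 1,
    lambda v: v * 2,
    lambda v: v - 10,  # the planted mistake
    lambda v: v + 3,
]
is_ok = lambda v: v >= 0
print(first_bad_step(steps, 0, is_ok))
```

In practice the "states" are whatever you choose to print at a point halfway through the code, and the check is your own expectation of what should be there.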
468 00:30:11,820 --> 00:30:16,180 And we'll see an example of this as we go forward, but as 469 00:30:16,180 --> 00:30:21,340 we look at the example what I want you to think about is 470 00:30:21,340 --> 00:30:24,340 what are we searching for? 471 00:30:24,340 --> 00:30:27,400 We know our program doesn't work. 472 00:30:27,400 --> 00:30:35,680 So the question that I like to ask, is not why didn't it 473 00:30:35,680 --> 00:30:38,960 produce the answer I wanted it to? 474 00:30:38,960 --> 00:30:42,030 But, how could it have done what it had done? 475 00:31:01,700 --> 00:31:04,070 This is a subtly different question. 476 00:31:04,070 --> 00:31:07,810 And it's usually a much easier question to answer. 477 00:31:07,810 --> 00:31:10,170 Not why didn't it do the right thing, 478 00:31:10,170 --> 00:31:12,200 but here it did something. 479 00:31:12,200 --> 00:31:13,590 So I already know what it did. 480 00:31:13,590 --> 00:31:16,780 And I say, I didn't expect it to do that, so 481 00:31:16,780 --> 00:31:18,860 why did it do that? 482 00:31:18,860 --> 00:31:22,620 Once I know why it did what it did, it's usually pretty easy 483 00:31:22,620 --> 00:31:24,500 to think how to fix it. 484 00:31:27,590 --> 00:31:29,635 So that's the first question I ask. 485 00:31:33,230 --> 00:31:37,980 I then go about it using something akin to the 486 00:31:37,980 --> 00:31:40,250 scientific method, which we all learned 487 00:31:40,250 --> 00:31:42,710 about many years ago. 488 00:31:42,710 --> 00:31:50,310 And basically the scientific method is based upon studying 489 00:31:50,310 --> 00:31:51,560 available data. 490 00:32:00,690 --> 00:32:09,080 The data you have is of course the program text itself, the 491 00:32:09,080 --> 00:32:13,770 test results, you ran some tests and got the wrong answer 492 00:32:13,770 --> 00:32:18,190 which is why you knew you had a bug. 
493 00:32:18,190 --> 00:32:23,240 And then you can probe it, you can change the test results by 494 00:32:23,240 --> 00:32:28,110 using print statements so that you have more data to study. 495 00:32:28,110 --> 00:32:31,840 Keep in mind that you don't understand this program, 496 00:32:31,840 --> 00:32:33,090 because if you did it would work. 497 00:32:35,600 --> 00:32:47,640 Once I study this, I form a hypothesis that at least I 498 00:32:47,640 --> 00:32:49,655 think is consistent with the data. 499 00:32:56,750 --> 00:33:08,595 And then I go and design and run a repeatable experiment. 500 00:33:12,728 --> 00:33:15,570 And I want to emphasize the word repeatable, here. 501 00:33:24,580 --> 00:33:28,200 And again the key thing, as with the scientific method: 502 00:33:28,200 --> 00:33:34,330 the experiment, to be useful, must have the potential to 503 00:33:34,330 --> 00:33:36,370 refute the hypothesis. 504 00:33:46,290 --> 00:33:49,870 Why might repeatability be an issue? 505 00:33:49,870 --> 00:33:54,650 Well, as we'll see pretty soon, a lot of programs 506 00:33:54,650 --> 00:33:58,070 involve randomness. 507 00:33:58,070 --> 00:34:02,620 Where you're doing something equivalent to flipping a coin, 508 00:34:02,620 --> 00:34:05,690 somewhere in the program which might come up heads or tails, 509 00:34:05,690 --> 00:34:08,110 and the program would do different things. 510 00:34:08,110 --> 00:34:10,420 We'll see why that's an important programming 511 00:34:10,420 --> 00:34:12,320 technique soon. 512 00:34:12,320 --> 00:34:15,960 And once you do that, you can get different results with 513 00:34:15,960 --> 00:34:17,210 different runs. 514 00:34:19,460 --> 00:34:25,139 More subtly there can be various kinds of timing errors 515 00:34:25,139 --> 00:34:27,570 deep down in the operating system where you have multiple 516 00:34:27,570 --> 00:34:30,120 activities going on at the same time.
517 00:34:30,120 --> 00:34:33,429 This is usually the reason that you'll see say, Windows 518 00:34:33,429 --> 00:34:38,389 crash, or Word, or PowerPoint, or something else. 519 00:34:38,389 --> 00:34:42,580 Because there's some timing error that occurs sometimes. 520 00:34:42,580 --> 00:34:46,350 And probably most commonly, because there's human input. 521 00:34:46,350 --> 00:34:47,770 Somebody typed something and they might 522 00:34:47,770 --> 00:34:50,719 type something different. 523 00:34:50,719 --> 00:34:55,440 So one of the things you want to do when you're systematic 524 00:34:55,440 --> 00:34:59,090 is make sure that you can replay things. 525 00:34:59,090 --> 00:35:02,780 And we'll talk more about this when we get to randomness, 526 00:35:02,780 --> 00:35:06,190 about how we go about doing that. 527 00:35:06,190 --> 00:35:08,570 All right, now let's try and put this all together in a 528 00:35:08,570 --> 00:35:10,870 little program. 529 00:35:10,870 --> 00:35:14,600 If you've been studying your handout, as at least one of 530 00:35:14,600 --> 00:35:20,000 the TAs did, you've been kind of mystified by the fact that 531 00:35:20,000 --> 00:35:22,230 there's a pretty crummy looking program in it. 532 00:35:26,090 --> 00:35:30,300 And unlike sometimes when I make mistakes I don't know 533 00:35:30,300 --> 00:35:31,310 I've made, here I 534 00:35:31,310 --> 00:35:34,100 intentionally made some mistakes. 535 00:35:34,100 --> 00:35:37,380 So let's look at this program. 536 00:35:37,380 --> 00:35:41,980 I wrote a function called isPal that takes in a 537 00:35:41,980 --> 00:35:46,190 list and is intended to return true if the list is a 538 00:35:46,190 --> 00:35:49,390 palindrome and false otherwise.
539 00:35:49,390 --> 00:35:52,680 Then I wrote this little program called Silly that uses 540 00:35:52,680 --> 00:36:00,000 isPal, takes in a number, requests that the user make 541 00:36:00,000 --> 00:36:05,910 that many inputs, then calls isPal to find out whether or 542 00:36:05,910 --> 00:36:08,020 not the resultant list is a palindrome. 543 00:36:10,750 --> 00:36:12,900 Not too complicated. 544 00:36:12,900 --> 00:36:14,630 But now let's run it. 545 00:36:22,560 --> 00:36:25,240 Do Silly of 5. 546 00:36:34,670 --> 00:36:37,140 And it tells me 'abcde' is a palindrome. 547 00:36:39,670 --> 00:36:42,290 All right, I have a bug. 548 00:36:42,290 --> 00:36:46,830 Now I need to go try and find that bug. 549 00:36:46,830 --> 00:36:49,390 So the first thing I need to think about when I'm looking 550 00:36:49,390 --> 00:36:56,590 for it is to try and find a smaller piece of input that 551 00:36:56,590 --> 00:36:57,840 will produce the bug. 552 00:37:08,700 --> 00:37:13,470 So I want to find small input on which the program fails. 553 00:37:13,470 --> 00:37:16,610 Why do I want to find a smaller input? 554 00:37:16,610 --> 00:37:21,300 Well, (a) in this case it's less typing, (b) if it's a real 555 00:37:21,300 --> 00:37:25,280 program it's probably less execution time to make it run, 556 00:37:25,280 --> 00:37:28,910 but (c) it'll be easier to debug because there are fewer kinds 557 00:37:28,910 --> 00:37:30,560 of problems. 558 00:37:30,560 --> 00:37:33,860 So let me try it on a small piece of input 559 00:37:33,860 --> 00:37:35,220 say, Silly of 1. 560 00:37:39,240 --> 00:37:42,680 Oh, it gets that right. 561 00:37:42,680 --> 00:37:45,210 So that's no good. 562 00:37:45,210 --> 00:37:48,300 Let me try something else, let's try Silly of 2, I'm sort 563 00:37:48,300 --> 00:37:49,550 of sneaking up. 564 00:37:52,810 --> 00:37:55,270 It gets that one wrong. 565 00:37:55,270 --> 00:37:59,990 All right, so I know I can test it on a small input.
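[Editor's sketch] The handout code is not reproduced in this transcript. The following is a reconstruction in Python 3, inferred only from the spoken description, and it deliberately keeps the kinds of mistakes the walkthrough goes on to find. The `read` parameter is an assumption added here so the run can be replayed without typing; the real program may differ in detail.

```python
def isPal(x):
    """Intended to return True if the list x is a palindrome.

    Deliberately buggy, as in the walkthrough.
    """
    assert type(x) == list
    temp = x
    temp.reverse
    return temp == x

def silly(n, read=input):
    """Ask for n elements, then report whether they form a palindrome.

    Deliberately buggy. `read` is a hypothetical hook that defaults to
    input() and lets a test supply canned answers instead.
    """
    for i in range(n):
        result = []
        elem = read('Enter element: ')
        result.append(elem)
    return isPal(result)

# Replaying the small failing case without typing:
answers = iter(['a', 'b'])
print(silly(2, read=lambda prompt: next(answers)))
```

With the two inputs 'a' and 'b', this sketch reports a palindrome, which matches the small failing test described above.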
566 00:37:59,990 --> 00:38:01,540 So that's a good thing. 567 00:38:01,540 --> 00:38:03,630 I now have a simple test. 568 00:38:03,630 --> 00:38:08,340 Now in this case the code is so short, and so stupid, that 569 00:38:08,340 --> 00:38:10,520 you could probably look at it with your eyes and just find 570 00:38:10,520 --> 00:38:12,340 the bug instantly. 571 00:38:12,340 --> 00:38:15,870 But the point of this exercise is not to find the bug, but to 572 00:38:15,870 --> 00:38:18,890 kind of show the process. 573 00:38:18,890 --> 00:38:21,970 So now I wanted to go through this process of binary search 574 00:38:21,970 --> 00:38:25,660 to try and find the bug. 575 00:38:25,660 --> 00:38:31,360 So we'll start with Silly, the top level program, and I'll 576 00:38:31,360 --> 00:38:37,920 look for something about halfway through, maybe here. 577 00:38:37,920 --> 00:38:42,380 And try and now answer the question, that I've got a lot 578 00:38:42,380 --> 00:38:49,950 of code and I'm going to find a point halfway through it and 579 00:38:49,950 --> 00:38:52,730 try and ask is the bug above this, or below this. 580 00:38:56,940 --> 00:38:58,610 So I need to find some 581 00:38:58,610 --> 00:39:01,430 intermediate value I can check. 582 00:39:01,430 --> 00:39:03,980 And at this point in the program the only thing I have 583 00:39:03,980 --> 00:39:07,350 done is accumulate the input, right? 584 00:39:07,350 --> 00:39:09,950 So there's nothing else to ask. 585 00:39:09,950 --> 00:39:14,500 So my hypothesis is that everything is good and that 586 00:39:14,500 --> 00:39:17,610 the input will be 'ab'. 587 00:39:17,610 --> 00:39:20,150 So let's try it. 588 00:39:20,150 --> 00:39:27,450 Let's print result here every time through and see if we get 589 00:39:27,450 --> 00:39:29,410 what we wanted to get. 590 00:39:45,410 --> 00:39:49,230 All right, that's not what I expected. 591 00:39:49,230 --> 00:39:51,830 So something is wrong. 592 00:39:51,830 --> 00:39:53,080 What's wrong? 
593 00:39:56,450 --> 00:40:00,189 Why is result always the empty list? 594 00:40:00,189 --> 00:40:01,175 I can out-wait you. 595 00:40:01,175 --> 00:40:06,721 AUDIENCE: Because whenever it goes through the for loop it 596 00:40:06,721 --> 00:40:07,091 keeps coming back. 597 00:40:07,091 --> 00:40:08,090 PROFESSOR: Right. 598 00:40:08,090 --> 00:40:12,020 So every time through the for loop, it's reinitializing-- 599 00:40:12,020 --> 00:40:15,690 whoa, got you. 600 00:40:15,690 --> 00:40:18,600 For those of you watching on TV, I just hit a person that 601 00:40:18,600 --> 00:40:22,040 was heads down with a piece of candy. 602 00:40:22,040 --> 00:40:24,500 Fortunately it was not a hard candy. 603 00:40:24,500 --> 00:40:26,960 All right, so you're right. 604 00:40:26,960 --> 00:40:28,425 Let's get that out of there. 605 00:40:32,470 --> 00:40:33,755 Put it where it belongs. 606 00:40:37,220 --> 00:40:38,470 Run it again. 607 00:40:49,110 --> 00:40:51,645 OK, are we happy with that result? 608 00:40:57,180 --> 00:41:00,780 Yeah, because I've done that before the append, right? 609 00:41:00,780 --> 00:41:03,900 And now just to be sure, we'll take this print statement out 610 00:41:03,900 --> 00:41:07,310 here and let's put it here. 611 00:41:07,310 --> 00:41:08,560 We're now searching elsewhere. 612 00:41:20,620 --> 00:41:26,110 Well the good news is I now have the right result for the 613 00:41:26,110 --> 00:41:29,920 value of the variable, but the wrong result for the program. 614 00:41:29,920 --> 00:41:33,050 It's still telling me it's a palindrome. 615 00:41:33,050 --> 00:41:47,075 So the moral here is there is no such thing as the bug. 616 00:41:51,630 --> 00:41:55,190 Never use the definitive article. 617 00:41:55,190 --> 00:41:56,440 There is a bug. 618 00:41:59,420 --> 00:42:03,770 There's a story that I've heard related to this, as far 619 00:42:03,770 --> 00:42:04,590 as finding a bug. 
620 00:42:04,590 --> 00:42:08,010 You can imagine that you're at someone's house for dinner, 621 00:42:08,010 --> 00:42:10,380 you're sitting at the dining room table, you can't see the 622 00:42:10,380 --> 00:42:15,930 kitchen, and suddenly you hear from the kitchen, [BAM]. 623 00:42:15,930 --> 00:42:17,810 What the heck's that? 624 00:42:17,810 --> 00:42:20,610 Your hostess walks out and says, don't worry I just 625 00:42:20,610 --> 00:42:23,870 killed the cockroach on the turkey. 626 00:42:23,870 --> 00:42:26,035 Well, your immediate reaction is the 627 00:42:26,035 --> 00:42:27,980 cockroach on the turkey? 628 00:42:27,980 --> 00:42:30,660 Where there's one, there's likely to be more. 629 00:42:30,660 --> 00:42:33,470 Every time you found a bug-- 630 00:42:33,470 --> 00:42:35,690 the more bugs you find, then probably the more bugs there 631 00:42:35,690 --> 00:42:38,080 are still left, because you've shown that you 632 00:42:38,080 --> 00:42:40,230 make a lot of mistakes. 633 00:42:40,230 --> 00:42:42,710 All right, onward we go. 634 00:42:42,710 --> 00:42:45,080 So what do we do next? 635 00:42:45,080 --> 00:42:49,010 Well, we now know at least that things look OK to this 636 00:42:49,010 --> 00:42:55,090 point, which suggests that the problem must come below this 637 00:42:55,090 --> 00:42:56,710 in the program. 638 00:42:56,710 --> 00:42:59,260 Well the only thing that's going on below this is the 639 00:42:59,260 --> 00:43:01,770 call to isPal. 640 00:43:01,770 --> 00:43:06,680 So now we'll say OK, we've now isolated the bug to isPal. 641 00:43:06,680 --> 00:43:08,650 That's a good thing. 642 00:43:08,650 --> 00:43:14,950 Let's try and ask where things are going on there. 643 00:43:14,950 --> 00:43:21,680 So we'll take a point halfway through isPal, and we'll print 644 00:43:21,680 --> 00:43:24,060 some things here. 645 00:43:24,060 --> 00:43:25,310 So let's print-- 646 00:43:40,020 --> 00:43:42,680 see what we have here. 
647 00:43:42,680 --> 00:43:46,350 But before I do that, I've gotten really tired of typing 648 00:43:46,350 --> 00:43:53,600 'a' and 'b', so I'm going to use something called a test 649 00:43:53,600 --> 00:43:55,385 driver, or a test harness. 650 00:43:58,060 --> 00:44:01,370 And I recommend that you do this kind of thing whenever 651 00:44:01,370 --> 00:44:03,500 you're testing a program. 652 00:44:03,500 --> 00:44:08,190 Write some code that has nothing to do with the program 653 00:44:08,190 --> 00:44:13,850 itself but makes it easier to test and debug the program. 654 00:44:23,290 --> 00:44:26,720 The pretentious word for this is a test harness. 655 00:44:31,900 --> 00:44:35,690 All this is code that helps testing. 656 00:44:35,690 --> 00:44:41,520 One of the things that you see in industry is about half the 657 00:44:41,520 --> 00:44:45,540 code that gets written is not intended to be delivered as 658 00:44:45,540 --> 00:44:49,330 part of the final product, but is there merely for the 659 00:44:49,330 --> 00:44:52,360 purpose of testing and debugging. 660 00:44:52,360 --> 00:44:54,040 It's a big deal. 661 00:44:54,040 --> 00:44:58,210 So don't feel bad that you're writing code that's not part 662 00:44:58,210 --> 00:45:02,050 of the solution to the problem set that is there only to help 663 00:45:02,050 --> 00:45:05,080 you make your code work. 664 00:45:05,080 --> 00:45:07,660 It seems like it's extra work, but in fact, it 665 00:45:07,660 --> 00:45:10,640 will save you work. 666 00:45:10,640 --> 00:45:16,030 So let's call it. 667 00:45:16,030 --> 00:45:17,840 We'll call isPal. 668 00:45:17,840 --> 00:45:19,490 And it's going to print some things that I 669 00:45:19,490 --> 00:45:21,740 think it should do. 670 00:45:21,740 --> 00:45:25,180 In fact, we'll look at what it does first before we look at 671 00:45:25,180 --> 00:45:28,500 the print statements in isPal. 672 00:45:28,500 --> 00:45:30,640 So for the moment, let me just comment these out. 
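[Editor's sketch] A test driver of the kind described here might look roughly like this in Python 3. The name `isPalTest` and the choice of test lists are assumptions, and `isPal` is shown in its still-broken state at this point in the walkthrough.

```python
def isPal(x):
    """Still buggy at this stage: it returns True for every list."""
    temp = x
    temp.reverse
    return temp == x

def isPalTest():
    """Hypothetical test driver: not part of the program being delivered.

    Each line of output states the expected answer next to the actual
    one, so a mismatch is easy to spot when scanning the output.
    """
    L = [1, 2]
    print('Should print False:', isPal(L))
    L = [1, 2, 1]
    print('Should print True:', isPal(L))

isPalTest()
```

Calling `isPalTest()` replaces the repeated typing of 'a' and 'b', and makes every run use exactly the same inputs.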
673 00:45:38,980 --> 00:45:48,260 And what we see here is it should print false, and it 674 00:45:48,260 --> 00:45:49,510 prints true. 675 00:45:52,350 --> 00:45:57,520 Well, should it print false the second time? 676 00:45:57,520 --> 00:45:58,980 No, right. 677 00:45:58,980 --> 00:46:03,030 So it should have printed true, and it did. 678 00:46:03,030 --> 00:46:05,440 So this is an important lesson. 679 00:46:05,440 --> 00:46:09,560 Make sure that when you put in these debugging statements, 680 00:46:09,560 --> 00:46:12,770 you write down as part of the print statement what you 681 00:46:12,770 --> 00:46:15,400 expect it to print. 682 00:46:15,400 --> 00:46:19,970 So that when you look at your output you can quickly scan it 683 00:46:19,970 --> 00:46:22,330 and see whether the program is behaving as 684 00:46:22,330 --> 00:46:23,580 you thought it would. 685 00:46:26,000 --> 00:46:29,785 So now, works once, doesn't work the other time. 686 00:46:34,840 --> 00:46:44,610 So we'll go back and turn on the print statements up here 687 00:46:44,610 --> 00:46:45,860 and see what we get. 688 00:46:54,210 --> 00:46:59,260 So it's printed temp as 1-2-1 and x as 1-2-1. 689 00:46:59,260 --> 00:47:02,100 So kind of OK that temp and x are the 690 00:47:02,100 --> 00:47:05,320 same, we expected that. 691 00:47:05,320 --> 00:47:09,440 But we thought we reversed it. 692 00:47:09,440 --> 00:47:11,770 We've entered 1-2-1 and it is this. 693 00:47:11,770 --> 00:47:12,630 What's going on? 694 00:47:12,630 --> 00:47:15,410 What's wrong? 695 00:47:15,410 --> 00:47:19,550 Well now what we can do, is let's see where it went wrong. 696 00:47:19,550 --> 00:47:25,800 We'll put in another print statement here, see 697 00:47:25,800 --> 00:47:27,125 what value is there. 698 00:47:31,180 --> 00:47:34,950 Well it was 1-2-1 before reverse, and 699 00:47:34,950 --> 00:47:37,460 it's 1-2-1 after reverse. 700 00:47:37,460 --> 00:47:38,710 How come?
701 00:47:41,500 --> 00:47:43,300 Why isn't reverse reversing temp? 702 00:47:46,060 --> 00:47:47,900 AUDIENCE: Do you need parentheses after reverse? 703 00:47:47,900 --> 00:47:52,220 PROFESSOR: Exactly, I need parentheses after reverse. 704 00:47:55,390 --> 00:47:59,060 Whoa, close. 705 00:47:59,060 --> 00:48:04,360 Because without the parentheses, all reverse is 706 00:48:04,360 --> 00:48:08,150 doing is nothing. 707 00:48:08,150 --> 00:48:11,500 That's just the name of the method, not an invocation of 708 00:48:11,500 --> 00:48:14,990 the method, right? 709 00:48:14,990 --> 00:48:16,510 All right, now let's run it. 710 00:48:21,220 --> 00:48:22,470 Good news and bad news. 711 00:48:25,890 --> 00:48:29,020 What's the good news? 712 00:48:29,020 --> 00:48:40,780 It has indeed reversed 1-2, right, to make it 2-1, but it's 713 00:48:40,780 --> 00:48:42,030 also reversed x. 714 00:48:44,720 --> 00:48:48,310 So naturally, since it's reversed x, temp and x will be 715 00:48:48,310 --> 00:48:50,880 the same, and I get the wrong answer. 716 00:48:50,880 --> 00:48:52,130 What's wrong now? 717 00:48:54,590 --> 00:48:55,160 Yeah? 718 00:48:55,160 --> 00:48:57,380 AUDIENCE: So, I think you're aliasing. 719 00:48:57,380 --> 00:48:58,772 PROFESSOR: I'm aliasing? 720 00:48:58,772 --> 00:49:00,616 AUDIENCE: And it's reversing-- 721 00:49:00,616 --> 00:49:05,210 PROFESSOR: Because now remember how mutation works, 722 00:49:05,210 --> 00:49:09,100 now temp and x both point to the same object. 723 00:49:09,100 --> 00:49:12,590 If I reverse the object, it doesn't matter whether I get 724 00:49:12,590 --> 00:49:15,730 to it through x or I get to it through temp, it will still 725 00:49:15,730 --> 00:49:17,410 have been reversed. 726 00:49:17,410 --> 00:49:26,190 So in this case, what I'd need to do is this, clone it. 727 00:49:29,910 --> 00:49:34,690 And now when I run my code, it works. 728 00:49:34,690 --> 00:49:36,320 No applause?
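[Editor's sketch] The two fixes just described, calling reverse with parentheses and cloning the list before reversing it, can be sketched in Python 3 as below. The transcript does not show how the clone was written; the slice copy `x[:]` is one common way and is an assumption here.

```python
def isPal(x):
    """Return True if the list x reads the same forwards and backwards."""
    assert type(x) == list
    temp = x[:]       # clone: temp is a new list, not another name for x
    temp.reverse()    # the parentheses invoke the method; temp is reversed in place
    return temp == x

# Aliasing shown on its own: two names, one list object.
x = [1, 2]
alias = x
alias.reverse()
print(x)  # x has changed too, because alias and x refer to the same object

print('Should print False:', isPal([1, 2]))
print('Should print True:', isPal([1, 2, 1]))
```

Because `isPal` now reverses a copy, the caller's list is left untouched after the call.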
729 00:49:36,320 --> 00:49:40,600 All right, a couple more things about debugging next 730 00:49:40,600 --> 00:49:43,750 Tuesday, and then we'll move on to some pretty interesting 731 00:49:43,750 --> 00:49:45,380 topics in the next phase of the course.