1
00:00:01,040 --> 00:00:03,460
The following content is
provided under a Creative

2
00:00:03,460 --> 00:00:04,870
Commons license.

3
00:00:04,870 --> 00:00:07,910
Your support will help MIT
OpenCourseWare continue to

4
00:00:07,910 --> 00:00:11,560
offer high quality educational
resources for free.

5
00:00:11,560 --> 00:00:14,460
To make a donation, or view
additional materials from

6
00:00:14,460 --> 00:00:20,290
hundreds of MIT courses, visit
MIT OpenCourseWare at

7
00:00:20,290 --> 00:00:21,540
ocw.mit.edu.

8
00:00:24,708 --> 00:00:28,230
PROFESSOR: OK, I want to remind
you that there's a quiz

9
00:00:28,230 --> 00:00:29,480
one week from today.

10
00:00:32,060 --> 00:00:34,390
Yeah, I know it's soon.

11
00:00:34,390 --> 00:00:40,500
Open book, open notes, no
computing or communication

12
00:00:40,500 --> 00:00:41,750
devices allowed.

13
00:00:44,050 --> 00:00:49,220
Between now and then, probably
tomorrow in fact, or at least

14
00:00:49,220 --> 00:00:52,920
over the weekend, I'll send out
a summary of what I think

15
00:00:52,920 --> 00:00:54,980
we've covered so far
and what you'll be

16
00:00:54,980 --> 00:00:58,210
responsible for in the quiz.

17
00:00:58,210 --> 00:01:03,160
Roughly speaking, it's anything
covered in lectures,

18
00:01:03,160 --> 00:01:06,850
problem sets, or recitations.

19
00:01:06,850 --> 00:01:09,280
I will also post some practice
questions that

20
00:01:09,280 --> 00:01:10,880
you can work on.

21
00:01:10,880 --> 00:01:15,160
And I'll tell you now that we
will not be posting answers to

22
00:01:15,160 --> 00:01:17,910
the practice questions.

23
00:01:17,910 --> 00:01:23,260
Instead, we'll be holding
some quiz reviews.

24
00:01:23,260 --> 00:01:24,510
OK.

25
00:01:26,460 --> 00:01:30,060
I wanted to cover two different
topics today.

26
00:01:30,060 --> 00:01:34,190
The first topic is just
a tiny bit on floating

27
00:01:34,190 --> 00:01:37,030
point numbers in Python.

28
00:01:37,030 --> 00:01:40,120
But in fact, what I'm going to
tell you is true about all

29
00:01:40,120 --> 00:01:42,342
programming languages--

30
00:01:42,342 --> 00:01:44,730
in fact, all computers really.

31
00:01:44,730 --> 00:01:47,655
And then after that we'll spend
most of the lecture on

32
00:01:47,655 --> 00:01:50,470
the topic of debugging.

33
00:01:50,470 --> 00:01:54,030
So let me start with a quick
review of binary numbers.

34
00:01:54,030 --> 00:01:56,940
Because you have to understand
binary numbers to understand

35
00:01:56,940 --> 00:01:59,030
floating point.

36
00:01:59,030 --> 00:02:01,200
So when you first learned about
numbers, you learned

37
00:02:01,200 --> 00:02:03,060
about base 10.

38
00:02:03,060 --> 00:02:06,430
And you learned that a decimal
number is represented by some

39
00:02:06,430 --> 00:02:10,534
combination of the digits 0
through 9, the rightmost place

40
00:02:10,534 --> 00:02:14,200
is the 10 to the 0 place, and
then it's the 10 to the 1

41
00:02:14,200 --> 00:02:18,180
place, the 10 to the
2 place, et cetera.

42
00:02:18,180 --> 00:02:23,790
So for example, the number 302
or the digits 3-0-2 represent

43
00:02:23,790 --> 00:02:30,530
3 times 100, plus 0 times
10, plus 2 times 1.

44
00:02:30,530 --> 00:02:32,000
Duh.

45
00:02:32,000 --> 00:02:36,300
All right, binary numbers are
exactly the same except we

46
00:02:36,300 --> 00:02:39,680
only have two digits
to choose from.

47
00:02:39,680 --> 00:02:44,490
Typically written as 0 and 1 and
everything is represented

48
00:02:44,490 --> 00:02:47,290
by a sequence of those digits.

49
00:02:47,290 --> 00:02:53,600
The rightmost place is 2 to the
0, the next place is 2 to

50
00:02:53,600 --> 00:02:59,160
the 1, then 2 to the 2, 2 to
the 3, et cetera.

51
00:02:59,160 --> 00:03:04,730
So for example, if we look at
the binary number 1-0-1, we

52
00:03:04,730 --> 00:03:14,400
see that's equal to 1 times
4, plus 0 times 2, plus

53
00:03:14,400 --> 00:03:17,180
1 times 1, or 5.
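
The positional rule just described can be checked mechanically. This is a minimal Python 3 sketch; `binary_to_int` is a hypothetical helper written for illustration, and the built-in `int(s, 2)` serves only as a cross-check.

```python
def binary_to_int(digits: str) -> int:
    """Evaluate a string of binary digits by summing digit * 2**place."""
    total = 0
    # The rightmost digit is the 2**0 place, the next is 2**1, and so on.
    for place, digit in enumerate(reversed(digits)):
        total += int(digit) * 2 ** place
    return total

print(binary_to_int("101"))  # 1*4 + 0*2 + 1*1
print(int("101", 2))         # the built-in conversion agrees
```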

54
00:03:20,350 --> 00:03:24,080
So one of the first things we'll
notice is binary numbers

55
00:03:24,080 --> 00:03:27,030
take a lot more digits to
represent a value than

56
00:03:27,030 --> 00:03:29,760
decimal numbers do.

57
00:03:29,760 --> 00:03:37,380
In fact, if I give you n digits,
n binary digits, how

58
00:03:37,380 --> 00:03:39,810
many different binary numbers
can I represent

59
00:03:39,810 --> 00:03:41,060
with those n digits?

60
00:03:47,653 --> 00:03:52,350
Well, if I gave you n decimal
digits, how many different

61
00:03:52,350 --> 00:03:53,580
numbers can I represent?

62
00:03:53,580 --> 00:03:55,390
How many different values
can I represent?

63
00:03:55,390 --> 00:03:56,310
AUDIENCE: 10 to the n.

64
00:03:56,310 --> 00:03:57,065
PROFESSOR: Pardon?

65
00:03:57,065 --> 00:03:58,712
AUDIENCE: 10 to the n.

66
00:03:58,712 --> 00:03:59,800
PROFESSOR: 10 to the n.

67
00:03:59,800 --> 00:04:02,705
And so, for a binary number it's
going to be 2 to the n.
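
One way to convince yourself that n binary digits give 2 to the n distinct values is simply to enumerate every pattern. A small sketch, assuming Python 3; `count_patterns` is a hypothetical helper built on the standard `itertools.product`.

```python
from itertools import product

def count_patterns(n: int, base: int = 2) -> int:
    """Count every distinct string of n digits drawn from range(base)."""
    return sum(1 for _ in product(range(base), repeat=n))

for n in range(1, 6):
    # Each extra binary digit doubles the count, so the total is 2**n.
    print(n, count_patterns(n), 2 ** n)
```

The same function with `base=10` gives 10 to the n, matching the decimal answer from the audience.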

68
00:04:06,980 --> 00:04:10,620
That's important because, as we
get to talking

69
00:04:10,620 --> 00:04:14,380
about the complexity of various
algorithms, how long

70
00:04:14,380 --> 00:04:18,300
they take to run or how much
space they use, we'll

71
00:04:18,300 --> 00:04:22,930
frequently be resorting to
arguments of this sort to

72
00:04:22,930 --> 00:04:26,220
understand them.

73
00:04:26,220 --> 00:04:31,440
Now the reason floating point
numbers cause problems for

74
00:04:31,440 --> 00:04:36,960
programmers is that people
have learned to

75
00:04:36,960 --> 00:04:39,390
think in base 10.

76
00:04:39,390 --> 00:04:44,000
Computers do everything in
base 2, and that causes a

77
00:04:44,000 --> 00:04:47,310
cognitive dissonance
sometimes.

78
00:04:47,310 --> 00:04:49,460
Where people are thinking one
thing, and the computer is

79
00:04:49,460 --> 00:04:53,980
doing something slightly
different.

80
00:04:53,980 --> 00:04:58,370
So why do people work
in base 10?

81
00:04:58,370 --> 00:04:59,440
I don't know.

82
00:04:59,440 --> 00:05:02,190
Maybe it's because we have
10 fingers, but we

83
00:05:02,190 --> 00:05:03,500
also have 10 toes.

84
00:05:03,500 --> 00:05:06,620
So why didn't we work
in base 20?

85
00:05:06,620 --> 00:05:08,670
We have one head, I
don't know why.

86
00:05:08,670 --> 00:05:12,350
But we do it, we work
in base 10.

87
00:05:12,350 --> 00:05:16,360
I do know why computers
work in base 2.

88
00:05:16,360 --> 00:05:19,390
And that's because it's easy
to build switches in

89
00:05:19,390 --> 00:05:21,480
electronic hardware.

90
00:05:21,480 --> 00:05:25,450
A switch is some physical
device that has only two

91
00:05:25,450 --> 00:05:29,430
possible positions, on or off.

92
00:05:29,430 --> 00:05:33,700
We can build very efficient
switches in hardware and so

93
00:05:33,700 --> 00:05:39,210
it's easy to represent a number
as a sequence of bits, each of

94
00:05:39,210 --> 00:05:44,120
which is
either on or off.

95
00:05:44,120 --> 00:05:48,920
Originally they were relays,
then they became transistors,

96
00:05:48,920 --> 00:05:51,090
now they're something altogether
different.

97
00:05:51,090 --> 00:05:55,540
But, what they all had in common
was they were stable in

98
00:05:55,540 --> 00:05:58,770
the off position, they were
stable in the on position, and

99
00:05:58,770 --> 00:06:01,240
they never had to
get in between.

100
00:06:01,240 --> 00:06:06,140
Hence, we represent everything
in computers in binary.

101
00:06:06,140 --> 00:06:11,630
So now let's think about why
that causes some confusion.

102
00:06:11,630 --> 00:06:17,140
And it does so only for
fractional numbers.

103
00:06:17,140 --> 00:06:22,680
For whole numbers, binary versus
decimal doesn't matter.

104
00:06:22,680 --> 00:06:27,690
Ints are never confusing, they
sort of do what God told us

105
00:06:27,690 --> 00:06:31,800
integers should do, or whoever
told us about integers.

106
00:06:31,800 --> 00:06:35,480
All right, but now let's
look at other things.

107
00:06:35,480 --> 00:06:46,450
So I want to start by looking
at the decimal number 0.125.

108
00:06:46,450 --> 00:06:50,280
What's that as a fraction,
by the way?

109
00:06:50,280 --> 00:06:52,145
Happens to be one what?

110
00:06:52,145 --> 00:06:52,700
AUDIENCE: 1/8.

111
00:06:52,700 --> 00:06:56,010
PROFESSOR: 1/8, we'll see
why that actually

112
00:06:56,010 --> 00:06:58,550
matters in a minute.

113
00:06:58,550 --> 00:07:03,760
So, what does it mean, in
some sense, in decimal?

114
00:07:03,760 --> 00:07:06,708
It's equal to 1 times 10 to the
minus 1 plus 2 times 10 to

115
00:07:06,708 --> 00:07:07,958
the minus 2 plus 5 times
10 to the minus 3.

116
00:07:23,800 --> 00:07:26,510
So it works exactly the same
way that things work on the

117
00:07:26,510 --> 00:07:31,530
other side of, in this case,
the decimal point.

118
00:07:31,530 --> 00:07:35,070
Suppose we want to represent
it in binary.

119
00:07:35,070 --> 00:07:38,150
So instead of a decimal point,
we have a binary point.

120
00:07:41,040 --> 00:07:42,450
What does it look like then?

121
00:07:47,930 --> 00:07:56,550
Well it's equal to what?

122
00:07:56,550 --> 00:07:58,170
1 times--

123
00:07:58,170 --> 00:08:00,550
if it's 1/8, what's
it going to be?

124
00:08:00,550 --> 00:08:00,920
1 times what?

125
00:08:00,920 --> 00:08:03,320
AUDIENCE: 1 times 10
to the minus 3.

126
00:08:03,320 --> 00:08:04,570
PROFESSOR: 10 to the minus 3, where 10 is binary for 2.

127
00:08:07,880 --> 00:08:14,722
Or, 0.001.
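
The same idea works to the right of the binary point, where each place is a negative power of 2. One way to generate those digits, sketched here with a hypothetical helper `binary_fraction` that uses exact `fractions.Fraction` arithmetic so no rounding sneaks in, is to double the value repeatedly and peel off the digit that crosses the binary point.

```python
from fractions import Fraction

def binary_fraction(x: Fraction, max_bits: int = 20) -> str:
    """Return the binary digits of 0 <= x < 1, stopping after max_bits."""
    digits = []
    while x != 0 and len(digits) < max_bits:
        x *= 2                  # shift one binary place to the left
        bit = int(x)            # the digit that crossed the binary point
        digits.append(str(bit))
        x -= bit                # keep only the remaining fraction
    return "0." + "".join(digits)

print(binary_fraction(Fraction(1, 8)))  # 1/8, i.e. decimal 0.125
```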

128
00:08:14,722 --> 00:08:17,110
Right?

129
00:08:17,110 --> 00:08:20,140
So, so far, so good.

130
00:08:20,140 --> 00:08:23,450
Not much difference
between the two.

131
00:08:23,450 --> 00:08:26,960
Now let's take a different
decimal number.

132
00:08:26,960 --> 00:08:35,490
What about the decimal 0.1?

133
00:08:35,490 --> 00:08:38,549
I have to tell you that it's
decimal, because it could also

134
00:08:38,549 --> 00:08:41,690
be binary, since it has
only 0's and 1's.

135
00:08:41,690 --> 00:08:45,100
Well, we know how to represent
that in decimal.

136
00:08:49,430 --> 00:08:50,680
How about in binary?

137
00:08:53,430 --> 00:08:54,660
What's the equivalent?

138
00:08:54,660 --> 00:08:57,020
Now that's 1/10, of course.

139
00:08:57,020 --> 00:08:59,010
What does 1/10 look
like in binary?

140
00:09:04,650 --> 00:09:05,900
Any takers?

141
00:09:10,900 --> 00:09:12,610
Well I'll give you a hint.

142
00:09:12,610 --> 00:09:18,078
It's so long, that I don't want
to write it on the board.

143
00:09:23,540 --> 00:09:32,340
In fact, it's worse than
long, it's infinite.

144
00:09:32,340 --> 00:09:35,630
I guess that's kind of long.

145
00:09:35,630 --> 00:09:44,100
It's this repeating binary
fraction, 0.000110011..., in which
the block 0011 repeats forever.

146
00:09:44,100 --> 00:09:49,580
There is no finite combination
of binary digits that

147
00:09:49,580 --> 00:09:52,295
represents the decimal
fraction 1/10.

148
00:09:55,510 --> 00:09:56,880
There's no way to do it.

149
00:09:59,940 --> 00:10:04,000
And that's why things
get a little hairy.
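
Running the same doubling process on 1/10 shows why it never terminates: the leftover fraction cycles back to a value it has already had, so the digits must repeat forever. A sketch, assuming Python 3; `repeating_binary` is a hypothetical helper that records each remainder and stops when one recurs.

```python
from fractions import Fraction
from typing import Tuple

def repeating_binary(x: Fraction) -> Tuple[str, str]:
    """Split the binary expansion of 0 <= x < 1 into (prefix, repeating block)."""
    seen = {}      # remainder -> index of the digit it is about to produce
    digits = []
    while x != 0 and x not in seen:
        seen[x] = len(digits)
        x *= 2
        bit = int(x)
        digits.append(str(bit))
        x -= bit
    if x == 0:
        return "".join(digits), ""      # the expansion terminates
    start = seen[x]                     # remainder seen before: a cycle
    return "".join(digits[:start]), "".join(digits[start:])

print(repeating_binary(Fraction(1, 10)))  # prefix "0", then "0011" forever
```

Any finite number of stored bits therefore has to cut this cycle off somewhere, which is exactly the approximation discussed next.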

150
00:10:04,000 --> 00:10:06,495
So we have to stop at some finite
number of bits.

151
00:10:09,185 --> 00:10:13,420
And in fact that's what happens
in the internal

152
00:10:13,420 --> 00:10:16,630
representation in Python.

153
00:10:16,630 --> 00:10:22,090
It ends up representing 1/10
as something equivalent to

154
00:10:22,090 --> 00:10:25,490
this decimal fraction.

155
00:10:25,490 --> 00:10:31,390
If I take the binary
bits that are stored inside the

156
00:10:31,390 --> 00:10:36,290
computer, and then I translate
it back to decimal, it turns

157
00:10:36,290 --> 00:10:40,280
out that it's using this
approximation for the decimal

158
00:10:40,280 --> 00:10:41,530
fraction 1/10.

159
00:10:44,780 --> 00:10:49,050
So for example, some of you in
your problem sets -- where you

160
00:10:49,050 --> 00:10:54,060
were computing how much you had
to pay on a credit card --

161
00:10:54,060 --> 00:10:57,110
would get answers that were
eventually off by a penny or

162
00:10:57,110 --> 00:11:00,290
something from what we expected
in some cases, and that has

163
00:11:00,290 --> 00:11:04,350
to do with the fact that you
were thinking in decimal.

164
00:11:04,350 --> 00:11:08,450
And in fact, you were writing
your program in decimal, yet

165
00:11:08,450 --> 00:11:11,940
internally things were happening
in binary, and when

166
00:11:11,940 --> 00:11:15,630
you thought you were writing
1/10 for example you were

167
00:11:15,630 --> 00:11:20,920
actually getting something like
this inside the computer.

168
00:11:20,920 --> 00:11:24,870
Pretty close to 1/10, but
not exactly 1/10.
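
You can inspect exactly what a Python float holds for 0.1 using the standard `decimal` and `fractions` modules; converting a float to either type is exact, so nothing is rounded on the way out. A short sketch, assuming Python 3:

```python
from decimal import Decimal
from fractions import Fraction

x = 0.1
print(Decimal(x))    # the exact decimal value of the stored binary fraction
print(Fraction(x))   # the same value as a ratio; the denominator is 2**55
print(Fraction(x) == Fraction(1, 10))  # False: close to 1/10, not equal
print(0.1 + 0.2 == 0.3)                # False, for the same reason
```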

169
00:11:29,350 --> 00:11:37,110
Now, when we print it, we get
yet something else because the

170
00:11:37,110 --> 00:11:41,670
print statement uses an internal
function that by

171
00:11:41,670 --> 00:11:46,700
default rounds these things
to 17 digits.

172
00:11:46,700 --> 00:11:51,370
And so you end up getting
something like that, or you

173
00:11:51,370 --> 00:11:53,560
might depending how you do it.

174
00:11:53,560 --> 00:11:55,270
So let's look at an
example here.

175
00:12:07,090 --> 00:12:15,820
So I can do something like
this, and it prints that

176
00:12:15,820 --> 00:12:19,130
because it's doing some
rounding for me.

177
00:12:19,130 --> 00:12:25,430
But if I really look at what's
under there, and look at the

178
00:12:25,430 --> 00:12:30,530
representation (the repr
function is convenient to get

179
00:12:30,530 --> 00:12:35,890
a sense of what's really going
on inside), it tells me that

180
00:12:35,890 --> 00:12:40,420
it's a 17-digit
approximation.

181
00:12:40,420 --> 00:12:43,650
So that's what's
really lurking there.

182
00:12:43,650 --> 00:12:47,530
So a hint: if you think
something is going funny

183
00:12:47,530 --> 00:12:52,460
because of the way arithmetic
is working, instead of just

184
00:12:52,460 --> 00:13:00,020
using print, you can print
the repr to get a better idea

185
00:13:00,020 --> 00:13:02,780
about what's really going on.
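
What `print` shows versus what `repr` shows depends on the Python version. In Python 2 releases before 2.7, `str` (which print uses) formatted floats to 12 significant digits and `repr` to 17; since Python 2.7 and 3.1, `repr` returns the shortest string that reads back as the same float, which is why 0.1 now displays as `0.1`. A minimal sketch, assuming Python 3, that reproduces the old 12- and 17-digit views with format specifiers:

```python
x = 0.1
print(x)                  # 0.1, the shortest string that round-trips
print(repr(x))            # 0.1, same as print in Python 3
print(format(x, ".12g"))  # 0.1, roughly the old 12-digit print view
print(format(x, ".17g"))  # 0.10000000000000001, the old 17-digit repr view

y = 0.1 + 0.2
print(y)                  # 0.30000000000000004, the error is visible here
```

So in Python 3 the advice becomes: when `print` looks suspicious, format with more digits or convert to `Decimal` to see the stored value.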

186
00:13:02,780 --> 00:13:06,840
All right, now, does
this matter?

187
00:13:06,840 --> 00:13:09,420
Usually it doesn't.

188
00:13:09,420 --> 00:13:12,260
Most of the time it's safe just
to pretend that floating

189
00:13:12,260 --> 00:13:15,970
point numbers work the way you learned
about arithmetic when

190
00:13:15,970 --> 00:13:21,770
you were in third grade, or
probably in kindergarten if

191
00:13:21,770 --> 00:13:24,820
you were educated in
Europe or Asia.

192
00:13:24,820 --> 00:13:29,160
But now let's look at
an example where

193
00:13:29,160 --> 00:13:31,470
you can get in trouble.

194
00:13:31,470 --> 00:13:33,970
So I've got a little
program here.

195
00:13:33,970 --> 00:13:38,480
I initialize x to 0, then I'm
going to go through a loop a

196
00:13:38,480 --> 00:13:44,570
lot of times, where I
increment x by 1/10.

197
00:13:44,570 --> 00:13:47,180
And then I'm going to print x.

198
00:13:47,180 --> 00:13:49,920
And because it's going to do
automatic rounding, it's going

199
00:13:49,920 --> 00:13:53,890
to print 10,000--

200
00:13:53,890 --> 00:14:00,470
or actually, it should
print 100,000, right?

201
00:14:00,470 --> 00:14:02,380
No, 10,000, because I'm only
incrementing it by

202
00:14:02,380 --> 00:14:04,450
1/10, excuse me.

203
00:14:04,450 --> 00:14:10,570
But then I'm going to print repr
of x, and then I'm going

204
00:14:10,570 --> 00:14:13,000
to do a comparison.

205
00:14:13,000 --> 00:14:16,280
Now if floating point arithmetic
worked the way

206
00:14:16,280 --> 00:14:23,370
reals work, we would think that
10.0 times x should equal

207
00:14:23,370 --> 00:14:26,200
the number of iterations.

208
00:14:26,200 --> 00:14:29,710
Because I'm starting at 0, each
time I'm incrementing it

209
00:14:29,710 --> 00:14:35,790
by 1/10, and so if I multiply
the result by 10 at the end, I

210
00:14:35,790 --> 00:14:38,920
should get the same as the
number of iterations.

211
00:14:38,920 --> 00:14:41,270
Does that make sense
to everybody?

212
00:14:41,270 --> 00:14:47,530
That's what you would normally
get if you did this with

213
00:14:47,530 --> 00:14:48,320
pencil and paper.

214
00:14:48,320 --> 00:14:52,550
Of course, it would take you a
really long time to do 100,000

215
00:14:52,550 --> 00:14:53,990
increments.

216
00:14:53,990 --> 00:14:55,240
Let's give it a shot.

217
00:14:58,090 --> 00:15:01,460
And what we'll see is that
if I print it, it

218
00:15:01,460 --> 00:15:04,100
looks OK, it's 10,000.

219
00:15:04,100 --> 00:15:10,570
But if I print repr of it, I
see it's 10,000, a bunch of

220
00:15:10,570 --> 00:15:15,260
0's, and then 18848.

221
00:15:15,260 --> 00:15:19,650
And, of course, consequently
when I compare it, I get

222
00:15:19,650 --> 00:15:22,280
something that says false.

223
00:15:22,280 --> 00:15:35,230
And that's because if I look
at repr of 10.0 times x--

224
00:15:35,230 --> 00:15:41,420
well, that's interesting,
what's going on here?

225
00:15:41,420 --> 00:15:43,500
It kind of looks like the
same thing, doesn't it?

226
00:15:47,140 --> 00:15:50,490
But it's not, because way out
there are some other digits

227
00:15:50,490 --> 00:15:53,440
we're not seeing, something
different is happening.
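
The experiment just described can be reproduced in a few lines. This sketch assumes Python 3, where `print` already shows the shortest round-trip value, so the discrepancy is visible without `repr`; the name `num_iterations` is chosen here and may not match the code on the slide.

```python
num_iterations = 100000
x = 0.0
for _ in range(num_iterations):
    x += 0.1          # each addition rounds to the nearest representable float

print(x)                           # not exactly 10000
print(10.0 * x == num_iterations)  # False: exact equality fails
print(10.0 * x - num_iterations)   # roughly the accumulated error, times 10
```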

228
00:15:57,190 --> 00:16:00,070
OK, what's the moral of this?

229
00:16:00,070 --> 00:16:01,960
It's not complicated.

230
00:16:01,960 --> 00:16:06,500
It's not "write your programs
thinking deeply about

231
00:16:06,500 --> 00:16:10,380
what's going on in those bits
way out there at the end."

232
00:16:10,380 --> 00:16:14,380
It's: don't ever test whether two
floating point numbers are equal

233
00:16:14,380 --> 00:16:17,300
to each other.

234
00:16:17,300 --> 00:16:20,130
Instead, do something
like this.

235
00:16:28,370 --> 00:16:32,920
Define a function called
'close', or whatever you want,

236
00:16:32,920 --> 00:16:37,760
that takes two floats
and some epsilon.

237
00:16:37,760 --> 00:16:41,290
And I've given here epsilon
a default value.

238
00:16:41,290 --> 00:16:44,330
And then just return whether the
absolute value of x minus

239
00:16:44,330 --> 00:16:47,460
y is less than epsilon.

240
00:16:47,460 --> 00:16:49,880
So whenever you're comparing
two floating point numbers, the

241
00:16:49,880 --> 00:16:53,890
question shouldn't be "are they
identical?" but "are they close

242
00:16:53,890 --> 00:16:56,370
enough for your purposes?"

243
00:16:56,370 --> 00:16:59,810
And if you do that, then you
don't get tripped up by this

244
00:16:59,810 --> 00:17:01,780
kind of rounding and
things like that.
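
A sketch of the `close` function described above, in Python 3. The transcript does not say which default epsilon was used, so `0.00001` here is an assumed value. For comparison, Python 3.5 and later also provide `math.isclose`, which by default uses a relative tolerance (`rel_tol=1e-09`) and so scales with the size of the numbers being compared.

```python
import math

def close(x: float, y: float, epsilon: float = 0.00001) -> bool:
    """Return True if x and y differ by less than epsilon (default is assumed)."""
    return abs(x - y) < epsilon

# Recreate the accumulated sum so this block stands on its own.
x = 0.0
for _ in range(100000):
    x += 0.1

print(10.0 * x == 100000)              # False: the exact test fails
print(close(10.0 * x, 100000))         # True: close enough
print(math.isclose(10.0 * x, 100000))  # True: within the relative tolerance
```

A fixed absolute epsilon works well when you know the scale of your values; for very large or very small magnitudes a relative test like `math.isclose` is usually the safer choice.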

245
00:17:05,359 --> 00:17:08,960
Not a complicated story, but
keeping this in mind will get

246
00:17:08,960 --> 00:17:11,050
you out of trouble when you're
doing floating point

247
00:17:11,050 --> 00:17:13,190
arithmetic.

248
00:17:13,190 --> 00:17:15,209
Let's run this, and
see what happens.

249
00:17:18,660 --> 00:17:21,910
And indeed, they're not equal
but they're good enough, close

250
00:17:21,910 --> 00:17:23,160
enough if you will.

251
00:17:25,650 --> 00:17:26,900
OK.

252
00:17:29,480 --> 00:17:32,660
One of the dangers, the reason
this went wrong, is these

253
00:17:32,660 --> 00:17:35,770
little differences can
accumulate if you go through a

254
00:17:35,770 --> 00:17:37,620
lot of iterations.

255
00:17:37,620 --> 00:17:40,990
Sometimes they balance out,
sometimes it rounds up,

256
00:17:40,990 --> 00:17:43,870
sometimes it rounds down,
but not always.
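
To watch the accumulation directly, you can measure how far a running sum of 0.1 drifts from the exact answer as the number of additions grows. A sketch, assuming Python 3; `drift` is a hypothetical helper.

```python
def drift(n: int) -> float:
    """Add 0.1 to itself n times and return the distance from n/10."""
    total = 0.0
    for _ in range(n):
        total += 0.1
    return abs(total - n / 10)

for n in (10, 1000, 100000):
    print(n, drift(n))
```

The drift need not grow smoothly, since individual roundings can partly cancel, but over many iterations it tends to become much larger than the error of any single addition.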

257
00:17:43,870 --> 00:17:46,180
So very simple answer.

258
00:17:46,180 --> 00:17:49,920
Just don't get caught up
in this problem of

259
00:17:49,920 --> 00:17:51,780
floating point numbers.

260
00:17:51,780 --> 00:17:53,030
All right, any questions
about that?

261
00:17:55,960 --> 00:17:58,540
All right, yes.

262
00:17:58,540 --> 00:18:00,990
AUDIENCE: Doesn't it change
for Python 2.7?

263
00:18:00,990 --> 00:18:05,400
It's only returning 0.1
and not 0.100000.

264
00:18:05,400 --> 00:18:06,380
PROFESSOR: In 2.7?

265
00:18:06,380 --> 00:18:07,360
AUDIENCE: Yeah.

266
00:18:07,360 --> 00:18:08,610
PROFESSOR: Don't know, sorry.

267
00:18:12,560 --> 00:18:15,460
But the moral remains
the same.

268
00:18:15,460 --> 00:18:18,110
Whatever is going on, don't test
floating point numbers

269
00:18:18,110 --> 00:18:23,170
for equality because you'll have
a high probability of

270
00:18:23,170 --> 00:18:26,274
getting false, when you
should get true.

271
00:18:26,274 --> 00:18:27,260
OK.

272
00:18:27,260 --> 00:18:30,480
You almost never get true when
you should get false.

273
00:18:30,480 --> 00:18:32,890
If there are no more questions,
I now want to move on

274
00:18:32,890 --> 00:18:34,140
to debugging.

275
00:18:37,220 --> 00:18:40,140
I never know when to give this
lecture in the term.

276
00:18:40,140 --> 00:18:44,920
So what I usually do is I wait
until the volume of email, and

277
00:18:44,920 --> 00:18:48,750
complaints, and office hours
builds, and I realize people

278
00:18:48,750 --> 00:18:51,590
are ready to learn more
about debugging.

279
00:18:51,590 --> 00:18:54,970
If I do it too early, people
don't pay any attention

280
00:18:54,970 --> 00:18:57,490
because they don't realize
it's a problem.

281
00:18:57,490 --> 00:19:00,960
And if I do it too late, they
get irritated with me because

282
00:19:00,960 --> 00:19:02,950
they say well why didn't you
tell me this earlier in the

283
00:19:02,950 --> 00:19:05,950
semester when it would've
done me some good.

284
00:19:05,950 --> 00:19:07,900
So, I pick a time.

285
00:19:07,900 --> 00:19:11,070
And right now it looks like the
need has built up enough

286
00:19:11,070 --> 00:19:13,840
that it's worth doing.

287
00:19:13,840 --> 00:19:18,640
There's a very charming urban
legend about how the process

288
00:19:18,640 --> 00:19:23,320
of fixing flaws in software came
to be known as debugging.

289
00:19:23,320 --> 00:19:25,860
It's one of those stories that's
so nice that you just

290
00:19:25,860 --> 00:19:27,910
want it to be true.

291
00:19:27,910 --> 00:19:30,440
So let's look at this story,
because it's fun.

292
00:19:45,370 --> 00:19:49,950
All right, what you see on the
screen now is a photo of a

293
00:19:49,950 --> 00:19:54,760
lab book, now at the Smithsonian
Museum, from

294
00:19:54,760 --> 00:19:59,170
the group working on the Mark
II Aiken Relay computer at

295
00:19:59,170 --> 00:20:02,450
Harvard University.

296
00:20:02,450 --> 00:20:03,030
Pardon?

297
00:20:03,030 --> 00:20:06,210
Oh, I see it on my screen, now
you see it on your screen.

298
00:20:06,210 --> 00:20:07,960
Thank you.

299
00:20:07,960 --> 00:20:10,100
So there it is.

300
00:20:10,100 --> 00:20:17,150
It was September 9, 1947, even
before I was born, it

301
00:20:17,150 --> 00:20:20,190
was that long ago.

302
00:20:20,190 --> 00:20:22,040
And so you can see that
they're running their

303
00:20:22,040 --> 00:20:27,130
computer, and they started to do
an arctan computation, and

304
00:20:27,130 --> 00:20:31,460
it's kind of interesting that
they started it at 8 o'clock

305
00:20:31,460 --> 00:20:33,790
in the morning, and
it ran for two

306
00:20:33,790 --> 00:20:36,470
hours, and then it stopped.

307
00:20:36,470 --> 00:20:38,960
Wow, to do an arctan.

308
00:20:38,960 --> 00:20:42,060
Tells you something about how
fast this computer was.

309
00:20:42,060 --> 00:20:43,550
Then it went on.

310
00:20:43,550 --> 00:20:49,150
Then they started the cosine
tape, and started to do a

311
00:20:49,150 --> 00:20:52,220
multiplier and adder test, and then
something bad happened.

312
00:20:57,470 --> 00:20:58,720
It stopped working.

313
00:21:00,850 --> 00:21:02,100
Whoops.

314
00:21:04,680 --> 00:21:07,450
All right, hold on a second.

315
00:21:16,730 --> 00:21:19,550
And they spent a long time
trying to find out why it

316
00:21:19,550 --> 00:21:21,220
stopped working.

317
00:21:21,220 --> 00:21:25,040
And then they found
out the problem.

318
00:21:25,040 --> 00:21:30,750
They found a moth stuck in
one of the relays.

319
00:21:30,750 --> 00:21:34,140
The machine used electromechanical
relays as its switches, for

320
00:21:34,140 --> 00:21:35,480
turning on and off.

321
00:21:35,480 --> 00:21:39,010
And they were debugging, though they
didn't call it debugging.

322
00:21:39,010 --> 00:21:41,540
And they found the software
had failed because the

323
00:21:41,540 --> 00:21:44,620
hardware had failed, and the
hardware had failed because a

324
00:21:44,620 --> 00:21:47,710
bug had been stuck in
one of the relays.

325
00:21:47,710 --> 00:21:51,490
They debugged it, as in removed
the moth, and the

326
00:21:51,490 --> 00:21:54,770
program ran to successful
completion.

327
00:21:54,770 --> 00:21:57,280
And as you can see, the comment
was written in this

328
00:21:57,280 --> 00:22:01,830
book, first actual case
of a bug being found.

329
00:22:01,830 --> 00:22:05,610
Hence, we call it debugging.

330
00:22:05,610 --> 00:22:10,490
This was, by the way, Grace
Murray Hopper's lab book.

331
00:22:10,490 --> 00:22:15,190
She is often described as
the first programmer.

332
00:22:15,190 --> 00:22:16,590
It's unclear if that's true.

333
00:22:16,590 --> 00:22:19,340
What is true is that she was
one of the first women

334
00:22:19,340 --> 00:22:21,700
to become an admiral in the US Navy.

335
00:22:21,700 --> 00:22:23,980
She was a Navy programmer
who eventually rose

336
00:22:23,980 --> 00:22:25,230
to the rank of Admiral.

337
00:22:27,700 --> 00:22:29,590
So it's a charming story
that this is

338
00:22:29,590 --> 00:22:31,140
why we call it debugging.

339
00:22:31,140 --> 00:22:33,720
It turns out that's not
at all true.

340
00:22:33,720 --> 00:22:38,580
The term debugging had
been used for a long time, and

341
00:22:38,580 --> 00:22:42,840
could easily be traced back to
the 1800s when people were

342
00:22:42,840 --> 00:22:45,820
writing books about electronics
and talking about

343
00:22:45,820 --> 00:22:48,780
debugging even in those days.

344
00:22:48,780 --> 00:22:52,310
And in fact, you can go back to
Shakespeare who talks about

345
00:22:52,310 --> 00:22:57,940
a bugbear, meaning something
causing

346
00:22:57,940 --> 00:23:01,550
needless or excessive
fear or anxiety.

347
00:23:01,550 --> 00:23:04,760
Well that's a good description
of a bug.

348
00:23:04,760 --> 00:23:09,020
And he actually called it
a bug when he had Hamlet

349
00:23:09,020 --> 00:23:09,530
[UNINTELLIGIBLE]

350
00:23:09,530 --> 00:23:12,680
about "such bugs and goblins
in my life."

351
00:23:12,680 --> 00:23:15,770
All right, so I want
to start now--

352
00:23:15,770 --> 00:23:21,230
oh by the way, just for fun,
this is what the Mark II

353
00:23:21,230 --> 00:23:23,640
looked like.

354
00:23:23,640 --> 00:23:27,940
This was the computer that took
an hour or so to do an arctan.

355
00:23:27,940 --> 00:23:29,440
You see it filled--

356
00:23:29,440 --> 00:23:31,430
maybe it's a little hard
to see in this light--

357
00:23:31,430 --> 00:23:33,340
but you can see it filled
an entire room.

358
00:23:35,960 --> 00:23:37,610
Quite amazing.

359
00:23:37,610 --> 00:23:42,120
And, here's a picture of
Admiral Hopper and some

360
00:23:42,120 --> 00:23:45,080
unidentified male.

361
00:23:45,080 --> 00:23:48,100
All right, if anyone knows who
this is, it would be good to know

362
00:23:48,100 --> 00:23:50,670
so I can update my archives.

363
00:23:50,670 --> 00:23:54,280
All right, so now on to some

364
00:23:54,280 --> 00:23:57,090
practical aspects of debugging.

365
00:23:57,090 --> 00:23:59,180
The first thing I want
to do is dispel

366
00:23:59,180 --> 00:24:01,660
some myths about debugging.

367
00:24:01,660 --> 00:24:05,030
There is this myth
that bugs crawl

368
00:24:05,030 --> 00:24:07,700
unbidden into our programs.

369
00:24:07,700 --> 00:24:11,770
That we write perfect programs
and somehow a bug just sneaks

370
00:24:11,770 --> 00:24:15,360
in, and ruins perfection.

371
00:24:15,360 --> 00:24:16,650
That's not true.

372
00:24:16,650 --> 00:24:18,200
In fact, if there's
a bug in your

373
00:24:18,200 --> 00:24:21,260
program, you put it there.

374
00:24:21,260 --> 00:24:23,820
So it would be almost better
not to call it a bug, which

375
00:24:23,820 --> 00:24:27,890
sort of sounds like it's not our
fault, but a mistake,

376
00:24:27,890 --> 00:24:29,850
a screw-up.

377
00:24:29,850 --> 00:24:31,630
So get that through your head.

378
00:24:31,630 --> 00:24:35,250
Similarly bugs do not
breed in programs.

379
00:24:35,250 --> 00:24:38,600
If there are multiple bugs in
your program, it's not because

380
00:24:38,600 --> 00:24:42,400
a couple of them got together
and procreated, it's because

381
00:24:42,400 --> 00:24:43,650
you made a lot of mistakes.

382
00:24:46,590 --> 00:24:47,780
Keep that in mind.

383
00:24:47,780 --> 00:24:51,290
With that in mind, we should
think about what the goal of

384
00:24:51,290 --> 00:24:52,900
debugging is.

385
00:24:52,900 --> 00:25:14,250
It's not to eliminate one
bug quickly; it's to move

386
00:25:14,250 --> 00:25:16,070
towards a bug-free program.

387
00:25:21,730 --> 00:25:32,290
And I say this because the
strategy you would follow

388
00:25:32,290 --> 00:25:36,894
for these two goals is
not always the same.

389
00:25:36,894 --> 00:25:40,660
And I also carefully say to
move towards a bug-free

390
00:25:40,660 --> 00:25:45,010
program because, truth be told,
we are hardly ever sure

391
00:25:45,010 --> 00:25:48,290
that we have no bugs left.

392
00:25:48,290 --> 00:25:50,840
Debugging is a learned skill.

393
00:25:50,840 --> 00:25:51,690
Don't despair.

394
00:25:51,690 --> 00:25:55,220
Nobody does it well
instinctively.

395
00:25:55,220 --> 00:25:59,170
Evolution did not train
us to be debuggers.

396
00:25:59,170 --> 00:26:02,290
So a large part, probably the
largest part in many ways, of

397
00:26:02,290 --> 00:26:04,520
learning to be a
good programmer

398
00:26:04,520 --> 00:26:06,850
is learning to debug.

399
00:26:06,850 --> 00:26:11,840
And what that involves is
thinking systematically and

400
00:26:11,840 --> 00:26:16,990
efficiently about how to move
towards a bug-free program.

401
00:26:16,990 --> 00:26:21,230
The good news is that it's not
hard to learn, and it is a

402
00:26:21,230 --> 00:26:24,170
largely transferable skill.

403
00:26:24,170 --> 00:26:28,580
The same skills you use to debug
software, can be used to

404
00:26:28,580 --> 00:26:31,540
debug laboratory experiments.

405
00:26:31,540 --> 00:26:35,390
I actually give lectures
sometimes to physicians about

406
00:26:35,390 --> 00:26:38,230
how to debug patients.

407
00:26:38,230 --> 00:26:40,350
How to use debugging techniques
to find out what's

408
00:26:40,350 --> 00:26:42,350
wrong with people when
they're sick.

409
00:26:42,350 --> 00:26:46,200
It's a very good and
useful life skill.

410
00:26:46,200 --> 00:26:51,490
Now for four decades, maybe five
decades, people have been

411
00:26:51,490 --> 00:26:56,450
building tools called
debuggers.

412
00:26:56,450 --> 00:27:00,432
And you'll find one built
into IDLE. Debuggers

413
00:27:00,432 --> 00:27:04,320
are designed to help people
find out why their

414
00:27:04,320 --> 00:27:08,400
programs don't work,
and fix them.

415
00:27:08,400 --> 00:27:13,300
Personally, I almost
never use one.

416
00:27:13,300 --> 00:27:16,710
The tools are not
that important.

417
00:27:16,710 --> 00:27:19,870
What's important is
the skill of the

418
00:27:19,870 --> 00:27:22,170
craftsman, in this case.

419
00:27:22,170 --> 00:27:26,030
And in fact, most of the
experienced programmers I know

420
00:27:26,030 --> 00:27:27,365
rely on print statements.

421
00:27:29,940 --> 00:27:34,930
So it's OK to use a debugger but
I think the best debugging

422
00:27:34,930 --> 00:27:37,030
tool is print.

423
00:27:37,030 --> 00:27:41,550
And I have to say I've
been surprised--

424
00:27:41,550 --> 00:27:43,770
that's a mild word here--

425
00:27:43,770 --> 00:27:48,670
at how few print statements
you guys seem to use.

426
00:27:48,670 --> 00:27:52,820
I get these emails, or the staff
gets these emails, kind

427
00:27:52,820 --> 00:27:56,500
of plaintive: why doesn't
my program work?

428
00:27:56,500 --> 00:27:59,050
And then there's a little
piece of code.

429
00:27:59,050 --> 00:28:02,150
And the answer I send back--
when I reply before one of the

430
00:28:02,150 --> 00:28:04,760
TAs does, and they usually
get there first--

431
00:28:04,760 --> 00:28:07,380
is usually, put in a print
statement here

432
00:28:07,380 --> 00:28:09,600
and see what happens.

433
00:28:09,600 --> 00:28:12,300
And I'm just amazed that when
the code arrives it doesn't

434
00:28:12,300 --> 00:28:15,490
have these statements in it.

435
00:28:15,490 --> 00:28:18,630
My favorite response: I sent
an email to a student,

436
00:28:18,630 --> 00:28:23,345
who shall go nameless, and he--
or maybe it was a she--

437
00:28:23,345 --> 00:28:25,350
and I said, insert a print
statement here

438
00:28:25,350 --> 00:28:26,010
and see what happens.

439
00:28:26,010 --> 00:28:30,040
And I got back a reply saying,
no, I don't need a

440
00:28:30,040 --> 00:28:31,930
print statement here; I know
what the value of this

441
00:28:31,930 --> 00:28:34,220
variable is.

442
00:28:34,220 --> 00:28:38,550
Well, you know, my reply was
that if all the values were

443
00:28:38,550 --> 00:28:40,680
what you thought they were,
you wouldn't be sending an

444
00:28:40,680 --> 00:28:43,180
email saying, why doesn't
my program work.

445
00:28:43,180 --> 00:28:46,330
Put in the darn print statement
and see what happens.

446
00:28:46,330 --> 00:28:49,310
And then I got a gracious email
back saying, more or

447
00:28:49,310 --> 00:28:52,490
less, oops, I see.

448
00:28:52,490 --> 00:28:57,230
But please, when you send us
some code because you want some help,

449
00:28:57,230 --> 00:29:00,265
send us code with some print
statements already in it, to at

450
00:29:00,265 --> 00:29:01,770
least show us that
you've tried to

451
00:29:01,770 --> 00:29:04,880
find the bug yourself.

452
00:29:04,880 --> 00:29:07,160
All right, so what we're
essentially doing when we

453
00:29:07,160 --> 00:29:11,890
insert print statements in our
code is searching for the

454
00:29:11,890 --> 00:29:14,020
place in our program where
things have gone awry.

455
00:29:16,710 --> 00:29:22,420
And the key to being a good
debugger is to be systematic

456
00:29:22,420 --> 00:29:24,580
in this search.

457
00:29:24,580 --> 00:29:27,530
So you saw that when we looked
at algorithms for things like

458
00:29:27,530 --> 00:29:29,800
exhaustive enumeration.

459
00:29:29,800 --> 00:29:32,620
We said, well if we're searching
for an answer, we

460
00:29:32,620 --> 00:29:36,480
have to search the space
carefully one at a time.

461
00:29:36,480 --> 00:29:39,120
And then we said, if we want
to search it efficiently,

462
00:29:39,120 --> 00:29:41,930
maybe instead of starting at the
beginning and just going

463
00:29:41,930 --> 00:29:46,260
to the end, we should use
something like binary search.

464
00:29:46,260 --> 00:29:49,960
The same techniques can
be used when you're

465
00:29:49,960 --> 00:29:51,210
searching for bugs.

466
00:29:58,660 --> 00:30:04,240
So I recommend searching for
bugs using some approximation

467
00:30:04,240 --> 00:30:05,490
to binary search.
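As a rough illustration of "some approximation to binary search", here is a minimal sketch, assuming the computation can be viewed as a list of steps and that we have a check that tells us whether the state after a given step still looks right. The names `run_prefix`, `find_first_bad_step`, `steps`, and `looks_ok` are hypothetical, invented for this example. Each probe plays the role of one print statement placed halfway through the suspect region, answering "is the bug above this, or below this?", so the region under suspicion halves every time.

```python
def run_prefix(steps, state, k):
    """Run the first k steps on state and return the intermediate value
    (what a print statement placed after step k would show)."""
    for step in steps[:k]:
        state = step(state)
    return state


def find_first_bad_step(steps, initial, looks_ok):
    """Return the zero-based index of the first step after which looks_ok fails.

    Assumes looks_ok(initial) is True, looks_ok fails after all the steps,
    and once the state goes bad it stays bad (the usual bisection assumption).
    """
    lo, hi = 0, len(steps)  # invariant: ok after lo steps, bad after hi steps
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if looks_ok(run_prefix(steps, initial, mid)):
            lo = mid  # state still fine here, so the bug is below this point
        else:
            hi = mid  # state already wrong here, so the bug is at or above it
    return hi - 1  # the step that took the state from ok to bad


# Hypothetical example: four steps, where the third one (index 2) is buggy.
steps = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 10, lambda v: v + 3]
print(find_first_bad_step(steps, 0, lambda v: v >= 0))  # prints 2
```

In real programs the "steps" are rarely this tidy, which is why the lecture calls it an approximation: you pick a convenient point roughly halfway through and check an intermediate value you can predict.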

468
00:30:11,820 --> 00:30:16,180
And we'll see an example of this
as we go forward, but as

469
00:30:16,180 --> 00:30:21,340
we look at the example what I
want you to think about is

470
00:30:21,340 --> 00:30:24,340
what are we searching for?

471
00:30:24,340 --> 00:30:27,400
We know our program
doesn't work.

472
00:30:27,400 --> 00:30:35,680
So the question that I like to
ask is not, why didn't it

473
00:30:35,680 --> 00:30:38,960
produce the answer
I wanted it to?

474
00:30:38,960 --> 00:30:42,030
But rather, how could it have done
what it did?

475
00:31:01,700 --> 00:31:04,070
This is a subtly different
question.

476
00:31:04,070 --> 00:31:07,810
And it's usually a much easier
question to answer.

477
00:31:07,810 --> 00:31:10,170
Not why didn't it do
the right thing,

478
00:31:10,170 --> 00:31:12,200
but here it did something.

479
00:31:12,200 --> 00:31:13,590
So I already know what it did.

480
00:31:13,590 --> 00:31:16,780
And I say, I didn't expect
it to do that, so

481
00:31:16,780 --> 00:31:18,860
why did it do that?

482
00:31:18,860 --> 00:31:22,620
Once I know why it did what it
did, it's usually pretty easy

483
00:31:22,620 --> 00:31:24,500
to think how to fix it.

484
00:31:27,590 --> 00:31:29,635
So that's the first
question I ask.

485
00:31:33,230 --> 00:31:37,980
I then go about it using
something akin to the

486
00:31:37,980 --> 00:31:40,250
scientific method, which
we all learned

487
00:31:40,250 --> 00:31:42,710
about many years ago.

488
00:31:42,710 --> 00:31:50,310
And basically the scientific
method is based upon studying

489
00:31:50,310 --> 00:31:51,560
available data.

490
00:32:00,690 --> 00:32:09,080
The data you have is of course
the program text itself, the

491
00:32:09,080 --> 00:32:13,770
test results (you ran some tests
and got the wrong answer,

492
00:32:13,770 --> 00:32:18,190
which is why you knew
you had a bug).

493
00:32:18,190 --> 00:32:23,240
And then you can probe it, you
can change the test results by

494
00:32:23,240 --> 00:32:28,110
using print statements so that
you have more data to study.

495
00:32:28,110 --> 00:32:31,840
Keep in mind that you don't
understand this program,

496
00:32:31,840 --> 00:32:33,090
because if you did
it would work.

497
00:32:35,600 --> 00:32:47,640
Once I study this, I form a
hypothesis that at least I

498
00:32:47,640 --> 00:32:49,655
think is consistent
with the data.

499
00:32:56,750 --> 00:33:08,595
And then I go and design and run
a repeatable experiment.

500
00:33:12,728 --> 00:33:15,570
And I want to emphasize the
word repeatable, here.

501
00:33:24,580 --> 00:33:28,200
And again, the key thing, as with
the scientific method, is that

502
00:33:28,200 --> 00:33:34,330
to be useful, the experiment
must have the potential to

503
00:33:34,330 --> 00:33:36,370
refute the hypothesis.

504
00:33:46,290 --> 00:33:49,870
Why might repeatability
be an issue?

505
00:33:49,870 --> 00:33:54,650
Well, as we'll see pretty
soon, a lot of programs

506
00:33:54,650 --> 00:33:58,070
involve randomness.

507
00:33:58,070 --> 00:34:02,620
Where you're doing something
equivalent to flipping a coin,

508
00:34:02,620 --> 00:34:05,690
somewhere in the program which
might come up heads or tails,

509
00:34:05,690 --> 00:34:08,110
and the program would
do different things.

510
00:34:08,110 --> 00:34:10,420
We'll see why that's an
important programming

511
00:34:10,420 --> 00:34:12,320
technique soon.

512
00:34:12,320 --> 00:34:15,960
And once you do that, you can
get different results with

513
00:34:15,960 --> 00:34:17,210
different runs.
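One common way to make a run that involves randomness repeatable is to seed the pseudo-random number generator, so the same "coin flips" come up on every run. This is a minimal sketch using Python's standard `random` module; `flip_coins` is a hypothetical helper for illustration, not something from the lecture, which returns to randomness later in the course.

```python
import random


def flip_coins(n, seed=None):
    """Return n simulated coin flips; passing the same seed replays the same flips."""
    rng = random.Random(seed)  # a private generator, so other code can't disturb its state
    return [rng.choice(['H', 'T']) for _ in range(n)]


# Same seed, same sequence: a run that exposed a bug can be reproduced exactly.
print(flip_coins(10, seed=42) == flip_coins(10, seed=42))  # True
```

Using a private `random.Random` instance rather than the module-level functions is a design choice: it keeps the replayed sequence from being perturbed by other code that also draws random numbers.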

514
00:34:19,460 --> 00:34:25,139
More subtly there can be various
kinds of timing errors

515
00:34:25,139 --> 00:34:27,570
deep down in the operating
system where you have multiple

516
00:34:27,570 --> 00:34:30,120
activities going on
at the same time.

517
00:34:30,120 --> 00:34:33,429
This is usually the reason that
you'll see, say, Windows

518
00:34:33,429 --> 00:34:38,389
crash, or Word, or PowerPoint,
or something else.

519
00:34:38,389 --> 00:34:42,580
Because there's some timing
error that occurs sometimes.

520
00:34:42,580 --> 00:34:46,350
And probably most commonly,
because there's human input.

521
00:34:46,350 --> 00:34:47,770
Somebody typed something
and they might

522
00:34:47,770 --> 00:34:50,719
type something different.

523
00:34:50,719 --> 00:34:55,440
So one of the things you want
to do when you're systematic

524
00:34:55,440 --> 00:34:59,090
is make sure that you
can replay things.

525
00:34:59,090 --> 00:35:02,780
And we'll talk more about this
when we get to randomness,

526
00:35:02,780 --> 00:35:06,190
about how we go about
doing that.
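For the human-input case, one approach is to record what was typed and feed it back on later runs. This sketch assumes the program can be changed to take its input source as a parameter; `read_elements`, its `read` parameter, and `replayer` are hypothetical names. By default it calls the built-in `input`, but a test can pass a function that replays a fixed list of answers.

```python
def read_elements(n, read=input):
    """Collect n responses using read (the built-in input by default)."""
    return [read('Enter element: ') for _ in range(n)]


def replayer(answers):
    """Return an input-like function that hands back recorded answers in order."""
    it = iter(answers)
    return lambda prompt: next(it)


# Replaying the same recorded answers gives identical input on every run.
print(read_elements(2, replayer(['a', 'b'])))  # ['a', 'b']
```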

527
00:35:06,190 --> 00:35:08,570
All right, now let's try and
put this all together in a

528
00:35:08,570 --> 00:35:10,870
little program.

529
00:35:10,870 --> 00:35:14,600
If you've been studying your
handout, as at least one of

530
00:35:14,600 --> 00:35:20,000
the TAs did, you've been kind
of mystified by the fact that

531
00:35:20,000 --> 00:35:22,230
there's a pretty crummy
looking program in it.

532
00:35:26,090 --> 00:35:30,300
And unlike the times when I
make mistakes I don't know

533
00:35:30,300 --> 00:35:31,310
I've made, here I

534
00:35:31,310 --> 00:35:34,100
intentionally made some mistakes.

535
00:35:34,100 --> 00:35:37,380
So let's look at this program.

536
00:35:37,380 --> 00:35:41,980
I wrote a function called
isPal that takes in a

537
00:35:41,980 --> 00:35:46,190
list and is intended to return
true if the list is a

538
00:35:46,190 --> 00:35:49,390
palindrome and false
otherwise.

539
00:35:49,390 --> 00:35:52,680
Then I wrote this little program
called Silly that uses

540
00:35:52,680 --> 00:36:00,000
isPal, takes in a number,
requests that the user make

541
00:36:00,000 --> 00:36:05,910
that many inputs, then calls
isPal to find out whether or

542
00:36:05,910 --> 00:36:08,020
not the resultant list
is a palindrome.
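The handout code itself isn't reproduced in this transcript. What follows is a hedged reconstruction, written for Python 3, of a program containing the bugs the lecture goes on to find: the list is reset inside the loop, `reverse` is named but never called, and `temp` is an alias for `x` rather than a copy. The `read` parameter is an addition for this sketch so the program can be driven without a keyboard; it is not in the original.

```python
def isPal(x):
    """Intended: return True if list x is a palindrome, False otherwise.
    DELIBERATELY BUGGY, mirroring the lecture's example."""
    assert type(x) == list
    temp = x        # bug: temp is another name for x, not a copy
    temp.reverse    # bug: names the method but never calls it
    if temp == x:
        return True
    else:
        return False


def silly(n, read=input):
    """Ask for n elements, then report whether they form a palindrome.
    DELIBERATELY BUGGY, mirroring the lecture's example."""
    for _ in range(n):
        result = []  # bug: re-initialized on every pass through the loop
        elem = read('Enter element: ')
        result.append(elem)
    if isPal(result):
        print('Yes')
    else:
        print('No')
```

Because of the `reverse` bug, this `isPal` returns `True` for every list, so `silly` reports every input as a palindrome.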

543
00:36:10,750 --> 00:36:12,900
Not too complicated.

544
00:36:12,900 --> 00:36:14,630
But now let's run it.

545
00:36:22,560 --> 00:36:25,240
Do Silly of 5.

546
00:36:34,670 --> 00:36:37,140
And it tells me 'abcde'
is a palindrome.

547
00:36:39,670 --> 00:36:42,290
All right, I have a bug.

548
00:36:42,290 --> 00:36:46,830
Now I need to go try
and find that bug.

549
00:36:46,830 --> 00:36:49,390
So the first thing I need to
think about when I'm looking

550
00:36:49,390 --> 00:36:56,590
for it is to try and find a
smaller piece of input that

551
00:36:56,590 --> 00:36:57,840
will produce the bug.

552
00:37:08,700 --> 00:37:13,470
So I want to find a small input
on which the program fails.

553
00:37:13,470 --> 00:37:16,610
Why do I want to find
a smaller input?

554
00:37:16,610 --> 00:37:21,300
Well, (a) in this case it's less
typing; (b) if it's a real

555
00:37:21,300 --> 00:37:25,280
program, it's probably less
execution time to make it run;

556
00:37:25,280 --> 00:37:28,910
but (c) it'll be easier to debug
because there are fewer kinds

557
00:37:28,910 --> 00:37:30,560
of problems.

558
00:37:30,560 --> 00:37:33,860
So let me try it on a
small piece of input

559
00:37:33,860 --> 00:37:35,220
say, Silly of 1.

560
00:37:39,240 --> 00:37:42,680
Oh, it gets that right.

561
00:37:42,680 --> 00:37:45,210
So that's no good.

562
00:37:45,210 --> 00:37:48,300
Let me try something else, let's
try Silly of 2, I'm sort

563
00:37:48,300 --> 00:37:49,550
of sneaking up.

564
00:37:52,810 --> 00:37:55,270
It gets that one wrong.

565
00:37:55,270 --> 00:37:59,990
All right, so I know I can
test it on a small input.

566
00:37:59,990 --> 00:38:01,540
So that's a good thing.

567
00:38:01,540 --> 00:38:03,630
I now have a simple test.
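The "sneaking up" strategy can be automated when you have some way to say what the right answer should be. In this minimal sketch, `smallest_failing` (a hypothetical helper) tries candidate inputs from smallest to largest and returns the first one on which the program under test disagrees with a reference answer. It is applied to a reconstruction of the lecture's buggy `isPal` (not the handout verbatim), with `x == x[::-1]` assumed as the reference.

```python
def buggy_isPal(x):
    # Reconstruction of the buggy version: reverse is named but never called.
    temp = x
    temp.reverse
    return temp == x


def expected_isPal(x):
    # Reference answer used only for testing: compare with a reversed copy.
    return x == x[::-1]


def smallest_failing(candidates, program, expected):
    """Return the first candidate on which program and expected disagree,
    or None if they agree on all of them."""
    for c in candidates:
        if program(c) != expected(c):
            return c
    return None


# Inputs of increasing length: ['a'], ['a', 'b'], ['a', 'b', 'c'], ...
letters = 'abcde'
candidates = [list(letters[:k]) for k in range(1, len(letters) + 1)]
print(smallest_failing(candidates, buggy_isPal, expected_isPal))  # ['a', 'b']
```

As in the lecture, a one-element list passes (it really is a palindrome), and two elements is the smallest input that exposes the problem.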

568
00:38:03,630 --> 00:38:08,340
Now in this case the code is so
short, and so stupid, that

569
00:38:08,340 --> 00:38:10,520
you could probably look at it
with your eyes and just find

570
00:38:10,520 --> 00:38:12,340
the bug instantly.

571
00:38:12,340 --> 00:38:15,870
But the point of this exercise
is not to find the bug, but to

572
00:38:15,870 --> 00:38:18,890
kind of show the process.

573
00:38:18,890 --> 00:38:21,970
So now I want to go through
this process of binary search

574
00:38:21,970 --> 00:38:25,660
to try and find the bug.

575
00:38:25,660 --> 00:38:31,360
So we'll start with Silly, the
top level program, and I'll

576
00:38:31,360 --> 00:38:37,920
look for something about halfway
through, maybe here.

577
00:38:37,920 --> 00:38:42,380
And now try to answer the
question: I've got a lot

578
00:38:42,380 --> 00:38:49,950
of code, and I'm going to find a
point halfway through it and

579
00:38:49,950 --> 00:38:52,730
ask, is the bug above
this, or below this?

580
00:38:56,940 --> 00:38:58,610
So I need to find some

581
00:38:58,610 --> 00:39:01,430
intermediate value I can check.

582
00:39:01,430 --> 00:39:03,980
And at this point in the program
the only thing I have

583
00:39:03,980 --> 00:39:07,350
done is accumulate
the input, right?

584
00:39:07,350 --> 00:39:09,950
So there's nothing
else to ask.

585
00:39:09,950 --> 00:39:14,500
So my hypothesis is that
everything is good and that

586
00:39:14,500 --> 00:39:17,610
the input will be 'ab'.

587
00:39:17,610 --> 00:39:20,150
So let's try it.

588
00:39:20,150 --> 00:39:27,450
Let's print result here every
time through and see if we get

589
00:39:27,450 --> 00:39:29,410
what we wanted to get.

590
00:39:45,410 --> 00:39:49,230
All right, that's not
what I expected.

591
00:39:49,230 --> 00:39:51,830
So something is wrong.

592
00:39:51,830 --> 00:39:53,080
What's wrong?

593
00:39:56,450 --> 00:40:00,189
Why is result always
the empty list?

594
00:40:00,189 --> 00:40:01,175
I can out-wait you.

595
00:40:01,175 --> 00:40:06,721
AUDIENCE: Because whenever it
goes through the for loop it

596
00:40:06,721 --> 00:40:07,091
keeps coming back.

597
00:40:07,091 --> 00:40:08,090
PROFESSOR: Right.

598
00:40:08,090 --> 00:40:12,020
So every time through the for
loop, it's reinitializing--

599
00:40:12,020 --> 00:40:15,690
whoa, got you.

600
00:40:15,690 --> 00:40:18,600
For those of you watching on TV,
I just hit a person that

601
00:40:18,600 --> 00:40:22,040
was heads down with
a piece of candy.

602
00:40:22,040 --> 00:40:24,500
Fortunately it was
not a hard candy.

603
00:40:24,500 --> 00:40:26,960
All right, so you're right.

604
00:40:26,960 --> 00:40:28,425
Let's get that out of there.

605
00:40:32,470 --> 00:40:33,755
Put it where it belongs.
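Moving the initialization out of the loop gives something like the following sketch. As before, this is a Python 3 reconstruction, with a hypothetical `read` parameter so it runs without typing and a `return result` added only to make it testable. The debugging print sits before the `append`, so each pass shows the list as it was before the new element is added. `isPal` here is still the buggy version, because there is more than one bug.

```python
def isPal(x):
    # Still the buggy reconstruction: temp aliases x and reverse is never called.
    temp = x
    temp.reverse
    return temp == x


def silly(n, read=input):
    result = []                    # fix: initialize once, before the loop
    for _ in range(n):
        elem = read('Enter element: ')
        print('result =', result)  # debugging print, placed before the append
        result.append(elem)
    if isPal(result):
        print('Yes')
    else:
        print('No')
    return result                  # added only so the sketch can be checked
```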

606
00:40:37,220 --> 00:40:38,470
Run it again.

607
00:40:49,110 --> 00:40:51,645
OK, are we happy with
that result?

608
00:40:57,180 --> 00:41:00,780
Yeah, because I've done that
before the append, right?

609
00:41:00,780 --> 00:41:03,900
And now just to be sure, we'll
take this print statement out

610
00:41:03,900 --> 00:41:07,310
here and let's put it here.

611
00:41:07,310 --> 00:41:08,560
We're now searching elsewhere.

612
00:41:20,620 --> 00:41:26,110
Well the good news is I now have
the right result for the

613
00:41:26,110 --> 00:41:29,920
value of the variable, but the
wrong result for the program.

614
00:41:29,920 --> 00:41:33,050
It's still telling me
it's a palindrome.

615
00:41:33,050 --> 00:41:47,075
So the moral here is there is
no such thing as the bug.

616
00:41:51,630 --> 00:41:55,190
Never use the definite
article.

617
00:41:55,190 --> 00:41:56,440
There is a bug.

618
00:41:59,420 --> 00:42:03,770
There's a story that I've heard
related to this, as far

619
00:42:03,770 --> 00:42:04,590
as finding a bug.

620
00:42:04,590 --> 00:42:08,010
You can imagine that you're at
someone's house for dinner,

621
00:42:08,010 --> 00:42:10,380
you're sitting at the dining
room table, you can't see the

622
00:42:10,380 --> 00:42:15,930
kitchen, and suddenly you hear
from the kitchen, [BAM].

623
00:42:15,930 --> 00:42:17,810
What the heck's that?

624
00:42:17,810 --> 00:42:20,610
Your hostess walks out and
says, don't worry I just

625
00:42:20,610 --> 00:42:23,870
killed the cockroach
on the turkey.

626
00:42:23,870 --> 00:42:26,035
Well, your immediate
reaction is the

627
00:42:26,035 --> 00:42:27,980
cockroach on the turkey?

628
00:42:27,980 --> 00:42:30,660
Where there's one, there's
likely to be more.

629
00:42:30,660 --> 00:42:33,470
Every time you found a bug--

630
00:42:33,470 --> 00:42:35,690
the more bugs you find, then
probably the more bugs there

631
00:42:35,690 --> 00:42:38,080
are still left, because
you've shown that you

632
00:42:38,080 --> 00:42:40,230
make a lot of mistakes.

633
00:42:40,230 --> 00:42:42,710
All right, onward we go.

634
00:42:42,710 --> 00:42:45,080
So what do we do next?

635
00:42:45,080 --> 00:42:49,010
Well, we now know at least that
things look OK to this

636
00:42:49,010 --> 00:42:55,090
point, which suggests that the
problem must come below this

637
00:42:55,090 --> 00:42:56,710
in the program.

638
00:42:56,710 --> 00:42:59,260
Well the only thing that's going
on below this is the

639
00:42:59,260 --> 00:43:01,770
call to isPal.

640
00:43:01,770 --> 00:43:06,680
So now we'll say OK, we've now
isolated the bug to isPal.

641
00:43:06,680 --> 00:43:08,650
That's a good thing.

642
00:43:08,650 --> 00:43:14,950
Let's try and ask where things
are going wrong there.

643
00:43:14,950 --> 00:43:21,680
So we'll take a point halfway
through isPal, and we'll print

644
00:43:21,680 --> 00:43:24,060
some things here.

645
00:43:24,060 --> 00:43:25,310
So let's print--

646
00:43:40,020 --> 00:43:42,680
see what we have here.

647
00:43:42,680 --> 00:43:46,350
But before I do that, I've
gotten really tired of typing

648
00:43:46,350 --> 00:43:53,600
'a' and 'b', so I'm going to
use something called a test

649
00:43:53,600 --> 00:43:55,385
driver, or a test harness.

650
00:43:58,060 --> 00:44:01,370
And I recommend that you do this
kind of thing whenever

651
00:44:01,370 --> 00:44:03,500
you're testing a program.

652
00:44:03,500 --> 00:44:08,190
Write some code that has nothing
to do with the program

653
00:44:08,190 --> 00:44:13,850
itself but makes it easier to
test and debug the program.

654
00:44:23,290 --> 00:44:26,720
The pretentious word for
this is a test harness.

655
00:44:31,900 --> 00:44:35,690
It's just code that
helps with testing.

656
00:44:35,690 --> 00:44:41,520
One of the things that you see
in industry is about half the

657
00:44:41,520 --> 00:44:45,540
code that gets written is not
intended to be delivered as

658
00:44:45,540 --> 00:44:49,330
part of the final product, but
is there merely for the

659
00:44:49,330 --> 00:44:52,360
purpose of testing
and debugging.

660
00:44:52,360 --> 00:44:54,040
It's a big deal.

661
00:44:54,040 --> 00:44:58,210
So don't feel bad that you're
writing code that's not part

662
00:44:58,210 --> 00:45:02,050
of the solution to the problem
set and is there only to help

663
00:45:02,050 --> 00:45:05,080
you make your code work.

664
00:45:05,080 --> 00:45:07,660
It seems like it's extra
work, but in fact, it

665
00:45:07,660 --> 00:45:10,640
will save you work.
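A test harness can be as small as a function that calls the code under test on a few fixed inputs and prints each result next to the answer we expect. This is a minimal sketch, not the lecture's own driver. `test_isPal` is a hypothetical name, and the `isPal` shown is the still-buggy reconstruction (reverse never called), so the harness should report a mismatch.

```python
def isPal(x):
    # Buggy reconstruction under test: temp aliases x and reverse is never called.
    temp = x
    temp.reverse
    return temp == x


def test_isPal():
    """Run isPal on fixed inputs (no typing needed); return the failing cases."""
    cases = [([1, 2], False), ([1, 2, 1], True), ([], True), (['a'], True)]
    failures = []
    for x, expected in cases:
        actual = isPal(list(x))  # pass a copy so a mutating isPal can't alter the case
        print('isPal(%s) = %s, expected %s' % (x, actual, expected))
        if actual != expected:
            failures.append(x)
    return failures


test_isPal()
```

Printing the expected value alongside the actual one makes the output easy to scan, and returning the failures lets the harness be reused once the bugs are fixed.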

666
00:45:10,640 --> 00:45:16,030
So let's call it.

667
00:45:16,030 --> 00:45:17,840
We'll call isPal.

668
00:45:17,840 --> 00:45:19,490
And it's going to print
some things that I

669
00:45:19,490 --> 00:45:21,740
think it should do.

670
00:45:21,740 --> 00:45:25,180
In fact, we'll look at what it
does first before we look at

671
00:45:25,180 --> 00:45:28,500
the print statements in isPal.

672
00:45:28,500 --> 00:45:30,640
So for the moment, let me
just comment these out.

673
00:45:38,980 --> 00:45:48,260
And what we see here is it
should print false, and it

674
00:45:48,260 --> 00:45:49,510
prints true.

675
00:45:52,350 --> 00:45:57,520
Well, should it print false
the second time?

676
00:45:57,520 --> 00:45:58,980
No, right.

677
00:45:58,980 --> 00:46:03,030
So it should have printed
true, and it did.

678
00:46:03,030 --> 00:46:05,440
So this is an important
lesson.

679
00:46:05,440 --> 00:46:09,560
Make sure that when you put in
these debugging statements,

680
00:46:09,560 --> 00:46:12,770
you write down as part of the
print statement what you

681
00:46:12,770 --> 00:46:15,400
expect it to print.

682
00:46:15,400 --> 00:46:19,970
So that when you look at your
output you can quickly scan it

683
00:46:19,970 --> 00:46:22,330
and see whether the program
is behaving as

684
00:46:22,330 --> 00:46:23,580
you thought it would.
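One way to follow that advice is a tiny helper that prints each value next to what you expected and flags disagreements, so a mismatch stands out when scanning the output. `check` is a hypothetical helper invented for this sketch.

```python
def check(label, actual, expected):
    """Print a debugging line that includes the expected value; return True if they match."""
    status = 'ok' if actual == expected else '*** MISMATCH ***'
    print('%s: got %r, expected %r  %s' % (label, actual, expected, status))
    return actual == expected


check('len of result', len(['a', 'b']), 2)   # prints: len of result: got 2, expected 2  ok
check('first element', ['a', 'b'][0], 'b')   # prints: first element: got 'a', expected 'b'  *** MISMATCH ***
```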

685
00:46:26,000 --> 00:46:29,785
So now it works once and doesn't
work the other time.

686
00:46:34,840 --> 00:46:44,610
So we'll go back and turn on the
print statements up here

687
00:46:44,610 --> 00:46:45,860
and see what we get.

688
00:46:54,210 --> 00:46:59,260
So it's printed temp as
1-2-1 and x as 1-2-1.

689
00:46:59,260 --> 00:47:02,100
So it's kind of OK that temp
and x are the

690
00:47:02,100 --> 00:47:05,320
same, we expected that.

691
00:47:05,320 --> 00:47:09,440
But we thought we reversed it.

692
00:47:09,440 --> 00:47:11,770
We've entered 1-2-1
and it is this.

693
00:47:11,770 --> 00:47:12,630
What's going on?

694
00:47:12,630 --> 00:47:15,410
What's wrong?

695
00:47:15,410 --> 00:47:19,550
Well now what we can do, is
let's see where it went wrong.

696
00:47:19,550 --> 00:47:25,800
We'll put in another print
statement here, see

697
00:47:25,800 --> 00:47:27,125
what value is there.

698
00:47:31,180 --> 00:47:34,950
Well it was 1-2-1 before
reverse, and

699
00:47:34,950 --> 00:47:37,460
it's 1-2-1 after reverse.

700
00:47:37,460 --> 00:47:38,710
How come?

701
00:47:41,500 --> 00:47:43,300
Why isn't reverse
reversing temp?

702
00:47:46,060 --> 00:47:47,900
AUDIENCE: Do you need
parentheses after reverse?

703
00:47:47,900 --> 00:47:52,220
PROFESSOR: Exactly, I need
parentheses after reverse.

704
00:47:55,390 --> 00:47:59,060
Whoa, close.

705
00:47:59,060 --> 00:48:04,360
Because without the parentheses,
reverse isn't

705
00:48:04,360 --> 00:48:08,150
doing anything.

707
00:48:08,150 --> 00:48:11,500
That's just the name of the
method, not an invocation of

708
00:48:11,500 --> 00:48:14,990
the method, right?
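The difference is easy to see at the interpreter. Without parentheses, `temp.reverse` just evaluates to a bound method object, which the statement then discards; with parentheses, `list.reverse()` reverses the list in place and returns `None`. A short sketch:

```python
temp = [1, 2]
temp.reverse                    # evaluates the method object and discards it: no effect
print(temp)                     # [1, 2]
print(callable(temp.reverse))   # True: it's a method, not a result
result = temp.reverse()         # actually calls it; reverses the list in place
print(temp, result)             # [2, 1] None
```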

709
00:48:14,990 --> 00:48:16,510
All right, now let's run it.

710
00:48:21,220 --> 00:48:22,470
Good news and bad news.

711
00:48:25,890 --> 00:48:29,020
What's the good news?

712
00:48:29,020 --> 00:48:40,780
It has indeed reversed 1-2,
right, to make it 2-1, but it's

713
00:48:40,780 --> 00:48:42,030
also reversed x.

714
00:48:44,720 --> 00:48:48,310
So naturally, since it's
reversed x, temp and x will be

715
00:48:48,310 --> 00:48:50,880
the same, and I get
the wrong answer.

716
00:48:50,880 --> 00:48:52,130
What's wrong now?

717
00:48:54,590 --> 00:48:55,160
Yeah?

718
00:48:55,160 --> 00:48:57,380
AUDIENCE: So, I think
you're aliasing.

719
00:48:57,380 --> 00:48:58,772
PROFESSOR: I'm aliasing?

720
00:48:58,772 --> 00:49:00,616
AUDIENCE: And it's reversing--

721
00:49:00,616 --> 00:49:05,210
PROFESSOR: Right, because remember
how mutation works:

722
00:49:05,210 --> 00:49:09,100
temp and x both point
to the same object.

723
00:49:09,100 --> 00:49:12,590
If I reverse the object, it
doesn't matter whether I get

724
00:49:12,590 --> 00:49:15,730
to it through x or I get to it
through temp, it will still

725
00:49:15,730 --> 00:49:17,410
have been reversed.
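Aliasing can be checked directly with the `is` operator, which tests whether two names refer to the same object. A minimal sketch of what happened inside the buggy `isPal`:

```python
x = [1, 2]
temp = x            # no copy is made: temp and x name the same list object
print(temp is x)    # True
temp.reverse()      # mutate the object through one name...
print(x)            # [2, 1]  ...and the change is visible through the other
print(temp == x)    # True, so the palindrome test can never fail
```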

726
00:49:17,410 --> 00:49:26,190
So in this case, what I'd need
to do is this, clone it.

727
00:49:29,910 --> 00:49:34,690
And now when I run my
code, it works.
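Putting the fixes together gives something like the following. This is a sketch of a corrected function, not the handout verbatim. It clones with the slice `x[:]` (one common idiom; `list(x)` or `x.copy()` would also work) and then calls `reverse()` on the copy, so `x` itself is left unchanged.

```python
def isPal(x):
    """Return True if list x is a palindrome, False otherwise."""
    assert isinstance(x, list)
    temp = x[:]       # clone: temp is a new list with the same elements
    temp.reverse()    # call the method; this reverses only the copy
    return temp == x  # x is untouched, so this compares x with its reverse


print(isPal([1, 2, 1]), isPal([1, 2]))  # True False
```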

728
00:49:34,690 --> 00:49:36,320
No applause?

729
00:49:36,320 --> 00:49:40,600
All right, a couple more things
about debugging next

730
00:49:40,600 --> 00:49:43,750
Tuesday, and then we'll move on
to some pretty interesting

731
00:49:43,750 --> 00:49:45,380
topics in the next phase
of the course.