1
00:00:01,060 --> 00:00:03,090
PROFESSOR: So in this
final segment today,

2
00:00:03,090 --> 00:00:06,900
we're going to talk about
set theory just a little bit.

3
00:00:06,900 --> 00:00:10,370
Because if you're going
to take a math class,

4
00:00:10,370 --> 00:00:13,820
if you're going to be exposed
to math for computer science,

5
00:00:13,820 --> 00:00:16,110
it's useful to have
at least a glimmering

6
00:00:16,110 --> 00:00:19,600
of what the foundations of
math look like, how it starts

7
00:00:19,600 --> 00:00:20,860
and how it gets justified.

8
00:00:20,860 --> 00:00:23,310
And that's what set theory does.

9
00:00:23,310 --> 00:00:27,540
In addition, we will see
that the diagonal argument

10
00:00:27,540 --> 00:00:30,790
that we've already made much
of played a crucial role

11
00:00:30,790 --> 00:00:34,460
in the development and
understanding of set theory.

12
00:00:34,460 --> 00:00:38,130
So let's begin
with an issue that

13
00:00:38,130 --> 00:00:41,220
plays an important role in set
theory and in computer science,

14
00:00:41,220 --> 00:00:44,250
having to do with the
idea of taking a function

15
00:00:44,250 --> 00:00:47,500
and applying it to itself,
or having something

16
00:00:47,500 --> 00:00:49,020
refer to itself.

17
00:00:49,020 --> 00:00:51,900
And this is one of these things
that's notoriously doubtful.

18
00:00:51,900 --> 00:00:53,120
There are all these paradoxes.

19
00:00:53,120 --> 00:00:56,330
But maybe the
simplest one is when I

20
00:00:56,330 --> 00:00:58,810
assert this statement is false.

21
00:00:58,810 --> 00:01:01,780
And the question is,
is it true or false?

22
00:01:01,780 --> 00:01:04,364
Well, if it's true,
then it's false.

23
00:01:04,364 --> 00:01:05,780
And if it's false,
then it's true.

24
00:01:05,780 --> 00:01:08,100
And we get a kind of buzzer.

25
00:01:08,100 --> 00:01:11,800
It's not possible to figure
out whether this statement is

26
00:01:11,800 --> 00:01:12,420
true or false.

27
00:01:12,420 --> 00:01:15,300
I think we would deny
that it was a proposition.

28
00:01:15,300 --> 00:01:17,360
So that's a hint that
there's something

29
00:01:17,360 --> 00:01:21,080
suspicious about self-reference,
self-application, and so on.
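The liar sentence can be checked mechanically. It says of itself "I am false," so any truth value v it could have would have to equal its own negation. The small Python sketch below just tries both candidates. The helper name `liar_consistent` is illustrative only.

```python
def liar_consistent(v):
    """'This statement is false' is true exactly when it is false,
    so a truth value v for it would have to satisfy v == (not v)."""
    return v == (not v)


# Try both possible truth values; keep the ones that fit.
candidates = [v for v in (True, False) if liar_consistent(v)]
print(candidates)  # []: neither True nor False works
```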

30
00:01:21,080 --> 00:01:23,070
On the other hand,
it's worth remembering

31
00:01:23,070 --> 00:01:25,620
that in computer science,
we take this for granted.

32
00:01:25,620 --> 00:01:27,810
So let's look at an example.

33
00:01:27,810 --> 00:01:32,590
Here's a simple example
of a list in Scheme

34
00:01:32,590 --> 00:01:36,730
Lisp notation, meaning it's
a list of three things, 0, 1,

35
00:01:36,730 --> 00:01:37,750
and 2.

36
00:01:37,750 --> 00:01:40,900
And the black parens
indicate that we're thinking

37
00:01:40,900 --> 00:01:43,120
of it as an ordered list.

38
00:01:43,120 --> 00:01:46,100
Now the way that I would
represent a list like that

39
00:01:46,100 --> 00:01:49,690
in memory, typically, is
by using what are

40
00:01:49,690 --> 00:01:50,620
called cons cells.

41
00:01:50,620 --> 00:01:52,750
So a cons cell has
these two parts.

42
00:01:52,750 --> 00:01:56,700
The left hand part
points to the value

43
00:01:56,700 --> 00:01:58,410
in that location in the list.

44
00:01:58,410 --> 00:02:00,990
So this first cons
cell points to 0,

45
00:02:00,990 --> 00:02:02,750
which is the first
element in the list.

46
00:02:02,750 --> 00:02:06,050
The second component of the cons
cell points to the next cell

47
00:02:06,050 --> 00:02:06,710
in the list.

48
00:02:06,710 --> 00:02:09,590
And so here you see 1
pointing to the third element

49
00:02:09,590 --> 00:02:10,090
of the list.

50
00:02:10,090 --> 00:02:11,220
And there you see 2.

51
00:02:11,220 --> 00:02:13,840
And that little nil
marker indicates

52
00:02:13,840 --> 00:02:15,080
that's the end of the list.

53
00:02:15,080 --> 00:02:17,140
So that's a simple
representation

54
00:02:17,140 --> 00:02:21,650
of a list of length three
with three cons cells.

55
00:02:21,650 --> 00:02:25,180
One of the things that
computer science lets

56
00:02:25,180 --> 00:02:27,050
you do, and many
languages let you do,

57
00:02:27,050 --> 00:02:30,220
is manipulate
these pointers.

58
00:02:30,220 --> 00:02:34,000
So using the language
of Scheme, what I can do

59
00:02:34,000 --> 00:02:39,240
is use an operation called
set-car!, and

60
00:02:39,240 --> 00:02:42,940
in this case, I'm setting
the second element of L, that

61
00:02:42,940 --> 00:02:46,590
is this cons cell, to
L. And set-car! is saying,

62
00:02:46,590 --> 00:02:54,360
let's change what the element in
the left hand part of this cell

63
00:02:54,360 --> 00:02:55,030
is.

64
00:02:55,030 --> 00:02:57,600
This is where the value
of the second element is.

65
00:02:57,600 --> 00:03:01,089
Let's change the value of
the second element to be L.

66
00:03:01,089 --> 00:03:03,005
What does that mean as
a pointer manipulation?

67
00:03:03,005 --> 00:03:04,670
Well, it's pretty simple.

68
00:03:04,670 --> 00:03:06,400
I just move this
pointer to point

69
00:03:06,400 --> 00:03:10,490
to the beginning of
the list L. And now

70
00:03:10,490 --> 00:03:14,870
I have an interesting
situation, because this list now

71
00:03:14,870 --> 00:03:19,700
is a list that consists
of 0, L, and 2.

72
00:03:19,700 --> 00:03:22,950
It's a list that has
itself as a member.

73
00:03:22,950 --> 00:03:25,360
And it makes perfect sense.

74
00:03:25,360 --> 00:03:27,580
And if you sort of
expand that out,

75
00:03:27,580 --> 00:03:31,580
L is this list
that begins with 0.

76
00:03:31,580 --> 00:03:35,570
And then its second element
is a list that begins with 0.

77
00:03:35,570 --> 00:03:37,450
And the second
element of that list

78
00:03:37,450 --> 00:03:39,760
is a list that begins
with 0, and so on.

79
00:03:39,760 --> 00:03:42,580
And then the third
element of L is 2,

80
00:03:42,580 --> 00:03:45,000
and the third element of
the second element of L

81
00:03:45,000 --> 00:03:46,460
is 2, and so on.

82
00:03:46,460 --> 00:03:49,060
It's an interesting
infinite nested structure

83
00:03:49,060 --> 00:03:55,100
that's nicely represented by
this finite circular list.
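The pointer surgery above can be sketched in Python. Python has no built-in cons cells, so this assumes a small hypothetical `Cons` class with `car` and `cdr` slots and uses `None` for nil. The helpers `make_list`, `set_car` (a stand-in for Scheme's `set-car!`), and `show` are illustrative names and are not part of any library. The structure is still only three cells, but printing it to a fixed depth shows the infinite nesting described in the lecture.

```python
class Cons:
    """A cons cell: `car` holds an element, `cdr` points to the next cell (None plays nil)."""

    __slots__ = ("car", "cdr")

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


def make_list(*items):
    """Build a chain of cons cells, one per item, ending in None."""
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head


def set_car(cell, value):
    """Like Scheme's set-car!: overwrite the element slot of an existing cell."""
    cell.car = value


def show(cell, depth):
    """Render a list, expanding nested lists only `depth` levels deep; deeper ones print as '...'."""
    if depth == 0:
        return "..."
    parts = []
    while cell is not None:
        item = cell.car
        parts.append(show(item, depth - 1) if isinstance(item, Cons) else str(item))
        cell = cell.cdr
    return "(" + " ".join(parts) + ")"


L = make_list(0, 1, 2)  # (0 1 2): three cons cells
set_car(L.cdr, L)       # like (set-car! (cdr L) L): the second element becomes L itself

print(L.cdr.car is L)   # True: L is a member of itself
print(show(L, 3))       # (0 (0 (0 ... 2) 2) 2)
```

The cdr chain still ends in `None`, so walking along the list terminates. Only the element slot of the second cell points back to the head.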

84
00:03:58,710 --> 00:04:01,460
Let's look at another example
where, in computer science,

85
00:04:01,460 --> 00:04:04,470
we actually apply
things to themselves.

86
00:04:04,470 --> 00:04:06,610
So let's define the
composition operator.

87
00:04:06,610 --> 00:04:10,310
And again, I'm using notation
from the language Scheme.

88
00:04:10,310 --> 00:04:13,270
I want to take two functions f
and g that take one argument.

89
00:04:13,270 --> 00:04:15,250
I'm going to define
their composition.

90
00:04:15,250 --> 00:04:17,760
The way that I
compose f and g is

91
00:04:17,760 --> 00:04:20,670
I define a new function, h
of x, which is going to be

92
00:04:20,670 --> 00:04:22,380
the composition of f and g.

93
00:04:22,380 --> 00:04:27,970
The way I define h
of x is I say apply

94
00:04:27,970 --> 00:04:32,390
f to g applied to x,
and return h as the value.

95
00:04:32,390 --> 00:04:35,360
So compose is
a procedure that

96
00:04:35,360 --> 00:04:41,540
takes two procedures f and g and
returns their composition, h.

97
00:04:41,540 --> 00:04:43,520
OK, let's practice.

98
00:04:43,520 --> 00:04:47,610
Suppose that I compose
the square function that

99
00:04:47,610 --> 00:04:51,010
maps x to x squared, and
the add1 function that

100
00:04:51,010 --> 00:04:52,820
maps x to x plus 1.

101
00:04:52,820 --> 00:04:59,280
Well, if I compose
square with add1,

102
00:04:59,280 --> 00:05:06,170
and I apply it to 3, what I'm
saying is let's add 1 to 3,

103
00:05:06,170 --> 00:05:07,630
and then square it.

104
00:05:07,630 --> 00:05:13,150
And I get 3 plus 1 squared, or
16, because add 1 and then

105
00:05:13,150 --> 00:05:17,290
square is the function
that's the composition of square

106
00:05:17,290 --> 00:05:18,920
and add1.

107
00:05:18,920 --> 00:05:20,510
Now I can do the following.

108
00:05:20,510 --> 00:05:23,280
I could compose
square with itself.

109
00:05:23,280 --> 00:05:26,280
If I take a number,
square it, and square that,

110
00:05:26,280 --> 00:05:28,550
I'm really taking
the fourth power.

111
00:05:28,550 --> 00:05:32,580
So if I apply the function
composed of square square to 3,

112
00:05:32,580 --> 00:05:36,640
I get 3 squared squared, or
81, or 3 to the fourth.

113
00:05:36,640 --> 00:05:39,200
All makes perfect sense.
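Here is a Python rendering of the Scheme `compose` described above, offered as a sketch rather than the lecture's exact code. It returns an inner function `h` that closes over `f` and `g`. The definitions of `square` and `add1` follow the lecture's descriptions.

```python
def compose(f, g):
    """Return the function h with h(x) = f(g(x))."""
    def h(x):
        return f(g(x))
    return h


def square(x):
    return x * x


def add1(x):
    return x + 1


print(compose(square, add1)(3))    # add 1, then square: (3 + 1) ** 2 = 16
print(compose(square, square)(3))  # square twice: 3 ** 4 = 81
```

Order matters here. `compose(add1, square)(3)` squares first and then adds 1, which gives 10.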

114
00:05:39,200 --> 00:05:44,920
Well, now let's define a
compose-with-itself operation.

115
00:05:44,920 --> 00:05:46,570
I'm going to call it comp2.

116
00:05:46,570 --> 00:05:48,690
comp2 takes one function f.

117
00:05:48,690 --> 00:05:53,680
And the definition of
comp2 is compose f with f.

118
00:05:53,680 --> 00:05:58,880
And if I then apply
comp2 to square and 3,

119
00:05:58,880 --> 00:06:01,470
it's saying, OK, compose
square and square.

120
00:06:01,470 --> 00:06:02,270
We just did that.

121
00:06:02,270 --> 00:06:03,700
That was the fourth power.

122
00:06:03,700 --> 00:06:04,690
Apply it to 3.

123
00:06:04,690 --> 00:06:06,580
I get 81.

124
00:06:06,580 --> 00:06:08,370
And now we can do
some weird stuff.

125
00:06:08,370 --> 00:06:13,950
Because suppose that I
apply comp2 to comp2,

126
00:06:13,950 --> 00:06:19,290
and then apply that to
add1, and apply that to 3.

127
00:06:19,290 --> 00:06:20,970
Well that one's a
little hard to follow,

128
00:06:20,970 --> 00:06:22,720
and I'm going to let
you think it through.

129
00:06:22,720 --> 00:06:28,710
But comp2 of comp2 takes a
function and applies it four times.

130
00:06:28,710 --> 00:06:32,360
And when you do that
with add1, what happens

131
00:06:32,360 --> 00:06:39,110
is that you're adding
1 four times to 3.

132
00:06:39,110 --> 00:06:44,130
If I take comp2 of comp2
of square, I'm

133
00:06:44,130 --> 00:06:46,710
composing square with
itself, and then composing

134
00:06:46,710 --> 00:06:47,530
that with itself.

135
00:06:47,530 --> 00:06:51,110
I'm really squaring four times.

136
00:06:51,110 --> 00:06:59,040
And I wind up with 2
to the fourth, or 16,

137
00:06:59,040 --> 00:07:00,660
as the power that I'm taking.

138
00:07:00,660 --> 00:07:03,660
And so comp2 of
comp2 of square of 3

139
00:07:03,660 --> 00:07:06,769
is this rather large
number, 3 to the 16th.

140
00:07:06,769 --> 00:07:08,810
It could be a little bit
tricky to think through,

141
00:07:08,810 --> 00:07:10,101
but it all makes perfect sense.

142
00:07:10,101 --> 00:07:14,100
And it works just fine in
recursive programming languages

143
00:07:14,100 --> 00:07:17,530
that allow this kind of
untyped or generically

144
00:07:17,530 --> 00:07:20,120
typed composition.
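Python is dynamically typed, so it allows this kind of self-application too. The sketch below is a Python translation, not the lecture's Scheme. It defines `comp2` on top of `compose` and checks the three results worked out above. `comp2(comp2)` maps f to comp2(comp2(f)), which is (f∘f)∘(f∘f), so f gets applied four times.

```python
def compose(f, g):
    """Return the function h with h(x) = f(g(x))."""
    def h(x):
        return f(g(x))
    return h


def comp2(f):
    """Compose a one-argument function with itself: comp2(f)(x) == f(f(x))."""
    return compose(f, f)


def square(x):
    return x * x


def add1(x):
    return x + 1


four_times = comp2(comp2)    # comp2 applied to itself: f -> f o f o f o f

print(comp2(square)(3))      # 81, i.e. 3 ** 4
print(four_times(add1)(3))   # 7: add 1 four times
print(four_times(square)(3)) # 43046721, i.e. 3 ** 16
```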

145
00:07:20,120 --> 00:07:23,080
So why is it that computer
scientists are so daring,

146
00:07:23,080 --> 00:07:27,720
and mathematicians are so
timid about self-reference?

147
00:07:27,720 --> 00:07:29,520
And the reason is
that mathematicians

148
00:07:29,520 --> 00:07:32,010
have been traumatized by
Bertrand Russell because

149
00:07:32,010 --> 00:07:34,650
of Russell's famous
paradox, which

150
00:07:34,650 --> 00:07:36,660
we're now ready to look at.

151
00:07:36,660 --> 00:07:40,650
So what Russell was
proposing, and it's

152
00:07:40,650 --> 00:07:43,890
going to look just like a
diagonal argument, is this. Russell

153
00:07:43,890 --> 00:07:50,580
said, let's let W be the
collection of sets s such

154
00:07:50,580 --> 00:07:53,310
that s is not a member of s.

155
00:07:53,310 --> 00:07:55,440
Now let's think about
that for a little bit.

156
00:07:55,440 --> 00:07:57,240
Most sets are not
members of themselves.

157
00:07:57,240 --> 00:08:01,650
Like the set of integers
is not a member of itself

158
00:08:01,650 --> 00:08:04,470
because the only things
in it are integers.

159
00:08:04,470 --> 00:08:10,960
And the power set of integers
is not a member of itself

160
00:08:10,960 --> 00:08:15,450
because every member of
the power set of integers

161
00:08:15,450 --> 00:08:18,320
is a set of integers, whereas
the power set of integers

162
00:08:18,320 --> 00:08:20,050
is a set of sets
of integers.

163
00:08:20,050 --> 00:08:22,770
So those familiar
sets are typically not

164
00:08:22,770 --> 00:08:24,050
members of themselves.
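The membership checks above can be tried on a small scale. The sketch below uses the finite set {0, 1, 2} as a stand-in for the integers, since Python cannot hold an infinite set, and builds its power set from frozensets. The helper name `power_set` is illustrative.

```python
from itertools import combinations


def power_set(s):
    """Every subset of the finite set s, returned as a frozenset of frozensets."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


S = frozenset({0, 1, 2})  # a small finite stand-in for the integers
P = power_set(S)

print(len(P))  # 8 subsets
print(S in S)  # False: the members of S are numbers, not sets
print(P in P)  # False: the members of P are sets of numbers, but P is a set of sets
```

Python's frozensets are immutable values built from elements that already exist, so they can never contain themselves. The mutable cons cells in the circular list are different, and they can.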

165
00:08:24,050 --> 00:08:26,620
But who knows, maybe
there are these weird sets

166
00:08:26,620 --> 00:08:31,260
like the circular list,
or a function that

167
00:08:31,260 --> 00:08:35,320
can compose with itself,
that are members of themselves.

168
00:08:35,320 --> 00:08:37,590
Now let me step
back for a moment

169
00:08:37,590 --> 00:08:40,909
and mention how Russell
got to thinking about this.

170
00:08:40,909 --> 00:08:44,740
And it comes from the period
in the late 19th century

171
00:08:44,740 --> 00:08:46,700
when mathematicians
and logicians

172
00:08:46,700 --> 00:08:50,360
were trying to think about
confirming and establishing

173
00:08:50,360 --> 00:08:51,570
the foundations of math.

174
00:08:51,570 --> 00:08:53,480
What was math absolutely about?

175
00:08:53,480 --> 00:08:55,450
What were the
fundamental objects

176
00:08:55,450 --> 00:08:59,040
that mathematics
could be built from,

177
00:08:59,040 --> 00:09:02,090
and what were the rules for
understanding those objects?

178
00:09:02,090 --> 00:09:04,806
And it was pretty well
agreed that sets were it.

179
00:09:04,806 --> 00:09:06,430
You could build
everything out of sets.

180
00:09:06,430 --> 00:09:08,340
And you just needed
to understand sets,

181
00:09:08,340 --> 00:09:10,350
and then you were in business.

182
00:09:10,350 --> 00:09:12,470
And there was a
German mathematician

183
00:09:12,470 --> 00:09:16,190
named Frege who tried
to demonstrate this

184
00:09:16,190 --> 00:09:20,780
by developing a set
theory very carefully,

185
00:09:20,780 --> 00:09:23,200
giving careful axioms
for what sets were.

186
00:09:23,200 --> 00:09:25,430
And he showed how you
could build, out of sets,

187
00:09:25,430 --> 00:09:26,760
the integers.
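To see what building numbers out of sets can look like, here is a sketch of the von Neumann encoding of the natural numbers. This is a swapped-in construction from the 1920s and is not Frege's. Frege instead treated a number as, roughly, the collection of all collections of that size. In this encoding, 0 is the empty set and k + 1 is k ∪ {k}. The helper name `von_neumann` is hypothetical.

```python
def von_neumann(n):
    """Encode the natural number n as a pure set: 0 is the empty set, k + 1 is k | {k}."""
    k = frozenset()
    for _ in range(n):
        k = k | frozenset([k])
    return k


three = von_neumann(3)
print(len(three))              # 3: the set encoding n has exactly n members
print(von_neumann(2) in three) # True: "2 < 3" becomes "2 is a member of 3"
print(three in three)          # False
```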

188
00:09:26,760 --> 00:09:28,540
And then you could
build rationals,

189
00:09:28,540 --> 00:09:30,510
which are sort of just
pairs of integers.

190
00:09:30,510 --> 00:09:32,420
And then you could
build real numbers

191
00:09:32,420 --> 00:09:34,820
by taking collections
of rationals

192
00:09:34,820 --> 00:09:39,020
that had a least upper bound.
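The next two steps can be sketched with standard textbook versions, which may differ from the exact definitions in Frege's system. First, a rational is a pair of integers reduced to lowest terms. Second, a real number is a collection of rationals, here a Dedekind-style cut given by its membership test. The names `rational` and `below_sqrt2` are illustrative. The cut for the square root of 2 has no largest member and no rational least upper bound, so the real number is identified with the collection itself.

```python
import math


def rational(p, q):
    """A rational number as a pair of integers (p, q), reduced to lowest terms with q > 0."""
    if q == 0:
        raise ZeroDivisionError("a rational needs a nonzero denominator")
    if q < 0:
        p, q = -p, -q
    g = math.gcd(p, q)
    return (p // g, q // g)


def below_sqrt2(r):
    """Membership test for the cut {r : r < 0 or r * r < 2}, in integer arithmetic."""
    p, q = r  # q > 0, so p/q < 0 iff p < 0, and (p/q)**2 < 2 iff p*p < 2*q*q
    return p < 0 or p * p < 2 * q * q


print(rational(2, 4) == rational(-3, -6))  # True: both are (1, 2)
print(below_sqrt2(rational(7, 5)))         # True: 49/25 < 2
print(below_sqrt2(rational(3, 2)))         # False: 9/4 > 2
```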

193
00:09:39,020 --> 00:09:40,319
And then you keep going.

194
00:09:40,319 --> 00:09:42,360
You can build functions
and continuous functions.

195
00:09:42,360 --> 00:09:44,960
And he showed how
you could build up

196
00:09:44,960 --> 00:09:47,510
the basic structures of
mathematical analysis

197
00:09:47,510 --> 00:09:52,747
and prove their basic theorems
in his formal set theory.

198
00:09:52,747 --> 00:09:54,830
The problem was that Russell
came along and looked

199
00:09:54,830 --> 00:10:00,870
at Frege's set theory, and came
up with the following paradox.

200
00:10:00,870 --> 00:10:04,630
He defined W to be the
collection of s in sets

201
00:10:04,630 --> 00:10:06,500
such that s is
not a member of s.

202
00:10:06,500 --> 00:10:10,230
Frege would certainly have
said that's a well defined set,

203
00:10:10,230 --> 00:10:13,660
and he would acknowledge
that W is a set.

204
00:10:13,660 --> 00:10:15,524
And let's look at
what this means.

205
00:10:15,524 --> 00:10:16,690
This is a diagonal argument.

206
00:10:16,690 --> 00:10:19,940
So let's remember, by
this definition of W,

207
00:10:19,940 --> 00:10:22,630
what we have is that
a set s is in W if

208
00:10:22,630 --> 00:10:24,806
and only if s is
not a member of s.

209
00:10:24,806 --> 00:10:27,040
OK, that's fine.

210
00:10:27,040 --> 00:10:30,100
Then just let s be W.
And we immediately get

211
00:10:30,100 --> 00:10:35,870
a contradiction that W is in W
if and only if W is not in W.

212
00:10:35,870 --> 00:10:37,700
Poor Frege.

213
00:10:37,700 --> 00:10:39,660
His book was a disaster.

214
00:10:39,660 --> 00:10:40,455
Math is broken.

215
00:10:42,862 --> 00:10:44,320
You can prove that
you're the pope.

216
00:10:44,320 --> 00:10:47,640
You could prove that pigs
fly, that verified programs crash.

217
00:10:47,640 --> 00:10:49,760
Math is just broken.

218
00:10:49,760 --> 00:10:50,780
It's not reliable.

219
00:10:50,780 --> 00:10:53,970
You can prove anything
in Frege's system,

220
00:10:53,970 --> 00:10:55,750
because it reached
a contradiction.

221
00:10:55,750 --> 00:10:58,780
And from something false,
you can prove anything.

222
00:10:58,780 --> 00:11:01,520
Well, Frege had a book.

223
00:11:01,520 --> 00:11:03,840
It was a vanity publication.

224
00:11:03,840 --> 00:11:06,330
And the preface of it
had to be rewritten.

225
00:11:06,330 --> 00:11:09,820
And he said look,
my system's broken.

226
00:11:09,820 --> 00:11:10,590
And I know that.

227
00:11:10,590 --> 00:11:12,910
And Russell showed
that unambiguously.

228
00:11:12,910 --> 00:11:15,280
But I think that there's
still something here

229
00:11:15,280 --> 00:11:16,310
that's salvageable.

230
00:11:16,310 --> 00:11:18,620
And so I'm going
to publish a book.

231
00:11:18,620 --> 00:11:20,300
But I apologize for
the fact that you

232
00:11:20,300 --> 00:11:23,100
can't rely on the conclusions.

233
00:11:23,100 --> 00:11:24,640
Poor Frege.

234
00:11:24,640 --> 00:11:27,470
That was his life's work
gone down the drain.

235
00:11:31,480 --> 00:11:32,760
How do we resolve this?

236
00:11:32,760 --> 00:11:36,220
What's wrong with this
apparent paradox of Russell's?

237
00:11:36,220 --> 00:11:41,340
Well, the assumption
was that W was a set.

238
00:11:41,340 --> 00:11:44,250
And that turns out to
be what we can doubt.

239
00:11:44,250 --> 00:11:50,650
So the definition of W is that
for all sets s, s is in W if

240
00:11:50,650 --> 00:11:52,742
and only if s is not in s.

241
00:11:52,742 --> 00:11:58,250
And we got a contradiction by
saying OK, substitute W for s.

242
00:11:58,250 --> 00:12:03,246
But that's allowed only if
you believe that W is a set.

243
00:12:03,246 --> 00:12:04,620
Now it looks like
it ought to be,

244
00:12:04,620 --> 00:12:08,720
because it's certainly well
defined by that formula.

245
00:12:08,720 --> 00:12:11,400
But it was well
understood at the time

246
00:12:11,400 --> 00:12:13,750
that that was the
fix to the paradox.

247
00:12:13,750 --> 00:12:18,040
You just can't
allow W to be a set.

248
00:12:18,040 --> 00:12:21,440
The problem was that W was
acknowledged by everybody

249
00:12:21,440 --> 00:12:24,210
to be absolutely clearly
defined mathematically.

250
00:12:24,210 --> 00:12:27,070
It was this bunch of sets.

251
00:12:27,070 --> 00:12:29,570
And yet, we're going
to say it's not a set.

252
00:12:29,570 --> 00:12:30,810
Well, it's OK.

253
00:12:30,810 --> 00:12:32,210
That will fix Russell's paradox.
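The diagonal step behind this fix can be checked on small examples. Take any finite "universe" of sets, described by which names each set contains, with self-membership allowed. The Russell collection of that universe is then never one of its own sets, because for each name x it disagrees with set x about x. The helpers below are hypothetical names for this sketch. The last line checks every possible membership pattern over three names.

```python
from itertools import combinations, product


def russell_collection(universe):
    """Names of the sets in `universe` that are not members of themselves.

    `universe` maps each set's name to the frozenset of names it contains,
    so self-membership is allowed (name in universe[name]).
    """
    return frozenset(name for name, members in universe.items() if name not in members)


def subsets(names):
    """Every subset of `names`, as frozensets."""
    return [frozenset(c) for r in range(len(names) + 1) for c in combinations(names, r)]


def check_every_universe(names):
    """Confirm the Russell collection is never one of the universe's own sets,
    for every membership pattern over `names`; return how many were checked."""
    checked = 0
    for choice in product(subsets(names), repeat=len(names)):
        universe = dict(zip(names, choice))
        r = russell_collection(universe)
        # Diagonal step: x is in r exactly when x is not in universe[x],
        # so r and universe[x] disagree about x.
        assert all(r != members for members in universe.values())
        checked += 1
    return checked


example = {
    "a": frozenset({"a"}),       # a set that contains itself, like the circular list
    "b": frozenset(),            # the empty set
    "c": frozenset({"a", "b"}),
}
print(sorted(russell_collection(example)))    # ['b', 'c']
print(check_every_universe(("x", "y", "z")))  # 512
```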

254
00:12:32,210 --> 00:12:34,360
But it leaves us
with a much bigger

255
00:12:34,360 --> 00:12:36,920
general philosophical
question:

256
00:12:36,920 --> 00:12:40,680
when is a well-defined
mathematical object a set,

257
00:12:40,680 --> 00:12:42,000
and when isn't it a set?

258
00:12:42,000 --> 00:12:45,470
And that's what you need
sophisticated rules for.

259
00:12:45,470 --> 00:12:47,180
When is it that
you're going to define

260
00:12:47,180 --> 00:12:48,730
some collection of
elements, and you

261
00:12:48,730 --> 00:12:51,700
could be sure it's a set, as
opposed to something else--

262
00:12:51,700 --> 00:12:55,710
called a class by the
way-- which is basically

263
00:12:55,710 --> 00:12:59,300
something that's too big to be
a set, because if it were a set,

264
00:12:59,300 --> 00:13:04,740
it would contain itself and be
circular and self-referential.

265
00:13:04,740 --> 00:13:07,000
Well, there's no simple
answer to this question

266
00:13:07,000 --> 00:13:10,840
about what things are sets
and what are not sets.

267
00:13:10,840 --> 00:13:14,600
But over time, by
the 1930s, people

268
00:13:14,600 --> 00:13:19,610
had pretty much settled on a
very economical and persuasive

269
00:13:19,610 --> 00:13:24,000
set of axioms for set theory
called the Zermelo-Fraenkel set

270
00:13:24,000 --> 00:13:25,840
theory axioms.