1 00:00:01,060 --> 00:00:03,090 PROFESSOR: So in this final segment today, 2 00:00:03,090 --> 00:00:06,900 we're going to talk about set theory just a little bit. 3 00:00:06,900 --> 00:00:10,370 Because if you're going to take a math class, 4 00:00:10,370 --> 00:00:13,820 if you're going to be exposed to math for computer science, 5 00:00:13,820 --> 00:00:16,110 it's useful to have at least a glimmering 6 00:00:16,110 --> 00:00:19,600 of what the foundations of math look like, how it starts 7 00:00:19,600 --> 00:00:20,860 and how it gets justified. 8 00:00:20,860 --> 00:00:23,310 And that's what set theory does. 9 00:00:23,310 --> 00:00:27,540 In addition, we will see that the diagonal argument 10 00:00:27,540 --> 00:00:30,790 that we've already made much of played a crucial role 11 00:00:30,790 --> 00:00:34,460 in the development and understanding of set theory. 12 00:00:34,460 --> 00:00:38,130 So let's begin with an issue that 13 00:00:38,130 --> 00:00:41,220 plays an important role in set theory and in computer science, 14 00:00:41,220 --> 00:00:44,250 having to do with the idea of taking a function 15 00:00:44,250 --> 00:00:47,500 and applying it to itself, or having something 16 00:00:47,500 --> 00:00:49,020 refer to itself. 17 00:00:49,020 --> 00:00:51,900 And this is one of these things that's notoriously doubtful. 18 00:00:51,900 --> 00:00:53,120 There's all these paradoxes. 19 00:00:53,120 --> 00:00:56,330 But maybe the simplest one is when I 20 00:00:56,330 --> 00:00:58,810 assert "this statement is false." 21 00:00:58,810 --> 00:01:01,780 And the question is, is it true or false? 22 00:01:01,780 --> 00:01:04,364 Well, if it's true, then it's false. 23 00:01:04,364 --> 00:01:05,780 And if it's false, then it's true. 24 00:01:05,780 --> 00:01:08,100 And we get a kind of buzzer. 25 00:01:08,100 --> 00:01:11,800 It's not possible to figure out whether this statement is 26 00:01:11,800 --> 00:01:12,420 true or false. 
27 00:01:12,420 --> 00:01:15,300 I think we would deny that it was a proposition. 28 00:01:15,300 --> 00:01:17,360 So that's a hint that there's something 29 00:01:17,360 --> 00:01:21,080 suspicious about self-reference, self-application, and so on. 30 00:01:21,080 --> 00:01:23,070 On the other hand, it's worth remembering 31 00:01:23,070 --> 00:01:25,620 that in computer science, we take this for granted. 32 00:01:25,620 --> 00:01:27,810 So let's look at an example. 33 00:01:27,810 --> 00:01:32,590 Here's a simple example of a list in Scheme 34 00:01:32,590 --> 00:01:36,730 Lisp notation, meaning it's a list of three things, 0, 1, 35 00:01:36,730 --> 00:01:37,750 and 2. 36 00:01:37,750 --> 00:01:40,900 And the black parens indicate that we're thinking 37 00:01:40,900 --> 00:01:43,120 of it as an ordered list. 38 00:01:43,120 --> 00:01:46,100 Now the way that I would represent a list like that 39 00:01:46,100 --> 00:01:49,690 in memory, typically, is by using these things that are 40 00:01:49,690 --> 00:01:50,620 called cons cells. 41 00:01:50,620 --> 00:01:52,750 So a cons cell has these two parts. 42 00:01:52,750 --> 00:01:56,700 The left hand part points to the value 43 00:01:56,700 --> 00:01:58,410 in that location in the list. 44 00:01:58,410 --> 00:02:00,990 So this first cons cell points to 0, 45 00:02:00,990 --> 00:02:02,750 which is the first element in the list. 46 00:02:02,750 --> 00:02:06,050 The second component of the cons cell points to the next element 47 00:02:06,050 --> 00:02:06,710 in the list. 48 00:02:06,710 --> 00:02:09,590 And so here you see 1 pointing to the third element 49 00:02:09,590 --> 00:02:10,090 of the list. 50 00:02:10,090 --> 00:02:11,220 And there you see 2. 51 00:02:11,220 --> 00:02:13,840 And that little nil marker indicates 52 00:02:13,840 --> 00:02:15,080 that's the end of the list. 53 00:02:15,080 --> 00:02:17,140 So that's a simple representation 54 00:02:17,140 --> 00:02:21,650 of a list of length three with three cons cells. 
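The cons-cell picture described above can be sketched in code. This is a minimal illustration in Python rather than Scheme; the class name `Cons`, its fields `car` and `cdr`, and the helper `to_python_list` are assumptions made for this sketch, with `None` standing in for the nil marker.

```python
class Cons:
    """One cons cell: a left part (car) and a right part (cdr)."""

    def __init__(self, car, cdr=None):
        self.car = car  # left part: the value stored at this position
        self.cdr = cdr  # right part: the next cell, or None for nil


def to_python_list(cell):
    """Walk the chain of cells and collect the values in order."""
    values = []
    while cell is not None:
        values.append(cell.car)
        cell = cell.cdr
    return values


# The list (0 1 2) built from three cons cells.
L = Cons(0, Cons(1, Cons(2)))
print(to_python_list(L))  # [0, 1, 2]
```

Each cell holds one value and one pointer to the rest of the list, so a list of length three needs exactly three cells.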
55 00:02:21,650 --> 00:02:25,180 One of the things that computer science lets 56 00:02:25,180 --> 00:02:27,050 you do and many languages let you do 57 00:02:27,050 --> 00:02:30,220 is you can manipulate these pointers. 58 00:02:30,220 --> 00:02:34,000 So using the language of Scheme, what I can do 59 00:02:34,000 --> 00:02:39,240 is I'll do an operation called set-car! where I'm taking, 60 00:02:39,240 --> 00:02:42,940 in this case, I'm setting the second element of L, that 61 00:02:42,940 --> 00:02:46,590 is this cons cell, to L. And set-car! is saying, 62 00:02:46,590 --> 00:02:54,360 let's change what the element in the left hand part of this cell 63 00:02:54,360 --> 00:02:55,030 is. 64 00:02:55,030 --> 00:02:57,600 This is where the value of the second element is. 65 00:02:57,600 --> 00:03:01,089 Let's change the value of the second element to be L. 66 00:03:01,089 --> 00:03:03,005 What does that mean as a pointer manipulation? 67 00:03:03,005 --> 00:03:04,670 Well, it's pretty simple. 68 00:03:04,670 --> 00:03:06,400 I just move this pointer to point 69 00:03:06,400 --> 00:03:10,490 to the beginning of the list L. And now 70 00:03:10,490 --> 00:03:14,870 I have an interesting situation, because this list now 71 00:03:14,870 --> 00:03:19,700 is a list that consists of 0 and L and 2. 72 00:03:19,700 --> 00:03:22,950 It's a list that has itself as a member. 73 00:03:22,950 --> 00:03:25,360 And it makes perfect sense. 74 00:03:25,360 --> 00:03:27,580 And if you sort of expand that out, 75 00:03:27,580 --> 00:03:31,580 L is this list that begins with 0. 76 00:03:31,580 --> 00:03:35,570 And then its second element is a list that begins with 0. 77 00:03:35,570 --> 00:03:37,450 And the second element of that list 78 00:03:37,450 --> 00:03:39,760 is a list that begins with 0, and so on. 
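The pointer move described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the lecture's own code: `Cons`, `second`, and `third` are hypothetical names, and the assignment `L.cdr.car = L` is meant as an analogue of what standard Scheme would write as `(set-car! (cdr L) L)`.

```python
class Cons:
    """One cons cell: a left part (car) and a right part (cdr)."""

    def __init__(self, car, cdr=None):
        self.car = car  # left part: the value stored at this position
        self.cdr = cdr  # right part: the next cell, or None for nil


def second(lst):
    """Value held by the second cons cell."""
    return lst.cdr.car


def third(lst):
    """Value held by the third cons cell."""
    return lst.cdr.cdr.car


# Start with the list (0 1 2).
L = Cons(0, Cons(1, Cons(2)))

# Redirect the left part of the second cell so it points back at L itself.
L.cdr.car = L

print(second(L) is L)            # True: L is its own second element
print(second(second(L)).car)     # 0: the nested list again begins with 0
print(third(second(L)))          # 2: and its third element is again 2
```

The structure is still three cells in memory; only following the pointers produces the endless nesting. Note that a naive walk that recursively descends into elements would not terminate on this list.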
79 00:03:39,760 --> 00:03:42,580 And then the third element of L is 2, 80 00:03:42,580 --> 00:03:45,000 and the third element of the second element of L 81 00:03:45,000 --> 00:03:46,460 is 2, and so on. 82 00:03:46,460 --> 00:03:49,060 It's an interesting infinite nested structure 83 00:03:49,060 --> 00:03:55,100 that's nicely represented by this finite circular list. 84 00:03:58,710 --> 00:04:01,460 Let's look at another example where, in computer science, 85 00:04:01,460 --> 00:04:04,470 we actually apply things to themselves. 86 00:04:04,470 --> 00:04:06,610 So let's define the composition operator. 87 00:04:06,610 --> 00:04:10,310 And again, I'm using notation from the language Scheme. 88 00:04:10,310 --> 00:04:13,270 I want to take two functions f and g that take one argument. 89 00:04:13,270 --> 00:04:15,250 I'm going to define their composition. 90 00:04:15,250 --> 00:04:17,760 The way that I compose f and g is 91 00:04:17,760 --> 00:04:20,670 I define a new function, h of x, which is going to be 92 00:04:20,670 --> 00:04:22,380 the composition of f and g. 93 00:04:22,380 --> 00:04:27,970 The way I define h of x is I say apply 94 00:04:27,970 --> 00:04:32,390 f to g applied to x and return the value h. 95 00:04:32,390 --> 00:04:35,360 So compose is a procedure that 96 00:04:35,360 --> 00:04:41,540 takes two procedures f and g and returns their composition, h. 97 00:04:41,540 --> 00:04:43,520 OK, let's practice. 98 00:04:43,520 --> 00:04:47,610 Suppose that I compose the square function that 99 00:04:47,610 --> 00:04:51,010 maps x to x squared, and the add1 function that 100 00:04:51,010 --> 00:04:52,820 maps x to x plus 1. 101 00:04:52,820 --> 00:04:59,280 Well, if I compose the square of adding 1, 102 00:04:59,280 --> 00:05:06,170 and I apply it to 3, what I'm saying is let's add 1 to 3, 103 00:05:06,170 --> 00:05:07,630 and then square it. 
104 00:05:07,630 --> 00:05:13,150 And I get 3 plus 1 squared, or 16, because add1 and then 105 00:05:13,150 --> 00:05:17,290 square is the function that's the composition of square 106 00:05:17,290 --> 00:05:18,920 and add1. 107 00:05:18,920 --> 00:05:20,510 Now I can do the following. 108 00:05:20,510 --> 00:05:23,280 I could compose square with itself. 109 00:05:23,280 --> 00:05:26,280 If I take the function, square it, and square that, 110 00:05:26,280 --> 00:05:28,550 I'm really taking the fourth power. 111 00:05:28,550 --> 00:05:32,580 So if I apply the function composed of square square to 3, 112 00:05:32,580 --> 00:05:36,640 I get 3 squared squared, or 81, or 3 to the fourth. 113 00:05:36,640 --> 00:05:39,200 All makes perfect sense. 114 00:05:39,200 --> 00:05:44,920 Well now let's define a compose-it-with-itself operation. 115 00:05:44,920 --> 00:05:46,570 I'm going to call it comp2. 116 00:05:46,570 --> 00:05:48,690 comp2 takes one function f. 117 00:05:48,690 --> 00:05:53,680 And the definition of comp2 is compose f with f. 118 00:05:53,680 --> 00:05:58,880 And if I then apply comp2 to square and 3, 119 00:05:58,880 --> 00:06:01,470 it's saying, OK, compose square and square. 120 00:06:01,470 --> 00:06:02,270 We just did that. 121 00:06:02,270 --> 00:06:03,700 That was the fourth power. 122 00:06:03,700 --> 00:06:04,690 Apply it to 3. 123 00:06:04,690 --> 00:06:06,580 I get 81. 124 00:06:06,580 --> 00:06:08,370 And now we can do some weird stuff. 125 00:06:08,370 --> 00:06:13,950 Because suppose that I apply comp2 to comp2, 126 00:06:13,950 --> 00:06:19,290 and then apply that to add1, and apply that to 3. 127 00:06:19,290 --> 00:06:20,970 Well that one's a little hard to follow, 128 00:06:20,970 --> 00:06:22,720 and I'm going to let you think it through. 129 00:06:22,720 --> 00:06:28,710 But comp2 of comp2 is compose something four times. 
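The compose and comp2 definitions described above can be sketched as follows. The lecture uses Scheme; this is a Python rendering for illustration, keeping the lecture's names `compose`, `comp2`, `square`, and `add1`, and assuming every function takes a single argument.

```python
def compose(f, g):
    """Return the function h with h(x) = f(g(x))."""
    def h(x):
        return f(g(x))
    return h


def comp2(f):
    """Compose f with itself."""
    return compose(f, f)


def square(x):
    return x * x


def add1(x):
    return x + 1


print(compose(square, add1)(3))    # 16: add 1 to 3, then square
print(compose(square, square)(3))  # 81: 3 to the fourth power
print(comp2(square)(3))            # 81: same function as the line above

# Self-application: comp2 applied to comp2 yields an operator that
# composes its argument with itself four times.
print(comp2(comp2)(add1)(3))       # 7: add 1 four times to 3
print(comp2(comp2)(square)(3))     # 43046721: 3 to the 16th power
```

To see why the last two lines work, unfold one step: `comp2(comp2)` is `compose(comp2, comp2)`, so applying it to `f` gives `comp2(comp2(f))`, which is `f` composed with itself, then that result composed with itself.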
130 00:06:28,710 --> 00:06:32,360 And when you do that with add1, what happens 131 00:06:32,360 --> 00:06:39,110 is that you're adding 1 four times to 3. 132 00:06:39,110 --> 00:06:44,130 If I apply comp2 of comp2 to square, I'm 133 00:06:44,130 --> 00:06:46,710 composing square with itself, and then composing 134 00:06:46,710 --> 00:06:47,530 that with itself. 135 00:06:47,530 --> 00:06:51,110 I'm really squaring four times. 136 00:06:51,110 --> 00:06:59,040 And I wind up with 2 to the fourth, or 16, 137 00:06:59,040 --> 00:07:00,660 as the power that I'm taking. 138 00:07:00,660 --> 00:07:03,660 And so comp2 of comp2 of square of 3 139 00:07:03,660 --> 00:07:06,769 is this rather large number, 3 to the 16th. 140 00:07:06,769 --> 00:07:08,810 It could be a little bit tricky to think through, 141 00:07:08,810 --> 00:07:10,101 but it all makes perfect sense. 142 00:07:10,101 --> 00:07:14,100 And it works just fine in recursive programming languages 143 00:07:14,100 --> 00:07:17,530 that allow this kind of untyped or generically 144 00:07:17,530 --> 00:07:20,120 typed composition. 145 00:07:20,120 --> 00:07:23,080 So why is it that computer scientists are so daring, 146 00:07:23,080 --> 00:07:27,720 and mathematicians are so timid about self-reference? 147 00:07:27,720 --> 00:07:29,520 And the reason is that mathematicians 148 00:07:29,520 --> 00:07:32,010 have been traumatized by Bertrand Russell because 149 00:07:32,010 --> 00:07:34,650 of Russell's famous paradox, which 150 00:07:34,650 --> 00:07:36,660 we're now ready to look at. 151 00:07:36,660 --> 00:07:40,650 So what Russell was proposing, and it's 152 00:07:40,650 --> 00:07:43,890 going to look just like a diagonal argument is, Russell 153 00:07:43,890 --> 00:07:50,580 said, let's let W be the collection of sets s such 154 00:07:50,580 --> 00:07:53,310 that s is not a member of s. 155 00:07:53,310 --> 00:07:55,440 Now let's think about that for a little bit. 
156 00:07:55,440 --> 00:07:57,240 Most sets are not members of themselves. 157 00:07:57,240 --> 00:08:01,650 Like the set of integers is not a member of itself 158 00:08:01,650 --> 00:08:04,470 because the only things in it are integers. 159 00:08:04,470 --> 00:08:10,960 And the power set of integers is not a member of itself 160 00:08:10,960 --> 00:08:15,450 because every member of the power set of integers 161 00:08:15,450 --> 00:08:18,320 is a set of integers, whereas the power set of integers 162 00:08:18,320 --> 00:08:20,050 is a set of sets of those things. 163 00:08:20,050 --> 00:08:22,770 So those familiar sets are typically not 164 00:08:22,770 --> 00:08:24,050 members of themselves. 165 00:08:24,050 --> 00:08:26,620 But who knows, maybe there are these weird sets 166 00:08:26,620 --> 00:08:31,260 like the circular list, or a function that 167 00:08:31,260 --> 00:08:35,320 can compose with itself, that are members of themselves. 168 00:08:35,320 --> 00:08:37,590 Now let me step back for a moment 169 00:08:37,590 --> 00:08:40,909 and mention how Russell came to think about this. 170 00:08:40,909 --> 00:08:44,740 And it comes from the period in the late 19th century 171 00:08:44,740 --> 00:08:46,700 when mathematicians and logicians 172 00:08:46,700 --> 00:08:50,360 were trying to think about confirming and establishing 173 00:08:50,360 --> 00:08:51,570 the foundations of math. 174 00:08:51,570 --> 00:08:53,480 What was math absolutely about? 175 00:08:53,480 --> 00:08:55,450 What were the fundamental objects 176 00:08:55,450 --> 00:08:59,040 that mathematics could be built from, 177 00:08:59,040 --> 00:09:02,090 and what were the rules for understanding those objects? 178 00:09:02,090 --> 00:09:04,806 And it was pretty well agreed that sets were it. 179 00:09:04,806 --> 00:09:06,430 You could build everything out of sets. 
180 00:09:06,430 --> 00:09:08,340 And you just needed to understand sets, 181 00:09:08,340 --> 00:09:10,350 and then you were in business. 182 00:09:10,350 --> 00:09:12,470 And there was a German mathematician 183 00:09:12,470 --> 00:09:16,190 named Frege who tried to demonstrate this 184 00:09:16,190 --> 00:09:20,780 by developing a set theory very carefully, 185 00:09:20,780 --> 00:09:23,200 giving careful axioms for what sets were. 186 00:09:23,200 --> 00:09:25,430 And he showed how you could build, out of sets, 187 00:09:25,430 --> 00:09:26,760 you could build the integers. 188 00:09:26,760 --> 00:09:28,540 And then you could build rationals, 189 00:09:28,540 --> 00:09:30,510 which are sort of just pairs of integers. 190 00:09:30,510 --> 00:09:32,420 And then you could build real numbers 191 00:09:32,420 --> 00:09:34,820 by taking collections of rationals 192 00:09:34,820 --> 00:09:39,020 that had a least upper bound. 193 00:09:39,020 --> 00:09:40,319 And then you keep going. 194 00:09:40,319 --> 00:09:42,360 You can build functions and continuous functions. 195 00:09:42,360 --> 00:09:44,960 And he showed how you could build up 196 00:09:44,960 --> 00:09:47,510 the basic structures of mathematical analysis 197 00:09:47,510 --> 00:09:52,747 and prove their basic theorems in his formal set theory. 198 00:09:52,747 --> 00:09:54,830 The problem was that Russell came along and looked 199 00:09:54,830 --> 00:10:00,870 at Frege's set theory, and came up with the following paradox. 200 00:10:00,870 --> 00:10:04,630 He defined W to be the collection of s in sets 201 00:10:04,630 --> 00:10:06,500 such that s is not a member of s. 202 00:10:06,500 --> 00:10:10,230 Frege would certainly have said that's a well defined set, 203 00:10:10,230 --> 00:10:13,660 and he would acknowledge that W is a set. 204 00:10:13,660 --> 00:10:15,524 And let's look at what this means. 205 00:10:15,524 --> 00:10:16,690 This is a diagonal argument. 
206 00:10:16,690 --> 00:10:19,940 So let's remember, by this definition of W, 207 00:10:19,940 --> 00:10:22,630 what we have is that a set s is in W if 208 00:10:22,630 --> 00:10:24,806 and only if s is not a member of s. 209 00:10:24,806 --> 00:10:27,040 OK, that's fine. 210 00:10:27,040 --> 00:10:30,100 Then just let s be W. And we immediately get 211 00:10:30,100 --> 00:10:35,870 a contradiction that W is in W if and only if W is not in W. 212 00:10:35,870 --> 00:10:37,700 Poor Frege. 213 00:10:37,700 --> 00:10:39,660 His book was a disaster. 214 00:10:39,660 --> 00:10:40,455 Math is broken. 215 00:10:42,862 --> 00:10:44,320 You can prove that you're the pope. 216 00:10:44,320 --> 00:10:47,640 You could prove that pigs fly, verified programs crash. 217 00:10:47,640 --> 00:10:49,760 Math is just broken. 218 00:10:49,760 --> 00:10:50,780 It's not reliable. 219 00:10:50,780 --> 00:10:53,970 You can prove anything in Frege's system, 220 00:10:53,970 --> 00:10:55,750 because it reached a contradiction. 221 00:10:55,750 --> 00:10:58,780 And from something false, you can prove anything. 222 00:10:58,780 --> 00:11:01,520 Well, Frege had a book. 223 00:11:01,520 --> 00:11:03,840 It was a vanity publication. 224 00:11:03,840 --> 00:11:06,330 And the preface of it had to be rewritten. 225 00:11:06,330 --> 00:11:09,820 And he said look, my system's broken. 226 00:11:09,820 --> 00:11:10,590 And I know that. 227 00:11:10,590 --> 00:11:12,910 And Russell showed that unambiguously. 228 00:11:12,910 --> 00:11:15,280 But I think that there's still something here 229 00:11:15,280 --> 00:11:16,310 that's salvageable. 230 00:11:16,310 --> 00:11:18,620 And so I'm going to publish a book. 231 00:11:18,620 --> 00:11:20,300 But I apologize for the fact that you 232 00:11:20,300 --> 00:11:23,100 can't rely on the conclusions. 233 00:11:23,100 --> 00:11:24,640 Poor Frege. 234 00:11:24,640 --> 00:11:27,470 That was his life's work gone down the drain. 
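For reference, the paradox stated at the start of this passage can be written out as a short derivation. This is a sketch in LaTeX, assuming the `amsmath` package for `align*` and `\text`; the name $\text{Sets}$ for the collection of all sets is notation chosen here, not taken from the lecture.

```latex
\begin{align*}
  W &= \{\, s \in \text{Sets} \mid s \notin s \,\}
      && \text{(definition of } W\text{)} \\
  s \in W &\iff s \notin s
      && \text{(for every } s \in \text{Sets)} \\
  W \in W &\iff W \notin W
      && \text{(take } s = W\text{, assuming } W \in \text{Sets)}
\end{align*}
```

The last line is a contradiction, and the only step that relies on anything beyond the definition is the substitution of $W$ for $s$, which is legitimate only if $W$ is itself a set.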
235 00:11:31,480 --> 00:11:32,760 How do we resolve this? 236 00:11:32,760 --> 00:11:36,220 What's wrong with this apparent paradox of Russell's? 237 00:11:36,220 --> 00:11:41,340 Well, the assumption was that W was a set. 238 00:11:41,340 --> 00:11:44,250 And that turns out to be what we can doubt. 239 00:11:44,250 --> 00:11:50,650 So the definition of W is that for all sets s, s is in W if 240 00:11:50,650 --> 00:11:52,742 and only if s is not in s. 241 00:11:52,742 --> 00:11:58,250 And we got a contradiction by saying OK, substitute W for s. 242 00:11:58,250 --> 00:12:03,246 But that's allowed only if you believe that W is a set. 243 00:12:03,246 --> 00:12:04,620 Now it looks like it ought to be, 244 00:12:04,620 --> 00:12:08,720 because it's certainly well defined by that formula. 245 00:12:08,720 --> 00:12:11,400 But it was well understood at the time 246 00:12:11,400 --> 00:12:13,750 that that was the fix to the paradox. 247 00:12:13,750 --> 00:12:18,040 You just can't allow W to be a set. 248 00:12:18,040 --> 00:12:21,440 The problem was that W was acknowledged by everybody 249 00:12:21,440 --> 00:12:24,210 to be absolutely clearly defined mathematically. 250 00:12:24,210 --> 00:12:27,070 It was this bunch of sets. 251 00:12:27,070 --> 00:12:29,570 And yet, we're going to say it's not a set. 252 00:12:29,570 --> 00:12:30,810 Well, it's OK. 253 00:12:30,810 --> 00:12:32,210 That will fix Russell's paradox. 254 00:12:32,210 --> 00:12:34,360 But it leaves us with a much bigger 255 00:12:34,360 --> 00:12:36,920 general philosophical question: 256 00:12:36,920 --> 00:12:40,680 when is a well-defined mathematical object a set, 257 00:12:40,680 --> 00:12:42,000 and when isn't it a set? 258 00:12:42,000 --> 00:12:45,470 And that's what you need sophisticated rules for. 
259 00:12:45,470 --> 00:12:47,180 When is it that you're going to define 260 00:12:47,180 --> 00:12:48,730 some collection of elements, and you 261 00:12:48,730 --> 00:12:51,700 could be sure it's a set, as opposed to something else-- 262 00:12:51,700 --> 00:12:55,710 called a class by the way-- which is basically 263 00:12:55,710 --> 00:12:59,300 something that's too big to be a set, because if it was a set, 264 00:12:59,300 --> 00:13:04,740 it would contain itself and be circular and self-referential? 265 00:13:04,740 --> 00:13:07,000 Well, there's no simple answer to this question 266 00:13:07,000 --> 00:13:10,840 about what things are sets and what are not sets. 267 00:13:10,840 --> 00:13:14,600 But over time, by the 1930s, people 268 00:13:14,600 --> 00:13:19,610 had pretty much settled on a very economical and persuasive 269 00:13:19,610 --> 00:13:24,000 set of axioms for set theory called the Zermelo-Fraenkel set 270 00:13:24,000 --> 00:13:25,840 theory axioms.