1 00:00:00,499 --> 00:00:02,760 The following content is provided under a Creative 2 00:00:02,760 --> 00:00:04,280 Commons license. 3 00:00:04,280 --> 00:00:06,620 Your support will help MIT OpenCourseWare 4 00:00:06,620 --> 00:00:10,980 continue to offer high quality educational resources for free. 5 00:00:10,980 --> 00:00:13,600 To make a donation or view additional materials 6 00:00:13,600 --> 00:00:17,496 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,496 --> 00:00:18,121 at ocw.mit.edu. 8 00:00:23,580 --> 00:00:24,920 PROFESSOR: OK let's get started. 9 00:00:27,790 --> 00:00:31,480 I once taught with a professor who 10 00:00:31,480 --> 00:00:36,100 was lamenting the fact that as the term progresses attendance 11 00:00:36,100 --> 00:00:37,830 in lecture tends to drop off. 12 00:00:37,830 --> 00:00:40,640 And gets pretty dramatic by the end of the term 13 00:00:40,640 --> 00:00:43,760 when you're lecturing, and nobody's there. 14 00:00:43,760 --> 00:00:45,980 And I asked him what he did about it. 15 00:00:45,980 --> 00:00:48,180 And he thought about it and he said, 16 00:00:48,180 --> 00:00:51,030 there's only two things that can get students to come 17 00:00:51,030 --> 00:00:55,840 to lecture, candy and sex. 18 00:00:55,840 --> 00:01:01,510 Now we've already tried candy, so today we're 19 00:01:01,510 --> 00:01:04,410 going to talk about sex. 20 00:01:04,410 --> 00:01:07,600 In fact we're going to use graph theory 21 00:01:07,600 --> 00:01:10,630 to address a decades old debate concerning 22 00:01:10,630 --> 00:01:15,240 the relative promiscuity of men versus women. 23 00:01:15,240 --> 00:01:19,410 Now graphs are incredibly useful structures in computer science, 24 00:01:19,410 --> 00:01:22,270 and we're going to be studying them for the next five or six 25 00:01:22,270 --> 00:01:23,520 lectures. 
26 00:01:23,520 --> 00:01:27,800 They come up in all sorts of applications, scheduling, 27 00:01:27,800 --> 00:01:31,070 optimization, communications, the design 28 00:01:31,070 --> 00:01:32,710 and analysis of algorithms. 29 00:01:32,710 --> 00:01:35,310 In fact next week, you're going to see how two Stanford 30 00:01:35,310 --> 00:01:38,990 graduate students became gazillionaires 31 00:01:38,990 --> 00:01:43,200 because they used graph theory in a clever way. 32 00:01:43,200 --> 00:01:46,470 But let's talk about sex. 33 00:01:46,470 --> 00:01:48,640 The issue that we're going to address today 34 00:01:48,640 --> 00:01:52,660 is one of the most talked about, and most well studied, 35 00:01:52,660 --> 00:01:55,900 questions in all of human sociology. 36 00:01:55,900 --> 00:02:00,400 On average, who has more opposite-gender partners, 37 00:02:00,400 --> 00:02:02,715 men or women? 38 00:02:02,715 --> 00:02:05,310 Now opposite gender is going to be important. 39 00:02:05,310 --> 00:02:09,035 And by this I mean, one boy, and one girl. 40 00:02:09,035 --> 00:02:11,035 All right, I'm not making a political statement. 41 00:02:11,035 --> 00:02:16,770 It's just that the math is a lot easier that way, as you'll see. 42 00:02:16,770 --> 00:02:18,860 Now I'd like to start by taking a poll here 43 00:02:18,860 --> 00:02:21,270 to see what you think about that. 44 00:02:21,270 --> 00:02:24,830 So raise your hand if you think men, on average, 45 00:02:24,830 --> 00:02:27,910 have more opposite-gender partners than women do. 46 00:02:30,420 --> 00:02:31,095 Only a few. 47 00:02:31,095 --> 00:02:33,401 AUDIENCE: In life or [INAUDIBLE] 48 00:02:33,401 --> 00:02:34,400 PROFESSOR: Um, you can-- 49 00:02:34,400 --> 00:02:39,040 [LAUGHTER] 50 00:02:39,040 --> 00:02:40,640 PROFESSOR: One on one.
51 00:02:40,640 --> 00:02:43,970 OK, so let's say over the course of their lives, let's say, 52 00:02:43,970 --> 00:02:48,200 or over the course of 2010, that men in America 53 00:02:48,200 --> 00:02:51,460 have more opposite-gender partners than women in America, 54 00:02:51,460 --> 00:02:52,830 say in 2010. 55 00:02:52,830 --> 00:02:55,030 Raise your hand if you think men have more going on. 56 00:02:55,030 --> 00:02:57,040 All right a bunch of you. 57 00:02:57,040 --> 00:02:58,730 Raise your hand if you think women have 58 00:02:58,730 --> 00:03:00,890 more opposite-gender partners? 59 00:03:00,890 --> 00:03:02,490 This is unusual. 60 00:03:02,490 --> 00:03:04,870 Maybe even more voted for women, but it's close. 61 00:03:04,870 --> 00:03:07,744 Raise your hand if you think it's equal. 62 00:03:07,744 --> 00:03:09,680 All right, about the same. 63 00:03:09,680 --> 00:03:12,550 Raise your hand if you think there's no way to know, 64 00:03:12,550 --> 00:03:15,210 that it's hopeless to really figure it out. 65 00:03:15,210 --> 00:03:17,210 All right, nobody goes for that. 66 00:03:17,210 --> 00:03:18,565 All right, good. 67 00:03:18,565 --> 00:03:20,460 All right well now in the popular literature, 68 00:03:20,460 --> 00:03:23,620 I think the feelings are different than expressed here. 69 00:03:23,620 --> 00:03:25,450 Pretty much universally, in the literature, 70 00:03:25,450 --> 00:03:28,000 it's believed that men have more opposite-gender partners 71 00:03:28,000 --> 00:03:29,216 than women. 72 00:03:29,216 --> 00:03:30,590 And in fact, you could even think 73 00:03:30,590 --> 00:03:33,120 about that, if you think about literature, 74 00:03:33,120 --> 00:03:35,780 the leader of the harem is always a man. 75 00:03:35,780 --> 00:03:37,680 And he's got lots of women. 76 00:03:37,680 --> 00:03:40,540 In polygamist cultures, it's always 77 00:03:40,540 --> 00:03:44,890 the man that has multiple wives, not the reverse. 
78 00:03:44,890 --> 00:03:47,410 Now not surprisingly, this issue has 79 00:03:47,410 --> 00:03:49,620 been studied "scientifically," I'll 80 00:03:49,620 --> 00:03:54,520 put in quotes, extensively, in one of the largest studies ever 81 00:03:54,520 --> 00:03:56,220 done. 82 00:03:56,220 --> 00:03:58,210 Researchers from the University of Chicago 83 00:03:58,210 --> 00:04:03,200 interviewed 2,500 people, at random, over several years. 84 00:04:03,200 --> 00:04:05,800 They brought them in, on many occasions, 85 00:04:05,800 --> 00:04:08,920 to try to get the answer to the question once and for all. 86 00:04:08,920 --> 00:04:11,560 And they wrote this 700-page book, 87 00:04:11,560 --> 00:04:14,680 called The Social Organization of Sexuality: 88 00:04:14,680 --> 00:04:18,050 Sexual Practices in the US. 89 00:04:18,050 --> 00:04:19,670 Actually walking around with this book 90 00:04:19,670 --> 00:04:22,720 has proved to be a little embarrassing. 91 00:04:22,720 --> 00:04:25,890 Last week my 11-year-old daughter saw it, 92 00:04:25,890 --> 00:04:29,410 and she goes, Dad, why do you have this sex book? 93 00:04:29,410 --> 00:04:32,700 And I grabbed it back and said, well that's for the course 94 00:04:32,700 --> 00:04:34,910 I'm teaching. 95 00:04:34,910 --> 00:04:37,040 And I thought I'd gotten away with it, 96 00:04:37,040 --> 00:04:38,930 and everything was fine. 97 00:04:38,930 --> 00:04:42,150 And then later that day she texted all of our friends 98 00:04:42,150 --> 00:04:44,330 about the new news that what do you know, 99 00:04:44,330 --> 00:04:48,960 her dad teaches sex ed at MIT. 100 00:04:48,960 --> 00:04:53,950 Anyway this study concludes that on average men 101 00:04:53,950 --> 00:05:00,430 have 74% more opposite-gender partners than women. 102 00:05:00,430 --> 00:05:01,925 That's one of their central claims. 103 00:05:10,389 --> 00:05:11,305 And this is in the US.
104 00:05:36,420 --> 00:05:43,350 OK now, when you think about it that sounds maybe reasonable, 105 00:05:43,350 --> 00:05:45,390 might be OK. 106 00:05:45,390 --> 00:05:49,750 But not according to ABC News. 107 00:05:49,750 --> 00:05:54,315 They did a poll of 1,500 people in the country, in 2004, 108 00:05:54,315 --> 00:05:59,090 and concluded that the average disparity is much greater. 109 00:05:59,090 --> 00:06:01,950 In particular, in this study, they 110 00:06:01,950 --> 00:06:05,210 said that the average man has 20 partners-- 111 00:06:05,210 --> 00:06:07,100 I'm assuming over their lifetime-- 112 00:06:07,100 --> 00:06:10,050 and the average woman has six. 113 00:06:10,050 --> 00:06:12,636 And this gives a disparity of 233%. 114 00:06:21,220 --> 00:06:30,630 So ABC News, which did a smaller survey, says that it's 233% here, 115 00:06:30,630 --> 00:06:33,540 much more than 74%. 116 00:06:33,540 --> 00:06:36,800 Now ABC News claimed this is one of the most 117 00:06:36,800 --> 00:06:39,060 scientific studies ever done. 118 00:06:39,060 --> 00:06:42,430 And there was a 2.5% margin of error. 119 00:06:42,430 --> 00:06:45,540 Now we'll actually talk about what that means mathematically 120 00:06:45,540 --> 00:06:47,570 later in the term when we do probability, and 121 00:06:47,570 --> 00:06:49,512 study polling. 122 00:06:49,512 --> 00:06:50,970 Now of course I should also mention 123 00:06:50,970 --> 00:06:52,770 that ABC News is the one that said 124 00:06:52,770 --> 00:06:57,720 Al Gore won the presidential election in 2000. 125 00:06:57,720 --> 00:07:01,930 Now the study is called American Sex Survey, 126 00:07:01,930 --> 00:07:04,740 a Peek Between the Sheets. 127 00:07:04,740 --> 00:07:06,530 That doesn't sound so scientific. 128 00:07:06,530 --> 00:07:09,990 And it was on TV, on Primetime Live in 2004. 129 00:07:09,990 --> 00:07:12,300 The promo for this is really good.
130 00:07:12,300 --> 00:07:16,740 It says, a groundbreaking ABC News Primetime Live survey 131 00:07:16,740 --> 00:07:20,030 finds a range of eye popping sexual activities, fantasies, 132 00:07:20,030 --> 00:07:22,460 and attitudes in this country, confirming 133 00:07:22,460 --> 00:07:25,700 some conventional wisdom, exploding some myths, 134 00:07:25,700 --> 00:07:28,560 and venturing where few scientific surveys have 135 00:07:28,560 --> 00:07:29,972 gone before. 136 00:07:29,972 --> 00:07:31,680 By the end of today, we're going to agree 137 00:07:31,680 --> 00:07:34,390 with that last statement. 138 00:07:34,390 --> 00:07:39,250 OK now who do you think's right? 139 00:07:39,250 --> 00:07:41,980 University of Chicago. 140 00:07:41,980 --> 00:07:45,010 Who votes for 74% as being pretty close? 141 00:07:45,010 --> 00:07:47,190 A few of you. 142 00:07:47,190 --> 00:07:48,890 I've already slammed these guys. 143 00:07:48,890 --> 00:07:51,830 Who votes for ABC News as being more accurate? 144 00:07:51,830 --> 00:07:54,980 Yeah, nobody. 145 00:07:54,980 --> 00:07:58,210 Who votes for no way to tell? 146 00:07:58,210 --> 00:08:00,360 I got some votes there, all right. 147 00:08:00,360 --> 00:08:03,230 So how do you tackle this problem? 148 00:08:03,230 --> 00:08:06,414 In theory we could do our own 6.042 survey. 149 00:08:06,414 --> 00:08:08,080 I don't know how much we'd really learn, 150 00:08:08,080 --> 00:08:10,430 and for sure I'd get fired. 151 00:08:10,430 --> 00:08:14,030 So I don't think we're going to do that. 152 00:08:14,030 --> 00:08:16,310 But fortunately, this is the kind of question 153 00:08:16,310 --> 00:08:17,940 that could be handled, and actually 154 00:08:17,940 --> 00:08:21,430 answered, by graph theory, even though it might 155 00:08:21,430 --> 00:08:23,680 be more interesting to interview thousands of people, 156 00:08:23,680 --> 00:08:25,640 and find out what's going on. 
157 00:08:25,640 --> 00:08:28,760 That's not as efficient as using graphs. 158 00:08:28,760 --> 00:08:33,070 So let me start by defining what a graph is. 159 00:08:33,070 --> 00:08:37,210 Informally, a graph is just a bunch of dots and lines 160 00:08:37,210 --> 00:08:40,374 connecting the dots, it's actually very simple. 161 00:08:40,374 --> 00:08:41,165 So here's a graph. 162 00:08:46,830 --> 00:08:49,460 These are the nodes, and they're connected 163 00:08:49,460 --> 00:08:51,215 with these lines, called edges. 164 00:08:55,480 --> 00:08:57,530 And often the nodes, and sometimes the edges, 165 00:08:57,530 --> 00:08:58,770 are labeled. 166 00:08:58,770 --> 00:09:10,323 For example, we might call these x1, x2, x3, x4, x5, x6, and x7. 167 00:09:10,323 --> 00:09:13,090 So that's an example of a graph. 168 00:09:13,090 --> 00:09:14,940 Now this being a math class, we've got 169 00:09:14,940 --> 00:09:17,747 to give a formal definition of a graph. 170 00:09:17,747 --> 00:09:19,580 And we'll usually use the formal definition. 171 00:09:23,910 --> 00:09:41,310 A graph G is a pair of sets often called V and E. 172 00:09:41,310 --> 00:09:45,850 Where V is a set of elements called vertices or nodes. 173 00:09:45,850 --> 00:09:48,330 And it has to be non-empty here in this class. 174 00:10:06,140 --> 00:10:08,820 And we'll go back and forth between vertices and nodes. 175 00:10:08,820 --> 00:10:13,270 Even in the text we use both words interchangeably. 176 00:10:13,270 --> 00:10:26,590 And E is a set of 2-item subsets of V, 177 00:10:26,590 --> 00:10:27,950 and they're called edges. 178 00:10:32,970 --> 00:10:37,060 So for example, over here in this picture, 179 00:10:37,060 --> 00:10:48,950 V, the set of nodes, is x1, x2, x3, up to x7, those are the nodes. 180 00:10:48,950 --> 00:10:54,430 And E, the set of edges, is pairs, 181 00:10:54,430 --> 00:10:58,700 unordered pairs of vertices. 182 00:10:58,700 --> 00:11:04,615 So for example x1, x2 is an edge.
183 00:11:04,615 --> 00:11:07,660 And it's the same as the set x2, x1, 184 00:11:07,660 --> 00:11:09,210 the order doesn't matter here. 185 00:11:09,210 --> 00:11:11,680 Later in a week or so, we'll talk about directed graphs 186 00:11:11,680 --> 00:11:14,600 where the order matters. 187 00:11:14,600 --> 00:11:21,750 x1, x3 is also an edge here, and so on. 188 00:11:21,750 --> 00:11:25,231 Think we've got, let's see, 1, 2, 3, 4, 5, 6, 189 00:11:25,231 --> 00:11:27,570 7 edges in this graph. 190 00:11:27,570 --> 00:11:31,250 And the last one would be x5, x7. 191 00:11:35,300 --> 00:11:41,190 Edges are also sometimes written with this notation, x1 line x2, 192 00:11:41,190 --> 00:11:42,877 is another notation. 193 00:11:42,877 --> 00:11:44,960 And then later when we talk about directed edges, 194 00:11:44,960 --> 00:11:49,710 we'll put a little arrowhead on one end of this. 195 00:11:49,710 --> 00:11:52,110 Now the definition of a graph is really pretty simple. 196 00:11:52,110 --> 00:11:55,290 Just think of it as dots and lines, if you want. 197 00:11:55,290 --> 00:11:59,440 But there's often differences in how people define graphs. 198 00:11:59,440 --> 00:12:04,550 For example, in this class we don't allow the empty graph, 199 00:12:04,550 --> 00:12:06,742 i.e. the graph with no nodes. 200 00:12:06,742 --> 00:12:08,950 So we're going to insist that every graph has to have 201 00:12:08,950 --> 00:12:10,270 at least one node in it. 202 00:12:10,270 --> 00:12:11,770 And that's just to make the theorems 203 00:12:11,770 --> 00:12:13,010 we're going to prove be true. 204 00:12:13,010 --> 00:12:14,510 Otherwise there's some theorems that 205 00:12:14,510 --> 00:12:16,880 are false for the special case of the empty graph. 206 00:12:16,880 --> 00:12:19,270 But we don't require the graph to have any edges. 207 00:12:19,270 --> 00:12:23,330 In fact, it's possible to have a graph with nodes, 208 00:12:23,330 --> 00:12:23,945 but no edges.
209 00:12:26,500 --> 00:12:28,560 For example, this graph. 210 00:12:28,560 --> 00:12:29,310 Three-node graph. 211 00:12:33,560 --> 00:12:42,730 So here G equals (V, E), V equals x1, x2, x3. 212 00:12:42,730 --> 00:12:44,210 And E is just the empty set. 213 00:12:48,110 --> 00:12:53,060 Now for a general graph, when you do have edges, 214 00:12:53,060 --> 00:13:04,005 we say that two nodes, call them xi and xj, 215 00:13:04,005 --> 00:13:13,120 are adjacent if they're connected by an edge, namely 216 00:13:13,120 --> 00:13:18,160 if xi xj is an edge. 217 00:13:18,160 --> 00:13:24,630 All right so for example, x5 is adjacent to x7, 218 00:13:24,630 --> 00:13:29,040 but it's not adjacent to x4, there's no edge there. 219 00:13:33,160 --> 00:13:37,810 Closely related is the definition of incidence. 220 00:13:37,810 --> 00:13:48,410 An edge e, which is xi xj, is said 221 00:13:48,410 --> 00:13:55,560 to be incident to its end points, xi and xj. 222 00:13:59,120 --> 00:14:06,400 OK so, for example, if I labeled that edge as e, e 223 00:14:06,400 --> 00:14:11,435 is the edge x1, x2, and it's incident to x1, and incident 224 00:14:11,435 --> 00:14:11,935 to x2. 225 00:14:18,320 --> 00:14:20,600 Then we can talk about the degree of a node. 226 00:14:23,410 --> 00:14:38,180 The number of edges incident to a node 227 00:14:38,180 --> 00:14:40,030 is called the degree of the node. 228 00:14:51,510 --> 00:14:57,520 So for example, what's the degree of x5 over here? 229 00:14:57,520 --> 00:15:05,010 3, so in this case, the degree of x5 equals 3. 230 00:15:05,010 --> 00:15:10,920 The degree of x7 is 1. 231 00:15:10,920 --> 00:15:14,190 These guys all have degree 0, there's 232 00:15:14,190 --> 00:15:17,980 no edges incident to them. 233 00:15:17,980 --> 00:15:21,790 Now in this class, we're going to look at only simple graphs, 234 00:15:21,790 --> 00:15:23,000 at least for a while. 235 00:15:26,570 --> 00:15:40,950 A graph is simple if it has no loops, or multiple edges.
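The definitions so far (a vertex set V, an edge set E of 2-item subsets of V, adjacency, and degree) can be sketched in Python. This is only an illustration: the edges x1-x2, x1-x3, and x5-x7, the total of 7 edges, and the degrees of x5 and x7 come from the lecture, while the other four edges are assumptions picked to agree with those facts, since the board drawing is not part of the transcript.

```python
# A graph G = (V, E): V is a non-empty set of nodes,
# E is a set of 2-item subsets of V (unordered pairs).
V = {"x1", "x2", "x3", "x4", "x5", "x6", "x7"}

E = {frozenset(pair) for pair in [
    ("x1", "x2"), ("x1", "x3"), ("x5", "x7"),                 # named in the lecture
    ("x2", "x4"), ("x3", "x5"), ("x4", "x6"), ("x5", "x6"),   # assumed for illustration
]}

def adjacent(u, v):
    """Two nodes are adjacent if the unordered pair {u, v} is an edge."""
    return frozenset((u, v)) in E

def degree(v):
    """The degree of a node is the number of edges incident to it."""
    return sum(1 for e in E if v in e)

print(adjacent("x5", "x7"), adjacent("x5", "x4"))
print(degree("x5"), degree("x7"))
```

Storing each edge as a 2-item frozenset means the order of the endpoints does not matter, and a set of such pairs cannot hold a loop or a repeated edge, which matches the simple graphs used in this class.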
236 00:15:48,200 --> 00:15:53,750 Now a loop is an edge that only connects up one node, that's 237 00:15:53,750 --> 00:15:56,250 a loop and we don't allow it. 238 00:15:56,250 --> 00:16:00,070 A multiple edge is we've got two edges that are really the same, 239 00:16:00,070 --> 00:16:02,980 they connect the same endpoints. 240 00:16:02,980 --> 00:16:04,040 Also called a multi-edge. 241 00:16:08,290 --> 00:16:10,620 And those we're not going to have in simple graphs. 242 00:16:10,620 --> 00:16:11,975 We don't allow this. 243 00:16:11,975 --> 00:16:15,200 We don't allow that. 244 00:16:15,200 --> 00:16:17,920 Any questions so far about what a graph is? 245 00:16:22,490 --> 00:16:26,390 So how are we going to use a graph 246 00:16:26,390 --> 00:16:32,470 to model the problem of opposite-gender partners? 247 00:16:32,470 --> 00:16:34,890 That's the question we're after. 248 00:16:34,890 --> 00:16:38,052 So any thoughts about what the nodes of the graph 249 00:16:38,052 --> 00:16:39,010 are going to represent? 250 00:16:41,840 --> 00:16:42,440 What is it? 251 00:16:42,440 --> 00:16:43,930 AUDIENCE: Males and females? 252 00:16:43,930 --> 00:16:45,370 PROFESSOR: People. 253 00:16:45,370 --> 00:16:47,670 Yeah, so we're going to have people. 254 00:16:47,670 --> 00:16:50,290 In fact, there's two kinds of people here. 255 00:16:50,290 --> 00:16:52,630 There's men, and women. 256 00:16:55,370 --> 00:17:01,520 All right we got nodes here for the men. 257 00:17:01,520 --> 00:17:05,400 And in fact in America, there's a lot of nodes here. 258 00:17:05,400 --> 00:17:07,750 All right, and so this might be oh I don't know, 259 00:17:07,750 --> 00:17:14,310 say that's Tom Cruise and Nicole Kidman. 260 00:17:14,310 --> 00:17:16,944 Now what's the edge going to represent? 261 00:17:16,944 --> 00:17:18,420 AUDIENCE: Partners. 262 00:17:18,420 --> 00:17:20,020 PROFESSOR: Partners. 263 00:17:20,020 --> 00:17:23,109 They were opposite-gender partners. 
264 00:17:23,109 --> 00:17:26,470 And there's actually more edges probably here. 265 00:17:26,470 --> 00:17:34,200 We could have Penelope here, and Katie here. 266 00:17:34,200 --> 00:17:37,400 And well probably lots more, I probably don't know them all. 267 00:17:37,400 --> 00:17:41,250 And Ben's over here with Nicole. 268 00:17:41,250 --> 00:17:48,054 And Nicole got Jude and Keith. 269 00:17:48,054 --> 00:17:49,470 There's actually a website you can 270 00:17:49,470 --> 00:17:53,500 go to to get a lot of these things here. 271 00:17:53,500 --> 00:17:56,830 And Katie went with Josh. 272 00:17:56,830 --> 00:17:59,779 It's called whosedatedwho.com, and you get a big graph, 273 00:17:59,779 --> 00:18:01,320 you could start filling in the edges. 274 00:18:01,320 --> 00:18:03,690 I don't know how reliable it is. 275 00:18:03,690 --> 00:18:06,340 Now it's really critical that we're only looking at edges 276 00:18:06,340 --> 00:18:08,270 from here to here. 277 00:18:08,270 --> 00:18:10,650 All right, so if there's an edge between Tom and Ben, 278 00:18:10,650 --> 00:18:13,740 I don't want to know about it. 279 00:18:13,740 --> 00:18:17,540 Just opposite-gender partners. 280 00:18:17,540 --> 00:18:22,090 OK now in the USA, the number of nodes 281 00:18:22,090 --> 00:18:25,590 here is about 300 million. 282 00:18:28,640 --> 00:18:30,220 About 300 million people. 283 00:18:30,220 --> 00:18:35,080 And the number of men nodes, male nodes, call these VM, 284 00:18:35,080 --> 00:18:39,950 and this is VW, by the way, I'm using cardinality notation. 285 00:18:39,950 --> 00:18:42,750 When I put bars around a set, that 286 00:18:42,750 --> 00:18:46,050 is denoting how many are in the set. 287 00:18:46,050 --> 00:18:53,360 In the US there's about 147.6 million men out of the 300 million. 288 00:18:53,360 --> 00:18:58,820 And the number of women-- oh we got a w here-- 289 00:18:58,820 --> 00:19:03,240 is about 152.4 million.
290 00:19:03,240 --> 00:19:05,740 So there's a little bit more nodes 291 00:19:05,740 --> 00:19:08,420 on this side of the graph, than that side in the US. 292 00:19:10,960 --> 00:19:12,130 What about the edges? 293 00:19:12,130 --> 00:19:14,115 Any idea of how many edges there are here? 294 00:19:17,097 --> 00:19:17,680 We don't know. 295 00:19:17,680 --> 00:19:21,070 I sure as heck don't know how many edges there are. 296 00:19:21,070 --> 00:19:22,410 So that we don't know. 297 00:19:22,410 --> 00:19:26,160 The cardinality of the edge set we don't know, 298 00:19:26,160 --> 00:19:27,940 and we're not likely to figure out. 299 00:19:27,940 --> 00:19:30,420 I don't even think these surveys, really, 300 00:19:30,420 --> 00:19:32,900 can estimate that. 301 00:19:32,900 --> 00:19:35,430 But what we're trying to figure out 302 00:19:35,430 --> 00:19:40,570 is the ratio of the average degree of the men, 303 00:19:40,570 --> 00:19:44,310 to the average degree of the women. 304 00:19:44,310 --> 00:19:46,280 Because the number of opposite-gender partners 305 00:19:46,280 --> 00:19:48,640 you have is your degree here, and you're 306 00:19:48,640 --> 00:19:50,770 looking for the average guy degree, 307 00:19:50,770 --> 00:19:54,370 compared to the average female degree here. 308 00:19:54,370 --> 00:19:55,390 That's what we're after. 309 00:19:55,390 --> 00:19:57,835 All right so let's find that quantity. 310 00:20:00,740 --> 00:20:10,020 Let's let A sub m equal the average number 311 00:20:10,020 --> 00:20:21,240 of opposite-gender partners for men. 312 00:20:24,830 --> 00:20:27,970 And we can let A W be the same thing for women. 313 00:20:37,310 --> 00:20:38,090 All right. 314 00:20:38,090 --> 00:20:42,440 Now we're trying to figure out the answer to this question. 315 00:20:42,440 --> 00:20:48,360 What is A m, the average guy degree, 316 00:20:48,360 --> 00:20:52,550 over the average woman degree. 
317 00:20:52,550 --> 00:20:54,490 And in particular, the University of Chicago 318 00:20:54,490 --> 00:20:58,480 says, they say it's 1.74. 319 00:20:58,480 --> 00:21:02,770 That the average guy has 74% more opposite-gender partners 320 00:21:02,770 --> 00:21:04,550 than the average woman. 321 00:21:04,550 --> 00:21:10,830 ABC News says it's 3.33, that is 233% more for the men, 322 00:21:10,830 --> 00:21:12,460 than the women. 323 00:21:12,460 --> 00:21:16,860 Now we're going to figure out what this ratio is. 324 00:21:16,860 --> 00:21:18,540 Just use a little bit of math here, 325 00:21:18,540 --> 00:21:21,260 and a little bit of graph theory. 326 00:21:21,260 --> 00:21:23,650 So let's write a formula for A m. 327 00:21:27,760 --> 00:21:34,230 Well we're trying to figure out the average degree over here. 328 00:21:34,230 --> 00:21:35,660 Well, that's pretty simple. 329 00:21:35,660 --> 00:21:39,370 We just add up all the degrees, and divide 330 00:21:39,370 --> 00:21:40,735 by the number of nodes. 331 00:21:40,735 --> 00:21:42,360 And that'll give us the average degree. 332 00:21:44,910 --> 00:21:48,390 So the average degree is the sum of the degrees, 333 00:21:48,390 --> 00:21:51,670 over all men, x in the set of men, 334 00:21:51,670 --> 00:21:56,490 of the degree of x, divided by the number of men. 335 00:21:59,830 --> 00:22:04,620 Can somebody give me a simpler expression for this? 336 00:22:04,620 --> 00:22:07,454 One that doesn't have that nasty sum in it? 337 00:22:07,454 --> 00:22:09,210 AUDIENCE: E. 338 00:22:09,210 --> 00:22:12,440 PROFESSOR: E. The cardinality of E. 339 00:22:12,440 --> 00:22:16,960 I'm adding all the degrees here. 340 00:22:16,960 --> 00:22:20,940 Well that's just another way of counting all the edges, 341 00:22:20,940 --> 00:22:24,430 because every edge shows up once, and only once, 342 00:22:24,430 --> 00:22:26,360 in a degree count here.
343 00:22:26,360 --> 00:22:28,990 And this is where, we use the fact we 344 00:22:28,990 --> 00:22:30,370 have opposite-gender partners. 345 00:22:33,020 --> 00:22:36,130 Because if I had some edges over here 346 00:22:36,130 --> 00:22:40,240 they wouldn't get counted in sum of the degrees here. 347 00:22:40,240 --> 00:22:45,255 All right so this is just the cardinality 348 00:22:45,255 --> 00:22:48,970 of the number of edges, divided by the number of men. 349 00:22:48,970 --> 00:22:50,380 Any questions about that? 350 00:22:50,380 --> 00:22:53,020 Because this is an important statement about 351 00:22:53,020 --> 00:22:54,930 graphs in general. 352 00:22:54,930 --> 00:22:57,384 When I have a graph like this-- which 353 00:22:57,384 --> 00:22:58,800 is called a bipartite graph, we'll 354 00:22:58,800 --> 00:23:00,249 talk about more in a little bit. 355 00:23:00,249 --> 00:23:02,290 But where the edges go from the left to the right 356 00:23:02,290 --> 00:23:03,987 if I sum the degrees on the left, 357 00:23:03,987 --> 00:23:05,570 I'm just counting the number of edges. 358 00:23:09,060 --> 00:23:13,840 All right, let's figure out a formula for the average number 359 00:23:13,840 --> 00:23:16,340 of partners for the women. 360 00:23:16,340 --> 00:23:23,240 That simple that's just sum x over the women. 361 00:23:23,240 --> 00:23:27,950 The degree of x, divided by the number of women. 362 00:23:30,900 --> 00:23:32,530 Let me rewrite that so it's clearer. 363 00:23:36,150 --> 00:23:38,508 What's a simpler expression for this? 364 00:23:38,508 --> 00:23:39,383 AUDIENCE: [INAUDIBLE] 365 00:23:43,080 --> 00:23:46,310 PROFESSOR: Yeah, this sum, adding 366 00:23:46,310 --> 00:23:52,410 the degrees of the women, is just the number of edges, 367 00:23:52,410 --> 00:23:54,690 right. 368 00:23:54,690 --> 00:24:00,280 So that is cardinality of edges, divided by the number of women. 
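The counting step above, that summing degrees on one side of a bipartite graph counts every edge exactly once, can be checked on a toy example in Python. The node names m1..m3 and w1..w4 and the six edges are made up for illustration and are not data from either survey.

```python
from fractions import Fraction

# Hypothetical bipartite graph: every edge joins one man to one woman.
men = {"m1", "m2", "m3"}
women = {"w1", "w2", "w3", "w4"}
edges = {
    ("m1", "w1"), ("m1", "w2"),
    ("m2", "w2"), ("m2", "w4"),
    ("m3", "w3"), ("m3", "w4"),
}

def degree(person):
    """Number of edges incident to this person."""
    return sum(1 for e in edges if person in e)

# Each edge has exactly one endpoint on each side, so both sums equal |E|.
sum_men = sum(degree(m) for m in men)
sum_women = sum(degree(w) for w in women)

# Average degree on each side is |E| divided by the size of that side.
avg_men = Fraction(sum_men, len(men))
avg_women = Fraction(sum_women, len(women))

print(sum_men, sum_women, len(edges))
print(avg_men, avg_women)
```

If an edge were allowed between two men, it would add 2 to the men's sum and 0 to the women's, and the two sums would no longer both equal the number of edges. That is where the opposite-gender restriction is used.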
369 00:24:00,280 --> 00:24:06,700 All right, well now we can write, solve for our formula, 370 00:24:06,700 --> 00:24:12,320 the average for the men over the average for the women. 371 00:24:12,320 --> 00:24:22,126 That's E over VM, divided by E over VW. 372 00:24:22,126 --> 00:24:24,900 Wow, this is nice. 373 00:24:24,900 --> 00:24:30,290 I don't know what the number of edges is, but it just canceled out. 374 00:24:30,290 --> 00:24:33,300 And this is just the number of women, 375 00:24:33,300 --> 00:24:34,890 divided by the number of men. 376 00:24:37,430 --> 00:24:40,740 And in fact we know that. 377 00:24:40,740 --> 00:24:42,660 That's this number, divided by that number, 378 00:24:42,660 --> 00:24:52,090 which is about 1.0325. 379 00:24:52,090 --> 00:24:57,550 So we just proved, that on average, a man 380 00:24:57,550 --> 00:25:01,950 has 3%, or 3 and 1/4% more opposite-gender partners 381 00:25:01,950 --> 00:25:02,500 than a woman. 382 00:25:05,010 --> 00:25:08,800 No need to do the interviews, or spend years doing it. 383 00:25:08,800 --> 00:25:10,390 That is the answer. 384 00:25:10,390 --> 00:25:14,120 And it has nothing to do with the promiscuity of men, 385 00:25:14,120 --> 00:25:16,345 or women, nothing at all. 386 00:25:19,080 --> 00:25:23,300 So the Chicago study is way off, and the ABC News study 387 00:25:23,300 --> 00:25:25,400 is completely nuts. 388 00:25:25,400 --> 00:25:30,050 It just can't be right, this is a proof. 389 00:25:30,050 --> 00:25:33,740 Now what happened here? 390 00:25:33,740 --> 00:25:37,910 Well what's going on, what's the reason why this is true? 391 00:25:37,910 --> 00:25:38,410 Yeah? 392 00:25:38,410 --> 00:25:41,679 AUDIENCE: If a male has a female partner then 393 00:25:41,679 --> 00:25:44,004 the female has a male partner. 394 00:25:44,004 --> 00:25:44,670 PROFESSOR: Yeah. 395 00:25:44,670 --> 00:25:47,456 AUDIENCE: You're not looking at like how many males 396 00:25:47,456 --> 00:25:49,786 are going to one female.
397 00:25:49,786 --> 00:25:53,110 The promiscuity isn't even a part of the question. 398 00:25:53,110 --> 00:25:54,110 PROFESSOR: That's right. 399 00:25:54,110 --> 00:25:56,610 It takes two to tango. 400 00:25:56,610 --> 00:25:58,880 Every time you got a guy, you got a woman. 401 00:25:58,880 --> 00:26:01,920 And you have the number of relationships going. 402 00:26:01,920 --> 00:26:04,530 The average for the men is that number, divided by the number of men. 403 00:26:04,530 --> 00:26:06,810 Average for the women is that same number, 404 00:26:06,810 --> 00:26:07,699 divided by the number of women. 405 00:26:07,699 --> 00:26:09,240 And so if there's more women, they're 406 00:26:09,240 --> 00:26:11,080 going to have fewer partners on average. 407 00:26:11,080 --> 00:26:12,670 Has to be. 408 00:26:12,670 --> 00:26:14,420 So it really was a stupid question. 409 00:26:14,420 --> 00:26:17,970 It's very, very simple to answer. 410 00:26:17,970 --> 00:26:21,340 Now as it turns out there are endless studies like this, 411 00:26:21,340 --> 00:26:23,090 in the literature. 412 00:26:23,090 --> 00:26:25,940 In fact, a few years ago the Boston Globe 413 00:26:25,940 --> 00:26:31,140 ran an explosive story about the study habits of students 414 00:26:31,140 --> 00:26:33,530 on Boston-area campuses. 415 00:26:33,530 --> 00:26:38,320 And their surveys show that, on average, minority students 416 00:26:38,320 --> 00:26:41,380 tended to study with non-minority students 417 00:26:41,380 --> 00:26:42,685 more than the other way around. 418 00:26:46,350 --> 00:26:49,250 And they went on at great length consulting the experts 419 00:26:49,250 --> 00:26:50,980 as to why this might be true. 420 00:26:50,980 --> 00:26:53,230 Why is it that minority students study 421 00:26:53,230 --> 00:26:56,870 with non-minority students more than the other way around? 422 00:26:56,870 --> 00:26:58,940 Now can anyone tell me why it is certainly 423 00:26:58,940 --> 00:27:02,205 true, and not surprising, why that's the case?
424 00:27:02,205 --> 00:27:04,120 AUDIENCE: Because they're the minority. 425 00:27:04,120 --> 00:27:05,703 PROFESSOR: Because they're a minority. 426 00:27:05,703 --> 00:27:07,745 There's fewer minorities than non-minorities. 427 00:27:07,745 --> 00:27:12,030 End of story, we don't need this sociology PhD 428 00:27:12,030 --> 00:27:13,780 from down the street to explain it to us. 429 00:27:16,780 --> 00:27:19,060 We're going to see a lot of other bogus studies later. 430 00:27:19,060 --> 00:27:22,270 This is not unusual, especially when we get to probability. 431 00:27:22,270 --> 00:27:26,410 Just every day there's a new one in probability. 432 00:27:26,410 --> 00:27:29,310 Any questions about this before we leave? 433 00:27:29,310 --> 00:27:32,160 Unfortunately that's almost all we'll say about sex today. 434 00:27:35,590 --> 00:27:36,800 OK. 435 00:27:36,800 --> 00:27:42,110 But now, in this example, we used an edge in the graph 436 00:27:42,110 --> 00:27:46,530 to denote some kind of affinity between two nodes. 437 00:27:46,530 --> 00:27:48,594 The two nodes liked each other in some sense 438 00:27:48,594 --> 00:27:50,010 if they were connected by an edge, 439 00:27:50,010 --> 00:27:52,520 or they had a relationship of some kind. 440 00:27:52,520 --> 00:27:54,490 There's lots of examples in computer science 441 00:27:54,490 --> 00:27:57,453 where you use an edge to denote just the opposite. 442 00:27:57,453 --> 00:28:00,200 That the two nodes can't be near each other, 443 00:28:00,200 --> 00:28:02,750 or don't like each other. 444 00:28:02,750 --> 00:28:04,420 For example, consider the problem 445 00:28:04,420 --> 00:28:08,080 of scheduling final exams at MIT.
446 00:28:08,080 --> 00:28:11,310 And they do this after they find out all of your schedules, 447 00:28:11,310 --> 00:28:14,354 and they try to schedule the exams so that you don't 448 00:28:14,354 --> 00:28:16,270 have to take two at once, or there's as little 449 00:28:16,270 --> 00:28:19,270 of that as possible. 450 00:28:19,270 --> 00:28:22,730 For example, let's do an example here. 451 00:28:33,460 --> 00:28:35,760 Say we look at these five classes. 452 00:28:44,590 --> 00:28:47,220 Take 6041. 453 00:28:47,220 --> 00:28:49,920 And this may not be totally accurate, but roughly. 454 00:28:57,250 --> 00:28:59,950 So I've got five MIT classes, and I'm 455 00:28:59,950 --> 00:29:03,920 going to put an edge between pairs of classes that have 456 00:29:03,920 --> 00:29:05,400 overlapping student enrollment. 457 00:29:08,630 --> 00:29:11,490 So in this case, for example, we've 458 00:29:11,490 --> 00:29:13,160 assumed in the drawing of this graph, 459 00:29:13,160 --> 00:29:17,280 that you can't have our exam the same time as 6002, 460 00:29:17,280 --> 00:29:20,640 on the assumption there's students in both classes. 461 00:29:20,640 --> 00:29:24,560 But you could have our exam the same time as 6034. 462 00:29:24,560 --> 00:29:27,730 Because there's not an overlapping student 463 00:29:27,730 --> 00:29:29,350 in both classes, so the exams could 464 00:29:29,350 --> 00:29:31,510 be scheduled at the same time. 465 00:29:31,510 --> 00:29:33,720 So we've used a graph to represent 466 00:29:33,720 --> 00:29:38,740 which courses can't have their exam at the same time. 467 00:29:38,740 --> 00:29:42,100 Now let's also suppose we have a set of slots for the exam. 468 00:29:45,360 --> 00:29:47,670 And say they're all on a Wednesday. 469 00:29:47,670 --> 00:29:51,480 And the first slot is Wednesday from 5:00 to 7:00. 470 00:29:51,480 --> 00:29:55,430 And the next one is 7:00 to 9:00. 471 00:29:55,430 --> 00:30:00,070 And then, the next one is 9:00 to 11:00.
472 00:30:00,070 --> 00:30:05,290 And then 11:00 to 1:00 in the morning, and then 1:00 to 3:00, 473 00:30:05,290 --> 00:30:07,930 getting pretty late. 474 00:30:07,930 --> 00:30:13,230 And your job is to figure out how not to have 475 00:30:13,230 --> 00:30:15,290 to use these later exam slots. 476 00:30:15,290 --> 00:30:17,140 You'd like to use as few as possible 477 00:30:17,140 --> 00:30:20,180 so you're not going too late at night, 478 00:30:20,180 --> 00:30:22,930 or come before the holidays, so you're not 479 00:30:22,930 --> 00:30:25,380 having exams on Christmas and New Year's, for example. 480 00:30:28,450 --> 00:30:33,840 So the goal is to assign slots to the nodes. 481 00:30:33,840 --> 00:30:37,090 Put every node in a slot so you don't 482 00:30:37,090 --> 00:30:42,110 have nodes hooked by an edge getting the same slot. 483 00:30:42,110 --> 00:30:44,400 Now this is an example of what's called 484 00:30:44,400 --> 00:30:46,496 a graph coloring problem. 485 00:30:46,496 --> 00:30:47,370 So let's define that. 486 00:31:17,680 --> 00:31:44,020 Given a graph G, and K colors, assign a color to each node, 487 00:31:44,020 --> 00:31:55,970 so that adjacent nodes get different colors. 488 00:32:03,154 --> 00:32:08,120 All right, and then the minimum number of colors you need 489 00:32:08,120 --> 00:32:11,750 is called the chromatic number of the graph. 490 00:32:11,750 --> 00:32:36,460 So the minimum value of K, for which such a coloring exists, 491 00:32:36,460 --> 00:32:48,840 is the chromatic number of the graph. 492 00:32:48,840 --> 00:32:59,956 And it's denoted by this symbol chi of G. 493 00:32:59,956 --> 00:33:02,330 Because usually you want to use a small number of colors. 494 00:33:02,330 --> 00:33:07,270 Now what does a color represent when we're 495 00:33:07,270 --> 00:33:09,220 dealing with this problem? 496 00:33:09,220 --> 00:33:10,760 What's the meaning of a color? 497 00:33:10,760 --> 00:33:11,900 AUDIENCE: Time slot.
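In symbols, the definition just given on the board might be written as below. This is only a sketch of the notation: the coloring function c is a name assumed here for illustration, and V and E stand for the vertex and edge sets of G.

```latex
% A k-coloring assigns one of k colors to every node so that
% the two endpoints of every edge receive different colors.
c : V \to \{1, 2, \dots, k\}
\quad\text{with}\quad
c(u) \neq c(v) \ \text{ for every edge } \{u, v\} \in E

% The chromatic number is the smallest k for which such a coloring exists.
\chi(G) = \min \{\, k \ge 1 : G \text{ has a } k\text{-coloring} \,\}
```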
498 00:33:11,900 --> 00:33:14,080 PROFESSOR: A time slot, OK. 499 00:33:14,080 --> 00:33:22,690 So let's call this time slot C1, C2, C3, C4, C5, so there's 500 00:33:22,690 --> 00:33:23,850 five possible colors. 501 00:33:23,850 --> 00:33:26,800 Now of course, we could color this graph with five colors, 502 00:33:26,800 --> 00:33:29,150 every node could just get its own color. 503 00:33:29,150 --> 00:33:33,060 But then somebody's taking their exam from 1:00 to 3:00 AM, 504 00:33:33,060 --> 00:33:36,880 and that's a bit of a pain. 505 00:33:36,880 --> 00:33:41,360 Let's see if we can do less than five. 506 00:33:41,360 --> 00:33:46,010 Let's say I give this color one, let's give this one color 507 00:33:46,010 --> 00:33:50,630 one, that's OK, because they're not connected. 508 00:33:50,630 --> 00:33:55,880 I can't give this one color one, so I give it color two, say. 509 00:33:55,880 --> 00:33:58,570 Now this one I can't give color one, because this guy got it, 510 00:33:58,570 --> 00:34:00,570 he can't get color two, because that guy got it. 511 00:34:00,570 --> 00:34:03,020 So I give it color three. 512 00:34:03,020 --> 00:34:08,159 And well, I can't do one, two, or three here, 513 00:34:08,159 --> 00:34:11,020 so I gotta go to color four. 514 00:34:11,020 --> 00:34:17,800 All right so 6042 will get the 11:00 PM to 1:00 AM slot, 515 00:34:17,800 --> 00:34:19,443 not so good. 516 00:34:19,443 --> 00:34:21,916 Can we do any better? 517 00:34:21,916 --> 00:34:26,139 Can we get away with three colors? 518 00:34:26,139 --> 00:34:29,870 Some say yes, some say no. 519 00:34:29,870 --> 00:34:33,690 How many people think you can do three colors on this graph? 520 00:34:33,690 --> 00:34:34,830 A bunch. 521 00:34:34,830 --> 00:34:37,402 How many think you can't do any better? 522 00:34:37,402 --> 00:34:40,270 All right, the vote is mostly for three. 523 00:34:40,270 --> 00:34:42,060 Let's see. 524 00:34:42,060 --> 00:34:42,639 Any ideas?
525 00:34:42,639 --> 00:34:43,980 Anybody see how to do three? 526 00:34:46,775 --> 00:34:47,275 Yeah? 527 00:34:47,275 --> 00:34:50,199 AUDIENCE: Assign C4 to 6034. 528 00:34:50,199 --> 00:34:52,550 PROFESSOR: Assign C4 to 6034. 529 00:34:52,550 --> 00:34:55,889 AUDIENCE: Or C1 to 6042. 530 00:34:55,889 --> 00:34:59,580 PROFESSOR: C-- I can't do C1 to 6042. 531 00:34:59,580 --> 00:35:01,561 It crashes, but can I do-- yeah? 532 00:35:01,561 --> 00:35:02,060 Put 533 00:35:02,060 --> 00:35:04,434 AUDIENCE: C1 in 6003. 534 00:35:04,434 --> 00:35:05,350 PROFESSOR: C1 in 6003. 535 00:35:05,350 --> 00:35:10,222 AUDIENCE: And get rid of C1 in 6034. 536 00:35:10,222 --> 00:35:11,180 PROFESSOR: Get rid of-- 537 00:35:11,180 --> 00:35:12,580 AUDIENCE: Make it C2. 538 00:35:12,580 --> 00:35:15,070 PROFESSOR: Make this a C2. 539 00:35:15,070 --> 00:35:17,362 Oh, yeah. 540 00:35:17,362 --> 00:35:21,520 All right, these got C1, they're not adjacent. 541 00:35:21,520 --> 00:35:23,740 These got C2, they're not adjacent. 542 00:35:23,740 --> 00:35:26,380 This can now get C3. 543 00:35:26,380 --> 00:35:29,990 So we can have our exam from 9:00 to 11:00, which is better. 544 00:35:29,990 --> 00:35:32,375 All right, can anybody do it in two colors? 545 00:35:40,780 --> 00:35:44,935 Can anybody offer a reason why two colors may not be possible? 546 00:35:47,536 --> 00:35:48,035 Yeah? 547 00:35:48,035 --> 00:35:51,575 AUDIENCE: Because let's say you could do it with two colors. 548 00:35:51,575 --> 00:35:52,200 PROFESSOR: Yep. 549 00:35:52,200 --> 00:35:58,170 AUDIENCE: 6041 and 6002 have to be different colors. 550 00:35:58,170 --> 00:35:59,830 PROFESSOR: Yes. 551 00:35:59,830 --> 00:36:03,500 AUDIENCE: 6042 can't be C1, and it can't be C2. 552 00:36:03,500 --> 00:36:05,300 PROFESSOR: Yeah, good. 553 00:36:05,300 --> 00:36:09,130 So you can't do it in two colors, because these three guys 554 00:36:09,130 --> 00:36:10,670 would violate that.
555 00:36:10,670 --> 00:36:11,990 You've got a triangle here. 556 00:36:11,990 --> 00:36:15,550 Each one of these guys has to be different than the other two. 557 00:36:15,550 --> 00:36:16,920 So two colors can't work. 558 00:36:16,920 --> 00:36:19,740 You've got to have at least three in this case. 559 00:36:19,740 --> 00:36:20,760 So three is optimal. 560 00:36:20,760 --> 00:36:24,000 We have just shown for this graph, 561 00:36:24,000 --> 00:36:26,836 the chromatic number is three. 562 00:36:26,836 --> 00:36:33,300 All right, now in general doing what we just did is very hard. 563 00:36:33,300 --> 00:36:35,680 No one knows a fast algorithm for determining 564 00:36:35,680 --> 00:36:38,560 the chromatic number. 565 00:36:38,560 --> 00:36:40,370 In fact, it's a weird kind of problem, 566 00:36:40,370 --> 00:36:44,239 because it's easy enough to check that a coloring is OK. 567 00:36:44,239 --> 00:36:46,530 If somebody put a coloring on the board, you can check, 568 00:36:46,530 --> 00:36:48,420 oh that works really simply. 569 00:36:48,420 --> 00:36:52,070 Just check every edge, and make sure the colors are different. 570 00:36:52,070 --> 00:36:54,590 But figuring it out, as best we know, 571 00:36:54,590 --> 00:36:57,840 you've got to try an exponential number of possibilities. 572 00:36:57,840 --> 00:37:00,962 So if I had 100 nodes here, my running time 573 00:37:00,962 --> 00:37:02,920 of the algorithm to check all the possibilities 574 00:37:02,920 --> 00:37:04,800 would be exponential in a hundred. 575 00:37:04,800 --> 00:37:05,300 Yeah? 576 00:37:05,300 --> 00:37:08,772 AUDIENCE: Can that number just be like the highest 577 00:37:08,772 --> 00:37:11,750 degree of each node, or nodes? 578 00:37:11,750 --> 00:37:13,015 PROFESSOR: Uh no. 579 00:37:13,015 --> 00:37:15,770 But it's no worse than something like that, 580 00:37:15,770 --> 00:37:17,130 as we'll see in a few minutes. 581 00:37:17,130 --> 00:37:18,750 That's a great observation.
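The contrast drawn here, that checking a coloring is easy while finding the chromatic number seems to need exhaustive search, can be sketched in a few lines of Python. The edge list is an assumption: the board drawing is not in the transcript, so EDGES is a hypothetical reconstruction that merely agrees with what is said aloud (the 6041, 6002, 6042 triangle, three nodes of degree three, no edge between 6042 and 6034). The function names are illustrative too.

```python
from itertools import product

# Hypothetical reconstruction of the exam graph (assumed, not from the board).
EDGES = [
    ("6041", "6002"), ("6041", "6042"), ("6002", "6042"),
    ("6002", "6003"), ("6003", "6042"), ("6003", "6034"),
]
NODES = sorted({v for edge in EDGES for v in edge})


def is_valid_coloring(edges, coloring):
    """Fast check: one pass over the edges, endpoints must differ in color."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(nodes, edges):
    """Brute force: try every assignment of k colors for k = 1, 2, ...

    This examines up to k**len(nodes) assignments for each k,
    which is exponential in the number of nodes.
    """
    for k in range(1, len(nodes) + 1):
        for assignment in product(range(1, k + 1), repeat=len(nodes)):
            if is_valid_coloring(edges, dict(zip(nodes, assignment))):
                return k
    return 0  # only reached for a graph with no nodes


# The three-color schedule found in lecture, written as node -> color.
lecture_coloring = {"6041": 1, "6003": 1, "6034": 2, "6002": 2, "6042": 3}
print(is_valid_coloring(EDGES, lecture_coloring))
print(chromatic_number(NODES, EDGES))
```

On five nodes the brute force is instant; at 100 nodes the same loop would be hopeless, which is the point being made.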
582 00:37:18,750 --> 00:37:21,210 And we're going to come back to that in a few minutes. 583 00:37:21,210 --> 00:37:24,180 But it's not just that. 584 00:37:24,180 --> 00:37:30,140 OK now in fact even figuring out for an arbitrary graph 585 00:37:30,140 --> 00:37:33,320 if three colors can be done, called 586 00:37:33,320 --> 00:37:36,290 the three-coloring problem, that's really hard. 587 00:37:36,290 --> 00:37:39,570 No one knows how to solve that in less than exponential time. 588 00:37:39,570 --> 00:37:43,000 In fact, one of these NP-complete problems 589 00:37:43,000 --> 00:37:44,340 is what it's called. 590 00:37:44,340 --> 00:37:47,870 How many people here don't know about NP-completeness? 591 00:37:47,870 --> 00:37:49,822 Is everybody-- all right so all of you 592 00:37:49,822 --> 00:37:51,030 haven't seen NP-completeness. 593 00:37:51,030 --> 00:37:56,120 OK so there is a class of thousands of problems-- 594 00:37:56,120 --> 00:38:00,230 in fact there's books that list these 1,000 problems-- that are all 595 00:38:00,230 --> 00:38:03,000 NP-complete, somebody's proved they belong in the class. 596 00:38:03,000 --> 00:38:06,440 And what that means is that if somebody gave you 597 00:38:06,440 --> 00:38:09,300 a solution, like a coloring here, 598 00:38:09,300 --> 00:38:13,850 it's easy to check really quickly if it's valid. 599 00:38:13,850 --> 00:38:16,040 But figuring it out is really hard. 600 00:38:16,040 --> 00:38:17,790 And if you figured out how to solve 601 00:38:17,790 --> 00:38:19,761 one of those thousands of problems, 602 00:38:19,761 --> 00:38:22,260 like suddenly you figured out how to tell if any graph could 603 00:38:22,260 --> 00:38:25,610 work with three colors, you would solve automatically 604 00:38:25,610 --> 00:38:28,429 all the other thousands in the book. 605 00:38:28,429 --> 00:38:30,470 So it's this book of problems you will constantly 606 00:38:30,470 --> 00:38:33,952 run into in your career in computer science.
607 00:38:33,952 --> 00:38:35,410 And it's bad when you run into one, 608 00:38:35,410 --> 00:38:38,200 because there's no good algorithm to solve it known. 609 00:38:38,200 --> 00:38:40,065 But if you just solved one of them, 610 00:38:40,065 --> 00:38:43,620 the other thousands would suddenly be solvable quickly. 611 00:38:43,620 --> 00:38:46,540 Even better, you win a million dollar prize. 612 00:38:46,540 --> 00:38:48,000 One of these Millennium Prizes we 613 00:38:48,000 --> 00:38:50,860 talked about the first lecture. 614 00:38:50,860 --> 00:38:53,880 Even if you show you can't find a fast algorithm for one 615 00:38:53,880 --> 00:38:57,000 of them, that means that none of them have fast algorithms, 616 00:38:57,000 --> 00:38:59,397 and you also get a million dollars. 617 00:38:59,397 --> 00:39:01,980 So this is the central problem in computer science, and theory 618 00:39:01,980 --> 00:39:04,410 of computing, is whether or not you could solve 619 00:39:04,410 --> 00:39:07,060 these NP-complete problems. 620 00:39:07,060 --> 00:39:09,770 Now actually lots of people have claimed to do it. 621 00:39:09,770 --> 00:39:12,300 And in fact, there was a lot of buzz in the community 622 00:39:12,300 --> 00:39:15,110 about a month ago when actually a reputable researcher 623 00:39:15,110 --> 00:39:17,370 at HP Labs said he'd done it. 624 00:39:17,370 --> 00:39:21,630 He proved that you can't solve NP-complete problems. 625 00:39:21,630 --> 00:39:24,100 And he got people going for probably at least a week, 626 00:39:24,100 --> 00:39:26,420 until they discovered a fatal flaw. 627 00:39:26,420 --> 00:39:29,550 And the proof was actually bogus. 628 00:39:29,550 --> 00:39:31,770 So no one still knows if you can solve 629 00:39:31,770 --> 00:39:34,770 these NP-complete problems quickly.
630 00:39:34,770 --> 00:39:38,620 Now the problem is, in practice, you run into these things 631 00:39:38,620 --> 00:39:40,430 all the time, like MIT really does 632 00:39:40,430 --> 00:39:42,430 have to schedule the exams. 633 00:39:42,430 --> 00:39:43,680 So you've got to do something. 634 00:39:43,680 --> 00:39:46,690 You can't just go say, hey it's NP-complete, so no exams 635 00:39:46,690 --> 00:39:48,870 this year, or whatever. 636 00:39:48,870 --> 00:39:52,970 That's not going to fly, so you got to do something. 637 00:39:52,970 --> 00:39:55,320 So now this is a problem-- many of you 638 00:39:55,320 --> 00:39:58,510 when you go into careers, you're going to be faced with this. 639 00:39:58,510 --> 00:40:00,250 You got to do something. 640 00:40:00,250 --> 00:40:06,010 Any thoughts about an algorithm for coloring graphs that might 641 00:40:06,010 --> 00:40:07,880 use a small number of colors? 642 00:40:07,880 --> 00:40:10,335 It doesn't have to always work, or you're 643 00:40:10,335 --> 00:40:11,960 going to win a lot of money if it does. 644 00:40:11,960 --> 00:40:14,300 But a simple algorithm, you can't 645 00:40:14,300 --> 00:40:16,590 take e to the 100 steps. 646 00:40:16,590 --> 00:40:19,690 You got to be linear, probably, or quadratic time. 647 00:40:19,690 --> 00:40:22,800 That could get you a small number of colors. 648 00:40:22,800 --> 00:40:25,860 Any thoughts about what you'd do? 649 00:40:25,860 --> 00:40:26,360 Yeah? 650 00:40:26,360 --> 00:40:28,770 AUDIENCE: The number of degrees and nodes? 651 00:40:28,770 --> 00:40:30,590 PROFESSOR: The number-- what about it? 652 00:40:30,590 --> 00:40:33,220 AUDIENCE: The highest degree and that node, 653 00:40:33,220 --> 00:40:34,799 the 6042 is [INAUDIBLE]. 654 00:40:34,799 --> 00:40:35,465 PROFESSOR: Yeah. 655 00:40:35,465 --> 00:40:38,700 AUDIENCE: So you could use that. 656 00:40:38,700 --> 00:40:40,210 PROFESSOR: Good, all right.
657 00:40:40,210 --> 00:40:42,200 So what do I do with that-- so I found 658 00:40:42,200 --> 00:40:45,130 a node with a high degree, there's 659 00:40:45,130 --> 00:40:47,030 three of them have degree three here. 660 00:40:47,030 --> 00:40:48,806 What do I do with them? 661 00:40:48,806 --> 00:40:52,324 AUDIENCE: Pick a different color to. 662 00:40:52,324 --> 00:40:53,740 PROFESSOR: Pick a different color, 663 00:40:53,740 --> 00:40:55,890 that means I've colored some of the others. 664 00:40:55,890 --> 00:40:59,040 If I pick a different color, do I start with them, 665 00:40:59,040 --> 00:41:03,030 or do I finish with the high degree nodes? 666 00:41:03,030 --> 00:41:05,807 Because you've got to assign the colors to them. 667 00:41:05,807 --> 00:41:07,890 And high degree is important to be thinking about. 668 00:41:07,890 --> 00:41:10,431 We're going to prove a theorem in just a minute 669 00:41:10,431 --> 00:41:11,730 relating degree and coloring. 670 00:41:11,730 --> 00:41:14,800 AUDIENCE: Start with them. 671 00:41:14,800 --> 00:41:17,310 PROFESSOR: Start with them, and do what with it? 672 00:41:17,310 --> 00:41:18,498 Color? 673 00:41:18,498 --> 00:41:22,890 AUDIENCE: Yeah, and then assign the ones that aren't connected 674 00:41:22,890 --> 00:41:26,310 [INAUDIBLE] to the same slots. 675 00:41:26,310 --> 00:41:29,250 PROFESSOR: OK, so I could-- here's a degree three node, 676 00:41:29,250 --> 00:41:32,630 now I can start with color one for that. 677 00:41:32,630 --> 00:41:34,190 And then what do I do next? 678 00:41:34,190 --> 00:41:38,372 I pick-- its neighbors have to get different colors, I guess. 679 00:41:38,372 --> 00:41:39,830 You'd start coloring the neighbors. 680 00:41:39,830 --> 00:41:42,305 AUDIENCE: My first instinct would be 681 00:41:42,305 --> 00:41:44,780 to color all the [INAUDIBLE]. 682 00:41:48,250 --> 00:41:49,530 PROFESSOR: OK. 683 00:41:49,530 --> 00:41:51,320 And what color would you use for them?
684 00:41:51,320 --> 00:41:52,702 AUDIENCE: Different ones. 685 00:41:52,702 --> 00:41:54,660 PROFESSOR: Different ones if they're connected, 686 00:41:54,660 --> 00:41:57,790 or if they're not connected you'd still use different ones? 687 00:41:57,790 --> 00:42:00,670 AUDIENCE: Only if they're connected. 688 00:42:00,670 --> 00:42:02,940 PROFESSOR: Only if they're connected use different ones. 689 00:42:02,940 --> 00:42:06,901 And so if they're not connected, you'd use the same colors? 690 00:42:06,901 --> 00:42:07,401 Yeah? 691 00:42:10,190 --> 00:42:14,620 You're getting close, and it actually works pretty well. 692 00:42:14,620 --> 00:42:17,910 The underlying principle you're sort of thinking about here 693 00:42:17,910 --> 00:42:21,160 is you've got some notion of the order in which you're 694 00:42:21,160 --> 00:42:22,985 going to process your graph. 695 00:42:22,985 --> 00:42:25,110 And you're going to start with the high degree nodes, 696 00:42:25,110 --> 00:42:26,530 in your case. 697 00:42:26,530 --> 00:42:28,200 And as you go along, you're going 698 00:42:28,200 --> 00:42:30,950 to start coloring the nodes. 699 00:42:30,950 --> 00:42:33,990 And you're going to make sure you color them legally. 700 00:42:33,990 --> 00:42:37,920 And it sounds like you're going to color them with a low color 701 00:42:37,920 --> 00:42:39,270 as you go along. 702 00:42:39,270 --> 00:42:44,860 And that is probably the most basic graph coloring approach. 703 00:42:44,860 --> 00:42:48,560 And you could almost say it's a generic approach. 704 00:42:48,560 --> 00:42:52,091 So let's define that, and then prove some facts about it. 705 00:42:56,710 --> 00:42:58,860 Most of the graph coloring algorithms in practice 706 00:42:58,860 --> 00:43:01,070 are based on this approach. 707 00:43:01,070 --> 00:43:05,320 And we're going to call it the basic graph coloring algorithm. 708 00:43:14,240 --> 00:43:20,840 And for our graph G, with vertices V, and edges E.
709 00:43:20,840 --> 00:43:31,080 So the first step is going to be to order the nodes from 1 to n. 710 00:43:34,110 --> 00:43:35,850 Now in your case, you were suggesting 711 00:43:35,850 --> 00:43:39,700 an ordering where I have the high degree nodes first. 712 00:43:39,700 --> 00:43:41,217 All right. 713 00:43:41,217 --> 00:43:43,050 But for now we're not going to specify that. 714 00:43:43,050 --> 00:43:47,290 We're going to make it any ordering you want. 715 00:43:47,290 --> 00:43:50,080 And then we're going to have a notion of an order 716 00:43:50,080 --> 00:43:51,490 on the colors, as well. 717 00:43:57,815 --> 00:43:59,190 And I don't know how many colors, 718 00:43:59,190 --> 00:44:02,480 but they're going to be numbered 1, 2, and so forth. 719 00:44:02,480 --> 00:44:08,773 And then we're going to process the nodes one at a time, 720 00:44:08,773 --> 00:44:13,860 1 to n. So at step i, 721 00:44:13,860 --> 00:44:18,430 we color the i-th node v sub i with the lowest legal color. 722 00:44:25,240 --> 00:44:28,490 And by legal I mean you don't give the node the same color 723 00:44:28,490 --> 00:44:31,640 as another node, already colored, that it's 724 00:44:31,640 --> 00:44:32,602 adjacent to. 725 00:44:40,790 --> 00:44:43,290 All right so let's try this. 726 00:44:43,290 --> 00:44:47,480 In fact, this is sort of the algorithm I used initially 727 00:44:47,480 --> 00:44:50,700 to color the exam graph over there. 728 00:44:54,508 --> 00:44:58,230 All right, so let's look at that. 729 00:44:58,230 --> 00:45:04,310 So let's say we-- let me erase the colors here, and put 730 00:45:04,310 --> 00:45:05,465 an ordering on the nodes. 731 00:45:08,100 --> 00:45:12,230 So let's say I ordered them with 6034 first, 732 00:45:12,230 --> 00:45:15,060 so this would be V1. 733 00:45:15,060 --> 00:45:18,390 Then 6041 is V2. 734 00:45:18,390 --> 00:45:23,380 Then V3, V4, V5.
735 00:45:26,290 --> 00:45:30,552 If that's my ordering, what color would I assign to 6034? 736 00:45:30,552 --> 00:45:31,730 AUDIENCE: One. 737 00:45:31,730 --> 00:45:35,140 PROFESSOR: One, C1, I'd color it first to get C1. 738 00:45:35,140 --> 00:45:38,860 What color does 6041 get? 739 00:45:38,860 --> 00:45:41,420 C1, as well, it's the lowest possible color 740 00:45:41,420 --> 00:45:47,110 that's legal, and is not hooked to this guy, so C1 is legal. 741 00:45:47,110 --> 00:45:49,810 What color do I give here? 742 00:45:49,810 --> 00:45:52,480 C2. 743 00:45:52,480 --> 00:45:59,170 Then I color this one next C-- can't do C2, can't do C1, 744 00:45:59,170 --> 00:46:01,460 so I pick C3. 745 00:46:01,460 --> 00:46:03,500 And then I get to 6042 last, and I 746 00:46:03,500 --> 00:46:07,220 can't do one, two, or three, so I do four. 747 00:46:07,220 --> 00:46:09,150 All right so the algorithm, with that ordering, 748 00:46:09,150 --> 00:46:10,860 gave four colors. 749 00:46:10,860 --> 00:46:14,970 However we know there's a way to do a different ordering that 750 00:46:14,970 --> 00:46:19,220 gives us three colors. 751 00:46:19,220 --> 00:46:20,840 In particular, let's see 752 00:46:20,840 --> 00:46:23,150 what happens if we use this other ordering. 753 00:46:23,150 --> 00:46:26,050 Let me erase these. 754 00:46:26,050 --> 00:46:33,960 Say that's V1, V2, V3, V4, V5. 755 00:46:33,960 --> 00:46:40,140 Now I get C1, this will be C2, C1. 756 00:46:43,920 --> 00:46:45,670 What's this one get? 757 00:46:45,670 --> 00:46:46,320 C2. 758 00:46:46,320 --> 00:46:49,350 Ah, much better. 759 00:46:49,350 --> 00:46:50,800 C3. 760 00:46:50,800 --> 00:46:54,880 So different orderings result in different numbers of colors 761 00:46:54,880 --> 00:46:56,380 here. 762 00:46:56,380 --> 00:47:00,850 So the whole art now becomes finding a clever ordering.
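The basic graph coloring algorithm, and the way the ordering changes the result, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the board's exact graph: EDGES is a hypothetical edge list chosen only to agree with the spoken description, the second ordering is one guess at an order that yields C1, C2, C1, C2, C3, and the function name greedy_coloring is illustrative.

```python
# Hypothetical reconstruction of the exam graph (assumed, not from the board).
EDGES = [
    ("6041", "6002"), ("6041", "6042"), ("6002", "6042"),
    ("6002", "6003"), ("6003", "6042"), ("6003", "6034"),
]


def greedy_coloring(order, edges):
    """Color nodes in the given order, each with the lowest legal color."""
    neighbors = {v: set() for v in order}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    coloring = {}
    for v in order:
        # Colors already taken by neighbors that were colored earlier.
        used = {coloring[w] for w in neighbors[v] if w in coloring}
        color = 1
        while color in used:
            color += 1
        coloring[v] = color
    return coloring


# First ordering from lecture: 6034 is V1, 6041 is V2, and 6042 comes last.
first = greedy_coloring(["6034", "6041", "6002", "6003", "6042"], EDGES)
# An assumed second ordering that reproduces C1, C2, C1, C2, C3.
second = greedy_coloring(["6041", "6002", "6003", "6034", "6042"], EDGES)

print(first, max(first.values()))
print(second, max(second.values()))
```

The algorithm never revisits a node once it is colored, so the only freedom is the order passed in, which is why the choice of ordering carries all the weight.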
763 00:47:00,850 --> 00:47:03,210 And so many people have already had good ideas, 764 00:47:03,210 --> 00:47:05,850 pick the largest degree nodes first. 765 00:47:05,850 --> 00:47:11,740 And in fact, if you simulate the algorithm on lots of graphs, 766 00:47:11,740 --> 00:47:16,140 you do better on average when you color the larger degree 767 00:47:16,140 --> 00:47:17,820 nodes first. 768 00:47:17,820 --> 00:47:20,850 And then if you start to use more exotic orderings, 769 00:47:20,850 --> 00:47:22,762 you can do even better. 770 00:47:22,762 --> 00:47:24,720 If you take a lot of graphs that are out there, 771 00:47:24,720 --> 00:47:26,719 and run your algorithm, and see how well you do, 772 00:47:26,719 --> 00:47:29,540 you do better with more sophisticated orderings. 773 00:47:29,540 --> 00:47:34,550 In fact, this was my senior thesis back 774 00:47:34,550 --> 00:47:36,290 when I was an undergraduate student. 775 00:47:36,290 --> 00:47:39,650 I was trying to figure out better and better orderings 776 00:47:39,650 --> 00:47:41,690 that worked for graphs. 777 00:47:41,690 --> 00:47:44,820 And at the time it caused a bit of a problem. 778 00:47:44,820 --> 00:47:47,470 I was an undergraduate at Princeton. 779 00:47:47,470 --> 00:47:49,120 And Princeton, to this day I think, 780 00:47:49,120 --> 00:47:52,090 still has exams after the holidays, the Christmas 781 00:47:52,090 --> 00:47:54,010 holidays, New Year's holidays. 782 00:47:54,010 --> 00:47:57,810 And the students wanted to have the exams before Christmas, 783 00:47:57,810 --> 00:47:59,770 because they hated going home for the holiday, 784 00:47:59,770 --> 00:48:01,790 and then you've got to worry about your exams 785 00:48:01,790 --> 00:48:03,270 when you come back. 786 00:48:03,270 --> 00:48:06,050 And the faculty said no, there's no way 787 00:48:06,050 --> 00:48:09,570 to get them all compressed into a small number of days. 788 00:48:09,570 --> 00:48:12,010 Now I wasn't aware of all that at the time.
789 00:48:12,010 --> 00:48:14,255 But my thesis was go figure out good orderings. 790 00:48:14,255 --> 00:48:15,880 So I tried lots of different orderings. 791 00:48:15,880 --> 00:48:17,850 And I tried the largest degree first, 792 00:48:17,850 --> 00:48:21,730 and recursive versions of that actually worked very well. 793 00:48:21,730 --> 00:48:25,620 And then tried it on the Princeton exam graph. 794 00:48:25,620 --> 00:48:28,310 And lo and behold, you could actually squish it down, 795 00:48:28,310 --> 00:48:30,200 so you could give all the exams in, I think it was, 796 00:48:30,200 --> 00:48:33,960 4 and 1/2 days, plenty of time to give them before Christmas. 797 00:48:33,960 --> 00:48:36,330 Which caused a fair bit of scandal at the time, 798 00:48:36,330 --> 00:48:38,670 because then the faculty had to come clean 799 00:48:38,670 --> 00:48:40,730 that they just didn't want to bother having 800 00:48:40,730 --> 00:48:44,180 the exams before Christmas. 801 00:48:44,180 --> 00:48:46,770 Now this algorithm is an example of what's 802 00:48:46,770 --> 00:48:48,780 known as a greedy algorithm. 803 00:48:48,780 --> 00:48:52,930 Now a greedy algorithm is always simple. 804 00:48:52,930 --> 00:48:55,360 You just go one step after the next, 805 00:48:55,360 --> 00:48:57,760 taking the best you can do at each step. 806 00:48:57,760 --> 00:49:00,670 You never go back and try to make things better. 807 00:49:00,670 --> 00:49:03,950 You never do hill climbing, if you're familiar with that term. 808 00:49:03,950 --> 00:49:07,010 You just always keep it simple, one thing after the next, 809 00:49:07,010 --> 00:49:08,830 very fast. 810 00:49:08,830 --> 00:49:10,770 Sometimes it works great in practice. 811 00:49:10,770 --> 00:49:12,570 Sometimes it doesn't. 812 00:49:12,570 --> 00:49:16,860 But it's always where you start, some simple approach like this.
813 00:49:16,860 --> 00:49:20,770 Now this algorithm actually, even 814 00:49:20,770 --> 00:49:22,850 if you don't try to monkey with the ordering, 815 00:49:22,850 --> 00:49:25,140 even for a worst case ordering of the nodes, 816 00:49:25,140 --> 00:49:28,930 that actually does pretty good for a lot of graphs. 817 00:49:28,930 --> 00:49:30,730 And in fact, it does really well-- 818 00:49:30,730 --> 00:49:33,040 as somebody already asked about-- 819 00:49:33,040 --> 00:49:35,510 if all the nodes have low degree. 820 00:49:35,510 --> 00:49:37,060 So let's state that as a theorem. 821 00:49:37,060 --> 00:49:38,518 And then we're going to prove that. 822 00:49:44,329 --> 00:49:54,330 So if every node in a graph G has degree, at most, 823 00:49:54,330 --> 00:49:58,550 d-- so that's the biggest degree in the graph, D-- then 824 00:49:58,550 --> 00:50:16,480 this basic algorithm uses, at most, d plus 1 colors for G. 825 00:50:16,480 --> 00:50:19,920 No matter what the ordering is, you'll 826 00:50:19,920 --> 00:50:23,490 never do worse than d plus 1 colors. 827 00:50:23,490 --> 00:50:28,840 So what's the value of d for our exam graph over here? 828 00:50:28,840 --> 00:50:30,180 d is 3. 829 00:50:30,180 --> 00:50:33,240 Every node has degree, at most, three. 830 00:50:33,240 --> 00:50:37,120 And so it says, that no matter what ordering you picked here, 831 00:50:37,120 --> 00:50:39,350 you'd get at most four colors. 832 00:50:39,350 --> 00:50:40,350 Now you might do better. 833 00:50:40,350 --> 00:50:42,620 In fact, we found an ordering that got three. 834 00:50:42,620 --> 00:50:45,940 So it's possible to do better. 835 00:50:45,940 --> 00:50:49,040 So let's prove this fact because this makes a difference. 836 00:50:49,040 --> 00:50:53,260 Say you have a graph with hundreds of nodes. 837 00:50:53,260 --> 00:50:55,716 But every node has degree, at most, three. 
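The d plus 1 bound in the theorem can be sanity-checked by running the basic algorithm over every possible ordering of a small graph. This is a sketch, not a proof: EDGES is again a hypothetical reconstruction of the exam graph that agrees with the spoken description (maximum degree three), and greedy_color_count is an illustrative name.

```python
from itertools import permutations

# Hypothetical reconstruction of the exam graph (assumed, not from the board).
EDGES = [
    ("6041", "6002"), ("6041", "6042"), ("6002", "6042"),
    ("6002", "6003"), ("6003", "6042"), ("6003", "6034"),
]
NODES = sorted({v for edge in EDGES for v in edge})


def greedy_color_count(order, edges):
    """Run the basic algorithm in the given order; return colors used."""
    coloring = {}
    for v in order:
        # Collect colors of already-colored nodes adjacent to v.
        used = set()
        for a, b in edges:
            if a == v and b in coloring:
                used.add(coloring[b])
            if b == v and a in coloring:
                used.add(coloring[a])
        color = 1
        while color in used:
            color += 1
        coloring[v] = color
    return max(coloring.values())


# d is the maximum degree: the most edges touching any single node.
d = max(sum(v in edge for edge in EDGES) for v in NODES)

# Try all 5! = 120 orderings and record how many colors each one needs.
counts = {greedy_color_count(order, EDGES) for order in permutations(NODES)}

print("max degree d =", d)
print("color counts seen over all orderings:", sorted(counts))
```

Under these assumptions no ordering should exceed d plus 1 colors, while some orderings do better, which matches the three-color and four-color runs from earlier.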
838 00:50:55,716 --> 00:50:58,560 Well that says you only need four colors even, 839 00:50:58,560 --> 00:51:01,840 if the graph has 1,000 nodes, and that's very useful. 840 00:51:01,840 --> 00:51:04,580 So in that kind of situation it does very well. 841 00:51:04,580 --> 00:51:05,525 So let's prove that. 842 00:51:09,570 --> 00:51:12,733 Any ideas as to what proof technique we're going to use? 843 00:51:12,733 --> 00:51:14,705 AUDIENCE: Invariant. 844 00:51:14,705 --> 00:51:17,120 PROFESSOR: Invariant, close. 845 00:51:17,120 --> 00:51:19,560 Not quite an invariant, but close. 846 00:51:19,560 --> 00:51:22,450 AUDIENCE: [INAUDIBLE] 847 00:51:22,450 --> 00:51:23,404 PROFESSOR: What? 848 00:51:23,404 --> 00:51:24,820 AUDIENCE: Well ordering principle. 849 00:51:24,820 --> 00:51:26,420 PROFESSOR: You know well ordering principle, yeah, 850 00:51:26,420 --> 00:51:28,503 we're going to use the equivalent version of that. 851 00:51:28,503 --> 00:51:31,024 We're going to use induction. 852 00:51:31,024 --> 00:51:33,190 If you like well-- it's equivalent to well ordering. 853 00:51:33,190 --> 00:51:35,314 If you like well ordering you could do it that way. 854 00:51:35,314 --> 00:51:38,400 I think it's easier using induction here. 855 00:51:38,400 --> 00:51:39,765 So the proof is by induction. 856 00:51:45,110 --> 00:51:46,720 All right so the first thing we need 857 00:51:46,720 --> 00:51:47,845 is an induction hypothesis. 858 00:51:50,950 --> 00:51:54,450 Any thoughts about what the induction hypothesis should be? 859 00:51:54,450 --> 00:51:55,202 Yeah? 860 00:51:55,202 --> 00:52:01,270 AUDIENCE: If you have a graph with n nodes then 861 00:52:01,270 --> 00:52:06,150 where the degree of any nodes is less than [INAUDIBLE] 862 00:52:06,150 --> 00:52:08,090 then you can do it. 863 00:52:08,090 --> 00:52:09,090 PROFESSOR: That's great. 
864 00:52:09,090 --> 00:52:12,030 You're going to do really well on the midterm, 865 00:52:12,030 --> 00:52:14,450 because you put an n into this thing, 866 00:52:14,450 --> 00:52:16,570 but there's not an n here to start. 867 00:52:16,570 --> 00:52:18,320 What are most people going to do-- 868 00:52:18,320 --> 00:52:20,210 we used to ask this actually. 869 00:52:20,210 --> 00:52:22,830 We asked this once on a test many years ago, 870 00:52:22,830 --> 00:52:25,820 and it was an utter disaster, because what did everybody do? 871 00:52:25,820 --> 00:52:28,832 Maybe one student, or two, put an n into there. 872 00:52:28,832 --> 00:52:31,290 But what's the natural thing to do to induct on here when 873 00:52:31,290 --> 00:52:34,560 you look at this statement? 874 00:52:34,560 --> 00:52:37,760 You're going to induct on d, because the first thing you do 875 00:52:37,760 --> 00:52:39,680 is you make this be your induction hypothesis. 876 00:52:39,680 --> 00:52:41,950 There's only one thing to use, so you're 877 00:52:41,950 --> 00:52:44,910 going to have your predicate be p of d, 878 00:52:44,910 --> 00:52:47,014 and it's going to be that. 879 00:52:47,014 --> 00:52:49,680 Now it didn't occur to us that's what everybody was going to do, 880 00:52:49,680 --> 00:52:50,560 but it should have. 881 00:52:50,560 --> 00:52:53,220 They all did that and it was a disaster. 882 00:52:53,220 --> 00:52:56,010 Because if you do this, well you've 883 00:52:56,010 --> 00:53:00,280 got to take a graph with maximum degree d, or d plus 1 884 00:53:00,280 --> 00:53:03,200 in the inductive step, pull out all the nodes 885 00:53:03,200 --> 00:53:07,021 with degree d plus 1 to get a graph with now degree d. 886 00:53:07,021 --> 00:53:07,770 And that's a mess. 887 00:53:07,770 --> 00:53:10,410 You just pulled out a lot of nodes, potentially. 888 00:53:10,410 --> 00:53:14,650 Color that in d plus 1 colors, now put all that junk back in.
889 00:53:14,650 --> 00:53:17,180 And say you only used one more color. 890 00:53:17,180 --> 00:53:18,520 Nightmare. 891 00:53:18,520 --> 00:53:21,300 And these were MIT students under pressure. 892 00:53:21,300 --> 00:53:23,330 It was a nightmare. 893 00:53:23,330 --> 00:53:24,410 So that does not work. 894 00:53:24,410 --> 00:53:27,410 And in fact, we will ask an induction question 895 00:53:27,410 --> 00:53:29,960 on graphs on every test you take in this course. 896 00:53:29,960 --> 00:53:31,190 It will happen. 897 00:53:31,190 --> 00:53:34,850 And so usually, with induction, you 898 00:53:34,850 --> 00:53:36,580 take this as your induction hypothesis. 899 00:53:36,580 --> 00:53:38,650 With graphs, you have to be careful. 900 00:53:38,650 --> 00:53:41,830 And the worst part about this is we tell people 901 00:53:41,830 --> 00:53:45,950 when this doesn't work, use a stronger induction hypothesis. 902 00:53:45,950 --> 00:53:48,280 So students tried to make it stronger, 903 00:53:48,280 --> 00:53:51,900 but they were still stuck on d, and it was still a disaster. 904 00:53:51,900 --> 00:53:53,930 With graphs, you do something different. 905 00:53:53,930 --> 00:53:56,240 And the first thing you do with a graph, usually, 906 00:53:56,240 --> 00:53:58,385 is put n in here. 907 00:53:58,385 --> 00:54:00,510 And if it doesn't work with n, the number of nodes, 908 00:54:00,510 --> 00:54:03,410 you put in e, the number of edges. 909 00:54:03,410 --> 00:54:05,120 And induct on that. 910 00:54:05,120 --> 00:54:08,460 And so what you said is exactly the right thing to do. 911 00:54:08,460 --> 00:54:11,270 Don't do this, or at least don't spend too much time on it. 912 00:54:11,270 --> 00:54:13,360 Pretty quickly try this. 913 00:54:13,360 --> 00:54:17,970 If every n node graph-- if every node in an n 914 00:54:17,970 --> 00:54:23,840 node graph G has degree at most d, 915 00:54:23,840 --> 00:54:27,070 then the basic algorithm uses at most d plus 1 colors.
916 00:54:27,070 --> 00:54:30,350 And now you induct on n. 917 00:54:30,350 --> 00:54:33,100 And almost always on graphs, that's the first thing to try. 918 00:54:33,100 --> 00:54:37,270 Even if it's not in your theorem statement. 919 00:54:37,270 --> 00:54:39,900 Any questions about that? 920 00:54:46,940 --> 00:54:49,570 Well let's start with this, and see 921 00:54:49,570 --> 00:54:51,256 if we can make this one work. 922 00:55:02,230 --> 00:55:04,140 So what's the next step in our proof? 923 00:55:04,140 --> 00:55:06,560 What do we got to do? 924 00:55:06,560 --> 00:55:07,380 Base case. 925 00:55:11,890 --> 00:55:14,750 And the base case will be, not n equals 0, 926 00:55:14,750 --> 00:55:19,300 because we can't have a zero node graph, but n equals 1. 927 00:55:19,300 --> 00:55:22,940 And how many edges do we have? 928 00:55:22,940 --> 00:55:23,440 Zero. 929 00:55:23,440 --> 00:55:25,190 If there's one node, we don't allow loops, 930 00:55:25,190 --> 00:55:33,140 so it's zero edges, which means that the degree of our graph 931 00:55:33,140 --> 00:55:36,380 has to be zero. 932 00:55:36,380 --> 00:55:37,780 There's no edges. 933 00:55:37,780 --> 00:55:39,900 And of course there's only one node, 934 00:55:39,900 --> 00:55:44,140 so one color is going to work, and that 935 00:55:44,140 --> 00:55:46,340 happens to equal d plus 1. 936 00:55:49,460 --> 00:55:52,110 All right, so the base case is true. 937 00:55:52,110 --> 00:55:55,110 For one node graphs, you can always 938 00:55:55,110 --> 00:55:59,010 use d plus 1 colors, where d is the max degree. 939 00:55:59,010 --> 00:56:01,210 All right, next we have the inductive step. 940 00:56:05,500 --> 00:56:14,425 So here we assume P n is true for the induction. 941 00:56:20,330 --> 00:56:23,130 And now we look at an n plus 1 node graph 942 00:56:23,130 --> 00:56:26,620 to show P n plus 1 is true. 943 00:56:26,620 --> 00:56:35,240 So we let G be any N plus 1 node graph. 
944 00:56:35,240 --> 00:56:39,152 We got to show you can color it in d plus 1 colors. 945 00:56:41,810 --> 00:56:48,790 And let's let d be the max degree, the largest 946 00:56:48,790 --> 00:56:56,020 degree in G. 947 00:56:56,020 --> 00:57:01,180 We've got to show we can color it in d plus 1 colors. 948 00:57:01,180 --> 00:57:03,120 Well the basic algorithm, let's see. 949 00:57:03,120 --> 00:57:05,680 First thing we do is we order the nodes 950 00:57:05,680 --> 00:57:06,954 in an arbitrary order. 951 00:57:10,363 --> 00:57:13,700 And we're going to show whatever order you pick is OK. 952 00:57:22,920 --> 00:57:28,185 All right so order the nodes. 953 00:57:38,740 --> 00:57:41,200 Any way at all. 954 00:57:41,200 --> 00:57:45,690 Now how am I going to use the induction hypothesis? 955 00:57:45,690 --> 00:57:49,900 I know, I can assume, that for any n node graph 956 00:57:49,900 --> 00:57:54,550 I can color it in the max degree plus 1 colors. 957 00:57:54,550 --> 00:57:56,940 How am I going to use that to help me color G here, 958 00:57:56,940 --> 00:57:58,260 the n plus 1 node graph? 959 00:57:58,260 --> 00:58:00,690 Any thoughts? 960 00:58:00,690 --> 00:58:01,190 Yeah? 961 00:58:01,190 --> 00:58:03,860 AUDIENCE: [INAUDIBLE] 962 00:58:03,860 --> 00:58:06,920 PROFESSOR: Yeah, let's create an n node graph 963 00:58:06,920 --> 00:58:09,410 by looking at these nodes, and taking 964 00:58:09,410 --> 00:58:12,310 this one out for the time being. 965 00:58:12,310 --> 00:58:15,910 Remove the last node, V n plus 1, in the order. 966 00:58:15,910 --> 00:58:19,400 That leaves an n node graph. 967 00:58:19,400 --> 00:58:22,310 So let's write that down. 968 00:58:22,310 --> 00:58:33,120 We remove V n plus 1 from G. And that creates a new graph, 969 00:58:33,120 --> 00:58:38,530 call it G prime, with vertices V prime and edges E prime. 970 00:58:38,530 --> 00:58:40,925 So we create a new graph by removing that node.
971 00:58:43,640 --> 00:58:46,240 And we remove all the edges tied to that node. 972 00:58:46,240 --> 00:58:50,360 So for example over here, the last node 973 00:58:50,360 --> 00:58:56,150 was 6.042, so we take out 6.042, and all these edges. 974 00:58:56,150 --> 00:58:59,116 And this is the graph that we're left with. 975 00:59:02,370 --> 00:59:06,190 That graph has n nodes. 976 00:59:06,190 --> 00:59:08,637 What's the maximum degree in G prime? 977 00:59:11,910 --> 00:59:18,054 When I pull out a node, can the degree of any node go up? 978 00:59:18,054 --> 00:59:21,320 No, I'm just taking stuff out. 979 00:59:21,320 --> 00:59:27,930 So I know that G prime has maximum degree, at most, d. 980 00:59:27,930 --> 00:59:30,370 The degree of any node didn't go up. 981 00:59:35,080 --> 00:59:38,430 Might have gone down, but it didn't go up. 982 00:59:38,430 --> 00:59:44,045 So G prime has max degree, at most, d, and it has n nodes. 983 00:59:47,440 --> 00:59:50,440 So we can use the induction hypothesis P n. 984 00:59:53,360 --> 01:00:01,320 It says that the basic algorithm uses d plus 1, 985 01:00:01,320 --> 01:00:09,758 at most d plus 1, colors for nodes V1 to V n. 986 01:00:13,104 --> 01:00:14,260 Any questions about that? 987 01:00:17,140 --> 01:00:22,140 So if this were the n plus first node, the last node in the ordering, 988 01:00:22,140 --> 01:00:24,090 take it out. 989 01:00:24,090 --> 01:00:26,490 The basic algorithm now, take the same order here, 990 01:00:26,490 --> 01:00:33,850 V1, V2, V3, V4, basic will color that in d plus 1 colors. 991 01:00:33,850 --> 01:00:36,950 And all I have left is to give this guy a color, 992 01:00:36,950 --> 01:00:41,360 and I'll have colored G. Question? 993 01:00:41,360 --> 01:00:43,640 No. 994 01:00:43,640 --> 01:00:45,600 All right.
995 01:00:45,600 --> 01:00:48,820 So by induction I've colored these guys, V1 to V n, 996 01:00:48,820 --> 01:00:51,760 in d plus 1 colors. All that I have left to do 997 01:00:51,760 --> 01:00:56,000 is color V n plus 1. 998 01:00:56,000 --> 01:00:59,585 And hopefully we're not going to use color d plus 2, 999 01:00:59,585 --> 01:01:02,450 because then we sort of-- it wouldn't work. 1000 01:01:02,450 --> 01:01:04,953 We got to use one of the first d plus 1. 1001 01:01:12,330 --> 01:01:14,370 All right, so let's look at V n plus 1. 1002 01:01:17,110 --> 01:01:25,780 And let's call its neighbors in G, U1, U2, up to Ud. 1003 01:01:25,780 --> 01:01:31,630 It has at most d neighbors, because every node in G has, 1004 01:01:31,630 --> 01:01:34,320 at most, degree d. 1005 01:01:34,320 --> 01:01:35,980 A neighbor's a node you're adjacent to. 1006 01:01:38,840 --> 01:01:45,950 All right so, V n plus 1 has at most d neighbors, is adjacent 1007 01:01:45,950 --> 01:01:48,061 to, at most, d other nodes. 1008 01:01:51,220 --> 01:01:54,470 Now what does that mean about the color I 1009 01:01:54,470 --> 01:01:56,990 can use on V n plus 1? 1010 01:01:59,540 --> 01:02:04,605 What do I know about what color I can use for that? 1011 01:02:04,605 --> 01:02:05,105 Yeah? 1012 01:02:05,105 --> 01:02:10,070 AUDIENCE: It can't be any of the colors of U1, U2, and so on. 1013 01:02:10,070 --> 01:02:12,850 PROFESSOR: It can't be any one of these colors that 1014 01:02:12,850 --> 01:02:14,860 were assigned here. 1015 01:02:14,860 --> 01:02:16,110 That's true. 1016 01:02:16,110 --> 01:02:19,440 So how many colors got ruled out? 1017 01:02:19,440 --> 01:02:23,300 At most d, and how many am I working with? 1018 01:02:23,300 --> 01:02:25,560 d plus 1. 1019 01:02:25,560 --> 01:02:29,580 So I got one left that I can use safely. 1020 01:02:29,580 --> 01:02:30,920 OK.
1021 01:02:30,920 --> 01:02:35,660 So this means there exists at least one color 1022 01:02:35,660 --> 01:02:37,890 in my set of d plus 1 colors. 1023 01:02:42,560 --> 01:02:47,575 It's not used by any neighbor. 1024 01:02:53,660 --> 01:02:56,390 And we're going to give V n plus 1 that color. 1025 01:03:07,150 --> 01:03:07,650 All right. 1026 01:03:07,650 --> 01:03:12,350 So now I've colored every node in G, the n plus 1 node graph, 1027 01:03:12,350 --> 01:03:15,315 safely using a total of d plus 1 colors. 1028 01:03:18,692 --> 01:03:25,050 So that means the basic algorithm uses, 1029 01:03:25,050 --> 01:03:33,340 at most, d plus 1 colors, on G. That means P n plus 1 1030 01:03:33,340 --> 01:03:37,815 is true-- whoops-- and the induction is complete. 1031 01:03:43,050 --> 01:03:44,853 Any questions? 1032 01:03:44,853 --> 01:03:45,353 Yeah. 1033 01:03:45,353 --> 01:03:49,281 AUDIENCE: Could you also start from the other way, 1034 01:03:49,281 --> 01:03:55,664 and start 1, go to 2 nodes, 3 nodes at each step keeping 1035 01:03:55,664 --> 01:03:59,101 all nodes at all other nodes. 1036 01:03:59,101 --> 01:04:02,080 [INAUDIBLE] 1037 01:04:02,080 --> 01:04:05,308 PROFESSOR: What do you mean by keeping all nodes connected? 1038 01:04:05,308 --> 01:04:16,264 AUDIENCE: [INAUDIBLE] each node has an edge connecting 1039 01:04:16,264 --> 01:04:17,804 to each other one. 1040 01:04:17,804 --> 01:04:19,720 PROFESSOR: OK so, then I get a specific graph. 1041 01:04:19,720 --> 01:04:24,240 I start with this, I add a node and make it adjacent. 1042 01:04:24,240 --> 01:04:25,680 I add a node and make it adjacent. 1043 01:04:25,680 --> 01:04:26,554 AUDIENCE: [INAUDIBLE] 1044 01:04:32,410 --> 01:04:33,904 PROFESSOR: Yeah. 1045 01:04:33,904 --> 01:04:36,620 So you've constructed a particular graph. 
1046 01:04:36,620 --> 01:04:40,260 This is actually called-- for n nodes, it's called Kn, 1047 01:04:40,260 --> 01:04:45,010 the n node complete graph, also called a clique, 1048 01:04:45,010 --> 01:04:49,100 like a clique of friends, where everybody likes everybody, 1049 01:04:49,100 --> 01:04:52,350 in a clique. 1050 01:04:52,350 --> 01:04:55,660 And in fact for n here, for those n nodes, 1051 01:04:55,660 --> 01:04:58,940 what's the max degree? 1052 01:04:58,940 --> 01:05:01,440 Max degree is n minus 1. 1053 01:05:01,440 --> 01:05:03,420 What's the chromatic number of this graph? 1054 01:05:05,986 --> 01:05:07,485 What's the minimum number of colors? 1055 01:05:07,485 --> 01:05:09,156 [INTERPOSING VOICES] 1056 01:05:09,156 --> 01:05:11,030 PROFESSOR: And they all have to be different, 1057 01:05:11,030 --> 01:05:12,890 which is d plus 1. 1058 01:05:12,890 --> 01:05:16,420 So you have built a special graph 1059 01:05:16,420 --> 01:05:19,560 for which the optimum number of colors is d plus 1. 1060 01:05:19,560 --> 01:05:23,630 But that is not a proof that this is true for all graphs. 1061 01:05:23,630 --> 01:05:26,384 Because you've looked at a particular graph here. 1062 01:05:26,384 --> 01:05:27,259 AUDIENCE: [INAUDIBLE] 1063 01:05:30,222 --> 01:05:31,180 PROFESSOR: What's that? 1064 01:05:31,180 --> 01:05:34,760 AUDIENCE: [INAUDIBLE] It means that you can still use your 1065 01:05:34,760 --> 01:05:36,138 less than or equal to sign. 1066 01:05:36,138 --> 01:05:38,490 PROFESSOR: I see, so you'd add a node, 1067 01:05:38,490 --> 01:05:41,182 and it's only connected to a few of them. 1068 01:05:41,182 --> 01:05:43,015 AUDIENCE: No, it's connected to all of them, 1069 01:05:43,015 --> 01:05:45,930 but it still implies that you need less than 1070 01:05:45,930 --> 01:05:47,050 or equal to the colors. 1071 01:05:47,050 --> 01:05:49,083 It turns out it happens to be equal to. 1072 01:05:49,083 --> 01:05:50,832 PROFESSOR: Yes, in this case that's right.
1073 01:05:50,832 --> 01:05:52,510 So you've made an argument for this case 1074 01:05:52,510 --> 01:05:54,060 where it actually is equal, but that 1075 01:05:54,060 --> 01:05:55,200 only worked for this graph. 1076 01:05:55,200 --> 01:05:59,310 AUDIENCE: [INAUDIBLE] worst case. 1077 01:05:59,310 --> 01:06:02,666 PROFESSOR: It is the worst case, so it meets the bound. 1078 01:06:02,666 --> 01:06:04,290 It shows you cannot improve this bound. 1079 01:06:04,290 --> 01:06:06,052 Yeah, is there a question up there? 1080 01:06:06,052 --> 01:06:09,290 AUDIENCE: All I was going to say is that you've 1081 01:06:09,290 --> 01:06:10,620 proved it's the worst case. 1082 01:06:10,620 --> 01:06:12,369 PROFESSOR: Right, so what you've done here 1083 01:06:12,369 --> 01:06:15,587 is you've shown that I could not make that theorem any stronger. 1084 01:06:15,587 --> 01:06:18,531 I could not replace d plus 1 with d here. 1085 01:06:18,531 --> 01:06:19,030 All right. 1086 01:06:19,030 --> 01:06:20,571 Because you've given an example where 1087 01:06:20,571 --> 01:06:23,440 I can't get by with d colors, where the maximum degree is d. 1088 01:06:23,440 --> 01:06:25,880 But that doesn't-- To get a proof for a theorem, 1089 01:06:25,880 --> 01:06:27,110 I got to go through all this. 1090 01:06:27,110 --> 01:06:30,480 That wouldn't give me a proof of the theorem. 1091 01:06:30,480 --> 01:06:32,130 They're not equivalent. 1092 01:06:32,130 --> 01:06:35,660 One's an upper bound, one's an existence of a lower bound. 1093 01:06:35,660 --> 01:06:40,940 This shows that for any graph, you need at most d plus 1. 1094 01:06:40,940 --> 01:06:43,990 So any graph, at most. 1095 01:06:43,990 --> 01:06:50,430 That shows there is a graph where you need at least d plus 1. 1096 01:06:50,430 --> 01:06:52,640 And they are not equivalent. 1097 01:06:52,640 --> 01:06:53,230 All right. 1098 01:06:53,230 --> 01:06:56,610 One is for all, an upper bound. 1099 01:06:56,610 --> 01:07:00,830 The other is there exists, a lower bound.
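The tightness claim for the complete graph can be checked on a small case. The following is a minimal sketch in Python, assuming the "basic algorithm" means: take the nodes in some fixed order and give each one the lowest-numbered color not already used by a colored neighbor. The names `greedy_coloring` and `complete_graph` are illustrative and do not come from the lecture.

```python
def greedy_coloring(adj, order):
    """Color nodes in the given order, giving each the lowest color
    (1, 2, 3, ...) not already used by one of its colored neighbors."""
    color = {}
    for v in order:
        taken = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in taken:
            c += 1
        color[v] = c
    return color


def complete_graph(n):
    """K_n as an adjacency dict: every node is adjacent to every other node."""
    return {v: {u for u in range(n) if u != v} for v in range(n)}


n = 5
adj = complete_graph(n)
coloring = greedy_coloring(adj, list(range(n)))
max_degree = max(len(nbrs) for nbrs in adj.values())
colors_used = len(set(coloring.values()))
print(max_degree, colors_used)  # 4 5
```

On K_5 the maximum degree is 4 and all 5 colors are needed, so the d plus 1 bound is met exactly. This is one graph meeting the bound, not a proof of the bound for every graph.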
1100 01:07:00,830 --> 01:07:05,590 So they're different in two ways that are important. 1101 01:07:05,590 --> 01:07:08,980 This kind of proof is very typical for what you'll 1102 01:07:08,980 --> 01:07:11,390 see with induction in graphs. 1103 01:07:11,390 --> 01:07:13,340 And you'll get a lot of practice with it. 1104 01:07:13,340 --> 01:07:17,380 Are there any other questions on this proof? 1105 01:07:20,200 --> 01:07:21,870 OK. 1106 01:07:21,870 --> 01:07:24,800 All right, so we've seen now, by that example, 1107 01:07:24,800 --> 01:07:27,330 we can't improve the theorem. 1108 01:07:27,330 --> 01:07:30,200 In some cases, though, the theorem 1109 01:07:30,200 --> 01:07:32,020 is way off, for some graphs. 1110 01:07:32,020 --> 01:07:34,530 Can anybody think of a graph where 1111 01:07:34,530 --> 01:07:37,850 the bound we get from the theorem, of d plus 1 colors, 1112 01:07:37,850 --> 01:07:40,680 is way off from the actual chromatic number you need, 1113 01:07:40,680 --> 01:07:41,962 the number of colors you need? 1114 01:07:41,962 --> 01:07:42,462 Yeah? 1115 01:07:42,462 --> 01:07:44,000 AUDIENCE: [INAUDIBLE] 1116 01:07:44,000 --> 01:07:44,916 PROFESSOR: What is it? 1117 01:07:44,916 --> 01:07:51,282 AUDIENCE: A graph [INAUDIBLE] two sets of [INAUDIBLE] 1118 01:07:55,130 --> 01:07:56,470 PROFESSOR: Good, OK. 1119 01:07:56,470 --> 01:07:58,220 Yes, so what if we did this graph? 1120 01:07:58,220 --> 01:08:00,042 Let me draw it out. 1121 01:08:03,310 --> 01:08:09,600 So you've got a bunch of nodes here, bunch of nodes here. 1122 01:08:09,600 --> 01:08:12,300 And every node here is connected to every node 1123 01:08:12,300 --> 01:08:15,600 over on the other side. 1124 01:08:15,600 --> 01:08:18,050 And if this is an n node graph, and I've got n over 2 1125 01:08:18,050 --> 01:08:21,100 on each side, what's my degree here? 1126 01:08:25,922 --> 01:08:27,380 What's my max degree of this graph? 1127 01:08:27,380 --> 01:08:28,560 AUDIENCE: n over 2.
1128 01:08:28,560 --> 01:08:30,319 PROFESSOR: n over 2. 1129 01:08:30,319 --> 01:08:32,680 So d is n over 2. 1130 01:08:32,680 --> 01:08:34,060 What's the chromatic number? 1131 01:08:34,060 --> 01:08:36,520 How many colors do I need for this? 1132 01:08:36,520 --> 01:08:38,250 Two. 1133 01:08:38,250 --> 01:08:41,380 All right, so d plus 1 is way off from two. 1134 01:08:41,380 --> 01:08:43,466 There is an even worse example. 1135 01:08:43,466 --> 01:08:43,966 Yeah? 1136 01:08:43,966 --> 01:08:48,130 AUDIENCE: That graph where you have one node in the center that's 1137 01:08:48,130 --> 01:08:51,126 connected to a bunch of nodes regularly distributed about. 1138 01:08:51,126 --> 01:08:52,564 PROFESSOR: Yeah, the star graph. 1139 01:08:52,564 --> 01:08:56,300 All right, so I got one at the center, I got n minus 1 1140 01:08:56,300 --> 01:08:58,330 outside. 1141 01:08:58,330 --> 01:09:03,050 So here the maximum degree is n minus 1, 1142 01:09:03,050 --> 01:09:05,460 just like a complete graph. 1143 01:09:05,460 --> 01:09:08,649 But how many colors do I need? 1144 01:09:08,649 --> 01:09:10,069 Two. 1145 01:09:10,069 --> 01:09:11,500 So it's even worse here. 1146 01:09:14,319 --> 01:09:18,920 All right now what about the basic algorithm? 1147 01:09:18,920 --> 01:09:21,599 How well does the basic algorithm do on this graph? 1148 01:09:24,649 --> 01:09:27,790 Order the vertices some way. 1149 01:09:27,790 --> 01:09:29,630 Color on one [INAUDIBLE] lowest color. 1150 01:09:29,630 --> 01:09:31,620 How many colors is it going to use? 1151 01:09:31,620 --> 01:09:33,840 AUDIENCE: Two. 1152 01:09:33,840 --> 01:09:35,240 PROFESSOR: Two. 1153 01:09:35,240 --> 01:09:38,109 It doesn't matter how you order the vertices. 1154 01:09:38,109 --> 01:09:46,619 V1, V2, V3, V4, because I'll color this one 1. 1155 01:09:46,619 --> 01:09:49,668 What am I going to color that one? 1156 01:09:49,668 --> 01:09:51,950 1. 1157 01:09:51,950 --> 01:09:54,770 Then I get to the center, what am I going to color it?
1158 01:09:54,770 --> 01:09:56,360 2. 1159 01:09:56,360 --> 01:10:00,130 And now all the arms, what do they get colored? 1160 01:10:00,130 --> 01:10:01,240 They all get 1. 1161 01:10:01,240 --> 01:10:04,275 Whatever order you pick, you get two colors. 1162 01:10:07,270 --> 01:10:08,892 All right so now there's a difference. 1163 01:10:08,892 --> 01:10:11,350 The theorem just gives you an upper bound, it says, 1164 01:10:11,350 --> 01:10:13,150 at most, d plus 1 colors. 1165 01:10:13,150 --> 01:10:15,700 But in fact the algorithm can do a lot better 1166 01:10:15,700 --> 01:10:19,050 than that, as on this example. 1167 01:10:19,050 --> 01:10:22,420 So the algorithm might be a lot better. 1168 01:10:22,420 --> 01:10:24,770 Everybody see what we're doing here? 1169 01:10:24,770 --> 01:10:26,520 How the algorithm is better than the bound 1170 01:10:26,520 --> 01:10:28,644 we proved by the theorem, even though the bound was 1171 01:10:28,644 --> 01:10:31,180 pretty good for some graphs. 1172 01:10:31,180 --> 01:10:33,620 Now it turns out-- I mean we're not 1173 01:10:33,620 --> 01:10:36,110 going to win a million dollars for this algorithm. 1174 01:10:36,110 --> 01:10:39,880 And in fact, this algorithm is sometimes very bad. 1175 01:10:39,880 --> 01:10:44,370 And a really bad example is very close to this. 1176 01:10:44,370 --> 01:10:48,470 In fact actually this one, let's look at how well basic 1177 01:10:48,470 --> 01:10:51,310 does on this one here. 1178 01:10:54,240 --> 01:10:56,160 Make some ordering. 1179 01:10:56,160 --> 01:10:59,950 V1, V2, V3. 1180 01:11:02,715 --> 01:11:05,520 What's the basic algorithm going to do on this complete-- 1181 01:11:05,520 --> 01:11:08,780 it's called a complete bipartite graph, is what this is called. 1182 01:11:08,780 --> 01:11:10,300 I'll define bipartite in a minute-- 1183 01:11:10,300 --> 01:11:12,001 but what's the basic algorithm do here?
1184 01:11:15,690 --> 01:11:20,860 Any idea-- does it take n over 2 colors, or does it take 2? 1185 01:11:20,860 --> 01:11:23,170 Any ideas? 1186 01:11:23,170 --> 01:11:24,370 2. 1187 01:11:24,370 --> 01:11:29,650 So take a vertex, and the first one, say V1s here, get C1. 1188 01:11:29,650 --> 01:11:32,790 As long as I keep picking vertices over on this side, 1189 01:11:32,790 --> 01:11:33,790 they're going to get C1. 1190 01:11:36,430 --> 01:11:39,560 As soon as I get to a vertex over here, 1191 01:11:39,560 --> 01:11:41,363 what color does it have to get? 1192 01:11:41,363 --> 01:11:42,130 AUDIENCE: C2. 1193 01:11:42,130 --> 01:11:43,620 PROFESSOR: C2 because it's touching 1194 01:11:43,620 --> 01:11:46,296 the very first one we had here. 1195 01:11:46,296 --> 01:11:47,670 So when I get vertices over here, 1196 01:11:47,670 --> 01:11:48,795 they're all going to be C2. 1197 01:11:48,795 --> 01:11:52,270 When I go back over here, they're going to be back to C1. 1198 01:11:52,270 --> 01:11:55,910 So actually basic does good here too, gives you two colors. 1199 01:11:55,910 --> 01:11:56,410 Yeah? 1200 01:11:56,410 --> 01:11:57,408 AUDIENCE: [INAUDIBLE] 1201 01:12:06,870 --> 01:12:08,620 PROFESSOR: Ah, those two aren't connected. 1202 01:12:08,620 --> 01:12:11,480 But this case, if I've got a vertex over here 1203 01:12:11,480 --> 01:12:14,540 it is, by definition, connected to the vertex over here. 1204 01:12:14,540 --> 01:12:17,820 Because every possible edge is here. 1205 01:12:17,820 --> 01:12:18,870 But that's a great idea. 1206 01:12:18,870 --> 01:12:22,447 What if they weren't all connected, 1207 01:12:22,447 --> 01:12:23,655 that's actually a great idea. 1208 01:12:25,590 --> 01:12:29,270 In fact, the nasty example for the basic algorithm 1209 01:12:29,270 --> 01:12:30,270 is very much like that. 1210 01:12:30,270 --> 01:12:31,446 Let's draw it. 
1211 01:12:35,748 --> 01:12:38,200 Because so far, the basic algorithm 1212 01:12:38,200 --> 01:12:42,080 has pretty much done perfectly on all the graphs 1213 01:12:42,080 --> 01:12:48,870 we looked at even when the theorem wasn't tight. 1214 01:12:48,870 --> 01:12:50,445 So here is a nasty graph. 1215 01:12:56,140 --> 01:12:58,930 And it is very close to the graph we just looked at, 1216 01:12:58,930 --> 01:13:00,910 where all the edges are there. 1217 01:13:00,910 --> 01:13:03,080 In this case, all the edges are there, 1218 01:13:03,080 --> 01:13:07,040 except for the one straight across. 1219 01:13:07,040 --> 01:13:10,980 So if this is-- the edge denotes likes, 1220 01:13:10,980 --> 01:13:14,410 this is a world where you like everybody but your spouse. 1221 01:13:16,950 --> 01:13:19,390 All right, so you have an edge to everyone, 1222 01:13:19,390 --> 01:13:21,240 except the one directly across from you. 1223 01:13:25,190 --> 01:13:29,370 No edge there, and so forth. 1224 01:13:29,370 --> 01:13:32,460 So it has almost every edge, but it's missing these edges. 1225 01:13:37,620 --> 01:13:42,440 Now the basic algorithm might do well here. 1226 01:13:42,440 --> 01:13:46,120 What would be a good ordering for this graph 1227 01:13:46,120 --> 01:13:48,010 to label these V1 through Vn? 1228 01:13:48,010 --> 01:13:48,510 Yeah? 1229 01:13:48,510 --> 01:13:51,630 AUDIENCE: Go through everything on the left side, 1230 01:13:51,630 --> 01:13:54,818 and then the right side. 1231 01:13:54,818 --> 01:13:56,312 PROFESSOR: Yeah, that's right. 1232 01:14:03,800 --> 01:14:12,480 Because then color 1, color 1, color 1, all the way down. 1233 01:14:12,480 --> 01:14:17,460 One color for the left, what does this one get? 1234 01:14:17,460 --> 01:14:19,540 Color 2, because it's hooked up against. 1235 01:14:19,540 --> 01:14:25,670 And these all get color 2, so I've used two colors. 1236 01:14:25,670 --> 01:14:26,260 Really good.
1237 01:14:26,260 --> 01:14:28,630 Basic algorithm's looking great. 1238 01:14:28,630 --> 01:14:31,470 Now here's a harder question. 1239 01:14:31,470 --> 01:14:34,800 Can you figure out a bad ordering 1240 01:14:34,800 --> 01:14:37,995 for this graph, where I use a lot more than two colors? 1241 01:14:37,995 --> 01:14:38,870 AUDIENCE: [INAUDIBLE] 1242 01:14:42,124 --> 01:14:43,040 PROFESSOR: What is it? 1243 01:14:43,040 --> 01:14:47,286 AUDIENCE: It starts at the top and goes across, and then 1244 01:14:47,286 --> 01:14:48,685 the next level then across. 1245 01:14:48,685 --> 01:14:49,560 PROFESSOR: Very good. 1246 01:14:49,560 --> 01:14:50,740 V1, V2. 1247 01:14:50,740 --> 01:14:53,210 Just as natural, really, if you think about it, 1248 01:14:53,210 --> 01:14:54,490 to order it this way. 1249 01:15:02,670 --> 01:15:03,860 All right. 1250 01:15:03,860 --> 01:15:06,650 What color does V1 get? 1251 01:15:06,650 --> 01:15:07,410 C1. 1252 01:15:07,410 --> 01:15:08,890 What color does V2 get? 1253 01:15:08,890 --> 01:15:09,910 AUDIENCE: C1. 1254 01:15:09,910 --> 01:15:12,250 PROFESSOR: C1 because it's not hooked up here. 1255 01:15:12,250 --> 01:15:13,735 What color does V3 get? 1256 01:15:13,735 --> 01:15:14,387 AUDIENCE: C2. 1257 01:15:14,387 --> 01:15:14,970 PROFESSOR: C2. 1258 01:15:14,970 --> 01:15:16,480 What about V4? 1259 01:15:16,480 --> 01:15:17,887 AUDIENCE: C2. 1260 01:15:17,887 --> 01:15:18,470 PROFESSOR: C2. 1261 01:15:18,470 --> 01:15:19,670 It's not hooked up. 1262 01:15:19,670 --> 01:15:21,560 It can't get one, because that's up here. 1263 01:15:21,560 --> 01:15:25,310 And it's not hooked up to the two, so it gets two. What color does V5 get? 1264 01:15:25,310 --> 01:15:26,600 AUDIENCE: C3. 1265 01:15:26,600 --> 01:15:27,490 PROFESSOR: C3. 1266 01:15:27,490 --> 01:15:29,450 Because it's hooked up to one and two. 1267 01:15:29,450 --> 01:15:30,806 V6? 1268 01:15:30,806 --> 01:15:31,610 AUDIENCE: C3.
1269 01:15:31,610 --> 01:15:34,900 PROFESSOR: C3, it's hooked up to one and two, but not three. 1270 01:15:34,900 --> 01:15:38,560 And you can see what's happening here. 1271 01:15:38,560 --> 01:15:40,230 All the way down here he's hooked up 1272 01:15:40,230 --> 01:15:43,165 to all the n over 2 minus 1 colors. 1273 01:15:43,165 --> 01:15:47,110 So he also takes C n over 2. 1274 01:15:47,110 --> 01:15:50,120 So if you pick that ordering, not so good. 1275 01:15:50,120 --> 01:15:52,600 You use n over two colors. 1276 01:15:52,600 --> 01:15:55,510 So it really matters the ordering. 1277 01:15:55,510 --> 01:15:58,030 Now I should say graphs like-- actually any questions 1278 01:15:58,030 --> 01:15:59,700 about what we did here? 1279 01:15:59,700 --> 01:16:02,430 About this? 1280 01:16:02,430 --> 01:16:04,560 All right, now I should say that graphs like this 1281 01:16:04,560 --> 01:16:09,470 have a special name, they're called bipartite graphs. 1282 01:16:09,470 --> 01:16:11,249 And that's important to remember. 1283 01:16:24,520 --> 01:16:43,780 All right, so a graph G is said to be bipartite 1284 01:16:43,780 --> 01:16:53,316 if the vertices can be split into two sets, or partitioned, 1285 01:16:53,316 --> 01:17:01,360 and we'll call them a left set, and a right set, 1286 01:17:01,360 --> 01:17:17,330 so that all the edges connect a node in the left set, 1287 01:17:17,330 --> 01:17:21,190 to a node in the right set. 1288 01:17:23,759 --> 01:17:25,300 So in fact, a lot of today we've been 1289 01:17:25,300 --> 01:17:28,500 looking at bipartite graphs, because the nodes are here. 1290 01:17:28,500 --> 01:17:32,660 Like the men, and the women, and the edges only go from the left 1291 01:17:32,660 --> 01:17:33,900 to the right. 1292 01:17:33,900 --> 01:17:35,335 And that is called bipartite. 1293 01:17:35,335 --> 01:17:37,290 And it's called bipartite because you 1294 01:17:37,290 --> 01:17:41,820 can do it with two colors, or in two pieces. 
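The effect of the ordering on the "everybody but your spouse" graph, which is bipartite in the sense just defined, can be reproduced on a small instance. This is a sketch under stated assumptions: the basic algorithm is taken to be lowest-available-color in the given order, and `greedy_coloring`, `spouse_graph`, `side_by_side`, and `row_by_row` are hypothetical names chosen for illustration.

```python
def greedy_coloring(adj, order):
    """Give each node, in order, the lowest color not used by a colored neighbor."""
    color = {}
    for v in order:
        taken = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in taken:
            c += 1
        color[v] = c
    return color


def spouse_graph(k):
    """k nodes per side; ('L', i) is adjacent to ('R', j) exactly when i != j."""
    adj = {(side, i): set() for side in "LR" for i in range(k)}
    for i in range(k):
        for j in range(k):
            if i != j:
                adj[("L", i)].add(("R", j))
                adj[("R", j)].add(("L", i))
    return adj


k = 4  # n = 2k = 8 nodes in total
adj = spouse_graph(k)

# Good ordering: the whole left side first, then the whole right side.
side_by_side = [("L", i) for i in range(k)] + [("R", i) for i in range(k)]
# Bad ordering: go straight across each row, top to bottom.
row_by_row = [(side, i) for i in range(k) for side in "LR"]

good = len(set(greedy_coloring(adj, side_by_side).values()))
bad = len(set(greedy_coloring(adj, row_by_row).values()))
print(good, bad)  # 2 4
```

With 4 nodes per side, the side-by-side order uses 2 colors while the row-by-row order uses 4, which is n over 2, matching the behavior worked out on the board.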
1295 01:17:41,820 --> 01:17:43,860 So you don't win a million dollars 1296 01:17:43,860 --> 01:17:45,630 for deciding whether or not a graph can 1297 01:17:45,630 --> 01:17:47,600 be colored in two colors. 1298 01:17:47,600 --> 01:17:48,950 That's easy. 1299 01:17:48,950 --> 01:17:51,730 You'll even do it for homework one of these times. 1300 01:17:51,730 --> 01:17:54,330 You do win the million dollars for deciding if a graph can 1301 01:17:54,330 --> 01:17:56,700 be colored in three colors. 1302 01:17:56,700 --> 01:17:58,450 That's really hard to do. 1303 01:18:01,090 --> 01:18:04,800 Now coloring problems come up in all sorts of applications. 1304 01:18:04,800 --> 01:18:08,950 You know with this company, Akamai, that came out of MIT, 1305 01:18:08,950 --> 01:18:10,070 we've talked about. 1306 01:18:10,070 --> 01:18:14,490 We run a network of 75,000 servers. 1307 01:18:14,490 --> 01:18:16,860 And they're used to distribute content on the internet, 1308 01:18:16,860 --> 01:18:18,280 and so forth. 1309 01:18:18,280 --> 01:18:20,210 And we have to deploy a new version 1310 01:18:20,210 --> 01:18:21,800 of our software on those servers, 1311 01:18:21,800 --> 01:18:23,420 pretty much every week. 1312 01:18:23,420 --> 01:18:25,560 We're pushing new software out. 1313 01:18:25,560 --> 01:18:31,070 And you can't deploy on every server at the same time, 1314 01:18:31,070 --> 01:18:32,950 because you've got to take down a server 1315 01:18:32,950 --> 01:18:34,450 to deploy new software on it. 1316 01:18:34,450 --> 01:18:36,050 Got to take it out of commission. 1317 01:18:36,050 --> 01:18:38,650 And so we can't just take down all 75,000 servers, 1318 01:18:38,650 --> 01:18:40,960 because then all the Facebook, and Netflix, 1319 01:18:40,960 --> 01:18:42,410 and all those sites would stop. 1320 01:18:42,410 --> 01:18:44,840 That would be bad. 1321 01:18:44,840 --> 01:18:48,060 And we can't do them one at a time, because there's 75,000. 
1322 01:18:48,060 --> 01:18:50,780 And it takes a few hours for each one 1323 01:18:50,780 --> 01:18:54,070 to get the traffic off, stop it, load new software, 1324 01:18:54,070 --> 01:18:54,980 and turn it back on. 1325 01:18:54,980 --> 01:18:58,240 And it would take us years to do one software install, 1326 01:18:58,240 --> 01:19:00,610 which we got to do every week. 1327 01:19:00,610 --> 01:19:03,280 So we've got to figure out a schedule for how many servers 1328 01:19:03,280 --> 01:19:05,530 you take down at a given time, and which ones. 1329 01:19:05,530 --> 01:19:07,590 And it turns out pairs of servers 1330 01:19:07,590 --> 01:19:09,639 have certain critical functions. 1331 01:19:09,639 --> 01:19:11,930 So there's certain pairs of servers you can't take down 1332 01:19:11,930 --> 01:19:14,750 at the same time. 1333 01:19:14,750 --> 01:19:18,536 So we have a gigantic 75,000 node coloring problem, 1334 01:19:18,536 --> 01:19:20,035 where there's edges between servers. 1335 01:19:20,035 --> 01:19:21,618 Nodes are servers, and there's an edge 1336 01:19:21,618 --> 01:19:25,634 between two servers if you can't install new software at the same time. 1337 01:19:25,634 --> 01:19:27,050 And so it turns out, when you 1338 01:19:27,050 --> 01:19:30,410 run one of these graph coloring algorithms on it, 1339 01:19:30,410 --> 01:19:32,120 you could do it with eight colors. 1340 01:19:32,120 --> 01:19:34,160 It just turns out that way. 1341 01:19:34,160 --> 01:19:37,070 So that means there's eight waves of install 1342 01:19:37,070 --> 01:19:38,920 that go onto the network. 1343 01:19:38,920 --> 01:19:41,450 And now eight times a few hours each 1344 01:19:41,450 --> 01:19:44,672 means that we can do it in a day, and you can manage it. 1345 01:19:44,672 --> 01:19:47,780 You know on a much smaller scale, 1346 01:19:47,780 --> 01:19:50,310 the same problem exists for register allocation, 1347 01:19:50,310 --> 01:19:52,120 for variables.
1348 01:19:52,120 --> 01:19:55,486 Here you've got to assign every variable to a register. 1349 01:19:55,486 --> 01:19:56,860 But you can't have variables that 1350 01:19:56,860 --> 01:19:59,360 are active at the same time associated 1351 01:19:59,360 --> 01:20:00,900 with the same register. 1352 01:20:00,900 --> 01:20:03,850 And you want to minimize the number of registers you need. 1353 01:20:03,850 --> 01:20:07,110 So again, you have the graph coloring problem. 1354 01:20:07,110 --> 01:20:10,220 The number of colors is the number of registers you need. 1355 01:20:10,220 --> 01:20:12,920 And two variables can't get the same color if they're active 1356 01:20:12,920 --> 01:20:15,810 at the same time, so you put an edge between them. 1357 01:20:15,810 --> 01:20:17,720 The most famous example of graph coloring 1358 01:20:17,720 --> 01:20:21,670 is the map coloring problem, with the four coloring theorem. 1359 01:20:21,670 --> 01:20:25,800 And so here, every country is a node. 1360 01:20:25,800 --> 01:20:27,727 Adjacent countries have an edge between them, 1361 01:20:27,727 --> 01:20:29,810 because you don't want to color adjacent countries 1362 01:20:29,810 --> 01:20:31,480 the same color, or you can't tell 1363 01:20:31,480 --> 01:20:34,730 they're different countries. 1364 01:20:34,730 --> 01:20:36,990 Now the last example we can talk about 1365 01:20:36,990 --> 01:20:40,000 is an important problem in communication theory, 1366 01:20:40,000 --> 01:20:43,630 communication networks, where again coloring comes up. 1367 01:20:43,630 --> 01:20:48,100 Now here you need to assign frequencies to radio stations, 1368 01:20:48,100 --> 01:20:49,035 or the cell towers. 1369 01:20:49,035 --> 01:20:54,510 It comes up in mobile networks, or just with radio stations.
1370 01:20:54,510 --> 01:20:59,370 And if two towers have an overlapping area, 1371 01:20:59,370 --> 01:21:01,290 they can't be given the same frequency, 1372 01:21:01,290 --> 01:21:05,540 or you get collisions between the towers. 1373 01:21:05,540 --> 01:21:07,110 And frequencies are very expensive. 1374 01:21:07,110 --> 01:21:08,901 Companies pay the government a lot of money 1375 01:21:08,901 --> 01:21:11,020 to get certain spectrum. 1376 01:21:11,020 --> 01:21:13,350 So suppose you had this problem. 1377 01:21:13,350 --> 01:21:17,205 Here's tower A, this is A's range, where it reaches. 1378 01:21:17,205 --> 01:21:20,750 Here's tower B, so it overlaps some 1379 01:21:20,750 --> 01:21:36,660 with A. Here's tower C. Here's tower E. And here's tower D. 1380 01:21:36,660 --> 01:21:38,280 All right now the question would be, 1381 01:21:38,280 --> 01:21:40,869 how many radio frequencies do you need? 1382 01:21:40,869 --> 01:21:42,910 What's the minimum number of frequencies you need 1383 01:21:42,910 --> 01:21:45,916 to enable all the towers here? 1384 01:21:45,916 --> 01:21:47,480 We could make that be a graph. 1385 01:21:47,480 --> 01:21:49,760 There's a node for each tower. 1386 01:21:49,760 --> 01:21:54,400 And an edge between towers, if they overlap. 1387 01:21:54,400 --> 01:21:58,650 C doesn't overlap with B, E does. 1388 01:21:58,650 --> 01:22:00,970 E overlaps here. 1389 01:22:00,970 --> 01:22:05,060 And then D overlaps here. 1390 01:22:05,060 --> 01:22:09,197 So how many frequencies do you need for this graph? 1391 01:22:09,197 --> 01:22:10,360 AUDIENCE: Four. 1392 01:22:10,360 --> 01:22:14,580 PROFESSOR: Four would work, three is better. 1393 01:22:14,580 --> 01:22:16,512 Can you do two? 1394 01:22:19,900 --> 01:22:22,380 No you can't do two, because you got here. 1395 01:22:22,380 --> 01:22:23,820 But you could do three. 1396 01:22:23,820 --> 01:22:29,830 You could do one, two, three, two, one.
1397 01:22:29,830 --> 01:22:30,841 This problem comes up-- 1398 01:22:30,841 --> 01:22:31,715 AUDIENCE: [INAUDIBLE] 1399 01:22:31,715 --> 01:22:33,010 PROFESSOR: Did I screw up? 1400 01:22:33,010 --> 01:22:34,660 Ooh, no I can't do that. 1401 01:22:34,660 --> 01:22:36,630 One, two, yeah much better. 1402 01:22:36,630 --> 01:22:39,300 All right, this problem comes up all over the place. 1403 01:22:39,300 --> 01:22:41,640 I'm certain you'll see it sometime in your career, 1404 01:22:41,640 --> 01:22:44,015 you'll have some problem, or you're scheduling something, 1405 01:22:44,015 --> 01:22:45,960 and it's really a graph problem in disguise. 1406 01:22:45,960 --> 01:22:48,250 OK that's it for today.
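[Editor's sketch, not part of the lecture] The tower example can be tried out in a few lines of Python. The transcript does not record exactly which towers overlap on the board, so the edge list `OVERLAPS` below is an assumption chosen only to resemble the description (five towers, one triangle of mutual overlaps). The function names are illustrative as well. The two-color test is the easy check mentioned earlier; the greedy pass is just one simple coloring heuristic and is not guaranteed to use the minimum number of colors on every graph.

```python
from collections import deque

# Hypothetical overlap graph: the exact edges drawn on the board are not
# in the transcript, so this list is an assumption for illustration.
OVERLAPS = [("A", "B"), ("A", "C"), ("B", "E"),
            ("C", "E"), ("C", "D"), ("D", "E")]


def build_graph(edges):
    """Return an undirected adjacency map {node: set of neighbors}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def is_two_colorable(graph):
    """Breadth-first search that alternates two colors; fails on an odd cycle."""
    color = {}
    for start in graph:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in color:
                    color[nbr] = 1 - color[node]
                    queue.append(nbr)
                elif color[nbr] == color[node]:
                    return False
    return True


def greedy_coloring(graph):
    """Visit nodes in sorted order; give each the smallest color no neighbor has."""
    coloring = {}
    for node in sorted(graph):
        used = {coloring[n] for n in graph[node] if n in coloring}
        c = 1
        while c in used:
            c += 1
        coloring[node] = c
    return coloring


def is_valid(graph, coloring):
    """True if no edge joins two nodes of the same color."""
    return all(coloring[u] != coloring[v] for u in graph for v in graph[u])


graph = build_graph(OVERLAPS)
frequencies = greedy_coloring(graph)
print("two frequencies enough:", is_two_colorable(graph))
print("assignment:", frequencies)
print("frequencies used:", len(set(frequencies.values())))
```

On this assumed graph the triangle C, D, E rules out two frequencies, and the greedy pass happens to find a valid assignment with three. The same functions would apply to the server deployment and register allocation examples by changing what the nodes and edges stand for.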