1 00:00:09,890 --> 00:00:11,320 PROFESSOR PATRICK WINSTON: I was in Washington for most of 2 00:00:11,320 --> 00:00:15,050 the week prospecting for gold. 3 00:00:15,050 --> 00:00:19,820 Another byproduct of that was that I forgot to arrange a 4 00:00:19,820 --> 00:00:25,330 substitute Bob Berwick for the Thursday recitations. 5 00:00:25,330 --> 00:00:28,070 I shall probably go to hell for this. 6 00:00:28,070 --> 00:00:31,830 In any event, we have many explanations, 7 00:00:31,830 --> 00:00:34,670 none of them good. 8 00:00:34,670 --> 00:00:38,680 But today we'll try to get back on track and you'll learn 9 00:00:38,680 --> 00:00:40,800 something fun. 10 00:00:40,800 --> 00:00:48,060 In particular you will learn how a graduate student of mine 11 00:00:48,060 --> 00:00:51,940 Mark [? Phillipson ?], together with a summer UROP 12 00:00:51,940 --> 00:00:55,980 student, Brett van Zuiden, one of you-- 13 00:00:58,710 --> 00:01:02,390 managed to pull off a tour de force and recognize in these 14 00:01:02,390 --> 00:01:06,610 two descriptions the pattern that we humans commonly call 15 00:01:06,610 --> 00:01:09,860 "revenge." It was discovered. 16 00:01:09,860 --> 00:01:13,090 The system didn't have a name for it, of course. 17 00:01:13,090 --> 00:01:16,160 It just knew that there was a pattern there and sat waiting 18 00:01:16,160 --> 00:01:17,340 for us to give a name to it. 19 00:01:17,340 --> 00:01:20,635 That's where we're going to end up. 20 00:01:20,635 --> 00:01:22,840 But it'll be a bit of a journey before we get there, 21 00:01:22,840 --> 00:01:24,039 because we've got to go through all that 22 00:01:24,039 --> 00:01:26,250 stuff on the outline. 23 00:01:26,250 --> 00:01:29,740 And in particular, we want to start off by a 24 00:01:29,740 --> 00:01:30,910 little tiny bit of review. 25 00:01:30,910 --> 00:01:33,310 Because some of the stuff we did last time 26 00:01:33,310 --> 00:01:36,440 went by pretty fast. 
27 00:01:36,440 --> 00:01:39,630 In particular, you may remember they had this 28 00:01:39,630 --> 00:01:42,220 wonderful joint probability table, which tells us all we 29 00:01:42,220 --> 00:01:45,229 want to know, all we want to know. 30 00:01:45,229 --> 00:01:48,950 We can decide what the probability of the police 31 00:01:48,950 --> 00:01:52,520 being called is given the this and the that, and all that 32 00:01:52,520 --> 00:01:56,080 sort of stuff, by clicking the appropriate boxes. 33 00:01:56,080 --> 00:02:00,920 The trouble is, gee, there are only three variables there. 34 00:02:00,920 --> 00:02:03,560 And when there are lots of variables it gets pretty hard 35 00:02:03,560 --> 00:02:07,010 to make up those numbers or to even collect them. 36 00:02:07,010 --> 00:02:08,550 So we're driven to an alternative. 37 00:02:08,550 --> 00:02:11,890 And we got to that alternative just at the end of 38 00:02:11,890 --> 00:02:15,210 the show a week ago. 39 00:02:15,210 --> 00:02:21,750 And we got to the point where we were defining these 40 00:02:21,750 --> 00:02:25,370 inference nets, sometimes called "Bayes nets." And the 41 00:02:25,370 --> 00:02:29,579 one we worked with looked like this. 42 00:02:29,579 --> 00:02:33,060 There's a burglar, a raccoon, the possibility of a dog 43 00:02:33,060 --> 00:02:36,500 barking, the police being called, and a trash can being 44 00:02:36,500 --> 00:02:38,720 overturned. 45 00:02:38,720 --> 00:02:40,620 So more variables than that. 46 00:02:40,620 --> 00:02:41,579 That only has three. 47 00:02:41,579 --> 00:02:44,420 This has got five. 48 00:02:44,420 --> 00:02:47,590 But we're able to do some magic with this because we, as 49 00:02:47,590 --> 00:02:48,960 humans, when we define-- 50 00:02:48,960 --> 00:02:52,130 when we draw this graph we're making an assertion about how 51 00:02:52,130 --> 00:02:55,740 things depend or don't depend on one another. 
52 00:02:55,740 --> 00:03:00,410 In particular, there's something to break down and 53 00:03:00,410 --> 00:03:03,240 memorize to the point where it rolls off your tongue. 54 00:03:03,240 --> 00:03:08,240 And that is that any variable on this graph is said by me to 55 00:03:08,240 --> 00:03:10,310 be independent of any other 56 00:03:10,310 --> 00:03:14,500 non-descendant given its parents. 57 00:03:14,500 --> 00:03:15,770 Independent of any 58 00:03:15,770 --> 00:03:19,690 non-descendant given its parents. 59 00:03:19,690 --> 00:03:21,930 So that means that the probability of the dog 60 00:03:21,930 --> 00:03:27,340 barking, given its parents, doesn't depend on T, the trash 61 00:03:27,340 --> 00:03:28,930 can being overturned. 62 00:03:28,930 --> 00:03:32,900 Because the intuition is all of the causality is flowing 63 00:03:32,900 --> 00:03:38,800 through the parents and can't get to this variable D without 64 00:03:38,800 --> 00:03:41,440 going through the parents. 65 00:03:41,440 --> 00:03:42,680 So that is [? inserted ?] 66 00:03:42,680 --> 00:03:44,290 property of the nets that we draw. 67 00:03:44,290 --> 00:03:46,960 And we tend to draw them in a way that reflects causality. 68 00:03:46,960 --> 00:03:49,000 So it tends to make sense. 69 00:03:49,000 --> 00:03:56,329 So somehow this thing is going to be-- 70 00:03:56,329 --> 00:03:59,710 we're going to use this thing instead of that thing. 71 00:03:59,710 --> 00:04:00,880 But wait. 72 00:04:00,880 --> 00:04:02,920 We may need that thing in order to do all the 73 00:04:02,920 --> 00:04:05,530 computations we want to perform. 74 00:04:05,530 --> 00:04:09,930 So we need to be able to show that we can get to that thing 75 00:04:09,930 --> 00:04:15,160 by doing calculations on this thing. 76 00:04:15,160 --> 00:04:18,300 So what to do? 77 00:04:18,300 --> 00:04:20,100 Well, we're going to use the chain rule. 
78 00:04:20,100 --> 00:04:23,450 And remember that the chain rule came to us by way of the 79 00:04:23,450 --> 00:04:28,250 basic Axioms of Probability plus the definition plus a 80 00:04:28,250 --> 00:04:30,580 little colored chalk. 81 00:04:30,580 --> 00:04:33,570 So we got to the point last time where we 82 00:04:33,570 --> 00:04:34,750 sort of believed this. 83 00:04:34,750 --> 00:04:35,920 It's a really magical thing. 84 00:04:35,920 --> 00:04:38,670 It says that the probability of all this stuff happening 85 00:04:38,670 --> 00:04:42,170 together is given as the product of a bunch of 86 00:04:42,170 --> 00:04:44,470 conditional probabilities. 87 00:04:44,470 --> 00:04:47,380 And the conditional probabilities in this product 88 00:04:47,380 --> 00:04:49,430 are arranged such that this first guy depends 89 00:04:49,430 --> 00:04:51,500 on everybody else. 90 00:04:51,500 --> 00:04:54,050 The second guy doesn't depend on the first guy but depends 91 00:04:54,050 --> 00:04:55,690 on everything else. 92 00:04:55,690 --> 00:04:58,850 So that list of dependencies gets smaller and smaller as 93 00:04:58,850 --> 00:05:01,380 you go down here until it depends only one thing. 94 00:05:01,380 --> 00:05:05,050 There's no conditional at all. 95 00:05:05,050 --> 00:05:08,780 So that's going to come to our rescue because it enables us 96 00:05:08,780 --> 00:05:14,210 to go from calculations in here to that whole table. 97 00:05:14,210 --> 00:05:17,740 But first I have to show you a little bit more slowly how 98 00:05:17,740 --> 00:05:19,720 that comes to be. 99 00:05:19,720 --> 00:05:21,800 One thing I'm going to do before I think about 100 00:05:21,800 --> 00:05:25,650 probability is I'm going to make a linear list of all 101 00:05:25,650 --> 00:05:27,330 these variables. 102 00:05:27,330 --> 00:05:29,290 And the way I'm going to make it is I'm going to chew away 103 00:05:29,290 --> 00:05:32,030 at those variables from the bottom. 
104 00:05:32,030 --> 00:05:33,810 I've taken advantage of a very important 105 00:05:33,810 --> 00:05:36,630 property of these nets. 106 00:05:36,630 --> 00:05:39,490 And that is there are no loops. 107 00:05:39,490 --> 00:05:43,090 You can't follow the arrows in any way so as 108 00:05:43,090 --> 00:05:45,470 to get back to yourself. 109 00:05:45,470 --> 00:05:48,200 So there's always going to be a bottom. 110 00:05:48,200 --> 00:05:50,300 So what I'm going to do is I'm going to say, well, there are 111 00:05:50,300 --> 00:05:53,409 two bottoms here, there's C and T. So I have a choice. 112 00:05:53,409 --> 00:06:02,330 I'm going to choose C. So I'm going to take that off and 113 00:06:02,330 --> 00:06:05,570 pretend it's not there anymore. 114 00:06:05,570 --> 00:06:06,980 Then I'm going to take this guy. 115 00:06:06,980 --> 00:06:09,430 That's now a bottom because there's nothing below it. 116 00:06:09,430 --> 00:06:11,430 I've already taken C out. 117 00:06:11,430 --> 00:06:14,590 So we'll take that out next. 118 00:06:14,590 --> 00:06:18,440 And now I've got this guy, this guy, and this guy. 119 00:06:18,440 --> 00:06:21,170 This guy no longer has anything below it. 120 00:06:21,170 --> 00:06:23,430 So I can list it next. 121 00:06:23,430 --> 00:06:26,590 Now over here I've got raccoon and trashcan. 122 00:06:26,590 --> 00:06:30,450 But trashcan is at the bottom. 123 00:06:30,450 --> 00:06:34,040 So I've got to take it next because I'm working 124 00:06:34,040 --> 00:06:36,011 from the bottom up. 125 00:06:36,011 --> 00:06:40,760 I want to ensure that there are no descendants before me 126 00:06:40,760 --> 00:06:42,940 in this list. 127 00:06:42,940 --> 00:06:45,780 So finally I get to raccoon.
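The "chew away from the bottom" procedure just described can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the `PARENTS` dictionary and the function name `bottom_up_order` are assumptions made here, and ties between two available bottoms are broken alphabetically, which happens to reproduce the choices made at the board.

```python
# Hypothetical encoding of the lecture's net: each variable maps to its parents.
PARENTS = {
    "B": [],          # burglar
    "R": [],          # raccoon
    "D": ["B", "R"],  # dog barking
    "T": ["R"],       # trash can overturned
    "C": ["D"],       # police called
}


def bottom_up_order(parents):
    """Repeatedly remove a variable that has no remaining children."""
    remaining = set(parents)
    order = []
    while remaining:
        # A "bottom" is a variable that is not a parent of any remaining variable.
        bottoms = sorted(
            v for v in remaining
            if not any(v in parents[c] for c in remaining)
        )
        if not bottoms:
            # Only possible if the arrows form a loop.
            raise ValueError("graph has a loop; no bottom variable exists")
        chosen = bottoms[0]  # alphabetical tie-break, an arbitrary choice
        order.append(chosen)
        remaining.remove(chosen)
    return order


order = bottom_up_order(PARENTS)
print(order)  # ['C', 'D', 'B', 'T', 'R']
```

Because the net has no loops, the loop always finds at least one bottom, so the procedure cannot get stuck.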
128 00:06:45,780 --> 00:06:51,930 So the way I constructed this list like so ensures that this 129 00:06:51,930 --> 00:06:55,020 list arranges the elements so that for any particular 130 00:06:55,020 --> 00:06:57,210 element, none of its descendants 131 00:06:57,210 --> 00:06:58,460 appear to its left. 132 00:07:01,300 --> 00:07:04,260 And now that's the magical order for which I want to use 133 00:07:04,260 --> 00:07:06,030 the chain rule. 134 00:07:06,030 --> 00:07:07,960 So now I can write-- 135 00:07:07,960 --> 00:07:11,260 I can pick C to be my variable n. 136 00:07:11,260 --> 00:07:13,530 And I can say that the chain rule says that the joint 137 00:07:13,530 --> 00:07:17,410 probability of all these variables P of C, 138 00:07:17,410 --> 00:07:22,140 D, B, T, and R-- 139 00:07:22,140 --> 00:07:24,360 the probability of any particular combination of 140 00:07:24,360 --> 00:07:28,140 those things is equal to the probability of C given 141 00:07:28,140 --> 00:07:29,390 everybody else. 142 00:07:33,659 --> 00:07:38,190 Next in line is D given everybody else. 143 00:07:42,470 --> 00:07:45,400 Next in line is T-- 144 00:07:45,400 --> 00:07:49,420 next in line is B given everybody else. 145 00:07:57,380 --> 00:08:00,720 And next in line is T given everybody else. 146 00:08:03,270 --> 00:08:10,840 And finally, just R. So this combination of things has a 147 00:08:10,840 --> 00:08:13,145 probability that is given by this chain rule expression. 148 00:08:15,960 --> 00:08:17,010 Ah. 149 00:08:17,010 --> 00:08:19,500 But first of all, none of those expressions condition 150 00:08:19,500 --> 00:08:21,740 any of the variables on anything other than 151 00:08:21,740 --> 00:08:25,970 non-descendants, all right? 152 00:08:25,970 --> 00:08:28,650 That's just because of the way I've arranged the variables. 153 00:08:28,650 --> 00:08:30,950 And I can always do that because there are no loops. 154 00:08:30,950 --> 00:08:33,169 I can always chew away at the bottom.
155 00:08:33,169 --> 00:08:35,140 That ensures that whenever I write a variable, it's going 156 00:08:35,140 --> 00:08:39,440 to be conditioned on stuff other than its descendants. 157 00:08:39,440 --> 00:08:43,190 So all of these variables in any of these conditional 158 00:08:43,190 --> 00:08:46,700 probabilities are non-descendants. 159 00:08:46,700 --> 00:08:48,740 Oh wait. 160 00:08:48,740 --> 00:08:54,510 When I drew this diagram, I asserted that no variable 161 00:08:54,510 --> 00:08:58,680 depends on any non-descendant given its parents. 162 00:08:58,680 --> 00:09:02,950 So if I know the parents of a variable I know that the 163 00:09:02,950 --> 00:09:05,895 variable is independent of all other non-descendants. 164 00:09:05,895 --> 00:09:07,320 All right? 165 00:09:07,320 --> 00:09:10,590 Now I can start scratching stuff out. 166 00:09:10,590 --> 00:09:11,690 Well, let's see. 167 00:09:11,690 --> 00:09:16,120 I know that C, from my diagram, has only one parent, 168 00:09:16,120 --> 00:09:21,080 D. So given its parent, it's independent of all other 169 00:09:21,080 --> 00:09:22,730 non-descendants. 170 00:09:22,730 --> 00:09:23,980 So I can scratch them out. 171 00:09:27,370 --> 00:09:33,470 D has two parents, B and R. But given that, I can scratch 172 00:09:33,470 --> 00:09:34,720 out any other non-descendant. 173 00:09:37,190 --> 00:09:41,460 B is conditional on T and R. Ah, but B has no parent. 174 00:09:41,460 --> 00:09:45,120 So it actually is independent of those two guys. 175 00:09:45,120 --> 00:09:48,330 The trashcan, yeah, that's dependent on R. And R over 176 00:09:48,330 --> 00:09:50,640 here, the final thing in the chain, that's just a 177 00:09:50,640 --> 00:09:53,160 probability. 178 00:09:53,160 --> 00:09:56,390 So now I have a way of calculating any entry in that 179 00:09:56,390 --> 00:09:58,960 table because any entry in that table is going to be some 180 00:09:58,960 --> 00:10:02,880 combination of values for all those variables.
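Written out in symbols, the chain-rule expansion and the scratching-out described above give the following. This only transcribes what is said aloud, using the lecture's letters (B burglar, R raccoon, D dog, T trash can, C police called); the layout is an assumption about what was on the board.

```latex
\begin{aligned}
P(C, D, B, T, R)
  &= P(C \mid D, B, T, R)\; P(D \mid B, T, R)\; P(B \mid T, R)\; P(T \mid R)\; P(R) \\
  &= P(C \mid D)\; P(D \mid B, R)\; P(B)\; P(T \mid R)\; P(R)
\end{aligned}
```

The first line is the chain rule applied in the bottom-up order. The second line drops every conditioning variable that is a non-descendant other than a parent, which is exactly what the net asserts can be dropped.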
181 00:10:02,880 --> 00:10:04,440 Voila. 182 00:10:04,440 --> 00:10:08,320 So anything I can do with a table, I can do in principle 183 00:10:08,320 --> 00:10:10,580 with this little network. 184 00:10:10,580 --> 00:10:11,920 OK? 185 00:10:11,920 --> 00:10:15,480 But now the question is, I've got some probabilities I'm 186 00:10:15,480 --> 00:10:17,100 going to have to figure out here. 187 00:10:17,100 --> 00:10:19,570 So let me draw a slightly different version of it. 188 00:10:26,010 --> 00:10:32,030 So up here we've got the a priori probability of B. Well, 189 00:10:32,030 --> 00:10:46,680 that's just probability of B. Down here with the dog, I've 190 00:10:46,680 --> 00:10:50,350 got a bigger table because I've got probabilities that 191 00:10:50,350 --> 00:10:52,950 depend on the values of its parents. 192 00:10:52,950 --> 00:10:57,620 The probability of dog barking depends on the condition of 193 00:10:57,620 --> 00:11:00,560 the parents, nothing else. 194 00:11:00,560 --> 00:11:01,070 So let's see. 195 00:11:01,070 --> 00:11:03,920 I've got to have a column for B. I've got to have a column 196 00:11:03,920 --> 00:11:05,940 for the burglar and the raccoon. 197 00:11:08,660 --> 00:11:11,940 And there are a bunch of possibilities for those guys. 198 00:11:11,940 --> 00:11:14,050 But once I get those then I'll be able to calculate the 199 00:11:14,050 --> 00:11:15,670 probability of the dog barking. 200 00:11:18,640 --> 00:11:19,910 So there are two of these variables. 201 00:11:19,910 --> 00:11:22,310 So there are four combinations. 202 00:11:22,310 --> 00:11:30,390 There's T T. There's T R, R T, and-- 203 00:11:30,390 --> 00:11:32,134 whoa, what am I doing? 204 00:11:32,134 --> 00:11:34,050 Wake up! 205 00:11:34,050 --> 00:11:36,720 T false. 206 00:11:36,720 --> 00:11:37,930 False true. 207 00:11:37,930 --> 00:11:39,890 And false false. 
208 00:11:39,890 --> 00:11:42,060 So what I really want to do is I want to calculate all of 209 00:11:42,060 --> 00:11:45,070 these probabilities that give the probability of the dog 210 00:11:45,070 --> 00:11:49,210 conditioned on the burglar and the raccoon. 211 00:11:49,210 --> 00:11:53,070 Similarly, I want to calculate the probability of B happening, which 212 00:11:53,070 --> 00:11:56,200 doesn't depend on anything else. 213 00:11:56,200 --> 00:11:58,590 So I don't know what to do. 214 00:11:58,590 --> 00:12:00,660 Well, what I'm going to actually do is I'm going to do 215 00:12:00,660 --> 00:12:03,410 the same thing I had to do up there. 216 00:12:03,410 --> 00:12:06,630 I'm going to keep track of-- 217 00:12:06,630 --> 00:12:07,640 I'm going to try a bunch of-- 218 00:12:07,640 --> 00:12:09,560 I'm going to get myself together a bunch of data. 219 00:12:09,560 --> 00:12:10,910 Maybe I do a bunch of experiments. 220 00:12:10,910 --> 00:12:12,730 Maybe somebody hands it to me. 221 00:12:12,730 --> 00:12:16,030 But I'm going to use that data to construct a bunch of 222 00:12:16,030 --> 00:12:19,990 tallies which are going to end up giving me the probabilities 223 00:12:19,990 --> 00:12:22,150 for all of those things. 224 00:12:22,150 --> 00:12:23,160 So I don't know, let's see. 225 00:12:23,160 --> 00:12:25,440 How should we start? 226 00:12:25,440 --> 00:12:27,290 Step one, find colored chalk. 227 00:12:27,290 --> 00:12:30,870 Step two, I'm going to extend these tables a little bit so I 228 00:12:30,870 --> 00:12:32,120 can keep track of the tallies. 229 00:12:38,840 --> 00:12:41,580 So this is going to be all the ones that end up in a 230 00:12:41,580 --> 00:12:42,980 particular row. 231 00:12:42,980 --> 00:12:47,450 And these are going to be the ones for which dog is true. 232 00:12:50,930 --> 00:12:53,960 Similarly, I'm going to extend this guy up here in order to 233 00:12:53,960 --> 00:12:56,020 keep track of some tallies.
234 00:12:56,020 --> 00:13:00,020 This is going to be the ones for which B is true. 235 00:13:02,990 --> 00:13:04,240 And this one will be all. 236 00:13:06,780 --> 00:13:08,680 So that's my set up. 237 00:13:08,680 --> 00:13:12,490 And now suppose that my first experiment comes roaring in. 238 00:13:12,490 --> 00:13:15,060 And it's all T's. 239 00:13:15,060 --> 00:13:21,720 So I have T T T. That's my first experimental result, my 240 00:13:21,720 --> 00:13:24,170 first data item. 241 00:13:24,170 --> 00:13:27,480 So let's see. 242 00:13:27,480 --> 00:13:31,010 The arrangement here is burglar, raccoon, dog. 243 00:13:31,010 --> 00:13:35,040 So burglar is true. 244 00:13:35,040 --> 00:13:38,510 And there's one tally count in there. 245 00:13:38,510 --> 00:13:42,280 Likewise, the T T, that's the burglar and the raccoon, that 246 00:13:42,280 --> 00:13:45,190 brings me down to this first row. 247 00:13:45,190 --> 00:13:49,200 So that gives me one tally in there and dog is true so that 248 00:13:49,200 --> 00:13:52,475 gives me a tick mark in that one. 249 00:13:52,475 --> 00:13:53,190 All right? 250 00:13:53,190 --> 00:13:54,240 Are you with me so far? 251 00:13:54,240 --> 00:13:55,850 And now let's suppose that the next thing 252 00:13:55,850 --> 00:13:57,100 happens to be all false. 253 00:14:01,940 --> 00:14:04,020 Well, burglar is false. 254 00:14:04,020 --> 00:14:05,545 But there is one experiment. 255 00:14:08,590 --> 00:14:09,980 Everybody's false. 256 00:14:09,980 --> 00:14:13,500 So we come down here to false false. 257 00:14:13,500 --> 00:14:14,810 And that's the row we're going to work on. 258 00:14:14,810 --> 00:14:16,650 We get a tally in there. 259 00:14:16,650 --> 00:14:19,150 Do we put one in here? 260 00:14:19,150 --> 00:14:21,860 No, because that's false. 261 00:14:21,860 --> 00:14:22,785 Dog is false. 262 00:14:22,785 --> 00:14:25,650 That's what our data element says. 263 00:14:25,650 --> 00:14:26,640 So that's cool.
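The tallying described so far can be sketched as follows. This is a minimal illustration, not the lecture's demonstration program: the variable names are invented here, and the only data used are the two records mentioned above (all true, then all false), in the lecture's burglar, raccoon, dog order.

```python
from collections import Counter

# Each record is (burglar, raccoon, dog), the ordering used in the lecture.
records = [
    (True, True, True),     # first data item: all true
    (False, False, False),  # second data item: all false
]

b_all = 0            # every record counts in the "all" column of the burglar table
b_true = 0           # records in which burglar is true
row_all = Counter()  # records landing in each (burglar, raccoon) row of the dog table
row_dog = Counter()  # records in that row in which dog is true

for burglar, raccoon, dog in records:
    b_all += 1
    if burglar:
        b_true += 1
    row_all[(burglar, raccoon)] += 1
    if dog:
        row_dog[(burglar, raccoon)] += 1

# Each probability estimate is a ratio of two tallies.
p_burglar = b_true / b_all
p_dog_given = {row: row_dog[row] / n for row, n in row_all.items()}

print(p_burglar)    # 0.5
print(p_dog_given)  # {(True, True): 1.0, (False, False): 0.0}
```

Rows that no record has reached yet simply have no estimate, and with this little data the estimates are crude.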
264 00:14:26,640 --> 00:14:28,050 Maybe one more. 265 00:14:28,050 --> 00:14:37,210 Let's suppose we have T T F. Well in that case, we have a 266 00:14:37,210 --> 00:14:42,020 tick mark here and a tick mark here because the burglar 267 00:14:42,020 --> 00:14:44,360 element is true. 268 00:14:44,360 --> 00:14:48,040 Then we have T T. That brings us to the first row again. 269 00:14:48,040 --> 00:14:50,180 So we get a tick mark there. 270 00:14:50,180 --> 00:14:53,500 But dog is false, so no tick mark there. 271 00:14:53,500 --> 00:14:55,170 That's how it works. 272 00:14:55,170 --> 00:14:58,550 I suppose you'd like to see a demonstration, right? 273 00:14:58,550 --> 00:15:01,030 Always like to see a demonstration. 274 00:15:01,030 --> 00:15:02,850 So here's what it actually looks like. 275 00:15:05,680 --> 00:15:10,060 So on the left you see the network as we've constructed 276 00:15:10,060 --> 00:15:12,420 it, with a bunch of probabilities there. 277 00:15:12,420 --> 00:15:13,850 And what I'm going to do now is I'm going to start 278 00:15:13,850 --> 00:15:17,850 simulating away so as to accumulate tick marks, tally 279 00:15:17,850 --> 00:15:20,160 marks, and see what kinds of probabilities that they 280 00:15:20,160 --> 00:15:21,920 indicate for the table. 281 00:15:21,920 --> 00:15:25,610 I happen to be using a process for which the model on the 282 00:15:25,610 --> 00:15:29,070 left is a correct reflection. 283 00:15:29,070 --> 00:15:31,540 So there's one simulation. 284 00:15:31,540 --> 00:15:33,460 So the dog barking-- 285 00:15:33,460 --> 00:15:34,800 let's see, the burglar is false. 286 00:15:34,800 --> 00:15:36,600 The raccoon is true. 287 00:15:36,600 --> 00:15:37,710 I get one tick mark. 288 00:15:37,710 --> 00:15:40,130 So the probability there is one. 289 00:15:40,130 --> 00:15:43,300 Of course, I'm not going to just go with one. 290 00:15:43,300 --> 00:15:44,840 I want to put a whole bunch of stuff in there.
291 00:15:44,840 --> 00:15:46,300 So I'll just run a bunch more simulations. 292 00:15:50,820 --> 00:15:51,270 No [? dice. ?] 293 00:15:51,270 --> 00:15:54,746 I don't even have an entry at all yet for T F here. 294 00:15:54,746 --> 00:15:57,210 That's because I haven't run enough data. 295 00:15:57,210 --> 00:16:00,280 So let me clear it instead of doing it one at a time. 296 00:16:00,280 --> 00:16:02,280 Let me run 100 simulations. 297 00:16:02,280 --> 00:16:03,450 See, it's still not too good. 298 00:16:03,450 --> 00:16:07,500 Because it says this T T probability true. 299 00:16:07,500 --> 00:16:09,910 This is just because I'm feeding it data, right? 300 00:16:09,910 --> 00:16:12,730 And I'm keeping track of what the data elements tell me 301 00:16:12,730 --> 00:16:15,470 about how frequently a 302 00:16:15,470 --> 00:16:17,180 particular combination appears. 303 00:16:17,180 --> 00:16:17,680 Yes, [INAUDIBLE] 304 00:16:17,680 --> 00:16:20,668 STUDENT: So when you're doing one simulation, is that 305 00:16:20,668 --> 00:16:22,162 [INAUDIBLE] variables? 306 00:16:22,162 --> 00:16:23,832 PROFESSOR PATRICK WINSTON: When I'm doing one simulation, 307 00:16:23,832 --> 00:16:26,930 I'm just keeping track of that combination in 308 00:16:26,930 --> 00:16:28,800 each of these tables. 309 00:16:28,800 --> 00:16:30,630 Because it's going to tell me something about the 310 00:16:30,630 --> 00:16:33,730 probabilities that I want reflected in those tables. 311 00:16:33,730 --> 00:16:37,090 So it's pretty easy to see when I go up here to burglar. 312 00:16:37,090 --> 00:16:38,920 If I have a lot of data elements, they're all going to 313 00:16:38,920 --> 00:16:40,730 tell me something about the burglar as well 314 00:16:40,730 --> 00:16:42,280 as the other variables.
315 00:16:42,280 --> 00:16:44,910 So if I just look at that burglar thing, the fraction of 316 00:16:44,910 --> 00:16:47,850 time that it turns out true over all the data elements is 317 00:16:47,850 --> 00:16:50,150 going to be its probability. 318 00:16:50,150 --> 00:16:53,610 So now when I go down to the joint tables, I can still get 319 00:16:53,610 --> 00:16:54,630 these probability numbers. 320 00:16:54,630 --> 00:16:56,400 But now they're conditioned on a particular 321 00:16:56,400 --> 00:16:59,010 condition of its parents. 322 00:16:59,010 --> 00:17:01,430 So that's how I get these probabilities. 323 00:17:01,430 --> 00:17:04,700 So I didn't do too well here because that T T combination 324 00:17:04,700 --> 00:17:06,579 gave me an excessively high probability. 325 00:17:06,579 --> 00:17:10,130 So maybe 100 simulations isn't enough. 326 00:17:10,130 --> 00:17:15,339 Let's run 10,000. 327 00:17:15,339 --> 00:17:18,560 So with that much data running through, the probabilities I 328 00:17:18,560 --> 00:17:22,310 get-- let's see, I've got 893 here, instead of 0.9, 807 329 00:17:22,310 --> 00:17:25,880 instead of 0.8, 607 instead of 0.6. 330 00:17:25,880 --> 00:17:28,500 And that one's dead-on at 0.01. 331 00:17:28,500 --> 00:17:30,570 So if I run enough of these simulations, I get a pretty 332 00:17:30,570 --> 00:17:33,220 good idea what the probabilities ought to be 333 00:17:33,220 --> 00:17:37,030 given that I've got a correct model. 334 00:17:37,030 --> 00:17:38,860 OK, so that takes care of that one. 335 00:17:38,860 --> 00:17:40,880 And of course, I didn't draw the other things in here. 336 00:17:40,880 --> 00:17:44,800 But by extension, you can see how those would work. 337 00:17:44,800 --> 00:17:46,180 Oh. 338 00:17:46,180 --> 00:17:46,810 But you know what? 339 00:17:46,810 --> 00:17:48,895 I think I will put a little probability of 340 00:17:48,895 --> 00:17:50,145 raccoon table in here.
341 00:17:52,550 --> 00:17:55,390 Because the next thing I want to do is I want to 342 00:17:55,390 --> 00:17:56,470 go the other way. 343 00:17:56,470 --> 00:17:59,365 This is recording tallies from some process so I 344 00:17:59,365 --> 00:18:00,730 can develop a model. 345 00:18:00,730 --> 00:18:03,670 But once I've got these probabilities, of course, then 346 00:18:03,670 --> 00:18:07,780 I can start to simulate what the model would do. 347 00:18:07,780 --> 00:18:08,610 All right? 348 00:18:08,610 --> 00:18:10,690 How would I do that? 349 00:18:10,690 --> 00:18:16,220 Well, do I want to use the same table? 350 00:18:16,220 --> 00:18:19,050 I think just to keep things sanitary, what I'll do is I'll 351 00:18:19,050 --> 00:18:21,340 go over here and do it again. 352 00:18:21,340 --> 00:18:27,260 Here's B. It's got a probability of B. Here's R. 353 00:18:27,260 --> 00:18:33,300 Here's a table probability of R. That comes down into a 354 00:18:33,300 --> 00:18:36,070 joint table for dog. 355 00:18:36,070 --> 00:18:37,320 And it's got four elements. 356 00:18:44,090 --> 00:18:46,940 Depending on the burglar condition and the raccoon 357 00:18:46,940 --> 00:18:51,270 condition, we get a probability of dog. 358 00:18:51,270 --> 00:18:53,990 And now, imagine these have all been filled in. 359 00:18:53,990 --> 00:18:57,290 So what do I want to do if I want to simulate this system 360 00:18:57,290 --> 00:19:02,130 generating some combination of values for all the variables? 361 00:19:02,130 --> 00:19:05,870 Well, I do the opposite of what I did when I was working 362 00:19:05,870 --> 00:19:09,070 around with this chain rule showing that I could go from 363 00:19:09,070 --> 00:19:11,480 the table to those probabilities. 364 00:19:11,480 --> 00:19:12,660 Now I've got the probabilities. 365 00:19:12,660 --> 00:19:14,450 I'm going to go the other direction.
366 00:19:14,450 --> 00:19:17,100 Instead of chewing away from the bottom, I'm going to chew 367 00:19:17,100 --> 00:19:18,660 away from the top. 368 00:19:18,660 --> 00:19:21,740 Because when I go into the top and chew away, everything I 369 00:19:21,740 --> 00:19:24,560 need to know to do a coin flip is there. 370 00:19:24,560 --> 00:19:28,970 So in particular, when I go up in here, I've got the 371 00:19:28,970 --> 00:19:31,350 probability of burglar now. 372 00:19:31,350 --> 00:19:36,250 So I'm going to use that probability to flip a coin. 373 00:19:36,250 --> 00:19:40,950 Say it produces a T. So that takes care of this guy. 374 00:19:40,950 --> 00:19:43,730 And I can now scratch it off since it's no longer in 375 00:19:43,730 --> 00:19:44,390 consideration. 376 00:19:44,390 --> 00:19:46,940 It's no longer a top variable. 377 00:19:46,940 --> 00:19:49,330 So now I go over into raccoon and I do the same thing. 378 00:19:49,330 --> 00:19:51,190 I take this probability. 379 00:19:51,190 --> 00:19:53,310 I do a flip. 380 00:19:53,310 --> 00:19:59,130 And say it produces an F. Whatever its probability is, I 381 00:19:59,130 --> 00:20:02,350 flip a biased coin and that's what I happen to get. 382 00:20:02,350 --> 00:20:06,340 But now, having dealt with these two guys, that uncovers 383 00:20:06,340 --> 00:20:08,230 this dog thing. 384 00:20:08,230 --> 00:20:10,330 And I've got enough information, because I've done 385 00:20:10,330 --> 00:20:13,286 everything above, to make the calculation for whether the dog 386 00:20:13,286 --> 00:20:15,510 is going to be barking or not. 387 00:20:15,510 --> 00:20:16,520 But wait. 388 00:20:16,520 --> 00:20:22,530 I have to know that I've got a T and a T and a T and an F and 389 00:20:22,530 --> 00:20:27,060 an F and a T and an F and an F. Because I have to select 390 00:20:27,060 --> 00:20:28,820 the right row. 391 00:20:28,820 --> 00:20:34,610 And I know that B is T. And I know that R is F. 
So that 392 00:20:34,610 --> 00:20:39,570 takes me into the table into the second row. 393 00:20:39,570 --> 00:20:42,360 So now I get this probability. 394 00:20:42,360 --> 00:20:49,850 I flip that coin and I get some result, say, T. Voila. 395 00:20:49,850 --> 00:20:51,810 I can do that with the other two variables. 396 00:20:51,810 --> 00:20:55,390 And I've got myself an experimental trial that is 397 00:20:55,390 --> 00:20:56,620 produced in accordance with the 398 00:20:56,620 --> 00:20:59,219 probabilities of the table. 399 00:20:59,219 --> 00:21:00,469 OK? 400 00:21:03,390 --> 00:21:04,640 Of course-- 401 00:21:08,356 --> 00:21:13,730 yeah, in fact, how did I get those numbers? 402 00:21:13,730 --> 00:21:16,990 Actually what I did is I used the model on the left to 403 00:21:16,990 --> 00:21:19,330 generate the samples that were used to compute the 404 00:21:19,330 --> 00:21:22,240 probabilities on the right. 405 00:21:22,240 --> 00:21:27,910 So you've seen a demonstration of this already. 406 00:21:27,910 --> 00:21:31,330 Now of course-- 407 00:21:31,330 --> 00:21:36,400 I don't know, all of this sort of depends on having 408 00:21:36,400 --> 00:21:38,030 everything right. 409 00:21:38,030 --> 00:21:42,130 I've written a thing to write it one more time. 410 00:21:42,130 --> 00:21:51,580 Burglar, raccoon, dog, call the police, trashcan. 411 00:21:51,580 --> 00:21:54,440 But somebody else may say, oh, you've got it all wrong. 412 00:21:54,440 --> 00:21:55,690 This is what it really looks like. 413 00:21:58,560 --> 00:22:01,780 The dog doesn't care about the raccoon at all. 414 00:22:01,780 --> 00:22:04,130 So that's a correct model. 415 00:22:04,130 --> 00:22:06,380 Now when I do a simulation, I could fill in the tables in 416 00:22:06,380 --> 00:22:07,810 either model, right? 417 00:22:07,810 --> 00:22:10,800 I'm sure you'd like to see a demonstration. 418 00:22:10,800 --> 00:22:13,420 So let me show you a demonstration of that.
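The top-down simulation described above (flip a coin for burglar, flip one for raccoon, then pick the matching row of the dog table and flip again) might look like this in Python. It is a sketch under stated assumptions: every number in the tables below is made up for illustration, since the lecture does not read out the full tables, and the names `flip` and `sample_once` are invented here. Only three of the five variables are sampled; the other two would follow the same pattern.

```python
import random

# Hypothetical probabilities, chosen only for illustration.
P_BURGLAR = 0.1
P_RACCOON = 0.5
P_DOG = {  # P(dog barks | burglar, raccoon), one entry per row of the table
    (True, True): 0.9,
    (True, False): 0.8,
    (False, True): 0.6,
    (False, False): 0.01,
}


def flip(p, rng):
    """Biased coin: True with probability p."""
    return rng.random() < p


def sample_once(rng):
    """Sample parents before children so each flip has what it needs."""
    burglar = flip(P_BURGLAR, rng)
    raccoon = flip(P_RACCOON, rng)
    dog = flip(P_DOG[(burglar, raccoon)], rng)  # select the row, then flip
    return burglar, raccoon, dog


rng = random.Random(0)  # fixed seed so a run is repeatable
samples = [sample_once(rng) for _ in range(10_000)]

burglar_rate = sum(b for b, _, _ in samples) / len(samples)
print(round(burglar_rate, 2))
```

Feeding `samples` back into a tallying routine should recover numbers close to the tables that generated them, which is what the demonstration shows.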
419 00:22:21,470 --> 00:22:23,040 So there are the two tables. 420 00:22:23,040 --> 00:22:25,060 And I can run 10,000 simulations 421 00:22:25,060 --> 00:22:26,310 on those guys, too. 422 00:22:28,530 --> 00:22:29,270 Now, look. 423 00:22:29,270 --> 00:22:31,150 The guy on the left is a pretty good reflection of the 424 00:22:31,150 --> 00:22:37,630 probabilities in a model I used to produce the data. 425 00:22:37,630 --> 00:22:39,253 But the guy on the right doesn't know any better. It 426 00:22:39,253 --> 00:22:42,960 just fills in its own tables, too. 427 00:22:42,960 --> 00:22:45,950 So what to do? 428 00:22:45,950 --> 00:22:48,060 I say this one's the right model. 429 00:22:48,060 --> 00:22:50,370 And you say that one's the right model. 430 00:22:50,370 --> 00:22:52,090 Who's right? 431 00:22:52,090 --> 00:22:54,120 Maybe we'll never know. 432 00:22:54,120 --> 00:22:57,830 And the guy on the left will get rich in the stock market 433 00:22:57,830 --> 00:23:00,630 and the guy on the right will go broke. 434 00:23:00,630 --> 00:23:01,370 It would be nice if we could actually 435 00:23:01,370 --> 00:23:04,520 figure out who's right. 436 00:23:04,520 --> 00:23:07,466 So would you like to see how to figure out who's right? 437 00:23:07,466 --> 00:23:09,220 Yeah, so would I. What we're going to do is we're going to 438 00:23:09,220 --> 00:23:11,490 look at naive Bayesian inference. 439 00:23:11,490 --> 00:23:13,660 And that's our next chore. 440 00:23:13,660 --> 00:23:16,740 So here's how it works. 441 00:23:16,740 --> 00:23:20,800 We know, from the definition of conditional probability, we 442 00:23:20,800 --> 00:23:25,530 know that the probability of A given B is equal to the 443 00:23:25,530 --> 00:23:29,660 probability of A and B divided by the 444 00:23:29,660 --> 00:23:33,520 probability of B, right? 445 00:23:33,520 --> 00:23:36,400 Equal to by definition.
446 00:23:36,400 --> 00:23:43,540 So that means that the probability of A given B times 447 00:23:43,540 --> 00:23:45,240 the probability of B-- 448 00:23:45,240 --> 00:23:46,680 I'm just multiplying it out-- 449 00:23:46,680 --> 00:23:48,050 is equal to that joint probability. 450 00:23:52,690 --> 00:23:57,715 Oh, but by symmetry, there's no harm in saying I can turn 451 00:23:57,715 --> 00:24:02,450 that around and say that the probability of B given A times 452 00:24:02,450 --> 00:24:07,000 the probability of B is also equal to that joint 453 00:24:07,000 --> 00:24:08,690 probability, right? 454 00:24:08,690 --> 00:24:12,750 I've just expanded it a different and symmetric way. 455 00:24:12,750 --> 00:24:18,630 If I've got to write a, b on B, b, a on A. Thank you. 456 00:24:18,630 --> 00:24:19,590 Who was complaining? 457 00:24:19,590 --> 00:24:20,840 Good work. 458 00:24:24,200 --> 00:24:27,800 That would have been a major-league disaster. 459 00:24:27,800 --> 00:24:30,910 But now, having written that, I can forget about the middle. 460 00:24:30,910 --> 00:24:32,730 Because all I'm really interested in is how I've 461 00:24:32,730 --> 00:24:37,070 turned the probabilities around in that conditional. 462 00:24:37,070 --> 00:24:38,570 Why would I care about doing that? 463 00:24:38,570 --> 00:24:41,760 By the way, we're now talking about the work of 464 00:24:41,760 --> 00:24:43,010 the Reverend Bayes. 465 00:24:47,060 --> 00:24:50,420 Because we can rewrite this yet again as the probability 466 00:24:50,420 --> 00:25:00,160 of A given B is equal to the probability of B given A times 467 00:25:00,160 --> 00:25:06,420 the probability of A divided by the probability of B. 468 00:25:06,420 --> 00:25:09,790 That's just elementary algebra. 469 00:25:09,790 --> 00:25:13,460 But now I'm going to do something magical. 470 00:25:13,460 --> 00:25:18,670 I'm going to say I've got a classification problem.
471 00:25:18,670 --> 00:25:22,120 I want to know which disease you have. 472 00:25:22,120 --> 00:25:23,660 That's a classification problem. 473 00:25:23,660 --> 00:25:26,120 Maybe you've got the swine flu. 474 00:25:26,120 --> 00:25:29,790 Maybe you've got indigestion. 475 00:25:29,790 --> 00:25:30,740 Who knows. 476 00:25:30,740 --> 00:25:33,470 But I get all these symptoms. 477 00:25:33,470 --> 00:25:35,950 I get all these pieces of evidence. 478 00:25:35,950 --> 00:25:37,450 You've got a fever. 479 00:25:37,450 --> 00:25:38,140 You're throwing-- 480 00:25:38,140 --> 00:25:40,470 oh, well, let's not go into too much detail, there. 481 00:25:40,470 --> 00:25:42,060 But what I'm going to do is I'm going to say, well, let's 482 00:25:42,060 --> 00:25:48,490 suppose that A is equal to a class that I'm interested in, 483 00:25:48,490 --> 00:25:50,050 the disease you've got. 484 00:25:50,050 --> 00:25:56,510 And B is equal to the evidence, 485 00:25:56,510 --> 00:25:57,760 the symptoms I observe. 486 00:26:00,540 --> 00:26:02,240 Voila. 487 00:26:02,240 --> 00:26:04,150 I may have a pretty hard time figuring out what the 488 00:26:04,150 --> 00:26:07,260 probability of the class is given the evidence. 489 00:26:07,260 --> 00:26:08,920 But figuring out the probability of the evidence 490 00:26:08,920 --> 00:26:11,420 given the class might not be so hard. 491 00:26:11,420 --> 00:26:14,500 Let me get another board in play and show you what I mean. 492 00:26:20,128 --> 00:26:24,820 By plugging class and evidence into Bayes' rule, what I get 493 00:26:24,820 --> 00:26:31,540 is the probability of some class given the evidence is 494 00:26:31,540 --> 00:26:37,710 equal to the probability of the evidence given the class 495 00:26:37,710 --> 00:26:42,060 times the probability of the class divided by the 496 00:26:42,060 --> 00:26:43,690 probability of the evidence. 497 00:26:46,280 --> 00:26:49,230 Now you've got to let that sing to you a little bit. 
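Since the board work is hard to follow from the spoken words alone, here is the chain of steps written out, using the lecture's symbols A and B and then substituting class for A and evidence for B. The corrected symmetric expansion (with the probability of A, not B, multiplying the probability of B given A) is the one shown.

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A, B)}{P(B)} \\
P(A \mid B)\,P(B) &= P(A, B) = P(B \mid A)\,P(A) \\
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)} \\
P(\text{class} \mid \text{evidence}) &=
  \frac{P(\text{evidence} \mid \text{class})\,P(\text{class})}{P(\text{evidence})}
\end{align*}
```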
498 00:26:49,230 --> 00:26:51,590 Suppose I've got several classes that I'm trying to 499 00:26:51,590 --> 00:26:54,270 decide between. 500 00:26:54,270 --> 00:26:58,500 I'm trying to select the best out of that batch of classes. 501 00:26:58,500 --> 00:26:59,800 Well, I've got the evidence. 502 00:26:59,800 --> 00:27:02,140 And if I know the probability of the evidence given each of 503 00:27:02,140 --> 00:27:06,130 those classes, and if I know, a priori, the initial 504 00:27:06,130 --> 00:27:09,950 probability of the class, then I'm done. 505 00:27:09,950 --> 00:27:12,910 Because I've got the two elements in the numerator. 506 00:27:12,910 --> 00:27:14,590 Why am I done? 507 00:27:14,590 --> 00:27:18,630 Because the denominator is the same for all the classes. 508 00:27:18,630 --> 00:27:21,490 It's just the probability of the evidence. 509 00:27:21,490 --> 00:27:22,870 And then I could just sum everything up. 510 00:27:22,870 --> 00:27:25,440 I know it adds to 1 anyway. 511 00:27:25,440 --> 00:27:27,440 So that's cool. 512 00:27:27,440 --> 00:27:31,240 But sometimes there's evidence-- 513 00:27:31,240 --> 00:27:33,980 actually there's more than one piece of evidence. 514 00:27:33,980 --> 00:27:35,660 Let's say that there's some class, 515 00:27:35,660 --> 00:27:38,330 some i, and we're trying to figure out if that's the 516 00:27:38,330 --> 00:27:39,720 correct class. 517 00:27:39,720 --> 00:27:43,000 So we've got c sub i there and c sub i there. 518 00:27:43,000 --> 00:27:46,420 And suppose that that evidence is actually a bunch of pieces 519 00:27:46,420 --> 00:27:47,780 of evidence. 520 00:27:47,780 --> 00:27:56,580 So it could be e sub 1, e sub n, oops, 521 00:27:56,580 --> 00:27:58,880 premature right bracket.
522 00:27:58,880 --> 00:28:03,430 All that evidence, given the class i times the probability 523 00:28:03,430 --> 00:28:07,820 of the class i over some denominator that we don't care 524 00:28:07,820 --> 00:28:11,700 about because it's going to be the same for everybody. 525 00:28:11,700 --> 00:28:15,110 So we'll just write that as d. 526 00:28:15,110 --> 00:28:19,160 Now what if these pieces of evidence are all independent 527 00:28:19,160 --> 00:28:20,410 given the class? 528 00:28:22,720 --> 00:28:25,590 So if you have the swine flu, the probability you have a 529 00:28:25,590 --> 00:28:28,117 fever is independent of the probability you're going to 530 00:28:28,117 --> 00:28:31,610 throw up, say. 531 00:28:31,610 --> 00:28:34,280 Then can we write this another way? 532 00:28:34,280 --> 00:28:35,150 An easier way? 533 00:28:35,150 --> 00:28:36,580 Sure. 534 00:28:36,580 --> 00:28:39,180 Because when things are independent, the joint 535 00:28:39,180 --> 00:28:44,330 probability is equal to the product of the individual 536 00:28:44,330 --> 00:28:45,190 probabilities. 537 00:28:45,190 --> 00:28:46,910 So that is to say-- 538 00:28:46,910 --> 00:28:48,670 it's easier to see it if you write it down than if 539 00:28:48,670 --> 00:28:49,700 you just say it-- 540 00:28:49,700 --> 00:28:54,200 this probability here from these two elements here is 541 00:28:54,200 --> 00:28:59,280 equal to the probability of e sub 1 conditioned on c sub i 542 00:28:59,280 --> 00:29:04,970 times the probability of e sub 2 conditioned on c sub i, all 543 00:29:04,970 --> 00:29:08,080 the way down to the probability of e sub n 544 00:29:08,080 --> 00:29:14,800 conditioned on c sub i divided by some denominator we don't 545 00:29:14,800 --> 00:29:16,480 care about. 546 00:29:16,480 --> 00:29:18,860 See, what I'm going to try to do is I'm going to go through 547 00:29:18,860 --> 00:29:22,240 this for all the ci and see which one's the biggest. 
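Written down rather than spoken, the naive Bayes step looks like this. The symbols c_i, e_1 through e_n, and d are the lecture's; the second line holds only under the stated assumption that the pieces of evidence are independent given the class, and the prior P(c_i) stays in the numerator throughout.

```latex
\begin{align*}
P(c_i \mid e_1, \ldots, e_n)
  &= \frac{P(e_1, \ldots, e_n \mid c_i)\,P(c_i)}{d} \\
  &= \frac{P(e_1 \mid c_i)\,P(e_2 \mid c_i) \cdots P(e_n \mid c_i)\,P(c_i)}{d}
\end{align*}
```

Because d is the same for every class, the winning class is simply the c_i with the largest numerator.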
548 00:29:22,240 --> 00:29:23,620 STUDENT: That's the [INAUDIBLE] ci, right? 549 00:29:26,390 --> 00:29:29,070 PROFESSOR PATRICK WINSTON: This is the probability of-- 550 00:29:29,070 --> 00:29:31,470 STUDENT: [INAUDIBLE] 551 00:29:31,470 --> 00:29:33,870 right-hand side [INAUDIBLE]. 552 00:29:33,870 --> 00:29:35,990 PROFESSOR PATRICK WINSTON: Right here? 553 00:29:35,990 --> 00:29:38,820 Oh yes, you're quite right. 554 00:29:38,820 --> 00:29:40,070 Oh yeah, thanks. 555 00:29:45,270 --> 00:29:46,775 I can't write and think at the same time. 556 00:29:46,775 --> 00:29:49,410 Thanks. 557 00:29:49,410 --> 00:29:49,770 OK. 558 00:29:49,770 --> 00:29:51,700 So I just figure out which one of these is the biggest. 559 00:29:51,700 --> 00:29:54,150 And I've identified the class. 560 00:29:54,150 --> 00:29:57,490 Now you say to me, well, I would like to see an example. 561 00:29:57,490 --> 00:29:59,870 So-- 562 00:29:59,870 --> 00:30:02,050 I don't know, does anyone have any spare change? 563 00:30:04,660 --> 00:30:07,300 A nickel, a quarter. 564 00:30:07,300 --> 00:30:11,190 This is not because of infinitesimally low raises 565 00:30:11,190 --> 00:30:11,960 here at MIT. 566 00:30:11,960 --> 00:30:15,180 I just need it for a demonstration. 567 00:30:15,180 --> 00:30:17,180 I need two coins. 568 00:30:17,180 --> 00:30:19,140 Don't forget to get these back, I tend to be-- 569 00:30:19,140 --> 00:30:23,280 Now suppose these two coins are not exactly the same. 570 00:30:23,280 --> 00:30:28,200 One of these coins is a legitimate, highly-prized 571 00:30:28,200 --> 00:30:30,160 American quarter. 572 00:30:30,160 --> 00:30:32,100 The other one is a fake. 573 00:30:32,100 --> 00:30:33,960 And with this one, the probability of heads, let us 574 00:30:33,960 --> 00:30:39,050 say, is 0.8 instead of 0.5. 575 00:30:39,050 --> 00:30:41,710 So I mix these all up. 576 00:30:41,710 --> 00:30:43,960 And I pick one. 577 00:30:43,960 --> 00:30:46,520 And I start flipping it.
578 00:30:46,520 --> 00:30:49,540 And I get a head. 579 00:30:49,540 --> 00:30:52,390 Then I flip it again. 580 00:30:52,390 --> 00:30:55,040 And I get a tail. 581 00:30:55,040 --> 00:30:58,750 Which coin did I pick? 582 00:30:58,750 --> 00:31:02,480 Well, we're going to use this stuff to figure it out. 583 00:31:02,480 --> 00:31:03,730 Here's what happens. 584 00:31:16,940 --> 00:31:18,500 Before I forget. 585 00:31:18,500 --> 00:31:20,440 Thank you very much. 586 00:31:20,440 --> 00:31:22,430 So what we've done is we've selected these 587 00:31:22,430 --> 00:31:23,810 things from my hands. 588 00:31:23,810 --> 00:31:24,390 And I can't draw hands. 589 00:31:24,390 --> 00:31:27,210 So I'll draw a little cup here. 590 00:31:27,210 --> 00:31:28,700 And there are two coins in here. 591 00:31:28,700 --> 00:31:29,610 And we're going to pick one. 592 00:31:29,610 --> 00:31:35,690 And one has a probability of heads equal to 0.8. 593 00:31:35,690 --> 00:31:40,913 And this one has a probability of a head of 0.5. 594 00:31:43,480 --> 00:31:45,900 So here's the draw. 595 00:31:45,900 --> 00:31:47,060 I pick one. 596 00:31:47,060 --> 00:31:48,840 Each has a probability of 0.5. 597 00:31:52,260 --> 00:31:56,020 This one is the one with the 0.8 as the 598 00:31:56,020 --> 00:31:57,300 probability of head. 599 00:31:57,300 --> 00:31:58,500 And this one is the one with the 600 00:31:58,500 --> 00:32:03,270 probability of 0.5 as a head. 601 00:32:03,270 --> 00:32:05,570 OK? 602 00:32:05,570 --> 00:32:12,910 So now suppose the first flips as it was is T. Well, that's a 603 00:32:12,910 --> 00:32:14,382 piece of evidence. 604 00:32:14,382 --> 00:32:15,310 That's here. 605 00:32:15,310 --> 00:32:18,230 Probability of evidence given the class.
606 00:32:18,230 --> 00:32:22,240 Well in the case of having drawn this biased coin, the 607 00:32:22,240 --> 00:32:29,740 probability of coming up with a tail-- ah, let's say a head, 608 00:32:29,740 --> 00:32:30,890 just to make my numbers a little easier. 609 00:32:30,890 --> 00:32:38,010 Probability of coming out there with a head is equal to 0.8 610 00:32:38,010 --> 00:32:40,810 given that it's up here in this choice. 611 00:32:40,810 --> 00:32:46,265 The probability given that you have a fair coin is 0.5. 612 00:32:49,260 --> 00:32:59,230 So now if we take the next coin and take it to be a tail 613 00:32:59,230 --> 00:33:02,990 then the probability of this guy given 614 00:33:02,990 --> 00:33:06,610 that evidence is 0.2. 615 00:33:06,610 --> 00:33:08,900 And the probability of this guy given that evidence-- it's 616 00:33:08,900 --> 00:33:11,050 a fair coin, so it doesn't care. 617 00:33:11,050 --> 00:33:12,300 It's still 0.5. 618 00:33:14,250 --> 00:33:16,100 So now what's the probability of this 619 00:33:16,100 --> 00:33:19,710 class given this evidence? 620 00:33:19,710 --> 00:33:25,804 It's the product 0.5 times 0.8 times 0.2. 621 00:33:25,804 --> 00:33:28,590 And what's the probability of this guy? 622 00:33:28,590 --> 00:33:34,800 It's 0.5 times 0.5 times 0.5, divided by a denominator which 623 00:33:34,800 --> 00:33:37,520 is the same in both cases. 624 00:33:37,520 --> 00:33:40,230 So let's forget about this early 0.5 here. 625 00:33:40,230 --> 00:33:41,650 Because it's the same in both cases. 626 00:33:48,300 --> 00:33:50,660 And we just multiply those numbers together. 627 00:33:50,660 --> 00:33:52,940 That gives us 0.8 times 0.2. 628 00:33:52,940 --> 00:33:53,440 What's that? 629 00:33:53,440 --> 00:33:57,030 0.16? 630 00:33:57,030 --> 00:34:01,010 And this guy, 0.5 times 0.5, that's 0.25. 631 00:34:01,010 --> 00:34:04,610 So it looks an awful lot like-- with this combination-- 632 00:34:04,610 --> 00:34:08,050 that I've picked the coin that's fair.
633 00:34:08,050 --> 00:34:10,260 One more flip? 634 00:34:10,260 --> 00:34:11,870 So let's flip it again, and suppose we 635 00:34:11,870 --> 00:34:13,498 come up with a head. 636 00:34:13,498 --> 00:34:16,520 So that puts a 0.8 in here. 637 00:34:16,520 --> 00:34:19,170 And 0.5 in here. 638 00:34:19,170 --> 00:34:29,590 When you multiply those out that's 0.125. 639 00:34:29,590 --> 00:34:34,980 And this is 0.128. 640 00:34:34,980 --> 00:34:37,830 So it's about equal. 641 00:34:37,830 --> 00:34:40,333 So you see how that works? 642 00:34:40,333 --> 00:34:41,630 All right. 643 00:34:41,630 --> 00:34:44,560 So we're using the coin flips as evidence to figure out 644 00:34:44,560 --> 00:34:47,239 which class is involved. 645 00:34:47,239 --> 00:34:49,840 OK so I don't know, you'd probably like to see a 646 00:34:49,840 --> 00:34:51,370 demonstration of this, too, right? 647 00:34:51,370 --> 00:34:56,270 You say to me, gosh, just two kinds of coins. 648 00:34:56,270 --> 00:34:57,970 That's not very interesting. 649 00:34:57,970 --> 00:34:59,570 Let's try five kinds of coins. 650 00:35:03,660 --> 00:35:08,650 So what I want to show you is how the probabilities for all 651 00:35:08,650 --> 00:35:11,130 these coins-- there are five of them, color-coded-- 652 00:35:11,130 --> 00:35:15,700 how the probabilities vary with a series of flips. 653 00:35:15,700 --> 00:35:18,010 Let's suppose I've got a head-- 654 00:35:18,010 --> 00:35:22,080 the grey line, by the way, is the fraction of heads-- 655 00:35:22,080 --> 00:35:22,820 so that's going to be one. 656 00:35:22,820 --> 00:35:24,500 Because I'm just doing heads. 657 00:35:24,500 --> 00:35:27,830 You see that black line rising? 658 00:35:27,830 --> 00:35:29,250 Should look like a rocket. 659 00:35:29,250 --> 00:35:32,500 That's the probability that the-- 660 00:35:32,500 --> 00:35:35,060 that's the coin which only shows heads, the probability 661 00:35:35,060 --> 00:35:36,310 of head is 1. 
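The two-coin arithmetic on the board can be checked with a few lines of Python. This is a minimal sketch, not the demonstration program itself: the function name `coin_posterior` and the "H"/"T" encoding are assumptions made for illustration, while the priors of 0.5 and the head probabilities 0.8 and 0.5 are the lecture's numbers.

```python
def coin_posterior(priors, p_heads, flips):
    """Return P(coin | flips) for each coin via naive Bayes.

    priors  -- dict mapping coin name to its prior probability
    p_heads -- dict mapping coin name to its probability of heads
    flips   -- sequence of "H" and "T", treated as independent given the coin
    """
    scores = {}
    for coin, prior in priors.items():
        score = prior
        for flip in flips:
            score *= p_heads[coin] if flip == "H" else 1.0 - p_heads[coin]
        scores[coin] = score
    # The shared denominator is the sum of the numerators over all coins.
    # If every score is zero (evidence impossible under all coins) this divides by zero.
    total = sum(scores.values())
    return {coin: score / total for coin, score in scores.items()}


priors = {"biased": 0.5, "fair": 0.5}
p_heads = {"biased": 0.8, "fair": 0.5}
after_two = coin_posterior(priors, p_heads, "HT")     # numerators 0.08 and 0.125
after_three = coin_posterior(priors, p_heads, "HTH")  # numerators 0.064 and 0.0625
```

After head, tail the fair coin leads; after the third flip, a head, the two coins are nearly tied with the biased one very slightly ahead, which is the 0.128 against 0.125 comparison once the shared prior of 0.5 is dropped.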
662 00:35:38,760 --> 00:35:42,370 And I'm flipping a whole bunch of heads here. 663 00:35:42,370 --> 00:35:43,980 Isn't that cool? 664 00:35:43,980 --> 00:35:46,920 Now what happens if I suddenly put in a tail? 665 00:35:46,920 --> 00:35:50,280 By the way, you'll note, here on the extreme left-- 666 00:35:50,280 --> 00:35:57,300 the initial probability of the P=0 coin was 0.1. 667 00:35:57,300 --> 00:35:59,950 As soon as I flipped a head that went to 0. 668 00:35:59,950 --> 00:36:02,930 And it will never get off 0, right? 669 00:36:02,930 --> 00:36:03,450 That makes sense. 670 00:36:03,450 --> 00:36:05,570 Because if the probability that you'll get a head is 1 671 00:36:05,570 --> 00:36:06,760 you should never see a tail. 672 00:36:06,760 --> 00:36:09,030 If you ever do, that isn't your coin. 673 00:36:09,030 --> 00:36:13,260 What happens now if I interrupt a series of heads 674 00:36:13,260 --> 00:36:14,957 and produce a tail? 675 00:36:14,957 --> 00:36:16,418 STUDENT: [INAUDIBLE]. 676 00:36:16,418 --> 00:36:17,392 PROFESSOR PATRICK WINSTON: What's that? 677 00:36:17,392 --> 00:36:18,860 STUDENT: [INAUDIBLE]. 678 00:36:18,860 --> 00:36:20,240 PROFESSOR PATRICK WINSTON: The black one will go to 0. 679 00:36:20,240 --> 00:36:21,655 What else happens? 680 00:36:21,655 --> 00:36:24,470 By the way, the blue one is the one with the highest 681 00:36:24,470 --> 00:36:27,400 probability of being a head. 682 00:36:27,400 --> 00:36:28,670 [INAUDIBLE] 683 00:36:28,670 --> 00:36:29,850 Boom! 684 00:36:29,850 --> 00:36:31,720 That blue one shot up. 685 00:36:31,720 --> 00:36:32,910 Not going up slowly. 686 00:36:32,910 --> 00:36:35,320 It shot up. 687 00:36:35,320 --> 00:36:37,640 Because now the preponderance of evidence with all those 688 00:36:37,640 --> 00:36:43,130 heads is that I've flipped the coin with a bias of 0.75 689 00:36:43,130 --> 00:36:44,820 towards heads. 690 00:36:44,820 --> 00:36:46,540 So let's clear this.
691 00:36:46,540 --> 00:36:47,680 Pick any probability you want. 692 00:36:47,680 --> 00:36:50,160 0.25, 0.5, and so on. 693 00:36:50,160 --> 00:36:52,260 I don't know, let's pick 0.25 since we've been 694 00:36:52,260 --> 00:36:53,510 at the upper end. 695 00:36:58,100 --> 00:36:59,650 So orange is 0.25. 696 00:36:59,650 --> 00:37:02,200 And sure enough, the probability that I've selected 697 00:37:02,200 --> 00:37:06,900 the 0.25 coin is going up and up and up and up after the 698 00:37:06,900 --> 00:37:08,540 original irregularity. 699 00:37:08,540 --> 00:37:10,610 The Law of Large Numbers is setting in. 700 00:37:10,610 --> 00:37:14,030 And a probability that I've got that 0.25 coin in play is 701 00:37:14,030 --> 00:37:16,660 pretty close to 1. 702 00:37:16,660 --> 00:37:16,910 All right. 703 00:37:16,910 --> 00:37:19,360 So that's cool. 704 00:37:19,360 --> 00:37:23,590 Now you say to me, that's awfully nice but stop. 705 00:37:23,590 --> 00:37:30,280 Awfully nice, but not very real-world-ish. 706 00:37:30,280 --> 00:37:33,520 So let me give you another problem. 707 00:37:33,520 --> 00:37:38,930 It's well-known that you are, with high probability, of the 708 00:37:38,930 --> 00:37:42,660 same political persuasion as your parents. 709 00:37:42,660 --> 00:37:46,620 So if I wanted to figure out which party a parent belongs 710 00:37:46,620 --> 00:37:50,140 to, I could look at the party that their 711 00:37:50,140 --> 00:37:53,490 children belong to, right? 712 00:37:53,490 --> 00:37:57,090 So it's just like flipping coins. 713 00:37:57,090 --> 00:37:59,770 The particular coin I have chosen 714 00:37:59,770 --> 00:38:01,680 corresponds to the parent. 715 00:38:01,680 --> 00:38:05,620 Individual flips correspond to the political party that the 716 00:38:05,620 --> 00:38:07,070 child belongs to. 717 00:38:07,070 --> 00:38:08,200 So let's get up a little bit-- 718 00:38:08,200 --> 00:38:09,890 by the way, I wrote all this stuff over the weekend.
719 00:38:09,890 --> 00:38:11,610 So who knows if any of it will work. 720 00:38:11,610 --> 00:38:13,430 But let's see. 721 00:38:13,430 --> 00:38:16,040 A parent party classifier. 722 00:38:16,040 --> 00:38:18,530 There it is, Democrats and Republicans. 723 00:38:18,530 --> 00:38:22,330 And now the prior for being a Republican given here is 0.5. 724 00:38:24,920 --> 00:38:27,300 But I don't know, this is a little bit Democratic state. 725 00:38:27,300 --> 00:38:31,780 So let's adjust that down a little bit. 726 00:38:31,780 --> 00:38:33,640 Somewhere in there might be about right. But let's just, 727 00:38:33,640 --> 00:38:37,370 for the sake of a classroom illustration, go down here. 728 00:38:37,370 --> 00:38:39,650 So now the meter is showing the prior probability because 729 00:38:39,650 --> 00:38:41,510 that's the only thing in the formula so far. 730 00:38:41,510 --> 00:38:43,340 I've got no evidence. 731 00:38:43,340 --> 00:38:45,610 So now let's suppose that child number one is a 732 00:38:45,610 --> 00:38:48,380 Republican. 733 00:38:48,380 --> 00:38:50,850 Back to neutral. 734 00:38:50,850 --> 00:38:53,910 So I've got a low probability that the parent-- 735 00:38:53,910 --> 00:38:59,810 a priori probability that the parent is a Republican and a 736 00:38:59,810 --> 00:39:01,870 child who's a Republican. 737 00:39:01,870 --> 00:39:05,670 I notice that 0.2 and 0.8, the conditional is 0.8. 738 00:39:05,670 --> 00:39:06,620 And the prior is 0.2. 739 00:39:06,620 --> 00:39:09,690 That's why it comes out to balance each other, right? 740 00:39:09,690 --> 00:39:11,230 So now if we get another Republican in 741 00:39:11,230 --> 00:39:14,080 there it goes way up. 742 00:39:14,080 --> 00:39:16,860 If I have a Democratic child it goes back down. 743 00:39:16,860 --> 00:39:19,180 If I have an equal balance between children then it goes 744 00:39:19,180 --> 00:39:22,320 way back down because of that prior probability being low.
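The meter readings in this demonstration follow from the same formula as the coins. Here is a small sketch, assuming (as the demonstration appears to) that each child shares the parent's party with a single probability, 0.8 here, and that children are independent given the parent. The function name `prob_parent_republican` and the "R"/"D" encoding are hypothetical; the prior of 0.2 and the conditional of 0.8 are the values read off in the lecture.

```python
def prob_parent_republican(prior_r, p_same, children):
    """P(parent is Republican | children's parties), naive Bayes.

    prior_r  -- prior probability that the parent is a Republican
    p_same   -- probability that a child has the same party as the parent
    children -- sequence of "R" or "D", one entry per child
    """
    score_r = prior_r          # numerator for "parent is Republican"
    score_d = 1.0 - prior_r    # numerator for "parent is Democrat"
    for child in children:
        if child == "R":
            score_r *= p_same
            score_d *= 1.0 - p_same
        else:
            score_r *= 1.0 - p_same
            score_d *= p_same
    return score_r / (score_r + score_d)


no_evidence = prob_parent_republican(0.2, 0.8, [])            # just the prior
one_child = prob_parent_republican(0.2, 0.8, ["R"])           # back to neutral
two_children = prob_parent_republican(0.2, 0.8, ["R", "R"])   # way up
balanced = prob_parent_republican(0.2, 0.8, ["R", "D"])       # back down to the prior
```

With one Republican child the numerators are 0.2 times 0.8 on one side and 0.8 times 0.2 on the other, which is why the meter sits at neutral; with balanced children the evidence cancels and only the low prior is left.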
745 00:39:22,320 --> 00:39:27,280 So if I make that high, even though the children are 746 00:39:27,280 --> 00:39:29,865 balanced, I'm still going to have a high probability of 747 00:39:29,865 --> 00:39:32,250 being a Republican. 748 00:39:32,250 --> 00:39:33,140 Now let's see. 749 00:39:33,140 --> 00:39:36,180 If I take that slider there, the conditional probability, 750 00:39:36,180 --> 00:39:39,800 and drive it to the left here-- let me make that 751 00:39:39,800 --> 00:39:42,180 equally in. 752 00:39:42,180 --> 00:39:44,760 And let's make that one thing. 753 00:39:44,760 --> 00:39:45,620 I don't know. 754 00:39:45,620 --> 00:39:47,766 What am I doing now? 755 00:39:47,766 --> 00:39:52,460 If I make the probability less than 0.5, what's that mean? 756 00:39:52,460 --> 00:39:54,510 That means you're sore at your parents and you want to belong 757 00:39:54,510 --> 00:39:57,720 to a different party. 758 00:39:57,720 --> 00:40:02,200 All right, so now, what's next? 759 00:40:02,200 --> 00:40:04,650 Oh gosh. 760 00:40:04,650 --> 00:40:05,900 What's next? 761 00:40:09,620 --> 00:40:10,870 This is what's next. 762 00:40:15,720 --> 00:40:18,220 What's next to somewhere? 763 00:40:18,220 --> 00:40:20,440 Yeah, this is what's next. 764 00:40:20,440 --> 00:40:21,210 This here. 765 00:40:21,210 --> 00:40:22,460 We've got two models. 766 00:40:24,400 --> 00:40:28,070 Remember when I said we wanted to decide between them? 767 00:40:28,070 --> 00:40:30,930 Can we use that Bayesian hack to do that, too? 768 00:40:30,930 --> 00:40:31,580 Sure. 769 00:40:31,580 --> 00:40:34,790 Because we've got these two models. 770 00:40:34,790 --> 00:40:37,770 We've got the probabilities in them. 
771 00:40:37,770 --> 00:40:42,490 So now I can take my data and calculate the probability of a 772 00:40:42,490 --> 00:40:45,620 left model given the data and the probability of the right 773 00:40:45,620 --> 00:40:48,780 model given the data, multiply that times their a priori 774 00:40:48,780 --> 00:40:51,860 probabilities, which I'll assume are equal. 775 00:40:51,860 --> 00:40:54,710 Then I can do a model selection deal much in 776 00:40:54,710 --> 00:40:57,250 defiance to what I was hinting at before. 777 00:40:57,250 --> 00:40:58,500 So let's try that. 778 00:41:03,330 --> 00:41:05,720 Whoa. 779 00:41:05,720 --> 00:41:08,915 There are my two models. 780 00:41:08,915 --> 00:41:10,460 Yes, there they are. 781 00:41:10,460 --> 00:41:11,850 We've already trained them up. 782 00:41:11,850 --> 00:41:13,990 And they've got their probabilities. 783 00:41:13,990 --> 00:41:15,630 Now what we're going to do is we're going to use the 784 00:41:15,630 --> 00:41:18,950 original model to simulate the data. 785 00:41:18,950 --> 00:41:21,350 So what we're going to do is we're going to simulate draws, 786 00:41:21,350 --> 00:41:25,770 simulate events, simulate combinations of all variables 787 00:41:25,770 --> 00:41:30,626 using a model that looks like the one on the left, that is 788 00:41:30,626 --> 00:41:32,530 the one on the left except for the slight differences in 789 00:41:32,530 --> 00:41:34,320 probabilities, OK? 790 00:41:34,320 --> 00:41:36,570 Then we're going to do this Bayesian thing and see where 791 00:41:36,570 --> 00:41:37,880 the meter goes. 792 00:41:37,880 --> 00:41:40,020 So we'll run one data point. 793 00:41:40,020 --> 00:41:41,960 Oops, went the wrong way. 794 00:41:41,960 --> 00:41:42,960 Makes me nervous. 795 00:41:42,960 --> 00:41:44,750 I just finished this at 9:15. 796 00:41:44,750 --> 00:41:46,610 Maybe there's a bug. 797 00:41:46,610 --> 00:41:49,720 Oops, two data points, swings to the left.
798 00:41:49,720 --> 00:41:51,140 Three data points, back to the right. 799 00:41:51,140 --> 00:41:54,070 Of course that's not much data. 800 00:41:54,070 --> 00:41:56,685 So let's put some more data in. 801 00:41:56,685 --> 00:41:57,130 Yeah. 802 00:41:57,130 --> 00:41:58,730 Boom, there it goes. 803 00:41:58,730 --> 00:41:59,480 Let's try that again. 804 00:41:59,480 --> 00:42:01,060 That was cool. 805 00:42:01,060 --> 00:42:07,050 So let's run 1,000 simulations and one data point. 806 00:42:07,050 --> 00:42:08,780 It bobbles around a little bit and goes 807 00:42:08,780 --> 00:42:09,640 flat over to the left. 808 00:42:09,640 --> 00:42:12,630 Because that is the model that reflects the one that the data 809 00:42:12,630 --> 00:42:15,480 is generated from. 810 00:42:15,480 --> 00:42:19,150 So now we got Bayesian classification, except now the 811 00:42:19,150 --> 00:42:21,790 classification has gone one step more and it becomes 812 00:42:21,790 --> 00:42:23,440 structure discovery. 813 00:42:23,440 --> 00:42:25,700 We've got two choices of structure. 814 00:42:25,700 --> 00:42:28,660 And we can use this Bayesian thing to decide which of the 815 00:42:28,660 --> 00:42:31,060 two structures is best. 816 00:42:31,060 --> 00:42:32,150 Isn't that cool? 817 00:42:32,150 --> 00:42:34,230 Well, it's only cool if you could do what? 818 00:42:38,950 --> 00:42:42,710 So if you had two choices-- 819 00:42:42,710 --> 00:42:46,210 you can select between them and pick the best one-- 820 00:42:46,210 --> 00:42:47,490 but there are-- 821 00:42:47,490 --> 00:42:51,120 gosh, for this number of variables, there are a whole 822 00:42:51,120 --> 00:42:55,540 lot of different networks that satisfy the no looping 823 00:42:55,540 --> 00:43:00,952 criteria and don't have very many parents. 824 00:43:00,952 --> 00:43:02,770 There's an awful lot of them. 
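Choosing between two candidate structures with the Bayesian calculation can be sketched as follows. This is an illustration under assumptions, not the demonstration program: a model is represented as a hypothetical dictionary mapping each variable to its parents and a table of P(variable is true | parent values), every table entry is assumed to lie strictly between 0 and 1, and the toy two-variable models and their numbers are invented for the example.

```python
import math


def log_prob_row(model, row):
    """log P(row | model) for a net given as {var: (parents, table)}.

    table maps a tuple of parent values to P(var is True | those values).
    Entries must be strictly between 0 and 1, or math.log will fail.
    """
    total = 0.0
    for var, (parents, table) in model.items():
        p_true = table[tuple(row[p] for p in parents)]
        total += math.log(p_true if row[var] else 1.0 - p_true)
    return total


def posterior_of_first(model_a, model_b, data, prior_a=0.5):
    """P(model_a | data) when model_a and model_b are the only candidates."""
    log_a = math.log(prior_a) + sum(log_prob_row(model_a, r) for r in data)
    log_b = math.log(1.0 - prior_a) + sum(log_prob_row(model_b, r) for r in data)
    # Subtract the larger log score before exponentiating to avoid underflow.
    m = max(log_a, log_b)
    return math.exp(log_a - m) / (math.exp(log_a - m) + math.exp(log_b - m))


# Toy example: does the dog care about the raccoon or not?
dog_cares = {
    "raccoon": ((), {(): 0.5}),
    "dog": (("raccoon",), {(True,): 0.9, (False,): 0.1}),
}
dog_ignores = {
    "raccoon": ((), {(): 0.5}),
    "dog": ((), {(): 0.5}),
}
data = [{"raccoon": True, "dog": True}] * 10 + [{"raccoon": False, "dog": False}] * 10
meter = posterior_of_first(dog_cares, dog_ignores, data)
```

With no data the function returns the prior of 0.5; with data in which the dog's barking tracks the raccoon, `meter` swings almost all the way toward the model that includes that link.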
825 00:43:02,770 --> 00:43:05,450 In fact, if you restrict this network to two parents there 826 00:43:05,450 --> 00:43:08,260 are probably thousands and thousands of possible 827 00:43:08,260 --> 00:43:09,590 structures. 828 00:43:09,590 --> 00:43:10,840 So do I try them all? 829 00:43:13,390 --> 00:43:14,340 Probably not. 830 00:43:14,340 --> 00:43:16,370 It's too much work when you get 30 variables or 831 00:43:16,370 --> 00:43:19,320 something like that. 832 00:43:19,320 --> 00:43:20,450 So what do you do? 833 00:43:20,450 --> 00:43:21,740 We know what to do, right? 834 00:43:21,740 --> 00:43:23,390 We're almost veterans of 6.034. 835 00:43:23,390 --> 00:43:25,750 We have to search! 836 00:43:25,750 --> 00:43:30,270 So what we do is we take the loser and we modify it. 837 00:43:30,270 --> 00:43:31,320 And then we modify it again. 838 00:43:31,320 --> 00:43:38,120 And we keep modifying it until we drop dead or we get 839 00:43:38,120 --> 00:43:40,450 something that we're happy with. 840 00:43:40,450 --> 00:43:43,480 So let's see what happens if we change this problem a 841 00:43:43,480 --> 00:43:45,030 little bit and do structure discovery. 842 00:43:45,030 --> 00:43:47,740 We're starting out with nothing linked. 843 00:43:47,740 --> 00:43:50,380 And we're going to just start running this guy. 844 00:43:50,380 --> 00:43:51,600 So what's going to happen is that the 845 00:43:51,600 --> 00:43:53,690 good guy will prevail. 846 00:43:53,690 --> 00:43:55,990 And the bad guy will be a copy of the good guy 847 00:43:55,990 --> 00:43:57,150 perturbed in some way. 848 00:43:57,150 --> 00:44:00,280 So it's a random search. 849 00:44:00,280 --> 00:44:01,380 You'll notice that score-- 850 00:44:01,380 --> 00:44:03,170 it's too small for you to read. 851 00:44:03,170 --> 00:44:04,460 All these things are too small to read. 852 00:44:04,460 --> 00:44:05,710 Let me make it a little bigger.
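The search loop being described (keep the good guy, make the challenger a perturbed copy, let the better one prevail) can be sketched generically. This is a hedged outline, not the program shown in class: `random_restart_search`, `score`, `random_candidate`, and `perturb` are hypothetical names, and the `restart_prob` parameter anticipates the occasional random restart that the lecture mentions shortly. The toy problem at the bottom searches over integers rather than network structures, purely to keep the example short.

```python
import random


def random_restart_search(score, random_candidate, perturb, steps,
                          restart_prob=0.1, seed=0):
    """Random search that keeps the best candidate seen so far.

    score            -- function from candidate to a number (higher is better)
    random_candidate -- function of rng returning a fresh random candidate
    perturb          -- function of (candidate, rng) returning a modified copy
    """
    rng = random.Random(seed)
    current = random_candidate(rng)
    best, best_score = current, score(current)
    for _ in range(steps):
        if rng.random() < restart_prob:
            current = random_candidate(rng)        # radical rearrangement
        else:
            challenger = perturb(current, rng)     # perturbed copy of the good guy
            if score(challenger) > score(current):
                current = challenger               # the better one prevails
        current_score = score(current)
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score


# Toy stand-in for structure search: find the integer with the highest score.
def toy_score(x):
    return -(x - 42) ** 2


def toy_random(rng):
    return rng.randint(0, 100)


def toy_perturb(x, rng):
    return x + rng.choice([-1, 1])


best, best_score = random_restart_search(toy_score, toy_random, toy_perturb,
                                         steps=2000, restart_prob=0.0)
```

For real structure discovery the candidate would be a set of links, `perturb` would add, remove, or reverse one link subject to the no-looping constraint, and `score` would be the fit of the resulting model to the data.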
853 00:44:08,230 --> 00:44:11,130 Too small to read, but that number on the right there is 854 00:44:11,130 --> 00:44:14,240 not the product of the probabilities, actually. 855 00:44:14,240 --> 00:44:19,850 It's the sum of the logarithms of the probabilities. 856 00:44:19,850 --> 00:44:21,780 They go together, right? 857 00:44:21,780 --> 00:44:23,440 And the reason you use this instead of the probabilities 858 00:44:23,440 --> 00:44:27,590 is because these numbers get so small that with a 32-bit 859 00:44:27,590 --> 00:44:29,740 machine, you eventually lose. 860 00:44:29,740 --> 00:44:34,730 So use the log of the probabilities rather than the 861 00:44:34,730 --> 00:44:35,630 product of the probabilities. 862 00:44:35,630 --> 00:44:37,660 You use the sum of the logs instead of the product of the 863 00:44:37,660 --> 00:44:39,840 probabilities. 864 00:44:39,840 --> 00:44:43,210 And eventually, you hope that this thing converges on the 865 00:44:43,210 --> 00:44:44,760 correct interpretation. 866 00:44:44,760 --> 00:44:45,470 But you know what? 867 00:44:45,470 --> 00:44:50,900 This thing is so flat as a space and so large and so 868 00:44:50,900 --> 00:44:56,490 telephone pole-like that it's full of local maxima. 869 00:44:56,490 --> 00:45:00,670 So what this program is doing is every once in awhile-- 870 00:45:00,670 --> 00:45:02,880 I think with probability 1 in 10; I forgot what 871 00:45:02,880 --> 00:45:03,883 parameters I used-- 872 00:45:03,883 --> 00:45:08,410 every once in awhile, it'll do a total radical rearrangement 873 00:45:08,410 --> 00:45:09,150 of the structures. 874 00:45:09,150 --> 00:45:11,830 In other words, it's a random restart. 875 00:45:11,830 --> 00:45:13,820 It keeps track of the best guy so far. 876 00:45:13,820 --> 00:45:16,370 And every once in awhile it does a totally random restart 877 00:45:16,370 --> 00:45:18,870 in its effort to search the space.
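The reason for scoring with a sum of logarithms is easy to see numerically. The sketch below uses Python's 64-bit floats rather than the 32-bit arithmetic mentioned in the lecture, so the underflow simply arrives a little later; the function names and the choice of 2,000 factors of 0.5 are assumptions made for illustration.

```python
import math


def product_score(probs):
    """Multiply the probabilities directly; underflows for long sequences."""
    return math.prod(probs)


def log_score(probs):
    """Sum of log probabilities; ranks models the same way without underflow."""
    return sum(math.log(p) for p in probs)


many_flips = [0.5] * 2000
direct = product_score(many_flips)   # 2**-2000 is too small for a double: 0.0
logged = log_score(many_flips)       # about -1386.3, still perfectly usable
```

Because the logarithm is monotonic, the model with the larger sum of logs is also the model with the larger product, so nothing is lost by comparing scores in log space.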
878 00:45:18,870 --> 00:45:23,130 So that's how you go from probabilistic inference to 879 00:45:23,130 --> 00:45:25,300 structure discovery. 880 00:45:25,300 --> 00:45:29,560 Now when is this stuff actually useful? 881 00:45:29,560 --> 00:45:32,600 Well, I hinted at a medical diagnosis, right? 882 00:45:32,600 --> 00:45:34,890 That's a situation where you've got some symptoms. 883 00:45:34,890 --> 00:45:38,160 And you want to know what the disease is. 884 00:45:38,160 --> 00:45:42,470 So as soon as you use the keyword "diagnosis," you've 885 00:45:42,470 --> 00:45:45,840 got a problem for which this stuff is a candidate. 886 00:45:45,840 --> 00:45:48,640 So what other kinds of diagnosis problems are there? 887 00:45:48,640 --> 00:45:51,360 Well, you might be lying to me. 888 00:45:51,360 --> 00:45:53,350 So I can put a lie detector on you. 889 00:45:53,350 --> 00:45:55,980 And each of those variables that are measured by the lie 890 00:45:55,980 --> 00:45:58,370 detector are an independent indication whether you're 891 00:45:58,370 --> 00:45:59,910 telling the truth or not. 892 00:45:59,910 --> 00:46:04,300 So it's this kind of Bayesian discovery thing. 893 00:46:04,300 --> 00:46:07,410 Naive Bayesian Classification. 894 00:46:07,410 --> 00:46:09,972 What other kinds of problems speak to 895 00:46:09,972 --> 00:46:11,130 the issue of diagnosis? 896 00:46:11,130 --> 00:46:14,410 Well, we like to know how well you know the material! 897 00:46:14,410 --> 00:46:19,110 So we can use quizzes as pieces of evidence. 898 00:46:19,110 --> 00:46:21,590 Thank god we don't use exactly a naive Bayesian classifier, 899 00:46:21,590 --> 00:46:24,600 because then we wouldn't be able to do that combination. 900 00:46:24,600 --> 00:46:27,020 We have to use a slightly more complex-- 901 00:46:27,020 --> 00:46:30,280 what you can think of as a slightly more complex Bayesian 902 00:46:30,280 --> 00:46:33,430 net to do that particular kind of diagnosis. 
903 00:46:33,430 --> 00:46:36,950 You might have a spacecraft or an airplane or other piece of 904 00:46:36,950 --> 00:46:38,950 equipment with all sorts of symptoms. 905 00:46:38,950 --> 00:46:40,530 You're trying to figure out what to do next, 906 00:46:40,530 --> 00:46:42,100 what the cause is. 907 00:46:42,100 --> 00:46:46,210 So using the evidence to go backward to the cause. 908 00:46:46,210 --> 00:46:49,470 So maybe you've got some program that doesn't work. 909 00:46:49,470 --> 00:46:51,390 Happens to me a lot. 910 00:46:51,390 --> 00:46:54,580 So I use the evidence from the symptoms of the misbehavior to 911 00:46:54,580 --> 00:46:58,480 figure out what the most probable cause is. 912 00:46:58,480 --> 00:47:03,020 But now to conclude the day-- last time there weren't any 913 00:47:03,020 --> 00:47:03,890 powerful ideas. 914 00:47:03,890 --> 00:47:08,090 But if you take the combination of the last 915 00:47:08,090 --> 00:47:14,170 lecture and this lecture to be a candidate for gold star 916 00:47:14,170 --> 00:47:17,740 ideas, these are the ones I'd like to leave you with. 917 00:47:17,740 --> 00:47:19,750 We got here is-- 918 00:47:19,750 --> 00:47:22,420 this Bayesian stuff, all these probabilistic calculations are 919 00:47:22,420 --> 00:47:23,965 the right thing to do. 920 00:47:23,965 --> 00:47:30,270 They're the right way to work when you don't know anything, 921 00:47:30,270 --> 00:47:32,600 which would make it sound like it's not very useful, 922 00:47:32,600 --> 00:47:34,890 because you think you always-- well, in fact, there are a lot 923 00:47:34,890 --> 00:47:38,020 of situations where you either can't know everything, don't 924 00:47:38,020 --> 00:47:40,420 have time to know everything, or don't want to take the 925 00:47:40,420 --> 00:47:42,630 effort to know everything. 926 00:47:42,630 --> 00:47:47,190 So in medical diagnosis all you've got is the symptoms.
927 00:47:47,190 --> 00:47:49,730 You can't go in there and figure out in a more precise 928 00:47:49,730 --> 00:47:50,750 way exactly what's wrong. 929 00:47:50,750 --> 00:47:53,730 So you use the symptoms to determine what the cause is. 930 00:47:53,730 --> 00:47:56,850 And then all those other kinds of cases that I mentioned. 931 00:47:56,850 --> 00:48:00,520 But now, what other kinds of structure discovery are there? 932 00:48:00,520 --> 00:48:02,760 Well, the kind of structure discovery that I hinted at in 933 00:48:02,760 --> 00:48:05,550 the beginning will be the subject that we'll begin with 934 00:48:05,550 --> 00:48:10,330 during our next and sadly final conversation here in 935 00:48:10,330 --> 00:48:11,700 [? 10250 ?] 936 00:48:11,700 --> 00:48:13,320 on Wednesday. 937 00:48:13,320 --> 00:48:16,210 It will feature not only a discussion of how this stuff 938 00:48:16,210 --> 00:48:19,890 can be used to discover patterns and stories, but 939 00:48:19,890 --> 00:48:24,590 we'll also talk about what's on the final, what kind of 940 00:48:24,590 --> 00:48:28,870 thing you could do next, that sort of thing to finish off 941 00:48:28,870 --> 00:48:29,420 the subject. 942 00:48:29,420 --> 00:48:31,710 And that's the end of the story for today.