1 00:00:09,082 --> 00:00:13,380 PATRICK WINSTON: Here we are, down to the final sprint. 2 00:00:13,380 --> 00:00:15,480 Three to go. 3 00:00:15,480 --> 00:00:17,955 And we're going to take some of the last three, maybe two 4 00:00:17,955 --> 00:00:22,640 of the last three, to talk a little bit about stuff having 5 00:00:22,640 --> 00:00:24,920 to do with probabilistic approaches-- 6 00:00:24,920 --> 00:00:28,400 use of probability in artificial intelligence. 7 00:00:28,400 --> 00:00:31,060 Now, for many of you, this will be kind of a review, 8 00:00:31,060 --> 00:00:33,370 because I know many of you learned about probability over 9 00:00:33,370 --> 00:00:37,230 the [? sand ?] table and every year since then. 10 00:00:37,230 --> 00:00:39,780 But maybe we'll put another little twist into it, 11 00:00:39,780 --> 00:00:42,340 especially toward the end of the hour when we get into a 12 00:00:42,340 --> 00:00:52,160 discussion of that which has come to be called belief nets. 13 00:00:52,160 --> 00:00:58,530 But first, I was driving in this morning, and I was quite 14 00:00:58,530 --> 00:01:05,670 astonished to see, as I drove in, this thing here. 15 00:01:05,670 --> 00:01:09,720 And my first reaction was, oh my god, it's the world's 16 00:01:09,720 --> 00:01:12,710 greatest hack. 17 00:01:12,710 --> 00:01:16,525 And then I decided, well, maybe it's a piece of art. 18 00:01:19,170 --> 00:01:22,420 So I'd like to address the question of how I could come 19 00:01:22,420 --> 00:01:23,890 to grips with that issue. 20 00:01:23,890 --> 00:01:28,530 There's a distinct possibility that this thing is a 21 00:01:28,530 --> 00:01:32,070 consequence of a hack, possibly the result of 22 00:01:32,070 --> 00:01:33,320 some kind of art show. 23 00:01:36,740 --> 00:01:43,960 And in any event, some sort of statue appeared, and statues 24 00:01:43,960 --> 00:01:46,190 don't usually appear like that. 
25 00:01:46,190 --> 00:01:48,710 So I got the possibility of thinking about how all these 26 00:01:48,710 --> 00:01:54,150 things might occur together or not occur together. 27 00:01:54,150 --> 00:02:00,670 So the natural thing is to build myself some sort of 28 00:02:00,670 --> 00:02:05,640 table to keep track of my observations. 29 00:02:05,640 --> 00:02:08,478 So I have three columns in my table. 30 00:02:08,478 --> 00:02:13,680 I've got the possibility of a statue appearing, a hack 31 00:02:13,680 --> 00:02:17,960 having occurred, and some sort of art show. 32 00:02:17,960 --> 00:02:22,090 And so I can make a table of all the combinations of those 33 00:02:22,090 --> 00:02:23,340 things that might appear. 34 00:02:28,290 --> 00:02:32,520 And I happen to have already guessed that there are going 35 00:02:32,520 --> 00:02:36,010 to be eight rows in my table. 36 00:02:36,010 --> 00:02:37,310 So it's going to look like this. 37 00:02:44,430 --> 00:02:47,460 And this is the set of combinations in this row where 38 00:02:47,460 --> 00:02:50,110 none of that occurs at all. 39 00:02:50,110 --> 00:02:53,030 And down here is the situation where all of 40 00:02:53,030 --> 00:02:54,150 those things occur. 41 00:02:54,150 --> 00:02:57,620 After all, it's possible that we can have an art show and 42 00:02:57,620 --> 00:03:00,320 have a hack be a legitimate participant in the art show. 43 00:03:00,320 --> 00:03:03,440 That's why we have that final row. 44 00:03:03,440 --> 00:03:05,665 So we have all manner of combinations in between. 45 00:03:08,810 --> 00:03:11,063 So those are those combinations. 46 00:03:11,063 --> 00:03:21,880 Then we have F, F, T, T, F, F, T, T, F, T, F, T, F, T, F, T. 47 00:03:21,880 --> 00:03:26,150 So it's plain that the number of rows in the table, or these 48 00:03:26,150 --> 00:03:31,820 binary possibilities, is 2 to the number of variables. 49 00:03:31,820 --> 00:03:33,610 And that could be a big number. 
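The row layout described here can be generated mechanically rather than written out by hand. A minimal sketch, assuming Python; the names `truth_table`, `statue`, `hack`, and `art_show` are labels chosen for this illustration, and the column order is an assumption, since the lecture only names the three columns:

```python
from itertools import product

def truth_table(variables):
    """Return every True/False combination of the variables, one dict per row."""
    return [dict(zip(variables, values))
            for values in product([False, True], repeat=len(variables))]

# Three binary variables give 2 ** 3 rows, from all-False down to all-True.
rows = truth_table(["statue", "hack", "art_show"])
for row in rows:
    print(row)
print(len(rows))
```

The first row is the case where nothing occurs and the last row is the case where everything occurs, matching the description of the table's top and bottom rows.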
50 00:03:33,610 --> 00:03:38,329 In fact, I'd love to do a bigger example, but I don't 51 00:03:38,329 --> 00:03:40,795 have the patience to do it. 52 00:03:40,795 --> 00:03:44,940 But anyhow, what we might do is in order to figure out how 53 00:03:44,940 --> 00:03:47,970 likely any of these combinations are, is we might 54 00:03:47,970 --> 00:03:50,590 have observed the area outside the student center and rest of 55 00:03:50,590 --> 00:03:54,370 campus over a long period of time and keep track of what 56 00:03:54,370 --> 00:03:58,020 happens on 1,000 days. 57 00:03:58,020 --> 00:04:01,230 Or maybe 1,000 months or 1,000 years. 58 00:04:01,230 --> 00:04:02,730 I don't know. 59 00:04:02,730 --> 00:04:04,980 The trouble is, these events don't happen very often. 60 00:04:04,980 --> 00:04:08,690 So the period of time that I use for measurement needs to 61 00:04:08,690 --> 00:04:09,440 be fairly long. 62 00:04:09,440 --> 00:04:10,950 Probably a day is not long enough. 63 00:04:10,950 --> 00:04:16,720 But in any case, I can keep a tally of how often I see these 64 00:04:16,720 --> 00:04:18,420 various combinations. 65 00:04:18,420 --> 00:04:22,330 So this one might be, for example, 405, this one might 66 00:04:22,330 --> 00:04:26,820 be 45, this one might be 225, this one might 67 00:04:26,820 --> 00:04:30,000 be 40, and so on. 68 00:04:30,000 --> 00:04:32,720 And so having done all those measurements, kept track of 69 00:04:32,720 --> 00:04:37,340 all that data, then I could say, well, the probability 70 00:04:37,340 --> 00:04:45,350 that at any given time period one of these things occurs 71 00:04:45,350 --> 00:04:48,280 will just be the frequency-- 72 00:04:48,280 --> 00:04:49,670 the number of tallies divided by the 73 00:04:49,670 --> 00:04:51,760 total number of tallies. 74 00:04:51,760 --> 00:04:53,400 So that would be a number between 0 and 1. 75 00:04:56,330 --> 00:05:00,990 So that's the probability for each of these events. 
76 00:05:00,990 --> 00:05:03,690 And it's readily calculated from my data. 77 00:05:03,690 --> 00:05:09,020 And once I do that, then I can say that I got myself a joint 78 00:05:09,020 --> 00:05:14,160 probability table, and I could perform all manner of miracles 79 00:05:14,160 --> 00:05:17,220 using that joint probability table. 80 00:05:17,220 --> 00:05:18,950 So let me perform a few of those miracles, 81 00:05:18,950 --> 00:05:20,200 while we're at it. 82 00:05:28,240 --> 00:05:29,730 There's the table. 83 00:05:29,730 --> 00:05:36,010 And now, what I want to do is I want to count up the 84 00:05:36,010 --> 00:05:39,280 probability in all the rows where the statue appears. 85 00:05:39,280 --> 00:05:41,740 So that's going to be the probability 86 00:05:41,740 --> 00:05:43,770 of the statue appearing. 87 00:05:43,770 --> 00:05:47,990 So I'll just check off those four boxes there. 88 00:05:47,990 --> 00:05:49,800 And it looks like the probability of the statue 89 00:05:49,800 --> 00:05:54,210 appearing is about 0.355 in my model. 90 00:05:54,210 --> 00:05:56,320 I don't think it's quite that frequent, but this is a 91 00:05:56,320 --> 00:05:57,590 classroom exercise, right? 92 00:05:57,590 --> 00:06:01,670 So I can make up whatever numbers I want. 93 00:06:04,560 --> 00:06:09,170 Now, I could say, well, what's the probability of a statue 94 00:06:09,170 --> 00:06:15,220 occurring given that there's an art show? 95 00:06:15,220 --> 00:06:18,180 Well, I can limit my tallies to those in which art show is 96 00:06:18,180 --> 00:06:22,030 true, like so. 97 00:06:22,030 --> 00:06:23,210 And in that case, the probability 98 00:06:23,210 --> 00:06:24,410 has just zoomed up. 99 00:06:24,410 --> 00:06:26,490 So if I know there's an art show, there's a much higher 100 00:06:26,490 --> 00:06:30,885 probability that a statue will appear. 
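The box-checking in the demo amounts to summing tallies over the rows that match a condition and dividing by the tallies of the rows still under consideration. A rough sketch of that idea, assuming Python; the tallies below are made up for illustration and are not the numbers from the lecture's demo, and the function names are hypothetical:

```python
# Illustrative tallies only: these are NOT the lecture's numbers.
# Key order is (statue, hack, art_show); each value counts observation periods.
TALLIES = {
    (False, False, False): 500,
    (False, False, True): 20,
    (False, True, False): 150,
    (False, True, True): 10,
    (True, False, False): 50,
    (True, False, True): 40,
    (True, True, False): 200,
    (True, True, True): 30,
}

def statue(row):
    return row[0]

def hack(row):
    return row[1]

def art_show(row):
    return row[2]

def probability(event, given=lambda row: True):
    """P(event | given): matching tallies over the tallies of the rows kept by `given`."""
    kept = {row: n for row, n in TALLIES.items() if given(row)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no observations satisfy the condition")
    return sum(n for row, n in kept.items() if event(row)) / total

p_statue = probability(statue)
p_statue_given_art = probability(statue, given=art_show)
p_hack_given_statue = probability(hack, given=statue)
p_hack_given_statue_and_art = probability(
    hack, given=lambda row: statue(row) and art_show(row))
print(p_statue, p_statue_given_art, p_hack_given_statue, p_hack_given_statue_and_art)
```

With these invented tallies, knowing about an art show raises the probability of a statue, and knowing about the art show as well as the statue lowers the probability of a hack. The direction of those changes is the point; the specific values depend entirely on the made-up data.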
101 00:06:33,540 --> 00:06:37,400 And if I know there's a hack as well as an art show going 102 00:06:37,400 --> 00:06:41,750 on, it goes up higher still to 0.9. 103 00:06:41,750 --> 00:06:43,180 We can also do other kinds of things. 104 00:06:43,180 --> 00:06:46,060 For example, we can go back to the original table. 105 00:06:46,060 --> 00:06:53,450 And instead of counting up the probability we've got a 106 00:06:53,450 --> 00:06:56,310 statue, as we just did, we're going to calculate the 107 00:06:56,310 --> 00:06:59,680 probability that there is an art show. 108 00:06:59,680 --> 00:07:02,800 I guess that would be that one and that one, not that one, 109 00:07:02,800 --> 00:07:03,890 but that one. 110 00:07:03,890 --> 00:07:09,310 So the probability there's an art show is one chance in 10. 111 00:07:09,310 --> 00:07:11,030 Or we can do the same thing with a hack. 112 00:07:11,030 --> 00:07:16,085 In that case, we get that one off, that one on, that one 113 00:07:16,085 --> 00:07:17,565 off, that one on, that one off, that one 114 00:07:17,565 --> 00:07:18,930 on, that one off. 115 00:07:18,930 --> 00:07:21,420 So the probability of a hack on any given time period is 116 00:07:21,420 --> 00:07:25,230 about 50-50. 117 00:07:25,230 --> 00:07:27,750 So I've cooked up this little demo so it does the "ands" of 118 00:07:27,750 --> 00:07:28,240 all these things. 119 00:07:28,240 --> 00:07:29,980 It could do "ors," too, with a little more work. 120 00:07:29,980 --> 00:07:33,130 But these are just the "ands" of these various combinations. 121 00:07:33,130 --> 00:07:35,270 Then you can ask more complicated questions, like 122 00:07:35,270 --> 00:07:38,130 for example, you could say, what is the probability of a 123 00:07:38,130 --> 00:07:43,540 hack given that there's a statue? 124 00:07:43,540 --> 00:07:48,620 And that would be limiting the calculations to those rows in 125 00:07:48,620 --> 00:07:50,075 which the statue thing is true. 
126 00:07:52,840 --> 00:07:57,430 And then what I get is 0.781. 127 00:07:57,430 --> 00:08:04,010 Now, what would happen to the probability that it's a hack 128 00:08:04,010 --> 00:08:05,375 if I know that there's an art show? 129 00:08:07,930 --> 00:08:09,280 Will that number go up or down? 130 00:08:12,580 --> 00:08:15,010 Well, let's try it. 131 00:08:15,010 --> 00:08:17,600 Ah, it went down. 132 00:08:17,600 --> 00:08:21,530 So that's sort of because the existence of the art show sort 133 00:08:21,530 --> 00:08:27,810 of explains why the statue might be there. 134 00:08:27,810 --> 00:08:29,660 Now, just for fun, I'm going to switch to another 135 00:08:29,660 --> 00:08:31,610 situation, very similar. 136 00:08:31,610 --> 00:08:39,100 And the situation here is that a neighbor's dog often barks. 137 00:08:39,100 --> 00:08:40,770 It might be because of a burglar. 138 00:08:40,770 --> 00:08:42,480 It might be because of a raccoon. 139 00:08:42,480 --> 00:08:45,660 Sometimes, there's a burglar and a raccoon. 140 00:08:45,660 --> 00:08:48,900 Sometimes, the damn dog just barks. 141 00:08:48,900 --> 00:08:55,370 So let's do some calculations there and calculate the 142 00:08:55,370 --> 00:08:58,310 probability that a raccoon is true, similar to 143 00:08:58,310 --> 00:09:00,660 what we did last time. 144 00:09:00,660 --> 00:09:03,240 Looks like on any given night-- 145 00:09:03,240 --> 00:09:05,550 it's kind of a wooded area-- there's a high probability of 146 00:09:05,550 --> 00:09:08,600 a raccoon showing up. 147 00:09:08,600 --> 00:09:16,130 And then we can ask, well, what is the probability of the 148 00:09:16,130 --> 00:09:19,670 dog barking given that a raccoon shows up? 149 00:09:19,670 --> 00:09:21,960 Well, in that case, we want to just limit the number of rows 150 00:09:21,960 --> 00:09:23,190 to those where a raccoon-- 151 00:09:23,190 --> 00:09:26,430 or where the dog is barking. 
152 00:09:26,430 --> 00:09:30,320 Looks like the probability of the dog barking, knowing 153 00:09:30,320 --> 00:09:32,410 nothing else, is about [? 3/7. ?] 154 00:09:36,790 --> 00:09:40,290 But now we want to know the probability of the raccoon-- 155 00:09:40,290 --> 00:09:43,030 that's these guys here need to get checked. 156 00:09:43,030 --> 00:09:44,570 These are off. 157 00:09:44,570 --> 00:09:46,115 So that's the probability of a raccoon. 158 00:09:49,400 --> 00:09:52,490 Did I get that right? 159 00:09:52,490 --> 00:09:54,340 Oh, that's probability of a burglar. 160 00:09:54,340 --> 00:09:55,590 Sorry, that was too hard. 161 00:09:57,540 --> 00:09:59,550 So let me go back and calculate-- 162 00:09:59,550 --> 00:10:02,050 I want to get the probability of a raccoon. 163 00:10:02,050 --> 00:10:10,930 That's true, false, true, false, true, false, true. 164 00:10:10,930 --> 00:10:12,560 So the probability of a raccoon, as I 165 00:10:12,560 --> 00:10:14,570 said before is 0.5. 166 00:10:14,570 --> 00:10:18,220 Now, what happens to that probability if I know the dog 167 00:10:18,220 --> 00:10:20,050 is barking? 168 00:10:20,050 --> 00:10:23,690 Well, all I need to do is limit my rows to those where 169 00:10:23,690 --> 00:10:26,510 the dog is barking, those bottom four. 170 00:10:26,510 --> 00:10:28,800 And I'll click that there, and you'll notice all these 171 00:10:28,800 --> 00:10:33,690 tallies up above the midpoint have gone to zero, because 172 00:10:33,690 --> 00:10:35,380 we're only considering those cases 173 00:10:35,380 --> 00:10:37,590 where the dog is barking. 174 00:10:37,590 --> 00:10:40,140 In that case, the probability that there's a raccoon-- 175 00:10:40,140 --> 00:10:41,500 just the number of tallies over the 176 00:10:41,500 --> 00:10:43,740 total number of tallies-- 177 00:10:43,740 --> 00:10:48,150 gee, I guess it's 225 plus 50 divided by 370. 178 00:10:48,150 --> 00:10:51,050 That turns out to be 0.743. 
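Written out, the arithmetic quoted here uses only the rows in which the dog barks: 370 tallies in total, of which the raccoon rows contribute 225 and 50. The symbols R (raccoon present) and D (dog barking) are shorthand introduced for this note, not notation from the lecture.

```latex
P(R \mid D) = \frac{225 + 50}{370} = \frac{275}{370} \approx 0.743
```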
179 00:10:51,050 --> 00:10:56,400 So about 75% of the time, the dog barking is accounted for-- 180 00:10:56,400 --> 00:10:59,680 well, the probability of a raccoon under those conditions 181 00:10:59,680 --> 00:11:01,560 is pretty high. 182 00:11:01,560 --> 00:11:04,560 And now, once again, I'm going to ask, well, what is the 183 00:11:04,560 --> 00:11:08,810 probability of a raccoon, given that the dog is barking 184 00:11:08,810 --> 00:11:12,170 and there's a burglar? 185 00:11:12,170 --> 00:11:14,040 Any guess what will happen there? 186 00:11:14,040 --> 00:11:18,120 We did this once before with the statue. 187 00:11:18,120 --> 00:11:20,680 Probability first went up when we saw the statue and then 188 00:11:20,680 --> 00:11:23,340 went down when we saw another explanation. 189 00:11:23,340 --> 00:11:24,860 Here's this one here. 190 00:11:24,860 --> 00:11:25,930 Wow, look at that. 191 00:11:25,930 --> 00:11:29,830 It went back to its original condition, its a priori 192 00:11:29,830 --> 00:11:32,120 probability. 193 00:11:32,120 --> 00:11:35,850 So somehow, the existence of the burglar and the dog 194 00:11:35,850 --> 00:11:39,740 barking means that the probability of a raccoon is 195 00:11:39,740 --> 00:11:42,402 just what it was before we started this game. 196 00:11:42,402 --> 00:11:44,350 So those are kind of interesting questions, and 197 00:11:44,350 --> 00:11:47,090 there's a lot we can do when we have this table by way of 198 00:11:47,090 --> 00:11:50,220 those kinds of calculations. 199 00:11:50,220 --> 00:11:54,760 And in fact, the whole miracle of probabilistic inference is 200 00:11:54,760 --> 00:11:55,480 right in front of us. 201 00:11:55,480 --> 00:11:58,130 It's the table. 202 00:11:58,130 --> 00:12:00,060 So why don't we go home? 
203 00:12:00,060 --> 00:12:03,980 Well, because there's a little problem with this table-- 204 00:12:03,980 --> 00:12:06,370 with these two tables that I've shown you by way of 205 00:12:06,370 --> 00:12:08,210 illustration. 206 00:12:08,210 --> 00:12:17,250 And the problem is that there are a lot of rows. 207 00:12:17,250 --> 00:12:19,160 And I had a hard time making up those numbers. 208 00:12:19,160 --> 00:12:21,910 I didn't have the patience to wait and make observations. 209 00:12:21,910 --> 00:12:23,580 That would take too long. 210 00:12:23,580 --> 00:12:25,910 So I had to kind of make some guesses. 211 00:12:25,910 --> 00:12:29,730 And I could kind of manage it with eight rows-- 212 00:12:29,730 --> 00:12:31,290 those up there. 213 00:12:31,290 --> 00:12:33,740 I could put in some tallies. 214 00:12:33,740 --> 00:12:35,670 It wasn't that big of a deal. 215 00:12:35,670 --> 00:12:39,330 So I got myself all those eight numbers 216 00:12:39,330 --> 00:12:42,760 up there like that. 217 00:12:42,760 --> 00:12:48,130 And similarly, for the art show calculations, produced 218 00:12:48,130 --> 00:12:50,000 eight numbers. 219 00:12:50,000 --> 00:12:53,250 But what if I added something else to the mix? 220 00:12:53,250 --> 00:12:57,310 What if I added the day of the week or 221 00:12:57,310 --> 00:12:59,530 what I had for breakfast? 222 00:12:59,530 --> 00:13:03,000 Each of those things would double the number of rows of 223 00:13:03,000 --> 00:13:06,350 their binary variables. 224 00:13:06,350 --> 00:13:12,860 So if I have to consider 10 influences all working 225 00:13:12,860 --> 00:13:14,790 together, then I'd have 2 to the 10th. 226 00:13:14,790 --> 00:13:18,810 I'd have 1,000 numbers to deal with. 227 00:13:18,810 --> 00:13:21,020 And that would be hard. 228 00:13:21,020 --> 00:13:23,110 But if I had a joint probability table, then I can 229 00:13:23,110 --> 00:13:24,850 do these kinds of miracles. 
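The doubling argument can be made concrete with a couple of lines. A small sketch, assuming Python; `joint_table_rows` is a name invented for this illustration. Note that the "1,000 numbers" quoted for ten variables is a rounding of 1,024.

```python
def joint_table_rows(num_binary_variables):
    """Number of rows in a full joint probability table over binary variables."""
    return 2 ** num_binary_variables

# Each added binary variable doubles the number of rows to fill in.
for n in (3, 4, 10, 20):
    print(n, joint_table_rows(n))
```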
230 00:13:24,850 --> 00:13:27,640 But Dave, if I could have this little projector now, please. 231 00:13:31,570 --> 00:13:34,780 I just want to emphasize that although we're talking about 232 00:13:34,780 --> 00:13:38,430 probabilistic inference, and it's a very powerful tool, 233 00:13:38,430 --> 00:13:41,500 it's not the only tool we need in our bag. 234 00:13:41,500 --> 00:13:44,080 Trouble with most ideas in artificial intelligence is 235 00:13:44,080 --> 00:13:46,630 that their hardcore proponents think that they're the only 236 00:13:46,630 --> 00:13:48,420 thing to do. 237 00:13:48,420 --> 00:13:52,720 And probabilistic inference has a role to play in 238 00:13:52,720 --> 00:13:54,550 developing a theory of human intelligence. 239 00:13:54,550 --> 00:13:56,880 And it certainly has a practical value, but it's not 240 00:13:56,880 --> 00:13:58,070 the only thing. 241 00:13:58,070 --> 00:14:01,300 And to illustrate that point, I'd like to imagine for a few 242 00:14:01,300 --> 00:14:10,920 moments that MIT were founded in 1861 BC instead of 1861 AD. 243 00:14:10,920 --> 00:14:15,660 And if that were so, then it might be the case that there 244 00:14:15,660 --> 00:14:19,220 would be a research program on what floats. 245 00:14:19,220 --> 00:14:21,980 And this, of course, would be a problem in experimental 246 00:14:21,980 --> 00:14:25,880 physics, and we could imagine that those people back there 247 00:14:25,880 --> 00:14:29,800 in that early MIT would, being experimentally minded, try 248 00:14:29,800 --> 00:14:30,900 some things. 249 00:14:30,900 --> 00:14:33,210 Oh, I didn't know that's what happened. 250 00:14:33,210 --> 00:14:35,660 It looks like chalk floats. 251 00:14:35,660 --> 00:14:38,384 Here's a rock. 252 00:14:38,384 --> 00:14:40,710 No, it didn't float. 253 00:14:40,710 --> 00:14:43,300 Here's some money. 254 00:14:43,300 --> 00:14:44,910 Doesn't float. 255 00:14:44,910 --> 00:14:46,160 Here's a pencil. 
256 00:14:48,630 --> 00:14:49,600 No, it doesn't float. 257 00:14:49,600 --> 00:14:51,690 Here's a pen. 258 00:14:51,690 --> 00:14:54,075 Here's a piece of tin foil I got from Kendra. 259 00:14:54,075 --> 00:14:55,490 That floats. 260 00:14:55,490 --> 00:14:56,130 That's a metal. 261 00:14:56,130 --> 00:14:57,180 The other stuff's metal, too. 262 00:14:57,180 --> 00:14:58,830 Now I'm really getting confused. 263 00:14:58,830 --> 00:15:01,530 Here's a little wad of paper. 264 00:15:01,530 --> 00:15:04,670 Here's a cell ph-- no, actually, 265 00:15:04,670 --> 00:15:05,660 I've tried that before. 266 00:15:05,660 --> 00:15:06,910 They don't float. 267 00:15:06,910 --> 00:15:08,240 And they also don't work afterward, either. 268 00:15:10,950 --> 00:15:16,840 I don't need to do any of that in the MIT of 1861 AD and 269 00:15:16,840 --> 00:15:19,330 beyond, because I know that Archimedes 270 00:15:19,330 --> 00:15:20,410 worked this all out. 271 00:15:20,410 --> 00:15:22,300 And all I have to do is measure the volume of the 272 00:15:22,300 --> 00:15:27,380 stuff, divide that by the weight, and if that ratio is 273 00:15:27,380 --> 00:15:29,970 big enough, then the thing will float. 274 00:15:29,970 --> 00:15:32,220 But back in the old days, I would have to try a lot of 275 00:15:32,220 --> 00:15:35,470 stuff and make a big table, taking into account such 276 00:15:35,470 --> 00:15:40,400 factors as how hard it is, how big it is, how heavy it is, 277 00:15:40,400 --> 00:15:42,740 whether it's alive or not. 278 00:15:42,740 --> 00:15:44,540 Most things that are alive float. 279 00:15:44,540 --> 00:15:46,290 Some don't. 280 00:15:46,290 --> 00:15:49,030 Fish don't, for instance. 281 00:15:49,030 --> 00:15:52,790 So it would be foolhardy to do that. 282 00:15:52,790 --> 00:15:56,580 That's sort of a probabilistic inference. 
283 00:15:56,580 --> 00:15:58,430 On the other hand, there are lots of things where I don't 284 00:15:58,430 --> 00:16:00,480 know all the stuff I need to know in order to make the 285 00:16:00,480 --> 00:16:01,600 calculation. 286 00:16:01,600 --> 00:16:03,490 I know all the stuff I need to know in order to decide if 287 00:16:03,490 --> 00:16:06,530 something floats, but not all the stuff I need to know in 288 00:16:06,530 --> 00:16:14,210 order, for example, to decide if the child of a Republican 289 00:16:14,210 --> 00:16:17,860 is likely to be a Republican. 290 00:16:17,860 --> 00:16:20,390 There are a lot of subtle influences there, and it is 291 00:16:20,390 --> 00:16:23,365 the case that the children of Republicans and the children 292 00:16:23,365 --> 00:16:26,010 of Democrats are more likely to share the political party 293 00:16:26,010 --> 00:16:28,360 of their parents. 294 00:16:28,360 --> 00:16:30,280 But I don't have any direct way of calculating whether 295 00:16:30,280 --> 00:16:32,310 that will be true or not. 296 00:16:32,310 --> 00:16:35,590 All I can do in that case is what I've done over here, is 297 00:16:35,590 --> 00:16:38,950 do some measurements, get some frequencies, take some 298 00:16:38,950 --> 00:16:42,440 snapshots of the way the world is and incorporate that into a 299 00:16:42,440 --> 00:16:45,630 set of probabilities that can help me determine if any given 300 00:16:45,630 --> 00:16:50,100 parent is a Republican, given that I've observed the voting 301 00:16:50,100 --> 00:16:52,930 behavior of their children. 302 00:16:52,930 --> 00:16:56,010 So probability has a place, but it's not the 303 00:16:56,010 --> 00:16:57,760 only tool we need. 304 00:16:57,760 --> 00:17:00,250 And that is an important preamble to all the stuff 305 00:17:00,250 --> 00:17:02,200 we're going to do today. 
306 00:17:02,200 --> 00:17:04,770 Now, we're really through, because this joint probability 307 00:17:04,770 --> 00:17:08,240 table is all that there is to it, except for the fact we 308 00:17:08,240 --> 00:17:13,290 can't either record all those numbers, and it becomes 309 00:17:13,290 --> 00:17:16,579 quickly a pain to guess at them. 310 00:17:16,579 --> 00:17:19,348 There are two ways to think about all this. 311 00:17:19,348 --> 00:17:22,880 We can think about these probabilities as probabilities 312 00:17:22,880 --> 00:17:25,230 that come out of looking at some data. 313 00:17:25,230 --> 00:17:28,180 That's a frequentist view of the probabilities. 314 00:17:28,180 --> 00:17:30,310 Or we could say, well, we can't do those measurements. 315 00:17:30,310 --> 00:17:32,500 So I can just make them up. 316 00:17:32,500 --> 00:17:34,820 That's sort of the subjective view of where these 317 00:17:34,820 --> 00:17:37,530 probabilities come from. 318 00:17:37,530 --> 00:17:41,480 And in some cases, some people like to talk about natural 319 00:17:41,480 --> 00:17:44,790 propensities, like in quantum mechanics. 320 00:17:44,790 --> 00:17:47,330 But for our purposes, we either make them up, or we do 321 00:17:47,330 --> 00:17:49,140 some tallying. 322 00:17:49,140 --> 00:17:52,440 Trouble is, we can't deal with this kind of table. 323 00:17:52,440 --> 00:17:54,920 So as a consequence of not being able to deal with this 324 00:17:54,920 --> 00:18:00,020 kind of table, a gigantic industry has emerged for 325 00:18:00,020 --> 00:18:04,370 dealing with probabilities without the need to work up 326 00:18:04,370 --> 00:18:06,340 this full table. 327 00:18:06,340 --> 00:18:07,570 And that's where we're going to go for 328 00:18:07,570 --> 00:18:08,820 the rest of the hour. 329 00:18:12,620 --> 00:18:15,408 And here's the path we're going to take. 330 00:18:15,408 --> 00:18:18,050 We're going to talk about some basic overview of basic 331 00:18:18,050 --> 00:18:19,380 probability. 
332 00:18:19,380 --> 00:18:23,080 Then we're going to move ourselves step by step toward 333 00:18:23,080 --> 00:18:26,500 the so-called belief networks, which make it possible to make 334 00:18:26,500 --> 00:18:29,550 this a practical tool. 335 00:18:29,550 --> 00:18:31,120 So let us begin. 336 00:18:31,120 --> 00:18:33,950 The first thing is basic probability. 337 00:18:33,950 --> 00:18:36,660 Let us say basic. 338 00:18:39,270 --> 00:18:41,060 And basic probability-- 339 00:18:41,060 --> 00:18:44,830 all probability flows from a small number of axioms. 340 00:18:44,830 --> 00:18:50,730 We have the probability of some event a has got to be 341 00:18:50,730 --> 00:18:54,400 greater than 0 and less than 1. 342 00:18:54,400 --> 00:18:55,890 That's axiom number one. 343 00:18:59,460 --> 00:19:02,745 In a binary world, things have a probability of being true. 344 00:19:02,745 --> 00:19:05,190 Some have a probability of being false. 345 00:19:05,190 --> 00:19:07,530 But the true event doesn't have any possibility of being 346 00:19:07,530 --> 00:19:12,470 anything other than true, so the probability of true is 347 00:19:12,470 --> 00:19:16,850 equal to 1, and the probability of false-- 348 00:19:16,850 --> 00:19:20,480 the false event, the false condition-- 349 00:19:20,480 --> 00:19:24,430 has no possibility of being true, so that's 0. 350 00:19:24,430 --> 00:19:31,190 Then the third of the axioms of probability is that the 351 00:19:31,190 --> 00:19:39,760 probability of a plus the probability of b minus the 352 00:19:39,760 --> 00:19:47,510 probability of a and b is equal to the 353 00:19:47,510 --> 00:19:51,510 probability of a or b. 354 00:19:54,040 --> 00:19:56,380 Yeah, that makes sense, right? 355 00:19:56,380 --> 00:19:58,900 I guess it would make more sense if I didn't switch my 356 00:19:58,900 --> 00:20:00,730 notation in midstream-- 357 00:20:00,730 --> 00:20:03,970 a and b. 
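For reference, the three axioms just stated can be written compactly. This is the usual textbook form, with equality allowed at both ends of the first axiom, which is consistent with the second axiom assigning exactly 1 and 0 to the true and false events; the layout is a reconstruction from the spoken description, not a copy of the board.

```latex
\begin{aligned}
&\text{(1)}\quad 0 \le P(a) \le 1 \\
&\text{(2)}\quad P(\text{true}) = 1, \qquad P(\text{false}) = 0 \\
&\text{(3)}\quad P(a) + P(b) - P(a \text{ and } b) = P(a \text{ or } b)
\end{aligned}
```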
358 00:20:03,970 --> 00:20:06,290 So those are the axioms that mathematicians love to start 359 00:20:06,290 --> 00:20:08,180 up that way, and they can derive everything there is to 360 00:20:08,180 --> 00:20:09,040 derive from that. 361 00:20:09,040 --> 00:20:12,210 But I never can deal with stuff that way. 362 00:20:12,210 --> 00:20:14,270 I have to draw a picture and think of this stuff in a more 363 00:20:14,270 --> 00:20:16,530 intuitionist type of way. 364 00:20:16,530 --> 00:20:20,530 So that's the formal approach to dealing with probability, 365 00:20:20,530 --> 00:20:28,810 and it's mirrored by intuitions that have to do 366 00:20:28,810 --> 00:20:34,410 with discussions of spaces, like so, in which we have 367 00:20:34,410 --> 00:20:42,120 circles, or areas, representing a and b. 368 00:20:42,120 --> 00:20:45,260 And to keep my notation consistent, 369 00:20:45,260 --> 00:20:46,510 I'll make those lowercase. 370 00:20:49,330 --> 00:20:53,580 So you can think of those as spaces of all possible worlds 371 00:20:53,580 --> 00:20:54,990 in which these things might occur. 372 00:20:54,990 --> 00:20:58,100 Or you can think of them as sample spaces. 373 00:20:58,100 --> 00:21:01,290 But in any event, you associate with the probability 374 00:21:01,290 --> 00:21:06,330 of a the size of this area here relative to the total 375 00:21:06,330 --> 00:21:08,860 area in the rectangle-- 376 00:21:08,860 --> 00:21:10,850 the universe. 377 00:21:10,850 --> 00:21:15,570 So the probability of a is the size of this circle divided by 378 00:21:15,570 --> 00:21:18,690 the size of this rectangle in this picture. 379 00:21:18,690 --> 00:21:22,210 So now all these axioms make sense. 380 00:21:22,210 --> 00:21:25,250 The probability that a is certain is just when that 381 00:21:25,250 --> 00:21:29,010 fills up the whole thing, and there's no other place for a 382 00:21:29,010 --> 00:21:31,590 sample to be, that means it has to be a. 
383 00:21:31,590 --> 00:21:35,570 So that probability goes all the way up to 1. 384 00:21:35,570 --> 00:21:39,450 On the other hand, if the size of a is just an infinitesimal 385 00:21:39,450 --> 00:21:44,230 dot, then the chances of landing in that world is 0. 386 00:21:44,230 --> 00:21:46,900 That's the bound on the other end. 387 00:21:46,900 --> 00:21:48,860 So this-- 388 00:21:48,860 --> 00:21:50,900 axiom number one-- makes sense in terms of that 389 00:21:50,900 --> 00:21:52,250 picture over there. 390 00:21:52,250 --> 00:21:54,290 Likewise, axiom number two. 391 00:21:54,290 --> 00:21:57,500 What about axiom number three? 392 00:21:57,500 --> 00:22:03,150 Does that make sense in terms of all this stuff? 393 00:22:03,150 --> 00:22:08,430 And the answer is, sure, because we can just look at 394 00:22:08,430 --> 00:22:12,850 those areas with a little bit of colored chalk. 395 00:22:12,850 --> 00:22:16,920 And so the probability of a is just this area here. 396 00:22:16,920 --> 00:22:21,300 The probability of b is this area here. 397 00:22:21,300 --> 00:22:23,330 And if we want to know the probability that we're in 398 00:22:23,330 --> 00:22:27,700 either a or b, then we just have to add up those areas. 399 00:22:27,700 --> 00:22:30,040 But when we add up those areas, this intersection part 400 00:22:30,040 --> 00:22:32,260 is added in twice. 401 00:22:32,260 --> 00:22:35,675 So we've got to subtract that off in order to make this 402 00:22:35,675 --> 00:22:38,300 thing make a rational equation, so that makes sense. 403 00:22:38,300 --> 00:22:40,230 And axiom three makes sense, just as 404 00:22:40,230 --> 00:22:43,110 axioms one and two did. 405 00:22:43,110 --> 00:22:45,370 So that's all there is to basic probability. 
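The area picture can be checked on a toy universe of equally likely sample points. A minimal sketch, assuming Python; the universe of 20 points and the particular regions `A` and `B` are invented for illustration:

```python
from fractions import Fraction

# Hypothetical universe: 20 equally likely sample points, numbered 0..19.
UNIVERSE = set(range(20))
A = {0, 1, 2, 3, 4, 5, 6, 7}   # region a
B = {5, 6, 7, 8, 9, 10}        # region b, overlapping a on {5, 6, 7}

def p(region):
    """Probability as relative area: points in the region over points in the universe."""
    return Fraction(len(region & UNIVERSE), len(UNIVERSE))

# Adding the two areas counts the overlap twice, so subtract it once.
lhs = p(A) + p(B) - p(A & B)
rhs = p(A | B)
print(lhs, rhs)
```

The same helper also reproduces the bounds: a region that fills the whole universe has probability 1, and an empty region has probability 0.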
406 00:22:45,370 --> 00:22:48,060 And now you could do all sorts of algebra on that, and it's 407 00:22:48,060 --> 00:22:51,420 elegant, because it's like circuit theory or 408 00:22:51,420 --> 00:22:54,240 electromagnetism, because from a very 409 00:22:54,240 --> 00:22:55,970 small number of axioms-- 410 00:22:55,970 --> 00:22:57,730 in this case three-- 411 00:22:57,730 --> 00:23:02,180 you can build an elegant mathematical system. 412 00:23:02,180 --> 00:23:03,910 And that's what probability subjects do. 413 00:23:03,910 --> 00:23:06,760 But we're not going to go there, because we're sort of 414 00:23:06,760 --> 00:23:10,740 focused on getting down to a point where we can deal with 415 00:23:10,740 --> 00:23:12,570 that joint probability table that we 416 00:23:12,570 --> 00:23:14,260 currently can't deal with. 417 00:23:14,260 --> 00:23:17,050 So we're not going to go into a whole lot of algebra with 418 00:23:17,050 --> 00:23:17,810 these things. 419 00:23:17,810 --> 00:23:22,620 Just what we need in order to go through that network. 420 00:23:22,620 --> 00:23:25,440 So the next thing we need to deal with is conditional 421 00:23:25,440 --> 00:23:27,220 probability. 422 00:23:27,220 --> 00:23:30,360 And whereas those are axioms, this is a definition. 423 00:23:35,100 --> 00:23:41,620 We say that the probability of a given b is equal to, by 424 00:23:41,620 --> 00:23:46,760 definition, the probability of a and b. 425 00:23:46,760 --> 00:23:48,880 I'm using that common notation to mean [INAUDIBLE] 426 00:23:48,880 --> 00:23:51,760 as is conventional in the field. 427 00:23:51,760 --> 00:23:57,190 And then we're going to divide that by the probability of b. 428 00:23:57,190 --> 00:23:59,390 You can take that as a definition, and then it's just 429 00:23:59,390 --> 00:24:01,970 a little bit of mysterious algebra. 
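The definition is easy to try out on a small example. A hedged sketch, assuming Python and an invented universe of 20 equally likely points; the regions and the helper names `p` and `p_given` are illustrative only:

```python
from fractions import Fraction

# Hypothetical universe of 20 equally likely sample points.
UNIVERSE = set(range(20))
A = {0, 1, 2, 3, 4, 5, 6, 7}   # region a
B = {5, 6, 7, 8, 9, 10}        # region b

def p(region):
    """Probability as the share of the universe covered by the region."""
    return Fraction(len(region & UNIVERSE), len(UNIVERSE))

def p_given(a, b):
    """P(a | b) = P(a and b) / P(b); undefined when P(b) is 0."""
    if p(b) == 0:
        raise ValueError("P(b) is 0, so P(a | b) is undefined")
    return p(a & b) / p(b)

result = p_given(A, B)
print(result)                        # share of b that also lies in a
print(result * p(B) == p(A & B))     # rearranged form: P(a and b) = P(a | b) P(b)
```

Dividing by P(b) is what makes the result a proportion of b alone rather than of the whole universe.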
430 00:24:01,970 --> 00:24:05,600 Or you could do like we did up there and take an intuitionist 431 00:24:05,600 --> 00:24:13,100 approach and ask what that stuff means in terms of a 432 00:24:13,100 --> 00:24:17,560 circle diagram and some sort of space. 433 00:24:17,560 --> 00:24:18,960 And let's see, what does that mean? 434 00:24:18,960 --> 00:24:23,320 It means that we're trying to restrict the probability of a 435 00:24:23,320 --> 00:24:29,370 to those circumstances where b is known to be so. 436 00:24:29,370 --> 00:24:30,620 And we're going to say that-- 437 00:24:33,080 --> 00:24:37,810 we've got this part here, and then we've got the 438 00:24:37,810 --> 00:24:41,370 intersection of a with b. 439 00:24:41,370 --> 00:24:44,680 And so it does make sense as a definition, because it says 440 00:24:44,680 --> 00:24:47,210 that if you've got b, then the probability that you're going 441 00:24:47,210 --> 00:24:50,190 to get a is the size of that intersection-- 442 00:24:50,190 --> 00:24:52,140 the pink and orange stuff-- 443 00:24:52,140 --> 00:24:55,240 divided by the whole of b. 444 00:24:55,240 --> 00:24:58,450 So it's as if we restricted the universe of consideration 445 00:24:58,450 --> 00:25:00,950 to just that part of the original universe 446 00:25:00,950 --> 00:25:03,220 as covered by b. 447 00:25:03,220 --> 00:25:07,370 So that makes sense as a definition. 448 00:25:07,370 --> 00:25:14,430 And we can rewrite that, of course, as P of a and b is 449 00:25:14,430 --> 00:25:19,190 equal to the probability of a given b times the 450 00:25:19,190 --> 00:25:21,000 probability of b. 451 00:25:23,740 --> 00:25:27,370 That's all basic stuff. 452 00:25:27,370 --> 00:25:31,370 Now, we do want to do a little bit of algebra here, because I 453 00:25:31,370 --> 00:25:34,570 want to consider not just two cases, but what if we divide 454 00:25:34,570 --> 00:25:37,960 this space up into three parts? 
455 00:25:37,960 --> 00:25:44,310 Then we'll say that the probability of a, b, and c is 456 00:25:44,310 --> 00:25:45,560 equal to what? 457 00:25:48,900 --> 00:25:51,000 Well, there are lots of ways to think about that. 458 00:25:51,000 --> 00:25:54,360 But one way to think about it is that we are restricting the 459 00:25:54,360 --> 00:25:56,410 universe to that part of the world where b 460 00:25:56,410 --> 00:25:59,530 and c are both true. 461 00:25:59,530 --> 00:26:03,380 So let's say that y is equal to b and c-- 462 00:26:08,330 --> 00:26:12,980 the intersection of b and c, where b and c are both true. 463 00:26:12,980 --> 00:26:18,570 Then we can use this formula over here to say that 464 00:26:18,570 --> 00:26:24,270 probability of a, b, and c is equal to the probability of a 465 00:26:24,270 --> 00:26:33,670 and y, which is equal to the probability of a given y times 466 00:26:33,670 --> 00:26:36,090 the probability of y. 467 00:26:36,090 --> 00:26:42,260 And then we can expand that back out and say that it is P of a 468 00:26:42,260 --> 00:26:48,020 given b and c times the probability of y, 469 00:26:48,020 --> 00:26:52,500 but y is equal to b and c, so that is the 470 00:26:52,500 --> 00:26:57,250 probability of b and c, like so. 471 00:27:00,990 --> 00:27:02,720 Ah, but wait-- 472 00:27:02,720 --> 00:27:06,365 we can run this idea over that one, too, and we can say that 473 00:27:06,365 --> 00:27:09,760 this whole works is equal to the probability of a given b 474 00:27:09,760 --> 00:27:16,130 and c times the probability of b given c times the 475 00:27:16,130 --> 00:27:19,480 probability of c.
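The substitution just described can be laid out step by step. This is a sketch of the board work as spoken, using the lecture's y as shorthand for "b and c"; the aligned layout is an assumption about presentation only.

```latex
\begin{aligned}
P(a, b, c) &= P(a, y), \qquad y = (b, c) \\
           &= P(a \mid y)\, P(y) \\
           &= P(a \mid b, c)\, P(b, c) \\
           &= P(a \mid b, c)\, P(b \mid c)\, P(c)
\end{aligned}
```

The last line applies the same product form of the conditional probability definition a second time, to P(b, c).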
476 00:27:19,480 --> 00:27:22,086 And now, when we stand back and let that sing to us, we 477 00:27:22,086 --> 00:27:25,010 can see that some magic is beginning to happen here, 478 00:27:25,010 --> 00:27:29,660 because we've taken this probability of all things 479 00:27:29,660 --> 00:27:34,850 being so, and we've broken it up into a product of three 480 00:27:34,850 --> 00:27:37,020 probabilities. 481 00:27:37,020 --> 00:27:39,150 The first two are conditional probabilities, so they're 482 00:27:39,150 --> 00:27:40,690 really all conditional probabilities. 483 00:27:40,690 --> 00:27:43,530 The last one's conditional on nothing. 484 00:27:43,530 --> 00:27:46,405 But look what happens as we go from left to right. 485 00:27:46,405 --> 00:27:49,220 a is dependent on two things. 486 00:27:49,220 --> 00:27:52,910 b is only dependent on one thing and nothing to the left. 487 00:27:52,910 --> 00:27:57,040 c is dependent on nothing and nothing to the left. 488 00:27:57,040 --> 00:28:00,890 So you can sense a generalization coming. 489 00:28:00,890 --> 00:28:02,220 So let's write it down. 490 00:28:11,820 --> 00:28:17,530 So let's go from here over to here and say that the 491 00:28:17,530 --> 00:28:19,970 probability of a whole bunch of things-- 492 00:28:19,970 --> 00:28:25,250 x1 through xn-- 493 00:28:25,250 --> 00:28:28,675 is equal to some product of probabilities. 494 00:28:28,675 --> 00:28:32,760 We'll let the index i run from n to 1. 495 00:28:32,760 --> 00:28:37,680 Probability of x sub i, conditioned 496 00:28:37,680 --> 00:28:39,060 on all the ones before it-- 497 00:28:39,060 --> 00:28:44,235 that's x sub i given x sub i minus 1 498 00:28:44,235 --> 00:28:46,200 down to x1, like so. 499 00:28:49,040 --> 00:28:52,950 And for the first one in this product, i will be equal to n. 500 00:28:52,950 --> 00:28:56,160 For the second one, i will be equal to n minus 1.
501 00:28:56,160 --> 00:29:00,740 But you'll notice that as I go from n toward 1, these 502 00:29:00,740 --> 00:29:02,340 conditionals get smaller-- 503 00:29:02,340 --> 00:29:06,740 the number of things we condition on gets smaller, and 504 00:29:06,740 --> 00:29:11,930 none of these things are on the left. 505 00:29:11,930 --> 00:29:15,190 They're only stuff that I have on the right. 506 00:29:15,190 --> 00:29:18,690 So what I mean to say is all of these things have an index 507 00:29:18,690 --> 00:29:21,240 that's smaller than this index. 508 00:29:21,240 --> 00:29:23,930 None of the ones that have a higher index are appearing in 509 00:29:23,930 --> 00:29:25,870 that conditional. 510 00:29:25,870 --> 00:29:28,900 So it's a way of taking a probability of the "and" of a 511 00:29:28,900 --> 00:29:32,180 whole bunch of things and writing it as a product of 512 00:29:32,180 --> 00:29:34,690 conditional probabilities. 513 00:29:34,690 --> 00:29:36,000 So we're making good progress. 514 00:29:36,000 --> 00:29:38,010 We've done one. 515 00:29:38,010 --> 00:29:39,420 We've done two. 516 00:29:39,420 --> 00:29:41,220 And now we've done three, because this 517 00:29:41,220 --> 00:29:42,470 is the chain rule. 518 00:29:47,850 --> 00:29:51,340 And we're about halfway through our diagram, halfway 519 00:29:51,340 --> 00:29:54,710 to the point where we can do something fun. 520 00:29:54,710 --> 00:29:56,900 But we still have a couple more concepts to deal with, 521 00:29:56,900 --> 00:29:59,960 and the next concept is the concept of conditional 522 00:29:59,960 --> 00:30:02,730 probability. 523 00:30:02,730 --> 00:30:06,400 So that's all this stuff up here-- 524 00:30:06,400 --> 00:30:07,650 oops. 525 00:30:10,860 --> 00:30:13,740 All this stuff here is the definition of conditional 526 00:30:13,740 --> 00:30:14,990 probability. 527 00:30:19,800 --> 00:30:25,800 And now I want to go to the definition of independence.
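The chain rule named above can be stated compactly. This is a sketch that assumes the variables are labeled x_1 through x_n and keeps the lecture's convention of running the index from n down to 1.

```latex
P(x_n, x_{n-1}, \ldots, x_1) = \prod_{i=n}^{1} P(x_i \mid x_{i-1}, \ldots, x_1)
```

For i = 1 the conditioning set is empty, so the final factor is just P(x_1), which matches the three-variable case where the last factor is conditional on nothing.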
528 00:30:44,800 --> 00:30:46,870 So that's another definitional deal. 529 00:30:46,870 --> 00:30:49,560 But it's another definitional deal that makes some sense 530 00:30:49,560 --> 00:30:51,940 with a diagram as well. 531 00:30:51,940 --> 00:30:59,080 So the definition goes like this. 532 00:30:59,080 --> 00:31:10,640 We say that P of a given b is equal to P of a if a is 533 00:31:10,640 --> 00:31:20,480 independent of b. 534 00:31:20,480 --> 00:31:23,690 So that says that the probability of a doesn't 535 00:31:23,690 --> 00:31:26,980 depend on what's going on with b. 536 00:31:26,980 --> 00:31:29,520 It's the same either way. 537 00:31:29,520 --> 00:31:30,630 So it's independent. 538 00:31:30,630 --> 00:31:33,310 b doesn't matter. 539 00:31:33,310 --> 00:31:35,550 So what does that look like if we try to do an 540 00:31:35,550 --> 00:31:38,490 intuitionist diagram? 541 00:31:38,490 --> 00:31:39,740 Well, let's see. 542 00:31:42,809 --> 00:31:44,300 Here's a. 543 00:31:44,300 --> 00:31:46,440 Here's b. 544 00:31:46,440 --> 00:31:50,890 Now, the probability of a given b-- 545 00:31:50,890 --> 00:31:51,890 well, let's see. 546 00:31:51,890 --> 00:31:59,780 That must be this part here divided by this part here. 547 00:32:04,060 --> 00:32:08,760 So the ratio of those areas is the probability of a given b. 548 00:32:08,760 --> 00:32:16,300 So that's the probability of this way divided by the 549 00:32:16,300 --> 00:32:20,140 probability of both ways. 550 00:32:24,090 --> 00:32:28,380 So what's the probability of a in terms of these areas? 551 00:32:28,380 --> 00:32:32,000 Well, probability of a in terms of these areas is the 552 00:32:32,000 --> 00:32:34,240 probability-- 553 00:32:34,240 --> 00:32:35,680 let's see, have I got this right? 554 00:32:35,680 --> 00:32:37,610 I've got this upside down.
555 00:32:41,610 --> 00:32:44,790 The probability of a given b is the probability of the 556 00:32:44,790 --> 00:32:46,000 stuff in the intersection-- 557 00:32:46,000 --> 00:32:47,290 so that's both ways-- 558 00:32:49,810 --> 00:32:53,460 divided by the probability of the stuff in b, which 559 00:32:53,460 --> 00:32:54,710 is going this way. 560 00:32:58,620 --> 00:33:03,040 And let's see, the probability of a not conditioned on 561 00:33:03,040 --> 00:33:08,510 anything except being in this universe is all these hash 562 00:33:08,510 --> 00:33:18,100 marks, like so, divided by the universe. 563 00:33:21,170 --> 00:33:23,530 So when we say that something's independent, it 564 00:33:23,530 --> 00:33:25,170 means that those two ratios are the same. 565 00:33:28,170 --> 00:33:30,610 That's all it means in the intuitionist's point of view. 566 00:33:30,610 --> 00:33:33,710 So it says that this little area here divided by this 567 00:33:33,710 --> 00:33:36,970 whole area is the same as this whole area for a divided by 568 00:33:36,970 --> 00:33:39,000 the size of the universe. 569 00:33:39,000 --> 00:33:40,250 So that's what independence means. 570 00:33:43,050 --> 00:33:45,270 Now, that's quite a lot of work. 571 00:33:45,270 --> 00:33:46,980 But we're not done with independence, because we've 572 00:33:46,980 --> 00:33:49,730 got conditional independence to deal with. 573 00:34:01,360 --> 00:34:03,170 And that, too, can be viewed as a definition. 574 00:34:08,340 --> 00:34:11,810 And what we're going to say is that the probability of a 575 00:34:11,810 --> 00:34:19,020 given b and z is equal to the probability of a given z. 576 00:34:23,350 --> 00:34:24,210 What's that mean? 577 00:34:24,210 --> 00:34:28,010 That means that if you know that we're dealing with z, 578 00:34:28,010 --> 00:34:33,100 then the probability of a doesn't depend on b. 
579 00:34:33,100 --> 00:34:35,239 b doesn't matter anymore once you're 580 00:34:35,239 --> 00:34:38,900 restricted to being in z. 581 00:34:38,900 --> 00:34:42,350 So you can look at that this way. 582 00:34:47,070 --> 00:34:52,060 Here's a, and here's b, and here is z. 583 00:34:55,600 --> 00:34:58,360 So what we're saying is that we're restricting the world to 584 00:34:58,360 --> 00:35:01,860 being in this part of the universe where z is. 585 00:35:01,860 --> 00:35:09,145 So the probability of a given b and z is this piece in here. 586 00:35:12,340 --> 00:35:16,050 a given b and z is that part there. 587 00:35:16,050 --> 00:35:23,340 And the probability of a given z is this part here 588 00:35:23,340 --> 00:35:27,280 divided by all of z. 589 00:35:27,280 --> 00:35:32,580 So we're saying that the ratio of this little piece here to 590 00:35:32,580 --> 00:35:39,010 this part, which I'll mark that way, ratio of this to 591 00:35:39,010 --> 00:35:42,080 this is the same as the ratio of that to that. 592 00:35:42,080 --> 00:35:45,410 So that's conditional independence. 593 00:35:45,410 --> 00:35:49,810 So you can infer from these things, with a little bit of 594 00:35:49,810 --> 00:36:01,352 algebra, that P of a and b given z is equal to P of a 595 00:36:01,352 --> 00:36:05,490 given z times P of b given z. 596 00:36:09,260 --> 00:36:12,400 Boy, that's been quite a journey, but we got all the 597 00:36:12,400 --> 00:36:16,200 way through one, two, three, four, and five. 598 00:36:16,200 --> 00:36:18,070 And now the next thing is belief nets, and I'm going to 599 00:36:18,070 --> 00:36:22,400 ask you to forget everything I've said for a minute or two. 600 00:36:22,400 --> 00:36:24,420 And we'll come back to it. 601 00:36:24,420 --> 00:36:29,360 I want to talk about the dog and the burglar and the 602 00:36:29,360 --> 00:36:32,300 raccoon again.
603 00:36:32,300 --> 00:36:36,070 And now, forgetting about probability, I can say, look, 604 00:36:36,070 --> 00:36:40,700 the dog barks if a raccoon shows up. 605 00:36:40,700 --> 00:36:44,790 The dog barks if a burglar shows up. 606 00:36:44,790 --> 00:36:48,110 A burglar doesn't show up because the dog is barking. 607 00:36:48,110 --> 00:36:51,470 A raccoon doesn't show up because the dog is barking. 608 00:36:51,470 --> 00:36:54,580 So the causality flows from the burglar and the raccoon to 609 00:36:54,580 --> 00:36:56,020 the barking. 610 00:36:56,020 --> 00:36:58,570 So we can make a diagram of that. 611 00:36:58,570 --> 00:37:01,310 And our diagram will look like this. 612 00:37:01,310 --> 00:37:07,540 Here is the burglar, and here is the raccoon. 613 00:37:07,540 --> 00:37:12,090 And these have causal relations to the dog barking. 614 00:37:15,390 --> 00:37:22,080 So that's an interesting idea, because now I can say that-- 615 00:37:22,080 --> 00:37:24,550 well, I can't say anything yet, because I want to add a 616 00:37:24,550 --> 00:37:26,190 little more complexity to it. 617 00:37:26,190 --> 00:37:28,920 I'm going to add two more variables. 618 00:37:28,920 --> 00:37:34,640 You might call the police, depending on how vigorous the 619 00:37:34,640 --> 00:37:36,430 dog is barking, I guess. 620 00:37:36,430 --> 00:37:40,300 And the raccoon has a propensity to knocking over 621 00:37:40,300 --> 00:37:42,660 the trash can. 622 00:37:42,660 --> 00:37:44,600 So now, I've got five variables. 623 00:37:44,600 --> 00:37:47,900 How big a joint probability table am I going to need to 624 00:37:47,900 --> 00:37:50,020 keep my tallies straight? 625 00:37:50,020 --> 00:37:50,980 Well, it'll be 2 to the 5th. 626 00:37:50,980 --> 00:37:53,900 That's 32. 
627 00:37:53,900 --> 00:37:59,780 But what I'm going to say is that this diagram is a 628 00:37:59,780 --> 00:38:07,820 statement, that every node in it depends on its parents and 629 00:38:07,820 --> 00:38:10,630 nothing else that's not a descendant. 630 00:38:10,630 --> 00:38:13,380 Now, I need to say that about 50 times, because you've got 631 00:38:13,380 --> 00:38:15,020 to say it right. 632 00:38:15,020 --> 00:38:18,070 Every node there is independent of every 633 00:38:18,070 --> 00:38:20,620 non-descendant other than its parents. 634 00:38:20,620 --> 00:38:22,310 No, that's not quite right. 635 00:38:22,310 --> 00:38:26,380 Given its parents, every node is independent of all other 636 00:38:26,380 --> 00:38:28,670 non-descendants. 637 00:38:28,670 --> 00:38:32,070 Well, what does that mean? 638 00:38:32,070 --> 00:38:34,730 Here's the deal with calling the police. 639 00:38:34,730 --> 00:38:37,180 Here's its one and only parent. 640 00:38:37,180 --> 00:38:40,400 So given this parent, the probability that they were 641 00:38:40,400 --> 00:38:44,120 going to call the police doesn't depend on anything 642 00:38:44,120 --> 00:38:48,520 like B, R, or T. It's because all of the causality is 643 00:38:48,520 --> 00:38:51,600 flowing through this dog barking. 644 00:38:51,600 --> 00:38:55,150 I'm not going to call the police in a way that's 645 00:38:55,150 --> 00:38:57,240 dependent on anything else other than whether the dog is 646 00:38:57,240 --> 00:38:58,860 barking or not. 647 00:38:58,860 --> 00:39:04,430 Because this guy has this as a parent, and these are not 648 00:39:04,430 --> 00:39:10,245 descendants of calling the police, so this is independent 649 00:39:10,245 --> 00:39:13,730 of B, R, and T. 650 00:39:13,730 --> 00:39:16,220 So let's go walk through the others. 651 00:39:16,220 --> 00:39:17,470 Here's the dog. 652 00:39:17,470 --> 00:39:19,360 The dog's parents are burglar 653 00:39:19,360 --> 00:39:21,950 appearing and raccoon appearing.
654 00:39:21,950 --> 00:39:27,590 So the probability that the dog barks is independent of 655 00:39:27,590 --> 00:39:29,580 that trash can over there, because that's not a 656 00:39:29,580 --> 00:39:30,850 descendant. 657 00:39:30,850 --> 00:39:33,660 It is dependent on these parents. 658 00:39:33,660 --> 00:39:35,790 How about the trash can? 659 00:39:35,790 --> 00:39:37,340 It depends only on the raccoon. 660 00:39:40,070 --> 00:39:43,810 It doesn't depend on any other non-descendant, so therefore, 661 00:39:43,810 --> 00:39:50,190 it doesn't depend on D, B, or P. How about B? 662 00:39:50,190 --> 00:39:52,900 It has no parents. 663 00:39:52,900 --> 00:39:58,210 So it is independent of every 664 00:39:58,210 --> 00:40:09,070 non-descendant. B does not depend 665 00:40:09,070 --> 00:40:12,895 on R and T, because they're not descendants. 666 00:40:16,400 --> 00:40:19,160 It's interesting that B might depend on D and P, because 667 00:40:19,160 --> 00:40:20,410 those are descendants. 668 00:40:22,950 --> 00:40:26,120 So it's important to understand that there's this 669 00:40:26,120 --> 00:40:33,020 business of independence given the parents of all other 670 00:40:33,020 --> 00:40:35,200 non-descendants. 671 00:40:35,200 --> 00:40:37,620 And you'll see why that funny, strange language is important 672 00:40:37,620 --> 00:40:40,060 in a minute. 673 00:40:40,060 --> 00:40:40,710 But now, let's see-- 674 00:40:40,710 --> 00:40:43,920 I want to make a model of what's going to happen here. 675 00:40:43,920 --> 00:40:47,540 So let me see what kind of probabilities I'm going to 676 00:40:47,540 --> 00:40:50,300 have to figure out. 677 00:40:50,300 --> 00:40:54,790 This guy doesn't depend on anything upstream. 678 00:40:54,790 --> 00:40:56,460 So we could just say that all we need there is the 679 00:40:56,460 --> 00:40:58,990 probability that a burglar is going to appear.
680 00:40:58,990 --> 00:41:01,880 Let's say it's a fairly high-crime neighborhood-- 681 00:41:01,880 --> 00:41:03,020 1 chance in 10-- 682 00:41:03,020 --> 00:41:06,330 1 day in 10, a burglar appears. 683 00:41:06,330 --> 00:41:11,760 The raccoon doesn't depend on anything other than its own 684 00:41:11,760 --> 00:41:14,130 propensity, so its probability, 685 00:41:14,130 --> 00:41:16,970 we'll say, is 0.5. 686 00:41:16,970 --> 00:41:19,780 Raccoons love the place, so it shows up about 1 day in 2. 687 00:41:22,340 --> 00:41:24,300 So what about the dog barking? 688 00:41:24,300 --> 00:41:28,690 That depends on whether there's a burglar, and the 689 00:41:28,690 --> 00:41:31,110 other parent is whether there's a raccoon. 690 00:41:31,110 --> 00:41:34,270 So we need to keep track of the probability that the dog 691 00:41:34,270 --> 00:41:37,350 will bark for all four combinations. 692 00:41:42,060 --> 00:41:46,980 So this will be the burglar, and this will be the raccoon. 693 00:41:46,980 --> 00:41:51,980 This will be false, false, true, true-- 694 00:41:51,980 --> 00:41:55,400 oops-- false, false, true, false, 695 00:41:55,400 --> 00:41:59,360 false, true, true, true. 696 00:41:59,360 --> 00:42:03,450 So let's say it's a wonderful dog, and it always barks if 697 00:42:03,450 --> 00:42:05,700 there's a burglar. 698 00:42:05,700 --> 00:42:10,500 So that would say that the probability here is 1.0, and 699 00:42:10,500 --> 00:42:13,170 the probability here is 1.0. 700 00:42:13,170 --> 00:42:17,875 And if there's neither a burglar nor a raccoon, the dog 701 00:42:17,875 --> 00:42:19,420 still likes to bark just for fun. 702 00:42:19,420 --> 00:42:22,130 So we'll say that's a chance of 1 in 10. 703 00:42:22,130 --> 00:42:26,370 And then in case there's a burglar, let's say this. 704 00:42:26,370 --> 00:42:28,290 There's no burglar, but there is a raccoon-- 705 00:42:28,290 --> 00:42:31,710 he's tired of the raccoons, so he only barks half the time. 
706 00:42:31,710 --> 00:42:34,280 Do these numbers, by the way, have to add up to 1? 707 00:42:34,280 --> 00:42:36,290 They clearly don't. 708 00:42:36,290 --> 00:42:37,370 These numbers don't add up to one. 709 00:42:37,370 --> 00:42:40,690 What adds up to 1 is this, the probability 710 00:42:40,690 --> 00:42:43,210 that the dog barks. 711 00:42:43,210 --> 00:42:47,310 And then the other phantom probability is out here. 712 00:42:47,310 --> 00:42:48,935 And these have to add up to 1. 713 00:42:48,935 --> 00:42:52,850 So that would be 0.9, that would be 0.0, that would be 714 00:42:52,850 --> 00:42:57,050 0.5, and this would be 0.0. 715 00:42:57,050 --> 00:43:01,820 So because those are just 1 minus the numbers in these 716 00:43:01,820 --> 00:43:06,830 columns, I don't bother to write them down. 717 00:43:06,830 --> 00:43:08,540 Well, we still have a couple more things to do. 718 00:43:08,540 --> 00:43:11,280 The probability that we'll call the police depends only 719 00:43:11,280 --> 00:43:12,405 on the dog. 720 00:43:12,405 --> 00:43:14,770 So we'll have a column for the dog, and then we'll have a 721 00:43:14,770 --> 00:43:16,425 probability of calling the police. 722 00:43:19,070 --> 00:43:22,770 There's a probability for that being false and a probability 723 00:43:22,770 --> 00:43:24,760 for that being true. 724 00:43:24,760 --> 00:43:28,790 So if the dog doesn't bark, there's really hardly any 725 00:43:28,790 --> 00:43:30,730 chance we'll call the police. 726 00:43:30,730 --> 00:43:32,820 So make that 0.001. 727 00:43:32,820 --> 00:43:36,420 If the dog is barking, if he barks vigorously enough, maybe 728 00:43:36,420 --> 00:43:40,430 1 chance in 10. 729 00:43:40,430 --> 00:43:43,640 Here, we have the trash can-- the final thing we have to 730 00:43:43,640 --> 00:43:44,830 think about. 731 00:43:44,830 --> 00:43:48,240 There's the trash can; rather, the raccoon. 732 00:43:48,240 --> 00:43:51,890 And here's the trash can probability.
733 00:43:51,890 --> 00:43:57,460 Depends on the raccoon being either present or not present. 734 00:43:57,460 --> 00:44:00,270 If the raccoon is not present, the probability the trash can 735 00:44:00,270 --> 00:44:04,650 is knocked over by, say, the wind is 1 in 1,000. 736 00:44:04,650 --> 00:44:08,510 If the raccoon is there, oh man, that guy always likes to 737 00:44:08,510 --> 00:44:11,340 go in there, so that's 0.8. 738 00:44:11,340 --> 00:44:14,580 So now I'm done specifying this model. 739 00:44:14,580 --> 00:44:18,570 And the question is, how many numbers did I have to specify? 740 00:44:18,570 --> 00:44:21,140 Well, let's see. 741 00:44:21,140 --> 00:44:25,150 I have to specify that one, that one, that one, that one, 742 00:44:25,150 --> 00:44:29,060 that one, that one-- that's 6, 7, 8, 9, 10. 743 00:44:29,060 --> 00:44:32,540 So I had to specify 10 numbers. 744 00:44:32,540 --> 00:44:35,480 If I just try to build myself a joint probability table 745 00:44:35,480 --> 00:44:39,586 straightaway, how many numbers would I have to supply? 746 00:44:39,586 --> 00:44:41,970 Well, it's 2 to the n. 747 00:44:41,970 --> 00:44:48,970 So it's 2 to the 5th, that's 32. 748 00:44:48,970 --> 00:44:51,560 Considerable saving. 749 00:44:51,560 --> 00:44:54,910 By the way, how do you suppose I made that table? 750 00:44:54,910 --> 00:44:57,220 Not by doing all those numbers. 751 00:44:57,220 --> 00:45:01,460 By making this belief network and then using the belief 752 00:45:01,460 --> 00:45:04,470 network to calculate those numbers. 753 00:45:04,470 --> 00:45:07,900 And that's why this is a miracle, because with these 754 00:45:07,900 --> 00:45:11,400 numbers, I can calculate those numbers instead of making them 755 00:45:11,400 --> 00:45:15,420 up or making a whole lot of tally-type measurements. 756 00:45:15,420 --> 00:45:18,540 So I'd like to make sure that that's true. 
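The count of 10 numbers versus 32 can be checked by writing the model down as tables. The sketch below is illustrative only: the variable names and dictionary layout are assumptions, and the police entry for a silent dog reads the spoken "0, 0, 1" as 0.001.

```python
# Each table stores P(variable is True | parents); the "phantom"
# False probability is 1 minus the stored value, so it is not stored.
P_BURGLAR = 0.1            # no parents
P_RACCOON = 0.5            # no parents

# Dog barking, keyed by (burglar, raccoon).
P_DOG = {
    (False, False): 0.1,
    (False, True): 0.5,
    (True, False): 1.0,
    (True, True): 1.0,
}

# Calling the police, keyed by dog barking (0.001 is an assumed reading).
P_POLICE = {False: 0.001, True: 0.1}

# Trash can knocked over, keyed by raccoon.
P_TRASH = {False: 0.001, True: 0.8}

# Two parentless nodes contribute one number each.
network_numbers = 2 + len(P_DOG) + len(P_POLICE) + len(P_TRASH)
joint_rows = 2 ** 5        # five binary variables

print(network_numbers, joint_rows)
```

Storing only the True column mirrors the point made earlier about not bothering to write down the complements.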
757 00:45:18,540 --> 00:45:24,150 And I can use this stuff here to calculate the full joint 758 00:45:24,150 --> 00:45:27,440 probability table. 759 00:45:27,440 --> 00:45:30,890 So here's how this works. 760 00:45:30,890 --> 00:45:33,265 I have the probability of some combination-- 761 00:45:36,020 --> 00:45:44,150 let's say the police, the dog, the burglar, the trash can, 762 00:45:44,150 --> 00:45:45,400 and the raccoon. 763 00:45:50,220 --> 00:45:52,400 All the combinations that are possible there will give me an 764 00:45:52,400 --> 00:45:54,582 entry in the table-- one row. 765 00:45:54,582 --> 00:45:56,280 But let's see-- 766 00:45:56,280 --> 00:45:57,150 there's some miracle here. 767 00:45:57,150 --> 00:45:59,670 Oh, this chain rule. 768 00:45:59,670 --> 00:46:01,920 Let's use the chain rule. 769 00:46:01,920 --> 00:46:05,820 We can write that as a probability that we call the 770 00:46:05,820 --> 00:46:10,950 police given d, b, t, and r. 771 00:46:10,950 --> 00:46:14,480 And then the next one in my chain is probability of d 772 00:46:14,480 --> 00:46:17,950 given b, t, and r. 773 00:46:17,950 --> 00:46:20,090 Then the next one in the chain is the probability of 774 00:46:20,090 --> 00:46:23,920 b given t and r. 775 00:46:23,920 --> 00:46:28,470 And the next one in my chain is P of t given r. 776 00:46:28,470 --> 00:46:31,335 And the final one in my chain is p of r. 777 00:46:33,860 --> 00:46:36,200 Now, we have some conditional independence 778 00:46:36,200 --> 00:46:38,170 knowledge, too, don't we? 779 00:46:38,170 --> 00:46:45,740 We know that this probability here depends only on d because 780 00:46:45,740 --> 00:46:47,150 there are no descendants. 781 00:46:47,150 --> 00:46:49,880 So therefore, we don't have to think about that, and all the 782 00:46:49,880 --> 00:46:54,100 numbers we need here are produced by this table. 783 00:46:54,100 --> 00:46:55,190 How about this one here? 
784 00:46:55,190 --> 00:46:58,850 Probability that the dog barks depends only on its parents, b 785 00:46:58,850 --> 00:47:01,550 and r, so it doesn't depend on t. 786 00:47:05,390 --> 00:47:09,080 So b, in turn, depends on-- 787 00:47:09,080 --> 00:47:09,960 what does it depend on? 788 00:47:09,960 --> 00:47:12,030 It doesn't depend on anything. 789 00:47:12,030 --> 00:47:14,330 So we can scratch those. 790 00:47:14,330 --> 00:47:17,890 Probability of t given r, yeah, there's a probability 791 00:47:17,890 --> 00:47:20,030 there, but we can get that from the table. 792 00:47:20,030 --> 00:47:22,680 And finally, P of r. 793 00:47:22,680 --> 00:47:25,550 So that's why I went through all that probability junk, 794 00:47:25,550 --> 00:47:30,680 because if we arrange things in the expansion of this, from 795 00:47:30,680 --> 00:47:35,100 bottom to top, then we arrange things so that none of these 796 00:47:35,100 --> 00:47:39,860 guys depends on a descendant in this formula. 797 00:47:39,860 --> 00:47:41,510 And we have a limited number of things that it 798 00:47:41,510 --> 00:47:44,720 depends on above it. 799 00:47:44,720 --> 00:47:46,750 So that's the way we can calculate back the full joint 800 00:47:46,750 --> 00:47:48,000 probability table. 801 00:47:51,845 --> 00:47:54,380 And that brings us to the end of the discussion today. 802 00:47:54,380 --> 00:47:56,940 But the thing we're going to think about is, how much 803 00:47:56,940 --> 00:47:59,850 saving do we really get out of this? 804 00:47:59,850 --> 00:48:03,940 In this particular case, we only had to devise 10 805 00:48:03,940 --> 00:48:05,290 numbers out of 32. 806 00:48:05,290 --> 00:48:09,400 What if we had 10 properties or 100 properties? 807 00:48:09,400 --> 00:48:11,270 How much saving would we get then? 808 00:48:11,270 --> 00:48:13,070 That's what we'll take up next time, 809 00:48:13,070 --> 00:48:14,430 after the quiz on Wednesday.
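The calculation of the full joint table from the network can be sketched in a few lines. This is a minimal illustration, not the lecture's own program: the function and variable names are hypothetical, and the police entry for a silent dog again assumes 0.001. Each row is the product P(p given d) times P(d given b, r) times P(b) times P(t given r) times P(r), which is the chain rule after the conditional independence simplifications.

```python
from itertools import product

# P(variable is True | parents), as specified in the lecture's tables.
P_B = 0.1                                   # burglar
P_R = 0.5                                   # raccoon
P_D = {(False, False): 0.1, (False, True): 0.5,
       (True, False): 1.0, (True, True): 1.0}   # dog, keyed (b, r)
P_P = {False: 0.001, True: 0.1}             # police, keyed by d (assumed 0.001)
P_T = {False: 0.001, True: 0.8}             # trash can, keyed by r


def prob(p_true, value):
    """Return P(variable == value) given P(variable is True)."""
    return p_true if value else 1.0 - p_true


def joint(p, d, b, t, r):
    """One row of the joint table, via the simplified chain rule."""
    return (prob(P_P[d], p)
            * prob(P_D[(b, r)], d)
            * prob(P_B, b)
            * prob(P_T[r], t)
            * prob(P_R, r))


# All 32 combinations of (police, dog, burglar, trash, raccoon).
table = {row: joint(*row) for row in product([False, True], repeat=5)}
total = sum(table.values())

print(len(table), "rows, total probability", round(total, 6))
```

Because every row is built from the same ten numbers, the rows are guaranteed to sum to 1, which is a useful sanity check on any hand-filled table.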