1 00:00:00,780 --> 00:00:04,040 PROFESSOR: So let's look now at the mathematical foundations 2 00:00:04,040 --> 00:00:07,100 of probability theory, the basic definitions of which we've just 3 00:00:07,100 --> 00:00:09,270 been doing examples up until now. 4 00:00:09,270 --> 00:00:12,380 So a key concept is a probability space. 5 00:00:12,380 --> 00:00:16,309 And that's what we're going to talk about in this segment. 6 00:00:16,309 --> 00:00:20,370 So the abstract setting of a probability space 7 00:00:20,370 --> 00:00:23,460 is the first thing you start off with is the set of outcomes, 8 00:00:23,460 --> 00:00:26,190 which is what we saw we were doing with the tree 9 00:00:26,190 --> 00:00:28,630 models in the previous videos. 10 00:00:28,630 --> 00:00:30,820 And the condition that we require 11 00:00:30,820 --> 00:00:34,720 is that there should be a countable set of outcomes. 12 00:00:34,720 --> 00:00:36,670 So there's something called the sample space. 13 00:00:36,670 --> 00:00:39,416 And the sample space is designed to model 14 00:00:39,416 --> 00:00:41,040 all the possible things that can happen 15 00:00:41,040 --> 00:00:43,510 as the result of your random experiment, all 16 00:00:43,510 --> 00:00:44,440 the possible outcomes. 17 00:00:44,440 --> 00:00:47,740 And we require that there be a countable number. 18 00:00:47,740 --> 00:00:49,850 Now, the examples that we've seen so far 19 00:00:49,850 --> 00:00:51,320 have only had a finite number. 20 00:00:51,320 --> 00:00:53,680 But we will shortly see a bunch of examples 21 00:00:53,680 --> 00:00:56,120 where we really need an infinite number. 22 00:00:56,120 --> 00:00:58,570 But only a countable infinite number. 23 00:00:58,570 --> 00:01:01,490 That's part of the definition of a probability space-- the set 24 00:01:01,490 --> 00:01:02,140 of outcomes. 25 00:01:02,140 --> 00:01:04,780 The next thing is a probability function 26 00:01:04,780 --> 00:01:08,330 whose task is to assign probabilities to the outcomes. 27 00:01:08,330 --> 00:01:12,770 So the condition is that the probability function, Pr, 28 00:01:12,770 --> 00:01:18,860 gives every element in S, every outcome, 29 00:01:18,860 --> 00:01:21,280 is going to be assigned a probability of between 0 30 00:01:21,280 --> 00:01:24,000 and 1 inclusive. 31 00:01:24,000 --> 00:01:26,890 So every outcome gets a probability between 0 and 1. 32 00:01:26,890 --> 00:01:29,520 But the constraint on the probability function 33 00:01:29,520 --> 00:01:32,330 is that if I sum up the probabilities 34 00:01:32,330 --> 00:01:36,520 of all the outcomes-- omega is an outcome in the sample space 35 00:01:36,520 --> 00:01:40,450 S-- and I take the sum of all of those probabilities of omega, 36 00:01:40,450 --> 00:01:43,510 they have to sum to 1. 37 00:01:43,510 --> 00:01:47,520 That's the crucial condition that defines a probability 38 00:01:47,520 --> 00:01:50,340 function on a sample space. 39 00:01:50,340 --> 00:01:53,560 And the two together are what are called a probability space. 40 00:01:53,560 --> 00:01:55,800 A sample space with a probability function 41 00:01:55,800 --> 00:01:57,920 is a probability space. 42 00:01:57,920 --> 00:02:00,180 So the purpose of the tree model that we were using 43 00:02:00,180 --> 00:02:04,150 is really to figure out which probability space to use. 44 00:02:04,150 --> 00:02:06,690 And the mathematics doesn't really 45 00:02:06,690 --> 00:02:10,030 start until you have the probability space. 46 00:02:10,030 --> 00:02:12,930 Up until that, it's the modeling part that's 47 00:02:12,930 --> 00:02:14,260 very important mathematically. 48 00:02:14,260 --> 00:02:18,970 But you can't say that the model is right or wrong. 49 00:02:18,970 --> 00:02:22,050 It's a model, and its rightness or wrongness 50 00:02:22,050 --> 00:02:24,490 is a matter of judgment and comparison 51 00:02:24,490 --> 00:02:28,360 to how it stacks up against reality and things 52 00:02:28,360 --> 00:02:29,519 that we care about. 53 00:02:29,519 --> 00:02:31,060 When we're using the tree model, it's 54 00:02:31,060 --> 00:02:33,960 the leaves of the tree that correspond to the outcomes. 55 00:02:33,960 --> 00:02:37,740 And the outcome probabilities, which 56 00:02:37,740 --> 00:02:39,910 are crucial for having a probability space, 57 00:02:39,910 --> 00:02:44,010 we got by reasoning about the probabilities 58 00:02:44,010 --> 00:02:46,820 to assign to each possible branch of the tree 59 00:02:46,820 --> 00:02:51,560 as you worked your way from root to leaf. 60 00:02:51,560 --> 00:02:54,030 So the other key concept that we saw already 61 00:02:54,030 --> 00:02:55,110 is the idea of an event. 62 00:02:55,110 --> 00:02:57,610 An event, formally, is nothing but a subset of the sample 63 00:02:57,610 --> 00:02:59,010 space. 64 00:02:59,010 --> 00:03:01,240 An event is some set of outcomes. 65 00:03:01,240 --> 00:03:03,690 Presumably, the event is an event 66 00:03:03,690 --> 00:03:06,440 that you're interested in, like winning. 67 00:03:06,440 --> 00:03:09,680 And the definition of the probability of an event 68 00:03:09,680 --> 00:03:12,630 is simply the sum of the probabilities of all 69 00:03:12,630 --> 00:03:14,260 the outcomes in the event. 70 00:03:14,260 --> 00:03:17,090 And we were using this already for both Monty Hall 71 00:03:17,090 --> 00:03:20,100 and for the poker hands. 72 00:03:20,100 --> 00:03:22,900 But this is the official general definition-- 73 00:03:22,900 --> 00:03:25,010 that once we have a probability function that 74 00:03:25,010 --> 00:03:27,170 assigns probabilities to outcomes, 75 00:03:27,170 --> 00:03:30,990 then we can use that to define the probability of an event. 76 00:03:30,990 --> 00:03:33,950 This is the definition of the probability of an event-- 77 00:03:33,950 --> 00:03:38,480 simply the sum of the outcome probabilities. 78 00:03:38,480 --> 00:03:42,520 And as an immediate corollary of this definition, what we get 79 00:03:42,520 --> 00:03:45,230 is something that's central to probability theory. 80 00:03:45,230 --> 00:03:47,080 It's called the sum rule. 81 00:03:47,080 --> 00:03:51,430 And it says that if you have a bunch of events that 82 00:03:51,430 --> 00:03:53,360 are pairwise disjoint-- so there's 83 00:03:53,360 --> 00:03:59,730 no outcome in common to A0 or A1 or A1 or A2 and so on-- then 84 00:03:59,730 --> 00:04:04,460 the probability of the union of the A's, the probability 85 00:04:04,460 --> 00:04:07,820 that one of these events occurs, one or more of these events 86 00:04:07,820 --> 00:04:12,850 occurs, is simply the sum of the individual probabilities. 87 00:04:12,850 --> 00:04:16,260 And that is a rule that we'll be using all the time. 88 00:04:16,260 --> 00:04:18,230 It's very convenient for computing things. 89 00:04:18,230 --> 00:04:20,784 If you just break them up into separate cases, 90 00:04:20,784 --> 00:04:22,920 then you can handle the separate cases-- each A0, 91 00:04:22,920 --> 00:04:27,210 A1-- separately, and then add up the probabilities. 92 00:04:27,210 --> 00:04:29,990 And in some approaches to probability, more general ones, 93 00:04:29,990 --> 00:04:32,130 this is actually taken as an axiom. 94 00:04:32,130 --> 00:04:35,110 It's the axiom that defines a probability space, 95 00:04:35,110 --> 00:04:38,180 but where you start with an assignment of probabilities 96 00:04:38,180 --> 00:04:38,701 to events. 97 00:04:38,701 --> 00:04:41,200 But in the discrete case, we don't have to worry about that. 98 00:04:41,200 --> 00:04:44,620 It's a corollary of the way we define the probability. 99 00:04:44,620 --> 00:04:47,450 And that, of course, is a crucial rule-- sometimes called 100 00:04:47,450 --> 00:04:48,460 the countable sum rule. 101 00:04:48,460 --> 00:04:50,800 But we're just going to call it the sum rule. 102 00:04:50,800 --> 00:04:53,420 Expressed in concise notation, it's 103 00:04:53,420 --> 00:04:56,050 the probability of the union of the Ai's, 104 00:04:56,050 --> 00:04:58,340 as i ranges over the non-negative integers, 105 00:04:58,340 --> 00:05:02,300 is simply the sum of the individual probabilities 106 00:05:02,300 --> 00:05:04,990 of those events. 107 00:05:04,990 --> 00:05:09,320 Now, why it's called discrete probability that we're studying 108 00:05:09,320 --> 00:05:12,100 is because we have a countable sample space. 109 00:05:12,100 --> 00:05:15,840 And as we saw, that discrete combinatorics 110 00:05:15,840 --> 00:05:18,420 is the combinatorics of countable 111 00:05:18,420 --> 00:05:21,710 and even finite sets, really. 112 00:05:21,710 --> 00:05:23,820 The crucial reason why we're sticking 113 00:05:23,820 --> 00:05:26,820 to discrete probability is that allows us to work 114 00:05:26,820 --> 00:05:29,300 with sums instead of integrals. 115 00:05:29,300 --> 00:05:34,060 If you start allowing continuous intervals of time 116 00:05:34,060 --> 00:05:37,580 and the probability, say, of throwing a dart 117 00:05:37,580 --> 00:05:40,219 and it landing at a given interval on the line 118 00:05:40,219 --> 00:05:41,760 and a whole bunch of other situations 119 00:05:41,760 --> 00:05:46,530 where it's natural to want to use continuous probabilities, 120 00:05:46,530 --> 00:05:49,090 you're forced into defining a probability 121 00:05:49,090 --> 00:05:51,880 in terms of integrals, because every outcome 122 00:05:51,880 --> 00:05:53,230 has probability 0. 123 00:05:53,230 --> 00:05:55,700 And the theoretical basis of it is considerably more 124 00:05:55,700 --> 00:05:57,130 complicated. 125 00:05:57,130 --> 00:05:59,270 And we don't need it for, in fact, 126 00:05:59,270 --> 00:06:02,880 virtually any purposes that come up in computer science. 127 00:06:02,880 --> 00:06:04,940 And so we will, happily, not have 128 00:06:04,940 --> 00:06:07,830 to study integral calculus or measure theory, really, 129 00:06:07,830 --> 00:06:10,880 and just get by with sums. 130 00:06:10,880 --> 00:06:14,810 So let's quickly point out some rules that are now corollaries. 131 00:06:14,810 --> 00:06:17,420 They're really derived rules of probability theory 132 00:06:17,420 --> 00:06:22,060 that follow as a consequence of the countable sum rule. 133 00:06:22,060 --> 00:06:24,230 And the first one is the difference rule. 134 00:06:24,230 --> 00:06:27,460 The probability of A minus B is simply 135 00:06:27,460 --> 00:06:29,830 equal to the probability of A minus the probability 136 00:06:29,830 --> 00:06:31,030 of A intersection B. 137 00:06:31,030 --> 00:06:33,720 Now, notice how much this looks like the difference 138 00:06:33,720 --> 00:06:36,520 rule for cardinalities-- that the cardinality 139 00:06:36,520 --> 00:06:39,500 of the finite set A minus B is simply the cardinality of A 140 00:06:39,500 --> 00:06:42,120 minus the cardinality of A intersection B. 141 00:06:42,120 --> 00:06:43,900 And indeed, the proof of this is just 142 00:06:43,900 --> 00:06:45,150 like the proof of cardinality. 143 00:06:45,150 --> 00:06:48,290 It follows directly from the sum rule 144 00:06:48,290 --> 00:06:50,530 for probabilities, which corresponds, of course, 145 00:06:50,530 --> 00:06:52,840 to the sum rule for cardinalities. 146 00:06:52,840 --> 00:06:54,980 Namely, by the sum rule for probabilities, 147 00:06:54,980 --> 00:06:58,420 what we know is that A is equal set, 148 00:06:58,420 --> 00:07:02,350 theoretically, to A intersection B plus A minus b. 149 00:07:02,350 --> 00:07:04,670 That is, A breaks up into the points 150 00:07:04,670 --> 00:07:07,040 that it has in common with B and the points 151 00:07:07,040 --> 00:07:09,410 that it doesn't have in common with B. Since those 152 00:07:09,410 --> 00:07:10,910 are disjoint, you can add them. 153 00:07:10,910 --> 00:07:13,640 So the probability of A is equal to the probability 154 00:07:13,640 --> 00:07:17,250 of A intersection B plus probability of A minus B. 155 00:07:17,250 --> 00:07:20,440 And so I just transpose the A minus B to the left hand side. 156 00:07:20,440 --> 00:07:23,080 And I get the difference rule, which 157 00:07:23,080 --> 00:07:26,550 is a rule that's worth remembering. 158 00:07:26,550 --> 00:07:29,940 Similarly, we have inclusion-exclusion. 159 00:07:29,940 --> 00:07:32,410 If A and B are not disjoint, then the probability 160 00:07:32,410 --> 00:07:34,320 of A union B is equal to the probability 161 00:07:34,320 --> 00:07:37,730 of A plus the probability of B minus the probability 162 00:07:37,730 --> 00:07:38,600 of the intersection. 163 00:07:38,600 --> 00:07:40,320 And the proof is, in fact, exactly 164 00:07:40,320 --> 00:07:45,140 like the corresponding rule for cardinalities of finite sets. 165 00:07:45,140 --> 00:07:47,590 And of course, it generalizes to more sets. 166 00:07:47,590 --> 00:07:50,090 This is an example of the inclusion-exclusion for three 167 00:07:50,090 --> 00:07:52,760 sets in terms of probability. 168 00:07:52,760 --> 00:07:55,420 Another useful, it turns out, consequence 169 00:07:55,420 --> 00:07:58,810 is that the probability that A or B happens 170 00:07:58,810 --> 00:08:01,560 is guaranteed to be less than or equal to the probability 171 00:08:01,560 --> 00:08:04,770 that A happens plus the probability of B happens. 172 00:08:04,770 --> 00:08:06,950 And this follows as a trivial consequence 173 00:08:06,950 --> 00:08:09,070 of the inclusion-exclusion rule for two sets, 174 00:08:09,070 --> 00:08:11,510 because the probability of A union B 175 00:08:11,510 --> 00:08:14,650 is equal to this plus this minus some probability, 176 00:08:14,650 --> 00:08:16,757 namely, the probability of the intersection. 177 00:08:16,757 --> 00:08:18,590 So you're taking away something non-negative 178 00:08:18,590 --> 00:08:20,470 from these two in order to equal that. 179 00:08:20,470 --> 00:08:25,390 In particular, then, this must be less than or equal to that. 180 00:08:25,390 --> 00:08:30,375 And the closely related phenomenon is [? basi-- ?] 181 00:08:30,375 --> 00:08:39,960 [AUDIO OUT] The probability that A or B happens 182 00:08:39,960 --> 00:08:44,600 is greater than or equal to the probability that A happens. 183 00:08:44,600 --> 00:08:46,520 And finally, we can generalize that 184 00:08:46,520 --> 00:08:48,050 to a countable collection of sets. 185 00:08:48,050 --> 00:08:52,500 If I have a bunch of events-- A0, A1, and so on-- then 186 00:08:52,500 --> 00:08:56,070 the probability that at least one of them occurs, 187 00:08:56,070 --> 00:08:58,860 the probability of the union of the Ai's, is 188 00:08:58,860 --> 00:09:02,800 less than or equal to the sum of their probabilities. 189 00:09:02,800 --> 00:09:05,920 This is, again, another kind of obvious rule, not hard 190 00:09:05,920 --> 00:09:06,560 to prove. 191 00:09:06,560 --> 00:09:07,800 We're not going to bother proving it, 192 00:09:07,800 --> 00:09:09,010 because it really is obvious. 193 00:09:09,010 --> 00:09:12,330 But we will get some mileage out of it later on. 194 00:09:12,330 --> 00:09:17,520 So to summarize, the key concept here is a probability space. 195 00:09:17,520 --> 00:09:21,900 It consists of a countable set of outcomes, the sample space, 196 00:09:21,900 --> 00:09:27,180 and a probability function that assigns values between 0 and 1 197 00:09:27,180 --> 00:09:31,900 to every outcome such that the sum of the probabilities is 1. 198 00:09:31,900 --> 00:09:34,570 And when we're using our tree model and so on, 199 00:09:34,570 --> 00:09:37,450 our objective is to construct one of these things. 200 00:09:37,450 --> 00:09:39,600 Usually, the hard part will be verifying 201 00:09:39,600 --> 00:09:41,570 that, in fact, the way we've assigned 202 00:09:41,570 --> 00:09:44,390 probabilities has the property that the sum of them 203 00:09:44,390 --> 00:09:46,500 is equal to 1.