1 00:00:02,550 --> 00:00:05,000 The probability axioms are the basic rules 2 00:00:05,000 --> 00:00:06,710 of probability theory. 3 00:00:06,710 --> 00:00:09,230 And they are surprisingly few. 4 00:00:09,230 --> 00:00:11,590 But they imply many interesting properties that we 5 00:00:11,590 --> 00:00:13,080 will now explore. 6 00:00:13,080 --> 00:00:16,910 First we will see that what you might think of as missing 7 00:00:16,910 --> 00:00:21,920 axioms are actually implied by the axioms already in place. 8 00:00:21,920 --> 00:00:26,850 For example, we have an axiom that probabilities are 9 00:00:26,850 --> 00:00:29,130 non-negative. 10 00:00:29,130 --> 00:00:34,010 We will show that probabilities are also less 11 00:00:34,010 --> 00:00:36,600 than or equal to 1. 12 00:00:36,600 --> 00:00:40,260 We have another axiom that says that the probability of 13 00:00:40,260 --> 00:00:42,840 the entire sample space is 1. 14 00:00:42,840 --> 00:00:45,740 We will show a counterpart that the probability of the 15 00:00:45,740 --> 00:00:48,540 empty set is equal to 0. 16 00:00:48,540 --> 00:00:49,920 This makes perfect sense. 17 00:00:49,920 --> 00:00:53,720 The empty set has no elements, so it is impossible. 18 00:00:53,720 --> 00:00:57,720 There is 0 probability that the outcome of the experiment 19 00:00:57,720 --> 00:01:00,790 would lie in the empty set. 20 00:01:00,790 --> 00:01:03,430 We also have another intuitive property. 21 00:01:03,430 --> 00:01:07,460 The probability that an event happens plus the probability 22 00:01:07,460 --> 00:01:10,550 that the event does not happen exhaust all 23 00:01:10,550 --> 00:01:11,720 possibilities. 24 00:01:11,720 --> 00:01:14,630 And these two probabilities together should add to 1. 25 00:01:14,630 --> 00:01:17,920 For instance, if the probability of heads is 0.6, 26 00:01:17,920 --> 00:01:22,430 then the probability of tails should be 0.4.
27 00:01:22,430 --> 00:01:26,100 Finally, we can generalize the additivity axiom, which was 28 00:01:26,100 --> 00:01:32,320 originally given for the case of two disjoint events, to the 29 00:01:32,320 --> 00:01:35,270 case where we're dealing with the union of 30 00:01:35,270 --> 00:01:38,690 several disjoint events. 31 00:01:38,690 --> 00:01:43,140 By disjoint here we mean that the intersection of any two of 32 00:01:43,140 --> 00:01:45,840 these events is the empty set. 33 00:01:45,840 --> 00:01:48,970 We will prove this for the case of three events and then 34 00:01:48,970 --> 00:01:52,310 the argument generalizes for the case where we're taking 35 00:01:52,310 --> 00:01:55,490 the union of k disjoint events, where k 36 00:01:55,490 --> 00:01:57,750 is any finite number. 37 00:01:57,750 --> 00:02:00,210 So the intuition of this result is the same as for the 38 00:02:00,210 --> 00:02:01,730 case of two events. 39 00:02:01,730 --> 00:02:05,480 But we will derive it formally and we will also use it to 40 00:02:05,480 --> 00:02:08,259 come up with a way of calculating the probability of 41 00:02:08,259 --> 00:02:12,740 a finite set by simply adding the probabilities of its 42 00:02:12,740 --> 00:02:14,800 individual elements. 43 00:02:14,800 --> 00:02:18,110 All of these statements that we just 44 00:02:18,110 --> 00:02:20,650 presented are intuitive. 45 00:02:20,650 --> 00:02:22,430 And you do not really need to be 46 00:02:22,430 --> 00:02:24,760 convinced about their validity. 47 00:02:24,760 --> 00:02:27,940 Nevertheless, it is instructive to see how these 48 00:02:27,940 --> 00:02:31,260 statements follow from the axioms that 49 00:02:31,260 --> 00:02:32,510 we have put in place. 50 00:02:35,210 --> 00:02:39,550 So we will now present the arguments based only on the 51 00:02:39,550 --> 00:02:41,690 three axioms that we have available.
52 00:02:41,690 --> 00:02:45,440 And in order to be able to refer to these axioms, let us 53 00:02:45,440 --> 00:02:51,310 give them some names, call them axioms A, B, and C. 54 00:02:51,310 --> 00:02:52,930 We start as follows. 55 00:02:52,930 --> 00:02:57,390 Let us look at the sample space and a subset of that 56 00:02:57,390 --> 00:02:58,640 sample space. 57 00:02:58,640 --> 00:03:03,550 Call it A. And consider the complement of that subset. 58 00:03:03,550 --> 00:03:06,810 The complement is the set of all elements that do not 59 00:03:06,810 --> 00:03:12,750 belong to the set A. So a set together with its complement 60 00:03:12,750 --> 00:03:16,960 make up everything, which is the entire sample space. 61 00:03:16,960 --> 00:03:19,680 On the other hand, if an element belongs to a set A, it 62 00:03:19,680 --> 00:03:21,740 does not belong to its complement. 63 00:03:21,740 --> 00:03:24,290 So the intersection of a set with its complement 64 00:03:24,290 --> 00:03:27,020 is the empty set. 65 00:03:27,020 --> 00:03:31,210 Now we argue as follows. 66 00:03:31,210 --> 00:03:35,720 We have that the probability of the entire sample space is 67 00:03:35,720 --> 00:03:38,510 equal to 1. 68 00:03:38,510 --> 00:03:42,300 This is true by our second axiom. 69 00:03:42,300 --> 00:03:45,640 Now the sample space, as we just discussed, can be written 70 00:03:45,640 --> 00:03:50,290 as the union of an event and the complement of that event. 71 00:03:50,290 --> 00:03:54,610 This is just a set-theoretic relation. 72 00:03:54,610 --> 00:04:00,950 And next, since a set and its complement are disjoint, this 73 00:04:00,950 --> 00:04:05,020 means that we can apply the additivity axiom and write 74 00:04:05,020 --> 00:04:09,560 this probability as the sum of the probability of event A 75 00:04:09,560 --> 00:04:14,250 with the probability of the complement of A.
This is one 76 00:04:14,250 --> 00:04:18,190 of the relations that we had claimed and which we have now 77 00:04:18,190 --> 00:04:19,730 established. 78 00:04:19,730 --> 00:04:23,310 Based on this relation, we can also write that the 79 00:04:23,310 --> 00:04:27,610 probability of an event A is equal to 1 minus the 80 00:04:27,610 --> 00:04:31,450 probability of the complement of that event. 81 00:04:31,450 --> 00:04:35,280 And because, by the non-negativity axiom this 82 00:04:35,280 --> 00:04:39,570 quantity here is non-negative, 1 minus something non-negative 83 00:04:39,570 --> 00:04:41,980 is less than or equal to 1. 84 00:04:41,980 --> 00:04:44,530 We're using here the non-negativity axiom. 85 00:04:44,530 --> 00:04:47,480 And we have established another property, namely that 86 00:04:47,480 --> 00:04:53,440 probabilities are always less than or equal to 1. 87 00:04:53,440 --> 00:05:04,920 Finally, let us note that 1 is the probability, always, of a 88 00:05:04,920 --> 00:05:10,680 set plus the probability of a complement of that set. 89 00:05:10,680 --> 00:05:13,810 And let us use this property for the case where the set of 90 00:05:13,810 --> 00:05:18,020 interest is the entire sample space. 91 00:05:18,020 --> 00:05:21,780 Now, the probability of the entire sample space is itself 92 00:05:21,780 --> 00:05:24,740 equal to 1. 93 00:05:24,740 --> 00:05:28,560 And what is the complement of the entire sample space? 94 00:05:28,560 --> 00:05:31,510 The complement of the entire sample space consists of all 95 00:05:31,510 --> 00:05:34,050 elements that do not belong to the sample space. 96 00:05:34,050 --> 00:05:38,290 But since the sample space is supposed to contain all 97 00:05:38,290 --> 00:05:41,030 possible elements, its complement is 98 00:05:41,030 --> 00:05:43,420 just the empty set. 99 00:05:43,420 --> 00:05:46,130 And from this relation we get the implication that the 100 00:05:46,130 --> 00:05:50,820 probability of the empty set is equal to 0. 
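The three arguments just given can be written out compactly. This is only a sketch in assumed notation that the transcript does not itself show: \Omega stands for the sample space, A^c for the complement of A, and \emptyset for the empty set.

```latex
\begin{align*}
1 &= P(\Omega) = P(A \cup A^c) = P(A) + P(A^c)
  && \text{(normalization, then additivity)} \\
P(A) &= 1 - P(A^c) \le 1
  && \text{(since } P(A^c) \ge 0 \text{ by non-negativity)} \\
1 &= P(\Omega) + P(\Omega^c) = 1 + P(\emptyset)
  \;\Longrightarrow\; P(\emptyset) = 0
  && \text{(taking } A = \Omega \text{, so } A^c = \emptyset \text{)}
\end{align*}
```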
101 00:05:50,820 --> 00:05:54,380 This establishes yet one more of the properties that we had 102 00:05:54,380 --> 00:05:56,090 just claimed a little earlier. 103 00:06:00,060 --> 00:06:03,390 We finally come to the proof of the generalization of our 104 00:06:03,390 --> 00:06:07,110 additivity axiom from the case of two disjoint events to the 105 00:06:07,110 --> 00:06:09,540 case of three disjoint events. 106 00:06:09,540 --> 00:06:12,420 So we have our sample space. 107 00:06:12,420 --> 00:06:15,650 And within that sample space we have three 108 00:06:15,650 --> 00:06:17,910 events, three subsets. 109 00:06:17,910 --> 00:06:21,170 And these subsets are disjoint in the sense that any two of 110 00:06:21,170 --> 00:06:24,920 those subsets have no elements in common. 111 00:06:24,920 --> 00:06:29,190 And we're interested in the probability of the union of A, 112 00:06:29,190 --> 00:06:30,650 B, and C. 113 00:06:30,650 --> 00:06:32,470 How do we make progress? 114 00:06:32,470 --> 00:06:35,750 We have an additivity axiom in our hands, which applies to 115 00:06:35,750 --> 00:06:38,909 the case of the union of two disjoint sets. 116 00:06:38,909 --> 00:06:40,650 Here we have three of them. 117 00:06:40,650 --> 00:06:42,870 But we can do the following trick. 118 00:06:42,870 --> 00:06:46,970 We can think of the union of A, B, and C as consisting of 119 00:06:46,970 --> 00:06:54,950 the union of this blue set with that green set. 120 00:06:54,950 --> 00:06:58,720 Formally, what we're doing is that we're expressing the 121 00:06:58,720 --> 00:07:01,790 union of these three sets as follows. 122 00:07:01,790 --> 00:07:07,330 We form one set by taking the union of A with B. And we have 123 00:07:07,330 --> 00:07:11,220 the other set C. And the overall union can be thought 124 00:07:11,220 --> 00:07:14,080 of as the union of these two sets. 
125 00:07:14,080 --> 00:07:17,860 Now since the three sets are disjoint, this implies that 126 00:07:17,860 --> 00:07:21,830 the blue set is disjoint from the green set and so we can 127 00:07:21,830 --> 00:07:25,990 use the additivity axiom here to write this probability as 128 00:07:25,990 --> 00:07:33,310 the probability of A union B plus the probability of C. And 129 00:07:33,310 --> 00:07:36,570 now we can use the additivity axiom once more since the sets 130 00:07:36,570 --> 00:07:40,659 A and B are disjoint to write the first term as probability 131 00:07:40,659 --> 00:07:45,020 of A plus probability of B. We carry over the last term and 132 00:07:45,020 --> 00:07:48,200 we have the relation that we wanted to prove. 133 00:07:48,200 --> 00:07:50,409 This is the proof for the case of three events. 134 00:07:50,409 --> 00:07:54,060 You should be able to follow this line of proof to write an 135 00:07:54,060 --> 00:07:56,850 argument for the case of four events and so on. 136 00:07:56,850 --> 00:07:59,750 And you might want to continue by induction. 137 00:07:59,750 --> 00:08:04,340 And eventually you should be able to prove that if the sets 138 00:08:04,340 --> 00:08:16,300 A1 up to Ak are disjoint then the probability of the union 139 00:08:16,300 --> 00:08:25,290 of those sets is going to be equal to the sum of their 140 00:08:25,290 --> 00:08:28,150 individual probabilities. 141 00:08:28,150 --> 00:08:31,180 So this is the generalization to the case where we're 142 00:08:31,180 --> 00:08:37,570 dealing with the union of finitely many disjoint events. 143 00:08:37,570 --> 00:08:43,140 A very useful application of this comes in the case where 144 00:08:43,140 --> 00:08:48,800 we want to calculate the probability of a finite set. 145 00:08:48,800 --> 00:08:52,960 So here we have a sample space. 146 00:08:52,960 --> 00:08:57,910 And within that sample space we have some particular 147 00:08:57,910 --> 00:09:02,370 elements S1, S2, up to Sk, k of them. 
148 00:09:02,370 --> 00:09:07,570 And these elements together form a finite set. 149 00:09:07,570 --> 00:09:09,470 What can we say about the probability 150 00:09:09,470 --> 00:09:11,850 of this finite set? 151 00:09:11,850 --> 00:09:17,270 The idea is to take this finite set that consists of k 152 00:09:17,270 --> 00:09:22,800 elements and think of it as the union of several little 153 00:09:22,800 --> 00:09:27,810 sets that contain one element each. 154 00:09:27,810 --> 00:09:31,010 So set-theoretically what we're doing is that we're 155 00:09:31,010 --> 00:09:34,980 taking this set with k elements and we write it as 156 00:09:34,980 --> 00:09:39,210 the union of a set that contains just S1, a set that 157 00:09:39,210 --> 00:09:43,820 contains just the second element S2, and so on, up to 158 00:09:43,820 --> 00:09:45,070 the k-th element. 159 00:09:47,710 --> 00:09:50,800 We're assuming, of course, that these elements are all 160 00:09:50,800 --> 00:09:53,010 different from each other. 161 00:09:53,010 --> 00:09:56,630 So in that case, these sets, these single element sets, are 162 00:09:56,630 --> 00:09:58,010 all disjoint. 163 00:09:58,010 --> 00:10:01,990 So using the additivity property for a union of k 164 00:10:01,990 --> 00:10:07,300 disjoint sets, we can write this as the sum of the 165 00:10:07,300 --> 00:10:11,210 probabilities of the different single element sets. 166 00:10:16,770 --> 00:10:21,180 At this point, it is usual to start abusing, or rather, 167 00:10:21,180 --> 00:10:23,570 simplifying notation a little bit. 168 00:10:23,570 --> 00:10:26,020 Probabilities are assigned to sets. 169 00:10:26,020 --> 00:10:29,080 So here we're talking about the probability of a set that 170 00:10:29,080 --> 00:10:30,910 contains a single element. 171 00:10:30,910 --> 00:10:34,870 But intuitively, we can also talk about just the probability 172 00:10:34,870 --> 00:10:39,880 of that particular element and use this simpler notation.
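Written out, the finite-set calculation and the simplified notation look roughly as follows. The symbols are assumed for illustration: s_1, ..., s_k are the distinct elements, and P(s_i) is shorthand for P(\{s_i\}).

```latex
\begin{align*}
P(\{s_1, s_2, \ldots, s_k\})
  &= P(\{s_1\} \cup \{s_2\} \cup \cdots \cup \{s_k\}) \\
  &= P(\{s_1\}) + P(\{s_2\}) + \cdots + P(\{s_k\})
  && \text{(additivity for } k \text{ disjoint sets)} \\
  &= P(s_1) + P(s_2) + \cdots + P(s_k)
  && \text{(simplified notation)}
\end{align*}
```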
173 00:10:39,880 --> 00:10:43,450 So when using the simpler notation, we will be talking 174 00:10:43,450 --> 00:10:46,930 about the probabilities of individual elements. 175 00:10:46,930 --> 00:10:49,960 Although in terms of formal mathematics, what we really 176 00:10:49,960 --> 00:10:56,490 mean is the probability of this event that's comprised 177 00:10:56,490 --> 00:11:00,880 only of a particular element S1 and so on.
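As a small numerical check of the properties discussed above, here is a minimal Python sketch. It assumes a hypothetical example that is not in the lecture: a fair six-sided die, with the names `element_prob`, `sample_space`, and `prob` chosen only for illustration. The probability of a finite event is computed by adding the probabilities of its individual elements.

```python
from fractions import Fraction

# Hypothetical example (not from the lecture): a fair six-sided die,
# where each individual outcome has probability 1/6.
element_prob = {s: Fraction(1, 6) for s in range(1, 7)}
sample_space = set(element_prob)


def prob(event):
    """Probability of a finite event: the sum over its distinct elements."""
    return sum((element_prob[s] for s in set(event)), Fraction(0))


# Three disjoint events, chosen arbitrarily for the check.
A = {1, 2}
B = {3}
C = {5, 6}

print(prob(sample_space))                      # whole sample space
print(prob(set()))                             # empty set
print(prob(A) + prob(sample_space - A))        # event plus its complement
print(prob(A | B | C), prob(A) + prob(B) + prob(C))  # additivity, three events
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.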