1 00:00:01,500 --> 00:00:04,940 We have so far discussed the first step involved in the 2 00:00:04,940 --> 00:00:07,350 construction of a probabilistic model. 3 00:00:07,350 --> 00:00:10,930 Namely, the construction of a sample space, which is a 4 00:00:10,930 --> 00:00:13,730 description of the possible outcomes of a probabilistic 5 00:00:13,730 --> 00:00:15,100 experiment. 6 00:00:15,100 --> 00:00:19,760 We now come to the second and much more interesting part. 7 00:00:19,760 --> 00:00:24,210 We need to specify which outcomes are more likely to 8 00:00:24,210 --> 00:00:28,540 occur and which ones are less likely to occur and so on. 9 00:00:28,540 --> 00:00:33,280 And we will do that by assigning probabilities to the 10 00:00:33,280 --> 00:00:35,220 different outcomes. 11 00:00:35,220 --> 00:00:40,550 However, as we try to do this assignment, we run into some 12 00:00:40,550 --> 00:00:43,340 kind of difficulty, which is the following. 13 00:00:43,340 --> 00:00:46,310 Remember the previous experiment involving a 14 00:00:46,310 --> 00:00:50,880 continuous sample space, which was the unit square and in 15 00:00:50,880 --> 00:00:55,280 which we throw a dart at random and record the point 16 00:00:55,280 --> 00:00:57,160 that occurred. 17 00:00:57,160 --> 00:01:00,760 In this experiment, what do you think is the probability 18 00:01:00,760 --> 00:01:02,860 of a particular point? 19 00:01:02,860 --> 00:01:07,540 Let's say what is the probability that my dart hits 20 00:01:07,540 --> 00:01:11,890 exactly the center of this square. 21 00:01:11,890 --> 00:01:14,330 Well, this probability would be essentially 0. 22 00:01:14,330 --> 00:01:16,990 Hitting the center exactly with infinite 23 00:01:16,990 --> 00:01:18,870 precision should be 0. 24 00:01:18,870 --> 00:01:22,460 And so it's natural that in such a continuous model any 25 00:01:22,460 --> 00:01:27,020 individual point should have a 0 probability. 26 00:01:27,020 --> 00:01:31,280 For this reason instead of assigning probabilities to 27 00:01:31,280 --> 00:01:36,640 individual points, we will instead assign probabilities 28 00:01:36,640 --> 00:01:42,870 to whole sets, that is, to subsets of the sample space. 29 00:01:42,870 --> 00:01:49,090 So here we have our sample space, which is some 30 00:01:49,090 --> 00:01:53,620 abstract set omega. 31 00:01:53,620 --> 00:01:56,570 Here is a subset of the sample space. 32 00:01:56,570 --> 00:02:01,540 Call it capital A. We're going to assign a probability to 33 00:02:01,540 --> 00:02:05,620 that subset A, which we're going to denote with this 34 00:02:05,620 --> 00:02:12,380 notation, which we read as the probability of set A. So 35 00:02:12,380 --> 00:02:15,580 probabilities will be assigned to subsets. 36 00:02:15,580 --> 00:02:18,420 And these will not cause us difficulties in the continuous 37 00:02:18,420 --> 00:02:22,280 case because even though individual points would have 0 38 00:02:22,280 --> 00:02:26,720 probability, if you ask me what are the odds that my dart 39 00:02:26,720 --> 00:02:31,760 falls in the upper half, let's say, of this diagram, then 40 00:02:31,760 --> 00:02:34,840 that should be a reasonable positive number. 41 00:02:34,840 --> 00:02:37,220 So even though individual outcomes may have 0 42 00:02:37,220 --> 00:02:41,110 probabilities, sets of outcomes in general would be 43 00:02:41,110 --> 00:02:44,230 expected to have positive probabilities. 44 00:02:44,230 --> 00:02:48,800 So coming back, we're going to assign probabilities to the 45 00:02:48,800 --> 00:02:51,810 various subsets of the sample space. 46 00:02:51,810 --> 00:02:55,090 And here comes a piece of terminology, that a subset of 47 00:02:55,090 --> 00:02:57,890 the sample space is called an event. 48 00:02:57,890 --> 00:02:59,880 Why is it called an event? 49 00:02:59,880 --> 00:03:04,110 Because once we carry out the experiment and we observe the 50 00:03:04,110 --> 00:03:08,150 outcome of the experiment, either this outcome is inside 51 00:03:08,150 --> 00:03:13,820 the set A and in that case we say that event A has occurred, 52 00:03:13,820 --> 00:03:18,550 or the outcome falls outside the set A in which case we say 53 00:03:18,550 --> 00:03:22,640 that event A did not occur. 54 00:03:22,640 --> 00:03:26,630 Now we want to move on and describe certain rules. 55 00:03:26,630 --> 00:03:29,340 The rules of the game in probabilistic models, which 56 00:03:29,340 --> 00:03:31,570 are basically the rules that these 57 00:03:31,570 --> 00:03:34,030 probabilities should satisfy. 58 00:03:34,030 --> 00:03:36,510 They shouldn't be completely arbitrary. 59 00:03:36,510 --> 00:03:42,240 First, by convention, probabilities are always given 60 00:03:42,240 --> 00:03:45,270 in the range between 0 and 1. 61 00:03:45,270 --> 00:03:48,390 Intuitively, 0 probability means that we believe that 62 00:03:48,390 --> 00:03:51,250 something practically cannot happen. 63 00:03:51,250 --> 00:03:56,680 And probability of 1 means that we're practically certain 64 00:03:56,680 --> 00:04:00,080 that an event of interest is going to happen. 65 00:04:00,080 --> 00:04:04,410 So we want to specify rules of these kind for probabilities. 66 00:04:04,410 --> 00:04:07,240 These rules that any probabilistic model should 67 00:04:07,240 --> 00:04:11,100 satisfy are called the axioms of probability theory. 68 00:04:11,100 --> 00:04:14,560 And our first axiom is a nonnegativity axiom. 69 00:04:14,560 --> 00:04:16,810 Namely, probabilities will always be 70 00:04:16,810 --> 00:04:19,130 non-negative numbers. 71 00:04:19,130 --> 00:04:20,940 It's a reasonable rule. 72 00:04:20,940 --> 00:04:26,050 The second rule is that if the subset that we're looking at 73 00:04:26,050 --> 00:04:29,760 is actually not a subset but is the entire sample space 74 00:04:29,760 --> 00:04:35,810 omega, the probability of it should always be equal to 1. 75 00:04:35,810 --> 00:04:37,570 What does that mean? 76 00:04:37,570 --> 00:04:40,850 We know that the outcome is going to be an element of the 77 00:04:40,850 --> 00:04:41,850 sample space. 78 00:04:41,850 --> 00:04:44,430 This is the definition of the sample space. 79 00:04:44,430 --> 00:04:48,000 So we have absolute certainty that our outcome is going to 80 00:04:48,000 --> 00:04:49,490 be in omega. 81 00:04:49,490 --> 00:04:52,659 Or in different language we have absolute certainty that 82 00:04:52,659 --> 00:04:55,630 event omega is going to occur. 83 00:04:55,630 --> 00:04:59,170 And we capture this certainty by saying that the probability 84 00:04:59,170 --> 00:05:02,760 of event omega is equal to 1. 85 00:05:02,760 --> 00:05:07,030 These two axioms are pretty simple and very intuitive. 86 00:05:07,030 --> 00:05:11,150 The more interesting axiom is the next one that says 87 00:05:11,150 --> 00:05:13,880 something a little more complicated. 88 00:05:13,880 --> 00:05:18,800 Before we discuss that particular axiom, a quick 89 00:05:18,800 --> 00:05:22,770 reminder about set theoretic notation. 90 00:05:22,770 --> 00:05:29,250 If we have two sets, let's say a set A, and another set, 91 00:05:29,250 --> 00:05:38,260 another set B, we use this particular notation, which we 92 00:05:38,260 --> 00:05:44,360 read as "A intersection B" to refer to the collection of 93 00:05:44,360 --> 00:05:49,960 elements that belong to both A and B. So in this picture, the 94 00:05:49,960 --> 00:05:56,390 intersection of A and B is this shaded set. 95 00:05:56,390 --> 00:06:03,030 We use this notation, which we read as "A union B", to refer 96 00:06:03,030 --> 00:06:06,960 to the set of elements that belong to A 97 00:06:06,960 --> 00:06:09,840 or to B or to both. 98 00:06:09,840 --> 00:06:13,990 So in terms of this picture, the union of the two sets 99 00:06:13,990 --> 00:06:17,250 would be this blue set. 100 00:06:17,250 --> 00:06:20,830 After this reminder about set theoretic notation, now let us 101 00:06:20,830 --> 00:06:23,370 look at the form of the third axiom. 102 00:06:23,370 --> 00:06:24,880 What does it say? 103 00:06:24,880 --> 00:06:28,860 If we have two sets, two events, two subsets of the 104 00:06:28,860 --> 00:06:33,480 sample space, which are disjoint. 105 00:06:33,480 --> 00:06:36,460 So here's our sample space. 106 00:06:36,460 --> 00:06:40,220 And here are the two sets that are disjoint. 107 00:06:43,340 --> 00:06:47,110 In mathematical terms, two sets being disjoint means that 108 00:06:47,110 --> 00:06:50,640 their intersection has no elements. 109 00:06:50,640 --> 00:06:53,620 So their intersection is the empty set. 110 00:06:53,620 --> 00:06:59,159 And we use this symbol here to denote the empty set. 111 00:06:59,159 --> 00:07:03,030 So if the intersection of two sets is empty, then the 112 00:07:03,030 --> 00:07:07,685 probability that the outcome of the experiments falls in 113 00:07:07,685 --> 00:07:11,740 the union of A and B, that is, the probability that the 114 00:07:11,740 --> 00:07:17,260 outcome is here or there, is equal to the sum of the 115 00:07:17,260 --> 00:07:21,700 probabilities of these two sets. 116 00:07:21,700 --> 00:07:24,070 This is called the additivity axiom. 117 00:07:24,070 --> 00:07:29,040 So it says that we can add probabilities of different 118 00:07:29,040 --> 00:07:32,760 sets when those two sets are disjoint. 119 00:07:32,760 --> 00:07:38,260 In some sense we can think of probability as being one pound 120 00:07:38,260 --> 00:07:43,010 of some substance which is spread over our sample space 121 00:07:43,010 --> 00:07:46,780 and the probability of A is how much of that substance is 122 00:07:46,780 --> 00:07:51,659 sitting on top of a set A. So what this axiom is saying is 123 00:07:51,659 --> 00:07:56,740 that the total amount of that substance sitting on top of A 124 00:07:56,740 --> 00:08:01,510 and B is how much is sitting on top of A plus how much is 125 00:08:01,510 --> 00:08:05,590 sitting on top of B. And that is the case whenever the sets 126 00:08:05,590 --> 00:08:10,510 A and B are disjoint from each other. 127 00:08:10,510 --> 00:08:13,960 The additivity axiom needs to be refined a bit. 128 00:08:13,960 --> 00:08:16,656 We will talk about that a little later. 129 00:08:16,656 --> 00:08:20,150 Other than this refinement, these three axioms are the 130 00:08:20,150 --> 00:08:22,490 only requirements in order to have a 131 00:08:22,490 --> 00:08:25,100 legitimate probability model. 132 00:08:25,100 --> 00:08:27,950 At this point you may ask, shouldn't there be more 133 00:08:27,950 --> 00:08:29,270 requirements? 134 00:08:29,270 --> 00:08:32,909 Shouldn't we, for example, say that probabilities cannot be 135 00:08:32,909 --> 00:08:34,950 greater than 1? 136 00:08:34,950 --> 00:08:36,760 Yes and no. 137 00:08:36,760 --> 00:08:40,570 We do not want probabilities to be larger than 1, but we do 138 00:08:40,570 --> 00:08:42,320 not need to say it. 139 00:08:42,320 --> 00:08:45,460 As we will see in the next segment, such a requirement 140 00:08:45,460 --> 00:08:48,340 follows from what we have already said. 141 00:08:48,340 --> 00:08:51,770 And the same is true for several other natural 142 00:08:51,770 --> 00:08:53,110 properties of probabilities.