1 00:00:01,420 --> 00:00:04,390 A random variable can take different numerical values 2 00:00:04,390 --> 00:00:06,890 depending on the outcome of the experiment. 3 00:00:06,890 --> 00:00:09,800 Some outcomes are more likely than others, and similarly 4 00:00:09,800 --> 00:00:13,130 some of the possible numerical values of a random variable 5 00:00:13,130 --> 00:00:15,480 will be more likely than others. 6 00:00:15,480 --> 00:00:18,500 We restrict ourselves to discrete random variables, and 7 00:00:18,500 --> 00:00:22,060 we will describe these relative likelihoods in terms 8 00:00:22,060 --> 00:00:26,080 of the so-called probability mass function, or PMF for 9 00:00:26,080 --> 00:00:29,440 short, which gives the probability of the different 10 00:00:29,440 --> 00:00:31,530 possible numerical values. 11 00:00:31,530 --> 00:00:35,480 The PMF is also sometimes called the probability law or 12 00:00:35,480 --> 00:00:39,590 the probability distribution of a discrete random variable. 13 00:00:39,590 --> 00:00:43,090 Let me illustrate the idea in terms of a simple example. 14 00:00:43,090 --> 00:00:45,160 We have a probabilistic experiment with 15 00:00:45,160 --> 00:00:47,380 four possible outcomes. 16 00:00:47,380 --> 00:00:52,600 We also have a probability law on the sample space. 17 00:00:52,600 --> 00:00:57,170 And to keep things simple, we assume that all four outcomes 18 00:00:57,170 --> 00:01:00,610 in our sample space are equally likely. 19 00:01:00,610 --> 00:01:03,450 We then introduce a random variable that associates a 20 00:01:03,450 --> 00:01:05,800 number with each possible outcome as 21 00:01:05,800 --> 00:01:07,630 shown in this diagram. 22 00:01:07,630 --> 00:01:12,310 The random variable, X, can take one of 23 00:01:12,310 --> 00:01:14,030 three possible values-- 24 00:01:14,030 --> 00:01:16,510 namely 3, 4, or 5. 25 00:01:16,510 --> 00:01:19,810 Let us focus on one of those numbers-- 26 00:01:19,810 --> 00:01:22,610 let's say the number 5. 27 00:01:22,610 --> 00:01:28,100 So let us focus on x being equal to 5. 28 00:01:28,100 --> 00:01:34,990 We can think of the event that X is equal to 5. 29 00:01:34,990 --> 00:01:36,890 Which event is this? 30 00:01:36,890 --> 00:01:40,090 This is the event that the outcome of the experiment led 31 00:01:40,090 --> 00:01:44,310 to the random variable taking a value of 5. 32 00:01:44,310 --> 00:01:47,470 So it is this particular event which consists of two 33 00:01:47,470 --> 00:01:50,729 elements, namely a and b. 34 00:01:50,729 --> 00:01:55,580 More formally, the event that we're talking about is the set 35 00:01:55,580 --> 00:02:01,790 of all outcomes for which the value, the numerical value of 36 00:02:01,790 --> 00:02:05,200 our random variable, which is a function of the outcome, 37 00:02:05,200 --> 00:02:08,770 that numerical value happens to be equal to 5. 38 00:02:08,770 --> 00:02:11,880 And in this example it is a set 39 00:02:11,880 --> 00:02:13,410 consisting of two elements. 40 00:02:13,410 --> 00:02:15,460 It's a subset of the sample space. 41 00:02:15,460 --> 00:02:17,750 So it is an event. 42 00:02:17,750 --> 00:02:19,700 And it has a probability. 43 00:02:19,700 --> 00:02:21,510 And that probability we will be 44 00:02:21,510 --> 00:02:25,500 denoting with this notation. 45 00:02:25,500 --> 00:02:30,160 And in our case this probability is equal to 1/2. 46 00:02:30,160 --> 00:02:34,310 Because we have two outcomes, each one has probability 1/4. 47 00:02:34,310 --> 00:02:38,890 The probability of this event is equal to 1/2. 48 00:02:38,890 --> 00:02:45,290 More generally, we will be using this notation to denote 49 00:02:45,290 --> 00:02:50,730 the probability of the event that the random variable, X , 50 00:02:50,730 --> 00:02:53,900 takes on a particular value, x. 51 00:02:53,900 --> 00:02:57,130 This is just a piece of notation, not a new concept. 52 00:02:57,130 --> 00:02:59,740 We're dealing with a probability, and we indicate 53 00:02:59,740 --> 00:03:04,350 it using this particular notation. 54 00:03:04,350 --> 00:03:08,550 More formally, the probability that we're dealing with is the 55 00:03:08,550 --> 00:03:12,950 probability, the total probability, of all outcomes 56 00:03:12,950 --> 00:03:17,360 for which the numerical value of our random variable is this 57 00:03:17,360 --> 00:03:20,320 particular number, x. 58 00:03:20,320 --> 00:03:21,960 A few things to notice. 59 00:03:21,960 --> 00:03:28,780 We use a subscript, X, to indicate which random variable 60 00:03:28,780 --> 00:03:30,230 we're talking about. 61 00:03:30,230 --> 00:03:32,510 This will be useful if we have several 62 00:03:32,510 --> 00:03:34,300 random variables involved. 63 00:03:34,300 --> 00:03:37,060 For example, if we have another random variable on the 64 00:03:37,060 --> 00:03:42,890 same sample space, Y, then it would have its own probability 65 00:03:42,890 --> 00:03:48,079 mass function which would be denoted with this particular 66 00:03:48,079 --> 00:03:49,329 notation here. 67 00:03:52,160 --> 00:03:59,470 The argument of the PMF, which is x, ranges over the possible 68 00:03:59,470 --> 00:04:04,750 values of the random variable, X. So in this sense, here 69 00:04:04,750 --> 00:04:08,020 we're really dealing with a function. 70 00:04:08,020 --> 00:04:12,200 A function that we could denote just by p with a 71 00:04:12,200 --> 00:04:13,310 subscript x. 72 00:04:13,310 --> 00:04:16,170 This is a function as opposed to the specific 73 00:04:16,170 --> 00:04:17,839 values of this function. 74 00:04:17,839 --> 00:04:21,350 And we can produce plots of this function. 75 00:04:21,350 --> 00:04:25,050 In this particular example that we're dealing with, the 76 00:04:25,050 --> 00:04:29,890 interesting values of x are 3, 4, and 5. 77 00:04:29,890 --> 00:04:33,640 And the associated probabilities are the value of 78 00:04:33,640 --> 00:04:39,790 5 is obtained with probability 1/2, the value of 4-- 79 00:04:39,790 --> 00:04:45,930 this is the event that the outcome is c, which has 80 00:04:45,930 --> 00:04:48,070 probability 1/4. 81 00:04:48,070 --> 00:04:52,960 And the value of 3 is also obtained with probability 1/4 82 00:04:52,960 --> 00:04:56,420 because the value of 3 is obtained when the outcome is 83 00:04:56,420 --> 00:04:59,740 d, and that outcome has probability 1/4. 84 00:04:59,740 --> 00:05:03,390 So the probability mass function is a function of an 85 00:05:03,390 --> 00:05:05,480 argument x. 86 00:05:05,480 --> 00:05:08,920 And for any x, it specifies the probability that the 87 00:05:08,920 --> 00:05:12,930 random variable takes on this particular value. 88 00:05:12,930 --> 00:05:15,710 A few more things to notice. 89 00:05:15,710 --> 00:05:19,150 The probability mass function is always non-negative, since 90 00:05:19,150 --> 00:05:20,750 we're talking about probabilities and 91 00:05:20,750 --> 00:05:23,930 probabilities are always non-negative. 92 00:05:23,930 --> 00:05:28,350 In addition, since the total probability of all outcomes is 93 00:05:28,350 --> 00:05:32,590 equal to 1, the probabilities of the different possible 94 00:05:32,590 --> 00:05:37,330 values of the random variable should also add to 1. 95 00:05:37,330 --> 00:05:42,040 So when you add over all possible values of x, the sum 96 00:05:42,040 --> 00:05:43,790 of the associated probabilities 97 00:05:43,790 --> 00:05:45,740 should be equal to 1. 98 00:05:45,740 --> 00:05:52,690 In terms of our picture, the event that x is equal to 3, 99 00:05:52,690 --> 00:05:57,200 which is this subset of the sample space, the event that x 100 00:05:57,200 --> 00:06:03,520 is equal to 4, which is this subset of the sample space, 101 00:06:03,520 --> 00:06:06,990 and the event that x is equal to 5, which is this subset of 102 00:06:06,990 --> 00:06:08,470 the sample space. 103 00:06:08,470 --> 00:06:09,840 These three events-- 104 00:06:09,840 --> 00:06:12,080 the red, green, and blue-- 105 00:06:12,080 --> 00:06:14,940 they are disjoint, and together they cover the entire 106 00:06:14,940 --> 00:06:15,820 sample space. 107 00:06:15,820 --> 00:06:18,940 So their probabilities should add to 1. 108 00:06:18,940 --> 00:06:21,360 And the probabilities of these events are the probabilities 109 00:06:21,360 --> 00:06:25,860 of the different values of the random variable, X. So the 110 00:06:25,860 --> 00:06:27,790 probabilities of these different values 111 00:06:27,790 --> 00:06:31,650 should also add to 1. 112 00:06:31,650 --> 00:06:34,950 Let us now go through a simple example to illustrate the 113 00:06:34,950 --> 00:06:37,950 general method for calculating the PMF of a 114 00:06:37,950 --> 00:06:39,890 discrete random variable. 115 00:06:39,890 --> 00:06:43,090 We will revisit our familiar example involving two rolls of 116 00:06:43,090 --> 00:06:44,690 the tetrahedral die. 117 00:06:44,690 --> 00:06:49,000 And let X be the result of the first roll, Y be the result of 118 00:06:49,000 --> 00:06:50,159 the second roll. 119 00:06:50,159 --> 00:06:53,430 And notice that we're using uppercase letters. 120 00:06:53,430 --> 00:06:58,180 And this is because X and Y are random variables. 121 00:06:58,180 --> 00:07:01,160 In order to do any probability calculations, we also need the 122 00:07:01,160 --> 00:07:02,170 probability law. 123 00:07:02,170 --> 00:07:05,150 So to keep things simple, let us assume that every possible 124 00:07:05,150 --> 00:07:08,440 outcome, there's 16 of them, has the same probability which 125 00:07:08,440 --> 00:07:12,420 is therefore 1 over 16 for each one of the outcomes. 126 00:07:12,420 --> 00:07:15,770 We will concentrate on a particular random variable 127 00:07:15,770 --> 00:07:20,540 defined to be the sum of the random variables, X and Y. So 128 00:07:20,540 --> 00:07:24,240 if X and Y both happen to be 1, then Z will take 129 00:07:24,240 --> 00:07:25,850 the value of 2. 130 00:07:25,850 --> 00:07:29,850 If X is 2 and Y is 1 our random variable will take the 131 00:07:29,850 --> 00:07:30,940 value of 3. 132 00:07:30,940 --> 00:07:35,290 And similarly if we have this outcome, in those outcomes 133 00:07:35,290 --> 00:07:38,540 here, the random variable takes the value of 4. 134 00:07:38,540 --> 00:07:43,560 And we can continue this way by marking, for each 135 00:07:43,560 --> 00:07:47,040 particular outcome, the corresponding value of the 136 00:07:47,040 --> 00:07:48,735 random variable of interest. 137 00:07:52,030 --> 00:07:56,280 What we want to do now is to calculate the PMF of this 138 00:07:56,280 --> 00:07:57,110 random variable. 139 00:07:57,110 --> 00:07:59,690 What does it mean to calculate the PMF? 140 00:07:59,690 --> 00:08:07,600 We need to find this value for all choices of z, that is for 141 00:08:07,600 --> 00:08:11,530 all possible values in the range of our random variable. 142 00:08:11,530 --> 00:08:15,120 The way we're going to do it is to consider each possible 143 00:08:15,120 --> 00:08:19,830 value of z, one at a time, and for any particular value find 144 00:08:19,830 --> 00:08:21,810 out what are the outcomes-- 145 00:08:21,810 --> 00:08:24,110 the elements of the sample space-- 146 00:08:24,110 --> 00:08:28,220 for which our random variable takes on the specific value, 147 00:08:28,220 --> 00:08:31,240 and add the probabilities of those outcomes. 148 00:08:31,240 --> 00:08:35,820 So to illustrate this process, let us calculate the value of 149 00:08:35,820 --> 00:08:39,630 the PMF for z equal to 2. 150 00:08:39,630 --> 00:08:42,900 This is by definition the probability that our random 151 00:08:42,900 --> 00:08:45,950 variable takes the value of 2. 152 00:08:45,950 --> 00:08:49,640 And this is an event that can only happen here. 153 00:08:49,640 --> 00:08:52,690 It corresponds to only one element of the sample space, 154 00:08:52,690 --> 00:08:55,185 which has probability 1 over 16. 155 00:08:57,970 --> 00:09:02,390 We can continue the same way for other values of z. 156 00:09:02,390 --> 00:09:07,930 So for example, the value of PMF at z equal to 3, this is 157 00:09:07,930 --> 00:09:10,640 the probability that our random variable takes the 158 00:09:10,640 --> 00:09:12,150 value of 3. 159 00:09:12,150 --> 00:09:15,300 This is an event that can happen in two ways-- 160 00:09:15,300 --> 00:09:17,810 it corresponds to two outcomes-- 161 00:09:17,810 --> 00:09:22,540 and so it has probability 2 over 16. 162 00:09:22,540 --> 00:09:26,100 Continuing similarly, the probability that our random 163 00:09:26,100 --> 00:09:33,650 variable takes the value of 4 is equal to 3 over 16. 164 00:09:33,650 --> 00:09:37,250 And we can continue this way and calculate the remaining 165 00:09:37,250 --> 00:09:39,080 entries of our PMF. 166 00:09:39,080 --> 00:09:42,330 After you are done, you end up with a table-- 167 00:09:42,330 --> 00:09:44,850 or rather a graph-- 168 00:09:44,850 --> 00:09:47,050 a plot that has this form. 169 00:09:47,050 --> 00:09:50,590 And these are the values of the different probabilities 170 00:09:50,590 --> 00:09:52,005 that we have computed. 171 00:09:54,630 --> 00:09:57,260 And you can continue with the other values. 172 00:09:57,260 --> 00:10:00,030 It's a reasonable guess that this was going to be 4 over 173 00:10:00,030 --> 00:10:07,490 16, this is going to be 3 over 16, 2 over 16, and 1 over 16. 174 00:10:07,490 --> 00:10:10,640 So we have completely determined the PMF of our 175 00:10:10,640 --> 00:10:11,640 random variable. 176 00:10:11,640 --> 00:10:14,950 We have given the form of the answers. 177 00:10:14,950 --> 00:10:18,790 And it's always convenient to also provide a plot with the 178 00:10:18,790 --> 00:10:20,040 answers that we have.