1 00:00:01,060 --> 00:00:03,380 By this point, we have discussed pretty much 2 00:00:03,380 --> 00:00:07,180 everything there is to say about individual discrete 3 00:00:07,180 --> 00:00:08,870 random variables. 4 00:00:08,870 --> 00:00:12,190 Now let us move to the case where we're dealing with 5 00:00:12,190 --> 00:00:15,940 multiple discrete random variables simultaneously, and 6 00:00:15,940 --> 00:00:17,420 talk about their distribution. 7 00:00:17,420 --> 00:00:19,990 As we will see, their distribution is characterized 8 00:00:19,990 --> 00:00:23,080 by a so-called joint PMF. 9 00:00:23,080 --> 00:00:25,660 So suppose that we have a probabilistic model, and on 10 00:00:25,660 --> 00:00:28,860 that model we have defined two random variables-- 11 00:00:28,860 --> 00:00:32,240 X and Y. And that we have available 12 00:00:32,240 --> 00:00:36,030 their individual PMFs. 13 00:00:36,030 --> 00:00:41,120 These PMFs tell us about one random variable at a time. 14 00:00:41,120 --> 00:00:44,440 This tells us about X, this tells us about Y. But they do 15 00:00:44,440 --> 00:00:47,430 not give us any information about how the two random 16 00:00:47,430 --> 00:00:50,220 variables are related to each other. 17 00:00:50,220 --> 00:00:54,670 For example, if you wish to answer this question, whether 18 00:00:54,670 --> 00:00:58,810 the numerical values of the two random variables happen to 19 00:00:58,810 --> 00:01:02,150 be equal, and what is the probability of that event, you 20 00:01:02,150 --> 00:01:05,060 will not be able to answer this question if you only know 21 00:01:05,060 --> 00:01:07,280 the two individual PMFs. 22 00:01:07,280 --> 00:01:10,830 In order to be able to answer a question of this type, we 23 00:01:10,830 --> 00:01:15,530 will need information that tells us what values of X tend 24 00:01:15,530 --> 00:01:19,039 to occur together with what values of Y. And this 25 00:01:19,039 --> 00:01:22,810 information is captured in the so-called joint PMF.
26 00:01:22,810 --> 00:01:27,520 So the joint PMF is nothing but a piece of notation for an 27 00:01:27,520 --> 00:01:28,980 object that's familiar. 28 00:01:28,980 --> 00:01:31,570 This is the probability that when we carry out the 29 00:01:31,570 --> 00:01:36,039 experiment we happen to see random variable X take on a 30 00:01:36,039 --> 00:01:37,610 value, little x, 31 00:01:37,610 --> 00:01:41,440 and simultaneously see that random variable Y takes on a 32 00:01:41,440 --> 00:01:43,170 value, little y. 33 00:01:43,170 --> 00:01:46,490 We indicate this quantity with this notation. 34 00:01:46,490 --> 00:01:49,720 The letter little p stands for a PMF. 35 00:01:49,720 --> 00:01:52,650 The subscripts tell us which random variables 36 00:01:52,650 --> 00:01:54,080 we're talking about. 37 00:01:54,080 --> 00:01:57,289 And finally, this is a function of two arguments. 38 00:01:57,289 --> 00:02:01,790 Depending on what pair (x,y) we're interested in, we're 39 00:02:01,790 --> 00:02:04,240 going to get a different numerical value for this 40 00:02:04,240 --> 00:02:05,810 probability. 41 00:02:05,810 --> 00:02:08,870 As an example of a joint PMF in which the two random 42 00:02:08,870 --> 00:02:12,650 variables take values in a finite set, we might be given 43 00:02:12,650 --> 00:02:14,840 a table of this form. 44 00:02:14,840 --> 00:02:17,850 Using this table, we can answer questions such as the 45 00:02:17,850 --> 00:02:18,420 following-- 46 00:02:18,420 --> 00:02:22,180 what is the probability that the random variables X and Y 47 00:02:22,180 --> 00:02:27,670 simultaneously take the values, let us say, 1 and 3? 48 00:02:27,670 --> 00:02:31,360 Then we look it up in this table, and we identify that it's this 49 00:02:31,360 --> 00:02:34,010 probability, X takes the value of 1, and Y takes 50 00:02:34,010 --> 00:02:35,900 the value of 3. 51 00:02:35,900 --> 00:02:41,050 And according to this table, the answer would be 2/20.
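The notation the speaker points to is not visible in the transcript. Written out in standard form (an assumption about what the slide shows, consistent with the spoken description of a little p with subscripts X and Y and two arguments), the joint PMF would be:

```latex
p_{X,Y}(x, y) = \mathbf{P}(X = x \text{ and } Y = y)
```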
52 00:02:41,050 --> 00:02:44,370 Now, something to notice about joint PMFs. 53 00:02:44,370 --> 00:02:49,400 When you add over all possible pairs, x and y, this exhausts 54 00:02:49,400 --> 00:02:51,380 all the possibilities. 55 00:02:51,380 --> 00:02:54,980 And therefore, these probabilities should add to 1. 56 00:02:54,980 --> 00:02:58,750 In terms of this table, all of the entries that we have here 57 00:02:58,750 --> 00:03:01,510 should add to 1. 58 00:03:01,510 --> 00:03:06,000 Now, once we have in our hands the joint PMF, we can use it 59 00:03:06,000 --> 00:03:11,690 to find the individual PMFs of the random variables X and Y. 60 00:03:11,690 --> 00:03:16,690 And these individual PMFs are called the marginal PMFs. 61 00:03:22,130 --> 00:03:23,770 How do we find them? 62 00:03:23,770 --> 00:03:27,460 Well, the joint PMF tells us everything there is to be 63 00:03:27,460 --> 00:03:29,990 known about the two random variables, so it should 64 00:03:29,990 --> 00:03:32,579 contain enough information for us to 65 00:03:32,579 --> 00:03:35,450 answer any kind of question. 66 00:03:35,450 --> 00:03:40,540 So for example, if we wish to find the probability that the 67 00:03:40,540 --> 00:03:45,660 random variable X takes the value of 4, we look at all 68 00:03:45,660 --> 00:03:51,300 possible outcomes in which X is equal to 4, and add the 69 00:03:51,300 --> 00:03:53,170 probabilities of these outcomes. 70 00:03:53,170 --> 00:03:57,790 So in this case, it would be 1/20 plus 2/20. 71 00:04:00,390 --> 00:04:04,850 So what we're doing is that if we're interested in a specific 72 00:04:04,850 --> 00:04:09,790 value of X, the probability that X takes on a specific 73 00:04:09,790 --> 00:04:15,870 value, we consider all possible pairs associated with 74 00:04:15,870 --> 00:04:17,610 that fixed x. 75 00:04:17,610 --> 00:04:22,190 That is, we're considering one column of the PMF, and we're 76 00:04:22,190 --> 00:04:24,720 adding the corresponding probabilities. 
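The table lookup, the sum-to-1 property, and the column sum for a marginal can be sketched in a few lines of Python. The table on the slide is not reproduced in the transcript, so the entries below are an assumption: they were chosen to agree with the values quoted in the lecture (for instance 2/20 for the pair (1, 3), and 1/20 plus 2/20 for X equal to 4). The names `counts`, `joint`, and `marginal_x` are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint PMF table, in twentieths, keyed by (x, y).
# Assumption: chosen to match the values quoted in the lecture; pairs that
# are not listed have probability 0.
counts = {
    (1, 3): 2, (1, 4): 1,
    (2, 1): 1, (2, 2): 1, (2, 3): 4, (2, 4): 2,
    (3, 2): 3, (3, 3): 1, (3, 4): 2,
    (4, 2): 1, (4, 3): 2,
}
joint = {pair: Fraction(n, 20) for pair, n in counts.items()}


def marginal_x(joint_pmf, x):
    """Marginal p_X(x): fix x and add the joint PMF over all values of y."""
    return sum((p for (xi, _), p in joint_pmf.items() if xi == x), Fraction(0))


# All entries of a joint PMF must add to 1.
total = sum(joint.values())

# Table lookup: probability that X = 1 and Y = 3 simultaneously.
p_1_3 = joint.get((1, 3), Fraction(0))

# Marginal: probability that X = 4, i.e. the sum over the x = 4 column.
p_x_4 = marginal_x(joint, 4)

print(total, p_1_3, p_x_4)
```

Exact fractions are used instead of floats so that the sum-to-1 check is not affected by rounding.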
77 00:04:24,720 --> 00:04:31,670 So to find this entry here, let's say p_X(3), what we need 78 00:04:31,670 --> 00:04:36,010 is to add these terms in that column. 79 00:04:36,010 --> 00:04:41,720 Similarly, we can find the PMF of the random variable Y. 80 00:04:41,720 --> 00:04:46,220 So for example, the probability that the random 81 00:04:46,220 --> 00:04:53,360 variable Y takes on a value of, let's say, 2, can be found 82 00:04:53,360 --> 00:04:55,290 as follows. 83 00:04:55,290 --> 00:04:58,860 You look at the probabilities of all pairs associated with 84 00:04:58,860 --> 00:05:03,080 this specific y, and you add over the x's. 85 00:05:03,080 --> 00:05:09,550 So we fix Y to have a value of 2, and we add over all pairs 86 00:05:09,550 --> 00:05:10,610 in this row. 87 00:05:10,610 --> 00:05:18,560 So in this example, it would be 1/20 plus 3/20, plus 1/20. 88 00:05:18,560 --> 00:05:23,460 Finally, notice that we are able to answer the question 89 00:05:23,460 --> 00:05:26,090 that got us motivated in the first place. 90 00:05:26,090 --> 00:05:28,620 To find the probability that the two random variables take 91 00:05:28,620 --> 00:05:32,860 equal values, we look at all the outcomes for which the two 92 00:05:32,860 --> 00:05:36,380 random variables indeed take the same numerical values. 93 00:05:36,380 --> 00:05:40,830 And we see that it is this event in this diagram, and the 94 00:05:40,830 --> 00:05:45,390 probability of that event is going to be 2/20. 95 00:05:45,390 --> 00:05:49,180 So in general, once we have available the joint PMF of two 96 00:05:49,180 --> 00:05:53,520 random variables, we will be able to answer any questions 97 00:05:53,520 --> 00:05:57,210 regarding probabilities of events that have to do with 98 00:05:57,210 --> 00:06:00,210 these two random variables. 99 00:06:00,210 --> 00:06:03,460 How about more than two random variables? 100 00:06:03,460 --> 00:06:05,140 It's just a matter of notation.
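Before moving on to more than two variables, the two calculations just described for the two-variable case (the row sum for the marginal of Y, and the probability that X equals Y) can be sketched as follows. The table entries are an assumption chosen to agree with the values quoted in the lecture, since the slide is not part of the transcript; the names `counts`, `joint`, `marginal_y`, and `prob_equal` are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint PMF table, in twentieths, keyed by (x, y).
# Assumption: chosen to match the values quoted in the lecture; pairs that
# are not listed have probability 0.
counts = {
    (1, 3): 2, (1, 4): 1,
    (2, 1): 1, (2, 2): 1, (2, 3): 4, (2, 4): 2,
    (3, 2): 3, (3, 3): 1, (3, 4): 2,
    (4, 2): 1, (4, 3): 2,
}
joint = {pair: Fraction(n, 20) for pair, n in counts.items()}


def marginal_y(joint_pmf, y):
    """Marginal p_Y(y): fix y and add the joint PMF over all values of x."""
    return sum((p for (_, yi), p in joint_pmf.items() if yi == y), Fraction(0))


# Row sum for Y = 2: 1/20 + 3/20 + 1/20.
p_y_2 = marginal_y(joint, 2)

# Event {X = Y}: add the probabilities of all pairs with equal coordinates.
prob_equal = sum((p for (x, y), p in joint.items() if x == y), Fraction(0))

print(p_y_2, prob_equal)
```

Neither quantity could be computed from the two individual PMFs alone in the second case: the sum for the event X = Y needs the joint probabilities of the diagonal pairs.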
101 00:06:05,140 --> 00:06:08,380 For example, we can define the joint PMF of three random 102 00:06:08,380 --> 00:06:11,670 variables, and you can use the same idea for the joint PMF, 103 00:06:11,670 --> 00:06:15,340 let's say, of five, or 10, or n random variables. 104 00:06:15,340 --> 00:06:17,960 Let's just look at the notation for three. 105 00:06:17,960 --> 00:06:20,890 There is a well-defined probability that when we carry 106 00:06:20,890 --> 00:06:25,060 out the experiment, the random variables X, Y, and Z take on 107 00:06:25,060 --> 00:06:27,730 certain specific values. 108 00:06:27,730 --> 00:06:31,010 So we look at the probability of that particular triple, and 109 00:06:31,010 --> 00:06:34,470 we indicate that probability with this notation. 110 00:06:34,470 --> 00:06:38,450 Once more, the subscripts tell us which random variables 111 00:06:38,450 --> 00:06:39,790 we're talking about. 112 00:06:39,790 --> 00:06:43,300 And the PMF, of course, is going to be a function of this 113 00:06:43,300 --> 00:06:46,680 triple, little x, little y, little z, because each triple 114 00:06:46,680 --> 00:06:50,390 can, in general, have a different probability. 115 00:06:50,390 --> 00:06:53,010 Of course, probabilities must always add to 1. 116 00:06:53,010 --> 00:06:55,940 So when we consider all triples and we add their 117 00:06:55,940 --> 00:06:59,290 corresponding probabilities, we should get 1. 118 00:06:59,290 --> 00:07:02,370 And finally, once we have the joint PMF, we can again 119 00:07:02,370 --> 00:07:05,270 recover the marginal PMFs. 120 00:07:05,270 --> 00:07:07,690 For example, to find the probability that the random 121 00:07:07,690 --> 00:07:11,810 variable X takes on a specific value, little x, we consider 122 00:07:11,810 --> 00:07:16,960 all possible triples in which the random variable indeed 123 00:07:16,960 --> 00:07:20,040 takes that value, little x.
124 00:07:20,040 --> 00:07:26,180 And then we sum over all the possible y's and z's that 125 00:07:26,180 --> 00:07:29,810 could go together with this particular x. 126 00:07:29,810 --> 00:07:33,800 In the same spirit, to find the probability that these two 127 00:07:33,800 --> 00:07:38,030 random variables take on two specific values, we consider 128 00:07:38,030 --> 00:07:42,620 all the possible z's that could go together with this 129 00:07:42,620 --> 00:07:45,000 (x,y) pair. 130 00:07:45,000 --> 00:07:50,330 So this way we're ranging over all outcomes in which X and Y 131 00:07:50,330 --> 00:07:52,080 take on these specific values. 132 00:07:52,080 --> 00:07:56,150 But Z is free to take any value, and so we consider all 133 00:07:56,150 --> 00:07:59,480 those possible values of Z and sum the corresponding 134 00:07:59,480 --> 00:08:00,730 probabilities. 135 00:08:02,930 --> 00:08:05,980 Finally, we can talk about functions of 136 00:08:05,980 --> 00:08:08,840 multiple random variables. 137 00:08:08,840 --> 00:08:11,550 Suppose that we have two random variables, X and Y, and 138 00:08:11,550 --> 00:08:13,650 that we define a function of them. 139 00:08:13,650 --> 00:08:17,780 So this function is, of course, a random variable. 140 00:08:17,780 --> 00:08:23,260 And we can find the PMF of this random variable if we 141 00:08:23,260 --> 00:08:26,350 know the joint PMF of X and Y. 142 00:08:26,350 --> 00:08:28,880 So the PMF, which is the probability that the random 143 00:08:28,880 --> 00:08:31,510 variable takes on a specific numerical value, that's the 144 00:08:31,510 --> 00:08:35,789 probability that the function of X and Y takes on a specific 145 00:08:35,789 --> 00:08:37,289 numerical value. 146 00:08:37,289 --> 00:08:42,120 And we can find this probability by adding the 147 00:08:42,120 --> 00:08:45,625 probabilities of all (x,y) pairs. 148 00:08:52,930 --> 00:08:54,360 Which (x,y) pairs?
149 00:08:54,360 --> 00:09:01,890 Those (x,y) pairs for which the value of Z would be equal 150 00:09:01,890 --> 00:09:07,450 to this particular number, little z, that we care about. 151 00:09:07,450 --> 00:09:12,380 So we collect essentially all possible outcomes that make 152 00:09:12,380 --> 00:09:16,680 this event happen, and we add the probabilities of all 153 00:09:16,680 --> 00:09:18,390 those outcomes. 154 00:09:18,390 --> 00:09:22,590 Finally, similarly to the case where we have a single random 155 00:09:22,590 --> 00:09:28,480 variable and a function of it, we now can talk about expected 156 00:09:28,480 --> 00:09:31,925 values of functions of two random variables, and there is 157 00:09:31,925 --> 00:09:35,830 an expected value rule that parallels the expected value 158 00:09:35,830 --> 00:09:38,840 rule that we had developed for the case of a 159 00:09:38,840 --> 00:09:40,770 function of this form. 160 00:09:40,770 --> 00:09:45,020 The form that the expected value rule takes is similar, 161 00:09:45,020 --> 00:09:46,430 and it's quite natural. 162 00:09:46,430 --> 00:09:49,230 The interpretation is as follows. 163 00:09:49,230 --> 00:09:52,160 With this probability, a specific 164 00:09:52,160 --> 00:09:54,740 (x,y) pair will occur. 165 00:09:54,740 --> 00:09:58,840 And when that occurs, the value of our random variable 166 00:09:58,840 --> 00:10:02,370 is a certain number. 167 00:10:02,370 --> 00:10:05,000 And the combination of these two terms gives us a 168 00:10:05,000 --> 00:10:07,110 contribution to the expected value. 169 00:10:07,110 --> 00:10:11,290 Now, we consider all possible (x,y) pairs that may occur, 170 00:10:11,290 --> 00:10:15,310 and we sum over all these (x,y) pairs.
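The PMF of a function Z = g(X, Y) and the expected value rule described above can be sketched as follows. Several things here are assumptions made only for illustration: the table entries (chosen to agree with the values quoted earlier in the lecture, since the slide is not in the transcript), the choice g(x, y) = x + y, and the names `pmf_of_function` and `expected_value_rule`.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF table, in twentieths, keyed by (x, y).
# Assumption: chosen to match the values quoted in the lecture.
counts = {
    (1, 3): 2, (1, 4): 1,
    (2, 1): 1, (2, 2): 1, (2, 3): 4, (2, 4): 2,
    (3, 2): 3, (3, 3): 1, (3, 4): 2,
    (4, 2): 1, (4, 3): 2,
}
joint = {pair: Fraction(n, 20) for pair, n in counts.items()}


def g(x, y):
    """Hypothetical function of the two random variables."""
    return x + y


def pmf_of_function(joint_pmf, func):
    """p_Z(z): add p_{X,Y}(x, y) over all pairs (x, y) with func(x, y) == z."""
    pmf_z = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for (x, y), p in joint_pmf.items():
        pmf_z[func(x, y)] += p
    return dict(pmf_z)


def expected_value_rule(joint_pmf, func):
    """E[g(X, Y)]: sum of func(x, y) * p_{X,Y}(x, y) over all pairs."""
    return sum((func(x, y) * p for (x, y), p in joint_pmf.items()), Fraction(0))


pmf_z = pmf_of_function(joint, g)

# The expected value rule works directly from the joint PMF ...
expected_direct = expected_value_rule(joint, g)

# ... and agrees with the definition of E[Z] applied to the PMF of Z.
expected_via_pmf = sum((z * p for z, p in pmf_z.items()), Fraction(0))

print(expected_direct, expected_via_pmf)
```

The rule avoids the intermediate step of building the PMF of Z, which is why it is usually the more convenient of the two routes.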