Let us now revisit the subject of expectations and develop an important linearity property for the case where we're dealing with multiple random variables. We already have a linearity property: if we have a linear function of a single random variable, then expectations behave in a linear fashion. But now, if we have multiple random variables, we have this additional property: the expected value of the sum of two random variables is equal to the sum of their expectations.

Let us go through the derivation of this very important fact, because it is a nice exercise in applying the expected value rule and also in manipulating PMFs and joint PMFs.

We're dealing with the expected value of a function of two random variables. Which function? If we write it this way, we are dealing with the function g that is just the sum of its two entries.

So now we can continue with the application of the expected value rule, and we obtain the sum over all possible (x, y) pairs. Here we need to write g(x, y), but in our case the function we're dealing with is just x plus y. And then we weigh according to the entries of the joint PMF. So this is just an application of the expected value rule to this particular function.

Now let us take this sum and break it into two pieces: one involving only the x-term, and another piece involving only the y-term.

Now, if we look at this double summation, look at the inner sum. It's a sum over y's. While we're adding over y's, the value of x remains fixed. So x is a constant as far as the sum is concerned, and x can be pulled outside this summation.

Let us just continue with the first term and see that a simplification happens. This quantity here is the sum of the probabilities of the different y's that can go together with a particular x. So it is just equal to the probability of that particular x. It's the marginal PMF.

If we carry out a similar step for the second term, we will obtain the sum over y's. It's just a symmetrical argument. And at this point we recognize that what we have in front of us is just the expected value of X (this is the first term) plus the expected value of Y. So this completes the derivation of this important linearity property.
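Written out in standard joint-PMF notation, the derivation just described (which the lecture carries out on the board) goes as follows:

```latex
\begin{align*}
\mathbf{E}[X+Y]
  &= \sum_x \sum_y (x+y)\, p_{X,Y}(x,y) \\
  &= \sum_x \sum_y x\, p_{X,Y}(x,y) \;+\; \sum_x \sum_y y\, p_{X,Y}(x,y) \\
  &= \sum_x x \sum_y p_{X,Y}(x,y) \;+\; \sum_y y \sum_x p_{X,Y}(x,y) \\
  &= \sum_x x\, p_X(x) \;+\; \sum_y y\, p_Y(y)
  \;=\; \mathbf{E}[X] + \mathbf{E}[Y].
\end{align*}
```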
Of course, we proved the linearity property for the case of the sum of two random variables. But one can proceed in a similar way, or perhaps use induction, and establish by the same kind of argument that we have a linearity property when we add any finite number of random variables: the expected value of a sum is the sum of the expected values.

Just for a little bit of practice: if, for example, we're dealing with the expression 2X + 3Y - Z, its expected value would be the expected value of 2X, plus the expected value of 3Y, minus the expected value of Z. And then, using the linearity property for linear functions of a single random variable, we can pull the constants out of the expectations. So this would be twice the expected value of X, plus 3 times the expected value of Y, minus the expected value of Z.

What we will do next is to use the linearity property of expectations to solve a problem that would otherwise be quite difficult: we will use it to find the mean of a binomial random variable.

Let X be a binomial random variable with parameters n and p. We can interpret X as the number of successes in n independent trials, where each one of the trials has a probability p of resulting in a success. Well, we know the PMF of a binomial, and we can use the definition of expectation to obtain an expression for the mean: each possible value is weighted by the corresponding entry of the binomial PMF, which is just the usual definition of the expected value. Now, if you look at this sum, it appears quite formidable, and it would be quite hard to evaluate.
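For reference, the sum in question is just the definition of expectation applied to the binomial PMF:

```latex
\mathbf{E}[X] \;=\; \sum_{k=0}^{n} k \binom{n}{k} p^{k} (1-p)^{\,n-k}.
```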
Instead, we're going to use a very useful trick: we will employ what we have called indicator variables. So let's define a random variable Xi, which is 1 if the ith trial is a success, and 0 otherwise. Now, if we want to count successes, what we want to count is how many of the Xi's are equal to 1. So if we add the Xi's, this sum will have a contribution of 1 from each one of the successes; when you add them up, you obtain the total number of successes. So we have expressed a random variable as a sum of much simpler random variables.

At this point, we can use linearity of expectations to write that the expected value of X will be the expected value of X1, plus all the way up to the expected value of Xn. Now, what is the expected value of X1? It is a Bernoulli random variable that takes the value 1 with probability p and takes the value 0 with probability 1 minus p. The expected value of this random variable is p, and similarly for each one of the other terms in the summation. And so the final result is equal to n times p.

This answer, of course, also makes intuitive sense. If p is equal to 1/2 and we toss a coin 100 times, the expected number, or the average number, of heads we expect to see will be 1/2 times 100, which is 50. The higher p is, the more successes we expect to see. And of course, if we double n, we expect to see twice as many successes.

So this is an illustration of the power of breaking up problems into simpler pieces that are easier to analyze. And the linearity of expectations is one more tool that we have in our hands for breaking up perhaps complicated random variables into simpler ones and then analyzing them separately.
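As a quick numerical check of the n times p result, here is a minimal simulation sketch (not part of the lecture; the function name and parameters are our own) that builds X as a sum of Bernoulli indicators, exactly as in the argument above:

```python
import random

def simulate_binomial_mean(n, p, runs=100_000):
    """Estimate E[X] for X ~ Binomial(n, p) by averaging, over many runs,
    the sum of n Bernoulli indicator variables X_1 + ... + X_n."""
    total = 0
    for _ in range(runs):
        # Each indicator is 1 with probability p and 0 otherwise;
        # their sum is the number of successes in n independent trials.
        total += sum(1 for _ in range(n) if random.random() < p)
    return total / runs

# With n = 100 and p = 1/2, the estimate should be close to n * p = 50.
print(simulate_binomial_mean(100, 0.5))
```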