1 00:00:00,610 --> 00:00:03,930 Let us now revisit the variance and see what happens 2 00:00:03,930 --> 00:00:06,000 in the case of independence. 3 00:00:06,000 --> 00:00:08,640 Variances have some general properties that we have 4 00:00:08,640 --> 00:00:11,010 already seen. 5 00:00:11,010 --> 00:00:14,910 However, since we often add random variables, we would 6 00:00:14,910 --> 00:00:18,020 like to be able to say something about the variance 7 00:00:18,020 --> 00:00:20,800 of the sum of two random variables. 8 00:00:20,800 --> 00:00:24,240 Unfortunately, the situation is not so simple, and in 9 00:00:24,240 --> 00:00:28,150 general, the variance of the sum is not the same as the sum 10 00:00:28,150 --> 00:00:29,670 of the variances. 11 00:00:29,670 --> 00:00:32,150 We will see an example shortly. 12 00:00:32,150 --> 00:00:36,430 On the other hand, when X and Y are independent, the 13 00:00:36,430 --> 00:00:41,610 variance of the sum is equal to the sum of the variances, 14 00:00:41,610 --> 00:00:44,030 and this is a very useful fact. 15 00:00:44,030 --> 00:00:47,040 Let us go through the derivation of this property. 16 00:00:47,040 --> 00:00:52,410 But to keep things simple, let us assume, just for the sake of 17 00:00:52,410 --> 00:00:56,820 the derivation, that the two random variables have 0 mean. 18 00:01:02,650 --> 00:01:08,700 So in that case, the variance of the sum is just the 19 00:01:08,700 --> 00:01:11,675 expected value of the square of the sum. 20 00:01:17,210 --> 00:01:21,025 And we can expand the quadratic and write this as 21 00:01:21,025 --> 00:01:29,170 the expectation of X squared plus 2 X Y plus Y squared. 22 00:01:29,170 --> 00:01:33,160 Then we use linearity of expectations to write this as 23 00:01:33,160 --> 00:01:38,450 the expected value of X squared plus twice the 24 00:01:38,450 --> 00:01:46,910 expected value of X times Y and then plus the expected 25 00:01:46,910 --> 00:01:50,570 value of Y squared. 
26 00:01:50,570 --> 00:01:55,370 Now, the first term is just the variance of X because we 27 00:01:55,370 --> 00:01:56,920 have assumed that we have 0 mean. 28 00:01:59,680 --> 00:02:06,320 The last term is similarly the variance of Y. How about the 29 00:02:06,320 --> 00:02:07,870 middle term? 30 00:02:07,870 --> 00:02:12,900 Because of independence, the expected value of the product 31 00:02:12,900 --> 00:02:18,850 is the same as the product of the expected values, and the 32 00:02:18,850 --> 00:02:21,690 expected values are 0 in our case. 33 00:02:21,690 --> 00:02:26,720 So this term, because of independence, is going to be 34 00:02:26,720 --> 00:02:28,270 equal to 0. 35 00:02:28,270 --> 00:02:33,060 In particular, what we have is that the expected value of XY 36 00:02:33,060 --> 00:02:38,750 equals the expected value of X times the expected value of Y, 37 00:02:38,750 --> 00:02:41,920 equal to 0. 38 00:02:41,920 --> 00:02:45,180 And so we have verified that indeed the variance of the sum 39 00:02:45,180 --> 00:02:47,910 is equal to the sum of the variances. 40 00:02:47,910 --> 00:02:49,685 Let us now look at some examples. 41 00:02:52,560 --> 00:02:56,730 Suppose that X is the same random variable as Y. Clearly, 42 00:02:56,730 --> 00:02:59,610 this is a case where independence fails to hold. 43 00:02:59,610 --> 00:03:03,410 If I tell you the value of X, then you know the value of Y. 44 00:03:03,410 --> 00:03:07,400 So in this case, the variance of the sum is the same as the 45 00:03:07,400 --> 00:03:12,910 variance of twice X. Since X is the same as Y, X plus Y is 46 00:03:12,910 --> 00:03:18,760 2 times X. And then, using the property for what happens to the 47 00:03:18,760 --> 00:03:21,500 variance when we multiply by a constant, 48 00:03:21,500 --> 00:03:27,110 this is going to be 4 times the variance of X. 49 00:03:27,110 --> 00:03:31,230 In another example, suppose that X is the negative of Y. 
50 00:03:31,230 --> 00:03:35,790 In that case, X plus Y is identically equal to 0. 51 00:03:35,790 --> 00:03:38,640 So we're dealing with a random variable that 52 00:03:38,640 --> 00:03:40,030 takes a constant value. 53 00:03:40,030 --> 00:03:43,510 In particular, it is always equal to its mean, and so the 54 00:03:43,510 --> 00:03:46,790 difference from the mean is always equal to 0, and so the 55 00:03:46,790 --> 00:03:50,370 variance will also evaluate to 0. 56 00:03:50,370 --> 00:03:53,720 So we see that the variance of the sum can take quite 57 00:03:53,720 --> 00:03:58,410 different values depending on the sort of interrelation that 58 00:03:58,410 --> 00:04:01,450 we have between the two random variables. 59 00:04:01,450 --> 00:04:04,740 So these two examples indicate that knowing the variance of 60 00:04:04,740 --> 00:04:08,780 each one of the random variables is not enough to say 61 00:04:08,780 --> 00:04:11,440 much about the variance of the sum. 62 00:04:11,440 --> 00:04:14,530 The answer will generally depend on how the two random 63 00:04:14,530 --> 00:04:17,260 variables are related to each other and what kind of 64 00:04:17,260 --> 00:04:19,410 dependencies they have. 65 00:04:19,410 --> 00:04:25,260 As a last example, suppose now that X and Y are independent. 66 00:04:25,260 --> 00:04:29,080 X is independent from Y, and therefore X is also 67 00:04:29,080 --> 00:04:32,740 independent from minus 3Y. 68 00:04:32,740 --> 00:04:36,620 Therefore, the variance of X minus 3Y is equal to the sum of the 69 00:04:36,620 --> 00:04:44,080 variances of X and of minus 3Y. 70 00:04:44,080 --> 00:04:48,190 And using the facts that we already know, this is going to 71 00:04:48,190 --> 00:04:55,440 be equal to the variance of X plus 9 times the variance of 72 00:04:55,440 --> 00:04:59,490 Y. 
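The three examples above (Y equal to X, Y equal to minus X, and independent X and Y with the combination X minus 3Y) can be checked exactly on small discrete distributions. The following is only a sketch: the PMFs `px` and `py` and the helper names `variance` and `pmf_of` are hypothetical choices made for illustration and do not come from the lecture.

```python
from fractions import Fraction
from itertools import product


def variance(pmf):
    """Variance of a discrete random variable given as {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    return sum((v - mean) ** 2 * p for v, p in pmf.items())


def pmf_of(func, joint):
    """PMF of func(x, y) when (x, y) has the joint PMF {(x, y): probability}."""
    out = {}
    for (x, y), p in joint.items():
        z = func(x, y)
        out[z] = out.get(z, 0) + p
    return out


# Hypothetical example distributions (assumed for illustration only).
px = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
py = {-1: Fraction(1, 3), 2: Fraction(2, 3)}

# Joint PMFs for the three cases discussed in the examples.
same = {(x, x): p for x, p in px.items()}        # Y = X (fully dependent)
negated = {(x, -x): p for x, p in px.items()}    # Y = -X (fully dependent)
independent = {                                  # X and Y independent
    (x, y): p * q for (x, p), (y, q) in product(px.items(), py.items())
}

var_x = variance(px)
var_y = variance(py)

var_same = variance(pmf_of(lambda x, y: x + y, same))            # var(X + X)
var_negated = variance(pmf_of(lambda x, y: x + y, negated))      # var(X - X)
var_indep = variance(pmf_of(lambda x, y: x - 3 * y, independent))  # var(X - 3Y)

print(var_same == 4 * var_x)            # var(2X) = 4 var(X)
print(var_negated == 0)                 # a constant has zero variance
print(var_indep == var_x + 9 * var_y)   # var(X - 3Y) = var(X) + 9 var(Y)
```

Exact fractions are used instead of simulation so that the equalities hold exactly rather than approximately.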
73 00:04:59,490 --> 00:05:02,970 As an illustration of the usefulness of the property of 74 00:05:02,970 --> 00:05:06,300 the variance that we have just established, we will now use 75 00:05:06,300 --> 00:05:10,240 it to calculate the variance of a binomial random variable. 76 00:05:10,240 --> 00:05:14,330 Remember that a binomial with parameters n and p corresponds 77 00:05:14,330 --> 00:05:17,780 to the number of successes in n independent trials. 78 00:05:20,400 --> 00:05:22,700 We use indicator variables. 79 00:05:22,700 --> 00:05:25,870 This is the same trick that we used to calculate the expected 80 00:05:25,870 --> 00:05:27,670 value of the binomial. 81 00:05:27,670 --> 00:05:31,795 So the random variable X sub i is equal to 1 if the i-th 82 00:05:31,795 --> 00:05:36,070 trial is a success and is 0 otherwise. 83 00:05:36,070 --> 00:05:42,030 And as we did before, we note that X, the total number of 84 00:05:42,030 --> 00:05:45,909 successes, is the sum of those indicator variables. 85 00:05:45,909 --> 00:05:49,930 Each success makes one of those variables equal to 1, so 86 00:05:49,930 --> 00:05:53,630 by adding those indicator variables, we're just counting 87 00:05:53,630 --> 00:05:56,120 the number of successes. 88 00:05:56,120 --> 00:06:01,100 The key point to note is that the assumption of independence 89 00:06:01,100 --> 00:06:05,400 that we're making is essentially the assumption 90 00:06:05,400 --> 00:06:11,670 that these random variables Xi are independent of each other. 91 00:06:11,670 --> 00:06:16,130 So we're dealing with a situation where we have a sum 92 00:06:16,130 --> 00:06:20,790 of independent random variables, and according to 93 00:06:20,790 --> 00:06:25,490 what we have shown, the variance of X is going to be 94 00:06:25,490 --> 00:06:28,103 the sum of the variances of the Xi's. 95 00:06:34,620 --> 00:06:39,500 Now, the Xi's all have the same distribution so all these 96 00:06:39,500 --> 00:06:40,950 variances will be the same. 
97 00:06:40,950 --> 00:06:45,520 It suffices to consider one of them. 98 00:06:45,520 --> 00:06:49,409 Now, X1 is a Bernoulli random variable with parameter p. 99 00:06:49,409 --> 00:06:52,210 We know what its variance is-- 100 00:06:52,210 --> 00:06:57,580 it is p times 1 minus p. 101 00:06:57,580 --> 00:07:02,890 And therefore, the variance of a binomial random variable 102 00:07:02,890 --> 00:07:04,380 is n times p times 1 minus p.
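The conclusion, that a binomial random variable with parameters n and p has variance n times p times 1 minus p, can be compared against a direct computation from the binomial PMF. This is a minimal sketch: the values n = 10 and p = 3/10 and the helper names `binomial_pmf` and `variance` are assumptions made for illustration, not part of the lecture.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """PMF of the number of successes in n independent trials, as {k: probability}."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def variance(pmf):
    """Variance of a discrete random variable given as {value: probability}."""
    mean = sum(k * q for k, q in pmf.items())
    return sum((k - mean) ** 2 * q for k, q in pmf.items())


# Hypothetical parameters (assumed for illustration only).
n, p = 10, Fraction(3, 10)

direct = variance(binomial_pmf(n, p))   # variance computed from the PMF itself
formula = n * p * (1 - p)               # sum of n Bernoulli variances p(1 - p)

print(direct, formula, direct == formula)
```

Using `Fraction` for p keeps the arithmetic exact, so the direct computation and the formula can be compared with `==` rather than with a tolerance.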