1 00:00:00,880 --> 00:00:03,730 One situation where covariances show up 2 00:00:03,730 --> 00:00:06,110 is when we try to calculate the variance 3 00:00:06,110 --> 00:00:08,360 of a sum of random variables. 4 00:00:08,360 --> 00:00:10,210 So let us look at the variance of the sum 5 00:00:10,210 --> 00:00:13,120 of two random variables, X1 and X2. 6 00:00:13,120 --> 00:00:15,520 If the two random variables are independent, 7 00:00:15,520 --> 00:00:17,650 then we know that the variance of the sum 8 00:00:17,650 --> 00:00:20,040 is the sum of the variances. 9 00:00:20,040 --> 00:00:23,290 Let us now look at what happens in the case where 10 00:00:23,290 --> 00:00:26,120 we may have dependence. 11 00:00:26,120 --> 00:00:30,350 By definition, the variance is the expected value 12 00:00:30,350 --> 00:00:32,900 of the difference of the random variable we're 13 00:00:32,900 --> 00:00:40,210 interested in from its expected value, squared. 14 00:00:43,480 --> 00:00:47,130 And now we rearrange terms here and write 15 00:00:47,130 --> 00:00:51,180 what is inside the expectation as follows. 16 00:00:51,180 --> 00:01:00,640 We put together X1 with the term minus the expected value of X1 17 00:01:00,640 --> 00:01:07,170 and then X2 together with negative the expected value 18 00:01:07,170 --> 00:01:07,700 of X2. 19 00:01:15,590 --> 00:01:21,270 So now we have the square of the sum of two terms. 20 00:01:21,270 --> 00:01:25,000 We expand the quadratic to obtain 21 00:01:25,000 --> 00:01:32,250 expected value of the square of the first term 22 00:01:32,250 --> 00:01:45,530 plus the square of the second term plus 2 times a cross term. 23 00:01:56,770 --> 00:01:58,890 And what do we have here? 24 00:01:58,890 --> 00:02:01,210 The expected value of the first term 25 00:02:01,210 --> 00:02:04,970 is just the variance of X1. 26 00:02:04,970 --> 00:02:08,419 The expected value of this second term 27 00:02:08,419 --> 00:02:12,740 is just the variance of X2. 
28 00:02:12,740 --> 00:02:17,079 And finally, the cross term, the expected value of it, 29 00:02:17,079 --> 00:02:21,370 we recognize that it is the same as the covariance of X1 30 00:02:21,370 --> 00:02:22,590 with X2. 31 00:02:22,590 --> 00:02:26,520 And we also have this factor of 2 up here. 32 00:02:26,520 --> 00:02:29,260 So this is the general form for the variance 33 00:02:29,260 --> 00:02:32,040 of the sum of two random variables. 34 00:02:32,040 --> 00:02:35,730 In the case of independence, the covariance is 0, 35 00:02:35,730 --> 00:02:38,520 and we just have the sum of the two variances. 36 00:02:38,520 --> 00:02:41,300 But when the random variables are dependent, 37 00:02:41,300 --> 00:02:44,240 it is possible that the covariance will be non-zero, 38 00:02:44,240 --> 00:02:46,940 and we have one additional term. 39 00:02:46,940 --> 00:02:49,860 Let us now generalize this calculation. 40 00:02:49,860 --> 00:02:51,970 Here, for reference and comparison, 41 00:02:51,970 --> 00:02:55,430 is the formula for the case where we add two random variables. 42 00:02:55,430 --> 00:02:57,140 But now let us look at the variance 43 00:02:57,140 --> 00:02:59,220 of the sum of many of them. 44 00:02:59,220 --> 00:03:01,380 To keep the calculation simple, we're 45 00:03:01,380 --> 00:03:05,230 going to assume that the means are zero. 46 00:03:05,230 --> 00:03:07,390 But the final conclusion will also 47 00:03:07,390 --> 00:03:10,460 be valid for the case of non-zero means. 48 00:03:10,460 --> 00:03:12,810 Since we have assumed zero means, 49 00:03:12,810 --> 00:03:15,740 the variance is the same as the expected value 50 00:03:15,740 --> 00:03:18,890 of the square of the random variable involved, 51 00:03:18,890 --> 00:03:21,130 which is this one.
52 00:03:21,130 --> 00:03:27,190 And now we expand this quadratic to obtain the expected value 53 00:03:27,190 --> 00:03:30,620 of: we will have a bunch of terms 54 00:03:30,620 --> 00:03:36,800 of the form Xi squared, where i ranges from 1 up to n. 55 00:03:36,800 --> 00:03:43,180 And then we will have a bunch of cross terms of the form Xi Xj. 56 00:03:43,180 --> 00:03:48,380 And we obtain one cross term for each choice of i 57 00:03:48,380 --> 00:03:55,020 from 1 to n and for each choice of j from 1 to n, 58 00:03:55,020 --> 00:03:58,566 as long as i is different from j. 59 00:03:58,566 --> 00:04:04,740 So overall here, this sum will have n squared minus n terms. 60 00:04:08,810 --> 00:04:13,840 Now, we use linearity to move the expectation 61 00:04:13,840 --> 00:04:15,800 inside the summation. 62 00:04:15,800 --> 00:04:21,240 And so from here, we obtain the sum 63 00:04:21,240 --> 00:04:25,940 of the expected value of Xi squared, which 64 00:04:25,940 --> 00:04:28,820 is the same as the variance of Xi, 65 00:04:28,820 --> 00:04:31,630 since we assumed zero means. 66 00:04:31,630 --> 00:04:36,700 And similarly here, we're going to get this double sum over i's 67 00:04:36,700 --> 00:04:41,460 that are different from j of the expected value of Xi Xj. 68 00:04:41,460 --> 00:04:44,270 And in the case of 0 means again, this 69 00:04:44,270 --> 00:04:48,830 is the same as the covariance of Xi with Xj. 70 00:04:48,830 --> 00:04:52,490 And so we have obtained this general formula 71 00:04:52,490 --> 00:04:56,470 that gives us the variance of a sum of random variables. 72 00:04:56,470 --> 00:04:59,220 If the random variables have 0 covariances, 73 00:04:59,220 --> 00:05:02,740 then the variance of the sum is the sum of the variances. 74 00:05:02,740 --> 00:05:05,460 And this happens in particular when the random variables 75 00:05:05,460 --> 00:05:06,850 are independent.
76 00:05:06,850 --> 00:05:09,940 For the general case, where we may have dependencies 77 00:05:09,940 --> 00:05:13,310 and non-zero covariances, then the variance of the sum 78 00:05:13,310 --> 00:05:17,320 involves also all the possible covariances 79 00:05:17,320 --> 00:05:20,220 between the different random variables. 80 00:05:20,220 --> 00:05:23,540 And let me finally add that this formula is also 81 00:05:23,540 --> 00:05:26,380 valid for the general case where we do not 82 00:05:26,380 --> 00:05:28,690 assume that the means are zero. 83 00:05:28,690 --> 00:05:31,280 And the derivation is very similar, 84 00:05:31,280 --> 00:05:33,300 except that there are a few more symbols 85 00:05:33,300 --> 00:05:35,300 that are floating around.
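The formula derived in this segment, that the variance of a sum equals the sum of the variances plus the sum of the covariances over all ordered pairs with i different from j, can be checked numerically. The following is a minimal sketch rather than anything shown in the lecture. The helper names (`mean`, `cov`, `var`, `variance_of_sum`) and the simulated variables `x1`, `x2`, `x3` are hypothetical, chosen only for illustration. It uses population moments (dividing by N), for which the identity holds exactly on the sample up to floating-point rounding, and one variable is given a non-zero mean to illustrate that the zero-mean assumption is not needed.

```python
import random


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance: average of (x - mean_x) * (y - mean_y)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def var(xs):
    """Population variance, i.e. the covariance of a variable with itself."""
    return cov(xs, xs)


def variance_of_sum(columns):
    """Sum of variances plus covariances over all ordered pairs i != j."""
    n = len(columns)
    total = sum(var(c) for c in columns)
    total += sum(
        cov(columns[i], columns[j])
        for i in range(n)
        for j in range(n)
        if i != j  # n*n - n cross terms, as counted in the lecture
    )
    return total


# Hypothetical data: three dependent variables sharing a common component z.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(10_000)]
x1 = [zi + rng.gauss(0, 1) for zi in z]
x2 = [2 * zi + rng.gauss(0, 1) + 5 for zi in z]  # non-zero mean on purpose
x3 = [-zi + rng.gauss(0, 1) for zi in z]
columns = [x1, x2, x3]

# Variance of the sum computed directly from the summed samples.
sums = [sum(vals) for vals in zip(*columns)]
direct = var(sums)

# Variance of the sum computed from the general formula.
formula = variance_of_sum(columns)

print(direct, formula)
```

For two variables, `variance_of_sum([x1, x2])` reduces to `var(x1) + var(x2) + 2 * cov(x1, x2)`, because the two ordered pairs (1, 2) and (2, 1) contribute the same covariance. That is where the factor of 2 in the two-variable formula comes from.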