In this segment, we're going to go over a few theoretical properties of the estimation error in least mean squares estimation.

Recall that our least mean squares estimator is the conditional expectation of the unknown random variable, given our observations. Let us define the error, which is the difference between the estimator and the random variable that we are trying to estimate.

Let us start with some observations. What is the expected value of our estimator? Well, using the law of iterated expectations, the expectation of a conditional expectation is the same as the unconditional expectation. And using this property, by moving this Theta to the other side, what we obtain is that the estimation error has an expectation of 0. So this tells us that the estimation error, on the average, is equal to 0, which is good news.

In fact, something stronger is true. Not just the overall average of the estimation error is 0, but even if you condition on a particular measurement, still the conditional expectation of your estimation error is going to be equal to 0. Let us derive this relation.
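As a quick sanity check, here is a small simulation of the unbiasedness property. The model used below is a hypothetical illustration, not something from this segment: Theta is standard normal and we observe X = Theta + W with independent standard normal noise W, in which case the LMS estimator happens to have the closed form E[Theta | X] = X/2. The empirical mean of the estimation error, both overall and conditioned on X landing near a particular value, should come out close to 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Illustrative model (an assumption, not part of the transcript):
# Theta ~ N(0, 1), observed through X = Theta + W, W ~ N(0, 1).
theta = rng.standard_normal(n)
x = theta + rng.standard_normal(n)

# For this jointly Gaussian model, the LMS estimator is E[Theta | X] = X / 2.
theta_hat = x / 2
theta_tilde = theta_hat - theta  # estimation error, Theta tilde

# Overall mean of the error: approximately 0.
print("overall mean of error:", theta_tilde.mean())

# Conditional mean of the error, given that X is near 1: also near 0.
near_one = (x > 0.9) & (x < 1.1)
print("mean of error given X near 1:", theta_tilde[near_one].mean())
```

Both printed values hover around 0 up to sampling noise, matching the unconditional and conditional zero-mean properties derived above.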
We're looking at the expected value of Theta tilde, which is Theta hat minus Theta, conditional on a value of X. Now, if I tell you the value of X, then the estimator is completely determined -- there's no uncertainty about it -- so the expectation of Theta hat, in this conditional universe, is just Theta hat itself. And we're left with the second term, but the second term is also Theta hat, and therefore we obtain a difference of 0.

Let us now move to a slightly more complicated question. What is the covariance between the estimation error and the estimate? We will calculate the covariance as follows. It is the expected value of the product of the two random variables that we are interested in, minus the product of their expectations.

Now, we already calculated that the expected value of the estimation error is equal to 0, and therefore, this term here disappears. This term is equal to 0. So we now need to calculate the first term. This may seem difficult, but conditioning is always a great trick, so let's do that. Let us start by calculating the conditional expectation of this product.
As before, in the conditional universe, where we're told the value of X, the value of Theta hat is known. It becomes a constant, so it can be pulled outside the expectation. But then we can apply the fact that we established earlier, that this term is 0, and therefore, we obtain a 0 here.

Now, the expected value of a random variable is the same as the expected value of the conditional expectation. This is, again, the law of iterated expectations. Since the conditional expectation is 0, when we apply the law of iterated expectations to this quantity, we also obtain a 0. Therefore, this term is 0 as well, and we have established what we wanted to show.

Using this fact, now we can figure out that the following is true. We write the random variable Theta as the difference of Theta hat minus Theta tilde. This comes simply from this definition here, by just moving Theta to this side, and Theta tilde to the other side. So Theta is the difference of two random variables, and these two random variables have 0 covariance.
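The zero-covariance property can also be checked numerically. The jointly Gaussian model in the sketch below (Theta standard normal, X = Theta + W with independent standard normal W, so that E[Theta | X] = X/2) is an illustrative assumption, not something taken from this segment; the sample covariance between the estimation error and the estimator should be close to 0.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Illustrative Gaussian model (an assumption, not from the transcript):
# Theta ~ N(0, 1), X = Theta + W, W ~ N(0, 1) independent of Theta.
theta = rng.standard_normal(n)
x = theta + rng.standard_normal(n)

theta_hat = x / 2                # LMS estimator E[Theta | X] for this model
theta_tilde = theta_hat - theta  # estimation error

# Sample covariance between the error and the estimator: approximately 0.
cov = np.cov(theta_tilde, theta_hat)[0, 1]
print("cov(error, estimator):", cov)
```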
When two random variables have 0 covariance, then the variance of their sum, or of their difference, is the sum of the variances. And this leads us to this relation -- that the variance of our random variable can be decomposed into two pieces. One of them is the variance of the estimator, and the other is the variance of the estimation error.

This is an interesting fact. It can actually be derived in a different way, as well. It is just a manifestation of the law of total variance, but hidden in somewhat different notation.

And this concludes our discussion of theoretical properties of the estimation error. Unfortunately, we will not have the opportunity to use them in any interesting ways. On the other hand, they are a foundational piece for the more general theory of least squares estimation. If you try to develop that theory in a more sophisticated and deeper way, it turns out that these properties are its cornerstones.
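The variance decomposition can likewise be verified by simulation. Again, the Gaussian model below (Theta standard normal, X = Theta + W with independent standard normal W, giving E[Theta | X] = X/2) is only an illustrative assumption; the check is that var(Theta) matches var(Theta hat) plus var(Theta tilde) up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Illustrative Gaussian model (an assumption, not from the transcript).
theta = rng.standard_normal(n)
x = theta + rng.standard_normal(n)

theta_hat = x / 2                # LMS estimator E[Theta | X] for this model
theta_tilde = theta_hat - theta  # estimation error

# var(Theta) should decompose as var(Theta hat) + var(Theta tilde).
print("var(Theta):          ", theta.var())
print("var(hat) + var(tilde):", theta_hat.var() + theta_tilde.var())
```

In this particular model, both sides come out close to 1, with the estimator and the error each contributing about half of the variance.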