Now that we have found the solution to the linear least mean squares estimation problem, it is time to offer a few comments, make some observations, and provide some insights.

A first important observation is the following. In order to implement this estimator, you do not really need to know everything about the distribution of X and Theta. The only things that you need to know are the means of the two random variables involved, the variance of X, and the covariance of Theta with X. So it is only a few pieces of information that we need, and that means we do not need to be so careful about modeling in a particular problem, as long as we know what the means, variances, and covariances are. This is a very desirable property, because it tells us that this estimator could be simpler to implement in the real world.

Now let us start looking at the form of the solution, and let us try to give some interpretation. Suppose that the correlation coefficient is positive. Then what does this estimator do? It starts with a baseline estimate, which is the expected value of Theta, and then provides us with a correction term that is based on the data. In particular, if we happen to see an observation that is larger than expected, then our estimate is going to be above the expected value of Theta. So we start with our baseline estimate, but if we get a large observation, then our estimate will also be large. And conversely, if X happens to be on the lower side, below its expected value, then our estimate will also be below the expected value of Theta.

Of course, there is an analogous story if rho is negative: the same argument applies, except that it works in the opposite direction. When rho is negative, if we see a large X, then we will come up with a low estimate for Theta.
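As a side note, here is a minimal Python sketch, not from the lecture, of the estimator in the form just discussed, Theta_hat = E[Theta] + rho * (sigma_Theta / sigma_X) * (x - E[X]); the function name and argument names are purely illustrative.

    def llms_estimate(x, mean_theta, mean_x, sigma_theta, sigma_x, rho):
        # Linear least mean squares estimate of Theta given the observation x.
        # Only the means, the standard deviations, and the correlation
        # coefficient are needed -- no further knowledge of the joint
        # distribution of X and Theta is required.
        return mean_theta + rho * (sigma_theta / sigma_x) * (x - mean_x)

For example, with mean_theta = 1, mean_x = 0, sigma_theta = 2, sigma_x = 1, and rho = 0.5, the observation x = 3 gives the estimate 1 + 0.5 * 2 * 3 = 4, which sits above the baseline E[Theta] = 1, as described above.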
Let us look at another special case now. Suppose that rho is equal to 0, so that X and Theta are uncorrelated. Then this last term disappears, and in this case our estimate is going to be just the expected value of Theta. In other words, our estimate does not make any use of the data. Essentially, what is happening is that a linear estimator exploits the correlation between the two random variables to come up with an estimate. But if the two random variables are uncorrelated, then there is nothing that it can do, and it does not give us anything useful. It just reports back the expected value of Theta.

Let us now look at the mean squared error that is obtained when we implement this linear estimator. Let us write down what this expression is. To keep things simple, let us assume that we have zero means. So, just for the purposes of this derivation and to simplify the algebra, we will work with the zero-mean case.

In this case, we have the following: let me first write Theta, and then put here the estimator. The estimator is rho times sigma Theta over sigma X, times X. Basically, I took this formula, but I put zeros for the expected values.

Now let us expand this quadratic. We obtain the expected value of Theta squared. That is the variance of Theta, since we assume zero means, and the variance is the square of the standard deviation. Then we have a cross term: it is going to be minus twice the expectation of the product of Theta with the estimator. Now, we can take the constant outside of the expectation, so it is 2 rho sigma Theta over sigma X, and then we are going to have the expected value of Theta times X. Because we assume zero means, the expected value of Theta times X is the covariance of Theta and X, and the covariance is rho sigma Theta sigma X. And finally, we have a last term, which is rho squared sigma Theta squared divided by sigma X squared, times X squared. We have an expected value, so that is the expected value of X squared, which is just the variance of X, or sigma X squared.

OK, this looks like a big mess, but in fact we get nice cancellations, as written out below.
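For reference, the expansion just described, written out under the zero-mean assumption, is

\[
\mathbb{E}\!\left[\left(\Theta - \rho \tfrac{\sigma_\Theta}{\sigma_X} X\right)^{2}\right]
= \mathbb{E}[\Theta^{2}]
- 2\rho \tfrac{\sigma_\Theta}{\sigma_X}\,\mathbb{E}[\Theta X]
+ \rho^{2}\tfrac{\sigma_\Theta^{2}}{\sigma_X^{2}}\,\mathbb{E}[X^{2}]
= \sigma_\Theta^{2}
- 2\rho \tfrac{\sigma_\Theta}{\sigma_X}\,\rho\,\sigma_\Theta\sigma_X
+ \rho^{2}\tfrac{\sigma_\Theta^{2}}{\sigma_X^{2}}\,\sigma_X^{2}.
\]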
The sigma X in the cross term cancels with the sigma X in its denominator, and the sigma X squared in the last term cancels with the sigma X squared in its denominator. We then notice that we have sigma Theta squared in each one of those terms, so we can factor it out. And then we are left with minus 2 rho squared plus rho squared; when we combine those two terms, we are left with just a minus rho squared term. So, after you do the algebra carefully, you find that the answer takes a very simple form: it is (1 minus rho squared) times the variance of Theta. I should add here that this formula remains valid even without the assumption that X and Theta have zero means.

What is the interpretation of this? The variance of Theta describes the initial uncertainty we have about Theta. But after we carry out the estimation, the uncertainty gets reduced, and it gets reduced by a certain factor. What is this factor? If rho is equal to 0, then this coefficient is 1, and we do not have any variance reduction. After all, when rho is equal to 0, this estimator is not very useful; it does not help us estimate Theta better. So the expected value of the squared error is the same as the variance of Theta. On the other hand, when rho is large in magnitude, then this factor becomes small, and this means that the mean squared error is small.

In fact, there is an interesting extreme case, namely when rho is equal to 1 in absolute value. If the random variables are perfectly correlated, then this factor becomes 0, and the mean squared estimation error is 0, which essentially means that our estimate is going to be equal to the unknown value. So the special case of unit correlation, in absolute value, corresponds to a case where we can perfectly estimate Theta using a linear estimator.

So, to summarize, the correlation coefficient plays a crucial role in linear least squares estimation. It determines the form of the estimator, and it also determines how much the uncertainty in Theta will be reduced through the process of estimation.
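As a quick numerical illustration, not part of the lecture, the variance-reduction formula can be checked by simulation with jointly Gaussian Theta and X; the parameter values below are chosen arbitrarily for this sketch.

    import numpy as np

    rng = np.random.default_rng(0)
    rho, sigma_theta, sigma_x = 0.8, 2.0, 1.0
    n = 1_000_000

    # Draw jointly Gaussian (Theta, X) with zero means and correlation rho.
    cov = [[sigma_theta**2, rho * sigma_theta * sigma_x],
           [rho * sigma_theta * sigma_x, sigma_x**2]]
    theta, x = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    # Linear least mean squares estimator (zero means, so no offset terms).
    theta_hat = rho * (sigma_theta / sigma_x) * x

    empirical_mse = np.mean((theta - theta_hat) ** 2)
    predicted_mse = (1 - rho**2) * sigma_theta**2
    print(empirical_mse, predicted_mse)  # both approximately 1.44

Both printed values come out close to 1.44, which is (1 minus 0.64) times 4; with rho set to 0 the error stays at the full variance of Theta, consistent with the discussion above.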