We have presented the complete solution to the linear least mean squares estimation problem, when we want to estimate a certain unknown random variable on the basis of a different random variable X that we get to observe.

But what if we have multiple observations? What would be the analogous formulation of the problem? Here's the idea. Once more, we restrict ourselves to estimators that are linear functions of the data, linear functions of the observations that we have. And then we pose the problem of finding the best choices of these coefficients a1 up to an and b.

What does it mean to find the best choices? It means that if we fix certain choices, we obtain an estimator; we look at the difference between the estimator and the quantity we're trying to estimate, take the square, and then take the expectation. So once more, we're looking at the mean squared error of our estimator, and we try to make it as small as possible.

So this is a well-defined optimization problem. We have a quantity which is a function of certain parameters, and we wish to find the choices for those parameters, or those coefficients, that will make this quantity as small as possible.

One first comment, similar to the case where we had a single measurement, is the following. If it turns out that the conditional expectation of Theta given all of the data that we have is linear in the X's, then what happens? We know that the conditional expectation is the best possible estimator. If it is also linear, then it is the best estimator within the class of linear estimators as well, and therefore the linear least mean squares estimator is the same as the general least mean squares estimator. So if for some problems it turns out that the conditional expectation is linear, then we automatically also have the optimal linear estimator. And this is going to be the case, once more, for certain normal problems with a linear structure of the type that we have studied earlier.

Now, let us look into what it takes to carry out this optimization.
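In symbols, the formulation described above can be written as follows (a transcription of the spoken description, with notation assumed to match the lecture's slides):

\hat{\Theta} = a_1 X_1 + \cdots + a_n X_n + b,
\qquad
\min_{a_1, \ldots, a_n, \, b} \; E\big[ (a_1 X_1 + \cdots + a_n X_n + b - \Theta)^2 \big].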
If we had a single observation, then we have seen a closed-form formula, a fairly simple formula, that tells us what the coefficients should be. For the more general case, the formulas would not be as simple, but we can make the following observations.

If you take this expression and expand it, it's going to have a bunch of terms. For example, it's going to have a term of the form a1 squared times the expected value of X1 squared. It's going to have a term such as twice a1 a2 times the expected value of X1 X2. And then there are going to be many more terms; some of them will also involve products of Theta with the X's. So we might see a term of the form a1 times the expected value of X1 Theta. And then there are going to be many, many more terms.

What's the important thing to notice? That this expression, as a function of the coefficients, involves only terms of these kinds: first-order or second-order terms in the coefficients. To minimize this expression, we're going to take the derivatives and set them equal to 0. When you take the derivative of a function that involves only quadratic and linear terms, you get something that's linear in the coefficients. The conclusion out of all this discussion is that when you actually go and carry out this minimization by setting derivatives to zero, what you will end up doing is solving a system of linear equations in the coefficients that you're trying to determine. And why is this interesting? Well, it is because if you actually want to carry out this minimization, all you need to do is to solve a linear system, which is easily done on a computer.

The next observation is that this expression only involves expectations of various terms that are second order in the random variables involved. So it involves the expected value of X1 squared; it involves a term which has something to do with the covariance of X1 and X2; and a term that has something to do with the covariance of X1 with Theta.
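To make the computational procedure concrete, here is a minimal numerical sketch in Python. It is not from the lecture: it assumes we are given the means of the X's and of Theta, the covariance matrix of the X's, and the covariance of each X_i with Theta, and all of the numbers below are illustrative placeholders.

import numpy as np

# Assumed problem data (illustrative values):
mu_X = np.array([1.0, 2.0])        # means of X_1, ..., X_n
mu_Theta = 0.5                     # mean of Theta
Sigma_XX = np.array([[2.0, 0.5],   # covariance matrix of the X's
                     [0.5, 1.0]])
c_XTheta = np.array([0.8, 0.3])    # Cov(X_i, Theta) for each i

# Setting the derivatives of the mean squared error to zero and
# eliminating b gives the linear system  Sigma_XX a = c_XTheta
# for the coefficients a_1, ..., a_n; b is then chosen so that
# the estimator has the correct mean (zero bias).
a = np.linalg.solve(Sigma_XX, c_XTheta)
b = mu_Theta - a @ mu_X

print("coefficients a:", a)
print("intercept b:", b)

Note that the sketch uses exactly the quantities discussed next: no distributional detail beyond means, variances, and covariances enters the computation.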
But these are the only quantities, out of the distribution of the X's and of Theta, that will matter. So, similar to the case where we had a single observation, in order to solve this problem we do not need to know the complete distribution of the X's and of Theta. It is enough to know all of the means, variances, and covariances of the random variables that are involved. And once more, this makes this approach to estimation a practical one, because we do not need to model in complete detail the distribution of the different random variables.

Finally, if we do not have just one unknown random variable, but we have multiple random variables that we want to estimate, what should we do? Well, this is pretty simple. You just apply this estimation methodology to each one of the unknown random variables separately.

To conclude, this linear estimation methodology applies also to the case where you have multiple observations. You need to solve a certain computational problem in order to find the structure of the best linear estimator, but it is not a very difficult computational problem, because all that it involves is minimizing a quadratic function of the coefficients that you are trying to determine. And this leads us to having to solve a system of linear equations. For all these reasons, linear estimation, or estimation using linear estimators, is quite practical.
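Continuing the earlier sketch, handling several unknowns Theta_1, ..., Theta_m separately amounts to solving the same linear system once per unknown. Since numpy's linalg.solve accepts a matrix right-hand side, all the coefficient vectors can be obtained in one call; again, the numbers are illustrative placeholders.

# One column of C_XTheta per unknown: column j holds Cov(X_i, Theta_j).
C_XTheta = np.array([[0.8, 0.1],
                     [0.3, 0.4]])
mu_Thetas = np.array([0.5, 1.0])   # means of Theta_1, ..., Theta_m

A = np.linalg.solve(Sigma_XX, C_XTheta)  # column j: coefficients for Theta_j
B = mu_Thetas - A.T @ mu_X               # one intercept per unknown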