1 00:00:01,560 --> 00:00:04,260 As we discussed earlier, even if we 2 00:00:04,260 --> 00:00:06,390 have multiple observations, we can still 3 00:00:06,390 --> 00:00:11,050 find the structure of the best linear estimator in a fairly 4 00:00:11,050 --> 00:00:13,650 simple, computational way, by solving 5 00:00:13,650 --> 00:00:15,470 a system of linear equations. 6 00:00:15,470 --> 00:00:19,880 But usually, we do not get nice and simple formulas. 7 00:00:19,880 --> 00:00:22,010 But here is a nice example, in which 8 00:00:22,010 --> 00:00:25,430 we will get a simple formula. 9 00:00:25,430 --> 00:00:27,550 The example is something that we have 10 00:00:27,550 --> 00:00:30,930 seen before, in various guises. 11 00:00:30,930 --> 00:00:34,080 We're trying to estimate a certain quantity, Theta. 12 00:00:34,080 --> 00:00:39,060 And what we obtain is multiple noisy observations of Theta. 13 00:00:39,060 --> 00:00:43,800 That is, at each observation we see Theta plus a noise term. 14 00:00:43,800 --> 00:00:46,340 The assumption that we make is that Theta 15 00:00:46,340 --> 00:00:49,290 has a prior distribution with a certain mean 16 00:00:49,290 --> 00:00:50,820 and a certain variance. 17 00:00:50,820 --> 00:00:53,950 And the noise terms are zero mean, 18 00:00:53,950 --> 00:00:56,750 but they have some variance. 19 00:00:56,750 --> 00:00:59,060 And the additional assumption that we make 20 00:00:59,060 --> 00:01:01,790 is that all of these random variables 21 00:01:01,790 --> 00:01:03,860 are independent of each other. 22 00:01:03,860 --> 00:01:06,080 So the noise terms are independent 23 00:01:06,080 --> 00:01:09,850 among themselves, and also, the noise terms are independent 24 00:01:09,850 --> 00:01:11,360 of Theta. 25 00:01:11,360 --> 00:01:14,060 This is the usual assumption, but actually, 26 00:01:14,060 --> 00:01:15,640 in the linear estimation problem, 27 00:01:15,640 --> 00:01:19,010 we do not need to make an independence assumption.
28 00:01:19,010 --> 00:01:21,560 It's enough for our purposes to just assume 29 00:01:21,560 --> 00:01:23,890 that they are uncorrelated. 30 00:01:23,890 --> 00:01:26,320 So we will assume that the correlation coefficient 31 00:01:26,320 --> 00:01:32,000 between any two of these random variables is equal to zero. 32 00:01:32,000 --> 00:01:37,890 Now we can write down the form of the mean squared estimation 33 00:01:37,890 --> 00:01:40,250 error criterion that we have, and try 34 00:01:40,250 --> 00:01:42,759 to find good choices for the coefficients 35 00:01:42,759 --> 00:01:46,590 to be attached to each one of the observations. 36 00:01:46,590 --> 00:01:48,940 However, we're going to find the solution 37 00:01:48,940 --> 00:01:51,789 to this problem using a shortcut that's 38 00:01:51,789 --> 00:01:55,770 going to bypass all kinds of computations. 39 00:01:55,770 --> 00:01:58,680 Here's the trick. 40 00:01:58,680 --> 00:02:03,840 Let us suppose, in addition, that these random variables were 41 00:02:03,840 --> 00:02:06,740 not just uncorrelated, but independent, 42 00:02:06,740 --> 00:02:09,628 and that they happen to be normal random variables. 43 00:02:12,790 --> 00:02:16,760 This is a problem that we did study before, 44 00:02:16,760 --> 00:02:20,590 and we did find the maximum a posteriori probability 45 00:02:20,590 --> 00:02:22,990 estimate of Theta. 46 00:02:22,990 --> 00:02:25,040 Because the posterior was normal, 47 00:02:25,040 --> 00:02:28,560 we found that this was also the conditional expectation 48 00:02:28,560 --> 00:02:30,450 estimator of Theta. 49 00:02:30,450 --> 00:02:36,730 And we did find a formula for it, which took this form. 50 00:02:36,730 --> 00:02:40,670 This was the form of the optimal estimate of Theta 51 00:02:40,670 --> 00:02:44,490 if you obtain values for the different observations, 52 00:02:44,490 --> 00:02:47,120 little x_i.
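[Editor's note: the formula the speaker points to is on the slide and does not appear in the captions. For the normal model described here, the standard result is presumably the precision-weighted average below. The symbols n (number of observations), sigma_0^2 (prior variance of Theta) and sigma_i^2 (variance of the i-th noise term) are notation assumed for this sketch; only x_i and x_0 are named in the lecture.]

```latex
% Estimate, given observed values x_1, ..., x_n (x_0 is the prior mean):
\hat{\theta} \;=\; \frac{\displaystyle\sum_{i=0}^{n} \frac{x_i}{\sigma_i^2}}
                        {\displaystyle\sum_{i=0}^{n} \frac{1}{\sigma_i^2}}

% Estimator, in random-variable notation (x_0 stays a constant):
\hat{\Theta} \;=\; \frac{\displaystyle \frac{x_0}{\sigma_0^2} + \sum_{i=1}^{n} \frac{X_i}{\sigma_i^2}}
                        {\displaystyle\sum_{i=0}^{n} \frac{1}{\sigma_i^2}}
```

[Each observation is weighted by the inverse of its noise variance, and the prior mean enters as if it were one more observation with variance sigma_0^2.]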
53 00:02:47,120 --> 00:02:49,230 On the other hand, if you want to translate this 54 00:02:49,230 --> 00:02:54,350 into random variable notation, then notice that this is going 55 00:02:54,350 --> 00:02:57,620 to be a random variable; this is our estimator. 56 00:02:57,620 --> 00:03:00,890 It's the conditional expectation of Theta given X, 57 00:03:00,890 --> 00:03:03,580 and it's random because it depends 58 00:03:03,580 --> 00:03:07,430 on the values of the data that we see, which are themselves 59 00:03:07,430 --> 00:03:08,920 random variables. 60 00:03:08,920 --> 00:03:11,950 On the other hand, this x0 is actually 61 00:03:11,950 --> 00:03:14,220 the prior mean of Theta. 62 00:03:14,220 --> 00:03:16,210 So this is a constant; it's not random, 63 00:03:16,210 --> 00:03:20,500 and that's why we keep it with a lowercase notation. 64 00:03:20,500 --> 00:03:23,400 Now, what is interesting about this form? 65 00:03:23,400 --> 00:03:27,550 It is a linear function of the observations. 66 00:03:27,550 --> 00:03:31,800 And as we have discussed earlier, if it turns out 67 00:03:31,800 --> 00:03:35,530 that the conditional expectation is linear in the observations, 68 00:03:35,530 --> 00:03:41,079 then this is also the best possible linear estimator. 69 00:03:41,079 --> 00:03:44,030 So for the special case where our random variables are 70 00:03:44,030 --> 00:03:48,060 independent and normal, we have a formula 71 00:03:48,060 --> 00:03:51,600 for the best linear estimator. 72 00:03:51,600 --> 00:03:55,070 Now what if they are not normal? 73 00:03:55,070 --> 00:03:57,550 Suppose that they're not normal, 74 00:03:57,550 --> 00:04:03,040 but that they have the same means and variances as here, 75 00:04:03,040 --> 00:04:05,830 as in the normal example.
76 00:04:05,830 --> 00:04:08,870 Since these means and variances are the same, 77 00:04:08,870 --> 00:04:12,330 and since these random variables are uncorrelated, 78 00:04:12,330 --> 00:04:15,190 it follows that all of the covariances 79 00:04:15,190 --> 00:04:17,800 between the random variables involved here 80 00:04:17,800 --> 00:04:22,670 are going to be the same as for the normal example. 81 00:04:22,670 --> 00:04:27,150 Now, the optimal solution to the linear estimation problem, 82 00:04:27,150 --> 00:04:31,100 as we discussed earlier, only cares 83 00:04:31,100 --> 00:04:34,159 about the means, variances, and covariances 84 00:04:34,159 --> 00:04:36,380 of the random variables involved. 85 00:04:36,380 --> 00:04:39,690 The details of the distribution do not matter. 86 00:04:39,690 --> 00:04:42,150 So whether we have normal distributions 87 00:04:42,150 --> 00:04:45,890 or non-normal distributions, as long as we're 88 00:04:45,890 --> 00:04:50,100 making enough assumptions that fix all the means, variances, 89 00:04:50,100 --> 00:04:52,680 and covariances of interest, we should 90 00:04:52,680 --> 00:04:56,130 be getting exactly the same solution. 91 00:04:56,130 --> 00:04:59,560 Therefore, this solution remains valid 92 00:04:59,560 --> 00:05:04,160 for the case of general random variables, as well.
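[Editor's note: a minimal sketch of the argument above, not part of the lecture. It builds the linear system for the best linear estimator from nothing but the means, variances, and covariances implied by the model (each observation is Theta plus uncorrelated zero-mean noise), solves it, and compares the result with the precision-weighted closed form. No distribution is ever specified, which is the point of the argument. The function names, parameter names, and numeric values are illustrative assumptions.]

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def llms_coefficients(prior_mean, prior_var, noise_vars):
    """Best linear estimator a_1*X_1 + ... + a_n*X_n + b from moments only.

    Model: X_i = Theta + W_i, with all variables uncorrelated, so
    cov(X_i, X_j) = prior_var (+ noise_vars[i] when i == j) and
    cov(Theta, X_i) = prior_var, while E[X_i] = prior_mean.
    """
    n = len(noise_vars)
    cov_xx = [[prior_var + (noise_vars[i] if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    cov_theta_x = [prior_var] * n
    a = solve_linear_system(cov_xx, cov_theta_x)
    b = prior_mean - sum(a) * prior_mean  # makes the estimator unbiased
    return a, b


def closed_form_coefficients(prior_mean, prior_var, noise_vars):
    """Precision-weighted formula from the normal case."""
    total_precision = 1.0 / prior_var + sum(1.0 / v for v in noise_vars)
    a = [(1.0 / v) / total_precision for v in noise_vars]
    b = (prior_mean / prior_var) / total_precision
    return a, b


if __name__ == "__main__":
    # Illustrative values only.
    print(llms_coefficients(2.0, 4.0, [1.0, 2.0, 0.5]))
    print(closed_form_coefficients(2.0, 4.0, [1.0, 2.0, 0.5]))
```

[The two printed pairs of coefficients should agree up to floating-point rounding, whatever distributions Theta and the noise terms actually have.]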