In this final segment, we want to discuss an interesting point about linear estimators. Here's what the issue is. You obtain an observation, X, on the basis of which you want to estimate Theta. But perhaps you measure X on a different scale, let's say on a cubic scale, so that what you actually record is X cubed. So you're faced with two possible estimation problems. One estimation problem is to use X to estimate Theta. Another estimation problem is to use X cubed to estimate Theta. Does it make a difference?

Let's consider the case of least mean squares estimation, without any linearity constraint. If you use X to estimate Theta, your estimator is going to be the conditional expectation of Theta given X. If you use X cubed to estimate Theta, your estimator will be the conditional expectation of Theta given X cubed. Are they different? Well, X and X cubed carry the same information about Theta. In particular, the posterior distribution of Theta given X is going to be the same as the posterior distribution of Theta given X cubed. You will be getting the same information, the same knowledge about Theta.
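A small sketch of why the two posteriors coincide: since the map x → x³ is one-to-one, observing X = x and observing X³ = x³ pin down exactly the same event. The joint distribution below is a made-up discrete example purely for illustration.

```python
from fractions import Fraction as F

# Hypothetical discrete joint pmf p(theta, x), chosen only for illustration.
joint = {
    (0, 1): F(1, 6), (0, 2): F(1, 3),
    (1, 1): F(1, 3), (1, 2): F(1, 6),
}

def posterior(joint, obs, transform):
    """Posterior pmf of Theta given that transform(X) equals obs."""
    matches = {th: p for (th, x), p in joint.items() if transform(x) == obs}
    total = sum(matches.values())
    return {th: p / total for th, p in matches.items()}

# Observing X = 2 and observing X**3 = 8 describe the same event,
# so the two posteriors (and hence the conditional expectations) agree.
post_x = posterior(joint, 2, lambda x: x)
post_x3 = posterior(joint, 8, lambda x: x ** 3)
print(post_x == post_x3)  # True
```

The same argument works for any invertible transformation of the observation, not just the cube.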
And in particular, if you calculate conditional expectations, these will also be the same. What about the linear case? If we restrict to linear estimators, then on the basis of X, you would form a linear estimator of the form aX plus b. But if your observation comes in the form of X cubed, then a linear estimator would form a linear function of X cubed. So this would be a different kind of estimator. We have seen a formula for the best choices of a and b for estimators that are linear in X. We can use that same formula to obtain the best estimator that is linear in X cubed. It's going to be, of course, a different estimator; here, we're optimizing within a different class.

Which one of the two is better? Well, this depends on what you know about the particular problem at hand. If you have some reason to believe, or if you know, that Theta and X are roughly related by some kind of cubic relation, then perhaps estimators that are linear in X cubed are going to perform better than estimators that are linear in X. Let me also point out a related issue that comes up here.
To find the right choice of a, you need to know the covariance between X and Theta. That's what the formula for the optimal linear estimator tells us. Here, you would need to know the covariance between Theta and X cubed. In addition, the formula requires the variance of X. But here, instead of X, we're using X cubed, so in this case, we would need the variance of X cubed. Now, this could be more challenging. In general, the higher the powers that you have, the more difficult these quantities are to calculate or to know what they are. But leaving that issue aside, what we have here is two alternative choices for the structure of the estimator that we're using.

Now, we can push this story further. Instead of considering just estimators of the form aX cubed plus b, we might consider as well estimators of the form a1 X plus a2 X squared plus a3 X cubed plus b. Is this a linear estimator? We still call it a linear estimator, because it is linear in the coefficients that we have to choose and optimize over. That's the more important part. It's the linearity in these coefficients that's important, rather than the linearity in the X's.
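The formula referred to above is the usual linear least mean squares estimator, a = Cov(Theta, X) / Var(X) and b = E[Theta] minus a E[X]. A minimal sketch, under an assumed model (Theta uniform on [0, 1], X equal to Theta plus Gaussian noise, chosen only for illustration), estimating these quantities from samples and comparing the estimator linear in X with the one linear in X cubed:

```python
import random

random.seed(0)

# Hypothetical model, for illustration only: Theta ~ Uniform(0, 1),
# X = Theta + noise. Here Theta and X are linearly related.
n = 100_000
theta = [random.uniform(0, 1) for _ in range(n)]
x = [t + random.gauss(0, 0.2) for t in theta]

def llms(obs, target):
    """Coefficients of the best linear estimator a*obs + b:
    a = Cov(target, obs) / Var(obs),  b = E[target] - a * E[obs]."""
    m_o = sum(obs) / len(obs)
    m_t = sum(target) / len(target)
    cov = sum((o - m_o) * (t - m_t) for o, t in zip(obs, target)) / len(obs)
    var = sum((o - m_o) ** 2 for o in obs) / len(obs)
    a = cov / var
    return a, m_t - a * m_o

def mse(obs, target, a, b):
    return sum((t - (a * o + b)) ** 2 for o, t in zip(obs, target)) / len(obs)

x3 = [v ** 3 for v in x]
a1, b1 = llms(x, theta)   # linear in X: needs Cov(Theta, X), Var(X)
a2, b2 = llms(x3, theta)  # linear in X^3: needs Cov(Theta, X^3), Var(X^3)
print(mse(x, theta, a1, b1), mse(x3, theta, a2, b2))
```

Under this assumed model the relation is linear, so the X-based class wins; if Theta were roughly a cubic function of the observation, the ordering would flip.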
So as a function of X, this is non-linear. On the other hand, we can think of X as one observation, X squared as another observation, X cubed as a third observation, and what we've got here is a linear function of three different observations. So we can still pose a least squares problem in which we try to find the best choices for the coefficients a1, a2, and a3, as well as the coefficient b: the choices that are going to give us the smallest possible mean squared error. So we can optimize within this class.

Within this class of estimators, we certainly have more flexibility. This is a more general class of estimators than either of the two earlier ones, so within this class, we should be able to do even better. On the other hand, we would have to pay a price: this is a more complex structure, and it would be more difficult to find the optimal coefficients. And also, we're going to need higher-order moments, or expectations, related to the X's and the Thetas.

Finally, there's nothing special about using powers of X and forming a polynomial. We could also look at estimators that have some other type of structure.
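Because the cubic estimator is linear in the coefficients, minimizing the mean squared error over a1, a2, a3, b is an ordinary linear least squares problem, solvable through the normal equations. A sketch under an assumed model (chosen so that Theta really is roughly a cubic function of X; the model itself is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model, for illustration: X = Theta**(1/3) + noise,
# so Theta is approximately a cubic function of X.
n = 50_000
theta = rng.uniform(0, 1, n)
x = theta ** (1 / 3) + rng.normal(0, 0.05, n)

# Features 1, X, X^2, X^3: the estimator b + a1*X + a2*X^2 + a3*X^3 is
# linear in (b, a1, a2, a3), so the optimal coefficients solve the
# normal equations (Phi^T Phi) c = Phi^T theta.
Phi = np.column_stack([np.ones(n), x, x ** 2, x ** 3])
coef = np.linalg.solve(Phi.T @ Phi, Phi.T @ theta)
mse_cubic = np.mean((theta - Phi @ coef) ** 2)

# Compare with the plain linear-in-X estimator (features 1 and X only).
Phi1 = Phi[:, :2]
coef1 = np.linalg.solve(Phi1.T @ Phi1, Phi1.T @ theta)
mse_linear = np.mean((theta - Phi1 @ coef1) ** 2)

print(mse_cubic < mse_linear)  # True: the richer class cannot do worse here
```

Since the linear-in-X class is nested inside the cubic class, the cubic fit can never have larger in-sample mean squared error, which is the "more flexibility" point made above.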
For example, we might want to mix an exponential function of X and a logarithmic function of X, look at estimators of the form a1 e to the X plus a2 log X plus b, and try to choose the best one: find the best choice of the coefficients. Again, this is something that is possible. And again, it's going to boil down to solving a system of linear equations in the coefficients. On the other hand, we need to know various expectations involving X that might be difficult to obtain.

How do we choose which structure to use? Should it be linear in X, linear in X cubed, a polynomial, or some other combination of functions? There's a trade-off: more complicated structures introduce more complexity and make the problem more difficult. But there's also another issue. It has to do with what we know about the particular problem at hand. If we know, or have reason to believe, that third-order polynomials are going to give us excellent estimates of Theta, then we may want to work within that class.

In any case, the moral of this story is that if we are to use the linear estimation methodology, we do have some choices. Linear in what? And different choices will give us different performance.
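The exponential-plus-logarithm class works exactly the same way as the polynomial one: any fixed set of feature functions of X leaves the problem linear in the coefficients. A sketch under an assumed model (Theta equal to log X plus noise, invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model, for illustration: X > 0 and Theta = log(X) + noise.
n = 50_000
x = rng.uniform(0.5, 3.0, n)
theta = np.log(x) + rng.normal(0, 0.1, n)

# Estimator b + a1*exp(X) + a2*log(X): linear in (b, a1, a2), so the
# optimal coefficients again come from a system of linear equations
# built out of expectations of the chosen feature functions.
Phi = np.column_stack([np.ones(n), np.exp(x), np.log(x)])
coef = np.linalg.solve(Phi.T @ Phi, Phi.T @ theta)

mse = np.mean((theta - Phi @ coef) ** 2)
print(mse)  # close to the noise variance, since log X is in the class
```

Because the true relation log X lies inside the chosen class, the fit recovers essentially all the structure; with a badly chosen class of functions, the residual error would be larger.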
But this now gets somewhat away from the subject of a mathematical methodology, and it gets closer to the art that you need to exercise in any particular problem domain.