1 00:00:00,840 --> 00:00:05,070 After our warm-up, we can now come to the real problem. 2 00:00:05,070 --> 00:00:07,320 We have, again, a random variable Theta 3 00:00:07,320 --> 00:00:09,620 with a known prior distribution. 4 00:00:09,620 --> 00:00:12,280 And we're interested in a point estimate. 5 00:00:12,280 --> 00:00:13,700 What will be different this time, 6 00:00:13,700 --> 00:00:16,820 however, is that we now have an observation. 7 00:00:16,820 --> 00:00:20,290 And we also have a model of that observation 8 00:00:20,290 --> 00:00:22,120 as a conditional distribution given 9 00:00:22,120 --> 00:00:24,220 the value of the true parameter. 10 00:00:24,220 --> 00:00:27,885 We observe a value of that random variable. 11 00:00:27,885 --> 00:00:29,410 That value is little x. 12 00:00:29,410 --> 00:00:31,650 And on the basis of that value, we 13 00:00:31,650 --> 00:00:33,910 would like to now come up with a point 14 00:00:33,910 --> 00:00:38,140 estimate of the unknown random variable Theta. 15 00:00:38,140 --> 00:00:39,380 How do we proceed? 16 00:00:39,380 --> 00:00:41,640 We can, of course, use the Bayes rule. 17 00:00:41,640 --> 00:00:44,510 And the Bayes rule is going to give us 18 00:00:44,510 --> 00:00:49,250 a distribution for the unknown random variable given 19 00:00:49,250 --> 00:00:52,440 the observation that we have obtained. 20 00:00:52,440 --> 00:00:55,850 And that distribution could be discrete or continuous. 21 00:00:55,850 --> 00:01:01,010 Let me just plot something as if it's continuous. 22 00:01:01,010 --> 00:01:03,150 And now that we have the posterior distribution 23 00:01:03,150 --> 00:01:06,610 of Theta, we would like to come up with a point estimate. 24 00:01:06,610 --> 00:01:08,160 How do we do it? 25 00:01:08,160 --> 00:01:11,039 Remember our earlier conclusion. 26 00:01:11,039 --> 00:01:13,320 If we do not have any observations, 27 00:01:13,320 --> 00:01:16,190 if we live in a universe where we have a distribution of Theta 28 00:01:16,190 --> 00:01:18,730 and we want a point estimate, the optimal, 29 00:01:18,730 --> 00:01:21,840 the one that minimizes the mean squared error, 30 00:01:21,840 --> 00:01:24,860 is the expected value of our random variable. 31 00:01:24,860 --> 00:01:26,960 But now we live in a different universe, 32 00:01:26,960 --> 00:01:29,695 in a universe where we have a conditional distribution 33 00:01:29,695 --> 00:01:31,539 of Theta. 34 00:01:31,539 --> 00:01:37,220 We want to minimize the conditional mean squared error, 35 00:01:37,220 --> 00:01:39,450 because this is the mean squared error that 36 00:01:39,450 --> 00:01:42,850 applies to this conditional universe in which we have 37 00:01:42,850 --> 00:01:45,690 obtained a particular observation. 38 00:01:45,690 --> 00:01:49,570 What is going to be the result of this minimization? 39 00:01:49,570 --> 00:01:52,610 Well, this is a problem that's identical to the problem 40 00:01:52,610 --> 00:01:56,430 of minimizing this quantity, except that now this problem is 41 00:01:56,430 --> 00:01:58,840 posed in a conditional universe. 42 00:01:58,840 --> 00:02:00,690 So we just follow the same steps. 43 00:02:00,690 --> 00:02:03,220 And obtain the same solution, the solution 44 00:02:03,220 --> 00:02:05,030 is going to be the expected value 45 00:02:05,030 --> 00:02:07,770 of the unknown random variable, except that now we 46 00:02:07,770 --> 00:02:09,419 live in a conditional universe. 47 00:02:09,419 --> 00:02:13,210 And therefore, we should take the relevant expected value 48 00:02:13,210 --> 00:02:15,890 which is the conditional expectation given 49 00:02:15,890 --> 00:02:20,980 the information that we have available in our hands. 50 00:02:20,980 --> 00:02:26,690 So to summarize, what we obtain is that the optimal estimate 51 00:02:26,690 --> 00:02:29,600 is the conditional expectation. 52 00:02:29,600 --> 00:02:32,620 And this is a relation between numbers. 53 00:02:32,620 --> 00:02:35,950 But if we want to think about it more abstractly, we have 54 00:02:35,950 --> 00:02:40,470 designed an estimator which is based on a random variable, 55 00:02:40,470 --> 00:02:44,310 capital X, and calculate the expected value 56 00:02:44,310 --> 00:02:47,980 of our random variable that we're trying to estimate, 57 00:02:47,980 --> 00:02:52,910 namely Theta, on the basis of X. 58 00:02:52,910 --> 00:02:56,560 Let us now continue with some observations. 59 00:02:56,560 --> 00:02:58,960 Remember that the expected value of Theta 60 00:02:58,960 --> 00:03:01,430 minimizes this quantity. 61 00:03:01,430 --> 00:03:03,910 And we can write this more explicitly 62 00:03:03,910 --> 00:03:06,900 in terms of the following inequality-- 63 00:03:06,900 --> 00:03:12,000 that if we use the expected value as an estimate, 64 00:03:12,000 --> 00:03:15,420 the resulting mean squared error is less than 65 00:03:15,420 --> 00:03:18,980 or equal to the mean squared error 66 00:03:18,980 --> 00:03:21,820 that we would have obtained if we 67 00:03:21,820 --> 00:03:25,220 had used any other estimate, c. 68 00:03:25,220 --> 00:03:30,430 So this is a relation that's true for all c. 69 00:03:30,430 --> 00:03:34,170 Now, let us take this inequality and translate it 70 00:03:34,170 --> 00:03:36,600 into our more interesting context where 71 00:03:36,600 --> 00:03:39,870 we have an observation available. 72 00:03:39,870 --> 00:03:42,660 Once more, the conditional expectation 73 00:03:42,660 --> 00:03:44,829 minimizes the mean squared error. 74 00:03:44,829 --> 00:03:47,570 Let us write out explicitly what this 75 00:03:47,570 --> 00:03:51,829 means in a form analogous to what we wrote down earlier. 76 00:03:51,829 --> 00:03:54,880 What it means is that the expected value 77 00:03:54,880 --> 00:04:00,580 of Theta minus the estimate, namely 78 00:04:00,580 --> 00:04:06,336 the conditional expectation, squared. 79 00:04:06,336 --> 00:04:10,610 In this conditional universe in which we live, 80 00:04:10,610 --> 00:04:17,450 this is less than or equal to the mean squared error 81 00:04:17,450 --> 00:04:29,130 that we would have obtained if we had used any other estimate 82 00:04:29,130 --> 00:04:31,980 in the place of the conditional expectation. 83 00:04:31,980 --> 00:04:36,760 So for any value g of x that we might have used, 84 00:04:36,760 --> 00:04:40,260 the error would have been at least as large. 85 00:04:40,260 --> 00:04:43,230 Why am I using this notation g here? 86 00:04:43,230 --> 00:04:45,850 Let us go back to the bigger picture. 87 00:04:45,850 --> 00:04:51,250 What we have is that we are obtaining a numerical value x. 88 00:04:51,250 --> 00:04:54,020 We do some processing to it which 89 00:04:54,020 --> 00:04:57,120 corresponds to some function g. 90 00:04:57,120 --> 00:05:01,400 And we come up with an estimate which 91 00:05:01,400 --> 00:05:06,550 is a function of the little x that we have observed. 92 00:05:06,550 --> 00:05:09,740 So no matter what estimate we use, 93 00:05:09,740 --> 00:05:12,540 the mean squared error is going to be at least as large 94 00:05:12,540 --> 00:05:14,450 as the mean squared error that we obtain 95 00:05:14,450 --> 00:05:17,840 if we use the conditional expectation. 96 00:05:17,840 --> 00:05:21,770 Now, let us take this inequality here and write it 97 00:05:21,770 --> 00:05:24,290 in a more abstract form. 98 00:05:24,290 --> 00:05:28,370 Suppose that we have settled on some particular estimator 99 00:05:28,370 --> 00:05:31,940 and we want to compare this estimator with the expected 100 00:05:31,940 --> 00:05:34,040 value estimator. 101 00:05:34,040 --> 00:05:36,220 Then we're going to get the following inequality. 102 00:05:40,110 --> 00:05:44,580 If we use the conditional expectation 103 00:05:44,580 --> 00:05:49,390 as an estimator in a conditional universe 104 00:05:49,390 --> 00:05:52,750 where we know the value of the random variable X, 105 00:05:52,750 --> 00:05:54,740 the corresponding mean squared error 106 00:05:54,740 --> 00:05:59,710 is going to be less than or equal to the mean squared error 107 00:05:59,710 --> 00:06:06,630 obtained by the alternative estimator g. 108 00:06:06,630 --> 00:06:08,690 What does this inequality say? 109 00:06:08,690 --> 00:06:12,040 This inequality is simply an abstract version 110 00:06:12,040 --> 00:06:13,890 of the previous inequality. 111 00:06:13,890 --> 00:06:20,520 The previous inequality is true for all little x. 112 00:06:20,520 --> 00:06:23,720 Here we have an inequality between random variables. 113 00:06:23,720 --> 00:06:28,160 This random variable here is a random variable 114 00:06:28,160 --> 00:06:32,500 that takes this specific numerical value, whenever 115 00:06:32,500 --> 00:06:35,305 capital X takes the value little x. 116 00:06:35,305 --> 00:06:37,780 When X takes the value little x, we're 117 00:06:37,780 --> 00:06:39,250 conditioning on this event. 118 00:06:39,250 --> 00:06:44,370 And when X is equal to little x, this quantity 119 00:06:44,370 --> 00:06:47,440 takes on this particular numerical value. 120 00:06:47,440 --> 00:06:50,060 And similarly, on the other side, 121 00:06:50,060 --> 00:06:52,810 this is a random variable that takes 122 00:06:52,810 --> 00:06:55,159 this particular numerical value whenever 123 00:06:55,159 --> 00:06:58,680 capital X is equal to little x. 124 00:06:58,680 --> 00:07:01,860 Now that we have an inequality between random variables, 125 00:07:01,860 --> 00:07:05,510 actually between conditional expectations, 126 00:07:05,510 --> 00:07:08,770 we can take expectations of both sides. 127 00:07:08,770 --> 00:07:11,160 And we use the law of iterated expectations. 128 00:07:11,160 --> 00:07:14,000 The expectation of a conditional expectation 129 00:07:14,000 --> 00:07:18,680 is an unconditional expectation. 130 00:07:18,680 --> 00:07:25,090 So we obtain this as the expected 131 00:07:25,090 --> 00:07:27,950 value of this quantity. 132 00:07:27,950 --> 00:07:32,130 And it's less than or equal to the expected value 133 00:07:32,130 --> 00:07:34,120 of this quantity using, again, the law 134 00:07:34,120 --> 00:07:36,659 of iterated expectations. 135 00:07:36,659 --> 00:07:39,510 We obtain this relation here. 136 00:07:43,980 --> 00:07:46,970 And what this inequality is saying 137 00:07:46,970 --> 00:07:50,310 is that the overall mean squared error, 138 00:07:50,310 --> 00:07:53,800 if we use the conditional expectation, now 139 00:07:53,800 --> 00:07:59,540 as an estimator, is less than or equal to the mean squared error 140 00:07:59,540 --> 00:08:04,110 that we would obtain if we had used any other estimator. 141 00:08:06,900 --> 00:08:11,150 So this inequality refers to the following picture. 142 00:08:11,150 --> 00:08:15,000 We obtain an observation which is a random variable. 143 00:08:15,000 --> 00:08:18,050 We process that random variable to come up 144 00:08:18,050 --> 00:08:23,890 with an estimator which is a function of the random variable 145 00:08:23,890 --> 00:08:28,350 that we observe and so is itself a random variable. 146 00:08:28,350 --> 00:08:31,570 So when we use this random variable to estimate Theta, 147 00:08:31,570 --> 00:08:33,799 we obtain a certain mean squared error. 148 00:08:33,799 --> 00:08:37,230 This is going to be at least as large as the mean squared error 149 00:08:37,230 --> 00:08:42,179 that we obtain if we use the conditional expectation 150 00:08:42,179 --> 00:08:44,049 as our estimator. 151 00:08:44,049 --> 00:08:48,650 So to summarize, the conditional expectation 152 00:08:48,650 --> 00:08:52,190 of Theta, viewed as a random variable, 153 00:08:52,190 --> 00:08:57,690 as an estimator, what we call the LMS estimator of Theta, 154 00:08:57,690 --> 00:08:59,850 has the property that it minimizes 155 00:08:59,850 --> 00:09:04,830 the mean squared error over all possible alternative 156 00:09:04,830 --> 00:09:06,350 estimators. 157 00:09:06,350 --> 00:09:11,420 So if you want to design this box using some other function 158 00:09:11,420 --> 00:09:15,000 g, you're going to obtain a mean squared error that's 159 00:09:15,000 --> 00:09:18,580 going to be no better than what you obtain if you were 160 00:09:18,580 --> 00:09:22,050 to use the conditional expectation.