Let's now go through another example, which will be a little more challenging. We're going to revisit an old problem. We have a coin with an unknown bias, Theta, and we have a prior distribution on this Theta. We fix some positive integer n, flip the coin with this unknown bias n times, and record the number of heads. On the basis of the number of heads that have been observed, we wish to estimate the bias, Theta, of the coin. To make things more concrete, we're going to assume a prior distribution on Theta that is uniform on the unit interval.

Now, this is a problem we have considered before. We have calculated the expected value of Theta given X, and we found that it takes this particular form. Notice that this is a linear function of X. And if it turns out that the least mean squares estimator is a linear function of X, then, since that estimator is the best overall, it is guaranteed to also be the best within the class of linear estimators. So we immediately have the conclusion that the linear least mean squares estimator is this particular function of X. So there's not much left to do.

On the other hand, just for practice, let us derive this answer directly from the formulas that we have for the linear least mean squares estimator, and see whether we get the same answer. So we want to use this formula, and in order to apply it, all that we have to do is to calculate these expected values, this variance, and this covariance. So now let's move on to this particular calculational exercise.

Let's start by writing down what we know about the random variables involved in this problem. About Theta, we know that it is uniform, so it has a mean of 1/2 and a variance of 1/12. About X, what we know is the following. If you fix the bias of the coin, then the number of heads you're going to obtain in n flips has a binomial distribution with parameters n and Theta. But of course, Theta itself is a random variable, so this is a conditional distribution.
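A quick side note, not part of the lecture's derivation: with the uniform prior, the posterior of Theta given X = k is Beta(k + 1, n - k + 1), so the conditional expectation recalled above is (k + 1)/(n + 2), which is indeed linear in the observation. The short Python/sympy sketch below checks this by direct integration; the value n = 10 and the variable names are arbitrary choices made only for this illustration.

```python
# Sketch only: verify E[Theta | X = k] = (k + 1)/(n + 2) for a uniform prior
# and X | Theta ~ Binomial(n, Theta), by integrating the posterior directly.
import sympy as sp

theta = sp.symbols('theta', positive=True)
n = 10  # arbitrary illustrative choice for the number of flips

for k in range(n + 1):
    # The posterior density is proportional to the binomial likelihood times
    # the uniform prior (a constant), i.e. theta^k * (1 - theta)^(n - k).
    kernel = theta**k * (1 - theta)**(n - k)
    post_mean = (sp.integrate(theta * kernel, (theta, 0, 1))
                 / sp.integrate(kernel, (theta, 0, 1)))
    assert post_mean == sp.Rational(k + 1, n + 2)

print("E[Theta | X = k] = (k + 1)/(n + 2) checks out for n =", n)
```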
But within this conditional universe, we know the mean and the variance of a binomial, and they are as follows. The mean of a binomial is n times the bias of the coin. But because we're talking about the conditional universe, this is a conditional expectation, and it's a random variable, because it is affected by the value of the random variable Theta. Similarly for the variance: it's the usual formula for the variance of a binomial, except that now the bias itself is a random variable.

So now let's continue with the calculation of the quantities that we need for the formula for our estimator. Let's start with the expected value of X. Since we know the conditional expectation of X, we can use the law of iterated expectations. The unconditional expectation is the expected value of the conditional expectation, which is n times Theta. And since the mean of Theta is 1/2, we obtain n/2.
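The same step can be written out symbolically. The sketch below is just an illustration (not code from the lecture): it averages the conditional mean n times Theta over the uniform prior, keeping n symbolic.

```python
# Sketch: law of iterated expectations for E[X], with Theta ~ Uniform(0, 1)
# and E[X | Theta] = n * Theta (the mean of a Binomial(n, Theta)).
import sympy as sp

theta = sp.symbols('theta')
n = sp.symbols('n', positive=True)

E_X_given_theta = n * theta                        # conditional expectation of X
E_X = sp.integrate(E_X_given_theta * 1, (theta, 0, 1))  # uniform prior density is 1

print(E_X)  # n/2
```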
Let us now continue with the calculation of the variance. There are different ways that we can calculate it. One could be the law of total variance. But we will take the alternative approach, which is to use the general formula for the variance: the variance is equal to the expected value of the square of a random variable, minus the square of the expected value. We know the expected value of X, so all that's left is to calculate the expected value of X squared.

How are we going to calculate it? Well, we know the conditional distribution of X. So it should be easy to calculate the conditional expectation of X squared in the conditional universe, and then use the law of iterated expectations to obtain the unconditional expectation. So now we need to calculate this conditional expectation here. How do we do it? The expected value of the square of a random variable is always equal to the variance of that random variable plus the square of the expected value. We're going to use this property, but we're going to use it in the conditional universe.

So in the conditional universe, this is going to be equal to the variance in the conditional universe, which is n times Theta times 1 minus Theta, plus the square of the expected value of X, but the expected value in the conditional universe, which is this quantity, n times Theta. So we obtain another term, n squared Theta squared.

So now we can go back to our previous calculation and write here the expected value of this expression, which is n times Theta. And then we have some Theta squared terms: one is n squared, the other is a minus n. So we obtain, in addition, (n squared minus n) times Theta squared.

The expected value of n times Theta is n times the expected value of Theta, which is 1/2, so we obtain a term of n/2. But then we have this additional term here, and we need the expected value of Theta squared. What is it? Well, since we know the mean and the variance of Theta, we can calculate the expected value of Theta squared. It is equal to the variance plus the square of the mean, and this evaluates to 1/3. So from here, we're going to obtain, in addition, (n squared minus n) divided by 3. And you can rewrite this in a different way by collecting first the n terms, n/2 minus n/3, which gives us n/6, and then there's the n squared term, which is n squared over 3.

Now that we have found the expected value of X squared, we can go back to this calculation. We have n/6 plus n squared over 3, minus the square of the expected value of X, which is this expression here. So we obtain minus n squared over 4. And 1/3 minus 1/4 makes 1/12, so we obtain n/6 plus n squared over 12. Another way of writing this is n times (n plus 2) over 12. And this completes the calculation of the variance of X.
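As a side check on this step (again an illustration, not code from the lecture), one can average the conditional second moment n Theta (1 - Theta) + n squared Theta squared over the uniform prior symbolically and subtract (n/2) squared; the symbol names below are assumptions of the sketch.

```python
# Sketch: Var(X) = E[X^2] - (E[X])^2, where
#   E[X^2 | Theta] = n*Theta*(1 - Theta) + n^2 * Theta^2
# is averaged over Theta ~ Uniform(0, 1).
import sympy as sp

theta = sp.symbols('theta')
n = sp.symbols('n', positive=True)

E_X2_given_theta = n * theta * (1 - theta) + n**2 * theta**2
E_X2 = sp.integrate(E_X2_given_theta, (theta, 0, 1))   # n/6 + n**2/3

var_X = sp.factor(E_X2 - (n / 2)**2)
print(E_X2, var_X)   # n**2/3 + n/6   and   n*(n + 2)/12
```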
The last quantity that is left for us to calculate is the covariance of Theta with X. We're going to calculate it using the alternative formula for the covariance, which is the expectation of the product minus the product of the expectations. We have the expectations, but we do not have the expectation of the product, so we need to calculate it. Once more, it's going to be the same trick: we're going to condition on Theta, and then use the law of iterated expectations.

So the law of iterated expectations, when you condition on Theta, takes this form. And to continue here, we need to find this conditional expectation on the inside. This conditional expectation, what is it? If I give you Theta, then you know Theta. It becomes a constant; there's nothing random to it, so it can be pulled outside the expectation, and we obtain Theta times the conditional expectation of X. We know what the conditional expectation of X is: it's n times Theta. So from here, we obtain, overall, a term n times Theta squared.

So now we can go back here. We have the expected value, and the term on the inside we just found: it's n times Theta squared. And since the expected value of Theta squared is 1/3, from here we obtain n/3. And now we can go back, finally, to the calculation of the covariance. It's going to be n/3, minus the expected value of Theta, which is 1/2, times the expected value of X, which is n/2. So it's minus n/4, and this evaluates to n/12.

So we have succeeded in calculating all the quantities that are needed in the formula for the linear least mean squares estimator. We can now take those values that we have just found and substitute them into this formula. After a little bit of algebra and moving terms around, everything simplifies to this expression. Just to verify that this makes sense, what is the coefficient next to X? It's the covariance divided by the variance: n/12 divided by this expression. This n cancels that n, this 12 cancels that 12, and we're left with an n plus 2 in the denominator. And indeed, the coefficient that multiplies X has the term n plus 2 in the denominator. You can similarly verify that the constant term is the correct one as well.
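Putting the four quantities together, here is one more illustrative sketch (the symbol names and the use of sympy are assumptions of the sketch, not the lecture's notation) that substitutes them into the linear least mean squares formula Theta_hat = E[Theta] + (Cov(Theta, X) / Var(X)) * (X - E[X]) and simplifies.

```python
# Sketch: substitute the computed moments into
#   Theta_hat = E[Theta] + (Cov(Theta, X) / Var(X)) * (X - E[X]).
import sympy as sp

n, X = sp.symbols('n X', positive=True)

E_Theta = sp.Rational(1, 2)        # mean of the uniform prior
E_X     = n / 2                    # computed above
var_X   = n * (n + 2) / 12         # computed above
cov_TX  = n / 12                   # computed above

estimator = E_Theta + (cov_TX / var_X) * (X - E_X)
print(sp.cancel(estimator))        # (X + 1)/(n + 2)
```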
So, of course, this answer is what we had found in the past to be the optimal, least mean squares estimator of Theta. As we discussed earlier, when that estimator is linear in X, it has to be the same as the linear least mean squares estimator. So this answer is not a surprise, but it was an interesting and perhaps useful exercise to go through the details of this calculation, to see what it takes to figure out the different terms in this formula.