1
00:00:00,840 --> 00:00:05,070
After our warm-up, we can
now come to the real problem.

2
00:00:05,070 --> 00:00:07,320
We have, again, a
random variable Theta

3
00:00:07,320 --> 00:00:09,620
with a known prior distribution.

4
00:00:09,620 --> 00:00:12,280
And we're interested
in a point estimate.

5
00:00:12,280 --> 00:00:13,700
What will be
different this time,

6
00:00:13,700 --> 00:00:16,820
however, is that we now
have an observation.

7
00:00:16,820 --> 00:00:20,290
And we also have a model
of that observation

8
00:00:20,290 --> 00:00:22,120
as a conditional
distribution given

9
00:00:22,120 --> 00:00:24,220
the value of the true parameter.

10
00:00:24,220 --> 00:00:27,885
We observe a value of
that random variable.

11
00:00:27,885 --> 00:00:29,410
That value is little x.

12
00:00:29,410 --> 00:00:31,650
And on the basis
of that value, we

13
00:00:31,650 --> 00:00:33,910
would like to now
come up with a point

14
00:00:33,910 --> 00:00:38,140
estimate of the unknown
random variable Theta.

15
00:00:38,140 --> 00:00:39,380
How do we proceed?

16
00:00:39,380 --> 00:00:41,640
We can, of course,
use the Bayes rule.

17
00:00:41,640 --> 00:00:44,510
And the Bayes rule
is going to give us

18
00:00:44,510 --> 00:00:49,250
a distribution for the
unknown random variable given

19
00:00:49,250 --> 00:00:52,440
the observation that
we have obtained.

20
00:00:52,440 --> 00:00:55,850
And that distribution could
be discrete or continuous.

21
00:00:55,850 --> 00:01:01,010
Let me just plot something
as if it's continuous.

22
00:01:01,010 --> 00:01:03,150
And now that we have the
posterior distribution

23
00:01:03,150 --> 00:01:06,610
of Theta, we would like to
come up with a point estimate.

24
00:01:06,610 --> 00:01:08,160
How do we do it?

25
00:01:08,160 --> 00:01:11,039
Remember our earlier conclusion.

26
00:01:11,039 --> 00:01:13,320
If we do not have
any observations,

27
00:01:13,320 --> 00:01:16,190
if we live in a universe where
we have a distribution of Theta

28
00:01:16,190 --> 00:01:18,730
and we want a point
estimate, the optimal one,

29
00:01:18,730 --> 00:01:21,840
the one that minimizes
the mean squared error,

30
00:01:21,840 --> 00:01:24,860
is the expected value
of our random variable.

31
00:01:24,860 --> 00:01:26,960
But now we live in a
different universe,

32
00:01:26,960 --> 00:01:29,695
in a universe where we have
a conditional distribution

33
00:01:29,695 --> 00:01:31,539
of Theta.

34
00:01:31,539 --> 00:01:37,220
We want to minimize the
conditional mean squared error,

35
00:01:37,220 --> 00:01:39,450
because this is the
mean squared error that

36
00:01:39,450 --> 00:01:42,850
applies to this conditional
universe in which we have

37
00:01:42,850 --> 00:01:45,690
obtained a particular
observation.

38
00:01:45,690 --> 00:01:49,570
What is going to be the
result of this minimization?

39
00:01:49,570 --> 00:01:52,610
Well, this is a problem that's
identical to the problem

40
00:01:52,610 --> 00:01:56,430
of minimizing this quantity,
except that now this problem is

41
00:01:56,430 --> 00:01:58,840
posed in a conditional universe.

42
00:01:58,840 --> 00:02:00,690
So we just follow
the same steps.

43
00:02:00,690 --> 00:02:03,220
And we obtain the same
solution. The solution

44
00:02:03,220 --> 00:02:05,030
is going to be
the expected value

45
00:02:05,030 --> 00:02:07,770
of the unknown random
variable, except that now we

46
00:02:07,770 --> 00:02:09,419
live in a conditional universe.

47
00:02:09,419 --> 00:02:13,210
And therefore, we should take
the relevant expected value

48
00:02:13,210 --> 00:02:15,890
which is the conditional
expectation given

49
00:02:15,890 --> 00:02:20,980
the information that we
have available in our hands.

50
00:02:20,980 --> 00:02:26,690
So to summarize, what we obtain
is that the optimal estimate

51
00:02:26,690 --> 00:02:29,600
is the conditional expectation.

52
00:02:29,600 --> 00:02:32,620
And this is a relation
between numbers.

53
00:02:32,620 --> 00:02:35,950
But if we want to think about
it more abstractly, we have

54
00:02:35,950 --> 00:02:40,470
designed an estimator which
is based on a random variable,

55
00:02:40,470 --> 00:02:44,310
capital X, and calculates
the expected value

56
00:02:44,310 --> 00:02:47,980
of our random variable that
we're trying to estimate,

57
00:02:47,980 --> 00:02:52,910
namely Theta, on the basis of X.
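
To make this concrete, here is a minimal sketch of the two steps just described: the Bayes rule first, then the conditional expectation. The model is an assumption made up for illustration (Theta is 0 or 1 with equal prior probability, and X agrees with Theta with probability 3/4), and the names posterior and lms_estimate are hypothetical helpers, not something from the lecture.

```python
from fractions import Fraction as F

# Hypothetical model, chosen only for illustration:
# Theta is 0 or 1 with equal prior probability, and X is a noisy
# binary reading that equals Theta with probability 3/4.
prior = {0: F(1, 2), 1: F(1, 2)}
likelihood = {  # likelihood[theta][x] = P(X = x | Theta = theta)
    0: {0: F(3, 4), 1: F(1, 4)},
    1: {0: F(1, 4), 1: F(3, 4)},
}


def posterior(x):
    """Bayes rule: P(Theta = theta | X = x) for every theta."""
    joint = {t: prior[t] * likelihood[t][x] for t in prior}
    p_x = sum(joint.values())  # P(X = x), the normalizing constant
    return {t: p / p_x for t, p in joint.items()}


def lms_estimate(x):
    """Conditional expectation E[Theta | X = x], a number for each x."""
    return sum(t * p for t, p in posterior(x).items())


print(lms_estimate(0), lms_estimate(1))  # 1/4 3/4
```

For each little x, lms_estimate(x) is a number. Viewed as a function applied to capital X, the same rule is the estimator, which is a random variable.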

58
00:02:52,910 --> 00:02:56,560
Let us now continue
with some observations.

59
00:02:56,560 --> 00:02:58,960
Remember that the
expected value of Theta

60
00:02:58,960 --> 00:03:01,430
minimizes this quantity.

61
00:03:01,430 --> 00:03:03,910
And we can write
this more explicitly

62
00:03:03,910 --> 00:03:06,900
in terms of the
following inequality--

63
00:03:06,900 --> 00:03:12,000
that if we use the expected
value as an estimate,

64
00:03:12,000 --> 00:03:15,420
the resulting mean
squared error is less than

65
00:03:15,420 --> 00:03:18,980
or equal to the
mean squared error

66
00:03:18,980 --> 00:03:21,820
that we would have
obtained if we

67
00:03:21,820 --> 00:03:25,220
had used any other estimate, c.

68
00:03:25,220 --> 00:03:30,430
So this is a relation
that's true for all c.
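
Written out, the inequality just described would presumably read as follows. This is a reconstruction from the spoken description, not a copy of the slide.

```latex
\mathbf{E}\big[(\Theta - \mathbf{E}[\Theta])^2\big]
  \;\le\; \mathbf{E}\big[(\Theta - c)^2\big]
  \qquad \text{for all } c .
```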

69
00:03:30,430 --> 00:03:34,170
Now, let us take this
inequality and translate it

70
00:03:34,170 --> 00:03:36,600
into our more
interesting context where

71
00:03:36,600 --> 00:03:39,870
we have an
observation available.

72
00:03:39,870 --> 00:03:42,660
Once more, the
conditional expectation

73
00:03:42,660 --> 00:03:44,829
minimizes the mean
squared error.

74
00:03:44,829 --> 00:03:47,570
Let us write out
explicitly what this

75
00:03:47,570 --> 00:03:51,829
means in a form analogous to
what we wrote down earlier.

76
00:03:51,829 --> 00:03:54,880
What it means is that
the expected value

77
00:03:54,880 --> 00:04:00,580
of Theta minus the
estimate, namely

78
00:04:00,580 --> 00:04:06,336
the conditional
expectation, squared,

79
00:04:06,336 --> 00:04:10,610
in this conditional
universe in which we live,

80
00:04:10,610 --> 00:04:17,450
this is less than or equal
to the mean squared error

81
00:04:17,450 --> 00:04:29,130
that we would have obtained if
we had used any other estimate

82
00:04:29,130 --> 00:04:31,980
in the place of the
conditional expectation.

83
00:04:31,980 --> 00:04:36,760
So for any value g of x
that we might have used,

84
00:04:36,760 --> 00:04:40,260
the error would have
been at least as large.
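
In symbols, the conditional version of the inequality can be sketched like this, with g(x) standing for any other estimate computed from the observed value x. Again, this is reconstructed from the spoken description.

```latex
\mathbf{E}\big[(\Theta - \mathbf{E}[\Theta \mid X = x])^2 \,\big|\, X = x\big]
  \;\le\; \mathbf{E}\big[(\Theta - g(x))^2 \,\big|\, X = x\big]
  \qquad \text{for all } x \text{ and all } g .
```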

85
00:04:40,260 --> 00:04:43,230
Why am I using this
notation g here?

86
00:04:43,230 --> 00:04:45,850
Let us go back to
the bigger picture.

87
00:04:45,850 --> 00:04:51,250
What we have is that we are
obtaining a numerical value x.

88
00:04:51,250 --> 00:04:54,020
We do some processing
to it which

89
00:04:54,020 --> 00:04:57,120
corresponds to some function g.

90
00:04:57,120 --> 00:05:01,400
And we come up with
an estimate which

91
00:05:01,400 --> 00:05:06,550
is a function of the little
x that we have observed.

92
00:05:06,550 --> 00:05:09,740
So no matter what
estimate we use,

93
00:05:09,740 --> 00:05:12,540
the mean squared error is
going to be at least as large

94
00:05:12,540 --> 00:05:14,450
as the mean squared
error that we obtain

95
00:05:14,450 --> 00:05:17,840
if we use the
conditional expectation.

96
00:05:17,840 --> 00:05:21,770
Now, let us take this
inequality here and write it

97
00:05:21,770 --> 00:05:24,290
in a more abstract form.

98
00:05:24,290 --> 00:05:28,370
Suppose that we have settled
on some particular estimator

99
00:05:28,370 --> 00:05:31,940
and we want to compare this
estimator with the expected

100
00:05:31,940 --> 00:05:34,040
value estimator.

101
00:05:34,040 --> 00:05:36,220
Then we're going to get
the following inequality.

102
00:05:40,110 --> 00:05:44,580
If we use the
conditional expectation

103
00:05:44,580 --> 00:05:49,390
as an estimator in a
conditional universe

104
00:05:49,390 --> 00:05:52,750
where we know the value
of the random variable X,

105
00:05:52,750 --> 00:05:54,740
the corresponding
mean squared error

106
00:05:54,740 --> 00:05:59,710
is going to be less than or
equal to the mean squared error

107
00:05:59,710 --> 00:06:06,630
obtained by the
alternative estimator g.

108
00:06:06,630 --> 00:06:08,690
What does this inequality say?

109
00:06:08,690 --> 00:06:12,040
This inequality is simply
an abstract version

110
00:06:12,040 --> 00:06:13,890
of the previous inequality.

111
00:06:13,890 --> 00:06:20,520
The previous inequality
is true for all little x.

112
00:06:20,520 --> 00:06:23,720
Here we have an inequality
between random variables.

113
00:06:23,720 --> 00:06:28,160
This random variable
here is a random variable

114
00:06:28,160 --> 00:06:32,500
that takes this specific
numerical value, whenever

115
00:06:32,500 --> 00:06:35,305
capital X takes
the value little x.

116
00:06:35,305 --> 00:06:37,780
When X takes the
value little x, we're

117
00:06:37,780 --> 00:06:39,250
conditioning on this event.

118
00:06:39,250 --> 00:06:44,370
And when X is equal to
little x, this quantity

119
00:06:44,370 --> 00:06:47,440
takes on this particular
numerical value.

120
00:06:47,440 --> 00:06:50,060
And similarly, on
the other side,

121
00:06:50,060 --> 00:06:52,810
this is a random
variable that takes

122
00:06:52,810 --> 00:06:55,159
this particular
numerical value whenever

123
00:06:55,159 --> 00:06:58,680
capital X is equal to little x.

124
00:06:58,680 --> 00:07:01,860
Now that we have an inequality
between random variables,

125
00:07:01,860 --> 00:07:05,510
actually between
conditional expectations,

126
00:07:05,510 --> 00:07:08,770
we can take expectations
of both sides.

127
00:07:08,770 --> 00:07:11,160
And we use the law of
iterated expectations.

128
00:07:11,160 --> 00:07:14,000
The expectation of a
conditional expectation

129
00:07:14,000 --> 00:07:18,680
is an unconditional expectation.

130
00:07:18,680 --> 00:07:25,090
So we obtain this
as the expected

131
00:07:25,090 --> 00:07:27,950
value of this quantity.

132
00:07:27,950 --> 00:07:32,130
And it's less than or
equal to the expected value

133
00:07:32,130 --> 00:07:34,120
of this quantity
using, again, the law

134
00:07:34,120 --> 00:07:36,659
of iterated expectations.

135
00:07:36,659 --> 00:07:39,510
We obtain this relation here.

136
00:07:43,980 --> 00:07:46,970
And what this
inequality is saying

137
00:07:46,970 --> 00:07:50,310
is that the overall
mean squared error,

138
00:07:50,310 --> 00:07:53,800
if we use the conditional
expectation, now

139
00:07:53,800 --> 00:07:59,540
as an estimator, is less than or
equal to the mean squared error

140
00:07:59,540 --> 00:08:04,110
that we would obtain if we
had used any other estimator.
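
The last few steps can be summarized as a short derivation. It is a sketch reconstructed from the spoken argument. It starts from the inequality between random variables, takes expectations of both sides, and applies the law of iterated expectations to each side.

```latex
% Inequality between random variables (holds for every value of X):
\mathbf{E}\big[(\Theta - \mathbf{E}[\Theta \mid X])^2 \,\big|\, X\big]
  \;\le\; \mathbf{E}\big[(\Theta - g(X))^2 \,\big|\, X\big]

% Law of iterated expectations, for any random variable Z:
\mathbf{E}\big[\mathbf{E}[Z \mid X]\big] = \mathbf{E}[Z]

% Taking expectations of both sides gives the overall inequality:
\mathbf{E}\big[(\Theta - \mathbf{E}[\Theta \mid X])^2\big]
  \;\le\; \mathbf{E}\big[(\Theta - g(X))^2\big]
  \qquad \text{for all estimators } g .
```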

141
00:08:06,900 --> 00:08:11,150
So this inequality refers
to the following picture.

142
00:08:11,150 --> 00:08:15,000
We obtain an observation
which is a random variable.

143
00:08:15,000 --> 00:08:18,050
We process that random
variable to come up

144
00:08:18,050 --> 00:08:23,890
with an estimator which is a
function of the random variable

145
00:08:23,890 --> 00:08:28,350
that we observe and so is
itself a random variable.

146
00:08:28,350 --> 00:08:31,570
So when we use this random
variable to estimate Theta,

147
00:08:31,570 --> 00:08:33,799
we obtain a certain
mean squared error.

148
00:08:33,799 --> 00:08:37,230
This is going to be at least as
large as the mean squared error

149
00:08:37,230 --> 00:08:42,179
that we obtain if we use
the conditional expectation

150
00:08:42,179 --> 00:08:44,049
as our estimator.

151
00:08:44,049 --> 00:08:48,650
So to summarize, the
conditional expectation

152
00:08:48,650 --> 00:08:52,190
of Theta, viewed as
a random variable,

153
00:08:52,190 --> 00:08:57,690
as an estimator, what we call
the LMS estimator of Theta,

154
00:08:57,690 --> 00:08:59,850
has the property
that it minimizes

155
00:08:59,850 --> 00:09:04,830
the mean squared error over
all possible alternative

156
00:09:04,830 --> 00:09:06,350
estimators.

157
00:09:06,350 --> 00:09:11,420
So if you want to design this
box using some other function

158
00:09:11,420 --> 00:09:15,000
g, you're going to obtain
a mean squared error that's

159
00:09:15,000 --> 00:09:18,580
going to be no better than
what you would obtain if you were

160
00:09:18,580 --> 00:09:22,050
to use the conditional
expectation.
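
As a small numerical check of this summary, the sketch below compares the overall mean squared error of the conditional expectation with that of a few other choices of g. The joint PMF and the alternative estimators are made-up assumptions for illustration, and cond_exp and mse are hypothetical helper names.

```python
from fractions import Fraction as F

# Hypothetical joint PMF of (Theta, X); the numbers are made up.
joint = {
    (0, 0): F(3, 8), (1, 0): F(1, 8),
    (0, 1): F(1, 8), (1, 1): F(3, 8),
}


def cond_exp(x):
    """E[Theta | X = x], computed directly from the joint PMF."""
    p_x = sum(p for (t, xx), p in joint.items() if xx == x)
    return sum(t * p for (t, xx), p in joint.items() if xx == x) / p_x


def mse(g):
    """Overall mean squared error E[(Theta - g(X))^2]."""
    return sum(p * (t - g(x)) ** 2 for (t, x), p in joint.items())


lms_mse = mse(cond_exp)

# A few alternative estimators g, chosen arbitrarily for comparison.
alternatives = {
    "g(x) = x": lambda x: x,
    "g(x) = 1/2": lambda x: F(1, 2),
    "g(x) = 0.9 x": lambda x: F(9, 10) * x,
}

for name, g in alternatives.items():
    # Each alternative should do no better than the conditional expectation.
    print(name, "MSE:", mse(g), " LMS MSE:", lms_mse)
```

In this toy model, none of the three alternatives achieves a smaller mean squared error than the conditional expectation, which is consistent with the inequality, although a handful of examples is of course not a proof.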