We have presented the complete solution to the linear least mean squares estimation problem, when we want to estimate a certain unknown random variable on the basis of a different random variable X that we get to observe.

But what if we have multiple observations? What would be the analogous formulation of the problem? Here's the idea. Once more, we restrict ourselves to estimators that are linear functions of the data, linear functions of the observations that we have. And then we pose the problem of finding the best choices of these coefficients a1 up to an and b.

What does it mean to find the best choices? It means that if we fix certain choices, we obtain an estimator; we look at the difference between the estimator and the quantity we're trying to estimate, take the square, and then take the expectation. So once more, we're looking at the mean squared error of our estimator, and we try to make it as small as possible.

So this is a well-defined optimization problem. We have a quantity which is a function of certain parameters, and we wish to find the choices for those parameters, or those coefficients, that will make this quantity as small as possible.

One first comment, similar to the case where we had a single measurement, is the following. If it turns out that the conditional expectation of Theta given all of the data that we have is linear in the X's, then what happens? We know that the conditional expectation is the best possible estimator. If it is also linear, then it is the best estimator within the class of linear estimators as well, and therefore the linear least mean squares estimator is the same as the general least mean squares estimator. So if for some problems it turns out that the conditional expectation is linear, then we automatically also have the optimal linear estimator. And this is going to be the case, once more, for certain normal problems with a linear structure of the type that we have studied earlier.

Now, let us look into what it takes to carry out this optimization.
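In symbols, the formulation described above can be written as follows (a transcription of the spoken description, with notation assumed to match the lecture's slides):

\hat{\Theta} = a_1 X_1 + \cdots + a_n X_n + b,
\qquad
\min_{a_1, \ldots, a_n, \, b} \; E\big[ (a_1 X_1 + \cdots + a_n X_n + b - \Theta)^2 \big].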
If we had a single observation, then we have seen a closed-form formula, a fairly simple formula, that tells us what the coefficients should be. For the more general case, the formulas would not be as simple, but we can make the following observations.

If you take this expression and expand it, it's going to have a bunch of terms. For example, it's going to have a term of the form a1 squared times the expected value of X1 squared. It's going to have a term such as twice a1 a2 times the expected value of X1 X2. And then there are going to be many more terms; some of them will also involve products of Theta with the X's. So we might see a term of the form a1 times the expected value of X1 Theta. And then there are going to be many, many more terms.

What's the important thing to notice? That this expression, as a function of the coefficients, involves only terms of these kinds: first-order or second-order terms in the coefficients. To minimize this expression, we're going to take the derivatives and set them equal to 0. When you take the derivative of a function that involves only quadratic and linear terms, you get something that's linear in the coefficients. The conclusion out of all this discussion is that when you actually go and carry out this minimization by setting derivatives to zero, what you will end up doing is solving a system of linear equations in the coefficients that you're trying to determine. And why is this interesting? Well, it is because if you actually want to carry out this minimization, all you need to do is to solve a linear system, which is easily done on a computer.

The next observation is that this expression only involves expectations of various terms that are second order in the random variables involved. So it involves the expected value of X1 squared; it involves a term which has something to do with the covariance of X1 and X2; and a term that has something to do with the covariance of X1 with Theta.
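To make the computational procedure concrete, here is a minimal numerical sketch in Python. It is not from the lecture: it assumes we are given the means of the X's and of Theta, the covariance matrix of the X's, and the covariance of each X_i with Theta, and all of the numbers below are illustrative placeholders.

import numpy as np

# Assumed problem data (illustrative values):
mu_X = np.array([1.0, 2.0])        # means of X_1, ..., X_n
mu_Theta = 0.5                     # mean of Theta
Sigma_XX = np.array([[2.0, 0.5],   # covariance matrix of the X's
                     [0.5, 1.0]])
c_XTheta = np.array([0.8, 0.3])    # Cov(X_i, Theta) for each i

# Setting the derivatives of the mean squared error to zero and
# eliminating b gives the linear system  Sigma_XX a = c_XTheta
# for the coefficients a_1, ..., a_n; b is then chosen so that
# the estimator has the correct mean (zero bias).
a = np.linalg.solve(Sigma_XX, c_XTheta)
b = mu_Theta - a @ mu_X

print("coefficients a:", a)
print("intercept b:", b)

Note that the sketch uses exactly the quantities discussed next: no distributional detail beyond means, variances, and covariances enters the computation.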
But these are the only quantities, out of the distribution of the X's and of Theta, that will matter. So, similar to the case where we had a single observation, in order to solve this problem we do not need to know the complete distribution of the X's and of Theta. It is enough to know all of the means, variances, and covariances of the random variables that are involved. And once more, this makes this approach to estimation a practical one, because we do not need to model in complete detail the distribution of the different random variables.

Finally, if we do not have just one unknown random variable, but we have multiple random variables that we want to estimate, what should we do? Well, this is pretty simple. You just apply this estimation methodology to each one of the unknown random variables separately.

To conclude, this linear estimation methodology applies also to the case where you have multiple observations. You need to solve a certain computational problem in order to find the structure of the best linear estimator, but it is not a very difficult computational problem, because all that it involves is minimizing a quadratic function of the coefficients that you are trying to determine. And this leads us to having to solve a system of linear equations. For all these reasons, linear estimation, or estimation using linear estimators, is quite practical.
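Continuing the earlier sketch, handling several unknowns Theta_1, ..., Theta_m separately amounts to solving the same linear system once per unknown. Since numpy's linalg.solve accepts a matrix right-hand side, all the coefficient vectors can be obtained in one call; again, the numbers are illustrative placeholders.

# One column of C_XTheta per unknown: column j holds Cov(X_i, Theta_j).
C_XTheta = np.array([[0.8, 0.1],
                     [0.3, 0.4]])
mu_Thetas = np.array([0.5, 1.0])   # means of Theta_1, ..., Theta_m

A = np.linalg.solve(Sigma_XX, C_XTheta)  # column j: coefficients for Theta_j
B = mu_Thetas - A.T @ mu_X               # one intercept per unknown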