The trajectory estimation problem that we considered gives you a first glimpse into a large field that deals with linear normal models. In this segment, I will just give you a preview of what happens in that field, although we will not attempt to prove or justify anything.

What happens in this field is that we're dealing with models where there are some underlying independent normal random variables. And then the random variables of interest, the unknown parameters and the observations, can all be expressed as linear functions of these independent normals. Since linear functions of independent normals are normal, this means in particular that the Theta j and the X i are all normal random variables. Carrying out inference within this class of models goes under the name of linear regression.

One can proceed using pretty much the same steps as we had in the trajectory estimation problem and write down formulas for the posterior. And it turns out that in every case, the posterior of the parameter vector takes a form which is the exponential of the negative of a quadratic function of the parameters. This means, in particular, that in order to find the MAP estimate of the vector Theta, what we need to do is just minimize this quadratic function with respect to Theta. And minimizing a quadratic function is done by taking derivatives and setting them to zero, which leads us to a system of linear equations, exactly as in the trajectory inference problem. This means that numerically it is very simple to come up with a MAP estimate.

There is an interesting fact that comes out of the algebra involved, namely, that the MAP estimate of each one of the parameters turns out to also be a linear function of the observations. We saw this property in the estimators that we derived for simpler cases in this lecture sequence. It turns out that this property is still true in this more general setting.
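To make this concrete, here is a minimal sketch (not from the lecture) of MAP estimation in one such linear normal model, assuming a zero-mean normal prior on the parameter vector Theta and observations X = A Theta + W with independent normal noise W; the specific matrices and numbers (A, Sigma0, SigmaW) are illustrative choices. Setting the gradient of the quadratic in the exponent to zero gives the linear system that is solved below.

```python
import numpy as np

# Minimal sketch of MAP estimation in a linear normal model.
# Assumed setup (illustrative, not from the lecture):
#   Theta ~ N(0, Sigma0)  is the unknown parameter vector,
#   W     ~ N(0, SigmaW)  is independent normal noise,
#   X = A @ Theta + W     is the observation vector.
rng = np.random.default_rng(0)

n, m = 3, 10                      # number of parameters, number of observations
A = rng.normal(size=(m, n))       # known linear map from parameters to observations
Sigma0 = np.eye(n)                # prior covariance of Theta
SigmaW = 0.5 * np.eye(m)          # noise covariance

theta_true = rng.normal(size=n)
x = A @ theta_true + rng.multivariate_normal(np.zeros(m), SigmaW)

# The posterior of Theta given X = x is proportional to
# exp(-(1/2) * quadratic in theta).  Minimizing that quadratic means
# setting its gradient to zero, which is the linear system
#   (Sigma0^-1 + A^T SigmaW^-1 A) theta_hat = A^T SigmaW^-1 x
precision = np.linalg.inv(Sigma0) + A.T @ np.linalg.inv(SigmaW) @ A
rhs = A.T @ np.linalg.inv(SigmaW) @ x
theta_map = np.linalg.solve(precision, rhs)

print("MAP estimate:", theta_map)
# Note that theta_map is obtained by applying a fixed matrix to x,
# i.e. the MAP estimate is a linear function of the observations.
```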
This linearity is an appealing and desirable property, because it means that these estimators can be applied very efficiently in practice without having to do any complicated calculations.

Finally, there are a number of important facts, some of which we have seen in our examples, which are true in very great generality. One fact is that the maximum a posteriori probability estimate of some parameter turns out to be the same as the conditional expectation of that parameter. Furthermore, if you look at the joint density of all the Theta parameters and from it you find the marginal density of one of the Theta parameters, always within this conditional universe, it turns out that this marginal posterior PDF is itself normal. Since it is normal, its mean, which is this quantity, is going to be equal to its peak. And therefore it is also equal to the MAP estimate that would be derived from this marginal PDF.

So what do we have here? There are two ways of coming up with MAP estimates. One way is to find the peak of the joint PDF, and then read out the different components of Theta. Another way of coming up with MAP estimates is to find, for each parameter, the marginal PDF and look at the peak of this marginal PDF. It turns out that for this model these two approaches are going to give you the same answer. Whether you work with the marginal or with the joint, you get the same MAP estimates. And this is a reassuring property to have.

Finally, as in the examples that we have worked out in more detail, it turns out that the mean squared error conditioned on a particular observation is the same no matter what the value of that observation was. And furthermore, there are fairly simple and easily computable formulas that one can apply in order to find what this mean squared error is.

So to summarize, this class of models that involve linear relations and normal random variables has a rich set of important and elegant properties. This is one of the reasons why these models are used very much in practice.
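As a rough numerical companion to the facts just mentioned, the sketch below continues the assumed model and variables from the previous snippet: the posterior covariance, whose diagonal gives the conditional mean squared errors, is computed without ever using the observed value x, and the peak of each (normal) marginal posterior is simply the corresponding component of the joint MAP estimate.

```python
# Continuing the sketch above (same assumed model and variables).
# The posterior covariance is the inverse of the precision matrix and
# does not involve the observed value x at all, which is why the
# conditional mean squared error is the same for every observation.
posterior_cov = np.linalg.inv(precision)

# Conditional mean squared error of each parameter, E[(Theta_j - theta_hat_j)^2 | X = x]:
conditional_mse = np.diag(posterior_cov)
print("conditional MSE per parameter:", conditional_mse)

# The marginal posterior of each Theta_j is normal with mean theta_map[j]
# and variance posterior_cov[j, j]; a normal PDF peaks at its mean, so the
# peak of each marginal coincides with the corresponding component of the
# joint MAP estimate, i.e. the two ways of computing MAP estimates agree.
for j in range(n):
    print(f"Theta_{j}: joint-MAP component = marginal peak = {theta_map[j]:.4f}, "
          f"posterior std = {np.sqrt(posterior_cov[j, j]):.4f}")
```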
Indeed, these linear normal models are probably the most widely used class of statistical models.