1
00:00:00,000 --> 00:00:00,040

2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative

3
00:00:02,460 --> 00:00:03,870
Commons license.

4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to

5
00:00:06,910 --> 00:00:10,560
offer high quality educational
resources for free.

6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from

7
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at

8
00:00:19,290 --> 00:00:21,708
ocw.mit.edu.

9
00:00:21,708 --> 00:00:25,380
PROFESSOR: It involves real
phenomena out there.

10
00:00:25,380 --> 00:00:28,960
So we have real stuff
that happens.

11
00:00:28,960 --> 00:00:33,630
So it might be an arrival
process to a bank that we're

12
00:00:33,630 --> 00:00:35,790
trying to model.

13
00:00:35,790 --> 00:00:38,230
This is a reality, but
this is what we have

14
00:00:38,230 --> 00:00:39,660
been doing so far.

15
00:00:39,660 --> 00:00:41,910
We have been playing
with models of

16
00:00:41,910 --> 00:00:43,770
probabilistic phenomena.

17
00:00:43,770 --> 00:00:46,730
And somehow we need to
tie the two together.

18
00:00:46,730 --> 00:00:50,930
The way these are tied is that
we observe the real world and

19
00:00:50,930 --> 00:00:53,530
this gives us data.

20
00:00:53,530 --> 00:00:58,590
And then based on these data, we
try to come up with a model

21
00:00:58,590 --> 00:01:01,930
of what exactly is going on.

22
00:01:01,930 --> 00:01:05,290
For example, for an arrival
process, you might ask the

23
00:01:05,290 --> 00:01:08,680
question: is my arrival
process Poisson or is

24
00:01:08,680 --> 00:01:10,300
it something different?

25
00:01:10,300 --> 00:01:14,630
If it is Poisson, what is the
rate of the arrival process?

26
00:01:14,630 --> 00:01:17,460
Once you come up with your model
and you come up with the

27
00:01:17,460 --> 00:01:21,710
parameters of the model, then
you can use it to make

28
00:01:21,710 --> 00:01:27,520
predictions about reality or to
figure out certain hidden

29
00:01:27,520 --> 00:01:31,890
things, certain hidden aspects
of reality, that you do not

30
00:01:31,890 --> 00:01:35,560
observe directly, but you try
to infer what they are.

31
00:01:35,560 --> 00:01:38,900
So that's where the usefulness
of the model comes in.

32
00:01:38,900 --> 00:01:43,330
Now this field is of course
tremendously useful.

33
00:01:43,330 --> 00:01:46,650
And it shows up pretty
much everywhere.

34
00:01:46,650 --> 00:01:50,000
So we talked about the polling
examples in the

35
00:01:50,000 --> 00:01:51,280
last couple of lectures.

36
00:01:51,280 --> 00:01:53,520
This is, of course, a
real application.

37
00:01:53,520 --> 00:01:57,525
You sample and on the basis of
the sample that you have, you

38
00:01:57,525 --> 00:02:00,400
try to make some inferences
about, let's say, the

39
00:02:00,400 --> 00:02:03,060
preferences in a given
population.

40
00:02:03,060 --> 00:02:06,230
Let's say in the medical field,
you want to test whether

41
00:02:06,230 --> 00:02:08,919
a certain drug makes a
difference or not.

42
00:02:08,919 --> 00:02:14,380
So people would do medical
trials, get some results, and

43
00:02:14,380 --> 00:02:17,640
then from the data somehow you
need to make sense of them and

44
00:02:17,640 --> 00:02:18,530
make a decision.

45
00:02:18,530 --> 00:02:21,360
Is the new drug useful
or is it not?

46
00:02:21,360 --> 00:02:23,460
How do we go systematically
about the

47
00:02:23,460 --> 00:02:24,710
questions of this type?

48
00:02:24,710 --> 00:02:27,770

49
00:02:27,770 --> 00:02:32,170
A sexier, more recent topic,
there's this famous Netflix

50
00:02:32,170 --> 00:02:37,510
competition where Netflix gives
you a huge table of

51
00:02:37,510 --> 00:02:41,450
movies and people.

52
00:02:41,450 --> 00:02:45,860
And people have rated the
movies, but not everyone has

53
00:02:45,860 --> 00:02:47,850
watched all of the
movies in there.

54
00:02:47,850 --> 00:02:49,460
You have some of the ratings.

55
00:02:49,460 --> 00:02:53,250
For example, this person gave a
4 to that particular movie.

56
00:02:53,250 --> 00:02:56,300
So you get the table that's
partially filled.

57
00:02:56,300 --> 00:02:58,300
And Netflix asks
you to make

58
00:02:58,300 --> 00:02:59,860
recommendations to people.

59
00:02:59,860 --> 00:03:02,410
So this means trying to guess.

60
00:03:02,410 --> 00:03:06,100
This person here, how much
would they like this

61
00:03:06,100 --> 00:03:07,610
particular movie?

62
00:03:07,610 --> 00:03:11,130
And you can start thinking,
well, maybe this person has

63
00:03:11,130 --> 00:03:14,860
given somewhat similar ratings
with another person.

64
00:03:14,860 --> 00:03:18,440
And if that other person has
also seen that movie, maybe

65
00:03:18,440 --> 00:03:21,290
the rating of that other
person is relevant.

66
00:03:21,290 --> 00:03:24,230
But of course it's a lot more
complicated than that.

67
00:03:24,230 --> 00:03:26,650
And this has been a serious
competition where people have

68
00:03:26,650 --> 00:03:30,230
been using all the heavyweight
machinery that there is in

69
00:03:30,230 --> 00:03:32,540
statistics, trying to
come up with good

70
00:03:32,540 --> 00:03:35,140
recommendation systems.

71
00:03:35,140 --> 00:03:37,870
Then the other people, of
course, are trying to analyze

72
00:03:37,870 --> 00:03:39,010
financial data.

73
00:03:39,010 --> 00:03:43,680
Somebody gives you the sequence
of the values, let's

74
00:03:43,680 --> 00:03:45,840
say, of the S&P index.

75
00:03:45,840 --> 00:03:47,850
You look at something like this

76
00:03:47,850 --> 00:03:49,770
and you can ask questions.

77
00:03:49,770 --> 00:03:55,030
How do I model these data using
any of the models that

78
00:03:55,030 --> 00:03:57,060
we have in our bag of tools?

79
00:03:57,060 --> 00:04:00,230
How can I make predictions about
what's going to happen

80
00:04:00,230 --> 00:04:03,310
afterwards, and so on?

81
00:04:03,310 --> 00:04:09,700
On the engineering side,
anywhere where you have noise,

82
00:04:09,700 --> 00:04:11,590
inference comes in.

83
00:04:11,590 --> 00:04:13,810
Signal processing, in
some sense, is just

84
00:04:13,810 --> 00:04:14,960
an inference problem.

85
00:04:14,960 --> 00:04:18,730
You observe signals that are
noisy and you try to figure

86
00:04:18,730 --> 00:04:21,750
out exactly what's happening
out there or what kind of

87
00:04:21,750 --> 00:04:24,130
signal has been sent.

88
00:04:24,130 --> 00:04:28,830
Maybe the beginning of the field
could be traced a few

89
00:04:28,830 --> 00:04:32,060
hundred years ago where people
would observe, make

90
00:04:32,060 --> 00:04:35,420
astronomical observations
of the position of the

91
00:04:35,420 --> 00:04:37,550
planets in the sky.

92
00:04:37,550 --> 00:04:41,130
They would have some beliefs
that perhaps the orbits of

93
00:04:41,130 --> 00:04:44,070
planets are ellipses.

94
00:04:44,070 --> 00:04:47,840
Or if it's a comet, maybe it's
a parabola or a hyperbola; I don't

95
00:04:47,840 --> 00:04:48,640
know what it is.

96
00:04:48,640 --> 00:04:51,320
But they would have
a model of that.

97
00:04:51,320 --> 00:04:53,840
But, of course, astronomical
measurements would not be

98
00:04:53,840 --> 00:04:55,300
perfectly exact.

99
00:04:55,300 --> 00:05:00,690
And they would try to find the
curve that fits these data.

100
00:05:00,690 --> 00:05:05,580
How do you go about choosing
this particular curve on the

101
00:05:05,580 --> 00:05:07,960
basis of noisy data and
try to do it in a

102
00:05:07,960 --> 00:05:11,274
somewhat principled way?

103
00:05:11,274 --> 00:05:13,890
OK, so questions of this
type-- clearly the

104
00:05:13,890 --> 00:05:17,100
applications are all
over the place.

105
00:05:17,100 --> 00:05:20,830
But how is this related
conceptually with what we have

106
00:05:20,830 --> 00:05:22,480
been doing so far?

107
00:05:22,480 --> 00:05:25,960
What's the relation between the
field of inference and the

108
00:05:25,960 --> 00:05:28,130
field of probability
as we have been

109
00:05:28,130 --> 00:05:30,650
practicing until now?

110
00:05:30,650 --> 00:05:33,620
Well, mathematically speaking,
what's going to happen in the

111
00:05:33,620 --> 00:05:38,780
next few lectures could be just
exercises or homework

112
00:05:38,780 --> 00:05:44,880
problems in this class, based
on what we have done so far.

113
00:05:44,880 --> 00:05:48,560
That means you're not going
to get any new facts about

114
00:05:48,560 --> 00:05:50,200
probability theory.

115
00:05:50,200 --> 00:05:53,930
Everything we're going to do
will be simple applications of

116
00:05:53,930 --> 00:05:57,110
things that you already
do know.

117
00:05:57,110 --> 00:06:00,140
So in some sense, statistics
and inference is just an

118
00:06:00,140 --> 00:06:02,780
applied exercise
in probability.

119
00:06:02,780 --> 00:06:08,310
But actually, things are
not that simple in

120
00:06:08,310 --> 00:06:09,550
the following sense.

121
00:06:09,550 --> 00:06:12,510
If you get a probability
problem,

122
00:06:12,510 --> 00:06:14,040
there's a correct answer.

123
00:06:14,040 --> 00:06:15,450
There's a correct solution.

124
00:06:15,450 --> 00:06:18,170
And that correct solution
is unique.

125
00:06:18,170 --> 00:06:20,550
There's no ambiguity.

126
00:06:20,550 --> 00:06:23,380
The theory of probability has
clearly defined rules.

127
00:06:23,380 --> 00:06:24,570
These are the axioms.

128
00:06:24,570 --> 00:06:27,550
You're given some information
about probability

129
00:06:27,550 --> 00:06:28,280
distributions.

130
00:06:28,280 --> 00:06:31,000
You're asked to calculate
certain other things.

131
00:06:31,000 --> 00:06:32,190
There's no ambiguity.

132
00:06:32,190 --> 00:06:34,230
Answers are always unique.

133
00:06:34,230 --> 00:06:39,180
In statistical questions, it's
no longer the case that the

134
00:06:39,180 --> 00:06:41,420
question has a unique answer.

135
00:06:41,420 --> 00:06:44,990
If I give you data and I ask
you what's the best way of

136
00:06:44,990 --> 00:06:49,710
estimating the motion of that
planet, reasonable people can

137
00:06:49,710 --> 00:06:53,370
come up with different
methods.

138
00:06:53,370 --> 00:06:56,790
And reasonable people will try
to argue that my method has

139
00:06:56,790 --> 00:07:00,140
these desirable properties but
somebody else may say, here's

140
00:07:00,140 --> 00:07:03,740
another method that has certain
desirable properties.

141
00:07:03,740 --> 00:07:08,220
And it's not clear what
the best method is.

142
00:07:08,220 --> 00:07:11,330
So it's good to have some
understanding of what the

143
00:07:11,330 --> 00:07:16,910
issues are and to know at least
what is the general

144
00:07:16,910 --> 00:07:20,150
class of methods that one tries
to consider, how does

145
00:07:20,150 --> 00:07:22,380
one go about such problems.

146
00:07:22,380 --> 00:07:24,360
So we're going to see
lots and lots of

147
00:07:24,360 --> 00:07:25,880
different inference methods.

148
00:07:25,880 --> 00:07:27,350
We're not going to tell
you that one is

149
00:07:27,350 --> 00:07:28,730
better than the other.

150
00:07:28,730 --> 00:07:30,940
But it's important to understand
what are the

151
00:07:30,940 --> 00:07:33,980
concepts behind those
different methods.

152
00:07:33,980 --> 00:07:38,710
And finally, statistics can
be misused really badly.

153
00:07:38,710 --> 00:07:41,870
That is, one can come up with
methods that you think are

154
00:07:41,870 --> 00:07:48,650
sound, but in fact they're
not quite that.

155
00:07:48,650 --> 00:07:52,830
I will bring some examples next
time and talk a little

156
00:07:52,830 --> 00:07:54,290
more about this.

157
00:07:54,290 --> 00:07:58,540
So, let's say you have
some data and you want to make

158
00:07:58,540 --> 00:08:02,590
some inference from them. What
many people will do is to go

159
00:08:02,590 --> 00:08:06,340
to Wikipedia, find a statistical
test that they

160
00:08:06,340 --> 00:08:08,990
think applies to that
situation, plug in numbers,

161
00:08:08,990 --> 00:08:10,880
and present results.

162
00:08:10,880 --> 00:08:14,220
Are the conclusions that they
get really justified or are

163
00:08:14,220 --> 00:08:16,400
they misusing statistical
methods?

164
00:08:16,400 --> 00:08:20,520
Well, too many people actually
do misuse statistics and

165
00:08:20,520 --> 00:08:24,530
conclusions that people
get are often false.

166
00:08:24,530 --> 00:08:29,840
So it's important to, besides
just being able to copy

167
00:08:29,840 --> 00:08:32,600
statistical tests and use them,
to understand what are

168
00:08:32,600 --> 00:08:35,860
the assumptions behind the
different methods and what

169
00:08:35,860 --> 00:08:40,559
kind of guarantees they
have, if any.

170
00:08:40,559 --> 00:08:44,420
All right, so we'll try to do a
quick tour through the field

171
00:08:44,420 --> 00:08:47,600
of inference in this lecture and
the next few lectures that

172
00:08:47,600 --> 00:08:51,700
we have left this semester and
try to highlight at the very

173
00:08:51,700 --> 00:08:53,940
high level the main concepts,
skills, and

174
00:08:53,940 --> 00:08:56,990
techniques that come in.

175
00:08:56,990 --> 00:08:59,840
Let's start with some
generalities and some general

176
00:08:59,840 --> 00:09:01,090
statements.

177
00:09:01,090 --> 00:09:03,090

178
00:09:03,090 --> 00:09:07,090
One first statement is that
statistics or inference

179
00:09:07,090 --> 00:09:11,800
problems come up in very
different guises.

180
00:09:11,800 --> 00:09:16,490
And they may look as if they are
of very different forms.

181
00:09:16,490 --> 00:09:20,190
Although, at some fundamental
level, the basic issues turn

182
00:09:20,190 --> 00:09:23,320
out to be always pretty
much the same.

183
00:09:23,320 --> 00:09:27,880
So let's look at this example.

184
00:09:27,880 --> 00:09:31,420
There's an unknown signal
that's being sent.

185
00:09:31,420 --> 00:09:35,840
It's sent through some medium,
and that medium just takes the

186
00:09:35,840 --> 00:09:39,180
signal and amplifies it
by a certain number.

187
00:09:39,180 --> 00:09:41,340
So you can think of
somebody shouting.

188
00:09:41,340 --> 00:09:42,920
There's the air out there.

189
00:09:42,920 --> 00:09:46,420
What you shouted will be
attenuated through the air

190
00:09:46,420 --> 00:09:48,040
until it gets to a receiver.

191
00:09:48,040 --> 00:09:51,730
And that receiver then observes
this, but together

192
00:09:51,730 --> 00:09:53,110
with some random noise.

193
00:09:53,110 --> 00:09:56,040

194
00:09:56,040 --> 00:10:00,390
Here I meant S. S is the signal
that's being sent.

195
00:10:00,390 --> 00:10:06,280
And what you observe is an X.

196
00:10:06,280 --> 00:10:09,240
You observe X, so what kind
of inference problems

197
00:10:09,240 --> 00:10:11,240
could we have here?

198
00:10:11,240 --> 00:10:15,400
In some cases, you want to build
a model of the physical

199
00:10:15,400 --> 00:10:17,450
phenomenon that you're
dealing with.

200
00:10:17,450 --> 00:10:21,180
So for example, you don't know
the attenuation of your signal

201
00:10:21,180 --> 00:10:25,190
and you try to find out what
this number is based on the

202
00:10:25,190 --> 00:10:26,980
observations that you have.

203
00:10:26,980 --> 00:10:30,240
So the way this is done in
engineering systems is that

204
00:10:30,240 --> 00:10:35,020
you design a certain signal, you
know what it is, you shout

205
00:10:35,020 --> 00:10:39,560
a particular word, and then
the receiver listens.

206
00:10:39,560 --> 00:10:43,460
And based on the intensity of
the signal that they get, they

207
00:10:43,460 --> 00:10:48,380
try to make a guess about A. So
you don't know A, but you

208
00:10:48,380 --> 00:10:52,460
know S. And by observing X,
you get some information

209
00:10:52,460 --> 00:10:54,270
about what A is.

210
00:10:54,270 --> 00:10:57,810
So in this case, you're trying
to build a model of the medium

211
00:10:57,810 --> 00:11:01,170
through which your signal
is propagating.

212
00:11:01,170 --> 00:11:04,600
So sometimes one would call
problems of this kind, let's

213
00:11:04,600 --> 00:11:07,990
say, system identification.
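
A minimal Python sketch of the system-identification version, under assumptions that are not in the lecture: the receiver knows a designed test signal s_1, ..., s_n, observes X_i = a * s_i + W_i with independent zero-mean Gaussian noise W_i, and estimates the unknown gain a by least squares. The function names `estimate_gain` and `simulate` and all numeric values are hypothetical.

```python
import random

def estimate_gain(signals, observations):
    """Least-squares estimate of the unknown gain a in X_i = a * s_i + W_i (s_i known)."""
    num = sum(s * x for s, x in zip(signals, observations))
    den = sum(s * s for s in signals)
    return num / den

def simulate(a_true, signals, noise_std, rng):
    """Hypothetical medium: scale each known s_i by a_true and add Gaussian noise."""
    return [a_true * s + rng.gauss(0.0, noise_std) for s in signals]

if __name__ == "__main__":
    rng = random.Random(0)
    test_signal = [1.0, 2.0, -1.5, 3.0, 0.5] * 20  # designed by us, so known
    x = simulate(0.3, test_signal, noise_std=0.1, rng=rng)
    print("estimated gain:", estimate_gain(test_signal, x))  # should land near 0.3
```

Least squares is used here only because it is simple; other estimators of a are possible.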

214
00:11:07,990 --> 00:11:11,980
In a different version of an
inference problem that comes

215
00:11:11,980 --> 00:11:15,300
with this picture, you've
done your modeling.

216
00:11:15,300 --> 00:11:18,160
You know your A. You know the
medium through which the

217
00:11:18,160 --> 00:11:22,330
signal is going, but it's
a communication system.

218
00:11:22,330 --> 00:11:24,190
This person is trying
to communicate

219
00:11:24,190 --> 00:11:26,140
something to that person.

220
00:11:26,140 --> 00:11:30,250
So you send the signal S, but
that person receives a noisy

221
00:11:30,250 --> 00:11:35,430
version of S. So that person
tries to reconstruct S based

222
00:11:35,430 --> 00:11:36,930
on X.

223
00:11:36,930 --> 00:11:42,210
So in both cases, we have a
linear relation between X and

224
00:11:42,210 --> 00:11:43,490
the unknown quantity.

225
00:11:43,490 --> 00:11:47,360
In one version, A is the unknown
and we know S. In the

226
00:11:47,360 --> 00:11:51,670
other version, A is known,
and so we try to infer S.

227
00:11:51,670 --> 00:11:54,300
Mathematically, you can see that
this is essentially the

228
00:11:54,300 --> 00:11:57,060
same kind of problem
in both cases.

229
00:11:57,060 --> 00:12:03,590
Although, the kind of practical
problem that you're

230
00:12:03,590 --> 00:12:07,580
trying to solve is a
little different.
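
A minimal sketch of the communication version, assuming X_i = a * s + W_i with the gain a known and, for illustration only, the same unknown value s sent n times with independent Gaussian noise. The estimate is sum(a * x_i) / (n * a^2), which is the least-squares form one would use to estimate a with s known, with the known and unknown quantities exchanged. The name `estimate_signal` and the numbers are hypothetical.

```python
import random

def estimate_signal(a_known, observations):
    """Least-squares estimate of an unknown signal s from X_i = a * s + W_i, a known.

    Same form as estimating a with s known: sum(coef * x) / sum(coef ** 2),
    here with the coefficient equal to a_known for every observation.
    """
    num = sum(a_known * x for x in observations)
    den = a_known * a_known * len(observations)
    return num / den  # algebraically equal to mean(observations) / a_known

if __name__ == "__main__":
    rng = random.Random(1)
    a_known, s_true = 0.4, 2.5  # hypothetical values
    x = [a_known * s_true + rng.gauss(0.0, 0.1) for _ in range(100)]
    print("estimated signal:", estimate_signal(a_known, x))
```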

231
00:12:07,580 --> 00:12:11,880
So we will not be making any
distinctions between problems

232
00:12:11,880 --> 00:12:15,940
of the model building type as
opposed to problems where you

233
00:12:15,940 --> 00:12:19,260
try to estimate some unknown
signal and so on.

234
00:12:19,260 --> 00:12:22,400
Because conceptually, the tools
that one uses for both

235
00:12:22,400 --> 00:12:26,850
types of problems are
essentially the same.

236
00:12:26,850 --> 00:12:30,430
OK, next a very useful
classification

237
00:12:30,430 --> 00:12:31,680
of inference problems--

238
00:12:31,680 --> 00:12:34,170

239
00:12:34,170 --> 00:12:37,760
the unknown quantity that you're
trying to estimate

240
00:12:37,760 --> 00:12:40,770
could be either a discrete
one that takes a

241
00:12:40,770 --> 00:12:43,040
small number of values.

242
00:12:43,040 --> 00:12:45,605
So this could be discrete
problems, such as the airplane

243
00:12:45,605 --> 00:12:48,080
radar problem we encountered
a long

244
00:12:48,080 --> 00:12:50,120
time ago in this class.

245
00:12:50,120 --> 00:12:52,120
So there are two possibilities--

246
00:12:52,120 --> 00:12:55,450
an airplane is out there or an
airplane is not out there.

247
00:12:55,450 --> 00:12:57,050
And you're trying to
make a decision

248
00:12:57,050 --> 00:12:58,940
between these two options.

249
00:12:58,940 --> 00:13:01,570
Or you can have other problems
where you have, let's say,

250
00:13:01,570 --> 00:13:03,380
four possible options.

251
00:13:03,380 --> 00:13:05,970
You don't know which one is
true, but you get data and you

252
00:13:05,970 --> 00:13:09,040
try to figure out which
one is true.

253
00:13:09,040 --> 00:13:12,050
In problems of this kind,
usually you want to make a

254
00:13:12,050 --> 00:13:14,050
decision based on your data.

255
00:13:14,050 --> 00:13:17,000
And you're interested in the
probability of making a

256
00:13:17,000 --> 00:13:18,040
correct decision.

257
00:13:18,040 --> 00:13:19,430
You would like that
probability to

258
00:13:19,430 --> 00:13:21,830
be as high as possible.
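
A minimal sketch of the airplane-radar decision under hypothetical modeling assumptions: Theta = 1 (airplane present) with prior probability p, otherwise Theta = 0, and the radar return is X = Theta * mu + W with Gaussian noise W. The rule picks whichever hypothesis has the larger posterior (a MAP rule, swapped in here for illustration), and a simulation estimates the probability of a correct decision. All names and numbers are assumptions.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def map_decide(x, prior_present, signal_level, noise_std):
    """Return 1 ('airplane present') if its posterior is larger, else 0."""
    score_present = prior_present * gaussian_pdf(x, signal_level, noise_std)
    score_absent = (1.0 - prior_present) * gaussian_pdf(x, 0.0, noise_std)
    return 1 if score_present > score_absent else 0

def prob_correct(prior_present, signal_level, noise_std, trials, rng):
    """Monte Carlo estimate of the probability that the decision matches the truth."""
    correct = 0
    for _ in range(trials):
        theta = 1 if rng.random() < prior_present else 0
        x = theta * signal_level + rng.gauss(0.0, noise_std)
        if map_decide(x, prior_present, signal_level, noise_std) == theta:
            correct += 1
    return correct / trials

if __name__ == "__main__":
    rng = random.Random(3)
    print(prob_correct(0.5, 2.0, 1.0, 20000, rng))
```

With equal priors, mu = 2, and unit noise, the rule reduces to a threshold at x = 1, so the probability of a correct decision should come out near 0.84.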

259
00:13:21,830 --> 00:13:24,000
Estimation problems are
a little different.

260
00:13:24,000 --> 00:13:28,540
Here you have some continuous
quantity that's not known.

261
00:13:28,540 --> 00:13:31,860
And you try to make a good
guess of that quantity.

262
00:13:31,860 --> 00:13:36,050
And you would like your guess to
be as close as possible to

263
00:13:36,050 --> 00:13:37,310
the true quantity.

264
00:13:37,310 --> 00:13:40,270
So the polling problem
was of this type.

265
00:13:40,270 --> 00:13:44,720
There was an unknown fraction
f of the population that had

266
00:13:44,720 --> 00:13:45,870
some property.

267
00:13:45,870 --> 00:13:50,040
And you try to estimate f as
accurately as you can.
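
A small simulation of the polling estimator, assuming each sampled person has the property independently with probability f (as when sampling from a large population). The estimate is the sample fraction, and the size of the error is measured here by the mean squared error, which for this estimator should be close to f(1 - f)/n. The values of f and n are hypothetical.

```python
import random

def poll(f_true, n, rng):
    """Sample n people independently; return the fraction with the property (the estimate)."""
    yes = sum(1 for _ in range(n) if rng.random() < f_true)
    return yes / n

def empirical_mse(f_true, n, repeats, rng):
    """Average squared error of the poll estimate over many independent repetitions."""
    total = 0.0
    for _ in range(repeats):
        total += (poll(f_true, n, rng) - f_true) ** 2
    return total / repeats

if __name__ == "__main__":
    rng = random.Random(11)
    f_true, n = 0.4, 500  # hypothetical true fraction and sample size
    print("one poll:", poll(f_true, n, rng))
    print("empirical MSE:", empirical_mse(f_true, n, 1000, rng))
    print("f(1 - f)/n:", f_true * (1 - f_true) / n)
```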

268
00:13:50,040 --> 00:13:53,420
So the distinction here is that
usually here the unknown

269
00:13:53,420 --> 00:13:56,440
quantity takes on a discrete
set of values.

270
00:13:56,440 --> 00:13:57,890
Here the unknown quantity
takes a

271
00:13:57,890 --> 00:14:00,030
continuous set of values.

272
00:14:00,030 --> 00:14:02,980
Here we're interested in the
probability of error.

273
00:14:02,980 --> 00:14:07,400
Here we're interested in
the size of the error.

274
00:14:07,400 --> 00:14:11,000
Broadly speaking, most inference
problems fall either

275
00:14:11,000 --> 00:14:13,940
in this category or
in that category.

276
00:14:13,940 --> 00:14:17,230
Although, if you want to
complicate life, you can also

277
00:14:17,230 --> 00:14:20,250
think of or construct problems
where both of these aspects

278
00:14:20,250 --> 00:14:24,410
are simultaneously present.

279
00:14:24,410 --> 00:14:28,530
OK, finally since we're in
classification mode, there is

280
00:14:28,530 --> 00:14:33,670
a very big, important dichotomy
in how one goes

281
00:14:33,670 --> 00:14:35,940
about inference problems.

282
00:14:35,940 --> 00:14:39,150
And here there are two
fundamentally different

283
00:14:39,150 --> 00:14:46,070
philosophical points of view,
which differ in how we model the

284
00:14:46,070 --> 00:14:50,270
quantity that is unknown.

285
00:14:50,270 --> 00:14:54,530
In one approach, you say there's
a certain quantity

286
00:14:54,530 --> 00:14:57,590
that has a definite value.

287
00:14:57,590 --> 00:15:00,010
It just happens that
we don't know it.

288
00:15:00,010 --> 00:15:01,320
But it's a number.

289
00:15:01,320 --> 00:15:03,290
There's nothing random
about it.

290
00:15:03,290 --> 00:15:05,945
So think of trying to estimate
some physical quantity.

291
00:15:05,945 --> 00:15:10,630

292
00:15:10,630 --> 00:15:13,350
You're making measurements, you
try to estimate the mass

293
00:15:13,350 --> 00:15:15,820
of an electron, which
is a sort of

294
00:15:15,820 --> 00:15:18,270
universal physical constant.

295
00:15:18,270 --> 00:15:20,320
There's nothing random
about it.

296
00:15:20,320 --> 00:15:22,340
It's a fixed number.

297
00:15:22,340 --> 00:15:29,120
You get data, because you have
some measuring apparatus.

298
00:15:29,120 --> 00:15:33,020
The results that you get
from that measuring apparatus

299
00:15:33,020 --> 00:15:37,160
are affected by the
true mass of the electron,

300
00:15:37,160 --> 00:15:39,340
but there's also some noise.

301
00:15:39,340 --> 00:15:42,200
You take the data out of your
measuring apparatus and you

302
00:15:42,200 --> 00:15:44,465
try to come up with
some estimate of

303
00:15:44,465 --> 00:15:47,220
that quantity theta.

304
00:15:47,220 --> 00:15:49,760
So this is definitely a
legitimate picture, but the

305
00:15:49,760 --> 00:15:52,370
important thing in this picture
is that this theta is

306
00:15:52,370 --> 00:15:54,570
written as lowercase.

307
00:15:54,570 --> 00:15:58,110
And that's to make the point
that it's a real number, not a

308
00:15:58,110 --> 00:16:00,900
random variable.

309
00:16:00,900 --> 00:16:03,230
There's a different
philosophical approach which

310
00:16:03,230 --> 00:16:08,180
says, well, anything that I
don't know I should model it

311
00:16:08,180 --> 00:16:10,190
as a random variable.

312
00:16:10,190 --> 00:16:11,130
Yes, I know.

313
00:16:11,130 --> 00:16:14,500
The mass of the electron
is not really random.

314
00:16:14,500 --> 00:16:15,690
It's a constant.

315
00:16:15,690 --> 00:16:17,920
But I don't know what it is.

316
00:16:17,920 --> 00:16:22,510
I have some vague sense,
perhaps, of what it is, perhaps

317
00:16:22,510 --> 00:16:24,290
because of the experiments
that some other

318
00:16:24,290 --> 00:16:25,940
people carried out.

319
00:16:25,940 --> 00:16:30,560
So perhaps I have a prior
distribution on the possible

320
00:16:30,560 --> 00:16:32,160
values of Theta.

321
00:16:32,160 --> 00:16:34,990
And that prior distribution
doesn't mean that nature

322
00:16:34,990 --> 00:16:39,320
is random, but it's more of a
subjective description of my

323
00:16:39,320 --> 00:16:44,570
subjective beliefs of where
I think this constant number

324
00:16:44,570 --> 00:16:46,200
happens to be.

325
00:16:46,200 --> 00:16:50,140
So even though it's not truly
random, I model my initial

326
00:16:50,140 --> 00:16:52,600
beliefs before the experiment
starts in terms of a prior

327
00:16:52,600 --> 00:16:55,790
distribution, and I view it as a

328
00:16:55,790 --> 00:16:57,470
random variable.

329
00:16:57,470 --> 00:17:01,850
Then I observe another related
random variable through some

330
00:17:01,850 --> 00:17:02,930
measuring apparatus.

331
00:17:02,930 --> 00:17:05,920
And then I use this again
to create an estimate.

332
00:17:05,920 --> 00:17:08,819

333
00:17:08,819 --> 00:17:12,069
So these two pictures
philosophically are very

334
00:17:12,069 --> 00:17:13,589
different from each other.

335
00:17:13,589 --> 00:17:17,130
Here we treat the unknown
quantities as unknown numbers.

336
00:17:17,130 --> 00:17:20,589
Here we treat them as
random variables.

337
00:17:20,589 --> 00:17:24,829
When we treat them as random
variables, then we know pretty

338
00:17:24,829 --> 00:17:27,109
much already what we
should be doing.

339
00:17:27,109 --> 00:17:29,470
We should just use
the Bayes rule.

340
00:17:29,470 --> 00:17:31,850
Based on X, find
the conditional

341
00:17:31,850 --> 00:17:33,670
distribution of Theta.

342
00:17:33,670 --> 00:17:37,520
And that's what we will be doing
mostly over this lecture

343
00:17:37,520 --> 00:17:40,010
and the next lecture.

344
00:17:40,010 --> 00:17:44,660
Now in both cases, what you end
up getting at the end is

345
00:17:44,660 --> 00:17:47,240
an estimate.

346
00:17:47,240 --> 00:17:52,120
But what kind of object
is that estimate?

347
00:17:52,120 --> 00:17:55,170
It's a random variable
in both cases.

348
00:17:55,170 --> 00:17:56,000
Why?

349
00:17:56,000 --> 00:17:58,130
Even in this case where
theta was a

350
00:17:58,130 --> 00:18:01,060
constant, my data are random.

351
00:18:01,060 --> 00:18:02,860
I do my data processing.

352
00:18:02,860 --> 00:18:06,050
So I calculate a function
of the data, and the

353
00:18:06,050 --> 00:18:07,580
data are random variables.

354
00:18:07,580 --> 00:18:11,390
So out here we output something
which is a function

355
00:18:11,390 --> 00:18:12,770
of a random variable.

356
00:18:12,770 --> 00:18:15,830
So this quantity here
will be also random.

357
00:18:15,830 --> 00:18:18,400
It's affected by the noise and
the experiment that I have

358
00:18:18,400 --> 00:18:19,650
been doing.

359
00:18:19,650 --> 00:18:22,330
That's why these estimators
will be denoted

360
00:18:22,330 --> 00:18:24,920
by uppercase Thetas.

361
00:18:24,920 --> 00:18:26,740
And we will be using hats.

362
00:18:26,740 --> 00:18:29,030
Hat, usually in estimation,
means

363
00:18:29,030 --> 00:18:32,990
an estimate of something.
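
A sketch of why an estimator is random even when theta is a fixed number. It assumes, for illustration only, that each measurement is X_i = theta + W_i with Gaussian noise and that the estimate Theta hat is the sample mean; repeating the whole experiment gives a different value of Theta hat each time. The function name and numbers are hypothetical.

```python
import random
import statistics

def run_experiment(theta, n, noise_std, rng):
    """One experiment: n noisy measurements X_i = theta + W_i, summarized by their mean."""
    data = [theta + rng.gauss(0.0, noise_std) for _ in range(n)]
    return sum(data) / n  # Theta_hat is a function of the random data

if __name__ == "__main__":
    rng = random.Random(1)
    theta = 2.0  # fixed, non-random unknown (illustrative value)
    estimates = [run_experiment(theta, 10, 0.5, rng) for _ in range(2000)]
    print("first few estimates:", [round(e, 3) for e in estimates[:5]])
    print("mean:", statistics.mean(estimates), "spread:", statistics.stdev(estimates))
```

The spread of the estimates should be close to 0.5 / sqrt(10), about 0.16, even though theta itself never changes.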

364
00:18:32,990 --> 00:18:35,380
All right, so this is
the big picture.

365
00:18:35,380 --> 00:18:38,690
We're going to start with
the Bayesian version.

366
00:18:38,690 --> 00:18:42,830
And then in the last few lectures
we're going to talk about the

367
00:18:42,830 --> 00:18:45,690
non-Bayesian version or
the classical one.

368
00:18:45,690 --> 00:18:48,610
By the way, I should say that
statisticians have been

369
00:18:48,610 --> 00:18:52,500
debating fiercely for 100 years
whether the right way to

370
00:18:52,500 --> 00:18:56,030
approach statistics is to go
the classical way or the

371
00:18:56,030 --> 00:18:57,420
Bayesian way.

372
00:18:57,420 --> 00:19:00,530
And there have been tides going
back and forth between

373
00:19:00,530 --> 00:19:02,260
the two sides.

374
00:19:02,260 --> 00:19:05,330
These days, Bayesian methods
have become a little more

375
00:19:05,330 --> 00:19:07,320
popular for various reasons.

376
00:19:07,320 --> 00:19:11,730
We're going to come back
to this later.

377
00:19:11,730 --> 00:19:14,610
All right, so in Bayesian
estimation, what we have in our

378
00:19:14,610 --> 00:19:16,610
hands is Bayes rule.

379
00:19:16,610 --> 00:19:19,380
And if you have Bayes rule,
there's not a lot

380
00:19:19,380 --> 00:19:21,340
that's left to do.

381
00:19:21,340 --> 00:19:24,190
We have different forms of the
Bayes rule, depending on

382
00:19:24,190 --> 00:19:27,920
whether we're dealing with
discrete data and discrete

383
00:19:27,920 --> 00:19:32,310
quantities to estimate, or
continuous data, and so on.

384
00:19:32,310 --> 00:19:36,020
In the hypothesis testing
problem, the unknown quantity

385
00:19:36,020 --> 00:19:38,210
Theta is discrete.

386
00:19:38,210 --> 00:19:42,890
So in both cases here,
we have a P of Theta.

387
00:19:42,890 --> 00:19:45,530
We obtain data, the X's.

388
00:19:45,530 --> 00:19:49,040
And on the basis of the X that
we observe, we can calculate

389
00:19:49,040 --> 00:19:53,340
the posterior distribution
of Theta, given the data.

390
00:19:53,340 --> 00:19:59,840
So to use Bayesian inference,
what do we start with?

391
00:19:59,840 --> 00:20:03,160
We start with some priors.

392
00:20:03,160 --> 00:20:05,910
These are our initial
beliefs about what

393
00:20:05,910 --> 00:20:07,890
Theta might be.

394
00:20:07,890 --> 00:20:10,440
That's before we do
the experiment.

395
00:20:10,440 --> 00:20:13,840
We have a model of the
experimental apparatus.

396
00:20:13,840 --> 00:20:17,520

397
00:20:17,520 --> 00:20:21,550
And the model of the
experimental apparatus tells

398
00:20:21,550 --> 00:20:28,040
us if this Theta is true, I'm
going to see X's of that kind.

399
00:20:28,040 --> 00:20:31,480
If that other Theta is true, I'm
going to see X's that

400
00:20:31,480 --> 00:20:33,130
are somewhere else.

401
00:20:33,130 --> 00:20:35,200
That models my apparatus.

402
00:20:35,200 --> 00:20:39,150
And based on that knowledge,
once I have these

403
00:20:39,150 --> 00:20:41,975
two functions in my hands, we
have already seen that if you

404
00:20:41,975 --> 00:20:44,760
know those two functions, you
can also calculate the

405
00:20:44,760 --> 00:20:46,550
denominator here.

406
00:20:46,550 --> 00:20:50,900
So all of these functions are
available, so you can compute,

407
00:20:50,900 --> 00:20:54,170
you can find a formula for
this function as well.

408
00:20:54,170 --> 00:20:58,780
And as soon as you observe the
data, the X's, you plug in

409
00:20:58,780 --> 00:21:02,220
here the numerical value
of those X's.

410
00:21:02,220 --> 00:21:04,720
And you get a function
of Theta.

411
00:21:04,720 --> 00:21:07,870
And this is the posterior
distribution of Theta, given

412
00:21:07,870 --> 00:21:09,680
the data that you have seen.
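
The discrete form of Bayes rule described here, p(theta | x) = p(theta) p(x | theta) / sum over theta' of p(theta') p(x | theta'), as a short sketch: multiply the prior by the model of the apparatus, sum to get the denominator, and divide. The coin-bias example (three candidate values of Theta with a uniform prior, and X the number of heads in 10 tosses) is a hypothetical illustration, not from the lecture.

```python
from math import comb

def posterior(prior, likelihood, x):
    """Discrete Bayes rule.

    prior: dict mapping each possible theta to p_Theta(theta)
    likelihood: function (x, theta) -> p_{X|Theta}(x | theta), the model of the apparatus
    """
    joint = {theta: p * likelihood(x, theta) for theta, p in prior.items()}
    evidence = sum(joint.values())  # the denominator p_X(x)
    return {theta: j / evidence for theta, j in joint.items()}

def binomial_likelihood(n):
    """Hypothetical apparatus: X = number of heads in n tosses of a coin with bias theta."""
    return lambda k, theta: comb(n, k) * theta**k * (1 - theta) ** (n - k)

if __name__ == "__main__":
    prior = {0.3: 1 / 3, 0.5: 1 / 3, 0.7: 1 / 3}
    print(posterior(prior, binomial_likelihood(10), 7))
```

Observing 7 heads moves most of the posterior mass (about 0.68) onto the bias 0.7.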

413
00:21:09,680 --> 00:21:11,930
So you've already done
a fair number of

414
00:21:11,930 --> 00:21:13,760
exercises of this kind.

415
00:21:13,760 --> 00:21:17,320
So we will not say more about this.

416
00:21:17,320 --> 00:21:20,470
And there's a similar formula as
you know for the case where

417
00:21:20,470 --> 00:21:22,460
we have continuous data.

418
00:21:22,460 --> 00:21:25,140
If the X's are continuous random
variables, then the

419
00:21:25,140 --> 00:21:28,620
formula is the same, except
that X's are described by

420
00:21:28,620 --> 00:21:31,630
densities instead of being
described by probability

421
00:21:31,630 --> 00:21:32,880
mass functions.

422
00:21:32,880 --> 00:21:35,170

423
00:21:35,170 --> 00:21:40,200
OK, now if Theta is continuous,
then we're dealing

424
00:21:40,200 --> 00:21:42,160
with estimation problems.

425
00:21:42,160 --> 00:21:44,880
But the story is once
more the same.

426
00:21:44,880 --> 00:21:47,920
You're going to use the Bayes
rule to come up with the

427
00:21:47,920 --> 00:21:51,090
posterior density of Theta,
given the data

428
00:21:51,090 --> 00:21:53,300
that you have observed.

429
00:21:53,300 --> 00:21:57,250
Now just for the sake of the
example, let's come back to

430
00:21:57,250 --> 00:21:58,900
this picture here.

431
00:21:58,900 --> 00:22:03,490
Suppose that something is flying
in the air, and maybe

432
00:22:03,490 --> 00:22:07,800
this is just an object in the
air close to the Earth.

433
00:22:07,800 --> 00:22:10,820
So because of gravity, the
trajectory that it's going to

434
00:22:10,820 --> 00:22:15,170
follow is going to
be a parabola.

435
00:22:15,170 --> 00:22:18,014
So this is the general equation
of a parabola.

436
00:22:18,014 --> 00:22:23,450
Zt is the position of my
object at time t.

437
00:22:23,450 --> 00:22:26,310

438
00:22:26,310 --> 00:22:29,500
But I don't know exactly
which parabola it is.

439
00:22:29,500 --> 00:22:32,690
So the parameters of the
parabola are unknown

440
00:22:32,690 --> 00:22:34,040
quantities.

441
00:22:34,040 --> 00:22:37,710
What I can do is to go and
measure the position of my

442
00:22:37,710 --> 00:22:41,880
object at different times.

443
00:22:41,880 --> 00:22:44,575
But unfortunately, my
measurements are noisy.

444
00:22:44,575 --> 00:22:47,380

445
00:22:47,380 --> 00:22:51,070
What I want to do is to model
the motion of my object.

446
00:22:51,070 --> 00:22:56,260
So I guess in the picture, the
axes would be t going this way

447
00:22:56,260 --> 00:22:59,980
and Z going this way.

448
00:22:59,980 --> 00:23:02,470
And on the basis of the
data that I get,

449
00:23:02,470 --> 00:23:05,020
these are my X's.

450
00:23:05,020 --> 00:23:07,390
I want to figure
out the Thetas.

451
00:23:07,390 --> 00:23:09,570
That is, I want to figure
out the exact

452
00:23:09,570 --> 00:23:11,840
equation of this parabola.

453
00:23:11,840 --> 00:23:14,940
Now if somebody gives you
probability distributions for

454
00:23:14,940 --> 00:23:18,490
Theta, these would
be your priors.

455
00:23:18,490 --> 00:23:19,840
So this is given.

456
00:23:19,840 --> 00:23:23,200

457
00:23:23,200 --> 00:23:26,200
We need the conditional
distribution of the X's given

458
00:23:26,200 --> 00:23:27,360
the Thetas.

459
00:23:27,360 --> 00:23:30,870
Well, we have the conditional
distribution of Z, given the

460
00:23:30,870 --> 00:23:32,920
Thetas from this equation.

461
00:23:32,920 --> 00:23:36,040
And then by playing with this
equation, you can also find

462
00:23:36,040 --> 00:23:42,460
how X is distributed if Theta
takes a particular value.

463
00:23:42,460 --> 00:23:46,420
So you do have all of the
densities that you might need.

464
00:23:46,420 --> 00:23:48,790
And you can apply
the Bayes rule.
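
A sketch of the densities needed for the parabola example, under an assumption the lecture does not spell out: each measurement is X_i = Theta0 + Theta1 t_i + Theta2 t_i^2 + W_i, with independent Gaussian noise W_i of known standard deviation sigma. The function names, the flat prior, and the sample numbers are hypothetical. Working with logarithms avoids numerical underflow, and the Bayes rule denominator is dropped because it does not depend on Theta.

```python
import math


def log_likelihood(theta, times, xs, sigma):
    """log f_{X|Theta}(xs | theta), assuming independent N(0, sigma^2) noise."""
    th0, th1, th2 = theta
    total = 0.0
    for t, x in zip(times, xs):
        z = th0 + th1 * t + th2 * t * t  # noiseless position Z_t on the parabola
        r = x - z                        # implied measurement noise W_i
        total += -0.5 * (r / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return total


def unnormalized_log_posterior(theta, log_prior, times, xs, sigma):
    """log f_Theta(theta) + log f_{X|Theta}(xs | theta): the Bayes rule up to a constant."""
    return log_prior(theta) + log_likelihood(theta, times, xs, sigma)


# Hypothetical usage: noise-free data generated from the triplet (1, 4, -0.5).
times = [0.0, 1.0, 2.0, 3.0]
xs = [1.0, 4.5, 7.0, 8.5]
flat_prior = lambda theta: 0.0  # log of a constant prior density
score_true = unnormalized_log_posterior((1.0, 4.0, -0.5), flat_prior, times, xs, sigma=1.0)
score_wrong = unnormalized_log_posterior((1.0, 4.0, 0.0), flat_prior, times, xs, sigma=1.0)
```

Comparing scores of different Theta triplets gives their relative posterior likelihoods; normalizing over all triplets would require the denominator.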

465
00:23:48,790 --> 00:23:53,620
And at the end, your end result
would be a formula for

466
00:23:53,620 --> 00:23:57,270
the distribution of Theta,
given the X

467
00:23:57,270 --> 00:23:59,130
that you have observed--

468
00:23:59,130 --> 00:24:03,000
except for one sort of
complication, or, to make things

469
00:24:03,000 --> 00:24:04,470
more interesting.

470
00:24:04,470 --> 00:24:07,680
Instead of these X's and Theta's
being single random

471
00:24:07,680 --> 00:24:11,070
variables that we have here,
typically those X's and

472
00:24:11,070 --> 00:24:13,400
Theta's will be
multi-dimensional random

473
00:24:13,400 --> 00:24:16,490
variables or will correspond
to multiple ones.

474
00:24:16,490 --> 00:24:19,920
So this little Theta here
actually stands for a triplet

475
00:24:19,920 --> 00:24:22,880
of Theta0, Theta1, and Theta2.

476
00:24:22,880 --> 00:24:26,820
And this X here stands for
the entire sequence of X's

477
00:24:26,820 --> 00:24:28,410
that we have observed.

478
00:24:28,410 --> 00:24:31,060
So in reality, the object that
you're going to get at the

479
00:24:31,060 --> 00:24:35,900
end after inference is done is
a function that you plug in

480
00:24:35,900 --> 00:24:39,430
the values of the data and you
get a function of the

481
00:24:39,430 --> 00:24:43,240
Theta's that tells you the
relative likelihoods of

482
00:24:43,240 --> 00:24:46,780
different Theta triplets.

483
00:24:46,780 --> 00:24:49,760
So what I'm saying is that this
is no harder than the

484
00:24:49,760 --> 00:24:53,720
problems that you have dealt
with so far, except perhaps

485
00:24:53,720 --> 00:24:56,020
for the complication that's
usual in interesting

486
00:24:56,020 --> 00:24:57,490
inference problems.

487
00:24:57,490 --> 00:25:01,940
Your Theta's and X's are often
vectors of random

488
00:25:01,940 --> 00:25:05,490
variables instead of individual
random variables.

489
00:25:05,490 --> 00:25:09,630
Now if you are to do estimation
in a case where you

490
00:25:09,630 --> 00:25:13,520
have discrete data, again the
situation is no different.

491
00:25:13,520 --> 00:25:17,020
We still have a Bayes rule of
the same kind, except that

492
00:25:17,020 --> 00:25:19,540
densities get replaced
by PMFs.

493
00:25:19,540 --> 00:25:23,680
If X is discrete, you put a P
here instead of putting an f.

494
00:25:23,680 --> 00:25:27,990
So an example of an estimation
problem with discrete data is

495
00:25:27,990 --> 00:25:29,740
similar to the polling
problem.

496
00:25:29,740 --> 00:25:31,600
You have a coin.

497
00:25:31,600 --> 00:25:33,500
It has an unknown
parameter Theta.

498
00:25:33,500 --> 00:25:35,230
This is the probability
of obtaining heads.

499
00:25:35,230 --> 00:25:37,410
You flip the coin many times.

500
00:25:37,410 --> 00:25:41,560
What can you tell me about
the true value of Theta?

501
00:25:41,560 --> 00:25:46,200
A classical statistician, at
this point, would say, OK, I'm

502
00:25:46,200 --> 00:25:48,900
going to use an estimator,
the most reasonable

503
00:25:48,900 --> 00:25:50,950
one, which is this.

504
00:25:50,950 --> 00:25:54,200
How many heads did you
obtain in n trials?

505
00:25:54,200 --> 00:25:56,440
Divide by the total
number of trials.

506
00:25:56,440 --> 00:26:00,700
This is my estimate of
the bias of my coin.

507
00:26:00,700 --> 00:26:02,860
And then the classical
statistician would continue

508
00:26:02,860 --> 00:26:07,610
from here and try to prove some
properties and argue that

509
00:26:07,610 --> 00:26:10,030
this estimate is a good one.

510
00:26:10,030 --> 00:26:12,850
For example, we have the weak
law of large numbers that

511
00:26:12,850 --> 00:26:15,630
tells us that this particular
estimate converges in

512
00:26:15,630 --> 00:26:17,990
probability to the
true parameter.

513
00:26:17,990 --> 00:26:21,000
This is a kind of guarantee
that's useful to have.

514
00:26:21,000 --> 00:26:23,410
And the classical statistician
would pretty much close the

515
00:26:23,410 --> 00:26:24,660
subject in this way.

516
00:26:24,660 --> 00:26:27,340

517
00:26:27,340 --> 00:26:30,160
What would the Bayesian
person do differently?

518
00:26:30,160 --> 00:26:35,040
The Bayesian person would start
by assuming a prior

519
00:26:35,040 --> 00:26:37,100
distribution of Theta.

520
00:26:37,100 --> 00:26:39,820
Instead of treating Theta as
an unknown constant, they

521
00:26:39,820 --> 00:26:44,340
would say that Theta is picked
randomly, or pretend that

522
00:26:44,340 --> 00:26:47,360
it is picked randomly
and assume a

523
00:26:47,360 --> 00:26:49,300
distribution on Theta.

524
00:26:49,300 --> 00:26:54,290
So for example, if you don't
know anything more,

525
00:26:54,290 --> 00:26:57,510
you might assume that any value
for the bias of the coin

526
00:26:57,510 --> 00:27:01,460
is as likely as any other value
of the bias of the coin.

527
00:27:01,460 --> 00:27:04,150
So in this way, you'd pick a probability
distribution

528
00:27:04,150 --> 00:27:05,720
that's uniform.

529
00:27:05,720 --> 00:27:09,840
Or if you have a little more
faith in the manufacturing

530
00:27:09,840 --> 00:27:13,270
process that created that
coin, you might choose your

531
00:27:13,270 --> 00:27:17,660
prior to be a distribution
that's centered around 1/2 and

532
00:27:17,660 --> 00:27:21,860
is fairly narrowly concentrated
around 1/2.

533
00:27:21,860 --> 00:27:24,500
That would be a prior
distribution in which you say,

534
00:27:24,500 --> 00:27:27,920
well, I believe that the
manufacturer tried to make my

535
00:27:27,920 --> 00:27:29,410
coin to be fair.

536
00:27:29,410 --> 00:27:33,070
But they often make some
mistakes, so I believe

537
00:27:33,070 --> 00:27:36,600
it's approximately
1/2 but not quite.

538
00:27:36,600 --> 00:27:40,050
So depending on your beliefs,
you would choose an

539
00:27:40,050 --> 00:27:43,630
appropriate prior for the
distribution of Theta.

540
00:27:43,630 --> 00:27:48,610
And then you would use the
Bayes rule to find the

541
00:27:48,610 --> 00:27:52,270
probabilities of different
values of Theta, based on the

542
00:27:52,270 --> 00:27:53,520
data that you have observed.
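
A minimal grid-based sketch of the coin example, contrasting the classical estimate k/n with the Bayesian posterior under the two kinds of prior just described. The data (7 heads in 10 flips), the grid size, and the bell-shaped prior concentrated near 1/2 are hypothetical choices. With a uniform prior the posterior mean has the known closed form (k + 1)/(n + 2), which the grid result should match closely.

```python
import math


def coin_posterior_grid(k, n, prior, grid_size=1001):
    """Grid approximation of the posterior of the bias Theta after k heads in n flips.

    prior: function giving the prior density (up to a constant) at theta in [0, 1].
    Returns the grid points and normalized posterior weights on them.
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Prior times binomial likelihood; the binomial coefficient cancels on normalization.
    weights = [prior(th) * th**k * (1 - th) ** (n - k) for th in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def posterior_mean(thetas, weights):
    return sum(th * w for th, w in zip(thetas, weights))


k, n = 7, 10                      # hypothetical data: 7 heads in 10 flips
classical = k / n                 # the classical estimate: heads / trials
uniform = lambda th: 1.0          # "any bias is as likely as any other"
near_half = lambda th: math.exp(-0.5 * ((th - 0.5) / 0.05) ** 2)  # hypothetical prior near 1/2

mean_uniform = posterior_mean(*coin_posterior_grid(k, n, uniform))
mean_near_half = posterior_mean(*coin_posterior_grid(k, n, near_half))
```

The prior concentrated near 1/2 pulls the estimate much closer to 1/2 than the uniform prior does, reflecting the stronger belief in a fair coin.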

543
00:27:53,520 --> 00:27:59,620

544
00:27:59,620 --> 00:28:04,640
So no matter which version of
the Bayes rule that you use,

545
00:28:04,640 --> 00:28:10,540
the end product of the Bayes
rule is going to be either a

546
00:28:10,540 --> 00:28:14,400
plot of this kind or a
plot of that kind.

547
00:28:14,400 --> 00:28:16,740
So what am I plotting here?

548
00:28:16,740 --> 00:28:19,810
This axis is the Theta axis.

549
00:28:19,810 --> 00:28:23,830
These are the possible values
of the unknown quantity that

550
00:28:23,830 --> 00:28:26,670
we're trying to estimate.

551
00:28:26,670 --> 00:28:28,990
In the continuous
case, theta is a

552
00:28:28,990 --> 00:28:30,800
continuous random variable.

553
00:28:30,800 --> 00:28:32,560
I obtain my data.

554
00:28:32,560 --> 00:28:36,430
And I plot the posterior
probability distribution after

555
00:28:36,430 --> 00:28:37,940
observing my data.

556
00:28:37,940 --> 00:28:42,220
And I'm plotting here the
probability density for Theta.

557
00:28:42,220 --> 00:28:45,500
So this is a plot
of that density.

558
00:28:45,500 --> 00:28:49,210
In the discrete case, theta can
take finitely many values

559
00:28:49,210 --> 00:28:51,570
or a discrete set of values.

560
00:28:51,570 --> 00:28:54,470
And for each one of those
values, I'm telling you how

561
00:28:54,470 --> 00:28:58,080
likely that value is to be
the correct one, given the

562
00:28:58,080 --> 00:29:01,040
data that I have observed.

563
00:29:01,040 --> 00:29:04,990
And in general, what you would
go back to your boss and

564
00:29:04,990 --> 00:29:08,520
report after you've done all
your inference work would be

565
00:29:08,520 --> 00:29:10,870
either a plot of this kind
or of that kind.

566
00:29:10,870 --> 00:29:14,180
So you go to your boss
who asks you, what is

567
00:29:14,180 --> 00:29:15,190
the value of Theta?

568
00:29:15,190 --> 00:29:17,490
And you say, well, I only
have limited data.

569
00:29:17,490 --> 00:29:19,420
So I don't know what it is.

570
00:29:19,420 --> 00:29:22,920
It could be this, with
so much probability.

571
00:29:22,920 --> 00:29:24,640
There's probability.

572
00:29:24,640 --> 00:29:27,220
OK, let's throw in some
numbers here.

573
00:29:27,220 --> 00:29:32,250
There's probability 0.3 that
Theta is this value.

574
00:29:32,250 --> 00:29:36,100
There's probability 0.2 that
Theta is this value, 0.1 that

575
00:29:36,100 --> 00:29:39,420
it's this one, 0.1 that it's
this one, 0.2 that it's that

576
00:29:39,420 --> 00:29:40,830
one, and so on.

577
00:29:40,830 --> 00:29:44,890
OK, now bosses often want
simple answers.

578
00:29:44,890 --> 00:29:48,480
They say, OK, you're
talking too much.

579
00:29:48,480 --> 00:29:51,770
What do you think Theta is?

580
00:29:51,770 --> 00:29:55,920
And now you're forced
to make a decision.

581
00:29:55,920 --> 00:30:00,680
If that was the situation and
you have to make a decision,

582
00:30:00,680 --> 00:30:02,370
how would you make it?

583
00:30:02,370 --> 00:30:06,880
Well, I'm going to make a
decision that's most likely to

584
00:30:06,880 --> 00:30:09,120
be correct.

585
00:30:09,120 --> 00:30:13,060
If I make this decision,
what's going to happen?

586
00:30:13,060 --> 00:30:17,670
Theta is this value with
probability 0.2, which means

587
00:30:17,670 --> 00:30:21,150
there's probability 0.8 that
I make an error

588
00:30:21,150 --> 00:30:23,280
if I make that guess.

589
00:30:23,280 --> 00:30:29,370
If I make that decision, this
decision has probability 0.3 of

590
00:30:29,370 --> 00:30:30,750
being the correct one.

591
00:30:30,750 --> 00:30:34,530
So I have probability
of error 0.7.

592
00:30:34,530 --> 00:30:38,460
So if you want to just maximize
the probability of

593
00:30:38,460 --> 00:30:41,730
giving the correct decision, or
if you want to minimize the

594
00:30:41,730 --> 00:30:44,780
probability of making an
incorrect decision, what

595
00:30:44,780 --> 00:30:48,790
you're going to choose to report
is that value of Theta

596
00:30:48,790 --> 00:30:51,450
for which the probability
is highest.

597
00:30:51,450 --> 00:30:54,230
So in this case, I would
choose to report this

598
00:30:54,230 --> 00:30:58,210
particular value, the most
likely value of Theta, given

599
00:30:58,210 --> 00:31:00,120
what I have observed.

600
00:31:00,120 --> 00:31:04,640
And that value is called the
maximum a posteriori

601
00:31:04,640 --> 00:31:07,550
probability estimate.

602
00:31:07,550 --> 00:31:11,550
It's going to be this
one in our case.

603
00:31:11,550 --> 00:31:16,830
So picking the point in the
posterior PMF that has the

604
00:31:16,830 --> 00:31:19,040
highest probability.

605
00:31:19,040 --> 00:31:20,720
That's the reasonable
thing to do.

606
00:31:20,720 --> 00:31:23,850
This is the optimal thing to do
if you want to minimize the

607
00:31:23,850 --> 00:31:27,340
probability of an incorrect
inference.

608
00:31:27,340 --> 00:31:31,400
And that's what people do
usually if they need to report

609
00:31:31,400 --> 00:31:35,280
a single answer, if they need
to report a single decision.
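
A small sketch of the MAP rule for a discrete posterior. The first five probabilities are the ones quoted above; the candidate values 1 through 6 and the sixth probability (added so the PMF sums to 1) are hypothetical. Reporting a value theta is wrong with probability 1 - p(theta), so picking the largest posterior probability minimizes the probability of error.

```python
# Hypothetical posterior PMF over six candidate values of Theta.
posterior = {1: 0.3, 2: 0.2, 3: 0.1, 4: 0.1, 5: 0.2, 6: 0.1}


def map_estimate(pmf):
    """Value with the highest posterior probability (ties broken by dict order)."""
    return max(pmf, key=pmf.get)


# Probability of being wrong if we report each candidate value.
error_prob = {theta: 1.0 - p for theta, p in posterior.items()}

theta_map = map_estimate(posterior)
```

Here the MAP choice is the value with probability 0.3, giving error probability 0.7, while reporting a value with probability 0.2 would give 0.8.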

610
00:31:35,280 --> 00:31:39,530
How about in the estimation
context?

611
00:31:39,530 --> 00:31:43,250
If that's what you know about
Theta, Theta could be around

612
00:31:43,250 --> 00:31:46,670
here, but there's also some
sharp probability that it is

613
00:31:46,670 --> 00:31:48,720
around here.

614
00:31:48,720 --> 00:31:52,380
What's the single answer that
you would give to your boss?

615
00:31:52,380 --> 00:31:56,310
One option is to use the same
philosophy and say, OK, I'm

616
00:31:56,310 --> 00:32:00,135
going to find the Theta at which
this posterior density

617
00:32:00,135 --> 00:32:01,690
is highest.

618
00:32:01,690 --> 00:32:06,010
So I would pick this point
here and report this

619
00:32:06,010 --> 00:32:06,920
particular Theta.

620
00:32:06,920 --> 00:32:11,110
So this would be my Theta,
again, Theta MAP, the Theta

621
00:32:11,110 --> 00:32:15,290
that has the highest a
posteriori probability, just

622
00:32:15,290 --> 00:32:19,100
because it corresponds to
the peak of the density.

623
00:32:19,100 --> 00:32:23,810
But in the discrete case, the
maximum a posteriori

624
00:32:23,810 --> 00:32:27,120
probability Theta was the
one that was most

625
00:32:27,120 --> 00:32:28,600
likely to be true.

626
00:32:28,600 --> 00:32:32,460
In the continuous case, you
cannot really say that this is

627
00:32:32,460 --> 00:32:34,940
the most likely value
of Theta.

628
00:32:34,940 --> 00:32:38,340
In a continuous setting, any
value of Theta has zero

629
00:32:38,340 --> 00:32:41,530
probability, since we're
talking about densities.

630
00:32:41,530 --> 00:32:43,260
So it's not the most likely.

631
00:32:43,260 --> 00:32:48,240
It's the one for which the
density, so the probabilities

632
00:32:48,240 --> 00:32:51,820
of small neighborhoods around it,
are highest.

633
00:32:51,820 --> 00:32:56,390
So the rationale for picking
this particular estimate in

634
00:32:56,390 --> 00:33:00,050
the continuous case is much
less compelling than the

635
00:33:00,050 --> 00:33:02,210
rationale that we had in here.

636
00:33:02,210 --> 00:33:05,590
So in this case, reasonable
people might choose different

637
00:33:05,590 --> 00:33:07,460
quantities to report.

638
00:33:07,460 --> 00:33:11,810
And a very popular one would
be to report instead the

639
00:33:11,810 --> 00:33:13,700
conditional expectation.

640
00:33:13,700 --> 00:33:15,990
So I don't know quite
what Theta is.

641
00:33:15,990 --> 00:33:19,600
Given the data that I have,
Theta has this distribution.

642
00:33:19,600 --> 00:33:23,320
Let me just report the average
over that distribution.

643
00:33:23,320 --> 00:33:27,090
Let me report the center
of gravity of this figure.

644
00:33:27,090 --> 00:33:30,340
And in this figure, the center
of gravity would probably be

645
00:33:30,340 --> 00:33:32,230
somewhere around here.

646
00:33:32,230 --> 00:33:35,690
And that would be a different
estimate that you

647
00:33:35,690 --> 00:33:37,520
might choose to report.

648
00:33:37,520 --> 00:33:40,340
So center of gravity is
something around here.

649
00:33:40,340 --> 00:33:43,580
And this is a conditional
expectation of Theta, given

650
00:33:43,580 --> 00:33:46,010
the data that you have.
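
A sketch of the two point estimates for a continuous posterior with two bumps, like the one in the picture. The posterior is a hypothetical mixture: weight 0.7 on a normal bump at 2 and weight 0.3 on a normal bump at 6, both with standard deviation 0.5. The MAP estimate is the peak of the density; the conditional expectation is its center of gravity, approximated here with a Riemann sum on a grid.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def posterior_density(theta):
    # Hypothetical bimodal posterior: most mass near 2, some near 6.
    return 0.7 * normal_pdf(theta, 2.0, 0.5) + 0.3 * normal_pdf(theta, 6.0, 0.5)


step = 0.001
grid = [-2.0 + i * step for i in range(12001)]  # covers [-2, 10]
dens = [posterior_density(th) for th in grid]

# MAP: the grid point where the density is highest.
theta_map = grid[max(range(len(grid)), key=dens.__getitem__)]
# Conditional expectation: the center of gravity of the density.
mass = sum(dens) * step
theta_mean = sum(th * d for th, d in zip(grid, dens)) * step / mass
```

The peak sits near 2, while the center of gravity is pulled toward the second bump, to about 0.7 * 2 + 0.3 * 6 = 3.2.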

651
00:33:46,010 --> 00:33:51,190
So these are two, in some sense,
fairly reasonable ways

652
00:33:51,190 --> 00:33:53,850
of choosing what to report
to your boss.

653
00:33:53,850 --> 00:33:55,690
Some people might choose
to report this.

654
00:33:55,690 --> 00:33:58,630
Some people might choose
to report that.

655
00:33:58,630 --> 00:34:03,230
And a priori, there's no
compelling reason why one

656
00:34:03,230 --> 00:34:08,639
would be preferable to the other
one, unless you set some rules

657
00:34:08,639 --> 00:34:12,350
for the game and you describe
a little more precisely what

658
00:34:12,350 --> 00:34:14,090
your objectives are.

659
00:34:14,090 --> 00:34:19,070
But no matter which one you
report, a single answer, a

660
00:34:19,070 --> 00:34:24,350
point estimate, doesn't really
tell you the whole story.

661
00:34:24,350 --> 00:34:28,159
There's a lot more information
conveyed by this posterior

662
00:34:28,159 --> 00:34:31,060
distribution plot than
any single number

663
00:34:31,060 --> 00:34:32,159
that you might report.

664
00:34:32,159 --> 00:34:36,510
So in general, you may wish to
convince your boss that it's

665
00:34:36,510 --> 00:34:40,310
worth their time to look at the
entire plot, because that

666
00:34:40,310 --> 00:34:43,100
plot sort of covers all
the possibilities.

667
00:34:43,100 --> 00:34:47,060
It tells your boss most likely
we're in that range, but

668
00:34:47,060 --> 00:34:51,620
there's also a distinct chance
that our Theta happens to lie

669
00:34:51,620 --> 00:34:54,080
in that range.

670
00:34:54,080 --> 00:34:58,400
All right, now let us try to
perhaps differentiate between

671
00:34:58,400 --> 00:35:02,570
these two and see under what
circumstances this one might

672
00:35:02,570 --> 00:35:05,530
be the better estimate
to report.

673
00:35:05,530 --> 00:35:07,320
Better with respect to what?

674
00:35:07,320 --> 00:35:08,830
We need some rules.

675
00:35:08,830 --> 00:35:10,730
So we're going to throw
in some rules.

676
00:35:10,730 --> 00:35:14,320

677
00:35:14,320 --> 00:35:17,450
As a warm up, we're going to
deal with the problem of

678
00:35:17,450 --> 00:35:22,000
making an estimate if you
had no information at all,

679
00:35:22,000 --> 00:35:24,670
except for a prior
distribution.

680
00:35:24,670 --> 00:35:27,650
So this is a warm up for what's
coming next, which

681
00:35:27,650 --> 00:35:32,970
would be estimation that takes
into account some information.

682
00:35:32,970 --> 00:35:34,860
So we have a Theta.

683
00:35:34,860 --> 00:35:38,500
And because of your subjective
beliefs or models by others,

684
00:35:38,500 --> 00:35:41,780
you believe that Theta is
uniformly distributed between,

685
00:35:41,780 --> 00:35:46,250
let's say, 4 and 10.

686
00:35:46,250 --> 00:35:48,120
You want to come up with
a point estimate.

687
00:35:48,120 --> 00:35:51,770

688
00:35:51,770 --> 00:35:54,900
Let's try to look
for an estimate.

689
00:35:54,900 --> 00:35:57,580
Call it c, in this case.

690
00:35:57,580 --> 00:36:00,090
I want to pick a number
with which to estimate

691
00:36:00,090 --> 00:36:01,340
the value of Theta.

692
00:36:01,340 --> 00:36:04,030

693
00:36:04,030 --> 00:36:08,260
I will be interested in the size
of the error that I make.

694
00:36:08,260 --> 00:36:12,310
And I really dislike large
errors, so I'm going to focus

695
00:36:12,310 --> 00:36:15,500
on the square of the error
that I make.

696
00:36:15,500 --> 00:36:19,140
So I pick c.

697
00:36:19,140 --> 00:36:21,340
Theta has a random value
that I don't know.

698
00:36:21,340 --> 00:36:25,900
But whatever it is, once it
becomes known, it results in

699
00:36:25,900 --> 00:36:28,640
a squared error between
what it is and what I

700
00:36:28,640 --> 00:36:30,660
guessed that it was.

701
00:36:30,660 --> 00:36:35,770
And I'm interested in making
a small error on average,

702
00:36:35,770 --> 00:36:38,170
where the average is taken
with respect to all the

703
00:36:38,170 --> 00:36:42,350
possible and unknown
values of Theta.

704
00:36:42,350 --> 00:36:47,220
So this is a least
squares formulation of

705
00:36:47,220 --> 00:36:49,240
the problem, where we
try to minimize the

706
00:36:49,240 --> 00:36:51,150
mean squared error.

707
00:36:51,150 --> 00:36:53,900
How do you find the optimal c?

708
00:36:53,900 --> 00:36:57,200
Well, we take that expression
and expand it.

709
00:36:57,200 --> 00:37:00,930

710
00:37:00,930 --> 00:37:05,650
And it is, using linearity
of expectations, expected value of Theta

711
00:37:05,650 --> 00:37:11,460
squared minus 2c expected
Theta plus c squared--

712
00:37:11,460 --> 00:37:13,620
that's the quantity that
we want to minimize,

713
00:37:13,620 --> 00:37:16,670
with respect to c.

714
00:37:16,670 --> 00:37:19,670
To do the minimization, take the
derivative with respect to

715
00:37:19,670 --> 00:37:21,950
c and set it to 0.

716
00:37:21,950 --> 00:37:27,320
So that differentiation gives us
from here minus 2 expected

717
00:37:27,320 --> 00:37:32,420
value of Theta plus
2c is equal to 0.

718
00:37:32,420 --> 00:37:36,550
And the answer that you get by
solving this equation is that

719
00:37:36,550 --> 00:37:39,350
c is the expected
value of Theta.
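
The minimization just described, written out as a worked derivation. The second derivative with respect to c is 2 > 0, so the stationary point is a minimum; the last line plugs in the uniform prior on [4, 10] from the example.

```latex
\begin{aligned}
\mathbf{E}\big[(\Theta - c)^2\big] &= \mathbf{E}[\Theta^2] - 2c\,\mathbf{E}[\Theta] + c^2, \\
\frac{d}{dc}\,\mathbf{E}\big[(\Theta - c)^2\big] &= -2\,\mathbf{E}[\Theta] + 2c = 0
\quad\Longrightarrow\quad c^{*} = \mathbf{E}[\Theta], \\
\Theta \sim \mathrm{Uniform}[4, 10]:\quad c^{*} &= \frac{4 + 10}{2} = 7,
\qquad \mathbf{E}\big[(\Theta - c^{*})^2\big] = \mathrm{var}(\Theta) = \frac{(10 - 4)^2}{12} = 3.
\end{aligned}
```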

720
00:37:39,350 --> 00:37:42,860
So when you do this
optimization, you find that

721
00:37:42,860 --> 00:37:45,170
the optimal estimate, the
thing you should be

722
00:37:45,170 --> 00:37:47,970
reporting, is the expected
value of Theta.

723
00:37:47,970 --> 00:37:51,630
So in this particular example,
you would choose your estimate

724
00:37:51,630 --> 00:37:55,500
c to be just the middle
of these values,

725
00:37:55,500 --> 00:37:57,980
which would be 7.

726
00:37:57,980 --> 00:38:02,642

727
00:38:02,642 --> 00:38:06,640
OK, and in case your
boss asks you, how

728
00:38:06,640 --> 00:38:08,610
good is your estimate?

729
00:38:08,610 --> 00:38:11,390
How big is your error
going to be?

730
00:38:11,390 --> 00:38:14,910

731
00:38:14,910 --> 00:38:19,870
What you could report is the
average size of the estimation

732
00:38:19,870 --> 00:38:22,570
error that you are making.

733
00:38:22,570 --> 00:38:26,760
We picked our estimate to be
the expected value of Theta.

734
00:38:26,760 --> 00:38:29,450
So for this particular way that
I'm choosing to do my

735
00:38:29,450 --> 00:38:33,610
estimation, this is the mean
squared error that I get.

736
00:38:33,610 --> 00:38:35,330
And this is a familiar
quantity.

737
00:38:35,330 --> 00:38:38,370
It's just the variance
of the distribution.

738
00:38:38,370 --> 00:38:41,890
So the expectation is the
best way to estimate a

739
00:38:41,890 --> 00:38:45,550
quantity, if you're interested
in the mean squared error.

740
00:38:45,550 --> 00:38:50,430
And the resulting mean squared
error is the variance itself.
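
A brute-force numerical check of the uniform example: approximate E[(Theta - c)^2] by averaging over equally spaced points in [4, 10], then search over candidate values of c. The grid sizes are arbitrary choices. The minimizer should land at about 7, and the minimal mean squared error should be about the variance, (10 - 4)^2 / 12 = 3 (slightly above 3 because the grid is discrete).

```python
# Equally spaced points standing in for Theta ~ Uniform[4, 10] (hypothetical grid size).
N = 2001
thetas = [4.0 + 6.0 * i / (N - 1) for i in range(N)]


def mse(c):
    """Approximate E[(Theta - c)^2] by averaging over the grid points."""
    return sum((th - c) ** 2 for th in thetas) / N


candidates = [4.0 + 0.01 * j for j in range(601)]  # candidate estimates c in [4, 10]
best_c = min(candidates, key=mse)                   # least squares estimate
mean_theta = sum(thetas) / N
variance = mse(mean_theta)                          # MSE of reporting the mean
```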

741
00:38:50,430 --> 00:38:56,380
How will this story change if
we now have data as well?

742
00:38:56,380 --> 00:39:01,290
Now having data means that
we can compute posterior

743
00:39:01,290 --> 00:39:05,150
distributions or conditional
distributions.

744
00:39:05,150 --> 00:39:10,400
So we get transported into a new
universe where instead of

745
00:39:10,400 --> 00:39:14,740
working with the original
distribution of Theta, the

746
00:39:14,740 --> 00:39:18,860
prior distribution, now we work
with the conditional

747
00:39:18,860 --> 00:39:22,280
distribution of Theta,
given the data

748
00:39:22,280 --> 00:39:24,860
that we have observed.

749
00:39:24,860 --> 00:39:30,430
Now remember our old slogan that
conditional models and

750
00:39:30,430 --> 00:39:33,570
conditional probabilities are
no different than ordinary

751
00:39:33,570 --> 00:39:38,880
probabilities, except that we
live now in a new universe

752
00:39:38,880 --> 00:39:42,690
where the new information has
been taken into account.

753
00:39:42,690 --> 00:39:47,860
So if you use that philosophy
and you're asked to minimize

754
00:39:47,860 --> 00:39:53,310
the squared error but now that
you live in a new universe

755
00:39:53,310 --> 00:39:56,910
where X has been fixed to
something, what would the

756
00:39:56,910 --> 00:39:59,210
optimal solution be?

757
00:39:59,210 --> 00:40:03,540
It would again be the
expectation of theta, but

758
00:40:03,540 --> 00:40:04,730
which expectation?

759
00:40:04,730 --> 00:40:08,910
It's the expectation which
applies in the new conditional

760
00:40:08,910 --> 00:40:12,350
universe in which we
live right now.

761
00:40:12,350 --> 00:40:16,330
So because of what we did
before, by the same

762
00:40:16,330 --> 00:40:20,330
calculation, we would find that
the optimal estimate is

763
00:40:20,330 --> 00:40:24,970
the expected value of
Theta given X, the optimal

764
00:40:24,970 --> 00:40:26,730
estimate that takes
into account the

765
00:40:26,730 --> 00:40:29,170
information that we have.

766
00:40:29,170 --> 00:40:33,600
So the conclusion, once you get
your data, if you want to

767
00:40:33,600 --> 00:40:40,480
minimize the mean squared error,
you should just report

768
00:40:40,480 --> 00:40:43,870
the conditional expectation of
this unknown quantity based on

769
00:40:43,870 --> 00:40:46,640
the data that you have.

770
00:40:46,640 --> 00:40:53,050
So the picture here is that
Theta is unknown.

771
00:40:53,050 --> 00:41:00,710
You have your apparatus that
creates measurements.

772
00:41:00,710 --> 00:41:07,880
So this creates an X. You take
an X, and here you have a box

773
00:41:07,880 --> 00:41:10,203
that does calculations.

774
00:41:10,203 --> 00:41:13,490

775
00:41:13,490 --> 00:41:18,180
It does calculations and it
spits out the conditional

776
00:41:18,180 --> 00:41:22,230
expectation of Theta, given the
particular data that you

777
00:41:22,230 --> 00:41:24,750
have observed.

778
00:41:24,750 --> 00:41:28,680
And what we have done in this
class so far is, to some

779
00:41:28,680 --> 00:41:33,450
extent, developing the
computational tools and skills

780
00:41:33,450 --> 00:41:36,020
to do this particular
calculation--

781
00:41:36,020 --> 00:41:39,780
how to calculate the posterior
density for Theta and how to

782
00:41:39,780 --> 00:41:42,750
calculate expectations,
conditional expectations.

783
00:41:42,750 --> 00:41:45,330
So in principle, we know
how to do this.

784
00:41:45,330 --> 00:41:50,040
In principle, we can program a
computer to take the data and

785
00:41:50,040 --> 00:41:51,670
to spit out conditional
expectations.

786
00:41:51,670 --> 00:41:56,140

787
00:41:56,140 --> 00:42:04,390
Somebody who doesn't think like
us might instead design a

788
00:42:04,390 --> 00:42:09,940
calculating machine that does
something differently and

789
00:42:09,940 --> 00:42:16,490
produces some other estimate.

790
00:42:16,490 --> 00:42:20,000
So we went through this argument
and we decided to

791
00:42:20,000 --> 00:42:23,110
program our computer to
calculate conditional

792
00:42:23,110 --> 00:42:24,490
expectations.

793
00:42:24,490 --> 00:42:28,460
Somebody else came up with some
other crazy idea for how

794
00:42:28,460 --> 00:42:30,590
to estimate the random
variable.

795
00:42:30,590 --> 00:42:34,460
They came up with some function
g and they programmed

796
00:42:34,460 --> 00:42:38,700
it, and they designed a machine
that estimates Theta's

797
00:42:38,700 --> 00:42:43,000
by outputting a certain
g of X.

798
00:42:43,000 --> 00:42:47,690
That could be an alternative
estimator.

799
00:42:47,690 --> 00:42:50,280
Which one is better?

800
00:42:50,280 --> 00:42:56,350
Well, we convinced ourselves
that this is the optimal one

801
00:42:56,350 --> 00:42:59,780
in a universe where we have
fixed the particular

802
00:42:59,780 --> 00:43:01,420
value of the data.

803
00:43:01,420 --> 00:43:06,030
So what we have proved so far
is a relation of this kind.

804
00:43:06,030 --> 00:43:09,670
In this conditional universe,
the mean squared

805
00:43:09,670 --> 00:43:11,920
error that I get--

806
00:43:11,920 --> 00:43:15,170
I'm the one who's using
this estimator--

807
00:43:15,170 --> 00:43:18,850
is less than or equal to the
mean squared error that this

808
00:43:18,850 --> 00:43:23,960
person will get, the person
who uses that estimator.

809
00:43:23,960 --> 00:43:28,040
For any particular value of
the data, I'm going to do

810
00:43:28,040 --> 00:43:30,190
better than the other person.

811
00:43:30,190 --> 00:43:32,760
Now the data themselves
are random.

812
00:43:32,760 --> 00:43:38,050
If I average over all possible
values of the data, I should

813
00:43:38,050 --> 00:43:40,240
still be better off.

814
00:43:40,240 --> 00:43:45,120
If I'm better off for any
possible value X, then I

815
00:43:45,120 --> 00:43:49,140
should be better off on the
average over all possible

816
00:43:49,140 --> 00:43:50,640
values of X.

817
00:43:50,640 --> 00:43:55,670
So let us average both sides of
this inequality with respect

818
00:43:55,670 --> 00:43:58,990
to the probability distribution
of X. If you want

819
00:43:58,990 --> 00:44:03,350
to do it formally, you can write
this inequality between

820
00:44:03,350 --> 00:44:06,520
numbers as an inequality between
random variables.

821
00:44:06,520 --> 00:44:10,240
And it tells us that no matter
what that random variable

822
00:44:10,240 --> 00:44:14,010
turns out to be, this quantity
is better than that quantity.

823
00:44:14,010 --> 00:44:17,270
Take expectations of both
sides, and you get this

824
00:44:17,270 --> 00:44:21,360
inequality between expectations
overall.

825
00:44:21,360 --> 00:44:29,130
And this last inequality tells
me that the person who's using

826
00:44:29,130 --> 00:44:34,430
this estimator who produces
estimates according to this

827
00:44:34,430 --> 00:44:45,090
machine will have a mean squared
estimation error

828
00:44:45,090 --> 00:44:48,580
that's less than or equal to
the estimation error that's

829
00:44:48,580 --> 00:44:51,290
produced by the other person.

830
00:44:51,290 --> 00:44:54,710
In a few words, the conditional
expectation

831
00:44:54,710 --> 00:44:58,500
estimator is the optimal
estimator.

832
00:44:58,500 --> 00:45:01,765
It's the ultimate estimating
machine.
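
The optimality argument can be checked by simulation. The sketch swaps in a specific model that is not from the lecture: Theta ~ N(0, 1) and X = Theta + W, with noise W ~ N(0, 1) independent of Theta. For this jointly Gaussian pair, E[Theta | X] = X/2 (a standard result), with mean squared error 1/2. The alternative estimators g(X) = X and g(X) = 0.8X are arbitrary stand-ins for "somebody else's machine"; their theoretical mean squared errors are 1 and 0.68.

```python
import random

# Assumed model: Theta ~ N(0, 1), X = Theta + W, W ~ N(0, 1) independent of Theta.
random.seed(0)
n = 100_000
samples = []
for _ in range(n):
    theta = random.gauss(0.0, 1.0)
    x = theta + random.gauss(0.0, 1.0)
    samples.append((theta, x))


def mean_squared_error(estimator):
    """Monte Carlo estimate of E[(Theta - g(X))^2] for the estimator g."""
    return sum((theta - estimator(x)) ** 2 for theta, x in samples) / n


mse_cond = mean_squared_error(lambda x: x / 2)      # conditional expectation E[Theta | X]
mse_scaled = mean_squared_error(lambda x: 0.8 * x)  # an alternative g(X)
mse_raw = mean_squared_error(lambda x: x)           # another alternative: report X itself
```

The simulated errors should be close to the theoretical values 0.5, 0.68, and 1, with the conditional expectation estimator doing best.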

833
00:45:01,765 --> 00:45:04,430

834
00:45:04,430 --> 00:45:08,720
That's how you should solve
estimation problems and report

835
00:45:08,720 --> 00:45:10,240
a single value,

836
00:45:10,240 --> 00:45:14,510
if you're forced to report a
single value and if you're

837
00:45:14,510 --> 00:45:18,060
interested in mean squared
estimation errors.

838
00:45:18,060 --> 00:45:24,620
OK, while we could have told you
that story, of course, a

839
00:45:24,620 --> 00:45:29,500
month or two ago, this is really
about interpretation --

840
00:45:29,500 --> 00:45:32,550
about realizing that conditional
expectations have

841
00:45:32,550 --> 00:45:35,160
a very nice property.

842
00:45:35,160 --> 00:45:38,180
But other than that, any
probabilistic skills that come

843
00:45:38,180 --> 00:45:41,180
into this business are just the
probabilistic skills of

844
00:45:41,180 --> 00:45:44,330
being able to calculate
conditional expectations,

845
00:45:44,330 --> 00:45:46,750
which you already
know how to do.

846
00:45:46,750 --> 00:45:51,380
So, in conclusion, all of optimal
Bayesian estimation just means

847
00:45:51,380 --> 00:45:54,655
calculating and reporting
conditional expectations.

848
00:45:54,655 --> 00:45:58,380
Well, if the world were that
simple, then statisticians

849
00:45:58,380 --> 00:46:02,670
wouldn't be able to find jobs.

850
00:46:02,670 --> 00:46:05,690
So real life is not
that simple.

851
00:46:05,690 --> 00:46:07,540
There are complications.

852
00:46:07,540 --> 00:46:10,050
And that perhaps makes their
life a little more

853
00:46:10,050 --> 00:46:11,300
interesting.

854
00:46:11,300 --> 00:46:22,010

855
00:46:22,010 --> 00:46:25,500
OK, one complication is that we
may have to deal with vectors

856
00:46:25,500 --> 00:46:28,580
instead of just single
random variables.

857
00:46:28,580 --> 00:46:31,830
I used the notation here
as if X were a

858
00:46:31,830 --> 00:46:33,500
single random variable.

859
00:46:33,500 --> 00:46:37,710
In real life, you get
several data points.

860
00:46:37,710 --> 00:46:39,520
Does our story change?

861
00:46:39,520 --> 00:46:41,950
Not really, same argument--

862
00:46:41,950 --> 00:46:44,410
given all the data that you
have observed, you should

863
00:46:44,410 --> 00:46:47,660
still report the conditional
expectation of Theta.
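As a hedged illustration of the vector case, assume a normal prior and independent normal noise (a model chosen for this sketch, not one specified in the lecture). Under that assumption the conditional expectation given all n observations has a closed form: a precision-weighted average of the prior mean and the data. The function name posterior_mean_normal is hypothetical.

```python
def posterior_mean_normal(xs, mu0, var0, noise_var):
    """E[Theta | X_1 = x_1, ..., X_n = x_n] under an assumed model:
    Theta ~ N(mu0, var0) and X_i = Theta + W_i with W_i i.i.d. N(0, noise_var),
    independent of Theta. The result weights the prior mean and the
    observations by their precisions (inverse variances)."""
    precision = 1.0 / var0 + len(xs) / noise_var
    return (mu0 / var0 + sum(xs) / noise_var) / precision

# With no data, the estimate falls back to the prior mean.
print(posterior_mean_normal([], mu0=0.0, var0=1.0, noise_var=1.0))               # 0.0
# Three observations pull the estimate toward their sample mean 2.0.
print(posterior_mean_normal([1.0, 2.0, 3.0], mu0=0.0, var0=1.0, noise_var=1.0))  # 1.5
```

The more observations there are, the more weight the data get relative to the prior.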

864
00:46:47,660 --> 00:46:51,260
But what kind of work does it
take in order to report this

865
00:46:51,260 --> 00:46:53,080
conditional expectation?

866
00:46:53,080 --> 00:46:57,030
One issue is that you need to
cook up a plausible prior

867
00:46:57,030 --> 00:46:58,810
distribution for Theta.

868
00:46:58,810 --> 00:46:59,960
How do you do that?

869
00:46:59,960 --> 00:47:03,570
In a given application, this
is a bit of a judgment call,

870
00:47:03,570 --> 00:47:05,970
what prior you would
be working with.

871
00:47:05,970 --> 00:47:08,840
And there's a certain
skill there of not

872
00:47:08,840 --> 00:47:12,100
making silly choices.

873
00:47:12,100 --> 00:47:16,690
A more pragmatic, practical
issue is that this is a

874
00:47:16,690 --> 00:47:21,180
formula that's extremely nice
and compact and simple that

875
00:47:21,180 --> 00:47:24,560
you can write with
minimal ink.

876
00:47:24,560 --> 00:47:29,180
But behind it there could
be hidden a huge amount of

877
00:47:29,180 --> 00:47:31,520
calculation.

878
00:47:31,520 --> 00:47:34,820
So doing any sort of
calculations that involve

879
00:47:34,820 --> 00:47:39,640
multiple random variables really
involves calculating

880
00:47:39,640 --> 00:47:42,240
multi-dimensional integrals.

881
00:47:42,240 --> 00:47:46,230
And multi-dimensional
integrals are hard to compute.

882
00:47:46,230 --> 00:47:50,830
So actually implementing this
calculating machine here may

883
00:47:50,830 --> 00:47:54,340
not be easy; it might be
computationally complicated.

884
00:47:54,340 --> 00:47:58,250
It's also complicated in terms
of not being able to derive

885
00:47:58,250 --> 00:47:59,890
intuition about it.
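To show the computation hiding behind the compact formula, here is a minimal sketch in a one-dimensional case where the answer is known. The model (uniform prior on [0, 1], independent Bernoulli trials) and the helper name posterior_mean_grid are assumptions for illustration. The conditional expectation is a ratio of two integrals, approximated here with the midpoint rule. The last lines show why the same brute-force approach becomes impractical as the dimension grows.

```python
def posterior_mean_grid(k, n, m=10_000):
    """Approximate E[Theta | k successes in n trials] by midpoint-rule integration.
    Assumed model: Theta uniform on [0, 1], trials i.i.d. Bernoulli(Theta)."""
    h = 1.0 / m
    num = den = 0.0
    for i in range(m):
        theta = (i + 0.5) * h                       # midpoint of the i-th cell
        lik = theta**k * (1.0 - theta) ** (n - k)  # prior density is 1 on [0, 1]
        num += theta * lik * h                      # integral of theta * prior * likelihood
        den += lik * h                              # normalizing integral
    return num / den

approx = posterior_mean_grid(k=7, n=10)
exact = (7 + 1) / (10 + 2)  # closed form (k + 1) / (n + 2) for this particular model
print(approx, exact)

# The catch: with a d-dimensional parameter, a grid with 100 points per axis
# needs 100**d evaluations of the integrand.
for d in (1, 2, 5, 10):
    print(d, 100**d)
```

In one dimension the grid is cheap and accurate; at ten dimensions the same grid would need 10**20 evaluations, which is the practical difficulty described above.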

886
00:47:59,890 --> 00:48:03,680
So perhaps you might want to
have a simpler version, a

887
00:48:03,680 --> 00:48:07,940
simpler alternative to this
formula that's easier to work

888
00:48:07,940 --> 00:48:10,950
with and easier to calculate.

889
00:48:10,950 --> 00:48:13,440
We will be talking about
one such simpler

890
00:48:13,440 --> 00:48:15,540
alternative next time.

891
00:48:15,540 --> 00:48:18,570
So again, to conclude, at
the high level, Bayesian

892
00:48:18,570 --> 00:48:22,330
estimation is very, very simple,
given that you have

893
00:48:22,330 --> 00:48:24,180
mastered everything that
has happened in

894
00:48:24,180 --> 00:48:26,370
this course so far.

895
00:48:26,370 --> 00:48:29,860
There are certain practical
issues, and it's also good to

896
00:48:29,860 --> 00:48:33,590
be familiar with the concepts
and with the point that, in

897
00:48:33,590 --> 00:48:36,620
general, you would prefer to
report the complete posterior

898
00:48:36,620 --> 00:48:37,360
distribution.

899
00:48:37,360 --> 00:48:40,890
But if you're forced to report a
point estimate, then there's

900
00:48:40,890 --> 00:48:43,130
a number of reasonable
ways to do it.

901
00:48:43,130 --> 00:48:45,690
And perhaps the most reasonable
one is just to

902
00:48:45,690 --> 00:48:48,220
report the conditional
expectation itself.
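As a hedged illustration of reporting a single number instead of the whole posterior, the sketch below takes a made-up posterior PMF for one observed value x and computes two common point estimates: the most probable value (MAP) and the conditional expectation. The numbers are hypothetical. Among all constants, the conditional expectation gives the smallest conditional mean squared error, which the last line compares against the MAP value.

```python
from fractions import Fraction as F

# Hypothetical posterior PMF p(theta | x) for one observed value x.
posterior = {0: F(1, 2), 1: F(1, 10), 2: F(1, 5), 3: F(1, 5)}
assert sum(posterior.values()) == 1

def cond_mse(c):
    """Conditional mean squared error E[(Theta - c)^2 | X = x] of reporting c."""
    return sum(p * (t - c) ** 2 for t, p in posterior.items())

map_estimate = max(posterior, key=posterior.get)                # most probable value
cond_mean_estimate = sum(t * p for t, p in posterior.items())  # E[Theta | X = x]

print(map_estimate, cond_mean_estimate)                      # 0 11/10
print(cond_mse(map_estimate), cond_mse(cond_mean_estimate))  # 27/10 149/100
```

The two estimates can differ substantially when the posterior is skewed, which is one reason to report the full posterior when that is possible.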

903
00:48:48,220 --> 00:48:49,470