The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, I guess we might as well start a minute early, since those of you who are here are here. We're coming to the end of the course. We're deep in chapter 7 now, talking about random walks and detection theory, and we'll get into martingales sometime next week. There are four more lectures after this one. The schedule was passed out at the beginning of the term; I don't know how I did it, but I somehow left off the last Wednesday of class. The final is going to be on Wednesday morning at the ice rink. I don't know what the ice rink is like. It doesn't sound like an ideal place to take a final, but I assume they must have desks there and all that. We will send out a notice about it. This is the last homework set that you will have to turn in. We will probably have another set of practice problems, but not things you should turn in. We will try to get solutions out on them fairly quickly also, so you can do them but then look at the answers right after you do them.

OK, so let's get back to random walks, and remember what we were doing last time. A random walk, by definition: you have a sequence of IID random variables, and you have partial sums of those random variables. S sub n is the sum of the first n of those IID random variables, and the sequence of partial sums S1, S2, S3, and so forth is called a random walk. If you graph the random walk, it's something which usually wanders up and down. If the mean of X is positive, it wanders off to infinity. If the mean of X is negative, it wanders off to minus infinity. If the mean of X is 0, it simply diffuses as time goes on.
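To make that concrete, here is a minimal simulation sketch in Python. The Gaussian step distribution and all parameter values are illustrative assumptions, not anything fixed by the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_walk(n, mean=0.0, std=1.0):
    """Partial sums S_1, ..., S_n of n IID Gaussian steps (assumed distribution)."""
    x = rng.normal(mean, std, size=n)  # IID increments X_1, ..., X_n
    return np.cumsum(x)                # S_n = X_1 + ... + X_n

# Positive mean drifts up, negative mean drifts down, zero mean diffuses.
for m in (+0.2, -0.2, 0.0):
    print(m, random_walk(1000, mean=m)[-1])
```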
And what we're trying to find is exactly how these things work. So our focus here is going to be on threshold-crossing problems. Namely, what's the probability that this random walk is going to cross some threshold by, or at, some particular value of n? If you have two thresholds, one above and one below, what's the probability it's going to cross the one above? What's the probability it's going to cross the one below? If it crosses one of these, when does it cross it? And if it crosses, how much of an overshoot is there? All of those problems come up naturally when you look at a sum of IID random variables. But here we're going to try to study them in some consistent manner, looking particularly at the thresholds.

We've talked a little bit about two particularly important applications. One is G/G/1 queues. And even far more important than that is this question of detection, or making decisions, or hypothesis testing, all of which are the same thing. You remember we did show that there was at least one threshold-crossing problem that was very, very easy. It's the threshold problem where the underlying random variable is binary: you either go up by 1 or you go down by 1 on each step. And the question is, what's the probability that you will ever cross some threshold at some k greater than 0? It turns out that since you can only go up 1 each time, the probability of getting up to some point k is the probability you ever got up to 1; given that you got up to 1, the probability that you ever got up to 2; given that you got up to 2, the probability you ever got up to 3. That doesn't mean that you go directly from 2 to 3. After you get to 2, you wander all around, and eventually you make it up to 3. If you do, then the question is whether you ever get from 3 to 4, and so forth. And we found that the solution to that problem was p over 1 minus p, to the k-th power, when p is less than or equal to 1/2.
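Here is a quick Monte Carlo check of that formula, written as a sketch: the trial count and the finite horizon (which can only approximate "ever crosses") are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(1)

def prob_ever_reach(k, p, trials=10000, horizon=2000):
    """Estimate P(walk with steps +1 w.p. p, -1 w.p. 1-p ever reaches k)."""
    hits = 0
    for _ in range(trials):
        steps = rng.choice([1, -1], size=horizon, p=[p, 1 - p])
        if np.cumsum(steps).max() >= k:
            hits += 1
    return hits / trials

p, k = 0.4, 3
print(prob_ever_reach(k, p))   # Monte Carlo estimate
print((p / (1 - p)) ** k)      # exact: (2/3)^3, about 0.296
```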
And we solved that problem, if you remember, back when we were talking about stopping when you're ahead, when you're playing coin tossing with somebody.

So let's go further and look particularly at this problem of detection, and decisions, and hypothesis testing, which is really not a particularly hard problem. But it's made particularly hard by statisticians, who have so many special rules, peculiar cases, and almost mythology about making decisions. And you can imagine why. Because as long as you talk about probability, everybody knows you're talking about an abstraction. As soon as you start talking about making a decision, it suddenly becomes real. I mean, you look at a bunch of data and you have to do something. You look at a bunch of candidates for a job; you have to choose one. That's always very difficult, because you might not choose the right one. You might choose a very poor one. But you have to do your best. If you're investing in stocks, you look at all the statistics of everything, and finally you say, that's where I'm going to put my money. Or if you're looking for a job, you say, that's where I'm going to work, and you hope that it's going to work out well. There are all these situations where you can evaluate probabilities until you're sick in the head. They don't mean anything. It's only when you make a decision and actually do something with it that it really means something. So it becomes important at this point.

The model we use for this, since we're studying probability theory-- well, actually, we're studying random processes, but we're really studying probability theory; you've probably noticed that by now. Since we're studying probability, we study all these problems in terms of a probabilistic model. And in the probabilistic model, there's a discrete and, in most cases, binary random variable, H, which is called the hypothesis random variable. The sample values of H, you might as well call them 0 and 1.
That's the easiest thing to call binary things. They're called the alternative hypotheses. They have marginal probabilities, because it's a probability model: you have a random variable, it can only take on the values 0 and 1, so it has to have probabilities of being 0 and 1. Along with that, there are all sorts of other random variables. The situation might be as complicated as you want. But since we're making decisions, we're making decisions on the basis of some set of alternatives. And here, since we're trying to talk about random walks, and martingales, and things like that, we also restrict our attention to particular kinds of observations. The particular kind of observation that we restrict attention to here is a sequence of random variables, which we call the observation. You observe Y1, you observe Y2, you observe Y3, and so forth. In other words, you observe a sample value of each of those random variables; there's a whole sequence of them. And we assume, to make life simple for ourselves, that each of these is independent conditional on the hypothesis, and identically distributed conditional on the hypothesis. That's what this says right here.

This makes one more assumption: it assumes that these observations are continuous random variables. That doesn't make much difference. There are just a few peculiarities that come in if these are discrete random variables. There are also a few peculiarities that come in when they're continuous. And there are a lot of peculiarities that come in when they're absolutely arbitrary. But for the time being, just imagine that each of these is a continuous random variable. So for each value of n, we look at n observations. We can calculate the probability density that those observations would occur conditional on hypothesis 0. We can find the probability density that they would occur conditional on hypothesis 1. And since they're IID, that's equal to this product here.
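In symbols, the product on the slide is the usual conditionally IID factorization, with f denoting the common conditional density of each observation:

```latex
f_{\,Y_1 \cdots Y_n \mid H}(y_1, \ldots, y_n \mid \ell)
  \;=\; \prod_{i=1}^{n} f_{\,Y \mid H}(y_i \mid \ell),
  \qquad \ell \in \{0, 1\}.
```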
Excuse me, they are not IID; they are conditionally IID, conditional on the hypothesis. Namely, the idea is that the world is one way or the world is another way. If the world is this way, then all of these observations are IID. You're doing the same experiment again and again and again, and it's based on the same underlying hypothesis. Or the underlying hypothesis is this one over here. You make the observations all based on this same hypothesis, and you make as many of these IID observations, conditional on that hypothesis, as you choose. And when you're all done, what do you do? You have to make your decision.

OK, so this is a very simple-minded model of this very complicated and very important problem. But it's close enough to the truth that we can get a lot out of it.

Now, I spent a lot of time last time talking about this, and I'll spend a lot of time this time talking about it, because when we use a probability model for this, when we say that we're studying probability theory and therefore we're going to use probability, we have suddenly allied ourselves completely with people called Bayesian statisticians, or Bayesian probabilists. And we have turned our back on people called non-Bayesians, or sometimes classical statisticians. I hate using the word "classical" because I like the word "classics," and I don't like using it for such an unusual point of view. And the unusual point of view is this: they refuse to take a probability model for the hypothesis. They accept the fact that all the observations are probabilistic; we assume we have a nice model for them, which makes sense. We can do whatever we want with that model. We can change the model. But if you once assume that these two hypotheses that you're trying to choose between have a priori probabilities, then people get very upset about it. Because they say, well, if you know what the a priori probabilities are, why do you have to do a hypothesis test?
You already understand everything there is to know about the problem. And they feel this is very strange.

It's not strange, because you use probability models. You use models to try to understand certain things about reality. And you assume as many things as you want to assume about it. And when you get all done, you either use all the assumptions or you don't use them. What we're going to find today is that when you use this assumption of a probability model, you can answer the questions that these classical statisticians go to great pains to answer, and you can answer them very, very simply. After we assume the a priori probabilities, we can calculate certain things which don't depend on those a priori probabilities. And therefore, we know several things. One, we know that if we did know the a priori probabilities, it wouldn't make any difference. Two, we know that if we can estimate the a priori probabilities, it makes a great deal of difference. And three-- and this is the most important point-- you make 100 observations of something. Somebody else says, I don't believe you, and comes in and makes another 100 observations. Somebody else makes another 100 observations. Now, even if the second person doesn't believe what the first person has done, it doesn't make sense as a scientist to completely eliminate all of that from consideration. Namely, what you would like to do is say, well, since this person has found such and such, the a priori probabilities have changed. And then I can go on and make my 100 observations. I can either make a hypothesis test based on my 100 observations, or I can make a hypothesis test assuming that the other person did their work well; I can make it based on all of these observations. If you try to do those two things in a classical formulation, you run into a lot of trouble. If you try to do them in this probabilistic formulation, it's all perfectly straightforward.
Because you can either start out with a model in which you're taking 200 observations, or you can start out with a model in which you take 100 observations; and then suddenly the world changes, this hypothesis takes on perhaps a different value, and you take another hundred observations. So you do whatever you want to within a probabilistic formulation.

But the other thing is, all of you have patiently lived with this idea of studying probabilistic models all term long. You might as well keep on living with it. The fact that we're now interested in making decisions should not make you think that everything you've learned up until this point is baloney. And to move from here to a classical statistical formulation of the world would really be saying, I don't believe in probability theory. It's that bad. So here we go.

I'm sorry, we did that. We were there. Assume that on the basis of observing a sample value of this sequence of observations, we have to make a decision about H. We have to choose H equals 0 or H equals 1. We have to detect whether or not H is 1. When you do this detection, you would think, in the real world, that you've detected something; that if you've made a decision about something, you've tested a hypothesis and found what's correct. Not at all. When you make decisions, you can make errors. And the question of what kinds of errors you're making is a major part of trying to make decisions. I mean, those people who make decisions and then can't believe that they might have made the wrong decision are the worst kind of fools. And you see them in politics. You see them in business. You see them in academia. You see them all over the place. When you make a decision and you've made a mistake, you get some more evidence. You see that it's a mistake, and you change. The whole 19th century was taken up with-- I mean, the scientific community was driven by physicists in those days.
And the idea of Newton's laws was the most sacred thing they had. Everybody believed in Newtonian mechanics in those days. When quantum mechanics came along, this wasn't just a minor perturbation in physics. This was a most crucial thing. This said, everything we've known goes out the window. We can't rely on anything anymore. But the physicists said, OK, I guess we made a mistake. We'll make new observations. We have new observations that can be made. We now see that Newtonian mechanics works over a certain range of things; it doesn't work in other ranges of things. And they went on and found new things. That's the same thing we do here. We take these models. We evaluate our error probabilities. And having evaluated them, we then say, well, we've got to go on and take some more measurements. Or we say we're going to live with it. But we face the fact that there are errors involved. And in doing that, you have to take a probabilistic model. If you don't take a probabilistic model, it's very hard for you to talk honestly about what error probabilities are.

So both ways-- well, I'm preaching, and I'm sorry. But I've lived for a long time with many statisticians, many of whom get into my own field and cause a great deal of trouble. So the only thing I can do is urge you all to be cautious about this, and to think the matter through on your own. I'm not telling you to take my point of view on it. I'm telling you, don't take other people's point of view without thinking it through.

The probability experiment here really-- I mean, every probability model we view in terms of the real world as: you have this set of probabilities, a set of possible events. You do the experiment. There's one sample point that comes out. And after the one sample point comes out, then you know what the result of the experiment is. Here, the experiment consists both of what you normally view as the experiment--
Namely, taking the observations. And it also involves a choice of hypothesis. Namely, there's not a correct hypothesis to start with. The experiment involves God throwing his dice. Einstein didn't believe that God threw dice, but I do. And after the dice are thrown, one or the other of these hypotheses turns out to be true. All of the observations point to that one, or they point to the other, and you make a decision. OK, so the experiment consists both of choosing the hypothesis and of taking a whole sequence of observations.

Now, the other thing not to forget in this-- because you really have to get this model in your mind, or you're going to get very confused with all the things we do-- is that the experiment consists of a whole sequence of observations, but only one choice of hypothesis. Namely, you do the experiment; there's a hypothesis that occurs, and there's a whole sequence of observations which are all IID conditional on that particular hypothesis. So that's the model we're going to be using.

And now life is quite simple once we've explained the model. We can talk about the probability that H is equal to either 0 or 1, conditional on the sample point we've observed. It's equal to the a priori probability of that hypothesis, times the density of the observation conditional on the hypothesis, divided by a normalization factor: namely, the overall probability of that observation, period, which is the sum of the probability that 0 is the correct hypothesis times the density given 0, plus the probability that 1 is the correct hypothesis times the density given 1. This denominator here is a pain in the neck, as you can see. But you can avoid ever dealing with the denominator, because if you take this expression for H equals 0 and divide it by the same expression for H equals 1, the denominators cancel. So the ratio of the probability that H equals 0 given y, to the probability that H equals 1 given y, is just this ratio here.
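Written out, this is just Bayes' law applied to the observations, with p_l for the a priori probability of hypothesis l:

```latex
p_{H \mid \mathbf{Y}}(\ell \mid \mathbf{y})
  \;=\; \frac{p_\ell \, f_{\mathbf{Y} \mid H}(\mathbf{y} \mid \ell)}
             {p_0 \, f_{\mathbf{Y} \mid H}(\mathbf{y} \mid 0)
              + p_1 \, f_{\mathbf{Y} \mid H}(\mathbf{y} \mid 1)},
\qquad
\frac{p_{H \mid \mathbf{Y}}(0 \mid \mathbf{y})}{p_{H \mid \mathbf{Y}}(1 \mid \mathbf{y})}
  \;=\; \frac{p_0 \, f_{\mathbf{Y} \mid H}(\mathbf{y} \mid 0)}
             {p_1 \, f_{\mathbf{Y} \mid H}(\mathbf{y} \mid 1)}.
```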
Now, what's the probability of error if we make a decision at this point? If I've seen this particular sequence y, this quantity here is, in fact, the probability that hypothesis 0 is correct, in the model that we have chosen. So this is the probability that H is equal to 0 given y. If we select 1 under these conditions-- if we select hypothesis 1, if we make a decision and say, I'm going to guess that 1 is the right decision-- then this is the probability that you've made a mistake, because this is the probability that H is actually 0 rather than 1. And this other quantity here is the probability that you've made a mistake when 1 is the correct hypothesis.

So here we are, sitting with these probabilities of error. We don't have to do any calculations for them. Well, you might have to do a great deal of calculation to compute this density and this density, but otherwise, the whole thing is just sitting there for you. So what do you do if you want to minimize the probability of error? This was the probability that you're going to make an error if you choose 1. This is the probability of error if you choose 0. If we want to minimize the probability of error when we see the observation y, we want to pick the one of these which is largest. And that's all there is to it. This is the decision rule that minimizes the probability of an error. It's based on knowing what P0 and P1 are. But otherwise, the probability that H equals l is the correct hypothesis, given the observation, is the probability that H equals l given y. We maximize the a posteriori probability of choosing correctly by choosing the maximum over l of the probability that H equals l given y. Choosing this way, maximizing the a posteriori probability, is called the MAP rule: Maximum A posteriori Probability. You can only use the MAP rule if you assume that you know P0 and P1. And we do know P0 and P1 if we've selected a probability model.
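As a minimal sketch of the MAP rule in Python: the Gaussian observation model (unit variance, mean -1 under H = 0 and mean +1 under H = 1) and the a priori probabilities below are illustrative assumptions of mine, not values from the lecture.

```python
import numpy as np

p0, p1 = 0.7, 0.3  # assumed a priori probabilities

def gauss_pdf(y, mean):
    """Unit-variance Gaussian density (the assumed observation model)."""
    return np.exp(-(y - mean) ** 2 / 2) / np.sqrt(2 * np.pi)

def map_decision(y):
    """Pick the hypothesis with the larger a posteriori probability."""
    like0 = gauss_pdf(y, -1.0).prod()  # f(y | H=0), conditionally IID
    like1 = gauss_pdf(y, +1.0).prod()  # f(y | H=1)
    return 0 if p0 * like0 > p1 * like1 else 1

print(map_decision(np.array([0.3, -0.2, 1.1])))
```

Notice that the denominator of Bayes' law never appears: comparing p0 times f(y given 0) with p1 times f(y given 1) is enough, which is exactly the point about avoiding the normalization factor.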
So when we select this probability model, we've already assumed what these a priori probabilities are. So we now make our observation, and after making our observation, we make a decision. And at that point, we have an a posteriori probability that each of the hypotheses is correct.

Does anybody have any issues with this? I mean, it looks painfully simple when you look at it this way. And if it doesn't look painfully simple, please ask now, or forever hold your peace, as they say. Yeah?

AUDIENCE: Can you explain how you get the equation on the first line?

PROFESSOR: On the first line right up here? Yes, I used Bayes' law.

AUDIENCE: So what is that? So that's P of A given B is equal to P of B given A--

PROFESSOR: Yes.

AUDIENCE: I don't quite see how to--

PROFESSOR: P of A given B is equal to P of B given A, times P of A, divided by P of B. If you take this over there, then it's-- am I stating Bayes' law in a funny way?

AUDIENCE: So the thing on the bottom is P of B? OK.

PROFESSOR: What?

AUDIENCE: OK, I get it.

PROFESSOR: I mean, I might not have explained it well.

AUDIENCE: [INAUDIBLE]

PROFESSOR: If you start out with P of A given B equal to P of B given A times P of A divided by P of B, this quantity here on the bottom is P of y. So we have the probability that H equals l, times the probability of y given l, divided by the probability of y to start with. OK, so you maximize the a posteriori probability by choosing the maximum of these. It's called the MAP rule. And it doesn't require you to calculate this denominator, which is sometimes a mess. All it requires you to do is compare these two quantities up here.

AUDIENCE: It's 10 o'clock.

PROFESSOR: Well, excuse me. Yes. Yes, I know.
These things become clearer if you state them in terms of what's called the likelihood ratio. The likelihood ratio only works when you have two hypotheses. When you have two hypotheses, you call the ratio of one conditional density to the other the likelihood ratio. Why do I put 0 up here and 1 down here? Absolutely no reason at all; it's just convention. And unfortunately, it's a convention that not everybody follows. So some people have one convention and some people have another convention. If you want to use the other convention, just imagine switching 0 and 1 in your mind. They're both just binary numbers.

Then, when you want to look at this MAP rule, the MAP rule is choosing the larger of these two things, which we had back here. That's choosing whether this is larger than this, or vice versa, which is choosing whether this ratio here is greater than the ratio of P1 to P0. So that's the same thing. So the MAP rule is: calculate the likelihood ratio for the given observation y; if it's greater than P1 over P0, you select H equals 0; if it's less than or equal to P1 over P0, you select H equals 1. Why do I put the strict inequality on one side and include equality on the other? Again, no reason whatsoever. When you have equality, it doesn't make any difference which you choose, so you could flip a coin. It's a little easier if you just say we're going to do this under this condition. So we state the condition this way. We calculate the likelihood ratio, we compare it with a threshold-- the threshold here is P1 over P0-- and then we select something.

Why did I put a little hat over this?

AUDIENCE: Estimation.

PROFESSOR: What?

AUDIENCE: Because it's an estimation.

PROFESSOR: What?

AUDIENCE: It's an estimation?

PROFESSOR: Well, it's not really an estimation; it's a detection. I mean, estimation you usually view as being analog.
Detection you usually view as being digital. And thanks for bringing that up, because it's an important point. In this model, H is either 0 or 1 as the result of this experiment. We don't know which it is. This is what we've chosen. So H hat equals 0 does not mean that H itself is 0. This is our choice; it might be wrong or it might be right.

Many decision rules, including the most common and the most sensible, are rules that compare lambda of y to a fixed threshold, say eta equals P1 over P0, which is independent of y-- just a fixed threshold. The decision rules then vary only in the way that you choose the threshold. Now, what happens as soon as I call this eta instead of P1 over P0? My test becomes independent of these a priori probabilities that statisticians have thought about for so long. Namely, after a couple of lines of fiddling around with these things, suddenly all of that has disappeared. We have a threshold test. The threshold test says: take this ratio-- everybody agrees that such a ratio exists-- and compare it with something. If it's bigger than that something, you choose 0. If it's less than that thing, you choose 1. And that's the end of it.

OK, so we have two questions. One, do we always want to use a threshold test, or are there cases where we should use things other than a threshold test? And the second question is, if we're going to use a threshold test, where should we set the threshold? There's nothing that says that you really want to minimize the probability of error. I mean, suppose your test is to see whether-- well, something in the news today. You'd like to take an experiment to see whether your nuclear plant is going to explode or not. So you come up with one decision: it's not going to explode. Or the other decision: you decide it will explode. Presumably, on the basis of that decision, you do all sorts of things; the general form of the test is written out below.
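For reference, the general threshold test under discussion has the following form, with the threshold eta left as a free parameter; eta = p1/p0 recovers the MAP rule, and other choices of eta give other tests:

```latex
\Lambda(\mathbf{y}) \;=\;
  \frac{f(\mathbf{y} \mid H = 0)}{f(\mathbf{y} \mid H = 1)};
\qquad
\hat{H} \;=\;
\begin{cases}
  0, & \Lambda(\mathbf{y}) > \eta,\\
  1, & \Lambda(\mathbf{y}) \le \eta.
\end{cases}
```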
Do you really want to make it a maximum a posteriori probability decision? No. You recognize that if it's going to explode, and you choose that it's not going to explode and you don't do anything, there is a humongous cost associated with that. If you decide the other way, there's a pretty large cost associated with that also. But there's not really much comparison between the two. So anyway, you want to do something which takes those costs into account. One of the problems in the homework does that. It's really almost trivial to readjust this problem so that you set the threshold to involve the costs also. So if you have arbitrary costs on making errors, then you change the threshold a little bit. But you still use a threshold test.

There's something called maximum likelihood that people like for making decisions. Maximum likelihood says: you calculate the likelihood ratio, and if the likelihood ratio is bigger than 1, you go this way; if it's less than 1, you go that way. It's the MAP test when the two a priori probabilities are equal. But in many cases, you want to use it whether or not the a priori probabilities are equal. It's a standard test, and there are many reasons for using it, aside from the fact that the a priori probabilities might be chosen that way. So anyway, that's one other choice.

When we go a little further today, we'll talk about the Neyman-Pearson test. The Neyman-Pearson test says: for some reason or other, I want to make sure that the probability that my nuclear plant blows up is less than, say, 10 to the minus fifth. Why 10 to the minus fifth? Pulled out of the air. Maybe 10 to the minus sixth; at that point, our probabilities don't make much sense anymore.
But however we choose it, we choose our test to say: given that we can't let the probability of error under one hypothesis be bigger than some certain amount alpha, what test will minimize the probability of error under the other hypothesis? Namely, if I have to get one thing right almost all the time, what's the best I can do on the other alternative? And that's the Neyman-Pearson test. That is a favorite test among the non-Bayesians, because it doesn't involve the a priori probabilities anymore. So it's a nice one in that way. But we'll see that we get it anyway, using a probability model.

OK, let's go back to random walks just a little bit, to see why we're doing what we're doing. The logarithm of the likelihood ratio-- the logarithm of this lambda of y, where I'm now taking m observations and putting that in explicitly-- is the sum from n equals 1 to m of the logs of the individual ratios. In other words, under hypothesis 0, if I calculate the probability density of the vector y given H equals 0, I'm finding the probability of m things which are IID. So this probability density is the product of the probability densities of each of the observations. Most of you know by now that any time you look at a probability which is a product over observations, what you'd really like to do is take the logarithm of it, so that you're dealing with a sum of things rather than a product of things, because we all know how to add independent random variables. So the log of this likelihood ratio, which is called the log likelihood ratio as you might guess, is just a sum of these individual log likelihood ratios. If we look at this for each m greater than or equal to 1, then given H equals 0, it's a random walk; and given H equals 1, it's another random walk. It's the same sequence of sample values in both cases. Namely, as experimentalists, we're taking these observations. We don't know whether H equals 0 or H equals 1 is the result of the experiment.
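In symbols, this is the identity being pointed at: the log likelihood ratio of m observations is a partial sum of terms which are IID conditional on either hypothesis, and hence a random walk in m:

```latex
\ln \Lambda(y_1, \ldots, y_m)
  \;=\; \sum_{n=1}^{m} \ln \frac{f(y_n \mid H = 0)}{f(y_n \mid H = 1)}
  \;=\; \sum_{n=1}^{m} z_n ,
```

where z_n is the log likelihood ratio of the single observation y_n.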
But what we do know is what those values are, and we can calculate this sum. Now, if we condition on H equals 0, then this quantity has a particular probability of occurring. So this is a random variable under the hypothesis H equals 0, and it's a random variable under the hypothesis H equals 1. And this sum of random variables behaves in a very different way under these two hypotheses. What's going to happen is that under one hypothesis, the expected value of this log likelihood ratio is going to increase linearly with n; under the other hypothesis, it's going to decrease linearly as we increase n. And a nifty test at that point is to say: as soon as it crosses a threshold up here, or a threshold down here, we're going to make a decision. That's called a sequential test, because you haven't specified ahead of time, I'm going to take 100 observations and then make up my mind. You've specified: I'm going to take as many observations as I need to be relatively sure that I'm making the right decision. Which is what you do in real life. I mean, there's nothing fancy about doing sequential tests. Those are the obvious things to do, except they're a little trickier to talk about using probability theory. But anyway, that's where we're headed. That's why we're talking about hypothesis testing: because when you look at it in this formulation, we get a random walk. And it gives us a nice example of when you want to use a random walk crossing a threshold as a way of making decisions. OK, so that's why we're doing what we're doing.

Now, let's go back and look at threshold tests again, and try to see how we're going to make threshold tests, what the error probabilities will be, and try to analyze them a little more than just saying, well, a MAP test does this. Because as soon as you see that a MAP test does this, you say, well, suppose I use some other test. What am I going to suffer from that? What am I going to gain by it?
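Here is a minimal sketch of such a two-threshold sequential test in Python. The Gaussian observation model and the threshold values are illustrative assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(2)

def sequential_test(sample, upper=5.0, lower=-5.0):
    """Accumulate log likelihood ratios ln f(y|H=0)/f(y|H=1) until one of
    two thresholds is crossed: decide H hat = 0 at the upper threshold,
    H hat = 1 at the lower. Assumes unit-variance Gaussians with mean -1
    under H=0 and mean +1 under H=1, for which the ratio reduces to -2y."""
    s = 0.0
    for n, y in enumerate(sample, start=1):
        s += -2.0 * y  # ln N(y; -1, 1) - ln N(y; +1, 1) = -2y
        if s >= upper:
            return 0, n  # decided H hat = 0 after n observations
        if s <= lower:
            return 1, n  # decided H hat = 1 after n observations
    return None, len(sample)  # no decision within the sample

# Data generated under H = 1 (mean +1): the walk drifts down at rate -2.
print(sequential_test(rng.normal(+1.0, 1.0, size=1000)))
```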
So it's worthwhile, instead of looking at just threshold tests, to say: let's look at any old test at all. Now, "any test" means the following. I have this probability model. I've already bludgeoned you into accepting that that's the probability model we're going to be looking at. And we have this-- well, we have the likelihood ratio, but we don't care about that for the moment. We make this observation, and we've got to make a decision. And our decision is going to be either 1 or 0. How do we characterize that mathematically? Or how do we calculate it, if we want a computer to make that decision for us? The systematic way to do it is, for every possible sequence y, to say ahead of time-- to give a formula for-- which sequences get mapped into 1 and which sequences get mapped into 0. So we're going to call A the set of sample sequences that get mapped into hypothesis 1. That's the most general binary hypothesis test you can do. It includes all possible ways of choosing either 1 or 0.

You're forced to hire somebody or not hire somebody. You can't get them to work for you for two weeks and then make a decision at that point. Well, sometimes you can in this world. But if it's somebody you really want, and other people want them too, then you've got to decide: I'm going to go with this person, or I'm not going to go with them. So out of all the observations that you might make, you need some way to decide which ones send you to decision 1 and which ones send you to decision 0. So we will just say, arbitrarily, there's a set A of sample sequences that map into hypothesis 1. And the error probability for each hypothesis, using test A, is given by what we'll call Q sub 0 of A-- this is our name for the error probability.

Have I twisted this up? No. Q sub 0 of A is the probability that I actually choose-- it's the probability that I choose A given that the hypothesis is 0. Q sub 1 of A is the probability that I choose 1.
774 00:43:20,000 --> 00:43:22,275 Let me state that again carefully. 775 00:43:25,400 --> 00:43:33,880 Q0 of A is the probability that I'm going to choose 776 00:43:33,880 --> 00:43:37,460 hypothesis 1 given that hypothesis 0 was the correct 777 00:43:37,460 --> 00:43:38,410 hypothesis. 778 00:43:38,410 --> 00:43:43,090 It's the probability that Y is in A. That means that H hat is 779 00:43:43,090 --> 00:43:46,980 equal to 1 given that H is actually 0. 780 00:43:46,980 --> 00:43:52,510 So that's the probability we make an error given the 781 00:43:52,510 --> 00:43:56,190 hypothesis, the correct hypothesis is 0. 782 00:43:56,190 --> 00:43:59,880 Q1 of A is the probability of making an error given that the 783 00:43:59,880 --> 00:44:02,250 correct hypothesis is 1. 784 00:44:02,250 --> 00:44:04,870 If I have a priori probabilities, I'm going back 785 00:44:04,870 --> 00:44:07,770 to assuming a priori probabilities again. 786 00:44:07,770 --> 00:44:10,495 The probability of error is? 787 00:44:15,340 --> 00:44:21,970 It's P0 times the probability I make an error given that H 788 00:44:21,970 --> 00:44:23,190 equals 0, 789 00:44:23,190 --> 00:44:26,770 plus P1, the a priori probability of 1, times the probability I make 790 00:44:26,770 --> 00:44:28,920 an error given H equals 1. 791 00:44:28,920 --> 00:44:30,260 I add these two up. 792 00:44:30,260 --> 00:44:33,750 I can write it this way. 793 00:44:33,750 --> 00:44:35,570 Don't ask why for the time being. 794 00:44:35,570 --> 00:44:42,300 I'll just take the P0 out, so it's Q0 of A plus P1 over P0 795 00:44:42,300 --> 00:44:47,920 Q1 of A. So that's what I've called eta times Q1 of A. 796 00:44:47,920 --> 00:44:52,160 For the threshold test based on eta, the probability of 797 00:44:52,160 --> 00:44:55,120 error is the same thing. 798 00:44:55,120 --> 00:44:59,455 But that A there is an eta. 799 00:44:59,455 --> 00:45:04,690 I hope you can imagine that quantity there is an eta. 800 00:45:04,690 --> 00:45:05,930 This is an eta. 801 00:45:05,930 --> 00:45:10,710 So it's P0 times Q0 of eta plus eta times Q1 of eta. 802 00:45:10,710 --> 00:45:14,840 So the error probability, under this crazy test that you've 803 00:45:14,840 --> 00:45:19,180 designed, is P0 times this quantity. 804 00:45:19,180 --> 00:45:23,370 Under the MAP test, probability of error is this 805 00:45:23,370 --> 00:45:25,710 quantity here. 806 00:45:25,710 --> 00:45:28,830 What do we know about the MAP test? 807 00:45:28,830 --> 00:45:33,300 It minimizes the error probability under those a 808 00:45:33,300 --> 00:45:35,160 priori probabilities. 809 00:45:35,160 --> 00:45:39,720 So what we know about it is that this quantity is less 810 00:45:39,720 --> 00:45:43,890 than or equal to this quantity. 811 00:45:43,890 --> 00:45:48,630 Take out the P0's and it says that this quantity is less 812 00:45:48,630 --> 00:45:50,280 than or equal to this quantity. 813 00:45:53,250 --> 00:45:55,920 Pretty simple. 814 00:45:55,920 --> 00:46:00,320 Let's draw a picture that shows what that means. 815 00:46:00,320 --> 00:46:01,724 Here's a result that we have. 816 00:46:04,690 --> 00:46:09,010 We know because of maximum a posteriori probability for the 817 00:46:09,010 --> 00:46:15,990 threshold test that this is less than or equal to this. 818 00:46:15,990 --> 00:46:17,970 This is the minimum error probability. 819 00:46:17,970 --> 00:46:21,590 This is the error probability you get with 820 00:46:21,590 --> 00:46:23,820 whatever test you like.
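In symbols, that comparison (just a restatement of the slide, with eta equal to P1 over P0 as above):

```latex
\Pr\{e\} \;=\; p_0\,Q_0(A) + p_1\,Q_1(A)
        \;=\; p_0\bigl[\,Q_0(A) + \eta\,Q_1(A)\,\bigr],
\qquad \eta = \frac{p_1}{p_0},
```

and, since the MAP test minimizes the error probability under those a priori probabilities,

```latex
Q_0(\eta) + \eta\,Q_1(\eta) \;\le\; Q_0(A) + \eta\,Q_1(A)
\qquad \text{for every test } A.
```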
821 00:46:23,820 --> 00:46:30,810 So let's draw a picture on a graph where the probability of 822 00:46:30,810 --> 00:46:37,810 error given H equals 1 is on the horizontal axis. 823 00:46:37,810 --> 00:46:42,680 The probability of error conditional on H equals 0 is 824 00:46:42,680 --> 00:46:46,550 on this axis. 825 00:46:46,550 --> 00:46:53,420 So I can list the probability of error for the threshold 826 00:46:53,420 --> 00:46:55,970 test, which sits here. 827 00:46:55,970 --> 00:46:59,810 I can list the probability of error for this arbitrary test, 828 00:46:59,810 --> 00:47:01,380 which sits here. 829 00:47:01,380 --> 00:47:05,880 And I know that this quantity is greater than or equal to 830 00:47:05,880 --> 00:47:06,890 this quantity. 831 00:47:06,890 --> 00:47:14,400 So the only thing I have to do now is to sort out, using plane 832 00:47:14,400 --> 00:47:19,560 geometry, why these numbers are what they are. 833 00:47:19,560 --> 00:47:26,760 This number here is Q0 of eta plus eta times Q1 of eta. 834 00:47:26,760 --> 00:47:30,070 Here's Q1 of eta. 835 00:47:30,070 --> 00:47:33,630 This distance here is Q1 of eta. 836 00:47:33,630 --> 00:47:37,600 We have a line of slope minus eta there that we've drawn. 837 00:47:37,600 --> 00:47:42,890 So this point here is, in fact, Q0 of eta plus eta times 838 00:47:42,890 --> 00:47:44,620 Q1 of eta. 839 00:47:44,620 --> 00:47:47,890 That's just plane geometry. 840 00:47:47,890 --> 00:47:57,880 This point is Q0 of A plus eta times Q1 of A. Another line of 841 00:47:57,880 --> 00:48:00,660 slope minus eta. 842 00:48:00,660 --> 00:48:05,350 What we've shown is that this is less than or equal to this. 843 00:48:10,620 --> 00:48:11,990 That's because of the MAP rule. 844 00:48:11,990 --> 00:48:14,470 This has to be less than or equal to that. 845 00:48:14,470 --> 00:48:16,740 So what have we shown here? 846 00:48:16,740 --> 00:48:21,260 We've shown that for every test A you can imagine, when 847 00:48:21,260 --> 00:48:25,880 you draw that test on this two-dimensional plot of error 848 00:48:25,880 --> 00:48:30,710 probability given H equals 1 versus error probability given 849 00:48:30,710 --> 00:48:33,090 H equals 0, 850 00:48:33,090 --> 00:48:37,360 every test in the world lies Northeast of this line here. 851 00:48:45,620 --> 00:48:46,050 Yeah? 852 00:48:46,050 --> 00:48:48,268 AUDIENCE: Can you say again exactly what 853 00:48:48,268 --> 00:48:51,280 axis represents what? 854 00:48:51,280 --> 00:48:54,410 PROFESSOR: This axis here represents the error 855 00:48:54,410 --> 00:48:58,200 probability given that H equals 1 is the correct 856 00:48:58,200 --> 00:48:59,690 hypothesis. 857 00:48:59,690 --> 00:49:03,620 This axis is the error probability given that 0 is 858 00:49:03,620 --> 00:49:04,870 the correct hypothesis. 859 00:49:07,420 --> 00:49:11,200 So we've defined Q1 of eta and Q0 of eta as those two error 860 00:49:11,200 --> 00:49:12,510 probabilities. 861 00:49:12,510 --> 00:49:17,860 Using the threshold test, or using the MAP test where eta 862 00:49:17,860 --> 00:49:20,470 is equal to P1 over P0. 863 00:49:20,470 --> 00:49:25,010 And this point here is whatever it happens to be for 864 00:49:25,010 --> 00:49:27,300 any test that you happen to like. 865 00:49:30,630 --> 00:49:35,390 You might have a supervisor who wants to hire somebody and 866 00:49:35,390 --> 00:49:39,190 you view that person as a threat to yourself, so you've 867 00:49:39,190 --> 00:49:43,010 taken all your observations and you then make a decision.
868 00:49:43,010 --> 00:49:45,380 If the person is any good, you say, don't hire them. 869 00:49:45,380 --> 00:49:47,480 If the person is no good, you say, hire them. 870 00:49:47,480 --> 00:49:51,770 So just the opposite of what you should do. 871 00:49:51,770 --> 00:49:57,310 But whatever you do, this says this is less than or equal to 872 00:49:57,310 --> 00:50:00,910 this because of the MAP rule. 873 00:50:00,910 --> 00:50:05,680 And therefore, this point lies Northeast 874 00:50:05,680 --> 00:50:06,930 of this line here. 875 00:50:09,900 --> 00:50:12,630 You can do this for any eta that you want to do it for. 876 00:50:15,820 --> 00:50:19,490 So for every eta that we want to use, we get some value of 877 00:50:19,490 --> 00:50:22,240 Q0 of eta and Q1 of eta. 878 00:50:22,240 --> 00:50:25,180 These go along here in some way. 879 00:50:25,180 --> 00:50:27,640 You can do the same argument again. 880 00:50:27,640 --> 00:50:33,770 For every threshold test, every point lies Northeast of 881 00:50:33,770 --> 00:50:37,380 the line of slope minus eta through that threshold test. 882 00:50:37,380 --> 00:50:43,200 We get a whole family of lines. When eta is very big, 883 00:50:43,200 --> 00:50:47,170 the line of slope minus eta goes like this. 884 00:50:47,170 --> 00:50:50,260 When eta is very small, it goes like this. 885 00:50:54,050 --> 00:50:58,000 We just think of ourselves plotting all these lines, 886 00:50:58,000 --> 00:51:02,930 taking the upper envelope of them because every test has to 887 00:51:02,930 --> 00:51:06,770 lie Northeast of every one of those lines. 888 00:51:06,770 --> 00:51:11,240 So we take the upper envelope of all of these lines, and we 889 00:51:11,240 --> 00:51:15,050 get something that looks like this. 890 00:51:15,050 --> 00:51:18,450 We call this the error curve. 891 00:51:18,450 --> 00:51:23,730 And this is the upper envelope of the straight lines of slope 892 00:51:23,730 --> 00:51:28,110 minus eta that go through the threshold tests at eta. 893 00:51:31,510 --> 00:51:33,830 You get something else from that, too. 894 00:51:33,830 --> 00:51:37,330 This curve is convex. 895 00:51:37,330 --> 00:51:39,110 Why is the curve convex? 896 00:51:39,110 --> 00:51:42,380 Well, you might like to take the second derivative of it, 897 00:51:42,380 --> 00:51:45,090 but that's a pain in the neck. 898 00:51:45,090 --> 00:51:52,110 But the fundamental definition of convexity is that a 899 00:51:52,110 --> 00:51:55,730 one-dimensional curve is convex if all of its tangents 900 00:51:55,730 --> 00:51:58,230 lie underneath the curve. 901 00:51:58,230 --> 00:52:00,040 That's the way we've constructed this. 902 00:52:00,040 --> 00:52:02,870 It's the upper envelope of a bunch of straight lines. 903 00:52:02,870 --> 00:52:03,360 Yes? 904 00:52:03,360 --> 00:52:06,520 AUDIENCE: Can you please explain, what is u of alpha? 905 00:52:06,520 --> 00:52:08,930 PROFESSOR: U of alpha is just what I've 906 00:52:08,930 --> 00:52:11,870 called this upper envelope. 907 00:52:11,870 --> 00:52:14,420 This upper envelope is now a function. 908 00:52:14,420 --> 00:52:16,110 AUDIENCE: What's the definition? 909 00:52:16,110 --> 00:52:16,520 PROFESSOR: What? 910 00:52:16,520 --> 00:52:17,700 AUDIENCE: What is the definition? 911 00:52:17,700 --> 00:52:19,980 PROFESSOR: The definition is the upper envelope of all 912 00:52:19,980 --> 00:52:23,235 these straight lines. 913 00:52:23,235 --> 00:52:24,485 AUDIENCE: For changing eta? 914 00:52:24,485 --> 00:52:24,870 PROFESSOR: What?
915 00:52:24,870 --> 00:52:27,490 AUDIENCE: For changing eta? 916 00:52:27,490 --> 00:52:28,960 PROFESSOR: Yes. 917 00:52:28,960 --> 00:52:35,540 As eta changes, I get a whole bunch of these points. 918 00:52:35,540 --> 00:52:37,940 I got a whole bunch of these points. 919 00:52:37,940 --> 00:52:41,480 I take the upper envelope of all of these straight lines. 920 00:52:44,200 --> 00:52:47,010 I mean, yes, you'd rather see an equation. 921 00:52:47,010 --> 00:52:51,670 But if you see an equation, it's terribly ugly. 922 00:52:51,670 --> 00:52:55,590 I mean, you can program a computer to do this 923 00:52:55,590 --> 00:52:59,310 as easily as you can program it to 924 00:52:59,310 --> 00:53:02,800 follow a bunch of equations. 925 00:53:02,800 --> 00:53:06,230 But anyway, I'm not interested in actually solving for this 926 00:53:06,230 --> 00:53:07,480 curve in particular. 927 00:53:13,420 --> 00:53:16,660 I am particularly interested in the fact that this upper 928 00:53:16,660 --> 00:53:22,990 envelope is, in fact, a convex curve and that the threshold 929 00:53:22,990 --> 00:53:25,900 tests lie on the curve. 930 00:53:25,900 --> 00:53:30,690 The other tests lie Northeast of the curve. 931 00:53:30,690 --> 00:53:34,280 And that's the reason you want to use threshold tests. 932 00:53:34,280 --> 00:53:38,810 And it has nothing to do with a priori probabilities at all. 933 00:53:38,810 --> 00:53:41,730 So you see, the thing we've done is to start out assuming 934 00:53:41,730 --> 00:53:44,120 a priori probabilities. 935 00:53:44,120 --> 00:53:49,450 We've derived this neat result using a priori probabilities. 936 00:53:49,450 --> 00:53:55,196 But now we have this error curve. 937 00:53:55,196 --> 00:53:59,240 Well, to give you a better definition of what u of alpha 938 00:53:59,240 --> 00:54:07,620 is, u of alpha is the error probability under hypothesis 0 939 00:54:07,620 --> 00:54:12,450 if the error probability under hypothesis 1 is alpha. 940 00:54:12,450 --> 00:54:16,160 You pick an error probability here. 941 00:54:16,160 --> 00:54:18,660 You go up to that point here. 942 00:54:18,660 --> 00:54:21,750 There's a threshold test there. 943 00:54:21,750 --> 00:54:24,730 You read over there. 944 00:54:24,730 --> 00:54:28,480 And at that point, you find the probability of error given 945 00:54:28,480 --> 00:54:30,426 H equals 0. 946 00:54:30,426 --> 00:54:31,830 AUDIENCE: How do you know that the threshold 947 00:54:31,830 --> 00:54:35,580 tests lie on the curve? 948 00:54:35,580 --> 00:54:42,640 PROFESSOR: Well, this threshold test here is 949 00:54:42,640 --> 00:54:45,220 Southwest of all tests. 950 00:54:48,420 --> 00:54:53,175 And therefore, it can't lie above this upper envelope. 951 00:54:57,300 --> 00:55:00,740 Now, I've cheated you in one small way. 952 00:55:00,740 --> 00:55:07,370 If you have a discrete test, what you're going to wind up 953 00:55:07,370 --> 00:55:12,580 with is just a finite set of these possible points here. 954 00:55:12,580 --> 00:55:15,650 So you're going to wind up with the upper envelope of a 955 00:55:15,650 --> 00:55:18,130 finite set of straight lines. 956 00:55:18,130 --> 00:55:21,770 So the upper envelope is actually going to be-- 957 00:55:21,770 --> 00:55:26,120 it's still convex, but it's piecewise linear. 958 00:55:26,120 --> 00:55:30,970 And since it's piecewise linear, the threshold tests are at the 959 00:55:30,970 --> 00:55:33,320 corner points of that curve.
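A small numeric sketch of those corner points in Python, using a single made-up discrete observation rather than a sequence (the two PMFs below are assumptions for illustration, not from the lecture). Sweeping the threshold on the likelihood ratio traces out the corners (Q1, Q0), and the error curve is the upper envelope of the lines of slope minus eta through them.

```python
# Corner points of the error curve for a single discrete observation
# Y taking values 0..3.  The conditional PMFs are made-up numbers.
p_y_given_0 = [0.50, 0.30, 0.15, 0.05]   # f(y | H=0)
p_y_given_1 = [0.05, 0.15, 0.30, 0.50]   # f(y | H=1)

# Sort outcomes by likelihood ratio f(y|0)/f(y|1), largest first.
order = sorted(range(4),
               key=lambda y: p_y_given_0[y] / p_y_given_1[y],
               reverse=True)

# A threshold test decides H=1 on the outcomes with the smallest
# likelihood ratios; sweeping the threshold grows that set A one
# outcome at a time.  k = 0 is "always decide 1"; k = 4 is "never".
for k in range(5):
    A = set(order[k:])                            # decide H=1 on A
    q0 = sum(p_y_given_0[y] for y in A)           # Pr{error | H=0}
    q1 = sum(p_y_given_1[y] for y in range(4) if y not in A)  # Pr{error | H=1}
    print(f"corner point: Q1 = {q1:.2f}, Q0 = {q0:.2f}")
```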
960 00:55:33,320 --> 00:55:36,300 And in between those points, you don't quite 961 00:55:36,300 --> 00:55:37,550 know what to do. 962 00:55:40,890 --> 00:55:44,630 So since you don't quite know what to do in between those 963 00:55:44,630 --> 00:55:51,400 points, as far as the maximum a posteriori probability test 964 00:55:51,400 --> 00:55:58,030 goes, you can reach any one of those points, sometimes using 965 00:55:58,030 --> 00:56:00,890 one test on one corner of-- 966 00:56:00,890 --> 00:56:02,550 I guess it's easier if I draw it. 967 00:56:07,140 --> 00:56:10,480 And I didn't want to get into this particularly because it's 968 00:56:10,480 --> 00:56:12,280 a little messier. 969 00:56:18,320 --> 00:56:20,600 So you could have this kind of curve. 970 00:56:20,600 --> 00:56:24,550 And the notes talk about this in detail. 971 00:56:24,550 --> 00:56:29,370 So the threshold tests correspond to these points. 972 00:56:29,370 --> 00:56:33,550 This point says always decide one. 973 00:56:33,550 --> 00:56:38,020 Don't pay any attention to the tests at all, just say I think 974 00:56:38,020 --> 00:56:40,980 one is the right hypothesis. 975 00:56:40,980 --> 00:56:44,880 I mean, this is the testing philosophy of people who don't 976 00:56:44,880 --> 00:56:46,980 believe in experimentalism. 977 00:56:46,980 --> 00:56:48,840 They've already made up their mind. 978 00:56:48,840 --> 00:56:50,330 They look at the results. 979 00:56:50,330 --> 00:56:52,200 They say, that's very interesting. 980 00:56:52,200 --> 00:56:56,070 And then they say, I'm going to choose this. 981 00:56:56,070 --> 00:57:00,944 These other points are our particular threshold tests. 982 00:57:04,680 --> 00:57:07,680 If you want to get error probabilities in the middle 983 00:57:07,680 --> 00:57:09,420 here, what do you do? 984 00:57:09,420 --> 00:57:11,500 You use a randomized test. 985 00:57:11,500 --> 00:57:12,700 Sometimes you use this. 986 00:57:12,700 --> 00:57:14,150 Sometimes you use this. 987 00:57:14,150 --> 00:57:17,120 You flip a coin and choose whichever one of these you 988 00:57:17,120 --> 00:57:18,370 want to choose. 989 00:57:20,990 --> 00:57:27,160 So what this says is the Neyman-Pearson test, which is 990 00:57:27,160 --> 00:57:36,190 the test that says pick some alpha, which is the error 991 00:57:36,190 --> 00:57:39,630 probability under hypothesis 1 that 992 00:57:39,630 --> 00:57:41,660 you're willing to tolerate. 993 00:57:41,660 --> 00:57:44,130 So you pick alpha. 994 00:57:44,130 --> 00:57:48,330 And then it says, minimize the error probability of the other 995 00:57:48,330 --> 00:57:51,790 kind, so you read over there. 996 00:57:51,790 --> 00:57:56,630 And the Neyman-Pearson test, what it does is it minimizes 997 00:57:56,630 --> 00:58:02,130 the error probability under the other hypothesis. 998 00:58:02,130 --> 00:58:05,530 Now, when this curve is piecewise linear, the 999 00:58:05,530 --> 00:58:09,530 Neyman-Pearson test is not a threshold test, but it's a 1000 00:58:09,530 --> 00:58:11,930 randomized threshold test. 1001 00:58:11,930 --> 00:58:15,300 Sometimes when you're at a point like this, you have to 1002 00:58:15,300 --> 00:58:17,485 use this test some of the time and this test some of the time. 1003 00:58:20,340 --> 00:58:25,710 For most of the tests that you deal with, the Neyman-Pearson test 1004 00:58:25,710 --> 00:58:28,570 is just the threshold test that's at 1005 00:58:28,570 --> 00:58:31,180 that particular point. 1006 00:58:34,670 --> 00:58:38,100 Any questions about that?
1007 00:58:38,100 --> 00:58:39,960 This is probably one of these things you have to think about 1008 00:58:39,960 --> 00:58:40,980 a little bit. 1009 00:58:40,980 --> 00:58:41,510 Yes? 1010 00:58:41,510 --> 00:58:44,330 AUDIENCE: When you say you have to use this test or this 1011 00:58:44,330 --> 00:58:48,045 test, are you talking about threshold or are you talking 1012 00:58:48,045 --> 00:58:51,184 about-- because this is always-- it's either H equals 1013 00:58:51,184 --> 00:58:53,846 0 or H equal 1, right? 1014 00:58:53,846 --> 00:58:56,508 What do you mean when you say you have to randomize between 1015 00:58:56,508 --> 00:58:59,180 the two tests? 1016 00:58:59,180 --> 00:59:00,695 PROFESSOR: I mean threshold tests-- 1017 00:59:15,120 --> 00:59:20,290 if I have a finite set of alternatives, and I'm doing a 1018 00:59:20,290 --> 00:59:24,640 threshold test on that finite set of alternatives, I only 1019 00:59:24,640 --> 00:59:29,500 have a finite number of things I can do. 1020 00:59:29,500 --> 00:59:33,230 As I increase the threshold, I suddenly get to the point 1021 00:59:33,230 --> 00:59:36,750 where this ratio of likelihoods 1022 00:59:36,750 --> 00:59:39,190 includes one more point. 1023 00:59:39,190 --> 00:59:41,500 And then it gets to the point where it includes one other 1024 00:59:41,500 --> 00:59:43,770 point and so forth. 1025 00:59:43,770 --> 00:59:49,430 So what happens is that this upper envelope is just 1026 00:59:49,430 --> 00:59:53,320 the upper envelope of lines through a finite number of points. 1027 00:59:53,320 --> 00:59:56,980 And in this upper envelope, the 1028 00:59:56,980 --> 01:00:00,500 threshold tests are just the corners there. 1029 01:00:00,500 --> 01:00:04,330 So I sometimes have to randomize between them. 1030 01:00:04,330 --> 01:00:05,880 If you don't like that, ignore it. 1031 01:00:09,130 --> 01:00:16,450 Because for most tests you deal with, almost all books on 1032 01:00:16,450 --> 01:00:20,300 statistics that I've ever seen just say the 1033 01:00:20,300 --> 01:00:25,130 Neyman-Pearson test looks at the threshold curve, at this 1034 01:00:25,130 --> 01:00:26,610 error curve. 1035 01:00:26,610 --> 01:00:29,040 And it chooses accordingly. 1036 01:00:29,040 --> 01:00:31,228 Yes? 1037 01:00:31,228 --> 01:00:36,590 AUDIENCE: Can you put the previous slide back? 1038 01:00:36,590 --> 01:00:42,690 You told us that because of maximum a posteriori 1039 01:00:42,690 --> 01:00:49,870 probability, if eta is equal to P1 divided by P0, then the 1040 01:00:49,870 --> 01:00:51,950 probability of error is minimized. 1041 01:00:51,950 --> 01:00:56,900 And so the errors of the test A are [INAUDIBLE]. 1042 01:00:59,750 --> 01:01:04,738 But if we start changing eta from 0 to infinity, it doesn't 1043 01:01:04,738 --> 01:01:05,704 have to be anymore. 1044 01:01:05,704 --> 01:01:09,175 [INAUDIBLE], which means the error is 1045 01:01:09,175 --> 01:01:11,015 not necessarily minimized. 1046 01:01:11,015 --> 01:01:13,170 So the argument doesn't hold anymore. 1047 01:01:13,170 --> 01:01:17,880 PROFESSOR: As I change eta, I'm changing P1 and P0 also. 1048 01:01:17,880 --> 01:01:21,760 In other words, now what I'm doing is I'm saying, let's 1049 01:01:21,760 --> 01:01:27,240 look at this threshold test, and let's visualize what 1050 01:01:27,240 --> 01:01:32,010 happens as I change the a priori probabilities. 1051 01:01:32,010 --> 01:01:37,390 So I'm suddenly becoming a classical statistician instead 1052 01:01:37,390 --> 01:01:40,340 of a Bayesian one.
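Here is a minimal sketch of the randomization being discussed, under the assumption that you are handed two decision rules sitting at adjacent corners of the piecewise-linear error curve; the function name and parameters are hypothetical, introduced only for illustration.

```python
import random

def randomized_np_decision(y, test_a, test_b, q1_a, q1_b, alpha):
    """Randomized Neyman-Pearson rule sketch.

    test_a and test_b are decision functions y -> 0 or 1 at adjacent
    corners of the error curve, with error probabilities given H=1
    satisfying q1_a <= alpha <= q1_b.  Choosing test_a with
    probability theta makes the average error probability given H=1
    exactly alpha, landing on the segment between the two corners.
    """
    # Solve theta*q1_a + (1 - theta)*q1_b = alpha for theta in [0, 1].
    theta = (q1_b - alpha) / (q1_b - q1_a)
    return test_a(y) if random.random() < theta else test_b(y)
```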
1053 01:01:40,340 --> 01:01:42,250 But I know what the answers are from looking at the 1054 01:01:42,250 --> 01:01:43,500 Bayesian case. 1055 01:01:48,370 --> 01:01:53,290 OK, so let's move on. 1056 01:01:57,160 --> 01:02:02,245 I mean, we now sort of see that these tests-- 1057 01:02:05,180 --> 01:02:08,120 well, one thing we've seen is when you have to make a 1058 01:02:08,120 --> 01:02:13,120 decision under this kind of probabilistic model we've been 1059 01:02:13,120 --> 01:02:18,070 talking about-- namely, two hypotheses, IID random 1060 01:02:18,070 --> 01:02:20,153 variables conditional on each hypothesis. 1061 01:02:23,420 --> 01:02:26,370 Those hypothesis testing problems turn 1062 01:02:26,370 --> 01:02:29,350 into random walk problems. 1063 01:02:29,350 --> 01:02:32,580 We also saw that for the G/G/1 queue, 1064 01:02:32,580 --> 01:02:37,040 when I started looking at when the system becomes empty, and 1065 01:02:37,040 --> 01:02:43,010 how long it takes to start to fill up again, that problem is 1066 01:02:43,010 --> 01:02:44,880 a random walk problem. 1067 01:02:44,880 --> 01:02:48,000 So now I want to start to ask the question, what's the 1068 01:02:48,000 --> 01:02:52,470 probability that a random walk will cross a threshold? 1069 01:02:52,470 --> 01:02:54,700 I'm going to apply the Chernoff bound to it. 1070 01:02:54,700 --> 01:02:56,010 You remember the Chernoff bound? 1071 01:02:56,010 --> 01:03:00,410 We talked about it a little bit back on the 1072 01:03:00,410 --> 01:03:03,180 second week of the term. 1073 01:03:03,180 --> 01:03:06,420 We were talking about the Markov inequality and the 1074 01:03:06,420 --> 01:03:08,270 Chebyshev inequality. 1075 01:03:08,270 --> 01:03:12,200 And we said that the Chernoff inequality was the same sort 1076 01:03:12,200 --> 01:03:17,780 of thing, except it was based on e to the rZ rather than Z 1077 01:03:17,780 --> 01:03:20,290 or Z squared. 1078 01:03:20,290 --> 01:03:24,620 And we talked a little bit about its properties. 1079 01:03:24,620 --> 01:03:28,790 The major thing one uses the Chernoff bound for is to get 1080 01:03:28,790 --> 01:03:33,020 good estimates very, very far away from the mean. 1081 01:03:33,020 --> 01:03:36,200 In other words, good estimates of probabilities that are 1082 01:03:36,200 --> 01:03:38,040 very, very small. 1083 01:03:38,040 --> 01:03:41,370 I've grown up using these all my life because I've been 1084 01:03:41,370 --> 01:03:43,440 concerned with error probabilities in 1085 01:03:43,440 --> 01:03:46,010 communication systems. 1086 01:03:46,010 --> 01:03:49,630 You typically want error probabilities that run between 1087 01:03:49,630 --> 01:03:53,420 10 to the minus fifth and 10 to the minus eighth. 1088 01:03:53,420 --> 01:03:58,940 So you want to look at points which are quite far away. 1089 01:03:58,940 --> 01:04:02,550 I mean, you take a large number of-- 1090 01:04:02,550 --> 01:04:05,230 you take a sum of a large number of variables, which 1091 01:04:05,230 --> 01:04:09,330 correspond to a code. 1092 01:04:09,330 --> 01:04:12,400 And you look at error probabilities for this rather 1093 01:04:12,400 --> 01:04:13,790 complicated thing. 1094 01:04:13,790 --> 01:04:16,380 But you're looking very, very far away from the mean, and 1095 01:04:16,380 --> 01:04:19,620 you're looking at very large numbers of observations.
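For reference, the bound he is about to restate is just the Markov inequality applied to the random variable e to the rZ:

```latex
\Pr\{Z \ge b\} \;=\; \Pr\bigl\{e^{rZ} \ge e^{rb}\bigr\}
             \;\le\; \frac{\mathsf{E}\bigl[e^{rZ}\bigr]}{e^{rb}}
             \;=\; g_Z(r)\, e^{-rb},
\qquad r > 0 .
```

For r < 0, it is the event Z less than or equal to b that makes e to the rZ at least e to the rb, so the same chain of steps bounds the lower tail instead.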
1096 01:04:19,620 --> 01:04:25,920 So instead of the kinds of things where we deal with 1097 01:04:25,920 --> 01:04:28,380 things like the central limit theorem where you're trying to 1098 01:04:28,380 --> 01:04:31,430 figure out what goes on close to the mean, here you're 1099 01:04:31,430 --> 01:04:36,170 trying to figure out what goes on very far from the mean. 1100 01:04:36,170 --> 01:04:40,990 OK, so what the Chernoff bound says is that the probability 1101 01:04:40,990 --> 01:04:45,820 that a random variable Z is greater than or equal to some 1102 01:04:45,820 --> 01:04:47,390 constant b-- 1103 01:04:47,390 --> 01:04:50,580 we don't even need sums of random variables here; the 1104 01:04:50,580 --> 01:04:54,590 Chernoff bound is just a bound on the tail of a 1105 01:04:54,590 --> 01:04:55,980 distribution-- 1106 01:04:55,980 --> 01:04:59,800 is less than or equal to the moment-generating function of 1107 01:04:59,800 --> 01:05:01,810 that random variable 1108 01:05:01,810 --> 01:05:08,090 -- g sub Z of r is the expected value of e to the rZ; 1109 01:05:08,090 --> 01:05:10,890 these generating functions, you can calculate 1110 01:05:10,890 --> 01:05:12,960 them if you want to -- 1111 01:05:12,960 --> 01:05:15,390 times e to the minus rb. 1112 01:05:15,390 --> 01:05:18,550 This is the Markov inequality for the random 1113 01:05:18,550 --> 01:05:21,750 variable e to the rZ. 1114 01:05:21,750 --> 01:05:26,330 And go back and review chapter 1. 1115 01:05:26,330 --> 01:05:29,770 I think it's section 1.43 or something. 1116 01:05:29,770 --> 01:05:34,180 It's the section that deals with the Markov inequality, 1117 01:05:34,180 --> 01:05:40,970 the Chebyshev inequality, and the Chernoff bound. 1118 01:05:40,970 --> 01:05:43,880 And as I told you once when we talked about these things, 1119 01:05:43,880 --> 01:05:45,620 Chernoff is still alive and well. 1120 01:05:45,620 --> 01:05:47,840 He's a statistician at Harvard. 1121 01:05:47,840 --> 01:05:51,480 He was somewhat embarrassed by this inequality becoming so 1122 01:05:51,480 --> 01:05:55,620 famous because he did it as sort of a throwaway thing in a 1123 01:05:55,620 --> 01:05:59,250 paper where he was trying to do something which was much 1124 01:05:59,250 --> 01:06:02,290 more mathematically sophisticated. 1125 01:06:02,290 --> 01:06:05,440 And now the poor guy is only known for this thing that he 1126 01:06:05,440 --> 01:06:06,690 views as being trivial. 1127 01:06:11,360 --> 01:06:14,220 But what the bound says is the probability that Z is greater 1128 01:06:14,220 --> 01:06:17,380 than or equal to b is this inequality. 1129 01:06:17,380 --> 01:06:20,840 Strangely enough, the probability that Z is less 1130 01:06:20,840 --> 01:06:25,710 than or equal to b is bounded by the same inequality. 1131 01:06:25,710 --> 01:06:28,980 But one of them, r is bigger than 0. 1132 01:06:28,980 --> 01:06:33,220 And the other one, r is less than 0. 1133 01:06:33,220 --> 01:06:35,230 And you have to go back and read that section to 1134 01:06:35,230 --> 01:06:37,800 understand why. 1135 01:06:37,800 --> 01:06:40,560 Now, this is most useful when it's applied to a sum of 1136 01:06:40,560 --> 01:06:42,270 random variables. 1137 01:06:42,270 --> 01:06:46,670 I don't know of any applications for it otherwise. 1138 01:06:46,670 --> 01:06:50,580 So if the moment-generating function-- 1139 01:06:50,580 --> 01:06:52,870 oh, incidentally, also.
1140 01:06:52,870 --> 01:06:56,380 When most people talk about moment-generating functions, 1141 01:06:56,380 --> 01:06:59,650 and certainly when people talked about moment-generating 1142 01:06:59,650 --> 01:07:04,640 functions before the 1950s or so, what they were always 1143 01:07:04,640 --> 01:07:08,830 interested in is the fact that if you take derivatives of the 1144 01:07:08,830 --> 01:07:12,540 moment-generating functions, you generate the moments of 1145 01:07:12,540 --> 01:07:14,980 the random variable. 1146 01:07:14,980 --> 01:07:17,860 If you take the derivative of this with respect to r, 1147 01:07:17,860 --> 01:07:22,970 evaluate it at r equals 0, you get the expected value of Z. 1148 01:07:22,970 --> 01:07:26,610 If you take the second derivative evaluated at r 1149 01:07:26,610 --> 01:07:30,720 equals 0, you get the expected value of Z 1150 01:07:30,720 --> 01:07:32,700 squared, and so forth. 1151 01:07:32,700 --> 01:07:36,810 You can see that by just taking the derivative of that. 1152 01:07:36,810 --> 01:07:38,580 Here, we're looking at something else. 1153 01:07:38,580 --> 01:07:42,200 We're not looking at what goes on around r equals 0. 1154 01:07:42,200 --> 01:07:45,640 We're trying to figure out what goes on way on the far 1155 01:07:45,640 --> 01:07:48,760 tails of these distributions. 1156 01:07:48,760 --> 01:07:56,860 So if gX of r is the expected value of e to the rX, then the expected value of e to the r Sn-- 1157 01:07:56,860 --> 01:07:59,380 Sn is the sum of these random variables-- 1158 01:07:59,380 --> 01:08:04,590 is the expected value of the product of e to the rXi. 1159 01:08:04,590 --> 01:08:07,300 Namely, it's e to the r times 1160 01:08:07,300 --> 01:08:09,150 the sum of the Xi. 1161 01:08:09,150 --> 01:08:11,020 So that turns into a product. 1162 01:08:11,020 --> 01:08:15,520 The expected value of a product of a finite number of 1163 01:08:15,520 --> 01:08:19,319 independent terms is the product of the expected values. 1164 01:08:19,319 --> 01:08:23,460 So it's gX of r to the n-th power. 1165 01:08:23,460 --> 01:08:27,200 So if I want to write this, now I'm applying the Chernoff 1166 01:08:27,200 --> 01:08:30,020 bound to the random variable S sub n. 1167 01:08:30,020 --> 01:08:32,880 What's the probability that S sub n is greater than or equal 1168 01:08:32,880 --> 01:08:34,840 to n times a? 1169 01:08:34,840 --> 01:08:39,000 It's gX of r to the n-th power times e to the minus rna. 1170 01:08:39,000 --> 01:08:41,260 That's what the Chernoff bound says. 1171 01:08:41,260 --> 01:08:46,640 This is the Chernoff bound over on the other side of the 1172 01:08:46,640 --> 01:08:49,240 distribution. 1173 01:08:49,240 --> 01:08:54,020 This only makes sense and has interesting values when a is 1174 01:08:54,020 --> 01:08:56,990 bigger than the mean or when a is less than the mean. 1175 01:08:56,990 --> 01:09:01,210 And when r is greater than 0 for this one and less than 0 1176 01:09:01,210 --> 01:09:02,460 for this one. 1177 01:09:07,370 --> 01:09:10,640 Now, this is easier to interpret and it's 1178 01:09:10,640 --> 01:09:13,729 easier to work with. 1179 01:09:13,729 --> 01:09:20,819 If you take that product of terms, gX of r to the n-th 1180 01:09:20,819 --> 01:09:27,020 power, and you visualize the logarithm of 1181 01:09:27,020 --> 01:09:31,850 gX of r, then you get this 1182 01:09:31,850 --> 01:09:33,114 quantity up here.
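In symbols, that chain of steps (independence of the Xi is what lets the expectation of the product split into a product of expectations):

```latex
\mathsf{E}\bigl[e^{rS_n}\bigr]
  = \mathsf{E}\Bigl[\prod_{i=1}^{n} e^{rX_i}\Bigr]
  = \prod_{i=1}^{n} \mathsf{E}\bigl[e^{rX_i}\bigr]
  = \bigl[g_X(r)\bigr]^{n},
\qquad\text{so}\qquad
\Pr\{S_n \ge na\} \;\le\; \bigl[g_X(r)\bigr]^{n} e^{-rna}
  = e^{\,n\,[\gamma_X(r) - ra]},
```

where gamma_X(r) = ln g_X(r).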
1183 01:09:41,340 --> 01:09:44,529 You get that the probability that Sn is greater than or equal to 1184 01:09:44,529 --> 01:09:51,350 na is this: e to the n times, gamma X of r minus ra. 1185 01:09:51,350 --> 01:09:57,600 Gamma is the logarithm of the moment-generating function. 1186 01:09:57,600 --> 01:10:00,710 The logarithm of the moment-generating function is 1187 01:10:00,710 --> 01:10:02,980 always called the semi-invariant 1188 01:10:02,980 --> 01:10:04,980 moment-generating function. 1189 01:10:04,980 --> 01:10:07,620 The name is, again, because people were originally 1190 01:10:07,620 --> 01:10:10,570 interested in the moment-generating properties 1191 01:10:10,570 --> 01:10:12,480 of these random variables. 1192 01:10:12,480 --> 01:10:17,060 If you sit down and take the derivatives, I can 1193 01:10:17,060 --> 01:10:19,080 probably do it here. 1194 01:10:19,080 --> 01:10:21,195 It's simple enough that I won't get confused. 1195 01:10:26,640 --> 01:10:37,140 The derivative with respect to r of the logarithm of g of r 1196 01:10:37,140 --> 01:10:44,810 is g prime of r divided by g of r. 1197 01:10:44,810 --> 01:10:52,890 And the second derivative is then-- the 1198 01:10:52,890 --> 01:10:55,660 natural log of g of r, 1199 01:10:55,660 --> 01:11:00,120 taking the derivative of that, is equal to g double prime of 1200 01:11:00,120 --> 01:11:06,000 r over g of r squared. 1201 01:11:06,000 --> 01:11:09,300 Tell me if I'm making a mistake here because I usually 1202 01:11:09,300 --> 01:11:11,360 do when I do this. 1203 01:11:11,360 --> 01:11:19,690 Minus g of r times g prime of r. 1204 01:11:22,950 --> 01:11:35,770 Probably divided by this squared. 1205 01:11:35,770 --> 01:11:37,020 Let's see. 1206 01:11:37,020 --> 01:11:38,470 Is this right? 1207 01:11:38,470 --> 01:11:41,810 Who can take derivatives here? 1208 01:11:41,810 --> 01:11:43,620 AUDIENCE: First term doesn't have a square in it. 1209 01:11:43,620 --> 01:11:43,970 PROFESSOR: What? 1210 01:11:43,970 --> 01:11:45,875 AUDIENCE: First term doesn't have a square in the 1211 01:11:45,875 --> 01:11:47,150 denominator. 1212 01:11:47,150 --> 01:11:49,780 PROFESSOR: First term? 1213 01:11:49,780 --> 01:11:51,610 Yeah. 1214 01:11:51,610 --> 01:11:53,280 Oh, the first thing doesn't have a square. 1215 01:11:53,280 --> 01:11:54,375 No, you're right. 1216 01:11:54,375 --> 01:11:56,350 AUDIENCE: Second one doesn't have-- 1217 01:11:56,350 --> 01:11:59,400 PROFESSOR: And the second one, let's see. 1218 01:11:59,400 --> 01:12:00,650 We have-- 1219 01:12:03,850 --> 01:12:06,230 we just have g prime of r squared 1220 01:12:06,230 --> 01:12:08,150 divided by g of r squared. 1221 01:12:08,150 --> 01:12:12,930 And we evaluate this at r equals 0. 1222 01:12:12,930 --> 01:12:14,930 This term becomes 1. 1223 01:12:14,930 --> 01:12:17,340 This term becomes 1. 1224 01:12:17,340 --> 01:12:22,760 This term becomes the second moment x squared bar. 1225 01:12:22,760 --> 01:12:26,300 And this term becomes x bar squared. 1226 01:12:26,300 --> 01:12:32,030 And this whole thing becomes the variance of 1227 01:12:32,030 --> 01:12:37,980 the random variable rather than the second moment. 1228 01:12:37,980 --> 01:12:43,100 All of these terms might be wrong, but this term is right. 1229 01:12:43,100 --> 01:12:47,010 And I'm sure all of you can rewrite that and evaluate it 1230 01:12:47,010 --> 01:12:48,190 at r equals 0.
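Cleaned up, the blackboard calculation is:

```latex
\gamma(r) = \ln g(r), \qquad
\gamma'(r) = \frac{g'(r)}{g(r)}, \qquad
\gamma''(r) = \frac{g''(r)}{g(r)} \;-\; \frac{\bigl(g'(r)\bigr)^{2}}{\bigl(g(r)\bigr)^{2}},
```

and at r = 0, using g(0) = 1, g'(0) = E[X], and g''(0) = E[X squared],

```latex
\gamma'(0) = \overline{X}, \qquad
\gamma''(0) = \overline{X^2} - \overline{X}^{\,2} = \sigma_X^2 .
```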
1231 01:12:48,190 --> 01:12:50,240 So that's why it's called the semi-invariant 1232 01:12:50,240 --> 01:12:52,280 moment-generating function. 1233 01:12:52,280 --> 01:12:55,610 It doesn't make any difference for what we're interested in. 1234 01:12:55,610 --> 01:12:59,550 The thing that we're interested in is that this 1235 01:12:59,550 --> 01:13:00,810 exponent here-- 1236 01:13:03,520 --> 01:13:07,490 as you visualize doing this experiment and taking 1237 01:13:07,490 --> 01:13:15,000 additional observations, what happens is the probability 1238 01:13:15,000 --> 01:13:19,480 that you exceed na-- 1239 01:13:19,480 --> 01:13:25,310 that the n-th sum exceeds n times some fixed quantity a is 1240 01:13:25,310 --> 01:13:26,950 going down exponentially with n. 1241 01:13:29,450 --> 01:13:32,100 Now, is this bound any good? 1242 01:13:35,150 --> 01:13:39,970 Well, if you optimize it over r, it's essentially 1243 01:13:39,970 --> 01:13:41,730 exponentially tight. 1244 01:13:41,730 --> 01:13:45,210 So, in fact, it is good. 1245 01:13:45,210 --> 01:13:48,030 What does it mean to be exponentially tight? 1246 01:13:48,030 --> 01:13:50,680 That's what I don't want to define carefully. 1247 01:13:50,680 --> 01:13:53,540 There's a theorem in the notes that says what exponentially 1248 01:13:53,540 --> 01:13:54,850 tight means. 1249 01:13:54,850 --> 01:13:58,250 And it takes you half an hour to read it because it's being 1250 01:13:58,250 --> 01:14:00,110 stated very carefully. 1251 01:14:00,110 --> 01:14:08,430 What it says essentially is that if I take this quantity 1252 01:14:08,430 --> 01:14:14,510 here and I subtract-- 1253 01:14:14,510 --> 01:14:16,710 I take an epsilon away from it. 1254 01:14:16,710 --> 01:14:22,510 Namely, e to the n times this quantity minus epsilon. 1255 01:14:22,510 --> 01:14:25,600 So I have an e to the minus n epsilon, see 1256 01:14:25,600 --> 01:14:26,720 it sitting in there? 1257 01:14:26,720 --> 01:14:31,170 When I take this exponent and I reduce it just a little bit, 1258 01:14:31,170 --> 01:14:33,120 I get a bound that isn't true. 1259 01:14:33,120 --> 01:14:35,850 This is greater than or equal to the 1260 01:14:35,850 --> 01:14:37,860 quantity with an epsilon. 1261 01:14:37,860 --> 01:14:40,490 In other words, you can't make an exponent that's any 1262 01:14:40,490 --> 01:14:42,350 smaller than this. 1263 01:14:42,350 --> 01:14:45,690 You can take coefficients and play with them, but you can't 1264 01:14:45,690 --> 01:14:48,750 make the exponent any smaller. 1265 01:14:48,750 --> 01:14:55,490 OK, all of these things, you can do them by pictures. 1266 01:14:55,490 --> 01:14:58,560 I know many of you don't like doing things by pictures. 1267 01:14:58,560 --> 01:15:02,060 I keep doing them by pictures because I keep trying to 1268 01:15:02,060 --> 01:15:05,820 convince you that pictures are more rigorous 1269 01:15:05,820 --> 01:15:07,750 than equations are. 1270 01:15:07,750 --> 01:15:10,850 At least, many times. 1271 01:15:10,850 --> 01:15:13,690 If you want to show that something is convex, you try 1272 01:15:13,690 --> 01:15:17,800 to show that the second derivative is positive. 1273 01:15:17,800 --> 01:15:20,640 That works sometimes and it doesn't work sometimes. 1274 01:15:20,640 --> 01:15:23,430 I mean, it works if the function is continuous and has a 1275 01:15:23,430 --> 01:15:25,450 continuous second derivative. 1276 01:15:25,450 --> 01:15:27,850 It doesn't work otherwise.
1277 01:15:27,850 --> 01:15:33,280 When you start taking tangents of the curve, and you say the 1278 01:15:33,280 --> 01:15:40,560 tangents to the curve all lie below the 1279 01:15:40,560 --> 01:15:42,640 function, then it works perfectly. 1280 01:15:42,640 --> 01:15:44,800 That's what a convex function is by definition. 1281 01:15:48,210 --> 01:15:49,810 How do we derive all this stuff? 1282 01:15:52,350 --> 01:15:56,320 What we're trying to do is to find-- 1283 01:15:56,320 --> 01:16:04,990 I mean, this inequality here is true for all r, for all r 1284 01:16:04,990 --> 01:16:09,710 greater than 0 so long as a is greater than the mean of X. 1285 01:16:09,710 --> 01:16:12,970 It's true for all r for which this moment-generating 1286 01:16:12,970 --> 01:16:15,020 function exists. 1287 01:16:15,020 --> 01:16:18,400 Moment-generating functions can sometimes blow up, so they 1288 01:16:18,400 --> 01:16:21,060 don't exist everywhere. 1289 01:16:21,060 --> 01:16:22,270 So it's true wherever the 1290 01:16:22,270 --> 01:16:25,140 moment-generating function exists. 1291 01:16:25,140 --> 01:16:29,890 So we like to find the r for which this bound is tightest. 1292 01:16:29,890 --> 01:16:33,240 So what I'm going to do is draw a picture and show you 1293 01:16:33,240 --> 01:16:37,160 where it's tightest in terms of the picture. 1294 01:16:37,160 --> 01:16:40,380 What I've drawn here is the semi-invariant 1295 01:16:40,380 --> 01:16:43,240 moment-generating function. 1296 01:16:43,240 --> 01:16:47,580 Why didn't I put that down? 1297 01:16:47,580 --> 01:16:51,130 This is gamma of r. 1298 01:16:51,130 --> 01:16:55,630 Gamma of r at 0, it's the log of the moment-generating 1299 01:16:55,630 --> 01:16:58,500 function at 0, which is 0. 1300 01:17:01,090 --> 01:17:03,000 It's convex. 1301 01:17:03,000 --> 01:17:06,190 You take its second derivative. 1302 01:17:06,190 --> 01:17:09,180 Its second derivative at r equals 0 is pretty easy. 1303 01:17:09,180 --> 01:17:12,250 Its second derivative at other values of r, you have to 1304 01:17:12,250 --> 01:17:13,500 struggle with it. 1305 01:17:16,010 --> 01:17:20,200 But when you struggle a little bit, it is convex. 1306 01:17:20,200 --> 01:17:23,770 If you've got a curve that goes down like this, then it 1307 01:17:23,770 --> 01:17:25,800 goes back up again. 1308 01:17:25,800 --> 01:17:28,100 Sometimes goes off towards infinity. 1309 01:17:28,100 --> 01:17:30,950 Might do whatever it wants to do. 1310 01:17:30,950 --> 01:17:34,790 Sometimes at a certain value of r, it stops existing. 1311 01:17:34,790 --> 01:17:37,750 Suppose I take the simplest random 1312 01:17:37,750 --> 01:17:39,800 variable you know about. 1313 01:17:39,800 --> 01:17:43,440 You only know two simple random variables. 1314 01:17:43,440 --> 01:17:46,030 One of them is a binary random variable. 1315 01:17:46,030 --> 01:17:49,420 The other one's an exponential random variable. 1316 01:17:49,420 --> 01:17:54,330 Suppose I take the exponential random variable with density 1317 01:17:54,330 --> 01:17:59,020 alpha times e to the minus alpha X. Where does this 1318 01:17:59,020 --> 01:18:02,580 moment-generating function exist? 1319 01:18:02,580 --> 01:18:16,220 You take alpha e to the minus alpha x and multiply it by e to the rx when you 1320 01:18:16,220 --> 01:18:17,470 integrate it. 1321 01:18:20,940 --> 01:18:22,190 Where does this exist? 1322 01:18:25,100 --> 01:18:27,110 I mean, don't bother to integrate it.
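For the record, the integral he tells you not to bother with works out to:

```latex
g_X(r) \;=\; \int_0^{\infty} \alpha\, e^{-\alpha x}\, e^{rx}\, dx
       \;=\; \int_0^{\infty} \alpha\, e^{-(\alpha - r)x}\, dx
       \;=\; \frac{\alpha}{\alpha - r},
\qquad r < \alpha ,
```

and the integral diverges for r greater than or equal to alpha, which is exactly the cutoff he describes next.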
1323 01:18:31,910 --> 01:18:36,630 If r is bigger than alpha, this exponent is bigger than 1324 01:18:36,630 --> 01:18:37,890 this exponent. 1325 01:18:37,890 --> 01:18:40,110 And this thing takes off towards infinity. 1326 01:18:40,110 --> 01:18:43,715 If r is less than alpha, the whole thing goes to 0. 1327 01:18:49,220 --> 01:18:59,020 gX of r exists for r less than alpha in this case. 1328 01:19:01,620 --> 01:19:06,290 And in general, if you look at a moment-generating function, 1329 01:19:06,290 --> 01:19:11,140 if the tail of that distribution function is going 1330 01:19:11,140 --> 01:19:14,880 to 0 exponentially, you find the rate at which it's going 1331 01:19:14,880 --> 01:19:16,980 to 0 exponentially. 1332 01:19:16,980 --> 01:19:18,650 And that's where the moment-generating 1333 01:19:18,650 --> 01:19:21,810 function cuts off. 1334 01:19:21,810 --> 01:19:23,630 It has to cut off. 1335 01:19:23,630 --> 01:19:27,070 You can't show a result like this, which says something is 1336 01:19:27,070 --> 01:19:30,710 going to 0, faster than it could possibly be going to 0. 1337 01:19:33,600 --> 01:19:35,460 So we have to have that kind of result. 1338 01:19:35,460 --> 01:19:37,760 But anyway, we draw this curve. 1339 01:19:37,760 --> 01:19:40,350 This is gamma sub X of r. 1340 01:19:40,350 --> 01:19:46,520 And then we say, how do we graphically minimize gamma of 1341 01:19:46,520 --> 01:19:50,900 r minus r times a? 1342 01:19:50,900 --> 01:19:57,140 Well, what I do because I've done this before and I know 1343 01:19:57,140 --> 01:19:59,090 how to do it-- 1344 01:19:59,090 --> 01:20:01,580 I mean, it's not the kind of thing where if you sat down 1345 01:20:01,580 --> 01:20:05,670 you would immediately settle on this. 1346 01:20:05,670 --> 01:20:09,710 I look at some particular value of r. 1347 01:20:09,710 --> 01:20:16,370 If I take a line of slope gamma prime of r, that's a 1348 01:20:16,370 --> 01:20:19,920 tangent to this curve because this curve is convex. 1349 01:20:19,920 --> 01:20:24,380 So if I take a line through here of this slope and I look 1350 01:20:24,380 --> 01:20:29,210 at where this line hits here, where does it hit? 1351 01:20:29,210 --> 01:20:34,320 It hits at gamma sub X of r, this point here, 1352 01:20:34,320 --> 01:20:38,230 minus r times gamma prime of r-- 1353 01:20:41,880 --> 01:20:43,130 oh. 1354 01:20:50,220 --> 01:20:55,100 Well, what I've done is I've already optimized the problem. 1355 01:20:55,100 --> 01:20:58,600 I'm trying to find the probability that Sn is greater 1356 01:20:58,600 --> 01:20:59,950 than or equal to na. 1357 01:20:59,950 --> 01:21:03,870 I'm trying to minimize this exponent here, gamma 1358 01:21:03,870 --> 01:21:06,890 X of r minus ra. 1359 01:21:06,890 --> 01:21:10,420 Unfortunately, I really start out by taking the derivative 1360 01:21:10,420 --> 01:21:13,260 of that and setting it equal to 0, which is what you would 1361 01:21:13,260 --> 01:21:15,120 all do, too. 1362 01:21:15,120 --> 01:21:19,330 When I set the derivative of this equal to 0, I get gamma 1363 01:21:19,330 --> 01:21:26,010 prime of r minus a is equal to 0, which is what this says. 1364 01:21:26,010 --> 01:21:31,540 So then we take a line of slope a, where gamma prime of r0 equals a. 1365 01:21:31,540 --> 01:21:34,130 It's tangent at this point here. 1366 01:21:34,130 --> 01:21:37,500 You look at this point over here and you get the minimum 1367 01:21:37,500 --> 01:21:41,290 value of gamma X of r minus ra, namely gamma X of r0 minus r0 a.
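Here is a numeric sketch of that minimization in Python, together with a check of the earlier exponential-tightness claim. The model and all the numbers are assumptions for illustration, not from the lecture: X is plus 1 with probability p and minus 1 with probability 1 - p.

```python
import math

# Assumed numbers for illustration: the mean of X is 2p - 1 = -0.5 < a.
p, n, a = 0.25, 100, 0.2

def gamma(r):
    """Semi-invariant MGF of X: the natural log of E[e^{rX}]."""
    return math.log(p * math.exp(r) + (1 - p) * math.exp(-r))

# Crude grid search for the r0 > 0 minimizing gamma(r) - r*a;
# at the minimizer, gamma'(r0) = a (the tangent condition above).
r0 = min((k / 1000 for k in range(1, 5000)),
         key=lambda r: gamma(r) - r * a)
bound = math.exp(n * (gamma(r0) - r0 * a))

# Exact tail for comparison: S_n >= n*a exactly when at least
# ceil(n(1 + a)/2) of the n steps are +1.
k_min = math.ceil(n * (1 + a) / 2)
exact = sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
            for j in range(k_min, n + 1))

print(f"Chernoff bound = {bound:.3e}, exact tail = {exact:.3e}")
```

The bound comes out above the exact tail, but only by a sub-exponential factor, which is what exponentially tight means a few paragraphs back.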
1368 01:21:45,370 --> 01:21:49,920 So what this says is, when you vary a, you can go through 1369 01:21:49,920 --> 01:21:57,440 this minimization by tilting this line around. 1370 01:21:57,440 --> 01:22:02,180 I mean, a determines the slope of this line here. 1371 01:22:02,180 --> 01:22:06,930 If I use a smaller value of a, the slope is smaller. 1372 01:22:06,930 --> 01:22:08,470 It hits in here. 1373 01:22:08,470 --> 01:22:14,220 If I take a larger value of a, it comes in further down and 1374 01:22:14,220 --> 01:22:15,500 the exponent gets bigger in magnitude. 1375 01:22:15,500 --> 01:22:16,840 That's not surprising. 1376 01:22:16,840 --> 01:22:19,700 I want to find out the probability that S sub n is 1377 01:22:19,700 --> 01:22:21,670 greater than or equal to na. 1378 01:22:21,670 --> 01:22:26,350 As I increase a, I expect this exponent to keep going down as 1379 01:22:26,350 --> 01:22:29,390 I make a bigger and bigger because it's harder and harder 1380 01:22:29,390 --> 01:22:33,700 for it to be greater than or equal to na. 1381 01:22:33,700 --> 01:22:37,570 So anyway, when you optimize this, you get something 1382 01:22:37,570 --> 01:22:39,720 exponentially tight. 1383 01:22:39,720 --> 01:22:42,540 And this is what it's equal to. 1384 01:22:42,540 --> 01:22:46,560 And I would recommend that you go back and read the section 1385 01:22:46,560 --> 01:22:50,100 of chapter 1, which goes through all of this in a 1386 01:22:50,100 --> 01:22:51,350 little more detail. 1387 01:22:56,820 --> 01:23:00,600 Let me go past that. 1388 01:23:00,600 --> 01:23:03,800 Don't want to talk about that. 1389 01:23:03,800 --> 01:23:09,640 Well, when I do this optimization, if what I'm 1390 01:23:09,640 --> 01:23:13,210 looking at is the probability that S sub n is greater than 1391 01:23:13,210 --> 01:23:17,000 or equal to some alpha rather than n times a when I do 1392 01:23:17,000 --> 01:23:20,120 this optimization and I'm looking at what happens at 1393 01:23:20,120 --> 01:23:24,170 different values of n, it turns out that when n is very 1394 01:23:24,170 --> 01:23:31,770 big, you get something which is tangent there. 1395 01:23:31,770 --> 01:23:36,340 As n gets smaller, you get these tangents that come down, 1396 01:23:36,340 --> 01:23:38,620 come in to there, and then they start 1397 01:23:38,620 --> 01:23:40,330 going back out again. 1398 01:23:40,330 --> 01:23:47,800 This e to the minus alpha r star is the tightest the bound ever gets. 1399 01:23:47,800 --> 01:23:54,390 That's the n at which errors in the hypothesis testing 1400 01:23:54,390 --> 01:23:57,400 usually occur. 1401 01:23:57,400 --> 01:23:59,120 It's the point at which-- 1402 01:23:59,120 --> 01:24:02,740 it's the n for which Sn greater than or equal to alpha 1403 01:24:02,740 --> 01:24:06,240 is most likely to occur. 1404 01:24:06,240 --> 01:24:12,500 And if you evaluate that for our friendly binary case 1405 01:24:12,500 --> 01:24:18,290 again, X equals 1 or X equals minus 1, what you find when 1406 01:24:18,290 --> 01:24:25,060 you evaluate that point alpha r star is that r star is equal 1407 01:24:25,060 --> 01:24:29,595 to log of 1 minus P over P. And our bound on the probability of the union 1408 01:24:29,595 --> 01:24:33,830 of the events Sn greater than or equal to alpha is approximately e to 1409 01:24:33,830 --> 01:24:38,030 the minus alpha r star, which is 1 minus P over P 1410 01:24:38,030 --> 01:24:40,690 to the minus alpha. 1411 01:24:40,690 --> 01:24:43,600 I mean, why do I torture you with this?
1412 01:24:43,600 --> 01:24:46,570 Because we solved this problem at the beginning of the 1413 01:24:46,570 --> 01:24:47,760 lecture, remember? 1414 01:24:47,760 --> 01:24:54,540 The probability that the sum S sub n for this binary 1415 01:24:54,540 --> 01:24:59,710 experiment is ever greater than or equal to k is equal to 1 minus 1416 01:24:59,710 --> 01:25:02,130 P over P to the minus k. 1417 01:25:02,130 --> 01:25:04,640 That's what it's equal to exactly. 1418 01:25:04,640 --> 01:25:09,960 When I go through all of this Chernoff bound stuff, I get 1419 01:25:09,960 --> 01:25:11,790 the same answer. 1420 01:25:11,790 --> 01:25:14,950 Now, this is a much harder way to do it, but this is a 1421 01:25:14,950 --> 01:25:16,240 general way of doing it. 1422 01:25:16,240 --> 01:25:18,340 And that's a very specialized way of doing it. 1423 01:25:18,340 --> 01:25:20,250 So we'll talk more about this next time.
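Filling in the algebra behind that comparison for the binary case, X = +1 with probability p and -1 with probability 1 - p:

```latex
\gamma_X(r) = \ln\bigl(p\,e^{r} + (1-p)\,e^{-r}\bigr),
\qquad
\gamma_X(r^*) = 0 \;\Longrightarrow\; p\,e^{r^*} + (1-p)\,e^{-r^*} = 1 .
```

Multiplying by e to the r star gives a quadratic in e to the r star with roots 1 and (1 - p)/p; the nontrivial root gives r star = ln((1 - p)/p), and hence

```latex
e^{-k r^*} \;=\; \Bigl(\frac{1-p}{p}\Bigr)^{-k} \;=\; \Bigl(\frac{p}{1-p}\Bigr)^{k},
```

matching the exact threshold-crossing probability he just quoted.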