The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, we have a busy day today, so let's get started. I want to go through Chernoff bounds and the Wald identity, which are closely related, as you'll see, and that involves coming back to the G/G/1 queue a little bit and making use of what we did for that. It also means coming back to hypothesis testing and using that. It would probably have been better to start out with Wald's identity and the Chernoff bound and then do the applications when it was the natural time for them. But anyway, this is the way it is this time, and next time we'll probably do it differently.

Suppose you have a random variable z, and it has a moment generating function. Remember, not all random variables have moment generating functions. It's a pretty strong restriction. You need a variance. You need moments of all orders. You need all sorts of things, but we'll assume it exists in some region between r minus and r plus. There's always a question, with moment generating functions, of whether they exist at that maximum value of r, because some of them exist at that value of r and then disappear immediately after it, and others just sort of peter out as r approaches r plus from below. I think in the homework this week, you have an example of both of those. I mean, it's a very simple issue. If you have an exponential distribution, then as r approaches the rate of that exponential distribution, obviously, the moment generating function blows up, because you're taking e to the minus lambda x, and you're multiplying it by e to the r x. And when r is equal to lambda, bingo, you're integrating a constant over an infinite range, so you've got infinity.
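Written out, that exponential calculation is

```latex
g_X(r) \;=\; \int_0^\infty \lambda e^{-\lambda x}\, e^{rx}\, dx
      \;=\; \frac{\lambda}{\lambda - r}, \qquad r < \lambda,
```

so g_X(r) blows up as r approaches r plus = lambda from below, and at r = lambda the integrand no longer decays, so the integral is infinite.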
If you multiply that exponential by something which makes the integral finite when you set r equal to lambda, then of course, you have something which is finite at r plus. That is a big pain in the neck. It's usually not important. The notes deal with it very carefully, so we're not going to deal with it here. We will just assume here that we're talking about r less than r plus and not worry about that special case, which usually is not all that important. But sometimes you have to worry about it.

OK, the Chernoff bound says that the probability that a random variable z is greater than or equal to alpha is less than or equal to the moment generating function evaluated at some arbitrary value r, times e to the minus r alpha. That's just the Markov inequality applied to the random variable e to the r z. And if you put it in terms of the semi-invariant moment generating function, the log of the moment generating function, then the bound is e to the gamma sub z of r minus r alpha.

When you see something like that, you ought to look at it and say, gee, that looks funny, because here we're taking an arbitrary random variable and saying the tails of it have to go down exponentially. That's exactly what this says. It says that as z takes on very large values, this is a fixed quantity here for a given value of r, and the bound is going down as e to the minus r times alpha. As you make alpha larger and larger, this goes down faster and faster. So what's going on? How do you take an arbitrary random variable and say the tails of it are exponentially decreasing? That's why you have to insist that the moment generating function exists, because when the moment generating function exists for some r, it means that the tail of that distribution is, in fact, going down at least that fast, so you get something that exists. So the question is, what's the best bound of this sort when you optimize over r?

Then the next thing we did is we said, if z is a sum of n IID random variables, then the semi-invariant moment generating function for that sum is equal to n times the semi-invariant moment generating function for the underlying random variable x.
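In symbols, with S_n = X_1 + ... + X_n and the X_i IID:

```latex
\gamma_{S_n}(r) \;=\; \ln \mathsf{E}\!\left[e^{r(X_1+\cdots+X_n)}\right]
\;=\; \ln \prod_{i=1}^{n} \mathsf{E}\!\left[e^{rX_i}\right]
\;=\; n\,\gamma_X(r).
```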
Here s sub n is the sum of n of these IID random variables. So one thing you see immediately, and it ought to be second nature to you now, is that if a random variable has a moment generating function over some range, the sum of a bunch of those IID random variables also has a moment generating function over that same range. You can just count on that, because the semi-invariant moment generating function for the sum is just n times the one for a single variable.

OK, so then what we've said is the probability that s sub n is greater than or equal to na, where na is playing the role of alpha and s sub n is playing the role of z, is just a minimum over r of e to the n times gamma sub x of r minus ra, and the n is multiplying the ra as well as the gamma sub x of r. OK, this is exponential in n for a fixed a. In other words, what do you do in this minimization, if you don't worry about the special cases or anything? How do you minimize something? Well, obviously, you want to minimize the exponent here, so you take the derivative: gamma prime of r has to be equal to a. Then n can be whatever it wants to be. When you find that optimum r, which is where gamma prime of r equals a, this goes down exponentially with n.

Now, however, we're interested in something else. We're interested in threshold crossings. We're not interested in picking a particular value of a and asking, as n gets very, very big, what's the probability that the sum of random variables is greater than or equal to n times a. That is exponential in n, but what we're interested in is the probability that s of n is greater than or equal to just some constant alpha, and what we're doing now is, instead of varying n and varying this threshold with n also, we're holding the threshold fixed. So we're asking, as n gets very, very large, but you hold this alpha fixed, what happens to this bound over here? Well, when you minimize this, taking the same simple-minded view, now the n is not multiplying the r alpha. It's just multiplying the gamma sub x of r. You get n times gamma prime of r equal to alpha at the minimum, so it says the exponent is optimized when you pick gamma prime of r equal to alpha over n.
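These minimizations are easy to carry out numerically. Here is a minimal sketch of the fixed-a version, assuming, purely for illustration, that X is Bernoulli(p); neither the model nor the numbers come from the lecture:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative sketch: optimize the Chernoff exponent for
# S_n = X_1 + ... + X_n with X_i ~ Bernoulli(p), i.e. minimize
# gamma(r) - r*a over r > 0, so P(S_n >= n*a) <= exp(n*(gamma(r) - r*a)).
p = 0.25   # hypothetical success probability, E[X] = p
a = 0.5    # per-sample threshold; must exceed E[X] for a useful bound

def gamma(r):
    # semi-invariant MGF of Bernoulli(p): ln E[e^{rX}]
    return np.log(1 - p + p * np.exp(r))

res = minimize_scalar(lambda r: gamma(r) - r * a,
                      bounds=(0.0, 20.0), method="bounded")
print(f"optimizing r (where gamma'(r) = a): {res.x:.4f}")
print(f"optimized exponent gamma(r) - r*a: {res.fun:.4f}")   # negative
```

For the Bernoulli case the optimized exponent comes out to minus the relative entropy between Bernoulli(a) and Bernoulli(p), which is one standard way of seeing that the Chernoff bound has the right exponential rate.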
This quantity is minimized when you pick gamma prime of r equal to alpha over n. So if you look at this bound as n changes, what's happening is, as n changes, r is changing also, so this is a harder thing to deal with for variable n. But graphically, it's quite easy to deal with. I'm not sure you all got the graphical argument last time when we went through it, so I want to go through it again.

Let's look at this exponent, r minus n over alpha times gamma of r, and see what it looks like. We'll take r, pick any old r, there. What we want to do is show that if you take a slope of alpha over n, and take an arbitrary r, come down to gamma sub x of r, draw a line with this slope, and look at where it hits the horizontal axis, that point is r plus the length of this segment here. The length of this segment is minus gamma of r -- gamma of r is a negative value -- times 1 over the slope of this line. And 1 over the slope of this line is n over alpha, so when I pick a particular value of r, the value of the exponent I have is this value here.

How do I optimize this over r? How do I get the largest exponent here? Well, I think of varying r. As I vary r from 0, each time, I take this straight line here. And I start here, draw a straight line over there, start here, draw a straight line over, start at this tangent here, draw a straight line over. And what happens when I come to larger values of r? Just because gamma sub x of r is convex, what happens is I start taking these slope lines, slope alpha over n, and they intercept the horizontal axis at a smaller value. So this is optimized over r at the value of r0 which satisfies alpha over n equals gamma prime of r0. That's the same answer we got before when we just used elementary calculus. Here, we're using a more sophisticated argument, which you learned about probably in 10th grade. I would argue that you learn mostly really sophisticated things when you're in high school, and then when you get to study engineering in college, somehow you always study these mundane things.
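Putting that graphical argument into symbols:

```latex
\Pr\{S_n \ge \alpha\} \;\le\; \exp\bigl(n\,\gamma_X(r) - r\alpha\bigr)
\;=\; \exp\!\Bigl(-\alpha\,\Bigl[\,r - \tfrac{n}{\alpha}\,\gamma_X(r)\Bigr]\Bigr),
```

and the bracketed quantity is exactly where the line of slope alpha over n through the point (r, gamma_X(r)) hits the horizontal axis, so minimizing the bound over r is the same as maximizing that intercept.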
But anyway, aside from that, why is this geometric argument better? Well, what happens when you look at these special cases where gamma of r comes around like this, and then suddenly it stops in midair and just doesn't exist anymore? So it comes around here, it's still convex, but then suddenly it goes off to infinity. How do you do that optimization then? Well, the graphical argument makes it clear how you do it, and makes it perfectly rigorous, whereas if you're doing it by calculus, you've got to really think it through, and it becomes fairly tricky.

OK, so anyway, now, the next question we want to ask-- I mean, at this point, we've seen how to minimize this quantity over r, so we know what this exponent is for a particular value of n. Now, what happens when we vary n? As you vary n, the thing that happens is we have this tangent line here, with slope alpha over n. When you start making n larger, alpha over n becomes smaller, so the slope becomes smaller. And as n approaches infinity, you wind up going way, way the heck out. As n gets smaller, you come in again. You keep coming in until you get to this point here. And what happens then? We're talking about a line of-- maybe I ought to draw it on the board. It would be clearer, I think.

As n gets smaller, you get a point which is tangent here, this here. When you're here, the tangent gets right here, so we've moved all the way into this quantity we call r star, which is the root of the equation gamma of r equals 0. Gamma of r equals 0 typically has two roots, one here, and one at 0. It always has a root at 0, because the moment generating function evaluated at 0 is always 1, so the log of it is always 0. There should be another root because this is convex, unless it drops off suddenly, and even if it drops off suddenly, you can visualize it as a straight line going off to infinity. So when you get down to this point, what happens? Well, we just keep moving along.
So as n increases, we start out very large. We come in. We hit this point, and then we start coming out again. I mean, if you think about it, that makes perfect sense, because what we're doing here is we're imagining an experiment where this random variable has a negative expected value. That's what's indicated by this quantity there. We're asking, what's the probability that the sum of a large number of IID random variables with a negative expected value ever rises above some positive threshold?

Well, the law of large numbers says it's not going to do that when n is very, very large, and this says that, too. It says the probability of it for n very large is extraordinarily small. It's e to the minus n times an exponent, which is very, very large. And when n is very small, it's not going to happen either, because it doesn't have time to get to the threshold. So there's some intermediate value of n at which it's most likely to cross the threshold, if you're going to cross the threshold, and that intermediate value is determined by this point r star at which gamma of r equals zero.

So the probability of this union of terms, namely the probability you ever cross alpha, is going to be, in some sense, approximately e to the minus alpha r star, because that's where the dominant term is. The dominant term is where alpha over n is equal to gamma prime. Blah, blah, blah, blah, blah, where'd I put that? r star satisfies gamma of r star equals 0. When you look at the line of slope gamma prime of r star, that's where you get this critical value of n where it's most likely to cross the threshold. OK, I put that somewhere. I thought it was on this slide, but the critical n, let's call it n crit-- is that right? Alpha over n is the gamma prime. Alpha over n crit, that is. n crit, this says, is alpha over gamma prime of r star.

OK, so that sort of nails down everything you want to know about the Chernoff bound except for the fact that it is exponentially tight. The text proves that.
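Here is that computation as a small numerical sketch; the numbers are hypothetical, and Gaussian increments are assumed so that gamma has a closed form:

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative sketch: for X ~ N(mu, sigma^2) with mu < 0,
# gamma(r) = mu*r + sigma^2 * r^2 / 2.  Find the positive root r* of
# gamma(r) = 0 and the critical n at which crossing alpha is most likely.
mu, sigma, alpha = -0.5, 1.0, 20.0

gamma = lambda r: mu * r + 0.5 * (sigma * r) ** 2
gamma_prime = lambda r: mu + sigma**2 * r

r_star = brentq(gamma, 1e-9, 100.0)    # the nonzero root (the other root is r = 0)
n_crit = alpha / gamma_prime(r_star)   # n_crit = alpha / gamma'(r*)
print(f"r* = {r_star:.3f}")            # here r* = -2*mu/sigma^2 = 1.0
print(f"n_crit = {n_crit:.1f}")        # here alpha/(-mu) = 40
print(f"dominant-term bound e^(-r* alpha) = {np.exp(-r_star * alpha):.2e}")
```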
I'm not going to go through the tightness proof here. Exponentially tight means that if you take an exponent which is just a little bit larger than the one you found here, and look at what happens as alpha gets very, very large, then you lose.

OK, let's go on. At this point, we're ready to talk about Wald's identity, and we'll prove Wald's identity at the end of the lecture today. It turns out there's a very, very simple proof of it. There's hardly anything to it, but it seems more important to use it in several ways first, so that you get a sense that it, in fact, is sort of important.

OK, so we want to think about a random walk, s sub n for n greater than or equal to 1, so it's a sequence of sums of random variables: s sub n is equal to x1 plus up to x sub n. The x's are all IID. This is the thing we've been talking about all term. We have a bunch of IID random variables. We look at the partial sums of them. We're interested in what happens to that sequence of partial sums. The question we're asking here is, does that sequence of partial sums ever cross a positive threshold? And now we're asking, does it ever cross a positive threshold, or does it cross a negative threshold, and which does it cross first? So the probability that it crosses this threshold is the probability that it goes up first. The probability that it crosses this threshold is the probability that it goes down first.

Now, what Wald's identity says is the following thing. We're going to assume that x is not identically 0. If x is identically 0, then it's never going to go any place. We're going to assume that it has a semi-invariant moment generating function in some region, r minus to r plus. That's the same as assuming that it has a moment generating function in that region, so it exists from some value less than zero to some value greater than zero. And we pick two thresholds, one of them positive, one of them negative, and we let j be the smallest value of n at which one of them is crossed. j is a random variable now, because we've started to run this random walk.
We run it until it crosses one of these thresholds, and if it crosses the positive threshold, j is the time at which it crosses the positive threshold. If it crosses the negative threshold, j is the time at which it crosses the negative threshold. We're only looking at the first threshold that it crosses.

Now, notice that j is a stopping trial. In other words, what that means is you can determine whether you've crossed a threshold at time n solely in terms of s1 up to s sub n. If you see all these sums, then you know whether you haven't crossed a threshold up until time n, or whether you have crossed it at time n. It doesn't make any difference what happens at times greater than n. OK, so it's a stopping trial in the same sense as the stopping trials we talked about before. You get the sense that Wald's identity, which we're talking about here, is sort of like Wald's equality, which we talked about before. Both of them have to do with these stopping trials. Both of them have everything to do with stopping trials.

Wald was a famous statistician, not all that much before your era. He didn't die too long ago. I forget when, but he was one of the good statisticians. See, he was a statistician who recognized that you wanted to look at lots of different models to understand the problem, rather than a statistician who only wanted to take data and think that he wasn't assuming anything. So Wald was a good guy.

And the trouble with his identity is, you look at it, and you blink. It involves the expected value of e to the r s sub j minus j gamma of r, where s sub j is the value of the random walk at the time when you cross a threshold, and you subtract the time j at which you've crossed the threshold, times gamma of r. So when you take the expected value of e to this, you're averaging over j, the time at which you crossed the threshold, and also over the value at which you crossed the threshold, so you're averaging over both of those things.
And Wald says this expectation is not just less than or equal to 1; it's exactly 1, and it's exactly 1 for every r between r minus and r plus. So it's a very surprising result. Yes?

AUDIENCE: Can you please explain why j cannot be defective? I don't really see it.

PROFESSOR: Oh, it's because we were looking at two thresholds. If we only had one threshold, then it could be defective. Since we're looking at two thresholds, you keep adding random variables in, and the sum starts to have a larger and larger variance. Now, even with a large variance, you're not sure that you crossed a threshold, but you see why you must cross a threshold. Yes?

AUDIENCE: If the MGF is defined at r minus, r plus, then is that also [INAUDIBLE] quality?

PROFESSOR: Yes. Oh, if it's defined at r plus. I don't know. I don't remember, and I would have to think about it hard. Funny things happen right at the ends of where these moment generating functions are defined, and you'll see why when we prove it.

I can give you a clue as to how we're going to prove it. What we're going to do is, for this random variable x, we're going to define another random variable which has the same distribution as x, except it's tilted. For large values of x, you multiply the density by e to the rx. For small values of x, you multiply it by e to the rx also. But if r is positive, that means the probability on large values gets shifted up, and the probability on small values gets shifted down. So you're taking a density that looks like this, and when you shift it to this tilted version, you're shifting the whole thing upward. When r is negative, you're shifting the whole thing downward. Now, with that tilted random variable, when it crosses the threshold, the time of crossing the threshold is still a random variable.
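The tilting being described is the standard exponential twist: if x has density f_X, the tilted variable has density

```latex
f_{X,r}(x) \;=\; f_X(x)\, e^{\,rx \,-\, \gamma_X(r)},
```

which integrates to 1 precisely because e to the gamma_X(r) is the expected value of e to the rX, the normalizing constant.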
You will see that this simply says that the expected value of that tilted random variable is equal to-- it says that the tilted random variable is, in fact, a genuine random variable. It's not defective. And it's the same argument as before: it has a finite variance, and therefore, since it has a finite variance, the walk keeps spreading out. It will cross one of the thresholds eventually.

OK, so the other thing you can do here is to say, suppose instead of crossing a threshold, you just fix the stopping rule to say we'll stop at time 100. If you stop at time 100, then what this says is the expected value of e to the r s sub 100 minus 100 times gamma of r is equal to 1. But that's obvious, because the expected value of e to the r s sub j, for fixed j, is the expected value of e to the r x raised to the jth power, so then you're subtracting off j times the log of the expected value of e to the r x. So it's a trivial identity if j is fixed.

OK, so Wald's identity says this. Let's see what it means in terms of crossing a threshold. We'll assume both thresholds are there. Incidentally, Wald's identity is valid in a much broader range of circumstances than just where you have two thresholds and you're looking at a threshold crossing. It's just that that's a particularly valuable form of the Wald identity, so that's the only thing we're going to use.

But now, if we assume further that this random variable x has a negative expectation, then gamma of r starts off going down. Usually, it comes back up again. We're going to assume that this quantity r star here, where it crosses 0 again, exists; namely, we're going to assume there is some value r star for which gamma of r star equals 0. That is, we're going to assume the typical case in which the curve comes back up and crosses the 0 point. And in that case, what it says is the probability that s sub j is greater than or equal to alpha is less than or equal to e to the minus r star times alpha. A very, very simple bound at this point.
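Since the identity looks so surprising, a quick Monte Carlo sanity check may help. This sketch is not from the lecture; it assumes hypothetical Gaussian increments and thresholds:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of Wald's identity, E[exp(r*S_J - J*gamma(r))] = 1,
# for an illustrative walk: X ~ N(mu, 1) with mu < 0, thresholds alpha, beta.
mu, alpha, beta, r = -0.5, 5.0, -5.0, 0.5
gamma = lambda r: mu * r + 0.5 * r * r   # semi-invariant MGF of N(mu, 1)

vals = []
for _ in range(20000):
    s, j = 0.0, 0
    while beta < s < alpha:              # run until a threshold is crossed
        s += rng.normal(mu, 1.0)
        j += 1
    vals.append(np.exp(r * s - j * gamma(r)))

print(f"E[exp(r*S_J - J*gamma(r))] ~ {np.mean(vals):.3f}")   # close to 1
```

The average hovers near 1 for any r strictly inside the region where gamma exists, though the estimate gets noisier as r grows, since rare up-crossings then carry very large weights.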
You look at this bound, and you sort of see why we're looking now not at r in general, but just at r star. At r star, gamma of r star is equal to 0, so that term goes away, and we're only talking about the expected value of e to the r star s sub j, which is equal to 1. So let's see what happens. We know that e to the r star s sub j is greater than or equal to 0 for all values of s sub j, because e to anything real is going to be positive. OK, since e to the r star s sub j is greater than or equal to 0, what we can do is break this expected value here -- this term is 0 now, remember -- into two terms. Break it into the term where s sub j is greater than or equal to alpha, and the term where s sub j is less than or equal to beta. I'm just going to ignore the case where it's less than or equal to beta. I'm going to take this expected value, and I'm going to write it as the probability that s sub j is greater than or equal to alpha, times the expected value of e to the r star s sub j given s sub j greater than or equal to alpha. There should be another term in here to make this an equality, and that's the probability that s sub j is less than or equal to beta, times the expected value of e to the r star s sub j given that s sub j is less than or equal to beta. We're going to ignore that, and that's why we get the less than or equal to 1 here.

Now, you can lower bound e to the r star s sub j under this condition. What's a lower bound to s sub j given that s sub j is greater than or equal to alpha? Alpha. OK, we're looking at all cases where s sub j is greater than or equal to alpha, and we're going to stop this experiment at the point where it first exceeds alpha. So we're going to lower bound the point where it first exceeds alpha by alpha itself. So this quantity is lower bounded, again, by the probability that s sub j is greater than or equal to alpha, times e to the r star alpha, and that whole thing is less than or equal to 1.
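The chain of inequalities just described, in one line:

```latex
1 \;=\; \mathsf{E}\!\left[e^{r^* S_J}\right]
\;\ge\; \Pr\{S_J \ge \alpha\}\;\mathsf{E}\!\left[e^{r^* S_J} \,\middle|\, S_J \ge \alpha\right]
\;\ge\; \Pr\{S_J \ge \alpha\}\; e^{r^*\alpha}.
```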
That says the probability that s sub j is greater than or equal to alpha is less than or equal to e to the minus r star alpha, which is what this inequality says here. OK, so this is not rocket science. This is a fairly simple result, if you believe in Wald's identity, which we'll prove later.

OK, so it's valid for all choices of this lower threshold. And remember, this probability here doesn't look like it's a function of both alpha and beta, but it is, because you're asking, what's the probability that you cross the threshold alpha before you cross the threshold beta? And if you push beta very, very far down, it makes it more likely that you're going to cross the threshold alpha. If you make beta very close to 0, then you're probably going to cross beta first. So this inequality here, this quantity here, depends on beta also. But we know that this inequality is valid no matter what beta is, so we can let beta approach minus infinity, and we still have this inequality. There's a little bit of tricky math involved in that. There's an exercise in the text which goes through that slightly tricky math, but what you find is that this bound is valid with only one threshold, as well as with two thresholds. But this proof here that we've given depends on a lower threshold, which is somewhere. We don't care where. It's valid for all choices of beta, so it's valid without a lower threshold.

The probability of the union over all n of s sub n less than or equal to alpha-- in other words, the probability that we ever cross a threshold alpha--

AUDIENCE: That's not the right inequality, is it?

PROFESSOR: What?

AUDIENCE: It's supposed to be s sub n larger, [INAUDIBLE] as the last time?

PROFESSOR: It's less than or equal to e to the minus r star alpha, which is--

AUDIENCE: Oh, s sub n? s sub n?

PROFESSOR: n.

AUDIENCE: You just [INAUDIBLE] [? the quantity? ?]

PROFESSOR: Oh, it's a union over all n greater than or
equal to 1. OK, in other words, this quantity we're dealing with here is the probability that s sub n-- oh, I see what you're saying. This quantity here should be greater than or equal to alpha. You're right. Sorry about that. I think it's right most places. Yes, it's right. We have it right here.

The probability of this union is really the same as the probability that the value of the walk, after it crosses the threshold, is greater than or equal to alpha. OK, now, we saw before that the probability that s sub n is greater than or equal to alpha-- excuse me, that's the same thing. When you're writing things in LaTeX, the symbol for less than or equal to is so similar to that for greater than or equal to that it's hard to keep them straight. That quantity there is a greater than or equal to sign, if you read it from left to right instead of right to left. So all we're doing here is simply using this, well, greater than or equal to.

OK, the corollary makes a stronger and cleaner statement, that the probability that you ever cross alpha is less than or equal to-- my heavens, my evil twin got hold of these slides. Let me rewrite that one. The probability of the union over all n of the event s sub n greater than or equal to alpha is less than or equal to e to the minus r star alpha.

OK, so we've seen from the Chernoff bound that for every n this bound is satisfied. This says that it's not only satisfied for each n; it's satisfied over all n collectively. Otherwise, if we were using the Chernoff bound, what would we have to do to get a handle on this quantity? We'd have to use the union bound, and when we use the union bound, we can show that for every n, the probability that s sub n is greater than or equal to alpha is less than or equal to this quantity. But then we'd have to add all those terms, and we would have to somehow diddle around with them to show that there are only a few of them which are close to this value, and all the rest are negligible.
And the number of terms that are close to that value grows with n, so it turns into a lot of headache. Here, we don't have to do any of that, because the Wald identity has saved us from all that difficulty.

OK, we talked about the G/G/1 queue. We're going to apply this corollary to the G/G/1 queue, to the queueing time, namely to the time w sub i that the ith arrival spends in the queue before starting to be served. You remember, when we looked at that, we found that if we define u sub i to be equal to the ith interarrival time minus the i minus first service time, those two are independent of each other, and u sub i is the difference between them. So u sub i is the difference between the ith interarrival time and the previous service time. What we showed was that the sequence of the sums of the u sub i is a modification of a random walk. In other words, the sums of the u sub i behave exactly like a random walk does, but every time it gets down to 0, if it crosses 0, it resets to 0 again. So it keeps bouncing up again.

If you look in the text, what it shows is that if you look at this sequence of u sub i's, and you look at the sum of them backward -- if you look at the sum of u sub i plus u sub i minus 1 plus u sub i minus 2, and so forth -- when you look at the sum that way, it actually becomes a random walk. Therefore, we can apply this bound to that random walk, and what we find is that the probability that the waiting time in queue of the nth customer is greater than or equal to an arbitrary number alpha is less than or equal to the probability that w sub infinity is greater than or equal to alpha, and that is less than e to the minus r star alpha.
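Here is that recipe as a tiny numerical sketch. It assumes, hypothetically, exponential interarrivals at rate lam and exponential services at rate mu, and it works with the difference taken as service minus interarrival, Z = Y - X, so that Z has negative mean when the queue is stable; the sign convention is a choice made here for the sketch:

```python
import numpy as np
from scipy.optimize import brentq

# Kingman-bound sketch (hypothetical M/M/1-style rates): interarrival
# X ~ Exp(lam), service Y ~ Exp(mu), lam < mu.  For Z = Y - X,
# gamma_Z(r) = ln[mu/(mu - r)] + ln[lam/(lam + r)]; solve gamma_Z(r*) = 0.
lam, mu = 1.0, 2.0

def gamma_z(r):
    return np.log(mu / (mu - r)) + np.log(lam / (lam + r))

r_star = brentq(gamma_z, 1e-9, mu - 1e-9)     # nonzero root, lies in (0, mu)
alpha = 5.0
print(f"r* = {r_star:.3f}")                   # works out to mu - lam = 1.0 here
print(f"P(W >= {alpha}) <= {np.exp(-r_star * alpha):.4f}")
```

For these exponential choices the root is r* = mu minus lam, which matches the known exponential decay of the M/M/1 waiting-time tail.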
So again, all you have to do is this: you have this interarrival time x, you have this service time y, you take the difference of the two -- that's a random variable -- you find the moment generating function of that random variable, you find the point r star at which that moment generating function equals 1, and then the bound says that the probability that the queueing time you're going to be dealing with exceeds alpha is at most e to the minus r star alpha. Yes?

AUDIENCE: What do you do when you have the gamma function go like this, and thus have infinity, and you cross it there? [INAUDIBLE] points that we're looking for?

PROFESSOR: For that, you have to read the text. I mean, effectively, you can think of it just as if gamma of r is a convex function like anything else. It just has a discontinuity in it, and bingo, it shoots off to infinity. So when you take these slope arguments, what happens is that all slopes beyond that point just seesaw around at one point. But the same bound holds.

OK, so that's the Kingman bound. Then we talked about large deviations for hypothesis tests. Well, actually, we just talked about hypothesis tests, but not large deviations for them. Let's review where we were on that.

Let's let the vector y be an n-tuple of IID random variables, y1 up to y sub n. They're IID conditional on hypothesis 0. They're also IID conditional on hypothesis 1. So the game is, nature chooses either hypothesis 0 or hypothesis 1. You take n samples of some IID random variable, and those n samples are IID conditional on either nature choosing 0 or nature choosing 1. At the end of taking those n samples, you're supposed to guess whether h0 is the right hypothesis or h1 is the right hypothesis. Invest in Apple stock 10 years ago, and one hypothesis is it's going to go broke. The other hypothesis is it's going to invent marvelous things, and your stock will go up by a factor of 50. You take some samples, and you make your decision on that.
Fortunately, with that, you can make a separate decision each year, but that's the kind of thing we're talking about. We're just restricting it to this case where you have n sample values that you take one after the other, and they're all IID given the particular value of the hypothesis that happens to hold.

OK, so we said there is something called a likelihood ratio. The likelihood ratio for a particular sequence y is lambda of y, equal to the density of y given h1 divided by the density of y given h0. Why is it h1 on the top and h0 on the bottom? Purely convention, nothing else. The only thing that distinguishes hypothesis 1 from hypothesis 0 is you choose one and call it 1, and you choose the other and call it 0. It doesn't make any difference how you do it. So after we make that choice, the likelihood ratio is that ratio.

Now, the reason for using semi-invariant moment generating functions is that this density here is a product of densities, and this density is a product of densities, and therefore, when you take the log of this ratio of products, you get the sum from i equals 1 to n of the log likelihood ratios for the single experiments. It's a single experiment that you're taking, based on the fact that all n observations are based on the same hypothesis, either h0 or h1. So the game that you're playing -- and please remember what the game is, if you forget everything else about this game -- is that the hypothesis gets chosen, and at the same time, you take n sample values. All n sample values correspond to the same value of the hypothesis.

OK, so when you do that, we're going to call z sub i this logarithm here, this log likelihood ratio. And then we showed last time that a threshold test -- well, we defined the threshold test as comparing the sum with the logarithm of a threshold. And the threshold is equal to p0 over p1, if in fact you're doing a maximum a posteriori probability test, where p0 and p1 are the a priori probabilities of the hypotheses. Remember how we did that.
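As a minimal simulation of that threshold test -- illustrative only; it assumes unit-variance Gaussian observations with means 0 under h0 and 1 under h1, and made-up a priori probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the MAP threshold test (hypothetical model): under H0, Y_i ~ N(0,1);
# under H1, Y_i ~ N(1,1).  One sample's log likelihood ratio is z_i = y_i - 1/2.
n, p0, p1 = 25, 0.6, 0.4
eta = p0 / p1                           # MAP threshold, eta = p0/p1

def llr(y):
    # ln f(y | H1) - ln f(y | H0) for unit-variance Gaussians with means 1 and 0
    return y - 0.5

trials, errors = 100_000, 0
for _ in range(trials):
    y = rng.normal(0.0, 1.0, n)         # H0 is actually true
    if llr(y).sum() > np.log(eta):      # decide H1: an error under H0
        errors += 1
print(f"Pr(error | H0) ~ {errors / trials:.4f}")
```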
The derivation was a very simple thing. You just write out what the probability is of hypothesis 0 together with a sequence of n values of y. You write out what the probability is of hypothesis 1 together with that same sequence of values, with the appropriate probability on that sequence for h equals 1 and for h equals 0. And what you get out of that is that the threshold test sums up all the z sub i's, compares the sum with the threshold, and makes a choice, and that is the MAP choice.

OK, so conditional on h0, you're going to make an error if the sum of the z sub i's is greater than the logarithm of eta. And conditional on h1, you're going to make an error if the sum is less than or equal to log eta. I denote these as the random variable z sub i given 0, to make sure that you recognize that this random variable here is conditional on h0 in this case, and conditional on h1 in the opposite case.

OK, so the exponential bound for z sub i given 0-- OK, so what we're doing now is we're saying, suppose that 0 is the actual value of the hypothesis. The experimenter doesn't know this. What the experimenter does is what the experimenter has been told to do, namely: take these n values, y1 up to y sub n, find the likelihood ratio, and compare that likelihood ratio with the threshold. If the likelihood ratio is larger than the threshold, decide 1. If it's smaller than the threshold, decide the opposite thing. It decides 1 if it's above the threshold, 0 if it's below the threshold.

Well, the first thing we want to do, then, is to find the semi-invariant moment generating function of the log likelihood ratio under the assumption that 0 is the correct hypothesis, and something very remarkable happens here. Gamma sub 0 of r is the logarithm -- because it's a semi-invariant moment generating function -- of the expected value of e to the r times z sub i. When we take the expected value, we integrate over f of y given h0, times e to the r times log of f of y given h1 over f of y given h0. You look at this, and what do you get?
This quantity here is e to the r times log of f of y given h1. That whole quantity in there is just f of y given h1 to the rth power. And what we have in this quantity here is f of y given h0 to the minus r power. So this term combined with this term gives us f to the 1 minus r of y given h0, and this quantity here is f to the r of y given h1, integrated dy. So the semi-invariant moment generating function is this quantity here. At r equals 1, the integral is just the integral of f of y given h1, which is 1, so the log of it is equal to 0.

So what we're saying is that, for any old detection problem in the world, so long as this moment generating function exists, what happens is it starts at 0, it comes down, it comes back up again, and r star is equal to 1. That's what we've just shown. When r is equal to 1, this whole thing is equal to 1, so the log of 1 is equal to 0. For every one of these problems, you know where this intercept is and you know where this intercept is: one is at 0, one is at 1.

What we're going to do now is try to find out what the probability of error is given that h equals 0 is the correct hypothesis. So we're assuming that the probabilities are actually f of y given h0. We calculate this quantity that looks like this, and we ask what is the probability that this sum of random variables exceeds the threshold, the log of eta. So the thing that we do is we draw a line of slope natural log of eta divided by n. We draw that slope along here, and we find that the probability of error is upper bounded by e to the n times the quantity gamma 0 of r0, where r0 is defined by that slope, minus r0 times log of eta divided by n. That's all there is to it. Any questions about that? Seem obvious? Seem strange?

OK, so the probability of error conditional on h equals 0 is e to the n times the quantity gamma 0 of r0 minus r0 times natural log of eta over n. And q sub l of eta is the probability of error given that h is equal to l.
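To make that concrete numerically, here is a small sketch. The Gaussian pair is the same assumption as before, and n and eta are picked arbitrarily for the example; the two printed intercepts being zero is the general fact just derived.

    import numpy as np

    # Same assumed pair as above: f0 = N(0,1), f1 = N(1,1).
    y = np.linspace(-12, 13, 5001)
    f0 = np.exp(-y ** 2 / 2) / np.sqrt(2 * np.pi)
    f1 = np.exp(-(y - 1) ** 2 / 2) / np.sqrt(2 * np.pi)

    def gamma0(r):
        # gamma_0(r) = ln of the integral of f0^(1-r) * f1^r dy
        return np.log(np.trapz(f0 ** (1 - r) * f1 ** r, y))

    print(gamma0(0.0), gamma0(1.0))   # both ~0: the two known intercepts

    # Bound on Pr{error | h=0} for n tests and threshold eta (both assumed):
    n, eta = 100, 1.0
    slope = np.log(eta) / n
    best = min(gamma0(r) - r * slope for r in np.linspace(0.05, 0.95, 91))
    print(np.exp(n * best))           # e^{n(gamma_0(r0) - r0 ln(eta)/n)}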
OK, we can do the same thing for hypothesis 1. We're asking what's the probability of error given that h equals 1 is the correct hypothesis, and given that we choose a threshold; say we know the a priori probabilities, so we choose a threshold that way.

OK, we go through the same argument. Gamma 1 of s is the natural log of the integral of f of y given h1 times e to the s (we're using s in place of r here) times the natural log of f of y given h1 over f of y given h0. And in this quantity, now, the f of y given h1 is upstairs, so we have f to the 1 plus s of y given h1. This quantity is down here, so we have f to the minus s of y given h0. And we notice that when s is equal to minus 1, this is again equal to 0, and we notice also, if you compare this, that gamma 1 of s is equal to gamma 0 of s plus 1. These two functions are the same; one just shifts the other by one unit.

OK, so this is one of the very strange things about hypothesis testing, namely you are calculating these expected values, but you're calculating the expected value of a likelihood ratio. And the likelihood ratio involves the conditional densities under both hypotheses, so when you calculate that expectation, what you get is this funny quantity here, which is related to what you get when you calculate the semi-invariant moment generating function given the other hypothesis.

So now, what we wind up with for q1 of eta is e to the n times the quantity gamma 0 of r0 minus (r0 minus 1) times natural log of eta over n. I'm using the fact that gamma 1 of s is equal to gamma 0 of s plus 1 -- s is just r shifted over by 1 -- so I can do the same optimization for each. So what I wind up with is that the probability of error conditional on hypothesis 0 is this quantity down here. That's this one, and for the probability of error conditional on the other hypothesis, the exponent is equal to this quantity here.

OK, so what that says is that as you shift the threshold -- in other words, suppose instead of using a MAP test, you say, well, I want the probability of error to be small when hypothesis 0 is correct. I want it to be small when hypothesis 1 is correct. I have a trade-off between those two.
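Before turning to that trade-off, the shift property is easy to check numerically. Here is a sketch, again with the assumed Gaussian pair; the three values of s are arbitrary.

    import numpy as np

    # Numerical check of gamma_1(s) = gamma_0(s+1), same assumed pair.
    y = np.linspace(-12, 13, 5001)
    f0 = np.exp(-y ** 2 / 2) / np.sqrt(2 * np.pi)
    f1 = np.exp(-(y - 1) ** 2 / 2) / np.sqrt(2 * np.pi)

    def gamma0(r):
        return np.log(np.trapz(f0 ** (1 - r) * f1 ** r, y))

    def gamma1(s):
        # gamma_1(s) = ln of the integral of f1^(1+s) * f0^(-s) dy
        return np.log(np.trapz(f1 ** (1 + s) * f0 ** (-s), y))

    for s in (-1.0, -0.5, 0.25):
        print(s, gamma1(s), gamma0(s + 1))   # the last two columns agree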
How do I choose my threshold in order to get the smallest value overall? So you say, well, you're stuck. You have one exponent under hypothesis 0. You have another exponent under hypothesis 1. You have this curve here. You can take whatever value you want over here, and that sticks you with a value here. You can rock things around this inverted seesaw, and you can make one probability of error bigger by making the other one smaller, or the other way around.

Namely, what you're doing is changing the threshold, and as you make the threshold larger, what you're doing is making it harder to accept h equals 1 and easier to accept h equals 0. When you move the threshold the other way, you're making it easier the other way. This, in fact, gives you the choice between the two. You decide you're going to take n tests. You can make both of these smaller by making n bigger. But there's a trade-off between the two, and the trade-off is given by this tangent line to this curve here. And you're always stuck with r star equals 1 in all of these problems. So the only question is what does this curve look like?

Notice that the expected value of the log likelihood ratio given h equals 0 is negative. The expected value given h equals 1 is positive, and that's just because of the form of the likelihood ratio.

OK, so this actually shows these two exponents. These are the exponents for the two kinds of errors. You can view this as a large deviation form of the Neyman-Pearson test. In the Neyman-Pearson test, you're doing things in a very detailed way, and you're choosing between different thresholds to make the probability of error of one type bigger and the other type smaller, or the other way around.
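To see the seesaw concretely, you can sweep the tangent point r0 across (0, 1) and print the pair of exponents it pins down. The closed form below is for the same assumed Gaussian pair, where gamma 0 of r works out to r(r-1)/2; the formulas for the two exponents are the tangent-line intercepts at r equals 0 and r equals 1.

    # Sweeping the tangent point r0 traces the trade-off between the two
    # error exponents. For the assumed Gaussian pair above, gamma_0 has
    # the closed form gamma_0(r) = r(r-1)/2, with derivative r - 1/2.
    def gamma0(r):
        return r * (r - 1) / 2

    def gamma0_prime(r):
        return r - 0.5

    for r0 in [0.1 * k for k in range(1, 10)]:
        e0 = gamma0(r0) - r0 * gamma0_prime(r0)        # exponent given h=0
        e1 = gamma0(r0) + (1 - r0) * gamma0_prime(r0)  # exponent given h=1
        print(round(r0, 1), round(e0, 3), round(e1, 3))
    # As r0 -> 1, e0 improves toward -1/2 while e1 collapses toward 0,
    # and the reverse as r0 -> 0: the inverted seesaw.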
Here, in the large deviation form, the test becomes an upper bound rather than an exact calculation, but it tells you much, much more, because for most of these threshold tests, you're going to do enough experiments that your probability of error is going to be very small. So the only question is where do you really want the error probability to be small? You can make it very small one way by shifting the curve this way, and make it very small the other way by shifting the curve the other way. And you take your choice of which you want.

OK, the a priori probabilities are usually not the essential characteristic when you're dealing with this large deviation kind of result, because when you take a large number of tests, this threshold term, log eta over n, becomes relatively small as n becomes very large. So that's not the thing you're usually concerned with. What you're concerned with is whether, with one kind of error, the patient dies, and with the other kind, the tests cost a lot of money; or with one kind of error, the nuclear plant blows up, and with the other kind, you waste a lot of money which you wouldn't have had to pay otherwise.

OK, now, here's the important part of all of this. So far, it looked like there wasn't any way to get out of this trade-off between choosing a threshold to make the error probability small one way, or making the error probability small the other way. And you think, well, yes, there is a way to get around it. What I should do is what I do in real life, namely if I'm trying to decide about something, I don't like to waste my time deciding about it, so as soon as the decision becomes relatively straightforward, I make up my mind. If the decision is not straightforward, if I don't have enough evidence, I keep doing more tests. So sequential tests are an obvious thing to try if you can do them. What we have here, what we've shown, is we have two coupled random walks.
Given hypothesis h equals 0, we have one random walk, and that random walk is typically going to go down. Given h equals 1, we have another random walk. That random walk is typically going to go up. And one is going to go down and one is going to go up because we've defined the random variable involved as the log of f of y given h1 divided by f of y given h0, which is why the 1 walk goes up and the 0 walk goes down.

Now, the thing we're going to do is a sequential test. We're going to keep doing experiments until we cross a threshold. We're going to decide what threshold is going to give us a small enough probability of error under each condition, and then we choose that threshold. And we continue to test until we get there. So we want to find out whether we've gained anything by that, how much we've gained if we gained something by it, and so forth.

OK, when you use two thresholds, alpha is going to be bigger than 0, and beta is going to be less than 0. The expected value of z given h0 is less than 0, but the expected value of z given h1 is greater than 0. That's why the walks are coupled, and we can handle each of them separately and get the answers for one from the answers for the other. Crossing alpha is a rare event for the random walk with h0, because with the random walk with h0, you're typically going to go down. You hardly ever go up. Yes?

AUDIENCE: Can you please explain again the sign of the expectations?

PROFESSOR: The sign of the expectations? Yes. z is the log of f of y given h1 over f of y given h0, so that when we actually have h equals 1, the expected value of this is going to be lined up with this term on top. When we have h equals 0, it's lined up with the term on the bottom. I mean, actually, you have to go through and actually show that the integral of f of y given h1 times this quantity is greater than 0, and the other one is less than 0. We don't really have to do that because, if we calculate this moment generating function, we can pick it off of there.
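Those two integrals are the divergences D(f1||f0) and minus D(f0||f1), so the signs are automatic; a quick numerical check, under the same assumed Gaussian pair:

    import numpy as np

    # E[z | h0] = -D(f0||f1) < 0 and E[z | h1] = D(f1||f0) > 0,
    # checked for the assumed pair f0 = N(0,1), f1 = N(1,1).
    y = np.linspace(-12, 13, 5001)
    f0 = np.exp(-y ** 2 / 2) / np.sqrt(2 * np.pi)
    f1 = np.exp(-(y - 1) ** 2 / 2) / np.sqrt(2 * np.pi)
    z = np.log(f1 / f0)             # log likelihood ratio, here y - 1/2

    print(np.trapz(f0 * z, y))      # E[z | h0] ~ -0.5, negative
    print(np.trapz(f1 * z, y))      # E[z | h1] ~ +0.5, positive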
When we look at this moment generating function, that slope there is the expected value of z conditional on h equals 0, and because of the shifting property, this slope here is the expected value of z given h equals 1, just because the 1 curve is shifted from the other by one unit. It's really because of that ratio. If you defined it the other way, you would just change the sign, so nothing important would happen.

OK, so r star equals 1 for the h0 walk, so the probability of error given h0 is less than or equal to e to the minus alpha. Well, that's a nice simple result, isn't it? In fact, that's really beautiful. You just calculate this moment generating function, you find the root of it, and you're done. You have a nice bound, and in fact, it's an exponentially tight bound. And on the other hand, when you deal with the probability of error given h1, by symmetry, it's less than or equal to e to the beta. Beta is a negative number, remember, so this is exponentially going down as you choose beta smaller and smaller.

So the thing that we're getting is that we can make each of these error probabilities as small as we want: this one, by making alpha big; this one, by making beta a big negative number. There must be a cost to this. OK, but what's the cost? What happens when you make alpha big?

When hypothesis 1 is the correct hypothesis, what normally happens is that this random walk is going to go up roughly at a slope of the expected value of z given h equals 1. So when you make alpha very, very large, you're forced to make a very large number of tests when h is equal to 1. When you make beta very large in magnitude, you're forced to take a large number of tests when h is equal to 0. So the trade-off here is a little bit funny. You make your error probability for h equals 0 very, very small at the cost of more money when hypothesis 1 is the correct hypothesis, because you don't make a decision until you've really climbed way up on this random walk.
And that means it takes a long time when you have h equals 1. Since, when h is equal to 1, the probability of crossing this lower threshold is almost negligible, this expected time that it takes is really just a function of the h equals 1 behavior. I'm going to show that in the next slide.

When you increase alpha, it lowers the probability of error given h equals 0 (excuse me, I should have h equals 0 instead of h sub 0) exponentially, but it increases the expected number of steps until you make a decision given h1. The expected value of j given h1 is effectively equal to alpha divided by the expected value of z given h1. Why is that? That's essentially Wald's equality. Not Wald's identity, but Wald's equality, because --

Yes, it says, from Wald's equality, since alpha is essentially equal to the expected value of s sub j given h equals 1: the number of tests you have to take when h is equal to 1, when alpha is very, very large, is effectively the amount of time that it takes you to get up to the point alpha. The point where you stop is typically pretty close to alpha itself, so alpha there is close to the expected value of s sub j given h equals 1. So Wald's equality, given h equals 1, says the expected value of j given h1 is equal to the expected value of s sub j given h equals 1, that's alpha, divided by the expected value of z given h1, which is just the expected log likelihood ratio for a single test. So to get this result, we just substitute alpha for that expected value.

And then the probability of error given h equals 0, if we write it this way, we see the cost immediately. The exponent alpha is the expected value of j given h equals 1, in other words the expected number of tests given h equals 1, times the expected value of the log likelihood ratio given h equals 1. When you decrease beta, that lowers the probability of error given h1 exponentially, but it increases the number of tests when h0 is the correct hypothesis.
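Here is a minimal simulation of that relationship. The Gaussian pair, the thresholds, and the number of trials are all assumptions for the example; the point is just that the measured expected value of j given h1 lands near alpha over the expected value of z given h1.

    import numpy as np

    # Sequential test under the assumed pair f0 = N(0,1), f1 = N(1,1):
    # sample until the sum of the z_i crosses alpha > 0 (decide 1) or
    # beta < 0 (decide 0). Compare E[J | h1] with alpha / E[z | h1].
    rng = np.random.default_rng(1)
    d, alpha, beta = 1.0, 8.0, -8.0
    Ez_given_h1 = d ** 2 / 2        # expected step size under h1

    def run_once():
        s, j = 0.0, 0
        while beta < s < alpha:
            y = rng.normal(d, 1.0)  # h1 is the true hypothesis
            s += d * y - d ** 2 / 2 # one z_i
            j += 1
        return j, s >= alpha

    results = [run_once() for _ in range(2000)]
    mean_J = np.mean([j for j, _ in results])
    print(mean_J, alpha / Ez_given_h1)   # close (~16), up to overshoot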
So in that case, the probability of error given h equals 1 is effectively e to the expected value of j given h equals 0 times the expected value of z given h equals 0. The first factor is just the number of tests you have to do when h is equal to 0. The second factor is the expected value of the log likelihood ratio when h is equal to 0, which is negative. This is very approximate, but this is how you would actually choose how big to make alpha and how big to make beta if you want to do a test between these two hypotheses.

Now, this shows what you're gaining by the sequential test over the non-sequential test. You don't have this in your notes, so you might just jot it down quickly. The expected value of z, conditional on h equals 0, is this slope here, the slope of the moment generating function at r equals 0. That's the slope for the underlying random variable. Since this point is at r equal to 1, this point down here is the expected value of z given h equals 0. That's the exponent you get when h equals 0 is, in fact, the correct hypothesis, for the probability of error given that h is equal to 0, namely the probability that you choose hypothesis 1. Same way over here. This slope here is the expected value of the log likelihood ratio given h equals 1. This hits down here at minus the expected value of z given h equals 1. So you have this exponent going one way and this exponent going the other way, where the thing multiplying the exponent is not the fixed number n but is, in fact, the expected number of tests you have to do under the other hypothesis.

Now, if we do the fixed test, what we're stuck with is a test where you take a line tangent to this curve, which goes from here across to there. We can seesaw it around. When we seesaw it all the way in the limit, we can get this result here. But we get this result here at the cost of an error which is almost one in the other case, so that's not a very good deal. This says that sequential testing -- well, it shows you how much you gain by doing a sequential test.
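Side by side, the two regimes look roughly like this; a sketch only, using the approximations above, with the sequential lines ignoring overshoot.

    % sequential test, thresholds alpha > 0 > beta:
    \Pr\{e \mid H=0\} \lesssim e^{-\alpha}
        \approx \exp\!\bigl(-\,\mathrm{E}[J \mid H=1]\,\mathrm{E}[Z \mid H=1]\bigr),
    \qquad
    \Pr\{e \mid H=1\} \lesssim e^{\beta}
        \approx \exp\!\bigl(\mathrm{E}[J \mid H=0]\,\mathrm{E}[Z \mid H=0]\bigr)

    % fixed-length test with n samples, tangent point r_0:
    \Pr\{e \mid H=0\} \le \exp\!\bigl(n[\gamma_0(r_0) - r_0\,\gamma_0'(r_0)]\bigr),
    \qquad
    \Pr\{e \mid H=1\} \le \exp\!\bigl(n[\gamma_0(r_0) + (1-r_0)\,\gamma_0'(r_0)]\bigr)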
I mean, it might not be intuitively obvious why this is happening. Really, the reason it's happening is that the times when you want to make the test very long are those times when the walk is not doing what it typically does. If h is equal to 0, you normally go down. The next most normal thing is that you wobble around without doing anything for a long time, in which case you want to keep doing additional tests until finally it falls down, or finally it goes up. But by taking additional tests, you make it very unlikely that you're ever going to cross the wrong threshold. So that's the thing you're gaining. You are gaining the fact that the error is small in those situations where the sum of these random variables stays close to 0 for a long time; you don't make errors in those cases.

We now have just a little bit of time to prove Wald's identity. I don't want to have a lot of time to prove it because proofs of theorems are things you really have to look at yourselves. This one, you almost don't have to look at. This one is almost obvious as soon as you understand what a tilted probability is.

So let's suppose that x sub n is a sequence of IID discrete random variables. It has a moment generating function for some given r. We're going to assume that these random variables are discrete now to make this argument simple. If they're not discrete, this whole argument has to be replaced with all sorts of [INAUDIBLE] integrals and all of that stuff. It's exactly the same idea, but it just is messy mathematically.

So what we're going to do is define a tilted random variable. A tilted random variable is a random variable in a different probability space. OK, we start out with this probability space that we're interested in, and then we say, OK, just to satisfy our imaginations, suppose the probabilities are different.
We assume that the probability, for a given r, that the random variable X is equal to little x, namely this quantity here, is equal to the original probability that X is equal to little x (all the sample values are the same, it's just the probabilities that have changed) times e to the rx minus gamma of r. So we're tilting these probabilities: when x is large, we're magnifying them; when x is small, we're knocking them down.

What's the purpose of this factor? It's just a normalization factor. e to the minus gamma of r is 1 over the moment generating function at r, so you take p of x times e to the rx and divide it by g of r. So this is a probability mass function, just as this is. This is the correct probability mass function for the model you're looking at. This is an imaginary one, but you can always imagine. You can say, let's suppose that we had this model instead of the other model. All the sample values are the same, but the probabilities are different. So we want to see what we can find out from these different probabilities in this different probability model.

If you sum over x here, this sum is equal to 1, as we just said. So we'll view q sub X,r of x as the probability mass function on x in a new probability space. We can use all the laws of probability in this new space, and that's exactly what we're going to do. And we're going to say things about the new space, but then we can always come back to the old space from this formula here, because whatever we find out in the new space will tell us something in the old space.

One thing we'd like to do is to find the expected value of the random variable x in this new probability space. This isn't the expected value in the old space; it's the expected value in the new space. It's the sum over x of x times q sub X,r of x. That's what the expected value is. X is the same in both spaces. It's just the probabilities that have changed.
These probabilities are p of x times e to the rx minus gamma of r, so when you sum this, what you get is 1 over g sub X of r, which is that term, times the derivative with respect to r of the sum of p sub X of x times e to the rx. When you take this derivative, you get an x in front, which is that x there. So you get g prime sub X of r over g sub X of r, which is gamma prime of r. OK, so in terms of that graph we've drawn, when you take these tilted probabilities, you move that slope from r equals 0 to whatever r you're looking at. And that gives you the expected value there.

OK, if you have a joint tilted probability mass function -- and don't think it gets any more complicated; it doesn't. I mean, you've already gone through the major complication of this argument. The joint tilted PMF of x1 up to xn is the old probability of x1 up to xn times all of these tilted factors here. If you let A of sn be the set of n-tuples which have the same sum sn, then all these exponents add up to r times s sub n minus n gamma of r. So what you get is that for each n-tuple for which the sum is sn, this tilted probability becomes the old probability times e to the r sn minus n gamma of r.

That says something about the tilted probability of the sum. We said that when we tilt these probabilities, we can do everything in the new space that we could do in the old space; we can do everything that probability theory allows us to do, so we can look at the probability of s sub n in the new space also. We take the probability of sn in the old space, namely we sum this quantity over all n-tuples in A of sn, and what we get is the probability p sub Sn at sn times this quantity, e to the r sn minus n gamma of r, which is fixed on that set. So this is the key to a lot of large deviation theory.
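Here is a tiny sketch of that tilting operation on a made-up three-point PMF (the support and the probabilities are arbitrary assumptions), checking both that the tilted probabilities sum to 1 and that the tilted mean is gamma prime of r.

    import numpy as np

    # Tilting a made-up PMF: q_r(x) = p(x) e^{rx} / g(r), then checking
    # that q_r sums to 1 and that its mean equals gamma'(r) = g'(r)/g(r).
    x = np.array([0.0, 1.0, 2.0])
    p = np.array([0.5, 0.3, 0.2])   # hypothetical PMF
    r = 0.7

    g = np.sum(p * np.exp(r * x))   # moment generating function at r
    q = p * np.exp(r * x) / g       # tilted PMF
    print(q.sum())                  # 1.0

    # g'(r) is the sum of x p(x) e^{rx}, so gamma'(r) = g'(r) / g(r)
    gamma_prime = np.sum(x * p * np.exp(r * x)) / g
    print(q @ x, gamma_prime)       # identical: the tilted mean is gamma'(r)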
Any time you're dealing with a difficult problem, and you want to see what's happening way, way away from the mean, you want to see what these sums look like in those exceptional cases. What we do is look at a new model where we tilt the probabilities so that the region of concern becomes the typical region for the tilted model. So for r greater than 0, we're tilting the probability towards large values, and then you can use the law of large numbers, the central limit theorem, whatever you want to, in that new space.

Now, we can prove Wald's identity. What Wald's identity rests on is the statement that when you tilt these probabilities, a stopping rule in this tilted world still stops: the stopping time is still a random variable, namely you still stop with probability 1. Somebody questioned whether you stop with probability 1 in the old world. Like I said, you do, because you have this positive variance, and the sum with two thresholds keeps spreading out. Here, you have the same thing. I mean, the mean doesn't make any difference at all. You're looking at trying to exceed one of two different thresholds, and eventually, you exceed one of them no matter where you set r.

So what this is saying is that the probability that j is equal to n in this tilted space is equal to the probability that j is equal to n in the old space times e to the r sn minus n gamma of r. So this quantity is equal to the expected value of e to the r s sub n minus n gamma of r, given j equals n, times the probability that j is equal to n. You sum this over n and, bingo, you're back at the Wald identity. So that's all the Wald identity is: just a statement that when you tilt a probability, and you have a stopping rule on the original probabilities, you then have a stopping rule on the new probabilities. And Wald's identity says -- well, Wald's identity holds whenever that tilted stopping rule is a random variable.

OK, that's it for today. We will do martingales on Wednesday.