The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: We're going to finish today our discussion of limit theorems. I'm going to remind you what the central limit theorem is, which we introduced briefly last time. We're going to discuss what exactly it says and its implications. And then we're going to apply it to a couple of examples, mostly on the binomial distribution.

OK, so the situation is that we are dealing with a large number of independent, identically distributed random variables. And we want to look at the sum of them and say something about the distribution of the sum. We might want to say that the sum is distributed approximately as a normal random variable, although, formally, this is not quite right.
As n goes to infinity, the distribution of the sum becomes very spread out, and it doesn't converge to a limiting distribution. In order to get an interesting limit, we first need to take the sum and standardize it. By standardizing it, what we mean is to subtract the mean and then divide by the standard deviation -- that is, Zn = (Sn - n*mu) / (sigma * sqrt(n)). Now, the mean is, of course, n times the expected value of each one of the X's. And the standard deviation is the square root of the variance. The variance is n times sigma squared, where sigma squared is the variance of the X's -- so the standard deviation is sigma times the square root of n. And after we do this, we obtain a random variable that has 0 mean -- it's centered -- and variance equal to 1. And the variance stays the same, no matter how large n is going to be. So the distribution of Zn keeps changing with n, but it cannot change too much. It stays in place: the mean is 0, and the width remains roughly the same because the variance is 1. The surprising thing is that, as n grows, the distribution of Zn settles into a certain asymptotic shape.
And that's the shape of a standard normal random variable. So standard normal means that it has 0 mean and unit variance. More precisely, what the central limit theorem tells us is a relation between the cumulative distribution function of Zn and the cumulative distribution function of the standard normal. So for any given number c, the probability that Zn is less than or equal to c, in the limit, becomes the same as the probability that the standard normal is less than or equal to c. And of course, this is useful because these probabilities are available from the normal tables, whereas the distribution of Zn might be a very complicated expression if you were to calculate it exactly.

So, some comments about the central limit theorem. The first thing is that it's quite amazing that it's universal. It doesn't matter what the distribution of the X's is. It can be any distribution whatsoever, as long as it has finite mean and finite variance. And when you go and do your approximations using the central limit theorem, the only things that you need to know about the distribution of the X's are the mean and the variance.
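To make the statement concrete, here is a small simulation sketch. The choice of X's uniform on [0, 1], the sample size n = 16, the trial count, and the function names are all illustrative assumptions, not part of the lecture:

```python
import math
import random

def phi(c):
    # standard normal CDF, written using the error function
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))

def empirical_zn_cdf(n, c, trials=20000, seed=0):
    """Empirical P(Zn <= c), where Zn = (Sn - n*mu) / (sigma*sqrt(n))
    and the X's are uniform on [0, 1] (mu = 1/2, sigma^2 = 1/12)."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        z = (s - n * mu) / (sigma * math.sqrt(n))
        if z <= c:
            hits += 1
    return hits / trials

# P(Zn <= 1) should already be close to Phi(1) for moderate n
print(phi(1.0), empirical_zn_cdf(16, 1.0))
```

Even at n = 16 the empirical CDF of Zn sits within simulation noise of the standard normal CDF, which is the content of the theorem.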
You need those in order to standardize Sn. I mean -- to subtract the mean and divide by the standard deviation -- you need to know the mean and the variance. But these are the only things that you need to know in order to apply it.

In addition, it's a very accurate computational shortcut. The distribution of these Zn's, in principle, you could calculate by convolving the distribution of the X's with itself many, many times. But this is tedious, and if you try to do it analytically, it might come out a very complicated expression. Whereas by just appealing to the table for the standard normal random variable, things are done in a very quick way. So it's a nice computational shortcut if you don't need an exact answer to a probability problem.

Now, at a more philosophical level, it justifies why we are really interested in normal random variables.
Whenever you have a phenomenon which is noisy, and the noise that you observe is created by adding up lots of little pieces of randomness that are independent of each other, the overall effect that you're going to observe can be described by a normal random variable.

So, in a classic example that goes back 100 years or so, suppose that you have a fluid, and inside that fluid there's a little particle of dust or whatever that's suspended in there. That little particle gets hit by molecules completely at random -- and so what you're going to see is that particle kind of moving randomly inside that liquid. Now, for that random motion, if you ask, after one second, how much is my particle displaced, let's say, along the x direction -- that displacement is very, very well modeled by a normal random variable. And the reason is that the position of that particle is decided by the cumulative effect of lots of random hits by molecules. So that's a celebrated physical model that goes under the name of Brownian motion.
And it's the same model that some people use to describe the movement in the financial markets. The argument might go that the movement of prices has to do with lots of little decisions and lots of little events by many, many different actors that are involved in the market. So the distribution of stock prices might be well described by normal random variables. At least that's what people wanted to believe until somewhat recently. Now, the evidence is that, actually, these distributions are a little more heavy-tailed, in the sense that extreme events are a little more likely to occur than what normal random variables would seem to indicate. But as a first model, again, it could be a plausible argument to have, at least as a starting model, one that involves normal random variables.

So this is the philosophical side of things. On the more precise, mathematical side, it's important to appreciate exactly what kind of statement the central limit theorem is. It's a statement about the convergence of the CDF of these standardized random variables to the CDF of a normal. So it's a statement about convergence of CDFs.
It's not a statement about convergence of PMFs or convergence of PDFs. Now, if one makes additional mathematical assumptions, there are variations of the central limit theorem that talk about PDFs and PMFs. But in general, that's not necessarily the case. And I'm going to illustrate this with -- I have a plot here which is not in your slides, but just to make the point. Consider two different discrete distributions. One discrete distribution takes the values 1, 4, and 7. The other discrete distribution takes the values 1, 2, 4, 6, and 7. So the first one has a sort of periodicity of 3; for the other one, the range of values is a little more interesting. The numbers in these two distributions are cooked up so that they have the same mean and the same variance.

Now, what I'm going to do is to take eight independent copies of the random variable and plot the PMF of the sum of the eight random variables. If I plot the PMF of the sum of 8 of these, I get the plot which corresponds to the bullets in this diagram.
If I take 8 random variables according to the other distribution, add them up, and compute their PMF, the PMF I get is the one denoted here by the X's. The two PMFs look really different, at least when you eyeball them. On the other hand, if you were to plot their CDFs and compare them with the normal CDF, which is this continuous curve -- the CDF, of course, goes up in steps because we're looking at discrete random variables -- it's very close to the normal CDF. And if, instead of n equal to 8, we were to take 16, then the agreement would be even better. So in terms of CDFs, when we add 8 or 16 of these, we get very close to the normal CDF. We would get essentially the same picture if I were to take 8 or 16 of the other kind. So the CDFs sit essentially on top of each other, although the two PMFs look quite different. So this is to appreciate that, formally speaking, we only have a statement about CDFs, not about PMFs.

Now, in practice, how do you use the central limit theorem? Well, it tells us that we can calculate probabilities by treating Zn as if it were a standard normal random variable.
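The CDF comparison in those plots can be reproduced numerically by exact convolution. A sketch, using one illustrative three-point PMF on the values 1, 4, 7 (the exact weights on the lecturer's slide aren't reproduced here, so uniform weights are assumed):

```python
import math

def convolve(p, q):
    """Exact PMF of the sum of two independent discrete random variables."""
    out = {}
    for x, px in p.items():
        for y, py in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

def pmf_power(p, n):
    """PMF of the sum of n i.i.d. copies of p (n-fold convolution)."""
    out = {0: 1.0}
    for _ in range(n):
        out = convolve(out, p)
    return out

def max_cdf_gap(p, n):
    """Largest gap between the CDF of the n-fold sum and the
    normal CDF with matching mean and variance."""
    mu = sum(x * px for x, px in p.items())
    var = sum((x - mu) ** 2 * px for x, px in p.items())
    pn = pmf_power(p, n)
    m, s = n * mu, math.sqrt(n * var)
    gap, cdf = 0.0, 0.0
    for x in sorted(pn):
        cdf += pn[x]
        normal = 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        gap = max(gap, abs(cdf - normal))
    return gap

p = {1: 1/3, 4: 1/3, 7: 1/3}
print(round(max_cdf_gap(p, 8), 3), round(max_cdf_gap(p, 16), 3))
```

The PMF of the sum keeps its period-3 spikes, but the worst-case CDF gap shrinks as n goes from 8 to 16, which is exactly the sense in which the theorem holds.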
Now, Zn is a linear function of Sn. Conversely, Sn is a linear function of Zn. Linear functions of normals are normal. So if I pretend that Zn is normal, it's essentially the same as if we pretend that Sn is normal. And so we can calculate probabilities that have to do with Sn as if Sn were normal. Now, the central limit theorem does not tell us that Sn is approximately normal -- the formal statement is about Zn -- but, practically speaking, when you use the result, you can just pretend that Sn is normal.

Finally, it's a limit theorem, so it tells us about what happens when n goes to infinity. If we are to use it in practice, of course, n is not going to be infinity. Maybe n is equal to 15. Can we use a limit theorem when n is a number as small as 15? Well, it turns out that it's a very good approximation. Even for quite small values of n, it gives us very accurate answers. So n on the order of 15, or 20, or so gives us very good results in practice.
There are no good theorems that will give us hard guarantees, because the quality of the approximation does depend on the details of the distribution of the X's. If the X's have a distribution that, from the outset, looks a little bit like the normal, then for small values of n you are going to see, essentially, a normal distribution for the sum. If the distribution of the X's is very different from the normal, it's going to take a larger value of n for the central limit theorem to take effect.

So let's illustrate this with a few representative plots. Here, we're starting with a discrete uniform distribution that goes from 1 to 8. Let's add 2 of these random variables -- 2 random variables with this PMF -- and find the PMF of the sum. This is a convolution of 2 discrete uniforms, and I believe you have seen this exercise before. When you convolve this with itself, you get a triangle. So this is the PMF for the sum of two discrete uniforms. Now let's continue. Let's convolve this with itself. This is going to give us the PMF of a sum of 4 discrete uniforms. And we get this, which starts looking like a normal.
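That two-fold convolution step can be checked directly. A minimal sketch (the dictionary representation of the PMF is just one convenient choice):

```python
# PMF of one draw, uniform on 1..8
p = {k: 1 / 8 for k in range(1, 9)}

# convolve p with itself: PMF of the sum of two independent draws
s2 = {}
for x in p:
    for y in p:
        s2[x + y] = s2.get(x + y, 0.0) + p[x] * p[y]

# the result is triangular: it rises linearly from 2 up to a peak
# at 9 (probability 8/64), then falls back down to 16
print({k: round(v, 4) for k, v in sorted(s2.items())})
```

The endpoints 2 and 16 each have probability 1/64, and every step toward the middle adds one more way to make the total, hence the triangle.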
If we go to n equal to 32, then it looks, essentially, exactly like a normal, and it's an excellent approximation. So this is the PMF of the sum of 32 discrete random variables with this uniform distribution.

Now, this distribution is symmetric around its mean. If we start instead with a PMF which is non-symmetric -- here, this is a truncated geometric PMF -- then things do not work out as nicely when I add 8 of these. That is, if I convolve this with itself 8 times, I get this PMF, which maybe resembles the normal one a little bit. But you can really tell that it's different from the normal if you focus on the details here and there. Here it sort of rises sharply; here it tails off a bit more slowly. So there's an asymmetry present, which is a consequence of the asymmetry of the distribution we started with. If we go to 16, it looks a little better, but you can still see the asymmetry between this tail and that tail. If we get to 32, there's still a little bit of asymmetry, but at least now it starts looking like a normal distribution.
So the moral from these plots is that the value of n you need before you get a really good approximation may vary a little bit. But for values of n in the range of 20 to 30 or so, usually you expect to get a pretty good approximation. At least that's what visual inspection of these graphs tells us.

So now that we know that we have a good approximation in our hands, let's use it. Let's use it by revisiting an example from last time. This is the polling problem. We're interested in the fraction f of the population that has a certain habit, and we try to find what f is. The way we do it is by polling people at random and recording the answers that they give -- whether they have the habit or not. So for each person, we get a Bernoulli random variable: with probability f, a person is going to respond 1, or yes; and with the remaining probability 1-f, the person responds no. We record this number, Mn, which is how many people answered yes, divided by the total number of people -- that's over the fraction of the population that we asked.
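The polling setup can be sanity-checked by simulation. A sketch, assuming for illustration that f = 1/2 (which lets us count "yes" answers as the set bits of a block of random fair bits -- that trick is specific to f = 1/2); the function name, seed, and trial count are arbitrary choices:

```python
import random

def poll_miss_probability(n, trials=20000, seed=1):
    """Monte Carlo estimate of P(|Mn - f| >= 0.01) when f = 1/2:
    each respondent is a fair coin, so the number of 'yes' answers
    among n respondents is the popcount of n random bits."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        yes = bin(rng.getrandbits(n)).count("1")
        if abs(yes / n - 0.5) >= 0.01:
            misses += 1
    return misses / trials

# with n = 10,000, the CLT calculation later in the lecture
# (using the conservative sigma = 1/2) predicts roughly 4.5%
print(poll_miss_probability(10000))
```

Running this gives an estimate in the vicinity of 4 to 5 percent, matching the normal-approximation answer derived below with n = 10,000.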
This is the fraction inside our sample that answered yes. And as we discussed last time, you might start with some specs for the poll. The specs have two parameters -- the accuracy that you want, and the confidence that you want to have that you really did obtain the desired accuracy. So the spec here is that we want probability 95% that our estimate is within 1 percentage point of the true answer. So the event of interest is this: that the distance of the result of the poll from the true answer is bigger than 1 percentage point. And we're interested in calculating, or approximating, this particular probability. We want to do it using the central limit theorem. And one way of arranging the mechanics of this calculation is to take the event of interest and massage it, by subtracting and dividing things on both sides of the inequality, so that you bring into the picture the standardized random variable, the Zn, and then apply the central limit theorem. So the event of interest -- let me write it in full -- Mn is this quantity, so I'm putting it here, minus f, which I write as nf divided by n. So this is the same as that event.
We're going to calculate the probability of this. This is not exactly in the form in which we apply the central limit theorem. To apply the central limit theorem, we need, down here, to have sigma times the square root of n. So how can I put sigma square root n here? I can divide both sides of this inequality by sigma, and then I can take a factor of square root n from here and send it to the other side. So this event is the same as that event -- this will happen if and only if that will happen. So calculating the probability of this event here is the same as calculating the probability that this event happens. And now we are in business, because the random variable that we have in here is Zn -- or rather the absolute value of Zn -- and we're talking about the probability that the absolute value of Zn is bigger than a certain number. Since Zn is to be approximated by a standard normal random variable, our approximation is going to be: instead of asking for the absolute value of Zn to be bigger than this number, we will ask for the absolute value of Z to be bigger than this number. So this is the probability that we want to calculate, and now Z is a standard normal random variable.
There's a small difficulty -- the one that we also encountered last time. And the difficulty is that the standard deviation, sigma, of the Xi's is not known. Sigma, in this example, is the square root of f times (1-f), and the only thing that we know about sigma is that it's going to be a number less than or equal to 1/2.

OK, so we're going to have to use an inequality here. We're going to use a conservative value of sigma -- the value sigma equal to 1/2 -- instead of the exact value of sigma. And this gives us an inequality going this way. Let's just make sure why the inequality goes this way. We've got, on our axis, two numbers. One number is 0.01 square root n divided by sigma. The other number is 0.02 square root n. And my claim is that the numbers are related to each other in this particular way. Why is this? Sigma is less than 1/2, so 1/sigma is bigger than 2. And since 1/sigma is bigger than 2, this means that this number sits to the right of that number.
So here we have the probability that Z is bigger than this number. The probability of falling out there is less than the probability of falling in this region. So that's what the last inequality is saying -- this probability is smaller than that probability. The first one is the probability that we're interested in, but since we don't know sigma, we take the conservative value, and we use an upper bound in terms of the probability of this region here.

And now we are in business. We can start using our normal tables to calculate probabilities of interest. So, for example, let's say that we take n to be 10,000. How is the calculation going to go? We want to calculate the probability that the absolute value of Z is bigger than 0.02 times the square root of 10,000 -- that is, 0.02 times 100 -- which is the probability that the absolute value of Z is larger than or equal to 2. And here let's do some mechanics, just to stay in shape. The probability of being larger than or equal to 2 in absolute value -- since the normal is symmetric around its mean -- is going to be twice the probability that Z is larger than or equal to 2.
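This numerical step can be checked directly with the error function. A sketch (note the table value 0.9772 used in the lecture is a rounding of Phi(2); computing with more digits gives about 0.0455 rather than 0.0456):

```python
import math

def phi(c):
    # standard normal CDF, via the error function
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))

n = 10000
sigma_bound = 0.5                      # conservative bound on sigma
c = 0.01 * math.sqrt(n) / sigma_bound  # = 0.02 * sqrt(n) = 2.0
p_error = 2.0 * (1.0 - phi(c))         # P(|Z| >= c), by symmetry
print(round(c, 2), round(p_error, 4))
```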
Can we use the cumulative distribution function of Z to calculate this? Well, almost -- the CDF gives us probabilities of being less than something, not bigger than something. So we need one more step, and we write this as 1 minus the probability that Z is less than or equal to 2. And this probability, now, you can read off from the normal tables. The normal tables will tell you that this probability is 0.9772. And you do get an answer: the answer is 0.0456.

OK, so we tried 10,000, and we find that our probability of error is about 4.5%, so we're doing better than the spec that we had. This tells us that maybe we have some leeway. Maybe we can use a smaller sample size and still stay within our specs. Let's try to find out how much we can push the envelope. How much smaller can we take n?

To answer that question, we need to do this kind of calculation, essentially, going backwards. We're going to fix this number to be 0.05 and work backwards here to find -- did I make a mistake here? 10,000 -- so I'm missing a 0 here.
417 00:23:53,700 --> 00:23:57,440 418 00:23:57,440 --> 00:24:07,540 Ah, but I'm taking the square root, so it's 100. 419 00:24:07,540 --> 00:24:11,080 Where did the 0.02 come in from? 420 00:24:11,080 --> 00:24:12,020 Ah, from here. 421 00:24:12,020 --> 00:24:15,870 OK, all right. 422 00:24:15,870 --> 00:24:19,330 0.02 times 100, that gives us 2. 423 00:24:19,330 --> 00:24:22,130 OK, all right. 424 00:24:22,130 --> 00:24:24,240 Very good, OK. 425 00:24:24,240 --> 00:24:27,570 So we'll have to do this calculation now backwards, 426 00:24:27,570 --> 00:24:33,510 figure out if this is 0.05, what kind of number we're 427 00:24:33,510 --> 00:24:41,380 going to need here and then here, and from this we will be 428 00:24:41,380 --> 00:24:45,240 able to tell what value of n we need. 429 00:24:45,240 --> 00:24:53,670 OK, so we want to find n such that the probability that Z is 430 00:24:53,670 --> 00:25:04,870 bigger than 0.02 square root n is 0.05. 431 00:25:04,870 --> 00:25:09,320 OK, so Z is a standard normal random variable. 432 00:25:09,320 --> 00:25:16,810 And we want the probability that we are 433 00:25:16,810 --> 00:25:18,640 outside this range. 434 00:25:18,640 --> 00:25:21,940 We want the probability of those two tails together. 435 00:25:21,940 --> 00:25:24,960 436 00:25:24,960 --> 00:25:26,920 Those two tails together should have 437 00:25:26,920 --> 00:25:29,990 probability of 0.05. 438 00:25:29,990 --> 00:25:33,280 This means that this tail, by itself, should have 439 00:25:33,280 --> 00:25:36,900 probability 0.025. 440 00:25:36,900 --> 00:25:45,960 And this means that this probability should be 0.975. 441 00:25:45,960 --> 00:25:52,350 Now, if this probability is to be 0.975, what 442 00:25:52,350 --> 00:25:54,970 should that number be? 443 00:25:54,970 --> 00:25:59,980 You go to the normal tables, and you find which is the 444 00:25:59,980 --> 00:26:03,190 entry that corresponds to that number.
445 00:26:03,190 --> 00:26:07,020 I actually brought a normal table with me. 446 00:26:07,020 --> 00:26:12,740 And 0.975 is down here. 447 00:26:12,740 --> 00:26:15,420 And it tells you that the number that 448 00:26:15,420 --> 00:26:19,820 corresponds to it is 1.96. 449 00:26:19,820 --> 00:26:24,890 So this tells us that this number 450 00:26:24,890 --> 00:26:31,790 should be equal to 1.96. 451 00:26:31,790 --> 00:26:36,380 And now, from here, you do the calculations. 452 00:26:36,380 --> 00:26:47,510 And you find that n is 9604. 453 00:26:47,510 --> 00:26:53,200 So with a sample of 10,000, we got probability of error 4.5%. 454 00:26:53,200 --> 00:26:57,910 With a slightly smaller sample size of 9,600, we can get the 455 00:26:57,910 --> 00:27:01,880 probability of a mistake to be 0.05, which 456 00:27:01,880 --> 00:27:04,070 was exactly our spec. 457 00:27:04,070 --> 00:27:07,450 So these are essentially the two ways that you're going to 458 00:27:07,450 --> 00:27:09,830 be using the central limit theorem. 459 00:27:09,830 --> 00:27:12,690 Either you're given n and you try to calculate 460 00:27:12,690 --> 00:27:13,610 probabilities. 461 00:27:13,610 --> 00:27:15,590 Or you're given the probabilities, and you want to 462 00:27:15,590 --> 00:27:18,210 work backwards to find n itself. 463 00:27:18,210 --> 00:27:20,990 464 00:27:20,990 --> 00:27:27,710 So in this example, the random variable that we dealt with 465 00:27:27,710 --> 00:27:30,450 was, of course, a binomial random variable. 466 00:27:30,450 --> 00:27:38,590 The Xi's were Bernoulli, so the sum of 467 00:27:38,590 --> 00:27:40,950 the Xi's was binomial. 468 00:27:40,950 --> 00:27:44,100 So the central limit theorem certainly applies to the 469 00:27:44,100 --> 00:27:45,950 binomial distribution. 470 00:27:45,950 --> 00:27:49,440 To be more precise, of course, it applies to the standardized 471 00:27:49,440 --> 00:27:52,730 version of the binomial random variable.
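[Editor's note: both directions of the calculation just described can be checked in a few lines of Python. This is an illustrative sketch, not part of the lecture; the `phi` helper, built on `math.erf`, stands in for the printed normal tables.]

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF, approximating a normal-table lookup."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Direction 1: given n, compute the probability of error.
n = 10_000
z = 0.02 * sqrt(n)                 # 0.02 * 100 = 2
p_error = 2 * (1 - phi(z))         # two symmetric tails
print(round(p_error, 4))           # about 0.0455, the 4.5% from the lecture

# Direction 2: given the 0.05 spec, work backwards to n.
# Each tail gets 0.025, so we need 0.02 * sqrt(n) = 1.96.
n_needed = (1.96 / 0.02) ** 2
print(round(n_needed))             # 9604
```

The 1.96 here is the table entry with Phi(1.96) = 0.975, exactly as read off the table in the lecture.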
472 00:27:52,730 --> 00:27:55,140 So here's what we did, essentially, in 473 00:27:55,140 --> 00:27:57,300 the previous example. 474 00:27:57,300 --> 00:28:00,690 We fixed the number p, which is the probability of success 475 00:28:00,690 --> 00:28:02,010 in our experiments. 476 00:28:02,010 --> 00:28:06,550 p corresponds to f in the previous example. 477 00:28:06,550 --> 00:28:10,570 Let every Xi be a Bernoulli random variable, and our 478 00:28:10,570 --> 00:28:13,790 standing assumption is that these random variables are 479 00:28:13,790 --> 00:28:15,040 independent. 480 00:28:15,040 --> 00:28:17,580 481 00:28:17,580 --> 00:28:20,730 When we add them, we get a random variable that has a 482 00:28:20,730 --> 00:28:22,030 binomial distribution. 483 00:28:22,030 --> 00:28:25,220 We know the mean and the variance of the binomial, so 484 00:28:25,220 --> 00:28:29,130 we take Sn, we subtract the mean, which is this, divide by 485 00:28:29,130 --> 00:28:30,470 the standard deviation. 486 00:28:30,470 --> 00:28:32,790 The central limit theorem tells us that the cumulative 487 00:28:32,790 --> 00:28:36,130 distribution function of this random variable converges to 488 00:28:36,130 --> 00:28:39,860 that of a standard normal random variable in the limit. 489 00:28:39,860 --> 00:28:43,730 So let's do one more example of a calculation. 490 00:28:43,730 --> 00:28:47,160 Let's take n to be-- 491 00:28:47,160 --> 00:28:50,110 let's choose some specific numbers to work with. 492 00:28:50,110 --> 00:28:52,950 493 00:28:52,950 --> 00:28:58,300 So in this example, first thing to do is to find the 494 00:28:58,300 --> 00:29:02,390 expected value of Sn, which is n times p. 495 00:29:02,390 --> 00:29:04,150 It's 18. 496 00:29:04,150 --> 00:29:08,100 Then we need to write down the standard deviation. 497 00:29:08,100 --> 00:29:12,430 498 00:29:12,430 --> 00:29:16,530 The variance of Sn is the sum of the variances. 499 00:29:16,530 --> 00:29:19,940 It's np times (1-p).
500 00:29:19,940 --> 00:29:25,920 And in this particular example, p times (1-p) is 1/4, 501 00:29:25,920 --> 00:29:28,320 n is 36, so this is 9. 502 00:29:28,320 --> 00:29:33,120 And that tells us that the standard deviation of Sn 503 00:29:33,120 --> 00:29:34,370 is equal to 3. 504 00:29:34,370 --> 00:29:37,170 505 00:29:37,170 --> 00:29:40,650 So what we're going to do is to take the event of interest, 506 00:29:40,650 --> 00:29:46,400 which is Sn less than 21, and rewrite it in a way that 507 00:29:46,400 --> 00:29:48,910 involves the standardized random variable. 508 00:29:48,910 --> 00:29:51,700 So to do that, we need to subtract the mean. 509 00:29:51,700 --> 00:29:55,680 So we write this as Sn-3 should be less 510 00:29:55,680 --> 00:29:58,460 than or equal to 21-3. 511 00:29:58,460 --> 00:30:00,360 This is the same event. 512 00:30:00,360 --> 00:30:02,890 And then divide by the standard deviation, which is 513 00:30:02,890 --> 00:30:06,450 3, and we end up with this. 514 00:30:06,450 --> 00:30:08,300 So the event itself of-- 515 00:30:08,300 --> 00:30:09,550 AUDIENCE: [INAUDIBLE]. 516 00:30:09,550 --> 00:30:13,700 517 00:30:13,700 --> 00:30:24,150 PROFESSOR: I should subtract 18, yes, which gives me a much nicer 518 00:30:24,150 --> 00:30:26,640 number out here, which is 1. 519 00:30:26,640 --> 00:30:31,650 So the event of interest, that Sn is less than 21, is the 520 00:30:31,650 --> 00:30:37,330 same as the event that a standard normal random 521 00:30:37,330 --> 00:30:41,580 variable is less than or equal to 1. 522 00:30:41,580 --> 00:30:44,690 And once more, you can look this up at the normal tables. 523 00:30:44,690 --> 00:30:50,690 And you find that the answer that you get is 0.8413. 524 00:30:50,690 --> 00:30:53,390 Now it's interesting to compare this answer that we 525 00:30:53,390 --> 00:30:57,230 got through the central limit theorem with the exact answer. 526 00:30:57,230 --> 00:31:01,920 The exact answer involves the exact binomial distribution.
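[Editor's note: both the table lookup and the exact binomial answer can be checked with a short sketch; this is illustrative, and the `phi` helper built on `math.erf` is a stand-in for the printed tables.]

```python
from math import comb, erf, sqrt

def phi(x):
    """Standard normal CDF, approximating a normal-table lookup."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

n, p = 36, 0.5

# Central limit theorem approximation: Phi((21 - 18) / 3) = Phi(1).
clt = phi((21 - 18) / 3)
print(round(clt, 4))     # 0.8413

# Exact answer: sum the binomial PMF over k = 0, ..., 21.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(22))
print(round(exact, 4))   # 0.8785
```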
527 00:31:01,920 --> 00:31:08,780 What we have here is the binomial probability that Sn 528 00:31:08,780 --> 00:31:10,970 is equal to k. 529 00:31:10,970 --> 00:31:15,230 Sn being equal to k is given by this formula. 530 00:31:15,230 --> 00:31:22,610 And we add, over all values for k going from 0 up to 21, 531 00:31:22,610 --> 00:31:28,670 we write two lines of code to calculate this sum, and we get 532 00:31:28,670 --> 00:31:32,530 the exact answer, which is 0.8785. 533 00:31:32,530 --> 00:31:35,760 So there's pretty good agreement between the two, 534 00:31:35,760 --> 00:31:38,600 although you wouldn't 535 00:31:38,600 --> 00:31:40,395 necessarily call it excellent agreement. 536 00:31:40,395 --> 00:31:45,080 537 00:31:45,080 --> 00:31:47,060 Can we do a little better than that? 538 00:31:47,060 --> 00:31:51,570 539 00:31:51,570 --> 00:31:53,750 OK. 540 00:31:53,750 --> 00:31:56,510 It turns out that we can. 541 00:31:56,510 --> 00:31:58,625 And here's the idea. 542 00:31:58,625 --> 00:32:02,300 543 00:32:02,300 --> 00:32:07,750 So our random variable Sn has a mean of 18. 544 00:32:07,750 --> 00:32:09,540 It has a binomial distribution. 545 00:32:09,540 --> 00:32:14,050 It's described by a PMF that has a shape roughly like this 546 00:32:14,050 --> 00:32:16,690 and which keeps going on. 547 00:32:16,690 --> 00:32:20,960 Using the central limit theorem is basically 548 00:32:20,960 --> 00:32:26,650 pretending that Sn is normal with the 549 00:32:26,650 --> 00:32:28,650 right mean and variance. 550 00:32:28,650 --> 00:32:35,200 So we approximate Zn, which has 0 mean and unit variance, 551 00:32:35,200 --> 00:32:38,850 with Z, which also has 0 mean and unit variance. 552 00:32:38,850 --> 00:32:42,190 If you were to pretend that Sn is normal, you would 553 00:32:42,190 --> 00:32:45,407 approximate it with a normal that has the correct mean and 554 00:32:45,407 --> 00:32:46,250 correct variance.
555 00:32:46,250 --> 00:32:49,390 So it would still be centered at 18. 556 00:32:49,390 --> 00:32:53,800 And it would have the same variance as the binomial PMF. 557 00:32:53,800 --> 00:32:57,350 So using the central limit theorem essentially means that 558 00:32:57,350 --> 00:33:00,420 we keep the mean and the variance what they are but we 559 00:33:00,420 --> 00:33:03,960 pretend that our distribution is normal. 560 00:33:03,960 --> 00:33:06,780 We want to calculate the probability that Sn is less 561 00:33:06,780 --> 00:33:09,590 than or equal to 21. 562 00:33:09,590 --> 00:33:14,310 I pretend that my random variable is normal, so I draw 563 00:33:14,310 --> 00:33:18,680 a line here and I calculate the area under the normal 564 00:33:18,680 --> 00:33:22,000 curve going up to 21. 565 00:33:22,000 --> 00:33:23,500 That's essentially what we did. 566 00:33:23,500 --> 00:33:26,260 567 00:33:26,260 --> 00:33:29,730 Now, a smart person comes around and says, Sn is a 568 00:33:29,730 --> 00:33:31,360 discrete random variable. 569 00:33:31,360 --> 00:33:34,750 So the event that Sn is less than or equal to 21 is the 570 00:33:34,750 --> 00:33:38,480 same as Sn being strictly less than 22 because nothing in 571 00:33:38,480 --> 00:33:41,240 between can happen. 572 00:33:41,240 --> 00:33:43,700 So I'm going to use the central limit theorem 573 00:33:43,700 --> 00:33:48,290 approximation by pretending again that Sn is normal and 574 00:33:48,290 --> 00:33:51,650 finding the probability of this event while pretending 575 00:33:51,650 --> 00:33:53,720 that Sn is normal. 576 00:33:53,720 --> 00:33:57,870 So what this person would do would be to draw a line here, 577 00:33:57,870 --> 00:34:02,780 at 22, and calculate the area under the normal curve 578 00:34:02,780 --> 00:34:05,490 all the way to 22. 579 00:34:05,490 --> 00:34:06,700 Who is right? 580 00:34:06,700 --> 00:34:08,820 Which one is better? 
581 00:34:08,820 --> 00:34:15,639 Well, neither, but we can do better than both if we sort of 582 00:34:15,639 --> 00:34:17,949 split the difference. 583 00:34:17,949 --> 00:34:21,969 So another way of writing the same event for Sn is to write 584 00:34:21,969 --> 00:34:25,940 it as Sn being less than 21.5. 585 00:34:25,940 --> 00:34:29,570 In terms of the discrete random variable Sn, all three 586 00:34:29,570 --> 00:34:32,239 of these are exactly the same event. 587 00:34:32,239 --> 00:34:35,090 But when you do the continuous approximation, they give you 588 00:34:35,090 --> 00:34:36,250 different probabilities. 589 00:34:36,250 --> 00:34:39,760 It's a matter of whether you integrate the area under the 590 00:34:39,760 --> 00:34:46,159 normal curve up to here, up to the midway point, or up to 22. 591 00:34:46,159 --> 00:34:50,659 It turns out that integrating up to the midpoint is what 592 00:34:50,659 --> 00:34:54,469 gives us the better numerical results. 593 00:34:54,469 --> 00:34:59,170 So we take here 21 and 1/2, and we integrate the area 594 00:34:59,170 --> 00:35:01,170 under the normal curve up to here. 595 00:35:01,170 --> 00:35:14,100 596 00:35:14,100 --> 00:35:18,560 So let's do this calculation and see what we get. 597 00:35:18,560 --> 00:35:21,330 What would we change here? 598 00:35:21,330 --> 00:35:27,730 Instead of 21, we would now write 21 and 1/2. 599 00:35:27,730 --> 00:35:32,810 This 18 becomes, no, that 18 stays what it is. 600 00:35:32,810 --> 00:35:36,890 But this 21 becomes 21 and 1/2. 601 00:35:36,890 --> 00:35:44,790 And so this one becomes 1 plus 0.5 over 3. 602 00:35:44,790 --> 00:35:48,210 This is 1.17. 603 00:35:48,210 --> 00:35:51,980 So we now look up into the normal tables and ask for the 604 00:35:51,980 --> 00:36:00,000 probability that Z is less than 1.17. 605 00:36:00,000 --> 00:36:06,070 So this here gets approximated by the probability that the 606 00:36:06,070 --> 00:36:09,240 standard normal is less than 1.17.
607 00:36:09,240 --> 00:36:15,960 And the normal tables will tell us this is 0.879. 608 00:36:15,960 --> 00:36:23,550 Going back to the previous slide, what we got this time 609 00:36:23,550 --> 00:36:30,310 with this improved approximation is 0.879. 610 00:36:30,310 --> 00:36:33,730 This is a really good approximation 611 00:36:33,730 --> 00:36:35,730 of the correct number. 612 00:36:35,730 --> 00:36:39,160 This is what we got using the 21. 613 00:36:39,160 --> 00:36:42,360 This is what we get using the 21 and 1/2. 614 00:36:42,360 --> 00:36:45,940 And it's an approximation that's sort of right on-- a 615 00:36:45,940 --> 00:36:48,350 very good one. 616 00:36:48,350 --> 00:36:54,120 The moral from this numerical example is that doing this 617 00:36:54,120 --> 00:37:00,933 1/2 correction does give us better approximations. 618 00:37:00,933 --> 00:37:06,070 619 00:37:06,070 --> 00:37:12,010 In fact, we can use this 1/2 idea to even calculate 620 00:37:12,010 --> 00:37:14,340 individual probabilities. 621 00:37:14,340 --> 00:37:17,130 So suppose you want to approximate the probability 622 00:37:17,130 --> 00:37:21,010 that Sn is equal to 19. 623 00:37:21,010 --> 00:37:25,620 If you were to pretend that Sn is normal and calculate this 624 00:37:25,620 --> 00:37:28,470 probability, the probability that the normal random 625 00:37:28,470 --> 00:37:31,670 variable is equal to 19 is 0. 626 00:37:31,670 --> 00:37:34,150 So you don't get an interesting answer. 627 00:37:34,150 --> 00:37:37,610 You get a more interesting answer by writing this event, 628 00:37:37,610 --> 00:37:41,460 19 as being the same as the event of falling between 18 629 00:37:41,460 --> 00:37:45,910 and 1/2 and 19 and 1/2 and using the normal approximation 630 00:37:45,910 --> 00:37:48,230 to calculate this probability. 631 00:37:48,230 --> 00:37:51,890 In terms of our previous picture, this corresponds to 632 00:37:51,890 --> 00:37:53,140 the following.
633 00:37:53,140 --> 00:37:59,400 634 00:37:59,400 --> 00:38:04,650 We are interested in the probability that 635 00:38:04,650 --> 00:38:07,130 Sn is equal to 19. 636 00:38:07,130 --> 00:38:11,230 So we're interested in the height of this bar. 637 00:38:11,230 --> 00:38:15,720 We're going to consider the area under the normal curve 638 00:38:15,720 --> 00:38:21,500 going from here to here, and use this area as an 639 00:38:21,500 --> 00:38:25,110 approximation for the height of that particular bar. 640 00:38:25,110 --> 00:38:30,670 So what we're basically doing is, we take the probability 641 00:38:30,670 --> 00:38:33,830 under the normal curve that's assigned over a continuum of 642 00:38:33,830 --> 00:38:38,280 values and attribute it to the different discrete values. 643 00:38:38,280 --> 00:38:43,510 Whatever is above the midpoint gets attributed to 19. 644 00:38:43,510 --> 00:38:45,640 Whatever is below that midpoint gets 645 00:38:45,640 --> 00:38:47,250 attributed to 18. 646 00:38:47,250 --> 00:38:54,280 So this green area is our approximation of the value of 647 00:38:54,280 --> 00:38:56,500 the PMF at 19. 648 00:38:56,500 --> 00:39:00,740 So similarly, if you wanted to approximate the value of the 649 00:39:00,740 --> 00:39:04,440 PMF at this point, you would take this interval and 650 00:39:04,440 --> 00:39:06,580 integrate the area under the normal 651 00:39:06,580 --> 00:39:09,350 curve over that interval. 652 00:39:09,350 --> 00:39:13,410 It turns out that this gives a very good approximation of the 653 00:39:13,410 --> 00:39:15,660 PMF of the binomial. 654 00:39:15,660 --> 00:39:22,580 And actually, this was the context in which the central 655 00:39:22,580 --> 00:39:26,310 limit theorem was proved in the first place, when this 656 00:39:26,310 --> 00:39:27,990 business started. 657 00:39:27,990 --> 00:39:33,060 So this business goes back a few hundred years.
658 00:39:33,060 --> 00:39:35,700 And the central limit theorem was first proved by 659 00:39:35,700 --> 00:39:39,420 considering the PMF of a binomial random variable when 660 00:39:39,420 --> 00:39:41,840 p is equal to 1/2. 661 00:39:41,840 --> 00:39:45,590 People did the algebra, and they found out that the exact 662 00:39:45,590 --> 00:39:49,700 expression for the PMF is quite well approximated by 663 00:39:49,700 --> 00:39:51,980 the expression that you would get from a normal 664 00:39:51,980 --> 00:39:53,380 distribution. 665 00:39:53,380 --> 00:39:57,510 Then the proof was extended to binomials for more general 666 00:39:57,510 --> 00:39:59,690 values of p. 667 00:39:59,690 --> 00:40:04,220 So here we talk about this as a refinement of the general 668 00:40:04,220 --> 00:40:07,480 central limit theorem, but, historically, that refinement 669 00:40:07,480 --> 00:40:09,830 was where the whole business got started 670 00:40:09,830 --> 00:40:11,820 in the first place. 671 00:40:11,820 --> 00:40:18,700 All right, so let's go through the mechanics of approximating 672 00:40:18,700 --> 00:40:21,970 the probability that Sn is equal to 19-- 673 00:40:21,970 --> 00:40:23,810 exactly 19. 674 00:40:23,810 --> 00:40:27,340 As we said, we're going to write this event as an event 675 00:40:27,340 --> 00:40:31,040 that covers an interval of unit length from 18 and 1/2 to 676 00:40:31,040 --> 00:40:31,970 19 and 1/2. 677 00:40:31,970 --> 00:40:33,730 This is the event of interest. 678 00:40:33,730 --> 00:40:37,070 First step is to massage the event of interest so that it 679 00:40:37,070 --> 00:40:40,010 involves our Zn random variable. 680 00:40:40,010 --> 00:40:43,290 So subtract 18 from all sides. 681 00:40:43,290 --> 00:40:46,860 Divide by the standard deviation of 3 from all sides. 682 00:40:46,860 --> 00:40:50,850 That's the equivalent representation of the event. 683 00:40:50,850 --> 00:40:54,200 This is our standardized random variable Zn.
684 00:40:54,200 --> 00:40:56,950 These are just these numbers. 685 00:40:56,950 --> 00:41:00,530 And to do an approximation, we want to find the probability 686 00:41:00,530 --> 00:41:04,380 of this event, but Zn is approximately normal, so we 687 00:41:04,380 --> 00:41:08,030 plug in here the Z, which is the standard normal. 688 00:41:08,030 --> 00:41:10,150 So we want to find the probability that the standard 689 00:41:10,150 --> 00:41:12,890 normal falls inside this interval. 690 00:41:12,890 --> 00:41:15,630 You find these using CDFs because this is the 691 00:41:15,630 --> 00:41:18,760 probability that you're less than this but 692 00:41:18,760 --> 00:41:22,370 not less than that. 693 00:41:22,370 --> 00:41:25,370 So it's a difference between two cumulative probabilities. 694 00:41:25,370 --> 00:41:27,400 Then, you look up your normal tables. 695 00:41:27,400 --> 00:41:30,560 You find two numbers for these quantities, and, finally, you 696 00:41:30,560 --> 00:41:35,140 get a numerical answer for an individual entry of the PMF of 697 00:41:35,140 --> 00:41:36,480 the binomial. 698 00:41:36,480 --> 00:41:39,350 This is a pretty good approximation, it turns out. 699 00:41:39,350 --> 00:41:42,910 If you were to do the calculations using the exact 700 00:41:42,910 --> 00:41:47,130 formula, you would get something 701 00:41:47,130 --> 00:41:49,360 which is pretty close-- 702 00:41:49,360 --> 00:41:52,800 an error in the third digit-- 703 00:41:52,800 --> 00:41:56,980 this is pretty good. 704 00:41:56,980 --> 00:41:59,650 So I guess what we did here with our discussion of the 705 00:41:59,650 --> 00:42:04,560 binomial slightly contradicts what I said before-- 706 00:42:04,560 --> 00:42:07,330 that the central limit theorem is a statement about 707 00:42:07,330 --> 00:42:09,240 cumulative distribution functions. 708 00:42:09,240 --> 00:42:13,240 In general, it doesn't tell you what to do to approximate 709 00:42:13,240 --> 00:42:15,270 PMFs themselves. 
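[Editor's note: the mechanics above can be checked with a short sketch; the `phi` helper via `math.erf` is an editorial stand-in for the normal tables.]

```python
from math import comb, erf, sqrt

def phi(x):
    """Standard normal CDF, approximating a normal-table lookup."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# P(Sn = 19) ~ P(18.5 <= Sn <= 19.5), standardized with mean 18, sd 3.
approx = phi((19.5 - 18) / 3) - phi((18.5 - 18) / 3)
print(round(approx, 4))            # about 0.1253

# The exact binomial PMF entry, for comparison.
exact = comb(36, 19) * 0.5**36
print(round(exact, 4))             # about 0.1251
```

The two agree to roughly three digits, matching the "error in the third digit" mentioned in the lecture.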
710 00:42:15,270 --> 00:42:17,440 And that's indeed the case in general. 711 00:42:17,440 --> 00:42:20,220 On the other hand, for the special case of a binomial 712 00:42:20,220 --> 00:42:23,610 distribution, the central limit theorem approximation, 713 00:42:23,610 --> 00:42:28,200 with this 1/2 correction, is a very good approximation even 714 00:42:28,200 --> 00:42:29,560 for the individual PMF. 715 00:42:29,560 --> 00:42:33,290 716 00:42:33,290 --> 00:42:40,210 All right, so we spent quite a bit of time on mechanics. 717 00:42:40,210 --> 00:42:46,050 So let's spend the last few minutes today thinking a bit 718 00:42:46,050 --> 00:42:47,930 and look at a small puzzle. 719 00:42:47,930 --> 00:42:51,390 720 00:42:51,390 --> 00:42:54,240 So the puzzle is the following. 721 00:42:54,240 --> 00:43:02,460 Consider a Poisson process that runs over a unit interval. 722 00:43:02,460 --> 00:43:07,770 And the arrival rate is equal to 1. 723 00:43:07,770 --> 00:43:09,790 So this is the unit interval. 724 00:43:09,790 --> 00:43:12,720 And let X be the number of arrivals. 725 00:43:12,720 --> 00:43:15,430 726 00:43:15,430 --> 00:43:19,930 And this is Poisson, with mean 1. 727 00:43:19,930 --> 00:43:25,000 728 00:43:25,000 --> 00:43:28,160 Now, let me take this interval and divide it 729 00:43:28,160 --> 00:43:30,650 into n little pieces. 730 00:43:30,650 --> 00:43:34,270 So each piece has length 1/n. 731 00:43:34,270 --> 00:43:41,225 And let Xi be the number of arrivals during 732 00:43:41,225 --> 00:43:43,490 the i-th little interval. 733 00:43:43,490 --> 00:43:48,000 734 00:43:48,000 --> 00:43:51,630 OK, what do we know about the random variables Xi? 735 00:43:51,630 --> 00:43:55,260 They are themselves Poisson. 736 00:43:55,260 --> 00:43:58,490 It's the number of arrivals during a small interval.
737 00:43:58,490 --> 00:44:02,340 We also know that when n is big, so the length of the 738 00:44:02,340 --> 00:44:08,190 interval is small, these Xi's are approximately Bernoulli, 739 00:44:08,190 --> 00:44:11,730 with mean 1/n. 740 00:44:11,730 --> 00:44:13,970 I guess it doesn't matter whether we model them as 741 00:44:13,970 --> 00:44:15,720 Bernoulli or not. 742 00:44:15,720 --> 00:44:19,660 What matters is that the Xi's are independent. 743 00:44:19,660 --> 00:44:20,970 Why are they independent? 744 00:44:20,970 --> 00:44:24,410 Because, in a Poisson process, disjoint intervals are 745 00:44:24,410 --> 00:44:26,770 independent of each other. 746 00:44:26,770 --> 00:44:28,955 So the Xi's are independent. 747 00:44:28,955 --> 00:44:31,840 748 00:44:31,840 --> 00:44:35,570 And they also have the same distribution. 749 00:44:35,570 --> 00:44:40,360 And we have that X, the total number of arrivals, is the sum 750 00:44:40,360 --> 00:44:41,610 of the Xi's. 751 00:44:41,610 --> 00:44:44,470 752 00:44:44,470 --> 00:44:49,510 So the central limit theorem tells us that, approximately, 753 00:44:49,510 --> 00:44:53,670 the sum of independent, identically distributed random 754 00:44:53,670 --> 00:44:57,720 variables, when we have lots of these random variables, 755 00:44:57,720 --> 00:45:01,530 behaves like a normal random variable. 756 00:45:01,530 --> 00:45:07,475 So by using this decomposition of X into a sum of i.i.d. 757 00:45:07,475 --> 00:45:11,540 random variables, and by using values of n that are bigger 758 00:45:11,540 --> 00:45:16,540 and bigger, by taking the limit, it should follow that X 759 00:45:16,540 --> 00:45:19,510 has a normal distribution. 760 00:45:19,510 --> 00:45:22,120 On the other hand, we know that X has a Poisson 761 00:45:22,120 --> 00:45:23,370 distribution. 762 00:45:23,370 --> 00:45:25,270 763 00:45:25,270 --> 00:45:32,640 So something must be wrong in this argument here.
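[Editor's note: it is easy to check numerically that the sum in this puzzle really does stay Poisson. For instance, P(X = 0) for the sum of n pieces, each with success probability 1/n, settles at e^-1 no matter how large n gets. This quick sketch is an editorial addition, not from the lecture.]

```python
from math import exp

# Split the unit interval into n slots, each with arrival
# probability roughly 1/n.  P(X = 0) for the sum of the slots:
for n in (10, 100, 1000):
    print(n, round((1 - 1 / n) ** n, 4))   # approaches e**-1 as n grows

print(round(exp(-1), 4))                   # 0.3679, the Poisson(1) value of P(X = 0)
```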
764 00:45:32,640 --> 00:45:34,900 Can we really use the central limit 765 00:45:34,900 --> 00:45:38,330 theorem in this situation? 766 00:45:38,330 --> 00:45:41,300 So what do we need for the central limit theorem? 767 00:45:41,300 --> 00:45:44,160 We need to have independent, identically 768 00:45:44,160 --> 00:45:46,700 distributed random variables. 769 00:45:46,700 --> 00:45:49,060 We have it here. 770 00:45:49,060 --> 00:45:53,410 We want them to have a finite mean and finite variance. 771 00:45:53,410 --> 00:45:57,610 We also have it here; means and variances are finite. 772 00:45:57,610 --> 00:46:02,050 What is another assumption that was never made explicit, 773 00:46:02,050 --> 00:46:04,080 but essentially was there? 774 00:46:04,080 --> 00:46:07,680 775 00:46:07,680 --> 00:46:13,260 Or in other words, what is the flaw in this argument that 776 00:46:13,260 --> 00:46:15,520 uses the central limit theorem here? 777 00:46:15,520 --> 00:46:16,770 Any thoughts? 778 00:46:16,770 --> 00:46:24,110 779 00:46:24,110 --> 00:46:29,640 So in the central limit theorem, we said, consider-- 780 00:46:29,640 --> 00:46:34,820 fix a probability distribution, and let the Xi's 781 00:46:34,820 --> 00:46:38,280 be distributed according to that probability distribution, 782 00:46:38,280 --> 00:46:42,935 and add a larger and larger number of Xi's. 783 00:46:42,935 --> 00:46:47,410 But the underlying, unstated assumption is that we fix the 784 00:46:47,410 --> 00:46:49,490 distribution of the Xi's. 785 00:46:49,490 --> 00:46:52,810 As we let n increase, the statistics of 786 00:46:52,810 --> 00:46:55,930 each Xi do not change. 787 00:46:55,930 --> 00:46:59,010 Whereas here, I'm playing a trick on you. 788 00:46:59,010 --> 00:47:03,700 As I'm taking more and more random variables, I'm actually 789 00:47:03,700 --> 00:47:07,850 changing what those random variables are.
790 00:47:07,850 --> 00:47:12,960 When I take a larger n, the Xi's are random variables with 791 00:47:12,960 --> 00:47:15,720 a different mean and different variance. 792 00:47:15,720 --> 00:47:19,800 So I'm adding more of these, but at the same time, in this 793 00:47:19,800 --> 00:47:23,420 example, I'm changing their distributions. 794 00:47:23,420 --> 00:47:26,380 That's something that doesn't fit the setting of the central 795 00:47:26,380 --> 00:47:27,000 limit theorem. 796 00:47:27,000 --> 00:47:29,910 In the central limit theorem, you first fix the distribution 797 00:47:29,910 --> 00:47:31,200 of the X's. 798 00:47:31,200 --> 00:47:35,290 You keep it fixed, and then you consider adding more and 799 00:47:35,290 --> 00:47:38,950 more according to that particular fixed distribution. 800 00:47:38,950 --> 00:47:40,020 So that's the catch. 801 00:47:40,020 --> 00:47:42,240 That's why the central limit theorem does not 802 00:47:42,240 --> 00:47:43,970 apply to this situation. 803 00:47:43,970 --> 00:47:46,230 And we're lucky that it doesn't apply because, 804 00:47:46,230 --> 00:47:50,220 otherwise, we would have a huge contradiction destroying 805 00:47:50,220 --> 00:47:52,770 probability theory. 806 00:47:52,770 --> 00:48:02,240 OK, but now that still leaves us with a 807 00:48:02,240 --> 00:48:05,040 little bit of a dilemma. 808 00:48:05,040 --> 00:48:08,510 Suppose that, here, essentially we're adding 809 00:48:08,510 --> 00:48:12,815 independent Bernoulli random variables. 810 00:48:12,815 --> 00:48:22,650 811 00:48:22,650 --> 00:48:25,300 So the issue is that the central limit theorem has to 812 00:48:25,300 --> 00:48:28,920 do with asymptotics as n goes to infinity.
813 00:48:28,920 --> 00:48:34,260 And if we consider a binomial, and somebody gives us specific 814 00:48:34,260 --> 00:48:38,870 numbers about the parameters of that binomial, it might not 815 00:48:38,870 --> 00:48:40,830 necessarily be obvious what kind of 816 00:48:40,830 --> 00:48:42,790 approximation to use. 817 00:48:42,790 --> 00:48:45,660 In particular, we do have two different approximations for 818 00:48:45,660 --> 00:48:47,100 the binomial. 819 00:48:47,100 --> 00:48:51,610 If we fix p, then the binomial is the sum of Bernoulli's that 820 00:48:51,610 --> 00:48:54,930 come from a fixed distribution, and we consider more 821 00:48:54,930 --> 00:48:56,450 and more of these. 822 00:48:56,450 --> 00:48:58,990 When we add them, the central limit theorem tells us that we 823 00:48:58,990 --> 00:49:01,190 get the normal distribution. 824 00:49:01,190 --> 00:49:04,430 There's another sort of limit, which has the flavor of this 825 00:49:04,430 --> 00:49:10,770 example, in which we still deal with a binomial, a sum of n 826 00:49:10,770 --> 00:49:11,170 Bernoulli's. 827 00:49:11,170 --> 00:49:14,310 We let that sum, the number of the 828 00:49:14,310 --> 00:49:16,090 Bernoulli's, go to infinity. 829 00:49:16,090 --> 00:49:18,890 But each Bernoulli has a probability of success that 830 00:49:18,890 --> 00:49:23,830 goes to 0, and we do this in a way so that np, the expected 831 00:49:23,830 --> 00:49:27,090 number of successes, stays finite. 832 00:49:27,090 --> 00:49:30,660 This is the situation that we dealt with when we first 833 00:49:30,660 --> 00:49:32,960 defined our Poisson process. 834 00:49:32,960 --> 00:49:37,540 We have a very, very large number, so lots of time slots, 835 00:49:37,540 --> 00:49:40,920 but during each time slot, there's a tiny probability of 836 00:49:40,920 --> 00:49:42,950 obtaining an arrival.
837 00:49:42,950 --> 00:49:48,460 Under that setting, in discrete time, we have a 838 00:49:48,460 --> 00:49:51,670 binomial distribution, or Bernoulli process, but when we 839 00:49:51,670 --> 00:49:54,530 take the limit, we obtain the Poisson process and the 840 00:49:54,530 --> 00:49:56,470 Poisson approximation. 841 00:49:56,470 --> 00:49:58,510 So these are two equally valid 842 00:49:58,510 --> 00:50:00,550 approximations of the binomial. 843 00:50:00,550 --> 00:50:03,300 But they're valid in different asymptotic regimes. 844 00:50:03,300 --> 00:50:06,180 In one regime, we fixed p, let n go to infinity. 845 00:50:06,180 --> 00:50:09,360 In the other regime, we let both n and p change 846 00:50:09,360 --> 00:50:11,540 simultaneously. 847 00:50:11,540 --> 00:50:14,240 Now, in real life, you're never dealing with the 848 00:50:14,240 --> 00:50:15,290 limiting situations. 849 00:50:15,290 --> 00:50:17,870 You're dealing with actual numbers. 850 00:50:17,870 --> 00:50:21,820 So if somebody tells you that the numbers are like this, 851 00:50:21,820 --> 00:50:25,160 then you should probably say that this is the situation 852 00:50:25,160 --> 00:50:27,380 that fits the Poisson description-- 853 00:50:27,380 --> 00:50:30,180 large number of slots with each slot having a tiny 854 00:50:30,180 --> 00:50:32,460 probability of success. 855 00:50:32,460 --> 00:50:36,890 On the other hand, if p is something like this, and n is 856 00:50:36,890 --> 00:50:40,460 500, then you expect to get the distribution for the 857 00:50:40,460 --> 00:50:41,680 number of successes. 858 00:50:41,680 --> 00:50:45,740 It's going to have a mean of 50 and have a fair amount 859 00:50:45,740 --> 00:50:47,280 of spread around there. 860 00:50:47,280 --> 00:50:50,150 It turns out that the normal approximation would be better 861 00:50:50,150 --> 00:50:51,500 in this context.
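[Editor's note: the two regimes can be compared concretely with a short sketch. The numbers below, n = 500 with a tiny p, and n = 500 with p = 0.1, are illustrative stand-ins for the values on the slide, which the transcript does not show; the `phi` helper via `math.erf` replaces the normal tables.]

```python
from math import comb, erf, exp, factorial, sqrt

def phi(x):
    """Standard normal CDF, approximating a normal-table lookup."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def compare(n, p, k):
    """Exact binomial P(Sn = k) next to its Poisson and normal approximations."""
    exact = comb(n, k) * p**k * (1 - p)**(n - k)
    lam = n * p
    poisson = exp(-lam) * lam**k / factorial(k)
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    normal = phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)
    return exact, poisson, normal

# Tiny p with np = 1: the Poisson approximation is the closer one.
print(compare(500, 0.002, 1))

# p = 0.1 with np = 50: the normal approximation is the closer one.
print(compare(500, 0.1, 50))
```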
862 00:50:51,500 --> 00:50:57,120 As a rule of thumb, if n times p is bigger than 10 or 20, you 863 00:50:57,120 --> 00:50:59,320 can start using the normal approximation. 864 00:50:59,320 --> 00:51:04,310 If n times p is a small number, then you prefer to use 865 00:51:04,310 --> 00:51:06,090 the Poisson approximation. 866 00:51:06,090 --> 00:51:08,840 But there are no hard theorems or rules about 867 00:51:08,840 --> 00:51:11,650 how to go about this. 868 00:51:11,650 --> 00:51:15,440 OK, so from next time we're going to switch gears again. 869 00:51:15,440 --> 00:51:17,830 And we're going to put together everything we learned 870 00:51:17,830 --> 00:51:20,620 in this class to start solving inference problems. 871 00:51:20,620 --> 00:51:22,050