The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JOHN TSITSIKLIS: We're going to start a new unit today, so we will be talking about limit theorems. Just to introduce the topic, let's think of the following situation. There's a population of penguins down at the South Pole. If you were to pick a penguin at random and measure its height, the expected value of that height would be the average of the heights of the different penguins in the population. So suppose that when you pick one, every penguin is equally likely. Then the expected value is just the average over all the penguins out there.

So your boss asks you to find out what that expected value is. One way would be to go and measure each and every penguin. That might be a little time consuming. So alternatively, what you can do is go and pick penguins at random -- pick a few of them, let's say a number n of them. You measure the height of each one, and then you calculate the average of the heights of those penguins that you have collected. This is your estimate of the expected value.

Now, we call this the sample mean, which is the mean value, but within the sample that you have collected. This is something that sort of feels the same as the expected value, which is, again, the mean. But the expected value is a different kind of mean. The expected value is the mean over the entire population, whereas the sample mean is the average over the smaller sample that you have measured. The expected value is a number. The sample mean is a random variable. It's a random variable because the sample you have collected is random.

Now, we think that this is a reasonable way of estimating the expectation.
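(To make the sample-mean idea concrete, here is a minimal Python sketch, not part of the lecture. The population size, the Gaussian height model, and the specific numbers are illustrative assumptions only.)

```python
# Sketch: the sample mean M_n of n randomly chosen penguin heights,
# compared with the population average (the expected value).
# All parameters below are made up for illustration.
import random

random.seed(0)
population = [random.gauss(100.0, 10.0) for _ in range(1_000_000)]  # hypothetical heights
true_mean = sum(population) / len(population)                        # the expected value E[X]

for n in [10, 100, 10_000]:
    sample = [random.choice(population) for _ in range(n)]           # pick n penguins at random
    sample_mean = sum(sample) / n                                    # M_n = (X_1 + ... + X_n) / n
    print(f"n={n}: sample mean = {sample_mean:.2f}, true mean = {true_mean:.2f}")
```

As the sample size n grows, the printed sample mean tends to land closer and closer to the true mean, which is exactly the behavior the limit theorems discussed below make precise.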
So in the limit as n goes to infinity, it's plausible that the sample mean, the estimate that we are constructing, should somehow get close to the expected value. What does this mean? What does it mean to get close? In what sense? And is this statement true? This is the kind of statement that we deal with when dealing with limit theorems. That's the subject of limit theorems: what happens when you're dealing with lots and lots of random variables, and perhaps take averages and so on.

So why do we bother with this? Well, if you're in the sampling business, it would be reassuring to know that this particular way of estimating the expected value actually gets you close to the true answer. There's also a higher level reason, which is a little more abstract and mathematical. Probability problems are easy to deal with if you have one or two random variables in your hands. You can write down their mass functions, joint density functions, and so on. You can calculate on paper or on a computer, and you can get the answers. Probability problems become computationally intractable if you're dealing, let's say, with 100 random variables and you're trying to get exact answers for anything. In principle, the same formulas that we have still apply. But they involve summations over large ranges of combinations of indices, and that makes life extremely difficult.

But when you push the envelope and you go to a situation where you're dealing with a very, very large number of variables, then you can start taking limits. And when you take limits, wonderful things happen. Many formulas start simplifying, and you can actually get useful answers by considering those limits. That's sort of the big reason why looking at limit theorems is a useful thing to do.

So what we're going to do today: first, we're going to start with a useful, simple tool that allows us to relate probabilities to expected values. The Markov inequality is the first inequality we're going to write down.
And then, using that, we're going to get the Chebyshev inequality, a related inequality. Then we need to define what we mean by convergence when we talk about random variables. It's a notion that's a generalization of the usual notion of convergence of a sequence of numbers. And once we have our notion of convergence, we're going to see that, indeed, the sample mean converges to the true mean -- converges to the expected value of the X's. This statement is called the weak law of large numbers. The reason it's called the weak law is because there's also a strong law, which is a statement with the same flavor but with a somewhat different mathematical content. It's a little more abstract, and we will not be getting into it. So the weak law is all that you're going to get.

All right. So now we start our digression. And our first tool will be the so-called Markov inequality.

So let's take a random variable that's always non-negative. No matter what, it takes no negative values. To keep things simple, let's assume it's a discrete random variable. So the expected value is the sum, over all possible values that the random variable can take, of those values weighted according to their corresponding probabilities.

Now, this is a sum over all x's. But x takes non-negative values, and the PMF is also non-negative. So if I take a sum over fewer things, I'm going to get a smaller value. So the sum that I get if I only add those terms that are bigger than a certain constant is less than or equal to the sum when I add over everything.

Now, if I'm adding over x's that are bigger than a, the x that shows up there will always be larger than or equal to a. So we get this inequality. And now, a is a constant. I can pull it outside the summation.
And then I'm left with the probabilities of all the x's that are bigger than a. And that's just the probability of being bigger than a.

OK, so that's the Markov inequality. It basically tells us that the expected value is larger than or equal to this number. It relates expected values to probabilities. It tells us that if the expected value is small, then the probability that X is big is also going to be small. So it translates a statement about smallness of expected values to a statement about smallness of probabilities.

OK. What we actually need is a somewhat different version of this same statement. What we're going to do is apply this inequality to a non-negative random variable of a special type. You can think of applying this same calculation to a random variable of this form, (X minus mu)-squared, where mu is the expected value of X. Now, this is a non-negative random variable.

So the expected value of this random variable, which is the variance, by following the same reasoning as we had in that derivation up to there, is bigger than or equal to the probability that this random variable is bigger than some value -- let me use a-squared instead of an a -- times that value a-squared. So now, of course, this probability is the same as the probability that the absolute value of X minus mu is bigger than or equal to a, times a-squared. And this side is equal to the variance of X. So this relates the variance of X to the probability that our random variable is far away from its mean. If the variance is small, then it means that the probability of being far away from the mean is also small.

So I derived this by applying the Markov inequality to this particular non-negative random variable. Or, just to reinforce the message, perhaps, and increase your confidence in this inequality, let's look at the derivation once more, where I'm going to start from first principles, but use the same idea as the one that was used in the proof out here.
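(Before that second derivation, here is a quick numerical sanity check of both bounds -- a rough sketch that is not part of the lecture; the exponential distribution used here is an arbitrary illustrative choice.)

```python
# Monte Carlo check of
#   Markov:    P(X >= a)        <= E[X] / a        for X >= 0
#   Chebyshev: P(|X - mu| >= a) <= Var(X) / a^2
# using exponential samples purely for illustration.
import random

random.seed(1)
samples = [random.expovariate(1.0) for _ in range(200_000)]   # E[X] = 1, Var(X) = 1
mu = sum(samples) / len(samples)
var = sum((x - mu) ** 2 for x in samples) / len(samples)

for a in [2.0, 3.0, 5.0]:
    p_markov = sum(x >= a for x in samples) / len(samples)
    p_cheb = sum(abs(x - mu) >= a for x in samples) / len(samples)
    print(f"a={a}: P(X>=a)={p_markov:.4f} <= {mu / a:.4f}    "
          f"P(|X-mu|>=a)={p_cheb:.4f} <= {var / a ** 2:.4f}")
```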
OK. So just for variety, now let's think of X as being a continuous random variable. The derivation is the same whether it's discrete or continuous. By definition, the variance is this particular integral. Now, the integral is going to become smaller if, instead of integrating over the full range, I only integrate over x's that are far away from the mean. So mu is the mean. Think of c as some big number. These are the x's that are far away from the mean to the left, from minus infinity to mu minus c. And these are the x's that are far away from the mean on the positive side.

So by integrating over less, I'm getting a smaller integral. Now, for any x in this range, this distance, x minus mu, is at least c. So that squared is at least c-squared. So this term, over this range of integration, is at least c-squared, and I can take it outside the integral. And I'm left just with the integral of the density. Same thing on the other side. So what factors out is this term c-squared, and inside, we're left with the probability of being to the left of mu minus c, plus the probability of being to the right of mu plus c, which is the same as the probability that the absolute value of the distance from the mean is larger than or equal to c.

So that's the same inequality that we proved there, except that here I'm using c. There I used a, but it's exactly the same one. This inequality is maybe easier to understand if you take that term, send it to the other side, and write it in this form. What does it tell us? It tells us that if c is a big number, the probability of being more than c away from the mean is going to be a small number. When c is big, this is small. Now, this is intuitive. The variance is a measure of the spread of the distribution, how wide it is.
It tells us that if the variance is small, the distribution is not very wide. And mathematically, this translates to the statement that when the variance is small, the probability of being far away is going to be small. And the further away you're looking -- that is, if c is a bigger number -- that probability also becomes small.

Maybe an even more intuitive way to think about the content of this inequality is to use, instead of c, the number k sigma, where k is positive and sigma is the standard deviation. So let's just plug k sigma in the place of c. Then this becomes (k sigma)-squared. The sigma-squared's cancel, and we're left with 1 over k-squared. Now, what is this? This is the event that you are k standard deviations away from the mean. So, for example, this statement here tells you that if you look at the test scores from a quiz, what fraction of the class are 3 standard deviations away from the mean? It's possible, but it's not going to be a lot of people. It's going to be, at most, 1/9 of the class that can be 3 standard deviations or more away from the mean.

So the Chebyshev inequality is a really useful one. It comes in handy whenever you want to relate probabilities and expected values. If you know that your expected values or, in particular, your variance is small, this tells you something about tail probabilities.

So this is the end of our first digression. We have this inequality in our hands. Our second digression is to talk about limits. We want to eventually talk about limits of random variables, but as a warm up, we're going to start with limits of sequences. So you're given a sequence of numbers, a1, a2, a3, and so on. And we want to define the notion that a sequence converges to a number. You sort of know what this means, but let's just go through it some more. So here's a. We have our sequence of values as n increases.
What we mean by the sequence converging to a is that when you look at those values, they get closer and closer to a. So this value here is your typical a sub n. They get closer and closer to a, and they stay close.

So let's try to make that more precise. What it means is, let's first fix a sense of what it means to be close. Let me look at an interval that goes from a - epsilon to a + epsilon. Then if my sequence converges to a, this means that as n increases, eventually the values of the sequence that I get stay inside this band. Since they converge to a, this means that eventually they will be smaller than a + epsilon and bigger than a - epsilon. So convergence means that, given a band of positive length around the number a, the values of the sequence that you get eventually get inside and stay inside that band. That's sort of the picture definition of what convergence means.

So now let's translate this into a mathematical statement. Given a band of positive length, no matter how wide that band is or how narrow it is -- so for every epsilon positive -- eventually the sequence gets inside the band. What does eventually mean? There exists a time, such that after that time something happens. And the something that happens is that after that time, we are inside that band. So this is a formal mathematical definition, which actually translates what I was saying in the wordy way before, and showing in terms of the picture. Given a certain band, even if it's narrow, eventually, after a certain time n0, the values of the sequence are going to stay inside this band. Now, if I were to take epsilon to be very small, this thing would still be true -- eventually I'm going to get inside the band -- except that I may have to wait longer for the values to get inside here.

All right, that's what it means for a deterministic sequence to converge to something. Now, how about random variables? What does it mean for a sequence of random variables to converge to a number?
We're just going to twist the word definition a little bit. For numbers, we said that eventually the numbers get inside that band. But if instead of numbers we have random variables with a certain distribution -- so here, instead of a_n, we're dealing with a random variable that has a distribution, let's say, of this kind -- what we want is that this distribution gets inside this band, that it gets concentrated inside here.

What does it mean that the distribution gets inside this band? I mean, a random variable has a distribution. It may have some tails, so maybe not the entire distribution gets concentrated inside the band. But we want more and more of this distribution to be concentrated in this band. So that, in a sense, the probability of falling outside the band converges to 0 -- becomes smaller and smaller.

So in words, we're going to say that the sequence of random variables, or the sequence of probability distributions, which would be the same thing, converges to a particular number a if the following is true. If I consider a small band around a, then the probability that my random variable falls outside this band, which is the area under this curve, becomes smaller and smaller as n goes to infinity. The probability of being outside this band converges to 0.

So that's the intuitive idea. In the beginning, maybe our distribution is sitting everywhere. As n increases, the distribution starts to get concentrated inside the band. When n is even bigger, our distribution is even more inside that band, so that these outside probabilities become smaller and smaller.

So the corresponding mathematical statement is the following. I fix a band around a, a +/- epsilon. Given that band, the probability of falling outside this band converges to 0. Or, another way to say it, the limit of this probability is equal to 0.
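(Written compactly -- a notation sketch, not shown on the slide -- the definition of convergence in probability reads:)

```latex
% Convergence in probability of Y_n to the number a:
% for every tolerance epsilon > 0, the tail probability vanishes.
\[
Y_n \xrightarrow{\ p\ } a
\quad\Longleftrightarrow\quad
\forall \varepsilon > 0:\;
\lim_{n \to \infty} \mathbf{P}\bigl( |Y_n - a| \ge \varepsilon \bigr) = 0 .
\]
```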
376 00:20:26,560 --> 00:20:29,720 If you were to translate this into a complete mathematical 377 00:20:29,720 --> 00:20:31,800 statement, you would have to write down the 378 00:20:31,800 --> 00:20:34,150 following messy thing. 379 00:20:34,150 --> 00:20:37,220 For every epsilon positive -- 380 00:20:37,220 --> 00:20:39,480 that's this statement -- 381 00:20:39,480 --> 00:20:41,240 the limit is 0. 382 00:20:41,240 --> 00:20:44,610 What does it mean that the limit of something is 0? 383 00:20:44,610 --> 00:20:47,670 We flip back to the previous slide. 384 00:20:47,670 --> 00:20:48,110 Why? 385 00:20:48,110 --> 00:20:51,430 Because a probability is a number. 386 00:20:51,430 --> 00:20:54,720 So here we're talking about a sequence of numbers 387 00:20:54,720 --> 00:20:56,340 convergent to 0. 388 00:20:56,340 --> 00:20:58,190 What does it mean for a sequence of numbers to 389 00:20:58,190 --> 00:20:59,180 converge to 0? 390 00:20:59,180 --> 00:21:05,320 It means that for any epsilon prime positive, there exists 391 00:21:05,320 --> 00:21:11,230 some n0 such that for every n bigger than n0 the 392 00:21:11,230 --> 00:21:12,770 following is true -- 393 00:21:12,770 --> 00:21:16,450 that this probability is less than or 394 00:21:16,450 --> 00:21:17,860 equal to epsilon prime. 395 00:21:17,860 --> 00:21:20,610 396 00:21:20,610 --> 00:21:27,660 So the mathematical statement is a little hard to parse. 397 00:21:27,660 --> 00:21:32,270 For every size of that band, and then you take the 398 00:21:32,270 --> 00:21:34,990 definition of what it means for the limit of a sequence of 399 00:21:34,990 --> 00:21:37,720 numbers to converge to 0. 400 00:21:37,720 --> 00:21:42,340 But it's a lot easier to describe this in words and, 401 00:21:42,340 --> 00:21:45,010 basically, think in terms of this picture. 402 00:21:45,010 --> 00:21:48,690 That as n increases, the probability of falling outside 403 00:21:48,690 --> 00:21:51,305 those bands just become smaller and smaller. 404 00:21:51,305 --> 00:21:56,590 So the statement is that our distribution gets concentrated 405 00:21:56,590 --> 00:22:01,340 in arbitrarily narrow little bands around that 406 00:22:01,340 --> 00:22:05,050 particular number a. 407 00:22:05,050 --> 00:22:05,350 OK. 408 00:22:05,350 --> 00:22:07,790 So let's look at an example. 409 00:22:07,790 --> 00:22:11,660 Suppose a random variable Yn has a discrete distribution of 410 00:22:11,660 --> 00:22:13,720 this particular type. 411 00:22:13,720 --> 00:22:17,150 Does it converge to something? 412 00:22:17,150 --> 00:22:19,570 Well, the probability distribution of this random 413 00:22:19,570 --> 00:22:22,370 variable gets concentrated at 0 -- 414 00:22:22,370 --> 00:22:26,520 there's more and more probability of being at 0. 415 00:22:26,520 --> 00:22:29,710 If I fix a band around 0 -- 416 00:22:29,710 --> 00:22:34,850 so if I take the band from minus epsilon to epsilon and 417 00:22:34,850 --> 00:22:36,520 look at that band-- 418 00:22:36,520 --> 00:22:42,350 the probability of falling outside this band is 1/n. 419 00:22:42,350 --> 00:22:45,780 As n goes to infinity, that probability goes to 0. 420 00:22:45,780 --> 00:22:50,550 So in this case, we do have convergence. 421 00:22:50,550 --> 00:22:56,780 And Yn converges in probability to the number 0. 
So this just captures the fact, obvious from this picture, that more and more of our probability distribution gets concentrated around 0 as n goes to infinity.

Now, an interesting thing to notice is the following. Even though Yn converges to 0, if you were to write down the expected value of Yn, what would it be? It's going to be n times the probability of this value, which is 1/n. So the expected value turns out to be 1. And if you were to look at the expected value of Yn-squared, this would be 0 times this probability, plus n-squared times this probability, which is equal to n. And this actually goes to infinity.

So we have this, perhaps, strange situation where a random variable goes to 0, but the expected value of this random variable does not go to 0. And the second moment of that random variable actually goes to infinity. So this tells us that convergence in probability tells you something, but it doesn't tell you the whole story. Convergence to 0 of a random variable doesn't imply anything about convergence of expected values or of variances and so on. The reason is that convergence in probability tells you that this tail probability here is very small. But it doesn't tell you how far out that tail goes. As in this example, the tail probability is small, but that tail sits far away, so it gives a disproportionate contribution to the expected value or to the expected value of the square.

OK. So now we've got everything that we need to go back to the sample mean and study its properties. So the setting is that we have a sequence of random variables. They're independent. They have the same distribution. And we assume that they have a finite mean and a finite variance. We're looking at the sample mean.

Now, in principle, you can calculate the probability distribution of the sample mean, because we know how to find the distributions of sums of independent random variables.
You use the convolution formula over and over. But this is pretty complicated, so let's not look at that. Let's just look at expected values, variances, and the probability that the sample mean is far away from the true mean.

So what is the expected value of this random variable? The expected value of a sum of random variables is the sum of the expected values. And then we have this factor of n in the denominator. Each one of these expected values is mu, so we get mu. So the sample mean -- the average value of this Mn, in expectation -- is the same as the true mean inside our population.

Now, here is a fine conceptual point: there are two kinds of averages involved when you write down this expression. We understand that expectations are some kind of average. The sample mean is also an average, over the values that we have observed. But these are two different kinds of averages. The sample mean is the average of the heights of the penguins that we collected over a single expedition. The expected value is to be thought of as follows: my probabilistic experiment is one expedition to the South Pole. Expected value here means thinking of the average over a huge number of expeditions. So my expedition is a random experiment, I collect random samples, and I record Mn. The average result of an expedition is what we would get if we were to carry out a zillion expeditions and average the averages that we get at each particular expedition. So this Mn is the average during a single expedition. This expectation is the average over an imagined infinite sequence of expeditions.

And of course, the other thing to always keep in mind is that expectations give you numbers, whereas the sample mean is actually a random variable.

All right. So this random variable -- how random is it? How big is its variance? The variance of a sum of independent random variables is the sum of the variances.
But since we're dividing by n, when you calculate variances this brings in a factor of n-squared. So the variance is sigma-squared over n. In particular, the variance of the sample mean becomes smaller and smaller. It means that when you estimate the average height of penguins, if you take a large sample, then your estimate is not going to be too random. The randomness in your estimate becomes small if you have a large sample size. Having a large sample size kind of removes the randomness from your experiment.

Now let's apply the Chebyshev inequality to say something about tail probabilities for the sample mean. The probability that you are more than epsilon away from the true mean is less than or equal to the variance of this quantity divided by this number squared. That's just the translation of the Chebyshev inequality to the particular context we've got here. We found the variance; it's sigma-squared over n. So we end up with this expression.

So what does this expression do? For any given epsilon -- if I fix epsilon -- this probability, which is less than sigma-squared over (n epsilon-squared), converges to 0 as n goes to infinity. And this is just the definition of convergence in probability. If this happens -- the probability of being more than epsilon away from the mean goes to 0, and this is true no matter how I choose my epsilon -- then by definition we have convergence in probability. So we have proved that the sample mean converges in probability to the true mean. And this is what the weak law of large numbers tells us.

So in some vague sense, it tells us that when you take the average of many, many measurements in your sample, the sample mean is a good estimate of the true mean, in the sense that it approaches the true mean as your sample size increases. It approaches the true mean, but of course in a very specific sense -- in probability, according to this notion of convergence that we have used.
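(Here is a minimal simulation sketch of the weak law of large numbers, not from the lecture; the Uniform(0,1) choice for the X's is an illustrative assumption.)

```python
# Check that the sample mean M_n concentrates around the true mean mu, and
# compare the empirical tail probability P(|M_n - mu| >= eps) with the
# Chebyshev bound sigma^2 / (n * eps^2).
import random

random.seed(2)
mu, sigma2 = 0.5, 1.0 / 12.0        # mean and variance of Uniform(0, 1)
eps, trials = 0.05, 2_000

for n in [10, 100, 1_000]:
    exceed = 0
    for _ in range(trials):
        m_n = sum(random.random() for _ in range(n)) / n
        if abs(m_n - mu) >= eps:
            exceed += 1
    print(f"n={n}: empirical tail = {exceed / trials:.3f}, "
          f"Chebyshev bound = {sigma2 / (n * eps ** 2):.3f}")
```

The empirical tail probability shrinks toward 0 as n grows, which is the convergence in probability that the weak law asserts; the Chebyshev bound is loose, but it also goes to 0.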
So since we're talking about sampling, let's go over an example, which is the typical situation faced by someone who's constructing a poll. You're interested in some property of the population -- say, what fraction of the population prefers Coke to Pepsi? So there's a number f, which is that fraction of the population, and this is an exact number. If out of a population of 100 million, 20 million prefer Coke, then f would be 0.2.

We want to find out what that fraction is. We cannot ask everyone. What we're going to do is take a random sample of people and ask them for their preferences. So the ith person either says yes, for Coke, or no. And we record that by putting a 1 each time that we get a yes answer. And then we form the average of these X's. What is this average? It's the number of 1's that we got, divided by n. So this is a fraction, but calculated only on the basis of the sample that we have. So you can think of this as being an estimate, f_hat, based on the sample that we have. Now, even though we used the lower case letter here, this f_hat is, of course, a random variable. f is a number; this is the true fraction in the overall population. f_hat is the estimate that we get by using our particular sample.

OK. So your boss tells you, I need to know what f is, so go and do some sampling. What are you going to respond? Unless I ask everyone in the whole population, there's no way for me to know f exactly. Right? There's no way.

OK, so the boss tells you, well, OK, then tell me f within some accuracy. I want an answer from you -- that's your answer -- which is close to the correct answer within 1 percentage point. So if the true f is 0.4, your answer should be somewhere between 0.39 and 0.41. I want a really accurate answer. What are you going to say? Well, there's no guarantee that my answer will be within 1 percentage point.
Maybe I'm unlucky and I just happen to sample the wrong set of people, and my answer comes out to be wrong. So I cannot give you a hard guarantee that this inequality will be satisfied. But perhaps I can give you a guarantee that this inequality -- this accuracy requirement -- will be satisfied with high confidence. That is, there's going to be a small probability that things go wrong -- that I'm unlucky and I use a bad sample. But leaving aside that small probability of being unlucky, my answer will be accurate within the accuracy requirement that you have.

So these two numbers are the usual specs that one has when designing polls. This number is the accuracy that we want -- the desired accuracy. And this number has to do with the confidence that we want. So 1 minus that number we could call the confidence that we want out of our sample; this is really 1 minus the confidence.

So now your job is to figure out how large an n -- how large a sample -- you should be using in order to satisfy the specs that your boss gave you. All you know at this stage is the Chebyshev inequality, so you just try to use it. The probability of getting an answer that's more than 0.01 away from the true answer is, by the Chebyshev inequality, at most the variance of this random variable divided by this number squared. The variance, as we argued a little earlier, is the variance of the X's divided by n. So we get this expression. And we would like this number to be less than or equal to 0.05.

OK, here we hit a little bit of a difficulty. The variance, (sigma_x)-squared -- what is it? (Sigma_x)-squared, if you remember the variance of a Bernoulli random variable, is this quantity. But we don't know it; f is what we're trying to estimate in the first place. So the variance is not known, and I cannot plug in a number here. What I can do is be conservative and use an upper bound on the variance. How large can this number get?
Well, you can plot f times (1-f). It's a parabola. It has a root at 0 and at 1. So the maximum value is going to be, by symmetry, at 1/2, and when f is 1/2, this variance becomes 1/4. So I don't know (sigma_x)-squared, but I'm going to use the worst-case value for (sigma_x)-squared, which is 1/4. And this is now an inequality that I know to be always true.

I've got my specs, and my specs tell me that I want this number to be less than 0.05. And given what I know, the best thing I can do is to say, OK, I'm going to take this number and make it less than 0.05. If I choose my n so that this is less than 0.05, then I'm certain that this probability is also less than 0.05. What does it take for this inequality to be true? You can solve for n here, and you find that to satisfy this inequality, n should be larger than or equal to 50,000. So you can just let n be equal to 50,000. So the Chebyshev inequality tells us that if you take n equal to 50,000, then we're guaranteed to satisfy the specs that we were given.

OK. Now, 50,000 is a bit of a large sample size. Right? If you read anything in the newspapers where they say such and such a fraction of the voters think this and that, this was determined on the basis of a sample of 1,200 likely voters or so. The numbers that you will typically see in these news items about polling usually involve sample sizes of about 1,000 or so. You will never see a sample size of 50,000. That's too much.

So where can we cut some corners? Well, we can cut corners basically in three places. This requirement is a little too tight. Newspaper stories will usually tell you, we have an accuracy of plus or minus 3 percentage points, instead of 1 percentage point. And because this number comes up as a square, making it 3 percentage points instead of 1 saves you a factor of about 10. Then, the 5 percent confidence -- I guess that's usually OK. If we use that factor of 10 savings that we gain from here, then we get a sample size of about 5,000.
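(To make this arithmetic concrete, here is a small sketch of the Chebyshev-based sample-size calculation: solving 1/(4 n epsilon^2) <= delta for n gives roughly 50,000 for 1-point accuracy and about 5,600 for 3-point accuracy, in line with the factor-of-10 savings just mentioned.)

```python
# Chebyshev bound with the worst-case Bernoulli variance 1/4:
#   P(|f_hat - f| >= eps) <= 1 / (4 * n * eps^2) <= delta
# requires n >= 1 / (4 * eps^2 * delta).
import math

def chebyshev_sample_size(eps, delta):
    return math.ceil(1.0 / (4.0 * eps ** 2 * delta))

print(chebyshev_sample_size(0.01, 0.05))   # 1-point accuracy  -> 50000
print(chebyshev_sample_size(0.03, 0.05))   # 3-point accuracy  -> 5556
```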
And that's, again, a little too big. So where can we fix things? Well, it turns out that the inequality we're using here, the Chebyshev inequality, is just an inequality. It's not that tight. It's not very accurate. Maybe there's a better way of calculating or estimating this quantity that shows it is smaller than this. And using a more accurate inequality, or a more accurate bound, we can convince ourselves that we can settle for a smaller sample size. This more accurate kind of inequality comes out of a different limit theorem, which is the next limit theorem we're going to consider. We're going to start the discussion today, but we're going to continue with it next week.

Before I tell you exactly what that other limit theorem says, let me give you the big picture of what's involved here. We're dealing with sums of i.i.d. random variables. Each X has a distribution of its own. So suppose that X has a distribution which is something like this; this is the density of X. If I add lots of X's together, what kind of distribution do I expect? The mean is going to be n times the mean of an individual X. So if this is mu, I'm going to get a mean of n times mu. But my variance will also increase. When I add the random variables, I'm adding the variances. So since the variance increases, we're going to get a distribution that's pretty wide. So this is the density of X1 plus all the way up to Xn. As n increases, my distribution shifts, because the mean is positive -- I keep adding things. And also, my distribution becomes wider and wider; the variance increases.

Well, we considered a different scaling. We looked at a scaled version of this quantity when we studied the weak law of large numbers. In the weak law of large numbers, we take this random variable and divide it by n.
And what the weak law tells us is that we're going to get a distribution that's very highly concentrated around the true mean, which is mu. So this here would be the density of (X1 plus ... plus Xn) divided by n. Because I've divided by n, the mean has become the original mean, which is mu. But the weak law of large numbers tells us that the distribution of this random variable is very concentrated around the mean. So we get a distribution that's very narrow, of this kind. In the limit, this distribution becomes one that's just concentrated on top of mu -- so it's sort of a degenerate distribution.

So these are two extremes: no scaling for the sum, and a scaling where we divide by n. In one extreme, we get the trivial case of a distribution that flattens out completely. In the other, we get a distribution that gets concentrated around a single point. So now we look at some intermediate scaling that makes things more interesting. Things do become interesting if we scale by dividing the sum by the square root of n instead of dividing by n.

What effect does this have? When we scale by dividing by the square root of n, the variance of Sn over square root of n is going to be the variance of Sn divided by n. That's how variances behave. The variance of Sn is n sigma-squared; divide that by n, and you get sigma-squared. Which means that when we scale in this particular way, as n changes, the variance doesn't change. So the width of our distribution will be sort of constant. The distribution changes shape, but it doesn't become narrower, as was the case here. It doesn't become wider; it kind of keeps the same width. So perhaps in the limit, this distribution is going to take an interesting shape. And that's indeed the case.

So let's do what we did before. We're looking at the sum, and we want to divide the sum by something that goes like the square root of n. The variance of Sn is n sigma-squared. The standard deviation of Sn is the square root of that.
798 00:44:38,240 --> 00:44:39,570 It's this number. 799 00:44:39,570 --> 00:44:43,930 So effectively, we're scaling by order of square root n. 800 00:44:43,930 --> 00:44:47,570 Now, I'm doing another thing here. 801 00:44:47,570 --> 00:44:52,350 If my random variable has a positive mean, then this 802 00:44:52,350 --> 00:44:55,470 quantity is going to have a mean that's 803 00:44:55,470 --> 00:44:56,950 positive and growing. 804 00:44:56,950 --> 00:44:59,450 It's going to be shifting to the right. 805 00:44:59,450 --> 00:45:01,350 Why is that? 806 00:45:01,350 --> 00:45:04,370 Sn has a mean that's proportional to n. 807 00:45:04,370 --> 00:45:09,510 When I divide by square root n, then it means that the mean 808 00:45:09,510 --> 00:45:11,990 scales like square root of n. 809 00:45:11,990 --> 00:45:14,740 So my distribution would still keep shifting 810 00:45:14,740 --> 00:45:16,720 after I do this division. 811 00:45:16,720 --> 00:45:20,860 I want to keep my distribution in place, so I subtract out 812 00:45:20,860 --> 00:45:23,920 the mean of Sn. 813 00:45:23,920 --> 00:45:29,580 So what we're doing here is a standard technique or 814 00:45:29,580 --> 00:45:32,670 transformation where you take a random variable and you 815 00:45:32,670 --> 00:45:34,890 so-called standardize it. 816 00:45:34,890 --> 00:45:38,500 I remove the mean of that random variable and I divide 817 00:45:38,500 --> 00:45:40,100 by the standard deviation. 818 00:45:40,100 --> 00:45:43,030 This results in a random variable that has 0 mean and 819 00:45:43,030 --> 00:45:44,960 unit variance. 820 00:45:44,960 --> 00:45:49,880 What Zn measures is the following: Zn tells me how 821 00:45:49,880 --> 00:45:55,520 many standard deviations I am away from the mean. 822 00:45:55,520 --> 00:45:59,380 Sn minus (n times expected value of X) tells me how far 823 00:45:59,380 --> 00:46:02,980 Sn is away from the mean value of Sn. 824 00:46:02,980 --> 00:46:06,250 And by dividing by the standard deviation of Sn -- 825 00:46:06,250 --> 00:46:09,830 this tells me how many standard deviations away from 826 00:46:09,830 --> 00:46:12,550 the mean I am. 827 00:46:12,550 --> 00:46:15,360 So we're going to look at this random variable, which is just 828 00:46:15,360 --> 00:46:17,260 a transformation, Zn. 829 00:46:17,260 --> 00:46:20,840 It's a linear transformation of Sn. 830 00:46:20,840 --> 00:46:24,740 And we're going to compare this random variable to a 831 00:46:24,740 --> 00:46:27,230 standard normal random variable. 832 00:46:27,230 --> 00:46:30,610 So a standard normal is the random variable that you are 833 00:46:30,610 --> 00:46:35,200 familiar with, given by the usual formula, and for which 834 00:46:35,200 --> 00:46:37,400 we have tables. 835 00:46:37,400 --> 00:46:40,400 This Zn has 0 mean and unit variance. 836 00:46:40,400 --> 00:46:44,220 So in that respect, it has the same statistics as the 837 00:46:44,220 --> 00:46:45,655 standard normal. 838 00:46:45,655 --> 00:46:48,960 The distribution of Zn could be anything -- 839 00:46:48,960 --> 00:46:50,770 it can be pretty messy.
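[Editor's note: a minimal simulation sketch, not part of the lecture. It assumes Python with NumPy (the lecture uses no code) and an arbitrary illustrative choice of Exponential(1) terms, so mu = 1 and sigma = 1. It checks numerically the three scalings discussed above: Sn/n concentrates around mu, Sn/sqrt(n) keeps a roughly constant variance of sigma squared, and the standardized Zn = (Sn - n*mu)/(sqrt(n)*sigma) has zero mean and unit variance.]

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0      # mean and standard deviation of an Exponential(1) term
trials = 10_000           # independent realizations of the sum Sn

for n in (10, 100, 1000):
    X = rng.exponential(scale=1.0, size=(trials, n))
    Sn = X.sum(axis=1)

    sample_mean = Sn / n                        # variance shrinks like sigma^2 / n
    scaled = Sn / np.sqrt(n)                    # variance stays near sigma^2
    Zn = (Sn - n * mu) / (np.sqrt(n) * sigma)   # standardized: mean near 0, variance near 1

    print(n, sample_mean.var(), scaled.var(), Zn.mean(), Zn.var())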
840 00:46:50,770 --> 00:46:53,320 But there is this amazing theorem called the central 841 00:46:53,320 --> 00:46:58,250 limit theorem that tells us that the distribution of Zn 842 00:46:58,250 --> 00:47:01,930 approaches the distribution of the standard normal in the 843 00:47:01,930 --> 00:47:06,270 following sense: probabilities that you can 844 00:47:06,270 --> 00:47:07,080 calculate -- 845 00:47:07,080 --> 00:47:07,930 of this type -- 846 00:47:07,930 --> 00:47:10,350 that you can calculate for Zn -- 847 00:47:10,350 --> 00:47:13,330 in the limit become the same as the probabilities that you 848 00:47:13,330 --> 00:47:17,590 would get from the standard normal tables for Z. 849 00:47:17,590 --> 00:47:19,750 It's a statement about the cumulative 850 00:47:19,750 --> 00:47:21,960 distribution functions. 851 00:47:21,960 --> 00:47:25,060 This quantity, as a function of c, is the cumulative 852 00:47:25,060 --> 00:47:27,920 distribution function of the random variable Zn. 853 00:47:27,920 --> 00:47:30,860 This is the cumulative distribution function of the 854 00:47:30,860 --> 00:47:32,190 standard normal. 855 00:47:32,190 --> 00:47:34,530 The central limit theorem tells us that the cumulative 856 00:47:34,530 --> 00:47:39,340 distribution function of the sum of a number of random 857 00:47:39,340 --> 00:47:43,040 variables, after they're appropriately standardized, 858 00:47:43,040 --> 00:47:46,480 approaches the cumulative distribution function of the 859 00:47:46,480 --> 00:47:50,580 standard normal distribution. 860 00:47:50,580 --> 00:47:53,620 In particular, this tells us that we can calculate 861 00:47:53,620 --> 00:47:59,480 probabilities for Zn when n is large by calculating instead 862 00:47:59,480 --> 00:48:02,800 probabilities for Z. And that's going to be a good 863 00:48:02,800 --> 00:48:04,020 approximation. 864 00:48:04,020 --> 00:48:07,670 Probabilities for Z are easy to calculate because they're 865 00:48:07,670 --> 00:48:09,250 well tabulated. 866 00:48:09,250 --> 00:48:12,820 So we get a very nice shortcut for calculating 867 00:48:12,820 --> 00:48:14,990 probabilities for Zn. 868 00:48:14,990 --> 00:48:17,990 Now, it's not Zn that you're interested in. 869 00:48:17,990 --> 00:48:20,890 What you're interested in is Sn. 870 00:48:20,890 --> 00:48:23,820 And Sn -- 871 00:48:23,820 --> 00:48:29,080 inverting this relation here -- 872 00:48:29,080 --> 00:48:38,330 Sn is square root n times sigma times Zn plus n times the expected 873 00:48:38,330 --> 00:48:42,602 value of X. All right. 874 00:48:42,602 --> 00:48:46,620 Now, if you can calculate probabilities for Zn, even 875 00:48:46,620 --> 00:48:49,380 approximately, then you can certainly calculate 876 00:48:49,380 --> 00:48:53,290 probabilities for Sn, because one is a linear 877 00:48:53,290 --> 00:48:55,206 function of the other. 878 00:48:55,206 --> 00:48:58,710 And we're going to do a little bit of that next time. 879 00:48:58,710 --> 00:49:02,220 You're going to get, also, some practice in recitation. 880 00:49:02,220 --> 00:49:04,975 At a more vague level, you could describe the central 881 00:49:04,975 --> 00:49:08,270 limit theorem as saying the following: when n is large, 882 00:49:08,270 --> 00:49:12,160 you can pretend that Zn is a standard normal random 883 00:49:12,160 --> 00:49:15,440 variable and do the calculations as if Zn were 884 00:49:15,440 --> 00:49:16,680 standard normal.
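[Editor's note: another sketch under the same assumptions (Python with NumPy, plus SciPy for the normal CDF; the Bernoulli(0.3) terms and all numbers are purely illustrative). It compares the empirical cumulative distribution function of Zn at a few points c with the standard normal CDF Phi(c), which is the sense of convergence stated above.]

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

p = 0.3                                  # Bernoulli parameter (illustrative)
mu, sigma = p, np.sqrt(p * (1 - p))      # mean and standard deviation of one term
n, trials = 300, 50_000

X = rng.binomial(1, p, size=(trials, n))
Sn = X.sum(axis=1)
Zn = (Sn - n * mu) / (np.sqrt(n) * sigma)

for c in (-1.0, 0.0, 1.0, 2.0):
    print(c, (Zn <= c).mean(), norm.cdf(c))   # empirical P(Zn <= c) vs. Phi(c)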
885 00:49:16,680 --> 00:49:21,530 Now, pretending that Zn is normal is the same as 886 00:49:21,530 --> 00:49:25,900 pretending that Sn is normal, because Sn is a linear 887 00:49:25,900 --> 00:49:27,700 function of Zn. 888 00:49:27,700 --> 00:49:30,400 And we know that linear functions of normal random 889 00:49:30,400 --> 00:49:32,140 variables are normal. 890 00:49:32,140 --> 00:49:36,290 So the central limit theorem essentially tells us that we 891 00:49:36,290 --> 00:49:40,070 can pretend that Sn is a normal random variable and do 892 00:49:40,070 --> 00:49:44,760 the calculations just as if it were a normal random variable. 893 00:49:44,760 --> 00:49:47,020 Mathematically speaking though, the central limit 894 00:49:47,020 --> 00:49:50,480 theorem does not talk about the distribution of Sn, 895 00:49:50,480 --> 00:49:54,940 because the distribution of Sn becomes degenerate in the 896 00:49:54,940 --> 00:49:57,650 limit, just a very flat and long thing. 897 00:49:57,650 --> 00:49:59,810 So strictly speaking mathematically, it's a 898 00:49:59,810 --> 00:50:03,060 statement about cumulative distributions of Zn's. 899 00:50:03,060 --> 00:50:06,420 Practically, the way you use it is by just pretending that 900 00:50:06,420 --> 00:50:08,415 Sn is normal. 901 00:50:08,415 --> 00:50:09,400 Very good. 902 00:50:09,400 --> 00:50:11,080 Enjoy the Thanksgiving Holiday. 903 00:50:11,080 --> 00:50:12,330
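[Editor's note: a final sketch of the practical use just described, again assuming Python with NumPy and SciPy, and hypothetical numbers. Pretending that Sn is normal with mean n*mu and standard deviation sqrt(n)*sigma, a probability for Sn is read off the standard normal CDF.]

import numpy as np
from scipy.stats import norm

def approx_prob_sn_at_most(s, n, mu, sigma):
    # Normal approximation: P(Sn <= s) is approximately
    # Phi((s - n*mu) / (sqrt(n)*sigma)).
    return norm.cdf((s - n * mu) / (np.sqrt(n) * sigma))

# Hypothetical example: 100 terms, each with mean 2 and standard
# deviation 3; probability that the sum is at most 230.
# (230 - 200) / (10 * 3) = 1, so this is about Phi(1), roughly 0.84.
print(approx_prob_sn_at_most(230, n=100, mu=2.0, sigma=3.0))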