FEMALE SPEAKER: The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, so my name is Ben Olken, and we're going to be talking about how to think about sample size for randomized evaluations. And more generally, the point of this lecture is not just sample size. We've spent a lot of time, in the last lecture for example, thinking about the data we're going to collect. Then the question is, well, what are we going to do with that data? So it's about sample size, but also more generally, we're going to talk about how we analyze data in the context of an experiment.

OK. So as I said, where we're going to end up at the end of this lecture is: how big a sample do we need? But in order to think about how big a sample we need, we need to understand a little more about how we actually analyze this data.
When we say, how large does a sample need to be to credibly detect a given treatment effect, we're going to need to be a little more precise about what we mean by "credibly," and in particular think a little bit about the statistics involved in understanding these experiments. And particularly, when we say something is credibly different, what we mean is that we can be reasonably sure -- and I'll be a little more precise about what we mean by that -- that the difference between the two groups, the treatment and the control group, didn't just occur by random chance, right? That there's really something that we'll call statistically significantly different between these two groups, OK? And when we think about randomizing -- we've talked about which groups get the treatment and which get the control -- that's going to mean that we expect the two groups to be similar if there were no treatment effect, because the only difference between them is that they were randomized. But there's going to be some variation in the outcomes between the two different groups, OK? And so randomization is going to remove the bias.
It's going to mean that we expect the two different groups to be the same, but there still could be noise. So in some sense, another way of thinking about this lecture is that this lecture is all about the noise: how big a sample do we need for the noise to be sufficiently small for us to actually credibly detect the differences between the two different groups, OK? So that's what we're going to talk about -- basically, how large is large, so we can get rid of the noise? And let me say, by the way, that we've got an hour and a half, but you should feel free to interrupt with questions if I say something that's not clear, because there's a lot of material that we're going to be going through pretty quickly.

OK. So when we think about how big our sample needs to be -- remember, the whole point is, how big does our sample have to be to remove the noise that's going to be in our data? And when we think about that, we think essentially about how noisy our data is, right? So how big a sample we need is going to be determined by how noisy the data is and also how big an effect we're looking for, right?
So if the data is really noisy but the effect is enormous, then we don't need as big a sample. But if the effect we're looking for is really small relative to the noise in the data, we're going to need a bigger sample. So really, it's the comparison between the effect size and how noisy the data is -- the ratio between these things -- that's important.

Other factors that we're going to talk about are: did we do a baseline survey before we started? Because a baseline can essentially help us reduce the noise, in some sense. We're going to talk about whether individual responses are correlated with each other. So for example, if we were to randomize a whole group of people into a given treatment, that group might be similar in lots of other respects. So you can't really count that whole group as if they were all independent observations, because they might be correlated. For example, you all just took my lecture. So if you all were put in the same treatment group, you all were exposed to the treatment, but you also all were exposed to my lecture, and so you're not necessarily independent observations.
And there are some other issues in terms of the design of the experiment that can affect sample size as well, like stratification, control variables, baseline data, et cetera, which we're going to talk about, OK?

So the way we're going to go in this lecture is, I'm going to start off with some basics about what it means to test a hypothesis statistically. And then when we get into hypothesis testing, there are two different types of errors that we're going to talk about. They're helpfully named type I and type II errors. And you have to be careful not to make a type III error, which is to confuse a type I and a type II error. So we'll talk about what those are. Then we'll talk about standard errors and significance, which is how we think more formally about these different types of errors. We'll talk about power. We'll talk about the effect size. And then, finally, the factors that influence power, OK? So this is all the stuff we're going to go through, all right?
So in order to understand the basic concepts of hypothesis testing, we need to think a little about probabilities, OK? Because all of this comes down, essentially, to some basic analysis about probability. And the intuition here is that the more observations we get, the better we can understand whether something we observed was due to a real difference in the underlying process or whether it was just random chance. So consider the following example. Suppose you're faced with a professional gambler who told you that she could get heads most of the time. OK, so you might think this is a reasonable claim or an unreasonable claim, but this is what she's claiming, and you want to see if it's true. So she tosses the coin and gets heads, right? So can we learn anything from that? Well, probably not, because anyone, even with a fair coin, would get heads 50% of the time. So we really can't infer anything from this one toss.
What if you saw her toss it five times and get heads, heads, tails, heads, heads? Well, can you infer anything from that? Well, maybe. You can start to say, well, this seems less likely to have occurred just by random chance. But there are only five tosses. What's the chance that someone with a fair coin would get four heads out of five? Well, we could calculate that if we knew the probabilities. And it's certainly not impossible that this could occur, right? And now, what if there were 20 tosses, right? Well, now you're starting to get information, although in this particular example, it was closer to 50-50. So now you have 12 heads versus eight tails. Could that have occurred by random chance? Well, maybe it could have, right? Because it's pretty close to 50-50. And now, suppose you had 100 tosses, or suppose you had 1,000 tosses with 609 heads and 391 tails, right? So as you're getting more and more data, you're much more likely to be able to say something is meaningful. So if you saw the 12-versus-eight data, for example, the odds that it could occur by random chance are pretty high.
But if you saw 609 heads and 391 tails out of 1,000 tosses, it's actually pretty unlikely that this would occur just by random chance, OK? And so this shows you that as you get more data, you can actually ask, how likely was this outcome to have occurred by random chance? And the more data you have, the more likely you're going to be able to conclude that the difference you observed was actually due to something the person was doing and not just due to what would happen randomly. And in some sense, all of statistics is basically this intuition: you take the data you observe and you calculate, what is the chance that the data I observed could have occurred just by random chance? And if that chance is really small, then you say, well, it must be that your program actually had an effect, OK? Does that make sense? That's the basic idea, essentially, of all of statistics: what's the probability that this thing could have happened randomly? And if it's unlikely, then probably there was something else going on.
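That calculation can be done exactly for the coin examples. Here is a minimal sketch (not from the lecture; the helper name is mine) that computes the chance a fair coin produces at least as many heads as observed:

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at
    least k heads in n tosses of a coin with heads-probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 12 heads in 20 tosses of a fair coin: entirely plausible by chance.
print(prob_at_least(12, 20))    # about 0.25

# 609 heads in 1,000 tosses: essentially impossible for a fair coin.
print(prob_at_least(609, 1000))
```

So a fair coin produces the 12-versus-eight split roughly a quarter of the time, while the 609-versus-391 split essentially never happens by chance -- exactly the intuition above, made quantitative.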
Here's another example. Suppose you have a second gambler who had 1,000 tosses, with 530 heads and 470 tails. That's really a lot of data. But what we can learn from this data depends on what hypothesis we're interested in. So if the gambler claimed she obtained heads 70% of the time, we could probably say, no, I don't think so, right? This is enough data that the odds you would get this data pattern if you really got heads 70% of the time are really, really small, right? So we could say, I can reject this claim. But suppose she claimed she could get heads 54% of the time, OK? And you observe she got heads 53% of the time. Well, you probably couldn't reject that claim, right? Because 53% is similar enough to 54% that if 54% were the truth, this data could have occurred by random chance. So in some sense, what we can say based on the data depends on how far the data is from our hypothesis and how much data we have. Does that make sense as some basic intuition? OK. So how do we apply this to an experiment?
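Before getting to experiments, it may help to see those two claims checked numerically. A sketch using the same exact-binomial logic (the 530-out-of-1,000 data and the 70% and 54% claims come from the example above):

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Observed: 530 heads in 1,000 tosses.
# If the gambler really got heads 70% of the time, seeing 530 or
# fewer heads would be astronomically unlikely -> reject that claim.
print(prob_at_most(530, 1000, 0.70))

# If she really got heads 54% of the time, 530 or fewer heads
# happens routinely -> we cannot reject that claim.
print(prob_at_most(530, 1000, 0.54))   # roughly 0.27
```

Same data, two different verdicts: what you can reject depends on which hypothesis you test it against.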
Well, at the end of the experiment, what we're going to do is compare the two different groups -- the treatment and the control group. We're going to take a look at the averages, just like we were doing in the gambling example: we'll compare the average in the treatment group and the average in the control group, OK? And the difference is the effect size. So for example, in the Panchayat case, you'd look at the mean number of wells in the villages with the female leaders versus the mean number of wells in the villages with the male leaders, OK? So that's, in some sense, our estimate of how big the difference is. And the question is going to be, how likely would we have been to observe this difference between the treatment and the control group if it was just due to random chance, OK? And that's what we need the statistics to figure out. So where does the noise come from?
In some sense, we're not going to observe an infinite number of villages, or all possible villages. In fact, even if we observed all the villages that exist, we're not going to observe all of the villages that could have hypothetically existed if the villages were replicated millions and millions of times. We're just going to observe some finite number of villages. And so we're going to estimate this mean by computing the mean in the villages that we observed, OK? And if there are very few villages, the mean that we calculate is going to be imprecise, because if you took a different sample of villages, you would get a slightly different mean, OK? If you sampled an infinite number of villages, you'd get the same thing every time. But suppose you only sampled one village. Or suppose there were a million villages out there and you sampled two, right? And you took the average, OK? If you sampled a different two villages, just by random chance, you would get a different average.
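That sampling story is easy to simulate. A sketch with made-up numbers (not the lecture's data): draw villages from a hypothetical population whose true mean number of wells is 50, and watch how much the sample mean wobbles for small versus large samples.

```python
import random

random.seed(0)

# A hypothetical population of 100,000 villages; the true mean
# number of wells per village is 50.
population = [random.gauss(50, 10) for _ in range(100_000)]

def sample_mean(n):
    """Average wells in a random sample of n villages."""
    return sum(random.sample(population, n)) / n

# Sampling just 2 villages: the estimate bounces around a lot.
print([round(sample_mean(2), 1) for _ in range(5)])

# Sampling 1,000 villages: the estimate barely moves from 50.
print([round(sample_mean(1000), 1) for _ in range(5)])
```

Each call to `sample_mean` is one hypothetical study; the spread across repeated calls is exactly the noise being described here.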
And that's where part of the noise in our data is coming from.

So in some sense, what we need to know is -- it goes back to the same idea as before -- if these two groups were the same and I sampled them, what are the chances I would get the difference that I observed by random chance? So for example, suppose you observed these two distributions, OK? So this is your control group and this is your treatment group. Now you can see there is some noise in the data, right? This one has a mean of 50 and this one has a mean of 60. And these are histograms, right? So this is the distribution of the number of villages that you observed for each possible outcome. So you can see here that there's some noise, right? It's not that everyone here was exactly 50 and everyone here was exactly 60. Some people were 45. Some were 55 or whatever.
But if you look at these two distributions, you could say it's pretty unlikely, if they were actually drawn from the same distribution of villages, that all of the blue ones would be over here and all the yellow ones would be over here. It's very unlikely that if these were actually the same and you drew randomly, you'd get this real bifurcation of the villages, OK? And what are we basing that conclusion on? We're basing it on the fact that there's not a lot of overlap, in some sense, between these two groups. But now, what if you saw this picture, right? What would you be able to conclude? Well, it's a little less clear. The means are still the same. The yellows still have an average of 60 and the blues have an average of 50. But there's a lot more overlap between them. Now if we look at this, we can sort of eyeball it and say, well, there's really a pretty big difference even relative to the distributions there. So maybe we could conclude that they were really different. Maybe not. And what if we saw this, right? These are still the same means.
The yellows have a mean of 60 and the blues have a mean of 50. But now they're so interspersed that it's harder to know. It's possible, if you saw pictures like this, you would say, well, yes, the yellows are higher, but maybe this was just due to random chance, OK? So the purpose of these graphs is to show you that in all three cases, we saw the same difference in the mean outcomes -- it was 60 versus 50, right? But when you saw the first graph, it was quite clear that these two groups were really different. When you saw the last graph, it was much harder to figure out whether these two were really different or whether this was just due to random chance, OK? Does that make sense of where we're going? And so, just to come back to the same theme, all the statistics are going to do in our case is help us figure out: given the distribution of data we have, how likely is it that the difference we observed could have happened by random chance? And so intuitively, we can look at this one and say, definitely different. And this one, maybe not sure.
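That eyeballing is what a formal test automates: it scales the gap in means by the spread of the data. Here is a sketch (a standard unpaired two-sample t-statistic, not the lecture's notation, with hypothetical numbers) using the same 60-versus-50 gap at two different noise levels:

```python
import random

def t_statistic(a, b):
    """Unpaired two-sample t-statistic (equal-variance form):
    the difference in means divided by its estimated noise."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5  # standard error of the gap
    return (ma - mb) / se

random.seed(1)
# Tight distributions (like the first picture): small spread around 60 and 50.
tight = t_statistic([random.gauss(60, 5) for _ in range(100)],
                    [random.gauss(50, 5) for _ in range(100)])
# Interspersed distributions (like the last picture): same means, big spread.
wide = t_statistic([random.gauss(60, 30) for _ in range(100)],
                   [random.gauss(50, 30) for _ in range(100)])
print(round(tight, 1), round(wide, 1))
```

Same 10-unit gap in both cases, but the tight case yields a t-statistic far above the usual significance cutoff of about 2, while the interspersed case is far less decisive.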
But if we want to be a little more precise about that, that's where we need the statistics.

AUDIENCE: Is the sample size the same in both examples?

PROFESSOR: Yeah, the sample size is exactly the same. You can see that the bars go down because it's more spread out.

All right. So in some sense, what are the ingredients that we've talked about in terms of thinking about whether you have a statistically significant difference? If you think back to the gambler example, we talked about how the sample size matters, right? So if we saw 1,000 tosses, we had much more precision about our estimates than if we had 10 tosses or five tosses. The hypothesis you're testing matters, right? Because the smaller the effect size we're trying to detect, the more tosses we need in the gambler example. If you're trying to detect a really small difference, you need a ton of data, whereas if you're trying to detect really extreme differences, you can do it with less data, OK? And the third thing we saw is that the variability of the outcome matters, right?
So the more noisy the outcome is, the harder it is to know whether the differences that we observe are due just to random chance or really due to some difference between the treatment and the control group.

OK, so does this make sense? These are the three ingredients that we're going to be playing with. Do these make sense? Do you have questions on this? OK.

So you may have heard of a confidence interval. How many of you guys have heard of a confidence interval? OK. How many of you can state the definition of a confidence interval? Thanks, Dan. I'm glad that you can. So what do we mean when we say confidence interval? Let's just go through what's on the slide and then we can talk about it a little more. So we're going to measure, say, 100 people, and we're going to come up with an average length of 53 centimeters. So we want to be able to say something about how precise our estimate is. So we say the average is 53 centimeters.
How confident are we, or how precise is our estimate, that it's 53 centimeters? That's what a confidence interval is trying to say. So a confidence interval of 50 to 56 tells us that with 95% probability, the true average length lies between 50 and 56. And the precise definition is that if you had a hypothesis that the true average length was any value in this range, you could not reject that hypothesis, at the 95% level, given the data that you observed, OK? A converse way of saying it is that you can be 95% certain that the truth is somewhere within this range. So if you did 20 of these tests, only one out of 20 times would the truth be outside your confidence interval, OK? And so an approximate interpretation of a confidence interval is -- so we have the point estimate of 53, but we have some uncertainty about that estimate. We think the average is 53, but there's some uncertainty.
421 00:19:27,790 --> 00:19:32,210 And the confidence interval says, well, it's 95% likely 422 00:19:32,210 --> 00:19:35,755 that the true answer is between 50 and 56, if that was 423 00:19:35,755 --> 00:19:41,300 the confidence interval, OK? 424 00:19:41,300 --> 00:19:45,730 So why is that useful for us? 425 00:19:45,730 --> 00:19:47,700 Well, our goal is to figure out-- 426 00:19:47,700 --> 00:19:49,280 we don't care, actually, what our estimate of the 427 00:19:49,280 --> 00:19:50,080 program's effect is. 428 00:19:50,080 --> 00:19:51,990 We care what the true effect of a program is, right? 429 00:19:51,990 --> 00:19:52,880 So we did some intervention. 430 00:19:52,880 --> 00:19:54,540 Like, for example, we had a female Panchayat leader 431 00:19:54,540 --> 00:19:56,870 instead of a male Panchayat leader and we want to figure 432 00:19:56,870 --> 00:20:02,740 out what the actual difference that that intervention made is 433 00:20:02,740 --> 00:20:04,230 in the world. 434 00:20:04,230 --> 00:20:08,930 We're going to observe some sample of Panchayats and we'll 435 00:20:08,930 --> 00:20:11,500 look at the difference in that sample. 436 00:20:11,500 --> 00:20:13,490 And we want to know how much can we learn about the true 437 00:20:13,490 --> 00:20:15,650 program effect from what we estimated. 438 00:20:15,650 --> 00:20:17,580 And the confidence interval basically tells us that with 439 00:20:17,580 --> 00:20:21,310 95% probability, the true program effect is somewhere in 440 00:20:21,310 --> 00:20:24,020 the confidence interval, OK? 441 00:20:24,020 --> 00:20:25,270 Does that make sense? 442 00:20:29,790 --> 00:20:33,020 How many of you guys have heard of the standard error? 443 00:20:33,020 --> 00:20:33,930 OK. 
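The slide's 53-centimeter example can be sketched in a few lines of Python. This is not from the lecture: the sample below is made up, and the interval uses the usual normal approximation (mean plus or minus roughly 1.96 standard errors).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Approximate CI for the mean of `sample`, assuming the sample
    mean is roughly normally distributed (reasonable for large n)."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / sqrt(n)               # uncertainty of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # ~1.96 for level=0.95
    return m - z * se, m + z * se

# Hypothetical sample of 100 lengths spread between 50 and 56 cm.
lengths = [53 + ((i * 37) % 21 - 10) * 0.3 for i in range(100)]
lo, hi = mean_confidence_interval(lengths)
```

With 100 observations the interval is narrow; shrink the sample and the same code produces a wider, less precise interval.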
444 00:20:33,930 --> 00:20:38,620 So a standard error is related to the confidence interval in 445 00:20:38,620 --> 00:20:45,430 that a standard error says that if we have some estimate, 446 00:20:45,430 --> 00:20:49,840 you could imagine that if we did the experiment again, 447 00:20:49,840 --> 00:20:52,610 essentially, with a new sample of people that looked like the 448 00:20:52,610 --> 00:20:57,830 original sample of people, we might get a slightly different 449 00:20:57,830 --> 00:20:59,360 point estimate because it's a different sample. 450 00:21:02,680 --> 00:21:06,470 The standard error basically says, what's the distribution 451 00:21:06,470 --> 00:21:10,810 of those possible estimates that you could get, OK? 452 00:21:10,810 --> 00:21:13,640 So it says that basically, if I did this experiment again, 453 00:21:13,640 --> 00:21:15,810 maybe I wouldn't get 53, I'd get 54. 454 00:21:15,810 --> 00:21:17,380 If I did it again, maybe I'd get 52. 455 00:21:17,380 --> 00:21:19,020 If I did it again, I might get 53. 456 00:21:19,020 --> 00:21:21,220 The standard error is essentially the standard 457 00:21:21,220 --> 00:21:25,250 deviation of those possible estimates that you could get. 458 00:21:25,250 --> 00:21:27,880 What that means in practice is that-- 459 00:21:31,870 --> 00:21:33,080 well, in practice, the standard error is very related 460 00:21:33,080 --> 00:21:34,340 to the confidence interval. 461 00:21:34,340 --> 00:21:38,150 And basically, a good rule of thumb is that a 95% confidence 462 00:21:38,150 --> 00:21:40,330 interval is about two standard errors. 463 00:21:40,330 --> 00:21:43,400 So if you ever see an estimate of the standard error, you can 464 00:21:43,400 --> 00:21:45,180 calculate the confidence interval, essentially, by 465 00:21:45,180 --> 00:21:51,420 going up or down two standard errors from the point 466 00:21:51,420 --> 00:21:54,020 estimate, OK? 
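The "run the experiment again" thought experiment is easy to simulate. This sketch is my own illustration, with a made-up population (mean 53, standard deviation 3): draw the same experiment many times, record each sample mean, and check that the spread of those estimates matches the textbook standard-error formula sd/sqrt(n).

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)
POP_MEAN, POP_SD, N = 53.0, 3.0, 100

# Repeat the "same" experiment 2,000 times, keeping each sample mean.
estimates = [
    mean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(2000)
]

# The standard error is the standard deviation of those estimates;
# theory says it should be close to POP_SD / sqrt(N) = 0.3 here.
empirical_se = stdev(estimates)
theoretical_se = POP_SD / sqrt(N)
```

Each replication gives a slightly different mean (54, 52, 53, ...), just as in the lecture's example; the standard error summarizes how much they scatter.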
467 00:21:54,020 --> 00:21:56,250 And the confidence interval and standard error, 468 00:21:56,250 --> 00:21:57,980 essentially, are capturing the same thing. 469 00:21:57,980 --> 00:22:00,190 They're both capturing-- 470 00:22:00,190 --> 00:22:03,080 when I said we need statistics to basically compute, how 471 00:22:03,080 --> 00:22:05,420 likely is it that we would get these differences by random 472 00:22:05,420 --> 00:22:07,930 chance, those are all coming out in the standard error and 473 00:22:07,930 --> 00:22:09,060 the confidence interval, right? 474 00:22:09,060 --> 00:22:12,630 They're computed by both looking at how noisy our data 475 00:22:12,630 --> 00:22:17,710 is, which is the variability of the outcome, and how big 476 00:22:17,710 --> 00:22:19,410 our sample is, right? 477 00:22:19,410 --> 00:22:22,450 Because from these two things, you can basically calculate 478 00:22:22,450 --> 00:22:26,570 how uncertain your estimate would be. 479 00:22:26,570 --> 00:22:29,800 This is a lot of terminology very quickly, but does this 480 00:22:29,800 --> 00:22:32,180 all make sense? 481 00:22:32,180 --> 00:22:35,380 Any questions on this? 482 00:22:35,380 --> 00:22:36,630 OK. 483 00:22:38,810 --> 00:22:41,550 So for example. 484 00:22:41,550 --> 00:22:46,005 So suppose we saw the sampled women Pradhans had 7.13 years 485 00:22:46,005 --> 00:22:51,200 of education and the men had 9.92 years of education, OK? 486 00:22:51,200 --> 00:22:56,410 And you want to know, is the truth that men have more 487 00:22:56,410 --> 00:22:58,860 education than women or is this just a random artifact of 488 00:22:58,860 --> 00:23:00,980 our sample? 489 00:23:00,980 --> 00:23:06,640 So suppose you calculated that the difference was 2.59. 490 00:23:06,640 --> 00:23:07,880 That's easy to calculate. 
491 00:23:07,880 --> 00:23:11,520 And the standard error was 0.54, where the standard error 492 00:23:11,520 --> 00:23:13,650 is calculated based on both how much data 493 00:23:13,650 --> 00:23:17,250 you had and how noisy the data was. 494 00:23:17,250 --> 00:23:19,660 You would compute that the 95% confidence interval is between 495 00:23:19,660 --> 00:23:22,620 1.53 and 3.64, OK? 496 00:23:22,620 --> 00:23:24,960 So this means that with 95% probability, the true 497 00:23:24,960 --> 00:23:27,570 difference in education between men and women is 498 00:23:27,570 --> 00:23:30,300 between 1.53 and 3.64. 499 00:23:30,300 --> 00:23:33,870 So if you were interested in testing the hypothesis that, 500 00:23:33,870 --> 00:23:38,570 in fact, men and women are the same in education, you could 501 00:23:38,570 --> 00:23:40,860 say that I can reject that hypothesis. 502 00:23:40,860 --> 00:23:43,285 With 95% probability, the true difference is 503 00:23:43,285 --> 00:23:45,360 between 1.53 and 3.64-- 504 00:23:45,360 --> 00:23:49,040 so zero is not in this confidence interval, right? 505 00:23:49,040 --> 00:23:52,910 So we can reject the hypothesis that there's no 506 00:23:52,910 --> 00:23:55,730 difference between these two groups. 507 00:23:55,730 --> 00:23:58,130 Does that make sense? 508 00:23:58,130 --> 00:23:59,600 So let's do another example. 509 00:23:59,600 --> 00:24:02,410 So in this example, suppose that we saw that control 510 00:24:02,410 --> 00:24:04,790 children had an average test score of 2.45 and the 511 00:24:04,790 --> 00:24:07,350 treatment had an average test score of 2.5. 512 00:24:07,350 --> 00:24:10,060 So we saw a difference of 0.05 and the standard 513 00:24:10,060 --> 00:24:13,530 error was 0.26, OK? 514 00:24:13,530 --> 00:24:15,620 So in this case, you would say well, the 95% confidence 515 00:24:15,620 --> 00:24:18,690 interval is minus 0.55. 516 00:24:18,690 --> 00:24:20,710 This is approximately two. 
517 00:24:20,710 --> 00:24:21,820 It's not exactly two. 518 00:24:21,820 --> 00:24:22,630 Minus 0.55-- 519 00:24:22,630 --> 00:24:24,140 oh, no, it is exactly two in this example. 520 00:24:24,140 --> 00:24:28,850 Minus 0.55 to 0.46, OK? 521 00:24:28,850 --> 00:24:31,355 And here, you would say that if we were testing the 522 00:24:31,355 --> 00:24:33,170 null hypothesis that the 523 00:24:33,170 --> 00:24:36,300 treatment had no effect on test scores, you could not 524 00:24:36,300 --> 00:24:37,890 reject that null hypothesis, right? 525 00:24:37,890 --> 00:24:40,990 Because an effect of zero is within the confidence 526 00:24:40,990 --> 00:24:43,650 interval, OK? 527 00:24:43,650 --> 00:24:45,610 So that's basically how we use confidence intervals. 528 00:24:45,610 --> 00:24:46,100 Yeah. 529 00:24:46,100 --> 00:24:48,350 AUDIENCE: Shouldn't the two points of that confidence 530 00:24:48,350 --> 00:24:54,370 interval be equidistant from 2.59? 531 00:24:54,370 --> 00:24:58,310 PROFESSOR: From 0.05 you mean? 532 00:24:58,310 --> 00:24:58,730 Yeah. 533 00:24:58,730 --> 00:24:59,250 AUDIENCE: [INAUDIBLE] 534 00:24:59,250 --> 00:25:01,360 PROFESSOR: Yeah, I think-- 535 00:25:01,360 --> 00:25:02,500 oh, over here? 536 00:25:02,500 --> 00:25:03,330 AUDIENCE: Yeah. 537 00:25:03,330 --> 00:25:07,035 PROFESSOR: So they actually don't always have-- 538 00:25:07,035 --> 00:25:09,750 so you raise a good point. 539 00:25:09,750 --> 00:25:11,310 So there may be some math errors here. 540 00:25:11,310 --> 00:25:13,540 I think a more reasonable estimate, by the way, is that 541 00:25:13,540 --> 00:25:15,970 this would have to be minus 0.05 for you to get 542 00:25:15,970 --> 00:25:16,540 something like this. 
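Both slide examples boil down to the same check: build the rule-of-thumb interval and see whether zero falls inside it. This sketch (mine, not from the lecture) uses a symmetric 1.96-standard-error interval; as the audience question notes, the slide's second interval is not quite symmetric, possibly a math error on the slide.

```python
def rejects_zero(estimate, se, z=1.96):
    """Two-sided test at the 5% level via the rule-of-thumb CI:
    reject 'no effect' iff 0 lies outside estimate +/- z*se."""
    lo, hi = estimate - z * se, estimate + z * se
    return not (lo <= 0 <= hi), (lo, hi)

# Education example: difference 2.59, standard error 0.54.
edu_reject, edu_ci = rejects_zero(2.59, 0.54)      # CI ~ (1.53, 3.65): reject

# Test-score example: difference 0.05, standard error 0.26.
score_reject, score_ci = rejects_zero(0.05, 0.26)  # CI straddles 0: cannot reject
```

The first interval excludes zero, so the no-difference hypothesis is rejected; the second contains zero, so it is not.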
543 00:25:16,540 --> 00:25:20,620 AUDIENCE: But in the first example, if 2.59 is the mean, 544 00:25:20,620 --> 00:25:23,885 is the difference-- 545 00:25:23,885 --> 00:25:26,790 PROFESSOR: So it's approximately 546 00:25:26,790 --> 00:25:27,380 the same, isn't it? 547 00:25:27,380 --> 00:25:29,120 AUDIENCE: I think it's a little skewed-- 548 00:25:29,120 --> 00:25:29,670 PROFESSOR: Yeah. 549 00:25:29,670 --> 00:25:30,620 AUDIENCE: On that side, it is 2.64. 550 00:25:30,620 --> 00:25:31,530 PROFESSOR: OK. 551 00:25:31,530 --> 00:25:33,720 Yeah, so you raise a good point. 552 00:25:33,720 --> 00:25:39,520 So when I said that a rule of thumb is two times the 553 00:25:39,520 --> 00:25:42,030 standard error, that's a rule of thumb. 554 00:25:42,030 --> 00:25:45,820 And in particular cases, you can sometimes get asymmetric 555 00:25:45,820 --> 00:25:47,100 confidence intervals. 556 00:25:47,100 --> 00:25:49,670 So you're right that usually they should be symmetric and 557 00:25:49,670 --> 00:25:51,600 probably, for simplicity, we should have put up symmetric 558 00:25:51,600 --> 00:25:54,780 ones, but it can occur that confidence intervals are 559 00:25:54,780 --> 00:25:57,080 asymmetric. 560 00:25:57,080 --> 00:25:58,440 For example, if you had a-- 561 00:26:02,000 --> 00:26:05,410 yeah, depending on the estimation, if you have 562 00:26:05,410 --> 00:26:07,270 truncation at zero-- 563 00:26:07,270 --> 00:26:08,740 if you know for sure that there can never be an outcome 564 00:26:08,740 --> 00:26:11,264 below zero, for example, then you can get asymmetric 565 00:26:11,264 --> 00:26:11,738 confidence intervals. 566 00:26:11,738 --> 00:26:15,530 AUDIENCE: When the distribution is not normal? 567 00:26:15,530 --> 00:26:16,770 PROFESSOR: Yeah. 568 00:26:16,770 --> 00:26:19,070 Exactly. 
569 00:26:19,070 --> 00:26:24,570 But for most things that you'll be investigating, 570 00:26:24,570 --> 00:26:26,070 usually they're going to be-- 571 00:26:26,070 --> 00:26:26,950 AUDIENCE: Normal. 572 00:26:26,950 --> 00:26:28,520 PROFESSOR: Yeah, for outcomes that are zero-one, 573 00:26:28,520 --> 00:26:29,190 [UNINTELLIGIBLE] 574 00:26:29,190 --> 00:26:31,670 get non-normal, but yes, in general, 575 00:26:31,670 --> 00:26:33,010 they are pretty symmetric. 576 00:26:33,010 --> 00:26:37,563 But they might not be exactly symmetric. 577 00:26:37,563 --> 00:26:40,010 OK. 578 00:26:40,010 --> 00:26:43,340 So as I sort of was suggesting as we were going through these 579 00:26:43,340 --> 00:26:46,470 examples, we're often interested in testing the 580 00:26:46,470 --> 00:26:50,460 hypothesis that the effect size is equal to zero, right? 581 00:26:50,460 --> 00:26:54,710 The classic hypothesis that you typically want to know is, did 582 00:26:54,710 --> 00:26:59,605 my program do anything, right? 583 00:26:59,605 --> 00:27:02,640 And so, how do you test the hypothesis that my program-- 584 00:27:02,640 --> 00:27:04,470 so we want to know, did my program have 585 00:27:04,470 --> 00:27:06,140 any effect at all? 586 00:27:06,140 --> 00:27:08,890 And so what we technically want to do is we want to test 587 00:27:08,890 --> 00:27:11,960 what's called the null hypothesis, that the program 588 00:27:11,960 --> 00:27:15,240 had an effect of nothing, against an alternative 589 00:27:15,240 --> 00:27:18,616 hypothesis that the program had some effect. 590 00:27:18,616 --> 00:27:23,370 So this is the typical test that we want to do. 591 00:27:23,370 --> 00:27:27,670 Now you could say, actually, I don't care about zero. 592 00:27:27,670 --> 00:27:30,970 I want to say that I know-- for example, this is the 593 00:27:30,970 --> 00:27:33,420 standard thing that we would do in most policy evaluations 594 00:27:33,420 --> 00:27:34,670 that we're going to be doing. 
595 00:27:37,050 --> 00:27:38,680 It doesn't have to be zero. 596 00:27:38,680 --> 00:27:40,740 Suppose you were doing a drug trial and you knew that the 597 00:27:40,740 --> 00:27:42,850 best existing treatment out there already has 598 00:27:42,850 --> 00:27:44,160 an effect of one. 599 00:27:44,160 --> 00:27:49,420 And so instead of comparing to zero, you might be 600 00:27:49,420 --> 00:27:51,250 comparing to one. 601 00:27:51,250 --> 00:27:54,800 Is it actually better than the best existing treatment? 602 00:27:54,800 --> 00:27:59,170 In most cases, we're usually comparing to zero, OK? 603 00:27:59,170 --> 00:28:02,140 And usually, we have the alternative hypothesis that 604 00:28:02,140 --> 00:28:03,140 the effect is just not zero. 605 00:28:03,140 --> 00:28:05,080 We're interested in anything other than zero. 606 00:28:05,080 --> 00:28:08,000 Sometimes you can specify other alternative hypotheses, 607 00:28:08,000 --> 00:28:11,040 that the effect is always positive or always negative, 608 00:28:11,040 --> 00:28:13,590 but usually this is the classic case, which is we're 609 00:28:13,590 --> 00:28:17,490 saying, we think this thing had-- the null is no effect. 610 00:28:17,490 --> 00:28:19,460 We want to say, did this program have an effect and 611 00:28:19,460 --> 00:28:23,760 we're interested in any possible effect, OK? 612 00:28:23,760 --> 00:28:27,010 And hypothesis testing says, when can I reject this null 613 00:28:27,010 --> 00:28:32,170 hypothesis in favor of this alternative, OK? 614 00:28:36,600 --> 00:28:39,510 And as we saw, essentially, the confidence interval is 615 00:28:39,510 --> 00:28:41,090 giving you a way to do that. 616 00:28:41,090 --> 00:28:44,700 It's saying, if the null is outside the confidence 617 00:28:44,700 --> 00:28:47,370 interval, then I can reject the null. 618 00:28:47,370 --> 00:28:47,610 Yeah. 
619 00:28:47,610 --> 00:28:53,665 AUDIENCE: Surely, if we're trying to assess the impact of 620 00:28:53,665 --> 00:28:57,220 an intervention, we're always going to think it's positive. 621 00:28:57,220 --> 00:28:59,610 Or in general, because-- 622 00:28:59,610 --> 00:29:02,180 I gave someone some money to increase their income or not. 623 00:29:02,180 --> 00:29:04,830 We've got a pretty good idea it's going to be positive. 624 00:29:04,830 --> 00:29:08,230 The probability it's negative is pretty-- 625 00:29:08,230 --> 00:29:09,845 PROFESSOR: Why do you-- 626 00:29:09,845 --> 00:29:12,230 AUDIENCE: Yeah, why do we change our 627 00:29:12,230 --> 00:29:12,510 significance level-- 628 00:29:12,510 --> 00:29:12,720 [INTERPOSING VOICES] 629 00:29:12,720 --> 00:29:16,420 PROFESSOR: You ask a great question. 630 00:29:16,420 --> 00:29:17,840 And I have to say this is a bit of a source of 631 00:29:17,840 --> 00:29:20,380 frustration of mine. 632 00:29:20,380 --> 00:29:24,950 Let me give you a couple different answers to that. 633 00:29:24,950 --> 00:29:26,360 Here's the thing. 634 00:29:26,360 --> 00:29:27,610 If you did that-- 635 00:29:30,790 --> 00:29:34,260 if I said I can commit, before I look at the data, that I 636 00:29:34,260 --> 00:29:37,720 only think it could be positive, that would mean that 637 00:29:37,720 --> 00:29:40,700 if it's negative, no matter how negative, you're going to 638 00:29:40,700 --> 00:29:43,430 say that was random chance, OK? 639 00:29:43,430 --> 00:29:46,730 So it would require a fair amount of commitment on you, 640 00:29:46,730 --> 00:29:50,300 on your part, as the experimenter to say, if I get 641 00:29:50,300 --> 00:29:53,540 a negative result, no matter how crazy that negative result 642 00:29:53,540 --> 00:29:57,310 is, I'm going to say that's random chance, OK? 643 00:29:57,310 --> 00:30:04,170 And typically, what often happens ex post is that people 644 00:30:04,170 --> 00:30:06,450 can't commit to actually doing that. 
645 00:30:06,450 --> 00:30:08,780 So suppose you did your program and you-- 646 00:30:08,780 --> 00:30:12,120 so I actually have a program right now that I'm working on 647 00:30:12,120 --> 00:30:13,700 in Indonesia that's supposed to 648 00:30:13,700 --> 00:30:15,510 improve health and education. 649 00:30:15,510 --> 00:30:17,820 And it seems to be making education worse. 650 00:30:17,820 --> 00:30:20,020 Now, we have no theory for why this program should be making 651 00:30:20,020 --> 00:30:22,700 education worse, OK? 652 00:30:22,700 --> 00:30:25,235 But it certainly seems to be there in the data. 653 00:30:25,235 --> 00:30:28,520 Now, if we had adopted your approach, we wouldn't be 654 00:30:28,520 --> 00:30:30,630 entertaining the hypothesis that it made education worse. 655 00:30:30,630 --> 00:30:33,215 We would say, even though it's looking like this program is 656 00:30:33,215 --> 00:30:35,555 making education worse, that must be random 657 00:30:35,555 --> 00:30:37,250 noise in the data. 658 00:30:37,250 --> 00:30:42,110 We're not going to treat that as something potentially real. 659 00:30:42,110 --> 00:30:44,480 Ex post, though, you see this in the data and you're likely 660 00:30:44,480 --> 00:30:47,530 to say, gee, man, that's a really negative effect. 661 00:30:47,530 --> 00:30:49,460 Maybe the program was doing something that I 662 00:30:49,460 --> 00:30:50,150 didn't think about. 663 00:30:50,150 --> 00:30:51,435 And in our case, actually, we're starting to investigate 664 00:30:51,435 --> 00:30:53,560 and maybe it's because it was health and education and we're 665 00:30:53,560 --> 00:30:56,910 sort of sucking resources away from education into health. 666 00:30:56,910 --> 00:30:59,660 So it requires a lot of commitment on your part, as 667 00:30:59,660 --> 00:31:04,500 the researcher, that if you get these negative effects, to 668 00:31:04,500 --> 00:31:06,260 treat them as random noise. 
669 00:31:06,260 --> 00:31:09,440 And I think that, because most researchers, even though they 670 00:31:09,440 --> 00:31:11,640 would like to say they're going to do that, if it 671 00:31:11,640 --> 00:31:13,370 happens that they get a really negative effect, they're going 672 00:31:13,370 --> 00:31:15,030 to want to say, gee, that looks like a negative effect. 673 00:31:15,030 --> 00:31:15,980 We're going to want to investigate 674 00:31:15,980 --> 00:31:17,490 that, take that seriously. 675 00:31:17,490 --> 00:31:22,030 Because most people do that ex post, the convention is that 676 00:31:22,030 --> 00:31:25,340 in most cases, to say we're going to test against either 677 00:31:25,340 --> 00:31:26,545 hypothesis in either direction. 678 00:31:26,545 --> 00:31:27,410 AUDIENCE: Except that the approach-- 679 00:31:27,410 --> 00:31:28,650 PROFESSOR: Does that makes sense? 680 00:31:28,650 --> 00:31:31,120 AUDIENCE: Your issue is do I do this program or not. 681 00:31:31,120 --> 00:31:33,307 So it doesn't matter whether the impact of the program is 682 00:31:33,307 --> 00:31:34,540 zero or negative. 683 00:31:34,540 --> 00:31:36,170 Even if it's zero, you're saying that it's-- 684 00:31:36,170 --> 00:31:37,080 PROFESSOR: You're absolutely right. 685 00:31:37,080 --> 00:31:42,846 So if you were strict about it and said, I'm going to do it 686 00:31:42,846 --> 00:31:45,430 if it's positive and not if it's zero, then I think you 687 00:31:45,430 --> 00:31:47,990 were correct that, strictly speaking, a one-sided 688 00:31:47,990 --> 00:31:49,640 hypothesis test will be correct and it would give you 689 00:31:49,640 --> 00:31:50,260 some more power. 690 00:31:50,260 --> 00:31:52,210 AUDIENCE: So it would give you power. 691 00:31:52,210 --> 00:31:53,090 PROFESSOR: Yeah, it would give you more power. 692 00:31:53,090 --> 00:31:53,530 AUDIENCE: [UNINTELLIGIBLE] 693 00:31:53,530 --> 00:31:53,740 PROFESSOR: Right. 
694 00:31:53,740 --> 00:31:57,220 And the reason it gives you power is, remember, how does 695 00:31:57,220 --> 00:31:57,970 hypothesis testing work? 696 00:31:57,970 --> 00:31:59,520 It says, well, what is the chance this outcome could have 697 00:31:59,520 --> 00:32:01,140 occurred 95-- 698 00:32:01,140 --> 00:32:05,050 what would have occurred by chance 95% of the time? 699 00:32:05,050 --> 00:32:06,810 When you do a two-sided test, you say, OK-- 700 00:32:06,810 --> 00:32:08,710 where's my chalkboard? 701 00:32:08,710 --> 00:32:09,692 Here. 702 00:32:09,692 --> 00:32:12,440 You imagine a normal distribution of outcomes. 703 00:32:12,440 --> 00:32:14,910 You're going to say, well, the 95% is in the middle and 704 00:32:14,910 --> 00:32:18,790 anything in the tails is the stuff that I'm going to 705 00:32:18,790 --> 00:32:20,360 [UNINTELLIGIBLE] by non-random chance. 706 00:32:20,360 --> 00:32:22,250 Well, what you're doing with a one-sided test is you're going 707 00:32:22,250 --> 00:32:24,960 to say, I'm going to take that negative stuff-- 708 00:32:24,960 --> 00:32:27,070 way out there negative stuff-- and I'm going to say that's 709 00:32:27,070 --> 00:32:28,530 also random chance. 710 00:32:28,530 --> 00:32:31,570 So I'm going to pick my 95% all the way to the left. 711 00:32:31,570 --> 00:32:33,960 And that means that the 5% that's not random chance is a 712 00:32:33,960 --> 00:32:35,870 little more to the right. 713 00:32:35,870 --> 00:32:36,900 Do you see what I'm saying? 714 00:32:36,900 --> 00:32:38,700 But it requires that if-- 715 00:32:38,700 --> 00:32:40,820 you're committing to, even if you get really negative 716 00:32:40,820 --> 00:32:43,160 outcomes, asserting that they're random chance, which 717 00:32:43,160 --> 00:32:45,680 is really, often, kind of unbelievable. 
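The chalkboard picture can be made concrete with normal quantiles. In this sketch (my own, not the lecture's), the two-sided 5% test splits its rejection probability across both tails and so needs a larger cutoff than the one-sided test, which is exactly where the extra power of a one-sided test comes from.

```python
from statistics import NormalDist

z = NormalDist()
alpha = 0.05

# Two-sided test: alpha/2 in each tail, so the cutoff is larger.
two_sided_cutoff = z.inv_cdf(1 - alpha / 2)  # ~1.96
# One-sided test: all of alpha in one tail, so the cutoff is smaller --
# more power against positive effects, none at all against negative ones.
one_sided_cutoff = z.inv_cdf(1 - alpha)      # ~1.645
```

An estimate whose ratio to its standard error lands between roughly 1.645 and 1.96 is significant under the one-sided convention but not the two-sided one, which is the professor's point about needing to commit in advance.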
718 00:32:45,680 --> 00:32:48,480 The other thing is that, although this is technically 719 00:32:48,480 --> 00:32:51,040 the way hypothesis testing is set up, the norms and 720 00:32:51,040 --> 00:32:54,050 conventions are that we all use two-sided tests for these 721 00:32:54,050 --> 00:32:55,310 reasons I talked about. 722 00:32:55,310 --> 00:33:03,250 And so I can just tell you that, practically speaking, I 723 00:33:03,250 --> 00:33:05,140 think if you do a one-sided test, people are going to be 724 00:33:05,140 --> 00:33:09,930 skeptical because it may be that you, actually, would do 725 00:33:09,930 --> 00:33:13,750 that, but I think most of the time, people can't 726 00:33:13,750 --> 00:33:14,300 commit to do that. 727 00:33:14,300 --> 00:33:16,830 And so the standard has become two-sided tests. 728 00:33:16,830 --> 00:33:17,700 But I certainly agree with you. 729 00:33:17,700 --> 00:33:20,320 It's very frustrating because one should be able to 730 00:33:20,320 --> 00:33:21,570 articulate one-sided hypotheses. 731 00:33:24,420 --> 00:33:27,418 That's sort of a long answer, but does that make sense? 732 00:33:27,418 --> 00:33:28,350 It's OK. 733 00:33:28,350 --> 00:33:30,280 OK, now, for those of you on this side of the board, you 734 00:33:30,280 --> 00:33:32,945 won't be able to see, but maybe if I need to write 735 00:33:32,945 --> 00:33:34,220 something on the board it will be better. 736 00:33:34,220 --> 00:33:35,470 OK. 737 00:33:38,595 --> 00:33:39,760 So now we're going to talk about type I and type II 738 00:33:39,760 --> 00:33:46,230 errors, which, as I mentioned, are not helpfully named. 739 00:33:46,230 --> 00:33:47,650 OK. 740 00:33:47,650 --> 00:33:48,900 A type I error-- 741 00:33:53,940 --> 00:33:56,780 so this is all about probability, so nothing we can 742 00:33:56,780 --> 00:33:57,590 ever say for sure. 743 00:33:57,590 --> 00:34:01,250 We can always say that this is more or less likely. 
744 00:34:01,250 --> 00:34:03,240 And there's two different types of errors we can make 745 00:34:03,240 --> 00:34:05,780 when we're doing these probabilities or doing these 746 00:34:05,780 --> 00:34:06,980 assessments. 747 00:34:06,980 --> 00:34:09,969 The first error, and it's called type I error, is we can 748 00:34:09,969 --> 00:34:12,760 conclude that there was an effect when, in fact, there 749 00:34:12,760 --> 00:34:17,219 was no effect, OK? 750 00:34:17,219 --> 00:34:21,070 So when I said the 95% confidence interval, that 95% 751 00:34:21,070 --> 00:34:23,199 is coming from our choice about type I errors. 752 00:34:31,530 --> 00:34:32,790 So for example-- 753 00:34:36,530 --> 00:34:38,550 a significance level is the probability that you're going 754 00:34:38,550 --> 00:34:40,620 to falsely conclude the program had an effect when, in 755 00:34:40,620 --> 00:34:42,960 fact, there was no effect, OK? 756 00:34:42,960 --> 00:34:47,020 And that's related to when you say a 95% confidence interval, 757 00:34:47,020 --> 00:34:49,710 the remaining 5% is what we're talking about here. 758 00:34:49,710 --> 00:34:53,980 That's the probability of making a type I error, OK? 759 00:34:53,980 --> 00:34:55,030 And why is that? 760 00:34:55,030 --> 00:34:57,830 Well, we said there's a 95% chance that it's going to be 761 00:34:57,830 --> 00:34:59,590 within this range. 762 00:34:59,590 --> 00:35:02,510 That means that just by random chance, there's some chance it 763 00:35:02,510 --> 00:35:05,295 could be outside that range, right? 764 00:35:05,295 --> 00:35:08,430 So if your confidence interval was over here and zero was 765 00:35:08,430 --> 00:35:12,650 over here, you would say, well, with 95% confidence, I'm 766 00:35:12,650 --> 00:35:14,630 going to assume the program had an effect because zero is 767 00:35:14,630 --> 00:35:16,500 not within my confidence interval. 
768 00:35:16,500 --> 00:35:20,400 However, 5% of the time, the true effect could be over here 769 00:35:20,400 --> 00:35:21,430 outside your confidence interval. 770 00:35:21,430 --> 00:35:23,460 That's what a 95% confidence interval means. 771 00:35:28,050 --> 00:35:33,230 So in some sense, that's what we mean by a-- 772 00:35:33,230 --> 00:35:35,000 so that's in some sense what a type I error is. 773 00:35:35,000 --> 00:35:36,770 A type I error is the probability that you're going 774 00:35:36,770 --> 00:35:46,000 to detect an effect when, in fact, there's not. 775 00:35:46,000 --> 00:35:51,950 And so the typical levels that you may see are 5%, 1% or 10% 776 00:35:51,950 --> 00:35:53,420 significance levels. 777 00:35:53,420 --> 00:35:55,930 And the way to think about those significance levels is, 778 00:35:55,930 --> 00:35:58,650 if you see something that's significant at the 10% level, 779 00:35:58,650 --> 00:36:01,820 that means that 10% of the time, an effect of that size 780 00:36:01,820 --> 00:36:03,380 could've been just due to random chance. 781 00:36:03,380 --> 00:36:06,860 Might not actually be a true effect. 782 00:36:06,860 --> 00:36:10,440 And if you've heard of a p-value, a p-value is exactly 783 00:36:10,440 --> 00:36:11,560 this number. 784 00:36:11,560 --> 00:36:14,880 A p-value basically says, what is the probability that an 785 00:36:14,880 --> 00:36:18,650 effect this size or larger could have occurred just by 786 00:36:18,650 --> 00:36:21,880 random chance, OK? 787 00:36:24,540 --> 00:36:27,280 So that's what's called a type I error. 788 00:36:27,280 --> 00:36:34,620 And typically, there's no deep reason why 5% is the normal 789 00:36:34,620 --> 00:36:36,580 level of type I errors that we use, but it's kind of the 790 00:36:36,580 --> 00:36:39,160 convention. 791 00:36:39,160 --> 00:36:40,020 It's what everyone else uses. 
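The p-value definition just given translates directly into code. A sketch under the normal approximation (mine, not the lecture's), reusing the two earlier slide examples:

```python
from statistics import NormalDist

def two_sided_p_value(estimate, se):
    """p-value: probability, under the null of zero effect, of seeing
    an effect at least this large in magnitude (normal approximation)."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))

# Education example from the slides: difference 2.59, SE 0.54.
p_edu = two_sided_p_value(2.59, 0.54)    # far below 0.05: significant
# Test-score example: difference 0.05, SE 0.26.
p_score = two_sided_p_value(0.05, 0.26)  # well above 0.10: not significant
```

A result is then "significant at the 5% level" exactly when its p-value is below 0.05, and likewise for the 10% and 1% conventions.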
792 00:36:40,020 --> 00:36:41,950 If you use something different, people are going to 793 00:36:41,950 --> 00:36:42,820 look at you a little funny. 794 00:36:42,820 --> 00:36:47,430 So the conventions are we have 5%, 10%, and 1%, as these 795 00:36:47,430 --> 00:36:48,860 significance levels. 796 00:36:48,860 --> 00:36:55,030 And you might say, gee, 5% or 10% seems pretty low. 797 00:36:55,030 --> 00:36:56,110 Maybe I would want a bigger one. 798 00:36:56,110 --> 00:36:58,030 But on the other hand, if you start thinking about it, that 799 00:36:58,030 --> 00:37:00,160 means that if you use 10% significance, that means that 800 00:37:00,160 --> 00:37:02,876 one out of every 10 studies is going to be wrong. 801 00:37:02,876 --> 00:37:05,710 Or if you had 10 different outcomes in your data set, one 802 00:37:05,710 --> 00:37:08,660 out of every 10 would be significant even just by 803 00:37:08,660 --> 00:37:09,910 random chance. 804 00:37:12,750 --> 00:37:15,920 So the other type of error is what's called, as I said, 805 00:37:15,920 --> 00:37:18,210 helpfully, a type II error. 806 00:37:18,210 --> 00:37:21,310 And a type II error says that you fail to reject that the 807 00:37:21,310 --> 00:37:26,570 program had no effect when, in fact, there was an effect, OK? 808 00:37:26,570 --> 00:37:30,870 So this is, the program did something, but I can't pick it 809 00:37:30,870 --> 00:37:35,480 up in the data, OK? 810 00:37:35,480 --> 00:37:39,280 And we talk about the power of a test. 811 00:37:39,280 --> 00:37:42,870 The power is basically the opposite of a type II error. 812 00:37:42,870 --> 00:37:45,550 A power just says, what's the probability that I will be 813 00:37:45,550 --> 00:37:48,730 able to find an effect given that the actual 814 00:37:48,730 --> 00:37:52,490 effect is there, OK? 
815 00:37:52,490 --> 00:37:57,070 So when we talk about how big a sample size we need, what 816 00:37:57,070 --> 00:38:00,320 we're basically talking about is, how much power are we 817 00:38:00,320 --> 00:38:02,250 going to have to detect an effect? 818 00:38:02,250 --> 00:38:04,560 Or what's the probability that given that a true effect is 819 00:38:04,560 --> 00:38:08,150 there, we're going to pick it up in the data, OK? 820 00:38:08,150 --> 00:38:10,740 So here's an example of how to think about power. 821 00:38:10,740 --> 00:38:13,960 If I ran the experiment 100 times-- not 100 samples, but 822 00:38:13,960 --> 00:38:16,380 if I ran the whole thing 100 times-- 823 00:38:16,380 --> 00:38:18,960 what percentage of the time, or in how many of these cases, 824 00:38:18,960 --> 00:38:21,120 would I be able to say, reject the hypothesis that men and 825 00:38:21,120 --> 00:38:24,280 women have the same education at the 5% level if, in fact, 826 00:38:24,280 --> 00:38:28,010 they're different, OK? 827 00:38:28,010 --> 00:38:34,650 So this is a helpful graph which basically plots the 828 00:38:34,650 --> 00:38:36,253 truth and what you're going to conclude based 829 00:38:36,253 --> 00:38:37,730 on your data, OK? 830 00:38:37,730 --> 00:38:40,940 So suppose the truth is that you had no effect and you 831 00:38:40,940 --> 00:38:43,570 conclude your no effect, OK? 832 00:38:43,570 --> 00:38:47,010 Then you're happy. 833 00:38:47,010 --> 00:38:49,290 If there was an effect and you conclude there was an effect, 834 00:38:49,290 --> 00:38:49,960 you're happy. 835 00:38:49,960 --> 00:38:52,330 So you want to be in one of these two boxes. 836 00:38:52,330 --> 00:38:54,540 And the two types of errors you can make-- so one type of 837 00:38:54,540 --> 00:38:56,770 error is over here, right? 
838 00:38:56,770 --> 00:39:00,160 So if the truth is there was no effect, but you concluded 839 00:39:00,160 --> 00:39:01,860 there was an effect, that would be making a 840 00:39:01,860 --> 00:39:03,670 type I error, OK? 841 00:39:03,670 --> 00:39:05,080 And this is what we talked about: size. 842 00:39:05,080 --> 00:39:09,075 So this one, we normally fix this one at 5%. 843 00:39:09,075 --> 00:39:12,680 So it's only 5% of the time-- 844 00:39:12,680 --> 00:39:15,505 if there's no effect, 5% of the time you're going to end 845 00:39:15,505 --> 00:39:17,680 up here and 95% of the time you're going to end up here. 846 00:39:17,680 --> 00:39:20,560 That's what a 95% confidence interval is telling you. 847 00:39:20,560 --> 00:39:24,090 And the other thing is, suppose that the thing had an 848 00:39:24,090 --> 00:39:29,310 effect but you couldn't find it in the data, OK? 849 00:39:29,310 --> 00:39:30,850 That's what's called a type II error. 850 00:39:30,850 --> 00:39:34,040 And that's, when we design our experiments, we want to make 851 00:39:34,040 --> 00:39:36,400 sure that our samples are sufficiently large that the 852 00:39:36,400 --> 00:39:40,710 probability you end up in this box is not too big, OK? 853 00:39:40,710 --> 00:39:44,430 So that's a sense of what we mean by the different types of 854 00:39:44,430 --> 00:39:45,720 mistakes or errors you could make. 855 00:39:45,720 --> 00:39:46,181 Yeah. 856 00:39:46,181 --> 00:39:50,330 AUDIENCE: It's kind of a stupid question. 857 00:39:50,330 --> 00:39:53,040 So power is the probability that you are not 858 00:39:53,040 --> 00:39:54,495 making a type II error? 859 00:39:54,495 --> 00:39:55,465 PROFESSOR: Yes. 860 00:39:55,465 --> 00:39:58,470 AUDIENCE: So then power is the probability that you're in the 861 00:39:58,470 --> 00:40:00,382 smiley face box, that you are-- 862 00:40:00,382 --> 00:40:00,860 [INTERPOSING VOICES] 863 00:40:00,860 --> 00:40:03,190 PROFESSOR: Yes. 
864 00:40:03,190 --> 00:40:05,840 Power is the probability you're over here. 865 00:40:05,840 --> 00:40:07,750 Yeah, we say power is related to type II errors. 866 00:40:07,750 --> 00:40:08,580 Power is over here. 867 00:40:08,580 --> 00:40:09,930 This is the power. 868 00:40:09,930 --> 00:40:11,530 Power is conditional on there being an effect. 869 00:40:11,530 --> 00:40:13,966 What's the probability you're in this box, not this box? 870 00:40:17,390 --> 00:40:21,630 Probably should say one minus power to be clearer. 871 00:40:21,630 --> 00:40:21,780 OK? 872 00:40:21,780 --> 00:40:23,527 Does that make sense? 873 00:40:23,527 --> 00:40:24,660 All right. 874 00:40:24,660 --> 00:40:27,230 So when we're designing experiments, we typically fix 875 00:40:27,230 --> 00:40:31,520 this at conventional levels. 876 00:40:31,520 --> 00:40:34,450 And we choose our sample size so that we get this, the 877 00:40:34,450 --> 00:40:36,500 power, or the probability that you're in the happy face box 878 00:40:36,500 --> 00:40:39,190 over here, to a reasonable level given the effect size 879 00:40:39,190 --> 00:40:42,862 that we think we're likely to get, OK? 880 00:40:46,348 --> 00:40:47,598 OK. 881 00:40:49,350 --> 00:40:53,652 Now, in some sense, the next two things, standard errors, 882 00:40:53,652 --> 00:40:58,620 are about this box, size. 883 00:40:58,620 --> 00:41:03,245 And power is about this box, or these boxes. 884 00:41:03,245 --> 00:41:05,470 Yeah. 885 00:41:05,470 --> 00:41:08,260 AUDIENCE: Why is power not also the probability that you 886 00:41:08,260 --> 00:41:11,620 end up in the bottom right box as opposed to the bottom left? 887 00:41:11,620 --> 00:41:12,990 PROFESSOR: Because that's size. 
888 00:41:12,990 --> 00:41:17,310 AUDIENCE: Isn't size also linked to-- or power also 889 00:41:17,310 --> 00:41:17,790 linked to-- 890 00:41:17,790 --> 00:41:21,985 PROFESSOR: No, they're all related, but we typically-- 891 00:41:24,860 --> 00:41:27,490 they're related in the following way. 892 00:41:27,490 --> 00:41:31,190 We assert a size because when we calculate our standard 893 00:41:31,190 --> 00:41:35,620 error-- our confidence intervals, we pick how big or 894 00:41:35,620 --> 00:41:37,840 small we want the confidence intervals to be. 895 00:41:37,840 --> 00:41:40,090 When we say a 95% confidence interval, we're 896 00:41:40,090 --> 00:41:42,460 picking the size, OK? 897 00:41:42,460 --> 00:41:45,045 So this one, we get to choose. 898 00:41:45,045 --> 00:41:47,935 AUDIENCE: So it's not sample size, it's size of the 899 00:41:47,935 --> 00:41:48,560 confidence interval? 900 00:41:48,560 --> 00:41:49,690 PROFESSOR: No. 901 00:41:49,690 --> 00:41:53,660 Yeah, this is size is a-- 902 00:41:53,660 --> 00:41:55,430 yeah, it's the size of the confidence interval. 903 00:41:55,430 --> 00:41:55,880 That's right. 904 00:41:55,880 --> 00:41:56,900 Sorry, it's not the sample size. 905 00:41:56,900 --> 00:41:58,470 That's right. 906 00:41:58,470 --> 00:42:02,210 It's called the size of the test in yet more confusing 907 00:42:02,210 --> 00:42:03,460 terminology. 908 00:42:05,474 --> 00:42:06,350 That's right. 909 00:42:06,350 --> 00:42:08,640 This is the size of the confidence interval, 910 00:42:08,640 --> 00:42:09,120 essentially. 911 00:42:09,120 --> 00:42:11,090 And this one you pick, and this one is 912 00:42:11,090 --> 00:42:13,140 determined by your data. 913 00:42:16,916 --> 00:42:19,276 OK? 914 00:42:19,276 --> 00:42:20,220 All right. 915 00:42:20,220 --> 00:42:25,000 OK, so now let's talk about this part, which is standard 916 00:42:25,000 --> 00:42:26,710 errors and significance. 917 00:42:26,710 --> 00:42:29,540 It's all kind of related. 
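[To make the "size" idea concrete, here is a small simulation sketch; it is an editor's illustration, not from the lecture, and the group size of 100 and the outcome distribution are made up. When the truth is "no effect," the t-ratio discussed next exceeds 1.96 in absolute value only about 5% of the time, which is exactly the type I error rate we fixed.]

```python
import random
import statistics

def t_ratio(treat, control):
    """t-ratio: difference in means divided by its standard error."""
    beta_hat = statistics.mean(treat) - statistics.mean(control)
    se = (statistics.variance(treat) / len(treat)
          + statistics.variance(control) / len(control)) ** 0.5
    return beta_hat / se

random.seed(0)
n, reps = 100, 2000
rejections = 0
for _ in range(reps):
    # The truth is "no effect": both groups are drawn from the same distribution.
    treat = [random.gauss(50, 10) for _ in range(n)]
    control = [random.gauss(50, 10) for _ in range(n)]
    if abs(t_ratio(treat, control)) > 1.96:
        rejections += 1

print("false positive rate:", rejections / reps)  # close to 0.05
```

[The 5% is chosen by us, not estimated from the data: no matter how noisy the outcome is, fixing the 1.96 cutoff pins the type I error rate at roughly 5%.]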
918 00:42:29,540 --> 00:42:32,530 All right, so we're going to estimate the 919 00:42:32,530 --> 00:42:33,380 effect of our program. 920 00:42:33,380 --> 00:42:37,750 And we typically call that beta, or beta hat. 921 00:42:37,750 --> 00:42:40,560 So the convention is that things that are estimated, we 922 00:42:40,560 --> 00:42:42,800 put a little hat over them, OK? 923 00:42:42,800 --> 00:42:45,550 So beta hat is going to be our estimate of the program's 924 00:42:45,550 --> 00:42:46,760 effectiveness. 925 00:42:46,760 --> 00:42:49,510 This is our best guess as to the difference between these 926 00:42:49,510 --> 00:42:51,120 two groups. 927 00:42:51,120 --> 00:42:55,100 So for example, this is the average treatment test score 928 00:42:55,100 --> 00:42:56,380 minus the average control test score. 929 00:42:59,450 --> 00:43:02,560 And then we're also going to calculate our estimate of the 930 00:43:02,560 --> 00:43:04,540 standard error of beta hat, right? 931 00:43:04,540 --> 00:43:06,130 And remember that the confidence interval is about 932 00:43:06,130 --> 00:43:08,890 two times the standard error. 933 00:43:08,890 --> 00:43:10,880 So the standard error is going to say how precise our 934 00:43:10,880 --> 00:43:13,440 estimate of beta hat is, which is, remember, if we ran the 935 00:43:13,440 --> 00:43:15,590 experiment 100 times, what will be the distributions of 936 00:43:15,590 --> 00:43:21,180 beta hats that we would get, OK? 937 00:43:21,180 --> 00:43:23,870 And this depends on the sample size and the noise in the 938 00:43:23,870 --> 00:43:25,980 data, right? 939 00:43:25,980 --> 00:43:28,910 And remember we went through this already that here, in 940 00:43:28,910 --> 00:43:32,490 this case, the standard error of how confident we would be-- 941 00:43:32,490 --> 00:43:36,140 so the beta hat, in this case, is going to be 10, and in this 942 00:43:36,140 --> 00:43:38,500 case, it's also going to be 10, right? 
943 00:43:38,500 --> 00:43:42,910 But here, these two things are really precisely estimated, so 944 00:43:42,910 --> 00:43:46,540 our standard error of beta hat is going to be very small 945 00:43:46,540 --> 00:43:49,100 because we're going to say we have a very precise estimate 946 00:43:49,100 --> 00:43:50,920 of the difference between them. 947 00:43:50,920 --> 00:43:52,280 And so the confidence interval is also 948 00:43:52,280 --> 00:43:53,830 going to be very small. 949 00:43:53,830 --> 00:43:55,630 And here, there's lots of noise in the data, so our 950 00:43:55,630 --> 00:43:58,570 estimate of the standard error is going to be larger. 951 00:43:58,570 --> 00:44:00,490 So in both cases, beta hat is the same. 952 00:44:00,490 --> 00:44:01,940 It's 10 in both cases. 953 00:44:01,940 --> 00:44:03,580 But the standard error is very big here and 954 00:44:03,580 --> 00:44:07,260 very small here, OK? 955 00:44:07,260 --> 00:44:11,320 Now, when we calculate the statistical significance, we 956 00:44:11,320 --> 00:44:14,120 use something called a t-ratio. 957 00:44:14,120 --> 00:44:15,370 And the t-ratio-- 958 00:44:19,370 --> 00:44:21,510 it's actually often called the Student's t-ratio, which I 959 00:44:21,510 --> 00:44:22,810 thought was because students used it. 960 00:44:22,810 --> 00:44:24,275 But it's actually named after Mr. Student. 961 00:44:28,310 --> 00:44:30,770 It's the ratio of beta hat to the standard error 962 00:44:30,770 --> 00:44:33,430 of beta hat, OK? 963 00:44:33,430 --> 00:44:38,040 And the reason that we happen to use this ratio is that, if 964 00:44:38,040 --> 00:44:42,140 there is no effect, if the true beta is actually zero, we know 965 00:44:42,140 --> 00:44:44,030 that this thing has a normal distribution, so we can 966 00:44:44,030 --> 00:44:46,590 calculate the probability that this thing is really big or 967 00:44:46,590 --> 00:44:49,010 really small, OK? 
968 00:44:51,630 --> 00:44:54,090 So we calculate this ratio of beta hat over the standard 969 00:44:54,090 --> 00:44:55,870 error of beta hat. 970 00:44:58,550 --> 00:45:01,030 It turns out that if t is greater 971 00:45:01,030 --> 00:45:05,330 than, in absolute value-- 972 00:45:05,330 --> 00:45:08,850 sorry, if the absolute value of t, I should say, is greater 973 00:45:08,850 --> 00:45:10,940 than 1.96-- 974 00:45:10,940 --> 00:45:16,380 so essentially, if it's bigger than 2 or less than minus 2, 975 00:45:16,380 --> 00:45:18,920 we're going to reject the hypothesis of equality at a 976 00:45:18,920 --> 00:45:20,810 5% significance level. 977 00:45:20,810 --> 00:45:21,850 And why is that? 978 00:45:21,850 --> 00:45:28,980 It's because it turns out, from statistics, that if the 979 00:45:28,980 --> 00:45:31,350 truth is zero, OK? 980 00:45:31,350 --> 00:45:35,170 So if we're in the no effect box and the truth is zero, 981 00:45:35,170 --> 00:45:39,180 this ratio, it turns out, will have a normal distribution. 982 00:45:39,180 --> 00:45:40,990 And it just turns out from a normal distribution that the 983 00:45:40,990 --> 00:45:45,590 95% confidence interval extends 1.96 standard errors 984 00:45:45,590 --> 00:45:47,780 away from zero if you have a normal distribution. 985 00:45:47,780 --> 00:45:51,880 That's just a fact about normal distributions, OK? 986 00:45:51,880 --> 00:45:54,880 So if we calculate this ratio and we say it's greater in 987 00:45:54,880 --> 00:45:57,900 absolute value than 1.96, we're going to reject the 988 00:45:57,900 --> 00:46:00,320 hypothesis of equality at the 5% level, OK? 989 00:46:00,320 --> 00:46:01,220 So we can reject zero. 990 00:46:01,220 --> 00:46:02,650 Zero is going to be outside of our confidence interval. 
991 00:46:02,650 --> 00:46:04,945 And if it's less than 1.96, we're going to fail to reject 992 00:46:04,945 --> 00:46:06,945 it because zero is going to be inside our confidence 993 00:46:06,945 --> 00:46:11,240 interval, OK? 994 00:46:11,240 --> 00:46:13,470 So in this case, for example, the difference was 2.59. 995 00:46:13,470 --> 00:46:14,990 The standard error was 0.54. 996 00:46:14,990 --> 00:46:19,690 The t-ratio is 2.59 over 0.54-- 997 00:46:19,690 --> 00:46:21,490 it's about five. 998 00:46:21,490 --> 00:46:23,180 So we're definitely going to be able to 999 00:46:23,180 --> 00:46:24,760 reject in this case. 1000 00:46:24,760 --> 00:46:30,090 So we have a t-ratio of about five, OK? 1001 00:46:30,090 --> 00:46:35,180 So you may see this terminology and this is where 1002 00:46:35,180 --> 00:46:36,430 it's coming from. 1003 00:46:36,430 --> 00:46:39,870 Now, there's an important point to note here, which will 1004 00:46:39,870 --> 00:46:42,380 come up later when we talk about power calculations, 1005 00:46:42,380 --> 00:46:50,180 which is, in some sense, that the power that we have is 1006 00:46:50,180 --> 00:46:53,980 determined by this ratio of the point estimate to our 1007 00:46:53,980 --> 00:46:55,620 standard error. 1008 00:46:55,620 --> 00:46:58,530 And so this says, for example, that if we kind of look at 1009 00:46:58,530 --> 00:47:05,010 this a little more, that if you have bigger betas, you can 1010 00:47:05,010 --> 00:47:07,210 still detect effects for a given standard error-- 1011 00:47:07,210 --> 00:47:08,650 so if you fix the standard error but you made beta 1012 00:47:08,650 --> 00:47:11,070 bigger, you're more likely to conclude there was a 1013 00:47:11,070 --> 00:47:11,940 difference, right? 1014 00:47:11,940 --> 00:47:15,950 So what's going to increase your being able to conclude 1015 00:47:15,950 --> 00:47:16,980 there was a difference? 
1016 00:47:16,980 --> 00:47:18,780 Either your effect size is bigger or your standard error 1017 00:47:18,780 --> 00:47:20,800 is smaller, mechanically. 1018 00:47:25,500 --> 00:47:28,040 OK. 1019 00:47:28,040 --> 00:47:32,230 So that's how we are going to calculate being in this box. 1020 00:47:32,230 --> 00:47:34,230 So how do we think about power, which is the 1021 00:47:34,230 --> 00:47:36,150 probability that we're in this box? 1022 00:47:36,150 --> 00:47:38,890 We had an effect and we're able to detect that-- sorry, 1023 00:47:38,890 --> 00:47:46,170 power's in this box-- that we had an effect, OK? 1024 00:47:46,170 --> 00:47:53,520 So when we're planning an experiment, we can do some 1025 00:47:53,520 --> 00:47:56,670 calculations to help us figure out what that power is. 1026 00:47:56,670 --> 00:48:00,450 What's the probability, if the truth is a certain level, that 1027 00:48:00,450 --> 00:48:02,180 we're going to be able to pick it up in the data? 1028 00:48:05,390 --> 00:48:06,640 And what do we need to do that? 1029 00:48:10,180 --> 00:48:12,550 We're going to have to specify a null hypothesis, which is 1030 00:48:12,550 --> 00:48:13,500 usually zero. 1031 00:48:13,500 --> 00:48:15,030 We're going to be testing that something's different than 1032 00:48:15,030 --> 00:48:19,120 zero, the two groups are the same, for example. 1033 00:48:19,120 --> 00:48:19,940 We're going to have to pick our 1034 00:48:19,940 --> 00:48:21,920 significance level, our size. 1035 00:48:21,920 --> 00:48:23,170 And that, we almost always pick at 5%. 1036 00:48:26,000 --> 00:48:31,680 We're going to have to pick an effect size. 1037 00:48:31,680 --> 00:48:33,530 And we'll talk about what exactly this means in a couple 1038 00:48:33,530 --> 00:48:33,950 more slides. 1039 00:48:33,950 --> 00:48:37,650 But when we calculate a power, a power is for a given 1040 00:48:37,650 --> 00:48:42,380 effect size, OK? 1041 00:48:42,380 --> 00:48:43,800 And then we'll calculate the power. 
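[As a quick check of the arithmetic in the earlier example, using the lecture's numbers of a 2.59 difference with a 0.54 standard error:]

```python
beta_hat = 2.59          # estimated difference between treatment and control
se_beta_hat = 0.54       # estimated standard error of that difference
t = beta_hat / se_beta_hat
print(round(t, 1))       # 4.8 -- "about five"
print(abs(t) > 1.96)     # True: reject equality at the 5% level
```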
1042 00:48:46,330 --> 00:48:50,600 So for example, suppose that we did this 1043 00:48:50,600 --> 00:48:52,660 and the power was 80%. 1044 00:48:52,660 --> 00:48:54,780 That would mean that if we did this experiment 100 times-- 1045 00:48:54,780 --> 00:48:56,860 not 100 samples, but actually repeated the whole experiment 1046 00:48:56,860 --> 00:49:03,620 100 times-- then, if the 1047 00:49:03,620 --> 00:49:07,130 null hypothesis is, in fact, false and, instead, the truth is 1048 00:49:07,130 --> 00:49:11,300 this, we would be able to reject the null and conclude 1049 00:49:11,300 --> 00:49:16,300 there was a true effect 80% of the time, OK? 1050 00:49:16,300 --> 00:49:18,610 That's a little bit complicated, but does that 1051 00:49:18,610 --> 00:49:20,975 make sense, what we're going to be trying to do with power? 1052 00:49:25,250 --> 00:49:26,990 So we're going to fix the effect size. 1053 00:49:26,990 --> 00:49:29,505 So remember, we fix the bottom box. 1054 00:49:33,680 --> 00:49:36,040 When we calculate power, we have to speculate not just 1055 00:49:36,040 --> 00:49:37,020 effect versus no effect. 1056 00:49:37,020 --> 00:49:40,090 We have to postulate just how effective the program is. 1057 00:49:40,090 --> 00:49:42,920 So we're going to say, suppose that the significance level is 5% 1058 00:49:42,920 --> 00:49:48,890 and the true effect size is 0.2, right? 1059 00:49:48,890 --> 00:49:51,930 How big a sample would we need to be in this box 80% 1060 00:49:51,930 --> 00:49:54,390 of the time, OK? 1061 00:49:54,390 --> 00:49:57,300 So when we say power, that's what we mean. 1062 00:49:57,300 --> 00:50:03,050 And when we calculate the size of the experiments, you have 1063 00:50:03,050 --> 00:50:07,300 to make a judgment call of how big a power you want. 1064 00:50:07,300 --> 00:50:09,850 The typical powers that we use when we do power calculations 1065 00:50:09,850 --> 00:50:12,430 are either 80% or 90%. 
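[The "repeat the whole experiment 100 times" idea can literally be simulated. This is an editor's sketch with made-up numbers (a standardized true effect of 0.2 and illustrative group sizes): the fraction of simulated experiments that reject at the 5% level is the power, and it rises with the sample size.]

```python
import random
import statistics

def simulated_power(n, effect, sd=1.0, reps=500):
    """Fraction of simulated experiments that reject the null at the 5% level
    when a true effect of the given size really is there."""
    rejections = 0
    for _ in range(reps):
        control = [random.gauss(0.0, sd) for _ in range(n)]
        treat = [random.gauss(effect, sd) for _ in range(n)]  # true effect present
        diff = statistics.mean(treat) - statistics.mean(control)
        se = (statistics.variance(treat) / n
              + statistics.variance(control) / n) ** 0.5
        if abs(diff / se) > 1.96:
            rejections += 1
    return rejections / reps

random.seed(1)
low = simulated_power(n=50, effect=0.2)    # small sample: effect often missed
high = simulated_power(n=400, effect=0.2)  # larger sample: roughly 80% power
print(low, high)
```

[Note the asymmetry with size: the 5% type I rate is fixed by the 1.96 cutoff, but the power depends on the sample size and the postulated effect, which is exactly why it has to be calculated.]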
1066 00:50:12,430 --> 00:50:14,670 So what does this mean? 1067 00:50:14,670 --> 00:50:16,620 This means-- suppose you did 80%. 1068 00:50:16,620 --> 00:50:17,680 Or [UNINTELLIGIBLE] this. 1069 00:50:17,680 --> 00:50:19,540 If you did 80%, that would mean that if you ran your 1070 00:50:19,540 --> 00:50:24,070 experiment 100 times and the true effect was 0.2 in this 1071 00:50:24,070 --> 00:50:27,220 case, you would be able to pick up an effect, 1072 00:50:27,220 --> 00:50:30,510 statistically, 80 out of those 100 times. 1073 00:50:30,510 --> 00:50:32,100 20 out of 100 times, you wouldn't. 1074 00:50:37,280 --> 00:50:42,100 And the bigger your sample size, the larger your power is 1075 00:50:42,100 --> 00:50:47,070 going to be, OK? 1076 00:50:47,070 --> 00:50:50,410 Does that make sense so far? 1077 00:50:50,410 --> 00:50:52,260 OK. 1078 00:50:52,260 --> 00:50:54,720 Suppose you wanted to calculate what your power is 1079 00:50:54,720 --> 00:50:57,610 going to be. 1080 00:50:57,610 --> 00:51:00,680 What are the things you would need to know? 1081 00:51:00,680 --> 00:51:02,590 You would need to know your significance 1082 00:51:02,590 --> 00:51:03,710 level, or your size. 1083 00:51:03,710 --> 00:51:07,210 And as I said, this, we just assume, OK? 1084 00:51:07,210 --> 00:51:09,100 This is that bottom box. 1085 00:51:09,100 --> 00:51:12,560 We're just going to assume that it's 5%. 1086 00:51:12,560 --> 00:51:14,290 And the lower it is, the larger sample 1087 00:51:14,290 --> 00:51:15,580 you're going to need. 1088 00:51:15,580 --> 00:51:18,190 But this one is sort of picked for you. 1089 00:51:18,190 --> 00:51:21,250 We almost always use 5% because that's the convention. 1090 00:51:21,250 --> 00:51:22,738 That's what everyone uses, essentially. 1091 00:51:27,060 --> 00:51:29,720 The second thing you need to know is the mean and the 1092 00:51:29,720 --> 00:51:34,014 variance of the outcome in the comparison group. 
1093 00:51:34,014 --> 00:51:37,050 So you need to know-- 1094 00:51:37,050 --> 00:51:40,600 so remember, all this power calculation is going to depend 1095 00:51:40,600 --> 00:51:44,820 on whether your sample looks like this, really tight, or 1096 00:51:44,820 --> 00:51:46,570 looks like this and is very noisy. 1097 00:51:46,570 --> 00:51:47,640 Because you obviously need a much bigger 1098 00:51:47,640 --> 00:51:49,390 sample here than here. 1099 00:51:49,390 --> 00:51:51,920 So in order to do a power calculation, you need to know, 1100 00:51:51,920 --> 00:51:55,610 well, just what does the outcome look like, right? 1101 00:51:55,610 --> 00:51:57,920 Does the outcome really have very narrow variance? 1102 00:52:00,580 --> 00:52:03,110 Is everyone almost exactly the same, in which case it's going 1103 00:52:03,110 --> 00:52:04,020 to be very easy to detect effects? 1104 00:52:04,020 --> 00:52:09,620 Or is there a huge range of people, in which case you're 1105 00:52:09,620 --> 00:52:11,556 going to need a bigger sample? 1106 00:52:11,556 --> 00:52:13,120 Now, how do we get this? 1107 00:52:13,120 --> 00:52:16,350 So this one, we just conventionally set. 1108 00:52:16,350 --> 00:52:17,960 This one, we have to get somewhere. 1109 00:52:17,960 --> 00:52:22,890 And we usually have to get it from some other survey. 1110 00:52:22,890 --> 00:52:26,730 So we have to find someone that collected data in a 1111 00:52:26,730 --> 00:52:27,970 similar population. 1112 00:52:27,970 --> 00:52:30,700 Or sometimes we'll go and collect data ourselves in that 1113 00:52:30,700 --> 00:52:31,310 same population-- 1114 00:52:31,310 --> 00:52:33,820 just a very small survey, just to get a sense of what this 1115 00:52:33,820 --> 00:52:37,120 variable looks like, OK? 1116 00:52:37,120 --> 00:52:41,010 And if the variability is big, we're going to need a really 1117 00:52:41,010 --> 00:52:42,650 big sample. 
1118 00:52:42,650 --> 00:52:44,323 And if the variability is really small, we're going to 1119 00:52:44,323 --> 00:52:45,325 need a small sample. 1120 00:52:45,325 --> 00:52:49,010 And it's really important to do this because you don't want 1121 00:52:49,010 --> 00:52:51,530 to spend all your time and money running an experiment 1122 00:52:51,530 --> 00:52:53,930 only to find out that there was no hope of ever finding an 1123 00:52:53,930 --> 00:52:59,650 effect because the power was too small, right? 1124 00:52:59,650 --> 00:52:59,955 Yeah. 1125 00:52:59,955 --> 00:53:01,931 AUDIENCE: And this is in the entire population, not just 1126 00:53:01,931 --> 00:53:03,695 the comparison group, right? 1127 00:53:03,695 --> 00:53:04,577 It says-- 1128 00:53:04,577 --> 00:53:06,920 PROFESSOR: Yeah, but before you do your treatment, the 1129 00:53:06,920 --> 00:53:08,540 comparison and the treatment are the same. 1130 00:53:08,540 --> 00:53:09,340 AUDIENCE: They are the same. 1131 00:53:09,340 --> 00:53:10,030 PROFESSOR: Doesn't matter. 1132 00:53:10,030 --> 00:53:10,740 AUDIENCE: So it's a baseline population. 1133 00:53:10,740 --> 00:53:11,520 PROFESSOR: Baseline would be fine. 1134 00:53:11,520 --> 00:53:14,400 Yeah. 1135 00:53:14,400 --> 00:53:16,090 Before you do your treatment, they're the same. 1136 00:53:16,090 --> 00:53:20,860 So it doesn't matter, OK? 1137 00:53:20,860 --> 00:53:24,660 And the third thing you need is, you need to make an 1138 00:53:24,660 --> 00:53:29,570 assumption about what effect size you want to detect. 1139 00:53:29,570 --> 00:53:30,820 And this one-- 1140 00:53:33,950 --> 00:53:37,530 sometimes you also have to supply this. 1141 00:53:37,530 --> 00:53:42,350 And the best way to think about what effect size you 1142 00:53:42,350 --> 00:53:47,880 want to put in here is you want to say, what's the 1143 00:53:47,880 --> 00:53:55,660 smallest effect that would prompt a policy response, OK? 
1144 00:53:55,660 --> 00:53:57,520 So one could think about this, for example, by doing a 1145 00:53:57,520 --> 00:53:58,710 cost-benefit calculation, right? 1146 00:53:58,710 --> 00:54:01,910 You could say that we do a cost-benefit calculation. 1147 00:54:01,910 --> 00:54:03,590 This thing costs $100. 1148 00:54:03,590 --> 00:54:06,120 If we don't get an effect of 0.1, it's just 1149 00:54:06,120 --> 00:54:08,800 not worth $100, right? 1150 00:54:08,800 --> 00:54:11,450 So that would be a good way of coming up with how big an 1151 00:54:11,450 --> 00:54:14,680 effect size you want here. 1152 00:54:14,680 --> 00:54:16,460 And the idea, then, is if the effect is any smaller than 1153 00:54:16,460 --> 00:54:18,722 this, it's just not interesting to distinguish it 1154 00:54:18,722 --> 00:54:21,000 from zero, right? 1155 00:54:21,000 --> 00:54:24,060 Suppose that the thing had a true effect of 0.001, right? 1156 00:54:24,060 --> 00:54:26,260 If it was that small of an effect, it couldn't possibly be 1157 00:54:26,260 --> 00:54:26,880 cost effective. 1158 00:54:26,880 --> 00:54:29,750 So say the thing has an effect of 0.001. 1159 00:54:29,750 --> 00:54:32,130 Who cares, right? 1160 00:54:32,130 --> 00:54:35,100 So what you want to be thinking about, from a policy 1161 00:54:35,100 --> 00:54:37,330 perspective, is: what's the smallest effect size you want 1162 00:54:37,330 --> 00:54:39,925 to be able to detect, in order to set 1163 00:54:39,925 --> 00:54:41,710 your power calculations? 1164 00:54:41,710 --> 00:54:42,140 Yeah. 1165 00:54:42,140 --> 00:54:44,100 AUDIENCE: I have a question back at the mean 1166 00:54:44,100 --> 00:54:45,025 and variance thing. 1167 00:54:45,025 --> 00:54:46,290 PROFESSOR: Oh, here. 1168 00:54:46,290 --> 00:54:46,773 Yeah. 1169 00:54:46,773 --> 00:54:47,739 AUDIENCE: Yeah. 
1170 00:54:47,739 --> 00:54:50,154 So in terms of the baseline thing that you would collect-- 1171 00:54:50,154 --> 00:54:54,030 so I'm on the implementation side of this, right? 1172 00:54:54,030 --> 00:54:54,825 So we do projects. 1173 00:54:54,825 --> 00:54:57,100 We collect baseline data. 1174 00:54:57,100 --> 00:55:03,050 Now, the case that I'm thinking of, the baseline data 1175 00:55:03,050 --> 00:55:06,820 that we would collect might not be exactly the same kind 1176 00:55:06,820 --> 00:55:12,530 of data that we are looking for in terms of our study. 1177 00:55:12,530 --> 00:55:13,985 What kind of base-- how-- 1178 00:55:13,985 --> 00:55:14,670 PROFESSOR: Right, OK. 1179 00:55:14,670 --> 00:55:18,180 So when we say baseline, there's two different things 1180 00:55:18,180 --> 00:55:20,380 we mean by baseline. 1181 00:55:20,380 --> 00:55:22,820 For this case, this is not strictly a baseline. 1182 00:55:22,820 --> 00:55:26,340 This is just something about what's your variable 1183 00:55:26,340 --> 00:55:26,870 going to look like. 1184 00:55:26,870 --> 00:55:27,880 Let me come back to that in a sec. 1185 00:55:27,880 --> 00:55:29,910 We also sometimes talk about baselines that we are going to 1186 00:55:29,910 --> 00:55:33,020 use of actually collecting the actual outcome variable before 1187 00:55:33,020 --> 00:55:34,930 we start the intervention, right? 1188 00:55:34,930 --> 00:55:37,750 Those are also useful, and we'll talk about those in a 1189 00:55:37,750 --> 00:55:38,970 couple slides. 1190 00:55:38,970 --> 00:55:41,900 And those, one wants them to be more similar, probably, to 1191 00:55:41,900 --> 00:55:44,285 the actual variable you're going to use. 
1192 00:55:44,285 --> 00:55:48,410 Now, for your case, we often don't-- 1193 00:55:52,590 --> 00:55:56,640 the accuracy of your power calculation depends pretty 1194 00:55:56,640 --> 00:56:01,780 critically on how close this mean and variance are to what 1195 00:56:01,780 --> 00:56:04,100 you're going to actually get in your data. 1196 00:56:04,100 --> 00:56:07,990 And when you start on the example that you guys are 1197 00:56:07,990 --> 00:56:09,470 going to work on, or that maybe you've already started working 1198 00:56:09,470 --> 00:56:11,560 on, you're going to find that it's 1199 00:56:11,560 --> 00:56:12,920 actually pretty sensitive. 1200 00:56:12,920 --> 00:56:15,010 It turns out it's pretty sensitive. 1201 00:56:15,010 --> 00:56:19,870 So getting these wrong is going to mean your power 1202 00:56:19,870 --> 00:56:23,200 calculation is going to be wrong. 1203 00:56:23,200 --> 00:56:26,460 So that's sort of an argument for saying you want this to be 1204 00:56:26,460 --> 00:56:28,530 as good as possible. 1205 00:56:28,530 --> 00:56:32,750 Now, the flip side of that, though, is you're going to 1206 00:56:32,750 --> 00:56:36,710 find that these power calculations are fairly 1207 00:56:36,710 --> 00:56:40,640 sensitive to what effect size you choose as well. 1208 00:56:40,640 --> 00:56:44,500 So you're going to find that if you go from an effect size 1209 00:56:44,500 --> 00:56:46,870 of 0.2 to an effect size of 0.1, you're going to need four 1210 00:56:46,870 --> 00:56:48,960 times the sample. 1211 00:56:48,960 --> 00:56:50,210 That's just the way the math works out. 1212 00:56:54,820 --> 00:57:00,360 By which I mean that I think that these power 1213 00:57:00,360 --> 00:57:03,480 calculations are useful for making sure you're in the 1214 00:57:03,480 --> 00:57:07,550 right ballpark, but not necessarily going to nail an 1215 00:57:07,550 --> 00:57:09,250 exact number for you. 
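[The "four times the sample" math is just the standard normal-approximation formula for comparing two means: n per arm is roughly 2(z_size + z_power)^2 (sd/effect)^2, where 1.96 and 0.84 are the usual z-values for 5% significance and 80% power. A sketch of that formula (the sd of 1 means the effects are standardized effect sizes):]

```python
def n_per_arm(effect, sd=1.0, z_size=1.96, z_power=0.84):
    """Approximate sample size per arm to detect `effect` with 80% power
    at the 5% significance level (normal-approximation formula)."""
    return 2 * (z_size + z_power) ** 2 * (sd / effect) ** 2

n_02 = n_per_arm(effect=0.2)  # standardized effect size of 0.2
n_01 = n_per_arm(effect=0.1)  # halve the effect size...
print(round(n_02), round(n_01))  # 392 1568
print(round(n_01 / n_02))        # ...and you need 4x the sample
```

[Because the effect size enters the formula squared in the denominator, halving it always quadruples the required sample, whatever the variance is.]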
1216 00:57:12,210 --> 00:57:17,050 All that's by way of saying that you want to get-- because 1217 00:57:17,050 --> 00:57:19,910 these things are so sensitive, you want to get as close as 1218 00:57:19,910 --> 00:57:22,770 possible to what's actually going to be there. 1219 00:57:22,770 --> 00:57:25,710 On the other hand you're going to find the results are also 1220 00:57:25,710 --> 00:57:29,080 so sensitive to the effect size you want to detect that 1221 00:57:29,080 --> 00:57:32,260 if this was a little bit off, that might be a tradeoff you 1222 00:57:32,260 --> 00:57:33,540 would be willing to live with in practice. 1223 00:57:33,540 --> 00:57:34,780 AUDIENCE: So, from my-- 1224 00:57:34,780 --> 00:57:36,870 PROFESSOR: Does that make sense? 1225 00:57:36,870 --> 00:57:40,630 AUDIENCE: Yeah, but it seems like the effect size-- your 1226 00:57:40,630 --> 00:57:44,690 estimate of your effect size is this kind of-- 1227 00:57:44,690 --> 00:57:47,930 we've got all this science for the calculation and yet your 1228 00:57:47,930 --> 00:57:49,650 estimate of your effect size is based on-- 1229 00:57:49,650 --> 00:57:50,780 PROFESSOR: You're absolutely right. 1230 00:57:50,780 --> 00:57:51,900 AUDIENCE: --getting that-- 1231 00:57:51,900 --> 00:57:52,690 PROFESSOR: Hold on, though. 1232 00:57:52,690 --> 00:57:53,990 Let me back up a little bit, though. 1233 00:57:53,990 --> 00:57:55,370 You're right, except the-- 1234 00:57:55,370 --> 00:57:57,505 in some sense, the best way to get estimates for your effect 1235 00:57:57,505 --> 00:58:00,100 size is to look at similar programs, OK? 1236 00:58:00,100 --> 00:58:03,970 So now there are lots of programs in 1237 00:58:03,970 --> 00:58:05,130 education, for example. 1238 00:58:05,130 --> 00:58:08,860 And they tend to find effect-- 1239 00:58:08,860 --> 00:58:11,840 I've now seen a bazillion things that work on improving 1240 00:58:11,840 --> 00:58:12,780 test scores. 
1241 00:58:12,780 --> 00:58:14,870 And I can tell you that they tend to get-- 1242 00:58:14,870 --> 00:58:16,565 the standardized effect size is the effect size divided by the 1243 00:58:16,565 --> 00:58:17,360 standard deviation. 1244 00:58:17,360 --> 00:58:21,550 And they tend to get effect sizes in the 0.1, 0.15, 0.2 1245 00:58:21,550 --> 00:58:24,410 range, right? 1246 00:58:24,410 --> 00:58:26,660 So you can look at those and say, well, I think that most 1247 00:58:26,660 --> 00:58:30,040 other comparable interventions are getting 0.1, so I'm going 1248 00:58:30,040 --> 00:58:32,910 to use 0.1 as my effect size. 1249 00:58:32,910 --> 00:58:34,490 So you're right-- if you're just trying to sit here and 1250 00:58:34,490 --> 00:58:34,940 introspect about 1251 00:58:34,940 --> 00:58:37,060 what your effect size is going to be, it's very hard. 1252 00:58:37,060 --> 00:58:40,580 But if you use comparable studies to get a sense, then 1253 00:58:40,580 --> 00:58:41,520 you can get a sense. 1254 00:58:41,520 --> 00:58:42,990 And the other thing I mentioned is, you can do 1255 00:58:42,990 --> 00:58:45,640 cost-benefit analysis and say, well, look-- 1256 00:58:45,640 --> 00:58:47,260 which is sort of another way of saying it-- if there are 1257 00:58:47,260 --> 00:58:51,650 other things out there which cost $100 per kid and get 0.1, 1258 00:58:51,650 --> 00:58:54,150 then my thing, presumably, has got to do at least as well as 1259 00:58:54,150 --> 00:58:56,790 0.1 for $100-- suppose the other thing also costs $100 a 1260 00:58:56,790 --> 00:58:58,150 kid, I've got to do at least as well as 0.1. 1261 00:58:58,150 --> 00:58:59,760 Otherwise, I'd rather do this other thing. 1262 00:58:59,760 --> 00:59:01,640 So it's another way of getting at the effect size. 
1263 00:59:01,640 --> 00:59:04,715 AUDIENCE: Could you, then, also look at existing data in 1264 00:59:04,715 --> 00:59:08,110 the literature for the mean and variance thing, 1265 00:59:08,110 --> 00:59:08,740 or do you have to-- 1266 00:59:08,740 --> 00:59:10,480 PROFESSOR: You could, but this one is going to be more 1267 00:59:10,480 --> 00:59:11,385 sensitive to your population. 1268 00:59:11,385 --> 00:59:14,580 AUDIENCE: So it would just have to be very well-matched 1269 00:59:14,580 --> 00:59:16,165 to be able to use it. 1270 00:59:16,165 --> 00:59:16,540 PROFESSOR: Right. 1271 00:59:16,540 --> 00:59:19,240 I mean, look, if you don't have it, you could do it to 1272 00:59:19,240 --> 00:59:21,480 get a sense, but this is one where the different 1273 00:59:21,480 --> 00:59:23,730 populations are going to be very different in terms of 1274 00:59:23,730 --> 00:59:24,980 their mean and variance. 1275 00:59:28,540 --> 00:59:31,550 In order to get an estimate of this, you need a much, much, 1276 00:59:31,550 --> 00:59:33,970 much smaller sample size than you need to get an estimate of 1277 00:59:33,970 --> 00:59:35,970 the overall treatment effect of the program. 1278 00:59:35,970 --> 00:59:39,720 So you can often do a small survey-- 1279 00:59:39,720 --> 00:59:42,690 much, much, smaller than your big survey, but a small survey 1280 00:59:42,690 --> 00:59:44,810 just to get a sense of what these things look like. 1281 00:59:44,810 --> 00:59:47,070 And that can often be a very worthwhile thing to do. 1282 00:59:47,070 --> 00:59:49,115 AUDIENCE: I have a related question. 1283 00:59:51,680 --> 00:59:53,520 How often do you see-- 1284 00:59:53,520 --> 00:59:54,530 PROFESSOR: Oh, sorry. 1285 00:59:54,530 --> 00:59:55,390 I just wanted to do one other thing on this. 
1286 00:59:55,390 --> 00:59:57,550 I've had this come up in my own experience, where I've 1287 00:59:57,550 --> 01:00:01,050 done this small survey, and found that the baseline 1288 01:00:01,050 --> 01:00:02,350 situation was such that the whole experiment 1289 01:00:02,350 --> 01:00:03,010 didn't make any sense. 1290 01:00:03,010 --> 01:00:05,150 And we just canceled the experiment. 1291 01:00:05,150 --> 01:00:06,390 And it can be really useful. 1292 01:00:06,390 --> 01:00:11,050 If you say, if I do this and my power is 0.01, for 1293 01:00:11,050 --> 01:00:14,300 reasonable effect sizes, this is pointless. 1294 01:00:14,300 --> 01:00:16,110 So it can be worth it. 1295 01:00:16,110 --> 01:00:16,520 Sorry. 1296 01:00:16,520 --> 01:00:16,930 Go ahead. 1297 01:00:16,930 --> 01:00:21,460 AUDIENCE: So to estimate the effect size, have you seen 1298 01:00:21,460 --> 01:00:24,960 people run small pilots in different populations than 1299 01:00:24,960 --> 01:00:28,010 they're eventually going to do their impact evaluation to get 1300 01:00:28,010 --> 01:00:30,670 a sense of what effect size are they seeing with that same 1301 01:00:30,670 --> 01:00:30,965 intervention? 1302 01:00:30,965 --> 01:00:35,060 PROFESSOR: Not usually, because you can't do a small 1303 01:00:35,060 --> 01:00:37,500 pilot to get the effect size, right? 1304 01:00:37,500 --> 01:00:38,340 AUDIENCE: You're going to see something-- 1305 01:00:38,340 --> 01:00:39,290 PROFESSOR: You've got to do the whole thing. 1306 01:00:39,290 --> 01:00:40,060 AUDIENCE: Yeah, yeah. 1307 01:00:40,060 --> 01:00:40,310 PROFESSOR: Right? 1308 01:00:40,310 --> 01:00:41,610 That's the whole point of the power calculations is, in 1309 01:00:41,610 --> 01:00:45,220 order to detect an effect of that size, you need to do the 1310 01:00:45,220 --> 01:00:45,610 whole sample. 1311 01:00:45,610 --> 01:00:48,080 So a small pilot won't really do it. 1312 01:00:48,080 --> 01:00:49,556 AUDIENCE: OK. 
1313 01:00:49,556 --> 01:00:51,750 PROFESSOR: So it's not really going to-- you could get a-- 1314 01:00:51,750 --> 01:00:54,290 no, I guess you really can't get a sense because you would 1315 01:00:54,290 --> 01:00:56,530 need the whole experiment to detect the effect size. 1316 01:00:59,490 --> 01:01:03,265 AUDIENCE: Don't you think that there should be a lot more 1317 01:01:03,265 --> 01:01:05,850 conversation about effect size before things start? 1318 01:01:05,850 --> 01:01:09,130 Because if you've got a treatment, if you've got a 1319 01:01:09,130 --> 01:01:17,110 program, and you can't have a very-- and you've struggled to 1320 01:01:17,110 --> 01:01:21,170 have a good conversation about what is actually going to 1321 01:01:21,170 --> 01:01:23,560 happen to the kids or what's going to happen to the health 1322 01:01:23,560 --> 01:01:25,840 or what's going to happen to the income as a result of 1323 01:01:25,840 --> 01:01:29,420 this, it really may be quite telling that you really don't 1324 01:01:29,420 --> 01:01:30,680 know what you're doing. 1325 01:01:30,680 --> 01:01:34,690 That there isn't enough of a theory behind your-- or 1326 01:01:34,690 --> 01:01:37,920 practice or science or anything behind what your 1327 01:01:37,920 --> 01:01:38,900 program is. 1328 01:01:38,900 --> 01:01:43,605 If people are not pretty sure, what-- 1329 01:01:43,605 --> 01:01:45,010 PROFESSOR: I mean, yes and no-- 1330 01:01:45,010 --> 01:01:48,170 AUDIENCE: And then, also, on the resource allocation. 1331 01:01:48,170 --> 01:01:52,670 Resource allocation, it just seems to me, most of the time, 1332 01:01:52,670 --> 01:01:57,105 if your ultimate client is really probably the 1333 01:01:57,105 --> 01:01:57,980 government, right? 1334 01:01:57,980 --> 01:02:00,020 Because the government is the one that's going to make the 1335 01:02:00,020 --> 01:02:00,470 big resource allocations-- 1336 01:02:00,470 --> 01:02:02,760 PROFESSOR: It depends on who you're working with. 
1337 01:02:02,760 --> 01:02:04,450 It could be an NGO, whoever. 1338 01:02:04,450 --> 01:02:04,870 But yes. 1339 01:02:04,870 --> 01:02:08,540 AUDIENCE: No, but an NGO is doing something, usually, as a 1340 01:02:08,540 --> 01:02:12,540 demonstration that, in fact, if it works, then the 1341 01:02:12,540 --> 01:02:14,130 government should do it. 1342 01:02:14,130 --> 01:02:16,070 PROFESSOR: Not always, but there's someone who, 1343 01:02:16,070 --> 01:02:17,836 presumably, is going to scale up. 1344 01:02:17,836 --> 01:02:18,900 AUDIENCE: Right. 1345 01:02:18,900 --> 01:02:21,750 And yes, businesses, maybe, right? 1346 01:02:21,750 --> 01:02:24,430 But I would say, 90% of the time, it's going to be, 1347 01:02:24,430 --> 01:02:26,360 ultimately, the government needs to-- 1348 01:02:26,360 --> 01:02:27,480 PROFESSOR: Often, it's the government. 1349 01:02:27,480 --> 01:02:29,560 In India, for example, there are NGOs who are-- 1350 01:02:29,560 --> 01:02:32,250 I don't know who's worked on the Pratham reading thing. 1351 01:02:32,250 --> 01:02:33,470 They're trying to teach-- 1352 01:02:33,470 --> 01:02:35,750 NGOs trying to teach millions of kids to read, as an NGO. 1353 01:02:35,750 --> 01:02:37,170 So sometimes NGOs scale up too. 1354 01:02:37,170 --> 01:02:40,880 But anyway, you're right that there's an ultimate client 1355 01:02:40,880 --> 01:02:41,660 who's interested in this. 1356 01:02:41,660 --> 01:02:43,060 AUDIENCE: So then, having a conversation 1357 01:02:43,060 --> 01:02:45,160 very early on about-- 1358 01:02:45,160 --> 01:02:45,720 PROFESSOR: Yeah. 1359 01:02:45,720 --> 01:02:46,410 Could be very useful. 1360 01:02:46,410 --> 01:02:47,190 That's absolutely right. 1361 01:02:47,190 --> 01:02:47,980 That's absolutely right. 1362 01:02:47,980 --> 01:02:48,580 AUDIENCE: Because-- 1363 01:02:48,580 --> 01:02:50,770 PROFESSOR: Now, in terms of your point about theory, 1364 01:02:50,770 --> 01:02:52,560 though, yes and no. 
1365 01:02:52,560 --> 01:02:55,670 So I can design an experiment that's supposed to teach kids 1366 01:02:55,670 --> 01:02:57,210 how to read. 1367 01:02:57,210 --> 01:03:00,550 I know the theory says it should affect reading but I 1368 01:03:00,550 --> 01:03:02,750 have no idea how much. 1369 01:03:02,750 --> 01:03:03,090 And so-- 1370 01:03:03,090 --> 01:03:06,740 AUDIENCE: Wouldn't you say that a significant percentage 1371 01:03:06,740 --> 01:03:09,810 of the time, if it's a good theory about reading, it 1372 01:03:09,810 --> 01:03:10,760 actually should tell you? 1373 01:03:10,760 --> 01:03:11,630 PROFESSOR: Not always. 1374 01:03:11,630 --> 01:03:12,090 I mean-- 1375 01:03:12,090 --> 01:03:13,680 AUDIENCE: Well, then I'd say it's not such a 1376 01:03:13,680 --> 01:03:14,960 great theory, right? 1377 01:03:14,960 --> 01:03:15,725 Wouldn't you-- 1378 01:03:15,725 --> 01:03:18,000 PROFESSOR: It's a little bit semantic, but I think that a 1379 01:03:18,000 --> 01:03:20,540 lot of times, I can-- 1380 01:03:20,540 --> 01:03:23,240 say I'm going to teach kids to read a paragraph or whatever. 1381 01:03:23,240 --> 01:03:26,570 But what percentage of the kids is it going to work for? 1382 01:03:26,570 --> 01:03:30,550 What percentage of the kids are going to be affected? 1383 01:03:30,550 --> 01:03:33,580 I think that using theory to calculate how-- 1384 01:03:33,580 --> 01:03:35,060 I think theory can tell you a lot about what 1385 01:03:35,060 --> 01:03:36,420 variables should be affected. 1386 01:03:36,420 --> 01:03:38,620 And that's what we talked about in the last lecture. 1387 01:03:38,620 --> 01:03:41,110 I think theory can tell you what the sign of those effects 1388 01:03:41,110 --> 01:03:42,050 is likely to be. 1389 01:03:42,050 --> 01:03:45,350 I think it's often putting a lot of demands on your theory 1390 01:03:45,350 --> 01:03:47,440 to have it tell you the magnitude. 1391 01:03:47,440 --> 01:03:48,595 And that's why you want to do the experiment. 
1392 01:03:48,595 --> 01:03:51,540 AUDIENCE: And you just told me that even beyond the theory, 1393 01:03:51,540 --> 01:03:53,950 you say, well, but we did this in one school and we saw it 1394 01:03:53,950 --> 01:03:55,290 had this great thing, but you're saying-- 1395 01:03:55,290 --> 01:03:55,790 [INTERPOSING VOICES] 1396 01:03:55,790 --> 01:03:57,175 PROFESSOR: But your confidence interval is going to be-- 1397 01:03:57,175 --> 01:03:58,190 well, it's not nothing. 1398 01:03:58,190 --> 01:03:59,550 It's going to tell you something, but your confidence 1399 01:03:59,550 --> 01:04:00,310 interval is going to be enormous. 1400 01:04:00,310 --> 01:04:02,340 AUDIENCE: Right, nothing that you could 1401 01:04:02,340 --> 01:04:04,200 rely on to set a good-- 1402 01:04:04,200 --> 01:04:04,490 [INTERPOSING VOICES] 1403 01:04:04,490 --> 01:04:07,296 PROFESSOR: Right, it gives you a data point, but it's going 1404 01:04:07,296 --> 01:04:09,174 to have a huge confidence interval. 1405 01:04:09,174 --> 01:04:13,330 AUDIENCE: I don't want to belabor this, but if you think 1406 01:04:13,330 --> 01:04:14,490 about it in business terms, right? 1407 01:04:14,490 --> 01:04:16,250 I want to go out and raise some money. 1408 01:04:16,250 --> 01:04:17,440 PROFESSOR: Yes, absolutely. 1409 01:04:17,440 --> 01:04:17,790 [INTERPOSING VOICES] 1410 01:04:17,790 --> 01:04:18,535 AUDIENCE: --something. 1411 01:04:18,535 --> 01:04:20,520 And so, in order to raise that money, I have to tell you 1412 01:04:20,520 --> 01:04:22,680 that, in fact, you're going to make this much money. 1413 01:04:22,680 --> 01:04:23,360 PROFESSOR: Right. 1414 01:04:23,360 --> 01:04:25,816 AUDIENCE: And, of course, it could turn out to be wrong. 1415 01:04:25,816 --> 01:04:28,483 But I have to tell you you're going to get a 25% return on 1416 01:04:28,483 --> 01:04:28,790 your money. 
1417 01:04:28,790 --> 01:04:30,575 And that means I have to explain to you why this 1418 01:04:30,575 --> 01:04:32,347 business is going to be successful, how many people 1419 01:04:32,347 --> 01:04:34,320 are going to buy it, how I'm going to manage my costs down. 1420 01:04:34,320 --> 01:04:36,750 So it's always curious to me that, when you're talking 1421 01:04:36,750 --> 01:04:41,000 about social interventions, that I'm not having to make 1422 01:04:41,000 --> 01:04:45,120 that same argument with that same level of specificity, 1423 01:04:45,120 --> 01:04:46,930 which would mean talking about the effect size. 1424 01:04:46,930 --> 01:04:50,540 Because I can't raise money if I tell you, look, I might only 1425 01:04:50,540 --> 01:04:53,695 make you 5% or we might shoot the moon and make 100%. 1426 01:04:53,695 --> 01:04:55,290 You'll say, thank you very much. 1427 01:04:55,290 --> 01:04:56,690 This person doesn't know what their business is. 1428 01:04:56,690 --> 01:04:58,340 I'm not going to give them my money. 1429 01:04:58,340 --> 01:04:59,730 PROFESSOR: Right. 1430 01:04:59,730 --> 01:05:03,710 So you actually hit on exactly what's on the next slide. 1431 01:05:03,710 --> 01:05:07,110 Which is exactly what I was going to say, which is, what 1432 01:05:07,110 --> 01:05:09,000 you want to think about with your effect size is exactly 1433 01:05:09,000 --> 01:05:09,370 this thing. 1434 01:05:09,370 --> 01:05:10,850 What's the cost of this program versus 1435 01:05:10,850 --> 01:05:12,170 the benefit it brings? 1436 01:05:12,170 --> 01:05:15,220 And sometimes, what's the cost vis-a-vis alternative uses of 1437 01:05:15,220 --> 01:05:16,270 the money, right? 1438 01:05:16,270 --> 01:05:17,730 And that's going to be a conversation you're going to 1439 01:05:17,730 --> 01:05:19,960 have with your client, who is going to say, if the effect 1440 01:05:19,960 --> 01:05:22,200 size was 0.1, I would do it. 
1441 01:05:22,200 --> 01:05:23,730 And then you say, OK, I'm going to design an experiment 1442 01:05:23,730 --> 01:05:26,910 to see if it's 0.1 or bigger, right? 1443 01:05:26,910 --> 01:05:30,770 So I'm totally on board with that. 1444 01:05:30,770 --> 01:05:33,030 Because, as I was saying, if the effect size is smaller 1445 01:05:33,030 --> 01:05:35,610 than that, it still could be positive, but if your client 1446 01:05:35,610 --> 01:05:39,210 doesn't care, if it's not worth the money at that level, 1447 01:05:39,210 --> 01:05:41,210 then why do we need to design a big experiment 1448 01:05:41,210 --> 01:05:42,460 to pick that up? 1449 01:05:44,230 --> 01:05:46,590 It's also worth noting this is not your expected 1450 01:05:46,590 --> 01:05:48,540 effect size, right? 1451 01:05:48,540 --> 01:05:53,620 I could expect this thing to have an effect of 0.2 but even 1452 01:05:53,620 --> 01:05:55,040 if it was as low as 0.1, it would still be 1453 01:05:55,040 --> 01:05:56,430 worth doing, OK? 1454 01:05:56,430 --> 01:05:58,170 And in that case, I might want to design an experiment to 1455 01:05:58,170 --> 01:06:02,920 detect 0.1, right? 1456 01:06:02,920 --> 01:06:05,580 Conversely, you guys can all imagine the opposite, which is 1457 01:06:05,580 --> 01:06:08,350 you could say, I expect this thing to be 0.1, 1458 01:06:08,350 --> 01:06:11,090 but maybe it's 0.2. 1459 01:06:11,090 --> 01:06:12,020 Maybe it's actually-- 1460 01:06:12,020 --> 01:06:14,260 I'm not sure how good it is. 1461 01:06:14,260 --> 01:06:14,850 I think it's OK. 1462 01:06:14,850 --> 01:06:17,370 But maybe it could be really great. 1463 01:06:17,370 --> 01:06:19,550 And if it was really great, I would want to adopt it, so I 1464 01:06:19,550 --> 01:06:21,540 would design an experiment to detect 0.2. 1465 01:06:21,540 --> 01:06:25,120 So it's not the expected effect size, it's what you 1466 01:06:25,120 --> 01:06:26,758 would use to adopt the program. 
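[EDITOR'S EXAMPLE: The logic of designing for the adoption threshold rather than the expected effect can be sketched with the standard two-arm sample-size formula, n per arm = 2 * ((z_alpha + z_beta) / delta)^2, for a unit-variance outcome and individual-level randomization. The numbers are illustrative, not from the lecture:]

```python
import math
from statistics import NormalDist

def n_per_arm(delta, alpha=0.05, power=0.80):
    # Sample size per arm to detect a standardized effect `delta`
    # with a two-sided test: n = 2 * ((z_alpha + z_beta) / delta)^2.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
    z_beta = NormalDist().inv_cdf(power)           # ~0.84
    return math.ceil(2 * ((z_alpha + z_beta) / delta) ** 2)

# Designing to detect the adoption threshold (0.1) takes roughly
# four times the sample of designing for the optimistic guess (0.2).
print(n_per_arm(0.2))  # 393
print(n_per_arm(0.1))  # 1570
```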
1467 01:06:33,180 --> 01:06:35,180 When we talk about effect sizes, we 1468 01:06:35,180 --> 01:06:37,282 often talk about them-- 1469 01:06:37,282 --> 01:06:40,970 we talk about what we call standardized effect size, OK? 1470 01:06:46,020 --> 01:06:48,670 As I mentioned, how large an effect you can detect depends 1471 01:06:48,670 --> 01:06:51,010 on how variable your sample is. 1472 01:06:51,010 --> 01:06:53,830 So if everyone's the same, it's very 1473 01:06:53,830 --> 01:06:55,870 easy to pick up effects. 1474 01:06:55,870 --> 01:06:58,770 And we often say that standardized effects are the 1475 01:06:58,770 --> 01:07:01,050 effect size divided by the standard deviation of the 1476 01:07:01,050 --> 01:07:03,350 outcome, OK? 1477 01:07:03,350 --> 01:07:05,250 So standard deviation of outcome is the measure of how 1478 01:07:05,250 --> 01:07:06,600 variable your outcome is. 1479 01:07:06,600 --> 01:07:10,530 So we often express our effect sizes relative to the standard 1480 01:07:10,530 --> 01:07:12,680 deviation of the outcome, OK? 1481 01:07:12,680 --> 01:07:14,310 And so when I was talking about test scores, for 1482 01:07:14,310 --> 01:07:16,770 example, test scores are usually normalized to have a 1483 01:07:16,770 --> 01:07:18,290 standard deviation of one. 1484 01:07:18,290 --> 01:07:20,400 So this is actually how we normally express things in 1485 01:07:20,400 --> 01:07:22,510 terms of test scores, but we could do it for anything. 1486 01:07:22,510 --> 01:07:25,850 And so effect sizes of 0.1, 0.2 are small. 1487 01:07:25,850 --> 01:07:26,910 0.4 are medium. 1488 01:07:26,910 --> 01:07:28,150 0.5 are large. 1489 01:07:28,150 --> 01:07:29,830 Now what do we mean by that? 1490 01:07:29,830 --> 01:07:31,790 This is actually a very helpful way of thinking about 1491 01:07:31,790 --> 01:07:34,580 what a standardized effect size is telling you. 
1492 01:07:34,580 --> 01:07:37,830 So a standardized effect size of 0.2, which is what we were 1493 01:07:37,830 --> 01:07:43,350 saying was a modest one, means that the average person in the 1494 01:07:43,350 --> 01:07:47,980 treatment group, the median or the mean person of the 1495 01:07:47,980 --> 01:07:52,610 treatment group, had a better outcome than 58% of the people 1496 01:07:52,610 --> 01:07:54,930 in the control group. 1497 01:07:54,930 --> 01:07:57,810 So remember, if it was zero, it would be 50-50. 1498 01:07:57,810 --> 01:07:58,840 It would be 50%, right? 1499 01:07:58,840 --> 01:08:01,160 If there was no effect, the distributions would line up 1500 01:08:01,160 --> 01:08:03,400 and this person's in the treatment group-- 1501 01:08:03,400 --> 01:08:04,590 the median person in the treatment group would be 1502 01:08:04,590 --> 01:08:09,150 better than 50% of the people in the control group. 1503 01:08:09,150 --> 01:08:11,680 So this is saying, instead of lining up at exactly 50-50, 1504 01:08:11,680 --> 01:08:15,920 it's lining up 58%-50%, OK? 1505 01:08:15,920 --> 01:08:20,700 If you get an effect size of 0.5, which we were saying was 1506 01:08:20,700 --> 01:08:24,490 a large effect, that means that 69% of the people in the 1507 01:08:24,490 --> 01:08:26,720 treatment group are going to be bigger than the median 1508 01:08:26,720 --> 01:08:29,484 person in the control group. 1509 01:08:29,484 --> 01:08:31,100 Sorry, it's the other way around. 1510 01:08:31,100 --> 01:08:32,490 The average member of the intervention group is better 1511 01:08:32,490 --> 01:08:36,310 than 69% of people in the control group. 1512 01:08:36,310 --> 01:08:37,950 So the distributions are still overlapping. 1513 01:08:37,950 --> 01:08:39,170 But now there's-- 1514 01:08:39,170 --> 01:08:42,170 the middle of the treatment distribution is at the 69th 1515 01:08:42,170 --> 01:08:45,779 percentile of the control. 
1516 01:08:45,779 --> 01:08:49,800 And a large effect of 0.8 would mean that the median 1517 01:08:49,800 --> 01:08:55,210 person in the treatment group is at the 79th percentile of 1518 01:08:55,210 --> 01:08:56,970 the control. 1519 01:08:56,970 --> 01:08:58,580 That just gives you a sense of when we're talking about 1520 01:08:58,580 --> 01:09:02,180 standardized effect sizes, how big we're talking about. 1521 01:09:02,180 --> 01:09:04,990 And so you can see that 0.2, is actually-- 1522 01:09:04,990 --> 01:09:08,689 you can imagine is going to be pretty hard to detect, right? 1523 01:09:08,689 --> 01:09:10,800 If the median person in the treatment group looks like the 1524 01:09:10,800 --> 01:09:14,029 58th percentile of the control group, that's going to be a 1525 01:09:14,029 --> 01:09:18,800 case where those distributions have a lot of overlap, right? 1526 01:09:18,800 --> 01:09:20,450 And so this is going to be much harder to detect than 1527 01:09:20,450 --> 01:09:25,130 this case when the overlap is much smaller. 1528 01:09:25,130 --> 01:09:25,330 Yeah. 1529 01:09:25,330 --> 01:09:28,950 AUDIENCE: So in your experience, what do most 1530 01:09:28,950 --> 01:09:30,826 people think their effect size is? 1531 01:09:30,826 --> 01:09:32,140 Where do they settle? 1532 01:09:32,140 --> 01:09:34,080 They probably wouldn't settle at 0.2? 1533 01:09:34,080 --> 01:09:36,649 PROFESSOR: Actually, a lot of people in a lot of educational 1534 01:09:36,649 --> 01:09:36,989 interventions-- 1535 01:09:36,989 --> 01:09:38,680 AUDIENCE: That's enough for them? 1536 01:09:38,680 --> 01:09:40,439 PROFESSOR: Yeah. 1537 01:09:40,439 --> 01:09:42,279 I would say the typical intervention that people study 1538 01:09:42,279 --> 01:09:44,370 that I've seen in education, the effect size is in the 1539 01:09:44,370 --> 01:09:50,284 0.15, 0.2 range. 1540 01:09:50,284 --> 01:09:52,019 It turns out it's really hard to move test scores. 1541 01:09:52,019 --> 01:09:52,982 AUDIENCE: Yeah. 
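[EDITOR'S EXAMPLE: The percentile figures quoted above (58%, 69%, 79%) follow from the normal CDF: with normally distributed outcomes of equal spread in both groups, the mean treated person sits at the Phi(delta) percentile of the control distribution. A quick check in Python, illustrative and not part of the lecture:]

```python
from statistics import NormalDist

def treated_mean_percentile(delta):
    # Percentile of the control distribution reached by the mean
    # treated person, assuming normal outcomes with equal variance.
    return round(100 * NormalDist().cdf(delta))

for d in (0.0, 0.2, 0.5, 0.8):
    print(d, treated_mean_percentile(d))  # 50, 58, 69, 79
```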
1542 01:09:52,982 --> 01:09:56,570 PROFESSOR: So yeah, I would say a lot of-- 1543 01:09:56,570 --> 01:09:58,410 but you'll see when you do the power calculations, that to 1544 01:09:58,410 --> 01:10:00,040 detect 0.2, you often need a pretty big sample. 1545 01:10:03,810 --> 01:10:05,620 Look, it depends a lot on what your intervention is, but I've 1546 01:10:05,620 --> 01:10:09,270 seen a lot in that range. 1547 01:10:09,270 --> 01:10:13,340 And I'm just trying to think of an experiment I did. 1548 01:10:13,340 --> 01:10:14,940 I can't think of it off hand. 1549 01:10:14,940 --> 01:10:17,280 But yeah, I would say a lot in this range. 1550 01:10:17,280 --> 01:10:19,777 AUDIENCE: So would the converse be true, that in 1551 01:10:19,777 --> 01:10:22,890 fact, you don't see too many that have a real 1552 01:10:22,890 --> 01:10:25,962 large effect size? 1553 01:10:25,962 --> 01:10:28,490 PROFESSOR: I would say it's pretty rare that I see 1554 01:10:28,490 --> 01:10:31,238 interventions that are 0.8. 1555 01:10:31,238 --> 01:10:31,704 Yeah. 1556 01:10:31,704 --> 01:10:34,160 AUDIENCE: Do you think it's valuable that just because 1557 01:10:34,160 --> 01:10:36,500 you're setting a low effect size in designing your 1558 01:10:36,500 --> 01:10:37,320 experiment, you're being conservative. 1559 01:10:37,320 --> 01:10:39,140 You can still pick up a [UNINTELLIGIBLE] effect size-- 1560 01:10:39,140 --> 01:10:39,740 PROFESSOR: Of course. 1561 01:10:39,740 --> 01:10:41,412 AUDIENCE: It's just in the design process-- 1562 01:10:41,412 --> 01:10:41,830 [INTERPOSING VOICES] 1563 01:10:41,830 --> 01:10:42,450 PROFESSOR: Right. 1564 01:10:42,450 --> 01:10:44,520 This is the minimum thing you could pick up. 1565 01:10:44,520 --> 01:10:45,720 That's absolutely right. 1566 01:10:45,720 --> 01:10:46,460 That's right. 
1567 01:10:46,460 --> 01:10:49,470 So right, if you design for 0.2 but, in fact, your thing 1568 01:10:49,470 --> 01:10:54,150 is amazing and does 0.8, well, there's no problem at all. 1569 01:10:54,150 --> 01:10:56,520 You'll have a p-value of 0.00 something. 1570 01:10:56,520 --> 01:10:59,184 You'll have a very strong [INAUDIBLE]. 1571 01:10:59,184 --> 01:11:01,900 It's a good point. 1572 01:11:01,900 --> 01:11:03,510 OK. 1573 01:11:03,510 --> 01:11:06,950 So how do we actually calculate our power? 1574 01:11:06,950 --> 01:11:10,390 So there's actually a very nice software package, which, 1575 01:11:10,390 --> 01:11:12,330 have you guys started using this yet? 1576 01:11:12,330 --> 01:11:12,810 Yeah? 1577 01:11:12,810 --> 01:11:12,990 OK. 1578 01:11:12,990 --> 01:11:14,110 AUDIENCE: I have a question. 1579 01:11:14,110 --> 01:11:15,970 Can you just clarify something before you go on? 1580 01:11:15,970 --> 01:11:16,900 PROFESSOR: Yeah. 1581 01:11:16,900 --> 01:11:20,208 AUDIENCE: So by rejecting a null hypothesis, you won't be 1582 01:11:20,208 --> 01:11:23,064 able to say what the expected effect is, so you won't be 1583 01:11:23,064 --> 01:11:24,254 able to necessarily quantify the impact. 1584 01:11:24,254 --> 01:11:26,865 PROFESSOR: No, that's not quite right. 1585 01:11:26,865 --> 01:11:27,697 AUDIENCE: OK. 1586 01:11:27,697 --> 01:11:31,570 PROFESSOR: So you're going to estimate your-- 1587 01:11:31,570 --> 01:11:34,260 you run your experiment, you're going to get a beta, 1588 01:11:34,260 --> 01:11:35,940 which is your estimate, And you're going to 1589 01:11:35,940 --> 01:11:38,470 get a standard error. 1590 01:11:38,470 --> 01:11:42,110 You reject the null, which means you say with 95% 1591 01:11:42,110 --> 01:11:44,540 probability, I'm in my confidence interval. 1592 01:11:44,540 --> 01:11:48,360 So you know you're somewhere in the confidence interval. 
1593 01:11:48,360 --> 01:11:50,990 And then beyond that, you have an estimate of where in the 1594 01:11:50,990 --> 01:11:52,360 confidence interval you are. 1595 01:11:52,360 --> 01:11:54,210 And your best estimate for where you are on the 1596 01:11:54,210 --> 01:11:57,080 confidence interval is your point estimate. 1597 01:11:57,080 --> 01:11:58,230 Does that make sense? 1598 01:11:58,230 --> 01:12:02,380 So in terms of thinking through the cost-benefit or 1599 01:12:02,380 --> 01:12:05,080 whatever, your best guess of the effect of the program is 1600 01:12:05,080 --> 01:12:07,470 your point estimate, is your beta. 1601 01:12:07,470 --> 01:12:10,480 If you wanted to be a little more precise about it, you 1602 01:12:10,480 --> 01:12:11,730 could say-- 1603 01:12:19,190 --> 01:12:25,090 so this is your estimate, this is your beta hat, this is your 1604 01:12:25,090 --> 01:12:26,810 confidence interval, right? 1605 01:12:26,810 --> 01:12:29,970 Zero is over here, so you can reject zero in this case. 1606 01:12:29,970 --> 01:12:34,210 But, in fact, there's a distribution of where your 1607 01:12:34,210 --> 01:12:36,100 estimates are likely to be. 1608 01:12:36,100 --> 01:12:37,870 And when we said it was 95% confidence interval, that's 1609 01:12:37,870 --> 01:12:41,100 because the probability of being over here is 95%. 1610 01:12:41,100 --> 01:12:43,710 But this says you're most likely to be right here, but 1611 01:12:43,710 --> 01:12:45,430 there's some probability over here. 1612 01:12:45,430 --> 01:12:48,980 You're more likely to be near beta then you are to be very-- 1613 01:12:48,980 --> 01:12:51,160 it's not that you're equally likely to be anywhere in your 1614 01:12:51,160 --> 01:12:52,350 confidence interval. 1615 01:12:52,350 --> 01:12:54,700 You're most likely to be right near your point estimate. 
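[EDITOR'S EXAMPLE: The point that the estimate is most likely near beta hat can be used directly: treating the estimate as normal around beta hat with its standard error, you can compute the probability that the true effect clears any threshold the client cares about. The estimate and standard error below are hypothetical:]

```python
from statistics import NormalDist

def prob_effect_above(threshold, beta_hat, se):
    # P(true effect > threshold), treating the estimate as normal
    # around beta_hat with standard error se.
    return 1 - NormalDist(mu=beta_hat, sigma=se).cdf(threshold)

# Hypothetical estimate: 0.2 with standard error 0.05.
# Zero is clearly rejected, but effects below 0.15 still
# carry noticeable probability.
print(round(prob_effect_above(0.15, 0.2, 0.05), 2))  # 0.84
```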
1616 01:12:54,700 --> 01:12:58,740 So, in fact, if you actually cared about the range, you 1617 01:12:58,740 --> 01:13:01,060 could say, well, what's the probability I'm over here? 1618 01:13:01,060 --> 01:13:02,410 And calculate that. 1619 01:13:02,410 --> 01:13:03,340 What's the probability I'm over here? 1620 01:13:03,340 --> 01:13:06,130 And you could average them to calculate the average benefit 1621 01:13:06,130 --> 01:13:07,390 of your program. 1622 01:13:07,390 --> 01:13:09,230 Usually, though, we don't bother to do this and usually 1623 01:13:09,230 --> 01:13:11,680 what we do is we say our best estimate is that you're right 1624 01:13:11,680 --> 01:13:12,250 at beta hat. 1625 01:13:12,250 --> 01:13:14,372 That is our best estimate and we calculate our estimate 1626 01:13:14,372 --> 01:13:15,622 based on that. 1627 01:13:18,480 --> 01:13:20,215 But in theory, you could use the whole distribution 1628 01:13:20,215 --> 01:13:22,392 [INAUDIBLE]. 1629 01:13:22,392 --> 01:13:23,642 OK. 1630 01:13:27,600 --> 01:13:28,850 OK, so suppose we want-- 1631 01:13:28,850 --> 01:13:31,010 so how do we actually calculate some of these? 1632 01:13:31,010 --> 01:13:34,710 So using the software helps get a sense, intuitively, of 1633 01:13:34,710 --> 01:13:35,910 what these tradeoffs are going to look like. 1634 01:13:35,910 --> 01:13:37,690 And I don't know that I'll have time to go through all 1635 01:13:37,690 --> 01:13:41,320 this, but we'll go through most of it, OK? 1636 01:13:41,320 --> 01:13:43,960 So for example, so if you run the software and look at power 1637 01:13:43,960 --> 01:13:45,210 versus number of clusters-- 1638 01:13:50,156 --> 01:13:52,340 hold on. 1639 01:13:52,340 --> 01:13:54,430 So how would you set this up in the software? 1640 01:13:54,430 --> 01:13:58,490 So we'll talk about clustered effects in a sec. 1641 01:13:58,490 --> 01:14:03,540 As we discussed, you have to pick a significance level. 
1642 01:14:03,540 --> 01:14:05,930 You have to pick a standardized effect size. 1643 01:14:05,930 --> 01:14:07,360 That's what delta is in the software. 1644 01:14:07,360 --> 01:14:10,850 So we use 0.2, OK? 1645 01:14:10,850 --> 01:14:12,620 In the software, it's always a standardized effect size. 1646 01:14:12,620 --> 01:14:13,650 You just divide by your standard 1647 01:14:13,650 --> 01:14:16,260 deviation of your outcome. 1648 01:14:16,260 --> 01:14:18,400 That's why you need to know your actual outcome 1649 01:14:18,400 --> 01:14:19,670 variable-- if I think the 1650 01:14:19,670 --> 01:14:21,690 actual effect is, whatever, 1651 01:14:21,690 --> 01:14:23,590 people get one centimeter taller, then in order to get a 1652 01:14:23,590 --> 01:14:24,920 standardized effect size, I need to know the standard 1653 01:14:24,920 --> 01:14:27,290 deviation of my outcome variable. 1654 01:14:27,290 --> 01:14:34,660 And the program is going to give you the power as a 1655 01:14:34,660 --> 01:14:39,180 function of your sample size, OK? 1656 01:14:39,180 --> 01:14:42,960 And one of the things that you can see is that this is not 1657 01:14:42,960 --> 01:14:46,410 necessarily a linear relationship, right? 1658 01:14:46,410 --> 01:14:51,020 So for example, here, we've plotted a delta of-- 1659 01:14:51,020 --> 01:14:54,050 effect size of 0.2 and here's an effect size of 0.4. 1660 01:14:54,050 --> 01:14:58,930 So this says that with about 200 clusters, you're going to 1661 01:14:58,930 --> 01:15:04,315 get to a power of 0.8 with the effect size of 0.4, but you're 1662 01:15:04,315 --> 01:15:07,010 still going to be at a power of 0.2 with an 1663 01:15:07,010 --> 01:15:08,670 effect size of 0.2. 1664 01:15:08,670 --> 01:15:11,570 So the formulas are complicated. 1665 01:15:11,570 --> 01:15:13,940 Power is not necessarily a linear function of your sample size. 
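[EDITOR'S EXAMPLE: The nonlinearity is easy to see even in the simplest case. For individual-level randomization with a unit-variance outcome, power is approximately Phi(delta * sqrt(n/2) - z_alpha) with n per arm. This ignores clustering, so the numbers differ from the cluster plot discussed above, but the shape of the tradeoff is the same. An illustrative sketch:]

```python
from statistics import NormalDist

def approx_power(delta, n_per_arm, alpha=0.05):
    # Approximate power of a two-arm comparison of means with a
    # unit-variance outcome: Phi(delta * sqrt(n/2) - z_alpha).
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta * (n_per_arm / 2) ** 0.5 - z_alpha)

# Halving the effect size collapses power far more than
# proportionally at this sample size.
print(round(approx_power(0.4, 200), 2))  # 0.98
print(round(approx_power(0.2, 200), 2))  # 0.52
```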
1666 01:15:22,050 --> 01:15:24,750 When we think about power, we've talked about a couple of 1667 01:15:24,750 --> 01:15:29,260 things that influence our power in terms of the variance 1668 01:15:29,260 --> 01:15:30,800 of our outcome, right? 1669 01:15:30,800 --> 01:15:32,890 The variance of our outcome, how big our effect size is. 1670 01:15:32,890 --> 01:15:34,180 And those are the basic things that are going 1671 01:15:34,180 --> 01:15:35,420 to affect our power. 1672 01:15:35,420 --> 01:15:39,650 But there are things that we can do in our experiment-- 1673 01:15:39,650 --> 01:15:41,360 in the way we design our experiment that are also going 1674 01:15:41,360 --> 01:15:44,470 to make our experiment more or less powerful. 1675 01:15:44,470 --> 01:15:45,790 And here are some of the things that we can do. 1676 01:15:49,720 --> 01:15:52,390 One thing that we can do is we can think 1677 01:15:52,390 --> 01:15:55,240 about having a cluster-- 1678 01:15:55,240 --> 01:15:57,330 so whether we randomize at the individual 1679 01:15:57,330 --> 01:16:03,320 level or in clusters, whether we have a baseline, whether we 1680 01:16:03,320 --> 01:16:06,570 use control variables or stratification, and the type 1681 01:16:06,570 --> 01:16:09,750 of hypothesis being tested. 1682 01:16:09,750 --> 01:16:12,550 All four of these are things that we're going to do that 1683 01:16:12,550 --> 01:16:15,130 for a given outcome variable and a given effect size, in 1684 01:16:15,130 --> 01:16:17,750 some sense, are going to affect how powerful our 1685 01:16:17,750 --> 01:16:20,490 experiment is. 1686 01:16:20,490 --> 01:16:22,800 In some sense-- 1687 01:16:22,800 --> 01:16:24,820 given that I may not have time to finish everything, the one 1688 01:16:24,820 --> 01:16:28,550 that I want to focus on is the clustering issue. 
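[EDITOR'S EXAMPLE: One of the levers just listed, a baseline or control variables, works by soaking up residual variance. Under a standard approximation not stated in the lecture, controls that explain a share R-squared of the outcome variance cut the required sample roughly in proportion to (1 - R-squared). The numbers here are hypothetical:]

```python
import math

def n_with_controls(n_without, r_squared):
    # Controls explaining a share r_squared of outcome variance
    # shrink the required sample roughly by (1 - r_squared).
    return math.ceil(n_without * (1 - r_squared))

# A baseline that predicts half the variance in the endline
# outcome roughly halves the sample you need.
print(n_with_controls(400, 0.5))  # 200
```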
1689 01:16:28,550 --> 01:16:31,720 This is the one that is the biggest for designing 1690 01:16:31,720 --> 01:16:37,270 experiments, and it often makes a big difference. 1691 01:16:37,270 --> 01:16:43,640 So the intuition for clustering is that-- 1692 01:16:43,640 --> 01:16:45,040 so what is clustering? 1693 01:16:45,040 --> 01:16:49,480 Clustering is, instead of randomizing-- suppose I want 1694 01:16:49,480 --> 01:16:54,750 to do an experiment on whether the J-PAL executive ed class 1695 01:16:54,750 --> 01:16:56,980 improves your ability to-- 1696 01:16:56,980 --> 01:16:59,440 whether you took this lecture improves your understanding of 1697 01:16:59,440 --> 01:17:01,820 power calculation, OK? 1698 01:17:01,820 --> 01:17:05,380 Suppose I randomly sampled this half of the room and gave 1699 01:17:05,380 --> 01:17:08,980 you my lecture and this half was the control group. 1700 01:17:08,980 --> 01:17:11,270 And I flipped a coin so I split you in halves down the 1701 01:17:11,270 --> 01:17:13,160 middle and I said, OK, I'm going to flip a coin, which is 1702 01:17:13,160 --> 01:17:14,410 control, which is treatment. 1703 01:17:16,590 --> 01:17:20,020 You guys, presumably you all sat with your friends, OK? 1704 01:17:20,020 --> 01:17:23,280 So people on this side of the room are going to be more like 1705 01:17:23,280 --> 01:17:27,440 each other then people on that side of the room, OK? 1706 01:17:27,440 --> 01:17:32,620 So I didn't get an independent sample, right? 1707 01:17:32,620 --> 01:17:35,810 This group, their outcomes are going to be correlated because 1708 01:17:35,810 --> 01:17:37,510 some of you are friends and have similar 1709 01:17:37,510 --> 01:17:38,530 backgrounds and skills. 1710 01:17:38,530 --> 01:17:41,350 And this group is going to be correlated. 
1711 01:17:41,350 --> 01:17:44,490 On the other hand, suppose I had gone through everyone and 1712 01:17:44,490 --> 01:17:46,450 randomly flipped a coin for every person and said, 1713 01:17:46,450 --> 01:17:47,200 treatment or control, treatment or control, 1714 01:17:47,200 --> 01:17:49,920 treatment or control? 1715 01:17:49,920 --> 01:17:53,540 In that case, I would've flipped the coin 60 times and 1716 01:17:53,540 --> 01:17:56,070 there would be no correlation between who is in the control 1717 01:17:56,070 --> 01:17:57,550 group and who is in the treatment group because I 1718 01:17:57,550 --> 01:17:59,690 wouldn't have been randomizing you into the same groups 1719 01:17:59,690 --> 01:18:02,550 together, OK? 1720 01:18:02,550 --> 01:18:07,070 By doing the cluster design, splitting you in half and then 1721 01:18:07,070 --> 01:18:10,170 randomizing treatment versus control or splitting you into 1722 01:18:10,170 --> 01:18:11,600 groups of 10-- 1723 01:18:11,600 --> 01:18:12,920 you five, you 10, you 10. 1724 01:18:12,920 --> 01:18:15,660 You 10, you 10, you 10, and then flipping the coin. 1725 01:18:15,660 --> 01:18:18,440 I have less variation, in some sense, than if I had flipped 1726 01:18:18,440 --> 01:18:19,890 the coin in individual-- 1727 01:18:19,890 --> 01:18:21,220 person by person-- 1728 01:18:21,220 --> 01:18:23,510 because those groups are going to be correlated. 1729 01:18:23,510 --> 01:18:25,580 They're going to have similar outcomes. 1730 01:18:25,580 --> 01:18:31,100 So the basic point is that your power is going to be-- 1731 01:18:31,100 --> 01:18:33,790 the more times you flip the coin to randomize treatment 1732 01:18:33,790 --> 01:18:35,780 and control, essentially, the more power you're going to 1733 01:18:35,780 --> 01:18:37,950 have because the more your different groups are going to 1734 01:18:37,950 --> 01:18:40,160 be independent, OK? 
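[EDITOR'S EXAMPLE: The coin-flip intuition is usually quantified with the design effect, DEFF = 1 + (m - 1) * rho, where m is the cluster size and rho the intracluster correlation; the effective sample size is the actual sample divided by DEFF. The formula is standard but not stated verbatim in the lecture, and the rho = 0.3 below is hypothetical:]

```python
def effective_n(n_total, cluster_size, icc):
    # Effective sample size under the design effect
    # DEFF = 1 + (m - 1) * rho, for equal clusters of size m.
    deff = 1 + (cluster_size - 1) * icc
    return n_total / deff

# 60 students randomized as two room-halves of 30, with
# intracluster correlation 0.3, carry about as much information
# as 6 independent coin flips.
print(round(effective_n(60, 30, 0.3), 1))  # 6.2
```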
1735 01:18:40,160 --> 01:18:44,430 So to go through this again, suppose you wanted to know-- 1736 01:18:44,430 --> 01:18:46,340 this is, in general, about clustering. 1737 01:18:46,340 --> 01:18:49,600 Suppose you wanted to know what the outcome of the national 1738 01:18:49,600 --> 01:18:51,180 elections is going to be. 1739 01:18:51,180 --> 01:18:53,520 So you could either randomly sample 50 people from the 1740 01:18:53,520 --> 01:18:56,040 entire Indian population, or you randomly pick five 1741 01:18:56,040 --> 01:18:59,470 families and you ask 10 people per family what 1742 01:18:59,470 --> 01:19:01,590 their opinions are. 1743 01:19:01,590 --> 01:19:03,960 Clearly, this is going to give you more information than this 1744 01:19:03,960 --> 01:19:06,110 is because those family members are going to be 1745 01:19:06,110 --> 01:19:08,160 correlated, right? 1746 01:19:08,160 --> 01:19:10,690 I have views like my wife and like my father, et cetera. 1747 01:19:10,690 --> 01:19:12,990 So we're not getting independent views, whereas 1748 01:19:12,990 --> 01:19:15,565 here, you're getting, really, 50 independent data points. 1749 01:19:15,565 --> 01:19:16,910 And that's the same as what we were talking 1750 01:19:16,910 --> 01:19:19,230 about with the class. 1751 01:19:19,230 --> 01:19:21,700 So this approach is going to have more power than this 1752 01:19:21,700 --> 01:19:24,132 approach because of the way you did the sample. 1753 01:19:24,132 --> 01:19:24,568 Yeah. 1754 01:19:24,568 --> 01:19:26,465 AUDIENCE: So is the only reason that you would cluster, 1755 01:19:26,465 --> 01:19:30,370 then, just because you had to because you had no choice-- 1756 01:19:30,370 --> 01:19:31,230 PROFESSOR: Yes. 1757 01:19:31,230 --> 01:19:34,824 AUDIENCE: --for political reasons or just feasibility. 1758 01:19:34,824 --> 01:19:35,782 PROFESSOR: And cost. 1759 01:19:35,782 --> 01:19:36,740 AUDIENCE: And cost. 1760 01:19:36,740 --> 01:19:37,410 PROFESSOR: Yeah.
1761 01:19:37,410 --> 01:19:38,810 AUDIENCE: Well, and the level of intervention. 1762 01:19:38,810 --> 01:19:40,800 PROFESSOR: Exactly. 1763 01:19:40,800 --> 01:19:41,550 And we'll talk about that. 1764 01:19:41,550 --> 01:19:44,040 There are lots of reasons people-- 1765 01:19:44,040 --> 01:19:47,260 given this issue, people have lots of good reasons for 1766 01:19:47,260 --> 01:19:51,720 clustering, but the point is that there are negative 1767 01:19:51,720 --> 01:19:52,830 tradeoffs for sample size. 1768 01:19:52,830 --> 01:19:56,360 AUDIENCE: About the clusters. 1769 01:19:56,360 --> 01:20:03,260 If you flip the coin for all of the class and then after, 1770 01:20:03,260 --> 01:20:06,690 you decide that you will select among the people that 1771 01:20:06,690 --> 01:20:10,880 you have assigned, you will select those seated-- 1772 01:20:10,880 --> 01:20:15,000 you will select half of those seated on the left. 1773 01:20:15,000 --> 01:20:16,290 Will that solve the problem-- 1774 01:20:16,290 --> 01:20:18,760 PROFESSOR: You select half of the ones seated on the left? 1775 01:20:18,760 --> 01:20:19,290 AUDIENCE: Yeah. 1776 01:20:19,290 --> 01:20:21,670 PROFESSOR: Well, it's a different issue. 1777 01:20:21,670 --> 01:20:25,300 Suppose I first select the left and now I go one by one, 1778 01:20:25,300 --> 01:20:28,140 flip a coin of the people on the left. 1779 01:20:28,140 --> 01:20:30,270 I don't have the clustering issue because I flipped the 1780 01:20:30,270 --> 01:20:32,310 coin per person. 1781 01:20:32,310 --> 01:20:34,860 But I have a different issue, which is that the people I 1782 01:20:34,860 --> 01:20:36,860 selected are not necessarily representative of the whole 1783 01:20:36,860 --> 01:20:40,450 population because I didn't pick a representative sample. 1784 01:20:40,450 --> 01:20:42,040 I picked the ones who happened to sit over here. 
1785 01:20:42,040 --> 01:20:42,966 AUDIENCE: My question-- 1786 01:20:42,966 --> 01:20:45,320 PROFESSOR: So there's two different issues. 1787 01:20:45,320 --> 01:20:50,080 One is, essentially, how many times you flip a coin is how 1788 01:20:50,080 --> 01:20:53,000 much power you have, how independent your sample is. 1789 01:20:53,000 --> 01:20:55,950 The other issue is, is this group here representative of 1790 01:20:55,950 --> 01:20:57,700 the entire population? 1791 01:20:57,700 --> 01:21:01,450 You might think that people who sit near the window like 1792 01:21:01,450 --> 01:21:03,310 to look at the river are daydreamers and they're not as 1793 01:21:03,310 --> 01:21:06,020 good at math as people who don't sit near the window. 1794 01:21:06,020 --> 01:21:09,850 And so I would get the effect of my treatment on people who 1795 01:21:09,850 --> 01:21:11,630 like to sit near the window and aren't as good at math. 1796 01:21:11,630 --> 01:21:13,310 And that might be a different treatment effect than if I had 1797 01:21:13,310 --> 01:21:14,530 done it over the whole room. 1798 01:21:14,530 --> 01:21:18,286 So it's a different issue.
1810 01:21:53,540 --> 01:21:54,850 I think what you're saying may be about stratification. 1811 01:21:54,850 --> 01:21:57,550 Why don't we talk about it later? 1812 01:21:57,550 --> 01:22:00,050 Because we're running a little short on time. 1813 01:22:00,050 --> 01:22:00,820 In fact, can I borrow someone's handouts? 1814 01:22:00,820 --> 01:22:04,950 Because I want to make sure I cover the most important stuff 1815 01:22:04,950 --> 01:22:05,865 in the lecture. 1816 01:22:05,865 --> 01:22:07,115 Let me just see where we are. 1817 01:22:13,910 --> 01:22:15,251 OK. 1818 01:22:15,251 --> 01:22:16,940 AUDIENCE: And if you need to, you can 1819 01:22:16,940 --> 01:22:18,017 take ten extra minutes. 1820 01:22:18,017 --> 01:22:20,510 PROFESSOR: I may do that. 1821 01:22:20,510 --> 01:22:22,620 I was going to ask you, Mark, for permission. 1822 01:22:22,620 --> 01:22:24,170 I just wanted to see what I had left. 1823 01:22:24,170 --> 01:22:25,050 OK. 1824 01:22:25,050 --> 01:22:28,220 So where were we? 1825 01:22:31,690 --> 01:22:31,940 Right. 1826 01:22:31,940 --> 01:22:32,310 OK. 1827 01:22:32,310 --> 01:22:33,000 Right. 1828 01:22:33,000 --> 01:22:36,680 So as I was saying, when possible, it's better not to run a 1829 01:22:36,680 --> 01:22:39,460 clustered design. 1830 01:22:39,460 --> 01:22:42,780 And so a cluster randomized trial is one in which the 1831 01:22:42,780 --> 01:22:48,750 units that are randomized are clusters of units rather than 1832 01:22:48,750 --> 01:22:49,400 the individual units. 1833 01:22:49,400 --> 01:22:51,640 So I randomized a whole cluster at a time rather than 1834 01:22:51,640 --> 01:22:54,030 individual person by person. 1835 01:22:54,030 --> 01:22:55,950 And there are lots of common examples of this. 1836 01:22:55,950 --> 01:22:59,340 So the PROGRESA program, for example, in Mexico was a 1837 01:22:59,340 --> 01:23:00,410 conditional cash transfer program. 1838 01:23:00,410 --> 01:23:01,900 They randomized by village.
1839 01:23:01,900 --> 01:23:03,470 Some villages were in, some villages were out. 1840 01:23:03,470 --> 01:23:06,640 If a village was in, everybody was in. 1841 01:23:06,640 --> 01:23:08,700 In the panchayat case we talked about, it 1842 01:23:08,700 --> 01:23:10,340 was basically a village. 1843 01:23:10,340 --> 01:23:10,930 It was a panchayat. 1844 01:23:10,930 --> 01:23:12,470 So the whole panchayat was in or the whole 1845 01:23:12,470 --> 01:23:14,460 panchayat was not in. 1846 01:23:14,460 --> 01:23:17,135 In a lot of education experiments, we randomize at 1847 01:23:17,135 --> 01:23:18,240 the level of a school. 1848 01:23:18,240 --> 01:23:20,280 Either the whole school is in or the whole school is out. 1849 01:23:20,280 --> 01:23:21,220 Sometimes you do it as a class. 1850 01:23:21,220 --> 01:23:23,400 A whole class in a school is in [UNINTELLIGIBLE] is out. 1851 01:23:23,400 --> 01:23:25,970 In this iron supplementation example, it was by the family. 1852 01:23:25,970 --> 01:23:29,590 So there's lots of cases where you would do this kind of 1853 01:23:29,590 --> 01:23:31,940 clustering. 1854 01:23:31,940 --> 01:23:34,790 And there are lots of good reasons, as I've mentioned, 1855 01:23:34,790 --> 01:23:36,130 for doing clustering. 1856 01:23:36,130 --> 01:23:40,470 So one reason is you're worried about 1857 01:23:40,470 --> 01:23:43,450 contamination, right? 1858 01:23:43,450 --> 01:23:47,000 So for example, when they're interested in deworming, worms 1859 01:23:47,000 --> 01:23:49,100 are very easily-- 1860 01:23:49,100 --> 01:23:50,400 there's a lot of cross-contamination. 1861 01:23:50,400 --> 01:23:52,785 If one kid has worms, the next kid who's also in school with 1862 01:23:52,785 --> 01:23:53,940 him is likely to get worms. 
1863 01:23:53,940 --> 01:23:56,960 So if I just deworm half the kids in the school, that's 1864 01:23:56,960 --> 01:24:00,350 going to have very little effect because my control- 1865 01:24:00,350 --> 01:24:02,270 they're going to get recontaminated by the kids who 1866 01:24:02,270 --> 01:24:03,440 weren't dewormed, right? 1867 01:24:03,440 --> 01:24:04,540 Or it could be the other way around. 1868 01:24:04,540 --> 01:24:06,720 It could be that if I deworm half the kids, that's enough 1869 01:24:06,720 --> 01:24:07,830 to knock worms out of the population. 1870 01:24:07,830 --> 01:24:09,750 The control group is also affected. 1871 01:24:09,750 --> 01:24:12,300 So you need to choose a level of randomization where your 1872 01:24:12,300 --> 01:24:14,300 treatment is going to affect the treatment group and not 1873 01:24:14,300 --> 01:24:16,120 affect the control group. 1874 01:24:16,120 --> 01:24:18,700 So that's a very important reason for cluster 1875 01:24:18,700 --> 01:24:20,470 randomizing. 1876 01:24:20,470 --> 01:24:23,960 Another reason is this feasibility consideration. 1877 01:24:23,960 --> 01:24:26,960 So it's often just for a variety of reasons not 1878 01:24:26,960 --> 01:24:31,370 feasible to give some people the treatment and not others. 1879 01:24:31,370 --> 01:24:35,060 Sometimes within a village, it's hard to make some people 1880 01:24:35,060 --> 01:24:38,100 eligible for a program and others not. 1881 01:24:38,100 --> 01:24:40,170 It's just sometimes hard to treat people in the same place 1882 01:24:40,170 --> 01:24:41,100 differently. 1883 01:24:41,100 --> 01:24:43,050 And so that's often a reason why we do cluster 1884 01:24:43,050 --> 01:24:45,080 randomization. 1885 01:24:45,080 --> 01:24:50,100 And some experiments naturally just occur at a cluster level. 
1886 01:24:50,100 --> 01:24:52,710 So for example, if I want to do something that affects an 1887 01:24:52,710 --> 01:24:54,900 entire classroom, like give out-- 1888 01:24:54,900 --> 01:24:57,685 suppose I want to train a teacher, right? 1889 01:24:57,685 --> 01:25:00,130 That obviously affects all the kids in the teacher's class. 1890 01:25:00,130 --> 01:25:03,280 There's no way to have that only affect half the kids in 1891 01:25:03,280 --> 01:25:04,130 the teacher's class. 1892 01:25:04,130 --> 01:25:06,420 It's just a fact of life. 1893 01:25:06,420 --> 01:25:09,510 So there are lots of good reasons why we do cluster 1894 01:25:09,510 --> 01:25:13,080 randomized designs even though they have negative 1895 01:25:13,080 --> 01:25:16,280 impacts on our power. 1896 01:25:16,280 --> 01:25:20,490 So as I mentioned, the reason the cluster has a negative 1897 01:25:20,490 --> 01:25:22,820 impact on your power is because the groups are 1898 01:25:22,820 --> 01:25:24,560 correlated. 1899 01:25:24,560 --> 01:25:27,070 The outcomes for the individuals are correlated. 1900 01:25:27,070 --> 01:25:28,720 So, for example, if all of the villagers are exposed to the 1901 01:25:28,720 --> 01:25:30,860 same weather, right? 1902 01:25:30,860 --> 01:25:32,610 All villagers are exposed to the same weather. 1903 01:25:32,610 --> 01:25:35,360 So it could be that the weather was 1904 01:25:35,360 --> 01:25:36,570 really bad in this village. 1905 01:25:36,570 --> 01:25:40,300 So all those people are going to have a lower outcome, for 1906 01:25:40,300 --> 01:25:41,550 example, than if the weather was good. 
1907 01:25:44,910 --> 01:25:48,150 And so, in some sense, even if there are 1,000 people in that 1908 01:25:48,150 --> 01:25:50,920 village, they all got this common shock, which is the 1909 01:25:50,920 --> 01:25:53,860 negative weather, you don't actually have 1,000 1910 01:25:53,860 --> 01:25:56,336 independent observations in that village because they have 1911 01:25:56,336 --> 01:26:00,180 this common correlated component, OK? 1912 01:26:00,180 --> 01:26:05,230 And this common correlated component we denote by the 1913 01:26:05,230 --> 01:26:08,460 Greek letter rho, which is the correlation of the units 1914 01:26:08,460 --> 01:26:09,710 within the same cluster. 1915 01:26:13,840 --> 01:26:20,110 So rho measures the correlation between units in 1916 01:26:20,110 --> 01:26:20,850 the same cluster. 1917 01:26:20,850 --> 01:26:24,090 If rho is zero, then people in the same cluster are just as 1918 01:26:24,090 --> 01:26:25,420 if they were independent. 1919 01:26:25,420 --> 01:26:27,060 There's no correlation. 1920 01:26:27,060 --> 01:26:29,080 Just as if they had been not in the same cluster. 1921 01:26:29,080 --> 01:26:31,760 If rho is one, they're perfectly correlated and it 1922 01:26:31,760 --> 01:26:36,320 means they all have exactly the same outcome, OK? 1923 01:26:36,320 --> 01:26:39,150 So it's somewhere between zero and one. 1924 01:26:39,150 --> 01:26:42,990 And the lower the rho is, the better you are if you're doing 1925 01:26:42,990 --> 01:26:44,650 a cluster randomized design. 1926 01:26:44,650 --> 01:26:45,570 And why is that? 1927 01:26:45,570 --> 01:26:47,960 It's because the problem within a clustered randomized 1928 01:26:47,960 --> 01:26:50,230 design is, as I was saying, if people were all exposed to the 1929 01:26:50,230 --> 01:26:53,020 same weather, it's not as if you had 1,000 independent 1930 01:26:53,020 --> 01:26:54,720 people in that village.
1931 01:26:54,720 --> 01:26:57,310 You effectively had fewer than 1,000 because they were 1932 01:26:57,310 --> 01:26:58,150 correlated. 1933 01:26:58,150 --> 01:27:02,030 And rho captures that effect-- 1934 01:27:02,030 --> 01:27:05,540 how much smaller, effectively, is your sample, OK? 1935 01:27:05,540 --> 01:27:09,340 And the bigger rho is, the smaller your effective sample 1936 01:27:09,340 --> 01:27:11,760 size is, OK? 1937 01:27:15,130 --> 01:27:16,910 And once again, when you do the power calculations, you 1938 01:27:16,910 --> 01:27:19,120 can play with this and you'll note that small differences in 1939 01:27:19,120 --> 01:27:21,540 rho make very big differences in your power. 1940 01:27:21,540 --> 01:27:23,380 And I'll show you the formula in a sec. 1941 01:27:23,380 --> 01:27:26,320 So often it's low, but it can be substantial. 1942 01:27:26,320 --> 01:27:29,370 So in some of these test score cases, for example, it's 1943 01:27:29,370 --> 01:27:34,060 between 0.2 and 0.6, which, 0.6 means that most of the 1944 01:27:34,060 --> 01:27:38,840 differences are coming between groups, not within groups. 1945 01:27:38,840 --> 01:27:44,190 So the groups, really, are much closer to one object. 1946 01:27:44,190 --> 01:27:44,665 Yeah. 1947 01:27:44,665 --> 01:27:49,370 AUDIENCE: What does the 0.5 mean? 1948 01:27:49,370 --> 01:27:56,458 Are you saying that in Madagascar, the scores on math 1949 01:27:56,458 --> 01:27:58,370 and language-- 1950 01:27:58,370 --> 01:28:00,200 PROFESSOR: It's the correlation 1951 01:28:00,200 --> 01:28:03,470 coefficient, which is the-- 1952 01:28:07,070 --> 01:28:13,230 technically, I believe it's the between variation divided 1953 01:28:13,230 --> 01:28:14,110 by the total variation. 1954 01:28:14,110 --> 01:28:15,580 I think that's the formula. 1955 01:28:15,580 --> 01:28:16,490 Dan's shaking his head. 1956 01:28:16,490 --> 01:28:17,250 Good. 1957 01:28:17,250 --> 01:28:18,070 Excellent. 1958 01:28:18,070 --> 01:28:20,430 A for me. 
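The decomposition just stated, between variation divided by total variation, can be written down directly. This is a minimal sketch of the simple variance-decomposition version, estimated from pilot-style data; it is not the exact ANOVA estimator that statistical packages use.

```python
def icc(clusters):
    """Estimate rho as between-cluster variance / total variance.

    clusters: list of lists, one inner list of outcomes per cluster.
    """
    all_obs = [y for cluster in clusters for y in cluster]
    grand_mean = sum(all_obs) / len(all_obs)
    # Total variance of every observation around the grand mean.
    total_var = sum((y - grand_mean) ** 2 for y in all_obs) / len(all_obs)
    # Between-cluster variance: cluster means around the grand mean,
    # weighted by cluster size.
    between_var = sum(
        len(c) * (sum(c) / len(c) - grand_mean) ** 2 for c in clusters
    ) / len(all_obs)
    return between_var / total_var

print(icc([[1, 1], [3, 3]]))  # 1.0 -- identical outcomes within each cluster
print(icc([[1, 3], [1, 3]]))  # 0.0 -- clusters look just like each other
```

When rho is one, all the variation is between clusters; when rho is zero, clusters are indistinguishable and every observation is effectively independent.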
1959 01:28:20,430 --> 01:28:24,910 It's what share of the variation is coming between 1960 01:28:24,910 --> 01:28:29,710 groups divided by the total share of variation. 1961 01:28:29,710 --> 01:28:33,365 So 0.5 means that, in some sense, half of the variation 1962 01:28:33,365 --> 01:28:35,330 in your sample is coming between groups. 1963 01:28:37,890 --> 01:28:38,802 AUDIENCE: Okay. 1964 01:28:38,802 --> 01:28:39,590 PROFESSOR: What? 1965 01:28:39,590 --> 01:28:41,360 AUDIENCE: Isn't it within [INAUDIBLE]? 1966 01:28:41,360 --> 01:28:41,980 If a rho is-- 1967 01:28:41,980 --> 01:28:45,080 PROFESSOR: No, it's between. 1968 01:28:45,080 --> 01:28:47,680 Because if rho is one, then each group is one. 1969 01:28:50,310 --> 01:28:51,590 Yeah, it's between. 1970 01:28:55,880 --> 01:29:01,230 If it was zero, then they're independent and it's saying 1971 01:29:01,230 --> 01:29:03,884 that it's all coming from within. 1972 01:29:03,884 --> 01:29:06,139 Yeah. 1973 01:29:06,139 --> 01:29:10,180 AUDIENCE: But here it's between math and language 1974 01:29:10,180 --> 01:29:15,290 scores of one kid or between math plus language scores of 1975 01:29:15,290 --> 01:29:16,970 two kids in the same group. 1976 01:29:16,970 --> 01:29:20,130 AUDIENCE: Or is it math and language scores in Madagascar 1977 01:29:20,130 --> 01:29:21,770 are explained by-- 1978 01:29:21,770 --> 01:29:24,280 PROFESSOR: This says the following. 1979 01:29:24,280 --> 01:29:27,750 This was in Madagascar, they sampled math and language 1980 01:29:27,750 --> 01:29:31,170 schools by-- 1981 01:29:31,170 --> 01:29:34,650 they took math and language scores for each kid by 1982 01:29:34,650 --> 01:29:35,330 classroom-- 1983 01:29:35,330 --> 01:29:35,830 or by school. 1984 01:29:35,830 --> 01:29:38,210 I think it was by school in this particular case. 
1985 01:29:38,210 --> 01:29:40,110 Then they said, looking over the whole sample that they 1986 01:29:40,110 --> 01:29:44,030 looked at in Madagascar, what percentage of the variation in 1987 01:29:44,030 --> 01:29:47,620 test scores came between schools 1988 01:29:47,620 --> 01:29:49,870 relative to within schools. 1989 01:29:49,870 --> 01:29:51,030 And they're saying that half of the 1990 01:29:51,030 --> 01:29:53,790 variation was between schools. 1991 01:29:58,390 --> 01:29:59,640 OK. 1992 01:30:09,190 --> 01:30:11,350 So how much does this hurt us, essentially? 1993 01:30:11,350 --> 01:30:15,460 So we need to adjust our standard errors, given the 1994 01:30:15,460 --> 01:30:19,640 fact that these things are correlated. 1995 01:30:19,640 --> 01:30:26,130 And this is the formula, which is that for a given total 1996 01:30:26,130 --> 01:30:28,780 sample size, if we have clusters of size m-- so say we 1997 01:30:28,780 --> 01:30:30,710 have 100 kids per school-- 1998 01:30:30,710 --> 01:30:34,070 and intra-cluster correlation coefficient rho, 1999 01:30:34,070 --> 01:30:37,160 the size of the smallest effect we can detect increases 2000 01:30:37,160 --> 01:30:39,940 by this formula compared to a non-clustered design. 2001 01:30:39,940 --> 01:30:43,880 So this shows you what this looks like, OK? 2002 01:30:43,880 --> 01:30:50,230 So suppose you had 100 kids per school, OK? 2003 01:30:50,230 --> 01:30:53,270 Suppose you had 100 kids per school and you randomized at 2004 01:30:53,270 --> 01:30:56,310 the school level rather than the individual level. 2005 01:30:56,310 --> 01:30:59,040 If your correlation coefficient was zero, it would 2006 01:30:59,040 --> 01:31:00,460 be the same as if we randomized at the individual 2007 01:31:00,460 --> 01:31:02,450 level because they're totally uncorrelated. 2008 01:31:02,450 --> 01:31:05,095 Suppose your correlation coefficient was 0.1-- 2009 01:31:05,095 --> 01:31:06,980 rho was 0.1.
2010 01:31:06,980 --> 01:31:11,200 Then the smallest effect size you could detect would be 3.3 2011 01:31:11,200 --> 01:31:15,120 times larger than if you had done an individual design. 2012 01:31:19,413 --> 01:31:22,860 So does that make sense how to interpret this? 2013 01:31:22,860 --> 01:31:27,150 And so this illustrates that, even with very mild 2014 01:31:27,150 --> 01:31:30,110 correlation coefficients-- and we saw examples of those math 2015 01:31:30,110 --> 01:31:31,610 test scores that were like 0.5. 2016 01:31:31,610 --> 01:31:34,300 This is only 0.1, but it already means, in some sense, 2017 01:31:34,300 --> 01:31:38,380 that your experiment can detect things-- 2018 01:31:41,180 --> 01:31:42,980 if you had been able to individually randomize, you 2019 01:31:42,980 --> 01:31:44,480 would be able to detect things that were three times as 2020 01:31:44,480 --> 01:31:47,100 small, right? 2021 01:31:47,100 --> 01:31:49,520 Now that's a combination of the fact that you have the 2022 01:31:49,520 --> 01:31:50,540 correlation coefficient and the number 2023 01:31:50,540 --> 01:31:51,930 of people per cluster. 2024 01:31:51,930 --> 01:31:56,634 AUDIENCE: Then in the previous slide, 0.5 does not mean half? 2025 01:31:56,634 --> 01:32:00,198 PROFESSOR: No, 0.5 is the correlation-- it's rho. 2026 01:32:00,198 --> 01:32:01,820 AUDIENCE: No, in the 2027 01:32:01,820 --> 01:32:02,495 PROFESSOR: It's rho. 2028 01:32:02,495 --> 01:32:04,600 AUDIENCE: Then it does not mean half of the difference-- 2029 01:32:04,600 --> 01:32:07,650 PROFESSOR: No, it's half of the variance. 2030 01:32:10,960 --> 01:32:11,930 Let me move on. 2031 01:32:11,930 --> 01:32:15,365 We can talk about the formula for that. 2032 01:32:15,365 --> 01:32:16,615 OK.
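The slide formula is not reproduced in the transcript, but the 3.3 figure in the example pins it down as the standard design-effect adjustment: with clusters of size m and intra-cluster correlation rho, the minimum detectable effect is inflated by the square root of 1 + (m - 1) * rho relative to individual randomization with the same total sample size. A one-line check reproduces the numbers:

```python
import math

def mde_inflation(m, rho):
    """Factor by which the minimum detectable effect grows when you
    randomize clusters of size m with intra-cluster correlation rho,
    relative to individual-level randomization of the same total sample."""
    return math.sqrt(1 + (m - 1) * rho)

print(mde_inflation(100, 0.0))            # 1.0 -- same as individual-level
print(round(mde_inflation(100, 0.1), 1))  # 3.3 -- the case from the example
```

Note how quickly this grows: even rho = 0.1 with 100 kids per school triples the smallest effect you can detect.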
2033 01:32:20,040 --> 01:32:21,420 So what this means is, if the experimental design is 2034 01:32:21,420 --> 01:32:24,720 clustered, we now not only need to consider all the other 2035 01:32:24,720 --> 01:32:26,200 factors we talked about before, we also need to 2036 01:32:26,200 --> 01:32:29,660 consider this factor rho when doing our power calculations. 2037 01:32:29,660 --> 01:32:32,360 And rho is yet another thing we can try to estimate based 2038 01:32:32,360 --> 01:32:35,845 on our little survey of our population to get a sense of 2039 01:32:35,845 --> 01:32:37,095 what this rho is likely to be. 2040 01:32:40,570 --> 01:32:46,150 And given this clustering issue, it's very important not 2041 01:32:46,150 --> 01:32:47,820 just that you have a big enough number of people 2042 01:32:47,820 --> 01:32:50,130 involved in your experiment, but that you randomize across 2043 01:32:50,130 --> 01:32:52,560 a big enough number of groups, right? 2044 01:32:52,560 --> 01:32:54,745 And the way I like to think about it is, how many times did 2045 01:32:54,745 --> 01:32:56,350 you flip the coin as to who should be treatment and who 2046 01:32:56,350 --> 01:32:57,600 should be control? 2047 01:33:00,830 --> 01:33:03,540 And, in fact, it's usually the case that the number of groups 2048 01:33:03,540 --> 01:33:07,430 you have is often more important than the total 2049 01:33:07,430 --> 01:33:11,660 number of individuals that you have because the individuals 2050 01:33:11,660 --> 01:33:17,090 are correlated within a group, OK? 2051 01:33:17,090 --> 01:33:18,340 So moving on. 2052 01:33:25,530 --> 01:33:26,890 So I'm going to flip through this. 2053 01:33:26,890 --> 01:33:28,050 This is mostly going over some of this if you were doing the 2054 01:33:28,050 --> 01:33:29,300 exercise quickly. 2055 01:33:33,161 --> 01:33:35,120 OK.
2056 01:33:35,120 --> 01:33:37,090 And so this chart-- 2057 01:33:41,860 --> 01:33:43,930 in your exercise shows you some of the tradeoffs that you 2058 01:33:43,930 --> 01:33:46,290 should think about when you're trying to decide how you 2059 01:33:46,290 --> 01:33:52,180 should trade off the number of groups you have versus the 2060 01:33:52,180 --> 01:33:54,460 number of people within a group, OK? 2061 01:33:54,460 --> 01:33:58,820 So in this particular case, a group was a gram panchayat, 2062 01:33:58,820 --> 01:34:01,740 and within a group there were villages, OK? 2063 01:34:01,740 --> 01:34:03,680 And there were different costs involved in doing these 2064 01:34:03,680 --> 01:34:04,390 different things, right? 2065 01:34:04,390 --> 01:34:07,670 So going to the place involved transportation costs to get to 2066 01:34:07,670 --> 01:34:08,980 the gram panchayat. 2067 01:34:08,980 --> 01:34:10,410 That, say, was a couple of days. 2068 01:34:10,410 --> 01:34:12,480 And then it took, like, half a day, say, for every village 2069 01:34:12,480 --> 01:34:13,620 you interviewed. 2070 01:34:13,620 --> 01:34:16,450 So that said, there's some cost of adding a new gram 2071 01:34:16,450 --> 01:34:18,840 panchayat, but also some marginal cost of adding 2072 01:34:18,840 --> 01:34:21,860 additional village per gram panchayat, OK? 2073 01:34:21,860 --> 01:34:25,750 So you could calculate, based on all your parameters and 2074 01:34:25,750 --> 01:34:29,360 power of 80% and whatever the intercluster correlation is in 2075 01:34:29,360 --> 01:34:31,710 this particular case, you could say, well, if we had 2076 01:34:31,710 --> 01:34:34,830 this many villages per gram panchayat, how many gram 2077 01:34:34,830 --> 01:34:36,550 panchayats would we need and how many villages 2078 01:34:36,550 --> 01:34:38,280 would we need, OK? 
2079 01:34:38,280 --> 01:34:39,460 So you can do this set of exercises 2080 01:34:39,460 --> 01:34:40,460 and you can say that-- 2081 01:34:40,460 --> 01:34:43,500 and you'll note, for example, that as we reduce the number 2082 01:34:43,500 --> 01:34:46,230 of gram panchayats we go to-- put another way, as we add more 2083 01:34:46,230 --> 01:34:49,040 villages per gram panchayat, the total number of villages 2084 01:34:49,040 --> 01:34:51,040 we need to survey goes up. 2085 01:34:51,040 --> 01:34:54,030 And in this particular case, it doesn't go up by that much 2086 01:34:54,030 --> 01:34:57,290 because the intercluster correlation is not that high. 2087 01:34:57,290 --> 01:34:59,170 And you could actually do this type of calculation and you 2088 01:34:59,170 --> 01:35:01,910 could say, well, I know what my costs are, right? 2089 01:35:01,910 --> 01:35:03,510 I know what my costs of going to this place are. 2090 01:35:03,510 --> 01:35:06,420 And I can calculate which of these designs is the cheapest 2091 01:35:06,420 --> 01:35:08,780 design given what I want to achieve. 2092 01:35:08,780 --> 01:35:13,080 The other thing is, in this case, the experiment was 2093 01:35:13,080 --> 01:35:14,340 happening everywhere and they were just trying 2094 01:35:14,340 --> 01:35:15,680 to design the survey. 2095 01:35:15,680 --> 01:35:17,690 But often when we're doing this, we also need to pay for 2096 01:35:17,690 --> 01:35:19,690 the intervention itself. 2097 01:35:19,690 --> 01:35:22,860 And, at least in a lot of the cases that I've worked with, 2098 01:35:22,860 --> 01:35:26,940 the cost of actually doing the intervention is much bigger 2099 01:35:26,940 --> 01:35:29,150 than the cost of doing the survey.
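That kind of cost comparison can be sketched in a few lines. All of the numbers below (the day counts and the two candidate designs) are hypothetical figures for illustration, not the values from the exercise; in practice, the candidate designs would first be matched for 80% power using the clustering formula.

```python
def survey_cost(n_gps, villages_per_gp,
                days_per_gp=2.0, days_per_village=0.5):
    """Total cost in surveyor-days: a fixed transportation cost to reach
    each gram panchayat, plus a marginal cost per village surveyed in it.
    The default day counts are made-up illustrative values."""
    return n_gps * (days_per_gp + villages_per_gp * days_per_village)

# Two hypothetical designs assumed to deliver the same power:
print(survey_cost(120, 2))  # 360.0 surveyor-days
print(survey_cost(80, 4))   # 320.0 surveyor-days -- cheaper in this example
```

With these illustrative numbers, fewer gram panchayats with more villages each comes out cheaper; with a higher intercluster correlation or a bigger fixed cost, the comparison could easily flip, which is exactly why the tradeoff has to be computed rather than assumed.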
2100 01:35:29,150 --> 01:35:33,460 And so, in that case, if you always have to treat every 2101 01:35:33,460 --> 01:35:37,670 village in the gram panchayat, you can actually save a ton of 2102 01:35:37,670 --> 01:35:39,670 money by going down in the number of gram panchayats and 2103 01:35:39,670 --> 01:35:41,930 surveying a lot more villages. 2104 01:35:41,930 --> 01:35:44,520 But the whole point is there are these tradeoffs and you 2105 01:35:44,520 --> 01:35:47,910 need to, in deciding how you're going to structure your 2106 01:35:47,910 --> 01:35:49,370 experiment and how you're going to structure your 2107 01:35:49,370 --> 01:35:51,560 survey, you need to think through what these tradeoffs 2108 01:35:51,560 --> 01:35:54,380 are, make sure you have enough power, given your estimates of 2109 01:35:54,380 --> 01:35:56,150 your intercluster correlation and sort of do the cost 2110 01:35:56,150 --> 01:35:58,560 minimizing thing. 2111 01:35:58,560 --> 01:36:01,320 OK, so in the last five minutes or so, let me just 2112 01:36:01,320 --> 01:36:04,880 highlight a couple of the other issues that come up in 2113 01:36:04,880 --> 01:36:07,210 thinking about power calculations. 2114 01:36:07,210 --> 01:36:10,970 So as I mentioned, the cluster design is one of the most 2115 01:36:10,970 --> 01:36:12,320 important ones. 2116 01:36:12,320 --> 01:36:14,070 And the key thing is making sure you have enough 2117 01:36:14,070 --> 01:36:17,070 independent groups, where you flip the coin to randomize 2118 01:36:17,070 --> 01:36:19,300 between treatment and control enough times. 2119 01:36:19,300 --> 01:36:21,300 Some other things that matter are baselines, control 2120 01:36:21,300 --> 01:36:23,190 variables, and the hypothesis being tested. 2121 01:36:23,190 --> 01:36:26,150 So one minute on each of those. 2122 01:36:26,150 --> 01:36:29,810 A baseline has two uses-- 2123 01:36:29,810 --> 01:36:31,330 main uses.
2124 01:36:31,330 --> 01:36:34,160 One use of a baseline is that it lets you check whether the 2125 01:36:34,160 --> 01:36:35,400 treatment and control group look the 2126 01:36:35,400 --> 01:36:37,780 same before you started. 2127 01:36:37,780 --> 01:36:40,750 And if you randomized properly, we know they should 2128 01:36:40,750 --> 01:36:41,630 look similar. 2129 01:36:41,630 --> 01:36:43,500 But you want to make sure that your randomization was 2130 01:36:43,500 --> 01:36:46,030 actually carried out the way it was supposed to be and that 2131 01:36:46,030 --> 01:36:48,970 it wasn't the case that people were pulling out of the hat 2132 01:36:48,970 --> 01:36:51,060 until they got a treatment or something, that they were 2133 01:36:51,060 --> 01:36:52,960 actually randomizing the way they were supposed to. 2134 01:36:52,960 --> 01:36:56,200 And having a baseline conducted before you start can 2135 01:36:56,200 --> 01:37:00,370 allow you to test that your randomization is actually 2136 01:37:00,370 --> 01:37:03,630 truly random and your groups look balanced. 2137 01:37:03,630 --> 01:37:06,070 The other thing is, the baseline can actually help 2138 01:37:06,070 --> 01:37:11,280 reduce the sample size you need, but it requires you 2139 01:37:11,280 --> 01:37:14,720 to do a survey before you start the intervention, right? 2140 01:37:14,720 --> 01:37:17,700 And the reason it can reduce your sample size is that now, 2141 01:37:17,700 --> 01:37:22,550 instead of just looking at, say, test scores across kids, 2142 01:37:22,550 --> 01:37:25,170 I can look at the change in test scores from before versus 2143 01:37:25,170 --> 01:37:27,460 after the experiment started.
2144 01:37:27,460 --> 01:37:29,450 And if people are really persistent, like if the people 2145 01:37:29,450 --> 01:37:31,120 who did really well on the test this year are likely to 2146 01:37:31,120 --> 01:37:33,270 do really well on the test next year, that can 2147 01:37:33,270 --> 01:37:37,710 essentially reduce the variance of your outcome. 2148 01:37:37,710 --> 01:37:43,100 It can be that the variance of the difference in test scores 2149 01:37:43,100 --> 01:37:45,550 can be a lot lower than the variance in test scores. 2150 01:37:45,550 --> 01:37:47,490 And having a baseline can help you for that reason. 2151 01:37:51,390 --> 01:37:54,570 And as this slide points out, your evaluation costs 2152 01:37:54,570 --> 01:37:57,910 basically double because you have to do two surveys, not 2153 01:37:57,910 --> 01:38:01,450 one survey, but the costs of the intervention go down 2154 01:38:01,450 --> 01:38:03,620 because you can have a slightly smaller sample. 2155 01:38:03,620 --> 01:38:06,420 So if your intervention is really expensive relative to 2156 01:38:06,420 --> 01:38:08,580 your survey, this can make a lot of sense. 2157 01:38:08,580 --> 01:38:10,020 If your survey is really expensive relative to your 2158 01:38:10,020 --> 01:38:13,820 intervention, you might not want to do this. 2159 01:38:13,820 --> 01:38:17,830 And to figure out how this is going to affect your power, 2160 01:38:17,830 --> 01:38:21,010 you need to know yet another fact, which is how correlated 2161 01:38:21,010 --> 01:38:24,600 are people's outcomes over time, right? 2162 01:38:24,600 --> 01:38:27,040 What's the correlation between how well I do on a test today 2163 01:38:27,040 --> 01:38:28,240 and how well I do on a test tomorrow? 2164 01:38:28,240 --> 01:38:29,770 And some things are really correlated and some things are 2165 01:38:29,770 --> 01:38:30,900 not that correlated.
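The gain from differencing can be made exact with the standard variance identity: if pre and post outcomes both have variance sigma squared and over-time correlation tau, then Var(post - pre) = 2 * sigma^2 * (1 - tau), which beats just using the endline (variance sigma^2) precisely when tau exceeds 0.5. A quick sketch, with illustrative tau values:

```python
def variance_of_difference(sigma2, tau):
    """Variance of (post - pre) when both outcomes have variance sigma2
    and correlation tau over time:
        Var(post - pre) = 2 * sigma2 * (1 - tau)."""
    return 2 * sigma2 * (1 - tau)

print(variance_of_difference(1.0, 0.75))  # 0.5 -- persistent outcome, big gain
print(variance_of_difference(1.0, 0.25))  # 1.5 -- worse than the endline alone
```

At tau = 0.5 the two approaches tie, which is the break-even point behind the advice that a baseline helps most for highly persistent outcomes.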
2166 01:38:30,900 --> 01:38:32,640 And a baseline really helps you on things that are really 2167 01:38:32,640 --> 01:38:33,890 correlated. 2168 01:38:38,570 --> 01:38:40,065 Another thing that can help you is stratification. 2169 01:38:42,740 --> 01:38:48,980 So what stratification can do is, stratification says, 2170 01:38:48,980 --> 01:38:50,150 suppose I-- 2171 01:38:50,150 --> 01:38:52,280 in some ways, it's conceptually a little bit like 2172 01:38:52,280 --> 01:38:54,750 a baseline, which is, suppose I know that all of the people 2173 01:38:54,750 --> 01:38:58,040 who live in one village tend to have similar outcomes and 2174 01:38:58,040 --> 01:38:59,430 all of the people who live in another village tend to have 2175 01:38:59,430 --> 01:38:59,880 similar outcomes, 2176 01:38:59,880 --> 01:39:01,780 and so on for each village 2177 01:39:01,780 --> 01:39:03,960 in the sample. 2178 01:39:03,960 --> 01:39:07,810 If I can then randomize within each village, I can compare the 2179 01:39:07,810 --> 01:39:11,090 people in each village to each other, OK? 2180 01:39:11,090 --> 01:39:14,480 So if I'm looking within village, if people in villages 2181 01:39:14,480 --> 01:39:16,910 tend to be similar and I can randomize within village, if I 2182 01:39:16,910 --> 01:39:18,820 look within villages, the difference between the 2183 01:39:18,820 --> 01:39:19,710 treatment and the control group is 2184 01:39:19,710 --> 01:39:21,650 going to be less noisy. 2185 01:39:21,650 --> 01:39:23,930 So stratifying is basically a way of saying I'm going to 2186 01:39:23,930 --> 01:39:27,270 make sure my sample is balanced across the treatment 2187 01:39:27,270 --> 01:39:29,620 and control groups within certain subgroups of the 2188 01:39:29,620 --> 01:39:30,680 population. 2189 01:39:30,680 --> 01:39:32,920 And then I'm going to compare within those subgroups of the 2190 01:39:32,920 --> 01:39:35,340 population when I do my analysis. 
2191 01:39:35,340 --> 01:39:36,630 And once again, we can think of this as a 2192 01:39:36,630 --> 01:39:38,120 way of reducing noise. 2193 01:39:38,120 --> 01:39:41,100 That if people in the same village tend to be similar, 2194 01:39:41,100 --> 01:39:43,250 if I only compare treatment and control within the same 2195 01:39:43,250 --> 01:39:48,160 village, the noise there is going to be smaller. 2196 01:39:48,160 --> 01:39:51,150 So in some sense, it's similar to having a baseline. 2197 01:39:51,150 --> 01:39:55,520 So some things we tend to stratify by, if we know the 2198 01:39:55,520 --> 01:39:57,660 baseline value of the outcome, we can sometimes stratify by 2199 01:39:57,660 --> 01:39:59,480 that because we know that the effects are going to be 2200 01:39:59,480 --> 01:40:02,720 similar for people who have very similar baseline values. 2201 01:40:02,720 --> 01:40:04,815 Or often, I think, we tend to 2202 01:40:04,815 --> 01:40:04,980 stratify geographically. 2203 01:40:04,980 --> 01:40:07,420 So basically, we think that people in certain areas tend 2204 01:40:07,420 --> 01:40:09,170 to be similar so we're going to make sure our treatments 2205 01:40:09,170 --> 01:40:11,760 and controls are balanced in those areas as a way of 2206 01:40:11,760 --> 01:40:13,010 reducing noise. 2207 01:40:19,400 --> 01:40:23,680 And the final thing we want to mention is the hypothesis 2208 01:40:23,680 --> 01:40:30,080 being tested, which is, the more things you want to test, 2209 01:40:30,080 --> 01:40:32,930 the bigger your sample is going to need to be. 2210 01:40:32,930 --> 01:40:35,480 So for example, are we interested in the difference 2211 01:40:35,480 --> 01:40:37,170 between two treatments as well as the 2212 01:40:37,170 --> 01:40:39,280 treatment versus control? 
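[Editor's illustration] The noise-reduction logic of stratifying by village can be shown in a small simulation. This is a hypothetical sketch, not from the lecture: the number of villages, the village-level spread, and the individual noise level are all invented numbers. Comparing within villages strips out the between-village variation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_villages, per_village = 200, 10                  # invented sizes
village_mean = rng.normal(0.0, 2.0, n_villages)    # villages differ a lot
village = np.repeat(np.arange(n_villages), per_village)
y = village_mean[village] + rng.normal(0.0, 1.0, len(village))

# Raw outcome variance mixes village differences with individual noise...
var_raw = y.var()
# ...but deviations from each village's own mean drop the village part.
means = np.array([y[village == v].mean() for v in range(n_villages)])
var_within = (y - means[village]).var()

print(round(var_raw, 1), round(var_within, 1))
```

Randomizing treatment and control within each village and estimating the effect from within-village contrasts gets you the much smaller second variance, which is the sense in which stratification makes the treatment-control comparison less noisy.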
2213 01:40:39,280 --> 01:40:41,740 If so, we need a much bigger sample because we not only 2214 01:40:41,740 --> 01:40:43,620 need to be able to tell the treatment versus the control 2215 01:40:43,620 --> 01:40:45,490 but we also need to be able to tell the two treatments from 2216 01:40:45,490 --> 01:40:48,110 each other, right? 2217 01:40:48,110 --> 01:40:50,600 So suppose you have two different treatments. 2218 01:40:50,600 --> 01:40:54,255 Are you interested in just the overall effect of each of the 2219 01:40:54,255 --> 01:40:54,760 two treatments? 2220 01:40:54,760 --> 01:40:55,890 Or are you interested in whether the treatments 2221 01:40:55,890 --> 01:40:58,900 interact, whether it produces a different effect if they happen 2222 01:40:58,900 --> 01:41:00,400 together, right? 2223 01:41:00,400 --> 01:41:03,120 The more things you're interested in, the bigger your 2224 01:41:03,120 --> 01:41:05,100 sample needs to be because you need to design your sample to 2225 01:41:05,100 --> 01:41:08,450 be big enough to answer each of these different questions. 2226 01:41:08,450 --> 01:41:10,740 Another example: suppose you were interested in testing 2227 01:41:10,740 --> 01:41:12,270 whether the effect is different in different 2228 01:41:12,270 --> 01:41:13,330 subpopulations. 2229 01:41:13,330 --> 01:41:15,440 Do you just want to know the average effect of your program 2230 01:41:15,440 --> 01:41:17,380 or do you want to know if it was different in rural areas 2231 01:41:17,380 --> 01:41:19,170 versus urban areas? 2232 01:41:19,170 --> 01:41:20,860 If you want to know if it's different in rural versus 2233 01:41:20,860 --> 01:41:22,500 urban areas, you're going to need a big enough sample in 2234 01:41:22,500 --> 01:41:24,846 rural areas and a big enough sample in urban areas that you 2235 01:41:24,846 --> 01:41:27,270 can compare the difference between them. 
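[Editor's illustration] To see why telling two treatments apart needs a much bigger sample, here is a rough sketch using the standard two-sample sample-size approximation (a textbook formula, not something derived in the lecture). The effect sizes of 0.2 and 0.1 standard deviations, the 5% two-sided significance level, and the 80% power target are all assumed numbers for illustration:

```python
from statistics import NormalDist

def n_per_arm(delta, sigma=1.0, alpha=0.05, power=0.80):
    """n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2 per arm."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return 2 * (z * sigma / delta) ** 2

# Treatment vs. control, assuming a 0.2 SD effect:
n_tc = n_per_arm(0.2)
# Treatment 1 vs. treatment 2, whose effects might differ by only 0.1 SD:
n_tt = n_per_arm(0.1)

print(round(n_tc), round(n_tt))
```

Because n scales with 1/delta squared, detecting a gap between two treatments that is half the size of the treatment-control effect requires four times as many observations in each arm of that comparison, which is why adding arms inflates the experiment so quickly.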
2236 01:41:27,270 --> 01:41:34,550 So the more different things that you want to test, 2237 01:41:34,550 --> 01:41:36,840 obviously, the bigger your experiment's 2238 01:41:36,840 --> 01:41:37,760 going to need to be. 2239 01:41:37,760 --> 01:41:39,400 And a lot of times, in actually designing the 2240 01:41:39,400 --> 01:41:41,660 experiment, this is something that comes up all the time, 2241 01:41:41,660 --> 01:41:44,250 that you will very quickly figure out that the number of 2242 01:41:44,250 --> 01:41:46,800 questions you would like to answer is far bigger than the 2243 01:41:46,800 --> 01:41:49,270 sample size you can afford. 2244 01:41:49,270 --> 01:41:52,110 And one of the really important conversations you 2245 01:41:52,110 --> 01:41:53,160 need to have as you're starting to design an 2246 01:41:53,160 --> 01:41:57,490 experiment is, which are the really critical questions that 2247 01:41:57,490 --> 01:41:59,690 I really need to know the answer to? 2248 01:41:59,690 --> 01:42:02,210 So for example, in a project I was recently doing in 2249 01:42:02,210 --> 01:42:05,020 Indonesia, it turned out that the government really wanted 2250 01:42:05,020 --> 01:42:06,870 to know whether this program would work differently in 2251 01:42:06,870 --> 01:42:10,200 urban versus rural areas because they had a view that 2252 01:42:10,200 --> 01:42:11,210 urban areas are really different. 2253 01:42:11,210 --> 01:42:13,320 And they were willing to do different programs in urban 2254 01:42:13,320 --> 01:42:14,220 versus rural areas. 2255 01:42:14,220 --> 01:42:16,810 So we designed our whole sample to make sure we had 2256 01:42:16,810 --> 01:42:21,110 enough sampled in urban areas and in rural areas that we 2257 01:42:21,110 --> 01:42:22,550 could test those two things apart. 
2258 01:42:22,550 --> 01:42:24,650 That almost doubled the size of the experiment, but the 2259 01:42:24,650 --> 01:42:26,580 government thought that was important enough that they 2260 01:42:26,580 --> 01:42:29,410 really wanted to do that. 2261 01:42:29,410 --> 01:42:32,200 The point here is that-- 2262 01:42:32,200 --> 01:42:33,930 that was the one they wanted to focus on. 2263 01:42:33,930 --> 01:42:34,690 There were a million other things we 2264 01:42:34,690 --> 01:42:35,780 could have done instead. 2265 01:42:35,780 --> 01:42:39,670 And so it's really important to think about, before you 2266 01:42:39,670 --> 01:42:42,430 design the experiment, what the few key things you want to 2267 01:42:42,430 --> 01:42:45,530 test are because, as I said, you're never going to have 2268 01:42:45,530 --> 01:42:48,035 enough money to test all the things you want. 2269 01:42:48,035 --> 01:42:51,390 That's sort of a universal truth. 2270 01:42:51,390 --> 01:42:53,360 So just to conclude, we've talked about in this lecture-- 2271 01:42:59,270 --> 01:43:02,050 going back to the basic statistics of how you're going 2272 01:43:02,050 --> 01:43:06,030 to analyze the experiment, thinking about how noisy your 2273 01:43:06,030 --> 01:43:08,486 outcome is going to be and how you're going to compute your 2274 01:43:08,486 --> 01:43:10,230 confidence intervals, how big your effect 2275 01:43:10,230 --> 01:43:11,900 size is going to be. 2276 01:43:11,900 --> 01:43:15,140 That's what goes into doing a power calculation. 2277 01:43:15,140 --> 01:43:16,870 You also need to do some guess work, right? 2278 01:43:16,870 --> 01:43:19,330 The power calculation is going to require you to estimate 2279 01:43:19,330 --> 01:43:21,680 how much variance 2280 01:43:21,680 --> 01:43:25,130 there's going to be, what your 2281 01:43:25,130 --> 01:43:25,980 effect size is going to be. 2282 01:43:25,980 --> 01:43:28,410 You have to make some assumptions. 
2283 01:43:28,410 --> 01:43:31,400 And a little bit of pilot testing before the experiment 2284 01:43:31,400 --> 01:43:33,680 begins can be really useful, I think, mostly 2285 01:43:33,680 --> 01:43:35,730 because just collecting 2286 01:43:35,730 --> 01:43:37,380 some data can help you 2287 01:43:37,380 --> 01:43:38,630 estimate these variances. 2288 01:43:41,040 --> 01:43:43,350 The power calculations can help you think about this 2289 01:43:43,350 --> 01:43:45,380 question of how many treatments you can afford to 2290 01:43:45,380 --> 01:43:50,720 have, and can I afford to do three different versions of 2291 01:43:50,720 --> 01:43:54,210 the program or do I really need to just pick one or two? 2292 01:43:54,210 --> 01:43:56,880 How do I make this tradeoff of more clusters versus more 2293 01:43:56,880 --> 01:43:57,830 observations per cluster? 2294 01:43:57,830 --> 01:44:00,390 The power calculation can be very helpful here. 2295 01:44:00,390 --> 01:44:01,910 And the other thing, in some sense the place I find 2296 01:44:01,910 --> 01:44:04,050 power calculations the most useful: because 2297 01:44:04,050 --> 01:44:06,430 there is a bit of guesswork in power calculations, you only get 2298 01:44:06,430 --> 01:44:07,750 rough rules of thumb. 2299 01:44:07,750 --> 01:44:09,450 You don't get precise answers because it depends on the 2300 01:44:09,450 --> 01:44:09,960 assumptions. 2301 01:44:09,960 --> 01:44:12,580 But what I find them really useful for is telling whether this is 2302 01:44:12,580 --> 01:44:14,090 feasible or not, right? 
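[Editor's illustration] The clusters-versus-observations tradeoff mentioned above is commonly sketched with the design effect 1 + (m - 1) * ICC, where m is the number of observations per cluster and ICC is the intra-cluster correlation. This is a standard approximation rather than a formula from the lecture, and the numbers below (an ICC of 0.1 and 1,000 kids total) are made up for illustration:

```python
def effective_n(n_clusters, per_cluster, icc):
    """Effective sample size after dividing by the design effect 1 + (m - 1) * icc."""
    total = n_clusters * per_cluster
    return total / (1 + (per_cluster - 1) * icc)

# Same 1,000 kids, split two different ways, with an assumed ICC of 0.1:
few_big = effective_n(50, 20, 0.10)      # 50 villages of 20 kids
many_small = effective_n(100, 10, 0.10)  # 100 villages of 10 kids

print(round(few_big), round(many_small))
```

With any positive intra-cluster correlation, the same total survey effort yields more effective observations when spread across more clusters, which is why the relative cost of adding a cluster versus adding a respondent within a cluster drives this design choice.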
2303 01:44:14,090 --> 01:44:17,790 Is this something where I'm kind of in the right range 2304 01:44:17,790 --> 01:44:20,630 where I think I can get estimates, or where there's no 2305 01:44:20,630 --> 01:44:23,000 chance, no matter how successful this program is, 2306 01:44:23,000 --> 01:44:25,900 that I'm going to be able to pick it up in my data because 2307 01:44:25,900 --> 01:44:28,730 the variable is just way too noisy? 2308 01:44:28,730 --> 01:44:31,210 And it's really important that you do the power calculation, 2309 01:44:31,210 --> 01:44:32,750 both 2310 01:44:32,750 --> 01:44:35,200 for structuring how to design the experiment, but 2311 01:44:35,200 --> 01:44:37,450 particularly to make sure you're not going to waste a 2312 01:44:37,450 --> 01:44:38,740 lot of time and money doing something where you're going 2313 01:44:38,740 --> 01:44:40,820 to have no hope of picking it up. 2314 01:44:44,220 --> 01:44:46,720 Because a study which is underpowered is going to waste 2315 01:44:46,720 --> 01:44:49,210 a lot of everyone's time and be 2316 01:44:49,210 --> 01:44:50,970 very frustrating for everyone involved. 2317 01:44:50,970 --> 01:44:53,440 So you want to make sure you do this right before you start 2318 01:44:53,440 --> 01:44:54,940 because otherwise, you're going to end up spending a lot 2319 01:44:54,940 --> 01:44:58,150 of time, money, and effort on an experiment and ending up 2320 01:44:58,150 --> 01:45:01,060 not being able to conclude much of anything. 2321 01:45:01,060 --> 01:45:01,735 OK. 2322 01:45:01,735 --> 01:45:02,985 Thanks very much.