FEMALE SPEAKER: The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, so my name is Ben Olken, and we're going to be talking about how to think about sample size for randomized evaluations. And more generally, the point of this lecture is not just sample size. We've spent a lot of time, in the last lecture for example, thinking about the data we're going to collect. Then the question is, well, what are we going to do with that data? So it's about sample size, but also more generally, we're going to talk about how we analyze data in the context of an experiment.

OK. So as I said, where we're going to end up at the end of this lecture is: how big a sample do we need? But in order to think about how big a sample we need, we need to understand a little more about how we actually analyze this data.
When we say, how large does a sample need to be to credibly detect a given treatment effect, we're going to need to be a little more precise about what we mean by "credibly," and in particular think a little bit about the statistics involved in understanding these experiments. And particularly, when we say something is credibly different, what we mean is that we can be reasonably sure -- and I'll be a little more precise about what we mean by that -- that the difference between the two groups, the treatment and the control group, didn't just occur by random chance, right? That there's really something that we'll call statistically significantly different between these two groups, OK? And when we think about randomizing -- we've talked about which groups get the treatment and which get the control -- that's going to mean that we expect the two groups to be similar if there were no treatment effect, because the only difference between them is that they were randomized. But there's going to be some variation in the outcomes between the two different groups, OK? And so randomization is going to remove the bias.
It's going to mean that we expect the two different groups to be the same, but there still could be noise. So in some sense, another way of thinking about this lecture is that this lecture is all about the noise: how big a sample do we need for the noise to be sufficiently small for us to actually credibly detect the differences between the two different groups, OK? So that's what we're going to talk about -- basically, how large is large, so we can get rid of the noise? And let me say, by the way, that we've got an hour and a half, but you should feel free to interrupt with questions if I say something that's not clear, because there's a lot of material that we're going to be going through pretty quickly.

OK. So when we think about how big our sample needs to be -- remember, the whole point is, how big does our sample have to be to remove the noise that's going to be in our data? And when we think about that, we think essentially about how noisy our data is, right? So how big a sample we need is going to be determined by how noisy the data is and also how big an effect we're looking for, right?
So if the data is really noisy but the effect is enormous, then we don't need as big a sample. But if the effect we're looking for is really small relative to the noise in the data, we're going to need a bigger sample. So really, it's the comparison between the effect size and how noisy the data is -- the ratio between these things -- that's important.

Other factors that we're going to talk about are: did we do a baseline survey before we started? Because a baseline can essentially help us reduce the noise, in some sense. We're going to talk about whether individual responses are correlated with each other. So for example, if we were to randomize a whole group of people into a given treatment, that group might be similar in lots of other respects. So you can't really count that whole group as if they were all independent observations, because they might be correlated. For example, you all just took my lecture. So if you all were put in the same treatment group, you all were exposed to the treatment, but you also all were exposed to my lecture, and so you're not necessarily independent observations.
And there are some other issues in terms of the design of the experiment that can affect sample size as well, like stratification, control variables, baseline data, et cetera, which we're going to talk about, OK?

So the way we're going to go in this lecture is, I'm going to start off with some basics about what it means to test a hypothesis statistically. And then when we get into hypothesis testing, there are two different types of errors that we're going to talk about. They're helpfully named type I and type II errors. And you have to be careful not to make a type III error, which is to confuse a type I and a type II error. So we'll talk about what those are. Then we'll talk about standard errors and significance, which is how we think more formally about these different types of errors. We'll talk about power. We'll talk about the effect size. And then, finally, the factors that influence power, OK? So this is all the stuff we're going to go through, all right?
So in order to understand the basic concepts of hypothesis testing, we need to think a little about probabilities, OK? Because all of this comes down, essentially, to some basic analysis about probability. And the intuition here is that the more observations we get, the better we can understand whether something we observed was due to a real difference in the underlying process or whether it was just random chance. So consider the following example. Suppose you're faced with a professional gambler who told you that she could get heads most of the time. OK, so you might think this is a reasonable claim or an unreasonable claim, but this is what she's claiming, and you want to see if it's true. So she tosses the coin and gets heads, right? So can we learn anything from that? Well, probably not, because anyone, even with a fair coin, would get heads 50% of the time. So we really can't infer anything from this one toss.
What if you saw her toss it five times and get heads, heads, tails, heads, heads? Well, can you infer anything from that? Well, maybe. You can start to say, well, this seems less likely to have occurred just by random chance. But there are only five tosses. What's the chance that someone with a fair coin would get four heads out of five? Well, we could calculate that if we knew the probabilities. And it's certainly not impossible that this could occur, right? And now, what if there were 20 tosses, right? Well, now you're starting to get information, although in this particular example, it was closer to 50-50. So now you have 12 heads versus eight tails. Could that have occurred by random chance? Well, maybe it could have, right? Because it's pretty close to 50-50. And now, suppose you had 100 tosses, or suppose you had 1,000 tosses with 609 heads and 391 tails, right? So as you're getting more and more data, you're much more likely to be able to say something is meaningful. So if you saw the 12-versus-eight data, for example, the odds that it could occur by random chance are pretty high.
But if you saw 609 heads and 391 tails out of 1,000 tosses, it's actually pretty unlikely that this would occur just by random chance, OK? And so this shows you that as you get more data, you can actually ask, how likely was this outcome to have occurred by random chance? And the more data you have, the more likely you're going to be able to conclude that the difference you observed was actually due to something the person was doing and not just due to what would happen randomly. And in some sense, all of statistics is basically this intuition: you take the data you observe and you calculate, what is the chance that the data I observed could have occurred just by random chance? And if that chance is really small, then you say, well, it must be that your program actually had an effect, OK? Does that make sense? That's the basic idea, essentially, of all of statistics: what's the probability that this thing could have happened randomly? And if it's unlikely, then probably there was something else going on.
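That calculation can be done exactly for the coin examples. Here is a minimal sketch (not from the lecture; the helper name is mine) that computes the chance a fair coin produces at least as many heads as observed:

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at
    least k heads in n tosses of a coin with heads-probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 12 heads in 20 tosses of a fair coin: entirely plausible by chance.
print(prob_at_least(12, 20))    # about 0.25

# 609 heads in 1,000 tosses: essentially impossible for a fair coin.
print(prob_at_least(609, 1000))
```

So a fair coin produces the 12-versus-eight split roughly a quarter of the time, while the 609-versus-391 split essentially never happens by chance -- exactly the intuition above, made quantitative.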
Here's another example. Suppose you have a second gambler who had 1,000 tosses, with 530 heads and 470 tails. That's really a lot of data. But what we can learn from this data depends on what hypothesis we're interested in. So if the gambler claimed she obtained heads 70% of the time, we could probably say, no, I don't think so, right? This is enough data that the odds you would get this data pattern if you really got heads 70% of the time are really, really small, right? So we could say, I can reject this claim. But suppose she claimed she could get heads 54% of the time, OK? And you observe she got heads 53% of the time. Well, you probably couldn't reject that claim, right? Because 53% is similar enough to 54% that if 54% were the truth, this data could have occurred by random chance. So in some sense, what we can say based on the data depends on how far the data is from our hypothesis and how much data we have. Does that make sense as some basic intuition? OK. So how do we apply this to an experiment?
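Before getting to experiments, it may help to see those two claims checked numerically. A sketch using the same exact-binomial logic (the 530-out-of-1,000 data and the 70% and 54% claims come from the example above):

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Observed: 530 heads in 1,000 tosses.
# If the gambler really got heads 70% of the time, seeing 530 or
# fewer heads would be astronomically unlikely -> reject that claim.
print(prob_at_most(530, 1000, 0.70))

# If she really got heads 54% of the time, 530 or fewer heads
# happens routinely -> we cannot reject that claim.
print(prob_at_most(530, 1000, 0.54))   # roughly 0.27
```

Same data, two different verdicts: what you can reject depends on which hypothesis you test it against.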
Well, at the end of the experiment, what we're going to do is compare the two different groups -- the treatment and the control group. We're going to take a look at the averages, just like we were doing in the gambling example: we'll compare the average in the treatment group and the average in the control group, OK? And the difference is the effect size. So for example, in the Panchayat case, you'd look at the mean number of wells in the villages with the female leaders versus the mean number of wells in the villages with the male leaders, OK? So that's, in some sense, our estimate of how big the difference is. And the question is going to be, how likely would we have been to observe this difference between the treatment and the control group if it was just due to random chance, OK? And that's what we need the statistics to figure out. So where does the noise come from?
In some sense, we're not going to observe an infinite number of villages, or all possible villages. In fact, even if we observed all the villages that exist, we're not going to observe all of the villages that could have hypothetically existed if the villages were replicated millions and millions of times. We're just going to observe some finite number of villages. And so we're going to estimate this mean by computing the mean in the villages that we observed, OK? And if there are very few villages, the mean that we calculate is going to be imprecise, because if you took a different sample of villages, you would get a slightly different mean, OK? If you sampled an infinite number of villages, you'd get the same thing every time. But suppose you only sampled one village. Or suppose there were a million villages out there and you sampled two, right? And you took the average, OK? If you sampled a different two villages, just by random chance, you would get a different average.
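That sampling story is easy to simulate. A sketch with made-up numbers (not the lecture's data): draw villages from a hypothetical population whose true mean number of wells is 50, and watch how much the sample mean wobbles for small versus large samples.

```python
import random

random.seed(0)

# A hypothetical population of 100,000 villages; the true mean
# number of wells per village is 50.
population = [random.gauss(50, 10) for _ in range(100_000)]

def sample_mean(n):
    """Average wells in a random sample of n villages."""
    return sum(random.sample(population, n)) / n

# Sampling just 2 villages: the estimate bounces around a lot.
print([round(sample_mean(2), 1) for _ in range(5)])

# Sampling 1,000 villages: the estimate barely moves from 50.
print([round(sample_mean(1000), 1) for _ in range(5)])
```

Each call to `sample_mean` is one hypothetical study; the spread across repeated calls is exactly the noise being described here.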
And that's where part of the noise in our data is coming from.

So in some sense, what we need to know is -- it goes back to the same idea as before -- if these two groups were the same and I sampled them, what are the chances I would get the difference that I observed by random chance? So for example, suppose you observed these two distributions, OK? So this is your control group and this is your treatment group. Now you can see there is some noise in the data, right? This one has a mean of 50 and this one has a mean of 60. And these are histograms, right? So this is the distribution of the number of villages that you observed for each possible outcome. So you can see here that there's some noise, right? It's not that everyone here was exactly 50 and everyone here was exactly 60. Some people were 45. Some were 55 or whatever.
But if you look at these two distributions, you could say it's pretty unlikely, if they were actually drawn from the same distribution of villages, that all of the blue ones would be over here and all the yellow ones would be over here. It's very unlikely that if these were actually the same and you drew randomly, you'd get this real bifurcation of the villages, OK? And what are we basing that conclusion on? We're basing it on the fact that there's not a lot of overlap, in some sense, between these two groups. But now, what if you saw this picture, right? What would you be able to conclude? Well, it's a little less clear. The means are still the same. The yellows still have an average of 60 and the blues have an average of 50. But there's a lot more overlap between them. Now if we look at this, we can sort of eyeball it and say, well, there's really a pretty big difference even relative to the distributions there. So maybe we could conclude that they were really different. Maybe not. And what if we saw this, right? These are still the same means.
The yellows have a mean of 60 and the blues have a mean of 50. But now they're so interspersed that it's harder to know. It's possible, if you saw pictures like this, you would say, well, yes, the yellows are higher, but maybe this was just due to random chance, OK? So the purpose of these graphs is to show you that in all three cases, we saw the same difference in the mean outcomes -- it was 60 versus 50, right? But when you saw the first graph, it was quite clear that these two groups were really different. When you saw the last graph, it was much harder to figure out whether these two were really different or whether this was just due to random chance, OK? Does that make sense of where we're going? And so, just to come back to the same theme, all the statistics are going to do in our case is help us figure out: given the distribution of data we have, how likely is it that the difference we observed could have happened by random chance? And so intuitively, we can look at this one and say, definitely different. And this one, maybe not sure.
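That eyeballing is what a formal test automates: it scales the gap in means by the spread of the data. Here is a sketch (a standard unpaired two-sample t-statistic, not the lecture's notation, with hypothetical numbers) using the same 60-versus-50 gap at two different noise levels:

```python
import random

def t_statistic(a, b):
    """Unpaired two-sample t-statistic (equal-variance form):
    the difference in means divided by its estimated noise."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5  # standard error of the gap
    return (ma - mb) / se

random.seed(1)
# Tight distributions (like the first picture): small spread around 60 and 50.
tight = t_statistic([random.gauss(60, 5) for _ in range(100)],
                    [random.gauss(50, 5) for _ in range(100)])
# Interspersed distributions (like the last picture): same means, big spread.
wide = t_statistic([random.gauss(60, 30) for _ in range(100)],
                   [random.gauss(50, 30) for _ in range(100)])
print(round(tight, 1), round(wide, 1))
```

Same 10-unit gap in both cases, but the tight case yields a t-statistic far above the usual significance cutoff of about 2, while the interspersed case is far less decisive.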
But if we want to be a little more precise about that, that's where we need the statistics.

AUDIENCE: Is the sample size the same in both examples?

PROFESSOR: Yeah, the sample size is exactly the same. You can see that the bars go down because it's more spread out.

All right. So in some sense, what are the ingredients that we've talked about in terms of thinking about whether you have a statistically significant difference? If you think back to the gambler example, we talked about how the sample size matters, right? So if we saw 1,000 tosses, we had much more precision about our estimates than if we had 10 tosses or five tosses. The hypothesis you're testing matters, right? Because the smaller the effect size we're trying to detect, the more tosses we need in the gambler example. If you're trying to detect a really small difference, you need a ton of data, whereas if you're trying to detect really extreme differences, you can do it with less data, OK? And the third thing we saw is that the variability of the outcome matters, right?
So the more noisy the outcome is, the harder it is to know whether the differences that we observe are due just to random chance or really due to some difference between the treatment and the control group.

OK, so does this make sense? These are the three ingredients that we're going to be playing with. Do these make sense? Do you have questions on this? OK.

So you may have heard of a confidence interval. How many of you guys have heard of a confidence interval? OK. How many of you can state the definition of a confidence interval? Thanks, Dan. I'm glad that you can. So what do we mean when we say confidence interval? Let's just go through what's on the slide and then we can talk about it a little more. So we're going to measure, say, 100 people, and we're going to come up with an average length of 53 centimeters. So we want to be able to say something about how precise our estimate is. So we say the average is 53 centimeters.
How confident are we, or how precise is our estimate, that it's 53 centimeters? That's what a confidence interval is trying to say. So a confidence interval of 50 to 56 tells us that with 95% probability, the true average length lies between 50 and 56. And the precise definition is that if you had a hypothesis that the true average length was any value in this range, you could not reject that hypothesis, at the 95% level, given the data that you observed, OK? A converse way of saying it is that you can be 95% certain that the truth is somewhere within this range. So if you did 20 of these tests, only one out of 20 times would the truth be outside your confidence interval, OK? And so an approximate interpretation of a confidence interval is -- so we have the point estimate of 53, but we have some uncertainty about that estimate. We think the average is 53, but there's some uncertainty.
421 00:19:27,790 --> 00:19:32,210 And the confidence interval says, well, it's 95% likely 422 00:19:32,210 --> 00:19:35,755 that the true answer is between 50 and 56, if that was 423 00:19:35,755 --> 00:19:41,300 the confidence interval, OK? 424 00:19:41,300 --> 00:19:45,730 So why is that useful for us? 425 00:19:45,730 --> 00:19:47,700 Well, our goal is to figure out-- 426 00:19:47,700 --> 00:19:49,280 we don't care, actually, what our estimate of the 427 00:19:49,280 --> 00:19:50,080 program's effect is. 428 00:19:50,080 --> 00:19:51,990 We care what the true effect of a program is, right? 429 00:19:51,990 --> 00:19:52,880 So we did some intervention. 430 00:19:52,880 --> 00:19:54,540 Like, for example, we had a female Panchayat leader 431 00:19:54,540 --> 00:19:56,870 instead of a male Panchayat leader and we want to figure 432 00:19:56,870 --> 00:20:02,740 out what the actual difference that that intervention made is 433 00:20:02,740 --> 00:20:04,230 in the world. 434 00:20:04,230 --> 00:20:08,930 We're going to observe some sample of Panchayats and we'll 435 00:20:08,930 --> 00:20:11,500 look at the difference in that sample. 436 00:20:11,500 --> 00:20:13,490 And we want to know how much can we learn about the true 437 00:20:13,490 --> 00:20:15,650 program effect from what we estimated. 438 00:20:15,650 --> 00:20:17,580 And the confidence interval basically tells us that with 439 00:20:17,580 --> 00:20:21,310 95% probability, the true program effect is somewhere in 440 00:20:21,310 --> 00:20:24,020 the confidence interval, OK? 441 00:20:24,020 --> 00:20:25,270 Does that make sense? 442 00:20:29,790 --> 00:20:33,020 How many of you guys have heard of the standard error? 443 00:20:33,020 --> 00:20:33,930 OK. 
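The slide's 53-centimeter example can be sketched in a few lines of Python. This is not from the lecture: the sample below is made up, and the interval uses the usual normal approximation (mean plus or minus roughly 1.96 standard errors).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Approximate CI for the mean of `sample`, assuming the sample
    mean is roughly normally distributed (reasonable for large n)."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / sqrt(n)               # uncertainty of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # ~1.96 for level=0.95
    return m - z * se, m + z * se

# Hypothetical sample of 100 lengths spread between 50 and 56 cm.
lengths = [53 + ((i * 37) % 21 - 10) * 0.3 for i in range(100)]
lo, hi = mean_confidence_interval(lengths)
```

With 100 observations the interval is narrow; shrink the sample and the same code produces a wider, less precise interval.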
444 00:20:33,930 --> 00:20:38,620 So a standard error is related to the confidence interval in 445 00:20:38,620 --> 00:20:45,430 that a standard error says that if we have some estimate, 446 00:20:45,430 --> 00:20:49,840 you could imagine that if we did the experiment again, 447 00:20:49,840 --> 00:20:52,610 essentially, with a new sample of people that looked like the 448 00:20:52,610 --> 00:20:57,830 original sample of people, we might get a slightly different 449 00:20:57,830 --> 00:20:59,360 point estimate because it's a different sample. 450 00:21:02,680 --> 00:21:06,470 The standard error basically says, what's the distribution 451 00:21:06,470 --> 00:21:10,810 of those possible estimates that you could get, OK? 452 00:21:10,810 --> 00:21:13,640 So it says that basically, if I did this experiment again, 453 00:21:13,640 --> 00:21:15,810 maybe I wouldn't get 53, I'd get 54. 454 00:21:15,810 --> 00:21:17,380 If I did it again, maybe I'd get 52. 455 00:21:17,380 --> 00:21:19,020 If I did it again, I might get 53. 456 00:21:19,020 --> 00:21:21,220 The standard error is essentially the standard 457 00:21:21,220 --> 00:21:25,250 deviation of those possible estimates that you could get. 458 00:21:25,250 --> 00:21:27,880 What that means in practice is that-- 459 00:21:31,870 --> 00:21:33,080 well, in practice, the standard error is very related 460 00:21:33,080 --> 00:21:34,340 to the confidence interval. 461 00:21:34,340 --> 00:21:38,150 And basically, a good rule of thumb is that a 95% confidence 462 00:21:38,150 --> 00:21:40,330 interval is about two standard errors. 463 00:21:40,330 --> 00:21:43,400 So if you ever see an estimate of the standard error, you can 464 00:21:43,400 --> 00:21:45,180 calculate the confidence interval, essentially, by 465 00:21:45,180 --> 00:21:51,420 going up or down two standard errors from the point 466 00:21:51,420 --> 00:21:54,020 estimate, OK? 
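The "run the experiment again" thought experiment is easy to simulate. This sketch is my own illustration, with a made-up population (mean 53, standard deviation 3): draw the same experiment many times, record each sample mean, and check that the spread of those estimates matches the textbook standard-error formula sd/sqrt(n).

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)
POP_MEAN, POP_SD, N = 53.0, 3.0, 100

# Repeat the "same" experiment 2,000 times, keeping each sample mean.
estimates = [
    mean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(2000)
]

# The standard error is the standard deviation of those estimates;
# theory says it should be close to POP_SD / sqrt(N) = 0.3 here.
empirical_se = stdev(estimates)
theoretical_se = POP_SD / sqrt(N)
```

Each replication gives a slightly different mean (54, 52, 53, ...), just as in the lecture's example; the standard error summarizes how much they scatter.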
467 00:21:54,020 --> 00:21:56,250 And the confidence interval and standard error, 468 00:21:56,250 --> 00:21:57,980 essentially, are capturing the same thing. 469 00:21:57,980 --> 00:22:00,190 They're both capturing-- 470 00:22:00,190 --> 00:22:03,080 when I said we need statistics to basically compute, how 471 00:22:03,080 --> 00:22:05,420 likely is it that we would get these differences by random 472 00:22:05,420 --> 00:22:07,930 chance, those are all coming out in the standard error and 473 00:22:07,930 --> 00:22:09,060 the confidence interval, right? 474 00:22:09,060 --> 00:22:12,630 They're computed by both looking at how noisy our data 475 00:22:12,630 --> 00:22:17,710 is, which is the variability of the outcome, and how big 476 00:22:17,710 --> 00:22:19,410 our sample is, right? 477 00:22:19,410 --> 00:22:22,450 Because from these two things, you can basically calculate 478 00:22:22,450 --> 00:22:26,570 how uncertain your estimate would be. 479 00:22:26,570 --> 00:22:29,800 This is a lot of terminology very quickly, but does this 480 00:22:29,800 --> 00:22:32,180 all make sense? 481 00:22:32,180 --> 00:22:35,380 Any questions on this? 482 00:22:35,380 --> 00:22:36,630 OK. 483 00:22:38,810 --> 00:22:41,550 So for example. 484 00:22:41,550 --> 00:22:46,005 So suppose we saw the sampled women Pradhans had 7.13 years 485 00:22:46,005 --> 00:22:51,200 of education and the men had 9.92 years of education, OK? 486 00:22:51,200 --> 00:22:56,410 And you want to know, is the truth that men have more 487 00:22:56,410 --> 00:22:58,860 education than women or is this just a random artifact of 488 00:22:58,860 --> 00:23:00,980 our sample? 489 00:23:00,980 --> 00:23:06,640 So suppose you calculated that the difference was 2.59. 490 00:23:06,640 --> 00:23:07,880 That's easy to calculate. 
491 00:23:07,880 --> 00:23:11,520 And the standard error was 0.54, where the standard error 492 00:23:11,520 --> 00:23:13,650 is calculated based on both how much data 493 00:23:13,650 --> 00:23:17,250 you had and how noisy the data was. 494 00:23:17,250 --> 00:23:19,660 You would compute that the 95% confidence interval is between 495 00:23:19,660 --> 00:23:22,620 1.53 and 3.64, OK? 496 00:23:22,620 --> 00:23:24,960 So this means that with 95% probability, the true 497 00:23:24,960 --> 00:23:27,570 difference in education between men and women is 498 00:23:27,570 --> 00:23:30,300 between 1.53 and 3.64. 499 00:23:30,300 --> 00:23:33,870 So if you were interested in testing the hypothesis that, 500 00:23:33,870 --> 00:23:38,570 in fact, men and women are the same in education, you could 501 00:23:38,570 --> 00:23:40,860 say that I can reject that hypothesis. 502 00:23:40,860 --> 00:23:43,285 With 95% probability, the true difference is 503 00:23:43,285 --> 00:23:45,360 between 1.53 and 3.64-- 504 00:23:45,360 --> 00:23:49,040 so zero is not in this confidence interval, right? 505 00:23:49,040 --> 00:23:52,910 So we can reject the hypothesis that there's no 506 00:23:52,910 --> 00:23:55,730 difference between these two groups. 507 00:23:55,730 --> 00:23:58,130 Does that make sense? 508 00:23:58,130 --> 00:23:59,600 So let's do another example. 509 00:23:59,600 --> 00:24:02,410 So in this example, suppose that we saw that control 510 00:24:02,410 --> 00:24:04,790 children had an average test score of 2.45 and the 511 00:24:04,790 --> 00:24:07,350 treatment had an average test score of 2.5. 512 00:24:07,350 --> 00:24:10,060 So we saw a difference of 0.05 and the standard 513 00:24:10,060 --> 00:24:13,530 error was 0.26, OK? 514 00:24:13,530 --> 00:24:15,620 So in this case, you would say well, the 95% confidence 515 00:24:15,620 --> 00:24:18,690 interval is minus 0.55. 516 00:24:18,690 --> 00:24:20,710 This is approximately two. 
517 00:24:20,710 --> 00:24:21,820 It's not exactly two. 518 00:24:21,820 --> 00:24:22,630 Minus 0.55-- 519 00:24:22,630 --> 00:24:24,140 oh, no, it is exactly two in this example. 520 00:24:24,140 --> 00:24:28,850 Minus 0.55 to 0.46, OK? 521 00:24:28,850 --> 00:24:31,355 And here, you would say that if we were testing the 522 00:24:31,355 --> 00:24:33,170 null hypothesis that the 523 00:24:33,170 --> 00:24:36,300 treatment had no effect on test scores, you could not 524 00:24:36,300 --> 00:24:37,890 reject that null hypothesis, right? 525 00:24:37,890 --> 00:24:40,990 Because an effect of zero is within the confidence 526 00:24:40,990 --> 00:24:43,650 interval, OK? 527 00:24:43,650 --> 00:24:45,610 So that's basically how we use confidence intervals. 528 00:24:45,610 --> 00:24:46,100 Yeah. 529 00:24:46,100 --> 00:24:48,350 AUDIENCE: Shouldn't the two points of that confidence 530 00:24:48,350 --> 00:24:54,370 interval be equidistant from 2.59? 531 00:24:54,370 --> 00:24:58,310 PROFESSOR: From 0.05 you mean? 532 00:24:58,310 --> 00:24:58,730 Yeah. 533 00:24:58,730 --> 00:24:59,250 AUDIENCE: [INAUDIBLE] 534 00:24:59,250 --> 00:25:01,360 PROFESSOR: Yeah, I think-- 535 00:25:01,360 --> 00:25:02,500 oh, over here? 536 00:25:02,500 --> 00:25:03,330 AUDIENCE: Yeah. 537 00:25:03,330 --> 00:25:07,035 PROFESSOR: So they actually don't always have-- 538 00:25:07,035 --> 00:25:09,750 so you raise a good point. 539 00:25:09,750 --> 00:25:11,310 So there may be some math errors here. 540 00:25:11,310 --> 00:25:13,540 I think a more reasonable estimate, by the way, is that 541 00:25:13,540 --> 00:25:15,970 this would have to be minus 0.05 for you to get 542 00:25:15,970 --> 00:25:16,540 something like this. 
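Both slide examples boil down to the same check: build the rule-of-thumb interval and see whether zero falls inside it. This sketch (mine, not from the lecture) uses a symmetric 1.96-standard-error interval; as the audience question notes, the slide's second interval is not quite symmetric, possibly a math error on the slide.

```python
def rejects_zero(estimate, se, z=1.96):
    """Two-sided test at the 5% level via the rule-of-thumb CI:
    reject 'no effect' iff 0 lies outside estimate +/- z*se."""
    lo, hi = estimate - z * se, estimate + z * se
    return not (lo <= 0 <= hi), (lo, hi)

# Education example: difference 2.59, standard error 0.54.
edu_reject, edu_ci = rejects_zero(2.59, 0.54)      # CI ~ (1.53, 3.65): reject

# Test-score example: difference 0.05, standard error 0.26.
score_reject, score_ci = rejects_zero(0.05, 0.26)  # CI straddles 0: cannot reject
```

The first interval excludes zero, so the no-difference hypothesis is rejected; the second contains zero, so it is not.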
543 00:25:16,540 --> 00:25:20,620 AUDIENCE: But in the first example, if 2.59 is the mean, 544 00:25:20,620 --> 00:25:23,885 is the difference-- 545 00:25:23,885 --> 00:25:26,790 PROFESSOR: So it's approximately 546 00:25:26,790 --> 00:25:27,380 the same, isn't it? 547 00:25:27,380 --> 00:25:29,120 AUDIENCE: I think it's a little skewed-- 548 00:25:29,120 --> 00:25:29,670 PROFESSOR: Yeah. 549 00:25:29,670 --> 00:25:30,620 AUDIENCE: On that side, it is 2.64. 550 00:25:30,620 --> 00:25:31,530 PROFESSOR: OK. 551 00:25:31,530 --> 00:25:33,720 Yeah, so you raise a good point. 552 00:25:33,720 --> 00:25:39,520 So when I said that a rule of thumb is two times the 553 00:25:39,520 --> 00:25:42,030 standard error, that's a rule of thumb. 554 00:25:42,030 --> 00:25:45,820 And in particular cases, you can sometimes get asymmetric 555 00:25:45,820 --> 00:25:47,100 confidence intervals. 556 00:25:47,100 --> 00:25:49,670 So you're right that usually they should be symmetric and 557 00:25:49,670 --> 00:25:51,600 probably, for simplicity, we should have put up symmetric 558 00:25:51,600 --> 00:25:54,780 ones, but it can occur that confidence intervals are 559 00:25:54,780 --> 00:25:57,080 asymmetric. 560 00:25:57,080 --> 00:25:58,440 For example, if you had a-- 561 00:26:02,000 --> 00:26:05,410 yeah, depending on the estimation, if you have 562 00:26:05,410 --> 00:26:07,270 truncation at zero-- 563 00:26:07,270 --> 00:26:08,740 if you know for sure that there can never be an outcome 564 00:26:08,740 --> 00:26:11,264 below zero, for example, then you can get asymmetric 565 00:26:11,264 --> 00:26:11,738 confidence intervals. 566 00:26:11,738 --> 00:26:15,530 AUDIENCE: When the distribution is not normal? 567 00:26:15,530 --> 00:26:16,770 PROFESSOR: Yeah. 568 00:26:16,770 --> 00:26:19,070 Exactly. 
569 00:26:19,070 --> 00:26:24,570 But for most things that you'll be investigating, 570 00:26:24,570 --> 00:26:26,070 usually they're going to be-- 571 00:26:26,070 --> 00:26:26,950 AUDIENCE: Normal. 572 00:26:26,950 --> 00:26:28,520 PROFESSOR: Yeah, for outcomes that are zero-one, 573 00:26:28,520 --> 00:26:29,190 [UNINTELLIGIBLE] 574 00:26:29,190 --> 00:26:31,670 get non-normal, but yes, in general, 575 00:26:31,670 --> 00:26:33,010 they are pretty symmetric. 576 00:26:33,010 --> 00:26:37,563 But they might not be exactly symmetric. 577 00:26:37,563 --> 00:26:40,010 OK. 578 00:26:40,010 --> 00:26:43,340 So as I sort of was suggesting as we were going through these 579 00:26:43,340 --> 00:26:46,470 examples, we're often interested in testing the 580 00:26:46,470 --> 00:26:50,460 hypothesis that the effect size is equal to zero, right? 581 00:26:50,460 --> 00:26:54,710 The classic hypothesis that you typically want to know is, did 582 00:26:54,710 --> 00:26:59,605 my program do anything, right? 583 00:26:59,605 --> 00:27:02,640 And so, how do you test the hypothesis that my program-- 584 00:27:02,640 --> 00:27:04,470 so we want to know, did my program have 585 00:27:04,470 --> 00:27:06,140 any effect at all? 586 00:27:06,140 --> 00:27:08,890 And so what we technically want to do is we want to test 587 00:27:08,890 --> 00:27:11,960 what's called the null hypothesis, that the program 588 00:27:11,960 --> 00:27:15,240 had an effect of nothing, against an alternative 589 00:27:15,240 --> 00:27:18,616 hypothesis that the program had some effect. 590 00:27:18,616 --> 00:27:23,370 So this is the typical test that we want to do. 591 00:27:23,370 --> 00:27:27,670 Now you could say, actually, I don't care about zero. 592 00:27:27,670 --> 00:27:30,970 I want to say that I know-- for example, this is the 593 00:27:30,970 --> 00:27:33,420 standard thing that we would do in most policy evaluations 594 00:27:33,420 --> 00:27:34,670 that we're going to be doing. 
595 00:27:37,050 --> 00:27:38,680 It doesn't have to be zero. 596 00:27:38,680 --> 00:27:40,740 Suppose you were doing a drug trial and you knew that the 597 00:27:40,740 --> 00:27:42,850 best existing treatment out there already has 598 00:27:42,850 --> 00:27:44,160 an effect of one. 599 00:27:44,160 --> 00:27:49,420 And so instead of comparing to zero, you might be 600 00:27:49,420 --> 00:27:51,250 comparing to one. 601 00:27:51,250 --> 00:27:54,800 Is it actually better than the best existing treatment? 602 00:27:54,800 --> 00:27:59,170 In most cases, we're usually comparing to zero, OK? 603 00:27:59,170 --> 00:28:02,140 And usually, we have the alternative hypothesis that 604 00:28:02,140 --> 00:28:03,140 the effect is just not zero. 605 00:28:03,140 --> 00:28:05,080 We're interested in anything other than zero. 606 00:28:05,080 --> 00:28:08,000 Sometimes you can specify other alternative hypotheses, 607 00:28:08,000 --> 00:28:11,040 that the effect is always positive or always negative, 608 00:28:11,040 --> 00:28:13,590 but usually this is the classic case, which is we're 609 00:28:13,590 --> 00:28:17,490 saying, we think this thing had-- the null is no effect. 610 00:28:17,490 --> 00:28:19,460 We want to say, did this program have an effect and 611 00:28:19,460 --> 00:28:23,760 we're interested in any possible effect, OK? 612 00:28:23,760 --> 00:28:27,010 And hypothesis testing says, when can I reject this null 613 00:28:27,010 --> 00:28:32,170 hypothesis in favor of this alternative, OK? 614 00:28:36,600 --> 00:28:39,510 And as we saw, essentially, the confidence interval is 615 00:28:39,510 --> 00:28:41,090 giving you a way to do that. 616 00:28:41,090 --> 00:28:44,700 It's saying, if the null is outside the confidence 617 00:28:44,700 --> 00:28:47,370 interval, then I can reject the null. 618 00:28:47,370 --> 00:28:47,610 Yeah. 
619 00:28:47,610 --> 00:28:53,665 AUDIENCE: Surely, if we're trying to assess the impact of 620 00:28:53,665 --> 00:28:57,220 an intervention, we're always going to think it's positive. 621 00:28:57,220 --> 00:28:59,610 Or in general, because-- 622 00:28:59,610 --> 00:29:02,180 I gave someone some money to increase their income or not. 623 00:29:02,180 --> 00:29:04,830 We've got a pretty good idea it's going to be positive. 624 00:29:04,830 --> 00:29:08,230 The probability it's negative is pretty-- 625 00:29:08,230 --> 00:29:09,845 PROFESSOR: Why do you-- 626 00:29:09,845 --> 00:29:12,230 AUDIENCE: Yeah, why do we change our 627 00:29:12,230 --> 00:29:12,510 significance level-- 628 00:29:12,510 --> 00:29:12,720 [INTERPOSING VOICES] 629 00:29:12,720 --> 00:29:16,420 PROFESSOR: You ask a great question. 630 00:29:16,420 --> 00:29:17,840 And I have to say this is a bit of a source of 631 00:29:17,840 --> 00:29:20,380 frustration of mine. 632 00:29:20,380 --> 00:29:24,950 Let me give you a couple different answers to that. 633 00:29:24,950 --> 00:29:26,360 Here's the thing. 634 00:29:26,360 --> 00:29:27,610 If you did that-- 635 00:29:30,790 --> 00:29:34,260 if I said I can commit, before I look at the data, that I 636 00:29:34,260 --> 00:29:37,720 only think it could be positive, that would mean that 637 00:29:37,720 --> 00:29:40,700 if it's negative, no matter how negative, you're going to 638 00:29:40,700 --> 00:29:43,430 say that was random chance, OK? 639 00:29:43,430 --> 00:29:46,730 So it would require a fair amount of commitment on you, 640 00:29:46,730 --> 00:29:50,300 on your part, as the experimenter to say, if I get 641 00:29:50,300 --> 00:29:53,540 a negative result, no matter how crazy that negative result 642 00:29:53,540 --> 00:29:57,310 is, I'm going to say that's random chance, OK? 643 00:29:57,310 --> 00:30:04,170 And typically, what often happens ex post is that people 644 00:30:04,170 --> 00:30:06,450 can't commit to actually doing that. 
645 00:30:06,450 --> 00:30:08,780 So suppose you did your program and you-- 646 00:30:08,780 --> 00:30:12,120 so I actually have a program right now that I'm working on 647 00:30:12,120 --> 00:30:13,700 in Indonesia that's supposed to 648 00:30:13,700 --> 00:30:15,510 improve health and education. 649 00:30:15,510 --> 00:30:17,820 And it seems to be making education worse. 650 00:30:17,820 --> 00:30:20,020 Now, we have no theory for why this program should be making 651 00:30:20,020 --> 00:30:22,700 education worse, OK? 652 00:30:22,700 --> 00:30:25,235 But it certainly seems to be there in the data. 653 00:30:25,235 --> 00:30:28,520 Now, if we had adopted your approach, we wouldn't be 654 00:30:28,520 --> 00:30:30,630 entertaining the hypothesis that it made education worse. 655 00:30:30,630 --> 00:30:33,215 We would say, even though it's looking like this program is 656 00:30:33,215 --> 00:30:35,555 making education worse, that must be random 657 00:30:35,555 --> 00:30:37,250 noise in the data. 658 00:30:37,250 --> 00:30:42,110 We're not going to treat that as something potentially real. 659 00:30:42,110 --> 00:30:44,480 Ex post, though, you see this in the data and you're likely 660 00:30:44,480 --> 00:30:47,530 to say, gee, man, that's a really negative effect. 661 00:30:47,530 --> 00:30:49,460 Maybe the program was doing something that I 662 00:30:49,460 --> 00:30:50,150 didn't think about. 663 00:30:50,150 --> 00:30:51,435 And in our case, actually, we're starting to investigate 664 00:30:51,435 --> 00:30:53,560 and maybe it's because it was health and education and we're 665 00:30:53,560 --> 00:30:56,910 sort of sucking resources away from education into health. 666 00:30:56,910 --> 00:30:59,660 So it requires a lot of commitment on your part, as 667 00:30:59,660 --> 00:31:04,500 the researcher, that if you get these negative effects, to 668 00:31:04,500 --> 00:31:06,260 treat them as random noise. 
669 00:31:06,260 --> 00:31:09,440 And I think that, because most researchers, even though they 670 00:31:09,440 --> 00:31:11,640 would like to say they're going to do that, if it 671 00:31:11,640 --> 00:31:13,370 happens that they get a really negative effect, they're going 672 00:31:13,370 --> 00:31:15,030 to want to say, gee, that looks like a negative effect. 673 00:31:15,030 --> 00:31:15,980 We're going to want to investigate 674 00:31:15,980 --> 00:31:17,490 that, take that seriously. 675 00:31:17,490 --> 00:31:22,030 Because most people do that ex post, the convention is that 676 00:31:22,030 --> 00:31:25,340 in most cases, to say we're going to test against either 677 00:31:25,340 --> 00:31:26,545 hypothesis in either direction. 678 00:31:26,545 --> 00:31:27,410 AUDIENCE: Except that the approach-- 679 00:31:27,410 --> 00:31:28,650 PROFESSOR: Does that makes sense? 680 00:31:28,650 --> 00:31:31,120 AUDIENCE: Your issue is do I do this program or not. 681 00:31:31,120 --> 00:31:33,307 So it doesn't matter whether the impact of the program is 682 00:31:33,307 --> 00:31:34,540 zero or negative. 683 00:31:34,540 --> 00:31:36,170 Even if it's zero, you're saying that it's-- 684 00:31:36,170 --> 00:31:37,080 PROFESSOR: You're absolutely right. 685 00:31:37,080 --> 00:31:42,846 So if you were strict about it and said, I'm going to do it 686 00:31:42,846 --> 00:31:45,430 if it's positive and not if it's zero, then I think you 687 00:31:45,430 --> 00:31:47,990 were correct that, strictly speaking, a one-sided 688 00:31:47,990 --> 00:31:49,640 hypothesis test will be correct and it would give you 689 00:31:49,640 --> 00:31:50,260 some more power. 690 00:31:50,260 --> 00:31:52,210 AUDIENCE: So it would give you power. 691 00:31:52,210 --> 00:31:53,090 PROFESSOR: Yeah, it would give you more power. 692 00:31:53,090 --> 00:31:53,530 AUDIENCE: [UNINTELLIGIBLE] 693 00:31:53,530 --> 00:31:53,740 PROFESSOR: Right. 
694 00:31:53,740 --> 00:31:57,220 And the reason it gives you power is, remember, how does 695 00:31:57,220 --> 00:31:57,970 hypothesis testing work? 696 00:31:57,970 --> 00:31:59,520 It says, well, what is the chance this outcome could have 697 00:31:59,520 --> 00:32:01,140 occurred 95-- 698 00:32:01,140 --> 00:32:05,050 what would have occurred by chance 95% of the time? 699 00:32:05,050 --> 00:32:06,810 When you do a two-sided test, you say, OK-- 700 00:32:06,810 --> 00:32:08,710 where's my chalkboard? 701 00:32:08,710 --> 00:32:09,692 Here. 702 00:32:09,692 --> 00:32:12,440 You imagine a normal distribution of outcomes. 703 00:32:12,440 --> 00:32:14,910 You're going to say, well, the 95% is in the middle and 704 00:32:14,910 --> 00:32:18,790 anything in the tails is the stuff that I'm going to 705 00:32:18,790 --> 00:32:20,360 [UNINTELLIGIBLE] by non-random chance. 706 00:32:20,360 --> 00:32:22,250 Well, what you're doing with a one-sided test is you're going 707 00:32:22,250 --> 00:32:24,960 to say, I'm going to take that negative stuff-- 708 00:32:24,960 --> 00:32:27,070 way out there negative stuff-- and I'm going to say that's 709 00:32:27,070 --> 00:32:28,530 also random chance. 710 00:32:28,530 --> 00:32:31,570 So I'm going to pick my 95% all the way to the left. 711 00:32:31,570 --> 00:32:33,960 And that means that the 5% that's not random chance is a 712 00:32:33,960 --> 00:32:35,870 little more to the right. 713 00:32:35,870 --> 00:32:36,900 Do you see what I'm saying? 714 00:32:36,900 --> 00:32:38,700 But it requires that if-- 715 00:32:38,700 --> 00:32:40,820 you're committing to, even if you get really negative 716 00:32:40,820 --> 00:32:43,160 outcomes, asserting that they're random chance, which 717 00:32:43,160 --> 00:32:45,680 is really, often, kind of unbelievable. 
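The chalkboard picture can be made concrete with normal quantiles. In this sketch (my own, not the lecture's), the two-sided 5% test splits its rejection probability across both tails and so needs a larger cutoff than the one-sided test, which is exactly where the extra power of a one-sided test comes from.

```python
from statistics import NormalDist

z = NormalDist()
alpha = 0.05

# Two-sided test: alpha/2 in each tail, so the cutoff is larger.
two_sided_cutoff = z.inv_cdf(1 - alpha / 2)  # ~1.96
# One-sided test: all of alpha in one tail, so the cutoff is smaller --
# more power against positive effects, none at all against negative ones.
one_sided_cutoff = z.inv_cdf(1 - alpha)      # ~1.645
```

An estimate whose ratio to its standard error lands between roughly 1.645 and 1.96 is significant under the one-sided convention but not the two-sided one, which is the professor's point about needing to commit in advance.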
718 00:32:45,680 --> 00:32:48,480 The other thing is that, although this is technically 719 00:32:48,480 --> 00:32:51,040 the way hypothesis testing is set up, the norms and 720 00:32:51,040 --> 00:32:54,050 conventions are that we all use two-sided tests for these 721 00:32:54,050 --> 00:32:55,310 reasons I talked about. 722 00:32:55,310 --> 00:33:03,250 And so I can just tell you that, practically speaking, I 723 00:33:03,250 --> 00:33:05,140 think if you do a one-sided test, people are going to be 724 00:33:05,140 --> 00:33:09,930 skeptical because it may be that you, actually, would do 725 00:33:09,930 --> 00:33:13,750 that, but I think most of the time, people can't 726 00:33:13,750 --> 00:33:14,300 commit to do that. 727 00:33:14,300 --> 00:33:16,830 And so the standard has become two-sided tests. 728 00:33:16,830 --> 00:33:17,700 But I certainly agree with you. 729 00:33:17,700 --> 00:33:20,320 It's very frustrating because one should be able to 730 00:33:20,320 --> 00:33:21,570 articulate one-sided hypotheses. 731 00:33:24,420 --> 00:33:27,418 That's sort of a long answer, but does that make sense? 732 00:33:27,418 --> 00:33:28,350 It's OK. 733 00:33:28,350 --> 00:33:30,280 OK, now, for those of you on this side of the board, you 734 00:33:30,280 --> 00:33:32,945 won't be able to see, but maybe if I need to write 735 00:33:32,945 --> 00:33:34,220 something on the board it will be better. 736 00:33:34,220 --> 00:33:35,470 OK. 737 00:33:38,595 --> 00:33:39,760 So now we're going to talk about type I and type II 738 00:33:39,760 --> 00:33:46,230 errors, which, as I mentioned, are not helpfully named. 739 00:33:46,230 --> 00:33:47,650 OK. 740 00:33:47,650 --> 00:33:48,900 A type I error-- 741 00:33:53,940 --> 00:33:56,780 so this is all about probability, so nothing we can 742 00:33:56,780 --> 00:33:57,590 ever say for sure. 743 00:33:57,590 --> 00:34:01,250 We can always say that this is more or less likely. 
744 00:34:01,250 --> 00:34:03,240 And there's two different types of errors we can make 745 00:34:03,240 --> 00:34:05,780 when we're doing these probabilities or doing these 746 00:34:05,780 --> 00:34:06,980 assessments. 747 00:34:06,980 --> 00:34:09,969 The first error, and it's called type I error, is we can 748 00:34:09,969 --> 00:34:12,760 conclude that there was an effect when, in fact, there 749 00:34:12,760 --> 00:34:17,219 was no effect, OK? 750 00:34:17,219 --> 00:34:21,070 So when I said the 95% confidence interval, that 95% 751 00:34:21,070 --> 00:34:23,199 is coming from our choice about type I errors. 752 00:34:31,530 --> 00:34:32,790 So for example-- 753 00:34:36,530 --> 00:34:38,550 a significance level is the probability that you're going 754 00:34:38,550 --> 00:34:40,620 to falsely conclude the program had an effect when, in 755 00:34:40,620 --> 00:34:42,960 fact, there was no effect, OK? 756 00:34:42,960 --> 00:34:47,020 And that's related to when you say a 95% confidence interval, 757 00:34:47,020 --> 00:34:49,710 the remaining 5% is what we're talking about here. 758 00:34:49,710 --> 00:34:53,980 That's the probability of making a type I error, OK? 759 00:34:53,980 --> 00:34:55,030 And why is that? 760 00:34:55,030 --> 00:34:57,830 Well, we said there's a 95% chance that it's going to be 761 00:34:57,830 --> 00:34:59,590 within this range. 762 00:34:59,590 --> 00:35:02,510 That means that just by random chance, there's some chance it 763 00:35:02,510 --> 00:35:05,295 could be outside that range, right? 764 00:35:05,295 --> 00:35:08,430 So if your confidence interval was over here and zero was 765 00:35:08,430 --> 00:35:12,650 over here, you would say, well, with 95% confidence, I'm 766 00:35:12,650 --> 00:35:14,630 going to assume the program had an effect because zero is 767 00:35:14,630 --> 00:35:16,500 not within my confidence interval. 
768 00:35:16,500 --> 00:35:20,400 However, 5% of the time, the true effect could be over here 769 00:35:20,400 --> 00:35:21,430 outside your confidence interval. 770 00:35:21,430 --> 00:35:23,460 That's what a 95% confidence interval means. 771 00:35:28,050 --> 00:35:33,230 So in some sense, that's what we mean by a-- 772 00:35:33,230 --> 00:35:35,000 so that's in some sense what a type I error is. 773 00:35:35,000 --> 00:35:36,770 A type I error is the probability that you're going 774 00:35:36,770 --> 00:35:46,000 to detect an effect when, in fact, there's not. 775 00:35:46,000 --> 00:35:51,950 And so the typical levels that you may see are 5%, 1% or 10% 776 00:35:51,950 --> 00:35:53,420 significance levels. 777 00:35:53,420 --> 00:35:55,930 And the way to think about those significance levels is, 778 00:35:55,930 --> 00:35:58,650 if you see something that's significant at the 10% level, 779 00:35:58,650 --> 00:36:01,820 that means that 10% of the time, an effect of that size 780 00:36:01,820 --> 00:36:03,380 could've been just due to random chance. 781 00:36:03,380 --> 00:36:06,860 Might not actually be a true effect. 782 00:36:06,860 --> 00:36:10,440 And if you've heard of a p-value, a p-value is exactly 783 00:36:10,440 --> 00:36:11,560 this number. 784 00:36:11,560 --> 00:36:14,880 A p-value basically says, what is the probability that an 785 00:36:14,880 --> 00:36:18,650 effect this size or larger could have occurred just by 786 00:36:18,650 --> 00:36:21,880 random chance, OK? 787 00:36:24,540 --> 00:36:27,280 So that's what's called a type I error. 788 00:36:27,280 --> 00:36:34,620 And typically, there's no deep reason why 5% is the normal 789 00:36:34,620 --> 00:36:36,580 level of type I errors that we use, but it's kind of the 790 00:36:36,580 --> 00:36:39,160 convention. 791 00:36:39,160 --> 00:36:40,020 It's what everyone else uses. 
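The p-value definition just given translates directly into code. A sketch under the normal approximation (mine, not the lecture's), reusing the two earlier slide examples:

```python
from statistics import NormalDist

def two_sided_p_value(estimate, se):
    """p-value: probability, under the null of zero effect, of seeing
    an effect at least this large in magnitude (normal approximation)."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))

# Education example from the slides: difference 2.59, SE 0.54.
p_edu = two_sided_p_value(2.59, 0.54)    # far below 0.05: significant
# Test-score example: difference 0.05, SE 0.26.
p_score = two_sided_p_value(0.05, 0.26)  # well above 0.10: not significant
```

A result is then "significant at the 5% level" exactly when its p-value is below 0.05, and likewise for the 10% and 1% conventions.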
792 00:36:40,020 --> 00:36:41,950 If you use something different, people are going to 793 00:36:41,950 --> 00:36:42,820 look at you a little funny. 794 00:36:42,820 --> 00:36:47,430 So the conventions are we have 5%, 10%, and 1%, as these 795 00:36:47,430 --> 00:36:48,860 significance levels. 796 00:36:48,860 --> 00:36:55,030 And you might say, gee, 5% or 10% seems pretty low. 797 00:36:55,030 --> 00:36:56,110 Maybe I would want a bigger one. 798 00:36:56,110 --> 00:36:58,030 But on the other hand, if you start thinking about it, that 799 00:36:58,030 --> 00:37:00,160 means that if you use 10% significance, that means that 800 00:37:00,160 --> 00:37:02,876 one out of every 10 studies is going to be wrong. 801 00:37:02,876 --> 00:37:05,710 Or if you had 10 different outcomes in your data set, one 802 00:37:05,710 --> 00:37:08,660 out of every 10 would be significant even just by 803 00:37:08,660 --> 00:37:09,910 random chance. 804 00:37:12,750 --> 00:37:15,920 So the other type of error is what's called, as I said, 805 00:37:15,920 --> 00:37:18,210 helpfully, a type II error. 806 00:37:18,210 --> 00:37:21,310 And a type II error says that you fail to reject that the 807 00:37:21,310 --> 00:37:26,570 program had no effect when, in fact, there was an effect, OK? 808 00:37:26,570 --> 00:37:30,870 So this is, the program did something, but I can't pick it 809 00:37:30,870 --> 00:37:35,480 up in the data, OK? 810 00:37:35,480 --> 00:37:39,280 And we talk about the power of a test. 811 00:37:39,280 --> 00:37:42,870 The power is basically the opposite of a type II error. 812 00:37:42,870 --> 00:37:45,550 A power just says, what's the probability that I will be 813 00:37:45,550 --> 00:37:48,730 able to find an effect given that the actual 814 00:37:48,730 --> 00:37:52,490 effect is there, OK? 
815 00:37:52,490 --> 00:37:57,070 So when we talk about how big a sample size we need, what 816 00:37:57,070 --> 00:38:00,320 we're basically talking about is, how much power are we 817 00:38:00,320 --> 00:38:02,250 going to have to detect an effect? 818 00:38:02,250 --> 00:38:04,560 Or what's the probability that given that a true effect is 819 00:38:04,560 --> 00:38:08,150 there, we're going to pick it up in the data, OK? 820 00:38:08,150 --> 00:38:10,740 So here's an example of how to think about power. 821 00:38:10,740 --> 00:38:13,960 If I ran the experiment 100 times-- not 100 samples, but 822 00:38:13,960 --> 00:38:16,380 if I ran the whole thing 100 times-- 823 00:38:16,380 --> 00:38:18,960 what percentage of the time, or in how many of these cases, 824 00:38:18,960 --> 00:38:21,120 would I be able to say, reject the hypothesis that men and 825 00:38:21,120 --> 00:38:24,280 women have the same education at the 5% level if, in fact, 826 00:38:24,280 --> 00:38:28,010 they're different, OK? 827 00:38:28,010 --> 00:38:34,650 So this is a helpful graph which basically plots the 828 00:38:34,650 --> 00:38:36,253 truth and what you're going to conclude based 829 00:38:36,253 --> 00:38:37,730 on your data, OK? 830 00:38:37,730 --> 00:38:40,940 So suppose the truth is that you had no effect and you 831 00:38:40,940 --> 00:38:43,570 conclude your no effect, OK? 832 00:38:43,570 --> 00:38:47,010 Then you're happy. 833 00:38:47,010 --> 00:38:49,290 If there was an effect and you conclude there was an effect, 834 00:38:49,290 --> 00:38:49,960 you're happy. 835 00:38:49,960 --> 00:38:52,330 So you want to be in one of these two boxes. 836 00:38:52,330 --> 00:38:54,540 And the two types of errors you can make-- so one type of 837 00:38:54,540 --> 00:38:56,770 error is over here, right? 
838 00:38:56,770 --> 00:39:00,160 So if the truth is there was no effect, but you concluded 839 00:39:00,160 --> 00:39:01,860 there was an effect, that would be making a 840 00:39:01,860 --> 00:39:03,670 type I error, OK? 841 00:39:03,670 --> 00:39:05,080 And this is what we talked about: size. 842 00:39:05,080 --> 00:39:09,075 So this one, we normally fix this one at 5%. 843 00:39:09,075 --> 00:39:12,680 So it's only 5% of the time-- 844 00:39:12,680 --> 00:39:15,505 if there's no effect, 5% of the time you're going to end 845 00:39:15,505 --> 00:39:17,680 up here and 95% of the time you're going to end up here. 846 00:39:17,680 --> 00:39:20,560 That's what a 95% confidence interval is telling you. 847 00:39:20,560 --> 00:39:24,090 And the other thing is, suppose that the thing had an 848 00:39:24,090 --> 00:39:29,310 effect but you couldn't find it in the data, OK? 849 00:39:29,310 --> 00:39:30,850 That's what's called a type II error. 850 00:39:30,850 --> 00:39:34,040 And that's, when we design our experiments, we want to make 851 00:39:34,040 --> 00:39:36,400 sure that our samples are sufficiently large that the 852 00:39:36,400 --> 00:39:40,710 probability you end up in this box is not too big, OK? 853 00:39:40,710 --> 00:39:44,430 So that's a sense of what we mean by the different types of 854 00:39:44,430 --> 00:39:45,720 mistakes or errors you could make. 855 00:39:45,720 --> 00:39:46,181 Yeah. 856 00:39:46,181 --> 00:39:50,330 AUDIENCE: It's kind of a stupid question. 857 00:39:50,330 --> 00:39:53,040 So power is the probability that you are not 858 00:39:53,040 --> 00:39:54,495 making a type II error? 859 00:39:54,495 --> 00:39:55,465 PROFESSOR: Yes. 860 00:39:55,465 --> 00:39:58,470 AUDIENCE: So then power is the probability that you're in the 861 00:39:58,470 --> 00:40:00,382 smiley face box, that you are-- 862 00:40:00,382 --> 00:40:00,860 [INTERPOSING VOICES] 863 00:40:00,860 --> 00:40:03,190 PROFESSOR: Yes. 
864 00:40:03,190 --> 00:40:05,840 Power is the probability you're over here. 865 00:40:05,840 --> 00:40:07,750 Yeah, we say power is related to type II errors. 866 00:40:07,750 --> 00:40:08,580 Power is over here. 867 00:40:08,580 --> 00:40:09,930 This is the power. 868 00:40:09,930 --> 00:40:11,530 Power is conditional on there being an effect. 869 00:40:11,530 --> 00:40:13,966 What's the probability you're in this box, not this box? 870 00:40:17,390 --> 00:40:21,630 Probably should say one minus power to be clearer. 871 00:40:21,630 --> 00:40:21,780 OK? 872 00:40:21,780 --> 00:40:23,527 Does that make sense? 873 00:40:23,527 --> 00:40:24,660 All right. 874 00:40:24,660 --> 00:40:27,230 So when we're designing experiments, we typically fix 875 00:40:27,230 --> 00:40:31,520 this at conventional levels. 876 00:40:31,520 --> 00:40:34,450 And we choose our sample size so that we get this, the 877 00:40:34,450 --> 00:40:36,500 power, or the probability that you're in the happy face box 878 00:40:36,500 --> 00:40:39,190 over here, to a reasonable level given the effect size 879 00:40:39,190 --> 00:40:42,862 that we think we're likely to get, OK? 880 00:40:46,348 --> 00:40:47,598 OK. 881 00:40:49,350 --> 00:40:53,652 Now, in some sense, the next two things, standard errors, 882 00:40:53,652 --> 00:40:58,620 are about this box, size. 883 00:40:58,620 --> 00:41:03,245 And power is about this box, or these boxes. 884 00:41:03,245 --> 00:41:05,470 Yeah. 885 00:41:05,470 --> 00:41:08,260 AUDIENCE: Why is power not also the probability that you 886 00:41:08,260 --> 00:41:11,620 end up in the bottom right box as opposed to the bottom left? 887 00:41:11,620 --> 00:41:12,990 PROFESSOR: Because that's size. 
888 00:41:12,990 --> 00:41:17,310 AUDIENCE: Isn't size also linked to-- or power also 889 00:41:17,310 --> 00:41:17,790 linked to-- 890 00:41:17,790 --> 00:41:21,985 PROFESSOR: No, they're all related, but we typically-- 891 00:41:24,860 --> 00:41:27,490 they're related in the following way. 892 00:41:27,490 --> 00:41:31,190 We assert a size because when we calculate our standard 893 00:41:31,190 --> 00:41:35,620 error-- our confidence intervals, we pick how big or 894 00:41:35,620 --> 00:41:37,840 small we want the confidence intervals to be. 895 00:41:37,840 --> 00:41:40,090 When we say a 95% confidence interval, we're 896 00:41:40,090 --> 00:41:42,460 picking the size, OK? 897 00:41:42,460 --> 00:41:45,045 So this one, we get to choose. 898 00:41:45,045 --> 00:41:47,935 AUDIENCE: So it's not sample size, it's size of the 899 00:41:47,935 --> 00:41:48,560 confidence interval? 900 00:41:48,560 --> 00:41:49,690 PROFESSOR: No. 901 00:41:49,690 --> 00:41:53,660 Yeah, this is size is a-- 902 00:41:53,660 --> 00:41:55,430 yeah, it's the size of the confidence interval. 903 00:41:55,430 --> 00:41:55,880 That's right. 904 00:41:55,880 --> 00:41:56,900 Sorry, it's not the sample size. 905 00:41:56,900 --> 00:41:58,470 That's right. 906 00:41:58,470 --> 00:42:02,210 It's called the size of the test in yet more confusing 907 00:42:02,210 --> 00:42:03,460 terminology. 908 00:42:05,474 --> 00:42:06,350 That's right. 909 00:42:06,350 --> 00:42:08,640 This is the size of the confidence interval, 910 00:42:08,640 --> 00:42:09,120 essentially. 911 00:42:09,120 --> 00:42:11,090 And this one you pick, and this one is 912 00:42:11,090 --> 00:42:13,140 determined by your data. 913 00:42:16,916 --> 00:42:19,276 OK? 914 00:42:19,276 --> 00:42:20,220 All right. 915 00:42:20,220 --> 00:42:25,000 OK, so now let's talk about this part, which is standard 916 00:42:25,000 --> 00:42:26,710 errors and significance. 917 00:42:26,710 --> 00:42:29,540 It's all kind of related. 
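[To make the "size" idea concrete, here is a small simulation sketch; it is an editor's illustration, not from the lecture, and the group size of 100 and the outcome distribution are made up. When the truth is "no effect," the t-ratio discussed next exceeds 1.96 in absolute value only about 5% of the time, which is exactly the type I error rate we fixed.]

```python
import random
import statistics

def t_ratio(treat, control):
    """t-ratio: difference in means divided by its standard error."""
    beta_hat = statistics.mean(treat) - statistics.mean(control)
    se = (statistics.variance(treat) / len(treat)
          + statistics.variance(control) / len(control)) ** 0.5
    return beta_hat / se

random.seed(0)
n, reps = 100, 2000
rejections = 0
for _ in range(reps):
    # The truth is "no effect": both groups are drawn from the same distribution.
    treat = [random.gauss(50, 10) for _ in range(n)]
    control = [random.gauss(50, 10) for _ in range(n)]
    if abs(t_ratio(treat, control)) > 1.96:
        rejections += 1

print("false positive rate:", rejections / reps)  # close to 0.05
```

[The 5% is chosen by us, not estimated from the data: no matter how noisy the outcome is, fixing the 1.96 cutoff pins the type I error rate at roughly 5%.]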
918 00:42:29,540 --> 00:42:32,530 All right, so we're going to estimate the 919 00:42:32,530 --> 00:42:33,380 effect of our program. 920 00:42:33,380 --> 00:42:37,750 And we typically call that beta, or beta hat. 921 00:42:37,750 --> 00:42:40,560 So the convention is that things that are estimated, we 922 00:42:40,560 --> 00:42:42,800 put a little hat over them, OK? 923 00:42:42,800 --> 00:42:45,550 So beta hat is going to be our estimate of the program's 924 00:42:45,550 --> 00:42:46,760 effectiveness. 925 00:42:46,760 --> 00:42:49,510 This is our best guess as to the difference between these 926 00:42:49,510 --> 00:42:51,120 two groups. 927 00:42:51,120 --> 00:42:55,100 So for example, this is the average treatment test score 928 00:42:55,100 --> 00:42:56,380 minus the average control test score. 929 00:42:59,450 --> 00:43:02,560 And then we're also going to calculate our estimate of the 930 00:43:02,560 --> 00:43:04,540 standard error of beta hat, right? 931 00:43:04,540 --> 00:43:06,130 And remember that the confidence interval is about 932 00:43:06,130 --> 00:43:08,890 two times the standard error. 933 00:43:08,890 --> 00:43:10,880 So the standard error is going to say how precise our 934 00:43:10,880 --> 00:43:13,440 estimate of beta hat is, which is, remember, if we ran the 935 00:43:13,440 --> 00:43:15,590 experiment 100 times, what will be the distributions of 936 00:43:15,590 --> 00:43:21,180 beta hats that we would get, OK? 937 00:43:21,180 --> 00:43:23,870 And this depends on the sample size and the noise in the 938 00:43:23,870 --> 00:43:25,980 data, right? 939 00:43:25,980 --> 00:43:28,910 And remember we went through this already that here, in 940 00:43:28,910 --> 00:43:32,490 this case, the standard error of how confident we would be-- 941 00:43:32,490 --> 00:43:36,140 so the beta hat, in this case, is going to be 10, and in this 942 00:43:36,140 --> 00:43:38,500 case, it's also going to be 10, right? 
943 00:43:38,500 --> 00:43:42,910 But here, these two things are really precisely estimated, so 944 00:43:42,910 --> 00:43:46,540 our standard error of beta hat is going to be very small 945 00:43:46,540 --> 00:43:49,100 because we're going to say we have a very precise estimate 946 00:43:49,100 --> 00:43:50,920 of the difference between them. 947 00:43:50,920 --> 00:43:52,280 And so the confidence interval is also 948 00:43:52,280 --> 00:43:53,830 going to be very small. 949 00:43:53,830 --> 00:43:55,630 And here, there's lots of noise in the data, so our 950 00:43:55,630 --> 00:43:58,570 estimate of the standard error is going to be larger. 951 00:43:58,570 --> 00:44:00,490 So in both cases, beta hat is the same. 952 00:44:00,490 --> 00:44:01,940 It's 10 in both cases. 953 00:44:01,940 --> 00:44:03,580 But the standard error is very big here and 954 00:44:03,580 --> 00:44:07,260 very small here, OK? 955 00:44:07,260 --> 00:44:11,320 Now, when we calculate the statistical significance, we 956 00:44:11,320 --> 00:44:14,120 use something called a t-ratio. 957 00:44:14,120 --> 00:44:15,370 And the t-ratio-- 958 00:44:19,370 --> 00:44:21,510 it's actually often called the Student's t-ratio, which I 959 00:44:21,510 --> 00:44:22,810 thought was because students used it. 960 00:44:22,810 --> 00:44:24,275 But it's actually named after Mr. Student. 961 00:44:28,310 --> 00:44:30,770 It's the ratio of beta hat to the standard error 962 00:44:30,770 --> 00:44:33,430 of beta hat, OK? 963 00:44:33,430 --> 00:44:38,040 And the reason that we happen to use this ratio is that, if 964 00:44:38,040 --> 00:44:42,140 there is no effect, if the true beta is actually zero, we know 965 00:44:42,140 --> 00:44:44,030 that this thing has a normal distribution, so we can 966 00:44:44,030 --> 00:44:46,590 calculate the probability that this thing is really big or 967 00:44:46,590 --> 00:44:49,010 really small, OK? 
968 00:44:51,630 --> 00:44:54,090 So we calculate this ratio of beta hat over the standard 969 00:44:54,090 --> 00:44:55,870 error of beta hat. 970 00:44:58,550 --> 00:45:01,030 It turns out that if t is greater 971 00:45:01,030 --> 00:45:05,330 than, in absolute value-- 972 00:45:05,330 --> 00:45:08,850 sorry, if the absolute value of t, I should say, is greater 973 00:45:08,850 --> 00:45:10,940 than 1.96-- 974 00:45:10,940 --> 00:45:16,380 so essentially, if it's bigger than 2 or less than minus 2, 975 00:45:16,380 --> 00:45:18,920 we're going to reject the hypothesis of equality at a 976 00:45:18,920 --> 00:45:20,810 5% significance level. 977 00:45:20,810 --> 00:45:21,850 And why is that? 978 00:45:21,850 --> 00:45:28,980 It's because it turns out, from statistics, that if the 979 00:45:28,980 --> 00:45:31,350 truth is zero, OK? 980 00:45:31,350 --> 00:45:35,170 So if we're in the no effect box and the truth is zero, 981 00:45:35,170 --> 00:45:39,180 this ratio, it turns out, will have a normal distribution. 982 00:45:39,180 --> 00:45:40,990 And it just turns out from a normal distribution that the 983 00:45:40,990 --> 00:45:45,590 95% confidence interval extends 1.96 standard errors 984 00:45:45,590 --> 00:45:47,780 away from zero if you have a normal distribution. 985 00:45:47,780 --> 00:45:51,880 That's just a fact about normal distributions, OK? 986 00:45:51,880 --> 00:45:54,880 So if we calculate this ratio and we say it's greater in 987 00:45:54,880 --> 00:45:57,900 absolute value than 1.96, we're going to reject the 988 00:45:57,900 --> 00:46:00,320 hypothesis of equality at the 5% level, OK? 989 00:46:00,320 --> 00:46:01,220 So we can reject zero. 990 00:46:01,220 --> 00:46:02,650 Zero is going to be outside of our confidence interval. 
991 00:46:02,650 --> 00:46:04,945 And if it's less than 1.96, we're going to fail to reject 992 00:46:04,945 --> 00:46:06,945 it because zero is going to be inside our confidence 993 00:46:06,945 --> 00:46:11,240 interval, OK? 994 00:46:11,240 --> 00:46:13,470 So in this case, for example, the difference was 2.59. 995 00:46:13,470 --> 00:46:14,990 The standard error was 0.54. 996 00:46:14,990 --> 00:46:19,690 The t-ratio is 2.59 over 0.54-- 997 00:46:19,690 --> 00:46:21,490 it's about five. 998 00:46:21,490 --> 00:46:23,180 So we're definitely going to be able to 999 00:46:23,180 --> 00:46:24,760 reject in this case. 1000 00:46:24,760 --> 00:46:30,090 So we have a t-ratio of about five, OK? 1001 00:46:30,090 --> 00:46:35,180 So you may see this terminology and this is where 1002 00:46:35,180 --> 00:46:36,430 it's coming from. 1003 00:46:36,430 --> 00:46:39,870 Now, there's an important point to note here, which will 1004 00:46:39,870 --> 00:46:42,380 come up later when we talk about power calculations, 1005 00:46:42,380 --> 00:46:50,180 which is, in some sense, that the power that we have is 1006 00:46:50,180 --> 00:46:53,980 determined by this ratio of the point estimate to our 1007 00:46:53,980 --> 00:46:55,620 standard error. 1008 00:46:55,620 --> 00:46:58,530 And so this says, for example, that if we kind of look at 1009 00:46:58,530 --> 00:47:05,010 this a little more, that if you have bigger betas, you can 1010 00:47:05,010 --> 00:47:07,210 still detect effects for a given standard error-- 1011 00:47:07,210 --> 00:47:08,650 so if you fix the standard error but you made beta 1012 00:47:08,650 --> 00:47:11,070 bigger, you're more likely to conclude there was a 1013 00:47:11,070 --> 00:47:11,940 difference, right? 1014 00:47:11,940 --> 00:47:15,950 So what's going to increase your being able to conclude 1015 00:47:15,950 --> 00:47:16,980 there was a difference? 
1016 00:47:16,980 --> 00:47:18,780 Either your effect size is bigger or your standard error 1017 00:47:18,780 --> 00:47:20,800 is smaller, mechanically. 1018 00:47:25,500 --> 00:47:28,040 OK. 1019 00:47:28,040 --> 00:47:32,230 So that's how we are going to calculate being in this box. 1020 00:47:32,230 --> 00:47:34,230 So how do we think about power, which is the 1021 00:47:34,230 --> 00:47:36,150 probability that we're in this box? 1022 00:47:36,150 --> 00:47:38,890 We had an effect and we're able to detect that-- sorry, 1023 00:47:38,890 --> 00:47:46,170 power's in this box-- that we had an effect, OK? 1024 00:47:46,170 --> 00:47:53,520 So when we're planning an experiment, we can do some 1025 00:47:53,520 --> 00:47:56,670 calculations to help us figure out what that power is. 1026 00:47:56,670 --> 00:48:00,450 What's the probability, if the truth is a certain level, that 1027 00:48:00,450 --> 00:48:02,180 we're going to be able to pick it up in the data? 1028 00:48:05,390 --> 00:48:06,640 And what do we need to do that? 1029 00:48:10,180 --> 00:48:12,550 We're going to have to specify a null hypothesis, which is 1030 00:48:12,550 --> 00:48:13,500 usually zero. 1031 00:48:13,500 --> 00:48:15,030 We're going to be testing that something's different than 1032 00:48:15,030 --> 00:48:19,120 zero, the two groups are the same, for example. 1033 00:48:19,120 --> 00:48:19,940 We're going to have to pick our 1034 00:48:19,940 --> 00:48:21,920 significance level, our size. 1035 00:48:21,920 --> 00:48:23,170 And that, we almost always pick at 5%. 1036 00:48:26,000 --> 00:48:31,680 We're going to have to pick an effect size. 1037 00:48:31,680 --> 00:48:33,530 And we'll talk about what exactly this means in a couple 1038 00:48:33,530 --> 00:48:33,950 more slides. 1039 00:48:33,950 --> 00:48:37,650 But when we calculate a power, a power is for a given 1040 00:48:37,650 --> 00:48:42,380 effect size, OK? 1041 00:48:42,380 --> 00:48:43,800 And then we'll calculate the power. 
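[As a quick check of the arithmetic in the earlier example, using the lecture's numbers of a 2.59 difference with a 0.54 standard error:]

```python
beta_hat = 2.59          # estimated difference between treatment and control
se_beta_hat = 0.54       # estimated standard error of that difference
t = beta_hat / se_beta_hat
print(round(t, 1))       # 4.8 -- "about five"
print(abs(t) > 1.96)     # True: reject equality at the 5% level
```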
1042 00:48:46,330 --> 00:48:50,600 So for example, suppose that we did this 1043 00:48:50,600 --> 00:48:52,660 and the power was 80%. 1044 00:48:52,660 --> 00:48:54,780 That would mean that if we did this experiment 100 times-- 1045 00:48:54,780 --> 00:48:56,860 not 100 samples, but actually repeated the whole experiment 1046 00:48:56,860 --> 00:49:03,620 100 times-- then, if the 1047 00:49:03,620 --> 00:49:07,130 null hypothesis is, in fact, false and, instead, the truth is 1048 00:49:07,130 --> 00:49:11,300 this, we would be able to reject the null and conclude 1049 00:49:11,300 --> 00:49:16,300 there was a true effect 80% of the time, OK? 1050 00:49:16,300 --> 00:49:18,610 That's a little bit complicated, but does that 1051 00:49:18,610 --> 00:49:20,975 make sense, what we're going to be trying to do with power? 1052 00:49:25,250 --> 00:49:26,990 So we're going to fix the effect size. 1053 00:49:26,990 --> 00:49:29,505 So remember, we fix the bottom box. 1054 00:49:33,680 --> 00:49:36,040 When we calculate power, we have to speculate not just 1055 00:49:36,040 --> 00:49:37,020 effect versus no effect. 1056 00:49:37,020 --> 00:49:40,090 We have to postulate just how effective the program is. 1057 00:49:40,090 --> 00:49:42,920 So we're going to say, suppose that the significance level is 5% 1058 00:49:42,920 --> 00:49:48,890 and the true effect size is 0.2, right? 1059 00:49:48,890 --> 00:49:51,930 How big a sample would we need to be in this box 80% 1060 00:49:51,930 --> 00:49:54,390 of the time, OK? 1061 00:49:54,390 --> 00:49:57,300 So when we say power, that's what we mean. 1062 00:49:57,300 --> 00:50:03,050 And when we calculate the size of the experiments, you have 1063 00:50:03,050 --> 00:50:07,300 to make a judgment call of how big a power you want. 1064 00:50:07,300 --> 00:50:09,850 The typical powers that we use when we do power calculations 1065 00:50:09,850 --> 00:50:12,430 are either 80% or 90%. 
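[The "repeat the whole experiment 100 times" idea can literally be simulated. This is an editor's sketch with made-up numbers (a standardized true effect of 0.2 and illustrative group sizes): the fraction of simulated experiments that reject at the 5% level is the power, and it rises with the sample size.]

```python
import random
import statistics

def simulated_power(n, effect, sd=1.0, reps=500):
    """Fraction of simulated experiments that reject the null at the 5% level
    when a true effect of the given size really is there."""
    rejections = 0
    for _ in range(reps):
        control = [random.gauss(0.0, sd) for _ in range(n)]
        treat = [random.gauss(effect, sd) for _ in range(n)]  # true effect present
        diff = statistics.mean(treat) - statistics.mean(control)
        se = (statistics.variance(treat) / n
              + statistics.variance(control) / n) ** 0.5
        if abs(diff / se) > 1.96:
            rejections += 1
    return rejections / reps

random.seed(1)
low = simulated_power(n=50, effect=0.2)    # small sample: effect often missed
high = simulated_power(n=400, effect=0.2)  # larger sample: roughly 80% power
print(low, high)
```

[Note the asymmetry with size: the 5% type I rate is fixed by the 1.96 cutoff, but the power depends on the sample size and the postulated effect, which is exactly why it has to be calculated.]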
1066 00:50:12,430 --> 00:50:14,670 So what does this mean? 1067 00:50:14,670 --> 00:50:16,620 This means-- suppose you did 80%. 1068 00:50:16,620 --> 00:50:17,680 Or [UNINTELLIGIBLE] this. 1069 00:50:17,680 --> 00:50:19,540 If you did 80%, that would mean that if you ran your 1070 00:50:19,540 --> 00:50:24,070 experiment 100 times and the true effect was 0.2 in this 1071 00:50:24,070 --> 00:50:27,220 case, you would be able to pick up an effect, 1072 00:50:27,220 --> 00:50:30,510 statistically, 80 out of those 100 times. 1073 00:50:30,510 --> 00:50:32,100 20 out of 100 times, you wouldn't. 1074 00:50:37,280 --> 00:50:42,100 And the bigger your sample size, the larger your power is 1075 00:50:42,100 --> 00:50:47,070 going to be, OK? 1076 00:50:47,070 --> 00:50:50,410 Does that make sense so far? 1077 00:50:50,410 --> 00:50:52,260 OK. 1078 00:50:52,260 --> 00:50:54,720 Suppose you wanted to calculate what your power is 1079 00:50:54,720 --> 00:50:57,610 going to be. 1080 00:50:57,610 --> 00:51:00,680 What are the things you would need to know? 1081 00:51:00,680 --> 00:51:02,590 You would need to know your significance 1082 00:51:02,590 --> 00:51:03,710 level, or your size. 1083 00:51:03,710 --> 00:51:07,210 And as I said, this, we just assume, OK? 1084 00:51:07,210 --> 00:51:09,100 This is that bottom box. 1085 00:51:09,100 --> 00:51:12,560 We're just going to assume that it's 5%. 1086 00:51:12,560 --> 00:51:14,290 And the lower it is, the larger sample 1087 00:51:14,290 --> 00:51:15,580 you're going to need. 1088 00:51:15,580 --> 00:51:18,190 But this one is sort of picked for you. 1089 00:51:18,190 --> 00:51:21,250 We almost always use 5% because that's the convention. 1090 00:51:21,250 --> 00:51:22,738 That's what everyone uses, essentially. 1091 00:51:27,060 --> 00:51:29,720 The second thing you need to know is the mean and the 1092 00:51:29,720 --> 00:51:34,014 variance of the outcome in the comparison group. 
1093 00:51:34,014 --> 00:51:37,050 So you need to know-- 1094 00:51:37,050 --> 00:51:40,600 so remember, all this power calculation is going to depend 1095 00:51:40,600 --> 00:51:44,820 on whether your sample looks like this, really tight, or 1096 00:51:44,820 --> 00:51:46,570 looks like this and is very noisy. 1097 00:51:46,570 --> 00:51:47,640 Because you obviously need a much bigger 1098 00:51:47,640 --> 00:51:49,390 sample here than here. 1099 00:51:49,390 --> 00:51:51,920 So in order to do a power calculation, you need to know, 1100 00:51:51,920 --> 00:51:55,610 well, just what does the outcome look like, right? 1101 00:51:55,610 --> 00:51:57,920 Does the outcome really have very narrow variance? 1102 00:52:00,580 --> 00:52:03,110 Is everyone almost exactly the same, in which case it's going 1103 00:52:03,110 --> 00:52:04,020 to be very easy to detect effects? 1104 00:52:04,020 --> 00:52:09,620 Or is there a huge range of people, in which case you're 1105 00:52:09,620 --> 00:52:11,556 going to need a bigger sample? 1106 00:52:11,556 --> 00:52:13,120 Now, how do we get this? 1107 00:52:13,120 --> 00:52:16,350 So this one, we just conventionally set. 1108 00:52:16,350 --> 00:52:17,960 This one, we have to get somewhere. 1109 00:52:17,960 --> 00:52:22,890 And we usually have to get it from some other survey. 1110 00:52:22,890 --> 00:52:26,730 So we have to find someone that collected data in a 1111 00:52:26,730 --> 00:52:27,970 similar population. 1112 00:52:27,970 --> 00:52:30,700 Or sometimes we'll go and collect data ourselves in that 1113 00:52:30,700 --> 00:52:31,310 same population-- 1114 00:52:31,310 --> 00:52:33,820 just a very small survey, just to get a sense of what this 1115 00:52:33,820 --> 00:52:37,120 variable looks like, OK? 1116 00:52:37,120 --> 00:52:41,010 And if the variability is big, we're going to need a really 1117 00:52:41,010 --> 00:52:42,650 big sample. 
1118 00:52:42,650 --> 00:52:44,323 And if the variability is really small, we're going to 1119 00:52:44,323 --> 00:52:45,325 need a small sample. 1120 00:52:45,325 --> 00:52:49,010 And it's really important to do this because you don't want 1121 00:52:49,010 --> 00:52:51,530 to spend all your time and money running an experiment 1122 00:52:51,530 --> 00:52:53,930 only to find out that there was no hope of ever finding an 1123 00:52:53,930 --> 00:52:59,650 effect because the power was too small, right? 1124 00:52:59,650 --> 00:52:59,955 Yeah. 1125 00:52:59,955 --> 00:53:01,931 AUDIENCE: And this is in the entire population, not just 1126 00:53:01,931 --> 00:53:03,695 the comparison group, right? 1127 00:53:03,695 --> 00:53:04,577 It says-- 1128 00:53:04,577 --> 00:53:06,920 PROFESSOR: Yeah, but before you do your treatment, the 1129 00:53:06,920 --> 00:53:08,540 comparison and the treatment are the same. 1130 00:53:08,540 --> 00:53:09,340 AUDIENCE: They are the same. 1131 00:53:09,340 --> 00:53:10,030 PROFESSOR: Doesn't matter. 1132 00:53:10,030 --> 00:53:10,740 AUDIENCE: So it's a baseline population. 1133 00:53:10,740 --> 00:53:11,520 PROFESSOR: Baseline would be fine. 1134 00:53:11,520 --> 00:53:14,400 Yeah. 1135 00:53:14,400 --> 00:53:16,090 Before you do your treatment, they're the same. 1136 00:53:16,090 --> 00:53:20,860 So it doesn't matter, OK? 1137 00:53:20,860 --> 00:53:24,660 And the third thing you need is, you need to make an 1138 00:53:24,660 --> 00:53:29,570 assumption about what effect size you want to detect. 1139 00:53:29,570 --> 00:53:30,820 And this one-- 1140 00:53:33,950 --> 00:53:37,530 sometimes you also have to supply this. 1141 00:53:37,530 --> 00:53:42,350 And the best way to think about what effect size you 1142 00:53:42,350 --> 00:53:47,880 want to put in here is you want to say, what's the 1143 00:53:47,880 --> 00:53:55,660 smallest effect that would prompt a policy response, OK? 
1144 00:53:55,660 --> 00:53:57,520 So one could think about this, for example, by doing a 1145 00:53:57,520 --> 00:53:58,710 cost-benefit calculation, right? 1146 00:53:58,710 --> 00:54:01,910 You could say that we do a cost-benefit calculation. 1147 00:54:01,910 --> 00:54:03,590 This thing costs $100. 1148 00:54:03,590 --> 00:54:06,120 If we don't get an effect of 0.1, it's just 1149 00:54:06,120 --> 00:54:08,800 not worth $100, right? 1150 00:54:08,800 --> 00:54:11,450 So that would be a good way of coming up with how big an 1151 00:54:11,450 --> 00:54:14,680 effect size you want here. 1152 00:54:14,680 --> 00:54:16,460 And the idea, then, is if the effect is any smaller than 1153 00:54:16,460 --> 00:54:18,722 this, it's just not interesting to distinguish it 1154 00:54:18,722 --> 00:54:21,000 from zero, right? 1155 00:54:21,000 --> 00:54:24,060 Suppose that the thing had a true effect of 0.001, right? 1156 00:54:24,060 --> 00:54:26,260 If it was that small of an effect, it couldn't possibly be 1157 00:54:26,260 --> 00:54:26,880 cost effective. 1158 00:54:26,880 --> 00:54:29,750 So say the thing has an effect of 0.001. 1159 00:54:29,750 --> 00:54:32,130 Who cares, right? 1160 00:54:32,130 --> 00:54:35,100 So what you want to be thinking about, from a policy 1161 00:54:35,100 --> 00:54:37,330 perspective, is: what's the smallest effect size you want 1162 00:54:37,330 --> 00:54:39,925 to be able to detect, in order to set 1163 00:54:39,925 --> 00:54:41,710 your power calculations? 1164 00:54:41,710 --> 00:54:42,140 Yeah. 1165 00:54:42,140 --> 00:54:44,100 AUDIENCE: I have a question back at the mean 1166 00:54:44,100 --> 00:54:45,025 and variance thing. 1167 00:54:45,025 --> 00:54:46,290 PROFESSOR: Oh, here. 1168 00:54:46,290 --> 00:54:46,773 Yeah. 1169 00:54:46,773 --> 00:54:47,739 AUDIENCE: Yeah. 
1170 00:54:47,739 --> 00:54:50,154 So in terms of the baseline thing that you would collect-- 1171 00:54:50,154 --> 00:54:54,030 so I'm on the implementation side of this, right? 1172 00:54:54,030 --> 00:54:54,825 So we do projects. 1173 00:54:54,825 --> 00:54:57,100 We collect baseline data. 1174 00:54:57,100 --> 00:55:03,050 Now, the case that I'm thinking of, the baseline data 1175 00:55:03,050 --> 00:55:06,820 that we would collect might not be exactly the same kind 1176 00:55:06,820 --> 00:55:12,530 of data that we are looking for in terms of our study. 1177 00:55:12,530 --> 00:55:13,985 What kind of base-- how-- 1178 00:55:13,985 --> 00:55:14,670 PROFESSOR: Right, OK. 1179 00:55:14,670 --> 00:55:18,180 So when we say baseline, there's two different things 1180 00:55:18,180 --> 00:55:20,380 we mean by baseline. 1181 00:55:20,380 --> 00:55:22,820 For this case, this is not strictly a baseline. 1182 00:55:22,820 --> 00:55:26,340 This is just something about what's your variable 1183 00:55:26,340 --> 00:55:26,870 going to look like. 1184 00:55:26,870 --> 00:55:27,880 Let me come back to that in a sec. 1185 00:55:27,880 --> 00:55:29,910 We also sometimes talk about baselines that we are going to 1186 00:55:29,910 --> 00:55:33,020 use of actually collecting the actual outcome variable before 1187 00:55:33,020 --> 00:55:34,930 we start the intervention, right? 1188 00:55:34,930 --> 00:55:37,750 Those are also useful, and we'll talk about those in a 1189 00:55:37,750 --> 00:55:38,970 couple slides. 1190 00:55:38,970 --> 00:55:41,900 And those, one wants them to be more similar, probably, to 1191 00:55:41,900 --> 00:55:44,285 the actual variable you're going to use. 
1192 00:55:44,285 --> 00:55:48,410 Now, for your case, we often don't-- 1193 00:55:52,590 --> 00:55:56,640 the accuracy of your power calculation depends pretty 1194 00:55:56,640 --> 00:56:01,780 critically on how close this mean and variance are to what 1195 00:56:01,780 --> 00:56:04,100 you're going to actually get in your data. 1196 00:56:04,100 --> 00:56:07,990 And when you start on the example that you guys are 1197 00:56:07,990 --> 00:56:09,470 going to work on, or that maybe you've already started working 1198 00:56:09,470 --> 00:56:11,560 on, you're going to find that it's 1199 00:56:11,560 --> 00:56:12,920 actually pretty sensitive. 1200 00:56:12,920 --> 00:56:15,010 It turns out it's pretty sensitive. 1201 00:56:15,010 --> 00:56:19,870 So getting these wrong is going to mean your power 1202 00:56:19,870 --> 00:56:23,200 calculation is going to be wrong. 1203 00:56:23,200 --> 00:56:26,460 So that's sort of an argument for saying you want this to be 1204 00:56:26,460 --> 00:56:28,530 as good as possible. 1205 00:56:28,530 --> 00:56:32,750 Now, the flip side of that, though, is you're going to 1206 00:56:32,750 --> 00:56:36,710 find that these power calculations are fairly 1207 00:56:36,710 --> 00:56:40,640 sensitive to what effect size you choose as well. 1208 00:56:40,640 --> 00:56:44,500 So you're going to find that if you go from an effect size 1209 00:56:44,500 --> 00:56:46,870 of 0.2 to an effect size of 0.1, you're going to need four 1210 00:56:46,870 --> 00:56:48,960 times the sample. 1211 00:56:48,960 --> 00:56:50,210 That's just the way the math works out. 1212 00:56:54,820 --> 00:57:00,360 By which I mean that I think that these power 1213 00:57:00,360 --> 00:57:03,480 calculations are useful for making sure you're in the 1214 00:57:03,480 --> 00:57:07,550 right ballpark, but not necessarily going to nail an 1215 00:57:07,550 --> 00:57:09,250 exact number for you. 
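[The "four times the sample" math is just the standard normal-approximation formula for comparing two means: n per arm is roughly 2(z_size + z_power)^2 (sd/effect)^2, where 1.96 and 0.84 are the usual z-values for 5% significance and 80% power. A sketch of that formula (the sd of 1 means the effects are standardized effect sizes):]

```python
def n_per_arm(effect, sd=1.0, z_size=1.96, z_power=0.84):
    """Approximate sample size per arm to detect `effect` with 80% power
    at the 5% significance level (normal-approximation formula)."""
    return 2 * (z_size + z_power) ** 2 * (sd / effect) ** 2

n_02 = n_per_arm(effect=0.2)  # standardized effect size of 0.2
n_01 = n_per_arm(effect=0.1)  # halve the effect size...
print(round(n_02), round(n_01))  # 392 1568
print(round(n_01 / n_02))        # ...and you need 4x the sample
```

[Because the effect size enters the formula squared in the denominator, halving it always quadruples the required sample, whatever the variance is.]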
1216 00:57:12,210 --> 00:57:17,050 All that's by way of saying that you want to get-- because 1217 00:57:17,050 --> 00:57:19,910 these things are so sensitive, you want to get as close as 1218 00:57:19,910 --> 00:57:22,770 possible to what's actually going to be there. 1219 00:57:22,770 --> 00:57:25,710 On the other hand you're going to find the results are also 1220 00:57:25,710 --> 00:57:29,080 so sensitive to the effect size you want to detect that 1221 00:57:29,080 --> 00:57:32,260 if this was a little bit off, that might be a tradeoff you 1222 00:57:32,260 --> 00:57:33,540 would be willing to live with in practice. 1223 00:57:33,540 --> 00:57:34,780 AUDIENCE: So, from my-- 1224 00:57:34,780 --> 00:57:36,870 PROFESSOR: Does that make sense? 1225 00:57:36,870 --> 00:57:40,630 AUDIENCE: Yeah, but it seems like the effect size-- your 1226 00:57:40,630 --> 00:57:44,690 estimate of your effect size is this kind of-- 1227 00:57:44,690 --> 00:57:47,930 we've got all this science for the calculation and yet your 1228 00:57:47,930 --> 00:57:49,650 estimate of your effect size is based on-- 1229 00:57:49,650 --> 00:57:50,780 PROFESSOR: You're absolutely right. 1230 00:57:50,780 --> 00:57:51,900 AUDIENCE: --getting that-- 1231 00:57:51,900 --> 00:57:52,690 PROFESSOR: Hold on, though. 1232 00:57:52,690 --> 00:57:53,990 Let me back up a little bit, though. 1233 00:57:53,990 --> 00:57:55,370 You're right, except the-- 1234 00:57:55,370 --> 00:57:57,505 in some sense, the best way to get estimates for your effect 1235 00:57:57,505 --> 00:58:00,100 size is to look at similar programs, OK? 1236 00:58:00,100 --> 00:58:03,970 So now there are lots of programs in 1237 00:58:03,970 --> 00:58:05,130 education, for example. 1238 00:58:05,130 --> 00:58:08,860 And they tend to find effect-- 1239 00:58:08,860 --> 00:58:11,840 I've now seen a bazillion things that work on improving 1240 00:58:11,840 --> 00:58:12,780 test scores. 
1241 00:58:12,780 --> 00:58:14,870 And I can tell you that they tend to get-- 1242 00:58:14,870 --> 00:58:16,565 the standardized effect size is the effect size divided by the 1243 00:58:16,565 --> 00:58:17,360 standard deviation. 1244 00:58:17,360 --> 00:58:21,550 And they tend to get effect sizes in the 0.1, 0.15, 0.2 1245 00:58:21,550 --> 00:58:24,410 range, right? 1246 00:58:24,410 --> 00:58:26,660 So you can look at those and say, well, I think that most 1247 00:58:26,660 --> 00:58:30,040 other comparable interventions are getting 0.1, so I'm going 1248 00:58:30,040 --> 00:58:32,910 to use 0.1 as my effect size. 1249 00:58:32,910 --> 00:58:34,490 So you're right-- if you're just trying to sit here and 1250 00:58:34,490 --> 00:58:34,940 introspect about 1251 00:58:34,940 --> 00:58:37,060 what your effect size is going to be, it's very hard. 1252 00:58:37,060 --> 00:58:40,580 But if you use comparable studies to get a sense, then 1253 00:58:40,580 --> 00:58:41,520 you can get a sense. 1254 00:58:41,520 --> 00:58:42,990 And the other thing I mentioned is, you can do 1255 00:58:42,990 --> 00:58:45,640 cost-benefit analysis and say, well, look-- 1256 00:58:45,640 --> 00:58:47,260 which is sort of another way of saying it-- if there are 1257 00:58:47,260 --> 00:58:51,650 other things out there which cost $100 per kid and get 0.1, 1258 00:58:51,650 --> 00:58:54,150 then my thing, presumably, has got to do at least as well as 1259 00:58:54,150 --> 00:58:56,790 0.1 for $100-- suppose the other thing also costs $100 a 1260 00:58:56,790 --> 00:58:58,150 kid, I've got to do at least as well as 0.1. 1261 00:58:58,150 --> 00:58:59,760 Otherwise, I'd rather do this other thing. 1262 00:58:59,760 --> 00:59:01,640 So it's another way of getting at the effect size. 
1263 00:59:01,640 --> 00:59:04,715 AUDIENCE: Could you, then, also look at existing data in 1264 00:59:04,715 --> 00:59:08,110 the literature for the mean and variance thing, 1265 00:59:08,110 --> 00:59:08,740 or do you have to-- 1266 00:59:08,740 --> 00:59:10,480 PROFESSOR: You could, but this one is going to be more 1267 00:59:10,480 --> 00:59:11,385 sensitive to your population. 1268 00:59:11,385 --> 00:59:14,580 AUDIENCE: So it would just have to be very well-matched 1269 00:59:14,580 --> 00:59:16,165 to be able to use it. 1270 00:59:16,165 --> 00:59:16,540 PROFESSOR: Right. 1271 00:59:16,540 --> 00:59:19,240 I mean, look, if you don't have it, you could do it to 1272 00:59:19,240 --> 00:59:21,480 get a sense, but this is one where the different 1273 00:59:21,480 --> 00:59:23,730 populations are going to be very different in terms of 1274 00:59:23,730 --> 00:59:24,980 their mean and variance. 1275 00:59:28,540 --> 00:59:31,550 In order to get an estimate of this, you need a much, much, 1276 00:59:31,550 --> 00:59:33,970 much smaller sample size than you need to get an estimate of 1277 00:59:33,970 --> 00:59:35,970 the overall treatment effect of the program. 1278 00:59:35,970 --> 00:59:39,720 So you can often do a small survey-- 1279 00:59:39,720 --> 00:59:42,690 much, much, smaller than your big survey, but a small survey 1280 00:59:42,690 --> 00:59:44,810 just to get a sense of what these things look like. 1281 00:59:44,810 --> 00:59:47,070 And that can often be a very worthwhile thing to do. 1282 00:59:47,070 --> 00:59:49,115 AUDIENCE: I have a related question. 1283 00:59:51,680 --> 00:59:53,520 How often do you see-- 1284 00:59:53,520 --> 00:59:54,530 PROFESSOR: Oh, sorry. 1285 00:59:54,530 --> 00:59:55,390 I just wanted to do one other thing on this. 
1286 00:59:55,390 --> 00:59:57,550 I've had this come up in my own experience, where I've 1287 00:59:57,550 --> 01:00:01,050 done this small survey, and found that the baseline 1288 01:00:01,050 --> 01:00:02,350 situation was such that the whole experiment 1289 01:00:02,350 --> 01:00:03,010 didn't make any sense. 1290 01:00:03,010 --> 01:00:05,150 And we just canceled the experiment. 1291 01:00:05,150 --> 01:00:06,390 And it can be really useful. 1292 01:00:06,390 --> 01:00:11,050 If you say, if I do this and my power is 0.01, for 1293 01:00:11,050 --> 01:00:14,300 reasonable effect sizes, this is pointless. 1294 01:00:14,300 --> 01:00:16,110 So it can be worth it. 1295 01:00:16,110 --> 01:00:16,520 Sorry. 1296 01:00:16,520 --> 01:00:16,930 Go ahead. 1297 01:00:16,930 --> 01:00:21,460 AUDIENCE: So to estimate the effect size, have you seen 1298 01:00:21,460 --> 01:00:24,960 people run small pilots in different populations than 1299 01:00:24,960 --> 01:00:28,010 they're eventually going to do their impact evaluation to get 1300 01:00:28,010 --> 01:00:30,670 a sense of what effect size are they seeing with that same 1301 01:00:30,670 --> 01:00:30,965 intervention? 1302 01:00:30,965 --> 01:00:35,060 PROFESSOR: Not usually, because you can't do a small 1303 01:00:35,060 --> 01:00:37,500 pilot to get the effect size, right? 1304 01:00:37,500 --> 01:00:38,340 AUDIENCE: You're going to see something-- 1305 01:00:38,340 --> 01:00:39,290 PROFESSOR: You've got to do the whole thing. 1306 01:00:39,290 --> 01:00:40,060 AUDIENCE: Yeah, yeah. 1307 01:00:40,060 --> 01:00:40,310 PROFESSOR: Right? 1308 01:00:40,310 --> 01:00:41,610 That's the whole point of the power calculations is, in 1309 01:00:41,610 --> 01:00:45,220 order to detect an effect of that size, you need to do the 1310 01:00:45,220 --> 01:00:45,610 whole sample. 1311 01:00:45,610 --> 01:00:48,080 So a small pilot won't really do it. 1312 01:00:48,080 --> 01:00:49,556 AUDIENCE: OK. 
1313 01:00:49,556 --> 01:00:51,750 PROFESSOR: So it's not really going to-- you could get a-- 1314 01:00:51,750 --> 01:00:54,290 no, I guess you really can't get a sense because you would 1315 01:00:54,290 --> 01:00:56,530 need the whole experiment to detect the effect size. 1316 01:00:59,490 --> 01:01:03,265 AUDIENCE: Don't you think that there should be a lot more 1317 01:01:03,265 --> 01:01:05,850 conversation about effect size before things start? 1318 01:01:05,850 --> 01:01:09,130 Because if you've got a treatment, if you've got a 1319 01:01:09,130 --> 01:01:17,110 program, and you can't have a very-- and you've struggled to 1320 01:01:17,110 --> 01:01:21,170 have a good conversation about what is actually going to 1321 01:01:21,170 --> 01:01:23,560 happen to the kids or what's going to happen to the health 1322 01:01:23,560 --> 01:01:25,840 or what's going to happen to the income as a result of 1323 01:01:25,840 --> 01:01:29,420 this, it really may be quite telling that you really don't 1324 01:01:29,420 --> 01:01:30,680 know what you're doing. 1325 01:01:30,680 --> 01:01:34,690 That there isn't enough of a theory behind your-- or 1326 01:01:34,690 --> 01:01:37,920 practice or science or anything behind what your 1327 01:01:37,920 --> 01:01:38,900 program is. 1328 01:01:38,900 --> 01:01:43,605 If people are not pretty sure, what-- 1329 01:01:43,605 --> 01:01:45,010 PROFESSOR: I mean, yes and no-- 1330 01:01:45,010 --> 01:01:48,170 AUDIENCE: And then, also, on the resource allocation. 1331 01:01:48,170 --> 01:01:52,670 Resource allocation, it just seems to me, most of the time, 1332 01:01:52,670 --> 01:01:57,105 if your ultimate client is really probably the 1333 01:01:57,105 --> 01:01:57,980 government, right? 1334 01:01:57,980 --> 01:02:00,020 Because the government is the one that's going to make the 1335 01:02:00,020 --> 01:02:00,470 big resource allocations-- 1336 01:02:00,470 --> 01:02:02,760 PROFESSOR: It depends on who you're working with. 
1337 01:02:02,760 --> 01:02:04,450 It could be an NGO, whoever. 1338 01:02:04,450 --> 01:02:04,870 But yes. 1339 01:02:04,870 --> 01:02:08,540 AUDIENCE: No, but an NGO is doing something, usually, as a 1340 01:02:08,540 --> 01:02:12,540 demonstration that, in fact, if it works, then the 1341 01:02:12,540 --> 01:02:14,130 government should do it. 1342 01:02:14,130 --> 01:02:16,070 PROFESSOR: Not always, but there's someone who, 1343 01:02:16,070 --> 01:02:17,836 presumably, is going to scale up. 1344 01:02:17,836 --> 01:02:18,900 AUDIENCE: Right. 1345 01:02:18,900 --> 01:02:21,750 And yes, businesses, maybe, right? 1346 01:02:21,750 --> 01:02:24,430 But I would say, 90% of the time, it's going to be, 1347 01:02:24,430 --> 01:02:26,360 ultimately, the government needs to-- 1348 01:02:26,360 --> 01:02:27,480 PROFESSOR: Often, it's the government. 1349 01:02:27,480 --> 01:02:29,560 In India, for example, there are NGOs who are-- 1350 01:02:29,560 --> 01:02:32,250 I don't know who's worked on the Pratham reading thing. 1351 01:02:32,250 --> 01:02:33,470 They're trying to teach-- 1352 01:02:33,470 --> 01:02:35,750 NGOs trying to teach millions of kids to read, as an NGO. 1353 01:02:35,750 --> 01:02:37,170 So sometimes NGOs scale up too. 1354 01:02:37,170 --> 01:02:40,880 But anyway, you're right that there's an ultimate client 1355 01:02:40,880 --> 01:02:41,660 who's interested in this. 1356 01:02:41,660 --> 01:02:43,060 AUDIENCE: So then, having a conversation 1357 01:02:43,060 --> 01:02:45,160 very early on about-- 1358 01:02:45,160 --> 01:02:45,720 PROFESSOR: Yeah. 1359 01:02:45,720 --> 01:02:46,410 Could be very useful. 1360 01:02:46,410 --> 01:02:47,190 That's absolutely right. 1361 01:02:47,190 --> 01:02:47,980 That's absolutely right. 1362 01:02:47,980 --> 01:02:48,580 AUDIENCE: Because-- 1363 01:02:48,580 --> 01:02:50,770 PROFESSOR: Now, in terms of your point about theory, 1364 01:02:50,770 --> 01:02:52,560 though, yes and no. 
1365 01:02:52,560 --> 01:02:55,670 So I can design an experiment that's supposed to teach kids 1366 01:02:55,670 --> 01:02:57,210 how to read. 1367 01:02:57,210 --> 01:03:00,550 I know the theory says it should affect reading but I 1368 01:03:00,550 --> 01:03:02,750 have no idea how much. 1369 01:03:02,750 --> 01:03:03,090 And so-- 1370 01:03:03,090 --> 01:03:06,740 AUDIENCE: Wouldn't you say that a significant percentage 1371 01:03:06,740 --> 01:03:09,810 of the time, if it's a good theory about reading, it 1372 01:03:09,810 --> 01:03:10,760 actually should tell you? 1373 01:03:10,760 --> 01:03:11,630 PROFESSOR: Not always. 1374 01:03:11,630 --> 01:03:12,090 I mean-- 1375 01:03:12,090 --> 01:03:13,680 AUDIENCE: Well, then I'd say it's not such a 1376 01:03:13,680 --> 01:03:14,960 great theory, right? 1377 01:03:14,960 --> 01:03:15,725 Wouldn't you-- 1378 01:03:15,725 --> 01:03:18,000 PROFESSOR: It's a little bit semantic, but I think that a 1379 01:03:18,000 --> 01:03:20,540 lot of times, I can-- 1380 01:03:20,540 --> 01:03:23,240 say I'm going to teach kids to read a paragraph or whatever. 1381 01:03:23,240 --> 01:03:26,570 But what percentage of the kids is it going to work for? 1382 01:03:26,570 --> 01:03:30,550 What percentage of the kids are going to be affected? 1383 01:03:30,550 --> 01:03:33,580 I think that using theory to calculate how-- 1384 01:03:33,580 --> 01:03:35,060 I think theory can tell you a lot about what 1385 01:03:35,060 --> 01:03:36,420 variables should be affected. 1386 01:03:36,420 --> 01:03:38,620 And that's what we talked about in the last lecture. 1387 01:03:38,620 --> 01:03:41,110 I think theory can tell you what the sign of those effects 1388 01:03:41,110 --> 01:03:42,050 is likely to be. 1389 01:03:42,050 --> 01:03:45,350 I think it's often putting a lot of demands on your theory 1390 01:03:45,350 --> 01:03:47,440 to have it tell you the magnitude. 1391 01:03:47,440 --> 01:03:48,595 And that's why you want to do the experiment. 
1392 01:03:48,595 --> 01:03:51,540 AUDIENCE: And you just told me that even beyond the theory, 1393 01:03:51,540 --> 01:03:53,950 you say, well, but we did this in one school and we saw it 1394 01:03:53,950 --> 01:03:55,290 had this great thing, but you're saying-- 1395 01:03:55,290 --> 01:03:55,790 [INTERPOSING VOICES] 1396 01:03:55,790 --> 01:03:57,175 PROFESSOR: But your confidence interval is going to be-- 1397 01:03:57,175 --> 01:03:58,190 well, it's not nothing. 1398 01:03:58,190 --> 01:03:59,550 It's going to tell you something, but your confidence 1399 01:03:59,550 --> 01:04:00,310 interval is going to be enormous. 1400 01:04:00,310 --> 01:04:02,340 AUDIENCE: Right, nothing that you could 1401 01:04:02,340 --> 01:04:04,200 rely on to set a good-- 1402 01:04:04,200 --> 01:04:04,490 [INTERPOSING VOICES] 1403 01:04:04,490 --> 01:04:07,296 PROFESSOR: Right, it gives you a data point, but it's going 1404 01:04:07,296 --> 01:04:09,174 to have a huge confidence interval. 1405 01:04:09,174 --> 01:04:13,330 AUDIENCE: I don't want to belabor this, but if you think 1406 01:04:13,330 --> 01:04:14,490 about it in business terms, right? 1407 01:04:14,490 --> 01:04:16,250 I want to go out and raise some money. 1408 01:04:16,250 --> 01:04:17,440 PROFESSOR: Yes, absolutely. 1409 01:04:17,440 --> 01:04:17,790 [INTERPOSING VOICES] 1410 01:04:17,790 --> 01:04:18,535 AUDIENCE: --something. 1411 01:04:18,535 --> 01:04:20,520 And so, in order to raise that money, I have to tell you 1412 01:04:20,520 --> 01:04:22,680 that, in fact, you're going to make this much money. 1413 01:04:22,680 --> 01:04:23,360 PROFESSOR: Right. 1414 01:04:23,360 --> 01:04:25,816 AUDIENCE: And, of course, it could turn out to be wrong. 1415 01:04:25,816 --> 01:04:28,483 But I have to tell you you're going to get a 25% return on 1416 01:04:28,483 --> 01:04:28,790 your money. 
1417 01:04:28,790 --> 01:04:30,575 And that means I have to explain to you why this 1418 01:04:30,575 --> 01:04:32,347 business is going to be successful, how many people 1419 01:04:32,347 --> 01:04:34,320 are going to buy it, how I'm going to manage my costs down. 1420 01:04:34,320 --> 01:04:36,750 So it's always curious to me that, when you're talking 1421 01:04:36,750 --> 01:04:41,000 about social interventions, that I'm not having to make 1422 01:04:41,000 --> 01:04:45,120 that same argument with that same level of specificity, 1423 01:04:45,120 --> 01:04:46,930 which would mean talking about the effect size. 1424 01:04:46,930 --> 01:04:50,540 Because I can't raise money if I tell you, look, I might only 1425 01:04:50,540 --> 01:04:53,695 make you 5% or we might shoot the moon and make 100%. 1426 01:04:53,695 --> 01:04:55,290 You'll say, thank you very much. 1427 01:04:55,290 --> 01:04:56,690 This person doesn't know what their business is. 1428 01:04:56,690 --> 01:04:58,340 I'm not going to give them my money. 1429 01:04:58,340 --> 01:04:59,730 PROFESSOR: Right. 1430 01:04:59,730 --> 01:05:03,710 So you actually hit on exactly what's on the next slide. 1431 01:05:03,710 --> 01:05:07,110 Which is exactly what I was going to say, which is, what 1432 01:05:07,110 --> 01:05:09,000 you want to think about with your effect size is exactly 1433 01:05:09,000 --> 01:05:09,370 this thing. 1434 01:05:09,370 --> 01:05:10,850 What's the cost of this program versus 1435 01:05:10,850 --> 01:05:12,170 the benefit it brings? 1436 01:05:12,170 --> 01:05:15,220 And sometimes, what's the cost vis-a-vis alternative uses of 1437 01:05:15,220 --> 01:05:16,270 the money, right? 1438 01:05:16,270 --> 01:05:17,730 And that's going to be a conversation you're going to 1439 01:05:17,730 --> 01:05:19,960 have with your client, who is going to say, if the effect 1440 01:05:19,960 --> 01:05:22,200 size was 0.1, I would do it. 
1441 01:05:22,200 --> 01:05:23,730 And then you say, OK, I'm going to design an experiment 1442 01:05:23,730 --> 01:05:26,910 to see if it's 0.1 or bigger, right? 1443 01:05:26,910 --> 01:05:30,770 So I'm totally on board with that. 1444 01:05:30,770 --> 01:05:33,030 Because, as I was saying, if the effect size is smaller 1445 01:05:33,030 --> 01:05:35,610 than that, it still could be positive, but if your client 1446 01:05:35,610 --> 01:05:39,210 doesn't care, if it's not worth the money at that level, 1447 01:05:39,210 --> 01:05:41,210 then why do we need to design a big experiment 1448 01:05:41,210 --> 01:05:42,460 to pick that up? 1449 01:05:44,230 --> 01:05:46,590 It's also worth noting this is not your expected 1450 01:05:46,590 --> 01:05:48,540 effect size, right? 1451 01:05:48,540 --> 01:05:53,620 I could expect this thing to have an effect of 0.2 but even 1452 01:05:53,620 --> 01:05:55,040 if it was as low as 0.1, it would still be 1453 01:05:55,040 --> 01:05:56,430 worth doing, OK? 1454 01:05:56,430 --> 01:05:58,170 And in that case, I might want to design an experiment to 1455 01:05:58,170 --> 01:06:02,920 detect 0.1, right? 1456 01:06:02,920 --> 01:06:05,580 Conversely, you guys can all imagine the opposite, which is 1457 01:06:05,580 --> 01:06:08,350 you could say, I expect this thing to be 0.1, 1458 01:06:08,350 --> 01:06:11,090 but maybe it's 0.2. 1459 01:06:11,090 --> 01:06:12,020 Maybe it's actually-- 1460 01:06:12,020 --> 01:06:14,260 I'm not sure how good it is. 1461 01:06:14,260 --> 01:06:14,850 I think it's OK. 1462 01:06:14,850 --> 01:06:17,370 But maybe it could be really great. 1463 01:06:17,370 --> 01:06:19,550 And if it was really great, I would want to adopt it, so I 1464 01:06:19,550 --> 01:06:21,540 would design an experiment to detect 0.2. 1465 01:06:21,540 --> 01:06:25,120 So it's not the expected effect size, it's what you 1466 01:06:25,120 --> 01:06:26,758 would use to adopt the program. 
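[EDITOR'S EXAMPLE: The logic of designing for the adoption threshold rather than the expected effect can be sketched with the standard two-arm sample-size formula, n per arm = 2 * ((z_alpha + z_beta) / delta)^2, for a unit-variance outcome and individual-level randomization. The numbers are illustrative, not from the lecture:]

```python
import math
from statistics import NormalDist

def n_per_arm(delta, alpha=0.05, power=0.80):
    # Sample size per arm to detect a standardized effect `delta`
    # with a two-sided test: n = 2 * ((z_alpha + z_beta) / delta)^2.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
    z_beta = NormalDist().inv_cdf(power)           # ~0.84
    return math.ceil(2 * ((z_alpha + z_beta) / delta) ** 2)

# Designing to detect the adoption threshold (0.1) takes roughly
# four times the sample of designing for the optimistic guess (0.2).
print(n_per_arm(0.2))  # 393
print(n_per_arm(0.1))  # 1570
```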
1467 01:06:33,180 --> 01:06:35,180 When we talk about effect sizes, we 1468 01:06:35,180 --> 01:06:37,282 often talk about them-- 1469 01:06:37,282 --> 01:06:40,970 we talk about what we call standardized effect size, OK? 1470 01:06:46,020 --> 01:06:48,670 As I mentioned, how large an effect you can detect depends 1471 01:06:48,670 --> 01:06:51,010 on how variable your sample is. 1472 01:06:51,010 --> 01:06:53,830 So if everyone's the same, it's very 1473 01:06:53,830 --> 01:06:55,870 easy to pick up effects. 1474 01:06:55,870 --> 01:06:58,770 And we often say that standardized effects are the 1475 01:06:58,770 --> 01:07:01,050 effect size divided by the standard deviation of the 1476 01:07:01,050 --> 01:07:03,350 outcome, OK? 1477 01:07:03,350 --> 01:07:05,250 So standard deviation of outcome is the measure of how 1478 01:07:05,250 --> 01:07:06,600 variable your outcome is. 1479 01:07:06,600 --> 01:07:10,530 So we often express our effect sizes relative to the standard 1480 01:07:10,530 --> 01:07:12,680 deviation of the outcome, OK? 1481 01:07:12,680 --> 01:07:14,310 And so when I was talking about test scores, for 1482 01:07:14,310 --> 01:07:16,770 example, test scores are usually normalized to have a 1483 01:07:16,770 --> 01:07:18,290 standard deviation of one. 1484 01:07:18,290 --> 01:07:20,400 So this is actually how we normally express things in 1485 01:07:20,400 --> 01:07:22,510 terms of test scores, but we could do it for anything. 1486 01:07:22,510 --> 01:07:25,850 And so effect sizes of 0.1, 0.2 are small. 1487 01:07:25,850 --> 01:07:26,910 0.4 are medium. 1488 01:07:26,910 --> 01:07:28,150 0.5 are large. 1489 01:07:28,150 --> 01:07:29,830 Now what do we mean by that? 1490 01:07:29,830 --> 01:07:31,790 This is actually a very helpful way of thinking about 1491 01:07:31,790 --> 01:07:34,580 what a standardized effect size is telling you. 
1492 01:07:34,580 --> 01:07:37,830 So a standardized effect size of 0.2, which is what we were 1493 01:07:37,830 --> 01:07:43,350 saying was a modest one, means that the average person in the 1494 01:07:43,350 --> 01:07:47,980 treatment group, the median or the mean person of the 1495 01:07:47,980 --> 01:07:52,610 treatment group, had a better outcome than 58% of the people 1496 01:07:52,610 --> 01:07:54,930 in the control group. 1497 01:07:54,930 --> 01:07:57,810 So remember, if it was zero, it would be 50-50. 1498 01:07:57,810 --> 01:07:58,840 It would be 50%, right? 1499 01:07:58,840 --> 01:08:01,160 If there was no effect, the distributions would line up 1500 01:08:01,160 --> 01:08:03,400 and this person's in the treatment group-- 1501 01:08:03,400 --> 01:08:04,590 the median person in the treatment group would be 1502 01:08:04,590 --> 01:08:09,150 better than 50% of the people in the control group. 1503 01:08:09,150 --> 01:08:11,680 So this is saying, instead of lining up at exactly 50-50, 1504 01:08:11,680 --> 01:08:15,920 it's lining up 58%-50%, OK? 1505 01:08:15,920 --> 01:08:20,700 If you get an effect size of 0.5, which we were saying was 1506 01:08:20,700 --> 01:08:24,490 a large effect, that means that 69% of the people in the 1507 01:08:24,490 --> 01:08:26,720 treatment group are going to be bigger than the median 1508 01:08:26,720 --> 01:08:29,484 person in the control group. 1509 01:08:29,484 --> 01:08:31,100 Sorry, it's the other way around. 1510 01:08:31,100 --> 01:08:32,490 The average member of the intervention group is better 1511 01:08:32,490 --> 01:08:36,310 than 69% of people in the control group. 1512 01:08:36,310 --> 01:08:37,950 So the distributions are still overlapping. 1513 01:08:37,950 --> 01:08:39,170 But now there's-- 1514 01:08:39,170 --> 01:08:42,170 the middle of the treatment distribution is at the 69th 1515 01:08:42,170 --> 01:08:45,779 percentile of the control. 
1516 01:08:45,779 --> 01:08:49,800 And a large effect of 0.8 would mean that the median 1517 01:08:49,800 --> 01:08:55,210 person in the treatment group is at the 79th percentile of 1518 01:08:55,210 --> 01:08:56,970 the control. 1519 01:08:56,970 --> 01:08:58,580 That just gives you a sense of when we're talking about 1520 01:08:58,580 --> 01:09:02,180 standardized effect sizes, how big we're talking about. 1521 01:09:02,180 --> 01:09:04,990 And so you can see that 0.2, is actually-- 1522 01:09:04,990 --> 01:09:08,689 you can imagine is going to be pretty hard to detect, right? 1523 01:09:08,689 --> 01:09:10,800 If the median person in the treatment group looks like the 1524 01:09:10,800 --> 01:09:14,029 58th percentile of the control group, that's going to be a 1525 01:09:14,029 --> 01:09:18,800 case where those distributions have a lot of overlap, right? 1526 01:09:18,800 --> 01:09:20,450 And so this is going to be much harder to detect than 1527 01:09:20,450 --> 01:09:25,130 this case when the overlap is much smaller. 1528 01:09:25,130 --> 01:09:25,330 Yeah. 1529 01:09:25,330 --> 01:09:28,950 AUDIENCE: So in your experience, what do most 1530 01:09:28,950 --> 01:09:30,826 people think their effect size is? 1531 01:09:30,826 --> 01:09:32,140 Where do they settle? 1532 01:09:32,140 --> 01:09:34,080 They probably wouldn't settle at 0.2? 1533 01:09:34,080 --> 01:09:36,649 PROFESSOR: Actually, a lot of people in a lot of educational 1534 01:09:36,649 --> 01:09:36,989 interventions-- 1535 01:09:36,989 --> 01:09:38,680 AUDIENCE: That's enough for them? 1536 01:09:38,680 --> 01:09:40,439 PROFESSOR: Yeah. 1537 01:09:40,439 --> 01:09:42,279 I would say the typical intervention that people study 1538 01:09:42,279 --> 01:09:44,370 that I've seen in education, the effect size is in the 1539 01:09:44,370 --> 01:09:50,284 0.15, 0.2 range. 1540 01:09:50,284 --> 01:09:52,019 It turns out it's really hard to move test scores. 1541 01:09:52,019 --> 01:09:52,982 AUDIENCE: Yeah. 
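[EDITOR'S EXAMPLE: The percentile figures quoted above (58%, 69%, 79%) follow from the normal CDF: with normally distributed outcomes of equal spread in both groups, the mean treated person sits at the Phi(delta) percentile of the control distribution. A quick check in Python, illustrative and not part of the lecture:]

```python
from statistics import NormalDist

def treated_mean_percentile(delta):
    # Percentile of the control distribution reached by the mean
    # treated person, assuming normal outcomes with equal variance.
    return round(100 * NormalDist().cdf(delta))

for d in (0.0, 0.2, 0.5, 0.8):
    print(d, treated_mean_percentile(d))  # 50, 58, 69, 79
```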
1542 01:09:52,982 --> 01:09:56,570 PROFESSOR: So yeah, I would say a lot of-- 1543 01:09:56,570 --> 01:09:58,410 but you'll see when you do the power calculations, that to 1544 01:09:58,410 --> 01:10:00,040 detect 0.2, you often need a pretty big sample. 1545 01:10:03,810 --> 01:10:05,620 Look, it depends a lot on what your intervention is, but I've 1546 01:10:05,620 --> 01:10:09,270 seen a lot in that range. 1547 01:10:09,270 --> 01:10:13,340 And I'm just trying to think of an experiment I did. 1548 01:10:13,340 --> 01:10:14,940 I can't think of it off hand. 1549 01:10:14,940 --> 01:10:17,280 But yeah, I would say a lot in this range. 1550 01:10:17,280 --> 01:10:19,777 AUDIENCE: So would the converse be true, that in 1551 01:10:19,777 --> 01:10:22,890 fact, you don't see too many that have a real 1552 01:10:22,890 --> 01:10:25,962 large effect size? 1553 01:10:25,962 --> 01:10:28,490 PROFESSOR: I would say it's pretty rare that I see 1554 01:10:28,490 --> 01:10:31,238 interventions that are 0.8. 1555 01:10:31,238 --> 01:10:31,704 Yeah. 1556 01:10:31,704 --> 01:10:34,160 AUDIENCE: Do you think it's valuable that just because 1557 01:10:34,160 --> 01:10:36,500 you're setting a low effect size in designing your 1558 01:10:36,500 --> 01:10:37,320 experiment, you're being conservative. 1559 01:10:37,320 --> 01:10:39,140 You can still pick up a [UNINTELLIGIBLE] effect size-- 1560 01:10:39,140 --> 01:10:39,740 PROFESSOR: Of course. 1561 01:10:39,740 --> 01:10:41,412 AUDIENCE: It's just in the design process-- 1562 01:10:41,412 --> 01:10:41,830 [INTERPOSING VOICES] 1563 01:10:41,830 --> 01:10:42,450 PROFESSOR: Right. 1564 01:10:42,450 --> 01:10:44,520 This is the minimum thing you could pick up. 1565 01:10:44,520 --> 01:10:45,720 That's absolutely right. 1566 01:10:45,720 --> 01:10:46,460 That's right. 
1567 01:10:46,460 --> 01:10:49,470 So right, if you design for 0.2 but, in fact, your thing 1568 01:10:49,470 --> 01:10:54,150 is amazing and does 0.8, well, there's no problem at all. 1569 01:10:54,150 --> 01:10:56,520 You'll have a p-value of 0.00 something. 1570 01:10:56,520 --> 01:10:59,184 You'll have a very strong [INAUDIBLE]. 1571 01:10:59,184 --> 01:11:01,900 It's a good point. 1572 01:11:01,900 --> 01:11:03,510 OK. 1573 01:11:03,510 --> 01:11:06,950 So how do we actually calculate our power? 1574 01:11:06,950 --> 01:11:10,390 So there's actually a very nice software package, which, 1575 01:11:10,390 --> 01:11:12,330 have you guys started using this yet? 1576 01:11:12,330 --> 01:11:12,810 Yeah? 1577 01:11:12,810 --> 01:11:12,990 OK. 1578 01:11:12,990 --> 01:11:14,110 AUDIENCE: I have a question. 1579 01:11:14,110 --> 01:11:15,970 Can you just clarify something before you go on? 1580 01:11:15,970 --> 01:11:16,900 PROFESSOR: Yeah. 1581 01:11:16,900 --> 01:11:20,208 AUDIENCE: So by rejecting a null hypothesis, you won't be 1582 01:11:20,208 --> 01:11:23,064 able to say what the expected effect is, so you won't be 1583 01:11:23,064 --> 01:11:24,254 able to necessarily quantify the impact. 1584 01:11:24,254 --> 01:11:26,865 PROFESSOR: No, that's not quite right. 1585 01:11:26,865 --> 01:11:27,697 AUDIENCE: OK. 1586 01:11:27,697 --> 01:11:31,570 PROFESSOR: So you're going to estimate your-- 1587 01:11:31,570 --> 01:11:34,260 you run your experiment, you're going to get a beta, 1588 01:11:34,260 --> 01:11:35,940 which is your estimate, And you're going to 1589 01:11:35,940 --> 01:11:38,470 get a standard error. 1590 01:11:38,470 --> 01:11:42,110 You reject the null, which means you say with 95% 1591 01:11:42,110 --> 01:11:44,540 probability, I'm in my confidence interval. 1592 01:11:44,540 --> 01:11:48,360 So you know you're somewhere in the confidence interval. 
1593 01:11:48,360 --> 01:11:50,990 And then beyond that, you have an estimate of where in the 1594 01:11:50,990 --> 01:11:52,360 confidence interval you are. 1595 01:11:52,360 --> 01:11:54,210 And your best estimate for where you are on the 1596 01:11:54,210 --> 01:11:57,080 confidence interval is your point estimate. 1597 01:11:57,080 --> 01:11:58,230 Does that make sense? 1598 01:11:58,230 --> 01:12:02,380 So in terms of thinking through the cost-benefit or 1599 01:12:02,380 --> 01:12:05,080 whatever, your best guess of the effect of the program is 1600 01:12:05,080 --> 01:12:07,470 your point estimate, is your beta. 1601 01:12:07,470 --> 01:12:10,480 If you wanted to be a little more precise about it, you 1602 01:12:10,480 --> 01:12:11,730 could say-- 1603 01:12:19,190 --> 01:12:25,090 so this is your estimate, this is your beta hat, this is your 1604 01:12:25,090 --> 01:12:26,810 confidence interval, right? 1605 01:12:26,810 --> 01:12:29,970 Zero is over here, so you can reject zero in this case. 1606 01:12:29,970 --> 01:12:34,210 But, in fact, there's a distribution of where your 1607 01:12:34,210 --> 01:12:36,100 estimates are likely to be. 1608 01:12:36,100 --> 01:12:37,870 And when we said it was 95% confidence interval, that's 1609 01:12:37,870 --> 01:12:41,100 because the probability of being over here is 95%. 1610 01:12:41,100 --> 01:12:43,710 But this says you're most likely to be right here, but 1611 01:12:43,710 --> 01:12:45,430 there's some probability over here. 1612 01:12:45,430 --> 01:12:48,980 You're more likely to be near beta then you are to be very-- 1613 01:12:48,980 --> 01:12:51,160 it's not that you're equally likely to be anywhere in your 1614 01:12:51,160 --> 01:12:52,350 confidence interval. 1615 01:12:52,350 --> 01:12:54,700 You're most likely to be right near your point estimate. 
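[EDITOR'S EXAMPLE: The point that the estimate is most likely near beta hat can be used directly: treating the estimate as normal around beta hat with its standard error, you can compute the probability that the true effect clears any threshold the client cares about. The estimate and standard error below are hypothetical:]

```python
from statistics import NormalDist

def prob_effect_above(threshold, beta_hat, se):
    # P(true effect > threshold), treating the estimate as normal
    # around beta_hat with standard error se.
    return 1 - NormalDist(mu=beta_hat, sigma=se).cdf(threshold)

# Hypothetical estimate: 0.2 with standard error 0.05.
# Zero is clearly rejected, but effects below 0.15 still
# carry noticeable probability.
print(round(prob_effect_above(0.15, 0.2, 0.05), 2))  # 0.84
```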
1616 01:12:54,700 --> 01:12:58,740 So, in fact, if you actually cared about the range, you 1617 01:12:58,740 --> 01:13:01,060 could say, well, what's the probability I'm over here? 1618 01:13:01,060 --> 01:13:02,410 And calculate that. 1619 01:13:02,410 --> 01:13:03,340 What's the probability I'm over here? 1620 01:13:03,340 --> 01:13:06,130 And you could average them to calculate the average benefit 1621 01:13:06,130 --> 01:13:07,390 of your program. 1622 01:13:07,390 --> 01:13:09,230 Usually, though, we don't bother to do this and usually 1623 01:13:09,230 --> 01:13:11,680 what we do is we say our best estimate is that you're right 1624 01:13:11,680 --> 01:13:12,250 at beta hat. 1625 01:13:12,250 --> 01:13:14,372 That is our best estimate and we calculate our estimate 1626 01:13:14,372 --> 01:13:15,622 based on that. 1627 01:13:18,480 --> 01:13:20,215 But in theory, you could use the whole distribution 1628 01:13:20,215 --> 01:13:22,392 [INAUDIBLE]. 1629 01:13:22,392 --> 01:13:23,642 OK. 1630 01:13:27,600 --> 01:13:28,850 OK, so suppose we want-- 1631 01:13:28,850 --> 01:13:31,010 so how do we actually calculate some of these? 1632 01:13:31,010 --> 01:13:34,710 So using the software helps get a sense, intuitively, of 1633 01:13:34,710 --> 01:13:35,910 what these tradeoffs are going to look like. 1634 01:13:35,910 --> 01:13:37,690 And I don't know that I'll have time to go through all 1635 01:13:37,690 --> 01:13:41,320 this, but we'll go through most of it, OK? 1636 01:13:41,320 --> 01:13:43,960 So for example, so if you run the software and look at power 1637 01:13:43,960 --> 01:13:45,210 versus number of clusters-- 1638 01:13:50,156 --> 01:13:52,340 hold on. 1639 01:13:52,340 --> 01:13:54,430 So how would you set this up in the software? 1640 01:13:54,430 --> 01:13:58,490 So we'll talk about clustered effects in a sec. 1641 01:13:58,490 --> 01:14:03,540 As we discussed, you have to pick a significance level. 
1642 01:14:03,540 --> 01:14:05,930 You have to pick a standardized effect size. 1643 01:14:05,930 --> 01:14:07,360 That's what delta is in the software. 1644 01:14:07,360 --> 01:14:10,850 So we use 0.2, OK? 1645 01:14:10,850 --> 01:14:12,620 In the software, it's always a standardized effect size. 1646 01:14:12,620 --> 01:14:13,650 You just divide by your standard 1647 01:14:13,650 --> 01:14:16,260 deviation of your outcome. 1648 01:14:16,260 --> 01:14:18,400 That's why you need to know your actual outcome 1649 01:14:18,400 --> 01:14:19,670 variable-- if I think the 1650 01:14:19,670 --> 01:14:21,690 actual effect is, whatever, 1651 01:14:21,690 --> 01:14:23,590 people get one centimeter taller, then in order to get a 1652 01:14:23,590 --> 01:14:24,920 standardized effect size, I need to know the standard 1653 01:14:24,920 --> 01:14:27,290 deviation of my outcome variable. 1654 01:14:27,290 --> 01:14:34,660 And the program is going to give you the power as a 1655 01:14:34,660 --> 01:14:39,180 function of your sample size, OK? 1656 01:14:39,180 --> 01:14:42,960 And one of the things that you can see is that this is not 1657 01:14:42,960 --> 01:14:46,410 necessarily a linear relationship, right? 1658 01:14:46,410 --> 01:14:51,020 So for example, here, we've plotted a delta of-- 1659 01:14:51,020 --> 01:14:54,050 effect size of 0.2 and here's an effect size of 0.4. 1660 01:14:54,050 --> 01:14:58,930 So this says that with about 200 clusters, you're going to 1661 01:14:58,930 --> 01:15:04,315 get to a power of 0.8 with the effect size of 0.4, but you're 1662 01:15:04,315 --> 01:15:07,010 still going to be at a power of 0.2 with an 1663 01:15:07,010 --> 01:15:08,670 effect size of 0.2. 1664 01:15:08,670 --> 01:15:11,570 So the formulas are complicated. 1665 01:15:11,570 --> 01:15:13,940 Power is not necessarily a linear function of your sample size. 
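[EDITOR'S EXAMPLE: The nonlinearity is easy to see even in the simplest case. For individual-level randomization with a unit-variance outcome, power is approximately Phi(delta * sqrt(n/2) - z_alpha) with n per arm. This ignores clustering, so the numbers differ from the cluster plot discussed above, but the shape of the tradeoff is the same. An illustrative sketch:]

```python
from statistics import NormalDist

def approx_power(delta, n_per_arm, alpha=0.05):
    # Approximate power of a two-arm comparison of means with a
    # unit-variance outcome: Phi(delta * sqrt(n/2) - z_alpha).
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta * (n_per_arm / 2) ** 0.5 - z_alpha)

# Halving the effect size collapses power far more than
# proportionally at this sample size.
print(round(approx_power(0.4, 200), 2))  # 0.98
print(round(approx_power(0.2, 200), 2))  # 0.52
```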
1666 01:15:22,050 --> 01:15:24,750 When we think about power, we've talked about a couple of 1667 01:15:24,750 --> 01:15:29,260 things that influence our power in terms of the variance 1668 01:15:29,260 --> 01:15:30,800 of our outcome, right? 1669 01:15:30,800 --> 01:15:32,890 The variance of our outcome, how big our effect size is. 1670 01:15:32,890 --> 01:15:34,180 And those are the basic things that are going 1671 01:15:34,180 --> 01:15:35,420 to affect our power. 1672 01:15:35,420 --> 01:15:39,650 But there are things that we can do in our experiment-- 1673 01:15:39,650 --> 01:15:41,360 in the way we design our experiment that are also going 1674 01:15:41,360 --> 01:15:44,470 to make our experiment more or less powerful. 1675 01:15:44,470 --> 01:15:45,790 And here are some of the things that we can do. 1676 01:15:49,720 --> 01:15:52,390 One thing that we can do is we can think 1677 01:15:52,390 --> 01:15:55,240 about having a cluster-- 1678 01:15:55,240 --> 01:15:57,330 so whether we randomize at the individual 1679 01:15:57,330 --> 01:16:03,320 level or in clusters, whether we have a baseline, whether we 1680 01:16:03,320 --> 01:16:06,570 use control variables or stratification, and the type 1681 01:16:06,570 --> 01:16:09,750 of hypothesis being tested. 1682 01:16:09,750 --> 01:16:12,550 All four of these are things that we're going to do that 1683 01:16:12,550 --> 01:16:15,130 for a given outcome variable and a given effect size, in 1684 01:16:15,130 --> 01:16:17,750 some sense, are going to affect how powerful our 1685 01:16:17,750 --> 01:16:20,490 experiment is. 1686 01:16:20,490 --> 01:16:22,800 In some sense-- 1687 01:16:22,800 --> 01:16:24,820 given that I may not have time to finish everything, the one 1688 01:16:24,820 --> 01:16:28,550 that I want to focus on is the clustering issue. 
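[EDITOR'S EXAMPLE: One of the levers just listed, a baseline or control variables, works by soaking up residual variance. Under a standard approximation not stated in the lecture, controls that explain a share R-squared of the outcome variance cut the required sample roughly in proportion to (1 - R-squared). The numbers here are hypothetical:]

```python
import math

def n_with_controls(n_without, r_squared):
    # Controls explaining a share r_squared of outcome variance
    # shrink the required sample roughly by (1 - r_squared).
    return math.ceil(n_without * (1 - r_squared))

# A baseline that predicts half the variance in the endline
# outcome roughly halves the sample you need.
print(n_with_controls(400, 0.5))  # 200
```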
1689 01:16:28,550 --> 01:16:31,720 This is the one that is the biggest for designing 1690 01:16:31,720 --> 01:16:37,270 experiments, and it often makes a big difference. 1691 01:16:37,270 --> 01:16:43,640 So the intuition for clustering is that-- 1692 01:16:43,640 --> 01:16:45,040 so what is clustering? 1693 01:16:45,040 --> 01:16:49,480 Clustering is, instead of randomizing-- suppose I want 1694 01:16:49,480 --> 01:16:54,750 to do an experiment on whether the J-PAL executive ed class 1695 01:16:54,750 --> 01:16:56,980 improves your ability to-- 1696 01:16:56,980 --> 01:16:59,440 whether you took this lecture improves your understanding of 1697 01:16:59,440 --> 01:17:01,820 power calculation, OK? 1698 01:17:01,820 --> 01:17:05,380 Suppose I randomly sampled this half of the room and gave 1699 01:17:05,380 --> 01:17:08,980 you my lecture and this half was the control group. 1700 01:17:08,980 --> 01:17:11,270 And I flipped a coin so I split you in halves down the 1701 01:17:11,270 --> 01:17:13,160 middle and I said, OK, I'm going to flip a coin, which is 1702 01:17:13,160 --> 01:17:14,410 control, which is treatment. 1703 01:17:16,590 --> 01:17:20,020 You guys, presumably you all sat with your friends, OK? 1704 01:17:20,020 --> 01:17:23,280 So people on this side of the room are going to be more like 1705 01:17:23,280 --> 01:17:27,440 each other then people on that side of the room, OK? 1706 01:17:27,440 --> 01:17:32,620 So I didn't get an independent sample, right? 1707 01:17:32,620 --> 01:17:35,810 This group, their outcomes are going to be correlated because 1708 01:17:35,810 --> 01:17:37,510 some of you are friends and have similar 1709 01:17:37,510 --> 01:17:38,530 backgrounds and skills. 1710 01:17:38,530 --> 01:17:41,350 And this group is going to be correlated. 
1711 01:17:41,350 --> 01:17:44,490 On the other hand, suppose I had gone through everyone and 1712 01:17:44,490 --> 01:17:46,450 randomly flipped a coin for every person and said, 1713 01:17:46,450 --> 01:17:47,200 treatment or control, treatment or control, 1714 01:17:47,200 --> 01:17:49,920 treatment or control? 1715 01:17:49,920 --> 01:17:53,540 In that case, I would've flipped the coin 60 times and 1716 01:17:53,540 --> 01:17:56,070 there would be no correlation between who is in the control 1717 01:17:56,070 --> 01:17:57,550 group and who is in the treatment group because I 1718 01:17:57,550 --> 01:17:59,690 wouldn't have been randomizing you into the same groups 1719 01:17:59,690 --> 01:18:02,550 together, OK? 1720 01:18:02,550 --> 01:18:07,070 By doing the cluster design, splitting you in half and then 1721 01:18:07,070 --> 01:18:10,170 randomizing treatment versus control or splitting you into 1722 01:18:10,170 --> 01:18:11,600 groups of 10-- 1723 01:18:11,600 --> 01:18:12,920 you five, you 10, you 10. 1724 01:18:12,920 --> 01:18:15,660 You 10, you 10, you 10, and then flipping the coin. 1725 01:18:15,660 --> 01:18:18,440 I have less variation, in some sense, than if I had flipped 1726 01:18:18,440 --> 01:18:19,890 the coin in individual-- 1727 01:18:19,890 --> 01:18:21,220 person by person-- 1728 01:18:21,220 --> 01:18:23,510 because those groups are going to be correlated. 1729 01:18:23,510 --> 01:18:25,580 They're going to have similar outcomes. 1730 01:18:25,580 --> 01:18:31,100 So the basic point is that your power is going to be-- 1731 01:18:31,100 --> 01:18:33,790 the more times you flip the coin to randomize treatment 1732 01:18:33,790 --> 01:18:35,780 and control, essentially, the more power you're going to 1733 01:18:35,780 --> 01:18:37,950 have because the more your different groups are going to 1734 01:18:37,950 --> 01:18:40,160 be independent, OK? 
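[EDITOR'S EXAMPLE: The coin-flip intuition is usually quantified with the design effect, DEFF = 1 + (m - 1) * rho, where m is the cluster size and rho the intracluster correlation; the effective sample size is the actual sample divided by DEFF. The formula is standard but not stated verbatim in the lecture, and the rho = 0.3 below is hypothetical:]

```python
def effective_n(n_total, cluster_size, icc):
    # Effective sample size under the design effect
    # DEFF = 1 + (m - 1) * rho, for equal clusters of size m.
    deff = 1 + (cluster_size - 1) * icc
    return n_total / deff

# 60 students randomized as two room-halves of 30, with
# intracluster correlation 0.3, carry about as much information
# as 6 independent coin flips.
print(round(effective_n(60, 30, 0.3), 1))  # 6.2
```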
1735 01:18:40,160 --> 01:18:44,430 So to go through this again, suppose you wanted to know-- 1736 01:18:44,430 --> 01:18:46,340 this is, in general, about clustering. 1737 01:18:46,340 --> 01:18:49,600 Suppose you wanted to know what the outcome of the national 1738 01:18:49,600 --> 01:18:51,180 elections is going to be. 1739 01:18:51,180 --> 01:18:53,520 So you could either randomly sample 50 people from the 1740 01:18:53,520 --> 01:18:56,040 entire Indian population, or you randomly pick five 1741 01:18:56,040 --> 01:18:59,470 families and you ask 10 people per family what 1742 01:18:59,470 --> 01:19:01,590 their opinions are. 1743 01:19:01,590 --> 01:19:03,960 Clearly, this is going to give you more information than this 1744 01:19:03,960 --> 01:19:06,110 is because those family members are going to be 1745 01:19:06,110 --> 01:19:08,160 correlated, right? 1746 01:19:08,160 --> 01:19:10,690 I have views like my wife and like my father, et cetera. 1747 01:19:10,690 --> 01:19:12,990 So we're not getting independent views, whereas 1748 01:19:12,990 --> 01:19:15,565 here, you're getting, really, 50 independent data points. 1749 01:19:15,565 --> 01:19:16,910 And that's the same as what we were talking 1750 01:19:16,910 --> 01:19:19,230 about with the class. 1751 01:19:19,230 --> 01:19:21,700 So this approach is going to have more power than this 1752 01:19:21,700 --> 01:19:24,132 approach because of the way you did the sample. 1753 01:19:24,132 --> 01:19:24,568 Yeah. 1754 01:19:24,568 --> 01:19:26,465 AUDIENCE: So is the only reason that you would cluster, 1755 01:19:26,465 --> 01:19:30,370 then, just because you had to because you had no choice-- 1756 01:19:30,370 --> 01:19:31,230 PROFESSOR: Yes. 1757 01:19:31,230 --> 01:19:34,824 AUDIENCE: --for political reasons or just feasibility. 1758 01:19:34,824 --> 01:19:35,782 PROFESSOR: And cost. 1759 01:19:35,782 --> 01:19:36,740 AUDIENCE: And cost. 1760 01:19:36,740 --> 01:19:37,410 PROFESSOR: Yeah.
1761 01:19:37,410 --> 01:19:38,810 AUDIENCE: Well, and the level of intervention. 1762 01:19:38,810 --> 01:19:40,800 PROFESSOR: Exactly. 1763 01:19:40,800 --> 01:19:41,550 And we'll talk about that. 1764 01:19:41,550 --> 01:19:44,040 There are lots of reasons people-- 1765 01:19:44,040 --> 01:19:47,260 given this issue, people have lots of good reasons for 1766 01:19:47,260 --> 01:19:51,720 clustering, but the point is that there are negative 1767 01:19:51,720 --> 01:19:52,830 tradeoffs for sample size. 1768 01:19:52,830 --> 01:19:56,360 AUDIENCE: About the clusters. 1769 01:19:56,360 --> 01:20:03,260 If you flip the coin for all of the class and then after, 1770 01:20:03,260 --> 01:20:06,690 you decide that you will select among the people that 1771 01:20:06,690 --> 01:20:10,880 you have assigned, you will select those seated-- 1772 01:20:10,880 --> 01:20:15,000 you will select half of those seated on the left. 1773 01:20:15,000 --> 01:20:16,290 Will that solve the problem-- 1774 01:20:16,290 --> 01:20:18,760 PROFESSOR: You select half of the ones seated on the left? 1775 01:20:18,760 --> 01:20:19,290 AUDIENCE: Yeah. 1776 01:20:19,290 --> 01:20:21,670 PROFESSOR: Well, it's a different issue. 1777 01:20:21,670 --> 01:20:25,300 Suppose I first select the left and now I go one by one, 1778 01:20:25,300 --> 01:20:28,140 flip a coin of the people on the left. 1779 01:20:28,140 --> 01:20:30,270 I don't have the clustering issue because I flipped the 1780 01:20:30,270 --> 01:20:32,310 coin per person. 1781 01:20:32,310 --> 01:20:34,860 But I have a different issue, which is that the people I 1782 01:20:34,860 --> 01:20:36,860 selected are not necessarily representative of the whole 1783 01:20:36,860 --> 01:20:40,450 population because I didn't pick a representative sample. 1784 01:20:40,450 --> 01:20:42,040 I picked the ones who happened to sit over here. 
1785 01:20:42,040 --> 01:20:42,966 AUDIENCE: My question-- 1786 01:20:42,966 --> 01:20:45,320 PROFESSOR: So there's two different issues. 1787 01:20:45,320 --> 01:20:50,080 One is, essentially, how many times you flip a coin is how 1788 01:20:50,080 --> 01:20:53,000 much power you have, how independent your sample is. 1789 01:20:53,000 --> 01:20:55,950 The other issue is, is this group here representative of 1790 01:20:55,950 --> 01:20:57,700 the entire population? 1791 01:20:57,700 --> 01:21:01,450 You might think that people who sit near the window like 1792 01:21:01,450 --> 01:21:03,310 to look at the river are daydreamers and they're not as 1793 01:21:03,310 --> 01:21:06,020 good at math as people who don't sit near the window. 1794 01:21:06,020 --> 01:21:09,850 And so I would get the effect of my treatment on people who 1795 01:21:09,850 --> 01:21:11,630 like to sit near the window and aren't as good at math. 1796 01:21:11,630 --> 01:21:13,310 And that might be a different treatment effect than if I had 1797 01:21:13,310 --> 01:21:14,530 done it over the whole room. 1798 01:21:14,530 --> 01:21:18,286 So it's a different issue.
1810 01:21:53,540 --> 01:21:54,850 I think what you're saying may be about stratification. 1811 01:21:54,850 --> 01:21:57,550 Why don't we talk about it later? 1812 01:21:57,550 --> 01:22:00,050 Because we're running a little short on time. 1813 01:22:00,050 --> 01:22:00,820 In fact, can I borrow someone's handouts? 1814 01:22:00,820 --> 01:22:04,950 Because I want to make sure I cover the most important stuff 1815 01:22:04,950 --> 01:22:05,865 in the lecture. 1816 01:22:05,865 --> 01:22:07,115 Let me just see where we are. 1817 01:22:13,910 --> 01:22:15,251 OK. 1818 01:22:15,251 --> 01:22:16,940 AUDIENCE: And if you need to, you can 1819 01:22:16,940 --> 01:22:18,017 take ten extra minutes. 1820 01:22:18,017 --> 01:22:20,510 PROFESSOR: I may do that. 1821 01:22:20,510 --> 01:22:22,620 I was going to ask you, Mark, for permission. 1822 01:22:22,620 --> 01:22:24,170 I just wanted to see what I had left. 1823 01:22:24,170 --> 01:22:25,050 OK. 1824 01:22:25,050 --> 01:22:28,220 So where were we? 1825 01:22:31,690 --> 01:22:31,940 Right. 1826 01:22:31,940 --> 01:22:32,310 OK. 1827 01:22:32,310 --> 01:22:33,000 Right. 1828 01:22:33,000 --> 01:22:36,680 So as I was saying, when possible, it's better not to run a 1829 01:22:36,680 --> 01:22:39,460 clustered design. 1830 01:22:39,460 --> 01:22:42,780 And so a cluster randomized trial is one in which the 1831 01:22:42,780 --> 01:22:48,750 units that are randomized are clusters of units rather than 1832 01:22:48,750 --> 01:22:49,400 the individual units. 1833 01:22:49,400 --> 01:22:51,640 So I randomized a whole cluster at a time rather than 1834 01:22:51,640 --> 01:22:54,030 individual person by person. 1835 01:22:54,030 --> 01:22:55,950 And there are lots of common examples of this. 1836 01:22:55,950 --> 01:22:59,340 So the PROGRESA program, for example, in Mexico was a 1837 01:22:59,340 --> 01:23:00,410 conditional cash transfer program. 1838 01:23:00,410 --> 01:23:01,900 They randomized by village.
1839 01:23:01,900 --> 01:23:03,470 Some villages were in, some villages were out. 1840 01:23:03,470 --> 01:23:06,640 If a village was in, everybody was in. 1841 01:23:06,640 --> 01:23:08,700 In the panchayat case we talked about, it 1842 01:23:08,700 --> 01:23:10,340 was basically a village. 1843 01:23:10,340 --> 01:23:10,930 It was a panchayat. 1844 01:23:10,930 --> 01:23:12,470 So the whole panchayat was in or the whole 1845 01:23:12,470 --> 01:23:14,460 panchayat was not in. 1846 01:23:14,460 --> 01:23:17,135 In a lot of education experiments, we randomize at 1847 01:23:17,135 --> 01:23:18,240 the level of a school. 1848 01:23:18,240 --> 01:23:20,280 Either the whole school is in or the whole school is out. 1849 01:23:20,280 --> 01:23:21,220 Sometimes you do it as a class. 1850 01:23:21,220 --> 01:23:23,400 A whole class in a school is in [UNINTELLIGIBLE] is out. 1851 01:23:23,400 --> 01:23:25,970 In this iron supplementation example, it was by the family. 1852 01:23:25,970 --> 01:23:29,590 So there's lots of cases where you would do this kind of 1853 01:23:29,590 --> 01:23:31,940 clustering. 1854 01:23:31,940 --> 01:23:34,790 And there are lots of good reasons, as I've mentioned, 1855 01:23:34,790 --> 01:23:36,130 for doing clustering. 1856 01:23:36,130 --> 01:23:40,470 So one reason is you're worried about 1857 01:23:40,470 --> 01:23:43,450 contamination, right? 1858 01:23:43,450 --> 01:23:47,000 So for example, when they're interested in deworming, worms 1859 01:23:47,000 --> 01:23:49,100 are very easily-- 1860 01:23:49,100 --> 01:23:50,400 there's a lot of cross-contamination. 1861 01:23:50,400 --> 01:23:52,785 If one kid has worms, the next kid who's also in school with 1862 01:23:52,785 --> 01:23:53,940 him is likely to get worms. 
1863 01:23:53,940 --> 01:23:56,960 So if I just deworm half the kids in the school, that's 1864 01:23:56,960 --> 01:24:00,350 going to have very little effect because my control- 1865 01:24:00,350 --> 01:24:02,270 they're going to get recontaminated by the kids who 1866 01:24:02,270 --> 01:24:03,440 weren't dewormed, right? 1867 01:24:03,440 --> 01:24:04,540 Or it could be the other way around. 1868 01:24:04,540 --> 01:24:06,720 It could be that if I deworm half the kids, that's enough 1869 01:24:06,720 --> 01:24:07,830 to knock worms out of the population. 1870 01:24:07,830 --> 01:24:09,750 The control group is also affected. 1871 01:24:09,750 --> 01:24:12,300 So you need to choose a level of randomization where your 1872 01:24:12,300 --> 01:24:14,300 treatment is going to affect the treatment group and not 1873 01:24:14,300 --> 01:24:16,120 affect the control group. 1874 01:24:16,120 --> 01:24:18,700 So that's a very important reason for cluster 1875 01:24:18,700 --> 01:24:20,470 randomizing. 1876 01:24:20,470 --> 01:24:23,960 Another reason is this feasibility consideration. 1877 01:24:23,960 --> 01:24:26,960 So it's often just for a variety of reasons not 1878 01:24:26,960 --> 01:24:31,370 feasible to give some people the treatment and not others. 1879 01:24:31,370 --> 01:24:35,060 Sometimes within a village, it's hard to make some people 1880 01:24:35,060 --> 01:24:38,100 eligible for a program and others not. 1881 01:24:38,100 --> 01:24:40,170 It's just sometimes hard to treat people in the same place 1882 01:24:40,170 --> 01:24:41,100 differently. 1883 01:24:41,100 --> 01:24:43,050 And so that's often a reason why we do cluster 1884 01:24:43,050 --> 01:24:45,080 randomization. 1885 01:24:45,080 --> 01:24:50,100 And some experiments naturally just occur at a cluster level. 
1886 01:24:50,100 --> 01:24:52,710 So for example, if I want to do something that affects an 1887 01:24:52,710 --> 01:24:54,900 entire classroom, like give out-- 1888 01:24:54,900 --> 01:24:57,685 suppose I want to train a teacher, right? 1889 01:24:57,685 --> 01:25:00,130 That obviously affects all the kids in the teacher's class. 1890 01:25:00,130 --> 01:25:03,280 There's no way to have that only affect half the kids in 1891 01:25:03,280 --> 01:25:04,130 the teacher's class. 1892 01:25:04,130 --> 01:25:06,420 It's just a fact of life. 1893 01:25:06,420 --> 01:25:09,510 So there are lots of good reasons why we do cluster 1894 01:25:09,510 --> 01:25:13,080 randomized designs even though they have negative 1895 01:25:13,080 --> 01:25:16,280 impacts on our power. 1896 01:25:16,280 --> 01:25:20,490 So as I mentioned, the reason the cluster has a negative 1897 01:25:20,490 --> 01:25:22,820 impact on your power is because the groups are 1898 01:25:22,820 --> 01:25:24,560 correlated. 1899 01:25:24,560 --> 01:25:27,070 The outcomes for the individuals are correlated. 1900 01:25:27,070 --> 01:25:28,720 So, for example, if all of the villagers are exposed to the 1901 01:25:28,720 --> 01:25:30,860 same weather, right? 1902 01:25:30,860 --> 01:25:32,610 All villagers are exposed to the same weather. 1903 01:25:32,610 --> 01:25:35,360 So it could be that the weather was 1904 01:25:35,360 --> 01:25:36,570 really bad in this village. 1905 01:25:36,570 --> 01:25:40,300 So all those people are going to have a lower outcome, for 1906 01:25:40,300 --> 01:25:41,550 example, than if the weather was good. 
1907 01:25:44,910 --> 01:25:48,150 And so, in some sense, even if there are 1,000 people in that 1908 01:25:48,150 --> 01:25:50,920 village, they all got this common shock, which is the 1909 01:25:50,920 --> 01:25:53,860 negative weather, you don't actually have 1,000 1910 01:25:53,860 --> 01:25:56,336 independent observations in that village because they have 1911 01:25:56,336 --> 01:26:00,180 this common correlated component, OK? 1912 01:26:00,180 --> 01:26:05,230 And this common correlated component we denote by the 1913 01:26:05,230 --> 01:26:08,460 Greek letter rho, which is the correlation of the units 1914 01:26:08,460 --> 01:26:09,710 within the same cluster. 1915 01:26:13,840 --> 01:26:20,110 So rho measures the correlation between units in 1916 01:26:20,110 --> 01:26:20,850 the same cluster. 1917 01:26:20,850 --> 01:26:24,090 If rho is zero, then people in the same cluster are just as 1918 01:26:24,090 --> 01:26:25,420 if they were independent. 1919 01:26:25,420 --> 01:26:27,060 There's no correlation. 1920 01:26:27,060 --> 01:26:29,080 Just as if they had been not in the same cluster. 1921 01:26:29,080 --> 01:26:31,760 If rho is one, they're perfectly correlated and it 1922 01:26:31,760 --> 01:26:36,320 means they all have exactly the same outcome, OK? 1923 01:26:36,320 --> 01:26:39,150 So it's somewhere between zero and one. 1924 01:26:39,150 --> 01:26:42,990 And the lower the rho is, the better you are if you're doing 1925 01:26:42,990 --> 01:26:44,650 a cluster randomized design. 1926 01:26:44,650 --> 01:26:45,570 And why is that? 1927 01:26:45,570 --> 01:26:47,960 It's because the problem within a clustered randomized 1928 01:26:47,960 --> 01:26:50,230 design is, as I was saying, if people were all exposed to the 1929 01:26:50,230 --> 01:26:53,020 same weather, it's not as if you had 1,000 independent 1930 01:26:53,020 --> 01:26:54,720 people in that village.
1931 01:26:54,720 --> 01:26:57,310 You effectively had fewer than 1,000 because they were 1932 01:26:57,310 --> 01:26:58,150 correlated. 1933 01:26:58,150 --> 01:27:02,030 And rho captures that effect-- 1934 01:27:02,030 --> 01:27:05,540 how much smaller, effectively, is your sample, OK? 1935 01:27:05,540 --> 01:27:09,340 And the bigger rho is, the smaller your effective sample 1936 01:27:09,340 --> 01:27:11,760 size is, OK? 1937 01:27:15,130 --> 01:27:16,910 And once again, when you do the power calculations, you 1938 01:27:16,910 --> 01:27:19,120 can play with this and you'll note that small differences in 1939 01:27:19,120 --> 01:27:21,540 rho make very big differences in your power. 1940 01:27:21,540 --> 01:27:23,380 And I'll show you the formula in a sec. 1941 01:27:23,380 --> 01:27:26,320 So often it's low, but it can be substantial. 1942 01:27:26,320 --> 01:27:29,370 So in some of these test score cases, for example, it's 1943 01:27:29,370 --> 01:27:34,060 between 0.2 and 0.6, which, 0.6 means that most of the 1944 01:27:34,060 --> 01:27:38,840 differences are coming between groups, not within groups. 1945 01:27:38,840 --> 01:27:44,190 So the groups, really, are much closer to one object. 1946 01:27:44,190 --> 01:27:44,665 Yeah. 1947 01:27:44,665 --> 01:27:49,370 AUDIENCE: What does the 0.5 mean? 1948 01:27:49,370 --> 01:27:56,458 Are you saying that in Madagascar, the scores on math 1949 01:27:56,458 --> 01:27:58,370 and language-- 1950 01:27:58,370 --> 01:28:00,200 PROFESSOR: It's the correlation 1951 01:28:00,200 --> 01:28:03,470 coefficient, which is the-- 1952 01:28:07,070 --> 01:28:13,230 technically, I believe it's the between variation divided 1953 01:28:13,230 --> 01:28:14,110 by the total variation. 1954 01:28:14,110 --> 01:28:15,580 I think that's the formula. 1955 01:28:15,580 --> 01:28:16,490 Dan's shaking his head. 1956 01:28:16,490 --> 01:28:17,250 Good. 1957 01:28:17,250 --> 01:28:18,070 Excellent. 1958 01:28:18,070 --> 01:28:20,430 A for me. 
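The decomposition just stated, between variation divided by total variation, can be written down directly. This is a minimal sketch of the simple variance-decomposition version, estimated from pilot-style data; it is not the exact ANOVA estimator that statistical packages use.

```python
def icc(clusters):
    """Estimate rho as between-cluster variance / total variance.

    clusters: list of lists, one inner list of outcomes per cluster.
    """
    all_obs = [y for cluster in clusters for y in cluster]
    grand_mean = sum(all_obs) / len(all_obs)
    # Total variance of every observation around the grand mean.
    total_var = sum((y - grand_mean) ** 2 for y in all_obs) / len(all_obs)
    # Between-cluster variance: cluster means around the grand mean,
    # weighted by cluster size.
    between_var = sum(
        len(c) * (sum(c) / len(c) - grand_mean) ** 2 for c in clusters
    ) / len(all_obs)
    return between_var / total_var

print(icc([[1, 1], [3, 3]]))  # 1.0 -- identical outcomes within each cluster
print(icc([[1, 3], [1, 3]]))  # 0.0 -- clusters look just like each other
```

When rho is one, all the variation is between clusters; when rho is zero, clusters are indistinguishable and every observation is effectively independent.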
1959 01:28:20,430 --> 01:28:24,910 It's what share of the variation is coming between 1960 01:28:24,910 --> 01:28:29,710 groups divided by the total share of variation. 1961 01:28:29,710 --> 01:28:33,365 So 0.5 means that, in some sense, half of the variation 1962 01:28:33,365 --> 01:28:35,330 in your sample is coming between groups. 1963 01:28:37,890 --> 01:28:38,802 AUDIENCE: Okay. 1964 01:28:38,802 --> 01:28:39,590 PROFESSOR: What? 1965 01:28:39,590 --> 01:28:41,360 AUDIENCE: Isn't it within [INAUDIBLE]? 1966 01:28:41,360 --> 01:28:41,980 If a rho is-- 1967 01:28:41,980 --> 01:28:45,080 PROFESSOR: No, it's between. 1968 01:28:45,080 --> 01:28:47,680 Because if rho is one, then each group is one. 1969 01:28:50,310 --> 01:28:51,590 Yeah, it's between. 1970 01:28:55,880 --> 01:29:01,230 If it was zero, then they're independent and it's saying 1971 01:29:01,230 --> 01:29:03,884 that it's all coming from within. 1972 01:29:03,884 --> 01:29:06,139 Yeah. 1973 01:29:06,139 --> 01:29:10,180 AUDIENCE: But here it's between math and language 1974 01:29:10,180 --> 01:29:15,290 scores of one kid or between math plus language scores of 1975 01:29:15,290 --> 01:29:16,970 two kids in the same group. 1976 01:29:16,970 --> 01:29:20,130 AUDIENCE: Or is it math and language scores in Madagascar 1977 01:29:20,130 --> 01:29:21,770 are explained by-- 1978 01:29:21,770 --> 01:29:24,280 PROFESSOR: This says the following. 1979 01:29:24,280 --> 01:29:27,750 This was in Madagascar, they sampled math and language 1980 01:29:27,750 --> 01:29:31,170 schools by-- 1981 01:29:31,170 --> 01:29:34,650 they took math and language scores for each kid by 1982 01:29:34,650 --> 01:29:35,330 classroom-- 1983 01:29:35,330 --> 01:29:35,830 or by school. 1984 01:29:35,830 --> 01:29:38,210 I think it was by school in this particular case. 
1985 01:29:38,210 --> 01:29:40,110 Then they said, looking over the whole sample that they 1986 01:29:40,110 --> 01:29:44,030 looked at in Madagascar, what percentage of the variation in 1987 01:29:44,030 --> 01:29:47,620 test scores came between schools 1988 01:29:47,620 --> 01:29:49,870 relative to within schools. 1989 01:29:49,870 --> 01:29:51,030 And they're saying that half of the 1990 01:29:51,030 --> 01:29:53,790 variation was between schools. 1991 01:29:58,390 --> 01:29:59,640 OK. 1992 01:30:09,190 --> 01:30:11,350 So how much does this hurt us, essentially? 1993 01:30:11,350 --> 01:30:15,460 So we need to adjust our standard errors, given the 1994 01:30:15,460 --> 01:30:19,640 fact that these things are correlated. 1995 01:30:19,640 --> 01:30:26,130 And this is the formula, which is that for a given total 1996 01:30:26,130 --> 01:30:28,780 sample size, if we have clusters of size m-- so say we 1997 01:30:28,780 --> 01:30:30,710 have 100 kids per school-- 1998 01:30:30,710 --> 01:30:34,070 and intra-cluster correlation coefficient rho, 1999 01:30:34,070 --> 01:30:37,160 the size of the smallest effect we can detect increases 2000 01:30:37,160 --> 01:30:39,940 by this formula compared to a non-clustered design. 2001 01:30:39,940 --> 01:30:43,880 So this shows you what this looks like, OK? 2002 01:30:43,880 --> 01:30:50,230 So suppose you had 100 kids per school, OK? 2003 01:30:50,230 --> 01:30:53,270 Suppose you had 100 kids per school and you randomized at 2004 01:30:53,270 --> 01:30:56,310 the school level rather than the individual level. 2005 01:30:56,310 --> 01:30:59,040 If your correlation coefficient was zero, it would 2006 01:30:59,040 --> 01:31:00,460 be the same as if we randomized at the individual 2007 01:31:00,460 --> 01:31:02,450 level because they're totally uncorrelated. 2008 01:31:02,450 --> 01:31:05,095 Suppose your correlation coefficient was 0.1-- 2009 01:31:05,095 --> 01:31:06,980 rho was 0.1.
2010 01:31:06,980 --> 01:31:11,200 Then the smallest effect size you could detect would be 3.3 2011 01:31:11,200 --> 01:31:15,120 times larger than if you had done an individual design. 2012 01:31:19,413 --> 01:31:22,860 So does that make sense how to interpret this? 2013 01:31:22,860 --> 01:31:27,150 And so this illustrates that, even with very mild 2014 01:31:27,150 --> 01:31:30,110 correlation coefficients-- and we saw examples of those math 2015 01:31:30,110 --> 01:31:31,610 test scores that were like 0.5. 2016 01:31:31,610 --> 01:31:34,300 This is only 0.1, but it already means, in some sense, 2017 01:31:34,300 --> 01:31:38,380 that your experiment can detect things-- 2018 01:31:41,180 --> 01:31:42,980 if you had been able to individually randomize, you 2019 01:31:42,980 --> 01:31:44,480 would be able to detect things that were three times as 2020 01:31:44,480 --> 01:31:47,100 small, right? 2021 01:31:47,100 --> 01:31:49,520 Now that's a combination of the fact that you have the 2022 01:31:49,520 --> 01:31:50,540 correlation coefficient and the number 2023 01:31:50,540 --> 01:31:51,930 of people per cluster. 2024 01:31:51,930 --> 01:31:56,634 AUDIENCE: Then in the previous slide, 0.5 does not mean half? 2025 01:31:56,634 --> 01:32:00,198 PROFESSOR: No, 0.5 is the correlation-- it's rho. 2026 01:32:00,198 --> 01:32:01,820 AUDIENCE: No, in the 2027 01:32:01,820 --> 01:32:02,495 PROFESSOR: It's rho. 2028 01:32:02,495 --> 01:32:04,600 AUDIENCE: Then it does not mean half of the difference-- 2029 01:32:04,600 --> 01:32:07,650 PROFESSOR: No, it's half of the variance. 2030 01:32:10,960 --> 01:32:11,930 Let me move on. 2031 01:32:11,930 --> 01:32:15,365 We can talk about the formula for that. 2032 01:32:15,365 --> 01:32:16,615 OK.
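The slide formula is not reproduced in the transcript, but the 3.3 figure in the example pins it down as the standard design-effect adjustment: with clusters of size m and intra-cluster correlation rho, the minimum detectable effect is inflated by the square root of 1 + (m - 1) * rho relative to individual randomization with the same total sample size. A one-line check reproduces the numbers:

```python
import math

def mde_inflation(m, rho):
    """Factor by which the minimum detectable effect grows when you
    randomize clusters of size m with intra-cluster correlation rho,
    relative to individual-level randomization of the same total sample."""
    return math.sqrt(1 + (m - 1) * rho)

print(mde_inflation(100, 0.0))            # 1.0 -- same as individual-level
print(round(mde_inflation(100, 0.1), 1))  # 3.3 -- the case from the example
```

Note how quickly this grows: even rho = 0.1 with 100 kids per school triples the smallest effect you can detect.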
2033 01:32:20,040 --> 01:32:21,420 So what this means is, if the experimental design is 2034 01:32:21,420 --> 01:32:24,720 clustered, we now not only need to consider all the other 2035 01:32:24,720 --> 01:32:26,200 factors we talked about before, we also need to 2036 01:32:26,200 --> 01:32:29,660 consider this factor rho when doing our power calculations. 2037 01:32:29,660 --> 01:32:32,360 And rho is yet another thing we can try to estimate based 2038 01:32:32,360 --> 01:32:35,845 on our little survey of our population to get a sense of 2039 01:32:35,845 --> 01:32:37,095 what this rho is likely to be. 2040 01:32:40,570 --> 01:32:46,150 And given this clustering issue, it's very important not 2041 01:32:46,150 --> 01:32:47,820 just that you have a big enough number of people 2042 01:32:47,820 --> 01:32:50,130 involved in your experiment, but that you randomize across 2043 01:32:50,130 --> 01:32:52,560 a big enough number of groups, right? 2044 01:32:52,560 --> 01:32:54,745 And the way I like to think about it is, how many times did 2045 01:32:54,745 --> 01:32:56,350 you flip the coin as to who should be treatment and who 2046 01:32:56,350 --> 01:32:57,600 should be control? 2047 01:33:00,830 --> 01:33:03,540 And, in fact, it's usually the case that the number of groups 2048 01:33:03,540 --> 01:33:07,430 you have is often more important than the total 2049 01:33:07,430 --> 01:33:11,660 number of individuals that you have because the individuals 2050 01:33:11,660 --> 01:33:17,090 are correlated within a group, OK? 2051 01:33:17,090 --> 01:33:18,340 So moving on. 2052 01:33:25,530 --> 01:33:26,890 So I'm going to flip through this. 2053 01:33:26,890 --> 01:33:28,050 This is mostly going over some of this if you were doing the 2054 01:33:28,050 --> 01:33:29,300 exercise quickly. 2055 01:33:33,161 --> 01:33:35,120 OK.
2056 01:33:35,120 --> 01:33:37,090 And so this chart-- 2057 01:33:41,860 --> 01:33:43,930 in your exercise shows you some of the tradeoffs that you 2058 01:33:43,930 --> 01:33:46,290 should think about when you're trying to decide how you 2059 01:33:46,290 --> 01:33:52,180 should trade off the number of groups you have versus the 2060 01:33:52,180 --> 01:33:54,460 number of people within a group, OK? 2061 01:33:54,460 --> 01:33:58,820 So in this particular case, a group was a gram panchayat, 2062 01:33:58,820 --> 01:34:01,740 and within a group there were villages, OK? 2063 01:34:01,740 --> 01:34:03,680 And there were different costs involved in doing these 2064 01:34:03,680 --> 01:34:04,390 different things, right? 2065 01:34:04,390 --> 01:34:07,670 So going to the place involved transportation costs to get to 2066 01:34:07,670 --> 01:34:08,980 the gram panchayat. 2067 01:34:08,980 --> 01:34:10,410 That, say, was a couple of days. 2068 01:34:10,410 --> 01:34:12,480 And then it took, like, half a day, say, for every village 2069 01:34:12,480 --> 01:34:13,620 you interviewed. 2070 01:34:13,620 --> 01:34:16,450 So that said, there's some cost of adding a new gram 2071 01:34:16,450 --> 01:34:18,840 panchayat, but also some marginal cost of adding 2072 01:34:18,840 --> 01:34:21,860 additional village per gram panchayat, OK? 2073 01:34:21,860 --> 01:34:25,750 So you could calculate, based on all your parameters and 2074 01:34:25,750 --> 01:34:29,360 power of 80% and whatever the intercluster correlation is in 2075 01:34:29,360 --> 01:34:31,710 this particular case, you could say, well, if we had 2076 01:34:31,710 --> 01:34:34,830 this many villages per gram panchayat, how many gram 2077 01:34:34,830 --> 01:34:36,550 panchayats would we need and how many villages 2078 01:34:36,550 --> 01:34:38,280 would we need, OK? 
2079 01:34:38,280 --> 01:34:39,460 So you can do this set of exercises 2080 01:34:39,460 --> 01:34:40,460 and you can say that-- 2081 01:34:40,460 --> 01:34:43,500 and you'll note, for example, that as we reduce the number 2082 01:34:43,500 --> 01:34:46,230 of gram panchayats we go to-- put another way, as we add more 2083 01:34:46,230 --> 01:34:49,040 villages per gram panchayat, the total number of villages 2084 01:34:49,040 --> 01:34:51,040 we need to survey goes up. 2085 01:34:51,040 --> 01:34:54,030 And in this particular case, it doesn't go up by that much 2086 01:34:54,030 --> 01:34:57,290 because the intercluster correlation is not that high. 2087 01:34:57,290 --> 01:34:59,170 And you could actually do this type of calculation and you 2088 01:34:59,170 --> 01:35:01,910 could say, well, I know what my costs are, right? 2089 01:35:01,910 --> 01:35:03,510 I know what my costs of going to this place are. 2090 01:35:03,510 --> 01:35:06,420 And I can calculate which of these designs is the cheapest 2091 01:35:06,420 --> 01:35:08,780 design given what I want to achieve. 2092 01:35:08,780 --> 01:35:13,080 The other thing is, in this case, the experiment was 2093 01:35:13,080 --> 01:35:14,340 happening everywhere and they were just trying 2094 01:35:14,340 --> 01:35:15,680 to design the survey. 2095 01:35:15,680 --> 01:35:17,690 But often when we're doing this, we also need to pay for 2096 01:35:17,690 --> 01:35:19,690 the intervention itself. 2097 01:35:19,690 --> 01:35:22,860 And, at least in a lot of the cases that I've worked with, 2098 01:35:22,860 --> 01:35:26,940 the cost of actually doing the intervention is much bigger 2099 01:35:26,940 --> 01:35:29,150 than the cost of doing the survey.
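That kind of cost comparison can be sketched in a few lines. All of the numbers below (the day counts and the two candidate designs) are hypothetical figures for illustration, not the values from the exercise; in practice, the candidate designs would first be matched for 80% power using the clustering formula.

```python
def survey_cost(n_gps, villages_per_gp,
                days_per_gp=2.0, days_per_village=0.5):
    """Total cost in surveyor-days: a fixed transportation cost to reach
    each gram panchayat, plus a marginal cost per village surveyed in it.
    The default day counts are made-up illustrative values."""
    return n_gps * (days_per_gp + villages_per_gp * days_per_village)

# Two hypothetical designs assumed to deliver the same power:
print(survey_cost(120, 2))  # 360.0 surveyor-days
print(survey_cost(80, 4))   # 320.0 surveyor-days -- cheaper in this example
```

With these illustrative numbers, fewer gram panchayats with more villages each comes out cheaper; with a higher intercluster correlation or a bigger fixed cost, the comparison could easily flip, which is exactly why the tradeoff has to be computed rather than assumed.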
2100 01:35:29,150 --> 01:35:33,460 And so, in that case, if you always have to treat every 2101 01:35:33,460 --> 01:35:37,670 village in the gram panchayat, you can actually save a ton of 2102 01:35:37,670 --> 01:35:39,670 money by going down in the number of gram panchayats and 2103 01:35:39,670 --> 01:35:41,930 surveying a lot more villages. 2104 01:35:41,930 --> 01:35:44,520 But the whole point is there are these tradeoffs and you 2105 01:35:44,520 --> 01:35:47,910 need to, in deciding how you're going to structure your 2106 01:35:47,910 --> 01:35:49,370 experiment and how you're going to structure your 2107 01:35:49,370 --> 01:35:51,560 survey, you need to think through what these tradeoffs 2108 01:35:51,560 --> 01:35:54,380 are, make sure you have enough power, given your estimates of 2109 01:35:54,380 --> 01:35:56,150 your intercluster correlation and sort of do the cost 2110 01:35:56,150 --> 01:35:58,560 minimizing thing. 2111 01:35:58,560 --> 01:36:01,320 OK, so in the last five minutes or so, let me just 2112 01:36:01,320 --> 01:36:04,880 highlight a couple of the other issues that come up in 2113 01:36:04,880 --> 01:36:07,210 thinking about power calculations. 2114 01:36:07,210 --> 01:36:10,970 So as I mentioned, the cluster design is one of the most 2115 01:36:10,970 --> 01:36:12,320 important ones. 2116 01:36:12,320 --> 01:36:14,070 And the key thing is making sure you have enough 2117 01:36:14,070 --> 01:36:17,070 independent groups, where you flip the coin to randomize 2118 01:36:17,070 --> 01:36:19,300 between treatment and control enough times. 2119 01:36:19,300 --> 01:36:21,300 Some other things that matter are baselines, control 2120 01:36:21,300 --> 01:36:23,190 variables, and the hypothesis being tested. 2121 01:36:23,190 --> 01:36:26,150 So one minute on each of those. 2122 01:36:26,150 --> 01:36:29,810 A baseline has two uses-- 2123 01:36:29,810 --> 01:36:31,330 main uses.
2124 01:36:31,330 --> 01:36:34,160 One use of a baseline is that it lets you check whether the 2125 01:36:34,160 --> 01:36:35,400 treatment and control group look the 2126 01:36:35,400 --> 01:36:37,780 same before you started. 2127 01:36:37,780 --> 01:36:40,750 And if you randomized properly, we know they should 2128 01:36:40,750 --> 01:36:41,630 look similar. 2129 01:36:41,630 --> 01:36:43,500 But you want to make sure that your randomization was 2130 01:36:43,500 --> 01:36:46,030 actually carried out the way it was supposed to be and that 2131 01:36:46,030 --> 01:36:48,970 it wasn't the case that people were pulling out of the hat 2132 01:36:48,970 --> 01:36:51,060 until they got a treatment or something, that they were 2133 01:36:51,060 --> 01:36:52,960 actually randomizing the way they were supposed to. 2134 01:36:52,960 --> 01:36:56,200 And having a baseline conducted before you start can 2135 01:36:56,200 --> 01:37:00,370 allow you to test that your randomization is actually 2136 01:37:00,370 --> 01:37:03,630 truly random and your groups look balanced. 2137 01:37:03,630 --> 01:37:06,070 The other thing is, the baseline can actually help 2138 01:37:06,070 --> 01:37:11,280 reduce the sample size you need, but it requires you 2139 01:37:11,280 --> 01:37:14,720 to do a survey before you start the intervention, right? 2140 01:37:14,720 --> 01:37:17,700 And the reason it can reduce your sample size is that now, 2141 01:37:17,700 --> 01:37:22,550 instead of just looking at, say, test scores across kids, 2142 01:37:22,550 --> 01:37:25,170 I can look at the change in test scores from before versus 2143 01:37:25,170 --> 01:37:27,460 after the experiment started.
2144 01:37:27,460 --> 01:37:29,450 And if people are really persistent, like if the people 2145 01:37:29,450 --> 01:37:31,120 who did really well on the test this year are likely to 2146 01:37:31,120 --> 01:37:33,270 do really well on the test next year, that can 2147 01:37:33,270 --> 01:37:37,710 essentially reduce the variance of your outcome. 2148 01:37:37,710 --> 01:37:43,100 It can be that the variance of the difference in test scores 2149 01:37:43,100 --> 01:37:45,550 can be a lot lower than the variance in test scores. 2150 01:37:45,550 --> 01:37:47,490 And having a baseline can help you for that reason. 2151 01:37:51,390 --> 01:37:54,570 And as this slide points out, your evaluation costs 2152 01:37:54,570 --> 01:37:57,910 basically double because you have to do two surveys, not 2153 01:37:57,910 --> 01:38:01,450 one survey, but the costs of the intervention go down 2154 01:38:01,450 --> 01:38:03,620 because you can have a slightly smaller sample. 2155 01:38:03,620 --> 01:38:06,420 So if your intervention is really expensive relative to 2156 01:38:06,420 --> 01:38:08,580 your survey, this can make a lot of sense. 2157 01:38:08,580 --> 01:38:10,020 If your survey is really expensive relative to your 2158 01:38:10,020 --> 01:38:13,820 intervention, you might not want to do this. 2159 01:38:13,820 --> 01:38:17,830 And to figure out how this is going to affect your power, 2160 01:38:17,830 --> 01:38:21,010 you need to know yet another fact, which is how correlated 2161 01:38:21,010 --> 01:38:24,600 are people's outcomes over time, right? 2162 01:38:24,600 --> 01:38:27,040 What's the correlation between how well I do on a test today 2163 01:38:27,040 --> 01:38:28,240 and how well I do on a test tomorrow? 2164 01:38:28,240 --> 01:38:29,770 And some things are really correlated and some things are 2165 01:38:29,770 --> 01:38:30,900 not that correlated.
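The gain from differencing can be made exact with the standard variance identity: if pre and post outcomes both have variance sigma squared and over-time correlation tau, then Var(post - pre) = 2 * sigma^2 * (1 - tau), which beats just using the endline (variance sigma^2) precisely when tau exceeds 0.5. A quick sketch, with illustrative tau values:

```python
def variance_of_difference(sigma2, tau):
    """Variance of (post - pre) when both outcomes have variance sigma2
    and correlation tau over time:
        Var(post - pre) = 2 * sigma2 * (1 - tau)."""
    return 2 * sigma2 * (1 - tau)

print(variance_of_difference(1.0, 0.75))  # 0.5 -- persistent outcome, big gain
print(variance_of_difference(1.0, 0.25))  # 1.5 -- worse than the endline alone
```

At tau = 0.5 the two approaches tie, which is the break-even point behind the advice that a baseline helps most for highly persistent outcomes.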
2166 01:38:30,900 --> 01:38:32,640 And a baseline really helps you on things that are really 2167 01:38:32,640 --> 01:38:33,890 correlated. 2168 01:38:38,570 --> 01:38:40,065 Another thing that can help you is stratification. 2169 01:38:42,740 --> 01:38:48,980 So what stratification can do is, stratification says, 2170 01:38:48,980 --> 01:38:50,150 suppose I-- 2171 01:38:50,150 --> 01:38:52,280 in some ways, it's conceptually a little bit like 2172 01:38:52,280 --> 01:38:54,750 a baseline, which is, suppose I know that all of the people 2173 01:38:54,750 --> 01:38:58,040 who live in one village tend to have similar outcomes and 2174 01:38:58,040 --> 01:38:59,430 all of the people who live in another village tend to have 2175 01:38:59,430 --> 01:38:59,880 similar outcomes, 2176 01:38:59,880 --> 01:39:01,780 and so on for each village 2177 01:39:01,780 --> 01:39:03,960 in the sample. 2178 01:39:03,960 --> 01:39:07,810 If I can then randomize within each village, I can compare the 2179 01:39:07,810 --> 01:39:11,090 people in each village to each other, OK? 2180 01:39:11,090 --> 01:39:14,480 So if I'm looking within village, if people in villages 2181 01:39:14,480 --> 01:39:16,910 tend to be similar and I can randomize within village, if I 2182 01:39:16,910 --> 01:39:18,820 look within villages, the difference between the 2183 01:39:18,820 --> 01:39:19,710 treatment and the control group is 2184 01:39:19,710 --> 01:39:21,650 going to be less noisy. 2185 01:39:21,650 --> 01:39:23,930 So stratifying is basically a way of saying I'm going to 2186 01:39:23,930 --> 01:39:27,270 make sure my sample is balanced across the treatment 2187 01:39:27,270 --> 01:39:29,620 and control groups within certain subgroups of the 2188 01:39:29,620 --> 01:39:30,680 population. 2189 01:39:30,680 --> 01:39:32,920 And then I'm going to compare within those subgroups of the 2190 01:39:32,920 --> 01:39:35,340 population when I do my analysis. 
2191 01:39:35,340 --> 01:39:36,630 And once again, we can think of this as a 2192 01:39:36,630 --> 01:39:38,120 way of reducing noise. 2193 01:39:38,120 --> 01:39:41,100 That if people in the same village tend to be similar, 2194 01:39:41,100 --> 01:39:43,250 if I only compare treatment and control within the same 2195 01:39:43,250 --> 01:39:48,160 village, the noise there is going to be smaller. 2196 01:39:48,160 --> 01:39:51,150 So in some sense, it's similar to having a baseline. 2197 01:39:51,150 --> 01:39:55,520 So some things we tend to stratify by, if we know the 2198 01:39:55,520 --> 01:39:57,660 baseline value of the outcome, we can sometimes stratify by 2199 01:39:57,660 --> 01:39:59,480 that because we know that the effects are going to be 2200 01:39:59,480 --> 01:40:02,720 similar for people who have very similar baseline values. 2201 01:40:02,720 --> 01:40:04,815 Or often, I think, we tend to 2202 01:40:04,815 --> 01:40:04,980 stratify geographically. 2203 01:40:04,980 --> 01:40:07,420 So basically, we think that people in certain areas tend 2204 01:40:07,420 --> 01:40:09,170 to be similar so we're going to make sure our treatments 2205 01:40:09,170 --> 01:40:11,760 and controls are balanced in those areas as a way of 2206 01:40:11,760 --> 01:40:13,010 reducing noise. 2207 01:40:19,400 --> 01:40:23,680 And the final thing we want to mention is the hypothesis 2208 01:40:23,680 --> 01:40:30,080 being tested, which is, the more things you want to test, 2209 01:40:30,080 --> 01:40:32,930 the bigger your sample is going to need to be. 2210 01:40:32,930 --> 01:40:35,480 So for example, are we interested in the difference 2211 01:40:35,480 --> 01:40:37,170 between two treatments as well as the 2212 01:40:37,170 --> 01:40:39,280 treatment versus control? 
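[Editor's illustration] The noise-reduction logic of stratifying by village can be shown in a small simulation. This is a hypothetical sketch, not from the lecture: the number of villages, the village-level spread, and the individual noise level are all invented numbers. Comparing within villages strips out the between-village variation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_villages, per_village = 200, 10                  # invented sizes
village_mean = rng.normal(0.0, 2.0, n_villages)    # villages differ a lot
village = np.repeat(np.arange(n_villages), per_village)
y = village_mean[village] + rng.normal(0.0, 1.0, len(village))

# Raw outcome variance mixes village differences with individual noise...
var_raw = y.var()
# ...but deviations from each village's own mean drop the village part.
means = np.array([y[village == v].mean() for v in range(n_villages)])
var_within = (y - means[village]).var()

print(round(var_raw, 1), round(var_within, 1))
```

Randomizing treatment and control within each village and estimating the effect from within-village contrasts gets you the much smaller second variance, which is the sense in which stratification makes the treatment-control comparison less noisy.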
2213 01:40:39,280 --> 01:40:41,740 If so, we need a much bigger sample because we not only 2214 01:40:41,740 --> 01:40:43,620 need to be able to tell the treatment versus the control 2215 01:40:43,620 --> 01:40:45,490 but we also need to be able to tell the two treatments from 2216 01:40:45,490 --> 01:40:48,110 each other, right? 2217 01:40:48,110 --> 01:40:50,600 So suppose you have two different treatments. 2218 01:40:50,600 --> 01:40:54,255 Are you interested in just the overall effect of each of the 2219 01:40:54,255 --> 01:40:54,760 two treatments? 2220 01:40:54,760 --> 01:40:55,890 Or are you interested in whether the treatments 2221 01:40:55,890 --> 01:40:58,900 interact, whether it produces a different effect if they happen 2222 01:40:58,900 --> 01:41:00,400 together, right? 2223 01:41:00,400 --> 01:41:03,120 The more things you're interested in, the bigger your 2224 01:41:03,120 --> 01:41:05,100 sample needs to be because you need to design your sample to 2225 01:41:05,100 --> 01:41:08,450 be big enough to answer each of these different questions. 2226 01:41:08,450 --> 01:41:10,740 Another example: suppose you were interested in testing 2227 01:41:10,740 --> 01:41:12,270 whether the effect is different in different 2228 01:41:12,270 --> 01:41:13,330 subpopulations. 2229 01:41:13,330 --> 01:41:15,440 Do you just want to know the average effect of your program 2230 01:41:15,440 --> 01:41:17,380 or do you want to know if it was different in rural areas 2231 01:41:17,380 --> 01:41:19,170 versus urban areas? 2232 01:41:19,170 --> 01:41:20,860 If you want to know if it's different in rural versus 2233 01:41:20,860 --> 01:41:22,500 urban areas, you're going to need a big enough sample in 2234 01:41:22,500 --> 01:41:24,846 rural areas and a big enough sample in urban areas that you 2235 01:41:24,846 --> 01:41:27,270 can compare the difference between them. 
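[Editor's illustration] To see why telling two treatments apart needs a much bigger sample, here is a rough sketch using the standard two-sample sample-size approximation (a textbook formula, not something derived in the lecture). The effect sizes of 0.2 and 0.1 standard deviations, the 5% two-sided significance level, and the 80% power target are all assumed numbers for illustration:

```python
from statistics import NormalDist

def n_per_arm(delta, sigma=1.0, alpha=0.05, power=0.80):
    """n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2 per arm."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return 2 * (z * sigma / delta) ** 2

# Treatment vs. control, assuming a 0.2 SD effect:
n_tc = n_per_arm(0.2)
# Treatment 1 vs. treatment 2, whose effects might differ by only 0.1 SD:
n_tt = n_per_arm(0.1)

print(round(n_tc), round(n_tt))
```

Because n scales with 1/delta squared, detecting a gap between two treatments that is half the size of the treatment-control effect requires four times as many observations in each arm of that comparison, which is why adding arms inflates the experiment so quickly.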
2236 01:41:27,270 --> 01:41:34,550 So the more different things that you want to test, 2237 01:41:34,550 --> 01:41:36,840 obviously, the bigger your experiment's 2238 01:41:36,840 --> 01:41:37,760 going to need to be. 2239 01:41:37,760 --> 01:41:39,400 And a lot of times, in actually designing the 2240 01:41:39,400 --> 01:41:41,660 experiment, this is something that comes up all the time, 2241 01:41:41,660 --> 01:41:44,250 that you will very quickly figure out that the number of 2242 01:41:44,250 --> 01:41:46,800 questions you would like to answer is far bigger than the 2243 01:41:46,800 --> 01:41:49,270 sample size you can afford. 2244 01:41:49,270 --> 01:41:52,110 And one of the really important conversations you 2245 01:41:52,110 --> 01:41:53,160 need to have as you're starting to design an 2246 01:41:53,160 --> 01:41:57,490 experiment is, which are the really critical questions that 2247 01:41:57,490 --> 01:41:59,690 I really need to know the answer to? 2248 01:41:59,690 --> 01:42:02,210 So for example, in a project I was recently doing in 2249 01:42:02,210 --> 01:42:05,020 Indonesia, it turned out that the government really wanted 2250 01:42:05,020 --> 01:42:06,870 to know whether this program would work differently in 2251 01:42:06,870 --> 01:42:10,200 urban versus rural areas because they had a view that 2252 01:42:10,200 --> 01:42:11,210 urban areas are really different. 2253 01:42:11,210 --> 01:42:13,320 And they were willing to do different programs in urban 2254 01:42:13,320 --> 01:42:14,220 versus rural areas. 2255 01:42:14,220 --> 01:42:16,810 So we designed our whole sample to make sure we had 2256 01:42:16,810 --> 01:42:21,110 enough sampled in urban areas and in rural areas that we 2257 01:42:21,110 --> 01:42:22,550 could test those two things apart. 
2258 01:42:22,550 --> 01:42:24,650 That almost doubled the size of the experiment, but the 2259 01:42:24,650 --> 01:42:26,580 government thought that was important enough that they 2260 01:42:26,580 --> 01:42:29,410 really wanted to do that. 2261 01:42:29,410 --> 01:42:32,200 The point here is that-- 2262 01:42:32,200 --> 01:42:33,930 that was the one they wanted to focus on. 2263 01:42:33,930 --> 01:42:34,690 There were a million other things we 2264 01:42:34,690 --> 01:42:35,780 could have done instead. 2265 01:42:35,780 --> 01:42:39,670 And so it's really important to think about, before you 2266 01:42:39,670 --> 01:42:42,430 design the experiment, what the few key things you want to 2267 01:42:42,430 --> 01:42:45,530 test are because, as I said, you're never going to have 2268 01:42:45,530 --> 01:42:48,035 enough money to test all the things you want. 2269 01:42:48,035 --> 01:42:51,390 That's sort of a universal truth. 2270 01:42:51,390 --> 01:42:53,360 So just to conclude, we've talked about in this lecture-- 2271 01:42:59,270 --> 01:43:02,050 going back to the basic statistics of how you're going 2272 01:43:02,050 --> 01:43:06,030 to analyze the experiment, thinking about how noisy your 2273 01:43:06,030 --> 01:43:08,486 outcome is going to be and how you're going to compute your 2274 01:43:08,486 --> 01:43:10,230 confidence intervals, how big your effect 2275 01:43:10,230 --> 01:43:11,900 size is going to be. 2276 01:43:11,900 --> 01:43:15,140 That's what goes into doing a power calculation. 2277 01:43:15,140 --> 01:43:16,870 You also need to do some guess work, right? 2278 01:43:16,870 --> 01:43:19,330 The power calculation is going to require you to estimate 2279 01:43:19,330 --> 01:43:21,680 how much variance 2280 01:43:21,680 --> 01:43:25,130 there's going to be, what your 2281 01:43:25,130 --> 01:43:25,980 effect size is going to be. 2282 01:43:25,980 --> 01:43:28,410 You have to make some assumptions. 
2283 01:43:28,410 --> 01:43:31,400 And a little bit of pilot testing before the experiment 2284 01:43:31,400 --> 01:43:33,680 begins can be really useful, I think, mostly 2285 01:43:33,680 --> 01:43:35,730 because just collecting 2286 01:43:35,730 --> 01:43:37,380 some data can help you 2287 01:43:37,380 --> 01:43:38,630 estimate these variances. 2288 01:43:41,040 --> 01:43:43,350 The power calculations can help you think about this 2289 01:43:43,350 --> 01:43:45,380 question of how many treatments you can afford to 2290 01:43:45,380 --> 01:43:50,720 have, and can I afford to do three different versions of 2291 01:43:50,720 --> 01:43:54,210 the program or do I really need to just pick one or two? 2292 01:43:54,210 --> 01:43:56,880 How do I make this tradeoff of more clusters versus more 2293 01:43:56,880 --> 01:43:57,830 observations per cluster? 2294 01:43:57,830 --> 01:44:00,390 The power calculation can be very helpful here. 2295 01:44:00,390 --> 01:44:01,910 And the other thing, in some sense the place I find 2296 01:44:01,910 --> 01:44:04,050 power calculations the most useful: because 2297 01:44:04,050 --> 01:44:06,430 there is a bit of guesswork in power calculations, you only get 2298 01:44:06,430 --> 01:44:07,750 rough rules of thumb. 2299 01:44:07,750 --> 01:44:09,450 You don't get precise answers because it depends on the 2300 01:44:09,450 --> 01:44:09,960 assumptions. 2301 01:44:09,960 --> 01:44:12,580 But what I find them really useful for is telling whether this is 2302 01:44:12,580 --> 01:44:14,090 feasible or not, right? 
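[Editor's illustration] The clusters-versus-observations tradeoff mentioned above is commonly sketched with the design effect 1 + (m - 1) * ICC, where m is the number of observations per cluster and ICC is the intra-cluster correlation. This is a standard approximation rather than a formula from the lecture, and the numbers below (an ICC of 0.1 and 1,000 kids total) are made up for illustration:

```python
def effective_n(n_clusters, per_cluster, icc):
    """Effective sample size after dividing by the design effect 1 + (m - 1) * icc."""
    total = n_clusters * per_cluster
    return total / (1 + (per_cluster - 1) * icc)

# Same 1,000 kids, split two different ways, with an assumed ICC of 0.1:
few_big = effective_n(50, 20, 0.10)      # 50 villages of 20 kids
many_small = effective_n(100, 10, 0.10)  # 100 villages of 10 kids

print(round(few_big), round(many_small))
```

With any positive intra-cluster correlation, the same total survey effort yields more effective observations when spread across more clusters, which is why the relative cost of adding a cluster versus adding a respondent within a cluster drives this design choice.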
2303 01:44:14,090 --> 01:44:17,790 Is this something where I'm kind of in the right range 2304 01:44:17,790 --> 01:44:20,630 where I think I can get estimates, or where there's no 2305 01:44:20,630 --> 01:44:23,000 chance, no matter how successful this program is, 2306 01:44:23,000 --> 01:44:25,900 that I'm going to be able to pick it up in my data because 2307 01:44:25,900 --> 01:44:28,730 the variable is just way too noisy? 2308 01:44:28,730 --> 01:44:31,210 And it's really important that you do the power calculation, 2309 01:44:31,210 --> 01:44:32,750 both 2310 01:44:32,750 --> 01:44:35,200 for structuring how to design the experiment, but 2311 01:44:35,200 --> 01:44:37,450 particularly to make sure you're not going to waste a 2312 01:44:37,450 --> 01:44:38,740 lot of time and money doing something where you're going 2313 01:44:38,740 --> 01:44:40,820 to have no hope of picking it up. 2314 01:44:44,220 --> 01:44:46,720 Because a study which is underpowered is going to waste 2315 01:44:46,720 --> 01:44:49,210 a lot of everyone's time and be 2316 01:44:49,210 --> 01:44:50,970 very frustrating for everyone involved. 2317 01:44:50,970 --> 01:44:53,440 So you want to make sure you do this right before you start 2318 01:44:53,440 --> 01:44:54,940 because otherwise, you're going to end up spending a lot 2319 01:44:54,940 --> 01:44:58,150 of time, money, and effort on an experiment and ending up 2320 01:44:58,150 --> 01:45:01,060 not being able to conclude much of anything. 2321 01:45:01,060 --> 01:45:01,735 OK. 2322 01:45:01,735 --> 01:45:02,985 Thanks very much.