1 00:00:00,040 --> 00:00:02,390 The following content is provided under a Creative 2 00:00:02,390 --> 00:00:03,680 Commons license. 3 00:00:03,680 --> 00:00:06,640 Your support will help MIT OpenCourseWare continue to 4 00:00:06,640 --> 00:00:09,980 offer high quality educational resources for free. 5 00:00:09,980 --> 00:00:12,820 To make a donation or to view additional materials from 6 00:00:12,820 --> 00:00:16,750 hundreds of MIT courses, visit MIT OpenCourseWare at 7 00:00:16,750 --> 00:00:18,000 ocw.mit.edu. 8 00:00:21,170 --> 00:00:21,800 SHAWN COLE: Great. 9 00:00:21,800 --> 00:00:25,890 It's a real pleasure to be here and thank you for 10 00:00:25,890 --> 00:00:27,350 listening to me. 11 00:00:27,350 --> 00:00:30,430 This is perhaps the capstone lecture of the course or at 12 00:00:30,430 --> 00:00:32,150 least the last lecture of the course. 13 00:00:32,150 --> 00:00:34,950 And I'm going to try to pick up right where Michael left 14 00:00:34,950 --> 00:00:37,190 off talking about intention to treat and moving on to 15 00:00:37,190 --> 00:00:38,880 treatment of the treated. 16 00:00:38,880 --> 00:00:41,420 But with any luck, there'll be some time at the end where we 17 00:00:41,420 --> 00:00:44,620 can have more general discussions or questions if 18 00:00:44,620 --> 00:00:48,310 people are still interested about particular topics or 19 00:00:48,310 --> 00:00:49,450 have questions. 20 00:00:49,450 --> 00:00:51,630 And I'll stay after class to talk to people. 21 00:00:51,630 --> 00:00:54,490 And I think you have my email on the slides. 22 00:00:54,490 --> 00:00:57,970 So feel free to get in touch with me at any time. 23 00:00:57,970 --> 00:00:59,540 We've got this great project. 24 00:00:59,540 --> 00:01:02,850 We want to evaluate it or write JPAL. 25 00:01:02,850 --> 00:01:04,920 So what are we going to do today? 26 00:01:04,920 --> 00:01:08,520 Look at some challenges to randomized evaluations. 27 00:01:08,520 --> 00:01:10,740 So these problems. 
28 00:01:10,740 --> 00:01:13,700 So basically, when people don't do what you assign them 29 00:01:13,700 --> 00:01:17,080 to do because you can't control them. 30 00:01:17,080 --> 00:01:20,270 You can control undergraduates in a lab, but in the real 31 00:01:20,270 --> 00:01:21,660 world it's a lot harder. 32 00:01:21,660 --> 00:01:23,570 Then we're going to talk about sort of how do you choose 33 00:01:23,570 --> 00:01:25,030 which effects to report in your study? 34 00:01:25,030 --> 00:01:27,390 So you've got your study, you did a bunch of household 35 00:01:27,390 --> 00:01:29,450 surveying, what do you want to report? 36 00:01:29,450 --> 00:01:31,240 How credible are your results? 37 00:01:31,240 --> 00:01:32,990 Then we'll spend some time talking about external 38 00:01:32,990 --> 00:01:36,530 validity, which is sort of the question, OK, I have a study 39 00:01:36,530 --> 00:01:38,600 that I think is internally valid. 40 00:01:38,600 --> 00:01:41,760 We did the randomization correctly and the results we 41 00:01:41,760 --> 00:01:42,490 think are legitimate. 42 00:01:42,490 --> 00:01:44,550 But how much can that tell us about the greater 43 00:01:44,550 --> 00:01:45,920 world around us? 44 00:01:45,920 --> 00:01:48,750 And finally, we'll conclude by talking about cost 45 00:01:48,750 --> 00:01:49,440 effectiveness. 46 00:01:49,440 --> 00:01:52,420 Which, as economists, is very important to us. 47 00:01:52,420 --> 00:01:55,220 So we may have a program that's effective, but how do 48 00:01:55,220 --> 00:01:59,310 we compare whether we should spend our precious aid or 49 00:01:59,310 --> 00:02:02,930 budget dollars on that particular program as opposed 50 00:02:02,930 --> 00:02:04,620 to a host of other programs? 
51 00:02:04,620 --> 00:02:08,800 And so I usually teach at HBS in the case method, which is 52 00:02:08,800 --> 00:02:11,320 very different than a lecture format because it's basically 53 00:02:11,320 --> 00:02:13,770 the students always talking and me refereeing. 54 00:02:13,770 --> 00:02:16,880 I'm not going to do that today, but I'm very 55 00:02:16,880 --> 00:02:18,760 comfortable with interruptions, or questions, 56 00:02:18,760 --> 00:02:23,430 or requests for clarification, et cetera. 57 00:02:23,430 --> 00:02:26,060 And I think we all stay more engaged if there's more 58 00:02:26,060 --> 00:02:26,750 discussion. 59 00:02:26,750 --> 00:02:29,790 And so that's super valuable. 60 00:02:29,790 --> 00:02:32,000 So that's the outline for today. 61 00:02:32,000 --> 00:02:34,540 And the slides up here are going to differ a little bit 62 00:02:34,540 --> 00:02:37,050 from the slides that you have printed out because there was 63 00:02:37,050 --> 00:02:42,330 some last minute optimization and re-coordination. 64 00:02:42,330 --> 00:02:45,350 So the basic problem, which I think Michael talked about, is 65 00:02:45,350 --> 00:02:47,890 that individuals are allocated to treatment groups and they 66 00:02:47,890 --> 00:02:50,640 don't receive treatment, or individuals are allocated to 67 00:02:50,640 --> 00:02:51,840 the comparison group, but somehow 68 00:02:51,840 --> 00:02:52,890 manage to get treatment. 69 00:02:52,890 --> 00:02:57,140 So you talked about students didn't show up at school, so 70 00:02:57,140 --> 00:02:58,740 they didn't get the treatment on the treatment day. 71 00:02:58,740 --> 00:03:02,610 Or the program said you can't give this program, you can't 72 00:03:02,610 --> 00:03:05,430 give de-worming medicine to girls over the age of 13 73 00:03:05,430 --> 00:03:07,170 because of health reasons. 74 00:03:07,170 --> 00:03:09,680 They may be pregnant and we don't know how the de-worming 75 00:03:09,680 --> 00:03:11,870 medicine affects pregnancy. 
76 00:03:11,870 --> 00:03:13,720 So what do you do? 77 00:03:13,720 --> 00:03:17,600 You came up with the solution of estimating the program 78 00:03:17,600 --> 00:03:20,780 effect, ITT. 79 00:03:20,780 --> 00:03:24,680 Which is to use the original assignment. 80 00:03:24,680 --> 00:03:27,770 So we have our baseline survey, our baseline list of 81 00:03:27,770 --> 00:03:29,780 schools or people and we flipped our coins or ran our 82 00:03:29,780 --> 00:03:31,130 [UNINTELLIGIBLE] code to randomize. 83 00:03:31,130 --> 00:03:33,470 So then we would just evaluate them no matter what actually 84 00:03:33,470 --> 00:03:33,950 happened to them. 85 00:03:33,950 --> 00:03:36,170 We count them in the treatment group or we count them in the 86 00:03:36,170 --> 00:03:37,090 control group. 87 00:03:37,090 --> 00:03:40,870 And that gives us our intention to treat estimate. 88 00:03:40,870 --> 00:03:43,710 And so the interpretation of that is, what was the average 89 00:03:43,710 --> 00:03:47,560 effect on an individual in the treated population relative to 90 00:03:47,560 --> 00:03:50,650 an individual in the comparison population? 91 00:03:50,650 --> 00:03:55,900 And you've already covered this with Michael? 92 00:03:55,900 --> 00:03:57,240 This is just a review. 93 00:03:57,240 --> 00:03:59,420 So is this the right number to look for? 94 00:03:59,420 --> 00:04:01,820 Well, if you're thinking about putting in a de-worming 95 00:04:01,820 --> 00:04:03,820 program, you have to realize that people aren't going to be 96 00:04:03,820 --> 00:04:06,570 at school when the government shows up to administer the 97 00:04:06,570 --> 00:04:09,250 de-worming program to all their students. 98 00:04:09,250 --> 00:04:10,550 And that's that. 99 00:04:10,550 --> 00:04:12,970 So maybe we can just pause for a second and think about some 100 00:04:12,970 --> 00:04:15,630 programs that if you're an [? IPARA ?] 
101 00:04:15,630 --> 00:04:18,480 you think you're going to be involved with, or if you're 102 00:04:18,480 --> 00:04:20,740 involved in an NGO, the type of program you're running. 103 00:04:20,740 --> 00:04:24,530 And maybe a few people can volunteer what the intent to 104 00:04:24,530 --> 00:04:27,590 treat estimate might look like in their evaluation and 105 00:04:27,590 --> 00:04:30,580 whether it's something they care about. 106 00:04:30,580 --> 00:04:31,830 Put this into practice. 107 00:04:39,244 --> 00:04:41,570 I should be comfortable with long pregnant pauses as well. 108 00:04:45,660 --> 00:04:46,140 Any examples? 109 00:04:46,140 --> 00:04:46,450 Excellent. 110 00:04:46,450 --> 00:04:49,348 AUDIENCE: So one of the projects that we're working on 111 00:04:49,348 --> 00:04:52,246 is looking at the impact of financial education in low 112 00:04:52,246 --> 00:04:54,661 income communities in New York City. 113 00:04:57,559 --> 00:05:00,210 Obviously we're trying to measure the impact that 114 00:05:00,210 --> 00:05:03,620 financial education has, but we're working specifically 115 00:05:03,620 --> 00:05:07,350 with a certain NGO to implement that education. 116 00:05:12,260 --> 00:05:15,206 Whatever estimate we get that's the intention to treat 117 00:05:15,206 --> 00:05:17,661 will just be measuring the impact of the program. 118 00:05:17,661 --> 00:05:19,500 SHAWN COLE: And so what are the compliance problems you 119 00:05:19,500 --> 00:05:20,750 anticipate there? 120 00:05:23,700 --> 00:05:30,034 AUDIENCE: Maybe people not showing up for classes or not 121 00:05:30,034 --> 00:05:31,657 following through on whatever they're asked 122 00:05:31,657 --> 00:05:35,760 to do in the education. 
123 00:05:35,760 --> 00:05:37,330 SHAWN COLE: Or people in the control group could get on the 124 00:05:37,330 --> 00:05:41,150 internet and download the government's financial 125 00:05:41,150 --> 00:05:43,520 literacy program and study very industriously 126 00:05:43,520 --> 00:05:45,780 themselves and learn. 127 00:05:45,780 --> 00:05:47,990 The intention to treat will tell you what the program does 128 00:05:47,990 --> 00:05:51,580 and you could imagine that if only 15% of the people turn up 129 00:05:51,580 --> 00:05:54,770 for your meetings, you're going to have a pretty small 130 00:05:54,770 --> 00:05:56,210 difference between the treatment group and the 131 00:05:56,210 --> 00:05:57,480 comparison group. 132 00:05:57,480 --> 00:06:00,600 That'll be an accurate measure of the program, but that's not 133 00:06:00,600 --> 00:06:02,690 going to tell us much about how important financial 134 00:06:02,690 --> 00:06:04,540 literacy is in making financial decisions. 135 00:06:04,540 --> 00:06:06,760 So if maybe your outcome variable is what interest rate 136 00:06:06,760 --> 00:06:08,960 do people borrow at, or do they pay their credit cards 137 00:06:08,960 --> 00:06:11,860 back on time, you might not find much effect. 138 00:06:11,860 --> 00:06:15,250 From that we can't conclude that financial education 139 00:06:15,250 --> 00:06:18,300 doesn't affect credit card repayment behavior. 140 00:06:18,300 --> 00:06:20,070 We just have to conclude that this particular program we 141 00:06:20,070 --> 00:06:22,440 delivered wasn't very effective. 142 00:06:22,440 --> 00:06:23,580 So that's an example. 143 00:06:23,580 --> 00:06:24,830 Any other examples? 144 00:06:27,020 --> 00:06:30,210 The point is hopefully pretty clear. 145 00:06:30,210 --> 00:06:30,800 Great. 146 00:06:30,800 --> 00:06:34,520 So I don't know if you went through these calculations. 147 00:06:34,520 --> 00:06:35,840 Yeah, OK. 148 00:06:35,840 --> 00:06:36,930 It's absolutely simple. 
149 00:06:36,930 --> 00:06:39,060 You just take the average in this one, average in this one, 150 00:06:39,060 --> 00:06:40,080 and it's the difference. 151 00:06:40,080 --> 00:06:44,260 So it's not rocket science. 152 00:06:44,260 --> 00:06:49,370 So it relates to the actual program and it gives us an 153 00:06:49,370 --> 00:06:50,620 estimate of the program's impact. 154 00:06:59,550 --> 00:07:03,580 I guess the second method we're going to talk about, 155 00:07:03,580 --> 00:07:06,410 which you talked about in your learning groups this morning 156 00:07:06,410 --> 00:07:07,710 too, is treatment on the treated. 157 00:07:07,710 --> 00:07:11,310 And maybe you could motivate this by telling the joke about 158 00:07:11,310 --> 00:07:12,650 the econometricians or 159 00:07:12,650 --> 00:07:15,110 statisticians who went hunting. 160 00:07:15,110 --> 00:07:16,320 Have you heard this? 161 00:07:16,320 --> 00:07:19,490 So the first one aims at a deer and shoots and misses 10 162 00:07:19,490 --> 00:07:20,910 meters to the left. 163 00:07:20,910 --> 00:07:24,660 And the second one aims at the deer and misses 10 164 00:07:24,660 --> 00:07:26,080 meters to the right. 165 00:07:26,080 --> 00:07:29,940 And the third one says, yes, we got it. 166 00:07:29,940 --> 00:07:33,740 So the intention to treat is giving us the average program 167 00:07:33,740 --> 00:07:36,780 effect, but maybe we care more about what's the effect of 168 00:07:36,780 --> 00:07:38,050 knowing financial literacy? 169 00:07:38,050 --> 00:07:39,650 What's the effect of actually changing people's 170 00:07:39,650 --> 00:07:41,680 understanding of financial literacy? 171 00:07:41,680 --> 00:07:45,490 And that's where the treatment on the treated estimate can 172 00:07:45,490 --> 00:07:48,770 provide some assistance. 173 00:07:48,770 --> 00:07:52,410 So again, we went back to the worming example which you 174 00:07:52,410 --> 00:07:53,080 talked about. 
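The intention-to-treat calculation described above can be sketched in a few lines of Python. This is only an illustration, not code from the course: the function name `itt_estimate` and the weight-gain numbers are hypothetical, invented to show that outcomes are averaged by original assignment, whatever each person actually received.

```python
from statistics import mean


def itt_estimate(assigned_treatment, assigned_control):
    """Intention-to-treat: difference in mean outcomes by ORIGINAL assignment.

    Everyone stays in the group they were randomized into, whether or not
    they actually took the treatment.
    """
    return mean(assigned_treatment) - mean(assigned_control)


# Hypothetical weight gains, invented for illustration only.
treatment_school = [4.0, 0.0, 6.0, 2.0]  # some of these pupils were never treated
control_school = [1.0, 0.0, 2.0, 1.0]    # some of these pupils found treatment anyway

print(itt_estimate(treatment_school, control_school))
```

Because non-compliers stay in their assigned group, this number describes the program as delivered rather than the treatment itself.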
175 00:07:53,080 --> 00:07:58,750 And so we have 76% of the people in the treatment 176 00:07:58,750 --> 00:08:00,760 schools got some treatment in the first round. 177 00:08:00,760 --> 00:08:03,190 And in the next round it was 72%. 178 00:08:03,190 --> 00:08:06,370 So that's actually nowhere near 100%. 179 00:08:06,370 --> 00:08:07,900 One-fifth of the students are not getting 180 00:08:07,900 --> 00:08:09,150 their de-worming medicine. 181 00:08:13,080 --> 00:08:18,840 And some students in the comparison group received 182 00:08:18,840 --> 00:08:19,890 treatment also. 183 00:08:19,890 --> 00:08:22,760 So for example, I think when you were testing the children 184 00:08:22,760 --> 00:08:25,220 and you found that they had worms in the comparison group, 185 00:08:25,220 --> 00:08:27,680 the sort of medical protocol required you 186 00:08:27,680 --> 00:08:30,530 to give them treatment. 187 00:08:30,530 --> 00:08:33,159 So what would you do if you wanted to know the effect of 188 00:08:33,159 --> 00:08:37,270 medicine on the children who took the medicine? 189 00:08:37,270 --> 00:08:40,530 And so you can't just compare children who took the medicine 190 00:08:40,530 --> 00:08:42,500 with children who didn't take the medicine. 191 00:08:42,500 --> 00:08:46,910 That leads to all the same selection problems we had in 192 00:08:46,910 --> 00:08:51,430 the first few days of class where the people who decided 193 00:08:51,430 --> 00:08:53,800 not to come to school that day or weren't able to make it to 194 00:08:53,800 --> 00:08:55,930 school that day are different than the people who did. 195 00:08:55,930 --> 00:08:59,450 And the people in the comparison group who went out 196 00:08:59,450 --> 00:09:01,440 to the pharmacy and bought the de-worming medicine are going 197 00:09:01,440 --> 00:09:06,950 to be different than the people who didn't. 
198 00:09:06,950 --> 00:09:09,630 So what we do in this case is something really quite simple 199 00:09:09,630 --> 00:09:15,550 and it's at the foundation of the entire field of modern 200 00:09:15,550 --> 00:09:16,480 empirical research. 201 00:09:16,480 --> 00:09:20,380 But we won't go into all the details, we'll just talk about 202 00:09:20,380 --> 00:09:24,400 this treatment on the treated estimator, or ToT. 203 00:09:24,400 --> 00:09:27,820 And so what you don't do is just take the change of the 204 00:09:27,820 --> 00:09:31,040 people who were treated and the change of the people who 205 00:09:31,040 --> 00:09:32,970 were not treated and compare them. 206 00:09:32,970 --> 00:09:36,630 That would be just silly because people who are treated 207 00:09:36,630 --> 00:09:38,980 are different than the people who are not treated. 208 00:09:42,760 --> 00:09:46,480 So I think conceptually, this is actually a really simple 209 00:09:46,480 --> 00:09:47,150 thing that we're doing. 210 00:09:47,150 --> 00:09:49,890 So I don't want to get bogged down or confused in the math. 211 00:09:49,890 --> 00:09:55,090 But in the ideal experiment, in the treatment group 100% of 212 00:09:55,090 --> 00:09:55,850 the people are treated. 213 00:09:55,850 --> 00:09:58,250 In the comparison group, 0% of the people are treated. 214 00:09:58,250 --> 00:10:00,820 And then the average difference is just the average 215 00:10:00,820 --> 00:10:03,260 treatment effect. 216 00:10:03,260 --> 00:10:06,380 But in real life when we do our experiments, we very often 217 00:10:06,380 --> 00:10:08,550 have leakage across groups. 218 00:10:08,550 --> 00:10:12,050 So the treatment control difference is not 1, 100% in 219 00:10:12,050 --> 00:10:14,460 the treatment group treated and 0% in the 220 00:10:14,460 --> 00:10:15,930 control group treated. 221 00:10:15,930 --> 00:10:17,180 But it's smaller. 
222 00:10:21,430 --> 00:10:24,270 The formal econometric phrase for this is 223 00:10:24,270 --> 00:10:25,520 instrumental variables. 224 00:10:29,220 --> 00:10:31,330 We instrument the probability of treatment with the original 225 00:10:31,330 --> 00:10:34,510 assignment and this will rescale the difference in 226 00:10:34,510 --> 00:10:37,890 means to give us a better estimate of what the effect of 227 00:10:37,890 --> 00:10:41,140 the treatment on the people who were treated is. 228 00:10:58,910 --> 00:11:00,080 So this is a simple example. 229 00:11:00,080 --> 00:11:02,020 And it turns out it gets more complicated. 230 00:11:02,020 --> 00:11:03,510 We're not going to go into all the nuances. 231 00:11:03,510 --> 00:11:05,330 But just suppose for simplicity that children who 232 00:11:05,330 --> 00:11:07,880 get the treatment have a weight gain of A, irrespective 233 00:11:07,880 --> 00:11:08,916 of whether they're in the treatment or 234 00:11:08,916 --> 00:11:10,000 in the control school. 235 00:11:10,000 --> 00:11:11,820 And children who get no treatment have a weight gain 236 00:11:11,820 --> 00:11:14,380 of B. Again, in both schools. 237 00:11:14,380 --> 00:11:16,990 And we want to know A minus B, the difference between getting 238 00:11:16,990 --> 00:11:18,520 treated and not getting treated. 239 00:11:24,540 --> 00:11:26,630 This is the math. 240 00:11:26,630 --> 00:11:28,680 And maybe it looks complicated, 241 00:11:28,680 --> 00:11:31,010 but it's really not. 242 00:11:34,410 --> 00:11:36,880 I think if we work through the Excel worksheet explaining 243 00:11:36,880 --> 00:11:39,880 what we're doing and then go back to the math, it'll become 244 00:11:39,880 --> 00:11:41,130 pretty clear. 245 00:11:43,190 --> 00:11:46,860 Imagine we run this experiment with pupils in School 1 and 246 00:11:46,860 --> 00:11:49,970 pupils in School 2. 247 00:11:49,970 --> 00:11:51,830 We intended to treat everybody in School 1. 
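Under the simplifying assumption just stated (a treated child gains A and an untreated child gains B, in either school), the rescaling can be written out explicitly. The symbols below are notation assumed for this sketch and are not taken from the slides: $\bar{Y}_T$ and $\bar{Y}_C$ are the average outcomes in the groups assigned to treatment and control, and $p_T$ and $p_C$ are the shares of each group that actually received treatment.

```latex
\begin{align*}
\bar{Y}_T &= p_T A + (1 - p_T) B \\
\bar{Y}_C &= p_C A + (1 - p_C) B \\
\bar{Y}_T - \bar{Y}_C &= (p_T - p_C)(A - B) \\
A - B &= \frac{\bar{Y}_T - \bar{Y}_C}{p_T - p_C}
\end{align*}
```

With perfect compliance ($p_T = 1$, $p_C = 0$) the denominator is 1 and the expression reduces to the simple difference in means.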
248 00:11:51,830 --> 00:11:53,450 We intended for everybody in School 2 to be 249 00:11:53,450 --> 00:11:56,000 in the control group. 250 00:11:56,000 --> 00:11:58,470 Unfortunately, we were only able to treat 6 out of the 10 251 00:11:58,470 --> 00:12:02,670 people in this group and 2 out of the 10 people in this group 252 00:12:02,670 --> 00:12:03,990 managed to get treatment somehow. 253 00:12:07,140 --> 00:12:09,840 The formula sort of guides us through what we need to do and 254 00:12:09,840 --> 00:12:14,120 then we can talk about the intuition behind what it is. 255 00:12:14,120 --> 00:12:17,120 Is there a heroic soul who's willing to try and just talk 256 00:12:17,120 --> 00:12:18,370 through the math and figure out? 257 00:12:22,889 --> 00:12:24,139 A non-heroic soul? 258 00:12:28,290 --> 00:12:29,540 Cold call. 259 00:12:32,630 --> 00:12:35,930 So, Will, let's start out with the easy stuff. 260 00:12:35,930 --> 00:12:39,150 So the formula we want is the average in the treatment group 261 00:12:39,150 --> 00:12:42,180 minus the average in the control group divided by the 262 00:12:42,180 --> 00:12:43,960 probability of treatment in the treatment group minus the 263 00:12:43,960 --> 00:12:46,050 probability of treatment in the control group. 264 00:12:46,050 --> 00:12:50,055 OK, so how do we calculate all of these things? 265 00:12:50,055 --> 00:12:52,470 AUDIENCE: So first you want to look at the change in the 266 00:12:52,470 --> 00:12:53,960 treatment group. 267 00:12:53,960 --> 00:12:59,100 So you average out the observed change in weight for 268 00:12:59,100 --> 00:13:01,540 the treatment group. 269 00:13:01,540 --> 00:13:02,790 SHAWN COLE: Great. 270 00:13:04,472 --> 00:13:07,102 OK, so that's three. 271 00:13:07,102 --> 00:13:09,910 AUDIENCE: You do the same calculation for the control 272 00:13:09,910 --> 00:13:12,994 group looking at the average change in weight. 
273 00:13:15,800 --> 00:13:18,907 You take the difference between those two numbers and 274 00:13:18,907 --> 00:13:20,110 that's the numerator of the proper fraction. 275 00:13:20,110 --> 00:13:24,380 SHAWN COLE: OK, so that's yt minus-- 276 00:13:24,380 --> 00:13:26,490 OK. 277 00:13:26,490 --> 00:13:29,370 AUDIENCE: And then to do the second half of the calculation 278 00:13:29,370 --> 00:13:30,330 you've got the denominator. 279 00:13:30,330 --> 00:13:35,879 Compare the rate of compliance in the treatment group to the 280 00:13:35,879 --> 00:13:39,252 rate of compliance in the control group. 281 00:13:39,252 --> 00:13:44,196 So the percentage in the treatment group that actually 282 00:13:44,196 --> 00:13:47,770 complied would be 0.6. 283 00:13:47,770 --> 00:13:49,780 SHAWN COLE: 1, 2, 3, 4, 5, 6. 284 00:13:49,780 --> 00:13:50,370 Awesome. 285 00:13:50,370 --> 00:13:51,670 OK. 286 00:13:51,670 --> 00:13:54,975 AUDIENCE: And for the control group, it would be 287 00:13:54,975 --> 00:13:56,406 the 2 out of 10. 288 00:13:56,406 --> 00:13:59,400 So the rate that received the treatment. 289 00:13:59,400 --> 00:14:02,820 So 0.6 minus 0.2. 290 00:14:02,820 --> 00:14:05,120 SHAWN COLE: 0.6 minus 0.2. 291 00:14:05,120 --> 00:14:06,860 OK. 292 00:14:06,860 --> 00:14:09,220 And now? 293 00:14:09,220 --> 00:14:10,930 AUDIENCE: Just divide the top and bottom. 294 00:14:13,580 --> 00:14:15,350 SHAWN COLE: OK, so that's the math. 295 00:14:15,350 --> 00:14:17,170 See if we got it right. 296 00:14:17,170 --> 00:14:18,810 Yes, we got it right. 297 00:14:18,810 --> 00:14:19,620 Excellent. 298 00:14:19,620 --> 00:14:21,990 So what's the intuition behind what we're doing here? 299 00:14:29,718 --> 00:14:33,740 AUDIENCE: You're sort of doing a weighted average. 300 00:14:37,670 --> 00:14:39,420 SHAWN COLE: OK, we're certainly taking the two 301 00:14:39,420 --> 00:14:43,390 averages and we're taking the difference of them. 
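The worksheet steps just talked through can be sketched in Python. The inputs are the values quoted in the lecture (group averages of 3 and 0.9, take-up of 0.6 and 0.2); the function name `tot_estimate` and its parameter names are hypothetical, chosen here for illustration, and the sketch relies on the constant-effect simplification the lecture adopts.

```python
def tot_estimate(mean_treat, mean_control, p_treat, p_control):
    """Treatment-on-the-treated: the ITT difference rescaled by the take-up gap.

    mean_treat / mean_control: average outcome by ORIGINAL assignment.
    p_treat / p_control: share of each assigned group actually treated.
    """
    take_up_gap = p_treat - p_control
    if take_up_gap == 0:
        raise ValueError("assignment did not change take-up; ToT is undefined")
    return (mean_treat - mean_control) / take_up_gap


# Values quoted in the lecture's two-school example.
itt = 3.0 - 0.9                         # numerator: difference in group averages
tot = tot_estimate(3.0, 0.9, 0.6, 0.2)  # denominator: 0.6 - 0.2

print(round(itt, 2), round(tot, 2))
```

The guard against a zero take-up gap reflects that the ratio only makes sense when assignment actually moved the probability of treatment.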
302 00:14:43,390 --> 00:14:44,810 And what do you mean by weighting? 303 00:14:44,810 --> 00:14:49,165 AUDIENCE: You're weighting them by the degree of 304 00:14:49,165 --> 00:14:50,500 compliance. 305 00:14:50,500 --> 00:14:54,800 SHAWN COLE: So if the difference in compliance-- 306 00:14:54,800 --> 00:14:58,670 keeping the top term the same, so suppose 307 00:14:58,670 --> 00:14:59,530 that this is still-- 308 00:14:59,530 --> 00:15:03,312 AUDIENCE: If compliance were really horrible you'd end up 309 00:15:03,312 --> 00:15:05,080 with no effect. 310 00:15:05,080 --> 00:15:07,208 It could swamp the difference between the 311 00:15:07,208 --> 00:15:09,400 control and the treatment. 312 00:15:09,400 --> 00:15:12,630 SHAWN COLE: OK, so let's take the mental exercise, keeping 313 00:15:12,630 --> 00:15:19,040 this yt minus yc the same at 2.1. 314 00:15:19,040 --> 00:15:21,350 If the compliance in the treatment group goes down from 315 00:15:21,350 --> 00:15:25,970 0.6 to 0.3, what's going to happen to the ToT effect? 316 00:15:25,970 --> 00:15:27,490 Is it going to go up or down? 317 00:15:27,490 --> 00:15:27,960 AUDIENCE: Up. 318 00:15:27,960 --> 00:15:30,910 SHAWN COLE: Why? 319 00:15:30,910 --> 00:15:32,160 AUDIENCE: [UNINTELLIGIBLE PHRASE] 320 00:15:34,070 --> 00:15:37,750 The people that you were targeting are not receiving 321 00:15:37,750 --> 00:15:38,700 the treatment. 322 00:15:38,700 --> 00:15:44,090 So in a sense, the effect that you are trying to describe is 323 00:15:44,090 --> 00:15:47,510 not what we would expect would happen because some of these 324 00:15:47,510 --> 00:15:51,205 people did not comply with what you were expecting from 325 00:15:51,205 --> 00:15:53,480 the beginning. 326 00:15:53,480 --> 00:15:57,190 AUDIENCE: It could go either up or down depending on the 327 00:15:57,190 --> 00:15:58,070 second parameter. 
328 00:15:58,070 --> 00:16:00,470 SHAWN COLE: OK, so what I want to do is I want to say there 329 00:16:00,470 --> 00:16:02,030 are four parameters here. 330 00:16:02,030 --> 00:16:03,830 There's the average in the treatment group. 331 00:16:03,830 --> 00:16:05,060 There's the average in the control group. 332 00:16:05,060 --> 00:16:06,500 There's the probability of treatment in the treatment 333 00:16:06,500 --> 00:16:08,000 group and the probability of treatment 334 00:16:08,000 --> 00:16:09,490 in the control group. 335 00:16:09,490 --> 00:16:15,870 If this goes down, what's the ToT estimate going to do? 336 00:16:15,870 --> 00:16:16,760 It's going to go? 337 00:16:16,760 --> 00:16:17,620 AUDIENCE: Increase. 338 00:16:17,620 --> 00:16:19,380 SHAWN COLE: Why? 339 00:16:19,380 --> 00:16:24,280 AUDIENCE: Because you previously underestimated the 340 00:16:24,280 --> 00:16:27,660 effect because not that many people received the treatment. 341 00:16:27,660 --> 00:16:29,910 SHAWN COLE: So mathematically, if we're making the 342 00:16:29,910 --> 00:16:33,490 denominator smaller, the size of the fraction has to go up. 343 00:16:33,490 --> 00:16:36,300 That's just a law of mathematics. 344 00:16:36,300 --> 00:16:38,560 But the intuition, what I'm looking for is the intuition. 345 00:16:38,560 --> 00:16:41,480 So what's going on? 346 00:16:41,480 --> 00:16:46,600 So suppose I first tell you that this difference is 3. 347 00:16:46,600 --> 00:16:49,083 Can you guys not see these numbers here? 348 00:16:49,083 --> 00:16:50,030 AUDIENCE: No. 349 00:16:50,030 --> 00:16:51,310 SHAWN COLE: That's excellent. 350 00:16:57,680 --> 00:17:00,420 OK, so let me use a blackboard, I think this'll be 351 00:17:00,420 --> 00:17:01,670 a little bit helpful. 352 00:17:06,020 --> 00:17:10,690 AUDIENCE: So were you asking just basically that because 353 00:17:10,690 --> 00:17:12,297 not everybody is being treated it's 354 00:17:12,297 --> 00:17:15,550 diluting the overall result? 
355 00:17:15,550 --> 00:17:16,069 SHAWN COLE: Explain this. 356 00:17:16,069 --> 00:17:19,220 Give us the sort of intuition. 357 00:17:19,220 --> 00:17:30,100 I said that this treatment group has an average of 3 and 358 00:17:30,100 --> 00:17:37,010 the control group has an average of 0.1 I think. 359 00:17:37,010 --> 00:17:41,210 No 0.9. 360 00:17:41,210 --> 00:17:46,360 And first I told you that the probability of treatment 361 00:17:46,360 --> 00:17:51,900 conditional on being in the treatment group is 0.6. 362 00:17:51,900 --> 00:17:53,510 But now I tell you actually it's much lower. 363 00:17:53,510 --> 00:17:54,880 It's only 0.3. 364 00:17:54,880 --> 00:17:56,420 So we know from the math that it's going to 365 00:17:56,420 --> 00:17:57,230 be higher, but why? 366 00:17:57,230 --> 00:17:57,880 What's the intuition? 367 00:17:57,880 --> 00:18:00,128 Why does it have to be higher? 368 00:18:00,128 --> 00:18:02,508 AUDIENCE: Because the gap in outcomes is actually caused by 369 00:18:02,508 --> 00:18:02,990 fewer people. 370 00:18:02,990 --> 00:18:05,620 Meaning that for those two people it must have been a 371 00:18:05,620 --> 00:18:08,670 really big gap to balance out the average to still be a lot 372 00:18:08,670 --> 00:18:10,930 higher when everyone else in the treated group 373 00:18:10,930 --> 00:18:12,380 actually got zeros. 374 00:18:12,380 --> 00:18:13,900 SHAWN COLE: Yeah, absolutely. 375 00:18:13,900 --> 00:18:18,820 So we've got some difference in observed outcomes. 376 00:18:18,820 --> 00:18:21,760 And that has to be caused by the fact that there were more 377 00:18:21,760 --> 00:18:24,580 people in the treatment group treated than in the control 378 00:18:24,580 --> 00:18:25,860 group who were treated. 
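The mental exercise above, holding the observed difference at 2.1 while take-up in the treatment group falls, can be checked numerically. This is a sketch under stated assumptions: control-group take-up is held at 0.2 throughout (the lecture does not say so explicitly for this step), and the helper name `rescale` is hypothetical.

```python
def rescale(observed_difference, p_treat, p_control):
    """Scale an observed difference in means by the gap in take-up rates."""
    return observed_difference / (p_treat - p_control)


OBSERVED_DIFFERENCE = 2.1  # y_t minus y_c, held fixed as in the exercise
P_CONTROL = 0.2            # assumed unchanged for this illustration

# Full compliance in the treatment group, then the two cases from the lecture.
for p_treat in (1.0, 0.6, 0.3):
    estimate = rescale(OBSERVED_DIFFERENCE, p_treat, P_CONTROL)
    print(f"take-up {p_treat:.1f} -> implied effect per treated child {estimate:.2f}")
```

The smaller the gap in take-up, the fewer people there are to account for the same observed difference, so the implied effect on each of them has to be larger.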
379 00:18:25,860 --> 00:18:28,240 So if there were a whole lot of people in the treatment 380 00:18:28,240 --> 00:18:32,240 group treated, then this rate difference could be just that 381 00:18:32,240 --> 00:18:36,110 the average score is two higher for each person in the 382 00:18:36,110 --> 00:18:37,140 treatment group, if everybody in the 383 00:18:37,140 --> 00:18:38,500 treatment group were treated. 384 00:18:38,500 --> 00:18:40,375 But if only three people in the treatment group were 385 00:18:40,375 --> 00:18:44,800 treated, to raise the average from 0.9 to 3, you have to 386 00:18:44,800 --> 00:18:48,810 have a really big effect on those three people. 387 00:18:48,810 --> 00:18:51,940 So that's all the treatment on the treated estimator is 388 00:18:51,940 --> 00:18:56,960 doing, is it's rescaling this observed effect to account for 389 00:18:56,960 --> 00:19:03,630 the fact that the difference is not 1 here. 390 00:19:03,630 --> 00:19:06,760 If we had perfect compliance, if this were 1 and this were 391 00:19:06,760 --> 00:19:10,490 0, then the denominator of the fraction would just be 1. 392 00:19:10,490 --> 00:19:12,910 This would look just like the intention to treat and the 393 00:19:12,910 --> 00:19:14,680 intention to treat and the treatment on the treated 394 00:19:14,680 --> 00:19:16,710 estimators would be the same. 395 00:19:16,710 --> 00:19:21,050 But if we have imperfect compliance, then we can scale 396 00:19:21,050 --> 00:19:24,440 up the effect to account for the fact that all of the 397 00:19:24,440 --> 00:19:27,710 change has to be due to the fact that a smaller number of 398 00:19:27,710 --> 00:19:30,153 people were getting the treatment. 399 00:19:30,153 --> 00:19:34,750 AUDIENCE: Now you can only do this because you believe there 400 00:19:34,750 --> 00:19:37,780 is no systematic difference for why people were treated or 401 00:19:37,780 --> 00:19:40,420 not in between these two groups? 
402 00:19:40,420 --> 00:19:40,720 SHAWN COLE: Right. 403 00:19:40,720 --> 00:19:46,900 So there are some technical assumptions on how you 404 00:19:46,900 --> 00:19:48,470 interpret these effects. 405 00:19:48,470 --> 00:19:52,040 But if we agree that the effect is basically a constant 406 00:19:52,040 --> 00:19:54,580 effect, so our literacy program has the same effect on 407 00:19:54,580 --> 00:19:58,530 everybody, then this is perfectly fine. 408 00:19:58,530 --> 00:20:01,950 And we can do this because we randomly assigned the 409 00:20:01,950 --> 00:20:03,830 treatment and the control group, so ex ante the two 410 00:20:03,830 --> 00:20:05,080 groups are the same. 411 00:20:07,570 --> 00:20:09,510 Will? 412 00:20:09,510 --> 00:20:10,440 AUDIENCE: So why do you not have to worry that the people 413 00:20:10,440 --> 00:20:13,780 in the treatment group who actually comply are the more 414 00:20:13,780 --> 00:20:16,350 motivated batch of the treatment group? 415 00:20:16,350 --> 00:20:17,110 SHAWN COLE: Right. 416 00:20:17,110 --> 00:20:27,650 That is a concern and it's covered in the papers, which 417 00:20:27,650 --> 00:20:30,020 I'll get to at the end. 418 00:20:30,020 --> 00:20:34,270 Basically, the sort of aggressive intuition is that 419 00:20:34,270 --> 00:20:37,520 this tends to measure the effect on people who are 420 00:20:37,520 --> 00:20:41,180 affected by the program. 421 00:20:41,180 --> 00:20:45,490 But in general, this is a pretty good way of just 422 00:20:45,490 --> 00:20:48,870 scaling up your program effects to account for the 423 00:20:48,870 --> 00:20:50,120 possibility of noncompliance. 424 00:20:53,470 --> 00:20:57,441 So moving back to-- 425 00:20:57,441 --> 00:20:58,320 sure. 426 00:20:58,320 --> 00:21:00,048 AUDIENCE: Did you answer his question? 427 00:21:03,160 --> 00:21:06,560 SHAWN COLE: Why do we have to worry about it? 428 00:21:06,560 --> 00:21:08,150 AUDIENCE: Yeah. 
429 00:21:08,150 --> 00:21:12,110 It seemed like it's a serious problem. 430 00:21:12,110 --> 00:21:14,340 AUDIENCE: It started random, but then people who actually 431 00:21:14,340 --> 00:21:17,060 got treated, if they were treated because they were 432 00:21:17,060 --> 00:21:20,500 systematically different then it's no longer-- 433 00:21:20,500 --> 00:21:23,412 your control group is no longer a good representation 434 00:21:23,412 --> 00:21:24,250 of the [UNINTELLIGIBLE]. 435 00:21:24,250 --> 00:21:24,870 SHAWN COLE: Right. 436 00:21:24,870 --> 00:21:27,540 There are technical assumptions about monotonicity 437 00:21:27,540 --> 00:21:31,910 and independence that I sort of would rather not go into. 438 00:21:31,910 --> 00:21:38,040 But if you'll at least just grant that there's a constant 439 00:21:38,040 --> 00:21:42,960 treatment effect, then I think we're fine and by scaling up 440 00:21:42,960 --> 00:21:48,476 the impact to account for the non-compliers, we'll be OK. 441 00:21:48,476 --> 00:21:50,690 AUDIENCE: Is the idea-- sorry not to belabor the point. 442 00:21:50,690 --> 00:21:53,070 But is the idea that this method will get the correct 443 00:21:53,070 --> 00:21:55,980 magnitude of the treatment on those who are treated in this 444 00:21:55,980 --> 00:21:58,450 study, but you can't necessarily extrapolate that 445 00:21:58,450 --> 00:22:00,620 to say that would have been the same 446 00:22:00,620 --> 00:22:01,570 treatment on the treated 447 00:22:01,570 --> 00:22:03,550 were you able to force these other people who might 448 00:22:03,550 --> 00:22:05,460 be different in some way to get treated? 449 00:22:05,460 --> 00:22:05,790 SHAWN COLE: Right. 450 00:22:05,790 --> 00:22:09,430 This will tell you the correct magnitude of the effect for the people who, 451 00:22:09,430 --> 00:22:11,960 because they were assigned to treatment, get the treatment.
452 00:22:11,960 --> 00:22:14,010 So that's again, a relevant parameter when you're running 453 00:22:14,010 --> 00:22:18,050 a program that causes people to take the treatment, you 454 00:22:18,050 --> 00:22:19,870 have the effect. 455 00:22:19,870 --> 00:22:21,500 If on the other hand you're the government and you have 456 00:22:21,500 --> 00:22:25,985 some ability to compel compliance, then you might not 457 00:22:25,985 --> 00:22:26,970 get the same effect. 458 00:22:26,970 --> 00:22:28,240 So you may worry about this. 459 00:22:28,240 --> 00:22:31,130 Some people have done studies to try and sort of see how big 460 00:22:31,130 --> 00:22:32,640 a problem this is. 461 00:22:32,640 --> 00:22:36,630 And one example I can cite is some studies in the education 462 00:22:36,630 --> 00:22:37,020 literature. 463 00:22:37,020 --> 00:22:40,520 So in the US people have looked at mandatory school 464 00:22:40,520 --> 00:22:44,120 attendance laws that say in some states you can drop out 465 00:22:44,120 --> 00:22:46,330 of school at the age of 16 and in some states you can drop 466 00:22:46,330 --> 00:22:49,400 out of school at the age of 15. 467 00:22:49,400 --> 00:22:52,130 And so changes in these laws induce some 468 00:22:52,130 --> 00:22:53,460 people to stay in longer. 469 00:22:53,460 --> 00:22:56,720 But probably nobody here would have been affected by one of 470 00:22:56,720 --> 00:22:59,040 these mandatory schooling laws because we were all planning 471 00:22:59,040 --> 00:23:01,340 to go on to college anyway. 472 00:23:01,340 --> 00:23:05,550 And so people have compared estimates in the US with those 473 00:23:05,550 --> 00:23:08,280 in the UK where the mandatory schooling laws were very 474 00:23:08,280 --> 00:23:10,700 binding and they found that the point estimates were 475 00:23:10,700 --> 00:23:11,370 pretty similar. 476 00:23:11,370 --> 00:23:15,080 So that's an example of where it can be reasonable.
477 00:23:15,080 --> 00:23:18,210 But this is something that you have to treat with a little 478 00:23:18,210 --> 00:23:20,850 bit of caution. 479 00:23:20,850 --> 00:23:23,100 So there are other challenges with this. 480 00:23:25,610 --> 00:23:28,680 To get the ToT, we need to know the probability of 481 00:23:28,680 --> 00:23:30,750 treatment conditional on being in the treatment group and the 482 00:23:30,750 --> 00:23:32,750 probability of being treated conditional on being in the 483 00:23:32,750 --> 00:23:34,020 control group. 484 00:23:34,020 --> 00:23:35,480 So why might these be hard to get? 485 00:23:38,896 --> 00:23:40,848 AUDIENCE: I actually had a question. 486 00:23:40,848 --> 00:23:44,150 I don't know if it's directly related. 487 00:23:44,150 --> 00:23:48,008 Does this work if the probability that you're 488 00:23:48,008 --> 00:23:50,372 treated in both the treatment and control turn out to be 489 00:23:50,372 --> 00:23:51,710 equal or is there probability that-- 490 00:23:51,710 --> 00:23:53,690 SHAWN COLE: Great question. 491 00:23:53,690 --> 00:23:56,780 So you're anticipating a slide three slides from now. 492 00:23:56,780 --> 00:23:58,100 So let's go there and then we'll come 493 00:23:58,100 --> 00:23:59,420 back to my other problem. 494 00:23:59,420 --> 00:24:01,230 But this is the equation. 495 00:24:01,230 --> 00:24:03,070 So what happens if the probability of treatment in the 496 00:24:03,070 --> 00:24:05,838 treatment and the control group is the same? 497 00:24:05,838 --> 00:24:07,275 AUDIENCE: Mathematically you have problems. 498 00:24:10,630 --> 00:24:12,830 SHAWN COLE: Mathematically you're dividing something by 0 499 00:24:12,830 --> 00:24:16,160 and you're going to get infinity or negative infinity. 500 00:24:16,160 --> 00:24:19,320 So yeah, that's not successful. 501 00:24:19,320 --> 00:24:20,740 I mean, another way of putting that is that 502 00:24:20,740 --> 00:24:22,300 your experiment failed.
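The equation on the slide, and the divide-by-zero case just discussed, can be sketched in a few lines of Python. This is only an illustration under the constant-treatment-effect assumption granted earlier. The function name `wald_tot` and its argument names are made up for this example, and the 0.5 take-up figure is hypothetical. The means 3.0 and 0.9 echo the numbers used earlier in the lecture.

```python
def wald_tot(mean_treat, mean_control, takeup_treat, takeup_control):
    """Treatment-on-the-treated (Wald) estimate: ITT divided by the take-up gap."""
    itt = mean_treat - mean_control              # intention-to-treat effect
    takeup_gap = takeup_treat - takeup_control   # difference in probability of treatment
    if takeup_gap == 0:
        # Same probability of treatment in both groups: nothing to rescale by.
        raise ValueError("equal take-up in both groups: the ratio is undefined")
    return itt / takeup_gap


# Perfect compliance (take-up 1 vs 0): the ToT is just the ITT.
perfect = wald_tot(3.0, 0.9, 1.0, 0.0)

# Hypothetical partial compliance (take-up 0.5 vs 0): the same ITT is doubled.
partial = wald_tot(3.0, 0.9, 0.5, 0.0)
```

Raising an error when the take-up gap is zero mirrors the point that an experiment with no change in the probability of treatment has nothing to say.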
503 00:24:22,300 --> 00:24:24,910 You randomly assigned treatment and you assumed that 504 00:24:24,910 --> 00:24:27,540 your program, a financial literacy education program, 505 00:24:27,540 --> 00:24:31,250 de-worming medicine, is going to deliver a product. 506 00:24:31,250 --> 00:24:33,850 But if, in fact, it doesn't change the probability of 507 00:24:33,850 --> 00:24:36,060 treatment at all, then your program failed. 508 00:24:36,060 --> 00:24:39,280 It's sort of like if I were to sit here and 509 00:24:39,280 --> 00:24:41,210 working with an [? MFI, ?] 510 00:24:41,210 --> 00:24:43,810 send them an email and say, I'm assigning these 50 511 00:24:43,810 --> 00:24:46,540 villages to treatment and these 50 villages to control. 512 00:24:46,540 --> 00:24:49,560 And then the email gets lost and they don't do any sort of 513 00:24:49,560 --> 00:24:50,810 treatment or control. 514 00:24:50,810 --> 00:24:52,570 And then they send me the data back a year later 515 00:24:52,570 --> 00:24:53,430 and I do the analysis. 516 00:24:53,430 --> 00:24:55,940 Well, I'm going to find that there's no difference in 517 00:24:55,940 --> 00:24:58,030 probability of treatment between the two groups and I 518 00:24:58,030 --> 00:24:59,700 won't have anything interesting to say. 519 00:25:03,190 --> 00:25:05,300 So that's definitely one problem. 520 00:25:05,300 --> 00:25:10,270 The experiment doesn't work unless the treatment induces a 521 00:25:10,270 --> 00:25:11,980 change in probability of treatment between the 522 00:25:11,980 --> 00:25:13,230 treatment and the comparison groups. 523 00:25:16,200 --> 00:25:17,610 So that's one problem. 524 00:25:17,610 --> 00:25:19,600 There's a second problem with this estimation since we're on 525 00:25:19,600 --> 00:25:23,540 this slide, which is perhaps, slightly more technical. 526 00:25:23,540 --> 00:25:27,830 But suppose the difference in treatment rates between the treatment and the 527 00:25:27,830 --> 00:25:29,560 comparison group is small.
528 00:25:29,560 --> 00:25:33,360 So say it's 10% of the treatment group are treated 529 00:25:33,360 --> 00:25:36,720 and none of the comparison group is treated. 530 00:25:36,720 --> 00:25:40,660 Then we're estimating the mean of all of the treatment group people 531 00:25:40,660 --> 00:25:43,260 minus the mean of all the control people and dividing by 532 00:25:43,260 --> 00:25:47,280 0.1, which is the same thing as multiplying by 10. 533 00:25:47,280 --> 00:25:51,510 So if there's a little bit of noise in these measures, then 534 00:25:51,510 --> 00:25:52,550 instead of-- 535 00:25:52,550 --> 00:25:54,300 suppose the true mean is 3, but you 536 00:25:54,300 --> 00:25:57,410 happened to measure 3.1. 537 00:25:57,410 --> 00:25:59,890 Then that noise is going to be blown up by a 538 00:25:59,890 --> 00:26:02,690 factor of 10 as well. 539 00:26:02,690 --> 00:26:08,020 So the estimate is going to be much less precise when the 540 00:26:08,020 --> 00:26:11,540 difference in treatment between the treatment and the 541 00:26:11,540 --> 00:26:15,740 comparison group is low. 542 00:26:15,740 --> 00:26:19,333 And as you said, if it gets to 0, you're in a pickle. 543 00:26:19,333 --> 00:26:22,005 AUDIENCE: That's a pretty extreme problem. 544 00:26:22,005 --> 00:26:25,740 It means I had a treatment group and 545 00:26:25,740 --> 00:26:27,850 only 10% of the people. 546 00:26:27,850 --> 00:26:30,160 SHAWN COLE: Sure. 547 00:26:30,160 --> 00:26:33,040 That's true, but it's not implausible that you'd want to 548 00:26:33,040 --> 00:26:34,470 run that type of experiment. 549 00:26:34,470 --> 00:26:36,700 So if you think of-- 550 00:26:36,700 --> 00:26:38,960 I don't know the exact numbers on the flu encouragement 551 00:26:38,960 --> 00:26:41,440 design papers.
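The "blown up by a factor of 10" point can be checked with a small calculation. This is a rough sketch: the comparison-group mean of 2.5 is a hypothetical number picked for illustration, and `tot_from_means` is an illustrative helper, not something from the lecture. The 3 versus 3.1 and the 10% take-up come from the example above.

```python
def tot_from_means(mean_treat, mean_control, takeup_gap):
    """Rescale the difference in group means by the difference in take-up."""
    return (mean_treat - mean_control) / takeup_gap


takeup_gap = 0.1      # 10% of the treatment group treated, 0% of the comparison group
mean_control = 2.5    # hypothetical comparison-group mean

true_tot = tot_from_means(3.0, mean_control, takeup_gap)    # using the true mean of 3
noisy_tot = tot_from_means(3.1, mean_control, takeup_gap)   # using the measured 3.1

# The 0.1 measurement error shows up as an error of about 1.0 in the ToT.
error_in_tot = noisy_tot - true_tot
```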
552 00:26:41,440 --> 00:26:43,960 To try and understand how effective a flu shot is, you 553 00:26:43,960 --> 00:26:47,750 can send out letters to a bunch of people and some of 554 00:26:47,750 --> 00:26:49,720 the people-- the control group, you just send a letter 555 00:26:49,720 --> 00:26:52,200 saying, just for your information, the influenza 556 00:26:52,200 --> 00:26:52,970 season is coming. 557 00:26:52,970 --> 00:26:54,860 Please wash your hands a lot. 558 00:26:54,860 --> 00:26:57,330 The treatment group you send a letter saying that information 559 00:26:57,330 --> 00:27:00,180 plus we're offering free flu shots down at the clinic, why 560 00:27:00,180 --> 00:27:01,470 don't you come down? 561 00:27:01,470 --> 00:27:03,070 And you may only get a 10% response 562 00:27:03,070 --> 00:27:05,280 rate from that letter. 563 00:27:05,280 --> 00:27:08,760 But in that case, the treatment is very cheap. 564 00:27:08,760 --> 00:27:12,670 It's just $0.32, or whatever a stamp cost these days. 565 00:27:12,670 --> 00:27:16,290 So you could do a study with 100,000 people. 566 00:27:16,290 --> 00:27:19,090 Then you would estimate both of these things very precisely 567 00:27:19,090 --> 00:27:21,230 because you'd have 50,000 people in the treatment group, 568 00:27:21,230 --> 00:27:23,540 50,000 people in the control group. 569 00:27:23,540 --> 00:27:26,590 And you'd have these numbers and so you could come up with 570 00:27:26,590 --> 00:27:27,300 an estimate. 571 00:27:27,300 --> 00:27:29,690 It's not like we should give up on an experiment that has a 572 00:27:29,690 --> 00:27:33,680 low probability of treatment, it's just if we think we're 573 00:27:33,680 --> 00:27:35,370 going to be there we want to do our power 574 00:27:35,370 --> 00:27:36,190 calculations carefully. 575 00:27:36,190 --> 00:27:39,880 And I think you would have seen when you did your-- 576 00:27:39,880 --> 00:27:42,940 did your power calculations include noncompliance? 
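One way to see why low take-up calls for careful power calculations: under a common textbook approximation (an assumption here, not something stated in the lecture), the detectable effect scales with the take-up gap, so the sample needed for the same power grows with one over the gap squared. The helper name below is hypothetical.

```python
def sample_size_inflation(takeup_treat, takeup_control):
    """Approximate factor by which the sample must grow relative to perfect compliance.

    Assumes a constant treatment effect and an outcome variance that does not
    change with compliance, so the factor is 1 / (take-up gap) ** 2.
    """
    gap = takeup_treat - takeup_control
    if gap == 0:
        raise ValueError("no difference in take-up: no sample size is large enough")
    return 1.0 / gap ** 2


# 10% response to the letter, nobody treated in the control group.
letter_design = sample_size_inflation(0.10, 0.0)   # roughly 100 times the sample

# 50% versus 0% take-up.
half_takeup = sample_size_inflation(0.50, 0.0)     # 4 times the sample
```

On this approximation, a cheap treatment such as a mailed letter can still be worth running, because the large sample it requires is affordable.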
577 00:27:42,940 --> 00:27:45,660 OK, so if you start adjusting the power calculations for 578 00:27:45,660 --> 00:27:48,700 noncompliance, you see that you need larger sample sizes. 579 00:27:48,700 --> 00:27:51,440 And so that's an important lesson as well. 580 00:27:55,460 --> 00:27:56,750 Sticking with that flu example. 581 00:28:00,400 --> 00:28:02,280 Why might this be hard? 582 00:28:02,280 --> 00:28:04,480 When we sent out these letters to all these people either 583 00:28:04,480 --> 00:28:06,170 encouraging them to wash their hands or encouraging them to 584 00:28:06,170 --> 00:28:06,705 get flu shots. 585 00:28:06,705 --> 00:28:09,240 AUDIENCE: We have to observe both. 586 00:28:09,240 --> 00:28:12,494 Whether there's treatment in your control in particular, 587 00:28:12,494 --> 00:28:15,180 that could be really hard to do. 588 00:28:15,180 --> 00:28:17,700 SHAWN COLE: So maybe if you're directing them to your flu 589 00:28:17,700 --> 00:28:20,020 clinic, you're going sort of observe everybody you send a 590 00:28:20,020 --> 00:28:23,230 letter who comes in for a flu shot. 591 00:28:23,230 --> 00:28:25,760 And maybe they're employees of Blue Cross Blue Shield or 592 00:28:25,760 --> 00:28:28,060 something and you hope you'll get a good estimate. 593 00:28:28,060 --> 00:28:30,350 But what about this, you've got 50,000 people you've sent 594 00:28:30,350 --> 00:28:32,940 letters to in the control group. 595 00:28:32,940 --> 00:28:33,480 What can you do? 596 00:28:33,480 --> 00:28:40,480 Do you have to phone up all 50,000 people and try and-- 597 00:28:40,480 --> 00:28:42,690 AUDIENCE: And say, oh, by the way, did you happen 598 00:28:42,690 --> 00:28:43,210 to get a flu shot. 599 00:28:43,210 --> 00:28:43,320 SHAWN COLE: Exactly. 600 00:28:43,320 --> 00:28:45,690 So do we need to make 50,000 telephone calls 601 00:28:45,690 --> 00:28:47,402 to solve this problem? 602 00:28:47,402 --> 00:28:49,250 AUDIENCE: Randomly sample. 
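The random-sampling suggestion can be made concrete with the usual standard error of a proportion. This sketch assumes a simple random sample and a hypothetical true take-up rate of 10% in the control group. Neither the rate nor the function name comes from the lecture.

```python
import math


def proportion_se(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


assumed_rate = 0.10   # hypothetical share of the control group that got a flu shot

# Standard error of the estimated take-up rate for several phone-survey sizes.
se_by_sample_size = {n: proportion_se(assumed_rate, n) for n in (200, 500, 1000, 50000)}
```

With these assumed numbers, a few hundred calls already pin the rate down to within a couple of percentage points, which is why calling all 50,000 people is not needed.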
603 00:28:49,250 --> 00:28:49,970 SHAWN COLE: Randomly sample? 604 00:28:49,970 --> 00:28:53,140 So pick 500, even 1,000, or who knows? 605 00:28:53,140 --> 00:28:55,520 Maybe even 200 people, 300 people will give you a pretty 606 00:28:55,520 --> 00:28:57,040 good estimate of what this is. 607 00:28:57,040 --> 00:28:59,590 You just need to make sure that it is a randomly selected 608 00:28:59,590 --> 00:29:01,670 sample and you're not just asking all of the people in 609 00:29:01,670 --> 00:29:02,920 this particular clinic. 610 00:29:08,500 --> 00:29:12,540 We're getting a little bit out of order. 611 00:29:12,540 --> 00:29:14,900 I've gone through the math, but PowerPoint wouldn't let 612 00:29:14,900 --> 00:29:15,420 you see it. 613 00:29:15,420 --> 00:29:19,800 But at least suffice it to say Will talked us through it. 614 00:29:19,800 --> 00:29:22,470 It's not a lot of fancy econometrics we're doing here. 615 00:29:22,470 --> 00:29:24,740 This is called the Wald estimate if you've taken 616 00:29:24,740 --> 00:29:26,170 econometrics. 617 00:29:26,170 --> 00:29:32,100 But it's a very simple method for coming up with the estimate. 618 00:29:32,100 --> 00:29:35,630 So there's some problems or there's some areas where this 619 00:29:35,630 --> 00:29:36,330 could be a problem. 620 00:29:36,330 --> 00:29:37,280 And Will hinted at one. 621 00:29:37,280 --> 00:29:39,170 We can mention a couple others. 622 00:29:39,170 --> 00:29:44,140 So how might this treatment on the treated design fail, 623 00:29:44,140 --> 00:29:46,740 let's say, in the letters example, 624 00:29:46,740 --> 00:29:47,870 the influenza example? 625 00:29:47,870 --> 00:29:50,170 So we're sending out a bunch of letters. 626 00:29:50,170 --> 00:29:52,290 So suppose the treatment group is you send out a bunch of 627 00:29:52,290 --> 00:29:55,290 letters saying, it's flu season, come get a flu shot. 628 00:29:55,290 --> 00:29:57,425 And the control group is you don't send out any letters.
629 00:30:00,150 --> 00:30:03,640 And so what happens? 630 00:30:03,640 --> 00:30:07,990 Suppose you get 50% compliance. 631 00:30:07,990 --> 00:30:15,290 So treated and the control group, 632 00:30:15,290 --> 00:30:19,070 compliance means flu shot. 633 00:30:19,070 --> 00:30:21,020 You get 50%. 634 00:30:21,020 --> 00:30:23,600 Let's make it easy, in the control group you get 0%. 635 00:30:23,600 --> 00:30:26,120 You do a sample and just nobody gets a flu shot. 636 00:30:26,120 --> 00:30:28,650 Maybe this is in a developing country where people don't 637 00:30:28,650 --> 00:30:30,260 tend to think about flu shots. 638 00:30:30,260 --> 00:30:38,480 And suppose you get the flu rate to be 10% here. 639 00:30:38,480 --> 00:30:41,180 We could say this is in Mexico. 640 00:30:41,180 --> 00:30:42,510 And 15% here. 641 00:30:45,070 --> 00:30:46,460 Do I have my math right? 642 00:30:46,460 --> 00:30:46,960 Yeah. 643 00:30:46,960 --> 00:30:49,940 So what would the treatment on the treated estimate here be? 644 00:30:55,700 --> 00:30:57,620 AUDIENCE: 20%? 645 00:30:57,620 --> 00:30:59,280 SHAWN COLE: So what's the formula? 646 00:31:04,840 --> 00:31:08,814 AUDIENCE: 10 minus 15. 647 00:31:08,814 --> 00:31:12,140 Divided by 0.5. 648 00:31:12,140 --> 00:31:13,810 SHAWN COLE: Minus 0. 649 00:31:13,810 --> 00:31:15,060 So it's minus 10. 650 00:31:19,200 --> 00:31:23,520 So giving the flu shot to a population will reduce the 651 00:31:23,520 --> 00:31:27,350 incidence of flu by 10 percentage points. 652 00:31:27,350 --> 00:31:31,860 So what's an example of how this experiment could fail, or 653 00:31:31,860 --> 00:31:35,366 how this number could not be correct? 654 00:31:35,366 --> 00:31:38,824 AUDIENCE: By reminding people to get a flu shot, you remind 655 00:31:38,824 --> 00:31:41,788 them that the flu is out there and they might do other things 656 00:31:41,788 --> 00:31:42,290 besides get a flu shot. 657 00:31:42,290 --> 00:31:42,710 SHAWN COLE: Absolutely. 
658 00:31:42,710 --> 00:31:44,830 So now they wash their hands, or they don't go out in public 659 00:31:44,830 --> 00:31:46,990 places, or they wear masks. 660 00:31:46,990 --> 00:31:51,570 And so you think that you're reducing the flu a lot by 661 00:31:51,570 --> 00:31:52,470 these flu shots. 662 00:31:52,470 --> 00:31:54,530 But maybe in fact, washing your hands is a lot more 663 00:31:54,530 --> 00:31:55,380 important than-- 664 00:31:55,380 --> 00:31:57,330 in fact, I think it probably is-- washing your hands often 665 00:31:57,330 --> 00:31:58,990 is a lot more important than getting a flu shot. 666 00:31:58,990 --> 00:32:00,800 And that's what's giving you the effect. 667 00:32:00,800 --> 00:32:03,670 So if you're scaling up this impact in the treatment on the 668 00:32:03,670 --> 00:32:07,440 treated to give yourself credit for the fact that 50% 669 00:32:07,440 --> 00:32:08,710 of the people didn't get a flu shot, 670 00:32:08,710 --> 00:32:10,980 but instead they're actually washing their hands a lot, 671 00:32:10,980 --> 00:32:13,610 then this won't be the correct estimate of 672 00:32:13,610 --> 00:32:15,900 treatment on the treated. 673 00:32:15,900 --> 00:32:19,096 Is that reasonably clear? 674 00:32:19,096 --> 00:32:20,578 AUDIENCE: I have a question. 675 00:32:20,578 --> 00:32:24,010 Will it be a good estimate of the impact of that specific 676 00:32:24,010 --> 00:32:25,400 intervention on the treated? 677 00:32:25,400 --> 00:32:29,524 So instead of measuring the impact of the flu shot, you're 678 00:32:29,524 --> 00:32:32,512 measuring the impact of reminding people about flu 679 00:32:32,512 --> 00:32:34,520 shots and giving them access to free ones? 680 00:32:34,520 --> 00:32:37,090 SHAWN COLE: So would you want to scale that up or not? 681 00:32:37,090 --> 00:32:38,948 Which estimate would you want to take there? 682 00:32:38,948 --> 00:32:40,945 AUDIENCE: Depends on what you're interested in.
683 00:32:40,945 --> 00:32:42,310 SHAWN COLE: So suppose we're interested in, what's the 684 00:32:42,310 --> 00:32:43,980 impact of sending a letter to somebody and 685 00:32:43,980 --> 00:32:46,780 offering them a flu shot? 686 00:32:46,780 --> 00:32:50,250 Is the correct estimate 5 percentage points or 10 687 00:32:50,250 --> 00:32:51,500 percentage points? 688 00:32:54,380 --> 00:32:55,420 AUDIENCE: 5. 689 00:32:55,420 --> 00:32:56,120 SHAWN COLE: Why? 690 00:32:56,120 --> 00:32:58,625 AUDIENCE: Then your intent to treat is maybe more 691 00:32:58,625 --> 00:33:01,365 interesting because you want to take into account that 692 00:33:01,365 --> 00:33:04,116 people change their behavior and their compliance rates are 693 00:33:04,116 --> 00:33:04,280 [UNINTELLIGIBLE]. 694 00:33:04,280 --> 00:33:04,590 SHAWN COLE: Right. 695 00:33:04,590 --> 00:33:07,030 So then we're really interested in the package of 696 00:33:07,030 --> 00:33:11,540 effects that sending a letter causes, which includes 697 00:33:11,540 --> 00:33:12,940 sometimes getting a shot, more likely. 698 00:33:12,940 --> 00:33:15,330 But also washing your hands or being more careful. 699 00:33:15,330 --> 00:33:16,890 So the intent to treat-- 700 00:33:16,890 --> 00:33:18,633 and as we say, so if you have this situation, you should 701 00:33:18,633 --> 00:33:19,850 only use intent to treat. 702 00:33:19,850 --> 00:33:21,170 And intent to treat is very interesting and it tells us 703 00:33:21,170 --> 00:33:23,480 the effect of sending out a letter, but it doesn't 704 00:33:23,480 --> 00:33:26,720 necessarily tell us the effect of the flu shot. 705 00:33:26,720 --> 00:33:31,300 Anybody have any examples from their own programs or 706 00:33:31,300 --> 00:33:33,410 projected projects where they're concerned about this? 707 00:33:37,300 --> 00:33:40,260 Maybe you guys can be at least sure to work this in on your 708 00:33:40,260 --> 00:33:43,470 presentations tomorrow. 
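The two candidate answers in the flu example, 5 and 10 percentage points, are the intent-to-treat and treatment-on-the-treated numbers for the same data. A minimal sketch using the board figures (flu rates of 10% and 15%, flu-shot take-up of 50% and 0%), with illustrative variable names, working in percentage points:

```python
# Figures from the flu-letter example, in percentage points.
flu_rate_letter = 10.0       # flu incidence in the group that was sent the letter
flu_rate_no_letter = 15.0    # flu incidence in the control group

takeup_letter = 0.5          # share of the letter group that got a flu shot
takeup_no_letter = 0.0       # share of the control group that got a flu shot

# Intent to treat: the effect of sending the letter, whatever people then do.
itt = flu_rate_letter - flu_rate_no_letter

# Treatment on the treated: the ITT rescaled by the difference in take-up.
# This only isolates the flu shot if the letter changes nothing else,
# such as hand washing.
tot = itt / (takeup_letter - takeup_no_letter)
```

Here `itt` is -5 and `tot` is -10. Which one to report depends on whether the question is about the letter or about the shot.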
709 00:33:43,470 --> 00:33:46,350 What about when you have spillovers or externalities? 710 00:33:46,350 --> 00:33:48,270 So I think Michael talked about spillovers and 711 00:33:48,270 --> 00:33:49,100 externalities this morning. 712 00:33:49,100 --> 00:33:50,800 But he might not have integrated that with 713 00:33:50,800 --> 00:33:53,100 intention to treat. 714 00:33:53,100 --> 00:33:54,090 Is that a correct 715 00:33:54,090 --> 00:33:55,570 characterization of his lecture? 716 00:33:55,570 --> 00:33:56,820 Excellent. 717 00:34:01,130 --> 00:34:03,440 How could we go wrong-- let's stick to something I know 718 00:34:03,440 --> 00:34:06,450 well-- with the Balsakhi Program. 719 00:34:06,450 --> 00:34:07,860 In the Balsakhi Program, we have 720 00:34:07,860 --> 00:34:09,980 treatment and control schools. 721 00:34:09,980 --> 00:34:17,300 And sort of the compliance is 20% in the treatment schools 722 00:34:17,300 --> 00:34:20,010 and 0% in the control schools. 723 00:34:20,010 --> 00:34:23,420 And the change in test scores is, let's 724 00:34:23,420 --> 00:34:26,050 say, 1 standard deviation. 725 00:34:26,050 --> 00:34:28,690 No, that's going to be way too high. 726 00:34:28,690 --> 00:34:33,510 0.2 standard deviations here and 0 727 00:34:33,510 --> 00:34:36,489 standard deviations here. 728 00:34:36,489 --> 00:34:39,880 So quickly, as a review, how do we get the ITT estimator? 729 00:34:45,159 --> 00:34:47,810 This is what we call a chip shot at HBS. 730 00:34:47,810 --> 00:34:50,520 But since we're not awarding points for participation there 731 00:34:50,520 --> 00:34:52,344 aren't a lot of golfers out there. 732 00:34:52,344 --> 00:34:53,312 AUDIENCE: Intention to treat? 733 00:34:53,312 --> 00:34:54,280 It's just 0.2. 734 00:34:54,280 --> 00:34:57,430 SHAWN COLE: OK and how did you get that? 735 00:34:57,430 --> 00:34:59,335 You just saw it? 736 00:34:59,335 --> 00:35:00,670 AUDIENCE: You just subtract it.
737 00:35:00,670 --> 00:35:01,560 You don't have to do anything. 738 00:35:01,560 --> 00:35:02,885 SHAWN COLE: So it's 0.2. 739 00:35:02,885 --> 00:35:03,920 AUDIENCE: Minus 0. 740 00:35:03,920 --> 00:35:05,340 SHAWN COLE: Minus 0. 741 00:35:05,340 --> 00:35:06,363 Over? 742 00:35:06,363 --> 00:35:09,470 AUDIENCE: Wait, are you saying-- 743 00:35:09,470 --> 00:35:09,910 SHAWN COLE: Oh, sorry. 744 00:35:09,910 --> 00:35:11,320 Intent to treat. 745 00:35:11,320 --> 00:35:11,780 Yeah, exactly. 746 00:35:11,780 --> 00:35:12,350 Yeah, you're right. 747 00:35:12,350 --> 00:35:13,030 Sorry, my bad. 748 00:35:13,030 --> 00:35:15,500 So the intent to treat is 0.2. 749 00:35:15,500 --> 00:35:18,143 And what's the treatment on the treated? 750 00:35:18,143 --> 00:35:19,690 AUDIENCE: 0.2 minus 0. 751 00:35:19,690 --> 00:35:20,944 SHAWN COLE: Great. 752 00:35:20,944 --> 00:35:25,540 AUDIENCE: Over 0.2 minus 0. 753 00:35:25,540 --> 00:35:27,650 SHAWN COLE: This is confusing. 754 00:35:27,650 --> 00:35:29,570 This is the standard deviations in test score and 755 00:35:29,570 --> 00:35:31,590 this is the percentage compliance. 756 00:35:31,590 --> 00:35:33,440 And so what's that going to give us? 757 00:35:33,440 --> 00:35:33,870 AUDIENCE: 1. 758 00:35:33,870 --> 00:35:35,680 SHAWN COLE: 1. 759 00:35:35,680 --> 00:35:38,520 So wow, that's a spectacularly effective program. 760 00:35:38,520 --> 00:35:42,600 I was very proud to be associated with it. 761 00:35:42,600 --> 00:35:44,520 It raised test scores by 1 standard deviation. 762 00:35:44,520 --> 00:35:46,730 Which if you know the education literature, is a 763 00:35:46,730 --> 00:35:47,980 pretty big impact. 764 00:35:50,970 --> 00:35:55,100 What might we be concerned about in this case? 765 00:35:55,100 --> 00:35:56,570 So what did the Balsakhi Program do? 766 00:35:56,570 --> 00:35:57,490 Refresh ourselves. 
767 00:35:57,490 --> 00:36:00,070 It takes 20% of the students, pulls them out of the 768 00:36:00,070 --> 00:36:03,150 classroom during the regular class, and 769 00:36:03,150 --> 00:36:04,440 sends them to a tutor. 770 00:36:04,440 --> 00:36:06,360 And these are the kids who are often in the back of the 771 00:36:06,360 --> 00:36:08,240 class, the teacher's not paying attention to them 772 00:36:08,240 --> 00:36:11,040 because they're only teaching to the top of the class. 773 00:36:11,040 --> 00:36:12,830 Maybe they're making trouble, throwing 774 00:36:12,830 --> 00:36:14,080 things around, et cetera. 775 00:36:17,150 --> 00:36:19,716 What could go wrong? 776 00:36:19,716 --> 00:36:22,999 AUDIENCE: You're attributing the effect to just taking this 777 00:36:22,999 --> 00:36:25,701 class, to just those students in particular. 778 00:36:25,701 --> 00:36:28,106 Whereas you're taking them out of the class, so you're making 779 00:36:28,106 --> 00:36:30,511 the class that you left behind smaller. 780 00:36:30,511 --> 00:36:32,440 So there's effects-- 781 00:36:32,440 --> 00:36:34,010 SHAWN COLE: Right, so how does making the class that you left 782 00:36:34,010 --> 00:36:34,950 behind smaller matter? 783 00:36:34,950 --> 00:36:39,340 AUDIENCE: Usually it's easier to learn in a smaller group. 784 00:36:39,340 --> 00:36:41,960 SHAWN COLE: That's why we teach a JPAL maximum class 785 00:36:41,960 --> 00:36:44,960 size of 15 or 20. 786 00:36:44,960 --> 00:36:49,330 We hope that small classes are more effective. 787 00:36:49,330 --> 00:36:52,180 And there's also a tracking argument that maybe now the 788 00:36:52,180 --> 00:36:56,080 teacher can focus really on a more homogeneous group rather 789 00:36:56,080 --> 00:36:58,510 than having to teach to multiple levels, et cetera. 790 00:36:58,510 --> 00:37:00,520 So there are all these other reasons to think that it may 791 00:37:00,520 --> 00:37:02,920 be the case that there are going to be some spillovers.
792 00:37:02,920 --> 00:37:07,830 So how would you explain in words to a policymaker why 793 00:37:07,830 --> 00:37:09,710 you're not sure that 1 is the right effect of 794 00:37:09,710 --> 00:37:10,960 treatment on the treated? 795 00:37:15,460 --> 00:37:17,610 This is particularly for the IPA folks who will have to be 796 00:37:17,610 --> 00:37:20,370 doing this to earn their paychecks. 797 00:37:23,457 --> 00:37:24,707 But anybody is welcome. 798 00:37:27,222 --> 00:37:32,020 AUDIENCE: There were sidebar benefits to the control group. 799 00:37:32,020 --> 00:37:33,890 SHAWN COLE: Right. 800 00:37:33,890 --> 00:37:35,100 Well, not the control group. 801 00:37:35,100 --> 00:37:37,510 AUDIENCE: I mean to the-- 802 00:37:37,510 --> 00:37:39,510 SHAWN COLE: Untreated students in the treatment group. 803 00:37:39,510 --> 00:37:41,710 So we were attributing all of this test gain to sending 804 00:37:41,710 --> 00:37:44,260 these kids out of class. 805 00:37:44,260 --> 00:37:47,020 But in fact, there could've been some test gain in the 806 00:37:47,020 --> 00:37:48,440 other groups. 807 00:37:48,440 --> 00:37:51,650 In the extreme, I suppose you could imagine that there's no 808 00:37:51,650 --> 00:37:53,770 performance gain for the children who go out to the 809 00:37:53,770 --> 00:37:57,520 Balsakhis, but getting those misbehaving children out of 810 00:37:57,520 --> 00:38:00,750 the class causes everybody else to learn so much more 811 00:38:00,750 --> 00:38:03,750 effectively that it raises the test score. 812 00:38:03,750 --> 00:38:06,820 So the program still has an effect, but the way it has an 813 00:38:06,820 --> 00:38:10,460 effect is by getting the misbehaving 814 00:38:10,460 --> 00:38:11,760 children out of the class. 815 00:38:11,760 --> 00:38:14,140 Now it turns out that that's not the case. There are 816 00:38:14,140 --> 00:38:17,050 sort of some interesting ways to try and tease out how big 817 00:38:17,050 --> 00:38:18,280 the spillovers were.
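The extreme spillover case described above can be put into numbers. This is a purely hypothetical sketch: the 0.25 standard deviation gain for classmates is invented so that the school-level average matches the 0.2 from the board, and the variable names are illustrative.

```python
share_tutored = 0.2       # 20% of students in treatment schools go to the tutor
gain_tutored = 0.0        # extreme case: the tutored children gain nothing
gain_classmates = 0.25    # hypothetical gain for the children left in the classroom

# Average gain across all students in treatment schools (0.2 here).
avg_gain_treatment = (share_tutored * gain_tutored
                      + (1 - share_tutored) * gain_classmates)
avg_gain_control = 0.0

# The Wald rescaling credits the whole school-level gain to the tutored 20%.
naive_tot = (avg_gain_treatment - avg_gain_control) / (share_tutored - 0.0)

# The true effect on the tutored children in this made-up scenario is zero.
overstatement = naive_tot - gain_tutored
```

In this scenario the rescaled estimate comes out at about 1 standard deviation even though the tutored children gained nothing, which is the worry with spillovers. The intent-to-treat number of 0.2 is still a correct description of what the program did to the average score.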
818 00:38:18,280 --> 00:38:19,750 And I encourage you to read the paper if you're 819 00:38:19,750 --> 00:38:20,800 interested. 820 00:38:20,800 --> 00:38:23,800 And it turns out that there's not really any evidence of 821 00:38:23,800 --> 00:38:26,380 spillovers, so it really does seem like the effect happened 822 00:38:26,380 --> 00:38:29,150 through the Balsakhi Program. 823 00:38:29,150 --> 00:38:32,080 But it's definitely something we want to be aware of when 824 00:38:32,080 --> 00:38:33,330 we're doing our analysis. 825 00:38:38,060 --> 00:38:40,420 OK, so we've already talked about this. 826 00:38:40,420 --> 00:38:41,580 If you have partial compliance, your 827 00:38:41,580 --> 00:38:45,560 power may be affected. 828 00:38:45,560 --> 00:38:49,930 So the intention to treat is often appropriate for program 829 00:38:49,930 --> 00:38:51,270 evaluations. 830 00:38:51,270 --> 00:38:52,730 It's simple to calculate. 831 00:38:52,730 --> 00:38:54,840 It's easy to explain. 832 00:38:54,840 --> 00:38:57,880 If you do this program, you'll get a mean change in your 833 00:38:57,880 --> 00:39:00,240 outcome of 0.3, 0.4, 0.5. 834 00:39:00,240 --> 00:39:02,570 There are a lot of advantages to it. 835 00:39:02,570 --> 00:39:05,620 But sometimes you may be interested as well in the effect of the 836 00:39:05,620 --> 00:39:06,520 program itself. 837 00:39:06,520 --> 00:39:09,580 And as we said, treatment on the treated measures the treatment effect for those 838 00:39:09,580 --> 00:39:12,980 who take the treatment because they're assigned to it. 839 00:39:12,980 --> 00:39:15,260 If you have people who will never ever 840 00:39:15,260 --> 00:39:16,180 ever take the treatment, say 841 00:39:16,180 --> 00:39:20,260 if you tried to run a randomized evaluation on 842 00:39:20,260 --> 00:39:22,280 Christian Scientists who refuse medical treatment and 843 00:39:22,280 --> 00:39:26,830 you assign them free medical care, you're not going to find 844 00:39:26,830 --> 00:39:27,380 any effect.
845 00:39:27,380 --> 00:39:31,780 But we can find the effect of the treatment on people who 846 00:39:31,780 --> 00:39:34,350 take the treatment when they're offered it. 847 00:39:34,350 --> 00:39:36,600 And so when you're doing the design of your experiment, 848 00:39:36,600 --> 00:39:39,270 it's important to think through these issues and think 849 00:39:39,270 --> 00:39:42,040 at the end of the day, at the end of the study, in two years 850 00:39:42,040 --> 00:39:45,150 time when we've collected all the data and analyzed it, what 851 00:39:45,150 --> 00:39:46,480 sort of results are we going to report? 852 00:39:46,480 --> 00:39:47,910 How are we going to report them? 853 00:39:47,910 --> 00:39:50,588 And how are we going to explain them to other folks? 854 00:39:53,980 --> 00:39:58,650 So that is intention to treat and treatment on the treated. 855 00:39:58,650 --> 00:40:02,230 And let's move briefly to the choice of outcomes and 856 00:40:02,230 --> 00:40:03,480 covariates. 857 00:40:07,640 --> 00:40:12,080 We always look forward to your feedback from the course. 858 00:40:12,080 --> 00:40:15,580 But in my view, the course might benefit from a little 859 00:40:15,580 --> 00:40:18,820 bit more focus on some of the practical aspects of doing an 860 00:40:18,820 --> 00:40:19,460 evaluation. 861 00:40:19,460 --> 00:40:24,690 So for example, survey design and determining what outcomes. 862 00:40:24,690 --> 00:40:27,100 Although maybe many of you are already familiar with that. 863 00:40:27,100 --> 00:40:30,220 So often when you do these randomized evaluations, you do 864 00:40:30,220 --> 00:40:32,940 like a household survey or you have administrative data on 865 00:40:32,940 --> 00:40:35,980 the individual and you have 100 or you have 150 variables 866 00:40:35,980 --> 00:40:36,370 about them. 867 00:40:36,370 --> 00:40:38,820 So you go into the field with a 20 page survey and you sit 868 00:40:38,820 --> 00:40:40,890 and you ask, how many cows do you have? 
869 00:40:40,890 --> 00:40:42,330 How many goats do you have? 870 00:40:42,330 --> 00:40:43,840 How much land do you have? 871 00:40:43,840 --> 00:40:46,040 What was your monthly income? 872 00:40:46,040 --> 00:40:48,540 Are your children going to school, et cetera. 873 00:40:48,540 --> 00:40:52,290 And so if you imagine a program like microfinance, 874 00:40:52,290 --> 00:40:57,400 which we hope causes people to be more productive and engaged 875 00:40:57,400 --> 00:41:00,470 and boosts household income, arguably boosts women's 876 00:41:00,470 --> 00:41:01,930 empowerment, et cetera. 877 00:41:01,930 --> 00:41:04,420 There are sort of a lot of things that could plausibly 878 00:41:04,420 --> 00:41:09,420 happen because you offer people microfinance. 879 00:41:09,420 --> 00:41:12,070 But from a statistical side, this offers some challenges. 880 00:41:12,070 --> 00:41:15,390 So what's the problem when you're interested in how 881 00:41:15,390 --> 00:41:17,480 microfinance affects 40 different outcomes? 882 00:41:17,480 --> 00:41:22,350 So education, consumption, feelings of empowerment, 883 00:41:22,350 --> 00:41:23,330 number of hours worked. 884 00:41:23,330 --> 00:41:25,310 You know, you can come up with a lot of plausible things that 885 00:41:25,310 --> 00:41:27,320 you think microfinance would affect. 886 00:41:33,240 --> 00:41:36,250 AUDIENCE: At a 0.05 level, two of those will come out 887 00:41:36,250 --> 00:41:38,980 significant having 40 even though-- 888 00:41:38,980 --> 00:41:39,910 SHAWN COLE: Right. 889 00:41:39,910 --> 00:41:45,110 So for people who haven't done hypothesis testing, hypothesis 890 00:41:45,110 --> 00:41:48,880 testing, which is what we use to analyze data, asks, how 891 00:41:48,880 --> 00:41:50,510 likely is it that the difference between the 892 00:41:50,510 --> 00:41:52,710 treatment and the control group is because of the 893 00:41:52,710 --> 00:41:55,990 program or simply because of random chance?
894 00:41:55,990 --> 00:41:58,710 And so you can look at the distribution of outcomes in 895 00:41:58,710 --> 00:42:00,690 the treatment group and the control group and observe 896 00:42:00,690 --> 00:42:02,820 their means and their variances and the number of 897 00:42:02,820 --> 00:42:07,480 observations, and you can come up with a rule that says it's 898 00:42:07,480 --> 00:42:08,500 very, very unlikely. 899 00:42:08,500 --> 00:42:11,680 Only 1 in 100 times would this thing have happened because of 900 00:42:11,680 --> 00:42:13,240 random chance. 901 00:42:13,240 --> 00:42:16,310 The standard that tends to be used in economics and the 902 00:42:16,310 --> 00:42:19,290 medical literature is this 5% p value, which 903 00:42:19,290 --> 00:42:22,190 is 1 out of 20 times. 904 00:42:22,190 --> 00:42:26,450 But if you're looking at 40 outcomes, then on average, 2 905 00:42:26,450 --> 00:42:28,580 of them are going to be statistically significant just 906 00:42:28,580 --> 00:42:29,240 out of random chance. 907 00:42:29,240 --> 00:42:33,030 So if I were to just randomly divide this group, this class 908 00:42:33,030 --> 00:42:35,810 into two groups, and started looking at things like US born 909 00:42:35,810 --> 00:42:41,210 or foreign born, or Yankees fan or Red Sox fan, or what 910 00:42:41,210 --> 00:42:43,880 have you, it wouldn't be too hard to eventually find 911 00:42:43,880 --> 00:42:46,160 something that were statistically significantly 912 00:42:46,160 --> 00:42:49,500 different between the two groups. 913 00:42:49,500 --> 00:42:54,770 This is a challenge. 914 00:42:54,770 --> 00:42:57,240 There are a few ways to deal 915 00:42:57,240 --> 00:42:58,140 effectively with this challenge. 916 00:42:58,140 --> 00:43:01,020 So this isn't a particularly difficult challenge. 
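The arithmetic behind "40 outcomes, 2 significant by chance" can be checked with a short sketch. It assumes the program truly has no effect on any outcome. The expected count needs nothing more than that, but the "at least one" probability additionally assumes the 40 tests are independent, which real survey outcomes usually are not. The function names are illustrative only.

```python
def expected_false_positives(n_outcomes, alpha):
    """Expected number of 'significant' results when no true effect exists."""
    return n_outcomes * alpha


def prob_at_least_one_false_positive(n_outcomes, alpha):
    """Chance of one or more false positives, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** n_outcomes


n_outcomes = 40   # number of outcomes tested, as in the lecture example
alpha = 0.05      # the conventional 5% significance level

expected = expected_false_positives(n_outcomes, alpha)
at_least_one = prob_at_least_one_false_positive(n_outcomes, alpha)

print(f"expected false positives: {expected:.1f}")
print(f"P(at least one false positive): {at_least_one:.3f}")
```

Under these assumptions, a study testing 40 outcomes is more likely than not to show some "significant" result even if the program does nothing.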
917 00:43:01,020 --> 00:43:03,780 So what the medical literature does, and which, I think, we're 918 00:43:03,780 --> 00:43:07,310 sort of slowly moving towards in social sciences, is you 919 00:43:07,310 --> 00:43:11,090 have to state in advance where you expect to find effects. 920 00:43:11,090 --> 00:43:13,380 So if you're going to the FDA to test the efficacy of a 921 00:43:13,380 --> 00:43:17,030 drug, when you apply for the phase III evaluation you have 922 00:43:17,030 --> 00:43:19,110 to say, I think this is going to cure Shawn's 923 00:43:19,110 --> 00:43:20,790 male pattern baldness. 924 00:43:20,790 --> 00:43:27,760 And I think that this is going to result in a more charming 925 00:43:27,760 --> 00:43:28,480 personality. 926 00:43:28,480 --> 00:43:32,400 And then you find Shawn and all of his brothers and you 927 00:43:32,400 --> 00:43:33,170 run the experiment. 928 00:43:33,170 --> 00:43:37,180 And then if the outcome turns out that the treatment group 929 00:43:37,180 --> 00:43:43,030 now never gets sick, simply immune from all diseases. 930 00:43:43,030 --> 00:43:46,450 In the past year, nobody in the treatment group got sick, 931 00:43:46,450 --> 00:43:48,680 you can't then go out and market that drug as something 932 00:43:48,680 --> 00:43:51,520 that cures diseases because that wasn't your stated 933 00:43:51,520 --> 00:43:54,470 hypothesis test ahead of time. 934 00:43:54,470 --> 00:43:57,320 So oftentimes, we have a lot of guidance on what 935 00:43:57,320 --> 00:43:58,420 hypotheses to test. 936 00:43:58,420 --> 00:44:01,470 So if we're doing a financial literacy program, we expect 937 00:44:01,470 --> 00:44:04,400 that to affect financial outcomes and not employment 938 00:44:04,400 --> 00:44:08,830 and not divorce rates. 939 00:44:08,830 --> 00:44:10,500 Although you could tell a story that 940 00:44:10,500 --> 00:44:12,250 eventually gets you there.
941 00:44:12,250 --> 00:44:14,920 But when you're reporting your results to the world then, 942 00:44:14,920 --> 00:44:16,950 what you want to do is report the results on all the 943 00:44:16,950 --> 00:44:18,770 measured outcomes, even the ones for 944 00:44:18,770 --> 00:44:20,330 which you find no effect. 945 00:44:20,330 --> 00:44:23,790 So then, anybody who takes the study can look and say, OK, 946 00:44:23,790 --> 00:44:27,070 they're saying that their program has a great effect on 947 00:44:27,070 --> 00:44:30,610 income and children's schooling and health. 948 00:44:30,610 --> 00:44:35,560 But they tested 200 things, so I'm a little bit skeptical. 949 00:44:35,560 --> 00:44:38,050 Or, they only tested 6 things and half of them had 950 00:44:38,050 --> 00:44:40,400 large, statistically significant impacts, so I 951 00:44:40,400 --> 00:44:41,840 really believe that study. 952 00:44:46,950 --> 00:44:49,020 Maybe it's a little unfortunate the last class of 953 00:44:49,020 --> 00:44:51,470 the course is about sort of challenges with randomized 954 00:44:51,470 --> 00:44:56,000 evaluations because I should just take a sidebar to 955 00:44:56,000 --> 00:44:57,800 emphasize that these problems we're talking about are not 956 00:44:57,800 --> 00:44:59,370 unique to randomized evaluations. 957 00:44:59,370 --> 00:45:02,310 With any evaluation, you run these risks. 958 00:45:02,310 --> 00:45:06,310 So if it's just sort of the standard, what we like to 959 00:45:06,310 --> 00:45:08,170 think of as a pretty bad evaluation where you just find 960 00:45:08,170 --> 00:45:10,820 some treatment people and find some associated comparison 961 00:45:10,820 --> 00:45:13,460 people and do a survey, you're going to run into this exact 962 00:45:13,460 --> 00:45:13,940 same problem. 963 00:45:13,940 --> 00:45:17,640 So don't take many of these as a criticism of randomized 964 00:45:17,640 --> 00:45:21,010 evaluation, but just take them as good science.
965 00:45:21,010 --> 00:45:22,620 What to do for good science. 966 00:45:22,620 --> 00:45:25,360 And there are other things you can do which is to adjust your 967 00:45:25,360 --> 00:45:26,440 standard errors. 968 00:45:26,440 --> 00:45:28,840 So we have a very simple way of 969 00:45:28,840 --> 00:45:30,180 calculating standard errors. 970 00:45:30,180 --> 00:45:32,630 But if you're testing multiple hypotheses, then you can 971 00:45:32,630 --> 00:45:35,430 actually statistically take that into account and come up 972 00:45:35,430 --> 00:45:38,180 with sort of corrected or bounds on 973 00:45:38,180 --> 00:45:38,820 your standard errors. 974 00:45:38,820 --> 00:45:42,460 And those are described in the literature that I'm going to 975 00:45:42,460 --> 00:45:44,572 refer to you at the end of the talk as well. 976 00:45:44,572 --> 00:45:46,600 AUDIENCE: What about taking half your data and mining it? 977 00:45:49,250 --> 00:45:51,610 SHAWN COLE: And throwing the rest away? 978 00:45:51,610 --> 00:45:53,860 AUDIENCE: No, I mine half of my data. 979 00:45:53,860 --> 00:45:57,010 I just figure out what really matters and that's how I 980 00:45:57,010 --> 00:45:58,260 generate my hypothesis. 981 00:46:01,142 --> 00:46:04,258 SHAWN COLE: That sounds reasonable to me. 982 00:46:04,258 --> 00:46:06,750 AUDIENCE: I better have had a big enough sample. 983 00:46:06,750 --> 00:46:08,710 SHAWN COLE: Yeah, you better have had a big enough sample. 984 00:46:08,710 --> 00:46:12,460 I think sort of that intuition is-- 985 00:46:12,460 --> 00:46:15,810 data mining is problematic, but it is useful when you run 986 00:46:15,810 --> 00:46:18,200 a study to report all of your results because if there are 987 00:46:18,200 --> 00:46:21,190 some surprising things there, we don't know whether they're 988 00:46:21,190 --> 00:46:23,260 there because of chance or because the 989 00:46:23,260 --> 00:46:24,130 program had that effect. 
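The adjustment for multiple hypotheses mentioned above can be sketched with two standard corrections, Bonferroni and Holm. This is a swapped-in illustration, not necessarily the method in the paper referred to: both corrections adjust p-values rather than standard errors directly, and the p-values below are made up for the example.

```python
def bonferroni(p_values):
    """Bonferroni: multiply each p-value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down: less conservative than Bonferroni, same error control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for five outcomes (illustrative numbers only).
raw = [0.001, 0.012, 0.030, 0.040, 0.200]
bonf = bonferroni(raw)
holm_adj = holm(raw)

for p, b, h in zip(raw, bonf, holm_adj):
    print(f"raw={p:.3f}  bonferroni={b:.3f}  holm={h:.3f}")
```

In this made-up example, four outcomes look significant at 5% before adjustment, one survives Bonferroni, and two survive Holm.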
990 00:46:24,130 --> 00:46:27,400 But that gives us sort of a view on what to do when we 991 00:46:27,400 --> 00:46:28,110 test again. 992 00:46:28,110 --> 00:46:30,590 So if we're going to try this microfinance program out in 993 00:46:30,590 --> 00:46:32,910 some other country or some other area of the country, we 994 00:46:32,910 --> 00:46:36,350 said it had this surprising effect on girl's empowerment. 995 00:46:36,350 --> 00:46:39,280 So now let's make that one of our key outcomes and let's 996 00:46:39,280 --> 00:46:39,810 test it again. 997 00:46:39,810 --> 00:46:41,700 So I think that sounds pretty reasonable. 998 00:46:44,660 --> 00:46:47,010 There's a pretty advanced, developed statistical 999 00:46:47,010 --> 00:46:50,000 literature on hypothesis testing and how you develop 1000 00:46:50,000 --> 00:46:51,540 hypotheses. 1001 00:46:51,540 --> 00:46:53,976 But that sounds not unreasonable to me. 1002 00:46:53,976 --> 00:46:55,226 Other thoughts? 1003 00:47:00,670 --> 00:47:06,490 So another possibility is heterogeneous treatment 1004 00:47:06,490 --> 00:47:08,600 effects, which is what we often look for. 1005 00:47:08,600 --> 00:47:11,180 So you might think that the financial literacy program is 1006 00:47:11,180 --> 00:47:14,240 more effective with women than men because women started out 1007 00:47:14,240 --> 00:47:16,770 with lower levels of initial financial literacy, or you 1008 00:47:16,770 --> 00:47:19,200 might think that the de-worming medication is more 1009 00:47:19,200 --> 00:47:23,800 effective for children who live near the river because 1010 00:47:23,800 --> 00:47:25,890 then they play outside more often, or something like that. 1011 00:47:25,890 --> 00:47:28,170 And so it's very tempting to run your regressions for 1012 00:47:28,170 --> 00:47:29,580 different subgroups. 1013 00:47:29,580 --> 00:47:33,900 But again, there's this risk that you're data mining. 
1014 00:47:33,900 --> 00:47:39,430 Suppose I wanted to show that I randomly assigned you this 1015 00:47:39,430 --> 00:47:42,430 group into a treatment and control group. 1016 00:47:42,430 --> 00:47:46,750 And I wanted to show that Yankees and Red Sox 1017 00:47:46,750 --> 00:47:49,170 preferences were significantly different in the two groups. 1018 00:47:49,170 --> 00:47:52,590 I could probably cut the data in different ways and 1019 00:47:52,590 --> 00:47:56,100 eventually find some subset of you for whom the treatment and 1020 00:47:56,100 --> 00:47:59,080 the control variables were actually different. 1021 00:47:59,080 --> 00:48:03,200 And so you want to be aware of that possibility. 1022 00:48:03,200 --> 00:48:09,570 And again, like the FDA drug trial way to avoid this is 1023 00:48:09,570 --> 00:48:12,170 just to announce in advance which subgroups you expect 1024 00:48:12,170 --> 00:48:14,550 this product to be more or less effective for and make 1025 00:48:14,550 --> 00:48:18,140 sure you have a sufficient sample size to test 1026 00:48:18,140 --> 00:48:20,010 statistical significance within those subgroups. 1027 00:48:24,160 --> 00:48:28,010 Again, as a service to all the consumers of your studies, 1028 00:48:28,010 --> 00:48:29,500 report the results on all the subgroups. 1029 00:48:29,500 --> 00:48:34,370 Even the subgroups for whom the program's not effective. 1030 00:48:34,370 --> 00:48:35,890 So there's another problem that-- 1031 00:48:35,890 --> 00:48:39,320 did we talk about clustering in groups, data in the power 1032 00:48:39,320 --> 00:48:39,870 calculations? 1033 00:48:39,870 --> 00:48:46,370 OK, so that is sort of the bane of statistical analysis. 1034 00:48:46,370 --> 00:48:50,050 Or as we liked to say when I was a graduate student, sort 1035 00:48:50,050 --> 00:48:52,890 of people used to not really appreciate the importance of 1036 00:48:52,890 --> 00:48:54,230 these grouped errors. 
1037 00:48:54,230 --> 00:48:56,370 It was much easier to write a paper back then because you 1038 00:48:56,370 --> 00:48:58,740 found lots of statistically significant results. 1039 00:48:58,740 --> 00:49:01,980 And once you start using this cluster 1040 00:49:01,980 --> 00:49:04,180 adjustment, it's a lot harder. 1041 00:49:04,180 --> 00:49:06,700 Now only 5% of results are statistically significant. 1042 00:49:06,700 --> 00:49:10,110 So we whined that if only we had graduated five years 1043 00:49:10,110 --> 00:49:11,630 earlier, it would have been much easier to 1044 00:49:11,630 --> 00:49:12,680 get our thesis done. 1045 00:49:12,680 --> 00:49:17,140 But we here don't care about getting the thesis done. 1046 00:49:17,140 --> 00:49:22,390 We here care about finding out the truth, and the 1047 00:49:22,390 --> 00:49:25,110 potential for clustered standard errors, or 1048 00:49:25,110 --> 00:49:28,830 common shocks within the group, is very strong. 1049 00:49:28,830 --> 00:49:31,840 You can often get estimates of the correlation within groups 1050 00:49:31,840 --> 00:49:33,670 using survey data that you already have. 1051 00:49:33,670 --> 00:49:36,090 That will inform your power calculations. 1052 00:49:36,090 --> 00:49:39,760 But when you're doing your analysis, you need to adjust 1053 00:49:39,760 --> 00:49:43,600 your statistical results for this clustering. 1054 00:49:43,600 --> 00:49:47,130 And in particular, you run into problems when you have 1055 00:49:47,130 --> 00:49:48,030 very few groups. 1056 00:49:48,030 --> 00:49:50,810 So if you're thinking, I want to do an evaluation where I 1057 00:49:50,810 --> 00:49:54,090 had 15 treatment villages and 15 control villages. 1058 00:49:54,090 --> 00:49:55,760 Well, it's very likely that outcomes are going to be 1059 00:49:55,760 --> 00:49:57,610 correlated within each village. 1060 00:49:57,610 --> 00:50:00,200 And then, all of a sudden, you only have 30 clusters.
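To see why within-village correlation shrinks the information in a sample, here is a minimal sketch using the standard design-effect formula, 1 + (m - 1) * rho, which assumes equal-sized clusters. The cluster size of 50 and the intracluster correlation of 0.10 are assumed numbers for illustration, not figures from the lecture.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal-sized clusters."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)


n_clusters = 30     # 15 treatment + 15 control villages, as in the example
cluster_size = 50   # ASSUMED households surveyed per village
icc = 0.10          # ASSUMED intracluster correlation

deff = design_effect(cluster_size, icc)
n_eff = effective_sample_size(n_clusters, cluster_size, icc)

print(f"design effect: {deff:.2f}")
print(f"nominal n = {n_clusters * cluster_size}, effective n = {n_eff:.0f}")
```

With these assumed values, 1,500 surveyed households carry roughly the information of about 254 independent ones, which is why the number of clusters matters so much at the design stage.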
1061 00:50:00,200 --> 00:50:02,990 And even the statistical technique isn't great for 1062 00:50:02,990 --> 00:50:04,820 sample sizes that are that small. 1063 00:50:04,820 --> 00:50:07,320 You have to use other statistical techniques, which 1064 00:50:07,320 --> 00:50:10,950 are sort of valid, but less powerful. 1065 00:50:10,950 --> 00:50:14,650 So we won't go into the randomization inference. 1066 00:50:14,650 --> 00:50:18,410 Again, it's mentioned in this paper I was talking about. 1067 00:50:22,000 --> 00:50:25,140 You should be aware of this and I think the most important 1068 00:50:25,140 --> 00:50:27,300 time to think about this is when you're designing your 1069 00:50:27,300 --> 00:50:30,890 evaluation, to make sure you get enough clusters. 1070 00:50:30,890 --> 00:50:34,670 And if you can, it's almost always preferable to randomize 1071 00:50:34,670 --> 00:50:37,320 by individual rather than by group. 1072 00:50:37,320 --> 00:50:40,010 Because randomizing by group typically requires much 1073 00:50:40,010 --> 00:50:41,070 larger sample sizes. 1074 00:50:41,070 --> 00:50:43,502 AUDIENCE: Can you just say one more thing or give us a 1075 00:50:43,502 --> 00:50:46,525 reference on what you meant by other statistical techniques 1076 00:50:46,525 --> 00:50:49,550 that are valid, but not as powerful? 1077 00:50:49,550 --> 00:50:50,480 SHAWN COLE: Sure. 1078 00:50:50,480 --> 00:50:52,660 Let me just skip to this. 1079 00:50:52,660 --> 00:50:57,730 This should be on your slide package. 1080 00:50:57,730 --> 00:51:00,930 The ultimate slide is additional resources. 1081 00:51:00,930 --> 00:51:04,200 And so there's a paper called "Using Randomization in 1082 00:51:04,200 --> 00:51:05,890 Development Economics Research: A Toolkit," by 1083 00:51:05,890 --> 00:51:07,410 Esther, Rachel, and Michael.
1084 00:51:07,410 --> 00:51:09,390 And this goes through everything we've talked about 1085 00:51:09,390 --> 00:51:12,560 this week in pretty careful detail, works out the math, 1086 00:51:12,560 --> 00:51:14,630 and gives you references to what you need. 1087 00:51:14,630 --> 00:51:17,470 So it's in there. 1088 00:51:17,470 --> 00:51:20,950 Josh Angrist, who's on the faculty here, has a very good 1089 00:51:20,950 --> 00:51:23,840 book called "Mostly Harmless Econometrics." But it's 1090 00:51:23,840 --> 00:51:24,840 designed for academics. 1091 00:51:24,840 --> 00:51:26,110 It's not really a textbook. 1092 00:51:26,110 --> 00:51:28,900 But it goes through these things in very, very, very 1093 00:51:28,900 --> 00:51:30,870 good detail and it's fun to read. 1094 00:51:30,870 --> 00:51:33,110 But specifically, the technique is called 1095 00:51:33,110 --> 00:51:34,825 randomization inference. 1096 00:51:34,825 --> 00:51:37,260 It was developed by Fisher. 1097 00:51:37,260 --> 00:51:39,225 And what you do is you basically-- 1098 00:51:45,420 --> 00:51:47,210 so you have your treatment and your control group and your 1099 00:51:47,210 --> 00:51:49,760 mean between the treatment and the mean in the control and 1100 00:51:49,760 --> 00:51:51,510 you test the statistical significance. 1101 00:51:51,510 --> 00:51:54,190 And then what you do is you just randomly reassign 1102 00:51:54,190 --> 00:51:57,320 everybody to either treatment or control, regardless of what 1103 00:51:57,320 --> 00:51:59,920 they actually did, and see if there's a difference between 1104 00:51:59,920 --> 00:52:01,330 the treatment and control group. 1105 00:52:01,330 --> 00:52:05,470 And if you do that a hundred times, you can sort of get a 1106 00:52:05,470 --> 00:52:09,090 sense for how often you find statistically significant 1107 00:52:09,090 --> 00:52:10,070 differences or not. 1108 00:52:10,070 --> 00:52:11,340 It's related to bootstrapping. 
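The reshuffling procedure just described can be sketched as a permutation test on village-level mean outcomes. This follows the common textbook form of randomization inference, which compares the observed difference in means with the differences produced by placebo reassignments. The data are invented, and the function name and the 2,000 draws are illustrative choices.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def diff_in_means(outcomes, assignment):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for y, a in zip(outcomes, assignment) if a]
    control = [y for y, a in zip(outcomes, assignment) if not a]
    return mean(treated) - mean(control)


def randomization_p_value(outcomes, assignment, n_draws=2000, seed=0):
    """Share of random reassignments at least as extreme as the real one."""
    rng = random.Random(seed)
    observed = diff_in_means(outcomes, assignment)
    placebo = list(assignment)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(placebo)  # keeps the number of treated units fixed
        if abs(diff_in_means(outcomes, placebo)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return observed, (extreme + 1) / (n_draws + 1)


# HYPOTHETICAL village-level mean outcomes: 15 treatment, 15 control villages.
treated_villages = [58 + (i % 5) for i in range(15)]
control_villages = [50 + (i % 5) for i in range(15)]
outcomes = treated_villages + control_villages
assignment = [True] * 15 + [False] * 15

observed, p_value = randomization_p_value(outcomes, assignment)
print(f"observed difference: {observed:.2f}, randomization p-value: {p_value:.4f}")
```

The reshuffling is done at the village level because that is the level at which treatment was assigned in this example.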
1109 00:52:16,940 --> 00:52:19,040 That's a reasonable method, but the problem with that is 1110 00:52:19,040 --> 00:52:21,160 the statistical power's not very good. 1111 00:52:21,160 --> 00:52:23,050 So you need a larger sample. 1112 00:52:23,050 --> 00:52:25,140 And then once you have a larger sample, then you don't 1113 00:52:25,140 --> 00:52:27,070 need to worry about it because you can cluster. 1114 00:52:30,630 --> 00:52:32,980 So another question that's maybe a little bit more 1115 00:52:32,980 --> 00:52:35,670 technical is when you're doing your analysis, your regression 1116 00:52:35,670 --> 00:52:40,690 analysis, is what covariates do you want to control for? 1117 00:52:40,690 --> 00:52:43,510 So we're looking at the effect of financial literacy 1118 00:52:43,510 --> 00:52:46,960 education on credit card repayment. 1119 00:52:46,960 --> 00:52:48,760 When we do our statistical analysis, do we want to 1120 00:52:48,760 --> 00:52:53,310 control for the age of the person, for their gender, for 1121 00:52:53,310 --> 00:52:55,420 their initial measured level of financial 1122 00:52:55,420 --> 00:52:57,660 literacy, et cetera. 1123 00:52:57,660 --> 00:52:59,860 Now the beauty of randomization is that it 1124 00:52:59,860 --> 00:53:00,920 doesn't matter. 1125 00:53:00,920 --> 00:53:04,190 Even if you don't have data on any of these covariates, as 1126 00:53:04,190 --> 00:53:06,680 long as the program was initially randomly assigned 1127 00:53:06,680 --> 00:53:12,670 and the sample size is large enough, then you'll be OK. 1128 00:53:12,670 --> 00:53:16,340 But what the controls can do is that they can help you get 1129 00:53:16,340 --> 00:53:17,560 a more precise estimate. 
1130 00:53:17,560 --> 00:53:20,400 So if a lot of people's credit card repayment behavior is 1131 00:53:20,400 --> 00:53:24,920 explained by whether they ate a cookie as a child or not and 1132 00:53:24,920 --> 00:53:26,710 you happen to have that particular data for people who 1133 00:53:26,710 --> 00:53:31,370 have been reading The New Yorker, then you can soak up 1134 00:53:31,370 --> 00:53:32,780 some of the variation and come up with a 1135 00:53:32,780 --> 00:53:34,480 more precise estimate. 1136 00:53:34,480 --> 00:53:35,970 Or you can control for age, or control for 1137 00:53:35,970 --> 00:53:37,340 income, or other things. 1138 00:53:37,340 --> 00:53:41,760 So it's often desirable to have additional controls. 1139 00:53:41,760 --> 00:53:44,070 But what you don't want to do is control for a variable that 1140 00:53:44,070 --> 00:53:46,480 might have been affected by the treatment itself. 1141 00:53:46,480 --> 00:53:49,620 So if you're looking at the effect of microfinance on 1142 00:53:49,620 --> 00:53:53,060 women's empowerment, so that's the goal of your study. 1143 00:53:53,060 --> 00:53:55,990 And then you would say, well, women who have higher levels 1144 00:53:55,990 --> 00:53:59,760 of income report higher levels of feeling empowered. 1145 00:53:59,760 --> 00:54:00,910 So that's an important 1146 00:54:00,910 --> 00:54:02,400 determinant of feeling empowered. 1147 00:54:02,400 --> 00:54:04,610 We should include that in our control 1148 00:54:04,610 --> 00:54:06,100 when we do our analysis. 1149 00:54:06,100 --> 00:54:09,930 But if it turns out that microfinance increased income 1150 00:54:09,930 --> 00:54:13,330 and increased control, then we might conclude that there's no 1151 00:54:13,330 --> 00:54:15,910 effect because we're attributing the effect to the 1152 00:54:15,910 --> 00:54:18,420 differences in income. 1153 00:54:18,420 --> 00:54:19,670 Is that clear? 
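The danger of controlling for a variable affected by the treatment can be illustrated with a small simulation. It is a sketch under strong, made-up assumptions: microfinance raises income by 2 units, empowerment rises by 0.5 per unit of income, and the program works only through income, so the true total effect is 1.0. All names and numbers are hypothetical.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from regressing y on x with an intercept."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]


rng = random.Random(1)
n = 5000
treat = [rng.randint(0, 1) for _ in range(n)]            # random assignment
income = [2.0 * t + rng.gauss(0, 1) for t in treat]      # affected by treatment
empower = [0.5 * m + rng.gauss(0, 1) for m in income]    # works through income

# Raw comparison: recovers the total effect of the program (about 1.0).
effect_no_controls = slope(treat, empower)

# Controlling for income: the treatment coefficient in a regression of
# empowerment on treatment and income, computed by partialling income out
# of both variables (the Frisch-Waugh-Lovell result).
effect_bad_control = slope(residuals(income, treat), residuals(income, empower))

print(f"no controls:           {effect_no_controls:.2f}")
print(f"controlling for income: {effect_bad_control:.2f}")
```

In this setup the estimate with income as a control is close to zero even though the program works, because the effect has been attributed to the income difference that the program itself created.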
1154 00:54:23,660 --> 00:54:28,890 A lot of these are fairly nuanced issues and it's often 1155 00:54:28,890 --> 00:54:34,920 worth consulting an academic. Often PhD students are 1156 00:54:34,920 --> 00:54:38,930 eager to work on projects like this as well. 1157 00:54:38,930 --> 00:54:41,750 Just as a rule, it's important to report the raw differences 1158 00:54:41,750 --> 00:54:43,310 and the regression adjusted results. 1159 00:54:48,510 --> 00:54:51,760 I think we advance a very strong view that randomized 1160 00:54:51,760 --> 00:54:55,780 evaluation is a very credible method of evaluation. 1161 00:54:55,780 --> 00:54:59,700 But even still, there are always ways to tweak or twist 1162 00:54:59,700 --> 00:55:01,860 things a little bit to try and get the results you want. 1163 00:55:01,860 --> 00:55:05,160 So you could have a survey with a hundred people or a 1164 00:55:05,160 --> 00:55:08,150 hundred outcomes and only report seven of them. 1165 00:55:08,150 --> 00:55:10,940 That might make your program look better than it is. 1166 00:55:10,940 --> 00:55:14,110 So these rules we're proposing are ways to give people an 1167 00:55:14,110 --> 00:55:17,700 honest and a thorough view of the 1168 00:55:17,700 --> 00:55:18,800 effectiveness of your program. 1169 00:55:18,800 --> 00:55:21,280 So another rule is that when you're reporting your 1170 00:55:21,280 --> 00:55:24,750 regression results, you should include the results with the 1171 00:55:24,750 --> 00:55:26,600 covariates, as well as the results without the 1172 00:55:26,600 --> 00:55:27,850 covariates. 1173 00:55:32,140 --> 00:55:36,550 So now let's talk about threats to external validity. 1174 00:55:36,550 --> 00:55:39,710 So we spent a lot of time so far talking about internal 1175 00:55:39,710 --> 00:55:43,880 validity, which is sort of: was the 1176 00:55:43,880 --> 00:55:45,130 treatment randomly assigned?
1177 00:55:45,130 --> 00:55:46,760 Did enough people in the treatment group comply that 1178 00:55:46,760 --> 00:55:47,600 you have a difference between the 1179 00:55:47,600 --> 00:55:49,130 treatment and control group? 1180 00:55:49,130 --> 00:55:52,560 But it's not sufficient to know that we've learned a lot 1181 00:55:52,560 --> 00:55:55,432 from a randomized evaluation. 1182 00:55:55,432 --> 00:56:01,190 So there's some threats that just doing the evaluation 1183 00:56:01,190 --> 00:56:05,070 itself may have some impact above and beyond the program. 1184 00:56:05,070 --> 00:56:07,630 And so these are called Hawthorne and Henry effects. 1185 00:56:07,630 --> 00:56:11,270 And maybe we'll go back to the audience for some examples of 1186 00:56:11,270 --> 00:56:12,760 a Hawthorne effect. 1187 00:56:12,760 --> 00:56:15,570 So a Hawthorne effect is when the treatment group behavior 1188 00:56:15,570 --> 00:56:16,990 changes because of the experiment. 1189 00:56:16,990 --> 00:56:18,550 What's an example of that? 1190 00:56:18,550 --> 00:56:21,881 Anybody familiar with the original Hawthorne study? 1191 00:56:21,881 --> 00:56:24,290 AUDIENCE: Was it the lights being dimmed or different 1192 00:56:24,290 --> 00:56:27,850 levels of lights and workers felt like they were getting 1193 00:56:27,850 --> 00:56:30,407 attention paid to them no matter what the 1194 00:56:30,407 --> 00:56:31,590 light level was at? 1195 00:56:31,590 --> 00:56:33,300 It's just that there was something going on and so 1196 00:56:33,300 --> 00:56:35,710 their productivity was higher. 1197 00:56:35,710 --> 00:56:37,140 SHAWN COLE: I actually don't remember the study. 1198 00:56:37,140 --> 00:56:38,040 Can I just try and rephrase that? 1199 00:56:38,040 --> 00:56:38,810 You can tell me whether I'm right. 1200 00:56:38,810 --> 00:56:41,900 So the experiment was to try and figure out how the level 1201 00:56:41,900 --> 00:56:44,575 of lighting in a factory affects productivity?
1202 00:56:44,575 --> 00:56:45,920 I suppose. 1203 00:56:45,920 --> 00:56:48,770 And so they said we're going to raise the level of lighting 1204 00:56:48,770 --> 00:56:51,610 in this sort of select, maybe even randomly selected. 1205 00:56:51,610 --> 00:56:55,560 We randomly select 50 out of 100 of our work groups and we 1206 00:56:55,560 --> 00:56:57,440 roll in an extra light. 1207 00:56:57,440 --> 00:56:59,250 And they're like, oh great, management 1208 00:56:59,250 --> 00:57:00,800 really cares about us. 1209 00:57:00,800 --> 00:57:02,230 They're including us in this survey. 1210 00:57:02,230 --> 00:57:03,530 They're giving us extra light. 1211 00:57:03,530 --> 00:57:04,450 We're going to work extra hard. 1212 00:57:04,450 --> 00:57:08,660 And so you find a higher output from that group. 1213 00:57:08,660 --> 00:57:10,580 Alternatively, if you had said, maybe people are getting 1214 00:57:10,580 --> 00:57:12,060 distracted by having too much light. 1215 00:57:12,060 --> 00:57:14,780 So management picked these 50 groups, went in and unscrewed 1216 00:57:14,780 --> 00:57:15,410 some light bulbs. 1217 00:57:15,410 --> 00:57:16,770 They're working in sort of a dim area, they'd 1218 00:57:16,770 --> 00:57:17,740 be like, oh, wow. 1219 00:57:17,740 --> 00:57:20,150 Management really cares about our well being. 1220 00:57:20,150 --> 00:57:22,350 Now we focus on the natural light and we're going to work 1221 00:57:22,350 --> 00:57:23,150 really hard. 1222 00:57:23,150 --> 00:57:26,420 So just the act of sort of being in the treatment causes 1223 00:57:26,420 --> 00:57:27,645 your behavior to change. 1224 00:57:27,645 --> 00:57:30,810 And so what's wrong with that? 1225 00:57:30,810 --> 00:57:32,680 I mean we're trying to measure the effect of treatment. 1226 00:57:32,680 --> 00:57:34,220 If the effect of treatment is to increase 1227 00:57:34,220 --> 00:57:35,850 productivity, fine. 
1228 00:57:35,850 --> 00:57:39,110 AUDIENCE: Maybe you can not make a generalization to a 1229 00:57:39,110 --> 00:57:42,700 population that doesn't have change that behavior because 1230 00:57:42,700 --> 00:57:44,690 doesn't get the treatment already. 1231 00:57:49,680 --> 00:57:52,030 AUDIENCE: You run around changing the light levels at 1232 00:57:52,030 --> 00:57:55,210 different factories and you don't get the effect because 1233 00:57:55,210 --> 00:58:00,570 the real treatment was people running around and testing and 1234 00:58:00,570 --> 00:58:03,340 observing and measuring the change in light. 1235 00:58:03,340 --> 00:58:03,600 SHAWN COLE: Right. 1236 00:58:03,600 --> 00:58:05,100 And saying, we're really glad you're part of this study. 1237 00:58:05,100 --> 00:58:06,632 It's very important. 1238 00:58:06,632 --> 00:58:07,920 AUDIENCE: Which was really what got 1239 00:58:07,920 --> 00:58:09,260 people to work harder. 1240 00:58:09,260 --> 00:58:10,510 SHAWN COLE: Right. 1241 00:58:12,400 --> 00:58:14,500 You might be able to generalize that if we decided 1242 00:58:14,500 --> 00:58:18,650 to run this study in every factory in our firm, then we 1243 00:58:18,650 --> 00:58:20,500 might get similar results in different factories. 1244 00:58:20,500 --> 00:58:22,710 But probably, within a few months, people would sort of 1245 00:58:22,710 --> 00:58:24,960 just catch on that this is just kind of wacky, and 1246 00:58:24,960 --> 00:58:27,550 what's going on? 1247 00:58:27,550 --> 00:58:30,240 It wouldn't really be the effect of the program. 1248 00:58:30,240 --> 00:58:34,292 Any other examples from your programs of Hawthorne effects? 1249 00:58:34,292 --> 00:58:36,900 Or things you might be worried about? 1250 00:58:40,540 --> 00:58:42,325 AUDIENCE: It seems like a lot of behavior situations, 1251 00:58:42,325 --> 00:58:43,200 there's a threat to this. 
1252 00:58:43,200 --> 00:58:47,105 Especially if you have some in developing country context 1253 00:58:47,105 --> 00:58:49,855 where you have foreigners coming in or people from the 1254 00:58:49,855 --> 00:58:51,410 capital coming in. 1255 00:58:51,410 --> 00:58:53,650 Especially if it's something that-- again, with a behavior 1256 00:58:53,650 --> 00:58:56,560 change where OK, I know I'm supposed to be washing my 1257 00:58:56,560 --> 00:58:57,230 hands with soap. 1258 00:58:57,230 --> 00:58:59,630 I normally don't, but I know that the white people get 1259 00:58:59,630 --> 00:59:02,520 really happy when I do it and they're coming in to evaluate. 1260 00:59:02,520 --> 00:59:02,800 So I'm going to go ahead-- 1261 00:59:02,800 --> 00:59:03,031 SHAWN COLE: Right. 1262 00:59:03,031 --> 00:59:07,090 So if you show up from abroad and put posters encouraging 1263 00:59:07,090 --> 00:59:09,890 people to wash their hands, people may pay more 1264 00:59:09,890 --> 00:59:12,260 attention to that. 1265 00:59:12,260 --> 00:59:14,330 What does that invalidate or not invalidate? 1266 00:59:17,010 --> 00:59:19,380 So suppose we did this and we found that the effect was it 1267 00:59:19,380 --> 00:59:24,310 reduces reported incidence of diarrhea by 15%. 1268 00:59:24,310 --> 00:59:27,045 AUDIENCE: If you then don't still have white people coming 1269 00:59:27,045 --> 00:59:30,840 into the village, then the same effect might not happen. 1270 00:59:30,840 --> 00:59:31,170 SHAWN COLE: Right. 1271 00:59:31,170 --> 00:59:34,550 So I guess it's a little bit nuanced because we should 1272 00:59:34,550 --> 00:59:37,550 distinguish between the program generalizability, 1273 00:59:37,550 --> 00:59:41,260 which is the program could be white people come into the 1274 00:59:41,260 --> 00:59:44,030 village, and Hawthorne effects, which is because I 1275 00:59:44,030 --> 00:59:46,250 know I'm in the treatment group in the study, I'm going 1276 00:59:46,250 --> 00:59:47,850 to act differently.
1277 00:59:47,850 --> 00:59:52,890 So what's another example of really, a Hawthorne effect? 1278 00:59:52,890 --> 00:59:55,450 I'm sympathetic to yours as a Hawthorne effect, but I want 1279 00:59:55,450 --> 00:59:56,700 to really sort of nail it. 1280 00:59:56,700 --> 01:00:00,330 AUDIENCE: There's one for hand washing where every week 1281 01:00:00,330 --> 01:00:02,490 people go into the villages and then tell them the 1282 01:00:02,490 --> 01:00:05,571 importance of hand washing as a way to prevent malaria. 1283 01:00:05,571 --> 01:00:07,320 And every time they ask them, did you wash hands? 1284 01:00:07,320 --> 01:00:09,000 Do you wash hands? 1285 01:00:09,000 --> 01:00:12,350 But that's not sustainable on a long-term basis because-- 1286 01:00:12,350 --> 01:00:14,900 and at the same time, you're distributing free soap. 1287 01:00:14,900 --> 01:00:17,140 So how do you separate everything? 1288 01:00:17,140 --> 01:00:18,980 SHAWN COLE: I would say again, that's the program. 1289 01:00:18,980 --> 01:00:21,740 And so we would say if we scaled that program up to all 1290 01:00:21,740 --> 01:00:24,270 of the country, we'd be fine. 1291 01:00:24,270 --> 01:00:26,330 If we go in every week and tell people to wash their 1292 01:00:26,330 --> 01:00:30,029 hands and distribute soap, that's fine. 1293 01:00:30,029 --> 01:00:32,494 AUDIENCE: So sometimes on sexual behavior studies, 1294 01:00:32,494 --> 01:00:36,109 you'll find that a treatment group that is encouraged to 1295 01:00:36,109 --> 01:00:36,931 adopt condoms, or something. 1296 01:00:36,931 --> 01:00:38,725 And then in the post-measurement period, they 1297 01:00:38,725 --> 01:00:40,970 know what the intervention is about. 1298 01:00:40,970 --> 01:00:42,840 And so if you ask them, what has your sexual behavior been 1299 01:00:42,840 --> 01:00:45,275 in the last week, they're much more likely to say that 1300 01:00:45,275 --> 01:00:47,460 they've been using condoms. 
1301 01:00:47,460 --> 01:00:51,340 Or change their partnering habits or these other sorts of 1302 01:00:51,340 --> 01:00:53,180 things that have nothing to do with-- 1303 01:00:53,180 --> 01:00:53,440 SHAWN COLE: Right. 1304 01:00:53,440 --> 01:00:57,000 So the effect of being in the treatment group and knowing 1305 01:00:57,000 --> 01:00:59,440 that you're getting this treatment might change how you 1306 01:00:59,440 --> 01:01:01,010 answer the survey questions, even if you 1307 01:01:01,010 --> 01:01:03,440 didn't behave that way. 1308 01:01:03,440 --> 01:01:05,660 Any other? 1309 01:01:05,660 --> 01:01:08,770 AUDIENCE: For the Hawthorne effect? 1310 01:01:08,770 --> 01:01:12,680 SHAWN COLE: So it's certainly a problem with your survey. 1311 01:01:12,680 --> 01:01:13,752 AUDIENCE: It might change other aspects of their 1312 01:01:13,752 --> 01:01:17,820 behavior other than condom use that would be simply because 1313 01:01:17,820 --> 01:01:19,200 they know that you're looking. 1314 01:01:21,910 --> 01:01:24,110 SHAWN COLE: So I mean I would've said something like I 1315 01:01:24,110 --> 01:01:27,220 know that they really care about me and so because I'm 1316 01:01:27,220 --> 01:01:31,320 part of this MIT Poverty Action Lab Study, it must be 1317 01:01:31,320 --> 01:01:32,350 really important that I do this. 1318 01:01:32,350 --> 01:01:35,690 But if you were to generalize the program, not as part of a 1319 01:01:35,690 --> 01:01:39,542 study, then people would react to it differently. 1320 01:01:39,542 --> 01:01:43,900 The other side of the coin is the John Henry effect, which 1321 01:01:43,900 --> 01:01:45,700 is people in the comparison group behave differently. 1322 01:01:45,700 --> 01:01:47,350 What are examples of that? 
1323 01:01:52,580 --> 01:01:59,400 AUDIENCE: The village in the control group is resentful to 1324 01:01:59,400 --> 01:02:02,952 the politician who they perceived as determining who 1325 01:02:02,952 --> 01:02:06,880 got treatment status and so they don't try as hard on 1326 01:02:06,880 --> 01:02:07,730 whatever's being measured. 1327 01:02:07,730 --> 01:02:12,580 Or they intentionally turn in a poor performance as a sign 1328 01:02:12,580 --> 01:02:14,035 of protest. 1329 01:02:14,035 --> 01:02:15,290 SHAWN COLE: Right. 1330 01:02:15,290 --> 01:02:17,460 They're like, why am I in the control group of this study. 1331 01:02:17,460 --> 01:02:19,250 I wanted to be in the treatment group. 1332 01:02:19,250 --> 01:02:21,520 I'm not going to use fertilizer because I know this 1333 01:02:21,520 --> 01:02:22,940 study's about fertilizer or something. 1334 01:02:26,530 --> 01:02:27,060 Other thoughts? 1335 01:02:27,060 --> 01:02:29,310 AUDIENCE: You could do the opposite. 1336 01:02:29,310 --> 01:02:33,440 I could say, oh, those guys, they got the treatment. 1337 01:02:33,440 --> 01:02:35,680 But I don't need that. 1338 01:02:35,680 --> 01:02:37,710 I can do just as well, so I'm going to-- 1339 01:02:37,710 --> 01:02:38,790 SHAWN COLE: I'm going to pull myself by my bootstraps. 1340 01:02:38,790 --> 01:02:40,560 AUDIENCE: [UNINTELLIGIBLE] going to start studying and 1341 01:02:40,560 --> 01:02:43,870 double up and I'll show them. 1342 01:02:43,870 --> 01:02:45,400 SHAWN COLE: That's an interesting problem, is we 1343 01:02:45,400 --> 01:02:49,900 don't really know which way these effects go ex ante. 1344 01:02:49,900 --> 01:02:52,810 The Hawthorne or John Henry effects could be positive or 1345 01:02:52,810 --> 01:02:55,630 could be negative and could be a challenge for the 1346 01:02:55,630 --> 01:02:55,910 evaluation. 1347 01:02:55,910 --> 01:02:58,230 So how do you sort of try and address this to 1348 01:02:58,230 --> 01:02:59,480 resolve these problems? 
1349 01:03:06,766 --> 01:03:08,365 AUDIENCE: It'd be hard to do statistically. 1350 01:03:14,930 --> 01:03:16,900 AUDIENCE: This could be dangerous I suppose. 1351 01:03:16,900 --> 01:03:19,660 But if you try to make the people in the control group, 1352 01:03:19,660 --> 01:03:21,950 for instance, feel special in some other way. 1353 01:03:21,950 --> 01:03:26,540 But in a way that you can say is not related to anything 1354 01:03:26,540 --> 01:03:28,120 you're measuring from the treatment. 1355 01:03:28,120 --> 01:03:28,255 SHAWN COLE: Right. 1356 01:03:28,255 --> 01:03:30,115 So we're doing a financial literacy evaluation where 1357 01:03:30,115 --> 01:03:31,920 we're showing financial literacy videos to the 1358 01:03:31,920 --> 01:03:32,940 treatment group. 1359 01:03:32,940 --> 01:03:34,570 In the control group we're bringing them in and we're 1360 01:03:34,570 --> 01:03:37,610 showing them films about health or something that we 1361 01:03:37,610 --> 01:03:39,350 don't think will have any effect on financial literacy, 1362 01:03:39,350 --> 01:03:42,720 but lots of the things are the same. 1363 01:03:42,720 --> 01:03:45,170 In the medical literature they do often double blind studies, 1364 01:03:45,170 --> 01:03:46,790 where you don't even know whether you're in the 1365 01:03:46,790 --> 01:03:47,710 treatment group or the control group. 1366 01:03:47,710 --> 01:03:50,230 So you can't get despondent for being in the control group 1367 01:03:50,230 --> 01:03:54,610 because you don't know you're there. 1368 01:03:54,610 --> 01:03:57,210 Sometimes these are sort of inevitable and you can't get 1369 01:03:57,210 --> 01:03:59,520 around them, but you should think about them carefully and 1370 01:03:59,520 --> 01:04:02,780 try to minimize the risk. 1371 01:04:02,780 --> 01:04:04,680 So another problem with evaluations is sort of 1372 01:04:04,680 --> 01:04:07,270 behavioral responses to evaluations. 
1373 01:04:07,270 --> 01:04:09,380 So we assign some schools to treatment schools and some 1374 01:04:09,380 --> 01:04:11,250 schools to comparison schools. 1375 01:04:11,250 --> 01:04:17,310 And the people in the comparison school say, oh. 1376 01:04:17,310 --> 01:04:19,490 So we give textbooks to the treatment school. 1377 01:04:19,490 --> 01:04:20,990 So lots of people say, hey, this 1378 01:04:20,990 --> 01:04:22,110 school's got new textbooks. 1379 01:04:22,110 --> 01:04:24,450 I'm going to go to this school. 1380 01:04:24,450 --> 01:04:26,100 And so that increases the class size. 1381 01:04:26,100 --> 01:04:29,470 So the textbooks may benefit the test score, but the 1382 01:04:29,470 --> 01:04:32,610 increased class size offsets that and you find no effect of 1383 01:04:32,610 --> 01:04:34,780 textbooks because the behavioral 1384 01:04:34,780 --> 01:04:35,950 responses undid this. 1385 01:04:35,950 --> 01:04:38,480 Whereas if you were to do it throughout the country and 1386 01:04:38,480 --> 01:04:40,480 give every school new textbooks, then there'd be no 1387 01:04:40,480 --> 01:04:43,150 transferring around because there'd be no reason to change 1388 01:04:43,150 --> 01:04:45,070 schools because your school would have free 1389 01:04:45,070 --> 01:04:46,720 textbooks as well. 1390 01:04:46,720 --> 01:04:48,420 So that's sort of another. 1391 01:04:48,420 --> 01:04:50,478 AUDIENCE: I just wanted to go back to your question about 1392 01:04:50,478 --> 01:04:54,503 how to minimize the Hawthorne and John Henry effect. 1393 01:04:54,503 --> 01:04:59,340 We're doing an impact study, impact evaluation on 1394 01:04:59,340 --> 01:05:02,735 microfinance product in Mexico. 1395 01:05:02,735 --> 01:05:05,650 We try not to talk about the study. 1396 01:05:05,650 --> 01:05:08,868 The people at the top know that they're implementing this 1397 01:05:08,868 --> 01:05:10,110 study, but the participants don't know 1398 01:05:10,110 --> 01:05:10,610 anything about the study. 
1399 01:05:10,610 --> 01:05:15,601 And it's very kind of hush hush, basically to try to 1400 01:05:15,601 --> 01:05:17,450 avoid changing behavior. 1401 01:05:17,450 --> 01:05:20,110 And obviously that's not always possible. 1402 01:05:20,110 --> 01:05:23,150 But to the extent that it is, people don't have to know that 1403 01:05:23,150 --> 01:05:24,040 they're part of a study. 1404 01:05:24,040 --> 01:05:26,460 SHAWN COLE: I'm sure everybody here who's lived in the US has 1405 01:05:26,460 --> 01:05:32,070 participated in a study sponsored by a credit card 1406 01:05:32,070 --> 01:05:37,390 company that does randomized evaluations to figure out how 1407 01:05:37,390 --> 01:05:38,880 to get people to sign up for their credit cards. 1408 01:05:38,880 --> 01:05:41,740 So they randomly send some people 10 point font on the 1409 01:05:41,740 --> 01:05:43,440 outside letter, some people 12 point font 1410 01:05:43,440 --> 01:05:44,030 on the outside letter. 1411 01:05:44,030 --> 01:05:46,690 Some people 16 point font on the outside letter. 1412 01:05:46,690 --> 01:05:48,570 And they keep track of who responds and they figure out 1413 01:05:48,570 --> 01:05:50,110 that this is the right font size. 1414 01:05:50,110 --> 01:05:51,650 And then they say, what color should it be? 1415 01:05:51,650 --> 01:05:52,650 This is the right color. 1416 01:05:52,650 --> 01:05:54,790 What should the teaser interest rate be? 1417 01:05:54,790 --> 01:06:00,290 And so lots of firms do this without you even knowing it. 1418 01:06:00,290 --> 01:06:02,740 And then you won't get any John Henry or Hawthorne 1419 01:06:02,740 --> 01:06:03,800 effects because people won't even know they're in 1420 01:06:03,800 --> 01:06:04,670 experiments. 1421 01:06:04,670 --> 01:06:07,530 Sometimes there are sort of consent issues, in that you need 1422 01:06:07,530 --> 01:06:11,165 people's informed consent, that preclude that from happening.
1423 01:06:14,280 --> 01:06:16,450 There's some issues that we were touching on before, sort 1424 01:06:16,450 --> 01:06:18,550 of the generalizability of results. 1425 01:06:18,550 --> 01:06:20,670 So can the program be replicated on a 1426 01:06:20,670 --> 01:06:22,100 large national scale? 1427 01:06:22,100 --> 01:06:24,700 So we're going in and giving free soap to these villages, 1428 01:06:24,700 --> 01:06:26,860 but it would get expensive to give free soap to every 1429 01:06:26,860 --> 01:06:28,080 village in the country. 1430 01:06:28,080 --> 01:06:29,860 The study sample, is it representative? 1431 01:06:29,860 --> 01:06:32,020 So what's a problem you might run into here? 1432 01:06:36,710 --> 01:06:39,765 AUDIENCE: If for logistical reasons, you're only doing it 1433 01:06:39,765 --> 01:06:44,130 in one state in a country and it's randomized within the 1434 01:06:44,130 --> 01:06:48,110 state, but then there's just a different culture in the 1435 01:06:48,110 --> 01:06:50,994 state, or there's a really strong history or traditions 1436 01:06:50,994 --> 01:06:53,922 in the state that then are not generalizable up to the 1437 01:06:53,922 --> 01:06:55,880 country as a whole. 1438 01:06:55,880 --> 01:07:00,990 SHAWN COLE: A problem you often run into is NGOs will, I 1439 01:07:00,990 --> 01:07:05,910 think for reasonable reasons say, OK, let's do this study 1440 01:07:05,910 --> 01:07:08,480 at our best bank branch or at our best district. 1441 01:07:08,480 --> 01:07:10,820 Because doing a study is pretty hard. 1442 01:07:10,820 --> 01:07:12,970 You have to have treatment and you have to have control and 1443 01:07:12,970 --> 01:07:14,600 keep track of who's in which group. 1444 01:07:14,600 --> 01:07:16,600 And so these people have been with us for five years and 1445 01:07:16,600 --> 01:07:18,110 they can do the study really well.
1446 01:07:18,110 --> 01:07:23,150 And so you do the study and you find some nice effect, but 1447 01:07:23,150 --> 01:07:24,760 that is the effect of putting your best people into the 1448 01:07:24,760 --> 01:07:26,660 program and you only have 20 of them. 1449 01:07:26,660 --> 01:07:30,110 And now when you try to scale it up to 500 villages, you 1450 01:07:30,110 --> 01:07:32,170 just don't have that level of human capital to implement the 1451 01:07:32,170 --> 01:07:34,020 same quality of program elsewhere. 1452 01:07:37,540 --> 01:07:39,860 So sensitivity of results. 1453 01:07:39,860 --> 01:07:46,560 This is sort of important, but maybe second-order important. 1454 01:07:46,560 --> 01:07:49,110 The state of the art in the science is, we're still looking 1455 01:07:49,110 --> 01:07:51,110 for things that work and work well. 1456 01:07:51,110 --> 01:07:56,960 So we're not as worried about figuring out if we give the 1457 01:07:56,960 --> 01:08:00,320 de-worming tablet every month versus every three weeks, 1458 01:08:00,320 --> 01:08:01,470 which one is more effective. 1459 01:08:01,470 --> 01:08:03,970 I mean that's a useful important question and it 1460 01:08:03,970 --> 01:08:05,450 probably deserves a study. 1461 01:08:05,450 --> 01:08:09,190 But it's hard enough to get the big picture studies done 1462 01:08:09,190 --> 01:08:12,740 to then move onto the sensitivity of the results. 1463 01:08:12,740 --> 01:08:15,270 That said, sometimes there are often interesting economic 1464 01:08:15,270 --> 01:08:16,020 questions you have. 1465 01:08:16,020 --> 01:08:18,800 So you want to know whether microfinance has an impact on 1466 01:08:18,800 --> 01:08:19,770 people's wealth. 1467 01:08:19,770 --> 01:08:22,109 But you might also care about the interest rate. 1468 01:08:22,109 --> 01:08:25,384 And so for microfinance to be sustainable, the interest rate 1469 01:08:25,384 --> 01:08:25,979 has to be high.
1470 01:08:25,979 --> 01:08:28,290 But for it to generate a lot of income for the borrowers, 1471 01:08:28,290 --> 01:08:29,500 the interest rate has to be low. 1472 01:08:29,500 --> 01:08:32,250 So you could try your program at different interest rates 1473 01:08:32,250 --> 01:08:35,140 and see whether you find the same effect at different 1474 01:08:35,140 --> 01:08:36,460 interest rates. 1475 01:08:36,460 --> 01:08:37,710 That would be very interesting. 1476 01:08:40,620 --> 01:08:43,029 So there's often a trade-off between internal 1477 01:08:43,029 --> 01:08:46,500 and external validity. 1478 01:08:46,500 --> 01:08:52,149 In my experience, I think it's probably reasonable to focus 1479 01:08:52,149 --> 01:08:54,450 on the first pass on the internal validity. 1480 01:08:54,450 --> 01:08:58,399 Because the advantage of picking your best branch or 1481 01:08:58,399 --> 01:09:00,939 picking your good people to get the study done and done 1482 01:09:00,939 --> 01:09:04,220 well and have a large treatment effect is that 1483 01:09:04,220 --> 01:09:06,470 we're sure we know what we're measuring. 1484 01:09:06,470 --> 01:09:10,229 It's often hard to measure effects in the real world. 1485 01:09:10,229 --> 01:09:13,120 It's not as if the hundred people in this room are the 1486 01:09:13,120 --> 01:09:14,270 first people to think we should do 1487 01:09:14,270 --> 01:09:16,520 something to reduce poverty. 1488 01:09:16,520 --> 01:09:18,520 It's a difficult and thorny problem. 1489 01:09:18,520 --> 01:09:21,310 And so if we can throw sort of our best program and show that 1490 01:09:21,310 --> 01:09:26,710 our best program is effective, then we can sort of work on 1491 01:09:26,710 --> 01:09:31,000 expanding and testing our second best program. 1492 01:09:31,000 --> 01:09:34,120 Statistical power is often stronger if you have a very 1493 01:09:34,120 --> 01:09:35,149 homogeneous sample.
1494 01:09:35,149 --> 01:09:41,609 So if you can randomize in a set of twins or something like 1495 01:09:41,609 --> 01:09:43,130 that, you have very good statistical power. 1496 01:09:43,130 --> 01:09:45,470 But twins might not be representative of the general 1497 01:09:45,470 --> 01:09:46,770 population. 1498 01:09:46,770 --> 01:09:48,899 And then, of course, the study location is 1499 01:09:48,899 --> 01:09:49,655 almost never random. 1500 01:09:49,655 --> 01:09:53,710 In Indonesia we did manage to do a nationally representative 1501 01:09:53,710 --> 01:09:56,860 randomized evaluation of a financial literacy program, 1502 01:09:56,860 --> 01:09:59,480 but that was just because the World Bank was doing a 1503 01:09:59,480 --> 01:10:01,940 nationally representative survey and we persuaded them 1504 01:10:01,940 --> 01:10:03,470 to tack the experiment on at the end. 1505 01:10:03,470 --> 01:10:06,080 But otherwise it's almost always prohibitively expensive 1506 01:10:06,080 --> 01:10:09,640 to travel around to hundreds of locations. 1507 01:10:09,640 --> 01:10:11,450 But at the end of the day, you do care a lot 1508 01:10:11,450 --> 01:10:12,550 about external validity. 1509 01:10:12,550 --> 01:10:15,550 You want to know that before you throw a lot of money at 1510 01:10:15,550 --> 01:10:19,390 the program, can you get the same effect when 1511 01:10:19,390 --> 01:10:20,920 you scale it up? 1512 01:10:20,920 --> 01:10:24,320 And is this program effective for large populations? 1513 01:10:27,440 --> 01:10:30,710 So in the last 5 or 10 minutes, we'll talk a little 1514 01:10:30,710 --> 01:10:32,020 bit about cost effectiveness. 1515 01:10:32,020 --> 01:10:33,870 So you've done your program, you've done your evaluation, 1516 01:10:33,870 --> 01:10:35,160 you've got the efficacy. 1517 01:10:35,160 --> 01:10:36,640 You know how much it costs to deliver the program. 1518 01:10:36,640 --> 01:10:40,400 Now how do you decide which program to pursue?
1519 01:10:46,360 --> 01:10:48,150 I guess the important thing in this is pretty obvious. 1520 01:10:48,150 --> 01:10:50,380 It's just finding a metric that you can use to compare 1521 01:10:50,380 --> 01:10:51,340 different programs. 1522 01:10:51,340 --> 01:10:54,670 So in educational programs we often look at years of 1523 01:10:54,670 --> 01:10:55,920 schooling as an output. 1524 01:10:58,540 --> 01:11:00,840 Having an extra teacher causes people to stay in school 1525 01:11:00,840 --> 01:11:02,940 longer, but extra teachers are expensive. 1526 01:11:02,940 --> 01:11:07,450 You can figure out how much it costs per child year of 1527 01:11:07,450 --> 01:11:08,910 schooling you create. 1528 01:11:08,910 --> 01:11:11,200 In health programs, they have something called a disability 1529 01:11:11,200 --> 01:11:14,100 adjusted life year, which I'm sure some of you know a lot 1530 01:11:14,100 --> 01:11:14,780 better than I do. 1531 01:11:14,780 --> 01:11:18,810 But it's basically an unimpaired year of life with 1532 01:11:18,810 --> 01:11:22,390 no disability counts as one. 1533 01:11:22,390 --> 01:11:25,450 If your legs are immobile, then maybe 1534 01:11:25,450 --> 01:11:26,830 it'd be 0.6 or something. 1535 01:11:26,830 --> 01:11:30,930 And it sort of gets adjusted down to figure out which 1536 01:11:30,930 --> 01:11:33,170 health interventions are more or less cost effective. 1537 01:11:33,170 --> 01:11:35,230 Or you could do cost per death averted. 1538 01:11:42,080 --> 01:11:47,210 I think the interesting takeaway here is that doing 1539 01:11:47,210 --> 01:11:49,150 these types of comparisons can sometimes lead to pretty 1540 01:11:49,150 --> 01:11:50,650 surprising results. 
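The health metric described above can be sketched in a few lines of Python. This is only a rough illustration that follows the lecture's simplified description (a year with no disability counts as 1, a year with impaired mobility perhaps 0.6). The function names, the program cost, and the numbers of years are hypothetical and do not come from any study.

```python
def health_adjusted_years(spells):
    """Sum years of life, each weighted by its health state.

    `spells` is a list of (years, weight) pairs. Following the simplified
    description in the lecture, weight is 1.0 for a year with no disability
    and lower (for example 0.6) for a year lived with an impairment.
    """
    return sum(years * weight for years, weight in spells)


def cost_per_health_adjusted_year(program_cost, years_gained):
    """Cost of producing one health-adjusted year of life."""
    if years_gained <= 0:
        raise ValueError("years_gained must be positive")
    return program_cost / years_gained


# Hypothetical person: 10 unimpaired years plus 5 years with impaired mobility.
example_years = health_adjusted_years([(10, 1.0), (5, 0.6)])

# Hypothetical program that costs $260 and produces those years.
example_cost = cost_per_health_adjusted_year(260.0, example_years)
print(f"{example_years:.1f} health-adjusted years at ${example_cost:.2f} each")
```

Putting every health program on the same cost-per-year scale is what allows very different interventions to be compared directly.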
1541 01:11:50,650 --> 01:11:59,640 So we know how to get people in school by reducing the cost 1542 01:11:59,640 --> 01:12:02,820 of education, so the PROGRESA program in Mexico made 1543 01:12:02,820 --> 01:12:04,980 conditional cash transfers to the parents of students 1544 01:12:04,980 --> 01:12:06,260 who attended school. 1545 01:12:06,260 --> 01:12:08,030 Providing free uniforms increases attendance. 1546 01:12:08,030 --> 01:12:11,280 Providing school meals increases attendance. 1547 01:12:11,280 --> 01:12:13,270 We've looked at incentives to increase learning. 1548 01:12:13,270 --> 01:12:15,010 But we've also looked at the de-worming case. 1549 01:12:19,380 --> 01:12:21,780 If you'd said five years ago to people who 1550 01:12:21,780 --> 01:12:24,180 specialize in education in developing countries, what do 1551 01:12:24,180 --> 01:12:28,020 you think a very high impact intervention would be? 1552 01:12:28,020 --> 01:12:30,850 I think very few people would have suggested de-worming. 1553 01:12:30,850 --> 01:12:35,680 But if you do the math, you can figure out that an extra 1554 01:12:35,680 --> 01:12:38,540 teacher will induce let's say one year of additional 1555 01:12:38,540 --> 01:12:41,790 schooling, but costs $55 per pupil. 1556 01:12:41,790 --> 01:12:44,395 So the cost per additional year of schooling is here for 1557 01:12:44,395 --> 01:12:45,410 extra teacher. 1558 01:12:45,410 --> 01:12:47,120 Iron supplements here. 1559 01:12:47,120 --> 01:12:48,850 School meals here. 1560 01:12:48,850 --> 01:12:49,600 Deworming here. 1561 01:12:49,600 --> 01:12:53,070 So it's just tremendously cost effective to provide this 1562 01:12:53,070 --> 01:12:56,860 de-worming medicine as a means of 1563 01:12:56,860 --> 01:13:00,450 increasing years of education. 1564 01:13:00,450 --> 01:13:02,580 Much cheaper than scholarships for girls, et cetera, or 1565 01:13:02,580 --> 01:13:03,830 school uniforms.
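The comparison on the slide can be sketched as a cost-per-additional-year calculation. This is a minimal Python illustration, not the slide's actual numbers, which are not in the transcript. The extra-teacher figures ($55 per pupil for about one additional year) are quoted in the lecture. The de-worming figures ($0.49 and 0.2 years) are quoted later in the lecture, and treating them as per-pupil cost and per-pupil gain is an assumption of this sketch, so the resulting ratio should not be read as a published estimate. The function name is hypothetical.

```python
def cost_per_additional_year(cost_per_pupil, years_gained_per_pupil):
    """Dollars spent per additional year of schooling induced."""
    if years_gained_per_pupil <= 0:
        raise ValueError("years_gained_per_pupil must be positive")
    return cost_per_pupil / years_gained_per_pupil


# (cost per pupil in dollars, additional years of schooling per pupil)
programs = {
    "extra teacher": (55.00, 1.0),  # figures quoted in the lecture
    "de-worming": (0.49, 0.2),      # assumed per-pupil reading of later figures
}

# Rank programs from cheapest to most expensive per additional year.
ranking = sorted(
    ((name, cost_per_additional_year(cost, years))
     for name, (cost, years) in programs.items()),
    key=lambda item: item[1],
)

for name, cost in ranking:
    print(f"{name}: ${cost:.2f} per additional year of schooling")
```

The point of the ranking is that a cheap program with a small effect per pupil can still beat an expensive program with a large effect once both are expressed in the same units.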
1566 01:13:07,330 --> 01:13:10,980 And you could do this calculation not just for 1567 01:13:10,980 --> 01:13:12,720 education, you could say, there are a lot of things that 1568 01:13:12,720 --> 01:13:13,260 we care about. 1569 01:13:13,260 --> 01:13:18,120 We care about health outcomes, human capital investment, 1570 01:13:18,120 --> 01:13:19,650 externalities. 1571 01:13:19,650 --> 01:13:22,990 And so an interesting thing that came out of the 1572 01:13:22,990 --> 01:13:27,380 de-worming study was that if you did the old studies that 1573 01:13:27,380 --> 01:13:30,165 didn't take into account the externalities and just sort of 1574 01:13:30,165 --> 01:13:31,935 treated some people in a school but not other people in 1575 01:13:31,935 --> 01:13:34,170 a school, it didn't look like a very good intervention. 1576 01:13:34,170 --> 01:13:36,940 Because the kids would keep reinfecting each other even 1577 01:13:36,940 --> 01:13:38,840 though they'd just been treated, and so it wasn't that 1578 01:13:38,840 --> 01:13:39,780 cost effective. 1579 01:13:39,780 --> 01:13:43,540 But once you did the school level randomization and took 1580 01:13:43,540 --> 01:13:46,440 into account the externalities, then the 1581 01:13:46,440 --> 01:13:50,060 program turned out to look very, very cheap as a way of 1582 01:13:50,060 --> 01:13:51,310 providing education. 1583 01:13:54,660 --> 01:13:57,580 It's also an incredibly effective way of improving 1584 01:13:57,580 --> 01:14:00,560 health outcomes, de-worming. 1585 01:14:00,560 --> 01:14:05,540 And much more effective, for example, than treating 1586 01:14:05,540 --> 01:14:06,790 schistosomiasis. 1587 01:14:09,880 --> 01:14:11,790 You can do even more calculations. 1588 01:14:11,790 --> 01:14:14,150 You can say OK, so we know that the deworming medicine is 1589 01:14:14,150 --> 01:14:16,810 going to increase years of education by 0.2. 1590 01:14:16,810 --> 01:14:19,590 Well, what's 0.2 years of education worth?
1591 01:14:19,590 --> 01:14:22,150 There are economists who have done estimates of the returns 1592 01:14:22,150 --> 01:14:23,080 to schooling in Kenya. 1593 01:14:23,080 --> 01:14:25,310 They say if you get an extra year of schooling, you're 1594 01:14:25,310 --> 01:14:29,010 going to get 7% more income throughout your life. 1595 01:14:29,010 --> 01:14:31,360 And then so you've got 40 years of life 1596 01:14:31,360 --> 01:14:32,510 at 7% higher income. 1597 01:14:32,510 --> 01:14:36,430 You can take the present value of that stream of additional 1598 01:14:36,430 --> 01:14:40,460 wage payments and you can see that, wow, by investing only 1599 01:14:40,460 --> 01:14:46,170 $0.49, we're going to generate $20 more in wages at net 1600 01:14:46,170 --> 01:14:50,180 present value, on average. 1601 01:14:50,180 --> 01:14:53,560 And so if you have a tax rate of 10%, that's clearly a 1602 01:14:53,560 --> 01:14:55,660 profitable intervention for the government if 1603 01:14:55,660 --> 01:14:57,260 it's patient enough. 1604 01:14:57,260 --> 01:15:00,040 Because it'll take in $2 in net present value in taxes at 1605 01:15:00,040 --> 01:15:02,180 a cost of only $0.49 of delivering. 1606 01:15:05,310 --> 01:15:06,600 Of course, there may not be any taxes on 1607 01:15:06,600 --> 01:15:09,660 informal labor in Kenya. 1608 01:15:09,660 --> 01:15:14,170 So I think this is an example we like to cite first, because 1609 01:15:14,170 --> 01:15:16,200 Michael helped prepare this particular lecture the first 1610 01:15:16,200 --> 01:15:18,360 time around and is fond of that paper. 1611 01:15:18,360 --> 01:15:22,300 But second, I think it's a very nice example of a program 1612 01:15:22,300 --> 01:15:24,660 that has a really big macro effect. 1613 01:15:24,660 --> 01:15:29,660 So basically, it's been adopted nationwide in Uganda, 1614 01:15:29,660 --> 01:15:32,460 and they're expanding it a lot in Kenya. 
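The back-of-the-envelope calculation described above (0.2 extra years of schooling, 7% more income per extra year, 40 working years, $0.49 cost, a 10% tax rate) can be sketched in Python. The lecture gives neither the baseline wage nor the discount rate behind the $20 figure, so the values used for them below are purely hypothetical, and the sketch makes no attempt to reproduce the $20. The function names are also hypothetical.

```python
def present_value_of_wage_gain(extra_years, return_per_year,
                               baseline_annual_wage, working_years,
                               discount_rate):
    """Present value of the extra wages earned over a working life.

    Uses a linear approximation: the yearly wage rises by
    return_per_year * extra_years (for example 0.07 * 0.2 = 1.4%).
    """
    annual_gain = baseline_annual_wage * return_per_year * extra_years
    return sum(annual_gain / (1.0 + discount_rate) ** year
               for year in range(1, working_years + 1))


def tax_take_and_ratio(pv_wages, tax_rate, cost):
    """Present value of taxes collected, and taxes per dollar spent."""
    tax_take = pv_wages * tax_rate
    return tax_take, tax_take / cost


# Hypothetical baseline wage ($500 a year) and discount rate (5%).
pv = present_value_of_wage_gain(0.2, 0.07, baseline_annual_wage=500.0,
                                working_years=40, discount_rate=0.05)
print(f"Present value of extra wages (hypothetical inputs): ${pv:.2f}")

# The fiscal step, using the figures quoted in the lecture.
tax_take, ratio = tax_take_and_ratio(pv_wages=20.0, tax_rate=0.10, cost=0.49)
print(f"Tax take: ${tax_take:.2f}, or {ratio:.1f} dollars per dollar spent")
```

The second step is the one the lecture spells out: a $20 present value taxed at 10% returns $2 against a $0.49 cost, which is why the program looks profitable to a patient government that can actually tax those wages.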
1615 01:15:32,460 --> 01:15:36,230 It's been tried in India and many other countries have 1616 01:15:36,230 --> 01:15:36,980 realized that this is a 1617 01:15:36,980 --> 01:15:39,040 tremendously effective program. 1618 01:15:39,040 --> 01:15:42,350 And the ability to have this randomized evaluation, there's 1619 01:15:42,350 --> 01:15:45,070 very credible evidence that said, we had these treatment 1620 01:15:45,070 --> 01:15:45,970 groups, these control groups. 1621 01:15:45,970 --> 01:15:49,490 We followed people for three years and the reason why our 1622 01:15:49,490 --> 01:15:51,440 results are so different than the other results that you 1623 01:15:51,440 --> 01:15:54,410 were citing as a reason not to provide de-worming is because 1624 01:15:54,410 --> 01:15:55,350 of these externalities. 1625 01:15:55,350 --> 01:15:58,310 And we can show you why these externalities matter. 1626 01:15:58,310 --> 01:16:01,320 The credibility of that study really helped to transform 1627 01:16:01,320 --> 01:16:05,800 policy and literally save thousands, tens of 1628 01:16:05,800 --> 01:16:07,800 thousands of lives. 1629 01:16:07,800 --> 01:16:09,810 Maybe more. 1630 01:16:09,810 --> 01:16:10,870 Other examples. 1631 01:16:10,870 --> 01:16:13,430 PROGRESA, which some of you might be familiar with. 1632 01:16:13,430 --> 01:16:17,110 It's actually the government of Mexico decided to integrate 1633 01:16:17,110 --> 01:16:19,260 a bunch of randomized evaluations into its social 1634 01:16:19,260 --> 01:16:21,770 welfare programs. 1635 01:16:21,770 --> 01:16:24,510 That methodology and the results of what's been shown 1636 01:16:24,510 --> 01:16:26,810 effective in that program have been adopted throughout Latin 1637 01:16:26,810 --> 01:16:28,020 America and elsewhere. 1638 01:16:28,020 --> 01:16:31,510 And Ben Olken, whom you saw earlier, did some experiments 1639 01:16:31,510 --> 01:16:34,180 on threat of audits in Indonesia for corruption. 
1640 01:16:34,180 --> 01:16:36,790 And the government of Indonesia is increasing the 1641 01:16:36,790 --> 01:16:39,910 probability of audits as a way of fighting corruption in the 1642 01:16:39,910 --> 01:16:44,100 world's fourth most populous country. 1643 01:16:44,100 --> 01:16:49,760 So I think the conclusion is that these evaluations, which 1644 01:16:49,760 --> 01:16:51,410 take a lot of time and take a lot of effort, 1645 01:16:51,410 --> 01:16:52,350 let's not kid ourselves. 1646 01:16:52,350 --> 01:16:54,690 If you sign up for one of these, it's 1647 01:16:54,690 --> 01:16:57,370 going to be a big affair. 1648 01:16:57,370 --> 01:16:59,750 But it can have a tremendous impact. 1649 01:16:59,750 --> 01:17:03,040 It's very important to know from your own perspective how 1650 01:17:03,040 --> 01:17:04,640 effective your program is, but you can 1651 01:17:04,640 --> 01:17:06,220 influence policy a lot. 1652 01:17:06,220 --> 01:17:09,180 So I'm just going to conclude with two things. 1653 01:17:09,180 --> 01:17:12,960 One is this mention of the additional resources, which 1654 01:17:12,960 --> 01:17:14,500 should be on the JPAL website. 1655 01:17:14,500 --> 01:17:17,780 This is a book you have to buy for $60. 1656 01:17:17,780 --> 01:17:19,880 Only buy this if you're already familiar with 1657 01:17:19,880 --> 01:17:20,860 econometrics. 1658 01:17:20,860 --> 01:17:23,010 But they're both great treatments of the material 1659 01:17:23,010 --> 01:17:26,260 we've covered this week. 1660 01:17:26,260 --> 01:17:29,800 And I believe JPAL is in the process of developing a 1661 01:17:29,800 --> 01:17:31,970 practitioner's guide as well. 1662 01:17:31,970 --> 01:17:33,170 This is a much more technical guide 1663 01:17:33,170 --> 01:17:34,220 that's full of equations.
1664 01:17:34,220 --> 01:17:39,080 But hopefully, as you've seen throughout the last week, 1665 01:17:39,080 --> 01:17:41,980 we've tried to explain things in ways that are accessible 1666 01:17:41,980 --> 01:17:44,500 that you can explain to people who haven't taken econometrics 1667 01:17:44,500 --> 01:17:47,020 and that will hopefully be coming out pretty soon. 1668 01:17:47,020 --> 01:17:50,250 And so if I were just to at least take two seconds to give 1669 01:17:50,250 --> 01:17:54,160 my perspective, I'm young but I've done a few of these. 1670 01:17:54,160 --> 01:17:56,000 It's probably helpful when you're doing one of these 1671 01:17:56,000 --> 01:17:58,460 evaluations to engage academics. 1672 01:17:58,460 --> 01:18:00,330 If you're thinking of doing a study, you just 1673 01:18:00,330 --> 01:18:01,900 send an email to JPAL. 1674 01:18:01,900 --> 01:18:06,050 They'll send it out to their network of 15 or 20 people and 1675 01:18:06,050 --> 01:18:07,340 I don't know what the response rate is. 1676 01:18:07,340 --> 01:18:11,170 But I think there's a lot of interest in doing these 1677 01:18:11,170 --> 01:18:12,730 experiments. 1678 01:18:12,730 --> 01:18:14,230 IPA is another organization that does this. 1679 01:18:14,230 --> 01:18:16,850 But there's some subtle nuanced issues that require 1680 01:18:16,850 --> 01:18:18,000 careful thinking through. 1681 01:18:18,000 --> 01:18:21,430 Or you could just call me up and say, we're going to be 1682 01:18:21,430 --> 01:18:22,160 doing this study. 1683 01:18:22,160 --> 01:18:23,800 We're spending a lot of money on it, can we just talk 1684 01:18:23,800 --> 01:18:25,700 through these issues with you? 1685 01:18:25,700 --> 01:18:30,310 I'd be perfectly happy to do that. 1686 01:18:30,310 --> 01:18:32,580 And then, one thought. 1687 01:18:32,580 --> 01:18:34,080 Just a few thoughts on keeping your 1688 01:18:34,080 --> 01:18:35,140 evaluation cost effective.
1689 01:18:35,140 --> 01:18:36,640 And maybe even think about this when you're proposing 1690 01:18:36,640 --> 01:18:40,230 your program tomorrow, is that randomizing at the individual 1691 01:18:40,230 --> 01:18:46,550 level where possible and appropriate is a tremendously 1692 01:18:46,550 --> 01:18:48,590 useful way to keep costs down. 1693 01:18:48,590 --> 01:18:51,270 Because the statistical power is so much higher, so to 1694 01:18:51,270 --> 01:18:54,340 detect any given effect size you can do it with many fewer 1695 01:18:54,340 --> 01:18:56,010 observations. 1696 01:18:56,010 --> 01:18:57,990 Another way to save a lot of money is to use 1697 01:18:57,990 --> 01:18:58,850 administrative data. 1698 01:18:58,850 --> 01:19:02,790 So if you can get people to give you access in New York to 1699 01:19:02,790 --> 01:19:06,090 their checking accounts and credit card statements, and 1700 01:19:06,090 --> 01:19:09,040 get those reported, at a low cost to you, the 1701 01:19:09,040 --> 01:19:11,520 administrative data is often of very high quality and can 1702 01:19:11,520 --> 01:19:14,020 be collected for little money. 1703 01:19:14,020 --> 01:19:16,550 And then the third trick is sort of using lotteries to 1704 01:19:16,550 --> 01:19:17,610 ensure high compliance. 1705 01:19:17,610 --> 01:19:19,250 So if you sort of announce that you've got this new 1706 01:19:19,250 --> 01:19:22,440 program and you let everybody who's interested apply, and 1707 01:19:22,440 --> 01:19:26,850 then you sort of randomly select 60 and have 60 as a 1708 01:19:26,850 --> 01:19:29,380 control group, then the compliance is going to be 1709 01:19:29,380 --> 01:19:32,230 pretty high in the treatment group. 1710 01:19:32,230 --> 01:19:35,200 That Wald estimator, the difference between these two 1711 01:19:35,200 --> 01:19:37,740 things, is going to be pretty high. 1712 01:19:37,740 --> 01:19:42,700 So we're at 2:25, which at MIT is the end of the class.
1713 01:19:42,700 --> 01:19:45,890 And I guess the end of the course, at least from the 1714 01:19:45,890 --> 01:19:46,520 lecturing side. 1715 01:19:46,520 --> 01:19:49,000 So thank you very much and I will be around here to answer 1716 01:19:49,000 --> 01:19:51,110 questions for the next 15 minutes if people 1717 01:19:51,110 --> 01:19:52,360 would like to talk.
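[Editor's note] Two of the cost-saving points from the closing remarks can be sketched numerically: individual-level randomization needing fewer observations, and lotteries among applicants producing a large take-up difference for the Wald estimator. The sketch below is only an illustration under stated assumptions, not the lecturer's own calculation. It uses the standard design-effect formula 1 + (m - 1) * rho for cluster randomization, and it treats the take-up gap as known when scaling sampling error. The function names and every number (sample size, cluster size, intracluster correlation, take-up rates) are hypothetical.

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Factor by which the required sample grows when whole clusters
    (villages, schools) are randomized instead of individuals."""
    return 1.0 + (cluster_size - 1) * icc


def wald_estimate(itt_effect: float, takeup_gap: float) -> float:
    """Effect on the treated: intention-to-treat effect divided by the
    difference in take-up between treatment and control groups."""
    if takeup_gap <= 0:
        raise ValueError("take-up must be higher in the treatment group")
    return itt_effect / takeup_gap


# Hypothetical inputs, for illustration only.
n_individual = 400                     # sample needed with individual randomization
deff = design_effect(cluster_size=20, icc=0.05)
n_clustered = n_individual * deff      # roughly twice as many observations

# Sampling error in the ITT estimate is divided by the same take-up gap
# (an approximation that treats the gap as known), so a lottery among
# applicants, with its large gap, keeps the final estimate tighter.
itt_error = 0.1
error_high_compliance = wald_estimate(itt_error, takeup_gap=0.9)
error_low_compliance = wald_estimate(itt_error, takeup_gap=0.3)

print(f"design effect: {deff:.2f}, clustered sample: {n_clustered:.0f}")
print(f"error scaled by take-up gap: "
      f"{error_high_compliance:.2f} vs {error_low_compliance:.2f}")
```

With these made-up inputs, clustering nearly doubles the required sample, and the same error in the intention-to-treat estimate is magnified three times more when take-up differs by 30 percentage points than when it differs by 90.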