1 00:00:00,040 --> 00:00:01,930 ANNOUNCER: The following content is provided under a 2 00:00:01,930 --> 00:00:03,680 Creative Commons license. 3 00:00:03,680 --> 00:00:06,640 Your support will help MIT OpenCourseWare continue to 4 00:00:06,640 --> 00:00:09,980 offer high quality educational resources for free. 5 00:00:09,980 --> 00:00:12,820 To make a donation or to view additional materials from 6 00:00:12,820 --> 00:00:16,750 hundreds of MIT courses, visit MIT OpenCourseWare at 7 00:00:16,750 --> 00:00:18,000 ocw.mit.edu. 8 00:00:21,600 --> 00:00:23,860 MICHAEL KREMER: So I understand that in the session 9 00:00:23,860 --> 00:00:27,280 that you just had, you went through the deworming case. 10 00:00:27,280 --> 00:00:30,220 And I was just talking to some people in the break, and they 11 00:00:30,220 --> 00:00:33,100 were saying that everything's been very focused on methods, 12 00:00:33,100 --> 00:00:34,150 which is understandable. 13 00:00:34,150 --> 00:00:35,790 That's what the purpose of the course is. 14 00:00:35,790 --> 00:00:38,130 But it sounded like people were interested in hearing a 15 00:00:38,130 --> 00:00:40,450 little bit about the substantive results. 16 00:00:40,450 --> 00:00:42,560 So I just thought before I launched into this lecture, 17 00:00:42,560 --> 00:00:45,450 I'd say a little bit about that. 18 00:00:45,450 --> 00:00:47,450 And maybe this is also a way to give you a little bit of 19 00:00:47,450 --> 00:00:49,580 background on where I'm coming from. 20 00:00:49,580 --> 00:00:54,340 So I taught secondary school in Kenya right after college 21 00:00:54,340 --> 00:00:56,360 and then went to grad school. 22 00:00:56,360 --> 00:01:01,220 And then I went back after graduating, from getting my 23 00:01:01,220 --> 00:01:03,460 Ph.D. and getting a real job and having some money. 24 00:01:03,460 --> 00:01:06,430 I eventually went back to visit some friends. 
25 00:01:06,430 --> 00:01:10,160 And one of them was working for an NGO, which was just 26 00:01:10,160 --> 00:01:11,810 starting work in western Kenya. 27 00:01:11,810 --> 00:01:18,200 And his job was to find seven schools to start a program in. 28 00:01:18,200 --> 00:01:22,890 And I said to him, not really thinking this was something 29 00:01:22,890 --> 00:01:27,500 that he would do, "Why don't you pick twice as many and 30 00:01:27,500 --> 00:01:29,030 choose the seven randomly, at least where you're going to 31 00:01:29,030 --> 00:01:33,710 start?" And much to my surprise, he was interested. 32 00:01:33,710 --> 00:01:37,870 And then he went to his boss, and his boss actually did it. 33 00:01:37,870 --> 00:01:44,520 So that's, in part, how this wave of randomized evaluations 34 00:01:44,520 --> 00:01:48,080 with NGOs got going. 35 00:01:48,080 --> 00:01:50,620 This NGO worked a lot on education. 36 00:01:50,620 --> 00:01:55,870 And over the years, we tried a number of things to try to get 37 00:01:55,870 --> 00:01:58,250 more kids in school and stop kids from dropping out. 38 00:01:58,250 --> 00:02:02,780 But eventually, they tried treating kids for worms. 39 00:02:02,780 --> 00:02:05,790 And part of this was based on reading the literature which 40 00:02:05,790 --> 00:02:09,100 suggested that this is an important health intervention. 41 00:02:09,100 --> 00:02:11,270 There was a question, would it have education effects. 42 00:02:11,270 --> 00:02:13,640 So it turned out, of all the various things that we looked 43 00:02:13,640 --> 00:02:17,560 at, we calculated what was the cost per additional year of 44 00:02:17,560 --> 00:02:19,030 schooling generated. 45 00:02:19,030 --> 00:02:20,900 So we're comparing a bunch of things in that same 46 00:02:20,900 --> 00:02:22,750 environment in western Kenya. 47 00:02:22,750 --> 00:02:26,660 And deworming came out an order of magnitude better than 48 00:02:26,660 --> 00:02:27,870 anything else. 
49 00:02:27,870 --> 00:02:33,080 So this was a really striking result. 50 00:02:33,080 --> 00:02:36,250 If you spent $3.50, you could generate an additional year of 51 00:02:36,250 --> 00:02:37,450 education for a child. 52 00:02:37,450 --> 00:02:40,860 It was just much cheaper than any of the other alternatives. 53 00:02:40,860 --> 00:02:43,265 So we had that academic result. 54 00:02:46,020 --> 00:02:47,420 There were people at the World Bank who were very 55 00:02:47,420 --> 00:02:49,550 interested in this. 56 00:02:49,550 --> 00:02:51,520 There's a lot of heterogeneity in the Bank, but a lot of 57 00:02:51,520 --> 00:02:54,260 people there who are very interested in and understand 58 00:02:54,260 --> 00:02:56,580 evidence or are interested in or responsive to it. 59 00:02:56,580 --> 00:02:59,350 And the particular people who are working on health and 60 00:02:59,350 --> 00:03:02,810 education sector in the Bank in Kenya, that very much 61 00:03:02,810 --> 00:03:03,590 applies to. 62 00:03:03,590 --> 00:03:07,320 So they then took it to the Ministry of Education and 63 00:03:07,320 --> 00:03:08,875 brought us in to talk to the people in 64 00:03:08,875 --> 00:03:10,125 the Ministry of Education. 65 00:03:14,270 --> 00:03:16,000 This process took quite a lot of time. 66 00:03:16,000 --> 00:03:17,790 I don't want to underestimate this. 67 00:03:17,790 --> 00:03:19,090 Took a lot of time. 68 00:03:19,090 --> 00:03:21,210 The first time, they said, these results are interesting, 69 00:03:21,210 --> 00:03:22,240 then, yes, we should pursue them. 70 00:03:22,240 --> 00:03:24,710 But there's a lot going on inside the Ministry of 71 00:03:24,710 --> 00:03:26,380 Education, lots of other priorities. 72 00:03:26,380 --> 00:03:28,280 There are teacher strikes. 73 00:03:28,280 --> 00:03:29,520 There's all sorts of things that have 74 00:03:29,520 --> 00:03:31,590 to take higher priority. 
75 00:03:31,590 --> 00:03:38,820 But both externally, outside in international fora and 76 00:03:38,820 --> 00:03:42,560 academic fora, and internally inside Kenya, we kept 77 00:03:42,560 --> 00:03:43,890 bringing this up. 78 00:03:43,890 --> 00:03:48,615 And eventually, the permanent secretary, who's very strong, 79 00:03:48,615 --> 00:03:50,480 the permanent secretary of the Ministry of Education said, 80 00:03:50,480 --> 00:03:51,140 let's do this. 81 00:03:51,140 --> 00:03:53,790 And he brought in various people. 82 00:03:53,790 --> 00:03:56,830 And they decided they were going to try to implement 83 00:03:56,830 --> 00:04:00,630 this, have a national scale up of this. 84 00:04:00,630 --> 00:04:07,430 And there was both that internal persuasion of people 85 00:04:07,430 --> 00:04:09,050 within the Ministry of Education. 86 00:04:09,050 --> 00:04:10,080 And then, of course, there's the question of 87 00:04:10,080 --> 00:04:11,770 getting budget for it. 88 00:04:11,770 --> 00:04:14,990 Obviously, having the World Bank on side helped on that. 89 00:04:14,990 --> 00:04:18,120 The other thing that we did was-- 90 00:04:18,120 --> 00:04:23,570 so Esther and I were both involved in an event the World 91 00:04:23,570 --> 00:04:26,070 Economic Forum put on in Davos. 92 00:04:26,070 --> 00:04:29,370 And working with that group, we were able to arrange for 93 00:04:29,370 --> 00:04:33,690 there to be an event in Davos on this issue of deworming. 94 00:04:33,690 --> 00:04:37,430 And we helped start an organization called Deworm the 95 00:04:37,430 --> 00:04:39,570 World which was designed to promote this. 96 00:04:39,570 --> 00:04:41,270 And we invited the prime minister of 97 00:04:41,270 --> 00:04:42,350 Kenya to come speak. 98 00:04:42,350 --> 00:04:43,340 And he made this announcement. 99 00:04:43,340 --> 00:04:46,390 And I think that helped drive this forward a lot. 
100 00:04:46,390 --> 00:04:51,480 Because once you got a public announcement by a politician, 101 00:04:51,480 --> 00:04:54,070 then it's really going to happen in a way. 102 00:04:54,070 --> 00:04:57,640 So between the support internally within the ministry 103 00:04:57,640 --> 00:05:01,680 and this higher level political support, Kenya has, 104 00:05:01,680 --> 00:05:04,760 just as we speak, just in the past few months, dewormed 105 00:05:04,760 --> 00:05:05,820 almost 3 million children. 106 00:05:05,820 --> 00:05:10,090 So this is an example of how, if you can identify successful 107 00:05:10,090 --> 00:05:12,640 intervention, it can really help promote scale-up of the 108 00:05:12,640 --> 00:05:14,760 successful interventions. 109 00:05:14,760 --> 00:05:17,400 Ultimately, that's the purpose of what we're doing is trying 110 00:05:17,400 --> 00:05:18,610 to improve policy. 111 00:05:18,610 --> 00:05:24,050 So I just wanted to give you that tie-in to reality before 112 00:05:24,050 --> 00:05:25,305 plunging back into econometrics. 113 00:05:30,720 --> 00:05:32,560 Any comments or questions on that? 114 00:05:32,560 --> 00:05:34,210 AUDIENCE: I thought it was really interesting. 115 00:05:34,210 --> 00:05:36,150 Could you just give us a couple of the 116 00:05:36,150 --> 00:05:37,260 year points in that? 117 00:05:37,260 --> 00:05:39,120 You talked about things taking a long time. 118 00:05:39,120 --> 00:05:40,370 Can you give more concrete evidence? 119 00:05:43,070 --> 00:05:45,280 MICHAEL KREMER: Our article appeared, and publishing the 120 00:05:45,280 --> 00:05:46,725 article takes a long time too. 121 00:05:46,725 --> 00:05:52,510 Our article appeared in 2004, I believe. 122 00:05:52,510 --> 00:05:55,200 It's now 2009, and this is happening now. 123 00:05:58,940 --> 00:06:01,410 The NGO responded much more quickly. 124 00:06:01,410 --> 00:06:03,850 Although eventually, they changed 125 00:06:03,850 --> 00:06:04,560 their strategy as well. 
126 00:06:04,560 --> 00:06:05,990 So the NGO scaled up. 127 00:06:05,990 --> 00:06:09,400 But to get the national government to scale up, I 128 00:06:09,400 --> 00:06:09,970 think that took a 129 00:06:09,970 --> 00:06:12,030 constellation of various people. 130 00:06:12,030 --> 00:06:15,705 It took some time for this to get in the media, to get in 131 00:06:15,705 --> 00:06:20,970 the academic world, to get out to the media, to get out to 132 00:06:20,970 --> 00:06:23,765 opinion leaders, both internationally. 133 00:06:26,370 --> 00:06:30,050 And then it took time for the right set of people to be 134 00:06:30,050 --> 00:06:33,000 available and the money to be available to do it. 135 00:06:33,000 --> 00:06:33,430 Yes? 136 00:06:33,430 --> 00:06:34,915 AUDIENCE: Just kind of following up on that. 137 00:06:34,915 --> 00:06:37,080 This is an issue that is obviously quite concerning, 138 00:06:37,080 --> 00:06:40,364 the bridge between academia and the policy world. 139 00:06:40,364 --> 00:06:42,332 And the fact that this research is absolutely 140 00:06:42,332 --> 00:06:43,960 fabulous, but then, at the end of the day, 141 00:06:43,960 --> 00:06:44,888 it stays in a textbook. 142 00:06:44,888 --> 00:06:46,910 And what use is it to beneficiaries? 143 00:06:49,630 --> 00:06:51,230 This example is, again, fabulous. 144 00:06:51,230 --> 00:06:56,740 But what sort of actions or roles are there to extend 145 00:06:56,740 --> 00:06:58,882 findings into the policy world and into 146 00:06:58,882 --> 00:07:01,292 the development world? 147 00:07:01,292 --> 00:07:02,738 I'm sure that's a big topic. 148 00:07:02,738 --> 00:07:07,140 But just very briefly, is J-PAL or sister organizations 149 00:07:07,140 --> 00:07:08,460 doing that sort of extension? 150 00:07:17,230 --> 00:07:20,180 MICHAEL KREMER: In this particular case, I would say 151 00:07:20,180 --> 00:07:24,460 that I spent a fair amount of time afterwards trying to 152 00:07:24,460 --> 00:07:25,730 disseminate this. 
153 00:07:25,730 --> 00:07:30,720 And J-PAL has been very important in starting to 154 00:07:30,720 --> 00:07:31,970 deworm the world. 155 00:07:34,380 --> 00:07:38,610 I think this requires effort at a variety of levels, both 156 00:07:38,610 --> 00:07:40,640 in trying to get the prime minister on board, but also in 157 00:07:40,640 --> 00:07:45,100 trying to do tasks like, well, you need a spreadsheet of 158 00:07:45,100 --> 00:07:48,730 where are all the schools in the country, and which ones of 159 00:07:48,730 --> 00:07:53,130 them are in areas where we think that there's worms, and 160 00:07:53,130 --> 00:07:55,150 working out a bunch of logistics of, well, how many 161 00:07:55,150 --> 00:07:56,890 trainers do you need, and so on. 162 00:07:56,890 --> 00:08:00,390 Now I think in different settings, there'll be-- 163 00:08:00,390 --> 00:08:07,540 in this one, people who are at J-PAL and IPA have been 164 00:08:07,540 --> 00:08:10,540 involved in even down to that spreadsheet level. 165 00:08:14,280 --> 00:08:15,950 That may not be the case all the time. 166 00:08:15,950 --> 00:08:17,630 I think it depends a lot on the particular government. 167 00:08:17,630 --> 00:08:21,040 I think, obviously, J-PAL is primarily an academic 168 00:08:21,040 --> 00:08:21,770 organization. 169 00:08:21,770 --> 00:08:27,230 And so it's not the right organization to manage the 170 00:08:27,230 --> 00:08:28,585 actual roll out. 171 00:08:28,585 --> 00:08:33,179 But where you draw the line, it's a difficult question. 172 00:08:33,179 --> 00:08:33,929 But I also-- 173 00:08:33,929 --> 00:08:34,883 Yes. 174 00:08:34,883 --> 00:08:35,360 AUDIENCE: [INAUDIBLE]. 175 00:08:35,360 --> 00:08:36,658 MICHAEL KREMER: No, go ahead. 176 00:08:36,658 --> 00:08:38,404 AUDIENCE: So it seems like that the deworming medication 177 00:08:38,404 --> 00:08:41,149 is really cheap, and it's a very easy treatment. 
178 00:08:41,149 --> 00:08:43,394 Have you looked at other types of diseases and using the 179 00:08:43,394 --> 00:08:46,020 school system to try and manage it? 180 00:08:46,020 --> 00:08:48,130 MICHAEL KREMER: Yeah. 181 00:08:48,130 --> 00:08:50,440 So there are other things which could be done in the 182 00:08:50,440 --> 00:08:52,910 school health area and perhaps with 183 00:08:52,910 --> 00:08:55,650 micronutrients, et cetera. 184 00:08:55,650 --> 00:08:57,830 I don't want to go too far there both because I want to 185 00:08:57,830 --> 00:09:02,670 get back to the econometrics and because I know more about 186 00:09:02,670 --> 00:09:03,810 worms than I do other things. 187 00:09:03,810 --> 00:09:06,310 But I think there are micronutrients and other 188 00:09:06,310 --> 00:09:07,880 things that could be delivered that way. 189 00:09:07,880 --> 00:09:11,350 There's some work that's been done on presumptive treatment 190 00:09:11,350 --> 00:09:15,660 for malaria that is very intriguing and suggests that 191 00:09:15,660 --> 00:09:16,060 might work. 192 00:09:16,060 --> 00:09:20,180 There's also things you can do on education, HIV/AIDS 193 00:09:20,180 --> 00:09:23,560 education, and so on. 194 00:09:23,560 --> 00:09:25,380 Pascaline Dupas does some very nice work on that. 195 00:09:25,380 --> 00:09:28,690 And then Esther and Pascaline and Samuel Sinei and I have 196 00:09:28,690 --> 00:09:30,010 some joint work on that as well. 197 00:09:33,870 --> 00:09:34,120 OK. 198 00:09:34,120 --> 00:09:38,930 Well, let me turn to the topic of this lecture, which is-- 199 00:09:38,930 --> 00:09:41,260 and I'm happy if there's time at the end or in the break, 200 00:09:41,260 --> 00:09:44,650 I'm happy to follow up on these issues-- 201 00:09:44,650 --> 00:09:48,530 which is managing threats to 202 00:09:48,530 --> 00:09:50,660 evaluation and to data analysis. 
203 00:09:50,660 --> 00:09:54,860 So I think in the previous discussion, there's been 204 00:09:54,860 --> 00:09:57,780 things about how do you set up your sample size, how do you 205 00:09:57,780 --> 00:10:00,660 actually randomize. 206 00:10:00,660 --> 00:10:03,800 Doing those things is obviously critical, but it 207 00:10:03,800 --> 00:10:05,180 might not be sufficient. 208 00:10:05,180 --> 00:10:08,780 Because there can still be problems with impact 209 00:10:08,780 --> 00:10:11,010 measurement and analysis. 210 00:10:11,010 --> 00:10:15,100 Some of those, you can try to minimize ahead of time. 211 00:10:15,100 --> 00:10:16,290 I'm going to focus mostly on what can be 212 00:10:16,290 --> 00:10:17,020 done ahead of time. 213 00:10:17,020 --> 00:10:19,270 And then Shawn's going to talk about what can be done in the 214 00:10:19,270 --> 00:10:22,820 analysis stage to try and deal with problems that did come 215 00:10:22,820 --> 00:10:25,510 up, and what inferences you can make, and what inferences 216 00:10:25,510 --> 00:10:26,760 you can't make. 217 00:10:31,390 --> 00:10:36,960 I'm going to do a small, semi-randomized trial here, 218 00:10:36,960 --> 00:10:40,010 quasi-randomized trial, I guess I should say. 219 00:10:40,010 --> 00:10:43,860 I'm going to consider a program which is giving people 220 00:10:43,860 --> 00:10:46,460 money as a social, anti-poverty program. 221 00:10:49,810 --> 00:10:54,120 I think, rather than do full randomization-- 222 00:10:54,120 --> 00:10:56,370 you can actually leave that down for awhile-- 223 00:10:56,370 --> 00:10:57,680 I'll come back to the evaluation of 224 00:10:57,680 --> 00:10:58,500 this program later. 225 00:10:58,500 --> 00:11:00,480 Now I'm going to do the randomization and the 226 00:11:00,480 --> 00:11:00,885 implementation. 227 00:11:00,885 --> 00:11:02,500 And we can do the-- 228 00:11:02,500 --> 00:11:03,110 AUDIENCE: Too late. 229 00:11:03,110 --> 00:11:03,250 MICHAEL KREMER: OK. 
230 00:11:03,250 --> 00:11:03,530 Too late? 231 00:11:03,530 --> 00:11:06,760 OK, that's fine. 232 00:11:06,760 --> 00:11:08,790 We could count off people, one, two, one, two, and then 233 00:11:08,790 --> 00:11:13,080 just give all the ones $500 and the twos nothing. 234 00:11:45,466 --> 00:11:47,454 AUDIENCE: It really does feel bad when you're 235 00:11:47,454 --> 00:11:48,704 in the control group. 236 00:11:54,920 --> 00:11:55,695 AUDIENCE: No sharing. 237 00:11:55,695 --> 00:11:57,340 [UNINTELLIGIBLE]. 238 00:11:57,340 --> 00:11:58,900 AUDIENCE: This is real money or this-- 239 00:11:58,900 --> 00:11:59,270 MICHAEL KREMER: Don't worry. 240 00:11:59,270 --> 00:12:01,310 Well, you shouldn't feel too bad if you're 241 00:12:01,310 --> 00:12:04,232 in the control group. 242 00:12:04,232 --> 00:12:06,520 AUDIENCE: I meant cash or money? 243 00:12:06,520 --> 00:12:07,904 MICHAEL KREMER: You've got the money now, yeah? 244 00:12:07,904 --> 00:12:08,710 AUDIENCE: [UNINTELLIGIBLE] cash? 245 00:12:08,710 --> 00:12:10,815 MICHAEL KREMER: Yeah, they'll be opportunities later on. 246 00:12:25,840 --> 00:12:28,620 OK, here are the problems I want to discuss. 247 00:12:28,620 --> 00:12:29,840 The first one is-- 248 00:12:29,840 --> 00:12:33,640 so hang on to this money, we'll deal with it later on-- 249 00:12:33,640 --> 00:12:36,190 the first one is attrition. 250 00:12:36,190 --> 00:12:39,190 The second is externalities. 251 00:12:39,190 --> 00:12:42,780 And the third one is partial compliance. 252 00:12:42,780 --> 00:12:44,290 So the first one is attrition. 253 00:12:44,290 --> 00:12:47,370 Some people you're not able to collect follow up data on. 254 00:12:47,370 --> 00:12:49,120 You try, but you're not able to. 255 00:12:49,120 --> 00:12:50,610 The second one, externalities. 
256 00:12:50,610 --> 00:12:53,350 What happens if your program, as in the case of deworming, 257 00:12:53,350 --> 00:12:55,470 winds up affecting the comparison group as well as 258 00:12:55,470 --> 00:12:57,130 the treatment group? 259 00:12:57,130 --> 00:12:59,290 And the third one is partial compliance. 260 00:12:59,290 --> 00:13:01,240 You want to implement in certain places, but some 261 00:13:01,240 --> 00:13:03,910 places, they don't actually implement it. 262 00:13:03,910 --> 00:13:05,150 Maybe some of your comparison group 263 00:13:05,150 --> 00:13:07,350 accidentally gets treated. 264 00:13:07,350 --> 00:13:10,660 What do you do in that case? 265 00:13:10,660 --> 00:13:13,710 All of these things are really about internal validity. 266 00:13:13,710 --> 00:13:16,600 So there's important questions of external validity and 267 00:13:16,600 --> 00:13:17,460 interpretation. 268 00:13:17,460 --> 00:13:19,580 And Shawn's going to talk to some about those. 269 00:13:19,580 --> 00:13:20,740 But I'm going to just focus on the 270 00:13:20,740 --> 00:13:22,000 internal validity of these. 271 00:13:24,730 --> 00:13:29,040 So the first question with attrition is is it going to be 272 00:13:29,040 --> 00:13:32,890 a problem if some of the people disappear before you 273 00:13:32,890 --> 00:13:34,460 can collect the data. 274 00:13:34,460 --> 00:13:41,670 And this can be a real problem in Kenya for example. 275 00:13:41,670 --> 00:13:43,490 Kids often change their name. 276 00:13:43,490 --> 00:13:45,220 So it's just a part of the culture. 277 00:13:45,220 --> 00:13:46,880 You change your name at some point. 278 00:13:46,880 --> 00:13:49,130 That's going to make it difficult to find everybody 279 00:13:49,130 --> 00:13:50,380 afterwards. 280 00:13:53,150 --> 00:13:56,370 So a first question-- oh gosh, I thought this was going to 281 00:13:56,370 --> 00:13:59,633 come up bit by bit. 282 00:13:59,633 --> 00:14:00,700 Well, OK. 
283 00:14:00,700 --> 00:14:02,440 We got the whole slide. 284 00:14:02,440 --> 00:14:08,790 So is it a problem if the type of person who disappears is 285 00:14:08,790 --> 00:14:10,980 correlated with the treatment? 286 00:14:10,980 --> 00:14:16,150 And does anybody want to answer that even though 287 00:14:16,150 --> 00:14:17,400 there's some answer there? 288 00:14:24,500 --> 00:14:26,660 This says the name of it, but it's not saying 289 00:14:26,660 --> 00:14:28,070 what the issue is. 290 00:14:28,070 --> 00:14:31,710 Does anybody want to comment on that? 291 00:14:31,710 --> 00:14:32,040 Yes. 292 00:14:32,040 --> 00:14:38,384 AUDIENCE: So if the attrition is correlated with treatment, 293 00:14:38,384 --> 00:14:41,312 then you're going to end up with underestimated or 294 00:14:41,312 --> 00:14:45,230 overestimated effect depending on what the correlation is. 295 00:14:45,230 --> 00:14:46,440 MICHAEL KREMER: OK. 296 00:14:46,440 --> 00:14:47,120 That's great. 297 00:14:47,120 --> 00:14:49,022 Can you say more about that? 298 00:14:49,022 --> 00:14:57,140 AUDIENCE: So if the correlation is that the people 299 00:14:57,140 --> 00:15:05,375 who disappear are people who didn't get the treatment, who 300 00:15:05,375 --> 00:15:09,552 most needed the treatment, then what you're left with in 301 00:15:09,552 --> 00:15:14,703 the control group is stronger people, the people who maybe 302 00:15:14,703 --> 00:15:17,420 didn't need the treatment as much or maybe had other 303 00:15:17,420 --> 00:15:19,585 reasons that they were doing just fine. 304 00:15:19,585 --> 00:15:25,030 And so it's going to look like the treatment effect is less 305 00:15:25,030 --> 00:15:27,850 because you have a strong control treatment group 306 00:15:27,850 --> 00:15:31,530 compared to a randomized treatment group. 307 00:15:31,530 --> 00:15:31,860 MICHAEL KREMER: Right. 308 00:15:31,860 --> 00:15:32,190 OK. 309 00:15:32,190 --> 00:15:32,950 So that's great. 
310 00:15:32,950 --> 00:15:37,950 So let's go through an example where we can potentially see 311 00:15:37,950 --> 00:15:39,750 that sort of thing happening. 312 00:15:39,750 --> 00:15:48,130 So let's think about a problem where there's some kids who 313 00:15:48,130 --> 00:15:50,080 don't come to school because they're too weak, they're 314 00:15:50,080 --> 00:15:50,650 undernourished. 315 00:15:50,650 --> 00:15:53,280 So imagine that's the context. 316 00:15:53,280 --> 00:15:56,020 And imagine you start a school feeding program, and you want 317 00:15:56,020 --> 00:15:58,970 to do an evaluation of the impact of this on school 318 00:15:58,970 --> 00:15:59,430 attendance. 319 00:15:59,430 --> 00:16:03,340 So this, in fact, was something we wanted to do. 320 00:16:03,340 --> 00:16:05,270 And imagine you're interested both in the impact on 321 00:16:05,270 --> 00:16:08,780 enrollment, but also on children's nutrition, which 322 00:16:08,780 --> 00:16:11,500 you measure by their weight. 323 00:16:11,500 --> 00:16:16,230 And imagine that the real effect of this program is that 324 00:16:16,230 --> 00:16:19,200 the weak, stunted children actually go to school more if 325 00:16:19,200 --> 00:16:21,310 they're near a treatment school. 326 00:16:21,310 --> 00:16:24,580 So if you go to all the schools and you measure 327 00:16:24,580 --> 00:16:29,080 everyone who's in school on a given day, in that case, are 328 00:16:29,080 --> 00:16:30,840 you going to see the treatment and control difference in 329 00:16:30,840 --> 00:16:32,770 weight overstated or understated? 330 00:16:36,840 --> 00:16:38,010 AUDIENCE: Overstated. 331 00:16:38,010 --> 00:16:38,570 MICHAEL KREMER: Overstated. 332 00:16:38,570 --> 00:16:40,420 So what's the story for why it would be overstated? 333 00:16:40,420 --> 00:16:44,832 AUDIENCE: Because in the treatment schools, a lot of 334 00:16:44,832 --> 00:16:46,950 kids who really need the nutrition 335 00:16:46,950 --> 00:16:48,090 would start going in. 
336 00:16:48,090 --> 00:16:50,916 Whereas, in the control group, they have no incentive to go. 337 00:16:50,916 --> 00:16:52,800 So they're not being included in it. 338 00:16:52,800 --> 00:16:53,310 MICHAEL KREMER: That's interesting. 339 00:16:53,310 --> 00:16:54,340 That's interesting. 340 00:16:54,340 --> 00:16:55,590 OK, OK. 341 00:16:57,740 --> 00:16:59,680 In fact, the example is going to be the opposite. 342 00:16:59,680 --> 00:17:01,590 But I think it's true, you could tell a story where this 343 00:17:01,590 --> 00:17:02,795 could go either way. 344 00:17:02,795 --> 00:17:05,349 And you just told a story where it would go that way. 345 00:17:09,819 --> 00:17:12,510 Let's show you a hypothetical numerical example. 346 00:17:12,510 --> 00:17:14,900 And if you can actually work through this, 347 00:17:14,900 --> 00:17:17,849 that would be useful. 348 00:17:17,849 --> 00:17:20,380 Imagine there's just three kids in each of these 349 00:17:20,380 --> 00:17:22,710 communities. 350 00:17:22,710 --> 00:17:29,930 So imagine that before treatment, the distribution 351 00:17:29,930 --> 00:17:30,830 looked identical. 352 00:17:30,830 --> 00:17:33,600 So there was one kid who weighed 30 kilos, another at 353 00:17:33,600 --> 00:17:36,800 35, another at 40. 354 00:17:36,800 --> 00:17:42,800 And after treatment, let's say it's a successful program, and 355 00:17:42,800 --> 00:17:44,495 it gets everybody up by two pounds-- 356 00:17:48,330 --> 00:17:50,520 I guess we should do this in pounds given these numbers-- 357 00:17:50,520 --> 00:17:52,680 moves everybody up by two pounds. 358 00:17:52,680 --> 00:17:56,000 In the comparison group, everybody stays the same. 359 00:17:56,000 --> 00:18:01,940 So when you calculate, the average here is going to be 360 00:18:01,940 --> 00:18:05,530 35, and here it's going to be 35. 361 00:18:05,530 --> 00:18:09,310 So there's going to be no difference at baseline. 
362 00:18:09,310 --> 00:18:13,280 If you look afterwards, if you didn't have any attrition and 363 00:18:13,280 --> 00:18:16,960 you managed to follow all these kids, you would 364 00:18:16,960 --> 00:18:18,870 correctly measure the impact of this program. 365 00:18:18,870 --> 00:18:22,550 You would say that it's added two pounds to people's weight. 366 00:18:22,550 --> 00:18:24,660 Now here's one possible pattern of attrition. 367 00:18:24,660 --> 00:18:28,410 Suppose you go on a given day, but not all of the kids are 368 00:18:28,410 --> 00:18:29,300 there that day. 369 00:18:29,300 --> 00:18:39,090 So in particular, imagine that the weaker kids are less 370 00:18:39,090 --> 00:18:41,890 likely to be there. 371 00:18:41,890 --> 00:18:46,700 So suppose only children who are more than 30 kilograms 372 00:18:46,700 --> 00:18:49,600 come to school. 373 00:18:49,600 --> 00:18:53,810 Imagine the kids who are less than 30 kilograms are only 374 00:18:53,810 --> 00:18:55,430 there half the time or something. 375 00:18:55,430 --> 00:18:59,050 And you happen to show up on a day when kids who are only 376 00:18:59,050 --> 00:19:01,170 over 30 kilograms come to school. 377 00:19:01,170 --> 00:19:04,350 Well, then the person who is still at 30 kilograms in the 378 00:19:04,350 --> 00:19:07,730 comparison group isn't going to be there at all. 379 00:19:07,730 --> 00:19:09,990 You'll measure the average here at 37 and a half. 380 00:19:16,490 --> 00:19:19,540 So you'll see no difference beforehand. 381 00:19:19,540 --> 00:19:23,420 Afterwards, can you compute what you're going to estimate 382 00:19:23,420 --> 00:19:25,690 the impact of the treatment to be? 383 00:19:25,690 --> 00:19:25,920 Yeah. 384 00:19:25,920 --> 00:19:27,260 AUDIENCE: It's negative half a pound. 385 00:19:27,260 --> 00:19:28,760 MICHAEL KREMER: Negative half a pound, right. 
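The arithmetic of this example can be checked with a short Python sketch. The weights are the hypothetical ones from the lecture (which mixes kilos and pounds, so they are treated here as generic weight units); the helper names `mean` and `observed` and the way the attendance cutoff is encoded are assumptions made for illustration.

```python
def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)

# Hypothetical weights from the lecture's example, identical in both groups.
baseline = [30, 35, 40]
treatment_after = [w + 2 for w in baseline]  # program adds 2 units to everyone
comparison_after = list(baseline)            # comparison group is unchanged

def observed(weights, cutoff=30):
    """Assumed attendance rule: only children strictly above the cutoff
    are in school on the day the survey team visits."""
    return [w for w in weights if w > cutoff]

# Effect if every child from the baseline is followed up.
true_effect = mean(treatment_after) - mean(comparison_after)

# Effect measured only on children who happen to be in school that day.
naive_effect = mean(observed(treatment_after)) - mean(observed(comparison_after))

print("effect with full follow-up:", true_effect)
print("effect among children present:", naive_effect)
```

With full follow-up the sketch recovers the two-unit gain; restricted to children present, the comparison group loses its lightest child and the estimate turns negative, matching the answer given in the room.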
386 00:19:28,760 --> 00:19:33,050 So in this case, for this particular set of assumptions, 387 00:19:33,050 --> 00:19:36,730 you'll underestimate the impact. 388 00:19:36,730 --> 00:19:39,770 It's not necessarily the case that attrition differences 389 00:19:39,770 --> 00:19:42,710 between the groups always lead to underestimates. 390 00:19:42,710 --> 00:19:43,920 It can be the opposite. 391 00:19:43,920 --> 00:19:46,860 So we happened to pick a case here where it worked this way. 392 00:19:46,860 --> 00:19:48,365 But here's another example. 393 00:19:51,550 --> 00:19:54,130 Let's put that other context behind us. 394 00:19:54,130 --> 00:19:55,380 Think about a different context. 395 00:19:55,380 --> 00:19:57,870 Think about the context of we're just 396 00:19:57,870 --> 00:19:58,830 trying to improve learning. 397 00:19:58,830 --> 00:20:00,330 And we've got a new math course. 398 00:20:00,330 --> 00:20:01,580 And it's a hard course. 399 00:20:05,540 --> 00:20:11,600 For example, in the state of Massachusetts, there are now 400 00:20:11,600 --> 00:20:13,080 graduation requirements. 401 00:20:13,080 --> 00:20:14,990 Used to be that it was very easy to graduate from 402 00:20:14,990 --> 00:20:16,030 secondary school. 403 00:20:16,030 --> 00:20:17,880 They put in requirements to make this much tougher. 404 00:20:17,880 --> 00:20:20,210 You have to pass an exam. 405 00:20:20,210 --> 00:20:23,450 And the proponents of this argue, well, it's a good thing 406 00:20:23,450 --> 00:20:25,650 because it forces the kids to study more, it forces the 407 00:20:25,650 --> 00:20:27,380 teachers to really prepare them. 408 00:20:27,380 --> 00:20:28,320 And they're probably right. 409 00:20:28,320 --> 00:20:31,270 The opponents argue that, well, the kids who figure 410 00:20:31,270 --> 00:20:33,530 they're not going to be able to pass just drop out. 411 00:20:33,530 --> 00:20:35,400 They may be right as well. 
412 00:20:35,400 --> 00:20:37,110 So if you're trying to evaluate the 413 00:20:37,110 --> 00:20:38,670 impact of this program-- 414 00:20:38,670 --> 00:20:41,340 and imagine that we randomized across states in the US, and 415 00:20:41,340 --> 00:20:43,050 some states implemented, and some didn't-- 416 00:20:45,650 --> 00:20:48,120 if you looked at the average score among those who got 417 00:20:48,120 --> 00:20:51,160 through, well, you might see it's better in 418 00:20:51,160 --> 00:20:53,030 the treatment group. 419 00:20:53,030 --> 00:20:56,890 But would that be the right conclusion about the impact of 420 00:20:56,890 --> 00:20:57,340 the program? 421 00:20:57,340 --> 00:20:58,400 It might not be. 422 00:20:58,400 --> 00:21:00,760 So let me keep going with this. 423 00:21:00,760 --> 00:21:03,830 So we've got this harder course. 424 00:21:03,830 --> 00:21:05,860 Imagine those who can't handle it drop out. 425 00:21:05,860 --> 00:21:07,300 You give the same math test in the 426 00:21:07,300 --> 00:21:09,500 treatment and control schools. 427 00:21:09,500 --> 00:21:12,030 But you only have data on those who didn't drop out 428 00:21:12,030 --> 00:21:13,820 because you go to the school and you get everybody who's 429 00:21:13,820 --> 00:21:15,810 there in the school. 430 00:21:15,810 --> 00:21:18,020 So what's the direction the bias is going 431 00:21:18,020 --> 00:21:20,680 to be in that case? 432 00:21:20,680 --> 00:21:21,900 AUDIENCE: It'll overstate the effect. 433 00:21:21,900 --> 00:21:23,190 You'll only see the strongest. 434 00:21:23,190 --> 00:21:24,250 MICHAEL KREMER: Exactly, exactly. 435 00:21:24,250 --> 00:21:26,250 In the treatment group, you'll only see the strong students. 436 00:21:26,250 --> 00:21:28,220 In the comparison group, you'll have the mix. 437 00:21:35,200 --> 00:21:37,865 So that's an example of the case that you 438 00:21:37,865 --> 00:21:40,460 were talking about. 
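A similar sketch shows the bias running the other way in the harder-math-course example. None of these numbers come from the lecture: the baseline scores, the five-point gain, and the dropout threshold are all hypothetical, and the sketch assumes for simplicity that the gain would apply to every student if all of them were tested.

```python
def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)

# Hypothetical baseline math scores, identical in both groups.
baseline = [40, 60, 80]
TRUE_GAIN = 5        # assumed true effect of the harder course
DROPOUT_BELOW = 50   # assumed: treatment students starting below this drop out

# What you would measure if every baseline student were tested.
treatment_all = [s + TRUE_GAIN for s in baseline]
comparison_all = list(baseline)

# What you actually measure: in treatment schools only the students who
# stayed enrolled sit the test; comparison schools keep the full mix.
treatment_tested = [s + TRUE_GAIN for s in baseline if s >= DROPOUT_BELOW]

true_effect = mean(treatment_all) - mean(comparison_all)
naive_effect = mean(treatment_tested) - mean(comparison_all)

print("effect with full follow-up:", true_effect)
print("effect among students still enrolled:", naive_effect)
```

Here the weakest student disappears from the treatment group only, so the measured gain is three times the assumed true gain.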
439 00:21:40,460 --> 00:21:43,880 In the deworming program with testing, what was the natural 440 00:21:43,880 --> 00:21:47,690 concern with attrition bias there? 441 00:21:47,690 --> 00:21:51,854 AUDIENCE: The weakest, the ones with the most worms 442 00:21:51,854 --> 00:21:53,180 weren't going to be-- 443 00:21:53,180 --> 00:21:55,040 MICHAEL KREMER: Exactly, you get them to stay. 444 00:21:55,040 --> 00:21:56,880 The kid's pretty weak because they've had lots of worms. 445 00:21:56,880 --> 00:21:59,690 You cut off the worms, they come to school. 446 00:21:59,690 --> 00:22:03,080 So the treatment group would then be adding in these kids 447 00:22:03,080 --> 00:22:04,720 who are weaker in some way. 448 00:22:04,720 --> 00:22:06,580 So that would be the concern. 449 00:22:06,580 --> 00:22:07,480 How do you deal with it? 450 00:22:07,480 --> 00:22:12,840 Well, one way is you can try to follow everybody up. 451 00:22:12,840 --> 00:22:17,500 And this is the first thing you should do is the brute 452 00:22:17,500 --> 00:22:19,780 force approach, which is to try and follow everybody up. 453 00:22:19,780 --> 00:22:23,940 And that means if it's a school program, maybe you 454 00:22:23,940 --> 00:22:27,270 don't just test the kids in the school, the ones who 455 00:22:27,270 --> 00:22:29,420 dropped out, you try and find them and test them anyway. 456 00:22:29,420 --> 00:22:30,650 Now that's expensive. 457 00:22:30,650 --> 00:22:32,930 And it's very difficult to find people, and it may be 458 00:22:32,930 --> 00:22:34,850 difficult to get them to take the exam. 459 00:22:34,850 --> 00:22:37,400 But if you think that the program is going to seriously 460 00:22:37,400 --> 00:22:40,460 affect dropout rates, then that can be a very important 461 00:22:40,460 --> 00:22:41,710 thing to do.
462 00:22:43,760 --> 00:22:46,310 To do that, you have to pick a sample of those who are going 463 00:22:46,310 --> 00:22:48,250 to be tested before the treatment, and you have to 464 00:22:48,250 --> 00:22:49,660 follow those people. 465 00:22:49,660 --> 00:22:51,770 So if you hadn't done a baseline, then this is going 466 00:22:51,770 --> 00:22:54,410 to be especially hard because you don't even know who 467 00:22:54,410 --> 00:22:55,430 dropped out. 468 00:22:55,430 --> 00:22:59,660 They might not have records of those kids. 469 00:22:59,660 --> 00:23:00,470 There's sometimes questions. 470 00:23:00,470 --> 00:23:02,990 Should you do a baseline, or should you not? 471 00:23:02,990 --> 00:23:05,230 In theory, you could do a randomized evaluation without 472 00:23:05,230 --> 00:23:06,480 a baseline. 473 00:23:08,730 --> 00:23:10,720 Almost always, it's much better to have the baseline. 474 00:23:10,720 --> 00:23:15,750 And this is one of them, which is if the program might affect 475 00:23:15,750 --> 00:23:18,710 dropout, you want to measure the effect of the program by 476 00:23:18,710 --> 00:23:20,990 looking at the people who were initially in the program. 477 00:23:23,890 --> 00:23:28,830 So then imagine that you do that, but the truth is it's 478 00:23:28,830 --> 00:23:30,760 just hard to find all these kids who dropped out. 479 00:23:30,760 --> 00:23:33,620 Some of them have moved, or they're not home, or whatever, 480 00:23:33,620 --> 00:23:35,560 or they don't want to come take the test. 481 00:23:35,560 --> 00:23:39,560 So imagine that you've done this, and the treatment group 482 00:23:39,560 --> 00:23:42,450 has 20% attrition, the comparison 483 00:23:42,450 --> 00:23:43,800 group has 20% attrition. 484 00:23:43,800 --> 00:23:46,050 Are you then OK? 485 00:23:46,050 --> 00:23:46,540 OK. 486 00:23:46,540 --> 00:23:49,200 I'm seeing the answer, no. 
487 00:23:49,200 --> 00:23:51,490 Does anybody want to say what the potential 488 00:23:51,490 --> 00:23:52,990 problem might be? 489 00:23:52,990 --> 00:23:53,170 Yeah. 490 00:23:53,170 --> 00:23:56,495 AUDIENCE: Well, if it's not random as to who drops out, 491 00:23:56,495 --> 00:23:57,920 then we're just still going to have to [UNINTELLIGIBLE] 492 00:23:57,920 --> 00:23:58,400 facts. 493 00:23:58,400 --> 00:24:01,431 If there's still a correlation between who's dropping out in 494 00:24:01,431 --> 00:24:03,224 the control group versus who's dropping out in the treatment 495 00:24:03,224 --> 00:24:06,160 group, that's still going to affect the outcomes. 496 00:24:06,160 --> 00:24:08,120 MICHAEL KREMER: Yeah. 497 00:24:08,120 --> 00:24:10,360 That's exactly right. 498 00:24:10,360 --> 00:24:12,130 I'm trying to think of this myself. 499 00:24:12,130 --> 00:24:16,690 Can anybody come up with a hypothetical, but concrete 500 00:24:16,690 --> 00:24:20,080 example, where you could have the same attrition rate in the 501 00:24:20,080 --> 00:24:24,560 two groups, but your estimate would still be messed up or 502 00:24:24,560 --> 00:24:26,140 biased, to use the technical term? 503 00:24:26,140 --> 00:24:26,621 Yeah. 504 00:24:26,621 --> 00:24:30,950 AUDIENCE: For example, if the treatment group is only the 505 00:24:30,950 --> 00:24:32,393 [UNINTELLIGIBLE] 506 00:24:32,393 --> 00:24:34,798 could drop off and then the control group is 507 00:24:34,798 --> 00:24:36,241 [? losing flow ?] 508 00:24:36,241 --> 00:24:38,650 would drop off, it's not going to [UNINTELLIGIBLE]. 509 00:24:38,650 --> 00:24:40,400 MICHAEL KREMER: Exactly, exactly. 
510 00:24:40,400 --> 00:24:43,720 So if, in each case, you lose 20%, but in the treatment 511 00:24:43,720 --> 00:24:48,680 group, you're losing the top 20% and the comparison group, 512 00:24:48,680 --> 00:24:52,120 you're losing the bottom 20%, and you only measure those who 513 00:24:52,120 --> 00:24:53,830 remain, you're going to be biased. 514 00:24:59,650 --> 00:25:03,770 So here's an example of something that could do that. 515 00:25:03,770 --> 00:25:07,740 Imagine that you put in a remedial education program. 516 00:25:13,010 --> 00:25:15,370 Imagine you lower the levels of the curriculum. 517 00:25:15,370 --> 00:25:20,730 Well, then maybe the kids in the treatment group, maybe the 518 00:25:20,730 --> 00:25:23,560 kids who are at the top of the distribution say, I don't want 519 00:25:23,560 --> 00:25:25,760 to be in this school, I'm switching to another school, 520 00:25:25,760 --> 00:25:28,040 because they don't want the lower level curriculum. 521 00:25:28,040 --> 00:25:30,870 So you lose 20%. 522 00:25:30,870 --> 00:25:34,490 In the comparison school, the 20% at the top don't drop out, 523 00:25:34,490 --> 00:25:37,360 but the 20% at the bottom drop out because they didn't have 524 00:25:37,360 --> 00:25:39,060 this special attention. 525 00:25:39,060 --> 00:25:41,930 So in each case, you've got 20% attrition, but the 526 00:25:41,930 --> 00:25:45,010 estimate of the impact of the program is going to be very 527 00:25:45,010 --> 00:25:47,840 seriously biased. 528 00:25:47,840 --> 00:25:50,080 So how can you deal with that? 529 00:25:50,080 --> 00:25:52,660 Well, what you should do is you should check 530 00:25:52,660 --> 00:25:54,440 whether you have a-- 531 00:25:54,440 --> 00:25:56,680 imagine you had pre-test scores for the kids. 532 00:25:56,680 --> 00:26:00,980 Well, then you could see what's the predictors of drop 533 00:26:00,980 --> 00:26:03,500 out in the treatment group and in the comparison group. 
534 00:26:03,500 --> 00:26:06,330 And ideally, you'd find the predictors are the same. 535 00:26:06,330 --> 00:26:07,570 And then you're somewhat reassured. 536 00:26:07,570 --> 00:26:11,670 You're not completely, completely safe because maybe 537 00:26:11,670 --> 00:26:13,950 your initial test scores aren't really a good measure 538 00:26:13,950 --> 00:26:17,730 of their true eventual test score. 539 00:26:17,730 --> 00:26:20,190 But it helps a lot. 540 00:26:20,190 --> 00:26:22,980 The other thing you can do is you can try to bound the 541 00:26:22,980 --> 00:26:23,960 extent of the bias. 542 00:26:23,960 --> 00:26:26,310 So we go through an exercise like this in 543 00:26:26,310 --> 00:26:27,950 the deworming paper. 544 00:26:27,950 --> 00:26:30,310 So suppose everyone who dropped out of the treatment 545 00:26:30,310 --> 00:26:32,400 got the lowest test score that you got. 546 00:26:32,400 --> 00:26:35,230 So what you can do is you can say, we're going to put those 547 00:26:35,230 --> 00:26:38,200 people for whom we don't have outcome data, we're going to 548 00:26:38,200 --> 00:26:41,040 create an artificial data set, where we put them back in the 549 00:26:41,040 --> 00:26:44,200 data, but we artificially assign them the lowest 550 00:26:44,200 --> 00:26:46,610 conceivable score. 551 00:26:46,610 --> 00:26:49,470 And then suppose everybody who dropped out of the control 552 00:26:49,470 --> 00:26:52,630 group got the highest score that anybody could get. 553 00:26:52,630 --> 00:26:55,180 So if you artificially give everybody who dropped out of 554 00:26:55,180 --> 00:26:57,820 treatment the lowest possible score, and you artificially 555 00:26:57,820 --> 00:27:01,460 give everybody who dropped out of the control group the 556 00:27:01,460 --> 00:27:05,980 highest possible score, well, then you're bending over 557 00:27:05,980 --> 00:27:08,290 backwards to say, how bad could the program 558 00:27:08,290 --> 00:27:09,550 potentially have been. 
559 00:27:09,550 --> 00:27:11,830 And if you do this exercise and you find that even when 560 00:27:11,830 --> 00:27:15,700 you do this, it looks like the program is good, then you can 561 00:27:15,700 --> 00:27:17,060 be pretty confident the program's good. 562 00:27:17,060 --> 00:27:19,960 So this is what's called constructing the lower bound. 563 00:27:19,960 --> 00:27:22,510 And similarly, you can construct an upper bound on 564 00:27:22,510 --> 00:27:25,620 how well the program did. 565 00:27:25,620 --> 00:27:31,400 And if you have a high dropout rate, your lower bound and 566 00:27:31,400 --> 00:27:32,700 your upper bound are going to be very far 567 00:27:32,700 --> 00:27:33,530 apart from each other. 568 00:27:33,530 --> 00:27:37,370 You're not going to be able to say that much about what the 569 00:27:37,370 --> 00:27:38,500 impact of the program is. 570 00:27:38,500 --> 00:27:43,870 But if you have a low dropout rate, it might be that your 571 00:27:43,870 --> 00:27:45,140 bounds are very close together. 572 00:27:45,140 --> 00:27:47,000 AUDIENCE: And cheaper. 573 00:27:47,000 --> 00:27:50,250 MICHAEL KREMER: It's cheaper than finding everybody, yeah. 574 00:27:54,080 --> 00:27:55,930 I think a lot depends on the particular context. 575 00:27:58,875 --> 00:28:01,220 And there's also various bounds you can do. 576 00:28:01,220 --> 00:28:04,170 So let me not go into the full detail on that. 577 00:28:04,170 --> 00:28:07,650 But you can have bounds that are very conservative. 578 00:28:07,650 --> 00:28:17,120 This would be an example of them, where this is very much 579 00:28:17,120 --> 00:28:17,870 a worst case scenario. 580 00:28:17,870 --> 00:28:20,660 You can imagine other scenarios that are not the 581 00:28:20,660 --> 00:28:26,720 very worst case scenario, but are pretty bad case scenarios, 582 00:28:26,720 --> 00:28:29,604 and say, even in that case, the program would have worked.
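The worst-case bounding exercise described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure from the deworming paper: the function name, the scores, and the 0 to 100 test range are hypothetical.

```python
from statistics import mean

def worst_case_bounds(treat_obs, control_obs, n_treat_missing,
                      n_control_missing, min_score, max_score):
    """Return (lower, upper) bounds on the treatment-minus-control mean gap."""
    # Lower bound: missing treated pupils get the minimum possible score,
    # missing control pupils get the maximum possible score.
    low_t = treat_obs + [min_score] * n_treat_missing
    high_c = control_obs + [max_score] * n_control_missing
    # Upper bound: the reverse imputation.
    high_t = treat_obs + [max_score] * n_treat_missing
    low_c = control_obs + [min_score] * n_control_missing
    return mean(low_t) - mean(high_c), mean(high_t) - mean(low_c)

# Hypothetical scores on a 0-100 test; one pupil is missing from each group.
treat = [70, 80, 90, 60]
control = [50, 60, 70, 40]
lower, upper = worst_case_bounds(treat, control, 1, 1, 0, 100)
print(lower, upper)
```

With one missing pupil out of five in each group the bounds already straddle zero, which illustrates why high dropout leaves the bounds far apart. With no missing pupils the two bounds collapse to the ordinary difference in means.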
583 00:28:35,420 --> 00:28:38,430 The next topic is going to be externalities. 584 00:28:38,430 --> 00:28:42,140 But before I go on to that, do people have questions on 585 00:28:42,140 --> 00:28:45,280 attrition or comments on it? 586 00:28:45,280 --> 00:28:47,085 Or questions about this in practice? 587 00:28:52,890 --> 00:28:54,800 OK, let me move on to externalities. 588 00:28:54,800 --> 00:28:58,440 So first, I want to create some externalities. 589 00:28:58,440 --> 00:29:01,560 So everybody who got some money-- 590 00:29:01,560 --> 00:29:04,220 I heard a suggestion of sharing some money. 591 00:29:04,220 --> 00:29:05,590 So why don't we implement that? 592 00:29:05,590 --> 00:29:08,740 So why don't you turn to your neighbor, and why don't you 593 00:29:08,740 --> 00:29:12,440 share some of the money with your neighbor? 594 00:29:12,440 --> 00:29:15,970 I'll let you decide how generous you want to be. 595 00:29:15,970 --> 00:29:17,387 It is fake money after all. 596 00:29:30,536 --> 00:29:33,000 AUDIENCE: Can we give to multiple neighbors? 597 00:29:33,000 --> 00:29:34,270 MICHAEL KREMER: Do whatever you like, do 598 00:29:34,270 --> 00:29:35,520 whatever you like. 599 00:29:48,240 --> 00:29:55,590 And by the way, what you guys just did, there are a lot of 600 00:29:55,590 --> 00:29:56,560 theories of development-- 601 00:29:56,560 --> 00:29:58,660 I don't know whether this is practice or not-- which would 602 00:29:58,660 --> 00:30:01,180 say that that sort of thing might happen, a lot of 603 00:30:01,180 --> 00:30:04,570 theories about risk sharing within communities, and so on. 604 00:30:04,570 --> 00:30:06,300 Maybe that's all propaganda, I don't know. 605 00:30:06,300 --> 00:30:08,070 But anyway, some people would claim that that sort of thing 606 00:30:08,070 --> 00:30:09,320 can happen. 607 00:30:11,180 --> 00:30:13,620 So now what I want to talk about though is what's the 608 00:30:13,620 --> 00:30:16,310 impact on our program evaluation. 
609 00:30:18,930 --> 00:30:26,020 What I'd like to do is to do a program evaluation now of what 610 00:30:26,020 --> 00:30:28,980 was the impact of this program, where 611 00:30:28,980 --> 00:30:29,690 you're all a village. 612 00:30:29,690 --> 00:30:32,330 I gave half the people in the village $500. 613 00:30:32,330 --> 00:30:33,450 So how did we do that? 614 00:30:33,450 --> 00:30:38,520 Well, we pseudo-randomized the program, reasonably close, 615 00:30:38,520 --> 00:30:39,770 counting off one, two. 616 00:30:42,290 --> 00:30:43,930 What's the impact of the program? 617 00:30:43,930 --> 00:30:47,400 Well, let's figure out how much money our treatment group 618 00:30:47,400 --> 00:30:49,220 people have and our comparison group people have. 619 00:30:49,220 --> 00:30:51,600 So if you can look in your wallet, figure out how much 620 00:30:51,600 --> 00:30:54,620 money you have there, add in the fake money, and come up 621 00:30:54,620 --> 00:30:55,290 with a total. 622 00:30:55,290 --> 00:30:57,860 And then we'll try and do some-- 623 00:30:57,860 --> 00:30:59,290 I'll do some data collection. 624 00:30:59,290 --> 00:31:01,784 So let me put this up here. 625 00:31:01,784 --> 00:31:03,110 AUDIENCE: Our actual money? 626 00:31:03,110 --> 00:31:04,880 MICHAEL KREMER: Yeah, add in your actual money and your 627 00:31:04,880 --> 00:31:06,890 fake money, and we'll see. 628 00:31:37,160 --> 00:31:38,900 So are you a treatment group person? 629 00:31:38,900 --> 00:31:39,980 AUDIENCE: Yes. 630 00:31:39,980 --> 00:31:41,290 MICHAEL KREMER: OK. 631 00:31:41,290 --> 00:31:44,070 So how much money do you have, including everything? 632 00:31:44,070 --> 00:31:45,230 AUDIENCE: $784. 633 00:31:45,230 --> 00:31:47,836 MICHAEL KREMER: $784, OK. 634 00:31:47,836 --> 00:31:50,090 I hope there's no thieves around here that I'm 635 00:31:50,090 --> 00:31:51,423 revealing things to. 636 00:31:51,423 --> 00:31:52,872 AUDIENCE: $784? 
637 00:31:52,872 --> 00:31:53,838 AUDIENCE: Including this. 638 00:31:53,838 --> 00:31:54,810 AUDIENCE: Because I'm a control group. 639 00:31:54,810 --> 00:31:56,116 MICHAEL KREMER: You're a control group. 640 00:31:56,116 --> 00:31:57,610 AUDIENCE: I have $300. 641 00:31:57,610 --> 00:32:02,413 MICHAEL KREMER: $300. 642 00:32:02,413 --> 00:32:04,297 AUDIENCE: [? Only money ?] pounds [? away ?]. 643 00:32:04,297 --> 00:32:06,181 AUDIENCE: But these guys gave it to you. 644 00:32:06,181 --> 00:32:08,880 AUDIENCE: It was on your Charlie card or whatever. 645 00:32:08,880 --> 00:32:11,060 MICHAEL KREMER: So how much do you have? 646 00:32:11,060 --> 00:32:12,040 AUDIENCE: $407. 647 00:32:12,040 --> 00:32:12,350 MICHAEL KREMER: $407. 648 00:32:12,350 --> 00:32:13,600 And you're a treatment, right? 649 00:32:15,930 --> 00:32:18,990 AUDIENCE: I got $14 and $1 on my Charlie card. 650 00:32:18,990 --> 00:32:19,630 MICHAEL KREMER: OK. 651 00:32:19,630 --> 00:32:22,887 So $15, we'll call it. 652 00:32:22,887 --> 00:32:24,170 AUDIENCE: $550. 653 00:32:24,170 --> 00:32:27,110 MICHAEL KREMER: $550. 654 00:32:27,110 --> 00:32:28,690 Maybe the second row should just come up 655 00:32:28,690 --> 00:32:29,980 and write on here. 656 00:32:29,980 --> 00:32:30,850 AUDIENCE: $140. 657 00:32:30,850 --> 00:32:31,120 MICHAEL KREMER: Sorry? 658 00:32:31,120 --> 00:32:32,020 AUDIENCE: $140. 659 00:32:32,020 --> 00:32:34,860 MICHAEL KREMER: $140. 660 00:32:34,860 --> 00:32:35,730 AUDIENCE: $428. 661 00:32:35,730 --> 00:32:39,050 MICHAEL KREMER: $428. 662 00:32:39,050 --> 00:32:40,510 AUDIENCE: $318. 663 00:32:40,510 --> 00:32:43,290 MICHAEL KREMER: $318. 664 00:32:43,290 --> 00:32:45,390 AUDIENCE: $698. 665 00:32:45,390 --> 00:32:48,610 MICHAEL KREMER: I'll put this here. 666 00:32:48,610 --> 00:32:49,990 AUDIENCE: $263. 667 00:32:49,990 --> 00:32:51,210 MICHAEL KREMER: And are you a one or a two? 
668 00:32:51,210 --> 00:32:53,380 AUDIENCE: I'm $500, I don't know what-- 669 00:32:53,380 --> 00:32:54,910 MICHAEL KREMER: So you're group one. 670 00:32:54,910 --> 00:32:56,653 And sorry, what was the number again? 671 00:32:56,653 --> 00:32:57,560 AUDIENCE: $263. 672 00:32:57,560 --> 00:33:00,270 MICHAEL KREMER: $263. 673 00:33:00,270 --> 00:33:04,960 You're a very generous guy at least with fake money, right? 674 00:33:04,960 --> 00:33:06,800 AUDIENCE: $270. 675 00:33:06,800 --> 00:33:09,440 MICHAEL KREMER: Oh, $270. 676 00:33:09,440 --> 00:33:10,850 Looks like the program was 677 00:33:10,850 --> 00:33:12,760 counterproductive in your case. 678 00:33:12,760 --> 00:33:15,250 We had a negative seven effect on income. 679 00:33:15,250 --> 00:33:16,600 AUDIENCE: I have $227. 680 00:33:16,600 --> 00:33:19,860 MICHAEL KREMER: $227. 681 00:33:19,860 --> 00:33:21,976 AUDIENCE: $500. 682 00:33:21,976 --> 00:33:23,290 MICHAEL KREMER: $500. 683 00:33:23,290 --> 00:33:23,850 You know what? 684 00:33:23,850 --> 00:33:27,810 We could go and do the full sample, but maybe we should-- 685 00:33:27,810 --> 00:33:29,710 well, we'll take two more. 686 00:33:29,710 --> 00:33:31,020 AUDIENCE: $700. 687 00:33:31,020 --> 00:33:32,260 MICHAEL KREMER: I'm sorry, which group are you? 688 00:33:32,260 --> 00:33:32,900 AUDIENCE: Treatment. 689 00:33:32,900 --> 00:33:35,209 MICHAEL KREMER: $700, OK. 690 00:33:35,209 --> 00:33:37,530 AUDIENCE: I'm control, and I have $200. 691 00:33:37,530 --> 00:33:37,850 MICHAEL KREMER: $200? 692 00:33:37,850 --> 00:33:39,160 Oh, so that got the-- 693 00:33:45,660 --> 00:33:48,460 We'll just take a partial sample rather than keep going. 694 00:33:48,460 --> 00:33:51,370 Let's try and get the average in the treatment group and the 695 00:33:51,370 --> 00:33:54,918 average in the comparison group. 696 00:33:54,918 --> 00:33:58,854 AUDIENCE: The average in the treatment group is 507 or 8. 697 00:33:58,854 --> 00:34:00,230 MICHAEL KREMER: 508.
698 00:34:00,230 --> 00:34:03,140 AUDIENCE: And the average in the control group is 249. 699 00:34:03,140 --> 00:34:09,659 MICHAEL KREMER: 249. 700 00:34:09,659 --> 00:34:11,699 So now we do our evaluation. 701 00:34:11,699 --> 00:34:14,719 And we go through, and we say, OK, we 702 00:34:14,719 --> 00:34:16,580 gave out $500 to people. 703 00:34:16,580 --> 00:34:19,070 Now we've gone back to see how they're doing, compare them to 704 00:34:19,070 --> 00:34:20,030 the comparison group. 705 00:34:20,030 --> 00:34:25,570 And it looks like they're $259 richer. 706 00:34:25,570 --> 00:34:27,714 So did the program work? 707 00:34:27,714 --> 00:34:28,940 Well, the program worked. 708 00:34:28,940 --> 00:34:30,250 But was it cost effective? 709 00:34:30,250 --> 00:34:30,840 Not really. 710 00:34:30,840 --> 00:34:33,130 Because we gave them $500. 711 00:34:33,130 --> 00:34:35,940 They're only $250 approximately richer. 712 00:34:35,940 --> 00:34:37,970 This really wasn't a big success. 713 00:34:42,010 --> 00:34:43,000 AUDIENCE: Well-- 714 00:34:43,000 --> 00:34:44,455 MICHAEL KREMER: Go ahead. 715 00:34:44,455 --> 00:34:46,880 AUDIENCE: That's only one way of looking at it, right? 716 00:34:46,880 --> 00:34:47,560 MICHAEL KREMER: Exactly. 717 00:34:47,560 --> 00:34:49,810 That's one way of looking at it. 718 00:34:49,810 --> 00:34:53,150 If you came with that conclusion, you'd be missing a 719 00:34:53,150 --> 00:34:54,830 really important dimension of what the impact 720 00:34:54,830 --> 00:34:55,449 of the program is. 721 00:34:55,449 --> 00:34:58,530 Certainly, if you're a policy maker who's mostly concerned 722 00:34:58,530 --> 00:35:01,310 about what's the impact on the community, not what's the 723 00:35:01,310 --> 00:35:05,110 impact on the particular individual I gave it to, then 724 00:35:05,110 --> 00:35:07,470 you'd basically have a very misleading answer. 725 00:35:07,470 --> 00:35:10,060 So that's the danger. 
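A small sketch of why the naive comparison in the cash exercise understates the program, assuming equal-sized groups and that every treated person hands a fixed share of the transfer to someone in the comparison group. The 25% share is a hypothetical value chosen for illustration; the class did not measure it.

```python
def naive_gap_vs_total_gain(transfer, share_given):
    """Return (naive treatment-control gap, total gain created per transfer)."""
    kept = transfer * (1 - share_given)   # gain of a treated person
    received = transfer * share_given     # spillover gain of a comparison person
    naive_gap = kept - received           # what the evaluation measures
    total_gain = kept + received          # what the community actually gained
    return naive_gap, total_gain

gap, total = naive_gap_vs_total_gain(500, 0.25)
print(gap, total)
```

Under these assumptions the measured gap is half of the money handed out, close to the roughly $250 difference seen in the room, while the community as a whole is still $500 better off per transfer.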
726 00:35:10,060 --> 00:35:14,590 The topic of this lecture is what are threats. 727 00:35:14,590 --> 00:35:16,340 And this is a threat. 728 00:35:16,340 --> 00:35:18,870 You misunderstand the impact of the program because you 729 00:35:18,870 --> 00:35:20,970 haven't adequately accounted for the externality. 730 00:35:30,790 --> 00:35:31,720 That's the problem. 731 00:35:31,720 --> 00:35:34,245 Let me now talk about what can you do about that problem. 732 00:35:37,000 --> 00:35:39,840 So let me look at this in the context of deworming. 733 00:35:39,840 --> 00:35:45,680 Then maybe we can come back to this example again. 734 00:35:45,680 --> 00:35:49,780 So in the case of deworming, a lot of the earlier work 735 00:35:49,780 --> 00:35:53,580 randomized deworming treatment within schools. 736 00:35:53,580 --> 00:36:00,080 So the problem is that when you are dewormed, that may 737 00:36:00,080 --> 00:36:03,310 interfere with the transmission of the disease. 738 00:36:03,310 --> 00:36:06,020 If the treatment kills the worms in your body, that means 739 00:36:06,020 --> 00:36:08,270 the worms are no longer laying eggs, they're no longer being 740 00:36:08,270 --> 00:36:10,150 spread in the community as much. 741 00:36:10,150 --> 00:36:13,820 So what's the problem that that's going to create for the 742 00:36:13,820 --> 00:36:15,070 evaluation? 743 00:36:17,080 --> 00:36:18,770 AUDIENCE: You're going to see benefits in the control group. 744 00:36:18,770 --> 00:36:19,040 MICHAEL KREMER: Right. 745 00:36:19,040 --> 00:36:21,470 You could see benefits in the control group, just as this 746 00:36:21,470 --> 00:36:22,720 cash example. 747 00:36:28,150 --> 00:36:31,800 In this particular case, we argue that those benefits 748 00:36:31,800 --> 00:36:34,110 might not just have affected kids who go to that school, 749 00:36:34,110 --> 00:36:36,500 but might have also affected neighboring schools as well. 
750 00:36:36,500 --> 00:36:39,000 But let's start out with the analytically simpler case. 751 00:36:39,000 --> 00:36:41,200 Suppose the benefits are local. 752 00:36:41,200 --> 00:36:43,865 So suppose you only shared money with your neighbors, but 753 00:36:43,865 --> 00:36:46,860 you don't share money with people in another classroom in 754 00:36:46,860 --> 00:36:48,390 engineering or something like that. 755 00:36:51,180 --> 00:36:54,750 And how can you measure the total impact, the impact on 756 00:36:54,750 --> 00:36:58,290 the community as a whole of the program? 757 00:36:58,290 --> 00:36:59,540 What could you do in that case? 758 00:37:05,450 --> 00:37:09,770 AUDIENCE: You could phase in at different rates to try and 759 00:37:09,770 --> 00:37:13,607 evaluate what would be the impact of just having a 760 00:37:13,607 --> 00:37:17,180 peer-controlled peer treatment and then try and figure out 761 00:37:17,180 --> 00:37:20,108 from the phase-in what the impact of the 762 00:37:20,108 --> 00:37:23,520 externality would be. 763 00:37:23,520 --> 00:37:25,490 MICHAEL KREMER: So you could phase it in. 764 00:37:25,490 --> 00:37:27,680 In this case, if the externalities were local 765 00:37:27,680 --> 00:37:29,790 within a school or within a classroom, in the case of this 766 00:37:29,790 --> 00:37:33,660 money example, you could phase it in at the level of schools 767 00:37:33,660 --> 00:37:35,010 or of classrooms. 768 00:37:35,010 --> 00:37:37,340 And say, we're going to do 20% of the people in that 769 00:37:37,340 --> 00:37:39,930 classroom, 40% of the people in this classroom, 60% of the 770 00:37:39,930 --> 00:37:42,280 people in that classroom. 771 00:37:42,280 --> 00:37:46,380 By the way, before I go further with this, so there's 772 00:37:46,380 --> 00:37:48,680 an advantage of this-- well, let me come back to this. 
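The idea of treating 20%, 40%, or 60% of each classroom can be sketched as a two-stage random assignment: first a treatment share per classroom, then individual pupils within it. This is a hypothetical illustration; the function, the classroom names, and the class size of ten are all made up.

```python
import random

def assign_saturation(classrooms, saturations, seed=0):
    """Give each classroom a random treatment share, then pick pupils within it."""
    rng = random.Random(seed)
    names = list(classrooms)
    rng.shuffle(names)  # random order decides which classroom gets which share
    assignment = {}
    for i, name in enumerate(names):
        share = saturations[i % len(saturations)]  # cycle through the shares
        pupils = classrooms[name]
        n_treated = round(len(pupils) * share)
        treated = set(rng.sample(pupils, n_treated))
        assignment[name] = {"share": share, "treated": treated}
    return assignment

# Hypothetical village: six classrooms of ten pupils each.
classrooms = {f"room_{r}": [f"r{r}_p{p}" for p in range(10)] for r in range(6)}
plan = assign_saturation(classrooms, [0.2, 0.4, 0.6])
for name, info in sorted(plan.items()):
    print(name, info["share"], len(info["treated"]))
```

Comparing untreated pupils across classrooms with different shares is then one way to see how the spillover grows with the fraction treated.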
773 00:37:48,680 --> 00:37:50,555 I'm going to immediately assess the advantage of this, 774 00:37:50,555 --> 00:37:52,920 but there's also a disadvantage. 775 00:37:52,920 --> 00:37:57,910 So let's take this case where there's externalities 776 00:37:57,910 --> 00:37:59,160 within a school. 777 00:38:02,890 --> 00:38:08,690 So if we think about this particular case, so imagine 778 00:38:08,690 --> 00:38:11,390 that there's no externalities. 779 00:38:11,390 --> 00:38:13,300 Pupil one is treated, and the outcome is 780 00:38:13,300 --> 00:38:14,200 they don't have worms. 781 00:38:14,200 --> 00:38:16,530 Pupil two is not treated, but they still don't have worms. 782 00:38:16,530 --> 00:38:18,290 Some people just don't get the worms. 783 00:38:18,290 --> 00:38:19,900 Pupil three is treated. 784 00:38:19,900 --> 00:38:21,850 They don't have worms because the medicine worked. 785 00:38:21,850 --> 00:38:24,640 Pupil four is not treated, and they do have worms. 786 00:38:24,640 --> 00:38:27,010 Pupil five is treated, and they don't have worms. 787 00:38:27,010 --> 00:38:29,200 Pupil six isn't treated, and they do have worms. 788 00:38:32,580 --> 00:38:35,430 So in this case, where there's no externalities going on, 789 00:38:35,430 --> 00:38:37,660 what's going to be the estimate of the treatment 790 00:38:37,660 --> 00:38:38,910 effect here? 791 00:38:43,980 --> 00:38:45,860 AUDIENCE: 100% [INAUDIBLE]. 792 00:38:45,860 --> 00:38:47,730 MICHAEL KREMER: I'm sorry. 793 00:38:47,730 --> 00:38:49,270 You said 100%? 794 00:38:49,270 --> 00:38:51,340 Do you want to go through the reasoning you're 795 00:38:51,340 --> 00:38:51,985 thinking up on that? 796 00:38:51,985 --> 00:38:53,235 AUDIENCE: [INAUDIBLE]. 797 00:38:59,310 --> 00:39:01,660 MICHAEL KREMER: So it's true that nobody who is treated has 798 00:39:01,660 --> 00:39:03,130 worms because the medicine works.
799 00:39:06,930 --> 00:39:11,480 So the total people in the treatment group with 800 00:39:11,480 --> 00:39:12,860 worms is going to be 0. 801 00:39:12,860 --> 00:39:15,680 So in that sense, you've eliminated 100% of the group 802 00:39:15,680 --> 00:39:17,020 that does have worms. 803 00:39:17,020 --> 00:39:20,410 How many in the control group are going to have worms? 804 00:39:20,410 --> 00:39:22,350 AUDIENCE: Three. 805 00:39:22,350 --> 00:39:23,330 MICHAEL KREMER: Three. 806 00:39:23,330 --> 00:39:27,030 So it depends how you define-- 807 00:39:27,030 --> 00:39:30,520 this is a big distinction that people-- 808 00:39:30,520 --> 00:39:33,180 it's tedious, but it's important to make when you 809 00:39:33,180 --> 00:39:36,010 write things up is percentage effect versus 810 00:39:36,010 --> 00:39:37,860 percentage point effect. 811 00:39:37,860 --> 00:39:42,390 So percentage point is the absolute value. 812 00:39:42,390 --> 00:39:44,940 So let me first do the percentage point and then come 813 00:39:44,940 --> 00:39:46,540 back to the percent. 814 00:39:46,540 --> 00:39:51,120 So we have 0 people having it in the treatment group. 815 00:39:51,120 --> 00:39:53,410 The total in the control with worms, was that three if I 816 00:39:53,410 --> 00:39:54,120 remember right? 817 00:39:54,120 --> 00:39:54,570 OK. 818 00:39:54,570 --> 00:39:57,600 So 50% of people have it in the control group, 0 have it 819 00:39:57,600 --> 00:39:58,880 in the treatment group. 820 00:39:58,880 --> 00:40:01,710 So it's a 50 percentage point difference. 821 00:40:01,710 --> 00:40:03,680 The difference between 50 percentage points and 0 822 00:40:03,680 --> 00:40:04,520 percentage points. 823 00:40:04,520 --> 00:40:07,110 So one accurate way to write this up would be to say we had a 824 00:40:07,110 --> 00:40:08,620 50 percentage point difference. 825 00:40:08,620 --> 00:40:11,970 Another way would be to say, we eliminated 100% of the 826 00:40:11,970 --> 00:40:13,310 initial level.
827 00:40:13,310 --> 00:40:14,040 They're both accurate. 828 00:40:14,040 --> 00:40:16,720 It's just different ways of expressing it. 829 00:40:16,720 --> 00:40:20,290 When you write things up, the convention is to use 830 00:40:20,290 --> 00:40:22,700 percentage point for the absolute value. 831 00:40:22,700 --> 00:40:24,020 So the treatment effect would be 50 832 00:40:24,020 --> 00:40:26,570 percentage points or 100%. 833 00:40:26,570 --> 00:40:30,010 But now suppose that you actually do have 834 00:40:30,010 --> 00:40:31,280 externalities. 835 00:40:31,280 --> 00:40:35,050 So some children are not reinfected with worms. 836 00:40:35,050 --> 00:40:39,300 So these worms have a life cycle, so eventually the worms 837 00:40:39,300 --> 00:40:39,940 in you die. 838 00:40:39,940 --> 00:40:41,980 You have a high wormload because you're continually 839 00:40:41,980 --> 00:40:44,030 being reinfected. 840 00:40:44,030 --> 00:40:48,840 So think about this example, where some of the kids in the 841 00:40:48,840 --> 00:40:52,550 comparison group don't get reinfected. 842 00:40:52,550 --> 00:40:54,530 Let's just think about the percentage point effect for 843 00:40:54,530 --> 00:40:55,440 comparison. 844 00:40:55,440 --> 00:40:57,570 What are you going to estimate the impact being in this case? 845 00:41:26,834 --> 00:41:28,322 AUDIENCE: 53? 846 00:41:28,322 --> 00:41:29,572 MICHAEL KREMER: Right. 847 00:41:34,930 --> 00:41:37,456 Did I just-- 848 00:41:37,456 --> 00:41:38,706 let's see this thing. 849 00:41:41,360 --> 00:41:43,296 Let me just do the-- 850 00:41:43,296 --> 00:41:44,320 I'm sorry. 851 00:41:44,320 --> 00:41:45,780 We didn't-- 852 00:41:45,780 --> 00:41:50,550 I think this other thing, did we do the counting 853 00:41:50,550 --> 00:41:51,270 right in that one? 854 00:41:51,270 --> 00:41:55,446 AUDIENCE: Well, you said there was 50% control with worms. 
855 00:41:55,446 --> 00:41:57,258 And unless I'm misunderstanding it, it looks 856 00:41:57,258 --> 00:41:58,620 like it's 100%. 857 00:41:58,620 --> 00:42:01,560 MICHAEL KREMER: Yeah, that's right. 858 00:42:01,560 --> 00:42:03,570 Somebody had said 50, and I didn't look. 859 00:42:03,570 --> 00:42:05,000 I just assumed that was the right number. 860 00:42:05,000 --> 00:42:07,060 Let me just look at the nos. 861 00:42:07,060 --> 00:42:08,160 Yes, that's right. 862 00:42:08,160 --> 00:42:08,770 I'm sorry. 863 00:42:08,770 --> 00:42:10,720 It's 100% who have worms. 864 00:42:10,720 --> 00:42:13,750 Sorry, that was very confusing. 865 00:42:13,750 --> 00:42:16,280 So now I see why you said-- it's 100% either way whether 866 00:42:16,280 --> 00:42:18,370 it's percentage points or percent. 867 00:42:18,370 --> 00:42:19,730 You would have reached that same conclusion. 868 00:42:23,370 --> 00:42:25,810 Let me just go back here just to repeat, in case it wasn't 869 00:42:25,810 --> 00:42:28,210 clear to others like it wasn't clear to me. 870 00:42:28,210 --> 00:42:30,930 So the total in the control with worms is 100% 871 00:42:30,930 --> 00:42:35,310 in this example, total in the treatment with worms is 0. 872 00:42:35,310 --> 00:42:39,000 I think I must've got confused in reading it in horizontal 873 00:42:39,000 --> 00:42:40,160 lines there. 874 00:42:40,160 --> 00:42:42,860 So in this case, the total in the treatment 875 00:42:42,860 --> 00:42:46,590 group with worms is-- 876 00:42:46,590 --> 00:42:50,150 we still have 0 in the treatment group with worms. 877 00:42:50,150 --> 00:42:55,330 And in the comparison group, we've got 67%, 67%. 878 00:42:59,685 --> 00:43:01,540 So we're going to estimate the treatment effect in 879 00:43:01,540 --> 00:43:03,800 this case being 67%. 880 00:43:03,800 --> 00:43:04,980 Now notice that this-- 881 00:43:04,980 --> 00:43:06,780 AUDIENCE: 130? 882 00:43:06,780 --> 00:43:07,230 Why?
883 00:43:07,230 --> 00:43:08,760 Oh, because it's the difference. 884 00:43:08,760 --> 00:43:09,380 MICHAEL KREMER: The difference, yeah. 885 00:43:09,380 --> 00:43:10,630 The difference between 100 and 67. 886 00:43:12,950 --> 00:43:16,160 Sorry, the hundred and-- 887 00:43:16,160 --> 00:43:17,160 see if we got this-- 888 00:43:17,160 --> 00:43:18,944 the hundred-- 889 00:43:18,944 --> 00:43:20,892 AUDIENCE: It's the total [UNINTELLIGIBLE] 890 00:43:20,892 --> 00:43:23,650 0. 891 00:43:23,650 --> 00:43:25,380 MICHAEL KREMER: This is 0, and this is 67. 892 00:43:25,380 --> 00:43:26,855 So we've estimated 67% effect. 893 00:43:32,226 --> 00:43:36,480 So the thing to take away from this is that if there were no 894 00:43:36,480 --> 00:43:40,450 externalities, we would have estimated this correctly at 895 00:43:40,450 --> 00:43:42,980 100%, the effect of the program. 896 00:43:42,980 --> 00:43:46,670 Now we say, suppose there are externalities to this. 897 00:43:46,670 --> 00:43:49,590 So now that makes the program actually better because more 898 00:43:49,590 --> 00:43:52,450 people are being cured of worms through this program. 899 00:43:52,450 --> 00:43:55,150 But we're going to estimate the effect of the program is 900 00:43:55,150 --> 00:43:56,660 actually lower. 901 00:43:56,660 --> 00:43:59,850 Instead of estimating the 100% benefit, we'll estimate only a 902 00:43:59,850 --> 00:44:01,100 67% benefit. 903 00:44:10,720 --> 00:44:11,900 So how do you deal with that? 904 00:44:11,900 --> 00:44:15,020 Well, if you design the unit of the randomization, so it 905 00:44:15,020 --> 00:44:19,170 encompasses all those spillovers, that's one way to 906 00:44:19,170 --> 00:44:20,530 address this problem. 907 00:44:20,530 --> 00:44:22,580 So if you expected all the externalities are within 908 00:44:22,580 --> 00:44:25,310 school, you can just randomize at the level of the school. 909 00:44:30,260 --> 00:44:35,230 So here's another approach. 
910 00:44:35,230 --> 00:44:38,390 And this is the actual data from the program. 911 00:44:38,390 --> 00:44:40,380 The percentage of children with a moderate or heavy 912 00:44:40,380 --> 00:44:42,950 infection in the treatment schools was 27%. 913 00:44:42,950 --> 00:44:45,370 It was 52% in the comparison schools. 914 00:44:45,370 --> 00:44:47,210 So the program reduced moderate to heavy 915 00:44:47,210 --> 00:44:49,530 infections by 25 percentage points. 916 00:44:49,530 --> 00:44:52,760 This medicine probably affected more kids initially. 917 00:44:52,760 --> 00:44:55,420 But if you go back and measure a year later, some of them 918 00:44:55,420 --> 00:44:57,760 have been reinfected. 919 00:44:57,760 --> 00:44:59,420 You also got a reduction in the number of kids who were 920 00:44:59,420 --> 00:45:02,780 sick and who were anemic. 921 00:45:02,780 --> 00:45:07,680 This is comparing one school to another school. 922 00:45:07,680 --> 00:45:10,210 So we will have accounted for the total impact of the 923 00:45:10,210 --> 00:45:13,260 program within schools if there are within-school 924 00:45:13,260 --> 00:45:14,510 spillovers. 925 00:45:26,120 --> 00:45:28,600 Suppose you wanted to actually measure the spillovers. 926 00:45:28,600 --> 00:45:31,290 Suppose you were interested in the spillovers themselves and 927 00:45:31,290 --> 00:45:32,710 not just the total impact. 928 00:45:32,710 --> 00:45:33,970 And you might well be. 929 00:45:33,970 --> 00:45:37,600 Imagine you are interested in the question of do we really 930 00:45:37,600 --> 00:45:40,350 need to incentivize people to take this, or could we charge 931 00:45:40,350 --> 00:45:41,110 them for the medicine.
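[Editor's sketch] The prevalence figures quoted in this passage (27% in treatment schools, 52% in comparison schools) can be expressed either in percentage points or as a percent of the comparison rate, following the convention described earlier in the lecture. The function names below are illustrative and not taken from the study.

```python
def percentage_point_difference(comparison_rate, treatment_rate):
    """Absolute gap between two rates, expressed in percentage points."""
    return (comparison_rate - treatment_rate) * 100


def percent_reduction(comparison_rate, treatment_rate):
    """Relative reduction, as a percent of the comparison-group rate."""
    return (comparison_rate - treatment_rate) / comparison_rate * 100


# Moderate-to-heavy infection rates quoted in the lecture.
comparison = 0.52
treatment = 0.27

pp = percentage_point_difference(comparison, treatment)
pct = percent_reduction(comparison, treatment)
print(f"{pp:.0f} percentage points, or about {pct:.0f}% of the comparison rate")
```

Keeping the two measures in separate functions makes it harder to quote one while meaning the other.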
932 00:45:41,110 --> 00:45:43,455 Well, if you thought that everybody benefited from the 933 00:45:43,455 --> 00:45:45,380 medicine pretty much equally, whether you took it or not, 934 00:45:45,380 --> 00:45:47,860 because most of the impact was on the transmission of the 935 00:45:47,860 --> 00:45:51,400 disease, then you might need to subsidize it more. 936 00:45:51,400 --> 00:45:53,630 You might want to subsidize it more than if you thought the 937 00:45:53,630 --> 00:45:55,560 individual got all the benefit. 938 00:45:55,560 --> 00:45:59,580 So if you actually want to measure the spillovers, here's 939 00:45:59,580 --> 00:46:04,130 one of the things we did in the paper on deworming. 940 00:46:04,130 --> 00:46:06,730 So at the time-- 941 00:46:06,730 --> 00:46:08,880 this is no longer the case, I want to emphasize-- but at the 942 00:46:08,880 --> 00:46:14,440 time, there was concern that the official guidelines were 943 00:46:14,440 --> 00:46:17,020 not to treat girls over 12 unless you knew they had 944 00:46:17,020 --> 00:46:20,010 worms, they shouldn't be treated presumptively in case 945 00:46:20,010 --> 00:46:22,240 the medicine caused birth defects and in case the girls 946 00:46:22,240 --> 00:46:23,590 were pregnant. 947 00:46:23,590 --> 00:46:26,030 Turns out, that they've now given this widely enough that 948 00:46:26,030 --> 00:46:29,500 WHO guidance is there's no evidence it causes birth 949 00:46:29,500 --> 00:46:31,390 defects, and you can give it to everybody. 950 00:46:31,390 --> 00:46:36,170 But at the time, they weren't giving it to girls above 12. 951 00:46:36,170 --> 00:46:39,500 So imagine you compared girls above 12 in the treatments 952 00:46:39,500 --> 00:46:41,835 schools to girls above 12 in the comparison schools. 953 00:46:47,240 --> 00:46:49,570 There are some other things we can do. 954 00:46:49,570 --> 00:46:51,830 So there are some other sources of who wasn't treated. 
955 00:46:51,830 --> 00:46:53,310 This comparison I'm going to show you is a little 956 00:46:53,310 --> 00:46:54,340 bit more than that. 957 00:46:54,340 --> 00:46:57,840 But you can compare the treated students in treatment 958 00:46:57,840 --> 00:46:59,940 schools to the comparable students in 959 00:46:59,940 --> 00:47:01,030 the comparison schools. 960 00:47:01,030 --> 00:47:06,450 So kids who looked comparable on a variety of observable 961 00:47:06,450 --> 00:47:10,530 dimensions or who wound up taking this when they became 962 00:47:10,530 --> 00:47:11,950 eligible to take it. 963 00:47:11,950 --> 00:47:14,380 We saw a very big gap in prevalence 964 00:47:14,380 --> 00:47:15,960 among those two groups. 965 00:47:15,960 --> 00:47:17,230 So this is much more of a straight 966 00:47:17,230 --> 00:47:22,680 treatment comparison look. 967 00:47:22,680 --> 00:47:25,170 Here, we're looking at the untreated students in the 968 00:47:25,170 --> 00:47:28,010 treatment schools and trying to find comparable students in 969 00:47:28,010 --> 00:47:29,260 the comparison schools. 970 00:47:32,080 --> 00:47:36,480 I should emphasize this isn't quite as pure as a standard 971 00:47:36,480 --> 00:47:37,730 randomized design. 972 00:47:41,310 --> 00:47:43,170 This program was phased in over time. 973 00:47:43,170 --> 00:47:45,360 These are the people that when their school was phased in, 974 00:47:45,360 --> 00:47:46,620 they wound up not getting treated. 975 00:47:46,620 --> 00:47:49,910 So maybe there were differences between years, but 976 00:47:49,910 --> 00:47:52,220 that's a caveat or a footnote. 977 00:47:56,630 --> 00:47:58,910 So none of these guys were treated, but these people were 978 00:47:58,910 --> 00:48:01,710 in a school where their classmates were treated. 
979 00:48:01,710 --> 00:48:04,510 So they have much lower levels of infection than these people 980 00:48:04,510 --> 00:48:06,000 who were also not treated, but whose 981 00:48:06,000 --> 00:48:07,250 classmates were not treated. 982 00:48:16,710 --> 00:48:19,780 Now what if you expect externalities across-- 983 00:48:19,780 --> 00:48:26,550 so actually, before I go on to this further challenge of what 984 00:48:26,550 --> 00:48:28,900 if there are externalities across schools, just sticking 985 00:48:28,900 --> 00:48:32,140 with this question of externalities within schools, 986 00:48:32,140 --> 00:48:34,050 talked about one way of dealing with that was to do 987 00:48:34,050 --> 00:48:36,670 the randomization at the level of the school. 988 00:48:36,670 --> 00:48:38,950 So what's the disadvantage of doing the randomization at the 989 00:48:38,950 --> 00:48:40,200 level of the school? 990 00:48:47,160 --> 00:48:49,905 AUDIENCE: Assuming that everybody in the same school 991 00:48:49,905 --> 00:48:51,300 is at the same level. 992 00:48:56,930 --> 00:48:59,780 MICHAEL KREMER: So you could still have some differences 993 00:48:59,780 --> 00:49:01,800 within the school. 994 00:49:01,800 --> 00:49:07,990 But there is a sense in which you're going to have less 995 00:49:07,990 --> 00:49:10,230 information if you're randomizing at the level of 996 00:49:10,230 --> 00:49:11,890 the school. 997 00:49:11,890 --> 00:49:13,564 Yes. 998 00:49:13,564 --> 00:49:15,350 AUDIENCE: You'd need more schools. 999 00:49:15,350 --> 00:49:18,600 MICHAEL KREMER: Yeah, right. 1000 00:49:18,600 --> 00:49:23,835 The crudest way of putting this is if there's 400 kids in 1001 00:49:23,835 --> 00:49:26,550 a school and you have 100 schools and you're randomizing 1002 00:49:26,550 --> 00:49:28,750 at the level of the individual, then you've got 1003 00:49:28,750 --> 00:49:30,090 40,000 observations. 
1004 00:49:30,090 --> 00:49:32,740 And if you've got 200 schools you're randomizing and you're 1005 00:49:32,740 --> 00:49:36,940 randomizing at the level of the school, you've got 100 1006 00:49:36,940 --> 00:49:39,070 treatment schools and 100 comparison schools. 1007 00:49:39,070 --> 00:49:43,270 Much smaller sample size, much less power. 1008 00:49:43,270 --> 00:49:45,830 That particular calculation I just did is overstating the 1009 00:49:45,830 --> 00:49:47,080 difference. 1010 00:49:50,230 --> 00:49:51,100 You've learned about clustering 1011 00:49:51,100 --> 00:49:52,130 standard errors before. 1012 00:49:52,130 --> 00:49:55,120 But since there's a lot of variation-- 1013 00:49:55,120 --> 00:49:57,020 to come back to the way you were putting it-- since 1014 00:49:57,020 --> 00:49:59,350 there's a lot of random variation between schools-- 1015 00:49:59,350 --> 00:50:01,315 some schools have good headmasters, some schools have 1016 00:50:01,315 --> 00:50:04,490 bad headmasters, et cetera-- 1017 00:50:04,490 --> 00:50:07,300 there's a lot of background noise. 1018 00:50:07,300 --> 00:50:10,470 And it's going to make it harder to estimate precisely 1019 00:50:10,470 --> 00:50:11,720 the impact of the program. 1020 00:50:14,180 --> 00:50:16,565 So you really have to think about your particular context. 1021 00:50:16,565 --> 00:50:19,330 When you're thinking about what level to randomize at, 1022 00:50:19,330 --> 00:50:21,900 think about, in your context, do you think spillovers are a 1023 00:50:21,900 --> 00:50:22,710 real issue. 1024 00:50:22,710 --> 00:50:25,400 If you think spillovers are a real issue, then you better 1025 00:50:25,400 --> 00:50:26,890 randomize at a higher level. 
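[Editor's sketch] One common way to put a number on the loss of power from randomizing by school is the standard design effect for equal-sized clusters, 1 + (m - 1) * ICC, where m is pupils per school and ICC is the intra-cluster correlation. The lecture does not give this formula or any ICC value; the values below are assumptions chosen only to show the range between "every pupil is an independent observation" and "every school is one observation".

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    total_pupils = n_clusters * cluster_size
    return total_pupils / design_effect(cluster_size, icc)


n_schools = 100          # figure used in the lecture
pupils_per_school = 400  # figure used in the lecture

# ICC values are hypothetical: 0 means pupils in a school are unrelated,
# 1 means pupils in a school are identical.
for icc in (0.0, 0.05, 1.0):
    n_eff = effective_sample_size(n_schools, pupils_per_school, icc)
    print(f"ICC={icc:.2f}: effective sample size is about {n_eff:.0f}")
```

With an ICC of 0 the sample is worth 40,000 observations, and with an ICC of 1 it is worth only 100; realistic cases fall in between, which is why comparing 40,000 with the raw number of schools overstates the difference.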
1026 00:50:26,890 --> 00:50:29,430 But if you think, in this particular context, I don't 1027 00:50:29,430 --> 00:50:32,470 need to worry about it, if this were a cancer drug rather 1028 00:50:32,470 --> 00:50:35,420 than a worm drug, then you wouldn't need to worry about 1029 00:50:35,420 --> 00:50:36,680 it, and you're much better off randomizing at 1030 00:50:36,680 --> 00:50:37,950 the individual level. 1031 00:50:41,410 --> 00:50:43,686 There might also be-- yes. 1032 00:50:43,686 --> 00:50:46,010 AUDIENCE: So this might be a little too [INAUDIBLE], but if 1033 00:50:46,010 --> 00:50:48,168 you're worried about attrition, would randomizing 1034 00:50:48,168 --> 00:50:52,152 at a higher level make that less of an issue as well 1035 00:50:52,152 --> 00:50:54,642 because now you're looking at a higher [INAUDIBLE]? 1036 00:50:54,642 --> 00:50:57,630 So if you lose individuals within that, 1037 00:50:57,630 --> 00:50:59,380 it's still an issue. 1038 00:50:59,380 --> 00:51:02,300 MICHAEL KREMER: It would still be an issue. 1039 00:51:02,300 --> 00:51:03,550 Yeah, it's still an issue. 1040 00:51:10,110 --> 00:51:11,470 So let's say that we've decided 1041 00:51:11,470 --> 00:51:12,430 we're going to randomize. 1042 00:51:12,430 --> 00:51:14,030 Take this worm example. 1043 00:51:14,030 --> 00:51:16,020 And we think that most of the externalities are within 1044 00:51:16,020 --> 00:51:19,580 schools, so we're going to randomize at the level of schools. 1045 00:51:19,580 --> 00:51:22,690 We know that there might be some externalities across 1046 00:51:22,690 --> 00:51:26,300 schools because this is an environment where everybody 1047 00:51:26,300 --> 00:51:28,200 lives on their own farm basically. 1048 00:51:28,200 --> 00:51:30,390 So you might have two kids living next to each other, one 1049 00:51:30,390 --> 00:51:31,750 of whom goes to one school and another 1050 00:51:31,750 --> 00:51:33,300 goes to another school.
1051 00:51:33,300 --> 00:51:34,130 That's not that uncommon. 1052 00:51:34,130 --> 00:51:36,370 So you could have some externalities across schools. 1053 00:51:36,370 --> 00:51:41,530 But randomizing at the level of a district would really not 1054 00:51:41,530 --> 00:51:43,580 logistically be very possible. 1055 00:51:43,580 --> 00:51:45,400 You'd have no sample size left. 1056 00:51:45,400 --> 00:51:47,770 So you know there might be some externalities across 1057 00:51:47,770 --> 00:51:50,600 schools as well as those within them. 1058 00:51:50,600 --> 00:51:52,640 But you've already made the decision to randomize at the 1059 00:51:52,640 --> 00:51:53,860 level of schools. 1060 00:51:53,860 --> 00:51:55,030 So what do you do? 1061 00:51:55,030 --> 00:51:59,380 Well, what you can try and do is use random variation in the 1062 00:51:59,380 --> 00:52:00,930 density of treatment nearby. 1063 00:52:00,930 --> 00:52:03,636 So if you pick the schools randomly that were going to be 1064 00:52:03,636 --> 00:52:06,680 treatment schools and the ones that are going to be 1065 00:52:06,680 --> 00:52:09,270 comparison schools, there will be some comparison schools 1066 00:52:09,270 --> 00:52:11,730 that happen to be completely surrounded 1067 00:52:11,730 --> 00:52:13,130 by treatment schools. 1068 00:52:13,130 --> 00:52:14,750 There will be other comparison schools that don't have any 1069 00:52:14,750 --> 00:52:16,420 treatment schools nearby. 1070 00:52:16,420 --> 00:52:19,290 So you can use that to try to pick up how big the 1071 00:52:19,290 --> 00:52:20,490 externality is. 1072 00:52:20,490 --> 00:52:24,750 So that's what we tried to do in this paper. 1073 00:52:24,750 --> 00:52:26,650 So here's a map. 1074 00:52:26,650 --> 00:52:28,370 So the ones are group one schools. 1075 00:52:28,370 --> 00:52:29,410 They've been treated. 1076 00:52:29,410 --> 00:52:30,940 The twos are group two schools, treated 1077 00:52:30,940 --> 00:52:31,910 in the second wave.
1078 00:52:31,910 --> 00:52:33,660 Threes are group three schools, treated 1079 00:52:33,660 --> 00:52:34,910 in the third wave. 1080 00:52:38,320 --> 00:52:40,940 Here's a school that's in the middle of the lake which I 1081 00:52:40,940 --> 00:52:43,760 think is actually on an island. 1082 00:52:43,760 --> 00:52:48,520 These schools in Uganda are not really in Uganda. 1083 00:52:48,520 --> 00:52:53,710 GPS used to be intentionally degraded because people were 1084 00:52:53,710 --> 00:52:55,300 wanting to use it for military-- it was developed by 1085 00:52:55,300 --> 00:52:56,470 the military, I guess-- 1086 00:52:56,470 --> 00:52:59,190 they didn't want foreign militaries to have it. 1087 00:53:02,510 --> 00:53:05,020 They're measured with some error. 1088 00:53:05,020 --> 00:53:06,460 So we've got these schools. 1089 00:53:06,460 --> 00:53:10,420 So we can see there are some schools that are near 1090 00:53:10,420 --> 00:53:11,230 treatment schools. 1091 00:53:11,230 --> 00:53:13,470 Other schools aren't near treatment schools. 1092 00:53:13,470 --> 00:53:15,920 By the way, the treatment schools in this example I just 1093 00:53:15,920 --> 00:53:18,710 did here, the ones shared with the twos. 1094 00:53:18,710 --> 00:53:20,900 But you could imagine the ones might share with other ones. 1095 00:53:20,900 --> 00:53:22,480 So there could be externalities on other 1096 00:53:22,480 --> 00:53:23,730 treatment schools. 1097 00:53:27,680 --> 00:53:32,130 Here's a group three school that's all by itself and 1098 00:53:32,130 --> 00:53:33,600 doesn't have any neighbors who were treated. 1099 00:53:36,250 --> 00:53:39,640 Here is a group three school that has three group one 1100 00:53:39,640 --> 00:53:42,300 schools that are treated. 1101 00:53:42,300 --> 00:53:47,340 So would you want to compare those to estimate what the 1102 00:53:47,340 --> 00:53:48,640 effect of the deworming program is?
1103 00:53:53,810 --> 00:53:56,732 AUDIENCE: Wouldn't you use it to estimate the impact of the 1104 00:53:56,732 --> 00:53:58,210 spillovers? 1105 00:53:58,210 --> 00:53:59,620 MICHAEL KREMER: Yeah, suppose you were interested in 1106 00:53:59,620 --> 00:54:01,610 estimating the impact of the spillovers, the medical 1107 00:54:01,610 --> 00:54:03,240 spillovers of treatment. 1108 00:54:03,240 --> 00:54:05,975 Could you compare those two schools? 1109 00:54:08,830 --> 00:54:11,410 What might make that comparison invalid if you're 1110 00:54:11,410 --> 00:54:12,855 trying to estimate the impact of spillovers? 1111 00:54:12,855 --> 00:54:14,105 AUDIENCE: [INAUDIBLE]. 1112 00:54:16,600 --> 00:54:17,790 MICHAEL KREMER: I'm sorry. 1113 00:54:17,790 --> 00:54:18,820 One means a treatment school. 1114 00:54:18,820 --> 00:54:22,325 Three means a comparison school. 1115 00:54:22,325 --> 00:54:23,230 AUDIENCE: [INAUDIBLE]. 1116 00:54:23,230 --> 00:54:25,610 MICHAEL KREMER: So one's more rural. 1117 00:54:25,610 --> 00:54:26,860 Exactly. 1118 00:54:28,910 --> 00:54:31,070 This one is obviously in a less densely settled 1119 00:54:31,070 --> 00:54:31,770 population. 1120 00:54:31,770 --> 00:54:34,670 This one, turns out these are all rural, but this is 1121 00:54:34,670 --> 00:54:36,020 obviously much more densely settled. 1122 00:54:36,020 --> 00:54:39,270 That's why they've got all these schools around there. 1123 00:54:39,270 --> 00:54:43,960 So now in this particular setting, why 1124 00:54:43,960 --> 00:54:45,210 might that be a problem? 1125 00:54:50,450 --> 00:54:50,940 Yeah. 1126 00:54:50,940 --> 00:54:52,900 AUDIENCE: Because that area might be internally different 1127 00:54:52,900 --> 00:54:55,900 from the [INAUDIBLE]. 1128 00:54:55,900 --> 00:54:57,920 MICHAEL KREMER: Yeah. 1129 00:54:57,920 --> 00:54:58,970 So this is a disease. 1130 00:54:58,970 --> 00:55:00,460 I probably should have said more 1131 00:55:00,460 --> 00:55:01,580 about this in the beginning. 
1132 00:55:01,580 --> 00:55:06,360 So these worms affect one out of every three or four people 1133 00:55:06,360 --> 00:55:06,990 in the world. 1134 00:55:06,990 --> 00:55:10,950 And they're spread through fecal-oral routes. 1135 00:55:10,950 --> 00:55:12,400 They're spread through fecal matter. 1136 00:55:16,220 --> 00:55:19,790 What are the odds that you're going to get contaminated with 1137 00:55:19,790 --> 00:55:20,360 fecal matter? 1138 00:55:20,360 --> 00:55:22,710 It depends how many other people are depositing fecal 1139 00:55:22,710 --> 00:55:24,110 matter in the environment. 1140 00:55:24,110 --> 00:55:27,270 Clearly, over here, there's a lot of people nearby you who 1141 00:55:27,270 --> 00:55:29,550 might be depositing fecal matter in the environment. 1142 00:55:29,550 --> 00:55:31,450 Over here, there aren't so many. 1143 00:55:31,450 --> 00:55:33,810 So you might think that that's how densely settled the 1144 00:55:33,810 --> 00:55:35,360 population is. 1145 00:55:35,360 --> 00:55:40,100 We don't think of Alaska or the middle of the desert 1146 00:55:40,100 --> 00:55:44,680 somewhere being very diseased environments. 1147 00:55:44,680 --> 00:55:48,500 But you think of a highly concentrated place having a 1148 00:55:48,500 --> 00:55:49,750 lot more disease. 1149 00:55:51,980 --> 00:55:56,120 There are reasons to think that sparsely settled places will 1150 00:55:56,120 --> 00:55:59,800 have different prevalence of the disease than heavily 1151 00:55:59,800 --> 00:56:02,500 settled places. 1152 00:56:02,500 --> 00:56:07,440 So what we did because of that, we didn't want to just 1153 00:56:07,440 --> 00:56:09,310 look at the number of treatment schools nearby. 1154 00:56:09,310 --> 00:56:12,326 We just talked about why that would be a problem. 1155 00:56:12,326 --> 00:56:14,930 But we want to do that controlling for the total 1156 00:56:14,930 --> 00:56:16,600 number of schools nearby.
1157 00:56:16,600 --> 00:56:19,300 So we control for the total density in the area, total 1158 00:56:19,300 --> 00:56:21,390 number of schools within a certain distance or pupils 1159 00:56:21,390 --> 00:56:25,870 within a certain distance, and see what's the effect of those 1160 00:56:25,870 --> 00:56:27,600 schools being treatment schools as opposed to being 1161 00:56:27,600 --> 00:56:28,470 comparison schools. 1162 00:56:28,470 --> 00:56:31,020 Oops, did I skip a-- 1163 00:56:31,020 --> 00:56:31,270 OK. 1164 00:56:31,270 --> 00:56:35,900 So controlling for density, what we find is that the 1165 00:56:35,900 --> 00:56:38,810 infection rates are 26 percentage points lower per 1166 00:56:38,810 --> 00:56:43,600 1,000 pupils in treatment schools within 3 kilometers. 1167 00:56:43,600 --> 00:56:46,420 And then if you go further out, there are 14 percentage 1168 00:56:46,420 --> 00:56:48,860 points per treatment schools that are 3 to 1169 00:56:48,860 --> 00:56:50,220 6 kilometers away. 1170 00:56:50,220 --> 00:56:53,880 So this is controlling for the overall density in the area. 1171 00:56:53,880 --> 00:56:56,290 So hopefully, we're abstracting from that 1172 00:56:56,290 --> 00:56:57,540 particular problem. 1173 00:57:01,210 --> 00:57:04,270 So now suppose we want to estimate the overall effects. 1174 00:57:04,270 --> 00:57:09,110 So let me come back to this problem. 1175 00:57:09,110 --> 00:57:11,780 Clearly, we've incorrectly estimated. 1176 00:57:11,780 --> 00:57:15,960 We estimated that only $250 of benefit went through. 1177 00:57:15,960 --> 00:57:18,480 But we think that the true effect should include the 1178 00:57:18,480 --> 00:57:19,730 effect on the comparison. 
1179 00:57:21,960 --> 00:57:26,860 In this previous case, we were able to estimate the increase 1180 00:57:26,860 --> 00:57:30,200 in school participation in the treatment group and then also 1181 00:57:30,200 --> 00:57:33,990 in the comparison group through this technique that I 1182 00:57:33,990 --> 00:57:35,130 just outlined. 1183 00:57:35,130 --> 00:57:38,720 So we know in the comparison schools, there's a 1.5 1184 00:57:38,720 --> 00:57:40,615 percentage point increase in school participation. 1185 00:57:45,900 --> 00:57:48,000 There are three pupils in control schools for every 1186 00:57:48,000 --> 00:57:50,660 treated child. 1187 00:57:50,660 --> 00:57:53,990 And in the treatment schools, there was a 7 percentage point 1188 00:57:53,990 --> 00:57:56,090 increase in school participation for all 1189 00:57:56,090 --> 00:58:00,310 children, but you only needed to treat 2/3 of the children. 1190 00:58:00,310 --> 00:58:03,870 So you can then calculate what the overall effect is of 1191 00:58:03,870 --> 00:58:05,280 treating one child. 1192 00:58:05,280 --> 00:58:07,980 So if you treat one child, you pick up three children in 1193 00:58:07,980 --> 00:58:12,420 comparison schools, each of whom gets a benefit of 0.015 1194 00:58:12,420 --> 00:58:16,950 additional years of education. 1195 00:58:16,950 --> 00:58:22,020 Then you pick up this is the effect on children in the same 1196 00:58:22,020 --> 00:58:24,700 school, and you get an overall effect of 1197 00:58:24,700 --> 00:58:27,210 0.15 years of education. 1198 00:58:27,210 --> 00:58:34,280 So treating a child costs about $0.50-- 1199 00:58:34,280 --> 00:58:37,720 in fact, it's probably cheaper than that when done at scale-- 1200 00:58:37,720 --> 00:58:47,840 but the impact that you're going to get is if each child, 1201 00:58:47,840 --> 00:58:50,500 you get an extra 0.15 years of education, if you treat seven 1202 00:58:50,500 --> 00:58:53,980 children, you'll get about an extra year of education. 
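[Editor's sketch] The arithmetic in this passage can be laid out step by step. The way the within-school and cross-school pieces are combined below is one reading of the spoken figures, not a formula stated in the lecture; it is used here because it reproduces the 0.15 years per treated child that the lecture quotes. All variable names are illustrative.

```python
import math

# Figures quoted in the lecture.
comparison_school_gain = 0.015      # years gained per comparison-school pupil
comparison_pupils_per_treated = 3   # comparison pupils per treated child
treatment_school_gain = 0.07        # years gained per treatment-school pupil
share_treated = 2 / 3               # share of treatment-school pupils treated
cost_per_child = 0.50               # dollars per treated child

# Assumed combination: spillover to comparison schools, plus the
# within-school gain spread over the children actually treated.
cross_school_part = comparison_pupils_per_treated * comparison_school_gain
within_school_part = treatment_school_gain / share_treated
years_per_treated_child = cross_school_part + within_school_part

children_per_extra_year = math.ceil(1 / years_per_treated_child)
cost_per_extra_year = children_per_extra_year * cost_per_child

print(f"Years gained per treated child: {years_per_treated_child:.2f}")
print(f"Children to treat for one extra year: {children_per_extra_year}")
print(f"Cost per extra year of schooling: ${cost_per_extra_year:.2f}")
```

Rounding up to seven children gives the $3.50 figure; without rounding, the same inputs imply roughly $3.33 per additional year.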
1203 00:58:53,980 --> 00:58:56,935 7 times $0.50 is $3.50. 1204 00:58:56,935 --> 00:59:00,605 You spend $3.50 on deworming, and you get an 1205 00:59:00,605 --> 00:59:01,855 extra year of education. 1206 00:59:06,115 --> 00:59:10,200 Let me pause again here, and I'll go on and discuss some 1207 00:59:10,200 --> 00:59:14,200 issues on partial compliance and sample selection bias. 1208 00:59:14,200 --> 00:59:18,225 I'll get partway through that topic, and then Shawn's going 1209 00:59:18,225 --> 00:59:20,900 to take up where I leave off. 1210 00:59:20,900 --> 00:59:23,650 But are there any questions on externalities before I go on? 1211 00:59:27,480 --> 00:59:28,730 OK. 1212 00:59:32,510 --> 00:59:37,800 So you might think if you randomize where the treatment 1213 00:59:37,800 --> 00:59:42,000 is, you're going to get rid of sample selection bias. 1214 00:59:42,000 --> 00:59:43,950 That's not necessarily the case. 1215 00:59:43,950 --> 00:59:46,940 So let me show an example. 1216 00:59:46,940 --> 00:59:52,160 So where you randomize where you want the program to be, 1217 00:59:52,160 --> 00:59:54,870 that's not necessarily the sole determinant of which 1218 00:59:54,870 --> 00:59:56,120 places actually get treated. 1219 00:59:59,280 --> 01:00:02,660 So let me talk about why. 1220 01:00:02,660 --> 01:00:05,600 So one example would be people who are assigned to the 1221 01:00:05,600 --> 01:00:09,680 comparison group might try to move into the treatment group. 1222 01:00:09,680 --> 01:00:12,480 I don't think this happened, but parents could try and move 1223 01:00:12,480 --> 01:00:14,140 their children from the comparison school to the 1224 01:00:14,140 --> 01:00:15,210 treatment school. 1225 01:00:15,210 --> 01:00:17,690 It's at least hypothetically possible. 
1226 01:00:17,690 --> 01:00:21,770 What are other possible reasons why you might not get 1227 01:00:21,770 --> 01:00:26,100 this match between the initial assignment and where people 1228 01:00:26,100 --> 01:00:30,960 wound up, where the people wound up treated? 1229 01:00:30,960 --> 01:00:31,390 Yeah. 1230 01:00:31,390 --> 01:00:33,062 AUDIENCE: So you might get somebody treated, and they 1231 01:00:33,062 --> 01:00:34,620 don't want to take the medication. 1232 01:00:34,620 --> 01:00:36,645 MICHAEL KREMER: Sure, exactly. 1233 01:00:36,645 --> 01:00:39,705 In the case of deworming, there were some people who 1234 01:00:39,705 --> 01:00:42,790 either didn't want to take the medication or who maybe they 1235 01:00:42,790 --> 01:00:44,425 wanted to, but they weren't able to get the permission 1236 01:00:44,425 --> 01:00:46,270 slip to do it. 1237 01:00:46,270 --> 01:00:48,140 So that's one great example. 1238 01:00:48,140 --> 01:00:49,435 What are other possible examples? 1239 01:00:56,290 --> 01:00:57,800 If you think about your concrete 1240 01:00:57,800 --> 01:01:00,720 experience, imagine that-- 1241 01:01:00,720 --> 01:01:04,880 I'll tell you a story from our experience. 1242 01:01:04,880 --> 01:01:07,370 When this NGO was trying to get started and they were 1243 01:01:07,370 --> 01:01:08,970 trying to pick the seven schools where they were going 1244 01:01:08,970 --> 01:01:12,200 to work, they picked the schools, and they had to go to 1245 01:01:12,200 --> 01:01:14,640 the government for permission to start working. 1246 01:01:14,640 --> 01:01:18,850 And permission was slow. 1247 01:01:18,850 --> 01:01:21,320 It kept being slow and slow. 1248 01:01:21,320 --> 01:01:23,640 And they didn't realize what was going on. 1249 01:01:23,640 --> 01:01:27,090 And it turned out, eventually, that there was a politician 1250 01:01:27,090 --> 01:01:28,340 who was upset. 1251 01:01:30,210 --> 01:01:32,760 The NGO didn't understand why the politician was upset. 
1252 01:01:32,760 --> 01:01:38,400 Because one of the schools was in his constituency, where 1253 01:01:38,400 --> 01:01:39,750 they were going to start working. 1254 01:01:39,750 --> 01:01:41,920 Well, it turned out it was in the part of his constituency 1255 01:01:41,920 --> 01:01:43,170 that voted for his opponent. 1256 01:01:47,090 --> 01:01:49,420 So in that sort of a situation, what the-- 1257 01:01:51,950 --> 01:02:02,210 I don't remember exactly the timing of this, but what the 1258 01:02:02,210 --> 01:02:06,570 eventual resolution of this was they started working in 1259 01:02:06,570 --> 01:02:08,640 the other part of his constituency, where his 1260 01:02:08,640 --> 01:02:10,580 supporters lived as well. 1261 01:02:10,580 --> 01:02:13,310 So there are all sorts of cases where you're going to want to 1262 01:02:13,310 --> 01:02:15,800 randomize, but you may not be able to 1263 01:02:15,800 --> 01:02:19,180 have that happen perfectly. 1264 01:02:19,180 --> 01:02:22,290 In this case, it wasn't a, quote, "legitimate" reason. 1265 01:02:22,290 --> 01:02:23,800 But there are other cases where there'd be very 1266 01:02:23,800 --> 01:02:25,330 legitimate reasons why. 1267 01:02:25,330 --> 01:02:30,390 Maybe the need is very intense in some area, and so the NGO 1268 01:02:30,390 --> 01:02:32,350 or the organization feels it's very important to 1269 01:02:32,350 --> 01:02:33,550 work in that area. 1270 01:02:33,550 --> 01:02:37,920 So there may be lots of reasons why some people in the 1271 01:02:37,920 --> 01:02:39,490 comparison group wind up getting treated. 1272 01:02:45,840 --> 01:02:48,080 So there are cases like we just heard about where 1273 01:02:48,080 --> 01:02:49,530 individuals allocated to treatment 1274 01:02:49,530 --> 01:02:52,530 might not get treatment. 1275 01:02:52,530 --> 01:02:53,990 And there are cases where people who are in the 1276 01:02:53,990 --> 01:02:57,560 comparison group do get treated.
1277 01:02:57,560 --> 01:03:00,980 So in the case of deworming, 78% of those assigned to 1278 01:03:00,980 --> 01:03:03,750 receive treatment got some treatment. 1279 01:03:03,750 --> 01:03:05,450 And the main reason they weren't treated is they just 1280 01:03:05,450 --> 01:03:07,870 happened to be absent from school the day that the 1281 01:03:07,870 --> 01:03:08,940 treatment was given. 1282 01:03:08,940 --> 01:03:11,610 Some students in the comparison group were treated 1283 01:03:11,610 --> 01:03:14,130 because they went out and got the treatment on their own 1284 01:03:14,130 --> 01:03:15,580 through clinics. 1285 01:03:15,580 --> 01:03:17,010 So what do you do? 1286 01:03:17,010 --> 01:03:19,380 Suppose this has already happened. 1287 01:03:19,380 --> 01:03:21,710 So imagine you have data on everybody, so attrition isn't 1288 01:03:21,710 --> 01:03:23,780 the problem, but just the actual 1289 01:03:23,780 --> 01:03:24,670 assignment to treatment. 1290 01:03:24,670 --> 01:03:26,940 The assignment to treatment and actual treatment don't 1291 01:03:26,940 --> 01:03:28,190 correspond. 1292 01:03:30,070 --> 01:03:35,160 So first, what's the problem if you just do a straight 1293 01:03:35,160 --> 01:03:37,290 comparison, and what might you do about it? 1294 01:03:43,647 --> 01:03:45,114 AUDIENCE: In this [UNINTELLIGIBLE], say they 1295 01:03:45,114 --> 01:03:47,559 [UNINTELLIGIBLE] the students who were absent just by 1296 01:03:47,559 --> 01:03:51,000 [UNINTELLIGIBLE] to their homes and [UNINTELLIGIBLE]? 1297 01:03:51,000 --> 01:03:53,240 MICHAEL KREMER: No, so the program didn't do that. 
1298 01:03:53,240 --> 01:03:55,760 So we talked about in the case of the evaluation, when you're 1299 01:03:55,760 --> 01:03:58,320 trying to measure the test scores or the impact on 1300 01:03:58,320 --> 01:04:00,670 attendance, you could-- well, impact on attendance, 1301 01:04:00,670 --> 01:04:01,920 obviously, you find out whether they're there or not 1302 01:04:01,920 --> 01:04:02,420 by visiting the school. 1303 01:04:02,420 --> 01:04:04,440 Unless you wanted to do test scores, you 1304 01:04:04,440 --> 01:04:05,070 could track them home. 1305 01:04:05,070 --> 01:04:07,770 But the way the program was implemented, those kids who 1306 01:04:07,770 --> 01:04:09,980 weren't at school that day when they gave out the 1307 01:04:09,980 --> 01:04:12,390 deworming pills, they just didn't get treated. 1308 01:04:12,390 --> 01:04:14,310 Maybe the program shouldn't have been run that way, but 1309 01:04:14,310 --> 01:04:15,680 that's how it was run. 1310 01:04:15,680 --> 01:04:17,780 And there are reasons why maybe it 1311 01:04:17,780 --> 01:04:19,030 should be run that way. 1312 01:04:25,884 --> 01:04:27,504 AUDIENCE: In the end, you wouldn't be considering the 1313 01:04:27,504 --> 01:04:29,772 effect of actually treating people? 1314 01:04:29,772 --> 01:04:34,146 You'd be comparing the effect of intending to treat people. 1315 01:04:38,160 --> 01:04:40,270 MICHAEL KREMER: This is exactly where we're going to 1316 01:04:40,270 --> 01:04:45,960 go, and it's where I'm going to wind up and where Shawn's 1317 01:04:45,960 --> 01:04:47,250 going to be taking over. 1318 01:04:50,670 --> 01:04:53,290 Imagine you are interested in the impact of this program on 1319 01:04:53,290 --> 01:04:54,410 test scores. 
1320 01:04:54,410 --> 01:05:02,350 So one thing you might think would be the right thing to do 1321 01:05:02,350 --> 01:05:07,990 would be to just look at the people who actually were 1322 01:05:07,990 --> 01:05:10,820 treated and compare them to people who 1323 01:05:10,820 --> 01:05:12,150 actually weren't treated. 1324 01:05:12,150 --> 01:05:13,960 That's going to be problematic for reasons that 1325 01:05:13,960 --> 01:05:14,830 we'll explain later. 1326 01:05:14,830 --> 01:05:18,600 But let me follow up on your suggestion which is if you're 1327 01:05:18,600 --> 01:05:26,350 a policy maker, there are questions beyond this you'd be 1328 01:05:26,350 --> 01:05:27,290 interested in. 1329 01:05:27,290 --> 01:05:29,790 But you raised the idea of saying, well, what's the 1330 01:05:29,790 --> 01:05:32,090 impact of the intent to treat somebody. 1331 01:05:32,090 --> 01:05:34,020 And that is going to be the right 1332 01:05:34,020 --> 01:05:35,240 answer to some questions. 1333 01:05:35,240 --> 01:05:37,100 So let me start with that question, the relatively 1334 01:05:37,100 --> 01:05:40,060 easier question. 1335 01:05:40,060 --> 01:05:43,840 I'll let Shawn handle the harder questions. 1336 01:05:43,840 --> 01:05:47,410 Suppose you're a policy maker, and you're saying, look, 1337 01:05:47,410 --> 01:05:50,100 what's the impact of putting in this 1338 01:05:50,100 --> 01:05:51,790 school-based deworming program? 1339 01:05:51,790 --> 01:05:53,680 Well, if you're interested in what's the impact of a 1340 01:05:53,680 --> 01:05:56,880 school-based deworming program, well, you know in 1341 01:05:56,880 --> 01:05:59,440 reality, it's a true thing that you want to get at that 1342 01:05:59,440 --> 01:06:00,580 some people are not going to get it. 
1343 01:06:00,580 --> 01:06:02,540 If this is a school-based program and you hand out the 1344 01:06:02,540 --> 01:06:05,830 pills at the school, tracking kids to their houses who are 1345 01:06:05,830 --> 01:06:07,630 absent that day, that's expensive. 1346 01:06:07,630 --> 01:06:09,390 That's hard to implement. 1347 01:06:09,390 --> 01:06:11,280 It uses too much teachers' time. 1348 01:06:11,280 --> 01:06:12,650 You're probably not going to find that 1349 01:06:12,650 --> 01:06:14,000 many of the kids anyway. 1350 01:06:14,000 --> 01:06:15,620 So you wouldn't actually implement it that way. 1351 01:06:18,770 --> 01:06:20,240 If you're a scientist, you do care. 1352 01:06:20,240 --> 01:06:22,990 But if you're the policy maker, you might say, no, the 1353 01:06:22,990 --> 01:06:25,070 true effect of this program is that I'm only going to be able 1354 01:06:25,070 --> 01:06:28,870 to get 78% of the pupils because 22% of the pupils 1355 01:06:28,870 --> 01:06:30,020 aren't there. 1356 01:06:30,020 --> 01:06:32,140 And if some kids don't want worms-- 1357 01:06:32,140 --> 01:06:35,170 don't want the medicine-- sorry, if they don't want-- 1358 01:06:35,170 --> 01:06:38,060 or maybe they do want the worms, or they don't want the 1359 01:06:38,060 --> 01:06:39,630 worms, but they don't want the medicine either. 1360 01:06:39,630 --> 01:06:41,090 Anyway, if some kids aren't going to take it, then those 1361 01:06:41,090 --> 01:06:44,330 kids aren't going to take it. 1362 01:06:44,330 --> 01:06:46,610 So you might think, well, I want to measure the impact of 1363 01:06:46,610 --> 01:06:49,240 this program in realistic conditions. 1364 01:06:49,240 --> 01:06:51,120 And realistic conditions are that not everybody's going to 1365 01:06:51,120 --> 01:06:52,880 be able to get it. 1366 01:06:52,880 --> 01:06:56,600 So let's suppose that you're a policy maker. 
1367 01:06:56,600 --> 01:06:59,090 Then what you could do is you could look at what's called 1368 01:06:59,090 --> 01:07:01,950 the intention to treat estimate, which is what's the 1369 01:07:01,950 --> 01:07:06,120 effect of the school having the program or being assigned 1370 01:07:06,120 --> 01:07:07,450 to the program. 1371 01:07:07,450 --> 01:07:09,930 This comes up in medical trials a lot with, say, 1372 01:07:09,930 --> 01:07:11,010 chemotherapy. 1373 01:07:11,010 --> 01:07:13,730 So some people who start chemotherapy don't finish it 1374 01:07:13,730 --> 01:07:17,440 because it's just too painful for them. 1375 01:07:17,440 --> 01:07:20,300 Or they're not able to handle it medically. 1376 01:07:20,300 --> 01:07:23,250 Again, do you want to measure the impact of chemotherapy on 1377 01:07:23,250 --> 01:07:25,580 those people who managed to get all the way through? 1378 01:07:25,580 --> 01:07:26,710 Well, not necessarily. 1379 01:07:26,710 --> 01:07:29,340 Maybe what you're interested in is what's the effect of 1380 01:07:29,340 --> 01:07:30,945 being in this group that tries it. 1381 01:07:34,270 --> 01:07:36,051 Yes. 1382 01:07:36,051 --> 01:07:38,860 AUDIENCE: Do you think it actually happened in 1997? 1383 01:07:38,860 --> 01:07:40,560 MICHAEL KREMER: So yeah, I guess that actually 1384 01:07:40,560 --> 01:07:41,840 helps on the dates. 1385 01:07:41,840 --> 01:07:43,640 AUDIENCE: So it actually is 10 years on the-- 1386 01:07:43,640 --> 01:07:44,270 MICHAEL KREMER: That's right. 1387 01:07:44,270 --> 01:07:45,200 So it's 10 years. 1388 01:07:45,200 --> 01:07:47,810 It's 10 years before this was rolled out nationally. 1389 01:07:47,810 --> 01:07:50,610 So yes, some things happened before that, but this is a 1390 01:07:50,610 --> 01:07:51,550 long delay. 1391 01:07:51,550 --> 01:07:54,500 Yeah, so that first delay of publication 1392 01:07:54,500 --> 01:07:55,430 took quite a while. 1393 01:07:55,430 --> 01:07:59,680 And then there was a second delay after it.
1394 01:07:59,680 --> 01:08:03,930 Unfortunately, there's often a long delay in these things. 1395 01:08:03,930 --> 01:08:05,180 Let me see where we are. 1396 01:08:08,610 --> 01:08:11,930 So what you can do is you use the original assignment, and 1397 01:08:11,930 --> 01:08:13,980 then you're winding up with what's called an intention to 1398 01:08:13,980 --> 01:08:15,230 treat estimate. 1399 01:08:20,450 --> 01:08:25,014 So what intention to treat measures is what happened to 1400 01:08:25,014 --> 01:08:29,240 the average child who was in a treated school in the 1401 01:08:29,240 --> 01:08:30,420 population. 1402 01:08:30,420 --> 01:08:32,240 So it's not saying, what happens to the kids who 1403 01:08:32,240 --> 01:08:33,609 actually got the medicine. 1404 01:08:33,609 --> 01:08:35,752 It's saying, what happened to the average child who is in a 1405 01:08:35,752 --> 01:08:36,729 treated school. 1406 01:08:36,729 --> 01:08:40,160 So that's the correct interpretation of that. 1407 01:08:40,160 --> 01:08:41,920 Now is that the right number to look for? 1408 01:08:47,000 --> 01:08:49,160 I talked about some purposes where that might be the right 1409 01:08:49,160 --> 01:08:50,029 number to look for. 1410 01:08:50,029 --> 01:08:53,859 What would be some reasons why you might be interested in 1411 01:08:53,859 --> 01:08:56,960 other questions other than the answer to that question of 1412 01:08:56,960 --> 01:08:59,399 what happened to the average child in a treated school? 1413 01:09:02,501 --> 01:09:05,333 AUDIENCE: You were thinking of having a mandatory deworming 1414 01:09:05,333 --> 01:09:08,640 program in Kenya. 1415 01:09:08,640 --> 01:09:10,717 And then you want to know what would be the impact if 1416 01:09:10,717 --> 01:09:14,460 everybody was forced to treat. 1417 01:09:14,460 --> 01:09:15,630 MICHAEL KREMER: Exactly. 
1418 01:09:15,630 --> 01:09:22,140 So in this particular case, this was a program where it 1419 01:09:22,140 --> 01:09:23,450 was designed in such a way that not 1420 01:09:23,450 --> 01:09:24,630 everybody had to be treated. 1421 01:09:24,630 --> 01:09:27,250 It wasn't that you can't come to school unless you show your 1422 01:09:27,250 --> 01:09:28,620 certificate showing you've been treated. 1423 01:09:28,620 --> 01:09:31,330 But you might well be interested in, well, what if 1424 01:09:31,330 --> 01:09:32,420 we went a step further. 1425 01:09:32,420 --> 01:09:35,630 And we said, we're going to keep a supply of medicine at 1426 01:09:35,630 --> 01:09:38,560 the school, and if you are gone that particular day, then 1427 01:09:38,560 --> 01:09:39,800 you get it the next day. 1428 01:09:39,800 --> 01:09:41,470 And we don't let you come back to school unless 1429 01:09:41,470 --> 01:09:42,870 you take the medicine. 1430 01:09:42,870 --> 01:09:46,130 Well, you wouldn't be measuring the 1431 01:09:46,130 --> 01:09:50,020 impact of that program. 1432 01:09:50,020 --> 01:09:52,569 Intention to treat is very good if you're interested in 1433 01:09:52,569 --> 01:09:54,770 the narrow question of what's the impact 1434 01:09:54,770 --> 01:09:56,320 of this exact program. 1435 01:09:56,320 --> 01:09:58,080 But if you're trying to go beyond what's the impact of 1436 01:09:58,080 --> 01:10:00,740 this exact program, you're trying to start to think about 1437 01:10:00,740 --> 01:10:05,470 generalization, then maybe you want to understand some of the 1438 01:10:05,470 --> 01:10:07,030 underlying parameters. 1439 01:10:07,030 --> 01:10:09,035 And in this case, the underlying parameter is what's 1440 01:10:09,035 --> 01:10:12,280 the effect on school attendance of a kid who had 1441 01:10:12,280 --> 01:10:15,070 worms or a particular level of worms no longer having that. 
1442 01:10:19,000 --> 01:10:21,190 And then it's using that underlying parameter that you 1443 01:10:21,190 --> 01:10:22,670 might be able to generalize what would be the effect of 1444 01:10:22,670 --> 01:10:25,020 everybody getting treated, what would be the effect of 1445 01:10:25,020 --> 01:10:27,080 only some people getting treated. 1446 01:10:27,080 --> 01:10:34,260 So to do that, Shawn's going to talk a little bit about how 1447 01:10:34,260 --> 01:10:35,510 you would do that. 1448 01:10:39,600 --> 01:10:43,570 Let's do this example, where we're trying to get the-- 1449 01:10:47,155 --> 01:10:50,380 I'm wondering whether to skip this example or not. 1450 01:10:50,380 --> 01:10:51,510 I'll do it. 1451 01:10:51,510 --> 01:10:53,410 I'll go through it. 1452 01:10:53,410 --> 01:10:59,460 In this example, if you look at the people who were-- 1453 01:10:59,460 --> 01:11:01,980 here's the intent, whether there was an intention to 1454 01:11:01,980 --> 01:11:02,740 treat them. 1455 01:11:02,740 --> 01:11:06,200 So in school one, you tried to treat everybody, but only 1456 01:11:06,200 --> 01:11:07,750 some of them got treated. 1457 01:11:07,750 --> 01:11:11,150 In school two, the intent was not to treat them. 1458 01:11:11,150 --> 01:11:12,910 They were assigned to the comparison group. 1459 01:11:12,910 --> 01:11:15,150 But a few people got treated anyway. 1460 01:11:15,150 --> 01:11:18,000 And this is the change in weight for each individual. 1461 01:11:18,000 --> 01:11:21,220 So then if we average the change in weight, the average 1462 01:11:21,220 --> 01:11:23,690 change in school one-- 1463 01:11:23,690 --> 01:11:28,008 I don't know if people want to figure that out for a second-- 1464 01:11:28,008 --> 01:11:28,932 AUDIENCE: [INAUDIBLE]? 1465 01:11:28,932 --> 01:11:29,394 MICHAEL KREMER: Sorry? 1466 01:11:29,394 --> 01:11:31,130 AUDIENCE: [INAUDIBLE]? 1467 01:11:31,130 --> 01:11:34,830 MICHAEL KREMER: So it's 1.3, right?
1468 01:11:34,830 --> 01:11:43,800 And the average change in school two is 0.9. 1469 01:11:43,800 --> 01:11:48,200 So the intention to treat effect would be comparing the 1470 01:11:48,200 --> 01:11:56,010 1.3 to the 0.9. 1471 01:11:56,010 --> 01:11:57,350 Now when is that useful? 1472 01:11:57,350 --> 01:12:00,680 Well, that's what I'm saying, for an actual program. 1473 01:12:00,680 --> 01:12:03,240 But you're not measuring this medical effect that you'd want 1474 01:12:03,240 --> 01:12:04,490 for generalization. 1475 01:12:15,450 --> 01:12:20,150 Here is an example where it's a malaria prevention program, 1476 01:12:20,150 --> 01:12:24,430 but there are political pressures to treat. 1477 01:12:24,430 --> 01:12:27,080 And so you add. 1478 01:12:27,080 --> 01:12:35,190 Again, you can measure this impact, this intention to 1479 01:12:35,190 --> 01:12:38,400 treat measure. 1480 01:12:42,300 --> 01:12:43,090 Let me-- 1481 01:12:43,090 --> 01:12:45,630 I'm wondering whether I should-- 1482 01:12:45,630 --> 01:12:47,610 let me go back here and say-- 1483 01:12:53,770 --> 01:12:55,980 Initially, the blue circles were the ones that were 1484 01:12:55,980 --> 01:12:57,240 supposed to be treated. 1485 01:12:57,240 --> 01:13:00,960 I want to talk about why you can't do the apparently 1486 01:13:00,960 --> 01:13:03,540 obvious thing of just comparing the guys who were 1487 01:13:03,540 --> 01:13:06,790 treated to the ones who weren't. 1488 01:13:06,790 --> 01:13:08,450 You've got this malaria prevention program. 1489 01:13:08,450 --> 01:13:10,400 40 villages are sampled. 1490 01:13:10,400 --> 01:13:13,000 20 were assigned to get the treatment the first year. 1491 01:13:13,000 --> 01:13:15,080 20 were assigned to be the comparison. 1492 01:13:15,080 --> 01:13:19,280 But some of the comparison villages object to this, and 1493 01:13:19,280 --> 01:13:21,000 they say, we want to be treated too.
1494 01:13:21,000 --> 01:13:23,740 And the program manager says, look, we just have to go ahead 1495 01:13:23,740 --> 01:13:25,270 and treat this. 1496 01:13:25,270 --> 01:13:30,660 So if the program only gets implemented in 15 villages, as 1497 01:13:30,660 --> 01:13:32,480 well as in 2 villages that were supposed to be 1498 01:13:32,480 --> 01:13:35,350 comparison, so what do you do to measure the 1499 01:13:35,350 --> 01:13:37,430 impact of the program? 1500 01:13:37,430 --> 01:13:41,360 So by the way, in the previous case I mentioned with the 1501 01:13:41,360 --> 01:13:45,260 politician in Kenya, the extra school was neither treatment 1502 01:13:45,260 --> 01:13:48,720 nor comparison, so really, in that case, there was no 1503 01:13:48,720 --> 01:13:50,870 problem because it was just out of the sample frame 1504 01:13:50,870 --> 01:13:53,890 altogether, the extra school that got treated. 1505 01:13:53,890 --> 01:13:56,760 In this case, some of the comparison schools wind up 1506 01:13:56,760 --> 01:13:58,530 getting treated. 1507 01:13:58,530 --> 01:14:00,030 So how do you measure it? 1508 01:14:00,030 --> 01:14:07,690 Well, here's the problem with what would happen if you just 1509 01:14:07,690 --> 01:14:10,100 did the naive thing and said, we're going to compare all the 1510 01:14:10,100 --> 01:14:13,020 guys who actually got treated to the comparison schools. 1511 01:14:13,020 --> 01:14:15,570 So we've got the blue schools are the ones that were 1512 01:14:15,570 --> 01:14:18,240 supposed to be treated that are in the sample. 1513 01:14:18,240 --> 01:14:20,970 The white schools are other villages. 1514 01:14:20,970 --> 01:14:24,310 So T is the original treatment group. 1515 01:14:24,310 --> 01:14:27,820 The T's are supposed to be treated. 1516 01:14:27,820 --> 01:14:30,360 The blues without T's in them are supposed to be the 1517 01:14:30,360 --> 01:14:32,770 comparison. 1518 01:14:32,770 --> 01:14:35,600 Now the actual treatment are the green circles.
1519 01:14:39,660 --> 01:14:44,010 So you can't compare the green circle villages 1520 01:14:44,010 --> 01:14:45,400 with the blue dots. 1521 01:14:45,400 --> 01:14:52,680 The green circles are the ones that were actually treated. 1522 01:14:52,680 --> 01:14:55,650 And the blue dots are the comparison. 1523 01:14:55,650 --> 01:14:56,935 Why can't you make that comparison? 1524 01:15:01,420 --> 01:15:02,410 AUDIENCE: They're not randomly assigned 1525 01:15:02,410 --> 01:15:03,160 from the very beginning. 1526 01:15:03,160 --> 01:15:04,270 MICHAEL KREMER: They're not randomly assigned from the 1527 01:15:04,270 --> 01:15:04,790 very beginning. 1528 01:15:04,790 --> 01:15:08,380 And can you be more specific about what your hypothesis 1529 01:15:08,380 --> 01:15:10,020 might be on the difference? 1530 01:15:10,020 --> 01:15:11,433 AUDIENCE: [UNINTELLIGIBLE] that fought to get the 1531 01:15:11,433 --> 01:15:15,515 treatment would prefer some help from the schools or 1532 01:15:15,515 --> 01:15:18,040 villages that were initially selected randomly. 1533 01:15:18,040 --> 01:15:19,290 MICHAEL KREMER: Exactly, exactly. 1534 01:15:31,830 --> 01:15:33,690 The guys who fought to get the treatment might differ from 1535 01:15:33,690 --> 01:15:35,720 the ones that are initially selected randomly. 1536 01:15:35,720 --> 01:15:38,280 They might have particularly capable leaders, for example, 1537 01:15:38,280 --> 01:15:39,600 or influential leaders. 1538 01:15:39,600 --> 01:15:41,910 And those influential leaders-- this politician who 1539 01:15:41,910 --> 01:15:45,136 managed to get the NGO program assigned to his area, he might 1540 01:15:45,136 --> 01:15:46,990 have fought to get lots of other programs assigned there. 
1541 01:15:46,990 --> 01:15:48,690 So we don't know whether we're measuring the impact of this 1542 01:15:48,690 --> 01:15:50,760 program, or we're measuring the fact that they're just 1543 01:15:50,760 --> 01:15:53,180 able to use their political influence to get everything 1544 01:15:53,180 --> 01:15:53,940 assigned there. 1545 01:15:53,940 --> 01:15:56,530 And similarly, if you leave out the-- 1546 01:16:02,060 --> 01:16:04,470 So this is basically just making the 1547 01:16:04,470 --> 01:16:06,900 point that you said. 1548 01:16:06,900 --> 01:16:10,400 The other thing that you could think about doing is comparing 1549 01:16:10,400 --> 01:16:13,380 the villages that were assigned to be a treatment 1550 01:16:13,380 --> 01:16:16,890 group and actually got treated with the ones that were 1551 01:16:16,890 --> 01:16:18,400 supposed to be a comparison group. 1552 01:16:18,400 --> 01:16:21,109 So what's the problem with that? 1553 01:16:21,109 --> 01:16:22,810 AUDIENCE: Attrition. 1554 01:16:22,810 --> 01:16:24,560 It's kind of like attrition. 1555 01:16:24,560 --> 01:16:26,710 MICHAEL KREMER: Exactly, it's the same principle, which is 1556 01:16:26,710 --> 01:16:30,810 you'd be leaving out a group that is the ones who were 1557 01:16:30,810 --> 01:16:32,150 assigned to be treated, but didn't 1558 01:16:32,150 --> 01:16:33,890 wind up getting treated. 1559 01:16:33,890 --> 01:16:35,810 Well, the ones who were assigned to be treated and, 1560 01:16:35,810 --> 01:16:38,050 nonetheless, didn't get treated, those might be the 1561 01:16:38,050 --> 01:16:40,620 ones-- so for example, imagine there's violence in some of 1562 01:16:40,620 --> 01:16:41,150 these areas. 1563 01:16:41,150 --> 01:16:44,090 And your field workers can't go there, so they never wound 1564 01:16:44,090 --> 01:16:46,160 up getting treated. 1565 01:16:46,160 --> 01:16:48,750 Well, the violence itself might have had an impact on 1566 01:16:48,750 --> 01:16:49,810 development outcomes. 
1567 01:16:49,810 --> 01:16:52,450 So you may be measuring the impact of the violence or of 1568 01:16:52,450 --> 01:16:55,300 particularly bad leaders who, despite being in the treatment 1569 01:16:55,300 --> 01:16:57,300 group, still can't get their village treated. 1570 01:17:01,190 --> 01:17:03,330 So that's not going to be a valid comparison either. 1571 01:17:10,790 --> 01:17:14,400 So one thing you can do is the intention to treat estimator. 1572 01:17:14,400 --> 01:17:16,320 You can do that again in this case. 1573 01:17:16,320 --> 01:17:19,430 So compare the initial 20 treatment villages with the 1574 01:17:19,430 --> 01:17:21,970 initial 20 comparison villages. 1575 01:17:21,970 --> 01:17:25,120 And then you've got the ITT estimator. 1576 01:17:25,120 --> 01:17:28,320 Now before I argued that the ITT estimator, in the case of 1577 01:17:28,320 --> 01:17:31,010 the deworming program, that arguably might be a very good 1578 01:17:31,010 --> 01:17:32,650 measure of some things. 1579 01:17:32,650 --> 01:17:34,450 You might not be able to do some other things with it, but 1580 01:17:34,450 --> 01:17:36,020 it was still a useful measure. 1581 01:17:36,020 --> 01:17:39,000 But in this case, suppose we want to actually understand 1582 01:17:39,000 --> 01:17:41,340 what the impact of the malaria treatment program is. 1583 01:17:41,340 --> 01:17:44,820 And we want to know what's the impact of it if you're able to 1584 01:17:44,820 --> 01:17:46,090 implement it. 1585 01:17:46,090 --> 01:17:47,990 Well, the intention to treat estimator isn't 1586 01:17:47,990 --> 01:17:49,140 really telling you that. 1587 01:17:49,140 --> 01:17:50,760 It's telling you what's the effect of being assigned to 1588 01:17:50,760 --> 01:17:51,770 the treatment. 1589 01:17:51,770 --> 01:17:54,020 But it's not saying what's the effect of the program in the 1590 01:17:54,020 --> 01:17:56,280 cases where you're able to implement it. 1591 01:17:56,280 --> 01:17:58,130 So that's a problem.
1592 01:17:58,130 --> 01:18:01,990 And that's where I'm going to leave you with that problem. 1593 01:18:01,990 --> 01:18:03,620 And then Shawn's going to tell you, at least, 1594 01:18:03,620 --> 01:18:04,870 a solution to it.
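[Editor's sketch, not part of the lecture.] The three comparisons discussed for the malaria example can be put side by side in a few lines of Python. Everything here is an assumption made for illustration: the eight villages, their outcome numbers, the field names, and the function names are hypothetical and are not the data on the slides. The made-up villages are built so that treatment adds 1 to the outcome, villages with strong leaders start from a higher baseline and get treated regardless of assignment, and villages hit by violence start lower and never get treated.

```python
from statistics import mean

# Hypothetical villages: "assigned" is the original random assignment,
# "treated" is what actually happened, "outcome" is some development outcome.
villages = [
    # Assigned to treatment
    {"assigned": True,  "treated": True,  "outcome": 5.0},  # strong leader
    {"assigned": True,  "treated": True,  "outcome": 3.0},  # ordinary
    {"assigned": True,  "treated": True,  "outcome": 3.0},  # ordinary
    {"assigned": True,  "treated": False, "outcome": 0.0},  # violence, never reached
    # Assigned to comparison
    {"assigned": False, "treated": True,  "outcome": 5.0},  # strong leader, treated anyway
    {"assigned": False, "treated": False, "outcome": 2.0},  # ordinary
    {"assigned": False, "treated": False, "outcome": 2.0},  # ordinary
    {"assigned": False, "treated": False, "outcome": 0.0},  # violence
]


def mean_outcome(rows):
    """Average outcome over a list of village records."""
    return mean(r["outcome"] for r in rows)


def itt_estimate(rows):
    """Intention to treat: compare by ORIGINAL assignment, ignoring who got treated."""
    assigned = [r for r in rows if r["assigned"]]
    comparison = [r for r in rows if not r["assigned"]]
    return mean_outcome(assigned) - mean_outcome(comparison)


def treated_vs_untreated(rows):
    """Naive comparison: everyone actually treated vs. everyone not treated."""
    treated = [r for r in rows if r["treated"]]
    untreated = [r for r in rows if not r["treated"]]
    return mean_outcome(treated) - mean_outcome(untreated)


def treated_assigned_vs_comparison(rows):
    """Second naive comparison: drop assigned villages that were never treated."""
    kept = [r for r in rows if r["assigned"] and r["treated"]]
    comparison = [r for r in rows if not r["assigned"]]
    return mean_outcome(kept) - mean_outcome(comparison)


itt = itt_estimate(villages)                        # 2.75 - 2.25 = 0.5
naive = treated_vs_untreated(villages)              # 4.0 - 1.0 = 3.0
dropped = treated_assigned_vs_comparison(villages)  # 11/3 - 2.25, about 1.42

print(f"ITT: {itt:.2f}, treated vs untreated: {naive:.2f}, "
      f"assigned-and-treated vs comparison: {dropped:.2f}")
```

In this made-up data the two naive comparisons both come out larger than the effect of 1 that was built in, because they mix in strong leadership or leave out the violence-hit villages. The ITT number comes out smaller than 1 because it answers a different question: the effect of being assigned to the program, with some villages in each group not following their assignment.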