1 00:00:00,040 --> 00:00:02,390 SPEAKER: The following content is provided under a Creative 2 00:00:02,390 --> 00:00:03,680 Commons license. 3 00:00:03,680 --> 00:00:06,640 Your support will help MIT OpenCourseWare continue to 4 00:00:06,640 --> 00:00:09,980 offer high-quality educational resources for free. 5 00:00:09,980 --> 00:00:12,820 To make a donation or view additional materials from 6 00:00:12,820 --> 00:00:16,750 hundreds of MIT courses, visit MIT OpenCourseWare at 7 00:00:16,750 --> 00:00:18,000 ocw.mit.edu. 8 00:00:22,390 --> 00:00:22,760 PROFESSOR: OK. 9 00:00:22,760 --> 00:00:25,340 So before we begin, I would like to just ask 10 00:00:25,340 --> 00:00:27,500 a very simple question. 11 00:00:27,500 --> 00:00:31,140 Do you think randomized evaluations are the best way to 12 00:00:31,140 --> 00:00:32,729 conduct an impact evaluation? 13 00:00:32,729 --> 00:00:36,550 Please raise your hand if you think so. 14 00:00:36,550 --> 00:00:38,700 Just be honest. 15 00:00:38,700 --> 00:00:42,260 All right, the TAs, you guys don't count. 16 00:00:42,260 --> 00:00:43,100 All right. 17 00:00:43,100 --> 00:00:44,080 OK. 18 00:00:44,080 --> 00:00:46,700 So I have a job to do now. 19 00:00:46,700 --> 00:00:48,780 Whereas I thought that maybe not. 20 00:00:48,780 --> 00:00:54,050 One of the things I would like to do is to-- 21 00:00:54,050 --> 00:00:56,760 this is one thing I've discovered about teaching. 22 00:00:56,760 --> 00:00:59,180 We have about an hour and 25 minutes. 23 00:00:59,180 --> 00:01:02,410 And if I speak for an hour and 25 minutes, I know two things 24 00:01:02,410 --> 00:01:02,990 will happen. 25 00:01:02,990 --> 00:01:05,080 One, you will get very bored, and two, you 26 00:01:05,080 --> 00:01:06,290 will not learn anything. 27 00:01:06,290 --> 00:01:10,960 So I want you to make sure that you interrupt with 28 00:01:10,960 --> 00:01:12,550 questions that you have. 29 00:01:12,550 --> 00:01:15,370 If they can be on the topic, that would be very good. 
30 00:01:15,370 --> 00:01:18,460 If they're off-topic, I may delay the question or I 31 00:01:18,460 --> 00:01:21,320 may postpone the question, at least until I get there. 32 00:01:21,320 --> 00:01:25,610 The other thing I would like to say about the way this 33 00:01:25,610 --> 00:01:28,140 would work, is I have no power over you. 34 00:01:28,140 --> 00:01:30,860 Whereas my students, I have a grade to give, with 35 00:01:30,860 --> 00:01:31,800 you I have no power. 36 00:01:31,800 --> 00:01:34,580 But I will still ask you to do certain things during the 37 00:01:34,580 --> 00:01:35,230 presentation. 38 00:01:35,230 --> 00:01:37,070 So I hope you'll collaborate. 39 00:01:37,070 --> 00:01:39,170 So my session is called Why Randomize? 40 00:01:39,170 --> 00:01:42,600 And the idea of Why Randomize? 41 00:01:42,600 --> 00:01:45,240 comes, for those of you who are convinced, I hope you can 42 00:01:45,240 --> 00:01:50,970 use this session to help convince others why this 43 00:01:50,970 --> 00:01:53,775 method is a very good method to do an impact evaluation. 44 00:01:53,775 --> 00:01:56,330 And for those of you who are not convinced, I would like to 45 00:01:56,330 --> 00:02:00,230 actually welcome you to raise any objections you have. 46 00:02:00,230 --> 00:02:05,230 And I'm not here to tell you randomization is a panacea or 47 00:02:05,230 --> 00:02:08,690 it's a solution to all the problems of mankind. 48 00:02:08,690 --> 00:02:11,470 But I think in terms of impact evaluations, it's a very 49 00:02:11,470 --> 00:02:13,320 powerful method. 50 00:02:13,320 --> 00:02:15,640 So the outline of the talk, I'll give you a little bit of 51 00:02:15,640 --> 00:02:16,090 background. 52 00:02:16,090 --> 00:02:18,590 We'll define, what is a randomized evaluation? 53 00:02:18,590 --> 00:02:20,030 It's going to be important to make sure we 54 00:02:20,030 --> 00:02:21,050 have a common language. 
55 00:02:21,050 --> 00:02:25,830 Then advantages and disadvantages of experiments. 56 00:02:25,830 --> 00:02:28,620 Then we're going to do the get out the vote, and then finally 57 00:02:28,620 --> 00:02:31,910 conclude in hopefully an hour and 20 minutes. 58 00:02:31,910 --> 00:02:33,610 So how to measure impact? 59 00:02:33,610 --> 00:02:36,440 This is something that Rachel referred to. 60 00:02:36,440 --> 00:02:39,360 The idea for measuring impact is, we want to compare what 61 00:02:39,360 --> 00:02:43,190 happened to the beneficiaries of a program versus what would 62 00:02:43,190 --> 00:02:45,350 have happened in the absence of the program. 63 00:02:45,350 --> 00:02:46,400 This is really key. 64 00:02:46,400 --> 00:02:48,760 What would have happened in the absence of a program is 65 00:02:48,760 --> 00:02:51,830 what's called a counterfactual, and it's key 66 00:02:51,830 --> 00:02:55,350 for you to evaluate any method to estimate program impact. 67 00:02:55,350 --> 00:02:57,040 Not just randomized evaluation. 68 00:02:57,040 --> 00:03:00,090 So when you are trying to assess how someone is going to 69 00:03:00,090 --> 00:03:02,990 do an impact evaluation, always ask yourself the 70 00:03:02,990 --> 00:03:06,040 question, what is the counterfactual here? 71 00:03:06,040 --> 00:03:09,700 How are they planning to think about this counterfactual? 72 00:03:09,700 --> 00:03:14,090 How do these people look like in the absence of the program? 73 00:03:14,090 --> 00:03:16,600 In the case of Kenya in the textbooks that Rachel was 74 00:03:16,600 --> 00:03:18,810 referring to this morning, we thought about the 75 00:03:18,810 --> 00:03:22,750 counterfactual in terms of how these children fared after 76 00:03:22,750 --> 00:03:25,460 this textbook program was implemented versus how they 77 00:03:25,460 --> 00:03:28,590 would have fared at the same moment in time had the program 78 00:03:28,590 --> 00:03:29,970 not been implemented. 
79 00:03:29,970 --> 00:03:34,170 This is crucial, because even before and after methodologies 80 00:03:34,170 --> 00:03:36,470 or any other of those methodologies, you are 81 00:03:36,470 --> 00:03:38,890 implicitly assuming a counterfactual. 82 00:03:38,890 --> 00:03:41,330 And the question is, what counterfactual are you 83 00:03:41,330 --> 00:03:45,640 assuming, and then is that assumption realistic? 84 00:03:45,640 --> 00:03:47,150 And in some cases, it may be. 85 00:03:47,150 --> 00:03:49,430 In other cases, it may not. 86 00:03:49,430 --> 00:03:52,030 So the problem is, the counterfactual is not 87 00:03:52,030 --> 00:03:52,720 observable. 88 00:03:52,720 --> 00:03:55,030 So the key goal of this impact evaluation 89 00:03:55,030 --> 00:03:57,140 methodology is to mimic it. 90 00:03:57,140 --> 00:04:00,050 You can't observe how these children in Kenya would have 91 00:04:00,050 --> 00:04:03,940 fared if the textbook program had not been implemented. 92 00:04:03,940 --> 00:04:06,600 The truth is, the textbook program was implemented, these 93 00:04:06,600 --> 00:04:10,310 textbooks were sent, and so you can't observe what that 94 00:04:10,310 --> 00:04:13,520 alternative reality would have been. 95 00:04:13,520 --> 00:04:17,130 And so constructing the counterfactual is usually done 96 00:04:17,130 --> 00:04:19,810 by selecting a group of people-- 97 00:04:19,810 --> 00:04:22,810 in this case, children, in the case of the Kenya example-- 98 00:04:22,810 --> 00:04:26,320 that have not been exposed to the program, or were not 99 00:04:26,320 --> 00:04:27,980 affected by the program. 100 00:04:27,980 --> 00:04:31,680 And so in a randomized evaluation, the key goal here 101 00:04:31,680 --> 00:04:33,650 of the randomized evaluation is that you 102 00:04:33,650 --> 00:04:35,470 do it from the beginning. 103 00:04:35,470 --> 00:04:38,300 And this is a question that I think Logan had in the first 104 00:04:38,300 --> 00:04:40,190 session with Rachel. 
105 00:04:40,190 --> 00:04:43,680 You can't do a randomized evaluation three years after 106 00:04:43,680 --> 00:04:45,400 the program was implemented. 107 00:04:45,400 --> 00:04:49,200 And the reason you can't do it is that you need to create, 108 00:04:49,200 --> 00:04:52,120 through this randomized experiment, the treatment and 109 00:04:52,120 --> 00:04:52,860 the control group. 110 00:04:52,860 --> 00:04:55,600 You need to decide early on who's going to get the 111 00:04:55,600 --> 00:04:57,870 treatment or who's going to be offered the treatment and who 112 00:04:57,870 --> 00:04:59,830 is not going to be offered the treatment. 113 00:04:59,830 --> 00:05:03,070 There are some opportunities, as Rachel referred to, and 114 00:05:03,070 --> 00:05:07,010 your get out the vote case is a good example, where someone 115 00:05:07,010 --> 00:05:08,610 already did this. 116 00:05:08,610 --> 00:05:11,580 And so you may be lucky, and you may step into the room and 117 00:05:11,580 --> 00:05:12,520 say, oh, look. 118 00:05:12,520 --> 00:05:13,670 Someone did it. 119 00:05:13,670 --> 00:05:18,310 But the thing is, someone should have taken care so that 120 00:05:18,310 --> 00:05:20,890 the assignment to this treatment and control group 121 00:05:20,890 --> 00:05:23,130 was done in a random manner. 122 00:05:23,130 --> 00:05:25,950 And in effect, and we'll see what exactly is random, but 123 00:05:25,950 --> 00:05:30,085 what I can tell you for now is if someone doesn't say, we did 124 00:05:30,085 --> 00:05:33,100 it randomized, we did a deliberate process so that it 125 00:05:33,100 --> 00:05:35,930 was random, it's probably not random. 126 00:05:35,930 --> 00:05:38,950 Random is not what people say in the real world. 127 00:05:38,950 --> 00:05:39,120 Oh! 128 00:05:39,120 --> 00:05:40,560 This is just a random event. 129 00:05:40,560 --> 00:05:43,250 Random has a very specific definition which we're going 130 00:05:43,250 --> 00:05:44,430 to see in a second. 
131 00:05:44,430 --> 00:05:47,450 So it's not enough to just say, oh, look. 132 00:05:47,450 --> 00:05:49,210 We didn't do anything systematic. 133 00:05:49,210 --> 00:05:51,560 Just people enrolled, and that's what happened. 134 00:05:51,560 --> 00:05:55,200 If they didn't do something deliberate to do it random, 135 00:05:55,200 --> 00:05:56,810 then it probably wasn't random. 136 00:05:56,810 --> 00:06:01,640 You can try to check this, but it's not always possible. 137 00:06:01,640 --> 00:06:06,320 The non-randomized methods, basically, argue that some excluded 138 00:06:06,320 --> 00:06:10,570 group, the group of people you're going to use as this 139 00:06:10,570 --> 00:06:14,130 comparison group, is mimicking this counterfactual. 140 00:06:14,130 --> 00:06:19,490 And the non-randomized methods rely on the strength of the 141 00:06:19,490 --> 00:06:20,730 assumption that you're making. 142 00:06:20,730 --> 00:06:25,580 So the methods will be strong if the assumption that the 143 00:06:25,580 --> 00:06:27,960 comparison group mimics the counterfactual is a good 144 00:06:27,960 --> 00:06:28,960 assumption. 145 00:06:28,960 --> 00:06:32,490 There's not any sense in which you say, well, this method is 146 00:06:32,490 --> 00:06:35,860 better than this other in some absolute manner. 147 00:06:35,860 --> 00:06:39,240 It is better or it's not better if the assumptions 148 00:06:39,240 --> 00:06:41,690 needed to mimic the counterfactual hold. 149 00:06:41,690 --> 00:06:45,560 If they hold, then that's great, you have a good method. 150 00:06:45,560 --> 00:06:47,660 The key distinction between this-- 151 00:06:47,660 --> 00:06:48,563 yes? 152 00:06:48,563 --> 00:06:50,910 AUDIENCE: Could you give us an example of when the 153 00:06:50,910 --> 00:06:54,330 assumptions were just obviously untrue? 154 00:06:54,330 --> 00:06:54,810 PROFESSOR: Sure. 
155 00:06:54,810 --> 00:07:00,130 So suppose that you had this textbook program and it was 156 00:07:00,130 --> 00:07:03,940 happening in Kenya, where many-- 157 00:07:03,940 --> 00:07:05,590 and this program happened-- 158 00:07:05,590 --> 00:07:07,370 where many other things were happening in 159 00:07:07,370 --> 00:07:08,670 this education system. 160 00:07:08,670 --> 00:07:11,550 So textbooks were being distributed, different 161 00:07:11,550 --> 00:07:13,000 teachers were being hired. 162 00:07:13,000 --> 00:07:15,520 A lot of activities were happening. 163 00:07:15,520 --> 00:07:19,310 And so if you just compare what test scores of children were 164 00:07:19,310 --> 00:07:22,880 before the program and then what test scores of children 165 00:07:22,880 --> 00:07:27,420 were after the program, you would suspect that-- 166 00:07:27,420 --> 00:07:30,570 well, first of all, if you did that, the counterfactual you 167 00:07:30,570 --> 00:07:33,760 would be assuming is that in the absence of the program, 168 00:07:33,760 --> 00:07:36,870 test scores would have remained flat. 169 00:07:36,870 --> 00:07:38,700 And that may be a reasonable 170 00:07:38,700 --> 00:07:40,530 counterfactual in some contexts. 171 00:07:40,530 --> 00:07:42,340 Not many, to be honest. 172 00:07:42,340 --> 00:07:43,340 But not in others. 173 00:07:43,340 --> 00:07:46,200 So in one context in which other things are happening in the 174 00:07:46,200 --> 00:07:50,200 education system in Kenya, it's very hard to argue that 175 00:07:50,200 --> 00:07:52,220 nothing would have changed in test scores. 176 00:07:52,220 --> 00:07:54,060 Because test scores would have increased, because there are 177 00:07:54,060 --> 00:07:55,740 lots of things that happen. 178 00:07:55,740 --> 00:07:59,130 Now suppose you implemented this same program in a very 179 00:07:59,130 --> 00:08:03,280 remote village, very secluded area where nothing else would 180 00:08:03,280 --> 00:08:04,110 have happened. 
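A minimal sketch of the before-and-after comparison described above, using simulated data rather than anything from the Kenya study. The score scale, the size of the true effect, and the size of the background trend are all hypothetical; the point is only that the naive estimate equals the program effect plus whatever else changed over the same period.

```python
import random
import statistics


def before_after_estimate(true_effect, background_trend, n=2000, seed=0):
    """Return the naive before-and-after impact estimate on simulated scores.

    All numbers are hypothetical; nothing here comes from a real study.
    """
    rng = random.Random(seed)
    before = [rng.gauss(50, 10) for _ in range(n)]
    # Each child's later score moves by the background trend (other things
    # happening in the education system) plus the program's true effect,
    # plus a little individual noise.
    after = [
        score + background_trend + true_effect + rng.gauss(0, 2)
        for score in before
    ]
    return statistics.mean(after) - statistics.mean(before)


# Busy education system: scores would have risen even without the program.
busy_system = before_after_estimate(true_effect=2.0, background_trend=5.0)
# Secluded village: nothing else changes, so "scores stay flat" is plausible.
remote_village = before_after_estimate(true_effect=2.0, background_trend=0.0)

print(f"true effect: 2.0, busy system estimate: {busy_system:.2f}")
print(f"true effect: 2.0, remote village estimate: {remote_village:.2f}")
```

In the first call the estimate absorbs the background trend and overstates the effect; in the second, where the flat-scores assumption holds, the same method lands close to the true effect. The method is identical in both calls; only the context changes.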
181 00:08:04,110 --> 00:08:07,350 You sort of have a pretty good sense that no other 182 00:08:07,350 --> 00:08:09,960 intervention was happening for one group or the 183 00:08:09,960 --> 00:08:11,430 other at the same time. 184 00:08:11,430 --> 00:08:13,240 The assumption may be more plausible. 185 00:08:13,240 --> 00:08:15,970 I think in this case, the textbook example, it's still 186 00:08:15,970 --> 00:08:18,790 questionable, because there are other educational inputs 187 00:08:18,790 --> 00:08:19,970 that may be happening. 188 00:08:19,970 --> 00:08:23,120 But the key is that the context and the method are the 189 00:08:23,120 --> 00:08:25,980 ones that together can tell you how good 190 00:08:25,980 --> 00:08:27,360 the assumption is. 191 00:08:27,360 --> 00:08:29,380 The method by itself cannot tell you. 192 00:08:29,380 --> 00:08:32,510 The method by itself may be reasonable under certain 193 00:08:32,510 --> 00:08:34,500 conditions but not under others. 194 00:08:34,500 --> 00:08:36,870 AUDIENCE: But there aren't any sort of big famous studies 195 00:08:36,870 --> 00:08:39,836 that weren't randomized, that everybody thinks they're 196 00:08:39,836 --> 00:08:40,140 pretty good? 197 00:08:40,140 --> 00:08:40,700 PROFESSOR: Yes. 198 00:08:40,700 --> 00:08:45,270 So I don't want to get a lot into this, but there's a whole 199 00:08:45,270 --> 00:08:50,780 debate now in economics literature as to whether 200 00:08:50,780 --> 00:08:53,680 randomized experiments are the only way to 201 00:08:53,680 --> 00:08:55,490 estimate causal effects. 202 00:08:55,490 --> 00:08:59,200 This is a big, big debate, and there are very respectable 203 00:08:59,200 --> 00:09:02,160 people on both sides of the debate. 
204 00:09:02,160 --> 00:09:05,060 What I can tell you is that debate has not been solved, 205 00:09:05,060 --> 00:09:08,240 but I think more and more people are sort of 206 00:09:08,240 --> 00:09:11,460 recognizing, at least, that the randomized experiment 207 00:09:11,460 --> 00:09:12,670 should be a first best. 208 00:09:12,670 --> 00:09:17,120 I think even the opponents of the method do say that. 209 00:09:17,120 --> 00:09:19,190 But the other thing I would say is there have been many 210 00:09:19,190 --> 00:09:23,210 studies trying to compare the results of an experiment with 211 00:09:23,210 --> 00:09:25,920 some of the other non-experimental methods. 212 00:09:25,920 --> 00:09:28,990 You have one in your get out the vote. 213 00:09:28,990 --> 00:09:32,950 That was not a study in which the non-experimental methods 214 00:09:32,950 --> 00:09:35,840 fared very well, but there are other studies in which they 215 00:09:35,840 --> 00:09:36,880 fared well. 216 00:09:36,880 --> 00:09:40,000 The key thing is we haven't been able to figure out under 217 00:09:40,000 --> 00:09:44,650 what conditions the non-randomized evaluations 218 00:09:44,650 --> 00:09:45,560 fared well. 219 00:09:45,560 --> 00:09:47,750 If we knew, then it would be nice. 220 00:09:47,750 --> 00:09:50,080 But I think so far, the answer-- 221 00:09:50,080 --> 00:09:51,300 we don't know. 222 00:09:51,300 --> 00:09:53,660 We know the theoretical answer, which is, if the 223 00:09:53,660 --> 00:09:57,860 assumptions hold, we're golden. 224 00:09:57,860 --> 00:10:03,290 The problem, key problem, is that this is relying on the 225 00:10:03,290 --> 00:10:07,040 assumptions, and you cannot test these assumptions. 226 00:10:07,040 --> 00:10:09,210 If you could test this assumption, if you could test 227 00:10:09,210 --> 00:10:13,380 under what assumption this mimics the counterfactuals, 228 00:10:13,380 --> 00:10:14,220 we'll be all done. 
229 00:10:14,220 --> 00:10:17,230 We'll be able to say, from the very beginning, we can use 230 00:10:17,230 --> 00:10:18,050 this method. 231 00:10:18,050 --> 00:10:20,990 You cannot, no matter how sophisticated and how good the 232 00:10:20,990 --> 00:10:23,320 non-experimental method is. 233 00:10:23,320 --> 00:10:24,220 Yes? 234 00:10:24,220 --> 00:10:24,895 You seem skeptical. 235 00:10:24,895 --> 00:10:26,860 AUDIENCE: No, no, no. 236 00:10:26,860 --> 00:10:27,800 PROFESSOR: You're--? 237 00:10:27,800 --> 00:10:28,950 OK. 238 00:10:28,950 --> 00:10:32,380 So this is very confusing. 239 00:10:32,380 --> 00:10:33,905 It's like twice they're showing-- 240 00:10:36,850 --> 00:10:38,560 you should do a randomized evaluation 241 00:10:38,560 --> 00:10:39,590 to see if this helps. 242 00:10:39,590 --> 00:10:41,400 Two boards. 243 00:10:41,400 --> 00:10:41,870 All right. 244 00:10:41,870 --> 00:10:44,620 So the randomized evaluations here, you have a bunch of 245 00:10:44,620 --> 00:10:47,030 other names in which they are known-- random assignment 246 00:10:47,030 --> 00:10:50,180 studies, randomized field trials, just in case-- 247 00:10:50,180 --> 00:10:53,360 RCTs are the way that they were known very early in the 248 00:10:53,360 --> 00:10:57,310 literature, and still nowadays in other disciplines. 249 00:10:57,310 --> 00:11:01,030 And then the non-experimental methods, all of this that you 250 00:11:01,030 --> 00:11:03,560 have here, some of which are in your get 251 00:11:03,560 --> 00:11:07,310 out the vote study. 252 00:11:07,310 --> 00:11:07,610 All right. 253 00:11:07,610 --> 00:11:10,490 So before we go into what is a randomized experiment, I want 254 00:11:10,490 --> 00:11:12,640 to introduce the notion of validity. 255 00:11:12,640 --> 00:11:14,130 And Rachel raised it a little bit. 256 00:11:14,130 --> 00:11:18,640 But we usually think of in terms of two kinds of validity 257 00:11:18,640 --> 00:11:20,570 when you assess a study. 
258 00:11:20,570 --> 00:11:22,440 The first one is internal validity. 259 00:11:22,440 --> 00:11:24,150 This has to do with your ability 260 00:11:24,150 --> 00:11:25,670 to draw causal inference. 261 00:11:25,670 --> 00:11:29,260 So your ability to attribute your impact 262 00:11:29,260 --> 00:11:30,980 estimates to the program. 263 00:11:30,980 --> 00:11:34,440 So if you said, this difference is my impact 264 00:11:34,440 --> 00:11:38,470 estimate, the study has strong internal validity if you can 265 00:11:38,470 --> 00:11:41,610 reliably attribute that to the program and not to something 266 00:11:41,610 --> 00:11:48,310 else for whatever population is represented in your study. 267 00:11:48,310 --> 00:11:53,400 So if you did the textbook project in Kenya, in a rural 268 00:11:53,400 --> 00:11:57,410 village in Kenya, well, that study-- 269 00:11:57,410 --> 00:12:00,210 if it's internally valid, or if it has strong internal 270 00:12:00,210 --> 00:12:03,620 validity, then it's going to be valid for the population 271 00:12:03,620 --> 00:12:06,755 represented by the sample you drew in Kenya, in that rural 272 00:12:06,755 --> 00:12:07,760 village in Kenya. 273 00:12:07,760 --> 00:12:10,680 External validity, on the other hand, has to do with the 274 00:12:10,680 --> 00:12:14,400 ability to generalize to other populations, other settings, 275 00:12:14,400 --> 00:12:15,650 other time periods. 276 00:12:18,370 --> 00:12:23,660 The reason I mention this is that these two things often 277 00:12:23,660 --> 00:12:27,490 trade off against each other when you are sort of trying to 278 00:12:27,490 --> 00:12:29,050 commission or conduct a study. 279 00:12:29,050 --> 00:12:32,750 So you may decide, I'm going to do this randomized trial in 280 00:12:32,750 --> 00:12:36,920 this very small place to test out my model. 281 00:12:36,920 --> 00:12:39,850 And you may be concerned with, how do I know if it 282 00:12:39,850 --> 00:12:42,830 generalizes to other settings? 
283 00:12:42,830 --> 00:12:45,990 On the other hand, you may decide, well, I'm going to use 284 00:12:45,990 --> 00:12:50,260 other kinds of methods and be representative of the whole 285 00:12:50,260 --> 00:12:52,630 Kenya, or the whole India, or whatever country you're 286 00:12:52,630 --> 00:12:53,790 working in. 287 00:12:53,790 --> 00:12:56,050 The key thing is to distinguish two things. 288 00:12:56,050 --> 00:12:59,110 The first one has to do with causal inference for your own 289 00:12:59,110 --> 00:13:02,050 sample, or for the population represented in your sample. 290 00:13:02,050 --> 00:13:05,920 The second one has to do with generalizability. 291 00:13:05,920 --> 00:13:08,150 And Rachel talked a little bit about how much you can 292 00:13:08,150 --> 00:13:10,650 generalize from experiments, and we can talk 293 00:13:10,650 --> 00:13:12,500 about that if you want. 294 00:13:12,500 --> 00:13:13,010 All right. 295 00:13:13,010 --> 00:13:15,960 So what is a randomized evaluation? 296 00:13:15,960 --> 00:13:19,360 So the very basics-- 297 00:13:19,360 --> 00:13:21,880 can someone tell me what the basics are? 298 00:13:21,880 --> 00:13:24,640 Randomized experiments? 299 00:13:24,640 --> 00:13:26,800 How do you do it? 300 00:13:26,800 --> 00:13:28,050 How does it work? 301 00:13:31,600 --> 00:13:36,420 There's one thing that you should know. 302 00:13:36,420 --> 00:13:39,760 When I first started teaching, I used to be very, very 303 00:13:39,760 --> 00:13:42,720 nervous when there was silence in the room. 304 00:13:42,720 --> 00:13:45,650 But now I'm very comfortable. 305 00:13:45,650 --> 00:13:46,990 So you tell me. 306 00:13:46,990 --> 00:13:51,120 So how does a randomized trial work? 307 00:13:51,120 --> 00:13:53,390 AUDIENCE: Allocate the subject into the treatment or the 308 00:13:53,390 --> 00:13:55,680 control group based on a random assignment. 309 00:13:55,680 --> 00:13:56,810 PROFESSOR: OK. 310 00:13:56,810 --> 00:13:57,970 Random assignment. 
311 00:13:57,970 --> 00:14:00,140 Sort of like a flip of a coin, right? 312 00:14:00,140 --> 00:14:03,940 So in the simple scenario, we take a sample of program 313 00:14:03,940 --> 00:14:04,460 applicants-- 314 00:14:04,460 --> 00:14:06,370 just like we do with drug trials-- 315 00:14:06,370 --> 00:14:09,240 take a sample of program applicants and we randomly 316 00:14:09,240 --> 00:14:11,670 assign them either to a treatment group, which is 317 00:14:11,670 --> 00:14:14,330 offered the treatment, or a control group. 318 00:14:14,330 --> 00:14:15,740 They're not offered the treatment. 319 00:14:15,740 --> 00:14:20,440 This is a very simple setting, but the idea here is that by 320 00:14:20,440 --> 00:14:23,550 doing this, the treatment and the control group are 321 00:14:23,550 --> 00:14:25,650 comparable to each other. 322 00:14:25,650 --> 00:14:28,290 And so any differences you observe between these two 323 00:14:28,290 --> 00:14:31,970 groups should be attributable to the program. 324 00:14:31,970 --> 00:14:33,880 The key about this method-- 325 00:14:33,880 --> 00:14:37,840 so these do not differ systematically at the outset 326 00:14:37,840 --> 00:14:39,070 of the experiment. 327 00:14:39,070 --> 00:14:42,320 The key about this method is that this control group is 328 00:14:42,320 --> 00:14:43,930 mimicking the counterfactuals. 329 00:14:43,930 --> 00:14:47,100 It's mimicking what will happen to the treatment group in the 330 00:14:47,100 --> 00:14:48,560 absence of the treatment. 331 00:14:48,560 --> 00:14:52,160 And the reason it's mimicking the counterfactual is that on 332 00:14:52,160 --> 00:14:55,110 average, this group should be exactly like 333 00:14:55,110 --> 00:14:55,915 the treatment group. 
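The coin-flip assignment just described can be sketched as follows. This is an illustration on simulated applicants, not the procedure of any particular study: the `baseline_score` field, the sample size, and `TRUE_EFFECT` are hypothetical names and values chosen for the example.

```python
import random
import statistics


def randomly_assign(applicants, seed=0):
    """Flip a fair coin per applicant: heads goes to treatment, tails to control."""
    rng = random.Random(seed)
    treatment, control = [], []
    for person in applicants:
        if rng.random() < 0.5:
            treatment.append(person)
        else:
            control.append(person)
    return treatment, control


# Simulated pool of program applicants (hypothetical data).
data_rng = random.Random(42)
applicants = [
    {"id": i, "baseline_score": data_rng.gauss(50, 10)} for i in range(10000)
]
treatment, control = randomly_assign(applicants, seed=1)

# At the outset, the two groups should look alike on average.
baseline_gap = statistics.mean(
    [p["baseline_score"] for p in treatment]
) - statistics.mean([p["baseline_score"] for p in control])

# Hypothetical program effect, added only to the treatment group's outcome.
TRUE_EFFECT = 3.0
treated_outcomes = [p["baseline_score"] + TRUE_EFFECT for p in treatment]
control_outcomes = [p["baseline_score"] for p in control]
estimate = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)

print(f"group sizes: {len(treatment)} treatment, {len(control)} control")
print(f"baseline gap between groups: {baseline_gap:.2f}")
print(f"estimated effect: {estimate:.2f} (true effect {TRUE_EFFECT})")
```

Because assignment ignores everything about the applicants, the baseline gap should be small relative to the spread of scores, and the simple difference in mean outcomes should land near the true effect. The control group's mean is standing in for the counterfactual.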
334 00:14:55,915 --> 00:14:59,950 So if we took all of you and we flip coins, from each of 335 00:14:59,950 --> 00:15:02,710 you we flip coins, and then you ended up in two different 336 00:15:02,710 --> 00:15:07,110 groups, the two groups would have, on average, the same 337 00:15:07,110 --> 00:15:08,440 characteristics. 338 00:15:08,440 --> 00:15:11,370 So the same people that come from a 339 00:15:11,370 --> 00:15:12,830 certain area of the world. 340 00:15:12,830 --> 00:15:14,320 The same percent of females. 341 00:15:14,320 --> 00:15:15,810 The same average intelligence. 342 00:15:15,810 --> 00:15:17,520 The same average income. 343 00:15:17,520 --> 00:15:19,330 The same average education. 344 00:15:19,330 --> 00:15:20,090 You name it. 345 00:15:20,090 --> 00:15:22,540 We're going to do an exercise where you can see this. 346 00:15:22,540 --> 00:15:25,470 The beauty of this method is that the two groups 347 00:15:25,470 --> 00:15:28,940 statistically are going to be identical to each other. 348 00:15:28,940 --> 00:15:32,620 If they're not identical to each other statistically then 349 00:15:32,620 --> 00:15:34,030 you don't have random assignment. 350 00:15:34,030 --> 00:15:35,250 It has failed. 351 00:15:35,250 --> 00:15:37,340 Random assignment. 352 00:15:37,340 --> 00:15:40,110 So the random assignment is the process you employ to 353 00:15:40,110 --> 00:15:42,280 create these two comparable groups. 354 00:15:42,280 --> 00:15:45,950 The huge advantage of this random assignment is that you 355 00:15:45,950 --> 00:15:49,780 don't need to think about, are the two groups the same on 356 00:15:49,780 --> 00:15:52,350 this characteristic that I care about? 357 00:15:52,350 --> 00:15:54,180 You don't need to think about that. 358 00:15:54,180 --> 00:15:57,680 The two groups should be the same on those characteristics. 359 00:15:57,680 --> 00:15:58,620 AUDIENCE: So that's theoretically. 
360 00:15:58,620 --> 00:16:01,590 So now thinking in terms of a program where you have, say, 361 00:16:01,590 --> 00:16:03,480 selection criteria. 362 00:16:03,480 --> 00:16:05,830 So let's say you want to do a program in a particular 363 00:16:05,830 --> 00:16:09,490 district, and you're looking for people that have three 364 00:16:09,490 --> 00:16:11,750 characteristics that are all the same. 365 00:16:11,750 --> 00:16:13,840 Let's say for whatever reason, the number of people that 366 00:16:13,840 --> 00:16:17,065 present themselves in that way is a relatively small number. 367 00:16:19,670 --> 00:16:22,980 Then you can randomly select within that small number. 368 00:16:22,980 --> 00:16:25,900 But then you're challenged by the size of your group. 369 00:16:25,900 --> 00:16:26,710 PROFESSOR: Absolutely. 370 00:16:26,710 --> 00:16:29,420 And on Thursday, you'll get to that 371 00:16:29,420 --> 00:16:31,220 minimum sample size detected. 372 00:16:31,220 --> 00:16:34,090 But the key there, if those three characteristics are your 373 00:16:34,090 --> 00:16:37,250 selection criteria, you don't want to modify your selection 374 00:16:37,250 --> 00:16:39,180 criteria because someone is going to come and do an 375 00:16:39,180 --> 00:16:40,040 experiment. 376 00:16:40,040 --> 00:16:42,500 You want to offer the program to whoever you're going to 377 00:16:42,500 --> 00:16:43,640 offer the program. 378 00:16:43,640 --> 00:16:46,010 So those three characteristics are key for your program, 379 00:16:46,010 --> 00:16:49,050 because you decide those are the people you want to serve, 380 00:16:49,050 --> 00:16:52,040 then you need to find a way to do your evaluation that 381 00:16:52,040 --> 00:16:54,270 doesn't involve relaxing that criteria. 
382 00:16:54,270 --> 00:16:57,070 Unless you really are thinking, well, it would be 383 00:16:57,070 --> 00:16:59,250 interesting to know if I served this other group, 384 00:16:59,250 --> 00:17:01,590 whether the program has a different effect or no. 385 00:17:01,590 --> 00:17:03,620 AUDIENCE: But you can't mix and match among the criteria. 386 00:17:03,620 --> 00:17:05,280 You can't say-- or could you? 387 00:17:05,280 --> 00:17:06,640 Let's say you have trouble. 388 00:17:06,640 --> 00:17:07,710 You're not getting enough people 389 00:17:07,710 --> 00:17:08,710 with those three criteria. 390 00:17:08,710 --> 00:17:11,090 So you say, OK, now we're going to make it six criteria, 391 00:17:11,090 --> 00:17:13,339 and we'll be happy if they only meet four of the six. 392 00:17:13,339 --> 00:17:16,630 That right there would not make it possible to do this. 393 00:17:16,630 --> 00:17:22,160 PROFESSOR: So if, at the end of your processes, where 394 00:17:22,160 --> 00:17:24,859 you're saying three criteria, six criteria, five, four, 395 00:17:24,859 --> 00:17:27,490 whatever you say-- if at the end of this process, you end 396 00:17:27,490 --> 00:17:32,340 up with a large enough pool to be able to randomly assign 397 00:17:32,340 --> 00:17:34,930 into two groups, treatment and control? 398 00:17:34,930 --> 00:17:36,190 No problem. 399 00:17:36,190 --> 00:17:37,495 You could have relaxed the criteria. 400 00:17:37,495 --> 00:17:41,990 You could have said six, five, four, whatever you want. 401 00:17:41,990 --> 00:17:45,960 My previous answer is more to, don't change the criteria just 402 00:17:45,960 --> 00:17:47,580 because you want to do a randomized trial. 403 00:17:47,580 --> 00:17:49,180 You want to evaluate the program 404 00:17:49,180 --> 00:17:50,380 that you want to evaluate. 405 00:17:50,380 --> 00:17:52,790 You don't want to evaluate the program that you think will 406 00:17:52,790 --> 00:17:55,180 fit the randomized design. 
407 00:17:55,180 --> 00:17:56,800 Make sense? 408 00:17:56,800 --> 00:17:59,703 Other questions, comments? 409 00:17:59,703 --> 00:18:01,180 No? 410 00:18:01,180 --> 00:18:01,520 OK. 411 00:18:01,520 --> 00:18:04,630 So the two groups did not differ systematically at the 412 00:18:04,630 --> 00:18:05,610 outset of the experiment. 413 00:18:05,610 --> 00:18:06,990 I want to emphasize this. 414 00:18:06,990 --> 00:18:09,070 And again, there's going to be an exercise where you can see 415 00:18:09,070 --> 00:18:10,180 this in Excel. 416 00:18:10,180 --> 00:18:13,850 But the key is that the two groups will be identical both 417 00:18:13,850 --> 00:18:15,875 on observable characteristics and non-observable. 418 00:18:15,875 --> 00:18:17,920 And when I say identical, they're identical 419 00:18:17,920 --> 00:18:18,730 statistically. 420 00:18:18,730 --> 00:18:20,910 It's not like the means of these two groups 421 00:18:20,910 --> 00:18:21,980 are exactly the same. 422 00:18:21,980 --> 00:18:26,290 They are statistically identical in the sense that 423 00:18:26,290 --> 00:18:28,560 you should not observe a pattern of statistically 424 00:18:28,560 --> 00:18:31,210 significant differences between the two groups. 425 00:18:31,210 --> 00:18:34,370 If you were to test 100 characteristics, then five of 426 00:18:34,370 --> 00:18:36,670 them may end up being statistically significant, 427 00:18:36,670 --> 00:18:40,240 just because of the luck of the draw or multiple testing. 428 00:18:40,240 --> 00:18:43,690 But they shouldn't differ systematically at the outset 429 00:18:43,690 --> 00:18:46,130 of the experiment. 430 00:18:46,130 --> 00:18:49,240 And this is the key. 431 00:18:49,240 --> 00:18:51,940 The whole key of impact evaluation is that then you 432 00:18:51,940 --> 00:18:54,660 can take that difference and attribute it to the program. 
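The remark about testing 100 characteristics can be illustrated with a small simulation. This is a sketch under stated assumptions: the characteristics are simulated as independent standard-normal variables unrelated to assignment, significance is judged with a simple large-sample z-test at the 5% level, and the function name and sample sizes are hypothetical.

```python
import math
import random
import statistics


def count_significant_differences(n_people=1000, n_characteristics=100, seed=0):
    """Count baseline characteristics that differ 'significantly' by chance.

    Assignment is a coin flip, so any significant difference found here is
    pure luck of the draw. Data are simulated, not taken from a real study.
    """
    rng = random.Random(seed)
    in_treatment = [rng.random() < 0.5 for _ in range(n_people)]
    significant = 0
    for _ in range(n_characteristics):
        values = [rng.gauss(0, 1) for _ in range(n_people)]
        treat = [v for v, t in zip(values, in_treatment) if t]
        ctrl = [v for v, t in zip(values, in_treatment) if not t]
        diff = statistics.mean(treat) - statistics.mean(ctrl)
        std_err = math.sqrt(
            statistics.variance(treat) / len(treat)
            + statistics.variance(ctrl) / len(ctrl)
        )
        # Two-sided z-test at the 5% level (large-sample approximation).
        if abs(diff / std_err) > 1.96:
            significant += 1
    return significant


n_flagged = count_significant_differences()
print(f"{n_flagged} of 100 characteristics differ at the 5% level")
```

With a 5% threshold, roughly five flags out of 100 are expected even under perfectly random assignment, and the exact count varies with the seed. A handful of scattered flags is therefore unremarkable; a clear pattern of many significant differences would suggest the assignment was not actually random.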
433 00:18:54,660 --> 00:18:57,510 And then you're not thinking, is it the program, or is it 434 00:18:57,510 --> 00:19:00,650 some pre-existing differences between the groups? 435 00:19:00,650 --> 00:19:04,710 If you reach the end of an impact evaluation and you're 436 00:19:04,710 --> 00:19:08,180 wondering, is it the program, or is it something else? 437 00:19:08,180 --> 00:19:10,560 Unfortunately, that's not a very good impact evaluation. 438 00:19:16,780 --> 00:19:18,680 So there are some variations on the basics. 439 00:19:18,680 --> 00:19:20,530 You could assign to multiple treatment groups. 440 00:19:20,530 --> 00:19:23,490 So rather than having only one treatment, you could have 441 00:19:23,490 --> 00:19:25,230 multiple treatments. 442 00:19:25,230 --> 00:19:27,460 And this happens a lot if you're trying to test 443 00:19:27,460 --> 00:19:30,080 different ways of implementing a program. 444 00:19:30,080 --> 00:19:35,100 So you may have a program that you're thinking, well, I don't 445 00:19:35,100 --> 00:19:37,450 know if the best way to deliver it is method number 446 00:19:37,450 --> 00:19:38,920 one or method number two. 447 00:19:38,920 --> 00:19:41,570 And you may randomize into three groups. 448 00:19:41,570 --> 00:19:43,360 Method number one, method number two, 449 00:19:43,360 --> 00:19:44,490 and a control group. 450 00:19:44,490 --> 00:19:47,090 Or you may decide to do away with the control group and 451 00:19:47,090 --> 00:19:50,320 only randomize into, say, three methods, three ways of 452 00:19:50,320 --> 00:19:51,640 delivering an intervention. 453 00:19:51,640 --> 00:19:54,180 If you do away with the control group, you're going to 454 00:19:54,180 --> 00:19:56,320 be able to answer the question, is one treatment 455 00:19:56,320 --> 00:19:57,400 better than the other?
456 00:19:57,400 --> 00:19:59,790 But you're not going to be able to answer the question, 457 00:19:59,790 --> 00:20:02,580 is any of this treatment better than what would have 458 00:20:02,580 --> 00:20:04,950 happened in the absence of the program? 459 00:20:04,950 --> 00:20:06,210 So this is one variation. 460 00:20:06,210 --> 00:20:09,190 And the other variation, we were talking about when Iqbal 461 00:20:09,190 --> 00:20:10,470 answered the question. 462 00:20:10,470 --> 00:20:12,120 He said, well, you have a bunch of people. 463 00:20:12,120 --> 00:20:13,490 You assign some to the treatment or 464 00:20:13,490 --> 00:20:14,720 to the control group. 465 00:20:14,720 --> 00:20:17,800 You can assign units other than people or households. 466 00:20:17,800 --> 00:20:21,770 Health centers, schools, local government, villages. 467 00:20:21,770 --> 00:20:23,740 And you can see in JPAL's website. 468 00:20:23,740 --> 00:20:25,950 There are a bunch of examples where each of these have been 469 00:20:25,950 --> 00:20:29,910 used as units for random assignment. 470 00:20:29,910 --> 00:20:31,400 Yes? 471 00:20:31,400 --> 00:20:32,280 Your name, please? 472 00:20:32,280 --> 00:20:34,340 We don't have name tags, but I like to call 473 00:20:34,340 --> 00:20:35,630 people by their name. 474 00:20:35,630 --> 00:20:36,550 Wendy? 475 00:20:36,550 --> 00:20:37,920 Go ahead. 476 00:20:37,920 --> 00:20:45,630 AUDIENCE: So if we pick schools, my conclusions will 477 00:20:45,630 --> 00:20:47,380 be about schools. 478 00:20:47,380 --> 00:20:50,427 They won't be about the students in the school. 479 00:20:50,427 --> 00:20:51,590 Or is that wrong? 480 00:20:51,590 --> 00:20:54,970 PROFESSOR: So it depends on-- you say your conclusions will 481 00:20:54,970 --> 00:20:57,530 be about the schools? 482 00:20:57,530 --> 00:21:01,320 The key thing is, what is the unit of intervention here?
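The two variations just described, several treatment arms and units other than individuals, can be sketched with one small helper. The function name `assign_to_arms` and the arm labels are hypothetical and chosen only for illustration; the sketch assumes every unit has a unique identifier.

```python
import random


def assign_to_arms(units, arms, seed=0):
    """Randomly split units (people, schools, villages...) across arms of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out like cards so arm sizes differ by at most one.
    return {unit: arms[i % len(arms)] for i, unit in enumerate(shuffled)}
```

For example, `assign_to_arms([f"school_{i}" for i in range(9)], ["method_1", "method_2", "control"])` would place three schools in each arm. Dropping `"control"` from the list gives the design that can compare methods with each other but not with the absence of the program.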
483 00:21:01,320 --> 00:21:04,990 So it's a program that's directed at all the children 484 00:21:04,990 --> 00:21:08,110 in the school, only some children in the school? 485 00:21:08,110 --> 00:21:10,970 In part, the decision of what you randomize, whether it's 486 00:21:10,970 --> 00:21:13,990 schools or children within schools, depends on what's the 487 00:21:13,990 --> 00:21:16,040 nature of the treatment. 488 00:21:16,040 --> 00:21:19,660 So if you have a program that serves everyone 489 00:21:19,660 --> 00:21:21,280 in the school, yes. 490 00:21:21,280 --> 00:21:24,740 Your assignment should be at the school level. 491 00:21:24,740 --> 00:21:27,020 That is, you should have some schools that receive the 492 00:21:27,020 --> 00:21:28,970 program and others that don't. 493 00:21:28,970 --> 00:21:31,540 But if you have a program that is only going to serve some 494 00:21:31,540 --> 00:21:35,820 children in the school, then your assignment could be 495 00:21:35,820 --> 00:21:38,570 within the school, and you have some children who receive 496 00:21:38,570 --> 00:21:41,840 the treatment, and others that do not. 497 00:21:41,840 --> 00:21:44,630 The key, though, is if you're using your second method, you 498 00:21:44,630 --> 00:21:46,610 want to make sure there are no spillovers. 499 00:21:46,610 --> 00:21:49,070 You want to make sure that someone receiving the 500 00:21:49,070 --> 00:21:53,020 treatment is not going to affect the outcomes of someone 501 00:21:53,020 --> 00:21:55,030 not receiving the treatment. 502 00:21:55,030 --> 00:21:56,550 And so you're going to see the spillovers. 503 00:21:56,550 --> 00:21:59,700 That's something you're going to see on Friday. 504 00:21:59,700 --> 00:22:03,500 But the basic idea is, what level of randomization you 505 00:22:03,500 --> 00:22:06,890 have depends on, what is the level of your treatment? 
506 00:22:06,890 --> 00:22:09,240 If you're treating schools, if you're treating individuals 507 00:22:09,240 --> 00:22:10,775 within schools, et cetera. 508 00:22:10,775 --> 00:22:14,060 AUDIENCE: So statistically I want them to be the same. 509 00:22:14,060 --> 00:22:18,310 PROFESSOR: You want them to be the same, yes. 510 00:22:18,310 --> 00:22:19,555 AUDIENCE: My name is Manuel. 511 00:22:19,555 --> 00:22:21,485 Please talk a little bit about the unobserved 512 00:22:21,485 --> 00:22:22,590 characteristics. 513 00:22:22,590 --> 00:22:23,930 PROFESSOR: Yes. 514 00:22:23,930 --> 00:22:26,960 So the unobserved characteristics-- 515 00:22:26,960 --> 00:22:30,220 this is something that a lot of the non-experimental 516 00:22:30,220 --> 00:22:32,430 methods wrestle with. 517 00:22:32,430 --> 00:22:38,390 And the idea is, the randomized experiment creates 518 00:22:38,390 --> 00:22:42,680 these two groups that, by pure laws of statistics, are 519 00:22:42,680 --> 00:22:46,480 identical in every single characteristic, 520 00:22:46,480 --> 00:22:48,060 statistically speaking. 521 00:22:48,060 --> 00:22:49,920 So both the ones you observe and the 522 00:22:49,920 --> 00:22:51,390 ones you don't observe. 523 00:22:51,390 --> 00:22:54,410 So if we were trying to do an experiment in this classroom 524 00:22:54,410 --> 00:22:58,575 and I randomly assigned you into two groups, I can be 525 00:22:58,575 --> 00:23:02,430 confident that even things I don't observe about you, 526 00:23:02,430 --> 00:23:04,920 you're going to be balanced across those two groups. 527 00:23:04,920 --> 00:23:09,480 If instead I try to match you, I use all the information you 528 00:23:09,480 --> 00:23:14,450 gave me on your application forms and say OK, these people 529 00:23:14,450 --> 00:23:15,450 are from this-- 530 00:23:15,450 --> 00:23:19,200 I'm going to be able to do so with the observables, but not 531 00:23:19,200 --> 00:23:20,520 with the unobservables. 
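The point about unobserved characteristics can be illustrated with a simulation in which a trait the evaluator never measures is compared across groups. Everything here is an assumption made for the sketch: the trait is called `motivation`, it is standard normal, and the non-random comparison is a simple self-selection rule rather than any specific matching method.

```python
import random
import statistics


def unobserved_gap(n_people=2000, seed=0):
    """Return the gap in a hidden trait under (random assignment, self-selection)."""
    rng = random.Random(seed)
    # 'motivation' stands in for a trait the evaluator never gets to measure.
    motivation = [rng.gauss(0, 1) for _ in range(n_people)]

    # Random assignment: a coin flip that ignores motivation.
    coin = [rng.random() < 0.5 for _ in range(n_people)]
    # Assumed self-selection rule: more motivated people are more likely to enroll.
    enrolls = [m + rng.gauss(0, 1) > 0 for m in motivation]

    def gap(flags):
        in_group = [m for m, f in zip(motivation, flags) if f]
        out_group = [m for m, f in zip(motivation, flags) if not f]
        return statistics.mean(in_group) - statistics.mean(out_group)

    return gap(coin), gap(enrolls)
```

Under these assumptions the first gap should be close to zero, while the second should be large, even though nobody looked at `motivation` when forming either pair of groups.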
532 00:23:20,520 --> 00:23:23,540 And again, depending on how important these unobservables 533 00:23:23,540 --> 00:23:27,080 are in explaining the outcomes, that may be a big 534 00:23:27,080 --> 00:23:29,810 disadvantage or not so big disadvantage. 535 00:23:29,810 --> 00:23:33,640 And this is what happened in the get out the vote example. 536 00:23:33,640 --> 00:23:37,730 You were able to observe some characteristics of people. 537 00:23:37,730 --> 00:23:40,660 And then non-experimental methods, all of them-- 538 00:23:40,660 --> 00:23:42,250 I mean, not all of them, but most of them-- 539 00:23:42,250 --> 00:23:45,710 can address those. 540 00:23:45,710 --> 00:23:48,450 Some of the methods can also address some unobservables, 541 00:23:48,450 --> 00:23:53,170 but again, they always rely on some assumption about how 542 00:23:53,170 --> 00:23:55,060 those unobservables behave. 543 00:23:55,060 --> 00:23:58,090 Here you're not relying on any assumptions. 544 00:23:58,090 --> 00:24:00,750 You need to do the random assignment properly, but once 545 00:24:00,750 --> 00:24:05,700 it's done properly, you're not relying on any assumption. 546 00:24:05,700 --> 00:24:07,220 AUDIENCE: Is that the general dichotomy? 547 00:24:07,220 --> 00:24:12,992 There's randomized tests, and then matched pairs tests? 548 00:24:12,992 --> 00:24:15,066 Or is there other-- is it generally broken 549 00:24:15,066 --> 00:24:17,100 down into those two? 550 00:24:17,100 --> 00:24:21,220 PROFESSOR: So the way that I think most people break it 551 00:24:21,220 --> 00:24:25,520 down is randomized, where you use this random assignment, 552 00:24:25,520 --> 00:24:28,860 and then non-experimental methods. 553 00:24:28,860 --> 00:24:31,760 But I don't mean to imply that all the non-experimental 554 00:24:31,760 --> 00:24:33,440 methods are the same. 555 00:24:33,440 --> 00:24:35,620 And in fact, there are some people who called them 556 00:24:35,620 --> 00:24:37,120 quasi-experimental methods.
557 00:24:37,120 --> 00:24:40,840 Those people tend to think of them a little bit higher than 558 00:24:40,840 --> 00:24:42,460 the non-experimental methods. 559 00:24:42,460 --> 00:24:46,100 Non-experimental people tend to say, this is not good. 560 00:24:46,100 --> 00:24:49,990 Quasi-experimental, oh, this gets closer to the experiment. 561 00:24:49,990 --> 00:24:55,470 But the key thing here is that whatever method you use, the 562 00:24:55,470 --> 00:25:00,010 key is how are the people getting into the program being 563 00:25:00,010 --> 00:25:03,500 selected, and how are you forming that comparison group, 564 00:25:03,500 --> 00:25:07,060 and what statistical techniques are you using to 565 00:25:07,060 --> 00:25:10,550 adjust for whether that comparison group is the same 566 00:25:10,550 --> 00:25:11,810 or not than the treatment? 567 00:25:11,810 --> 00:25:13,290 So the dichotomy is not between 568 00:25:13,290 --> 00:25:14,310 randomized and matching. 569 00:25:14,310 --> 00:25:17,100 The dichotomy is usually between randomized and 570 00:25:17,100 --> 00:25:18,530 everything else. 571 00:25:18,530 --> 00:25:20,660 But within everything else, there are methods that are 572 00:25:20,660 --> 00:25:23,010 much better than others. 573 00:25:23,010 --> 00:25:24,260 Yes? 574 00:25:26,110 --> 00:25:27,674 [? Holgo? ?] 575 00:25:27,674 --> 00:25:30,536 AUDIENCE: How do we randomize when we assign people into 576 00:25:30,536 --> 00:25:31,967 treatment and control groups, besides a lottery? 577 00:25:31,967 --> 00:25:33,217 [INAUDIBLE] 578 00:25:35,310 --> 00:25:36,750 PROFESSOR: You mean the process? 579 00:25:36,750 --> 00:25:39,220 So tomorrow, the whole day is going to be 580 00:25:39,220 --> 00:25:40,650 about how to randomize. 581 00:25:40,650 --> 00:25:44,650 But the basic idea is, you can do it in a variety of ways. 582 00:25:44,650 --> 00:25:47,250 You can do it in a computer, which allows you a lot more 583 00:25:47,250 --> 00:25:48,640 flexibility. 
584 00:25:48,640 --> 00:25:52,630 But if for any reason, you need to show people that 585 00:25:52,630 --> 00:25:55,210 you're doing it in a random, transparent manner, that can 586 00:25:55,210 --> 00:25:56,070 also be done. 587 00:25:56,070 --> 00:26:01,910 We just did one in Niger in West Africa where we used 588 00:26:01,910 --> 00:26:03,060 bingo balls. 589 00:26:03,060 --> 00:26:05,780 So literally, people would draw from there, and then 590 00:26:05,780 --> 00:26:07,140 everyone could see. 591 00:26:07,140 --> 00:26:10,430 If we had brought a computer into their room in Niger and 592 00:26:10,430 --> 00:26:13,300 tried to do things, it just wouldn't have worked. 593 00:26:13,300 --> 00:26:16,080 People would have said, what are you doing here? 594 00:26:16,080 --> 00:26:19,770 So there are many different ways of randomizing. 595 00:26:19,770 --> 00:26:22,070 The key-- and this is something we're going to talk 596 00:26:22,070 --> 00:26:23,200 about in a little bit-- 597 00:26:23,200 --> 00:26:26,590 is what exactly is the process that you use to make sure that 598 00:26:26,590 --> 00:26:29,510 it's random assignment, not the how, you know, whether 599 00:26:29,510 --> 00:26:34,640 it's bingo balls or a lottery or a coin or whatever it is. 600 00:26:34,640 --> 00:26:35,090 Yes? 601 00:26:35,090 --> 00:26:38,520 AUDIENCE: So at what point this week will we talk about 602 00:26:38,520 --> 00:26:42,590 the ethical dimensions of denying treatment to someone? 603 00:26:42,590 --> 00:26:43,040 PROFESSOR: OK. 604 00:26:43,040 --> 00:26:46,530 Like in three slides, you can jump at me 605 00:26:46,530 --> 00:26:49,360 with the ethical issues. 606 00:26:49,360 --> 00:26:52,550 And then if I don't satisfy you, you have four more days 607 00:26:52,550 --> 00:26:55,340 to jump at every single person who comes into this room. 608 00:26:58,560 --> 00:27:01,210 So what I want to give you is a little bit of 609 00:27:01,210 --> 00:27:02,170 the nuts and bolts.
610 00:27:02,170 --> 00:27:05,040 Rather than keep this discussion in the abstract, 611 00:27:05,040 --> 00:27:06,730 this is what happens in the experiment. 612 00:27:06,730 --> 00:27:09,160 The nuts and bolts, if you wanted to do a randomized 613 00:27:09,160 --> 00:27:12,240 experiment tomorrow, these are sort of eight key steps that 614 00:27:12,240 --> 00:27:14,650 you need to think about. 615 00:27:14,650 --> 00:27:17,410 This is a very simplified description of the process. 616 00:27:17,410 --> 00:27:21,390 As those people sitting in the back will tell you, this is 617 00:27:21,390 --> 00:27:22,420 very simplified. 618 00:27:22,420 --> 00:27:25,900 Their daily lives are consumed with many of the steps, and 619 00:27:25,900 --> 00:27:30,520 they work months, if not years, in each of these. 620 00:27:30,520 --> 00:27:33,530 The first step, and I can't emphasize this enough, is to 621 00:27:33,530 --> 00:27:35,870 design the study carefully. 622 00:27:35,870 --> 00:27:41,550 So no matter what you do, what you do at the beginning is 623 00:27:41,550 --> 00:27:44,480 going to affect your study for the rest of the study. 624 00:27:44,480 --> 00:27:47,450 This is true for some things in life and not others. 625 00:27:47,450 --> 00:27:51,420 For evaluations, impact evaluations, if you don't do 626 00:27:51,420 --> 00:27:53,900 it right at the beginning, you're going to be in trouble. 627 00:27:53,900 --> 00:27:55,360 That's going to come down to haunt you. 628 00:27:55,360 --> 00:27:58,690 So anything you can do to spend time at the beginning, 629 00:27:58,690 --> 00:28:02,120 making sure that the study is designed properly, is going to 630 00:28:02,120 --> 00:28:03,890 be very helpful.
631 00:28:03,890 --> 00:28:07,690 What that means, in very practical terms, is if you are 632 00:28:07,690 --> 00:28:10,860 in a position where you are commissioning a study, and you 633 00:28:10,860 --> 00:28:14,280 don't have people in your staff who are expert at this, 634 00:28:14,280 --> 00:28:16,470 make sure that whoever is going to help you do the 635 00:28:16,470 --> 00:28:19,600 evaluation is involved from the very beginning. 636 00:28:19,600 --> 00:28:24,010 What this also means is that calling someone three years 637 00:28:24,010 --> 00:28:26,100 after the program was implemented, saying, can you 638 00:28:26,100 --> 00:28:28,060 come and evaluate? 639 00:28:28,060 --> 00:28:31,980 That leaves the evaluator with very few options. 640 00:28:31,980 --> 00:28:37,060 So the earlier the evaluators are involved, the better the 641 00:28:37,060 --> 00:28:39,350 options are in terms of how you can do this. 642 00:28:39,350 --> 00:28:42,550 Both in terms of the validity of the evaluation, but also in 643 00:28:42,550 --> 00:28:47,010 terms of how it will interact with the program in a way that 644 00:28:47,010 --> 00:28:49,140 it doesn't disrupt the program. 645 00:28:49,140 --> 00:28:50,360 So this is key. 646 00:28:50,360 --> 00:28:53,870 And we can talk about design a little bit now, but you will 647 00:28:53,870 --> 00:28:56,670 learn a little bit about design when you speak about 648 00:28:56,670 --> 00:28:59,960 sample size, about measurement issues, and all of those 649 00:28:59,960 --> 00:29:01,090 sessions are coming. 650 00:29:01,090 --> 00:29:02,110 How to randomize. 651 00:29:02,110 --> 00:29:05,930 So Wednesday and Thursday are really about that. 652 00:29:05,930 --> 00:29:08,750 The second one is to randomly assign people to treatment or 653 00:29:08,750 --> 00:29:11,790 control or more groups, if there are more than those. 654 00:29:11,790 --> 00:29:13,840 The third one is to collect baseline data. 
655 00:29:13,840 --> 00:29:15,660 So this is a big question that comes up. 656 00:29:15,660 --> 00:29:17,680 Should you collect baseline data? 657 00:29:17,680 --> 00:29:23,300 I think my answer to that is, in general, if you don't have 658 00:29:23,300 --> 00:29:27,250 a randomized evaluation, it's going to be very, very, very 659 00:29:27,250 --> 00:29:30,530 difficult to get away without baseline data. 660 00:29:30,530 --> 00:29:32,180 There are some methods that work, but 661 00:29:32,180 --> 00:29:33,170 it's going to be difficult. 662 00:29:33,170 --> 00:29:36,220 By baseline, I mean, before the intervention started. 663 00:29:36,220 --> 00:29:41,110 If you have a randomized trial it would be highly preferable 664 00:29:41,110 --> 00:29:43,390 to have baseline data. 665 00:29:43,390 --> 00:29:44,530 Highly preferable. 666 00:29:44,530 --> 00:29:47,630 But not as critical as with other methods. 667 00:29:47,630 --> 00:29:49,400 And it's preferable in two ways. 668 00:29:49,400 --> 00:29:53,240 The first one is if you have a baseline data, you can verify, 669 00:29:53,240 --> 00:29:55,910 at least in terms of those characteristics you collected 670 00:29:55,910 --> 00:29:58,730 in the baseline survey, you can verify that the 671 00:29:58,730 --> 00:30:00,040 two groups look alike. 672 00:30:00,040 --> 00:30:03,280 This is a nice thing to verify at the beginning and not at 673 00:30:03,280 --> 00:30:04,660 the end of the evaluation. 674 00:30:04,660 --> 00:30:06,600 So if you can do it, that would be helpful. 675 00:30:06,600 --> 00:30:08,470 And the second thing you have to do is-- yes? 676 00:30:08,470 --> 00:30:09,370 AUDIENCE: Sorry. 677 00:30:09,370 --> 00:30:11,880 What happens if, at the baseline data, you realize 678 00:30:11,880 --> 00:30:14,370 that the two groups that you made were not random? 679 00:30:14,370 --> 00:30:16,820 Do you go and keep randomizing until you get there? 680 00:30:16,820 --> 00:30:18,400 PROFESSOR: So it depends.
681 00:30:18,400 --> 00:30:20,950 It depends on when you discovered this. 682 00:30:20,950 --> 00:30:23,620 If you discover this when the treatment is already being 683 00:30:23,620 --> 00:30:27,130 implemented, it is too late to do anything else in terms of 684 00:30:27,130 --> 00:30:28,200 re-randomizing. 685 00:30:28,200 --> 00:30:31,980 The ideal scenario is one in which you can do this, collect 686 00:30:31,980 --> 00:30:35,830 the baseline data, randomize, verify that they are similar, 687 00:30:35,830 --> 00:30:38,570 and then if they are not similar, then you can 688 00:30:38,570 --> 00:30:41,070 re-randomize again. 689 00:30:41,070 --> 00:30:44,120 There's controversy about how many times you should do this, 690 00:30:44,120 --> 00:30:47,980 but for the most part, in general, if you randomize, the 691 00:30:47,980 --> 00:30:49,440 two groups should look similar. 692 00:30:49,440 --> 00:30:53,720 There are very few scenarios, but they exist, where they 693 00:30:53,720 --> 00:30:55,040 don't look similar to each other. 694 00:30:55,040 --> 00:30:57,770 And if you reach one of those scenarios, you can 695 00:30:57,770 --> 00:30:59,420 re-randomize. 696 00:30:59,420 --> 00:31:02,910 What you can't do is re-randomize when the 697 00:31:02,910 --> 00:31:04,580 treatment is already being distributed. 698 00:31:04,580 --> 00:31:06,740 So if you already decided, you're in the treatment group, 699 00:31:06,740 --> 00:31:08,260 you're in the control group, you can't 700 00:31:08,260 --> 00:31:11,200 re-randomize at that phase. 701 00:31:11,200 --> 00:31:13,350 The second reason you want to collect data, and this is 702 00:31:13,350 --> 00:31:15,530 going to be important particularly in a setting like 703 00:31:15,530 --> 00:31:18,310 yours, if you are worried about sample size, is that it 704 00:31:18,310 --> 00:31:19,960 buys a lot of statistical power. 
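The ideal sequence described above (collect baseline data, randomize, verify similarity, and re-randomize only before any treatment is delivered) might be sketched as follows. The function name, the balance threshold `max_gap`, and the cap `max_attempts` are all hypothetical; as noted in the discussion, how many re-draws are acceptable is contested, so the cap here is purely an illustrative assumption.

```python
import random
import statistics


def randomize_until_balanced(baseline, max_gap=0.1, max_attempts=20, seed=0):
    """Return (treatment_ids, attempts) for the first draw whose baseline means are close.

    baseline maps each unit id to one baseline measurement. This must run before
    anyone is told their group: once treatment starts, re-drawing is off the table.
    """
    rng = random.Random(seed)
    ids = list(baseline)
    for attempt in range(1, max_attempts + 1):
        rng.shuffle(ids)
        half = len(ids) // 2
        treatment, control = ids[:half], ids[half:]
        gap = abs(statistics.mean(baseline[i] for i in treatment)
                  - statistics.mean(baseline[i] for i in control))
        if gap <= max_gap:  # assumed rule: accept when the raw mean gap is small
            return set(treatment), attempt
    raise RuntimeError("no balanced draw found; revisit the design or the threshold")
```

In most cases the first draw should already pass a reasonable threshold, which matches the observation that randomized groups usually look similar.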
705 00:31:19,960 --> 00:31:22,540 Particularly if you can collect data on the baseline 706 00:31:22,540 --> 00:31:24,800 version of the outcomes that you care about. 707 00:31:24,800 --> 00:31:27,730 If you can do that, it's highly desirable. 708 00:31:27,730 --> 00:31:31,310 The reality is that sometimes it's feasible to collect 709 00:31:31,310 --> 00:31:34,080 baseline data and sometimes the nature of implementation 710 00:31:34,080 --> 00:31:35,930 of the program makes it difficult. 711 00:31:35,930 --> 00:31:42,804 But you will do well if you can collect baseline data. 712 00:31:42,804 --> 00:31:46,020 AUDIENCE: Wouldn't it seem that by the very fact of 713 00:31:46,020 --> 00:31:48,990 collecting the baseline data, once we have already 714 00:31:48,990 --> 00:31:57,547 randomized, can bias this randomized by collecting the 715 00:31:57,547 --> 00:31:58,720 baseline data? 716 00:31:58,720 --> 00:32:05,120 PROFESSOR: Because you're affecting the people who are 717 00:32:05,120 --> 00:32:07,000 answering the survey? 718 00:32:07,000 --> 00:32:10,210 Well, this has to do a little bit more with survey design 719 00:32:10,210 --> 00:32:11,600 than with any other thing. 720 00:32:11,600 --> 00:32:15,530 The key is, you're going to collect baseline data for both 721 00:32:15,530 --> 00:32:17,520 the participant or the treatment 722 00:32:17,520 --> 00:32:19,280 and the control group. 723 00:32:19,280 --> 00:32:21,910 So if you feel that when people answer a 724 00:32:21,910 --> 00:32:24,810 survey, they somehow-- 725 00:32:24,810 --> 00:32:25,910 I don't know-- 726 00:32:25,910 --> 00:32:29,260 get optimistic about life and do better or the other way 727 00:32:29,260 --> 00:32:34,440 around, as long as it happens in the same way for both 728 00:32:34,440 --> 00:32:37,070 treatment and control groups, it's not a problem for the 729 00:32:37,070 --> 00:32:38,780 randomized trials. 
730 00:32:38,780 --> 00:32:41,200 The problem would be if, for some reason, you think that 731 00:32:41,200 --> 00:32:43,400 administering a survey is going to affect the treatment 732 00:32:43,400 --> 00:32:44,900 and the control group differently. 733 00:32:44,900 --> 00:32:47,450 If that's the case, then you need to be careful about how 734 00:32:47,450 --> 00:32:50,134 you do the survey. 735 00:32:50,134 --> 00:32:55,660 AUDIENCE: Can you explain how [INAUDIBLE] statistical power? 736 00:32:55,660 --> 00:32:59,890 PROFESSOR: So in technical terms, what happens is, you, 737 00:32:59,890 --> 00:33:03,680 in your regression, where you estimate an impact, you have 738 00:33:03,680 --> 00:33:05,780 an outcome of interest. 739 00:33:05,780 --> 00:33:10,530 And that outcome has a variance, has some variations. 740 00:33:10,530 --> 00:33:13,710 And then if you can add into your regressions statistical 741 00:33:13,710 --> 00:33:17,610 controls, things you collected at baseline, what essentially 742 00:33:17,610 --> 00:33:20,720 happens is, in technical terms, the standard errors of 743 00:33:20,720 --> 00:33:22,910 your coefficients, particularly if these 744 00:33:22,910 --> 00:33:25,990 variables have a lot of explanatory power, those 745 00:33:25,990 --> 00:33:27,480 standard errors should drop, and you get 746 00:33:27,480 --> 00:33:28,730 more statistical power. 747 00:33:31,220 --> 00:33:32,210 Yes, Jessica? 748 00:33:32,210 --> 00:33:34,110 AUDIENCE: Do you mean to say that you have to collect the 749 00:33:34,110 --> 00:33:36,960 baseline data after you do the first round of randomization? 750 00:33:36,960 --> 00:33:38,180 Does it matter what order you do those steps in? 751 00:33:38,180 --> 00:33:39,010 PROFESSOR: Sorry. 752 00:33:39,010 --> 00:33:42,290 Steps two and three can be inverted. 753 00:33:42,290 --> 00:33:47,110 In fact, it would be ideal if you could invert them. 
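The explanation of why baseline data buys statistical power can be made concrete with a simulation. This is a simplified stand-in for the regression with controls described above: instead of a full multiple regression, it removes the part of the outcome predicted by the baseline measure and compares standard errors of the treatment-control difference before and after. The coefficient 0.8, the noise level 0.6, and the effect size are assumptions chosen for illustration.

```python
import random
import statistics
from math import sqrt


def diff_in_means_se(values, treated):
    """Standard error of the treatment-minus-control difference in means."""
    t = [v for v, d in zip(values, treated) if d]
    c = [v for v, d in zip(values, treated) if not d]
    return sqrt(statistics.variance(t) / len(t) + statistics.variance(c) / len(c))


def standard_errors(n=1000, effect=0.2, seed=0):
    """Return (unadjusted SE, baseline-adjusted SE) for a simulated trial."""
    rng = random.Random(seed)
    baseline = [rng.gauss(0, 1) for _ in range(n)]
    treated = [True] * (n // 2) + [False] * (n - n // 2)
    rng.shuffle(treated)  # random assignment
    # Assumed data-generating process: the outcome tracks its baseline value closely.
    outcome = [0.8 * b + effect * d + rng.gauss(0, 0.6)
               for b, d in zip(baseline, treated)]

    # Slope of outcome on baseline, then strip out what baseline explains.
    mean_b, mean_y = statistics.mean(baseline), statistics.mean(outcome)
    slope = (sum((b - mean_b) * (y - mean_y) for b, y in zip(baseline, outcome))
             / sum((b - mean_b) ** 2 for b in baseline))
    residual = [y - slope * b for y, b in zip(outcome, baseline)]
    return diff_in_means_se(outcome, treated), diff_in_means_se(residual, treated)
```

Under these assumptions the adjusted standard error should come out noticeably smaller than the unadjusted one, and the more explanatory power the baseline variable has, the larger the drop.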
754 00:33:47,110 --> 00:33:49,000 It would be ideal, because then you can do 755 00:33:49,000 --> 00:33:50,010 what Iqbal is saying. 756 00:33:50,010 --> 00:33:52,570 Which is, you collect the baseline data, you do the 757 00:33:52,570 --> 00:33:55,230 randomization, and then you say, OK. 758 00:33:55,230 --> 00:33:57,160 Are they the same or not? 759 00:33:57,160 --> 00:34:00,100 Then if they're not the same, you re-randomize. 760 00:34:00,100 --> 00:34:03,290 If you collect the baseline data after randomly assigning, 761 00:34:03,290 --> 00:34:06,360 unless you have not communicated to people who 762 00:34:06,360 --> 00:34:09,139 gets the treatment and who gets the control, your options 763 00:34:09,139 --> 00:34:11,429 for re-randomizing are not very good. 764 00:34:11,429 --> 00:34:13,880 So very good point. 765 00:34:13,880 --> 00:34:14,360 All right. 766 00:34:14,360 --> 00:34:16,850 So the fourth step is to verify that the assignment 767 00:34:16,850 --> 00:34:17,630 looks random. 768 00:34:17,630 --> 00:34:19,770 By verifying that the assignment looks random, this 769 00:34:19,770 --> 00:34:22,760 is something that if you were to commission an evaluation, 770 00:34:22,760 --> 00:34:24,590 you should make sure that your evaluator 771 00:34:24,590 --> 00:34:26,790 provides to you this. 772 00:34:26,790 --> 00:34:29,580 Which is at the very least a table that says, here's the 773 00:34:29,580 --> 00:34:33,080 treatment group, here's the control group, and here's how 774 00:34:33,080 --> 00:34:35,110 they look like in terms of these baseline 775 00:34:35,110 --> 00:34:36,350 characteristics. 776 00:34:36,350 --> 00:34:40,290 And ideally those two groups, those tables should have very, 777 00:34:40,290 --> 00:34:44,159 very few differences between the groups. 778 00:34:44,159 --> 00:34:47,370 When I say differences, they cannot be, in practical terms, 779 00:34:47,370 --> 00:34:48,699 large differences. 
780 00:34:48,699 --> 00:34:50,239 There could be some differences that are 781 00:34:50,239 --> 00:34:53,199 statistically significant, because either you have a lot 782 00:34:53,199 --> 00:34:57,920 of statistical power, or more likely, if you compare 10 783 00:34:57,920 --> 00:35:00,770 variables, some of them will end up being significant. 784 00:35:00,770 --> 00:35:03,210 The key is, there are no systematic differences between 785 00:35:03,210 --> 00:35:03,880 the groups. 786 00:35:03,880 --> 00:35:06,410 If you observe systematic differences, 787 00:35:06,410 --> 00:35:07,800 then you're in trouble. 788 00:35:07,800 --> 00:35:09,150 This didn't work well. 789 00:35:09,150 --> 00:35:12,350 But I can tell you from experience, from the law of 790 00:35:12,350 --> 00:35:17,360 statistics, these two groups will look the same 791 00:35:17,360 --> 00:35:20,430 a lot of the time. 792 00:35:20,430 --> 00:35:20,770 OK. 793 00:35:20,770 --> 00:35:24,490 So obviously you can only do that verification if you have 794 00:35:24,490 --> 00:35:27,290 some data on the two groups before. 795 00:35:27,290 --> 00:35:30,110 Now, when I say "collect baseline data," if maybe you 796 00:35:30,110 --> 00:35:33,360 already have baseline data-- for some reason this is a 797 00:35:33,360 --> 00:35:35,760 population that you're already serving, you already did 798 00:35:35,760 --> 00:35:38,790 surveys on these people-- 799 00:35:38,790 --> 00:35:42,750 if that's the case, then all the better. 800 00:35:42,750 --> 00:35:45,950 It may be that you don't have baseline data, but you may be 801 00:35:45,950 --> 00:35:47,390 able to get baseline data. 802 00:35:47,390 --> 00:35:50,840 So for example, if you're randomly assigning schools, 803 00:35:50,840 --> 00:35:53,770 you may have, from the government or from some 804 00:35:53,770 --> 00:35:56,220 agency, some census of schools.
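The balance table described under the fourth step, treatment group beside control group on each baseline characteristic, can be produced with a few lines. The row layout (a list of dictionaries with a `group` key) and the standardized-difference column are assumptions made for this sketch, not a format prescribed in the lecture.

```python
import statistics


def balance_table(rows, traits):
    """rows: list of dicts with a 'group' key ('treatment'/'control') plus baseline traits."""
    table = []
    for trait in traits:
        t = [r[trait] for r in rows if r["group"] == "treatment"]
        c = [r[trait] for r in rows if r["group"] == "control"]
        overall_sd = statistics.pstdev(t + c)
        diff = statistics.mean(t) - statistics.mean(c)
        table.append({
            "trait": trait,
            "treatment_mean": statistics.mean(t),
            "control_mean": statistics.mean(c),
            # Difference in standard-deviation units, a rough gauge of practical size.
            "std_diff": diff / overall_sd if overall_sd else 0.0,
        })
    return table
```

Reading the `std_diff` column addresses the practical-size question, while a pattern of same-signed, sizeable gaps across many traits would be the kind of systematic difference that signals trouble.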
805 00:35:56,220 --> 00:35:58,670 And you may be able to compare schools in terms of 806 00:35:58,670 --> 00:36:01,050 socioeconomic characteristics of the students. 807 00:36:01,050 --> 00:36:03,540 You may be able to compare schools, you know, percent of 808 00:36:03,540 --> 00:36:04,800 private, public. 809 00:36:04,800 --> 00:36:07,040 If there was a test done nationally for all the 810 00:36:07,040 --> 00:36:08,720 schools, you may be able to compare test 811 00:36:08,720 --> 00:36:10,480 scores on those schools. 812 00:36:10,480 --> 00:36:13,490 The key thing is, anything you can do to verify that the 813 00:36:13,490 --> 00:36:14,910 random assignment worked 814 00:36:14,910 --> 00:36:15,600 is good. 815 00:36:15,600 --> 00:36:19,910 It would be useful to do it at the beginning. 816 00:36:19,910 --> 00:36:22,940 The fifth step is to monitor the process so that the 817 00:36:22,940 --> 00:36:24,830 integrity of the experiment is not compromised. 818 00:36:24,830 --> 00:36:28,300 This is something that's really, really key. 819 00:36:28,300 --> 00:36:30,660 When you do a randomized experiment, designing the 820 00:36:30,660 --> 00:36:32,230 study carefully is very important. 821 00:36:32,230 --> 00:36:34,480 Doing the random assignment is very important. 822 00:36:34,480 --> 00:36:37,690 But you can't just relax and then wait for two years until 823 00:36:37,690 --> 00:36:39,270 you collect the outcomes. 824 00:36:39,270 --> 00:36:41,840 And the people who are sitting at the back of the room know 825 00:36:41,840 --> 00:36:43,540 this much better than I do. 826 00:36:43,540 --> 00:36:46,510 If you are not following exactly what's happening in 827 00:36:46,510 --> 00:36:52,280 the field, the opportunities for this experiment to not go 828 00:36:52,280 --> 00:36:56,160 well are very, very big. 829 00:36:56,160 --> 00:36:59,220 You're going to have a whole session on Friday on threats 830 00:36:59,220 --> 00:37:01,470 to an experiment.
831 00:37:01,470 --> 00:37:05,290 The only thing I will say now is that the best way to deal 832 00:37:05,290 --> 00:37:09,450 with threats to an experiment is to avoid those threats, and 833 00:37:09,450 --> 00:37:12,950 to avoid them at this stage of implementation. 834 00:37:12,950 --> 00:37:14,145 One very quick threat. 835 00:37:14,145 --> 00:37:16,890 If you assign people to a treatment group and people to 836 00:37:16,890 --> 00:37:20,310 a control group, that means that people in the control 837 00:37:20,310 --> 00:37:22,190 group are not offered the treatment. 838 00:37:22,190 --> 00:37:24,890 But that also means, they shouldn't get the treatment. 839 00:37:24,890 --> 00:37:29,160 And as some of you know, that doesn't always happen. 840 00:37:29,160 --> 00:37:31,300 So some people in the control group find their 841 00:37:31,300 --> 00:37:33,420 way into the program. 842 00:37:33,420 --> 00:37:36,960 Having systems to monitor that this doesn't happen, and that 843 00:37:36,960 --> 00:37:42,180 if it does happen, that it happens in very, very few 844 00:37:42,180 --> 00:37:44,810 exceptional cases, is going to be very important. 845 00:37:44,810 --> 00:37:46,450 Yes, Logan? 846 00:37:46,450 --> 00:37:49,580 AUDIENCE: One of the arguments for the superiority of the 847 00:37:49,580 --> 00:37:54,530 matched pairs is that if one treatment group ends up not 848 00:37:54,530 --> 00:37:58,040 getting the treatment because lack of capacity in that 849 00:37:58,040 --> 00:38:00,733 region, or vice versa, the scenario you described, you 850 00:38:00,733 --> 00:38:02,210 can just drop that pair. 851 00:38:02,210 --> 00:38:03,140 PROFESSOR: Yes. 852 00:38:03,140 --> 00:38:06,810 The problem when you drop that pair is that it may 853 00:38:06,810 --> 00:38:08,300 be costly to you. 854 00:38:08,300 --> 00:38:09,700 Dropping that pair. 
855 00:38:09,700 --> 00:38:13,880 And you have to assume that that-- well, first of all, you 856 00:38:13,880 --> 00:38:17,030 have to assume that pair was comparable to begin with. 857 00:38:17,030 --> 00:38:22,000 And then even if you were to drop that pair, well, first of 858 00:38:22,000 --> 00:38:23,990 all, matching doesn't always work one-to-one. 859 00:38:23,990 --> 00:38:26,940 But even if you had one-to-one matching, suppose you had to 860 00:38:26,940 --> 00:38:31,910 drop 10% or 20% or 30% of your pairs, then you lose 861 00:38:31,910 --> 00:38:34,040 statistical power, and then you also lose external 862 00:38:34,040 --> 00:38:36,700 validity to begin with. 863 00:38:36,700 --> 00:38:37,230 Yes? 864 00:38:37,230 --> 00:38:39,940 AUDIENCE: So there's also the issue of spillover effect, 865 00:38:39,940 --> 00:38:40,910 which isn't the same. 866 00:38:40,910 --> 00:38:43,730 So one might be that somebody sneaks into the program who 867 00:38:43,730 --> 00:38:44,888 wasn't supposed to be in the program. 868 00:38:44,888 --> 00:38:47,710 But the other is, if you do things in the same community, 869 00:38:47,710 --> 00:38:50,460 which is often the case in the work that we do, or in a 870 00:38:50,460 --> 00:38:53,370 similar environment, the mere effect of having 871 00:38:53,370 --> 00:38:54,410 something going on-- 872 00:38:54,410 --> 00:38:55,300 PROFESSOR: Yes. 873 00:38:55,300 --> 00:38:58,460 And this is why the first stage is very important. 874 00:38:58,460 --> 00:39:02,360 If you think spillovers will occur, the moment to think 875 00:39:02,360 --> 00:39:04,980 about them is at the design stage of the evaluation. 876 00:39:04,980 --> 00:39:08,460 Because then you can decide on how you're going to randomize 877 00:39:08,460 --> 00:39:11,120 in a way that minimizes the effect that 878 00:39:11,120 --> 00:39:12,890 spillovers would have.
879 00:39:12,890 --> 00:39:16,410 So there's some statistical techniques to deal with some 880 00:39:16,410 --> 00:39:17,050 of these problems. 881 00:39:17,050 --> 00:39:20,020 But the best way to deal with these problems is to avoid 882 00:39:20,020 --> 00:39:20,980 them in the first place. 883 00:39:20,980 --> 00:39:23,400 And you avoid them by good design, where the evaluator 884 00:39:23,400 --> 00:39:27,200 can help, and by a good monitoring system to make sure 885 00:39:27,200 --> 00:39:31,860 that the evaluation is being implemented as intended. 886 00:39:31,860 --> 00:39:32,500 Makes sense? 887 00:39:32,500 --> 00:39:35,640 Yes, your name please? 888 00:39:35,640 --> 00:39:38,800 Are you also filming from this camera here? 889 00:39:38,800 --> 00:39:40,160 OK. 890 00:39:40,160 --> 00:39:41,200 I'm nervous now. 891 00:39:41,200 --> 00:39:44,170 Two cameras. 892 00:39:44,170 --> 00:39:46,240 AUDIENCE: What's on the [INAUDIBLE] 893 00:39:46,240 --> 00:39:50,010 to avoid [INAUDIBLE]? 894 00:39:50,010 --> 00:39:50,730 PROFESSOR: Yes. 895 00:39:50,730 --> 00:39:57,490 So I think one important thing is to have people in the field 896 00:39:57,490 --> 00:40:02,300 who can help monitor, and who know about the evaluation. 897 00:40:02,300 --> 00:40:04,120 Two is to have a clear commitment. 898 00:40:04,120 --> 00:40:05,950 This is something that Rachel said this morning that's 899 00:40:05,950 --> 00:40:07,810 really, really key. 900 00:40:07,810 --> 00:40:11,850 Very clear commitment from whoever is organizing. 901 00:40:11,850 --> 00:40:13,470 That's very creative. 902 00:40:13,470 --> 00:40:18,990 For whoever is implementing the program. 903 00:40:18,990 --> 00:40:20,350 So I'll give you an example. 904 00:40:20,350 --> 00:40:23,770 We were evaluating this program in Jamaica. 905 00:40:23,770 --> 00:40:28,000 And we were telling them, we need to monitor the 906 00:40:28,000 --> 00:40:28,670 crossovers.
907 00:40:28,670 --> 00:40:31,220 We can't have people who are not supposed to receive the 908 00:40:31,220 --> 00:40:32,790 program, get into the program. 909 00:40:32,790 --> 00:40:35,850 Yes, yes, yes. 910 00:40:35,850 --> 00:40:37,730 Is it OK if a few do it? 911 00:40:37,730 --> 00:40:41,270 We say, well, only if a few, but really, this has to be the 912 00:40:41,270 --> 00:40:44,160 exception, and you really have to monitor this rate, and we 913 00:40:44,160 --> 00:40:46,540 asked them for a report on this rate, and so on. 914 00:40:46,540 --> 00:40:49,340 This is a government agency in Jamaica. 915 00:40:49,340 --> 00:40:51,050 And so they were all the time asking, OK. 916 00:40:51,050 --> 00:40:54,200 How many is too many? 917 00:40:54,200 --> 00:40:55,740 And we were like, no, no, no, no. 918 00:40:55,740 --> 00:40:57,960 You have to keep that rate to a minimum. 919 00:40:57,960 --> 00:41:00,590 There's no way you can have crossovers. 920 00:41:00,590 --> 00:41:01,760 Just keep-- 921 00:41:01,760 --> 00:41:03,680 no, but how many, how many? 922 00:41:03,680 --> 00:41:07,990 In one day of weakness, we said, OK. 923 00:41:07,990 --> 00:41:10,790 If it's more than 10%, this is completely ruined. 924 00:41:10,790 --> 00:41:13,190 We can't do anything with it. 925 00:41:13,190 --> 00:41:17,000 So end of the evaluation arrived. 926 00:41:17,000 --> 00:41:19,110 We compute the crossover rate. 927 00:41:19,110 --> 00:41:20,360 9.6%. 928 00:41:21,990 --> 00:41:25,460 So what I want to say here is that if they didn't want to 929 00:41:25,460 --> 00:41:29,120 comply with our request, they could have made this rate be 930 00:41:29,120 --> 00:41:33,430 30% or 40% and we would have not heard anything about it. 931 00:41:33,430 --> 00:41:35,320 I'm not saying 10% is the right threshold. 932 00:41:35,320 --> 00:41:38,190 It of course depends on the program and on other things. 
933 00:41:38,190 --> 00:41:41,680 But the key thing here is, you need to have full cooperation 934 00:41:41,680 --> 00:41:44,520 between the people in the field who are implementing and 935 00:41:44,520 --> 00:41:46,580 the people in the field who are evaluating. 936 00:41:46,580 --> 00:41:49,370 If you don't have that, then it's very difficult. 937 00:41:49,370 --> 00:41:52,700 Because people find a way to get to a program if they hear 938 00:41:52,700 --> 00:41:55,880 that this program is serving, is doing some good. 939 00:41:55,880 --> 00:42:00,630 So I mean, who's a parent in this room? 940 00:42:00,630 --> 00:42:00,940 All right. 941 00:42:00,940 --> 00:42:02,680 So now, confess. 942 00:42:02,680 --> 00:42:06,970 If your child, in your school, there was a randomized trial 943 00:42:06,970 --> 00:42:09,700 on this very promising, you name it. 944 00:42:09,700 --> 00:42:11,150 After school program. 945 00:42:11,150 --> 00:42:15,110 And your child fell in the control group. 946 00:42:15,110 --> 00:42:18,200 Would you be at least tempted to go to the principal and 947 00:42:18,200 --> 00:42:22,070 say, I want my child in that program? 948 00:42:22,070 --> 00:42:23,260 Tempted? 949 00:42:23,260 --> 00:42:23,810 All right. 950 00:42:23,810 --> 00:42:26,090 I can tell you that other parents are more than tempted, 951 00:42:26,090 --> 00:42:27,740 and will find a way. 952 00:42:27,740 --> 00:42:29,070 All right. 953 00:42:29,070 --> 00:42:30,180 AUDIENCE: What do you do with the spillovers? 954 00:42:30,180 --> 00:42:31,880 Do you just exclude them and put them in 955 00:42:31,880 --> 00:42:33,630 the comparison group? 956 00:42:33,630 --> 00:42:36,100 PROFESSOR: So these are called crossovers, because they cross 957 00:42:36,100 --> 00:42:37,900 from the control to the treatment. 958 00:42:37,900 --> 00:42:41,050 The key thing-- this comes at the analysis stage, and this 959 00:42:41,050 --> 00:42:42,700 you'll do on Friday. 
960 00:42:42,700 --> 00:42:46,530 But the key thing is, what random assignment buys you is 961 00:42:46,530 --> 00:42:49,400 that the two groups are comparable as a whole. 962 00:42:49,400 --> 00:42:52,460 The whole treatment group with the whole control group. 963 00:42:52,460 --> 00:42:54,470 You can't then just say, oh, I don't like this 964 00:42:54,470 --> 00:42:55,360 control group member. 965 00:42:55,360 --> 00:42:56,670 I'm just going to throw it out. 966 00:42:56,670 --> 00:42:59,230 That completely destroys the comparability. 967 00:42:59,230 --> 00:43:02,660 You still need to compare the full two groups, and you do 968 00:43:02,660 --> 00:43:05,890 some statistical adjustments to deal with the crossover. 969 00:43:05,890 --> 00:43:08,590 But once a treatment, always a treatment. 970 00:43:08,590 --> 00:43:10,930 Once a control, always a control. 971 00:43:10,930 --> 00:43:13,940 The random assignment buys you that two groups are the same. 972 00:43:13,940 --> 00:43:17,370 If you throw away-- suppose then, 10% of crossovers. 973 00:43:17,370 --> 00:43:21,280 If you throw them away you will be comparing the whole 974 00:43:21,280 --> 00:43:24,540 treatment group with this 90% of the control group. 975 00:43:24,540 --> 00:43:28,140 And let's just assume for a second that that 10% who 976 00:43:28,140 --> 00:43:31,060 crossover are people who are particularly motivated, and 977 00:43:31,060 --> 00:43:32,820 that's why they switch over. 978 00:43:32,820 --> 00:43:35,770 Well then, the average motivation of the two groups 979 00:43:35,770 --> 00:43:37,840 were the same at the beginning, but once you throw 980 00:43:37,840 --> 00:43:41,030 that 10% away, the average motivation of the treatment 981 00:43:41,030 --> 00:43:43,220 group is going to be higher than the average motivation of 982 00:43:43,220 --> 00:43:43,960 the control group. 
983 00:43:43,960 --> 00:43:46,400 So any difference you find in outcomes between these two 984 00:43:46,400 --> 00:43:49,400 groups could be due to the program, but could also be due 985 00:43:49,400 --> 00:43:51,700 to differences in motivation. 986 00:43:51,700 --> 00:43:53,140 You can't throw them away. 987 00:43:53,140 --> 00:43:56,020 There's statistical ways of dealing with them. 988 00:43:56,020 --> 00:43:57,965 Yes? 989 00:43:57,965 --> 00:44:00,090 AUDIENCE: Turns out, I guess I didn't understand the answer 990 00:44:00,090 --> 00:44:02,260 to the earlier question. 991 00:44:02,260 --> 00:44:05,630 So we're worried about spillover, and we're going to 992 00:44:05,630 --> 00:44:08,450 deliver books to-- 993 00:44:08,450 --> 00:44:11,990 clearly the intervention is that the kids get books that 994 00:44:11,990 --> 00:44:14,350 they can take home to study at night. 995 00:44:14,350 --> 00:44:17,750 But I've decided that because I'm worried about spillover 996 00:44:17,750 --> 00:44:20,660 and because it's more administratively convenient, 997 00:44:20,660 --> 00:44:24,370 I'm going to deliver to some schools. 998 00:44:24,370 --> 00:44:28,360 So I'm going to draw the schools at random, but I'm 999 00:44:28,360 --> 00:44:31,040 looking at the kids, impact on the kids. 1000 00:44:31,040 --> 00:44:33,100 PROFESSOR: That's OK. 1001 00:44:33,100 --> 00:44:37,810 AUDIENCE: So even so, I haven't damaged my ability to 1002 00:44:37,810 --> 00:44:42,900 look at the students' effects, because my unit of 1003 00:44:42,900 --> 00:44:46,660 randomization was at a different level. 1004 00:44:46,660 --> 00:44:49,340 PROFESSOR: That's perfectly fine. 1005 00:44:49,340 --> 00:44:55,650 However, the higher the unit of randomization, the more 1006 00:44:55,650 --> 00:44:58,920 trouble you're going to have in having enough statistical 1007 00:44:58,920 --> 00:45:00,310 power to detect effects. 
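A minimal sketch of the crossover argument above, written in Python as an illustration and not something shown in the lecture. It assumes a hypothetical population in which the program has zero true effect, outcomes depend on an unobserved "motivation" score, and the most motivated 10% of the control group cross over. Every name (simulate, crossover_share) and every number here is made up for the example.

```python
import random
import statistics


def simulate(n=20000, crossover_share=0.10, seed=0):
    """Compare full groups vs. groups with crossovers thrown out (true effect = 0)."""
    rng = random.Random(seed)

    # Each person has an unobserved motivation and an outcome driven by it.
    # Assumption: the program itself changes nothing, so any gap is spurious.
    people = []
    for _ in range(n):
        motivation = rng.gauss(0, 1)
        outcome = motivation + rng.gauss(0, 1)
        people.append((motivation, outcome))

    # Random assignment: shuffle, then split in half.
    rng.shuffle(people)
    treatment, control = people[: n // 2], people[n // 2:]

    # Assumption: the most motivated controls are the ones who cross over.
    by_motivation = sorted(control, key=lambda p: p[0], reverse=True)
    n_cross = int(len(control) * crossover_share)
    stayers = by_motivation[n_cross:]

    def mean_outcome(group):
        return statistics.fmean(p[1] for p in group)

    full_diff = mean_outcome(treatment) - mean_outcome(control)
    dropped_diff = mean_outcome(treatment) - mean_outcome(stayers)
    return full_diff, dropped_diff


full_diff, dropped_diff = simulate()
print("whole treatment vs whole control:", round(full_diff, 3))
print("after throwing out crossovers:   ", round(dropped_diff, 3))
```

Under these assumptions the whole-group comparison stays close to zero, while throwing out the crossovers produces a positive "impact" that is really just the motivation gap between the groups.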
1008 00:45:00,310 --> 00:45:02,880 But that's a topic that I want to leave up to Thursday. 1009 00:45:02,880 --> 00:45:04,510 But yes. 1010 00:45:04,510 --> 00:45:06,750 I mean, when we say the schools are treated-- 1011 00:45:06,750 --> 00:45:08,230 I mean, the schools are buildings. 1012 00:45:08,230 --> 00:45:10,130 They're not being treated in any way. 1013 00:45:10,130 --> 00:45:12,480 Unless you paint them or do something to them, they're not 1014 00:45:12,480 --> 00:45:13,150 being treated-- 1015 00:45:13,150 --> 00:45:14,010 AUDIENCE: Ours got paint. 1016 00:45:14,010 --> 00:45:15,260 PROFESSOR: OK. 1017 00:45:15,260 --> 00:45:19,790 So if it's just painting them, then the schools-- 1018 00:45:19,790 --> 00:45:20,770 no, but seriously. 1019 00:45:20,770 --> 00:45:24,150 When I say treated, who's being 1020 00:45:24,150 --> 00:45:25,540 affected by the treatment? 1021 00:45:25,540 --> 00:45:27,685 AUDIENCE: Well, I can't have a-- 1022 00:45:27,685 --> 00:45:28,850 it's going to hurt my power. 1023 00:45:28,850 --> 00:45:31,870 But I can randomize at a different level than 1024 00:45:31,870 --> 00:45:32,700 [INAUDIBLE]. 1025 00:45:32,700 --> 00:45:33,720 PROFESSOR: You can. 1026 00:45:33,720 --> 00:45:35,910 Particularly if you want to avoid spillovers, that's 1027 00:45:35,910 --> 00:45:39,100 exactly what you should be doing. 1028 00:45:39,100 --> 00:45:39,840 All right. 1029 00:45:39,840 --> 00:45:40,792 Yes? 1030 00:45:40,792 --> 00:45:43,100 AUDIENCE: My name is Cesar. 1031 00:45:43,100 --> 00:45:46,270 What happened when the intervention is something 1032 00:45:46,270 --> 00:45:47,010 about knowledge? 1033 00:45:47,010 --> 00:45:50,900 For example, that some nurse trained to a treatment group 1034 00:45:50,900 --> 00:45:55,850 about wash your hands, and this knowledge can-- 1035 00:45:55,850 --> 00:45:56,710 PROFESSOR: Can spillover. 1036 00:45:56,710 --> 00:45:57,270 Yeah. 1037 00:45:57,270 --> 00:45:58,190 That's exactly right. 
1038 00:45:58,190 --> 00:46:01,250 So again, you need to think about the design of the study. 1039 00:46:01,250 --> 00:46:04,170 If you really think it's going to spill over, then you need 1040 00:46:04,170 --> 00:46:07,970 to think about randomizing at a higher level so that the 1041 00:46:07,970 --> 00:46:09,940 spillover doesn't occur. 1042 00:46:09,940 --> 00:46:11,560 I do have to say one thing. 1043 00:46:11,560 --> 00:46:13,830 There's some interventions where the 1044 00:46:13,830 --> 00:46:15,290 spillover is evident. 1045 00:46:15,290 --> 00:46:17,550 And you're going to see that in the deworming case. 1046 00:46:17,550 --> 00:46:18,880 I think it's case number 4. 1047 00:46:18,880 --> 00:46:21,400 So it's very clear that this is happening. 1048 00:46:21,400 --> 00:46:24,570 There's a human biological transmission of disease that 1049 00:46:24,570 --> 00:46:26,070 makes spillovers very clear. 1050 00:46:28,820 --> 00:46:30,070 This is my own bias. 1051 00:46:30,070 --> 00:46:34,110 But there are tons of programs out there that have 1052 00:46:34,110 --> 00:46:36,620 difficulty affecting the people that 1053 00:46:36,620 --> 00:46:38,620 they're intended to help. 1054 00:46:38,620 --> 00:46:41,880 So thinking that they're going to affect other people they 1055 00:46:41,880 --> 00:46:45,800 haven't been intending to help, in some cases at least, 1056 00:46:45,800 --> 00:46:46,750 is a stretch. 1057 00:46:46,750 --> 00:46:50,080 Having said that, if you think spillovers will occur, then 1058 00:46:50,080 --> 00:46:52,520 you need to think about that at the design 1059 00:46:52,520 --> 00:46:54,130 stage of the study. 1060 00:46:54,130 --> 00:46:54,630 Yes? 1061 00:46:54,630 --> 00:46:55,365 Your name please? 1062 00:46:55,365 --> 00:46:56,006 AUDIENCE: Yes, sir. 1063 00:46:56,006 --> 00:46:57,956 Raj.
1064 00:46:57,956 --> 00:46:59,888 Just getting back to the example where you were saying 1065 00:46:59,888 --> 00:47:02,061 if you took each of us, and you assigned us to two 1066 00:47:02,061 --> 00:47:04,718 different groups, it would adjust for the unobservable 1067 00:47:04,718 --> 00:47:05,684 characteristics. 1068 00:47:05,684 --> 00:47:08,300 Would that work out in a sample size so small? 1069 00:47:08,300 --> 00:47:11,280 PROFESSOR: In a sample size like this, you will have 1070 00:47:11,280 --> 00:47:12,480 trouble with statistical-- 1071 00:47:12,480 --> 00:47:14,290 I want to leave all those questions of-- 1072 00:47:14,290 --> 00:47:17,340 you have our superstar, Esther Duflo, who's going to speak 1073 00:47:17,340 --> 00:47:18,690 about statistical power. 1074 00:47:18,690 --> 00:47:26,390 But the key thing here is, if you have a small group, then 1075 00:47:26,390 --> 00:47:28,970 what happens is the sampling error is bigger. 1076 00:47:28,970 --> 00:47:31,300 So you may observe differences between the groups. 1077 00:47:31,300 --> 00:47:33,930 You may not declare them to be statistically significant 1078 00:47:33,930 --> 00:47:35,910 because you have very little power. 1079 00:47:35,910 --> 00:47:39,610 So in general, you want larger sample sizes. 1080 00:47:39,610 --> 00:47:41,180 This group is probably small. 1081 00:47:41,180 --> 00:47:43,470 But even if you did it with this group, and I challenge 1082 00:47:43,470 --> 00:47:44,440 you to do it-- 1083 00:47:44,440 --> 00:47:46,570 just take an Excel spreadsheet and take five 1084 00:47:46,570 --> 00:47:48,060 characteristics of you. 1085 00:47:48,060 --> 00:47:50,040 And the random assignment, you're going to see some 1086 00:47:50,040 --> 00:47:51,640 differences. 1087 00:47:51,640 --> 00:47:53,660 But it's really amazing how the two 1088 00:47:53,660 --> 00:47:54,910 groups will look alike. 1089 00:47:54,910 --> 00:47:55,620 And the other thing. 
1090 00:47:55,620 --> 00:47:59,680 If you're not accounting for unobservable differences like 1091 00:47:59,680 --> 00:48:02,260 some non-experimental methods do. 1092 00:48:02,260 --> 00:48:04,630 The key thing about this is, you don't need to account for 1093 00:48:04,630 --> 00:48:07,090 anything, because the two groups are balanced across 1094 00:48:07,090 --> 00:48:08,060 these two things. 1095 00:48:08,060 --> 00:48:11,750 So they have the same average level of motivation, and so I 1096 00:48:11,750 --> 00:48:14,270 don't need to control statistically for motivation. 1097 00:48:14,270 --> 00:48:16,580 Because that cannot be a confounding factor if the two 1098 00:48:16,580 --> 00:48:18,040 groups are the same. 1099 00:48:18,040 --> 00:48:19,590 OK? 1100 00:48:19,590 --> 00:48:20,110 All right. 1101 00:48:20,110 --> 00:48:22,820 So step number six. 1102 00:48:22,820 --> 00:48:25,060 If you're going to measure the impact of a program on an 1103 00:48:25,060 --> 00:48:27,110 outcome of interest, you need to collect 1104 00:48:27,110 --> 00:48:28,200 data on that outcome. 1105 00:48:28,200 --> 00:48:29,780 And that's called follow-up data. 1106 00:48:29,780 --> 00:48:32,290 And the key thing is, you need to collect that for both 1107 00:48:32,290 --> 00:48:34,380 treatment and control groups. 1108 00:48:34,380 --> 00:48:37,640 And it's important that it be done in identical ways. 1109 00:48:37,640 --> 00:48:42,690 So you can't, or it would not be a good idea, to have 1110 00:48:42,690 --> 00:48:46,370 treatment group data come from one source, say, a survey, and 1111 00:48:46,370 --> 00:48:48,620 control group data come from another source, say, 1112 00:48:48,620 --> 00:48:51,370 administrative data, because data sources are generally not 1113 00:48:51,370 --> 00:48:54,400 very compatible to each other. 1114 00:48:54,400 --> 00:48:55,860 The seventh step. 1115 00:48:55,860 --> 00:48:57,870 Of course, estimate the program impact. 
1116 00:48:57,870 --> 00:49:00,280 And if the experiment is properly done, what you should 1117 00:49:00,280 --> 00:49:02,880 be doing is just compare the outcomes-- the mean outcomes 1118 00:49:02,880 --> 00:49:04,780 of the treatment group with the mean outcomes of the 1119 00:49:04,780 --> 00:49:06,170 control group. 1120 00:49:06,170 --> 00:49:09,560 Now, there are versions of the experiments where they are 1121 00:49:09,560 --> 00:49:12,790 more sophisticated, and then you need to use the multiple 1122 00:49:12,790 --> 00:49:14,920 regression framework to control for things, 1123 00:49:14,920 --> 00:49:16,850 particularly if you have stratified your 1124 00:49:16,850 --> 00:49:18,480 sample, and so on. 1125 00:49:18,480 --> 00:49:21,990 But in general, the basic idea is, there are no differences 1126 00:49:21,990 --> 00:49:23,730 between these two groups. 1127 00:49:23,730 --> 00:49:27,090 Then the simple differences in outcomes between those groups 1128 00:49:27,090 --> 00:49:30,170 should give you the impact of the program. 1129 00:49:30,170 --> 00:49:33,140 There are other reasons you may want to use the regression 1130 00:49:33,140 --> 00:49:35,630 framework, such as statistical power, that we were talking 1131 00:49:35,630 --> 00:49:38,110 about before, but this is the basic idea. 1132 00:49:38,110 --> 00:49:41,460 If the simple difference between the two groups is very different 1133 00:49:41,460 --> 00:49:44,870 from what you get with the regression, you should start 1134 00:49:44,870 --> 00:49:47,710 thinking about what's going on. 1135 00:49:47,710 --> 00:49:48,440 And then eight. 1136 00:49:48,440 --> 00:49:51,200 And I think this is very important for practitioners. 1137 00:49:51,200 --> 00:49:53,440 You should assess whether the program's impacts are 1138 00:49:53,440 --> 00:49:56,150 statistically significant, but also if they're practically 1139 00:49:56,150 --> 00:49:56,910 significant.
1140 00:49:56,910 --> 00:49:58,910 So statistically significant means, we're 1141 00:49:58,910 --> 00:50:01,500 confident that this impact is different from 0 in a 1142 00:50:01,500 --> 00:50:03,150 statistical sense. 1143 00:50:03,150 --> 00:50:06,070 Having said that, the impact may still be very small for 1144 00:50:06,070 --> 00:50:07,250 any practical purposes. 1145 00:50:07,250 --> 00:50:10,250 So it may be that a program affects some outcome of 1146 00:50:10,250 --> 00:50:14,010 interest, but the effect is so small that you won't decide 1147 00:50:14,010 --> 00:50:16,760 that this program was a success on the basis of that. 1148 00:50:16,760 --> 00:50:18,980 So both of those things are important. 1149 00:50:18,980 --> 00:50:22,240 The stars or the asterisks for statistical significance are 1150 00:50:22,240 --> 00:50:26,380 not enough for you to conclude that a program is successful. 1151 00:50:26,380 --> 00:50:27,480 Yes? 1152 00:50:27,480 --> 00:50:28,676 Your name, please. 1153 00:50:28,676 --> 00:50:30,222 AUDIENCE: Ashu. 1154 00:50:30,222 --> 00:50:30,684 Yeah. 1155 00:50:30,684 --> 00:50:33,120 I understand we can get the mean just by seeing the 1156 00:50:33,120 --> 00:50:34,760 difference between the two sample sets. 1157 00:50:34,760 --> 00:50:37,974 How do we get a handle on this trend of standard error and 1158 00:50:37,974 --> 00:50:40,334 consequently the statistical significance? 1159 00:50:40,334 --> 00:50:40,810 PROFESSOR: Yeah. 1160 00:50:40,810 --> 00:50:47,340 So again, in the simplest, very, very simple case, you just do 1161 00:50:47,340 --> 00:50:50,830 a comparison of two groups, this is the standard t-test, 1162 00:50:50,830 --> 00:50:52,530 there's nothing else to do. 1163 00:50:52,530 --> 00:50:57,230 In practice, a lot of this impact estimation is done 1164 00:50:57,230 --> 00:50:58,970 through the regression framework.
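The simple two-group comparison just described can be sketched in a few lines of Python. This is an illustration under assumptions, not the course's own procedure: it uses the unequal-variance (Welch) form of the standard error, the scores are invented, the function name difference_in_means is hypothetical, and it ignores clustering and stratification, which would change the standard errors.

```python
import math
import statistics


def difference_in_means(treatment, control):
    """Return (impact estimate, standard error, t statistic) for two groups."""
    # Impact estimate: mean outcome of treatment minus mean outcome of control.
    diff = statistics.fmean(treatment) - statistics.fmean(control)

    # Standard error of the difference, using each group's sample variance
    # (unequal-variance form; an assumption of this sketch).
    se = math.sqrt(
        statistics.variance(treatment) / len(treatment)
        + statistics.variance(control) / len(control)
    )
    return diff, se, diff / se


# Made-up follow-up scores, purely for illustration.
treatment_scores = [3, 4, 5, 6, 7]
control_scores = [1, 2, 3, 4, 5]

diff, se, t_stat = difference_in_means(treatment_scores, control_scores)
print(diff, se, t_stat)
```

The t statistic is what the software turns into the stars. The size of diff itself, in the units of the outcome, is what speaks to practical significance.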
1165 00:50:58,970 --> 00:51:01,250 However you're going to do it, you're going to let your 1166 00:51:01,250 --> 00:51:03,480 statistical software calculate those standard errors. 1167 00:51:03,480 --> 00:51:06,960 Of course you need to be careful about things you learn 1168 00:51:06,960 --> 00:51:08,820 on Thursday, such as clustering and so on. 1169 00:51:08,820 --> 00:51:11,530 You need to make sure that those errors reflect that. 1170 00:51:11,530 --> 00:51:16,570 But the basic idea is, you let your statistical software or 1171 00:51:16,570 --> 00:51:18,770 the evaluator calculate those impacts. 1172 00:51:18,770 --> 00:51:24,220 But as a proxy, if the two means are not different, then 1173 00:51:24,220 --> 00:51:26,330 it's going to be hard to argue that this 1174 00:51:26,330 --> 00:51:27,580 program had a big effect. 1175 00:51:30,280 --> 00:51:31,670 OK. 1176 00:51:31,670 --> 00:51:32,950 So random. 1177 00:51:32,950 --> 00:51:37,060 As I said at the beginning, anyone can tell me, what does 1178 00:51:37,060 --> 00:51:38,310 the term "random" mean? 1179 00:51:42,590 --> 00:51:42,970 Yes? 1180 00:51:42,970 --> 00:51:44,540 AUDIENCE: Chosen by chance. 1181 00:51:44,540 --> 00:51:48,330 PROFESSOR: Oh, you work for public opinion polls. 1182 00:51:48,330 --> 00:51:50,410 I should have asked you. 1183 00:51:50,410 --> 00:51:51,040 All right. 1184 00:51:51,040 --> 00:51:52,890 So "chosen by chance." What does that mean? 1185 00:51:55,794 --> 00:51:57,044 AUDIENCE: [INAUDIBLE] 1186 00:52:00,634 --> 00:52:03,860 One can say random if there's no systematic 1187 00:52:03,860 --> 00:52:06,650 trend behind the selection. 1188 00:52:06,650 --> 00:52:07,520 PROFESSOR: OK. 1189 00:52:07,520 --> 00:52:08,480 Systematic trends. 1190 00:52:08,480 --> 00:52:10,690 So you don't have someone saying, you 1191 00:52:10,690 --> 00:52:13,740 go here you go there. 
1192 00:52:13,740 --> 00:52:16,400 So suppose I wanted to do a random 1193 00:52:16,400 --> 00:52:18,930 assignment in this classroom. 1194 00:52:18,930 --> 00:52:23,370 And I went here, and I closed my eyes, and I throw a ball 1195 00:52:23,370 --> 00:52:24,700 right here. 1196 00:52:24,700 --> 00:52:25,870 I don't see where I'm throwing. 1197 00:52:25,870 --> 00:52:27,360 I just throw it. 1198 00:52:27,360 --> 00:52:29,700 Person gets it, falls into the treatment. 1199 00:52:29,700 --> 00:52:31,252 Is that random? 1200 00:52:31,252 --> 00:52:31,710 AUDIENCE: No. 1201 00:52:31,710 --> 00:52:32,960 PROFESSOR: Why not? 1202 00:52:35,460 --> 00:52:36,980 I already turned that way, right? 1203 00:52:36,980 --> 00:52:38,580 AUDIENCE: Maybe you like the sun. 1204 00:52:38,580 --> 00:52:40,810 PROFESSOR: Maybe I like the sun. 1205 00:52:40,810 --> 00:52:43,170 And the people sitting near the sun may be different from 1206 00:52:43,170 --> 00:52:44,800 the people who are not. 1207 00:52:44,800 --> 00:52:46,160 Who knows. 1208 00:52:46,160 --> 00:52:49,290 The key thing is that when we say random, particularly in a 1209 00:52:49,290 --> 00:52:53,780 simple randomized experiment, what we mean is that everyone, 1210 00:52:53,780 --> 00:52:58,070 every single one of you, has the same probability of being 1211 00:52:58,070 --> 00:53:00,920 selected into the treatment group. 1212 00:53:00,920 --> 00:53:02,000 Or into one of the groups. 1213 00:53:02,000 --> 00:53:03,460 Let's say the treatment group. 1214 00:53:03,460 --> 00:53:10,140 So the key thing here is that Iqbal, Brook, Jamie, Jessica, 1215 00:53:10,140 --> 00:53:13,620 everyone, Farah, everyone in this room, if we do a simple 1216 00:53:13,620 --> 00:53:16,660 random assignment, you should have the same probability of 1217 00:53:16,660 --> 00:53:18,810 being assigned to the treatment group. 1218 00:53:18,810 --> 00:53:21,440 So it has a precise statistical definition. 
1219 00:53:21,440 --> 00:53:24,530 It's not just someone saying, oh, yeah. 1220 00:53:24,530 --> 00:53:26,250 We can't remember how we did it. 1221 00:53:26,250 --> 00:53:27,120 It must have been random. 1222 00:53:27,120 --> 00:53:27,330 No. 1223 00:53:27,330 --> 00:53:30,120 It has a very, very precise definition. 1224 00:53:30,120 --> 00:53:34,390 Because if you trust someone telling you, it was random, 1225 00:53:34,390 --> 00:53:36,980 and then you trust that word, and then you start doing your 1226 00:53:36,980 --> 00:53:39,970 study, and three years later, you discover it wasn't random, 1227 00:53:39,970 --> 00:53:42,910 you are not going to be very happy with yourself. 1228 00:53:42,910 --> 00:53:47,050 So there are variations on this. 1229 00:53:47,050 --> 00:53:49,360 If you have stratified, it doesn't mean that everyone 1230 00:53:49,360 --> 00:53:50,340 must have the same probability. 1231 00:53:50,340 --> 00:53:52,040 It means everyone within a strata. 1232 00:53:52,040 --> 00:53:56,050 But the basic idea is, before we do random assignments, we 1233 00:53:56,050 --> 00:53:59,380 should know the probability of everyone being selected. 1234 00:53:59,380 --> 00:54:01,990 When I say the same probability of being selected 1235 00:54:01,990 --> 00:54:04,080 into a treatment group, that probability 1236 00:54:04,080 --> 00:54:05,630 doesn't need to be half. 1237 00:54:05,630 --> 00:54:07,210 So it could be a third. 1238 00:54:07,210 --> 00:54:08,670 It could be two thirds. 1239 00:54:08,670 --> 00:54:10,530 From a statistical power perspective, you 1240 00:54:10,530 --> 00:54:12,230 prefer half and half. 1241 00:54:12,230 --> 00:54:15,700 But whatever it is, all of you should have the same 1242 00:54:15,700 --> 00:54:17,400 probability of being selected. 1243 00:54:17,400 --> 00:54:19,160 Make sense? 1244 00:54:19,160 --> 00:54:20,410 OK. 
1245 00:54:22,800 --> 00:54:26,043 AUDIENCE: In your example of drawing the ball, is that a 1246 00:54:26,043 --> 00:54:27,890 random assignment? 1247 00:54:27,890 --> 00:54:28,150 PROFESSOR: Right. 1248 00:54:28,150 --> 00:54:30,940 So again, it depends on the details on how you do it. 1249 00:54:30,940 --> 00:54:35,320 But suppose we have balls for, I don't know, 30 participants 1250 00:54:35,320 --> 00:54:39,150 or however many you are, and you have balls from 1 to 30, 1251 00:54:39,150 --> 00:54:41,890 and you mix the bag, and you really trusted the physics 1252 00:54:41,890 --> 00:54:44,700 that by mixing, that all the balls would have the same 1253 00:54:44,700 --> 00:54:47,470 chance of being selected, and you draw one 1254 00:54:47,470 --> 00:54:49,300 ball from the bag-- 1255 00:54:49,300 --> 00:54:51,480 all the balls had the same chance of being selected. 1256 00:54:51,480 --> 00:54:53,796 All of you had the same chance of being selected. 1257 00:54:53,796 --> 00:54:55,014 AUDIENCE: But the second person-- 1258 00:54:55,014 --> 00:54:59,030 so when you draw one, that's 1 out of 30. 1259 00:54:59,030 --> 00:55:00,450 PROFESSOR: Yes. 1260 00:55:00,450 --> 00:55:06,160 AUDIENCE: But the second time you do it, you could have a-- 1261 00:55:06,160 --> 00:55:09,200 PROFESSOR: So if the sample size is very, very small, you 1262 00:55:09,200 --> 00:55:11,990 worry about sampling with replacement and without 1263 00:55:11,990 --> 00:55:13,110 replacing-- 1264 00:55:13,110 --> 00:55:18,550 if the population from which you're drawing is very small, 1265 00:55:18,550 --> 00:55:19,970 you may have an issue with that. 1266 00:55:19,970 --> 00:55:23,170 If the population is large, the difference between 1 in 1267 00:55:23,170 --> 00:55:28,820 1000 and 1 in 999, it's going to be pretty small. 1268 00:55:28,820 --> 00:55:30,260 If you do it sequentially like that. 
1269 00:55:30,260 --> 00:55:34,200 If you do it on a computer, you can have a randomizing 1270 00:55:34,200 --> 00:55:38,060 device that just generates a random number, and then you 1271 00:55:38,060 --> 00:55:39,310 pick the first half. 1272 00:55:41,980 --> 00:55:42,670 OK. 1273 00:55:42,670 --> 00:55:46,040 So is random assignment the same as random sampling? 1274 00:55:53,730 --> 00:55:56,196 I see no, yes? 1275 00:55:56,196 --> 00:55:56,630 AUDIENCE: No. 1276 00:55:56,630 --> 00:55:58,130 PROFESSOR: No. 1277 00:55:58,130 --> 00:55:59,440 I need a little bit more than that. 1278 00:55:59,440 --> 00:56:01,141 AUDIENCE: In random assignment, you would have already 1279 00:56:01,141 --> 00:56:04,300 narrowed down to a smaller sample, and assigned within 1280 00:56:04,300 --> 00:56:05,713 that sample. 1281 00:56:05,713 --> 00:56:08,431 Random sampling would be taking a group out of a whole 1282 00:56:08,431 --> 00:56:09,790 population. 1283 00:56:09,790 --> 00:56:10,420 PROFESSOR: OK. 1284 00:56:10,420 --> 00:56:10,860 Very good. 1285 00:56:10,860 --> 00:56:17,110 So one way to think about this is you have your target 1286 00:56:17,110 --> 00:56:20,730 population, then you have potential participants. 1287 00:56:20,730 --> 00:56:24,870 This may be children you're targeting in your 1288 00:56:24,870 --> 00:56:26,220 intervention. 1289 00:56:26,220 --> 00:56:28,920 And then you have your evaluation sample. 1290 00:56:28,920 --> 00:56:34,170 Here's where the random sampling could occur. 1291 00:56:34,170 --> 00:56:36,150 So-- 1292 00:56:36,150 --> 00:56:37,450 sorry, I forgot your name. 1293 00:56:37,450 --> 00:56:38,270 AUDIENCE: I didn't tell you. 1294 00:56:38,270 --> 00:56:39,120 PROFESSOR: You didn't tell me. 1295 00:56:39,120 --> 00:56:41,800 This is even worse. 1296 00:56:41,800 --> 00:56:42,600 Jean. 1297 00:56:42,600 --> 00:56:46,100 So what Jean is saying is, random sampling happened at 1298 00:56:46,100 --> 00:56:47,070 this stage.
1299 00:56:47,070 --> 00:56:49,400 Or could have happened in this stage. 1300 00:56:49,400 --> 00:56:53,870 What random sampling is buying you is the ability to 1301 00:56:53,870 --> 00:56:56,440 generalize from your evaluation to 1302 00:56:56,440 --> 00:56:57,790 this population here. 1303 00:56:57,790 --> 00:57:00,080 And whether this is a population of policy interest 1304 00:57:00,080 --> 00:57:01,320 or not, that's a different matter. 1305 00:57:01,320 --> 00:57:04,730 But that's what random sampling is buying you. 1306 00:57:04,730 --> 00:57:07,760 What random assignment is doing is once you have the 1307 00:57:07,760 --> 00:57:10,540 samples-- so suppose there are 100,000 potential 1308 00:57:10,540 --> 00:57:11,640 participants. 1309 00:57:11,640 --> 00:57:15,000 You don't have money to enroll 100,000 people in a program or 1310 00:57:15,000 --> 00:57:16,350 in an evaluation. 1311 00:57:16,350 --> 00:57:20,510 You pick, out of this 100,000, 5,000 at random, and the results 1312 00:57:20,510 --> 00:57:23,990 of your study are going to be generalizable to this 100,000. 1313 00:57:23,990 --> 00:57:27,380 Now, within this 5,000, you do random assignment and you 1314 00:57:27,380 --> 00:57:29,670 assign to a treatment group and to a control group. 1315 00:57:29,670 --> 00:57:34,800 Maybe of this 5,000, 2,500 fall here, 2,500 fall here. 1316 00:57:34,800 --> 00:57:38,770 What random assignment buys you is these two groups are 1317 00:57:38,770 --> 00:57:41,490 identical, and so any difference you observe in 1318 00:57:41,490 --> 00:57:43,670 outcomes is due to the program. 1319 00:57:43,670 --> 00:57:45,140 That's internal validity. 1320 00:57:45,140 --> 00:57:48,760 That has to do with causal inference that is about this 1321 00:57:48,760 --> 00:57:50,920 5,000 that are here. 1322 00:57:50,920 --> 00:57:56,340 So where the 5,000 generalize to is a question of external validity.
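[Editor's sketch] The two steps described above, drawing 5,000 at random out of 100,000 potential participants and then splitting those 5,000 into treatment and control, can be written out roughly as follows. This is only an illustration: the function name, the integer IDs, and the fixed seed are assumptions made for the example, not part of the lecture.

```python
import random


def sample_then_assign(population_ids, sample_size, seed=0):
    """Random sampling followed by random assignment (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    # Random sampling: pick the evaluation sample out of the potential participants.
    evaluation_sample = rng.sample(population_ids, sample_size)
    # Random assignment: shuffle the sample and take the first half as treatment.
    shuffled = evaluation_sample[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical IDs standing in for the 100,000 potential participants.
population = list(range(100_000))
treatment, control = sample_then_assign(population, 5_000)
print(len(treatment), len(control))  # 2500 2500
```

The first step is what supports generalizing to the 100,000; the second is what makes the two groups of 2,500 comparable to each other.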
1323 00:57:56,340 --> 00:57:59,610 So they both have the word "random," but these are two 1324 00:57:59,610 --> 00:58:01,980 different concepts. 1325 00:58:01,980 --> 00:58:04,370 Again, random assignment relates to internal validity, 1326 00:58:04,370 --> 00:58:05,320 causal inference. 1327 00:58:05,320 --> 00:58:09,020 Random sampling refers to external validity. 1328 00:58:09,020 --> 00:58:09,480 Yes? 1329 00:58:09,480 --> 00:58:12,040 AUDIENCE: My name is Cornelia. 1330 00:58:12,040 --> 00:58:13,590 PROFESSOR: I should know it by now. 1331 00:58:13,590 --> 00:58:14,840 AUDIENCE: I haven't said it yet. 1332 00:58:16,900 --> 00:58:18,100 Can you do one and not the other? 1333 00:58:18,100 --> 00:58:18,780 Not really. 1334 00:58:18,780 --> 00:58:19,610 Do you have to--? 1335 00:58:19,610 --> 00:58:20,430 PROFESSOR: You can, you can. 1336 00:58:20,430 --> 00:58:21,130 In fact-- 1337 00:58:21,130 --> 00:58:22,230 well, sorry. 1338 00:58:22,230 --> 00:58:25,040 If it's called a randomized experiment, this 1339 00:58:25,040 --> 00:58:28,440 one has to be there. 1340 00:58:28,440 --> 00:58:30,610 This is what defines a randomized experiment. 1341 00:58:30,610 --> 00:58:31,860 There was random assignment. 1342 00:58:34,250 --> 00:58:37,230 AUDIENCE: So you can do a randomized assignment, even if 1343 00:58:37,230 --> 00:58:38,600 your sampling is not random. 1344 00:58:38,600 --> 00:58:39,760 PROFESSOR: That's right. 1345 00:58:39,760 --> 00:58:43,350 So what that means is that then you need to think about 1346 00:58:43,350 --> 00:58:44,600 who you generalize to. 1347 00:58:47,880 --> 00:58:48,580 All right. 1348 00:58:48,580 --> 00:58:51,225 So advantages and limitations of experiments.
1349 00:58:53,790 --> 00:58:57,830 For those of you who are a little bit more statistically 1350 00:58:57,830 --> 00:59:03,170 inclined, the key thing about random assignment is that not 1351 00:59:03,170 --> 00:59:06,510 only on average the two groups are the same, but the 1352 00:59:06,510 --> 00:59:09,550 distribution, the statistical distribution of the two 1353 00:59:09,550 --> 00:59:13,220 groups, is the same. 1354 00:59:13,220 --> 00:59:16,310 And this is very powerful for a lot of the adjustments that 1355 00:59:16,310 --> 00:59:19,010 come at a later stage, particularly when there are 1356 00:59:19,010 --> 00:59:21,200 crossovers and similar things. 1357 00:59:21,200 --> 00:59:24,450 The idea is that the two groups not only on average 1358 00:59:24,450 --> 00:59:26,900 both observable and unobservable characteristics 1359 00:59:26,900 --> 00:59:29,320 look the same, but the whole distribution. 1360 00:59:29,320 --> 00:59:31,480 So they have the same variance, they have the same 1361 00:59:31,480 --> 00:59:35,530 25th percentile, the same 75th percentile. 1362 00:59:35,530 --> 00:59:38,720 And of course, when I say the same, again, it's in a 1363 00:59:38,720 --> 00:59:43,430 statistical sense, subject to sampling error, which we can 1364 00:59:43,430 --> 00:59:45,000 account for. 1365 00:59:45,000 --> 00:59:45,970 And so there are-- 1366 00:59:45,970 --> 00:59:46,730 yes? 1367 00:59:46,730 --> 00:59:49,287 AUDIENCE: That doesn't necessarily mean that they're 1368 00:59:49,287 --> 00:59:51,010 both anomalies. 1369 00:59:51,010 --> 00:59:51,770 PROFESSOR: No, no, no. 1370 00:59:51,770 --> 00:59:52,640 AUDIENCE: [INAUDIBLE] 1371 00:59:52,640 --> 00:59:53,986 PROFESSOR: Anything. 1372 00:59:53,986 --> 00:59:54,420 Yeah. 1373 00:59:54,420 --> 00:59:56,760 Anything. 1374 00:59:56,760 --> 00:59:58,910 But the distribution should look the same. 1375 01:00:01,510 --> 01:00:01,860 OK.
1376 01:00:01,860 --> 01:00:04,790 So no systematic differences between the two groups. 1377 01:00:07,520 --> 01:00:12,610 This is deliberately a repeated slide. 1378 01:00:12,610 --> 01:00:15,340 I didn't forget to take it out of the presentation. 1379 01:00:15,340 --> 01:00:17,940 Key advantage, key takeaway message-- 1380 01:00:17,940 --> 01:00:21,950 these two groups do not differ systematically at the outset, 1381 01:00:21,950 --> 01:00:25,620 so any difference you observe should be attributable to the 1382 01:00:25,620 --> 01:00:26,240 experiment. 1383 01:00:26,240 --> 01:00:28,960 And this is under the big assumption that the experiment 1384 01:00:28,960 --> 01:00:31,160 was properly designed and conducted. 1385 01:00:31,160 --> 01:00:33,755 It's not like any experiment will reach this. 1386 01:00:38,660 --> 01:00:41,540 So other advantages of experiments. 1387 01:00:41,540 --> 01:00:45,330 Relative to results from non-experimental studies, 1388 01:00:45,330 --> 01:00:48,040 they're less subject to methodological debates. 1389 01:00:48,040 --> 01:00:51,760 So a lot more boring conversations in academic 1390 01:00:51,760 --> 01:01:00,140 seminars because there may be some questions about what 1391 01:01:00,140 --> 01:01:03,070 question is being answered, there may be some questions 1392 01:01:03,070 --> 01:01:05,370 about things that happen in the field that may have 1393 01:01:05,370 --> 01:01:06,710 threatened the experiment. 1394 01:01:06,710 --> 01:01:09,830 But the basic notion that if it was done properly, the two 1395 01:01:09,830 --> 01:01:12,580 groups should look alike, it's never debated. 1396 01:01:12,580 --> 01:01:17,340 Whereas with non-experimental methods, that's the whole sort 1397 01:01:17,340 --> 01:01:22,500 of central claim of the seminar and of the presenter. 1398 01:01:22,500 --> 01:01:23,780 They're easier to convey. 1399 01:01:23,780 --> 01:01:25,250 You can explain to people, look. 
1400 01:01:25,250 --> 01:01:27,570 These two groups look alike at the beginning. 1401 01:01:27,570 --> 01:01:29,240 Now there's a difference. 1402 01:01:29,240 --> 01:01:31,100 It must have been the program. 1403 01:01:31,100 --> 01:01:34,240 And they're more likely to be convincing to program funders 1404 01:01:34,240 --> 01:01:35,920 and/or policymakers. 1405 01:01:35,920 --> 01:01:39,460 If they find it more credible, easier to convey, it's more 1406 01:01:39,460 --> 01:01:40,960 likely that they will take action. 1407 01:01:40,960 --> 01:01:44,210 Although in this respect, I can't emphasize enough what 1408 01:01:44,210 --> 01:01:46,870 Rachel said, which is, look. 1409 01:01:46,870 --> 01:01:50,090 If you have the right question, then answering that 1410 01:01:50,090 --> 01:01:53,650 question is going to be important to lead to change. 1411 01:01:53,650 --> 01:01:55,490 If you have the wrong question, even if you did a 1412 01:01:55,490 --> 01:01:58,500 nice experiment, it's not going to help you that much. 1413 01:01:58,500 --> 01:02:00,964 Yes? 1414 01:02:00,964 --> 01:02:06,420 AUDIENCE: I've been to the conference two months ago. 1415 01:02:06,420 --> 01:02:12,372 Some people were arguing that last first advantage that is 1416 01:02:12,372 --> 01:02:15,950 with randomization-- 1417 01:02:15,950 --> 01:02:18,105 that's random assignment-- 1418 01:02:18,105 --> 01:02:28,200 how to build two groups that are identical to each other. 1419 01:02:28,200 --> 01:02:33,030 And some people argue that you will almost never find a 1420 01:02:33,030 --> 01:02:39,380 context where you will have that situation occur. 1421 01:02:39,380 --> 01:02:43,690 The way the government programs operating in most 1422 01:02:43,690 --> 01:02:51,710 cases, it is almost impossible that you find an exact 1423 01:02:51,710 --> 01:02:55,630 identical treatment group and control group. 
1424 01:02:55,630 --> 01:03:00,730 PROFESSOR: See, the key thing here is that you don't 1425 01:03:00,730 --> 01:03:01,520 need to find it. 1426 01:03:01,520 --> 01:03:04,270 It's not like you have a treatment group and now let's 1427 01:03:04,270 --> 01:03:06,980 look in the whole country, where is the control group? 1428 01:03:06,980 --> 01:03:08,400 No. 1429 01:03:08,400 --> 01:03:12,270 This method forces the two groups to be the same. 1430 01:03:12,270 --> 01:03:14,680 As long as there are some people who are going to be 1431 01:03:14,680 --> 01:03:18,430 served by the program and some that are not, if you randomly 1432 01:03:18,430 --> 01:03:21,000 assign to these two groups, the two 1433 01:03:21,000 --> 01:03:22,260 groups should be identical. 1434 01:03:22,260 --> 01:03:24,890 Not because you were very smart and looked for the 1435 01:03:24,890 --> 01:03:25,670 other group, no. 1436 01:03:25,670 --> 01:03:32,370 It's like random assignment is for those of us who precisely 1437 01:03:32,370 --> 01:03:34,130 don't think we can come up with that other 1438 01:03:34,130 --> 01:03:37,340 group on our own. 1439 01:03:37,340 --> 01:03:44,080 So there may be issues with whether you have enough 1440 01:03:44,080 --> 01:03:47,130 program applicants to be able to divide them into two 1441 01:03:47,130 --> 01:03:50,540 groups, participants and non-participants. 1442 01:03:50,540 --> 01:03:52,060 But in context where you're not 1443 01:03:52,060 --> 01:03:55,240 serving all the two groups-- 1444 01:03:55,240 --> 01:03:58,470 so if you don't have money to serve 1,000 people, and 1,000 1445 01:03:58,470 --> 01:04:00,990 people applied to your program, and you only have 400 1446 01:04:00,990 --> 01:04:04,510 slots, that's not going to-- this goes to the ethical 1447 01:04:04,510 --> 01:04:06,110 issue, which we'll discuss in a second. 1448 01:04:10,140 --> 01:04:12,990 The only thing that changes is how you select those 400. 
1449 01:04:12,990 --> 01:04:15,520 But once you've selected randomly, those two groups 1450 01:04:15,520 --> 01:04:16,860 should look identical. 1451 01:04:16,860 --> 01:04:20,940 Again, not because you were incredibly astute at saying, 1452 01:04:20,940 --> 01:04:22,260 oh, here's another group. 1453 01:04:22,260 --> 01:04:22,500 No. 1454 01:04:22,500 --> 01:04:25,490 This happens through the flip of a coin. 1455 01:04:25,490 --> 01:04:30,380 This is not a kind of, oh, can the researcher 1456 01:04:30,380 --> 01:04:31,500 find a group? 1457 01:04:31,500 --> 01:04:35,140 Or the context is a developing versus a developed country. 1458 01:04:35,140 --> 01:04:36,780 This has to do with the technique 1459 01:04:36,780 --> 01:04:38,880 applied to any setting. 1460 01:04:38,880 --> 01:04:43,240 Again, you're going to have a case where you see a 1461 01:04:43,240 --> 01:04:45,440 spreadsheet and you can see, you can do the random 1462 01:04:45,440 --> 01:04:47,430 assignment and see for yourself that the two groups 1463 01:04:47,430 --> 01:04:49,200 will look similar. 1464 01:04:49,200 --> 01:04:50,450 OK? 1465 01:04:52,480 --> 01:04:54,680 AUDIENCE: Is it necessary that the size of the two groups 1466 01:04:54,680 --> 01:04:56,120 have to be the same? 1467 01:04:56,120 --> 01:04:57,470 PROFESSOR: No, it's not necessary. 1468 01:04:57,470 --> 01:05:01,150 And in fact in practice, what happens is, suppose you had 1469 01:05:01,150 --> 01:05:07,200 1,000 applicants and you had money to serve 600. 1470 01:05:07,200 --> 01:05:11,760 Then no matter what the statistician says-- oh, it 1471 01:05:11,760 --> 01:05:13,840 would be nice to have 500 and 500-- 1472 01:05:13,840 --> 01:05:18,140 you're not going to have 100 people not being served just 1473 01:05:18,140 --> 01:05:23,170 because you want to keep the half-half ratio.
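[Editor's sketch] The spreadsheet exercise mentioned above, doing the random assignment and checking that the two groups look similar, might look roughly like this for 1,000 applicants and 600 slots. The baseline characteristic (an age drawn from a normal distribution), the function name, and the seeds are all made up for illustration.

```python
import random
import statistics


def assign_by_lottery(applicant_ids, n_slots, seed=0):
    """Fill n_slots by lottery; everyone not drawn forms the control group."""
    rng = random.Random(seed)
    treated = set(rng.sample(applicant_ids, n_slots))
    control = [i for i in applicant_ids if i not in treated]
    return sorted(treated), control


# Hypothetical baseline data: one made-up characteristic per applicant.
data_rng = random.Random(1)
baseline_age = {i: data_rng.gauss(35, 10) for i in range(1000)}

treated, control = assign_by_lottery(list(baseline_age), 600, seed=2)
mean_treated = statistics.mean(baseline_age[i] for i in treated)
mean_control = statistics.mean(baseline_age[i] for i in control)
# The two means should be close, differing only by sampling error.
print(round(mean_treated, 1), round(mean_control, 1))
```

Nothing in the lottery looks at the baseline characteristic, which is why the groups come out similar without anyone having to search for a matching comparison group. The groups are also unequal in size (600 and 400), which is fine.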
1474 01:05:23,170 --> 01:05:26,450 From a statistical perspective it's ideal to have 50-50 1475 01:05:26,450 --> 01:05:30,740 ratio, but only from a statistical perspective. 1476 01:05:30,740 --> 01:05:33,360 If you deviate too much from that 50-50, 1477 01:05:33,360 --> 01:05:34,750 then you get in trouble. 1478 01:05:34,750 --> 01:05:36,683 So if you get to-- 1479 01:05:36,683 --> 01:05:37,410 I don't know. 1480 01:05:37,410 --> 01:05:38,810 The rule of thumb may be different 1481 01:05:38,810 --> 01:05:39,540 for different people. 1482 01:05:39,540 --> 01:05:44,040 But if you get over 70-30, I would say probably you're 1483 01:05:44,040 --> 01:05:45,300 going to lose a lot of statistical 1484 01:05:45,300 --> 01:05:46,400 power by doing that. 1485 01:05:46,400 --> 01:05:51,450 AUDIENCE: Yeah, but in some cases, for example, a country 1486 01:05:51,450 --> 01:05:57,870 needs to make priority in aid with about 200 1487 01:05:57,870 --> 01:05:59,690 hospitals, for example. 1488 01:05:59,690 --> 01:06:06,320 And in my country, there is one hospital that is the most 1489 01:06:06,320 --> 01:06:08,830 important public hospital in Honduras. 1490 01:06:08,830 --> 01:06:13,630 So you can apply this randomized process. 1491 01:06:13,630 --> 01:06:20,390 But if you don't include this particular hospital, you 1492 01:06:20,390 --> 01:06:24,300 cannot include this particular hospital 1493 01:06:24,300 --> 01:06:26,740 because it's too important. 1494 01:06:26,740 --> 01:06:30,750 We call that [UNINTELLIGIBLE] 1495 01:06:30,750 --> 01:06:32,940 [? represented ?] 1496 01:06:32,940 --> 01:06:36,090 subject for this type of problem, who have the 1497 01:06:36,090 --> 01:06:38,960 possibility of 1. 1498 01:06:38,960 --> 01:06:42,570 Should be in the sample. 1499 01:06:42,570 --> 01:06:44,960 I don't know if you understand my Spanglish. 1500 01:06:44,960 --> 01:06:45,750 PROFESSOR: No, no. 1501 01:06:45,750 --> 01:06:46,520 I speak Spanish.
1502 01:06:46,520 --> 01:06:48,010 We can communicate here. 1503 01:06:48,010 --> 01:06:55,320 So the key thing is, again, you're trying to create 1504 01:06:55,320 --> 01:06:57,220 comparable groups. 1505 01:06:57,220 --> 01:07:01,350 If for some reason you need to serve a hospital because the 1506 01:07:01,350 --> 01:07:04,260 president of your country says, you need to serve this 1507 01:07:04,260 --> 01:07:06,300 hospital, that's fine. 1508 01:07:06,300 --> 01:07:07,650 One slot. 1509 01:07:07,650 --> 01:07:10,605 But that hospital should not be a part of your study, 1510 01:07:10,605 --> 01:07:14,920 because that hospital was not randomly assigned. 1511 01:07:14,920 --> 01:07:15,570 That's all. 1512 01:07:15,570 --> 01:07:16,550 As simple as that. 1513 01:07:16,550 --> 01:07:17,690 And you may have a few of those. 1514 01:07:17,690 --> 01:07:21,890 I mean, I can tell you, in my own experience, we're trying 1515 01:07:21,890 --> 01:07:25,680 to implement random assignment in Niger, a program financed 1516 01:07:25,680 --> 01:07:28,156 by the Millennium Challenge Corporation. 1517 01:07:28,156 --> 01:07:31,560 A program about building schools. 1518 01:07:31,560 --> 01:07:33,250 We said, we're going to do a random assignment. 1519 01:07:33,250 --> 01:07:35,230 And they say, yes, yes, yes. 1520 01:07:35,230 --> 01:07:38,380 Well, the US ambassador visited two of the villages, 1521 01:07:38,380 --> 01:07:41,740 and he promised them they were getting schools. 1522 01:07:41,740 --> 01:07:43,850 Now, you tell me if you want to be the evaluator and tell 1523 01:07:43,850 --> 01:07:44,870 those schools, no, no. 1524 01:07:44,870 --> 01:07:47,780 We're going to put you in the pool of-- 1525 01:07:47,780 --> 01:07:48,440 no way. 1526 01:07:48,440 --> 01:07:51,520 Those two villages are going to get their schools, but 1527 01:07:51,520 --> 01:07:52,820 they're not part of our evaluation. 1528 01:07:57,552 --> 01:08:01,960 AUDIENCE: Is there an acceptable margin?
1529 01:08:01,960 --> 01:08:03,920 PROFESSOR: See, that's again the Jamaica question. 1530 01:08:03,920 --> 01:08:05,710 I won't make that mistake again. 1531 01:08:05,710 --> 01:08:06,960 I won't tell you. 1532 01:08:09,210 --> 01:08:11,080 You're going to see on Thursday a whole session on 1533 01:08:11,080 --> 01:08:13,110 statistical power, and you're going to get a sense 1534 01:08:13,110 --> 01:08:14,610 of where you are. 1535 01:08:14,610 --> 01:08:16,870 You don't want to have too many first, because you lose 1536 01:08:16,870 --> 01:08:19,410 sample size, and second because you lose 1537 01:08:19,410 --> 01:08:20,500 representativeness. 1538 01:08:20,500 --> 01:08:23,380 I mean, in the case of the hospital in Honduras, if 1539 01:08:23,380 --> 01:08:28,700 that's the hospital where 90% of things are happening, then 1540 01:08:28,700 --> 01:08:31,950 it's a little bit hard to have that as a hospital that's out 1541 01:08:31,950 --> 01:08:32,850 of your study. 1542 01:08:32,850 --> 01:08:36,279 So that is an important issue. 1543 01:08:36,279 --> 01:08:36,830 All right. 1544 01:08:36,830 --> 01:08:38,510 There are limitations of experiments, 1545 01:08:38,510 --> 01:08:39,760 believe it or not. 1546 01:08:42,279 --> 01:08:47,800 So the first one is, huge methodological advantages. 1547 01:08:47,800 --> 01:08:50,700 But you still need to worry about these issues of internal 1548 01:08:50,700 --> 01:08:53,410 validity and external validity. 1549 01:08:53,410 --> 01:08:56,210 And what I would say about this is, on Friday you're 1550 01:08:56,210 --> 01:08:59,040 going to learn a lot about how to deal with these internal 1551 01:08:59,040 --> 01:09:00,540 validity issues. 1552 01:09:00,540 --> 01:09:02,270 And I'm not going to go over them now.
1553 01:09:02,270 --> 01:09:05,060 But the key thing is, if you can avoid them from the 1554 01:09:05,060 --> 01:09:07,790 beginning in terms of how you design your program and how 1555 01:09:07,790 --> 01:09:10,229 you implement them, then much better. 1556 01:09:10,229 --> 01:09:11,810 External validity issues-- 1557 01:09:11,810 --> 01:09:14,970 as Rachel said, any impact evaluation conducted in a 1558 01:09:14,970 --> 01:09:17,880 particular setting is going to have external validity issues. 1559 01:09:17,880 --> 01:09:20,279 But experiments are particularly prone to them 1560 01:09:20,279 --> 01:09:23,819 because they're sometimes done in particularly concentrated 1561 01:09:23,819 --> 01:09:26,350 areas where you really want to find out, does this program 1562 01:09:26,350 --> 01:09:28,490 work before expanding it, so the external 1563 01:09:28,490 --> 01:09:30,889 validity issue is there. 1564 01:09:30,889 --> 01:09:33,810 As Rachel said, if you can design an experiment to test 1565 01:09:33,810 --> 01:09:40,760 each thing in your theory of change, that usually helps 1566 01:09:40,760 --> 01:09:41,600 with external validity. 1567 01:09:41,600 --> 01:09:43,279 And of course, if you can replicate 1568 01:09:43,279 --> 01:09:45,220 evaluation in other settings. 1569 01:09:45,220 --> 01:09:48,410 AUDIENCE: So OK, you're going to have 10 variables with 1570 01:09:48,410 --> 01:09:52,450 internal validity, equal internal validity, but only 1571 01:09:52,450 --> 01:09:54,910 three variables with external validity? 1572 01:09:54,910 --> 01:09:56,790 PROFESSOR: When you say three variables, what do you mean 1573 01:09:56,790 --> 01:09:58,950 with variables? 1574 01:09:58,950 --> 01:10:02,310 AUDIENCE: The variables that you are--variables. 1575 01:10:02,310 --> 01:10:03,790 The study variables. 1576 01:10:03,790 --> 01:10:08,310 I mean, when you're going to evaluate internal validity, 1577 01:10:08,310 --> 01:10:11,320 you're going to have 10 variables or 20. 
1578 01:10:11,320 --> 01:10:13,440 PROFESSOR: Well, internal validity, the two groups 1579 01:10:13,440 --> 01:10:14,450 should be the same. 1580 01:10:14,450 --> 01:10:17,250 And you have pretty strong internal validity if you can 1581 01:10:17,250 --> 01:10:19,460 deal with this problem. 1582 01:10:19,460 --> 01:10:24,190 AUDIENCE: When you're going to the external validity, maybe 1583 01:10:24,190 --> 01:10:28,090 not the whole 20 variables will have external validity. 1584 01:10:28,090 --> 01:10:33,620 But maybe your three or four where you have been made 1585 01:10:33,620 --> 01:10:35,390 different experiment in-- 1586 01:10:35,390 --> 01:10:39,100 PROFESSOR: So it really depends on the 1587 01:10:39,100 --> 01:10:40,880 context of your project. 1588 01:10:40,880 --> 01:10:44,030 Again, I think the good example is deworming. 1589 01:10:44,030 --> 01:10:48,950 So deworming, you take out worms. 1590 01:10:48,950 --> 01:10:54,700 Well, in Honduras, if children who go to school, there are no 1591 01:10:54,700 --> 01:10:56,690 worms, and that's not the reason they don't go to 1592 01:10:56,690 --> 01:11:00,420 school, then that program in Kenya doesn't have much 1593 01:11:00,420 --> 01:11:03,020 external validity or generalizability to Honduras. 1594 01:11:03,020 --> 01:11:06,020 So you need to be thinking about how the effect is 1595 01:11:06,020 --> 01:11:06,875 supposed to be happening. 1596 01:11:06,875 --> 01:11:09,930 And here there was the anemia thing, which may work in the 1597 01:11:09,930 --> 01:11:12,240 case of Honduras or not. 1598 01:11:12,240 --> 01:11:15,600 You need to be seeing, what is the chain? 1599 01:11:15,600 --> 01:11:18,360 And seeing whether that chain is likely to hold in whatever 1600 01:11:18,360 --> 01:11:20,350 other contexts you want to apply. 1601 01:11:20,350 --> 01:11:22,260 There's no magic formula here. 
1602 01:11:22,260 --> 01:11:24,890 AUDIENCE: Yeah, but you are going to control the 1603 01:11:24,890 --> 01:11:28,880 theoretical framework with just three, four variables 1604 01:11:28,880 --> 01:11:33,700 because that variable will be common in different countries? 1605 01:11:33,700 --> 01:11:36,290 PROFESSOR: Yeah, but you can have 200 variables. 1606 01:11:36,290 --> 01:11:39,530 You can say, it depends on so many things. 1607 01:11:39,530 --> 01:11:42,810 But there's a limit to how much-- 1608 01:11:42,810 --> 01:11:45,520 the external validity issue is an issue that you can always 1609 01:11:45,520 --> 01:11:46,370 hide behind it. 1610 01:11:46,370 --> 01:11:49,520 You can always say, oh, this program worked in Kenya. 1611 01:11:49,520 --> 01:11:51,670 Who knows whether it would work somewhere else? 1612 01:11:51,670 --> 01:11:54,530 And then if you take that attitude, then you can't learn 1613 01:11:54,530 --> 01:11:57,880 anything from a randomized experiment, or from any impact 1614 01:11:57,880 --> 01:12:00,140 evaluation that's done in a specific setting. 1615 01:12:00,140 --> 01:12:03,340 Because even if you did it in Kenya, in a particular point 1616 01:12:03,340 --> 01:12:07,000 in time, you can always say, well, it worked in Kenya ten 1617 01:12:07,000 --> 01:12:09,370 years ago, but maybe it won't work today. 1618 01:12:09,370 --> 01:12:12,080 So I lean to the middle ground here. 1619 01:12:12,080 --> 01:12:15,290 You sort of think about what are the critical steps or 1620 01:12:15,290 --> 01:12:19,880 stages in which it can work, and then go implement it, and 1621 01:12:19,880 --> 01:12:21,630 maybe evaluate it. 1622 01:12:21,630 --> 01:12:25,370 I think my answer here is, external validity issues are 1623 01:12:25,370 --> 01:12:27,090 going to be present for both experiments and 1624 01:12:27,090 --> 01:12:27,730 non-experiments. 1625 01:12:27,730 --> 01:12:29,370 There is no magic formula here. 
1626 01:12:29,370 --> 01:12:32,060 As long as you evaluate in a particular setting, you're 1627 01:12:32,060 --> 01:12:35,400 still going to be subject to the question, does it work in 1628 01:12:35,400 --> 01:12:38,610 some other setting? 1629 01:12:38,610 --> 01:12:40,980 Some of these threats also affect the validity of 1630 01:12:40,980 --> 01:12:43,080 non-experimental studies. 1631 01:12:43,080 --> 01:12:45,930 The key thing, though, is that some of this, in the 1632 01:12:45,930 --> 01:12:49,290 non-experimental studies, you may not even realize that you 1633 01:12:49,290 --> 01:12:50,370 have the threat. 1634 01:12:50,370 --> 01:12:53,160 Because you've already done something that allows you to 1635 01:12:53,160 --> 01:12:54,960 be blind to the threat. 1636 01:12:59,600 --> 01:13:03,820 So other limitations, the experiment measures the impact 1637 01:13:03,820 --> 01:13:07,070 of the offer of the treatment. 1638 01:13:07,070 --> 01:13:13,770 So when we implement the program, and we say, OK, you 1639 01:13:13,770 --> 01:13:15,600 are in the treatment group, you're going to get the 1640 01:13:15,600 --> 01:13:18,510 program, as you know from implementing these programs in 1641 01:13:18,510 --> 01:13:21,580 the field, not all of the people you offer the program 1642 01:13:21,580 --> 01:13:23,670 are going to take up the program. 1643 01:13:23,670 --> 01:13:27,680 So what the experiment buys you is, the whole treatment 1644 01:13:27,680 --> 01:13:29,860 group is comparable to the whole control group. 1645 01:13:29,860 --> 01:13:33,030 So the experiment is going to tell you, this is the impact 1646 01:13:33,030 --> 01:13:36,000 for every, on average, for the whole treatment group. 1647 01:13:36,000 --> 01:13:39,940 So some of them may not have received the program, and some 1648 01:13:39,940 --> 01:13:41,730 of them may be diluting the impact of the 1649 01:13:41,730 --> 01:13:43,310 program when you estimate. 
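[Editor's sketch] A small arithmetic illustration of the dilution described above. It assumes, purely for the example, that people who are offered the program but do not take it up experience no effect at all; the function name and the numbers are hypothetical.

```python
def diluted_effect(effect_on_takers, take_up_rate):
    """Average effect over everyone offered the program.

    Assumes (for illustration only) that those who do not take up
    the program have an effect of exactly zero.
    """
    if not 0.0 <= take_up_rate <= 1.0:
        raise ValueError("take_up_rate must be between 0 and 1")
    return take_up_rate * effect_on_takers


# Hypothetical numbers: a 10-point effect on those who participate,
# but only 40% of the treatment group takes up the program.
print(diluted_effect(10.0, 0.4))  # 4.0
```

Under that assumption, the comparison of the whole treatment group with the whole control group would show an average difference of about 4 points rather than 10, and the lower the take-up, the smaller the measured difference.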
1650 01:13:43,310 --> 01:13:48,270 But technically, that's the impact that the experiment is 1651 01:13:48,270 --> 01:13:49,170 estimating. 1652 01:13:49,170 --> 01:13:53,470 So if you have a program with a very low take-up rate, then 1653 01:13:53,470 --> 01:13:56,710 you need to worry about the issue that the non-takers are 1654 01:13:56,710 --> 01:13:58,950 going to dilute the effect of the program. 1655 01:13:58,950 --> 01:14:01,670 You can then go and calculate, what is the effect of the 1656 01:14:01,670 --> 01:14:04,070 program for those who participated? 1657 01:14:04,070 --> 01:14:07,810 But then you start relying on non-experimental assumptions. 1658 01:14:07,810 --> 01:14:11,220 You've lost a bit the advantage of the experiment. 1659 01:14:11,220 --> 01:14:14,750 So that's something that you need to think about when you 1660 01:14:14,750 --> 01:14:16,000 do an experiment. 1661 01:14:18,840 --> 01:14:20,440 There's a limitation in terms of these 1662 01:14:20,440 --> 01:14:22,726 experiments can be costly. 1663 01:14:22,726 --> 01:14:25,250 I'll sort of just say two things about being costly. 1664 01:14:29,210 --> 01:14:31,460 I'll say three things about being costly. 1665 01:14:31,460 --> 01:14:33,930 And I did learn that I should never say "I'll say three 1666 01:14:33,930 --> 01:14:35,960 things," and I'll forget what those three things are. 1667 01:14:35,960 --> 01:14:37,360 But I think I'll keep them in mind. 1668 01:14:37,360 --> 01:14:39,310 The first thing-- 1669 01:14:39,310 --> 01:14:42,340 a lot of the cost of an experiment is data collection. 1670 01:14:42,340 --> 01:14:45,150 So if you are trying to evaluate the impact of a 1671 01:14:45,150 --> 01:14:48,540 program through some other non-experimental method that 1672 01:14:48,540 --> 01:14:54,280 involves data collection, you've already made the two 1673 01:14:54,280 --> 01:14:55,540 costs pretty comparable. 1674 01:14:55,540 --> 01:14:58,340 Because again, data collection is a big cost. 
1675 01:14:58,340 --> 01:15:00,760 If you had a non-experimental method where you don't have to 1676 01:15:00,760 --> 01:15:04,150 collect data, obviously there's no question that that 1677 01:15:04,150 --> 01:15:05,870 is going to be cheaper. 1678 01:15:05,870 --> 01:15:07,330 So it can be costly. 1679 01:15:07,330 --> 01:15:09,790 But again, main cost data collection, which may be the 1680 01:15:09,790 --> 01:15:13,490 same for non-experimental studies that collect data. 1681 01:15:13,490 --> 01:15:17,540 But the other thing about the experiment in terms of cost is 1682 01:15:17,540 --> 01:15:22,000 that the same sample size buys you more statistical power. 1683 01:15:22,000 --> 01:15:24,570 And you may see some of this on Thursday. 1684 01:15:24,570 --> 01:15:27,750 So if you have a sample size of 1,000 people for an 1685 01:15:27,750 --> 01:15:31,590 experimental study and a sample size of 1,000 people 1686 01:15:31,590 --> 01:15:35,310 for a non-experimental study, those data collections' cost 1687 01:15:35,310 --> 01:15:38,010 will be identical, but they will be buying you different 1688 01:15:38,010 --> 01:15:39,310 statistical power. 1689 01:15:39,310 --> 01:15:42,690 So that's one thing to keep in mind about the cost of 1690 01:15:42,690 --> 01:15:43,880 experiments. 1691 01:15:43,880 --> 01:15:47,580 And the last thing is, you need to factor in, what is the 1692 01:15:47,580 --> 01:15:49,140 cost of getting the wrong answers? 1693 01:15:49,140 --> 01:15:51,510 If you really think that non-experimental methods are 1694 01:15:51,510 --> 01:15:54,710 not going to work in your particular context, then it's 1695 01:15:54,710 --> 01:15:57,540 not so useful to invest less money if you don't think 1696 01:15:57,540 --> 01:15:58,870 you're going to get the same answer. 1697 01:15:58,870 --> 01:16:01,430 And again, I don't want to push the notion that only with 1698 01:16:01,430 --> 01:16:02,930 an experiment you'll get the right answer. 
1699 01:16:02,930 --> 01:16:05,880 But if you think with a non-experiment, you won't get 1700 01:16:05,880 --> 01:16:08,700 the right answer, then the cost of the wrong answer, the 1701 01:16:08,700 --> 01:16:10,440 risk of a wrong answer. 1702 01:16:10,440 --> 01:16:13,550 Ethical issues. 1703 01:16:13,550 --> 01:16:15,720 Throw them at me. 1704 01:16:15,720 --> 01:16:18,380 AUDIENCE: How do you say no to people who come to you, saying 1705 01:16:18,380 --> 01:16:20,450 I want to put myself in this program. 1706 01:16:20,450 --> 01:16:23,090 I have all the characteristics you're asking for. 1707 01:16:23,090 --> 01:16:25,710 You're offering it to my neighbor. 1708 01:16:25,710 --> 01:16:27,080 How come you're not offering it to me? 1709 01:16:27,080 --> 01:16:28,600 PROFESSOR: OK. 1710 01:16:28,600 --> 01:16:34,120 The first thing to think about here is experiments are 1711 01:16:34,120 --> 01:16:39,180 typically done in context where there's excess demand. 1712 01:16:39,180 --> 01:16:42,340 Where there are more people who want to be in your program 1713 01:16:42,340 --> 01:16:45,670 than can be served by your program. 1714 01:16:45,670 --> 01:16:48,740 And if that's the case, suppose you had 1,000 people 1715 01:16:48,740 --> 01:16:54,440 who applied to your program, and you can only serve 400. 1716 01:16:54,440 --> 01:16:56,800 The question I ask you, Cornelia-- 1717 01:16:56,800 --> 01:16:58,490 and only you-- 1718 01:16:58,490 --> 01:17:02,620 is how many people are you going to have to say, sorry, I 1719 01:17:02,620 --> 01:17:04,840 can't serve you? 1720 01:17:04,840 --> 01:17:06,030 600. 1721 01:17:06,030 --> 01:17:09,030 Both in the context of an experiment and in the context 1722 01:17:09,030 --> 01:17:10,730 of a non-experimental study. 1723 01:17:10,730 --> 01:17:14,990 The only thing that changes is how you decide who those 600 1724 01:17:14,990 --> 01:17:16,020 people are. 1725 01:17:16,020 --> 01:17:17,470 It's the only thing that changes.
1726 01:17:17,470 --> 01:17:22,200 And in fact, in some contexts, the flip of the coin can seem 1727 01:17:22,200 --> 01:17:27,190 more fair than you deciding, I think this person is more 1728 01:17:27,190 --> 01:17:29,990 deserving, or this person-- 1729 01:17:29,990 --> 01:17:33,190 So in that context, in the context where you're going to 1730 01:17:33,190 --> 01:17:37,070 have to turn away people, then the ethical issues, in my 1731 01:17:37,070 --> 01:17:40,200 mind, are much harder to justify. 1732 01:17:40,200 --> 01:17:43,120 I'm not saying there are no ethical issues in experiments. 1733 01:17:43,120 --> 01:17:44,380 There are some contexts in which 1734 01:17:44,380 --> 01:17:45,390 there are ethical issues. 1735 01:17:45,390 --> 01:17:48,760 So if you are completely convinced that your program 1736 01:17:48,760 --> 01:17:54,330 works, then why are you going to do this whole randomized 1737 01:17:54,330 --> 01:17:55,070 experiment? 1738 01:17:55,070 --> 01:17:57,450 The only thing I can tell you is that a lot of people have 1739 01:17:57,450 --> 01:18:00,060 been very convinced that some programs work, and then they 1740 01:18:00,060 --> 01:18:01,600 turn out not to work. 1741 01:18:01,600 --> 01:18:03,520 But if you are completely convinced that the program 1742 01:18:03,520 --> 01:18:06,860 works, then you shouldn't be doing it. 1743 01:18:06,860 --> 01:18:11,560 And then the other thing is, if you are testing an 1744 01:18:11,560 --> 01:18:16,210 intervention that you think can harm people, then there 1745 01:18:16,210 --> 01:18:18,000 are ethical issues involved. 1746 01:18:18,000 --> 01:18:22,610 So I don't think anyone will be very fond of doing an 1747 01:18:22,610 --> 01:18:27,720 experiment to try to find out whether smoking causes lung 1748 01:18:27,720 --> 01:18:30,590 cancer, for example.
1749 01:18:30,590 --> 01:18:33,270 Because we don't have experimental evidence, but the 1750 01:18:33,270 --> 01:18:34,920 medical evidence seems to be pretty 1751 01:18:34,920 --> 01:18:36,692 strongly in favor of that. 1752 01:18:36,692 --> 01:18:37,942 Maria Teresa? 1753 01:18:39,996 --> 01:18:42,230 AUDIENCE: A consequence of that ethical question, was 1754 01:18:42,230 --> 01:18:45,134 hard for me, was people who are indeed chosen to be in the 1755 01:18:45,134 --> 01:18:47,325 program and people who are not. 1756 01:18:47,325 --> 01:18:48,708 You have to come back to these people who are not and follow 1757 01:18:48,708 --> 01:18:50,355 up with them. 1758 01:18:50,355 --> 01:18:53,455 And how willing to cooperate were they to collect more 1759 01:18:53,455 --> 01:18:55,442 data, to talk with them. 1760 01:18:55,442 --> 01:18:56,649 And you know, working [UNINTELLIGIBLE] is really 1761 01:18:56,649 --> 01:19:01,410 hard, because you take time from the farmer for two hours 1762 01:19:01,410 --> 01:19:04,050 every couple months, and come back, and standing there. 1763 01:19:04,050 --> 01:19:06,760 I mean, while the other guy received something for these 1764 01:19:06,760 --> 01:19:08,072 two hours that are given to you. 1765 01:19:08,072 --> 01:19:09,245 So I think that that is the-- 1766 01:19:09,245 --> 01:19:11,610 Maybe you need to apply this more often. 1767 01:19:11,610 --> 01:19:12,280 PROFESSOR: Yeah. 1768 01:19:12,280 --> 01:19:16,250 So I mean, again, I think there are things you try to do 1769 01:19:16,250 --> 01:19:19,110 to deal with them. 1770 01:19:19,110 --> 01:19:22,790 That has to do more with the implementation of any study in 1771 01:19:22,790 --> 01:19:23,920 which you have a comparison group. 1772 01:19:23,920 --> 01:19:25,010 It's not the experiment. 1773 01:19:25,010 --> 01:19:26,200 Experiment has a control group. 
1774 01:19:26,200 --> 01:19:28,480 Any other study that has a comparison group where 1775 01:19:28,480 --> 01:19:31,300 you're collecting data faces this issue. 1776 01:19:31,300 --> 01:19:32,650 And then there are things you can do. 1777 01:19:35,180 --> 01:19:36,470 It depends on the program. 1778 01:19:36,470 --> 01:19:40,100 But certainly sometimes offering some small incentive 1779 01:19:40,100 --> 01:19:44,070 for people in both groups to fill in the survey is 1780 01:19:44,070 --> 01:19:46,180 certainly one thing that could help. 1781 01:19:46,180 --> 01:19:49,490 The other thing that I think is very important is data 1782 01:19:49,490 --> 01:19:50,710 collection. 1783 01:19:50,710 --> 01:19:58,140 The average researcher, when they are asked the question, 1784 01:19:58,140 --> 01:20:00,480 do you want to add one more question to the survey? 1785 01:20:00,480 --> 01:20:03,940 The probability of saying yes is 99% for the average 1786 01:20:03,940 --> 01:20:04,440 researcher. 1787 01:20:04,440 --> 01:20:07,950 So if you have two hours in the field, you have to start 1788 01:20:07,950 --> 01:20:11,580 thinking, well, how many of these questions do I really need 1789 01:20:11,580 --> 01:20:12,780 to be asking? 1790 01:20:12,780 --> 01:20:15,780 I mean, that's an issue of implementation versus-- 1791 01:20:15,780 --> 01:20:18,350 So I think there are ways to deal with this. 1792 01:20:18,350 --> 01:20:20,150 But again, it's not unique to experiments. 1793 01:20:20,150 --> 01:20:23,760 It really has to do with how you implement any study in 1794 01:20:23,760 --> 01:20:26,140 which you're going to collect data on people who are not 1795 01:20:26,140 --> 01:20:29,520 receiving any benefit. 1796 01:20:29,520 --> 01:20:29,900 Yes? 1797 01:20:29,900 --> 01:20:30,750 Ethical issues? 1798 01:20:30,750 --> 01:20:31,730 AUDIENCE: Nigel. 1799 01:20:31,730 --> 01:20:34,180 I think an answer which-- 1800 01:20:34,180 --> 01:20:34,640 PROFESSOR: Nigel.
1801 01:20:34,640 --> 01:20:35,910 You are from the Kennedy School. 1802 01:20:35,910 --> 01:20:36,810 Very nice to meet you. 1803 01:20:36,810 --> 01:20:38,060 AUDIENCE: I'm leaving next week. 1804 01:20:40,130 --> 01:20:43,090 The issue of, even if you had as much money as you kept to 1805 01:20:43,090 --> 01:20:45,190 all give to those 1,000 people, you 1806 01:20:45,190 --> 01:20:46,620 can't do them all today. 1807 01:20:46,620 --> 01:20:49,870 So the way to do it is say, OK, we'll do 500 this year and 1808 01:20:49,870 --> 01:20:50,830 500 next year. 1809 01:20:50,830 --> 01:20:56,410 So you're getting all 1,000 people, but you do your 1810 01:20:56,410 --> 01:20:58,670 randomized evaluation year one. 1811 01:20:58,670 --> 01:20:59,700 PROFESSOR: Exactly. 1812 01:20:59,700 --> 01:21:02,660 And tomorrow there are going to be two sessions on how to 1813 01:21:02,660 --> 01:21:06,560 do roll out design-- there's a bunch of designs that are 1814 01:21:06,560 --> 01:21:09,504 applying the same principle. 1815 01:21:09,504 --> 01:21:14,150 AUDIENCE: When you think about the cost of the study, don't 1816 01:21:14,150 --> 01:21:17,970 you think a question you should deal with way early on 1817 01:21:17,970 --> 01:21:22,080 is the size of the impact that you're looking for? 1818 01:21:22,080 --> 01:21:24,418 PROFESSOR: Absolutely. 1819 01:21:24,418 --> 01:21:27,840 AUDIENCE: If the study is going to cost me a lot of 1820 01:21:27,840 --> 01:21:35,280 money, and there's a significant probability that 1821 01:21:35,280 --> 01:21:37,890 it might have only a small effect, then that maybe isn't 1822 01:21:37,890 --> 01:21:40,144 worth bothering with. 1823 01:21:40,144 --> 01:21:44,440 And so you talked about looking up the size of the 1824 01:21:44,440 --> 01:21:48,080 effect and the statistics, and whether it's statistically 1825 01:21:48,080 --> 01:21:48,886 significant. 
1826 01:21:48,886 --> 01:21:53,270 But that size question, it seems to me, gets 1827 01:21:53,270 --> 01:21:55,730 looked at very late. 1828 01:21:55,730 --> 01:22:01,253 And it should be way up front in the very early days because 1829 01:22:01,253 --> 01:22:05,015 of the impact, whether the program is really of interest, 1830 01:22:05,015 --> 01:22:07,000 and worth following. 1831 01:22:07,000 --> 01:22:08,880 PROFESSOR: So two quick reactions. 1832 01:22:08,880 --> 01:22:11,190 The first one is what Rachel said. 1833 01:22:11,190 --> 01:22:13,270 Think strategically about impact evaluations. 1834 01:22:13,270 --> 01:22:16,330 You don't want to evaluate every single thing that's in 1835 01:22:16,330 --> 01:22:19,640 your organization or every single thing under the sun. 1836 01:22:19,640 --> 01:22:22,270 You're not going to be able to do an impact evaluation on all 1837 01:22:22,270 --> 01:22:23,180 of those things. 1838 01:22:23,180 --> 01:22:25,800 You may do other kinds of evaluations on hopefully most 1839 01:22:25,800 --> 01:22:28,480 of your programs, but an impact evaluation, you should 1840 01:22:28,480 --> 01:22:30,810 be very strategic on where you do it. 1841 01:22:30,810 --> 01:22:33,350 And if you think this is a program that is not generating 1842 01:22:33,350 --> 01:22:36,010 much impact and it's not costing you that much money, 1843 01:22:36,010 --> 01:22:39,410 then you may say, I'm not going to evaluate it. 1844 01:22:39,410 --> 01:22:45,710 The second thing I would say with regard to that is 1845 01:22:45,710 --> 01:22:48,660 thinking about the effect of the program is something you 1846 01:22:48,660 --> 01:22:51,460 need to do at stage one, the designing of the study. 1847 01:22:51,460 --> 01:22:54,770 And this will connect with your session on sample size 1848 01:22:54,770 --> 01:22:57,200 that Esther will speak about on Thursday. 
1849 01:22:57,200 --> 01:23:00,910 Because the larger that impact is, that 1850 01:23:00,910 --> 01:23:03,770 affects your calculations of sample size. 1851 01:23:03,770 --> 01:23:07,690 The paradox in all of this, despite what you said, the 1852 01:23:07,690 --> 01:23:10,400 paradox in all of this is that the bigger the 1853 01:23:10,400 --> 01:23:13,110 effect of the program-- 1854 01:23:13,110 --> 01:23:14,990 so if you expect this program is going 1855 01:23:14,990 --> 01:23:17,580 to have a huge effect-- 1856 01:23:17,580 --> 01:23:20,240 the smaller the sample size you need, and hence the 1857 01:23:20,240 --> 01:23:22,100 smaller the data collection costs. 1858 01:23:22,100 --> 01:23:25,450 So paradoxically, if the program is extremely 1859 01:23:25,450 --> 01:23:29,250 important, the data collection cost should actually be lower 1860 01:23:29,250 --> 01:23:31,740 than a program where you want to detect effects that are 1861 01:23:31,740 --> 01:23:32,530 very small. 1862 01:23:32,530 --> 01:23:35,970 Having said that, you want to evaluate the programs that 1863 01:23:35,970 --> 01:23:38,630 make strategic sense for you to evaluate. 1864 01:23:38,630 --> 01:23:41,390 I mean, one thing I think you should try to avoid, despite 1865 01:23:41,390 --> 01:23:44,390 all our enthusiasm with randomized experiments, you 1866 01:23:44,390 --> 01:23:46,670 shouldn't leave this course thinking, OK. 1867 01:23:46,670 --> 01:23:49,420 Where do I see an opportunity to randomize? 1868 01:23:49,420 --> 01:23:53,840 And then forget about what is it that you're trying to do. 1869 01:23:53,840 --> 01:23:56,880 You know, you may find a great opportunity to randomize, but 1870 01:23:56,880 --> 01:23:58,890 if it doesn't answer a question you care about, 1871 01:23:58,890 --> 01:24:02,470 you've just wasted money. 1872 01:24:02,470 --> 01:24:04,970 All right, so-- 1873 01:24:04,970 --> 01:24:06,160 you have a question?
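[Editor's illustration] The point above, that a larger expected effect needs a smaller sample, can be sketched with the standard normal-approximation formula for comparing two group means with equal group sizes. This is a rough sketch under stated assumptions, not the calculation used in the course: the function name `sample_size_per_arm` is hypothetical, the effect is expressed in standard-deviation units, and the defaults (5% two-sided significance, 80% power) are conventional choices rather than values taken from the lecture.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate people needed in EACH group (treatment and comparison).

    effect_size: expected difference in means, in standard-deviation units.
    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 / effect_size^2,
    a textbook approximation that ignores clustering and attrition.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Larger expected effects require fewer people per group.
for d in (0.1, 0.2, 0.5):
    print(d, sample_size_per_arm(d))
```

Because the effect size enters squared in the denominator, halving the effect you want to detect roughly quadruples the required sample, and with it the data collection cost.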
1874 01:24:06,160 --> 01:24:08,950 This is very interesting. 1875 01:24:08,950 --> 01:24:14,770 AUDIENCE: I want to know, do you think that in any context, 1876 01:24:14,770 --> 01:24:16,470 one can be able to carry out an impact evaluation? 1877 01:24:20,390 --> 01:24:22,350 For any type of program-- 1878 01:24:22,350 --> 01:24:26,860 PROFESSOR: So my answer to that is 1879 01:24:26,860 --> 01:24:29,340 no, not in any context. 1880 01:24:29,340 --> 01:24:33,600 But probably in more contexts than you think about. 1881 01:24:33,600 --> 01:24:34,540 That is my short answer. 1882 01:24:34,540 --> 01:24:39,420 AUDIENCE: What about, for example, infrastructure--? 1883 01:24:39,420 --> 01:24:40,220 PROFESSOR: There have been. 1884 01:24:40,220 --> 01:24:41,600 It's harder to do. 1885 01:24:41,600 --> 01:24:42,920 There have been some studies. 1886 01:24:42,920 --> 01:24:44,970 This is actually, I think, a growing area. 1887 01:24:44,970 --> 01:24:48,760 This is an area where people are trying to do some impact 1888 01:24:48,760 --> 01:24:49,670 evaluation. 1889 01:24:49,670 --> 01:24:51,780 I mean, if you're building a road in the middle of the 1890 01:24:51,780 --> 01:24:55,180 country, and this is one road for the whole country-- 1891 01:24:55,180 --> 01:24:56,270 you can't do it. 1892 01:24:56,270 --> 01:24:57,070 But it's OK. 1893 01:24:57,070 --> 01:25:00,820 You don't need to do an impact evaluation for everything you 1894 01:25:00,820 --> 01:25:04,130 do, and you don't need to do a randomized impact evaluation 1895 01:25:04,130 --> 01:25:05,510 for everything you do. 1896 01:25:05,510 --> 01:25:09,450 What I do hope the message comes clear is, if you decide 1897 01:25:09,450 --> 01:25:12,100 to do an impact evaluation, then thinking about a 1898 01:25:12,100 --> 01:25:15,290 randomized design should be your first choice. 
1899 01:25:15,290 --> 01:25:19,460 If you can't do it-- and can't do it is not just, oh, there's 1900 01:25:19,460 --> 01:25:20,190 some issues-- 1901 01:25:20,190 --> 01:25:20,740 no, no. 1902 01:25:20,740 --> 01:25:23,510 Can't do it, really trying, given all these advantages, 1903 01:25:23,510 --> 01:25:27,490 really trying-- if you can't do it, then you may consider 1904 01:25:27,490 --> 01:25:28,810 doing other things. 1905 01:25:28,810 --> 01:25:31,680 But this should be your first option if you decide to do an 1906 01:25:31,680 --> 01:25:32,930 impact evaluation. 1907 01:25:34,990 --> 01:25:35,310 All right. 1908 01:25:35,310 --> 01:25:37,200 Partial equilibrium. 1909 01:25:37,200 --> 01:25:38,470 It's a little bit more technical. 1910 01:25:38,470 --> 01:25:42,050 But if you have a program that only affects some people 1911 01:25:42,050 --> 01:25:43,750 differentially. 1912 01:25:43,750 --> 01:25:47,440 So suppose you had a program that was going to train people 1913 01:25:47,440 --> 01:25:50,740 on how to have better resumes. 1914 01:25:50,740 --> 01:25:53,960 And if you only do it for a few people, then this program 1915 01:25:53,960 --> 01:25:55,040 may have a huge effect. 1916 01:25:55,040 --> 01:25:58,270 But if you do it for everyone in your town, there's going to 1917 01:25:58,270 --> 01:26:01,000 be little advantage that's gained from this. 1918 01:26:01,000 --> 01:26:05,200 And so the randomized experiment estimates a partial 1919 01:26:05,200 --> 01:26:06,320 equilibrium effect. 1920 01:26:06,320 --> 01:26:09,140 You don't know what would happen if everyone in a 1921 01:26:09,140 --> 01:26:11,270 particular setting got the treatment. 1922 01:26:11,270 --> 01:26:15,290 I think this is important in some settings, but not enough. 1923 01:26:15,290 --> 01:26:15,610 All right. 1924 01:26:15,610 --> 01:26:18,960 So I'm not going to go too much about get out the vote, 1925 01:26:18,960 --> 01:26:22,260 because we're already a minute away from time. 
1926 01:26:22,260 --> 01:26:29,180 What I want to do is just show you this table here. 1927 01:26:29,180 --> 01:26:30,450 You already discussed it. 1928 01:26:40,110 --> 01:26:43,670 So this is what the case study shows. 1929 01:26:43,670 --> 01:26:46,970 This is a situation where you had four 1930 01:26:46,970 --> 01:26:49,950 methods to estimate impacts. 1931 01:26:49,950 --> 01:26:52,540 The first four methods found out that the 1932 01:26:52,540 --> 01:26:54,380 program had an effect. 1933 01:26:54,380 --> 01:26:57,130 The last method, the randomized experiment, found 1934 01:26:57,130 --> 01:26:59,590 no statistically significant effect. 1935 01:26:59,590 --> 01:27:02,470 I'm not saying that in every single-- this goes back to 1936 01:27:02,470 --> 01:27:03,070 your question. 1937 01:27:03,070 --> 01:27:04,170 I'm not saying that in every single 1938 01:27:04,170 --> 01:27:06,320 setting, this will happen. 1939 01:27:06,320 --> 01:27:09,630 But this is a good example of a setting in which if you had 1940 01:27:09,630 --> 01:27:11,680 gone with any of these techniques, you would have 1941 01:27:11,680 --> 01:27:14,250 concluded the program had an effect when it didn't. 1942 01:27:14,250 --> 01:27:17,050 And there are other settings where the reverse may happen. 1943 01:27:17,050 --> 01:27:21,830 And so if we were able to say ex ante, before the 1944 01:27:21,830 --> 01:27:24,900 evaluation, this method is going to be just as good as 1945 01:27:24,900 --> 01:27:27,070 the experiment, that's great. 1946 01:27:27,070 --> 01:27:29,790 We may be able to save some money if there's no data 1947 01:27:29,790 --> 01:27:31,820 collection involved, and that would be great. 1948 01:27:31,820 --> 01:27:34,440 But I think the bottom line here is, we 1949 01:27:34,440 --> 01:27:35,980 are not always able-- 1950 01:27:35,980 --> 01:27:39,150 and I think very few people will tell you, we know when 1951 01:27:39,150 --> 01:27:40,900 this method will work. 
1952 01:27:40,900 --> 01:27:46,090 Because the assumption behind each of these methods on how 1953 01:27:46,090 --> 01:27:47,660 they work is untestable-- 1954 01:27:47,660 --> 01:27:50,890 you can't statistically test that assumption. 1955 01:27:50,890 --> 01:27:53,680 So you may argue in favor of it. 1956 01:27:53,680 --> 01:27:56,640 You may show evidence in favor of it. 1957 01:27:56,640 --> 01:27:58,520 But you can't specifically test it. 1958 01:27:58,520 --> 01:28:04,040 And that's the big advantage of the experiment. 1959 01:28:04,040 --> 01:28:09,250 So let me just close with what I hope are the 1960 01:28:09,250 --> 01:28:10,660 bottom lines from this. 1961 01:28:10,660 --> 01:28:12,940 The first thing, what's underlined there. 1962 01:28:12,940 --> 01:28:15,290 If properly designed and conducted, the social 1963 01:28:15,290 --> 01:28:17,290 experiments provide the most credible 1964 01:28:17,290 --> 01:28:19,770 assessment of the program. 1965 01:28:19,770 --> 01:28:22,440 But the "if" is a very important "if." Don't leave 1966 01:28:22,440 --> 01:28:25,400 this course thinking, if it's a randomized 1967 01:28:25,400 --> 01:28:26,870 experiment, piece of cake. 1968 01:28:26,870 --> 01:28:27,850 Everything will work. 1969 01:28:27,850 --> 01:28:29,930 That's not the message that we want to give you here. 1970 01:28:29,930 --> 01:28:33,180 It needs to be properly designed and conducted. 1971 01:28:33,180 --> 01:28:35,602 And for that, you really need a partnership between the 1972 01:28:35,602 --> 01:28:38,990 evaluators and the agencies implementing it. 1973 01:28:38,990 --> 01:28:41,480 They're easy to understand, much less subject to the 1974 01:28:41,480 --> 01:28:45,700 methodological quibbles, and more likely to convince 1975 01:28:45,700 --> 01:28:47,060 policymakers.
1976 01:28:47,060 --> 01:28:50,890 These advantages are only present if they are properly 1977 01:28:50,890 --> 01:28:54,310 conducted and implemented, and you must assess the validity 1978 01:28:54,310 --> 01:28:56,540 of an experiment in the same way you assess the 1979 01:28:56,540 --> 01:28:57,760 validity of any study. 1980 01:28:57,760 --> 01:29:00,150 Because you're going to have threats to an experiment 1981 01:29:00,150 --> 01:29:02,750 anyway, and on Friday, you're going to learn how to deal 1982 01:29:02,750 --> 01:29:04,430 with some of them. 1983 01:29:04,430 --> 01:29:06,790 I hope this was moderately helpful. 1984 01:29:06,790 --> 01:29:09,990 I think I have one of the toughest sessions to teach, 1985 01:29:09,990 --> 01:29:13,740 because you guys, some of you come completely convinced of 1986 01:29:13,740 --> 01:29:16,300 why you want to randomize, some of you come very 1987 01:29:16,300 --> 01:29:18,410 skeptical, and I have to reach a middle ground. 1988 01:29:18,410 --> 01:29:19,650 I hope I did. 1989 01:29:19,650 --> 01:29:22,380 If you have one more question, I'll take it. 1990 01:29:22,380 --> 01:29:23,329 Yes? 1991 01:29:23,329 --> 01:29:26,263 AUDIENCE: Have you found that it's possible to teach 1992 01:29:26,263 --> 01:29:29,360 organizations to run their own randomized trials from start 1993 01:29:29,360 --> 01:29:32,510 to finish, even if there are no economists on staff? 1994 01:29:32,510 --> 01:29:36,220 Or does this always sort of require the intervention or 1995 01:29:36,220 --> 01:29:39,740 assistance of outside evaluators? 1996 01:29:39,740 --> 01:29:42,710 PROFESSOR: I think, as you will see throughout this 1997 01:29:42,710 --> 01:29:47,470 course, conducting an impact evaluation, even a randomized 1998 01:29:47,470 --> 01:29:50,530 one, does involve some technical skills and does 1999 01:29:50,530 --> 01:29:53,420 involve some practical experience in doing it.
2000 01:29:53,420 --> 01:29:58,180 I'm not saying those cannot be found in organizations that 2001 01:29:58,180 --> 01:29:59,510 are in the field. 2002 01:29:59,510 --> 01:30:02,130 But if those skills are not there, it's going to be very 2003 01:30:02,130 --> 01:30:04,000 hard to do it. 2004 01:30:04,000 --> 01:30:07,710 Now, you can do a lot of training on 2005 01:30:07,710 --> 01:30:09,180 how to do these things. 2006 01:30:09,180 --> 01:30:13,620 But I think it'd be hard to do it without someone who has at 2007 01:30:13,620 --> 01:30:16,330 least done a few of these and seen some of the 2008 01:30:16,330 --> 01:30:17,590 problems that arise. 2009 01:30:17,590 --> 01:30:19,200 Because problems will arise-- 2010 01:30:19,200 --> 01:30:21,830 I mean, no question about it. 2011 01:30:21,830 --> 01:30:27,030 You will be asking the evaluator, how far can we go? 2012 01:30:27,030 --> 01:30:30,100 And the evaluator, whoever it is, whether they're in the 2013 01:30:30,100 --> 01:30:34,490 agency or not, needs to be able to answer that question 2014 01:30:34,490 --> 01:30:38,200 in a way that at the end, you have a credible evaluation. 2015 01:30:38,200 --> 01:30:42,970 I'm not saying you need an expert outside of the 2016 01:30:42,970 --> 01:30:44,160 organization. 2017 01:30:44,160 --> 01:30:46,630 But I am saying you need an expert somewhere. 2018 01:30:46,630 --> 01:30:49,880 And whether you have it inside or outside, there's a whole 2019 01:30:49,880 --> 01:30:52,830 issue of independence versus objectivity that 2020 01:30:52,830 --> 01:30:54,931 I won't speak to. 2021 01:30:54,931 --> 01:30:58,700 AUDIENCE: Consumer companies do it. 2022 01:30:58,700 --> 01:31:00,072 PROFESSOR: Consumer companies? 2023 01:31:00,072 --> 01:31:00,554 AUDIENCE: Yeah.
2024 01:31:00,554 --> 01:31:03,205 Procter & Gamble and big companies like that do 2025 01:31:03,205 --> 01:31:05,615 experiments all the time, build their capability into 2026 01:31:05,615 --> 01:31:10,080 the organization, how they make decisions. 2027 01:31:10,080 --> 01:31:12,510 I'm just wondering whether someone leaving this course 2028 01:31:12,510 --> 01:31:14,880 with a few experiments under their belt could implement 2029 01:31:14,880 --> 01:31:18,000 something like this, or whether you need to go as far 2030 01:31:18,000 --> 01:31:22,074 as getting an economics degree in order to be able to do the 2031 01:31:22,074 --> 01:31:25,000 coordinating and evaluation of this type. 2032 01:31:25,000 --> 01:31:29,450 PROFESSOR: So I think to do an impact evaluation, there is 2033 01:31:29,450 --> 01:31:32,290 usually more than one person involved. 2034 01:31:32,290 --> 01:31:35,050 And there are different roles for different people. 2035 01:31:35,050 --> 01:31:38,100 There are some roles where having good training in 2036 01:31:38,100 --> 01:31:40,540 economics is particularly useful. 2037 01:31:40,540 --> 01:31:44,480 There are other roles where I would say it's particularly 2038 01:31:44,480 --> 01:31:46,920 un-useful to be an economist. 2039 01:31:46,920 --> 01:31:55,360 So I really think it depends on what role a person leaving 2040 01:31:55,360 --> 01:32:01,640 this course would like to sort of play in the evaluation. 2041 01:32:01,640 --> 01:32:07,550 And you know, whether leaving this course, you'll be able to 2042 01:32:07,550 --> 01:32:12,430 run your experiments on your own-- 2043 01:32:12,430 --> 01:32:14,690 I think would be an extremely successful 2044 01:32:14,690 --> 01:32:17,610 course if that happened.
2045 01:32:17,610 --> 01:32:20,740 We have no way to measure the impact of this program, but if 2046 01:32:20,740 --> 01:32:23,670 that were to happen, relative to what would have happened if 2047 01:32:23,670 --> 01:32:25,860 you had not come to this course, that would be 2048 01:32:25,860 --> 01:32:27,840 phenomenal. 2049 01:32:27,840 --> 01:32:31,460 I think my sense is unless you have prior training in this 2050 01:32:31,460 --> 01:32:35,280 kind of thing, what this course will hopefully give you 2051 01:32:35,280 --> 01:32:39,940 is the ability to be involved in an evaluation and to be 2052 01:32:39,940 --> 01:32:45,180 pretty good at interacting with whoever is also involved 2053 01:32:45,180 --> 01:32:47,000 in the evaluation, at asking the right 2054 01:32:47,000 --> 01:32:48,590 questions of the evaluator. 2055 01:32:48,590 --> 01:32:50,800 This is extremely important. 2056 01:32:50,800 --> 01:32:54,330 And being very aware in the field of what may be 2057 01:32:54,330 --> 01:32:56,440 threatening an evaluation. 2058 01:32:56,440 --> 01:32:59,970 If you're able to do it on your own after this, I hate to 2059 01:32:59,970 --> 01:33:02,850 say it, but I don't think it's because of this session that you 2060 01:33:02,850 --> 01:33:06,370 heard from me today. 2061 01:33:06,370 --> 01:33:06,770 All right. 2062 01:33:06,770 --> 01:33:10,120 I think I already ate a few minutes into your time. 2063 01:33:10,120 --> 01:33:10,970 It was a pleasure. 2064 01:33:10,970 --> 01:33:13,440 I'll be here for a few more minutes if you want. 2065 01:33:13,440 --> 01:33:17,630 I hope you have a wonderful rest of the course, and see 2066 01:33:17,630 --> 01:33:18,880 you somewhere.