The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

KAREN WILLCOX: So first up, bootstrapping, which is really aimed at this question: how do we get estimates of the standard errors of our estimators that don't have known distributions? So remember, we were talking on Wednesday -- I don't even know what day it is today. Monday. We were talking on Monday about how you can run the Monte Carlo simulation, and then we know, for example, that for the mean estimate, in the limit as N goes to infinity, by the central limit theorem, the distribution of the estimator itself -- the estimator for the mean -- is normal. And the mean of that distribution is the actual mean that we're trying to estimate. It's unbiased. And the standard deviation of that distribution, remember, was the standard deviation of the quantity we're computing as an output, divided by the square root of N.
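The central limit theorem behavior recalled above can be checked numerically. This is an illustrative sketch, not part of the lecture; the uniform input and all sizes are my own choices, picked so the true standard deviation is known in closed form.

```python
import math
import random

random.seed(1)

N = 100       # samples per Monte Carlo run
RUNS = 2000   # independent repetitions of the whole experiment

# Input: Uniform(0, 1), whose true standard deviation is 1/sqrt(12).
sigma = 1.0 / math.sqrt(12.0)

# Collect the sample-mean estimator from many independent runs.
means = [sum(random.random() for _ in range(N)) / N for _ in range(RUNS)]

# Empirical spread of the estimator vs. the CLT prediction sigma/sqrt(N).
mu = sum(means) / RUNS
emp_std = math.sqrt(sum((m - mu) ** 2 for m in means) / (RUNS - 1))
predicted = sigma / math.sqrt(N)
print(emp_std, predicted)  # the two should agree to within a few percent
```

A histogram of `means` would also look approximately normal, centered on the true mean 0.5, which is the unbiasedness claim.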
But we also said that, for example, the estimate of the variance -- remember, Alex told you three different ways to estimate variance -- that those estimators don't have known distributions. They're not necessarily following a normal distribution or any other known distribution. So bootstrapping is a trick, and actually, as I was putting together today's lecture, I [AUDIO OUT] tricks. Statisticians tend to have lots of tricks up their sleeves -- I'm just going to pull up the section in the notes, because I think you should have read this -- as to how bootstrapping works. And there are a lot of words here, but let's just come down to the [INAUDIBLE]. So what we do in bootstrapping is perform our initial Monte Carlo sample, just like we've been talking about. So take the inputs, draw N realizations from the input distributions, run them each through the model, produce the samples, the yi's, and compute the desired estimator. So this theta here might be a mean estimate, or think of it as a variance estimate -- one we can't necessarily easily do the confidence interval analysis on.
Then what bootstrapping says is, well, now you've got these N samples. And remember, each one of these samples is being run through your finite element model, run through your CAD model. Let's now resample those values. With N samples of y, N samples of the blade metal hot-side temperature sitting in the bin, let's just resample from those ones. And why is that efficient? Because we don't have to run the model anymore. We don't have to do the finite element analysis. And we're just going to resample from those with equal probability -- so probability 1 over N each -- and we do it with replacement, so you could pick the same sample more than once. So just to be clear, we had N samples. Then we're going to sample N times from those, but we're going to get a different set of N because, I mean, maybe we'll pick each one once, but more likely we're going to pick that one a couple of times and this one not at all. So we'll have a different sampling, we can compute the estimator, and we can keep doing that N minus 1 times.
In the end, we're going to end up with N values of the estimator. And from there you could try to determine confidence intervals. So it's kind of a trick, because it's kind of cheating. And it's not the same thing as running a new Monte Carlo, but hopefully you can see that this is something you can do with almost no more computational cost. And it actually turns out that if you do some analysis, this is a reasonable thing to do -- and it took a lot of statisticians to analyze it. So when people talk about bootstrapping in this context, this is what they're talking about: resampling from your samples, putting the information together in different ways, and then using that to get a sense of the distribution and say that they're [? untestable. ?] Is that clear?

So [AUDIO OUT] internet, so I didn't get time to go through it on Monday. But I mostly want us to spend today talking about different ways to reduce variance. And again, the really important idea that should be in your heads is that the quality of the estimators that come out depends on how many samples are in the Monte Carlo.
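The resampling loop described above can be sketched in a few lines. This is an illustrative sketch, with the data, sample sizes, and the choice of the variance estimator my own (the lecture leaves theta generic): resample the existing outputs with replacement, recompute the estimator each time, and read off the spread of the bootstrap replicates as a standard-error estimate -- with no further model runs.

```python
import math
import random

random.seed(2)

# Pretend these are the N outputs of an expensive model run
# (e.g. blade temperatures); here they are just normal draws.
N = 200
y = [random.gauss(0.0, 1.0) for _ in range(N)]

def sample_variance(data):
    """The estimator theta whose distribution we don't know in closed form."""
    m = sum(data) / len(data)
    return sum((v - m) ** 2 for v in data) / (len(data) - 1)

# Bootstrap: resample the existing outputs with replacement -- the
# finite element model is never called again.
B = 1000
replicates = [sample_variance(random.choices(y, k=N)) for _ in range(B)]

# The spread of the replicates estimates the estimator's standard error,
# and their percentiles give approximate confidence intervals.
mb = sum(replicates) / B
se = math.sqrt(sum((r - mb) ** 2 for r in replicates) / (B - 1))
print(se)
```

For normal data with unit variance and n = 200, the true standard error of the sample variance is about sqrt(2/199) ≈ 0.10, which the bootstrap estimate should roughly reproduce.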
And if you can afford millions of samples, you can probably get really good estimates of the mean and maybe OK estimates of the variance. But if, for example, you're interested in tail probabilities -- so think about an aircraft design, a new flight-critical system. Maybe there has to be a 10 to the minus 9 failure probability that we have to satisfy. So if we want to estimate that by Monte Carlo simulation, how many samples do we need? A lot. How many is a lot? At least how many? At least a billion. If the probability is 10 to the minus 9 and we had a billion samples, then on average you would expect one sample to be that failure. So that's not even going to be enough, because you might get 0, you might get 1, you might get 2. So that means you're going to estimate this probability differently from run to run. So billions, tens of billions, hundreds of billions of samples to estimate what are called [? variates, ?]
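The back-of-the-envelope count above can be written out. A small sketch (the sample sizes tried are my own): the expected number of observed failures is N times p, and the relative standard error of the plain Monte Carlo probability estimator is sqrt((1-p)/(pN)), so even a billion samples leaves roughly 100% relative error on a 10^-9 probability.

```python
import math

p = 1e-9  # target failure probability from the example
for N in (1e9, 1e11, 1e13):
    expected_failures = p * N
    # Relative standard error of the Monte Carlo estimator of p:
    # sqrt(p(1-p)/N) / p = sqrt((1-p)/(p*N))
    rel_se = math.sqrt((1.0 - p) / (p * N))
    print(N, expected_failures, rel_se)
```

At N = 10^9 the expected failure count is exactly 1 and the relative error is about 100%; only around 10^13 samples does it fall to about 1%.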
but even just in general, even just for means, if you're talking about running a CFD code or a finite element simulation that takes minutes to run, we usually can't even afford hundreds of thousands of simulations. So the variance reduction methods are trying to increase the accuracy of our Monte Carlo estimates for a given number of samples. And I think of them as kind of tricks -- and they're really clever tricks. I'll try to show you how, at least, importance sampling works. And you can think of it in two ways. One: if I've got a given number of samples -- I'm willing to run a million samples -- how much accuracy can I get in my estimators? Or you could think: if I want to achieve a given level of accuracy in my estimator, how many samples do I need, and can we reduce the number of samples? So there are a lot of different ways that people do variance reduction -- importance sampling, what's called antithetic sampling, control variates, stratified sampling.
And I had decided I was going to talk about control variates and importance sampling, but I think we'll just talk about importance sampling today, because it's already quite a lot to think about. So in general, importance sampling is a general technique for estimating properties of one distribution while only having samples generated from another, different distribution. So keep this in mind as we go through, because it's the key idea. We're going to have samples generated from one distribution, and we're going to figure out how to manipulate those samples so that they can estimate things about a different distribution for us. And it seems kind of like a wacky thing to do, but what you'll see is that if you pick this distribution carefully, we can say things about the statistics of the other distribution in a more efficient way with the Monte Carlo estimates. So let me get the screen up. And so we're at number 3, importance sampling. Not a very good piece of chalk. And so let's think about a random variable, x, that has pdf f sub x, and it has a mean that we'll call mu sub x, which is the expectation of x.
And I'm going to put a little x in the subscript on the expectation, just to denote that we're taking the expectation under the density f sub x. So what do I mean by that? If you want to think about an expectation as an integral -- and remember, Alex talked about this on Monday -- the expectation of x, mu sub x, is the integral of what? x times its pdf, dx. And so again, you can see where we're going with this. Remember, we talked about having a different distribution. We want to be clear that when we're talking about the expectation of the random variable x, we're talking about taking the expectation with respect to -- or under -- its pdf, f sub x. Is the notation clear? Yes? So we have this random variable x. We may be interested in estimating its mean, and we know that this is the definition of the mean. It's an integral of x weighted by the pdf of x. I didn't put limits here, but this is an integral over the support of x, which would be minus infinity to infinity if x were a normal random variable. If it were uniform, it would be an integral over the [a, b] range -- whatever the support of x is.
So now what we're going to do is choose a random variable that we'll call z. And we're going to ask z to be non-negative, and we're going to choose it subject to the constraint that the expectation of z, taken under the pdf -- still the pdf here -- f sub x, is equal to 1. And why are we writing this? Well, what does this notation say? It's the integral of z times f sub x of x, dx. That's the expectation of the random variable z under the pdf f sub x. So if this is equal to 1, maybe you can see that all we've done is really define another pdf, that we can write as f sub z. Yep. Because it's going to satisfy the properties of a pdf. It's going to integrate to 1. Is that OK?

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: z is going to be just another random variable. Yeah, so you can maybe just think of z as being defined on the same support if you want to. Or, I think this thing is allowed to be 0 in some places, but it's not allowed to be 0 everywhere. There are some conditions related to where z can be 0. Is it OK for the lecturer to ask you a question?

AUDIENCE: [INAUDIBLE].
KAREN WILLCOX: Yeah. So I think the easiest thing is to think about x as having infinite support, with the integrals going from minus infinity [INAUDIBLE]. So we haven't done anything yet. We've just manipulated things. So why are we doing this? Well, let's think about the mean of x, mu x, the thing here, which is, again, the expectation of x with respect to -- under -- its own pdf. And we said that this thing was the integral of x times its pdf -- weighted by the pdf. And so now what can we do? We can divide by z and multiply by z. Just multiply by 1. And now we recognize that z times f sub x is just what we're now calling f sub z, so this is the integral of x over z times f sub z, dx. So all we did was multiply and divide by z, and then say that we're going to group the z times the pdf of x together and call it f sub z. And we can do that because the constraint we imposed is the same as it integrating to 1. So what is this guy here? This guy here is the expectation of something else with respect to the pdf of z, the density of z.
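The board manipulation being described, written out (my reconstruction of the board work, using the lecture's notation):

```latex
\mu_x \;=\; \mathbb{E}_x[x]
      \;=\; \int x\, f_x(x)\, dx
      \;=\; \int \frac{x}{z}\, \underbrace{z\, f_x(x)}_{=\, f_z(x)}\, dx
      \;=\; \mathbb{E}_z\!\left[\frac{x}{z}\right],
\qquad\text{valid because }\;
\mathbb{E}_x[z] \;=\; \int z\, f_x(x)\, dx \;=\; 1 .
```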
What is it the expectation of? x over z. So this is the expectation of x over z, but now taken with respect to the pdf f sub z. All right. So does this kind of feel like we're just moving chairs around? Yeah. That's why I said they sort of seem a little bit like tricks. Is anybody uncomfortable with anything that we've done? It's OK? We've put a 1 over z in here, and we've changed the pdf -- the weighting in the integral -- to compensate, to make it equal the same thing. Yeah, Kevin?

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: The constraint we have here is that the expectation of z under f sub x is 1. Well, this is what's going to lead us to define the pdf f sub z. Yeah.

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: So why do we put it underneath and then bring it out? So let's look at what we have. We now have two different ways to compute the mean of x. That's what we're after.
We have [AUDIO OUT] what we normally think about, which is thinking about this mean as the expectation of x under its pdf. And now we have derived this alternate expression, which is the expectation of x over z, where the expectation is taken now with the pdf f sub z -- the z in the weighting. So this is what we've been talking about. How do we estimate the mean this way? Well, we sample x by drawing samples from the distribution f sub x. We define the pdf of the x input, we draw samples from it, and we estimate the mean to be the sample mean. And the variance of the corresponding estimator -- the estimator variance -- is what? What was in the numerator? Not mu x.

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: Yeah, it's sigma squared. So we'll call it the variance of x. And again, I'm going to put the subscript here, just to denote. So that's what we saw before.
Actually, we sampled inputs and we propagated them through to the outputs, but basically we're just sampling from this density, taking the sample mean as our estimator, and the estimator variance goes with the [INAUDIBLE]. So what we've done here is define a parallel path, an analogous path. So what do we do here? We're going to sample x over z, and we're going to sample it under, now, the distribution -- the density -- f sub z. And again, we take the samples, we take the sample mean. That would be our estimate. And the variance of that estimator -- the estimator variance -- is now going to be what? The variance of x over z, under f sub z, divided by N. And now maybe this is where you see why these tricks work: because putting the z in the denominator and bringing it out here didn't change what we're estimating. These two expressions are equal. The way I like to think about it is that, because of the way that variance behaves nonlinearly, pulling the z under and putting it out here in front does not mean that these two variances are going to be equal. And maybe you see it there.
What would drive me crazy when I was learning statistics is that things like means would behave in this kind of linear way that made things simple, but the variance never seemed to do the sensible thing. So basically, we're going to exploit that. So again, we can do this manipulation with this other variable, z, and not change the estimate -- still be getting after the thing we want. But if we choose z in the right way -- and we'll talk about how to do that -- we can make this smaller, which means, again, for a given number of samples, if this variance is smaller than that one, then our estimates are going to be tighter. Or, to achieve a given confidence in the estimator, we could take fewer samples here than doing it this way. Yeah. So we'll talk about sort of the general ideas, and then I'll show you exactly how you might come up with a z for probabilities. But the general ideas are, first of all, that some values of x in the Monte Carlo simulation are going to be more important for estimating the parameter.
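The two-path picture can be checked numerically. This is an illustrative sketch with all details my own, not from the lecture: the target is an Exponential(1) variable (true mean 1), and z is chosen so that z times f sub x is an Exponential(1/2) density f sub z. Both paths estimate the same mean, but the per-sample variances differ.

```python
import math
import random

random.seed(3)
N = 5000

# Path 1: sample x ~ f_x = Exp(1) directly; the estimator is the sample mean.
direct = [random.expovariate(1.0) for _ in range(N)]
mean_direct = sum(direct) / N

# Path 2: choose z(x) = f_z(x) / f_x(x) with f_z = Exp(1/2).
# Then x/z = x * f_x(x) / f_z(x) = x * exp(-x) / (0.5 * exp(-0.5*x)),
# and we average x/z over samples drawn from f_z.
def x_over_z(x):
    return x * math.exp(-x) / (0.5 * math.exp(-0.5 * x))

weighted = [x_over_z(random.expovariate(0.5)) for _ in range(N)]
mean_is = sum(weighted) / N

# Same target, different spread: compare the per-sample variances.
var_direct = sum((v - mean_direct) ** 2 for v in direct) / (N - 1)
var_is = sum((v - mean_is) ** 2 for v in weighted) / (N - 1)
print(mean_direct, mean_is)   # both estimate mu_x = 1
print(var_direct, var_is)     # the reweighted path has the smaller variance here
```

With this particular choice of f sub z, the per-sample variance drops from 1 to about 0.19, so for the same N the second path gives a noticeably tighter estimate of the same mean.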
The mean is not a very good example, but you can imagine, if we're talking about tail probabilities, we want to get more samples around that failure region. What goes on over here we don't really necessarily care about, because it's like: doesn't fail, doesn't fail, doesn't fail. So the idea is that some values of x are going to be more important, and what we want to do when we choose the z here is to somehow emphasize those more important values. And so really, what it's going to come down to is how we choose z, again, so that this estimator variance is going to be less than this guy. And we will talk about that for low probabilities. So, any questions? I didn't ask you at the beginning of class -- did Professor [? Nguyen ?] talk about importance sampling in 16.09? No? Just as well. All right. So let's talk about -- this is 3.2 -- importance sampling for probability estimation. And hopefully, maybe, it will make this a little bit more concrete for you. So what did we already see?
We saw that the probability of A, where A is some event that we're interested in, is estimated by a Monte Carlo -- or can be estimated by a Monte Carlo -- with an estimator that we'll call P hat of A. So let's think about the case where, again, we're interested in the probability of failure, so A is the event where, say, y is greater than y critical. So y is some output of interest -- it's the temperature in the blades -- and we're interested in looking at when that temperature exceeds some critical value. So this shows up often, if we are looking at a probability of failure. So we're going to define an indicator function. Usually when you're working with probabilities you're taught to define an indicator function -- you guys should have seen those in 16.09, yes? In 6.041. So the indicator function, I sub A of i -- this is going to correspond to the i-th sample in our Monte Carlo. So we're drawing a sample, we're running it through our finite element code, and we're looking to see whether event A happens.
And the indicator function says: if event A happens, then it takes a value of 1; if not, it takes a value of 0. So this is going to be 1 if the corresponding sample of y, the yi, exceeds y critical; and if not, then we set the indicator function to be 0. So the indicator function is a 0-or-1 result. Failed; didn't fail. A happened; A didn't happen.

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: That's the range -- the indicator is either 0 or 1.

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: Oh, little i. Sorry. Yeah, that's right. Little i runs from 1 up to N. Sorry. So if we just think about these things in this way, then the reason we do this is because then our estimator, P hat of A -- what was the estimator for the probability? What was it? Yep. So the number of samples where A occurred, divided by the total number. So what does that look like? What does that look like in terms of the indicator function? I'll put the denominator in there. What's the numerator? Yeah.
Just the sum of the I sub A's. So it's 1 over N times the sum, i equals 1 to capital N, of the I sub A's. In other words, it's the sample mean of the indicator function. And so we can think of the estimator this way, and we've already seen this. We saw that the expected value of this estimator for the probability was equal to what? The probability of A -- unbiased. This would be a big piece. And do you remember the expression for the variance of this? Does anyone remember it? Here, it's P of A times 1 minus P of A. And what's always in the denominator? N. Right. The standard error of the estimator is the square root of that thing. All right. So that's what we saw before. So now what we're going to do is think about how importance sampling can help us come up with an estimator for P of A that's got smaller variance. Do you remember when we did that analysis? We also did an analysis where, if P is really small and you want to have a tolerance that's relative to the size of P, you end up needing billions, millions of samples.
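The estimator just derived, in code form (a sketch; the event, the standard normal output, and the sample size are my own choices): P hat of A is the sample mean of the indicator function, and its standard error is sqrt(P(1-P)/N).

```python
import math
import random

random.seed(4)
N = 20000

# Event A: the output exceeds a critical value y_crit.
# Stand-in "model output": a standard normal draw.
y_crit = 1.0
indicators = [1 if random.gauss(0.0, 1.0) > y_crit else 0 for _ in range(N)]

# P_hat(A) = sample mean of the indicator function.
p_hat = sum(indicators) / N

# Standard error sqrt(P(1-P)/N), using p_hat in place of the true P.
se = math.sqrt(p_hat * (1.0 - p_hat) / N)
print(p_hat, se)  # true value of P(A) here is about 0.1587
```

With N = 20000 the standard error is around 0.0026, so the estimate typically lands within a percent or so of the true probability -- comfortable for a probability of order 0.1, hopeless for one of order 10^-9.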
458 00:27:03,132 --> 00:27:06,080 So we're going to see whether importance sampling 459 00:27:06,080 --> 00:27:11,610 or see how importance sampling can help us. 460 00:27:11,610 --> 00:27:13,890 Again, I'm going to erase this, but I 461 00:27:13,890 --> 00:27:16,500 find it helpful to think about these two paths. 462 00:27:16,500 --> 00:27:19,590 We could sample, take lots of samples 463 00:27:19,590 --> 00:27:22,710 and we're talking about the mean here, 464 00:27:22,710 --> 00:27:25,500 use our regular estimator. 465 00:27:25,500 --> 00:27:27,390 Or we could think about introducing 466 00:27:27,390 --> 00:27:30,890 another random variable with another pdf 467 00:27:30,890 --> 00:27:38,060 and changing what we sample from and then weighting the integral 468 00:27:38,060 --> 00:27:41,005 to account for the fact that we sampled from a different pdf 469 00:27:41,005 --> 00:27:43,415 and coming out with an estimate of the same thing 470 00:27:43,415 --> 00:27:46,760 but changing the variance. 471 00:27:46,760 --> 00:27:48,910 So let's see how that would work out. 472 00:27:58,046 --> 00:28:02,552 So we'll go straight to a pdf. 473 00:28:02,552 --> 00:28:05,910 So we'll introduce another pdf, and it's going to be called w. 474 00:28:08,755 --> 00:28:10,130 And it's a function of x, so it's 475 00:28:10,130 --> 00:28:17,205 defined on the same support as x, whatever x is. 476 00:28:17,205 --> 00:28:19,680 What are we doing? 477 00:28:19,680 --> 00:28:21,001 Yeah, we have [INAUDIBLE]. 478 00:28:21,001 --> 00:28:22,384 AUDIENCE: [INAUDIBLE] 479 00:28:22,384 --> 00:28:24,120 KAREN WILLCOX: Ah, fourth. 480 00:28:24,120 --> 00:28:25,578 AUDIENCE: [? Don ?] Ulrich just wanted to say hello. 481 00:28:25,578 --> 00:28:26,550 KAREN WILLCOX: Hi, Don. 482 00:28:26,550 --> 00:28:27,522 AUDIENCE: How are you doing? 483 00:28:27,522 --> 00:28:28,890 You've got chalk all over your face. 484 00:28:28,890 --> 00:28:29,806 KAREN WILLCOX: I have. 
485 00:28:29,806 --> 00:28:31,817 You guys know who this is, right? 486 00:28:31,817 --> 00:28:33,400 [? Don ?] Ulrich, Satellite astronaut. 487 00:28:33,400 --> 00:28:35,940 AUDIENCE: I'm one of the lucky guys who took a fall in space. 488 00:28:35,940 --> 00:28:36,940 KAREN WILLCOX: They're excited to be learning all 489 00:28:36,940 --> 00:28:38,106 about computational methods. 490 00:28:38,106 --> 00:28:40,610 And today we're talking about estimating low failure 491 00:28:40,610 --> 00:28:44,020 probabilities, so things like 10 to the minus 9, 492 00:28:44,020 --> 00:28:45,395 things that can go wrong. 493 00:28:45,395 --> 00:28:46,728 Do you know anything about that? 494 00:28:46,728 --> 00:28:49,170 AUDIENCE: That's 10 to the minus 9. 495 00:28:49,170 --> 00:28:51,625 Well, you don't need to worry about that. 496 00:28:51,625 --> 00:28:53,500 KAREN WILLCOX: I'll be at the dinner tonight. 497 00:28:53,500 --> 00:28:54,458 So I'll pick up a beer. 498 00:28:54,458 --> 00:28:55,260 Thanks. 499 00:28:55,260 --> 00:28:56,395 Where is the talk? 500 00:28:56,395 --> 00:28:59,370 AUDIENCE: Marlar Lounge. 501 00:28:59,370 --> 00:29:00,870 Second floor. 502 00:29:00,870 --> 00:29:02,550 4:00. 503 00:29:02,550 --> 00:29:05,596 KAREN WILLCOX: Or you could come to office hours. 504 00:29:05,596 --> 00:29:08,131 AUDIENCE: And make sure you learn this stuff because it's 505 00:29:08,131 --> 00:29:08,630 important. 506 00:29:15,104 --> 00:29:17,020 KAREN WILLCOX: I don't know if you guys know, but I 507 00:29:17,020 --> 00:29:20,000 made it to the finals at the astronaut selection last year, 508 00:29:20,000 --> 00:29:23,448 but didn't get picked, so, Don was, yeah. 509 00:29:29,420 --> 00:29:29,920 All right. 510 00:29:29,920 --> 00:29:31,210 So we just introduced a bit-- I'd 511 00:29:31,210 --> 00:29:32,285 much rather teach that stuff. 512 00:29:32,285 --> 00:29:34,284 It's more exciting than going into space, right? 513 00:29:34,284 --> 00:29:35,236 AUDIENCE: [INAUDIBLE].
514 00:29:42,964 --> 00:29:46,840 KAREN WILLCOX: So we've introduced the pdf w of x, 515 00:29:46,840 --> 00:29:48,740 and I want you to think of this as being 516 00:29:48,740 --> 00:29:54,720 like an alternative pdf for x. 517 00:29:54,720 --> 00:29:57,370 And x is just generic at this point. 518 00:29:57,370 --> 00:30:05,190 This thing is called the biasing density. 519 00:30:09,190 --> 00:30:11,320 And you can see what we're going to do 520 00:30:11,320 --> 00:30:23,070 is we're going to choose w of x so that this event, A, occurs 521 00:30:23,070 --> 00:30:23,780 more frequently. 522 00:30:30,944 --> 00:30:32,450 So the idea is we don't want to have 523 00:30:32,450 --> 00:30:35,030 to wait around for a billion samples for the one event 524 00:30:35,030 --> 00:30:36,060 to occur. 525 00:30:36,060 --> 00:30:40,170 We're going to choose this alternate pdf w 526 00:30:40,170 --> 00:30:44,600 in such a way that this event, A, occurs more frequently. 527 00:30:44,600 --> 00:30:48,300 So then what does our estimate of the probability look like? 528 00:30:48,300 --> 00:30:59,910 Probability of A-- maybe you can see it directly 529 00:30:59,910 --> 00:31:00,840 from this expression. 530 00:31:00,840 --> 00:31:03,540 We could write it as being the expectation of the indicator 531 00:31:03,540 --> 00:31:04,040 function. 532 00:31:06,662 --> 00:31:12,596 And it's the expectation taken under f of x. 533 00:31:12,596 --> 00:31:15,540 So it's the expectation of the indicator 534 00:31:15,540 --> 00:31:23,250 function, with the expectation taken with respect to f of x. 535 00:31:23,250 --> 00:31:33,270 Maybe up here I should define that f of x is 536 00:31:33,270 --> 00:31:34,286 the pdf of x. 537 00:31:38,020 --> 00:31:38,970 And what is this? 538 00:31:38,970 --> 00:31:45,030 This is nothing but the integral of the indicator function 539 00:31:45,030 --> 00:31:47,465 weighted by the pdf and then integrated 540 00:31:47,465 --> 00:31:49,520 over the support of x.
541 00:31:49,520 --> 00:31:53,100 So we're going to do the same trick that we did before. 542 00:31:53,100 --> 00:31:57,880 We're going to multiply and divide by this w. 543 00:31:57,880 --> 00:32:00,710 So we've got our f of x. 544 00:32:00,710 --> 00:32:02,636 We're going to divide by w. 545 00:32:02,636 --> 00:32:04,630 We're going to multiply by w. 546 00:32:09,085 --> 00:32:15,220 And maybe now you can see that what we have now, 547 00:32:15,220 --> 00:32:22,480 we have an expectation of something. We can think 548 00:32:22,480 --> 00:32:26,220 about lumping this thing together, now taken 549 00:32:26,220 --> 00:32:29,220 with respect to the pdf w. 550 00:32:29,220 --> 00:32:33,010 So this is just the integral of something times the pdf 551 00:32:33,010 --> 00:32:36,032 integrated over the support. 552 00:32:36,032 --> 00:32:39,770 So we have to write it up here. 553 00:32:45,912 --> 00:32:47,495 So the last line in that development-- 554 00:32:47,495 --> 00:32:51,175 the left-hand side is P of A, which we haven't discretized 555 00:32:51,175 --> 00:32:52,050 with Monte Carlo yet. 556 00:32:52,050 --> 00:32:56,370 We're still doing things exactly-- 557 00:32:56,370 --> 00:33:04,980 is the expectation, with respect to or under this pdf w, of-- now, 558 00:33:04,980 --> 00:33:15,010 the indicator function with this weighting, f of x over w of x. 559 00:33:20,290 --> 00:33:24,920 So what does this mean? 560 00:33:24,920 --> 00:33:35,660 This means we could draw samples from w instead of from f. 561 00:33:35,660 --> 00:33:42,590 So draw samples from the pdf w. 562 00:33:42,590 --> 00:33:44,100 But if you do that, in order to get 563 00:33:44,100 --> 00:33:46,530 the right estimate for the probability, 564 00:33:46,530 --> 00:33:52,010 you have to weight the samples by the f over w 565 00:33:52,010 --> 00:33:53,900 to make up for the fact that you drew 566 00:33:53,900 --> 00:33:56,546 from a different distribution. 567 00:33:56,546 --> 00:33:58,360 Does that make sense?
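Written out as one chain, the multiply-and-divide trick on the board is (same symbols; valid wherever w of x is nonzero on the support of f):

```latex
P(A) \;=\; \mathbb{E}_f\!\left[I_A(x)\right]
      \;=\; \int I_A(x)\, f(x)\, dx
      \;=\; \int I_A(x)\,\frac{f(x)}{w(x)}\, w(x)\, dx
      \;=\; \mathbb{E}_w\!\left[I_A(x)\,\frac{f(x)}{w(x)}\right]
```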
568 00:33:58,360 --> 00:34:05,440 So weight by f of x over w of x, I'd 569 00:34:05,440 --> 00:34:09,010 say to counteract it-- that's sort of my informal way of thinking 570 00:34:09,010 --> 00:34:09,949 about it. 571 00:34:09,949 --> 00:34:13,530 And if you do that, then the expectation works out 572 00:34:13,530 --> 00:34:19,760 so that you are still getting the probability you're after. 573 00:34:19,760 --> 00:34:23,090 So then the last step is just to write down what our Monte Carlo 574 00:34:23,090 --> 00:34:30,830 estimator would be-- so then our Monte Carlo importance sampling 575 00:34:30,830 --> 00:34:41,690 estimator for this probability of A is what? 576 00:34:41,690 --> 00:34:44,332 So let's call it P hat. 577 00:34:44,332 --> 00:34:46,630 We'll put an IS down here to indicate 578 00:34:46,630 --> 00:34:48,800 that's the importance sampling estimator. 579 00:34:48,800 --> 00:34:49,662 So help me write it. 580 00:34:49,662 --> 00:34:50,620 What does it look like? 581 00:34:53,570 --> 00:34:54,540 Take this first part. 582 00:34:54,540 --> 00:34:55,415 That's the easy part. 583 00:34:58,730 --> 00:35:02,290 One over N-- is it Libby or Casey? 584 00:35:02,290 --> 00:35:04,408 Oh, it's [INAUDIBLE]. 585 00:35:04,408 --> 00:35:05,080 All right. 586 00:35:05,080 --> 00:35:07,690 Now, one of you guys has to do the harder part. 587 00:35:07,690 --> 00:35:09,487 1 over N what? 588 00:35:14,480 --> 00:35:17,980 It's like the regular Monte Carlo estimator. 589 00:35:17,980 --> 00:35:23,080 Summation from i equals 1 to N of what? 590 00:35:25,984 --> 00:35:37,960 iA times f of xi over w of xi. 591 00:35:37,960 --> 00:35:41,210 And it doesn't show up in the formula, 592 00:35:41,210 --> 00:35:47,230 but it's important to know that this is all when the xi's 593 00:35:47,230 --> 00:35:52,490 are drawn from the pdf w. 594 00:35:58,620 --> 00:35:59,387 That's that.
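A minimal sketch of this estimator, under fully assumed choices that are illustrative rather than from the lecture: f is a standard normal, A is the tail event "x > 4," and the biasing pdf w is a normal centered at the failure boundary.

```python
import numpy as np

rng = np.random.default_rng(1)

def normal_pdf(x, mu, sigma):
    # Gaussian density, used here for both f and the biasing pdf w.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

N = 20_000
x_crit = 4.0                  # under f = N(0, 1), P(x > 4) is about 3.17e-5

# Draw the xi's from the biasing pdf w = N(x_crit, 1), so event A
# occurs roughly half the time instead of ~once per 30,000 samples.
x = rng.normal(loc=x_crit, scale=1.0, size=N)
I_A = (x > x_crit).astype(float)

# Importance-sampling estimator: (1/N) * sum of I_A(xi) * f(xi) / w(xi).
weights = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, x_crit, 1.0)
P_hat_IS = np.mean(I_A * weights)

print(P_hat_IS)
```

Each sample landing in the failure region contributes not 1 but 1 times the small weight f(xi)/w(xi), which is the recalibration for having drawn from w instead of f.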
595 00:36:01,930 --> 00:36:04,020 We never noted that before because we only ever 596 00:36:04,020 --> 00:36:07,030 had one pdf floating around, but now we've got two. 597 00:36:07,030 --> 00:36:09,150 So here's the estimator. 598 00:36:09,150 --> 00:36:15,580 If we drew the samples from f of x, 599 00:36:15,580 --> 00:36:17,612 which is the regular way of doing it, 600 00:36:17,612 --> 00:36:20,210 that would be the estimator, but if we draw the samples 601 00:36:20,210 --> 00:36:22,405 from the pdf w, then this is our estimator, 602 00:36:22,405 --> 00:36:25,263 and basically we just have to reweight it. 603 00:36:28,650 --> 00:36:29,387 Is that clear? 604 00:36:32,660 --> 00:36:34,398 So can you see-- yeah. 605 00:36:34,398 --> 00:36:35,396 AUDIENCE: [INAUDIBLE]. 606 00:36:42,881 --> 00:36:43,751 KAREN WILLCOX: Here? 607 00:36:43,751 --> 00:36:44,250 Yeah. 608 00:36:44,250 --> 00:36:46,930 We divided by w, and we multiplied by w. 609 00:36:46,930 --> 00:36:47,930 AUDIENCE: [INAUDIBLE]. 610 00:36:54,432 --> 00:36:57,550 KAREN WILLCOX: Well, yeah. 611 00:36:57,550 --> 00:36:59,514 w is a pdf. 612 00:36:59,514 --> 00:37:00,930 Maybe I could have written it like 613 00:37:00,930 --> 00:37:05,120 if w might have been-- yeah. 614 00:37:05,120 --> 00:37:08,100 Sorry. w is a pdf. 615 00:37:08,100 --> 00:37:08,600 Yeah. 616 00:37:08,600 --> 00:37:09,600 AUDIENCE: [INAUDIBLE]. 617 00:37:13,865 --> 00:37:15,740 KAREN WILLCOX: This one is strange, actually. 618 00:37:15,740 --> 00:37:17,320 We approached it here by starting off 619 00:37:17,320 --> 00:37:19,500 by saying w is a pdf. 620 00:37:19,500 --> 00:37:21,182 So w itself is a pdf. 621 00:37:21,182 --> 00:37:23,140 When I introduced general importance sampling, 622 00:37:23,140 --> 00:37:25,327 we started off with a random variable 623 00:37:25,327 --> 00:37:27,660 and said that's why we put that constraint on, so that we 624 00:37:27,660 --> 00:37:28,509 could get to a pdf.
625 00:37:28,509 --> 00:37:30,800 But here we just say let's introduce a secondary pdf 626 00:37:30,800 --> 00:37:34,570 or an accelerated pdf. 627 00:37:34,570 --> 00:37:37,040 So w is a pdf, which means it has to satisfy 628 00:37:37,040 --> 00:37:38,980 the properties of a pdf. 629 00:37:41,960 --> 00:37:43,360 Greg, did you have some-- 630 00:37:43,360 --> 00:37:44,360 AUDIENCE: [INAUDIBLE]. 631 00:37:49,064 --> 00:37:50,352 KAREN WILLCOX: This guy? 632 00:37:50,352 --> 00:37:51,276 AUDIENCE: [INAUDIBLE]. 633 00:37:54,594 --> 00:37:55,760 KAREN WILLCOX: That's right. 634 00:37:55,760 --> 00:37:56,736 AUDIENCE: [INAUDIBLE]. 635 00:38:03,477 --> 00:38:04,310 KAREN WILLCOX: Yeah. 636 00:38:04,310 --> 00:38:08,750 So this would be given to you. 637 00:38:08,750 --> 00:38:11,860 So one thing that's just a little bit confusing, and maybe 638 00:38:11,860 --> 00:38:15,770 I haven't been entirely clear, so I'm going to put it over here, 639 00:38:15,770 --> 00:38:18,610 is that when we think about Monte Carlo, 640 00:38:18,610 --> 00:38:21,530 I'm going to use these completely different variables so that I 641 00:38:21,530 --> 00:38:24,990 don't mix up with w's and f's. 642 00:38:24,990 --> 00:38:29,780 The model has, say, an input that might be d 643 00:38:29,780 --> 00:38:33,940 and an output that might be q. 644 00:38:33,940 --> 00:38:37,440 And we talk about drawing from some distribution of d. 645 00:38:37,440 --> 00:38:42,000 Say d is normal, propagating it through and getting 646 00:38:42,000 --> 00:38:45,172 the corresponding q, whatever q looks like. 647 00:38:45,172 --> 00:38:48,750 So here when I talk about estimating the probability of A 648 00:38:48,750 --> 00:38:52,900 by drawing from x-- x is really q. 649 00:38:52,900 --> 00:38:58,736 Because this is the pdf that defines if A is out in here, 650 00:38:58,736 --> 00:39:00,360 we want to draw from this distribution.
651 00:39:00,360 --> 00:39:02,701 I mean, if I gave you just this distribution, q, 652 00:39:02,701 --> 00:39:04,450 and I asked you to estimate a probability, 653 00:39:04,450 --> 00:39:07,220 you would randomly sample from it 654 00:39:07,220 --> 00:39:09,490 and then you would count the number on the tail. 655 00:39:09,490 --> 00:39:11,370 So there is kind of an extra step, which 656 00:39:11,370 --> 00:39:14,580 is that we're really drawing from the d and pushing it 657 00:39:14,580 --> 00:39:18,450 through the model to characterize q. 658 00:39:18,450 --> 00:39:20,186 Is that clear enough? 659 00:39:20,186 --> 00:39:22,144 AUDIENCE: The N there, is that all the samples, 660 00:39:22,144 --> 00:39:26,016 or is that just the [? failed ?] proportion? 661 00:39:26,016 --> 00:39:27,420 KAREN WILLCOX: For 10 points? 662 00:39:27,420 --> 00:39:28,110 Oh, the N? 663 00:39:28,110 --> 00:39:29,814 That's going to be all the samples. 664 00:39:29,814 --> 00:39:30,314 Yeah. 665 00:39:30,314 --> 00:39:31,268 AUDIENCE: [INAUDIBLE]. 666 00:39:35,057 --> 00:39:35,890 KAREN WILLCOX: Yeah. 667 00:39:35,890 --> 00:39:38,990 So let me try to write a summary and draw 668 00:39:38,990 --> 00:39:45,857 some pictures to help you, because I don't 669 00:39:45,857 --> 00:39:46,940 want to mix up two things. 670 00:39:46,940 --> 00:39:50,525 But what we would do here is we would sample randomly, 671 00:39:50,525 --> 00:39:54,206 and more samples are going to end up in here than in here. 672 00:39:54,206 --> 00:39:55,580 And every time we sample here, we 673 00:39:55,580 --> 00:39:58,600 generate a corresponding sample of this. 674 00:39:58,600 --> 00:40:00,600 So when you think about the importance sampling, 675 00:40:00,600 --> 00:40:03,190 just don't really think about this part of it. 676 00:40:03,190 --> 00:40:05,670 Think about it as we're sampling from this.
677 00:40:05,670 --> 00:40:07,920 But it turns out we're sampling from this by sampling 678 00:40:07,920 --> 00:40:09,520 from this and going forward. 679 00:40:09,520 --> 00:40:12,220 But we're still sampling from this. 680 00:40:12,220 --> 00:40:15,909 So when we do that and we put samples, most of them 681 00:40:15,909 --> 00:40:17,700 are going to end up over here, and then one 682 00:40:17,700 --> 00:40:21,660 in a billion times, we get one out here on average. 683 00:40:21,660 --> 00:40:22,790 Yeah. 684 00:40:22,790 --> 00:40:26,809 So what we're saying is we could define 685 00:40:26,809 --> 00:40:27,850 a different distribution. 686 00:40:27,850 --> 00:40:30,470 We're jumping here a little bit, but let's just 687 00:40:30,470 --> 00:40:34,190 conceptually say that distribution looks like this. 688 00:40:34,190 --> 00:40:37,749 And instead of sampling from this guy-- which by the way 689 00:40:37,749 --> 00:40:39,540 we do by sampling here and pushing forward, 690 00:40:39,540 --> 00:40:43,460 but that's not really relevant-- we sample from this guy, 691 00:40:43,460 --> 00:40:47,030 so that now when I sample from here, instead of 1 692 00:40:47,030 --> 00:40:51,970 in a billion, maybe 1 in 10 or maybe half of them 693 00:40:51,970 --> 00:40:55,930 actually fall in the region I intended for them. 694 00:40:55,930 --> 00:40:56,690 Yep. 695 00:40:56,690 --> 00:40:59,790 So now I'm going to have N samples from this guy. 696 00:40:59,790 --> 00:41:01,640 But now when I compute the estimate 697 00:41:01,640 --> 00:41:04,730 of the probability of A, I can't just 698 00:41:04,730 --> 00:41:10,080 take the fraction that are here divided by the total number. 699 00:41:10,080 --> 00:41:12,500 Clearly that would be the wrong estimate. 700 00:41:12,500 --> 00:41:16,960 But if I take each sample that falls
in here, 701 00:41:16,960 --> 00:41:22,630 the 1's, and multiply them by the ratio of this guy 702 00:41:22,630 --> 00:41:27,930 to this guy, that's going to recalibrate 703 00:41:27,930 --> 00:41:29,180 that to be the right estimate. 704 00:41:29,180 --> 00:41:30,805 And you can kind of see it graphically, 705 00:41:30,805 --> 00:41:34,390 because what happens is you get a sample here. 706 00:41:34,390 --> 00:41:38,807 The f of x is basically 0, fairly small. 707 00:41:38,807 --> 00:41:40,120 This guy is big. 708 00:41:40,120 --> 00:41:45,474 That ratio is going to be really small. 709 00:41:45,474 --> 00:41:49,420 So even though it's a 1, it gets weighted by a 10 710 00:41:49,420 --> 00:41:54,480 to the minus 9 or whatever it is, or 10 to the minus 10. 711 00:41:54,480 --> 00:41:58,400 So you can see how it works, and it all works out. 712 00:41:58,400 --> 00:42:01,250 Draw from this guy, get more samples in here, 713 00:42:01,250 --> 00:42:03,670 but then each sample doesn't get a 1. 714 00:42:03,670 --> 00:42:06,950 It gets a 1 times the ratio of this to this. 715 00:42:06,950 --> 00:42:09,110 And by doing that, we'll be weighting it 716 00:42:09,110 --> 00:42:12,454 and actually estimating the right probability. 717 00:42:12,454 --> 00:42:13,870 But then the next question, that's 718 00:42:13,870 --> 00:42:15,286 kind of the next thing we're going 719 00:42:15,286 --> 00:42:19,437 to talk about is, how do you figure out what a good biasing 720 00:42:19,437 --> 00:42:20,103 distribution is? 721 00:42:20,103 --> 00:42:22,290 How do you come up with the distribution?
722 00:42:22,290 --> 00:42:24,240 And it would be easier if we didn't 723 00:42:24,240 --> 00:42:27,370 have this part of the problem, but because we 724 00:42:27,370 --> 00:42:30,310 are generating the samples from the inputs 725 00:42:30,310 --> 00:42:32,650 and pushing them through the model, 726 00:42:32,650 --> 00:42:35,680 we don't necessarily know how to bias the distribution 727 00:42:35,680 --> 00:42:39,260 here so that we get a good biasing distribution here. 728 00:42:39,260 --> 00:42:42,940 And that's like saying, do you know what combination of inputs 729 00:42:42,940 --> 00:42:45,960 makes your aircraft most vulnerable to failure? 730 00:42:45,960 --> 00:42:49,030 And sometimes you do, and sometimes you don't. 731 00:42:49,030 --> 00:42:51,359 So again, that's kind of the separate part. 732 00:42:51,359 --> 00:42:53,400 That part of it is a little bit difficult to describe. 733 00:42:55,920 --> 00:42:56,420 Yeah. 734 00:42:56,420 --> 00:42:59,920 AUDIENCE: So you could take like a Gaussian variance 735 00:42:59,920 --> 00:43:01,894 with a [? unit ?] and plot that over there? 736 00:43:01,894 --> 00:43:02,870 KAREN WILLCOX: Yeah. 737 00:43:02,870 --> 00:43:03,754 You could. 738 00:43:03,754 --> 00:43:05,670 AUDIENCE: Would that be like [? an ?] example? 739 00:43:05,670 --> 00:43:06,764 KAREN WILLCOX: Yeah. 740 00:43:06,764 --> 00:43:09,200 So really simple things that people do 741 00:43:09,200 --> 00:43:12,050 are that they-- and we'll look at this-- they scale this thing 742 00:43:12,050 --> 00:43:13,400 to shift more mass. 743 00:43:13,400 --> 00:43:16,715 Well, what I drew here is a little bit extreme. 744 00:43:16,715 --> 00:43:18,090 They scale it to shift more mass, 745 00:43:18,090 --> 00:43:22,240 and they basically make this thing have fatter tails. 746 00:43:22,240 --> 00:43:24,320 And sometimes they actually just shift it.
747 00:43:24,320 --> 00:43:26,170 They'll take whatever the distribution is and just shift 748 00:43:26,170 --> 00:43:26,836 it and scale it. 749 00:43:28,787 --> 00:43:30,620 And then there are more sophisticated things 750 00:43:30,620 --> 00:43:33,440 you can do, but, in fact, this is somewhat of an open question: 751 00:43:33,440 --> 00:43:38,370 how do you define a good biasing 752 00:43:38,370 --> 00:43:44,323 distribution to make your sampling really efficient? 753 00:43:44,323 --> 00:43:48,251 AUDIENCE: I would think that it would [INAUDIBLE]. 754 00:43:56,107 --> 00:43:58,080 KAREN WILLCOX: That's right. 755 00:43:58,080 --> 00:44:01,649 And so usually in the cases where 756 00:44:01,649 --> 00:44:03,690 it's easy to come up with a biasing distribution, 757 00:44:03,690 --> 00:44:06,970 you kind of know what conditions induce failure, and where. 758 00:44:06,970 --> 00:44:09,850 And so there's a sort of obvious place to look. 759 00:44:09,850 --> 00:44:14,025 The really difficult problems are when there's tens, 760 00:44:14,025 --> 00:44:16,710 hundreds, thousands of inputs here. 761 00:44:16,710 --> 00:44:18,630 We don't even know which combinations of inputs 762 00:44:18,630 --> 00:44:21,624 are the ones that make you most vulnerable to failure. 763 00:44:21,624 --> 00:44:23,040 And the only way to find out would 764 00:44:23,040 --> 00:44:24,710 be to sample them all, which we already 765 00:44:24,710 --> 00:44:26,356 said we don't want to do. 766 00:44:26,356 --> 00:44:31,420 So aircraft design is a good one, also 767 00:44:31,420 --> 00:44:34,400 people who simulate earthquakes and all these kinds of things, 768 00:44:34,400 --> 00:44:40,380 I mean, rare events, extreme weather, that kind of stuff. 769 00:44:40,380 --> 00:44:45,340 It's not even clear sometimes that you can simulate and get 770 00:44:45,340 --> 00:44:51,830 good estimates for some of these events. 771 00:44:51,830 --> 00:44:52,780 AUDIENCE: [INAUDIBLE].
772 00:44:56,127 --> 00:44:56,960 KAREN WILLCOX: Yeah. 773 00:44:56,960 --> 00:45:00,380 In some places, it's really hard for us 774 00:45:00,380 --> 00:45:02,390 to characterize what q is out here. 775 00:45:02,390 --> 00:45:04,139 The financial markets are another one. 776 00:45:04,139 --> 00:45:06,680 I don't know if you guys know The Black Swan book; it's 777 00:45:06,680 --> 00:45:10,510 all about fat tails and how models 778 00:45:10,510 --> 00:45:14,625 don't account for things that are like way out here, 779 00:45:14,625 --> 00:45:16,583 that are really unlikely, but when they happen, 780 00:45:16,583 --> 00:45:18,614 they have a really big impact. 781 00:45:18,614 --> 00:45:22,960 So if we knew what was going on here, 782 00:45:22,960 --> 00:45:24,936 we would already know the probabilities. 783 00:45:24,936 --> 00:45:26,310 And so the trick is that we don't 784 00:45:26,310 --> 00:45:28,220 know what's going on here, but we 785 00:45:28,220 --> 00:45:30,640 want to learn about what's going on here by finding 786 00:45:30,640 --> 00:45:31,704 better ways to sample. 787 00:45:31,704 --> 00:45:33,120 But to find better ways to sample, 788 00:45:33,120 --> 00:45:35,835 we need to know what we're looking for. 789 00:45:35,835 --> 00:45:37,210 So then there are things, I mean, 790 00:45:37,210 --> 00:45:38,950 this still always happens, and this 791 00:45:38,950 --> 00:45:40,580 is what keeps us busy in research: 792 00:45:40,580 --> 00:45:42,575 that's where you use adaptivity. 793 00:45:42,575 --> 00:45:46,480 You learn a little bit and then sample. 794 00:45:46,480 --> 00:45:55,530 So let's just quickly summarize the steps 795 00:45:55,530 --> 00:45:56,880 for importance sampling. 796 00:45:56,880 --> 00:46:02,990 So we're going to define a w of x, and however we come up with it, 797 00:46:02,990 --> 00:46:11,158 we're going to draw samples of x from that pdf. 798 00:46:14,650 --> 00:46:17,063 So I'll just note here that this is a pdf.
799 00:46:17,063 --> 00:46:21,870 If you prefer to call it f sub w, that's fine. 800 00:46:21,870 --> 00:46:26,377 And again, the idea is we want to get more samples 801 00:46:26,377 --> 00:46:27,460 in the region of interest. 802 00:46:34,420 --> 00:46:43,110 And then we're going to estimate this probability, 803 00:46:43,110 --> 00:46:49,530 but we're going to do it using this thing here 804 00:46:49,530 --> 00:46:54,460 that we defined, the P hat IS, where 805 00:46:54,460 --> 00:47:00,310 we have to do the weighting to account for the fact 806 00:47:00,310 --> 00:47:02,150 that we drew from the wrong distribution. 807 00:47:05,895 --> 00:47:19,840 So importance weights: f of x divided by w of x, 808 00:47:19,840 --> 00:47:20,925 at the sample point. 809 00:47:23,552 --> 00:47:25,010 And one thing I should say actually 810 00:47:25,010 --> 00:47:27,676 is if you don't choose a good w, you can actually make it worse. 811 00:47:27,676 --> 00:47:35,110 You could make your Monte Carlo convergence worse. 812 00:47:35,110 --> 00:47:38,270 So actually, I think there's one final result to write down, 813 00:47:38,270 --> 00:47:48,310 which is, what are the properties of this P hat IS? 814 00:47:48,310 --> 00:47:56,850 They are that the expected value of our importance sampling 815 00:47:56,850 --> 00:48:01,250 estimator for the probability of A. What do you think it is? 816 00:48:08,680 --> 00:48:11,510 Hopefully, it's P of A. It is, indeed, P of A. 817 00:48:11,510 --> 00:48:16,040 And that's actually, I mean, we didn't do anything over here 818 00:48:16,040 --> 00:48:17,740 except move things around. 819 00:48:17,740 --> 00:48:19,240 We had equality all the way through. 820 00:48:19,240 --> 00:48:21,176 So that's key. 821 00:48:21,176 --> 00:48:25,710 We can push this through the expected value. 822 00:48:25,710 --> 00:48:28,280 In other words, it's unbiased.
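Unbiasedness, and the variance of the estimator, can both be checked numerically from the same weighted samples. A sketch under fully assumed choices (f = N(0, 1), biasing pdf w = N(4, 1), event "x > 4"; none of these come from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

N = 20_000
x_crit = 4.0
P_A = 3.1671e-05              # reference: P(x > 4) under f = N(0, 1)

# g is the weighted indicator I_A * f/w, with samples drawn under w.
x = rng.normal(loc=x_crit, scale=1.0, size=N)
g = (x > x_crit) * normal_pdf(x, 0.0, 1.0) / normal_pdf(x, x_crit, 1.0)

P_hat_IS = np.mean(g)         # unbiased: its expectation is P(A)

# Var(P hat IS) = (1/N) * (E_w[g^2] - P(A)^2), estimated from the samples.
var_IS = (np.mean(g ** 2) - P_hat_IS ** 2) / N

# Plain Monte Carlo variance at the same N, for comparison: P(1 - P)/N.
var_MC = P_A * (1.0 - P_A) / N

print(P_hat_IS, var_IS, var_MC)
```

With this particular w the importance-sampling variance comes out orders of magnitude below the plain Monte Carlo value, but a badly chosen w can just as easily make it larger.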
823 00:48:28,280 --> 00:48:33,980 But then the key is that-- you could put a w in here 824 00:48:33,980 --> 00:48:37,440 to be clear-- that the variance of the estimator, 825 00:48:37,440 --> 00:48:41,180 which again controls basically how good it is for a given 826 00:48:41,180 --> 00:48:45,270 number of samples, has got this more complicated expression. 827 00:48:45,270 --> 00:48:49,500 So the 1 over N is still out there. 828 00:48:49,500 --> 00:48:58,810 And then there is an expectation, under w, of iA times f of x over w of x, quantity squared, 829 00:48:58,810 --> 00:49:03,167 and then there is a minus P of A squared. 830 00:49:03,167 --> 00:49:05,210 I think I got all the pieces. 831 00:49:11,111 --> 00:49:13,360 And that actually goes back to what I was just saying. 832 00:49:13,360 --> 00:49:15,100 It's not actually obvious whether this 833 00:49:15,100 --> 00:49:17,220 is smaller or bigger than what we had before 834 00:49:17,220 --> 00:49:19,550 with the regular Monte Carlo estimator, but of course, 835 00:49:19,550 --> 00:49:22,120 it all comes down to the choice of w. 836 00:49:22,120 --> 00:49:26,700 I mean, the idea is choose a w to make this small so that you 837 00:49:26,700 --> 00:49:28,421 get away with fewer samples. 838 00:49:36,340 --> 00:49:37,590 That's kind of a neat trick, no? 839 00:49:40,410 --> 00:49:42,876 Multiplying and dividing by things, no? 840 00:49:42,876 --> 00:49:44,659 AUDIENCE: [INAUDIBLE]. 841 00:49:44,659 --> 00:49:45,700 KAREN WILLCOX: It's what? 842 00:49:45,700 --> 00:49:47,125 AUDIENCE: [INAUDIBLE]. 843 00:49:47,125 --> 00:49:48,550 KAREN WILLCOX: Yeah. 844 00:49:48,550 --> 00:49:52,523 It used to always drive me crazy when I was an undergrad. 845 00:49:52,523 --> 00:49:55,301 AUDIENCE: [INAUDIBLE] 846 00:49:55,301 --> 00:49:58,656 KAREN WILLCOX: So next, just briefly, how to pick, 847 00:49:58,656 --> 00:50:00,697 and I'm not going to give, sorry, [? Trey, ?]
I'm 848 00:50:00,697 --> 00:50:03,260 not going to give you any definitive answers on this, 849 00:50:03,260 --> 00:50:05,990 but how to pick the biasing distributions. 850 00:50:08,985 --> 00:50:10,930 So what do you do? 851 00:50:10,930 --> 00:50:16,970 So we'll just look at two. Yeah? 852 00:50:16,970 --> 00:50:17,958 AUDIENCE: [INAUDIBLE]. 853 00:50:26,356 --> 00:50:27,390 KAREN WILLCOX: Oh, here? 854 00:50:27,390 --> 00:50:27,910 Yeah. 855 00:50:27,910 --> 00:50:31,580 Just like all of the variances of the estimators 856 00:50:31,580 --> 00:50:35,806 have like sigma y or P of A or whatever it is. 857 00:50:35,806 --> 00:50:36,778 AUDIENCE: [INAUDIBLE]. 858 00:50:41,152 --> 00:50:43,100 KAREN WILLCOX: Yeah. 859 00:50:43,100 --> 00:50:45,120 Remember like the original one. 860 00:50:45,120 --> 00:50:47,870 Even the variance of the mean estimate is sigma y 861 00:50:47,870 --> 00:50:50,986 squared over N, and you don't know sigma. 862 00:50:50,986 --> 00:50:52,360 So you can plug in the estimates, 863 00:50:52,360 --> 00:50:55,630 but then you're introducing additional error. 864 00:50:55,630 --> 00:50:58,154 Yeah. 865 00:50:58,154 --> 00:50:59,570 To actually compute them, you have 866 00:50:59,570 --> 00:51:03,216 to use an estimate for P of A. Yeah. 867 00:51:03,216 --> 00:51:05,656 AUDIENCE: [INAUDIBLE]. 868 00:51:05,656 --> 00:51:06,632 KAREN WILLCOX: Yep. 869 00:51:08,584 --> 00:51:09,560 AUDIENCE: [INAUDIBLE]. 870 00:51:13,464 --> 00:51:14,950 KAREN WILLCOX: Yeah. 871 00:51:14,950 --> 00:51:17,140 So this is the theoretical expression for the thing. 872 00:51:17,140 --> 00:51:20,050 How you actually compute this is a whole other question, 873 00:51:20,050 --> 00:51:22,466 but we've seen that with every single one of the variances 874 00:51:22,466 --> 00:51:23,320 of our estimators. 875 00:51:23,320 --> 00:51:27,950 We had sigma squared over N. We didn't know what this was. 876 00:51:27,950 --> 00:51:31,500 On the original one, we had P times 1 minus P over N.
877 00:51:31,500 --> 00:51:33,220 We didn't know what this was. 878 00:51:33,220 --> 00:51:36,650 So all of them have got the theoretical result 879 00:51:36,650 --> 00:51:38,110 and how you compute them. 880 00:51:38,110 --> 00:51:39,610 And you could compute the estimates. 881 00:51:39,610 --> 00:51:42,120 You can do bootstrapping, and in this case 882 00:51:42,120 --> 00:51:46,641 you would be able to estimate everything. 883 00:51:46,641 --> 00:51:47,140 Yes. 884 00:51:53,760 --> 00:51:57,660 So one simple approach is just to scale. 885 00:51:57,660 --> 00:52:00,800 So we're talking about picking w. 886 00:52:00,800 --> 00:52:04,230 And thank you. 887 00:52:04,230 --> 00:52:07,030 These are homework 7. 888 00:52:07,030 --> 00:52:07,950 Thank you. 889 00:52:10,710 --> 00:52:15,810 And it's scaled in such a way that we shift probability 890 00:52:15,810 --> 00:52:22,390 into the region of interest, shift probability 891 00:52:22,390 --> 00:52:24,570 into the event regions. 892 00:52:24,570 --> 00:52:29,772 Again, the idea being that we want to get more samples. 893 00:52:29,772 --> 00:52:30,730 So how would that work? 894 00:52:30,730 --> 00:52:35,830 In that case, w of x could be 1 895 00:52:35,830 --> 00:52:43,110 over a times the original pdf, 896 00:52:43,110 --> 00:52:45,251 but now as a function of x over a, 897 00:52:45,251 --> 00:52:51,518 where a is some constant that's greater than 1. 898 00:52:51,518 --> 00:52:57,660 So I have to draw these for them to make sense to me. 899 00:52:57,660 --> 00:53:02,610 So it's saying pick w to be the scaled version of f 900 00:53:02,610 --> 00:53:11,994 of x, where we scale the argument and then also scale the result. 901 00:53:11,994 --> 00:53:14,725 So if I can just sort of sketch notionally 902 00:53:14,725 --> 00:53:16,100 what that might look like, I know 903 00:53:16,100 --> 00:53:19,750 it helps me to think about what it's doing.
904 00:53:19,750 --> 00:53:25,060 So let's say this is x and this is f of x, the original pdf. 905 00:53:25,060 --> 00:53:28,520 So maybe it's Gaussian. 906 00:53:28,520 --> 00:53:30,520 And just to make it easier to think about, let's 907 00:53:30,520 --> 00:53:31,615 just center it at 0. 908 00:53:31,615 --> 00:53:33,476 It doesn't have to be. 909 00:53:33,476 --> 00:53:35,470 And its height is going to be something. 910 00:53:35,470 --> 00:53:38,670 We'll call it q. 911 00:53:38,670 --> 00:53:46,687 Then what is w? 912 00:53:46,687 --> 00:53:48,770 Does anyone want to have a go at drawing w? 913 00:53:51,740 --> 00:53:58,664 So the height is q over a. 914 00:53:58,664 --> 00:53:59,600 OK, good. 915 00:53:59,600 --> 00:54:01,486 And I'm going to choose a greater than 1, 916 00:54:01,486 --> 00:54:03,750 so it's going to be lower. 917 00:54:07,250 --> 00:54:08,250 AUDIENCE: [INAUDIBLE]. 918 00:54:14,750 --> 00:54:16,319 KAREN WILLCOX: Good. 919 00:54:16,319 --> 00:54:18,360 You're much better at visualizing this than I am. 920 00:54:18,360 --> 00:54:20,819 I had to mentally do a few points 921 00:54:20,819 --> 00:54:22,090 because I'm not very good. 922 00:54:22,090 --> 00:54:27,570 It's not a very well-drawn Gaussian. 923 00:54:27,570 --> 00:54:29,450 All right. 924 00:54:29,450 --> 00:54:31,970 So I had to actually sort of mentally think it through. 925 00:54:31,970 --> 00:54:36,560 I'm like, at this point, we're at 1, and say a were 2, 926 00:54:36,560 --> 00:54:44,934 then this guy at 1 is going to get the mass from 1/2. 927 00:54:44,934 --> 00:54:46,600 Then it's going to scale it down by two. 928 00:54:46,600 --> 00:54:49,230 But because this thing is falling off quickly, 929 00:54:49,230 --> 00:54:51,450 it's still going to end up being fat 930 00:54:51,450 --> 00:54:54,190 relative to what was in the middle. 931 00:54:54,190 --> 00:54:57,090 So it's exactly a scaling that's going 932 00:54:57,090 --> 00:55:02,354 to shift probability mass out.
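The scaling just sketched on the board can be checked numerically. A minimal sketch (an illustration, not code from the lecture), assuming the original pdf f is a standard normal, so that the scaled biasing density w(x) = f(x/a)/a is itself a zero-mean Gaussian with standard deviation a:

```python
import math

def gaussian_tail(t, sigma=1.0):
    """P(X > t) for a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * math.erfc(t / (sigma * math.sqrt(2.0)))

# Original pdf f: standard normal. Scaled biasing pdf w(x) = f(x/a)/a,
# which for a Gaussian is just a zero-mean Gaussian with sigma = a.
a = 2.0
tail_f = gaussian_tail(3.0, sigma=1.0)   # mass beyond x = 3 under f
tail_w = gaussian_tail(3.0, sigma=a)     # mass beyond x = 3 under w

print(tail_f)  # about 0.00135
print(tail_w)  # about 0.0668
```

With a = 2, the region beyond x = 3 holds roughly 50 times more probability under w than under f, which is exactly the "shift mass out" effect being described.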
933 00:55:02,354 --> 00:55:04,562 So this is what I was saying when I mentioned fatter 934 00:55:04,562 --> 00:55:05,740 tails. 935 00:55:05,740 --> 00:55:07,870 Any problems you see with this if we're 936 00:55:07,870 --> 00:55:10,040 trying to estimate a probability of failure, 937 00:55:10,040 --> 00:55:12,170 say the probability that x is greater 938 00:55:12,170 --> 00:55:13,490 than some critical value? 939 00:55:20,390 --> 00:55:20,890 What's that? 940 00:55:20,890 --> 00:55:21,590 AUDIENCE: [INAUDIBLE]. 941 00:55:21,590 --> 00:55:22,770 KAREN WILLCOX: Yeah, it might not still be. 942 00:55:22,770 --> 00:55:24,580 Anyway, it's definitely going to generate more samples over 943 00:55:24,580 --> 00:55:27,470 here, but it's also going to generate more samples over 944 00:55:27,470 --> 00:55:28,370 here. 945 00:55:28,370 --> 00:55:32,360 So we've made this tail fatter, but we've also 946 00:55:32,360 --> 00:55:34,876 made this tail fatter. 947 00:55:34,876 --> 00:55:38,828 So-- 948 00:55:38,828 --> 00:55:39,816 AUDIENCE: [INAUDIBLE]? 949 00:55:43,274 --> 00:55:46,930 KAREN WILLCOX: Can you do one if f of x is uniform? 950 00:55:46,930 --> 00:55:50,830 Think about what w would be. 951 00:55:55,100 --> 00:55:57,061 But basically w can be anything you want. 952 00:55:57,061 --> 00:56:00,288 AUDIENCE: [INAUDIBLE]. 953 00:56:00,288 --> 00:56:03,161 KAREN WILLCOX: Yeah, that's just one way to do it. 954 00:56:05,870 --> 00:56:13,350 But you've got to make sure that the support of w 955 00:56:13,350 --> 00:56:16,090 is at least as wide, otherwise you're actually 956 00:56:16,090 --> 00:56:19,090 going to be dividing by 0. 957 00:56:19,090 --> 00:56:23,070 So if this guy was uniform, I mean-- 958 00:56:23,070 --> 00:56:24,020 AUDIENCE: [INAUDIBLE]. 959 00:56:28,295 --> 00:56:31,020 KAREN WILLCOX: That's right. 960 00:56:31,020 --> 00:56:34,130 But you couldn't have this be uniform and then say, 961 00:56:34,130 --> 00:56:39,010 I'm going to take my w to be this guy.
962 00:56:39,010 --> 00:56:43,948 Well, if you're generating from here, would you-- 963 00:56:43,948 --> 00:56:46,239 you would be OK because you never generate points here. 964 00:56:49,162 --> 00:56:49,662 Yeah. 965 00:56:49,662 --> 00:56:50,640 AUDIENCE: [INAUDIBLE]. 966 00:56:55,530 --> 00:56:57,300 KAREN WILLCOX: You could, but then they 967 00:56:57,300 --> 00:57:00,200 would just get multiplied by 0. 968 00:57:00,200 --> 00:57:02,780 Then you'd be wasting samples. Yes, 969 00:57:02,780 --> 00:57:05,688 I guess there's no reason why you couldn't do this. 970 00:57:05,688 --> 00:57:06,636 Yeah. 971 00:57:06,636 --> 00:57:10,340 So that's one simple thing to do is just to move mass out, 972 00:57:10,340 --> 00:57:11,390 maybe to the tails. 973 00:57:11,390 --> 00:57:12,790 Another simple thing you could do 974 00:57:12,790 --> 00:57:20,920 is translation, which just means that w of x 975 00:57:20,920 --> 00:57:26,410 would be f of x minus c, where maybe c is greater than 0 976 00:57:26,410 --> 00:57:31,922 if we're thinking about a right tail. 977 00:57:31,922 --> 00:57:33,630 And again, maybe this wouldn't make sense 978 00:57:33,630 --> 00:57:35,730 if you had a finite support like a uniform distribution 979 00:57:35,730 --> 00:57:37,105 because then you would be putting 980 00:57:37,105 --> 00:57:39,220 mass where you didn't care about it. 981 00:57:39,220 --> 00:57:41,060 But what would that look like notionally? 982 00:57:41,060 --> 00:57:44,360 If that's x and that's f of x, we 983 00:57:44,360 --> 00:57:52,410 have, say, a normal centered at 0, then the w of x 984 00:57:52,410 --> 00:57:56,480 is just going to be shifted over by an amount, c. 985 00:57:59,670 --> 00:58:03,340 So again, all it's doing is putting more probability mass 986 00:58:03,340 --> 00:58:06,400 in this case in the right tail that we care about.
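Both tricks feed the same importance-sampling estimator: draw from w, then re-weight each sample by f/w. A minimal sketch (an illustration, not code from the lecture), assuming f is a standard normal, the failure event is x > 3, and w is the translated density f(x - c) with c = 3:

```python
import math
import random

def f_pdf(x):
    """Original density: standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def w_pdf(x, c):
    """Translated biasing density: f(x - c), i.e. a normal centered at c."""
    return f_pdf(x - c)

random.seed(0)
c = 3.0          # shift mass toward the right tail we care about
N = 100_000
total = 0.0
for _ in range(N):
    x = random.gauss(c, 1.0)              # draw from w, not from f
    if x > 3.0:                           # indicator of the rare event
        total += f_pdf(x) / w_pdf(x, c)   # re-weight by f/w
p_hat = total / N

print(p_hat)  # close to the true tail probability, about 0.00135
```

Plain Monte Carlo from f would see the event x > 3 only once or twice per thousand samples; the shifted sampler puts roughly half of its draws in the failure region, which is exactly why the variance of the estimate drops.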
987 00:58:14,620 --> 00:58:16,550 But in terms of picking biasing distributions, 988 00:58:16,550 --> 00:58:18,930 for simple problems, simple things probably 989 00:58:18,930 --> 00:58:21,060 work, but for really difficult problems 990 00:58:21,060 --> 00:58:24,630 with lots of inputs and very rare events, 991 00:58:24,630 --> 00:58:26,911 it's definitely still an open question. 992 00:58:33,294 --> 00:58:35,210 Any other questions about importance sampling 993 00:58:35,210 --> 00:58:37,126 or anything else we've talked about in this analysis? 994 00:58:39,174 --> 00:58:43,370 Does it all make sense, what's on this screen? 995 00:58:46,431 --> 00:58:46,930 Yes. 996 00:58:46,930 --> 00:58:47,846 AUDIENCE: [INAUDIBLE]. 997 00:58:50,760 --> 00:58:53,350 KAREN WILLCOX: Actually, control variates are my favorites. 998 00:58:53,350 --> 00:58:56,970 I've mentioned some of the other variance reduction methods. 999 00:58:56,970 --> 00:59:01,150 Control variates are another way to do variance reduction that I 1000 00:59:01,150 --> 00:59:02,862 think is also kind of neat. 1001 00:59:02,862 --> 00:59:04,385 So if you're interested in knowing 1002 00:59:04,385 --> 00:59:05,760 other variance reduction methods, 1003 00:59:05,760 --> 00:59:07,900 I could give you some stuff to read. 1004 00:59:10,510 --> 00:59:15,880 So let's now just in the last 15 minutes 1005 00:59:15,880 --> 00:59:18,604 talk a little bit about sensitivity analysis 1006 00:59:18,604 --> 00:59:22,652 in the context of Monte Carlo. 1007 00:59:22,652 --> 00:59:27,460 So the question is now how do we use our Monte Carlo 1008 00:59:27,460 --> 00:59:31,640 results to understand which uncertain inputs are 1009 00:59:31,640 --> 00:59:33,760 contributing the most to output variability.
1010 00:59:33,760 --> 00:59:36,060 So I want you to put this picture back 1011 00:59:36,060 --> 00:59:38,870 in your mind, this guy here, which 1012 00:59:38,870 --> 00:59:44,390 is that we've got uncertain inputs, 1013 00:59:44,390 --> 00:59:46,090 and maybe there's a lot of them-- 1014 00:59:46,090 --> 00:59:50,940 d1, d2, d3, however many. 1015 00:59:50,940 --> 00:59:53,790 And maybe there's one output of interest, 1016 00:59:53,790 --> 00:59:55,015 or maybe there is lots. 1017 00:59:55,015 --> 00:59:57,830 But we've run these Monte Carlos, 1018 00:59:57,830 --> 01:00:01,350 we put pdf's in all these guys. 1019 01:00:01,350 --> 01:00:02,280 We draw samples. 1020 01:00:02,280 --> 01:00:03,090 We run through. 1021 01:00:03,090 --> 01:00:08,160 We generate some kind of output histogram 1022 01:00:08,160 --> 01:00:12,890 on q that looks like however it looks. 1023 01:00:12,890 --> 01:00:15,100 But now we want to ask the question, well, 1024 01:00:15,100 --> 01:00:18,050 which one of these is really contributing 1025 01:00:18,050 --> 01:00:23,050 to the variability, or which combinations of these 1026 01:00:23,050 --> 01:00:24,680 are contributing to the variability? 1027 01:00:24,680 --> 01:00:27,672 Which combinations are causing this big fat tail 1028 01:00:27,672 --> 01:00:29,630 out here, which corresponds to [INAUDIBLE] that 1029 01:00:29,630 --> 01:00:32,136 don't meet our design criteria? 1030 01:00:32,136 --> 01:00:35,449 So this is really now kind of the opposite question. 1031 01:00:35,449 --> 01:00:37,490 You've done a forward propagation of uncertainty, 1032 01:00:37,490 --> 01:00:39,225 and you've got the uncertainty in q, 1033 01:00:39,225 --> 01:00:42,830 and now you want to figure out where did it come from. 1034 01:00:42,830 --> 01:00:44,790 So forward propagation of uncertainty, 1035 01:00:44,790 --> 01:00:49,150 sensitivity analysis is kind of like the reverse question. 
1036 01:00:49,150 --> 01:00:50,940 And it's important for two reasons. 1037 01:00:50,940 --> 01:00:54,050 One is that it helps us understand 1038 01:00:54,050 --> 01:00:57,670 where we should focus our uncertainty reduction efforts. 1039 01:00:57,670 --> 01:01:00,372 But I think in many ways when you look say like an aircraft 1040 01:01:00,372 --> 01:01:02,580 design and development program at a place like Boeing 1041 01:01:02,580 --> 01:01:05,250 or wherever, so much of the process 1042 01:01:05,250 --> 01:01:08,250 is about reducing uncertainty. 1043 01:01:08,250 --> 01:01:11,450 It's about running experiments or tests 1044 01:01:11,450 --> 01:01:13,200 or running models and trying to figure out 1045 01:01:13,200 --> 01:01:15,033 exactly what the performance is of the thing 1046 01:01:15,033 --> 01:01:17,660 that you're designing and making decisions as well. 1047 01:01:17,660 --> 01:01:21,060 So understanding where the big uncertainties are can 1048 01:01:21,060 --> 01:01:23,600 help you decide what kind of experiments 1049 01:01:23,600 --> 01:01:25,690 to do or whether to improve your models. 1050 01:01:25,690 --> 01:01:27,340 If you're thinking about uncertainty 1051 01:01:27,340 --> 01:01:29,350 in finished products, it can help 1052 01:01:29,350 --> 01:01:31,620 guide you on where you should tighten tolerances 1053 01:01:31,620 --> 01:01:34,235 in the manufacturing process. 1054 01:01:34,235 --> 01:01:36,610 It might tell you that you've got uncertainty because you 1055 01:01:36,610 --> 01:01:37,860 don't know certain conditions. 1056 01:01:37,860 --> 01:01:40,210 So maybe it says you need a sensor in your engine 1057 01:01:40,210 --> 01:01:42,590 or on board your aircraft or wherever. 1058 01:01:42,590 --> 01:01:46,100 Sometimes this is called factor prioritization, 1059 01:01:46,100 --> 01:01:49,410 figuring out what the priority order of the inputs is.
1060 01:01:49,410 --> 01:01:53,500 Factor is just a term that is often used in the statistics 1061 01:01:53,500 --> 01:01:54,340 community. 1062 01:01:54,340 --> 01:01:57,656 Which ones should be priorities for uncertainty reduction? 1063 01:01:57,656 --> 01:02:00,164 So that's kind of half the story, but the other half of the story 1064 01:02:00,164 --> 01:02:01,580 is that it's also really important 1065 01:02:01,580 --> 01:02:04,100 to understand where the uncertainties are not 1066 01:02:04,100 --> 01:02:06,540 important. 1067 01:02:06,540 --> 01:02:11,200 There could be distributions on d1, 1068 01:02:11,200 --> 01:02:13,770 but this doesn't really matter because this thing is not 1069 01:02:13,770 --> 01:02:16,459 sensitive to d1. 1070 01:02:16,459 --> 01:02:18,000 And that's important because then you 1071 01:02:18,000 --> 01:02:19,833 shouldn't waste your time worrying about d1, 1072 01:02:19,833 --> 01:02:22,850 and you shouldn't waste your time arguing with other groups 1073 01:02:22,850 --> 01:02:26,060 about whether d1 should be 1 or 1.1 or 1.2. 1074 01:02:26,060 --> 01:02:29,190 So it's important to understand where things are actually 1075 01:02:29,190 --> 01:02:31,312 not important. 1076 01:02:31,312 --> 01:02:32,770 And this is sometimes called factor 1077 01:02:32,770 --> 01:02:36,734 fixing because this is where you can understand where it's not 1078 01:02:36,734 --> 01:02:38,150 important to consider uncertainty, 1079 01:02:38,150 --> 01:02:41,210 and you can just pick a deterministic value of an input 1080 01:02:41,210 --> 01:02:42,279 and go forward. 1081 01:02:42,279 --> 01:02:44,820 And this is really important, actually, in the policy setting. 1082 01:02:44,820 --> 01:02:48,050 As I mentioned before, we work with the FAA. 1083 01:02:48,050 --> 01:02:50,840 People argue about all kinds of things when they make policies.
1084 01:02:50,840 --> 01:02:52,430 It's nice to be able to show that some 1085 01:02:52,430 --> 01:02:54,280 of the things that you argue about actually 1086 01:02:54,280 --> 01:02:59,410 don't matter for the uncertainty in the policy decision. 1087 01:02:59,410 --> 01:03:00,835 Fewer things to argue about. 1088 01:03:00,835 --> 01:03:01,480 All right. 1089 01:03:01,480 --> 01:03:04,340 So how can we do sensitivity analysis? 1090 01:03:04,340 --> 01:03:07,790 There is something called VABO, vary-all-but-one [INAUDIBLE]. 1091 01:03:11,730 --> 01:03:13,200 Think about doing it. 1092 01:03:13,200 --> 01:03:14,690 So here are the steps. 1093 01:03:14,690 --> 01:03:20,160 First of all, run your Monte Carlo with all the inputs varying. 1094 01:03:20,160 --> 01:03:25,377 Think about d1, d2, d3, up to however many we have of them. 1095 01:03:25,377 --> 01:03:26,710 We've got pdf's for all of them. 1096 01:03:26,710 --> 01:03:27,520 We sample all of them. 1097 01:03:27,520 --> 01:03:29,446 We generate the corresponding output-- it's called q 1098 01:03:29,446 --> 01:03:31,080 on that board over there. 1099 01:03:31,080 --> 01:03:34,155 Then let's go and take input k, 1100 01:03:34,155 --> 01:03:36,840 say it's the first one, d1, and let's 1101 01:03:36,840 --> 01:03:40,930 fix it to a deterministic value so it's no longer got a pdf, it's 1102 01:03:40,930 --> 01:03:44,080 no longer able to vary, and rerun the Monte Carlo with all 1103 01:03:44,080 --> 01:03:48,600 the other ones-- all the other, however many there 1104 01:03:48,600 --> 01:03:51,470 are d minus 1 inputs varying. 1105 01:03:51,470 --> 01:03:54,250 So now we have the results of two Monte Carlo simulations, 1106 01:03:54,250 --> 01:03:55,740 one where it was fixed, 1107 01:03:55,740 --> 01:03:57,630 and one where it was varying. 1108 01:03:57,630 --> 01:03:59,950 And you could compare the statistics of the output.
1109 01:03:59,950 --> 01:04:01,790 You could look at the variance from run one, 1110 01:04:01,790 --> 01:04:03,680 and the variance of run two. 1111 01:04:03,680 --> 01:04:07,420 And you could then attribute the difference in the variance 1112 01:04:07,420 --> 01:04:09,800 to that fixed factor. 1113 01:04:09,800 --> 01:04:15,290 And then you would repeat it with 1, 3, 4, 5 varying. 1114 01:04:15,290 --> 01:04:19,360 Fix 2, fix 3, fix 4. 1115 01:04:19,360 --> 01:04:21,230 So vary all but one. 1116 01:04:23,524 --> 01:04:25,190 I mean, maybe it's a useful thing to do, 1117 01:04:25,190 --> 01:04:26,810 but of course, there are questions. 1118 01:04:26,810 --> 01:04:29,143 The first question you'd ask is, well, what value should we 1119 01:04:29,143 --> 01:04:31,670 fix factor k at? 1120 01:04:31,670 --> 01:04:36,160 If we fixed it to this value and ran the Monte Carlo, 1121 01:04:36,160 --> 01:04:38,690 would we get a different result than if we 1122 01:04:38,690 --> 01:04:41,010 fixed it at a different value? 1123 01:04:41,010 --> 01:04:45,255 And the answer for a complicated system is probably yes. 1124 01:04:45,255 --> 01:04:47,630 And then the other question maybe you should be wondering 1125 01:04:47,630 --> 01:04:49,400 is what about possible interactions? 1126 01:04:49,400 --> 01:04:54,400 If I fix input 1 at a value and analyze things, 1127 01:04:54,400 --> 01:04:58,556 but what if input 1 and input 2 1128 01:04:58,556 --> 01:05:01,120 interact, and I never explored that? 1129 01:05:01,120 --> 01:05:03,640 Are there interactions that might be important 1130 01:05:03,640 --> 01:05:06,610 that I would be missing? 1131 01:05:06,610 --> 01:05:09,670 So to address those limitations, there is 1132 01:05:09,670 --> 01:05:14,420 something that's called global sensitivity analysis. 1133 01:05:14,420 --> 01:05:19,720 So let's get this off.
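The vary-all-but-one recipe above can be sketched on a hypothetical toy model (the model, the uniform input distributions, and the fixed value are assumptions for illustration, not from the lecture):

```python
import random

def model(d1, d2):
    """A hypothetical toy model standing in for the real simulation."""
    return d1 + 2.0 * d2

def variance(samples):
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / (len(samples) - 1)

random.seed(0)
N = 50_000

# Run 1: all inputs varying (both uniform on [0, 1] here, as an example).
run1 = [model(random.random(), random.random()) for _ in range(N)]

# Run 2: fix d1 at a deterministic value (its mean) and vary only d2.
run2 = [model(0.5, random.random()) for _ in range(N)]

var_all = variance(run1)          # about 5/12
var_without_d1 = variance(run2)   # about 4/12

# Attribute the drop in variance to d1.
print(var_all - var_without_d1)   # about 1/12
```

The two questions raised next still apply to this sketch: fixing d1 somewhere other than 0.5 could change the answer for a less benign model, and interactions between d1 and d2 are never exercised.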
1134 01:05:19,720 --> 01:05:23,620 And I think when you think about global sensitivity analysis, 1135 01:05:23,620 --> 01:05:27,540 it's useful to have this vary-all-but-one picture 1136 01:05:27,540 --> 01:05:34,340 in your mind because it's conceptually a little bit 1137 01:05:34,340 --> 01:05:36,650 the same. 1138 01:05:36,650 --> 01:05:38,073 So we're on 4. 1139 01:05:38,073 --> 01:05:40,450 And 4a was the vary-all-but-one. 1140 01:05:40,450 --> 01:05:44,895 So this is 4b, global sensitivity analysis. 1141 01:06:02,162 --> 01:06:05,520 So I'm going to draw a picture here. 1142 01:06:05,520 --> 01:06:07,835 I'm going to use different notation, 1143 01:06:07,835 --> 01:06:09,460 but let me not try to change it, otherwise 1144 01:06:09,460 --> 01:06:12,050 I will end up mixing something up. 1145 01:06:12,050 --> 01:06:14,450 So we've got a model. 1146 01:06:14,450 --> 01:06:19,150 So then I'll call the inputs x1, x2, down to-- we're 1147 01:06:19,150 --> 01:06:28,041 going to have [INAUDIBLE], so d random inputs. 1148 01:06:28,041 --> 01:06:32,350 Now, let's just think about a single random output, so then 1149 01:06:32,350 --> 01:06:34,170 just one random output. 1150 01:06:40,688 --> 01:06:47,475 So global sensitivity analysis defines [AUDIO OUT] 1151 01:06:47,475 --> 01:06:48,600 called sensitivity indices. 1152 01:06:51,210 --> 01:06:55,360 And there are two different kinds of sensitivity indices. 1153 01:06:55,360 --> 01:06:58,982 So the first one is what's called a main effect. 1154 01:06:58,982 --> 01:07:03,783 Did I talk about main effects in 16.09, 6.041? 1155 01:07:03,783 --> 01:07:05,082 No. 1156 01:07:05,082 --> 01:07:07,410 Maybe in 16.62x, anybody seen main effects? 1157 01:07:10,230 --> 01:07:21,040 So the main effect sensitivity index for input i-- 1158 01:07:21,040 --> 01:07:23,262 usually statisticians
1159 01:07:23,262 --> 01:07:24,970 would use the word factor here, 1160 01:07:24,970 --> 01:07:27,820 but let's just call it an input. 1161 01:07:27,820 --> 01:07:31,912 So this thing is-- the notation is si. 1162 01:07:31,912 --> 01:07:34,120 Let's write the formula and talk about what it means. 1163 01:07:34,120 --> 01:07:41,020 It's the variance of Y minus the expected value 1164 01:07:41,020 --> 01:08:03,945 of the variance of Y given xi, all divided by the variance of Y. 1165 01:08:03,945 --> 01:08:06,560 So the variance of Y minus the expected 1166 01:08:06,560 --> 01:08:09,790 value of the variance of Y given xi divided 1167 01:08:09,790 --> 01:08:11,986 by the variance of Y. Does anyone 1168 01:08:11,986 --> 01:08:13,860 want to have a go at explaining what this is? 1169 01:08:16,848 --> 01:08:17,844 AUDIENCE: [INAUDIBLE]. 1170 01:08:46,230 --> 01:08:47,710 KAREN WILLCOX: Yes. 1171 01:08:47,710 --> 01:08:53,850 So if this happened to be 0, then our sensitivity index 1172 01:08:53,850 --> 01:08:55,950 would be what? 1173 01:08:55,950 --> 01:08:57,290 1. 1174 01:08:57,290 --> 01:08:59,510 And it would mean that all of the variance in Y 1175 01:08:59,510 --> 01:09:01,370 was explained by xi, because when we fix 1176 01:09:01,370 --> 01:09:05,029 xi the variance [INAUDIBLE]. 1177 01:09:05,029 --> 01:09:07,439 What about if this expected value 1178 01:09:07,439 --> 01:09:09,711 were the variance of Y? 1179 01:09:09,711 --> 01:09:11,202 Then xi doesn't have any effect. 1180 01:09:11,202 --> 01:09:12,540 Yes. 1181 01:09:12,540 --> 01:09:14,140 Why is there an expectation here? 1182 01:09:21,149 --> 01:09:27,686 So we go with [AUDIO OUT] So we're saying Y given xi. 1183 01:09:27,686 --> 01:09:28,684 AUDIENCE: [INAUDIBLE]. 1184 01:09:37,167 --> 01:09:38,180 KAREN WILLCOX: Yeah.
1185 01:09:38,180 --> 01:09:41,752 If we picked any particular value at which to fix xi, 1186 01:09:41,752 --> 01:09:45,119 as in, given xi equal to some x star, 1187 01:09:45,119 --> 01:09:47,510 then it would just be a number. 1188 01:09:47,510 --> 01:09:50,840 But because this thing is a random variable, 1189 01:09:50,840 --> 01:09:52,260 we don't know where to fix it. 1190 01:09:52,260 --> 01:09:53,968 So this is addressing this question of we 1191 01:09:53,968 --> 01:09:55,188 don't know where to fix it. 1192 01:09:55,188 --> 01:09:57,515 So at least the way I think of it conceptually 1193 01:09:57,515 --> 01:10:00,740 is that we're going to fix it at all possible places 1194 01:10:00,740 --> 01:10:03,234 and then [INAUDIBLE] over there. 1195 01:10:03,234 --> 01:10:08,110 And so it's the expected reduction in the variance-- 1196 01:10:08,110 --> 01:10:08,650 it's relative. 1197 01:10:08,650 --> 01:10:09,910 It's normalized. 1198 01:10:09,910 --> 01:10:12,780 So it's the expected relative reduction of the variance 1199 01:10:12,780 --> 01:10:16,660 if we learned everything about the factor xi. 1200 01:10:16,660 --> 01:10:21,560 So it's giving us a measure of how much does xi contribute 1201 01:10:21,560 --> 01:10:24,160 to the variance in Y, but it's a measure 1202 01:10:24,160 --> 01:10:27,260 where we're taking expectation over the possible values 1203 01:10:27,260 --> 01:10:28,830 that xi could take on. 1204 01:10:28,830 --> 01:10:31,334 So that's where the word global comes from. 1205 01:10:31,334 --> 01:10:31,834 [INAUDIBLE] 1206 01:10:36,740 --> 01:10:39,720 So let me just write it in words. 1207 01:10:39,720 --> 01:10:47,330 So expected relative reduction in output variance-- 1208 01:10:47,330 --> 01:10:53,440 and the output variance is the variance of Y-- if the 1209 01:10:53,440 --> 01:11:04,230 true value of xi is learned.
1210 01:11:04,230 --> 01:11:06,460 And then another thing that's important to see 1211 01:11:06,460 --> 01:11:16,817 is that this is a measure of the effects of varying xi alone. 1212 01:11:23,219 --> 01:11:25,260 And if we wrote this in a slightly different way, 1213 01:11:25,260 --> 01:11:26,926 we could see that it would 1214 01:11:26,926 --> 01:11:31,360 be averaged over variations in the other inputs. 1215 01:11:37,742 --> 01:11:39,830 We'll see it on Monday when we talk 1216 01:11:39,830 --> 01:11:42,460 about design of experiments. 1217 01:11:42,460 --> 01:11:44,346 It's another way of computing main effects 1218 01:11:44,346 --> 01:11:46,720 that lines up more with the second interpretation. 1219 01:11:46,720 --> 01:11:47,553 Let me say it again. 1220 01:11:47,553 --> 01:11:50,630 A measure-- this is what we talked about. 1221 01:11:50,630 --> 01:11:52,240 A measure of the effect of varying xi 1222 01:11:52,240 --> 01:11:57,200 alone, averaged over the variations of other inputs. 1223 01:11:57,200 --> 01:12:00,113 And that's where the term main effect comes in. 1224 01:12:00,113 --> 01:12:04,510 So if we were to change xi, if we were to vary xi, 1225 01:12:04,510 --> 01:12:09,780 how does it change the output, averaged over everything else-- 1226 01:12:09,780 --> 01:12:13,520 the x3, x4 varying? 1227 01:12:13,520 --> 01:12:14,980 Is that clear? 1228 01:12:14,980 --> 01:12:18,310 So there are two different ways you can think about that. 1229 01:12:18,310 --> 01:12:19,278 AUDIENCE: [INAUDIBLE]. 1230 01:12:26,054 --> 01:12:32,270 KAREN WILLCOX: So si can at most be 1 1231 01:12:32,270 --> 01:12:37,070 because the most we can get here is 0, so si 1232 01:12:37,070 --> 01:12:38,990 is going to be between 0 and 1. 1233 01:12:38,990 --> 01:12:41,920 And the idea is that you could compute these, 1234 01:12:41,920 --> 01:12:44,090 and then you would rank them.
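The main effect index written on the board can be evaluated by brute force with a nested Monte Carlo: fix xi at many sampled values, take the conditional variance each time, and average. A sketch on a hypothetical toy model (the model and its uniform inputs are assumptions for illustration; analytically its main effects are S1 = 0.2 and S2 = 0.8):

```python
import random

def model(x1, x2):
    """Hypothetical toy model; analytically, S1 = 0.2 and S2 = 0.8."""
    return x1 + 2.0 * x2

def variance(samples):
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / (len(samples) - 1)

random.seed(0)
N_outer, N_inner = 200, 500

# Overall variance of Y with both inputs uniform on [0, 1].
var_y = variance([model(random.random(), random.random())
                  for _ in range(N_outer * N_inner)])

# E[Var(Y | x1)]: fix x1 at many sampled values, compute the conditional
# variance each time, then average those variances.
cond_vars = []
for _ in range(N_outer):
    x1 = random.random()
    cond_vars.append(variance([model(x1, random.random())
                               for _ in range(N_inner)]))
expected_cond_var = sum(cond_vars) / len(cond_vars)

s1 = (var_y - expected_cond_var) / var_y
print(s1)  # about 0.2 for this toy model
```

The outer loop is exactly the "fix it at all possible places and average" idea from above; it is expensive, which is why the cleverer estimators mentioned at the end of the lecture exist.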
1235 01:12:44,090 --> 01:12:47,110 And if you were looking for where you focus your efforts, 1236 01:12:47,110 --> 01:12:49,360 which variable you try to control in the manufacturing 1237 01:12:49,360 --> 01:12:52,860 process, you would go after the one with the biggest 1238 01:12:52,860 --> 01:12:55,850 sensitivity index. 1239 01:12:55,850 --> 01:13:01,100 So it actually turns out we can also compute higher order 1240 01:13:01,100 --> 01:13:02,040 interaction indices. 1241 01:13:04,630 --> 01:13:08,760 You can also compute effects for the interactions. 1242 01:13:12,270 --> 01:13:16,935 And I won't go into details, but instead 1243 01:13:16,935 --> 01:13:19,880 of now just being si, there could be an sij, which 1244 01:13:19,880 --> 01:13:23,760 would be two variables, two inputs interacting, or three 1245 01:13:23,760 --> 01:13:26,420 variables interacting, or four, all the way up 1246 01:13:26,420 --> 01:13:29,530 to all of them interacting. 1247 01:13:29,530 --> 01:13:31,320 And what the math shows, I mean, 1248 01:13:31,320 --> 01:13:33,820 we're not going to talk about it, is this idea that you can 1249 01:13:33,820 --> 01:13:38,130 take the variance of Y, think of it as this pie, 1250 01:13:38,130 --> 01:13:42,740 and you can decompose this variance of Y. 1251 01:13:42,740 --> 01:13:45,280 So let's think about-- it's easier for me 1252 01:13:45,280 --> 01:13:54,860 to draw if Y depends on x1 and x2, so there are just two inputs. 1253 01:13:54,860 --> 01:13:57,410 There would be a part of the variance that 1254 01:13:57,410 --> 01:14:03,280 might be due to x1 acting alone, there 1255 01:14:03,280 --> 01:14:07,040 might be a part of the variance that's due to x2 acting alone, 1256 01:14:07,040 --> 01:14:10,460 and there might be a part of the variance due to the interaction 1257 01:14:10,460 --> 01:14:13,678 between x1 and x2.
1258 01:14:13,678 --> 01:14:18,232 And if we computed the sensitivity indices, s1, s2, 1259 01:14:18,232 --> 01:14:25,120 and s12, they would all sum up to 1. 1260 01:14:25,120 --> 01:14:27,150 Or the variance due to x1, the variance 1261 01:14:27,150 --> 01:14:29,550 due to x2, and the variance due to the x1, x2 interaction would sum up 1262 01:14:29,550 --> 01:14:33,877 to the variance of Y. And you can 1263 01:14:33,877 --> 01:14:37,120 show there is something called a high dimensional model 1264 01:14:37,120 --> 01:14:40,320 representation that shows how all of this comes out. 1265 01:14:40,320 --> 01:14:45,200 So it's kind of a nice result that you can attribute. 1266 01:14:45,200 --> 01:14:47,020 It actually turns out that there's 1267 01:14:47,020 --> 01:14:49,700 something else also called a total effect sensitivity 1268 01:14:49,700 --> 01:14:55,650 index that you can also compute that tells you 1269 01:14:55,650 --> 01:15:00,030 how much does x1 contribute, not just by itself but with all 1270 01:15:00,030 --> 01:15:01,770 of the interactions as well. 1271 01:15:01,770 --> 01:15:07,910 So the total effect sensitivity index for x1 would include s1 and 1272 01:15:07,910 --> 01:15:08,410 s12. 1273 01:15:08,410 --> 01:15:12,210 And the total effect for x2 would include s2 and s12. 1274 01:15:12,210 --> 01:15:14,682 And then all the other ones, if you had more variables, 1275 01:15:14,682 --> 01:15:16,390 you would add them all up, and it turns out 1276 01:15:16,390 --> 01:15:19,506 there's a fairly easy way to compute that as well. 1277 01:15:24,465 --> 01:15:25,090 Last questions? 1278 01:15:27,660 --> 01:15:28,419 Great. 1279 01:15:28,419 --> 01:15:29,377 AUDIENCE: [INAUDIBLE]? 1280 01:15:36,587 --> 01:15:38,420 KAREN WILLCOX: Yeah, that's a good question. 1281 01:15:38,420 --> 01:15:39,836 So I actually haven't told you how 1282 01:15:39,836 --> 01:15:41,570 to compute these things at all. 1283 01:15:41,570 --> 01:15:43,466 These are the expressions for them.
1284 01:15:43,466 --> 01:15:46,320 It turns out there are a couple different ways. 1285 01:15:46,320 --> 01:15:48,380 I think the most common way to compute 1286 01:15:48,380 --> 01:15:51,035 these things is a method called the Sobol' method. 1287 01:15:51,035 --> 01:15:53,920 Sobol', I think, was a Russian statistician, 1288 01:15:53,920 --> 01:15:56,710 which involves Monte Carlo simulations. 1289 01:15:56,710 --> 01:15:59,440 And you basically end up doing one Monte Carlo simulation 1290 01:15:59,440 --> 01:16:02,740 and then you do a second Monte Carlo simulation 1291 01:16:02,740 --> 01:16:04,560 where you freeze some of the variables 1292 01:16:04,560 --> 01:16:06,669 and redraw other ones. 1293 01:16:06,669 --> 01:16:08,210 And it's done in kind of a clever way 1294 01:16:08,210 --> 01:16:11,040 so that you get to this. 1295 01:16:11,040 --> 01:16:12,790 So then the question of how converged 1296 01:16:12,790 --> 01:16:15,040 the sensitivity indices are comes down to 1297 01:16:15,040 --> 01:16:17,852 a question of how converged the variance estimates are. 1298 01:16:17,852 --> 01:16:19,790 And we've talked a lot about mean estimates 1299 01:16:19,790 --> 01:16:20,460 and how they converge. 1300 01:16:20,460 --> 01:16:22,751 It actually turns out that to get variances to converge, 1301 01:16:22,751 --> 01:16:25,160 you usually have to take more Monte Carlo samples. 1302 01:16:25,160 --> 01:16:26,910 Most of the time when we use these things, 1303 01:16:26,910 --> 01:16:28,540 we do as many samples as we can afford, 1304 01:16:28,540 --> 01:16:30,830 and that ends up being the [INAUDIBLE]. 1305 01:16:30,830 --> 01:16:32,090 So that's a good question. 1306 01:16:32,090 --> 01:16:33,631 And it also depends on what you want.
1307 01:16:33,631 --> 01:16:36,070 Do you actually care whether you get the sensitivity index 1308 01:16:36,070 --> 01:16:38,110 to four decimal places, or do you just 1309 01:16:38,110 --> 01:16:42,030 care about saying number two is the biggest?
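The freeze-and-redraw construction just described can be sketched as follows (an illustration, not the lecture's code; the toy model and sample sizes are assumptions, and analytically S1 = 0.2 and S2 = 0.8 for this model):

```python
import random

def model(x):
    """Hypothetical toy model; analytically S1 = 0.2, S2 = 0.8."""
    return x[0] + 2.0 * x[1]

random.seed(0)
N, d = 50_000, 2

# Two independent input sample sets, A and B.
A = [[random.random() for _ in range(d)] for _ in range(N)]
B = [[random.random() for _ in range(d)] for _ in range(N)]
yA = [model(a) for a in A]

f0 = sum(yA) / N                                # estimate of the mean of Y
var_y = sum(y * y for y in yA) / N - f0 * f0    # estimate of Var(Y)

def first_order_index(i):
    # Freeze column i (reuse it from A) and redraw the rest from B,
    # so that yA and yC share only the value of input i.
    yC = [model([a[j] if j == i else b[j] for j in range(d)])
          for a, b in zip(A, B)]
    v_i = sum(ya * yc for ya, yc in zip(yA, yC)) / N - f0 * f0
    return v_i / var_y

print(first_order_index(0))  # about 0.2
print(first_order_index(1))  # about 0.8
```

The clever part is that yA and yC agree only in input i, so the average of their product picks out Var(E[Y | xi]) without the nested loop; the cost is one extra model run per index per sample. The accuracy of the indices then rests, as noted above, on how converged these variance estimates are.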