The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

KAREN WILLCOX: So first up, bootstrapping, which is really aimed at this question: how do we get estimates of the standard errors of our estimators that don't have known distributions? So remember, we were talking on Wednesday -- I don't even know what day it is today. Monday. We were talking on Monday about how you can run the Monte Carlo simulation, and then we know, for example, that for the mean estimate, in the limit as N goes to infinity, by the central limit theorem, the distribution of the estimator itself -- the estimator for the mean -- is normal. And the mean of that distribution is the actual mean that we're trying to estimate. It's unbiased. And the standard deviation of that distribution, remember, was the standard deviation of the quantity we're computing as an output, divided by the square root of N.
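The central limit theorem behavior recalled above can be checked numerically. This is an illustrative sketch, not part of the lecture; the uniform input and all sizes are my own choices, picked so the true standard deviation is known in closed form.

```python
import math
import random

random.seed(1)

N = 100       # samples per Monte Carlo run
RUNS = 2000   # independent repetitions of the whole experiment

# Input: Uniform(0, 1), whose true standard deviation is 1/sqrt(12).
sigma = 1.0 / math.sqrt(12.0)

# Collect the sample-mean estimator from many independent runs.
means = [sum(random.random() for _ in range(N)) / N for _ in range(RUNS)]

# Empirical spread of the estimator vs. the CLT prediction sigma/sqrt(N).
mu = sum(means) / RUNS
emp_std = math.sqrt(sum((m - mu) ** 2 for m in means) / (RUNS - 1))
predicted = sigma / math.sqrt(N)
print(emp_std, predicted)  # the two should agree to within a few percent
```

A histogram of `means` would also look approximately normal, centered on the true mean 0.5, which is the unbiasedness claim.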
But we also said that, for example, the estimate of the variance -- remember, Alex told you three different ways to estimate variance -- that those estimators don't have known distributions. They're not necessarily following a normal distribution or any other known distribution. So bootstrapping is a trick, and actually, as I was putting together today's lecture, I [AUDIO OUT] tricks. Statisticians tend to have lots of tricks up their sleeves -- I'm just going to pull up the section in the notes, because I think you should have read this -- as to how bootstrapping works. And there are a lot of words here, but let's just come down to the [INAUDIBLE]. So what we do in bootstrapping is perform our initial Monte Carlo sample, just like we've been talking about. So take the inputs, draw N realizations from the input distributions, run them each through the model, produce the samples, the yi's, and compute the desired estimator. So this theta here might be a mean estimate, or think of it as a variance estimate -- one we can't necessarily easily do the confidence interval analysis on.
Then what bootstrapping says is, well, now you've got these N samples. And remember, each one of these samples is being run through your finite element model, run through your CAD model. Let's now resample those values. With N samples of y, N samples of the blade metal hot-side temperature sitting in the bin, let's just resample from those ones. And why is that efficient? Because we don't have to run the model anymore. We don't have to do the finite element analysis. And we're just going to resample from those with equal probability -- so probability 1 over N each -- and we do it with replacement, so you could pick the same sample more than once. So just to be clear, we had N samples. Then we're going to sample N times from those, but we're going to get a different set of N because, I mean, maybe we'll pick each one once, but more likely we're going to pick that one a couple of times and this one not at all. So we'll have a different sampling, we can compute the estimator, and we can keep doing that N minus 1 times.
In the end, we're going to end up with N values of the estimator. And from there you could try to determine confidence intervals. So it's kind of a trick, because it's kind of cheating. And it's not the same thing as running a new Monte Carlo, but hopefully you can see that this is something you can do with almost no more computational cost. And it actually turns out that if you do some analysis, this is a reasonable thing to do -- and it took a lot of statisticians to analyze it. So when people talk about bootstrapping in this context, this is what they're talking about: resampling from your samples, putting the information together in different ways, and then using that to get a sense of the distribution and say that they're [? untestable. ?] Is that clear?

So [AUDIO OUT] internet, so I didn't get time to go through it on Monday. But I mostly want us to spend today talking about different ways to reduce variance. And again, the really important idea that should be in your heads is that the quality of the estimators that come out depends on how many samples are in the Monte Carlo.
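The resampling loop described above can be sketched in a few lines. This is an illustrative sketch, with the data, sample sizes, and the choice of the variance estimator my own (the lecture leaves theta generic): resample the existing outputs with replacement, recompute the estimator each time, and read off the spread of the bootstrap replicates as a standard-error estimate -- with no further model runs.

```python
import math
import random

random.seed(2)

# Pretend these are the N outputs of an expensive model run
# (e.g. blade temperatures); here they are just normal draws.
N = 200
y = [random.gauss(0.0, 1.0) for _ in range(N)]

def sample_variance(data):
    """The estimator theta whose distribution we don't know in closed form."""
    m = sum(data) / len(data)
    return sum((v - m) ** 2 for v in data) / (len(data) - 1)

# Bootstrap: resample the existing outputs with replacement -- the
# finite element model is never called again.
B = 1000
replicates = [sample_variance(random.choices(y, k=N)) for _ in range(B)]

# The spread of the replicates estimates the estimator's standard error,
# and their percentiles give approximate confidence intervals.
mb = sum(replicates) / B
se = math.sqrt(sum((r - mb) ** 2 for r in replicates) / (B - 1))
print(se)
```

For normal data with unit variance and n = 200, the true standard error of the sample variance is about sqrt(2/199) ≈ 0.10, which the bootstrap estimate should roughly reproduce.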
And if you can afford millions of samples, you can probably get really good estimates of the mean and maybe OK estimates of the variance. But if, for example, you're interested in tail probabilities -- so think about an aircraft design, a new flight-critical system. Maybe there has to be a 10 to the minus 9 failure probability that we have to satisfy. So if we want to estimate that by Monte Carlo simulation, how many samples do we need? A lot. How many is a lot? At least how many? At least a billion. If the probability is 10 to the minus 9 and we had a billion samples, then on average you would expect one sample to be that failure. So that's not even going to be enough, because you might get 0, you might get 1, you might get 2. So that means you're going to estimate this probability differently from run to run. So billions, tens of billions, hundreds of billions of samples to estimate what are called [? variates, ?]
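The back-of-the-envelope count above can be written out. A small sketch (the sample sizes tried are my own): the expected number of observed failures is N times p, and the relative standard error of the plain Monte Carlo probability estimator is sqrt((1-p)/(pN)), so even a billion samples leaves roughly 100% relative error on a 10^-9 probability.

```python
import math

p = 1e-9  # target failure probability from the example
for N in (1e9, 1e11, 1e13):
    expected_failures = p * N
    # Relative standard error of the Monte Carlo estimator of p:
    # sqrt(p(1-p)/N) / p = sqrt((1-p)/(p*N))
    rel_se = math.sqrt((1.0 - p) / (p * N))
    print(N, expected_failures, rel_se)
```

At N = 10^9 the expected failure count is exactly 1 and the relative error is about 100%; only around 10^13 samples does it fall to about 1%.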
but even just in general, even just for means, if you're talking about running a CFD code or a finite element simulation that takes minutes to run, we usually can't even afford hundreds of thousands of simulations. So the variance reduction methods are trying to increase the accuracy of our Monte Carlo estimates for a given number of samples. And I think of them as kind of tricks -- and they're really clever tricks. I'll try to show you how, at least, importance sampling works. And you can think of it in two ways. One: if I've got a given number of samples -- I'm willing to run a million samples -- how much accuracy can I get in my estimators? Or you could think: if I want to achieve a given level of accuracy in my estimator, how many samples do I need, and can we reduce the number of samples? So there are a lot of different ways that people do variance reduction -- importance sampling, what's called antithetic sampling, control variates, stratified sampling.
And I had decided I was going to talk about control variates and importance sampling, but I think we'll just talk about importance sampling today, because it's already quite a lot to think about. So in general, importance sampling is a general technique for estimating properties of one distribution while only having samples generated from another, different distribution. So keep this in mind as we go through, because it's the key idea. We're going to have samples generated from one distribution, and we're going to figure out how to manipulate those samples so that they can estimate things about a different distribution for us. And it seems kind of like a wacky thing to do, but what you'll see is that if you pick this distribution carefully, we can say things about the statistics of the other distribution in a more efficient way with the Monte Carlo estimates. So let me get the screen up. And so we're at number 3, importance sampling. Not a very good piece of chalk. And so let's think about a random variable, x, that has pdf f sub x, and it has a mean that we'll call mu sub x, which is the expectation of x.
And I'm going to put a little x in the subscript on the expectation, just to denote that we're taking the expectation under the density f sub x. So what do I mean by that? If you want to think about an expectation as an integral -- and remember, Alex talked about this on Monday -- the expectation of x, mu sub x, is the integral of what? x times its pdf, dx. And so again, you can see where we're going with this. Remember, we talked about having a different distribution. We want to be clear that when we're talking about the expectation of the random variable x, we're talking about taking the expectation with respect to -- or under -- its pdf, f sub x. Is the notation clear? Yes? So we have this random variable x. We may be interested in estimating its mean, and we know that this is the definition of the mean. It's an integral of x weighted by the pdf of x. I didn't put limits here, but this is an integral over the support of x, which would be minus infinity to infinity if x were a normal random variable. If it were uniform, it would be an integral over the [a, b] range -- whatever the support of x is.
So now what we're going to do is choose a random variable that we'll call z. And we're going to ask z to be non-negative, and we're going to choose it subject to the constraint that the expectation of z, taken under the pdf -- still the pdf here -- f sub x, is equal to 1. And why are we writing this? Well, what does this notation say? It's the integral of z times f sub x of x, dx. That's the expectation of the random variable z under the pdf f sub x. So if this is equal to 1, maybe you can see that all we've done is really define another pdf, that we can write as f sub z. Yep. Because it's going to satisfy the properties of a pdf. It's going to integrate to 1. Is that OK?

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: z is going to be just another random variable. Yeah, so you can maybe just think of z as being defined on the same support if you want to. Or, I think this thing is allowed to be 0 in some places, but it's not allowed to be 0 everywhere. There are some conditions related to where z can be 0. Is it OK for the lecturer to ask you a question?

AUDIENCE: [INAUDIBLE].
KAREN WILLCOX: Yeah. So I think the easiest thing is to think about x as having infinite support, with the integrals going from minus infinity [INAUDIBLE]. So we haven't done anything yet. We've just manipulated things. So why are we doing this? Well, let's think about the mean of x, mu x, the thing here, which is, again, the expectation of x with respect to -- under -- its own pdf. And we said that this thing was the integral of x times its pdf -- weighted by the pdf. And so now what can we do? We can divide by z and multiply by z. Just multiply by 1. And now we recognize that z times f sub x is just what we're now calling f sub z, so this is the integral of x over z times f sub z, dx. So all we did was multiply and divide by z, and then say that we're going to group the z times the pdf of x together and call it f sub z. And we can do that because the constraint we imposed is the same as it integrating to 1. So what is this guy here? This guy here is the expectation of something else with respect to the pdf of z, the density of z.
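The board manipulation being described, written out (my reconstruction of the board work, using the lecture's notation):

```latex
\mu_x \;=\; \mathbb{E}_x[x]
      \;=\; \int x\, f_x(x)\, dx
      \;=\; \int \frac{x}{z}\, \underbrace{z\, f_x(x)}_{=\, f_z(x)}\, dx
      \;=\; \mathbb{E}_z\!\left[\frac{x}{z}\right],
\qquad\text{valid because }\;
\mathbb{E}_x[z] \;=\; \int z\, f_x(x)\, dx \;=\; 1 .
```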
What is it the expectation of? x over z. So this is the expectation of x over z, but now taken with respect to the pdf f sub z. All right. So does this kind of feel like we're just moving chairs around? Yeah. That's why I said they sort of seem a little bit like tricks. Is anybody uncomfortable with anything that we've done? It's OK? We've put a 1 over z in here, and we've changed the pdf -- the weighting in the integral -- to compensate, to make it equal the same thing. Yeah, Kevin?

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: The constraint we have here is that the expectation of z under f sub x is 1. Well, this is what's going to lead us to define the pdf f sub z. Yeah.

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: So why do we put it underneath and then bring it out? So let's look at what we have. We now have two different ways to compute the mean of x. That's what we're after.
We have [AUDIO OUT] what we normally think about, which is thinking about this mean as the expectation of x under its pdf. And now we have derived this alternate expression, which is the expectation of x over z, where the expectation is taken now with the pdf f sub z -- the z in the weighting. So this is what we've been talking about. How do we estimate the mean this way? Well, we sample x by drawing samples from the distribution f sub x. We define the pdf of the x input, we draw samples from it, and we estimate the mean to be the sample mean. And the variance of the corresponding estimator -- the estimator variance -- is what? What was in the numerator? Not mu x.

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: Yeah, it's sigma squared. So we'll call it the variance of x. And again, I'm going to put the subscript here, just to denote. So that's what we saw before.
Actually, we sampled inputs and we propagated them through to the outputs, but basically we're just sampling from this density, taking the sample mean as our estimator, and the estimator variance goes with the [INAUDIBLE]. So what we've done here is define a parallel path, an analogous path. So what do we do here? We're going to sample x over z, and we're going to sample it under, now, the distribution -- the density -- f sub z. And again, we take the samples, we take the sample mean. That would be our estimate. And the variance of that estimator -- the estimator variance -- is now going to be what? The variance of x over z, under f sub z, divided by N. And now maybe this is where you see why these tricks work: because putting the z in the denominator and bringing it out here didn't change what we're estimating. These two expressions are equal. The way I like to think about it is that, because of the way that variance behaves nonlinearly, pulling the z under and putting it out here in front does not mean that these two variances are going to be equal. And maybe you see it there.
What would drive me crazy when I was learning statistics is that things like means would behave in this kind of linear way that made things simple, but the variance never seemed to do the sensible thing. So basically, we're going to exploit that. So again, we can do this manipulation with this other variable, z, and not change the estimate -- still be getting after the thing we want. But if we choose z in the right way -- and we'll talk about how to do that -- we can make this smaller, which means, again, for a given number of samples, if this variance is smaller than that one, then our estimates are going to be tighter. Or, to achieve a given confidence in the estimator, we could take fewer samples here than doing it this way. Yeah. So we'll talk about sort of the general ideas, and then I'll show you exactly how you might come up with a z for probabilities. But the general ideas are, first of all, that some values of x in the Monte Carlo simulation are going to be more important for estimating the parameter.
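The two-path picture can be checked numerically. This is an illustrative sketch with all details my own, not from the lecture: the target is an Exponential(1) variable (true mean 1), and z is chosen so that z times f sub x is an Exponential(1/2) density f sub z. Both paths estimate the same mean, but the per-sample variances differ.

```python
import math
import random

random.seed(3)
N = 5000

# Path 1: sample x ~ f_x = Exp(1) directly; the estimator is the sample mean.
direct = [random.expovariate(1.0) for _ in range(N)]
mean_direct = sum(direct) / N

# Path 2: choose z(x) = f_z(x) / f_x(x) with f_z = Exp(1/2).
# Then x/z = x * f_x(x) / f_z(x) = x * exp(-x) / (0.5 * exp(-0.5*x)),
# and we average x/z over samples drawn from f_z.
def x_over_z(x):
    return x * math.exp(-x) / (0.5 * math.exp(-0.5 * x))

weighted = [x_over_z(random.expovariate(0.5)) for _ in range(N)]
mean_is = sum(weighted) / N

# Same target, different spread: compare the per-sample variances.
var_direct = sum((v - mean_direct) ** 2 for v in direct) / (N - 1)
var_is = sum((v - mean_is) ** 2 for v in weighted) / (N - 1)
print(mean_direct, mean_is)   # both estimate mu_x = 1
print(var_direct, var_is)     # the reweighted path has the smaller variance here
```

With this particular choice of f sub z, the per-sample variance drops from 1 to about 0.19, so for the same N the second path gives a noticeably tighter estimate of the same mean.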
The mean is not a very good example, but you can imagine, if we're talking about tail probabilities, we want to get more samples around that failure region. What goes on over here we don't really necessarily care about, because it's like: doesn't fail, doesn't fail, doesn't fail. So the idea is that some values of x are going to be more important, and what we want to do when we choose the z here is to somehow emphasize those more important values. And so really, what it's going to come down to is how we choose z, again, so that this estimator variance is going to be less than this guy. And we will talk about that for low probabilities. So, any questions? I didn't ask you at the beginning of class -- did Professor [? Nguyen ?] talk about importance sampling in 16.09? No? Just as well. All right. So let's talk about -- this is 3.2 -- importance sampling for probability estimation. And hopefully, maybe, it will make this a little bit more concrete for you. So what did we already see?
We saw that the probability of A, where A is some event that we're interested in, is estimated by a Monte Carlo -- or can be estimated by a Monte Carlo -- with an estimator that we'll call P hat of A. So let's think about the case where, again, we're interested in the probability of failure, so A is the event where, say, y is greater than y critical. So y is some output of interest -- it's the temperature in the blades -- and we're interested in looking at when that temperature exceeds some critical value. So this shows up often, if we are looking at a probability of failure. So we're going to define an indicator function. Usually when you're working with probabilities you're taught to define an indicator function -- you guys should have seen those in 16.09, yes? In 6.041. So the indicator function, I sub A of i -- this is going to correspond to the i-th sample in our Monte Carlo. So we're drawing a sample, we're running it through our finite element code, and we're looking to see whether event A happens.
And the indicator function says: if event A happens, then it takes a value of 1; if not, it takes a value of 0. So this is going to be 1 if the corresponding sample of y, the yi, exceeds y critical; and if not, then we set the indicator function to be 0. So the indicator function is a 0-or-1 result. Failed; didn't fail. A happened; A didn't happen.

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: That's the range -- the indicator is either 0 or 1.

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: Oh, little i. Sorry. Yeah, that's right. Little i runs from 1 up to N. Sorry. So if we just think about these things in this way, then the reason we do this is because then our estimator, P hat of A -- what was the estimator for the probability? What was it? Yep. So the number of samples where A occurred, divided by the total number. So what does that look like? What does that look like in terms of the indicator function? I'll put the denominator in there. What's the numerator? Yeah.
Just the sum of the I sub A's. So it's 1 over N times the sum, i equals 1 to capital N, of the I sub A's. In other words, it's the sample mean of the indicator function. And so we can think of the estimator this way, and we've already seen this. We saw that the expected value of this estimator for the probability was equal to what? The probability of A -- unbiased. This would be a big piece. And do you remember the expression for the variance of this? Does anyone remember it? Here, it's P of A times 1 minus P of A. And what's always in the denominator? N. Right. The standard error of the estimator is the square root of that thing. All right. So that's what we saw before. So now what we're going to do is think about how importance sampling can help us come up with an estimator for P of A that's got smaller variance. Do you remember when we did that analysis? We also did an analysis where, if P is really small and you want to have a tolerance that's relative to the size of P, you end up needing billions, millions of samples.
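The estimator just derived, in code form (a sketch; the event, the standard normal output, and the sample size are my own choices): P hat of A is the sample mean of the indicator function, and its standard error is sqrt(P(1-P)/N).

```python
import math
import random

random.seed(4)
N = 20000

# Event A: the output exceeds a critical value y_crit.
# Stand-in "model output": a standard normal draw.
y_crit = 1.0
indicators = [1 if random.gauss(0.0, 1.0) > y_crit else 0 for _ in range(N)]

# P_hat(A) = sample mean of the indicator function.
p_hat = sum(indicators) / N

# Standard error sqrt(P(1-P)/N), using p_hat in place of the true P.
se = math.sqrt(p_hat * (1.0 - p_hat) / N)
print(p_hat, se)  # true value of P(A) here is about 0.1587
```

With N = 20000 the standard error is around 0.0026, so the estimate typically lands within a percent or so of the true probability -- comfortable for a probability of order 0.1, hopeless for one of order 10^-9.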
458 00:27:03,132 --> 00:27:06,080 So we're going to see whether importance sampling 459 00:27:06,080 --> 00:27:11,610 or see how importance sampling can help us. 460 00:27:11,610 --> 00:27:13,890 Again, I'm going to erase this, but I 461 00:27:13,890 --> 00:27:16,500 find it helpful to think about these two paths. 462 00:27:16,500 --> 00:27:19,590 We could sample, take lots of samples 463 00:27:19,590 --> 00:27:22,710 and we're talking about the mean here, 464 00:27:22,710 --> 00:27:25,500 use our regular estimator. 465 00:27:25,500 --> 00:27:27,390 Or we could think about introducing 466 00:27:27,390 --> 00:27:30,890 another random variable with another pdf 467 00:27:30,890 --> 00:27:38,060 and changing what we sample from and then weighting the integral 468 00:27:38,060 --> 00:27:41,005 to account for the fact that we sampled from a different pdf 469 00:27:41,005 --> 00:27:43,415 and coming out with an estimate of the same thing 470 00:27:43,415 --> 00:27:46,760 but changing the variance. 471 00:27:46,760 --> 00:27:48,910 So let's see how that would work out. 472 00:27:58,046 --> 00:28:02,552 So we'll go straight to a pdf. 473 00:28:02,552 --> 00:28:05,910 So we'll introduce another pdf, and it's going to be called w. 474 00:28:08,755 --> 00:28:10,130 And it's a function of x, so it's 475 00:28:10,130 --> 00:28:17,205 defined on the same support as x, whatever x is. 476 00:28:17,205 --> 00:28:19,680 What are we doing? 477 00:28:19,680 --> 00:28:21,001 Yeah, we have [INAUDIBLE]. 478 00:28:21,001 --> 00:28:22,384 AUDIENCE: [INAUDIBLE] 479 00:28:22,384 --> 00:28:24,120 KAREN WILLCOX: Ah, fourth. 480 00:28:24,120 --> 00:28:25,578 AUDIENCE: [? Don ?] Ulrich just wanted to say hello. 481 00:28:25,578 --> 00:28:26,550 KAREN WILLCOX: Hi, Don. 482 00:28:26,550 --> 00:28:27,522 AUDIENCE: How are you doing? 483 00:28:27,522 --> 00:28:28,890 You've got chalk all over your face. 484 00:28:28,890 --> 00:28:29,806 KAREN WILLCOX: I have. 
485 00:28:29,806 --> 00:28:31,817 You guys know who this is, right? 486 00:28:31,817 --> 00:28:33,400 [? Don ?] Ulrich, Satellite astronaut. 487 00:28:33,400 --> 00:28:35,940 AUDIENCE: I'm one of the lucky guys who took a fall in space. 488 00:28:35,940 --> 00:28:36,940 KAREN WILLCOX: They're excited to be learning all 489 00:28:36,940 --> 00:28:38,106 about computational methods. 490 00:28:38,106 --> 00:28:40,610 And today we're talking about estimating low failure 491 00:28:40,610 --> 00:28:44,020 probabilities, so things like 10 to the minus 9, 492 00:28:44,020 --> 00:28:45,395 things that can go wrong. 493 00:28:45,395 --> 00:28:46,728 Do you know anything about that? 494 00:28:46,728 --> 00:28:49,170 AUDIENCE: That's 10 to the minus 9. 495 00:28:49,170 --> 00:28:51,625 Well, you don't need to worry about that. 496 00:28:51,625 --> 00:28:53,500 KAREN WILLCOX: I'll be at the dinner tonight. 497 00:28:53,500 --> 00:28:54,458 So I'll pick up a beer. 498 00:28:54,458 --> 00:28:55,260 Thanks. 499 00:28:55,260 --> 00:28:56,395 Where is the talk? 500 00:28:56,395 --> 00:28:59,370 AUDIENCE: Marlar Lounge. 501 00:28:59,370 --> 00:29:00,870 Second floor. 502 00:29:00,870 --> 00:29:02,550 4:00. 503 00:29:02,550 --> 00:29:05,596 KAREN WILLCOX: Or you could come to office hours. 504 00:29:05,596 --> 00:29:08,131 AUDIENCE: And make sure you learn this stuff because it's 505 00:29:08,131 --> 00:29:08,630 important. 506 00:29:15,104 --> 00:29:17,020 KAREN WILLCOX: I don't know if you guys know, but I 507 00:29:17,020 --> 00:29:20,000 made it to the finals at the astronaut selection last year, 508 00:29:20,000 --> 00:29:23,448 but didn't get picked, so, Don was, yeah. 509 00:29:29,420 --> 00:29:29,920 All right. 510 00:29:29,920 --> 00:29:31,210 So we just introduced a bit-- I'd 511 00:29:31,210 --> 00:29:32,285 much rather teach that stuff. 512 00:29:32,285 --> 00:29:34,284 It's more exciting than going into space, right? 513 00:29:34,284 --> 00:29:35,236 AUDIENCE: [INAUDIBLE].
514 00:29:42,964 --> 00:29:46,840 KAREN WILLCOX: So we've introduced the pdf w of x, 515 00:29:46,840 --> 00:29:48,740 and I want you to think of this as being 516 00:29:48,740 --> 00:29:54,720 like an alternative pdf for x. 517 00:29:54,720 --> 00:29:57,370 And x is just generic at this point. 518 00:29:57,370 --> 00:30:05,190 This thing is called the biasing density. 519 00:30:09,190 --> 00:30:11,320 And you can see what we're going to do 520 00:30:11,320 --> 00:30:23,070 is we're going to choose w of x so that this event, A, occurs 521 00:30:23,070 --> 00:30:23,780 more frequently. 522 00:30:30,944 --> 00:30:32,450 So the idea is we don't want to have 523 00:30:32,450 --> 00:30:35,030 to wait around for a billion samples for the one event 524 00:30:35,030 --> 00:30:36,060 to occur. 525 00:30:36,060 --> 00:30:40,170 We're going to choose this alternate pdf w 526 00:30:40,170 --> 00:30:44,600 in such a way that this event, A, occurs more frequently. 527 00:30:44,600 --> 00:30:48,300 So then what does our estimate of the probability look like? 528 00:30:48,300 --> 00:30:59,910 Probability of A-- maybe you can see it directly 529 00:30:59,910 --> 00:31:00,840 from this expression. 530 00:31:00,840 --> 00:31:03,540 We could write it as being the expectation of the indicator 531 00:31:03,540 --> 00:31:04,040 function. 532 00:31:06,662 --> 00:31:12,596 And it's the expectation taken under f of x. 533 00:31:12,596 --> 00:31:15,540 So it's the expectation of the indicator 534 00:31:15,540 --> 00:31:23,250 function, with the expectation taken with respect to f of x. 535 00:31:23,250 --> 00:31:33,270 Maybe up here I should define that f of x is 536 00:31:33,270 --> 00:31:34,286 the pdf of x. 537 00:31:38,020 --> 00:31:38,970 And what is this? 538 00:31:38,970 --> 00:31:45,030 This is nothing but the integral of the indicator function 539 00:31:45,030 --> 00:31:47,465 weighted by the pdf and then integrated 540 00:31:47,465 --> 00:31:49,520 over the support of x.
541 00:31:49,520 --> 00:31:53,100 So we're going to do the same trick that we did before. 542 00:31:53,100 --> 00:31:57,880 We're going to multiply and divide by this w. 543 00:31:57,880 --> 00:32:00,710 So we've got our f of x. 544 00:32:00,710 --> 00:32:02,636 We're going to divide by w. 545 00:32:02,636 --> 00:32:04,630 We're going to multiply by w. 546 00:32:09,085 --> 00:32:15,220 And maybe now you can see that what we have now, 547 00:32:15,220 --> 00:32:22,480 we have an expectation of something. We can think 548 00:32:22,480 --> 00:32:26,220 about lumping this thing together, now taken 549 00:32:26,220 --> 00:32:29,220 with respect to the pdf w. 550 00:32:29,220 --> 00:32:33,010 So this is just the integral of something times the pdf 551 00:32:33,010 --> 00:32:36,032 integrated over the support. 552 00:32:36,032 --> 00:32:39,770 So we have to write it up here. 553 00:32:45,912 --> 00:32:47,495 So the last line in that development-- 554 00:32:47,495 --> 00:32:51,175 the left-hand side is P of A, which we haven't discretized 555 00:32:51,175 --> 00:32:52,050 with Monte Carlo yet. 556 00:32:52,050 --> 00:32:56,370 We're still doing things exactly-- 557 00:32:56,370 --> 00:33:04,980 is the expectation, with respect to or under this pdf w, of-- now, 558 00:33:04,980 --> 00:33:15,010 the indicator function with this weighting, f of x over w of x. 559 00:33:20,290 --> 00:33:24,920 So what does this mean? 560 00:33:24,920 --> 00:33:35,660 This means we could draw samples from w instead of from f. 561 00:33:35,660 --> 00:33:42,590 So draw samples from the pdf w. 562 00:33:42,590 --> 00:33:44,100 But if you do that, in order to get 563 00:33:44,100 --> 00:33:46,530 the right estimate for the probability, 564 00:33:46,530 --> 00:33:52,010 you have to weight the samples by the f over w 565 00:33:52,010 --> 00:33:53,900 to make up for the fact that you drew 566 00:33:53,900 --> 00:33:56,546 from a different distribution. 567 00:33:56,546 --> 00:33:58,360 Does that make sense?
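Written out as one chain, the multiply-and-divide trick on the board is (same symbols; valid wherever w of x is nonzero on the support of f):

```latex
P(A) \;=\; \mathbb{E}_f\!\left[I_A(x)\right]
      \;=\; \int I_A(x)\, f(x)\, dx
      \;=\; \int I_A(x)\,\frac{f(x)}{w(x)}\, w(x)\, dx
      \;=\; \mathbb{E}_w\!\left[I_A(x)\,\frac{f(x)}{w(x)}\right]
```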
568 00:33:58,360 --> 00:34:05,440 So weight by f of x over w of x, I'd 569 00:34:05,440 --> 00:34:09,010 say to counteract it-- that's sort of my informal way of thinking 570 00:34:09,010 --> 00:34:09,949 about it. 571 00:34:09,949 --> 00:34:13,530 And if you do that, then the expectation works out 572 00:34:13,530 --> 00:34:19,760 so that you are still getting the probability you're after. 573 00:34:19,760 --> 00:34:23,090 So then the last step is just to write down what our Monte Carlo 574 00:34:23,090 --> 00:34:30,830 estimator would be-- so then our Monte Carlo importance sampling 575 00:34:30,830 --> 00:34:41,690 estimator for this probability of A is what? 576 00:34:41,690 --> 00:34:44,332 So let's call it P hat. 577 00:34:44,332 --> 00:34:46,630 We'll put an IS down here to indicate 578 00:34:46,630 --> 00:34:48,800 that's the importance sampling estimator. 579 00:34:48,800 --> 00:34:49,662 So help me write it. 580 00:34:49,662 --> 00:34:50,620 What does it look like? 581 00:34:53,570 --> 00:34:54,540 Take this first part. 582 00:34:54,540 --> 00:34:55,415 That's the easy part. 583 00:34:58,730 --> 00:35:02,290 One over N-- is it Libby or Casey? 584 00:35:02,290 --> 00:35:04,408 Oh, it's [INAUDIBLE]. 585 00:35:04,408 --> 00:35:05,080 All right. 586 00:35:05,080 --> 00:35:07,690 Now, one of you guys has to do the harder part. 587 00:35:07,690 --> 00:35:09,487 1 over N what? 588 00:35:14,480 --> 00:35:17,980 It's like the regular Monte Carlo estimator. 589 00:35:17,980 --> 00:35:23,080 Summation from i equals 1 to N of what? 590 00:35:25,984 --> 00:35:37,960 iA times f of xi over w of xi. 591 00:35:37,960 --> 00:35:41,210 And it doesn't show up in the formula, 592 00:35:41,210 --> 00:35:47,230 but it's important to know that this is all when the xi's 593 00:35:47,230 --> 00:35:52,490 are drawn from the pdf w. 594 00:35:58,620 --> 00:35:59,387 That's that.
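A minimal sketch of this estimator, under fully assumed choices that are illustrative rather than from the lecture: f is a standard normal, A is the tail event "x > 4," and the biasing pdf w is a normal centered at the failure boundary.

```python
import numpy as np

rng = np.random.default_rng(1)

def normal_pdf(x, mu, sigma):
    # Gaussian density, used here for both f and the biasing pdf w.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

N = 20_000
x_crit = 4.0                  # under f = N(0, 1), P(x > 4) is about 3.17e-5

# Draw the xi's from the biasing pdf w = N(x_crit, 1), so event A
# occurs roughly half the time instead of ~once per 30,000 samples.
x = rng.normal(loc=x_crit, scale=1.0, size=N)
I_A = (x > x_crit).astype(float)

# Importance-sampling estimator: (1/N) * sum of I_A(xi) * f(xi) / w(xi).
weights = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, x_crit, 1.0)
P_hat_IS = np.mean(I_A * weights)

print(P_hat_IS)
```

Each sample landing in the failure region contributes not 1 but 1 times the small weight f(xi)/w(xi), which is the recalibration for having drawn from w instead of f.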
595 00:36:01,930 --> 00:36:04,020 We never noted that before because we only ever 596 00:36:04,020 --> 00:36:07,030 had one pdf floating around, but now we've got two. 597 00:36:07,030 --> 00:36:09,150 So here's the estimator. 598 00:36:09,150 --> 00:36:15,580 If we drew the samples from f of x, 599 00:36:15,580 --> 00:36:17,612 which is the regular way of doing it, 600 00:36:17,612 --> 00:36:20,210 that would be the estimator, but if we draw the samples 601 00:36:20,210 --> 00:36:22,405 from the pdf w, then this is our estimator, 602 00:36:22,405 --> 00:36:25,263 and basically we just have to reweight it. 603 00:36:28,650 --> 00:36:29,387 Is that clear? 604 00:36:32,660 --> 00:36:34,398 So can you see-- yeah. 605 00:36:34,398 --> 00:36:35,396 AUDIENCE: [INAUDIBLE]. 606 00:36:42,881 --> 00:36:43,751 KAREN WILLCOX: Here? 607 00:36:43,751 --> 00:36:44,250 Yeah. 608 00:36:44,250 --> 00:36:46,930 We divided by w, and we multiplied by w. 609 00:36:46,930 --> 00:36:47,930 AUDIENCE: [INAUDIBLE]. 610 00:36:54,432 --> 00:36:57,550 KAREN WILLCOX: Well, yeah. 611 00:36:57,550 --> 00:36:59,514 w is a pdf. 612 00:36:59,514 --> 00:37:00,930 Maybe I could have written it like 613 00:37:00,930 --> 00:37:05,120 if w might have been-- yeah. 614 00:37:05,120 --> 00:37:08,100 Sorry. w is a pdf. 615 00:37:08,100 --> 00:37:08,600 Yeah. 616 00:37:08,600 --> 00:37:09,600 AUDIENCE: [INAUDIBLE]. 617 00:37:13,865 --> 00:37:15,740 KAREN WILLCOX: This one is strange, actually. 618 00:37:15,740 --> 00:37:17,320 We approached it here by starting off 619 00:37:17,320 --> 00:37:19,500 by saying w is a pdf. 620 00:37:19,500 --> 00:37:21,182 So w itself is a pdf. 621 00:37:21,182 --> 00:37:23,140 When I introduced general importance sampling, 622 00:37:23,140 --> 00:37:25,327 we started off with a random variable 623 00:37:25,327 --> 00:37:27,660 and said that's why we put that constraint on, so that we 624 00:37:27,660 --> 00:37:28,509 could get to a pdf.
625 00:37:28,509 --> 00:37:30,800 But here we just say let's introduce a secondary pdf 626 00:37:30,800 --> 00:37:34,570 or an accelerated pdf. 627 00:37:34,570 --> 00:37:37,040 So w is a pdf, which means it has to satisfy 628 00:37:37,040 --> 00:37:38,980 the properties of a pdf. 629 00:37:41,960 --> 00:37:43,360 Greg, did you have some-- 630 00:37:43,360 --> 00:37:44,360 AUDIENCE: [INAUDIBLE]. 631 00:37:49,064 --> 00:37:50,352 KAREN WILLCOX: This guy? 632 00:37:50,352 --> 00:37:51,276 AUDIENCE: [INAUDIBLE]. 633 00:37:54,594 --> 00:37:55,760 KAREN WILLCOX: That's right. 634 00:37:55,760 --> 00:37:56,736 AUDIENCE: [INAUDIBLE]. 635 00:38:03,477 --> 00:38:04,310 KAREN WILLCOX: Yeah. 636 00:38:04,310 --> 00:38:08,750 So this would be given to you. 637 00:38:08,750 --> 00:38:11,860 So one thing that's just a little bit confusing, and maybe 638 00:38:11,860 --> 00:38:15,770 I haven't been entirely clear, so I'm going to put it over here, 639 00:38:15,770 --> 00:38:18,610 is that when we think about Monte Carlo, 640 00:38:18,610 --> 00:38:21,530 I'm going to use these completely different variables so that I 641 00:38:21,530 --> 00:38:24,990 don't mix up with w's and f's. 642 00:38:24,990 --> 00:38:29,780 The model has, say, an input that might be d 643 00:38:29,780 --> 00:38:33,940 and an output that might be q. 644 00:38:33,940 --> 00:38:37,440 And we talk about drawing from some distribution of d. 645 00:38:37,440 --> 00:38:42,000 Say d is normal, propagating it through and getting 646 00:38:42,000 --> 00:38:45,172 the corresponding q, whatever q looks like. 647 00:38:45,172 --> 00:38:48,750 So here when I talk about estimating the probability of A 648 00:38:48,750 --> 00:38:52,900 by drawing from x-- x is really q. 649 00:38:52,900 --> 00:38:58,736 Because this is the pdf that defines if A is out in here, 650 00:38:58,736 --> 00:39:00,360 we want to draw from this distribution.
651 00:39:00,360 --> 00:39:02,701 I mean, if I gave you just this distribution, q, 652 00:39:02,701 --> 00:39:04,450 and I asked you to estimate a probability, 653 00:39:04,450 --> 00:39:07,220 you would randomly sample from it 654 00:39:07,220 --> 00:39:09,490 and then you would count the number on the tail. 655 00:39:09,490 --> 00:39:11,370 So there is kind of an extra step, which 656 00:39:11,370 --> 00:39:14,580 is that we're really drawing from the d and pushing it 657 00:39:14,580 --> 00:39:18,450 through the model to characterize q. 658 00:39:18,450 --> 00:39:20,186 Is that clear enough? 659 00:39:20,186 --> 00:39:22,144 AUDIENCE: The N there, is that all the samples, 660 00:39:22,144 --> 00:39:26,016 or is that just the [? failed ?] proportion? 661 00:39:26,016 --> 00:39:27,420 KAREN WILLCOX: For 10 points? 662 00:39:27,420 --> 00:39:28,110 Oh, the N? 663 00:39:28,110 --> 00:39:29,814 That's going to be all the samples. 664 00:39:29,814 --> 00:39:30,314 Yeah. 665 00:39:30,314 --> 00:39:31,268 AUDIENCE: [INAUDIBLE]. 666 00:39:35,057 --> 00:39:35,890 KAREN WILLCOX: Yeah. 667 00:39:35,890 --> 00:39:38,990 So let me try to write a summary and draw 668 00:39:38,990 --> 00:39:45,857 some pictures to help you, because I don't 669 00:39:45,857 --> 00:39:46,940 want to mix up two things. 670 00:39:46,940 --> 00:39:50,525 But what we would do here is we would sample randomly, 671 00:39:50,525 --> 00:39:54,206 and more samples are going to end up in here than in here. 672 00:39:54,206 --> 00:39:55,580 And every time we sample here, we 673 00:39:55,580 --> 00:39:58,600 generate a corresponding sample of this. 674 00:39:58,600 --> 00:40:00,600 So when you think about the importance sampling, 675 00:40:00,600 --> 00:40:03,190 just don't really think about this part of it. 676 00:40:03,190 --> 00:40:05,670 Think about it as we're sampling from this.
677 00:40:05,670 --> 00:40:07,920 But it turns out we're sampling from this by sampling 678 00:40:07,920 --> 00:40:09,520 from this and going forward. 679 00:40:09,520 --> 00:40:12,220 But we're still sampling from this. 680 00:40:12,220 --> 00:40:15,909 So when we do that and we put samples, most of them 681 00:40:15,909 --> 00:40:17,700 are going to end up over here, and then one 682 00:40:17,700 --> 00:40:21,660 in a billion times, we get one out here on average. 683 00:40:21,660 --> 00:40:22,790 Yeah. 684 00:40:22,790 --> 00:40:26,809 So what we're saying is we could define 685 00:40:26,809 --> 00:40:27,850 a different distribution. 686 00:40:27,850 --> 00:40:30,470 We're jumping here a little bit, but let's just 687 00:40:30,470 --> 00:40:34,190 conceptually say that distribution looks like this. 688 00:40:34,190 --> 00:40:37,749 And instead of sampling from this guy-- which by the way 689 00:40:37,749 --> 00:40:39,540 we do by sampling here and pushing forward, 690 00:40:39,540 --> 00:40:43,460 but that's not really relevant-- we sample from this guy, 691 00:40:43,460 --> 00:40:47,030 so that now when I sample from here, instead of 1 692 00:40:47,030 --> 00:40:51,970 in a billion, maybe 1 in 10 or maybe half of them 693 00:40:51,970 --> 00:40:55,930 actually fall in the region I intended for them. 694 00:40:55,930 --> 00:40:56,690 Yep. 695 00:40:56,690 --> 00:40:59,790 So now I'm going to have N samples from this guy. 696 00:40:59,790 --> 00:41:01,640 But now when I compute the estimate 697 00:41:01,640 --> 00:41:04,730 of the probability of A, I can't just 698 00:41:04,730 --> 00:41:10,080 take the fraction that are here divided by the total number. 699 00:41:10,080 --> 00:41:12,500 Clearly that would be the wrong estimate. 700 00:41:12,500 --> 00:41:16,960 But if I take each sample that falls
in here, 701 00:41:16,960 --> 00:41:22,630 the 1's, and multiply them by the ratio of this guy 702 00:41:22,630 --> 00:41:27,930 to this guy, that's going to recalibrate 703 00:41:27,930 --> 00:41:29,180 that to be the right estimate. 704 00:41:29,180 --> 00:41:30,805 And you can kind of see it graphically, 705 00:41:30,805 --> 00:41:34,390 because what happens is you get a sample here. 706 00:41:34,390 --> 00:41:38,807 The f of x is basically 0, fairly small. 707 00:41:38,807 --> 00:41:40,120 This guy is big. 708 00:41:40,120 --> 00:41:45,474 That ratio is going to be really small. 709 00:41:45,474 --> 00:41:49,420 So even though it's a 1, it gets weighted by a 10 710 00:41:49,420 --> 00:41:54,480 to the minus 9 or whatever it is, or 10 to the minus 10. 711 00:41:54,480 --> 00:41:58,400 So you can see how it works, and it all works out. 712 00:41:58,400 --> 00:42:01,250 Draw from this guy, get more samples in here, 713 00:42:01,250 --> 00:42:03,670 but then each sample doesn't get a 1. 714 00:42:03,670 --> 00:42:06,950 It gets a 1 times the ratio of this to this. 715 00:42:06,950 --> 00:42:09,110 And by doing that, we'll be weighting it 716 00:42:09,110 --> 00:42:12,454 and actually estimating the right probability. 717 00:42:12,454 --> 00:42:13,870 But then the next question, that's 718 00:42:13,870 --> 00:42:15,286 kind of the next thing we're going 719 00:42:15,286 --> 00:42:19,437 to talk about is, how do you figure out what a good biasing 720 00:42:19,437 --> 00:42:20,103 distribution is? 721 00:42:20,103 --> 00:42:22,290 How do you come up with the distribution?
722 00:42:22,290 --> 00:42:24,240 And it would be easier if we didn't 723 00:42:24,240 --> 00:42:27,370 have this part of the problem, but because we 724 00:42:27,370 --> 00:42:30,310 are generating the samples from the inputs 725 00:42:30,310 --> 00:42:32,650 and pushing them through the model, 726 00:42:32,650 --> 00:42:35,680 we don't necessarily know how to bias the distribution 727 00:42:35,680 --> 00:42:39,260 here so that we get a good biasing distribution here. 728 00:42:39,260 --> 00:42:42,940 And that's like saying, do you know what combination of inputs 729 00:42:42,940 --> 00:42:45,960 makes your aircraft most vulnerable to failure? 730 00:42:45,960 --> 00:42:49,030 And sometimes you do, and sometimes you don't. 731 00:42:49,030 --> 00:42:51,359 So again, that's kind of the separate part. 732 00:42:51,359 --> 00:42:53,400 That part of it is a little bit difficult to describe. 733 00:42:55,920 --> 00:42:56,420 Yeah. 734 00:42:56,420 --> 00:42:59,920 AUDIENCE: So you could take like a Gaussian variance 735 00:42:59,920 --> 00:43:01,894 with a [? unit ?] and plot that over there? 736 00:43:01,894 --> 00:43:02,870 KAREN WILLCOX: Yeah. 737 00:43:02,870 --> 00:43:03,754 You could. 738 00:43:03,754 --> 00:43:05,670 AUDIENCE: Would that be like [? an ?] example? 739 00:43:05,670 --> 00:43:06,764 KAREN WILLCOX: Yeah. 740 00:43:06,764 --> 00:43:09,200 So really simple things that people do 741 00:43:09,200 --> 00:43:12,050 are that they-- and we'll look at this-- they scale this thing 742 00:43:12,050 --> 00:43:13,400 to shift more mass. 743 00:43:13,400 --> 00:43:16,715 Well, what I drew here is a little bit extreme. 744 00:43:16,715 --> 00:43:18,090 They scale it to shift more mass, 745 00:43:18,090 --> 00:43:22,240 and they basically make this thing have fatter tails. 746 00:43:22,240 --> 00:43:24,320 And sometimes they actually just shift it.
747 00:43:24,320 --> 00:43:26,170 They'll take whatever the distribution is and just shift 748 00:43:26,170 --> 00:43:26,836 it and scale it. 749 00:43:28,787 --> 00:43:30,620 And then there are more sophisticated things 750 00:43:30,620 --> 00:43:33,440 you can do, but, in fact, this is somewhat of an open question: 751 00:43:33,440 --> 00:43:38,370 how do you define a good biasing 752 00:43:38,370 --> 00:43:44,323 distribution to make your sampling really efficient? 753 00:43:44,323 --> 00:43:48,251 AUDIENCE: I would think that it would [INAUDIBLE]. 754 00:43:56,107 --> 00:43:58,080 KAREN WILLCOX: That's right. 755 00:43:58,080 --> 00:44:01,649 And so usually in the cases where 756 00:44:01,649 --> 00:44:03,690 it's easy to come up with a biasing distribution, 757 00:44:03,690 --> 00:44:06,970 you kind of know what conditions induce failure, and where. 758 00:44:06,970 --> 00:44:09,850 And so there's a sort of obvious place to look. 759 00:44:09,850 --> 00:44:14,025 The really difficult problems are when there's tens, 760 00:44:14,025 --> 00:44:16,710 hundreds, thousands of inputs here. 761 00:44:16,710 --> 00:44:18,630 We don't even know which combinations of inputs 762 00:44:18,630 --> 00:44:21,624 are the ones that make you most vulnerable to failure. 763 00:44:21,624 --> 00:44:23,040 And the only way to find out would 764 00:44:23,040 --> 00:44:24,710 be to sample them all, which we already 765 00:44:24,710 --> 00:44:26,356 said we don't want to do. 766 00:44:26,356 --> 00:44:31,420 So aircraft design is a good one, also 767 00:44:31,420 --> 00:44:34,400 people who simulate earthquakes and all these kinds of things, 768 00:44:34,400 --> 00:44:40,380 I mean, rare events, extreme weather, that kind of stuff. 769 00:44:40,380 --> 00:44:45,340 It's not even clear sometimes that you can simulate and get 770 00:44:45,340 --> 00:44:51,830 good estimates for some of these events. 771 00:44:51,830 --> 00:44:52,780 AUDIENCE: [INAUDIBLE].
772 00:44:56,127 --> 00:44:56,960 KAREN WILLCOX: Yeah. 773 00:44:56,960 --> 00:45:00,380 In some places, it's really hard for us 774 00:45:00,380 --> 00:45:02,390 to characterize what q is out here. 775 00:45:02,390 --> 00:45:04,139 The financial markets are another one. 776 00:45:04,139 --> 00:45:06,680 I don't know if you guys know The Black Swan book; it's 777 00:45:06,680 --> 00:45:10,510 all about fat tails and how models 778 00:45:10,510 --> 00:45:14,625 don't account for things that are like way out here, 779 00:45:14,625 --> 00:45:16,583 that are really unlikely, but when they happen, 780 00:45:16,583 --> 00:45:18,614 they have a really big impact. 781 00:45:18,614 --> 00:45:22,960 So if we knew what was going on here, 782 00:45:22,960 --> 00:45:24,936 we would already know the probabilities. 783 00:45:24,936 --> 00:45:26,310 And so the trick is that we don't 784 00:45:26,310 --> 00:45:28,220 know what's going on here, but we 785 00:45:28,220 --> 00:45:30,640 want to learn about what's going on here by finding 786 00:45:30,640 --> 00:45:31,704 better ways to sample. 787 00:45:31,704 --> 00:45:33,120 But to find better ways to sample, 788 00:45:33,120 --> 00:45:35,835 we need to know what we're looking for. 789 00:45:35,835 --> 00:45:37,210 So then there are things, I mean, 790 00:45:37,210 --> 00:45:38,950 this still always happens, and this 791 00:45:38,950 --> 00:45:40,580 is what keeps us busy in research: 792 00:45:40,580 --> 00:45:42,575 that's where you use adaptivity. 793 00:45:42,575 --> 00:45:46,480 You learn a little bit and then sample. 794 00:45:46,480 --> 00:45:55,530 So let's just quickly summarize the steps 795 00:45:55,530 --> 00:45:56,880 for importance sampling. 796 00:45:56,880 --> 00:46:02,990 So we're going to define a w of x, and however we come up with it, 797 00:46:02,990 --> 00:46:11,158 we're going to draw samples of x from that pdf. 798 00:46:14,650 --> 00:46:17,063 So I'll just note here that this is a pdf.
799 00:46:17,063 --> 00:46:21,870 If you prefer to call it f sub w, that's fine. 800 00:46:21,870 --> 00:46:26,377 And again, the idea is we want to get more samples 801 00:46:26,377 --> 00:46:27,460 in the region of interest. 802 00:46:34,420 --> 00:46:43,110 And then we're going to estimate this probability, 803 00:46:43,110 --> 00:46:49,530 but we're going to do it using this thing here 804 00:46:49,530 --> 00:46:54,460 that we defined, the P hat IS, where 805 00:46:54,460 --> 00:47:00,310 we have to do the weighting to account for the fact 806 00:47:00,310 --> 00:47:02,150 that we drew from the wrong distribution. 807 00:47:05,895 --> 00:47:19,840 So importance weights: f of x divided by w of x, 808 00:47:19,840 --> 00:47:20,925 at the sample point. 809 00:47:23,552 --> 00:47:25,010 And one thing I should say actually 810 00:47:25,010 --> 00:47:27,676 is if you don't choose a good w, you can actually make it worse. 811 00:47:27,676 --> 00:47:35,110 You could make your Monte Carlo convergence worse. 812 00:47:35,110 --> 00:47:38,270 So actually, I think there's one final result to write down, 813 00:47:38,270 --> 00:47:48,310 which is, what are the properties of this P hat IS? 814 00:47:48,310 --> 00:47:56,850 They are that the expected value of our importance sampling 815 00:47:56,850 --> 00:48:01,250 estimator for the probability of A. What do you think it is? 816 00:48:08,680 --> 00:48:11,510 Hopefully, it's P of A. It is, indeed, P of A. 817 00:48:11,510 --> 00:48:16,040 And that's actually, I mean, we didn't do anything over here 818 00:48:16,040 --> 00:48:17,740 except move things around. 819 00:48:17,740 --> 00:48:19,240 We had equality all the way through. 820 00:48:19,240 --> 00:48:21,176 So that's key. 821 00:48:21,176 --> 00:48:25,710 We can push this through the expected value. 822 00:48:25,710 --> 00:48:28,280 In other words, it's unbiased.
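Unbiasedness, and the variance of the estimator, can both be checked numerically from the same weighted samples. A sketch under fully assumed choices (f = N(0, 1), biasing pdf w = N(4, 1), event "x > 4"; none of these come from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

N = 20_000
x_crit = 4.0
P_A = 3.1671e-05              # reference: P(x > 4) under f = N(0, 1)

# g is the weighted indicator I_A * f/w, with samples drawn under w.
x = rng.normal(loc=x_crit, scale=1.0, size=N)
g = (x > x_crit) * normal_pdf(x, 0.0, 1.0) / normal_pdf(x, x_crit, 1.0)

P_hat_IS = np.mean(g)         # unbiased: its expectation is P(A)

# Var(P hat IS) = (1/N) * (E_w[g^2] - P(A)^2), estimated from the samples.
var_IS = (np.mean(g ** 2) - P_hat_IS ** 2) / N

# Plain Monte Carlo variance at the same N, for comparison: P(1 - P)/N.
var_MC = P_A * (1.0 - P_A) / N

print(P_hat_IS, var_IS, var_MC)
```

With this particular w the importance-sampling variance comes out orders of magnitude below the plain Monte Carlo value, but a badly chosen w can just as easily make it larger.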
823 00:48:28,280 --> 00:48:33,980 But then the key is that-- you could put a w in here 824 00:48:33,980 --> 00:48:37,440 to be clear-- that the variance of the estimator, 825 00:48:37,440 --> 00:48:41,180 which again controls basically how good it is for a given 826 00:48:41,180 --> 00:48:45,270 number of samples, has got this more complicated expression. 827 00:48:45,270 --> 00:48:49,500 So the 1 over N is still out there. 828 00:48:49,500 --> 00:48:58,810 And then there is an expectation, under w, of iA times f of x over w of x, quantity squared, 829 00:48:58,810 --> 00:49:03,167 and then there is a minus P of A squared. 830 00:49:03,167 --> 00:49:05,210 I think I got all the pieces. 831 00:49:11,111 --> 00:49:13,360 And that actually goes back to what I was just saying. 832 00:49:13,360 --> 00:49:15,100 It's not actually obvious whether this 833 00:49:15,100 --> 00:49:17,220 is smaller or bigger than what we had before 834 00:49:17,220 --> 00:49:19,550 with the regular Monte Carlo estimator, but of course, 835 00:49:19,550 --> 00:49:22,120 it all comes down to the choice of w. 836 00:49:22,120 --> 00:49:26,700 I mean, the idea is choose a w to make this small so that you 837 00:49:26,700 --> 00:49:28,421 get away with fewer samples. 838 00:49:36,340 --> 00:49:37,590 That's kind of a neat trick, no? 839 00:49:40,410 --> 00:49:42,876 Multiplying and dividing by things, no? 840 00:49:42,876 --> 00:49:44,659 AUDIENCE: [INAUDIBLE]. 841 00:49:44,659 --> 00:49:45,700 KAREN WILLCOX: It's what? 842 00:49:45,700 --> 00:49:47,125 AUDIENCE: [INAUDIBLE]. 843 00:49:47,125 --> 00:49:48,550 KAREN WILLCOX: Yeah. 844 00:49:48,550 --> 00:49:52,523 It used to always drive me crazy when I was an undergrad. 845 00:49:52,523 --> 00:49:55,301 AUDIENCE: [INAUDIBLE] 846 00:49:55,301 --> 00:49:58,656 KAREN WILLCOX: So next, just briefly, how to pick, 847 00:49:58,656 --> 00:50:00,697 and I'm not going to give, sorry, [? Trey, ?]
I'm 848 00:50:00,697 --> 00:50:03,260 not going to give you any definitive answers on this, 849 00:50:03,260 --> 00:50:05,990 but how to pick the biasing distributions. 850 00:50:08,985 --> 00:50:10,930 So what do you do? 851 00:50:10,930 --> 00:50:16,970 So we'll just look at two. Yeah? 852 00:50:16,970 --> 00:50:17,958 AUDIENCE: [INAUDIBLE]. 853 00:50:26,356 --> 00:50:27,390 KAREN WILLCOX: Oh, here? 854 00:50:27,390 --> 00:50:27,910 Yeah. 855 00:50:27,910 --> 00:50:31,580 Just like all of the variances of the estimators 856 00:50:31,580 --> 00:50:35,806 have like sigma y or P of A or whatever it is. 857 00:50:35,806 --> 00:50:36,778 AUDIENCE: [INAUDIBLE]. 858 00:50:41,152 --> 00:50:43,100 KAREN WILLCOX: Yeah. 859 00:50:43,100 --> 00:50:45,120 Remember like the original one. 860 00:50:45,120 --> 00:50:47,870 Even the variance of the mean estimate is sigma y 861 00:50:47,870 --> 00:50:50,986 squared over N, and you don't know sigma. 862 00:50:50,986 --> 00:50:52,360 So you can plug in the estimates, 863 00:50:52,360 --> 00:50:55,630 but then you're introducing additional error. 864 00:50:55,630 --> 00:50:58,154 Yeah. 865 00:50:58,154 --> 00:50:59,570 To actually compute them, you have 866 00:50:59,570 --> 00:51:03,216 to use an estimate for P of A. Yeah. 867 00:51:03,216 --> 00:51:05,656 AUDIENCE: [INAUDIBLE]. 868 00:51:05,656 --> 00:51:06,632 KAREN WILLCOX: Yep. 869 00:51:08,584 --> 00:51:09,560 AUDIENCE: [INAUDIBLE]. 870 00:51:13,464 --> 00:51:14,950 KAREN WILLCOX: Yeah. 871 00:51:14,950 --> 00:51:17,140 So this is the theoretical expression for the thing. 872 00:51:17,140 --> 00:51:20,050 How you actually compute this is a whole other question, 873 00:51:20,050 --> 00:51:22,466 but we've seen that with every single one of the variances 874 00:51:22,466 --> 00:51:23,320 of our estimators. 875 00:51:23,320 --> 00:51:27,950 We had sigma squared over N. We didn't know what this was. 876 00:51:27,950 --> 00:51:31,500 On the original one, we had P times 1 minus P over N.
877 00:51:31,500 --> 00:51:33,220 We didn't know what this was. 878 00:51:33,220 --> 00:51:36,650 So all of them have got the theoretical result 879 00:51:36,650 --> 00:51:38,110 and how you compute them. 880 00:51:38,110 --> 00:51:39,610 And you could compute the estimates. 881 00:51:39,610 --> 00:51:42,120 You can do bootstrapping, and in this case 882 00:51:42,120 --> 00:51:46,641 you would be able to estimate everything. 883 00:51:46,641 --> 00:51:47,140 Yes. 884 00:51:53,760 --> 00:51:57,660 So one simple approach is just to scale. 885 00:51:57,660 --> 00:52:00,800 So we're talking about picking w. 886 00:52:00,800 --> 00:52:04,230 And thank you. 887 00:52:04,230 --> 00:52:07,030 These are homework 7. 888 00:52:07,030 --> 00:52:07,950 Thank you. 889 00:52:10,710 --> 00:52:15,810 And it's scaled in such a way that we shift probability 890 00:52:15,810 --> 00:52:22,390 into the region of interest, shift probability 891 00:52:22,390 --> 00:52:24,570 into the event regions. 892 00:52:24,570 --> 00:52:29,772 Again, the idea being that we want to get more samples. 893 00:52:29,772 --> 00:52:30,730 So how would that work? 894 00:52:30,730 --> 00:52:35,830 In that case, w of x could be 1 895 00:52:35,830 --> 00:52:43,110 over a times the original pdf, 896 00:52:43,110 --> 00:52:45,251 but now as a function of x over a, 897 00:52:45,251 --> 00:52:51,518 where a is some constant that's greater than 1. 898 00:52:51,518 --> 00:52:57,660 So I have to draw these for them to make sense to me. 899 00:52:57,660 --> 00:53:02,610 So it's saying pick w to be the scaled version of f 900 00:53:02,610 --> 00:53:11,994 of x, where we scale the argument and then also scale the result. 901 00:53:11,994 --> 00:53:14,725 So if I can just sort of sketch notionally 902 00:53:14,725 --> 00:53:16,100 what that might look like, I know 903 00:53:16,100 --> 00:53:19,750 it helps me to think about what it's doing.
904 00:53:19,750 --> 00:53:25,060 So let's say this is x and this is f of x, the original pdf. 905 00:53:25,060 --> 00:53:28,520 So maybe it's Gaussian. 906 00:53:28,520 --> 00:53:30,520 And just to make it easier to think about, let's 907 00:53:30,520 --> 00:53:31,615 just center it at 0. 908 00:53:31,615 --> 00:53:33,476 It doesn't have to be. 909 00:53:33,476 --> 00:53:35,470 And its height is going to be something. 910 00:53:35,470 --> 00:53:38,670 We'll call it q. 911 00:53:38,670 --> 00:53:46,687 Then what is w? 912 00:53:46,687 --> 00:53:48,770 Does anyone want to have a go at drawing w? 913 00:53:51,740 --> 00:53:58,664 So the height is q over a. 914 00:53:58,664 --> 00:53:59,600 OK, good. 915 00:53:59,600 --> 00:54:01,486 And I'm going to choose a greater than 1, 916 00:54:01,486 --> 00:54:03,750 so it's going to be lower. 917 00:54:07,250 --> 00:54:08,250 AUDIENCE: [INAUDIBLE]. 918 00:54:14,750 --> 00:54:16,319 KAREN WILLCOX: Good. 919 00:54:16,319 --> 00:54:18,360 You're much better at visualizing this than I am. 920 00:54:18,360 --> 00:54:20,819 I had to mentally do a few points 921 00:54:20,819 --> 00:54:22,090 because I'm not very good. 922 00:54:22,090 --> 00:54:27,570 It's not a very well-drawn Gaussian. 923 00:54:27,570 --> 00:54:29,450 All right. 924 00:54:29,450 --> 00:54:31,970 So I had to actually sort of mentally think it through. 925 00:54:31,970 --> 00:54:36,560 I'm like, at this point, we're at 1, and say a were 2, 926 00:54:36,560 --> 00:54:44,934 then this guy at 1 is going to get the mass from 1/2. 927 00:54:44,934 --> 00:54:46,600 Then it's going to scale it down by two. 928 00:54:46,600 --> 00:54:49,230 But because this thing is falling off quickly, 929 00:54:49,230 --> 00:54:51,450 it's still going to end up being fat 930 00:54:51,450 --> 00:54:54,190 relative to what was in the middle. 931 00:54:54,190 --> 00:54:57,090 So it's exactly a scaling that's going 932 00:54:57,090 --> 00:55:02,354 to shift probability mass out.
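The scaling just sketched on the board can be checked numerically. A minimal sketch (an illustration, not code from the lecture), assuming the original pdf f is a standard normal, so that the scaled biasing density w(x) = f(x/a)/a is itself a zero-mean Gaussian with standard deviation a:

```python
import math

def gaussian_tail(t, sigma=1.0):
    """P(X > t) for a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * math.erfc(t / (sigma * math.sqrt(2.0)))

# Original pdf f: standard normal. Scaled biasing pdf w(x) = f(x/a)/a,
# which for a Gaussian is just a zero-mean Gaussian with sigma = a.
a = 2.0
tail_f = gaussian_tail(3.0, sigma=1.0)   # mass beyond x = 3 under f
tail_w = gaussian_tail(3.0, sigma=a)     # mass beyond x = 3 under w

print(tail_f)  # about 0.00135
print(tail_w)  # about 0.0668
```

With a = 2, the region beyond x = 3 holds roughly 50 times more probability under w than under f, which is exactly the "shift mass out" effect being described.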
933 00:55:02,354 --> 00:55:04,562 So this is what I was saying when I mentioned fatter 934 00:55:04,562 --> 00:55:05,740 tails. 935 00:55:05,740 --> 00:55:07,870 Any problems you see with this if we're 936 00:55:07,870 --> 00:55:10,040 trying to estimate a probability of failure, 937 00:55:10,040 --> 00:55:12,170 say the probability that x is greater 938 00:55:12,170 --> 00:55:13,490 than some critical value? 939 00:55:20,390 --> 00:55:20,890 What's that? 940 00:55:20,890 --> 00:55:21,590 AUDIENCE: [INAUDIBLE]. 941 00:55:21,590 --> 00:55:22,770 KAREN WILLCOX: Yeah, it might not still be. 942 00:55:22,770 --> 00:55:24,580 Anyway, it's definitely going to generate more samples over 943 00:55:24,580 --> 00:55:27,470 here, but it's also going to generate more samples over 944 00:55:27,470 --> 00:55:28,370 here. 945 00:55:28,370 --> 00:55:32,360 So we've made this tail fatter, but we've also 946 00:55:32,360 --> 00:55:34,876 made this tail fatter. 947 00:55:34,876 --> 00:55:38,828 So-- 948 00:55:38,828 --> 00:55:39,816 AUDIENCE: [INAUDIBLE]? 949 00:55:43,274 --> 00:55:46,930 KAREN WILLCOX: Can you do one if f of x is uniform? 950 00:55:46,930 --> 00:55:50,830 Think about what w would be. 951 00:55:55,100 --> 00:55:57,061 But basically w can be anything you want. 952 00:55:57,061 --> 00:56:00,288 AUDIENCE: [INAUDIBLE]. 953 00:56:00,288 --> 00:56:03,161 KAREN WILLCOX: Yeah, that's just one way to do it. 954 00:56:05,870 --> 00:56:13,350 But you've got to make sure that the support of w 955 00:56:13,350 --> 00:56:16,090 is at least as wide, otherwise you're actually 956 00:56:16,090 --> 00:56:19,090 going to be dividing by 0. 957 00:56:19,090 --> 00:56:23,070 So if this guy was uniform, I mean-- 958 00:56:23,070 --> 00:56:24,020 AUDIENCE: [INAUDIBLE]. 959 00:56:28,295 --> 00:56:31,020 KAREN WILLCOX: That's right. 960 00:56:31,020 --> 00:56:34,130 But you couldn't have this be uniform and then say, 961 00:56:34,130 --> 00:56:39,010 I'm going to take my w to be this guy.
962 00:56:39,010 --> 00:56:43,948 Well, if you're generating from here, would you-- 963 00:56:43,948 --> 00:56:46,239 you would be OK because you never generate points here. 964 00:56:49,162 --> 00:56:49,662 Yeah. 965 00:56:49,662 --> 00:56:50,640 AUDIENCE: [INAUDIBLE]. 966 00:56:55,530 --> 00:56:57,300 KAREN WILLCOX: You could, but then they 967 00:56:57,300 --> 00:57:00,200 would just get multiplied by 0. 968 00:57:00,200 --> 00:57:02,780 Then you'd be wasting samples. Yes, 969 00:57:02,780 --> 00:57:05,688 I guess there's no reason why you couldn't do this. 970 00:57:05,688 --> 00:57:06,636 Yeah. 971 00:57:06,636 --> 00:57:10,340 So that's one simple thing to do is just to move mass out, 972 00:57:10,340 --> 00:57:11,390 maybe to the tails. 973 00:57:11,390 --> 00:57:12,790 Another simple thing you could do 974 00:57:12,790 --> 00:57:20,920 is translation, which just means that w of x 975 00:57:20,920 --> 00:57:26,410 would be f of x minus c, where maybe c is greater than 0 976 00:57:26,410 --> 00:57:31,922 if we're thinking about a right tail. 977 00:57:31,922 --> 00:57:33,630 And again, maybe this wouldn't make sense 978 00:57:33,630 --> 00:57:35,730 if you had a finite support like a uniform distribution 979 00:57:35,730 --> 00:57:37,105 because then you would be putting 980 00:57:37,105 --> 00:57:39,220 mass where you didn't care about it. 981 00:57:39,220 --> 00:57:41,060 But what would that look like notionally? 982 00:57:41,060 --> 00:57:44,360 If that's x and that's f of x, we 983 00:57:44,360 --> 00:57:52,410 have, say, a normal centered at 0, then the w of x 984 00:57:52,410 --> 00:57:56,480 is just going to be shifted over by an amount, c. 985 00:57:59,670 --> 00:58:03,340 So again, all it's doing is putting more probability mass 986 00:58:03,340 --> 00:58:06,400 in this case in the right tail that we care about.
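Both tricks feed the same importance-sampling estimator: draw from w, then re-weight each sample by f/w. A minimal sketch (an illustration, not code from the lecture), assuming f is a standard normal, the failure event is x > 3, and w is the translated density f(x - c) with c = 3:

```python
import math
import random

def f_pdf(x):
    """Original density: standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def w_pdf(x, c):
    """Translated biasing density: f(x - c), i.e. a normal centered at c."""
    return f_pdf(x - c)

random.seed(0)
c = 3.0          # shift mass toward the right tail we care about
N = 100_000
total = 0.0
for _ in range(N):
    x = random.gauss(c, 1.0)              # draw from w, not from f
    if x > 3.0:                           # indicator of the rare event
        total += f_pdf(x) / w_pdf(x, c)   # re-weight by f/w
p_hat = total / N

print(p_hat)  # close to the true tail probability, about 0.00135
```

Plain Monte Carlo from f would see the event x > 3 only once or twice per thousand samples; the shifted sampler puts roughly half of its draws in the failure region, which is exactly why the variance of the estimate drops.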
987 00:58:14,620 --> 00:58:16,550 But in terms of picking biasing distributions, 988 00:58:16,550 --> 00:58:18,930 for simple problems, simple things probably 989 00:58:18,930 --> 00:58:21,060 work, but for really difficult problems 990 00:58:21,060 --> 00:58:24,630 with lots of inputs and very rare events, 991 00:58:24,630 --> 00:58:26,911 it's definitely still an open question. 992 00:58:33,294 --> 00:58:35,210 Any other questions about importance sampling 993 00:58:35,210 --> 00:58:37,126 or anything else we've talked about in this analysis? 994 00:58:39,174 --> 00:58:43,370 Does it all make sense, what's on this screen? 995 00:58:46,431 --> 00:58:46,930 Yes. 996 00:58:46,930 --> 00:58:47,846 AUDIENCE: [INAUDIBLE]. 997 00:58:50,760 --> 00:58:53,350 KAREN WILLCOX: Actually, control variates are my favorites. 998 00:58:53,350 --> 00:58:56,970 I've mentioned some of the other variance reduction methods. 999 00:58:56,970 --> 00:59:01,150 Control variates are another way to do variance reduction that I 1000 00:59:01,150 --> 00:59:02,862 think is also kind of neat. 1001 00:59:02,862 --> 00:59:04,385 So if you're interested in knowing 1002 00:59:04,385 --> 00:59:05,760 other variance reduction methods, 1003 00:59:05,760 --> 00:59:07,900 I could give you some stuff to read. 1004 00:59:10,510 --> 00:59:15,880 So let's now just in the last 15 minutes 1005 00:59:15,880 --> 00:59:18,604 talk a little bit about sensitivity analysis 1006 00:59:18,604 --> 00:59:22,652 in the context of Monte Carlo. 1007 00:59:22,652 --> 00:59:27,460 So the question is now how do we use our Monte Carlo 1008 00:59:27,460 --> 00:59:31,640 results to understand which uncertain inputs are 1009 00:59:31,640 --> 00:59:33,760 contributing the most to output variability.
1010 00:59:33,760 --> 00:59:36,060 So I want you to put this picture back 1011 00:59:36,060 --> 00:59:38,870 in your mind, this guy here, which 1012 00:59:38,870 --> 00:59:44,390 is that we've got uncertain inputs, 1013 00:59:44,390 --> 00:59:46,090 and maybe there's a lot of them-- 1014 00:59:46,090 --> 00:59:50,940 d1, d2, d3, however many. 1015 00:59:50,940 --> 00:59:53,790 And maybe there's one output of interest, 1016 00:59:53,790 --> 00:59:55,015 or maybe there is lots. 1017 00:59:55,015 --> 00:59:57,830 But we've run these Monte Carlos, 1018 00:59:57,830 --> 01:00:01,350 we put pdf's in all these guys. 1019 01:00:01,350 --> 01:00:02,280 We draw samples. 1020 01:00:02,280 --> 01:00:03,090 We run through. 1021 01:00:03,090 --> 01:00:08,160 We generate some kind of output histogram 1022 01:00:08,160 --> 01:00:12,890 on q that looks like however it looks. 1023 01:00:12,890 --> 01:00:15,100 But now we want to ask the question, well, 1024 01:00:15,100 --> 01:00:18,050 which one of these is really contributing 1025 01:00:18,050 --> 01:00:23,050 to the variability, or which combinations of these 1026 01:00:23,050 --> 01:00:24,680 are contributing to the variability? 1027 01:00:24,680 --> 01:00:27,672 Which combinations are causing this big fat tail 1028 01:00:27,672 --> 01:00:29,630 out here, which corresponds to [INAUDIBLE] that 1029 01:00:29,630 --> 01:00:32,136 don't meet our design criteria? 1030 01:00:32,136 --> 01:00:35,449 So this is really now kind of the opposite question. 1031 01:00:35,449 --> 01:00:37,490 You've done a forward propagation of uncertainty, 1032 01:00:37,490 --> 01:00:39,225 and you've got the uncertainty in q, 1033 01:00:39,225 --> 01:00:42,830 and now you want to figure out where did it come from. 1034 01:00:42,830 --> 01:00:44,790 So forward propagation of uncertainty, 1035 01:00:44,790 --> 01:00:49,150 sensitivity analysis is kind of like the reverse question. 
1036 01:00:49,150 --> 01:00:50,940 And it's important for two reasons. 1037 01:00:50,940 --> 01:00:54,050 One is that it helps us understand 1038 01:00:54,050 --> 01:00:57,670 where we should focus our uncertainty reduction efforts. 1039 01:00:57,670 --> 01:01:00,372 But I think in many ways when you look say like an aircraft 1040 01:01:00,372 --> 01:01:02,580 design and development program at a place like Boeing 1041 01:01:02,580 --> 01:01:05,250 or wherever, so much of the process 1042 01:01:05,250 --> 01:01:08,250 is about reducing uncertainty. 1043 01:01:08,250 --> 01:01:11,450 It's about running experiments or tests 1044 01:01:11,450 --> 01:01:13,200 or running models and trying to figure out 1045 01:01:13,200 --> 01:01:15,033 exactly what the performance is of the thing 1046 01:01:15,033 --> 01:01:17,660 that you're designing and making decisions as well. 1047 01:01:17,660 --> 01:01:21,060 So understanding where the big uncertainties are can 1048 01:01:21,060 --> 01:01:23,600 help you decide what kind of experiments 1049 01:01:23,600 --> 01:01:25,690 to do or whether to improve your models. 1050 01:01:25,690 --> 01:01:27,340 If you're thinking about uncertainty 1051 01:01:27,340 --> 01:01:29,350 in finished products, it can help 1052 01:01:29,350 --> 01:01:31,620 guide you on where you should tighten tolerances 1053 01:01:31,620 --> 01:01:34,235 in the manufacturing process. 1054 01:01:34,235 --> 01:01:36,610 It might tell you that you've got uncertainty because you 1055 01:01:36,610 --> 01:01:37,860 don't know certain conditions. 1056 01:01:37,860 --> 01:01:40,210 So maybe it says you need a sensor in your engine 1057 01:01:40,210 --> 01:01:42,590 or on board your aircraft or wherever. 1058 01:01:42,590 --> 01:01:46,100 Sometimes this is called factor prioritization, 1059 01:01:46,100 --> 01:01:49,410 figuring out what the priority order of the inputs is.
1060 01:01:49,410 --> 01:01:53,500 Factor is just a term that is often used in the statistics 1061 01:01:53,500 --> 01:01:54,340 community. 1062 01:01:54,340 --> 01:01:57,656 Which ones should be priorities for uncertainty reduction? 1063 01:01:57,656 --> 01:02:00,164 So that's kind of half the story, but the other half of the story 1064 01:02:00,164 --> 01:02:01,580 is that it's also really important 1065 01:02:01,580 --> 01:02:04,100 to understand where the uncertainties are not 1066 01:02:04,100 --> 01:02:06,540 important. 1067 01:02:06,540 --> 01:02:11,200 There could be distributions on d1, 1068 01:02:11,200 --> 01:02:13,770 but this doesn't really matter because this thing is not 1069 01:02:13,770 --> 01:02:16,459 sensitive to d1. 1070 01:02:16,459 --> 01:02:18,000 And that's important because then you 1071 01:02:18,000 --> 01:02:19,833 shouldn't waste your time worrying about d1, 1072 01:02:19,833 --> 01:02:22,850 and you shouldn't waste your time arguing with other groups 1073 01:02:22,850 --> 01:02:26,060 about whether d1 should be 1 or 1.1 or 1.2. 1074 01:02:26,060 --> 01:02:29,190 So it's important to understand where things are actually 1075 01:02:29,190 --> 01:02:31,312 not important. 1076 01:02:31,312 --> 01:02:32,770 And this is sometimes called factor 1077 01:02:32,770 --> 01:02:36,734 fixing because this is where you can understand where it's not 1078 01:02:36,734 --> 01:02:38,150 important to consider uncertainty, 1079 01:02:38,150 --> 01:02:41,210 and you can just pick a deterministic value of an input 1080 01:02:41,210 --> 01:02:42,279 and go forward. 1081 01:02:42,279 --> 01:02:44,820 And this is really important, actually, in the policy setting. 1082 01:02:44,820 --> 01:02:48,050 As I mentioned before, we work with the FAA. 1083 01:02:48,050 --> 01:02:50,840 People argue about all kinds of things when they make policies.
1084 01:02:50,840 --> 01:02:52,430 It's nice to be able to show that some 1085 01:02:52,430 --> 01:02:54,280 of the things that you argue about actually 1086 01:02:54,280 --> 01:02:59,410 don't matter for the uncertainty in the policy decision. 1087 01:02:59,410 --> 01:03:00,835 Fewer things to argue about. 1088 01:03:00,835 --> 01:03:01,480 All right. 1089 01:03:01,480 --> 01:03:04,340 So how can we do sensitivity analysis? 1090 01:03:04,340 --> 01:03:07,790 There is something called VABO, vary-all-but-one [INAUDIBLE]. 1091 01:03:11,730 --> 01:03:13,200 Think about doing it. 1092 01:03:13,200 --> 01:03:14,690 So here are the steps. 1093 01:03:14,690 --> 01:03:20,160 First of all, run your Monte Carlo with all the inputs varying. 1094 01:03:20,160 --> 01:03:25,377 Think about d1, d2, d3, up to however many we have of them. 1095 01:03:25,377 --> 01:03:26,710 We've got pdf's for all of them. 1096 01:03:26,710 --> 01:03:27,520 We sample all of them. 1097 01:03:27,520 --> 01:03:29,446 We generate the corresponding output-- it's called q 1098 01:03:29,446 --> 01:03:31,080 on that board over there. 1099 01:03:31,080 --> 01:03:34,155 Then let's go and take input k, 1100 01:03:34,155 --> 01:03:36,840 say it's the first one, d1, and let's 1101 01:03:36,840 --> 01:03:40,930 fix it to a deterministic value so it's no longer got a pdf, it's 1102 01:03:40,930 --> 01:03:44,080 no longer able to vary, and rerun the Monte Carlo with all 1103 01:03:44,080 --> 01:03:48,600 the other ones-- all the other, however many there 1104 01:03:48,600 --> 01:03:51,470 are d minus 1 inputs varying. 1105 01:03:51,470 --> 01:03:54,250 So now we have the results of two Monte Carlo simulations, 1106 01:03:54,250 --> 01:03:55,740 one where it was fixed, 1107 01:03:55,740 --> 01:03:57,630 and one where it was varying. 1108 01:03:57,630 --> 01:03:59,950 And you could compare the statistics of the output.
1109 01:03:59,950 --> 01:04:01,790 You could look at the variance from run one, 1110 01:04:01,790 --> 01:04:03,680 and the variance of run two. 1111 01:04:03,680 --> 01:04:07,420 And you could then attribute the difference in the variance 1112 01:04:07,420 --> 01:04:09,800 to that fixed factor. 1113 01:04:09,800 --> 01:04:15,290 And then you would repeat it with 1, 3, 4, 5 varying. 1114 01:04:15,290 --> 01:04:19,360 Fix 2, fix 3, fix 4. 1115 01:04:19,360 --> 01:04:21,230 So vary all but one. 1116 01:04:23,524 --> 01:04:25,190 I mean, maybe it's a useful thing to do, 1117 01:04:25,190 --> 01:04:26,810 but of course, there are questions. 1118 01:04:26,810 --> 01:04:29,143 The first question you'd ask is, well, what value should we 1119 01:04:29,143 --> 01:04:31,670 fix factor k at? 1120 01:04:31,670 --> 01:04:36,160 If we fixed it to this value and ran the Monte Carlo, 1121 01:04:36,160 --> 01:04:38,690 would we get a different result than if we 1122 01:04:38,690 --> 01:04:41,010 fixed it at a different value? 1123 01:04:41,010 --> 01:04:45,255 And the answer for a complicated system is probably yes. 1124 01:04:45,255 --> 01:04:47,630 And then the other question maybe you should be wondering 1125 01:04:47,630 --> 01:04:49,400 is what about possible interactions? 1126 01:04:49,400 --> 01:04:54,400 If I fix input 1 at a value and analyze things, 1127 01:04:54,400 --> 01:04:58,556 but what if input 1 and input 2 1128 01:04:58,556 --> 01:05:01,120 interact, and I never explored that? 1129 01:05:01,120 --> 01:05:03,640 Are there interactions that might be important 1130 01:05:03,640 --> 01:05:06,610 that I would be missing? 1131 01:05:06,610 --> 01:05:09,670 So to address those limitations, there is 1132 01:05:09,670 --> 01:05:14,420 something that's called global sensitivity analysis. 1133 01:05:14,420 --> 01:05:19,720 So let's get this off.
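The vary-all-but-one recipe above can be sketched on a hypothetical toy model (the model, the uniform input distributions, and the fixed value are assumptions for illustration, not from the lecture):

```python
import random

def model(d1, d2):
    """A hypothetical toy model standing in for the real simulation."""
    return d1 + 2.0 * d2

def variance(samples):
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / (len(samples) - 1)

random.seed(0)
N = 50_000

# Run 1: all inputs varying (both uniform on [0, 1] here, as an example).
run1 = [model(random.random(), random.random()) for _ in range(N)]

# Run 2: fix d1 at a deterministic value (its mean) and vary only d2.
run2 = [model(0.5, random.random()) for _ in range(N)]

var_all = variance(run1)          # about 5/12
var_without_d1 = variance(run2)   # about 4/12

# Attribute the drop in variance to d1.
print(var_all - var_without_d1)   # about 1/12
```

The two questions raised next still apply to this sketch: fixing d1 somewhere other than 0.5 could change the answer for a less benign model, and interactions between d1 and d2 are never exercised.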
1134 01:05:19,720 --> 01:05:23,620 And I think when you think about global sensitivity analysis, 1135 01:05:23,620 --> 01:05:27,540 it's useful to have this vary-all-but-one picture 1136 01:05:27,540 --> 01:05:34,340 in your mind because it's conceptually a little bit 1137 01:05:34,340 --> 01:05:36,650 the same. 1138 01:05:36,650 --> 01:05:38,073 So we're on 4. 1139 01:05:38,073 --> 01:05:40,450 And 4a was the vary-all-but-one. 1140 01:05:40,450 --> 01:05:44,895 So this is 4b, global sensitivity analysis. 1141 01:06:02,162 --> 01:06:05,520 So I'm going to draw a picture here. 1142 01:06:05,520 --> 01:06:07,835 I'm going to use different notation, 1143 01:06:07,835 --> 01:06:09,460 but let me not try to change it, otherwise 1144 01:06:09,460 --> 01:06:12,050 I will end up mixing something up. 1145 01:06:12,050 --> 01:06:14,450 So we've got a model. 1146 01:06:14,450 --> 01:06:19,150 So then I'll call the inputs x1, x2, down to-- we're 1147 01:06:19,150 --> 01:06:28,041 going to have [INAUDIBLE], so d random inputs. 1148 01:06:28,041 --> 01:06:32,350 Now, let's just think about a single random output, so then 1149 01:06:32,350 --> 01:06:34,170 just one random output. 1150 01:06:40,688 --> 01:06:47,475 So global sensitivity analysis defines [AUDIO OUT] 1151 01:06:47,475 --> 01:06:48,600 called sensitivity indices. 1152 01:06:51,210 --> 01:06:55,360 And there are two different kinds of sensitivity indices. 1153 01:06:55,360 --> 01:06:58,982 So the first one is what's called a main effect. 1154 01:06:58,982 --> 01:07:03,783 Did I talk about main effects in 16.09, 6.041? 1155 01:07:03,783 --> 01:07:05,082 No. 1156 01:07:05,082 --> 01:07:07,410 Maybe in 16.62x, anybody seen main effects? 1157 01:07:10,230 --> 01:07:21,040 So the main effect sensitivity index for input i-- 1158 01:07:21,040 --> 01:07:23,262 usually statisticians
1159 01:07:23,262 --> 01:07:24,970 would use the word factor here, 1160 01:07:24,970 --> 01:07:27,820 but let's just call it an input. 1161 01:07:27,820 --> 01:07:31,912 So this thing is-- the notation is si. 1162 01:07:31,912 --> 01:07:34,120 Let's write the formula and talk about what it means. 1163 01:07:34,120 --> 01:07:41,020 It's the variance of Y minus the expected value 1164 01:07:41,020 --> 01:08:03,945 of the variance of Y given xi, all divided by the variance of Y. 1165 01:08:03,945 --> 01:08:06,560 So the variance of Y minus the expected 1166 01:08:06,560 --> 01:08:09,790 value of the variance of Y given xi divided 1167 01:08:09,790 --> 01:08:11,986 by the variance of Y. Does anyone 1168 01:08:11,986 --> 01:08:13,860 want to have a go at explaining what this is? 1169 01:08:16,848 --> 01:08:17,844 AUDIENCE: [INAUDIBLE]. 1170 01:08:46,230 --> 01:08:47,710 KAREN WILLCOX: Yes. 1171 01:08:47,710 --> 01:08:53,850 So if this happened to be 0, then our sensitivity index 1172 01:08:53,850 --> 01:08:55,950 would be what? 1173 01:08:55,950 --> 01:08:57,290 1. 1174 01:08:57,290 --> 01:08:59,510 And it would mean that all of the variance in Y 1175 01:08:59,510 --> 01:09:01,370 was explained by xi, because when we fix 1176 01:09:01,370 --> 01:09:05,029 xi the variance [INAUDIBLE]. 1177 01:09:05,029 --> 01:09:07,439 What about if this expected value 1178 01:09:07,439 --> 01:09:09,711 were the variance of Y? 1179 01:09:09,711 --> 01:09:11,202 Then xi doesn't have any effect. 1180 01:09:11,202 --> 01:09:12,540 Yes. 1181 01:09:12,540 --> 01:09:14,140 Why is there an expectation here? 1182 01:09:21,149 --> 01:09:27,686 So we go with [AUDIO OUT] So we're saying Y given xi. 1183 01:09:27,686 --> 01:09:28,684 AUDIENCE: [INAUDIBLE]. 1184 01:09:37,167 --> 01:09:38,180 KAREN WILLCOX: Yeah.
1185 01:09:38,180 --> 01:09:41,752 If we picked any particular value at which to fix xi, 1186 01:09:41,752 --> 01:09:45,119 as in, given xi equal to some x star, 1187 01:09:45,119 --> 01:09:47,510 then it would just be a number. 1188 01:09:47,510 --> 01:09:50,840 But because this thing is a random variable, 1189 01:09:50,840 --> 01:09:52,260 we don't know where to fix it. 1190 01:09:52,260 --> 01:09:53,968 So this is addressing this question of we 1191 01:09:53,968 --> 01:09:55,188 don't know where to fix it. 1192 01:09:55,188 --> 01:09:57,515 So at least the way I think of it conceptually 1193 01:09:57,515 --> 01:10:00,740 is that we're going to fix it at all possible places 1194 01:10:00,740 --> 01:10:03,234 and then [INAUDIBLE] over there. 1195 01:10:03,234 --> 01:10:08,110 And so it's the expected reduction in the variance-- 1196 01:10:08,110 --> 01:10:08,650 it's relative. 1197 01:10:08,650 --> 01:10:09,910 It's normalized. 1198 01:10:09,910 --> 01:10:12,780 So it's the expected relative reduction of the variance 1199 01:10:12,780 --> 01:10:16,660 if we learned everything about the factor xi. 1200 01:10:16,660 --> 01:10:21,560 So it's giving us a measure of how much does xi contribute 1201 01:10:21,560 --> 01:10:24,160 to the variance in Y, but it's a measure 1202 01:10:24,160 --> 01:10:27,260 where we're taking expectation over the possible values 1203 01:10:27,260 --> 01:10:28,830 that xi could take on. 1204 01:10:28,830 --> 01:10:31,334 So that's where the word global comes from. 1205 01:10:31,334 --> 01:10:31,834 [INAUDIBLE] 1206 01:10:36,740 --> 01:10:39,720 So let me just write it in words. 1207 01:10:39,720 --> 01:10:47,330 So expected relative reduction in output variance-- 1208 01:10:47,330 --> 01:10:53,440 and the output variance is the variance of Y-- if the 1209 01:10:53,440 --> 01:11:04,230 true value of xi is learned.
1210 01:11:04,230 --> 01:11:06,460 And then another thing that's important to see 1211 01:11:06,460 --> 01:11:16,817 is that this is a measure of the effects of varying xi alone. 1212 01:11:23,219 --> 01:11:25,260 And if we wrote this in a slightly different way, 1213 01:11:25,260 --> 01:11:26,926 we could see that it would 1214 01:11:26,926 --> 01:11:31,360 be averaged over variations in the other inputs. 1215 01:11:37,742 --> 01:11:39,830 We'll see it on Monday when we talk 1216 01:11:39,830 --> 01:11:42,460 about design of experiments. 1217 01:11:42,460 --> 01:11:44,346 It's another way of computing main effects 1218 01:11:44,346 --> 01:11:46,720 that lines up more with the second interpretation. 1219 01:11:46,720 --> 01:11:47,553 Let me say it again. 1220 01:11:47,553 --> 01:11:50,630 A measure-- this is what we talked about. 1221 01:11:50,630 --> 01:11:52,240 A measure of the effect of varying xi 1222 01:11:52,240 --> 01:11:57,200 alone, averaged over the variations of other inputs. 1223 01:11:57,200 --> 01:12:00,113 And that's where the term main effect comes in. 1224 01:12:00,113 --> 01:12:04,510 So if we were to change xi, if we were to vary xi, 1225 01:12:04,510 --> 01:12:09,780 how does it change the output, averaged over everything else-- 1226 01:12:09,780 --> 01:12:13,520 the x3, x4 varying? 1227 01:12:13,520 --> 01:12:14,980 Is that clear? 1228 01:12:14,980 --> 01:12:18,310 So there are two different ways you can think about that. 1229 01:12:18,310 --> 01:12:19,278 AUDIENCE: [INAUDIBLE]. 1230 01:12:26,054 --> 01:12:32,270 KAREN WILLCOX: So si can at most be 1 1231 01:12:32,270 --> 01:12:37,070 because the most we can get here is 0, so si 1232 01:12:37,070 --> 01:12:38,990 is going to be between 0 and 1. 1233 01:12:38,990 --> 01:12:41,920 And the idea is that you could compute these, 1234 01:12:41,920 --> 01:12:44,090 and then you would rank them.
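The main effect index written on the board can be evaluated by brute force with a nested Monte Carlo: fix xi at many sampled values, take the conditional variance each time, and average. A sketch on a hypothetical toy model (the model and its uniform inputs are assumptions for illustration; analytically its main effects are S1 = 0.2 and S2 = 0.8):

```python
import random

def model(x1, x2):
    """Hypothetical toy model; analytically, S1 = 0.2 and S2 = 0.8."""
    return x1 + 2.0 * x2

def variance(samples):
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / (len(samples) - 1)

random.seed(0)
N_outer, N_inner = 200, 500

# Overall variance of Y with both inputs uniform on [0, 1].
var_y = variance([model(random.random(), random.random())
                  for _ in range(N_outer * N_inner)])

# E[Var(Y | x1)]: fix x1 at many sampled values, compute the conditional
# variance each time, then average those variances.
cond_vars = []
for _ in range(N_outer):
    x1 = random.random()
    cond_vars.append(variance([model(x1, random.random())
                               for _ in range(N_inner)]))
expected_cond_var = sum(cond_vars) / len(cond_vars)

s1 = (var_y - expected_cond_var) / var_y
print(s1)  # about 0.2 for this toy model
```

The outer loop is exactly the "fix it at all possible places and average" idea from above; it is expensive, which is why the cleverer estimators mentioned at the end of the lecture exist.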
1235 01:12:44,090 --> 01:12:47,110 And if you were looking for where you focus your efforts, 1236 01:12:47,110 --> 01:12:49,360 which variable you try to control in the manufacturing 1237 01:12:49,360 --> 01:12:52,860 process, you would go after the one with the biggest 1238 01:12:52,860 --> 01:12:55,850 sensitivity index. 1239 01:12:55,850 --> 01:13:01,100 So it actually turns out we can also compute higher order 1240 01:13:01,100 --> 01:13:02,040 interaction indices. 1241 01:13:04,630 --> 01:13:08,760 You can also compute effects for the interactions. 1242 01:13:12,270 --> 01:13:16,935 And I won't go into details, but instead 1243 01:13:16,935 --> 01:13:19,880 of now just being si, there could be an sij, which 1244 01:13:19,880 --> 01:13:23,760 would be two variables, two inputs interacting, or three 1245 01:13:23,760 --> 01:13:26,420 variables interacting, or four, all the way up 1246 01:13:26,420 --> 01:13:29,530 to all of them interacting. 1247 01:13:29,530 --> 01:13:31,320 And what the math shows, I mean, 1248 01:13:31,320 --> 01:13:33,820 we're not going to talk about it, is this idea that you can 1249 01:13:33,820 --> 01:13:38,130 take the variance of Y, think of it as this pie, 1250 01:13:38,130 --> 01:13:42,740 and you can decompose this variance of Y. 1251 01:13:42,740 --> 01:13:45,280 So let's think about-- it's easier for me 1252 01:13:45,280 --> 01:13:54,860 to draw if Y depends on x1 and x2, so there are just two inputs. 1253 01:13:54,860 --> 01:13:57,410 There would be a part of the variance that 1254 01:13:57,410 --> 01:14:03,280 might be due to x1 acting alone, there 1255 01:14:03,280 --> 01:14:07,040 might be a part of the variance that's due to x2 acting alone, 1256 01:14:07,040 --> 01:14:10,460 and there might be a part of the variance due to the interaction 1257 01:14:10,460 --> 01:14:13,678 between x1 and x2.
1258 01:14:13,678 --> 01:14:18,232 And if we computed the sensitivity indices, s1, s2, 1259 01:14:18,232 --> 01:14:25,120 and s12, they would all sum up to 1. 1260 01:14:25,120 --> 01:14:27,150 Or the variance due to x1, the variance 1261 01:14:27,150 --> 01:14:29,550 due to x2, and the variance due to the x1, x2 interaction would sum up 1262 01:14:29,550 --> 01:14:33,877 to the variance of Y. And you can 1263 01:14:33,877 --> 01:14:37,120 show there is something called a high dimensional model 1264 01:14:37,120 --> 01:14:40,320 representation that shows how all of this comes out. 1265 01:14:40,320 --> 01:14:45,200 So it's kind of a nice result that you can attribute. 1266 01:14:45,200 --> 01:14:47,020 It actually turns out that there's 1267 01:14:47,020 --> 01:14:49,700 something else also called a total effect sensitivity 1268 01:14:49,700 --> 01:14:55,650 index that you can also compute that tells you 1269 01:14:55,650 --> 01:15:00,030 how much does x1 contribute, not just by itself but with all 1270 01:15:00,030 --> 01:15:01,770 of the interactions as well. 1271 01:15:01,770 --> 01:15:07,910 So the total effect sensitivity index for x1 would include s1 and 1272 01:15:07,910 --> 01:15:08,410 s12. 1273 01:15:08,410 --> 01:15:12,210 And the total effect for x2 would include s2 and s12. 1274 01:15:12,210 --> 01:15:14,682 And then all the other ones, if you had more variables, 1275 01:15:14,682 --> 01:15:16,390 you would add them all up, and it turns out 1276 01:15:16,390 --> 01:15:19,506 there's a fairly easy way to compute that as well. 1277 01:15:24,465 --> 01:15:25,090 Last questions? 1278 01:15:27,660 --> 01:15:28,419 Great. 1279 01:15:28,419 --> 01:15:29,377 AUDIENCE: [INAUDIBLE]? 1280 01:15:36,587 --> 01:15:38,420 KAREN WILLCOX: Yeah, that's a good question. 1281 01:15:38,420 --> 01:15:39,836 So I actually haven't told you how 1282 01:15:39,836 --> 01:15:41,570 to compute these things at all. 1283 01:15:41,570 --> 01:15:43,466 These are the expressions for them.
1284 01:15:43,466 --> 01:15:46,320 It turns out there are a couple different ways. 1285 01:15:46,320 --> 01:15:48,380 I think the most common way to compute 1286 01:15:48,380 --> 01:15:51,035 these things is a method called the Sobol' method. 1287 01:15:51,035 --> 01:15:53,920 Sobol', I think, was a Russian statistician, 1288 01:15:53,920 --> 01:15:56,710 which involves Monte Carlo simulations. 1289 01:15:56,710 --> 01:15:59,440 And you basically end up doing one Monte Carlo simulation 1290 01:15:59,440 --> 01:16:02,740 and then you do a second Monte Carlo simulation 1291 01:16:02,740 --> 01:16:04,560 where you freeze some of the variables 1292 01:16:04,560 --> 01:16:06,669 and redraw other ones. 1293 01:16:06,669 --> 01:16:08,210 And it's done in kind of a clever way 1294 01:16:08,210 --> 01:16:11,040 so that you get to this. 1295 01:16:11,040 --> 01:16:12,790 So then the question of how converged 1296 01:16:12,790 --> 01:16:15,040 the sensitivity indices are comes down to 1297 01:16:15,040 --> 01:16:17,852 a question of how converged the variance estimates are. 1298 01:16:17,852 --> 01:16:19,790 And we've talked a lot about mean estimates 1299 01:16:19,790 --> 01:16:20,460 and how they converge. 1300 01:16:20,460 --> 01:16:22,751 It actually turns out that to get variances to converge, 1301 01:16:22,751 --> 01:16:25,160 you usually have to take more Monte Carlo samples. 1302 01:16:25,160 --> 01:16:26,910 Most of the time when we use these things, 1303 01:16:26,910 --> 01:16:28,540 we do as many samples as we can afford, 1304 01:16:28,540 --> 01:16:30,830 and that ends up being the [INAUDIBLE]. 1305 01:16:30,830 --> 01:16:32,090 So that's a good question. 1306 01:16:32,090 --> 01:16:33,631 And it also depends on what you want.
1307 01:16:33,631 --> 01:16:36,070 Do you actually care whether you get the sensitivity index 1308 01:16:36,070 --> 01:16:38,110 to four decimal places, or do you just 1309 01:16:38,110 --> 01:16:42,030 care about saying number two is the biggest?
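The freeze-and-redraw construction just described can be sketched as follows (an illustration, not the lecture's code; the toy model and sample sizes are assumptions, and analytically S1 = 0.2 and S2 = 0.8 for this model):

```python
import random

def model(x):
    """Hypothetical toy model; analytically S1 = 0.2, S2 = 0.8."""
    return x[0] + 2.0 * x[1]

random.seed(0)
N, d = 50_000, 2

# Two independent input sample sets, A and B.
A = [[random.random() for _ in range(d)] for _ in range(N)]
B = [[random.random() for _ in range(d)] for _ in range(N)]
yA = [model(a) for a in A]

f0 = sum(yA) / N                                # estimate of the mean of Y
var_y = sum(y * y for y in yA) / N - f0 * f0    # estimate of Var(Y)

def first_order_index(i):
    # Freeze column i (reuse it from A) and redraw the rest from B,
    # so that yA and yC share only the value of input i.
    yC = [model([a[j] if j == i else b[j] for j in range(d)])
          for a, b in zip(A, B)]
    v_i = sum(ya * yc for ya, yc in zip(yA, yC)) / N - f0 * f0
    return v_i / var_y

print(first_order_index(0))  # about 0.2
print(first_order_index(1))  # about 0.8
```

The clever part is that yA and yC agree only in input i, so the average of their product picks out Var(E[Y | xi]) without the nested loop; the cost is one extra model run per index per sample. The accuracy of the indices then rests, as noted above, on how converged these variance estimates are.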