The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation, or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: Today I'm starting a new topic, and that's always the occasion for putting things into perspective. Keep in mind what we were trying to do in the subject. We were trying to introduce several intellectual themes. The first, and absolutely the most important, is how do you design a complex system? We think that's very important because there's absolutely no way this department could exist the way it does, making things like that, hooking up internets and so forth. Those are truly complex systems. And if you didn't have an organized way of thinking about complexity, they're hopeless. So the kinds of things we're interested to teach you about are just hopeless if you can't get a handle on complexity. So that's by far the most important thing that we've been thinking about. We've been interested in modeling and controlling physical systems.
I hope you remember the way we chased the robot around the lab; that was the point there. We've thought about augmenting physical systems by adding computation; I hope you've got a feel for that. And we're going to start today thinking about how do you build systems that are robust.

So just in review, so far -- you've already seen most of this -- so far we've taught you about abstraction, hierarchy, and controlling complexity, starting primarily by thinking about software engineering, because that's such a good pedagogical place to start. We introduced the idea of PCAP, and that has continued throughout the rest of the subject. Then we worried about how do you control things. We developed ways of modeling so that you could predict the outcome before you actually built the system. That's crucial. You can't afford to build prototypes for everything; it's just not economical. And so this was an exercise in making models, figuring out how behaviors relate to the models, and trying to get the design done in the modeling stage rather than in the prototyping stage. And you built circuits.
This had to do with how you augment a system with new capabilities, either hardware or software. Today what I want to start to think about is, how do you deal with uncertainty? And how do you deal with things that are much more complicated to plan?

So the things that we will do in this segment are things like mapping. What if we gave you a maze -- you, the robot. What if we gave the robot a maze and didn't tell it the structure of the maze? How would it discover the structure? How would it make a map? How would it localize? What if you had a maze -- to make it simple, let's say that I tell you what the maze is. But you wake up -- you're the robot, you wake up, you have no idea where you are. What do you do? How do you figure out where you are? That's a problem we call localization. And then planning. What if you have a really complicated objective? What are the step-by-step things that you could do to get there? Those are the kinds of things we're going to do, and here's a typical kind of problem.
Let's say that the robot starts someplace, and say that it has something in it that lets it know where it is, like GPS. And it knows where it wants to go. Making a plan is not very difficult, right? I'm here and I want to go there; connect with a straight line. And that's what I've done here. The problem is that, unbeknownst to the robot, that path doesn't really work. So on the first step, he thinks he's going to go from here to here in a straight line. The blue represents the path that the robot would like to take, but then on the first step the sonars report that they hit walls. And those show up as the black marks over here. So already it can see that it's not going to be able to do what it wants to do. So it starts to turn, and it finds even more places that don't work. Try again. Try again. Notice that the plan now is, well, I don't know what's going on here, but I certainly can't go through there, so I'm going to have to go around it. Keep trying. Keep trying. Notice the plan.
So he's always making a plan that sort of makes sense. He's using for each plan the information about the walls that he's already figured out. And now he's figured out, well, that didn't work. So now backtrack, try to get out of here.

AUDIENCE: Is he backtracking right now? Or is he--

PROFESSOR: Well, he's going forward. He's making a forward plan. He's saying, OK, now I know all these walls are here, and I'm way down in this corner; how do I get on the other side of that wall? Well, given the information that I know, I'm going to have to go around the known walls.

So my point of showing you this is severalfold. First off, it's uncertain. You didn't know at the outset just how bad the problem was. So there's no way to kind of pre-plan for all of this. Secondly, it's a really hard problem. If you were to think about structuring a program to solve that problem in a kind of high school programming sense -- if this happens then do this, if this happens then do this -- you would have a lot of if statements, right? That's just not the way to do this.
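The plan-move-replan loop the robot is running can be sketched in a few lines. This is only an illustration, not the course's actual planner: the grid world, the `bfs_path` helper, and the sensing model (the robot learns about a wall only when its next step would hit one) are all simplifying assumptions made for the example.

```python
from collections import deque

def bfs_path(start, goal, walls, size):
    """Shortest path on a size-by-size grid avoiding the known walls (BFS)."""
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if (x, y) == goal:
            return path
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < size and 0 <= ny < size \
                    and (nx, ny) not in walls and (nx, ny) not in seen:
                seen.add((nx, ny))
                frontier.append(((nx, ny), path + [(nx, ny)]))
    return None  # no route given what we currently know

def navigate(start, goal, true_walls, size=5):
    """Plan with the walls known so far; replan whenever a step hits a new wall."""
    pos, known_walls = start, set()
    while pos != goal:
        plan = bfs_path(pos, goal, known_walls, size)
        if plan is None:
            return None  # goal unreachable
        nxt = plan[1]
        if nxt in true_walls:    # sonar reports a wall: record it and replan
            known_walls.add(nxt)
        else:                    # the step succeeds
            pos = nxt
    return pos

# A wall segment blocks the straight-line route from (0, 0) to (4, 4),
# so the robot discovers it piece by piece and detours around it.
print(navigate((0, 0), (4, 4), {(2, 0), (2, 1), (2, 2), (2, 3)}))
```

The point of the sketch is that there are no hand-written if statements for particular wall layouts: one generic planner, rerun every time new information arrives, handles whatever the world turns out to contain.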
So what we're going to learn to do in this module is think through much more complicated plans. We're going to be looking at the kinds of plans shown here that just are not practical for: do this until this happens, and then do this until this happens, and then while that's going on, do this. It's just not going to be practical; that's the idea.

So the very first element, the thing that we have to get on top of first, is how to think about uncertainty. And there's a theory for that, and the theory is actually simple, except that it's mind-bogglingly weird -- nobody can get their head around it the first time they see it. It's called probability theory. As you'll see in a minute, the rules are completely trivial. You'll have no trouble with the basic rules. What you will have trouble with -- unless you're a lot different from most people -- is that the first time you see this theory, it's very hard to imagine exactly what's going on. And it's extremely difficult to have an intuition for what's going on. So the theory is going to give us a framework, then, for thinking about uncertainty.
In particular, uncertainty sounds uncertain. What we would like to do is make precise statements about uncertain situations. Sounds contradictory, but we'll do several examples in lecture, and then you'll do a lot more examples in the next week, so that you learn exactly what that means. We would like to draw reliable inferences from unreliable observations. OK, you have a lot of experience with unreliable observations, right? The sonars don't tell you the same thing each time. That's what we'd like to deal with. We would like to be able to take a bunch of different, individually not all that reliable, observations, and come up with a conclusion that's a lot more reliable than any particular observation. And when we're all done with that, what we'd like to do is use this theory to help us design robust systems. Systems that are not fragile. Systems that are not thrown off track by a small feature that was not part of the original formulation of the problem. So that's the goal. And what I'd like to do is start by motivating it with the kind of practical thing to get you thinking.
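The claim that many unreliable observations can support a reliable conclusion can be made concrete with a small calculation. The setup below is my own illustration, not something from the lecture: assume each of n independent readings is correct with probability 0.7, and take the majority answer.

```python
from math import comb

def majority_correct(n, p):
    """Probability that a majority of n independent readings, each correct
    with probability p, gives the right answer (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# One reading is right 70% of the time; the majority of 11 is right ~92%.
for n in (1, 3, 11):
    print(n, round(majority_correct(n, 0.7), 3))
```

Each individual observation stays just as noisy; it is the combination that becomes reliable, which is exactly the kind of precise statement about an uncertain situation the theory is for.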
So here's the game, Let's Make a Deal. I'm going to put 4 LEGO bricks in a bag. OK. LEGO bricks, you've seen those, probably. Bag. The LEGO bricks are white or red. There's only going to be 4, and you're not going to know how many of each there are. Then you get to pull one LEGO brick out, and if you pull a red one out, I'll give you $20. The hitch is, you have to pay me to play this game. So the question is, how much are you willing to pay me to play the game? So, I need a volunteer. I need somebody to take 4 LEGOs and not let me see. OK, please, please. I want you to put in 4 LEGOs, only four. They can be white or red. If you have LEGOs in your pockets that are a different color, don't use them. You're allowed to know what the answer is, but you're not allowed to tell me, or them. So OK, well, come over here. So: bag, LEGOs, hide, put some number in. Oh, no, no, no. Wait, wait, wait. Put them back, put them back. I'm not supposed to see either. OK, I'll go away. OK, 4.
OK, so we'll close the bag, right? And I'll call you back later, but it'll be nearer the end of the hour. So here's 4 LEGOs -- sort of sounds like 4 LEGOs; it's more than one. OK, so how much would you be willing to pay me to play the game?

AUDIENCE: $5.

PROFESSOR: $5. Can I get more? I want to make money. Can I get a higher bid? More than $5.

AUDIENCE: $9.90.

PROFESSOR: How much?

AUDIENCE: $9.90.

PROFESSOR: $9.90, very interesting. Can I get more than $9.90?

AUDIENCE: $9.99 and a half.

PROFESSOR: $9.99 and a half? Magic number.

AUDIENCE: $10.

PROFESSOR: $10. Can I hear even a penny more? A penny more?

AUDIENCE: I'll offer a penny more. You just have to go to the bag.

PROFESSOR: I thought we were being very careful and letting them not know.

AUDIENCE: No, no, no. Aren't you going to put 4 white blocks in all the time?

PROFESSOR: I didn't do it. That person did it.
It wasn't me. I'm innocent. I'm completely fair. Yeah?

AUDIENCE: Are we imagining that you are equally likely to put any number of blocks in? So, are we able to say that she's more likely to put in all white? Because that changes how you calculate it.

PROFESSOR: OK, that's an interesting question. We need a model of a person. That's tricky. OK, I have another idea. Two more volunteers. OK, volunteer, volunteer. Here's the experiment: one person will hold the bag up high, so that the other person can't see it, and the other person -- I didn't look in; notice I'm being very careful, I'm very honest, right? Except for the X-ray vision, which you don't know about. Everything is completely fair. And the little window in the back -- you don't know about that either. So, one person holds it up so the other person can't see in; the other person grabs a LEGO, pulls it out, and lets everybody see that LEGO. It was intended to make it hard to see in. OK, a red one.
OK, that's fine, so we're done.

AUDIENCE: We each should get $20, right?

PROFESSOR: No, no, no. This was a different part of the bet. No, no, no, no, no. Thank you, thank you. Now how much would you pay me to play the game?

AUDIENCE: With that one out?

PROFESSOR: No, we'll put that one back. OK, so this one came out, it was red. Now, without looking, I'm going to stick it back in. OK, so we pulled it out. So what do we know? We know there's at least 1 red. OK, now what are you willing to pay to play the game?

AUDIENCE: $5.

PROFESSOR: $5. Yes? $5.

AUDIENCE: $4.99.

PROFESSOR: $4.99? Wait a minute. Should you be willing to pay more or less? I got it up to $10. Should you be willing to pay more or less now?

AUDIENCE: More.

PROFESSOR: More, why? The same. More. The same.

AUDIENCE: You're assured that there's at least 1 red block.
PROFESSOR: I know that there's at least 1, but didn't I know that before? No. That first person could have loaded it, because I was giving her a cut. I didn't talk about this before. This is not a set-up.

So I want a vote. How many people would give me less than $10? I'm going to give you [UNINTELLIGIBLE] first. 10 to 12. Let's see, 13 to 15, 16 to 18, more than 18. So how many people would give me -- you're only allowed to vote once. Keep in mind that I'm more likely to choose you if you vote high. Right? Vote high. So how many people would give me less than $10 to play the game? A lot, I would say 20%. How many people would give me between $10 and $12? A lot smaller, 5%. How many people would give me between $13 and $15? Even smaller, 2%. How many people would give me between $16 and $18? Wait, these numbers are not going to add up to 100%. OK, we'll learn the theory for how to normalize things in a minute.
OK, so we're down to about 1%. How many people would give me more than $18? One person. Thank you, thank you. So that's 1 in 200, or 0.5%. OK, so what I'd like to do now is go through the theory that's going to let us make a precise calculation for how much a rational person -- not to say that you're not rational -- but how much a rational person might be willing to pay. So that was the set-up; then we'll do the theory; then we'll come back at the end of the hour and see how many people I would have gypped -- made money off of, or whatever.

OK, so we're going to think about probability. And the first idea that we need is set theory, because we're going to think about experiments having outcomes, and we're going to talk about the outcomes being an event. An event is any describable outcome from an experiment. So for example, what if the experiment were to flip 3 coins in sequence. An event could be head, head, head. And you could talk about: was the outcome head, head, head? The event could be head, tail, head. The event could be 1 head and 2 tails.
The event could be: the first toss was a head. So the idea is, there are sets that we're thinking about. And we're going to think about events as possible outcomes being members of sets. There's going to be a special kind of event that we're especially interested in, and that is an atomic event. By which we mean finest grain. Finest grain is kind of an amorphous idea. What it really means is: for the experiment at hand, it doesn't seem to make sense to try to slice the outcome into two smaller units. You keep slicing them down until slicing them into a smaller unit won't affect the outcome. So for example, in the coin toss experiment, I might think that there are 8 atomic events: head, head, head; head, head, tail; head, tail, head; head, tail, tail; blah, blah, blah. So I've ignored some things like, it took 3 minutes to do the first flip, and it took 2 minutes to do the second one. Right? That's the art of figuring out what the atomic units are. So for the class of problems that I'm thinking about, those things can be ignored, so I'm not counting them. But that's an art; that's not really a science.
So you sort of have to use good judgment when you try to figure out what the atomic events are for a particular experiment. Atomic events always have several properties. They are always mutually exclusive: if I know the outcome was atomic event 3, then I know for sure that it was not atomic event 4. And you can see that these events up here don't have those properties, right? So the first toss -- here's an event, head, head, head, which is not mutually exclusive with "the first toss was a head." So atomic events have to be mutually exclusive. Furthermore, if you list all of the atomic events, that set has to be collectively exhaustive. Collectively exhaustive? What buzzwords! OK, that means that you've exhausted all possibilities when you've accounted for the collective behaviors of all the atomic events. And we have a very special name for that, because it comes up over, and over, and over again. The set of atomic events, the maximum set of atomic events, is called the sample space.
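Both properties can be checked mechanically for the 3-coin experiment. A minimal sketch (representing each atomic event as a tuple of 'H'/'T' is just one convenient choice, not anything the lecture prescribes):

```python
from itertools import product

# Sample space: all 8 atomic events for flipping 3 coins in sequence.
sample_space = set(product('HT', repeat=3))
print(sorted(sample_space))

# Atomic events are mutually exclusive by construction (distinct tuples),
# and collectively exhaustive: every possible 3-flip outcome is listed.
assert len(sample_space) == 2 ** 3

# Non-atomic events are just subsets of the sample space.
first_toss_head = {o for o in sample_space if o[0] == 'H'}
one_head_two_tails = {o for o in sample_space if o.count('H') == 1}

# "Head, head, head" is NOT mutually exclusive with "first toss was a head":
assert ('H', 'H', 'H') in first_toss_head
```

Note that "it took 3 minutes to do the first flip" simply has no representation here; choosing what the tuples record is exactly the modeling art described above.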
So the first thing we need to know, when we're thinking about probability theory, is how to chunk outcomes into a sample space. The second thing we need to know is the rules of probability. These are the things that are so absurdly simple that everybody who sees them immediately comes to the conclusion that probability theory is trivial; they then don't do anything until the next exam, and then they don't have a clue what we're asking. Because it's subtle; it's more subtle than you might think. Here are the rules. Probabilities are real numbers that are not negative. Pretty easy. Probabilities have the feature that the probability of the sample space is 1. That's really just scaling. That's really just telling me how big all the numbers are. So if I enumerate all the possible atomic events, the probability of having one of those as the outcome of an experiment -- that probability is 1. Doesn't seem like I said much, and I'm already 2/3 of the way through the list. Yes?

AUDIENCE: Doesn't that just mean that something happened?

PROFESSOR: Something happened, yes.
And we are going to say that this certain event has probability 1. All probabilities are real, all probabilities are at least 0, and the probability of the certain event -- written here as the universe, the sample space -- the probability of some element in the sample space is 1. The only one that's terribly interesting is additivity: if the intersection between A and B is empty, the probability of the union is the sum of the probabilities of the individual events. Astonishingly, I'm done. And this doesn't alter the fact that people are still, to this day, doing fundamental research in probability theory. There are many subjects in probability theory, including many highly advanced graduate subjects, all of which derive from these three rules. It's absurd how unintuitive things can be, given such simple beginnings. Just as an idea: you can prove all of the interesting results from probability theory -- you can prove all results from probability theory -- with these three rules, and here's just one example.
458 00:22:22,560 --> 00:22:29,120 If the intersection of A and B were not empty, you can still 459 00:22:29,120 --> 00:22:32,740 compute the probability of the union, it's just more 460 00:22:32,740 --> 00:22:35,550 complicated than if they were empty, if the intersection 461 00:22:35,550 --> 00:22:36,510 were empty. 462 00:22:36,510 --> 00:22:38,820 Generally speaking, the probability of the union of A 463 00:22:38,820 --> 00:22:41,920 and B is the probability of A plus the probability of B, 464 00:22:41,920 --> 00:22:44,560 minus the probability of the intersection. 465 00:22:44,560 --> 00:22:46,890 And you can sort of see why that ought to be true, if you 466 00:22:46,890 --> 00:22:49,540 think about a Venn diagram. 467 00:22:49,540 --> 00:22:54,110 If you think about the odds of having A in the universe-- 468 00:22:54,110 --> 00:22:56,160 the universe is the sample space-- 469 00:22:56,160 --> 00:22:58,620 probability of having some event A, the probability of 470 00:22:58,620 --> 00:23:00,980 having some event B, the probability of their 471 00:23:00,980 --> 00:23:02,530 intersection. 472 00:23:02,530 --> 00:23:05,270 If you were to just add the probability of A and B, you 473 00:23:05,270 --> 00:23:08,600 doubly count the intersection. 474 00:23:08,600 --> 00:23:11,890 You don't want to double count it, you want to count it once. 475 00:23:11,890 --> 00:23:13,150 So you have to subtract one off. 476 00:23:13,150 --> 00:23:15,940 So that's sort of what's going on. 477 00:23:15,940 --> 00:23:19,910 OK, as I said the theory is very simple. 478 00:23:19,910 --> 00:23:23,140 But let's make sure that you've got the basics first. 479 00:23:23,140 --> 00:23:27,670 So experiment, I'm going to roll a fair, 6-sided die. 480 00:23:27,670 --> 00:23:32,010 And I'm going to count as the outcome the number of dots on 481 00:23:32,010 --> 00:23:34,930 the top surface, not surprisingly. 
482 00:23:34,930 --> 00:23:37,990 Find the probability that the roll is odd, and greater than 483 00:23:37,990 --> 00:23:40,020 3. You have 10 seconds. 484 00:23:56,840 --> 00:23:58,050 OK, 10 seconds are up. 485 00:23:58,050 --> 00:23:59,650 What's the answer? (1), (2), (3), (4) or (5)? 486 00:23:59,650 --> 00:24:00,530 Raise your hands. 487 00:24:00,530 --> 00:24:01,510 Excellent, wonderful. 488 00:24:01,510 --> 00:24:03,840 The answer is (1). 489 00:24:03,840 --> 00:24:06,830 The way I want you to think about that is in terms of the 490 00:24:06,830 --> 00:24:09,500 theory that we just generated because it's useful for 491 00:24:09,500 --> 00:24:12,170 developing the answers to more complicated questions. 492 00:24:12,170 --> 00:24:15,620 In terms of the theory, what we will always do, the process 493 00:24:15,620 --> 00:24:21,060 that always works, is enumerate the sample space. 494 00:24:21,060 --> 00:24:21,680 What's that mean? 495 00:24:21,680 --> 00:24:26,790 That means identify all of the atomic events. 496 00:24:26,790 --> 00:24:30,440 The atomic events here are the faces that show: 497 00:24:30,440 --> 00:24:32,970 1, 2, 3, 4, 5, 6. 498 00:24:32,970 --> 00:24:36,140 Enumerate the sample space. 499 00:24:36,140 --> 00:24:40,950 And then find the event of interest. 500 00:24:40,950 --> 00:24:44,220 So here the event was a compound event. 501 00:24:44,220 --> 00:24:46,330 The result is odd and greater than 3. 502 00:24:46,330 --> 00:24:50,950 Odd, well that's 1, 3, 5, shown by the check marks. 503 00:24:50,950 --> 00:24:55,920 Bigger than 3, that's the bottom 3 check marks. 504 00:24:55,920 --> 00:24:57,830 If it's going to be both, then you have to look where there's 505 00:24:57,830 --> 00:25:01,605 overlap, and that only happens for the outcome 5. 506 00:25:01,605 --> 00:25:05,440 Since there's only one, and fair meant that these 507 00:25:05,440 --> 00:25:06,710 probabilities were the same. 
508 00:25:06,710 --> 00:25:09,810 If you think through the fundamental axioms of 509 00:25:09,810 --> 00:25:15,270 probability, if they're equal, they're all non-negative real 510 00:25:15,270 --> 00:25:20,320 numbers, and they sum to 1, then they are all 1/6. 511 00:25:20,320 --> 00:25:23,180 So the answer is 1/6, right? 512 00:25:23,180 --> 00:25:24,430 OK, that was easy. 513 00:25:26,430 --> 00:25:32,880 The rule that is most interesting for us happens, 514 00:25:32,880 --> 00:25:35,930 not surprisingly, to also be the one that people have the 515 00:25:35,930 --> 00:25:38,560 most trouble with. 516 00:25:38,560 --> 00:25:40,980 Not excluding the people who originally 517 00:25:40,980 --> 00:25:42,640 invented the theory. 518 00:25:42,640 --> 00:25:44,600 The theory goes back to Laplace. 519 00:25:44,600 --> 00:25:46,900 A bunch of people back then were absolutely brilliant 520 00:25:46,900 --> 00:25:49,150 mathematicians, and still it took a while to 521 00:25:49,150 --> 00:25:50,480 formulate this rule. 522 00:25:50,480 --> 00:25:52,060 It was formulated by a guy named Bayes. 523 00:25:55,030 --> 00:25:59,390 Bayes' theorem gives us a way to think about conditional 524 00:25:59,390 --> 00:26:01,490 probability. 525 00:26:01,490 --> 00:26:06,825 What if I tell you, in some sample space, B happened? 526 00:26:10,140 --> 00:26:14,270 How should you relabel the probabilities to take that 527 00:26:14,270 --> 00:26:16,770 into account? 528 00:26:16,770 --> 00:26:20,650 Bayes' rule is trivial, it says, if I know B happened, 529 00:26:20,650 --> 00:26:24,080 what is the probability that A occurs, given 530 00:26:24,080 --> 00:26:25,330 that I know B happens? 531 00:26:27,900 --> 00:26:30,630 And the rule is, you find the probability of the 532 00:26:30,630 --> 00:26:31,020 intersection. 533 00:26:31,020 --> 00:26:32,736 AUDIENCE: How do you do that? 534 00:26:32,736 --> 00:26:35,720 PROFESSOR: We'll do some examples. 
535 00:26:35,720 --> 00:26:37,900 So we need to find the probability of the 536 00:26:37,900 --> 00:26:40,200 intersection, and then we have to find the probability of B 537 00:26:40,200 --> 00:26:42,200 occurring, and then we normalize-- 538 00:26:42,200 --> 00:26:44,680 a word I used before, and that's exactly what we need to 539 00:26:44,680 --> 00:26:47,220 do to that distribution-- 540 00:26:47,220 --> 00:26:54,110 we normalize the intersection by the probability of B. 541 00:26:54,110 --> 00:26:57,330 That's an interesting rule. 542 00:26:57,330 --> 00:26:59,310 It's the kind of thing we're going to want to know about. 543 00:26:59,310 --> 00:27:01,450 We're going to want to know-- 544 00:27:01,450 --> 00:27:03,780 OK, I'm a robot. 545 00:27:03,780 --> 00:27:04,360 I'm in a space. 546 00:27:04,360 --> 00:27:06,890 I don't know where I am. 547 00:27:06,890 --> 00:27:09,910 I have some a priori probability idea about where I 548 00:27:09,910 --> 00:27:14,810 am, so I think I'm 1/20 likely to be here, I'm 1/20 likely to 549 00:27:14,810 --> 00:27:17,100 be there, et cetera, et cetera. 550 00:27:17,100 --> 00:27:22,775 And then I find out the sonars told me that I'm 0.03 meters 551 00:27:22,775 --> 00:27:26,760 -- no it can't be that small, 0.72 meters from a wall. 552 00:27:26,760 --> 00:27:32,960 Well, how do I take into account this new information 553 00:27:32,960 --> 00:27:36,640 to update my probabilities for where I might be? 554 00:27:36,640 --> 00:27:40,120 That's what this rule is good for. 555 00:27:40,120 --> 00:27:41,600 So here's a picture. 556 00:27:41,600 --> 00:27:45,060 The way to think about the rule is if I condition on B, 557 00:27:45,060 --> 00:27:51,510 if I tell you B happened, that's equivalent to shrinking 558 00:27:51,510 --> 00:27:54,510 the universe -- 559 00:27:54,510 --> 00:27:56,630 the universe U, the square. 560 00:27:56,630 --> 00:28:00,420 That's everything that can happen. 
561 00:28:00,420 --> 00:28:03,640 Inside the universe, there's this event A and it does not 562 00:28:03,640 --> 00:28:04,890 occupy the entire universe. 563 00:28:07,700 --> 00:28:10,420 There is a fraction of outcomes that belong logically 564 00:28:10,420 --> 00:28:14,240 in not A. OK? 565 00:28:14,240 --> 00:28:19,750 That's the part that's in U but not in A. Similarly with 566 00:28:19,750 --> 00:28:24,840 B. Similarly there's some region, there's some part of 567 00:28:24,840 --> 00:28:28,790 the universe where both A and B occur, the intersection of 568 00:28:28,790 --> 00:28:30,570 the two occurred. 569 00:28:30,570 --> 00:28:37,600 So what Bayes' theorem says is, if I tell you B occurred, 570 00:28:37,600 --> 00:28:41,870 all this part of the universe outside of B is irrelevant. 571 00:28:41,870 --> 00:28:44,410 As far as you're concerned, B's the new universe. 572 00:28:48,400 --> 00:28:53,340 Notice that if B is the new universe, then the 573 00:28:53,340 --> 00:28:54,320 intersection-- 574 00:28:54,320 --> 00:28:56,055 which is the part where A occurred-- 575 00:28:59,800 --> 00:29:06,070 is bigger after the conditioning than it was 576 00:29:06,070 --> 00:29:08,360 before the conditioning. 577 00:29:08,360 --> 00:29:11,060 Before the conditioning the universe was this big, now the 578 00:29:11,060 --> 00:29:13,250 universe is this big. 579 00:29:13,250 --> 00:29:18,410 The universe is smaller, so this region of overlap 580 00:29:18,410 --> 00:29:23,160 occupies a greater part of the new universe. 581 00:29:23,160 --> 00:29:24,460 Is that clear? 582 00:29:24,460 --> 00:29:27,820 So when you condition, you're really making the universe 583 00:29:27,820 --> 00:29:31,890 smaller, and the relative likelihood of things that are 584 00:29:31,890 --> 00:29:34,315 still in the universe seems bigger. 
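[Supplementary sketch, not part of the lecture: the shrink-and-rescale picture can be checked with a few lines of Python. The set-of-outcomes representation and the helper names `prob` and `conditional` are my own, not the lecture's.]

```python
from fractions import Fraction

# Sample space for one roll of a fair 6-sided die: six atomic
# events, equally likely, and their probabilities sum to 1.
sample_space = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Probability of an event, represented as a set of atomic outcomes."""
    return sum((sample_space[o] for o in event), Fraction(0))

def conditional(event_a, event_b):
    """P(A | B): shrink the universe to B, then rescale by P(B)."""
    return prob(event_a & event_b) / prob(event_b)

odd = {1, 3, 5}
greater_than_3 = {4, 5, 6}
even = {2, 4, 6}

# The compound event from earlier: odd AND greater than 3.
print(prob(odd & greater_than_3))             # 1/6

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B).
print(prob(odd | greater_than_3))             # 5/6
print(prob(odd) + prob(greater_than_3)
      - prob(odd & greater_than_3))           # also 5/6

# Conditioning rescales: P(even | greater than 3) = (1/3) / (1/2).
print(conditional(even, greater_than_3))      # 2/3
```

[Note that `conditional` is nothing but a division by P(B): the intersection is unchanged, only the normalization shrinks.]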
585 00:29:37,020 --> 00:29:40,040 So what's the conditional probability of getting a die 586 00:29:40,040 --> 00:29:45,970 roll greater than 3, given that it was odd? 587 00:29:45,970 --> 00:29:48,040 Calculate, you have 30 seconds. 588 00:29:48,040 --> 00:29:49,290 This is three times harder. 589 00:30:23,850 --> 00:30:28,340 OK, what's the probability of getting a die roll greater 590 00:30:28,340 --> 00:30:30,910 than 3, given that the die roll was odd? 591 00:30:30,910 --> 00:30:32,540 Everybody raise your hands. 592 00:30:32,540 --> 00:30:36,980 And it's a landslide, the answer is (2). 593 00:30:36,980 --> 00:30:40,790 You roughly do the same thing we did before, except now the 594 00:30:40,790 --> 00:30:43,140 math is incrementally harder because you 595 00:30:43,140 --> 00:30:45,120 have to do a divide. 596 00:30:45,120 --> 00:30:48,850 So we think about the same two events, the event that it is 597 00:30:48,850 --> 00:30:51,480 odd and the event that it's bigger than 3, and now we ask 598 00:30:51,480 --> 00:30:52,190 the question. 599 00:30:52,190 --> 00:30:56,590 If it were odd, what's the likelihood that it's 600 00:30:56,590 --> 00:30:57,780 greater than 3? 601 00:30:57,780 --> 00:30:59,840 Before I did the conditioning, what was the likelihood that 602 00:30:59,840 --> 00:31:01,090 it was bigger than 3? 603 00:31:03,905 --> 00:31:05,300 AUDIENCE: 1/6 604 00:31:05,300 --> 00:31:07,160 PROFESSOR: Nope. 605 00:31:07,160 --> 00:31:09,356 1/2. 606 00:31:09,356 --> 00:31:12,000 So bigger than 3 is 4, 5, or 6 -- 607 00:31:12,000 --> 00:31:12,810 right? 608 00:31:12,810 --> 00:31:16,190 There are 3 atomic events there. 609 00:31:16,190 --> 00:31:18,300 There are 6 atomic events to start with. 610 00:31:18,300 --> 00:31:19,620 They are equally likely. 611 00:31:19,620 --> 00:31:21,870 So before I did the conditioning, the event of 612 00:31:21,870 --> 00:31:24,960 interest had a probability of 1/2. 
613 00:31:24,960 --> 00:31:28,920 After I do the conditioning, I know that half of the possible 614 00:31:28,920 --> 00:31:30,090 samples didn't happen. 615 00:31:30,090 --> 00:31:33,240 The universe shrank. 616 00:31:33,240 --> 00:31:36,310 Instead of having a sample space with 6, I now have a 617 00:31:36,310 --> 00:31:39,600 sample space with 3. 618 00:31:39,600 --> 00:31:42,550 Similarly the probability law changed. 619 00:31:42,550 --> 00:31:47,480 So now the event of interest is bigger than 3, but bigger 620 00:31:47,480 --> 00:31:51,080 than 3 now only happens once. 621 00:31:51,080 --> 00:31:55,830 So what I need to do is rescale my probabilities. 622 00:31:55,830 --> 00:31:59,070 Remember the scaling rule, one of the fundamental properties 623 00:31:59,070 --> 00:31:59,790 of probability. 624 00:31:59,790 --> 00:32:01,230 The scaling rule said the sum of the 625 00:32:01,230 --> 00:32:03,420 probabilities must be 1. 626 00:32:03,420 --> 00:32:04,660 After I've conditioned, the sum of the 627 00:32:04,660 --> 00:32:07,020 probabilities is a 1/2. 628 00:32:07,020 --> 00:32:08,510 That's not good. 629 00:32:08,510 --> 00:32:11,170 I've got to fix it. 630 00:32:11,170 --> 00:32:18,140 So the way to think about Bayes' rule is, if all I know 631 00:32:18,140 --> 00:32:22,040 is it the universe got smaller, how 632 00:32:22,040 --> 00:32:25,260 should I redo the scaling? 633 00:32:25,260 --> 00:32:31,560 Well if all I've told you is that the answer is odd, then 634 00:32:31,560 --> 00:32:33,980 there are three possibilities. 635 00:32:33,980 --> 00:32:38,750 Before I told you that the answer was odd, they were 636 00:32:38,750 --> 00:32:40,250 equally likely. 637 00:32:40,250 --> 00:32:42,730 After I tell you that they're odd, has it changed the fact 638 00:32:42,730 --> 00:32:45,750 that they're equally likely? 639 00:32:45,750 --> 00:32:46,570 No. 640 00:32:46,570 --> 00:32:51,810 They're still equally likely even under that new condition. 
641 00:32:51,810 --> 00:32:55,470 I haven't changed their individual probabilities. 642 00:32:55,470 --> 00:32:59,750 So they started out equally likely, they're still equally 643 00:32:59,750 --> 00:33:03,310 likely, they just don't sum to 1 anymore. 644 00:33:03,310 --> 00:33:07,950 Bayes' rule says, make them sum to 1. 645 00:33:07,950 --> 00:33:11,300 OK, so the way I make this sum sum to 1 is 646 00:33:11,300 --> 00:33:13,330 to divide by 1/2. 647 00:33:13,330 --> 00:33:16,840 If you divide 1/6 by 1/2, you get 1/3. 648 00:33:16,840 --> 00:33:21,180 Notice that the probability that it's bigger than 3 went 649 00:33:21,180 --> 00:33:24,890 from 1/2 to 1/3. 650 00:33:24,890 --> 00:33:26,140 It got smaller. 651 00:33:29,020 --> 00:33:33,640 It could have gone either way. 652 00:33:33,640 --> 00:33:40,760 So, think about what happens when the world shrinks, when 653 00:33:40,760 --> 00:33:42,140 the universe gets smaller, when I 654 00:33:42,140 --> 00:33:44,940 tell you that B happened. 655 00:33:44,940 --> 00:33:49,510 Well when I tell you that B happened, then I ask you 656 00:33:49,510 --> 00:33:52,890 whether A happened, here I'm showing a picture that in the 657 00:33:52,890 --> 00:33:55,780 original universe A and B sort of covered the 658 00:33:55,780 --> 00:33:57,720 same amount of area. 659 00:33:57,720 --> 00:34:00,000 By which I mean, they're about equally likely. 660 00:34:03,310 --> 00:34:06,003 Before I did the conditioning, the probability of A was about 661 00:34:06,003 --> 00:34:10,480 the same size as the probability of B. What happens 662 00:34:10,480 --> 00:34:12,070 when I condition? 663 00:34:12,070 --> 00:34:19,280 Well, when I condition now the universe is B. But notice the 664 00:34:19,280 --> 00:34:21,320 way I've drawn them, there's very little overlap. 665 00:34:21,320 --> 00:34:27,370 So now when I condition on B, the odds that I'm in A seem 666 00:34:27,370 --> 00:34:28,620 to have gotten smaller. 
667 00:34:31,199 --> 00:34:36,330 Rather than being of equal probability, as I show here, 668 00:34:36,330 --> 00:34:40,370 after the conditioning the relative likelihood of being 669 00:34:40,370 --> 00:34:43,719 in event A is smaller than it used to be. 670 00:34:43,719 --> 00:34:48,260 But that's entirely because of the way I rigged the circles. 671 00:34:48,260 --> 00:34:50,300 I could have rigged the circles to have a large amount 672 00:34:50,300 --> 00:34:51,550 of overlap. 673 00:34:54,360 --> 00:34:58,400 Then when I condition, it seems as though it's 674 00:34:58,400 --> 00:35:04,420 relatively more likely that I'm in the event A. That's 675 00:35:04,420 --> 00:35:06,390 what we mean by the conditioning. 676 00:35:06,390 --> 00:35:11,760 The conditioning can give you un-intuitive insight. 677 00:35:11,760 --> 00:35:15,550 Because when you condition, probabilities can get bigger 678 00:35:15,550 --> 00:35:17,400 or smaller. 679 00:35:17,400 --> 00:35:19,890 And that's something that sort of at a gut level, we all have 680 00:35:19,890 --> 00:35:21,140 trouble dealing with. 681 00:35:23,620 --> 00:35:27,210 OK, so those are the fundamental ideas, right? 682 00:35:27,210 --> 00:35:30,630 We've talked about events. 683 00:35:30,630 --> 00:35:34,910 Three axioms of probability that are completely trivial. 684 00:35:34,910 --> 00:35:43,470 One not-quite-so-trivial rule, which is Bayes' rule. 685 00:35:43,470 --> 00:35:46,280 In order to apply it, there are two more things we need to 686 00:35:46,280 --> 00:35:46,760 talk about. 687 00:35:46,760 --> 00:35:48,736 The first is notation. 688 00:35:51,300 --> 00:35:54,190 We could do the entire rest of the course using the notation 689 00:35:54,190 --> 00:35:57,700 that I showed so far, drawing circles on the blackboard, it 690 00:35:57,700 --> 00:35:59,090 would work. 691 00:35:59,090 --> 00:36:02,150 It would not be very convenient. 
692 00:36:02,150 --> 00:36:06,010 So to better take advantage of math, which is a very concise 693 00:36:06,010 --> 00:36:10,630 way to write things down, we will define a new notion, which 694 00:36:10,630 --> 00:36:13,370 is a random variable. 695 00:36:13,370 --> 00:36:18,390 A random variable is just like a variable, except, shockingly, 696 00:36:18,390 --> 00:36:21,200 it's random. 697 00:36:21,200 --> 00:36:24,350 So where we would normally think of a variable as 698 00:36:24,350 --> 00:36:29,480 representing a number, a random variable represents a 699 00:36:29,480 --> 00:36:30,730 distribution. 700 00:36:33,150 --> 00:36:38,300 So we could, for example in the die rolling case, we could 701 00:36:38,300 --> 00:36:46,660 say the sample space has 6 atomic events, and I could 702 00:36:46,660 --> 00:36:49,450 think about it as 6 circles. 703 00:36:49,450 --> 00:36:51,770 Circles wouldn't pack all that well. 704 00:36:51,770 --> 00:36:55,150 6 squares inside the universe, right? 705 00:36:55,150 --> 00:36:57,980 Because they are mutually exclusive, and collectively 706 00:36:57,980 --> 00:37:00,130 exhaustive, so if I started with a universe that looked 707 00:37:00,130 --> 00:37:03,920 like that, this one would be the probability 708 00:37:03,920 --> 00:37:09,730 that the number of dots was 1, then 2, then 3, and it has to fill up by the 709 00:37:09,730 --> 00:37:11,760 time I've put 6 of them in there. 710 00:37:11,760 --> 00:37:14,810 And they have to not overlap. 711 00:37:14,810 --> 00:37:18,660 A more convenient notation is to say, OK, let's let X 712 00:37:18,660 --> 00:37:19,910 represent that outcome. 713 00:37:27,190 --> 00:37:29,150 So I can label the events with math. 
714 00:37:29,150 --> 00:37:33,900 I can say, there's the event X equals 1, the event X equals 715 00:37:33,900 --> 00:37:38,950 2, the event X equals 3, and it just makes it much easier 716 00:37:38,950 --> 00:37:41,770 to write down the possibilities than to try to 717 00:37:41,770 --> 00:37:44,380 draw pictures with Venn diagrams all the time. 718 00:37:44,380 --> 00:37:48,370 So all we're doing here is introducing a mathematical 719 00:37:48,370 --> 00:37:52,080 representation for the same thing we talked about before. 720 00:37:52,080 --> 00:38:00,750 But among the things that you can do, after you've 721 00:38:00,750 --> 00:38:03,450 formalized this, so you can have a random variable, then 722 00:38:03,450 --> 00:38:06,070 it's a very small jump to say you can have a 723 00:38:06,070 --> 00:38:09,260 multi-dimensional random variable. 724 00:38:09,260 --> 00:38:11,620 Let's just for example have a 2-space. 725 00:38:11,620 --> 00:38:13,730 X and Y, for example. 726 00:38:13,730 --> 00:38:20,400 So now we can talk very conveniently about situations 727 00:38:20,400 --> 00:38:21,990 that factor. 728 00:38:21,990 --> 00:38:30,450 So, for example when I think about flipping 3 coins, I can 729 00:38:30,450 --> 00:38:35,500 think about that as a multivariate random variable 730 00:38:35,500 --> 00:38:36,960 in three dimensions. 731 00:38:36,960 --> 00:38:40,120 One dimension represents the outcome of the first die-- 732 00:38:40,120 --> 00:38:44,160 the first coin toss. 733 00:38:44,160 --> 00:38:45,830 Another dimension is the second, the third 734 00:38:45,830 --> 00:38:47,630 dimension is the third. 735 00:38:47,630 --> 00:38:49,870 So there is a very convenient way of talking about it, and 736 00:38:49,870 --> 00:38:51,870 we have a more concise notation. 737 00:38:51,870 --> 00:38:57,180 We say, OK let V be the outcome of the first die roll, 738 00:38:57,180 --> 00:38:58,400 or whatever. 
739 00:38:58,400 --> 00:39:01,480 Let W be the second one, and then we can think about the 740 00:39:01,480 --> 00:39:05,350 joint probability distribution, in terms of the 741 00:39:05,350 --> 00:39:07,920 multi-dimensional random variable. 742 00:39:07,920 --> 00:39:12,680 So we have the random variable defined by V and W. To 743 00:39:12,680 --> 00:39:16,072 make things easy for you to know 744 00:39:16,072 --> 00:39:18,120 what we're talking about, we'll generally try to remember 745 00:39:18,120 --> 00:39:20,430 to capitalize things when we're talking about random 746 00:39:20,430 --> 00:39:23,390 variables, and then we'll use small letters to talk 747 00:39:23,390 --> 00:39:26,230 about events. 748 00:39:26,230 --> 00:39:29,330 So this notation would represent the probability that 749 00:39:29,330 --> 00:39:32,510 V took on the value little v, and W took on the 750 00:39:32,510 --> 00:39:34,300 value little w. 751 00:39:34,300 --> 00:39:36,110 We'll see examples of this in a minute. 752 00:39:36,110 --> 00:39:39,290 So the idea is-- you don't need to do this, it's just a 753 00:39:39,290 --> 00:39:41,210 convenient notation to write more 754 00:39:41,210 --> 00:39:44,085 complicated things concisely. 755 00:39:48,180 --> 00:39:52,830 Now a concept that's very easy to talk about, now that we have 756 00:39:52,830 --> 00:39:55,950 random variables, is reducing dimensionality. 757 00:39:55,950 --> 00:39:58,750 And in fact, we will constantly reduce the 758 00:39:58,750 --> 00:40:01,850 dimensionality of complicated problems that are represented 759 00:40:01,850 --> 00:40:06,670 by multiple dimensions, to smaller dimensional problems. 760 00:40:06,670 --> 00:40:08,650 And we'll talk about two ways of doing that. 761 00:40:08,650 --> 00:40:11,570 The first is what we will call marginalizing. 762 00:40:11,570 --> 00:40:15,650 Marginalizing means, I don't care what happened in the 763 00:40:15,650 --> 00:40:18,560 other dimensions. 
764 00:40:18,560 --> 00:40:22,240 So if I have a probability rule that told me, for 765 00:40:22,240 --> 00:40:27,850 example, about the outcome of one toss of a fair die, and a 766 00:40:27,850 --> 00:40:32,720 second toss of a fair die, and if I tell you the joint 767 00:40:32,720 --> 00:40:36,600 probability space for that, right? 768 00:40:36,600 --> 00:40:39,690 So I would have 6 outcomes on one dimension, 6 outcomes on 769 00:40:39,690 --> 00:40:42,330 another dimension, let's say they're all equally likely. 770 00:40:42,330 --> 00:40:46,710 I have 36 points altogether, if they're all equally likely, 771 00:40:46,710 --> 00:40:50,640 then my probability law is a joint distribution. 772 00:40:50,640 --> 00:40:54,220 The joint distribution has 32 non-zero points and each point 773 00:40:54,220 --> 00:40:56,700 has a height of-- 774 00:40:56,700 --> 00:40:57,580 I said the right thing, right? 775 00:40:57,580 --> 00:41:00,090 36 is what I meant to say. 776 00:41:00,090 --> 00:41:02,160 My brain is telling me that I might not have said that. 777 00:41:02,160 --> 00:41:04,920 I meant 36. 778 00:41:04,920 --> 00:41:10,900 So if I have 36 equally likely events, how high is each one? 779 00:41:10,900 --> 00:41:12,470 1/36. 780 00:41:12,470 --> 00:41:19,440 OK, so the joint probability space for two tosses of a fair 781 00:41:19,440 --> 00:41:24,040 6-sided die, is this 6-by-6 space. 782 00:41:24,040 --> 00:41:26,440 And I may be interested in marginalizing. 783 00:41:26,440 --> 00:41:28,270 Marginalizing would mean, I don't care what 784 00:41:28,270 --> 00:41:30,590 the second one was. 785 00:41:30,590 --> 00:41:33,520 OK well, how do you infer the rule for the first one from 786 00:41:33,520 --> 00:41:38,140 the joint, if I don't care what the second one was? Well, 787 00:41:38,140 --> 00:41:39,390 you sum out the second. 788 00:41:42,260 --> 00:41:45,790 So if I have this 2-space that represented the 789 00:41:45,790 --> 00:41:47,040 first and the second. 
790 00:41:51,050 --> 00:41:52,730 So, say it's X and Y, for example. 791 00:41:52,730 --> 00:41:59,520 So, I've got 6 points that represent 1, 2, 3, 4, 5, 6. 792 00:41:59,520 --> 00:42:04,260 And then 6 this way, that sort of thing, except now I have to 793 00:42:04,260 --> 00:42:06,830 draw in tediously all of the others, right? 794 00:42:06,830 --> 00:42:10,290 So you get the idea. 795 00:42:10,290 --> 00:42:18,150 Each one of the X's represents a point with probability 1/36, 796 00:42:18,150 --> 00:42:21,890 and imagine that in each direction they're all in straight lines. 797 00:42:21,890 --> 00:42:25,400 Now if I didn't care what the second one was, how would I 798 00:42:25,400 --> 00:42:28,340 find the rule for the first one? Well, I just sum over the 799 00:42:28,340 --> 00:42:28,760 second one. 800 00:42:28,760 --> 00:42:31,210 So, say I'm only interested in what happened in the first 801 00:42:31,210 --> 00:42:35,310 one, well I would ascribe all of the probabilities here to 802 00:42:35,310 --> 00:42:36,610 that point. 803 00:42:36,610 --> 00:42:41,700 I would sum out the one that I don't care about. 804 00:42:41,700 --> 00:42:42,510 That's obvious, right? 805 00:42:42,510 --> 00:42:47,530 Because if I marginalize, these X's that all represent 806 00:42:47,530 --> 00:42:50,500 the number 1/36 have to turn into a single-dimension axis, 807 00:42:50,500 --> 00:42:54,780 which is just X, and they have to be 6 numbers that 808 00:42:54,780 --> 00:42:57,770 are each how high? 809 00:42:57,770 --> 00:42:59,620 1/6, right? 810 00:42:59,620 --> 00:43:03,310 So the way I get 6 numbers that are each 1/6, when I 811 00:43:03,310 --> 00:43:08,160 started with 36 numbers that were each 1/36, is to use a sum. 812 00:43:08,160 --> 00:43:10,440 OK, so that's called marginalization. 813 00:43:10,440 --> 00:43:12,500 The other thing that I can do is condition. 
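[Supplementary sketch, not part of the lecture: the two-dice marginalization just described takes only a few lines of Python. `Fraction` keeps the 1/36 entries exact; the variable names are my own.]

```python
from collections import defaultdict
from fractions import Fraction

# Joint distribution for two tosses of a fair 6-sided die:
# 36 equally likely points, each with probability 1/36.
joint = {(x, y): Fraction(1, 36)
         for x in range(1, 7) for y in range(1, 7)}

# Marginalize: "I don't care what the second one was" --
# sum out Y, collapsing each column of six points onto its X value.
marginal_x = defaultdict(Fraction)
for (x, y), p in joint.items():
    marginal_x[x] += p

print(sorted(marginal_x.items()))  # each of the 6 faces ends up at 1/6
```

[The 36 numbers that were each 1/36 turn into 6 numbers that are each 1/6, exactly the sum described above.]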
814 00:43:12,500 --> 00:43:21,090 I can tell you something about the sample space and ask you 815 00:43:21,090 --> 00:43:24,700 to figure out a conditional probability. 816 00:43:24,700 --> 00:43:31,690 So I might tell you, what's the probability rule for Y 817 00:43:31,690 --> 00:43:35,630 conditioned on the first one being 3? 818 00:43:35,630 --> 00:43:36,960 OK. 819 00:43:36,960 --> 00:43:39,700 Mathematically that's a different problem, that's a 820 00:43:39,700 --> 00:43:44,580 re-scale problem, because that's Bayes' rule. 821 00:43:44,580 --> 00:43:48,190 So generally if I carved out by conditioning some fraction 822 00:43:48,190 --> 00:43:51,500 of the sample space, the way you would compute the new 823 00:43:51,500 --> 00:43:53,840 probabilities would be to re-scale. 824 00:43:53,840 --> 00:43:56,360 So there are two operations that we will do. 825 00:43:56,360 --> 00:43:58,890 We will marginalize, which means summing out. 826 00:43:58,890 --> 00:44:03,690 And we will condition, which means re-scale. 827 00:44:03,690 --> 00:44:04,860 OK. 828 00:44:04,860 --> 00:44:07,540 So to get some practice at that, let's think 829 00:44:07,540 --> 00:44:12,130 about a tangible problem. 830 00:44:12,130 --> 00:44:14,590 Example: prevalence and testing for AIDS. 831 00:44:14,590 --> 00:44:19,690 Consider the effectiveness of a test for AIDS. 832 00:44:19,690 --> 00:44:22,490 This is real data. 833 00:44:22,490 --> 00:44:24,390 Data from the United States. 834 00:44:24,390 --> 00:44:28,060 So imagine that we take a population, representative of 835 00:44:28,060 --> 00:44:31,070 the population in the United States, and classify every 836 00:44:31,070 --> 00:44:36,600 individual as having AIDS or not, and being diagnosed 837 00:44:36,600 --> 00:44:39,660 according to some test as positive or negative. 838 00:44:42,580 --> 00:44:45,090 OK, two dimensional. 
839 00:44:45,090 --> 00:44:48,630 The two dimensions are: what was the value of 840 00:44:48,630 --> 00:44:49,880 AIDS, true or false? 841 00:44:52,700 --> 00:44:57,312 And what's the value of the test, positive or negative? 842 00:44:57,312 --> 00:45:01,750 So we've divided the population into four pieces. 843 00:45:01,750 --> 00:45:04,770 And by using the idea of relative frequency, I've 844 00:45:04,770 --> 00:45:07,450 written probabilities here. 845 00:45:07,450 --> 00:45:12,300 So what's the probability of choosing, by random choice, an 846 00:45:12,300 --> 00:45:17,290 individual that has AIDS and tested positive? 847 00:45:17,290 --> 00:45:21,090 OK, so that's 0.003648, et cetera. 848 00:45:21,090 --> 00:45:25,140 So I've divided the population into four groups. 849 00:45:25,140 --> 00:45:27,440 A multidimensional 850 00:45:27,440 --> 00:45:30,590 random variable. 851 00:45:30,590 --> 00:45:31,220 OK. 852 00:45:31,220 --> 00:45:34,310 The question is, what's the probability that the test is 853 00:45:34,310 --> 00:45:39,020 positive given that the subject has AIDS? 854 00:45:39,020 --> 00:45:41,990 I want to know how good the test is. 855 00:45:41,990 --> 00:45:44,920 So the first question I'm going to ask is, given that 856 00:45:44,920 --> 00:45:48,930 the person has AIDS, what's the probability that the test 857 00:45:48,930 --> 00:45:52,230 gives a true answer? 858 00:45:52,230 --> 00:45:53,970 You've got 60 seconds. 859 00:45:53,970 --> 00:45:55,220 This is harder. 860 00:45:57,590 --> 00:45:58,840 Some people don't think it's harder. 861 00:46:24,750 --> 00:46:27,730 So what's the probability that the test is positive, given 862 00:46:27,730 --> 00:46:29,460 that the subject has AIDS? 863 00:46:29,460 --> 00:46:30,730 Is it bigger than 90%? 864 00:46:30,730 --> 00:46:32,070 Between 50% and 90%? 865 00:46:32,070 --> 00:46:32,460 Less than 50%? 866 00:46:32,460 --> 00:46:33,830 Or you can't tell from the data? 
867 00:46:33,830 --> 00:46:37,350 Everybody vote, and the answer is 100% correct. 868 00:46:37,350 --> 00:46:38,170 Wonderful. 869 00:46:38,170 --> 00:46:40,540 So let me make it harder. 870 00:46:40,540 --> 00:46:43,420 Is it between 90% and 95%? 871 00:46:43,420 --> 00:46:45,886 Or between 95% and a 100%? 872 00:46:45,886 --> 00:46:46,862 AUDIENCE: 95% and a 100% 873 00:46:46,862 --> 00:46:48,080 PROFESSOR: 95%. 874 00:46:48,080 --> 00:46:53,810 Is it between 95% and 97%, or 97% and 100%? 875 00:46:57,826 --> 00:46:59,940 OK, sorry. 876 00:46:59,940 --> 00:47:01,190 This is called marginalization. 877 00:47:03,860 --> 00:47:06,470 I told you something about the population that lets you 878 00:47:06,470 --> 00:47:09,660 eliminate some of the numbers. 879 00:47:09,660 --> 00:47:13,390 So if I told you that the person has AIDS, then I know 880 00:47:13,390 --> 00:47:16,460 I'm in the first column. 881 00:47:16,460 --> 00:47:17,790 That's marginalization. 882 00:47:17,790 --> 00:47:20,880 I gave you new information. 883 00:47:20,880 --> 00:47:23,720 I'm saying the other cases didn't happen. 884 00:47:23,720 --> 00:47:26,800 I've shrunk the universe, it used to have 4 groups of 885 00:47:26,800 --> 00:47:32,090 people, now it has 2 groups of people, I used Bayes' rule. 886 00:47:32,090 --> 00:47:38,330 I need to re-scale the numbers so that they add to 1. 887 00:47:38,330 --> 00:47:42,150 So these 2 numbers, the only 2 possibilities that can occur-- 888 00:47:42,150 --> 00:47:43,870 after I've done the conditioning, no 889 00:47:43,870 --> 00:47:45,990 longer add to 1. 890 00:47:45,990 --> 00:47:48,410 I've got to make them add to 1. 891 00:47:48,410 --> 00:47:52,160 I do that by dividing by the probability of the event that 892 00:47:52,160 --> 00:47:54,620 I'm using to normalize. 893 00:47:54,620 --> 00:47:58,240 So the sum of these two probabilities is something, 894 00:47:58,240 --> 00:48:03,350 whatever it is 0.003700. 
895 00:48:03,350 --> 00:48:06,450 So I divide each of those probabilities by that sum, 896 00:48:06,450 --> 00:48:07,930 that's just Bayes' rule. 897 00:48:07,930 --> 00:48:11,070 And I find out that the answer is the probability that the 898 00:48:11,070 --> 00:48:14,190 test is positive-- 899 00:48:14,190 --> 00:48:16,780 given that the person has AIDS, the probability that the test 900 00:48:16,780 --> 00:48:20,240 is positive is 0.986. 901 00:48:20,240 --> 00:48:21,490 Good test? 902 00:48:24,310 --> 00:48:26,376 Good test? 903 00:48:26,376 --> 00:48:28,380 98%. 904 00:48:28,380 --> 00:48:30,060 I won't say that. 905 00:48:30,060 --> 00:48:33,160 98% is a good test, right? 906 00:48:33,160 --> 00:48:36,040 Not that today is an appropriate day to talk about 907 00:48:36,040 --> 00:48:38,230 the outcomes of tests and 98%. 908 00:48:38,230 --> 00:48:39,380 But I won't mention that. 909 00:48:39,380 --> 00:48:43,810 OK, so good test. 910 00:48:43,810 --> 00:48:47,530 The accuracy of the test is greater than 98%. 911 00:48:47,530 --> 00:48:48,780 Quite good. 912 00:48:56,020 --> 00:48:57,010 New question. 913 00:48:57,010 --> 00:48:59,310 What's the probability that the subject has AIDS given 914 00:48:59,310 --> 00:49:00,560 that the test is positive? 915 00:49:15,020 --> 00:49:16,270 Everybody vote. (1), (2), (3), (4). 916 00:49:22,210 --> 00:49:23,020 Looks like 100%. 917 00:49:23,020 --> 00:49:24,970 OK, the answer is less than 50%. 918 00:49:24,970 --> 00:49:25,610 Why is that? 919 00:49:25,610 --> 00:49:27,970 Well that's another marginalization problem, but 920 00:49:27,970 --> 00:49:31,490 now we're marginalizing on a different population. 921 00:49:31,490 --> 00:49:34,840 This is how you can go awry thinking about probability. 922 00:49:34,840 --> 00:49:37,670 The 2 numbers seem kind of contradictory. 923 00:49:37,670 --> 00:49:40,550 Here I'm saying that the test came out positive and I'm 924 00:49:40,550 --> 00:49:44,570 asking does the subject have AIDS. 
925 00:49:44,570 --> 00:49:45,970 It's still marginalization. 926 00:49:45,970 --> 00:49:50,140 I'm still throwing away 2 of the conditions, two fractions 927 00:49:50,140 --> 00:49:52,610 of the population, I'm only thinking about 2. 928 00:49:52,610 --> 00:49:58,240 I still have to normalize so that the sums come out 1, but 929 00:49:58,240 --> 00:49:59,490 the numbers are different. 930 00:49:59,490 --> 00:50:01,772 Yes? 931 00:50:01,772 --> 00:50:03,022 AUDIENCE: [INAUDIBLE PHRASE]. 932 00:50:08,534 --> 00:50:09,980 PROFESSOR: Thank you. 933 00:50:09,980 --> 00:50:13,890 Because my brain's not working. 934 00:50:13,890 --> 00:50:16,150 OK, I've been saying marginalization and I meant 935 00:50:16,150 --> 00:50:19,130 uniformly, over the last five minutes, to be saying 936 00:50:19,130 --> 00:50:21,490 conditioning. 937 00:50:21,490 --> 00:50:24,450 OK, so I skipped breakfast this morning, my blood sugar 938 00:50:24,450 --> 00:50:27,450 is low, sorry. 939 00:50:27,450 --> 00:50:28,850 Thank you very much. 940 00:50:28,850 --> 00:50:33,620 I should have been saying conditioning. 941 00:50:33,620 --> 00:50:34,580 Sorry. 942 00:50:34,580 --> 00:50:36,290 OK, so backing up. 943 00:50:38,920 --> 00:50:46,200 OK I conditioned on the fact that the person had AIDS, and 944 00:50:46,200 --> 00:50:49,100 then I conditioned on the fact that the 945 00:50:49,100 --> 00:50:50,750 test came up positive. 946 00:50:50,750 --> 00:50:54,850 In both cases I was conditioning. 947 00:50:54,850 --> 00:50:58,050 In both cases I was doing Bayes' rule. 948 00:50:58,050 --> 00:51:00,200 Please ignore the person who can't connect his 949 00:51:00,200 --> 00:51:02,740 brain to his mouth. 950 00:51:02,740 --> 00:51:07,340 So, here because the conditioning event has a very 951 00:51:07,340 --> 00:51:11,950 different set of numbers from these numbers, the relative 952 00:51:11,950 --> 00:51:18,450 likelihood that the subject has AIDS is small. 
953 00:51:18,450 --> 00:51:25,460 So even though the test is very effective in identifying 954 00:51:25,460 --> 00:51:30,990 cases that are known to be true, it is not very effective 955 00:51:30,990 --> 00:51:35,840 in taking a random person from the population and saying the 956 00:51:35,840 --> 00:51:39,060 test was positive, you have it. 957 00:51:39,060 --> 00:51:42,430 OK, those are very different things and the probability 958 00:51:42,430 --> 00:51:45,700 theory gives us a way to say exactly how 959 00:51:45,700 --> 00:51:46,950 different those are. 960 00:51:49,330 --> 00:51:52,710 Why are they so different? 961 00:51:52,710 --> 00:51:54,785 The reason they're different is that other word. 962 00:51:57,300 --> 00:51:58,370 Because the marginal 963 00:51:58,370 --> 00:52:00,530 probabilities are so different. 964 00:52:00,530 --> 00:52:05,320 And that is because the population is skewed. 965 00:52:05,320 --> 00:52:09,650 So the fact that the test came out positive, is offset at 966 00:52:09,650 --> 00:52:14,000 least somewhat by the skew in the population. 967 00:52:14,000 --> 00:52:17,210 So the point here is actually marginalizing. 968 00:52:17,210 --> 00:52:20,470 If I think about how many people in the population have 969 00:52:20,470 --> 00:52:27,120 AIDS, that means I'm summing on the columns, rather than 970 00:52:27,120 --> 00:52:29,220 conditioning. 971 00:52:29,220 --> 00:52:33,200 And what you see is a very skewed population. 972 00:52:33,200 --> 00:52:38,310 And that's the reason you can't conclude from the test, 973 00:52:38,310 --> 00:52:42,440 whether or not this particular subject has the disease or not 974 00:52:42,440 --> 00:52:44,980 because the population is so skewed. 975 00:52:44,980 --> 00:52:49,850 So this was intended to be an example of conditioning versus 976 00:52:49,850 --> 00:52:52,140 marginalization and how you think about that in a 977 00:52:52,140 --> 00:52:54,400 multi-dimensional random variable. 
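The two conditioning computations just described can be checked numerically. Only the AIDS-and-positive entry (0.003648) and the AIDS-column sum (0.003700) appear in the lecture; the false-positive entry below is an assumed placeholder, chosen only to illustrate the skewed-population effect, not the number from the slide.

```python
# Joint distribution over (AIDS, test), stored as a dictionary.
# 0.003648 and the column sum 0.003700 come from the lecture;
# the false-positive entry 0.0199 is an assumed placeholder.
joint = {
    ('true', 'positive'): 0.003648,
    ('true', 'negative'): 0.003700 - 0.003648,   # = 0.000052
    ('false', 'positive'): 0.0199,               # assumed, not from the slide
    ('false', 'negative'): 1 - 0.003700 - 0.0199,
}

def condition(joint, fixed_index, fixed_value):
    """Keep entries whose fixed_index-th coordinate equals fixed_value,
    then rescale so the surviving probabilities add to 1."""
    kept = {k: p for k, p in joint.items() if k[fixed_index] == fixed_value}
    total = sum(kept.values())
    return {k: p / total for k, p in kept.items()}

# P(test | AIDS = true): condition on the first coordinate.
given_aids = condition(joint, 0, 'true')
print(round(given_aids[('true', 'positive')], 3))   # 0.986, as in the lecture

# P(AIDS | test = positive): condition on the second coordinate.
given_pos = condition(joint, 1, 'positive')
print(given_pos[('true', 'positive')] < 0.5)        # True: the skew dominates
```

The same `condition` procedure answers both questions; only the coordinate being fixed changes, which is why the two answers can be so different.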
978 00:52:54,400 --> 00:52:55,976 Yes? 979 00:52:55,976 --> 00:52:59,148 AUDIENCE: Don't you sum [UNINTELLIGIBLE] in order to 980 00:52:59,148 --> 00:53:00,398 do Bayes' rule? 981 00:53:02,808 --> 00:53:08,880 PROFESSOR: In order to condition on has AIDS, you 982 00:53:08,880 --> 00:53:12,140 need to sum has AIDS. 983 00:53:12,140 --> 00:53:13,950 And then you use that number. 984 00:53:13,950 --> 00:53:14,680 Yes? 985 00:53:14,680 --> 00:53:15,732 That's right. 986 00:53:15,732 --> 00:53:16,968 AUDIENCE: So how are they different? 987 00:53:16,968 --> 00:53:20,740 PROFESSOR: One of them has a [UNINTELLIGIBLE] and the other 988 00:53:20,740 --> 00:53:21,400 one doesn't. 989 00:53:21,400 --> 00:53:26,570 So when we did Bayes' rule, we did the marginalization here, 990 00:53:26,570 --> 00:53:32,850 but then we used that summed number to normalize the 991 00:53:32,850 --> 00:53:36,400 individual probabilities by scaling, by dividing. 992 00:53:36,400 --> 00:53:40,600 So that the new sum, over the new smaller sample 993 00:53:40,600 --> 00:53:45,280 space is still one. 994 00:53:45,280 --> 00:53:47,320 So your point's right. 995 00:53:47,320 --> 00:53:50,600 So regardless of whether we're conditioning or marginalizing, 996 00:53:50,600 --> 00:53:54,080 we still end up computing the marginals. 997 00:53:54,080 --> 00:53:56,010 It's just that in one case we're done, and in the other 998 00:53:56,010 --> 00:54:02,567 case we use that marginal to re-scale. OK? 999 00:54:07,120 --> 00:54:12,421 So I said, we could just use set theory and we're done. 1000 00:54:12,421 --> 00:54:14,420 We'll in fact use random variables 1001 00:54:14,420 --> 00:54:15,230 because it's simpler. 1002 00:54:15,230 --> 00:54:17,740 That's one of the two other things we need to do which are 1003 00:54:17,740 --> 00:54:20,170 non-essential, it just makes our life easier. 
1004 00:54:20,170 --> 00:54:23,160 And the other non-essential thing that we will do is 1005 00:54:23,160 --> 00:54:26,430 represent it in some sort of a Python structure. 1006 00:54:26,430 --> 00:54:29,200 So we would like to be able to conveniently represent 1007 00:54:29,200 --> 00:54:32,590 probabilities in Python. 1008 00:54:32,590 --> 00:54:36,690 The way we'll do that, is a little obscure the first time 1009 00:54:36,690 --> 00:54:37,500 you look at it. 1010 00:54:37,500 --> 00:54:40,160 But again, once you've done it a few times it's a very 1011 00:54:40,160 --> 00:54:41,920 natural way of doing it, otherwise we 1012 00:54:41,920 --> 00:54:43,200 wouldn't do it this way. 1013 00:54:43,200 --> 00:54:47,170 How are we going to represent probability laws in Python? 1014 00:54:47,170 --> 00:54:54,470 The way we'll do it, since the labels for random variables 1015 00:54:54,470 --> 00:54:57,040 can be lots of different things-- so for example, the 1016 00:54:57,040 --> 00:55:01,270 label in the previous one was in the case of the subject 1017 00:55:01,270 --> 00:55:05,900 having AIDS or not, the label was true or false. 1018 00:55:05,900 --> 00:55:10,190 The label for the test was positive or negative. 1019 00:55:10,190 --> 00:55:14,710 So in order to allow you to give symbolic and human 1020 00:55:14,710 --> 00:55:21,500 meaningful names to events we will use a dictionary as the 1021 00:55:21,500 --> 00:55:27,300 fundamental way of associating probabilities with events. 1022 00:55:27,300 --> 00:55:29,450 So, we'll represent a probability 1023 00:55:29,450 --> 00:55:31,270 distribution by a class-- 1024 00:55:31,270 --> 00:55:34,760 what a surprise, by a Python class-- 1025 00:55:34,760 --> 00:55:37,460 that we will call DDist which means discrete distribution. 
1026 00:55:39,980 --> 00:55:47,110 DDists want to associate the name of an atomic event which 1027 00:55:47,110 --> 00:55:53,280 we will let you use any string, or in fact any-- 1028 00:55:53,280 --> 00:55:55,580 I should generalize that. 1029 00:55:55,580 --> 00:56:02,230 You can use any Python data structure to identify an 1030 00:56:02,230 --> 00:56:04,190 atomic event. 1031 00:56:04,190 --> 00:56:06,870 And then we will associate that using a Python 1032 00:56:06,870 --> 00:56:10,970 dictionary, with the probability. 1033 00:56:10,970 --> 00:56:16,310 So what we will do when you instantiate a new discrete 1034 00:56:16,310 --> 00:56:21,380 distribution, you will-- the instantiation rule, you must 1035 00:56:21,380 --> 00:56:22,670 call it with a dictionary. 1036 00:56:22,670 --> 00:56:26,550 A dictionary is a thing in Python that associates one 1037 00:56:26,550 --> 00:56:31,130 thing with another thing, I'll give an example in a minute. 1038 00:56:31,130 --> 00:56:37,430 And the utility of this is that you'll be able to use as 1039 00:56:37,430 --> 00:56:43,320 your atomic event a string, like true or false, a string 1040 00:56:43,320 --> 00:56:46,840 like positive or negative, or something more complicated 1041 00:56:46,840 --> 00:56:47,740 like a tuple. 1042 00:56:47,740 --> 00:56:49,810 And I'll show you an example of where you would want to do 1043 00:56:49,810 --> 00:56:51,470 that in just a second. 1044 00:56:51,470 --> 00:56:55,120 So the idea is going to be you establish a discrete 1045 00:56:55,120 --> 00:56:58,690 distribution by the unique method called the dictionary. 1046 00:56:58,690 --> 00:57:08,080 The dictionary is just a list of keys which tell you which 1047 00:57:08,080 --> 00:57:11,130 event that you're trying to name the probability of. 1048 00:57:11,130 --> 00:57:13,390 Associated with a number, and that number is the 1049 00:57:13,390 --> 00:57:15,380 probability. 
1050 00:57:15,380 --> 00:57:18,210 And this shows you that there's one extremely 1051 00:57:18,210 --> 00:57:22,450 interesting method, which is the Prob method. 1052 00:57:22,450 --> 00:57:25,790 The idea is that Prob will tell you what is the 1053 00:57:25,790 --> 00:57:28,640 probability associated with that key. 1054 00:57:28,640 --> 00:57:31,020 If it doesn't find the key in the dictionary, it'll tell you 1055 00:57:31,020 --> 00:57:32,370 the answer is 0. 1056 00:57:32,370 --> 00:57:35,550 We do that for a specific reason too, because a lot of 1057 00:57:35,550 --> 00:57:38,900 the probability spaces that we will talk about, have lots of 1058 00:57:38,900 --> 00:57:40,560 0's in them. 1059 00:57:40,560 --> 00:57:44,110 So instead of having to enumerate all of the cases 1060 00:57:44,110 --> 00:57:48,020 that are 0 we will assume that if you didn't tell us a 1061 00:57:48,020 --> 00:57:53,480 probability, the answer was 0. 1062 00:57:53,480 --> 00:57:55,900 OK so this is the idea. 1063 00:57:55,900 --> 00:58:02,850 I could say use the dist module in lib601 to create 1064 00:58:02,850 --> 00:58:05,770 the outcome of a coin toss experiment. 1065 00:58:05,770 --> 00:58:08,130 And I have a syntax error. 1066 00:58:08,130 --> 00:58:10,650 This should have had a squiggle brace. 1067 00:58:13,490 --> 00:58:15,330 A dictionary is something that in Python-- 1068 00:58:15,330 --> 00:58:18,420 So I should have said something like this-- 1069 00:58:18,420 --> 00:58:19,670 dist.DDist of squiggle. 1070 00:58:23,010 --> 00:58:25,450 Sorry about that, that should've said squiggle, I'll 1071 00:58:25,450 --> 00:58:27,770 fix it and put the answer on the website. 1072 00:58:30,290 --> 00:58:40,090 Head should be associated with the probability 0.5 and tail 1073 00:58:40,090 --> 00:58:43,200 should be associated with the probability 0.5. 1074 00:58:43,200 --> 00:58:46,840 End of dictionary, end of call. 1075 00:58:46,840 --> 00:58:48,700 Sorry, I missed the squiggle. 
1076 00:58:48,700 --> 00:58:51,140 Actually what happened was, I put the squiggle in 1077 00:58:51,140 --> 00:58:52,260 and LaTeX ate it. 1078 00:58:52,260 --> 00:58:56,790 Because that's the LaTeX, anyway. 1079 00:58:56,790 --> 00:59:00,930 It's sort of my fault. 1080 00:59:00,930 --> 00:59:02,790 The dog ate my homework. 1081 00:59:02,790 --> 00:59:05,140 LaTeX ate my squiggle, it's sort of the same thing. 1082 00:59:08,140 --> 00:59:11,480 So having defined a distribution, then I can ask 1083 00:59:11,480 --> 00:59:14,560 what's the probability of the event head? 1084 00:59:14,560 --> 00:59:15,880 The answer is a half. 1085 00:59:15,880 --> 00:59:17,460 The probability of event tail? 1086 00:59:17,460 --> 00:59:19,360 The answer is a half. 1087 00:59:19,360 --> 00:59:21,640 The probability of event H? 1088 00:59:21,640 --> 00:59:23,860 There is no H. The answer is 0. 1089 00:59:23,860 --> 00:59:25,920 That's what I meant by sparsity. 1090 00:59:25,920 --> 00:59:29,400 If I didn't tell you what the probability is, we assume the 1091 00:59:29,400 --> 00:59:30,650 answer is 0. 1092 00:59:33,290 --> 00:59:37,830 Conditional probabilities are a little more obscure. 1093 00:59:37,830 --> 00:59:40,830 What's the conditional probability that the test 1094 00:59:40,830 --> 00:59:44,060 gives me some outcome given that I tell you the status of 1095 00:59:44,060 --> 00:59:47,840 whether the patient has or doesn't have AIDS? 1096 00:59:47,840 --> 00:59:49,930 OK, well conditionals-- 1097 00:59:49,930 --> 00:59:54,900 you're going to have to tell me which case I want to 1098 00:59:54,900 --> 00:59:56,680 condition on. 1099 00:59:56,680 --> 00:59:59,620 So in order for me to tell you the right probability law you 1100 00:59:59,620 --> 01:00:04,270 have to tell me does the person have AIDS or not. 1101 01:00:04,270 --> 01:00:06,430 So that becomes an argument. 
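Putting the pieces above together, here is a minimal sketch of such a class; the real lib601 DDist has more methods, but the dictionary-plus-default-0 idea is the same.

```python
class DDist:
    """Discrete distribution: maps atomic events (any hashable
    Python value -- a string, a tuple, ...) to probabilities."""

    def __init__(self, dictionary):
        self.d = dictionary

    def prob(self, elt):
        # Missing keys mean probability 0, so sparse distributions
        # don't have to enumerate every zero-probability event.
        return self.d.get(elt, 0)

# The coin-toss example, with the squiggle braces in place this time.
coin = DDist({'head': 0.5, 'tail': 0.5})
print(coin.prob('head'))   # 0.5
print(coin.prob('tail'))   # 0.5
print(coin.prob('H'))      # 0 -- 'H' is not in the dictionary
```

The `dict.get(elt, 0)` call is what implements the sparsity convention: any event you didn't name is assigned probability 0.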
1102 01:00:06,430 --> 01:00:09,620 So we're going to represent conditional probabilities as 1103 01:00:09,620 --> 01:00:11,450 procedures. 1104 01:00:11,450 --> 01:00:12,160 That's a little weird. 1105 01:00:12,160 --> 01:00:17,920 So the input to the procedure, specifies the condition. 1106 01:00:17,920 --> 01:00:22,420 So if I want to call the procedure and find out what's 1107 01:00:22,420 --> 01:00:26,410 the distribution for the tests, given that 1108 01:00:26,410 --> 01:00:29,070 the person has AIDS? 1109 01:00:29,070 --> 01:00:32,085 Then I would call, test given AIDS of true. 1110 01:00:34,980 --> 01:00:42,040 So if AIDS is true, return this DDist, otherwise return 1111 01:00:42,040 --> 01:00:43,140 this DDist. 1112 01:00:43,140 --> 01:00:46,460 So it's a little bizarre but think about what it has to do. 1113 01:00:46,460 --> 01:00:50,720 If I want to specify a conditional probability, I 1114 01:00:50,720 --> 01:00:53,640 have to tell you an answer. 1115 01:00:53,640 --> 01:00:56,990 And that's what the parameter is for. 1116 01:00:56,990 --> 01:01:00,260 So the way that would work is illustrated here having 1117 01:01:00,260 --> 01:01:03,910 defined this as the conditional distribution I 1118 01:01:03,910 --> 01:01:07,980 could call it by saying what is the distribution on tests 1119 01:01:07,980 --> 01:01:11,210 given that AIDS was true? 1120 01:01:11,210 --> 01:01:12,775 And the answer to that is the DDist. 1121 01:01:15,320 --> 01:01:20,140 Or if I had that DDist, which would be this phrase, I could 1122 01:01:20,140 --> 01:01:22,650 say what's then the probability in that new 1123 01:01:22,650 --> 01:01:26,980 distribution that the answer is negative? 1124 01:01:26,980 --> 01:01:30,720 Then I would look up the dot prob method within the 1125 01:01:30,720 --> 01:01:36,160 resulting conditional distribution, and look up the 1126 01:01:36,160 --> 01:01:37,410 condition negative. 
1127 01:01:39,680 --> 01:01:42,930 And finally the way that I would think about a joint 1128 01:01:42,930 --> 01:01:48,850 probability distribution, is to use a tuple. 1129 01:01:48,850 --> 01:01:50,700 Joint probability distributions are 1130 01:01:50,700 --> 01:01:54,600 multi-dimensional, tuples are multi-dimensional. 1131 01:01:54,600 --> 01:01:57,500 So for example, if I wanted to represent this 1132 01:01:57,500 --> 01:02:03,990 multi-dimensional data, I might have the joint 1133 01:02:03,990 --> 01:02:11,300 distribution of AIDS and tests. 1134 01:02:11,300 --> 01:02:12,870 OK that's a 2-by-2. 1135 01:02:12,870 --> 01:02:16,900 AIDS can take on 2 different values, true or false. 1136 01:02:16,900 --> 01:02:18,380 And tests can take on 2 different 1137 01:02:18,380 --> 01:02:19,820 values, positive or negative. 1138 01:02:19,820 --> 01:02:21,940 So there's 4 cases. 1139 01:02:21,940 --> 01:02:25,310 The way I would specify a joint distribution would be to 1140 01:02:25,310 --> 01:02:29,950 create a joint distribution starting with the marginal 1141 01:02:29,950 --> 01:02:39,900 distribution for AIDS and then using Bayes' rule tell me the 1142 01:02:39,900 --> 01:02:44,860 two different conditional probabilities given AIDS. 1143 01:02:44,860 --> 01:02:50,130 And that then will create a new joint distribution 1144 01:02:50,130 --> 01:02:53,980 whose DDist is a tuple. 1145 01:02:53,980 --> 01:02:58,900 So in this new joint distribution, AIDS and tests, 1146 01:02:58,900 --> 01:03:02,860 if AIDS is false, and test is negative-- 1147 01:03:02,860 --> 01:03:06,830 so false negative is this number-- 1148 01:03:06,830 --> 01:03:12,830 the probability associated with the tuple is that number. 1149 01:03:12,830 --> 01:03:14,710 Is that clear? 1150 01:03:14,710 --> 01:03:17,920 So I'm going to construct joint distributions by 1151 01:03:17,920 --> 01:03:22,870 thinking about conditional probabilities. 
1152 01:03:22,870 --> 01:03:25,290 So I have simple distributions, which are 1153 01:03:25,290 --> 01:03:26,850 defined with dictionaries. 1154 01:03:26,850 --> 01:03:30,040 I have conditional probabilities which are 1155 01:03:30,040 --> 01:03:31,790 defined by procedures. 1156 01:03:31,790 --> 01:03:34,970 And I have joint probabilities which are defined by tuples. 1157 01:03:39,230 --> 01:03:43,990 OK, so that's the Python magic that we will use and a lot of 1158 01:03:43,990 --> 01:03:48,010 the exercises for Week 10 have to do with getting that 1159 01:03:48,010 --> 01:03:50,170 nomenclature straight. 1160 01:03:50,170 --> 01:03:52,530 It's a little confusing at first, I assure you that by 1161 01:03:52,530 --> 01:03:54,130 the time you've practiced with it, it is 1162 01:03:54,130 --> 01:03:56,910 a reasonable notation. 1163 01:03:56,910 --> 01:03:59,970 It just takes a little bit of practice to get onto it, much 1164 01:03:59,970 --> 01:04:01,380 like other notations. 1165 01:04:01,380 --> 01:04:03,650 OK where are we going with this? 1166 01:04:03,650 --> 01:04:06,000 What we would like to do is solve that problem that I 1167 01:04:06,000 --> 01:04:08,120 showed at the beginning of the hour. 1168 01:04:08,120 --> 01:04:12,620 So we would like to know things like, where am I? 1169 01:04:12,620 --> 01:04:15,670 So the kind of thing that we're going to do is think 1170 01:04:15,670 --> 01:04:19,720 about where am I based on my current velocity and where I 1171 01:04:19,720 --> 01:04:23,280 think I am, odometry-- 1172 01:04:23,280 --> 01:04:27,200 which is uncertain, it's unreliable-- 1173 01:04:27,200 --> 01:04:29,750 versus for example where I think I am 1174 01:04:29,750 --> 01:04:31,670 based on noisy sensors. 1175 01:04:31,670 --> 01:04:36,860 OK so that's like two independent noisy things. 1176 01:04:36,860 --> 01:04:37,070 Right? 1177 01:04:37,070 --> 01:04:39,410 The odometry you can't completely rely on it. 
1178 01:04:39,410 --> 01:04:42,300 You've probably run into that by now. 1179 01:04:42,300 --> 01:04:44,750 The sonars are not completely reliable. 1180 01:04:44,750 --> 01:04:46,960 So there are two kinds of noisy things. 1181 01:04:46,960 --> 01:04:49,470 How do you optimally combine them? 1182 01:04:49,470 --> 01:04:51,830 That's where we're heading. 1183 01:04:51,830 --> 01:04:54,350 So the idea is going to be here I am, I think I'm a 1184 01:04:54,350 --> 01:04:56,510 robot, I think I'm heading toward a wall, I'd like to 1185 01:04:56,510 --> 01:04:58,790 know where am I. 1186 01:04:58,790 --> 01:05:01,430 So the kinds of data that we're going to look at are 1187 01:05:01,430 --> 01:05:08,370 things like, I think I know where I started out. 1188 01:05:08,370 --> 01:05:10,360 Now my thinking could be pretty vague. 1189 01:05:10,360 --> 01:05:13,340 It could be, I have no clue so I'm going to assume that I'm 1190 01:05:13,340 --> 01:05:17,030 equally likely anywhere in space. 1191 01:05:17,030 --> 01:05:19,600 So I have a small probability of being many places. 1192 01:05:19,600 --> 01:05:20,850 That just means that my initial 1193 01:05:20,850 --> 01:05:22,100 distribution is very broad. 1194 01:05:25,510 --> 01:05:29,510 But then I will define where I think I am by taking into 1195 01:05:29,510 --> 01:05:34,300 account where I think I will be after my next step. 1196 01:05:34,300 --> 01:05:38,320 So I think I'm moving at some speed. 1197 01:05:38,320 --> 01:05:41,120 If I were here, and if I'm going at some 1198 01:05:41,120 --> 01:05:43,790 speed I'll be there. 1199 01:05:43,790 --> 01:05:48,650 So we will formalize that by thinking about a transition. 1200 01:05:48,650 --> 01:05:52,920 I think that if I am here at time T, I will be there at 1201 01:05:52,920 --> 01:05:55,860 time T plus 1. 1202 01:05:55,860 --> 01:05:58,680 And I'll also think about, what do I think the sonars 1203 01:05:58,680 --> 01:05:59,720 should've told me. 
1204 01:05:59,720 --> 01:06:02,090 If I think I'm here, what would the sonars have said? 1205 01:06:02,090 --> 01:06:05,060 If I think I'm here, what would the sonars have said? 1206 01:06:05,060 --> 01:06:09,160 And we'll use those as a way to work backwards in 1207 01:06:09,160 --> 01:06:12,890 probability, use Bayes' rule. 1208 01:06:12,890 --> 01:06:16,970 To say, I have a noisy idea about where I will be if I 1209 01:06:16,970 --> 01:06:19,230 started there. 1210 01:06:19,230 --> 01:06:22,180 I have a noisy idea of what the sonars would have said, if 1211 01:06:22,180 --> 01:06:24,190 I started there. 1212 01:06:24,190 --> 01:06:25,930 But I don't know where I started. 1213 01:06:25,930 --> 01:06:28,240 Where did I start? 1214 01:06:28,240 --> 01:06:32,780 That's the way we're going to use the probability theory. 1215 01:06:32,780 --> 01:06:36,650 So for example, if I thought I was here and if I thought I 1216 01:06:36,650 --> 01:06:41,770 was going ahead 2 units in space per unit in time, I 1217 01:06:41,770 --> 01:06:46,290 would think that the next time I'm here. 1218 01:06:46,290 --> 01:06:49,350 But since I'm not quite sure where I was maybe I'll be 1219 01:06:49,350 --> 01:06:51,370 there, and maybe I'll be there, but there's very little 1220 01:06:51,370 --> 01:06:52,760 chance that I'll be there. 1221 01:06:52,760 --> 01:06:57,230 That's what I mean by a transition model. 1222 01:06:57,230 --> 01:07:01,760 It's a probabilistic way of describing the difference 1223 01:07:01,760 --> 01:07:04,900 between where I start and where I finish in one step. 1224 01:07:08,040 --> 01:07:11,060 Similarly, we'll think about an observation model. 1225 01:07:11,060 --> 01:07:13,310 If I think I'm here, what do I think the 1226 01:07:13,310 --> 01:07:14,900 sonars would have said. 
1227 01:07:14,900 --> 01:07:18,910 Well I think I've got some distribution that it's very 1228 01:07:18,910 --> 01:07:21,870 likely that they'll give me the right answer, but it might 1229 01:07:21,870 --> 01:07:23,850 be a little short it might be a long. 1230 01:07:23,850 --> 01:07:25,760 Maybe it'll make a bigger error. 1231 01:07:25,760 --> 01:07:30,760 So I'll think about two things. 1232 01:07:30,760 --> 01:07:34,220 Where do I think I will be based on how I'm going? 1233 01:07:34,220 --> 01:07:37,200 And where do I think I'll be based on my observations? 1234 01:07:37,200 --> 01:07:39,640 And then we'll try to formalize that into a 1235 01:07:39,640 --> 01:07:42,790 structure that gives me a better idea of where I am. 1236 01:07:46,040 --> 01:07:49,130 That's the point of the exercises next week when we 1237 01:07:49,130 --> 01:07:51,420 won't have a lecture. 1238 01:07:51,420 --> 01:07:53,810 So this week we're going to learn how to do some very 1239 01:07:53,810 --> 01:07:57,660 simple ideas with modelling probabilities. 1240 01:07:57,660 --> 01:07:59,950 With thinking about these kinds of distributions. 1241 01:07:59,950 --> 01:08:02,380 And the idea next week then is going to be incorporating it 1242 01:08:02,380 --> 01:08:06,310 into a structure that will let us figure out where the robot 1243 01:08:06,310 --> 01:08:10,700 is in some sort of an optimal sense. 1244 01:08:10,700 --> 01:08:13,220 So thinking about optimal -- 1245 01:08:13,220 --> 01:08:14,720 let's come back to the original question. 1246 01:08:17,229 --> 01:08:23,130 How much would you pay me to play the game? 1247 01:08:23,130 --> 01:08:24,740 OK, we had some votes. 1248 01:08:24,740 --> 01:08:27,470 They didn't add up to 1. 1249 01:08:27,470 --> 01:08:30,160 What should I do to make them add up to 1? 1250 01:08:33,000 --> 01:08:34,330 Divide by the sum. 1251 01:08:34,330 --> 01:08:35,960 Right? 1252 01:08:35,960 --> 01:08:37,359 Look at all of you know already, right? 
1253 01:08:37,359 --> 01:08:41,149 So you now know all this great probability theory. 1254 01:08:41,149 --> 01:08:44,630 So the question is can we use probability theory to come up 1255 01:08:44,630 --> 01:08:50,010 with a rational way of thinking how much it's worth? 1256 01:08:50,010 --> 01:08:55,210 Most of you thought that it's worth less than $10. 1257 01:08:55,210 --> 01:08:56,979 OK, so how do we think about this? 1258 01:08:56,979 --> 01:09:01,069 How do we use the theory that we just generated to come up 1259 01:09:01,069 --> 01:09:05,560 with a rational decision about how much that's worth? 1260 01:09:05,560 --> 01:09:10,210 OK, thinking about the bet quantitatively, what we're 1261 01:09:10,210 --> 01:09:11,439 going to try to do is think about it 1262 01:09:11,439 --> 01:09:13,710 with probability theory. 1263 01:09:13,710 --> 01:09:18,200 There are 5 possibilities inside the bag. 1264 01:09:18,200 --> 01:09:22,040 Originally there could have been 4 white, or 3 white and 1 1265 01:09:22,040 --> 01:09:26,790 red, or 2 and 2, or 1 and 3, or 0 and 4. 1266 01:09:26,790 --> 01:09:28,290 That was the original case. 1267 01:09:28,290 --> 01:09:29,040 You didn't know. 1268 01:09:29,040 --> 01:09:30,260 I didn't know. 1269 01:09:30,260 --> 01:09:31,970 They were thrown into the bag over here. 1270 01:09:31,970 --> 01:09:33,160 We didn't know. 1271 01:09:33,160 --> 01:09:36,250 How much would that game-- 1272 01:09:36,250 --> 01:09:43,590 how much should you be willing to pay to play that game? 1273 01:09:43,590 --> 01:09:48,189 Someone asked how many white ones and how many red ones did 1274 01:09:48,189 --> 01:09:49,810 the person put in the bag? 1275 01:09:49,810 --> 01:09:51,609 I don't have a clue, right? 1276 01:09:51,609 --> 01:09:54,970 We need a model for the person. 
1277 01:09:54,970 --> 01:10:01,940 Since I don't have a clue, one very common strategy is to say 1278 01:10:01,940 --> 01:10:03,770 all these things I know nothing about let's just 1279 01:10:03,770 --> 01:10:06,880 assume they're all equally likely. 1280 01:10:06,880 --> 01:10:10,570 So that's called maximum likelihood, when you do that. 1281 01:10:10,570 --> 01:10:12,670 There's other possible strategies. 1282 01:10:12,670 --> 01:10:14,990 I'll use the maximum likelihood idea just 1283 01:10:14,990 --> 01:10:16,290 because it's easy. 1284 01:10:16,290 --> 01:10:18,080 So I have no idea. 1285 01:10:18,080 --> 01:10:21,790 Let's just assume that here's all of the conditions that 1286 01:10:21,790 --> 01:10:22,450 could have happened. 1287 01:10:22,450 --> 01:10:24,602 The number of red that are in the bag could have been 0, 1, 1288 01:10:24,602 --> 01:10:25,852 2, 3, or 4. 1289 01:10:28,000 --> 01:10:31,270 I have no idea how the person chose the 1290 01:10:31,270 --> 01:10:33,700 number of LEGO parts. 1291 01:10:33,700 --> 01:10:38,580 So I'll assume that each of those cases is 1/5 likely, 1292 01:10:38,580 --> 01:10:41,730 since there's 5 cases. 1293 01:10:41,730 --> 01:10:45,970 OK now I'll think about what's my expected value of the 1294 01:10:45,970 --> 01:10:50,350 amount of money that I'll make if the random variable S, 1295 01:10:50,350 --> 01:10:54,910 which is the number of red things that are in the bag was 1296 01:10:54,910 --> 01:10:56,430 s which is either 0, 1, 2, 3, or 4. 1297 01:10:59,260 --> 01:11:04,410 OK, if there are 0, how much money do you expect to make? 1298 01:11:04,410 --> 01:11:06,410 None. 1299 01:11:06,410 --> 01:11:08,610 If there are 4 reds, how much money would 1300 01:11:08,610 --> 01:11:11,680 you expect to make? 1301 01:11:11,680 --> 01:11:19,870 $20. If there are 2 reds, you would expect to make $10. 1302 01:11:19,870 --> 01:11:21,860 Everybody see that? 
1303 01:11:21,860 --> 01:11:24,960 I'm trying to think through a logical sequence of steps for 1304 01:11:24,960 --> 01:11:28,450 thinking about how much is it worth to play the game. 1305 01:11:28,450 --> 01:11:33,110 So this is the amount of money that you would expect given 1306 01:11:33,110 --> 01:11:37,460 that the number of red in the bag, which you don't know, 1307 01:11:37,460 --> 01:11:39,750 were 0, 1, 2, 3, or 4. 1308 01:11:39,750 --> 01:11:41,560 That's this row. 1309 01:11:41,560 --> 01:11:45,360 What's the probability, what's the expected value of the 1310 01:11:45,360 --> 01:11:49,410 amount of money you would get, and that happens? 1311 01:11:49,410 --> 01:11:51,390 Well I have to use Bayes' rule. 1312 01:11:54,020 --> 01:11:57,810 What I need to do is I have to take this probability times 1313 01:11:57,810 --> 01:12:00,670 that amount to get that dollar value. 1314 01:12:00,670 --> 01:12:08,530 So over here, in the event that there are 4 reds in the 1315 01:12:08,530 --> 01:12:13,435 bag, I'm expecting to get $20 but that's only 1/5 likely. 1316 01:12:16,330 --> 01:12:16,590 Right? 1317 01:12:16,590 --> 01:12:20,440 Because there don't have to be 4 reds in the bag. 1318 01:12:20,440 --> 01:12:23,920 So I multiply the 1/5 times the $20, and I get $4. 1319 01:12:23,920 --> 01:12:27,200 So my expected outcome for this trial is $4. 1320 01:12:29,710 --> 01:12:33,490 Here, I'm expecting to make $10 if I knew that there were 2 1321 01:12:33,490 --> 01:12:34,390 reds in the bag. 1322 01:12:34,390 --> 01:12:36,070 But I don't know that there's 2 reds in the bag, there's a 1323 01:12:36,070 --> 01:12:40,340 1/5 probability there's 2 reds in the bag. 1324 01:12:40,340 --> 01:12:44,465 So 1/5 of my expected amount of money, which is $10, is $2. 1325 01:12:47,090 --> 01:12:49,970 So then in order to figure out my expected amount of money I 1326 01:12:49,970 --> 01:12:53,630 just add these all up, marginalizing. 
1327 01:12:53,630 --> 01:12:55,110 And I get the [UNINTELLIGIBLE] 1328 01:12:55,110 --> 01:12:58,470 4 plus 3 is 7, plus 2 is 9, plus 1 is 10. 1329 01:12:58,470 --> 01:13:02,470 So this theory says that if I can regard the person who put 1330 01:13:02,470 --> 01:13:07,260 the LEGOs in the bag as being completely random, I should 1331 01:13:07,260 --> 01:13:12,110 expect to make $10 on the experiment. 1332 01:13:12,110 --> 01:13:14,180 So that means you should be willing to pay $10. 1333 01:13:16,840 --> 01:13:20,400 Because on average, you'll get back $10. 1334 01:13:20,400 --> 01:13:22,330 If you wanted to make a profit, you ought to be 1335 01:13:22,330 --> 01:13:25,040 willing to pay $9. 1336 01:13:25,040 --> 01:13:25,350 Right? 1337 01:13:25,350 --> 01:13:28,240 Because then you would pay $9 expecting to get $10. 1338 01:13:28,240 --> 01:13:30,630 If you really would like to make a loss, right? 1339 01:13:30,630 --> 01:13:33,760 Then you should pay $11. 1340 01:13:33,760 --> 01:13:34,499 Yeah? 1341 01:13:34,499 --> 01:13:36,594 AUDIENCE: Why do we assume that these 1342 01:13:36,594 --> 01:13:37,493 events are equally likely? 1343 01:13:37,493 --> 01:13:39,990 PROFESSOR: Completely arbitrary. 1344 01:13:39,990 --> 01:13:44,210 So there are theories, more advanced theories, for how you 1345 01:13:44,210 --> 01:13:46,200 would make that choice. 1346 01:13:46,200 --> 01:13:49,910 So for example, if in your head you thought that the person 1347 01:13:49,910 --> 01:13:55,770 just took a large collection of LEGO parts and reached in, 1348 01:13:55,770 --> 01:13:59,390 then you would think that the number of red and white might 1349 01:13:59,390 --> 01:14:03,820 depend on the number that started out in the bin. 1350 01:14:03,820 --> 01:14:05,680 But I don't think that's probably true, right? 1351 01:14:05,680 --> 01:14:08,010 The person was probably looking at them and saying, oh, 1352 01:14:08,010 --> 01:14:10,800 throw in one red, throw in one white. 
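The marginalization described above can be checked with a few lines of Python. As before, the $5-per-red payoff is an assumption inferred from the lecture's $20 and $10 figures:

```python
# Expected winnings under the uniform "know nothing" prior.
# Assumption: $5 per red part, so 4 reds pay $20 and 2 reds pay $10.

cases = [0, 1, 2, 3, 4]
prior = {s: 1 / 5 for s in cases}    # all 5 cases assumed equally likely
payoff = {s: 5 * s for s in cases}   # dollars won given s reds

# Weight each conditional payoff by its probability and add them up:
# 1/5 * $20 = $4, 1/5 * $15 = $3, and so on; the total comes to $10.
expected = sum(prior[s] * payoff[s] for s in cases)
print(expected)  # 10.0
```

On average you get back $10, which is why $10 is the break-even price for playing the game.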
1353 01:14:10,800 --> 01:14:14,540 So you need a theory for doing that, and I'm saying that in 1354 01:14:14,540 --> 01:14:18,940 the absence of any other information, let me assume that 1355 01:14:18,940 --> 01:14:22,210 those are equally likely and see what the consequence of 1356 01:14:22,210 --> 01:14:23,050 that would be. 1357 01:14:23,050 --> 01:14:26,640 The consequence of assuming that is that I should expect 1358 01:14:26,640 --> 01:14:29,800 to get $10 back. 1359 01:14:29,800 --> 01:14:34,770 What happens if you pull out a red? 1360 01:14:34,770 --> 01:14:36,960 As we did. 1361 01:14:36,960 --> 01:14:39,330 How does that affect things? 1362 01:14:39,330 --> 01:14:43,430 Well, it increases the bottom line. 1363 01:14:43,430 --> 01:14:47,320 I start out again with the assumption that all 5 cases 1364 01:14:47,320 --> 01:14:49,810 are equally likely. 1365 01:14:49,810 --> 01:14:54,440 Now I have to ask, in each case, how likely is it that the one 1366 01:14:54,440 --> 01:14:57,540 that we pulled out was red? 1367 01:14:57,540 --> 01:15:01,520 Well, it's not very likely that the one that I pulled out was 1368 01:15:01,520 --> 01:15:05,340 red if they were all white. 1369 01:15:05,340 --> 01:15:07,180 The probability of that happening is 0. 1370 01:15:10,320 --> 01:15:12,690 What's the probability, if there were 2, that the person 1371 01:15:12,690 --> 01:15:13,740 pulled out a red? 1372 01:15:13,740 --> 01:15:16,880 Well, 2 of them were red, 2 of them were white, so 2 out of 4 1373 01:15:16,880 --> 01:15:24,550 cases would have shown a red being pulled out. 1374 01:15:24,550 --> 01:15:27,290 So this line then tells me how likely it is 1375 01:15:27,290 --> 01:15:30,670 that the red was pulled. 1376 01:15:30,670 --> 01:15:31,260 OK. 1377 01:15:31,260 --> 01:15:34,020 Then what I want to do is think about what's the 1378 01:15:34,020 --> 01:15:38,110 probability that I pulled out a red, and there were 1379 01:15:38,110 --> 01:15:39,530 0, 1, 2, 3, or 4. 
1380 01:15:39,530 --> 01:15:45,350 So I multiply 1/5 times 0/4 to get 0/20, 1/5 times 1/4 to get 1381 01:15:45,350 --> 01:15:50,120 1/20, 1/5 times 2/4 to get 2/20. 1382 01:15:50,120 --> 01:15:52,220 So those are the probabilities of each 1383 01:15:52,220 --> 01:15:54,300 individual event happening. 1384 01:15:54,300 --> 01:15:57,220 But they don't sum to 1. 1385 01:15:57,220 --> 01:15:59,670 So the next step is I have to make them sum to 1. 1386 01:15:59,670 --> 01:16:01,380 The sum of these is 1/2. 1387 01:16:01,380 --> 01:16:05,270 So I make them sum to 1 this way. 1388 01:16:05,270 --> 01:16:10,170 So now what's happened is it's relatively more likely, 4 out 1389 01:16:10,170 --> 01:16:14,200 of 10, that this case happened than that case. 1390 01:16:14,200 --> 01:16:19,890 I know for sure, for example, that there are not 4 whites. 1391 01:16:19,890 --> 01:16:22,360 The probability of 4 whites is 0-- 1392 01:16:22,360 --> 01:16:25,400 0 out of 10. 1393 01:16:25,400 --> 01:16:29,730 So what I've done is I've skewed the distribution toward 1394 01:16:29,730 --> 01:16:34,770 more red by learning that there's at least 1 red; I now have 1395 01:16:34,770 --> 01:16:36,650 additional information. 1396 01:16:36,650 --> 01:16:39,570 These were not equally likely. 1397 01:16:39,570 --> 01:16:41,670 In fact, the ones with more red were 1398 01:16:41,670 --> 01:16:43,520 relatively more likely. 1399 01:16:43,520 --> 01:16:46,480 So if I compute this probability times that 1400 01:16:46,480 --> 01:16:50,580 expected amount, I now get a much bigger answer for the 1401 01:16:50,580 --> 01:16:54,100 high number of reds. 1402 01:16:54,100 --> 01:16:57,350 So I still get 0, just like I did before, for this case, 1403 01:16:57,350 --> 01:16:59,540 because there are no reds in the bag. 1404 01:16:59,540 --> 01:17:02,930 But now it's much more likely that they're all red, because 1405 01:17:02,930 --> 01:17:05,600 I know there was at least 1 red. 
1406 01:17:05,600 --> 01:17:08,740 And then the answer comes out $15. 1407 01:17:08,740 --> 01:17:15,620 So my overall assessment: don't go to Vegas. 1408 01:17:18,350 --> 01:17:25,710 You could have made a lot more money by offering $13. 1409 01:17:25,710 --> 01:17:27,380 Because on average, you should've 1410 01:17:27,380 --> 01:17:30,420 expected to make $15. 1411 01:17:30,420 --> 01:17:34,490 OK, so what I wanted to do here is go through a 1412 01:17:34,490 --> 01:17:39,130 specific example of how you can speak quantitatively about 1413 01:17:39,130 --> 01:17:40,540 things that are uncertain. 1414 01:17:40,540 --> 01:17:43,170 And that's the theme for the rest of the course.
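The whole update, from the uniform prior through the normalization to the posterior expected value, can be sketched in Python as a sanity check. Exact fractions are used so the 0/20, 1/20, ... and 0/10, 1/10, ... values from the lecture come out exactly, and the $5-per-red payoff is the same assumption as before:

```python
from fractions import Fraction

# Bayesian update after one red part is drawn from the bag.
# Assumption: $5 per red, uniform 1/5 prior over 0..4 reds among 4 parts.

cases = [0, 1, 2, 3, 4]
prior = {s: Fraction(1, 5) for s in cases}
likelihood = {s: Fraction(s, 4) for s in cases}  # P(draw red | s reds)

# Joint probability of "s reds AND a red was drawn": 0/20, 1/20, ..., 4/20.
joint = {s: prior[s] * likelihood[s] for s in cases}

# The joint probabilities sum to 1/2, not 1, so normalize them
# to get the posterior: 0/10, 1/10, 2/10, 3/10, 4/10.
total = sum(joint.values())
posterior = {s: joint[s] / total for s in cases}

# Posterior expected winnings: seeing one red raises the value from $10 to $15.
expected = sum(posterior[s] * 5 * s for s in cases)
print(expected)  # 15
```

Note how the posterior puts probability 0 on the all-white case and 4/10 on the all-red case, which is exactly the skew toward more red that the lecture describes.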