1
00:00:00,000 --> 00:00:00,040

2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative

3
00:00:02,460 --> 00:00:03,870
Commons license.

4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to

5
00:00:06,910 --> 00:00:10,560
offer high quality educational
resources for free.

6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from

7
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at

8
00:00:19,290 --> 00:00:22,004
ocw.mit.edu

9
00:00:22,004 --> 00:00:24,966
JOHN TSITSIKLIS: So here's
the agenda for today.

10
00:00:24,966 --> 00:00:26,848
We're going to do a
very quick review.

11
00:00:26,848 --> 00:00:28,936
And then we're going
to introduce some

12
00:00:28,936 --> 00:00:30,560
very important concepts.

13
00:00:30,560 --> 00:00:34,060
The idea is that all
information is--

14
00:00:34,060 --> 00:00:36,450
Information is always partial.

15
00:00:36,450 --> 00:00:40,260
And the question is what do we
do to probabilities if we have

16
00:00:40,260 --> 00:00:43,340
some partial information about
the random experiment.

17
00:00:43,340 --> 00:00:45,770
We're going to introduce the
important concept of

18
00:00:45,770 --> 00:00:47,530
conditional probability.

19
00:00:47,530 --> 00:00:50,860
And then we will see three
very useful ways

20
00:00:50,860 --> 00:00:52,670
in which it is used.

21
00:00:52,670 --> 00:00:55,410
And these ways basically
correspond to divide and

22
00:00:55,410 --> 00:00:58,070
conquer methods for breaking
up problems

23
00:00:58,070 --> 00:01:00,120
into simpler pieces.

24
00:01:00,120 --> 00:01:04,010
And also one more fundamental
tool which allows us to use

25
00:01:04,010 --> 00:01:07,420
conditional probabilities to do
inference, that is, if we

26
00:01:07,420 --> 00:01:09,440
get a little bit of information
about some

27
00:01:09,440 --> 00:01:12,620
phenomenon, what can we
infer about the things

28
00:01:12,620 --> 00:01:14,640
that we have not seen?

29
00:01:14,640 --> 00:01:17,050
So our quick review.

30
00:01:17,050 --> 00:01:22,100
In setting up a model of a
random experiment, the first

31
00:01:22,100 --> 00:01:25,930
thing to do is to come up with
a list of all the possible

32
00:01:25,930 --> 00:01:27,870
outcomes of the experiment.

33
00:01:27,870 --> 00:01:31,120
So that list is what we
call the sample space.

34
00:01:31,120 --> 00:01:32,480
It's a set.

35
00:01:32,480 --> 00:01:34,580
And the elements of the
sample space are all

36
00:01:34,580 --> 00:01:35,720
the possible outcomes.

37
00:01:35,720 --> 00:01:37,560
Those possible outcomes must be

38
00:01:37,560 --> 00:01:39,690
distinguishable from each other.

39
00:01:39,690 --> 00:01:41,020
They're mutually exclusive.

40
00:01:41,020 --> 00:01:44,900
Either one happens or the other
happens, but not both.

41
00:01:44,900 --> 00:01:47,440
And they are collectively
exhaustive; that is, no matter

42
00:01:47,440 --> 00:01:50,480
what, the outcome of the
experiment is going to be an

43
00:01:50,480 --> 00:01:52,130
element of the sample space.

44
00:01:52,130 --> 00:01:54,200
And then we discussed last
time that there's also an

45
00:01:54,200 --> 00:01:57,510
element of art in how to choose
your sample space,

46
00:01:57,510 --> 00:02:01,440
depending on how much detail
you want to capture.

47
00:02:01,440 --> 00:02:03,130
This is usually the easy part.

48
00:02:03,130 --> 00:02:06,980
Then the more interesting part
is to assign probabilities to

49
00:02:06,980 --> 00:02:10,660
our model, that is, to make some
statements about what we

50
00:02:10,660 --> 00:02:14,610
believe to be likely and what
we believe to be unlikely.

51
00:02:14,610 --> 00:02:17,720
The way we do that is by
assigning probabilities to

52
00:02:17,720 --> 00:02:20,510
subsets of the sample space.

53
00:02:20,510 --> 00:02:26,120
So as we have our sample space
here, we may have a subset A.

54
00:02:26,120 --> 00:02:31,090
And we assign a number to that
subset P(A), which is the

55
00:02:31,090 --> 00:02:33,910
probability that this
event happens.

56
00:02:33,910 --> 00:02:37,080
Or this is the probability that
when we do the experiment

57
00:02:37,080 --> 00:02:39,860
and we get an outcome, the
outcome

58
00:02:39,860 --> 00:02:41,850
happens to fall inside
that event.

59
00:02:41,850 --> 00:02:44,500
We have certain rules that
probabilities should satisfy.

60
00:02:44,500 --> 00:02:46,210
They're non-negative.

61
00:02:46,210 --> 00:02:49,780
The probability of the overall
sample space is equal to one,

62
00:02:49,780 --> 00:02:52,900
which expresses the fact that
we are certain, no matter

63
00:02:52,900 --> 00:02:55,480
what, the outcome is going
to be an element

64
00:02:55,480 --> 00:02:56,830
of the sample space.

65
00:02:56,830 --> 00:02:59,760
Well, if we set things up right
so that it exhausts all

66
00:02:59,760 --> 00:03:03,190
possibilities, this should
be the case.

67
00:03:03,190 --> 00:03:05,480
And then there's another
interesting property of

68
00:03:05,480 --> 00:03:09,240
probabilities that says that,
if we have two events or two

69
00:03:09,240 --> 00:03:11,910
subsets that are disjoint, and
we're interested in the

70
00:03:11,910 --> 00:03:17,670
probability that one or the
other happens, that is the

71
00:03:17,670 --> 00:03:21,870
outcome belongs to A or belongs
to B. For disjoint

72
00:03:21,870 --> 00:03:25,320
events the total probability of
these two, taken together,

73
00:03:25,320 --> 00:03:28,030
is just the sum of their
individual probabilities.

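A minimal sketch of the additivity property for disjoint events, assuming a small discrete model chosen here for illustration (two rolls of a four-sided die, all 16 pairs equally likely, as in the dice example later in this lecture). The events `A` and `B` are hypothetical choices that happen to be disjoint.

```python
from fractions import Fraction
from itertools import product

# Illustrative discrete model: two rolls of a four-sided die, 16 equally likely pairs.
omega = list(product(range(1, 5), repeat=2))
p = {w: Fraction(1, len(omega)) for w in omega}

def prob(event):
    """Probability of an event (a set of outcomes): the sum of its outcomes' masses."""
    return sum((p[w] for w in event), Fraction(0))

A = {w for w in omega if w[0] == 1}   # hypothetical event: first roll is 1
B = {w for w in omega if w[0] == 4}   # hypothetical event: first roll is 4 (disjoint from A)

assert A.isdisjoint(B)
print(prob(A | B), prob(A) + prob(B))  # 1/2 1/2
```

Exact fractions are used so that the two sides can be compared with `==` rather than up to rounding.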
74
00:03:28,030 --> 00:03:30,270
So probabilities behave
like masses.

75
00:03:30,270 --> 00:03:34,760
The mass of the object
consisting of A and B is the

76
00:03:34,760 --> 00:03:37,230
sum of the masses of
these two objects.

77
00:03:37,230 --> 00:03:39,720
Or you can think of
probabilities as areas.

78
00:03:39,720 --> 00:03:41,240
They have, again, the
same property.

79
00:03:41,240 --> 00:03:45,490
The area of A together with B is
the area of A plus the area of

80
00:03:45,490 --> 00:03:46,410
B.

81
00:03:46,410 --> 00:03:50,290
But as we discussed at the end
of last lecture, it's useful

82
00:03:50,290 --> 00:03:53,970
to have in our hands a more
general version of this

83
00:03:53,970 --> 00:03:58,990
additivity property, which says
the following, if we take

84
00:03:58,990 --> 00:04:00,982
a sequence of sets--

85
00:04:00,982 --> 00:04:07,480
A1, A2, A3, A4, and so on.

86
00:04:07,480 --> 00:04:09,630
And we put all of those
sets together.

87
00:04:09,630 --> 00:04:11,410
It's an infinite sequence.

88
00:04:11,410 --> 00:04:14,950
And we ask for the probability
that the outcome falls

89
00:04:14,950 --> 00:04:19,170
somewhere in this infinite
union, that is we are asking

90
00:04:19,170 --> 00:04:22,640
for the probability that the
outcome belongs to one of

91
00:04:22,640 --> 00:04:27,950
these sets, and assuming that
the sets are disjoint, we can

92
00:04:27,950 --> 00:04:32,820
again find the probability for
the overall set by adding up

93
00:04:32,820 --> 00:04:36,000
the probabilities of the
individual sets.

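Written as a formula, the countable additivity axiom just described reads:

```latex
\text{If } A_1, A_2, A_3, \ldots \text{ are disjoint events, then}\quad
\mathbf{P}\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mathbf{P}(A_i).
```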
94
00:04:36,000 --> 00:04:38,910
So this is a nice and
simple property.

95
00:04:38,910 --> 00:04:43,130
But it's a little more subtle
than you might think.

96
00:04:43,130 --> 00:04:45,820
And let's see what's going
on by considering

97
00:04:45,820 --> 00:04:47,770
the following example.

98
00:04:47,770 --> 00:04:51,850
We had an example last time
where we take our sample space

99
00:04:51,850 --> 00:04:53,800
to be the unit square.

100
00:04:53,800 --> 00:04:58,110
And we said let's consider a
probability law that says that

101
00:04:58,110 --> 00:05:04,190
the probability of a subset is
just the area of that subset.

102
00:05:04,190 --> 00:05:07,630
So let's consider this
probability law.

103
00:05:07,630 --> 00:05:08,530
OK.

104
00:05:08,530 --> 00:05:13,990
Now the unit square is
the set --let me just

105
00:05:13,990 --> 00:05:15,210
draw it this way--

106
00:05:15,210 --> 00:05:20,520
the unit square is the union of
one-element sets, one for

107
00:05:20,520 --> 00:05:21,680
each of the points.

108
00:05:21,680 --> 00:05:28,280
So the unit square is made up
by the union of the various

109
00:05:28,280 --> 00:05:30,740
points inside the square.

110
00:05:30,740 --> 00:05:33,830
So union over all x's and y's.

111
00:05:33,830 --> 00:05:34,770
OK?

112
00:05:34,770 --> 00:05:36,690
So the square is made
up out of all the

113
00:05:36,690 --> 00:05:38,400
points that it contains.

114
00:05:38,400 --> 00:05:41,140
And now let's do
a calculation.

115
00:05:41,140 --> 00:05:45,060
One is the probability of our
overall sample space, which is

116
00:05:45,060 --> 00:05:47,260
the unit square.

117
00:05:47,260 --> 00:06:02,000
Now the unit square is the union
of these things, which,

118
00:06:02,000 --> 00:06:06,810
according to our additivity
axiom, is the sum of the

119
00:06:06,810 --> 00:06:10,595
probabilities of all of these
one-element sets.

120
00:06:10,595 --> 00:06:16,830

121
00:06:16,830 --> 00:06:20,580
Now what is the probability
of a one element set?

122
00:06:20,580 --> 00:06:23,520
What is the probability of
this one element set?

123
00:06:23,520 --> 00:06:26,100
What's the probability that our
outcome is exactly that

124
00:06:26,100 --> 00:06:27,490
particular point?

125
00:06:27,490 --> 00:06:31,460
Well, it's the area of that
set, which is zero.

126
00:06:31,460 --> 00:06:33,990
So it's just the sum of zeros.

127
00:06:33,990 --> 00:06:35,950
And by any reasonable
definition the

128
00:06:35,950 --> 00:06:38,370
sum of zeros is zero.

129
00:06:38,370 --> 00:06:42,220
So we just proved that
one is equal to zero.

130
00:06:42,220 --> 00:06:42,680
OK.

131
00:06:42,680 --> 00:06:48,340
Either probability theory is
dead or there is some mistake

132
00:06:48,340 --> 00:06:51,030
in the derivation that I did.

133
00:06:51,030 --> 00:06:54,580
OK, the mistake is quite
subtle and it

134
00:06:54,580 --> 00:06:57,300
comes at this step.

135
00:06:57,300 --> 00:07:00,640
We sort of applied the
additivity axiom by saying

136
00:07:00,640 --> 00:07:04,040
that the unit square is the
union of all those sets.

137
00:07:04,040 --> 00:07:06,500
Can we really apply our
additivity axiom?

138
00:07:06,500 --> 00:07:07,260
Here's the catch.

139
00:07:07,260 --> 00:07:11,470
The additivity axiom applies
to the case where we have a

140
00:07:11,470 --> 00:07:17,180
sequence of disjoint events
and we take their union.

141
00:07:17,180 --> 00:07:21,740
Is this a sequence of sets?

142
00:07:21,740 --> 00:07:27,780
Can you make up the whole unit
square by taking a sequence of

143
00:07:27,780 --> 00:07:31,310
elements inside it that cover
the whole unit square?

144
00:07:31,310 --> 00:07:34,900
Well, if you try, if you start
looking at a sequence of

145
00:07:34,900 --> 00:07:40,910
one-element points, that sequence
will never be able to exhaust

146
00:07:40,910 --> 00:07:43,100
the whole unit square.

147
00:07:43,100 --> 00:07:45,680
So there's a deeper reason
behind that.

148
00:07:45,680 --> 00:07:48,790
And the reason is that infinite
sets are not all of

149
00:07:48,790 --> 00:07:50,130
the same size.

150
00:07:50,130 --> 00:07:52,620
The integers are an
infinite set.

151
00:07:52,620 --> 00:07:55,510
And you can arrange the integers
in a sequence.

152
00:07:55,510 --> 00:07:57,630
But a continuous set
like the unit

153
00:07:57,630 --> 00:08:00,205
square is a bigger set.

154
00:08:00,205 --> 00:08:02,050
It's so-called uncountable.

155
00:08:02,050 --> 00:08:06,160
It has more elements than
any sequence could have.

156
00:08:06,160 --> 00:08:13,610
So this union here is not of
this kind, where we would have

157
00:08:13,610 --> 00:08:16,930
a sequence of events.

158
00:08:16,930 --> 00:08:18,370
It's a different
kind of union.

159
00:08:18,370 --> 00:08:23,070
It's a union that involves a
union of many, many more sets.

160
00:08:23,070 --> 00:08:25,420
So the countable additivity
axiom does not

161
00:08:25,420 --> 00:08:27,360
apply in this case.

162
00:08:27,360 --> 00:08:30,230
Because we're not dealing
with a sequence of sets.

163
00:08:30,230 --> 00:08:33,780
And so this is the
incorrect step.

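In symbols, the faulty derivation runs as follows, with the step that breaks marked:

```latex
1 = \mathbf{P}\big([0,1]^2\big)
  = \mathbf{P}\Big(\bigcup_{(x,y)\in[0,1]^2} \{(x,y)\}\Big)
  \overset{\text{(invalid)}}{=} \sum_{(x,y)\in[0,1]^2} \mathbf{P}\big(\{(x,y)\}\big)
  = \sum_{(x,y)\in[0,1]^2} 0 = 0
```

The marked equality would need countable additivity, but the union is indexed by all points of the square, which cannot be listed as a sequence.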
164
00:08:33,780 --> 00:08:37,240
So at some level you might think
that this is puzzling

165
00:08:37,240 --> 00:08:38,580
and awfully confusing.

166
00:08:38,580 --> 00:08:41,070
On the other hand, if you think
about areas the way

167
00:08:41,070 --> 00:08:43,520
you're used to them from
calculus, there's nothing

168
00:08:43,520 --> 00:08:44,940
mysterious about it.

169
00:08:44,940 --> 00:08:47,460
Every point on the unit
square has zero area.

170
00:08:47,460 --> 00:08:50,140
When you put all the points
together, they make up

171
00:08:50,140 --> 00:08:52,330
something that has
finite area.

172
00:08:52,330 --> 00:08:55,470
So there shouldn't be any
mystery behind it.

173
00:08:55,470 --> 00:09:00,230
Now, one interesting thing that
this discussion tells us,

174
00:09:00,230 --> 00:09:03,670
especially the fact that the
single-element set has zero

175
00:09:03,670 --> 00:09:05,790
area, is the following--

176
00:09:05,790 --> 00:09:08,960
Individual points have
zero probability.

177
00:09:08,960 --> 00:09:12,390
After you do the experiment and
you observe the outcome,

178
00:09:12,390 --> 00:09:14,660
it's going to be an
individual point.

179
00:09:14,660 --> 00:09:18,160
So what happened in that
experiment is something that

180
00:09:18,160 --> 00:09:21,820
initially you thought had zero
probability of occurring.

181
00:09:21,820 --> 00:09:25,420
So if you happen to get some
particular numbers and you

182
00:09:25,420 --> 00:09:28,290
say, "Well, in the beginning,
what did I think about those

183
00:09:28,290 --> 00:09:29,280
specific numbers?

184
00:09:29,280 --> 00:09:31,290
I thought they had
zero probability.

185
00:09:31,290 --> 00:09:36,250
But yet those particular
numbers did occur."

186
00:09:36,250 --> 00:09:41,640
So one moral from this is that
zero probability does not mean

187
00:09:41,640 --> 00:09:42,890
impossible.

188
00:09:42,890 --> 00:09:46,920
It just means extremely,
extremely unlikely by itself.

189
00:09:46,920 --> 00:09:49,420
So zero probability
things do happen.

190
00:09:49,420 --> 00:09:53,340
In such continuous models,
actually zero probability

191
00:09:53,340 --> 00:09:56,930
outcomes are everything
that happens.

192
00:09:56,930 --> 00:10:00,790
And the bumper sticker version
of this is to always expect

193
00:10:00,790 --> 00:10:02,220
the unexpected.

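A rough simulation can make these two points concrete. This sketch assumes the uniform (area) law on the unit square and uses Python's `random.random()`, which only approximates a continuous uniform draw with finitely many floating-point values. The quarter-square event and the fixed point `(0.3, 0.7)` are hypothetical choices for illustration.

```python
import random

random.seed(0)          # fixed seed so the run is reproducible
n = 100_000

# Sample outcomes (x, y) approximately uniformly from the unit square.
samples = [(random.random(), random.random()) for _ in range(n)]

# Event: the lower-left quarter [0, 0.5) x [0, 0.5), whose area is 0.25.
freq_quarter = sum(1 for x, y in samples if x < 0.5 and y < 0.5) / n

# Event: the single point (0.3, 0.7), whose area is zero.
hits_point = sum(1 for x, y in samples if (x, y) == (0.3, 0.7))

print(freq_quarter)  # close to 0.25
print(hits_point)    # 0 in practice, yet every sample was some specific point
```

The relative frequency of a set with positive area settles near that area, while any particular point is essentially never hit, even though each run does land on some particular point.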
194
00:10:02,220 --> 00:10:05,095
Yes?

195
00:10:05,095 --> 00:10:06,345
AUDIENCE: [INAUDIBLE].

196
00:10:06,345 --> 00:10:08,532

197
00:10:08,532 --> 00:10:11,800
JOHN TSITSIKLIS: Well,
probability is supposed to be

198
00:10:11,800 --> 00:10:12,530
a real number.

199
00:10:12,530 --> 00:10:16,220
So it's either zero or it's
a positive number.

200
00:10:16,220 --> 00:10:21,350
So you can think of the
probability of things just

201
00:10:21,350 --> 00:10:25,040
close to that point and those
probabilities are tiny and

202
00:10:25,040 --> 00:10:26,390
close to zero.

203
00:10:26,390 --> 00:10:28,780
So that's how we're going to
interpret probabilities in

204
00:10:28,780 --> 00:10:29,810
continuous models.

205
00:10:29,810 --> 00:10:31,340
But this is two chapters
ahead.

206
00:10:31,340 --> 00:10:33,950

207
00:10:33,950 --> 00:10:34,230
Yeah?

208
00:10:34,230 --> 00:10:36,198
AUDIENCE: How do we interpret
probability of zero?

209
00:10:36,198 --> 00:10:37,674
If we can use models that
way, then how about

210
00:10:37,674 --> 00:10:38,658
probability of one?

211
00:10:38,658 --> 00:10:40,462
That it's extremely
likely but not

212
00:10:40,462 --> 00:10:42,110
necessarily for certain?

213
00:10:42,110 --> 00:10:43,320
JOHN TSITSIKLIS: That's
also the case.

214
00:10:43,320 --> 00:10:47,450
For example, if you ask in this
continuous model, if you

215
00:10:47,450 --> 00:10:52,190
ask me for the probability that
(x, y) is different from

216
00:10:52,190 --> 00:10:55,840
(0, 0), this is
the whole square,

217
00:10:55,840 --> 00:10:57,220
except for one point.

218
00:10:57,220 --> 00:11:01,150
So the area of this is
going to be one.

219
00:11:01,150 --> 00:11:06,330
But this event is not entirely
certain because the (0, 0)

220
00:11:06,330 --> 00:11:08,210
outcome is also possible.

221
00:11:08,210 --> 00:11:12,330
So again, probability of one
means essential certainty.

222
00:11:12,330 --> 00:11:16,450
But it still allows the
possibility that the outcome

223
00:11:16,450 --> 00:11:18,320
might be outside that set.

224
00:11:18,320 --> 00:11:20,910
So these are some of the weird
things that are happening when

225
00:11:20,910 --> 00:11:22,680
you have continuous models.

226
00:11:22,680 --> 00:11:25,240
And that's why we start
this class with discrete

227
00:11:25,240 --> 00:11:27,050
models, on which we'll
be spending the

228
00:11:27,050 --> 00:11:30,400
next couple of weeks.

229
00:11:30,400 --> 00:11:30,820
OK.

230
00:11:30,820 --> 00:11:35,650
So now once we have set up our
probability model and we have

231
00:11:35,650 --> 00:11:39,160
a legitimate probability law
that has these properties,

232
00:11:39,160 --> 00:11:43,070
then the rest is
usually simple.

233
00:11:43,070 --> 00:11:45,950
Somebody asks you a question of
calculating the probability

234
00:11:45,950 --> 00:11:47,520
of some event.

235
00:11:47,520 --> 00:11:50,270
You were told something
about the probability law,

236
00:11:50,270 --> 00:11:52,520
such as, for example, that the
probabilities are equal to

237
00:11:52,520 --> 00:11:55,460
areas, and then you just
need to calculate.

238
00:11:55,460 --> 00:11:58,730
In this type of example
somebody would give you a set

239
00:11:58,730 --> 00:12:00,230
and you would have
to calculate the

240
00:12:00,230 --> 00:12:01,500
area of that set.

241
00:12:01,500 --> 00:12:06,060
So the rest is just simple
calculation.

242
00:12:06,060 --> 00:12:09,390
Alright, so now it's time
to start with our main

243
00:12:09,390 --> 00:12:12,600
business for today.

244
00:12:12,600 --> 00:12:16,880
And the starting point
is the following--

245
00:12:16,880 --> 00:12:18,920
You know something
about the world.

246
00:12:18,920 --> 00:12:21,690
And based on what you know,
you set up a probability

247
00:12:21,690 --> 00:12:23,820
model and you write down
probabilities for the

248
00:12:23,820 --> 00:12:26,000
different outcomes.

249
00:12:26,000 --> 00:12:28,950
Then something happens, and
somebody tells you a little

250
00:12:28,950 --> 00:12:33,620
more about the world, gives
you some new information.

251
00:12:33,620 --> 00:12:37,430
This new information, in
general, should change your

252
00:12:37,430 --> 00:12:41,240
beliefs about what happened
or what may happen.

253
00:12:41,240 --> 00:12:44,550
So whenever we're given new
information, some partial

254
00:12:44,550 --> 00:12:47,400
information about the outcome
of the experiment, we should

255
00:12:47,400 --> 00:12:49,750
revise our beliefs.

256
00:12:49,750 --> 00:12:54,470
And conditional probabilities
are just the probabilities

257
00:12:54,470 --> 00:12:58,820
that apply after the revision
of our beliefs, when we're

258
00:12:58,820 --> 00:13:00,580
given some information.

259
00:13:00,580 --> 00:13:04,510
So let's make this into
a numerical example.

260
00:13:04,510 --> 00:13:07,870
So inside the sample space, this
part of the sample space,

261
00:13:07,870 --> 00:13:12,580
let's say has probability 3/6,
this part has 2/6, and that

262
00:13:12,580 --> 00:13:14,550
part has 1/6.

263
00:13:14,550 --> 00:13:17,940
I guess that means that out here
we have zero probability.

264
00:13:17,940 --> 00:13:21,900
So these were our initial
beliefs about the outcome of

265
00:13:21,900 --> 00:13:23,270
the experiment.

266
00:13:23,270 --> 00:13:27,160
Suppose now that someone
comes and tells you

267
00:13:27,160 --> 00:13:30,960
that event B occurred.

268
00:13:30,960 --> 00:13:33,560
So they don't tell you the
full outcome of the

269
00:13:33,560 --> 00:13:34,440
experiment.

270
00:13:34,440 --> 00:13:38,960
But they just tell you that the
outcome is known to lie

271
00:13:38,960 --> 00:13:41,060
inside this set B.

272
00:13:41,060 --> 00:13:44,320
Well then, you should certainly
change your beliefs

273
00:13:44,320 --> 00:13:45,560
in some way.

274
00:13:45,560 --> 00:13:48,420
And your new beliefs about what
is likely to occur and

275
00:13:48,420 --> 00:13:51,770
what is not is going to be
denoted by this notation.

276
00:13:51,770 --> 00:13:55,330
This is the conditional
probability that the event A

277
00:13:55,330 --> 00:13:57,970
is going to occur, the
probability that the outcome

278
00:13:57,970 --> 00:14:01,580
is going to fall inside the set
A given that we are told

279
00:14:01,580 --> 00:14:05,890
and we're sure that the outcome
lies inside the event B. Now

280
00:14:05,890 --> 00:14:09,000
once you're told that the
outcome lies inside the event

281
00:14:09,000 --> 00:14:13,740
B, then our old sample space
in some ways is irrelevant.

282
00:14:13,740 --> 00:14:16,975
We then have a new sample space,
which is just the set B. We

283
00:14:16,975 --> 00:14:21,020
are certain that the outcome
is going to be inside B.

284
00:14:21,020 --> 00:14:25,465
For example, what is this
conditional probability?

285
00:14:25,465 --> 00:14:29,120

286
00:14:29,120 --> 00:14:30,160
It should be one.

287
00:14:30,160 --> 00:14:33,250
Given that I told you that B
occurred, you're certain that

288
00:14:33,250 --> 00:14:36,380
B occurred, so this has
unit probability.

289
00:14:36,380 --> 00:14:40,340
So here we see an instance of
revision of our beliefs.

290
00:14:40,340 --> 00:14:44,880
Initially, event B had the
probability of (2+1)/6 --

291
00:14:44,880 --> 00:14:46,300
that's 1/2.

292
00:14:46,300 --> 00:14:49,500
Initially, we thought B
had probability 1/2.

293
00:14:49,500 --> 00:14:52,370
Once we're told that B occurred,
the new probability

294
00:14:52,370 --> 00:14:54,250
of B is equal to one.

295
00:14:54,250 --> 00:14:55,160
OK.

296
00:14:55,160 --> 00:15:00,860
How do we revise the probability
that A occurs?

297
00:15:00,860 --> 00:15:03,950
So we are going to have the
outcome of the experiment.

298
00:15:03,950 --> 00:15:07,330
We know that it's inside B. So
we will either get something

299
00:15:07,330 --> 00:15:09,200
here, and A does not occur.

300
00:15:09,200 --> 00:15:12,570
Or something inside here,
and A does occur.

301
00:15:12,570 --> 00:15:16,280
What's the likelihood that,
given that we're inside B, the

302
00:15:16,280 --> 00:15:18,160
outcome is inside here?

303
00:15:18,160 --> 00:15:21,380
Here's how we're going
to think about it.

304
00:15:21,380 --> 00:15:26,110
This part of this set B, in
which A also occurs, in our

305
00:15:26,110 --> 00:15:31,280
initial model was twice as
likely as that part of B. So

306
00:15:31,280 --> 00:15:36,220
outcomes inside here
collectively were twice as

307
00:15:36,220 --> 00:15:38,950
likely as outcomes out there.

308
00:15:38,950 --> 00:15:43,240
So we're going to keep the same
proportions and say, that

309
00:15:43,240 --> 00:15:47,280
given that we are inside the set
B, we still want outcomes

310
00:15:47,280 --> 00:15:51,120
inside here to be twice as
likely as outcomes there.

311
00:15:51,120 --> 00:15:55,800
So the proportion of the
probabilities should be two

312
00:15:55,800 --> 00:15:57,570
versus one.

313
00:15:57,570 --> 00:16:01,210
And these probabilities should
add up to one because together

314
00:16:01,210 --> 00:16:04,340
they make the conditional
probability of B. So the

315
00:16:04,340 --> 00:16:09,260
conditional probabilities should
be 2/3 probability of

316
00:16:09,260 --> 00:16:13,080
being here and 1/3 probability
of being there.

317
00:16:13,080 --> 00:16:16,860
That's how we revise
our probabilities.

318
00:16:16,860 --> 00:16:20,740
That's a reasonable, intuitively
reasonable, way of

319
00:16:20,740 --> 00:16:22,230
doing this revision.

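The revision just described can be sketched as a renormalization: keep only the outcomes inside B and rescale their probabilities so they sum to one, which preserves their relative proportions. The labels `A_only`, `A_and_B`, and `B_only` are hypothetical names for the three regions in the picture, carrying the probabilities 3/6, 2/6, and 1/6 from the example.

```python
from fractions import Fraction

# Hypothetical labels for the three regions in the picture, with the example's probabilities.
prior = {"A_only": Fraction(3, 6), "A_and_B": Fraction(2, 6), "B_only": Fraction(1, 6)}
B = {"A_and_B", "B_only"}

p_B = sum(prior[w] for w in B)                  # 3/6 = 1/2
posterior = {w: prior[w] / p_B for w in B}      # rescale inside B; the 2:1 ratio is kept

print(posterior["A_and_B"], posterior["B_only"])  # 2/3 1/3
```

Outcomes outside B are simply dropped, which amounts to giving them conditional probability zero.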
320
00:16:22,230 --> 00:16:26,650
Let's translate what we
did into a definition.

321
00:16:26,650 --> 00:16:29,490
The definition says the
following, that the

322
00:16:29,490 --> 00:16:33,410
conditional probability of A
given that B occurred is

323
00:16:33,410 --> 00:16:35,270
calculated as follows.

324
00:16:35,270 --> 00:16:39,430
We look at the total probability
of B. And out of

325
00:16:39,430 --> 00:16:43,190
that probability that was inside
here, what fraction of

326
00:16:43,190 --> 00:16:48,310
that probability is assigned to
points for which the event

327
00:16:48,310 --> 00:16:49,780
A also occurs?

328
00:16:49,780 --> 00:16:54,480

329
00:16:54,480 --> 00:16:56,860
Does it give us the same numbers
as we got with this

330
00:16:56,860 --> 00:16:58,420
heuristic argument?

331
00:16:58,420 --> 00:17:01,530
Well in this example,
probability of A intersection

332
00:17:01,530 --> 00:17:06,359
B is 2/6, divided by total
probability of B, which is

333
00:17:06,359 --> 00:17:12,369
3/6, and so it's 2/3, which
agrees with the answer that

334
00:17:12,369 --> 00:17:13,589
we got before.

335
00:17:13,589 --> 00:17:18,280
So the formula indeed matches
what we were trying to do.

336
00:17:18,280 --> 00:17:21,040
One little technical detail.

337
00:17:21,040 --> 00:17:24,970
If the event B has zero
probability, then here we

338
00:17:24,970 --> 00:17:27,770
have a ratio that doesn't
make sense.

339
00:17:27,770 --> 00:17:30,470
So in this case, we say that
conditional probabilities are

340
00:17:30,470 --> 00:17:31,720
not defined.

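The definition, including the case where it is left undefined, can be sketched as a small function. This is an illustrative sketch, not a library API: `cond_prob` and the dictionary representation of the probability law are assumptions made here. It reuses the three-region example (3/6, 2/6, 1/6) plus a hypothetical fourth region `outside` with probability zero.

```python
from fractions import Fraction

def cond_prob(law, A, B):
    """P(A | B) = P(A ∩ B) / P(B); raises ValueError when P(B) = 0 (undefined)."""
    p_B = sum((law[w] for w in B), Fraction(0))
    if p_B == 0:
        raise ValueError("P(A | B) is not defined when P(B) = 0")
    p_AB = sum((law[w] for w in A & B), Fraction(0))
    return p_AB / p_B

# Hypothetical region labels; "outside" carries zero probability.
law = {"A_only": Fraction(3, 6), "A_and_B": Fraction(2, 6),
       "B_only": Fraction(1, 6), "outside": Fraction(0)}
A = {"A_only", "A_and_B"}
B = {"A_and_B", "B_only"}

print(cond_prob(law, A, B))   # 2/3, matching the heuristic argument
print(cond_prob(law, B, B))   # 1: given B, the event B is certain
```

Raising an error, rather than returning some default number, mirrors the convention that the conditional probability is simply not defined when the conditioning event has zero probability.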
341
00:17:31,720 --> 00:17:34,780

342
00:17:34,780 --> 00:17:38,980
Now you can take this definition
and unravel it and

343
00:17:38,980 --> 00:17:40,260
write it in this form.

344
00:17:40,260 --> 00:17:43,510
The probability of A
intersection B is the

345
00:17:43,510 --> 00:17:46,780
probability of B times the
conditional probability.

346
00:17:46,780 --> 00:17:50,350

347
00:17:50,350 --> 00:17:53,820
So this is just a consequence of
the definition but it has a

348
00:17:53,820 --> 00:17:55,370
nice interpretation.

349
00:17:55,370 --> 00:17:57,930
Think of probabilities
as frequencies.

350
00:17:57,930 --> 00:18:01,480
If I do the experiment over and
over, what fraction of the

351
00:18:01,480 --> 00:18:05,300
time is it going to be the case
that both A and B occur?

352
00:18:05,300 --> 00:18:08,490
Well, there's going to be a
certain fraction of the time

353
00:18:08,490 --> 00:18:10,820
at which B occurs.

354
00:18:10,820 --> 00:18:14,760
And out of those times when B
occurs, there's going to be a

355
00:18:14,760 --> 00:18:17,270
further fraction of
the experiments in

356
00:18:17,270 --> 00:18:19,410
which A also occurs.

357
00:18:19,410 --> 00:18:21,930
So interpret the conditional
probability as follows.

358
00:18:21,930 --> 00:18:24,320
You only look at those
experiments at which

359
00:18:24,320 --> 00:18:26,050
B happens to occur.

360
00:18:26,050 --> 00:18:29,820
And look at what fraction of
those experiments where B

361
00:18:29,820 --> 00:18:33,670
already occurred, event
A also occurs.

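The frequency interpretation can be checked with a rough simulation. This sketch assumes the same three-region model (3/6, 2/6, 1/6), treats each trial as an independent draw from it, and uses an arbitrary seed and trial count; the region labels are hypothetical.

```python
import random

random.seed(1)
regions = ["A_only", "A_and_B", "B_only"]      # hypothetical labels for the three regions
weights = [3, 2, 1]                            # proportional to 3/6, 2/6, 1/6
trials = random.choices(regions, weights=weights, k=60_000)

in_B = [r for r in trials if r in ("A_and_B", "B_only")]
frac_B = len(in_B) / len(trials)                        # estimates P(B) = 1/2
frac_A_among_B = in_B.count("A_and_B") / len(in_B)      # estimates P(A | B) = 2/3
frac_A_and_B = trials.count("A_and_B") / len(trials)    # estimates P(A ∩ B) = 1/3

# Fraction of trials with both A and B equals (fraction with B) * (fraction of those with A),
# up to floating-point rounding.
print(frac_A_and_B, frac_B * frac_A_among_B)
```

The product identity holds exactly for the counts, while each individual fraction only approximates its probability.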
362
00:18:33,670 --> 00:18:39,610
And there's a symmetrical
version of this equality.

363
00:18:39,610 --> 00:18:44,660
There's symmetry between the
events B and A. So you also

364
00:18:44,660 --> 00:18:48,890
have this relation that
goes the other way.

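In symbols, both orders of this relation, checked against the three-region example (where P(A) = 3/6 + 2/6 = 5/6 and P(B) = 1/2), are:

```latex
\mathbf{P}(A \cap B) = \mathbf{P}(B)\,\mathbf{P}(A \mid B) = \tfrac{1}{2}\cdot\tfrac{2}{3} = \tfrac{1}{3},
\qquad
\mathbf{P}(A \cap B) = \mathbf{P}(A)\,\mathbf{P}(B \mid A) = \tfrac{5}{6}\cdot\tfrac{2}{5} = \tfrac{1}{3}.
```

Each form requires the conditioning event to have positive probability, and both agree with the direct value P(A ∩ B) = 2/6.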
365
00:18:48,890 --> 00:18:53,950
OK, so what do we use these
conditional probabilities for?

366
00:18:53,950 --> 00:18:55,120
First, one comment.

367
00:18:55,120 --> 00:18:58,100
Conditional probabilities
are just like ordinary

368
00:18:58,100 --> 00:18:59,170
probabilities.

369
00:18:59,170 --> 00:19:02,820
They're the new probabilities
that apply in a new universe

370
00:19:02,820 --> 00:19:07,300
where event B is known
to have occurred.

371
00:19:07,300 --> 00:19:10,620
So we had an original
probability model.

372
00:19:10,620 --> 00:19:12,210
We are told that B occurs.

373
00:19:12,210 --> 00:19:13,840
We revise our model.

374
00:19:13,840 --> 00:19:16,690
Our new model should still be
a legitimate probability model.

375
00:19:16,690 --> 00:19:20,770
So it should satisfy all sorts
of properties that ordinary

376
00:19:20,770 --> 00:19:23,210
probabilities do satisfy.

377
00:19:23,210 --> 00:19:29,230
So for example, if A and B are
disjoint events, then we know

378
00:19:29,230 --> 00:19:33,830
that the probability of A
union B is equal to the

379
00:19:33,830 --> 00:19:39,230
probability of A plus
probability of B. And now if I

380
00:19:39,230 --> 00:19:42,770
tell you that a certain event C
occurred, we're placed in a

381
00:19:42,770 --> 00:19:45,220
new universe where
event C occurred.

382
00:19:45,220 --> 00:19:47,515
We have new probabilities
for that universe.

383
00:19:47,515 --> 00:19:49,880
These are the conditional
probabilities.

384
00:19:49,880 --> 00:19:52,960
And conditional probabilities
also satisfy

385
00:19:52,960 --> 00:19:54,820
this kind of property.

386
00:19:54,820 --> 00:19:58,380
So this is just our usual
additivity axiom, but

387
00:19:58,380 --> 00:20:02,290
applied in a new model, in which
we were told that event

388
00:20:02,290 --> 00:20:03,250
C occurred.

389
00:20:03,250 --> 00:20:06,580
So conditional probabilities
do not taste or smell any

390
00:20:06,580 --> 00:20:09,970
different than ordinary
probabilities do.

391
00:20:09,970 --> 00:20:14,350
Conditional probabilities, given
a specific event B, just

392
00:20:14,350 --> 00:20:19,480
form a probability law
on our sample space.

393
00:20:19,480 --> 00:20:22,460
It's a different probability
law but it's still a

394
00:20:22,460 --> 00:20:26,430
probability law that has all
of the desired properties.

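A quick numerical check of this claim, assuming the four-sided-die model used in the next example (16 equally likely pairs) and a hypothetical conditioning event C, "the sum of the two rolls is at most 5". The events `A` and `B` are also illustrative choices.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 5), repeat=2))   # two rolls of a four-sided die
p = {w: Fraction(1, 16) for w in omega}

def prob(event):
    return sum((p[w] for w in event), Fraction(0))

def cond(event, given):
    """Conditional probability P(event | given), assuming P(given) > 0."""
    return prob(event & given) / prob(given)

C = {w for w in omega if sum(w) <= 5}   # hypothetical conditioning event
A = {w for w in omega if w[0] == 1}     # first roll is 1
B = {w for w in omega if w[0] == 2}     # first roll is 2, disjoint from A

# Conditional probabilities given C behave like an ordinary probability law:
assert all(cond({w}, C) >= 0 for w in omega)        # non-negativity
assert cond(set(omega), C) == 1                     # normalization
assert cond(A | B, C) == cond(A, C) + cond(B, C)    # additivity for disjoint A, B
print(cond(A, C), cond(B, C), cond(A | B, C))       # 2/5 3/10 7/10
```

Here C contains 10 of the 16 pairs, A ∩ C contains 4 and B ∩ C contains 3, which gives the printed values.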
395
00:20:26,430 --> 00:20:30,360
OK, so where do conditional
probabilities come up?

396
00:20:30,360 --> 00:20:32,450
They do come up in quizzes
and they do

397
00:20:32,450 --> 00:20:34,070
come up in silly problems.

398
00:20:34,070 --> 00:20:35,680
So let's start with this.

399
00:20:35,680 --> 00:20:37,790
We have this example
from last time.

400
00:20:37,790 --> 00:20:42,220
Two rolls of a four-sided die, all possible
pairs of rolls are

401
00:20:42,220 --> 00:20:46,410
equally likely, so every element
in this square has

402
00:20:46,410 --> 00:20:47,660
probability of 1/16.

403
00:20:47,660 --> 00:20:50,300

404
00:20:50,300 --> 00:20:52,330
So all elements are
equally likely.

405
00:20:52,330 --> 00:20:54,280
That's our original model.

406
00:20:54,280 --> 00:20:57,210
Then somebody comes and tells us
that the minimum of the two

407
00:20:57,210 --> 00:20:59,530
rolls is equal to two.

408
00:20:59,530 --> 00:21:02,060
What's that event?

409
00:21:02,060 --> 00:21:05,990
The minimum equal to two can
happen in several ways: if we get

410
00:21:05,990 --> 00:21:08,990
two 2's,

411
00:21:08,990 --> 00:21:13,140
or if we get a 2

412
00:21:13,140 --> 00:21:14,830
and something larger.

413
00:21:14,830 --> 00:21:21,400
And so this is our new event B.
The red event is the event B.

414
00:21:21,400 --> 00:21:23,500
And now we want to calculate
probabilities

415
00:21:23,500 --> 00:21:25,310
inside this new universe.

416
00:21:25,310 --> 00:21:28,770
For example, you may be
interested in questions

417
00:21:28,770 --> 00:21:31,960
about the maximum
of the two rolls.

418
00:21:31,960 --> 00:21:34,310
In the new universe, what's
the probability that the

419
00:21:34,310 --> 00:21:37,550
maximum is equal to one?

420
00:21:37,550 --> 00:21:44,320
The maximum being equal to
one is this black event.

421
00:21:44,320 --> 00:21:49,240
And given that we're told that
B occurred, this black event

422
00:21:49,240 --> 00:21:50,300
cannot happen.

423
00:21:50,300 --> 00:21:53,240
So this probability
is equal to zero.

424
00:21:53,240 --> 00:21:56,500
How about the maximum
being equal to two,

425
00:21:56,500 --> 00:21:59,110
given that event B?

426
00:21:59,110 --> 00:22:01,760
OK, we can use the
definition here.

427
00:22:01,760 --> 00:22:05,730
It's going to be the probability
that the maximum

428
00:22:05,730 --> 00:22:10,590
is equal to two and B occurs
divided by the probability of

429
00:22:10,590 --> 00:22:16,020
B.

430
00:22:16,020 --> 00:22:19,470
OK, what's the event that the
maximum is equal to two?

431
00:22:19,470 --> 00:22:20,340
Let's draw it.

432
00:22:20,340 --> 00:22:22,300
This is going to be
the blue event.

433
00:22:22,300 --> 00:22:25,950
The maximum is equal to
two if we get any

434
00:22:25,950 --> 00:22:28,520
of those blue points.

435
00:22:28,520 --> 00:22:32,310
So the intersection of the two
events is the intersection of

436
00:22:32,310 --> 00:22:35,170
the red event and
the blue event.

437
00:22:35,170 --> 00:22:37,770
There's only one point in
their intersection.

438
00:22:37,770 --> 00:22:39,640
So the probability of
that intersection

439
00:22:39,640 --> 00:22:41,080
happening is 1/16.

440
00:22:41,080 --> 00:22:43,740

441
00:22:43,740 --> 00:22:45,160
That's the numerator.

442
00:22:45,160 --> 00:22:47,110
How about the denominator?

443
00:22:47,110 --> 00:22:50,610
The event B consists of five
elements, each one of which

444
00:22:50,610 --> 00:22:52,270
had probability of 1/16.

445
00:22:52,270 --> 00:22:54,570
So that's 5/16.

446
00:22:54,570 --> 00:22:58,340
And so the answer is 1/5.

447
00:22:58,340 --> 00:23:02,830
Could we have gotten this
answer in a faster way?

448
00:23:02,830 --> 00:23:04,190
Yes.

449
00:23:04,190 --> 00:23:05,560
Here's how it goes.

450
00:23:05,560 --> 00:23:09,060
We're trying to find the
conditional probability that

451
00:23:09,060 --> 00:23:13,210
we get this point, given
that B occurred.

452
00:23:13,210 --> 00:23:15,570
B consists of five elements.

453
00:23:15,570 --> 00:23:18,250
All of those five elements were
equally likely when we

454
00:23:18,250 --> 00:23:22,720
started, so they remain equally
likely afterwards.

455
00:23:22,720 --> 00:23:25,180
Because when we define
conditional probabilities, we

456
00:23:25,180 --> 00:23:28,110
keep the same proportions
inside the set.

457
00:23:28,110 --> 00:23:31,940
So the five red elements
were equally likely.

458
00:23:31,940 --> 00:23:35,050
They remain equally likely
in the conditional world.

459
00:23:35,050 --> 00:23:39,080
So conditional on event B having
happened, each one of these

460
00:23:39,080 --> 00:23:41,580
five elements has the
same probability.

461
00:23:41,580 --> 00:23:44,300
So the probability that we
actually get this point is

462
00:23:44,300 --> 00:23:46,210
going to be 1/5.

463
00:23:46,210 --> 00:23:48,280
And so that's the shortcut.

464
00:23:48,280 --> 00:23:53,070
More generally, whenever you
have a uniform distribution on

465
00:23:53,070 --> 00:23:56,470
your initial sample space,
when you condition on an

466
00:23:56,470 --> 00:24:01,000
event, your new distribution is
still going to be uniform,

467
00:24:01,000 --> 00:24:05,010
but on the smaller event
that we conditioned on.

468
00:24:05,010 --> 00:24:09,780
So we started with a uniform
distribution on the big square

469
00:24:09,780 --> 00:24:13,730
and we ended up with a
uniform distribution

470
00:24:13,730 --> 00:24:17,230
just on the red points.
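A small enumeration can confirm both the direct calculation and the shortcut. This sketch assumes a four-sided die with faces 1 to 4, which is what a probability of 1/16 per pair implies; the helper names `prob` and `cond_prob` are illustrative, not from the lecture. It uses exact fractions so the answers come out as 0 and 1/5 rather than floating-point approximations.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of rolls of a four-sided die (faces 1..4),
# each pair with probability 1/16 (assumed from the 1/16 in the example).
omega = list(product(range(1, 5), repeat=2))
p = {w: Fraction(1, 16) for w in omega}

def prob(event):
    """Probability of an event, given as a set of outcomes."""
    return sum(p[w] for w in event)

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); assumes P(B) > 0."""
    return prob(a & b) / prob(b)

B = {w for w in omega if min(w) == 2}          # the red event: min = 2
max_is_1 = {w for w in omega if max(w) == 1}   # the black event
max_is_2 = {w for w in omega if max(w) == 2}   # the blue event

print(len(B), prob(B))            # 5 5/16
print(cond_prob(max_is_1, B))     # 0
print(cond_prob(max_is_2, B))     # 1/5

# Conditioning the uniform law on B: every point of B gets the same mass.
print({w: cond_prob({w}, B) for w in sorted(B)})
```

The last line shows the shortcut directly: each of the five red points ends up with conditional probability 1/5.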

471
00:24:17,230 --> 00:24:19,850
Now besides silly problems,
however, conditional

472
00:24:19,850 --> 00:24:25,070
probabilities show up in real
and interesting situations.

473
00:24:25,070 --> 00:24:27,390
And this example is going
to give you some

474
00:24:27,390 --> 00:24:30,430
idea of how that happens.

475
00:24:30,430 --> 00:24:32,250
OK.

476
00:24:32,250 --> 00:24:35,450
Actually, in this example,
instead of starting with a

477
00:24:35,450 --> 00:24:39,480
probability model in terms of
regular probabilities, I'm

478
00:24:39,480 --> 00:24:43,070
actually going to define the
model in terms of conditional

479
00:24:43,070 --> 00:24:43,890
probabilities.

480
00:24:43,890 --> 00:24:45,880
And we'll see how
this is done.

481
00:24:45,880 --> 00:24:48,330
So here's the story.

482
00:24:48,330 --> 00:24:52,210
There may be an airplane flying
up in the sky, in a

483
00:24:52,210 --> 00:24:55,400
particular sector of the sky
that you're watching.

484
00:24:55,400 --> 00:24:57,950
Sometimes there is one, sometimes
there isn't.

485
00:24:57,950 --> 00:25:01,760
And from experience you know
that when you look up, there's

486
00:25:01,760 --> 00:25:04,400
a five percent probability that
a plane is flying up

487
00:25:04,400 --> 00:25:09,670
there and 95% probability that
there's no plane up there.

488
00:25:09,670 --> 00:25:14,930
So event A is the event that the
plane is flying out there.

489
00:25:14,930 --> 00:25:19,140
Now you bought this wonderful
radar that looks up.

490
00:25:19,140 --> 00:25:23,300
And you're told in the
manufacturer's specs that, if

491
00:25:23,300 --> 00:25:27,310
there is a plane out there,
your radar is going to

492
00:25:27,310 --> 00:25:30,090
register something, a
blip on the screen

493
00:25:30,090 --> 00:25:32,940
with probability 99%.

494
00:25:32,940 --> 00:25:35,540
And it will not register
anything with

495
00:25:35,540 --> 00:25:37,500
probability one percent.

496
00:25:37,500 --> 00:25:43,890
So this particular part of the
picture is a self-contained

497
00:25:43,890 --> 00:25:50,280
probability model of what your
radar does in a world where a

498
00:25:50,280 --> 00:25:52,530
plane is out there.

499
00:25:52,530 --> 00:25:55,380
So I'm telling you that the
plane is out there.

500
00:25:55,380 --> 00:25:58,240
So we're now dealing with
conditional probabilities

501
00:25:58,240 --> 00:26:00,920
because I gave you some
particular information.

502
00:26:00,920 --> 00:26:04,120
Given this information that the
plane is out there, that's

503
00:26:04,120 --> 00:26:07,770
how your radar is going to
behave: with probability 99% it is

504
00:26:07,770 --> 00:26:10,320
going to detect it, and with
probability one percent it is

505
00:26:10,320 --> 00:26:11,620
going to miss it.

506
00:26:11,620 --> 00:26:14,100
So this piece of the picture
is a self-contained

507
00:26:14,100 --> 00:26:15,060
probability model.

508
00:26:15,060 --> 00:26:17,130
The probabilities
add up to one.

509
00:26:17,130 --> 00:26:20,300
But it's a piece of
a larger model.

510
00:26:20,300 --> 00:26:22,820
Similarly, there's the
other possibility.

511
00:26:22,820 --> 00:26:27,980
Maybe a plane is not up there
and the manufacturer's specs

512
00:26:27,980 --> 00:26:32,630
tell you something about
false alarms.

513
00:26:32,630 --> 00:26:37,490
A false alarm is the situation
where the plane is not there,

514
00:26:37,490 --> 00:26:41,190
but for some reason your radar
picked up some noise or

515
00:26:41,190 --> 00:26:43,700
whatever and shows a
blip on the screen.

516
00:26:43,700 --> 00:26:46,790
And suppose that this happens
with probability ten percent.

517
00:26:46,790 --> 00:26:49,170
Whereas with probability
90% your radar

518
00:26:49,170 --> 00:26:51,220
gives the correct answer.

519
00:26:51,220 --> 00:26:55,430
So this is sort of a model of
what's going to happen with

520
00:26:55,430 --> 00:26:59,430
respect to both the plane --
we're given probabilities

521
00:26:59,430 --> 00:27:02,000
about this -- and we're given
probabilities about how the

522
00:27:02,000 --> 00:27:04,120
radar behaves.

523
00:27:04,120 --> 00:27:07,740
So here I have indirectly
specified the probability law

524
00:27:07,740 --> 00:27:10,810
in our model by starting with
conditional probabilities as

525
00:27:10,810 --> 00:27:13,670
opposed to starting with
ordinary probabilities.

526
00:27:13,670 --> 00:27:17,160
Can we derive ordinary
probabilities starting from

527
00:27:17,160 --> 00:27:18,740
the conditional ones?

528
00:27:18,740 --> 00:27:20,340
Yeah, we certainly can.

529
00:27:20,340 --> 00:27:25,810
Let's look at this event, A
intersection B, which is the

530
00:27:25,810 --> 00:27:31,160
event up here, that there
is a plane and our

531
00:27:31,160 --> 00:27:33,750
radar picks it up.

532
00:27:33,750 --> 00:27:35,760
How can we calculate
this probability?

533
00:27:35,760 --> 00:27:38,600
Well, we use the definition of
conditional probabilities and

534
00:27:38,600 --> 00:27:41,430
this is the probability of
A times the conditional

535
00:27:41,430 --> 00:27:50,260
probability of B given A.
So it's 0.05 times 0.99.

536
00:27:50,260 --> 00:27:53,290
And the answer, in
case you care--

537
00:27:53,290 --> 00:27:56,730
It's 0.0495.

538
00:27:56,730 --> 00:27:57,650
OK.

539
00:27:57,650 --> 00:28:01,370
So we can calculate the
probabilities of final

540
00:28:01,370 --> 00:28:05,120
outcomes, which are the leaves
of the tree, by using the

541
00:28:05,120 --> 00:28:07,250
probabilities that
we have along the

542
00:28:07,250 --> 00:28:09,000
branches of the tree.

543
00:28:09,000 --> 00:28:11,950
So essentially, what we ended
up doing was to multiply the

544
00:28:11,950 --> 00:28:13,700
probability of this
branch times the

545
00:28:13,700 --> 00:28:17,220
probability of that branch.
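The tree calculation can be sketched in a few lines of Python. Here A stands for "a plane is up there" and B for "the radar registers something", with the numbers from the specs in this example; the variable names are illustrative. Each leaf probability is the product of the probabilities along the branches leading to it.

```python
# Radar example: A = "a plane is up there", B = "the radar registers something".
p_A = 0.05              # P(A)
p_B_given_A = 0.99      # P(B | A): the radar detects a plane that is there
p_B_given_notA = 0.10   # P(B | A^c): false alarm

# Each leaf of the tree = product of the branch probabilities leading to it.
leaves = {
    ("plane", "blip"):       p_A * p_B_given_A,
    ("plane", "no blip"):    p_A * (1 - p_B_given_A),
    ("no plane", "blip"):    (1 - p_A) * p_B_given_notA,
    ("no plane", "no blip"): (1 - p_A) * (1 - p_B_given_notA),
}

for leaf, leaf_prob in leaves.items():
    print(leaf, round(leaf_prob, 4))
```

The four leaves partition the sample space, so their probabilities add up to one.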

546
00:28:17,220 --> 00:28:20,690
Now, how about the answer
to this question.

547
00:28:20,690 --> 00:28:25,350
What is the probability
that our radar is

548
00:28:25,350 --> 00:28:28,660
going to register something?

549
00:28:28,660 --> 00:28:32,800
OK, this is an event that can
happen in multiple ways.

550
00:28:32,800 --> 00:28:38,020
It's the event that consists
of this outcome.

551
00:28:38,020 --> 00:28:41,640
There is a plane and the radar
registers something together

552
00:28:41,640 --> 00:28:46,440
with this outcome, there is no
plane but the radar still

553
00:28:46,440 --> 00:28:48,470
registers something.

554
00:28:48,470 --> 00:28:52,650
So to find the probability of
this event, we need the

555
00:28:52,650 --> 00:28:56,940
individual probabilities
of the two outcomes.

556
00:28:56,940 --> 00:29:00,780
For the first outcome, we
already calculated it.

557
00:29:00,780 --> 00:29:03,870
For the second outcome, the
probability that this happens

558
00:29:03,870 --> 00:29:08,480
is going to be this probability
95% times 0.10,

559
00:29:08,480 --> 00:29:11,280
which is the conditional
probability for taking this

560
00:29:11,280 --> 00:29:15,070
branch, given that there
was no plane out there.

561
00:29:15,070 --> 00:29:18,080
So we just add the numbers.

562
00:29:18,080 --> 00:29:26,950
0.05 times 0.99 plus 0.95
times 0.1 and the

563
00:29:26,950 --> 00:29:31,720
final answer is 0.1445.

564
00:29:31,720 --> 00:29:32,410
OK.

565
00:29:32,410 --> 00:29:35,730
And now here's the interesting
question.

566
00:29:35,730 --> 00:29:41,480
Given that your radar recorded
something, how likely is it

567
00:29:41,480 --> 00:29:45,070
that there is an airplane
up there?

568
00:29:45,070 --> 00:29:46,810
Your radar registering
something --

569
00:29:46,810 --> 00:29:48,730
that can be caused
by two things.

570
00:29:48,730 --> 00:29:52,390
Either there's a plane there,
and your radar did its job.

571
00:29:52,390 --> 00:29:57,400
Or there was nothing, but your
radar fired a false alarm.

572
00:29:57,400 --> 00:30:01,690
What's the probability that this
is the case as opposed to

573
00:30:01,690 --> 00:30:05,370
that being the case?

574
00:30:05,370 --> 00:30:06,460
OK.

575
00:30:06,460 --> 00:30:10,510
The intuitive shortcut would
be to look at

576
00:30:10,510 --> 00:30:12,930
the relative odds

577
00:30:12,930 --> 00:30:15,820
of these two elements and
use them to find out how

578
00:30:15,820 --> 00:30:19,570
much more likely it is to be
in the first situation

579
00:30:19,570 --> 00:30:21,730
as opposed
to the second.

580
00:30:21,730 --> 00:30:24,240
But instead of doing this,
let's just write down the

581
00:30:24,240 --> 00:30:26,570
definition and just use it.

582
00:30:26,570 --> 00:30:30,480
It's the probability of A and
B happening, divided by the

583
00:30:30,480 --> 00:30:34,250
probability of B. This is just
our definition of conditional

584
00:30:34,250 --> 00:30:35,540
probabilities.

585
00:30:35,540 --> 00:30:39,300
Now we have already found
the numerator.

586
00:30:39,300 --> 00:30:42,450
We have already calculated
the denominator.

587
00:30:42,450 --> 00:30:46,440
So we take the ratio of these
two numbers and we find the

588
00:30:46,440 --> 00:30:47,650
final answer --

589
00:30:47,650 --> 00:30:54,490
which is 0.34.
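Putting the three steps together (the multiplication rule for the numerator, the total probability of B for the denominator, then the definition of conditional probability) gives a short sketch. The function name `posterior` and its parameter names are illustrative assumptions, not notation from the lecture.

```python
def posterior(prior, likelihood, likelihood_complement):
    """Return (P(B), P(A | B)) for a two-scenario model.

    prior: P(A); likelihood: P(B | A); likelihood_complement: P(B | A^c).
    """
    p_A_and_B = prior * likelihood                          # multiplication rule
    p_B = p_A_and_B + (1 - prior) * likelihood_complement   # add the two ways B can happen
    return p_B, p_A_and_B / p_B                             # definition of P(A | B)

p_B, p_A_given_B = posterior(0.05, 0.99, 0.10)
print(f"P(B) = {p_B:.4f}, P(A | B) = {p_A_given_B:.4f}")   # P(B) = 0.1445, P(A | B) = 0.3426
```

As a hypothetical what-if, lowering the false-alarm probability to 0.01, with posterior(0.05, 0.99, 0.01), raises P(A | B) to roughly 0.84, which is consistent with the explanation that follows: the frequent false alarms are what drag the answer down.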

590
00:30:54,490 --> 00:30:55,980
OK.

591
00:30:55,980 --> 00:30:59,040
There's this slightly
curious thing that's

592
00:30:59,040 --> 00:31:02,270
happened in this example.

593
00:31:02,270 --> 00:31:08,380
Doesn't this number feel
a little too low?

594
00:31:08,380 --> 00:31:10,700
My radar --

595
00:31:10,700 --> 00:31:13,820
So this is a conditional
probability, given that my

596
00:31:13,820 --> 00:31:17,110
radar said there is something
out there, that there is

597
00:31:17,110 --> 00:31:19,200
indeed something there.

598
00:31:19,200 --> 00:31:21,960
So it's sort of the probability
that our radar

599
00:31:21,960 --> 00:31:24,560
gave the correct answer.

600
00:31:24,560 --> 00:31:28,580
Now, the specs of our radar
were pretty good.

601
00:31:28,580 --> 00:31:31,460
In this situation, it gives
you the correct

602
00:31:31,460 --> 00:31:34,160
answer 99% of the time.

603
00:31:34,160 --> 00:31:36,020
In this situation, it gives
you the correct

604
00:31:36,020 --> 00:31:38,400
answer 90% of the time.

605
00:31:38,400 --> 00:31:39,730
So you would think
that your radar

606
00:31:39,730 --> 00:31:41,870
is really reliable.

607
00:31:41,870 --> 00:31:47,730
Yet here the radar recorded
something, and the

608
00:31:47,730 --> 00:31:51,900
chance that the answer that
you get out of this is the

609
00:31:51,900 --> 00:31:55,180
right one, that is, given that it
recorded something, the chance

610
00:31:55,180 --> 00:31:58,970
that there is an airplane
out there, is only about 34%.

611
00:31:58,970 --> 00:32:01,980
So you cannot really rely on
the measurements from your

612
00:32:01,980 --> 00:32:06,650
radar, even though the specs of
the radar were really good.

613
00:32:06,650 --> 00:32:08,620
What's the reason for this?

614
00:32:08,620 --> 00:32:17,730
Well, the reason is that false
alarms are pretty common.

615
00:32:17,730 --> 00:32:20,110
Most of the time there's
nothing.

616
00:32:20,110 --> 00:32:23,750
And there's a ten percent
probability of false alarms.

617
00:32:23,750 --> 00:32:26,640
So there's roughly a ten percent
probability that in

618
00:32:26,640 --> 00:32:29,730
any given experiment, you
have a false alarm.

619
00:32:29,730 --> 00:32:33,450
And there is about a five
percent probability that

620
00:32:33,450 --> 00:32:37,090
something is out there and
your radar gets it.

621
00:32:37,090 --> 00:32:41,350
So when your radar records
something, it's actually more

622
00:32:41,350 --> 00:32:44,980
likely to be a false
alarm rather than

623
00:32:44,980 --> 00:32:46,860
being an actual airplane.

624
00:32:46,860 --> 00:32:49,100
This has probability ten
percent roughly.

625
00:32:49,100 --> 00:32:52,000
This has probability roughly
five percent.

626
00:32:52,000 --> 00:32:55,130
So conditional probabilities
are sometimes

627
00:32:55,130 --> 00:32:58,250
counter-intuitive in terms of
the answers that they give.

628
00:32:58,250 --> 00:33:01,210
And you can make similar
stories about doctors

629
00:33:01,210 --> 00:33:04,370
interpreting the results
of tests.

630
00:33:04,370 --> 00:33:07,560
So you tested positive for
a certain disease.

631
00:33:07,560 --> 00:33:11,260
Does it mean that you have
the disease necessarily?

632
00:33:11,260 --> 00:33:14,590
Well if that disease has been
eradicated from the face of

633
00:33:14,590 --> 00:33:17,900
the earth, testing positive
doesn't mean that you have the

634
00:33:17,900 --> 00:33:21,740
disease, even if the test
was designed to be

635
00:33:21,740 --> 00:33:23,320
a pretty good one.

636
00:33:23,320 --> 00:33:28,190
So unfortunately, doctors do
sometimes get it wrong, too.

637
00:33:28,190 --> 00:33:29,990
And the reasoning that
comes in such

638
00:33:29,990 --> 00:33:32,290
situations is pretty subtle.

639
00:33:32,290 --> 00:33:34,890
Now for the rest of the lecture,
what we're going to

640
00:33:34,890 --> 00:33:40,710
do is to take this example where
we did three things and

641
00:33:40,710 --> 00:33:41,880
abstract them.

642
00:33:41,880 --> 00:33:44,540
These three trivial calculations
that we just

643
00:33:44,540 --> 00:33:50,190
did are three very important,
very basic tools that you use

644
00:33:50,190 --> 00:33:53,350
to solve more general
probability problems.

645
00:33:53,350 --> 00:33:55,040
So what's the first one?

646
00:33:55,040 --> 00:33:58,040
We find the probability of a
composite event, two things

647
00:33:58,040 --> 00:34:01,300
happening, by multiplying
probabilities and conditional

648
00:34:01,300 --> 00:34:03,130
probabilities.

649
00:34:03,130 --> 00:34:08,639
For a more general version of this,
look at any situation, maybe

650
00:34:08,639 --> 00:34:10,860
involving lots and
lots of events.

651
00:34:10,860 --> 00:34:15,510
So here's a story: event A
may happen or may not happen.

652
00:34:15,510 --> 00:34:19,440
Given that A occurred, it's
possible that B happens or

653
00:34:19,440 --> 00:34:21,360
that B does not happen.

654
00:34:21,360 --> 00:34:25,280
Given that B also happens, it's
possible that the event C

655
00:34:25,280 --> 00:34:29,770
also happens or that event
C does not happen.

656
00:34:29,770 --> 00:34:33,400
And somebody specifies for you
a model by giving you all

657
00:34:33,400 --> 00:34:36,230
these conditional probabilities
along the way.

658
00:34:36,230 --> 00:34:39,570
Notice that we move along
the branches as the tree

659
00:34:39,570 --> 00:34:40,690
progresses.

660
00:34:40,690 --> 00:34:45,110
Any point in the tree
corresponds to certain events

661
00:34:45,110 --> 00:34:47,050
having happened.

662
00:34:47,050 --> 00:34:50,980
And then, given that this
has happened, we specify

663
00:34:50,980 --> 00:34:52,360
conditional probabilities.

664
00:34:52,360 --> 00:34:55,989
Given that this has happened,
how likely is it that C

665
00:34:55,989 --> 00:34:57,900
also occurs?

666
00:34:57,900 --> 00:35:00,890
Given a model of this kind, how
do we find the probability

667
00:35:00,890 --> 00:35:02,660
of this event?

668
00:35:02,660 --> 00:35:05,310
The answer is extremely
simple.

669
00:35:05,310 --> 00:35:09,930
All that you do is move along
with the tree and multiply

670
00:35:09,930 --> 00:35:12,950
conditional probabilities
along the way.

671
00:35:12,950 --> 00:35:16,900
So in terms of frequencies, how
often do all three things

672
00:35:16,900 --> 00:35:19,310
happen, A, B, and C?

673
00:35:19,310 --> 00:35:22,450
You first see how often
A occurs.

674
00:35:22,450 --> 00:35:24,860
Out of the times that
A occurs, how

675
00:35:24,860 --> 00:35:26,710
often does B occur?

676
00:35:26,710 --> 00:35:29,630
And out of the times where both
A and B have occurred,

677
00:35:29,630 --> 00:35:31,660
how often does C occur?

678
00:35:31,660 --> 00:35:34,390
And you can just multiply those
three frequencies with

679
00:35:34,390 --> 00:35:36,440
each other.

680
00:35:36,440 --> 00:35:39,740
What is the formal
proof of this?

681
00:35:39,740 --> 00:35:43,000
Well, the only thing we have in
our hands is the definition

682
00:35:43,000 --> 00:35:44,890
of conditional probabilities.

683
00:35:44,890 --> 00:35:49,660
So let's just use this.

684
00:35:49,660 --> 00:35:50,910
And--

685
00:35:50,910 --> 00:35:54,370

686
00:35:54,370 --> 00:35:55,000
OK.

687
00:35:55,000 --> 00:35:58,210
Now, the definition of
conditional probabilities

688
00:35:58,210 --> 00:36:00,770
tells us that the probability
of two things is the

689
00:36:00,770 --> 00:36:03,660
probability of one of them
times a conditional

690
00:36:03,660 --> 00:36:04,620
probability.

691
00:36:04,620 --> 00:36:05,850
Unfortunately, here we have the

692
00:36:05,850 --> 00:36:07,310
probability of three things.

693
00:36:07,310 --> 00:36:09,000
What can I do?

694
00:36:09,000 --> 00:36:13,570
I can put a parenthesis in here
and think of this as the

695
00:36:13,570 --> 00:36:18,640
probability of this and that
and apply our definition of

696
00:36:18,640 --> 00:36:20,300
conditional probabilities
here.

697
00:36:20,300 --> 00:36:23,920
The probability of two things
happening is the probability

698
00:36:23,920 --> 00:36:28,430
that the first happens times
the conditional probability

699
00:36:28,430 --> 00:36:34,070
that the second happens, given
A and B, given that the first

700
00:36:34,070 --> 00:36:35,330
one happened.

701
00:36:35,330 --> 00:36:38,850
So this is just the definition
of the conditional probability

702
00:36:38,850 --> 00:36:41,980
of an event, given
another event.

703
00:36:41,980 --> 00:36:44,270
That other event is a
composite one, but

704
00:36:44,270 --> 00:36:45,330
that's not an issue.

705
00:36:45,330 --> 00:36:47,300
It's just an event.

706
00:36:47,300 --> 00:36:50,040
And then we use the definition
of conditional probabilities

707
00:36:50,040 --> 00:36:56,290
once more to break this apart
and make it P(A) times P(B given A)

708
00:36:56,290 --> 00:36:58,260
times, finally,
the last term, P(C given A and B).

709
00:36:58,260 --> 00:37:00,930

710
00:37:00,930 --> 00:37:01,270
OK.

711
00:37:01,270 --> 00:37:03,680
So this proves the formula
that I have up

712
00:37:03,680 --> 00:37:05,290
there on the slides.
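Written out in symbols, the two applications of the definition described above are as follows, assuming P(A ∩ B) > 0 so that every conditional probability involved is defined:

```latex
\begin{aligned}
P(A \cap B \cap C) &= P\big((A \cap B) \cap C\big) \\
                   &= P(A \cap B)\, P(C \mid A \cap B) \\
                   &= P(A)\, P(B \mid A)\, P(C \mid A \cap B).
\end{aligned}
```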

713
00:37:05,290 --> 00:37:07,470
And if you wish to calculate
any other

714
00:37:07,470 --> 00:37:09,330
probability in this diagram,

715
00:37:09,330 --> 00:37:12,590
for example,
this probability,

716
00:37:12,590 --> 00:37:15,580
you would still multiply the
conditional probabilities

717
00:37:15,580 --> 00:37:18,560
along the different branches
of the tree.

718
00:37:18,560 --> 00:37:22,360
In particular, here in this
branch, you would have the

719
00:37:22,360 --> 00:37:26,670
conditional probability of
C complement, given A

720
00:37:26,670 --> 00:37:29,790
intersection B complement,
and so on.

721
00:37:29,790 --> 00:37:32,070
So you write down probabilities
along all those

722
00:37:32,070 --> 00:37:35,940
tree branches and just multiply
them as you go.
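The same multiply-along-the-branches recipe works for any path through the tree. The sketch below assumes a three-stage tree for events A, B, C; the conditional probabilities in it are made-up numbers chosen only for illustration, and keying the dictionary by the path taken so far is just one convenient encoding.

```python
from itertools import product
from math import prod

# Hypothetical three-stage tree: A or A^c, then B or B^c, then C or C^c.
# Key: the outcomes so far (True = that event happened).
# Value: conditional probability that the NEXT event happens, given that path.
cond = {
    (): 0.3,               # P(A)
    (True,): 0.6,          # P(B | A)
    (False,): 0.2,         # P(B | A^c)
    (True, True): 0.5,     # P(C | A ∩ B)
    (True, False): 0.9,    # P(C | A ∩ B^c)
    (False, True): 0.1,    # P(C | A^c ∩ B)
    (False, False): 0.4,   # P(C | A^c ∩ B^c)
}

def leaf_probability(path):
    """Multiply the conditional probabilities along the branches of `path`."""
    factors = []
    for depth, happened in enumerate(path):
        p_next = cond[path[:depth]]
        factors.append(p_next if happened else 1 - p_next)
    return prod(factors)

leaves = {path: leaf_probability(path) for path in product([True, False], repeat=3)}

print(leaves[(True, True, True)])     # P(A ∩ B ∩ C) = 0.3 * 0.6 * 0.5
print(leaves[(True, False, False)])   # P(A ∩ B^c ∩ C^c) = 0.3 * 0.4 * 0.1
print(sum(leaves.values()))           # 1, up to floating-point rounding
```

Every leaf, including the A ∩ B^c ∩ C^c leaf mentioned above, comes out of the same loop; only the branch choices change.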

723
00:37:35,940 --> 00:37:38,510

724
00:37:38,510 --> 00:37:44,450
So this was the first skill
that we are covering.

725
00:37:44,450 --> 00:37:46,690
What was the second one?

726
00:37:46,690 --> 00:37:53,240
What we did was to calculate
the total probability of a

727
00:37:53,240 --> 00:37:58,520
certain event B that
was made up of

728
00:37:58,520 --> 00:38:02,820
different
possibilities, which

729
00:38:02,820 --> 00:38:05,580
corresponded to different
scenarios.

730
00:38:05,580 --> 00:38:08,870
So we wanted to calculate the
probability of this event B

731
00:38:08,870 --> 00:38:12,030
that consisted of those
two elements.

732
00:38:12,030 --> 00:38:13,280
Let's generalize.

733
00:38:13,280 --> 00:38:18,600

734
00:38:18,600 --> 00:38:23,080
So we have our big model.

735
00:38:23,080 --> 00:38:26,110
And this sample space
is partitioned

736
00:38:26,110 --> 00:38:27,410
into a number of sets.

737
00:38:27,410 --> 00:38:30,620
In our radar example, we had
a partition in two sets.

738
00:38:30,620 --> 00:38:33,600
Either a plane is there, or
a plane is not there.

739
00:38:33,600 --> 00:38:35,850
Since we're trying to
generalize, now I'm going to

740
00:38:35,850 --> 00:38:39,410
give you a picture for the case
of three possibilities or

741
00:38:39,410 --> 00:38:41,360
three possible scenarios.

742
00:38:41,360 --> 00:38:45,160
So whatever happens in the
world, there are three

743
00:38:45,160 --> 00:38:49,660
possible scenarios,
A1, A2, A3.

744
00:38:49,660 --> 00:38:54,695
So think of these as there's
nothing in the air, there's an

745
00:38:54,695 --> 00:38:58,190
airplane in the air, or there's
a flock of geese

746
00:38:58,190 --> 00:38:59,490
flying in the air.

747
00:38:59,490 --> 00:39:03,050
So there's three possible
scenarios.

748
00:39:03,050 --> 00:39:08,972
And then there's a certain event
B of interest, such as whether the

749
00:39:08,972 --> 00:39:12,800
radar records something or
doesn't record something.

750
00:39:12,800 --> 00:39:15,870
We specify this model by giving

751
00:39:15,870 --> 00:39:18,040
probabilities for the Ai's--

752
00:39:18,040 --> 00:39:20,690

753
00:39:20,690 --> 00:39:23,420
That's the probability of
the different scenarios.

754
00:39:23,420 --> 00:39:27,180
And somebody also gives us the
probabilities that this event

755
00:39:27,180 --> 00:39:31,010
B is going to occur, given
that the i-th

756
00:39:31,010 --> 00:39:33,480
scenario, Ai, has occurred.

757
00:39:33,480 --> 00:39:36,230
Think of the Ai's
as scenarios.

758
00:39:36,230 --> 00:39:39,130

759
00:39:39,130 --> 00:39:43,110
And we want to calculate the
overall probability of the

760
00:39:43,110 --> 00:39:47,210
event B. What's happening
in this example?

761
00:39:47,210 --> 00:39:49,640
Perhaps, instead of this
picture, it's easier to

762
00:39:49,640 --> 00:39:54,970
visualize if I go back to the
picture I was using before.

763
00:39:54,970 --> 00:39:59,990
We have three possible
scenarios, A1, A2, A3.

764
00:39:59,990 --> 00:40:05,150
And under each scenario, B may
happen or B may not happen.

765
00:40:05,150 --> 00:40:11,360

766
00:40:11,360 --> 00:40:12,250
And so on.

767
00:40:12,250 --> 00:40:16,060
So here we have A2 intersection
B. And here we

768
00:40:16,060 --> 00:40:22,110
have A3 intersection B. In the
previous slide, we found how

769
00:40:22,110 --> 00:40:25,350
to calculate the probability
of any event of this kind,

770
00:40:25,350 --> 00:40:28,870
which is done by multiplying
probabilities here and

771
00:40:28,870 --> 00:40:31,100
conditional probabilities
there.

772
00:40:31,100 --> 00:40:34,320
Now we are asked to calculate
the total probability of the

773
00:40:34,320 --> 00:40:38,410
event B. The event B can happen
in three possible ways.

774
00:40:38,410 --> 00:40:39,900
It can happen here.

775
00:40:39,900 --> 00:40:41,700
It can happen there.

776
00:40:41,700 --> 00:40:43,780
And it can happen here.

777
00:40:43,780 --> 00:40:50,020
So this is our event B. It
consists of three elements.

778
00:40:50,020 --> 00:40:53,370
To calculate the total
probability of our event B,

779
00:40:53,370 --> 00:40:56,730
all we need to do is to add
these three probabilities.

780
00:40:56,730 --> 00:40:59,440

781
00:40:59,440 --> 00:41:03,510
So B is an event that consists
of these three elements.

782
00:41:03,510 --> 00:41:06,450
There are three ways
that B can happen.

783
00:41:06,450 --> 00:41:10,390
Either B happens together with
A1, or B happens together with

784
00:41:10,390 --> 00:41:13,030
A2, or B happens together
with A3.

785
00:41:13,030 --> 00:41:15,340
So we need to add the
probabilities of these three

786
00:41:15,340 --> 00:41:16,630
contingencies.

787
00:41:16,630 --> 00:41:18,980
For each one of those
contingencies, we can

788
00:41:18,980 --> 00:41:23,020
calculate its probability by
using the multiplication rule.

789
00:41:23,020 --> 00:41:27,580
So the probability of A1 and
B happening is this--

790
00:41:27,580 --> 00:41:30,030
It's the probability of A1
times the probability of B happening

791
00:41:30,030 --> 00:41:32,020
given that A1 happens.

792
00:41:32,020 --> 00:41:36,140
The probability of this
contingency is found by taking

793
00:41:36,140 --> 00:41:39,470
the probability that A2 happens
times the conditional

794
00:41:39,470 --> 00:41:42,350
probability of B, given
that A2 happened.

795
00:41:42,350 --> 00:41:44,640
And similarly for
the third one.

796
00:41:44,640 --> 00:41:48,030
So this is the general rule
that we have here.

797
00:41:48,030 --> 00:41:50,830
The rule is written for the
case of three scenarios.

798
00:41:50,830 --> 00:41:54,020
But obviously, it has a
generalization for the case of

799
00:41:54,020 --> 00:41:57,440
four or five or more
scenarios.

800
00:41:57,440 --> 00:42:02,050
It gives you a way of breaking
up the calculation of an event

801
00:42:02,050 --> 00:42:06,740
that can happen in multiple ways
by considering individual

802
00:42:06,740 --> 00:42:09,720
probabilities for the different
ways that the event

803
00:42:09,720 --> 00:42:10,970
can happen.

804
00:42:10,970 --> 00:42:12,950

805
00:42:12,950 --> 00:42:14,640
OK.

806
00:42:14,640 --> 00:42:16,300
So--

807
00:42:16,300 --> 00:42:16,656
Yes?

808
00:42:16,656 --> 00:42:18,180
AUDIENCE: Does this
have to change for

809
00:42:18,180 --> 00:42:19,800
infinite sample space?

810
00:42:19,800 --> 00:42:20,760
JOHN TSISIKLIS: No.

811
00:42:20,760 --> 00:42:23,050
This is true whether
your sample space

812
00:42:23,050 --> 00:42:25,450
is infinite or finite.

813
00:42:25,450 --> 00:42:28,410
What I'm using in this argument
is that we have a

814
00:42:28,410 --> 00:42:33,670
partition into just three
scenarios, three events.

815
00:42:33,670 --> 00:42:36,720
So it's a partition into a finite
number of events.

816
00:42:36,720 --> 00:42:41,100
It's also true if it's a
partition into an infinite

817
00:42:41,100 --> 00:42:43,670
sequence of events.

818
00:42:43,670 --> 00:42:47,550
But that's, I think, one of the
theoretical problems at

819
00:42:47,550 --> 00:42:49,430
the end of the chapter.

820
00:42:49,430 --> 00:42:54,350
You probably won't
need it for now.

821
00:42:54,350 --> 00:42:57,550
OK, going back to
the story here.

822
00:42:57,550 --> 00:43:00,410
There are three possible
scenarios about what could

823
00:43:00,410 --> 00:43:03,390
happen in the world that
are captured here.

824
00:43:03,390 --> 00:43:08,660
Under each scenario,
event B may or may not happen.

825
00:43:08,660 --> 00:43:11,850
And so these probabilities tell
us the likelihoods of the

826
00:43:11,850 --> 00:43:13,270
different scenarios.

827
00:43:13,270 --> 00:43:17,640
These conditional probabilities
tell us how

828
00:43:17,640 --> 00:43:21,030
likely it is for B to happen
under one scenario, or the

829
00:43:21,030 --> 00:43:23,760
other scenario, or the
other scenario.

830
00:43:23,760 --> 00:43:28,510
The overall probability of
B is found by taking some

831
00:43:28,510 --> 00:43:32,380
combination of the probabilities
of B in the

832
00:43:32,380 --> 00:43:34,250
different possible
worlds, in the

833
00:43:34,250 --> 00:43:36,230
different possible scenarios.

834
00:43:36,230 --> 00:43:38,690
Under some scenario, B
may be very likely.

835
00:43:38,690 --> 00:43:42,280
Under another scenario, it
may be very unlikely.

836
00:43:42,280 --> 00:43:45,740
We take all of these into
account and weigh them

837
00:43:45,740 --> 00:43:48,590
according to the likelihood
of the scenarios.

838
00:43:48,590 --> 00:43:53,040
Now notice that since A1, A2,
and A3 form a partition,

839
00:43:53,040 --> 00:43:58,530
these three probabilities
have what property?

840
00:43:58,530 --> 00:44:00,810
Add to what?

841
00:44:00,810 --> 00:44:03,640
They add to one.

842
00:44:03,640 --> 00:44:06,020
So it's the probability of this
branch, plus this branch,

843
00:44:06,020 --> 00:44:07,240
plus this branch.

844
00:44:07,240 --> 00:44:11,660
So what we have here is a
weighted average of the

845
00:44:11,660 --> 00:44:15,120
probabilities of B in
the different worlds, or in

846
00:44:15,120 --> 00:44:16,690
the different scenarios.
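As a rough illustration of the weighted average just described (the total probability theorem), here is a minimal Python sketch. The function name `total_probability` and the three-scenario numbers are hypothetical, chosen only for illustration. The priors P(Ai) act as weights that sum to one because the scenarios form a partition.

```python
def total_probability(priors, likelihoods):
    """Total probability theorem: P(B) = sum_i P(A_i) * P(B | A_i).

    priors      -- P(A_i) for each scenario of a partition (must sum to 1)
    likelihoods -- P(B | A_i) for the same scenarios, in the same order
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per scenario")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("probabilities of a partition must sum to 1")
    # Weighted average of P(B | A_i), using the scenario probabilities as weights
    return sum(p * q for p, q in zip(priors, likelihoods))


# Hypothetical three-scenario model (illustrative numbers only)
priors = [0.5, 0.3, 0.2]        # P(A1), P(A2), P(A3)
likelihoods = [0.9, 0.5, 0.1]   # P(B | A1), P(B | A2), P(B | A3)
p_b = total_probability(priors, likelihoods)
print(round(p_b, 4))  # 0.5*0.9 + 0.3*0.5 + 0.2*0.1 = 0.62
```

Because P(B) is a weighted average of the conditional probabilities, it always lies between the smallest and the largest P(B | Ai).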

847
00:44:16,690 --> 00:44:17,860
Special case.

848
00:44:17,860 --> 00:44:20,370
Suppose the three scenarios
are equally likely.

849
00:44:20,370 --> 00:44:25,300
So P of A1 equals P of A2
equals P of A3 equals 1/3.

850
00:44:25,300 --> 00:44:27,320
What are we saying here?

851
00:44:27,320 --> 00:44:31,750
In that case of equally likely
scenarios, the probability of

852
00:44:31,750 --> 00:44:35,920
B is the average of the
probabilities of B in the

853
00:44:35,920 --> 00:44:38,835
three different worlds, or in the
three different scenarios.
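Written out, the equally likely special case follows when each weight in the total probability theorem is set to 1/3:

```latex
\mathbf{P}(B) = \sum_{i=1}^{3} \mathbf{P}(A_i)\,\mathbf{P}(B \mid A_i)
             = \tfrac{1}{3}\big(\mathbf{P}(B \mid A_1) + \mathbf{P}(B \mid A_2) + \mathbf{P}(B \mid A_3)\big)
```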

854
00:44:38,835 --> 00:44:42,950

855
00:44:42,950 --> 00:44:43,450
OK.

856
00:44:43,450 --> 00:44:46,630
So, finally, the last step.

857
00:44:46,630 --> 00:44:53,800
If we go back two slides,
the last thing that we

858
00:44:53,800 --> 00:44:57,510
did was to calculate a
conditional probability of

859
00:44:57,510 --> 00:45:01,760
this kind, probability of
A given B, which is a

860
00:45:01,760 --> 00:45:04,080
probability associated
essentially with

861
00:45:04,080 --> 00:45:05,630
an inference problem.

862
00:45:05,630 --> 00:45:09,840
Given that our radar recorded
something, how likely is it

863
00:45:09,840 --> 00:45:12,060
that the plane was up there?

864
00:45:12,060 --> 00:45:15,240
So we're trying to infer whether
a plane was up there

865
00:45:15,240 --> 00:45:18,610
or not, based on the information
that we've got.

866
00:45:18,610 --> 00:45:20,770
So let's generalize once more.

867
00:45:20,770 --> 00:45:24,560

868
00:45:24,560 --> 00:45:28,250
And we're just going to rewrite
what we did in that

869
00:45:28,250 --> 00:45:32,190
example, but in terms of general
symbols instead of the

870
00:45:32,190 --> 00:45:33,650
specific numbers.

871
00:45:33,650 --> 00:45:38,180
So once more, the model that we
have involves probabilities

872
00:45:38,180 --> 00:45:40,480
of the different scenarios.

873
00:45:40,480 --> 00:45:42,830
We call these prior
probabilities.

874
00:45:42,830 --> 00:45:46,690
They are our initial beliefs
about how likely each

875
00:45:46,690 --> 00:45:49,360
scenario is to occur.

876
00:45:49,360 --> 00:45:54,500
We also have a model of our
measuring device that tells us

877
00:45:54,500 --> 00:45:58,110
under each scenario, how likely
it is that our radar will

878
00:45:58,110 --> 00:46:00,140
register something or not.

879
00:46:00,140 --> 00:46:03,220
So we're given again these
conditional probabilities.

880
00:46:03,220 --> 00:46:04,330
We're given the conditional

881
00:46:04,330 --> 00:46:06,950
probabilities for these branches.

882
00:46:06,950 --> 00:46:11,050
Then we are told that
event B occurred.

883
00:46:11,050 --> 00:46:15,330
And on the basis of this new
information, we want to form

884
00:46:15,330 --> 00:46:18,510
some new beliefs about the
relative likelihood of the

885
00:46:18,510 --> 00:46:20,110
different scenarios.

886
00:46:20,110 --> 00:46:23,790
Going back again to our radar
example, an airplane was

887
00:46:23,790 --> 00:46:26,340
present with probability 5%.

888
00:46:26,340 --> 00:46:29,180
Given that the radar recorded
something, we're going to

889
00:46:29,180 --> 00:46:30,540
change our beliefs.

890
00:46:30,540 --> 00:46:34,870
Now, a plane is present
with probability 34%.

891
00:46:34,870 --> 00:46:38,270
Since the radar saw
something, we are going to

892
00:46:38,270 --> 00:46:41,880
revise our beliefs as to whether
the plane is out there

893
00:46:41,880 --> 00:46:43,130
or is not there.
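The jump from 5% to about 34% can be checked numerically. This part of the lecture gives only the prior (5%) and the posterior (34%). In the sketch below, the radar's detection probability (0.99) and false-alarm probability (0.10) are assumptions taken from the standard textbook version of this example, and all variable names are hypothetical.

```python
# Radar example, assuming P(register | plane) = 0.99 and P(register | no plane) = 0.10
p_plane = 0.05                 # prior: P(plane present), as stated in the lecture
p_detect_given_plane = 0.99    # assumed detection probability
p_detect_given_none = 0.10     # assumed false-alarm probability

# Total probability that the radar registers something
p_detect = p_plane * p_detect_given_plane + (1 - p_plane) * p_detect_given_none

# Revised (posterior) belief: P(plane present | radar registered something)
p_plane_given_detect = p_plane * p_detect_given_plane / p_detect
print(f"{p_plane_given_detect:.4f}")  # prints 0.3426
```

Under these assumed likelihoods, the posterior 0.0495 / 0.1445 ≈ 0.34 matches the 34% quoted above.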

894
00:46:43,130 --> 00:46:46,040

895
00:46:46,040 --> 00:46:52,660
And so what we need to do is to
calculate the conditional

896
00:46:52,660 --> 00:46:57,290
probabilities of the different
scenarios, given the

897
00:46:57,290 --> 00:46:59,340
information that we got.

898
00:46:59,340 --> 00:47:02,330
So initially, we have these
probabilities for the

899
00:47:02,330 --> 00:47:04,000
different scenarios.

900
00:47:04,000 --> 00:47:06,870
Once we get the information,
we update them and we

901
00:47:06,870 --> 00:47:09,760
calculate our revised
probabilities or conditional

902
00:47:09,760 --> 00:47:14,130
probabilities given the
observation that we made.

903
00:47:14,130 --> 00:47:14,730
OK.

904
00:47:14,730 --> 00:47:15,760
So what do we do?

905
00:47:15,760 --> 00:47:17,620
We just use the definition
of conditional

906
00:47:17,620 --> 00:47:19,360
probabilities twice.

907
00:47:19,360 --> 00:47:22,490
By definition, the conditional
probability is the probability

908
00:47:22,490 --> 00:47:25,740
of two things happening divided
by the probability of

909
00:47:25,740 --> 00:47:27,960
the conditioning event.

910
00:47:27,960 --> 00:47:30,480
Now, I'm using the definition
of conditional probabilities

911
00:47:30,480 --> 00:47:33,550
once more, or rather I use
the multiplication rule.

912
00:47:33,550 --> 00:47:35,970
The probability of two things
happening is the probability

913
00:47:35,970 --> 00:47:38,740
of the first, times the probability
of the second given the first.

914
00:47:38,740 --> 00:47:41,190
So these are things that
are given to us.

915
00:47:41,190 --> 00:47:43,430
They're the probabilities of
the different scenarios.

916
00:47:43,430 --> 00:47:47,750
And the model of our
measuring device, which we

917
00:47:47,750 --> 00:47:51,810
assume to be available.

918
00:47:51,810 --> 00:47:53,450
And how about the denominator?

919
00:47:53,450 --> 00:47:57,780
This is the total probability of
event B. But we just found

920
00:47:57,780 --> 00:48:01,140
that it's easy to calculate
using the formula in the

921
00:48:01,140 --> 00:48:02,400
previous slide.

922
00:48:02,400 --> 00:48:04,750
To find the overall probability
of event B

923
00:48:04,750 --> 00:48:08,260
occurring, we look at the
probabilities of B occurring

924
00:48:08,260 --> 00:48:11,560
under the different scenarios
and weigh them according to

925
00:48:11,560 --> 00:48:13,710
the probabilities of
all the scenarios.

926
00:48:13,710 --> 00:48:17,370
So in the end, we have a formula
for the conditional

927
00:48:17,370 --> 00:48:22,730
probability of the A's given B,
based on the data of the

928
00:48:22,730 --> 00:48:25,090
problem, which were
probabilities of the different

929
00:48:25,090 --> 00:48:27,360
scenarios and conditional
probabilities of

930
00:48:27,360 --> 00:48:29,490
B, given the A's.
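Collected into one formula, this is Bayes' rule. The numerator comes from the multiplication rule and the denominator from the total probability theorem, where the A_j range over the scenarios of the partition:

```latex
\mathbf{P}(A_i \mid B)
  = \frac{\mathbf{P}(A_i \cap B)}{\mathbf{P}(B)}
  = \frac{\mathbf{P}(A_i)\,\mathbf{P}(B \mid A_i)}
         {\sum_{j} \mathbf{P}(A_j)\,\mathbf{P}(B \mid A_j)}
```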

931
00:48:29,490 --> 00:48:33,320
So what this calculation does
is, basically, it reverses the

932
00:48:33,320 --> 00:48:35,310
order of conditioning.

933
00:48:35,310 --> 00:48:39,000
We are given conditional
probabilities of this kind,

934
00:48:39,000 --> 00:48:42,950
where it's B given A, and we
produce new conditional

935
00:48:42,950 --> 00:48:46,630
probabilities, where things
go the other way.
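The reversal of conditioning can be sketched as a small function. It takes the priors P(Ai) and the causal model P(B | Ai) and returns the posteriors P(Ai | B). The function name `bayes_posteriors` and the example numbers are hypothetical. The three steps mirror the derivation: the multiplication rule, the total probability theorem, and the definition of conditional probability.

```python
from typing import List, Sequence


def bayes_posteriors(priors: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Turn P(B | A_i) into P(A_i | B) for scenarios A_1..A_n forming a partition."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per scenario")
    # Multiplication rule: P(A_i and B) = P(A_i) * P(B | A_i)
    joint = [p * q for p, q in zip(priors, likelihoods)]
    # Total probability theorem: P(B) = sum_j P(A_j) * P(B | A_j)
    p_b = sum(joint)
    if p_b == 0:
        raise ValueError("P(B) = 0, so conditioning on B is undefined")
    # Definition of conditional probability: P(A_i | B) = P(A_i and B) / P(B)
    return [j / p_b for j in joint]


# Hypothetical model: three scenarios with their priors and likelihoods of B
posteriors = bayes_posteriors([0.5, 0.3, 0.2], [0.9, 0.5, 0.1])
print([round(x, 4) for x in posteriors])
```

The posteriors always sum to one, because each joint probability is divided by the sum of all of them.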

936
00:48:46,630 --> 00:48:53,530
So schematically, what's
happening here is that we have

937
00:48:53,530 --> 00:48:59,995
a model of cause and
effect.

938
00:48:59,995 --> 00:49:02,550

939
00:49:02,550 --> 00:49:09,840
So a scenario occurs and that
may cause B to happen or may

940
00:49:09,840 --> 00:49:11,880
not cause it to happen.

941
00:49:11,880 --> 00:49:14,495
So this is a cause/effect
model.

942
00:49:14,495 --> 00:49:17,300

943
00:49:17,300 --> 00:49:20,090
And it's modeled using
probabilities, such as

944
00:49:20,090 --> 00:49:23,350
probability of B given Ai.

945
00:49:23,350 --> 00:49:28,710
And what we want to do is
inference where we are told

946
00:49:28,710 --> 00:49:35,910
that B occurs, and we want
to infer whether Ai

947
00:49:35,910 --> 00:49:38,580
also occurred or not.

948
00:49:38,580 --> 00:49:42,050
And the appropriate
probabilities for that are the

949
00:49:42,050 --> 00:49:45,010
conditional probabilities
that Ai occurred,

950
00:49:45,010 --> 00:49:48,110
given that B occurred.

951
00:49:48,110 --> 00:49:52,250
So we're starting with a causal
model of our situation.

952
00:49:52,250 --> 00:49:57,220
It models, for a given cause, how
likely a certain effect is

953
00:49:57,220 --> 00:49:58,830
to be observed.

954
00:49:58,830 --> 00:50:02,920
And then we do inference, which
answers the question,

955
00:50:02,920 --> 00:50:06,730
given that the effect was
observed, how likely is it

956
00:50:06,730 --> 00:50:10,870
that the world was in this
particular situation or state

957
00:50:10,870 --> 00:50:12,940
or scenario?

958
00:50:12,940 --> 00:50:17,260
So the name Bayes' rule
comes from Thomas Bayes, a

959
00:50:17,260 --> 00:50:20,750
British theologian back
in the 1700s.

960
00:50:20,750 --> 00:50:21,530
It actually--

961
00:50:21,530 --> 00:50:25,000
This calculation addresses
a basic problem, a basic

962
00:50:25,000 --> 00:50:30,230
philosophical problem, how one
can learn from experience or

963
00:50:30,230 --> 00:50:33,300
from experimental data in
some systematic way.

964
00:50:33,300 --> 00:50:35,840
So the British at that time
were preoccupied with this

965
00:50:35,840 --> 00:50:36,710
type of question.

966
00:50:36,710 --> 00:50:41,200
Is there a basic theory
about how we can incorporate

967
00:50:41,200 --> 00:50:44,280
new knowledge into previous
knowledge?

968
00:50:44,280 --> 00:50:47,600
And this calculation made an
argument that, yes, it is

969
00:50:47,600 --> 00:50:50,100
possible to do that in
a systematic way.

970
00:50:50,100 --> 00:50:53,040
So the philosophical
underpinnings of this have a

971
00:50:53,040 --> 00:50:57,050
very long history and a lot
of discussion around them.

972
00:50:57,050 --> 00:51:00,560
But for our purposes, it's just
an extremely useful tool.

973
00:51:00,560 --> 00:51:03,550
And it's the foundation of
almost everything that gets

974
00:51:03,550 --> 00:51:07,190
done when you try to do
inference based on partial

975
00:51:07,190 --> 00:51:08,860
observations.

976
00:51:08,860 --> 00:51:09,690
Very well.

977
00:51:09,690 --> 00:51:10,940
Till next time.

978
00:51:10,940 --> 00:51:11,760