1
00:00:00,000 --> 00:00:00,040

2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative

3
00:00:02,460 --> 00:00:03,870
Commons license.

4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to

5
00:00:06,910 --> 00:00:10,560
offer high-quality educational
resources for free.

6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from

7
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at

8
00:00:19,290 --> 00:00:20,540
ocw.mit.edu.

9
00:00:20,540 --> 00:00:23,050

10
00:00:23,050 --> 00:00:25,080
JOHN TSITSIKLIS:
OK, let's start.

11
00:00:25,080 --> 00:00:26,560
So we've had the quiz.

12
00:00:26,560 --> 00:00:29,760
And I guess there's both good
and bad news in it.

13
00:00:29,760 --> 00:00:31,590
As you saw yesterday, there's
the bad news.

14
00:00:31,590 --> 00:00:33,910
The average was a little
lower than what

15
00:00:33,910 --> 00:00:36,260
we would have wanted.

16
00:00:36,260 --> 00:00:39,580
On the other hand, the good news
is that the distribution

17
00:00:39,580 --> 00:00:41,770
was nicely spread.

18
00:00:41,770 --> 00:00:44,890
And the main purpose of
this quiz is basically for you

19
00:00:44,890 --> 00:00:48,260
to calibrate and see roughly
where you are standing.

20
00:00:48,260 --> 00:00:50,650
The other piece of the good
news is that, as you know,

21
00:00:50,650 --> 00:00:53,590
this quiz doesn't count for very
much in your final grade.

22
00:00:53,590 --> 00:00:58,230
So it's really a matter of
calibration and of getting your

23
00:00:58,230 --> 00:01:02,810
mindset right as you
prepare for the second quiz,

24
00:01:02,810 --> 00:01:04,470
which counts a lot more.

25
00:01:04,470 --> 00:01:06,370
And it's more substantial.

26
00:01:06,370 --> 00:01:08,810
And we'll make sure that
the second quiz will

27
00:01:08,810 --> 00:01:12,110
have a higher average.

28
00:01:12,110 --> 00:01:12,520
All right.

29
00:01:12,520 --> 00:01:15,410
So let's go to our material.

30
00:01:15,410 --> 00:01:18,190
These days we're
talking about

31
00:01:18,190 --> 00:01:20,440
continuous random variables.

32
00:01:20,440 --> 00:01:23,240
And I'll remind you what
we discussed last time.

33
00:01:23,240 --> 00:01:25,970
I'll remind you of the concept
of the probability density

34
00:01:25,970 --> 00:01:28,230
function of a single
random variable.

35
00:01:28,230 --> 00:01:31,090
And then we're going to rush
through all the concepts that

36
00:01:31,090 --> 00:01:34,230
we covered for the case of
discrete random variables and

37
00:01:34,230 --> 00:01:37,770
discuss their analogs for
the continuous case.

38
00:01:37,770 --> 00:01:40,410
We'll also talk about notions
such as conditioning,

39
00:01:40,410 --> 00:01:42,170
independence, and so on.

40
00:01:42,170 --> 00:01:46,420
So the big picture is here.

41
00:01:46,420 --> 00:01:49,590
We have all those concepts that
we developed for the case

42
00:01:49,590 --> 00:01:52,350
of discrete random variables.

43
00:01:52,350 --> 00:01:55,560
And now we will just talk about
their analogs in the

44
00:01:55,560 --> 00:01:56,840
continuous case.

45
00:01:56,840 --> 00:02:00,800
We already discussed this analog
last week, the density

46
00:02:00,800 --> 00:02:04,520
of a single random variable.

47
00:02:04,520 --> 00:02:08,570
Then there are certain concepts
that show up both in

48
00:02:08,570 --> 00:02:10,780
the discrete and the
continuous case.

49
00:02:10,780 --> 00:02:14,560
So we have the cumulative
distribution function, which

50
00:02:14,560 --> 00:02:18,070
is a description of the
probability distribution of a

51
00:02:18,070 --> 00:02:21,470
random variable and which
applies whether you have a

52
00:02:21,470 --> 00:02:23,780
discrete or continuous
random variable.

53
00:02:23,780 --> 00:02:26,500
Then there's the notion
of the expected value.

54
00:02:26,500 --> 00:02:29,990
And in the two cases, the
expected value is calculated

55
00:02:29,990 --> 00:02:32,990
in a slightly different way,
but not very different.

56
00:02:32,990 --> 00:02:36,080
We have sums in one case,
integrals in the other.

57
00:02:36,080 --> 00:02:37,720
And this is the general
pattern that

58
00:02:37,720 --> 00:02:39,030
we're going to have.

59
00:02:39,030 --> 00:02:42,120
Formulas for the discrete case
translate to corresponding

60
00:02:42,120 --> 00:02:44,920
formulas or expressions in
the continuous case.

61
00:02:44,920 --> 00:02:50,010
We generically replace sums by
integrals, and we replace mass

62
00:02:50,010 --> 00:02:54,230
functions with density
functions.

63
00:02:54,230 --> 00:02:58,330
Then the new pieces for today
are going to be mostly the

64
00:02:58,330 --> 00:03:01,570
notion of a joint density
function, which is how we

65
00:03:01,570 --> 00:03:04,330
describe the probability
distribution of two random

66
00:03:04,330 --> 00:03:08,370
variables that are somehow
related, in general, and then

67
00:03:08,370 --> 00:03:11,780
the notion of a conditional
density function that tells us

68
00:03:11,780 --> 00:03:15,160
the distribution of one random
variable X when you're told

69
00:03:15,160 --> 00:03:19,200
the value of another random
variable Y. There's another

70
00:03:19,200 --> 00:03:22,680
concept, which is the
conditional PDF given that a

71
00:03:22,680 --> 00:03:24,860
certain event has happened.

72
00:03:24,860 --> 00:03:27,420
This is a concept that's
in some ways simpler.

73
00:03:27,420 --> 00:03:31,360
You've already seen a little
bit of that in last week's

74
00:03:31,360 --> 00:03:33,140
recitation and tutorial.

75
00:03:33,140 --> 00:03:35,640
The idea is that we have a
single random variable.

76
00:03:35,640 --> 00:03:37,710
It's described by a density.

77
00:03:37,710 --> 00:03:41,110
Then you're told that a
certain event has occurred.

78
00:03:41,110 --> 00:03:42,880
Your model changes, as does
the universe that

79
00:03:42,880 --> 00:03:43,910
you are dealing with.

80
00:03:43,910 --> 00:03:46,640
In the new universe, you are
dealing with a new density

81
00:03:46,640 --> 00:03:51,310
function, the one that applies
given the knowledge that we

82
00:03:51,310 --> 00:03:55,700
have that this
event has occurred.

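The rule behind that new density can be sketched in a few lines of Python. For an event A with P(X in A) > 0, the conditional density is f_{X|A}(x) = f_X(x) / P(X in A) for x in A, and 0 outside A. The exponential density with rate 2 and the event A = {X > 1} are illustrative assumptions, not examples from the lecture.

```python
import math

LAM = 2.0  # illustrative rate of an exponential density (an assumption)

def pdf(x):
    """Unconditional density f_X(x) = LAM * exp(-LAM * x) for x >= 0."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0

def prob_greater(a):
    """P(X > a) for the exponential density above."""
    return math.exp(-LAM * a)

def conditional_pdf(x, a):
    """f_{X|A}(x) for A = {X > a}: rescale f_X by 1 / P(A) inside A, zero outside."""
    return pdf(x) / prob_greater(a) if x > a else 0.0

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# In the new universe A = {X > 1}, the conditional density still integrates to 1
# (the mass beyond 21 is negligible for this rate).
total = integrate(lambda x: conditional_pdf(x, 1.0), 1.0, 21.0)
print(total)
```

For this particular density, conditional_pdf(1.5, 1.0) equals pdf(0.5), which is the memoryless property of the exponential; other densities will generally not behave this way.
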
83
00:03:55,700 --> 00:03:56,160
All right.

84
00:03:56,160 --> 00:03:59,870
So what exactly did
we say about

85
00:03:59,870 --> 00:04:02,140
continuous random variables?

86
00:04:02,140 --> 00:04:05,020
The first thing is the
definition, that a random

87
00:04:05,020 --> 00:04:09,370
variable is said to be
continuous if we are given a

88
00:04:09,370 --> 00:04:12,220
certain object that we call
the probability density

89
00:04:12,220 --> 00:04:17,050
function and we can calculate
interval probabilities given

90
00:04:17,050 --> 00:04:18,709
this density function.

91
00:04:18,709 --> 00:04:21,589
So the definition is that the
random variable is continuous

92
00:04:21,589 --> 00:04:24,490
if you can calculate
probabilities associated with

93
00:04:24,490 --> 00:04:27,380
that random variable
using that formula.

94
00:04:27,380 --> 00:04:29,770
So this formula tells you that
the probability that your

95
00:04:29,770 --> 00:04:33,340
random variable falls inside
this interval is the area

96
00:04:33,340 --> 00:04:34,880
under the density curve.

97
00:04:34,880 --> 00:04:37,390

98
00:04:37,390 --> 00:04:37,700
OK.

99
00:04:37,700 --> 00:04:39,720
There are a few properties
that a density

100
00:04:39,720 --> 00:04:41,020
function must satisfy.

101
00:04:41,020 --> 00:04:42,900
Since we're talking about
probabilities, and

102
00:04:42,900 --> 00:04:45,890
probabilities are non-negative,
we have that the

103
00:04:45,890 --> 00:04:49,530
density function is always
a non-negative function.

104
00:04:49,530 --> 00:04:52,790
The total probability over
the entire real line

105
00:04:52,790 --> 00:04:54,690
must be equal to 1.

106
00:04:54,690 --> 00:04:58,070
So the integral when you
integrate over the entire real

107
00:04:58,070 --> 00:04:59,590
line has to be equal to 1.

108
00:04:59,590 --> 00:05:01,800
That's the second property.

109
00:05:01,800 --> 00:05:05,200
Another property that you get is
that if you set a equal to

110
00:05:05,200 --> 00:05:07,720
b, this integral becomes 0.

111
00:05:07,720 --> 00:05:11,390
And that tells you that the
probability of a single point

112
00:05:11,390 --> 00:05:15,990
in the continuous case
is always equal to 0.

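The three properties just listed (non-negativity, total area 1, and zero probability for a single point) can be checked numerically. This is a minimal Python sketch assuming an exponential density with rate 2, chosen only for illustration; it computes P(a <= X <= b) as the area under the density with a midpoint rule.

```python
import math

def exp_pdf(x, lam=2.0):
    """Exponential density lam * exp(-lam * x) on x >= 0, zero for x < 0 (illustrative choice)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def interval_prob(a, b, pdf=exp_pdf):
    """P(a <= X <= b): the area under the density between a and b."""
    return integrate(pdf, a, b)

# Non-negativity: spot-check the density on a grid that includes negative x.
nonneg = all(exp_pdf(k / 10) >= 0 for k in range(-50, 200))
# Normalization: the density is zero below 0, and the tail beyond 20 is negligible.
total = interval_prob(0.0, 20.0)
# An interval probability, compared with the exact value exp(-1) - exp(-2).
p = interval_prob(0.5, 1.0)
exact = math.exp(-1.0) - math.exp(-2.0)
# Setting a = b gives a zero-width interval, so a single point gets probability 0.
point = interval_prob(0.7, 0.7)
print(nonneg, total, p, exact, point)
```
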
113
00:05:15,990 --> 00:05:17,780
So these are formal
properties.

114
00:05:17,780 --> 00:05:21,290
When you want to think
intuitively, the best way to

115
00:05:21,290 --> 00:05:25,540
think about the density
function is in terms

116
00:05:25,540 --> 00:05:28,320
of little intervals: the
probability that my random

117
00:05:28,320 --> 00:05:31,540
variable falls inside
the little interval.

118
00:05:31,540 --> 00:05:35,170
Well, inside that little
interval, the density function

119
00:05:35,170 --> 00:05:36,940
here is roughly constant.

120
00:05:36,940 --> 00:05:42,430
So that integral becomes the
value of the density times the

121
00:05:42,430 --> 00:05:45,340
length of the interval over
which you are integrating,

122
00:05:45,340 --> 00:05:47,070
which is delta.

123
00:05:47,070 --> 00:05:50,240
And so the density function
basically gives us

124
00:05:50,240 --> 00:05:54,990
probabilities of little events,
of small events.

125
00:05:54,990 --> 00:05:59,200
And the density is to be
interpreted as probability per

126
00:05:59,200 --> 00:06:02,290
unit length at a certain
place in the diagram.

127
00:06:02,290 --> 00:06:04,800
So in that place in the diagram,
the probability per

128
00:06:04,800 --> 00:06:07,870
unit length around this
neighborhood would be the

129
00:06:07,870 --> 00:06:12,320
height of the density function
at that point.

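The "probability per unit length" reading can be seen numerically: the exact probability of [x, x + delta], divided by delta, approaches the density height f(x) as delta shrinks. This sketch again assumes an exponential density with rate 2, purely for illustration, and uses its CDF 1 - exp(-2x) to get exact interval probabilities.

```python
import math

LAM = 2.0  # illustrative exponential rate (an assumption, not from the lecture)

def pdf(x):
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0

def cdf(x):
    """P(X <= x) for the exponential density, used for exact interval probabilities."""
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0

def prob_per_unit_length(x, delta):
    """Exact P(x <= X <= x + delta) divided by the interval length delta."""
    return (cdf(x + delta) - cdf(x)) / delta

x = 0.3
for delta in (0.1, 0.01, 0.001):
    # As delta shrinks, this ratio gets closer to the density height f(x).
    print(delta, prob_per_unit_length(x, delta), pdf(x))
```
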
130
00:06:12,320 --> 00:06:13,270
What else?

131
00:06:13,270 --> 00:06:16,440
We have a formula for
calculating expected values of

132
00:06:16,440 --> 00:06:17,980
functions of random variables.

133
00:06:17,980 --> 00:06:21,310
In the discrete case, we had the
formula where here we had

134
00:06:21,310 --> 00:06:25,430
the sum, and instead of the
density, we had the PMF.

135
00:06:25,430 --> 00:06:29,188
The same formula is also valid
in the continuous case.

136
00:06:29,188 --> 00:06:35,120
And it's not too hard to derive,
but we will not do it.

137
00:06:35,120 --> 00:06:36,910
But let's think of the
intuition of what

138
00:06:36,910 --> 00:06:38,420
this formula says.

139
00:06:38,420 --> 00:06:41,670
You're trying to figure out on
the average how much g(X) is

140
00:06:41,670 --> 00:06:42,780
going to be.

141
00:06:42,780 --> 00:06:47,130
And then you reason, and you
say, well, X may turn out to

142
00:06:47,130 --> 00:06:52,560
take a particular value or a
small interval of values.

143
00:06:52,560 --> 00:06:54,780
This is the probability
that X falls

144
00:06:54,780 --> 00:06:56,640
inside the small interval.

145
00:06:56,640 --> 00:07:00,310
And when that happens, g(X)
takes that value.

146
00:07:00,310 --> 00:07:03,930
So this fraction of the time,
you fall in the little

147
00:07:03,930 --> 00:07:07,350
neighborhood of x, and
you get so much.

148
00:07:07,350 --> 00:07:10,860
Then you average over all the
possible x's that can happen.

149
00:07:10,860 --> 00:07:13,930
And that gives you the average
value of the function g(X).

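The expected value rule E[g(X)] = integral of g(x) f_X(x) dx can be compared against the "fraction of the time" intuition by simulation. In this sketch X is assumed, for illustration only, to be exponential with rate 2 and g(x) = x^2, so the exact answer is 2 / 2^2 = 0.5.

```python
import math
import random

LAM = 2.0  # illustrative: X is exponential with rate 2 (an assumption)

def pdf(x):
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0

def expected_value(g, lo=0.0, hi=20.0, n=200_000):
    """E[g(X)] as the integral of g(x) * f_X(x), via the midpoint rule on [lo, hi].
    For this density, [0, 20] carries essentially all of the probability."""
    h = (hi - lo) / n
    return h * sum(g(x) * pdf(x) for x in (lo + (i + 0.5) * h for i in range(n)))

def monte_carlo(g, samples=200_000, seed=1):
    """Average of g over simulated values of X."""
    rng = random.Random(seed)
    return sum(g(rng.expovariate(LAM)) for _ in range(samples)) / samples

def square(x):
    return x * x

print(expected_value(square), monte_carlo(square), 2 / LAM**2)
```

Both estimates should be close to 0.5; the simulated one carries sampling noise that shrinks as `samples` grows.
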
150
00:07:13,930 --> 00:07:17,730

151
00:07:17,730 --> 00:07:18,045
OK.

152
00:07:18,045 --> 00:07:20,650
So this is the easy stuff.

153
00:07:20,650 --> 00:07:23,690
Now let's get to the
new material.

154
00:07:23,690 --> 00:07:26,330
We want to talk about multiple
random variables

155
00:07:26,330 --> 00:07:27,320
simultaneously.

156
00:07:27,320 --> 00:07:31,530
So we want to talk now about two
random variables that are

157
00:07:31,530 --> 00:07:35,020
continuous and, in some sense,
jointly

158
00:07:35,020 --> 00:07:35,840
continuous.

159
00:07:35,840 --> 00:07:38,080
And let's see what this means.

160
00:07:38,080 --> 00:07:40,840
The definition is similar to
the definition we had for a

161
00:07:40,840 --> 00:07:44,850
single random variable, where
I take this formula here as

162
00:07:44,850 --> 00:07:49,510
the definition of continuous
random variables.

163
00:07:49,510 --> 00:07:53,830
Two random variables are said to
be jointly continuous if we

164
00:07:53,830 --> 00:07:58,190
can calculate probabilities by
integrating a certain function

165
00:07:58,190 --> 00:08:01,070
that we call the joint
density function

166
00:08:01,070 --> 00:08:03,310
over the set of interest.

167
00:08:03,310 --> 00:08:08,690
So we have our two-dimensional
plane.

168
00:08:08,690 --> 00:08:10,900
This is the x-y plane.

169
00:08:10,900 --> 00:08:13,810
There's a certain event S that
we're interested in.

170
00:08:13,810 --> 00:08:15,860
We want to calculate
the probability.

171
00:08:15,860 --> 00:08:17,370
How do we do that?

172
00:08:17,370 --> 00:08:22,660
We are given this function
f_(X,Y), the joint density.

173
00:08:22,660 --> 00:08:25,910
It's a function of the two
arguments x and y.

174
00:08:25,910 --> 00:08:29,530
So think of that function as
being some kind of surface

175
00:08:29,530 --> 00:08:34,809
that sits on top of the
two-dimensional plane.

176
00:08:34,809 --> 00:08:39,140
The probability of falling
inside the set S, we calculate

177
00:08:39,140 --> 00:08:45,350
it by looking at the volume
under the surface, that volume

178
00:08:45,350 --> 00:08:50,470
that sits on top of S. The
region under the surface has a

179
00:08:50,470 --> 00:08:52,010
certain total volume.

180
00:08:52,010 --> 00:08:54,650
What should that total
volume be?

181
00:08:54,650 --> 00:08:57,050
Well, we think of these volumes
as probabilities.

182
00:08:57,050 --> 00:09:00,180
So the total probability
should be equal to 1.

183
00:09:00,180 --> 00:09:05,430
The total volume under this
surface should be equal to 1.

184
00:09:05,430 --> 00:09:08,220
So that's one property
that we want our

185
00:09:08,220 --> 00:09:10,138
density function to have.

186
00:09:10,138 --> 00:09:16,080

187
00:09:16,080 --> 00:09:20,500
So when you integrate over the
entire space, this is the

188
00:09:20,500 --> 00:09:22,400
volume under your surface.

189
00:09:22,400 --> 00:09:24,090
That should be equal to 1.

190
00:09:24,090 --> 00:09:27,280
Of course, since we're talking
about probabilities, the joint

191
00:09:27,280 --> 00:09:29,560
density should be a non-negative
function.

192
00:09:29,560 --> 00:09:34,140
So think of the situation
as having one pound of

193
00:09:34,140 --> 00:09:38,230
probability that's spread
all over your space.

194
00:09:38,230 --> 00:09:41,430
And the height of this joint
density function basically

195
00:09:41,430 --> 00:09:45,470
tells you how much probability
tends to be accumulated in

196
00:09:45,470 --> 00:09:48,400
certain regions of space
as opposed to other

197
00:09:48,400 --> 00:09:49,870
parts of the space.

198
00:09:49,870 --> 00:09:53,130
So wherever the density is big,
that means that this is

199
00:09:53,130 --> 00:09:54,920
an area of the two-dimensional
plane that's

200
00:09:54,920 --> 00:09:56,340
more likely to occur.

201
00:09:56,340 --> 00:09:59,160
Where the density is small, that
means that those x-y's

202
00:09:59,160 --> 00:10:01,100
are less likely to occur.

203
00:10:01,100 --> 00:10:03,070
You have already seen
one example

204
00:10:03,070 --> 00:10:06,050
of a joint density.

205
00:10:06,050 --> 00:10:08,730
That was the example we had in
the very beginning of the

206
00:10:08,730 --> 00:10:10,700
class with a uniform

207
00:10:10,700 --> 00:10:13,380
distribution on the unit square.

208
00:10:13,380 --> 00:10:15,510
That was a special
case of a density

209
00:10:15,510 --> 00:10:17,250
function that was constant.

210
00:10:17,250 --> 00:10:20,090
So all places in the unit square
were just as

211
00:10:20,090 --> 00:10:22,010
likely as any other places.

212
00:10:22,010 --> 00:10:25,580
But in other models, some parts
of the space may be more

213
00:10:25,580 --> 00:10:27,000
likely than others.

214
00:10:27,000 --> 00:10:29,470
And we describe those relative
likelihoods using

215
00:10:29,470 --> 00:10:31,120
this density function.

216
00:10:31,120 --> 00:10:33,420
So if somebody gives us the
density function, this

217
00:10:33,420 --> 00:10:38,480
determines for us probabilities
of all the

218
00:10:38,480 --> 00:10:41,520
subsets of the two-dimensional
plane.

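"Probability equals volume over S" can be sketched numerically. This example assumes an illustrative joint density f(x, y) = x + y on the unit square (zero elsewhere), which is not from the lecture. It checks that the total volume is 1 and computes P(X + Y <= 1), which works out to 1/3 by hand.

```python
def joint_pdf(x, y):
    """Illustrative joint density f(x, y) = x + y on the unit square, 0 elsewhere."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return x + y
    return 0.0

def midpoints(lo, hi, n):
    """Midpoints of n equal subintervals of [lo, hi], plus the subinterval width."""
    h = (hi - lo) / n
    return [lo + (i + 0.5) * h for i in range(n)], h

def prob_of_region(pdf, x_lo, x_hi, y_lo, y_hi, n=400):
    """Volume under pdf over S = {x_lo <= x <= x_hi, y_lo(x) <= y <= y_hi(x)},
    computed as an iterated midpoint-rule integral (inner over y, outer over x)."""
    xs, hx = midpoints(x_lo, x_hi, n)
    total = 0.0
    for x in xs:
        ys, hy = midpoints(y_lo(x), y_hi(x), n)
        total += hx * hy * sum(pdf(x, y) for y in ys)
    return total

total = prob_of_region(joint_pdf, 0.0, 1.0, lambda x: 0.0, lambda x: 1.0)
p_below_diag = prob_of_region(joint_pdf, 0.0, 1.0, lambda x: 0.0, lambda x: 1.0 - x)
print(total, p_below_diag)
```
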
219
00:10:41,520 --> 00:10:45,710
Now for an intuitive
interpretation, it's good to

220
00:10:45,710 --> 00:10:47,460
think about small events.

221
00:10:47,460 --> 00:10:51,220
So let's take a particular x
here and then x plus delta.

222
00:10:51,220 --> 00:10:53,020
So this is a small interval.

223
00:10:53,020 --> 00:10:56,190
Take another small interval
here that goes from y to y

224
00:10:56,190 --> 00:10:57,560
plus delta.

225
00:10:57,560 --> 00:11:03,270
And let's look at the event that
X falls here and Y falls

226
00:11:03,270 --> 00:11:04,780
right there.

227
00:11:04,780 --> 00:11:05,780
What is this event?

228
00:11:05,780 --> 00:11:07,760
Well, this is the event
that (X,Y) will fall

229
00:11:07,760 --> 00:11:11,030
inside this little rectangle.

230
00:11:11,030 --> 00:11:15,820
Using this rule for calculating
probabilities,

231
00:11:15,820 --> 00:11:19,040
what is the probability of that
rectangle?

232
00:11:19,040 --> 00:11:23,130
Well, it should be the integral
of the density over

233
00:11:23,130 --> 00:11:24,300
this rectangle.

234
00:11:24,300 --> 00:11:29,720
Or it's the volume under the
surface that sits on top of

235
00:11:29,720 --> 00:11:31,010
that rectangle.

236
00:11:31,010 --> 00:11:34,300
Now, if the rectangle is very
small, the joint density is

237
00:11:34,300 --> 00:11:36,760
not going to change very much
in that neighborhood.

238
00:11:36,760 --> 00:11:38,770
So we can treat it
as a constant.

239
00:11:38,770 --> 00:11:42,350
So the volume is going to
be the height times

240
00:11:42,350 --> 00:11:44,030
the area of the base.

241
00:11:44,030 --> 00:11:47,150
The height at that point is
whatever the function happens

242
00:11:47,150 --> 00:11:49,460
to be around that point.

243
00:11:49,460 --> 00:11:52,590
And the area of the base
is delta squared.

244
00:11:52,590 --> 00:11:58,750
So this is the intuitive way
to understand what a joint

245
00:11:58,750 --> 00:12:01,070
density function really
tells you.

246
00:12:01,070 --> 00:12:04,200
It specifies for you
probabilities of little

247
00:12:04,200 --> 00:12:08,500
squares, of little rectangles.

248
00:12:08,500 --> 00:12:11,880
And it allows you to think of
the joint density function as

249
00:12:11,880 --> 00:12:15,310
probability per unit area.

250
00:12:15,310 --> 00:12:18,790
So these are the units of the
density: probability per

251
00:12:18,790 --> 00:12:23,800
unit area in the neighborhood
of a certain point.

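The small-rectangle interpretation can be checked exactly for a simple density. This sketch assumes the illustrative joint density f(x, y) = x + y on the unit square, whose joint CDF is (x^2 y + x y^2) / 2. The rectangle probability then equals delta^2 (x + y + delta), so the probability per unit area tends to f(x, y) as delta shrinks.

```python
def joint_pdf(x, y):
    """Illustrative joint density f(x, y) = x + y on the unit square (an assumption)."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def joint_cdf(x, y):
    """P(X <= x, Y <= y) for 0 <= x, y <= 1: the integral of (u + v) over [0, x] x [0, y]."""
    return (x * x * y + x * y * y) / 2

def rectangle_prob(x, y, delta):
    """Exact P(x <= X <= x + delta, y <= Y <= y + delta) by inclusion-exclusion on the CDF."""
    return (joint_cdf(x + delta, y + delta) - joint_cdf(x, y + delta)
            - joint_cdf(x + delta, y) + joint_cdf(x, y))

for delta in (0.1, 0.01, 0.001):
    # Probability per unit area around (0.2, 0.5); for this density it equals f + delta.
    print(delta, rectangle_prob(0.2, 0.5, delta) / delta**2, joint_pdf(0.2, 0.5))
```
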
252
00:12:23,800 --> 00:12:26,970
So what do we do with this
density function once we have

253
00:12:26,970 --> 00:12:28,410
it in our hands?

254
00:12:28,410 --> 00:12:32,640
Well, we can use it to calculate
expected values.

255
00:12:32,640 --> 00:12:34,880
Suppose that you have a
function of two random

256
00:12:34,880 --> 00:12:38,040
variables described by
a joint density.

257
00:12:38,040 --> 00:12:41,580
You can find, perhaps, the
distribution of this random

258
00:12:41,580 --> 00:12:45,330
variable and then use the
basic definition of the

259
00:12:45,330 --> 00:12:46,150
expectation.

260
00:12:46,150 --> 00:12:49,260
Or you can calculate
expectations directly, using

261
00:12:49,260 --> 00:12:52,010
the distribution of the original
random variables.

262
00:12:52,010 --> 00:12:55,280
This is a formula that's again
identical to the formula that

263
00:12:55,280 --> 00:12:57,290
we had for the discrete case.

264
00:12:57,290 --> 00:12:59,500
In the discrete case,
we had a double sum

265
00:12:59,500 --> 00:13:02,590
here, and we had PMFs.

266
00:13:02,590 --> 00:13:06,290
So the intuition behind this
formula is the same as the one we

267
00:13:06,290 --> 00:13:08,220
had for the discrete case.

268
00:13:08,220 --> 00:13:12,550
It's just that the mechanics
are different.

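The two-variable expected value rule, E[g(X, Y)] = double integral of g(x, y) f_{X,Y}(x, y), can be checked against simulation. This sketch assumes the illustrative density f(x, y) = x + y on the unit square and g(x, y) = xy, for which E[XY] = 1/3 by hand. Samples are drawn by rejection sampling, a technique chosen here for convenience and not discussed in the lecture.

```python
import random

def joint_pdf(x, y):
    """Illustrative joint density f(x, y) = x + y on the unit square (an assumption)."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def expected_value_2d(g, n=400):
    """E[g(X, Y)] as a double integral of g * f, midpoint rule on the unit square."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return h * h * sum(g(x, y) * joint_pdf(x, y) for x in pts for y in pts)

def sample_xy(rng):
    """Rejection sampling: propose uniform (x, y), accept with probability f(x, y) / 2,
    since 2 is the largest value this density takes."""
    while True:
        x, y, u = rng.random(), rng.random(), rng.random()
        if u <= joint_pdf(x, y) / 2:
            return x, y

def monte_carlo_2d(g, samples=100_000, seed=2):
    rng = random.Random(seed)
    return sum(g(*sample_xy(rng)) for _ in range(samples)) / samples

def product(x, y):
    return x * y

print(expected_value_2d(product), monte_carlo_2d(product))
```

Both printed values should be close to the exact value 1/3.
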
269
00:13:12,550 --> 00:13:16,220
Then something that we did in
the discrete case was to find

270
00:13:16,220 --> 00:13:21,510
a way to go from the joint
PMF of the two random

271
00:13:21,510 --> 00:13:25,750
variables taken together to the
PMF of just one of the

272
00:13:25,750 --> 00:13:28,190
random variables.

273
00:13:28,190 --> 00:13:30,570
So we had a formula for
the discrete case.

274
00:13:30,570 --> 00:13:33,450
Let's see how things are
going to work out in

275
00:13:33,450 --> 00:13:35,800
the continuous case.

276
00:13:35,800 --> 00:13:40,560
So in the continuous
case, we have here

277
00:13:40,560 --> 00:13:42,330
our two random variables.

278
00:13:42,330 --> 00:13:45,030
And we have a joint density
for them.

279
00:13:45,030 --> 00:13:48,340
And let's say that we want to
calculate the probability that

280
00:13:48,340 --> 00:13:51,570
X falls inside this interval.

281
00:13:51,570 --> 00:13:53,510
So we're looking at the
probability that our random

282
00:13:53,510 --> 00:13:58,380
variable X falls in the interval
from little x to x

283
00:13:58,380 --> 00:13:59,630
plus delta.

284
00:13:59,630 --> 00:14:02,130

285
00:14:02,130 --> 00:14:08,260
Now, by the properties that we
already have for interpreting

286
00:14:08,260 --> 00:14:11,460
the density function of a single
random variable, the

287
00:14:11,460 --> 00:14:14,100
probability of a little interval
is approximately the

288
00:14:14,100 --> 00:14:18,750
density of that single random
variable times delta.

289
00:14:18,750 --> 00:14:22,120
And now we want to find a
formula for this marginal

290
00:14:22,120 --> 00:14:26,540
density in terms of
the joint density.

291
00:14:26,540 --> 00:14:26,890
OK.

292
00:14:26,890 --> 00:14:28,930
So this is the probability
that X

293
00:14:28,930 --> 00:14:30,970
falls inside this interval.

294
00:14:30,970 --> 00:14:34,070
In terms of the two-dimensional
plane, this is

295
00:14:34,070 --> 00:14:40,030
the probability that (X,Y)
falls inside this strip.

296
00:14:40,030 --> 00:14:44,520
So to find that probability,
we need to calculate the

297
00:14:44,520 --> 00:14:48,530
probability that (X,Y) falls in
here, which is going to be

298
00:14:48,530 --> 00:14:55,780
the double integral, over
this strip, of

299
00:14:55,780 --> 00:14:57,030
the joint density.

300
00:14:57,030 --> 00:15:05,080

301
00:15:05,080 --> 00:15:07,920
And what are we integrating
over?

302
00:15:07,920 --> 00:15:11,185
y goes from minus infinity
to plus infinity.

303
00:15:11,185 --> 00:15:15,680

304
00:15:15,680 --> 00:15:22,755
And the dummy variable x goes
from little x to x plus delta.

305
00:15:22,755 --> 00:15:27,240

306
00:15:27,240 --> 00:15:31,580
So to integrate over this strip,
what we do is for any

307
00:15:31,580 --> 00:15:34,810
given y, we integrate
in this dimension.

308
00:15:34,810 --> 00:15:36,770
This is the x integral.

309
00:15:36,770 --> 00:15:40,220
And then we integrate over
the y dimension.

310
00:15:40,220 --> 00:15:42,920
Now what is this
inner integral?

311
00:15:42,920 --> 00:15:50,250
Because x only varies very
little, this is approximately

312
00:15:50,250 --> 00:15:53,040
constant in that range.

313
00:15:53,040 --> 00:15:56,210
So the integral with
respect to x just

314
00:15:56,210 --> 00:15:58,840
becomes delta times f(x,y).

315
00:15:58,840 --> 00:16:02,010

316
00:16:02,010 --> 00:16:03,490
And then we've got our dy.

317
00:16:03,490 --> 00:16:06,930

318
00:16:06,930 --> 00:16:11,760
So this is what the inner
integral will evaluate to.

319
00:16:11,760 --> 00:16:15,280
We are integrating over
the little interval.

320
00:16:15,280 --> 00:16:17,450
So we're keeping y fixed.

321
00:16:17,450 --> 00:16:22,020
Integrating over here, we take
the value of the density times

322
00:16:22,020 --> 00:16:24,940
how much we're integrating
over.

323
00:16:24,940 --> 00:16:27,890
And we get this formula.

324
00:16:27,890 --> 00:16:28,410
OK.

325
00:16:28,410 --> 00:16:33,170
Now, this expression must be
equal to that expression.

326
00:16:33,170 --> 00:16:40,060
So if we cancel the deltas, we
see that the marginal density

327
00:16:40,060 --> 00:16:44,000
must be equal to the integral of
the joint density, where we

328
00:16:44,000 --> 00:16:48,200
have integrated out
the value of y.

329
00:16:48,200 --> 00:16:54,060

330
00:16:54,060 --> 00:16:59,000
So this formula should come as
no surprise at this point.

331
00:16:59,000 --> 00:17:01,380
It's exactly the same as the
formula that we had for

332
00:17:01,380 --> 00:17:03,270
discrete random variables.

333
00:17:03,270 --> 00:17:06,800
But now we are replacing the
sum with an integral.

334
00:17:06,800 --> 00:17:14,690
And instead of using the
joint PMF, we are

335
00:17:14,690 --> 00:17:18,480
using the joint PDF.

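The marginal formula f_X(x) = integral of f_{X,Y}(x, y) dy can be sketched by integrating y out numerically. This example assumes the illustrative joint density f(x, y) = x + y on the unit square, for which f_X(x) = x + 1/2 on [0, 1] by hand.

```python
def joint_pdf(x, y):
    """Illustrative joint density f(x, y) = x + y on the unit square (an assumption)."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def marginal_x(x, n=1_000):
    """f_X(x): integrate the joint density over y with the midpoint rule.
    The density is zero outside [0, 1], so [0, 1] covers the whole real line here."""
    h = 1.0 / n
    return h * sum(joint_pdf(x, (j + 0.5) * h) for j in range(n))

for x in (0.0, 0.25, 0.5, 1.0):
    # Compare with the hand-computed marginal x + 1/2.
    print(x, marginal_x(x), x + 0.5)

# The marginal is itself a density: it integrates to 1 over [0, 1].
total = sum(marginal_x((i + 0.5) / 200) for i in range(200)) / 200
print(total)
```
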
336
00:17:18,480 --> 00:17:21,810
Then, continuing going down the
list of things we did for

337
00:17:21,810 --> 00:17:24,839
discrete random variables, we
can now introduce a definition

338
00:17:24,839 --> 00:17:28,310
of the notion of independence
of two random variables.

339
00:17:28,310 --> 00:17:31,050
And by analogy with the discrete
case, we define

340
00:17:31,050 --> 00:17:33,940
independence to be the
following condition.

341
00:17:33,940 --> 00:17:37,210
Two random variables are
independent if and only if

342
00:17:37,210 --> 00:17:42,220
their joint density function
factors out as a product of

343
00:17:42,220 --> 00:17:44,390
their marginal densities.

344
00:17:44,390 --> 00:17:48,000
And this property needs to
be true for all x and y.

345
00:17:48,000 --> 00:17:49,890
So this is the formal
definition.

346
00:17:49,890 --> 00:17:53,020
Operationally and intuitively,
what does it mean?

347
00:17:53,020 --> 00:17:55,110
Well, intuitively it means
the same thing as in

348
00:17:55,110 --> 00:17:56,600
the discrete case.

349
00:17:56,600 --> 00:18:00,610
Knowing anything about X
shouldn't tell you anything

350
00:18:00,610 --> 00:18:05,320
about Y. That is, information
about X is not going to change

351
00:18:05,320 --> 00:18:10,120
your beliefs about Y. We are
going to come back to this

352
00:18:10,120 --> 00:18:11,370
statement in a second.

353
00:18:11,370 --> 00:18:14,320

354
00:18:14,320 --> 00:18:16,920
The other thing that it
allows you to do--

355
00:18:16,920 --> 00:18:20,750
I'm not going to derive this--
is that it allows you to calculate

356
00:18:20,750 --> 00:18:25,650
probabilities by multiplying
individual probabilities.

357
00:18:25,650 --> 00:18:28,110
So if you ask for the
probability that X falls in a

358
00:18:28,110 --> 00:18:34,220
certain set A and Y falls in a
certain set B, then you can

359
00:18:34,220 --> 00:18:37,670
calculate that probability
by multiplying individual

360
00:18:37,670 --> 00:18:38,920
probabilities.

361
00:18:38,920 --> 00:18:41,860

362
00:18:41,860 --> 00:18:46,090
This takes just two lines of
derivation, which I'm not

363
00:18:46,090 --> 00:18:47,710
going to do.

364
00:18:47,710 --> 00:18:51,240
But it comes back to
the usual notion of

365
00:18:51,240 --> 00:18:53,370
independence of events.

366
00:18:53,370 --> 00:18:56,340
Basically, operationally,
independence means that you

367
00:18:56,340 --> 00:18:57,660
can multiply probabilities.

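The factorization test for independence can be probed numerically: compute both marginals, then compare f_{X,Y}(x, y) with f_X(x) f_Y(y) on a grid. Both densities below are illustrative assumptions on the unit square: 4xy factors as (2x)(2y), while x + y does not. A clear gap at any grid point rules out independence. A near-zero gap on a finite grid is only consistent with independence and does not prove it.

```python
def dependent_pdf(x, y):
    """Illustrative f(x, y) = x + y on the unit square: not a product of its marginals."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def independent_pdf(x, y):
    """Illustrative f(x, y) = 4xy on the unit square: marginals 2x and 2y."""
    return 4 * x * y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def marginals(joint, n=200):
    """Return f_X and f_Y, each obtained by integrating out the other variable (midpoint rule)."""
    h = 1.0 / n
    mids = [(k + 0.5) * h for k in range(n)]
    f_x = lambda x: h * sum(joint(x, y) for y in mids)
    f_y = lambda y: h * sum(joint(x, y) for x in mids)
    return f_x, f_y

def max_factorization_gap(joint, grid=20):
    """Largest |f(x, y) - f_X(x) f_Y(y)| over a grid of points in the unit square."""
    f_x, f_y = marginals(joint)
    pts = [(k + 0.5) / grid for k in range(grid)]
    fx = {x: f_x(x) for x in pts}
    fy = {y: f_y(y) for y in pts}
    return max(abs(joint(x, y) - fx[x] * fy[y]) for x in pts for y in pts)

print(max_factorization_gap(independent_pdf), max_factorization_gap(dependent_pdf))
```
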
368
00:18:57,660 --> 00:19:00,190

369
00:19:00,190 --> 00:19:04,380
So now let's look
at an example.

370
00:19:04,380 --> 00:19:08,150
It's a pretty famous
and classical one.

371
00:19:08,150 --> 00:19:12,540
It goes back a lot more
than 100 years.

372
00:19:12,540 --> 00:19:16,290
And it's the famous
Buffon's needle problem.

373
00:19:16,290 --> 00:19:19,860
Buffon was a French naturalist
who, for some reason, also

374
00:19:19,860 --> 00:19:22,150
decided to play with
probability.

375
00:19:22,150 --> 00:19:24,590
He looked at the following
problem.

376
00:19:24,590 --> 00:19:28,400
So you have the two-dimensional
plane.

377
00:19:28,400 --> 00:19:33,870
And on the plane we draw a
bunch of parallel lines.

378
00:19:33,870 --> 00:19:37,575
And those parallel lines are
separated by a length.

379
00:19:37,575 --> 00:19:46,830

380
00:19:46,830 --> 00:19:52,270
And the lines are a
distance d apart.

381
00:19:52,270 --> 00:19:58,780
And we throw a needle at random,
completely at random.

382
00:19:58,780 --> 00:20:01,510
And we'll have to give a meaning
to what "completely at

383
00:20:01,510 --> 00:20:03,180
random" means.

384
00:20:03,180 --> 00:20:06,490
And when we throw a needle,
there are two possibilities.

385
00:20:06,490 --> 00:20:09,640
Either the needle is going to
fall in a way that does not

386
00:20:09,640 --> 00:20:13,120
intersect any of the lines, or
it's going to fall in a way

387
00:20:13,120 --> 00:20:15,700
that it intersects
one of the lines.

388
00:20:15,700 --> 00:20:19,470
We're taking the needle to be
shorter than this distance, so

389
00:20:19,470 --> 00:20:22,185
the needle cannot intersect
two lines simultaneously.

390
00:20:22,185 --> 00:20:26,230
It either intersects no lines, or it
intersects exactly one of the lines.

391
00:20:26,230 --> 00:20:29,610
The question is to find the
probability that the needle is

392
00:20:29,610 --> 00:20:32,100
going to intersect a line.

393
00:20:32,100 --> 00:20:34,650
What's the probability
of this?

394
00:20:34,650 --> 00:20:35,010
OK.

395
00:20:35,010 --> 00:20:40,020
We are going to approach this
problem by using our standard

396
00:20:40,020 --> 00:20:42,110
four-step procedure.

397
00:20:42,110 --> 00:20:46,560
Set up your sample space,
describe a probability law on

398
00:20:46,560 --> 00:20:51,460
that sample space, identify
the event of interest, and

399
00:20:51,460 --> 00:20:53,370
then calculate.

400
00:20:53,370 --> 00:20:58,470
These four steps basically
correspond to these three

401
00:20:58,470 --> 00:21:04,110
bullets and then the last
equation down here.

402
00:21:04,110 --> 00:21:06,510
So first thing is to set
up a sample space.

403
00:21:06,510 --> 00:21:09,470
We need some variables to
describe what happened in the

404
00:21:09,470 --> 00:21:10,780
experiment.

405
00:21:10,780 --> 00:21:14,300
So what happens in the
experiment is that the needle

406
00:21:14,300 --> 00:21:16,500
lands somewhere.

407
00:21:16,500 --> 00:21:20,450
And where it lands, we can
describe this by specifying

408
00:21:20,450 --> 00:21:24,160
the location of the center
of the needle.

409
00:21:24,160 --> 00:21:27,020
And what do we mean by the
location of the center?

410
00:21:27,020 --> 00:21:30,310
Well, we can take our
variable to be the distance

411
00:21:30,310 --> 00:21:33,035
from the center of the needle
to the nearest line.

412
00:21:33,035 --> 00:21:36,280

413
00:21:36,280 --> 00:21:42,520
So it tells us the vertical
distance of the center of the

414
00:21:42,520 --> 00:21:45,930
needle from the nearest line.

415
00:21:45,930 --> 00:21:47,500
The other thing that
matters is the

416
00:21:47,500 --> 00:21:49,400
orientation of the needle.

417
00:21:49,400 --> 00:21:53,820
So we need one more variable,
which we take to be the angle

418
00:21:53,820 --> 00:21:56,940
that the needle is forming
with the lines.

419
00:21:56,940 --> 00:22:00,260
We can put the angle here,
or we can put it there.

420
00:22:00,260 --> 00:22:02,620
Either way, it's still the
same angle.

421
00:22:02,620 --> 00:22:06,850
So we have these two variables
that describe what happened

422
00:22:06,850 --> 00:22:08,190
in the experiment.

423
00:22:08,190 --> 00:22:11,280
And we can take our sample space
to be the set of all

424
00:22:11,280 --> 00:22:14,390
possible x's and theta's.

425
00:22:14,390 --> 00:22:16,770
What are the possible x's?

426
00:22:16,770 --> 00:22:20,800
The lines are d apart, so the
nearest line is going to be

427
00:22:20,800 --> 00:22:24,400
anywhere between
0 and d/2 away.

428
00:22:24,400 --> 00:22:28,630
So that tells us what the
possible x's will be.

429
00:22:28,630 --> 00:22:31,420
As for theta, it really
depends on how

430
00:22:31,420 --> 00:22:33,230
you define your angle.

431
00:22:33,230 --> 00:22:37,510
We are going to define our theta
to be the acute angle

432
00:22:37,510 --> 00:22:44,020
that's formed between the needle
and a line, if you were

433
00:22:44,020 --> 00:22:45,130
to extend it.

434
00:22:45,130 --> 00:22:50,180
So theta is going to be
something between 0 and pi/2.

435
00:22:50,180 --> 00:22:54,140
So I guess these red pieces
really correspond to the part

436
00:22:54,140 --> 00:22:58,490
of setting up the
sample space.

437
00:22:58,490 --> 00:22:58,810
OK.

438
00:22:58,810 --> 00:23:00,270
So that's part one.

439
00:23:00,270 --> 00:23:03,390
Second part is we
need a model.

440
00:23:03,390 --> 00:23:03,690
OK.

441
00:23:03,690 --> 00:23:08,140
Let's take our model to be that
we basically know nothing

442
00:23:08,140 --> 00:23:10,600
about how the needle falls.

443
00:23:10,600 --> 00:23:13,890
It can fall in any possible way,
and all possible ways are

444
00:23:13,890 --> 00:23:15,230
equally likely.

445
00:23:15,230 --> 00:23:18,910
Now, if you have those parallel
lines, and you close

446
00:23:18,910 --> 00:23:22,330
your eyes completely and throw a
needle completely at random,

447
00:23:22,330 --> 00:23:25,260
any x should be equally
likely.

448
00:23:25,260 --> 00:23:29,490
So we describe that situation by
saying that X should have a

449
00:23:29,490 --> 00:23:31,360
uniform distribution.

450
00:23:31,360 --> 00:23:33,880
That is, it should have a
constant density over the

451
00:23:33,880 --> 00:23:35,410
range of interest.

452
00:23:35,410 --> 00:23:39,160
Similarly, if you kind of spin
your needle completely at

453
00:23:39,160 --> 00:23:43,580
random, any angle should be as
likely as any other angle.

454
00:23:43,580 --> 00:23:47,160
And we decide to model this
situation by saying that theta

455
00:23:47,160 --> 00:23:49,680
also has a uniform
distribution over

456
00:23:49,680 --> 00:23:50,995
the range of interest.

457
00:23:50,995 --> 00:23:54,220

458
00:23:54,220 --> 00:23:58,500
And finally, where we put it
should have nothing to do with

459
00:23:58,500 --> 00:24:00,370
how much we rotate it.

460
00:24:00,370 --> 00:24:04,320
And we capture this
mathematically by saying that

461
00:24:04,320 --> 00:24:07,480
X is going to be independent
of theta.

462
00:24:07,480 --> 00:24:09,220
Now, this is going
to be our model.

463
00:24:09,220 --> 00:24:11,920
I'm not deriving the model
from anything.

464
00:24:11,920 --> 00:24:15,480
I'm only saying that this sounds
like a model that does

465
00:24:15,480 --> 00:24:19,800
not assume any knowledge or
preference for certain values

466
00:24:19,800 --> 00:24:22,360
of x or theta over
other values.

467
00:24:22,360 --> 00:24:25,660
In the absence of any other
particular information you

468
00:24:25,660 --> 00:24:28,420
might have in your hands, that's
the most reasonable

469
00:24:28,420 --> 00:24:30,520
model to come up with.

470
00:24:30,520 --> 00:24:32,150
So you model the problem
that way.

471
00:24:32,150 --> 00:24:35,490
So what's the formula for
the joint density?

472
00:24:35,490 --> 00:24:37,590
It's going to be the
product of the

473
00:24:37,590 --> 00:24:41,200
densities of X and Theta.

474
00:24:41,200 --> 00:24:42,410
Why is it the product?

475
00:24:42,410 --> 00:24:45,530
This is because we assumed
independence.

476
00:24:45,530 --> 00:24:48,910
And the density of X, since
it's uniform, and since it

477
00:24:48,910 --> 00:24:54,630
needs to integrate to 1, that
density needs to be 2/d.

478
00:24:54,630 --> 00:24:57,580
That's the density of X.
And the density of

479
00:24:57,580 --> 00:25:00,740
Theta needs to be 2/pi.

480
00:25:00,740 --> 00:25:03,660
That's the value for the density
of Theta so that the

481
00:25:03,660 --> 00:25:07,920
overall probability over this
interval ends up being 1.

482
00:25:07,920 --> 00:25:12,390
So now we do have our joint
density in our hands.

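In symbols, the modeling assumptions just described (X uniform on [0, d/2], Theta uniform on [0, pi/2], X and Theta independent) give the following marginal and joint densities:

```latex
f_X(x) = \frac{2}{d}, \quad 0 \le x \le \frac{d}{2};
\qquad
f_\Theta(\theta) = \frac{2}{\pi}, \quad 0 \le \theta \le \frac{\pi}{2}

f_{X,\Theta}(x,\theta) = f_X(x)\, f_\Theta(\theta) = \frac{4}{\pi d}
\quad \text{for } 0 \le x \le \tfrac{d}{2},\ 0 \le \theta \le \tfrac{\pi}{2},
\text{ and } 0 \text{ otherwise.}
```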
483
00:25:12,390 --> 00:25:14,690
The next thing to do
is to identify

484
00:25:14,690 --> 00:25:17,920
the event of interest.

485
00:25:17,920 --> 00:25:20,720
And this is best done
in a picture.

486
00:25:20,720 --> 00:25:23,380
And there are two possible
situations

487
00:25:23,380 --> 00:25:25,450
that one could have.

488
00:25:25,450 --> 00:25:33,450
Either the needle falls this
way, or it falls this way.

489
00:25:33,450 --> 00:25:38,300
So how can we tell if one or the
other is going to happen?

490
00:25:38,300 --> 00:25:45,470
It has to do with whether this
interval here is smaller than

491
00:25:45,470 --> 00:25:50,130
that or bigger than that.

492
00:25:50,130 --> 00:25:52,260
So we are comparing
the height of this

493
00:25:52,260 --> 00:25:55,460
interval to that interval.

494
00:25:55,460 --> 00:25:58,220
This interval here
is capital X.

495
00:25:58,220 --> 00:26:02,350
This interval here,
what is it?

496
00:26:02,350 --> 00:26:07,040
This is half of the length of
the needle, which is l/2.

497
00:26:07,040 --> 00:26:10,590
To find this height, we take l/2
and multiply it with the

498
00:26:10,590 --> 00:26:13,700
sine of the angle
that we have.

499
00:26:13,700 --> 00:26:18,330
So the length of this
interval up here is

500
00:26:18,330 --> 00:26:23,500
l/2 times sine theta.

501
00:26:23,500 --> 00:26:28,520
If this is smaller than
x, the needle does not

502
00:26:28,520 --> 00:26:30,010
intersect the line.

503
00:26:30,010 --> 00:26:33,130
If this is bigger than
x, then the needle

504
00:26:33,130 --> 00:26:34,920
intersects the line.

505
00:26:34,920 --> 00:26:37,870
So the event of interest, that
the needle intersects the

506
00:26:37,870 --> 00:26:42,740
line, is described this way
in terms of x and theta.

507
00:26:42,740 --> 00:26:46,170
And now that we have the event
of interest described

508
00:26:46,170 --> 00:26:50,100
mathematically, all that we
need to do is to find the

509
00:26:50,100 --> 00:26:54,800
probability of this event, we
integrate the joint density

510
00:26:54,800 --> 00:26:59,560
over the part of (x, theta)
space in which this

511
00:26:59,560 --> 00:27:01,320
inequality is true.

512
00:27:01,320 --> 00:27:04,670
So it's a double integral over
the set of all x's and theta's

513
00:27:04,670 --> 00:27:06,450
where this is true.

514
00:27:06,450 --> 00:27:11,430
The way to do this integral is
we fix theta, and we integrate

515
00:27:11,430 --> 00:27:15,150
for x's that go from 0
up to that number.

516
00:27:15,150 --> 00:27:19,030
And theta can be anything
between 0 and pi/2.

517
00:27:19,030 --> 00:27:23,620
So the integral over this set
is basically this double

518
00:27:23,620 --> 00:27:24,980
integral here.

519
00:27:24,980 --> 00:27:27,475
We already have a formula
for the joint density.

520
00:27:27,475 --> 00:27:30,930
It's 4 over pi d, so
we put it here.

521
00:27:30,930 --> 00:27:32,640
And now, fortunately,
this is a pretty

522
00:27:32,640 --> 00:27:34,645
easy integral to evaluate.

523
00:27:34,645 --> 00:27:37,650
The integral with respect to x
-- the integrand doesn't depend on x.

524
00:27:37,650 --> 00:27:40,950
So the integral is just the
length of the interval over

525
00:27:40,950 --> 00:27:42,370
which we're integrating.

526
00:27:42,370 --> 00:27:44,950
It's l/2 sine theta.

527
00:27:44,950 --> 00:27:47,870
And then we need to integrate
this with respect to theta.

528
00:27:47,870 --> 00:27:53,990
We know that the integral of a
sine is a negative cosine.

529
00:27:53,990 --> 00:27:56,990
You plug in the values for
the negative cosine

530
00:27:56,990 --> 00:27:58,390
at the two end points.

531
00:27:58,390 --> 00:28:00,260
I'm sure you can do
this integral.

532
00:28:00,260 --> 00:28:04,540
And we finally obtain the
answer, which is amazingly

533
00:28:04,540 --> 00:28:08,210
simple for such a pretty
complicated-looking problem.

534
00:28:08,210 --> 00:28:09,910
It's 2l over pi d.

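The whole calculation, written out as one chain. This assumes, as in the lecture's setup, that the needle is no longer than the line spacing (l <= d), so the inner upper limit (l/2) sin(theta) never exceeds d/2:

```latex
P\Big(X \le \tfrac{l}{2}\sin\Theta\Big)
= \int_{0}^{\pi/2} \int_{0}^{(l/2)\sin\theta} \frac{4}{\pi d}\, dx\, d\theta
= \frac{4}{\pi d} \int_{0}^{\pi/2} \frac{l}{2}\sin\theta \, d\theta
= \frac{2l}{\pi d}\Big[-\cos\theta\Big]_{0}^{\pi/2}
= \frac{2l}{\pi d}
```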
535
00:28:09,910 --> 00:28:12,420

536
00:28:12,420 --> 00:28:15,360
So some people a long, long time
ago, after they looked at

537
00:28:15,360 --> 00:28:19,290
this answer, they said that
maybe that gives us an

538
00:28:19,290 --> 00:28:22,910
interesting way where one could
estimate the value of

539
00:28:22,910 --> 00:28:26,130
pi, for example,
experimentally.

540
00:28:26,130 --> 00:28:27,690
How do you do that?

541
00:28:27,690 --> 00:28:32,360
Fix l and d, the dimensions
of the problem.

542
00:28:32,360 --> 00:28:36,680
Throw a million needles on
your piece of paper.

543
00:28:36,680 --> 00:28:40,690
See how often your needles
do intersect the line.

544
00:28:40,690 --> 00:28:43,540
That gives you a number
for this quantity.

545
00:28:43,540 --> 00:28:48,540
You know l and d, so you can
use that to infer pi.

546
00:28:48,540 --> 00:28:52,330
And there's an apocryphal story
about a wounded soldier

547
00:28:52,330 --> 00:28:55,300
in a hospital after the
American Civil War who

548
00:28:55,300 --> 00:28:58,490
actually had heard about this
and was spending his time in

549
00:28:58,490 --> 00:29:02,680
the hospital throwing needles
on pieces of paper.

550
00:29:02,680 --> 00:29:04,350
I don't know if it's
true or not.

551
00:29:04,350 --> 00:29:07,330
But let's do something
similar here.

552
00:29:07,330 --> 00:29:11,720
So let's look at this diagram.

553
00:29:11,720 --> 00:29:14,110
We fix the dimensions.

554
00:29:14,110 --> 00:29:15,920
This is supposed to
be our little d.

555
00:29:15,920 --> 00:29:18,330
That's supposed to
be our little l.

556
00:29:18,330 --> 00:29:22,430
We have the formula from the
previous slide that p

557
00:29:22,430 --> 00:29:25,230
is 2l over pi d.

558
00:29:25,230 --> 00:29:29,230
In this instance, we choose
d to be twice l.

559
00:29:29,230 --> 00:29:32,170
So this number is 1/pi.

560
00:29:32,170 --> 00:29:37,770
So the probability that the
needle hits the line is 1/pi.

561
00:29:37,770 --> 00:29:41,150
So I need needles that are
3.1 centimeters long.

562
00:29:41,150 --> 00:29:42,730
I couldn't find such needles.

563
00:29:42,730 --> 00:29:47,360
But I could find paper clips
that are 3.1 centimeters long.

564
00:29:47,360 --> 00:29:51,510
So let's start throwing paper
clips at random and see how

565
00:29:51,510 --> 00:29:55,285
many of them will end up
intersecting the lines.

566
00:29:55,285 --> 00:30:00,501

567
00:30:00,501 --> 00:30:01,920
Good.

568
00:30:01,920 --> 00:30:02,400
OK.

569
00:30:02,400 --> 00:30:09,350
So out of eight paper clips,
we have exactly four that

570
00:30:09,350 --> 00:30:11,510
intersected the line.

571
00:30:11,510 --> 00:30:13,620
So our estimate for the
probability of intersecting

572
00:30:13,620 --> 00:30:18,970
the line is 1/2, which gives us
an estimate for the value

573
00:30:18,970 --> 00:30:22,010
of pi, which is two.

574
00:30:22,010 --> 00:30:24,960
Well, I mean, within an
engineering approximation,

575
00:30:24,960 --> 00:30:29,090
we're in the right
ballpark, right?

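Eight paper clips give a very noisy estimate. A simulation shows the 1/pi answer emerging with many throws. The sketch below is a minimal Python illustration under the lecture's model (X uniform on [0, d/2], Theta uniform on [0, pi/2], independent). The function name buffon_estimate_pi and its parameters are illustrative, not from the lecture. One caveat: drawing Theta uniformly uses math.pi itself, so this checks the formula p = 2l/(pi d) rather than computing pi from scratch.

```python
import math
import random


def buffon_estimate_pi(n_throws: int, needle_len: float, line_gap: float, seed: int = 0) -> float:
    """Estimate pi from simulated needle throws (requires needle_len <= line_gap)."""
    if needle_len > line_gap:
        raise ValueError("this model assumes the needle is no longer than the line gap")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_throws):
        x = rng.uniform(0.0, line_gap / 2)      # distance from needle center to nearest line
        theta = rng.uniform(0.0, math.pi / 2)   # acute angle between needle and the lines
        if x <= (needle_len / 2) * math.sin(theta):
            hits += 1                           # the needle crosses a line
    if hits == 0:
        raise ValueError("no crossings observed; use more throws")
    p_hat = hits / n_throws                     # estimate of p = 2l / (pi d)
    return 2 * needle_len / (p_hat * line_gap)  # solve p = 2l / (pi d) for pi


# Same proportions as the demo: l = 3.1 cm, d = 2l = 6.2 cm, so p = 1/pi.
estimate = buffon_estimate_pi(200_000, needle_len=3.1, line_gap=6.2, seed=1)
```

With a few hundred thousand throws the estimate is typically within a few hundredths of pi, since the statistical error shrinks roughly like 1/sqrt(n).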
576
00:30:29,090 --> 00:30:32,890
So this might look like a
silly way of trying to

577
00:30:32,890 --> 00:30:33,920
estimate pi.

578
00:30:33,920 --> 00:30:36,420
And it probably is.

579
00:30:36,420 --> 00:30:41,200
On the other hand, this kind of
methodology is being used

580
00:30:41,200 --> 00:30:44,930
especially by physicists and
also by statisticians.

581
00:30:44,930 --> 00:30:46,550
It's used a lot.

582
00:30:46,550 --> 00:30:48,260
When is it used?

583
00:30:48,260 --> 00:30:52,300
If you have an integral to
calculate, such as this

584
00:30:52,300 --> 00:30:55,980
integral, but you're not lucky,
and your functions are

585
00:30:55,980 --> 00:30:59,980
not so simple where you can do
your calculations by hand, and

586
00:30:59,980 --> 00:31:02,590
maybe the dimensions are
larger-- instead of two random

587
00:31:02,590 --> 00:31:04,590
variables you have 100
random variables, so

588
00:31:04,590 --> 00:31:08,210
it's a 100-fold integral--

589
00:31:08,210 --> 00:31:10,830
then there's no practical way to do
that numerically, even on a computer.

590
00:31:10,830 --> 00:31:14,230
But the way that you can
actually do it is by

591
00:31:14,230 --> 00:31:18,290
generating random samples of
your random variables, doing

592
00:31:18,290 --> 00:31:21,220
that simulation over and
over many times.

593
00:31:21,220 --> 00:31:25,010
That is, by interpreting an
integral as a probability, you

594
00:31:25,010 --> 00:31:29,060
can use simulation to estimate
that probability.

595
00:31:29,060 --> 00:31:32,470
And that gives you a way of
calculating integrals.

596
00:31:32,470 --> 00:31:36,850
And physicists do actually use
that a lot, as well as

597
00:31:36,850 --> 00:31:39,630
statisticians, computer
scientists, and so on.

598
00:31:39,630 --> 00:31:41,760
It's a so-called Monte
Carlo method

599
00:31:41,760 --> 00:31:43,990
for evaluating integrals.

600
00:31:43,990 --> 00:31:50,250
And it's a basic piece of the
toolbox in science these days.

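The idea of reading an integral as a probability can be sketched in a few lines. In this minimal Python illustration, the hypothetical helper mc_probability estimates the probability of an event under independent Uniform(0, 1) variables, which equals the multi-dimensional integral of the event's indicator function over the unit cube. The example event (the sum of 100 uniforms is at most 50) was chosen only because symmetry gives the exact answer 1/2 for comparison.

```python
import random


def mc_probability(event, dim: int, n_samples: int, seed: int = 0) -> float:
    """Estimate P(event(U)) for U uniform on [0,1]^dim, i.e. the dim-fold
    integral of the event's indicator over the unit cube."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        u = [rng.random() for _ in range(dim)]   # one random point in the cube
        if event(u):
            hits += 1
    return hits / n_samples                      # fraction of points in the event


# A 100-fold integral: the volume of {u in [0,1]^100 : u_1 + ... + u_100 <= 50}.
# Replacing each u_i by 1 - u_i shows the exact value is 1/2.
estimate = mc_probability(lambda u: sum(u) <= 50, dim=100, n_samples=20_000)
```

The cost grows only linearly with the dimension, whereas a grid with even 10 points per axis would need 10^100 evaluations.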
601
00:31:50,250 --> 00:31:54,610
Finally, the harder concept
of the day is the idea of

602
00:31:54,610 --> 00:31:55,770
conditioning.

603
00:31:55,770 --> 00:31:58,740
And here things become a little
subtle when you deal

604
00:31:58,740 --> 00:32:00,970
with continuous random
variables.

605
00:32:00,970 --> 00:32:02,290
OK.

606
00:32:02,290 --> 00:32:05,810
First, remember again our basic
interpretation of what a

607
00:32:05,810 --> 00:32:06,860
density is.

608
00:32:06,860 --> 00:32:08,200
A density gives us

609
00:32:08,200 --> 00:32:10,500
probabilities of little intervals.

610
00:32:10,500 --> 00:32:13,560
So how should we define
conditional densities?

611
00:32:13,560 --> 00:32:16,600
Conditional densities should
again give us probabilities of

612
00:32:16,600 --> 00:32:21,290
little intervals, but inside a
conditional world where we

613
00:32:21,290 --> 00:32:24,530
have been told something about
the other random variable.

614
00:32:24,530 --> 00:32:28,090
So what we would like to be
true is the following.

615
00:32:28,090 --> 00:32:31,340
We would like to define a
concept of a conditional

616
00:32:31,340 --> 00:32:34,530
density of a random variable X
given the value of another

617
00:32:34,530 --> 00:32:37,860
random variable Y. And it should
behave the following

618
00:32:37,860 --> 00:32:40,570
way, that the conditional
density gives us the

619
00:32:40,570 --> 00:32:42,690
probability of little
intervals--

620
00:32:42,690 --> 00:32:44,260
same as here--

621
00:32:44,260 --> 00:32:48,440
given that we are told
the value of y.

622
00:32:48,440 --> 00:32:50,930
And here's where the
subtleties come.

623
00:32:50,930 --> 00:32:54,420
The main thing to notice is
that here I didn't write

624
00:32:54,420 --> 00:32:59,000
"equal," I wrote "approximately
equal." Why do

625
00:32:59,000 --> 00:33:01,250
we need that?

626
00:33:01,250 --> 00:33:04,460
Well, the thing is that
conditional probabilities are

627
00:33:04,460 --> 00:33:08,840
not defined when you condition
on an event that has 0

628
00:33:08,840 --> 00:33:10,180
probability.

629
00:33:10,180 --> 00:33:13,400
So we need the conditioning
event here to have positive

630
00:33:13,400 --> 00:33:14,430
probability.

631
00:33:14,430 --> 00:33:18,840
So instead of saying that Y is
exactly equal to little y, we

632
00:33:18,840 --> 00:33:22,900
want to instead say we're in a
new universe where capital Y

633
00:33:22,900 --> 00:33:27,070
is very close to little y.

634
00:33:27,070 --> 00:33:31,410
And then this notion of "very
close" kind of takes the limit

635
00:33:31,410 --> 00:33:34,910
and takes it to be
infinitesimally close.

636
00:33:34,910 --> 00:33:38,610
So this is the way to interpret
conditional

637
00:33:38,610 --> 00:33:40,120
probabilities.

638
00:33:40,120 --> 00:33:42,550
That's what they should mean.

639
00:33:42,550 --> 00:33:45,330
Now, in practice, when you
actually use probability, you

640
00:33:45,330 --> 00:33:46,780
forget about that subtlety.

641
00:33:46,780 --> 00:33:50,940
And you say, well, I've been
told that Y is equal to 1.3.

642
00:33:50,940 --> 00:33:53,780
Give me the conditional
distribution of X. But

643
00:33:53,780 --> 00:33:58,080
formally or rigorously, you
should say I'm being told that

644
00:33:58,080 --> 00:34:01,400
Y is infinitesimally
close to 1.3.

645
00:34:01,400 --> 00:34:03,620
Tell me the distribution of X.

646
00:34:03,620 --> 00:34:08,580
Now, if this is what we want,
what should this quantity be?

647
00:34:08,580 --> 00:34:10,489
It's a conditional probability,
so it should be

648
00:34:10,489 --> 00:34:12,800
the probability of two
things happening--

649
00:34:12,800 --> 00:34:16,550
X being close to little x, Y
being close to little y.

650
00:34:16,550 --> 00:34:20,010
And that's basically given to
us by the joint density

651
00:34:20,010 --> 00:34:23,920
divided by the probability of
the conditioning event, which

652
00:34:23,920 --> 00:34:27,449
has something to do with the
density of Y itself.

653
00:34:27,449 --> 00:34:30,840
And if you do things carefully,
you see that the

654
00:34:30,840 --> 00:34:34,350
only way to satisfy this
relation is to define the

655
00:34:34,350 --> 00:34:38,065
conditional density by this
particular formula.

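Written out, the small-interval argument behind this definition is as follows (delta and epsilon are small interval widths, and f_Y(y) > 0 is required):

```latex
P\big(x \le X \le x+\delta \,\big|\, y \le Y \le y+\epsilon\big)
\approx \frac{f_{X,Y}(x,y)\,\delta\,\epsilon}{f_Y(y)\,\epsilon}
= f_{X|Y}(x \mid y)\,\delta,
\qquad
f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}
```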
656
00:34:38,065 --> 00:34:38,590
OK.

657
00:34:38,590 --> 00:34:44,159
A big discussion that comes down, in
the end, to what you have

658
00:34:44,159 --> 00:34:46,120
probably guessed by now.

659
00:34:46,120 --> 00:34:49,170
We just take any formulas and
expressions from the discrete

660
00:34:49,170 --> 00:34:53,570
case and replace PMFs by PDFs.

661
00:34:53,570 --> 00:34:58,030
So the conditional PDF is
defined by this formula where

662
00:34:58,030 --> 00:35:02,450
here we have joint PDF and
marginal PDF, as opposed to

663
00:35:02,450 --> 00:35:05,450
the discrete case where we
had the joint PMF and

664
00:35:05,450 --> 00:35:07,540
the marginal PMF.

665
00:35:07,540 --> 00:35:11,850
So in some sense, it's just
a syntactic change.

666
00:35:11,850 --> 00:35:14,510
In another sense, it's a little
subtler on how you

667
00:35:14,510 --> 00:35:17,130
actually interpret it.

668
00:35:17,130 --> 00:35:20,230
Speaking about interpretation,
what are some ways of thinking

669
00:35:20,230 --> 00:35:22,170
about the conditional density?

670
00:35:22,170 --> 00:35:24,740
Well, the best way to think
about it is that somebody has

671
00:35:24,740 --> 00:35:27,720
fixed little y for you.

672
00:35:27,720 --> 00:35:31,980
So little y is being
fixed here.

673
00:35:31,980 --> 00:35:35,350
And we look at this density
as a function of X.

674
00:35:35,350 --> 00:35:37,020
I've told you what Y is.

675
00:35:37,020 --> 00:35:39,870
Tell me what you know about X.
And you tell me that X has a

676
00:35:39,870 --> 00:35:42,070
certain distribution.

677
00:35:42,070 --> 00:35:44,840
What does that distribution
look like?

678
00:35:44,840 --> 00:35:50,070
It has exactly the same shape
as the joint density.

679
00:35:50,070 --> 00:35:53,390
Remember, we fixed Y. So
this is a constant.

680
00:35:53,390 --> 00:35:57,200
So the only thing that varies
is X. So we get the function

681
00:35:57,200 --> 00:36:01,320
that behaves like the joint
density when you fix y, which

682
00:36:01,320 --> 00:36:04,100
is really you take the joint
density, and you

683
00:36:04,100 --> 00:36:05,650
take a slice of it.

684
00:36:05,650 --> 00:36:09,200
You fix a y, and you see
how it varies with x.

685
00:36:09,200 --> 00:36:11,810
So in that sense, the
conditional PDF is just a

686
00:36:11,810 --> 00:36:14,150
slice of the joint PDF.

687
00:36:14,150 --> 00:36:17,230
But we need to divide by a
certain number, which just

688
00:36:17,230 --> 00:36:19,480
scales it without changing
its shape.

689
00:36:19,480 --> 00:36:21,950
We're coming back to a
picture in a second.

690
00:36:21,950 --> 00:36:25,410
But before going to the picture,
let's go back to the

691
00:36:25,410 --> 00:36:27,840
interpretation of
independence.

692
00:36:27,840 --> 00:36:30,230
If the two random variables
are independent,

693
00:36:30,230 --> 00:36:33,550
according to our definition in
the previous slide, the joint

694
00:36:33,550 --> 00:36:36,130
density is going to factor
as the product of

695
00:36:36,130 --> 00:36:37,820
the marginal densities.

696
00:36:37,820 --> 00:36:40,850
The density of Y in the
numerator cancels the density

697
00:36:40,850 --> 00:36:42,010
in the denominator.

698
00:36:42,010 --> 00:36:44,410
And we're just left with
the density of X.

699
00:36:44,410 --> 00:36:46,940
So in the case of independence,
what we get is

700
00:36:46,940 --> 00:36:49,870
that the conditional is the
same as the marginal.

701
00:36:49,870 --> 00:36:52,980
And that solidifies our
intuition that in the case of

702
00:36:52,980 --> 00:36:58,080
independence, being told
something about the value of Y

703
00:36:58,080 --> 00:37:02,540
does not change our beliefs
about how X is distributed.

704
00:37:02,540 --> 00:37:06,110
So whatever we expected about X
is going to remain true even

705
00:37:06,110 --> 00:37:09,180
after we are told something
about Y.

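The cancellation just described, in one line (valid wherever f_Y(y) > 0):

```latex
f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}
= \frac{f_X(x)\, f_Y(y)}{f_Y(y)} = f_X(x)
```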
706
00:37:09,180 --> 00:37:12,680
So let's look at
some pictures.

707
00:37:12,680 --> 00:37:16,110
Here is what the joint
PDF might look like.

708
00:37:16,110 --> 00:37:19,480
Here we've got our
x and y-axis.

709
00:37:19,480 --> 00:37:23,100
And if you want to calculate the
probability of a certain

710
00:37:23,100 --> 00:37:27,240
event, what you do is you look
at that event and you see how

711
00:37:27,240 --> 00:37:31,740
much of that mass is sitting
on top of that event.

712
00:37:31,740 --> 00:37:35,180
Now let's start slicing.

713
00:37:35,180 --> 00:37:43,360
Let's fix a value of x and look
along that slice where we

714
00:37:43,360 --> 00:37:48,610
obtain this function.

715
00:37:48,610 --> 00:37:52,280
Now what does that slice do?

716
00:37:52,280 --> 00:37:56,100
That slice tells us for that
particular x what the possible

717
00:37:56,100 --> 00:38:00,330
values of y are going to be
and how likely they are.

718
00:38:00,330 --> 00:38:05,440
If we integrate over all
y's, what do we get?

719
00:38:05,440 --> 00:38:10,400
Integrating over all y's just
gives us the marginal density

720
00:38:10,400 --> 00:38:15,270
of X. It's the calculation
that we did here.

721
00:38:15,270 --> 00:38:19,820
By integrating over all y's, we
find the marginal density

722
00:38:19,820 --> 00:38:27,850
of X. So the total area under
that slice gives us the

723
00:38:27,850 --> 00:38:31,340
marginal density of X. And by
looking at the different

724
00:38:31,340 --> 00:38:35,430
slices, we find how likely the
different values of x are

725
00:38:35,430 --> 00:38:36,660
going to be.

726
00:38:36,660 --> 00:38:39,410
How about the conditional?

727
00:38:39,410 --> 00:38:48,790
If we're interested in the
conditional of Y given X, how

728
00:38:48,790 --> 00:38:51,200
would you think about it?

729
00:38:51,200 --> 00:38:54,620
This refers to a universe where
we are told that capital

730
00:38:54,620 --> 00:38:57,550
X takes on a specific value.

731
00:38:57,550 --> 00:39:00,010
So we put ourselves in
the universe where

732
00:39:00,010 --> 00:39:01,810
this line has happened.

733
00:39:01,810 --> 00:39:05,940
There's still possible values
of y that can happen.

734
00:39:05,940 --> 00:39:09,270
And this shape kind of tells us
the relative likelihoods of

735
00:39:09,270 --> 00:39:10,760
the different y's.

736
00:39:10,760 --> 00:39:14,060
And this is indeed going to be
the shape of the conditional

737
00:39:14,060 --> 00:39:17,850
distribution of Y given
that this value of x has occurred.

738
00:39:17,850 --> 00:39:21,090
On the other hand, the
conditional distribution must

739
00:39:21,090 --> 00:39:22,630
add up to 1.

740
00:39:22,630 --> 00:39:25,920
So the total probability over
all of the different y's in

741
00:39:25,920 --> 00:39:27,730
this universe, that
total probability

742
00:39:27,730 --> 00:39:29,540
should be equal to 1.

743
00:39:29,540 --> 00:39:31,450
Here it's not equal to 1.

744
00:39:31,450 --> 00:39:34,290
The total area is the
marginal density.

745
00:39:34,290 --> 00:39:38,590
To make it equal to 1, we need
to divide by the marginal

746
00:39:38,590 --> 00:39:44,160
density, which is basically to
renormalize this shape so that

747
00:39:44,160 --> 00:39:48,500
the total area under that slice,
under that shape, is

748
00:39:48,500 --> 00:39:50,400
equal to 1.

749
00:39:50,400 --> 00:39:53,430
So we start with the joint.

750
00:39:53,430 --> 00:39:55,730
We take the slices.

751
00:39:55,730 --> 00:40:00,280
And then we adjust the slices
so that every slice has an

752
00:40:00,280 --> 00:40:03,610
area underneath equal to 1.

753
00:40:03,610 --> 00:40:05,650
And this gives us
the conditional.

754
00:40:05,650 --> 00:40:09,160
So for example, down here--

755
00:40:09,160 --> 00:40:11,840
you cannot even see it
in this diagram--

756
00:40:11,840 --> 00:40:15,410
but after you renormalize it
so that its total area is

757
00:40:15,410 --> 00:40:20,160
equal to 1, you get this sort of
narrow spike that goes up.

758
00:40:20,160 --> 00:40:22,980
And so this is a plot of the
conditional distributions that

759
00:40:22,980 --> 00:40:26,060
you get for the different
values of x.

760
00:40:26,060 --> 00:40:29,050
Given a particular value of x,
you're going to get this

761
00:40:29,050 --> 00:40:31,460
certain conditional
distribution.

762
00:40:31,460 --> 00:40:36,460
So this picture is worth about
as much as anything else in

763
00:40:36,460 --> 00:40:38,840
this particular chapter.

764
00:40:38,840 --> 00:40:42,990
Make sure you kind of understand
exactly all these

765
00:40:42,990 --> 00:40:44,240
pieces of the picture.

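The slice-and-renormalize recipe can also be sketched numerically. In this minimal Python illustration, conditional_from_slice evaluates the joint density along the line X = x, uses the trapezoid rule to estimate the area under that slice (which is the marginal f_X(x)), and divides by it so the result has area 1. The joint density x + y on the unit square is a hypothetical example chosen only because its answers are easy to check by hand.

```python
def conditional_from_slice(joint, x, y_grid):
    """Approximate f_{Y|X}(. | x) on y_grid by slicing the joint density at x
    and rescaling the slice to have total area 1.

    Returns (conditional_values, marginal_fx), where marginal_fx is the
    trapezoid-rule area under the slice, an estimate of f_X(x).
    """
    slice_vals = [joint(x, y) for y in y_grid]   # shape of the conditional
    marginal_fx = sum(
        (y_grid[i + 1] - y_grid[i]) * (slice_vals[i] + slice_vals[i + 1]) / 2
        for i in range(len(y_grid) - 1)
    )
    if marginal_fx <= 0:
        raise ValueError("f_X(x) is zero here; the conditional is undefined")
    return [v / marginal_fx for v in slice_vals], marginal_fx


def example_joint(x, y):
    """Hypothetical joint PDF f(x, y) = x + y on the unit square (integrates to 1)."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0


y_grid = [i / 100 for i in range(101)]
cond, fx = conditional_from_slice(example_joint, 0.5, y_grid)
# By hand: f_X(0.5) = 0.5 + 1/2 = 1, and f_{Y|X}(y | 0.5) = 0.5 + y.
```

Dividing by the slice's area rescales it without changing its shape, which is exactly the renormalization in the picture.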
766
00:40:44,240 --> 00:40:47,130

767
00:40:47,130 --> 00:40:49,870
And finally, let's go, in the
remaining time, through an

768
00:40:49,870 --> 00:40:55,240
example where we're going to
throw in the bucket all the

769
00:40:55,240 --> 00:40:58,320
concepts and notations that
we have introduced so far.

770
00:40:58,320 --> 00:40:59,960
So the example is as follows.

771
00:40:59,960 --> 00:41:04,210
We start with a stick that
has a certain length.

772
00:41:04,210 --> 00:41:07,790
And we break it at a completely
random location.

773
00:41:07,790 --> 00:41:09,390
And--

774
00:41:09,390 --> 00:41:13,686
yes, this 1 should be l.

775
00:41:13,686 --> 00:41:14,130
OK.

776
00:41:14,130 --> 00:41:15,770
So it has length l.

777
00:41:15,770 --> 00:41:19,210
And we're going to break
it at the random place.

778
00:41:19,210 --> 00:41:21,970
And the random place
where we break it, we call it

779
00:41:21,970 --> 00:41:24,210
X.

780
00:41:24,210 --> 00:41:26,670
X can be anywhere, uniform
distribution.

781
00:41:26,670 --> 00:41:31,800
So this means that X has a
density that goes from 0 to l.

782
00:41:31,800 --> 00:41:34,760
I guess this capital L is
supposed to be the same as the

783
00:41:34,760 --> 00:41:36,190
lower-case l.

784
00:41:36,190 --> 00:41:39,430
So that's the density of X. And
since the density needs to

785
00:41:39,430 --> 00:41:43,160
integrate to 1, the height of
that density has to be 1/l.

786
00:41:43,160 --> 00:41:46,330

787
00:41:46,330 --> 00:41:49,660
Now, having broken the stick
and given that we are left

788
00:41:49,660 --> 00:41:53,080
with this piece of the stick,
I'm now going to break it

789
00:41:53,080 --> 00:41:56,900
again at a completely random
place, meaning I'm going to

790
00:41:56,900 --> 00:41:59,940
choose a point where I break it
uniformly over the length

791
00:41:59,940 --> 00:42:00,940
of the stick.

792
00:42:00,940 --> 00:42:02,750
What does this mean?

793
00:42:02,750 --> 00:42:05,720
And let's call Y the location
where I break it.

794
00:42:05,720 --> 00:42:10,290
So Y is going to range
between 0 and x.

795
00:42:10,290 --> 00:42:11,850
x is the stick that
I'm left with.

796
00:42:11,850 --> 00:42:14,190
So I'm going to break it
somewhere in between.

797
00:42:14,190 --> 00:42:21,140
So I pick a y between 0 and x.

798
00:42:21,140 --> 00:42:24,480
And of course, x
is less than l.

799
00:42:24,480 --> 00:42:26,150
And I'm going to
break it there.

800
00:42:26,150 --> 00:42:30,640
So Y is uniform between
0 and x.

801
00:42:30,640 --> 00:42:36,460
What does that mean? It means that the
density of Y, given that you

802
00:42:36,460 --> 00:42:42,940
have already told me x, ranges
from 0 to little x.

803
00:42:42,940 --> 00:42:46,170
If I told you that the first
break happened at a particular

804
00:42:46,170 --> 00:42:50,850
x, then y can only range
over this interval.

805
00:42:50,850 --> 00:42:52,830
And I'm assuming a uniform

806
00:42:52,830 --> 00:42:54,330
distribution over that interval.

807
00:42:54,330 --> 00:42:56,420
So we have this kind of shape.

808
00:42:56,420 --> 00:43:00,700
And that fixes for
us the height of

809
00:43:00,700 --> 00:43:01,950
the conditional density.

810
00:43:01,950 --> 00:43:05,380

811
00:43:05,380 --> 00:43:11,690
So what's the joint density of
those two random variables?

812
00:43:11,690 --> 00:43:14,440
By the definition of conditional
densities, the

813
00:43:14,440 --> 00:43:18,290
conditional was defined as the
ratio of this divided by that.

814
00:43:18,290 --> 00:43:21,500
So we can find the joint density
by taking the marginal

815
00:43:21,500 --> 00:43:23,630
and then multiplying
by the conditional.

816
00:43:23,630 --> 00:43:26,120
This is the same formula as
in the discrete case.

817
00:43:26,120 --> 00:43:29,770
This is our very familiar
multiplication rule, but

818
00:43:29,770 --> 00:43:32,150
adjusted to the case of
continuous random variables.

819
00:43:32,150 --> 00:43:34,871
So p's become f's.

820
00:43:34,871 --> 00:43:35,290
OK.

821
00:43:35,290 --> 00:43:37,560
So we do have a formula
for this.

822
00:43:37,560 --> 00:43:38,540
What is it?

823
00:43:38,540 --> 00:43:40,190
It's 1/l--

824
00:43:40,190 --> 00:43:42,140
that's the density of X --

825
00:43:42,140 --> 00:43:46,460
times 1/x, which is the
conditional density of Y. This

826
00:43:46,460 --> 00:43:48,630
is the formula for the
joint density.

827
00:43:48,630 --> 00:43:50,140
But we must be careful.

828
00:43:50,140 --> 00:43:53,230
This is a formula that's
not valid everywhere.

829
00:43:53,230 --> 00:43:57,150
It's only valid for the x's
and y's that are possible.

830
00:43:57,150 --> 00:44:00,840
And the x's and y's that are
possible are given by these

831
00:44:00,840 --> 00:44:01,900
inequalities.

832
00:44:01,900 --> 00:44:05,940
So x can range from 0 to
l, and y can only be

833
00:44:05,940 --> 00:44:07,270
smaller than x.

834
00:44:07,270 --> 00:44:09,780
So this is the formula
for the density on

835
00:44:09,780 --> 00:44:12,310
this part of our space.

836
00:44:12,310 --> 00:44:16,270
The density is 0
anywhere else.

837
00:44:16,270 --> 00:44:18,430
So what does it look like?

838
00:44:18,430 --> 00:44:20,950
It's basically a 1/x function.

839
00:44:20,950 --> 00:44:23,460
So it's sort of constant
along that dimension.

840
00:44:23,460 --> 00:44:27,600
But as x goes to 0, your
density goes up and

841
00:44:27,600 --> 00:44:29,280
can even blow up.

842
00:44:29,280 --> 00:44:33,400
It sort of looks like a sail
that's raised and somewhat

843
00:44:33,400 --> 00:44:37,640
curved and has a point up
there going to infinity.

844
00:44:37,640 --> 00:44:39,680
So this is the joint density.

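A simulation is a useful sanity check on this joint density. The minimal Python sketch below (all function names are illustrative) breaks a stick of length l = 1 many times following the two-stage description, then compares the fraction of outcomes landing in a small box with the formula 1/(l x). Because the box has width h = 0.05 in x, the empirical value should be compared with the average of 1/x over the box, 20 ln(1.1), about 1.906, rather than with the point value 2.

```python
import random


def break_stick_twice(length: float, n: int, seed: int = 0):
    """Simulate X ~ Uniform(0, length), then Y | X = x ~ Uniform(0, x)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.uniform(0.0, length)   # first break point
        y = rng.uniform(0.0, x)        # second break point, on the piece [0, x]
        samples.append((x, y))
    return samples


def joint_density_formula(x: float, y: float, length: float) -> float:
    """f_{X,Y}(x, y) = 1/(length * x) on 0 <= y <= x <= length, and 0 elsewhere."""
    if 0 < x <= length and 0 <= y <= x:
        return 1.0 / (length * x)
    return 0.0


def empirical_box_density(samples, x0: float, y0: float, h: float) -> float:
    """Fraction of samples in [x0, x0+h) x [y0, y0+h), divided by the box area h*h."""
    count = sum(1 for x, y in samples if x0 <= x < x0 + h and y0 <= y < y0 + h)
    return count / (len(samples) * h * h)


samples = break_stick_twice(1.0, 400_000)
box_density = empirical_box_density(samples, 0.5, 0.2, 0.05)  # should be near 1.906
formula_value = joint_density_formula(0.5, 0.2, 1.0)          # 1/(1 * 0.5) = 2.0
```

The density is zero above the diagonal y = x, and it grows without bound as x approaches 0, matching the "sail" picture.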
845
00:44:39,680 --> 00:44:43,480
Now once you have in your hands
a joint density, then

846
00:44:43,480 --> 00:44:46,010
you can, in principle, answer
any problem.

847
00:44:46,010 --> 00:44:50,550
It's just a matter of plugging
in and doing computations.

848
00:44:50,550 --> 00:44:53,650
How about calculating something
like a conditional

849
00:44:53,650 --> 00:44:59,040
expectation of Y given
a value of x?

850
00:44:59,040 --> 00:44:59,430
OK.

851
00:44:59,430 --> 00:45:02,530
That's a concept we have
not defined so far.

852
00:45:02,530 --> 00:45:04,860
But how should we define it?

853
00:45:04,860 --> 00:45:06,080
We do the reasonable thing.

854
00:45:06,080 --> 00:45:09,930
We'll define it the same way
as ordinary expectations

855
00:45:09,930 --> 00:45:14,160
except that since we're given
some conditioning information,

856
00:45:14,160 --> 00:45:17,130
we should use the probability
distribution that applies to

857
00:45:17,130 --> 00:45:18,840
that particular situation.

858
00:45:18,840 --> 00:45:22,570
So in a situation where we are
told the value of x, the

859
00:45:22,570 --> 00:45:25,760
distribution that applies is the
conditional distribution

860
00:45:25,760 --> 00:45:29,950
of Y. So it's going to be the
conditional density of Y given

861
00:45:29,950 --> 00:45:31,470
the value of x.

862
00:45:31,470 --> 00:45:34,120
Now, we know what this is.

863
00:45:34,120 --> 00:45:37,860
It's given by 1/x.

864
00:45:37,860 --> 00:45:46,160
So we need to integrate
y times 1/x dy.

865
00:45:46,160 --> 00:45:48,920
And what should we
integrate over?

866
00:45:48,920 --> 00:45:53,930
Well, given the value of x, y
can only range from 0 to x.

867
00:45:53,930 --> 00:45:56,150
So this is what we get.

868
00:45:56,150 --> 00:46:01,690
And you do your integral, and
you get that this is x/2.

869
00:46:01,690 --> 00:46:03,060
Is it a surprise?

870
00:46:03,060 --> 00:46:04,450
It shouldn't be.

871
00:46:04,450 --> 00:46:10,890
This is just the expected value
of Y in a universe where

872
00:46:10,890 --> 00:46:14,560
X has been realized and Y is
given by this distribution.

873
00:46:14,560 --> 00:46:17,390
Y is uniform between 0 and x.

874
00:46:17,390 --> 00:46:20,820
The expected value of Y should
be the midpoint of this

875
00:46:20,820 --> 00:46:22,100
interval, which is x/2.
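
As a quick numerical check of E[Y | X = x] = x/2, here is a small Python sketch (the function name is hypothetical) that integrates y times the conditional density 1/x over y in [0, x]. Because the integrand y/x is linear in y, the midpoint rule used here is exact up to floating-point rounding.

```python
def conditional_expectation_y_given_x(x: float, n: int = 1000) -> float:
    """E[Y | X = x] = integral over y in [0, x] of y * f_{Y|X}(y | x) dy, with f_{Y|X} = 1/x.

    The midpoint rule is exact (up to rounding) for the linear integrand y/x.
    """
    h = x / n
    return sum(((j + 0.5) * h) * (1.0 / x) for j in range(n)) * h


for x in (0.2, 0.6, 1.0):
    # the numerical integral and the midpoint x/2 should agree
    print(x, conditional_expectation_y_given_x(x), x / 2)
```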

876
00:46:22,100 --> 00:46:25,090

877
00:46:25,090 --> 00:46:28,580
Now let's do fancier stuff.

878
00:46:28,580 --> 00:46:31,850
Since we have the joint
distribution, we should be

879
00:46:31,850 --> 00:46:34,250
able to calculate
the marginal.

880
00:46:34,250 --> 00:46:36,500
What is the distribution of Y?

881
00:46:36,500 --> 00:46:40,510
After breaking the stick twice,
how big is the little

882
00:46:40,510 --> 00:46:42,890
piece that I'm left with?

883
00:46:42,890 --> 00:46:44,630
How do we find this?

884
00:46:44,630 --> 00:46:48,850
To find the marginal, we just
take the joint and integrate

885
00:46:48,850 --> 00:46:52,670
out the variable that
we don't want.

886
00:46:52,670 --> 00:46:55,220
A particular y can happen
in many ways.

887
00:46:55,220 --> 00:46:57,800
It can happen together
with any x.

888
00:46:57,800 --> 00:47:00,700
So we consider all the possible
x's that can go

889
00:47:00,700 --> 00:47:05,940
together with this y and average
over all those x's.

890
00:47:05,940 --> 00:47:09,330
So we plug in the formula for
the joint density from the

891
00:47:09,330 --> 00:47:10,140
previous slide.

892
00:47:10,140 --> 00:47:13,070
We know that it's 1/(lx).

893
00:47:13,070 --> 00:47:16,880
And what's the range
of the x's?

894
00:47:16,880 --> 00:47:22,880
So to find the density of Y for
a particular y up here,

895
00:47:22,880 --> 00:47:26,480
I'm going to integrate
over x's.

896
00:47:26,480 --> 00:47:29,040
The density is 0
here and there.

897
00:47:29,040 --> 00:47:32,160
The density is nonzero
only in this part.

898
00:47:32,160 --> 00:47:37,260
So I need to integrate over x's
going from here to there.

899
00:47:37,260 --> 00:47:39,120
So what's the "here"?

900
00:47:39,120 --> 00:47:42,200
This line goes up with
a slope of 1.

901
00:47:42,200 --> 00:47:45,420
So this is the line
x equals y.

902
00:47:45,420 --> 00:47:49,835
So if I fix y, it means that
my integral starts from a

903
00:47:49,835 --> 00:47:53,670
value of x that is
also equal to y.

904
00:47:53,670 --> 00:47:58,330
So where the integral starts
from is at x equals y.

905
00:47:58,330 --> 00:48:01,770
And it goes all the way until
the end of the length of our

906
00:48:01,770 --> 00:48:03,660
stick, which is l.

907
00:48:03,660 --> 00:48:08,760
So we need to integrate
from little y up to l.

908
00:48:08,760 --> 00:48:12,520
So that's something that
almost always comes up.

909
00:48:12,520 --> 00:48:15,690
It's not enough to have just
this formula for integrating

910
00:48:15,690 --> 00:48:16,640
the joint density.

911
00:48:16,640 --> 00:48:19,160
You need to keep track
of different regions.

912
00:48:19,160 --> 00:48:23,920
And if the joint density is 0
in some regions, then you

913
00:48:23,920 --> 00:48:28,250
exclude those regions from
the range of integration.

914
00:48:28,250 --> 00:48:32,380
So the range of integration is
only over those values where

915
00:48:32,380 --> 00:48:35,600
the particular formula is valid,
the places where the

916
00:48:35,600 --> 00:48:37,990
joint density is nonzero.

917
00:48:37,990 --> 00:48:38,360
All right.

918
00:48:38,360 --> 00:48:41,760
The integral of 1/x dx
gives you a logarithm.

919
00:48:41,760 --> 00:48:45,460
So we evaluate this integral,
and we get an

920
00:48:45,460 --> 00:48:47,410
expression of this kind.

921
00:48:47,410 --> 00:48:53,660
So the density of Y has a
somewhat unexpected shape.

922
00:48:53,660 --> 00:48:55,470
So it's a logarithmic
function.

923
00:48:55,470 --> 00:48:59,860
And it goes this way.

924
00:48:59,860 --> 00:49:02,980
It's for y going all
the way to l.

925
00:49:02,980 --> 00:49:07,860
When y is equal to l, the
logarithm of 1 is equal to 0.

926
00:49:07,860 --> 00:49:12,660
But when y approaches 0,
the logarithm of something big

927
00:49:12,660 --> 00:49:15,740
blows up, and we get a
shape of this form.
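
A Python sketch comparing the closed form f_Y(y) = (1/l) ln(l/y) for 0 < y <= l, which is what integrating 1/(lx) over x from y to l gives, against a direct numerical integration over that same range. The function names are hypothetical. Note that the numerical version integrates only over x in [y, l], the region where the joint density is nonzero, exactly as the lecture stresses.

```python
import math


def marginal_y_closed_form(y: float, l: float = 1.0) -> float:
    """f_Y(y) = (1/l) * ln(l / y) for 0 < y <= l, and 0 otherwise."""
    if 0.0 < y <= l:
        return math.log(l / y) / l
    return 0.0


def marginal_y_numeric(y: float, l: float = 1.0, n: int = 20000) -> float:
    """Integrate out x: the joint density 1/(l*x) is nonzero only for x in [y, l]."""
    h = (l - y) / n
    return sum(1.0 / (l * (y + (i + 0.5) * h)) for i in range(n)) * h


for y in (0.05, 0.25, 0.5, 0.9):
    # numerical marginal vs. closed form; they agree to many decimal places
    print(y, marginal_y_numeric(y), marginal_y_closed_form(y))
```

At y = l the density is ln(1)/l = 0, and it grows without bound as y approaches 0, matching the shape described above.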

928
00:49:15,740 --> 00:49:21,900

929
00:49:21,900 --> 00:49:22,330
OK.

930
00:49:22,330 --> 00:49:25,960
Finally, we can calculate the
expected value of Y. And we

931
00:49:25,960 --> 00:49:29,430
can do this by using the
definition of the expectation.

932
00:49:29,430 --> 00:49:33,300
So it's the integral of y times
the density of Y.

933
00:49:33,300 --> 00:49:36,290
We already found what that
density is, so we

934
00:49:36,290 --> 00:49:38,030
can plug it in here.

935
00:49:38,030 --> 00:49:40,470
And we're integrating over
the range of possible

936
00:49:40,470 --> 00:49:42,470
y's, from 0 to l.

937
00:49:42,470 --> 00:49:46,930
Now this involves the integral
of y log y, which I'm sure

938
00:49:46,930 --> 00:49:49,500
you have encountered in your
calculus classes but maybe do

939
00:49:49,500 --> 00:49:51,350
not remember how to do it.

940
00:49:51,350 --> 00:49:53,650
In any case, you look it
up in some integral

941
00:49:53,650 --> 00:49:55,300
tables or do it by parts.

942
00:49:55,300 --> 00:49:59,360
And you get the final
answer of l/4.
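
One way to carry out the integration by parts mentioned above, taking u = ln(l/y) and dv = y dy. The boundary term vanishes at y = l because ln 1 = 0, and at y = 0 because y^2 ln(l/y) goes to 0 as y goes to 0.

```latex
\begin{aligned}
\mathbf{E}[Y] &= \int_0^l y \cdot \frac{1}{l}\ln\frac{l}{y}\,dy,
  \qquad u = \ln\frac{l}{y},\quad du = -\frac{dy}{y},\quad dv = y\,dy,\quad v = \frac{y^2}{2} \\
&= \frac{1}{l}\left( \left[\frac{y^2}{2}\ln\frac{l}{y}\right]_0^l + \int_0^l \frac{y^2}{2}\cdot\frac{1}{y}\,dy \right) \\
&= \frac{1}{l}\left( 0 + \frac{l^2}{4} \right) = \frac{l}{4}.
\end{aligned}
```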

943
00:49:59,360 --> 00:50:02,400
And at this point, you say,
that's a really simple answer.

944
00:50:02,400 --> 00:50:06,200
Shouldn't I have expected
it to be l/4?

945
00:50:06,200 --> 00:50:07,680
I guess, yes.

946
00:50:07,680 --> 00:50:11,070
I mean, when you break it once,
the expected value of

947
00:50:11,070 --> 00:50:14,220
what you are left with is going
to be 1/2 of what you

948
00:50:14,220 --> 00:50:15,860
started with.

949
00:50:15,860 --> 00:50:19,320
When you break it the next time,
the expected length of

950
00:50:19,320 --> 00:50:23,380
what you're left with should be
1/2 of the piece that you

951
00:50:23,380 --> 00:50:24,550
are now breaking.

952
00:50:24,550 --> 00:50:27,350
So each time that you break it
at random, you expect it to

953
00:50:27,350 --> 00:50:29,840
become smaller by
a factor of 1/2.

954
00:50:29,840 --> 00:50:31,960
So if you break it twice, you
are left with something that's

955
00:50:31,960 --> 00:50:33,940
expected to be l/4.

956
00:50:33,940 --> 00:50:37,350
This is reasoning on the
average, which happens to give

957
00:50:37,350 --> 00:50:39,010
you the right answer
in this case.
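
A Monte Carlo sketch in Python that breaks the stick twice and averages the lengths, assuming the piece kept after each break is the one between 0 and the break point. The function name, seed, and number of trials are arbitrary choices for illustration. One way to see why the averaging argument works here (an observation, not a claim from the lecture) is that E[Y | X] = X/2 is linear in X, so E[Y] = E[E[Y | X]] = E[X/2] = l/4.

```python
import random


def simulate_broken_stick(l: float = 1.0, trials: int = 200_000, seed: int = 0) -> tuple[float, float]:
    """Break a stick of length l at a uniform point and keep the piece [0, X];
    break that piece at a uniform point and keep the piece [0, Y].
    Returns the sample means of X and Y."""
    rng = random.Random(seed)
    sum_x = sum_y = 0.0
    for _ in range(trials):
        x = rng.uniform(0.0, l)   # first break: X ~ Uniform(0, l)
        y = rng.uniform(0.0, x)   # second break: Y | X = x ~ Uniform(0, x)
        sum_x += x
        sum_y += y
    return sum_x / trials, sum_y / trials


mean_x, mean_y = simulate_broken_stick(l=1.0)
print(mean_x)  # about l/2 = 0.5
print(mean_y)  # about l/4 = 0.25
```

The sample means are only approximations; with 200,000 trials they typically land within a few thousandths of l/2 and l/4.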

958
00:50:39,010 --> 00:50:41,800
But again, there's the warning
that reasoning on the average

959
00:50:41,800 --> 00:50:44,230
doesn't always give you
the right answer.

960
00:50:44,230 --> 00:50:48,100
So be careful about doing
arguments of this type.

961
00:50:48,100 --> 00:50:48,620
Very good.

962
00:50:48,620 --> 00:50:49,870
See you on Wednesday.

963
00:50:49,870 --> 00:50:50,870