1 00:00:00,530 --> 00:00:02,960 The following content is provided under a Creative 2 00:00:02,960 --> 00:00:04,370 Commons license. 3 00:00:04,370 --> 00:00:07,410 Your support will help MIT OpenCourseWare continue to 4 00:00:07,410 --> 00:00:11,060 offer high quality educational resources for free. 5 00:00:11,060 --> 00:00:13,960 To make a donation or view additional materials from 6 00:00:13,960 --> 00:00:17,890 hundreds of MIT courses, visit MIT OpenCourseWare at 7 00:00:17,890 --> 00:00:19,140 ocw.mit.edu. 8 00:00:24,010 --> 00:00:26,560 PROFESSOR: I'm going to spend most of time talking about 9 00:00:26,560 --> 00:00:29,400 chapters one, two, and three. 10 00:00:29,400 --> 00:00:32,220 A little bit talking about chapter four, because we've 11 00:00:32,220 --> 00:00:36,370 been doing so much with chapter four in the last 12 00:00:36,370 --> 00:00:39,980 couple of weeks that you probably remember that more. 13 00:00:39,980 --> 00:00:40,580 OK. 14 00:00:40,580 --> 00:00:44,310 The basics, which we started out with, and which you should 15 00:00:44,310 --> 00:00:48,800 never forget, is that any time you develop a probability 16 00:00:48,800 --> 00:00:53,840 model, you've got to specify what the sample space is and 17 00:00:53,840 --> 00:00:57,920 what the probability measure on that sample space is. 18 00:00:57,920 --> 00:01:01,850 And in practice, and in almost everything we've talked about 19 00:01:01,850 --> 00:01:05,800 so far, there's really a basic countable set of random 20 00:01:05,800 --> 00:01:08,490 variables which determine everything else. 21 00:01:08,490 --> 00:01:12,030 In other words, when you find the joint probability 22 00:01:12,030 --> 00:01:16,730 distribution on that set of random variables, that tells 23 00:01:16,730 --> 00:01:20,570 you everything else of interest. 24 00:01:20,570 --> 00:01:25,200 And a sample point or a sample path on that set of random 25 00:01:25,200 --> 00:01:29,520 variables is in a collection of sample values, one sample 26 00:01:29,520 --> 00:01:33,980 value for each random variable. 27 00:01:33,980 --> 00:01:37,740 It's very convenient, especially when you're in an 28 00:01:37,740 --> 00:01:43,630 exam and a little bit rushed, to confuse random variables 29 00:01:43,630 --> 00:01:47,250 with the sample values for the random variables. 30 00:01:47,250 --> 00:01:48,920 And that's fine. 31 00:01:48,920 --> 00:01:51,900 I just want to caution you again, and I've done this many 32 00:01:51,900 --> 00:01:58,410 times, that about half the mistakes that people make-- 33 00:01:58,410 --> 00:02:01,980 half of the conceptual mistakes that people make 34 00:02:01,980 --> 00:02:06,200 doing problems and doing quizzes are connected with 35 00:02:06,200 --> 00:02:09,810 getting confused at some point about what's a random variable 36 00:02:09,810 --> 00:02:12,210 and what's a sample value of that random variable. 37 00:02:12,210 --> 00:02:17,210 And you start thinking about sample values as just numbers. 38 00:02:17,210 --> 00:02:19,090 And I do that too. 39 00:02:19,090 --> 00:02:21,220 It's convenient for thinking about things. 40 00:02:21,220 --> 00:02:26,790 But you have to know that that's not the whole story. 41 00:02:26,790 --> 00:02:29,740 Often, we have uncountable sets of random variables. 
42 00:02:29,740 --> 00:02:34,720 Like in renewal processes, we have the counting renewal 43 00:02:34,720 --> 00:02:38,690 process, which typically has an uncountable set of random 44 00:02:38,690 --> 00:02:43,860 variables, a number of arrivals up to each time, t, 45 00:02:43,860 --> 00:02:48,750 where t is a continuous valued random variable. 46 00:02:48,750 --> 00:02:52,810 But in almost all of those cases, you can define things 47 00:02:52,810 --> 00:02:56,195 in terms of simpler sets of random variables, like the 48 00:02:56,195 --> 00:02:59,480 interarrival times, which are IID. 49 00:03:02,530 --> 00:03:05,960 Most of the processes we've talked about really have a 50 00:03:05,960 --> 00:03:08,600 pretty simple description if you look for the simplest 51 00:03:08,600 --> 00:03:09,850 description of them. 52 00:03:13,730 --> 00:03:17,680 If you have a sequence of IID random variables-- 53 00:03:17,680 --> 00:03:25,270 which is what we have for Poisson and renewal processes, 54 00:03:25,270 --> 00:03:28,680 and what we have for Markov chains is not that much more 55 00:03:28,680 --> 00:03:30,310 complicated-- 56 00:03:30,310 --> 00:03:35,500 the laws of large numbers are useful to specify what the 57 00:03:35,500 --> 00:03:38,500 long term behavior is. 58 00:03:38,500 --> 00:03:47,280 The sample time average, as we all know by now, is the sum 59 00:03:47,280 --> 00:03:49,960 of the random variables divided by n. 60 00:03:49,960 --> 00:03:53,090 So it's a sample average of these quantities. 61 00:03:53,090 --> 00:03:57,570 The random variable, which has a mean x bar, the expected 62 00:03:57,570 --> 00:04:00,140 value of x, that's almost obvious. 63 00:04:00,140 --> 00:04:03,350 You just take the expected value of s sub n, and it's n 64 00:04:03,350 --> 00:04:08,360 times the expected value of x divided by n, and you're done. 65 00:04:08,360 --> 00:04:11,680 And the variance, since these random variables are 66 00:04:11,680 --> 00:04:15,540 independent, you find that almost as easily. 67 00:04:15,540 --> 00:04:18,810 That has this very simple-minded 68 00:04:18,810 --> 00:04:20,850 distribution function. 69 00:04:20,850 --> 00:04:24,340 Remember, we usually work with distribution 70 00:04:24,340 --> 00:04:26,960 functions in this class. 71 00:04:26,960 --> 00:04:32,580 And often, the exercises are much easier when you do them 72 00:04:32,580 --> 00:04:36,500 in terms of the distribution function than if you use 73 00:04:36,500 --> 00:04:40,760 formulas you remember from elementary courses, which are 74 00:04:40,760 --> 00:04:44,260 specialized to-- 75 00:04:44,260 --> 00:04:47,140 which are specialized to probability density and 76 00:04:47,140 --> 00:04:51,170 probability mass functions, and often have more special 77 00:04:51,170 --> 00:04:53,110 conditions on them than that. 78 00:04:53,110 --> 00:04:57,470 But anyway, the distribution function starts 79 00:04:57,470 --> 00:04:58,570 to look like this. 80 00:04:58,570 --> 00:05:03,250 As n gets bigger, you notice that what's happening is that 81 00:05:03,250 --> 00:05:08,860 you get a distribution which is scrunching in this way, 82 00:05:08,860 --> 00:05:10,820 which is starting to look smoother. 83 00:05:10,820 --> 00:05:13,450 The jumps in it get smaller. 84 00:05:13,450 --> 00:05:18,630 And you start out with this thing which is kind of crazy. 85 00:05:18,630 --> 00:05:21,370 And by the time n is even 50.
86 00:05:21,370 --> 00:05:25,770 You get something which almost looks like a-- 87 00:05:25,770 --> 00:05:26,840 I don't know how we tell the difference 88 00:05:26,840 --> 00:05:28,460 between those two things. 89 00:05:28,460 --> 00:05:30,060 I thought we could, but we can't. 90 00:05:30,060 --> 00:05:31,670 I certainly can't up there. 91 00:05:31,670 --> 00:05:37,650 But anyway, the one that's tightest in is the one 92 00:05:37,650 --> 00:05:39,880 for n equals 50. 93 00:05:39,880 --> 00:05:44,150 And what these laws of large numbers all say in some sense 94 00:05:44,150 --> 00:05:51,380 is that this distribution function gets crunched in 95 00:05:51,380 --> 00:05:54,550 towards an impulse at the mean. 96 00:05:54,550 --> 00:05:58,260 And then they say other more specialized things about how 97 00:05:58,260 --> 00:06:02,580 this happens, about sample paths and all of that. 98 00:06:02,580 --> 00:06:06,270 But the idea is that this distribution function is 99 00:06:06,270 --> 00:06:10,760 heading towards a unit impulse. 100 00:06:10,760 --> 00:06:14,440 The weak law of large numbers then says that if the expected 101 00:06:14,440 --> 00:06:18,840 value of the magnitude of x is less than infinity-- 102 00:06:18,840 --> 00:06:21,660 and usually when we talk about random variables having a 103 00:06:21,660 --> 00:06:25,630 mean, that's exactly what we mean. 104 00:06:25,630 --> 00:06:31,220 If that condition is not satisfied, then we usually say 105 00:06:31,220 --> 00:06:33,690 that the random variable doesn't have a mean. 106 00:06:33,690 --> 00:06:37,300 And you'll see that every time you look at anything in 107 00:06:37,300 --> 00:06:38,520 probability theory. 108 00:06:38,520 --> 00:06:41,940 When people say the mean exists, that's what they 109 00:06:41,940 --> 00:06:43,830 always mean. 110 00:06:43,830 --> 00:06:47,950 And what the theorem says then is exactly what we were 111 00:06:47,950 --> 00:06:49,060 talking about before. 112 00:06:49,060 --> 00:06:54,940 The probability that the difference between s n over n, 113 00:06:54,940 --> 00:06:58,570 and the mean of x bar, the probability that it's greater 114 00:06:58,570 --> 00:07:03,090 than or equal to epsilon equals 0 in the limit. 115 00:07:03,090 --> 00:07:06,020 So it's saying that you put epsilon limits on that 116 00:07:06,020 --> 00:07:10,860 distribution function and let n get bigger and bigger, it 117 00:07:10,860 --> 00:07:14,570 goes to 1 and 0. 118 00:07:14,570 --> 00:07:18,120 It says the probability of s n over n, less than or equal to 119 00:07:18,120 --> 00:07:23,240 x, approaches a unit step as n approaches infinity. 120 00:07:23,240 --> 00:07:27,660 This says this is the condition for convergence in 121 00:07:27,660 --> 00:07:30,440 probability. 122 00:07:30,440 --> 00:07:33,880 What we're saying is that that also means convergence and 123 00:07:33,880 --> 00:07:38,740 distribution function, and distribution for this case. 124 00:07:38,740 --> 00:07:42,520 And then we also, when we got to renewal processes, we 125 00:07:42,520 --> 00:07:45,330 talked about the strong law of large numbers. 126 00:07:45,330 --> 00:07:49,760 And that says that the expected value of x is finite. 127 00:07:49,760 --> 00:07:56,630 Then this limit approaches x on a sample path basis. 128 00:07:56,630 --> 00:07:59,770 In other words, for every sample path, except this set 129 00:07:59,770 --> 00:08:05,020 of probability 0, this condition holds true. 
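A minimal simulation sketch of the convergence just described, assuming IID exponential random variables with mean 1; the distribution, epsilon, and sample sizes are arbitrary illustrative choices, not from the lecture. It estimates the probability that the sample average is at least epsilon away from the mean, for increasing n:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, trials = 0.1, 5000
for n in [10, 50, 250, 1000]:
    # sample averages S_n / n for IID exponential(1) random variables (mean 1)
    sample_avgs = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)
    prob = np.mean(np.abs(sample_avgs - 1.0) >= eps)
    print(f"n={n:5d}  estimated P(|S_n/n - 1| >= {eps}) = {prob:.4f}")
```

The estimated probability shrinks toward 0 as n grows, which is the weak-law statement seen through the distribution function tightening around the mean.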
130 00:08:05,020 --> 00:08:08,260 That doesn't seem like it's very different or very 131 00:08:08,260 --> 00:08:10,610 important for the time being. 132 00:08:10,610 --> 00:08:14,060 But when we started studying renewal processes, which is 133 00:08:14,060 --> 00:08:19,120 where we actually talked about this, we saw that in fact, it 134 00:08:19,120 --> 00:08:24,830 let us talk about this, which says that if you take any 135 00:08:24,830 --> 00:08:28,700 function of s n over n-- 136 00:08:28,700 --> 00:08:31,590 in other words, a function of a real value-- 137 00:08:31,590 --> 00:08:33,830 a function of a-- 138 00:08:33,830 --> 00:08:35,720 a real valued function of a-- 139 00:08:40,010 --> 00:08:43,570 a real valued function of a real value, yes. 140 00:08:43,570 --> 00:08:46,470 What you get is that same function 141 00:08:46,470 --> 00:08:49,100 applied to the mean here. 142 00:08:49,100 --> 00:08:50,260 And that's the thing which is so 143 00:08:50,260 --> 00:08:52,630 useful for renewal processes. 144 00:08:52,630 --> 00:08:55,740 And it's what usually makes the strong law of large 145 00:08:55,740 --> 00:08:58,730 numbers so much easier to use than the weak law. 146 00:09:04,220 --> 00:09:06,170 That's a plug for the strong law. 147 00:09:06,170 --> 00:09:08,745 There are many extensions of the weak law telling how fast 148 00:09:08,745 --> 00:09:10,910 the convergence is. 149 00:09:10,910 --> 00:09:14,350 One thing you should always remember about the central 150 00:09:14,350 --> 00:09:17,510 limit theorem, is it really tells you something about the 151 00:09:17,510 --> 00:09:18,790 weak law of large numbers. 152 00:09:18,790 --> 00:09:22,260 It tells you how fast that convergence is and what the 153 00:09:22,260 --> 00:09:24,720 convergence looks like. 154 00:09:24,720 --> 00:09:28,170 It says that if the variance of this underlying random 155 00:09:28,170 --> 00:09:34,000 variable is finite, then this limit here is equal to the 156 00:09:34,000 --> 00:09:37,290 normal distribution function, the Gaussian with 157 00:09:37,290 --> 00:09:41,350 variance 1 and mean 0. 158 00:09:41,350 --> 00:09:45,070 And that becomes a little easier to see what it's saying 159 00:09:45,070 --> 00:09:46,870 if you look at it this way. 160 00:09:46,870 --> 00:09:51,510 It says probability that s n over n minus x bar-- 161 00:09:51,510 --> 00:09:56,890 namely the difference between the sum and the mean which 162 00:09:56,890 --> 00:09:58,380 it's converging to-- 163 00:09:58,380 --> 00:10:01,340 the probability that that's less than or equal to y sigma 164 00:10:01,340 --> 00:10:04,010 over square root of n is this normal 165 00:10:04,010 --> 00:10:05,480 Gaussian distribution function. 166 00:10:05,480 --> 00:10:11,740 It says that as n gets bigger and bigger, this quantity here 167 00:10:11,740 --> 00:10:13,030 gets tighter and tighter. 168 00:10:13,030 --> 00:10:18,620 What it says in terms of the picture here, in terms of this 169 00:10:18,620 --> 00:10:22,900 picture, it says that as n gets bigger and bigger, this 170 00:10:22,900 --> 00:10:28,560 picture here scrunches down as 1 over the square root of n. 171 00:10:28,560 --> 00:10:30,970 And it also becomes Gaussian. 172 00:10:30,970 --> 00:10:33,760 It tells you exactly what kind of convergence you 173 00:10:33,760 --> 00:10:34,770 actually have here. 174 00:10:34,770 --> 00:10:39,200 It's not only saying that this does converge to a unit step. 175 00:10:39,200 --> 00:10:42,010 It says how it converges.
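A rough numerical check of that central limit theorem statement, assuming IID uniform(0,1) random variables (mean 1/2, variance 1/12); the distribution, n, and the y values below are illustrative choices only. The normalized quantity (S_n/n - x bar) times square root of n over sigma should have a distribution function close to the standard normal:

```python
import numpy as np
from math import erf, sqrt

def phi(y):
    """Standard normal distribution function Phi(y)."""
    return 0.5 * (1 + erf(y / sqrt(2)))

rng = np.random.default_rng(0)
n, trials = 200, 20000
x = rng.random((trials, n))                       # IID uniform(0,1) samples
sigma = np.sqrt(1 / 12)
z = (x.mean(axis=1) - 0.5) * np.sqrt(n) / sigma   # normalized sample averages
for y in [-1.0, 0.0, 1.0]:
    print(f"y={y:+.1f}  empirical P(Z <= y)={np.mean(z <= y):.4f}  Phi(y)={phi(y):.4f}")
```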
176 00:10:42,010 --> 00:10:48,240 And that's a nice thing, conceptually. 177 00:10:48,240 --> 00:10:51,780 You don't always need it in problems. 178 00:10:51,780 --> 00:10:54,600 But you need it for understanding what's going on. 179 00:10:59,890 --> 00:11:01,690 We're moving backwards, it seems. 180 00:11:06,180 --> 00:11:09,420 Now, 1, 2, Poisson processes. 181 00:11:09,420 --> 00:11:12,630 We talked about arrival processes. 182 00:11:12,630 --> 00:11:15,260 You'd almost think that all processes are arrival 183 00:11:15,260 --> 00:11:17,080 processes at this point. 184 00:11:17,080 --> 00:11:19,770 But any time you start to think about that, think of a 185 00:11:19,770 --> 00:11:21,270 Markov chain. 186 00:11:21,270 --> 00:11:26,150 And a Markov chain is not an arrival process, ordinarily. 187 00:11:26,150 --> 00:11:28,470 Some of them can be viewed that way. 188 00:11:28,470 --> 00:11:29,690 But most of them can't. 189 00:11:29,690 --> 00:11:31,990 An arrival processes is an increasing 190 00:11:31,990 --> 00:11:34,650 sequence of random variables. 191 00:11:34,650 --> 00:11:40,020 0 less than s1, which is the time of the first arrival, s2, 192 00:11:40,020 --> 00:11:42,810 which is a time of the second arrival, and so forth. 193 00:11:42,810 --> 00:11:48,220 Interarrival times are x1 equals s1, and x i equals s i 194 00:11:48,220 --> 00:11:51,150 minus s i minus 1. 195 00:11:51,150 --> 00:11:55,480 The picture, which you should have indelibly printed on the 196 00:11:55,480 --> 00:11:58,850 back of your brain someplace by this time, is 197 00:11:58,850 --> 00:12:00,430 this picture here. 198 00:12:00,430 --> 00:12:04,930 s1, s2, s3, are the times at which arrivals occur. 199 00:12:04,930 --> 00:12:07,590 These are random variables, so these arrivals come in at 200 00:12:07,590 --> 00:12:09,320 random times. 201 00:12:09,320 --> 00:12:14,690 x1, x2, x3 are the intervals between arrivals. 202 00:12:14,690 --> 00:12:18,280 And N of t is the number of arrivals that have occurred up 203 00:12:18,280 --> 00:12:19,860 until time t. 204 00:12:19,860 --> 00:12:26,800 So every time the t passes one of these arrival times, N of t 205 00:12:26,800 --> 00:12:31,140 pops up by one, pops up by one again, pops up by one again. 206 00:12:31,140 --> 00:12:34,200 The sample value pops up by one. 207 00:12:34,200 --> 00:12:36,920 Arrival process can model arrivals to a queue, 208 00:12:36,920 --> 00:12:40,320 departures from a queue, locations of breaks in an oil 209 00:12:40,320 --> 00:12:43,960 line, an enormous number of things. 210 00:12:43,960 --> 00:12:46,260 It's not just arrivals we're talking about. 211 00:12:46,260 --> 00:12:48,070 It's all of these other things, also. 212 00:12:48,070 --> 00:12:54,330 But it's something laid out on a one-dimensional axis where 213 00:12:54,330 --> 00:12:58,390 things happen at various places on that 214 00:12:58,390 --> 00:12:59,700 one-dimensional axis. 215 00:12:59,700 --> 00:13:05,100 So that's the way to view it. 216 00:13:05,100 --> 00:13:07,540 OK, same picture again. 217 00:13:07,540 --> 00:13:11,510 Process can be specified by the joint distribution of the 218 00:13:11,510 --> 00:13:15,570 arrival epochs or the interarrival times, and, in 219 00:13:15,570 --> 00:13:18,090 fact, of the counting process. 
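A small sketch of that equivalence between the three descriptions, assuming exponential interarrival times with an arbitrarily chosen rate: the arrival epochs are the cumulative sums of the interarrival times, and the counting process N(t) is just the number of epochs at or below t.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                   # arrival rate, chosen only for illustration
x = rng.exponential(scale=1/lam, size=20)   # interarrival times X_1, X_2, ...
s = np.cumsum(x)                            # arrival epochs S_n = X_1 + ... + X_n

def N(t, arrivals=s):
    """Counting process: number of arrivals in (0, t]."""
    return int(np.searchsorted(arrivals, t, side="right"))

for t in [0.5, 1.0, 2.0, 4.0]:
    print(f"N({t}) = {N(t)}")

# going back the other way: interarrival times recovered from the epochs
print(np.allclose(np.diff(s, prepend=0.0), x))
```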
220 00:13:18,090 --> 00:13:25,200 If you see a sample path of the counting process, then 221 00:13:25,200 --> 00:13:29,180 from that you can determine the sample path of the arrival 222 00:13:29,180 --> 00:13:33,220 times and the sample path of the interarrival times. 223 00:13:33,220 --> 00:13:38,320 And since any set of these random variables specifies all 224 00:13:38,320 --> 00:13:43,220 three of these things, the three are all equivalent. 225 00:13:43,220 --> 00:13:47,150 OK, we have this important condition here. 226 00:13:47,150 --> 00:13:55,960 And I always sort of forget this, but when these arrivals 227 00:13:55,960 --> 00:13:59,700 are highly delayed, when there's a long period of time 228 00:13:59,700 --> 00:14:05,380 between each arrival, what that says is the counting 229 00:14:05,380 --> 00:14:08,480 process is getting small. 230 00:14:08,480 --> 00:14:12,570 So big interarrival times correspond to a small 231 00:14:12,570 --> 00:14:14,180 value of N of t. 232 00:14:14,180 --> 00:14:16,420 And you can see that in the picture here. 233 00:14:16,420 --> 00:14:20,020 If you spread out these arrivals, you make s1 all the 234 00:14:20,020 --> 00:14:21,290 way out here. 235 00:14:21,290 --> 00:14:26,190 Then N of t doesn't become 1 until way out here. 236 00:14:26,190 --> 00:14:32,930 So N of t as a function of t is getting smaller as s sub n 237 00:14:32,930 --> 00:14:36,030 is getting larger. 238 00:14:36,030 --> 00:14:41,560 S sub n is the minimum of the set of t, such that N of t is 239 00:14:41,560 --> 00:14:45,830 greater than or equal to n. Sounds like an unpleasantly 240 00:14:45,830 --> 00:14:49,460 complicated expression. 241 00:14:49,460 --> 00:14:52,210 If any of you can find a simpler way to say it than 242 00:14:52,210 --> 00:14:55,950 that, I would be absolutely delighted to hear it. 243 00:14:55,950 --> 00:14:57,530 But I don't think there is. 244 00:14:57,530 --> 00:15:01,150 I think the simpler way to say it is this picture here. 245 00:15:01,150 --> 00:15:03,230 And the picture says it. 246 00:15:03,230 --> 00:15:08,770 And you can sort of figure out all those logical statements 247 00:15:08,770 --> 00:15:11,670 from the picture, which is intuitively a 248 00:15:11,670 --> 00:15:12,942 lot clearer, I think. 249 00:15:17,270 --> 00:15:23,380 So now, a renewal process is an arrival process with IID 250 00:15:23,380 --> 00:15:25,100 interarrival times. 251 00:15:25,100 --> 00:15:28,800 And a Poisson process is a renewal process where the 252 00:15:28,800 --> 00:15:32,130 interarrival random variables are exponential. 253 00:15:32,130 --> 00:15:35,290 So, a Poisson process is a special 254 00:15:35,290 --> 00:15:37,200 case of a renewal process. 255 00:15:37,200 --> 00:15:40,920 Why are these exponential interarrival 256 00:15:40,920 --> 00:15:43,350 times so important? 257 00:15:43,350 --> 00:15:46,550 Well, it's because they're memoryless. 258 00:15:46,550 --> 00:15:50,360 And the memoryless property says that the probability that 259 00:15:50,360 --> 00:15:54,535 x is greater than t plus x is equal to the probability that 260 00:15:54,535 --> 00:15:58,190 it's greater than x times the probability that it's greater 261 00:15:58,190 --> 00:16:01,830 than t for all x and t greater than or equal to 0. 262 00:16:01,830 --> 00:16:04,860 This makes better sense if you say it conditionally.
263 00:16:04,860 --> 00:16:09,040 The probability that x is greater than t plus x, given 264 00:16:09,040 --> 00:16:12,700 that it's greater than t, is the same as the probability 265 00:16:12,700 --> 00:16:14,800 that x is greater that-- 266 00:16:14,800 --> 00:16:17,460 capital X is greater than little x. 267 00:16:17,460 --> 00:16:20,420 This really gives you the memoryless 268 00:16:20,420 --> 00:16:21,780 property in a nutshell. 269 00:16:21,780 --> 00:16:25,860 It says if you're looking at this process as it evolves, 270 00:16:25,860 --> 00:16:29,010 and you see an arrival, and then you start looking for the 271 00:16:29,010 --> 00:16:32,160 next arrival, it says that no matter how long you've been 272 00:16:32,160 --> 00:16:36,240 looking, the distribution function, as the time to wait 273 00:16:36,240 --> 00:16:38,930 until the next arrival, is the same 274 00:16:38,930 --> 00:16:40,580 exponential random variable. 275 00:16:40,580 --> 00:16:44,220 So you never gain anything by waiting. 276 00:16:44,220 --> 00:16:46,390 You might as well be impatient. 277 00:16:46,390 --> 00:16:48,790 But it doesn't do any good to be impatient. 278 00:16:48,790 --> 00:16:51,130 Doesn't to any good to wait. 279 00:16:51,130 --> 00:16:52,850 It doesn't do any good to not wait. 280 00:16:52,850 --> 00:16:56,280 No matter what you do, this damn thing always takes an 281 00:16:56,280 --> 00:16:59,780 exponential amount of time to occur. 282 00:16:59,780 --> 00:17:01,410 OK, that's what it means to be memoryless. 283 00:17:01,410 --> 00:17:03,910 And the exponential is the only 284 00:17:03,910 --> 00:17:05,835 memoryless random variable. 285 00:17:10,775 --> 00:17:14,910 How about a geometric random variable? 286 00:17:14,910 --> 00:17:19,190 The geometric random variable is memoryless if you're only 287 00:17:19,190 --> 00:17:22,150 looking at integer times. 288 00:17:22,150 --> 00:17:32,180 Here we're talking about times on a continuum. 289 00:17:32,180 --> 00:17:35,090 That's what this says. 290 00:17:35,090 --> 00:17:38,410 Well, that's what this says. 291 00:17:38,410 --> 00:17:46,590 And if you look at discrete times, then a geometric random 292 00:17:46,590 --> 00:17:49,860 variable is memoryless also. 293 00:17:55,020 --> 00:17:58,210 We're given a Poisson of rate lambda. 294 00:17:58,210 --> 00:18:01,290 The interval from any given t greater than 0 until the first 295 00:18:01,290 --> 00:18:04,190 arrival after t is a random variable. 296 00:18:04,190 --> 00:18:06,010 Let's call it z1. 297 00:18:06,010 --> 00:18:08,650 We already said that that random variable was 298 00:18:08,650 --> 00:18:11,430 exponential. 299 00:18:11,430 --> 00:18:17,040 And it's independent of all arrivals which occur before 300 00:18:17,040 --> 00:18:18,630 that starting time t. 301 00:18:18,630 --> 00:18:23,220 So looking at any starting time t, doesn't make any 302 00:18:23,220 --> 00:18:25,530 difference what has happened back here. 303 00:18:25,530 --> 00:18:27,450 That's not only the last arrival, but 304 00:18:27,450 --> 00:18:29,630 all the other arrivals. 305 00:18:29,630 --> 00:18:32,880 The time until the next arrival is exponential. 
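A quick numerical check of the memoryless property for the exponential, with an arbitrarily chosen rate and arbitrarily chosen values of t and x; the two estimated probabilities should agree:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.5                                                    # rate chosen for illustration
samples = rng.exponential(scale=1/lam, size=1_000_000)
t, x = 0.8, 0.5
p_cond = np.mean(samples > t + x) / np.mean(samples > t)     # P(X > t+x | X > t)
p_plain = np.mean(samples > x)                               # P(X > x)
print(f"P(X > t+x | X > t) = {p_cond:.4f},  P(X > x) = {p_plain:.4f}")
```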
306 00:18:32,880 --> 00:18:36,520 The time until each arrival after that is exponential 307 00:18:36,520 --> 00:18:41,690 also, which says that if you look at this process starting 308 00:18:41,690 --> 00:18:47,250 at time t, it's a Poisson process again, where all the 309 00:18:47,250 --> 00:18:50,450 times have to be shifted, of course, but it's a Poisson 310 00:18:50,450 --> 00:18:52,830 process starting at time t. 311 00:18:52,830 --> 00:19:00,570 The corresponding counting process, we can call it n 312 00:19:00,570 --> 00:19:04,950 tilde of t and tau, where tau is greater than or equal to t, 313 00:19:04,950 --> 00:19:09,690 where this is the number of arrivals in the original 314 00:19:09,690 --> 00:19:14,610 process up until time tau minus the number of arrivals 315 00:19:14,610 --> 00:19:16,340 up until time t. 316 00:19:16,340 --> 00:19:19,330 If you look at that difference, so many arrivals 317 00:19:19,330 --> 00:19:26,550 up until t, so many more up until time tau. 318 00:19:26,550 --> 00:19:29,030 You look at the difference between tau and t. 319 00:19:29,030 --> 00:19:37,080 The number of arrivals in that interval is the same Poisson 320 00:19:37,080 --> 00:19:39,800 distributing random variable again. 321 00:19:39,800 --> 00:19:43,080 So, it has the same distribution as N 322 00:19:43,080 --> 00:19:45,020 of tau minus t. 323 00:19:45,020 --> 00:19:47,650 And that's called the stationary increment property. 324 00:19:47,650 --> 00:19:50,720 It says that no matter where you start a Poisson process, 325 00:19:50,720 --> 00:19:53,030 it always looks exactly the same. 326 00:19:53,030 --> 00:19:58,370 It says that if you wait for one hour and start then, it's 327 00:19:58,370 --> 00:20:01,750 exactly the same as what it was before. 328 00:20:01,750 --> 00:20:05,960 If we had Poisson processes in the world, it wouldn't do any 329 00:20:05,960 --> 00:20:09,720 good to travel on certain days rather than other days. 330 00:20:09,720 --> 00:20:13,170 It wouldn't do any good to leave to drive home at one 331 00:20:13,170 --> 00:20:14,850 hour rather than another hour. 332 00:20:14,850 --> 00:20:17,670 You'd have the same travel all the time. 333 00:20:17,670 --> 00:20:18,980 It's all equal. 334 00:20:18,980 --> 00:20:21,140 It would be an awful world if it were stationary. 335 00:20:23,770 --> 00:20:26,750 The independent increment properties for counting 336 00:20:26,750 --> 00:20:33,170 process is that for all sequences of ordered times-- 337 00:20:33,170 --> 00:20:37,490 0 less than t1 less than t2 up to t k-- 338 00:20:37,490 --> 00:20:40,310 the random variables n of t1-- 339 00:20:40,310 --> 00:20:44,440 and now we're talking about the number of arrivals between 340 00:20:44,440 --> 00:20:47,510 t1 and t2, the number of arrivals between 341 00:20:47,510 --> 00:20:49,600 n minus 1 and tn. 342 00:20:49,600 --> 00:20:52,330 These are all independent of each other. 343 00:20:52,330 --> 00:20:55,390 That's what this independent increment property says. 344 00:20:55,390 --> 00:20:58,110 And we see from what we've said about this memoryless 345 00:20:58,110 --> 00:21:02,680 property that the Poisson process does indeed have this 346 00:21:02,680 --> 00:21:04,750 independent increment property. 347 00:21:04,750 --> 00:21:08,720 Poisson processes have both the stationary and independent 348 00:21:08,720 --> 00:21:11,240 increment properties. 349 00:21:11,240 --> 00:21:15,760 And this looks like an immediate consequence of that. 350 00:21:15,760 --> 00:21:16,370 It's not. 
351 00:21:16,370 --> 00:21:19,630 Remember, we had to struggle with this for a bit. 352 00:21:19,630 --> 00:21:22,500 But it says plus Poisson processes can be defined by 353 00:21:22,500 --> 00:21:26,450 the stationary and independent increment properties, plus 354 00:21:26,450 --> 00:21:32,730 either the Poisson PMF for N of t, or this incremental 355 00:21:32,730 --> 00:21:38,660 property, the probability that N tilde of t and t plus delta, 356 00:21:38,660 --> 00:21:43,320 and the number of arrivals between t and t plus delta, 357 00:21:43,320 --> 00:21:46,170 the probability that that's 1 is equal to 358 00:21:46,170 --> 00:21:47,600 lambda times delta. 359 00:21:47,600 --> 00:21:53,040 In other words, this view of a Poisson process is the view 360 00:21:53,040 --> 00:21:56,850 that you get when you sort of forget about time. 361 00:21:56,850 --> 00:22:00,220 And you think of arrivals from outer space coming down and 362 00:22:00,220 --> 00:22:01,470 hitting on a line. 363 00:22:01,470 --> 00:22:03,760 And they hit on that line randomly. 364 00:22:03,760 --> 00:22:05,860 And each one of them is independent 365 00:22:05,860 --> 00:22:07,780 of every other one. 366 00:22:07,780 --> 00:22:15,350 And that's what you get if you wind up with a density of 367 00:22:15,350 --> 00:22:18,770 lambda arrivals per unit time. 368 00:22:18,770 --> 00:22:22,120 OK, we talked about all of that, of course. 369 00:22:22,120 --> 00:22:23,400 The probability distributions-- 370 00:22:26,050 --> 00:22:29,380 there are many of them for a Poisson process. 371 00:22:29,380 --> 00:22:32,470 The Poisson process is remarkable in the sense that 372 00:22:32,470 --> 00:22:35,320 anything you want to find, there's generally a simple 373 00:22:35,320 --> 00:22:37,070 formula for it. 374 00:22:37,070 --> 00:22:39,530 If it's complicated, you're probably not looking at 375 00:22:39,530 --> 00:22:42,010 it the right way. 376 00:22:42,010 --> 00:22:45,360 So many things come out very, very simply. 377 00:22:45,360 --> 00:22:46,660 The probability-- 378 00:22:46,660 --> 00:22:50,580 the joint probability distribution of all of the 379 00:22:50,580 --> 00:22:58,670 arrival times up until time N is an exponential just in the 380 00:22:58,670 --> 00:23:05,080 last one, which says that the intermediate arrival epochs 381 00:23:05,080 --> 00:23:09,140 are equally likely to be anywhere, just as long as they 382 00:23:09,140 --> 00:23:13,440 satisfy this ordering restriction, s1 less than s2. 383 00:23:13,440 --> 00:23:15,430 That's what this formula says. 384 00:23:15,430 --> 00:23:20,490 It says that the joint density of these arrival times doesn't 385 00:23:20,490 --> 00:23:23,010 depend on anything except the time of the last one. 386 00:23:25,740 --> 00:23:28,520 But it does depend on the fact that they're [INAUDIBLE]. 387 00:23:28,520 --> 00:23:31,435 From that, you can find virtually everything else if 388 00:23:31,435 --> 00:23:32,900 you want to. 389 00:23:32,900 --> 00:23:36,600 That really is saying exactly the same thing as we were just 390 00:23:36,600 --> 00:23:38,440 saying a while ago. 391 00:23:38,440 --> 00:23:41,740 This is the viewpoint of looking at this line from 392 00:23:41,740 --> 00:23:47,040 outer space with arrivals coming in, coming in uniformly 393 00:23:47,040 --> 00:23:51,630 distributed over this line interval, and each of them 394 00:23:51,630 --> 00:23:54,080 independent of each other one. 395 00:23:54,080 --> 00:23:57,740 That's what you wind up saying. 
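One way to make the "arrivals thrown down uniformly on a line" picture concrete is to build the counts over a sub-interval two ways and compare: once from exponential interarrival times, and once by drawing a Poisson number of points placed uniformly on the interval. The rate, horizon, and sub-interval below are arbitrary illustrative choices, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T, trials = 1.0, 10.0, 20000

# (a) exponential interarrival times: count arrivals in (0, T/2]
def count_via_interarrivals():
    t, count = 0.0, 0
    while True:
        t += rng.exponential(1/lam)
        if t > T/2:
            return count
        count += 1

counts_a = np.array([count_via_interarrivals() for _ in range(trials)])

# (b) Poisson number of uniform "darts" on [0, T]: count those landing in (0, T/2]
n_total = rng.poisson(lam * T, size=trials)
counts_b = np.array([np.sum(rng.random(n) <= 0.5) for n in n_total])

print("mean, variance via interarrivals :", counts_a.mean(), counts_a.var())
print("mean, variance via uniform darts :", counts_b.mean(), counts_b.var())
```

Both constructions give counts with mean and variance near lambda times T/2, as a Poisson random variable should have.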
396 00:23:57,740 --> 00:24:01,490 This density, then, of the n-th arrival, if you just 397 00:24:01,490 --> 00:24:05,620 integrate all this stuff, you get the Erlang formula. 398 00:24:05,620 --> 00:24:12,940 Probability of arrival n in t to t plus delta is-- 399 00:24:12,940 --> 00:24:17,820 now this is the derivation that we went through before, 400 00:24:17,820 --> 00:24:20,310 going from Erlang to Poisson. 401 00:24:20,310 --> 00:24:24,370 You can go from Poisson to Erlang too, if you want to. 402 00:24:24,370 --> 00:24:26,320 But it's a little easier to go this way. 403 00:24:26,320 --> 00:24:30,500 The probability of arrival in t to t plus delta is the 404 00:24:30,500 --> 00:24:35,890 probability that n of t is equal to n minus 1 times 405 00:24:35,890 --> 00:24:40,670 lambda delta plus an o of delta, of course. 406 00:24:40,670 --> 00:24:46,270 And the probability that n of t is equal to n minus 1 from 407 00:24:46,270 --> 00:24:53,050 this formula here is going to be the density of when s sub n 408 00:24:53,050 --> 00:24:55,040 appears, divided by lambda. 409 00:24:55,040 --> 00:24:58,910 That's exactly what this formula here says. 410 00:24:58,910 --> 00:25:01,980 So that's just the Poisson distribution. 411 00:25:01,980 --> 00:25:04,910 We've been through that derivation. 412 00:25:04,910 --> 00:25:08,420 It's almost a derivation worth remembering, because it just 413 00:25:08,420 --> 00:25:11,940 appears so often. 414 00:25:11,940 --> 00:25:16,160 As you've seen from the problem sets we've done, 415 00:25:16,160 --> 00:25:20,970 almost every problem you can dream of, dealing with Poisson 416 00:25:20,970 --> 00:25:27,150 processes, the easy way to do them comes from this property 417 00:25:27,150 --> 00:25:30,730 of combining and splitting Poisson processes. 418 00:25:30,730 --> 00:25:35,170 It says if n1 of t, n2 of t, up to n sub k of t are 419 00:25:35,170 --> 00:25:37,500 independent Poisson processes-- 420 00:25:37,500 --> 00:25:39,880 what do you mean by a process being 421 00:25:39,880 --> 00:25:42,200 independent of another process? 422 00:25:42,200 --> 00:25:46,660 Well, the process is specified by the interarrival times for 423 00:25:46,660 --> 00:25:47,660 that process. 424 00:25:47,660 --> 00:25:50,950 So what we're saying here is the interarrival times for the 425 00:25:50,950 --> 00:25:54,470 first process are independent of the interarrival times of 426 00:25:54,470 --> 00:25:56,770 the second process, independent of the 427 00:25:56,770 --> 00:26:00,620 interarrival times for the third process, and so forth. 428 00:26:00,620 --> 00:26:02,990 Again, this is a view of someone from outer space, 429 00:26:02,990 --> 00:26:06,180 throwing darts onto a line. 430 00:26:06,180 --> 00:26:09,750 And if you have multiple people throwing darts on a 431 00:26:09,750 --> 00:26:13,450 line, but they're all equally distributed, all uniformly 432 00:26:13,450 --> 00:26:16,600 distributed over the line, this is exactly 433 00:26:16,600 --> 00:26:20,670 the model you get. 434 00:26:20,670 --> 00:26:22,180 So we have two views here. 435 00:26:22,180 --> 00:26:26,480 The first one is to look at the arrival epochs that's 436 00:26:26,480 --> 00:26:28,420 generated from each process. 437 00:26:28,420 --> 00:26:31,710 And then combine all arrivals into one Poisson process. 438 00:26:31,710 --> 00:26:34,900 So we look at all these Poisson processes, and then 439 00:26:34,900 --> 00:26:38,340 take the sum of them, and we get a Poisson process. 
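A sketch of that combining direction, assuming a few arbitrary rates: merge the arrival epochs of independent Poisson processes and check that the interarrival times of the merged process have mean one over the sum of the rates, as a Poisson process of the combined rate should.

```python
import numpy as np

rng = np.random.default_rng(0)
rates = [0.5, 1.0, 2.0]          # rates of the independent processes (assumed for illustration)
T = 2000.0

def arrival_epochs(lam, T):
    """Arrival epochs of a Poisson process of rate lam, restricted to (0, T]."""
    epochs = np.cumsum(rng.exponential(1/lam, size=int(3 * lam * T) + 50))
    return epochs[epochs <= T]

merged = np.sort(np.concatenate([arrival_epochs(lam, T) for lam in rates]))
gaps = np.diff(merged)
print("mean interarrival of merged process:", gaps.mean())
print("1 / (lambda_1 + ... + lambda_k)     :", 1 / sum(rates))
```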
440 00:26:38,340 --> 00:26:40,190 The other way to look at it-- 441 00:26:40,190 --> 00:26:43,120 and going back and forth between these two views is the 442 00:26:43,120 --> 00:26:45,060 way you solve problems-- 443 00:26:45,060 --> 00:26:46,770 you look at the combined sequence of 444 00:26:46,770 --> 00:26:48,900 arrival epochs first. 445 00:26:48,900 --> 00:26:52,400 And then for each arrival that comes in, you think of an IID 446 00:26:52,400 --> 00:26:55,450 random variable independent of all the other random 447 00:26:55,450 --> 00:27:02,860 variables, which decides for each arrival which of the 448 00:27:02,860 --> 00:27:04,710 sub-processes it goes to. 449 00:27:04,710 --> 00:27:08,680 So there's this hidden process-- 450 00:27:08,680 --> 00:27:09,890 well, it's not hidden. 451 00:27:09,890 --> 00:27:12,100 You can see what it's doing from looking at all the 452 00:27:12,100 --> 00:27:14,340 sub-processes. 453 00:27:14,340 --> 00:27:20,670 And each arrival then is associated with the given 454 00:27:20,670 --> 00:27:24,700 sub-process, with the probability mass function 455 00:27:24,700 --> 00:27:28,160 lambda sub i over the sum of lambda sub j. 456 00:27:28,160 --> 00:27:30,460 So this is the workhorse of Poisson 457 00:27:30,460 --> 00:27:32,270 type queueing problems. 458 00:27:32,270 --> 00:27:35,990 You study queuing theory, every page, you 459 00:27:35,990 --> 00:27:37,980 see this thing used. 460 00:27:37,980 --> 00:27:41,480 If you look at Kleinrock's books on queueing, they're 461 00:27:41,480 --> 00:27:45,120 very nice books because they cover so many different 462 00:27:45,120 --> 00:27:47,040 queueing situations. 463 00:27:47,040 --> 00:27:50,230 You find him using this on every page. 464 00:27:50,230 --> 00:27:54,060 And he never tells you that he's using it, but that's what 465 00:27:54,060 --> 00:27:54,670 he's doing. 466 00:27:54,670 --> 00:27:59,360 So that's a useful thing to know. 467 00:27:59,360 --> 00:28:02,840 We then talked about conditional arrivals and order 468 00:28:02,840 --> 00:28:05,590 statistics. 469 00:28:05,590 --> 00:28:12,280 The conditional distribution of the N first arrivals-- 470 00:28:12,280 --> 00:28:17,670 namely, s sub 1 s sub 2 up to s sub n-- 471 00:28:17,670 --> 00:28:24,250 given the number of arrivals in N of t is just n factorial 472 00:28:24,250 --> 00:28:25,430 over t to the n. 473 00:28:25,430 --> 00:28:29,380 Again, it doesn't depend on where these arrivals are. 474 00:28:29,380 --> 00:28:33,215 It's just a function which is independent of each arrival. 475 00:28:33,215 --> 00:28:36,660 It's the same kind of conditioning we had before. 476 00:28:36,660 --> 00:28:40,080 It's n factorial divided by t to the n. 477 00:28:40,080 --> 00:28:44,360 Because of the fact that if you order these random 478 00:28:44,360 --> 00:28:49,450 variables, t1 less than t2 less than t3, and so forth, up 479 00:28:49,450 --> 00:28:53,540 until time t, and then you say how many different ways can I 480 00:28:53,540 --> 00:29:01,590 arrange a set of numbers, each between 0 and t so that we 481 00:29:01,590 --> 00:29:03,630 have different orderings of them. 482 00:29:03,630 --> 00:29:06,700 And you can choose any one of the N to be the first. 483 00:29:06,700 --> 00:29:09,560 You can choose any one of the remaining n 484 00:29:09,560 --> 00:29:11,510 minus 1 to be the second. 485 00:29:11,510 --> 00:29:14,670 And that's where this is n factorial comes from here. 486 00:29:14,670 --> 00:29:18,140 And that, again we've been over. 
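A simulation sketch of that conditional statement, with arbitrary choices of lambda, t, and n: conditional on N(t) = n, the first arrival epoch should behave like the smallest of n uniform random variables on [0, t], whose mean is t/(n+1).

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, n = 1.0, 5.0, 5
first_epochs = []
while len(first_epochs) < 5000:
    x = rng.exponential(1/lam, size=40)        # comfortably more interarrivals than needed
    s = np.cumsum(x)                           # arrival epochs
    if np.sum(s <= t) == n:                    # condition on exactly n arrivals in (0, t]
        first_epochs.append(s[0])
print("E[S_1 | N(t)=n] estimated          :", np.mean(first_epochs))
print("mean of smallest of n uniforms on t:", t / (n + 1))
```

With t = 5 and n = 5, both numbers come out near 5/6, which is the ordered-uniform view in action.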
487 00:29:18,140 --> 00:29:21,660 The probability that s1 is greater than tau, given that 488 00:29:21,660 --> 00:29:27,540 they're interarrivals in the overall interval t, comes from 489 00:29:27,540 --> 00:29:31,390 just looking at N uniformly distributed random variables 490 00:29:31,390 --> 00:29:33,190 between 0 and t. 491 00:29:33,190 --> 00:29:35,840 And then what do you do with those uniformly distributed 492 00:29:35,840 --> 00:29:37,670 random variables? 493 00:29:37,670 --> 00:29:40,490 Well, you ask the question, what's the probability that 494 00:29:40,490 --> 00:29:44,140 all of them occur after time tau? 495 00:29:44,140 --> 00:29:47,820 And that's just t minus tau divided by t raised to the 496 00:29:47,820 --> 00:29:48,910 n-th power. 497 00:29:48,910 --> 00:29:51,980 And see, all of these formulas just come from particular 498 00:29:51,980 --> 00:29:54,360 viewpoints about what's going on. 499 00:29:54,360 --> 00:29:55,760 You have a number of viewpoints. 500 00:29:55,760 --> 00:29:58,550 One of them is throwing darts at a line. 501 00:29:58,550 --> 00:30:01,140 One of them is having exponential 502 00:30:01,140 --> 00:30:02,510 interarrival times. 503 00:30:02,510 --> 00:30:06,660 One of them is these uniform interarrivals. 504 00:30:06,660 --> 00:30:08,880 It's only a very small number of tricks. 505 00:30:08,880 --> 00:30:13,600 And you just use them in various combinations. 506 00:30:13,600 --> 00:30:17,800 So the joint distribution of s1 to s n, given N of t equals 507 00:30:17,800 --> 00:30:21,250 n, is the same as the joint distribution of N uniform 508 00:30:21,250 --> 00:30:24,070 random variables after they've been ordered. 509 00:30:28,650 --> 00:30:32,115 So let's go on to finite state Markov chains. 510 00:30:35,240 --> 00:30:37,670 Seems like we're covering an enormous amount of material in 511 00:30:37,670 --> 00:30:38,350 this course. 512 00:30:38,350 --> 00:30:40,150 And I think we are. 513 00:30:40,150 --> 00:30:44,290 But as I'm trying to say, as we go along, it's all-- 514 00:30:44,290 --> 00:30:46,850 I mean, everything follows from a relatively small set of 515 00:30:46,850 --> 00:30:48,620 principles. 516 00:30:48,620 --> 00:30:51,100 Of course, it's harder to understand the small set of 517 00:30:51,100 --> 00:30:54,580 principles and how to apply them than it is to understand 518 00:30:54,580 --> 00:30:55,460 all the details. 519 00:30:55,460 --> 00:30:56,710 But that's-- 520 00:30:58,970 --> 00:31:01,560 but on the other hand, if you understand the principles, 521 00:31:01,560 --> 00:31:04,620 then all those details, including the ones we haven't 522 00:31:04,620 --> 00:31:08,280 talked about, are easy to deal with. 523 00:31:08,280 --> 00:31:11,750 An integer-time stochastic process-- 524 00:31:11,750 --> 00:31:14,450 x1, x2, x3, blah, blah, blah-- 525 00:31:14,450 --> 00:31:19,220 is a Markov chain if for all n, namely the number of them 526 00:31:19,220 --> 00:31:21,770 that we're looking at-- 527 00:31:21,770 --> 00:31:23,020 well-- 528 00:31:25,880 --> 00:31:30,190 for all n, i, j, k, l, and so forth, the probability that 529 00:31:30,190 --> 00:31:35,770 the n-th of these random variables is equal to j, given 530 00:31:35,770 --> 00:31:39,340 what all of the others are-- and these are not ordered now. 531 00:31:39,340 --> 00:31:41,460 I mean, in a Markov chain, nothing is ordered. 532 00:31:41,460 --> 00:31:44,430 We're not talking about an arrival process. 
533 00:31:44,430 --> 00:31:47,220 We're just talking about a frog jumping around on lily 534 00:31:47,220 --> 00:31:52,660 pads, if you arrange the lily pads in a linear way, if these 535 00:31:52,660 --> 00:31:54,430 are random variables. 536 00:31:54,430 --> 00:32:00,530 The probability that the n-th location is equal to j, given 537 00:32:00,530 --> 00:32:06,410 that the previous locations are i, k, back to m, is just 538 00:32:06,410 --> 00:32:11,010 some probability p sub i j, a conditional 539 00:32:11,010 --> 00:32:14,120 probability of j given i. 540 00:32:14,120 --> 00:32:17,670 In other words, if you're looking at what happens at 541 00:32:17,670 --> 00:32:22,340 time n, once you know what happened at time n minus 1, 542 00:32:22,340 --> 00:32:24,830 everything else is of no concern. 543 00:32:24,830 --> 00:32:29,400 This process evolves by having a history of only one time 544 00:32:29,400 --> 00:32:31,980 unit, a little like the Poisson process. 545 00:32:31,980 --> 00:32:36,070 The Poisson process evolves by being totally 546 00:32:36,070 --> 00:32:37,880 independent of the past. 547 00:32:37,880 --> 00:32:40,600 Here, you put a little dependence in the past. 548 00:32:40,600 --> 00:32:44,150 But the dependence is only to look at the last thing that 549 00:32:44,150 --> 00:32:49,040 happened, and nothing before the last time that happened. 550 00:32:49,040 --> 00:32:53,850 So p sub i j depends only on i and j. 551 00:32:53,850 --> 00:32:59,170 And the initial probability mass function is arbitrary. 552 00:32:59,170 --> 00:33:02,470 A Markov chain is finite-state if the sample space for each x 553 00:33:02,470 --> 00:33:07,400 i is a finite set S. And the sample space S is usually 554 00:33:07,400 --> 00:33:10,530 taken to be integers 1 up to M. 555 00:33:10,530 --> 00:33:13,490 In all these formulas we write, we're always summing 556 00:33:13,490 --> 00:33:17,230 from one to M. And the reason for that is we've assumed the 557 00:33:17,230 --> 00:33:22,120 states are 1, 2, 3, up to M. Sometimes it's more convenient 558 00:33:22,120 --> 00:33:23,765 to think of different state spaces. 559 00:33:26,730 --> 00:33:29,040 But all the formulas we use are based on 560 00:33:29,040 --> 00:33:31,290 this state space here. 561 00:33:31,290 --> 00:33:36,500 A Markov chain is completely described by these transition 562 00:33:36,500 --> 00:33:41,200 probabilities plus the initial probabilities. 563 00:33:41,200 --> 00:33:44,390 If you want to write down the probability of what x is at 564 00:33:44,390 --> 00:33:49,030 some time N given what it was at some time 0, all you have to 565 00:33:49,030 --> 00:33:52,890 do is trace all the paths from 0 out to N, add up the 566 00:33:52,890 --> 00:33:56,890 probabilities of all of those paths, and that tells you the 567 00:33:56,890 --> 00:33:58,020 probability you want. 568 00:33:58,020 --> 00:34:01,820 All probabilities can be calculated just from knowing 569 00:34:01,820 --> 00:34:06,240 what these transition probabilities are. 570 00:34:06,240 --> 00:34:10,980 Note that when we're dealing with Poisson processes, we 571 00:34:10,980 --> 00:34:15,520 defined everything in terms of how many-- 572 00:34:15,520 --> 00:34:20,250 how many variables are there in defining a Poisson process? 573 00:34:20,250 --> 00:34:25,020 How many things do you have to specify before I know exactly 574 00:34:25,020 --> 00:34:27,320 what Poisson process I'm talking about? 575 00:34:30,540 --> 00:34:31,760 Only the Poisson rate.
576 00:34:31,760 --> 00:34:35,650 Only one parameter is necessary 577 00:34:35,650 --> 00:34:37,639 for a Poisson process. 578 00:34:37,639 --> 00:34:43,219 For a finite-state Markov process, you need a lot more. 579 00:34:43,219 --> 00:34:48,310 What you need is all of these values, p sub i j. 580 00:34:48,310 --> 00:34:52,409 If you sum p sub i j over j, you have to get 1. 581 00:34:52,409 --> 00:34:54,830 So that removes one of them. 582 00:34:54,830 --> 00:34:58,360 But as soon as you specify that transition matrix, you've 583 00:34:58,360 --> 00:34:59,960 specified everything. 584 00:34:59,960 --> 00:35:01,260 So there's nothing more to know 585 00:35:01,260 --> 00:35:03,220 about the Poisson process. 586 00:35:03,220 --> 00:35:06,060 There's only all these gruesome derivations that we 587 00:35:06,060 --> 00:35:07,580 go through. 588 00:35:07,580 --> 00:35:11,600 But everything is initially determined. 589 00:35:11,600 --> 00:35:13,960 Set of transition probabilities is usually 590 00:35:13,960 --> 00:35:16,030 viewed as the Markov chain. 591 00:35:16,030 --> 00:35:19,760 And the initial probabilities are usually viewed as just a 592 00:35:19,760 --> 00:35:21,740 parameter that we deal with. 593 00:35:21,740 --> 00:35:23,840 In other words, we-- 594 00:35:23,840 --> 00:35:28,250 in other words, what we study is the particular Markov 595 00:35:28,250 --> 00:35:31,550 chain, whether it's recurrent, whether it's transient, 596 00:35:31,550 --> 00:35:32,800 whatever it is. 597 00:35:32,800 --> 00:35:35,770 How you break it up into classes, all of that stuff 598 00:35:35,770 --> 00:35:39,060 only depends on these transition probabilities and 599 00:35:39,060 --> 00:35:40,815 doesn't depend on where you start. 600 00:35:46,920 --> 00:35:51,490 Now, a finite-state Markov chain can be described either 601 00:35:51,490 --> 00:35:54,230 as a directed graph or as a matrix. 602 00:35:54,230 --> 00:35:58,300 I hope you've seen by this time that some things are 603 00:35:58,300 --> 00:36:03,040 easier to look at if you look at things in terms of a graph. 604 00:36:03,040 --> 00:36:07,180 Some things are easier to look at if you look at something 605 00:36:07,180 --> 00:36:08,660 like this matrix. 606 00:36:08,660 --> 00:36:13,230 And some problems can be solved by inspection, if you 607 00:36:13,230 --> 00:36:14,700 draw a graph of it. 608 00:36:14,700 --> 00:36:17,890 Some can be solved almost by inspection if 609 00:36:17,890 --> 00:36:19,480 you look at the matrix. 610 00:36:19,480 --> 00:36:23,460 If you're doing things by computer, usually computers 611 00:36:23,460 --> 00:36:27,450 deal with matrices more easily than with graphs. 612 00:36:27,450 --> 00:36:31,070 If you're dealing with a Markov chain with 100,000 613 00:36:31,070 --> 00:36:35,290 states, you're not going to look at the graph and 614 00:36:35,290 --> 00:36:38,330 determine very much from it, because it's typically going 615 00:36:38,330 --> 00:36:39,650 to be fairly complicated-- 616 00:36:39,650 --> 00:36:42,020 unless it has some very simple structure. 617 00:36:42,020 --> 00:36:46,440 And sometimes that simple structure is determined. 618 00:36:46,440 --> 00:36:48,780 If it's something where you can only-- 619 00:36:48,780 --> 00:36:52,190 where you have the states numbered from 1 to 100,000, 620 00:36:52,190 --> 00:36:56,270 and you can only go from state i to state i plus 1, or from 621 00:36:56,270 --> 00:36:59,910 state i to i plus 1, or i minus 1, then it 622 00:36:59,910 --> 00:37:01,380 becomes very simple. 
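A minimal sketch of working with the matrix description, using a small made-up transition matrix (not one from the lecture): the n-step transition probabilities are the entries of the n-th power of the matrix, and they can be checked against relative frequencies from simulated sample paths.

```python
import numpy as np

# A made-up transition matrix for illustration; each row sums to 1.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

rng = np.random.default_rng(0)

def simulate(P, x0, n_steps):
    """One sample path of the chain, starting in state x0."""
    path = [x0]
    for _ in range(n_steps):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return path

# n-step transition probabilities from state 0: row 0 of P raised to the n-th power
n = 10
Pn = np.linalg.matrix_power(P, n)
print("P^10[0, :]          =", np.round(Pn[0], 4))

# check against the empirical distribution of X_10 over many simulated paths
ends = [simulate(P, 0, n)[-1] for _ in range(20000)]
print("empirical frequency =", np.round(np.bincount(ends, minlength=3) / 20000, 4))
```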
623 00:37:01,380 --> 00:37:04,320 And you like to look at it as a graph again. 624 00:37:04,320 --> 00:37:07,670 But ordinarily, you don't like to do that. 625 00:37:07,670 --> 00:37:15,000 But the nice thing about this graph is that it tells you 626 00:37:15,000 --> 00:37:19,090 very simply and visually which transition probabilities are 627 00:37:19,090 --> 00:37:23,810 zero, and which transition probabilities are non-zero. 628 00:37:23,810 --> 00:37:26,690 And that's the thing that specifies which states are 629 00:37:26,690 --> 00:37:31,650 recurrent, which states are transient, and all of that. 630 00:37:31,650 --> 00:37:35,400 All of that kind of elementary analysis about a Markov chain 631 00:37:35,400 --> 00:37:40,300 all comes from looking at this graph and seeing whether you 632 00:37:40,300 --> 00:37:46,290 can get from one state to another state by some process. 633 00:37:46,290 --> 00:37:50,520 So let's move on from that. 634 00:37:50,520 --> 00:37:53,620 Talk about the classification of states. 635 00:37:53,620 --> 00:37:57,500 We started out with the idea of a walk and 636 00:37:57,500 --> 00:37:59,370 a path and a cycle. 637 00:37:59,370 --> 00:38:03,610 I'm not sure these terms are uniform throughout the field. 638 00:38:03,610 --> 00:38:07,550 But a walk is an ordered string of nodes, like 639 00:38:07,550 --> 00:38:10,020 i0, i1, up to i n. 640 00:38:10,020 --> 00:38:14,960 You can have repeated elements here, but you need a directed 641 00:38:14,960 --> 00:38:18,170 arc from i sub n minus 1 to i sub m. 642 00:38:18,170 --> 00:38:23,035 Like for example, in this stupid Markov chain here-- 643 00:38:25,870 --> 00:38:28,880 I mean, when you're drawing things is LaTeX, it's kind of 644 00:38:28,880 --> 00:38:31,760 hard to draw those nice little curves there. 645 00:38:31,760 --> 00:38:34,610 And because of that, when you once draw a Markov chain, you 646 00:38:34,610 --> 00:38:36,050 never want to change it. 647 00:38:36,050 --> 00:38:39,210 And that's why these nodes have a very small set of 648 00:38:39,210 --> 00:38:40,530 Markov chains in them. 649 00:38:40,530 --> 00:38:46,580 It's just to save me some work, drawing and drawing 650 00:38:46,580 --> 00:38:47,830 these diagrams. 651 00:38:50,030 --> 00:38:55,700 An example of a walk, as you start in 4, you take the self 652 00:38:55,700 --> 00:38:58,800 loop, go back to 4 at time 2. 653 00:38:58,800 --> 00:39:01,660 Then you go to state 1 at time 3. 654 00:39:01,660 --> 00:39:05,240 Then you go to state 2 at time 4. 655 00:39:05,240 --> 00:39:08,140 Then you go to stage 3, time 5. 656 00:39:08,140 --> 00:39:11,010 And back to state 2 at time 6. 657 00:39:11,010 --> 00:39:13,300 You have repeated nodes there. 658 00:39:13,300 --> 00:39:17,230 You have repeated nodes separated here. 659 00:39:17,230 --> 00:39:20,630 Another example of a walk is 4, 1, 2, 3. 660 00:39:20,630 --> 00:39:24,120 Example of a path, the path can't have any repeated nodes. 661 00:39:24,120 --> 00:39:27,060 We'd like to look at paths, because if you're going to be 662 00:39:27,060 --> 00:39:30,280 able to get from one node to another node, and there's some 663 00:39:30,280 --> 00:39:33,420 walk that goes all around the place and gets to that final 664 00:39:33,420 --> 00:39:36,770 node, there's also path that goes there. 665 00:39:36,770 --> 00:39:39,900 If you look at the walk, you just leave that all the cycles 666 00:39:39,900 --> 00:39:42,570 along the way, and you get to the n. 
667 00:39:42,570 --> 00:39:45,980 And a cycle, of course, which I didn't define, is something 668 00:39:45,980 --> 00:39:49,820 which starts at one node, goes through a path, and then 669 00:39:49,820 --> 00:39:52,730 finally comes back to the same node that it started at. 670 00:39:52,730 --> 00:39:56,800 And it doesn't make any difference for the cycle 2, 3, 671 00:39:56,800 --> 00:40:01,610 2 whether you call it 2, 3, 2 or 3, 2, 3. 672 00:40:01,610 --> 00:40:04,390 That's the same cycle, and it's not even worth 673 00:40:04,390 --> 00:40:07,200 distinguishing between those two ideas. 674 00:40:07,200 --> 00:40:12,723 OK. That's that. 675 00:40:15,360 --> 00:40:20,010 If there's a path from-- 676 00:40:20,010 --> 00:40:21,260 where did I-- 677 00:40:26,110 --> 00:40:31,800 node j is accessible from i, which we abbreviate as i 678 00:40:31,800 --> 00:40:33,680 has a path to j. 679 00:40:33,680 --> 00:40:38,010 If there's a walk from i to j, which means that p 680 00:40:38,010 --> 00:40:40,650 sub i j to the n-- 681 00:40:40,650 --> 00:40:44,150 this is the transition probability, the probability 682 00:40:44,150 --> 00:40:49,160 that x sub n is equal to j, given that x sub 683 00:40:49,160 --> 00:40:50,710 0 is equal to i. 684 00:40:50,710 --> 00:40:53,380 And we use this all the time. 685 00:40:53,380 --> 00:40:57,370 If this is greater than zero for some n greater than 0. 686 00:40:57,370 --> 00:41:06,950 In other words, j is accessible from i if there's a 687 00:41:06,950 --> 00:41:09,240 path from i that goes to j. 688 00:41:12,300 --> 00:41:17,170 And trivially, if i goes to j, and there's a path from j to 689 00:41:17,170 --> 00:41:21,520 k, then there has to be a path from i to k. 690 00:41:21,520 --> 00:41:25,730 If you've ever tried to make up a mapping program to find 691 00:41:25,730 --> 00:41:28,910 how to get from here to there, this is one of the most useful 692 00:41:28,910 --> 00:41:29,740 things you use. 693 00:41:29,740 --> 00:41:32,320 If there's a way to get here to there, and a way to get 694 00:41:32,320 --> 00:41:35,330 from here to there, then there's a way to get from here 695 00:41:35,330 --> 00:41:37,560 all the way to the end. 696 00:41:37,560 --> 00:41:42,650 And if you look up what most of these map programs do, you 697 00:41:42,650 --> 00:41:47,040 see that they overuse this enormously and they wind up 698 00:41:47,040 --> 00:41:50,910 taking you from here to there by some bizarre path just 699 00:41:50,910 --> 00:41:53,880 because it happens to go through some intermediate node 700 00:41:53,880 --> 00:41:55,460 on the way. 701 00:41:55,460 --> 00:41:58,680 So two nodes communicate-- 702 00:41:58,680 --> 00:42:01,890 i double arrow j-- 703 00:42:01,890 --> 00:42:08,860 if j is accessible from i, and if i is accessible from j. 704 00:42:08,860 --> 00:42:12,450 That means there's a path from i to j, and another path from 705 00:42:12,450 --> 00:42:16,260 j back to i, if you shorten them as much as you can. 706 00:42:16,260 --> 00:42:17,040 There's a cycle. 707 00:42:17,040 --> 00:42:23,530 It starts at i, goes through j, and comes back to i again. 708 00:42:23,530 --> 00:42:29,810 I didn't say that quite right, so delete that from what 709 00:42:29,810 --> 00:42:31,200 you've just heard. 710 00:42:31,200 --> 00:42:35,630 A class C of states is a non-empty set, such that i and 711 00:42:35,630 --> 00:42:40,370 j communicate for each i j in this class.
712 00:42:40,370 --> 00:42:45,330 But i does not communicate with j for each i in C-- 713 00:42:49,420 --> 00:42:53,210 for i in C and j not in C. 714 00:42:53,210 --> 00:42:55,870 The convenient way to think about this-- and I should have 715 00:42:55,870 --> 00:42:59,670 stated this as a theorem in the notes, because it's-- 716 00:43:03,990 --> 00:43:06,130 I think it's something that we all use without even 717 00:43:06,130 --> 00:43:07,750 thinking about it. 718 00:43:07,750 --> 00:43:12,480 It says that the entire set of states, or the entire set of 719 00:43:12,480 --> 00:43:16,500 nodes in a graph, is partitioned into classes. 720 00:43:16,500 --> 00:43:22,860 The class C containing i is i in union with all of the j's 721 00:43:22,860 --> 00:43:24,110 that communicate with i. 722 00:43:24,110 --> 00:43:27,580 So if you want to find this partition, you start out with 723 00:43:27,580 --> 00:43:31,280 an arbitrary node, you find all of the other nodes that it 724 00:43:31,280 --> 00:43:34,590 communicates with, and you find them by picking 725 00:43:34,590 --> 00:43:36,320 them one at a time. 726 00:43:36,320 --> 00:43:41,050 You pick all of the nodes for which p sub i j is 727 00:43:41,050 --> 00:43:42,540 greater than 0. 728 00:43:42,540 --> 00:43:44,100 Then you pick-- 729 00:43:44,100 --> 00:43:46,530 and p sub j i is great-- 730 00:43:46,530 --> 00:43:47,780 well-- blah. 731 00:43:50,030 --> 00:43:55,400 If you want to find the set of nodes that are accessible from 732 00:43:55,400 --> 00:43:57,640 i, you start out looking at i. 733 00:43:57,640 --> 00:44:00,640 You look at all the states which are accessible 734 00:44:00,640 --> 00:44:03,300 from i in one step. 735 00:44:03,300 --> 00:44:06,870 Then you look at all the steps, all of the states, 736 00:44:06,870 --> 00:44:09,380 which you can access from any one of those. 737 00:44:09,380 --> 00:44:12,720 Those are the states which are accessible in two states-- 738 00:44:12,720 --> 00:44:16,150 in two steps, then in three steps, and so forth. 739 00:44:16,150 --> 00:44:21,380 So you find all the nodes that are accessible from node i. 740 00:44:21,380 --> 00:44:24,640 And then you turn around and do it the other way. 741 00:44:24,640 --> 00:44:29,600 And presto, you have all of these classes of states all 742 00:44:29,600 --> 00:44:30,910 very simply. 743 00:44:30,910 --> 00:44:34,990 For a finite-state chain, the state i is transient if 744 00:44:34,990 --> 00:44:40,200 there's a j in S such that i goes into j, but j 745 00:44:40,200 --> 00:44:41,420 does not go into i. 746 00:44:41,420 --> 00:44:46,900 In other words, if I'm a state i, and I can get to you, but 747 00:44:46,900 --> 00:44:55,450 you can't get back to me, then I'm transient. 748 00:44:55,450 --> 00:45:01,600 Because the way Markov chains work, we keep going from one 749 00:45:01,600 --> 00:45:04,720 step to the next step to the next step to the next step. 750 00:45:04,720 --> 00:45:09,710 And if I keep returning to myself, then eventually I'm 751 00:45:09,710 --> 00:45:11,010 going to go to you. 752 00:45:11,010 --> 00:45:14,040 And once I go to you, I'll never get back again. 753 00:45:14,040 --> 00:45:18,540 So because of that, these transient states are states 754 00:45:18,540 --> 00:45:21,450 where eventually you leave them and you 755 00:45:21,450 --> 00:45:23,160 never get back again.
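A sketch of that procedure on a small made-up chain (the matrix below is illustrative, not from the lecture): compute the set of states accessible from each state by searching the graph of nonzero transition probabilities, group states by mutual accessibility into classes, and call a class recurrent exactly when no transition leaves it, which is the finite-state criterion just described.

```python
import numpy as np

# Only which p_ij are nonzero matters for the classification.
# Made up so that {1, 2} is a recurrent class and states 0 and 3 are transient.
P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0, 0.5]])

M = len(P)

def reachable(i):
    """All states accessible from i (including i itself), by graph search."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v in range(M):
            if P[u, v] > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

reach = [reachable(i) for i in range(M)]
classes = {frozenset(j for j in reach[i] if i in reach[j]) for i in range(M)}
for cls in classes:
    # a class of a finite chain is recurrent iff no arc leads out of the class
    recurrent = all(reach[i] <= cls for i in cls)
    print(sorted(cls), "recurrent" if recurrent else "transient")
```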
756 00:45:23,160 --> 00:45:26,190 As soon as we start talking about countable state Markov 757 00:45:26,190 --> 00:45:28,270 chains, you'll see that this definition 758 00:45:28,270 --> 00:45:30,250 doesn't work anymore. 759 00:45:30,250 --> 00:45:32,620 You can-- 760 00:45:32,620 --> 00:45:36,520 it is very possible to just wander away in a countable 761 00:45:36,520 --> 00:45:40,390 state Markov chain, and you never get back again that way. 762 00:45:40,390 --> 00:45:43,640 After you wander away too far, the probability of getting 763 00:45:43,640 --> 00:45:45,540 back gets smaller and smaller. 764 00:45:45,540 --> 00:45:47,830 You keep getting further and further away. 765 00:45:47,830 --> 00:45:52,810 The probability of returning gets smaller and smaller, so 766 00:45:52,810 --> 00:45:56,360 that you have transience that way also. 767 00:45:56,360 --> 00:45:59,470 But here, the situation is simpler for a finite-state 768 00:45:59,470 --> 00:46:01,030 Markov chain. 769 00:46:01,030 --> 00:46:05,570 And you can define i to be transient if there's a j in S such that 770 00:46:05,570 --> 00:46:09,440 i goes into j, but j doesn't go into i. 771 00:46:09,440 --> 00:46:13,160 If i's not transient, then it's recurrent. 772 00:46:13,160 --> 00:46:16,240 Usually you define recurrence first and transience later, 773 00:46:16,240 --> 00:46:19,470 but it's a little simpler this way. 774 00:46:19,470 --> 00:46:22,310 All states in a class are transient, or all are 775 00:46:22,310 --> 00:46:26,330 recurrent, and a finite-state Markov chain contains at least 776 00:46:26,330 --> 00:46:27,990 one recurrent class. 777 00:46:27,990 --> 00:46:29,770 You did that in your homework. 778 00:46:29,770 --> 00:46:33,040 And you were surprised at how complicated it was to do it. 779 00:46:33,040 --> 00:46:36,350 I hope that after you wrote down a proof of this, you 780 00:46:36,350 --> 00:46:41,800 stopped and thought about what you were actually proving, 781 00:46:41,800 --> 00:46:46,030 which intuitively is something very, very simple. 782 00:46:46,030 --> 00:46:48,960 It's just looking at all of the transient classes. 783 00:46:48,960 --> 00:46:51,480 Starting at one transient class, you 784 00:46:51,480 --> 00:46:54,950 find if there's another-- 785 00:46:54,950 --> 00:46:59,190 if there's another state you can get to from it which is 786 00:46:59,190 --> 00:47:02,170 also transient, and then you find if there's another state 787 00:47:02,170 --> 00:47:04,910 you get to from there which is also transient. 788 00:47:04,910 --> 00:47:08,500 And eventually, you have to come to a state from which you 789 00:47:08,500 --> 00:47:13,325 can't reach any state that can't get back-- that is, a recurrent state. 790 00:47:17,350 --> 00:47:20,410 That was explaining it almost as badly as the problem 791 00:47:20,410 --> 00:47:22,120 statement explained it. 792 00:47:22,120 --> 00:47:25,460 And I hope that after you did the problem, even if you can't 793 00:47:25,460 --> 00:47:27,910 explain it to someone, you have an 794 00:47:27,910 --> 00:47:30,430 understanding of why it's true. 795 00:47:30,430 --> 00:47:34,920 It shouldn't be surprising after you do that. 796 00:47:34,920 --> 00:47:38,950 So the finite-state Markov chain contains at least one 797 00:47:38,950 --> 00:47:40,200 recurrent class. 798 00:47:42,800 --> 00:47:46,720 OK, the period of a state i is the greatest common 799 00:47:46,720 --> 00:47:51,730 divisor of the n such that P sub i i to the n is greater than 0.
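That greatest-common-divisor definition is easy to compute directly. A minimal sketch, assuming numpy is available and using two made-up three-state chains (one a pure cycle, one with a self-loop); the search cutoff nmax is just a convenience for small chains.

import numpy as np
from math import gcd
from functools import reduce

def period(P, i, nmax=50):
    # gcd of the return times n <= nmax with P^n[i, i] > 0.
    Pn, returns = np.eye(len(P)), []
    for n in range(1, nmax + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            returns.append(n)
    return reduce(gcd, returns)

cycle = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
print(period(cycle, 0))   # 3: returns to state 0 only at times 3, 6, 9, ...

loop = np.array([[0.5, 0.5, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]], dtype=float)
print(period(loop, 0))    # 1: the self-loop at state 0 makes it aperiodic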
800 00:47:51,730 --> 00:47:54,580 Again, a very complicated definition for a 801 00:47:54,580 --> 00:47:56,280 simple kind of idea. 802 00:47:56,280 --> 00:47:58,670 Namely, you start out in a state i. 803 00:47:58,670 --> 00:48:02,440 You look at all of the times at which you can get back to 804 00:48:02,440 --> 00:48:03,940 state i again. 805 00:48:03,940 --> 00:48:08,780 If you find it that set of times has a period in it, 806 00:48:08,780 --> 00:48:19,550 namely, if every sequences of states is a multiple of some 807 00:48:19,550 --> 00:48:25,410 d, then you know that the state is periodic if d is 808 00:48:25,410 --> 00:48:26,720 greater than 1. 809 00:48:26,720 --> 00:48:30,060 And what you have to do is to find the largest such number. 810 00:48:30,060 --> 00:48:32,040 And that's the period of the state. 811 00:48:32,040 --> 00:48:35,170 All states in the same class have the same period. 812 00:48:35,170 --> 00:48:38,690 A recurring class with period d greater than one can be 813 00:48:38,690 --> 00:48:40,550 partitioned into sub-class-- 814 00:48:40,550 --> 00:48:42,640 this is the best way of looking at 815 00:48:42,640 --> 00:48:45,820 periodic classes of states. 816 00:48:45,820 --> 00:48:49,780 If you have a periodic class of states, then you can always 817 00:48:49,780 --> 00:48:53,960 separate it into d sub-classes. 818 00:48:53,960 --> 00:48:59,300 And in such a set of sub-classes, transitions from 819 00:48:59,300 --> 00:49:03,770 S1 and the states in S1 only go to S2. 820 00:49:03,770 --> 00:49:07,710 Transitions from states in S2 only go to S3. 821 00:49:07,710 --> 00:49:12,430 Up to, transitions from S d only go back to S1. 822 00:49:12,430 --> 00:49:16,050 They have to go someplace, so they go back to S1. 823 00:49:16,050 --> 00:49:22,500 So as you cycle around, it takes d steps to cycle from 1 824 00:49:22,500 --> 00:49:24,000 back to 1 again. 825 00:49:24,000 --> 00:49:28,410 It takes d steps to cycle from 2 back to 2 again. 826 00:49:28,410 --> 00:49:31,300 So you can see the structure of the Markov chain and why, 827 00:49:31,300 --> 00:49:34,810 in fact, it does have to be-- 828 00:49:34,810 --> 00:49:38,480 why that class has to be periodic. 829 00:49:38,480 --> 00:49:41,870 An ergodic class is a recurrent aperiodic class. 830 00:49:41,870 --> 00:49:44,760 In other words, it's a class where the period is equal to 831 00:49:44,760 --> 00:49:48,450 1, which means there really isn't any period. 832 00:49:48,450 --> 00:49:52,550 A Markov chain with only one class is ergodic if the class 833 00:49:52,550 --> 00:49:54,640 is ergodic. 834 00:49:54,640 --> 00:49:56,880 And the big theorem here-- 835 00:49:56,880 --> 00:49:59,670 I mean, this is probably the most important theorem about 836 00:49:59,670 --> 00:50:01,820 finite-state Markov chains. 837 00:50:01,820 --> 00:50:05,100 You have an ergodic, finite-state Markov chain. 838 00:50:05,100 --> 00:50:12,300 Then the limit as n goes to infinity of the probability of 839 00:50:12,300 --> 00:50:16,700 arriving in state j after n steps, given that you started 840 00:50:16,700 --> 00:50:20,780 in state i, is just some function of j. 841 00:50:20,780 --> 00:50:24,400 In other words, when n gets very large, it doesn't depend 842 00:50:24,400 --> 00:50:27,370 on how large M is. 843 00:50:27,370 --> 00:50:28,480 It stays the same. 844 00:50:28,480 --> 00:50:30,570 It becomes independent of n. 845 00:50:30,570 --> 00:50:32,450 It doesn't depend on where you started. 
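A quick numerical illustration of that limit, with a made-up three-state ergodic chain (nothing from the lecture slides): the rows of P to the n all converge to the same vector.

import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

for n in (1, 5, 20, 50):
    print(n, np.linalg.matrix_power(P, n).round(6))
# By n = 20 the three rows already agree to many decimal places: the limit
# no longer depends on the starting state i, and it stops changing with n.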
846 00:50:32,450 --> 00:50:34,860 No matter where you start in a finite-state 847 00:50:34,860 --> 00:50:36,570 ergodic Markov chain. 848 00:50:36,570 --> 00:50:40,580 After a very long time, the probability of being in a 849 00:50:40,580 --> 00:50:44,620 state j is independent of where you started, and it's 850 00:50:44,620 --> 00:50:48,170 independent of how long you've been running. 851 00:50:48,170 --> 00:50:52,200 So that's a very strong kind of-- 852 00:50:52,200 --> 00:50:54,890 it's a very strong kind of limit theorem. 853 00:50:54,890 --> 00:50:58,690 It's very much like the law of large numbers and all of these 854 00:50:58,690 --> 00:51:00,030 other things. 855 00:51:00,030 --> 00:51:03,120 I'm going to talk a little bit at the end about what that 856 00:51:03,120 --> 00:51:04,820 relationship really is. 857 00:51:07,360 --> 00:51:10,850 Except what it says is, after a long time, you're in steady 858 00:51:10,850 --> 00:51:12,670 state, which is why it's called the 859 00:51:12,670 --> 00:51:13,760 steady state theorem. 860 00:51:13,760 --> 00:51:14,440 Yes? 861 00:51:14,440 --> 00:51:17,386 AUDIENCE: Could you define the steady states for periodic 862 00:51:17,386 --> 00:51:18,636 changes [INAUDIBLE]? 863 00:51:21,320 --> 00:51:26,460 PROFESSOR: I try to avoid doing that because you have 864 00:51:26,460 --> 00:51:28,650 steady state probabilities. 865 00:51:28,650 --> 00:51:31,810 The steady state probabilities that you have are, you take-- 866 00:51:34,990 --> 00:51:38,760 is if you have these sub-classes. 867 00:51:38,760 --> 00:51:42,690 Then you wind up with a steady state within each sub-class. 868 00:51:42,690 --> 00:51:46,900 If you assign a probability of the probability in the 869 00:51:46,900 --> 00:51:51,870 sub-class, divided by d, then you get what is the steady 870 00:51:51,870 --> 00:51:52,930 state probability. 871 00:51:52,930 --> 00:51:56,870 If you start out in that steady state, then you're in 872 00:51:56,870 --> 00:52:00,130 each sub-class with probability 1 over d. 873 00:52:00,130 --> 00:52:04,230 And you shift to the next sub-class and you're still in 874 00:52:04,230 --> 00:52:08,340 steady state, because you have a probability, 1 over d, of 875 00:52:08,340 --> 00:52:12,230 being in each of those sub-classes to start with. 876 00:52:12,230 --> 00:52:16,970 You shift and you're still in one of the sub-classes with 877 00:52:16,970 --> 00:52:19,130 probability 1 over d. 878 00:52:19,130 --> 00:52:22,690 So there still is a steady state in that sense, but 879 00:52:22,690 --> 00:52:24,830 there's not a steady state in any nice sense. 880 00:52:31,940 --> 00:52:39,470 So anyway, that's the way it is. 881 00:52:39,470 --> 00:52:44,860 But you see, if you understand this theorem for ergodic 882 00:52:44,860 --> 00:52:48,550 finite state and Markov chains, and then you 883 00:52:48,550 --> 00:52:52,540 understand about periodic change and this set of 884 00:52:52,540 --> 00:52:56,070 sub-classes, you can see within each 885 00:52:56,070 --> 00:52:59,450 sub-class, if you look at-- 886 00:52:59,450 --> 00:53:00,700 if you look at-- 887 00:53:04,440 --> 00:53:11,500 if you look at time 0, time d, time 2d, times 3d and 4d, then 888 00:53:11,500 --> 00:53:14,470 whatever state you start in, you're going to be in the same 889 00:53:14,470 --> 00:53:19,380 class after d steps, the same class after 2d steps. 890 00:53:19,380 --> 00:53:21,480 You're going to have a transition 891 00:53:21,480 --> 00:53:24,280 matrix over d steps. 
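Here is a small numerical sketch of that picture, using a made-up chain with period d = 2 and sub-classes {0, 1} and {2, 3}; it is not the example from the lecture. The d-step matrix keeps each sub-class inside itself, and the vector solving pi P = pi puts probability 1/d on each sub-class.

import numpy as np

P = np.array([[0.0, 0.0, 0.3, 0.7],
              [0.0, 0.0, 0.6, 0.4],
              [0.5, 0.5, 0.0, 0.0],
              [0.2, 0.8, 0.0, 0.0]])

# P^2 is block diagonal over the two sub-classes, and the ergodic theorem
# applies to each of those blocks separately.
print(np.linalg.matrix_power(P, 2).round(3))

# pi solves pi P = pi with total probability 1; here P^n itself never
# converges, but pi still exists and gives each sub-class probability 1/2.
A = np.vstack([P.T - np.eye(4), np.ones(4)])
pi = np.linalg.lstsq(A, np.array([0, 0, 0, 0, 1.0]), rcond=None)[0]
print(pi.round(4), pi[:2].sum(), pi[2:].sum())   # sub-class totals are 0.5 and 0.5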
892 00:53:24,280 --> 00:53:27,360 And this theorem still applies to these sub-classes over 893 00:53:27,360 --> 00:53:29,200 periods of d. 894 00:53:29,200 --> 00:53:32,030 So the hard part of it is proving this. 895 00:53:32,030 --> 00:53:35,180 After you prove this, then you see that the same thing 896 00:53:35,180 --> 00:53:38,200 happens over each sub-class after that. 897 00:53:43,650 --> 00:53:45,290 That's a pretty major theorem. 898 00:53:45,290 --> 00:53:46,990 It's difficult to prove. 899 00:53:46,990 --> 00:53:50,890 A sub-step is to show that for an ergodic M state Markov 900 00:53:50,890 --> 00:53:56,380 chain, the probability of being in state j at time n, 901 00:53:56,380 --> 00:54:00,930 given that you're in state i at time 0, is positive for all 902 00:54:00,930 --> 00:54:05,870 i j, and all n greater than M minus 1 squared plus 1. 903 00:54:05,870 --> 00:54:10,900 It's very surprising that you have to go this many states-- 904 00:54:10,900 --> 00:54:14,980 this many steps before you get to the point that all these 905 00:54:14,980 --> 00:54:18,440 transition probabilities are positive. 906 00:54:18,440 --> 00:54:22,450 You look at this particular kind of Markov chain in the 907 00:54:22,450 --> 00:54:26,660 homework, and I hope what you found out from it was that if 908 00:54:26,660 --> 00:54:32,040 you start, say, in state two, then at the next time, you 909 00:54:32,040 --> 00:54:33,640 have to be in 3. 910 00:54:33,640 --> 00:54:37,020 Next time, you have to be in 4, you have to be in 5, you 911 00:54:37,020 --> 00:54:38,560 have to be in 6. 912 00:54:38,560 --> 00:54:41,300 In other words, the size of the set that you can be in 913 00:54:41,300 --> 00:54:46,550 after one step is just 1. 914 00:54:46,550 --> 00:54:51,170 One possible state here, one possible state here, one 915 00:54:51,170 --> 00:54:52,640 possible state here. 916 00:54:52,640 --> 00:54:57,250 The next step, you're in either 1 or 2, and as you 917 00:54:57,250 --> 00:55:01,600 travel around, the size of the set of states you can be in at 918 00:55:01,600 --> 00:55:06,510 these different steps, is 2, until you get all the way 919 00:55:06,510 --> 00:55:07,510 around again. 920 00:55:07,510 --> 00:55:09,800 And then there's a way to get-- 921 00:55:09,800 --> 00:55:15,050 when you get to state 6 again, the set of states enlarges. 922 00:55:15,050 --> 00:55:18,970 So finally you get up to a set of states, which is 923 00:55:18,970 --> 00:55:20,800 up to M minus 1. 924 00:55:20,800 --> 00:55:25,630 And that's why you get the M minus 1 squared here, plus 1. 925 00:55:25,630 --> 00:55:28,710 And this is the only Markov chain there is. 926 00:55:28,710 --> 00:55:31,850 You can have as many states going around 927 00:55:31,850 --> 00:55:33,770 here as you want to. 928 00:55:33,770 --> 00:55:36,020 But you have to have this structure at the end, where 929 00:55:36,020 --> 00:55:39,930 there's one special state and one way of circumventing it, 930 00:55:39,930 --> 00:55:43,930 which means there's one cycle of size M minus 1, and one 931 00:55:43,930 --> 00:55:48,440 cycle of size M. And that's the only way you can get it. 932 00:55:48,440 --> 00:55:52,780 And that's the only Markov chain that meets this bound 933 00:55:52,780 --> 00:55:53,640 with equality. 934 00:55:53,640 --> 00:56:01,470 In all other cases, you get this property much earlier. 935 00:56:01,470 --> 00:56:05,200 And often, you get it after just a linear amount of time. 
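That worst-case chain can be checked numerically. The sketch below assumes the figure is a single cycle through all M states plus one extra transition out of the last state, which creates a second cycle of length M minus 1; with that assumption, the first n at which every entry of P to the n is positive comes out to exactly (M minus 1) squared plus 1.

import numpy as np

def steps_until_all_positive(P, limit=200):
    # Smallest n with every entry of P^n strictly positive (None if not found).
    Pn = np.eye(len(P))
    for n in range(1, limit + 1):
        Pn = Pn @ P
        if np.all(Pn > 0):
            return n
    return None

M = 5
P = np.zeros((M, M))
for i in range(M - 1):
    P[i, i + 1] = 1.0        # the cycle 0 -> 1 -> ... -> M-1
P[M - 1, 0] = 0.5            # back to 0: cycle of length M
P[M - 1, 1] = 0.5            # back to 1: cycle of length M - 1

print(steps_until_all_positive(P), (M - 1) ** 2 + 1)   # 17 17 for M = 5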
936 00:56:09,360 --> 00:56:13,350 The other part of this major theorem that you reach steady 937 00:56:13,350 --> 00:56:17,350 state says, let P be greater than 0. 938 00:56:17,350 --> 00:56:19,150 In other words, let all the transition 939 00:56:19,150 --> 00:56:22,410 probabilities be positive. 940 00:56:22,410 --> 00:56:28,040 And then define some quantity alpha as the minimum of the 941 00:56:28,040 --> 00:56:30,160 transition probabilities. 942 00:56:30,160 --> 00:56:34,110 And then the theorem says, for all states j and all n greater 943 00:56:34,110 --> 00:56:38,470 than or equal to 1, the maximum over the initial 944 00:56:38,470 --> 00:56:43,180 states minus the minimum over the initial states of P sub i 945 00:56:43,180 --> 00:56:49,040 j at the n plus first step, that difference is less than 946 00:56:49,040 --> 00:56:52,470 or equal to the same difference at the n-th step, 947 00:56:52,470 --> 00:56:54,300 times 1 minus 2 alpha. 948 00:56:54,300 --> 00:56:58,970 Now 1 minus 2 alpha is a number less than 1. 949 00:56:58,970 --> 00:57:03,700 And this says that this maximum minus minimum is at most 1 950 00:57:03,700 --> 00:57:07,860 minus 2 alpha to the n, which says that the limit of the 951 00:57:07,860 --> 00:57:11,220 maximizing term is equal to the limit of 952 00:57:11,220 --> 00:57:12,640 the minimizing term. 953 00:57:12,640 --> 00:57:13,850 And what does that say? 954 00:57:13,850 --> 00:57:18,740 It says that everything in the middle gets squeezed together. 955 00:57:18,740 --> 00:57:24,200 And it says exactly what we want it to say, that the limit 956 00:57:24,200 --> 00:57:30,380 of P sub l j to the n is independent of l, after n gets 957 00:57:30,380 --> 00:57:31,310 very large. 958 00:57:31,310 --> 00:57:34,090 Because the maximum and the minimum get very 959 00:57:34,090 --> 00:57:37,560 close to each other. 960 00:57:37,560 --> 00:57:40,170 We also showed that P sub i j to the n approaches that limit 961 00:57:40,170 --> 00:57:41,780 exponentially. 962 00:57:41,780 --> 00:57:43,640 That's what this says. 963 00:57:43,640 --> 00:57:49,860 The exponential rate here is just this 1 minus 2 alpha, with alpha determined in that way. 964 00:57:49,860 --> 00:57:54,630 And the theorem for ergodic Markov chains then follows by 965 00:57:54,630 --> 00:58:01,380 just looking at successive h steps in the Markov chain when 966 00:58:01,380 --> 00:58:06,110 h is large enough so that all these transition probabilities 967 00:58:06,110 --> 00:58:07,360 are positive. 968 00:58:09,300 --> 00:58:12,220 So you go out far enough that all the transition 969 00:58:12,220 --> 00:58:13,860 probabilities are positive. 970 00:58:13,860 --> 00:58:16,980 And then you look at repetitions of that, and apply 971 00:58:16,980 --> 00:58:18,230 this theorem. 972 00:58:18,230 --> 00:58:21,570 And suddenly you have this general theorem, 973 00:58:21,570 --> 00:58:22,900 which is what we wanted. 974 00:58:27,200 --> 00:58:30,530 An ergodic unichain is a Markov chain with one 975 00:58:30,530 --> 00:58:33,870 ergodic recurrent class, plus perhaps a set 976 00:58:33,870 --> 00:58:36,550 of transient states. 977 00:58:36,550 --> 00:58:39,600 And most of the things we talk about in this course are for 978 00:58:39,600 --> 00:58:45,870 unichains, usually ergodic unichains, because if you have 979 00:58:45,870 --> 00:58:49,160 multiple recurrent classes, it just makes a mess. 980 00:58:49,160 --> 00:58:51,780 You wind up in this recurrent class, or 981 00:58:51,780 --> 00:58:53,950 this recurrent class.
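Going back to that bound for a moment, here is a quick check with a made-up strictly positive matrix. The printed spread is the largest value, over j, of the maximum over i of P sub i j to the n minus the minimum over i of P sub i j to the n.

import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
alpha = P.min()                      # smallest transition probability

Pn = P.copy()
for n in range(1, 11):
    spread = (Pn.max(axis=0) - Pn.min(axis=0)).max()
    print(n, spread, (1 - 2 * alpha) ** n)
    Pn = Pn @ P
# The observed spread stays below (1 - 2*alpha)^n and shrinks geometrically,
# which is the squeezing argument described above.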
982 00:58:53,950 --> 00:59:00,080 And aside from the question of which one you get to, you 983 00:59:00,080 --> 00:59:01,730 don't much care about it. 984 00:59:01,730 --> 00:59:05,790 And the theorem here is for an ergodic finite-state unichain. 985 00:59:05,790 --> 00:59:10,370 The limit of P sub i j to the n probability of being in 986 00:59:10,370 --> 00:59:15,130 state j at time n, given that you're in state i at time 0, 987 00:59:15,130 --> 00:59:17,290 is equal to pi sub j. 988 00:59:17,290 --> 00:59:22,330 In other words, this limit here exists for all i j. 989 00:59:22,330 --> 00:59:25,210 And the limit is independent of i. 990 00:59:25,210 --> 00:59:27,900 And it's independent of n as n gets big enough. 991 00:59:32,820 --> 00:59:42,970 And then also, we can choose this so that this set of 992 00:59:42,970 --> 00:59:47,680 probabilities here satisfies this, what's called the steady 993 00:59:47,680 --> 00:59:51,780 state condition, the sum of pi i times P sub i j 994 00:59:51,780 --> 00:59:53,140 is equal to pi j. 995 00:59:53,140 --> 00:59:56,380 In other words, if you start out in steady state, and you 996 00:59:56,380 --> 01:00:00,300 look at the probabilities of being in the different states 997 01:00:00,300 --> 01:00:06,610 at the next time unit, this is the probability of being in 998 01:00:06,610 --> 01:00:11,610 state j at time n plus 1, if this is the probability of 999 01:00:11,610 --> 01:00:14,420 being in state i at time n. 1000 01:00:14,420 --> 01:00:17,790 So that condition gets satisfied. 1001 01:00:17,790 --> 01:00:19,280 That condition is satisfied. 1002 01:00:19,280 --> 01:00:22,760 You just stay in steady state forever. 1003 01:00:22,760 --> 01:00:29,210 And pi i has to be positive for a recurrent i, and pi i is 1004 01:00:29,210 --> 01:00:31,680 equal to 0 otherwise. 1005 01:00:31,680 --> 01:00:35,230 So this is just a generalization 1006 01:00:35,230 --> 01:00:38,090 of the ergodic theorem. 1007 01:00:38,090 --> 01:00:43,400 And this is not what people refer to as the ergodic 1008 01:00:43,400 --> 01:00:48,160 theorem, which is a much more general theorem than this. 1009 01:00:48,160 --> 01:00:50,900 This is the ergodic theorem for the case of finite state 1010 01:00:50,900 --> 01:00:53,110 Markov chains. 1011 01:00:53,110 --> 01:00:59,190 You can restate this in matrix form as the limit of the 1012 01:00:59,190 --> 01:01:02,900 matrix P to the n-th power. 1013 01:01:02,900 --> 01:01:06,680 What I didn't mention here and what I probably didn't mention 1014 01:01:06,680 --> 01:01:11,880 enough in the notes is that P sub i j-- 1015 01:01:32,360 --> 01:01:47,560 but also, if you take the matrix P times P time P, n 1016 01:01:47,560 --> 01:01:53,880 times, namely, you take the matrix, P to the n. 1017 01:01:53,880 --> 01:02:00,720 This says the P sub i j is the i j element. 1018 01:02:09,900 --> 01:02:12,530 I'm sure all of you know that by now, because you've been 1019 01:02:12,530 --> 01:02:15,310 using it all the time. 1020 01:02:15,310 --> 01:02:18,820 And what this says here-- 1021 01:02:18,820 --> 01:02:26,150 what we've said before is that every row of this matrix, P to 1022 01:02:26,150 --> 01:02:28,600 the n, is the same. 1023 01:02:28,600 --> 01:02:31,290 Every row is equal to pi. 1024 01:02:31,290 --> 01:02:47,786 P to the n tends to a matrix which is pi 1, pi 2, 1025 01:02:47,786 --> 01:02:52,120 up to pi sub n. 1026 01:02:52,120 --> 01:02:57,000 Pi 1, pi 2, up to pi sub n. 1027 01:03:00,760 --> 01:03:06,770 Pi 1, pi 2, up to pi sub n. 
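In practice you find pi by solving those equations directly. A minimal sketch with a made-up ergodic unichain, where state 2 is transient and states 0 and 1 form the recurrent class:

import numpy as np

P = np.array([[0.7, 0.3, 0.0],
              [0.4, 0.6, 0.0],
              [0.2, 0.3, 0.5]])

# Solve pi P = pi together with sum(pi) = 1: the rows of P^T - I are linearly
# dependent, so one of them can be overwritten with the normalization.
A = P.T - np.eye(3)
A[-1, :] = 1.0
pi = np.linalg.solve(A, np.array([0.0, 0.0, 1.0]))
print(pi)          # about [0.571, 0.429, 0.0]: zero on the transient state
print(pi @ P)      # the same vector again: the steady-state condition holds

# And the matrix power really does settle down to a matrix whose rows are pi.
print(np.linalg.matrix_power(P, 50).round(4))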
1028 01:03:06,770 --> 01:03:14,660 And the easiest way to express this is the vector e times pi, 1029 01:03:14,660 --> 01:03:24,960 where e is transposed. 1030 01:03:24,960 --> 01:03:32,755 In other words, if you take a column matrix, column 1, 1, 1, 1031 01:03:32,755 --> 01:03:40,670 1, 1, and you multiply this by a row vector, pi 1 times pi 1032 01:03:40,670 --> 01:03:48,030 sub n, what you get is, for this first row multiplied by 1033 01:03:48,030 --> 01:03:51,210 this, this gives you-- 1034 01:03:51,210 --> 01:03:53,480 well, in fact, if you multiply this out, 1035 01:03:53,480 --> 01:03:56,360 this is what you get. 1036 01:03:56,360 --> 01:03:58,650 And if you've never gone through the trouble of seeing 1037 01:03:58,650 --> 01:04:03,880 that this multiplication leads to this, please do it, because 1038 01:04:03,880 --> 01:04:07,170 it's important to notice that correspondence. 1039 01:04:14,530 --> 01:04:18,080 We got specific results by looking at the eigenvalues and 1040 01:04:18,080 --> 01:04:20,880 eigenvectors of stochastic matrices. 1041 01:04:20,880 --> 01:04:24,720 And a stochastic matrix is the matrix of a Markov chain. 1042 01:04:28,500 --> 01:04:31,290 So some of these things are sort of obvious. 1043 01:04:31,290 --> 01:04:36,870 Lambda is an eigenvalue of P, if and only if P minus lambda 1044 01:04:36,870 --> 01:04:38,120 i is singular. 1045 01:04:41,670 --> 01:04:45,040 This set of relationships is not obvious. 1046 01:04:45,040 --> 01:04:48,130 This is obvious linear algebra. 1047 01:04:48,130 --> 01:04:51,250 This is something that when you study eigenvalues and 1048 01:04:51,250 --> 01:04:55,430 eigenvectors in linear algebra, you recognize that 1049 01:04:55,430 --> 01:04:57,270 this is a summary of a lot of things. 1050 01:04:57,270 --> 01:05:01,440 If and only if this determinant is equal to 0, 1051 01:05:01,440 --> 01:05:05,650 which is true if and only if there's some vector nu for 1052 01:05:05,650 --> 01:05:12,560 which P times nu equals lambda times nu for nu unequal to 0. 1053 01:05:12,560 --> 01:05:16,920 And if and only if pi P equals lambda pi for some 1054 01:05:16,920 --> 01:05:18,210 pi unequal to 0. 1055 01:05:18,210 --> 01:05:23,250 In other words, if this determinant is equal to 0, it 1056 01:05:23,250 --> 01:05:32,040 means that the matrix P minus lambda i is singular. 1057 01:05:32,040 --> 01:05:35,950 If the matrix is singular, there has to be some solution 1058 01:05:35,950 --> 01:05:38,370 to this equation here. 1059 01:05:38,370 --> 01:05:40,220 There has to be some solution to this 1060 01:05:40,220 --> 01:05:44,530 left eigenvector equation. 1061 01:05:44,530 --> 01:05:48,740 Now, once you see this, you notice that e is always a 1062 01:05:48,740 --> 01:05:53,750 right eigenvector of P. Every stochastic matrix in the world 1063 01:05:53,750 --> 01:05:58,920 has the property that e is a right eigenvector of it. 1064 01:05:58,920 --> 01:05:59,800 Why is that? 1065 01:05:59,800 --> 01:06:05,230 Because all of the rows of a stochastic matrix sum to 1. 1066 01:06:05,230 --> 01:06:10,070 If you start off in state i, the sum of the possible states 1067 01:06:10,070 --> 01:06:14,530 you can be at in the next step is equal to 1. 1068 01:06:14,530 --> 01:06:17,120 You have to go somewhere. 1069 01:06:17,120 --> 01:06:21,650 So e is always a right eigenvector of P with 1070 01:06:21,650 --> 01:06:23,300 eigenvalue 1. 
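Both of those facts are one-line checks numerically; the chain below is the same made-up three-state example used earlier.

import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
e = np.ones(3)                       # the column vector of 1's

print(P @ e)                         # [1, 1, 1]: e is a right eigenvector with
                                     # eigenvalue 1 for any stochastic matrix

# Left eigenvector pi for eigenvalue 1 (normalized to sum to 1), and the
# rank-one matrix e times pi: every row of P^n approaches pi.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi = pi / pi.sum()
print(np.allclose(np.linalg.matrix_power(P, 50), np.outer(e, pi)))   # True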
1071 01:06:23,300 --> 01:06:26,510 Since e is also is a right eigenvector of P with 1072 01:06:26,510 --> 01:06:29,850 eigenvalue 1, we go up here. 1073 01:06:29,850 --> 01:06:32,460 We look at these if and only if statements. 1074 01:06:32,460 --> 01:06:34,890 We see, then, P must be singular. 1075 01:06:34,890 --> 01:06:38,410 And then pi times P equals lambda pi. 1076 01:06:38,410 --> 01:06:41,410 So no matter how many recurrent classes we have, no 1077 01:06:41,410 --> 01:06:46,430 matter what periodicity we have in each of them, there's 1078 01:06:46,430 --> 01:06:53,170 always a solution to pi times P equals pi. 1079 01:06:53,170 --> 01:06:55,550 There's always at least one steady state vector. 1080 01:06:59,320 --> 01:07:03,580 This determinant has an M-th degree polynomial in lambda. 1081 01:07:03,580 --> 01:07:08,150 M-th degree polynomials have M roots. 1082 01:07:08,150 --> 01:07:10,400 They aren't necessarily distinct. 1083 01:07:10,400 --> 01:07:14,040 The multiplicity of an eigenvalue is the number roots 1084 01:07:14,040 --> 01:07:15,500 of that value. 1085 01:07:15,500 --> 01:07:19,780 And the multiplicity of lambda equals 1. 1086 01:07:19,780 --> 01:07:22,530 How many different roots are there which have 1087 01:07:22,530 --> 01:07:24,360 lambda equals 1? 1088 01:07:24,360 --> 01:07:26,940 Well it turns out to be just the number of recurrent 1089 01:07:26,940 --> 01:07:29,550 classes that you have. 1090 01:07:29,550 --> 01:07:32,750 If you have a bunch of recurrent classes, within each 1091 01:07:32,750 --> 01:07:37,330 recurring class, there's a solution to pi P equals pi, 1092 01:07:37,330 --> 01:07:41,540 which is non-zero only one that recurrent class. 1093 01:07:41,540 --> 01:07:46,340 Namely, you take this huge Markov chain and you say, I 1094 01:07:46,340 --> 01:07:48,650 don't care about any of this except this 1095 01:07:48,650 --> 01:07:50,890 one recurrent class. 1096 01:07:50,890 --> 01:07:53,990 If we look at this one recurrent class, and solve for 1097 01:07:53,990 --> 01:07:57,500 the steady state probability in that one recurrent class, 1098 01:07:57,500 --> 01:08:01,220 then we get an eigenvector which is non-zero on that 1099 01:08:01,220 --> 01:08:05,990 class, 0 everywhere else, that has an eigenvalue 1. 1100 01:08:05,990 --> 01:08:08,050 And for every other recurrent class, we 1101 01:08:08,050 --> 01:08:10,590 get the same situation. 1102 01:08:10,590 --> 01:08:14,150 So the multiplicity of lambda equals 1 is equal to the 1103 01:08:14,150 --> 01:08:17,260 number of recurrent classes. 1104 01:08:17,260 --> 01:08:21,950 If you didn't get that proof on the fly, it gets 1105 01:08:21,950 --> 01:08:23,310 proved in the notes. 1106 01:08:23,310 --> 01:08:27,130 And if you don't get the proof, just remember that 1107 01:08:27,130 --> 01:08:28,380 that's the way it is. 1108 01:08:30,859 --> 01:08:34,859 For the special case where all M eigenvalues are distinct, 1109 01:08:34,859 --> 01:08:38,640 the right eigenvectors are linearly independent. 1110 01:08:38,640 --> 01:08:42,620 You remember that proof we went through that all of the 1111 01:08:42,620 --> 01:08:46,470 left eigenvectors and all the right eigenvectors are all 1112 01:08:46,470 --> 01:08:49,870 orthonormal to each other, or you can make them all 1113 01:08:49,870 --> 01:08:52,270 orthonormal to each other? 
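A quick numerical check of that multiplicity claim, with a made-up chain that has two recurrent classes ({0, 1} and {3}) and one transient state (2):

import numpy as np

P = np.array([[0.6, 0.4, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.1, 0.1, 0.6, 0.2],
              [0.0, 0.0, 0.0, 1.0]])

vals = np.linalg.eigvals(P)
print(np.sort_complex(vals))
print(np.sum(np.isclose(vals, 1)))   # 2: one eigenvalue 1 per recurrent class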
1114 01:08:52,270 --> 01:08:57,380 That says that if the right eigenvectors are linearly 1115 01:08:57,380 --> 01:09:01,120 independent, you can represent them as the columns of an 1116 01:09:01,120 --> 01:09:04,750 invertible matrix U. Then P times U is 1117 01:09:04,750 --> 01:09:06,819 equal to U times lambda. 1118 01:09:06,819 --> 01:09:09,800 What does this equations say? 1119 01:09:09,800 --> 01:09:12,460 You split it up into a bunch of equations. 1120 01:09:16,500 --> 01:09:46,080 P times U and we look at it as nu 1, nu 2, nu sub [? n ?]. 1121 01:09:46,080 --> 01:09:52,580 I guess better put the superscripts on it. 1122 01:09:56,100 --> 01:10:01,270 If I take the matrix U and just view it as M different 1123 01:10:01,270 --> 01:10:05,190 columns, then what this is saying is that 1124 01:10:05,190 --> 01:10:06,545 this is equal to-- 1125 01:10:17,290 --> 01:10:35,540 nu 1, nu 2, nu M, times lambda 1, lambda 2, up to lambda M. 1126 01:10:35,540 --> 01:10:38,500 Now you multiply this out, and what do you get? 1127 01:10:38,500 --> 01:10:41,860 You get nu 1 times lambda 1. 1128 01:10:41,860 --> 01:10:46,190 You get a nu 2 times lambda 2 for the second column, nu M 1129 01:10:46,190 --> 01:10:49,820 times lambda M for the last column, and here you get P 1130 01:10:49,820 --> 01:10:54,360 times nu 1 is equal to a nu 1 times lambda 1, and so forth. 1131 01:10:54,360 --> 01:10:59,240 So all this vector equation says is the same thing that 1132 01:10:59,240 --> 01:11:04,760 these n M individual eigenvector equations say. 1133 01:11:04,760 --> 01:11:11,160 It's just a more compact way of saying the same thing. 1134 01:11:11,160 --> 01:11:17,300 And if these eigenvectors span this space, then this set of 1135 01:11:17,300 --> 01:11:20,710 eigenvectors are linearly independent of each other. 1136 01:11:20,710 --> 01:11:24,860 And when you look at the set of them, this matrix here has 1137 01:11:24,860 --> 01:11:26,440 to have an inverse. 1138 01:11:26,440 --> 01:11:34,890 So you can also express this as P equals this vector-- 1139 01:11:34,890 --> 01:11:40,820 this matrix of right eigenvectors times the 1140 01:11:40,820 --> 01:11:46,630 diagonal matrix lambda, times the inverse of this matrix. 1141 01:11:46,630 --> 01:11:49,880 Matrix U to the minus 1 turns out to have rows equal to the 1142 01:11:49,880 --> 01:11:51,730 left eigenvectors. 1143 01:11:51,730 --> 01:11:54,330 That's because these eigenvectors-- 1144 01:11:54,330 --> 01:11:57,440 that's because the right eigenvectors and the left 1145 01:11:57,440 --> 01:12:01,270 eigenvectors are orthogonal to each other. 1146 01:12:04,670 --> 01:12:09,690 When we then split up this matrix into a sum of M 1147 01:12:09,690 --> 01:12:13,830 different matrices, each matrix having only one-- 1148 01:12:41,270 --> 01:12:43,460 and so forth. 1149 01:12:43,460 --> 01:12:45,710 Then what you get-- 1150 01:12:45,710 --> 01:12:48,490 here's this-- 1151 01:12:48,490 --> 01:12:54,730 this nice equation here, which says that if all the 1152 01:12:54,730 --> 01:12:58,870 eigenvalues are distinct, then you can always represent a 1153 01:12:58,870 --> 01:13:03,420 stochastic matrix as the sum of lambda i times nu to the i 1154 01:13:03,420 --> 01:13:04,670 times pi to the i. 
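Here is a minimal check of that decomposition, assuming numpy and the same made-up three-state chain as before (its three eigenvalues are distinct, so the eigenvectors span the space):

import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

lam, U = np.linalg.eig(P)    # columns of U are the right eigenvectors nu^(i)
Uinv = np.linalg.inv(U)      # rows of U^{-1} are the matching left eigenvectors
                             # pi^(i), scaled so that pi^(i) nu^(i) = 1
terms = [lam[i] * np.outer(U[:, i], Uinv[i, :]) for i in range(3)]
print(np.allclose(sum(terms), P))   # True: P = sum of lambda_i nu^(i) pi^(i)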
1155 01:13:04,670 --> 01:13:10,000 More importantly, if you take this equation here and look at 1156 01:13:10,000 --> 01:13:14,470 P to the n, P to the n is U times lambda times U to the 1157 01:13:14,470 --> 01:13:18,820 minus 1, times U times lambda times U to the minus 1, blah, 1158 01:13:18,820 --> 01:13:20,270 blah, blah forever. 1159 01:13:20,270 --> 01:13:24,030 Each U to the minus 1 cancels out with the following U. And 1160 01:13:24,030 --> 01:13:29,330 you wind up with P to the n equals U times lambda to the 1161 01:13:29,330 --> 01:13:33,170 n, U to the minus 1. 1162 01:13:33,170 --> 01:13:40,250 Which says that P to the n is just a sum here. 1163 01:13:40,250 --> 01:13:44,650 It's the sum of the eigenvalues to the n-th power 1164 01:13:44,650 --> 01:13:47,320 times these pairs of eigenvectors here. 1165 01:13:47,320 --> 01:13:51,660 So this is a general decomposition for P to the n. 1166 01:13:51,660 --> 01:13:56,010 What we're interested in is what happens as n gets large. 1167 01:13:56,010 --> 01:13:59,360 If we have a unichain, we already know what happens as n 1168 01:13:59,360 --> 01:14:00,570 gets large. 1169 01:14:00,570 --> 01:14:07,110 We know that as n gets large, we wind up with just 1 times 1170 01:14:07,110 --> 01:14:12,480 this eigenvector e times this eigenvector pi. 1171 01:14:12,480 --> 01:14:15,760 Which says that all of the other eigenvalues have to go 1172 01:14:15,760 --> 01:14:19,670 to 0, which says that the magnitudes of these other 1173 01:14:19,670 --> 01:14:22,200 eigenvalues are less than 1. 1174 01:14:22,200 --> 01:14:23,450 So they're all going away. 1175 01:14:26,600 --> 01:14:32,300 So the facts here are that all eigenvalues lambda have to 1176 01:14:32,300 --> 01:14:35,310 satisfy the magnitude of lambda is less 1177 01:14:35,310 --> 01:14:36,740 than or equal to 1. 1178 01:14:36,740 --> 01:14:39,680 That's what I just argued. 1179 01:14:39,680 --> 01:14:44,530 For each recurrent class C, there's one lambda equals 1, 1180 01:14:44,530 --> 01:14:47,750 with a left eigenvector equal to the steady state on 1181 01:14:47,750 --> 01:14:51,190 that recurrent class and 0 elsewhere. 1182 01:14:51,190 --> 01:14:55,230 The right eigenvector nu satisfies this: the limit as n goes 1183 01:14:55,230 --> 01:14:56,410 to infinity 1184 01:14:56,410 --> 01:15:00,930 of the probability that x sub n is in this recurrent class, 1185 01:15:00,930 --> 01:15:04,850 given that x sub 0 is equal to i, is equal to the i-th 1186 01:15:04,850 --> 01:15:08,700 component of that right eigenvector. 1187 01:15:08,700 --> 01:15:13,200 In other words, if you have a Markov chain which has several 1188 01:15:13,200 --> 01:15:16,480 recurrent classes, and you want to find out what the 1189 01:15:16,480 --> 01:15:23,630 probability is, starting in a transient state, of going 1190 01:15:23,630 --> 01:15:29,170 to one of those classes, this is what tells you that answer. 1191 01:15:29,170 --> 01:15:33,510 This says that the probability that you go to a particular 1192 01:15:33,510 --> 01:15:37,530 recurrent class C, given that you start off in a particular 1193 01:15:37,530 --> 01:15:41,340 transient state i, is whatever that right eigenvector 1194 01:15:41,340 --> 01:15:42,690 turns out to be. 1195 01:15:42,690 --> 01:15:46,170 And you can solve that right eigenvector problem just as an 1196 01:15:46,170 --> 01:15:48,920 M by M set of linear equations.
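Here is a sketch of that calculation for the same made-up two-class chain used above. Fixing the right eigenvector at 1 on the class of interest and 0 on the other recurrent class leaves only the transient components unknown, which is a small linear system.

import numpy as np

P = np.array([[0.6, 0.4, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.1, 0.1, 0.6, 0.2],
              [0.0, 0.0, 0.0, 1.0]])
recurrent_C = [0, 1]      # the class we want the absorption probability for
transient = [2]           # state 3 is the other recurrent class, fixed at 0

# nu = P nu with nu = 1 on C and 0 on the other recurrent class gives
# (I - P_TT) nu_T = P[T, C] summed over C.
PTT = P[np.ix_(transient, transient)]
b = P[np.ix_(transient, recurrent_C)].sum(axis=1)
nu_T = np.linalg.solve(np.eye(len(transient)) - PTT, b)
print(nu_T)               # [0.5]: from state 2 you end up in {0, 1} half the time

# Brute-force sanity check: row 2 of P^n, summed over the columns in C.
print(np.linalg.matrix_power(P, 200)[2, recurrent_C].sum())   # about 0.5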
1197 01:15:48,920 --> 01:15:51,170 So you can find the probability of reaching 1198 01:15:51,170 --> 01:15:56,370 each recurrent class from each transient state just by solving that set of linear 1199 01:15:56,370 --> 01:16:01,650 equations-- those eigenvector equations. 1200 01:16:01,650 --> 01:16:05,770 For each recurrent periodic class of period d, there are d 1201 01:16:05,770 --> 01:16:09,140 eigenvalues equally spaced on the unit circle. 1202 01:16:09,140 --> 01:16:13,330 There are no other eigenvalues with a 1203 01:16:13,330 --> 01:16:15,080 magnitude of lambda equal to 1. 1204 01:16:15,080 --> 01:16:19,070 In other words, for each recurrent class, you get one 1205 01:16:19,070 --> 01:16:20,700 eigenvalue that's equal to 1. 1206 01:16:20,700 --> 01:16:25,260 If that recurrent class is periodic, you get a bunch of 1207 01:16:25,260 --> 01:16:30,640 other eigenvalues put around the unit circle. 1208 01:16:30,640 --> 01:16:35,380 And those are all the eigenvalues there are. 1209 01:16:35,380 --> 01:16:36,296 Oh my God. 1210 01:16:36,296 --> 01:16:38,000 It's-- 1211 01:16:38,000 --> 01:16:39,930 I thought I was talking quickly. 1212 01:16:39,930 --> 01:16:44,870 But anyway, if the eigenvectors don't span the 1213 01:16:44,870 --> 01:16:50,360 space, then P to the n is equal to U times J to the 1214 01:16:50,360 --> 01:16:55,350 n, times U to the minus 1, where J is a Jordan form. 1215 01:16:55,350 --> 01:16:58,320 What you saw in the homework when you looked at the-- 1216 01:17:02,030 --> 01:17:04,075 when you looked at the Markov chain-- 1217 01:17:28,120 --> 01:17:28,620 OK. 1218 01:17:28,620 --> 01:17:35,020 This is one recurrent class with this one node in it. 1219 01:17:35,020 --> 01:17:38,030 These two nodes are both transient. 1220 01:17:38,030 --> 01:17:41,720 If you look at how long it takes to get from here over to 1221 01:17:41,720 --> 01:17:45,120 there, those transition probabilities do not 1222 01:17:45,120 --> 01:17:51,620 correspond to this equation here. 1223 01:17:51,620 --> 01:17:54,075 Instead, with P sub 1 2 and 1224 01:17:57,400 --> 01:18:00,230 P sub 2 3 the way I've drawn it here, 1225 01:18:00,230 --> 01:18:07,140 P sub 1 3 to the n picks up a term that's n times this eigenvalue to the n, and the eigenvalue 1226 01:18:07,140 --> 01:18:09,760 is 1/2 in this case. 1227 01:18:09,760 --> 01:18:12,820 And it doesn't correspond to this, which is why you need a 1228 01:18:12,820 --> 01:18:14,290 Jordan form. 1229 01:18:14,290 --> 01:18:17,860 I said that Jordan forms are excessively ugly. 1230 01:18:17,860 --> 01:18:22,120 Jordan forms are really very classy and nice ways of 1231 01:18:22,120 --> 01:18:24,460 dealing with a problem which is very ugly. 1232 01:18:24,460 --> 01:18:26,340 So don't blame Jordan. 1233 01:18:26,340 --> 01:18:29,670 Jordan simplified things for us. 1234 01:18:29,670 --> 01:18:36,840 So that's roughly as far as we went with Markov chains. 1235 01:18:40,970 --> 01:18:44,910 Renewal processes, we don't have to review them because 1236 01:18:44,910 --> 01:18:47,400 you're already intimately familiar with them. 1237 01:18:50,610 --> 01:18:55,910 I will do one thing next time with renewal processes and 1238 01:18:55,910 --> 01:19:00,290 Markov chains, which is to explain to you why the 1239 01:19:00,290 --> 01:19:04,660 expected amount of time to get from one state back to itself 1240 01:19:04,660 --> 01:19:07,380 is equal to 1 over pi-- 1241 01:19:07,380 --> 01:19:09,160 1 over pi sub i. 1242 01:19:09,160 --> 01:19:10,790 You did that in the homework.
1243 01:19:10,790 --> 01:19:12,900 And it was an awful way to do it. 1244 01:19:12,900 --> 01:19:14,340 And there's a nice way to do it. 1245 01:19:14,340 --> 01:19:15,860 I'll talk about that next time.
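In the meantime, the claim is easy to check by simulation. A minimal sketch with a made-up two-state chain whose steady-state vector is pi = (0.6, 0.4), so the mean time between visits to state 0 should come out close to 1/0.6, roughly 1.67:

import random

P = [[0.8, 0.2],     # pi = (0.6, 0.4) solves pi P = pi for this chain
     [0.3, 0.7]]

random.seed(1)
state, visits, steps = 0, 0, 10**6
for _ in range(steps):
    # take one step of the chain
    state = 0 if random.random() < P[state][0] else 1
    if state == 0:
        visits += 1
# The long-run rate of visits to state 0 is pi_0, so the average gap between
# visits, steps/visits, estimates the mean recurrence time 1/pi_0.
print(steps / visits)     # about 1.67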