1 00:00:00,530 --> 00:00:02,960 The following content is provided under a Creative 2 00:00:02,960 --> 00:00:04,370 Commons license. 3 00:00:04,370 --> 00:00:07,410 Your support will help MIT OpenCourseWare continue to 4 00:00:07,410 --> 00:00:11,060 offer high quality educational resources for free. 5 00:00:11,060 --> 00:00:13,960 To make a donation or view additional materials from 6 00:00:13,960 --> 00:00:17,890 hundreds of MIT courses, visit MIT OpenCourseWare at 7 00:00:17,890 --> 00:00:19,140 ocw.mit.edu. 8 00:00:24,010 --> 00:00:26,560 PROFESSOR: I'm going to spend most of time talking about 9 00:00:26,560 --> 00:00:29,400 chapters one, two, and three. 10 00:00:29,400 --> 00:00:32,220 A little bit talking about chapter four, because we've 11 00:00:32,220 --> 00:00:36,370 been doing so much with chapter four in the last 12 00:00:36,370 --> 00:00:39,980 couple of weeks that you probably remember that more. 13 00:00:39,980 --> 00:00:40,580 OK. 14 00:00:40,580 --> 00:00:44,310 The basics, which we started out with, and which you should 15 00:00:44,310 --> 00:00:48,800 never forget, is that any time you develop a probability 16 00:00:48,800 --> 00:00:53,840 model, you've got to specify what the sample space is and 17 00:00:53,840 --> 00:00:57,920 what the probability measure on that sample space is. 18 00:00:57,920 --> 00:01:01,850 And in practice, and in almost everything we've talked about 19 00:01:01,850 --> 00:01:05,800 so far, there's really a basic countable set of random 20 00:01:05,800 --> 00:01:08,490 variables which determine everything else. 21 00:01:08,490 --> 00:01:12,030 In other words, when you find the joint probability 22 00:01:12,030 --> 00:01:16,730 distribution on that set of random variables, that tells 23 00:01:16,730 --> 00:01:20,570 you everything else of interest. 24 00:01:20,570 --> 00:01:25,200 And a sample point or a sample path on that set of random 25 00:01:25,200 --> 00:01:29,520 variables is in a collection of sample values, one sample 26 00:01:29,520 --> 00:01:33,980 value for each random variable. 27 00:01:33,980 --> 00:01:37,740 It's very convenient, especially when you're in an 28 00:01:37,740 --> 00:01:43,630 exam and a little bit rushed, to confuse random variables 29 00:01:43,630 --> 00:01:47,250 with the sample values for the random variables. 30 00:01:47,250 --> 00:01:48,920 And that's fine. 31 00:01:48,920 --> 00:01:51,900 I just want to caution you again, and I've done this many 32 00:01:51,900 --> 00:01:58,410 times, that about half the mistakes that people make-- 33 00:01:58,410 --> 00:02:01,980 half of the conceptual mistakes that people make 34 00:02:01,980 --> 00:02:06,200 doing problems and doing quizzes are connected with 35 00:02:06,200 --> 00:02:09,810 getting confused at some point about what's a random variable 36 00:02:09,810 --> 00:02:12,210 and what's a sample value of that random variable. 37 00:02:12,210 --> 00:02:17,210 And you start thinking about sample values as just numbers. 38 00:02:17,210 --> 00:02:19,090 And I do that too. 39 00:02:19,090 --> 00:02:21,220 It's convenient for thinking about things. 40 00:02:21,220 --> 00:02:26,790 But you have to know that that's not the whole story. 41 00:02:26,790 --> 00:02:29,740 Often, we have uncountable sets of random variables. 
42 00:02:29,740 --> 00:02:34,720 Like in renewal processes, we have the counting renewal 43 00:02:34,720 --> 00:02:38,690 process, which typically has an uncountable set of random 44 00:02:38,690 --> 00:02:43,860 variables, a number of arrivals up to each time, t, 45 00:02:43,860 --> 00:02:48,750 where t is a continuous valued random variable. 46 00:02:48,750 --> 00:02:52,810 But in almost all of those cases, you can define things 47 00:02:52,810 --> 00:02:56,195 in terms of simpler sets of random variables, like the 48 00:02:56,195 --> 00:02:59,480 interarrival times, which are IID. 49 00:03:02,530 --> 00:03:05,960 Most of the processes we've talked about really have a 50 00:03:05,960 --> 00:03:08,600 pretty simple description if you look for the simplest 51 00:03:08,600 --> 00:03:09,850 description of them. 52 00:03:13,730 --> 00:03:17,680 If you have a sequence of IID random variables-- 53 00:03:17,680 --> 00:03:25,270 which is what we have for Poisson and renewal processes, 54 00:03:25,270 --> 00:03:28,680 and what we have for Markov chains is not that much more 55 00:03:28,680 --> 00:03:30,310 complicated-- 56 00:03:30,310 --> 00:03:35,500 the laws of large numbers are useful to specify what the 57 00:03:35,500 --> 00:03:38,500 long term behavior is. 58 00:03:38,500 --> 00:03:47,280 The sample time average, as we all know by now, is the sum 59 00:03:47,280 --> 00:03:49,960 of the random variables divided by n. 60 00:03:49,960 --> 00:03:53,090 So it's a sample average of these quantities. 61 00:03:53,090 --> 00:03:57,570 The random variable, which has a mean x bar, the expected 62 00:03:57,570 --> 00:04:00,140 value of x, that's almost obvious. 63 00:04:00,140 --> 00:04:03,350 You just take the expected value of s sub n, and it's n 64 00:04:03,350 --> 00:04:08,360 times the expected value of x divided by n, and you're done. 65 00:04:08,360 --> 00:04:11,680 And the variance, since these random variables are 66 00:04:11,680 --> 00:04:15,540 independent, you find that almost as easily. 67 00:04:15,540 --> 00:04:18,810 That has this very simple-minded 68 00:04:18,810 --> 00:04:20,850 distribution function. 69 00:04:20,850 --> 00:04:24,340 Remember, we usually work with distribution 70 00:04:24,340 --> 00:04:26,960 functions in this class. 71 00:04:26,960 --> 00:04:32,580 And often, the exercises are much easier when you do them 72 00:04:32,580 --> 00:04:36,500 in terms of the distribution function than if you use 73 00:04:36,500 --> 00:04:40,760 formulas you remember from elementary courses, which are 74 00:04:40,760 --> 00:04:44,260 specialized to-- 75 00:04:44,260 --> 00:04:47,140 which are specialized to probability density and 76 00:04:47,140 --> 00:04:51,170 probability mass functions, and often have more special 77 00:04:51,170 --> 00:04:53,110 conditions on them than that. 78 00:04:53,110 --> 00:04:57,470 But anyway, the distribution function starts 79 00:04:57,470 --> 00:04:58,570 to look like this. 80 00:04:58,570 --> 00:05:03,250 As n gets bigger, you notice that what's happening is that 81 00:05:03,250 --> 00:05:08,860 you get a distribution which is scrunching in this way, 82 00:05:08,860 --> 00:05:10,820 which is starting to look smoother. 83 00:05:10,820 --> 00:05:13,450 The jumps in it get smaller. 84 00:05:13,450 --> 00:05:18,630 And you start out with this thing which is kind of crazy. 85 00:05:18,630 --> 00:05:21,370 And by the time n is even 50.
86 00:05:21,370 --> 00:05:25,770 You get something which almost looks like a-- 87 00:05:25,770 --> 00:05:26,840 I don't know how we tell the difference 88 00:05:26,840 --> 00:05:28,460 between those two things. 89 00:05:28,460 --> 00:05:30,060 I thought we could, but we can't. 90 00:05:30,060 --> 00:05:31,670 I certainly can't up there. 91 00:05:31,670 --> 00:05:37,650 But anyway, the one that's tightest in is the one 92 00:05:37,650 --> 00:05:39,880 for n equals 50. 93 00:05:39,880 --> 00:05:44,150 And what these laws of large numbers all say in some sense 94 00:05:44,150 --> 00:05:51,380 is that this distribution function gets crunched in 95 00:05:51,380 --> 00:05:54,550 towards an impulse at the mean. 96 00:05:54,550 --> 00:05:58,260 And then they say other more specialized things about how 97 00:05:58,260 --> 00:06:02,580 this happens, about sample paths and all of that. 98 00:06:02,580 --> 00:06:06,270 But the idea is that this distribution function is 99 00:06:06,270 --> 00:06:10,760 heading towards a unit impulse. 100 00:06:10,760 --> 00:06:14,440 The weak law of large numbers then says that if the expected 101 00:06:14,440 --> 00:06:18,840 value of the magnitude of x is less than infinity-- 102 00:06:18,840 --> 00:06:21,660 and usually when we talk about random variables having a 103 00:06:21,660 --> 00:06:25,630 mean, that's exactly what we mean. 104 00:06:25,630 --> 00:06:31,220 If that condition is not satisfied, then we usually say 105 00:06:31,220 --> 00:06:33,690 that the random variable doesn't have a mean. 106 00:06:33,690 --> 00:06:37,300 And you'll see that every time you look at anything in 107 00:06:37,300 --> 00:06:38,520 probability theory. 108 00:06:38,520 --> 00:06:41,940 When people say the mean exists, that's what they 109 00:06:41,940 --> 00:06:43,830 always mean. 110 00:06:43,830 --> 00:06:47,950 And what the theorem says then is exactly what we were 111 00:06:47,950 --> 00:06:49,060 talking about before. 112 00:06:49,060 --> 00:06:54,940 The probability that the difference between s n over n, 113 00:06:54,940 --> 00:06:58,570 and the mean of x bar, the probability that it's greater 114 00:06:58,570 --> 00:07:03,090 than or equal to epsilon equals 0 in the limit. 115 00:07:03,090 --> 00:07:06,020 So it's saying that you put epsilon limits on that 116 00:07:06,020 --> 00:07:10,860 distribution function and let n get bigger and bigger, it 117 00:07:10,860 --> 00:07:14,570 goes to 1 and 0. 118 00:07:14,570 --> 00:07:18,120 It says the probability of s n over n, less than or equal to 119 00:07:18,120 --> 00:07:23,240 x, approaches a unit step as n approaches infinity. 120 00:07:23,240 --> 00:07:27,660 This says this is the condition for convergence in 121 00:07:27,660 --> 00:07:30,440 probability. 122 00:07:30,440 --> 00:07:33,880 What we're saying is that that also means convergence and 123 00:07:33,880 --> 00:07:38,740 distribution function, and distribution for this case. 124 00:07:38,740 --> 00:07:42,520 And then we also, when we got to renewal processes, we 125 00:07:42,520 --> 00:07:45,330 talked about the strong law of large numbers. 126 00:07:45,330 --> 00:07:49,760 And that says that the expected value of x is finite. 127 00:07:49,760 --> 00:07:56,630 Then this limit approaches x on a sample path basis. 128 00:07:56,630 --> 00:07:59,770 In other words, for every sample path, except this set 129 00:07:59,770 --> 00:08:05,020 of probability 0, this condition holds true. 
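A minimal simulation sketch of the convergence just described, assuming IID exponential random variables with mean 1; the distribution, epsilon, and sample sizes are arbitrary illustrative choices, not from the lecture. It estimates the probability that the sample average is at least epsilon away from the mean, for increasing n:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, trials = 0.1, 5000
for n in [10, 50, 250, 1000]:
    # sample averages S_n / n for IID exponential(1) random variables (mean 1)
    sample_avgs = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)
    prob = np.mean(np.abs(sample_avgs - 1.0) >= eps)
    print(f"n={n:5d}  estimated P(|S_n/n - 1| >= {eps}) = {prob:.4f}")
```

The estimated probability shrinks toward 0 as n grows, which is the weak-law statement seen through the distribution function tightening around the mean.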
130 00:08:05,020 --> 00:08:08,260 That doesn't seem like it's very different or very 131 00:08:08,260 --> 00:08:10,610 important for the time being. 132 00:08:10,610 --> 00:08:14,060 But when we started studying renewal processes, which is 133 00:08:14,060 --> 00:08:19,120 where we actually talked about this, we saw that in fact, it 134 00:08:19,120 --> 00:08:24,830 let us talk about this, which says that if you take any 135 00:08:24,830 --> 00:08:28,700 function of s n over n-- 136 00:08:28,700 --> 00:08:31,590 in other words, a function of a real value-- 137 00:08:31,590 --> 00:08:33,830 a function of a-- 138 00:08:33,830 --> 00:08:35,720 a real valued function of a-- 139 00:08:40,010 --> 00:08:43,570 a real valued function of a real value, yes. 140 00:08:43,570 --> 00:08:46,470 What you get is that same function 141 00:08:46,470 --> 00:08:49,100 applied to the mean here. 142 00:08:49,100 --> 00:08:50,260 And that's the thing which is so 143 00:08:50,260 --> 00:08:52,630 useful for renewal processes. 144 00:08:52,630 --> 00:08:55,740 And it's what usually makes the strong law of large 145 00:08:55,740 --> 00:08:58,730 numbers so much easier to use than the weak law. 146 00:09:04,220 --> 00:09:06,170 That's a plug for the strong law. 147 00:09:06,170 --> 00:09:08,745 There are many extensions of the weak law telling how fast 148 00:09:08,745 --> 00:09:10,910 the convergence is. 149 00:09:10,910 --> 00:09:14,350 One thing you should always remember about the central 150 00:09:14,350 --> 00:09:17,510 limit theorem, is it really tells you something about the 151 00:09:17,510 --> 00:09:18,790 weak law of large numbers. 152 00:09:18,790 --> 00:09:22,260 It tells you how fast that convergence is and what the 153 00:09:22,260 --> 00:09:24,720 convergence looks like. 154 00:09:24,720 --> 00:09:28,170 It says that if the variance of this underlying random 155 00:09:28,170 --> 00:09:34,000 variable is finite, then this limit here is equal to the 156 00:09:34,000 --> 00:09:37,290 normal distribution function, the Gaussian with 157 00:09:37,290 --> 00:09:41,350 variance 1 and mean 0. 158 00:09:41,350 --> 00:09:45,070 And that becomes a little easier to see what it's saying 159 00:09:45,070 --> 00:09:46,870 if you look at it this way. 160 00:09:46,870 --> 00:09:51,510 It says probability that s n over n minus x bar-- 161 00:09:51,510 --> 00:09:56,890 namely the difference between the sum and the mean which 162 00:09:56,890 --> 00:09:58,380 it's converging to-- 163 00:09:58,380 --> 00:10:01,340 the probability that that's less than or equal to y sigma 164 00:10:01,340 --> 00:10:04,010 over square root of n is this normal 165 00:10:04,010 --> 00:10:05,480 Gaussian distribution function. 166 00:10:05,480 --> 00:10:11,740 It says that as n gets bigger and bigger, this quantity here 167 00:10:11,740 --> 00:10:13,030 gets tighter and tighter. 168 00:10:13,030 --> 00:10:18,620 What it says in terms of the picture here, in terms of this 169 00:10:18,620 --> 00:10:22,900 picture, it says that as n gets bigger and bigger, this 170 00:10:22,900 --> 00:10:28,560 picture here scrunches down as 1 over the square root of n. 171 00:10:28,560 --> 00:10:30,970 And it also becomes Gaussian. 172 00:10:30,970 --> 00:10:33,760 It tells you exactly what kind of convergence you 173 00:10:33,760 --> 00:10:34,770 actually have here. 174 00:10:34,770 --> 00:10:39,200 It's not only saying that this does converge to a unit step. 175 00:10:39,200 --> 00:10:42,010 It says how it converges.
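A rough numerical check of that central limit theorem statement, assuming IID uniform(0,1) random variables (mean 1/2, variance 1/12); the distribution, n, and the y values below are illustrative choices only. The normalized quantity (S_n/n - x bar) times square root of n over sigma should have a distribution function close to the standard normal:

```python
import numpy as np
from math import erf, sqrt

def phi(y):
    """Standard normal distribution function Phi(y)."""
    return 0.5 * (1 + erf(y / sqrt(2)))

rng = np.random.default_rng(0)
n, trials = 200, 20000
x = rng.random((trials, n))                       # IID uniform(0,1) samples
sigma = np.sqrt(1 / 12)
z = (x.mean(axis=1) - 0.5) * np.sqrt(n) / sigma   # normalized sample averages
for y in [-1.0, 0.0, 1.0]:
    print(f"y={y:+.1f}  empirical P(Z <= y)={np.mean(z <= y):.4f}  Phi(y)={phi(y):.4f}")
```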
176 00:10:42,010 --> 00:10:48,240 And that's a nice thing, conceptually. 177 00:10:48,240 --> 00:10:51,780 You don't always need it in problems. 178 00:10:51,780 --> 00:10:54,600 But you need it for understanding what's going on. 179 00:10:59,890 --> 00:11:01,690 We're moving backwards, it seems. 180 00:11:06,180 --> 00:11:09,420 Now, 1, 2, Poisson processes. 181 00:11:09,420 --> 00:11:12,630 We talked about arrival processes. 182 00:11:12,630 --> 00:11:15,260 You'd almost think that all processes are arrival 183 00:11:15,260 --> 00:11:17,080 processes at this point. 184 00:11:17,080 --> 00:11:19,770 But any time you start to think about that, think of a 185 00:11:19,770 --> 00:11:21,270 Markov chain. 186 00:11:21,270 --> 00:11:26,150 And a Markov chain is not an arrival process, ordinarily. 187 00:11:26,150 --> 00:11:28,470 Some of them can be viewed that way. 188 00:11:28,470 --> 00:11:29,690 But most of them can't. 189 00:11:29,690 --> 00:11:31,990 An arrival processes is an increasing 190 00:11:31,990 --> 00:11:34,650 sequence of random variables. 191 00:11:34,650 --> 00:11:40,020 0 less than s1, which is the time of the first arrival, s2, 192 00:11:40,020 --> 00:11:42,810 which is a time of the second arrival, and so forth. 193 00:11:42,810 --> 00:11:48,220 Interarrival times are x1 equals s1, and x i equals s i 194 00:11:48,220 --> 00:11:51,150 minus s i minus 1. 195 00:11:51,150 --> 00:11:55,480 The picture, which you should have indelibly printed on the 196 00:11:55,480 --> 00:11:58,850 back of your brain someplace by this time, is 197 00:11:58,850 --> 00:12:00,430 this picture here. 198 00:12:00,430 --> 00:12:04,930 s1, s2, s3, are the times at which arrivals occur. 199 00:12:04,930 --> 00:12:07,590 These are random variables, so these arrivals come in at 200 00:12:07,590 --> 00:12:09,320 random times. 201 00:12:09,320 --> 00:12:14,690 x1, x2, x3 are the intervals between arrivals. 202 00:12:14,690 --> 00:12:18,280 And N of t is the number of arrivals that have occurred up 203 00:12:18,280 --> 00:12:19,860 until time t. 204 00:12:19,860 --> 00:12:26,800 So every time the t passes one of these arrival times, N of t 205 00:12:26,800 --> 00:12:31,140 pops up by one, pops up by one again, pops up by one again. 206 00:12:31,140 --> 00:12:34,200 The sample value pops up by one. 207 00:12:34,200 --> 00:12:36,920 Arrival process can model arrivals to a queue, 208 00:12:36,920 --> 00:12:40,320 departures from a queue, locations of breaks in an oil 209 00:12:40,320 --> 00:12:43,960 line, an enormous number of things. 210 00:12:43,960 --> 00:12:46,260 It's not just arrivals we're talking about. 211 00:12:46,260 --> 00:12:48,070 It's all of these other things, also. 212 00:12:48,070 --> 00:12:54,330 But it's something laid out on a one-dimensional axis where 213 00:12:54,330 --> 00:12:58,390 things happen at various places on that 214 00:12:58,390 --> 00:12:59,700 one-dimensional axis. 215 00:12:59,700 --> 00:13:05,100 So that's the way to view it. 216 00:13:05,100 --> 00:13:07,540 OK, same picture again. 217 00:13:07,540 --> 00:13:11,510 Process can be specified by the joint distribution of the 218 00:13:11,510 --> 00:13:15,570 arrival epochs or the interarrival times, and, in 219 00:13:15,570 --> 00:13:18,090 fact, of the counting process. 
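A small sketch of that equivalence between the three descriptions, assuming exponential interarrival times with an arbitrarily chosen rate: the arrival epochs are the cumulative sums of the interarrival times, and the counting process N(t) is just the number of epochs at or below t.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                   # arrival rate, chosen only for illustration
x = rng.exponential(scale=1/lam, size=20)   # interarrival times X_1, X_2, ...
s = np.cumsum(x)                            # arrival epochs S_n = X_1 + ... + X_n

def N(t, arrivals=s):
    """Counting process: number of arrivals in (0, t]."""
    return int(np.searchsorted(arrivals, t, side="right"))

for t in [0.5, 1.0, 2.0, 4.0]:
    print(f"N({t}) = {N(t)}")

# going back the other way: interarrival times recovered from the epochs
print(np.allclose(np.diff(s, prepend=0.0), x))
```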
220 00:13:18,090 --> 00:13:25,200 If you see a sample path of the counting process, then 221 00:13:25,200 --> 00:13:29,180 from that you can determine the sample path of the arrival 222 00:13:29,180 --> 00:13:33,220 times and the sample path of the interarrival times. 223 00:13:33,220 --> 00:13:38,320 And since any set of these random variables specifies all 224 00:13:38,320 --> 00:13:43,220 three of these things, the three are all equivalent. 225 00:13:43,220 --> 00:13:47,150 OK, we have this important condition here. 226 00:13:47,150 --> 00:13:55,960 And I always sort of forget this, but when these arrivals 227 00:13:55,960 --> 00:13:59,700 are highly delayed, when there's a long period of time 228 00:13:59,700 --> 00:14:05,380 between each arrival, what that says is the counting 229 00:14:05,380 --> 00:14:08,480 process is getting small. 230 00:14:08,480 --> 00:14:12,570 So big interarrival times correspond to a small 231 00:14:12,570 --> 00:14:14,180 value of N of t. 232 00:14:14,180 --> 00:14:16,420 And you can see that in the picture here. 233 00:14:16,420 --> 00:14:20,020 If you spread out these arrivals, you make s1 all the 234 00:14:20,020 --> 00:14:21,290 way out here. 235 00:14:21,290 --> 00:14:26,190 Then N of t doesn't become 1 until way out here. 236 00:14:26,190 --> 00:14:32,930 So N of t as a function of t is getting smaller as s sub n 237 00:14:32,930 --> 00:14:36,030 is getting larger. 238 00:14:36,030 --> 00:14:41,560 S sub n is the minimum of the set of t, such that N of t is 239 00:14:41,560 --> 00:14:45,830 greater than or equal to n. Sounds like an unpleasantly 240 00:14:45,830 --> 00:14:49,460 complicated expression. 241 00:14:49,460 --> 00:14:52,210 If any of you can find a simpler way to say it than 242 00:14:52,210 --> 00:14:55,950 that, I would be absolutely delighted to hear it. 243 00:14:55,950 --> 00:14:57,530 But I don't think there is. 244 00:14:57,530 --> 00:15:01,150 I think the simpler way to say it is this picture here. 245 00:15:01,150 --> 00:15:03,230 And the picture says it. 246 00:15:03,230 --> 00:15:08,770 And you can sort of figure out all those logical statements 247 00:15:08,770 --> 00:15:11,670 from the picture, which is intuitively a 248 00:15:11,670 --> 00:15:12,942 lot clearer, I think. 249 00:15:17,270 --> 00:15:23,380 So now, a renewal process is an arrival process with IID 250 00:15:23,380 --> 00:15:25,100 interarrival times. 251 00:15:25,100 --> 00:15:28,800 And a Poisson process is a renewal process where the 252 00:15:28,800 --> 00:15:32,130 interarrival random variables are exponential. 253 00:15:32,130 --> 00:15:35,290 So, a Poisson process is a special 254 00:15:35,290 --> 00:15:37,200 case of a renewal process. 255 00:15:37,200 --> 00:15:40,920 Why are these exponential interarrival 256 00:15:40,920 --> 00:15:43,350 times so important? 257 00:15:43,350 --> 00:15:46,550 Well, it's because they're memoryless. 258 00:15:46,550 --> 00:15:50,360 And the memoryless property says that the probability that 259 00:15:50,360 --> 00:15:54,535 x is greater than t plus x is equal to the probability that 260 00:15:54,535 --> 00:15:58,190 it's greater than x times the probability that it's greater 261 00:15:58,190 --> 00:16:01,830 than t for all x and t greater than or equal to 0. 262 00:16:01,830 --> 00:16:04,860 This makes better sense if you say it conditionally.
263 00:16:04,860 --> 00:16:09,040 The probability that x is greater than t plus x, given 264 00:16:09,040 --> 00:16:12,700 that it's greater than t, is the same as the probability 265 00:16:12,700 --> 00:16:14,800 that x is greater that-- 266 00:16:14,800 --> 00:16:17,460 capital X is greater than little x. 267 00:16:17,460 --> 00:16:20,420 This really gives you the memoryless 268 00:16:20,420 --> 00:16:21,780 property in a nutshell. 269 00:16:21,780 --> 00:16:25,860 It says if you're looking at this process as it evolves, 270 00:16:25,860 --> 00:16:29,010 and you see an arrival, and then you start looking for the 271 00:16:29,010 --> 00:16:32,160 next arrival, it says that no matter how long you've been 272 00:16:32,160 --> 00:16:36,240 looking, the distribution function, as the time to wait 273 00:16:36,240 --> 00:16:38,930 until the next arrival, is the same 274 00:16:38,930 --> 00:16:40,580 exponential random variable. 275 00:16:40,580 --> 00:16:44,220 So you never gain anything by waiting. 276 00:16:44,220 --> 00:16:46,390 You might as well be impatient. 277 00:16:46,390 --> 00:16:48,790 But it doesn't do any good to be impatient. 278 00:16:48,790 --> 00:16:51,130 Doesn't to any good to wait. 279 00:16:51,130 --> 00:16:52,850 It doesn't do any good to not wait. 280 00:16:52,850 --> 00:16:56,280 No matter what you do, this damn thing always takes an 281 00:16:56,280 --> 00:16:59,780 exponential amount of time to occur. 282 00:16:59,780 --> 00:17:01,410 OK, that's what it means to be memoryless. 283 00:17:01,410 --> 00:17:03,910 And the exponential is the only 284 00:17:03,910 --> 00:17:05,835 memoryless random variable. 285 00:17:10,775 --> 00:17:14,910 How about a geometric random variable? 286 00:17:14,910 --> 00:17:19,190 The geometric random variable is memoryless if you're only 287 00:17:19,190 --> 00:17:22,150 looking at integer times. 288 00:17:22,150 --> 00:17:32,180 Here we're talking about times on a continuum. 289 00:17:32,180 --> 00:17:35,090 That's what this says. 290 00:17:35,090 --> 00:17:38,410 Well, that's what this says. 291 00:17:38,410 --> 00:17:46,590 And if you look at discrete times, then a geometric random 292 00:17:46,590 --> 00:17:49,860 variable is memoryless also. 293 00:17:55,020 --> 00:17:58,210 We're given a Poisson of rate lambda. 294 00:17:58,210 --> 00:18:01,290 The interval from any given t greater than 0 until the first 295 00:18:01,290 --> 00:18:04,190 arrival after t is a random variable. 296 00:18:04,190 --> 00:18:06,010 Let's call it z1. 297 00:18:06,010 --> 00:18:08,650 We already said that that random variable was 298 00:18:08,650 --> 00:18:11,430 exponential. 299 00:18:11,430 --> 00:18:17,040 And it's independent of all arrivals which occur before 300 00:18:17,040 --> 00:18:18,630 that starting time t. 301 00:18:18,630 --> 00:18:23,220 So looking at any starting time t, doesn't make any 302 00:18:23,220 --> 00:18:25,530 difference what has happened back here. 303 00:18:25,530 --> 00:18:27,450 That's not only the last arrival, but 304 00:18:27,450 --> 00:18:29,630 all the other arrivals. 305 00:18:29,630 --> 00:18:32,880 The time until the next arrival is exponential. 
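A quick numerical check of the memoryless property for the exponential, with an arbitrarily chosen rate and arbitrarily chosen values of t and x; the two estimated probabilities should agree:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.5                                                    # rate chosen for illustration
samples = rng.exponential(scale=1/lam, size=1_000_000)
t, x = 0.8, 0.5
p_cond = np.mean(samples > t + x) / np.mean(samples > t)     # P(X > t+x | X > t)
p_plain = np.mean(samples > x)                               # P(X > x)
print(f"P(X > t+x | X > t) = {p_cond:.4f},  P(X > x) = {p_plain:.4f}")
```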
306 00:18:32,880 --> 00:18:36,520 The time until each arrival after that is exponential 307 00:18:36,520 --> 00:18:41,690 also, which says that if you look at this process starting 308 00:18:41,690 --> 00:18:47,250 at time t, it's a Poisson process again, where all the 309 00:18:47,250 --> 00:18:50,450 times have to be shifted, of course, but it's a Poisson 310 00:18:50,450 --> 00:18:52,830 process starting at time t. 311 00:18:52,830 --> 00:19:00,570 The corresponding counting process, we can call it n 312 00:19:00,570 --> 00:19:04,950 tilde of t and tau, where tau is greater than or equal to t, 313 00:19:04,950 --> 00:19:09,690 where this is the number of arrivals in the original 314 00:19:09,690 --> 00:19:14,610 process up until time tau minus the number of arrivals 315 00:19:14,610 --> 00:19:16,340 up until time t. 316 00:19:16,340 --> 00:19:19,330 If you look at that difference, so many arrivals 317 00:19:19,330 --> 00:19:26,550 up until t, so many more up until time tau. 318 00:19:26,550 --> 00:19:29,030 You look at the difference between tau and t. 319 00:19:29,030 --> 00:19:37,080 The number of arrivals in that interval is the same Poisson 320 00:19:37,080 --> 00:19:39,800 distributing random variable again. 321 00:19:39,800 --> 00:19:43,080 So, it has the same distribution as N 322 00:19:43,080 --> 00:19:45,020 of tau minus t. 323 00:19:45,020 --> 00:19:47,650 And that's called the stationary increment property. 324 00:19:47,650 --> 00:19:50,720 It says that no matter where you start a Poisson process, 325 00:19:50,720 --> 00:19:53,030 it always looks exactly the same. 326 00:19:53,030 --> 00:19:58,370 It says that if you wait for one hour and start then, it's 327 00:19:58,370 --> 00:20:01,750 exactly the same as what it was before. 328 00:20:01,750 --> 00:20:05,960 If we had Poisson processes in the world, it wouldn't do any 329 00:20:05,960 --> 00:20:09,720 good to travel on certain days rather than other days. 330 00:20:09,720 --> 00:20:13,170 It wouldn't do any good to leave to drive home at one 331 00:20:13,170 --> 00:20:14,850 hour rather than another hour. 332 00:20:14,850 --> 00:20:17,670 You'd have the same travel all the time. 333 00:20:17,670 --> 00:20:18,980 It's all equal. 334 00:20:18,980 --> 00:20:21,140 It would be an awful world if it were stationary. 335 00:20:23,770 --> 00:20:26,750 The independent increment properties for counting 336 00:20:26,750 --> 00:20:33,170 process is that for all sequences of ordered times-- 337 00:20:33,170 --> 00:20:37,490 0 less than t1 less than t2 up to t k-- 338 00:20:37,490 --> 00:20:40,310 the random variables n of t1-- 339 00:20:40,310 --> 00:20:44,440 and now we're talking about the number of arrivals between 340 00:20:44,440 --> 00:20:47,510 t1 and t2, the number of arrivals between 341 00:20:47,510 --> 00:20:49,600 n minus 1 and tn. 342 00:20:49,600 --> 00:20:52,330 These are all independent of each other. 343 00:20:52,330 --> 00:20:55,390 That's what this independent increment property says. 344 00:20:55,390 --> 00:20:58,110 And we see from what we've said about this memoryless 345 00:20:58,110 --> 00:21:02,680 property that the Poisson process does indeed have this 346 00:21:02,680 --> 00:21:04,750 independent increment property. 347 00:21:04,750 --> 00:21:08,720 Poisson processes have both the stationary and independent 348 00:21:08,720 --> 00:21:11,240 increment properties. 349 00:21:11,240 --> 00:21:15,760 And this looks like an immediate consequence of that. 350 00:21:15,760 --> 00:21:16,370 It's not. 
351 00:21:16,370 --> 00:21:19,630 Remember, we had to struggle with this for a bit. 352 00:21:19,630 --> 00:21:22,500 But it says plus Poisson processes can be defined by 353 00:21:22,500 --> 00:21:26,450 the stationary and independent increment properties, plus 354 00:21:26,450 --> 00:21:32,730 either the Poisson PMF for N of t, or this incremental 355 00:21:32,730 --> 00:21:38,660 property, the probability that N tilde of t and t plus delta, 356 00:21:38,660 --> 00:21:43,320 and the number of arrivals between t and t plus delta, 357 00:21:43,320 --> 00:21:46,170 the probability that that's 1 is equal to 358 00:21:46,170 --> 00:21:47,600 lambda times delta. 359 00:21:47,600 --> 00:21:53,040 In other words, this view of a Poisson process is the view 360 00:21:53,040 --> 00:21:56,850 that you get when you sort of forget about time. 361 00:21:56,850 --> 00:22:00,220 And you think of arrivals from outer space coming down and 362 00:22:00,220 --> 00:22:01,470 hitting on a line. 363 00:22:01,470 --> 00:22:03,760 And they hit on that line randomly. 364 00:22:03,760 --> 00:22:05,860 And each one of them is independent 365 00:22:05,860 --> 00:22:07,780 of every other one. 366 00:22:07,780 --> 00:22:15,350 And that's what you get if you wind up with a density of 367 00:22:15,350 --> 00:22:18,770 lambda arrivals per unit time. 368 00:22:18,770 --> 00:22:22,120 OK, we talked about all of that, of course. 369 00:22:22,120 --> 00:22:23,400 The probability distributions-- 370 00:22:26,050 --> 00:22:29,380 there are many of them for a Poisson process. 371 00:22:29,380 --> 00:22:32,470 The Poisson process is remarkable in the sense that 372 00:22:32,470 --> 00:22:35,320 anything you want to find, there's generally a simple 373 00:22:35,320 --> 00:22:37,070 formula for it. 374 00:22:37,070 --> 00:22:39,530 If it's complicated, you're probably not looking at 375 00:22:39,530 --> 00:22:42,010 it the right way. 376 00:22:42,010 --> 00:22:45,360 So many things come out very, very simply. 377 00:22:45,360 --> 00:22:46,660 The probability-- 378 00:22:46,660 --> 00:22:50,580 the joint probability distribution of all of the 379 00:22:50,580 --> 00:22:58,670 arrival times up until time N is an exponential just in the 380 00:22:58,670 --> 00:23:05,080 last one, which says that the intermediate arrival epochs 381 00:23:05,080 --> 00:23:09,140 are equally likely to be anywhere, just as long as they 382 00:23:09,140 --> 00:23:13,440 satisfy this ordering restriction, s1 less than s2. 383 00:23:13,440 --> 00:23:15,430 That's what this formula says. 384 00:23:15,430 --> 00:23:20,490 It says that the joint density of these arrival times doesn't 385 00:23:20,490 --> 00:23:23,010 depend on anything except the time of the last one. 386 00:23:25,740 --> 00:23:28,520 But it does depend on the fact that they're [INAUDIBLE]. 387 00:23:28,520 --> 00:23:31,435 From that, you can find virtually everything else if 388 00:23:31,435 --> 00:23:32,900 you want to. 389 00:23:32,900 --> 00:23:36,600 That really is saying exactly the same thing as we were just 390 00:23:36,600 --> 00:23:38,440 saying a while ago. 391 00:23:38,440 --> 00:23:41,740 This is the viewpoint of looking at this line from 392 00:23:41,740 --> 00:23:47,040 outer space with arrivals coming in, coming in uniformly 393 00:23:47,040 --> 00:23:51,630 distributed over this line interval, and each of them 394 00:23:51,630 --> 00:23:54,080 independent of each other one. 395 00:23:54,080 --> 00:23:57,740 That's what you wind up saying. 
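One way to make the "arrivals thrown down uniformly on a line" picture concrete is to build the counts over a sub-interval two ways and compare: once from exponential interarrival times, and once by drawing a Poisson number of points placed uniformly on the interval. The rate, horizon, and sub-interval below are arbitrary illustrative choices, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T, trials = 1.0, 10.0, 20000

# (a) exponential interarrival times: count arrivals in (0, T/2]
def count_via_interarrivals():
    t, count = 0.0, 0
    while True:
        t += rng.exponential(1/lam)
        if t > T/2:
            return count
        count += 1

counts_a = np.array([count_via_interarrivals() for _ in range(trials)])

# (b) Poisson number of uniform "darts" on [0, T]: count those landing in (0, T/2]
n_total = rng.poisson(lam * T, size=trials)
counts_b = np.array([np.sum(rng.random(n) <= 0.5) for n in n_total])

print("mean, variance via interarrivals :", counts_a.mean(), counts_a.var())
print("mean, variance via uniform darts :", counts_b.mean(), counts_b.var())
```

Both constructions give counts with mean and variance near lambda times T/2, as a Poisson random variable should have.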
396 00:23:57,740 --> 00:24:01,490 This density, then, of the n-th arrival, if you just 397 00:24:01,490 --> 00:24:05,620 integrate all this stuff, you get the Erlang formula. 398 00:24:05,620 --> 00:24:12,940 Probability of arrival n in t to t plus delta is-- 399 00:24:12,940 --> 00:24:17,820 now this is the derivation that we went through before, 400 00:24:17,820 --> 00:24:20,310 going from Erlang to Poisson. 401 00:24:20,310 --> 00:24:24,370 You can go from Poisson to Erlang too, if you want to. 402 00:24:24,370 --> 00:24:26,320 But it's a little easier to go this way. 403 00:24:26,320 --> 00:24:30,500 The probability of arrival in t to t plus delta is the 404 00:24:30,500 --> 00:24:35,890 probability that n of t is equal to n minus 1 times 405 00:24:35,890 --> 00:24:40,670 lambda delta plus an o of delta, of course. 406 00:24:40,670 --> 00:24:46,270 And the probability that n of t is equal to n minus 1 from 407 00:24:46,270 --> 00:24:53,050 this formula here is going to be the density of when s sub n 408 00:24:53,050 --> 00:24:55,040 appears, divided by lambda. 409 00:24:55,040 --> 00:24:58,910 That's exactly what this formula here says. 410 00:24:58,910 --> 00:25:01,980 So that's just the Poisson distribution. 411 00:25:01,980 --> 00:25:04,910 We've been through that derivation. 412 00:25:04,910 --> 00:25:08,420 It's almost a derivation worth remembering, because it just 413 00:25:08,420 --> 00:25:11,940 appears so often. 414 00:25:11,940 --> 00:25:16,160 As you've seen from the problem sets we've done, 415 00:25:16,160 --> 00:25:20,970 almost every problem you can dream of, dealing with Poisson 416 00:25:20,970 --> 00:25:27,150 processes, the easy way to do them comes from this property 417 00:25:27,150 --> 00:25:30,730 of combining and splitting Poisson processes. 418 00:25:30,730 --> 00:25:35,170 It says if n1 of t, n2 of t, up to n sub k of t are 419 00:25:35,170 --> 00:25:37,500 independent Poisson processes-- 420 00:25:37,500 --> 00:25:39,880 what do you mean by a process being 421 00:25:39,880 --> 00:25:42,200 independent of another process? 422 00:25:42,200 --> 00:25:46,660 Well, the process is specified by the interarrival times for 423 00:25:46,660 --> 00:25:47,660 that process. 424 00:25:47,660 --> 00:25:50,950 So what we're saying here is the interarrival times for the 425 00:25:50,950 --> 00:25:54,470 first process are independent of the interarrival times of 426 00:25:54,470 --> 00:25:56,770 the second process, independent of the 427 00:25:56,770 --> 00:26:00,620 interarrival times for the third process, and so forth. 428 00:26:00,620 --> 00:26:02,990 Again, this is a view of someone from outer space, 429 00:26:02,990 --> 00:26:06,180 throwing darts onto a line. 430 00:26:06,180 --> 00:26:09,750 And if you have multiple people throwing darts on a 431 00:26:09,750 --> 00:26:13,450 line, but they're all equally distributed, all uniformly 432 00:26:13,450 --> 00:26:16,600 distributed over the line, this is exactly 433 00:26:16,600 --> 00:26:20,670 the model you get. 434 00:26:20,670 --> 00:26:22,180 So we have two views here. 435 00:26:22,180 --> 00:26:26,480 The first one is to look at the arrival epochs that's 436 00:26:26,480 --> 00:26:28,420 generated from each process. 437 00:26:28,420 --> 00:26:31,710 And then combine all arrivals into one Poisson process. 438 00:26:31,710 --> 00:26:34,900 So we look at all these Poisson processes, and then 439 00:26:34,900 --> 00:26:38,340 take the sum of them, and we get a Poisson process. 
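A sketch of that combining direction, assuming a few arbitrary rates: merge the arrival epochs of independent Poisson processes and check that the interarrival times of the merged process have mean one over the sum of the rates, as a Poisson process of the combined rate should.

```python
import numpy as np

rng = np.random.default_rng(0)
rates = [0.5, 1.0, 2.0]          # rates of the independent processes (assumed for illustration)
T = 2000.0

def arrival_epochs(lam, T):
    """Arrival epochs of a Poisson process of rate lam, restricted to (0, T]."""
    epochs = np.cumsum(rng.exponential(1/lam, size=int(3 * lam * T) + 50))
    return epochs[epochs <= T]

merged = np.sort(np.concatenate([arrival_epochs(lam, T) for lam in rates]))
gaps = np.diff(merged)
print("mean interarrival of merged process:", gaps.mean())
print("1 / (lambda_1 + ... + lambda_k)     :", 1 / sum(rates))
```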
440 00:26:38,340 --> 00:26:40,190 The other way to look at it-- 441 00:26:40,190 --> 00:26:43,120 and going back and forth between these two views is the 442 00:26:43,120 --> 00:26:45,060 way you solve problems-- 443 00:26:45,060 --> 00:26:46,770 you look at the combined sequence of 444 00:26:46,770 --> 00:26:48,900 arrival epochs first. 445 00:26:48,900 --> 00:26:52,400 And then for each arrival that comes in, you think of an IID 446 00:26:52,400 --> 00:26:55,450 random variable independent of all the other random 447 00:26:55,450 --> 00:27:02,860 variables, which decides for each arrival which of the 448 00:27:02,860 --> 00:27:04,710 sub-processes it goes to. 449 00:27:04,710 --> 00:27:08,680 So there's this hidden process-- 450 00:27:08,680 --> 00:27:09,890 well, it's not hidden. 451 00:27:09,890 --> 00:27:12,100 You can see what it's doing from looking at all the 452 00:27:12,100 --> 00:27:14,340 sub-processes. 453 00:27:14,340 --> 00:27:20,670 And each arrival then is associated with the given 454 00:27:20,670 --> 00:27:24,700 sub-process, with the probability mass function 455 00:27:24,700 --> 00:27:28,160 lambda sub i over the sum of lambda sub j. 456 00:27:28,160 --> 00:27:30,460 So this is the workhorse of Poisson 457 00:27:30,460 --> 00:27:32,270 type queueing problems. 458 00:27:32,270 --> 00:27:35,990 You study queuing theory, every page, you 459 00:27:35,990 --> 00:27:37,980 see this thing used. 460 00:27:37,980 --> 00:27:41,480 If you look at Kleinrock's books on queueing, they're 461 00:27:41,480 --> 00:27:45,120 very nice books because they cover so many different 462 00:27:45,120 --> 00:27:47,040 queueing situations. 463 00:27:47,040 --> 00:27:50,230 You find him using this on every page. 464 00:27:50,230 --> 00:27:54,060 And he never tells you that he's using it, but that's what 465 00:27:54,060 --> 00:27:54,670 he's doing. 466 00:27:54,670 --> 00:27:59,360 So that's a useful thing to know. 467 00:27:59,360 --> 00:28:02,840 We then talked about conditional arrivals and order 468 00:28:02,840 --> 00:28:05,590 statistics. 469 00:28:05,590 --> 00:28:12,280 The conditional distribution of the N first arrivals-- 470 00:28:12,280 --> 00:28:17,670 namely, s sub 1 s sub 2 up to s sub n-- 471 00:28:17,670 --> 00:28:24,250 given the number of arrivals in N of t is just n factorial 472 00:28:24,250 --> 00:28:25,430 over t to the n. 473 00:28:25,430 --> 00:28:29,380 Again, it doesn't depend on where these arrivals are. 474 00:28:29,380 --> 00:28:33,215 It's just a function which is independent of each arrival. 475 00:28:33,215 --> 00:28:36,660 It's the same kind of conditioning we had before. 476 00:28:36,660 --> 00:28:40,080 It's n factorial divided by t to the n. 477 00:28:40,080 --> 00:28:44,360 Because of the fact that if you order these random 478 00:28:44,360 --> 00:28:49,450 variables, t1 less than t2 less than t3, and so forth, up 479 00:28:49,450 --> 00:28:53,540 until time t, and then you say how many different ways can I 480 00:28:53,540 --> 00:29:01,590 arrange a set of numbers, each between 0 and t so that we 481 00:29:01,590 --> 00:29:03,630 have different orderings of them. 482 00:29:03,630 --> 00:29:06,700 And you can choose any one of the N to be the first. 483 00:29:06,700 --> 00:29:09,560 You can choose any one of the remaining n 484 00:29:09,560 --> 00:29:11,510 minus 1 to be the second. 485 00:29:11,510 --> 00:29:14,670 And that's where this is n factorial comes from here. 486 00:29:14,670 --> 00:29:18,140 And that, again we've been over. 
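A simulation sketch of that conditional statement, with arbitrary choices of lambda, t, and n: conditional on N(t) = n, the first arrival epoch should behave like the smallest of n uniform random variables on [0, t], whose mean is t/(n+1).

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, n = 1.0, 5.0, 5
first_epochs = []
while len(first_epochs) < 5000:
    x = rng.exponential(1/lam, size=40)        # comfortably more interarrivals than needed
    s = np.cumsum(x)                           # arrival epochs
    if np.sum(s <= t) == n:                    # condition on exactly n arrivals in (0, t]
        first_epochs.append(s[0])
print("E[S_1 | N(t)=n] estimated          :", np.mean(first_epochs))
print("mean of smallest of n uniforms on t:", t / (n + 1))
```

With t = 5 and n = 5, both numbers come out near 5/6, which is the ordered-uniform view in action.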
487 00:29:18,140 --> 00:29:21,660 The probability that s1 is greater than tau, given that 488 00:29:21,660 --> 00:29:27,540 they're interarrivals in the overall interval t, comes from 489 00:29:27,540 --> 00:29:31,390 just looking at N uniformly distributed random variables 490 00:29:31,390 --> 00:29:33,190 between 0 and t. 491 00:29:33,190 --> 00:29:35,840 And then what do you do with those uniformly distributed 492 00:29:35,840 --> 00:29:37,670 random variables? 493 00:29:37,670 --> 00:29:40,490 Well, you ask the question, what's the probability that 494 00:29:40,490 --> 00:29:44,140 all of them occur after time tau? 495 00:29:44,140 --> 00:29:47,820 And that's just t minus tau divided by t raised to the 496 00:29:47,820 --> 00:29:48,910 n-th power. 497 00:29:48,910 --> 00:29:51,980 And see, all of these formulas just come from particular 498 00:29:51,980 --> 00:29:54,360 viewpoints about what's going on. 499 00:29:54,360 --> 00:29:55,760 You have a number of viewpoints. 500 00:29:55,760 --> 00:29:58,550 One of them is throwing darts at a line. 501 00:29:58,550 --> 00:30:01,140 One of them is having exponential 502 00:30:01,140 --> 00:30:02,510 interarrival times. 503 00:30:02,510 --> 00:30:06,660 One of them is these uniform interarrivals. 504 00:30:06,660 --> 00:30:08,880 It's only a very small number of tricks. 505 00:30:08,880 --> 00:30:13,600 And you just use them in various combinations. 506 00:30:13,600 --> 00:30:17,800 So the joint distribution of s1 to s n, given N of t equals 507 00:30:17,800 --> 00:30:21,250 n, is the same as the joint distribution of N uniform 508 00:30:21,250 --> 00:30:24,070 random variables after they've been ordered. 509 00:30:28,650 --> 00:30:32,115 So let's go on to finite state Markov chains. 510 00:30:35,240 --> 00:30:37,670 Seems like we're covering an enormous amount of material in 511 00:30:37,670 --> 00:30:38,350 this course. 512 00:30:38,350 --> 00:30:40,150 And I think we are. 513 00:30:40,150 --> 00:30:44,290 But as I'm trying to say, as we go along, it's all-- 514 00:30:44,290 --> 00:30:46,850 I mean, everything follows from a relatively small set of 515 00:30:46,850 --> 00:30:48,620 principles. 516 00:30:48,620 --> 00:30:51,100 Of course, it's harder to understand the small set of 517 00:30:51,100 --> 00:30:54,580 principles and how to apply them than it is to understand 518 00:30:54,580 --> 00:30:55,460 all the details. 519 00:30:55,460 --> 00:30:56,710 But that's-- 520 00:30:58,970 --> 00:31:01,560 but on the other hand, if you understand the principles, 521 00:31:01,560 --> 00:31:04,620 then all those details, including the ones we haven't 522 00:31:04,620 --> 00:31:08,280 talked about, are easy to deal with. 523 00:31:08,280 --> 00:31:11,750 An integer-time stochastic process-- 524 00:31:11,750 --> 00:31:14,450 x1, x2, x3, blah, blah, blah-- 525 00:31:14,450 --> 00:31:19,220 is a Markov chain if for all n, namely the number of them 526 00:31:19,220 --> 00:31:21,770 that we're looking at-- 527 00:31:21,770 --> 00:31:23,020 well-- 528 00:31:25,880 --> 00:31:30,190 for all n, i, j, k, l, and so forth, the probability that 529 00:31:30,190 --> 00:31:35,770 the n-th of these random variables is equal to j, given 530 00:31:35,770 --> 00:31:39,340 what all of the others are-- and these are not ordered now. 531 00:31:39,340 --> 00:31:41,460 I mean, in a Markov chain, nothing is ordered. 532 00:31:41,460 --> 00:31:44,430 We're not talking about an arrival process. 
533 00:31:44,430 --> 00:31:47,220 We're just talking about a frog jumping around on lily 534 00:31:47,220 --> 00:31:52,660 pads, if you arrange the lily pads in a linear way, if these 535 00:31:52,660 --> 00:31:54,430 are random variables. 536 00:31:54,430 --> 00:32:00,530 The probability that the n-th location is equal to j, given 537 00:32:00,530 --> 00:32:06,410 that the previous locations are i, k, back to m, is just 538 00:32:06,410 --> 00:32:11,010 some probability p sub i j, a conditional 539 00:32:11,010 --> 00:32:14,120 probability of j given i. 540 00:32:14,120 --> 00:32:17,670 In other words, if you're looking at what happens at 541 00:32:17,670 --> 00:32:22,340 time n, once you know what happened at time n minus 1, 542 00:32:22,340 --> 00:32:24,830 everything else is of no concern. 543 00:32:24,830 --> 00:32:29,400 This process evolves by having a history of only one time 544 00:32:29,400 --> 00:32:31,980 unit, a little like the Poisson process. 545 00:32:31,980 --> 00:32:36,070 The Poisson process evolves by being totally 546 00:32:36,070 --> 00:32:37,880 independent of the past. 547 00:32:37,880 --> 00:32:40,600 Here, you put a little dependence in the past. 548 00:32:40,600 --> 00:32:44,150 But the dependence is only to look at the last thing that 549 00:32:44,150 --> 00:32:49,040 happened, and nothing before the last time that happened. 550 00:32:49,040 --> 00:32:53,850 So p sub i j depends only on i and j. 551 00:32:53,850 --> 00:32:59,170 And the initial probability mass function is arbitrary. 552 00:32:59,170 --> 00:33:02,470 A Markov chain is finite-state if the sample space for each x 553 00:33:02,470 --> 00:33:07,400 i is a finite set S. And the sample space S is usually 554 00:33:07,400 --> 00:33:10,530 taken to be integers 1 up to M. 555 00:33:10,530 --> 00:33:13,490 In all these formulas we write, we're always summing 556 00:33:13,490 --> 00:33:17,230 from one to M. And the reason for that is we've assumed the 557 00:33:17,230 --> 00:33:22,120 states are 1, 2, 3, up to M. Sometimes it's more convenient 558 00:33:22,120 --> 00:33:23,765 to think of different state spaces. 559 00:33:26,730 --> 00:33:29,040 But all the formulas we use are based on 560 00:33:29,040 --> 00:33:31,290 this state space here. 561 00:33:31,290 --> 00:33:36,500 A Markov chain is completely described by these transition 562 00:33:36,500 --> 00:33:41,200 probabilities plus the initial probabilities. 563 00:33:41,200 --> 00:33:44,390 If you want to write down the probability of what x is at 564 00:33:44,390 --> 00:33:49,030 some time N given what it was at some time 0, all you have to 565 00:33:49,030 --> 00:33:52,890 do is trace all the paths from 0 out to N, add up the 566 00:33:52,890 --> 00:33:56,890 probabilities of all of those paths, and that tells you the 567 00:33:56,890 --> 00:33:58,020 probability you want. 568 00:33:58,020 --> 00:34:01,820 All probabilities can be calculated just from knowing 569 00:34:01,820 --> 00:34:06,240 what these transition probabilities are. 570 00:34:06,240 --> 00:34:10,980 Note that when we're dealing with Poisson processes, we 571 00:34:10,980 --> 00:34:15,520 defined everything in terms of how many-- 572 00:34:15,520 --> 00:34:20,250 how many variables are there in defining a Poisson process? 573 00:34:20,250 --> 00:34:25,020 How many things do you have to specify before I know exactly 574 00:34:25,020 --> 00:34:27,320 what Poisson process I'm talking about? 575 00:34:30,540 --> 00:34:31,760 Only the Poisson rate.
576 00:34:31,760 --> 00:34:35,650 Only one parameter is necessary 577 00:34:35,650 --> 00:34:37,639 for a Poisson process. 578 00:34:37,639 --> 00:34:43,219 For a finite-state Markov process, you need a lot more. 579 00:34:43,219 --> 00:34:48,310 What you need is all of these values, p sub i j. 580 00:34:48,310 --> 00:34:52,409 If you sum p sub i j over j, you have to get 1. 581 00:34:52,409 --> 00:34:54,830 So that removes one of them. 582 00:34:54,830 --> 00:34:58,360 But as soon as you specify that transition matrix, you've 583 00:34:58,360 --> 00:34:59,960 specified everything. 584 00:34:59,960 --> 00:35:01,260 So there's nothing more to know 585 00:35:01,260 --> 00:35:03,220 about the Poisson process. 586 00:35:03,220 --> 00:35:06,060 There's only all these gruesome derivations that we 587 00:35:06,060 --> 00:35:07,580 go through. 588 00:35:07,580 --> 00:35:11,600 But everything is initially determined. 589 00:35:11,600 --> 00:35:13,960 Set of transition probabilities is usually 590 00:35:13,960 --> 00:35:16,030 viewed as the Markov chain. 591 00:35:16,030 --> 00:35:19,760 And the initial probabilities are usually viewed as just a 592 00:35:19,760 --> 00:35:21,740 parameter that we deal with. 593 00:35:21,740 --> 00:35:23,840 In other words, we-- 594 00:35:23,840 --> 00:35:28,250 in other words, what we study is the particular Markov 595 00:35:28,250 --> 00:35:31,550 chain, whether it's recurrent, whether it's transient, 596 00:35:31,550 --> 00:35:32,800 whatever it is. 597 00:35:32,800 --> 00:35:35,770 How you break it up into classes, all of that stuff 598 00:35:35,770 --> 00:35:39,060 only depends on these transition probabilities and 599 00:35:39,060 --> 00:35:40,815 doesn't depend on where you start. 600 00:35:46,920 --> 00:35:51,490 Now, a finite-state Markov chain can be described either 601 00:35:51,490 --> 00:35:54,230 as a directed graph or as a matrix. 602 00:35:54,230 --> 00:35:58,300 I hope you've seen by this time that some things are 603 00:35:58,300 --> 00:36:03,040 easier to look at if you look at things in terms of a graph. 604 00:36:03,040 --> 00:36:07,180 Some things are easier to look at if you look at something 605 00:36:07,180 --> 00:36:08,660 like this matrix. 606 00:36:08,660 --> 00:36:13,230 And some problems can be solved by inspection, if you 607 00:36:13,230 --> 00:36:14,700 draw a graph of it. 608 00:36:14,700 --> 00:36:17,890 Some can be solved almost by inspection if 609 00:36:17,890 --> 00:36:19,480 you look at the matrix. 610 00:36:19,480 --> 00:36:23,460 If you're doing things by computer, usually computers 611 00:36:23,460 --> 00:36:27,450 deal with matrices more easily than with graphs. 612 00:36:27,450 --> 00:36:31,070 If you're dealing with a Markov chain with 100,000 613 00:36:31,070 --> 00:36:35,290 states, you're not going to look at the graph and 614 00:36:35,290 --> 00:36:38,330 determine very much from it, because it's typically going 615 00:36:38,330 --> 00:36:39,650 to be fairly complicated-- 616 00:36:39,650 --> 00:36:42,020 unless it has some very simple structure. 617 00:36:42,020 --> 00:36:46,440 And sometimes that simple structure is determined. 618 00:36:46,440 --> 00:36:48,780 If it's something where you can only-- 619 00:36:48,780 --> 00:36:52,190 where you have the states numbered from 1 to 100,000, 620 00:36:52,190 --> 00:36:56,270 and you can only go from state i to state i plus 1, or from 621 00:36:56,270 --> 00:36:59,910 state i to i plus 1, or i minus 1, then it 622 00:36:59,910 --> 00:37:01,380 becomes very simple. 
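A minimal sketch of working with the matrix description, using a small made-up transition matrix (not one from the lecture): the n-step transition probabilities are the entries of the n-th power of the matrix, and they can be checked against relative frequencies from simulated sample paths.

```python
import numpy as np

# A made-up transition matrix for illustration; each row sums to 1.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

rng = np.random.default_rng(0)

def simulate(P, x0, n_steps):
    """One sample path of the chain, starting in state x0."""
    path = [x0]
    for _ in range(n_steps):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return path

# n-step transition probabilities from state 0: row 0 of P raised to the n-th power
n = 10
Pn = np.linalg.matrix_power(P, n)
print("P^10[0, :]          =", np.round(Pn[0], 4))

# check against the empirical distribution of X_10 over many simulated paths
ends = [simulate(P, 0, n)[-1] for _ in range(20000)]
print("empirical frequency =", np.round(np.bincount(ends, minlength=3) / 20000, 4))
```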
623 00:37:01,380 --> 00:37:04,320 And you like to look at it as a graph again. 624 00:37:04,320 --> 00:37:07,670 But ordinarily, you don't like to do that. 625 00:37:07,670 --> 00:37:15,000 But the nice thing about this graph is that it tells you 626 00:37:15,000 --> 00:37:19,090 very simply and visually which transition probabilities are 627 00:37:19,090 --> 00:37:23,810 zero, and which transition probabilities are non-zero. 628 00:37:23,810 --> 00:37:26,690 And that's the thing that specifies which states are 629 00:37:26,690 --> 00:37:31,650 recurrent, which states are transient, and all of that. 630 00:37:31,650 --> 00:37:35,400 All of that kind of elementary analysis about a Markov chain 631 00:37:35,400 --> 00:37:40,300 all comes from looking at this graph and seeing whether you 632 00:37:40,300 --> 00:37:46,290 can get from one state to another state by some process. 633 00:37:46,290 --> 00:37:50,520 So let's move on from that. 634 00:37:50,520 --> 00:37:53,620 Talk about the classification of states. 635 00:37:53,620 --> 00:37:57,500 We started out with the idea of a walk and 636 00:37:57,500 --> 00:37:59,370 a path and a cycle. 637 00:37:59,370 --> 00:38:03,610 I'm not sure these terms are uniform throughout the field. 638 00:38:03,610 --> 00:38:07,550 But a walk is an ordered string of nodes, like 639 00:38:07,550 --> 00:38:10,020 i0, i1, up to i n. 640 00:38:10,020 --> 00:38:14,960 You can have repeated elements here, but you need a directed 641 00:38:14,960 --> 00:38:18,170 arc from i sub n minus 1 to i sub m. 642 00:38:18,170 --> 00:38:23,035 Like for example, in this stupid Markov chain here-- 643 00:38:25,870 --> 00:38:28,880 I mean, when you're drawing things is LaTeX, it's kind of 644 00:38:28,880 --> 00:38:31,760 hard to draw those nice little curves there. 645 00:38:31,760 --> 00:38:34,610 And because of that, when you once draw a Markov chain, you 646 00:38:34,610 --> 00:38:36,050 never want to change it. 647 00:38:36,050 --> 00:38:39,210 And that's why these nodes have a very small set of 648 00:38:39,210 --> 00:38:40,530 Markov chains in them. 649 00:38:40,530 --> 00:38:46,580 It's just to save me some work, drawing and drawing 650 00:38:46,580 --> 00:38:47,830 these diagrams. 651 00:38:50,030 --> 00:38:55,700 An example of a walk, as you start in 4, you take the self 652 00:38:55,700 --> 00:38:58,800 loop, go back to 4 at time 2. 653 00:38:58,800 --> 00:39:01,660 Then you go to state 1 at time 3. 654 00:39:01,660 --> 00:39:05,240 Then you go to state 2 at time 4. 655 00:39:05,240 --> 00:39:08,140 Then you go to stage 3, time 5. 656 00:39:08,140 --> 00:39:11,010 And back to state 2 at time 6. 657 00:39:11,010 --> 00:39:13,300 You have repeated nodes there. 658 00:39:13,300 --> 00:39:17,230 You have repeated nodes separated here. 659 00:39:17,230 --> 00:39:20,630 Another example of a walk is 4, 1, 2, 3. 660 00:39:20,630 --> 00:39:24,120 Example of a path, the path can't have any repeated nodes. 661 00:39:24,120 --> 00:39:27,060 We'd like to look at paths, because if you're going to be 662 00:39:27,060 --> 00:39:30,280 able to get from one node to another node, and there's some 663 00:39:30,280 --> 00:39:33,420 walk that goes all around the place and gets to that final 664 00:39:33,420 --> 00:39:36,770 node, there's also path that goes there. 665 00:39:36,770 --> 00:39:39,900 If you look at the walk, you just leave that all the cycles 666 00:39:39,900 --> 00:39:42,570 along the way, and you get to the n. 
667 00:39:42,570 --> 00:39:45,980 And a cycle, of course, which I didn't define, is something 668 00:39:45,980 --> 00:39:49,820 which starts at one node, goes through a path, and then 669 00:39:49,820 --> 00:39:52,730 finally comes back to the same node that it started at. 670 00:39:52,730 --> 00:39:56,800 And it doesn't make any difference for the cycle 2, 3, 671 00:39:56,800 --> 00:40:01,610 2 whether you call it 2, 3, 2 or 3, 2, 3. 672 00:40:01,610 --> 00:40:04,390 That's the same cycle, and it's not even worth 673 00:40:04,390 --> 00:40:07,200 distinguishing between those two ideas. 674 00:40:07,200 --> 00:40:12,723 OK. That's that. 675 00:40:15,360 --> 00:40:20,010 If there's a path from-- 676 00:40:20,010 --> 00:40:21,260 where did I-- 677 00:40:26,110 --> 00:40:31,800 node j is accessible from i, which we abbreviate as i 678 00:40:31,800 --> 00:40:33,680 has a path to j. 679 00:40:33,680 --> 00:40:38,010 If there's a walk from i to j, which means that p 680 00:40:38,010 --> 00:40:40,650 sub i j to the n-- 681 00:40:40,650 --> 00:40:44,150 this is the transition probability, the probability 682 00:40:44,150 --> 00:40:49,160 that x sub n is equal to j, given that x sub 683 00:40:49,160 --> 00:40:50,710 0 is equal to i. 684 00:40:50,710 --> 00:40:53,380 And we use this all the time. 685 00:40:53,380 --> 00:40:57,370 If this is greater than zero for some n greater than 0. 686 00:40:57,370 --> 00:41:06,950 In other words, j is accessible from i if there's a 687 00:41:06,950 --> 00:41:09,240 path from i that goes to j. 688 00:41:12,300 --> 00:41:17,170 And trivially, if i goes to j, and there's a path from j to 689 00:41:17,170 --> 00:41:21,520 k, then there has to be a path from i to k. 690 00:41:21,520 --> 00:41:25,730 If you've ever tried to make up a mapping program to find 691 00:41:25,730 --> 00:41:28,910 how to get from here to there, this is one of the most useful 692 00:41:28,910 --> 00:41:29,740 things you use. 693 00:41:29,740 --> 00:41:32,320 If there's a way to get here to there, and a way to get 694 00:41:32,320 --> 00:41:35,330 from here to there, then there's a way to get from here 695 00:41:35,330 --> 00:41:37,560 all the way to the end. 696 00:41:37,560 --> 00:41:42,650 And if you look up what most of these map programs do, you 697 00:41:42,650 --> 00:41:47,040 see that they overuse this enormously and they wind up 698 00:41:47,040 --> 00:41:50,910 taking you from here to there by some bizarre path just 699 00:41:50,910 --> 00:41:53,880 because it happens to go through some intermediate node 700 00:41:53,880 --> 00:41:55,460 on the way. 701 00:41:55,460 --> 00:41:58,680 So two nodes communicate-- 702 00:41:58,680 --> 00:42:01,890 i double arrow j-- 703 00:42:01,890 --> 00:42:08,860 if j is accessible from i, and if i is accessible from j. 704 00:42:08,860 --> 00:42:12,450 That means there's a path from i to j, and another path from 705 00:42:12,450 --> 00:42:16,260 j back to i, if you shorten them as much as you can. 706 00:42:16,260 --> 00:42:17,040 There's a cycle. 707 00:42:17,040 --> 00:42:23,530 It starts at i, goes through j, and comes back to i again. 708 00:42:23,530 --> 00:42:29,810 I didn't say that quite right, so delete that from what 709 00:42:29,810 --> 00:42:31,200 you've just heard. 710 00:42:31,200 --> 00:42:35,630 A class C of states is a non-empty set, such that i and 711 00:42:35,630 --> 00:42:40,370 j communicate for each i j in this class.
712 00:42:40,370 --> 00:42:45,330 But i does not communicate with j for each i in C-- 713 00:42:49,420 --> 00:42:53,210 for i in C and j not in C. 714 00:42:53,210 --> 00:42:55,870 The convenient way to think about this-- and I should have 715 00:42:55,870 --> 00:42:59,670 stated this as a theorem in the notes, because it's-- 716 00:43:03,990 --> 00:43:06,130 I think it's something that we all use without even 717 00:43:06,130 --> 00:43:07,750 thinking about it. 718 00:43:07,750 --> 00:43:12,480 It says that the entire set of states, or the entire set of 719 00:43:12,480 --> 00:43:16,500 nodes in a graph, is partitioned into classes. 720 00:43:16,500 --> 00:43:22,860 The class C containing i is i in union with all of the j's 721 00:43:22,860 --> 00:43:24,110 that communicate with i. 722 00:43:24,110 --> 00:43:27,580 So if you want to find this partition, you start out with 723 00:43:27,580 --> 00:43:31,280 an arbitrary node, you find all of the other nodes that it 724 00:43:31,280 --> 00:43:34,590 communicates with, and you find them by picking 725 00:43:34,590 --> 00:43:36,320 them one at a time. 726 00:43:36,320 --> 00:43:41,050 You pick all of the nodes for which p sub i j is 727 00:43:41,050 --> 00:43:42,540 greater than 0. 728 00:43:42,540 --> 00:43:44,100 Then you pick-- 729 00:43:44,100 --> 00:43:46,530 and p sub j i is great-- 730 00:43:46,530 --> 00:43:47,780 well-- blah. 731 00:43:50,030 --> 00:43:55,400 If you want to find the set of nodes that are accessible from 732 00:43:55,400 --> 00:43:57,640 i, you start out looking at i. 733 00:43:57,640 --> 00:44:00,640 You look at all the states which are accessible 734 00:44:00,640 --> 00:44:03,300 from i in one step. 735 00:44:03,300 --> 00:44:06,870 Then you look at all the steps, all of the states, 736 00:44:06,870 --> 00:44:09,380 which you can access from any one of those. 737 00:44:09,380 --> 00:44:12,720 Those are the states which are accessible in two states-- 738 00:44:12,720 --> 00:44:16,150 in two steps, then in three steps, and so forth. 739 00:44:16,150 --> 00:44:21,380 So you find all the nodes that are accessible from node i. 740 00:44:21,380 --> 00:44:24,640 And then you turn around and do it the other way. 741 00:44:24,640 --> 00:44:29,600 And presto, you have all of these classes of states all 742 00:44:29,600 --> 00:44:30,910 very simply. 743 00:44:30,910 --> 00:44:34,990 For a finite-state chain, the state i is transient if 744 00:44:34,990 --> 00:44:40,200 there's a j in S such that i goes into j, but j 745 00:44:40,200 --> 00:44:41,420 does not go into i. 746 00:44:41,420 --> 00:44:46,900 In other words, if I'm a state i, and I can get to you, but 747 00:44:46,900 --> 00:44:55,450 you can't get back to me, then I'm transient. 748 00:44:55,450 --> 00:45:01,600 Because the way Markov chains work, we keep going from one 749 00:45:01,600 --> 00:45:04,720 step to the next step to the next step to the next step. 750 00:45:04,720 --> 00:45:09,710 And if I keep returning to myself, then eventually I'm 751 00:45:09,710 --> 00:45:11,010 going to go to you. 752 00:45:11,010 --> 00:45:14,040 And once I go to you, I'll never get back again. 753 00:45:14,040 --> 00:45:18,540 So because of that, these transient states are states 754 00:45:18,540 --> 00:45:21,450 where eventually you leave them and you 755 00:45:21,450 --> 00:45:23,160 never get back again.
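A sketch of that procedure on a small made-up chain (the matrix below is illustrative, not from the lecture): compute the set of states accessible from each state by searching the graph of nonzero transition probabilities, group states by mutual accessibility into classes, and call a class recurrent exactly when no transition leaves it, which is the finite-state criterion just described.

```python
import numpy as np

# Only which p_ij are nonzero matters for the classification.
# Made up so that {1, 2} is a recurrent class and states 0 and 3 are transient.
P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0, 0.5]])

M = len(P)

def reachable(i):
    """All states accessible from i (including i itself), by graph search."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v in range(M):
            if P[u, v] > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

reach = [reachable(i) for i in range(M)]
classes = {frozenset(j for j in reach[i] if i in reach[j]) for i in range(M)}
for cls in classes:
    # a class of a finite chain is recurrent iff no arc leads out of the class
    recurrent = all(reach[i] <= cls for i in cls)
    print(sorted(cls), "recurrent" if recurrent else "transient")
```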
756 00:45:23,160 --> 00:45:26,190 As soon as we start talking about countable state Markov 757 00:45:26,190 --> 00:45:28,270 chains, you'll see that this definition 758 00:45:28,270 --> 00:45:30,250 doesn't work anymore. 759 00:45:30,250 --> 00:45:32,620 You can-- 760 00:45:32,620 --> 00:45:36,520 it is very possible to just wander away in a countable 761 00:45:36,520 --> 00:45:40,390 state Markov chain, and you never get back again that way. 762 00:45:40,390 --> 00:45:43,640 After you wander away too far, the probability of getting 763 00:45:43,640 --> 00:45:45,540 back gets smaller and smaller. 764 00:45:45,540 --> 00:45:47,830 You keep getting further and further away. 765 00:45:47,830 --> 00:45:52,810 The probability of returning gets smaller and smaller, so 766 00:45:52,810 --> 00:45:56,360 that you have transience that way also. 767 00:45:56,360 --> 00:45:59,470 But here, the situation is simpler for a finite-state 768 00:45:59,470 --> 00:46:01,030 Markov chain. 769 00:46:01,030 --> 00:46:05,570 And you can define i to be transient if there's a j in S such that 770 00:46:05,570 --> 00:46:09,440 i goes into j, but j doesn't go into i. 771 00:46:09,440 --> 00:46:13,160 If i's not transient, then it's recurrent. 772 00:46:13,160 --> 00:46:16,240 Usually you define recurrence first and transience later, 773 00:46:16,240 --> 00:46:19,470 but it's a little simpler this way. 774 00:46:19,470 --> 00:46:22,310 All states in a class are transient, or all are 775 00:46:22,310 --> 00:46:26,330 recurrent, and a finite-state Markov chain contains at least 776 00:46:26,330 --> 00:46:27,990 one recurrent class. 777 00:46:27,990 --> 00:46:29,770 You did that in your homework. 778 00:46:29,770 --> 00:46:33,040 And you were surprised at how complicated it was to do it. 779 00:46:33,040 --> 00:46:36,350 I hope that after you wrote down a proof of this, you 780 00:46:36,350 --> 00:46:41,800 stopped and thought about what you were actually proving, 781 00:46:41,800 --> 00:46:46,030 which intuitively is something very, very simple. 782 00:46:46,030 --> 00:46:48,960 It's just looking at all of the transient classes. 783 00:46:48,960 --> 00:46:51,480 Starting at one transient class, you 784 00:46:51,480 --> 00:46:54,950 find if there's another-- 785 00:46:54,950 --> 00:46:59,190 if there's another state you can get to from it which is 786 00:46:59,190 --> 00:47:02,170 also transient, and then you find if there's another state 787 00:47:02,170 --> 00:47:04,910 you get to from there which is also transient. 788 00:47:04,910 --> 00:47:08,500 And eventually, you have to come to a state from which you 789 00:47:08,500 --> 00:47:13,325 can't reach any state that can't get back-- that is, a recurrent state. 790 00:47:17,350 --> 00:47:20,410 That was explaining it almost as badly as the problem 791 00:47:20,410 --> 00:47:22,120 statement explained it. 792 00:47:22,120 --> 00:47:25,460 And I hope that after you did the problem, even if you can't 793 00:47:25,460 --> 00:47:27,910 explain it to someone, you have an 794 00:47:27,910 --> 00:47:30,430 understanding of why it's true. 795 00:47:30,430 --> 00:47:34,920 It shouldn't be surprising after you do that. 796 00:47:34,920 --> 00:47:38,950 So the finite-state Markov chain contains at least one 797 00:47:38,950 --> 00:47:40,200 recurrent class. 798 00:47:42,800 --> 00:47:46,720 OK, the period of a state i is the greatest common 799 00:47:46,720 --> 00:47:51,730 divisor of the n such that P sub i i to the n is greater than 0.
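That greatest-common-divisor definition is easy to compute directly. A minimal sketch, assuming numpy is available and using two made-up three-state chains (one a pure cycle, one with a self-loop); the search cutoff nmax is just a convenience for small chains.

import numpy as np
from math import gcd
from functools import reduce

def period(P, i, nmax=50):
    # gcd of the return times n <= nmax with P^n[i, i] > 0.
    Pn, returns = np.eye(len(P)), []
    for n in range(1, nmax + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            returns.append(n)
    return reduce(gcd, returns)

cycle = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
print(period(cycle, 0))   # 3: returns to state 0 only at times 3, 6, 9, ...

loop = np.array([[0.5, 0.5, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]], dtype=float)
print(period(loop, 0))    # 1: the self-loop at state 0 makes it aperiodic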
800 00:47:51,730 --> 00:47:54,580 Again, a very complicated definition for a 801 00:47:54,580 --> 00:47:56,280 simple kind of idea. 802 00:47:56,280 --> 00:47:58,670 Namely, you start out in a state i. 803 00:47:58,670 --> 00:48:02,440 You look at all of the times at which you can get back to 804 00:48:02,440 --> 00:48:03,940 state i again. 805 00:48:03,940 --> 00:48:08,780 If you find it that set of times has a period in it, 806 00:48:08,780 --> 00:48:19,550 namely, if every sequences of states is a multiple of some 807 00:48:19,550 --> 00:48:25,410 d, then you know that the state is periodic if d is 808 00:48:25,410 --> 00:48:26,720 greater than 1. 809 00:48:26,720 --> 00:48:30,060 And what you have to do is to find the largest such number. 810 00:48:30,060 --> 00:48:32,040 And that's the period of the state. 811 00:48:32,040 --> 00:48:35,170 All states in the same class have the same period. 812 00:48:35,170 --> 00:48:38,690 A recurring class with period d greater than one can be 813 00:48:38,690 --> 00:48:40,550 partitioned into sub-class-- 814 00:48:40,550 --> 00:48:42,640 this is the best way of looking at 815 00:48:42,640 --> 00:48:45,820 periodic classes of states. 816 00:48:45,820 --> 00:48:49,780 If you have a periodic class of states, then you can always 817 00:48:49,780 --> 00:48:53,960 separate it into d sub-classes. 818 00:48:53,960 --> 00:48:59,300 And in such a set of sub-classes, transitions from 819 00:48:59,300 --> 00:49:03,770 S1 and the states in S1 only go to S2. 820 00:49:03,770 --> 00:49:07,710 Transitions from states in S2 only go to S3. 821 00:49:07,710 --> 00:49:12,430 Up to, transitions from S d only go back to S1. 822 00:49:12,430 --> 00:49:16,050 They have to go someplace, so they go back to S1. 823 00:49:16,050 --> 00:49:22,500 So as you cycle around, it takes d steps to cycle from 1 824 00:49:22,500 --> 00:49:24,000 back to 1 again. 825 00:49:24,000 --> 00:49:28,410 It takes d steps to cycle from 2 back to 2 again. 826 00:49:28,410 --> 00:49:31,300 So you can see the structure of the Markov chain and why, 827 00:49:31,300 --> 00:49:34,810 in fact, it does have to be-- 828 00:49:34,810 --> 00:49:38,480 why that class has to be periodic. 829 00:49:38,480 --> 00:49:41,870 An ergodic class is a recurrent aperiodic class. 830 00:49:41,870 --> 00:49:44,760 In other words, it's a class where the period is equal to 831 00:49:44,760 --> 00:49:48,450 1, which means there really isn't any period. 832 00:49:48,450 --> 00:49:52,550 A Markov chain with only one class is ergodic if the class 833 00:49:52,550 --> 00:49:54,640 is ergodic. 834 00:49:54,640 --> 00:49:56,880 And the big theorem here-- 835 00:49:56,880 --> 00:49:59,670 I mean, this is probably the most important theorem about 836 00:49:59,670 --> 00:50:01,820 finite-state Markov chains. 837 00:50:01,820 --> 00:50:05,100 You have an ergodic, finite-state Markov chain. 838 00:50:05,100 --> 00:50:12,300 Then the limit as n goes to infinity of the probability of 839 00:50:12,300 --> 00:50:16,700 arriving in state j after n steps, given that you started 840 00:50:16,700 --> 00:50:20,780 in state i, is just some function of j. 841 00:50:20,780 --> 00:50:24,400 In other words, when n gets very large, it doesn't depend 842 00:50:24,400 --> 00:50:27,370 on how large M is. 843 00:50:27,370 --> 00:50:28,480 It stays the same. 844 00:50:28,480 --> 00:50:30,570 It becomes independent of n. 845 00:50:30,570 --> 00:50:32,450 It doesn't depend on where you started. 
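A quick numerical illustration of that limit, with a made-up three-state ergodic chain (nothing from the lecture slides): the rows of P to the n all converge to the same vector.

import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

for n in (1, 5, 20, 50):
    print(n, np.linalg.matrix_power(P, n).round(6))
# By n = 20 the three rows already agree to many decimal places: the limit
# no longer depends on the starting state i, and it stops changing with n.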
846 00:50:32,450 --> 00:50:34,860 No matter where you start in a finite-state 847 00:50:34,860 --> 00:50:36,570 ergodic Markov chain. 848 00:50:36,570 --> 00:50:40,580 After a very long time, the probability of being in a 849 00:50:40,580 --> 00:50:44,620 state j is independent of where you started, and it's 850 00:50:44,620 --> 00:50:48,170 independent of how long you've been running. 851 00:50:48,170 --> 00:50:52,200 So that's a very strong kind of-- 852 00:50:52,200 --> 00:50:54,890 it's a very strong kind of limit theorem. 853 00:50:54,890 --> 00:50:58,690 It's very much like the law of large numbers and all of these 854 00:50:58,690 --> 00:51:00,030 other things. 855 00:51:00,030 --> 00:51:03,120 I'm going to talk a little bit at the end about what that 856 00:51:03,120 --> 00:51:04,820 relationship really is. 857 00:51:07,360 --> 00:51:10,850 Except what it says is, after a long time, you're in steady 858 00:51:10,850 --> 00:51:12,670 state, which is why it's called the 859 00:51:12,670 --> 00:51:13,760 steady state theorem. 860 00:51:13,760 --> 00:51:14,440 Yes? 861 00:51:14,440 --> 00:51:17,386 AUDIENCE: Could you define the steady states for periodic 862 00:51:17,386 --> 00:51:18,636 changes [INAUDIBLE]? 863 00:51:21,320 --> 00:51:26,460 PROFESSOR: I try to avoid doing that because you have 864 00:51:26,460 --> 00:51:28,650 steady state probabilities. 865 00:51:28,650 --> 00:51:31,810 The steady state probabilities that you have are, you take-- 866 00:51:34,990 --> 00:51:38,760 is if you have these sub-classes. 867 00:51:38,760 --> 00:51:42,690 Then you wind up with a steady state within each sub-class. 868 00:51:42,690 --> 00:51:46,900 If you assign a probability of the probability in the 869 00:51:46,900 --> 00:51:51,870 sub-class, divided by d, then you get what is the steady 870 00:51:51,870 --> 00:51:52,930 state probability. 871 00:51:52,930 --> 00:51:56,870 If you start out in that steady state, then you're in 872 00:51:56,870 --> 00:52:00,130 each sub-class with probability 1 over d. 873 00:52:00,130 --> 00:52:04,230 And you shift to the next sub-class and you're still in 874 00:52:04,230 --> 00:52:08,340 steady state, because you have a probability, 1 over d, of 875 00:52:08,340 --> 00:52:12,230 being in each of those sub-classes to start with. 876 00:52:12,230 --> 00:52:16,970 You shift and you're still in one of the sub-classes with 877 00:52:16,970 --> 00:52:19,130 probability 1 over d. 878 00:52:19,130 --> 00:52:22,690 So there still is a steady state in that sense, but 879 00:52:22,690 --> 00:52:24,830 there's not a steady state in any nice sense. 880 00:52:31,940 --> 00:52:39,470 So anyway, that's the way it is. 881 00:52:39,470 --> 00:52:44,860 But you see, if you understand this theorem for ergodic 882 00:52:44,860 --> 00:52:48,550 finite state and Markov chains, and then you 883 00:52:48,550 --> 00:52:52,540 understand about periodic change and this set of 884 00:52:52,540 --> 00:52:56,070 sub-classes, you can see within each 885 00:52:56,070 --> 00:52:59,450 sub-class, if you look at-- 886 00:52:59,450 --> 00:53:00,700 if you look at-- 887 00:53:04,440 --> 00:53:11,500 if you look at time 0, time d, time 2d, times 3d and 4d, then 888 00:53:11,500 --> 00:53:14,470 whatever state you start in, you're going to be in the same 889 00:53:14,470 --> 00:53:19,380 class after d steps, the same class after 2d steps. 890 00:53:19,380 --> 00:53:21,480 You're going to have a transition 891 00:53:21,480 --> 00:53:24,280 matrix over d steps. 
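Here is a small numerical sketch of that picture, using a made-up chain with period d = 2 and sub-classes {0, 1} and {2, 3}; it is not the example from the lecture. The d-step matrix keeps each sub-class inside itself, and the vector solving pi P = pi puts probability 1/d on each sub-class.

import numpy as np

P = np.array([[0.0, 0.0, 0.3, 0.7],
              [0.0, 0.0, 0.6, 0.4],
              [0.5, 0.5, 0.0, 0.0],
              [0.2, 0.8, 0.0, 0.0]])

# P^2 is block diagonal over the two sub-classes, and the ergodic theorem
# applies to each of those blocks separately.
print(np.linalg.matrix_power(P, 2).round(3))

# pi solves pi P = pi with total probability 1; here P^n itself never
# converges, but pi still exists and gives each sub-class probability 1/2.
A = np.vstack([P.T - np.eye(4), np.ones(4)])
pi = np.linalg.lstsq(A, np.array([0, 0, 0, 0, 1.0]), rcond=None)[0]
print(pi.round(4), pi[:2].sum(), pi[2:].sum())   # sub-class totals are 0.5 and 0.5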
892 00:53:24,280 --> 00:53:27,360 And this theorem still applies to these sub-classes over 893 00:53:27,360 --> 00:53:29,200 periods of d. 894 00:53:29,200 --> 00:53:32,030 So the hard part of it is proving this. 895 00:53:32,030 --> 00:53:35,180 After you prove this, then you see that the same thing 896 00:53:35,180 --> 00:53:38,200 happens over each sub-class after that. 897 00:53:43,650 --> 00:53:45,290 That's a pretty major theorem. 898 00:53:45,290 --> 00:53:46,990 It's difficult to prove. 899 00:53:46,990 --> 00:53:50,890 A sub-step is to show that for an ergodic M state Markov 900 00:53:50,890 --> 00:53:56,380 chain, the probability of being in state j at time n, 901 00:53:56,380 --> 00:54:00,930 given that you're in state i at time 0, is positive for all 902 00:54:00,930 --> 00:54:05,870 i j, and all n greater than M minus 1 squared plus 1. 903 00:54:05,870 --> 00:54:10,900 It's very surprising that you have to go this many states-- 904 00:54:10,900 --> 00:54:14,980 this many steps before you get to the point that all these 905 00:54:14,980 --> 00:54:18,440 transition probabilities are positive. 906 00:54:18,440 --> 00:54:22,450 You look at this particular kind of Markov chain in the 907 00:54:22,450 --> 00:54:26,660 homework, and I hope what you found out from it was that if 908 00:54:26,660 --> 00:54:32,040 you start, say, in state two, then at the next time, you 909 00:54:32,040 --> 00:54:33,640 have to be in 3. 910 00:54:33,640 --> 00:54:37,020 Next time, you have to be in 4, you have to be in 5, you 911 00:54:37,020 --> 00:54:38,560 have to be in 6. 912 00:54:38,560 --> 00:54:41,300 In other words, the size of the set that you can be in 913 00:54:41,300 --> 00:54:46,550 after one step is just 1. 914 00:54:46,550 --> 00:54:51,170 One possible state here, one possible state here, one 915 00:54:51,170 --> 00:54:52,640 possible state here. 916 00:54:52,640 --> 00:54:57,250 The next step, you're in either 1 or 2, and as you 917 00:54:57,250 --> 00:55:01,600 travel around, the size of the set of states you can be in at 918 00:55:01,600 --> 00:55:06,510 these different steps, is 2, until you get all the way 919 00:55:06,510 --> 00:55:07,510 around again. 920 00:55:07,510 --> 00:55:09,800 And then there's a way to get-- 921 00:55:09,800 --> 00:55:15,050 when you get to state 6 again, the set of states enlarges. 922 00:55:15,050 --> 00:55:18,970 So finally you get up to a set of states, which is 923 00:55:18,970 --> 00:55:20,800 up to M minus 1. 924 00:55:20,800 --> 00:55:25,630 And that's why you get the M minus 1 squared here, plus 1. 925 00:55:25,630 --> 00:55:28,710 And this is the only Markov chain there is. 926 00:55:28,710 --> 00:55:31,850 You can have as many states going around 927 00:55:31,850 --> 00:55:33,770 here as you want to. 928 00:55:33,770 --> 00:55:36,020 But you have to have this structure at the end, where 929 00:55:36,020 --> 00:55:39,930 there's one special state and one way of circumventing it, 930 00:55:39,930 --> 00:55:43,930 which means there's one cycle of size M minus 1, and one 931 00:55:43,930 --> 00:55:48,440 cycle of size M. And that's the only way you can get it. 932 00:55:48,440 --> 00:55:52,780 And that's the only Markov chain that meets this bound 933 00:55:52,780 --> 00:55:53,640 with equality. 934 00:55:53,640 --> 00:56:01,470 In all other cases, you get this property much earlier. 935 00:56:01,470 --> 00:56:05,200 And often, you get it after just a linear amount of time. 
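That worst-case chain can be checked numerically. The sketch below assumes the figure is a single cycle through all M states plus one extra transition out of the last state, which creates a second cycle of length M minus 1; with that assumption, the first n at which every entry of P to the n is positive comes out to exactly (M minus 1) squared plus 1.

import numpy as np

def steps_until_all_positive(P, limit=200):
    # Smallest n with every entry of P^n strictly positive (None if not found).
    Pn = np.eye(len(P))
    for n in range(1, limit + 1):
        Pn = Pn @ P
        if np.all(Pn > 0):
            return n
    return None

M = 5
P = np.zeros((M, M))
for i in range(M - 1):
    P[i, i + 1] = 1.0        # the cycle 0 -> 1 -> ... -> M-1
P[M - 1, 0] = 0.5            # back to 0: cycle of length M
P[M - 1, 1] = 0.5            # back to 1: cycle of length M - 1

print(steps_until_all_positive(P), (M - 1) ** 2 + 1)   # 17 17 for M = 5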
936 00:56:09,360 --> 00:56:13,350 The other part of this major theorem that you reach steady 937 00:56:13,350 --> 00:56:17,350 state says, let P be greater than 0. 938 00:56:17,350 --> 00:56:19,150 In other words, let all the transition 939 00:56:19,150 --> 00:56:22,410 probabilities be positive. 940 00:56:22,410 --> 00:56:28,040 And then define some quantity alpha as the minimum of the 941 00:56:28,040 --> 00:56:30,160 transition probabilities. 942 00:56:30,160 --> 00:56:34,110 And then the theorem says, for all states j and all n greater 943 00:56:34,110 --> 00:56:38,470 than or equal to 1, the maximum over the initial 944 00:56:38,470 --> 00:56:43,180 states minus the minimum over the initial states of P sub i 945 00:56:43,180 --> 00:56:49,040 j at the n plus first step, that difference is less than 946 00:56:49,040 --> 00:56:52,470 or equal to the same difference at the n-th step, 947 00:56:52,470 --> 00:56:54,300 times 1 minus 2 alpha. 948 00:56:54,300 --> 00:56:58,970 Now 1 minus 2 alpha is a number less than 1. 949 00:56:58,970 --> 00:57:03,700 And this says that this maximum minus minimum is at most 1 950 00:57:03,700 --> 00:57:07,860 minus 2 alpha to the n, which says that the limit of the 951 00:57:07,860 --> 00:57:11,220 maximizing term is equal to the limit of 952 00:57:11,220 --> 00:57:12,640 the minimizing term. 953 00:57:12,640 --> 00:57:13,850 And what does that say? 954 00:57:13,850 --> 00:57:18,740 It says that everything in the middle gets squeezed together. 955 00:57:18,740 --> 00:57:24,200 And it says exactly what we want it to say, that the limit 956 00:57:24,200 --> 00:57:30,380 of P sub l j to the n is independent of l, after n gets 957 00:57:30,380 --> 00:57:31,310 very large. 958 00:57:31,310 --> 00:57:34,090 Because the maximum and the minimum get very 959 00:57:34,090 --> 00:57:37,560 close to each other. 960 00:57:37,560 --> 00:57:40,170 We also showed that P sub i j to the n approaches that limit 961 00:57:40,170 --> 00:57:41,780 exponentially. 962 00:57:41,780 --> 00:57:43,640 That's what this says. 963 00:57:43,640 --> 00:57:49,860 The exponential rate here is just this 1 minus 2 alpha, with alpha determined in that way. 964 00:57:49,860 --> 00:57:54,630 And the theorem for ergodic Markov chains then follows by 965 00:57:54,630 --> 00:58:01,380 just looking at successive h steps in the Markov chain when 966 00:58:01,380 --> 00:58:06,110 h is large enough so that all these transition probabilities 967 00:58:06,110 --> 00:58:07,360 are positive. 968 00:58:09,300 --> 00:58:12,220 So you go out far enough that all the transition 969 00:58:12,220 --> 00:58:13,860 probabilities are positive. 970 00:58:13,860 --> 00:58:16,980 And then you look at repetitions of that, and apply 971 00:58:16,980 --> 00:58:18,230 this theorem. 972 00:58:18,230 --> 00:58:21,570 And suddenly you have this general theorem, 973 00:58:21,570 --> 00:58:22,900 which is what we wanted. 974 00:58:27,200 --> 00:58:30,530 An ergodic unichain is a Markov chain with one 975 00:58:30,530 --> 00:58:33,870 ergodic recurrent class, plus perhaps a set 976 00:58:33,870 --> 00:58:36,550 of transient states. 977 00:58:36,550 --> 00:58:39,600 And most of the things we talk about in this course are for 978 00:58:39,600 --> 00:58:45,870 unichains, usually ergodic unichains, because if you have 979 00:58:45,870 --> 00:58:49,160 multiple recurrent classes, it just makes a mess. 980 00:58:49,160 --> 00:58:51,780 You wind up in this recurrent class, or 981 00:58:51,780 --> 00:58:53,950 this recurrent class.
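Going back to that bound for a moment, here is a quick check with a made-up strictly positive matrix. The printed spread is the largest value, over j, of the maximum over i of P sub i j to the n minus the minimum over i of P sub i j to the n.

import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
alpha = P.min()                      # smallest transition probability

Pn = P.copy()
for n in range(1, 11):
    spread = (Pn.max(axis=0) - Pn.min(axis=0)).max()
    print(n, spread, (1 - 2 * alpha) ** n)
    Pn = Pn @ P
# The observed spread stays below (1 - 2*alpha)^n and shrinks geometrically,
# which is the squeezing argument described above.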
982 00:58:53,950 --> 00:59:00,080 And aside from the question of which one you get to, you 983 00:59:00,080 --> 00:59:01,730 don't much care about it. 984 00:59:01,730 --> 00:59:05,790 And the theorem here is for an ergodic finite-state unichain. 985 00:59:05,790 --> 00:59:10,370 The limit of P sub i j to the n probability of being in 986 00:59:10,370 --> 00:59:15,130 state j at time n, given that you're in state i at time 0, 987 00:59:15,130 --> 00:59:17,290 is equal to pi sub j. 988 00:59:17,290 --> 00:59:22,330 In other words, this limit here exists for all i j. 989 00:59:22,330 --> 00:59:25,210 And the limit is independent of i. 990 00:59:25,210 --> 00:59:27,900 And it's independent of n as n gets big enough. 991 00:59:32,820 --> 00:59:42,970 And then also, we can choose this so that this set of 992 00:59:42,970 --> 00:59:47,680 probabilities here satisfies this, what's called the steady 993 00:59:47,680 --> 00:59:51,780 state condition, the sum of pi i times P sub i j 994 00:59:51,780 --> 00:59:53,140 is equal to pi j. 995 00:59:53,140 --> 00:59:56,380 In other words, if you start out in steady state, and you 996 00:59:56,380 --> 01:00:00,300 look at the probabilities of being in the different states 997 01:00:00,300 --> 01:00:06,610 at the next time unit, this is the probability of being in 998 01:00:06,610 --> 01:00:11,610 state j at time n plus 1, if this is the probability of 999 01:00:11,610 --> 01:00:14,420 being in state i at time n. 1000 01:00:14,420 --> 01:00:17,790 So that condition gets satisfied. 1001 01:00:17,790 --> 01:00:19,280 That condition is satisfied. 1002 01:00:19,280 --> 01:00:22,760 You just stay in steady state forever. 1003 01:00:22,760 --> 01:00:29,210 And pi i has to be positive for a recurrent i, and pi i is 1004 01:00:29,210 --> 01:00:31,680 equal to 0 otherwise. 1005 01:00:31,680 --> 01:00:35,230 So this is just a generalization 1006 01:00:35,230 --> 01:00:38,090 of the ergodic theorem. 1007 01:00:38,090 --> 01:00:43,400 And this is not what people refer to as the ergodic 1008 01:00:43,400 --> 01:00:48,160 theorem, which is a much more general theorem than this. 1009 01:00:48,160 --> 01:00:50,900 This is the ergodic theorem for the case of finite state 1010 01:00:50,900 --> 01:00:53,110 Markov chains. 1011 01:00:53,110 --> 01:00:59,190 You can restate this in matrix form as the limit of the 1012 01:00:59,190 --> 01:01:02,900 matrix P to the n-th power. 1013 01:01:02,900 --> 01:01:06,680 What I didn't mention here and what I probably didn't mention 1014 01:01:06,680 --> 01:01:11,880 enough in the notes is that P sub i j-- 1015 01:01:32,360 --> 01:01:47,560 but also, if you take the matrix P times P time P, n 1016 01:01:47,560 --> 01:01:53,880 times, namely, you take the matrix, P to the n. 1017 01:01:53,880 --> 01:02:00,720 This says the P sub i j is the i j element. 1018 01:02:09,900 --> 01:02:12,530 I'm sure all of you know that by now, because you've been 1019 01:02:12,530 --> 01:02:15,310 using it all the time. 1020 01:02:15,310 --> 01:02:18,820 And what this says here-- 1021 01:02:18,820 --> 01:02:26,150 what we've said before is that every row of this matrix, P to 1022 01:02:26,150 --> 01:02:28,600 the n, is the same. 1023 01:02:28,600 --> 01:02:31,290 Every row is equal to pi. 1024 01:02:31,290 --> 01:02:47,786 P to the n tends to a matrix which is pi 1, pi 2, 1025 01:02:47,786 --> 01:02:52,120 up to pi sub n. 1026 01:02:52,120 --> 01:02:57,000 Pi 1, pi 2, up to pi sub n. 1027 01:03:00,760 --> 01:03:06,770 Pi 1, pi 2, up to pi sub n. 
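In practice you find pi by solving those equations directly. A minimal sketch with a made-up ergodic unichain, where state 2 is transient and states 0 and 1 form the recurrent class:

import numpy as np

P = np.array([[0.7, 0.3, 0.0],
              [0.4, 0.6, 0.0],
              [0.2, 0.3, 0.5]])

# Solve pi P = pi together with sum(pi) = 1: the rows of P^T - I are linearly
# dependent, so one of them can be overwritten with the normalization.
A = P.T - np.eye(3)
A[-1, :] = 1.0
pi = np.linalg.solve(A, np.array([0.0, 0.0, 1.0]))
print(pi)          # about [0.571, 0.429, 0.0]: zero on the transient state
print(pi @ P)      # the same vector again: the steady-state condition holds

# And the matrix power really does settle down to a matrix whose rows are pi.
print(np.linalg.matrix_power(P, 50).round(4))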
1028 01:03:06,770 --> 01:03:14,660 And the easiest way to express this is the vector e times pi, 1029 01:03:14,660 --> 01:03:24,960 where e is transposed. 1030 01:03:24,960 --> 01:03:32,755 In other words, if you take a column matrix, column 1, 1, 1, 1031 01:03:32,755 --> 01:03:40,670 1, 1, and you multiply this by a row vector, pi 1 times pi 1032 01:03:40,670 --> 01:03:48,030 sub n, what you get is, for this first row multiplied by 1033 01:03:48,030 --> 01:03:51,210 this, this gives you-- 1034 01:03:51,210 --> 01:03:53,480 well, in fact, if you multiply this out, 1035 01:03:53,480 --> 01:03:56,360 this is what you get. 1036 01:03:56,360 --> 01:03:58,650 And if you've never gone through the trouble of seeing 1037 01:03:58,650 --> 01:04:03,880 that this multiplication leads to this, please do it, because 1038 01:04:03,880 --> 01:04:07,170 it's important to notice that correspondence. 1039 01:04:14,530 --> 01:04:18,080 We got specific results by looking at the eigenvalues and 1040 01:04:18,080 --> 01:04:20,880 eigenvectors of stochastic matrices. 1041 01:04:20,880 --> 01:04:24,720 And a stochastic matrix is the matrix of a Markov chain. 1042 01:04:28,500 --> 01:04:31,290 So some of these things are sort of obvious. 1043 01:04:31,290 --> 01:04:36,870 Lambda is an eigenvalue of P, if and only if P minus lambda 1044 01:04:36,870 --> 01:04:38,120 i is singular. 1045 01:04:41,670 --> 01:04:45,040 This set of relationships is not obvious. 1046 01:04:45,040 --> 01:04:48,130 This is obvious linear algebra. 1047 01:04:48,130 --> 01:04:51,250 This is something that when you study eigenvalues and 1048 01:04:51,250 --> 01:04:55,430 eigenvectors in linear algebra, you recognize that 1049 01:04:55,430 --> 01:04:57,270 this is a summary of a lot of things. 1050 01:04:57,270 --> 01:05:01,440 If and only if this determinant is equal to 0, 1051 01:05:01,440 --> 01:05:05,650 which is true if and only if there's some vector nu for 1052 01:05:05,650 --> 01:05:12,560 which P times nu equals lambda times nu for nu unequal to 0. 1053 01:05:12,560 --> 01:05:16,920 And if and only if pi P equals lambda pi for some 1054 01:05:16,920 --> 01:05:18,210 pi unequal to 0. 1055 01:05:18,210 --> 01:05:23,250 In other words, if this determinant is equal to 0, it 1056 01:05:23,250 --> 01:05:32,040 means that the matrix P minus lambda i is singular. 1057 01:05:32,040 --> 01:05:35,950 If the matrix is singular, there has to be some solution 1058 01:05:35,950 --> 01:05:38,370 to this equation here. 1059 01:05:38,370 --> 01:05:40,220 There has to be some solution to this 1060 01:05:40,220 --> 01:05:44,530 left eigenvector equation. 1061 01:05:44,530 --> 01:05:48,740 Now, once you see this, you notice that e is always a 1062 01:05:48,740 --> 01:05:53,750 right eigenvector of P. Every stochastic matrix in the world 1063 01:05:53,750 --> 01:05:58,920 has the property that e is a right eigenvector of it. 1064 01:05:58,920 --> 01:05:59,800 Why is that? 1065 01:05:59,800 --> 01:06:05,230 Because all of the rows of a stochastic matrix sum to 1. 1066 01:06:05,230 --> 01:06:10,070 If you start off in state i, the sum of the possible states 1067 01:06:10,070 --> 01:06:14,530 you can be at in the next step is equal to 1. 1068 01:06:14,530 --> 01:06:17,120 You have to go somewhere. 1069 01:06:17,120 --> 01:06:21,650 So e is always a right eigenvector of P with 1070 01:06:21,650 --> 01:06:23,300 eigenvalue 1. 
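Both of those facts are one-line checks numerically; the chain below is the same made-up three-state example used earlier.

import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
e = np.ones(3)                       # the column vector of 1's

print(P @ e)                         # [1, 1, 1]: e is a right eigenvector with
                                     # eigenvalue 1 for any stochastic matrix

# Left eigenvector pi for eigenvalue 1 (normalized to sum to 1), and the
# rank-one matrix e times pi: every row of P^n approaches pi.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi = pi / pi.sum()
print(np.allclose(np.linalg.matrix_power(P, 50), np.outer(e, pi)))   # True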
1071 01:06:23,300 --> 01:06:26,510 Since e is also is a right eigenvector of P with 1072 01:06:26,510 --> 01:06:29,850 eigenvalue 1, we go up here. 1073 01:06:29,850 --> 01:06:32,460 We look at these if and only if statements. 1074 01:06:32,460 --> 01:06:34,890 We see, then, P must be singular. 1075 01:06:34,890 --> 01:06:38,410 And then pi times P equals lambda pi. 1076 01:06:38,410 --> 01:06:41,410 So no matter how many recurrent classes we have, no 1077 01:06:41,410 --> 01:06:46,430 matter what periodicity we have in each of them, there's 1078 01:06:46,430 --> 01:06:53,170 always a solution to pi times P equals pi. 1079 01:06:53,170 --> 01:06:55,550 There's always at least one steady state vector. 1080 01:06:59,320 --> 01:07:03,580 This determinant has an M-th degree polynomial in lambda. 1081 01:07:03,580 --> 01:07:08,150 M-th degree polynomials have M roots. 1082 01:07:08,150 --> 01:07:10,400 They aren't necessarily distinct. 1083 01:07:10,400 --> 01:07:14,040 The multiplicity of an eigenvalue is the number roots 1084 01:07:14,040 --> 01:07:15,500 of that value. 1085 01:07:15,500 --> 01:07:19,780 And the multiplicity of lambda equals 1. 1086 01:07:19,780 --> 01:07:22,530 How many different roots are there which have 1087 01:07:22,530 --> 01:07:24,360 lambda equals 1? 1088 01:07:24,360 --> 01:07:26,940 Well it turns out to be just the number of recurrent 1089 01:07:26,940 --> 01:07:29,550 classes that you have. 1090 01:07:29,550 --> 01:07:32,750 If you have a bunch of recurrent classes, within each 1091 01:07:32,750 --> 01:07:37,330 recurring class, there's a solution to pi P equals pi, 1092 01:07:37,330 --> 01:07:41,540 which is non-zero only one that recurrent class. 1093 01:07:41,540 --> 01:07:46,340 Namely, you take this huge Markov chain and you say, I 1094 01:07:46,340 --> 01:07:48,650 don't care about any of this except this 1095 01:07:48,650 --> 01:07:50,890 one recurrent class. 1096 01:07:50,890 --> 01:07:53,990 If we look at this one recurrent class, and solve for 1097 01:07:53,990 --> 01:07:57,500 the steady state probability in that one recurrent class, 1098 01:07:57,500 --> 01:08:01,220 then we get an eigenvector which is non-zero on that 1099 01:08:01,220 --> 01:08:05,990 class, 0 everywhere else, that has an eigenvalue 1. 1100 01:08:05,990 --> 01:08:08,050 And for every other recurrent class, we 1101 01:08:08,050 --> 01:08:10,590 get the same situation. 1102 01:08:10,590 --> 01:08:14,150 So the multiplicity of lambda equals 1 is equal to the 1103 01:08:14,150 --> 01:08:17,260 number of recurrent classes. 1104 01:08:17,260 --> 01:08:21,950 If you didn't get that proof on the fly, it gets 1105 01:08:21,950 --> 01:08:23,310 proved in the notes. 1106 01:08:23,310 --> 01:08:27,130 And if you don't get the proof, just remember that 1107 01:08:27,130 --> 01:08:28,380 that's the way it is. 1108 01:08:30,859 --> 01:08:34,859 For the special case where all M eigenvalues are distinct, 1109 01:08:34,859 --> 01:08:38,640 the right eigenvectors are linearly independent. 1110 01:08:38,640 --> 01:08:42,620 You remember that proof we went through that all of the 1111 01:08:42,620 --> 01:08:46,470 left eigenvectors and all the right eigenvectors are all 1112 01:08:46,470 --> 01:08:49,870 orthonormal to each other, or you can make them all 1113 01:08:49,870 --> 01:08:52,270 orthonormal to each other? 
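A quick numerical check of that multiplicity claim, with a made-up chain that has two recurrent classes ({0, 1} and {3}) and one transient state (2):

import numpy as np

P = np.array([[0.6, 0.4, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.1, 0.1, 0.6, 0.2],
              [0.0, 0.0, 0.0, 1.0]])

vals = np.linalg.eigvals(P)
print(np.sort_complex(vals))
print(np.sum(np.isclose(vals, 1)))   # 2: one eigenvalue 1 per recurrent class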
1114 01:08:52,270 --> 01:08:57,380 That says that if the right eigenvectors are linearly 1115 01:08:57,380 --> 01:09:01,120 independent, you can represent them as the columns of an 1116 01:09:01,120 --> 01:09:04,750 invertible matrix U. Then P times U is 1117 01:09:04,750 --> 01:09:06,819 equal to U times lambda. 1118 01:09:06,819 --> 01:09:09,800 What does this equations say? 1119 01:09:09,800 --> 01:09:12,460 You split it up into a bunch of equations. 1120 01:09:16,500 --> 01:09:46,080 P times U and we look at it as nu 1, nu 2, nu sub [? n ?]. 1121 01:09:46,080 --> 01:09:52,580 I guess better put the superscripts on it. 1122 01:09:56,100 --> 01:10:01,270 If I take the matrix U and just view it as M different 1123 01:10:01,270 --> 01:10:05,190 columns, then what this is saying is that 1124 01:10:05,190 --> 01:10:06,545 this is equal to-- 1125 01:10:17,290 --> 01:10:35,540 nu 1, nu 2, nu M, times lambda 1, lambda 2, up to lambda M. 1126 01:10:35,540 --> 01:10:38,500 Now you multiply this out, and what do you get? 1127 01:10:38,500 --> 01:10:41,860 You get nu 1 times lambda 1. 1128 01:10:41,860 --> 01:10:46,190 You get a nu 2 times lambda 2 for the second column, nu M 1129 01:10:46,190 --> 01:10:49,820 times lambda M for the last column, and here you get P 1130 01:10:49,820 --> 01:10:54,360 times nu 1 is equal to a nu 1 times lambda 1, and so forth. 1131 01:10:54,360 --> 01:10:59,240 So all this vector equation says is the same thing that 1132 01:10:59,240 --> 01:11:04,760 these n M individual eigenvector equations say. 1133 01:11:04,760 --> 01:11:11,160 It's just a more compact way of saying the same thing. 1134 01:11:11,160 --> 01:11:17,300 And if these eigenvectors span this space, then this set of 1135 01:11:17,300 --> 01:11:20,710 eigenvectors are linearly independent of each other. 1136 01:11:20,710 --> 01:11:24,860 And when you look at the set of them, this matrix here has 1137 01:11:24,860 --> 01:11:26,440 to have an inverse. 1138 01:11:26,440 --> 01:11:34,890 So you can also express this as P equals this vector-- 1139 01:11:34,890 --> 01:11:40,820 this matrix of right eigenvectors times the 1140 01:11:40,820 --> 01:11:46,630 diagonal matrix lambda, times the inverse of this matrix. 1141 01:11:46,630 --> 01:11:49,880 Matrix U to the minus 1 turns out to have rows equal to the 1142 01:11:49,880 --> 01:11:51,730 left eigenvectors. 1143 01:11:51,730 --> 01:11:54,330 That's because these eigenvectors-- 1144 01:11:54,330 --> 01:11:57,440 that's because the right eigenvectors and the left 1145 01:11:57,440 --> 01:12:01,270 eigenvectors are orthogonal to each other. 1146 01:12:04,670 --> 01:12:09,690 When we then split up this matrix into a sum of M 1147 01:12:09,690 --> 01:12:13,830 different matrices, each matrix having only one-- 1148 01:12:41,270 --> 01:12:43,460 and so forth. 1149 01:12:43,460 --> 01:12:45,710 Then what you get-- 1150 01:12:45,710 --> 01:12:48,490 here's this-- 1151 01:12:48,490 --> 01:12:54,730 this nice equation here, which says that if all the 1152 01:12:54,730 --> 01:12:58,870 eigenvalues are distinct, then you can always represent a 1153 01:12:58,870 --> 01:13:03,420 stochastic matrix as the sum of lambda i times nu to the i 1154 01:13:03,420 --> 01:13:04,670 times pi to the i. 
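Here is a minimal check of that decomposition, assuming numpy and the same made-up three-state chain as before (its three eigenvalues are distinct, so the eigenvectors span the space):

import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

lam, U = np.linalg.eig(P)    # columns of U are the right eigenvectors nu^(i)
Uinv = np.linalg.inv(U)      # rows of U^{-1} are the matching left eigenvectors
                             # pi^(i), scaled so that pi^(i) nu^(i) = 1
terms = [lam[i] * np.outer(U[:, i], Uinv[i, :]) for i in range(3)]
print(np.allclose(sum(terms), P))   # True: P = sum of lambda_i nu^(i) pi^(i)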
1155 01:13:04,670 --> 01:13:10,000 More importantly, if you take this equation here and look at 1156 01:13:10,000 --> 01:13:14,470 P to the n, P to the n is U times lambda times U to the 1157 01:13:14,470 --> 01:13:18,820 minus 1, times U times lambda times U to the minus 1, blah, 1158 01:13:18,820 --> 01:13:20,270 blah, blah forever. 1159 01:13:20,270 --> 01:13:24,030 Each U to the minus 1 cancels out with the following U. And 1160 01:13:24,030 --> 01:13:29,330 you wind up with P to the n equals U times lambda to the 1161 01:13:29,330 --> 01:13:33,170 n, U to the minus 1. 1162 01:13:33,170 --> 01:13:40,250 Which says that P to the n is just a sum here. 1163 01:13:40,250 --> 01:13:44,650 It's the sum of the eigenvalues to the n-th power 1164 01:13:44,650 --> 01:13:47,320 times these pairs of eigenvectors here. 1165 01:13:47,320 --> 01:13:51,660 So this is a general decomposition for P to the n. 1166 01:13:51,660 --> 01:13:56,010 What we're interested in is what happens as n gets large. 1167 01:13:56,010 --> 01:13:59,360 If we have a unichain, we already know what happens as n 1168 01:13:59,360 --> 01:14:00,570 gets large. 1169 01:14:00,570 --> 01:14:07,110 We know that as n gets large, we wind up with just 1 times 1170 01:14:07,110 --> 01:14:12,480 this eigenvector e times this eigenvector pi. 1171 01:14:12,480 --> 01:14:15,760 Which says that all of the other eigenvalues have to go 1172 01:14:15,760 --> 01:14:19,670 to 0, which says that the magnitudes of these other 1173 01:14:19,670 --> 01:14:22,200 eigenvalues are less than 1. 1174 01:14:22,200 --> 01:14:23,450 So they're all going away. 1175 01:14:26,600 --> 01:14:32,300 So the facts here are that all eigenvalues lambda have to 1176 01:14:32,300 --> 01:14:35,310 satisfy the magnitude of lambda is less 1177 01:14:35,310 --> 01:14:36,740 than or equal to 1. 1178 01:14:36,740 --> 01:14:39,680 That's what I just argued. 1179 01:14:39,680 --> 01:14:44,530 For each recurrent class C, there's one lambda equals 1, 1180 01:14:44,530 --> 01:14:47,750 with a left eigenvector equal to the steady state on 1181 01:14:47,750 --> 01:14:51,190 that recurrent class and 0 elsewhere. 1182 01:14:51,190 --> 01:14:55,230 The right eigenvector nu satisfies this: the limit as n goes 1183 01:14:55,230 --> 01:14:56,410 to infinity 1184 01:14:56,410 --> 01:15:00,930 of the probability that x sub n is in this recurrent class, 1185 01:15:00,930 --> 01:15:04,850 given that x sub 0 is equal to i, is equal to the i-th 1186 01:15:04,850 --> 01:15:08,700 component of that right eigenvector. 1187 01:15:08,700 --> 01:15:13,200 In other words, if you have a Markov chain which has several 1188 01:15:13,200 --> 01:15:16,480 recurrent classes, and you want to find out what the 1189 01:15:16,480 --> 01:15:23,630 probability is, starting in a transient state, of going 1190 01:15:23,630 --> 01:15:29,170 to one of those classes, this is what tells you that answer. 1191 01:15:29,170 --> 01:15:33,510 This says that the probability that you go to a particular 1192 01:15:33,510 --> 01:15:37,530 recurrent class C, given that you start off in a particular 1193 01:15:37,530 --> 01:15:41,340 transient state i, is whatever that right eigenvector 1194 01:15:41,340 --> 01:15:42,690 turns out to be. 1195 01:15:42,690 --> 01:15:46,170 And you can solve that right eigenvector problem just as an 1196 01:15:46,170 --> 01:15:48,920 M by M set of linear equations.
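Here is a sketch of that calculation for the same made-up two-class chain used above. Fixing the right eigenvector at 1 on the class of interest and 0 on the other recurrent class leaves only the transient components unknown, which is a small linear system.

import numpy as np

P = np.array([[0.6, 0.4, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.1, 0.1, 0.6, 0.2],
              [0.0, 0.0, 0.0, 1.0]])
recurrent_C = [0, 1]      # the class we want the absorption probability for
transient = [2]           # state 3 is the other recurrent class, fixed at 0

# nu = P nu with nu = 1 on C and 0 on the other recurrent class gives
# (I - P_TT) nu_T = P[T, C] summed over C.
PTT = P[np.ix_(transient, transient)]
b = P[np.ix_(transient, recurrent_C)].sum(axis=1)
nu_T = np.linalg.solve(np.eye(len(transient)) - PTT, b)
print(nu_T)               # [0.5]: from state 2 you end up in {0, 1} half the time

# Brute-force sanity check: row 2 of P^n, summed over the columns in C.
print(np.linalg.matrix_power(P, 200)[2, recurrent_C].sum())   # about 0.5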
1197 01:15:48,920 --> 01:15:51,170 So you can find the probability of reaching 1198 01:15:51,170 --> 01:15:56,370 each recurrent class from each transient state just by solving that set of linear 1199 01:15:56,370 --> 01:16:01,650 equations-- those eigenvector equations. 1200 01:16:01,650 --> 01:16:05,770 For each recurrent periodic class of period d, there are d 1201 01:16:05,770 --> 01:16:09,140 eigenvalues equally spaced on the unit circle. 1202 01:16:09,140 --> 01:16:13,330 There are no other eigenvalues with a 1203 01:16:13,330 --> 01:16:15,080 magnitude of lambda equal to 1. 1204 01:16:15,080 --> 01:16:19,070 In other words, for each recurrent class, you get one 1205 01:16:19,070 --> 01:16:20,700 eigenvalue that's equal to 1. 1206 01:16:20,700 --> 01:16:25,260 If that recurrent class is periodic, you get a bunch of 1207 01:16:25,260 --> 01:16:30,640 other eigenvalues put around the unit circle. 1208 01:16:30,640 --> 01:16:35,380 And those are all the eigenvalues there are. 1209 01:16:35,380 --> 01:16:36,296 Oh my God. 1210 01:16:36,296 --> 01:16:38,000 It's-- 1211 01:16:38,000 --> 01:16:39,930 I thought I was talking quickly. 1212 01:16:39,930 --> 01:16:44,870 But anyway, if the eigenvectors don't span the 1213 01:16:44,870 --> 01:16:50,360 space, then P to the n is equal to U times J to the 1214 01:16:50,360 --> 01:16:55,350 n, times U to the minus 1, where J is a Jordan form. 1215 01:16:55,350 --> 01:16:58,320 What you saw in the homework when you looked at the-- 1216 01:17:02,030 --> 01:17:04,075 when you looked at the Markov chain-- 1217 01:17:28,120 --> 01:17:28,620 OK. 1218 01:17:28,620 --> 01:17:35,020 This is one recurrent class with this one node in it. 1219 01:17:35,020 --> 01:17:38,030 These two nodes are both transient. 1220 01:17:38,030 --> 01:17:41,720 If you look at how long it takes to get from here over to 1221 01:17:41,720 --> 01:17:45,120 there, those transition probabilities do not 1222 01:17:45,120 --> 01:17:51,620 correspond to this equation here. 1223 01:17:51,620 --> 01:17:54,075 Instead, with P sub 1 2 and 1224 01:17:57,400 --> 01:18:00,230 P sub 2 3 the way I've drawn it here, 1225 01:18:00,230 --> 01:18:07,140 P sub 1 3 to the n picks up a term that's n times this eigenvalue to the n, and the eigenvalue 1226 01:18:07,140 --> 01:18:09,760 is 1/2 in this case. 1227 01:18:09,760 --> 01:18:12,820 And it doesn't correspond to this, which is why you need a 1228 01:18:12,820 --> 01:18:14,290 Jordan form. 1229 01:18:14,290 --> 01:18:17,860 I said that Jordan forms are excessively ugly. 1230 01:18:17,860 --> 01:18:22,120 Jordan forms are really very classy and nice ways of 1231 01:18:22,120 --> 01:18:24,460 dealing with a problem which is very ugly. 1232 01:18:24,460 --> 01:18:26,340 So don't blame Jordan. 1233 01:18:26,340 --> 01:18:29,670 Jordan simplified things for us. 1234 01:18:29,670 --> 01:18:36,840 So that's roughly as far as we went with Markov chains. 1235 01:18:40,970 --> 01:18:44,910 Renewal processes, we don't have to review them because 1236 01:18:44,910 --> 01:18:47,400 you're already intimately familiar with them. 1237 01:18:50,610 --> 01:18:55,910 I will do one thing next time with renewal processes and 1238 01:18:55,910 --> 01:19:00,290 Markov chains, which is to explain to you why the 1239 01:19:00,290 --> 01:19:04,660 expected amount of time to get from one state back to itself 1240 01:19:04,660 --> 01:19:07,380 is equal to 1 over pi-- 1241 01:19:07,380 --> 01:19:09,160 1 over pi sub i. 1242 01:19:09,160 --> 01:19:10,790 You did that in the homework.
1243 01:19:10,790 --> 01:19:12,900 And it was an awful way to do it. 1244 01:19:12,900 --> 01:19:14,340 And there's a nice way to do it. 1245 01:19:14,340 --> 01:19:15,860 I'll talk about that next time.
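In the meantime, the claim is easy to check by simulation. A minimal sketch with a made-up two-state chain whose steady-state vector is pi = (0.6, 0.4), so the mean time between visits to state 0 should come out close to 1/0.6, roughly 1.67:

import random

P = [[0.8, 0.2],     # pi = (0.6, 0.4) solves pi P = pi for this chain
     [0.3, 0.7]]

random.seed(1)
state, visits, steps = 0, 0, 10**6
for _ in range(steps):
    # take one step of the chain
    state = 0 if random.random() < P[state][0] else 1
    if state == 0:
        visits += 1
# The long-run rate of visits to state 0 is pi_0, so the average gap between
# visits, steps/visits, estimates the mean recurrence time 1/pi_0.
print(steps / visits)     # about 1.67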