The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, let's get started again on finite-state Markov chains. Sorry I was away last week. It was a long-term commitment that I had to honor. But I think I will be around for all the rest of the lectures.

So I want to start out by reviewing just a little bit. I'm spending a lot more time on finite-state Markov chains than we usually do in this course, partly because I've rewritten this section, partly because I think the material is very important. It's sort of bread-and-butter stuff for discrete stochastic processes. You use it all the time. It's a foundation for almost everything else. And after thinking about it for a long time, it really isn't all that complicated. I used to think that all these details of finding eigenvalues and eigenvectors and so on were extremely tedious. And it turns out that there's a very pleasant theory there. You can find all of these things, once you know what you're doing, with very simple computer packages. But they don't help if you don't know what's going on. So here, we're trying to figure out what's going on.

So let's start out by reviewing what we know about ergodic unichains and proceed from there. An ergodic finite-state Markov chain has transition probabilities which, if you look at the transition matrix raised to the nth power, give you the transition probabilities of an n-step Markov chain. In other words, you start at time 0, and at time n, you look at what state you're in. P sub ij to the nth power is then the probability that you're in state j at time n, given that you were in state i at time 0. So this has all the information that you want about what happens to a Markov chain as time gets large.
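To make this concrete, here is a small numpy sketch, not from the lecture, of raising a transition matrix to the nth power; the two-state chain and its numbers are made up for illustration.

```python
import numpy as np

# A made-up two-state ergodic chain; P[i, j] = Pr{state j at time n+1 | state i at time n}.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# [P^n]_ij is the probability of being in state j at time n, starting in state i.
for n in (1, 2, 10, 50):
    print(n)
    print(np.linalg.matrix_power(P, n))

# As n grows, both rows approach the same vector pi = (0.8, 0.2):
# the chain forgets its starting state.
```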
One of the things we're most concerned with is, do you go to steady state? And if you do go to steady state, how fast do you go to steady state? And of course, this matrix tells you the whole story there, because if you go to steady state, and the Markov chain forgets where it started, then P sub ij to the n goes to some constant, pi sub j, which is independent of the starting state i and, asymptotically as n gets big, independent of n.

So this pi is a strictly positive probability vector. I shouldn't just say that it is; that's something that was shown last time. If you multiply both sides of this equation by P sub jk and sum over j, then what do you get? You get P sub ik to the n plus 1. That goes to a limit also. If the limit as n goes to infinity exists, then the limit as n plus 1 goes to infinity is clearly the same thing. So this quantity here is the sum over j of pi sub j, P sub jk. And this quantity is equal to pi sub k, just by definition of this quantity. So pi sub k is equal to the sum over j of pi sub j, P sub jk. What does that say? That's the definition of a steady state vector. That's the definition of: if your probabilities of being in state k satisfy this equation, then one step later, you still have the same probability of being in state k. Two steps later, you still have the same probability of being in state k. So this is called the steady state equation. And a solution to that is called a steady state vector. And that satisfies this.

In matrix terms, if you write this out, what does it say? It says the limit as n approaches infinity of P to the n is equal to the column vector e of all 1s, times the row vector pi. The transpose here means it's a column vector. So you have a column vector times a row vector. Now, you know if you have a row vector times a column vector, that just gives you a number. If you have a column vector times a row vector, what happens? Well, for each element of the column, you get this whole row.
And for the next element of the column, you get the whole row beneath it multiplied by that element of the column, and so forth on down. So a column vector times a row vector is, in fact, a whole matrix. It's a J by J matrix. And since e is all 1s, that matrix is a matrix where every row is the steady state vector pi. So we're saying not only does this pi that we're talking about satisfy this steady state equation, but, more important, it's this limiting vector here. And as n goes to infinity, you in fact do forget where you were. And the entire matrix of where you are at time n, given where you were at time 0, goes to just this fixed vector pi. So this is a column vector, and pi is a row vector then.

The same result almost holds for ergodic unichains. What's an ergodic unichain? An ergodic unichain is an ergodic set of states plus a whole bunch of transient states. It doesn't matter whether the transient states are one class of transient states or multiple classes of transient states. It's just transient states. And there's one recurrent class. And we're assuming here that that class is ergodic. So you can almost see intuitively that if you start out in any one of these transient states, you bum around through the transient states for a while. And eventually, you flop off into the recurrent class. And once you're in the recurrent class, there's no return. So you stay there forever. Now, that's something that has to be proven. And it's proven in the notes. It was probably proven last time.

But anyway, what happens then is that the sole difference between ergodic unichains and a completely ergodic Markov chain is that the steady state vector is now positive for all ergodic states and 0 for all transient states. And aside from that, you still get the same behavior. As n gets large, you go to the steady state vector, which is the steady state vector of the ergodic class. If you're doing this stuff by hand, how do you do it?
Well, you start out just with the ergodic class. I mean, you might as well ignore everything else, because you know that eventually you're in that ergodic class. And you find the steady state vector in that ergodic class, and that's the steady state vector you're going to wind up with.

This is one advantage of understanding what you're doing, because if you don't understand what you're doing and you're just using computer programs, then you never have any idea what's ergodic, what's not ergodic, or anything else. You just plug it in, you grind away, you get some answer and say, ah, I'll publish a paper. And you put down exactly what the computer says, but you have no interpretation of it at all.

So the other way of looking at this is, when you have a bunch of transient states, and you also have an ergodic class, you can represent the matrix this way if the recurrent states are at the end of the chain and the transient states are at the beginning of the chain. This matrix here is the matrix of transition probabilities within the recurrent class. These are the probabilities of going from the transient states to the recurrent class. And once you get over here, the only place to go is down here. And the transient class is just a t by t matrix. And the recurrent class is just a J minus t by J minus t matrix.

So the idea is that each transient state eventually has a transition to a recurrent state, and the class of recurrent states leads to steady state as before. So really, all that analysis of ergodic unichains, if you look at it intuitively, is all obvious. Now, as in much of mathematics, knowing that something is obvious does not relieve you of the need to prove it, because sometimes you find that something that looks obvious is true most of the time but not all of the time. And that's the purpose of doing these things.

There's another way to express this eigenvalue, eigenvector equation we have here.
And that is that the transition matrix minus lambda times the identity matrix, times the column vector v, is equal to 0. That's the same as the equation P times v equals lambda times v, the equation for a right eigenvector. Well, P times v equals v is the equation for an eigenvalue of 1; this is the equation for an arbitrary eigenvalue lambda. But P times v equals lambda times v is the same as P minus lambda i, times v, equals 0.

Why do we even bother to say something so obvious? Well, because when you look at linear algebra: how many of you have never studied any linear algebra at all, or have only studied completely mathematical linear algebra, where you never deal with n-tuples as vectors or matrices or anything like this? Is there anyone?

If you don't have this background, pick up-- what's his name?

AUDIENCE: Strang.

PROFESSOR: Strang. Strang's book. It's a remarkably simple-minded book which says everything as clearly as it can be stated. And it tells you everything you have to know. And it does it in a very straightforward way. So I highly recommend it for any of the background that you might need. Most of you, I'm sure, are very familiar with these things. So I'm just reminding you of them.

Now, a square matrix A is singular if there's a vector v such that A times v is equal to 0. That's just the definition of singularity. Now, lambda is an eigenvalue of a matrix P if and only if P minus lambda times i is singular. In other words, if there's some v for which P minus lambda i, times v, is equal to 0. That's what this says: you put P minus lambda i in for A, and it says that this matrix is singular if there's some v such that P minus lambda i times v is equal to 0.

So let a1 to aM be the columns of A. Then A is going to be singular if a1 to aM are linearly dependent.
In other words, if there's some set of coefficients v1 to vM you can attach, so that a1 times v1 plus a2 times v2 plus on up to aM times vM is equal to 0, that means that a1 to aM are linearly dependent. It also means that the matrix A times that vector v is equal to 0. So those two things say the same thing again.

So the square matrix A is singular if and only if the rows of A are linearly independent. We said columns here; here, we're doing the same thing for rows. It still holds true. And one new thing: if and only if the determinant of A is equal to 0. One of the nice things about determinants is that the determinant is 0 if the matrix is singular, and only if the matrix is singular.

So the summary of all of this, for a matrix which is a transition matrix, namely a stochastic matrix, is: lambda is an eigenvalue of P if and only if P minus lambda i is singular, if and only if the determinant of P minus lambda i is equal to 0, if and only if P times v equals lambda v for some vector v, and if and only if u times P equals lambda u for some vector u. Yes?

AUDIENCE: The second to last statement, it's actually linearly independent, you said? The second to last. Square matrix A. No, above that.

PROFESSOR: Oh, above that. A square matrix A is singular if and only if the rows of A are linearly dependent, yes.

AUDIENCE: Dependent.

PROFESSOR: Dependent, yes. In other words, if there's some vector v such that A times v is equal to 0, that means that those columns are linearly dependent.

So we need all of those relationships. It says, for every stochastic matrix-- oh, now this is something new. For every stochastic matrix, P times e is equal to e. Obviously, because the sum of P sub ij over j is equal to 1. P sub ij is the probability, given that you start in state i, that in the next step you'll be in state j. You have to be somewhere in the next step.
So if you sum these quantities up, you have to get 1, which says you have to be some place. So that's all this is saying. And that's true for every finite-state Markov chain in the world, no matter how ugly it is, how many sets of recurrent states it has, how much periodicity it has. In complete generality, P times e is equal to e. So lambda equals 1 is always an eigenvalue of a stochastic matrix, and e is always a right eigenvector for it.

Well, from what we've just said, that means there has to be a left eigenvector also. So there has to be some pi such that pi times P is equal to pi. So suddenly, we find there's also a left eigenvector. What we haven't shown yet is that a pi that satisfies this equation is a probability vector. Namely, we haven't shown that all the components of pi are greater than or equal to 0. We still have to do that. And in fact, that's not completely trivial. If we can find such a vector that is a probability vector, where the components sum to 1 and they're not negative, then this is the equation for a steady state vector.

So what we don't know yet is whether a steady state vector exists. We do know that a left eigenvector exists. We're going to show later that there is a steady state vector pi, in other words, a non-negative vector whose components sum to 1, for all finite-state Markov chains. In other words, no matter how messy the chain is, just as e, the column vector of all 1s, is always a right eigenvector of eigenvalue 1, there is always a non-negative vector pi whose components sum to 1 which is a left eigenvector with eigenvalue 1. So these two relationships hold everywhere.

Incidentally, the notes at one point claim to have shown this. And the notes really don't show it. I'm going to show it to you today. I'm sorry for that. It's something I've known for so long that I find it hard to ask whether it's true or not. Of course it's true. But it does have to be shown, and I will show it to you later on.
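As an aside, here is a minimal sketch, assuming numpy, of finding that left eigenvector of eigenvalue 1 and normalizing it into a steady state vector; the three-state chain is a made-up example.

```python
import numpy as np

# A made-up three-state ergodic chain (each row sums to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

evals, evecs = np.linalg.eig(P.T)   # eigenvectors of P transpose = left eigenvectors of P
k = np.argmin(np.abs(evals - 1.0))  # pick the eigenvalue closest to 1
pi = np.real(evecs[:, k])
pi = pi / pi.sum()                  # scale so the components sum to 1

print(pi, pi @ P)                   # pi P = pi: the steady state equation
```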
Chapter 3 of the notes is largely rewritten this year. And it has a few more typos in it than most of the other chapters. And a few of the typos are fairly important. I'll try to point some of them out as we go, but I'm sure I haven't caught them all yet.

Now, what is the determinant of an M by M matrix? It's this very simple-looking but rather messy formula, which says the determinant of a square matrix A is the sum over all permutations mu, with a plus or minus sign that I'll talk about later, of the product from i equals 1 to M of A sub i, mu of i, where M is the number of states. A sub i, mu of i is the element in row i and column mu of i, the column that the permutation mu assigns to row i.

So what we're doing is taking a matrix with all sorts of terms in it, A11 up to A1J, on down to AJ1 up to AJJ. And these permutations we're talking about are ways of selecting one element from each row and one element from each column. Namely, that product there takes one element from each row. And then when we're talking about a permutation here, we're doing something like this: for this row, we're looking at, say, this element. For the next row, we might be looking at this element. For the next row, we might be looking at this element, and so forth down, until finally we're looking at some element down here. Now, we've picked out every column and every row in doing this, but we only have one element in each row and one element in each column.

If you've studied linear algebra and you're at all interested in computation, the first thing that everybody tells you is that this is a god-awful way to ever compute a determinant, because the number of permutations grows very, very fast with the size of the matrix. And therefore you don't want to use this formula very often.
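Just to see what the expansion looks like written out, here is a sketch of it in Python, with the sign of each permutation computed by counting inversions; it is only for illustration, since the sum has M factorial terms.

```python
import numpy as np
from itertools import permutations

def det_by_permutations(A):
    """Determinant via the permutation expansion: sum over mu of sign(mu) * prod_i A[i, mu(i)]."""
    M = A.shape[0]
    total = 0.0
    for mu in permutations(range(M)):
        # Sign: +1 for an even permutation, -1 for an odd one (count inversions).
        sign = 1
        for i in range(M):
            for j in range(i + 1, M):
                if mu[i] > mu[j]:
                    sign = -sign
        term = sign
        for i in range(M):
            term *= A[i, mu[i]]    # one element from each row and each column
        total += term
    return total

A = np.array([[0.9, 0.1],
              [0.4, 0.6]])
print(det_by_permutations(A), np.linalg.det(A))  # the two agree: 0.5
```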
It's a very useful formula conceptually, though, because if we look at the determinant of P minus lambda i, and we want to ask the question, how many eigenvalues does this transition matrix have, well, the number of eigenvalues it has is the number of values of lambda such that the determinant of P minus lambda i is 0.

Now, how many such values are there? Well, you look at the matrix for that, and you get A11 minus lambda, A12, then A22 minus lambda, and so on down to AJJ minus lambda. And none of the other elements has a lambda in it. So when you're looking at this formula for finding the determinant, one of the permutations is the diagonal one, which gives a polynomial of degree J in lambda. All of the others give polynomials of degree less than J in lambda. And therefore this whole bloody mess here is a polynomial of degree J in lambda.

So we have the equation: the determinant of P minus lambda i, which is a polynomial of degree J in lambda, equals 0. How many roots does it have? Well, the fundamental theorem of algebra says that a polynomial of degree J over the complex numbers, and real is a special case of complex, has exactly J roots. So there are exactly, in this case, M roots. Excuse me, I've been calling it J sometimes and M sometimes. This equation here has exactly M roots. And since it has exactly M roots, that's the number of eigenvalues there are.

There's one flaw in that argument. And that is, some of the roots might be repeated. You still have M roots altogether, but some of them appear more than one time, so you'll have roots of multiplicity something or other. And when you add up the multiplicities of each of the distinct eigenvalues, you get capital M, which is the number of states. So the number of distinct eigenvalues is less than or equal to M, and the multiplicities of the distinct eigenvalues add up to M. That's a simple, straightforward fact, and it's worth remembering. So there are M roots to the equation
determinant of P minus lambda i equals 0, and therefore there are M eigenvalues of P. And therefore you might think that there are M eigenvectors. That, unfortunately, is not necessarily true. That's probably the only really ugly thing in linear algebra. I mean, linear algebra is a beautiful theory. It's like the Poisson process in stochastic processes: everything that can be true is true. And if something isn't true, there's a simple counterexample showing why it can't be true. This thing, though, is just a bloody mess. Unfortunately, if you have M states in a finite-state Markov chain, you might not have M different eigenvectors. And that's unfortunate, but we will forget about it for as long as we can, and we'll finally come back to it towards the end.

AUDIENCE: [INAUDIBLE]?

PROFESSOR: What?

AUDIENCE: Why would we care about all the eigenvectors if we are only concerned with the ones that [INAUDIBLE]?

PROFESSOR: Well, we're interested in the other ones because they tell us how fast P to the n converges to what it should be. I mean, all those other eigenvalues, as we'll see, give the error terms in P to the n as it approaches its asymptotic value. And therefore we want to know what those eigenvalues are. At least we want to know what the second-biggest eigenvalue is.

Now, let's look at just the case of two states. Most of the things that can happen will happen with two states, except for this ugly thing that I told you about, which can't happen with two states. And therefore two states is a good thing to look at, because with two states, you can calculate everything very easily and you don't have to use any linear algebra.

So if we look at a Markov chain with two states, P sub ij is this set of transition probabilities. The left eigenvector equation is pi 1 times P11 plus pi 2 times P21 is equal to lambda pi 1, and so on. This is writing out what we said before: the vector pi times the matrix P is equal to lambda times the vector pi. That covers both of these equations.
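As a quick sanity check on where this is heading, here is a small numpy sketch, with made-up values of P12 and P21, confirming numerically that the two eigenvalues are 1 and 1 minus P12 minus P21, the result derived just below.

```python
import numpy as np

# Made-up off-diagonal transition probabilities for a two-state chain.
p12, p21 = 0.3, 0.2
P = np.array([[1 - p12, p12],
              [p21, 1 - p21]])

evals = np.linalg.eigvals(P)
print(sorted(evals, reverse=True))  # [1.0, 0.5]
print(1 - p12 - p21)                # 0.5, matching the second eigenvalue
```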
Since M is only 2, we only have to write things out twice. Same thing for the right eigenvector equation. That's this.

For the determinant of P minus lambda i, if we use the formula that we talked about here, you put in P11 minus lambda and P22 minus lambda, and then you're done. So all you need is P11 minus lambda times P22 minus lambda; that's the even permutation there. And then you have an odd permutation, which gives minus P12 times P21. How do you know which permutations are even and which permutations are odd? It's how many flips you have to do. But to see that that's consistent, you really have to look at Strang or some book on linear algebra, because it's not relevant here. But anyway, that determinant is equal to this quantity here. That's a polynomial of degree 2 in lambda. If you solve it, you find that one solution is lambda 1 equals 1. The other solution is lambda 2 equals 1 minus P12 minus P21.

Now, there are a bunch of cases to look at here. If the off-diagonal transition probabilities are both 0, what does that mean? It means if you start in state 1, you stay there forever. If you start in state 2, you stay there forever. That's a very boring Markov chain, and it's not very nice for the theory. So we're going to leave that case out for the time being. But anyway, if you have that case, then the chain has two recurrent classes. Lambda equals 1 then has algebraic multiplicity 2. I mean, it's just one number, but it appears twice in this determinant equation. And it also appears twice in the sense that you have two recurrent classes. And you will find that there are two linearly independent left eigenvectors and two linearly independent right eigenvectors. And how do you find those? You use your common sense and you say, well, if you start in state 1, you're always there. If you start in state 2, you're always there.
Why do I even look at these two states? This is a crazy thing where, wherever I start, I stay there, and I only look at state 1 or state 2. It's scarcely even a Markov chain.

If P12 and P21 are both 1, what it means is you can never stay in state 1. You always go from state 1 to state 2, and you always go from state 2 to state 1. It means you have a two-state periodic chain. And that's the other crazy case. That case is not very interesting either; there's nothing stochastic about it at all. So the chain is periodic. And if you look at this equation here, the second eigenvalue is equal to minus 1.

I might as well tell you that, in general, if you have a periodic Markov chain, just one recurrent class and it's periodic with period d, then the d eigenvalues of magnitude 1 turn out to be uniformly spaced around the unit circle. One is one of those eigenvalues; we've already seen that. And the other d minus 1 are uniformly spaced around the unit circle, so they add up to 360 degrees when you get all done with it. So that's an easy case. Proving it is tedious. It's done in the notes. Well, it's not even done in the notes; it's done in one of the exercises. And you can do it if you choose.
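Here is a small numerical illustration of that claim, not the exercise itself, using the deterministic d-cycle as the simplest chain of period d; it assumes numpy.

```python
import numpy as np

d = 4
P = np.zeros((d, d))
for i in range(d):
    P[i, (i + 1) % d] = 1.0   # deterministic cycle: state i always goes to i+1 mod d

evals = np.linalg.eigvals(P)
print(np.sort_complex(evals))           # the d-th roots of unity: -1, -i, i, 1 for d = 4
print(np.allclose(np.abs(evals), 1.0))  # True: all d eigenvalues lie on the unit circle
```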
So let's look at these eigenvector equations and the eigenvalue equations. Incidentally, if you don't know what the eigenvalues are, is this a linear set of equations? No, it's a nonlinear set of equations. It's a nonlinear set of equations in pi 1, pi 2, and lambda. How do you solve nonlinear equations like that? Well, if you have much sense, you first find out what lambda is, and then you solve linear equations. And you can always do that. We've said that there can only be M solutions for lambda, and you can find them by solving this polynomial equation. Then you can solve the linear equations to find the eigenvectors. There are packages to do all of these things, so there's nothing you should waste time doing here. It's just knowing what the results are that's important.

From now on, I'm going to assume that P12 and P21 are not both 0 and not both 1. In other words, I'm going to assume that we don't have the periodic case and we don't have the case where you have two classes of states. In other words, I'm going to assume that our Markov chain is actually ergodic. That's the assumption I'm making here.

If you then solve these equations using lambda 1 equals 1, you'll find that pi is the vector whose components sum to 1: the first component is P21 over P12 plus P21, and the second component is P12 over P12 plus P21. Not very interesting.

Why is the steady state probability weighted towards the state with the larger incoming transition probability? If P21 is bigger than P12, how do you know intuitively that you're going to be in state 1 more than you're in state 2? Is this intuitively obvious to-- yeah?

AUDIENCE: [INAUDIBLE].

PROFESSOR: Because you make more transitions from 2 to 1. Well, actually you don't make more transitions from 2 to 1. You make exactly the same number of transitions each way, but since the probability of going from 2 to 1 is higher, it means you have to be in state 1 more of the time. Good.

So these are the two components of the steady state vector. And this is the left eigenvector for the second eigenvalue, namely the smaller eigenvalue. Now, if you look at these equations, you'll notice that pi to the i, the i-th left eigenvector, multiplied by nu to the j, the j-th right eigenvector, is always equal to delta ij. In other words, the left eigenvectors and the right eigenvectors are orthogonal for different eigenvalues. I mean, you can see this just by multiplying it out. You multiply pi 1 times nu 1, and what do you get? You get this plus this, which is 1. Delta ij means something which is 1 when i is equal to j and 0 when i is unequal to j. You take this and you multiply it by this, and what do you get? You get P21 times P12 over the square,
minus P12 times P21 over the square, which is 0. Same thing here. The vector 1, minus 1, times this vector, is 0 again. So the cross terms are 0, the diagonal terms are 1. That's the way it is.

So let's move on with this. These right eigenvector equations, you can write them in matrix form. I'm doing this slowly; I hope I'm not boring those who have done a lot of linear algebra too much. But this won't go on forever, and it gets us to where we want to go. So if you take these two equations and you write them in matrix form, what you get is P times U equals U times lambda, where U is a matrix whose columns are the vector nu 1 and the vector nu 2, and capital lambda is the diagonal matrix of the eigenvalues. If you multiply P times the first column of U, and then you look at the first column of this matrix, what you get-- yes, that's exactly the right way to do it. And if you're not doing that, you're probably not understanding it. But if you just think of ordinary matrix-vector multiplication, this all works out.

Because of this orthogonality relationship, we see that the matrix whose rows are the left eigenvectors, times the matrix whose columns are the right eigenvectors, is equal to the identity matrix. That's what this orthogonality relationship means. It means that this matrix is the inverse of this matrix. And this proves that U is invertible. We've done this just for M equals 2, but in fact this proof is general and holds for arbitrary Markov chains whenever the eigenvectors span the space. And we'll see that later. We're doing it for M equals 2 now, so we'll know how to proceed when we have an arbitrary Markov chain.

So U is invertible, and U to the minus 1 has pi 1 and pi 2 as its rows. And thus P is going to be equal to-- I guess we should-- oh, we set it up here. P times U is equal to U times lambda.
We've shown here that U is invertible, so we can multiply this equation by U to the minus 1. And we get: the transition matrix P is equal to U, times the diagonal matrix lambda, times U to the minus 1.

What happens if we try to find P squared? Well, it's U times lambda times U to the minus 1, times U times lambda times U to the minus 1. One of the nice things about matrices is that you can multiply them, if you don't worry about the details, almost like numbers. The only thing you don't have is commutativity. But anyway, you have U times lambda times U to the minus 1, times U times lambda times U to the minus 1. This U to the minus 1 and this U in the middle turn out to be the identity matrix, so you have U, times lambda times lambda, which is lambda squared, times U to the minus 1. You still have this diagonal matrix here, but the eigenvalues have all been squared.

If you keep doing that repeatedly, you find that P to the n, namely this long-term transition matrix, which is the thing we're interested in, is the matrix U times the diagonal matrix lambda to the n, times U to the minus 1. Equation 3.29 in the text has a typo, and it should be this. It's given as U to the minus 1 times lambda to the n times U, which is not at all right. That's probably the worst typo, because if you try to derive something from that, you'll get very confused.

You can solve this in general, if all the M eigenvalues are distinct, as easily as for M equals 2. And it's still valid so long as the eigenvectors span the space.

So now the thing we want to do is relatively simple. This lambda to the n is a diagonal matrix. I can represent it as the sum of M different matrices, and each of those matrices has only one non-zero diagonal element. In other words, for the case here, what we're doing is taking the diagonal matrix with lambda 1 to the n and lambda 2 to the n on the diagonal, and representing it as the matrix with lambda 1 to the n in the upper left and 0s everywhere else, plus the matrix with lambda 2 to the n in the lower right and 0s everywhere else.
So we have those trivial matrices, with U on the left side and U to the minus 1 on the right side. And we think of how to multiply the matrix U, whose columns are the right eigenvectors, times this matrix with only one non-zero element, times the matrix here, whose rows are the left eigenvectors. And how do you do that? Well, if you do this for a while, and you think of what this one element here times a matrix whose rows are eigenvectors does, this non-zero term in here picks out the appropriate row here, and it picks out the appropriate column here.

So what that gives you is: P to the n is equal to the sum, over i from 1 to M, the number of states in the Markov chain, of lambda sub i, the i-th eigenvalue, to the nth power, times nu to the i times pi to the i. Pi to the i is the i-th left eigenvector of P, and nu to the i is the i-th right eigenvector of P. They have nothing to do with n; the only thing that n affects is this eigenvalue here. And what this is saying is that P to the n is just a sum of terms, each of which is, if the magnitude of lambda sub i is bigger than 1, exploding; if it's less than 1, going to 0; and if lambda sub i is equal to 1, staying constant. If lambda sub i is complex but has magnitude 1, then it's just gradually rotating around and not doing much of interest at all, but it's not going away. So that's what this equation means. It says that we've converted the problem of finding the nth power of P into the problem of finding the nth powers of these eigenvalues. So we've made some real progress.

AUDIENCE: Professor, what is nu i right here?

PROFESSOR: What?

AUDIENCE: What is nu i?

PROFESSOR: Nu sub i is the i-th of the right eigenvectors of the matrix P.

AUDIENCE: And pi i?

PROFESSOR: And pi i is the i-th left eigenvector. And what we've shown is that these are orthogonal to each other, orthonormal in fact.

AUDIENCE: Can you please say again what happens when lambda is complex?

PROFESSOR: What?
AUDIENCE: When lambda is complex, what exactly happens?

PROFESSOR: Oh, if lambda i is complex and the magnitude is less than 1, it just dies away. If the magnitude is bigger than 1, it explodes, which would be very strange, and we'll see that can't happen. And if the magnitude is 1, as you take powers of a complex number of magnitude 1, it starts out here, it goes here, then here. It just rotates around in some crazy way. But it maintains its magnitude as being equal to 1 all the time.

So this is just repeating what we had before. These are the eigenvectors. You can calculate this very quickly using this and this, if you recognize that the right eigenvector nu 2 has pi sub 2 as its first component and minus pi sub 1 as its second component, where pi is just this first left eigenvector here. So if you do this multiplication, you find that nu to the 1-- oh, I thought I had all of these things out; this should be nu. The first right eigenvector times the first left eigenvector: but this is all right, because the first left eigenvector is the steady state vector, which is the thing we're interested in. That product is the matrix with rows pi 1, pi 2 and pi 1, pi 2, where pi 1 is this and pi 2 is this. And nu 2 times pi 2 is just this.

So when we calculate P to the n, we get pi 1 plus pi 2 times the second eigenvalue to the nth power as the first entry. The matrix with both rows equal to pi 1, pi 2 is what we get from the main eigenvalue; the matrix with rows pi 2, minus pi 2 and minus pi 1, pi 1, times lambda 2 to the nth power, is what we get from the little eigenvalue. This little eigenvalue here is 1 minus P12 minus P21, which has magnitude less than 1, unless we either have the situation where P12 and P21 are both equal to 0, or both of them are equal to 1. So these are the terms that go to 0.

This solution is exact; there were no approximations in here. Before, when we analyzed what happened to P to the n, we saw that we converged, but we didn't really see how fast we converged. Now we know how fast we converge, as the little sketch below illustrates.
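Here is a tiny sketch of that convergence, with the same made-up transition probabilities as before: the largest entry of P to the n minus its limit shrinks exactly in proportion to lambda 2 to the n.

```python
import numpy as np

p12, p21 = 0.3, 0.2
P = np.array([[1 - p12, p12],
              [p21, 1 - p21]])
pi = np.array([p21, p12]) / (p12 + p21)   # steady state vector (0.4, 0.6)
lam2 = 1 - p12 - p21                      # second eigenvalue, 0.5

for n in (1, 5, 10):
    err = np.linalg.matrix_power(P, n) - np.outer(np.ones(2), pi)
    # The ratio is constant (0.6 here): the error decays exactly as lam2**n.
    print(n, np.max(np.abs(err)) / lam2 ** n)
```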
771 00:48:53,750 --> 00:48:59,010 The rate of convergence is the value of this second 772 00:48:59,010 --> 00:49:01,790 eigenvalue here. 773 00:49:01,790 --> 00:49:04,230 And that's a pretty general result. 774 00:49:04,230 --> 00:49:08,150 You converge like the second-largest eigenvalue to the nth power. 775 00:49:08,150 --> 00:49:10,900 And we'll see how that works out. 776 00:49:15,210 --> 00:49:18,810 Now, let's go on to the case where you have an arbitrary 777 00:49:18,810 --> 00:49:20,230 number of states. 778 00:49:20,230 --> 00:49:23,820 We've almost solved that already, because as we were 779 00:49:23,820 --> 00:49:29,870 looking at the case with two states, we were doing most of 780 00:49:29,870 --> 00:49:32,000 the things in general. 781 00:49:32,000 --> 00:49:36,430 If you have an M-state Markov chain, the determinant of P 782 00:49:36,430 --> 00:49:40,760 minus lambda I is a polynomial of degree M in lambda. 783 00:49:40,760 --> 00:49:42,790 That was what we said a while ago. 784 00:49:42,790 --> 00:49:45,480 It has M roots, eigenvalues. 785 00:49:45,480 --> 00:49:48,740 And here, we're going to assume that those roots are 786 00:49:48,740 --> 00:49:49,520 all distinct. 787 00:49:49,520 --> 00:49:52,590 So we don't have to worry about what happens with 788 00:49:52,590 --> 00:49:54,320 repeated roots. 789 00:49:54,320 --> 00:49:58,010 Each eigenvalue lambda sub i-- there are M of them now-- 790 00:49:58,010 --> 00:50:03,160 has a right eigenvector, nu sub i, and a left 791 00:50:03,160 --> 00:50:06,010 eigenvector, pi sub i. 792 00:50:06,010 --> 00:50:10,030 And we have seen that-- 793 00:50:10,030 --> 00:50:11,220 well, we haven't seen it yet. 794 00:50:11,220 --> 00:50:13,140 We're going to show it in a second. 795 00:50:13,140 --> 00:50:18,060 pi super i times nu super j is equal to 0 for each 796 00:50:18,060 --> 00:50:20,420 j unequal to i. 797 00:50:20,420 --> 00:50:24,660 If you scale either this or that-- when you solve this 798 00:50:24,660 --> 00:50:30,160 eigenvector equation, you have a pi on both sides or a nu on 799 00:50:30,160 --> 00:50:34,280 both sides, and you have a scale factor which can't be 800 00:50:34,280 --> 00:50:37,040 determined from the eigenvector equation. 801 00:50:37,040 --> 00:50:41,020 So you have to choose that scaling factor somehow. 802 00:50:41,020 --> 00:50:45,070 If we choose the scaling factor appropriately, we get 803 00:50:45,070 --> 00:50:51,810 pi, the i-th left eigenvector, times the i-th right 804 00:50:51,810 --> 00:50:52,075 eigenvector. 805 00:50:52,075 --> 00:50:53,610 This is just a number now. 806 00:50:53,610 --> 00:50:56,520 It's that times that. 807 00:50:56,520 --> 00:51:00,810 We can scale things, so that's equal to 1. 808 00:51:00,810 --> 00:51:05,930 Then as before, let u be the matrix with columns nu 1 to nu 809 00:51:05,930 --> 00:51:12,090 M, and let v have the rows, pi 1 to pi M. Because of this 810 00:51:12,090 --> 00:51:16,340 orthogonality relationship we've set up, v times u is 811 00:51:16,340 --> 00:51:17,530 equal to the identity, I. 812 00:51:17,530 --> 00:51:26,310 So again, the left eigenvector rows form a matrix which is 813 00:51:26,310 --> 00:51:30,910 the inverse of the right eigenvector columns. 814 00:51:30,910 --> 00:51:35,400 So that says v is equal to u to the minus 1. 815 00:51:35,400 --> 00:51:41,040 Thus the eigenvectors nu 1, the first right eigenvector, up 816 00:51:41,040 --> 00:51:44,870 to nu M, the M-th right eigenvector, are linearly 817 00:51:44,870 --> 00:51:46,100 independent.
818 00:51:46,100 --> 00:51:47,350 And they span M space. 819 00:51:50,040 --> 00:51:53,480 That's a very peculiar thing we've done. 820 00:51:53,480 --> 00:51:57,600 We've said we have all these M right eigenvectors. 821 00:51:57,600 --> 00:52:02,690 We don't know anything about them, but what we do know is 822 00:52:02,690 --> 00:52:08,030 we also have M left eigenvectors. 823 00:52:08,030 --> 00:52:13,070 And the left eigenvectors, as we're going to show in just a 824 00:52:13,070 --> 00:52:17,800 second, are orthogonal to the right eigenvectors. 825 00:52:17,800 --> 00:52:20,500 And therefore, when we look at these two matrices, we can 826 00:52:20,500 --> 00:52:23,610 multiply them and get the identity matrix. 827 00:52:23,610 --> 00:52:29,370 And that means that the right eigenvectors have to be-- 828 00:52:29,370 --> 00:52:31,970 when we look at the matrix of the right eigenvectors, it's 829 00:52:31,970 --> 00:52:33,220 non-singular. 830 00:52:34,920 --> 00:52:37,870 Very, very peculiar argument. 831 00:52:37,870 --> 00:52:41,350 I mean, we find out that those right eigenvectors span the 832 00:52:41,350 --> 00:52:44,600 space, not by looking at the right eigenvectors, but by 833 00:52:44,600 --> 00:52:48,220 looking at how they relate to the left eigenvectors. 834 00:52:48,220 --> 00:52:51,370 But anyway, that's perfectly all right. 835 00:52:51,370 --> 00:52:54,890 And so long as we can show that we can satisfy this 836 00:52:54,890 --> 00:53:01,330 orthogonality condition, then in fact all this works out. v 837 00:53:01,330 --> 00:53:03,560 is equal to u to the minus 1. 838 00:53:03,560 --> 00:53:06,210 These eigenvectors are linearly independent and they 839 00:53:06,210 --> 00:53:07,380 span M space. 840 00:53:07,380 --> 00:53:08,630 Same here. 841 00:53:12,980 --> 00:53:17,010 And putting these equations together, P times u equals u 842 00:53:17,010 --> 00:53:17,810 times lambda. 843 00:53:17,810 --> 00:53:19,960 This is exactly what we did before. 844 00:53:19,960 --> 00:53:24,680 Post-multiplying by u to the minus 1, we get P equals u 845 00:53:24,680 --> 00:53:27,210 times lambda times u to the minus 1. 846 00:53:27,210 --> 00:53:30,430 P to the n is then u times lambda to the n times u 847 00:53:30,430 --> 00:53:32,230 to the minus 1. 848 00:53:32,230 --> 00:53:35,670 All this stuff about convergence all comes 849 00:53:35,670 --> 00:53:39,010 down to simply the question of what happens to these 850 00:53:39,010 --> 00:53:40,200 eigenvalues. 851 00:53:40,200 --> 00:53:43,420 I mean, there's a mess first, finding out what all these 852 00:53:43,420 --> 00:53:47,580 right eigenvectors are and what all these left 853 00:53:47,580 --> 00:53:48,740 eigenvectors are. 854 00:53:48,740 --> 00:53:54,870 But once you do that, P to the n is just looking at this 855 00:53:54,870 --> 00:53:57,300 quantity, breaking up lambda to the n 856 00:53:57,300 --> 00:53:59,660 the way we did before. 857 00:53:59,660 --> 00:54:04,550 P to the n is just this sum here. 858 00:54:04,550 --> 00:54:08,670 Now, each row of P sums to 1, so e is a right eigenvector of 859 00:54:08,670 --> 00:54:10,960 eigenvalue 1. 860 00:54:10,960 --> 00:54:15,040 So we have a theorem that says the left eigenvector pi of 861 00:54:15,040 --> 00:54:20,060 eigenvalue 1 is a steady state vector if it's normalized to 862 00:54:20,060 --> 00:54:22,510 pi times e equals 1.
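As a quick illustration of that theorem, here is a minimal sketch, assuming Python with numpy and an assumed 3-state ergodic chain: the left eigenvectors of P are the right eigenvectors of P transpose, and the eigenvalue-1 eigenvector, normalized to sum to 1, is the steady state vector.

```python
import numpy as np

# An assumed 3-state ergodic chain (each row sums to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# Left eigenvectors of P are right eigenvectors of P transpose.
evals, evecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(evals - 1.0))   # locate the eigenvalue 1
pi = np.real(evecs[:, k])
pi = pi / pi.sum()                   # normalize so that pi times e is 1

print(pi)                            # non-negative, sums to 1
print(np.allclose(pi @ P, pi))       # satisfies pi P = pi: True
```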
863 00:54:26,050 --> 00:54:30,760 So we almost did that before, but now we want to be a little 864 00:54:30,760 --> 00:54:32,010 more careful about it. 865 00:54:38,768 --> 00:54:42,180 Oh, excuse me. 866 00:54:42,180 --> 00:54:45,640 The theorem is that the left eigenvector pi is a steady 867 00:54:45,640 --> 00:54:48,040 state vector if it's normalized in this way. 868 00:54:48,040 --> 00:54:53,250 In other words, we know that there is a left eigenvector 869 00:54:53,250 --> 00:54:57,830 pi, which has eigenvalue 1, because there's a right 870 00:54:57,830 --> 00:54:58,260 eigenvector. 871 00:54:58,260 --> 00:55:00,850 If there's a right eigenvector, there has to be a 872 00:55:00,850 --> 00:55:02,320 left eigenvector. 873 00:55:02,320 --> 00:55:06,190 What we don't know is that pi actually has 874 00:55:06,190 --> 00:55:08,340 non-negative terms. 875 00:55:08,340 --> 00:55:11,320 So that's the thing we want to show. 876 00:55:11,320 --> 00:55:15,790 The proof is, there must be a left eigenvector pi for 877 00:55:15,790 --> 00:55:16,960 eigenvalue 1. 878 00:55:16,960 --> 00:55:18,540 We already know that. 879 00:55:18,540 --> 00:55:25,275 For every j, pi sub j is equal to the sum over k of pi sub 880 00:55:25,275 --> 00:55:27,490 k times P sub kj. 881 00:55:27,490 --> 00:55:29,860 We don't know whether these are complex or real. 882 00:55:29,860 --> 00:55:32,140 We don't know whether they're positive or negative, if 883 00:55:32,140 --> 00:55:33,270 they're real. 884 00:55:33,270 --> 00:55:37,960 But we do know that since they satisfy this eigenvector 885 00:55:37,960 --> 00:55:41,500 equation, they satisfy this equation. 886 00:55:41,500 --> 00:55:43,670 If I take the magnitudes of all of these 887 00:55:43,670 --> 00:55:45,220 things, what do I get? 888 00:55:45,220 --> 00:55:51,440 The magnitude on this side is the magnitude of pi sub j. 889 00:55:51,440 --> 00:55:55,700 This is less than or equal to the sum of the magnitudes of 890 00:55:55,700 --> 00:55:56,720 these terms. 891 00:55:56,720 --> 00:56:03,690 If you take two complex numbers and you add them up, 892 00:56:03,690 --> 00:56:07,250 you get something which, in magnitude, is less than or 893 00:56:07,250 --> 00:56:10,120 equal to the sum of the magnitudes. 894 00:56:12,804 --> 00:56:16,030 It might sound strange, but if you look 895 00:56:16,030 --> 00:56:20,070 in the complex plane-- 896 00:56:20,070 --> 00:56:23,480 imaginary, real-- 897 00:56:23,480 --> 00:56:27,250 and you look at one complex number, and you add it to 898 00:56:27,250 --> 00:56:33,380 another complex number, this distance here is less than or 899 00:56:33,380 --> 00:56:36,700 equal to this magnitude plus this magnitude. 900 00:56:36,700 --> 00:56:38,940 That's all that equation is saying. 901 00:56:38,940 --> 00:56:44,160 And this is equal to this distance plus this distance if 902 00:56:44,160 --> 00:56:51,080 and only if these components of the eigenvector 903 00:56:51,080 --> 00:56:55,110 that we're talking about 904 00:56:55,110 --> 00:56:57,630 are all heading off in the same direction 905 00:56:57,630 --> 00:57:00,620 in the complex plane. 906 00:57:00,620 --> 00:57:02,404 Now what do we do? 907 00:57:02,404 --> 00:57:05,950 Well, you look at this for a while and you say, OK, what 908 00:57:05,950 --> 00:57:11,031 happens if I sum this inequality over j? 909 00:57:11,031 --> 00:57:15,320 Well, if I sum these transition probabilities P sub kj over j, I get 1.
910 00:57:15,320 --> 00:57:28,410 And therefore when I sum both sides over j, the sum over j 911 00:57:28,410 --> 00:57:33,240 of the magnitudes of these eigenvector components is less 912 00:57:33,240 --> 00:57:36,570 than or equal to the sum over k of the magnitudes. 913 00:57:36,570 --> 00:57:38,760 This is the same as this. 914 00:57:38,760 --> 00:57:42,220 This j is just a dummy index of summation. 915 00:57:42,220 --> 00:57:45,030 This is a dummy index of summation. 916 00:57:45,030 --> 00:57:47,810 Obviously, this is less than or equal to this. 917 00:57:47,810 --> 00:57:52,470 But what's interesting here is that this is equal to this. 918 00:57:52,470 --> 00:57:56,290 And the only way this can be equal to this is if every one 919 00:57:56,290 --> 00:58:00,450 of these things is satisfied with equality. 920 00:58:00,450 --> 00:58:03,720 If any one of these is satisfied with strict inequality, 921 00:58:03,720 --> 00:58:07,690 then when you add them all up, this will be satisfied with 922 00:58:07,690 --> 00:58:10,120 strict inequality also, which is impossible. 923 00:58:10,120 --> 00:58:15,080 So all of these are satisfied with equality, which says that 924 00:58:15,080 --> 00:58:25,060 the magnitudes of pi sub j-- the vector whose elements are the 925 00:58:25,060 --> 00:58:31,010 magnitudes of this thing we started with-- in fact form a 926 00:58:31,010 --> 00:58:35,700 steady state vector if we normalize them to sum to 1. 927 00:58:35,700 --> 00:58:38,600 It says these magnitudes satisfy the 928 00:58:38,600 --> 00:58:41,270 steady state equation. 929 00:58:41,270 --> 00:58:45,010 These magnitudes are real and they're non-negative. 930 00:58:45,010 --> 00:58:48,470 So when we normalize them to sum to 1, we have a steady 931 00:58:48,470 --> 00:58:50,940 state vector. 932 00:58:50,940 --> 00:58:53,780 And therefore the left eigenvector pi of eigenvalue 1 933 00:58:53,780 --> 00:58:57,790 is a steady state vector if it's normalized to pi times e 934 00:58:57,790 --> 00:59:03,120 equals 1, which is the way we want to normalize them. 935 00:59:03,120 --> 00:59:07,140 So there always is a steady state vector for every 936 00:59:07,140 --> 00:59:08,840 finite-state Markov chain. 937 00:59:12,440 --> 00:59:15,580 So this is a non-negative vector satisfying the steady 938 00:59:15,580 --> 00:59:16,960 state equation. 939 00:59:16,960 --> 00:59:20,420 And normalizing it, we have a steady state vector. 940 00:59:20,420 --> 00:59:24,300 So we've demonstrated the existence of a left 941 00:59:24,300 --> 00:59:27,480 eigenvector which is a steady state vector. 942 00:59:27,480 --> 00:59:34,180 Another theorem is that every eigenvalue lambda satisfies this: the 943 00:59:34,180 --> 00:59:37,520 magnitude of the eigenvalue is less than or equal to 1. 944 00:59:37,520 --> 00:59:41,370 This, again, is sort of obvious, because if you have 945 00:59:41,370 --> 00:59:45,190 an eigenvalue which is bigger than 1 in magnitude and you start taking 946 00:59:45,190 --> 00:59:49,020 powers of it, it starts marching off to infinity. 947 00:59:49,020 --> 00:59:50,920 Now, you might say, maybe something else 948 00:59:50,920 --> 00:59:52,140 is balancing that. 949 00:59:52,140 --> 00:59:55,760 But since you only have a finite number of these things, 950 00:59:55,760 --> 00:59:58,050 that sounds pretty weird. 951 00:59:58,050 --> 00:59:59,790 And in fact, it is. 952 00:59:59,790 --> 01:00:09,140 So the proof of this is, we want to assume that pi super l 953 01:00:09,140 --> 01:00:14,985 is the l-th of these eigenvectors of P.
Its 954 01:00:14,985 --> 01:00:18,820 eigenvalue is lambda sub l. 955 01:00:18,820 --> 01:00:25,170 It also is a left eigenvector of P to the n with eigenvalue 956 01:00:25,170 --> 01:00:26,370 lambda to the n. 957 01:00:26,370 --> 01:00:29,120 That's what we've shown before. 958 01:00:29,120 --> 01:00:33,070 I mean, you can multiply this matrix P, and all you're doing 959 01:00:33,070 --> 01:00:37,710 is just taking powers of the eigenvalue. 960 01:00:37,710 --> 01:00:43,160 So if we start out with lambda to the n, let's forget about 961 01:00:43,160 --> 01:00:46,290 the l's, because we're just looking at a fixed l now. 962 01:00:46,290 --> 01:00:54,870 Lambda to the nth power times the j-th component of pi is 963 01:00:54,870 --> 01:01:04,900 equal to the sum over i of the i-th component of pi times Pij 964 01:01:04,900 --> 01:01:06,640 to the n, for all j. 965 01:01:11,080 --> 01:01:14,430 Now I take the magnitude of everything as before. 966 01:01:14,430 --> 01:01:17,510 The magnitude of this is, again, less than or equal to 967 01:01:17,510 --> 01:01:19,380 the magnitude of this. 968 01:01:19,380 --> 01:01:25,510 I want to let beta be the largest of these quantities. 969 01:01:25,510 --> 01:01:32,240 And when I put that maximizing j in here, the magnitude of lambda to the n 970 01:01:32,240 --> 01:01:40,550 times beta is less than or equal to the sum over i of-- 971 01:01:40,550 --> 01:01:43,810 I can upper-bound these by beta. 972 01:01:43,810 --> 01:01:47,340 So I wind up with the magnitude of lambda to the n times beta is less than 973 01:01:47,340 --> 01:01:51,800 or equal to the sum over i of beta times Pij to the n. 974 01:01:51,800 --> 01:01:54,680 I don't know what these powers are, but they're certainly 975 01:01:54,680 --> 01:01:57,340 less than or equal to 1. 976 01:01:57,340 --> 01:02:03,920 So the magnitude of lambda to the n is less than or equal to M, the number of states. 977 01:02:03,920 --> 01:02:05,260 That's what this said. 978 01:02:05,260 --> 01:02:14,680 When you take the magnitude of the l-th eigenvalue to the nth power, it's 979 01:02:14,680 --> 01:02:17,210 less than or equal to this number M. 980 01:02:17,210 --> 01:02:22,310 Now, if this magnitude were larger than 1, if it was 1 981 01:02:22,310 --> 01:02:27,300 plus 10 to the minus sixth, and you raised it to a 982 01:02:27,300 --> 01:02:31,410 large enough power n, then this would grow to be 983 01:02:31,410 --> 01:02:33,330 arbitrarily large. 984 01:02:33,330 --> 01:02:36,880 It can't grow to be arbitrarily large, therefore 985 01:02:36,880 --> 01:02:39,890 the magnitude of lambda sub l has to be less 986 01:02:39,890 --> 01:02:41,880 than or equal to 1. 987 01:02:41,880 --> 01:02:48,980 Tedious proof, but unfortunately, the notes just 988 01:02:48,980 --> 01:02:50,230 assume this. 989 01:02:53,630 --> 01:02:56,610 Maybe I had some good, simple reason for it before. 990 01:02:56,610 --> 01:02:59,600 I don't have any now, so I have to go through a proof. 991 01:02:59,600 --> 01:03:04,440 Anyway, these two theorems, if you look at them, are valid 992 01:03:04,440 --> 01:03:06,880 for all finite-state Markov chains. 993 01:03:06,880 --> 01:03:12,190 There was no place that we used the fact that we had 994 01:03:12,190 --> 01:03:14,790 anything with distinct eigenvalues or anything. 995 01:03:14,790 --> 01:03:20,840 But now, when we have distinct eigenvalues, the nth 996 01:03:20,840 --> 01:03:28,500 power of P is the sum here again over right eigenvectors 997 01:03:28,500 --> 01:03:32,600 times left eigenvectors.
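Concretely, that sum can be checked numerically. Here is a minimal sketch, assuming numpy and the same assumed 3-state chain as in the earlier sketch; the columns of u are right eigenvectors, the rows of u inverse are the scaled left eigenvectors, and P to the n is rebuilt one rank-one term at a time.

```python
import numpy as np

# A sketch of P^n = sum_i lambda_i^n (nu^i)(pi^i), for an assumed
# 3-state chain with distinct eigenvalues.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

evals, U = np.linalg.eig(P)   # columns of U: right eigenvectors nu^i
V = np.linalg.inv(U)          # rows of V: scaled left eigenvectors pi^i

n = 15
# Rebuild P^n as a sum of rank-one matrices nu^i pi^i, each weighted
# by the nth power of its eigenvalue.
Pn = sum(evals[i]**n * np.outer(U[:, i], V[i, :]) for i in range(3))
print(np.allclose(np.real(Pn), np.linalg.matrix_power(P, n)))  # True
```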
998 01:03:32,600 --> 01:03:35,340 When you take a right eigenvector, which is a column 999 01:03:35,340 --> 01:03:39,720 vector, times a left eigenvector, which is a row 1000 01:03:39,720 --> 01:03:44,080 vector, you get an M by M matrix. 1001 01:03:44,080 --> 01:03:47,330 I don't know what that matrix is, but it's a matrix. 1002 01:03:47,330 --> 01:03:50,980 It's a fixed matrix independent of n. 1003 01:03:50,980 --> 01:03:53,390 And the only thing that's varying with n is these 1004 01:03:53,390 --> 01:03:56,000 eigenvalues. 1005 01:03:56,000 --> 01:03:59,220 These quantities are less than or equal to 1. 1006 01:03:59,220 --> 01:04:03,270 So if the chain is an ergodic unit chain, we've already seen 1007 01:04:03,270 --> 01:04:07,250 that one eigenvalue is 1, and the rest of the eigenvalues 1008 01:04:07,250 --> 01:04:09,260 are strictly less than 1 in magnitude. 1009 01:04:09,260 --> 01:04:13,600 We saw that by showing that for an ergodic unit chain, P 1010 01:04:13,600 --> 01:04:16,060 to the n converged. 1011 01:04:16,060 --> 01:04:21,280 So the rate at which P to the n approaches e times pi is 1012 01:04:21,280 --> 01:04:24,190 going to be determined by the second-largest 1013 01:04:24,190 --> 01:04:27,710 eigenvalue in here. 1014 01:04:27,710 --> 01:04:31,180 And that second-largest eigenvalue is going to be less 1015 01:04:31,180 --> 01:04:33,880 than 1, strictly less than 1. 1016 01:04:33,880 --> 01:04:35,170 We don't know what it is. 1017 01:04:35,170 --> 01:04:39,040 Before, we knew this convergence here for an 1018 01:04:39,040 --> 01:04:43,050 ergodic unit chain is exponential. 1019 01:04:43,050 --> 01:04:45,380 Now we know that it's exponential and we know 1020 01:04:45,380 --> 01:04:48,740 exactly how fast it goes, because the speed of 1021 01:04:48,740 --> 01:04:52,480 convergence is just the second-largest eigenvalue. 1022 01:04:52,480 --> 01:04:58,530 If you want to know how fast P to the n approaches e times 1023 01:04:58,530 --> 01:05:02,170 the steady state vector pi, all you have to do is find 1024 01:05:02,170 --> 01:05:05,220 that second-largest eigenvalue, and that tells you 1025 01:05:05,220 --> 01:05:09,560 how fast the convergence is, except for calculating these 1026 01:05:09,560 --> 01:05:11,015 things, which are just fixed. 1027 01:05:13,580 --> 01:05:19,200 If P is a periodic unit chain with period d, then if you 1028 01:05:19,200 --> 01:05:20,110 read the notes-- 1029 01:05:20,110 --> 01:05:22,110 you should read the notes-- 1030 01:05:22,110 --> 01:05:24,420 there are d eigenvalues equally spaced 1031 01:05:24,420 --> 01:05:26,160 around the unit circle. 1032 01:05:26,160 --> 01:05:28,470 P to the n doesn't converge. 1033 01:05:28,470 --> 01:05:33,040 The only thing you can say here is, what happens if you 1034 01:05:33,040 --> 01:05:37,280 look at P to the d-th power? 1035 01:05:37,280 --> 01:05:39,890 And you can imagine what happens if you look at P to 1036 01:05:39,890 --> 01:05:44,070 the d-th power without doing any analysis. 1037 01:05:44,070 --> 01:05:49,290 I mean, we know that what happens in a periodic chain is 1038 01:05:49,290 --> 01:05:53,170 that you rotate from one set of states to another set of 1039 01:05:53,170 --> 01:05:56,220 states to another set of states to another set of 1040 01:05:56,220 --> 01:05:57,910 states, and then back to the set of 1041 01:05:57,910 --> 01:05:59,110 states you started with. 1042 01:05:59,110 --> 01:06:01,220 And you keep rotating around. 
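To see that rotation concretely, here is a sketch with an assumed 4-state chain of period d = 2, where states {0, 1} always move to {2, 3} and back; it also previews the point coming next, that P to the d splits into ergodic subclasses.

```python
import numpy as np

# An assumed period-2 chain: {0, 1} always moves to {2, 3} and back.
P = np.array([[0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.3, 0.7],
              [0.6, 0.4, 0.0, 0.0],
              [0.2, 0.8, 0.0, 0.0]])

# d = 2 eigenvalues sit on the unit circle here: +1 and -1.
print(np.sort(np.abs(np.linalg.eigvals(P))))

# P^2 is block diagonal: each of {0, 1} and {2, 3} is closed under
# two-step transitions, so P^(2n) converges while P^n keeps rotating.
print(np.linalg.matrix_power(P, 2))
print(np.linalg.matrix_power(P, 40))  # approximately the limit of P^(2n)
print(np.linalg.matrix_power(P, 41))  # the rotated version, one step later
```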
1043 01:06:01,220 --> 01:06:05,520 Now, there are d sets of states going around here. 1044 01:06:05,520 --> 01:06:08,860 What happens if I take P to the d? 1045 01:06:08,860 --> 01:06:12,320 P to the d is looking at the d-step transitions. 1046 01:06:12,320 --> 01:06:16,840 So it's looking at, if you start here, after d steps, 1047 01:06:16,840 --> 01:06:19,310 you're back here again, after d steps, 1048 01:06:19,310 --> 01:06:20,960 you're back here again. 1049 01:06:20,960 --> 01:06:31,700 So the matrix, P to the d, is in fact the matrix of d 1050 01:06:31,700 --> 01:06:35,180 ergodic subclasses. 1051 01:06:37,940 --> 01:06:41,090 And for each one of them, whatever subclass you start 1052 01:06:41,090 --> 01:06:44,200 in, you stay in that subclass forever. 1053 01:06:44,200 --> 01:06:49,130 So the analysis of a periodic unit chain, really the classy 1054 01:06:49,130 --> 01:06:52,730 way to do it is to look at P to the d and see 1055 01:06:52,730 --> 01:06:54,980 what happens there. 1056 01:06:54,980 --> 01:06:58,800 And you see that you get convergence within each 1057 01:06:58,800 --> 01:07:03,030 subclass, but you just keep rotating among subclasses. 1058 01:07:03,030 --> 01:07:06,060 So there's nothing very fancy going on there. 1059 01:07:06,060 --> 01:07:09,860 You just rotate from one subclass to another. 1060 01:07:09,860 --> 01:07:12,350 And that's the way it is. 1061 01:07:12,350 --> 01:07:14,720 And P to the n doesn't converge. 1062 01:07:14,720 --> 01:07:18,495 But P to the d times n does converge. 1063 01:07:22,900 --> 01:07:30,050 Now, let's look at the next-most complicated case. 1064 01:07:30,050 --> 01:07:34,460 Suppose we have M states and we have M independent 1065 01:07:34,460 --> 01:07:35,180 eigenvectors. 1066 01:07:35,180 --> 01:07:38,650 OK, remember I told you that there was a very ugly thing in 1067 01:07:38,650 --> 01:07:43,140 linear algebra that said, when you had an eigenvalue of 1068 01:07:43,140 --> 01:07:50,590 multiplicity k, you might not have k linearly independent 1069 01:07:50,590 --> 01:07:51,060 eigenvectors. 1070 01:07:51,060 --> 01:07:52,720 You might have a smaller number of them. 1071 01:07:52,720 --> 01:07:55,070 We'll look at an example of that later. 1072 01:07:55,070 --> 01:07:58,730 But here, I'm saying, let's forget about that case, 1073 01:07:58,730 --> 01:08:00,760 because it's ugly. 1074 01:08:00,760 --> 01:08:04,640 Let's assume that whatever multiplicity each of these 1075 01:08:04,640 --> 01:08:08,950 eigenvalues has, if you have an eigenvalue with 1076 01:08:08,950 --> 01:08:15,010 multiplicity k, then you have k linearly independent right 1077 01:08:15,010 --> 01:08:19,279 eigenvectors and k linearly independent left eigenvectors 1078 01:08:19,279 --> 01:08:20,960 to correspond to that. 1079 01:08:20,960 --> 01:08:26,420 And then when you add up all of the eigenvectors, you have 1080 01:08:26,420 --> 01:08:30,020 M linearly independent eigenvectors. 1081 01:08:30,020 --> 01:08:36,029 And what happens when you have M linearly independent vectors 1082 01:08:36,029 --> 01:08:39,710 in a space of dimension M? 1083 01:08:39,710 --> 01:08:42,649 If you have M linearly independent vectors in a space 1084 01:08:42,649 --> 01:08:48,880 of dimension M, you span the whole space, which says that 1085 01:08:48,880 --> 01:08:53,490 the matrix of these eigenvectors is in fact 1086 01:08:53,490 --> 01:08:56,920 non-singular, which says, again, we can do all of the 1087 01:08:56,920 --> 01:08:58,700 stuff we did before.
1088 01:08:58,700 --> 01:09:01,830 There's a little bit of a trick in showing that the left 1089 01:09:01,830 --> 01:09:04,460 eigenvectors and the right eigenvectors can be made 1090 01:09:04,460 --> 01:09:06,490 orthogonal. 1091 01:09:06,490 --> 01:09:10,359 But aside from that, P to the n is again 1092 01:09:10,359 --> 01:09:13,960 equal to the same form. 1093 01:09:13,960 --> 01:09:23,550 And what this form says is, if all of the eigenvalues except 1094 01:09:23,550 --> 01:09:27,250 one are less than 1 in magnitude, then you're again going to approach 1095 01:09:27,250 --> 01:09:28,649 steady state. 1096 01:09:28,649 --> 01:09:29,899 What does that mean? 1097 01:09:32,870 --> 01:09:39,729 Suppose I have more than one ergodic chain, more than one 1098 01:09:39,729 --> 01:09:44,350 ergodic class, or suppose I have a periodic class or 1099 01:09:44,350 --> 01:09:45,130 something else. 1100 01:09:45,130 --> 01:09:49,399 Is it possible to have one eigenvalue equal to 1 and all 1101 01:09:49,399 --> 01:09:52,040 the other eigenvalues be smaller? 1102 01:09:52,040 --> 01:09:55,670 If there's one eigenvalue that's equal to 1, according 1103 01:09:55,670 --> 01:09:59,740 to this formula here, eventually P to the n 1104 01:09:59,740 --> 01:10:05,090 converges to the one term with eigenvalue equal to 1. 1105 01:10:05,090 --> 01:10:09,290 And the right eigenvector can be taken as e. 1106 01:10:09,290 --> 01:10:13,230 The left eigenvector can be taken as the steady state vector pi. 1107 01:10:13,230 --> 01:10:16,250 And we have the case of convergence. 1108 01:10:16,250 --> 01:10:20,830 Can you have convergence to all the rows being the same if 1109 01:10:20,830 --> 01:10:24,830 you have multiple ergodic classes? 1110 01:10:24,830 --> 01:10:25,900 No. 1111 01:10:25,900 --> 01:10:28,820 If you have multiple ergodic classes and you start out in 1112 01:10:28,820 --> 01:10:30,040 one class, you stay there. 1113 01:10:30,040 --> 01:10:32,350 You can't get out of it. 1114 01:10:32,350 --> 01:10:35,190 If you have a periodic class and you start out in that 1115 01:10:35,190 --> 01:10:39,120 periodic class, you can't have convergence there. 1116 01:10:39,120 --> 01:10:47,100 So in this situation here, where all the eigenvalues are 1117 01:10:47,100 --> 01:10:51,180 distinct, you can only have one eigenvalue equal to 1. 1118 01:10:51,180 --> 01:10:55,270 Here, when we're going to this more general case, we might 1119 01:10:55,270 --> 01:10:58,470 have more than one eigenvalue equal to 1. 1120 01:10:58,470 --> 01:11:02,960 But if in fact we only have one eigenvalue equal to 1, and 1121 01:11:02,960 --> 01:11:06,440 all the others are strictly smaller in magnitude, then in 1122 01:11:06,440 --> 01:11:09,620 fact you're just talking about this case of an ergodic unit 1123 01:11:09,620 --> 01:11:10,505 chain again. 1124 01:11:10,505 --> 01:11:14,490 It's the only place you can be. 1125 01:11:14,490 --> 01:11:19,350 So let's look at an example of this. 1126 01:11:19,350 --> 01:11:23,050 Suppose you have a Markov chain which has l 1127 01:11:23,050 --> 01:11:26,610 ergodic sets of states. 1128 01:11:26,610 --> 01:11:29,420 You have one set of states. 1129 01:11:40,990 --> 01:11:47,610 So we have one set of states over here, which will all go 1130 01:11:47,610 --> 01:11:50,480 back and forth to each other. 1131 01:11:50,480 --> 01:11:52,850 Then another set of states over here. 1132 01:11:58,260 --> 01:12:03,840 Let's let l equal 2 in this case. 1133 01:12:03,840 --> 01:12:05,945 So what happens in this situation?
1134 01:12:16,840 --> 01:12:18,660 We'll have to work quickly before it gets up. 1135 01:12:25,400 --> 01:12:29,860 Anybody with any sense, faced with a Markov chain like this, 1136 01:12:29,860 --> 01:12:32,800 would say if we start here, we're going to stay here, if 1137 01:12:32,800 --> 01:12:35,020 we start here, we're going to stay here. 1138 01:12:35,020 --> 01:12:37,150 Let's just analyze this first. 1139 01:12:37,150 --> 01:12:39,390 And then after we're done analyzing this, 1140 01:12:39,390 --> 01:12:40,960 we'll analyze this. 1141 01:12:40,960 --> 01:12:43,160 And then we'll put the whole thing together. 1142 01:12:43,160 --> 01:12:48,180 And what we will find is a transition matrix 1143 01:12:48,180 --> 01:12:49,510 which looks like this. 1144 01:12:54,540 --> 01:12:56,420 And if you're here, you stay here. 1145 01:12:56,420 --> 01:12:57,990 If you're here, you stay here. 1146 01:12:57,990 --> 01:13:01,630 We can find the eigenvalues and eigenvectors of this. 1147 01:13:01,630 --> 01:13:05,030 We can find the eigenvalues and eigenvectors of this. 1148 01:13:05,030 --> 01:13:08,530 If you look at this crazy formula for finding 1149 01:13:08,530 --> 01:13:12,940 determinants, what you're stuck with is permutations 1150 01:13:12,940 --> 01:13:16,500 within here times permutations within here. 1151 01:13:16,500 --> 01:13:20,490 So the determinant that you wind up with is the product of 1152 01:13:20,490 --> 01:13:21,960 the two block determinants. 1153 01:13:21,960 --> 01:13:29,970 So any eigenvalue here is an eigenvalue of the whole thing. 1154 01:13:29,970 --> 01:13:32,715 Any eigenvalue here is an eigenvalue of the whole thing. 1155 01:13:32,715 --> 01:13:36,120 And we just look at the sum of the number of eigenvalues here 1156 01:13:36,120 --> 01:13:37,300 and the number there. 1157 01:13:37,300 --> 01:13:40,490 So we have a very boring case here. 1158 01:13:40,490 --> 01:13:44,750 Each ergodic set has an eigenvalue equal to 1, and has a 1159 01:13:44,750 --> 01:13:47,580 right eigenvector equal to 1 1160 01:13:47,580 --> 01:13:53,090 on the states of that set and 0 elsewhere. 1161 01:13:53,090 --> 01:13:56,290 There's also a steady state vector on that set of states. 1162 01:13:56,290 --> 01:13:58,120 We've already seen that. 1163 01:13:58,120 --> 01:14:03,940 So P to the n converges to a block diagonal matrix, where 1164 01:14:03,940 --> 01:14:08,270 for each ergodic set, the rows within that set are the same. 1165 01:14:08,270 --> 01:14:21,400 So P to the n then is pi 1, pi 1. 1166 01:14:21,400 --> 01:14:27,095 And then here, we have pi 2, pi 2, pi 2. 1167 01:14:29,610 --> 01:14:34,000 So that's all that can happen here. 1168 01:14:34,000 --> 01:14:35,250 This is the limit. 1169 01:14:42,090 --> 01:14:47,220 So one message of this is that, after you understand 1170 01:14:47,220 --> 01:14:51,740 ergodic unit chains, you understand almost everything. 1171 01:14:51,740 --> 01:14:55,310 You still have to worry about periodic unit chains. 1172 01:14:55,310 --> 01:14:58,220 But you just take a power of them, and then you have 1173 01:14:58,220 --> 01:15:00,400 ergodic sets of states. 1174 01:15:04,650 --> 01:15:07,250 One final thing. 1175 01:15:07,250 --> 01:15:09,640 Good, I have five minutes to talk about this. 1176 01:15:09,640 --> 01:15:12,480 I don't want any more time to talk about it, because I'll 1177 01:15:12,480 --> 01:15:15,490 get terribly confused if I do.
1178 01:15:15,490 --> 01:15:21,030 And it's a topic which, if you want to read more about it, 1179 01:15:21,030 --> 01:15:24,610 read about it in Strang. 1180 01:15:24,610 --> 01:15:27,010 He obviously doesn't like the topic either. 1181 01:15:27,010 --> 01:15:28,710 Nobody likes the topic. 1182 01:15:28,710 --> 01:15:33,320 Strang at least was driven to say something clear about it. 1183 01:15:33,320 --> 01:15:36,330 Most people don't even bother to say 1184 01:15:36,330 --> 01:15:38,260 something clear about it. 1185 01:15:38,260 --> 01:15:42,190 There's a theorem, due to, I guess, Jordan, because it's 1186 01:15:42,190 --> 01:15:45,320 called a Jordan form. 1187 01:15:45,320 --> 01:15:51,210 And what Jordan said is, in the nice cases we talked 1188 01:15:51,210 --> 01:15:57,860 about, you have this decomposition of the 1189 01:15:57,860 --> 01:16:04,090 transition matrix P into a matrix here whose columns are 1190 01:16:04,090 --> 01:16:09,480 the right eigenvectors times a matrix here, which is a 1191 01:16:09,480 --> 01:16:13,140 diagonal matrix with the eigenvalues along it. 1192 01:16:13,140 --> 01:16:19,980 And this, finally, is a matrix which is the inverse of this, 1193 01:16:19,980 --> 01:16:24,200 and whose rows, properly normalized, are the left 1194 01:16:24,200 --> 01:16:33,400 eigenvectors of P. And you can replace this form by what's 1195 01:16:33,400 --> 01:16:39,040 called a Jordan form, where P is equal to some matrix u 1196 01:16:39,040 --> 01:16:45,720 times the Jordan form matrix j times the inverse of u. 1197 01:16:45,720 --> 01:16:49,870 Now, u is no longer the right eigenvectors. 1198 01:16:49,870 --> 01:16:52,480 It can't be the right eigenvectors, because when we 1199 01:16:52,480 --> 01:16:56,090 need the Jordan form, we don't have enough right eigenvectors 1200 01:16:56,090 --> 01:16:58,030 to span the space. 1201 01:16:58,030 --> 01:17:00,910 So it has to be something else. 1202 01:17:00,910 --> 01:17:04,450 And like everyone else, we say, I don't care 1203 01:17:04,450 --> 01:17:06,320 what that matrix is. 1204 01:17:06,320 --> 01:17:09,940 Jordan proved that there is such a matrix, and that's all 1205 01:17:09,940 --> 01:17:11,270 we want to know. 1206 01:17:11,270 --> 01:17:17,230 The important thing is that this matrix j in here is as 1207 01:17:17,230 --> 01:17:19,860 close to diagonal as you can get. 1208 01:17:19,860 --> 01:17:25,400 It's a matrix which, along the main diagonal, has all the 1209 01:17:25,400 --> 01:17:28,310 eigenvalues with their appropriate multiplicities. 1210 01:17:28,310 --> 01:17:31,670 Namely, lambda 1 is an eigenvalue with 1211 01:17:31,670 --> 01:17:33,550 multiplicity 2. 1212 01:17:33,550 --> 01:17:38,700 Lambda 2 is an eigenvalue of multiplicity 3. 1213 01:17:38,700 --> 01:17:43,210 And in this situation, you have two eigenvectors here, so 1214 01:17:43,210 --> 01:17:46,180 nothing appears up there. 1215 01:17:46,180 --> 01:17:53,530 With this multiplicity 3 eigenvalue, there are only two 1216 01:17:53,530 --> 01:17:56,370 linearly independent eigenvectors. 1217 01:17:56,370 --> 01:18:00,640 And therefore Jordan says, why don't we stick a 1 in here and 1218 01:18:00,640 --> 01:18:03,270 then solve everything else? 1219 01:18:03,270 --> 01:18:08,770 And his theorem says, if you do that, it in fact works.
1220 01:18:08,770 --> 01:18:11,190 So every time-- 1221 01:18:11,190 --> 01:18:17,850 well, the eigenvalues are on the main diagonal, and the 1s are on the 1222 01:18:17,850 --> 01:18:22,190 next diagonal up. The only places where anything can be nonzero 1223 01:18:22,190 --> 01:18:25,850 in this form are on the main diagonal and on the next 1224 01:18:25,850 --> 01:18:29,400 diagonal up, where you occasionally have a 1. 1225 01:18:29,400 --> 01:18:33,420 And each 1 is there to stand in for a deficient 1226 01:18:33,420 --> 01:18:33,900 eigenvector. 1227 01:18:33,900 --> 01:18:37,230 So every time you have a deficient eigenvector, you 1228 01:18:37,230 --> 01:18:39,260 have some 1 appearing there. 1229 01:18:39,260 --> 01:18:40,960 And then there's a way to solve for u. 1230 01:18:40,960 --> 01:18:44,650 And I don't have any idea what it is, and I don't care. 1231 01:18:44,650 --> 01:18:49,390 But if you get interested in it, I think that's wonderful. 1232 01:18:49,390 --> 01:18:53,075 But please don't tell me about it. 1233 01:18:59,250 --> 01:19:04,400 A nice example of this is this matrix here. 1234 01:19:04,400 --> 01:19:10,160 What happens if you try to take the determinant of P 1235 01:19:10,160 --> 01:19:11,835 minus lambda I? 1236 01:19:11,835 --> 01:19:16,850 Well, you have 1/2 minus lambda, 1/2 minus lambda, 1 1237 01:19:16,850 --> 01:19:19,250 minus lambda. 1238 01:19:19,250 --> 01:19:25,180 What are all the permutations here that you can take? 1239 01:19:25,180 --> 01:19:29,200 There's the permutation of the main diagonal itself. 1240 01:19:29,200 --> 01:19:33,380 If I try to include that element, there's nothing I can 1241 01:19:33,380 --> 01:19:35,880 do but have some element down here. 1242 01:19:35,880 --> 01:19:37,150 And all these elements are 0. 1243 01:19:39,870 --> 01:19:43,480 So those elements don't contribute to a 1244 01:19:43,480 --> 01:19:45,140 determinant at all. 1245 01:19:45,140 --> 01:19:49,100 So I have one eigenvalue which is equal to 1. 1246 01:19:49,100 --> 01:19:53,020 I have an eigenvalue of multiplicity 2, 1247 01:19:53,020 --> 01:19:54,600 which is 1/2. 1248 01:19:54,600 --> 01:19:58,070 If you try to find the eigenvector here, you find 1249 01:19:58,070 --> 01:19:59,930 there is only one. 1250 01:19:59,930 --> 01:20:03,700 So in fact, this corresponds to a Jordan form, 1251 01:20:03,700 --> 01:20:07,180 where you have 1/2, 1/2, 1252 01:20:15,300 --> 01:20:22,010 and 1 on the main diagonal, a 1 just above the repeated 1/2, and 0 everywhere else. 1253 01:20:29,650 --> 01:20:37,110 And now if I want to find P to the n, I have u times this j 1254 01:20:37,110 --> 01:20:39,320 times u to the minus 1, times u times j times u to the minus 1, and so on. 1255 01:20:39,320 --> 01:20:42,140 All the u's in the middle cancel out, so I wind up 1256 01:20:42,140 --> 01:20:46,640 eventually with u times j to the nth power times u 1257 01:20:46,640 --> 01:20:48,020 to the minus 1. 1258 01:20:48,020 --> 01:20:49,490 What is j to the nth power? 1259 01:20:49,490 --> 01:20:56,260 What happens if I multiply this matrix by itself n times? 1260 01:20:56,260 --> 01:20:59,970 Well, it turns out that what happens is that on this main 1261 01:20:59,970 --> 01:21:03,880 diagonal here, you wind up with a 1/4 and 1262 01:21:03,880 --> 01:21:06,350 then a 1/8 and so forth. 1263 01:21:06,350 --> 01:21:13,190 This term here, it goes down exponentially.
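For what it's worth, a symbolic package will produce this decomposition for you; here is a minimal sketch assuming Python with sympy, applied to the 3-by-3 matrix just described.

```python
from sympy import Matrix, Rational

# The example transition matrix: eigenvalue 1/2 has multiplicity 2
# but only one independent eigenvector, so P is not diagonalizable.
half = Rational(1, 2)
P = Matrix([[half, half, 0],
            [0, half, half],
            [0, 0, 1]])

U, J = P.jordan_form()   # P equals U * J * U**-1
# J has one 2-by-2 Jordan block for eigenvalue 1/2 (with a 1 above
# the diagonal) and one 1-by-1 block for eigenvalue 1.
print(J)
```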
1264 01:21:13,190 --> 01:21:24,270 Well, if you multiply this by itself, eventually-- you can 1265 01:21:24,270 --> 01:21:27,920 see what's going on here more easily if you draw the Markov 1266 01:21:27,920 --> 01:21:29,160 chain for it. 1267 01:21:29,160 --> 01:21:34,590 You have state 1, state 2, and state 3. 1268 01:21:34,590 --> 01:21:40,680 State 1, there's a transition 1/2 and a transition 1/2. 1269 01:21:40,680 --> 01:21:47,530 State 2, there's a transition 1/2 and a transition 1/2. And 1270 01:21:47,530 --> 01:21:50,810 state 3, you just stay there. 1271 01:21:50,810 --> 01:21:53,600 So the amount of time that it takes you to get to steady 1272 01:21:53,600 --> 01:21:56,820 state is the amount of time it takes you-- 1273 01:21:56,820 --> 01:21:58,690 you start in state 1. 1274 01:21:58,690 --> 01:22:01,930 You've got to make this transition eventually, and 1275 01:22:01,930 --> 01:22:05,690 then you've got to make this transition eventually. 1276 01:22:05,690 --> 01:22:08,800 And the amount of time that it takes you to do that is the 1277 01:22:08,800 --> 01:22:12,170 sum of the amount of time it takes you to go there, plus 1278 01:22:12,170 --> 01:22:15,220 the amount of time that it takes to go there. 1279 01:22:15,220 --> 01:22:16,960 So you have two random variables. 1280 01:22:16,960 --> 01:22:19,400 One is the time to go here. 1281 01:22:19,400 --> 01:22:22,320 The other is the time to go here. 1282 01:22:22,320 --> 01:22:24,590 Both of those are geometric 1283 01:22:24,590 --> 01:22:25,960 random variables. 1284 01:22:25,960 --> 01:22:30,470 When we convolve those things with each other, what we get 1285 01:22:30,470 --> 01:22:31,940 is an extra term n. 1286 01:22:31,940 --> 01:22:40,070 So we get an n times 1/2 to the n. 1287 01:22:40,070 --> 01:22:43,200 So the thing which is different in the Jordan form 1288 01:22:43,200 --> 01:22:47,200 is, instead of having an eigenvalue to the nth power, 1289 01:22:47,200 --> 01:22:50,840 you have an eigenvalue to the nth power times n-- 1290 01:22:50,840 --> 01:22:54,610 if there's only a single 1 there, there's an n there. 1291 01:22:54,610 --> 01:22:58,450 If there are two 1s both together, you get an n times n 1292 01:22:58,450 --> 01:23:00,230 minus 1, and so forth. 1293 01:23:00,230 --> 01:23:04,690 So worst case, you've got a polynomial in n 1294 01:23:04,690 --> 01:23:07,020 times an eigenvalue to the nth power. 1295 01:23:07,020 --> 01:23:10,130 For all practical purposes, this is still the eigenvalue 1296 01:23:10,130 --> 01:23:11,950 going down exponentially. 1297 01:23:11,950 --> 01:23:17,090 So for all practical purposes, what you wind up with is the 1298 01:23:17,090 --> 01:23:22,180 second-largest eigenvalue still determines how fast you 1299 01:23:22,180 --> 01:23:23,430 get convergence. 1300 01:23:26,490 --> 01:23:29,120 Sorry, I took eight minutes talking about the Jordan form. 1301 01:23:29,120 --> 01:23:32,020 I wanted to take five minutes talking about it. 1302 01:23:32,020 --> 01:23:34,030 You can read more about it in the notes.
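If you want to see that extra factor of n concretely, here is a quick numeric check, assuming Python with numpy and the transition matrix read off the chain just drawn (indices 0, 1, 2 standing for states 1, 2, 3).

```python
import numpy as np

# States 1 and 2 each move on with probability 1/2; state 3 absorbs.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])

for n in (5, 10, 20):
    Pn = np.linalg.matrix_power(P, n)
    # Entry (1,2) of P^n: the probability of sitting in state 2 at
    # time n, starting from state 1.  The defective eigenvalue 1/2
    # contributes n * (1/2)^n here, not just (1/2)^n.
    print(n, Pn[0, 1], n * 0.5**n)   # the two values agree (up to rounding)
```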