SPEAKER: The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation, or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: The AEP is probably one of the most difficult concepts we talk about in this course. It seems simple to start with. As I said before, it's one of those things where you think you understand it, and then you think you don't understand it. When Shannon first came out with this theory, there were a lot of very, very good professional mathematicians who spent a long time trying to understand it, and who, in fact, blew it, because they were trying to look at it strictly in terms of mathematics. They were looking for strict mathematical theorems; they weren't looking to get the insight from it. Because of that, they couldn't absorb it. There were a lot of engineers who looked at it and couldn't absorb it because they couldn't match it with any of the mathematics, and therefore they started thinking there was more there than there really was. So this is, in fact, tricky.

What we're looking at is a sequence of chance variables coming from a discrete memoryless source. In other words, a discrete memoryless source is something that spits out symbols, where each symbol is independent of each other symbol and each symbol has the same probability mass function as every other symbol. We said it's a neat thing to look at this log pmf random variable, and the log pmf random variable is minus the log of the probability of the particular symbol. So we have a sequence of random variables. And the expected value of each of those random variables is the expected value of the log pmf, which, as we said before, is this thing we've called the entropy, which we're trying to get some insight into. So we have this log pmf random variable, and we have the entropy, which is the expected value of it.
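To make that concrete, here is a minimal numerical sketch — the three-symbol pmf is an assumed toy example, not one from the lecture:

```python
import math

# Assumed toy pmf. The log pmf of a symbol x is -log2 p(x);
# the entropy H(X) is its expected value.
pmf = {'a': 0.5, 'b': 0.25, 'c': 0.25}

def log_pmf(x):
    """Sample value of the log pmf random variable: -log2 p(x)."""
    return -math.log2(pmf[x])

H = sum(p * -math.log2(p) for p in pmf.values())  # H(X) = E[-log2 p(X)]
print(log_pmf('a'), H)                            # 1.0 1.5
```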
We then talked about the sequence of these random variables. We talked about the sample average of them, and the whole reason you want to look at this random variable is what happens to the sample average of a lot of these log pmfs. Since logs add when the things you're taking the log of multiply, what you wind up with is that the sum of these log pmfs is, in fact, equal to minus the log of the probability of the whole sequence. In other words, you look at this whole sequence as one big, giant chance variable, which has M to the n different possible values — namely, every possible sequence of length n. When you're talking about source coding, you have to find a code word for each of those M to the n different sequences. So what you're doing here is trying to look at the probability of each of those M to the n sequences. We can then use Huffman coding, or whatever we choose to use, to try to encode those things.

OK, the weak law of large numbers applies here, and it bounds the probability that this sample average of the log pmf is far from the expected value of the log pmf: the probability that the difference is greater than or equal to epsilon is less than or equal to the variance of the log pmf random variable, divided by n times epsilon squared. Now, we are going to take the viewpoint that n epsilon squared is very small — excuse me, very large. We're thinking of epsilon as a small number and of n as a large number, but the game we always play here is to first pick some epsilon, as small as you want to make it, and then make n larger and larger. And as n gets bigger and bigger, eventually this bound gets very small. So that's the game that we're playing. That's why we're thinking of n times epsilon squared as a large number. So what this says is that the probability that the sample value — namely, the log pmf of the entire sequence divided by n — fails to be close to H of X is less than or equal to this. We then define the typical set, T sub epsilon of n. This is the set of all typical sequences out of the source.
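In code, the definition is one line — a sketch with the same assumed toy pmf as above:

```python
import math

pmf = {'a': 0.5, 'b': 0.25, 'c': 0.25}            # assumed toy pmf
H = sum(-p * math.log2(p) for p in pmf.values())  # entropy H(X)

def is_typical(seq, eps):
    """True if seq is in T_eps^n, i.e. |-(1/n) log2 p(seq) - H(X)| < eps."""
    avg = sum(-math.log2(pmf[x]) for x in seq) / len(seq)
    return abs(avg - H) < eps

print(is_typical('aabc' * 25, eps=0.1))           # True for this 100-symbol sequence
```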
So we're defining typical sequences in this way: the typical set is the set of all sequences whose sample average of the log pmf is within epsilon of the entropy. Well, that's what we put into the set, but the Chebyshev bound is about the complement of this set. Those are the exceptional things; these are the typical things. We're saying that the exceptional things have a small probability when you make n big enough. This is saying that the exceptional things don't amount to anything, and the typical things fill up the entire probability space. So what we're saying is that the probability that this sequence is actually typical is greater than or equal to 1 minus something small. It says that when n gets big enough, the probability that you get a typical sequence out of the source is going to one.

OK. We drew this in terms of a probability distribution, which I hope makes things look a little more straightforward. This is the distribution function of the sample average of this log pmf random variable. And what we're saying is that as n gets larger and larger — because the variance of the sample average shrinks; the sample average is always a random variable, while the average is not a random variable, it's a fixed number — this distribution function gets closer and closer to a stack. In other words, as n gets big, you go along here, nothing happens, suddenly you move up here, and suddenly you move across there. That's what happens in the limit: namely, the sample average is, at that point, always equal to the average. So as n goes to infinity, the typical set approaches probability 1. We expressed that in terms of the Chebyshev inequality in this way, but the picture says you can interpret it in any one of a hundred different ways. Because the real essence of this is not the Chebyshev inequality; the real essence is that as n gets bigger and bigger, this distribution function becomes a stack. So that's a nice way of thinking about it.
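You can watch the stack form in a small simulation — again a sketch with the assumed toy pmf, where H(X) = 1.5 bits:

```python
import math, random

pmf = {'a': 0.5, 'b': 0.25, 'c': 0.25}   # assumed toy pmf
symbols, probs = zip(*pmf.items())
H = sum(-p * math.log2(p) for p in probs)

def sample_average(n):
    """-(1/n) log2 p(X^n) for one random length-n sequence."""
    seq = random.choices(symbols, weights=probs, k=n)
    return sum(-math.log2(pmf[x]) for x in seq) / n

for n in (10, 100, 10000):
    trials = [sample_average(n) for _ in range(500)]
    frac = sum(abs(t - H) < 0.05 for t in trials) / len(trials)
    print(n, frac)   # the fraction inside (H - 0.05, H + 0.05) climbs toward 1
```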
Let's summarize what we did with that — all of the major results about typical sets. The first of them is this one. It's a bound on the number of elements in a typical set. I'm not going to rederive it, but it comes from the fact that all of the typical elements have roughly the same probability, and all of them collectively fill up the whole space, and therefore you can find the number of them by taking one and dividing by the probability of each of them. When you do that, you get two bounds. One of them says that this magnitude is less than 2 to the n times (H of X plus epsilon). The other one says it's greater than 2 to the n times (H of X minus epsilon) — and you have to throw in this little fudge factor here, because the typical set doesn't quite fill up the whole space.

What that all says is that when n gets large, the number of typical elements is approximately equal to 2 to the n times H of X. Again, think hard about what this means. This number here really isn't at all like 2 to the n H of X: this n times epsilon is going to be a big number, and as n gets bigger, it gets bigger and bigger. But at the same time, when you look at 2 to the n times (H of X plus epsilon), and you know that H of X is something substantial and that epsilon is very small, in some sense it still says this. And in source coding terms, it very much says this. Because in source coding terms, what you're always looking at is these exponents. If you have some number of different things you're trying to encode, it takes you the log of that number of bits to encode them. So when you take the log of this, you see that the number of extra bits you need to encode these sequences is on the order of n times epsilon. Which is some number of bits, but the real number of bits you're looking at is n times H of X. So that's the major term, and this is just fiddly.

The next thing we found is that the probability of an element in a typical set is between 2 to the minus n times (H of X plus epsilon) and 2 to the minus n times (H of X minus epsilon).
Which is saying almost the same thing as the first result. And again, there's this approximation which says this is about equal to 2 to the minus n H of X — an approximation in exactly the same sense that that one is an approximation. Finally, the last statement is that the probability that you get a typical sequence is greater than or equal to 1 minus this variance divided by n times epsilon squared. With the same kind of approximation, the probability that you get a typical sequence is about 1.

So what this is saying is that there are hardly any exceptions in terms of probability. There are a huge number of exceptions, but they're all extraordinarily small in probability. So most of the space is all tucked into this typical set. All of these typical sequences have about the same probability, and the number of typical sequences is about 2 to the n times H of X.

Does that say that the entropy has any significance? It sure does. Because the entropy is really what's determining everything about this distribution, as far as probabilities are concerned. There's nothing left beyond that. If you're going to look at very long sequences — and in source coding we want to look at long sequences — that's the story. So the entropy tells the story, despite what Huffman said. Huffman totally ignored the idea of entropy, and because of that he came up with the optimal algorithm; but his optimal algorithm didn't tell him anything about what was going on. This is not a knock on Huffman — I think his algorithm is one of the neatest things I've ever seen, because he came up with it out of nowhere. Just pure thought, which is nice.

We then started to talk about fixed-length to fixed-length source coding. This is not a practical thing. This is not something I would advise trying to do.
It's something which is conceptually useful, because it talks about anything you can do eventually, when you look at an almost infinite sequence of symbols from the source and ask how many bits it takes to represent them. Ultimately you need to turn that encoder on at some point in pre-history — and pre-history, in our present day, is about one year ago — and you have to turn it off sometime in the distant future, which is maybe six months from now. During that time you have to encode all these bits. So, in fact, when you get it all done and look at the overall picture, it's fixed length to fixed length. And all of these algorithms are just ways of doing the fixed length to fixed length without too much delay involved in them.

I didn't say everything there. What this typical set picture gives us is that you can achieve an expected number of bits per source symbol which is about H of X, with very rare failures. And if you try to achieve H of X minus epsilon bits per symbol, the interesting thing we found last time was that the fraction of sequences you can encode was zilch. In other words, there is a very rapid transition here. If you try to get by with too few bits per symbol, you die very, very quickly. It's not that your error probability is large — your error probability is asymptotically equal to 1. You always screw up. So that's the picture.

We want to go on to Markov sources. Let me explain why I want to do this first. When we're talking about discrete memoryless sources, it should be obvious that that's totally a toy problem. There aren't any sources I can imagine, whose output you would want to encode, where you can reasonably conclude that they were discrete and memoryless — namely, that each symbol was independent of each other symbol. The only possibility I can think of is where you're trying to report the results of gambling or something. And gambling is so dishonest that they probably aren't independent anyway.
You could use this as a way of showing they aren't independent, but it's not a very useful thing. So somehow we want to be able to talk about how you encode sources with memory. Well, Markov sources are the easiest kind of sources with memory. And they have the nice property that you can include as much statistics in them as you want to. You can make them include as much of the structure you can find as anything else will. So people talk about much more general classes of sources, but these are really sufficient — sufficient to talk about everything useful. Not necessarily the nicest way to think about useful things, but sufficient.

So, a finite-state Markov chain — I assume you're somewhat familiar with Markov chains from taking probability courses. If not, you should probably review them there, because the notes go through this pretty quickly. There's nothing terribly complicated there, but a finite-state Markov chain is a sequence of discrete chance variables; in that sense it's exactly like the discrete memoryless sources we were looking at. The letters come from some finite alphabet, so in that sense, too, it's like the discrete memoryless sources. But here the difference is that each letter depends on the letter before. Namely, before the Markov chain changes from one state to another state in one of these steps, it looks at the state it's in and decides where it's going to go next.

So we have a transition probability matrix — you can think of this as a matrix — which has a value for every s in the state space and for every s prime in the state space. It represents the probability that the state at time n is equal to this state s, given that the state at time n minus 1 is equal to the state s prime. So this tells you what the probabilities are of going from one state to another. The important thing here is that this single-step transition incorporates all of the statistical knowledge.
In other words, this is also equal to the probability that S n is equal to this state s, given that S n minus 1 is equal to the state s prime, and also that S n minus 2 is equal to any given state at time n minus 2, and all the way back to S zero being any old state at time zero. So it says that this source loses memory, except for the first thing back.

I like to think of this as a blind frog who has very good sensory perception, jumping around on lily pads. In other words, he can't see the lily pads; he can only sense the nearby lily pads. So he jumps from one lily pad to the next, and when he gets to the next lily pad, he then decides which lily pad he's going to go to next. That'll wake you up, anyway.

And we also want to define some initial pmf on the initial state. So you'd like to think of Markov chains as starting at some time and then proceeding forever into the future.

That seems to indicate that all we've done is to replace one trivial problem with another trivial problem. In other words, one-step memory is not enough to deal with things like English text — or any other language that you like. So the idea of a Markov source is that you create whatever set of states you want — you can have a very large state space — but you associate the output of the source not with the states, but with the transitions from one state to another.

So this gives an example of it; it's easier to explain the idea by simply looking at an example. In this particular example, which I think is the same as the one in the notes, what you're looking at is a memory of the two previous digits. So this is a binary source: it produces binary digits, either zero or 1. If the previous two digits were zero, zero, it says that the next digit is going to be a 1 with probability 0.1 and a zero with probability 0.9.
If you're in state zero, zero and the next thing that comes out is a zero, then at that point — namely, one step into the future — you have a zero as your last digit, a zero as the previous digit, and a zero as the digit before that. So your state has moved from (x n minus 2, x n minus 1) to (x n minus 1, x n). In other words, any time you make a transition, this digit here, which is the previous digit, always has to become that digit. You'll notice the same thing in all of these: when you go from here to there, the last digit becomes the first digit; when you go from here to here, the last digit becomes the first digit; and so forth. And that's a characteristic of this particular structure of having the memory represented by the last two digits.

So, about the kind of output you can get from this source — I mean, you see that it doesn't do anything very interesting. When you have two zeroes in the past, it tends to produce a lot more zeroes; it tends to get stuck with lots of zeroes. If you get a single 1, that takes you over into this state, and from there it can either go there or there. Once it gets here, it tends to produce a large number of 1's. So what's happening here is you have a Markov chain which goes from long sequences of zeroes, to transition regions where there are a bunch of zeroes and 1's, and finally gets trapped into either the all-zero state again or the all-1 state again, and moves on from there.

So the letter is, in this case, the source output. If you know the old state, the source output specifies the new state. If you know the old state and the new state, that specifies the letter. In other words, one of the curious things about this chain is that I've arranged it so that, since we have a binary output, there are only two possible transitions from each state. Which says that the initial state plus the sequence of source outputs specifies the state at every point in time, and the state at every point in time specifies the source sequence. In other words, the two are isomorphic to each other. One specifies the other.
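Here is a sketch of that source in code. Only the 0.9/0.1 probabilities for the all-zeroes state are given in the lecture; the values assumed for the other three states are for illustration only:

```python
import random

# P(next bit = 1 | state); the state is the previous two digits.
# 0.1 for state '00' is from the lecture; the other values are assumed.
P1 = {'00': 0.1, '01': 0.5, '10': 0.5, '11': 0.9}

def run(n, state='00'):
    out = []
    for _ in range(n):
        bit = '1' if random.random() < P1[state] else '0'
        out.append(bit)
        state = state[1] + bit   # the old last digit becomes the new first digit
    return ''.join(out)

print(run(60))   # long runs of 0's and 1's, with short transition regions
```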
Since one specifies the other, you can pretty much forget about the sequence and look at the state chain, if you want to. And therefore everything you know about Markov chains is useful here. Or if you like to think about the real source, you can think about the real source as producing these letters. Either way is fine, because either one specifies the other. When you don't have that property, and you look at a sequence of letters from such a source, what you have is called a partially specified Markov chain. And they're awful things to deal with. You can write theses about them, but you don't get any insight about them. There's hardly anything that you would like to be true which is true. They're just awful things to look at. So we won't look at them.

One of the nice things about engineering is you can create your own models. Mathematicians have to look at the crazy models that engineers suggest to them. They have no choice. That's their job. But as engineers, we can look at only the nice models. We can play, and the mathematicians have to work. So it's nicer to be an engineer, I think. Although famous mathematicians only look at the engineering problems that appeal to them, so in fact, when they become famous, the two groups come back together again. Because the good engineers are also good mathematicians, so they become sort of the same group.

These transitions — namely, the transition lines that we draw on a graph like this — always indicate positive probability. In other words, if there's zero probability of going from here to here, you don't clutter up the diagram by putting a line in. Which allows you to just look at these transitions to figure out what's going on.

One of the things that you learn when you study finite-state Markov chains is that a state s is accessible from some other state s prime if the graph has some path from s prime to s. In other words, it's not saying you can go there in one step.
It's saying that there's some way you can get there if you go long enough. Which means there's some probability of getting there. And the fact that there's a probability of getting there pretty much means that you're going to get there eventually. That's not an obvious statement, but let's see what it means here. This state is accessible from this state, in the sense that you can get from here to there by going over here and then going to here.

If we look at the states which are accessible from each other, you get some set of states, and if you're in that set of states, you can never get out of it. Therefore, every one of those states keeps recurring with positive probability, and you keep rotating back and forth among them in some way, but you never get out of them. In other words, a Markov chain which doesn't have this property would be the following Markov chain — that's the simplest one I can think of. If you start out in this state, you stay there. If you start out in this state, you stay there. That's not a very nice chain. Is this a decent model for an engineering study? No. Because when you're looking at engineering, the thing that you're interested in is something that happens over a long period of time. Back at time minus infinity you can decide whether you're here or whether you're here, and you might as well not worry a whole lot about what happened back at time minus infinity as far as building your model goes. You may as well just build a model for this, or build a model for that.

There's another thing here, which is periodicity. In some chains, you can go from this state to this state in — one, two steps. Or you can go there in one, two, three, four steps. Or you can go there in one, two, three steps, and so forth. Which says that the period of s is the greatest common divisor of the path lengths from s back to s again.
If that period is not equal to 1 — namely, if there's some periodic structure which says the only way you can get back to a state is by coming back every two steps, or every three steps, or something — then, again, it's not a very nice Markov chain. Because if you're modeling it, you might as well just model things two steps at a time instead of one step at a time.

So the upshot of that is that you define these nice Markov chains, which are aperiodic — which don't have any of this periodic structure — and in which every state is accessible from every other state. And you call them ergodic Markov chains. And what ergodic means, sort of, and as a more general principle, is that the probabilities of things are equal to the relative frequencies of things. Namely, if you look at a very long sequence of things out of a Markov chain, what you see in that very long sequence should be representative of the probabilities of that very long sequence. The probabilities of transitions at various times should be the same from one time to another. And that's what ergodicity means. It means that the thing is stationary: you look at it at one time, and it behaves the same way as at another time. It doesn't have any periodic structure, which would mean that if you look at it at even times, it behaves differently from looking at it at odd times. That's the kind of Markov chain you would think you would have, unless you look at these oddball examples of other things. Everything we do is going to be based on the idea of ergodic Markov chains, because they're the nicest models to use.

A Markov source, then, is a sequence of labeled transitions on an ergodic Markov chain. Those are the only things we want to look at. But that's general enough to do most of the things we want to do.

And once you have ergodic Markov chains, there are a lot of nice things that happen. Namely, you can try to solve this set of equations — that is, you suppose there is some pmf function which gives the relative frequency of a given state.
Namely, if I look at an enormous number of steps, I would like the state little s to come up with some relative frequency — all the time that I do it. That is, I wouldn't like to have one sample path which comes up with a relative frequency of 1/2, and another sample path of almost infinite length which comes up with a different relative frequency. Because that would mean that different sequences of states are not typical of the Markov chain. That's another way of looking at what ergodicity means. Without it, infinite-length sequences are not typical anymore: things depend on when they start and when they stop; they depend on whether you start at an even time or an odd time — all of these things that real sources shouldn't depend on.

So, if you have relative frequencies, then you should have those relative frequencies at time n and at time n minus 1. The probability of a particular state s, if the probabilities of the previous state s prime have these same values q of s prime, comes from the transition probabilities, Q of s given s prime. The sum over s prime of q of s prime times Q of s given s prime — what is that? That's the probability of s. In other words, if you start out at time n minus 1 with some pmf q on the states at time n minus 1, this is the formula you would use to calculate the pmf for the states at time n: it's the probability mass function for the states at the next unit of time. If this probability distribution is the same as this probability distribution, then you say that you're in steady state. Because if you do this again — you plug this into here — it's the same thing: you get the same answer at time n plus 1. You plug it in again, you get the same answer at time n plus 2, and so on forever. So you stay in steady state.

The question is: if you have a matrix here, can you solve this equation? Is it easy to solve?
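Numerically it's easy — you just grind the equation q(s) = sum over s prime of q(s prime) Q(s | s prime) away. A sketch, using the transition matrix of the running two-bit-memory example with the same assumed probabilities as before:

```python
import numpy as np

# Rows are the old state s', columns the new state s; order 00, 01, 10, 11.
# The 0.9/0.1 values for states 00 and 11 follow the lecture's example;
# the 0.5's are assumed.
Q = np.array([[0.9, 0.1, 0.0, 0.0],   # from 00: emit 0 -> 00, emit 1 -> 01
              [0.0, 0.0, 0.5, 0.5],   # from 01: emit 0 -> 10, emit 1 -> 11
              [0.5, 0.5, 0.0, 0.0],   # from 10
              [0.0, 0.0, 0.1, 0.9]])  # from 11

q = np.array([1.0, 0.0, 0.0, 0.0])    # start in a known state, say 00
for _ in range(1000):
    q = q @ Q                         # q_n = q_{n-1} Q
print(q.round(4))                     # -> [0.4167 0.0833 0.0833 0.4167]
```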
And what's the solution? There's a nice theorem that says that if the chain is ergodic — namely, if it has these nice properties of transition probabilities, which correspond to a particular kind of matrix here — then this vector-matrix equation has a unique probability-vector solution for this little q, in terms of this transition probability matrix. It also has the nice property that if you start out with any old distribution, and you grind this thing away a number of times, this q is going to approach the steady-state solution. Which means that if you start a Markov chain out in some known state, after a while the probability that you're in state s is going to become this steady-state probability. It gets closer and closer to it, exponentially, as time goes on. So that's just arithmetic.

These steady-state probabilities are approached asymptotically from any starting state. I.e., for all s and s prime, the limit as n goes to infinity of the probability that S sub n, the state at time n, is equal to a given state s, given that the state at time zero was equal to s prime, is equal to q of s. All of you know those things from studying Markov chains — I hope, because those are the main facts about Markov chains. Incidentally, I'm not interested in Markov chains for their own sake here. We're not going to do anything with them; the only thing I want to do with them is to show you that there are ways of modeling real sources, and of coming as close to good models for real sources as you want to come. That's the whole approach here.

How do you do coding for Markov sources? The simplest approach, which doesn't work very well, is to use a separate prefix-free code for each prior state. Namely, if I look at this Markov chain that I had, it says that when I'm in a given state, I want to somehow encode the next state that I go to — or the next letter that comes out of the Markov source.
The things that can come out of the Markov source are either a 1 or a zero. Now you see the whole problem with this approach. As soon as you look at an example, it sort of blows the cover on this. What's the best prefix-free code to encode a 1 and a zero, where one appears with probability 0.9 and the other one appears with probability 0.1? What does the Huffman encoder do? It assigns one of those symbols to 1 and one of them to zero. You might as well encode 1 as this one, and zero as that one. Which means that all the theory does is generate the same symbols that you had before. You're not doing any compression at all.

It's a nice thing to think about, though. In other words, by thinking about these things you then see the solution. And our solution before to these problems was: if you don't get anywhere by using a prefix-free code on a single digit, take a block of n digits and use a prefix-free code there. So that's the approach we will take here.

The general idea for single letters is this: the prefix-free code we're going to generate satisfies the Kraft inequality, you can use the Huffman algorithm, and you get this property here. And this entropy, which is now a function of the particular state that we were in to start with, is just this entropy here. So this is a conditional entropy, and we get a different conditional entropy for each possible previous state. And that conditional entropy for each possible previous state is what tells us exactly what we can do as far as generating a Huffman code for that next state. This would work fine if you had a symbol alphabet of size 10,000 or something. It just doesn't work well when your symbol alphabet is binary.
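Here is a minimal sketch of the per-state scheme, using blocks of two letters so the binary-alphabet problem doesn't bite. The codebooks are assumed toy ones, not Huffman codes computed from the example's actual probabilities:

```python
# state -> {2-letter block: code word}; each codebook is prefix-free.
# In states 00 and 11 the likely all-same block gets the 1-bit code word.
codebooks = {
    '00': {'00': '0', '01': '10', '10': '110', '11': '111'},
    '01': {'00': '00', '01': '01', '10': '10', '11': '11'},
    '10': {'00': '00', '01': '01', '10': '10', '11': '11'},
    '11': {'11': '0', '10': '10', '01': '110', '00': '111'},
}

def encode(blocks, state='00'):
    bits = []
    for blk in blocks:
        bits.append(codebooks[state][blk])
        state = blk              # a 2-letter block is exactly the new state
    return ''.join(bits)

print(encode(['00', '00', '01', '11']))   # -> '001011'
```

The decoder runs the same state machine: knowing the current state, it parses the next code word with that state's prefix-free codebook, which gives it the block, which gives it the next state.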
If we start out in steady state, then all of these probabilities stay in steady state. When we look at the number of binary digits per source symbol, and we average over all of the initial states — the initial states occur with these probabilities q of s, and we have these best Huffman codes — the number of binary digits we're using per source symbol is really this average here. Because this is averaging over all the states you're going to go into. And the entropy of the source output, conditional on the chance variable S, is now, in fact, defined just as that average.

So the encoder transmits s zero, followed by the code word for x 1 using the code for s zero. That specifies s 1, and then you encode x 2 using s 1, and so forth. And the decoder is sitting there and does exactly the same thing. Namely, the decoder first sees what s zero is, then it uses the code for s zero to decide what x 1 was, then it uses the code for s 1 — s 1 being determined by s zero and x 1 — and it goes on and on like that.

Let me review a little bit about conditional entropy. I'm going pretty fast here, and I'm not deriving these things because they're in the notes. And it's almost as if I want you to get some kind of pattern sensitivity to these things, without the idea that we're going to use them a lot. You ought to have a general idea of what the results are. We're not going to spend a lot of time on this because, as I said before, the only thing I want you to recognize is that if you ever want to model a source, this, in fact, gives you a general way of doing it.

So this conditional entropy is defined using what we had before: it's the sum over all the states, and the sum over all of the source outputs, of these log pmf probabilities. The joint entropy of both a symbol and a state is equal to this combined thing. Which is equal, not surprisingly, to the entropy of the state plus the entropy of the letter conditional on the state. Namely, first you want to know what the state is — that has this entropy — and then, given the state, this is the entropy of the next letter, conditional on the state. And this joint entropy is less than or equal to H of S plus H of X — I think you just proved that in the homework, didn't you? I hope so.
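In symbols, the chain rule and that homework bound combine to give the inequality the next remark draws on:

$$
H(X,S) \;=\; H(S) + H(X \mid S) \;\le\; H(S) + H(X)
\quad\Longrightarrow\quad H(X \mid S) \;\le\; H(X).
$$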
That says that the entropy of X conditional on S is less than or equal to the entropy of X, which is not surprising. It says that if you use the previous state in trying to do source encoding, you're going to do better than if you don't use it. I mean, the whole theory would be pretty stupid if you didn't. That's what that says.

As I told you before, the only way we can make all of this work is to use n-to-variable-length codes for each state. In other words, you encode n letters at the same time. If you look at the entropy of the first n states given a starting state, it turns out to be n times H of X given S, by the same kind of rule that you were using before. The same argument that you used to show that this is equal to that, you can use to show that this is equal to H of S 1 given S zero, plus H of S 2 given S 1, plus H of S 3 given S 2, and so forth. And by the stationarity that we have here, these are all equal, so you wind up with n times this conditional entropy. And since the source outputs specify the states, and the states specify the source outputs, you can then convince yourself that this entropy is also equal to n times H of X given S.

And once you do that, you're back in the same position we were in when we looked at n-to-variable-length coding for discrete memoryless sources. Namely, the only thing that happens when you're looking at n-to-variable-length coding is that the one fudge factor becomes a 1 over n. When you have a small symbol alphabet, by going to blocks you get rid of this; you can make the expected length close to H of X given S. Which means, in fact, that all of the memory is taken into account, and it still is this one parameter, the entropy, that says everything.

The AEP holds — I mean, if you want to, you can sit down and just see that it holds once you see what these entropies are. The entropies are using log pmfs; you're just looking at products of probabilities, which are sums of log pmfs, and everything is the same as before.
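To put a number on H of X given S for the running example — same assumed transition probabilities, and the steady state computed above:

```python
import math

P1 = {'00': 0.1, '01': 0.5, '10': 0.5, '11': 0.9}     # assumed P(1 | state)
q = {'00': 5/12, '01': 1/12, '10': 1/12, '11': 5/12}  # steady state from above

def h(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

H_X_given_S = sum(q[s] * h(P1[s]) for s in q)   # H(X|S) = sum_s q(s) H(X|S=s)
print(round(H_X_given_S, 3))                    # ~0.557 bits, well below 1
```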
746 00:43:13,170 --> 00:43:17,920 And again, if you're using n-to-variable-length codes, 747 00:43:17,920 --> 00:43:22,010 you just can't achieve an expected length less than H of 748 00:43:22,010 --> 00:43:26,510 X, given S. So H of X, given S gives the whole story. 749 00:43:31,140 --> 00:43:34,130 You should read those notes, because I've gone through that 750 00:43:34,130 --> 00:43:38,160 very, very fast, partly because some of you are 751 00:43:38,160 --> 00:43:40,840 already very familiar with Markov chains and some of you are 752 00:43:40,840 --> 00:43:44,490 probably less familiar with them, so you should check that 753 00:43:44,490 --> 00:43:45,730 out a little bit on your own. 754 00:43:45,730 --> 00:43:50,980 I want to talk about the Lempel Ziv universal 755 00:43:50,980 --> 00:43:56,270 algorithm, which was rather surprising to many people. 756 00:43:59,440 --> 00:44:03,120 Jacob Ziv is one of the great theorists 757 00:44:03,120 --> 00:44:05,560 of information theory. 758 00:44:05,560 --> 00:44:08,300 Before this time, he wrote a lot of very, 759 00:44:08,300 --> 00:44:10,720 very powerful papers. 760 00:44:10,720 --> 00:44:14,500 Which were quite hard to read in many cases. 761 00:44:14,500 --> 00:44:17,690 So it was a real surprise to people when he came up with 762 00:44:17,690 --> 00:44:21,750 this beautiful idea, which was a lovely, simple algorithm. 763 00:44:21,750 --> 00:44:25,425 Some people, because of that, thought it was Abe Lempel, who 764 00:44:25,425 --> 00:44:28,760 spends part of his year at Brandeis, who was really the 765 00:44:28,760 --> 00:44:30,720 genius behind it. 766 00:44:30,720 --> 00:44:34,160 In fact, it wasn't Abe Lempel, it was Jacob Ziv who was the 767 00:44:34,160 --> 00:44:36,070 genius behind it. 768 00:44:36,070 --> 00:44:42,390 Abe Lempel was pretty much the one who really implemented it, 769 00:44:42,390 --> 00:44:45,630 because once you see what the algorithm is, it's still not 770 00:44:45,630 --> 00:44:49,650 trivial to try to find how to implement it in a 771 00:44:49,650 --> 00:44:52,990 simple, easy way. 772 00:44:52,990 --> 00:44:56,350 If you look at all the articles about it, the authors 773 00:44:56,350 --> 00:45:00,230 are Ziv and Lempel, instead of Lempel and Ziv, so why it got 774 00:45:00,230 --> 00:45:04,020 called Lempel Ziv is a mystery to everyone. 775 00:45:04,020 --> 00:45:07,450 Anyway, they came up with two algorithms. 776 00:45:07,450 --> 00:45:12,770 One which they came up with in 1977, and people looked at 777 00:45:12,770 --> 00:45:16,530 their 1977 algorithm and said, oh that's much too complicated 778 00:45:16,530 --> 00:45:18,210 to implement. 779 00:45:18,210 --> 00:45:20,830 So they went back to the drawing board, came up with 780 00:45:20,830 --> 00:45:24,890 another one in 1978, which people said, ah, we can 781 00:45:24,890 --> 00:45:26,900 implement that. 782 00:45:26,900 --> 00:45:31,750 So people started implementing the LZ78 and of course, by 783 00:45:31,750 --> 00:45:34,920 that time, all the technology was much better. 784 00:45:34,920 --> 00:45:39,380 You could do things faster and cheaper than you could before, 785 00:45:39,380 --> 00:45:44,920 and what happened then is that a few years after that people 786 00:45:44,920 --> 00:45:48,720 were implementing LZ77, which turned out 787 00:45:48,720 --> 00:45:50,190 to work much better. 788 00:45:50,190 --> 00:45:51,930 Which is often the way this field works.
789 00:45:51,930 --> 00:45:55,030 People do something interesting theoretically, 790 00:45:55,030 --> 00:45:59,580 people say, no you can't do it, so they simplify it, 791 00:45:59,580 --> 00:46:03,200 thereby destroying some of its best characteristics. 792 00:46:03,200 --> 00:46:06,210 And then a few years later people are doing the more 793 00:46:06,210 --> 00:46:08,930 sophisticated thing, which they should have started out 794 00:46:08,930 --> 00:46:10,180 doing at the beginning. 795 00:46:13,730 --> 00:46:17,210 What is a Universal Data Compression algorithm? 796 00:46:17,210 --> 00:46:21,330 A Universal Data Compression algorithm is an algorithm 797 00:46:21,330 --> 00:46:25,240 which doesn't have any probabilities tucked into it. 798 00:46:25,240 --> 00:46:28,690 In other words, the algorithm itself simply looks at a 799 00:46:28,690 --> 00:46:31,320 sequence of letters from an alphabet, and 800 00:46:31,320 --> 00:46:34,060 encodes it in some way. 801 00:46:34,060 --> 00:46:38,300 And what you would like to be able to do is somehow measure 802 00:46:38,300 --> 00:46:42,400 what the statistics are, and at the same time as you're 803 00:46:42,400 --> 00:46:47,000 measuring the statistics, you want to encode the digits. 804 00:46:47,000 --> 00:46:49,630 You don't care too much about delay. 805 00:46:49,630 --> 00:46:51,610 In fact, one way to do this -- 806 00:46:51,610 --> 00:46:53,680 I mean, if you're surprised that you can build a good 807 00:46:53,680 --> 00:46:56,880 universal encoder, you shouldn't be. 808 00:46:56,880 --> 00:47:00,200 Because you could just take the first million letters out 809 00:47:00,200 --> 00:47:03,510 of the source, go through all the statistical analysis you 810 00:47:03,510 --> 00:47:08,560 want to, model the source in whatever way makes the best 811 00:47:08,560 --> 00:47:12,270 sense to you, and then build a Huffman encoder, which, in 812 00:47:12,270 --> 00:47:16,780 fact, encodes things according to that model that you have. 813 00:47:16,780 --> 00:47:19,710 Of course you then have to send the decoder the first 814 00:47:19,710 --> 00:47:23,560 million digits, and the decoder goes through the same 815 00:47:23,560 --> 00:47:26,730 statistical analysis, and therefore finds out what code 816 00:47:26,730 --> 00:47:30,410 you're going to use, and then the encoder encodes, the 817 00:47:30,410 --> 00:47:34,690 decoder decodes and you have this million symbols of 818 00:47:34,690 --> 00:47:38,460 overhead in the algorithm, so that if you use the algorithm 819 00:47:38,460 --> 00:47:41,710 for a billion letters instead of a million letters, then it 820 00:47:41,710 --> 00:47:43,160 all works pretty well. 821 00:47:43,160 --> 00:47:46,610 So there's a little bit of that flavor here, but the 822 00:47:46,610 --> 00:47:50,730 other part of it is, it's a neat algorithm. 823 00:47:50,730 --> 00:47:53,550 And the algorithm measures things in a faster way than 824 00:47:53,550 --> 00:47:55,520 you would believe. 825 00:47:55,520 --> 00:47:59,080 And as you look at it later you say, gee, this makes a 826 00:47:59,080 --> 00:48:02,820 great deal of sense even if there isn't much statistical 827 00:48:02,820 --> 00:48:04,690 structure here.
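As a sketch of that two-pass thought experiment (not the Lempel Ziv algorithm itself, just the train-then-encode scheme described above), here is a minimal Python version; the names huffman_code and two_pass_encode are mine, and it assumes every symbol of the alphabet shows up in the training prefix:

import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a Huffman code (symbol -> bit string) from symbol counts."""
    # Heap entries are (count, tiebreak, tree); the unique tiebreak keeps
    # heapq from ever comparing two trees directly.
    heap = [(n, i, sym) for i, (sym, n) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        n1, _, t1 = heapq.heappop(heap)   # merge the two least likely trees
        n2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next_id, (t1, t2)))
        next_id += 1
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: branch on 0/1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            code[tree] = prefix or "0"    # degenerate one-symbol alphabet
    walk(heap[0][2], "")
    return code

def two_pass_encode(source, train_len):
    """Send the first train_len symbols raw, then Huffman-encode the rest
    using the statistics measured on that prefix."""
    prefix, rest = source[:train_len], source[train_len:]
    code = huffman_code(Counter(prefix))  # decoder rebuilds this same code
    return prefix, "".join(code[s] for s in rest)

The Lempel Ziv algorithm gets the same effect without any separate training pass, which is what makes it interesting.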
828 00:48:04,690 --> 00:48:09,860 In other words, you can show that if the source really is a 829 00:48:09,860 --> 00:48:15,220 Markov source, then this algorithm will behave just as 830 00:48:15,220 --> 00:48:20,740 well, asymptotically, as the best algorithm you can design 831 00:48:20,740 --> 00:48:22,350 for that Markov source. 832 00:48:22,350 --> 00:48:28,450 Namely, it's so good that it will in fact measure the 833 00:48:28,450 --> 00:48:33,970 statistics in that Markov model and implement them. 834 00:48:33,970 --> 00:48:36,780 But it does something better than that. 835 00:48:36,780 --> 00:48:40,280 And the thing which is better is that, if you're going to 836 00:48:40,280 --> 00:48:43,690 look at this first million symbols and your objective 837 00:48:43,690 --> 00:48:48,920 then is to build a Markov model, after you build the 838 00:48:48,920 --> 00:48:52,240 Markov model for that million symbols, one of the things 839 00:48:52,240 --> 00:48:54,890 that you always question is, should I have used a 840 00:48:54,890 --> 00:48:59,270 Markov model or should I have used some other kind of model? 841 00:48:59,270 --> 00:49:01,330 And that's a difficult question to ask. 842 00:49:01,330 --> 00:49:04,460 You go through all of the different possibilities, and 843 00:49:04,460 --> 00:49:07,460 one of the nice things about the Lempel Ziv algorithm is, 844 00:49:07,460 --> 00:49:11,010 in a sense, it just does this automatically. 845 00:49:11,010 --> 00:49:13,620 If there's some kind of statistical structure there, 846 00:49:13,620 --> 00:49:15,820 it's going to find it. 847 00:49:15,820 --> 00:49:19,040 If it's not Markov, if it's some other kind of structure, 848 00:49:19,040 --> 00:49:20,050 it will find it. 849 00:49:20,050 --> 00:49:22,790 The question is how does it find this statistical 850 00:49:22,790 --> 00:49:27,800 structure without knowing what kind of model you should use 851 00:49:27,800 --> 00:49:29,010 to start with? 852 00:49:29,010 --> 00:49:32,100 And that's the genius of things which are universal, 853 00:49:32,100 --> 00:49:35,430 because they don't assume that you have to measure particular 854 00:49:35,430 --> 00:49:38,560 things in some model that you believe in. 855 00:49:38,560 --> 00:49:41,090 It just does the whole thing all at once. 856 00:49:41,090 --> 00:49:45,230 If it's running along and the statistics change, bing, it 857 00:49:45,230 --> 00:49:48,070 changes, too. 858 00:49:48,070 --> 00:49:51,750 And suddenly it will start producing more binary digits 859 00:49:51,750 --> 00:49:54,590 per source symbol, or fewer, because that's 860 00:49:54,590 --> 00:49:56,480 what it has to do. 861 00:49:56,480 --> 00:49:59,140 And that's just the way it works. 862 00:49:59,140 --> 00:50:02,510 But it does have all these nice properties. 863 00:50:02,510 --> 00:50:05,770 It has instantaneous decodability. 864 00:50:05,770 --> 00:50:09,770 In a sense, it is a prefix-free code, although you 865 00:50:09,770 --> 00:50:11,430 have to interpret pretty carefully 866 00:50:11,430 --> 00:50:12,730 what you mean by that. 867 00:50:12,730 --> 00:50:16,010 We'll understand that in a little bit. 868 00:50:16,010 --> 00:50:20,680 But in fact, it does do all of these neat things. 869 00:50:20,680 --> 00:50:25,420 And there are better algorithms out there now, 870 00:50:25,420 --> 00:50:29,380 whether they're better in terms of the trade-off between 871 00:50:29,380 --> 00:50:34,690 complexity and compressibility, I don't know.
872 00:50:34,690 --> 00:50:37,390 But anyway, the people who do research on these things have 873 00:50:37,390 --> 00:50:40,670 to have something to keep them busy. 874 00:50:40,670 --> 00:50:44,690 And they have to have some kind of results to get money 875 00:50:44,690 --> 00:50:48,160 for, and therefore they claim that the new algorithms are 876 00:50:48,160 --> 00:50:50,110 better than the old algorithms. 877 00:50:50,110 --> 00:50:53,620 And they probably are, but I'm not sure. 878 00:50:53,620 --> 00:50:55,140 Anyway, this is a very cute algorithm. 879 00:51:00,540 --> 00:51:04,190 So what you're trying to do here, the objective, one 880 00:51:04,190 --> 00:51:09,500 objective which is achieved, is if you observe the output 881 00:51:09,500 --> 00:51:13,620 from the given probability model, say a Markov source, 882 00:51:13,620 --> 00:51:17,810 and I build the best code I can for that Markov source, 883 00:51:17,810 --> 00:51:20,250 then we know how many bits we need per symbol. 884 00:51:20,250 --> 00:51:23,800 The number of bits we need per symbol is this entropy of a 885 00:51:23,800 --> 00:51:27,280 symbol given the state. 886 00:51:27,280 --> 00:51:29,750 That's the best we can do. 887 00:51:29,750 --> 00:51:33,160 How well does the Lempel Ziv algorithm do? 888 00:51:33,160 --> 00:51:37,290 Asymptotically, when you make everything large it will 889 00:51:37,290 --> 00:51:41,130 encode using a number of bits per symbol, which is H of X, 890 00:51:41,130 --> 00:51:45,770 given S. So it'll do just as well as the best thing does 891 00:51:45,770 --> 00:51:49,030 which happens to know the model to start with. 892 00:51:49,030 --> 00:51:53,520 As I said before, the algorithm also compresses in the absence 893 00:51:53,520 --> 00:51:56,120 of any ordinary kind of statistical structure. 894 00:51:56,120 --> 00:51:58,860 Whatever kind of structure is there, this 895 00:51:58,860 --> 00:52:01,160 algorithm sorts it out. 896 00:52:01,160 --> 00:52:04,080 It should deal with gradually changing statistics. 897 00:52:04,080 --> 00:52:06,980 It does that also, but perhaps not in the best way. 898 00:52:06,980 --> 00:52:08,280 We'll talk about that later. 899 00:52:11,730 --> 00:52:13,180 Let's describe it a little bit. 900 00:52:19,350 --> 00:52:23,800 If we let x 1, x 2 blah blah blah, be the output of the 901 00:52:23,800 --> 00:52:29,610 source, and the alphabet is some alphabet capital X, which 902 00:52:29,610 --> 00:52:41,720 has size m, let's just as notation, let x sub m super n 903 00:52:41,720 --> 00:52:46,780 denote the string xm, xm plus 1, up to xn. 904 00:52:46,780 --> 00:52:49,900 In other words, in describing this algorithm we are, all the 905 00:52:49,900 --> 00:52:54,310 time, talking about strings of letters taken out of this 906 00:52:54,310 --> 00:52:57,110 infinite length string that comes out of the source. 907 00:52:57,110 --> 00:53:00,800 We want to have a nice notation for talking about a 908 00:53:00,800 --> 00:53:04,210 sub string of the actual sequence. 909 00:53:04,210 --> 00:53:08,300 We're going to use a window in this algorithm. 910 00:53:08,300 --> 00:53:12,510 We want the window to have a size which is a power of 2. 911 00:53:12,510 --> 00:53:17,290 Typical values for the window range from about a thousand up 912 00:53:17,290 --> 00:53:20,270 to about a million. 913 00:53:20,270 --> 00:53:23,830 Maybe they're even bigger now, I don't know.
914 00:53:23,830 --> 00:53:25,740 But as we'll see later, there's 915 00:53:25,740 --> 00:53:28,370 some constraints there. 916 00:53:28,370 --> 00:53:33,950 What the Lempel Ziv algorithm does, this LZ77 algorithm, is 917 00:53:33,950 --> 00:53:38,830 it matches the longest string of yet unencoded -- this is 918 00:53:38,830 --> 00:53:47,310 unencoded also, isn't it, that's simple enough -- 919 00:53:47,310 --> 00:53:51,330 of yet unencoded symbols by using strings 920 00:53:51,330 --> 00:53:52,370 starting in the window. 921 00:53:52,370 --> 00:53:56,700 So it takes this sequence of stuff we haven't observed yet, 922 00:53:56,700 --> 00:54:00,750 it tries to find the longest string starting there which it 923 00:54:00,750 --> 00:54:04,590 can match with something that's already in the window. 924 00:54:04,590 --> 00:54:07,580 If it can find something which matches with something in the 925 00:54:07,580 --> 00:54:10,630 window, what does it do? 926 00:54:10,630 --> 00:54:13,940 It's going to first say how long the match was, and then 927 00:54:13,940 --> 00:54:17,210 it's going to say where in the window it found it. 928 00:54:17,210 --> 00:54:19,830 And the decoder is sitting there, the decoder has this 929 00:54:19,830 --> 00:54:24,570 window which it observes also, so the decoder can find the 930 00:54:24,570 --> 00:54:27,200 same match which is in the window. 931 00:54:27,200 --> 00:54:29,870 Why does it work? 932 00:54:29,870 --> 00:54:33,740 Well, it works because with all of these AEP properties 933 00:54:33,740 --> 00:54:37,750 that we're thinking of, you tend to have typical sequences 934 00:54:37,750 --> 00:54:39,790 sitting there in the window. 935 00:54:39,790 --> 00:54:42,420 And you tend to have typical sequences which come out of 936 00:54:42,420 --> 00:54:43,300 the source. 937 00:54:43,300 --> 00:54:46,300 So the thing we're trying to encode is some typical 938 00:54:46,300 --> 00:54:48,660 sequence -- 939 00:54:48,660 --> 00:54:51,470 well, you can think of short typical sequences and 940 00:54:51,470 --> 00:54:53,510 longer typical sequences. 941 00:54:53,510 --> 00:54:56,980 We try to find the longest typical sequence that we can. 942 00:54:56,980 --> 00:54:59,530 And we're looking back into this window, and there are 943 00:54:59,530 --> 00:55:02,290 an enormous number of typical sequences there. 944 00:55:02,290 --> 00:55:04,720 If we make the typical sequences short enough, there 945 00:55:04,720 --> 00:55:07,430 aren't too many of them, and most of them are sitting there 946 00:55:07,430 --> 00:55:09,000 in the window. 947 00:55:09,000 --> 00:55:13,150 This'll become clearer as we go. 948 00:55:13,150 --> 00:55:15,280 Let's go on and actually explain what 949 00:55:15,280 --> 00:55:16,560 the algorithm does. 950 00:55:20,130 --> 00:55:23,930 So here's the algorithm. 951 00:55:23,930 --> 00:55:29,850 First, you take this large W, this large window size, and 952 00:55:29,850 --> 00:55:32,190 we're going to encode the first W symbols. 953 00:55:32,190 --> 00:55:35,500 We're not even going to use any compression, that's just 954 00:55:35,500 --> 00:55:37,320 lost stuff. 955 00:55:37,320 --> 00:55:41,820 So we encode this first million symbols, we grin and 956 00:55:41,820 --> 00:55:45,640 bear it, and then the decoder has this window 957 00:55:45,640 --> 00:55:47,170 of a million symbols. 958 00:55:47,170 --> 00:55:50,840 We at the encoder have this window of a million symbols, 959 00:55:50,840 --> 00:55:53,350 and we proceed from there.
960 00:55:53,350 --> 00:55:56,150 So it gets amortized, so we don't care. 961 00:55:56,150 --> 00:56:01,350 So we then have a pointer, and we set the pointer to W. 962 00:56:01,350 --> 00:56:05,610 So the pointer is the last thing that we encoded. 963 00:56:05,610 --> 00:56:10,190 So we have all this encoded stuff; starting at time P, 964 00:56:10,190 --> 00:56:14,850 everything beyond there is as yet unencoded. 965 00:56:14,850 --> 00:56:17,200 That's the first step in the algorithm. 966 00:56:17,200 --> 00:56:18,450 So far, so good. 967 00:56:22,740 --> 00:56:27,360 The next step is to find the largest n, greater than or 968 00:56:27,360 --> 00:56:29,780 equal to 2, I'll explain why greater than or equal to 2 969 00:56:29,780 --> 00:56:38,160 later, such that the string x sub p plus 1 up to p plus n, 970 00:56:38,160 --> 00:56:39,190 what is that? 971 00:56:39,190 --> 00:56:43,820 It's the string which starts right beyond the pointer, 972 00:56:43,820 --> 00:56:47,160 namely the string that starts here, what we're trying to do 973 00:56:47,160 --> 00:56:50,560 is find the largest n, in other words the longest string 974 00:56:50,560 --> 00:56:54,020 starting here, which we can match with something that's in 975 00:56:54,020 --> 00:56:55,290 the window. 976 00:56:55,290 --> 00:57:00,430 Now we look at a, a is in the window, we look at a b, a b is 977 00:57:00,430 --> 00:57:01,530 in the window. 978 00:57:01,530 --> 00:57:10,900 We look at a b a, a b a is in the window. a b a b, a b a b 979 00:57:10,900 --> 00:57:12,530 is not in the window. 980 00:57:12,530 --> 00:57:15,960 At least I hope it's not in the window or I screwed up. 981 00:57:15,960 --> 00:57:17,210 Yeah, it's not in the window. 982 00:57:17,210 --> 00:57:20,810 So the longest thing we can find which matches with what's 983 00:57:20,810 --> 00:57:23,430 in the window is this match of length three. 984 00:57:26,440 --> 00:57:29,420 So this is finding the longest match which matches with 985 00:57:29,420 --> 00:57:31,770 something here. 986 00:57:31,770 --> 00:57:36,900 This next example, I think the only way I can regard 987 00:57:36,900 --> 00:57:39,300 that is as a hack. 988 00:57:39,300 --> 00:57:42,410 It's a kind of hack that programmers like. 989 00:57:42,410 --> 00:57:45,500 It's very mysterious, but it's also the kind of hack that 990 00:57:45,500 --> 00:57:49,840 mathematicians like because in this case this particular hack 991 00:57:49,840 --> 00:57:52,790 makes the analysis much easier. 992 00:57:52,790 --> 00:57:55,420 So this is another kind of match. 993 00:57:55,420 --> 00:57:58,240 It's looking for the longest string here, 994 00:57:58,240 --> 00:58:00,510 starting at this pointer. 995 00:58:00,510 --> 00:58:05,450 a b a b and so forth, which matches things starting here. 996 00:58:05,450 --> 00:58:07,150 Starting somewhere in the window. 997 00:58:07,150 --> 00:58:11,990 So it finds a match a b here. a b a here. 998 00:58:11,990 --> 00:58:17,050 But now it looks for a b a b, a match of four. 999 00:58:17,050 --> 00:58:18,640 Where do we find it? 1000 00:58:18,640 --> 00:58:21,940 We can start back here, which is still in the window, and 1001 00:58:21,940 --> 00:58:25,190 what we see is a b a b. 1002 00:58:25,190 --> 00:58:30,160 So these four digits match these four digits.
1003 00:58:30,160 --> 00:58:34,410 Well you might say foul ball, because if I tell you there's 1004 00:58:34,410 --> 00:58:40,740 a match of four and I tell you where it is, in fact, all the 1005 00:58:40,740 --> 00:58:44,600 poor decoder knows is this. 1006 00:58:44,600 --> 00:58:48,180 If I tell you there's a match of four and it starts here, 1007 00:58:48,180 --> 00:58:50,110 what's the poor decoder going to do? 1008 00:58:50,110 --> 00:58:56,950 The poor decoder says, ok, so a is that digit two digits 1009 00:58:56,950 --> 00:58:59,180 ago, so that gives me the a. 1010 00:58:59,180 --> 00:59:02,000 So I know there's an a there. b is the next 1011 00:59:02,000 --> 00:59:04,920 digit, so b is there. 1012 00:59:04,920 --> 00:59:08,420 And then I know the first two digits beyond the window, and 1013 00:59:08,420 --> 00:59:12,960 therefore this third digit is a, so that must be that digit. 1014 00:59:12,960 --> 00:59:15,340 The fourth digit is this digit, which 1015 00:59:15,340 --> 00:59:17,030 must be that digit. 1016 00:59:17,030 --> 00:59:18,430 OK? 1017 00:59:18,430 --> 00:59:21,220 If you didn't catch that, you can just think about it, it'll 1018 00:59:21,220 --> 00:59:23,140 become clear. 1019 00:59:23,140 --> 00:59:24,510 I mean it really is a hack. 1020 00:59:24,510 --> 00:59:25,830 It's not very important. 1021 00:59:25,830 --> 00:59:28,000 It won't change the way this thing behaves. 1022 00:59:28,000 --> 00:59:33,470 But it does change the way you analyze it. 1023 00:59:33,470 --> 00:59:34,690 So that's the first thing you do, you 1024 00:59:34,690 --> 00:59:38,170 look for these matches. 1025 00:59:38,170 --> 00:59:40,230 Next thing we're going to do is we're going to try to 1026 00:59:40,230 --> 00:59:42,690 encode the matches. 1027 00:59:42,690 --> 00:59:46,640 Namely, we're going to try to encode the thing that we found 1028 00:59:46,640 --> 00:59:48,180 in the window. 1029 00:59:48,180 --> 00:59:50,900 How do we encode what we found in the window? 1030 00:59:50,900 --> 00:59:52,480 Well the first thing we have to do -- yeah? 1031 00:59:52,480 --> 00:59:55,630 AUDIENCE: What if you don't find any matches? 1032 00:59:55,630 --> 00:59:56,980 PROFESSOR: I'm going to talk about that later. 1033 00:59:56,980 --> 01:00:00,030 If you don't find any matches, I mean what I was looking for 1034 01:00:00,030 --> 01:00:02,900 was matches of two or more. 1035 01:00:02,900 --> 01:00:06,440 If you don't find any matches of two or more, what you do is 1036 01:00:06,440 --> 01:00:11,040 you just take the first unencoded letter and you encode 1037 01:00:11,040 --> 01:00:12,390 that without any compression. 1038 01:00:16,790 --> 01:00:21,180 I mean our strategy here is to always send the length of the 1039 01:00:21,180 --> 01:00:22,530 match first. 1040 01:00:22,530 --> 01:00:25,870 If you say the length of the match is one, then the decoder 1041 01:00:25,870 --> 01:00:28,850 knows to look for uncompressed symbols, instead of looking 1042 01:00:28,850 --> 01:00:30,750 for something in the window. 1043 01:00:30,750 --> 01:00:33,880 So it takes care of the case where there haven't been any 1044 01:00:33,880 --> 01:00:38,240 occurrences of the symbol anywhere in the window. 1045 01:00:38,240 --> 01:00:41,030 So you only look for matches of length two or more. 1046 01:00:44,400 --> 01:00:50,350 So then you use something called a unary-binary code. 1047 01:00:50,350 --> 01:00:54,990 Theoreticians always copy everybody else's work.
1048 01:00:54,990 --> 01:01:01,030 The unary-binary code was due to Peter Elias, who was the 1049 01:01:01,030 --> 01:01:03,960 head of this department for a long time. 1050 01:01:03,960 --> 01:01:07,610 He just died about six months ago. 1051 01:01:07,610 --> 01:01:10,970 He was here up until his death. 1052 01:01:10,970 --> 01:01:14,290 He used to organize department colloquia. 1053 01:01:14,290 --> 01:01:17,620 He was so essential that since he died, nobody's taken over 1054 01:01:17,620 --> 01:01:19,430 the department colloquia. 1055 01:01:19,430 --> 01:01:23,090 He was my thesis adviser, so I tend to think very kindly of 1056 01:01:23,090 --> 01:01:26,350 him. And he was lots of other things. 1057 01:01:26,350 --> 01:01:31,195 But anyway, he invented this unary-binary code, which is a 1058 01:01:31,195 --> 01:01:33,980 way of encoding the integers, which has a lot of nice 1059 01:01:33,980 --> 01:01:35,530 properties. 1060 01:01:35,530 --> 01:01:39,230 And they're universal properties, as you will see. 1061 01:01:39,230 --> 01:01:42,130 The idea is to encode the integers; there are an infinite 1062 01:01:42,130 --> 01:01:44,550 number of integers. 1063 01:01:44,550 --> 01:01:47,870 What you'd like to do, somehow or other, is have shorter code 1064 01:01:47,870 --> 01:01:51,530 words for lower integers, and longer code 1065 01:01:51,530 --> 01:01:54,920 words for larger integers. 1066 01:01:54,920 --> 01:01:58,140 In this particular Lempel Ziv algorithm, it's particularly 1067 01:01:58,140 --> 01:02:01,920 important to have the length of the code words growing as 1068 01:02:01,920 --> 01:02:04,790 the logarithm of n. 1069 01:02:04,790 --> 01:02:08,720 Because then anytime you find a really long match, and you 1070 01:02:08,720 --> 01:02:11,560 got a very large n, you're encoding a whole lot of 1071 01:02:11,560 --> 01:02:16,020 letters, and therefore you don't care if there's an 1072 01:02:16,020 --> 01:02:19,080 overhead which is proportional to log n. 1073 01:02:19,080 --> 01:02:20,720 So you don't mind that. 1074 01:02:20,720 --> 01:02:23,680 And if there's a very small number of letters encoded, you 1075 01:02:23,680 --> 01:02:27,280 want something very efficient then. 1076 01:02:27,280 --> 01:02:28,910 So it does that. 1077 01:02:28,910 --> 01:02:33,950 And the way it does it is, first you generate a prefix, 1078 01:02:33,950 --> 01:02:38,470 and then you have a representation in base 2. 1079 01:02:38,470 --> 01:02:40,450 Namely base 2 expansion. 1080 01:02:40,450 --> 01:02:46,750 So the number n, the prefix here, I think I said it here, 1081 01:02:46,750 --> 01:02:49,980 the positive integer n is encoded into the binary 1082 01:02:49,980 --> 01:02:56,070 representation of n, preceded by a prefix of integer part of 1083 01:02:56,070 --> 01:02:58,790 log to the base 2 of n zeros. 1084 01:02:58,790 --> 01:03:03,930 Now what's the integer part of log to the base 2 of 1? 1085 01:03:03,930 --> 01:03:07,220 Log to the base 2 of 1 is zero. 1086 01:03:07,220 --> 01:03:10,320 So it has a prefix of zero zeros. 1087 01:03:10,320 --> 01:03:12,390 So zero zeros is nothing. 1088 01:03:12,390 --> 01:03:17,890 So the prefix is nothing, the expansion of 1, in a base 2 1089 01:03:17,890 --> 01:03:21,360 expansion or any other expansion, is 1. 1090 01:03:21,360 --> 01:03:25,830 So the code word for 1 is 1. 1091 01:03:25,830 --> 01:03:31,820 If you have the number 2, log to the base 2 of 2 is 1.
1092 01:03:31,820 --> 01:03:35,510 The integer part of 1 is 1, so you start out with a single 1093 01:03:35,510 --> 01:03:38,550 zero and then you have the expansion, 2 is 1094 01:03:38,550 --> 01:03:40,590 expanded as 1 zero. 1095 01:03:40,590 --> 01:03:41,970 And so forth. 1096 01:03:41,970 --> 01:03:46,700 Oh and then 3 is expanded as 1 1, again with a prefix of 1. 1097 01:03:46,700 --> 01:03:49,470 Four is encoded as 1 zero zero, blah 1098 01:03:49,470 --> 01:03:52,600 blah blah and so forth. 1099 01:03:52,600 --> 01:03:55,500 Why don't you just leave the prefix out? 1100 01:04:00,090 --> 01:04:02,750 Anybody figure out why I need the prefix there? 1101 01:04:02,750 --> 01:04:07,250 AUDIENCE: Without those prefixes, you don't have a 1102 01:04:07,250 --> 01:04:07,490 prefix-free code. 1103 01:04:07,490 --> 01:04:12,430 PROFESSOR: Yeah. Right. 1104 01:04:12,430 --> 01:04:15,850 If I left them out, everything would start with 1. 1105 01:04:15,850 --> 01:04:18,680 I would get to the 1, I would say, gee, is that the end of 1106 01:04:18,680 --> 01:04:20,580 it or isn't it the end of it? 1107 01:04:20,580 --> 01:04:23,050 I wouldn't know. 1108 01:04:23,050 --> 01:04:26,840 But with this, if I see 1, the only code word that starts 1109 01:04:26,840 --> 01:04:30,030 with 1 is this one. 1110 01:04:30,030 --> 01:04:34,780 If it's 2 or 3, it starts with zero and then there's a 1, 1111 01:04:34,780 --> 01:04:39,010 which says it's on that next branch which has probability 1/4. 1112 01:04:39,010 --> 01:04:45,380 I have a 1 0, 1 1, a prefix of 0 0, followed by a 1, puts me 1113 01:04:45,380 --> 01:04:47,640 off on another branch. 1114 01:04:47,640 --> 01:04:49,150 And so forth. 1115 01:04:49,150 --> 01:04:51,810 So, yes, this is a prefix-free code. 1116 01:04:51,810 --> 01:04:56,000 And it's a prefix-free code which has this nice property 1117 01:04:56,000 --> 01:05:02,890 that the number of digits in the code word is approximately 1118 01:05:02,890 --> 01:05:04,980 2 times log n. 1119 01:05:04,980 --> 01:05:08,940 Namely, it goes up both ways here. 1120 01:05:08,940 --> 01:05:12,540 The number of zeros I need is log to the base 2 of n. 1121 01:05:12,540 --> 01:05:15,910 The number of digits in the base 2 expansion, is also the 1122 01:05:15,910 --> 01:05:19,310 integer part of log to the base 2 of n. 1123 01:05:19,310 --> 01:05:21,820 So it works both ways. 1124 01:05:21,820 --> 01:05:26,920 And there's always this 1 in the middle. 1125 01:05:26,920 --> 01:05:28,880 Again, it's a hack. 1126 01:05:28,880 --> 01:05:30,940 It's a hack that works very nicely when you 1127 01:05:30,940 --> 01:05:32,350 try to analyze this. 1128 01:05:35,830 --> 01:05:41,480 OK so if the size of the match is bigger than one, we're 1129 01:05:41,480 --> 01:05:44,810 going to encode the positive integer u. 1130 01:05:44,810 --> 01:05:47,750 u was where the match occurred. 1131 01:05:47,750 --> 01:05:49,670 How far back do you have to count before 1132 01:05:49,670 --> 01:05:51,780 you find this match? 1133 01:05:51,780 --> 01:05:55,020 You're going to encode that integer u into a fixed length 1134 01:05:55,020 --> 01:06:00,530 code of length log of w bits. 1135 01:06:00,530 --> 01:06:03,020 In other words, you have a window of 1136 01:06:03,020 --> 01:06:05,780 size 2 to the twentieth. 1137 01:06:05,780 --> 01:06:10,860 You can encode any point in there with 20 binary digits. 1138 01:06:10,860 --> 01:06:14,740 The 20 binary digits say how far do you have to go back to 1139 01:06:14,740 --> 01:06:16,060 find this code word.
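Here is a minimal Python sketch of this unary-binary code, plus the fixed-length encoding of the position (the exact mapping of the offset u onto log w bits is my own convention; the lecture only fixes its length):

def unary_binary_encode(n):
    """Positive integer n -> floor(log2 n) zeros, then n in binary."""
    assert n >= 1
    binary = bin(n)[2:]                  # base 2 expansion, leading 1 first
    return "0" * (len(binary) - 1) + binary

def unary_binary_decode(bits, pos=0):
    """Read one integer starting at bits[pos]; return (n, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":      # the prefix says how many bits follow
        zeros += 1
    start = pos + zeros
    return int(bits[start:start + zeros + 1], 2), start + zeros + 1

def encode_position(u, W):
    """Match position u, 1 <= u <= W, in exactly log2(W) bits
    (u = W wraps to all zeros; W is assumed to be a power of 2)."""
    w_bits = W.bit_length() - 1
    return format(u % W, "0{}b".format(w_bits))

So 1, 2, 3, 4 come out as 1, 010, 011, 00100, matching the codewords above, and a match costs about 2 log n bits for its length plus log w bits for its position.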
1140 01:06:19,660 --> 01:06:23,850 So first we're encoding n, by this unary-binary code, then 1141 01:06:23,850 --> 01:06:27,240 we're encoding u just with this simple-minded way of 1142 01:06:27,240 --> 01:06:29,780 encoding log w bits. 1143 01:06:29,780 --> 01:06:32,000 And that tells us where the match is. 1144 01:06:32,000 --> 01:06:40,110 The decoder goes back there, finds the match, pumps it out. 1145 01:06:40,110 --> 01:06:42,990 If n is equal to 1 (here's the answer to your question), you 1146 01:06:42,990 --> 01:06:45,840 encode the single letter without compression. 1147 01:06:45,840 --> 01:06:49,370 And that takes care of the case, either where you have a 1148 01:06:49,370 --> 01:06:52,890 match to that single letter, or there isn't any match to 1149 01:06:52,890 --> 01:06:55,980 the single letter. 1150 01:06:55,980 --> 01:06:58,140 The next thing, as you might imagine, is you set the 1151 01:06:58,140 --> 01:07:03,570 pointer to P plus n, because you've encoded n digits, and 1152 01:07:03,570 --> 01:07:04,770 you go to step two. 1153 01:07:04,770 --> 01:07:07,470 Namely, you keep iterating forever. 1154 01:07:07,470 --> 01:07:11,420 Until the source wears out, or until the encoder wears out, 1155 01:07:11,420 --> 01:07:13,370 or until the decoder wears out. 1156 01:07:13,370 --> 01:07:16,030 You just keep going. 1157 01:07:16,030 --> 01:07:17,790 That's all the algorithm is. 1158 01:07:17,790 --> 01:07:18,050 Yeah? 1159 01:07:18,050 --> 01:07:23,692 AUDIENCE: Can you throw out the first n bits, when you 1160 01:07:23,692 --> 01:07:27,852 reset the pointer, because you only have w bits that say 1161 01:07:27,852 --> 01:07:29,102 where n was for the next iteration? 1162 01:07:31,260 --> 01:07:35,730 PROFESSOR: No, I throw out the n oldest bits in the window. 1163 01:07:35,730 --> 01:07:37,570 AUDIENCE: Well, those are the first n bits. 1164 01:07:37,570 --> 01:07:40,790 PROFESSOR: Yes, the first n bits out of the window, and I 1165 01:07:40,790 --> 01:07:45,010 keep all of the more recent bits. 1166 01:07:45,010 --> 01:07:47,530 I tend to think of the first ones as the things closest to 1167 01:07:47,530 --> 01:07:53,600 the pointer, but you think of it either way, which is fine. 1168 01:07:53,600 --> 01:07:56,960 So as you do it, the window keeps sliding along. 1169 01:08:00,790 --> 01:08:02,700 That's what it does. 1170 01:08:07,650 --> 01:08:11,430 Why do you think this works? There's a partial 1171 01:08:11,430 --> 01:08:13,520 analysis in the notes. 1172 01:08:17,000 --> 01:08:20,530 I'd like to say a little bit about how that analysis is 1173 01:08:20,530 --> 01:08:25,450 cheating, because it's not quite a fair analysis. 1174 01:08:25,450 --> 01:08:29,460 If you look at the window, there are w different starting 1175 01:08:29,460 --> 01:08:32,830 points in the window. 1176 01:08:32,830 --> 01:08:34,630 So let's write this down. 1177 01:08:38,100 --> 01:08:40,780 w starting points. 1178 01:08:48,590 --> 01:08:56,860 So for any given n, there are w 1179 01:08:56,860 --> 01:09:02,050 strings of length n. 1180 01:09:07,180 --> 01:09:11,070 We don't know how long this match is going to be, but what 1181 01:09:11,070 --> 01:09:14,850 I would like to do, if I'm thinking of a Markov source, 1182 01:09:14,850 --> 01:09:20,810 is to say, OK, let's make n large enough so that the size 1183 01:09:20,810 --> 01:09:24,620 of the typical set is about w. 1184 01:09:24,620 --> 01:09:25,040 OK. 1185 01:09:25,040 --> 01:09:45,910 So choose n to be about log w divided by H of X given S.
And 1186 01:09:45,910 --> 01:09:50,530 the size of the typical set is then going to be 2 to the n, 1187 01:09:50,530 --> 01:09:52,660 wait a minute. 1188 01:10:00,400 --> 01:10:02,860 n times H of X given S is equal to log w. 1189 01:10:12,000 --> 01:10:16,460 So the size of the typical set, T sub epsilon, is 1190 01:10:16,460 --> 01:10:23,240 going to be, roughly, from what we said, 2 to the n times H of 1191 01:10:23,240 --> 01:10:32,550 X given S. So what I'm going to do is to set w equal to the 1192 01:10:32,550 --> 01:10:33,800 size of T of epsilon. 1193 01:10:39,090 --> 01:10:45,080 I'm going to focus on a match length which I'm hoping to 1194 01:10:45,080 --> 01:10:50,470 achieve, of log w over H of X given S. The typical set, 1195 01:10:50,470 --> 01:10:55,675 then, is of size 2 to the n times H of X given S. And if 1196 01:10:55,675 --> 01:11:00,380 the typical set is of this size, and I look at these w 1197 01:11:00,380 --> 01:11:04,030 strings in the window, yeah, I'm going to have some 1198 01:11:04,030 --> 01:11:08,380 duplicates but roughly I'm going to have a large enough 1199 01:11:08,380 --> 01:11:12,780 number of things in the window to represent all of these 1200 01:11:12,780 --> 01:11:13,990 typical strings. 1201 01:11:13,990 --> 01:11:15,660 Or most of them. 1202 01:11:15,660 --> 01:11:19,680 If I try to choose an n which is a little bigger than that, 1203 01:11:19,680 --> 01:11:22,540 let's call this n star. 1204 01:11:22,540 --> 01:11:26,190 If I try to make n a little bit bigger than this typical 1205 01:11:26,190 --> 01:11:29,970 match size, I don't have a prayer of a chance, because 1206 01:11:29,970 --> 01:11:34,340 the typical set then is just very much larger than w, so 1207 01:11:34,340 --> 01:11:38,360 I'd be very, very lucky if I found anything in the window. 1208 01:11:38,360 --> 01:11:40,310 So that can't work. 1209 01:11:40,310 --> 01:11:45,160 If I make n a good deal smaller, then I'm going to 1210 01:11:45,160 --> 01:11:49,030 succeed with great probability it seems, because I'm even 1211 01:11:49,030 --> 01:11:51,600 allowing for many, many duplicates of each of these 1212 01:11:51,600 --> 01:11:55,540 typical sets to be in the window. 1213 01:11:55,540 --> 01:11:57,610 So what this is saying is there ought to be some 1214 01:11:57,610 --> 01:12:00,600 critical length when the window is very large, a 1215 01:12:00,600 --> 01:12:04,050 critical match length, and most of the time the match is 1216 01:12:04,050 --> 01:12:08,310 going to be somewhere around this value here. 1217 01:12:08,310 --> 01:12:11,880 And as w becomes truly humongous, and as the match 1218 01:12:11,880 --> 01:12:13,990 size becomes large -- 1219 01:12:13,990 --> 01:12:17,360 you remember for these typical sets to make any sense, this 1220 01:12:17,360 --> 01:12:19,690 number has to be large. 1221 01:12:19,690 --> 01:12:22,230 And when this number gets large, the size of the typical 1222 01:12:22,230 --> 01:12:25,750 set is humongous. 1223 01:12:25,750 --> 01:12:29,770 Which says, that for this asymptotic analysis, a window 1224 01:12:29,770 --> 01:12:32,370 of 2 to the twentieth, probably 1225 01:12:32,370 --> 01:12:34,530 isn't nearly big enough. 1226 01:12:34,530 --> 01:12:38,530 So the asymptotic analysis is really saying, when you have 1227 01:12:38,530 --> 01:12:41,670 really humongous windows this is going to work.
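In symbols, the sizing argument, together with the bit count that comes next, looks like this (n star is the critical match length):

\begin{gather*}
n^{*} \approx \frac{\log_2 W}{H(X \mid S)}, \qquad
|T_\epsilon| \approx 2^{\,n^{*} H(X \mid S)} = W, \\
\text{bits per source symbol} \approx
\frac{\log_2 W + 2\log_2 n^{*}}{n^{*}}
\;\approx\; \frac{\log_2 W}{n^{*}} \;=\; H(X \mid S).
\end{gather*}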
1228 01:12:44,630 --> 01:12:47,390 You don't make windows that large, so you have to have 1229 01:12:47,390 --> 01:12:50,560 some faith that this theoretical argument is going 1230 01:12:50,560 --> 01:12:51,250 to work here. 1231 01:12:51,250 --> 01:12:54,950 But that tells you roughly what the size of these matches 1232 01:12:54,950 --> 01:12:56,280 is going to be. 1233 01:12:56,280 --> 01:13:05,020 If the size of the matches is that, and you use log w bits 1234 01:13:05,020 --> 01:13:09,390 plus 2 log n bits to encode each match, what happens? 1235 01:13:14,130 --> 01:13:15,890 Encode match. 1236 01:13:23,350 --> 01:13:36,180 You use log w, plus 2 log n star. 1237 01:13:36,180 --> 01:13:38,530 That's the number of bits it takes you, this is the number 1238 01:13:38,530 --> 01:13:42,770 of bits it takes you to encode what the match size is. 1239 01:13:42,770 --> 01:13:44,720 You still have to encode that. 1240 01:13:44,720 --> 01:13:48,790 This is the number of bits it takes you to encode where the 1241 01:13:48,790 --> 01:13:50,720 match occurs. 1242 01:13:50,720 --> 01:13:54,290 Now how big is this relative to this? 1243 01:13:54,290 --> 01:14:00,700 Well n star is on the order of log w, so we're taking log w 1244 01:14:00,700 --> 01:14:05,480 plus 2 times log of log of w. 1245 01:14:05,480 --> 01:14:08,860 So in an approximate analysis, you say, I don't even care 1246 01:14:08,860 --> 01:14:11,760 about that. 1247 01:14:11,760 --> 01:14:17,460 You wind up with encoding a match with log w bits. 1248 01:14:17,460 --> 01:14:22,530 So you encode n star symbols, you use log w bits to do it, how 1249 01:14:22,530 --> 01:14:27,300 many bits are you using per symbol? 1250 01:14:27,300 --> 01:14:43,725 H of X given S. That's roughly the idea of why the Lempel Ziv 1251 01:14:43,725 --> 01:14:46,020 algorithm works. 1252 01:14:46,020 --> 01:14:48,880 Can anybody spot any problems with that analysis? 1253 01:14:48,880 --> 01:14:49,210 Yeah. 1254 01:14:49,210 --> 01:14:55,643 AUDIENCE: You don't know the probabilities beforehand, so 1255 01:14:55,643 --> 01:15:00,490 how do you pick w? 1256 01:15:00,490 --> 01:15:02,350 PROFESSOR: Good one. 1257 01:15:02,350 --> 01:15:09,320 You pick w by saying, I have a computer which will go at a 1258 01:15:09,320 --> 01:15:10,820 certain speed. 1259 01:15:10,820 --> 01:15:14,390 My data rate is coming in at a certain speed, and I'm going 1260 01:15:14,390 --> 01:15:17,690 to pick w as large as I can keep up with. 1261 01:15:17,690 --> 01:15:19,710 With the best algorithm I can think of for 1262 01:15:19,710 --> 01:15:23,010 doing string matching. 1263 01:15:23,010 --> 01:15:26,690 And string matching is not a terribly hard thing to do, but it's not a 1264 01:15:26,690 --> 01:15:29,100 terribly easy thing to do either. 1265 01:15:29,100 --> 01:15:30,810 So you make w as large as you can. 1266 01:15:33,830 --> 01:15:36,480 And if it's not large enough, tough. 1267 01:15:36,480 --> 01:15:39,760 You got matches which are somewhat smaller -- all this 1268 01:15:39,760 --> 01:15:43,830 argument about typical sets still works except for the 1269 01:15:43,830 --> 01:15:48,690 epsilons and deltas that are tucked in there. 1270 01:15:48,690 --> 01:15:51,560 So it's just that the epsilons and the deltas get too big 1271 01:15:51,560 --> 01:15:54,410 when your strings are not long enough. 1272 01:15:54,410 --> 01:15:54,740 Yeah? 1273 01:15:54,740 --> 01:15:59,931 AUDIENCE: So your w is just make your processing time 1274 01:15:59,931 --> 01:16:01,160 equal the time [UNINTELLIGIBLE]?
1275 01:16:01,160 --> 01:16:03,550 PROFESSOR: Yeah. 1276 01:16:03,550 --> 01:16:05,290 That's what determines w. 1277 01:16:05,290 --> 01:16:08,690 It's how fast you can do a string search over this long, 1278 01:16:08,690 --> 01:16:10,410 long window. 1279 01:16:10,410 --> 01:16:13,950 You're not going to just search everything one by one, 1280 01:16:13,950 --> 01:16:16,300 you're going to build some kind of data structure there 1281 01:16:16,300 --> 01:16:19,940 that makes these searches run fast. 1282 01:16:19,940 --> 01:16:21,870 Can anybody think of why you might not want 1283 01:16:21,870 --> 01:16:23,350 to make w too large? 1284 01:16:28,150 --> 01:16:31,010 This isn't a theoretical reason, this is a 1285 01:16:31,010 --> 01:16:31,790 more practical thing. 1286 01:16:31,790 --> 01:16:33,993 AUDIENCE: Is it that if the probabilities change, it's 1287 01:16:33,993 --> 01:16:35,370 slow to react to those changes? 1288 01:16:35,370 --> 01:16:37,110 PROFESSOR: If the probabilities change, it's 1289 01:16:37,110 --> 01:16:41,900 slow to react to them, because it's got this humongous window 1290 01:16:41,900 --> 01:16:46,300 here, and it's not until the window fills up with all of 1291 01:16:46,300 --> 01:16:53,240 this new stuff, that it starts to work well. 1292 01:16:53,240 --> 01:16:57,150 And before it fills up, you're using an effectively small 1293 01:16:57,150 --> 01:17:00,910 window, but you're using a number of bits which is 1294 01:17:00,910 --> 01:17:04,250 proportional to log of a large window, and therefore 1295 01:17:04,250 --> 01:17:06,370 you're wasting bits. 1296 01:17:06,370 --> 01:17:09,450 So that's another thing that determines how big you 1297 01:17:09,450 --> 01:17:11,630 want w to be -- 1298 01:17:11,630 --> 01:17:14,040 I mean the main thing that determines it is just that you 1299 01:17:14,040 --> 01:17:15,940 can't run that fast. 1300 01:17:15,940 --> 01:17:19,130 Because you'd like to make it pretty big. 1301 01:17:19,130 --> 01:17:21,090 Another question. 1302 01:17:21,090 --> 01:17:25,140 How about this matter of wasting w symbols at the 1303 01:17:25,140 --> 01:17:27,560 beginning to fill up the window? 1304 01:17:27,560 --> 01:17:28,810 What do you do about that? 1305 01:17:31,420 --> 01:17:33,220 I mean, that's a pretty stupid thing, right? 1306 01:17:37,900 --> 01:17:40,340 Anybody suggest a solution to that? 1307 01:17:40,340 --> 01:17:44,720 If you're building this yourself, how 1308 01:17:44,720 --> 01:17:47,400 would you handle it? 1309 01:17:47,400 --> 01:17:50,970 It's the same argument as if the statistics change in 1310 01:17:50,970 --> 01:17:52,220 mid-stream. 1311 01:17:54,010 --> 01:17:56,890 I mean, you don't measure that the statistics have changed, 1312 01:17:56,890 --> 01:17:59,060 and throw out what's in the window. 1313 01:18:05,610 --> 01:18:06,000 What? 1314 01:18:06,000 --> 01:18:08,366 AUDIENCE: Can you not assume that you already have a 1315 01:18:08,366 --> 01:18:09,040 typical sequence? 1316 01:18:09,040 --> 01:18:10,810 PROFESSOR: Yes, and you don't care whether 1317 01:18:10,810 --> 01:18:12,730 it's right or wrong. 1318 01:18:12,730 --> 01:18:16,240 You could assume that the typical sequence is all zeros, 1319 01:18:16,240 --> 01:18:18,570 so you fill up the window with all zeros. 1320 01:18:18,570 --> 01:18:21,340 The decoder also fills it up with all zeros, because this 1321 01:18:21,340 --> 01:18:23,750 is the way you always start.
1322 01:18:23,750 --> 01:18:25,840 And then you just start running along, looking for 1323 01:18:25,840 --> 01:18:28,300 matches and encoding things. 1324 01:18:28,300 --> 01:18:35,530 And as w builds up, you start matching things. 1325 01:18:35,530 --> 01:18:38,320 You could even be smarter, and know that your window wasn't 1326 01:18:38,320 --> 01:18:41,780 very big and let your window grow also. 1327 01:18:41,780 --> 01:18:45,990 If you wanted to really be fancy about this. 1328 01:18:45,990 --> 01:18:48,880 So if you want to encode this you can have a lot of fun, and 1329 01:18:48,880 --> 01:18:51,600 a lot of people over the years have had a lot of fun trying 1330 01:18:51,600 --> 01:18:53,570 to encode these things. 1331 01:18:53,570 --> 01:18:54,820 It's a neat thing to do.
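Pulling the steps together, here is a bare-bones Python sketch of the LZ77 loop as described in this lecture: fill the window raw, find the longest match of length at least 2 starting in the window (allowed to run past the pointer), send its length with the unary-binary code and its position in log w bits, and send single letters raw. The linear scan and the 8-bit raw letter code are toy choices of mine; a real implementation builds a search structure over the window, as discussed above. unary_binary_encode and encode_position are the sketches given earlier.

def lz77_encode(x, W):
    """Toy LZ77 encoder: x is the source string, W the window size."""
    raw = lambda s: format(ord(s), "08b")    # toy raw code: 8 bits per letter
    out = [raw(s) for s in x[:W]]            # step 1: first W letters, uncompressed
    P = W                                    # pointer: everything before P is encoded
    while P < len(x):
        # Step 2: largest n >= 2 such that x[P:P+n] matches a string
        # starting u places back, for some 1 <= u <= W.  The match source
        # may overlap x[P:] -- that's the hack from the example above.
        best_n, best_u = 0, 0
        for u in range(1, W + 1):
            n = 0
            while P + n < len(x) and x[P + n] == x[P - u + n]:
                n += 1
            if n > best_n:
                best_n, best_u = n, u
        if best_n >= 2:                      # step 3: length, then position
            out.append(unary_binary_encode(best_n))
            out.append(encode_position(best_u, W))
            P += best_n                      # step 4: slide pointer and window
        else:                                # n = 1: next letter sent raw
            out.append(unary_binary_encode(1))
            out.append(raw(x[P]))
            P += 1
    return "".join(out)

The decoder undoes a match by copying best_n symbols one at a time from u places back; copying one symbol at a time is exactly why the overlapping match in the earlier example decodes correctly.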