The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, I guess we're all set for getting close to the end, coming now to a race about whether we can say anything meaningful about martingales or not. But I think we can. I want to spend a little time today reviewing the Wald identity and also sequential tests.

It turns out that on the slides last time I didn't get the thresholds confused, but I did get hypothesis 0 and hypothesis 1 interchanged from the way we usually do them. It doesn't make any difference; there's no difference between hypothesis 0 and hypothesis 1, and you can do it either way you want to. But it gets very confusing when you switch from one to the other halfway through an argument. So I'm going to go through part of that again today. And we will get revised slides on the web, so that if you want to see them with the hypotheses done in a consistent way, you will see them there. That should be up by this afternoon, I hope.

OK, so let's go on and review what Mr. Wald said. He was talking about a random walk. A random walk starts with a sequence of IID random variables, and the walk itself is the sequence of partial sums of those random variables. The question is this: if the random walk is taking place and you have two thresholds, one at alpha and one at beta, with beta below 0 and alpha above 0, and you start at 0, then when do you cross one of these two thresholds, which threshold do you cross, what is the probability of crossing each one, and everything else you can say about this problem. It turns out this is a very major problem as far as stochastic processes are concerned, because it comes up almost everywhere. It certainly comes up as far as hypothesis testing is concerned.
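As a concrete picture of that setup, here is a minimal simulation sketch. It is not from the lecture; the Gaussian step distribution and the particular threshold values are arbitrary illustrative choices.

```python
# Minimal sketch of the two-threshold random walk described above (illustrative
# assumptions: Gaussian IID steps with negative mean, thresholds at +5 and -5).
import random

def stopping_trial(alpha, beta, step_mean=-0.2, step_sigma=1.0):
    """Run one walk S_n = X_1 + ... + X_n until it first crosses alpha or beta.
    Returns (J, S_J): the stopping time and the value of the walk at stopping."""
    s, n = 0.0, 0
    while beta < s < alpha:
        s += random.gauss(step_mean, step_sigma)  # one IID step X_n
        n += 1
    return n, s

if __name__ == "__main__":
    random.seed(1)
    trials = [stopping_trial(alpha=5.0, beta=-5.0) for _ in range(10_000)]
    frac_upper = sum(s >= 5.0 for _, s in trials) / len(trials)
    mean_J = sum(j for j, _ in trials) / len(trials)
    print("fraction of walks crossing the upper threshold:", frac_upper)
    print("estimate of the expected stopping time E[J]:", mean_J)
```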
In hypothesis testing it's probably the major problem, and when you get into sequential analysis it is the major problem. So it's a very important problem.

And what Wald said was this. Let the random variable J be the stopping time of this random walk, namely the time at which the walk first crosses either alpha or beta. Then, no matter what r you choose in the range of points where the moment generating function g_X(r) exists, you can pick any r in that range, and what you get is this strange-looking equality here.

And I pointed out last time that it just isn't all that strange, because suppose that, instead of using the stopping time of when you cross a threshold, you used as a stopping time just some particular n: you go for some number of steps, and then you stop. At that point you have the expected value of e^{r S_n}. The expected value of e^{r S_n} is, by definition, the moment generating function of S_n at r, which is exactly e^{n gamma(r)}; in other words, the expected value of e^{r S_n - n gamma(r)} is 1. So all we're doing here, all that Wald did, and as it turns out it was quite a bit, was to say that when you replace a fixed n with a stopping time, you still get the same result. We're stating it here just for the case of two thresholds. Wald stated it in much more general terms, and we'll use it in more general terms when we say more about martingales.

X, remember, is the underlying random variable, and S_n is the sum of the X's. Now suppose X bar is less than 0, and let r* be the r at which gamma(r) equals 0; it's the second root of gamma(r). Gamma(r), if you remember, looks like this. This is r* here. The slope here at 0 is the expected value of X, and we're assuming that X bar is less than 0 for this; I don't know how that greater-than-0 got onto the slide. And then what it says is that the probability that S_J is greater than or equal to alpha is less than or equal to e^{-alpha r*}. And last time, remember, we went through a long, messy bunch of equations for that.
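Written out in the notation the lecture is using (X_i IID, gamma(r) = ln E[e^{rX}], S_n = X_1 + ... + X_n, and J the time of the first threshold crossing), the identity and its fixed-n motivation are:

\[ \mathbb{E}\bigl[\exp\{\, r S_J - J\,\gamma(r) \,\}\bigr] = 1 \qquad \text{for any } r \text{ where } \gamma(r) \text{ exists}, \]
\[ \mathbb{E}\bigl[e^{r S_n}\bigr] = e^{n\gamma(r)} \quad\Longleftrightarrow\quad \mathbb{E}\bigl[\exp\{\, r S_n - n\,\gamma(r) \,\}\bigr] = 1 \qquad \text{for a fixed } n. \]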
I looked at it again, and this is just the simple old Markov inequality again. All you do to get this is to say, OK, think of the random variable e^{r S_J}. S_J is a very complicated random variable, but it's a random variable nonetheless. So e^{r S_J} is a random variable, and at r equal to r*, the expected value of that random variable is just 1. I'll write it down; it'll be easier. The expected value of e^{r* S_J} is equal to 1. And therefore the probability that e^{r* S_J} is greater than or equal to e^{r* alpha} is less than or equal to 1 over e^{r* alpha}, OK? And that's what the inequality says. So that's all there is to it.

OK, what?

AUDIENCE: I don't really see why these two [INAUDIBLE]? They don't [INAUDIBLE].

PROFESSOR: You need X bar negative so that you get another root, so that r* exists. If the expected value of X is positive, then r* is down here at negative r. I mean, you're talking about the other threshold, in a sense.

OK, this is valid for all lower thresholds. And it's also valid for no lower threshold at all. In other words, this equation here does not have beta in it at all. So this equation is an upper bound on the probability that you're going to cross that threshold at alpha, and that upper bound is valid no matter where you put the lower threshold. So you can go to the limit as the lower threshold goes to minus infinity, and this inequality should still be valid. You have a homework problem where you actually prove that. Sometimes when things go to infinity, funny things happen, and that proves that nothing funny happens here.

So what happens then is that the probability that you ever cross a threshold at plus alpha, when you have a random variable with a negative mean, is given by this exponent here. And we also sort of showed, by looking at the Chernoff bound, that this bound is pretty tight.
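In equation form, the one-line argument just given is the Markov inequality applied to the nonnegative random variable e^{r* S_J}:

\[ \mathbb{E}\bigl[e^{r^* S_J}\bigr] = 1 \quad \text{(Wald's identity at } r = r^*, \text{ where } \gamma(r^*) = 0\text{)}, \]
\[ \Pr\{S_J \ge \alpha\} \;=\; \Pr\bigl\{ e^{r^* S_J} \ge e^{r^*\alpha} \bigr\} \;\le\; \frac{\mathbb{E}\bigl[e^{r^* S_J}\bigr]}{e^{r^*\alpha}} \;=\; e^{-r^*\alpha}. \]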
So in other words, what this is saying is that when you're looking at threshold-crossing problems, this quantity here, the second root of gamma(r), is sort of the crucial parameter that you want to know. Usually the first things you want to know about a random variable are its mean, its variance, all sorts of things like that. This is saying that if you're interested in thresholds, forget about all those things and look at r*. If r* is positive, that means the mean is negative, so there's no problem there. But this one quantity is sort of the most important parameter in all of these problems.

OK, so let's go back and look at hypothesis testing again, where we're looking at the likelihood ratio, which is the ratio of the density under hypothesis 0 divided by the density under hypothesis 1. What you have is that you observe this sequence Y_n. These are the observations that you're taking. In other words, nature, at the beginning of this whole experiment, chooses either H equals 0 or H equals 1. At that point, you start to make measurements. Now, whether nature chooses H equals 0 before or after or when doesn't make any difference. The point is that the experiment consists of nature choosing one of these two hypotheses. You know all the probabilities that exist in the world in this model. You go on making these measurements. All you observe is these measurements; you don't observe what the hypothesis is. So you define this likelihood ratio as the ratio of the density of the vector Y with H equals 0 to the density of the vector Y with H equals 1. These quantities exist no matter what the a priori probabilities of the hypotheses are, or anything else. Even without all of that, so long as you have a model that tells you what the densities of these observations are, conditional on each hypothesis, you can define this. It doesn't depend on a priori probabilities at all.
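In symbols, with y = (y_1, ..., y_n) the observations, and the observations IID conditional on the hypothesis, the likelihood ratio being discussed is

\[ \Lambda_n(\mathbf{y}) \;=\; \frac{f_{\mathbf{Y}\mid H}(\mathbf{y}\mid 0)}{f_{\mathbf{Y}\mid H}(\mathbf{y}\mid 1)} \;=\; \prod_{i=1}^{n}\frac{f_{Y\mid H}(y_i\mid 0)}{f_{Y\mid H}(y_i\mid 1)}, \]

and it is defined whether or not a priori probabilities p_0 and p_1 are assigned to the hypotheses.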
OK, so now you look at the probability that H is equal to 0, given all these observations, divided by the probability that it's equal to 1. What you get here is the a priori probabilities, p0 over p1, and here is the likelihood ratio. So what you have is p0 over p1 times the likelihood ratio of this vector of however many observations you have observed. It's just a nice way of breaking the problem up into the likelihood ratio and the a priori probabilities.

Incidentally, we haven't talked about this at all, but there's an important idea in all of this hypothesis testing, that of a sufficient statistic. And what do you think a sufficient statistic is? It's anything from which you can calculate the likelihood ratio. In other words, the point we're making here is that any intelligent choice of hypothesis is based on a threshold test on the likelihood ratio. And therefore, the only thing you can really be interested in, in all your observations, is just: what is the likelihood ratio? You might make 1,000 complicated observations, but you calculate one number, and that's the only thing you're interested in. And anything from which that number can be calculated is a sufficient statistic. With anything from which it can't be calculated, you've thrown away some of the information that you have.

If you study communication and you study detection, how to receive data that's being sent, what you find is that right at the beginning, even before you do any detection, even before you do any filtering, there's some idea of a sufficient statistic there. That's what you need in order to calculate everything else, and you want to make sure that you have it. So that's an important idea.

OK, but anyway, the MAP rule, which comes right from this, says: if you have these a priori probabilities, and you're trying to maximize the probability of choosing correctly, what do you do?
Well, the probability that H equals 0 is the correct hypothesis, given all the observations you've made, is in fact this. The probability that H equals 1 is the correct hypothesis is this. What do you do if you want to maximize the probability of being correct? You choose the one which is biggest. In other words, what you do is look at this number. And if this number is bigger than 1, you choose 0; if it's less than 1, you choose hypothesis 1. And what it turns out to be is a threshold rule. You take this likelihood ratio and you compare it with p1 over p0. In this case, you select H equals 0; in this case, you select H equals 1. And last time I just had the 1 and the 0 reversed, which is fine, but if you reverse them in one place, you want to reverse them every place. And every other threshold test does something like this, except you replace p1 over p0 with some arbitrary threshold. Whatever reason you want to find for that threshold, this is the only intelligent kind of test you can make.

OK, then we define the log-likelihood ratio as the logarithm of the likelihood ratio. And that was nice, because it was a sum of this quantity related to the individual observations. For each observation, you really want to know f of Y given H, evaluated at y_i given 0, divided by the same density at y_i given 1. You want to divide those two and take the logarithm. Then you have those numbers, and the sufficient statistic that you're interested in is just the sum of those numbers. So you're looking at a sum of IID random variables.

IID? Why IID? Well, under the hypothesis that H is equal to 1, those Y_i's are IID, and therefore so are these quantities under the hypothesis that H is equal to 1. Now, little z_i is just a sample value. If you look at the random variable which has these sample values, Z_i, then under the probability measure corresponding to H equals 1, those Z_i's are IID.
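Putting the last two steps into symbols, the posterior ratio, the MAP threshold test, and the log-likelihood ratio are:

\[ \frac{\Pr\{H=0\mid \mathbf{y}\}}{\Pr\{H=1\mid \mathbf{y}\}} \;=\; \frac{p_0}{p_1}\,\Lambda_n(\mathbf{y}), \qquad \text{choose } \hat H = 0 \text{ if } \Lambda_n(\mathbf{y}) \ge \frac{p_1}{p_0}, \text{ else choose } \hat H = 1, \]
\[ z_i \;=\; \ln\frac{f_{Y\mid H}(y_i\mid 0)}{f_{Y\mid H}(y_i\mid 1)}, \qquad S_n \;=\; \sum_{i=1}^{n} z_i \;=\; \ln \Lambda_n(\mathbf{y}), \]

so every threshold test amounts to comparing the sum S_n with a constant.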
So what that says is that when you look at these sums of random variables, the sum of the Z_i from 1 to n, under hypothesis H equals 1, what do you get? You get a random walk. You get a sum of IID random variables. If you take more observations, S_n just changes; with n changing, you have a larger number of observations, so the random walk goes a little further out, and you might get closer to a threshold, or whatever. And that's what we're trying to do here.

OK, so the Z_i's under the hypothesis H equals 1 are IID, and the moment generating function of the Z_i's given H equals 1 is this. Let's be careful about this. The sample values of the Z_i do not depend on the hypotheses at all. Namely, you make an observation; you make an observation of Y_i, and you calculate Z_i from Y_i. That has nothing to do with whether H equals 0 or H equals 1. When you try to calculate this moment generating function, however, you want to know what the probability density of the Y's is. And you get a different probability density for H equals 1 than you get under the other hypothesis. If the observations behaved the same way under both hypotheses, it wouldn't make much sense to do the observation. Unless you have a government grant, and you're trying to get money out of the government instead of trying to do anything worthwhile. Under those circumstances, you keep on making observations; you know perfectly well that nothing is going to come from them. But otherwise, it's a little silly.

So this moment generating function under the hypothesis H equals 1 is given by this quantity here. And this density here is the same as this density here, so you get this density to the 1 minus r power, and you get this density to the r power. So you get the product of these two densities; you integrate it over y, and that's what gamma_1 of r is.

Now, I said that the really important thing in all of these threshold problems is: what is r*?
And for this problem, r* is trivial. It's always the same: r* is always equal to 1. And the reason is that when you set r equal to 1 here, this quantity becomes 1, and this quantity becomes the density of Y conditional on H equals 0. When you integrate that, you get 1. So for all of these hypothesis-testing problems, r* is equal to 1. Gamma_1 of 1 is equal to 0, and that's what this curve says. OK, this is gamma_1 of r here. The curve starts out here with negative slope, it comes up here, and r* is equal to 1 in this case. And that's sort of the end of the story for that.

Now, suppose you are doing a test with a fixed value of n. You say, I'm going to make n observations; it's all I have time for. The week is over. I'm going on vacation next week. I've got to stop this test; I've got to write my paper. So you take the fixed-n test, you write your paper, and what do you do? You go through the optimal test the best you can. And what you find is that, given H equals 1, an error is going to occur if the sum of random variables, namely the log-likelihood ratio, exceeds the logarithm of your threshold. This is whatever threshold you decide to establish. And we showed before that the probability that S_n is greater than or equal to the log of the threshold is evaluated as e to the n times this quantity right here. The probability of error given H equals 1 is this quantity here. The probability of error given H equals 1 is the probability that the data looks like H equals 0 was the right hypothesis. In other words, that you crossed the threshold at plus alpha instead of crossing the threshold at beta.

Excuse me. We have too many cases here we're looking at, so it gets confusing. What I'm looking at here is the probability that the log-likelihood ratio exceeds this threshold eta, whatever we set eta to be.
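For reference, the conditional semi-invariant MGF computed a moment ago, and the reason r* is always 1 for this kind of problem, can be written as:

\[ \gamma_1(r) \;=\; \ln \mathbb{E}\bigl[e^{rZ}\mid H=1\bigr] \;=\; \ln \int f_{Y\mid H}(y\mid 1)^{\,1-r}\, f_{Y\mid H}(y\mid 0)^{\,r}\, dy, \]
\[ \gamma_1(1) \;=\; \ln \int f_{Y\mid H}(y\mid 0)\, dy \;=\; \ln 1 \;=\; 0, \]

so the second root of gamma_1 sits at r* = 1 for every such hypothesis-testing problem.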
Eta is set depending on the cost of making errors of both types, on our a priori probabilities if we have any, and all of those things. And the probability of error given H equals 1 is this quantity here, which has the threshold in it over there. We've looked at that a number of times in lecture; we looked at it in chapter one, and then we looked at it again in chapter seven. And you calculate it by taking this moment generating function, drawing a tangent to it at the point where the slope is the natural log of eta divided by n, and then taking where that tangent comes in to this vertical axis here. And that's the exponent of the error probability when hypothesis 1 is correct.

Now, if the hypothesis is H equals 0 instead, then with H equals 0 the expected value of this log-likelihood ratio is going to be positive. The situation is going to be a curve that comes over here and comes back at some point here. And what we showed is that this curve is just a translation of this curve by 1. Namely, if you calculate the moment generating function for H equals 0, you get the same thing that we got before. I'm not going to go through all the details of this. Now you have 0 here instead of 1; over here you're going to have just minus r, and over here you're going to have 1 plus r. So this whole thing is translated by 1. The action happens here. But if you translate it by 1 over in this direction, what happens is that the error probability is determined by this. The exponent is this point right here, gamma_1 of r0 plus 1 minus r0 times log of eta over n. And r0 is again determined by the point at which the slope is equal to log eta over n.

We did that before, but I do want to make sure you understand it, because to really understand it, you have to go through the arithmetic yourselves at least once. And you can do that easily by following the notes, because they do it in almost excruciating detail. So that's the argument you get.
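Written out, the two fixed-n error exponents just described on the board are (with r_0 chosen so that the tangent to gamma_1 has slope ln(eta)/n):

\[ \Pr\{\text{error}\mid H=1\} \;\approx\; \exp\Bigl( n\bigl[\gamma_1(r_0) - r_0\,\gamma_1'(r_0)\bigr] \Bigr), \qquad \gamma_1'(r_0) = \frac{\ln\eta}{n}, \]
\[ \gamma_0(r) = \gamma_1(r+1) \quad\Longrightarrow\quad \Pr\{\text{error}\mid H=0\} \;\approx\; \exp\Bigl( n\Bigl[\gamma_1(r_0) + (1-r_0)\,\frac{\ln\eta}{n}\Bigr] \Bigr). \]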
We had this idea before, of the Neyman-Pearson principle, which says you don't assume a priori probabilities. You look at the probability of making an error as a trade-off between the error you make when H is equal to 1 and the error you make when H is equal to 0. In terms of the Chernoff bound, this trade-off is very clear. As you change the exponent that you want to get under H equals 1, this point moves, the tangent then moves, and the exponent over here moves. So you have this inverted seesaw: the exponent for one kind of error is over here, and the exponent for the other kind of error is over there.

Then the next thing we said was that this is really stupid, unless you're going on vacation this Friday. If you're not going on vacation this Friday, if you're really serious about making the right decision, then what you're going to do is keep on making observations until you're pretty sure you're right.

Now, somebody at the end of the lecture last time pointed out something, which is that when you do experiments and you keep on making observations until you get the data that you want, there's something very unethical about that. Is this that kind of unethical behavior, or is this really valid? Well, I claim this is valid, because what we're doing when we're doing sequential testing is deciding what we're going to do ahead of time. Namely, we've decided that what we're going to do is continue testing until we cross a threshold, and the threshold gives us a suitable probability of error. So we're not cooking the books at all. What we're doing is following this preset procedure we've set up. And the only question is: can we get a very small error probability by using a smaller number of observations, on the average, than what we would need otherwise?

Put it in terms of a communication system. In one kind of communication system, you have to send some data from one point to another. You're not going to get any feedback on it. You've got to get the data through the first time.
It's got to be right. What are you going to do? You're going to send this data a very large number of times, or use a very powerful coding technique on it. And by the time it gets through, you're going to be very sure you're right.

Now, a much better procedure, the thing which is used in almost all communication systems, the thing which we use as human beings all the time, the thing which control people use all the time, the thing which almost everybody uses, because most of us have common sense if we spend some time trying to do these things, is this: instead of trying to get it right the first time, we try a little bit to get it right the first time, and we make sure that if we don't get it right the first time, we have some way of finding out about it and getting it right the second time.

And in the scientific way of looking at it, what we do is decide ahead of time exactly what our procedure is going to be for making repetitions. This is something called ARQ in communication systems, which means automatic repeat request. It's automatic, which means you don't try to make your decision depending on whether you'd like to receive a 0 or you'd like to receive a 1. You make the decision ahead of time that if you have a clean enough answer, you're going to accept it, and if it looks doubtful, you're going to send it over again. That's exactly the same sort of thing we're doing here.

OK, when we do that, given H equals 1, we again have this S_n, as a function of n, as a random walk. It's a sum of IID random variables conditional on H equals 1, so you have a random walk. Conditional on H equals 1, you have a negative slope on this random walk: the random walk starts out and on the average is going to go down, and it's going to continue going down forever. And if you're looking for it to cross some positive threshold, if it doesn't cross it pretty soon, it's not going to cross it. But anyway, we have a test which says we have some positive threshold and we have some negative threshold.
If we ever cross the positive threshold, we say H is equal to 0. If we ever cross the negative threshold, we say H is equal to 1. And then we're done with it.

OK, now let me give you another argument for why that makes sense. I gave you one argument last time; I'll give you another argument this time. If S_J is greater than or equal to alpha, we're going to decide that 0 is the correct hypothesis. So if H equals 1 is the correct hypothesis, then we're going to make an error when S_J is greater than or equal to alpha. If S_J is less than or equal to beta, we're going to decide H equals 1. So conditional on H equals 1, an error is made if S_J is greater than or equal to alpha, and conditional on H equals 0, an error is made if S_J is less than or equal to beta. OK, so the probability of error conditional on H equals 1 is the probability that S_J is greater than or equal to alpha, given H equals 1, which is less than or equal to e^{-alpha r*}. This is the thing we said before: r* is the root of gamma(r), and gamma(r) is equal to this.

OK, so let's just make life a little easier for ourselves and assume that the a priori probabilities are each 1/2. This is also called maximum likelihood decision: you take this likelihood ratio, and you just decide on the basis of the likelihood ratio. OK, then at the end of trial n, the probability of H equals 0 given S_n, divided by the probability of H equals 1 given S_n, with the a prioris cancelling out, is just e^{S_n}. That's what it is. It's the likelihood ratio, and S_n is the log-likelihood ratio. So this is what it is. If you now use the fact that the probability of H equals 0 plus the probability of H equals 1, given S_n, is 1, this equation becomes this equation, and the probability of H equals 1 given S_n is just e^{-S_n} over 1 plus e^{-S_n}. Now, if S_n is a large number, e^{-S_n} is going to be totally trivial.
And the probability that H equals 1 given S_n is essentially e^{-S_n}. That means that when you can choose different values of n, this very directly gives you control over what the probability of error is. The probability of error is essentially e^{-S_n}. So if you choose a threshold alpha, what you're doing is guaranteeing that the probability of error cannot be more than about e^{-alpha}. OK, so this is more than just talking about averages. This is saying that if you use a threshold rule, then what you're doing is guaranteeing that the probability of error is never going to be more than this quantity specified here.

OK, we saw last time that the cost of choosing alpha to be large is that you have to make a very large number of trials, at least given H equals 0. Why don't I worry about the number of trials for H equals 1? I mean, it's nothing that takes much thought. If my thresholds are large, my probability of error is very small. The expected number of trials, when the upper threshold is very large, is determined almost entirely by H equals 0. Under H equals 1 you sometimes make a mistake, because something very, very unusual happens, but that has very little influence on the expected number of tests you're making.

So what happens then is that the expected number of tests you make, under the hypothesis that H is equal to 0, and now we're using Wald's equality rather than Wald's identity, is equal to the expected value of S_J given H equals 0, divided by the expected value of Z given H equals 0. Z is the log-likelihood ratio of one trial. This is just Wald's equality with this conditioning thrown into it. Now, what's the expected value of S_J given H equals 0? It's essentially alpha, and if you want to be more careful, it's alpha plus the expected overshoot given H equals 0. And that's divided by the expected value of Z given H equals 0. This is the answer we got last time.
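Here is a minimal simulation sketch of this sequential test under an assumed Gaussian model that is not from the lecture: H=0 gives Y ~ N(+a, 1) and H=1 gives Y ~ N(-a, 1), for which the log-likelihood ratio of one observation works out to z = 2ay. It compares the estimated error probability with the e^{-alpha} bound, and the estimated expected number of observations with alpha divided by E[Z | H=0].

```python
# Sketch of the sequential threshold test described above, under the assumed
# Gaussian model H=0: Y ~ N(+a, 1) versus H=1: Y ~ N(-a, 1), so that one
# observation contributes z = ln f(y|0) - ln f(y|1) = 2*a*y to the sum S_n.
import math
import random

def sequential_test(h, a=0.5, alpha=4.0, beta=-4.0):
    """Run one test with true hypothesis h (0 or 1).
    Returns (decision, J): the hypothesis chosen and the number of observations."""
    mean = a if h == 0 else -a
    s, n = 0.0, 0
    while beta < s < alpha:
        y = random.gauss(mean, 1.0)   # one observation Y_n
        s += 2 * a * y                # z_n, the log-likelihood ratio of Y_n
        n += 1
    return (0 if s >= alpha else 1), n

if __name__ == "__main__":
    random.seed(2)
    under_h1 = [sequential_test(h=1) for _ in range(20_000)]
    under_h0 = [sequential_test(h=0) for _ in range(20_000)]
    err_h1 = sum(d == 0 for d, _ in under_h1) / len(under_h1)
    mean_j_h0 = sum(n for _, n in under_h0) / len(under_h0)
    print("Pr{error | H=1} estimate:", err_h1, "  bound e^{-alpha}:", math.exp(-4.0))
    print("E[J | H=0] estimate:", mean_j_h0, "  roughly alpha / E[Z|H=0] =", 4.0 / (2 * 0.5**2))
```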
So the positive threshold alpha essentially determines the number of tests you have to make when the hypothesis is H equals 0. So the funny thing which is happening here is that as you change alpha, you're changing the probability of error for hypothesis H equals 1, and you're changing the number of tests you're going to have to do when H is equal to 0. When you change beta, it's just the opposite: if you make beta a very large negative number, you have to make an enormous number of tests under the circumstance that H is equal to 1, and what you're changing is the probability of an error when H is equal to 0. So the trade-off is between the number of trials under one hypothesis and the error probability under the other hypothesis.

That's almost all we wanted to say about Wald's identity, but there's one other huge thing that we want to talk about. If you take the first two derivatives of Wald's identity at r equals 0, you get some interesting things coming out. I mean, you can use Wald's identity at any value of r you want to. When you use it for a large value of r, you get an interesting result about large deviations. When you use it at a small value of r, you get something more about typical cases. So, looking at it at r equals 0, what you want to do is take the derivative of Wald's identity with respect to r. This expected value in here we know is equal to 1; it's equal to 1 whatever value of r we choose. And therefore, when we take the derivative of this, we have to get 0. But we also want to take the derivative of it to see what we get. So you take the derivative of this quantity here, and you don't worry about what exists and what doesn't exist. You have to take the derivative here, so you get an S_J there. You take the derivative here, and you get a gamma prime of r there. So you get S_J minus J times gamma prime of r, and this e to the what-have-you just sits there.
You take the derivative of e to something, and you never get rid of the e to something; you just get stuff piled up in front of it. OK, so when we evaluate that at r equals 0, what happens? Well, what's the value of gamma prime of 0? It's the expected value of X, yes. And this exponential quantity here is equal to 1, so we can forget about it: when r is equal to 0, this is equal to 0, and gamma of r is equal to 0, so the whole thing in the exponent is 0, and e to the 0 is 1. So we've got a 1 there, and we're left with: the expected value of S_J minus J times X bar is equal to 0. What is that? That's Wald's equality. So Wald's equality falls out of Wald's identity as what happens when you take the derivative of Wald's identity at r equals 0.

Well, since we were so successful with that, let's go on and take another derivative. Yes?

AUDIENCE: I guess you want the final equal to 0 [INAUDIBLE].

PROFESSOR: Oh, the final "equal to 0" comes from the fact that this quantity here that you're starting with is equal to 1 for all values of r. Therefore, when I take the derivative with respect to r, I get 0. So that's one equation. The other thing is that I just go through the mechanics of taking the derivative.

OK, so let's try to take the second derivative. Take the second derivative by taking the derivative of the first derivative. And what happens then is that from this quantity in here I get an extra factor of that sitting over there, and along with that, I get the derivative of this with respect to r. I should probably have written that down, but since I didn't, let me see if I can do it. I get the expected value of S_J minus J gamma prime of r, and this quantity is squared now, because I have this there; I'm taking the derivative of this term with respect to r. And also, I have to take the derivative of this with respect to r, and that gives me minus J times gamma double prime of r.
And all of this times e to the r S_J minus J gamma of r. Now I want to evaluate this at r equals 0. Evaluating this at r equals 0, the exponential term goes away. So I wind up with the expected value of the quantity S_J minus J gamma prime of r, squared, minus J gamma double prime of r, being equal to 0.

Well, this doesn't look bad. But if you try to use it, if you expand this term here, you get a term with the expected value of S_J times J, and you can struggle with that, and it's ugly. That's very ugly. But now, if the mean is nonzero, we can use Wald's equality; it tells us what we want to know. If the mean is zero, then Wald's equality doesn't tell us anything, but this is going to tell us something. So we're going to make the assumption here, at r equal to 0, that X bar is equal to 0. And if X bar is equal to 0, gamma prime of 0 is equal to 0, and gamma double prime of 0 is equal to sigma squared of X. So you do all of that, and what you get is that the expected value of S_J squared, minus sigma squared of X times the expected value of J, is equal to 0.

This is the same kind of thing that we got from Wald's equality. Wald's equality didn't tell us everything; it just gave us a relationship between the expected value of S_J and the expected value of J. This is doing the same thing: it's giving us a relationship between the expected value of S_J squared and the expected value of J. So we get the same kind of quantity; it's doing the same thing for us.

Now look at this for a zero-mean simple random walk. You would have thought, before you started to take this class, that a simple random walk with mean 0 was the simplest thing in the world. And we've seen, by looking at it, that it really isn't all that simple: you can play these silly games where you gamble forever, where with probability 1/2 you lose $1 and with probability 1/2 you win $1, a perfectly fair game, and with probability 1 you make $1 out of that, and quit and go home.
And since you can make $1 out of it, and quit and go home, you can then quickly come back and do it again. You can make $2, you can make $10, you can make $1,000, you can make $1 million, with probability 1. So the simple random walk is no longer simple; it becomes puzzling. But Wald's identity is dealing with two thresholds, one at alpha and one at beta.

When we apply this to the simple random walk, where you either go up by 1 or go down by 1, each with probability 1/2, the mean of X is 0 and the variance of X is 1, so this quantity here is 1. You can then play games with what the probability is that you hit the upper threshold and the probability that you hit the lower threshold. It's done in the text; you don't have to take my word for it. And when you do that, what you find is that the expected value of J is equal to minus beta times alpha. Beta is a negative number, remember, so the expected value of J is the magnitude of beta times the magnitude of alpha.

Now, that's a little bizarre, but then you think about it a little bit; you think about what happens. And this is really exact; this isn't an approximation or anything. If alpha is very large, and beta is very large and negative, and you play this random walk game, you're going to fluctuate for a long time. You're going to disperse slowly, according to the square root of n, the number of steps you take. So the amount of time it takes you to get way out to these thresholds, namely the value that n has to have, should be roughly the square of alpha, when beta and alpha have the same magnitude. But this is something more general than that. It says that in the stop-when-you're-ahead game, where we make alpha equal to 1, the expected value of J depends on what the lower threshold is.
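To see where E[J] = minus beta times alpha comes from for this walk (there is no overshoot, since the steps are plus or minus 1): write q for the probability of ending at alpha. Wald's equality and the second-derivative identity give

\[ \mathbb{E}[S_J] = q\,\alpha + (1-q)\,\beta = 0 \;\;\Longrightarrow\;\; q = \frac{-\beta}{\alpha - \beta}, \]
\[ \mathbb{E}[J] = \frac{\mathbb{E}[S_J^2]}{\sigma_X^2} = q\,\alpha^2 + (1-q)\,\beta^2 = -\beta\,\alpha, \]

so with alpha = 1, the expected time to stop is just the magnitude of beta.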
And that suddenly makes sense, because what that's saying is, if we have a lower threshold at minus 10 and an upper threshold at 1, then most of the time you win. But when you lose, you lose $10. When you win, you win $1. When you set the lower threshold at minus 100, when you lose, you lose $100. When you win, you win $1. And suddenly that stop-when-you're-ahead game does not look quite as attractive as it did before. What you're doing is taking a chance where you're probably going to win $1, and you're risking your life's assets for it, which doesn't make much sense anymore. This, I think, gives you a better idea of what's going on in the simple random walk than anything else I've seen.

So it's time to start talking about martingales. A martingale, like most of the other things we've been talking about in the course, is a sequence of random variables. This is a more general kind of sequence than most of them. Almost all of the processes we've talked about so far have been the kinds of things you can sort of get your hands on. This one is defined very abstractly in terms of a peculiar property that it has. And the peculiar property is that the expected value of the nth term in this thing called a martingale, conditional on knowing the values of all the previous terms, the expected value of Z sub n given Z sub n minus 1, Z sub n minus 2, all the way down to Z1, is equal to Z sub n minus 1. Namely, the expected value here is whatever you had there.

The word martingale comes from gambling, where gamblers used to spend a great deal of time trying to find gambling strategies: when to stop, when to start betting bigger, when to start betting smaller, all sorts of strategies for how to lose less money. Let me put it that way, because you rarely find the opportunity where you can play a fair game. But if you do play a fair game, martingales are, in a sense, the rules that govern it. And what that says is, if you play this game for a long time, your capital is Z sub n minus 1.
This says: your expected capital after you play one more time, no matter what strategy you use, is going to be the same as the actual capital you had the time before.

If this is too abstract for you, well, it's too abstract for me half the time, because I look at this and say, gee, that's not much of a restriction, is it? We're only talking about expected values here. But it's more than that, because it's saying that for every choice of sample values for all of these previous terms, none of them make any difference except the last one. And that's what happens in gambling. It doesn't make any difference how your capital has gotten to the point where it is at time n minus 1. You make a fair bet, and what you win is solely a function of what you've bet, if the game is fair. And that's what this is saying.

So when you write it out this way, the expected value of Zn, given that the random variable Zn minus 1 has a particular sample value zn minus 1, and so on all the way back to the particular value you started out with, is equal to the value you had at the last time. And this is true for all sample values zn minus 1 down to z1, which is why it's a much stronger statement than it appears to be.

Now there's a lemma. I want to talk about that a little bit, because it's a good time to get you used to what these expected values mean. For a martingale, the expected value of Zn given Zi, Zi minus 1, all the way down to Z1, is equal to Z sub i. In other words, it's not only that your expected capital, given all of the past, is equal to what you had at the last time instant. If you're not given anything for 100 years back, and all you know is what your capital was 100 years ago, and if we think we've been playing a fair game all of this time, which is of course always a question, the expected value of what we have now, conditional on everything from 100 years back through recorded history, is just that last term.
In other words, it's the same kind of isolation of the past from the future as we had with Markov chains. With Markov chains, remember, it's only the state at one instant that matters; given what happens at that one instant, the past is independent of the future. Here it's not quite that way, because the past and the future are separated only in terms of the details of the past and the expected value of the future. It says that the expected value of Z sub n, given all of the details of the past, no matter what those details are, is equal to the actual value at time i, namely Z sub i.

So I want to prove this for you, and I warn you, you're not going to follow this proof. And that's part of the reason for me to do it, because I want you to go back to Chapter 1 and think it through. Because in dealing with martingales, you have to think this through. If you don't think it through, you're stuck with this conditional-expectation notation all the way along. The notation is nice when you get confused, but you don't want to use it all the time. So you have to be able to go through arguments like this.

What I want to show is that if the expected value of Z3, given Z1 and Z2, is equal to Z2, then one special case of this lemma is that the expected value of Z3 given Z1 is equal to Z1. And how do we show that? Well, what we do is use the law of total expectation. First, we remember that the expected value of an arbitrary random variable X is the expected value of the expected value of X given Y. Now what does that mean? The expected value of the random variable X, given Y, is a random variable. It's a random variable which depends on Y; it's a function of the sample value of Y. Namely, if you look at this quantity up here: the expected value of X given Y equals 1, the expected value of X given Y equals 2, the expected value of X given Y equals 3. We have all of these values, and we have a probability measure on them. So this is a random variable, which is a function of Y.
You've averaged over X, but you're left with Y because of the conditioning here. So this quantity in here is now a function of Y.

So when we take this equation and we add the conditioning on Z1, using it for Z3 and Z2, the expected value of Z3 given Z1 is equal to the expected value over Z2 of the expected value of Z3 given Z2 and Z1, the whole thing conditional on Z1. OK, so what it says is that this expected value is the expected value of the expected value of Z3 conditional on Z2 and Z1. This inner quantity here is a random variable. It's a function of what random variables?

AUDIENCE: Z2, Z1.

PROFESSOR: Z1 and Z2, yes. So this is a function of Z1 and Z2. What value does it take as a function of Z1 and Z2? It's just equal to Z2. So this quantity in here is Z2. So we're asking, what's the expected value of Z2 given Z1? And by the definition of a martingale, it is equal to Z1.

Now I imagine about half of you could follow that, half of you couldn't, and half of you sort of followed it. This is the kind of argument we'll be using all the way through on this stuff, so make sure you understand it. Once you get it, it's easy, and you can apply it in all sorts of places, so it's worth doing. In the same way, you can follow the same kind of argument through for the expected value of Z sub i plus 2, using total expectation based on Z sub i plus 1, and so on through the whole thing. When you go all the way down to i equals 1, it says the expected value of Z sub n is equal to the expected value of Z1. If you want to become wealthy, have a wealthy parent who left you a lot of money 20 years ago. The easiest way to make a million dollars is to start out with 2 million dollars, is the way some people put it.

OK, let's have some simple examples of martingales. One of them is a zero-mean random walk. Mainly, what I'm trying to do here is to show you that martingales are really pretty general things.
And since there are many very general theorems that hold for all martingales, you can then apply them to all of these special cases, which is kind of neat.

For a zero-mean random walk, let Z sub n be the sum X1 plus all the way up to Xn, where the X sub i's are IID and zero mean. The fact that they're IID makes it a random walk. The fact that they're zero mean makes it a special, zero-mean random walk. Now look at the expected value of Z sub n, given Zn minus 1 all the way back. Zn is Xn plus Zn minus 1; Zn is the sum of all these random variables, so you add up n minus 1 of them and then you add the last one in. So it's the expected value of Xn plus Zn minus 1, given all the stuff before that. The expected value of Xn, given all this stuff, is what? Xn is independent of all the other X's, therefore it's independent of all the earlier Z's. So that's just the expected value of Xn, which is 0. Then we have the expected value of Zn minus 1, given Zn minus 1 back to Z1. What's the expected value of Zn minus 1, given Zn minus 1? Well, it's Zn minus 1. No problem there. So the whole thing is equal to Zn minus 1, as it's supposed to be. All of these things you ought to go back and think through yourself, because the first time you look at martingales, all of this stuff is pretty easy, but it all looks a little strange at first.

The next one is sums of arbitrary dependent random variables. Well, they're not quite arbitrary. Suppose you have a sequence of random variables X sub i, i greater than or equal to 1, and they satisfy the condition that the expected value of Xi, given all the earlier Xi's, is equal to 0. It's similar to what a martingale is, but here we're just saying each Xi has conditional expected value 0 given the past. Then Zn, the sum of these, turns out to be a martingale. And I'm not going to write out a proof of that; the proof is really the same as the previous one.
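[An editor's aside, not part of the lecture: a small simulation sketch of the zero-mean random walk example, with illustrative names and parameters of my own. It groups sample paths by the value of Z_i and checks that the conditional average of a later Z_n is close to Z_i, which is both the martingale property and the lemma proved above.]

```python
import random
from collections import defaultdict

def lemma_check(i=3, n=6, trials=200000, seed=2):
    """For the zero-mean walk Z_n = X_1 + ... + X_n with X_k = +-1 equiprobable,
    estimate E[Z_n | Z_i = z] for each observed value z (illustrative sketch)."""
    random.seed(seed)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(trials):
        z_i = sum(random.choice((-1, 1)) for _ in range(i))
        z_n = z_i + sum(random.choice((-1, 1)) for _ in range(n - i))
        sums[z_i] += z_n
        counts[z_i] += 1
    return {z: sums[z] / counts[z] for z in sorted(counts)}

for z, cond_mean in lemma_check().items():
    # each conditional mean should come out close to z itself
    print(f"E[Z_6 | Z_3 = {z:+d}] is approximately {cond_mean:+.3f}")
```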
This is really a pretty important thing, because given any martingale, you can always look at the differences between the terms of the martingale. Namely, given Z1, Z2, up to Z sub n, you can always look at Z2 minus Z1, at Z3 minus Z2, at Z4 minus Z3, and so forth. And each of the Z's is just the sum of those difference random variables. So given any martingale in the world, you can always define a set of arbitrary dependent random variables that satisfy this rule here. So Zn, in this case, is a martingale, and if Zn is a martingale, you can always define a set of random variables which satisfy this property.

I think it's almost easier to see what a martingale really has to do with gambling, which is where it started, by looking at this. X sub i is not your capital at time i; it's how much you win or lose at time i. And what the condition is saying is that your winnings or losses at time i have zero mean, independent of everything in the past. In other words, in a fair game you can bet whatever you want to, and no matter what you bet, your expected gain on that trial is zero. That's what this says. It says, essentially, that you're playing a fair game. So martingales really have to do with fair games. If you can find fair games, why, that's great. But we always look for games where we have an edge. What you want to avoid is games where Las Vegas has an edge. OK, so that's a general one.

Here's an interesting one, because I think this is an example which you can use. In any field you study, there are always generic examples which can be used to generate counterexamples to any simple thing you might want to think of. And this is, to me, the most interesting one of those for martingales. Suppose that Xi is the product of two random variables: U sub i, which is either plus 1 or minus 1, each with probability 1/2, and Y sub i, which is anything it wants to be. I don't care what Y sub i is.
Y sub i is non-negative; we might as well make it non-negative. I don't care about it. I don't care how it's related to all the other Y sub i's. All I want is that the U sub i's are all independent of all the Y sub i's. And what happens then? I take the expected value of X sub i, given anything in the past, and what do I get? U sub i is independent of Y sub i. And therefore, the expected value of U sub i times Y sub i factors into the expected value of U sub i times the conditional expected value of Y sub i. And the expected value of U sub i is what? Plus 1 or minus 1, with probability 1/2 each, so the expected value of U sub i is equal to 0. That makes the expected value of X sub i equal to 0, whatever the past is. So you automatically have the, I don't know what to call them, the terms between the terms of a martingale, the interarrival terms, so to speak, like those for a renewal process. Those terms always have mean 0. And therefore, the sums of these turn out to be this simple kind of martingale. So that's a nice martingale to use for counterexamples to almost anything.

The next one is product-form martingales. Product-form martingales are things we use quite a bit too, because when we're using generating functions, we're in the habit of multiplying things together, and that's a useful thing to do. So look at the expected value of Z sub n, given Z sub n minus 1 down to Z1, where Zn is a product of terms. Z sub n, then, is equal to Xn times Z sub n minus 1, which is what we're doing here. The expected value of Zn conditional on the past is the expected value of Xn times Zn minus 1, conditional on the past. Now, the expected value of Xn for any given value of Zn minus 1, all the way back, is just the expected value of X sub n. So we have the expected value of X sub n times the expected value of Z sub n minus 1, given Zn minus 1 down to Z1, and that's just Zn minus 1. Ah, the missing quantity; fortunately I wrote it here: the X sub i's are unit-mean random variables.
And they're IID; they're independent of each other. And since the X sub i's are independent of each other, X sub n is independent of Xn minus 1, all the way back to X1. Zn minus 1 back to Z1 is a function of Xn minus 1 down to X1, so Xn is independent of all those previous Z's also. That's why I could split this apart in this way. And suddenly I wind up with Zn minus 1 again. So product-form martingales work.

Here's a special case of product-form martingales. This, again, is a favorite counterexample for when you can and can't get away with going to limits and interchanging limits and expectations. And it's a simple one. Suppose the X sub i's are IID, as in the previous example, and they take the values 2 or 0, each with probability 1/2. This is a game you often play: double or nothing. You start out with a dollar. You play the game. If you win, you have $2. If you lose, you're broke. If you win, you play your $2. If you win again, you have $4. If you lose, you're broke. You play again. If you win, you have $8. If you lose, you're broke. So the probability that Z sub n, which is your capital after n trials, is equal to 2 to the n, namely that you've won all n times, is 2 to the minus n. In every other instance you've lost, so the probability that Zn is equal to 0 is 1 minus 2 to the minus n.

So for each n, if you calculate the expected value of Z sub n, it's equal to 1. Namely, with probability 2 to the minus n, your capital is 2 to the n, so that contributes 1. With all the other probability, you have nothing. So the expected value of Z sub n is always equal to 1. That's what this product-form martingale says, and this is a product-form martingale. However, the limit as n goes to infinity of Zn is equal to 0 with probability 1. If you play double or nothing, eventually you lose. And then you're wiped out.
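[An editor's aside, not part of the lecture: a tiny simulation sketch of this double-or-nothing martingale, with illustrative names and trial counts. For moderate n the sample average of Z_n stays near 1 while the fraction of wiped-out paths climbs toward 1; for larger n the rare surviving path makes the sample average noisy, which is itself a symptom of how a single huge payoff holds the expectation at 1.]

```python
import random

def double_or_nothing(n, trials=200000, seed=3):
    """Estimate E[Z_n] and P(Z_n = 0) for the double-or-nothing martingale,
    where Z_n = X_1 * ... * X_n and each X_i is 2 or 0 with probability 1/2."""
    random.seed(seed)
    total, broke = 0.0, 0
    for _ in range(trials):
        z = 1.0
        for _ in range(n):
            z *= 2 if random.random() < 0.5 else 0
        total += z
        broke += (z == 0)
    return total / trials, broke / trials

for n in (1, 4, 8, 12):
    mean, p_zero = double_or_nothing(n)
    print(f"n={n:2d}  sample mean of Z_n = {mean:.3f}  P(Z_n = 0) = {p_zero:.4f}")
```

The exact values are E[Z_n] = 1 for every n and P(Z_n = 0) = 1 minus 2 to the minus n, so the simulation is only a sanity check on the interchange-of-limits point made next.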
In other words, there's no real purpose to playing the game, because eventually you lose. If you're playing with somebody else and they're playing double or nothing, then of course you get their money eventually. Or you go broke, and the bank that you bank at fails, and all that stuff. We won't worry about that. But the point of this is that the limit as n goes to infinity of Zn is equal to 0 with probability 1, while the limit of the expected value of Z sub n is equal to 1. So the expected value of the limit of Zn is 0, and the limit of the expected value of Zn is equal to 1. This is a case where you can't interchange limit and expectation. It's an easy one to keep in mind, because we all know about playing double or nothing.

We might as well define submartingales and supermartingales, because the first thing to know about them is that they're like martingales, except they're defined with inequalities. For a submartingale, the expected value of Zn, given all the previous terms, is greater than or equal to Zn minus 1. So submartingales go up. Supermartingales are the opposite: supermartingales go down. What else could you expect from a mathematical theory? Things that should go up, go down. Things that should go down, go up. The only thing you have to remember about submartingales and supermartingales is to figure out what terminology should have been used, and then remember that the terminology they use is the opposite of what they should have used. I don't know whether I've ever seen stupider terminology than this. Someone once explained the reasoning for it, and the reasoning was stupid too. So there's no excuse for that one.

We're only going to refer to submartingales in what we're doing, partly because that's where most of the neat results are.
And the other thing is, if you have to deal with a supermartingale, what you might as well do, instead of dealing with the sequence Z1, Z2, and so forth, is deal with the sequence minus Z1, minus Z2, and so forth. If you change the sign on all the terms, then you change supermartingales into submartingales and vice versa. So you don't really have to deal with both of them.

Let me talk briefly about an inequality that I'm sure most of you have heard of. How many people have heard of Jensen's inequality? Maybe half of you, so not everyone. Well, it's one of the main workhorses of probability theory. Even though we haven't seen it yet this term, you will see it many times.

So, what is a convex function? A convex function, in simple-minded terms, a real-valued convex function from R into R, is a function which has a nonnegative second derivative everywhere. So it curves down and comes back up again. Since you also want to talk about functions which don't have second derivatives, you want something more general than that. So you move away from derivatives, go back to your high school ideas, and draw a picture. A function is convex if all the tangents to the curve lie beneath the curve, not necessarily strictly below, so that wherever you draw a tangent, you get something which doesn't cross the curve.

The magnitude of X is a convex function of X. The magnitude of X, as a function of X, goes down and then back up. This side goes off to infinity and that side goes off to infinity, so there's no way for a tangent to cross the curve. You have one tangent here, a bunch of tangents along here, and one tangent there, and they all lie below the curve. So the magnitude of X is indeed a convex function.
And Jensen's inequality says that if H is convex, and if Z is a random variable with finite expectation, then H of the expected value of Z is less than or equal to the expected value of H of Z. You can interchange expected value and function, with an inequality like this, if the function is convex.

Now, why is this true? You can see why it's true almost automatically if you're dealing with a random variable that has only two values. If Z has two sample values, one here and one there, look at what the expected value of H of Z is. The expected value of H of Z is the average of H at this point and H at that point, with the appropriate probabilities put on them, so it's a point that lies on the straight line, the chord, between here and there. When you look at H of the expected value, you first find the expected value of Z along the axis, and then it's the point on the curve there. Since the curve is convex, the chord lies above the curve: H of the expected value of Z sits on the curve at the average, while the expected value of H of Z sits on the chord above it. So you get this boosting up everywhere. It's like saying that the absolute value of the expected value of Z is less than or equal to the expected value of the absolute value of Z.

And I think I will stop there instead of going on, because we had a lot of new things today. And somehow sequential detection always wears one's mind out in a short period of time. That should be enough.
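[Editor's aside, appended after the lecture: a tiny numerical sketch of Jensen's inequality for the two-value case described above. The distribution and the convex function chosen here are my own illustration.]

```python
# Check H(E[Z]) <= E[H(Z)] for a convex H and a two-valued Z (illustrative).
def H(z):
    return abs(z)          # |z| is convex

values = [-3.0, 2.0]       # two sample values of Z
probs = [0.4, 0.6]         # their probabilities

mean_z = sum(p * z for p, z in zip(probs, values))
mean_Hz = sum(p * H(z) for p, z in zip(probs, values))

print("H(E[Z]) =", H(mean_z))     # |(-3)(0.4) + (2)(0.6)| = 0.0
print("E[H(Z)] =", mean_Hz)       # 0.4*3 + 0.6*2 = 2.4
assert H(mean_z) <= mean_Hz       # Jensen's inequality holds
```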