The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, I guess we might as well start a minute early, since those of you who are here are here. We're coming to the end of the course. We're deep in chapter 7 now, talking about random walks and detection theory, and we'll get into martingales sometime next week. There are four more lectures after this one. The schedule was passed out at the beginning of the term; I don't know how I did it, but I somehow left off the last Wednesday of class. The final is going to be on Wednesday morning at the ice rink. I don't know what the ice rink is like. It doesn't sound like an ideal place to take a final, but I assume they must have desks there and all that. We will send out a notice about it. This is the last homework set that you will have to turn in. We will probably have another set of practice problems, but not things you should turn in. We will try to get solutions out on them fairly quickly also, so you can do them but then look at the answers right after you do them.

OK, so let's get back to random walks, and remember what we were doing last time. A random walk, by definition: you have a sequence of IID random variables, and you have partial sums of those random variables. S sub n is the sum of the first n of those IID random variables, and the sequence of partial sums S1, S2, S3, and so forth is called a random walk. If you graph the random walk, it's something which usually wanders up and down. If the mean of X is positive, it wanders off to infinity. If the mean of X is negative, it wanders off to minus infinity. If the mean of X is 0, it simply diffuses as time goes on.
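To make that concrete, here is a minimal simulation sketch in Python. The Gaussian step distribution and all parameter values are illustrative assumptions, not anything fixed by the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_walk(n, mean=0.0, std=1.0):
    """Partial sums S_1, ..., S_n of n IID Gaussian steps (assumed distribution)."""
    x = rng.normal(mean, std, size=n)  # IID increments X_1, ..., X_n
    return np.cumsum(x)                # S_n = X_1 + ... + X_n

# Positive mean drifts up, negative mean drifts down, zero mean diffuses.
for m in (+0.2, -0.2, 0.0):
    print(m, random_walk(1000, mean=m)[-1])
```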
And what we're trying to find is exactly how these things work. So our focus here is going to be on threshold-crossing problems. Namely, what's the probability that this random walk is going to cross some threshold by, or at, some particular value of n? If you have two thresholds, one above and one below, what's the probability it's going to cross the one above? What's the probability it's going to cross the one below? If it crosses one of these, when does it cross it? And if it crosses, how much of an overshoot is there? All of those problems come up naturally when you look at a sum of IID random variables. But here we're going to try to study them in some consistent manner, looking particularly at the thresholds.

We've talked a little bit about two particularly important applications. One is G/G/1 queues. And even far more important than that is this question of detection, or making decisions, or hypothesis testing, all of which are the same thing. You remember we did show that there was at least one threshold-crossing problem that was very, very easy. It's the threshold problem where the underlying random variable is binary: you either go up by 1 or you go down by 1 on each step. And the question is, what's the probability that you will ever cross some threshold at some k greater than 0? It turns out that since you can only go up 1 each time, the probability of getting up to some point k is the probability you ever got up to 1; given that you got up to 1, the probability that you ever got up to 2; given that you got up to 2, the probability you ever got up to 3. That doesn't mean that you go directly from 2 to 3. After you get to 2, you wander all around, and eventually you make it up to 3. If you do, then the question is whether you ever get from 3 to 4, and so forth. And we found that the solution to that problem was p over 1 minus p, to the k-th power, when p is less than or equal to 1/2.
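Here is a quick Monte Carlo check of that formula, written as a sketch: the trial count and the finite horizon (which can only approximate "ever crosses") are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(1)

def prob_ever_reach(k, p, trials=10000, horizon=2000):
    """Estimate P(walk with steps +1 w.p. p, -1 w.p. 1-p ever reaches k)."""
    hits = 0
    for _ in range(trials):
        steps = rng.choice([1, -1], size=horizon, p=[p, 1 - p])
        if np.cumsum(steps).max() >= k:
            hits += 1
    return hits / trials

p, k = 0.4, 3
print(prob_ever_reach(k, p))   # Monte Carlo estimate
print((p / (1 - p)) ** k)      # exact: (2/3)^3, about 0.296
```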
And we solved that problem, if you remember, back when we were talking about stopping when you're ahead, when you're playing coin tossing with somebody.

So let's go further and look particularly at this problem of detection, and decisions, and hypothesis testing, which is really not a particularly hard problem. But it's made particularly hard by statisticians, who have so many special rules, peculiar cases, and almost mythology about making decisions. And you can imagine why. Because as long as you talk about probability, everybody knows you're talking about an abstraction. As soon as you start talking about making a decision, it suddenly becomes real. I mean, you look at a bunch of data and you have to do something. You look at a bunch of candidates for a job; you have to choose one. That's always very difficult, because you might not choose the right one. You might choose a very poor one. But you have to do your best. If you're investing in stocks, you look at all the statistics of everything, and finally you say, that's where I'm going to put my money. Or if you're looking for a job, you say, that's where I'm going to work, and you hope that it's going to work out well. There are all these situations where you can evaluate probabilities until you're sick in the head. They don't mean anything. It's only when you make a decision and actually do something with it that it really means something. So it becomes important at this point.

The model we use for this, since we're studying probability theory-- well, actually, we're studying random processes, but we're really studying probability theory; you've probably noticed that by now. Since we're studying probability, we study all these problems in terms of a probabilistic model. And in the probabilistic model, there's a discrete and, in most cases, binary random variable, H, which is called the hypothesis random variable. The sample values of H, you might as well call them 0 and 1.
That's the easiest thing to call binary things. They're called the alternative hypotheses. They have marginal probabilities, because it's a probability model: you have a random variable, it can only take on the values 0 and 1, so it has to have probabilities of being 0 and 1. Along with that, there are all sorts of other random variables. The situation might be as complicated as you want. But since we're making decisions, we're making decisions on the basis of some set of alternatives. And here, since we're trying to talk about random walks, and martingales, and things like that, we also restrict our attention to particular kinds of observations. The particular kind of observation that we restrict attention to here is a sequence of random variables, which we call the observation. You observe Y1, you observe Y2, you observe Y3, and so forth. In other words, you observe a sample value of each of those random variables; there's a whole sequence of them. And we assume, to make life simple for ourselves, that each of these is independent conditional on the hypothesis, and identically distributed conditional on the hypothesis. That's what this says right here.

This makes one more assumption: it assumes that these observations are continuous random variables. That doesn't make much difference. There are just a few peculiarities that come in if these are discrete random variables. There are also a few peculiarities that come in when they're continuous. And there are a lot of peculiarities that come in when they're absolutely arbitrary. But for the time being, just imagine that each of these is a continuous random variable. So for each value of n, we look at n observations. We can calculate the probability density that those observations would occur conditional on hypothesis 0. We can find the probability density that they would occur conditional on hypothesis 1. And since they're IID, that's equal to this product here.
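In symbols, the product on the slide is the usual conditionally IID factorization, with f denoting the common conditional density of each observation:

```latex
f_{\,Y_1 \cdots Y_n \mid H}(y_1, \ldots, y_n \mid \ell)
  \;=\; \prod_{i=1}^{n} f_{\,Y \mid H}(y_i \mid \ell),
  \qquad \ell \in \{0, 1\}.
```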
Excuse me, they are not IID; they are conditionally IID, conditional on the hypothesis. Namely, the idea is that the world is one way or the world is another way. If the world is this way, then all of these observations are IID. You're doing the same experiment again and again and again, and it's based on the same underlying hypothesis. Or the underlying hypothesis is this one over here. You make the observations all based on this same hypothesis, and you make as many of these IID observations, conditional on that hypothesis, as you choose. And when you're all done, what do you do? You have to make your decision.

OK, so this is a very simple-minded model of this very complicated and very important problem. But it's close enough to the truth that we can get a lot out of it.

Now, I spent a lot of time last time talking about this, and I'll spend a lot of time this time talking about it, because when we use a probability model for this, when we say that we're studying probability theory and therefore we're going to use probability, we have suddenly allied ourselves completely with people called Bayesian statisticians, or Bayesian probabilists. And we have turned our back on people called non-Bayesians, or sometimes classical statisticians. I hate using the word "classical" because I like the word "classics," and I don't like using it for such an unusual point of view. And the unusual point of view is this: they refuse to take a probability model for the hypothesis. They accept the fact that all the observations are probabilistic; we assume we have a nice model for them, which makes sense. We can do whatever we want with that model. We can change the model. But if you once assume that these two hypotheses that you're trying to choose between have a priori probabilities, then people get very upset about it. Because they say, well, if you know what the a priori probabilities are, why do you have to do a hypothesis test?
You already understand everything there is to know about the problem. And they feel this is very strange.

It's not strange, because you use probability models. You use models to try to understand certain things about reality. And you assume as many things as you want to assume about it. And when you get all done, you either use all the assumptions or you don't use them. What we're going to find today is that when you use this assumption of a probability model, you can answer the questions that these classical statisticians go to great pains to answer, and you can answer them very, very simply. After we assume the a priori probabilities, we can calculate certain things which don't depend on those a priori probabilities. And therefore, we know several things. One, we know that if we did know the a priori probabilities, it wouldn't make any difference. Two, we know that if we can estimate the a priori probabilities, it makes a great deal of difference. And three-- and this is the most important point-- you make 100 observations of something. Somebody else says, I don't believe you, and comes in and makes another 100 observations. Somebody else makes another 100 observations. Now, even if the second person doesn't believe what the first person has done, it doesn't make sense as a scientist to completely eliminate all of that from consideration. Namely, what you would like to do is say, well, since this person has found such and such, the a priori probabilities have changed. And then I can go on and make my 100 observations. I can either make a hypothesis test based on my 100 observations, or I can make a hypothesis test assuming that the other person did their work well; I can make it based on all of these observations. If you try to do those two things in a classical formulation, you run into a lot of trouble. If you try to do them in this probabilistic formulation, it's all perfectly straightforward.
Because you can either start out with a model in which you're taking 200 observations, or you can start out with a model in which you take 100 observations; and then suddenly the world changes, this hypothesis takes on perhaps a different value, and you take another hundred observations. So you do whatever you want to within a probabilistic formulation.

But the other thing is, all of you have patiently lived with this idea of studying probabilistic models all term long. You might as well keep on living with it. The fact that we're now interested in making decisions should not make you think that everything you've learned up until this point is baloney. And to move from here to a classical statistical formulation of the world would really be saying, I don't believe in probability theory. It's that bad. So here we go.

I'm sorry, we did that. We were there. Assume that on the basis of observing a sample value of this sequence of observations, we have to make a decision about H. We have to choose H equals 0 or H equals 1. We have to detect whether or not H is 1. When you do this detection, you would think, in the real world, that you've detected something; that if you've made a decision about something, you've tested a hypothesis and found what's correct. Not at all. When you make decisions, you can make errors. And the question of what kinds of errors you're making is a major part of trying to make decisions. I mean, those people who make decisions and then can't believe that they might have made the wrong decision are the worst kind of fools. And you see them in politics. You see them in business. You see them in academia. You see them all over the place. When you make a decision and you've made a mistake, you get some more evidence. You see that it's a mistake, and you change. The whole 19th century was taken up with-- I mean, the scientific community was driven by physicists in those days.
And the idea of Newton's laws was the most sacred thing they had. Everybody believed in Newtonian mechanics in those days. When quantum mechanics came along, this wasn't just a minor perturbation in physics. This was a most crucial thing. This said, everything we've known goes out the window. We can't rely on anything anymore. But the physicists said, OK, I guess we made a mistake. We'll make new observations. We have new observations that can be made. We now see that Newtonian mechanics works over a certain range of things; it doesn't work in other ranges of things. And they went on and found new things. That's the same thing we do here. We take these models. We evaluate our error probabilities. And having evaluated them, we then say, well, we've got to go on and take some more measurements. Or we say we're going to live with it. But we face the fact that there are errors involved. And in doing that, you have to take a probabilistic model. If you don't take a probabilistic model, it's very hard for you to talk honestly about what error probabilities are.

So both ways-- well, I'm preaching, and I'm sorry. But I've lived for a long time with many statisticians, many of whom get into my own field and cause a great deal of trouble. So the only thing I can do is urge you all to be cautious about this, and to think the matter through on your own. I'm not telling you to take my point of view on it. I'm telling you, don't take other people's point of view without thinking it through.

The probability experiment here really-- I mean, every probability model we view in terms of the real world as: you have this set of probabilities, a set of possible events. You do the experiment. There's one sample point that comes out. And after the one sample point comes out, then you know what the result of the experiment is. Here, the experiment consists both of what you normally view as the experiment--
Namely, taking the observations. And it also involves a choice of hypothesis. Namely, there's not a correct hypothesis to start with. The experiment involves God throwing his dice. Einstein didn't believe that God threw dice, but I do. And after the dice are thrown, one or the other of these hypotheses turns out to be true. All of the observations point to that one, or they point to the other, and you make a decision. OK, so the experiment consists both of choosing the hypothesis and of taking a whole sequence of observations.

Now, the other thing not to forget in this-- because you really have to get this model in your mind, or you're going to get very confused with all the things we do-- is that the experiment consists of a whole sequence of observations, but only one choice of hypothesis. Namely, you do the experiment; there's a hypothesis that occurs, and there's a whole sequence of observations which are all IID conditional on that particular hypothesis. So that's the model we're going to be using.

And now life is quite simple once we've explained the model. We can talk about the probability that H is equal to either 0 or 1, conditional on the sample point we've observed. It's equal to the a priori probability of that hypothesis, times the density of the observation conditional on the hypothesis, divided by a normalization factor: namely, the overall probability of that observation, period, which is the sum of the probability that 0 is the correct hypothesis times the density given 0, plus the probability that 1 is the correct hypothesis times the density given 1. This denominator here is a pain in the neck, as you can see. But you can avoid ever dealing with the denominator, because if you take this expression for H equals 0 and divide it by the same expression for H equals 1, the denominators cancel. So the ratio of the probability that H equals 0 given y, to the probability that H equals 1 given y, is just this ratio here.
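Written out, this is just Bayes' law applied to the observations, with p_l for the a priori probability of hypothesis l:

```latex
p_{H \mid \mathbf{Y}}(\ell \mid \mathbf{y})
  \;=\; \frac{p_\ell \, f_{\mathbf{Y} \mid H}(\mathbf{y} \mid \ell)}
             {p_0 \, f_{\mathbf{Y} \mid H}(\mathbf{y} \mid 0)
              + p_1 \, f_{\mathbf{Y} \mid H}(\mathbf{y} \mid 1)},
\qquad
\frac{p_{H \mid \mathbf{Y}}(0 \mid \mathbf{y})}{p_{H \mid \mathbf{Y}}(1 \mid \mathbf{y})}
  \;=\; \frac{p_0 \, f_{\mathbf{Y} \mid H}(\mathbf{y} \mid 0)}
             {p_1 \, f_{\mathbf{Y} \mid H}(\mathbf{y} \mid 1)}.
```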
Now, what's the probability of error if we make a decision at this point? If I've seen this particular sequence y, this quantity here is, in fact, the probability that hypothesis 0 is correct, in the model that we have chosen. So this is the probability that H is equal to 0 given y. If we select 1 under these conditions-- if we select hypothesis 1, if we make a decision and say, I'm going to guess that 1 is the right decision-- then this is the probability that you've made a mistake, because this is the probability that H is actually 0 rather than 1. And this other quantity here is the probability that you've made a mistake when 1 is the correct hypothesis.

So here we are, sitting with these probabilities of error. We don't have to do any calculations for them. Well, you might have to do a great deal of calculation to compute this density and this density, but otherwise, the whole thing is just sitting there for you. So what do you do if you want to minimize the probability of error? This was the probability that you're going to make an error if you choose 1. This is the probability of error if you choose 0. If we want to minimize the probability of error when we see the observation y, we want to pick the one of these which is largest. And that's all there is to it. This is the decision rule that minimizes the probability of an error. It's based on knowing what P0 and P1 are. But otherwise, the probability that H equals l is the correct hypothesis, given the observation, is the probability that H equals l given y. We maximize the a posteriori probability of choosing correctly by choosing the maximum over l of the probability that H equals l given y. Choosing this way, maximizing the a posteriori probability, is called the MAP rule: Maximum A posteriori Probability. You can only use the MAP rule if you assume that you know P0 and P1. And we do know P0 and P1 if we've selected a probability model.
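As a minimal sketch of the MAP rule in Python: the Gaussian observation model (unit variance, mean -1 under H = 0 and mean +1 under H = 1) and the a priori probabilities below are illustrative assumptions of mine, not values from the lecture.

```python
import numpy as np

p0, p1 = 0.7, 0.3  # assumed a priori probabilities

def gauss_pdf(y, mean):
    """Unit-variance Gaussian density (the assumed observation model)."""
    return np.exp(-(y - mean) ** 2 / 2) / np.sqrt(2 * np.pi)

def map_decision(y):
    """Pick the hypothesis with the larger a posteriori probability."""
    like0 = gauss_pdf(y, -1.0).prod()  # f(y | H=0), conditionally IID
    like1 = gauss_pdf(y, +1.0).prod()  # f(y | H=1)
    return 0 if p0 * like0 > p1 * like1 else 1

print(map_decision(np.array([0.3, -0.2, 1.1])))
```

Notice that the denominator of Bayes' law never appears: comparing p0 times f(y given 0) with p1 times f(y given 1) is enough, which is exactly the point about avoiding the normalization factor.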
So when we select this probability model, we've already assumed what these a priori probabilities are. So we now make our observation, and after making our observation, we make a decision. And at that point, we have an a posteriori probability that each of the hypotheses is correct.

Does anybody have any issues with this? I mean, it looks painfully simple when you look at it this way. And if it doesn't look painfully simple, please ask now, or forever hold your peace, as they say. Yeah?

AUDIENCE: Can you explain how you get the equation on the first line?

PROFESSOR: On the first line right up here? Yes, I used Bayes' law.

AUDIENCE: So what is that? So that's P of A given B is equal to P of B given A--

PROFESSOR: Yes.

AUDIENCE: I don't quite see how to--

PROFESSOR: P of A given B is equal to P of B given A, times P of A, divided by P of B. If you take this over there, then it's-- am I stating Bayes' law in a funny way?

AUDIENCE: So the thing on the bottom is P of B? OK.

PROFESSOR: What?

AUDIENCE: OK, I get it.

PROFESSOR: I mean, I might not have explained it well.

AUDIENCE: [INAUDIBLE]

PROFESSOR: If you start out with P of A given B equal to P of B given A times P of A divided by P of B, this quantity here on the bottom is P of y. So we have the probability that H equals l, times the probability of y given l, divided by the probability of y to start with. OK, so you maximize the a posteriori probability by choosing the maximum of these. It's called the MAP rule. And it doesn't require you to calculate this denominator, which is sometimes a mess. All it requires you to do is compare these two quantities up here.

AUDIENCE: It's 10 o'clock.

PROFESSOR: Well, excuse me. Yes. Yes, I know.
These things become clearer if you state them in terms of what's called the likelihood ratio. The likelihood ratio only works when you have two hypotheses. When you have two hypotheses, you call the ratio of one conditional density to the other the likelihood ratio. Why do I put 0 up here and 1 down here? Absolutely no reason at all; it's just convention. And unfortunately, it's a convention that not everybody follows. So some people have one convention and some people have another convention. If you want to use the other convention, just imagine switching 0 and 1 in your mind. They're both just binary numbers.

Then, when you want to look at this MAP rule, the MAP rule is choosing the larger of these two things, which we had back here. That's choosing whether this is larger than this, or vice versa, which is choosing whether this ratio here is greater than the ratio of P1 to P0. So that's the same thing. So the MAP rule is: calculate the likelihood ratio for the given observation y; if it's greater than P1 over P0, you select H equals 0; if it's less than or equal to P1 over P0, you select H equals 1. Why do I put the strict inequality on one side and include equality on the other? Again, no reason whatsoever. When you have equality, it doesn't make any difference which you choose, so you could flip a coin. It's a little easier if you just say we're going to do this under this condition. So we state the condition this way. We calculate the likelihood ratio, we compare it with a threshold-- the threshold here is P1 over P0-- and then we select something.

Why did I put a little hat over this?

AUDIENCE: Estimation.

PROFESSOR: What?

AUDIENCE: Because it's an estimation.

PROFESSOR: What?

AUDIENCE: It's an estimation?

PROFESSOR: Well, it's not really an estimation; it's a detection. I mean, estimation you usually view as being analog.
Detection you usually view as being digital. And thanks for bringing that up, because it's an important point. In this model, H is either 0 or 1 as the result of this experiment. We don't know which it is. This is what we've chosen. So H hat equals 0 does not mean that H itself is 0. This is our choice; it might be wrong or it might be right.

Many decision rules, including the most common and the most sensible, are rules that compare lambda of y to a fixed threshold, say eta equals P1 over P0, which is independent of y-- just a fixed threshold. The decision rules then vary only in the way that you choose the threshold. Now, what happens as soon as I call this eta instead of P1 over P0? My test becomes independent of these a priori probabilities that statisticians have thought about for so long. Namely, after a couple of lines of fiddling around with these things, suddenly all of that has disappeared. We have a threshold test. The threshold test says: take this ratio-- everybody agrees that such a ratio exists-- and compare it with something. If it's bigger than that something, you choose 0. If it's less than that thing, you choose 1. And that's the end of it.

OK, so we have two questions. One, do we always want to use a threshold test, or are there cases where we should use things other than a threshold test? And the second question is, if we're going to use a threshold test, where should we set the threshold? There's nothing that says that you really want to minimize the probability of error. I mean, suppose your test is to see whether-- well, something in the news today. You'd like to take an experiment to see whether your nuclear plant is going to explode or not. So you come up with one decision: it's not going to explode. Or the other decision: you decide it will explode. Presumably, on the basis of that decision, you do all sorts of things; the general form of the test is written out below.
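For reference, the general threshold test under discussion has the following form, with the threshold eta left as a free parameter; eta = p1/p0 recovers the MAP rule, and other choices of eta give other tests:

```latex
\Lambda(\mathbf{y}) \;=\;
  \frac{f(\mathbf{y} \mid H = 0)}{f(\mathbf{y} \mid H = 1)};
\qquad
\hat{H} \;=\;
\begin{cases}
  0, & \Lambda(\mathbf{y}) > \eta,\\
  1, & \Lambda(\mathbf{y}) \le \eta.
\end{cases}
```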
Do you really want to make it a maximum a posteriori probability decision? No. You recognize that if it's going to explode, and you choose that it's not going to explode and you don't do anything, there is a humongous cost associated with that. If you decide the other way, there's a pretty large cost associated with that also. But there's not really much comparison between the two. So anyway, you want to do something which takes those costs into account. One of the problems in the homework does that. It's really almost trivial to readjust this problem so that you set the threshold to involve the costs also. So if you have arbitrary costs on making errors, then you change the threshold a little bit. But you still use a threshold test.

There's something called maximum likelihood that people like for making decisions. Maximum likelihood says: you calculate the likelihood ratio, and if the likelihood ratio is bigger than 1, you go this way; if it's less than 1, you go that way. It's the MAP test when the two a priori probabilities are equal. But in many cases, you want to use it whether or not the a priori probabilities are equal. It's a standard test, and there are many reasons for using it, aside from the fact that the a priori probabilities might be chosen that way. So anyway, that's one other choice.

When we go a little further today, we'll talk about the Neyman-Pearson test. The Neyman-Pearson test says: for some reason or other, I want to make sure that the probability that my nuclear plant blows up is less than, say, 10 to the minus fifth. Why 10 to the minus fifth? Pulled out of the air. Maybe 10 to the minus sixth; at that point, our probabilities don't make much sense anymore.
But however we choose it, we choose our test to say: given that we can't let the probability of error under one hypothesis be bigger than some certain amount alpha, what test will minimize the probability of error under the other hypothesis? Namely, if I have to get one thing right almost all the time, what's the best I can do on the other alternative? And that's the Neyman-Pearson test. That is a favorite test among the non-Bayesians, because it doesn't involve the a priori probabilities anymore. So it's a nice one in that way. But we'll see that we get it anyway, using a probability model.

OK, let's go back to random walks just a little bit, to see why we're doing what we're doing. The logarithm of the likelihood ratio-- the logarithm of this lambda of y, where I'm now taking m observations and putting that in explicitly-- is the sum from n equals 1 to m of the logs of the individual ratios. In other words, under hypothesis 0, if I calculate the probability density of the vector y given H equals 0, I'm finding the probability of m things which are IID. So this probability density is the product of the probability densities of each of the observations. Most of you know by now that any time you look at a probability which is a product over observations, what you'd really like to do is take the logarithm of it, so that you're dealing with a sum of things rather than a product of things, because we all know how to add independent random variables. So the log of this likelihood ratio, which is called the log likelihood ratio as you might guess, is just a sum of these individual log likelihood ratios. If we look at this for each m greater than or equal to 1, then given H equals 0, it's a random walk; and given H equals 1, it's another random walk. It's the same sequence of sample values in both cases. Namely, as experimentalists, we're taking these observations. We don't know whether H equals 0 or H equals 1 is the result of the experiment.
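In symbols, this is the identity being pointed at: the log likelihood ratio of m observations is a partial sum of terms which are IID conditional on either hypothesis, and hence a random walk in m:

```latex
\ln \Lambda(y_1, \ldots, y_m)
  \;=\; \sum_{n=1}^{m} \ln \frac{f(y_n \mid H = 0)}{f(y_n \mid H = 1)}
  \;=\; \sum_{n=1}^{m} z_n ,
```

where z_n is the log likelihood ratio of the single observation y_n.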
But what we do know is what those values are, and we can calculate this sum. Now, if we condition on H equals 0, then this quantity has a particular probability of occurring. So this is a random variable under the hypothesis H equals 0, and it's a random variable under the hypothesis H equals 1. And this sum of random variables behaves in a very different way under these two hypotheses. What's going to happen is that under one hypothesis, the expected value of this log likelihood ratio is going to increase linearly with n; under the other hypothesis, it's going to decrease linearly as we increase n. And a nifty test at that point is to say: as soon as it crosses a threshold up here, or a threshold down here, we're going to make a decision. That's called a sequential test, because you haven't specified ahead of time, I'm going to take 100 observations and then make up my mind. You've specified: I'm going to take as many observations as I need to be relatively sure that I'm making the right decision. Which is what you do in real life. I mean, there's nothing fancy about doing sequential tests. Those are the obvious things to do, except they're a little trickier to talk about using probability theory. But anyway, that's where we're headed. That's why we're talking about hypothesis testing: because when you look at it in this formulation, we get a random walk. And it gives us a nice example of when you want to use a random walk crossing a threshold as a way of making decisions. OK, so that's why we're doing what we're doing.

Now, let's go back and look at threshold tests again, and try to see how we're going to make threshold tests, what the error probabilities will be, and try to analyze them a little more than just saying, well, a MAP test does this. Because as soon as you see that a MAP test does this, you say, well, suppose I use some other test. What am I going to suffer from that? What am I going to gain by it?
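Here is a minimal sketch of such a two-threshold sequential test in Python. The Gaussian observation model and the threshold values are illustrative assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(2)

def sequential_test(sample, upper=5.0, lower=-5.0):
    """Accumulate log likelihood ratios ln f(y|H=0)/f(y|H=1) until one of
    two thresholds is crossed: decide H hat = 0 at the upper threshold,
    H hat = 1 at the lower. Assumes unit-variance Gaussians with mean -1
    under H=0 and mean +1 under H=1, for which the ratio reduces to -2y."""
    s = 0.0
    for n, y in enumerate(sample, start=1):
        s += -2.0 * y  # ln N(y; -1, 1) - ln N(y; +1, 1) = -2y
        if s >= upper:
            return 0, n  # decided H hat = 0 after n observations
        if s <= lower:
            return 1, n  # decided H hat = 1 after n observations
    return None, len(sample)  # no decision within the sample

# Data generated under H = 1 (mean +1): the walk drifts down at rate -2.
print(sequential_test(rng.normal(+1.0, 1.0, size=1000)))
```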
So it's worthwhile, instead of looking at just threshold tests, to say: let's look at any old test at all. Now, "any test" means the following. I have this probability model. I've already bludgeoned you into accepting that that's the probability model we're going to be looking at. And we have this-- well, we have the likelihood ratio, but we don't care about that for the moment. We make this observation, and we've got to make a decision. And our decision is going to be either 1 or 0. How do we characterize that mathematically? Or how do we calculate it, if we want a computer to make that decision for us? The systematic way to do it is, for every possible sequence y, to say ahead of time-- to give a formula for-- which sequences get mapped into 1 and which sequences get mapped into 0. So we're going to call A the set of sample sequences that get mapped into hypothesis 1. That's the most general binary hypothesis test you can do. It includes all possible ways of choosing either 1 or 0.

You're forced to hire somebody or not hire somebody. You can't get them to work for you for two weeks and then make a decision at that point. Well, sometimes you can in this world. But if it's somebody you really want, and other people want them too, then you've got to decide: I'm going to go with this person, or I'm not going to go with them. So out of all the observations that you might make, you need some way to decide which ones send you to decision 1 and which ones send you to decision 0. So we will just say, arbitrarily, there's a set A of sample sequences that map into hypothesis 1. And the error probability for each hypothesis, using test A, is given by what we'll call Q sub 0 of A-- this is our name for the error probability.

Have I twisted this up? No. Q sub 0 of A is the probability that I actually choose-- it's the probability that I choose A given that the hypothesis is 0. Q sub 1 of A is the probability that I choose 1.
774 00:43:20,000 --> 00:43:22,275 Let me state that again carefully. 775 00:43:25,400 --> 00:43:33,880 Q0 of A is the probability that I'm going to choose 776 00:43:33,880 --> 00:43:37,460 hypothesis 1 given that hypothesis 0 was the correct 777 00:43:37,460 --> 00:43:38,410 hypothesis. 778 00:43:38,410 --> 00:43:43,090 It's the probability that Y is in A. That means that H hat is 779 00:43:43,090 --> 00:43:46,980 equal to 1 given that H is actually 0. 780 00:43:46,980 --> 00:43:52,510 So that's the probability we make an error given the 781 00:43:52,510 --> 00:43:56,190 hypothesis, the correct hypothesis is 0. 782 00:43:56,190 --> 00:43:59,880 Q1 of A is the probability of making an error given that the 783 00:43:59,880 --> 00:44:02,250 correct hypothesis is 1. 784 00:44:02,250 --> 00:44:04,870 If I have a priori probabilities, I'm going back 785 00:44:04,870 --> 00:44:07,770 to assuming a priori probabilities again. 786 00:44:07,770 --> 00:44:10,495 The probability of error is? 787 00:44:15,340 --> 00:44:21,970 It's P0 times the probability I make an error given that H 788 00:44:21,970 --> 00:44:23,190 equals 0, 789 00:44:23,190 --> 00:44:26,770 plus P1, the a priori probability of 1, times the probability I make 790 00:44:26,770 --> 00:44:28,920 an error given H equals 1. 791 00:44:28,920 --> 00:44:30,260 I add these two up. 792 00:44:30,260 --> 00:44:33,750 I can write it this way. 793 00:44:33,750 --> 00:44:35,570 Don't ask why for the time being. 794 00:44:35,570 --> 00:44:42,300 I'll just take the P0 out, so it's Q0 of A plus P1 over P0 795 00:44:42,300 --> 00:44:47,920 Q1 of A. So that's what I've called eta times Q1 of A. 796 00:44:47,920 --> 00:44:52,160 For the threshold test based on eta, the probability of 797 00:44:52,160 --> 00:44:55,120 error is the same thing. 798 00:44:55,120 --> 00:44:59,455 But that A there is an eta. 799 00:44:59,455 --> 00:45:04,690 I hope you can imagine that quantity there is an eta. 800 00:45:04,690 --> 00:45:05,930 This is an eta. 801 00:45:05,930 --> 00:45:10,710 So it's P0 times Q0 of eta plus eta times Q1 of eta. 802 00:45:10,710 --> 00:45:14,840 So the error probability, under this crazy test that you've 803 00:45:14,840 --> 00:45:19,180 designed, is P0 times this quantity. 804 00:45:19,180 --> 00:45:23,370 Under the MAP test, probability of error is this 805 00:45:23,370 --> 00:45:25,710 quantity here. 806 00:45:25,710 --> 00:45:28,830 What do we know about the MAP test? 807 00:45:28,830 --> 00:45:33,300 It minimizes the error probability under those a 808 00:45:33,300 --> 00:45:35,160 priori probabilities. 809 00:45:35,160 --> 00:45:39,720 So what we know about it is that this quantity is less 810 00:45:39,720 --> 00:45:43,890 than or equal to this quantity. 811 00:45:43,890 --> 00:45:48,630 Take out the P0's and it says that this quantity is less 812 00:45:48,630 --> 00:45:50,280 than or equal to this quantity. 813 00:45:53,250 --> 00:45:55,920 Pretty simple. 814 00:45:55,920 --> 00:46:00,320 Let's draw a picture that shows what that means. 815 00:46:00,320 --> 00:46:01,724 Here's a result that we have. 816 00:46:04,690 --> 00:46:09,010 We know because of maximum a posteriori probability for the 817 00:46:09,010 --> 00:46:15,990 threshold test that this is less than or equal to this. 818 00:46:15,990 --> 00:46:17,970 This is the minimum error probability. 819 00:46:17,970 --> 00:46:21,590 This is the error probability you get with 820 00:46:21,590 --> 00:46:23,820 whatever test you like.
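In symbols, that comparison (just a restatement of the slide, with eta equal to P1 over P0 as above):

```latex
\Pr\{e\} \;=\; p_0\,Q_0(A) + p_1\,Q_1(A)
        \;=\; p_0\bigl[\,Q_0(A) + \eta\,Q_1(A)\,\bigr],
\qquad \eta = \frac{p_1}{p_0},
```

and, since the MAP test minimizes the error probability under those a priori probabilities,

```latex
Q_0(\eta) + \eta\,Q_1(\eta) \;\le\; Q_0(A) + \eta\,Q_1(A)
\qquad \text{for every test } A.
```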
821 00:46:23,820 --> 00:46:30,810 So let's draw a picture on a graph where the probability of 822 00:46:30,810 --> 00:46:37,810 error given H equals 1 is on the horizontal axis. 823 00:46:37,810 --> 00:46:42,680 The probability of error conditional on H equals 0 is 824 00:46:42,680 --> 00:46:46,550 on this axis. 825 00:46:46,550 --> 00:46:53,420 So I can list the probability of error for the threshold 826 00:46:53,420 --> 00:46:55,970 test, which sits here. 827 00:46:55,970 --> 00:46:59,810 I can list the probability of error for this arbitrary test, 828 00:46:59,810 --> 00:47:01,380 which sits here. 829 00:47:01,380 --> 00:47:05,880 And I know that this quantity is greater than or equal to 830 00:47:05,880 --> 00:47:06,890 this quantity. 831 00:47:06,890 --> 00:47:14,400 So the only thing I have to do now is to sort out, using plane 832 00:47:14,400 --> 00:47:19,560 geometry, why these numbers are what they are. 833 00:47:19,560 --> 00:47:26,760 This number here is Q0 of eta plus eta times Q1 of eta. 834 00:47:26,760 --> 00:47:30,070 Here's Q1 of eta. 835 00:47:30,070 --> 00:47:33,630 This distance here is Q1 of eta. 836 00:47:33,630 --> 00:47:37,600 We have a line of slope minus eta there that we've drawn. 837 00:47:37,600 --> 00:47:42,890 So this point here is, in fact, Q0 of eta plus eta times 838 00:47:42,890 --> 00:47:44,620 Q1 of eta. 839 00:47:44,620 --> 00:47:47,890 That's just plane geometry. 840 00:47:47,890 --> 00:47:57,880 This point is Q0 of A plus eta times Q1 of A. Another line of 841 00:47:57,880 --> 00:48:00,660 slope minus eta. 842 00:48:00,660 --> 00:48:05,350 What we've shown is that this is less than or equal to this. 843 00:48:10,620 --> 00:48:11,990 That's because of the MAP rule. 844 00:48:11,990 --> 00:48:14,470 This has to be less than or equal to that. 845 00:48:14,470 --> 00:48:16,740 So what have we shown here? 846 00:48:16,740 --> 00:48:21,260 We've shown that for every test A you can imagine, when 847 00:48:21,260 --> 00:48:25,880 you draw that test on this two-dimensional plot of error 848 00:48:25,880 --> 00:48:30,710 probability given H equals 1 versus error probability given 849 00:48:30,710 --> 00:48:33,090 H equals 0, 850 00:48:33,090 --> 00:48:37,360 every test in the world lies Northeast of this line here. 851 00:48:45,620 --> 00:48:46,050 Yeah? 852 00:48:46,050 --> 00:48:48,268 AUDIENCE: Can you say again exactly what 853 00:48:48,268 --> 00:48:51,280 axis represents what? 854 00:48:51,280 --> 00:48:54,410 PROFESSOR: This axis here represents the error 855 00:48:54,410 --> 00:48:58,200 probability given that H equals 1 is the correct 856 00:48:58,200 --> 00:48:59,690 hypothesis. 857 00:48:59,690 --> 00:49:03,620 This axis is the error probability given that 0 is 858 00:49:03,620 --> 00:49:04,870 the correct hypothesis. 859 00:49:07,420 --> 00:49:11,200 So we've defined Q1 of eta and Q0 of eta as those two error 860 00:49:11,200 --> 00:49:12,510 probabilities. 861 00:49:12,510 --> 00:49:17,860 Using the threshold test, or using the MAP test where eta 862 00:49:17,860 --> 00:49:20,470 is equal to P1 over P0. 863 00:49:20,470 --> 00:49:25,010 And this point here is whatever it happens to be for 864 00:49:25,010 --> 00:49:27,300 any test that you happen to like. 865 00:49:30,630 --> 00:49:35,390 You might have a supervisor who wants to hire somebody and 866 00:49:35,390 --> 00:49:39,190 you view that person as a threat to yourself, so you've 867 00:49:39,190 --> 00:49:43,010 taken all your observations and you then make a decision.
868 00:49:43,010 --> 00:49:45,380 If the person is any good, you say, don't hire them. 869 00:49:45,380 --> 00:49:47,480 If the person is no good, you say, hire them. 870 00:49:47,480 --> 00:49:51,770 So just the opposite of what you should do. 871 00:49:51,770 --> 00:49:57,310 But whatever you do, this says this is less than or equal to 872 00:49:57,310 --> 00:50:00,910 this because of the MAP rule. 873 00:50:00,910 --> 00:50:05,680 And therefore, this point lies Northeast 874 00:50:05,680 --> 00:50:06,930 of this line here. 875 00:50:09,900 --> 00:50:12,630 You can do this for any eta that you want to do it for. 876 00:50:15,820 --> 00:50:19,490 So for every eta that we want to use, we get some value of 877 00:50:19,490 --> 00:50:22,240 Q0 of eta and Q1 of eta. 878 00:50:22,240 --> 00:50:25,180 These go along here in some way. 879 00:50:25,180 --> 00:50:27,640 You can do the same argument again. 880 00:50:27,640 --> 00:50:33,770 For every threshold test, every point lies Northeast of 881 00:50:33,770 --> 00:50:37,380 the line of slope minus eta through that threshold test. 882 00:50:37,380 --> 00:50:43,200 We get a whole family of lines. When eta is very big, 883 00:50:43,200 --> 00:50:47,170 the line of slope minus eta goes like this. 884 00:50:47,170 --> 00:50:50,260 When eta is very small, it goes like this. 885 00:50:54,050 --> 00:50:58,000 We just think of ourselves plotting all these lines, 886 00:50:58,000 --> 00:51:02,930 taking the upper envelope of them because every test has to 887 00:51:02,930 --> 00:51:06,770 lie Northeast of every one of those lines. 888 00:51:06,770 --> 00:51:11,240 So we take the upper envelope of all of these lines, and we 889 00:51:11,240 --> 00:51:15,050 get something that looks like this. 890 00:51:15,050 --> 00:51:18,450 We call this the error curve. 891 00:51:18,450 --> 00:51:23,730 And this is the upper envelope of the straight lines of slope 892 00:51:23,730 --> 00:51:28,110 minus eta that go through the threshold tests at eta. 893 00:51:31,510 --> 00:51:33,830 You get something else from that, too. 894 00:51:33,830 --> 00:51:37,330 This curve is convex. 895 00:51:37,330 --> 00:51:39,110 Why is the curve convex? 896 00:51:39,110 --> 00:51:42,380 Well, you might like to take the second derivative of it, 897 00:51:42,380 --> 00:51:45,090 but that's a pain in the neck. 898 00:51:45,090 --> 00:51:52,110 But the fundamental definition of convexity is that a 899 00:51:52,110 --> 00:51:55,730 one-dimensional curve is convex if all of its tangents 900 00:51:55,730 --> 00:51:58,230 lie underneath the curve. 901 00:51:58,230 --> 00:52:00,040 That's the way we've constructed this. 902 00:52:00,040 --> 00:52:02,870 It's the upper envelope of a bunch of straight lines. 903 00:52:02,870 --> 00:52:03,360 Yes? 904 00:52:03,360 --> 00:52:06,520 AUDIENCE: Can you please explain, what is u of alpha? 905 00:52:06,520 --> 00:52:08,930 PROFESSOR: U of alpha is just what I've 906 00:52:08,930 --> 00:52:11,870 called this upper envelope. 907 00:52:11,870 --> 00:52:14,420 This upper envelope is now a function. 908 00:52:14,420 --> 00:52:16,110 AUDIENCE: What's the definition? 909 00:52:16,110 --> 00:52:16,520 PROFESSOR: What? 910 00:52:16,520 --> 00:52:17,700 AUDIENCE: What is the definition? 911 00:52:17,700 --> 00:52:19,980 PROFESSOR: The definition is the upper envelope of all 912 00:52:19,980 --> 00:52:23,235 these straight lines. 913 00:52:23,235 --> 00:52:24,485 AUDIENCE: For changing eta? 914 00:52:24,485 --> 00:52:24,870 PROFESSOR: What?
915 00:52:24,870 --> 00:52:27,490 AUDIENCE: For changing eta? 916 00:52:27,490 --> 00:52:28,960 PROFESSOR: Yes. 917 00:52:28,960 --> 00:52:35,540 As eta changes, I get a whole bunch of these points. 918 00:52:35,540 --> 00:52:37,940 I got a whole bunch of these points. 919 00:52:37,940 --> 00:52:41,480 I take the upper envelope of all of these straight lines. 920 00:52:44,200 --> 00:52:47,010 I mean, yes, you'd rather see an equation. 921 00:52:47,010 --> 00:52:51,670 But if you see an equation, it's terribly ugly. 922 00:52:51,670 --> 00:52:55,590 I mean, you can program a computer to do this 923 00:52:55,590 --> 00:52:59,310 as easily as you can program it to 924 00:52:59,310 --> 00:53:02,800 follow a bunch of equations. 925 00:53:02,800 --> 00:53:06,230 But anyway, I'm not interested in actually solving for this 926 00:53:06,230 --> 00:53:07,480 curve in particular. 927 00:53:13,420 --> 00:53:16,660 I am particularly interested in the fact that this upper 928 00:53:16,660 --> 00:53:22,990 envelope is, in fact, a convex curve and that the threshold 929 00:53:22,990 --> 00:53:25,900 tests lie on the curve. 930 00:53:25,900 --> 00:53:30,690 The other tests lie Northeast of the curve. 931 00:53:30,690 --> 00:53:34,280 And that's the reason you want to use threshold tests. 932 00:53:34,280 --> 00:53:38,810 And it has nothing to do with a priori probabilities at all. 933 00:53:38,810 --> 00:53:41,730 So you see, the thing we've done is to start out assuming 934 00:53:41,730 --> 00:53:44,120 a priori probabilities. 935 00:53:44,120 --> 00:53:49,450 We've derived this neat result using a priori probabilities. 936 00:53:49,450 --> 00:53:55,196 But now we have this error curve. 937 00:53:55,196 --> 00:53:59,240 Well, to give you a better definition of what u of alpha 938 00:53:59,240 --> 00:54:07,620 is, u of alpha is the error probability under hypothesis 0 939 00:54:07,620 --> 00:54:12,450 if the error probability under hypothesis 1 is alpha. 940 00:54:12,450 --> 00:54:16,160 You pick an error probability here. 941 00:54:16,160 --> 00:54:18,660 You go up to that point here. 942 00:54:18,660 --> 00:54:21,750 There's a threshold test there. 943 00:54:21,750 --> 00:54:24,730 You read over there. 944 00:54:24,730 --> 00:54:28,480 And at that point, you find the probability of error given 945 00:54:28,480 --> 00:54:30,426 H equals 0. 946 00:54:30,426 --> 00:54:31,830 AUDIENCE: How do you know that the threshold 947 00:54:31,830 --> 00:54:35,580 tests lie on the curve? 948 00:54:35,580 --> 00:54:42,640 PROFESSOR: Well, this threshold test here is 949 00:54:42,640 --> 00:54:45,220 Southwest of all tests. 950 00:54:48,420 --> 00:54:53,175 And therefore, it can't lie above this upper envelope. 951 00:54:57,300 --> 00:55:00,740 Now, I've cheated you in one small way. 952 00:55:00,740 --> 00:55:07,370 If you have a discrete test, what you're going to wind up 953 00:55:07,370 --> 00:55:12,580 with is just a finite set of these possible points here. 954 00:55:12,580 --> 00:55:15,650 So you're going to wind up with the upper envelope of a 955 00:55:15,650 --> 00:55:18,130 finite set of straight lines. 956 00:55:18,130 --> 00:55:21,770 So the upper envelope is actually going to be-- 957 00:55:21,770 --> 00:55:26,120 it's still convex, but it's piecewise linear. 958 00:55:26,120 --> 00:55:30,970 And since it's piecewise linear, the threshold tests are at the 959 00:55:30,970 --> 00:55:33,320 corner points of that curve.
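A small numeric sketch of those corner points in Python, using a single made-up discrete observation rather than a sequence (the two PMFs below are assumptions for illustration, not from the lecture). Sweeping the threshold on the likelihood ratio traces out the corners (Q1, Q0), and the error curve is the upper envelope of the lines of slope minus eta through them.

```python
# Corner points of the error curve for a single discrete observation
# Y taking values 0..3.  The conditional PMFs are made-up numbers.
p_y_given_0 = [0.50, 0.30, 0.15, 0.05]   # f(y | H=0)
p_y_given_1 = [0.05, 0.15, 0.30, 0.50]   # f(y | H=1)

# Sort outcomes by likelihood ratio f(y|0)/f(y|1), largest first.
order = sorted(range(4),
               key=lambda y: p_y_given_0[y] / p_y_given_1[y],
               reverse=True)

# A threshold test decides H=1 on the outcomes with the smallest
# likelihood ratios; sweeping the threshold grows that set A one
# outcome at a time.  k = 0 is "always decide 1"; k = 4 is "never".
for k in range(5):
    A = set(order[k:])                            # decide H=1 on A
    q0 = sum(p_y_given_0[y] for y in A)           # Pr{error | H=0}
    q1 = sum(p_y_given_1[y] for y in range(4) if y not in A)  # Pr{error | H=1}
    print(f"corner point: Q1 = {q1:.2f}, Q0 = {q0:.2f}")
```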
960 00:55:33,320 --> 00:55:36,300 And in between those points, you don't quite 961 00:55:36,300 --> 00:55:37,550 know what to do. 962 00:55:40,890 --> 00:55:44,630 So since you don't quite know what to do in between those 963 00:55:44,630 --> 00:55:51,400 points, as far as the maximum a posteriori probability test 964 00:55:51,400 --> 00:55:58,030 goes, you can reach any one of those points, sometimes using 965 00:55:58,030 --> 00:56:00,890 one test on one corner of-- 966 00:56:00,890 --> 00:56:02,550 I guess it's easier if I draw it. 967 00:56:07,140 --> 00:56:10,480 And I didn't want to get into this particularly because it's 968 00:56:10,480 --> 00:56:12,280 a little messier. 969 00:56:18,320 --> 00:56:20,600 So you could have this kind of curve. 970 00:56:20,600 --> 00:56:24,550 And the notes talk about this in detail. 971 00:56:24,550 --> 00:56:29,370 So the threshold tests correspond to these points. 972 00:56:29,370 --> 00:56:33,550 This point says always decide one. 973 00:56:33,550 --> 00:56:38,020 Don't pay any attention to the tests at all, just say I think 974 00:56:38,020 --> 00:56:40,980 one is the right hypothesis. 975 00:56:40,980 --> 00:56:44,880 I mean, this is the testing philosophy of people who don't 976 00:56:44,880 --> 00:56:46,980 believe in experimentalism. 977 00:56:46,980 --> 00:56:48,840 They've already made up their mind. 978 00:56:48,840 --> 00:56:50,330 They look at the results. 979 00:56:50,330 --> 00:56:52,200 They say, that's very interesting. 980 00:56:52,200 --> 00:56:56,070 And then they say, I'm going to choose this. 981 00:56:56,070 --> 00:57:00,944 These other points are our particular threshold tests. 982 00:57:04,680 --> 00:57:07,680 If you want to get error probabilities in the middle 983 00:57:07,680 --> 00:57:09,420 here, what do you do? 984 00:57:09,420 --> 00:57:11,500 You use a randomized test. 985 00:57:11,500 --> 00:57:12,700 Sometimes you use this. 986 00:57:12,700 --> 00:57:14,150 Sometimes you use this. 987 00:57:14,150 --> 00:57:17,120 You flip a coin and choose whichever one of these you 988 00:57:17,120 --> 00:57:18,370 want to choose. 989 00:57:20,990 --> 00:57:27,160 So what this says is the Neyman-Pearson test, which is 990 00:57:27,160 --> 00:57:36,190 the test that says pick some alpha, which is the error 991 00:57:36,190 --> 00:57:39,630 probability under hypothesis 1 that 992 00:57:39,630 --> 00:57:41,660 you're willing to tolerate. 993 00:57:41,660 --> 00:57:44,130 So you pick alpha. 994 00:57:44,130 --> 00:57:48,330 And then it says, minimize the error probability of the other 995 00:57:48,330 --> 00:57:51,790 kind, so you read over there. 996 00:57:51,790 --> 00:57:56,630 And the Neyman-Pearson test, what it does is it minimizes 997 00:57:56,630 --> 00:58:02,130 the error probability under the other hypothesis. 998 00:58:02,130 --> 00:58:05,530 Now, when this curve is piecewise linear, the 999 00:58:05,530 --> 00:58:09,530 Neyman-Pearson test is not a threshold test, but it's a 1000 00:58:09,530 --> 00:58:11,930 randomized threshold test. 1001 00:58:11,930 --> 00:58:15,300 Sometimes when you're at a point like this, you have to 1002 00:58:15,300 --> 00:58:17,485 use this test some of the time and this test some of the time. 1003 00:58:20,340 --> 00:58:25,710 For most of the tests that you deal with, the Neyman-Pearson test 1004 00:58:25,710 --> 00:58:28,570 is just the threshold test that's at 1005 00:58:28,570 --> 00:58:31,180 that particular point. 1006 00:58:34,670 --> 00:58:38,100 Any questions about that?
1007 00:58:38,100 --> 00:58:39,960 This is probably one of these things you have to think about 1008 00:58:39,960 --> 00:58:40,980 a little bit. 1009 00:58:40,980 --> 00:58:41,510 Yes? 1010 00:58:41,510 --> 00:58:44,330 AUDIENCE: When you say you have to use this test or this 1011 00:58:44,330 --> 00:58:48,045 test, are you talking about threshold or are you talking 1012 00:58:48,045 --> 00:58:51,184 about-- because this is always-- it's either H equals 1013 00:58:51,184 --> 00:58:53,846 0 or H equal 1, right? 1014 00:58:53,846 --> 00:58:56,508 What do you mean when you say you have to randomize between 1015 00:58:56,508 --> 00:58:59,180 the two tests? 1016 00:58:59,180 --> 00:59:00,695 PROFESSOR: I mean threshold tests-- 1017 00:59:15,120 --> 00:59:20,290 if I have a finite set of alternatives, and I'm doing a 1018 00:59:20,290 --> 00:59:24,640 threshold test on that finite set of alternatives, I only 1019 00:59:24,640 --> 00:59:29,500 have a finite number of things I can do. 1020 00:59:29,500 --> 00:59:33,230 As I increase the threshold, I suddenly get to the point 1021 00:59:33,230 --> 00:59:36,750 where this ratio of likelihoods 1022 00:59:36,750 --> 00:59:39,190 includes one more point. 1023 00:59:39,190 --> 00:59:41,500 And then it gets to the point where it includes one other 1024 00:59:41,500 --> 00:59:43,770 point and so forth. 1025 00:59:43,770 --> 00:59:49,430 So what happens is that this upper envelope is just 1026 00:59:49,430 --> 00:59:53,320 the upper envelope of lines through a finite number of points. 1027 00:59:53,320 --> 00:59:56,980 And in this upper envelope, the 1028 00:59:56,980 --> 01:00:00,500 threshold tests are just the corners there. 1029 01:00:00,500 --> 01:00:04,330 So I sometimes have to randomize between them. 1030 01:00:04,330 --> 01:00:05,880 If you don't like that, ignore it. 1031 01:00:09,130 --> 01:00:16,450 Because for most tests you deal with, almost all books on 1032 01:00:16,450 --> 01:00:20,300 statistics that I've ever seen just say the 1033 01:00:20,300 --> 01:00:25,130 Neyman-Pearson test looks at the threshold curve, at this 1034 01:00:25,130 --> 01:00:26,610 error curve. 1035 01:00:26,610 --> 01:00:29,040 And it chooses accordingly. 1036 01:00:29,040 --> 01:00:31,228 Yes? 1037 01:00:31,228 --> 01:00:36,590 AUDIENCE: Can you put the previous slide back? 1038 01:00:36,590 --> 01:00:42,690 You told us that because of maximum a posteriori 1039 01:00:42,690 --> 01:00:49,870 probability, if eta is equal to P1 divided by P0, then the 1040 01:00:49,870 --> 01:00:51,950 probability of error is minimized. 1041 01:00:51,950 --> 01:00:56,900 And so the errors of the test A are [INAUDIBLE]. 1042 01:00:59,750 --> 01:01:04,738 But if we start changing eta from 0 to infinity, it doesn't 1043 01:01:04,738 --> 01:01:05,704 have to be anymore. 1044 01:01:05,704 --> 01:01:09,175 [INAUDIBLE], which means the error is 1045 01:01:09,175 --> 01:01:11,015 not necessarily minimized. 1046 01:01:11,015 --> 01:01:13,170 So the argument doesn't hold anymore. 1047 01:01:13,170 --> 01:01:17,880 PROFESSOR: As I change eta, I'm changing P1 and P0 also. 1048 01:01:17,880 --> 01:01:21,760 In other words, now what I'm doing is I'm saying, let's 1049 01:01:21,760 --> 01:01:27,240 look at this threshold test, and let's visualize what 1050 01:01:27,240 --> 01:01:32,010 happens as I change the a priori probabilities. 1051 01:01:32,010 --> 01:01:37,390 So I'm suddenly becoming a classical statistician instead 1052 01:01:37,390 --> 01:01:40,340 of a Bayesian one.
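Here is a minimal sketch of the randomization being discussed, under the assumption that you are handed two decision rules sitting at adjacent corners of the piecewise-linear error curve; the function name and parameters are hypothetical, introduced only for illustration.

```python
import random

def randomized_np_decision(y, test_a, test_b, q1_a, q1_b, alpha):
    """Randomized Neyman-Pearson rule sketch.

    test_a and test_b are decision functions y -> 0 or 1 at adjacent
    corners of the error curve, with error probabilities given H=1
    satisfying q1_a <= alpha <= q1_b.  Choosing test_a with
    probability theta makes the average error probability given H=1
    exactly alpha, landing on the segment between the two corners.
    """
    # Solve theta*q1_a + (1 - theta)*q1_b = alpha for theta in [0, 1].
    theta = (q1_b - alpha) / (q1_b - q1_a)
    return test_a(y) if random.random() < theta else test_b(y)
```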
1053 01:01:40,340 --> 01:01:42,250 But I know what the answers are from looking at the 1054 01:01:42,250 --> 01:01:43,500 Bayesian case. 1055 01:01:48,370 --> 01:01:53,290 OK, so let's move on. 1056 01:01:57,160 --> 01:02:02,245 I mean, we now sort of see that these tests-- 1057 01:02:05,180 --> 01:02:08,120 well, one thing we've seen is when you have to make a 1058 01:02:08,120 --> 01:02:13,120 decision under this kind of probabilistic model we've been 1059 01:02:13,120 --> 01:02:18,070 talking about-- namely, two hypotheses, IID random 1060 01:02:18,070 --> 01:02:20,153 variables conditional on each hypothesis. 1061 01:02:23,420 --> 01:02:26,370 Those hypothesis testing problems turn 1062 01:02:26,370 --> 01:02:29,350 into random walk problems. 1063 01:02:29,350 --> 01:02:32,580 We also saw that for the G/G/1 queue, 1064 01:02:32,580 --> 01:02:37,040 when I started looking at when the system becomes empty, and 1065 01:02:37,040 --> 01:02:43,010 how long it takes to start to fill up again, that problem is 1066 01:02:43,010 --> 01:02:44,880 a random walk problem. 1067 01:02:44,880 --> 01:02:48,000 So now I want to start to ask the question, what's the 1068 01:02:48,000 --> 01:02:52,470 probability that a random walk will cross a threshold? 1069 01:02:52,470 --> 01:02:54,700 I'm going to apply the Chernoff bound to it. 1070 01:02:54,700 --> 01:02:56,010 You remember the Chernoff bound? 1071 01:02:56,010 --> 01:03:00,410 We talked about it a little bit back on the 1072 01:03:00,410 --> 01:03:03,180 second week of the term. 1073 01:03:03,180 --> 01:03:06,420 We were talking about the Markov inequality and the 1074 01:03:06,420 --> 01:03:08,270 Chebyshev inequality. 1075 01:03:08,270 --> 01:03:12,200 And we said that the Chernoff inequality was the same sort 1076 01:03:12,200 --> 01:03:17,780 of thing, except it was based on e to the rZ rather than Z 1077 01:03:17,780 --> 01:03:20,290 or Z squared. 1078 01:03:20,290 --> 01:03:24,620 And we talked a little bit about its properties. 1079 01:03:24,620 --> 01:03:28,790 The major thing one uses the Chernoff bound for is to get 1080 01:03:28,790 --> 01:03:33,020 good estimates very, very far away from the mean. 1081 01:03:33,020 --> 01:03:36,200 In other words, good estimates of probabilities that are 1082 01:03:36,200 --> 01:03:38,040 very, very small. 1083 01:03:38,040 --> 01:03:41,370 I've grown up using these all my life because I've been 1084 01:03:41,370 --> 01:03:43,440 concerned with error probabilities in 1085 01:03:43,440 --> 01:03:46,010 communication systems. 1086 01:03:46,010 --> 01:03:49,630 You typically want error probabilities that run between 1087 01:03:49,630 --> 01:03:53,420 10 to the minus fifth and 10 to the minus eighth. 1088 01:03:53,420 --> 01:03:58,940 So you want to look at points which are quite far away. 1089 01:03:58,940 --> 01:04:02,550 I mean, you take a large number of-- 1090 01:04:02,550 --> 01:04:05,230 you take a sum of a large number of variables, which 1091 01:04:05,230 --> 01:04:09,330 correspond to a code. 1092 01:04:09,330 --> 01:04:12,400 And you look at error probabilities for this rather 1093 01:04:12,400 --> 01:04:13,790 complicated thing. 1094 01:04:13,790 --> 01:04:16,380 But you're looking very, very far away from the mean, and 1095 01:04:16,380 --> 01:04:19,620 you're looking at very large numbers of observations.
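For reference, the bound he is about to restate is just the Markov inequality applied to the random variable e to the rZ:

```latex
\Pr\{Z \ge b\} \;=\; \Pr\bigl\{e^{rZ} \ge e^{rb}\bigr\}
             \;\le\; \frac{\mathsf{E}\bigl[e^{rZ}\bigr]}{e^{rb}}
             \;=\; g_Z(r)\, e^{-rb},
\qquad r > 0 .
```

For r < 0, it is the event Z less than or equal to b that makes e to the rZ at least e to the rb, so the same chain of steps bounds the lower tail instead.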
1096 01:04:19,620 --> 01:04:25,920 So instead of the kinds of things where we deal with 1097 01:04:25,920 --> 01:04:28,380 things like the central limit theorem where you're trying to 1098 01:04:28,380 --> 01:04:31,430 figure out what goes on close to the mean, here you're 1099 01:04:31,430 --> 01:04:36,170 trying to figure out what goes on very far from the mean. 1100 01:04:36,170 --> 01:04:40,990 OK, so what the Chernoff bound says is that the probability 1101 01:04:40,990 --> 01:04:45,820 that a random variable Z is greater than or equal to some 1102 01:04:45,820 --> 01:04:47,390 constant b-- 1103 01:04:47,390 --> 01:04:50,580 we don't even need sums of random variables here; the 1104 01:04:50,580 --> 01:04:54,590 Chernoff bound is just a bound on the tail of a 1105 01:04:54,590 --> 01:04:55,980 distribution-- 1106 01:04:55,980 --> 01:04:59,800 is less than or equal to the moment-generating function of 1107 01:04:59,800 --> 01:05:01,810 that random variable 1108 01:05:01,810 --> 01:05:08,090 -- g sub Z of r is the expected value of e to the rZ; 1109 01:05:08,090 --> 01:05:10,890 these generating functions, you can calculate 1110 01:05:10,890 --> 01:05:12,960 them if you want to -- 1111 01:05:12,960 --> 01:05:15,390 times e to the minus rb. 1112 01:05:15,390 --> 01:05:18,550 This is the Markov inequality for the random 1113 01:05:18,550 --> 01:05:21,750 variable e to the rZ. 1114 01:05:21,750 --> 01:05:26,330 And go back and review chapter 1. 1115 01:05:26,330 --> 01:05:29,770 I think it's section 1.43 or something. 1116 01:05:29,770 --> 01:05:34,180 It's the section that deals with the Markov inequality, 1117 01:05:34,180 --> 01:05:40,970 the Chebyshev inequality, and the Chernoff bound. 1118 01:05:40,970 --> 01:05:43,880 And as I told you once when we talked about these things, 1119 01:05:43,880 --> 01:05:45,620 Chernoff is still alive and well. 1120 01:05:45,620 --> 01:05:47,840 He's a statistician at Harvard. 1121 01:05:47,840 --> 01:05:51,480 He was somewhat embarrassed by this inequality becoming so 1122 01:05:51,480 --> 01:05:55,620 famous because he did it as sort of a throwaway thing in a 1123 01:05:55,620 --> 01:05:59,250 paper where he was trying to do something which was much 1124 01:05:59,250 --> 01:06:02,290 more mathematically sophisticated. 1125 01:06:02,290 --> 01:06:05,440 And now the poor guy is only known for this thing that he 1126 01:06:05,440 --> 01:06:06,690 views as being trivial. 1127 01:06:11,360 --> 01:06:14,220 But what the bound says is the probability that Z is greater 1128 01:06:14,220 --> 01:06:17,380 than or equal to b is this inequality. 1129 01:06:17,380 --> 01:06:20,840 Strangely enough, the probability that Z is less 1130 01:06:20,840 --> 01:06:25,710 than or equal to b is bounded by the same inequality. 1131 01:06:25,710 --> 01:06:28,980 But one of them, r is bigger than 0. 1132 01:06:28,980 --> 01:06:33,220 And the other one, r is less than 0. 1133 01:06:33,220 --> 01:06:35,230 And you have to go back and read that section to 1134 01:06:35,230 --> 01:06:37,800 understand why. 1135 01:06:37,800 --> 01:06:40,560 Now, this is most useful when it's applied to a sum of 1136 01:06:40,560 --> 01:06:42,270 random variables. 1137 01:06:42,270 --> 01:06:46,670 I don't know of any applications for it otherwise. 1138 01:06:46,670 --> 01:06:50,580 So if the moment-generating function-- 1139 01:06:50,580 --> 01:06:52,870 oh, incidentally, also.
1140 01:06:52,870 --> 01:06:56,380 When most people talk about moment-generating functions, 1141 01:06:56,380 --> 01:06:59,650 and certainly when people talked about moment-generating 1142 01:06:59,650 --> 01:07:04,640 functions before the 1950s or so, what they were always 1143 01:07:04,640 --> 01:07:08,830 interested in is the fact that if you take derivatives of the 1144 01:07:08,830 --> 01:07:12,540 moment-generating functions, you generate the moments of 1145 01:07:12,540 --> 01:07:14,980 the random variable. 1146 01:07:14,980 --> 01:07:17,860 If you take the derivative of this with respect to r, 1147 01:07:17,860 --> 01:07:22,970 evaluate it at r equals 0, you get the expected value of Z. 1148 01:07:22,970 --> 01:07:26,610 If you take the second derivative evaluated at r 1149 01:07:26,610 --> 01:07:30,720 equals 0, you get the expected value of Z 1150 01:07:30,720 --> 01:07:32,700 squared, and so forth. 1151 01:07:32,700 --> 01:07:36,810 You can see that by just taking the derivative of that. 1152 01:07:36,810 --> 01:07:38,580 Here, we're looking at something else. 1153 01:07:38,580 --> 01:07:42,200 We're not looking at what goes on around r equals 0. 1154 01:07:42,200 --> 01:07:45,640 We're trying to figure out what goes on way on the far 1155 01:07:45,640 --> 01:07:48,760 tails of these distributions. 1156 01:07:48,760 --> 01:07:56,860 So if gX of r is the expected value of e to the rX, then the expected value of e to the r Sn-- 1157 01:07:56,860 --> 01:07:59,380 Sn is the sum of these random variables-- 1158 01:07:59,380 --> 01:08:04,590 is the expected value of the product of e to the rXi. 1159 01:08:04,590 --> 01:08:07,300 Namely, it's e to the r times 1160 01:08:07,300 --> 01:08:09,150 the sum of the Xi. 1161 01:08:09,150 --> 01:08:11,020 So that turns into a product. 1162 01:08:11,020 --> 01:08:15,520 The expected value of a product of a finite number of 1163 01:08:15,520 --> 01:08:19,319 independent terms is the product of the expected values. 1164 01:08:19,319 --> 01:08:23,460 So it's gX of r to the n-th power. 1165 01:08:23,460 --> 01:08:27,200 So if I want to write this, now I'm applying the Chernoff 1166 01:08:27,200 --> 01:08:30,020 bound to the random variable S sub n. 1167 01:08:30,020 --> 01:08:32,880 What's the probability that S sub n is greater than or equal 1168 01:08:32,880 --> 01:08:34,840 to n times a? 1169 01:08:34,840 --> 01:08:39,000 It's gX of r to the n-th power times e to the minus rna. 1170 01:08:39,000 --> 01:08:41,260 That's what the Chernoff bound says. 1171 01:08:41,260 --> 01:08:46,640 This is the Chernoff bound over on the other side of the 1172 01:08:46,640 --> 01:08:49,240 distribution. 1173 01:08:49,240 --> 01:08:54,020 This only makes sense and has interesting values when a is 1174 01:08:54,020 --> 01:08:56,990 bigger than the mean or when a is less than the mean. 1175 01:08:56,990 --> 01:09:01,210 And when r is greater than 0 for this one and less than 0 1176 01:09:01,210 --> 01:09:02,460 for this one. 1177 01:09:07,370 --> 01:09:10,640 Now, this is easier to interpret and it's 1178 01:09:10,640 --> 01:09:13,729 easier to work with. 1179 01:09:13,729 --> 01:09:20,819 If you take that product of terms, gX of r to the n-th 1180 01:09:20,819 --> 01:09:27,020 power, and you visualize the logarithm of 1181 01:09:27,020 --> 01:09:31,850 gX of r, then you get this 1182 01:09:31,850 --> 01:09:33,114 quantity up here.
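In symbols, that chain of steps (independence of the Xi is what lets the expectation of the product split into a product of expectations):

```latex
\mathsf{E}\bigl[e^{rS_n}\bigr]
  = \mathsf{E}\Bigl[\prod_{i=1}^{n} e^{rX_i}\Bigr]
  = \prod_{i=1}^{n} \mathsf{E}\bigl[e^{rX_i}\bigr]
  = \bigl[g_X(r)\bigr]^{n},
\qquad\text{so}\qquad
\Pr\{S_n \ge na\} \;\le\; \bigl[g_X(r)\bigr]^{n} e^{-rna}
  = e^{\,n\,[\gamma_X(r) - ra]},
```

where gamma_X(r) = ln g_X(r).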
1183 01:09:41,340 --> 01:09:44,529 You get that the probability that Sn is greater than or equal to 1184 01:09:44,529 --> 01:09:51,350 na is this: e to the n times, gamma X of r minus ra. 1185 01:09:51,350 --> 01:09:57,600 Gamma is the logarithm of the moment-generating function. 1186 01:09:57,600 --> 01:10:00,710 The logarithm of the moment-generating function is 1187 01:10:00,710 --> 01:10:02,980 always called the semi-invariant 1188 01:10:02,980 --> 01:10:04,980 moment-generating function. 1189 01:10:04,980 --> 01:10:07,620 The name is, again, because people were originally 1190 01:10:07,620 --> 01:10:10,570 interested in the moment-generating properties 1191 01:10:10,570 --> 01:10:12,480 of these random variables. 1192 01:10:12,480 --> 01:10:17,060 If you sit down and take the derivatives, I can 1193 01:10:17,060 --> 01:10:19,080 probably do it here. 1194 01:10:19,080 --> 01:10:21,195 It's simple enough that I won't get confused. 1195 01:10:26,640 --> 01:10:37,140 The derivative with respect to r of the logarithm of g of r 1196 01:10:37,140 --> 01:10:44,810 is g prime of r divided by g of r. 1197 01:10:44,810 --> 01:10:52,890 And the second derivative is then-- the 1198 01:10:52,890 --> 01:10:55,660 natural log of g of r, 1199 01:10:55,660 --> 01:11:00,120 taking the derivative of that, is equal to g double prime of 1200 01:11:00,120 --> 01:11:06,000 r over g of r squared. 1201 01:11:06,000 --> 01:11:09,300 Tell me if I'm making a mistake here because I usually 1202 01:11:09,300 --> 01:11:11,360 do when I do this. 1203 01:11:11,360 --> 01:11:19,690 Minus g of r times g prime of r. 1204 01:11:22,950 --> 01:11:35,770 Probably divided by this squared. 1205 01:11:35,770 --> 01:11:37,020 Let's see. 1206 01:11:37,020 --> 01:11:38,470 Is this right? 1207 01:11:38,470 --> 01:11:41,810 Who can take derivatives here? 1208 01:11:41,810 --> 01:11:43,620 AUDIENCE: First term doesn't have a square in it. 1209 01:11:43,620 --> 01:11:43,970 PROFESSOR: What? 1210 01:11:43,970 --> 01:11:45,875 AUDIENCE: First term doesn't have a square in the 1211 01:11:45,875 --> 01:11:47,150 denominator. 1212 01:11:47,150 --> 01:11:49,780 PROFESSOR: First term? 1213 01:11:49,780 --> 01:11:51,610 Yeah. 1214 01:11:51,610 --> 01:11:53,280 Oh, the first thing doesn't have a square. 1215 01:11:53,280 --> 01:11:54,375 No, you're right. 1216 01:11:54,375 --> 01:11:56,350 AUDIENCE: Second one doesn't have-- 1217 01:11:56,350 --> 01:11:59,400 PROFESSOR: And the second one, let's see. 1218 01:11:59,400 --> 01:12:00,650 We have-- 1219 01:12:03,850 --> 01:12:06,230 we just have g prime of r squared 1220 01:12:06,230 --> 01:12:08,150 divided by g of r squared. 1221 01:12:08,150 --> 01:12:12,930 And we evaluate this at r equals 0. 1222 01:12:12,930 --> 01:12:14,930 This term becomes 1. 1223 01:12:14,930 --> 01:12:17,340 This term becomes 1. 1224 01:12:17,340 --> 01:12:22,760 This term becomes the second moment x squared bar. 1225 01:12:22,760 --> 01:12:26,300 And this term becomes x bar squared. 1226 01:12:26,300 --> 01:12:32,030 And this whole thing becomes the variance of 1227 01:12:32,030 --> 01:12:37,980 the random variable rather than the second moment. 1228 01:12:37,980 --> 01:12:43,100 All of these terms might be wrong, but this term is right. 1229 01:12:43,100 --> 01:12:47,010 And I'm sure all of you can rewrite that and evaluate it 1230 01:12:47,010 --> 01:12:48,190 at r equals 0.
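Cleaned up, the blackboard calculation is:

```latex
\gamma(r) = \ln g(r), \qquad
\gamma'(r) = \frac{g'(r)}{g(r)}, \qquad
\gamma''(r) = \frac{g''(r)}{g(r)} \;-\; \frac{\bigl(g'(r)\bigr)^{2}}{\bigl(g(r)\bigr)^{2}},
```

and at r = 0, using g(0) = 1, g'(0) = E[X], and g''(0) = E[X squared],

```latex
\gamma'(0) = \overline{X}, \qquad
\gamma''(0) = \overline{X^2} - \overline{X}^{\,2} = \sigma_X^2 .
```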
1231 01:12:48,190 --> 01:12:50,240 So that's why it's called the semi-invariant 1232 01:12:50,240 --> 01:12:52,280 moment-generating function. 1233 01:12:52,280 --> 01:12:55,610 It doesn't make any difference for what we're interested in. 1234 01:12:55,610 --> 01:12:59,550 The thing that we're interested in is that this 1235 01:12:59,550 --> 01:13:00,810 exponent here-- 1236 01:13:03,520 --> 01:13:07,490 as you visualize doing this experiment and taking 1237 01:13:07,490 --> 01:13:15,000 additional observations, what happens is the probability 1238 01:13:15,000 --> 01:13:19,480 that you exceed na-- 1239 01:13:19,480 --> 01:13:25,310 that the n-th sum exceeds n times some fixed quantity a is 1240 01:13:25,310 --> 01:13:26,950 going down exponentially with n. 1241 01:13:29,450 --> 01:13:32,100 Now, is this bound any good? 1242 01:13:35,150 --> 01:13:39,970 Well, if you optimize it over r, it's essentially 1243 01:13:39,970 --> 01:13:41,730 exponentially tight. 1244 01:13:41,730 --> 01:13:45,210 So, in fact, it is good. 1245 01:13:45,210 --> 01:13:48,030 What does it mean to be exponentially tight? 1246 01:13:48,030 --> 01:13:50,680 That's what I don't want to define carefully. 1247 01:13:50,680 --> 01:13:53,540 There's a theorem in the notes that says what exponentially 1248 01:13:53,540 --> 01:13:54,850 tight means. 1249 01:13:54,850 --> 01:13:58,250 And it takes you half an hour to read it because it's being 1250 01:13:58,250 --> 01:14:00,110 stated very carefully. 1251 01:14:00,110 --> 01:14:08,430 What it says essentially is that if I take this quantity 1252 01:14:08,430 --> 01:14:14,510 here and I subtract-- 1253 01:14:14,510 --> 01:14:16,710 I take an epsilon away from it. 1254 01:14:16,710 --> 01:14:22,510 Namely, e to the n times this quantity minus epsilon. 1255 01:14:22,510 --> 01:14:25,600 So I have an e to the minus n epsilon, see 1256 01:14:25,600 --> 01:14:26,720 it sitting in there? 1257 01:14:26,720 --> 01:14:31,170 When I take this exponent and I reduce it just a little bit, 1258 01:14:31,170 --> 01:14:33,120 I get a bound that isn't true. 1259 01:14:33,120 --> 01:14:35,850 This is greater than or equal to the 1260 01:14:35,850 --> 01:14:37,860 quantity with an epsilon. 1261 01:14:37,860 --> 01:14:40,490 In other words, you can't make an exponent that's any 1262 01:14:40,490 --> 01:14:42,350 smaller than this. 1263 01:14:42,350 --> 01:14:45,690 You can take coefficients and play with them, but you can't 1264 01:14:45,690 --> 01:14:48,750 make the exponent any smaller. 1265 01:14:48,750 --> 01:14:55,490 OK, all of these things, you can do them by pictures. 1266 01:14:55,490 --> 01:14:58,560 I know many of you don't like doing things by pictures. 1267 01:14:58,560 --> 01:15:02,060 I keep doing them by pictures because I keep trying to 1268 01:15:02,060 --> 01:15:05,820 convince you that pictures are more rigorous 1269 01:15:05,820 --> 01:15:07,750 than equations are. 1270 01:15:07,750 --> 01:15:10,850 At least, many times. 1271 01:15:10,850 --> 01:15:13,690 If you want to show that something is convex, you try 1272 01:15:13,690 --> 01:15:17,800 to show that the second derivative is positive. 1273 01:15:17,800 --> 01:15:20,640 That works sometimes and it doesn't work sometimes. 1274 01:15:20,640 --> 01:15:23,430 I mean, it works if the function is continuous and has a 1275 01:15:23,430 --> 01:15:25,450 continuous second derivative. 1276 01:15:25,450 --> 01:15:27,850 It doesn't work otherwise.
1277 01:15:27,850 --> 01:15:33,280 When you start taking tangents of the curve, and you say the 1278 01:15:33,280 --> 01:15:40,560 tangents to the curve all lie below the 1279 01:15:40,560 --> 01:15:42,640 function, then it works perfectly. 1280 01:15:42,640 --> 01:15:44,800 That's what a convex function is by definition. 1281 01:15:48,210 --> 01:15:49,810 How do we derive all this stuff? 1282 01:15:52,350 --> 01:15:56,320 What we're trying to do is to find-- 1283 01:15:56,320 --> 01:16:04,990 I mean, this inequality here is true for all r, for all r 1284 01:16:04,990 --> 01:16:09,710 greater than 0 so long as a is greater than the mean of X. 1285 01:16:09,710 --> 01:16:12,970 It's true for all r for which this moment-generating 1286 01:16:12,970 --> 01:16:15,020 function exists. 1287 01:16:15,020 --> 01:16:18,400 Moment-generating functions can sometimes blow up, so they 1288 01:16:18,400 --> 01:16:21,060 don't exist everywhere. 1289 01:16:21,060 --> 01:16:22,270 So it's true wherever the 1290 01:16:22,270 --> 01:16:25,140 moment-generating function exists. 1291 01:16:25,140 --> 01:16:29,890 So we like to find the r for which this bound is tightest. 1292 01:16:29,890 --> 01:16:33,240 So what I'm going to do is draw a picture and show you 1293 01:16:33,240 --> 01:16:37,160 where it's tightest in terms of the picture. 1294 01:16:37,160 --> 01:16:40,380 What I've drawn here is the semi-invariant 1295 01:16:40,380 --> 01:16:43,240 moment-generating function. 1296 01:16:43,240 --> 01:16:47,580 Why didn't I put that down? 1297 01:16:47,580 --> 01:16:51,130 This is gamma of r. 1298 01:16:51,130 --> 01:16:55,630 Gamma of r at 0, it's the log of the moment-generating 1299 01:16:55,630 --> 01:16:58,500 function at 0, which is 0. 1300 01:17:01,090 --> 01:17:03,000 It's convex. 1301 01:17:03,000 --> 01:17:06,190 You take its second derivative. 1302 01:17:06,190 --> 01:17:09,180 Its second derivative at r equals 0 is pretty easy. 1303 01:17:09,180 --> 01:17:12,250 Its second derivative at other values of r, you have to 1304 01:17:12,250 --> 01:17:13,500 struggle with it. 1305 01:17:16,010 --> 01:17:20,200 But when you struggle a little bit, it is convex. 1306 01:17:20,200 --> 01:17:23,770 If you've got a curve that goes down like this, then it 1307 01:17:23,770 --> 01:17:25,800 goes back up again. 1308 01:17:25,800 --> 01:17:28,100 Sometimes goes off towards infinity. 1309 01:17:28,100 --> 01:17:30,950 Might do whatever it wants to do. 1310 01:17:30,950 --> 01:17:34,790 Sometimes at a certain value of r, it stops existing. 1311 01:17:34,790 --> 01:17:37,750 Suppose I take the simplest random 1312 01:17:37,750 --> 01:17:39,800 variable you know about. 1313 01:17:39,800 --> 01:17:43,440 You only know two simple random variables. 1314 01:17:43,440 --> 01:17:46,030 One of them is a binary random variable. 1315 01:17:46,030 --> 01:17:49,420 The other one's an exponential random variable. 1316 01:17:49,420 --> 01:17:54,330 Suppose I take the exponential random variable with density 1317 01:17:54,330 --> 01:17:59,020 alpha times e to the minus alpha X. Where does this 1318 01:17:59,020 --> 01:18:02,580 moment-generating function exist? 1319 01:18:02,580 --> 01:18:16,220 You take alpha e to the minus alpha x and multiply it by e to the rx when you 1320 01:18:16,220 --> 01:18:17,470 integrate it. 1321 01:18:20,940 --> 01:18:22,190 Where does this exist? 1322 01:18:25,100 --> 01:18:27,110 I mean, don't bother to integrate it.
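For the record, the integral he tells you not to bother with works out to:

```latex
g_X(r) \;=\; \int_0^{\infty} \alpha\, e^{-\alpha x}\, e^{rx}\, dx
       \;=\; \int_0^{\infty} \alpha\, e^{-(\alpha - r)x}\, dx
       \;=\; \frac{\alpha}{\alpha - r},
\qquad r < \alpha ,
```

and the integral diverges for r greater than or equal to alpha, which is exactly the cutoff he describes next.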
1323 01:18:31,910 --> 01:18:36,630 If r is bigger than alpha, this exponent is bigger than 1324 01:18:36,630 --> 01:18:37,890 this exponent. 1325 01:18:37,890 --> 01:18:40,110 And this thing takes off towards infinity. 1326 01:18:40,110 --> 01:18:43,715 If r is less than alpha, the whole thing goes to 0. 1327 01:18:49,220 --> 01:18:59,020 gX of r exists for r less than alpha in this case. 1328 01:19:01,620 --> 01:19:06,290 And in general, if you look at a moment-generating function, 1329 01:19:06,290 --> 01:19:11,140 if the tail of that distribution function is going 1330 01:19:11,140 --> 01:19:14,880 to 0 exponentially, you find the rate at which it's going 1331 01:19:14,880 --> 01:19:16,980 to 0 exponentially. 1332 01:19:16,980 --> 01:19:18,650 And that's where the moment-generating 1333 01:19:18,650 --> 01:19:21,810 function cuts off. 1334 01:19:21,810 --> 01:19:23,630 It has to cut off. 1335 01:19:23,630 --> 01:19:27,070 You can't show a result like this, which says something is 1336 01:19:27,070 --> 01:19:30,710 going to 0, faster than it could possibly be going to 0. 1337 01:19:33,600 --> 01:19:35,460 So we have to have that kind of result. 1338 01:19:35,460 --> 01:19:37,760 But anyway, we draw this curve. 1339 01:19:37,760 --> 01:19:40,350 This is gamma sub X of r. 1340 01:19:40,350 --> 01:19:46,520 And then we say, how do we graphically minimize gamma of 1341 01:19:46,520 --> 01:19:50,900 r minus r times a? 1342 01:19:50,900 --> 01:19:57,140 Well, what I do because I've done this before and I know 1343 01:19:57,140 --> 01:19:59,090 how to do it-- 1344 01:19:59,090 --> 01:20:01,580 I mean, it's not the kind of thing where if you sat down 1345 01:20:01,580 --> 01:20:05,670 you would immediately settle on this. 1346 01:20:05,670 --> 01:20:09,710 I look at some particular value of r. 1347 01:20:09,710 --> 01:20:16,370 If I take a line of slope gamma prime of r, that's a 1348 01:20:16,370 --> 01:20:19,920 tangent to this curve because this curve is convex. 1349 01:20:19,920 --> 01:20:24,380 So if I take a line through here of this slope and I look 1350 01:20:24,380 --> 01:20:29,210 at where this line hits here, where does it hit? 1351 01:20:29,210 --> 01:20:34,320 It hits at gamma sub X of r, this point here, 1352 01:20:34,320 --> 01:20:38,230 minus r times gamma prime of r-- 1353 01:20:41,880 --> 01:20:43,130 oh. 1354 01:20:50,220 --> 01:20:55,100 Well, what I've done is I've already optimized the problem. 1355 01:20:55,100 --> 01:20:58,600 I'm trying to find the probability that Sn is greater 1356 01:20:58,600 --> 01:20:59,950 than or equal to na. 1357 01:20:59,950 --> 01:21:03,870 I'm trying to minimize this exponent here, gamma 1358 01:21:03,870 --> 01:21:06,890 X of r minus ra. 1359 01:21:06,890 --> 01:21:10,420 Unfortunately, I really start out by taking the derivative 1360 01:21:10,420 --> 01:21:13,260 of that and setting it equal to 0, which is what you would 1361 01:21:13,260 --> 01:21:15,120 all do, too. 1362 01:21:15,120 --> 01:21:19,330 When I set the derivative of this equal to 0, I get gamma 1363 01:21:19,330 --> 01:21:26,010 prime of r minus a is equal to 0, which is what this says. 1364 01:21:26,010 --> 01:21:31,540 So then we take a line of slope a, where gamma prime of r0 equals a. 1365 01:21:31,540 --> 01:21:34,130 It's tangent at this point here. 1366 01:21:34,130 --> 01:21:37,500 You look at this point over here and you get the minimum 1367 01:21:37,500 --> 01:21:41,290 value of gamma X of r minus ra, namely gamma X of r0 minus r0 a.
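Here is a numeric sketch of that minimization in Python, together with a check of the earlier exponential-tightness claim. The model and all the numbers are assumptions for illustration, not from the lecture: X is plus 1 with probability p and minus 1 with probability 1 - p.

```python
import math

# Assumed numbers for illustration: the mean of X is 2p - 1 = -0.5 < a.
p, n, a = 0.25, 100, 0.2

def gamma(r):
    """Semi-invariant MGF of X: the natural log of E[e^{rX}]."""
    return math.log(p * math.exp(r) + (1 - p) * math.exp(-r))

# Crude grid search for the r0 > 0 minimizing gamma(r) - r*a;
# at the minimizer, gamma'(r0) = a (the tangent condition above).
r0 = min((k / 1000 for k in range(1, 5000)),
         key=lambda r: gamma(r) - r * a)
bound = math.exp(n * (gamma(r0) - r0 * a))

# Exact tail for comparison: S_n >= n*a exactly when at least
# ceil(n(1 + a)/2) of the n steps are +1.
k_min = math.ceil(n * (1 + a) / 2)
exact = sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
            for j in range(k_min, n + 1))

print(f"Chernoff bound = {bound:.3e}, exact tail = {exact:.3e}")
```

The bound comes out above the exact tail, but only by a sub-exponential factor, which is what exponentially tight means a few paragraphs back.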
1368 01:21:45,370 --> 01:21:49,920 So what this says is, when you vary a, you can go through 1369 01:21:49,920 --> 01:21:57,440 this minimization by tilting this line around. 1370 01:21:57,440 --> 01:22:02,180 I mean, a determines the slope of this line here. 1371 01:22:02,180 --> 01:22:06,930 If I use a smaller value of a, the slope is smaller. 1372 01:22:06,930 --> 01:22:08,470 It hits in here. 1373 01:22:08,470 --> 01:22:14,220 If I take a larger value of a, it comes in further down and 1374 01:22:14,220 --> 01:22:15,500 the exponent gets bigger in magnitude. 1375 01:22:15,500 --> 01:22:16,840 That's not surprising. 1376 01:22:16,840 --> 01:22:19,700 I want to find out the probability that S sub n is 1377 01:22:19,700 --> 01:22:21,670 greater than or equal to na. 1378 01:22:21,670 --> 01:22:26,350 As I increase a, I expect this exponent to keep going down as 1379 01:22:26,350 --> 01:22:29,390 I make a bigger and bigger because it's harder and harder 1380 01:22:29,390 --> 01:22:33,700 for it to be greater than or equal to na. 1381 01:22:33,700 --> 01:22:37,570 So anyway, when you optimize this, you get something 1382 01:22:37,570 --> 01:22:39,720 exponentially tight. 1383 01:22:39,720 --> 01:22:42,540 And this is what it's equal to. 1384 01:22:42,540 --> 01:22:46,560 And I would recommend that you go back and read the section 1385 01:22:46,560 --> 01:22:50,100 of chapter 1, which goes through all of this in a 1386 01:22:50,100 --> 01:22:51,350 little more detail. 1387 01:22:56,820 --> 01:23:00,600 Let me go past that. 1388 01:23:00,600 --> 01:23:03,800 Don't want to talk about that. 1389 01:23:03,800 --> 01:23:09,640 Well, when I do this optimization, if what I'm 1390 01:23:09,640 --> 01:23:13,210 looking at is the probability that S sub n is greater than 1391 01:23:13,210 --> 01:23:17,000 or equal to some alpha rather than n times a when I do 1392 01:23:17,000 --> 01:23:20,120 this optimization and I'm looking at what happens at 1393 01:23:20,120 --> 01:23:24,170 different values of n, it turns out that when n is very 1394 01:23:24,170 --> 01:23:31,770 big, you get something which is tangent there. 1395 01:23:31,770 --> 01:23:36,340 As n gets smaller, you get these tangents that come down, 1396 01:23:36,340 --> 01:23:38,620 come in to there, and then they start 1397 01:23:38,620 --> 01:23:40,330 going back out again. 1398 01:23:40,330 --> 01:23:47,800 This e to the minus alpha r star is the tightest the bound ever gets. 1399 01:23:47,800 --> 01:23:54,390 That's the n at which errors in the hypothesis testing 1400 01:23:54,390 --> 01:23:57,400 usually occur. 1401 01:23:57,400 --> 01:23:59,120 It's the point at which-- 1402 01:23:59,120 --> 01:24:02,740 it's the n for which Sn greater than or equal to alpha 1403 01:24:02,740 --> 01:24:06,240 is most likely to occur. 1404 01:24:06,240 --> 01:24:12,500 And if you evaluate that for our friendly binary case 1405 01:24:12,500 --> 01:24:18,290 again, X equals 1 or X equals minus 1, what you find when 1406 01:24:18,290 --> 01:24:25,060 you evaluate that point alpha r star is that r star is equal 1407 01:24:25,060 --> 01:24:29,595 to log of 1 minus P over P. And our bound on the probability of the union 1408 01:24:29,595 --> 01:24:33,830 of the events Sn greater than or equal to alpha is approximately e to 1409 01:24:33,830 --> 01:24:38,030 the minus alpha r star, which is 1 minus P over P 1410 01:24:38,030 --> 01:24:40,690 to the minus alpha. 1411 01:24:40,690 --> 01:24:43,600 I mean, why do I torture you with this?
1412 01:24:43,600 --> 01:24:46,570 Because we solved this problem at the beginning of the 1413 01:24:46,570 --> 01:24:47,760 lecture, remember? 1414 01:24:47,760 --> 01:24:54,540 The probability that the sum S sub n for this binary 1415 01:24:54,540 --> 01:24:59,710 experiment is ever greater than or equal to k is equal to 1 minus 1416 01:24:59,710 --> 01:25:02,130 P over P to the minus k. 1417 01:25:02,130 --> 01:25:04,640 That's what it's equal to exactly. 1418 01:25:04,640 --> 01:25:09,960 When I go through all of this Chernoff bound stuff, I get 1419 01:25:09,960 --> 01:25:11,790 the same answer. 1420 01:25:11,790 --> 01:25:14,950 Now, this is a much harder way to do it, but this is a 1421 01:25:14,950 --> 01:25:16,240 general way of doing it. 1422 01:25:16,240 --> 01:25:18,340 And that's a very specialized way of doing it. 1423 01:25:18,340 --> 01:25:20,250 So we'll talk more about this next time.
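Filling in the algebra behind that comparison for the binary case, X = +1 with probability p and -1 with probability 1 - p:

```latex
\gamma_X(r) = \ln\bigl(p\,e^{r} + (1-p)\,e^{-r}\bigr),
\qquad
\gamma_X(r^*) = 0 \;\Longrightarrow\; p\,e^{r^*} + (1-p)\,e^{-r^*} = 1 .
```

Multiplying by e to the r star gives a quadratic in e to the r star with roots 1 and (1 - p)/p; the nontrivial root gives r star = ln((1 - p)/p), and hence

```latex
e^{-k r^*} \;=\; \Bigl(\frac{1-p}{p}\Bigr)^{-k} \;=\; \Bigl(\frac{p}{1-p}\Bigr)^{k},
```

matching the exact threshold-crossing probability he just quoted.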