The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, we have a busy day today, so let's get started. I want to go through Chernoff bounds and the Wald identity, which are closely related, as you'll see, and that involves coming back to the G/G/1 queue a little bit and making use of what we did for that. It also means coming back to hypothesis testing and using that. It would probably have been better to start out with Wald's identity and the Chernoff bound and then do the applications when it was the natural time for them. But anyway, this is the way it is this time, and next time we'll probably do it differently.

Suppose you have a random variable z, and it has a moment generating function. Remember, not all random variables have moment generating functions. It's a pretty strong restriction. You need a variance. You need moments of all orders. You need all sorts of things, but we'll assume it exists in some region between r minus and r plus. There's always a question, with moment generating functions, of whether they exist at that maximum value of r, because some of them exist at that value of r and then disappear immediately after it, and others just sort of peter out as r approaches r plus from below. I think in the homework this week, you have an example of both of those. I mean, it's a very simple issue. If you have an exponential distribution, then as r approaches the rate of that exponential distribution, obviously, the moment generating function blows up, because you're taking e to the minus lambda x, and you're multiplying it by e to the r x. And when r is equal to lambda, bingo, you're integrating a constant over an infinite range, so you've got infinity.
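Written out, that exponential calculation is

```latex
g_X(r) \;=\; \int_0^\infty \lambda e^{-\lambda x}\, e^{rx}\, dx
      \;=\; \frac{\lambda}{\lambda - r}, \qquad r < \lambda,
```

so g_X(r) blows up as r approaches r plus = lambda from below, and at r = lambda the integrand no longer decays, so the integral is infinite.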
If you multiply that exponential by something which makes the integral finite when you set r equal to lambda, then of course, you have something which is finite at r plus. That is a big pain in the neck. It's usually not important. The notes deal with it very carefully, so we're not going to deal with it here. We will just assume here that we're talking about r less than r plus and not worry about that special case, which usually is not all that important. But sometimes you have to worry about it.

OK, the Chernoff bound says that the probability that a random variable z is greater than or equal to alpha is less than or equal to the moment generating function evaluated at some arbitrary value r, times e to the minus r alpha. That's just the Markov inequality applied to the random variable e to the r z. And if you put it in terms of the semi-invariant moment generating function, the log of the moment generating function, then the bound is e to the gamma sub z of r minus r alpha.

When you see something like that, you ought to look at it and say, gee, that looks funny, because here we're taking an arbitrary random variable and saying the tails of it have to go down exponentially. That's exactly what this says. It says that as z takes on very large values, this is a fixed quantity here for a given value of r, and the bound is going down as e to the minus r times alpha. As you make alpha larger and larger, this goes down faster and faster. So what's going on? How do you take an arbitrary random variable and say the tails of it are exponentially decreasing? That's why you have to insist that the moment generating function exists, because when the moment generating function exists for some r, it means that the tail of that distribution is, in fact, going down at least that fast, so you get something that exists. So the question is, what's the best bound of this sort when you optimize over r?

Then the next thing we did is we said, if z is a sum of n IID random variables, then the semi-invariant moment generating function for that sum is equal to n times the semi-invariant moment generating function for the underlying random variable x.
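In symbols, with S_n = X_1 + ... + X_n and the X_i IID:

```latex
\gamma_{S_n}(r) \;=\; \ln \mathsf{E}\!\left[e^{r(X_1+\cdots+X_n)}\right]
\;=\; \ln \prod_{i=1}^{n} \mathsf{E}\!\left[e^{rX_i}\right]
\;=\; n\,\gamma_X(r).
```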
Here s sub n is the sum of n of these IID random variables. So one thing you see immediately, and it ought to be second nature to you now, is that if a random variable has a moment generating function over some range, the sum of a bunch of those IID random variables also has a moment generating function over that same range. You can just count on that, because the semi-invariant moment generating function for the sum is just n times the one for a single variable.

OK, so then what we've said is the probability that s sub n is greater than or equal to na, where na is playing the role of alpha and s sub n is playing the role of z, is just a minimum over r of e to the n times gamma sub x of r minus ra, and the n is multiplying the ra as well as the gamma sub x of r. OK, this is exponential in n for a fixed a. In other words, what do you do in this minimization, if you don't worry about the special cases or anything? How do you minimize something? Well, obviously, you want to minimize the exponent here, so you take the derivative: gamma prime of r has to be equal to a. Then n can be whatever it wants to be. When you find that optimum r, which is where gamma prime of r equals a, this goes down exponentially with n.

Now, however, we're interested in something else. We're interested in threshold crossings. We're not interested in picking a particular value of a and asking, as n gets very, very big, what's the probability that the sum of random variables is greater than or equal to n times a. That is exponential in n, but what we're interested in is the probability that s of n is greater than or equal to just some constant alpha, and what we're doing now is, instead of varying n and varying this threshold with n also, we're holding the threshold fixed. So we're asking, as n gets very, very large, but you hold this alpha fixed, what happens to this bound over here? Well, when you minimize this, taking the same simple-minded view, now the n is not multiplying the r alpha. It's just multiplying the gamma sub x of r. You get n times gamma prime of r equal to alpha at the minimum, so it says the exponent is optimized when you pick gamma prime of r equal to alpha over n.
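These minimizations are easy to carry out numerically. Here is a minimal sketch of the fixed-a version, assuming, purely for illustration, that X is Bernoulli(p); neither the model nor the numbers come from the lecture:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative sketch: optimize the Chernoff exponent for
# S_n = X_1 + ... + X_n with X_i ~ Bernoulli(p), i.e. minimize
# gamma(r) - r*a over r > 0, so P(S_n >= n*a) <= exp(n*(gamma(r) - r*a)).
p = 0.25   # hypothetical success probability, E[X] = p
a = 0.5    # per-sample threshold; must exceed E[X] for a useful bound

def gamma(r):
    # semi-invariant MGF of Bernoulli(p): ln E[e^{rX}]
    return np.log(1 - p + p * np.exp(r))

res = minimize_scalar(lambda r: gamma(r) - r * a,
                      bounds=(0.0, 20.0), method="bounded")
print(f"optimizing r (where gamma'(r) = a): {res.x:.4f}")
print(f"optimized exponent gamma(r) - r*a: {res.fun:.4f}")   # negative
```

For the Bernoulli case the optimized exponent comes out to minus the relative entropy between Bernoulli(a) and Bernoulli(p), which is one standard way of seeing that the Chernoff bound has the right exponential rate.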
This quantity is minimized when you pick gamma prime of r equal to alpha over n. So if you look at this bound as n changes, what's happening is, as n changes, r is changing also, so this is a harder thing to deal with for variable n. But graphically, it's quite easy to deal with. I'm not sure you all got the graphical argument last time when we went through it, so I want to go through it again.

Let's look at this exponent, r minus n over alpha times gamma of r, and see what it looks like. We'll take r, pick any old r, there. What we want to do is show that if you take a slope of alpha over n, and take an arbitrary r, come down to gamma sub x of r, draw a line with this slope, and look at where it hits the horizontal axis, that point is r plus the length of this segment here. The length of this segment is minus gamma of r -- gamma of r is a negative value -- times 1 over the slope of this line. And 1 over the slope of this line is n over alpha, so when I pick a particular value of r, the value of the exponent I have is this value here.

How do I optimize this over r? How do I get the largest exponent here? Well, I think of varying r. As I vary r from 0, each time, I take this straight line here. And I start here, draw a straight line over there, start here, draw a straight line over, start at this tangent here, draw a straight line over. And what happens when I come to larger values of r? Just because gamma sub x of r is convex, what happens is I start taking these slope lines, slope alpha over n, and they intercept the horizontal axis at a smaller value. So this is optimized over r at the value of r0 which satisfies alpha over n equals gamma prime of r0. That's the same answer we got before when we just used elementary calculus. Here, we're using a more sophisticated argument, which you learned about probably in 10th grade. I would argue that you learn mostly really sophisticated things when you're in high school, and then when you get to study engineering in college, somehow you always study these mundane things.
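Putting that graphical argument into symbols:

```latex
\Pr\{S_n \ge \alpha\} \;\le\; \exp\bigl(n\,\gamma_X(r) - r\alpha\bigr)
\;=\; \exp\!\Bigl(-\alpha\,\Bigl[\,r - \tfrac{n}{\alpha}\,\gamma_X(r)\Bigr]\Bigr),
```

and the bracketed quantity is exactly where the line of slope alpha over n through the point (r, gamma_X(r)) hits the horizontal axis, so minimizing the bound over r is the same as maximizing that intercept.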
But anyway, aside from that, why is this geometric argument better? Well, what happens when you look at these special cases where gamma of r comes around like this, and then suddenly it stops in midair and just doesn't exist anymore? So it comes around here, it's still convex, but then suddenly it goes off to infinity. How do you do that optimization then? Well, the graphical argument makes it clear how you do it, and makes it perfectly rigorous, whereas if you're doing it by calculus, you've got to really think it through, and it becomes fairly tricky.

OK, so anyway, now, the next question we want to ask-- I mean, at this point, we've seen how to minimize this quantity over r, so we know what this exponent is for a particular value of n. Now, what happens when we vary n? As you vary n, the thing that happens is we have this tangent line here, with slope alpha over n. When you start making n larger, alpha over n becomes smaller, so the slope becomes smaller. And as n approaches infinity, you wind up going way, way the heck out. As n gets smaller, you come in again. You keep coming in until you get to this point here. And what happens then? We're talking about a line of-- maybe I ought to draw it on the board. It would be clearer, I think.

As n gets smaller, you get a point which is tangent here, this here. When you're here, the tangent gets right here, so we've moved all the way into this quantity we call r star, which is the root of the equation gamma of r equals 0. Gamma of r equals 0 typically has two roots, one here, and one at 0. It always has a root at 0, because the moment generating function evaluated at 0 is always 1, so the log of it is always 0. There should be another root because this is convex, unless it drops off suddenly, and even if it drops off suddenly, you can visualize it as a straight line going off to infinity. So when you get down to this point, what happens? Well, we just keep moving along.
So as n increases, we start out very large. We come in. We hit this point, and then we start coming out again. I mean, if you think about it, that makes perfect sense, because what we're doing here is we're imagining an experiment where this random variable has a negative expected value. That's what's indicated by this quantity there. We're asking, what's the probability that the sum of a large number of IID random variables with a negative expected value ever rises above some positive threshold?

Well, the law of large numbers says it's not going to do that when n is very, very large, and this says that, too. It says the probability of it for n very large is extraordinarily small. It's e to the minus n times an exponent, which is very, very large. And when n is very small, it's not going to happen either, because it doesn't have time to get to the threshold. So there's some intermediate value of n at which it's most likely to cross the threshold, if you're going to cross the threshold, and that intermediate value is determined by this point r star at which gamma of r equals zero.

So the probability of this union of terms, namely the probability you ever cross alpha, is going to be, in some sense, approximately e to the minus alpha r star, because that's where the dominant term is. The dominant term is where alpha over n is equal to gamma prime. Blah, blah, blah, blah, blah, where'd I put that? r star satisfies gamma of r star equals 0. When you look at the line of slope gamma prime of r star, that's where you get this critical value of n where it's most likely to cross the threshold. OK, I put that somewhere. I thought it was on this slide, but the critical n, let's call it n crit-- is that right? Alpha over n is the gamma prime. Alpha over n crit, that is. n crit, this says, is alpha over gamma prime of r star.

OK, so that sort of nails down everything you want to know about the Chernoff bound except for the fact that it is exponentially tight. The text proves that.
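Here is that computation as a small numerical sketch; the numbers are hypothetical, and Gaussian increments are assumed so that gamma has a closed form:

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative sketch: for X ~ N(mu, sigma^2) with mu < 0,
# gamma(r) = mu*r + sigma^2 * r^2 / 2.  Find the positive root r* of
# gamma(r) = 0 and the critical n at which crossing alpha is most likely.
mu, sigma, alpha = -0.5, 1.0, 20.0

gamma = lambda r: mu * r + 0.5 * (sigma * r) ** 2
gamma_prime = lambda r: mu + sigma**2 * r

r_star = brentq(gamma, 1e-9, 100.0)    # the nonzero root (the other root is r = 0)
n_crit = alpha / gamma_prime(r_star)   # n_crit = alpha / gamma'(r*)
print(f"r* = {r_star:.3f}")            # here r* = -2*mu/sigma^2 = 1.0
print(f"n_crit = {n_crit:.1f}")        # here alpha/(-mu) = 40
print(f"dominant-term bound e^(-r* alpha) = {np.exp(-r_star * alpha):.2e}")
```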
I'm not going to go through the tightness proof here. Exponentially tight means that if you take an exponent which is just a little bit larger than the one you found here, and look at what happens as alpha gets very, very large, then you lose.

OK, let's go on. At this point, we're ready to talk about Wald's identity, and we'll prove Wald's identity at the end of the lecture today. It turns out there's a very, very simple proof of it. There's hardly anything to it, but it seems more important to use it in several ways first, so that you get a sense that it, in fact, is sort of important.

OK, so we want to think about a random walk, s sub n for n greater than or equal to 1, so it's a sequence of sums of random variables: s sub n is equal to x1 plus up to x sub n. The x's are all IID. This is the thing we've been talking about all term. We have a bunch of IID random variables. We look at the partial sums of them. We're interested in what happens to that sequence of partial sums. The question we're asking here is, does that sequence of partial sums ever cross a positive threshold? And now we're asking, does it ever cross a positive threshold, or does it cross a negative threshold, and which does it cross first? So the probability that it crosses this threshold is the probability that it goes up first. The probability that it crosses this threshold is the probability that it goes down first.

Now, what Wald's identity says is the following thing. We're going to assume that x is not identically 0. If x is identically 0, then it's never going to go any place. We're going to assume that it has a semi-invariant moment generating function in some region, r minus to r plus. That's the same as assuming that it has a moment generating function in that region, so it exists from some value less than zero to some value greater than zero. And we pick two thresholds, one of them positive, one of them negative, and we let j be the smallest value of n at which one of them is crossed. j is a random variable now, because we've started to run this random walk.
We run it until it crosses one of these thresholds, and if it crosses the positive threshold, j is the time at which it crosses the positive threshold. If it crosses the negative threshold, j is the time at which it crosses the negative threshold. We're only looking at the first threshold that it crosses.

Now, notice that j is a stopping trial. In other words, what that means is you can determine whether you've crossed a threshold at time n solely in terms of s1 up to s sub n. If you see all these sums, then you know whether you haven't crossed a threshold up until time n, or whether you have crossed it at time n. It doesn't make any difference what happens at times greater than n. OK, so it's a stopping trial in the same sense as the stopping trials we talked about before. You get the sense that Wald's identity, which we're talking about here, is sort of like Wald's equality, which we talked about before. Both of them have to do with these stopping trials. Both of them have everything to do with stopping trials.

Wald was a famous statistician, not all that much before your era. He didn't die too long ago. I forget when, but he was one of the good statisticians. See, he was a statistician who recognized that you wanted to look at lots of different models to understand the problem, rather than a statistician who only wanted to take data and think that he wasn't assuming anything. So Wald was a good guy.

And the trouble with his identity is, you look at it, and you blink. It involves the expected value of e to the r s sub j minus j gamma of r, where s sub j is the value of the random walk at the time when you cross a threshold, and you subtract the time j at which you've crossed the threshold, times gamma of r. So when you take the expected value of e to this, you're averaging over j, the time at which you crossed the threshold, and also over the value at which you crossed the threshold, so you're averaging over both of those things.
And Wald says this expectation is not just less than or equal to 1; it's exactly 1, and it's exactly 1 for every r between r minus and r plus. So it's a very surprising result. Yes?

AUDIENCE: Can you please explain why j cannot be defective? I don't really see it.

PROFESSOR: Oh, it's because we were looking at two thresholds. If we only had one threshold, then it could be defective. Since we're looking at two thresholds, you keep adding random variables in, and the sum starts to have a larger and larger variance. Now, even with a large variance, you're not sure that you crossed a threshold, but you see why you must cross a threshold. Yes?

AUDIENCE: If the MGF is defined at r minus, r plus, then is that also [INAUDIBLE] quality?

PROFESSOR: Yes. Oh, if it's defined at r plus. I don't know. I don't remember, and I would have to think about it hard. Funny things happen right at the ends of where these moment generating functions are defined, and you'll see why when we prove it.

I can give you a clue as to how we're going to prove it. What we're going to do is, for this random variable x, we're going to define another random variable which has the same distribution as x, except it's tilted. For large values of x, you multiply the density by e to the rx. For small values of x, you multiply it by e to the rx also. But if r is positive, that means the probability on large values gets shifted up, and the probability on small values gets shifted down. So you're taking a density that looks like this, and when you shift it to this tilted version, you're shifting the whole thing upward. When r is negative, you're shifting the whole thing downward. Now, with that tilted random variable, when it crosses the threshold, the time of crossing the threshold is still a random variable.
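The tilting being described is the standard exponential twist: if x has density f_X, the tilted variable has density

```latex
f_{X,r}(x) \;=\; f_X(x)\, e^{\,rx \,-\, \gamma_X(r)},
```

which integrates to 1 precisely because e to the gamma_X(r) is the expected value of e to the rX, the normalizing constant.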
You will see that this simply says that the expected value of that tilted random variable is equal to-- it says that the tilted random variable is, in fact, a genuine random variable. It's not defective. And it's the same argument as before: it has a finite variance, and therefore, since it has a finite variance, the walk keeps spreading out. It will cross one of the thresholds eventually.

OK, so the other thing you can do here is to say, suppose instead of crossing a threshold, you just fix the stopping rule to say we'll stop at time 100. If you stop at time 100, then what this says is the expected value of e to the r s sub 100 minus 100 times gamma of r is equal to 1. But that's obvious, because the expected value of e to the r s sub j, for fixed j, is the expected value of e to the r x raised to the jth power, so then you're subtracting off j times the log of the expected value of e to the r x. So it's a trivial identity if j is fixed.

OK, so Wald's identity says this. Let's see what it means in terms of crossing a threshold. We'll assume both thresholds are there. Incidentally, Wald's identity is valid in a much broader range of circumstances than just where you have two thresholds and you're looking at a threshold crossing. It's just that that's a particularly valuable form of the Wald identity, so that's the only thing we're going to use.

But now, if we assume further that this random variable x has a negative expectation, then gamma of r starts off going down. Usually, it comes back up again. We're going to assume that this quantity r star here, where it crosses 0 again, exists; namely, we're going to assume there is some value r star for which gamma of r star equals 0. That is, we're going to assume the typical case in which the curve comes back up and crosses the 0 point. And in that case, what it says is the probability that s sub j is greater than or equal to alpha is less than or equal to e to the minus r star times alpha. A very, very simple bound at this point.
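Since the identity looks so surprising, a quick Monte Carlo sanity check may help. This sketch is not from the lecture; it assumes hypothetical Gaussian increments and thresholds:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of Wald's identity, E[exp(r*S_J - J*gamma(r))] = 1,
# for an illustrative walk: X ~ N(mu, 1) with mu < 0, thresholds alpha, beta.
mu, alpha, beta, r = -0.5, 5.0, -5.0, 0.5
gamma = lambda r: mu * r + 0.5 * r * r   # semi-invariant MGF of N(mu, 1)

vals = []
for _ in range(20000):
    s, j = 0.0, 0
    while beta < s < alpha:              # run until a threshold is crossed
        s += rng.normal(mu, 1.0)
        j += 1
    vals.append(np.exp(r * s - j * gamma(r)))

print(f"E[exp(r*S_J - J*gamma(r))] ~ {np.mean(vals):.3f}")   # close to 1
```

The average hovers near 1 for any r strictly inside the region where gamma exists, though the estimate gets noisier as r grows, since rare up-crossings then carry very large weights.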
You look at this bound, and you sort of see why we're looking now not at r in general, but just at r star. At r star, gamma of r star is equal to 0, so that term goes away, and we're only talking about the expected value of e to the r star s sub j, which is equal to 1. So let's see what happens. We know that e to the r star s sub j is greater than or equal to 0 for all values of s sub j, because e to anything real is going to be positive. OK, since e to the r star s sub j is greater than or equal to 0, what we can do is break this expected value here -- this term is 0 now, remember -- into two terms. Break it into the term where s sub j is greater than or equal to alpha, and the term where s sub j is less than or equal to beta. I'm just going to ignore the case where it's less than or equal to beta. I'm going to take this expected value, and I'm going to write it as the probability that s sub j is greater than or equal to alpha, times the expected value of e to the r star s sub j given s sub j greater than or equal to alpha. There should be another term in here to make this an equality, and that's the probability that s sub j is less than or equal to beta, times the expected value of e to the r star s sub j given that s sub j is less than or equal to beta. We're going to ignore that, and that's why we get the less than or equal to 1 here.

Now, you can lower bound e to the r star s sub j under this condition. What's a lower bound to s sub j given that s sub j is greater than or equal to alpha? Alpha. OK, we're looking at all cases where s sub j is greater than or equal to alpha, and we're going to stop this experiment at the point where it first exceeds alpha. So we're going to lower bound the point where it first exceeds alpha by alpha itself. So this quantity is lower bounded, again, by the probability that s sub j is greater than or equal to alpha, times e to the r star alpha, and that whole thing is less than or equal to 1.
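The chain of inequalities just described, in one line:

```latex
1 \;=\; \mathsf{E}\!\left[e^{r^* S_J}\right]
\;\ge\; \Pr\{S_J \ge \alpha\}\;\mathsf{E}\!\left[e^{r^* S_J} \,\middle|\, S_J \ge \alpha\right]
\;\ge\; \Pr\{S_J \ge \alpha\}\; e^{r^*\alpha}.
```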
That says the probability that s sub j is greater than or equal to alpha is less than or equal to e to the minus r star alpha, which is what this inequality says here. OK, so this is not rocket science. This is a fairly simple result, if you believe in Wald's identity, which we'll prove later.

OK, so it's valid for all choices of this lower threshold. And remember, this probability here doesn't look like it's a function of both alpha and beta, but it is, because you're asking, what's the probability that you cross the threshold alpha before you cross the threshold beta? And if you push beta very, very far down, it makes it more likely that you're going to cross the threshold alpha. If you make beta very close to 0, then you're probably going to cross beta first. So this inequality here, this quantity here, depends on beta also. But we know that this inequality is valid no matter what beta is, so we can let beta approach minus infinity, and we still have this inequality. There's a little bit of tricky math involved in that. There's an exercise in the text which goes through that slightly tricky math, but what you find is that this bound is valid with only one threshold, as well as with two thresholds. But this proof here that we've given depends on a lower threshold, which is somewhere. We don't care where. It's valid for all choices of beta, so it's valid without a lower threshold.

The probability of the union over all n of s sub n less than or equal to alpha-- in other words, the probability that we ever cross a threshold alpha--

AUDIENCE: That's not the right inequality, is it?

PROFESSOR: What?

AUDIENCE: It's supposed to be s sub n larger, [INAUDIBLE] as the last time?

PROFESSOR: It's less than or equal to e to the minus r star alpha, which is--

AUDIENCE: Oh, s sub n? s sub n?

PROFESSOR: n.

AUDIENCE: You just [INAUDIBLE] [? the quantity? ?]

PROFESSOR: Oh, it's a union over all n greater than or
equal to 1. OK, in other words, this quantity we're dealing with here is the probability that s sub n-- oh, I see what you're saying. This quantity here should be greater than or equal to alpha. You're right. Sorry about that. I think it's right most places. Yes, it's right. We have it right here.

The probability of this union is really the same as the probability that the value of the walk, after it crosses the threshold, is greater than or equal to alpha. OK, now, we saw before that the probability that s sub n is greater than or equal to alpha-- excuse me, that's the same thing. When you're writing things in LaTeX, the symbol for less than or equal to is so similar to that for greater than or equal to that it's hard to keep them straight. That quantity there is a greater than or equal to sign, if you read it from left to right instead of right to left. So all we're doing here is simply using this, well, greater than or equal to.

OK, the corollary makes a stronger and cleaner statement, that the probability that you ever cross alpha is less than or equal to-- my heavens, my evil twin got hold of these slides. Let me rewrite that one. The probability of the union over all n of the event s sub n greater than or equal to alpha is less than or equal to e to the minus r star alpha.

OK, so we've seen from the Chernoff bound that for every n this bound is satisfied. This says that it's not only satisfied for each n; it's satisfied over all n collectively. Otherwise, if we were using the Chernoff bound, what would we have to do to get a handle on this quantity? We'd have to use the union bound, and when we use the union bound, we can show that for every n, the probability that s sub n is greater than or equal to alpha is less than or equal to this quantity. But then we'd have to add all those terms, and we would have to somehow diddle around with them to show that there are only a few of them which are close to this value, and all the rest are negligible.
And the number of terms that are close to that value grows with n, so it turns into a lot of headache. Here, we don't have to do any of that, because the Wald identity has saved us from all that difficulty.

OK, we talked about the G/G/1 queue. We're going to apply this corollary to the G/G/1 queue, to the queueing time, namely to the time w sub i that the ith arrival spends in the queue before starting to be served. You remember, when we looked at that, we found that if we define u sub i to be equal to the ith interarrival time minus the i minus first service time, those two are independent of each other, and u sub i is the difference between them. So u sub i is the difference between the ith interarrival time and the previous service time. What we showed was that the sequence of the sums of the u sub i is a modification of a random walk. In other words, the sums of the u sub i behave exactly like a random walk does, but every time it gets down to 0, if it crosses 0, it resets to 0 again. So it keeps bouncing up again.

If you look in the text, what it shows is that if you look at this sequence of u sub i's, and you look at the sum of them backward -- if you look at the sum of u sub i plus u sub i minus 1 plus u sub i minus 2, and so forth -- when you look at the sum that way, it actually becomes a random walk. Therefore, we can apply this bound to that random walk, and what we find is that the probability that the waiting time in queue of the nth customer is greater than or equal to an arbitrary number alpha is less than or equal to the probability that w sub infinity is greater than or equal to alpha, and that is less than e to the minus r star alpha.
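Here is that recipe as a tiny numerical sketch. It assumes, hypothetically, exponential interarrivals at rate lam and exponential services at rate mu, and it works with the difference taken as service minus interarrival, Z = Y - X, so that Z has negative mean when the queue is stable; the sign convention is a choice made here for the sketch:

```python
import numpy as np
from scipy.optimize import brentq

# Kingman-bound sketch (hypothetical M/M/1-style rates): interarrival
# X ~ Exp(lam), service Y ~ Exp(mu), lam < mu.  For Z = Y - X,
# gamma_Z(r) = ln[mu/(mu - r)] + ln[lam/(lam + r)]; solve gamma_Z(r*) = 0.
lam, mu = 1.0, 2.0

def gamma_z(r):
    return np.log(mu / (mu - r)) + np.log(lam / (lam + r))

r_star = brentq(gamma_z, 1e-9, mu - 1e-9)     # nonzero root, lies in (0, mu)
alpha = 5.0
print(f"r* = {r_star:.3f}")                   # works out to mu - lam = 1.0 here
print(f"P(W >= {alpha}) <= {np.exp(-r_star * alpha):.4f}")
```

For these exponential choices the root is r* = mu minus lam, which matches the known exponential decay of the M/M/1 waiting-time tail.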
So again, all you have to do is this: you have this interarrival time x, you have this service time y, you take the difference of the two -- that's a random variable -- you find the moment generating function of that random variable, you find the point r star at which that moment generating function equals 1, and then the bound says that the probability that the queueing time you're going to be dealing with exceeds alpha is at most e to the minus r star alpha. Yes?

AUDIENCE: What do you do when you have the gamma function go like this, and thus have infinity, and you cross it there? [INAUDIBLE] points that we're looking for?

PROFESSOR: For that, you have to read the text. I mean, effectively, you can think of it just as if gamma of r is a convex function like anything else. It just has a discontinuity in it, and bingo, it shoots off to infinity. So when you take these slope arguments, what happens is that all slopes beyond that point just seesaw around at one point. But the same bound holds.

OK, so that's the Kingman bound. Then we talked about large deviations for hypothesis tests. Well, actually, we just talked about hypothesis tests, but not large deviations for them. Let's review where we were on that.

Let's let the vector y be an n-tuple of IID random variables, y1 up to y sub n. They're IID conditional on hypothesis 0. They're also IID conditional on hypothesis 1. So the game is, nature chooses either hypothesis 0 or hypothesis 1. You take n samples of some IID random variable, and those n samples are IID conditional on either nature choosing 0 or nature choosing 1. At the end of taking those n samples, you're supposed to guess whether h0 is the right hypothesis or h1 is the right hypothesis. Invest in Apple stock 10 years ago, and one hypothesis is it's going to go broke. The other hypothesis is it's going to invent marvelous things, and your stock will go up by a factor of 50. You take some samples, and you make your decision on that.
Fortunately, with that, you can make a separate decision each year, but that's the kind of thing we're talking about. We're just restricting it to this case where you have n sample values that you take one after the other, and they're all IID given the particular value of the hypothesis that happens to hold.

OK, so we said there is something called a likelihood ratio. The likelihood ratio for a particular sequence y is lambda of y, equal to the density of y given h1 divided by the density of y given h0. Why is it h1 on the top and h0 on the bottom? Purely convention, nothing else. The only thing that distinguishes hypothesis 1 from hypothesis 0 is you choose one and call it 1, and you choose the other and call it 0. It doesn't make any difference how you do it. So after we make that choice, the likelihood ratio is that ratio.

Now, the reason for using semi-invariant moment generating functions is that this density here is a product of densities, and this density is a product of densities, and therefore, when you take the log of this ratio of products, you get the sum from i equals 1 to n of the log likelihood ratios for the single experiments. It's a single experiment that you're taking, based on the fact that all n observations are based on the same hypothesis, either h0 or h1. So the game that you're playing -- and please remember what the game is, if you forget everything else about this game -- is that the hypothesis gets chosen, and at the same time, you take n sample values. All n sample values correspond to the same value of the hypothesis.

OK, so when you do that, we're going to call z sub i this logarithm here, this log likelihood ratio. And then we showed last time that a threshold test -- well, we defined the threshold test as comparing the sum with the logarithm of a threshold. And the threshold is equal to p0 over p1, if in fact you're doing a maximum a posteriori probability test, where p0 and p1 are the a priori probabilities of the hypotheses. Remember how we did that.
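As a minimal simulation of that threshold test -- illustrative only; it assumes unit-variance Gaussian observations with means 0 under h0 and 1 under h1, and made-up a priori probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the MAP threshold test (hypothetical model): under H0, Y_i ~ N(0,1);
# under H1, Y_i ~ N(1,1).  One sample's log likelihood ratio is z_i = y_i - 1/2.
n, p0, p1 = 25, 0.6, 0.4
eta = p0 / p1                           # MAP threshold, eta = p0/p1

def llr(y):
    # ln f(y | H1) - ln f(y | H0) for unit-variance Gaussians with means 1 and 0
    return y - 0.5

trials, errors = 100_000, 0
for _ in range(trials):
    y = rng.normal(0.0, 1.0, n)         # H0 is actually true
    if llr(y).sum() > np.log(eta):      # decide H1: an error under H0
        errors += 1
print(f"Pr(error | H0) ~ {errors / trials:.4f}")
```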
The derivation was a very simple thing. You just write out what the probability is of hypothesis 0 together with a sequence of n values of y. You write out what the probability is of hypothesis 1 together with that same sequence of values, with the appropriate probability on that sequence for h equals 1 and for h equals 0. And what you get out of that is that the threshold test sums up all the z sub i's, compares the sum with the threshold, and makes a choice, and that is the MAP choice.

OK, so conditional on h0, you're going to make an error if the sum of the z sub i's is greater than the logarithm of eta. And conditional on h1, you're going to make an error if the sum is less than or equal to log eta. I denote these as the random variable z sub i given 0, to make sure that you recognize that this random variable here is conditional on h0 in this case, and conditional on h1 in the opposite case.

OK, so the exponential bound for z sub i given 0-- OK, so what we're doing now is we're saying, suppose that 0 is the actual value of the hypothesis. The experimenter doesn't know this. What the experimenter does is what the experimenter has been told to do, namely: take these n values, y1 up to y sub n, find the likelihood ratio, and compare that likelihood ratio with the threshold. If the likelihood ratio is larger than the threshold, decide 1. If it's smaller than the threshold, decide the opposite thing. It decides 1 if it's above the threshold, 0 if it's below the threshold.

Well, the first thing we want to do, then, is to find the semi-invariant moment generating function of the log likelihood ratio under the assumption that 0 is the correct hypothesis, and something very remarkable happens here. Gamma sub 0 of r is the logarithm -- because it's a semi-invariant moment generating function -- of the expected value of e to the r times z sub i. When we take the expected value, we integrate over f of y given h0, times e to the r times log of f of y given h1 over f of y given h0. You look at this, and what do you get?
This quantity here is e to the r times log of f of y given h1. That whole quantity in there is just f of y given h1 to the rth power. And what we have in this quantity here is f of y given h0 to the minus r power. So this term combined with this term gives us f to the 1 minus r of y given h0, and this quantity here is f to the r of y given h1, integrated dy. So the semi-invariant moment generating function is this quantity here. At r equals 1, the integral is just the integral of f of y given h1, which is 1, so the log of it is equal to 0.

So what we're saying is that, for any old detection problem in the world, so long as this moment generating function exists, what happens is it starts at 0, it comes down, it comes back up again, and r star is equal to 1. That's what we've just shown. When r is equal to 1, this whole thing is equal to 1, so the log of 1 is equal to 0. For every one of these problems, you know where this intercept is and you know where this intercept is: one is at 0, one is at 1.

What we're going to do now is try to find out what the probability of error is given that h equals 0 is the correct hypothesis. So we're assuming that the probabilities are actually f of y given h0. We calculate this quantity that looks like this, and we ask what is the probability that this sum of random variables exceeds the threshold, the log of eta. So the thing that we do is we draw a line of slope natural log of eta divided by n. We draw that slope along here, and we find that the probability of error is upper bounded by e to the n times the quantity gamma 0 of r0, where r0 is defined by that slope, minus r0 times log of eta divided by n. That's all there is to it. Any questions about that? Seem obvious? Seem strange?

OK, so the probability of error conditional on h equals 0 is e to the n times the quantity gamma 0 of r0 minus r0 times natural log of eta over n. And q sub l of eta is the probability of error given that h is equal to l.
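To make that concrete numerically, here is a small sketch. The Gaussian pair is the same assumption as before, and n and eta are picked arbitrarily for the example; the two printed intercepts being zero is the general fact just derived.

    import numpy as np

    # Same assumed pair as above: f0 = N(0,1), f1 = N(1,1).
    y = np.linspace(-12, 13, 5001)
    f0 = np.exp(-y ** 2 / 2) / np.sqrt(2 * np.pi)
    f1 = np.exp(-(y - 1) ** 2 / 2) / np.sqrt(2 * np.pi)

    def gamma0(r):
        # gamma_0(r) = ln of the integral of f0^(1-r) * f1^r dy
        return np.log(np.trapz(f0 ** (1 - r) * f1 ** r, y))

    print(gamma0(0.0), gamma0(1.0))   # both ~0: the two known intercepts

    # Bound on Pr{error | h=0} for n tests and threshold eta (both assumed):
    n, eta = 100, 1.0
    slope = np.log(eta) / n
    best = min(gamma0(r) - r * slope for r in np.linspace(0.05, 0.95, 91))
    print(np.exp(n * best))           # e^{n(gamma_0(r0) - r0 ln(eta)/n)}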
OK, we can do the same thing for hypothesis 1. We're asking what's the probability of error given that h equals 1 is the correct hypothesis, and given that we choose a threshold; say we know the a priori probabilities, so we choose a threshold that way.

OK, we go through the same argument. Gamma 1 of s is the natural log of the integral of f of y given h1 times e to the s (we're using s in place of r here) times the natural log of f of y given h1 over f of y given h0. And in this quantity, now, the f of y given h1 is upstairs, so we have f to the 1 plus s of y given h1. This quantity is down here, so we have f to the minus s of y given h0. And we notice that when s is equal to minus 1, this is again equal to 0, and we notice also, if you compare this, that gamma 1 of s is equal to gamma 0 of s plus 1. These two functions are the same; one just shifts the other by one unit.

OK, so this is one of the very strange things about hypothesis testing, namely you are calculating these expected values, but you're calculating the expected value of a likelihood ratio. And the likelihood ratio involves the conditional densities under both hypotheses, so when you calculate that expectation, what you get is this funny quantity here, which is related to what you get when you calculate the semi-invariant moment generating function given the other hypothesis.

So now, what we wind up with for q1 of eta is e to the n times the quantity gamma 0 of r0 minus (r0 minus 1) times natural log of eta over n. I'm using the fact that gamma 1 of s is equal to gamma 0 of s plus 1 -- s is just r shifted over by 1 -- so I can do the same optimization for each. So what I wind up with is that the probability of error conditional on hypothesis 0 is this quantity down here. That's this one, and for the probability of error conditional on the other hypothesis, the exponent is equal to this quantity here.

OK, so what that says is that as you shift the threshold -- in other words, suppose instead of using a MAP test, you say, well, I want the probability of error to be small when hypothesis 0 is correct. I want it to be small when hypothesis 1 is correct. I have a trade-off between those two.
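Before turning to that trade-off, the shift property is easy to check numerically. Here is a sketch, again with the assumed Gaussian pair; the three values of s are arbitrary.

    import numpy as np

    # Numerical check of gamma_1(s) = gamma_0(s+1), same assumed pair.
    y = np.linspace(-12, 13, 5001)
    f0 = np.exp(-y ** 2 / 2) / np.sqrt(2 * np.pi)
    f1 = np.exp(-(y - 1) ** 2 / 2) / np.sqrt(2 * np.pi)

    def gamma0(r):
        return np.log(np.trapz(f0 ** (1 - r) * f1 ** r, y))

    def gamma1(s):
        # gamma_1(s) = ln of the integral of f1^(1+s) * f0^(-s) dy
        return np.log(np.trapz(f1 ** (1 + s) * f0 ** (-s), y))

    for s in (-1.0, -0.5, 0.25):
        print(s, gamma1(s), gamma0(s + 1))   # the last two columns agree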
How do I choose my threshold in order to get the smallest value overall? So you say, well, you're stuck. You have one exponent under hypothesis 0. You have another exponent under hypothesis 1. You have this curve here. You can take whatever value you want over here, and that sticks you with a value here. You can rock things around this inverted seesaw, and you can make one probability of error bigger by making the other one smaller, or the other way around.

Namely, what you're doing is changing the threshold, and as you make the threshold larger, what you're doing is making it harder to accept h equals 1 and easier to accept h equals 0. When you move the threshold the other way, you're making it easier the other way. This, in fact, gives you the choice between the two. You decide you're going to take n tests. You can make both of these smaller by making n bigger. But there's a trade-off between the two, and the trade-off is given by this tangent line to this curve here. And you're always stuck with r star equals 1 in all of these problems. So the only question is what does this curve look like?

Notice that the expected value of the log likelihood ratio given h equals 0 is negative. The expected value given h equals 1 is positive, and that's just because of the form of the likelihood ratio.

OK, so this actually shows these two exponents. These are the exponents for the two kinds of errors. You can view this as a large deviation form of the Neyman-Pearson test. In the Neyman-Pearson test, you're doing things in a very detailed way, and you're choosing between different thresholds to make the probability of error of one type bigger and the other type smaller, or the other way around.
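To see the seesaw concretely, you can sweep the tangent point r0 across (0, 1) and print the pair of exponents it pins down. The closed form below is for the same assumed Gaussian pair, where gamma 0 of r works out to r(r-1)/2; the formulas for the two exponents are the tangent-line intercepts at r equals 0 and r equals 1.

    # Sweeping the tangent point r0 traces the trade-off between the two
    # error exponents. For the assumed Gaussian pair above, gamma_0 has
    # the closed form gamma_0(r) = r(r-1)/2, with derivative r - 1/2.
    def gamma0(r):
        return r * (r - 1) / 2

    def gamma0_prime(r):
        return r - 0.5

    for r0 in [0.1 * k for k in range(1, 10)]:
        e0 = gamma0(r0) - r0 * gamma0_prime(r0)        # exponent given h=0
        e1 = gamma0(r0) + (1 - r0) * gamma0_prime(r0)  # exponent given h=1
        print(round(r0, 1), round(e0, 3), round(e1, 3))
    # As r0 -> 1, e0 improves toward -1/2 while e1 collapses toward 0,
    # and the reverse as r0 -> 0: the inverted seesaw.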
Here, in the large deviation form, the test becomes an upper bound rather than an exact calculation, but it tells you much, much more, because for most of these threshold tests, you're going to do enough experiments that your probability of error is going to be very small. So the only question is where do you really want the error probability to be small? You can make it very small one way by shifting the curve this way, and make it very small the other way by shifting the curve the other way. And you take your choice of which you want.

OK, the a priori probabilities are usually not the essential characteristic when you're dealing with this large deviation kind of result, because when you take a large number of tests, this threshold term, log eta over n, becomes relatively small as n becomes very large. So that's not the thing you're usually concerned with. What you're concerned with is whether, with one kind of error, the patient dies, and with the other kind, the tests cost a lot of money; or with one kind of error, the nuclear plant blows up, and with the other kind, you waste a lot of money which you wouldn't have had to pay otherwise.

OK, now, here's the important part of all of this. So far, it looked like there wasn't any way to get out of this trade-off between choosing a threshold to make the error probability small one way, or making the error probability small the other way. And you think, well, yes, there is a way to get around it. What I should do is what I do in real life, namely if I'm trying to decide about something, I don't like to waste my time deciding about it, so as soon as the decision becomes relatively straightforward, I make up my mind. If the decision is not straightforward, if I don't have enough evidence, I keep doing more tests. So sequential tests are an obvious thing to try if you can do them. What we have here, what we've shown, is we have two coupled random walks.
Given hypothesis h equals 0, we have one random walk, and that random walk is typically going to go down. Given h equals 1, we have another random walk. That random walk is typically going to go up. And one is going to go down and one is going to go up because we've defined the random variable involved as the log of f of y given h1 divided by f of y given h0, which is why the 1 walk goes up and the 0 walk goes down.

Now, the thing we're going to do is a sequential test. We're going to keep doing experiments until we cross a threshold. We're going to decide what threshold is going to give us a small enough probability of error under each condition, and then we choose that threshold. And we continue to test until we get there. So we want to find out whether we've gained anything by that, how much we've gained if we gained something by it, and so forth.

OK, when you use two thresholds, alpha is going to be bigger than 0, and beta is going to be less than 0. The expected value of z given h0 is less than 0, but the expected value of z given h1 is greater than 0. That's why the walks are coupled, and we can handle each of them separately and get the answers for one from the answers for the other. Crossing alpha is a rare event for the random walk with h0, because with the random walk with h0, you're typically going to go down. You hardly ever go up. Yes?

AUDIENCE: Can you please explain again the sign of the expectations?

PROFESSOR: The sign of the expectations? Yes. z is the log of f of y given h1 over f of y given h0, so that when we actually have h equals 1, the expected value of this is going to be lined up with this term on top. When we have h equals 0, it's lined up with the term on the bottom. I mean, actually, you have to go through and actually show that the integral of f of y given h1 times this quantity is greater than 0, and the other one is less than 0. We don't really have to do that because, if we calculate this moment generating function, we can pick it off of there.
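Those two integrals are the divergences D(f1||f0) and minus D(f0||f1), so the signs are automatic; a quick numerical check, under the same assumed Gaussian pair:

    import numpy as np

    # E[z | h0] = -D(f0||f1) < 0 and E[z | h1] = D(f1||f0) > 0,
    # checked for the assumed pair f0 = N(0,1), f1 = N(1,1).
    y = np.linspace(-12, 13, 5001)
    f0 = np.exp(-y ** 2 / 2) / np.sqrt(2 * np.pi)
    f1 = np.exp(-(y - 1) ** 2 / 2) / np.sqrt(2 * np.pi)
    z = np.log(f1 / f0)             # log likelihood ratio, here y - 1/2

    print(np.trapz(f0 * z, y))      # E[z | h0] ~ -0.5, negative
    print(np.trapz(f1 * z, y))      # E[z | h1] ~ +0.5, positive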
When we look at this moment generating function, that slope there is the expected value of z conditional on h equals 0, and because of the shifting property, this slope here is the expected value of z given h equals 1, just because the 1 curve is shifted from the other by one unit. It's really because of that ratio. If you defined it the other way, you would just change the sign, so nothing important would happen.

OK, so r star equals 1 for the h0 walk, so the probability of error given h0 is less than or equal to e to the minus alpha. Well, that's a nice simple result, isn't it? In fact, that's really beautiful. You just calculate this moment generating function, you find the root of it, and you're done. You have a nice bound, and in fact, it's an exponentially tight bound. And on the other hand, when you deal with the probability of error given h1, by symmetry, it's less than or equal to e to the beta. Beta is a negative number, remember, so this is exponentially going down as you choose beta smaller and smaller.

So the thing that we're getting is that we can make each of these error probabilities as small as we want: this one, by making alpha big; this one, by making beta a big negative number. There must be a cost to this. OK, but what's the cost? What happens when you make alpha big?

When hypothesis 1 is the correct hypothesis, what normally happens is that this random walk is going to go up roughly at a slope of the expected value of z given h equals 1. So when you make alpha very, very large, you're forced to make a very large number of tests when h is equal to 1. When you make beta very large in magnitude, you're forced to take a large number of tests when h is equal to 0. So the trade-off here is a little bit funny. You make your error probability for h equals 0 very, very small at the cost of more money when hypothesis 1 is the correct hypothesis, because you don't make a decision until you've really climbed way up on this random walk.
And that means it takes a long time when you have h equals 1. Since, when h is equal to 1, the probability of crossing this lower threshold is almost negligible, this expected time that it takes is really just a function of the h equals 1 behavior. I'm going to show that in the next slide.

When you increase alpha, it lowers the probability of error given h equals 0 (excuse me, I should have h equals 0 instead of h sub 0) exponentially, but it increases the expected number of steps until you make a decision given h1. The expected value of j given h1 is effectively equal to alpha divided by the expected value of z given h1. Why is that? That's essentially Wald's equality. Not Wald's identity, but Wald's equality, because --

Yes, it says, from Wald's equality, since alpha is essentially equal to the expected value of s sub j given h equals 1: the number of tests you have to take when h is equal to 1, when alpha is very, very large, is effectively the amount of time that it takes you to get up to the point alpha. The point where you stop is typically pretty close to alpha itself, so alpha there is close to the expected value of s sub j given h equals 1. So Wald's equality, given h equals 1, says the expected value of j given h1 is equal to the expected value of s sub j given h equals 1, that's alpha, divided by the expected value of z given h1, which is just the expected log likelihood ratio for a single test. So to get this result, we just substitute alpha for that expected value.

And then the probability of error given h equals 0, if we write it this way, we see the cost immediately. The exponent alpha is the expected value of j given h equals 1, in other words the expected number of tests given h equals 1, times the expected value of the log likelihood ratio given h equals 1. When you decrease beta, that lowers the probability of error given h1 exponentially, but it increases the number of tests when h0 is the correct hypothesis.
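Here is a minimal simulation of that relationship. The Gaussian pair, the thresholds, and the number of trials are all assumptions for the example; the point is just that the measured expected value of j given h1 lands near alpha over the expected value of z given h1.

    import numpy as np

    # Sequential test under the assumed pair f0 = N(0,1), f1 = N(1,1):
    # sample until the sum of the z_i crosses alpha > 0 (decide 1) or
    # beta < 0 (decide 0). Compare E[J | h1] with alpha / E[z | h1].
    rng = np.random.default_rng(1)
    d, alpha, beta = 1.0, 8.0, -8.0
    Ez_given_h1 = d ** 2 / 2        # expected step size under h1

    def run_once():
        s, j = 0.0, 0
        while beta < s < alpha:
            y = rng.normal(d, 1.0)  # h1 is the true hypothesis
            s += d * y - d ** 2 / 2 # one z_i
            j += 1
        return j, s >= alpha

    results = [run_once() for _ in range(2000)]
    mean_J = np.mean([j for j, _ in results])
    print(mean_J, alpha / Ez_given_h1)   # close (~16), up to overshoot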
So in that case, the probability of error given h equals 1 is effectively e to the expected value of j given h equals 0 times the expected value of z given h equals 0. The first factor is just the number of tests you have to do when h is equal to 0. The second factor is the expected value of the log likelihood ratio when h is equal to 0, which is negative. This is very approximate, but this is how you would actually choose how big to make alpha and how big to make beta if you want to do a test between these two hypotheses.

Now, this shows what you're gaining by the sequential test over the non-sequential test. You don't have this in your notes, so you might just jot it down quickly. The expected value of z, conditional on h equals 0, is this slope here, the slope of the moment generating function at r equals 0. That's the slope for the underlying random variable. Since this point is at r equal to 1, this point down here is the expected value of z given h equals 0. That's the exponent you get when h equals 0 is, in fact, the correct hypothesis, for the probability of error given that h is equal to 0, namely the probability that you choose hypothesis 1. Same way over here. This slope here is the expected value of the log likelihood ratio given h equals 1. This hits down here at minus the expected value of z given h equals 1. So you have this exponent going one way and this exponent going the other way, where the thing multiplying the exponent is not the fixed number n but is, in fact, the expected number of tests you have to do under the other hypothesis.

Now, if we do the fixed test, what we're stuck with is a test where you take a line tangent to this curve, which goes from here across to there. We can seesaw it around. When we seesaw it all the way in the limit, we can get this result here. But we get this result here at the cost of an error which is almost one in the other case, so that's not a very good deal. This says that sequential testing -- well, it shows you how much you gain by doing a sequential test.
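Side by side, the two regimes look roughly like this; a sketch only, using the approximations above, with the sequential lines ignoring overshoot.

    % sequential test, thresholds alpha > 0 > beta:
    \Pr\{e \mid H=0\} \lesssim e^{-\alpha}
        \approx \exp\!\bigl(-\,\mathrm{E}[J \mid H=1]\,\mathrm{E}[Z \mid H=1]\bigr),
    \qquad
    \Pr\{e \mid H=1\} \lesssim e^{\beta}
        \approx \exp\!\bigl(\mathrm{E}[J \mid H=0]\,\mathrm{E}[Z \mid H=0]\bigr)

    % fixed-length test with n samples, tangent point r_0:
    \Pr\{e \mid H=0\} \le \exp\!\bigl(n[\gamma_0(r_0) - r_0\,\gamma_0'(r_0)]\bigr),
    \qquad
    \Pr\{e \mid H=1\} \le \exp\!\bigl(n[\gamma_0(r_0) + (1-r_0)\,\gamma_0'(r_0)]\bigr)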
I mean, it might not be intuitively obvious why this is happening. Really, the reason it's happening is that the times when you want to make the test very long are those times when the walk is not doing what it typically does. If h is equal to 0, you normally go down. The next most normal thing is that you wobble around without doing anything for a long time, in which case you want to keep doing additional tests until finally it falls down, or finally it goes up. But by taking additional tests, you make it very unlikely that you're ever going to cross the wrong threshold. So that's the thing you're gaining. You are gaining the fact that the error is small in those situations where the sum of these random variables stays close to 0 for a long time; you don't make errors in those cases.

We now have just a little bit of time to prove Wald's identity. I don't want to have a lot of time to prove it because proofs of theorems are things you really have to look at yourselves. This one, you almost don't have to look at. This one is almost obvious as soon as you understand what a tilted probability is.

So let's suppose that x sub n is a sequence of IID discrete random variables. It has a moment generating function for some given r. We're going to assume that these random variables are discrete now to make this argument simple. If they're not discrete, this whole argument has to be replaced with all sorts of [INAUDIBLE] integrals and all of that stuff. It's exactly the same idea, but it just is messy mathematically.

So what we're going to do is define a tilted random variable. A tilted random variable is a random variable in a different probability space. OK, we start out with this probability space that we're interested in, and then we say, OK, just to satisfy our imaginations, suppose the probabilities are different.
We assume that the probability, for a given r, that the random variable X is equal to little x, namely this quantity here, is equal to the original probability that X is equal to little x (all the sample values are the same, it's just the probabilities that have changed) times e to the rx minus gamma of r. So we're tilting these probabilities: when x is large, we're magnifying them; when x is small, we're knocking them down.

What's the purpose of this factor? It's just a normalization factor. e to the minus gamma of r is 1 over the moment generating function at r, so you take p of x times e to the rx and divide it by g of r. So this is a probability mass function, just as this is. This is the correct probability mass function for the model you're looking at. This is an imaginary one, but you can always imagine. You can say, let's suppose that we had this model instead of the other model. All the sample values are the same, but the probabilities are different. So we want to see what we can find out from these different probabilities in this different probability model.

If you sum over x here, this sum is equal to 1, as we just said. So we'll view q sub X,r of x as the probability mass function on x in a new probability space. We can use all the laws of probability in this new space, and that's exactly what we're going to do. And we're going to say things about the new space, but then we can always come back to the old space from this formula here, because whatever we find out in the new space will tell us something in the old space.

One thing we'd like to do is to find the expected value of the random variable x in this new probability space. This isn't the expected value in the old space; it's the expected value in the new space. It's the sum over x of x times q sub X,r of x. That's what the expected value is. X is the same in both spaces. It's just the probabilities that have changed.
These probabilities are p of x times e to the rx minus gamma of r, so when you sum this, what you get is 1 over g sub X of r, which is that term, times the derivative with respect to r of the sum of p sub X of x times e to the rx. When you take this derivative, you get an x in front, which is that x there. So you get g prime sub X of r over g sub X of r, which is gamma prime of r. OK, so in terms of that graph we've drawn, when you take these tilted probabilities, you move that slope from r equals 0 to whatever r you're looking at. And that gives you the expected value there.

OK, if you have a joint tilted probability mass function -- and don't think it gets any more complicated; it doesn't. I mean, you've already gone through the major complication of this argument. The joint tilted PMF of x1 up to xn is the old probability of x1 up to xn times all of these tilted factors here. If you let A of sn be the set of n-tuples which have the same sum sn, then all these exponents add up to r times s sub n minus n gamma of r. So what you get is that for each n-tuple for which the sum is sn, this tilted probability becomes the old probability times e to the r sn minus n gamma of r.

That says something about the tilted probability of the sum. We said that when we tilt these probabilities, we can do everything in the new space that we could do in the old space; we can do everything that probability theory allows us to do, so we can look at the probability of s sub n in the new space also. We take the probability of sn in the old space, namely we sum this quantity over all n-tuples in A of sn, and what we get is the probability p sub Sn at sn times this quantity, e to the r sn minus n gamma of r, which is fixed on that set. So this is the key to a lot of large deviation theory.
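Here is a tiny sketch of that tilting operation on a made-up three-point PMF (the support and the probabilities are arbitrary assumptions), checking both that the tilted probabilities sum to 1 and that the tilted mean is gamma prime of r.

    import numpy as np

    # Tilting a made-up PMF: q_r(x) = p(x) e^{rx} / g(r), then checking
    # that q_r sums to 1 and that its mean equals gamma'(r) = g'(r)/g(r).
    x = np.array([0.0, 1.0, 2.0])
    p = np.array([0.5, 0.3, 0.2])   # hypothetical PMF
    r = 0.7

    g = np.sum(p * np.exp(r * x))   # moment generating function at r
    q = p * np.exp(r * x) / g       # tilted PMF
    print(q.sum())                  # 1.0

    # g'(r) is the sum of x p(x) e^{rx}, so gamma'(r) = g'(r) / g(r)
    gamma_prime = np.sum(x * p * np.exp(r * x)) / g
    print(q @ x, gamma_prime)       # identical: the tilted mean is gamma'(r)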
Any time you're dealing with a difficult problem, and you want to see what's happening way, way away from the mean, you want to see what these sums look like in those exceptional cases. What we do is look at a new model where we tilt the probabilities so that the region of concern becomes the typical region for the tilted model. So for r greater than 0, we're tilting the probability towards large values, and then you can use the law of large numbers, the central limit theorem, whatever you want to, in that new space.

Now, we can prove Wald's identity. What Wald's identity rests on is the statement that when you tilt these probabilities, a stopping rule in this tilted world still stops: the stopping time is still a random variable, namely you still stop with probability 1. Somebody questioned whether you stop with probability 1 in the old world. Like I said, you do, because you have this positive variance, and the sum with two thresholds keeps spreading out. Here, you have the same thing. I mean, the mean doesn't make any difference at all. You're looking at trying to exceed one of two different thresholds, and eventually, you exceed one of them no matter where you set r.

So what this is saying is that the probability that j is equal to n in this tilted space is equal to the probability that j is equal to n in the old space times e to the r sn minus n gamma of r. So this quantity is equal to the expected value of e to the r s sub n minus n gamma of r, given j equals n, times the probability that j is equal to n. You sum this over n and, bingo, you're back at the Wald identity. So that's all the Wald identity is: just a statement that when you tilt a probability, and you have a stopping rule on the original probabilities, you then have a stopping rule on the new probabilities. And Wald's identity says -- well, Wald's identity holds whenever that tilted stopping rule is a random variable.

OK, that's it for today. We will do martingales on Wednesday.