1 00:00:00,000 --> 00:00:00,590 Hi. 2 00:00:00,590 --> 00:00:03,110 In this problem, Romeo and Juliet are back and they're 3 00:00:03,110 --> 00:00:05,470 still looking to meet up for a date. 4 00:00:05,470 --> 00:00:07,944 Remember, the last time we met up with them, it was back in 5 00:00:07,944 --> 00:00:09,613 the beginning of the course and they were trying to meet 6 00:00:09,613 --> 00:00:11,510 up for a date but they weren't always punctual. 7 00:00:11,510 --> 00:00:15,570 So we modeled their delay as uniformly distributed between 8 00:00:15,570 --> 00:00:18,060 0 and 1 hour. 9 00:00:18,060 --> 00:00:19,770 So now in this problem, we're actually 10 00:00:19,770 --> 00:00:21,610 going to look at a variation. 11 00:00:21,610 --> 00:00:24,020 And we're going to ask the question, how do we actually 12 00:00:24,020 --> 00:00:26,970 know that the delay is uniformly distributed between 13 00:00:26,970 --> 00:00:29,190 0 and 1 hour? 14 00:00:29,190 --> 00:00:31,900 Or it could also be the case that it is uniformly 15 00:00:31,900 --> 00:00:34,190 distributed between 0 and half an hour, or 16 00:00:34,190 --> 00:00:35,600 0 and 2 hours. 17 00:00:35,600 --> 00:00:38,150 How do we actually know what this parameter of the uniform 18 00:00:38,150 --> 00:00:39,844 distribution is? 19 00:00:39,844 --> 00:00:44,030 OK, so let's put ourselves in the shoes of Romeo who's tired 20 00:00:44,030 --> 00:00:46,690 of being stood up by Juliet on all these dates. 21 00:00:46,690 --> 00:00:49,870 And fortunately, he's learned some probability since the 22 00:00:49,870 --> 00:00:51,590 beginning of the course, and so have we. 23 00:00:51,590 --> 00:00:54,470 And in particular we've learned Bayesian inference. 24 00:00:54,470 --> 00:00:56,120 And so in this problem, we're actually going to use 25 00:00:56,120 --> 00:00:58,830 basically all the concepts and tools of Bayesian inference 26 00:00:58,830 --> 00:01:00,720 that we learned in chapter eight and apply them.
27 00:01:00,720 --> 00:01:04,269 So it's a nice review problem, and so let's get started. 28 00:01:04,269 --> 00:01:08,530 The setup of the problem is similar to the first Romeo and 29 00:01:08,530 --> 00:01:11,810 Juliet problem that we dealt with. 30 00:01:11,810 --> 00:01:14,480 They are meeting up for a date, and they're not always 31 00:01:14,480 --> 00:01:16,190 punctual and they have a delay. 32 00:01:16,190 --> 00:01:19,680 But instead of the delay being uniformly distributed between 33 00:01:19,680 --> 00:01:24,570 0 and 1 hour, now we have an extra layer of uncertainty. 34 00:01:24,570 --> 00:01:31,780 So if we know some theta, then we know that the delay, which 35 00:01:31,780 --> 00:01:34,770 we'll call x, is uniformly distributed 36 00:01:34,770 --> 00:01:36,520 between 0 and theta. 37 00:01:36,520 --> 00:01:39,670 So here's one possible theta, theta 1. 38 00:01:39,670 --> 00:01:42,740 But we don't actually know what this theta is. 39 00:01:42,740 --> 00:01:45,120 So in the original problem we knew that theta 40 00:01:45,120 --> 00:01:46,820 was exactly one hour. 41 00:01:46,820 --> 00:01:49,320 But in this problem we don't know what theta is. 42 00:01:49,320 --> 00:01:54,150 So theta could also be like this, some other theta 2. 43 00:01:54,150 --> 00:01:56,530 And we don't know what this theta is. 44 00:01:56,530 --> 00:02:01,590 And we choose to model it as being uniformly distributed 45 00:02:01,590 --> 00:02:04,950 between 0 and 1. 46 00:02:04,950 --> 00:02:07,330 So like I said, we have two layers now. 47 00:02:07,330 --> 00:02:10,419 We have uncertainty about theta, which is the parameter 48 00:02:10,419 --> 00:02:11,730 of the uniform distribution. 49 00:02:11,730 --> 00:02:16,510 And then we have uncertainty in regards to the 50 00:02:16,510 --> 00:02:19,342 actual delay, x. 51 00:02:19,342 --> 00:02:23,030 OK, so let's actually write out what these 52 00:02:23,030 --> 00:02:23,720 distributions are.
53 00:02:23,720 --> 00:02:27,360 So theta, the unknown parameter, we're told in the 54 00:02:27,360 --> 00:02:29,500 problem that we're going to assume that is uniformly 55 00:02:29,500 --> 00:02:30,930 distributed between 0 and 1. 56 00:02:30,930 --> 00:02:35,960 And so the PDF is just 1, when theta is between 0 and 1, and 57 00:02:35,960 --> 00:02:38,870 0 otherwise. 58 00:02:38,870 --> 00:02:44,530 And we're told that, given what theta is, given what this 59 00:02:44,530 --> 00:02:50,090 parameter is, the delay is uniformly distributed between 60 00:02:50,090 --> 00:02:52,330 0 and this theta. 61 00:02:52,330 --> 00:02:55,670 So what that means is that we know this conditional PDF, the 62 00:02:55,670 --> 00:03:01,410 conditional PDF of x given theta is going to be 1 over 63 00:03:01,410 --> 00:03:10,620 theta if x is between 0 and theta, and 0 otherwise. 64 00:03:10,620 --> 00:03:14,120 All right, because we know that given a theta, x is 65 00:03:14,120 --> 00:03:16,810 uniformly distributed between 0 and theta. 66 00:03:16,810 --> 00:03:20,520 So in order to make this uniform distribution, it's the 67 00:03:20,520 --> 00:03:23,900 normalization or the heights, you can think of it, has to be 68 00:03:23,900 --> 00:03:25,520 exactly 1 over theta. 69 00:03:25,520 --> 00:03:29,300 So just imagine for a concrete case, if theta were 1, 1 hour 70 00:03:29,300 --> 00:03:32,330 in the original problem, then this would just be a PDF of 1 71 00:03:32,330 --> 00:03:36,350 or a standard uniform distribution between 0 and 1. 72 00:03:36,350 --> 00:03:41,160 OK, so now this is, we have the necessary fundamentals for 73 00:03:41,160 --> 00:03:42,110 this problem. 74 00:03:42,110 --> 00:03:43,790 And what do we do in inference? 75 00:03:43,790 --> 00:03:46,760 Well the objective is to try to infer 76 00:03:46,760 --> 00:03:48,350 some unknown parameter. 
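As a rough illustration of the two-layer model just described, here is a minimal Python sketch that first draws theta from the uniform prior on 0 to 1 and then draws the delay x uniformly between 0 and theta. The function name `sample_delay`, the fixed seed, and the sample count are illustrative assumptions, not part of the problem statement.

```python
import random

def sample_delay(rng):
    """Draw one (theta, x) pair from the two-layer model.

    theta ~ Uniform(0, 1)      (the prior on the unknown parameter)
    x | theta ~ Uniform(0, theta)  (the delay given the parameter)
    """
    theta = rng.random()
    x = rng.uniform(0.0, theta)
    return theta, x

# Hypothetical usage: simulate many dates with a fixed seed for repeatability.
rng = random.Random(0)
pairs = [sample_delay(rng) for _ in range(10_000)]
mean_delay = sum(x for _, x in pairs) / len(pairs)
```

Under this model the average delay should come out near 0.25, since the expected value of theta is 1/2 and the delay is on average half of theta.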
77 00:03:48,350 --> 00:03:56,780 And what we have is we have a prior which is our initial 78 00:03:56,780 --> 00:04:00,280 belief for what this parameter might be. 79 00:04:00,280 --> 00:04:01,940 And then we have some data. 80 00:04:01,940 --> 00:04:04,370 So in this case, the data that we collect is the actual 81 00:04:04,370 --> 00:04:07,700 observed delay for Juliet, x. 82 00:04:07,700 --> 00:04:10,420 And this model tells us how this data 83 00:04:10,420 --> 00:04:12,540 is essentially generated. 84 00:04:12,540 --> 00:04:17,740 And now what we do is, we want to use the data and our prior 85 00:04:17,740 --> 00:04:20,560 belief, combine them somehow, and use it to update our 86 00:04:20,560 --> 00:04:23,180 belief into what we call our posterior. 87 00:04:23,180 --> 00:04:26,050 In order to do that, we use Bayes' rule, which is why this 88 00:04:26,050 --> 00:04:28,340 is called Bayesian inference. 89 00:04:28,340 --> 00:04:33,750 So when we use Bayes' rule, remember Bayes' rule is 90 00:04:33,750 --> 00:04:36,870 just, we want to now find the posterior which is the 91 00:04:36,870 --> 00:04:40,360 conditional PDF of theta, the unknown parameter, given x. 92 00:04:40,360 --> 00:04:43,710 So essentially just flip this condition. 93 00:04:43,710 --> 00:04:47,560 And remember Bayes' rule is given as the following. 94 00:04:47,560 --> 00:04:56,210 It's just the prior times this conditional PDF of x given 95 00:04:56,210 --> 00:05:02,290 theta divided by the PDF of x. 96 00:05:02,290 --> 00:05:06,660 All right, and we know what most of these things are. 97 00:05:06,660 --> 00:05:14,050 The prior, or just the PDF of theta, is 1. 98 00:05:14,050 --> 00:05:20,180 The conditional PDF of x given theta is 1 over theta. 99 00:05:20,180 --> 00:05:24,370 And then of course we have this PDF of x. 100 00:05:24,370 --> 00:05:27,950 But we always have to be careful because these two 101 00:05:27,950 --> 00:05:32,520 values are only valid for certain ranges of theta and x.
102 00:05:32,520 --> 00:05:35,620 So in order for this to be valid we need theta to be 103 00:05:35,620 --> 00:05:38,560 between 0 and 1 because otherwise it would be 0. 104 00:05:38,560 --> 00:05:41,890 So we need theta to be between 0 and 1. 105 00:05:41,890 --> 00:05:45,640 And we need x to be between 0 and theta. 106 00:05:45,640 --> 00:05:49,580 107 00:05:49,580 --> 00:05:52,330 And otherwise this would be 0. 108 00:05:52,330 --> 00:05:55,400 109 00:05:55,400 --> 00:05:57,920 So now we're almost done. 110 00:05:57,920 --> 00:06:00,200 One last thing we need to do is just calculate what this 111 00:06:00,200 --> 00:06:06,210 denominator is, f x of x. 112 00:06:06,210 --> 00:06:09,360 Well the denominator, remember, is just a 113 00:06:09,360 --> 00:06:10,490 normalization. 114 00:06:10,490 --> 00:06:13,400 And it's actually relatively less important because what 115 00:06:13,400 --> 00:06:17,760 we'll find out is that this has no dependence on theta. 116 00:06:17,760 --> 00:06:21,670 It will only depend on x. 117 00:06:21,670 --> 00:06:24,440 So the importance, the dependence on theta, will be 118 00:06:24,440 --> 00:06:26,390 captured just by the numerator. 119 00:06:26,390 --> 00:06:29,810 But for completeness let's calculate out what this is. 120 00:06:29,810 --> 00:06:32,350 So it's just a normalization. 121 00:06:32,350 --> 00:06:41,230 So it's actually just the integral of the numerator. 122 00:06:41,230 --> 00:06:44,720 You can think of it as an application of kind of total 123 00:06:44,720 --> 00:06:45,970 probability. 124 00:06:45,970 --> 00:06:48,400 125 00:06:48,400 --> 00:06:52,700 So we have this that we integrate over and what do we 126 00:06:52,700 --> 00:06:54,460 integrate this over? 127 00:06:54,460 --> 00:06:58,020 Well we know that we're integrating over theta. 128 00:06:58,020 --> 00:07:02,220 And we know that theta has to be between x-- 129 00:07:02,220 --> 00:07:06,060 has to be greater than x and it has to be less than 1. 
130 00:07:06,060 --> 00:07:10,750 So we integrate from theta equals x to 1. 131 00:07:10,750 --> 00:07:13,850 And this is just the integral from x to 1 of 132 00:07:13,850 --> 00:07:15,210 the numerator, right? 133 00:07:15,210 --> 00:07:17,830 This is just 1 and this is 1 over theta. 134 00:07:17,830 --> 00:07:23,370 So it's the integral of 1 over theta, d theta from x to 1. 135 00:07:23,370 --> 00:07:25,450 Which when you do it out, the integral, 136 00:07:25,450 --> 00:07:27,560 this is log of theta. 137 00:07:27,560 --> 00:07:33,520 So it's log of 1 minus log of x. 138 00:07:33,520 --> 00:07:35,750 Log of 1 is 0. 139 00:07:35,750 --> 00:07:39,800 X, remember x is between 0 and theta. 140 00:07:39,800 --> 00:07:40,560 Theta is less than 1. 141 00:07:40,560 --> 00:07:42,960 So x has to be between 0 and 1. 142 00:07:42,960 --> 00:07:46,310 The log of something between 0 and 1 is negative. 143 00:07:46,310 --> 00:07:48,690 So this is a negative number. 144 00:07:48,690 --> 00:07:51,030 This is 0. 145 00:07:51,030 --> 00:07:52,490 And then we have a negative sign. 146 00:07:52,490 --> 00:07:58,910 So really what we can write this as is the absolute value 147 00:07:58,910 --> 00:08:01,260 of log of x. 148 00:08:01,260 --> 00:08:04,590 Strictly, it would actually be negative log of x. 149 00:08:04,590 --> 00:08:07,300 But because log of x is negative we can just-- 150 00:08:07,300 --> 00:08:09,210 we know that this is actually going to be a positive number. 151 00:08:09,210 --> 00:08:13,550 So this is just to make it look more intuitive. 152 00:08:13,550 --> 00:08:17,810 OK so now to complete this we can just plug that back in and 153 00:08:17,810 --> 00:08:20,470 the final answer is-- 154 00:08:20,470 --> 00:08:31,480 this is going to be 1 over theta, over the absolute value of log of x, or you 155 00:08:31,480 --> 00:08:38,380 could also rewrite this as 1 over the product of theta and the absolute 156 00:08:38,380 --> 00:08:41,880 value of log of x.
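The single-observation calculation can be collected into one worked derivation. The block below is a sketch that follows the steps as spoken; the symbols $f_\Theta$, $f_{X\mid\Theta}$ and $f_X$ are notation assumed here for the prior, the conditional PDF and the normalizing denominator, and $\log$ is taken to be the natural logarithm, as the integral of 1 over theta implies.

```latex
\begin{align*}
f_{\Theta\mid X}(\theta\mid x)
  &= \frac{f_\Theta(\theta)\, f_{X\mid\Theta}(x\mid\theta)}{f_X(x)},
  \qquad 0 \le x \le \theta \le 1, \\[4pt]
f_X(x)
  &= \int_x^1 1\cdot\frac{1}{\theta}\, d\theta
   = \log 1 - \log x
   = -\log x
   = \lvert \log x \rvert, \\[4pt]
f_{\Theta\mid X}(\theta\mid x)
  &= \frac{1}{\theta\,\lvert \log x \rvert},
  \qquad x \le \theta \le 1,
  \quad\text{and } 0 \text{ otherwise.}
\end{align*}
```

As a quick sanity check, integrating the last line over theta from x to 1 gives $\lvert\log x\rvert / \lvert\log x\rvert = 1$, as a PDF should.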
157 00:08:41,880 --> 00:08:45,780 And of course, remember that the actual limits for where 158 00:08:45,780 --> 00:08:49,690 this is valid are very important. 159 00:08:49,690 --> 00:08:54,430 OK, so what does this actually mean? 160 00:08:54,430 --> 00:09:02,170 Let's try to interpret what this answer is. 161 00:09:02,170 --> 00:09:07,900 So what we have is this is the posterior distribution. 162 00:09:07,900 --> 00:09:09,710 And now what have we done? 163 00:09:09,710 --> 00:09:14,650 Well we started out with the prior, which was that theta is 164 00:09:14,650 --> 00:09:21,500 uniform between 0 and 1. 165 00:09:21,500 --> 00:09:24,120 This is our prior belief. 166 00:09:24,120 --> 00:09:25,900 Now we observed some data. 167 00:09:25,900 --> 00:09:29,200 And this allows us to update our belief. 168 00:09:29,200 --> 00:09:31,870 And this is the update that we get. 169 00:09:31,870 --> 00:09:36,700 So let's just assume that we observe that Juliet is late by 170 00:09:36,700 --> 00:09:38,430 half an hour. 171 00:09:38,430 --> 00:09:40,660 Well if she's late by half an hour, what does that tell us 172 00:09:40,660 --> 00:09:42,540 about what theta can be? 173 00:09:42,540 --> 00:09:46,750 Well what we know from that at least is that theta cannot be 174 00:09:46,750 --> 00:09:50,460 anything less than half an hour because if theta were 175 00:09:50,460 --> 00:09:54,040 less than half an hour there's no way that her delay-- 176 00:09:54,040 --> 00:09:56,740 remember her delay we know has to be distributed 177 00:09:56,740 --> 00:09:58,150 between 0 and theta. 178 00:09:58,150 --> 00:10:01,090 There's no way that her delay could be half an hour if theta 179 00:10:01,090 --> 00:10:02,860 were less than half an hour. 180 00:10:02,860 --> 00:10:10,960 So automatically we know that now theta has to be somewhere 181 00:10:10,960 --> 00:10:14,900 between x and 1, which is where this limit comes in.
182 00:10:14,900 --> 00:10:17,530 So we know that theta has to be between x and 1 now instead 183 00:10:17,530 --> 00:10:18,820 of just 0 and 1. 184 00:10:18,820 --> 00:10:23,910 So observing an x cuts down and eliminates part of 185 00:10:23,910 --> 00:10:28,320 the range of theta, the range that theta can take on. 186 00:10:28,320 --> 00:10:30,000 Now what else do we know? 187 00:10:30,000 --> 00:10:31,665 Well this, we can actually plot this. 188 00:10:31,665 --> 00:10:34,030 This is a function of theta. 189 00:10:34,030 --> 00:10:35,930 The log x, we can just think of it as some 190 00:10:35,930 --> 00:10:37,500 sort of scaling factor. 191 00:10:37,500 --> 00:10:41,030 So it's something like 1 over theta scaled. 192 00:10:41,030 --> 00:10:43,015 And so that's going to look something like this. 193 00:10:43,015 --> 00:10:46,080 194 00:10:46,080 --> 00:10:48,270 And so what we've done is we've transformed the prior, 195 00:10:48,270 --> 00:10:50,850 which looks flat and uniform, into something that 196 00:10:50,850 --> 00:10:52,660 looks like this, the posterior. 197 00:10:52,660 --> 00:10:56,050 So we've eliminated small values of theta because we know 198 00:10:56,050 --> 00:10:57,650 that those can't be possible. 199 00:10:57,650 --> 00:11:01,600 And now what's left is everything between x and 1. 200 00:11:01,600 --> 00:11:07,830 So now why is it also that it becomes not uniform 201 00:11:07,830 --> 00:11:09,520 between x and 1? 202 00:11:09,520 --> 00:11:16,630 Well it's because, if you think about it, when theta is 203 00:11:16,630 --> 00:11:20,010 close to x, so say x is half an hour. 204 00:11:20,010 --> 00:11:23,520 If theta is half an hour, that means that there's higher 205 00:11:23,520 --> 00:11:26,320 probability that you will actually observe something, a 206 00:11:26,320 --> 00:11:31,570 delay of half an hour because there's only a range between 0 207 00:11:31,570 --> 00:11:36,640 and half an hour that x can be drawn from.
208 00:11:36,640 --> 00:11:42,120 Now if theta was actually 1 then x could be drawn anywhere 209 00:11:42,120 --> 00:11:44,330 from 0 to 1 which is a wider range. 210 00:11:44,330 --> 00:11:48,730 And so it's less likely that you'll get a value of x equal 211 00:11:48,730 --> 00:11:49,780 to half an hour. 212 00:11:49,780 --> 00:11:55,070 And so because of that values of theta closer 213 00:11:55,070 --> 00:11:56,540 to x are more likely. 214 00:11:56,540 --> 00:12:01,320 That's why you get this decreasing function. 215 00:12:01,320 --> 00:12:09,690 OK, so now let's continue and now what we have is this is 216 00:12:09,690 --> 00:12:12,350 the case for if you observe one data point. 217 00:12:12,350 --> 00:12:16,440 So you arrange a date with Juliet, you observe how late 218 00:12:16,440 --> 00:12:18,800 she is, and you get one value of x. 219 00:12:18,800 --> 00:12:22,690 And now suppose you want to collect more data so you 220 00:12:22,690 --> 00:12:25,110 arrange say 10 dates with Juliet. 221 00:12:25,110 --> 00:12:27,420 And for each one you observe how late she was. 222 00:12:27,420 --> 00:12:33,720 So now we can collect multiple samples, say 223 00:12:33,720 --> 00:12:35,290 n samples of delays. 224 00:12:35,290 --> 00:12:39,080 So x1 is her delay on the first date. 225 00:12:39,080 --> 00:12:41,950 Xn is her delay on the nth date. 226 00:12:41,950 --> 00:12:45,100 And x we can now just call a variable that's a collection 227 00:12:45,100 --> 00:12:46,730 of all of these. 228 00:12:46,730 --> 00:12:49,180 And now the question is, how do you incorporate all this 229 00:12:49,180 --> 00:12:53,330 information into updating your belief about theta? 230 00:12:53,330 --> 00:12:55,360 And it's actually pretty analogous to 231 00:12:55,360 --> 00:12:56,730 what we've done here.
232 00:12:56,730 --> 00:12:59,030 The important assumption that we make in this problem is 233 00:12:59,030 --> 00:13:04,590 that conditional on theta, all of these delays are in fact 234 00:13:04,590 --> 00:13:06,730 conditionally independent. 235 00:13:06,730 --> 00:13:09,210 And that's going to help us solve this problem. 236 00:13:09,210 --> 00:13:13,900 So the setup is essentially the same. 237 00:13:13,900 --> 00:13:16,740 What we still need is a-- 238 00:13:16,740 --> 00:13:18,980 we still need the prior. 239 00:13:18,980 --> 00:13:20,230 And the prior hasn't changed. 240 00:13:20,230 --> 00:13:23,980 241 00:13:23,980 --> 00:13:27,225 The prior is still uniform between 0 and 1. 242 00:13:27,225 --> 00:13:35,460 243 00:13:35,460 --> 00:13:40,140 The way the actual delays are generated we still assume 244 00:13:40,140 --> 00:13:44,420 to be the same: conditional on theta, each one 245 00:13:44,420 --> 00:13:47,230 of these is conditionally independent, and each one is 246 00:13:47,230 --> 00:13:51,000 uniformly distributed between 0 and theta. 247 00:13:51,000 --> 00:13:57,790 And so what we get is that this is going to be equal to-- 248 00:13:57,790 --> 00:14:04,420 you can also imagine this as a big joint PDF, joint 249 00:14:04,420 --> 00:14:12,250 conditional PDF of all the x's. 250 00:14:12,250 --> 00:14:15,800 And because we said that they are conditionally independent 251 00:14:15,800 --> 00:14:20,450 given theta, then we can actually split this joint PDF 252 00:14:20,450 --> 00:14:24,330 into the product of a lot of individual conditional PDFs. 253 00:14:24,330 --> 00:14:29,240 So this we can actually rewrite as the PDF of x1 given 254 00:14:29,240 --> 00:14:33,720 theta times all the way through the conditional PDF of 255 00:14:33,720 --> 00:14:38,640 xn given theta.
256 00:14:38,640 --> 00:14:41,870 And because we assume that each one of these is-- 257 00:14:41,870 --> 00:14:44,060 for each one of these it's uniformly distributed between 258 00:14:44,060 --> 00:14:46,680 0 and theta, they're all the same. 259 00:14:46,680 --> 00:14:49,600 So in fact what we get is 1 over theta 260 00:14:49,600 --> 00:14:50,810 for each one of these. 261 00:14:50,810 --> 00:14:52,090 And there's n of them. 262 00:14:52,090 --> 00:14:53,490 So it's 1 over theta to the n. 263 00:14:53,490 --> 00:14:57,700 264 00:14:57,700 --> 00:15:02,010 But what values of x is this valid for? 265 00:15:02,010 --> 00:15:03,590 What values of x and theta? 266 00:15:03,590 --> 00:15:08,290 Well what we need is that for each one of these, we need 267 00:15:08,290 --> 00:15:14,940 that theta has to be at least equal to whatever x you get. 268 00:15:14,940 --> 00:15:17,820 Whatever x you observe, theta has to be at least that. 269 00:15:17,820 --> 00:15:28,730 So we know that theta has to be at least equal to x1 and all 270 00:15:28,730 --> 00:15:29,490 the way through xn. 271 00:15:29,490 --> 00:15:32,650 And so theta has to be greater than or equal to 272 00:15:32,650 --> 00:15:39,370 all these x's and otherwise this would be 0. 273 00:15:39,370 --> 00:15:42,110 So let's define something that's going to help us. 274 00:15:42,110 --> 00:15:50,620 Let's define x bar to be the maximum of all 275 00:15:50,620 --> 00:15:53,680 the observed x's. 276 00:15:53,680 --> 00:16:03,600 And so what we can do is rewrite this condition as 277 00:16:03,600 --> 00:16:06,590 theta has to be at least equal to the 278 00:16:06,590 --> 00:16:10,050 maximum, equal to x bar. 279 00:16:10,050 --> 00:16:13,740 All right, and now we can again apply Bayes' rule. 280 00:16:13,740 --> 00:16:16,560 Bayes' rule will tell us what this posterior 281 00:16:16,560 --> 00:16:19,270 distribution is.
282 00:16:19,270 --> 00:16:27,490 So again the numerator will be the prior times this 283 00:16:27,490 --> 00:16:34,645 conditional PDF, over the PDF of x. 284 00:16:34,645 --> 00:16:41,100 OK, so the numerator again, the prior is just 1. 285 00:16:41,100 --> 00:16:43,850 This distribution we calculated over here. 286 00:16:43,850 --> 00:16:47,240 It's 1 over theta to the n. 287 00:16:47,240 --> 00:16:50,630 And then we have this denominator. 288 00:16:50,630 --> 00:16:53,660 289 00:16:53,660 --> 00:16:58,320 And again, we need to be careful to write down when 290 00:16:58,320 --> 00:16:59,650 this is actually valid. 291 00:16:59,650 --> 00:17:08,599 So it's actually valid when x bar is greater than theta-- 292 00:17:08,599 --> 00:17:12,069 I'm sorry, x bar is less than or equal to theta, and 293 00:17:12,069 --> 00:17:14,480 otherwise it's zero. 294 00:17:14,480 --> 00:17:17,390 So this is actually more or less complete. 295 00:17:17,390 --> 00:17:22,109 Again we need to calculate out what exactly this denominator 296 00:17:22,109 --> 00:17:26,660 is but just like before it's actually just a scaling factor 297 00:17:26,660 --> 00:17:28,800 which is independent of what theta is. 298 00:17:28,800 --> 00:17:31,530 So if we wanted to, we could actually calculate this out. 299 00:17:31,530 --> 00:17:34,150 It would be just like before. 300 00:17:34,150 --> 00:17:37,200 It would be the integral of the numerator, which is 1 over 301 00:17:37,200 --> 00:17:39,340 theta to the n d theta. 302 00:17:39,340 --> 00:17:43,050 And we integrate theta from, before, it was from x to 1. 303 00:17:43,050 --> 00:17:46,970 But now we need to integrate from x bar to 1. 304 00:17:46,970 --> 00:17:49,100 And if we wanted to, we can actually do this. 305 00:17:49,100 --> 00:17:54,230 It's fairly simple calculus to calculate what this 306 00:17:54,230 --> 00:17:55,840 normalization factor would be.
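The normalization left as "fairly simple calculus" can be sketched in Python. This is an illustrative sketch rather than part of the recorded solution: the function names `normalizer` and `posterior` are made up here, and it assumes every observed delay lies strictly between 0 and 1 so that the integral is positive. The closed form used for n of 2 or more is the integral of theta to the minus n from x bar to 1, which works out to (x bar to the power 1 minus n, minus 1) divided by (n minus 1).

```python
import math

def normalizer(x_bar, n):
    """Integral of theta**(-n) over [x_bar, 1], assuming 0 < x_bar < 1 and n >= 1."""
    if n == 1:
        # Single observation: the integral of 1/theta is -log(x_bar) = |log x_bar|.
        return -math.log(x_bar)
    return (x_bar ** (1 - n) - 1.0) / (n - 1)

def posterior(theta, xs):
    """Posterior density of theta given the list of delays xs, uniform(0, 1) prior."""
    x_bar, n = max(xs), len(xs)
    if not (x_bar <= theta <= 1.0):
        return 0.0  # theta below the largest observed delay is impossible
    return theta ** (-n) / normalizer(x_bar, n)

# Hypothetical data: three observed delays, in hours.
delays = [0.2, 0.5, 0.35]

# Midpoint-rule check that the density integrates to about 1 over [x_bar, 1].
steps = 10_000
lo = max(delays)
h = (1.0 - lo) / steps
total = sum(posterior(lo + (i + 0.5) * h, delays) for i in range(steps)) * h
```

With a single delay in the list, `posterior` reduces to the 1 over theta times absolute value of log x form from the first part of the problem.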
307 00:17:55,840 --> 00:17:58,600 But the main point is that the shape of it will be dictated 308 00:17:58,600 --> 00:18:02,100 by this 1 over theta to the n term. 309 00:18:02,100 --> 00:18:05,820 And so now we know that with n pieces of data, it's actually 310 00:18:05,820 --> 00:18:07,530 going to be 1-- 311 00:18:07,530 --> 00:18:11,190 the shape will be 1 over theta to the n, where theta has to 312 00:18:11,190 --> 00:18:14,700 be greater than or equal to x bar. 313 00:18:14,700 --> 00:18:18,890 Before it was actually just 1 over theta and has to be 314 00:18:18,890 --> 00:18:20,920 between x and 1. 315 00:18:20,920 --> 00:18:24,920 So you can kind of see how the problem generalizes when you 316 00:18:24,920 --> 00:18:27,140 collect more data. 317 00:18:27,140 --> 00:18:30,920 So now imagine that this is the new-- 318 00:18:30,920 --> 00:18:34,460 when you collect n pieces of data, the maximum of all the 319 00:18:34,460 --> 00:18:36,140 x's is here. 320 00:18:36,140 --> 00:18:40,000 Well, it turns out that the posterior now is going to 321 00:18:40,000 --> 00:18:43,020 look something like this. 322 00:18:43,020 --> 00:18:45,960 323 00:18:45,960 --> 00:18:49,460 So it becomes steeper because it's 1 over theta to the n as 324 00:18:49,460 --> 00:18:50,740 opposed to 1 over theta. 325 00:18:50,740 --> 00:18:55,560 And it's limited to be between x bar and 1. 326 00:18:55,560 --> 00:19:01,020 And so with more data you're more sure of the range that 327 00:19:01,020 --> 00:19:08,260 theta can take on because each data point eliminates parts 328 00:19:08,260 --> 00:19:11,260 of theta, the range of theta that theta can't be. 329 00:19:11,260 --> 00:19:13,720 And so you're left with just x bar to 1. 330 00:19:13,720 --> 00:19:15,320 And you're also more certain. 331 00:19:15,320 --> 00:19:20,350 So you have this kind of distribution.
332 00:19:20,350 --> 00:19:26,590 OK, so this is kind of the posterior distribution which 333 00:19:26,590 --> 00:19:31,170 tells you the entire distribution of what the 334 00:19:31,170 --> 00:19:33,000 unknown parameter-- 335 00:19:33,000 --> 00:19:35,220 the entire distribution of the unknown parameter given all 336 00:19:35,220 --> 00:19:38,750 the data that you have plus the prior 337 00:19:38,750 --> 00:19:40,800 distribution that you have. 338 00:19:40,800 --> 00:19:44,250 But if someone were to come to ask you, your manager asks 339 00:19:44,250 --> 00:19:49,090 you, well what is your best guess of what theta is? 340 00:19:49,090 --> 00:19:54,130 It's less informative or less clear when you tell them, 341 00:19:54,130 --> 00:19:55,560 here's the distribution. 342 00:19:55,560 --> 00:20:00,000 Because you still have a big range of what theta could be, 343 00:20:00,000 --> 00:20:02,760 it could be anything between x and 1 or x bar and 1. 344 00:20:02,760 --> 00:20:05,420 So if you wanted to actually come up with a point estimate 345 00:20:05,420 --> 00:20:09,190 which is just one single value, there's different ways 346 00:20:09,190 --> 00:20:10,100 you can do it. 347 00:20:10,100 --> 00:20:16,380 The first way that we'll talk about is the MAP rule. 348 00:20:16,380 --> 00:20:20,130 What the MAP rule does is it takes the posterior 349 00:20:20,130 --> 00:20:25,700 distribution and just finds the value of the parameter 350 00:20:25,700 --> 00:20:29,050 that gives the maximum posterior distribution, the 351 00:20:29,050 --> 00:20:31,360 maximum point in the posterior distribution. 352 00:20:31,360 --> 00:20:39,560 So if you look at this posterior distribution, the MAP will 353 00:20:39,560 --> 00:20:43,870 just take the highest value. 354 00:20:43,870 --> 00:20:47,320 And in this case, because the posterior looks like this, the 355 00:20:47,320 --> 00:20:51,060 highest value is in fact at x. 356 00:20:51,060 --> 00:21:00,260 And so theta hat MAP is actually just x.
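To see the MAP rule in action for a single observation, one option is a brute-force search over a grid of candidate theta values. The sketch below assumes the single-observation posterior derived earlier, 1 over (theta times the absolute value of log x); the grid resolution and the observed delay of half an hour are illustrative choices.

```python
import math

def posterior(theta, x):
    """Single-observation posterior: 1 / (theta * |log x|) on [x, 1], else 0."""
    if not (x <= theta <= 1.0):
        return 0.0
    return 1.0 / (theta * abs(math.log(x)))

x = 0.5  # hypothetical observed delay, in hours

# Candidate values 0.001, 0.002, ..., 1.000 for theta.
grid = [i / 1000 for i in range(1, 1001)]

# The MAP estimate is the grid point where the posterior is largest.
theta_hat_map = max(grid, key=lambda t: posterior(t, x))
```

Because the posterior is zero below x and strictly decreasing above it, the search lands on the observed delay itself, matching the conclusion that theta hat MAP equals x.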
357 00:21:00,260 --> 00:21:03,360 And if you think about it, this is kind of an optimistic 358 00:21:03,360 --> 00:21:07,310 estimate because you always assume that it's whatever, if 359 00:21:07,310 --> 00:21:12,800 Juliet were 30 minutes late then you assume that her delay 360 00:21:12,800 --> 00:21:16,300 is uniformly distributed between 0 and 30 minutes. 361 00:21:16,300 --> 00:21:20,710 Well in fact, even though she arrived 30 minutes late, that 362 00:21:20,710 --> 00:21:24,340 could have been because her delay is actually distributed between 0 363 00:21:24,340 --> 00:21:27,450 and 1 hour and you just happened to get 30 minutes. 364 00:21:27,450 --> 00:21:30,590 But what you do is you always take kind of the optimistic view, 365 00:21:30,590 --> 00:21:33,690 and just give her the benefit of the doubt, and say that was 366 00:21:33,690 --> 00:21:37,210 actually kind of the worst case scenario given her 367 00:21:37,210 --> 00:21:39,320 distribution. 368 00:21:39,320 --> 00:21:43,490 So another way to take this entire posterior distribution 369 00:21:43,490 --> 00:21:46,440 and come up with just a single number, a point estimate, is 370 00:21:46,440 --> 00:21:49,560 to take the conditional expectation. 371 00:21:49,560 --> 00:21:51,750 So you have an entire distribution. 372 00:21:51,750 --> 00:21:55,020 So there's two obvious ways of getting a number out of this. 373 00:21:55,020 --> 00:21:57,500 One is to take the maximum and the other is to take the 374 00:21:57,500 --> 00:21:58,170 expectation. 375 00:21:58,170 --> 00:22:01,130 So take everything in the distribution, combine it and 376 00:22:01,130 --> 00:22:03,740 come up with an estimate. 377 00:22:03,740 --> 00:22:06,240 So if you think about it, it will probably be something 378 00:22:06,240 --> 00:22:09,610 like here, would be the conditional expectation. 379 00:22:09,610 --> 00:22:12,890 So this is called the LMS estimator.
380 00:22:12,890 --> 00:22:17,590 And the way to calculate it is just like we said, you take 381 00:22:17,590 --> 00:22:18,840 the conditional expectation. 382 00:22:18,840 --> 00:22:21,190 383 00:22:21,190 --> 00:22:23,710 So how do we take the conditional expectation? 384 00:22:23,710 --> 00:22:29,810 Remember it is just the value and you weight it by the 385 00:22:29,810 --> 00:22:33,260 correct distribution, in this case it's the conditional PDF 386 00:22:33,260 --> 00:22:37,580 of theta given x which is the posterior distribution. 387 00:22:37,580 --> 00:22:40,770 And what do we integrate theta from? 388 00:22:40,770 --> 00:22:45,840 Well we integrate it from x to 1. 389 00:22:45,840 --> 00:22:48,780 Now if we plug this in, we integrate from x to 1, theta 390 00:22:48,780 --> 00:22:56,370 times the posterior. 391 00:22:56,370 --> 00:23:02,710 The posterior we calculated earlier, it was 1 over theta 392 00:23:02,710 --> 00:23:07,840 times the absolute value of log x. 393 00:23:07,840 --> 00:23:11,090 So the thetas just cancel out, and you just have 1 over 394 00:23:11,090 --> 00:23:12,110 absolute value of log x. 395 00:23:12,110 --> 00:23:13,970 Well that doesn't depend on theta. 396 00:23:13,970 --> 00:23:20,150 So what you get is just 1 minus x over absolute 397 00:23:20,150 --> 00:23:23,810 value of log x. 398 00:23:23,810 --> 00:23:28,870 All right, so we can actually plot this, so we have two 399 00:23:28,870 --> 00:23:29,570 estimates now. 400 00:23:29,570 --> 00:23:33,280 One is that the estimate is just theta-- 401 00:23:33,280 --> 00:23:34,830 the estimate is just x. 402 00:23:34,830 --> 00:23:37,710 The other one is that it's 1 minus x over absolute 403 00:23:37,710 --> 00:23:39,840 value of log x. 404 00:23:39,840 --> 00:23:41,630 So we can plot this and compare the two. 
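The two point estimates for a single observed delay can be compared numerically. The following is a small sketch; the function names `theta_map` and `theta_lms` and the sample delays are illustrative, and x is assumed to lie strictly between 0 and 1 so that log x is nonzero.

```python
import math

def theta_map(x):
    """MAP estimate: the posterior peaks at the observed delay itself."""
    return x

def theta_lms(x):
    """LMS estimate: conditional expectation, (1 - x) / |log x|, for 0 < x < 1."""
    return (1.0 - x) / abs(math.log(x))

# Hypothetical observed delays, in hours.
observed = (0.1, 0.25, 0.5, 0.75, 0.9)
table = [(x, theta_map(x), theta_lms(x)) for x in observed]

for x, est_map, est_lms in table:
    print(f"x={x:.2f}  MAP={est_map:.3f}  LMS={est_lms:.3f}")
```

For every delay in this list the LMS estimate comes out above the MAP estimate, since the conditional expectation averages over all values of theta between x and 1 rather than picking the smallest feasible one.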
405 00:23:41,630 --> 00:23:45,250 406 00:23:45,250 --> 00:23:53,320 So here's x, and here is theta hat, theta hat of x for the 407 00:23:53,320 --> 00:23:55,490 two different estimates. 408 00:23:55,490 --> 00:24:02,690 So here's the estimate from the MAP rule which is 409 00:24:02,690 --> 00:24:06,190 whatever x is, we estimate that theta is equal to x. 410 00:24:06,190 --> 00:24:09,130 So it just looks like this. 411 00:24:09,130 --> 00:24:11,520 Now if we plot this, turns out that it looks 412 00:24:11,520 --> 00:24:12,770 something like this. 413 00:24:12,770 --> 00:24:18,980 414 00:24:18,980 --> 00:24:22,550 And so whatever x is, this will tell you what the 415 00:24:22,550 --> 00:24:25,480 estimate, the LMS estimate of theta would be. 416 00:24:25,480 --> 00:24:27,740 And it turns out that it's always higher 417 00:24:27,740 --> 00:24:29,850 than the MAP estimate. 418 00:24:29,850 --> 00:24:33,330 So it's less optimistic. 419 00:24:33,330 --> 00:24:36,485 And it kind of factors in the entire distribution. 420 00:24:36,485 --> 00:24:41,380 421 00:24:41,380 --> 00:24:43,980 So because there are several parts to this problem, we're 422 00:24:43,980 --> 00:24:46,470 going to take a pause for a quick break and we'll come 423 00:24:46,470 --> 00:24:48,500 back and finish the problem in a little bit. 424 00:24:48,500 --> 00:24:51,034