1 00:00:00,806 --> 00:00:04,200 Let us now come back to the trajectory estimation problem 2 00:00:04,200 --> 00:00:06,250 that we introduced earlier. 3 00:00:06,250 --> 00:00:09,630 We have an object that moves vertically. 4 00:00:09,630 --> 00:00:14,720 At any given time t, the height at which the object is found 5 00:00:14,720 --> 00:00:17,350 is equal to this expression. 6 00:00:17,350 --> 00:00:19,950 It corresponds to the following-- the object starts 7 00:00:19,950 --> 00:00:22,940 at time 0, at some initial height Theta0, 8 00:00:22,940 --> 00:00:25,570 it has an initial velocity of Theta1, 9 00:00:25,570 --> 00:00:27,980 but also has a certain acceleration. 10 00:00:27,980 --> 00:00:29,890 And if Theta2 is negative, this will 11 00:00:29,890 --> 00:00:31,750 be a downwards acceleration, which 12 00:00:31,750 --> 00:00:36,360 means that the object eventually will turn and start going down. 13 00:00:36,360 --> 00:00:39,850 So this is a typical trajectory of such an object, where 14 00:00:39,850 --> 00:00:44,010 here we're plotting the height as a function of time. 15 00:00:44,010 --> 00:00:47,810 However, the Thetas are unknown and they 16 00:00:47,810 --> 00:00:51,120 are random-- we do not know what they are. 17 00:00:51,120 --> 00:00:53,745 So this blue curve is just a simulation 18 00:00:53,745 --> 00:00:57,910 where we drew values for those random variables at random. 19 00:00:57,910 --> 00:00:59,640 But if we were to simulate again, 20 00:00:59,640 --> 00:01:02,380 we might obtain a somewhat different blue curve, 21 00:01:02,380 --> 00:01:06,250 because the values of the Thetas might have been different. 22 00:01:06,250 --> 00:01:09,830 We do not observe the true trajectory directly. 23 00:01:09,830 --> 00:01:13,610 What we do observe is certain data points. 24 00:01:13,610 --> 00:01:14,820 What are they? 
25 00:01:14,820 --> 00:01:19,000 At certain times ti we make a measurement 26 00:01:19,000 --> 00:01:23,030 of the height of the object, except that this measurement 27 00:01:23,030 --> 00:01:26,120 is corrupted by some additive noise. 28 00:01:26,120 --> 00:01:28,440 This is the model that we introduced earlier. 29 00:01:28,440 --> 00:01:31,340 And our assumptions were that all of the random variables 30 00:01:31,340 --> 00:01:36,020 involved-- the Thetas and the W's were normal with 0 mean 31 00:01:36,020 --> 00:01:38,539 and were also independent. 32 00:01:38,539 --> 00:01:43,390 In that case, we saw that maximizing the posterior 33 00:01:43,390 --> 00:01:49,000 distribution of the Thetas after taking logarithms 34 00:01:49,000 --> 00:01:52,690 amounted to minimizing this quadratic function 35 00:01:52,690 --> 00:01:54,039 of the thetas. 36 00:01:54,039 --> 00:01:57,370 So once we have some data available in our hands, 37 00:01:57,370 --> 00:02:00,500 we look at this expression as a function of the thetas 38 00:02:00,500 --> 00:02:04,270 and find the thetas that are as good as possible 39 00:02:04,270 --> 00:02:06,180 in terms of this criterion. 40 00:02:06,180 --> 00:02:10,360 And this is the MAP methodology for this particular example. 41 00:02:10,360 --> 00:02:12,490 Now, for the purposes of this illustration, 42 00:02:12,490 --> 00:02:16,430 actually, we will change our assumptions a little bit. 43 00:02:16,430 --> 00:02:17,850 They will be as follows. 44 00:02:17,850 --> 00:02:22,300 Regarding the acceleration, we will take it to be a constant. 45 00:02:22,300 --> 00:02:24,240 The acceleration term often has to do 46 00:02:24,240 --> 00:02:26,650 with gravitational effects which are known, 47 00:02:26,650 --> 00:02:28,860 so we will treat Theta2 as a constant. 48 00:02:28,860 --> 00:02:30,770 And that means that there's no point 49 00:02:30,770 --> 00:02:34,060 in having a prior distribution for Theta2.
50 00:02:34,060 --> 00:02:36,860 So this term here, which originated 51 00:02:36,860 --> 00:02:41,570 from the prior distribution of Theta2 is going to disappear. 52 00:02:41,570 --> 00:02:45,760 We will take the variances of these basic random variables 53 00:02:45,760 --> 00:02:47,310 to be the same. 54 00:02:47,310 --> 00:02:49,490 And because of this, these constants 55 00:02:49,490 --> 00:02:52,030 here will all be the same. 56 00:02:52,030 --> 00:02:55,560 Therefore, we can take them outside of this expression, 57 00:02:55,560 --> 00:02:58,670 and outside the minimization they will not matter. 58 00:02:58,670 --> 00:03:02,220 So we can remove them from the picture. 59 00:03:02,220 --> 00:03:05,330 The factor of 1/2 can also be removed similarly. 60 00:03:05,330 --> 00:03:08,510 It does not affect the minimization. 61 00:03:08,510 --> 00:03:12,700 Finally, just in order to get a nicer illustration, 62 00:03:12,700 --> 00:03:15,920 instead of taking 0 means, we're assuming 63 00:03:15,920 --> 00:03:19,620 that the initial position has a mean of 200. 64 00:03:19,620 --> 00:03:22,560 So we're starting somewhere around here. 65 00:03:22,560 --> 00:03:26,230 And furthermore, the initial velocity has a mean of 50. 66 00:03:26,230 --> 00:03:29,920 So we expect the object to start moving upwards. 67 00:03:29,920 --> 00:03:32,650 How does this change the formulation? 68 00:03:32,650 --> 00:03:35,880 Well, remember, that this term and this term 69 00:03:35,880 --> 00:03:41,480 originated from the priors for Theta0 and Theta1. 70 00:03:41,480 --> 00:03:46,380 If we now change the means, the priors will change. 71 00:03:46,380 --> 00:03:50,600 And what happens, if you recall the formula for the normal PDF 72 00:03:50,600 --> 00:03:54,300 and how the mean enters, after you take logarithms, 73 00:03:54,300 --> 00:03:57,010 you see that instead of having here theta0, 74 00:03:57,010 --> 00:04:02,980 you should have theta0 minus the mean of theta0 squared. 
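For reference, the criterion described in words above can be written out as follows. This is a sketch under the stated assumptions (Theta2 a known constant, equal variances, prior means 200 and 50). The symbols x_i for the measured heights and n for the number of measurements are notation assumed here; they are not named in the transcript.

```latex
\min_{\theta_0,\,\theta_1}\;
  (\theta_0 - 200)^2 + (\theta_1 - 50)^2
  + \sum_{i=1}^{n} \bigl( x_i - \theta_0 - \theta_1 t_i - \theta_2 t_i^2 \bigr)^2
```

The first two terms come from the priors on Theta0 and Theta1, and the sum comes from the noise model. The common variance and the factor of 1/2 have been dropped because they do not change the minimizer.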
75 00:04:02,980 --> 00:04:06,640 And this leads us to the following formulation. 76 00:04:06,640 --> 00:04:10,630 So this is the formulation that we will consider. 77 00:04:10,630 --> 00:04:14,030 We obtain these data points, and for these particular data 78 00:04:14,030 --> 00:04:17,680 points and for known times at which the measurements were 79 00:04:17,680 --> 00:04:22,260 taken, we put these numbers into this minimization, carried 80 00:04:22,260 --> 00:04:26,450 it out numerically, and this is what we got. 81 00:04:26,450 --> 00:04:29,680 We got estimates for the different parameters. 82 00:04:29,680 --> 00:04:33,640 And using these estimates, we can use this expression 83 00:04:33,640 --> 00:04:36,240 to construct an estimated trajectory. 84 00:04:36,240 --> 00:04:39,870 And the estimated trajectory is given by the red curve. 85 00:04:39,870 --> 00:04:43,500 It seems to be doing somewhat of a reasonable job, 86 00:04:43,500 --> 00:04:44,720 but not quite. 87 00:04:44,720 --> 00:04:49,120 The distance between these two curves is quite substantial. 88 00:04:49,120 --> 00:04:50,970 How could we do a little better? 89 00:04:50,970 --> 00:04:54,659 Why is it that we're not doing very well? 90 00:04:54,659 --> 00:04:57,210 Let's think intuitively. 91 00:04:57,210 --> 00:05:01,200 One of the parameters we wish to estimate is Theta1. 92 00:05:01,200 --> 00:05:03,280 And Theta1 is a velocity. 93 00:05:03,280 --> 00:05:05,590 Now, all of our measurements are concentrated 94 00:05:05,590 --> 00:05:07,830 at pretty much the same time. 95 00:05:07,830 --> 00:05:11,430 But if you measure an object only at a certain time, 96 00:05:11,430 --> 00:05:14,490 it is very difficult to estimate its velocity. 97 00:05:14,490 --> 00:05:16,820 A much better idea would be to try 98 00:05:16,820 --> 00:05:19,900 to measure the position of the object at different times 99 00:05:19,900 --> 00:05:23,700 and use that information to estimate velocity.
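The numerical minimization mentioned above can be sketched in a few lines. Because the criterion is quadratic in theta0 and theta1, setting its gradient to zero gives a 2-by-2 linear system, so no iterative optimizer is needed. This is a minimal sketch, not the code used in the lecture: the function name `map_estimate`, the example times, and the example value of Theta2 are all assumptions made for illustration.

```python
import numpy as np

def map_estimate(t, x, theta2, prior_mean=(200.0, 50.0)):
    """Minimize (th0-m0)^2 + (th1-m1)^2 + sum_i (x_i - th0 - th1*t_i - theta2*t_i^2)^2.

    Assumes equal variances for the priors and the noise, and a known theta2.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    m = np.asarray(prior_mean, dtype=float)
    # Each row is [1, t_i]: the coefficients multiplying (theta0, theta1).
    A = np.column_stack([np.ones_like(t), t])
    # Move the known acceleration term to the data side.
    y = x - theta2 * t**2
    # Zero gradient of the quadratic criterion: (I + A^T A) theta = m + A^T y.
    return np.linalg.solve(np.eye(2) + A.T @ A, m + A.T @ y)

# Hypothetical noise-free data generated exactly at the prior means.
t_demo = np.array([0.0, 1.0, 2.0])
x_demo = 200.0 + 50.0 * t_demo - 1.0 * t_demo**2
theta_demo = map_estimate(t_demo, x_demo, theta2=-1.0)
print(theta_demo)
```

When the data agree exactly with the prior means, as in this made-up example, the estimate coincides with the prior means; with noisy data the estimate is a compromise between the prior and the measurements.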
100 00:05:23,700 --> 00:05:27,300 So let us instead of taking all the measurements 101 00:05:27,300 --> 00:05:31,490 around the initial time, have five measurements 102 00:05:31,490 --> 00:05:34,980 in the beginning and five measurements towards the end. 103 00:05:34,980 --> 00:05:37,380 The total number of measurements in this example 104 00:05:37,380 --> 00:05:40,280 is the same as in the previous example. 105 00:05:40,280 --> 00:05:44,780 And once more, we generate a simulated trajectory 106 00:05:44,780 --> 00:05:47,010 according to the probability distributions 107 00:05:47,010 --> 00:05:48,409 that we are assuming. 108 00:05:48,409 --> 00:05:51,270 Then we generate data according to this model 109 00:05:51,270 --> 00:05:54,909 and we wish to estimate this trajectory. 110 00:05:54,909 --> 00:05:58,380 We take the data, plug them into this minimization, 111 00:05:58,380 --> 00:06:02,390 carry it out numerically, and this is what we obtain. 112 00:06:02,390 --> 00:06:06,160 So we see that here we are doing a lot better. 113 00:06:06,160 --> 00:06:08,510 The estimated trajectory is quite 114 00:06:08,510 --> 00:06:12,230 close to the unknown blue trajectory, 115 00:06:12,230 --> 00:06:17,910 even though the data seems to be scattered quite a bit. 116 00:06:17,910 --> 00:06:19,440 This is a very nice property. 117 00:06:19,440 --> 00:06:23,290 But is it just an accident of this numerical experiment? 118 00:06:23,290 --> 00:06:26,730 Or, also, to put it differently, once you 119 00:06:26,730 --> 00:06:29,510 get your estimated trajectory, yes, it 120 00:06:29,510 --> 00:06:32,380 is true that it is close to the blue trajectory, 121 00:06:32,380 --> 00:06:35,980 but you do not necessarily know that fact. 
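One way to check whether the improvement is an accident of a single experiment is to repeat the simulation many times and compare the average squared error of the velocity estimate under the two measurement layouts. The sketch below does this under assumptions that are not in the transcript: a common standard deviation of 10, Theta2 equal to -2, ten clustered times in [0, 1], and a split layout with five times in [0, 1] and five in [9, 10]. All function and variable names are hypothetical.

```python
import numpy as np

def map_estimate(t, x, theta2, m):
    # Same closed-form minimizer as the quadratic criterion in the lecture:
    # (I + A^T A) theta = m + A^T (x - theta2 * t^2), with rows of A = [1, t_i].
    A = np.column_stack([np.ones_like(t), t])
    return np.linalg.solve(np.eye(2) + A.T @ A, m + A.T @ (x - theta2 * t**2))

def mean_sq_velocity_error(times, sigma=10.0, theta2=-2.0, trials=2000, seed=0):
    """Average squared error of the Theta1 estimate over repeated simulations."""
    rng = np.random.default_rng(seed)
    m = np.array([200.0, 50.0])  # prior means from the lecture
    errors = []
    for _ in range(trials):
        # Draw the unknowns from their priors, then simulate noisy measurements.
        theta = m + sigma * rng.standard_normal(2)
        noise = sigma * rng.standard_normal(times.size)
        x = theta[0] + theta[1] * times + theta2 * times**2 + noise
        est = map_estimate(times, x, theta2, m)
        errors.append((est[1] - theta[1]) ** 2)
    return float(np.mean(errors))

# Assumed layouts: all ten measurements early, versus five early and five late.
clustered = np.linspace(0.0, 1.0, 10)
split = np.concatenate([np.linspace(0.0, 1.0, 5), np.linspace(9.0, 10.0, 5)])

clustered_err = mean_sq_velocity_error(clustered)
split_err = mean_sq_velocity_error(split)
print("clustered:", clustered_err, "split:", split_err)
```

Under these assumed settings the split layout gives a much smaller average error for the velocity, which matches the intuition that positions measured far apart in time are more informative about velocity.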
122 00:06:35,980 --> 00:06:38,190 It is one thing to have an estimate that 123 00:06:38,190 --> 00:06:43,310 is close to the true value, and it's a different thing 124 00:06:43,310 --> 00:06:45,450 to have an estimate that you know 125 00:06:45,450 --> 00:06:48,380 that it is close to the true value. 126 00:06:48,380 --> 00:06:52,170 So how could we get some guarantees that, indeed, this 127 00:06:52,170 --> 00:06:56,520 is the case, that we have good estimates? 128 00:06:56,520 --> 00:06:59,070 Here's how it goes. 129 00:06:59,070 --> 00:07:02,420 As we discussed before, the posterior distribution 130 00:07:02,420 --> 00:07:06,670 of the Thetas given the data is normal. 131 00:07:06,670 --> 00:07:10,940 And for similar reasons, the posterior distribution 132 00:07:10,940 --> 00:07:14,400 of this quantity, which is the true position, 133 00:07:14,400 --> 00:07:18,650 it's what we denoted by X of t, the posterior distribution of X 134 00:07:18,650 --> 00:07:21,890 of t is also normal. 135 00:07:21,890 --> 00:07:27,040 And in fact, what we obtain from this diagram is at any given 136 00:07:27,040 --> 00:07:30,430 point it's the maximum a posteriori probability 137 00:07:30,430 --> 00:07:35,530 estimate of the position X of t at that time. 138 00:07:35,530 --> 00:07:39,270 However, besides just this point estimate, 139 00:07:39,270 --> 00:07:41,050 we have additional information. 140 00:07:44,940 --> 00:07:47,950 We know that the posterior distribution of X of t 141 00:07:47,950 --> 00:07:48,810 is normal. 142 00:07:48,810 --> 00:07:51,110 And so, for example, at this time, 143 00:07:51,110 --> 00:07:54,900 this is the peak of the posterior. 144 00:07:54,900 --> 00:07:58,430 This is the maximum a posteriori probability estimate. 145 00:07:58,430 --> 00:08:04,720 But we are also able to calculate the variance of this posterior 146 00:08:04,720 --> 00:08:06,180 distribution.
147 00:08:06,180 --> 00:08:08,650 This is a calculation that's a bit complicated 148 00:08:08,650 --> 00:08:11,370 for the multivariate case, for the case where 149 00:08:11,370 --> 00:08:13,390 you have multiple unknown parameters. 150 00:08:13,390 --> 00:08:15,300 We will not get into it. 151 00:08:15,300 --> 00:08:17,580 But we did see earlier an example 152 00:08:17,580 --> 00:08:19,950 where we had a single unknown parameter, 153 00:08:19,950 --> 00:08:22,360 and in which we were able to calculate 154 00:08:22,360 --> 00:08:24,620 the variance of the posterior distribution. 155 00:08:24,620 --> 00:08:27,220 So the idea is somewhat similar. 156 00:08:27,220 --> 00:08:30,900 So not only do we have an estimate for the position 157 00:08:30,900 --> 00:08:33,409 of the object at this particular time, 158 00:08:33,409 --> 00:08:36,230 but we also have a probability distribution 159 00:08:36,230 --> 00:08:39,980 for what the true position might be. 160 00:08:39,980 --> 00:08:43,240 And once we have such a posterior probability 161 00:08:43,240 --> 00:08:48,340 distribution, we can find an interval with the property 162 00:08:48,340 --> 00:08:55,020 that 95% of the probability is inside that interval. 163 00:08:55,020 --> 00:08:58,860 In other words, we construct an interval with the property 164 00:08:58,860 --> 00:09:04,235 that the probability that X of t belongs to the interval. 165 00:09:06,790 --> 00:09:09,440 (Now, we're talking about posterior probabilities. 166 00:09:09,440 --> 00:09:15,660 So it is a posterior probability, given the data.) 167 00:09:15,660 --> 00:09:20,670 This probability is, let's say, 0.95. 168 00:09:20,670 --> 00:09:23,970 Such an interval gives useful information. Besides a point 169 00:09:23,970 --> 00:09:28,470 estimate, it also gives us a range of possible values. 170 00:09:28,470 --> 00:09:31,760 And outside this range, it is quite unlikely 171 00:09:31,760 --> 00:09:34,770 to have the true trajectory be out there.
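Although the lecture does not go into the multivariate variance calculation, a sketch of how such a 95% interval could be computed for this model is given below. It relies on the standard result for linear normal models with equal variances: the posterior covariance of (Theta0, Theta1) is sigma^2 times the inverse of (I + A^T A), where the rows of A are [1, t_i]. The function name `posterior_interval`, the assumed known value of sigma, the example data, and the use of 1.96 as the approximate two-sided 95% normal quantile are choices made here, not taken from the transcript.

```python
import numpy as np

def posterior_interval(t_query, t, x, theta2, sigma, prior_mean=(200.0, 50.0), z=1.96):
    """Approximate 95% posterior interval for X(t_query) = th0 + th1*t + theta2*t^2."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    m = np.asarray(prior_mean, dtype=float)
    A = np.column_stack([np.ones_like(t), t])
    # Posterior precision of (theta0, theta1), up to the common factor 1/sigma^2.
    precision = np.eye(2) + A.T @ A
    # Posterior mean, which is also the MAP estimate for a normal posterior.
    theta_hat = np.linalg.solve(precision, m + A.T @ (x - theta2 * t**2))
    cov = sigma**2 * np.linalg.inv(precision)
    # X(t_query) is a linear function of (theta0, theta1) plus a known constant.
    a = np.array([1.0, t_query])
    center = a @ theta_hat + theta2 * t_query**2
    half_width = z * np.sqrt(a @ cov @ a)
    return center - half_width, center + half_width

# Hypothetical data, noise-free and generated exactly at the prior means.
t_demo = np.array([0.0, 1.0, 2.0])
x_demo = 200.0 + 50.0 * t_demo - 1.0 * t_demo**2
lo, hi = posterior_interval(0.0, t_demo, x_demo, theta2=-1.0, sigma=10.0)
print(lo, hi)
```

The interval is centered at the point estimate of the position, and its width depends on the measurement times but not on the measured values, which is a property of the linear normal model.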
172 00:09:34,770 --> 00:09:38,800 So here we're showing some confidence intervals that 173 00:09:38,800 --> 00:09:42,830 apply to different times, and they're pretty narrow, 174 00:09:42,830 --> 00:09:44,260 they're pretty small. 175 00:09:44,260 --> 00:09:48,370 And they indicate, they give us confidence 176 00:09:48,370 --> 00:09:51,230 that we have pretty accurate estimates 177 00:09:51,230 --> 00:09:53,940 of the true trajectory. 178 00:09:53,940 --> 00:09:55,760 These kinds of confidence intervals 179 00:09:55,760 --> 00:09:59,060 that we have discussed in the context of this example 180 00:09:59,060 --> 00:10:03,580 are called Bayesian confidence intervals. 181 00:10:03,580 --> 00:10:06,540 And they're very useful when you report your results, 182 00:10:06,540 --> 00:10:08,560 to not just give point estimates, 183 00:10:08,560 --> 00:10:13,540 but to also provide confidence intervals. 184 00:10:13,540 --> 00:10:16,150 Coming back to the bigger picture, what 185 00:10:16,150 --> 00:10:18,490 happened in this particular example 186 00:10:18,490 --> 00:10:22,100 is quite indicative of many real world applications. 187 00:10:22,100 --> 00:10:24,870 One starts with a linear model, in which 188 00:10:24,870 --> 00:10:29,590 we have a linear relation between the variables that 189 00:10:29,590 --> 00:10:32,180 are unknown and the observations, 190 00:10:32,180 --> 00:10:35,180 but where also the observations are corrupted by noise. 191 00:10:35,180 --> 00:10:38,495 One makes certain normality and independence assumptions. 192 00:10:38,495 --> 00:10:41,500 And as long as the modeling has been done carefully 193 00:10:41,500 --> 00:10:44,050 and the assumptions are justified, then 194 00:10:44,050 --> 00:10:47,560 by carrying out this procedure, one usually 195 00:10:47,560 --> 00:10:52,990 obtains estimates that are very helpful and very informative.