The trajectory estimation problem that we considered gives you a first glimpse into a large field that deals with linear normal models. In this segment, I will just give you a preview of what happens in that field, although we will not attempt to prove or justify anything.

What happens in this field is that we're dealing with models where there are some underlying independent normal random variables. And then the random variables of interest, the unknown parameters and the observations, can all be expressed as linear functions of these independent normals. Since linear functions of independent normals are normal, this means in particular that the Theta j and the X i are all normal random variables. Carrying out inference within this class of models goes under the name of linear regression.

One can proceed using pretty much the same steps as we had in the trajectory estimation problem and write down formulas for the posterior. And it turns out that in every case, the posterior of the parameter vector takes a form which is the exponential of the negative of a quadratic function of the parameters. This means, in particular, that in order to find the MAP estimate of the vector Theta, what we need to do is just minimize this quadratic function with respect to Theta. And minimizing a quadratic function is done by taking derivatives and setting them to zero, which leads us to a system of linear equations, exactly as in the trajectory inference problem. This means that numerically it is very simple to come up with a MAP estimate.

There is an interesting fact that comes out of the algebra involved, namely, that the MAP estimate of each one of the parameters turns out to also be a linear function of the observations. We saw this property in the estimators that we derived for simpler cases in this lecture sequence. It turns out that this property is still true in this more general setting.
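To make this concrete, here is a minimal sketch (not from the lecture) of MAP estimation in one such linear normal model, assuming a zero-mean normal prior on the parameter vector Theta and observations X = A Theta + W with independent normal noise W; the specific matrices and numbers (A, Sigma0, SigmaW) are illustrative choices. Setting the gradient of the quadratic in the exponent to zero gives the linear system that is solved below.

```python
import numpy as np

# Minimal sketch of MAP estimation in a linear normal model.
# Assumed setup (illustrative, not from the lecture):
#   Theta ~ N(0, Sigma0)  is the unknown parameter vector,
#   W     ~ N(0, SigmaW)  is independent normal noise,
#   X = A @ Theta + W     is the observation vector.
rng = np.random.default_rng(0)

n, m = 3, 10                      # number of parameters, number of observations
A = rng.normal(size=(m, n))       # known linear map from parameters to observations
Sigma0 = np.eye(n)                # prior covariance of Theta
SigmaW = 0.5 * np.eye(m)          # noise covariance

theta_true = rng.normal(size=n)
x = A @ theta_true + rng.multivariate_normal(np.zeros(m), SigmaW)

# The posterior of Theta given X = x is proportional to
# exp(-(1/2) * quadratic in theta).  Minimizing that quadratic means
# setting its gradient to zero, which is the linear system
#   (Sigma0^-1 + A^T SigmaW^-1 A) theta_hat = A^T SigmaW^-1 x
precision = np.linalg.inv(Sigma0) + A.T @ np.linalg.inv(SigmaW) @ A
rhs = A.T @ np.linalg.inv(SigmaW) @ x
theta_map = np.linalg.solve(precision, rhs)

print("MAP estimate:", theta_map)
# Note that theta_map is obtained by applying a fixed matrix to x,
# i.e. the MAP estimate is a linear function of the observations.
```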
This linearity is an appealing and desirable property, because it means that these estimators can be applied very efficiently in practice without having to do any complicated calculations.

Finally, there are a number of important facts, some of which we have seen in our examples, which are true in very great generality. One fact is that the maximum a posteriori probability estimate of some parameter turns out to be the same as the conditional expectation of that parameter. Furthermore, if you look at the joint density of all the Theta parameters and from it you find the marginal density of one of the Theta parameters, always within this conditional universe, it turns out that this marginal posterior PDF is itself normal. Since it is normal, its mean, which is this quantity, is going to be equal to its peak. And therefore it is also equal to the MAP estimate that would be derived from this marginal PDF.

So what do we have here? There are two ways of coming up with MAP estimates. One way is to find the peak of the joint PDF, and then read out the different components of Theta. Another way of coming up with MAP estimates is to find, for each parameter, the marginal PDF and look at the peak of this marginal PDF. It turns out that for this model these two approaches are going to give you the same answer. Whether you work with the marginal or with the joint, you get the same MAP estimates. And this is a reassuring property to have.

Finally, as in the examples that we have worked out in more detail, it turns out that the mean squared error conditioned on a particular observation is the same no matter what the value of that observation was. And furthermore, there are fairly simple and easily computable formulas that one can apply in order to find what this mean squared error is.

So to summarize, this class of models that involve linear relations and normal random variables has a rich set of important and elegant properties. This is one of the reasons why these models are used very much in practice.
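As a rough numerical companion to the facts just mentioned, the sketch below continues the assumed model and variables from the previous snippet: the posterior covariance, whose diagonal gives the conditional mean squared errors, is computed without ever using the observed value x, and the peak of each (normal) marginal posterior is simply the corresponding component of the joint MAP estimate.

```python
# Continuing the sketch above (same assumed model and variables).
# The posterior covariance is the inverse of the precision matrix and
# does not involve the observed value x at all, which is why the
# conditional mean squared error is the same for every observation.
posterior_cov = np.linalg.inv(precision)

# Conditional mean squared error of each parameter, E[(Theta_j - theta_hat_j)^2 | X = x]:
conditional_mse = np.diag(posterior_cov)
print("conditional MSE per parameter:", conditional_mse)

# The marginal posterior of each Theta_j is normal with mean theta_map[j]
# and variance posterior_cov[j, j]; a normal PDF peaks at its mean, so the
# peak of each marginal coincides with the corresponding component of the
# joint MAP estimate, i.e. the two ways of computing MAP estimates agree.
for j in range(n):
    print(f"Theta_{j}: joint-MAP component = marginal peak = {theta_map[j]:.4f}, "
          f"posterior std = {np.sqrt(posterior_cov[j, j]):.4f}")
```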
Indeed, these linear normal models are probably the most widely used class of statistical models.