1 00:00:00,380 --> 00:00:04,070 If you remember our discussion from a long time ago, we said 2 00:00:04,070 --> 00:00:06,790 that much of this class consists of variations of a 3 00:00:06,790 --> 00:00:10,540 few basic skills and ideas, one of which is the Bayes 4 00:00:10,540 --> 00:00:13,670 rule, the foundation of inference. 5 00:00:13,670 --> 00:00:16,580 So let's look here at the Bayes rule again and its 6 00:00:16,580 --> 00:00:18,670 different incarnations. 7 00:00:18,670 --> 00:00:22,530 In a discrete setting we have a random variable with a known 8 00:00:22,530 --> 00:00:26,210 PMF but whose values are not observed. 9 00:00:26,210 --> 00:00:29,450 Instead we observe the value of another random variable, 10 00:00:29,450 --> 00:00:33,040 call it Y, which has some relation with X. 11 00:00:33,040 --> 00:00:36,800 And we will use the value of Y to make some inferences about 12 00:00:36,800 --> 00:00:40,570 X. The relation between the two random variables is 13 00:00:40,570 --> 00:00:45,390 captured by specifying the conditional PMF of Y given any 14 00:00:45,390 --> 00:00:49,690 value of X. Think of X as an unknown state of the world and 15 00:00:49,690 --> 00:00:54,530 of Y as a noisy observation of X. The conditional PMF tells 16 00:00:54,530 --> 00:00:57,840 us the distribution of Y under each possible 17 00:00:57,840 --> 00:00:59,970 state of the world. 18 00:00:59,970 --> 00:01:03,420 Once we observe the value of Y we obtain some information 19 00:01:03,420 --> 00:01:06,990 about X. And we use this information to make inferences 20 00:01:06,990 --> 00:01:11,070 about the likely values of X. Mathematically, instead of 21 00:01:11,070 --> 00:01:16,820 relying on the prior for X, we form some revised beliefs. 22 00:01:16,820 --> 00:01:19,280 That is, we form the conditional [PMF] 23 00:01:19,280 --> 00:01:24,130 of X given the particular observation that we have seen. 24 00:01:24,130 --> 00:01:27,370 All this becomes possible because of the Bayes rule. 25 00:01:27,370 --> 00:01:29,400 We have seen the Bayes rule for events. 26 00:01:29,400 --> 00:01:33,090 But it is easy to translate into PMF notation. 27 00:01:33,090 --> 00:01:35,140 We take the multiplication rule. 28 00:01:35,140 --> 00:01:38,990 And we use it twice in different orders to get two 29 00:01:38,990 --> 00:01:40,610 different forms-- 30 00:01:40,610 --> 00:01:42,090 or two different expressions-- 31 00:01:42,090 --> 00:01:44,479 for the joint PMF. 32 00:01:44,479 --> 00:01:48,990 We then take one of the terms involved here and send it to 33 00:01:48,990 --> 00:01:50,710 the other side. 34 00:01:50,710 --> 00:01:54,100 We obtain this expression, which is the Bayes rule. 35 00:01:54,100 --> 00:01:55,560 What [do] we have here? 36 00:01:55,560 --> 00:01:59,630 We want to calculate the conditional distribution of X 37 00:01:59,630 --> 00:02:01,920 which we typically call the posterior. 38 00:02:07,600 --> 00:02:13,860 And to do this we rely on the prior of X as well as on the 39 00:02:13,860 --> 00:02:16,740 model that we have for the observations. 40 00:02:16,740 --> 00:02:21,660 The denominator requires us to compute the marginal of Y. But 41 00:02:21,660 --> 00:02:25,350 this is something that is easily done because we have 42 00:02:25,350 --> 00:02:27,250 the joint available. 43 00:02:27,250 --> 00:02:31,190 The numerator, this expression here, is just the joint PMF. 44 00:02:31,190 --> 00:02:34,790 And using the joint PMF you can always find 45 00:02:34,790 --> 00:02:36,770 the marginal PMF. 46 00:02:36,770 --> 00:02:39,910 Essentially, we're using here the total probability theorem. 47 00:02:39,910 --> 00:02:43,160 And we're using the pieces of information that were given to 48 00:02:43,160 --> 00:02:47,600 us, the prior and the model of the observations. 49 00:02:47,600 --> 00:02:50,020 When we're dealing with continuous random variables 50 00:02:50,020 --> 00:02:51,690 the story is identical. 51 00:02:51,690 --> 00:02:54,880 We still have two versions of the multiplication rule. 52 00:02:54,880 --> 00:02:56,810 By sending one term-- 53 00:02:56,810 --> 00:02:57,570 this term-- 54 00:02:57,570 --> 00:03:00,270 to the other side of the equation we 55 00:03:00,270 --> 00:03:02,150 get the Bayes rule. 56 00:03:02,150 --> 00:03:05,070 And then we use the total probability theorem to 57 00:03:05,070 --> 00:03:07,600 calculate the denominator term. 58 00:03:07,600 --> 00:03:11,860 So as far as mathematics go, the story is pretty simple. 59 00:03:11,860 --> 00:03:14,500 It is exactly the same in the discrete and 60 00:03:14,500 --> 00:03:15,740 the continuous case. 61 00:03:15,740 --> 00:03:18,440 This story will be our stepping stone for dealing 62 00:03:18,440 --> 00:03:22,270 with more complex models and also when we go into more 63 00:03:22,270 --> 00:03:24,300 detail on the subject of inference 64 00:03:24,300 --> 00:03:25,550 later in this course.