1 00:00:01,980 --> 00:00:04,540 We have seen two versions of the Bayes rule-- 2 00:00:04,540 --> 00:00:08,250 one involving two discrete random variables, and another 3 00:00:08,250 --> 00:00:11,660 that involves two continuous random variables. 4 00:00:11,660 --> 00:00:14,700 But there are many situations in real life when one has to 5 00:00:14,700 --> 00:00:17,310 deal simultaneously with discrete and 6 00:00:17,310 --> 00:00:19,160 continuous random variables. 7 00:00:19,160 --> 00:00:23,170 For example, you may want to recover a discrete digital 8 00:00:23,170 --> 00:00:26,560 signal that was sent to you, but the signal has been 9 00:00:26,560 --> 00:00:30,220 corrupted by continuous noise so that your observation is a 10 00:00:30,220 --> 00:00:32,910 continuous random variable. 11 00:00:32,910 --> 00:00:36,800 So suppose that we have a discrete random variable K, 12 00:00:36,800 --> 00:00:41,320 and another continuous random variable, Y. In order to get a 13 00:00:41,320 --> 00:00:44,730 variant of the Bayes rule that applies to this situation, we 14 00:00:44,730 --> 00:00:48,460 will proceed as in the more standard cases. 15 00:00:48,460 --> 00:00:52,460 We will use the multiplication rule twice to get two 16 00:00:52,460 --> 00:00:54,880 alternative expressions for the probability 17 00:00:54,880 --> 00:00:57,120 of two events happening. 18 00:00:57,120 --> 00:00:59,740 We will equate those expressions, and from these, 19 00:00:59,740 --> 00:01:02,570 derive a version of the Bayes rule. 20 00:01:02,570 --> 00:01:07,250 So we will look at the probability that the discrete 21 00:01:07,250 --> 00:01:12,700 random variable takes on a certain numerical value, and, 22 00:01:12,700 --> 00:01:17,120 simultaneously, the continuous random variable takes a value 23 00:01:17,120 --> 00:01:20,530 inside a certain small interval. 24 00:01:20,530 --> 00:01:24,450 So here, delta is a positive number, which we will take to 25 00:01:24,450 --> 00:01:25,830 be very small. 26 00:01:25,830 --> 00:01:28,730 And in fact, we will be interested in the limiting 27 00:01:28,730 --> 00:01:31,020 case as delta goes to 0. 28 00:01:31,020 --> 00:01:33,060 So now we use the multiplication rule. 29 00:01:33,060 --> 00:01:37,380 The probability of two events is equal to the probability of 30 00:01:37,380 --> 00:01:43,350 the first event times the conditional probability of the 31 00:01:43,350 --> 00:01:51,759 second event given that the first event has occurred. 32 00:01:54,300 --> 00:01:58,030 But we know that we can use the multiplication rule in any 33 00:01:58,030 --> 00:02:01,390 order, so the probability of two events happening can also 34 00:02:01,390 --> 00:02:09,770 be written as the probability that the second event occurs 35 00:02:09,770 --> 00:02:13,400 times the conditional probability that the first 36 00:02:13,400 --> 00:02:19,050 event occurs, given that the second event has occurred. 37 00:02:23,460 --> 00:02:26,100 So these two expressions that we obtain from the 38 00:02:26,100 --> 00:02:28,540 multiplication rule have to be equal. 39 00:02:28,540 --> 00:02:32,540 Let us rewrite those expressions using PMF notation 40 00:02:32,540 --> 00:02:34,630 and PDF notation. 41 00:02:34,630 --> 00:02:36,300 What do we have here? 42 00:02:36,300 --> 00:02:38,690 The probability that a discrete random variable takes 43 00:02:38,690 --> 00:02:40,000 on a certain value-- 44 00:02:40,000 --> 00:02:44,910 that's just the PMF of this random variable evaluated at a 45 00:02:44,910 --> 00:02:46,520 particular point. 46 00:02:46,520 --> 00:02:47,850 And what do we have here? 47 00:02:47,850 --> 00:02:50,440 The probability that the random variable, Y, a 48 00:02:50,440 --> 00:02:54,370 continuous random variable, takes values inside an 49 00:02:54,370 --> 00:03:00,260 interval is always equal to the PDF of that random 50 00:03:00,260 --> 00:03:04,420 variable times the length of this interval. 51 00:03:04,420 --> 00:03:07,420 And this is an approximate equality. 52 00:03:07,420 --> 00:03:10,580 However, because here we're talking about the probability 53 00:03:10,580 --> 00:03:13,590 of being in a small interval conditioned on a certain 54 00:03:13,590 --> 00:03:18,870 event, we should be using a conditional PDF. 55 00:03:18,870 --> 00:03:21,660 It's the conditional PDF conditioned on the random 56 00:03:21,660 --> 00:03:25,360 variable, capital K, and conditioned on the specific 57 00:03:25,360 --> 00:03:28,990 event that this discrete random variable takes on a 58 00:03:28,990 --> 00:03:32,920 certain value, little k. 59 00:03:32,920 --> 00:03:36,860 Let us do a similar notation change for the second 60 00:03:36,860 --> 00:03:37,930 expression. 61 00:03:37,930 --> 00:03:39,380 Here we have the probability-- 62 00:03:39,380 --> 00:03:41,210 the unconditional probability-- 63 00:03:41,210 --> 00:03:45,340 that Y takes a value inside a small interval, and when delta 64 00:03:45,340 --> 00:03:48,460 is small, this is approximately equal to the PDF 65 00:03:48,460 --> 00:03:52,329 of the random variable Y times the length of the interval. 66 00:03:52,329 --> 00:03:53,490 And what do we have here? 67 00:03:53,490 --> 00:03:56,590 The probability that a discrete random variable takes 68 00:03:56,590 --> 00:04:02,180 on a certain value, that just corresponds to the PMF of that 69 00:04:02,180 --> 00:04:04,500 the random variable. 70 00:04:04,500 --> 00:04:09,840 However, we're talking about a conditional probability given 71 00:04:09,840 --> 00:04:14,770 that a random variable Y takes a value that's approximately 72 00:04:14,770 --> 00:04:18,680 equal to a certain little y. 73 00:04:18,680 --> 00:04:22,860 So this is a notation that we have not used before, but its 74 00:04:22,860 --> 00:04:26,130 meaning should be unambiguous at this point. 75 00:04:26,130 --> 00:04:29,710 But just by arguing by analogy to what we have been doing all 76 00:04:29,710 --> 00:04:35,200 along, it's a PMF of a discrete random variable. 77 00:04:35,200 --> 00:04:37,560 But it is a conditional PMF. 78 00:04:37,560 --> 00:04:40,500 It describes to us the probability distribution of 79 00:04:40,500 --> 00:04:45,780 the discrete random variable K when the random variable Y, 80 00:04:45,780 --> 00:04:48,050 which happens to be a continuous one, takes on a 81 00:04:48,050 --> 00:04:50,550 specific value. 82 00:04:50,550 --> 00:04:55,450 So we can cancel the deltas from both sides, and we have 83 00:04:55,450 --> 00:04:57,950 that this expression is approximately equal to that 84 00:04:57,950 --> 00:05:00,700 expression, and this approximate equality is more 85 00:05:00,700 --> 00:05:04,230 and more exact as we send delta to 0. 86 00:05:04,230 --> 00:05:07,600 But delta has already disappeared from here, so we 87 00:05:07,600 --> 00:05:11,250 can set these two expressions equal to each other. 88 00:05:11,250 --> 00:05:16,280 At this point, now, we can take this term and move it to 89 00:05:16,280 --> 00:05:19,220 the other side of the equality so it will go to the 90 00:05:19,220 --> 00:05:20,780 denominator. 91 00:05:20,780 --> 00:05:26,010 And we obtain this version of the Bayes rule. 92 00:05:26,010 --> 00:05:29,540 It gives us the conditional probability of a random 93 00:05:29,540 --> 00:05:33,860 variable K given that a certain continuous random 94 00:05:33,860 --> 00:05:37,909 variable Y has taken on a specific value. 95 00:05:37,909 --> 00:05:43,659 So this version is useful if we have a continuous noisy 96 00:05:43,659 --> 00:05:48,820 observation, Y, on the basis of which we're trying to say 97 00:05:48,820 --> 00:05:52,070 something, to make inferences about the discrete random 98 00:05:52,070 --> 00:05:56,370 variable K. And in order to apply the Bayes rule, we need 99 00:05:56,370 --> 00:06:01,370 to know the unconditional distribution of the random 100 00:06:01,370 --> 00:06:05,770 variable K, and we also need to have a model of the noisy 101 00:06:05,770 --> 00:06:07,610 observation-- 102 00:06:07,610 --> 00:06:10,900 a model of that observation under each possible 103 00:06:10,900 --> 00:06:12,570 conditional universe. 104 00:06:12,570 --> 00:06:16,940 So for any possibility for the random variable K, we need to 105 00:06:16,940 --> 00:06:21,030 know the distribution of the random variable Y. 106 00:06:21,030 --> 00:06:26,150 Or, alternatively, we can take this term and send it to the 107 00:06:26,150 --> 00:06:29,870 denominator of the other side, and we get a different version 108 00:06:29,870 --> 00:06:31,460 of the Bayes rule. 109 00:06:31,460 --> 00:06:34,210 This version of the Bayes rule applies if we're trying to 110 00:06:34,210 --> 00:06:38,500 make an inference about a continuous random variable Y, 111 00:06:38,500 --> 00:06:42,140 given that we know the value of a certain related 112 00:06:42,140 --> 00:06:49,510 observation, K, of a random variable, capital K. 113 00:06:49,510 --> 00:06:52,300 In both versions of the Bayes rule, there's also a 114 00:06:52,300 --> 00:06:56,240 denominator term which needs to be evaluated. 115 00:06:56,240 --> 00:06:59,280 This term gets evaluated similar to the cases that we 116 00:06:59,280 --> 00:07:03,210 have considered earlier, and they are determined by using a 117 00:07:03,210 --> 00:07:06,800 suitable version of the total probability theorem. 118 00:07:06,800 --> 00:07:09,440 This is a version of the total probability theory that we 119 00:07:09,440 --> 00:07:10,970 have already seen. 120 00:07:10,970 --> 00:07:15,800 We have a conditional density of Y under different scenarios 121 00:07:15,800 --> 00:07:18,900 for the random variable capital K, and we get the 122 00:07:18,900 --> 00:07:22,770 density of Y by considering the conditional densities and 123 00:07:22,770 --> 00:07:26,080 weighing them according to the probabilities of the different 124 00:07:26,080 --> 00:07:29,340 discrete scenarios. 125 00:07:29,340 --> 00:07:32,920 This version of the total probability theorem is 126 00:07:32,920 --> 00:07:37,450 something that we have not proved so far, and we 127 00:07:37,450 --> 00:07:38,990 have not seen it. 128 00:07:38,990 --> 00:07:41,950 On the other hand, it's not hard to derive. 129 00:07:44,920 --> 00:07:53,110 If we fix the value of k, this is a density, and therefore it 130 00:07:53,110 --> 00:07:55,200 must integrate to 1. 131 00:07:55,200 --> 00:07:59,490 So the integral of this ratio, with respect to y, has to be 132 00:07:59,490 --> 00:08:00,850 equal to 1. 133 00:08:00,850 --> 00:08:04,990 Now, there's no y in the denominator, so the integral 134 00:08:04,990 --> 00:08:08,370 of the numerator divided by the denominator has to be 135 00:08:08,370 --> 00:08:12,580 equal to 1, which means that the denominator must be equal 136 00:08:12,580 --> 00:08:16,180 to the integral of the numerator when we integrate 137 00:08:16,180 --> 00:08:19,320 overall y's, and this is just what this 138 00:08:19,320 --> 00:08:21,500 expression is saying. 139 00:08:21,500 --> 00:08:26,130 So what we will do next will be to consider one example for 140 00:08:26,130 --> 00:08:31,030 each one of these two cases of the Bayes rule that we have 141 00:08:31,030 --> 00:08:32,280 just derived.