1 00:00:03,280 --> 00:00:07,790 We will now use the Bayes rule in an important application 2 00:00:07,790 --> 00:00:10,650 that involves a discrete unknown random variable and a 3 00:00:10,650 --> 00:00:12,410 continuous measurement. 4 00:00:12,410 --> 00:00:16,260 Our discrete unknown random variable, K, will be one that 5 00:00:16,260 --> 00:00:21,390 takes the values plus or minus 1 with equal probability. 6 00:00:23,900 --> 00:00:30,230 And the measurement will be another random variable, Y, 7 00:00:30,230 --> 00:00:35,670 which is equal to the discrete random variable, but corrupted 8 00:00:35,670 --> 00:00:40,780 by additive noise that we denote by W. So what we get to 9 00:00:40,780 --> 00:00:45,320 observe is the sum of K and W. 10 00:00:45,320 --> 00:00:47,990 This is a common situation in digital communications. 11 00:00:47,990 --> 00:00:51,490 We're trying to send one bit of information, whether K is 12 00:00:51,490 --> 00:00:56,050 plus 1 or minus 1, but the observation that we're making 13 00:00:56,050 --> 00:00:59,620 is corrupted by a communication channel, by some 14 00:00:59,620 --> 00:01:03,500 noise that is present in the channel, and on the basis of 15 00:01:03,500 --> 00:01:06,810 the value of Y that we will observe, we will try to guess 16 00:01:06,810 --> 00:01:09,080 what was sent. 17 00:01:09,080 --> 00:01:11,600 The assumption that we will make about the noise is that 18 00:01:11,600 --> 00:01:15,800 it is a standard normal random variable. 19 00:01:15,800 --> 00:01:19,870 So suppose that we observed a specific value for the random 20 00:01:19,870 --> 00:01:23,840 variable Y. We want to make a guess about the random 21 00:01:23,840 --> 00:01:27,360 variable capital K. Of course, there's no way to guess with 22 00:01:27,360 --> 00:01:28,289 complete certainty.
23 00:01:28,289 --> 00:01:33,000 The only thing that we can say is to determine how likely it 24 00:01:33,000 --> 00:01:36,500 is that a 1 was sent as opposed to how likely it is 25 00:01:36,500 --> 00:01:38,780 that a minus 1 was sent. 26 00:01:38,780 --> 00:01:40,960 How do we approach such a problem? 27 00:01:40,960 --> 00:01:43,550 Well, we use the version of the Bayes rule that we have 28 00:01:43,550 --> 00:01:46,880 already developed, which is this formula that gives us the 29 00:01:46,880 --> 00:01:49,300 conditional probabilities that we want. 30 00:01:49,300 --> 00:01:52,530 And in particular, here, we're asking a question about the 31 00:01:52,530 --> 00:01:58,789 conditional probability that K takes the value of 1 given 32 00:01:58,789 --> 00:02:03,070 that a value of y has been observed. 33 00:02:03,070 --> 00:02:05,520 This is what we want to calculate. 34 00:02:05,520 --> 00:02:08,889 So let us look at the various terms involved here and see 35 00:02:08,889 --> 00:02:11,700 what each term is. 36 00:02:11,700 --> 00:02:13,590 First, we need the prior probability 37 00:02:13,590 --> 00:02:15,420 of K. This is simple. 38 00:02:15,420 --> 00:02:21,520 The prior probabilities are 1/2 for k equal to minus 1 or 39 00:02:21,520 --> 00:02:24,460 plus 1, because we said that the two possibilities are 40 00:02:24,460 --> 00:02:25,980 equally likely. 41 00:02:25,980 --> 00:02:30,880 Then we need the conditional density of Y given K. So what 42 00:02:30,880 --> 00:02:33,440 does this assumption mean? 43 00:02:33,440 --> 00:02:37,770 It means that Y is a standard normal random variable to 44 00:02:37,770 --> 00:02:43,820 which we add the value of K. So if K is equal to 1, we're 45 00:02:43,820 --> 00:02:49,020 taking a standard normal, and we add a value of plus 1. 46 00:02:49,020 --> 00:02:55,200 So Y, given that K is equal to plus 1, is going to be a 47 00:02:55,200 --> 00:02:57,420 standard normal plus 1. 48 00:02:57,420 --> 00:02:58,900 What does that do? 
49 00:02:58,900 --> 00:03:01,560 If we take a standard normal and add a constant to it, that 50 00:03:01,560 --> 00:03:06,660 changes the mean and makes the mean equal to 1, and does not 51 00:03:06,660 --> 00:03:08,980 change the variance. 52 00:03:08,980 --> 00:03:12,510 On the other hand, if K happens to be equal to minus 53 00:03:12,510 --> 00:03:17,570 1, then the observation that we see is going to be a 54 00:03:17,570 --> 00:03:22,060 standard normal plus a value of minus 1, and that changes 55 00:03:22,060 --> 00:03:26,640 the mean to become minus 1, but with a variance of 1. 56 00:03:26,640 --> 00:03:33,690 So if we are to plot the density of Y, that density, of 57 00:03:33,690 --> 00:03:38,350 course, will depend on what the value of K was. 58 00:03:38,350 --> 00:03:44,870 And if K is equal to 1, then we will obtain a normal that 59 00:03:44,870 --> 00:03:49,120 has a mean of 1, so it's centered here. 60 00:03:49,120 --> 00:03:56,490 But if K is equal to minus 1, then our observation will be a 61 00:03:56,490 --> 00:04:02,730 normal with unit variance, but centered at minus 1. 62 00:04:06,120 --> 00:04:11,060 So if we are to write this in terms of symbols, the 63 00:04:11,060 --> 00:04:16,950 distribution of Y is normal with variance equal to 1. 64 00:04:16,950 --> 00:04:26,530 So the PDF is given by this form, e to the minus 1/2 times y 65 00:04:26,530 --> 00:04:31,390 minus the mean of Y, squared. But given the value of K, the mean of Y 66 00:04:31,390 --> 00:04:38,409 is equal to k, plus or minus 1, depending on what k is. 67 00:04:38,409 --> 00:04:44,130 So this is the PDF of a normal with unit variance and mean 68 00:04:44,130 --> 00:04:45,800 equal to k. 69 00:04:45,800 --> 00:04:49,340 When you set k equal to 1, it 70 00:04:49,340 --> 00:04:50,990 corresponds to this graph. 71 00:04:50,990 --> 00:04:53,340 When you set k equal to minus 1, it 72 00:04:53,340 --> 00:04:56,940 corresponds to that graph.
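For reference, the conditional density described in words above can be written out as follows. The symbol f_{Y|K} is notation assumed here, since the transcript only points at the formula on the slide; the content is the unit-variance normal PDF centered at k.

```latex
f_{Y \mid K}(y \mid k) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}(y - k)^2},
\qquad k \in \{-1, +1\}.
```

Setting k = 1 gives the bell curve centered at 1, and setting k = -1 gives the one centered at -1.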
73 00:04:56,940 --> 00:05:00,590 Let us continue with the next term in this expression. 74 00:05:00,590 --> 00:05:04,400 We need the term in the denominator, which is obtained 75 00:05:04,400 --> 00:05:08,280 by taking a sum over the different choices of k. 76 00:05:08,280 --> 00:05:10,930 There are 2 choices, and each choice has a 77 00:05:10,930 --> 00:05:13,090 probability of 1/2. 78 00:05:13,090 --> 00:05:16,430 From the first choice, we have 1/2 times the density of Y 79 00:05:16,430 --> 00:05:19,325 when k is equal to minus 1. 80 00:05:24,450 --> 00:05:30,350 And when k is equal to minus 1, we obtain this expression. 81 00:05:30,350 --> 00:05:34,000 And we have another term that corresponds to the case where 82 00:05:34,000 --> 00:05:41,120 k is equal to plus one, in which case we have this 83 00:05:41,120 --> 00:05:44,190 expression here. 84 00:05:44,190 --> 00:05:48,159 Once more, this expression here corresponds to this 85 00:05:48,159 --> 00:05:50,640 normal with a mean of minus 1. 86 00:05:50,640 --> 00:05:53,640 This expression here corresponds to a normal with a 87 00:05:53,640 --> 00:05:56,830 mean of plus 1, which is this graph here. 88 00:05:56,830 --> 00:05:59,830 So at this point, we have in our hands expressions for 89 00:05:59,830 --> 00:06:03,750 everything that is involved here, and we can just apply 90 00:06:03,750 --> 00:06:10,270 the formula and carry out a fair amount of algebra. 91 00:06:10,270 --> 00:06:13,470 There are some very nice simplifications that happen 92 00:06:13,470 --> 00:06:17,590 along the way, and we end up with an answer that has the 93 00:06:17,590 --> 00:06:18,990 following form. 94 00:06:18,990 --> 00:06:28,320 It's 1 divided by 1 plus e to the minus 2 y. 95 00:06:28,320 --> 00:06:32,040 And this gives us the probability that a 1 was sent. 96 00:06:32,040 --> 00:06:35,490 Let us try to make sense of this expression. 
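The algebra that the lecture skips can be sketched as follows. The notation p_K and f_{Y|K} is assumed here for the prior and the conditional density; the steps only use the terms listed above (prior 1/2 for each value of k, and a unit-variance normal density centered at k).

```latex
\begin{align*}
P(K = 1 \mid Y = y)
  &= \frac{p_K(1)\, f_{Y \mid K}(y \mid 1)}
          {\sum_{k \in \{-1, 1\}} p_K(k)\, f_{Y \mid K}(y \mid k)} \\
  &= \frac{\tfrac{1}{2} \cdot \tfrac{1}{\sqrt{2\pi}}\, e^{-(y-1)^2/2}}
          {\tfrac{1}{2} \cdot \tfrac{1}{\sqrt{2\pi}}\, e^{-(y+1)^2/2}
           + \tfrac{1}{2} \cdot \tfrac{1}{\sqrt{2\pi}}\, e^{-(y-1)^2/2}} \\
  &= \frac{1}{1 + e^{\left[(y-1)^2 - (y+1)^2\right]/2}} \\
  &= \frac{1}{1 + e^{-2y}}.
\end{align*}
```

The common factors 1/2 and 1/sqrt(2 pi) cancel, and the last step uses (y-1)^2 - (y+1)^2 = -4y.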
97 00:06:35,490 --> 00:06:38,860 Let's see what it looks like by plotting it as 98 00:06:38,860 --> 00:06:40,110 a function of y. 99 00:06:43,040 --> 00:06:47,370 So what we're plotting here is this expression. 100 00:06:47,370 --> 00:06:52,730 OK, if y is very large, as y goes to plus infinity, this 101 00:06:52,730 --> 00:06:55,620 term disappears, and we obtain a 1. 102 00:06:59,080 --> 00:07:02,280 If, on the other hand, y is very, very negative-- 103 00:07:02,280 --> 00:07:04,890 so y goes to minus infinity-- 104 00:07:04,890 --> 00:07:08,410 here we get e to the infinity, which is a very 105 00:07:08,410 --> 00:07:09,620 large number. 106 00:07:09,620 --> 00:07:13,990 So this ratio is going to converge to 0. 107 00:07:13,990 --> 00:07:18,680 So we have a graph that starts at 0. 108 00:07:18,680 --> 00:07:22,440 It actually rises monotonically, and in the 109 00:07:22,440 --> 00:07:25,560 limit, converges to 1. 110 00:07:25,560 --> 00:07:30,460 If y is equal to 0, then this term is 1, 111 00:07:30,460 --> 00:07:32,130 and we obtain a 1/2. 112 00:07:34,930 --> 00:07:37,730 Let us interpret this plot. 113 00:07:37,730 --> 00:07:44,860 If y is very large, it is much more likely that y is coming 114 00:07:44,860 --> 00:07:50,170 out of this distribution so that K is equal to 1. 115 00:07:50,170 --> 00:07:53,530 So the probability that K is equal to 1, if we obtain this 116 00:07:53,530 --> 00:07:55,940 observation, is almost 1. 117 00:07:55,940 --> 00:07:57,800 We have almost certainty. 118 00:07:57,800 --> 00:08:01,040 If, on the other hand, y is very, very negative, then it 119 00:08:01,040 --> 00:08:04,390 is much more likely that what we're seeing is coming from 120 00:08:04,390 --> 00:08:08,480 this distribution so that K is equal to minus 1. 121 00:08:08,480 --> 00:08:12,370 And in that case, the probability that K was 1 is 122 00:08:12,370 --> 00:08:14,370 going to be approximately 0.
123 00:08:14,370 --> 00:08:19,790 Finally, if y is 0, then we're just in the middle of the two 124 00:08:19,790 --> 00:08:23,670 possibilities, and by symmetry, either choice of K 125 00:08:23,670 --> 00:08:25,400 is equally likely. 126 00:08:25,400 --> 00:08:28,920 Therefore, the posterior probability that K is equal to 127 00:08:28,920 --> 00:08:32,880 1, given that Y was equal to 0-- 128 00:08:32,880 --> 00:08:34,820 that probability is 1/2. 129 00:08:34,820 --> 00:08:38,650 When Y is equal to 0, it's equally likely that either 130 00:08:38,650 --> 00:08:41,750 signal was sent. 131 00:08:41,750 --> 00:08:45,590 This example is a prototype of the kind of calculations that 132 00:08:45,590 --> 00:08:48,940 are done in the analysis of communication systems. 133 00:08:48,940 --> 00:08:52,570 This is the simplest model of communication of a single bit 134 00:08:52,570 --> 00:08:56,230 in the presence of additive noise, but of course, there 135 00:08:56,230 --> 00:08:59,510 can also be more complicated models in which we have more 136 00:08:59,510 --> 00:09:02,750 complicated signals that are sent, and more complicated 137 00:09:02,750 --> 00:09:04,300 models of the noise. 138 00:09:04,300 --> 00:09:07,260 But the general principles of the analysis are 139 00:09:07,260 --> 00:09:08,700 always of this kind. 140 00:09:08,700 --> 00:09:11,710 We're using the Bayes rule, and we need to write down the 141 00:09:11,710 --> 00:09:13,170 different terms that are involved.
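The calculation in this example can be checked numerically. The following is a minimal sketch, not part of the lecture: the function names (`normal_pdf`, `posterior_bayes`, `posterior_closed_form`, `simulated_posterior`) and the simulation parameters are assumptions made for illustration. It computes the posterior term by term from the Bayes rule, compares it with the simplified form 1 / (1 + e^(-2y)), and estimates the same probability by simulating Y = K + W.

```python
import math
import random


def normal_pdf(y, mean):
    """Density of a unit-variance normal with the given mean, evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def posterior_bayes(y):
    """P(K = 1 | Y = y), assembled term by term from the Bayes rule.

    Intended for moderate |y|; for extreme y both densities underflow to 0.
    """
    numerator = 0.5 * normal_pdf(y, 1.0)
    denominator = 0.5 * normal_pdf(y, -1.0) + 0.5 * normal_pdf(y, 1.0)
    return numerator / denominator


def posterior_closed_form(y):
    """The simplified answer from the lecture: 1 / (1 + e^(-2y))."""
    return 1.0 / (1.0 + math.exp(-2.0 * y))


def simulated_posterior(y_target, half_width=0.05, trials=200_000, seed=0):
    """Estimate P(K = 1 | Y near y_target) by simulating Y = K + W.

    half_width, trials, and seed are illustrative choices, not lecture values.
    """
    rng = random.Random(seed)
    hits = 0
    plus_ones = 0
    for _ in range(trials):
        k = rng.choice((-1, 1))      # K is +1 or -1 with equal probability
        y = k + rng.gauss(0.0, 1.0)  # W is standard normal noise
        if abs(y - y_target) <= half_width:
            hits += 1
            if k == 1:
                plus_ones += 1
    return plus_ones / hits


if __name__ == "__main__":
    for y in (-3.0, -1.0, 0.0, 0.5, 1.0, 3.0):
        print(y, posterior_bayes(y), posterior_closed_form(y))
    print("simulated at y = 0.5:", simulated_posterior(0.5))
```

The simulation keeps only the trials whose observation lands in a narrow window around the target value, so its estimate is approximate and depends on the window width and the number of trials.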