1 00:00:00,930 --> 00:00:03,580 In the context of the problem of estimating 2 00:00:03,580 --> 00:00:09,000 the unknown bias of a coin, we encountered this distribution, 3 00:00:09,000 --> 00:00:12,500 which is known as a Beta distribution. 4 00:00:12,500 --> 00:00:16,750 It's a probability density for a random variable, Theta, 5 00:00:16,750 --> 00:00:21,650 that takes values in the interval from 0 to 1. 6 00:00:21,650 --> 00:00:24,950 So this formula is valid for thetas in this range. 7 00:00:24,950 --> 00:00:28,700 And k here is a non-negative integer. 8 00:00:31,980 --> 00:00:35,605 Now, this coefficient here, d(n,k), 9 00:00:35,605 --> 00:00:39,740 is a normalizing constant, which is needed so that this is 10 00:00:39,740 --> 00:00:43,560 a legitimate PDF, that it integrates to 1. 11 00:00:43,560 --> 00:00:47,060 And so in particular, d needs to be 12 00:00:47,060 --> 00:00:50,440 equal to the integral of what we have in the numerator. 13 00:00:50,440 --> 00:00:53,540 This is the choice that makes this whole expression integrate 14 00:00:53,540 --> 00:00:54,550 to 1. 15 00:00:54,550 --> 00:00:57,790 And this integral is calculated and can 16 00:00:57,790 --> 00:01:02,110 be found to be equal to this particular expression. 17 00:01:02,110 --> 00:01:05,349 How do we derive this expression? 18 00:01:05,349 --> 00:01:09,210 One way is to carry out a long exercise in calculus. 19 00:01:09,210 --> 00:01:10,830 We have this integral here. 20 00:01:10,830 --> 00:01:14,500 You might either expand it and then integrate and collect 21 00:01:14,500 --> 00:01:18,800 terms, or you could try to demonstrate this equality 22 00:01:18,800 --> 00:01:22,310 by applying integration by parts. 23 00:01:22,310 --> 00:01:23,940 But this is complicated. 24 00:01:23,940 --> 00:01:27,020 Is there some simple way of arguing and deriving 25 00:01:27,020 --> 00:01:28,200 this expression? 26 00:01:28,200 --> 00:01:31,370 We will see that there is a very simple probabilistic argument 27 00:01:31,370 --> 00:01:33,670 for deriving this equality. 28 00:01:33,670 --> 00:01:36,680 What we will actually derive is this same equality, 29 00:01:36,680 --> 00:01:39,190 but in slightly different notation. 30 00:01:39,190 --> 00:01:41,330 Instead of k, we will use alpha. 31 00:01:41,330 --> 00:01:45,200 Instead of n minus k, we will use beta. 32 00:01:45,200 --> 00:01:49,070 So here we have alpha factorial times beta factorial. 33 00:01:49,070 --> 00:01:51,130 In the denominator, we have the sum 34 00:01:51,130 --> 00:01:54,360 of these two coefficients plus 1, 35 00:01:54,360 --> 00:01:59,240 so this corresponds to alpha plus beta plus 1 factorial. 36 00:01:59,240 --> 00:02:02,030 This is what we want to demonstrate. 37 00:02:02,030 --> 00:02:08,350 What we will do will be to consider the following setting. 38 00:02:08,350 --> 00:02:11,900 We start with alpha plus beta plus 1, 39 00:02:11,900 --> 00:02:15,150 that many independent random variables that 40 00:02:15,150 --> 00:02:17,940 are uniform on the unit interval, 41 00:02:17,940 --> 00:02:21,490 and we will consider the following event 42 00:02:21,490 --> 00:02:23,510 and its probability. 43 00:02:23,510 --> 00:02:26,630 This is the probability that these random variables 44 00:02:26,630 --> 00:02:32,680 happen to be ordered in some particular order. 45 00:02:32,680 --> 00:02:38,750 Let us call this event A, so this is the probability of A. 46 00:02:38,750 --> 00:02:41,410 Now, this probability is not hard to calculate. 47 00:02:41,410 --> 00:02:45,900 We have alpha plus beta plus 1 random variables-- independent, 48 00:02:45,900 --> 00:02:47,620 identically distributed. 49 00:02:47,620 --> 00:02:51,250 By symmetry, any particular way of ordering 50 00:02:51,250 --> 00:02:54,590 these random variables is equally likely. 51 00:02:54,590 --> 00:02:58,030 How many ways are there to order alpha plus beta 52 00:02:58,030 --> 00:03:00,390 plus 1 random variables? 53 00:03:00,390 --> 00:03:03,440 It's the factorial of the number of items 54 00:03:03,440 --> 00:03:05,090 that we're trying to order. 55 00:03:05,090 --> 00:03:06,570 We're talking about the probability 56 00:03:06,570 --> 00:03:09,860 of a particular permutation, so this probability 57 00:03:09,860 --> 00:03:15,190 is equal to 1 over the number of permutations 58 00:03:15,190 --> 00:03:20,320 of alpha plus beta plus 1 objects. 59 00:03:20,320 --> 00:03:23,280 So this is one expression for the probability of this event 60 00:03:23,280 --> 00:03:24,280 A. 61 00:03:24,280 --> 00:03:27,850 Now, we're going to calculate this probability 62 00:03:27,850 --> 00:03:29,670 in a different way. 63 00:03:29,670 --> 00:03:34,829 What we will do is we're going to apply the total probability 64 00:03:34,829 --> 00:03:36,010 theorem. 65 00:03:36,010 --> 00:03:39,250 We're going to condition on Z. We're 66 00:03:39,250 --> 00:03:43,770 going to calculate the conditional probability of A 67 00:03:43,770 --> 00:03:48,020 given that Z takes a specific value, 68 00:03:48,020 --> 00:03:52,570 and then weigh those probabilities according 69 00:03:52,570 --> 00:03:59,380 to the probability density of the random variable Z. 70 00:03:59,380 --> 00:04:02,300 So this is just the total probability theorem 71 00:04:02,300 --> 00:04:05,300 applied to this particular context. 72 00:04:05,300 --> 00:04:09,150 And now to make progress, what we will need to do 73 00:04:09,150 --> 00:04:11,610 is to find this conditional probability. 74 00:04:16,040 --> 00:04:22,810 We fix a constant little theta, and we want the probability 75 00:04:22,810 --> 00:04:25,540 that this event happens. 76 00:04:25,540 --> 00:04:26,465 What is this event? 77 00:04:32,390 --> 00:04:42,620 It is the event that all of the X's fall inside this interval, 78 00:04:42,620 --> 00:04:46,770 all the Y's fall inside this interval, 79 00:04:46,770 --> 00:04:52,650 and furthermore, the X's are sorted and the Y's are sorted. 80 00:04:52,650 --> 00:04:55,020 So let us write this out. 81 00:04:55,020 --> 00:05:01,180 It's the probability that all of the X's happen 82 00:05:01,180 --> 00:05:06,990 to be less than theta, all the Y's 83 00:05:06,990 --> 00:05:14,880 happen to be bigger than theta, and also, not just that, 84 00:05:14,880 --> 00:05:25,610 but the X's are sorted, and furthermore, the Y's 85 00:05:25,610 --> 00:05:26,990 are sorted as well. 86 00:05:34,790 --> 00:05:37,700 Clearly, if I give you the value of theta 87 00:05:37,700 --> 00:05:41,870 so that Z is equal to theta, for this event to happen, 88 00:05:41,870 --> 00:05:46,370 we must have all these events here happen as well. 89 00:05:46,370 --> 00:05:51,180 So now, let us try to calculate the probability of this event. 90 00:05:51,180 --> 00:05:53,090 We're going to use the multiplication rule. 91 00:05:53,090 --> 00:05:55,890 First, take the probability of this event 92 00:05:55,890 --> 00:05:58,485 and then the conditional probability of that event. 93 00:06:01,460 --> 00:06:04,150 The X's and the Y's are independent, 94 00:06:04,150 --> 00:06:08,215 so we can take the probability of this event and then multiply 95 00:06:08,215 --> 00:06:11,480 with the probability of this event involving the Y's. 96 00:06:11,480 --> 00:06:13,440 How about the probability of this event, 97 00:06:13,440 --> 00:06:16,025 that all of the X's are less than theta? 98 00:06:16,025 --> 00:06:18,990 Since the X's are independent, this 99 00:06:18,990 --> 00:06:21,400 is going to be equal to the probability 100 00:06:21,400 --> 00:06:24,870 that X1 is less than theta. 101 00:06:24,870 --> 00:06:26,440 What is this probability? 102 00:06:26,440 --> 00:06:30,040 Since X1 is uniform on the unit interval and this is theta, 103 00:06:30,040 --> 00:06:32,040 the probability of falling in this interval 104 00:06:32,040 --> 00:06:34,430 is equal to theta. 105 00:06:34,430 --> 00:06:39,070 Times the probability that X2 is less than theta. 106 00:06:39,070 --> 00:06:42,670 This probability is, again, theta and so on. 107 00:06:42,670 --> 00:06:47,310 We have alpha many terms of that kind, 108 00:06:47,310 --> 00:06:50,510 so this probability that all of these random variables 109 00:06:50,510 --> 00:06:55,100 are less theta is equal to theta to the power of alpha. 110 00:06:55,100 --> 00:06:57,380 Similarly, about the Y's. 111 00:06:57,380 --> 00:07:00,300 For any particular Y, the probability 112 00:07:00,300 --> 00:07:02,970 that it falls in this interval is 113 00:07:02,970 --> 00:07:05,540 equal to the length of this interval, which 114 00:07:05,540 --> 00:07:08,220 is 1 minus theta. 115 00:07:08,220 --> 00:07:11,830 This is the probability for each one of the Y's. 116 00:07:11,830 --> 00:07:15,750 There's beta many Y's. 117 00:07:15,750 --> 00:07:17,370 The Y's are independent. 118 00:07:17,370 --> 00:07:20,080 So the probability that all of them fall in this interval 119 00:07:20,080 --> 00:07:24,770 is going to be this number to the power of beta. 120 00:07:28,620 --> 00:07:32,690 So suppose that I told you that all the X's are 121 00:07:32,690 --> 00:07:37,330 less than theta, and then I ask you, 122 00:07:37,330 --> 00:07:41,650 given this information, what is the probability 123 00:07:41,650 --> 00:07:45,060 that the X's that you got are arranged 124 00:07:45,060 --> 00:07:46,360 in this particular order? 125 00:07:51,280 --> 00:07:55,630 Now, because of the complete symmetry of the problem, 126 00:07:55,630 --> 00:07:59,965 even if I told you that all the X's fall inside this interval, 127 00:07:59,965 --> 00:08:04,690 any order of the X's is equally likely. 128 00:08:04,690 --> 00:08:07,860 So the probability of this particular order 129 00:08:07,860 --> 00:08:12,880 is going to be 1 over the number of possible orderings. 130 00:08:12,880 --> 00:08:17,140 How many ways are there that alpha items can be ordered? 131 00:08:17,140 --> 00:08:21,240 There are alpha factorial possible orderings, 132 00:08:21,240 --> 00:08:25,410 so the probability that I obtain one particular ordering 133 00:08:25,410 --> 00:08:29,010 is 1 over alpha factorial. 134 00:08:29,010 --> 00:08:31,850 And similarly, if I tell you that the Y's all 135 00:08:31,850 --> 00:08:34,570 fell in this interval by symmetry, 136 00:08:34,570 --> 00:08:36,620 the probability of a particular order 137 00:08:36,620 --> 00:08:40,830 is going to be 1 over the [number of possible] 138 00:08:40,830 --> 00:08:43,465 orders, which is beta factorial. 139 00:08:46,240 --> 00:08:46,930 All right. 140 00:08:46,930 --> 00:08:49,990 So we have this conditional probability, 141 00:08:49,990 --> 00:08:54,830 and now we can go back to this formula and substitute, 142 00:08:54,830 --> 00:09:00,290 and what we obtain is the integral of this expression, 143 00:09:00,290 --> 00:09:05,660 theta to the alpha, 1 minus theta [to the] beta, 1 144 00:09:05,660 --> 00:09:11,840 over alpha factorial times 1 over beta factorial. 145 00:09:11,840 --> 00:09:15,520 Then we have the density of Z, but since Z is uniform, 146 00:09:15,520 --> 00:09:18,010 the density is equal to 1. 147 00:09:18,010 --> 00:09:21,720 And then we have a factor of d theta. 148 00:09:21,720 --> 00:09:23,730 So what have we achieved? 149 00:09:23,730 --> 00:09:26,855 We calculated the probability of the event A 150 00:09:26,855 --> 00:09:31,520 in two different ways, and we got two seemingly different 151 00:09:31,520 --> 00:09:32,490 answers. 152 00:09:32,490 --> 00:09:35,420 But these two answers have to agree. 153 00:09:35,420 --> 00:09:39,450 Therefore, this expression is equal to that expression. 154 00:09:39,450 --> 00:09:43,930 And now if you take this factor, 1 over alpha factorial times 1 155 00:09:43,930 --> 00:09:47,170 over beta factorial, and send it to the other side 156 00:09:47,170 --> 00:09:50,920 of the equation, what we obtain is exactly the formula 157 00:09:50,920 --> 00:09:53,420 that we wished to derive. 158 00:09:53,420 --> 00:09:56,530 This example is an instance of the following. 159 00:09:56,530 --> 00:09:58,840 There are certain formulas in mathematics 160 00:09:58,840 --> 00:10:01,870 that are somewhat complicated to derive, 161 00:10:01,870 --> 00:10:04,270 and their derivations using, for example, 162 00:10:04,270 --> 00:10:06,570 calculus are quite unintuitive. 163 00:10:06,570 --> 00:10:10,090 But once you interpret the various terms that 164 00:10:10,090 --> 00:10:14,280 appear in such a relation in a probabilistic way, 165 00:10:14,280 --> 00:10:19,660 you can sometimes find very easy derivations and explanations 166 00:10:19,660 --> 00:10:22,978 why such a formula has to be true.