We will now work with a geometric random variable and put to use our understanding of conditional PMFs and conditional expectations.

Remember that a geometric random variable corresponds to the number of independent coin tosses until the first head occurs. Here, p is a parameter that describes the coin: it is the probability of heads at each coin toss. We have already seen the formula for the geometric PMF and the corresponding plot. We will now add one very important property, which is usually called memorylessness. Ultimately, this property has to do with the fact that independent coin tosses do not have any memory. Past coin tosses do not affect future coin tosses.

So consider a coin-tossing experiment with independent tosses, and let X be the number of tosses until the first heads, so that X is geometric with parameter p. Suppose that you show up a little after the experiment has started, and you're told that there was so far just one coin toss, and that this coin toss resulted in tails. Now you have to take over and carry out the remaining tosses until heads are observed. What should your model be about the future? Well, you will be making independent coin tosses until the first heads, so the number of such tosses will be a random variable which is geometric with parameter p. So this duration, as far as you are concerned, is geometric with parameter p.

Therefore, the number of remaining coin tosses starting from here, given that the first toss was tails, has the same geometric distribution as the original random variable X. This is the memorylessness property.

Now, since X is the total number of coin tosses, and since your coin tosses were all of them except for the first one, the random variable that you are concerned with is X minus 1. And so the geometric distribution that you are seeing here is the conditional distribution of X minus 1, given that the first toss resulted in tails, which is the same as the event that X is strictly larger than 1.
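As a quick illustration (not part of the lecture itself), here is a minimal Python simulation sketch of this property. The value p = 0.3, the sample size, and the function name geometric_sample are illustrative choices, not anything from the lecture. Among the trials in which X is larger than 1, the empirical distribution of X minus 1 should match the unconditional empirical distribution of X.

```python
import random

random.seed(0)

def geometric_sample(p):
    """Number of independent coin tosses until the first head (values 1, 2, ...)."""
    tosses = 1
    while random.random() >= p:  # this toss was tails, with probability 1 - p
        tosses += 1
    return tosses

p = 0.3  # illustrative choice
samples = [geometric_sample(p) for _ in range(200_000)]

# Memorylessness check: P(X - 1 = k | X > 1) should match P(X = k).
conditional = [x - 1 for x in samples if x > 1]
for k in range(1, 6):
    uncond = sum(x == k for x in samples) / len(samples)
    cond = sum(y == k for y in conditional) / len(conditional)
    print(f"k={k}: P(X=k) ~ {uncond:.4f}, P(X-1=k | X>1) ~ {cond:.4f}")
```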
So the statement that we have been making is the following, in more mathematical language: conditioned on X being larger than 1, the random variable X minus 1, which is the remaining number of coin tosses, has a geometric distribution with parameter p.

Let us now give a more precise mathematical argument, but first for a special case. Let us look at the conditional probabilities for the random variable X minus 1 and calculate, for example, the conditional probability that X minus 1 is equal to 3, given that X is larger than 1, which is the same as saying that the first toss resulted in tails.

So, given that the first toss resulted in tails, this is the probability that you will need three more tosses until you observe heads. Needing three more tosses until you observe heads is the event that you had tails in the second toss, tails in the third toss, and heads in the fourth toss. And all of that is conditioned on the first toss having resulted in tails. However, the different coin tosses are independent, so the conditional probabilities, given the event that the first toss was tails, should be the same as the unconditional probabilities. The first toss does not change our beliefs about the probabilities associated with the remaining tosses.

Now, this unconditional probability is easy to calculate. It is (1 minus p) squared, because we have two tails in a row, times p. And we observe that this quantity is exactly the probability that a geometric random variable takes the value of 3.

So what have we calculated here? We have calculated the PMF of the random variable X minus 1, in a conditional universe where X is larger than 1, evaluated at the value 3: the probability that our random variable X minus 1 takes the value of 3. So what we have shown is that this conditional PMF is the same as the unconditional PMF. Now, there is nothing special about the number 3.
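This special case can also be checked numerically, using only the geometric PMF formula p_X(k) = (1 - p)^(k - 1) p from earlier in the lecture and the definition of conditional probability; the value p = 0.3 is again an arbitrary choice for illustration.

```python
p = 0.3

def geometric_pmf(k, p):
    """p_X(k) = (1 - p)**(k - 1) * p, for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

# By the definition of conditional probability:
# P(X - 1 = 3 | X > 1) = P(X = 4) / P(X > 1), and P(X > 1) = 1 - p.
lhs = geometric_pmf(4, p) / (1 - p)

# The lecture's answer: (1 - p)^2 * p, i.e. the unconditional p_X(3).
rhs = geometric_pmf(3, p)

print(round(lhs, 6), round(rhs, 6))  # both 0.147
assert abs(lhs - rhs) < 1e-12
```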
You can generalize this argument and establish that the conditional probability that X minus 1 equals k, given that X is strictly larger than 1, is, for any particular k, the same as the corresponding probability for the random variable X, which is given by the geometric PMF.

Finally, there is nothing special about the value of 1 that we're using here. In fact, we can generalize and argue as follows. Suppose that I tell you that X is strictly larger than n, that is, that the first n tosses resulted in tails. Once more, these past tosses were wasted but have no effect on the future. So the conditional PMF of the remaining number of tosses should be, again, the same. Therefore, the statement we're making is that this geometric PMF will also be the PMF of X minus n, given that X is strictly larger than n, and this will be true no matter what argument we plug in to the PMF.
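Again as a side note, this general statement can be verified directly from the PMF. The sketch below uses the fact that P(X > n) = (1 - p)^n, since X exceeds n exactly when the first n tosses are all tails; the helper name shifted_pmf_given_survival is made up for illustration.

```python
p = 0.3

def geometric_pmf(k, p):
    """p_X(k) = (1 - p)**(k - 1) * p, for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

def shifted_pmf_given_survival(k, n, p):
    """P(X - n = k | X > n) = P(X = n + k) / P(X > n), with P(X > n) = (1 - p)**n."""
    return geometric_pmf(n + k, p) / (1 - p) ** n

# The conditional PMF of X - n given X > n matches the unconditional geometric PMF.
for n in range(1, 6):
    for k in range(1, 6):
        assert abs(shifted_pmf_given_survival(k, n, p) - geometric_pmf(k, p)) < 1e-12
print("P(X - n = k | X > n) = p_X(k) for all tested n and k")
```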
We will now exploit the memorylessness property of the geometric PMF and use it, together with the total expectation theorem, to calculate the mean, or expectation, of the geometric PMF. If we wanted to calculate the expected value of the geometric using the definition of the expectation, we would have to calculate this infinite sum here, which is quite difficult. Instead, we're going to use a certain trick. The trick is the following: break down the expected value calculation into two different scenarios. Under one scenario, we obtain heads in the first toss, and in that case the random variable X, the number of tosses until the first heads, is equal to 1. This scenario occurs with probability p. Under the other scenario, which occurs with probability 1 minus p, we obtain tails in the first toss, and in that case our random variable is strictly larger than 1.

Now, the expected value of X consists of two pieces: we have a first toss no matter what, and then we have the number of remaining tosses, which is X minus 1. So the expected value of X is 1 plus the expected value of X minus 1, by linearity of expectations.

Next, the expected value of X minus 1 breaks into two pieces using the total expectation theorem: the probability of the first scenario, p, times the expected value of X minus 1 given that X is equal to 1, plus 1 minus p, the probability of the second scenario, times the expected value of X minus 1 given that X is bigger than 1.

Now, the first term here is 0. Why? If I tell you that X is equal to 1, then you're certain that X minus 1 is equal to 0, so this term gives a 0 contribution. How about the next term? We have a 1 minus p here, times this expected value. But X minus 1, conditioned on the event that X is larger than 1, has the same distribution as an ordinary, unconditioned geometric random variable, so this expectation must be the same as the expectation of an ordinary, unconditioned geometric random variable. And this gives us an equality in which both sides involve the expected value of X. We can solve this equation for the expected value, and we obtain the end result that the expected value is 1 over p.

By the way, this answer makes intuitive sense: if p is small, so that the probability of seeing heads at each toss is small, then we need to wait longer and longer until we see heads for the first time.

Setting aside the specific form of the answer that we found, what we have just done actually illustrates that fairly difficult calculations can become very simple if one breaks down a model or a problem in a clever way. This is going to be a recurring theme throughout this class.
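Written out as an equation, the argument says that E[X] = 1 + E[X - 1] = 1 + p * 0 + (1 - p) * E[X], and solving for E[X] gives E[X] = 1/p. As a final side note, here is a small simulation sketch checking this answer against sample averages; the values of p and the sample size are arbitrary choices.

```python
import random

random.seed(1)

def geometric_sample(p):
    """Number of independent coin tosses until the first head (values 1, 2, ...)."""
    tosses = 1
    while random.random() >= p:  # this toss was tails, with probability 1 - p
        tosses += 1
    return tosses

n = 200_000
for p in (0.1, 0.3, 0.5):
    estimate = sum(geometric_sample(p) for _ in range(n)) / n
    print(f"p={p}: simulated mean ~ {estimate:.3f}, predicted 1/p = {1/p:.3f}")
```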