1 00:00:00,550 --> 00:00:04,200 We will now study a problem which is quite difficult to 2 00:00:04,200 --> 00:00:08,800 approach in a direct brute force manner but becomes 3 00:00:08,800 --> 00:00:14,520 tractable once we break it down into simpler pieces using 4 00:00:14,520 --> 00:00:17,410 several of the tricks that we have learned so far. 5 00:00:17,410 --> 00:00:21,030 And this problem will also be a good opportunity for 6 00:00:21,030 --> 00:00:23,840 reviewing some of the tricks and techniques 7 00:00:23,840 --> 00:00:25,690 that we have developed. 8 00:00:25,690 --> 00:00:27,420 The problem is the following. 9 00:00:27,420 --> 00:00:28,790 There are n people. 10 00:00:28,790 --> 00:00:33,280 And let's say for the purpose of illustration that we have 3 11 00:00:33,280 --> 00:00:36,825 people, persons 1, 2, and 3. 12 00:00:40,790 --> 00:00:43,280 And each person has a hat. 13 00:00:43,280 --> 00:00:46,780 They throw their hats inside a box. 14 00:00:46,780 --> 00:00:51,770 And then each person picks a hat at random out of that box. 15 00:00:51,770 --> 00:00:53,595 So here are the three parts. 16 00:01:00,610 --> 00:01:04,849 And one possible outcome of this experiment is that person 17 00:01:04,849 --> 00:01:09,020 1 ends up with hat number 2, person 2 ends up with hat 18 00:01:09,020 --> 00:01:14,120 number 1, person 3 ends up with hat number 3. 19 00:01:14,120 --> 00:01:19,090 We could indicate the hats that each person got by noting 20 00:01:19,090 --> 00:01:21,940 here the numbers associated with each 21 00:01:21,940 --> 00:01:24,760 person, the hat numbers. 22 00:01:24,760 --> 00:01:29,470 And notice that this sequence of numbers, which is a 23 00:01:29,470 --> 00:01:32,570 description of the outcome of the experiment, is just a 24 00:01:32,570 --> 00:01:36,660 permutation of the numbers 1, 2, 3 of the hats. 25 00:01:36,660 --> 00:01:41,020 So we permute the hat numbers so that we can place them next 26 00:01:41,020 --> 00:01:44,640 to the person that got each one of the hats. 27 00:01:44,640 --> 00:01:49,020 In particular, we have n factorial possible outcomes. 28 00:01:49,020 --> 00:01:52,710 This is the number of possible permutations. 29 00:01:52,710 --> 00:01:56,340 What does it mean to pick hats at random? 30 00:01:56,340 --> 00:01:58,680 One interpretation is that every 31 00:01:58,680 --> 00:02:01,370 permutation is equally likely. 32 00:02:01,370 --> 00:02:05,660 And since we have n factorial permutations, each permutation 33 00:02:05,660 --> 00:02:10,169 would have a probability of 1 over n factorial. 34 00:02:10,169 --> 00:02:15,440 But there's another way of describing our model, which is 35 00:02:15,440 --> 00:02:17,620 the following. 36 00:02:17,620 --> 00:02:22,650 Person 1 gets a hat at random out of the three available. 37 00:02:22,650 --> 00:02:26,420 Then person 2 gets a hat at random out of 38 00:02:26,420 --> 00:02:28,750 the remaining hats. 39 00:02:28,750 --> 00:02:31,860 Then person 3 gets the remaining hat. 40 00:02:31,860 --> 00:02:34,600 Each time that there is a choice, each one of the 41 00:02:34,600 --> 00:02:37,600 available hats is equally likely to be picked 42 00:02:37,600 --> 00:02:39,700 as any other hat. 43 00:02:39,700 --> 00:02:42,310 Let us calculate the probability, let's say, that 44 00:02:42,310 --> 00:02:45,710 this particular permutation gets materialized. 45 00:02:45,710 --> 00:02:52,350 The probability that person 1 gets hat number 2 is 1/3. 46 00:02:52,350 --> 00:02:53,950 Then we're left with two hats. 47 00:02:57,350 --> 00:02:59,730 Person 2 has 2 hats to choose from. 48 00:02:59,730 --> 00:03:02,680 The probability that it picks this particular hat 49 00:03:02,680 --> 00:03:05,440 is going to be 1/2. 50 00:03:05,440 --> 00:03:10,330 And finally, person 3 has only 1 hat available, so it will be 51 00:03:10,330 --> 00:03:12,500 picked with probability 1. 52 00:03:12,500 --> 00:03:16,243 So the probability of this particular permutation is one 53 00:03:16,243 --> 00:03:18,880 over 3 factorial. 54 00:03:18,880 --> 00:03:21,900 But you can repeat this argument and consider any 55 00:03:21,900 --> 00:03:24,960 other permutation, and you will always be getting the 56 00:03:24,960 --> 00:03:26,260 same answer. 57 00:03:26,260 --> 00:03:29,346 Any particular permutation has the same probability, one over 58 00:03:29,346 --> 00:03:31,790 3 factorial. 59 00:03:31,790 --> 00:03:35,570 The same argument goes through for the case of general n, n 60 00:03:35,570 --> 00:03:37,430 people and n hats. 61 00:03:37,430 --> 00:03:40,740 And we will find that any permutation will have the same 62 00:03:40,740 --> 00:03:43,990 probability, 1/n factorial. 63 00:03:43,990 --> 00:03:48,350 Therefore, the process of picking one hat at a time is 64 00:03:48,350 --> 00:03:52,660 probabilistically identical to a model in which we simply 65 00:03:52,660 --> 00:03:55,435 state that all permutations are equally likely. 66 00:03:58,040 --> 00:04:02,370 Now that we have described our model and our process and the 67 00:04:02,370 --> 00:04:05,940 associated probabilities, let us consider the question we 68 00:04:05,940 --> 00:04:07,680 want to answer. 69 00:04:07,680 --> 00:04:13,590 Let X be the number of people who get their own hat back. 70 00:04:13,590 --> 00:04:18,380 For example, for the outcome that we have drawn here, the 71 00:04:18,380 --> 00:04:22,110 only person who gets their own hat back is person 3. 72 00:04:22,110 --> 00:04:26,720 And so in this case X happens to take the value of 1. 73 00:04:26,720 --> 00:04:29,950 What we want to do is to calculate the expected value 74 00:04:29,950 --> 00:04:35,370 of the random variable X. The problem is difficult because 75 00:04:35,370 --> 00:04:41,040 if you try to calculate the PMF of the random variable X 76 00:04:41,040 --> 00:04:45,520 and then use the definition of the expectation to calculate 77 00:04:45,520 --> 00:04:51,260 this sum, you will run into big difficulties. 78 00:04:51,260 --> 00:04:56,100 Calculating this quantity, the PMF of X, is difficult. 79 00:04:56,100 --> 00:05:00,300 And it is difficult because there is no simple expression 80 00:05:00,300 --> 00:05:02,230 that describes it. 81 00:05:02,230 --> 00:05:05,460 So we need to do something more intelligent, find some 82 00:05:05,460 --> 00:05:08,180 other way of approaching the problem. 83 00:05:08,180 --> 00:05:13,480 The trick that we will use is to employ indicator variables. 84 00:05:13,480 --> 00:05:18,590 Let Xi be equal to one 1 if person i selects their own hat 85 00:05:18,590 --> 00:05:20,680 and 0 otherwise. 86 00:05:20,680 --> 00:05:25,610 So then, each one of the Xi's is 1 whenever a person has 87 00:05:25,610 --> 00:05:27,320 selected their own hat. 88 00:05:27,320 --> 00:05:32,659 And by adding all the 1's that we may get, we obtain the 89 00:05:32,659 --> 00:05:37,580 total number of people who have selected their own hats. 90 00:05:37,580 --> 00:05:40,390 This makes things easier, because now to calculate the 91 00:05:40,390 --> 00:05:43,820 expected value of X it's sufficient to calculate the 92 00:05:43,820 --> 00:05:47,440 expected value of each one of those terms and add the 93 00:05:47,440 --> 00:05:50,450 expected values, which we're allowed to 94 00:05:50,450 --> 00:05:53,260 do because of linearity. 95 00:05:53,260 --> 00:05:57,370 So let's look at the typical term here. 96 00:05:57,370 --> 00:06:00,410 What is the expected value of Xi? 97 00:06:00,410 --> 00:06:04,000 If you consider the first description or our model, all 98 00:06:04,000 --> 00:06:07,300 permutations are equally likely, this description is 99 00:06:07,300 --> 00:06:10,410 symmetric with respect to all of the persons. 100 00:06:10,410 --> 00:06:14,160 So the expected value of Xi should be the same as the 101 00:06:14,160 --> 00:06:16,040 expected value of X1. 102 00:06:18,910 --> 00:06:24,300 Now, to calculate the expected value of X1, we will consider 103 00:06:24,300 --> 00:06:27,740 the sequential description of the process in which 1 is the 104 00:06:27,740 --> 00:06:30,780 first person to pick a hat. 105 00:06:30,780 --> 00:06:34,190 Now, since X1 is a Bernoulli random variable that takes 106 00:06:34,190 --> 00:06:38,490 values 0 or 1, the expected value of X1 is just the 107 00:06:38,490 --> 00:06:42,680 probability that X1 is equal to 1. 108 00:06:42,680 --> 00:06:46,130 And if person 1 is the first one to choose a hat, that 109 00:06:46,130 --> 00:06:53,540 person has probability 1/n of obtaining the correct hat. 110 00:06:53,540 --> 00:06:56,560 So each one of these random variables has an expected 111 00:06:56,560 --> 00:06:58,065 value of 1/n. 112 00:07:01,050 --> 00:07:05,400 The expected value of X by linearity is going to be the 113 00:07:05,400 --> 00:07:07,465 sum of the expected values. 114 00:07:11,660 --> 00:07:14,950 There is n of them. 115 00:07:14,950 --> 00:07:17,010 Each expected value is 1/n. 116 00:07:20,160 --> 00:07:24,000 And so the final answer is 1. 117 00:07:24,000 --> 00:07:32,060 This is the expected value of the random variable X. 118 00:07:32,060 --> 00:07:36,140 Let us now move and try to calculate a more difficult 119 00:07:36,140 --> 00:07:44,060 quantity, namely, the variance of X. How shall we proceed? 120 00:07:44,060 --> 00:07:50,300 Things would be easiest if the random variables Xi were 121 00:07:50,300 --> 00:07:51,370 independent. 122 00:07:51,370 --> 00:07:55,630 Because in that case, the variance of X would be the sum 123 00:07:55,630 --> 00:07:58,450 of the variances of the Xi's. 124 00:07:58,450 --> 00:08:01,960 But are the Xi's independent? 125 00:08:01,960 --> 00:08:04,220 Let us consider a special case. 126 00:08:04,220 --> 00:08:09,300 Suppose that we only have two persons and that I tell you 127 00:08:09,300 --> 00:08:13,660 that the first person got their own hat back. 128 00:08:13,660 --> 00:08:17,980 In that case, the second person must have also gotten 129 00:08:17,980 --> 00:08:20,580 their own hat back. 130 00:08:20,580 --> 00:08:23,840 If, on the other hand, person 1 did not to get their own hat 131 00:08:23,840 --> 00:08:30,160 back, then person 2 will not get their own hat back either. 132 00:08:30,160 --> 00:08:33,808 Because in this scenario, person 1 gets hat 2, and that 133 00:08:33,808 --> 00:08:36,370 means that person 2 gets hat 1. 134 00:08:36,370 --> 00:08:39,419 So we see that knowing the value of the random variable 135 00:08:39,419 --> 00:08:43,240 X1 tells us a lot about the value of the 136 00:08:43,240 --> 00:08:45,030 random variable X2. 137 00:08:45,030 --> 00:08:46,650 And that means that the random variables 138 00:08:46,650 --> 00:08:49,740 X1 and X2 are dependent. 139 00:08:49,740 --> 00:08:53,140 More generally, if I were to tell you that the first n 140 00:08:53,140 --> 00:08:58,680 minus 1 people got their own hats back, then the last 141 00:08:58,680 --> 00:09:02,860 remaining person will have his or her own hat 142 00:09:02,860 --> 00:09:04,140 available to be picked. 143 00:09:04,140 --> 00:09:06,770 That's going to be the only available hat. 144 00:09:06,770 --> 00:09:10,020 And then person n we also get their hat back. 145 00:09:10,020 --> 00:09:13,730 So we see that the information about some of the Xi's gives 146 00:09:13,730 --> 00:09:18,180 us information about the remaining Xn. 147 00:09:18,180 --> 00:09:19,920 And again, this means that the random 148 00:09:19,920 --> 00:09:23,140 variables are dependent. 149 00:09:23,140 --> 00:09:25,870 Since we do not have independence, we cannot find 150 00:09:25,870 --> 00:09:28,760 the variance by just adding the variances of the different 151 00:09:28,760 --> 00:09:30,120 random variables. 152 00:09:30,120 --> 00:09:35,700 But we need to do a lot more work in that direction. 153 00:09:35,700 --> 00:09:39,130 In general, whenever we need to calculate variances, it is 154 00:09:39,130 --> 00:09:43,220 usually simpler to carry out the calculation using this 155 00:09:43,220 --> 00:09:46,710 alternative form for the variance. 156 00:09:46,710 --> 00:09:51,440 So let us start towards a calculation of the expected 157 00:09:51,440 --> 00:09:54,720 value of X squared. 158 00:09:54,720 --> 00:09:59,990 Now the random variable X squared, by simple algebra, is 159 00:09:59,990 --> 00:10:02,390 this expression times itself. 160 00:10:02,390 --> 00:10:05,380 And by expanding the product we get all 161 00:10:05,380 --> 00:10:07,550 sorts of cross terms. 162 00:10:07,550 --> 00:10:11,900 Some of these cross terms will be of the type X1 times Xi or 163 00:10:11,900 --> 00:10:14,050 X2 times X2. 164 00:10:14,050 --> 00:10:19,300 These will be terms of this form, and there is n of them. 165 00:10:19,300 --> 00:10:23,770 And then we get cross terms, such as X1 times X2, X1 times 166 00:10:23,770 --> 00:10:27,670 X3, X2 times X1, and so on. 167 00:10:27,670 --> 00:10:30,880 How many terms do we have here? 168 00:10:30,880 --> 00:10:34,100 Well, if we have n terms multiplying n other terms we 169 00:10:34,100 --> 00:10:37,330 have a total of n squared terms. 170 00:10:37,330 --> 00:10:41,450 n are already here, so the remaining terms, which are the 171 00:10:41,450 --> 00:10:46,050 cross terms, will be n squared minus n. 172 00:10:46,050 --> 00:10:53,370 Or, in a simpler form, it's n times n minus 1. 173 00:10:53,370 --> 00:10:57,280 So now how are we going to calculate the expected value 174 00:10:57,280 --> 00:10:58,490 of X squared? 175 00:10:58,490 --> 00:11:01,310 Well, we will use linearity of expectations. 176 00:11:01,310 --> 00:11:04,890 So we need to calculate the expected value of Xi squared, 177 00:11:04,890 --> 00:11:09,080 and we also need to calculate the expected value of Xi Xj 178 00:11:09,080 --> 00:11:12,846 when i is different from j. 179 00:11:12,846 --> 00:11:14,820 Let us start with Xi squared. 180 00:11:17,800 --> 00:11:22,600 First, if we use the symmetric description of our model, all 181 00:11:22,600 --> 00:11:26,920 permutations are equally likely, then all persons play 182 00:11:26,920 --> 00:11:27,440 the same role. 183 00:11:27,440 --> 00:11:29,130 There's symmetry in the problem. 184 00:11:29,130 --> 00:11:34,450 So Xi squared has the same distribution as X1 squared. 185 00:11:38,270 --> 00:11:42,340 Then, X1 is a 0-1 random variable, a 186 00:11:42,340 --> 00:11:43,980 Bernoulli random variable. 187 00:11:43,980 --> 00:11:48,310 So X1 squared will always take the same numerical value as 188 00:11:48,310 --> 00:11:49,730 the random variable X1. 189 00:11:52,400 --> 00:11:56,300 This is a very special case which happens only because a 190 00:11:56,300 --> 00:11:59,140 random variable takes values in {0, 1}. 191 00:11:59,140 --> 00:12:01,550 And 0 squared is the same as 0. 192 00:12:01,550 --> 00:12:04,575 1 squared is the same as 1. 193 00:12:04,575 --> 00:12:07,660 This expected value is something that we have already 194 00:12:07,660 --> 00:12:09,610 calculated, and it is 1/n. 195 00:12:13,410 --> 00:12:16,270 Let us now move to the calculation of the expectation 196 00:12:16,270 --> 00:12:20,060 of a typical term inside the sum. 197 00:12:20,060 --> 00:12:22,320 So let i be different than j, and look at the 198 00:12:22,320 --> 00:12:25,280 expected value of Xi Xj. 199 00:12:25,280 --> 00:12:28,550 Once more, because of the symmetry of the probabilistic 200 00:12:28,550 --> 00:12:33,160 model, it doesn't matter which i and j we are considering. 201 00:12:33,160 --> 00:12:37,585 So we might as well consider the product of X1 with X2. 202 00:12:40,770 --> 00:12:44,760 Now, X1 and X2 take values 0 and 1. 203 00:12:44,760 --> 00:12:48,620 And the product of the two also takes values 0 and 1. 204 00:12:48,620 --> 00:12:51,440 So this is a Bernoulli random variable, and so the 205 00:12:51,440 --> 00:12:54,340 expectation of that random variable is just the 206 00:12:54,340 --> 00:12:58,535 probability that this random variable is equal to 1. 207 00:13:01,060 --> 00:13:05,040 But for the product to be equal to 1, the only way that 208 00:13:05,040 --> 00:13:09,940 this can happen is if both of these random variables happen 209 00:13:09,940 --> 00:13:11,590 to be equal to 1. 210 00:13:14,890 --> 00:13:18,430 Let us now turn to the sequential 211 00:13:18,430 --> 00:13:20,370 description of the model. 212 00:13:20,370 --> 00:13:24,350 The probability that the first person gets their own hat back 213 00:13:24,350 --> 00:13:27,400 and the second person gets their own hat back is the 214 00:13:27,400 --> 00:13:32,270 probability that the first one gets their own hat back, and 215 00:13:32,270 --> 00:13:36,020 then multiplied by the conditional probability that 216 00:13:36,020 --> 00:13:40,380 the second person gets their own hat back, given that the 217 00:13:40,380 --> 00:13:43,540 first person got their own hat back. 218 00:13:43,540 --> 00:13:44,650 What are these probabilities? 219 00:13:44,650 --> 00:13:46,150 The probability that a person gets their 220 00:13:46,150 --> 00:13:47,630 own hat back is 1/n. 221 00:13:50,720 --> 00:13:55,010 Given that person 1 got their own hat back, person 2 is 222 00:13:55,010 --> 00:13:57,450 faced with a situation where there are n 223 00:13:57,450 --> 00:13:59,800 minus 1 available hats. 224 00:13:59,800 --> 00:14:02,540 And one of those is that person's hat. 225 00:14:02,540 --> 00:14:07,400 So the probability that person 2 will also pick his or her 226 00:14:07,400 --> 00:14:11,350 own hat is 1 over n minus 1. 227 00:14:14,920 --> 00:14:19,610 Now we are in a position to calculate the expected value 228 00:14:19,610 --> 00:14:23,360 of X squared. 229 00:14:23,360 --> 00:14:31,180 The expected value of X squared consists of the sum of 230 00:14:31,180 --> 00:14:42,110 n expected values, each one of which is equal to 1/n plus so 231 00:14:42,110 --> 00:14:48,400 many expected values, because we have so many terms, each 232 00:14:48,400 --> 00:14:54,490 one of which, by this calculation, is 1/n times 1 233 00:14:54,490 --> 00:14:57,970 over n minus 1. 234 00:14:57,970 --> 00:15:00,780 And we see that we get cancellations here. 235 00:15:00,780 --> 00:15:04,390 And we obtain 1 plus 1, which is equal to 2. 236 00:15:09,450 --> 00:15:13,840 On the other hand we have this term that we need to subtract. 237 00:15:13,840 --> 00:15:16,370 We found previously that the expected value of 238 00:15:16,370 --> 00:15:17,860 X is equal to 1. 239 00:15:17,860 --> 00:15:19,950 So we need to subtract 1. 240 00:15:19,950 --> 00:15:23,800 And the final answer to our problem is that the variance 241 00:15:23,800 --> 00:15:26,520 of X is also equal to 1. 242 00:15:32,200 --> 00:15:36,780 So what we saw in this problem is that we can deal with quite 243 00:15:36,780 --> 00:15:41,810 complicated models, but by breaking them down into more 244 00:15:41,810 --> 00:15:47,600 manageable pieces, first break down the random variable X as 245 00:15:47,600 --> 00:15:51,270 a sum of different random variables, then taking the 246 00:15:51,270 --> 00:15:54,680 square of this and break it down into a number of 247 00:15:54,680 --> 00:15:58,170 different terms, and then by considering one term at a 248 00:15:58,170 --> 00:16:04,550 time, we can often end up with the solutions or the answers 249 00:16:04,550 --> 00:16:06,110 to problems that would have been 250 00:16:06,110 --> 00:16:07,530 otherwise quite difficult.