1
00:00:00,000 --> 00:00:00,530

2
00:00:00,530 --> 00:00:03,340
In this exercise, we'll be looking at a problem, also

3
00:00:03,340 --> 00:00:05,710
known as the coupon collector's problem.

4
00:00:05,710 --> 00:00:09,500
We have a set of K coupons, or grades in our case.

5
00:00:09,500 --> 00:00:11,230
And in each time slot, one random grade

6
00:00:11,230 --> 00:00:13,210
is revealed to us.

7
00:00:13,210 --> 00:00:15,230
And we'd like to know how long it would take for us to

8
00:00:15,230 --> 00:00:16,900
collect all K grades.

9
00:00:16,900 --> 00:00:20,690
In our case, K is equal to 6.

10
00:00:20,690 --> 00:00:22,390
Now the key to solving the problem

11
00:00:22,390 --> 00:00:23,930
is essentially twofold.

12
00:00:23,930 --> 00:00:26,980
First, we'll have to find a way to intelligently define

13
00:00:26,980 --> 00:00:30,050
a sequence of random variables that captures, essentially,

14
00:00:30,050 --> 00:00:32,280
the stopping time of this process.

15
00:00:32,280 --> 00:00:35,930
And then we'll employ the idea of linearity of expectation

16
00:00:35,930 --> 00:00:39,740
to break this value down into simpler terms.

17
00:00:39,740 --> 00:00:41,260
So let's get started.

18
00:00:41,260 --> 00:00:49,440
We'll define Yi as the number of papers till we see

19
00:00:49,440 --> 00:00:50,910
the i-th new grade.

20
00:00:50,910 --> 00:00:56,420

21
00:00:56,420 --> 00:00:57,280
What does that mean?

22
00:00:57,280 --> 00:01:00,040
Well, let's take a look at an example.

23
00:01:00,040 --> 00:01:04,069
Suppose here we have a timeline: no paper yet,

24
00:01:04,069 --> 00:01:06,740
first paper, second paper, third paper,

25
00:01:06,740 --> 00:01:08,430
and so on and so forth.

26
00:01:08,430 --> 00:01:12,230
Now, say we got grade A in the first slot, grade A minus in

27
00:01:12,230 --> 00:01:16,020
the second slot, A again in the third slot, and let's say in

28
00:01:16,020 --> 00:01:17,960
a fourth slot we got B.

29
00:01:17,960 --> 00:01:22,380
According to this process, we see that Y1 is always 1,

30
00:01:22,380 --> 00:01:24,100
because whatever we got in the first slot

31
00:01:24,100 --> 00:01:25,810
will be a new grade.

32
00:01:25,810 --> 00:01:29,000
Now, Y2 is 2, because the second paper is,

33
00:01:29,000 --> 00:01:31,190
again, a new grade.

34
00:01:31,190 --> 00:01:33,690
On the third paper we got a grade that is the same as

35
00:01:33,690 --> 00:01:34,950
the first grade.

36
00:01:34,950 --> 00:01:38,150
So that would not count as any Yi.

37
00:01:38,150 --> 00:01:43,940
And the third time we saw a new grade would be paper four, so Y3 is 4.

38
00:01:43,940 --> 00:01:47,490
According to this notation, we're interested in knowing

39
00:01:47,490 --> 00:01:53,600
the expected value of Y6, E of Y6, which is the time it

40
00:01:53,600 --> 00:01:56,270
takes to receive all six grades.

41
00:01:56,270 --> 00:01:59,180
So far, this notation isn't really helping us solve

42
00:01:59,180 --> 00:02:02,290
the problem; it's just stating it a different way.

43
00:02:02,290 --> 00:02:05,290
It turns out, it's much easier to look at the following

44
00:02:05,290 --> 00:02:07,690
variable derived from the Yis.

45
00:02:07,690 --> 00:02:11,090
We'll define Xi as the difference Yi

46
00:02:11,090 --> 00:02:13,700
plus 1 minus Yi.

47
00:02:13,700 --> 00:02:17,690
And in words, Xi is the number of papers you

48
00:02:17,690 --> 00:02:21,950
need until you see the i plus 1-th new grade, after you have

49
00:02:21,950 --> 00:02:23,840
received i new grades so far.

50
00:02:23,840 --> 00:02:30,450
So in this case, if we call time 0 Y0, X1 will be the

51
00:02:30,450 --> 00:02:34,030
difference between Y1 and Y0, which is always 1--

52
00:02:34,030 --> 00:02:35,270
that's X1.

53
00:02:35,270 --> 00:02:38,100
And the difference between these two will be X2.

54
00:02:38,100 --> 00:02:42,100
And the difference between Y3 and Y2--

55
00:02:42,100 --> 00:02:44,685
Sorry.

56
00:02:44,685 --> 00:02:51,590
It should be X0, X1, X2, and so on.

57
00:02:51,590 --> 00:02:53,610
OK?

58
00:02:53,610 --> 00:02:59,040
With this notation, we see that Y6 can now be written as

59
00:02:59,040 --> 00:03:04,700
the sum from i equal to 0 to 5 of Xi.

60
00:03:04,700 --> 00:03:08,580
So all I did was break down Y6 into a

61
00:03:08,580 --> 00:03:13,220
sum of differences: Y6 minus Y5, Y5 minus

62
00:03:13,220 --> 00:03:14,960
Y4, and so on.

63
00:03:14,960 --> 00:03:19,060
It turns out, this expression will be very useful.

64
00:03:19,060 --> 00:03:20,420
OK.

65
00:03:20,420 --> 00:03:25,280
So now that we have the two variables Y and X, let's see

66
00:03:25,280 --> 00:03:28,200
if it will be easier to look at the distribution of X in

67
00:03:28,200 --> 00:03:30,170
studying this process.

68
00:03:30,170 --> 00:03:34,370
Let's say we have seen one new grade so far--

69
00:03:34,370 --> 00:03:35,400
just one.

70
00:03:35,400 --> 00:03:37,660
How many trials would it take for us to see

71
00:03:37,660 --> 00:03:38,710
the second new grade?

72
00:03:38,710 --> 00:03:40,720
It turns out it's not that hard.

73
00:03:40,720 --> 00:03:44,790
In this case, we know there is a total of six grades, and we

74
00:03:44,790 --> 00:03:45,890
have seen one of them.

75
00:03:45,890 --> 00:03:48,920
So that leaves us five more grades that

76
00:03:48,920 --> 00:03:50,290
we could still see.

77
00:03:50,290 --> 00:03:53,740
And therefore, on any random trial after that, there is a

78
00:03:53,740 --> 00:03:57,560
probability of 5 over 6 that we'll see a new grade.

79
00:03:57,560 --> 00:04:03,970
And hence, we know that X1 has a geometric distribution with

80
00:04:03,970 --> 00:04:08,350
success probability, or parameter, 5/6.

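As a rough illustration of the definitions above, the following Python sketch recovers the Yi and Xi values from the example sequence of grades (A, A minus, A, B). The function names `first_seen_times` and `waiting_times` are hypothetical helpers introduced only for this sketch; they do not come from the lecture.

```python
def first_seen_times(grades):
    """Return [Y1, Y2, ...]: the paper number at which each new grade first appears."""
    seen = set()
    times = []
    for paper_number, grade in enumerate(grades, start=1):
        if grade not in seen:
            seen.add(grade)
            times.append(paper_number)
    return times


def waiting_times(first_seen):
    """Return [X0, X1, ...] where X_i = Y_{i+1} - Y_i and Y_0 is taken to be 0."""
    padded = [0] + list(first_seen)
    return [later - earlier for earlier, later in zip(padded, padded[1:])]


grades = ["A", "A-", "A", "B"]   # the example sequence from the timeline
ys = first_seen_times(grades)
xs = waiting_times(ys)
print(ys)  # [1, 2, 4]
print(xs)  # [1, 1, 2]
```

The waiting times add back up to the last Yi (1 + 1 + 2 = 4), which is the telescoping sum used for Y6.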
81
00:04:08,350 --> 00:04:12,580
Now, more generally, if we extend this idea further, we

82
00:04:12,580 --> 00:04:19,019
see that Xi will have a geometric distribution with

83
00:04:19,019 --> 00:04:24,840
parameter 6 minus i over 6.

84
00:04:24,840 --> 00:04:27,120
And this is due to the fact that so far we have already

85
00:04:27,120 --> 00:04:29,630
seen i new grades.

86
00:04:29,630 --> 00:04:33,350
So 6 minus i over 6 is the success probability of seeing a

87
00:04:33,350 --> 00:04:35,480
further new grade.

88
00:04:35,480 --> 00:04:39,670
So from this expression, we know that the expected value

89
00:04:39,670 --> 00:04:45,780
of Xi will simply be the inverse of the parameter of

90
00:04:45,780 --> 00:04:51,730
the geometric distribution, which is 6 over 6 minus i or 6

91
00:04:51,730 --> 00:04:54,350
times 1 over 6 minus i.

92
00:04:54,350 --> 00:04:56,930

93
00:04:56,930 --> 00:05:00,600
And now we're ready to compute the final answer.

94
00:05:00,600 --> 00:05:05,550
So from this expression we know the expected value of Y6 is

95
00:05:05,550 --> 00:05:13,280
equal to the expected value of the sum from i equal to 0 to 5 of Xi.

96
00:05:13,280 --> 00:05:16,360

97
00:05:16,360 --> 00:05:19,600
And by the linearity of expectation, we can pull out

98
00:05:19,600 --> 00:05:28,140
the sum and write it as the sum from i equal to 0 to 5 of the expected value of Xi.

99
00:05:28,140 --> 00:05:31,220
Now, since we know that the expected value of Xi is the

100
00:05:31,220 --> 00:05:32,670
following expression,

101
00:05:32,670 --> 00:05:36,990
we see that this term is equal to 6 times the sum from i

102
00:05:36,990 --> 00:05:43,260
equals 0 to 5 of 1 over 6 minus i.

103
00:05:43,260 --> 00:05:48,260
Or, written the other way, this is equal to 6 times the sum from i

104
00:05:48,260 --> 00:05:51,356
equal to 0 to 5--

105
00:05:51,356 --> 00:06:03,510
in fact, 1 to 6, of 1 over i.

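For reference, the chain of equalities spoken above can be written out as follows. This is a reconstruction from the definitions in the transcript, not a copy of the board; the index j = 6 - i is a label introduced here for the change of variable, and the `align*` environment assumes the amsmath package.

```latex
\begin{align*}
\mathrm{E}[Y_6]
  &= \mathrm{E}\!\left[\sum_{i=0}^{5} X_i\right]
   = \sum_{i=0}^{5} \mathrm{E}[X_i]
   && \text{(linearity of expectation)} \\
  &= \sum_{i=0}^{5} \frac{6}{6-i}
   = 6 \sum_{i=0}^{5} \frac{1}{6-i}
   && \text{(mean of a geometric with parameter } \tfrac{6-i}{6}\text{)} \\
  &= 6 \sum_{j=1}^{6} \frac{1}{j}
   && \text{(substituting } j = 6 - i\text{)}
\end{align*}
```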
106
00:06:03,510 --> 00:06:05,970
And all I did here was to, essentially, change the

107
00:06:05,970 --> 00:06:10,190
variable, so that these two summations contain exactly

108
00:06:10,190 --> 00:06:12,030
the same terms.

109
00:06:12,030 --> 00:06:20,840
And this will give us the answer, which is 14.7.

110
00:06:20,840 --> 00:06:24,420
Now, more generally, we can see that there's nothing

111
00:06:24,420 --> 00:06:26,400
special about the number 6 here.

112
00:06:26,400 --> 00:06:30,640
We could have replaced 6 with any number, let's say, K.

113
00:06:30,640 --> 00:06:37,840
And then we'll get E of YK; let's say there are more than

114
00:06:37,840 --> 00:06:39,580
six labels.

115
00:06:39,580 --> 00:06:45,030
And this will give us K times the sum from i equal to 1 to K

116
00:06:45,030 --> 00:06:47,740
of 1 over i.

117
00:06:47,740 --> 00:06:51,000
Interestingly, it turns out this quantity has an

118
00:06:51,000 --> 00:06:51,480
asymptotic

119
00:06:51,480 --> 00:06:56,170
expression that, essentially, is roughly equal to K times

120
00:06:56,170 --> 00:07:01,100
the natural logarithm of K. And this is known as the

121
00:07:01,100 --> 00:07:02,190
scaling law

122
00:07:02,190 --> 00:07:04,680
for the coupon collector's problem, which says it

123
00:07:04,680 --> 00:07:06,810
essentially takes about K times ln

124
00:07:06,810 --> 00:07:11,650
K trials until we collect all K coupons.

125
00:07:11,650 --> 00:07:13,800
And that'll be the end of the problem.

126
00:07:13,800 --> 00:07:15,050
See you next time.

127
00:07:15,050 --> 00:07:15,850
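The closing result can be checked numerically. The Python sketch below assumes the general formula is K times the sum of 1/i for i from 1 to K, which is the form that reproduces the 14.7 stated for K = 6. It also runs a small simulation and prints K ln K for comparison. The names `expected_trials` and `simulate_once` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction
import math
import random


def expected_trials(k):
    """Exact E[Y_K] = K * (1 + 1/2 + ... + 1/K), kept as a Fraction."""
    return k * sum(Fraction(1, i) for i in range(1, k + 1))


def simulate_once(k, rng):
    """Draw uniformly random grades until all k distinct grades have appeared."""
    seen = set()
    trials = 0
    while len(seen) < k:
        seen.add(rng.randrange(k))
        trials += 1
    return trials


exact = expected_trials(6)
print(float(exact))  # 14.7

# Monte Carlo estimate; the seed is arbitrary and only makes the run repeatable.
rng = random.Random(0)
runs = 100_000
estimate = sum(simulate_once(6, rng) for _ in range(runs)) / runs
print(estimate)

# The scaling law K * ln(K) captures the growth rate, not the exact value for small K.
print(6 * math.log(6))
```

For K = 6 the value of K ln K sits noticeably below 14.7, so the scaling law is best read as a statement about how the answer grows for large K rather than as a precise estimate.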