1 00:00:02,220 --> 00:00:05,530 Simulation is an important tool in the analysis 2 00:00:05,530 --> 00:00:08,830 of probabilistic phenomena. 3 00:00:08,830 --> 00:00:12,310 For example, suppose that X, Y, and Z 4 00:00:12,310 --> 00:00:14,760 are independent random variables, 5 00:00:14,760 --> 00:00:17,580 and you're interested in the statistical properties 6 00:00:17,580 --> 00:00:20,400 of this random variable. 7 00:00:20,400 --> 00:00:22,660 Perhaps you can find the distribution 8 00:00:22,660 --> 00:00:26,300 of this random variable by solving a derived distribution 9 00:00:26,300 --> 00:00:29,210 problem, but sometimes this is impossible. 10 00:00:29,210 --> 00:00:32,049 And in such cases, what you do is, 11 00:00:32,049 --> 00:00:37,230 you generate random samples of these random variables 12 00:00:37,230 --> 00:00:39,850 drawn according to their distributions, 13 00:00:39,850 --> 00:00:45,450 and then evaluate the function g on that random sample. 14 00:00:45,450 --> 00:00:49,120 And this gives you one sample value of this function. 15 00:00:49,120 --> 00:00:51,400 And you can repeat that several times 16 00:00:51,400 --> 00:00:54,070 to obtain some kind of histogram and from that, 17 00:00:54,070 --> 00:00:56,690 get some understanding about the statistical properties 18 00:00:56,690 --> 00:00:58,160 of this function. 19 00:00:58,160 --> 00:01:02,330 So the question is, how can we generate a random sample 20 00:01:02,330 --> 00:01:06,300 of a random variable whose distribution is known? 21 00:01:06,300 --> 00:01:10,230 So what we want is to create some kind of box 22 00:01:10,230 --> 00:01:12,789 that outputs numbers. 23 00:01:12,789 --> 00:01:15,300 And these numbers are random variables 24 00:01:15,300 --> 00:01:21,580 that are distributed according to a CDF that's given to us. 25 00:01:21,580 --> 00:01:23,270 How can we do it? 26 00:01:23,270 --> 00:01:28,010 Well, computers typically have a random number generator 27 00:01:28,010 --> 00:01:29,640 in them. 28 00:01:29,640 --> 00:01:33,650 And random number generators, typically what they do 29 00:01:33,650 --> 00:01:37,729 is generate values that are drawn 30 00:01:37,729 --> 00:01:39,495 from a uniform distribution. 31 00:01:42,160 --> 00:01:44,580 So this gives us a starting point. 32 00:01:44,580 --> 00:01:47,640 We can generate uniform random variables. 33 00:01:47,640 --> 00:01:51,060 But what we want is to generate values of a random variable 34 00:01:51,060 --> 00:01:52,940 according to some other distribution. 35 00:01:52,940 --> 00:01:54,890 How are we going to do it? 36 00:01:54,890 --> 00:02:00,770 What we want to do is to create some kind of box or function 37 00:02:00,770 --> 00:02:07,870 that takes this uniform random variable and generates g of U. 38 00:02:07,870 --> 00:02:13,610 And we want to find the right function to use. 39 00:02:13,610 --> 00:02:20,190 Find a g so that the random variable, g of U, 40 00:02:20,190 --> 00:02:23,860 is distributed according to the distribution that we want. 41 00:02:23,860 --> 00:02:26,520 That is, we want the CDF of g of U 42 00:02:26,520 --> 00:02:29,940 to be the CDF that's given to us. 43 00:02:29,940 --> 00:02:32,140 So let's see how we can do this. 44 00:02:35,130 --> 00:02:39,310 Let us look at the discrete case first, which is easier. 45 00:02:39,310 --> 00:02:41,780 And let us look at an example. 46 00:02:41,780 --> 00:02:46,700 So suppose that I want to generate samples 47 00:02:46,700 --> 00:02:51,160 of a discrete random variable that has the following PMF. 48 00:02:51,160 --> 00:02:53,920 It takes this value with probability 2/6, 49 00:02:53,920 --> 00:02:58,050 this value with probability 3/6, and this value 50 00:02:58,050 --> 00:03:00,800 with probability 1/6. 51 00:03:00,800 --> 00:03:05,930 What I have is a uniform random variable 52 00:03:05,930 --> 00:03:10,500 that's drawn from a uniform distribution. 53 00:03:10,500 --> 00:03:13,100 What can I do? 54 00:03:13,100 --> 00:03:15,090 I can do the following. 55 00:03:15,090 --> 00:03:19,579 Let this number here be 2/6. 56 00:03:19,579 --> 00:03:22,420 If my uniform random variable falls 57 00:03:22,420 --> 00:03:27,510 in this range, which happens with probability 2/6, 58 00:03:27,510 --> 00:03:30,210 I'm going to report this value for 59 00:03:30,210 --> 00:03:31,470 my discrete random variable. 60 00:03:34,390 --> 00:03:38,300 Then I take an interval of length 3/6, 61 00:03:38,300 --> 00:03:42,860 which takes me to 5/6. 62 00:03:42,860 --> 00:03:47,500 And if my uniform random variable falls in this range, 63 00:03:47,500 --> 00:03:50,470 then I'm going to report that value 64 00:03:50,470 --> 00:03:52,700 for my discrete random variable. 65 00:03:52,700 --> 00:03:56,150 And finally, with probability 1/6, 66 00:03:56,150 --> 00:03:59,640 my uniform random variable happens to fall in here. 67 00:03:59,640 --> 00:04:02,810 And then I report that [value]. 68 00:04:02,810 --> 00:04:05,720 So clearly, the value that I'm reporting 69 00:04:05,720 --> 00:04:07,480 has the correct probabilities. 70 00:04:07,480 --> 00:04:11,180 I'm going to report this value with probability 2/6, 71 00:04:11,180 --> 00:04:15,020 I'm going to report that value with probability 3/6, 72 00:04:15,020 --> 00:04:16,190 and so on. 73 00:04:16,190 --> 00:04:19,100 So this is how we can generate random samples 74 00:04:19,100 --> 00:04:22,010 of a discrete distribution, starting 75 00:04:22,010 --> 00:04:24,460 from a uniform random variable. 76 00:04:24,460 --> 00:04:30,210 Let us now look at what we did in a somewhat different way. 77 00:04:30,210 --> 00:04:33,650 This is the x-axis. 78 00:04:33,650 --> 00:04:39,610 And let me plot the CDF of my discrete random variable. 79 00:04:39,610 --> 00:04:48,390 So the CDF has a jump of 2/6, at a point which is equal to that. 80 00:04:48,390 --> 00:04:52,730 Then it has another jump of size 3/6, which 81 00:04:52,730 --> 00:04:57,720 takes us to 5/6 at some other point. 82 00:04:57,720 --> 00:05:01,930 And that point here corresponds to the location of that value. 83 00:05:01,930 --> 00:05:05,600 And finally, it has another jump of 1/6 84 00:05:05,600 --> 00:05:08,740 that takes us to 1, at another point, that 85 00:05:08,740 --> 00:05:10,370 corresponds to the third value. 86 00:05:13,230 --> 00:05:18,670 And look now at this interval here from 0 to 1. 87 00:05:18,670 --> 00:05:20,630 And let us think as follows. 88 00:05:20,630 --> 00:05:22,910 We have a uniform random variable 89 00:05:22,910 --> 00:05:26,140 distributed between 0 to 1. 90 00:05:26,140 --> 00:05:28,770 If my uniform random variable happens 91 00:05:28,770 --> 00:05:34,470 to fall in this interval, I'm going to report that value. 92 00:05:34,470 --> 00:05:37,930 If my uniform random variable happens 93 00:05:37,930 --> 00:05:49,320 to fall in this interval, I'm going to report that value. 94 00:05:49,320 --> 00:05:54,690 And finally, if my uniform falls in this interval, 95 00:05:54,690 --> 00:05:58,020 I'm going to report that value. 96 00:05:58,020 --> 00:06:01,030 We're doing exactly the same thing as before. 97 00:06:01,030 --> 00:06:05,040 With probability 2/6, my uniform falls here. 98 00:06:05,040 --> 00:06:08,270 And we report this value and so on. 99 00:06:08,270 --> 00:06:10,840 So what's a graphical way of understanding 100 00:06:10,840 --> 00:06:12,480 of what we're doing? 101 00:06:12,480 --> 00:06:14,430 We're taking the CDF. 102 00:06:14,430 --> 00:06:18,120 We generate a value of the uniform. 103 00:06:18,120 --> 00:06:22,090 And then we move until we hit the CDF 104 00:06:22,090 --> 00:06:25,740 and report the corresponding value of x. 105 00:06:25,740 --> 00:06:28,280 It turns out that this recipe will also 106 00:06:28,280 --> 00:06:32,860 work in the continuous case. 107 00:06:32,860 --> 00:06:34,230 Let's see how this is done. 108 00:06:36,830 --> 00:06:40,409 So let's assume that we have a CDF, which 109 00:06:40,409 --> 00:06:41,800 is strictly monotonic. 110 00:06:44,860 --> 00:06:49,540 So the picture would be as follows. 111 00:06:49,540 --> 00:06:51,300 It's a CDF. 112 00:06:51,300 --> 00:06:53,620 CDFs are monotonic, but here, we assume 113 00:06:53,620 --> 00:06:56,290 that it is strictly monotonic. 114 00:06:56,290 --> 00:06:59,615 And we also assume that it is continuous. 115 00:07:06,060 --> 00:07:08,690 It doesn't have any jumps. 116 00:07:08,690 --> 00:07:16,530 So this CDF starts at 0 and rises, asymptotically, to 1. 117 00:07:16,530 --> 00:07:20,480 What was the recipe that we were just discussing? 118 00:07:20,480 --> 00:07:24,195 We generate a value for a uniform random variable. 119 00:07:28,110 --> 00:07:36,440 We move until we hit the CDF, and then report this value here 120 00:07:36,440 --> 00:07:38,870 for x. 121 00:07:38,870 --> 00:07:40,480 So what is it that we're doing? 122 00:07:40,480 --> 00:07:43,440 We're going from u's to x's. 123 00:07:43,440 --> 00:07:48,100 So we're using the inverse function. 124 00:07:48,100 --> 00:07:52,100 The cumulative takes as an input an x, 125 00:07:52,100 --> 00:07:57,400 a value on this axis, and then reports, a value on that axis. 126 00:07:57,400 --> 00:07:59,400 The inverse function is the function 127 00:07:59,400 --> 00:08:01,350 that goes the opposite way. 128 00:08:01,350 --> 00:08:03,850 We start from a value on the vertical axis 129 00:08:03,850 --> 00:08:08,030 and takes us to the horizontal axis. 130 00:08:08,030 --> 00:08:10,620 Now, the important thing is that because of our assumption 131 00:08:10,620 --> 00:08:14,450 that f is continuous and strictly monotonic, 132 00:08:14,450 --> 00:08:17,920 this inverse function is well-defined. 133 00:08:17,920 --> 00:08:20,020 Given any point here, we can always 134 00:08:20,020 --> 00:08:26,720 find one and only one corresponding x. 135 00:08:26,720 --> 00:08:29,480 Now, what are the properties of this method 136 00:08:29,480 --> 00:08:31,910 that we have been using? 137 00:08:31,910 --> 00:08:39,010 If I take some number c and then take the corresponding number 138 00:08:39,010 --> 00:08:45,620 up here, which is going to be F_X of c, 139 00:08:45,620 --> 00:08:49,020 then we have the following property. 140 00:08:49,020 --> 00:08:53,120 My random variable X is going to be 141 00:08:53,120 --> 00:08:56,235 less than or equal to c if and only 142 00:08:56,235 --> 00:09:01,160 if my random variable X falls into this interval. 143 00:09:01,160 --> 00:09:09,680 But that's equivalent to saying that the uniform random 144 00:09:09,680 --> 00:09:12,760 variable fell in that interval. 145 00:09:12,760 --> 00:09:16,180 Values of the uniform in this interval-- these 146 00:09:16,180 --> 00:09:19,600 are the values that give me x's that are 147 00:09:19,600 --> 00:09:22,370 less than or equal to c. 148 00:09:22,370 --> 00:09:26,270 So the event that X is less than or equal to c 149 00:09:26,270 --> 00:09:30,990 is identical to the event that U is less than 150 00:09:30,990 --> 00:09:34,410 or equal to F_X of c. 151 00:09:34,410 --> 00:09:39,260 So this is how I am generating my x's based on u's. 152 00:09:39,260 --> 00:09:42,400 We now need to verify that the x's that I'm generating 153 00:09:42,400 --> 00:09:47,110 this way have the correct property, have the correct CDF. 154 00:09:47,110 --> 00:09:49,850 So let's check it out. 155 00:09:49,850 --> 00:09:55,240 The probability that X is less than or equal to c, this 156 00:09:55,240 --> 00:10:03,230 is the probability that U is less than or equal to F_X of c. 157 00:10:03,230 --> 00:10:05,640 But U is a uniform random variable. 158 00:10:05,640 --> 00:10:08,050 The probability of being less than something 159 00:10:08,050 --> 00:10:11,250 is just that something. 160 00:10:11,250 --> 00:10:16,530 So we have verified that with this way of constructing 161 00:10:16,530 --> 00:10:19,650 samples of X based on samples of U, 162 00:10:19,650 --> 00:10:23,225 the random variable that we get has the desired CDF. 163 00:10:26,950 --> 00:10:29,340 Let's look at an example now. 164 00:10:29,340 --> 00:10:32,010 Suppose that we want to generate samples 165 00:10:32,010 --> 00:10:36,710 of a random variable, which is an exponential random variable, 166 00:10:36,710 --> 00:10:38,720 with parameter 1. 167 00:10:38,720 --> 00:10:41,770 In this case, we know what the CDF is. 168 00:10:41,770 --> 00:10:47,600 The CDF of an exponential with parameter 1 is given by this 169 00:10:47,600 --> 00:10:50,000 formula, for non-negative x's. 170 00:10:54,110 --> 00:10:57,065 Now, let us find the inverse function. 171 00:11:01,080 --> 00:11:08,180 If a u corresponds to 1 minus e to the minus x-- 172 00:11:08,180 --> 00:11:12,870 so we started with some x here and we find the corresponding 173 00:11:12,870 --> 00:11:17,970 u-- this is the formula that takes us from x's to u's. 174 00:11:17,970 --> 00:11:21,990 Let's find the formula that takes us from u's to x's. 175 00:11:21,990 --> 00:11:24,910 So we need to solve this equation. 176 00:11:24,910 --> 00:11:27,910 Let's send u to the other side, and let's 177 00:11:27,910 --> 00:11:30,610 send this term to the left hand side. 178 00:11:30,610 --> 00:11:35,750 We obtain e to the minus x equals 1 minus u. 179 00:11:35,750 --> 00:11:41,060 Let us take logarithms: minus x equals to the logarithm 180 00:11:41,060 --> 00:11:43,530 of 1 minus u. 181 00:11:43,530 --> 00:11:51,600 And finally, x is equal to minus the logarithm of 1 minus u. 182 00:11:51,600 --> 00:11:54,850 So this is the inverse function. 183 00:11:54,850 --> 00:11:58,510 And now, what we have discussed leads us 184 00:11:58,510 --> 00:12:00,590 to the following procedure. 185 00:12:00,590 --> 00:12:03,870 I generate a random variable, U, according 186 00:12:03,870 --> 00:12:05,920 to the uniform distribution. 187 00:12:05,920 --> 00:12:09,420 Then I form the random variable X 188 00:12:09,420 --> 00:12:15,750 by taking the negative of the logarithm of 1 minus U. 189 00:12:15,750 --> 00:12:20,010 And this gives me a random variable, 190 00:12:20,010 --> 00:12:22,430 which has an exponential distribution. 191 00:12:22,430 --> 00:12:25,710 And so we have found a way of simulating 192 00:12:25,710 --> 00:12:29,590 exponential random variables, starting with a random number 193 00:12:29,590 --> 00:12:33,543 generator that produces uniform random variables.