Description: In this lecture, the professor discusses the central limit theorem, the normal approximation, the 1/2 correction for the binomial approximation, and the De Moivre–Laplace central limit theorem.
Instructor: John Tsitsiklis
Lecture 20: Central Limit T...
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.
PROFESSOR: We're going to finish today our discussion of limit theorems. I'm going to remind you what the central limit theorem is, which we introduced briefly last time. We're going to discuss what exactly it says and its implications. And then we're going to apply it to a couple of examples, mostly on the binomial distribution.
OK, so the situation is that we are dealing with a large number of independent, identically distributed random variables. And we want to look at the sum of them and say something about the distribution of the sum. We might want to say that the sum is distributed approximately as a normal random variable, although, formally, this is not quite right. As n goes to infinity, the distribution of the sum becomes very spread out, and it doesn't converge to a limiting distribution.
In order to get an interesting limit, we need first to take the sum and standardize it. By standardizing it, what we mean is to subtract the mean and then divide by the standard deviation. Now, the mean is, of course, n times the expected value of each one of the X's. And the standard deviation is the square root of the variance. The variance is n times sigma squared, where sigma squared is the variance of each of the X's -- so the standard deviation is sigma times the square root of n.
And after we do this, we obtain a random variable that has 0 mean -- it's centered -- and the variance is equal to 1. And so the variance stays the same, no matter how large n is going to be.
So the distribution of Zn keeps changing with n, but it cannot change too much. It stays in place. The mean is 0, and the width remains also roughly the same because the variance is 1. The surprising thing is that, as n grows, that distribution of Zn kind of settles in a certain asymptotic shape. And that's the shape of a standard normal random variable. So standard normal means that it has 0 mean and unit variance.
More precisely, what the central limit theorem tells us is a relation between the cumulative distribution function of Zn and the cumulative distribution function of the standard normal. So for any given number, c, the probability that Zn is less than or equal to c, in the limit, becomes the same as the probability that the standard normal is less than or equal to c. And of course, this is useful because these probabilities are available from the normal tables, whereas the distribution of Zn might be a very complicated expression if you were to calculate it exactly.
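For reference, the standardization and the limit statement described above can be written out as follows. The symbols $\mu$ (the mean of each $X_i$) and $\Phi$ (the standard normal CDF) are notation assumed here; the lecture itself only names $S_n$, $Z_n$, and $\sigma$.

```latex
\begin{gather*}
S_n = X_1 + \cdots + X_n, \qquad
Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}, \qquad
\mathbf{E}[Z_n] = 0, \quad \operatorname{var}(Z_n) = 1, \\
\lim_{n\to\infty} \mathbf{P}(Z_n \le c) = \Phi(c)
  = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{c} e^{-z^2/2}\, dz
  \quad \text{for every } c.
\end{gather*}
```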
So some comments about the central limit theorem. First thing is that it's quite amazing that it's universal. It doesn't matter what the distribution of the X's is. It can be any distribution whatsoever, as long as it has finite mean and finite variance. And when you go and do your approximations using the central limit theorem, the only thing that you need to know about the distribution of the X's are the mean and the variance. You need those in order to standardize Sn. I mean -- to subtract the mean and divide by the standard deviation -- you need to know the mean and the variance. But these are the only things that you need to know in order to apply it.
In addition, it's a very accurate computational shortcut. So the distribution of these Zn's, in principle, you can calculate it by convolution of the distribution of the X's with itself many, many times. But this is tedious, and if you try to do it analytically, it might be a very complicated expression. Whereas by just appealing to the standard normal table for the standard normal random variable, things are done in a very quick way. So it's a nice computational shortcut if you don't want to get an exact answer to a probability problem.
Now, at a more philosophical level, it justifies why we are really interested in normal random variables. Whenever you have a phenomenon which is noisy, and the noise that you observe is created by adding lots of little pieces of randomness that are independent of each other, the overall effect that you're going to observe can be described by a normal random variable. So in a classic example that goes 100 years back or so, suppose that you have a fluid, and inside that fluid, there's a little particle of dust or whatever that's suspended in there. That little particle gets hit by molecules completely at random -- and so what you're going to see is that particle kind of moving randomly inside that liquid.
Now that random motion, if you ask, after one second, how much is my particle displaced, let's say, in the x-axis along the x direction. That displacement is very, very well modeled by a normal random variable. And the reason is that the position of that particle is decided by the cumulative effect of lots of random hits by molecules that hit that particle.
So that's a sort of celebrated physical model that goes under the name of Brownian motion. And it's the same model that some people use to describe the movement in the financial markets. The argument might go that the movement of prices has to do with lots of little decisions and lots of little events by many, many different actors that are involved in the market. So the distribution of stock prices might be well described by normal random variables. At least that's what people wanted to believe until somewhat recently.
Now, the evidence is that, actually, these distributions are a little more heavy-tailed in the sense that extreme events are a little more likely to occur than what normal random variables would seem to indicate. But as a first model, again, it could be a plausible argument to have, at least as a starting model, one that involves normal random variables. So this is the philosophical side of things.
On the more accurate, mathematical side, it's important to appreciate exactly what kind of statement the central limit theorem is. It's a statement about the convergence of the CDF of these standardized random variables to the CDF of a normal. So it's a statement about convergence of CDFs. It's not a statement about convergence of PMFs, or convergence of PDFs.
Now, if one makes additional mathematical assumptions, there are variations of the central limit theorem that talk about PDFs and PMFs. But in general, that's not necessarily the case. And I'm going to illustrate this with-- I have a plot here which is not in your slides. But just to make the point, consider two different discrete distributions.
This discrete distribution takes values 1, 4, 7. This discrete distribution can take values 1, 2, 4, 6, and 7. So this one has sort of a periodicity of 3, this one, the range of values is a little more interesting. The numbers in these two distributions are cooked up so that they have the same mean and the same variance.
Now, what I'm going to do is to take eight independent copies of the random variable and plot the PMF of the sum of eight random variables. Now, if I plot the PMF of the sum of 8 of these, I get the plot, which corresponds to these bullets in this diagram. If I take 8 random variables, according to this distribution, and add them up and compute their PMF, the PMF I get is the one denoted here by the X's. The two PMFs look really different, at least, when you eyeball them.
On the other hand, if you were to plot the CDFs of them, then the CDFs, if you compare them with the normal CDF, which is this continuous curve, the CDF, of course, it goes up in steps because we're looking at discrete random variables. But it's very close to the normal CDF. And if we, instead of n equal to 8, we were to take 16, then the coincidence would be even better.
So in terms of CDFs, when we add 8 or 16 of these, we get very close to the normal CDF. We would get essentially the same picture if I were to take 8 or 16 of these. So the CDFs sit, essentially, on top of each other, although the two PMFs look quite different. So this is to appreciate that, formally speaking, we only have a statement about CDFs, not about PMFs.
Now in practice, how do you use the central limit theorem? Well, it tells us that we can calculate probabilities by treating Zn as if it were a standard normal random variable. Now Zn is a linear function of Sn. Conversely, Sn is a linear function of Zn. Linear functions of normals are normal. So if I pretend that Zn is normal, it's essentially the same as if we pretend that Sn is normal. And so we can calculate probabilities that have to do with Sn as if Sn were normal. Now, the central limit theorem does not tell us that Sn is approximately normal. The formal statement is about Zn, but, practically speaking, when you use the result, you can just pretend that Sn is normal.
Finally, it's a limit theorem, so it tells us about what happens when n goes to infinity. If we are to use it in practice, of course, n is not going to be infinity. Maybe n is equal to 15. Can we use a limit theorem when n is a small number, as small as 15?
Well, it turns out that it's a very good approximation. Even for quite small values of n, it gives us very accurate answers. So n of the order of 15, or 20, or so gives us very good results in practice. There are no good theorems that will give us hard guarantees because the quality of the approximation does depend on the details of the distribution of the X's. If the X's have a distribution that, from the outset, looks a little bit like the normal, then for small values of n, you are going to see, essentially, a normal distribution for the sum. If the distribution of the X's is very different from the normal, it's going to take a larger value of n for the central limit theorem to take effect.
So let's illustrate this with a few representative plots. So here, we're starting with a discrete uniform distribution that goes from 1 to 8. Let's add 2 of these random variables, 2 random variables with this PMF, and find the PMF of the sum. This is a convolution of 2 discrete uniforms, and I believe you have seen this exercise before. When you convolve this with itself, you get a triangle. So this is the PMF for the sum of two discrete uniforms.
Now let's continue. Let's convolve this with itself. This is going to give us the PMF of a sum of 4 discrete uniforms. And we get this, which starts looking like a normal. If we go to n equal to 32, then it looks, essentially, exactly like a normal. And it's an excellent approximation. So this is the PMF of the sum of 32 discrete random variables with this uniform distribution.
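The repeated convolution described here is easy to reproduce. The following is a minimal sketch, assuming PMFs are stored as plain dictionaries from value to probability; the helper name `convolve` and the variable names are made up for illustration and are not from the lecture. It builds the PMF of the sum of 2, 4, and 32 discrete uniforms on 1 through 8, and compares the peak of the 32-fold sum with the peak of a normal density that has the same mean and variance.

```python
from math import sqrt
from statistics import NormalDist


def convolve(p, q):
    """PMF of the sum of two independent random variables with PMFs p and q."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out


uniform = {k: 1 / 8 for k in range(1, 9)}  # discrete uniform on 1..8

pmf2 = convolve(uniform, uniform)  # triangle on 2..16
pmf4 = convolve(pmf2, pmf2)        # sum of 4 uniforms
pmf32 = pmf4
for _ in range(3):                 # 4 -> 8 -> 16 -> 32
    pmf32 = convolve(pmf32, pmf32)

# Normal density with the same mean and variance as the sum of 32 uniforms.
mean32 = 32 * 4.5
var32 = 32 * (8**2 - 1) / 12       # variance of a discrete uniform on 1..8 is 63/12
normal32 = NormalDist(mu=mean32, sigma=sqrt(var32))

mode32 = max(pmf32, key=pmf32.get)
print("peak of the triangle:", max(pmf2, key=pmf2.get), pmf2[9])
print("mode of the 32-fold sum:", mode32)
print("PMF at the mode vs normal density:", pmf32[mode32], normal32.pdf(mode32))
```

Because the summands take integer values, the PMF value at an integer is directly comparable to the normal density at that point.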
If we start with a PMF which is not symmetric-- this one is symmetric around the mean. But if we start with a PMF which is non-symmetric, so this, here, is a truncated geometric PMF, then things do not work out as nicely when I add 8 of these. That is, if I convolve this with itself 8 times, I get this PMF, which maybe resembles the normal one a little bit.
But you can really tell that it's different from the normal if you focus on the details here and there. Here it sort of rises sharply. Here it tails off a bit slower. So there's an asymmetry here that's present, and which is a consequence of the asymmetry of the distribution we started with. If we go to 16, it looks a little better, but still you can see the asymmetry between this tail and that tail.
If you get to 32 there's still a little bit of asymmetry, but at least now it starts looking like a normal distribution. So the moral from these plots is that it might vary, a little bit, what kind of values of n you need before you get the really good approximation. But for values of n in the range 20 to 30 or so, usually you expect to get a pretty good approximation. At least that's what the visual inspection of these graphs tells us.
So now that we know that we have a good approximation in our hands, let's use it. Let's use it by revisiting an example from last time. This is the polling problem. We're interested in the fraction f of the population that has a certain habit. And we try to find what f is. And the way we do it is by polling people at random and recording the answers that they give, whether they have the habit or not. So for each person, we get a Bernoulli random variable. With probability f, a person is going to respond 1, or yes, so this is with probability f. And with the remaining probability 1-f, the person responds no.
We record this number, which is how many people answered yes, divided by the total number of people that we asked. This is the fraction inside our sample that answered yes. And as we discussed last time, you might start with some specs for the poll. And the specs have two parameters-- the accuracy that you want and the confidence that you want to have that you did really obtain the desired accuracy. So the specs here are that we want probability 95% that our estimate is within 1 percentage point from the true answer.
So the event of interest is this: the distance of the result of the poll from the true answer is bigger than 1 percentage point. And we're interested in calculating or approximating this particular probability.
So we want to do it using the central limit theorem. And one way of arranging the mechanics of this calculation is to take the event of interest and massage it by subtracting and dividing things from both sides of this inequality so that you bring into the picture the standardized random variable, the Zn, and then apply the central limit theorem.
So the event of interest, let me write it in full, Mn is this quantity, so I'm putting it here, minus f, which is the same as nf divided by n. So this is the same as that event. We're going to calculate the probability of this. This is not exactly in the form in which we apply the central limit theorem. To apply the central limit theorem, we need, down here, to have sigma square root n.
So how can I put sigma square root n here? I can divide both sides of this inequality by sigma. And then I can take a factor of square root n from here and send it to the other side.
So this event is the same as that event. This will happen if and only if that will happen. So calculating the probability of this event here is the same as calculating the probability that this event happens.
And now we are in business because the random variable that we got in here is Zn, or the absolute value of Zn, and we're talking about the probability that Zn, absolute value of Zn, is bigger than a certain number. Since Zn is to be approximated by a standard normal random variable, our approximation is going to be, instead of asking for Zn being bigger than this number, we will ask for Z, absolute value of Z, being bigger than this number.
So this is the probability that we want to calculate. And now Z is a standard normal random variable. There's a small difficulty, the one that we also encountered last time. And the difficulty is that the standard deviation, sigma, of the Xi's is not known. Sigma, in this example, is the square root of f times (1-f), and the only thing that we know about sigma is that it's going to be a number less than or equal to 1/2.
OK, so we're going to have to use an inequality here. We're going to use a conservative value of sigma, the value of sigma equal to 1/2 and use that instead of the exact value of sigma. And this gives us an inequality going this way.
Let's just make sure why the inequality goes this way. We got, on our axis, two numbers. One number is 0.01 square root n divided by sigma. And the other number is 0.02 square root of n. And my claim is that the numbers are related to each other in this particular way.
Why is this? Sigma is less than or equal to 1/2. So 1/sigma is bigger than or equal to 2. So since 1/sigma is at least 2, this means that this number sits to the right of that number. So here we have the probability that Z is bigger than this number. The probability of falling out there is less than the probability of falling in this interval.
So that's what that last inequality is saying-- this probability is smaller than that probability. This is the probability that we're interested in, but since we don't know sigma, we take the conservative value, and we use an upper bound in terms of the probability of this interval here.
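The chain of steps just described can be summarized in one display. This is a sketch of the derivation in assumed notation: $M_n = S_n/n$ is the poll result, $S_n = X_1 + \cdots + X_n$ is the number of yes answers, and $Z$ is a standard normal random variable.

```latex
\begin{align*}
\mathbf{P}\bigl(|M_n - f| \ge 0.01\bigr)
  &= \mathbf{P}\!\left(\left|\frac{S_n - nf}{n}\right| \ge 0.01\right) \\
  &= \mathbf{P}\!\left(\left|\frac{S_n - nf}{\sigma\sqrt{n}}\right| \ge \frac{0.01\sqrt{n}}{\sigma}\right) \\
  &\approx \mathbf{P}\!\left(|Z| \ge \frac{0.01\sqrt{n}}{\sigma}\right) \\
  &\le \mathbf{P}\bigl(|Z| \ge 0.02\sqrt{n}\bigr),
  \qquad \text{since } \sigma = \sqrt{f(1-f)} \le \tfrac{1}{2}.
\end{align*}
```

The last step uses the fact that a larger threshold leaves less probability in the two tails, so replacing $0.01\sqrt{n}/\sigma$ by the smaller number $0.02\sqrt{n}$ can only increase the probability.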
And now we are in business. We can start using our normal tables to calculate probabilities of interest. So for example, let's say that we take n to be 10,000. How is the calculation going to go? We want to calculate the probability that the absolute value of Z is bigger than 0.2 times 1000, which is the probability that the absolute value of Z is larger than or equal to 2.
And here let's do some mechanics, just to stay in shape. The probability that you're larger than or equal to 2 in absolute value, since the normal is symmetric around the mean, this is going to be twice the probability that Z is larger than or equal to 2.
Can we use the cumulative distribution function of Z to calculate this? Well, almost. The cumulative gives us probabilities of being less than something, not bigger than something. So we need one more step and write this as 1 minus the probability that Z is less than or equal to 2.
And this probability, now, you can read off from the normal tables. And the normal tables will tell you that this probability is 0.9772. And you do get an answer. And the answer is 0.0456. OK, so we tried 10,000. And we find that our probability of error is 4.5%, so we're doing better than the spec that we had. So this tells us that maybe we have some leeway. Maybe we can use a smaller sample size and still stay within our specs.
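The table lookup for n equal to 10,000 can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library in place of a printed normal table; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

n = 10_000
# Threshold 0.01 * sqrt(n) / sigma with the conservative value sigma = 1/2.
threshold = 0.02 * sqrt(n)

# Two tails: P(|Z| >= t) = 2 * (1 - P(Z <= t)) by symmetry.
tail_bound = 2 * (1 - Z.cdf(threshold))

print("threshold:", threshold)
print("bound on the probability of error:", round(tail_bound, 4))
```

The result agrees with the table-based 0.0456 up to rounding of the table entry 0.9772.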
Let's try to find how much we can push the envelope. How much smaller can we take n? To answer that question, we need to do this kind of calculation, essentially, going backwards. We're going to fix this number to be 0.05 and work backwards here to find-- did I do a mistake here? 10,000. So I'm missing a 0 here. Ah, but I'm taking the square root, so it's 100.
Where did the 0.02 come in from? Ah, from here. OK, all right. 0.02 times 100, that gives us 2. OK, all right. Very good, OK. So we'll have to do this calculation now backwards, figure out if this is 0.05, what kind of number we're going to need here and then here, and from this we will be able to tell what value of n do we need.
OK, so we want to find n such that the probability that the absolute value of Z is bigger than 0.02 square root n is 0.05. OK, so Z is a standard normal random variable. And we want the probability that we are outside this range. We want the probability of those two tails together. Those two tails together should have probability of 0.05. This means that this tail, by itself, should have probability 0.025. And this means that this probability should be 0.975.
Now, if this probability is to be 0.975, what should that number be? You go to the normal tables, and you find which is the entry that corresponds to that number. I actually brought a normal table with me. And 0.975 is down here. And it tells you that the number that corresponds to it is 1.96.
So this tells us that this number should be equal to 1.96. And now, from here, you do the calculations. And you find that n is 9604. So with a sample of 10,000, we got probability of error 4.5%. With a slightly smaller sample size of 9,600, we can get the probability of a mistake to be 0.05, which was exactly our spec.
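Working backwards from the spec to the sample size can be sketched the same way. The snippet below assumes the same conservative bound with sigma equal to 1/2, and uses `NormalDist.inv_cdf` as a stand-in for reading the normal table in reverse.

```python
from math import ceil
from statistics import NormalDist

spec = 0.05  # allowed probability of missing by more than 1 percentage point

# Each tail gets spec / 2, so we need the point where the CDF equals 0.975.
z = NormalDist().inv_cdf(1 - spec / 2)

# Solve 0.02 * sqrt(n) = z for n and round up to a whole number of people.
n_required = ceil((z / 0.02) ** 2)

print("z value:", round(z, 2))
print("required sample size:", n_required)
```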
So these are essentially the two ways that you're going to be using the central limit theorem. Either you're given n and you try to calculate probabilities. Or you're given the probabilities, and you want to work backwards to find n itself.
So in this example, the random variable that we dealt with was, of course, a binomial random variable. The Xi's were Bernoulli, so the sum of the Xi's was binomial. So the central limit theorem certainly applies to the binomial distribution. To be more precise, of course, it applies to the standardized version of the binomial random variable.
So here's what we did, essentially, in the previous example. We fixed the number p, which is the probability of success in our experiments. p corresponds to f in the previous example. Let every Xi be a Bernoulli random variable, and our standing assumption is that these random variables are independent. When we add them, we get a random variable that has a binomial distribution. We know the mean and the variance of the binomial, so we take Sn, we subtract the mean, which is this, divide by the standard deviation. The central limit theorem tells us that the cumulative distribution function of this random variable converges to the standard normal cumulative distribution function in the limit.
So let's do one more example of a calculation. Let's take n to be-- let's choose some specific numbers to work with. So in this example, first thing to do is to find the expected value of Sn, which is n times p. It's 18.
Then we need to write down the standard deviation. The variance of Sn is the sum of the variances. It's np times (1-p). And in this particular example, p times (1-p) is 1/4, n is 36, so this is 9. And that tells us that the standard deviation of Sn is equal to 3.
So what we're going to do is to take the event of interest, which is Sn less than or equal to 21, and rewrite it in a way that involves the standardized random variable. So to do that, we need to subtract the mean. So we write this as Sn-3 should be less than or equal to 21-3. This is the same event. And then divide by the standard deviation, which is 3, and we end up with this. So the event itself of--
AUDIENCE: [INAUDIBLE].
PROFESSOR: Should subtract 18, yes, which gives me a much nicer number out here, which is 1. So the event of interest, that Sn is less than or equal to 21, is the same as the event that a standard normal random variable is less than or equal to 1. And once more, you can look this up at the normal tables. And you find that the answer that you get is 0.8413.
Now it's interesting to compare this answer that we got through the central limit theorem with the exact answer. The exact answer involves the exact binomial distribution. What we have here is the binomial probability that Sn is equal to k. Sn being equal to k is given by this formula. And we add, over all values for k going from 0 up to 21, we write two lines of code to calculate this sum, and we get the exact answer, which is 0.8785. So there's pretty good agreement between the two, although you wouldn't necessarily call that excellent agreement.
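The "two lines of code" are not shown in the lecture; one possible version, using only the Python standard library, is sketched below. It takes n equal to 36 and p equal to 1/2 from the example, sums the binomial PMF from 0 to 21, and sets it next to the plain central limit theorem approximation.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 36, 0.5

# Exact answer: add the binomial PMF over k = 0, 1, ..., 21.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(22))

# Central limit theorem approximation: standardize 21 and use the normal CDF.
mean = n * p                   # 18
std = sqrt(n * p * (1 - p))    # 3
clt = NormalDist().cdf((21 - mean) / std)

print("exact binomial probability:", round(exact, 4))
print("normal approximation:", round(clt, 4))
```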
Can we do a little better than that? OK. It turns out that we can. And here's the idea. So our random variable Sn has a mean of 18. It has a binomial distribution. It's described by a PMF that has a shape roughly like this and which keeps going on.
Using the central limit theorem is basically pretending that Sn is normal with the right mean and variance. So pretending that Zn has 0 mean unit variance, we approximate it with Z, that has 0 mean unit variance. If you were to pretend that Sn is normal, you would approximate it with a normal that has the correct mean and correct variance. So it would still be centered at 18. And it would have the same variance as the binomial PMF.
So using the central limit theorem essentially means that we keep the mean and the variance what they are but we pretend that our distribution is normal. We want to calculate the probability that Sn is less than or equal to 21. I pretend that my random variable is normal, so I draw a line here and I calculate the area under the normal curve going up to 21. That's essentially what we did.
Now, a smart person comes around and says, Sn is a discrete random variable. So the event that Sn is less than or equal to 21 is the same as Sn being strictly less than 22 because nothing in between can happen. So I'm going to use the central limit theorem approximation by pretending again that Sn is normal and finding the probability of this event while pretending that Sn is normal. So what this person would do would be to draw a line here, at 22, and calculate the area under the normal curve all the way to 22.
Who is right? Which one is better? Well neither, but we can do better than both if we sort of split the difference. So another way of writing the same event for Sn is to write it as Sn being less than 21.5. In terms of the discrete random variable Sn, all three of these are exactly the same event. But when you do the continuous approximation, they give you different probabilities. It's a matter of whether you integrate the area under the normal curve up to here, up to the midway point, or up to 22. It turns out that integrating up to the midpoint is what gives us the better numerical results. So we take here 21 and 1/2, and we integrate the area under the normal curve up to here.
So let's do this calculation and see what we get. What would we change here? Instead of 21, we would now write 21 and 1/2. This 18 becomes, no, that 18 stays what it is. But this 21 becomes 21 and 1/2. And so this one becomes 1 + 0.5 divided by 3. This is 1.17.
So we now look up into the normal tables and ask for the probability that Z is less than 1.17. So this here gets approximated by the probability that the standard normal is less than 1.17. And the normal tables will tell us this is 0.879.
Going back to the previous slide, what we got this time with this improved approximation is 0.879. This is a really good approximation of the correct number. This is what we got using the 21. This is what we get using the 21 and 1/2. And it's an approximation that's sort of right on-- a very good one.
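To see the effect of the cutoff choice side by side, here is a small sketch that pretends Sn is normal with mean 18 and standard deviation 3 and integrates up to 21, 21 and 1/2, and 22, comparing each with the exact binomial value. The names `approx_sn`, `cutoffs`, and `best` are illustrative and not from the lecture.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 36, 0.5
# Normal with the same mean and variance as Sn.
approx_sn = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Exact P(Sn <= 21) for p = 1/2: count outcomes and divide by 2^36.
exact = sum(comb(n, k) for k in range(22)) / 2**n

cutoffs = (21, 21.5, 22)
approx = {c: approx_sn.cdf(c) for c in cutoffs}
for c in cutoffs:
    print(f"cutoff {c}: normal area {approx[c]:.4f}, error {approx[c] - exact:+.4f}")

# The cutoff whose normal area is closest to the exact probability.
best = min(cutoffs, key=lambda c: abs(approx[c] - exact))
print("closest cutoff:", best)
```

In this example the area up to 21 undershoots, the area up to 22 overshoots, and the midpoint lands within about 0.001 of the exact value.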
The moral from this numerical example is that doing this 1/2 correction does give us better approximations. In fact, we can use this 1/2 idea to even calculate individual probabilities. So suppose you want to approximate the probability that Sn is equal to 19. If you were to pretend that Sn is normal and calculate this probability, the probability that the normal random variable is equal to 19 is 0. So you don't get an interesting answer.
You get a more interesting answer by writing this event, 19 as being the same as the event of falling between 18 and 1/2 and 19 and 1/2 and using the normal approximation to calculate this probability. In terms of our previous picture, this corresponds to the following.
We are interested in the probability that Sn is equal to 19. So we're interested in the height of this bar. We're going to consider the area under the normal curve going from here to here, and use this area as an approximation for the height of that particular bar.
So what we're basically doing is, we take the probability under the normal curve that's assigned over a continuum of values and attribute it to different discrete values. Whatever is above the midpoint gets attributed to 19. Whatever is below that midpoint gets attributed to 18. So this green area is our approximation of the value of the PMF at 19.
So similarly, if you wanted to approximate the value of the PMF at this point, you would take this interval and integrate the area under the normal curve over that interval. It turns out that this gives a very good approximation of the PMF of the binomial. And actually, this was the context in which the central limit theorem was proved in the first place, when this business started.
So this business goes back a few hundred years. And the central limit theorem was first proved by considering the PMF of a binomial random variable when p is equal to 1/2. People did the algebra, and they found out that the exact expression for the PMF is quite well approximated by the expression that you would get from a normal distribution. Then the proof was extended to binomials for more general values of p.
So here we talk about this as a refinement of the general central limit theorem, but, historically, that refinement was where the whole business got started in the first place. All right, so let's go through the mechanics of approximating the probability that Sn is equal to 19-- exactly 19. As we said, we're going to write this event as an event that covers an interval of unit length from 18 and 1/2 to 19 and 1/2. This is the event of interest.
First step is to massage the event of interest so that it involves our Zn random variable. So subtract 18 from all sides. Divide by the standard deviation of 3 from all sides. That's the equivalent representation of the event. This is our standardized random variable Zn. These are just these numbers.
And to do an approximation, we want to find the probability of this event, but Zn is approximately normal, so we plug in here the Z, which is the standard normal. So we want to find the probability that the standard normal falls inside this interval. You find these using CDFs because this is the probability that you're less than this but not less than that. So it's a difference between two cumulative probabilities.
Then, you look up your normal tables. You find two numbers for these quantities, and, finally, you get a numerical answer for an individual entry of the PMF of the binomial. This is a pretty good approximation, it turns out. If you were to do the calculations using the exact formula, you would get something which is pretty close-- an error in the third digit-- this is pretty good.
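The individual-probability calculation can be sketched as follows, again with the standard library standing in for the normal table. The exact value uses the binomial formula with n equal to 36 and p equal to 1/2.

```python
from math import comb
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Event 18.5 <= Sn <= 19.5, standardized with mean 18 and standard deviation 3.
lower = (18.5 - 18) / 3
upper = (19.5 - 18) / 3

# Difference of two cumulative probabilities.
approx = Z.cdf(upper) - Z.cdf(lower)

# Exact binomial probability P(Sn = 19) for n = 36, p = 1/2.
exact = comb(36, 19) / 2**36

print("normal approximation with 1/2 correction:", round(approx, 4))
print("exact binomial PMF at 19:", round(exact, 4))
```

Computed to full precision the two values differ by less than 0.001; rounding the arguments to two decimals for a printed table gives a slightly larger gap.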
So I guess what we did here with our discussion of the binomial slightly contradicts what I said before-- that the central limit theorem is a statement about cumulative distribution functions. In general, it doesn't tell you what to do to approximate PMFs themselves. And that's indeed the case in general. On the other hand, for the special case of a binomial distribution, the central limit theorem approximation, with this 1/2 correction, is a very good approximation even for the individual PMF.
All right, so we spent quite a bit of time on mechanics. So let's spend the last few minutes today thinking a bit and look at a small puzzle. So the puzzle is the following. Consider a Poisson process that runs over a unit interval, and where the arrival rate is equal to 1. So this is the unit interval. And let X be the number of arrivals. And this is Poisson, with mean 1.
Now, let me take this interval and divide it into n little pieces. So each piece has length 1/n. And let Xi be the number of arrivals during the i-th little interval.
OK, what do we know about the random variables Xi? They are themselves Poisson. It's a number of arrivals during a small interval. We also know that when n is big, so the length of the interval is small, these Xi's are approximately Bernoulli, with mean 1/n.
I guess it doesn't matter whether we model them as Bernoulli or not. What matters is that the Xi's are independent. Why are they independent? Because, in a Poisson process, these disjoint intervals are independent of each other.
So the Xi's are independent. And they also have the same distribution. And we have that X, the total number of arrivals, is the sum of the Xi's. So the central limit theorem tells us that, approximately, the sum of independent, identically distributed random variables, when we have lots of these random variables, behaves like a normal random variable. So by using this decomposition of X into a sum of i.i.d. random variables, and by using values of n that are bigger and bigger, by taking the limit, it should follow that X has a normal distribution.
On the other hand, we know that X has a Poisson distribution. So something must be wrong in this argument here. Can we really use the central limit theorem in this situation?
So what do we need for the central limit theorem? We need to have independent, identically distributed random variables. We have it here. We want them to have a finite mean and finite variance. We also have it here, means and variances are finite.
What is another assumption that was never made explicit, but essentially was there? Or in other words, what is the flaw in this argument that uses the central limit theorem here? Any thoughts?
So in the central limit theorem, we said, consider-- fix a probability distribution, and let the Xi's be distributed according to that probability distribution, and add a larger and larger number of Xi's. But the underlying, unstated assumption is that we fix the distribution of the Xi's. As we let n increase, the statistics of each Xi do not change.
Whereas here, I'm playing a trick on you. As I'm taking more and more random variables, I'm actually changing what those random variables are. When I take a larger n, the Xi's are random variables with a different mean and different variance. So I'm adding more of these, but at the same time, in this example, I'm changing their distributions.
That's something that doesn't fit the setting of the central limit theorem. In the central limit theorem, you first fix the distribution of the X's. You keep it fixed, and then you consider adding more and more according to that particular fixed distribution. So that's the catch. That's why the central limit theorem does not apply to this situation. And we're lucky that it doesn't apply because, otherwise, we would have a huge contradiction destroying probability theory.
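One way to see the catch in numbers is to track how lopsided the sum is. The sketch below is an illustration added here, not something from the lecture. It relies on two standard facts: a Poisson variable with parameter lam has mean lam, variance lam, and skewness 1/sqrt(lam), and the skewness of a sum of n i.i.d. terms is the per-term skewness divided by sqrt(n). The function name `skewness_of_sum` is hypothetical.

```python
import math

def skewness_of_sum(n, piece_rate):
    """Skewness of the sum of n i.i.d. Poisson(piece_rate) pieces.

    Uses skewness(Poisson(lam)) = 1/sqrt(lam), and the fact that summing
    n i.i.d. terms divides the per-term skewness by sqrt(n).
    """
    per_piece = 1.0 / math.sqrt(piece_rate)
    return per_piece / math.sqrt(n)

for n in (1, 10, 100, 10000):
    # Central limit theorem setting: every piece is Poisson(1), whatever n is.
    fixed = skewness_of_sum(n, 1.0)
    # The puzzle: every piece is Poisson(1/n), so mean and variance are 1/n.
    shrinking = skewness_of_sum(n, 1.0 / n)
    print(f"n={n}: piece mean and variance in the puzzle = {1.0 / n:.4f}, "
          f"skewness of sum (fixed pieces) = {fixed:.4f}, "
          f"skewness of sum (puzzle) = {shrinking:.4f}")
```

With a fixed distribution, the skewness of the sum shrinks toward 0, which is what a normal limit requires. In the puzzle, it stays at 1 for every n, because each piece becomes more lopsided exactly as fast as the averaging smooths things out.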
OK, but now that still leaves us with a little bit of a dilemma. Suppose that, here, essentially we're adding independent Bernoulli random variables. So the issue is that the central limit theorem has to do with asymptotics as n goes to infinity. And if we consider a binomial, and somebody gives us specific numbers about the parameters of that binomial, it might not necessarily be obvious what kind of approximation we should use.
In particular, we do have two different approximations for the binomial. If we fix p, then the binomial is the sum of Bernoulli's that come from a fixed distribution, and we consider more and more of these. When we add them, the central limit theorem tells us that we get the normal distribution.
There's another sort of limit, which has the flavor of this example, in which we still deal with a binomial, a sum of n Bernoulli's. We let the number of Bernoulli's in that sum go to infinity. But each Bernoulli has a probability of success that goes to 0, and we do this in a way so that np, the expected number of successes, stays finite.
This is the situation that we dealt with when we first defined our Poisson process. We have a very, very large number, so lots, of time slots, but during each time slot, there's a tiny probability of obtaining an arrival. Under that setting, in discrete time, we have a binomial distribution, or Bernoulli process, but when we take the limit, we obtain the Poisson process and the Poisson approximation.
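This second limit is easy to check numerically. Here is a minimal Python sketch, assuming np is held at the value 1 (a choice made only for this illustration). It measures the largest gap between the binomial PMF with p = 1/n and the Poisson PMF as n grows. The function names are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """P(S = k) for S ~ binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_gap(n, lam, kmax=10):
    """Largest PMF gap over k = 0..kmax between binomial(n, lam/n) and Poisson(lam)."""
    p = lam / n  # success probability shrinks so that n * p stays equal to lam
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(kmax + 1))

lam = 1.0  # assumed value of n * p for this illustration
for n in (10, 100, 1000, 10000):
    print(f"n={n}, p={lam / n}: largest PMF gap = {max_gap(n, lam):.2e}")
```

The gap shrinks as n grows with np held fixed, which is the regime in which the Poisson approximation is justified.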
So these are two equally valid approximations of the binomial. But they're valid in different asymptotic regimes. In one regime, we fix p and let n go to infinity. In the other regime, we let both n and p change simultaneously.
Now, in real life, you're never dealing with the limiting situations. You're dealing with actual numbers. So if somebody tells you that the numbers are like this, then you should probably say that this is the situation that fits the Poisson description-- large number of slots with each slot having a tiny probability of success.
On the other hand, if p is something like this, and n is 500, then you expect the distribution of the number of successes to have a mean of 50 and a fair amount of spread around there. It turns out that the normal approximation would be better in this context.
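The two cases can be compared side by side. The sketch below is a hedged illustration, not the lecture's own calculation. The pair n = 500, p = 0.1 is inferred from the stated mean of 50, and the pair n = 100, p = 0.01 is a hypothetical stand-in for the many-slots, tiny-probability case, since the actual numbers shown in class are not in the transcript. The normal approximation uses the 1/2 correction, and the error measure (largest PMF gap) is a choice made here.

```python
import math

def binomial_pmf(k, n, p):
    """P(S = k) for S ~ binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam), computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pmf_estimate(k, n, p):
    """Normal estimate of P(S = k) using the 1/2 correction."""
    mean = n * p
    std = math.sqrt(n * p * (1.0 - p))
    return normal_cdf((k + 0.5 - mean) / std) - normal_cdf((k - 0.5 - mean) / std)

def max_pmf_errors(n, p):
    """Return (Poisson error, normal error): largest PMF gaps over k = 0..n."""
    poisson_err = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p))
                      for k in range(n + 1))
    normal_err = max(abs(binomial_pmf(k, n, p) - normal_pmf_estimate(k, n, p))
                     for k in range(n + 1))
    return poisson_err, normal_err

# (500, 0.1) is inferred from the stated mean of 50; (100, 0.01) is hypothetical.
for n, p in ((500, 0.1), (100, 0.01)):
    poisson_err, normal_err = max_pmf_errors(n, p)
    print(f"n={n}, p={p}, np={n * p:g}: "
          f"Poisson error = {poisson_err:.4f}, normal error = {normal_err:.4f}")
```

Under these assumed numbers, the normal approximation has the smaller error when np is 50, and the Poisson approximation has the smaller error when np is 1.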
As a rule of thumb, if n times p is bigger than 10 or 20, you can start using the normal approximation. If n times p is a small number, then you prefer to use the Poisson approximation. But there are no hard theorems or rules about how to go about this.
OK, so from next time we're going to switch base again. And we're going to put together everything we learned in this class to start solving inference problems.