1 00:00:00,690 --> 00:00:04,270 We now discuss how to come up with confidence intervals when 2 00:00:04,270 --> 00:00:08,550 we try to estimate the unknown mean of some random variable, 3 00:00:08,550 --> 00:00:10,830 or of some distribution, using the 4 00:00:10,830 --> 00:00:13,170 sample mean as our estimator. 5 00:00:13,170 --> 00:00:17,310 So here X1 up to Xn are independent, identically 6 00:00:17,310 --> 00:00:20,320 distributed random variables that are drawn from a 7 00:00:20,320 --> 00:00:23,320 distribution that has a certain mean theta, the 8 00:00:23,320 --> 00:00:26,410 quantity that we want to estimate, and some variance 9 00:00:26,410 --> 00:00:28,400 sigma squared. 10 00:00:28,400 --> 00:00:30,570 Let us say that we want to construct a 11 00:00:30,570 --> 00:00:33,320 95% confidence interval. 12 00:00:33,320 --> 00:00:36,790 Our starting point will be the fact that the sample mean, 13 00:00:36,790 --> 00:00:40,050 according to the central limit theorem, can be described 14 00:00:40,050 --> 00:00:43,740 approximately using normal distributions. 15 00:00:43,740 --> 00:00:48,420 And we look up at the normal table, and we observe the 16 00:00:48,420 --> 00:00:49,450 following-- 17 00:00:49,450 --> 00:00:55,070 that if we take a standard normal random variable, then 18 00:00:55,070 --> 00:01:02,430 there is probability, 97.5% of falling below this number, 19 00:01:02,430 --> 00:01:07,570 1.96, which means that there is probability 2 1/2% of 20 00:01:07,570 --> 00:01:09,910 falling above that number. 21 00:01:09,910 --> 00:01:12,760 And by symmetry, the probability of falling below 22 00:01:12,760 --> 00:01:17,990 minus 1.96 is also 2 1/2%. 23 00:01:17,990 --> 00:01:21,810 This means that this middle interval here has probability 24 00:01:21,810 --> 00:01:26,070 95%, and we exploit this fact as follows. 25 00:01:28,800 --> 00:01:32,860 If we take the sample mean, subtract the true mean, and 26 00:01:32,860 --> 00:01:36,250 then divide by the standard deviation of the sample mean, 27 00:01:36,250 --> 00:01:39,479 then we obtain a random variable, which is 28 00:01:39,479 --> 00:01:42,630 approximately a standard normal. 29 00:01:42,630 --> 00:01:46,320 Therefore, what we have here is the probability of an 30 00:01:46,320 --> 00:01:49,039 approximately standard normal random variable. 31 00:01:49,039 --> 00:01:55,420 Or actually, its absolute value falling below 1.96. 32 00:01:55,420 --> 00:01:59,710 This is just the event that our standard normal falls 33 00:01:59,710 --> 00:02:04,180 inside this middle interval here, according to this entry 34 00:02:04,180 --> 00:02:07,090 from the normal tables and the previous discussion, this 35 00:02:07,090 --> 00:02:10,179 probability is going to be approximately 95%. 36 00:02:12,990 --> 00:02:17,470 And now we take this statement, send this term to 37 00:02:17,470 --> 00:02:20,710 the other side of the inequality, and then interpret 38 00:02:20,710 --> 00:02:22,910 what it means for an absolute value to 39 00:02:22,910 --> 00:02:24,540 be less than something. 40 00:02:24,540 --> 00:02:28,130 And we obtain an equivalent statement. 41 00:02:28,130 --> 00:02:32,320 This event here is algebraically identical to the 42 00:02:32,320 --> 00:02:35,370 event that we have up there, and this provides us with the 43 00:02:35,370 --> 00:02:37,360 desired confidence interval. 44 00:02:37,360 --> 00:02:40,920 We think of this quantity here as the lower end of the 45 00:02:40,920 --> 00:02:42,320 confidence interval. 46 00:02:42,320 --> 00:02:45,310 This quantity here is the upper end of 47 00:02:45,310 --> 00:02:47,010 the confidence interval. 48 00:02:47,010 --> 00:02:50,730 And this statement tells us that there is probability 49 00:02:50,730 --> 00:02:55,550 approximately equal to 95% that the confidence interval 50 00:02:55,550 --> 00:02:59,560 constructed this way contains the true value 51 00:02:59,560 --> 00:03:02,320 of the unknown parameter. 52 00:03:02,320 --> 00:03:06,650 So this is how we obtain a 95% confidence interval. 53 00:03:06,650 --> 00:03:11,600 If instead we wanted a 90% confidence interval, we would 54 00:03:11,600 --> 00:03:14,210 proceed in more or less in the same way. 55 00:03:14,210 --> 00:03:20,070 Here, we would want to have the number 0.95. 56 00:03:20,070 --> 00:03:21,290 Why is that? 57 00:03:21,290 --> 00:03:26,020 We want this middle interval to have probability 90%, which 58 00:03:26,020 --> 00:03:30,890 means that we want to have probability 5% at 59 00:03:30,890 --> 00:03:32,525 each one of the tails. 60 00:03:35,550 --> 00:03:39,510 And then we look up at the normal tables, and we find 61 00:03:39,510 --> 00:03:44,980 that the entry that gives us probability 95% of being below 62 00:03:44,980 --> 00:03:48,950 that value is 1.645. 63 00:03:48,950 --> 00:03:54,910 So if we use 1.645 in place of 1.96, we obtain a 90% 64 00:03:54,910 --> 00:03:58,310 confidence interval, and similarly for other choices. 65 00:03:58,310 --> 00:04:03,290 For example, if we want a 99% confidence interval. 66 00:04:03,290 --> 00:04:06,660 There's only one issue that's left to discuss, and this is 67 00:04:06,660 --> 00:04:07,620 the following. 68 00:04:07,620 --> 00:04:11,860 In order to obtain numerical values for the endpoints of 69 00:04:11,860 --> 00:04:15,890 the confidence interval, we need to know sigma, the 70 00:04:15,890 --> 00:04:17,940 standard deviation of the random 71 00:04:17,940 --> 00:04:20,010 variables that we are observing. 72 00:04:20,010 --> 00:04:23,160 But if we do not know the value of sigma, then we may 73 00:04:23,160 --> 00:04:24,720 have to do some additional work.