1 00:00:01,080 --> 00:00:03,660 We have claimed that normal random variables are very 2 00:00:03,660 --> 00:00:06,880 important, and therefore we would like to be able to 3 00:00:06,880 --> 00:00:09,280 calculate probabilities associated with them. 4 00:00:09,280 --> 00:00:11,940 For example, given a normal random variable, what is the 5 00:00:11,940 --> 00:00:15,200 probability that it takes a value less than 5? 6 00:00:15,200 --> 00:00:18,470 Unfortunately, there are no closed form expressions that 7 00:00:18,470 --> 00:00:20,210 can help us with this. 8 00:00:20,210 --> 00:00:23,040 In particular, the CDF, the Cumulative Distribution 9 00:00:23,040 --> 00:00:26,030 Function of normal random variables, is not given in 10 00:00:26,030 --> 00:00:27,120 closed form. 11 00:00:27,120 --> 00:00:30,960 But fortunately, we do have tables for the standard normal 12 00:00:30,960 --> 00:00:32,820 random variable. 13 00:00:32,820 --> 00:00:38,270 These tables, which take the form shown here, give us the 14 00:00:38,270 --> 00:00:40,200 following information. 15 00:00:40,200 --> 00:00:43,930 If we have a normal random variable, which is a standard 16 00:00:43,930 --> 00:00:48,150 normal, they tell us the values of the cumulative 17 00:00:48,150 --> 00:00:55,240 distribution function for different values of little y. 18 00:00:55,240 --> 00:00:59,870 In terms of a picture, if this is the PDF of a standard 19 00:00:59,870 --> 00:01:04,840 normal and I give you a value little y, I'm interested in 20 00:01:04,840 --> 00:01:08,250 the corresponding value of the CDF, which is the 21 00:01:08,250 --> 00:01:10,150 area under the curve. 22 00:01:10,150 --> 00:01:13,770 Well, that value, the area under this curve, is exactly 23 00:01:13,770 --> 00:01:16,580 what this table is giving to us. 24 00:01:16,580 --> 00:01:19,990 And there's a shorthand notation for referring to the 25 00:01:19,990 --> 00:01:24,510 CDF of the standard normal, which is just phi of y. 26 00:01:24,510 --> 00:01:27,390 Let us see how we use this table. 27 00:01:27,390 --> 00:01:31,140 Suppose we're interested in phi of 0. 28 00:01:31,140 --> 00:01:33,900 Which is the probability that our standard normal takes a 29 00:01:33,900 --> 00:01:36,570 value less than or equal to 0? 30 00:01:36,570 --> 00:01:40,259 Well, by symmetry since the PDF is symmetric around 0, we 31 00:01:40,259 --> 00:01:44,180 know that this probability should be 0.5. 32 00:01:44,180 --> 00:01:46,990 Let's see what the table tells us. 33 00:01:46,990 --> 00:01:53,690 0 corresponds to this entry, which is indeed 0.5. 34 00:01:53,690 --> 00:01:57,789 Let us look up the probability that our standard normal takes 35 00:01:57,789 --> 00:02:01,330 a value less than, let's say, 1.16. 36 00:02:01,330 --> 00:02:03,580 How do we find this information? 37 00:02:03,580 --> 00:02:06,620 1 is here. 38 00:02:06,620 --> 00:02:11,770 And 1.1 is here. 39 00:02:11,770 --> 00:02:17,350 1.1, and then we have a 6 in the next decimal place, which 40 00:02:17,350 --> 00:02:20,590 leads us to this entry. 41 00:02:20,590 --> 00:02:27,660 And so this value is 0.8770. 42 00:02:27,660 --> 00:02:31,200 Similarly, we can calculate the probability that the 43 00:02:31,200 --> 00:02:35,030 normal is less than 2.9. 44 00:02:35,030 --> 00:02:37,010 How do we look up this information? 45 00:02:37,010 --> 00:02:38,860 2.9 is here. 46 00:02:38,860 --> 00:02:42,520 We do not have another decimal digit, so we're looking at 47 00:02:42,520 --> 00:02:43,640 this column. 48 00:02:43,640 --> 00:02:50,910 And we obtain this value, which is 0.9981. 49 00:02:50,910 --> 00:02:54,930 And by looking at this number we can actually tell that a 50 00:02:54,930 --> 00:02:59,810 standard normal random variable has extremely low 51 00:02:59,810 --> 00:03:04,400 probability of being bigger than 2.9. 52 00:03:04,400 --> 00:03:08,850 Now notice that the table specifies phi of y for y being 53 00:03:08,850 --> 00:03:10,190 non-negative. 54 00:03:10,190 --> 00:03:15,820 What if we wish to calculate the value, for example, of phi 55 00:03:15,820 --> 00:03:18,860 of minus 2? 56 00:03:18,860 --> 00:03:23,980 In terms of a picture, this is a standard normal. 57 00:03:23,980 --> 00:03:26,870 Here is minus 2. 58 00:03:26,870 --> 00:03:29,570 And we wish to calculate this probability. 59 00:03:29,570 --> 00:03:32,040 There's nothing in the table that gives us this probability 60 00:03:32,040 --> 00:03:35,150 directly, but we can argue as follows. 61 00:03:35,150 --> 00:03:37,730 The normal PDF is symmetric. 62 00:03:37,730 --> 00:03:43,530 So if we look at 2, then this probability here, which is phi 63 00:03:43,530 --> 00:03:47,200 of minus 2, is the same as that probability 64 00:03:47,200 --> 00:03:49,020 here, of that tail. 65 00:03:49,020 --> 00:03:51,810 What is the probability of that tail? 66 00:03:51,810 --> 00:03:56,750 It's 1, which is the overall area under the curve, minus 67 00:03:56,750 --> 00:04:01,530 the area under the curve when you go up to the value of 2. 68 00:04:04,540 --> 00:04:11,340 So this quantity is going to be the same as phi of minus 2. 69 00:04:11,340 --> 00:04:14,570 And this one we can now get from the tables. 70 00:04:14,570 --> 00:04:15,830 It's 1 minus-- 71 00:04:15,830 --> 00:04:18,709 let us see, 2 is here. 72 00:04:18,709 --> 00:04:27,170 It's 1 minus 0.9772. 73 00:04:27,170 --> 00:04:30,460 The standard normal table gives us probabilities 74 00:04:30,460 --> 00:04:33,610 associated with a standard normal random variable. 75 00:04:33,610 --> 00:04:37,320 What if we're dealing with a normal random variable that 76 00:04:37,320 --> 00:04:41,450 has a mean and a variance that are different from those of 77 00:04:41,450 --> 00:04:42,820 the standard normal? 78 00:04:42,820 --> 00:04:44,280 What can we do? 79 00:04:44,280 --> 00:04:48,200 Well, there's a general trick that you can do to a random 80 00:04:48,200 --> 00:04:51,409 variable, which is the following. 81 00:04:51,409 --> 00:04:55,770 Let us define a new random variable Y in this fashion. 82 00:04:55,770 --> 00:05:01,030 Y measures how far away is X from the mean value. 83 00:05:01,030 --> 00:05:04,610 But because we divide by sigma, the standard deviation, 84 00:05:04,610 --> 00:05:08,920 it measures this distance in standard deviations. 85 00:05:08,920 --> 00:05:14,190 So if Y is equal to 3 it means that X is 3 standard 86 00:05:14,190 --> 00:05:16,320 deviations away from the mean. 87 00:05:16,320 --> 00:05:20,540 In general, Y measures how many deviations away from the 88 00:05:20,540 --> 00:05:22,270 mean are you. 89 00:05:22,270 --> 00:05:25,030 What properties does this random variable have? 90 00:05:25,030 --> 00:05:30,080 The expected value of Y is going to be equal to 0, 91 00:05:30,080 --> 00:05:33,650 because we have X and we're subtracting the mean of X. So 92 00:05:33,650 --> 00:05:36,890 the expected value of this term is equal to 0. 93 00:05:36,890 --> 00:05:39,690 How about the variance of Y? 94 00:05:39,690 --> 00:05:43,820 Whenever we multiply a random variable by a constant, the 95 00:05:43,820 --> 00:05:48,300 variance gets multiplied by the square of that constant. 96 00:05:48,300 --> 00:05:51,320 So we get this expression. 97 00:05:51,320 --> 00:05:54,280 But the variance of X is sigma squared. 98 00:05:54,280 --> 00:05:55,980 So this is equal to 1. 99 00:05:55,980 --> 00:05:59,500 So starting from X, we have obtained a closely related 100 00:05:59,500 --> 00:06:03,460 random variable Y that has the property that it has 0 mean 101 00:06:03,460 --> 00:06:05,480 and unit variance. 102 00:06:05,480 --> 00:06:10,990 If it also happens that X is a normal random variable, then Y 103 00:06:10,990 --> 00:06:14,380 is going to be a standard normal random variable. 104 00:06:14,380 --> 00:06:18,160 So we have managed to relate X to a standard 105 00:06:18,160 --> 00:06:19,940 normal random variable. 106 00:06:19,940 --> 00:06:23,400 And perhaps you can rewrite this expression in this form, 107 00:06:23,400 --> 00:06:29,040 X equals to mu plus sigma Y where Y is 108 00:06:29,040 --> 00:06:32,620 now a standard normal. 109 00:06:32,620 --> 00:06:35,040 So, instead of doing calculations having to do with 110 00:06:35,040 --> 00:06:40,220 X, we can try to calculate in terms of Y. And for Y we do 111 00:06:40,220 --> 00:06:42,570 have available tables. 112 00:06:42,570 --> 00:06:46,300 Let us look at an example of how this is done. 113 00:06:46,300 --> 00:06:49,210 The way to calculate probabilities associated with 114 00:06:49,210 --> 00:06:53,560 general normal random variables is to take the event 115 00:06:53,560 --> 00:06:58,140 whose probability we want calculated and express it in 116 00:06:58,140 --> 00:07:00,690 terms of standard normal random variables. 117 00:07:00,690 --> 00:07:03,810 And then use the standard normal tables. 118 00:07:03,810 --> 00:07:07,090 Let us see how this is done in terms of an example. 119 00:07:07,090 --> 00:07:13,000 Suppose that X is normal with mean 6 and variance 4, so that 120 00:07:13,000 --> 00:07:15,780 the standard deviation sigma is equal to 2. 121 00:07:15,780 --> 00:07:19,080 And suppose that we want to calculate the probability that 122 00:07:19,080 --> 00:07:23,550 X lies between 2 and 8. 123 00:07:30,520 --> 00:07:33,200 Here's how we can proceed. 124 00:07:33,200 --> 00:07:39,140 This event is the same as the event that X minus 6 takes a 125 00:07:39,140 --> 00:07:43,220 value between 2 minus 6 and 8 minus 6. 126 00:07:43,220 --> 00:07:45,790 This event is the same as the original event we were 127 00:07:45,790 --> 00:07:47,590 interested in. 128 00:07:47,590 --> 00:07:50,890 We can also divide both sides of this inequality by the 129 00:07:50,890 --> 00:07:52,420 standard deviation. 130 00:07:52,420 --> 00:07:55,040 And the event of interest has now been 131 00:07:55,040 --> 00:07:59,430 expressed in this form. 132 00:07:59,430 --> 00:08:03,830 But at this point we recognize that this is of the form X 133 00:08:03,830 --> 00:08:06,390 minus mu over sigma. 134 00:08:06,390 --> 00:08:09,430 So this random variable here is a standard 135 00:08:09,430 --> 00:08:10,855 normal random variable. 136 00:08:14,090 --> 00:08:24,530 So the probability that X lies between 2 and 8 is the same as 137 00:08:24,530 --> 00:08:27,600 the probability that a standard normal random 138 00:08:27,600 --> 00:08:32,350 variable, call it Y, falls between these numbers minus 4 139 00:08:32,350 --> 00:08:35,090 divided by 2, that's minus 2. 140 00:08:35,090 --> 00:08:37,380 Then Y less than 1. 141 00:08:40,960 --> 00:08:44,340 And now we can use the standard normal tables to 142 00:08:44,340 --> 00:08:46,210 calculate this probability. 143 00:08:46,210 --> 00:08:49,960 We have here 1 and here we have minus 2. 144 00:08:49,960 --> 00:08:52,790 And we want to find the probability that our standard 145 00:08:52,790 --> 00:08:55,660 normal falls inside this range. 146 00:08:55,660 --> 00:08:59,420 This is the probability that it is less than 1. 147 00:08:59,420 --> 00:09:02,640 But we need to subtract the probability of that tail so 148 00:09:02,640 --> 00:09:06,850 that we're left just with this intermediate area. 149 00:09:06,850 --> 00:09:11,970 So this is the probability that Y is less than 1 minus 150 00:09:11,970 --> 00:09:17,390 the probability that Y is less than minus 2. 151 00:09:17,390 --> 00:09:21,520 And finally, as we discussed earlier, the probability that 152 00:09:21,520 --> 00:09:27,360 Y is less than minus 2, this is 1 minus the probability 153 00:09:27,360 --> 00:09:31,690 that Y is less than or equal to 2. 154 00:09:31,690 --> 00:09:35,670 And now we can go to the normal tables, identify the 155 00:09:35,670 --> 00:09:38,660 values that we're interested in, the probability that Y is 156 00:09:38,660 --> 00:09:41,970 less than 1, the probability that Y is less than 2, and 157 00:09:41,970 --> 00:09:43,440 plug these in. 158 00:09:43,440 --> 00:09:46,250 And this gives us the desired probability. 159 00:09:46,250 --> 00:09:53,450 Again, the key step is to take the event of interest and by 160 00:09:53,450 --> 00:09:55,530 subtracting the mean and dividing by the standard 161 00:09:55,530 --> 00:09:59,740 deviation express that same event in an equivalent form, 162 00:09:59,740 --> 00:10:02,240 but which now involves a standard 163 00:10:02,240 --> 00:10:03,660 normal random variable. 164 00:10:06,880 --> 00:10:09,780 And then finally, use the standard normal tables.