1 00:00:00,820 --> 00:00:03,830 The next random variable that we will discuss is the 2 00:00:03,830 --> 00:00:05,560 binomial random variable. 3 00:00:05,560 --> 00:00:07,570 It is one that is already familiar 4 00:00:07,570 --> 00:00:09,380 to us in most respects. 5 00:00:09,380 --> 00:00:13,190 It is associated with the experiment of taking a coin 6 00:00:13,190 --> 00:00:16,510 and tossing it n times independently. 7 00:00:16,510 --> 00:00:20,100 And at each toss, there is a probability, p, 8 00:00:20,100 --> 00:00:21,590 of obtaining heads. 9 00:00:21,590 --> 00:00:25,120 So the experiment is completely specified in terms 10 00:00:25,120 --> 00:00:26,270 of two parameters-- 11 00:00:26,270 --> 00:00:30,470 n, the number of tosses, and p, the probability of heads at 12 00:00:30,470 --> 00:00:32,549 each one of the tosses. 13 00:00:32,549 --> 00:00:35,830 We can represent this experiment by the usual 14 00:00:35,830 --> 00:00:38,020 sequential tree diagram. 15 00:00:38,020 --> 00:00:41,490 And the leaves of the tree are the possible outcomes of the 16 00:00:41,490 --> 00:00:42,440 experiment. 17 00:00:42,440 --> 00:00:44,640 So these are the elements of the sample space. 18 00:00:44,640 --> 00:00:49,585 And a typical outcome is a particular sequence of heads 19 00:00:49,585 --> 00:00:52,190 and tails that has length n. 20 00:00:52,190 --> 00:00:56,890 In this diagram here, we took n to be equal to 3. 21 00:00:56,890 --> 00:01:00,120 We can now define a random variable associated with this 22 00:01:00,120 --> 00:01:00,720 experiment. 23 00:01:00,720 --> 00:01:03,660 Our random variable that we denote by capital X is the 24 00:01:03,660 --> 00:01:06,370 number of heads that are observed. 25 00:01:06,370 --> 00:01:10,950 So for example, if the outcome happens to be this one-- 26 00:01:10,950 --> 00:01:15,900 tails, heads, heads-- we have 2 heads that are observed. 27 00:01:15,900 --> 00:01:21,200 And the numerical value of our random variable is equal to 2. 28 00:01:21,200 --> 00:01:24,580 In general, this random variable, a binomial random 29 00:01:24,580 --> 00:01:29,500 variable, can be used to model any kind of situation in which 30 00:01:29,500 --> 00:01:34,200 we have a fixed number of independent trials and 31 00:01:34,200 --> 00:01:38,160 identical trials, and each trial can result in success or 32 00:01:38,160 --> 00:01:42,539 failure, and we have a probability of success equal 33 00:01:42,539 --> 00:01:44,880 to some given number, p. 34 00:01:44,880 --> 00:01:48,440 The number of successes obtained in these trials is, 35 00:01:48,440 --> 00:01:51,000 of course, random and it is modeled by a 36 00:01:51,000 --> 00:01:53,800 binomial random variable. 37 00:01:53,800 --> 00:01:57,979 We can now proceed and calculate the PMF of this 38 00:01:57,979 --> 00:01:59,220 random variable. 39 00:01:59,220 --> 00:02:03,140 Instead of calculating the whole PMF, let us look at just 40 00:02:03,140 --> 00:02:06,000 one typical entry of the PMF. 41 00:02:06,000 --> 00:02:09,259 Let's look at this entry, which, by definition, is the 42 00:02:09,259 --> 00:02:14,990 probability that our random variable takes the value of 2. 43 00:02:14,990 --> 00:02:17,840 Now, the random variable taking the numerical value of 44 00:02:17,840 --> 00:02:23,030 2, this is an event that can happen in three possible ways 45 00:02:23,030 --> 00:02:25,840 that we can identify in the sample space. 46 00:02:25,840 --> 00:02:29,150 We can have 2 heads followed by a tail. 47 00:02:29,150 --> 00:02:32,990 We can have heads, tails, heads. 48 00:02:32,990 --> 00:02:39,460 Or we can have tails, heads, heads. 49 00:02:39,460 --> 00:02:43,150 The probability of this outcome is p times p 50 00:02:43,150 --> 00:02:44,690 times (1 minus p). 51 00:02:44,690 --> 00:02:47,230 So it's p squared times (1 minus p). 52 00:02:47,230 --> 00:02:49,660 And the other two outcomes also have the same 53 00:02:49,660 --> 00:02:54,050 probability, so the overall probability is 3 times this. 54 00:02:54,050 --> 00:02:59,250 Which can also be written this way, 3 is the same as 55 00:02:59,250 --> 00:03:00,060 3-choose-2. 56 00:03:00,060 --> 00:03:03,050 It's the number of ways that you can choose 2 heads, where 57 00:03:03,050 --> 00:03:04,930 they will be placed in a sequence of 58 00:03:04,930 --> 00:03:09,940 3 slots or 3 trials. 59 00:03:09,940 --> 00:03:15,660 More generally, we have the familiar binomial formula. 60 00:03:15,660 --> 00:03:18,530 So this is a formula that you have already seen. 61 00:03:18,530 --> 00:03:22,180 It's the probability of obtaining k successes in a 62 00:03:22,180 --> 00:03:25,180 sequence of n independent trials. 63 00:03:25,180 --> 00:03:29,160 The only thing that is new is that instead of using the 64 00:03:29,160 --> 00:03:32,130 traditional probability notation, now 65 00:03:32,130 --> 00:03:35,020 we're using PMF notation. 66 00:03:35,020 --> 00:03:38,750 To get a feel for the binomial PMF, it's instructive to look 67 00:03:38,750 --> 00:03:39,890 at some plots. 68 00:03:39,890 --> 00:03:43,670 So suppose that we toss the coin three times and that the 69 00:03:43,670 --> 00:03:46,510 coin tosses are fair, so that the probability of heads is 70 00:03:46,510 --> 00:03:48,180 equal to 1/2. 71 00:03:48,180 --> 00:03:52,930 Then we see that 1 head or 2 heads are equally likely, and 72 00:03:52,930 --> 00:03:59,100 they are more likely than the outcome of 0 or 3 heads. 73 00:03:59,100 --> 00:04:02,860 Now, if we change the number of tosses and toss the coin 10 74 00:04:02,860 --> 00:04:07,260 times, then we see that the most likely result 75 00:04:07,260 --> 00:04:10,050 is to have 5 heads. 76 00:04:10,050 --> 00:04:13,660 And then as we move away from 5 in either direction, the 77 00:04:13,660 --> 00:04:15,920 probability of that particular result 78 00:04:15,920 --> 00:04:18,250 becomes smaller and smaller. 79 00:04:18,250 --> 00:04:22,840 Now, if we toss the coin many times, let's say 100 times, 80 00:04:22,840 --> 00:04:28,250 the coin is still fair, then we see that the number of 81 00:04:28,250 --> 00:04:32,350 heads that we're going to get is most likely to be somewhere 82 00:04:32,350 --> 00:04:36,810 in this range between, let's say, 35 and 65. 83 00:04:36,810 --> 00:04:40,750 These are values of the random variable that have some 84 00:04:40,750 --> 00:04:43,820 noticeable or high probabilities. 85 00:04:43,820 --> 00:04:48,710 But anything below 30 or anything about 70 is extremely 86 00:04:48,710 --> 00:04:51,240 unlikely to occur. 87 00:04:51,240 --> 00:04:55,360 We can generate similar plots for unfair coins. 88 00:04:55,360 --> 00:04:58,360 So suppose now that our coin is biased and the probability 89 00:04:58,360 --> 00:05:01,610 of heads is quite low, equal to 0.2. 90 00:05:01,610 --> 00:05:05,680 In that case, the most likely result is that we're going to 91 00:05:05,680 --> 00:05:07,640 see 0 heads. 92 00:05:07,640 --> 00:05:10,740 And then, there's smaller and smaller probability of 93 00:05:10,740 --> 00:05:12,440 obtaining more heads. 94 00:05:12,440 --> 00:05:16,740 On the other hand, if we toss the coin 10 times, we expect 95 00:05:16,740 --> 00:05:21,320 to see a few heads, not a very large number, but some number 96 00:05:21,320 --> 00:05:25,210 of heads between, let's say, 0 and 4. 97 00:05:25,210 --> 00:05:30,210 Finally, if we toss the coin 100 times and we take the coin 98 00:05:30,210 --> 00:05:35,690 to be an extremely unfair one, what do we expect to see? 99 00:05:35,690 --> 00:05:39,220 If we think of probabilities as frequencies, we expect to 100 00:05:39,220 --> 00:05:43,010 see heads roughly 10% of the time. 101 00:05:43,010 --> 00:05:48,190 So, given that n is 100, we expect to see about 10 heads. 102 00:05:48,190 --> 00:05:51,460 But when we say about 10 heads, we do not 103 00:05:51,460 --> 00:05:53,659 mean exactly 10 heads. 104 00:05:53,659 --> 00:05:57,480 About 10 heads, in this instance, as this plot tells 105 00:05:57,480 --> 00:06:02,440 us, is any number more or less in the range from 0 to 20. 106 00:06:02,440 --> 00:06:05,650 But anything above 20 is extremely unlikely.