1 00:00:00,527 --> 00:00:02,360 PROFESSOR: Certain kinds of random variables 2 00:00:02,360 --> 00:00:05,110 keep coming up, so let's look at two basic examples 3 00:00:05,110 --> 00:00:08,890 now, namely uniform random variables 4 00:00:08,890 --> 00:00:12,225 and binomial random variables. 5 00:00:12,225 --> 00:00:13,850 Let's begin with uniform, because we've 6 00:00:13,850 --> 00:00:15,480 seen those already. 7 00:00:15,480 --> 00:00:17,580 So a uniform random variable means 8 00:00:17,580 --> 00:00:19,095 that all the values that it takes, 9 00:00:19,095 --> 00:00:21,560 it takes with equal probability. 10 00:00:21,560 --> 00:00:25,390 So the threshold variable Z took all the values from 0 11 00:00:25,390 --> 00:00:28,560 to 6 inclusive, each with probability 1/7. 12 00:00:28,560 --> 00:00:33,760 So it was a basic example of a uniform variable. 13 00:00:33,760 --> 00:00:37,570 And other examples that come up, if D 14 00:00:37,570 --> 00:00:41,470 is the outcome of a fair die-- dies are six-sided. 15 00:00:41,470 --> 00:00:43,390 Dice are six-sided. 16 00:00:43,390 --> 00:00:49,890 So the probability that it comes up 1 or 2 or 6 is 1/6 each. 17 00:00:49,890 --> 00:00:52,680 Another game is the four-digit lottery number 18 00:00:52,680 --> 00:00:55,390 where it's supposed to be the case that the four digits are 19 00:00:55,390 --> 00:00:59,100 each chosen at random, which means that the possibilities 20 00:00:59,100 --> 00:01:04,150 range from four 0's up through four 9's for 10,000 numbers. 21 00:01:04,150 --> 00:01:06,780 And they're supposed to be all equally likely. 22 00:01:06,780 --> 00:01:11,270 So the probability that the lottery winds up with 00 is 23 00:01:11,270 --> 00:01:14,340 the same as that it ends up with 1 is the same that it ends up 24 00:01:14,340 --> 00:01:15,240 with four 9's. 25 00:01:15,240 --> 00:01:16,990 It's 1/10,000. 26 00:01:16,990 --> 00:01:19,055 So that's another uniform random variable. 27 00:01:22,020 --> 00:01:24,950 Let's prove a little lemma that will be of use later. 28 00:01:24,950 --> 00:01:28,260 It's just some practice with uniformity. 29 00:01:28,260 --> 00:01:31,888 Suppose that I have R1, R2, R3 are three random variables. 30 00:01:31,888 --> 00:01:33,096 They're mutually independent. 31 00:01:36,060 --> 00:01:37,310 And R1 is uniform. 32 00:01:37,310 --> 00:01:39,380 I don't really care about the other two. 33 00:01:39,380 --> 00:01:43,950 I do care technically that they are only taking the values. 34 00:01:43,950 --> 00:01:46,919 They only take values that R1 can take as well. 35 00:01:46,919 --> 00:01:48,460 So I haven't said that on this slide, 36 00:01:48,460 --> 00:01:50,830 but that's what we're assuming. 37 00:01:50,830 --> 00:01:55,580 And then I claim is that each of the pairs, the probability 38 00:01:55,580 --> 00:02:01,850 that R1 equals R2-- the event that R1 is equal to R2 39 00:02:01,850 --> 00:02:03,480 is independent of the event that R2 40 00:02:03,480 --> 00:02:07,020 is equal to R3, which is independent of the event 41 00:02:07,020 --> 00:02:09,419 that R1 is equal to R3. 42 00:02:09,419 --> 00:02:11,360 Now, these events overlap. 43 00:02:11,360 --> 00:02:16,020 There's an R1 here and an R1 there and there's an R2 here 44 00:02:16,020 --> 00:02:16,860 and an R2 there. 45 00:02:16,860 --> 00:02:20,930 So even though the R1, R2, R3 are mutually independent, 46 00:02:20,930 --> 00:02:22,270 it's not completely clear. 47 00:02:22,270 --> 00:02:25,920 In fact, it isn't really clear that these events 48 00:02:25,920 --> 00:02:28,920 are mutually independent. 49 00:02:28,920 --> 00:02:31,660 But in fact, they're not mutually independent. 50 00:02:31,660 --> 00:02:34,290 In fact, they're pairwise independent. 51 00:02:34,290 --> 00:02:36,410 They're obviously not three-way independent-- 52 00:02:36,410 --> 00:02:38,700 that is, mutually independent-- because if I know 53 00:02:38,700 --> 00:02:41,790 that R1 is equal to R2 and I know that R2 is equal to R3, 54 00:02:41,790 --> 00:02:45,260 it follows that R1 is equal to R3. 55 00:02:45,260 --> 00:02:48,660 So given these two, the probability of this 56 00:02:48,660 --> 00:02:54,840 changes dramatically to certainty. 57 00:02:54,840 --> 00:02:57,689 So this is the useful lemma, which 58 00:02:57,689 --> 00:02:59,230 is that if I have the three variables 59 00:02:59,230 --> 00:03:03,240 and I look at the three possible pairs of values that might be 60 00:03:03,240 --> 00:03:06,720 equal that whether any two of them are equal 61 00:03:06,720 --> 00:03:10,300 is independent of each other. 62 00:03:10,300 --> 00:03:12,430 Now, let me give a handwaving argument. 63 00:03:12,430 --> 00:03:15,630 There's a more rigorous argument based 64 00:03:15,630 --> 00:03:17,630 on total probability that appears 65 00:03:17,630 --> 00:03:19,740 as a problem in the text. 66 00:03:19,740 --> 00:03:21,750 But the intuitive ideas, let's look at the case 67 00:03:21,750 --> 00:03:25,370 that R1 is the uniform variable, and R1 68 00:03:25,370 --> 00:03:28,240 is independent of R2 and R3. 69 00:03:28,240 --> 00:03:32,050 So certainly, that implies that R1 is independent of the event 70 00:03:32,050 --> 00:03:34,630 that R2 is equal to R3, because R1 isn't mutually 71 00:03:34,630 --> 00:03:36,280 independent, both R1 and R2. 72 00:03:36,280 --> 00:03:39,440 Doesn't matter what they do, so it's independent of this event 73 00:03:39,440 --> 00:03:41,630 that R2 is equal to R3. 74 00:03:41,630 --> 00:03:46,620 Now, because R1 is uniform, it has 75 00:03:46,620 --> 00:03:51,770 probability p of equaling every possible value 76 00:03:51,770 --> 00:03:52,690 that it can take. 77 00:03:52,690 --> 00:03:58,250 And since R2 and R3 only take a value that R1 could take, 78 00:03:58,250 --> 00:04:03,200 the probability that R1 hits the value that R2 and R3 happens 79 00:04:03,200 --> 00:04:05,170 to have is still p. 80 00:04:05,170 --> 00:04:06,520 That's the informal argument. 81 00:04:06,520 --> 00:04:08,686 So in other words, the claim is that the probability 82 00:04:08,686 --> 00:04:12,680 that R1 is equal to R2 given that R2 is equal to R3 83 00:04:12,680 --> 00:04:16,490 is simply the probability that R1 happens to hit 84 00:04:16,490 --> 00:04:18,980 R2, whatever values R2 has. 85 00:04:18,980 --> 00:04:21,089 This equation says that R1 equals 86 00:04:21,089 --> 00:04:24,050 R2 is independent of R2, R3. 87 00:04:24,050 --> 00:04:26,860 And in fact, in both cases, it's the same probability 88 00:04:26,860 --> 00:04:29,540 that R1 is equal to any given value, 89 00:04:29,540 --> 00:04:32,450 the probability of R being uniform 90 00:04:32,450 --> 00:04:35,315 has of equaling each of its possible values. 91 00:04:35,315 --> 00:04:38,570 You can think about that, see if it's persuasive. 92 00:04:38,570 --> 00:04:40,820 It's an OK argument, but I was bothered by it. 93 00:04:40,820 --> 00:04:43,960 I found that it took me-- I wasn't happy with it 94 00:04:43,960 --> 00:04:45,540 until I sat down and really worked it 95 00:04:45,540 --> 00:04:49,350 out formally to justify this somewhat 96 00:04:49,350 --> 00:04:51,930 handwavy proof of the lemma. 97 00:04:51,930 --> 00:04:53,730 All right. 98 00:04:53,730 --> 00:04:55,790 Let's turn from uniform random variables 99 00:04:55,790 --> 00:04:57,460 to binomial random variables. 100 00:04:57,460 --> 00:04:59,780 They're probably the most important single example 101 00:04:59,780 --> 00:05:02,350 of random variable that comes up all the time. 102 00:05:02,350 --> 00:05:05,360 So the simplest definition of a binomial random variable 103 00:05:05,360 --> 00:05:07,650 is the one that you get by flipping 104 00:05:07,650 --> 00:05:12,440 n mutually independent coins. 105 00:05:12,440 --> 00:05:15,350 Now, they have an order, so you can tell them apart. 106 00:05:15,350 --> 00:05:20,340 Or again, you can say that you flip one coin n times, 107 00:05:20,340 --> 00:05:23,820 but each of the flips is independent of all the others. 108 00:05:23,820 --> 00:05:25,990 Now, there's two parameters here, an n and a p, 109 00:05:25,990 --> 00:05:29,140 because we don't assume that the flips are fair. 110 00:05:29,140 --> 00:05:32,840 So there's one parameter is how many flips there are. 111 00:05:32,840 --> 00:05:34,770 The other parameter is the probability 112 00:05:34,770 --> 00:05:37,560 of a head, which might be biased that heads 113 00:05:37,560 --> 00:05:39,660 are more likely or less likely than tails. 114 00:05:39,660 --> 00:05:42,990 The fair case would be when p was 1/2. 115 00:05:42,990 --> 00:05:47,620 So for example, if n is 5 and p is 2/3, 116 00:05:47,620 --> 00:05:51,900 what's the probability that we consecutively flip head, head, 117 00:05:51,900 --> 00:05:53,680 tail, tail, head? 118 00:05:53,680 --> 00:05:55,570 Well, because they're independent, 119 00:05:55,570 --> 00:05:57,730 the probability of this is simply 120 00:05:57,730 --> 00:05:59,900 the product of the probability that I 121 00:05:59,900 --> 00:06:01,950 flip a head on the first toss, which 122 00:06:01,950 --> 00:06:05,200 is probability of H, which is p; probability 123 00:06:05,200 --> 00:06:08,140 of H on the second toss; probability of T on the third; 124 00:06:08,140 --> 00:06:10,220 T on the fourth; T on the fifth. 125 00:06:10,220 --> 00:06:13,362 So I can replace each of those by 2/3 126 00:06:13,362 --> 00:06:14,570 is the probability of a head. 127 00:06:14,570 --> 00:06:16,620 2/3, 1/3. 128 00:06:16,620 --> 00:06:18,670 1 minus 2/3 is the probability of a tail. 129 00:06:18,670 --> 00:06:20,670 1/3, 2/3. 130 00:06:20,670 --> 00:06:24,650 And I discover that the probability of HHTTH 131 00:06:24,650 --> 00:06:30,580 is 2/3 cubed and 1/3 squared. 132 00:06:30,580 --> 00:06:38,240 Or abstracting the probability of a sequence of n tosses 133 00:06:38,240 --> 00:06:41,090 in which there are i heads and the rest are tails, 134 00:06:41,090 --> 00:06:46,500 n minus i tails, is simply the probability 135 00:06:46,500 --> 00:06:50,180 of a head raised to the i-th power times the probability 136 00:06:50,180 --> 00:06:52,790 of a tail, namely 1 minus p, raised 137 00:06:52,790 --> 00:06:55,490 to the n minus i-th power. 138 00:06:55,490 --> 00:06:59,700 Given any particular sequence of H's and T's of length n, 139 00:06:59,700 --> 00:07:02,980 this is the probability that's assigned to that sequence. 140 00:07:02,980 --> 00:07:05,180 So all sequences with the same number of H's have 141 00:07:05,180 --> 00:07:06,799 the same probability. 142 00:07:06,799 --> 00:07:08,840 But of course, with different numbers of H's they 143 00:07:08,840 --> 00:07:10,960 have different probabilities. 144 00:07:10,960 --> 00:07:14,610 Well, what's the probability that you actually toss i heads 145 00:07:14,610 --> 00:07:17,390 and n minus i tails in the n tosses? 146 00:07:17,390 --> 00:07:21,590 That's going to be equal to the number of possible sequences 147 00:07:21,590 --> 00:07:25,570 that have this property of i heads and n minus i tails. 148 00:07:25,570 --> 00:07:26,990 Well, the number of such sequences 149 00:07:26,990 --> 00:07:30,420 is simply choose the i places for the n heads 150 00:07:30,420 --> 00:07:34,460 out of-- choose the i places for the heads out of the n tosses. 151 00:07:34,460 --> 00:07:36,100 So it's going to be n choose i. 152 00:07:36,100 --> 00:07:39,330 So we've just figured out that the probability of tossing i 153 00:07:39,330 --> 00:07:42,100 heads and n minus i tails is simply 154 00:07:42,100 --> 00:07:48,300 n choose i times p to the i, 1 minus p to the n minus i. 155 00:07:48,300 --> 00:07:53,730 In short, the probability that the number of heads is i 156 00:07:53,730 --> 00:07:55,730 is equal to this number. 157 00:07:55,730 --> 00:07:59,300 And this is the probability that's 158 00:07:59,300 --> 00:08:01,980 associated with whether the binomial variable 159 00:08:01,980 --> 00:08:04,600 with parameters n and p is equal to i 160 00:08:04,600 --> 00:08:07,660 is n choose i, p to the i, 1 minus p to the n minus i. 161 00:08:07,660 --> 00:08:09,300 This is a pretty basic formula. 162 00:08:09,300 --> 00:08:10,810 If you can't memorize it, then make 163 00:08:10,810 --> 00:08:13,160 sure it's written on any crib sheet you take to an exam. 164 00:08:15,700 --> 00:08:18,720 So the probability density function, 165 00:08:18,720 --> 00:08:21,780 it abstracts out some properties of random variables. 166 00:08:21,780 --> 00:08:23,530 Basically, it just tells you what's 167 00:08:23,530 --> 00:08:26,190 the probability that the random variable takes a given 168 00:08:26,190 --> 00:08:28,420 value for every possible value. 169 00:08:28,420 --> 00:08:32,830 So the probability density function, PDF of R, 170 00:08:32,830 --> 00:08:34,890 is a function on the real values. 171 00:08:34,890 --> 00:08:37,450 And it tells you for each a what's the probability 172 00:08:37,450 --> 00:08:39,426 that R is equal to a. 173 00:08:42,049 --> 00:08:44,370 So what we've just said is that the probability density 174 00:08:44,370 --> 00:08:47,410 function of the binomial random variable characterized 175 00:08:47,410 --> 00:08:52,470 by parameters n and p at i is n choose i, p to the i, 1 176 00:08:52,470 --> 00:08:54,420 minus p to the n minus i, where we're 177 00:08:54,420 --> 00:08:59,850 assuming that i is an integer from 0 to n. 178 00:08:59,850 --> 00:09:03,440 If I look at the probability density 179 00:09:03,440 --> 00:09:06,060 function for a uniform variable, then it's constant. 180 00:09:06,060 --> 00:09:09,430 The probability density function on any possible value 181 00:09:09,430 --> 00:09:14,370 v that the uniform variable can take is the same. 182 00:09:14,370 --> 00:09:17,540 This applies for v in the range of U. 183 00:09:17,540 --> 00:09:20,210 So in fact, you could say exactly what it is. 184 00:09:20,210 --> 00:09:22,420 It's simply 1 over the size of the range of U, 185 00:09:22,420 --> 00:09:23,330 if U is uniform. 186 00:09:27,320 --> 00:09:30,690 A closely related function that describes 187 00:09:30,690 --> 00:09:32,760 a lot about the behavior of a random variable 188 00:09:32,760 --> 00:09:35,460 is the cumulative distribution function. 189 00:09:35,460 --> 00:09:38,960 It's simply the probability that R is less than or equal to a. 190 00:09:38,960 --> 00:09:41,030 So it's a function on the real numbers, 191 00:09:41,030 --> 00:09:46,810 from reals to reals, where CDF R of a 192 00:09:46,810 --> 00:09:49,370 is the probability that R is less than or equal to a. 193 00:09:49,370 --> 00:09:52,810 Clearly given the PDF, you can get the CDF. 194 00:09:52,810 --> 00:09:55,440 And given the CDF, you can get the PDF. 195 00:09:55,440 --> 00:09:57,880 But it's convenient to have both around. 196 00:09:57,880 --> 00:10:00,500 Now the key observation about these 197 00:10:00,500 --> 00:10:05,330 is that once we've abstracted out to the PDF and the CDF, 198 00:10:05,330 --> 00:10:08,580 we don't have to think about the sample space anymore. 199 00:10:08,580 --> 00:10:10,730 They do not depend on the sample space. 200 00:10:10,730 --> 00:10:13,010 All they're telling you is the probability 201 00:10:13,010 --> 00:10:15,890 that the random variable takes a given value, which 202 00:10:15,890 --> 00:10:18,330 is in some ways, the most important data 203 00:10:18,330 --> 00:10:19,900 about a random variable. 204 00:10:19,900 --> 00:10:22,110 You need to fall back on something more general 205 00:10:22,110 --> 00:10:24,800 than the PDF or the CDF when you start 206 00:10:24,800 --> 00:10:27,240 having dependent random variables, 207 00:10:27,240 --> 00:10:29,490 and you need to know how the probability that R 208 00:10:29,490 --> 00:10:32,750 takes a value changes, given that s has some property 209 00:10:32,750 --> 00:10:34,920 or takes some other value. 210 00:10:34,920 --> 00:10:38,456 But if you're just looking at the random variable alone, 211 00:10:38,456 --> 00:10:40,455 essentially everything you need to know about it 212 00:10:40,455 --> 00:10:44,200 is given by its density or distribution functions. 213 00:10:44,200 --> 00:10:47,170 And you don't have to worry about the sample space. 214 00:10:47,170 --> 00:10:50,650 And this has the advantage that both the uniform distributions 215 00:10:50,650 --> 00:10:57,260 and binomial distributions come up [AUDIO OUT] 216 00:11:01,730 --> 00:11:05,630 --and it means that all of these different random variables, 217 00:11:05,630 --> 00:11:07,790 based on different sample spaces, 218 00:11:07,790 --> 00:11:10,480 are going to share a whole lot of properties. 219 00:11:10,480 --> 00:11:13,440 Everything that I derive based on what the PDF is 220 00:11:13,440 --> 00:11:15,280 is going to apply to all of them. 221 00:11:15,280 --> 00:11:19,360 That's why this abstraction of a random variable in terms 222 00:11:19,360 --> 00:11:20,940 of a probability density function 223 00:11:20,940 --> 00:11:22,280 is so valuable and key. 224 00:11:22,280 --> 00:11:24,600 But remember, the definition of a random variable 225 00:11:24,600 --> 00:11:28,340 is not that it is a probability density function, 226 00:11:28,340 --> 00:11:33,040 rather it's a function from the sample space to values.