PROFESSOR: We ask about averages all the time. And in the context of random variables, averages get abstracted into a lovely concept called the expectation of the random variable. Let's begin with a motivating example which, as is often the case, will come from gambling. There's a game that's actually played in casinos called Carnival Dice, where you have three dice. The way you play is you pick your favorite number from 1 to 6, whatever it happens to be, and then you roll the three dice. The dice are assumed to be fair, so each one of them has a one in six chance of coming up with any given number. The payoff goes as follows. For every match of your favorite number, you get $1.00. And if none of the dice show your favorite number, then you lose $1.00. OK. Let's do an example. Suppose your favorite number was five. You announce that to the house, or the dealer, and then the dice start rolling. Now if your roll happened to come up with the numbers two, three, and four, well, there are no fives there, so you've lost $1.00.
On the other hand, if your rolls came out five, four, six, there's one five, so you've won $1.00. If it came out five, four, five, there are two fives, so you've won $2.00. And if it was all fives, you've won $3.00. Now real carnival dice is often played so that you either win or lose $1.00 depending on whether there's any match at all, but we're playing a more generous game where, if you double match, you get $2.00, and if you triple match, you get $3.00. So the basic question is, is this a fair game? Is it worth playing, and how can we think about that? Well, we're going to think about it probabilistically. So let's think about the probability of rolling no fives. If five is my favorite number, what's the probability that I roll none of them? Well, there's a five out of six chance that I don't roll a five on the first die, and likewise on the second die and on the third die. And since the die rolls are assumed to be independent, the probability of no fives is 5/6 cubed, which comes out to be 125/216.
I'm writing this out because we're going to put all the numbers over 216 to make them easier to compare. OK. What's the probability of one five? Well, the probability of any single sequence of die rolls with a single five is 5/6 for the first non-five, times 5/6 for the second non-five, times 1/6 for the five. And there are 3 choose 1 possible sequences of dice rolls with one five and the others non-fives. Likewise, for two fives, there are 3 choose 2 ways of choosing the two dice that show a five, and for each such choice the probability is 5/6, for the one die that does not show a five, times 1/6 times 1/6, for the two dice that do. The probability of three fives is 1/6 of getting a five on the first die, times 1/6 of getting a five on the second die, times 1/6 of getting a five on the third die. It's simply 1/6 cubed. So we can easily calculate these probabilities. This is a familiar exercise. Let's put them in a chart. What we've figured out is that 0 matches has a probability of 125/216, and in that case, I lose $1.00.
One match turns out to have a probability of 75/216, and I win $1.00. Two matches is 15/216, and I win $2.00. And for three matches, there's one chance in 216 that I win the $3.00. So now I can ask, what do I expect to win? Suppose I play 216 games, and the games split exactly according to these probabilities. Then I would expect to wind up with 0 matches about 125 times, since the probability of no matches was 125/216. So if I played 216 games, I expect that in about 125 of them I'm going to get no matches, which means I'll lose $1.00 on each. One match I expect about 75 times; 2 matches, 15 times; 3 matches, once. So my average win is going to be 125 times minus 1, plus 75 times 1, plus 15 times 2, plus 1 times 3, all divided by 216. The numbers 125, 75, 15, and 1 are how the 216 games split among losing $1.00, winning $1.00, winning $2.00, and winning $3.00. And the result comes out slightly negative: minus 17/216 of $1.00, which is about minus $0.08.
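The chart and the 216-game average can be reproduced exactly with rational arithmetic. This is a minimal Python sketch, assuming the payoff rule described above (lose $1.00 on no match, otherwise win $1.00 per match); the names `match_probability` and `PAYOFF` are hypothetical, chosen for illustration.

```python
from fractions import Fraction
from math import comb

# Carnival dice: three fair dice and one favorite number.
P_MATCH = Fraction(1, 6)  # chance that a single die shows the favorite number

# Payoff in dollars for 0, 1, 2, or 3 matches (the "generous" version).
PAYOFF = {0: -1, 1: 1, 2: 2, 3: 3}


def match_probability(k: int, dice: int = 3) -> Fraction:
    """Pr[exactly k of the dice match]: choose which k dice match,
    then multiply the independent per-die probabilities."""
    return comb(dice, k) * P_MATCH**k * (1 - P_MATCH) ** (dice - k)


# Print the chart with every probability written over 216.
for k in range(4):
    over_216 = int(match_probability(k) * 216)  # always a whole number here
    print(f"{k} matches: {over_216}/216, payoff {PAYOFF[k]:+d} dollars")

# Average win over 216 games that split exactly according to the chart.
counts = {k: match_probability(k) * 216 for k in range(4)}
average_win = sum(counts[k] * PAYOFF[k] for k in range(4)) / 216
print(average_win, float(average_win))
```

Using `Fraction` rather than floats keeps every entry exactly over 216, matching the way the chart is written.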
So I'm losing, on average, $0.08 per play. This is not a fair game. It's biased against me. And if I keep playing long enough, I'm going to find that I average out to a steady loss of about $0.08 a play. We would summarize this by saying that you expect to lose $0.08, meaning that your average loss is $0.08, and you expect that to be what shows up if you play the game over and over again. It's important to notice, of course, that you never actually lose $0.08 on any single play. This notion of your expecting to lose $0.08 never actually happens; it's just your average loss. On every single play you're going to either lose $1.00, win $1.00, win $2.00, or win $3.00. There's no $0.08 showing up at all. OK. So now let's abstract the expected value of a random variable R. A random variable is this thing that takes on different values with different probabilities. And its expected value is defined to be its average value, where the different values are weighted by their probabilities.
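The claim that a long run of plays settles into a steady loss of about $0.08 per play can be checked with a quick Monte Carlo simulation. This is a sketch, assuming fair dice modeled by Python's `random` module and a favorite number of 5; the function names and the 200,000-play count are illustrative choices.

```python
import random

PAYOFF = {0: -1, 1: 1, 2: 2, 3: 3}  # dollars won for 0, 1, 2, 3 matches


def play_once(rng: random.Random, favorite: int = 5) -> int:
    """Roll three fair dice and return the payoff for one play."""
    matches = sum(1 for _ in range(3) if rng.randint(1, 6) == favorite)
    return PAYOFF[matches]


def average_winnings(plays: int, seed: int = 0) -> float:
    """Average payoff per play over many independent plays."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return sum(play_once(rng) for _ in range(plays)) / plays


avg = average_winnings(200_000)
print(f"average per play: {avg:+.4f} (exact: {-17 / 216:+.4f})")
```

Each individual play returns -1, 1, 2, or 3, never -0.08; only the long-run average drifts toward -17/216.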
We can write this out as a precise formula. The expectation of a random variable R is defined to be the sum, over all its possible values v, of v times the probability that v comes up, that is, the probability that R equals v. The formula doesn't spell out what the summation ranges over, but it's over all possible values v of R. So this is the basic definition of the expected value of a random variable. Now let me mention that this sum makes sense because we're assuming a countable sample space, so R is defined on only countably many outcomes, which means it can take only countably many values. So this is a countable sum over all the possible values that R takes. And what we've just concluded, then, is that the expected win in the carnival dice game is minus 17/216. Apply this formal definition of expectation to the random variable defined to be how much you win on a given play of carnival dice, and it comes out to be that average: minus 17/216.
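The definition translates almost word for word into code when R has finitely many values. A minimal sketch: `expectation` is a hypothetical helper that takes the distribution of R as a mapping from each value v to Pr[R = v], filled in here with the carnival dice winnings from the chart.

```python
from collections.abc import Mapping
from fractions import Fraction


def expectation(pmf: Mapping[int, Fraction]) -> Fraction:
    """Sum over the values v of R of v * Pr[R = v].

    `pmf` maps each value v in the (finite) range of R to Pr[R = v].
    """
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(v * p for v, p in pmf.items())


# Carnival dice winnings: value -> probability.
winnings_pmf = {
    -1: Fraction(125, 216),
    1: Fraction(75, 216),
    2: Fraction(15, 216),
    3: Fraction(1, 216),
}
print(expectation(winnings_pmf))  # -17/216
```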
Now there's a technical result that's useful in some proofs, which says that there's another way to get the expectation. The expectation can also be expressed as the sum, over all the possible outcomes omega in the sample space S, of the value of the random variable at that outcome times the probability of that outcome. So this is an alternative definition, compared to the sum over all the values of each value times the probability of that value. Here, it's the sum over all the outcomes of the value of the random variable at the outcome times the probability of the outcome. It's not entirely obvious that those two definitions are equivalent. This form of the definition turns out to be technically helpful in some proofs, although outside of proofs you don't use it so much in applications. But it's not a bad exercise to prove the equivalence, so I'm going to walk you through it. It's kind of a boring series of equations on slides, and you're welcome to skip past it, but it is a derivation that I expect you to be able to carry out.
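Because the carnival dice sample space is small (216 equally likely ordered rolls), both forms of the expectation can be computed directly and compared. This sketch assumes the favorite number is 5 and represents each outcome omega as a tuple of three die faces; the names `S`, `R`, and `pr_value` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

FAVORITE = 5
PAYOFF = {0: -1, 1: 1, 2: 2, 3: 3}

# Sample space S: all 6**3 = 216 equally likely ordered rolls of three dice.
S = list(product(range(1, 7), repeat=3))
pr_outcome = Fraction(1, len(S))


def R(omega: tuple[int, int, int]) -> int:
    """The random variable: winnings for the outcome omega."""
    return PAYOFF[omega.count(FAVORITE)]


# Outcome form: sum over omega in S of R(omega) * Pr[omega].
by_outcomes = sum(R(omega) * pr_outcome for omega in S)

# Value form: group the outcomes into the events [R = v],
# then sum v * Pr[R = v] over the values v.
pr_value = defaultdict(Fraction)  # Fraction() is 0
for omega in S:
    pr_value[R(omega)] += pr_outcome
by_values = sum(v * p for v, p in pr_value.items())

print(by_outcomes, by_values, by_outcomes == by_values)
```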
So let's carry out this derivation. I'm going to prove that the expectation is equal to the sum over all the outcomes of the value of the random variable at the outcome times the probability of the outcome. To prove it, let's begin with one little remark that's useful. Remember that the notation [R = v], with brackets, describes the event that the random variable takes the value v, and an event is by definition a set of outcomes: the set of outcomes where this property holds. So it's the set of outcomes omega where R(omega) is equal to v. So let's just remember that [R = v] is the event that R is equal to v, meaning the set of outcomes where that's true. What that tells us in particular is that the probability of [R = v] is, by definition, the sum of the probabilities of the outcomes in the event. So it's the sum over all those outcomes. Now let's go back to the original definition of the expectation of R.
The original definition, and the standard one, is the sum over all the values v of v times the probability that the random variable is equal to v. Now on the previous slide, we had a formula for the probability that R is equal to v. It's simply the sum, over all the outcomes where R is equal to v, of the probability of that outcome. So I can replace that term by the sum over those outcomes of the probability of the outcome. OK. I'm trying to head towards an expression that involves only outcomes; that's the top-level strategy here. So the first thing I did was get rid of the probability that R equals v and replace it by the sum of the probabilities of all the outcomes where R is v. The next step is to distribute the v over the inner sum. I get that this thing is equal to the sum, again over all the outcomes in [R = v], of v times the probability of the outcome. But look, these outcomes are the outcomes where R is equal to v. So I can replace that v by R(omega).
That one slipped sideways a little bit on the slide, so let's watch it: this v simply becomes R(omega). I'm still summing over the same set of omegas, but now I've gotten rid of pretty much everything but the omegas. So I've got an inner sum, over all the omegas in [R = v], of R(omega) times the probability of omega, and an outer sum over all possible v. But the events [R = v] for the different values v split the sample space into disjoint pieces, so if I sum over all possible v and then over all the outcomes where R is equal to v, I wind up summing over every outcome exactly once. And so I've finished the proof that the expectation of R is equal to the sum over all the outcomes omega of R(omega) times the probability of omega. Now I'd never do a proof like this in a lecture, because watching a lecturer write a whole bunch of symbols on the board and manipulate equations is really insipid and boring. Most people can't follow it anyway. I'm hoping that in the video, where you can go back if you wish and replay it and watch it more slowly, at your own speed, the derivation will be of some value to you.
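For reference, here is the chain of equalities from the derivation written out in one place, as a LaTeX sketch. The notation E[R] for the expectation and range(R) for the set of values R takes are assumed here; the slides may write them differently.

```latex
\begin{align*}
\mathrm{E}[R]
  &= \sum_{v \in \mathrm{range}(R)} v \cdot \Pr[R = v]
     && \text{definition} \\
  &= \sum_{v \in \mathrm{range}(R)} v \sum_{\omega \in [R = v]} \Pr[\omega]
     && \Pr[R = v] = \textstyle\sum_{\omega \in [R = v]} \Pr[\omega] \\
  &= \sum_{v \in \mathrm{range}(R)} \; \sum_{\omega \in [R = v]} v \cdot \Pr[\omega]
     && \text{distribute } v \\
  &= \sum_{v \in \mathrm{range}(R)} \; \sum_{\omega \in [R = v]} R(\omega) \Pr[\omega]
     && R(\omega) = v \text{ on } [R = v] \\
  &= \sum_{\omega \in S} R(\omega) \Pr[\omega]
     && \text{the events } [R = v] \text{ partition } S
\end{align*}
```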
But let's step back a little bit and notice some top-level technical things that we never really paid attention to while doing this manipulative proof. The first top-level observation is that this proof, like many proofs in the basic foundations of probability theory, and of random variables in particular, involves taking sums and rearranging their terms a lot. So the first question is, why sums? Remember, here we were summing over all the possible values of some random variable. Why is that a sum? Well, it's a sum because we were assuming that the sample space was countable. There were only countably many values, R(omega 0), R(omega 1), and so on. So we can be sure that the sum over all the possible values of the random variable is a countable sum. It's a sum, and we don't have to worry about integrals, which is the main technical reason why we're doing discrete probability and assuming that there are only countably many outcomes.
There's a second, very important technicality that's worth mentioning. All these proofs involved rearranging terms in sums freely and without care. That means we're implicitly assuming that it's safe to do so, and in particular that the defining sum for expectations is absolutely convergent. All of these sums need to be absolutely convergent for that kind of rearrangement to make sense. Remember that absolute convergence means that the sum of the absolute values of all the terms converges. So look at the definition of expectation: it's the sum, over all the values in the range of R, which we know is a countable sum, of the value times the probability that R equals that value. But the definition never specified the order in which these terms, v times the probability that R equals v, get added up. It had better not make a difference. So we're implicitly assuming absolute convergence of this sum in order for the expectation to even be well defined.
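The worry that the order of summation could matter is real for sums that are not absolutely convergent. This sketch uses the alternating harmonic series 1 - 1/2 + 1/3 - 1/4 + ..., a standard example chosen here for illustration rather than an expectation from the lecture: in its natural order it converges to ln 2, but the sum of absolute values 1 + 1/2 + 1/3 + ... diverges, and a greedy reordering (the idea behind Riemann's rearrangement theorem) steers the partial sums toward whatever target you pick. The function name and the 300,000-term cutoff are illustrative assumptions.

```python
import math


def rearranged_partial_sum(target: float, n_terms: int) -> float:
    """Partial sum of a rearrangement of 1 - 1/2 + 1/3 - 1/4 + ...

    Greedy ordering: take the next unused positive term while the running
    sum is at most `target`, otherwise the next unused negative term.
    Continued forever, this uses every term of the series exactly once.
    """
    total = 0.0
    next_odd, next_even = 1, 2  # denominators of the next +1/odd and -1/even
    for _ in range(n_terms):
        if total <= target:
            total += 1.0 / next_odd
            next_odd += 2
        else:
            total -= 1.0 / next_even
            next_even += 2
    return total


# Natural order: converges to ln 2, about 0.693.
natural = sum((-1) ** (k + 1) / k for k in range(1, 200_001))
print(f"natural order: {natural:.4f}  (ln 2 = {math.log(2):.4f})")

# The same terms, reordered, approach different targets.
for target in (0.0, 1.5, 3.0):
    print(f"target {target}: {rearranged_partial_sum(target, 300_000):.4f}")
```

Nothing like this can happen when the sum of absolute values converges, which is why that assumption is built into the definition of expectation.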
As a matter of fact, here is the terrible pathology that can happen, which you may have learned about in your first calculus course, and we actually have a problem about it in the text: there are sums like this that are not absolutely convergent, where you can pick your favorite value and I can rearrange the terms in the sum so that it converges to that value. When you're dealing with sums that aren't absolutely convergent, rearranging is a no-no. The sum depends crucially on the order in which the terms appear, and all of the reasoning in probability theory would be inapplicable. So we are implicitly assuming that all of these sums are absolutely convergent. Just to get some vocabulary in place: the expected value is also known as the mean value, or the mean, or the expectation of the random variable. Now let's connect expectations with averages in a more precise way. We said that the expectation was kind of an abstraction of averages, but it's even more intimately connected to averages than that. Let's take an example: suppose you have a pile of graded exams, and you pick one at random.
Let S be the score of the randomly picked exam. So this random process of picking an exam from the pile defines a random variable S, where by picking one at random I mean uniformly. Now, S is actually not a uniform random variable. I'm picking the exams with equal probability, so the outcomes are uniformly distributed, but the exams have different scores, and S is not uniform because there might be a lot of outcomes, a lot of exams, with the same score. All right. S is a random variable defined by this process of picking a random exam. And you can check that the expectation of S exactly equals the average exam score, which is the typical thing students want to know when the exam is done: what's the average score? Actually, the average score is often less informative than the median score, the middle score, but people somehow always want to know about averages. The reason the average may not be so informative is that it has some weird properties, which I'll illustrate in a second.
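To see that the expectation of S equals the plain average, here is a sketch with a hypothetical pile of ten exam scores (the scores are invented for illustration, not taken from the lecture). The repeated scores make S non-uniform even though each exam is equally likely to be picked.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical pile of graded exams (scores invented for illustration).
scores = [95, 88, 88, 72, 60, 88, 100, 72, 45, 91]

# Picking one exam uniformly at random: each exam has probability 1/n.
n = len(scores)
pr_exam = Fraction(1, n)

# S = score of the picked exam. Several exams can share a score, so S
# is not uniform: Pr[S = v] = (number of exams scoring v) / n.
pmf_S = {v: count * pr_exam for v, count in Counter(scores).items()}

expected_S = sum(v * p for v, p in pmf_S.items())
average = Fraction(sum(scores), n)
print(expected_S, average, expected_S == average)
```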
But the point here is that we got at the average score on the exam by defining a random variable based on picking a random exam. That's a general process. We can estimate averages in some population of things by estimating the expectations of random variables based on picking random elements from the population we're averaging over. That's called sampling, and it's a basic idea of probability theory: we can get a hold of averages by abstracting the calculation of an average into defining a random variable and calculating its expectation. Let's look at an example. It's obviously impossible for all the exams to be above average, because then the average would be above average. That's absurd. By the way, I don't know how many of you listen to A Prairie Home Companion, but one of its sign-offs is about the town of Lake Wobegon, Minnesota, where all the children are above average. Well, t'ain't possible.
That translates into the technical statement that the probability that a random variable is greater than its expected value is less than 1. A random variable can't always be greater than its expected value; that's absurd. On the other hand, it's actually possible for the probability that the random variable is bigger than its expected value to be as close to 1 as you want. One way to think about that is that, for example, almost everyone has an above-average number of fingers. Think about that for a second. Almost everyone has an above-average number of fingers. Well, the explanation is really simple: amputation is much more common than polydactylism. And if you can't understand what I just said, look it up and think about it.
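The finger example can be made concrete with a made-up population. The counts below are invented purely for illustration and are not real statistics. Even though Pr[R > E[R]] must be less than 1, the few people with fewer than ten fingers pull the mean just below 10, so nearly everyone lands above average.

```python
from fractions import Fraction

# Hypothetical population (counts invented for illustration): most people
# have 10 fingers, some have fewer, a very few have more.
population = {10: 9_960, 9: 30, 8: 8, 11: 2}  # fingers -> number of people

total_people = sum(population.values())
pmf = {f: Fraction(count, total_people) for f, count in population.items()}
mean = sum(f * p for f, p in pmf.items())

# Probability that a randomly chosen person has more fingers than the mean.
pr_above = sum(p for f, p in pmf.items() if f > mean)
print(f"mean fingers = {float(mean):.4f}")
print(f"Pr[fingers > mean] = {float(pr_above):.4f}")
```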