1 00:00:00,700 --> 00:00:04,110 We will now define the notion of a random variable. 2 00:00:04,110 --> 00:00:06,970 Very loosely speaking, a random variable is a numerical 3 00:00:06,970 --> 00:00:10,180 quantity that takes random values. 4 00:00:10,180 --> 00:00:11,410 But what does this mean? 5 00:00:11,410 --> 00:00:14,260 We want to be a little more precise and I'm going to 6 00:00:14,260 --> 00:00:17,140 introduce the idea through an example. 7 00:00:17,140 --> 00:00:22,160 Suppose that our sample space is a set of students labeled 8 00:00:22,160 --> 00:00:23,780 according to their names. 9 00:00:23,780 --> 00:00:33,920 Or for simplicity, let's just label them as a, b, c, and d. 10 00:00:33,920 --> 00:00:37,200 Our probabilistic experiment is to pick a student at random 11 00:00:37,200 --> 00:00:41,690 according to some probability law and then record their 12 00:00:41,690 --> 00:00:43,640 weight in kilograms. 13 00:00:47,700 --> 00:00:51,520 So for example, suppose that the outcome of the experiment 14 00:00:51,520 --> 00:00:54,600 was this particular student, and the weight of 15 00:00:54,600 --> 00:00:57,430 that student is 62. 16 00:00:57,430 --> 00:01:00,370 Or it could be that the outcome of the experiment is 17 00:01:00,370 --> 00:01:03,980 this particular student, and that particular student has a 18 00:01:03,980 --> 00:01:07,900 weight of 75 kilograms. 19 00:01:07,900 --> 00:01:11,930 The weight of a particular student is a number, little w. 20 00:01:11,930 --> 00:01:15,280 But let us think of the abstract concept of weight, 21 00:01:15,280 --> 00:01:21,170 something that we will denote by capital W. Weight is an 22 00:01:21,170 --> 00:01:24,700 object whose value is determined once you tell me 23 00:01:24,700 --> 00:01:28,300 the outcome of the experiment, once you tell me which student 24 00:01:28,300 --> 00:01:29,370 was picked. 25 00:01:29,370 --> 00:01:33,080 In this sense, weight is really a function of the 26 00:01:33,080 --> 00:01:34,960 outcome of the experiment. 27 00:01:34,960 --> 00:01:40,440 So think of weight as an abstract box that takes as 28 00:01:40,440 --> 00:01:51,130 input a student and produces a number, little w, which is the 29 00:01:51,130 --> 00:01:54,650 weight of that particular student. 30 00:01:54,650 --> 00:01:59,220 Or more concretely, think of weight with a capital W as a 31 00:01:59,220 --> 00:02:02,820 procedure that takes a student, puts him or her on a 32 00:02:02,820 --> 00:02:04,840 scale, and reports the result. 33 00:02:04,840 --> 00:02:08,620 In this sense, weight is an object of the same kind as the 34 00:02:08,620 --> 00:02:13,740 square root function that's sitting inside your computer. 35 00:02:13,740 --> 00:02:16,450 The square root function is a function. 36 00:02:16,450 --> 00:02:20,660 It's a subroutine, perhaps it is a piece of code, that takes 37 00:02:20,660 --> 00:02:25,460 as input a number, let's say the number 9, and produces 38 00:02:25,460 --> 00:02:26,750 another number. 39 00:02:26,750 --> 00:02:30,110 In this case, it would be the number 3, which is the 40 00:02:30,110 --> 00:02:31,720 square root of 9. 41 00:02:34,300 --> 00:02:37,660 Notice here the distinction that we will keep emphasizing 42 00:02:37,660 --> 00:02:39,050 over and over. 43 00:02:39,050 --> 00:02:41,190 Square root of 9 is a number. 44 00:02:41,190 --> 00:02:43,120 It is the number 3. 45 00:02:43,120 --> 00:02:46,375 The box square root is a function. 46 00:02:49,320 --> 00:02:52,810 Now, let us go back to our probabilistic experiment. 47 00:02:52,810 --> 00:02:55,470 Note that a probabilistic experiment such as the one in 48 00:02:55,470 --> 00:02:59,720 our example can have several associated random variables. 49 00:02:59,720 --> 00:03:03,290 For example, we could have another random variable 50 00:03:03,290 --> 00:03:08,470 denoted by capital H, which is the height of a student 51 00:03:08,470 --> 00:03:10,665 recorded in meters. 52 00:03:13,350 --> 00:03:17,030 So if the outcome of the experiment, for example, was 53 00:03:17,030 --> 00:03:21,750 student a, then this random variable would take a value 54 00:03:21,750 --> 00:03:26,690 which is the height of that student, let's say it was 1.7. 55 00:03:26,690 --> 00:03:30,010 Or if the outcome of the experiment was student c, then 56 00:03:30,010 --> 00:03:32,520 we would record the height of that student. 57 00:03:32,520 --> 00:03:35,920 And let's say it turns out to be 1.8. 58 00:03:35,920 --> 00:03:39,540 Once again, height with a capital H is an abstract 59 00:03:39,540 --> 00:03:44,610 object, a function whose value is determined once you tell me 60 00:03:44,610 --> 00:03:47,840 the outcome of the experiment. 61 00:03:47,840 --> 00:03:51,660 Now, given some random variables, we can create new 62 00:03:51,660 --> 00:03:54,840 random variables as functions of the 63 00:03:54,840 --> 00:03:56,730 original random variables. 64 00:03:56,730 --> 00:04:02,170 For example, consider the quantity defined as weight 65 00:04:02,170 --> 00:04:05,010 divided by height squared. 66 00:04:05,010 --> 00:04:08,930 This quantity is the so-called body mass index, and it is 67 00:04:08,930 --> 00:04:12,860 also a function on the sample space. 68 00:04:12,860 --> 00:04:15,630 Why is it a function on the sample space? 69 00:04:15,630 --> 00:04:19,540 Well, because once an outcome of the experiment is 70 00:04:19,540 --> 00:04:22,580 determined, suppose that the outcome of the experiment was 71 00:04:22,580 --> 00:04:26,830 the blue student, then these two numbers, 62 and 1.7, are 72 00:04:26,830 --> 00:04:28,180 also determined. 73 00:04:28,180 --> 00:04:32,650 And using those numbers, we can carry out this calculation 74 00:04:32,650 --> 00:04:36,940 and find the body mass index of that particular student, 75 00:04:36,940 --> 00:04:40,510 which in this case would be 21.5. 76 00:04:40,510 --> 00:04:44,960 Or if it happened that this student was selected, then the 77 00:04:44,960 --> 00:04:48,980 body mass index would turn out to be some other number. 78 00:04:48,980 --> 00:04:51,650 In this case, it would be 23. 79 00:04:51,650 --> 00:04:55,650 So again, we see that the body mass index can be viewed as an 80 00:04:55,650 --> 00:04:58,820 abstract concept defined by this formula. 81 00:04:58,820 --> 00:05:02,510 But once an outcome is determined, then the body mass 82 00:05:02,510 --> 00:05:04,330 index is also determined. 83 00:05:04,330 --> 00:05:08,090 And so the body mass index is really a function of which 84 00:05:08,090 --> 00:05:12,620 particular outcome was selected. 85 00:05:12,620 --> 00:05:16,280 Let us now abstract from the previous discussion. 86 00:05:16,280 --> 00:05:21,930 We have seen that random variables are abstract objects 87 00:05:21,930 --> 00:05:26,330 that associate a specific value, a particular number, to 88 00:05:26,330 --> 00:05:30,700 any particular outcome of a probabilistic experiment. 89 00:05:30,700 --> 00:05:33,760 So in that sense, random variables are functions from 90 00:05:33,760 --> 00:05:36,690 the sample space to the real numbers. 91 00:05:36,690 --> 00:05:39,970 They are numerical functions, but as numerical functions 92 00:05:39,970 --> 00:05:42,760 they can either take discrete values, for example the 93 00:05:42,760 --> 00:05:46,200 integers, or they can take continuous values, let's say 94 00:05:46,200 --> 00:05:48,260 on the real line. 95 00:05:48,260 --> 00:05:52,200 For example, if your random variable is the number of 96 00:05:52,200 --> 00:05:56,250 heads in 10 consecutive coin tosses, this is a discrete 97 00:05:56,250 --> 00:05:58,630 random variable that takes values in the 98 00:05:58,630 --> 00:06:01,220 set from 0 to 10. 99 00:06:01,220 --> 00:06:05,770 If your random variable is a measurement of the time at 100 00:06:05,770 --> 00:06:10,900 which something happened, and if your timer has infinite 101 00:06:10,900 --> 00:06:15,090 accuracy, then the timer reports a real number and we 102 00:06:15,090 --> 00:06:18,710 would have a continuous random variable. 103 00:06:18,710 --> 00:06:23,830 In this lecture sequence and in the next few ones, we will 104 00:06:23,830 --> 00:06:26,730 concentrate on discrete random variables because they are 105 00:06:26,730 --> 00:06:28,120 easier to handle. 106 00:06:28,120 --> 00:06:30,550 And then later on, we will move to a discussion of 107 00:06:30,550 --> 00:06:33,060 continuous random variables. 108 00:06:33,060 --> 00:06:38,230 Throughout, we want to keep noting this very important 109 00:06:38,230 --> 00:06:42,390 distinction that I already brought in the discussion for 110 00:06:42,390 --> 00:06:46,260 a particular example, but it needs to be emphasized and 111 00:06:46,260 --> 00:06:47,750 re-emphasized. 112 00:06:47,750 --> 00:06:50,540 That we make a distinction between random variables, 113 00:06:50,540 --> 00:06:52,030 which are abstract objects. 114 00:06:52,030 --> 00:06:54,560 They are functions on the sample space and they are 115 00:06:54,560 --> 00:06:57,270 denoted by uppercase letters. 116 00:06:57,270 --> 00:07:01,340 In contrast, we will use lower case letters to indicate 117 00:07:01,340 --> 00:07:05,000 numerical values of the random variables. 118 00:07:05,000 --> 00:07:09,290 So little x is always a real number, as opposed to the 119 00:07:09,290 --> 00:07:13,720 random variable, which is a function. 120 00:07:13,720 --> 00:07:16,670 One point that we made earlier is that for the same 121 00:07:16,670 --> 00:07:19,490 probabilistic experiment we can have several random 122 00:07:19,490 --> 00:07:22,400 variables associated with that experiment. 123 00:07:22,400 --> 00:07:25,080 And we can also combine random variables to 124 00:07:25,080 --> 00:07:27,470 form new random variables. 125 00:07:27,470 --> 00:07:32,810 In general, a function of random variables has numerical 126 00:07:32,810 --> 00:07:37,060 values that are determined by the numerical values of the 127 00:07:37,060 --> 00:07:38,870 original random variables. 128 00:07:38,870 --> 00:07:42,070 And so, ultimately, they are determined by the outcome of 129 00:07:42,070 --> 00:07:43,140 the experiment. 130 00:07:43,140 --> 00:07:45,780 So a function of random variables has a numerical 131 00:07:45,780 --> 00:07:48,409 value which is completely determined by the outcome of 132 00:07:48,409 --> 00:07:49,159 the experiment. 133 00:07:49,159 --> 00:07:52,080 And so a function of random variables is 134 00:07:52,080 --> 00:07:54,440 also a random variable. 135 00:07:54,440 --> 00:07:58,980 As an example, we could think of two random variables, X and 136 00:07:58,980 --> 00:08:02,700 Y, associated with the same probabilistic experiment. 137 00:08:02,700 --> 00:08:07,060 And then define a random variable, let's say X plus Y. 138 00:08:07,060 --> 00:08:08,230 What does that mean? 139 00:08:08,230 --> 00:08:17,770 X plus Y is a random variable that takes the value little x 140 00:08:17,770 --> 00:08:25,560 plus little y when the random variable capital X takes the 141 00:08:25,560 --> 00:08:34,240 value little x and capital Y takes the value little y. 142 00:08:36,820 --> 00:08:39,400 So X and Y are random variables. 143 00:08:39,400 --> 00:08:42,140 X plus Y is another random variable. 144 00:08:42,140 --> 00:08:45,540 X and Y will take numerical values once the outcome of the 145 00:08:45,540 --> 00:08:47,510 experiment has been obtained. 146 00:08:47,510 --> 00:08:50,740 And if the numerical values that they take are little x 147 00:08:50,740 --> 00:08:55,160 and little y, then the random variable X plus Y will take 148 00:08:55,160 --> 00:09:00,040 the numerical value little x plus little y. 149 00:09:00,040 --> 00:09:03,700 So we can now move on and start doing some interesting 150 00:09:03,700 --> 00:09:05,820 things about random variables. 151 00:09:05,820 --> 00:09:09,130 Characterize them, describe them, give some examples, and 152 00:09:09,130 --> 00:09:11,860 introduce some new concepts associated with them.