We will now talk about conditional expectations of one random variable given another. As we will see, there will be nothing new here, only results that we have already seen, but given in new notation.

Any PMF has an associated expectation. And so conditional PMFs also have associated expectations, which we call conditional expectations. We have already seen them for the case where we condition on an event, A. The case where we condition on random variables is exactly the same. We let the event A be the event that Y takes on a specific value, and then we calculate the expectation using the relevant conditional probabilities, those that are given by the conditional PMF. So the conditional expectation of X, given that Y takes on a certain value, is defined as the usual expectation, except that we use the conditional probabilities that apply given that Y takes on a specific value, little y.

Recall now the expected value rule for ordinary expectations, and also the expected value rule for conditional expectations given an event, something that we have already seen. Now, in PMF notation, the expected value rule takes a similar form. The event A is replaced by the event that Y takes on a specific value. And in that case, the conditional PMF given the event A is just the conditional PMF given that the random variable Y takes on a specific value, little y.

For the case where we condition on events, we also developed a version of the total probability theorem and of the total expectation theorem. We can do the same when we condition on random variables. So suppose that the sample space has been partitioned into n disjoint scenarios. The total probability theorem tells us that the probability of the event that the random variable X takes on a value little x can be found by taking the probabilities of this event under each one of the possible scenarios, and then weighting those probabilities according to the probabilities of the different scenarios.
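For reference, here is a minimal sketch of the relations just described, written in standard PMF notation; the symbols (p_{X|Y}, p_{X|A_i}, and so on) follow the usual conventions rather than anything reproduced verbatim from the lecture slides.

```latex
% Conditional expectation of X given the event {Y = y}:
\[
  \mathbf{E}[X \mid Y = y] \;=\; \sum_{x} x \, p_{X \mid Y}(x \mid y)
\]

% Expected value rule, in conditional form:
\[
  \mathbf{E}[g(X) \mid Y = y] \;=\; \sum_{x} g(x) \, p_{X \mid Y}(x \mid y)
\]

% Total probability theorem, for a partition of the sample space
% into disjoint scenarios A_1, ..., A_n:
\[
  p_X(x) \;=\; \sum_{i=1}^{n} \mathbf{P}(A_i) \, p_{X \mid A_i}(x)
\]
```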
Now, suppose that we are dealing with a random variable Y that takes values in a set consisting of n elements. And let us consider scenarios A_i, where the i-th scenario is the event that the random variable Y takes on the i-th possible value. We can apply the total probability theorem to this situation. We can find the probability that the random variable X takes on a certain value, little x, by considering the probability of this event happening under each possible scenario, where a scenario is that Y took on a specific value, and then weighting those probabilities according to the probabilities of the different scenarios.

The story with the total expectation theorem is similar. We know that an expectation can be found by taking the conditional expectations under each one of the scenarios and weighting them according to the probabilities of the different scenarios. Again, let the event that Y takes on a specific value be a different scenario. With this correspondence we obtain the following version of the total expectation theorem. We have a sum of different terms, and each term in the sum is the probability of a given scenario times the expected value of X under this particular scenario.
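As a sanity check on the two identities just stated, here is a minimal numerical sketch in Python; the joint PMF of (X, Y) used below is made up purely for illustration and is not taken from the lecture.

```python
# Hypothetical joint PMF of (X, Y): keys are (x, y), values are probabilities.
p_xy = {
    (0, 1): 0.10, (1, 1): 0.20, (2, 1): 0.10,
    (0, 2): 0.15, (1, 2): 0.05, (2, 2): 0.40,
}

xs = sorted({x for (x, y) in p_xy})
ys = sorted({y for (x, y) in p_xy})

# Marginal PMFs of X and Y.
p_x = {x: sum(p_xy[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(p_xy[(x, y)] for x in xs) for y in ys}

def p_x_given_y(x, y):
    # Conditional PMF: p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y).
    return p_xy[(x, y)] / p_y[y]

# Total probability theorem: p_X(x) = sum over y of p_Y(y) * p_{X|Y}(x | y).
for x in xs:
    assert abs(p_x[x] - sum(p_y[y] * p_x_given_y(x, y) for y in ys)) < 1e-12

def cond_exp_x(y):
    # Conditional expectation: E[X | Y = y] = sum over x of x * p_{X|Y}(x | y).
    return sum(x * p_x_given_y(x, y) for x in xs)

# Total expectation theorem: E[X] = sum over y of p_Y(y) * E[X | Y = y].
e_x = sum(x * p_x[x] for x in xs)
e_x_by_scenarios = sum(p_y[y] * cond_exp_x(y) for y in ys)
assert abs(e_x - e_x_by_scenarios) < 1e-12
print(e_x, e_x_by_scenarios)  # both equal 1.25 for this particular PMF
```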
At this point, I have to add a comment of a more mathematical flavor. We have been talking about a partition of the sample space into finitely many scenarios. But if Y takes values in a discrete but infinite set, for example, if Y can take any integer value, the argument that we have given is not quite complete. Fortunately, the total probability theorem and the total expectation theorem both remain true, even for the case where Y ranges over an infinite set, as long as the random variable X has a well-defined expectation. For the total probability theorem, the proof for the general case can be carried out without much difficulty, just using the countable additivity axiom. However, the total expectation theorem takes some harder mathematical work, and this is beyond our scope. We will just take this fact for granted, that the total expectation theorem carries over to the case where we are adding over an infinite sequence of possible values of Y.

In the rest of the course we will often use the total expectation theorem, including in cases where Y ranges over an infinite discrete set. In fact, we will see that this theorem is an extremely useful tool that can be used to divide and conquer complicated models.
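As a small illustration of the infinite case, here is a hedged sketch in which Y is assumed to take the values 1, 2, 3, ... with PMF (1/2)^y and the conditional expectation of X given Y = y is taken to be y; both modeling choices are hypothetical and serve only to show how the infinite sum in the total expectation theorem can be evaluated (or checked numerically by truncation).

```python
# Total expectation theorem with Y ranging over an infinite discrete set.
# Hypothetical model: p_Y(y) = (1/2)**y for y = 1, 2, 3, ...,
# and E[X | Y = y] = y.  Analytically, E[X] = sum over y of y * (1/2)**y = 2.

def p_y(y):
    return 0.5 ** y

def cond_exp_x(y):
    return float(y)

# Truncate the infinite sum; the omitted tail is negligible for large y.
e_x = sum(p_y(y) * cond_exp_x(y) for y in range(1, 200))
print(e_x)  # approximately 2.0
```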