The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JOHN TSITSIKLIS: Today we're going to finish our discussion of the Poisson process. We're going to see a few of its properties, do a few interesting problems, some more interesting than others. So we'll go through a few examples, and then we're going to talk about some quite strange things that happen with the Poisson process.

So the first thing is to remember what the Poisson process is. It's a model, let's say, of arrivals of customers that are, in some sense, quote unquote, completely random. That is, a customer can arrive at any point in time. All points in time are equally likely. And different points in time are sort of independent of other points in time. So the fact that I got an arrival now doesn't tell me anything about whether there's going to be an arrival at some other time.

In some sense, it's a continuous-time version of the Bernoulli process. So the best way to think about the Poisson process is that we divide time into extremely tiny slots. And in each time slot, there's an independent possibility of having an arrival. Different time slots are independent of each other. On the other hand, when the slot is tiny, the probability of obtaining an arrival during that tiny slot is itself going to be tiny.

So we capture these properties in a formal definition of what the Poisson process is. We have a probability mass function for the number of arrivals, k, during an interval of a given length. So this is the basic description of the distribution of the number of arrivals. So tau is fixed, and k is the parameter. So when we add over all k's, the sum of these probabilities has to be equal to 1.
There's a time homogeneity assumption, which is hidden in this, namely, the only thing that matters is the duration of the time interval, not where the time interval sits on the real axis. Then we have an independence assumption. Intervals that are disjoint are statistically independent of each other. So any information you give me about arrivals during this time interval doesn't change my beliefs about what's going to happen during another time interval. So this is a generalization of the idea that we had in Bernoulli processes, that different time slots are independent of each other.

And then to specify this function, the distribution of the number of arrivals, we sort of go in stages. We first specify this function for the case where the time interval is very small. And I'm telling you what those probabilities will be. And based on these, we then do some calculations to find the formula for the distribution of the number of arrivals for intervals of a general duration.

So for a small duration, delta, the probability of obtaining 1 arrival is lambda delta. The remaining probability is assigned to the event that we get no arrivals during that interval. The probability of obtaining more than 1 arrival in a tiny interval is essentially 0. And when we say essentially, it means modulo terms that are of order delta squared. And when delta is very small, anything which is delta squared can be ignored. So up to delta squared terms, that's what happens during a little interval.

Now, if we know the probability distribution for the number of arrivals in a little interval, we can use this to get the distribution for the number of arrivals over several intervals. How do we do that? The big interval is composed of many little intervals. Each little interval is independent of any other little interval, so it is as if we have a sequence of Bernoulli trials. Each Bernoulli trial is associated with a little interval and has a small probability of obtaining a success, or an arrival, during that mini-slot.
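As a quick numerical illustration of the "up to delta squared" statement, here is a minimal Python sketch (the rate lambda = 0.6 is just an illustrative value) that compares the exact probabilities for a short interval with lambda delta:

```python
import math

# Numerical check of the small-interval probabilities, using an illustrative
# rate of lambda = 0.6 arrivals per unit time.
lam = 0.6
for delta in [0.1, 0.01, 0.001]:
    p0 = math.exp(-lam * delta)                  # P(0 arrivals in an interval of length delta)
    p1 = lam * delta * math.exp(-lam * delta)    # P(exactly 1 arrival)
    p_many = 1 - p0 - p1                         # P(2 or more arrivals)
    print(f"delta={delta}: P(1 arrival)={p1:.6f}  lambda*delta={lam*delta:.6f}  "
          f"P(>=2 arrivals)={p_many:.2e}")
```

The probability of a single arrival tracks lambda delta, while the probability of two or more arrivals shrinks like delta squared, which is exactly the sense in which it is "essentially 0."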
On the other hand, when delta is small and you take a big interval and chop it up, you get a large number of little intervals. So what we essentially have here is a Bernoulli process in which the number of trials is huge but the probability of success during any given trial is tiny. The number of trials ends up being proportional to the length of the interval. If you have twice as large an interval, it's as if you're having twice as many of these mini-trials, so the expected number of arrivals will increase proportionately.

There's also this parameter lambda, which we interpret as the expected number of arrivals per unit time. And it comes in through those probabilities here. When you double lambda, this means that a little interval is twice as likely to get an arrival. So you would expect to get twice as many arrivals as well. That's why the expected number of arrivals during an interval of length tau also scales proportionally to this parameter lambda. Somewhat unexpectedly, it turns out that the variance of the number of arrivals is also the same as the mean. This is a peculiarity that happens in the Poisson process.

So this is one way of thinking about the Poisson process, in terms of little intervals, each one of which has a tiny probability of success. And we think of the distribution associated with that process as being described by this particular PMF. So this is the PMF for the number of arrivals during an interval of a fixed duration, tau. It's a PMF that extends over the entire range of non-negative integers. So the number of arrivals you can get during an interval of a certain length can be anything. You can get as many arrivals as you want. Of course, the probability of getting a zillion arrivals is going to be tiny. But in principle, this is possible. And that's because an interval, even if it's of a fixed length, consists of an infinite number of mini-slots in some sense. You can divide it, chop it up, into as many mini-slots as you want.
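For reference, the PMF being described is the standard Poisson PMF, P(k arrivals in time tau) = (lambda tau)^k e^(-lambda tau) / k!. A small sketch, with illustrative parameters, checking that it sums to 1 and that the mean and the variance both come out equal to lambda tau:

```python
import math

def poisson_pmf(k, lam, tau):
    """P(k arrivals in an interval of length tau), for a Poisson process of rate lam."""
    mu = lam * tau
    return math.exp(-mu) * mu**k / math.factorial(k)

lam, tau = 0.6, 2.0                 # illustrative parameters
pmf = [poisson_pmf(k, lam, tau) for k in range(60)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

print(f"sum over k ~ {sum(pmf):.6f}")                      # the probabilities add up to 1
print(f"mean       ~ {mean:.4f}  (lambda*tau = {lam * tau})")
print(f"variance   ~ {var:.4f}  (also lambda*tau)")
```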
So, in principle, it's possible that every mini-slot gets an arrival. In principle, it's possible to get an arbitrarily large number of arrivals. So this particular formula here is not very intuitive when you look at it. But it's a legitimate PMF. And it's called the Poisson PMF. It's the PMF that describes the number of arrivals.

So that's one way of thinking about the Poisson process, where the basic object of interest would be this PMF, and you try to work with it. There's another way of thinking about what happens in the Poisson process. And this has to do with letting things evolve in time. You start at time 0. There's going to be a time at which the first arrival occurs; call that time T1. This time turns out to have an exponential distribution with parameter lambda. Once you get an arrival, it's as if the process starts fresh. The best way to understand why this is the case is by thinking in terms of the analogy with the Bernoulli process. If you believe that statement for the Bernoulli process, since this is a limiting case, it should also be true here.

So starting from this time, we're going to wait a random amount of time until we get the second arrival. This random amount of time, let's call it T2. This time T2 is also going to have an exponential distribution with the same parameter, lambda. And these two are going to be independent of each other. OK? So the Poisson process has all the same memorylessness properties that the Bernoulli process has.

What's another way of thinking of this property? So think of a process where you have a light bulb. The time at which the light bulb burns out can be modeled by an exponential random variable. And suppose that we are sitting at some time, t, and I tell you that the light bulb has not yet burned out. What does this tell you about the future of the light bulb? The fact that it didn't burn out so far, is it good news or is it bad news?
Would you rather keep this light bulb that has worked for t time steps and is still OK? Or would you rather use a new light bulb that starts new at that point in time? Because of the memorylessness property, the past of that light bulb doesn't matter. So the future of this light bulb is statistically the same as the future of a new light bulb. For both of them, the time until they burn out is going to be described by an exponential distribution. So one way that people describe the situation is to say that used is exactly as good as new. A used one is no worse than a new one. A used one is no better than a new one. So a used light bulb that hasn't yet burned out is exactly as good as a new light bulb. So that's another way of thinking about the memorylessness that we have in the Poisson process.

Back to this picture. The time until the second arrival is the sum of two independent exponential random variables. So, in principle, you can use the convolution formula to find the distribution of T1 plus T2, and that would be what we call Y2, the time until the second arrival. But there's also a direct way of obtaining the distribution of Y2, and this is the calculation that we did last time on the blackboard. And actually, we did it more generally. We found the time until the k-th arrival occurs. It has a closed-form formula, which is called the Erlang distribution with k degrees of freedom.

So let's see what's going on here. It's a distribution of what kind? It's a continuous distribution. It's a probability density function. This is because the time is a continuous random variable. Time is continuous. Arrivals can happen at any time. So we're talking about a PDF. This k is just a parameter of the distribution. We're talking about the k-th arrival, so k is a fixed number. Lambda is another parameter of the distribution, which is the arrival rate. So it's a PDF over the y's, whereas lambda and k are parameters of the distribution.
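Here is a minimal simulation sketch of the Erlang statement, with illustrative parameters: the k-th arrival time Y_k, built as a sum of k independent exponential interarrival times, matches the Erlang density f(y) = lambda^k y^(k-1) e^(-lambda y) / (k-1)!.

```python
import math
import random

# Illustrative parameters: rate lambda = 0.6, looking at the 3rd arrival time.
lam, k = 0.6, 3
random.seed(0)

# Y_k as a sum of k independent exponentials.
samples = [sum(random.expovariate(lam) for _ in range(k)) for _ in range(200_000)]

def erlang_pdf(y):
    return lam**k * y**(k - 1) * math.exp(-lam * y) / math.factorial(k - 1)

# Compare the empirical probability of a small bin with the PDF times the bin width.
y0, width = 5.0, 0.2
empirical = sum(y0 <= s < y0 + width for s in samples) / len(samples)
print(f"empirical P(Y_k in [5.0, 5.2)) ~ {empirical:.4f}")
print(f"Erlang PDF * bin width         ~ {erlang_pdf(y0 + width / 2) * width:.4f}")
```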
OK. So this was what we knew from last time. Just to get some practice, let us do a problem that's not too difficult, but just to see how we use the various formulas that we have.

So Poisson was a mathematician, but poisson also means fish in French. So Poisson goes fishing. And let's assume that fish are caught according to a Poisson process. That's not too bad an assumption. At any given point in time, you have a little probability that a fish will be caught. And whether you catch one now is sort of independent of whether at some later time a fish will be caught or not. So let's just make this assumption. And suppose that the rules of the game are as follows. Fish are being caught at a certain rate of 0.6 per hour. You fish for 2 hours, no matter what. And then there are two possibilities. If I have caught a fish, I stop and go home. So if some fish have been caught, so there's at least 1 arrival during this interval, I go home. Or if nothing has been caught, I continue fishing until I catch something. And then I go home. So that's the description of what is going to happen.

And now let's start asking questions of all sorts. What is the probability that I'm going to be fishing for more than 2 hours? I will be fishing for more than 2 hours if and only if no fish were caught during those 2 hours, in which case I will have to continue. Therefore, this is just this quantity, the probability of catching 0 fish in the next 2 hours. And according to the formula that we have, this is going to be e to the minus lambda times how much time we have. There's another way of thinking about this. The probability that I fish for more than 2 hours is the probability that the first catch happens after time 2, which would be the integral from 2 to infinity of the density of the first arrival time. And that density is an exponential.
So you do the integral of an exponential, and, of course, you would get the same answer. OK, that's easy.

So what's the probability of fishing for more than 2 but less than 5 hours? What does it take for this to happen? For this to happen, we need to catch 0 fish from time 0 to 2 and catch the first fish sometime between 2 and 5. So one way of thinking about what's happening here might be to say that there's a Poisson process that keeps going on forever. But as soon as I catch the first fish, instead of continuing fishing and obtaining those other fish, I just go home right then. Now, the fact that I go home before time 5 means that, if I were to stay until time 5, I would have caught at least 1 fish. I might have caught more than 1. So the event of interest here is that the first catch happens between times 2 and 5.

So one way of calculating this quantity would be: it's the probability that the first catch happens between times 2 and 5. Another way to deal with it is to say, this is the probability that I caught 0 fish in the first 2 hours, times the probability that I catch at least 1 fish during the next 3 hours. What is this? The probability of 0 fish in the next 3 hours is the probability of 0 fish during this time. And 1 minus this is the probability of catching at least 1 fish, of having at least 1 arrival, between times 2 and 5. If there's at least 1 arrival between times 2 and 5, then I would have gone home by time 5. So both of these, if you plug in numbers and all that, of course, are going to give you the same answer.

Now next, what's the probability that I catch at least 2 fish? In which scenario are we? Under this scenario, I go home when I catch my first fish. So in order to catch at least 2 fish, it must be in this case. So this is the same as the event that I catch at least 2 fish during the first 2 hours.
So it's going to be the probability, summed from 2 to infinity, that I catch 2 fish, or that I catch 3 fish, or more than that. So it's this quantity. k is the number of fish that I catch. At least 2, so k goes from 2 to infinity. These are the probabilities of catching a number k of fish during this interval. And if you want a simpler form without an infinite sum, this would be 1 minus the probability of catching 0 fish, minus the probability of catching 1 fish, during a time interval of length 2.

Another way to think of it: I'm going to catch at least 2 fish if and only if the second fish caught in this process happens before time 2. So that's another way of thinking about the same event. So it's going to be the probability that the random variable Y2, the arrival time of the second fish, is less than or equal to 2.

OK. The next one is a little trickier. Here we need to do a little bit of divide and conquer. Overall, in this expedition, what is the expected number of fish to be caught? One way to think about it is to try to use the total expectation theorem, and think of the expected number of fish given this scenario, and the expected number of fish given that scenario. That's a little more complicated than the way I'm going to do it. The way I'm going to do it is to think as follows. The expected number of fish is the expected number of fish caught between times 0 and 2, plus the expected number of fish caught after time 2.

So what's the expected number caught between times 0 and 2? This is lambda t. So lambda is 0.6, times 2. This is the expected number of fish that are caught between times 0 and 2. Now let's think about the expected number of fish caught afterwards. How many fish are caught afterwards? Well, it depends on the scenario. If we're in this scenario, we've gone home, and we catch 0. If we're in this scenario, then we continue fishing until we catch one.
So the expected number of fish to be caught after time 2 is going to be the probability of this scenario times 1. And the probability of that scenario is the probability that I caught 0 fish during the first 2 hours, times 1, which is the number of fish I'm going to catch if I continue.

The expected total fishing time we can calculate in exactly the same way. I'm jumping to the last one. My total fishing time has a period of 2 hours: I'm going to fish for 2 hours no matter what. And then, if I caught 0 fish, which happens with this probability, my expected time is going to be the expected time from here onwards, which is the expected value of an exponential random variable with parameter lambda. So the expected time is 1 over lambda. And in our case, this is 1/0.6.

Finally, if I tell you that I have been fishing for 4 hours and nothing has been caught so far, how much do you expect this quantity to be? Here the story is, again, that for the Poisson process, used is as good as new. The process does not have any memory. What happened in the past doesn't matter for the future. It's as if the process starts new at this point in time. So this one is going to be, again, the same exponentially distributed random variable with the same parameter lambda. So the expected time until an arrival comes has an exponential distribution with parameter lambda, no matter what has happened in the past. Starting from now and looking into the future, it's as if the process has just started. So it's going to be 1 over lambda, which is 1/0.6. OK.
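Plugging numbers into the fishing answers above, with lambda = 0.6 fish per hour as in the example, here is a small sketch of the arithmetic (one of several ways to organize it):

```python
import math

lam = 0.6  # fish per hour, from the example

def poisson(k, mu):
    """P(k arrivals) when the expected number of arrivals is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

p_more_than_2h = poisson(0, lam * 2)                            # no catch in the first 2 hours
p_between_2_and_5 = poisson(0, lam * 2) * (1 - poisson(0, lam * 3))
p_at_least_2_fish = 1 - poisson(0, lam * 2) - poisson(1, lam * 2)
expected_fish = lam * 2 + poisson(0, lam * 2) * 1               # catches in [0,2], plus one more if none so far
expected_time = 2 + poisson(0, lam * 2) * (1 / lam)             # 2 hours, plus an exponential wait if no catch

print(f"P(fish for more than 2 hours)   = {p_more_than_2h:.4f}")
print(f"P(fish between 2 and 5 hours)   = {p_between_2_and_5:.4f}")
print(f"P(catch at least 2 fish)        = {p_at_least_2_fish:.4f}")
print(f"E[number of fish caught]        = {expected_fish:.4f}")
print(f"E[total fishing time] in hours  = {expected_time:.4f}")
```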
Now our next example is going to be a little more complicated, or subtle. But before we get to the example, let's refresh our memory about what we discussed last time about merging independent Poisson processes. Instead of drawing the picture that way, another way we could draw it is this. We have a Poisson process with rate lambda1 and a Poisson process with rate lambda2. Each one of these has its own arrivals. And then we form the merged process. And the merged process records an arrival whenever there's an arrival in either of the two processes. This process and that process are assumed to be independent of each other.

Now, different times in this process and in that process are independent of each other. So what happens in these two time intervals is independent from what happens in these two time intervals. These two time intervals determine what happens here. These two time intervals determine what happens there. So because these are independent from these, this means that this is also independent from that. So the independence assumption is satisfied for the merged process. And the merged process turns out to be a Poisson process.

And if you want to find the arrival rate for that process, you argue as follows. During a little interval of length delta, we have probability lambda1 delta of having an arrival in this process. We have probability lambda2 delta of an arrival in this process, plus second-order terms in delta, which we're ignoring. And then you do the calculation and you find that in this process, you're going to have an arrival with probability (lambda1 plus lambda2) delta, again ignoring terms that are second order in delta. So the merged process is a Poisson process whose arrival rate is the sum of the arrival rates of the individual processes.

And the calculation we did at the end of the last lecture: if I tell you that a new arrival happened here, where did that arrival come from? Did it come from here or from there? If lambda1 is equal to lambda2, then by symmetry you would say that it's equally likely to have come from here or from there. But if this lambda is much bigger than that lambda, the arrival that I saw is more likely to have come from there. And the formula that captures this is the following.
This is the probability that my arrival has come from this particular stream rather than that particular stream. So when an arrival comes and you ask, what is the origin of that arrival? It's as if I'm flipping a coin with these odds. And depending on the outcome of that coin, I'm going to tell you it came from here or it came from there. So the origin of an arrival is either this stream or that stream. And this is the probability that the origin of the arrival is that one.

Now, if we look at 2 different arrivals and we ask about their origins, let's think about the origin of this arrival and compare it with the origin of that arrival. The origin of this arrival is random. It could be either this or that. And this is the relevant probability. The origin of that arrival is random. It could be either here or there, and again, with the same relevant probability. Question: the origin of this arrival, is it dependent on or independent of the origin of that arrival?

And here's how the argument goes. Separate times are independent. Whatever has happened in the process during this set of times is independent from whatever happened in the process during that set of times. Because different times have nothing to do with each other, the origin of an arrival here has nothing to do with the origin of an arrival there. So the origins of different arrivals are also independent random variables. So it is as if, each time that you have an arrival in the merged process, you're flipping a coin to determine where that arrival came from, and these coins are independent of each other.
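A short simulation sketch of the merging statements, with illustrative rates: the merged stream has rate lambda1 + lambda2, and each merged arrival originated in the first stream with probability lambda1 / (lambda1 + lambda2).

```python
import random

lam1, lam2, horizon = 2.0, 1.0, 10_000.0   # illustrative rates and time horizon
random.seed(1)

def count_arrivals(lam):
    """Count the arrivals of a rate-lam Poisson process up to the horizon."""
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam)       # exponential interarrival times
        if t > horizon:
            return n
        n += 1

n1, n2 = count_arrivals(lam1), count_arrivals(lam2)
total = n1 + n2
print(f"merged rate          ~ {total / horizon:.3f}  (lambda1 + lambda2 = {lam1 + lam2})")
print(f"P(origin = stream 1) ~ {n1 / total:.3f}  (lambda1/(lambda1+lambda2) = {lam1 / (lam1 + lam2):.3f})")
```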
OK. Now we're going to use what we know about merged processes to solve a problem that would be harder to do if you were not using ideas from Poisson processes. So the formulation of the problem has nothing to do with the Poisson process. The formulation is the following. We have 3 light bulbs. And each light bulb is independent and is going to die out at a time that's exponentially distributed. So, 3 light bulbs. They start their lives, and then at some point they die, or burn out. So let's think of this as X, this as Y, and this as Z. And we're interested in the time until the last light bulb burns out. So we're interested in the maximum of the 3 random variables X, Y, and Z. And in particular, we want to find the expected value of this maximum.

OK. So you can do a derived distribution, use the expected value rule, anything you want. You can get this answer using the tools that you already have in your hands. But now let us see how we can connect this picture with a Poisson picture and come up with the answer in a very simple way. What is an exponential random variable? An exponential random variable is the first act in a long play that involves a whole Poisson process. So an exponential random variable is the first act of a Poisson movie. Same thing here. You can think of this random variable as being part of some Poisson process that has been running. So it's part of this bigger picture. We're still interested in the maximum of the 3. The other arrivals are not going to affect our answers. It's just that, conceptually speaking, we can think of the exponential random variable as being embedded in a bigger Poisson picture.

So we have 3 Poisson processes that are running in parallel. Let us split the expected time until the last burnout into pieces: the time until the first burnout, the time from the first until the second, and the time from the second until the third. And let's find the expected values of each one of these pieces. What can we say about the expected value of this?
This is the first arrival out of all of these 3 Poisson processes. It's the first event that happens when you look at all of these processes simultaneously. So, 3 Poisson processes running in parallel. We're interested in the time until one of them, any one of them, gets an arrival. Rephrase: we merge the 3 Poisson processes, and we ask for the time until we observe an arrival in the merged process. When 1 of the 3 gets an arrival for the first time, the merged process gets its first arrival.

So what's the expected value of this time until the first burnout? It's going to be the expected value of an exponential random variable, because the merged process is a Poisson process. The merged process of the 3 has a collective arrival rate, which is 3 times lambda. So this is the parameter of the exponential distribution that describes the time until the first arrival in the merged process. And the expected value of this random variable is 1 over that. When you have an exponential random variable with parameter lambda, the expected value of that random variable is 1 over lambda. Here we're talking about the first arrival time in a process with rate 3 lambda. The expected time until the first arrival is 1 over (3 lambda).

Alright. So at this time, this arrival happened, and this bulb has burned out. So we don't care about that bulb anymore. We start at this time, and we look forward. This bulb has been burned. So let's just look forward from now on. What have we got? We have two bulbs that are burning. We have a Poisson process that's the bigger picture of what could happen to that light bulb if we were to keep replacing it. Another Poisson process. These two processes are, again, independent. From this time until that time, how long does it take?
It's the time until either this process records an arrival or that process records an arrival. That's the same as the time until the merged process of these two records an arrival. So we're talking about the expected time until the first arrival in a merged process. The merged process is Poisson. It's Poisson with rate 2 lambda. So the expected value of that extra time is going to be 1 over the rate of that Poisson process. So 1 over (2 lambda) is the expected value of this random variable.

So at this point, this bulb now is also burned out. So we start looking from this time on. That part of the picture disappears. Starting from this time, what's the expected value of the time until the remaining light bulb burns out? Well, as we said before, in a Poisson process, or with exponential random variables, we have memorylessness. A used bulb is as good as a new one. So it's as if we're starting from scratch here. So this is going to be an exponential random variable with parameter lambda. And its expected value is going to be 1 over lambda.

So the beauty of approaching this problem in this particular way is, of course, that we managed to do everything without any calculus at all, without writing an integral, without trying to calculate expectations in any form. Most of the non-trivial problems that you encounter in the Poisson world basically involve tricks of this kind. You have a question, and you try to rephrase it, trying to think in terms of what might happen in the Poisson setting, use memorylessness, use merging, et cetera, et cetera.
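As a quick check of the light-bulb answer, with an illustrative lambda = 1: the merging argument gives E[max] = 1/(3 lambda) + 1/(2 lambda) + 1/lambda, and a direct simulation of the maximum of three independent exponentials reproduces it.

```python
import random

lam = 1.0          # illustrative rate; the formula scales as 1/lambda
random.seed(2)

by_merging = 1 / (3 * lam) + 1 / (2 * lam) + 1 / lam   # 11/6 ~ 1.8333 when lambda = 1

n = 200_000
by_simulation = sum(max(random.expovariate(lam) for _ in range(3)) for _ in range(n)) / n

print(f"merging argument : {by_merging:.4f}")
print(f"simulation       : {by_simulation:.4f}")
```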
Now, we talked about merging. It turns out that the splitting of Poisson processes also works in a nice way. The story here is exactly the same as for the Bernoulli process. So I have a Poisson process with some rate lambda. And each time that an arrival comes, I'm going to send it to this stream, and record an arrival here, with some probability P, and I'm going to send it to the other stream with probability 1 minus P. So either this will happen or that will happen, depending on the outcome of the coin flip that I do. Each time that an arrival occurs, I flip a coin and I decide whether to record it here or there. This is called splitting a Poisson process into two pieces.

What kind of process do we get here? If you look at a little interval of length delta, what's the probability that this little interval gets an arrival? It's the probability that this one gets an arrival, which is lambda delta, times the probability that, after I get an arrival, my coin flip came out in the way that sends me here. So this means that this little interval is going to have probability lambda delta P, or maybe more suggestively, I should write it as (lambda P) times delta. So every little interval has a probability of an arrival proportional to delta. The proportionality factor is lambda P. So lambda P is the rate of that process. And then you go through the mental exercise that you went through for the Bernoulli process to argue that different intervals here are independent and so on. And that completes checking that this process is going to be a Poisson process.

So when you split a Poisson process by doing independent coin flips each time that something happens, the processes that you get are again Poisson processes, but of course with a reduced rate. So instead of the word splitting, sometimes people also use the word thinning out. That is, out of the arrivals that came, you keep a few but throw away a few.
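A minimal sketch of splitting, or thinning, with illustrative parameters: keeping each arrival of a rate-lambda process independently with probability p (the P above) produces a stream of rate lambda p, and the discarded stream has rate lambda (1 - p).

```python
import random

lam, p, horizon = 3.0, 0.25, 50_000.0   # illustrative rate, keep-probability, and horizon
random.seed(3)

t, kept, dropped = 0.0, 0, 0
while True:
    t += random.expovariate(lam)        # next arrival of the original process
    if t > horizon:
        break
    if random.random() < p:             # independent coin flip per arrival
        kept += 1
    else:
        dropped += 1

print(f"rate of kept substream    ~ {kept / horizon:.3f}   (lambda*p = {lam * p})")
print(f"rate of dropped substream ~ {dropped / horizon:.3f}   (lambda*(1-p) = {lam * (1 - p)})")
```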
OK. So now the last topic of this lecture is a quite curious phenomenon that goes under the name of random incidence. So here's the story. Buses have been running on Mass Ave. from time immemorial. And the bus company that runs the buses claims that they come as a Poisson process with some rate of, let's say, 4 buses per hour, so that the expected time between bus arrivals is going to be 15 minutes.

OK. Alright. So people have been complaining. They have been showing up there, and they think the buses are taking too long. So you are asked to investigate: does the company operate according to its promises or not? So you send an undercover agent to go and check the interarrival times of the buses. Are they 15 minutes? Or are they longer? So you put on your dark glasses and you show up at the bus stop at some random time. And you go and ask the guy in the falafel truck, how long has it been since the last arrival? Of course, that guy works for the FBI, right? So they tell you, well, it's been, let's say, 12 minutes since the last bus arrival.

And then you say, "Oh, 12 minutes. The average time is 15. So a bus should be coming any time now." Is that correct? No, you wouldn't think that way. It's a Poisson process. It doesn't matter how long it has been since the last bus arrival. So you don't go through that fallacy. Instead of predicting how long it's going to be, you just sit down there and wait and measure the time. And you find that this is, let's say, 11 minutes. And you go to your boss and report, "Well, I went there, and the time from the previous bus to the next one was 23 minutes. It's more than the 15 that they said." So, go and do that again. You go day after day. You keep statistics of the length of this interval. And you tell your boss it's a lot more than 15. It tends to be more like 30 or so. So the bus company is cheating us. Does the bus company really run Poisson buses at the rate that they have promised?
Well, let's analyze the situation here and figure out what the length of this interval should be, on the average. The naive argument is that this interval is an interarrival time, and interarrival times, on the average, are 15 minutes, if the company indeed runs a Poisson process with these interarrival times. But actually the situation is a little more subtle, because this is not a typical interarrival interval. This interarrival interval consists of two pieces. Let's call them T1 and T1 prime.

What can you tell me about those two random variables? What kind of random variable is T1? Starting from this time, with the Poisson process, the past doesn't matter. It's the time until an arrival happens. So T1 is going to be an exponential random variable with parameter lambda. So, in particular, the expected value of T1 is going to be 15 by itself.

How about the random variable T1 prime? What kind of random variable is it? This is like the first arrival in a Poisson process that runs backwards in time. What kind of process is a Poisson process running backwards in time? Let's think of coin flips. Suppose you have a movie of coin flips. And by some accident, that fascinating movie, you happen to watch it backwards. Will it look any different statistically? No. It's going to be just a sequence of random coin flips. So a Bernoulli process that runs in reverse time is statistically identical to a Bernoulli process in forward time. The Poisson process is a limit of the Bernoulli process. So, same story with the Poisson process. If you run it backwards in time, it looks the same. So, looking backwards in time, this is a Poisson process. And T1 prime is the time until the first arrival in this backward process. So T1 prime is also going to be an exponential random variable with the same parameter, lambda. And the expected value of T1 prime is 15.
793 00:41:11,000 --> 00:41:15,860 The conclusion is that the expected length of this 794 00:41:15,860 --> 00:41:22,860 interval is going to be 30 minutes. 795 00:41:22,860 --> 00:41:26,690 And the fact that this agent found the average to be 796 00:41:26,690 --> 00:41:31,230 something like 30 does not contradict the claims of the 797 00:41:31,230 --> 00:41:35,010 bus company that they're running Poisson buses with a 798 00:41:35,010 --> 00:41:38,370 rate of lambda equal to 4. 799 00:41:38,370 --> 00:41:38,780 OK. 800 00:41:38,780 --> 00:41:43,390 So maybe, this way, the company can defend 801 00:41:43,390 --> 00:41:44,970 themselves in court. 802 00:41:44,970 --> 00:41:47,490 But there's something puzzling here. 803 00:41:47,490 --> 00:41:50,360 How long is the interarrival time? 804 00:41:50,360 --> 00:41:51,910 Is it 15? 805 00:41:51,910 --> 00:41:53,216 Or is it 30, 806 00:41:53,216 --> 00:41:55,750 on the average? 807 00:41:55,750 --> 00:41:59,960 The issue is what do we mean by a typical 808 00:41:59,960 --> 00:42:01,360 interarrival time. 809 00:42:01,360 --> 00:42:04,940 When we say typical, we mean some kind of average. 810 00:42:04,940 --> 00:42:08,690 But average over what? 811 00:42:08,690 --> 00:42:13,280 And here are two different ways of thinking about averages. 812 00:42:13,280 --> 00:42:15,080 You number the buses. 813 00:42:15,080 --> 00:42:17,120 And you have bus number 100. 814 00:42:17,120 --> 00:42:21,120 You have bus number 101, bus number 102, bus 815 00:42:21,120 --> 00:42:24,660 number 110, and so on. 816 00:42:24,660 --> 00:42:29,370 One way of thinking about averages is that you pick a 817 00:42:29,370 --> 00:42:32,150 bus number at random. 818 00:42:32,150 --> 00:42:36,070 I pick, let's say, that bus, all buses being sort of 819 00:42:36,070 --> 00:42:37,760 equally likely to be picked. 820 00:42:37,760 --> 00:42:41,610 And I measure this interarrival time. 821 00:42:41,610 --> 00:42:45,380 So that's for a typical bus. 822 00:42:45,380 --> 00:42:50,390 Then, starting from here until there, the expected time has 823 00:42:50,390 --> 00:42:56,600 to be 1 over lambda, for the Poisson process. 824 00:42:56,600 --> 00:42:58,370 But what we did in this experiment 825 00:42:58,370 --> 00:42:59,720 was something different. 826 00:42:59,720 --> 00:43:02,040 We didn't pick a bus at random. 827 00:43:02,040 --> 00:43:05,090 We picked a time at random. 828 00:43:05,090 --> 00:43:08,870 And if the picture is, let's say, this way, I'm much more 829 00:43:08,870 --> 00:43:12,770 likely to pick this interval and therefore this 830 00:43:12,770 --> 00:43:16,290 interarrival time, rather than that interval. 831 00:43:16,290 --> 00:43:20,480 Because this interval corresponds to very few times. 832 00:43:20,480 --> 00:43:23,680 So if I'm picking a time at random and, in some sense, 833 00:43:23,680 --> 00:43:27,430 let's say, uniform, so that all times are equally likely, 834 00:43:27,430 --> 00:43:31,190 I'm much more likely to fall inside a big interval rather 835 00:43:31,190 --> 00:43:32,710 than a small interval. 836 00:43:32,710 --> 00:43:37,140 So a person who shows up at the bus stop at a random time 837 00:43:37,140 --> 00:43:42,040 is selecting an interval in a biased way, with the bias 838 00:43:42,040 --> 00:43:44,350 in favor of longer intervals. 839 00:43:44,350 --> 00:43:47,850 And that's why what they observe is a random variable 840 00:43:47,850 --> 00:43:51,830 that has a larger expected value than the ordinary 841 00:43:51,830 --> 00:43:53,080 expected value.
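A small simulation makes the two averages concrete. This is a sketch, not something from the lecture; the rate matches the company's claim of 4 buses per hour, while the horizon, seed, and number of observers are arbitrary choices. It compares the plain mean interarrival time with the mean length of the interval containing a uniformly random inspection time.

```python
import bisect
import random

LAMBDA = 4.0          # buses per hour, the rate claimed by the company
HORIZON = 10_000.0    # hours of simulated operation
N_OBSERVERS = 10_000  # number of random inspection times

random.seed(0)

# Build arrival times by summing independent exponential interarrival times.
arrivals = []
t = 0.0
while t < HORIZON:
    t += random.expovariate(LAMBDA)
    arrivals.append(t)

# Average "by bus number": the ordinary mean interarrival time.
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
mean_gap = sum(gaps) / len(gaps)

# Average "by random time": drop an observer at a uniformly random time and
# measure the length of the interarrival interval that contains that time.
observed = []
for _ in range(N_OBSERVERS):
    s = random.uniform(arrivals[0], arrivals[-1])
    i = min(bisect.bisect_right(arrivals, s), len(arrivals) - 1)
    observed.append(arrivals[i] - arrivals[i - 1])
mean_observed = sum(observed) / len(observed)

print(f"mean interarrival time        : {60 * mean_gap:.1f} minutes")      # about 15
print(f"mean interval seen by observer: {60 * mean_observed:.1f} minutes")  # about 30
```

The first number comes out near 15 minutes and the second near 30, matching the argument above.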
842 00:43:53,080 --> 00:43:56,780 So the subtlety here is to realize that we're talking 843 00:43:56,780 --> 00:43:59,590 about two different kinds of experiments. 844 00:43:59,590 --> 00:44:05,250 Picking a bus number at random versus picking an interval at 845 00:44:05,250 --> 00:44:11,500 random with a bias in favor of longer intervals. 846 00:44:11,500 --> 00:44:14,840 Lots of paradoxes that one can cook up using Poisson 847 00:44:14,840 --> 00:44:19,190 processes and random processes in general often have to do 848 00:44:19,190 --> 00:44:21,340 with a story of this kind. 849 00:44:21,340 --> 00:44:24,780 The phenomenon that we had in this particular example also 850 00:44:24,780 --> 00:44:28,970 shows up in general, whenever you have other kinds of 851 00:44:28,970 --> 00:44:30,470 arrival processes. 852 00:44:30,470 --> 00:44:34,210 So the Poisson process is the simplest arrival process there 853 00:44:34,210 --> 00:44:36,830 is, where the interarrival times are 854 00:44:36,830 --> 00:44:38,820 exponential random variables. 855 00:44:38,820 --> 00:44:40,280 There's a larger class of models. 856 00:44:40,280 --> 00:44:43,580 They're called renewal processes, in which, again, we 857 00:44:43,580 --> 00:44:46,900 have a sequence of successive arrivals whose interarrival times 858 00:44:46,900 --> 00:44:50,100 are identically distributed and independent, but they may 859 00:44:50,100 --> 00:44:52,320 come from a general distribution. 860 00:44:52,320 --> 00:44:55,100 So to make the same point as in the previous example but in a 861 00:44:55,100 --> 00:44:59,250 much simpler setting, suppose that bus interarrival times 862 00:44:59,250 --> 00:45:02,830 are either 5 or 10 minutes apart. 863 00:45:02,830 --> 00:45:05,930 So you get some intervals that are of length 5. 864 00:45:05,930 --> 00:45:08,790 You get some that are of length 10. 865 00:45:08,790 --> 00:45:12,810 And suppose that these are equally likely. 866 00:45:12,810 --> 00:45:16,990 So we have -- not exactly, but 867 00:45:16,990 --> 00:45:20,380 in the long run -- as many 5 minute intervals as we 868 00:45:20,380 --> 00:45:22,490 have 10 minute intervals. 869 00:45:22,490 --> 00:45:30,590 So the average interarrival time is 7 and 1/2. 870 00:45:30,590 --> 00:45:35,850 But if a person shows up at a random time, what are they 871 00:45:35,850 --> 00:45:37,100 going to see? 872 00:45:37,100 --> 00:45:40,520 873 00:45:40,520 --> 00:45:43,150 We have as many 5s as 10s. 874 00:45:43,150 --> 00:45:47,490 But every 10 covers twice as much space. 875 00:45:47,490 --> 00:45:52,640 So if I show up at a random time, I have probability 2/3 876 00:45:52,640 --> 00:45:57,180 of falling inside an interval of duration 10. 877 00:45:57,180 --> 00:46:00,990 And I have a 1/3 probability of falling inside an interval 878 00:46:00,990 --> 00:46:02,460 of duration 5. 879 00:46:02,460 --> 00:46:06,710 That's because, out of the whole real line, 2/3 of it is 880 00:46:06,710 --> 00:46:08,810 covered by intervals of length 10, just 881 00:46:08,810 --> 00:46:09,590 because they're longer. 882 00:46:09,590 --> 00:46:12,280 1/3 is covered by the smaller intervals. 883 00:46:12,280 --> 00:46:19,530 Now if I fall inside an interval of length 10 and I 884 00:46:19,530 --> 00:46:23,260 measure the length of the interval that I fell into, 885 00:46:23,260 --> 00:46:25,030 that's going to be 10. 886 00:46:25,030 --> 00:46:27,780 But if I fall inside an interval of length 5 and I 887 00:46:27,780 --> 00:46:30,320 measure how long it is, I'm going to get a 5.
888 00:46:30,320 --> 00:46:37,270 And that, of course, is going to be different from 7.5. 889 00:46:37,270 --> 00:46:38,010 OK. 890 00:46:38,010 --> 00:46:42,310 And which number should be bigger? 891 00:46:42,310 --> 00:46:45,110 It's the second number that's bigger because this one is 892 00:46:45,110 --> 00:46:48,930 biased in favor of the longer intervals: on the average, you see 2/3 times 10 plus 1/3 times 5, which is 25/3, about 8.3 minutes, instead of 7.5. 893 00:46:48,930 --> 00:46:51,380 So that's, again, another illustration of the different 894 00:46:51,380 --> 00:46:54,640 results that you get when you have this random incidence 895 00:46:54,640 --> 00:46:55,990 phenomenon. 896 00:46:55,990 --> 00:46:59,320 So the bottom line, again, is that if you talk about a 897 00:46:59,320 --> 00:47:03,380 typical interarrival time, one must be very precise in 898 00:47:03,380 --> 00:47:05,370 specifying what we mean by typical. 899 00:47:05,370 --> 00:47:08,120 So typical means sort of random. 900 00:47:08,120 --> 00:47:11,250 But to use the word random, you must specify very 901 00:47:11,250 --> 00:47:15,070 precisely what is the random experiment that you are using. 902 00:47:15,070 --> 00:47:18,920 And if you're not careful, you can get into apparent puzzles, 903 00:47:18,920 --> 00:47:20,770 such as the following. 904 00:47:20,770 --> 00:47:25,170 Suppose somebody tells you the average family size is 4, but 905 00:47:25,170 --> 00:47:30,340 the average person lives in a family of size 6. 906 00:47:30,340 --> 00:47:33,330 Is that compatible? 907 00:47:33,330 --> 00:47:36,610 Family size is 4 on the average, but typical people 908 00:47:36,610 --> 00:47:40,110 live, on the average, in families of size 6. 909 00:47:40,110 --> 00:47:41,590 Well yes. 910 00:47:41,590 --> 00:47:43,080 There's no contradiction here. 911 00:47:43,080 --> 00:47:45,450 We're talking about two different experiments. 912 00:47:45,450 --> 00:47:50,000 In one experiment, I pick a family at random, and I tell 913 00:47:50,000 --> 00:47:51,960 you the average family size is 4. 914 00:47:51,960 --> 00:47:55,910 In another experiment, I pick a person at random and I tell 915 00:47:55,910 --> 00:47:58,310 you that this person, on the average, will be in a 916 00:47:58,310 --> 00:48:00,080 family of size 6. 917 00:48:00,080 --> 00:48:01,140 And what is the catch here? 918 00:48:01,140 --> 00:48:05,440 That if I pick a person at random, large families are 919 00:48:05,440 --> 00:48:08,160 more likely to be picked. 920 00:48:08,160 --> 00:48:11,710 So there's a bias in favor of large families. 921 00:48:11,710 --> 00:48:15,270 Or if you want to survey, let's say, are trains crowded 922 00:48:15,270 --> 00:48:16,495 in your city? 923 00:48:16,495 --> 00:48:19,170 Or are buses crowded? 924 00:48:19,170 --> 00:48:22,040 One choice is to pick a bus at random and inspect 925 00:48:22,040 --> 00:48:23,220 how crowded it is. 926 00:48:23,220 --> 00:48:27,260 Another choice is to pick a typical person and ask them, 927 00:48:27,260 --> 00:48:29,080 "Did you ride the bus today? 928 00:48:29,080 --> 00:48:33,500 Was it crowded?" Well, suppose that in this city 929 00:48:33,500 --> 00:48:36,265 there's one bus that's extremely crowded and all the 930 00:48:36,265 --> 00:48:38,520 other buses are completely empty. 931 00:48:38,520 --> 00:48:42,300 If you ask a person, "Was your bus crowded?" they will tell 932 00:48:42,300 --> 00:48:46,040 you, "Yes, my bus was crowded." There's no witness 933 00:48:46,040 --> 00:48:49,460 from the empty buses to testify in their favor.
934 00:48:49,460 --> 00:48:52,780 So by sampling people instead of sampling buses, you're 935 00:48:52,780 --> 00:48:54,940 going to get a different result. 936 00:48:54,940 --> 00:48:58,320 And in the process industry, if your job is to inspect and 937 00:48:58,320 --> 00:49:01,450 check cookies, you will be faced with a big dilemma. 938 00:49:01,450 --> 00:49:05,190 Do you want to find out how many chocolate chips there are 939 00:49:05,190 --> 00:49:06,940 on a typical cookie? 940 00:49:06,940 --> 00:49:09,990 Are you going to interview cookies or are you going to 941 00:49:09,990 --> 00:49:13,880 interview chocolate chips and ask them how many other chips 942 00:49:13,880 --> 00:49:16,520 were there on your cookie? 943 00:49:16,520 --> 00:49:18,020 And you're going to get different 944 00:49:18,020 --> 00:49:19,210 answers in these cases. 945 00:49:19,210 --> 00:49:22,670 So the moral is, one has to be very precise about how you 946 00:49:22,670 --> 00:49:26,160 formulate the sampling procedure that you have. 947 00:49:26,160 --> 00:49:28,330 Different procedures will give you different answers.
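The same length-biasing is easy to check numerically. The sketch below is not from the lecture; it uses the toy model described above, with interarrival times of 5 or 10 minutes, equally likely, and compares the average over intervals with the average seen at a uniformly random time. Replacing the list of gap lengths by a list of family sizes gives the family-size version of the same calculation.

```python
import random

random.seed(1)

# Toy renewal process: each interarrival time is 5 or 10 minutes,
# each with probability 1/2, independently of the others.
gaps = [random.choice([5.0, 10.0]) for _ in range(100_000)]

# Average "per interval": pick an interval number at random.
per_interval = sum(gaps) / len(gaps)                  # about 7.5

# Average "per random time": a uniformly random time lands in a gap with
# probability proportional to its length, so each gap is weighted by itself.
time_weighted = sum(g * g for g in gaps) / sum(gaps)  # about 25/3, i.e. 8.33

print(f"average over intervals      : {per_interval:.2f} minutes")
print(f"average seen at random time : {time_weighted:.2f} minutes")
```

In the limit of many intervals the second number is (2/3)(10) + (1/3)(5) = 25/3, in agreement with the probabilities worked out above.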