1 00:00:01,030 --> 00:00:04,660 In this segment, we discuss the so-called "random incidence" 2 00:00:04,660 --> 00:00:07,220 paradox for the Poisson process. 3 00:00:07,220 --> 00:00:09,170 It's a paradox because it involves 4 00:00:09,170 --> 00:00:12,200 a somewhat counterintuitive phenomenon. 5 00:00:12,200 --> 00:00:15,430 However, we will understand exactly what's going on, 6 00:00:15,430 --> 00:00:18,450 and in the end, it will cease to be a paradox 7 00:00:18,450 --> 00:00:20,700 and we will have an intuitive understanding 8 00:00:20,700 --> 00:00:22,650 of what exactly is happening. 9 00:00:22,650 --> 00:00:26,250 So consider a Poisson process that has been running forever, 10 00:00:26,250 --> 00:00:28,500 or think of it as a Poisson process that 11 00:00:28,500 --> 00:00:33,100 started a very long time back in the past. 12 00:00:33,100 --> 00:00:37,110 To make things concrete, suppose that the arrival rate 13 00:00:37,110 --> 00:00:42,900 is 4 arrivals per hour so that the expected interarrival time 14 00:00:42,900 --> 00:00:47,690 is one fourth, in hours, or that would 15 00:00:47,690 --> 00:00:49,360 be the same as 15 minutes. 16 00:00:51,960 --> 00:00:56,690 For example, suppose that the bus company in your town 17 00:00:56,690 --> 00:01:00,690 claims that buses arrive to your stop 18 00:01:00,690 --> 00:01:05,200 according to a Poisson process with this particular rate. 19 00:01:05,200 --> 00:01:09,050 But you don't really believe that your bus company is 20 00:01:09,050 --> 00:01:12,789 telling the truth and you decide to investigate. 21 00:01:12,789 --> 00:01:15,130 So what you do is the following. 22 00:01:15,130 --> 00:01:19,130 You show up at some time at your bus stop 23 00:01:19,130 --> 00:01:24,370 and wait until the next arrival comes 24 00:01:24,370 --> 00:01:27,630 and also ask someone who lives near the bus 25 00:01:27,630 --> 00:01:30,770 stop, what time was the last arrival? 26 00:01:30,770 --> 00:01:32,690 And they tell you the last arrival 27 00:01:32,690 --> 00:01:35,350 happened at that time instant. 28 00:01:35,350 --> 00:01:37,960 And you measure this amount of time, 29 00:01:37,960 --> 00:01:43,120 which is the interarrival time, record what it is, 30 00:01:43,120 --> 00:01:49,340 repeat this experiment on many days, and calculate an average. 31 00:01:49,340 --> 00:01:51,880 What you're likely to see turns out 32 00:01:51,880 --> 00:01:56,320 to be something around 30 minutes. 33 00:01:56,320 --> 00:01:59,350 At this point, you could go to the bus company 34 00:01:59,350 --> 00:02:01,160 and challenge them. 35 00:02:01,160 --> 00:02:05,100 You claim an arrival rate of 4 arrivals per hour, which 36 00:02:05,100 --> 00:02:08,310 would translate into interarrivals of 15 minutes, 37 00:02:08,310 --> 00:02:12,070 but every day I go and check the interarrival time 38 00:02:12,070 --> 00:02:15,800 and I find that they are close to 30 minutes. 39 00:02:15,800 --> 00:02:17,030 What's the explanation? 40 00:02:17,030 --> 00:02:18,210 What's going on? 41 00:02:18,210 --> 00:02:21,820 Is it that the belief or the claim of the bus company 42 00:02:21,820 --> 00:02:26,190 is incorrect, or is there something more complicated? 43 00:02:26,190 --> 00:02:27,910 So let us try to understand what's 44 00:02:27,910 --> 00:02:33,000 going on by being very precise and careful. 45 00:02:33,000 --> 00:02:37,360 You show up at the bus station at some time-- 46 00:02:37,360 --> 00:02:41,530 let's call that time t star. 47 00:02:41,530 --> 00:02:45,110 You ask someone who has been at the station, 48 00:02:45,110 --> 00:02:48,750 when was the last arrival time, and they tell you, 49 00:02:48,750 --> 00:02:53,860 and it is some number U. You wait until the next bus, 50 00:02:53,860 --> 00:02:57,950 and the next bus arrives at some future time capital 51 00:02:57,950 --> 00:03:02,730 V. You are interested in the interarrival time 52 00:03:02,730 --> 00:03:04,910 that you're observing, which is the difference 53 00:03:04,910 --> 00:03:09,290 between these two random variables V minus U. 54 00:03:09,290 --> 00:03:12,890 Now this difference-- let us split it into two pieces. 55 00:03:12,890 --> 00:03:16,336 There's one piece from t star until V, 56 00:03:16,336 --> 00:03:20,270 which is V minus t star. 57 00:03:20,270 --> 00:03:23,610 And there's another piece, which is the first interval, 58 00:03:23,610 --> 00:03:31,930 and this is t star minus U. Now t star, the time at which you 59 00:03:31,930 --> 00:03:34,690 arrive, is just a constant. 60 00:03:34,690 --> 00:03:38,600 Suppose that you arrive at the bus station at exactly 12 noon. 61 00:03:38,600 --> 00:03:40,430 There's nothing random about it. 62 00:03:40,430 --> 00:03:43,620 However, V and U are random variables. 63 00:03:43,620 --> 00:03:46,460 What kind of random variable is this? 64 00:03:46,460 --> 00:03:52,290 You show up at 12 noon and you wait until the first arrival. 65 00:03:52,290 --> 00:03:56,920 Because a Poisson process starts fresh at any given time-- 66 00:03:56,920 --> 00:03:59,680 so after 12 noon it starts fresh-- this 67 00:03:59,680 --> 00:04:02,940 is the time until the first arrival in a Poisson process 68 00:04:02,940 --> 00:04:06,150 with rate lambda, so this is a random variable 69 00:04:06,150 --> 00:04:10,750 which is exponential with parameter lambda. 70 00:04:10,750 --> 00:04:14,060 Now let us understand what this random variable is. 71 00:04:16,930 --> 00:04:19,899 One way of thinking about it is to think 72 00:04:19,899 --> 00:04:23,720 of the Poisson process running backwards in time, 73 00:04:23,720 --> 00:04:25,420 so you live time backwards. 74 00:04:25,420 --> 00:04:29,910 You show up at 12 noon, and then time runs backwards, 75 00:04:29,910 --> 00:04:34,600 and you wait until you see the first arrival coming 76 00:04:34,600 --> 00:04:38,650 in this backwards universe. 77 00:04:38,650 --> 00:04:41,250 So we're dealing here with the time 78 00:04:41,250 --> 00:04:44,240 until an arrival in a Poisson process 79 00:04:44,240 --> 00:04:46,840 that runs backwards in time. 80 00:04:46,840 --> 00:04:50,020 What kind of process is a backwards Poisson process? 81 00:04:52,960 --> 00:04:56,260 If you take a Poisson process in reverse time, 82 00:04:56,260 --> 00:04:59,590 the independence assumption is not affected. 83 00:04:59,590 --> 00:05:02,380 Disjoint time intervals are independent. 84 00:05:02,380 --> 00:05:05,080 Even if you reverse time, disjoints time intervals 85 00:05:05,080 --> 00:05:07,320 still remain independent. 86 00:05:07,320 --> 00:05:11,130 Any given time interval of small length delta 87 00:05:11,130 --> 00:05:15,070 will have certain probabilities of an arrival 88 00:05:15,070 --> 00:05:18,280 or of two arrivals, and these will be the same 89 00:05:18,280 --> 00:05:22,120 whether time goes forward or time goes backward. 90 00:05:22,120 --> 00:05:24,400 So the conclusion from this discussion 91 00:05:24,400 --> 00:05:26,940 is that the backwards running Poisson process 92 00:05:26,940 --> 00:05:31,020 is also a Poisson process, and so this time 93 00:05:31,020 --> 00:05:34,500 until the first arrival in the backwards process 94 00:05:34,500 --> 00:05:38,030 is just like the time until the first arrival in a Poisson 95 00:05:38,030 --> 00:05:38,830 process. 96 00:05:38,830 --> 00:05:42,870 So this also is an exponential random variable 97 00:05:42,870 --> 00:05:46,510 with parameter lambda. 98 00:05:46,510 --> 00:05:51,050 Even more than that, these two random variables 99 00:05:51,050 --> 00:05:53,200 are independent of each other. 100 00:05:53,200 --> 00:05:55,180 Why are they independent? 101 00:05:55,180 --> 00:05:58,220 The length of this time interval has 102 00:05:58,220 --> 00:06:00,830 to do with the history of the Poisson process 103 00:06:00,830 --> 00:06:02,990 after time t star. 104 00:06:02,990 --> 00:06:04,790 The length of this time interval has 105 00:06:04,790 --> 00:06:07,200 to do with the history of the Poisson process 106 00:06:07,200 --> 00:06:10,810 before time t star, but in the Poisson process because 107 00:06:10,810 --> 00:06:14,800 of the independence property, the past and the future 108 00:06:14,800 --> 00:06:17,640 are independent, and therefore, this random variable 109 00:06:17,640 --> 00:06:21,580 is independent from that random variable. 110 00:06:21,580 --> 00:06:27,120 In any case, the expected value of the interarrival interval 111 00:06:27,120 --> 00:06:30,300 that you see, the expected value of this random variable, 112 00:06:30,300 --> 00:06:34,510 is going to be the expected value of one exponential, which 113 00:06:34,510 --> 00:06:37,240 is 1 over lambda, plus the expected 114 00:06:37,240 --> 00:06:40,630 value of another exponential, which is 1 over lambda, 115 00:06:40,630 --> 00:06:46,220 and we get a result of 2 over lambda. 116 00:06:46,220 --> 00:06:50,630 And that's why when you actually carried out the experiment, 117 00:06:50,630 --> 00:06:53,850 you saw interarrival intervals that 118 00:06:53,850 --> 00:06:58,390 had a length of 30 minutes as opposed to the 15 minutes 119 00:06:58,390 --> 00:07:01,890 that you were expecting in the first place. 120 00:07:01,890 --> 00:07:04,420 Now how can this be? 121 00:07:04,420 --> 00:07:08,280 Since the interarrival times in a Poisson process 122 00:07:08,280 --> 00:07:11,410 have expected value 1 over lambda, 123 00:07:11,410 --> 00:07:14,160 how can it be that the expected length 124 00:07:14,160 --> 00:07:16,730 of the interarrival times that you see 125 00:07:16,730 --> 00:07:20,560 have an expected value of 2 over lambda? 126 00:07:20,560 --> 00:07:23,360 Well, the resolution of this paradox has to do [with] 127 00:07:23,360 --> 00:07:29,840 what exactly we mean when we use the words an interarrival time. 128 00:07:29,840 --> 00:07:34,520 There's one interpretation which is the first interarrival time, 129 00:07:34,520 --> 00:07:38,060 the second one, the hundredth interarrival time-- 130 00:07:38,060 --> 00:07:41,280 each one of these actually has an expected 131 00:07:41,280 --> 00:07:45,100 value of 1 over lambda. 132 00:07:45,100 --> 00:07:47,980 But this is a different kind of interarrival time. 133 00:07:47,980 --> 00:07:50,720 It's not the first or the second or 134 00:07:50,720 --> 00:07:54,190 some specific k-th interarrival time. 135 00:07:54,190 --> 00:08:00,570 It's the interarrival time that you selected to watch. 136 00:08:00,570 --> 00:08:04,780 When you show up at a certain time, like 12 noon, 137 00:08:04,780 --> 00:08:09,310 you're more likely to fall inside a large interarrival 138 00:08:09,310 --> 00:08:13,740 interval rather than a smaller interarrival interval. 139 00:08:13,740 --> 00:08:16,420 So just the fact that you're showing up 140 00:08:16,420 --> 00:08:18,640 at a certain time that's uncoordinated 141 00:08:18,640 --> 00:08:20,630 with the rest of the process makes 142 00:08:20,630 --> 00:08:24,500 you more likely to be biased towards longer 143 00:08:24,500 --> 00:08:26,740 rather than shorter intervals. 144 00:08:26,740 --> 00:08:31,220 And this bias is what causes this factor of 2. 145 00:08:34,030 --> 00:08:37,308 So it's an issue really about how you sample 146 00:08:37,308 --> 00:08:40,250 or how you choose the interarrival time 147 00:08:40,250 --> 00:08:43,419 that you're going to watch, and this particular sampling 148 00:08:43,419 --> 00:08:46,990 method has a bias towards longer intervals. 149 00:08:46,990 --> 00:08:49,420 As we will see, this is not something 150 00:08:49,420 --> 00:08:52,330 that's specific to the Poisson process. 151 00:08:52,330 --> 00:08:54,500 In general, in many occasions there 152 00:08:54,500 --> 00:08:57,760 are different ways of sampling which give you 153 00:08:57,760 --> 00:09:01,520 different answers, and we will go through a number of examples 154 00:09:01,520 --> 00:09:04,280 that will give you some intuition about the source 155 00:09:04,280 --> 00:09:07,660 of the discrepancy between these two answers.