1 00:00:01,130 --> 00:00:03,650 PROFESSOR: In this segment, we will discuss a little bit 2 00:00:03,650 --> 00:00:08,000 the union bound and then discuss a counterpart, which is known 3 00:00:08,000 --> 00:00:10,220 as the Bonferroni inequality. 4 00:00:10,220 --> 00:00:12,320 Let us start with a story. 5 00:00:12,320 --> 00:00:18,130 Suppose that we have a number of students in some class. 6 00:00:18,130 --> 00:00:21,190 And we have a set of students that are smart, 7 00:00:21,190 --> 00:00:23,690 let's call that set a1. 8 00:00:23,690 --> 00:00:27,310 So this is the set of smart students. 9 00:00:27,310 --> 00:00:31,540 And we have a set of students that are beautiful. 10 00:00:31,540 --> 00:00:34,880 And let's call that set a2. 11 00:00:34,880 --> 00:00:39,040 So a2 is the set of beautiful students. 12 00:00:39,040 --> 00:00:43,540 If I tell you that the set of smart students is small, 13 00:00:43,540 --> 00:00:47,960 and the set of beautiful students are small, 14 00:00:47,960 --> 00:00:51,730 then you can probably conclude that there 15 00:00:51,730 --> 00:00:57,880 are very few students that are either smart or beautiful. 16 00:00:57,880 --> 00:01:00,410 What does this have to do with probability? 17 00:01:00,410 --> 00:01:03,260 Well, when we say very few are smart, 18 00:01:03,260 --> 00:01:06,140 we might mean that if I pick a student at random, 19 00:01:06,140 --> 00:01:08,200 there is only a small probability 20 00:01:08,200 --> 00:01:10,810 that I pick a smart student, and similarly 21 00:01:10,810 --> 00:01:13,980 for beautiful students. 22 00:01:13,980 --> 00:01:17,910 Can we make this statement more precise? 23 00:01:17,910 --> 00:01:19,350 Indeed we can. 24 00:01:19,350 --> 00:01:22,800 We have the union bound that tells us 25 00:01:22,800 --> 00:01:25,980 that the probability that I pick a student that 26 00:01:25,980 --> 00:01:30,900 is either smart or beautiful is less than 27 00:01:30,900 --> 00:01:33,240 or equal to the probability of being 28 00:01:33,240 --> 00:01:37,920 a smart student plus the probability of picking 29 00:01:37,920 --> 00:01:40,330 a beautiful student. 30 00:01:40,330 --> 00:01:44,910 So, if this probability is small and that probability is 1, 31 00:01:44,910 --> 00:01:47,340 then this probability will also be small, 32 00:01:47,340 --> 00:01:51,030 which means that there is only a small number of students that 33 00:01:51,030 --> 00:01:53,760 are either smart or beautiful. 34 00:01:53,760 --> 00:01:59,670 Now let us try to turn the statement around its head. 35 00:01:59,670 --> 00:02:04,440 Suppose that most of the students are smart 36 00:02:04,440 --> 00:02:07,840 and most of the students are beautiful. 37 00:02:07,840 --> 00:02:09,979 So in this case, I'm telling you that these 38 00:02:09,979 --> 00:02:14,640 sets a1 and a2 are big. 39 00:02:14,640 --> 00:02:19,390 Now, if the set a1 is big, then it 40 00:02:19,390 --> 00:02:26,260 means that this set here, the complement of a1, 41 00:02:26,260 --> 00:02:28,960 is a small set. 42 00:02:28,960 --> 00:02:34,380 And if I tell you that the set a2 is big, 43 00:02:34,380 --> 00:02:38,430 then it means that this set here, 44 00:02:38,430 --> 00:02:44,010 which is a complement of a2, is also small. 45 00:02:44,010 --> 00:02:50,610 So everything outside here is a small set, which 46 00:02:50,610 --> 00:02:53,610 means that whatever is left, which 47 00:02:53,610 --> 00:02:59,100 is the intersection of a1 and a2, should be a big set. 48 00:02:59,100 --> 00:03:03,820 So we should be able to conclude that, in this case, 49 00:03:03,820 --> 00:03:06,910 most of the students belong to the intersection. 50 00:03:06,910 --> 00:03:10,530 So they're both smart and beautiful. 51 00:03:10,530 --> 00:03:14,550 How can we turn this into a mathematical statement? 52 00:03:14,550 --> 00:03:19,240 It's the following inequality that we will prove shortly. 53 00:03:19,240 --> 00:03:23,490 But what it says is that the probability of the intersection 54 00:03:23,490 --> 00:03:26,610 is larger than or equal to something. 55 00:03:26,610 --> 00:03:31,350 And if this probability is close to 1, 56 00:03:31,350 --> 00:03:33,660 which says that most of the students are smart, 57 00:03:33,660 --> 00:03:35,970 and this probability is close to 1, which 58 00:03:35,970 --> 00:03:38,190 says that more students are beautiful, 59 00:03:38,190 --> 00:03:41,070 then this difference here is going 60 00:03:41,070 --> 00:03:44,900 to be close to 1 plus 1 minus 1, which is 1. 61 00:03:44,900 --> 00:03:48,870 Therefore, the probability of the intersection 62 00:03:48,870 --> 00:03:52,320 is going to be larger than or equal to some number that's 63 00:03:52,320 --> 00:03:53,620 close to 1. 64 00:03:53,620 --> 00:03:57,150 So this one will also be close to 1, which is the conclusion 65 00:03:57,150 --> 00:04:00,660 that indeed most students fall in this intersection 66 00:04:00,660 --> 00:04:03,880 and they're both smart and beautiful. 67 00:04:03,880 --> 00:04:07,980 So what we will do next will be to derive this inequality 68 00:04:07,980 --> 00:04:09,600 and actually generalize it. 69 00:04:13,050 --> 00:04:16,800 So here is the relation that we wish to establish. 70 00:04:16,800 --> 00:04:19,620 We want to show that the probability of a certain event 71 00:04:19,620 --> 00:04:23,310 is bigger than something. 72 00:04:23,310 --> 00:04:24,900 How do we show that? 73 00:04:24,900 --> 00:04:29,190 One way is to show that the probability of the complement 74 00:04:29,190 --> 00:04:35,320 of this event, namely this event here, 75 00:04:35,320 --> 00:04:40,240 we want to show that this event has small probability. 76 00:04:40,240 --> 00:04:42,130 Now what is this event? 77 00:04:42,130 --> 00:04:45,010 Here we can use DeMorgan's laws, which 78 00:04:45,010 --> 00:04:51,790 tell us that this event is the same as this one. 79 00:04:51,790 --> 00:04:54,610 That is, the complement of an intersection 80 00:04:54,610 --> 00:04:57,700 is the union of the complements. 81 00:04:57,700 --> 00:05:02,180 Since these two sets or events are identical, 82 00:05:02,180 --> 00:05:06,040 it means that their probabilities will also 83 00:05:06,040 --> 00:05:06,880 be equal. 84 00:05:11,460 --> 00:05:14,250 And next we will use the union bound 85 00:05:14,250 --> 00:05:17,730 to write this probability as being less than 86 00:05:17,730 --> 00:05:22,560 or equal to the sum of the probabilities of the two events 87 00:05:22,560 --> 00:05:24,345 whose union we are taking. 88 00:05:27,820 --> 00:05:30,640 Now we're getting close, except that here we 89 00:05:30,640 --> 00:05:33,400 have complements all over, whereas up here we 90 00:05:33,400 --> 00:05:34,960 do not have any complements. 91 00:05:34,960 --> 00:05:36,380 What can we do? 92 00:05:36,380 --> 00:05:39,820 Well, the probability of a complement of an event 93 00:05:39,820 --> 00:05:45,070 is the same as 1 minus the probability of that event. 94 00:05:48,790 --> 00:05:53,380 And we do the same thing for the terms that we have here. 95 00:05:53,380 --> 00:05:57,670 This probability here is equal to 1 96 00:05:57,670 --> 00:06:01,150 minus the probability of a1. 97 00:06:01,150 --> 00:06:03,910 And this probability here is equal to 1 98 00:06:03,910 --> 00:06:08,200 minus the probability of a2. 99 00:06:08,200 --> 00:06:10,880 And now if we take this inequality, 100 00:06:10,880 --> 00:06:17,730 cancel this term with that term, and then moved terms around, 101 00:06:17,730 --> 00:06:21,090 what we have is exactly this relation 102 00:06:21,090 --> 00:06:24,130 that we wanted to prove. 103 00:06:24,130 --> 00:06:26,580 It turns out that this inequality 104 00:06:26,580 --> 00:06:29,320 has a generalization to the case where we take 105 00:06:29,320 --> 00:06:32,770 the intersection of n events. 106 00:06:32,770 --> 00:06:36,610 And this has, again, the same intuitive content. 107 00:06:36,610 --> 00:06:40,900 Suppose that each one of these events a1 up to a(n) 108 00:06:40,900 --> 00:06:42,580 is almost certain to occur. 109 00:06:42,580 --> 00:06:46,300 That is, it has a probability close to 1. 110 00:06:46,300 --> 00:06:50,890 In that case, this term will be close to n. 111 00:06:50,890 --> 00:06:54,790 We subtract n minus 1, so this term on the right hand side 112 00:06:54,790 --> 00:06:57,350 will be close to 1. 113 00:06:57,350 --> 00:06:59,860 Therefore, the probability of the intersection 114 00:06:59,860 --> 00:07:04,670 will be larger than or equal to something that's close to 1. 115 00:07:04,670 --> 00:07:06,040 So this is big. 116 00:07:06,040 --> 00:07:09,430 Essentially what it's saying is that we have big sets 117 00:07:09,430 --> 00:07:11,980 and we take their intersection, then 118 00:07:11,980 --> 00:07:14,950 that intersection will also be big in terms 119 00:07:14,950 --> 00:07:17,380 of having large probability. 120 00:07:17,380 --> 00:07:19,300 How do we prove this relation? 121 00:07:19,300 --> 00:07:21,430 Exactly the same way as it was proved 122 00:07:21,430 --> 00:07:23,530 for the case of two sets. 123 00:07:23,530 --> 00:07:27,070 Namely, instead of looking at this event, 124 00:07:27,070 --> 00:07:30,360 we look at the complement of this event. 125 00:07:33,020 --> 00:07:37,760 And we use DeMorgan's laws to write this complement 126 00:07:37,760 --> 00:07:40,430 as the union of the complements. 127 00:07:44,540 --> 00:07:47,600 These two are the same sets or events, 128 00:07:47,600 --> 00:07:52,040 so they have the same probability. 129 00:07:52,040 --> 00:07:55,430 And then we use the union bound to write this 130 00:07:55,430 --> 00:07:59,630 as being less than or equal to the probabilities of all 131 00:07:59,630 --> 00:08:01,370 of those sets. 132 00:08:06,150 --> 00:08:09,570 Now this is equal to 1 minus the probability 133 00:08:09,570 --> 00:08:10,860 of the intersection. 134 00:08:15,230 --> 00:08:22,760 This side here is equal to 1 minus the probability of a1. 135 00:08:22,760 --> 00:08:24,080 This is one term. 136 00:08:24,080 --> 00:08:28,450 We get n such terms, the last one 137 00:08:28,450 --> 00:08:30,920 being 1 minus the probability of a(n). 138 00:08:34,960 --> 00:08:38,530 And we still have an inequality going this way. 139 00:08:38,530 --> 00:08:41,049 We collect those 1s that we have here. 140 00:08:41,049 --> 00:08:45,130 There's n of them, and one here, so we're 141 00:08:45,130 --> 00:08:49,450 left with n minus 1 terms that are equal to 1. 142 00:08:49,450 --> 00:08:51,610 And this gives rise to this term. 143 00:08:51,610 --> 00:08:55,360 We have all the probabilities of the various events 144 00:08:55,360 --> 00:08:57,340 that appear with the same sign. 145 00:08:57,340 --> 00:08:59,480 This gives rise to this term. 146 00:08:59,480 --> 00:09:03,550 And finally, this term here will correspond to that term. 147 00:09:03,550 --> 00:09:06,040 Namely, if we started with this inequality 148 00:09:06,040 --> 00:09:08,350 and just rearrange a few terms, we 149 00:09:08,350 --> 00:09:11,620 obtain this inequality up here. 150 00:09:11,620 --> 00:09:15,990 So these Bonferroni inequalities are a nice illustration 151 00:09:15,990 --> 00:09:18,930 of how one can combine DeMorgan's laws, set 152 00:09:18,930 --> 00:09:22,320 theoretical operations, and the union bound 153 00:09:22,320 --> 00:09:24,960 in order to obtain some interesting relations 154 00:09:24,960 --> 00:09:27,500 between probabilities.