1 00:00:00,000 --> 00:00:04,190 All right, you guys, let's go ahead and get started. 2 00:00:07,120 --> 00:00:08,370 I am back. 3 00:00:08,370 --> 00:00:10,470 I know you guys all 4 00:00:10,470 --> 00:00:11,300 missed me. 5 00:00:13,779 --> 00:00:15,070 Just a couple of announcements. 6 00:00:15,070 --> 00:00:17,567 Since you guys didn't have recitations this week, 7 00:00:17,567 --> 00:00:19,775 I want to make sure that you guys remember the Design 8 00:00:19,775 --> 00:00:23,140 Project 2 proposals are due tomorrow 9 00:00:23,140 --> 00:00:26,770 in class, as is Hands On 6. 10 00:00:26,770 --> 00:00:31,340 Also, Quiz 2 is graded and is ready to be handed back. 11 00:00:31,340 --> 00:00:33,690 If you go to office hours with your TA this afternoon, 12 00:00:33,690 --> 00:00:35,981 you can pick it up or you can get it in class tomorrow. 13 00:00:42,255 --> 00:00:44,380 We will post the statistics as soon as we get them. 14 00:00:44,380 --> 00:00:46,630 We are still waiting to get the scores from one of the TAs 15 00:00:46,630 --> 00:00:48,210 into the thing, so I do not want to say anything 16 00:00:48,210 --> 00:00:49,380 until we know for sure. 17 00:00:52,441 --> 00:00:52,940 All right. 18 00:00:52,940 --> 00:00:55,070 What we have seen so far, we saw last time, 19 00:00:55,070 --> 00:00:56,920 we talked about this notion of transactions, 20 00:00:56,920 --> 00:01:00,180 we talked about the notion of making 21 00:01:00,180 --> 00:01:02,216 actions durable and consistent. 22 00:01:02,216 --> 00:01:03,590 This time what we are going to do 23 00:01:03,590 --> 00:01:09,040 is look at a related topic, a slightly more advanced topic 24 00:01:09,040 --> 00:01:11,970 that is more related to the notion of atomic actions 25 00:01:11,970 --> 00:01:15,340 that we spent two or three lectures previous 26 00:01:15,340 --> 00:01:18,600 to the one about transactions talking about. 
27 00:01:18,600 --> 00:01:24,180 If you remember, an atomic action, 28 00:01:24,180 --> 00:01:32,145 we say it is both recoverable and isolated. 29 00:01:36,440 --> 00:01:39,150 When we say an action is recoverable, 30 00:01:39,150 --> 00:01:41,870 that means, remember, it either all happens 31 00:01:41,870 --> 00:01:43,380 or it does not happen at all. 32 00:01:43,380 --> 00:01:47,710 When we say an action is isolated, 33 00:01:47,710 --> 00:01:49,760 it means, from the point of view of other actions 34 00:01:49,760 --> 00:01:51,593 that are running concurrently in the system, 35 00:01:51,593 --> 00:01:54,060 the action appears to only be one unit. 36 00:01:54,060 --> 00:01:56,320 So then none of the intermediate states of the action 37 00:01:56,320 --> 00:02:00,280 is visible to any other actions that are running concurrently. 38 00:02:00,280 --> 00:02:04,100 And we talked about the use of the logging protocol 39 00:02:04,100 --> 00:02:06,340 to allow us to recover actions. 40 00:02:06,340 --> 00:02:08,970 We also, in the text and in class, 41 00:02:08,970 --> 00:02:11,590 talked briefly about version histories as an alternative way 42 00:02:11,590 --> 00:02:13,830 that we can recover actions. 43 00:02:13,830 --> 00:02:15,790 And we talked about how to do isolation. 44 00:02:15,790 --> 00:02:17,190 We talked about several different methods 45 00:02:17,190 --> 00:02:18,023 for doing isolation. 46 00:02:18,023 --> 00:02:20,690 We spent a while talking about locking as a mechanism 47 00:02:20,690 --> 00:02:23,660 that we use to isolate actions from each other. 48 00:02:23,660 --> 00:02:28,380 What we are going to talk about today is a related topic, 49 00:02:28,380 --> 00:02:30,330 and it has to do with when you have actions 50 00:02:30,330 --> 00:02:32,020 that are spread across multiple sites, 51 00:02:32,020 --> 00:02:33,690 multiple different computers. 
52 00:02:33,690 --> 00:02:35,940 And this is going to tie together some of the concepts 53 00:02:35,940 --> 00:02:38,720 that we learned in the previous section on networking 54 00:02:38,720 --> 00:02:41,580 with the more recent stuff that we have been talking 55 00:02:41,580 --> 00:02:44,160 about with atomic actions. 56 00:02:44,160 --> 00:02:46,800 The topic today is multi-site atomicity. 57 00:02:54,106 --> 00:02:55,980 And just to give you a simple example of what 58 00:02:55,980 --> 00:02:58,930 we mean by a situation in which you might care 59 00:02:58,930 --> 00:03:01,400 about multi-site atomicity, suppose that you 60 00:03:01,400 --> 00:03:04,940 are running a travel website. 61 00:03:04,940 --> 00:03:10,170 You have some travel site, and for that 62 00:03:10,170 --> 00:03:12,340 travel site you have negotiated agreements 63 00:03:12,340 --> 00:03:15,730 with various different people who sell airline tickets. 64 00:03:15,730 --> 00:03:22,830 Maybe you have agreements with JetBlue and USAir 65 00:03:22,830 --> 00:03:26,690 and some other set of airlines. 66 00:03:26,690 --> 00:03:30,390 And you want to make it so that when somebody purchases 67 00:03:30,390 --> 00:03:33,100 a ticket, they may purchase a set of flights that 68 00:03:33,100 --> 00:03:35,247 are purchased together as one unit where 69 00:03:35,247 --> 00:03:37,330 there are different flights on different airlines. 70 00:03:37,330 --> 00:03:39,630 I might want to purchase a flight from here 71 00:03:39,630 --> 00:03:41,190 to San Francisco on JetBlue and then 72 00:03:41,190 --> 00:03:44,250 a flight from San Francisco back to here on USAir. 73 00:03:44,250 --> 00:03:46,800 Sometimes people want to use multiple different vendors 74 00:03:46,800 --> 00:03:48,720 when they are purchasing their tickets. 75 00:03:48,720 --> 00:03:52,660 And it may be the way that your travel site talks to JetBlue 76 00:03:52,660 --> 00:03:54,600 and USAir is over the Internet. 
77 00:03:54,600 --> 00:03:59,070 They use some remote procedure call 78 00:03:59,070 --> 00:04:05,310 to ask JetBlue or USAir to reserve a seat on their behalf. 79 00:04:05,310 --> 00:04:07,280 But when somebody is using your website, 80 00:04:07,280 --> 00:04:09,780 you want to create the illusion that these reservations that 81 00:04:09,780 --> 00:04:12,370 are made on the website are sort of one atomic unit. 82 00:04:12,370 --> 00:04:13,720 You do not want it to be that a person gets 83 00:04:13,720 --> 00:04:15,261 a reservation on JetBlue and does not 84 00:04:15,261 --> 00:04:16,732 get a reservation on USAir. 85 00:04:16,732 --> 00:04:18,190 You want them to make a reservation 86 00:04:18,190 --> 00:04:21,370 and get all of the reservation or none of the reservation. 87 00:04:21,370 --> 00:04:22,850 In order to make that work, we are 88 00:04:22,850 --> 00:04:24,600 going to need some sort of special support 89 00:04:24,600 --> 00:04:28,130 from the travel site, JetBlue and USAir. 90 00:04:28,130 --> 00:04:30,520 Suppose, for example, I did not have any special support 91 00:04:30,520 --> 00:04:32,950 from the JetBlue website and I said 92 00:04:32,950 --> 00:04:35,840 I want to purchase this ticket to San Francisco. 93 00:04:39,510 --> 00:04:41,340 JetBlue goes ahead and says I am ready 94 00:04:41,340 --> 00:04:43,050 and I purchased that ticket for you. 95 00:04:43,050 --> 00:04:45,674 And then suppose we cannot get this reservation on USAir. 96 00:04:45,674 --> 00:04:46,590 Now we are in trouble. 97 00:04:46,590 --> 00:04:50,280 We need some way to back out of the action with JetBlue 98 00:04:50,280 --> 00:04:51,910 and say wait, never mind, I did not 99 00:04:51,910 --> 00:04:53,180 mean to actually purchase that ticket 100 00:04:53,180 --> 00:04:55,388 because I could not complete the rest of my purchase. 
101 00:04:55,388 --> 00:04:58,530 We are going to talk about how to provide this kind of, what 102 00:04:58,530 --> 00:05:00,640 we call, multi-site atomicity where 103 00:05:00,640 --> 00:05:03,170 we want this whole action that includes purchases of tickets 104 00:05:03,170 --> 00:05:05,336 from these different websites to appear as though it 105 00:05:05,336 --> 00:05:06,414 was one atomic action. 106 00:05:06,414 --> 00:05:08,830 Particularly we want to make sure that this whole thing is 107 00:05:08,830 --> 00:05:10,079 both recoverable and isolated. 108 00:05:19,104 --> 00:05:20,770 To sneak up on this topic, because there 109 00:05:20,770 --> 00:05:22,500 are a bunch of different issues that are going to come up here 110 00:05:22,500 --> 00:05:23,980 and are going to be a little bit complicated, 111 00:05:23,980 --> 00:05:25,854 we are going to start by looking at a simpler 112 00:05:25,854 --> 00:05:27,730 version of this problem. 113 00:05:27,730 --> 00:05:32,030 Let's suppose that we have all of these things running 114 00:05:32,030 --> 00:05:35,860 on the same computer together, that is they 115 00:05:35,860 --> 00:05:37,360 are not connected over the Internet, 116 00:05:37,360 --> 00:05:40,749 there is not this possibility of a message being lost like there 117 00:05:40,749 --> 00:05:41,790 would be on the Internet. 118 00:05:41,790 --> 00:05:42,720 So we are going to look at these things 119 00:05:42,720 --> 00:05:45,570 as though they are all running together on the same computer. 120 00:05:45,570 --> 00:05:47,442 In that case, we call these actions that 121 00:05:47,442 --> 00:05:49,150 are running together on the same computer 122 00:05:49,150 --> 00:05:50,265 "nested atomic actions". 
123 00:05:57,940 --> 00:06:00,190 Once we see how this works on just the single computer 124 00:06:00,190 --> 00:06:02,157 case then we will sort of look and see 125 00:06:02,157 --> 00:06:04,240 how it gets more complicated when we extend it out 126 00:06:04,240 --> 00:06:05,490 to the multi-computer case. 127 00:06:09,200 --> 00:06:10,790 Let's simplify this a little bit. 128 00:06:10,790 --> 00:06:14,570 Suppose we had our buy ticket procedure looking something 129 00:06:14,570 --> 00:06:28,040 like begin, buy on JetBlue, then buy on USAir and then end. 130 00:06:28,040 --> 00:06:31,709 I am just going to call these two things A and B for now 131 00:06:31,709 --> 00:06:34,250 just to sort of simplify it so I do not have to write JetBlue 132 00:06:34,250 --> 00:06:36,610 and USAir over and over again. 133 00:06:36,610 --> 00:06:40,960 And the property that we want is that each one of these actions, 134 00:06:40,960 --> 00:06:45,960 in and of itself, should be atomic with respect 135 00:06:45,960 --> 00:06:47,150 to the other actions. 136 00:06:47,150 --> 00:06:55,570 We want atomic with respect to each other. 137 00:06:55,570 --> 00:06:59,370 That is one property we want. 138 00:06:59,370 --> 00:07:03,890 We want that because, for example, suppose 139 00:07:03,890 --> 00:07:07,000 that buying JetBlue requires me to provide some credit card 140 00:07:07,000 --> 00:07:09,830 number or debit card number that is going to go out and purchase 141 00:07:09,830 --> 00:07:10,380 the ticket. 142 00:07:10,380 --> 00:07:13,000 Well, I do not want to have the action that 143 00:07:13,000 --> 00:07:15,170 purchases from JetBlue and the action that purchases 144 00:07:15,170 --> 00:07:18,160 from USAir simultaneously decrementing my, 145 00:07:18,160 --> 00:07:19,400 say, bank account. 
146 00:07:19,400 --> 00:07:21,776 Because we saw how you could imagine getting into trouble 147 00:07:21,776 --> 00:07:24,066 where they would both read the balance at the same time 148 00:07:24,066 --> 00:07:26,650 and then both decrement and then both write at the same time. 149 00:07:26,650 --> 00:07:29,490 You could get some mixed up balance left in your bank 150 00:07:29,490 --> 00:07:30,600 account, for example. 151 00:07:30,600 --> 00:07:33,925 So if these things are, let's say, for example, 152 00:07:33,925 --> 00:07:35,300 debiting a bank account, you want 153 00:07:35,300 --> 00:07:38,950 to make sure that these actions are atomic with respect to one 154 00:07:38,950 --> 00:07:40,760 another. 155 00:07:40,760 --> 00:07:44,920 You also want to make sure that this whole thing is 156 00:07:44,920 --> 00:07:47,905 atomic with respect to the outside world. 157 00:07:54,290 --> 00:07:59,450 That is to say that the caller, somebody 158 00:07:59,450 --> 00:08:02,260 who invokes this procedure to buy this pair of tickets 159 00:08:02,260 --> 00:08:05,004 never gets to see a state where one of the tickets is purchased 160 00:08:05,004 --> 00:08:06,670 and one of the tickets is not purchased. 161 00:08:06,670 --> 00:08:09,360 It either looks like the ticket has been completely purchased 162 00:08:09,360 --> 00:08:11,000 or it has not been purchased at all, 163 00:08:11,000 --> 00:08:12,407 so we want it to be isolated. 164 00:08:12,407 --> 00:08:13,990 And we also want it to be recoverable. 
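The bank-account trouble described above can be replayed deterministically. The following is a minimal sketch, not anything from the lecture itself: the function names, the starting balance, and the ticket prices are made up for illustration. It contrasts the bad interleaving, where both purchases read the balance before either writes, with the case where the two debits run one after the other.

```python
def interleaved_debits(balance, price_a, price_b):
    """Both purchases read the balance before either one writes it back."""
    read_a = balance            # A reads
    read_b = balance            # B reads the same, not yet updated, value
    balance = read_a - price_a  # A writes its result
    balance = read_b - price_b  # B overwrites A's write: A's debit is lost
    return balance


def serialized_debits(balance, price_a, price_b):
    """A runs to completion before B starts, as if A and B were atomic."""
    balance = balance - price_a
    balance = balance - price_b
    return balance


print(interleaved_debits(1000, 300, 200))  # 800: only B's debit survives
print(serialized_debits(1000, 300, 200))   # 500: both debits applied
```

Making A and B atomic with respect to each other means only outcomes like the second one are possible.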
165 00:08:13,990 --> 00:08:17,000 We want it to be the case that if we crash halfway 166 00:08:17,000 --> 00:08:20,300 through here, after we have bought the ticket on JetBlue, 167 00:08:20,300 --> 00:08:22,550 if the whole system crashes at that point, when 168 00:08:22,550 --> 00:08:25,866 the system comes back up, we do not want the system 169 00:08:25,866 --> 00:08:27,240 to be in some halfway state where 170 00:08:27,240 --> 00:08:28,730 we paid the money for the JetBlue ticket 171 00:08:28,730 --> 00:08:30,440 and have not paid the money for the USAir ticket. 172 00:08:30,440 --> 00:08:32,273 So we should either complete the transaction 173 00:08:32,273 --> 00:08:34,320 or we should completely roll back the transaction 174 00:08:34,320 --> 00:08:37,281 and abort it. 175 00:08:37,281 --> 00:08:39,530 So this is this notion of nested atomic actions, which 176 00:08:39,530 --> 00:08:41,640 is we have this outer action and then it 177 00:08:41,640 --> 00:08:43,640 has these two actions that are nested within it. 178 00:08:43,640 --> 00:08:45,990 And each of these actions is, in and of itself, 179 00:08:45,990 --> 00:08:46,880 an atomic action. 180 00:08:54,360 --> 00:09:01,330 If you think about what's going on here for a minute, 181 00:09:01,330 --> 00:09:04,960 so if you think about what the sort of condition that we want 182 00:09:04,960 --> 00:09:07,530 is, we've got A and B. 183 00:09:07,530 --> 00:09:15,450 And we want A to commit if B commits 184 00:09:15,450 --> 00:09:26,740 and we want B to commit if A commits. 185 00:09:26,740 --> 00:09:30,390 Because, if A doesn't commit, we need both of these actions 186 00:09:30,390 --> 00:09:33,852 to definitely commit or definitely not commit. 187 00:09:33,852 --> 00:09:36,060 Otherwise, we're kind of in trouble for this example. 
188 00:09:36,060 --> 00:09:37,720 But the way that I've worded this, 189 00:09:37,720 --> 00:09:40,460 it kind of sounds impossible, like how 190 00:09:40,460 --> 00:09:42,305 is it A is waiting for B to commit 191 00:09:42,305 --> 00:09:44,680 and B is waiting for A to commit so how are we ever going 192 00:09:44,680 --> 00:09:46,680 to make any progress? 193 00:09:46,680 --> 00:09:48,490 And the trick is that we're going 194 00:09:48,490 --> 00:09:50,900 to introduce a third-party, something that is 195 00:09:50,900 --> 00:09:56,330 in charge of deciding whether or not the entire action has 196 00:09:56,330 --> 00:09:57,020 committed. 197 00:09:57,020 --> 00:09:59,619 So we're going to introduce a node S which 198 00:09:59,619 --> 00:10:00,660 we call the "supervisor". 199 00:10:04,480 --> 00:10:08,730 And S is in charge of deciding whether or not 200 00:10:08,730 --> 00:10:12,830 the entire action, the action with both A and B inside of it 201 00:10:12,830 --> 00:10:15,460 actually commits. 202 00:10:15,460 --> 00:10:18,550 So let's see how that works because just seeing 203 00:10:18,550 --> 00:10:21,510 that we still have to have some way of S 204 00:10:21,510 --> 00:10:23,270 knowing that A is ready to commit 205 00:10:23,270 --> 00:10:25,560 and S knowing that B is ready to commit. 206 00:10:25,560 --> 00:10:27,240 And we still need some way of making it 207 00:10:27,240 --> 00:10:31,760 so that A and B both make the decision to commit 208 00:10:31,760 --> 00:10:33,290 at exactly the same time. 209 00:10:33,290 --> 00:10:49,605 So let's see how we can go about doing that. 210 00:10:49,605 --> 00:10:51,730 The idea is that we're going to introduce something 211 00:10:51,730 --> 00:10:53,030 we call "tentative commits". 
212 00:10:59,330 --> 00:11:03,030 So what we want to do is get A and B 213 00:11:03,030 --> 00:11:07,410 to a point where they're both exactly ready 214 00:11:07,410 --> 00:11:09,900 to commit but they haven't yet actually committed 215 00:11:09,900 --> 00:11:10,580 their results. 216 00:11:10,580 --> 00:11:12,829 So they haven't actually made their results available, 217 00:11:12,829 --> 00:11:17,602 but they want to give up control to S as to the instant 218 00:11:17,602 --> 00:11:18,810 that they'll actually commit. 219 00:11:18,810 --> 00:11:21,540 So what we're trying to get is a way 220 00:11:21,540 --> 00:11:24,330 in which S can instantaneously make a decision 221 00:11:24,330 --> 00:11:27,070 about whether both A and B commit or neither A 222 00:11:27,070 --> 00:11:28,630 or B commits. 223 00:11:28,630 --> 00:11:30,240 And so what we're going to do is, 224 00:11:30,240 --> 00:11:32,340 the idea with a tentative commit is 225 00:11:32,340 --> 00:11:35,860 we're going to run A and B until they tentatively commit. 226 00:11:35,860 --> 00:11:40,250 And what that means is that they're going to do, 227 00:11:40,250 --> 00:11:48,170 so we want A and B to do everything 228 00:11:48,170 --> 00:11:51,385 except actually commit. 229 00:11:54,632 --> 00:11:55,590 So what does that mean? 230 00:11:55,590 --> 00:11:58,000 It means that A and B are going to read all of the data 231 00:11:58,000 --> 00:11:59,290 that they would normally read, they're 232 00:11:59,290 --> 00:12:01,620 going to write all the things that they would normally write. 233 00:12:01,620 --> 00:12:03,830 If we are using a locking protocol they would acquire 234 00:12:03,830 --> 00:12:06,230 all of the locks that they would need to acquire in order 235 00:12:06,230 --> 00:12:09,830 to process the transaction, process 236 00:12:09,830 --> 00:12:11,670 their part of the action. 237 00:12:11,670 --> 00:12:13,950 But they're not actually going to commit. 
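One way to picture "do everything except actually commit" is the small sketch below. It is an illustration under assumptions, not the mechanism from the lecture: the class names `SubAction` and `Supervisor`, the seat keys, and especially the choice to buffer each sub-action's writes privately are all hypothetical simplifications (the lecture relies on locks and a log instead). The point it shows is only that tentatively committed work stays invisible until the supervisor decides.

```python
class SubAction:
    """A nested action whose writes stay private until the supervisor commits."""

    def __init__(self, name):
        self.name = name
        self.state = "pending"
        self.private_writes = {}  # assumption: writes are buffered, not in place

    def write(self, key, value):
        if self.state != "pending":
            raise RuntimeError(f"{self.name} has finished its work")
        self.private_writes[key] = value

    def tentatively_commit(self):
        # All work is done; the final decision now belongs to the supervisor.
        self.state = "tentatively committed"

    def abort(self):
        self.private_writes.clear()
        self.state = "aborted"


class Supervisor:
    """Decides, at one instant, whether the work of its sub-actions is exposed."""

    def __init__(self, store):
        self.store = store  # the data visible to the outside world
        self.children = []

    def start(self, name):
        child = SubAction(name)
        self.children.append(child)
        return child

    def commit(self):
        if any(c.state == "pending" for c in self.children):
            raise RuntimeError("a sub-action is still pending")
        for c in self.children:
            if c.state == "tentatively committed":
                self.store.update(c.private_writes)
                c.state = "committed"

    def abort(self):
        for c in self.children:
            c.abort()


store = {"jetblue_seat": "free", "usair_seat": "free"}
s = Supervisor(store)
a = s.start("A")
a.write("jetblue_seat", "reserved")
a.tentatively_commit()
b = s.start("B")
b.write("usair_seat", "reserved")
b.tentatively_commit()
print(store)  # both seats still show as free
s.commit()
print(store)  # both seats now show as reserved
```

If `s.abort()` were called instead of `s.commit()`, the store would be left untouched, which is the all-or-nothing behavior the caller is supposed to see.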
238 00:12:13,950 --> 00:12:20,780 So not committing means that, in particular, they 239 00:12:20,780 --> 00:12:23,260 are not going to expose their results out 240 00:12:23,260 --> 00:12:25,540 to the outside world. 241 00:12:25,540 --> 00:12:47,600 So we say they don't expose results beyond S. 242 00:12:47,600 --> 00:12:50,370 So a tentatively committed transaction, 243 00:12:50,370 --> 00:12:52,540 I'm just going to write as TC, says 244 00:12:52,540 --> 00:12:55,010 that it's not going to expose any 245 00:12:55,010 --> 00:12:57,730 of the results of its action outside of S. 246 00:12:57,730 --> 00:12:59,700 If we're using a locking protocol, 247 00:12:59,700 --> 00:13:03,920 how do we prevent one of these nested actions 248 00:13:03,920 --> 00:13:06,910 from being able to expose its results? 249 00:13:06,910 --> 00:13:12,470 Or, how do we make it so that it doesn't expose its results 250 00:13:12,470 --> 00:13:15,390 or make its results visible outside? 251 00:13:15,390 --> 00:13:15,890 Yeah. 252 00:13:15,890 --> 00:13:18,890 STUDENT: Whenever a sub-action wants to commit, 253 00:13:18,890 --> 00:13:24,020 we would just move the lock to its higher level of action, 254 00:13:24,020 --> 00:13:26,510 to the action it belongs to. 255 00:13:26,510 --> 00:13:27,010 Right. 256 00:13:27,010 --> 00:13:29,680 That's essentially what we're going to want to do. 257 00:13:29,680 --> 00:13:32,080 If we just want to make it so that the action's results 258 00:13:32,080 --> 00:13:33,871 aren't visible, we're just going to make it 259 00:13:33,871 --> 00:13:36,671 so that that action doesn't release any of its locks. 260 00:13:36,671 --> 00:13:38,170 If it doesn't release its locks then 261 00:13:38,170 --> 00:13:41,120 nobody else can get locks on any of the data that it updated. 262 00:13:41,120 --> 00:13:43,920 And, therefore, none of its updates will be visible. 
263 00:13:43,920 --> 00:13:45,440 The solution that was proposed here 264 00:13:45,440 --> 00:13:47,970 is what we're going to work up to, 265 00:13:47,970 --> 00:13:51,420 which is that basically we want to make sure that if we want 266 00:13:51,420 --> 00:13:55,380 S to be able to see the results of a tentatively 267 00:13:55,380 --> 00:13:58,480 committed sub-action, which we will and I'll explain why, 268 00:13:58,480 --> 00:13:59,980 then what we're going to do is we're 269 00:13:59,980 --> 00:14:02,590 going to hand the locks off from the sub-action 270 00:14:02,590 --> 00:14:05,440 up to S, the superior action. 271 00:14:05,440 --> 00:14:08,010 We'll work through how that works. 272 00:14:08,010 --> 00:14:10,360 The way that this is going to work, 273 00:14:10,360 --> 00:14:13,220 this is going to allow us to get, 274 00:14:13,220 --> 00:14:16,890 so we can draw a graph that looks like this. 275 00:14:16,890 --> 00:14:20,980 We say S, we've got some action A, we've got some action B, 276 00:14:20,980 --> 00:14:23,720 and we may, in principle, have other actions 277 00:14:23,720 --> 00:14:26,512 that are sub-actions of these sub-actions. 278 00:14:26,512 --> 00:14:27,970 And what we're going to do is we're 279 00:14:27,970 --> 00:14:30,000 going to draw a graph where we put arrows 280 00:14:30,000 --> 00:14:32,740 pointing from the sub-action up to its parent 281 00:14:32,740 --> 00:14:35,590 action, the action that it is depending on. 282 00:14:35,590 --> 00:14:37,500 And we're going to label the states, 283 00:14:37,500 --> 00:14:39,290 we can label each one of these actions 284 00:14:39,290 --> 00:14:41,747 in this graph as either tentatively committed 285 00:14:41,747 --> 00:14:43,330 or, if it's not tentatively committed, 286 00:14:43,330 --> 00:14:45,246 it hasn't finished doing all of its processing 287 00:14:45,246 --> 00:14:48,040 yet, we might label it as pending. 
288 00:15:01,387 --> 00:15:02,970 Let's look in a little bit more detail 289 00:15:02,970 --> 00:15:08,580 at how we might actually get this tentative commit thing 290 00:15:08,580 --> 00:15:09,210 to work. 291 00:15:09,210 --> 00:15:11,940 And hopefully that will make it a little bit more 292 00:15:11,940 --> 00:15:13,939 clear what's going on here. 293 00:15:13,939 --> 00:15:15,480 And, in particular, what I want to do 294 00:15:15,480 --> 00:15:19,960 is I want to look at the way in which a log 295 00:15:19,960 --> 00:15:24,410 on this machine might be maintained. 296 00:15:24,410 --> 00:15:27,790 And this is really going to help us get at how 297 00:15:27,790 --> 00:15:36,880 we do recovery via logging. 298 00:15:36,880 --> 00:15:39,020 This is for these nested actions. 299 00:15:39,020 --> 00:15:47,740 Suppose we have some log and this has actions in it. 300 00:15:50,310 --> 00:15:54,630 When S first starts running the transaction, 301 00:15:54,630 --> 00:16:00,630 when the transaction first starts running, this supervisor 302 00:16:00,630 --> 00:16:05,070 module writes a begin transaction message. 303 00:16:05,070 --> 00:16:07,900 And then what it's going to do is 304 00:16:07,900 --> 00:16:14,355 it's going to invoke each one of these sub-actions. 305 00:16:14,355 --> 00:16:15,980 And each one of those sub-actions 306 00:16:15,980 --> 00:16:19,080 is going to write a begin record as well. 307 00:16:22,800 --> 00:16:24,790 And then there's going to be some processing, 308 00:16:24,790 --> 00:16:26,410 so these actions are going to execute, 309 00:16:26,410 --> 00:16:28,280 they're going to obtain some locks 310 00:16:28,280 --> 00:16:30,836 and they are going to update some data. 311 00:16:30,836 --> 00:16:33,210 We're going to write those log records for that data that 312 00:16:33,210 --> 00:16:35,540 was updated into the log. 
313 00:16:35,540 --> 00:16:37,430 And then, at some point later, we 314 00:16:37,430 --> 00:16:43,170 will see tentative commits for A and tentative commits for B. 315 00:16:43,170 --> 00:16:46,510 And then finally what we'll do is, 316 00:16:46,510 --> 00:16:48,900 once those guys are tentatively committed, 317 00:16:48,900 --> 00:16:57,350 we'll write the commit record for S. 318 00:16:57,350 --> 00:17:01,720 Now let's look at what we can say at various points 319 00:17:01,720 --> 00:17:02,470 sort of during it. 320 00:17:02,470 --> 00:17:04,761 If you think of this log as being a timeline about when 321 00:17:04,761 --> 00:17:06,540 things happen in the system, let's 322 00:17:06,540 --> 00:17:09,619 look and see what we can say at various points. 323 00:17:09,619 --> 00:17:12,569 One thing we can say is that this time when this commit 324 00:17:12,569 --> 00:17:15,960 S record was written, this is the commit point 325 00:17:15,960 --> 00:17:17,750 for this entire transaction. 326 00:17:17,750 --> 00:17:26,260 So we say this is the commit point for both A and B 327 00:17:26,260 --> 00:17:30,220 and anything else that maybe S did. 328 00:17:30,220 --> 00:17:32,970 If you remember, what the commit point is, 329 00:17:32,970 --> 00:17:34,580 it's sort of the point of no return. 330 00:17:34,580 --> 00:17:36,320 Once we reach the commit point we've 331 00:17:36,320 --> 00:17:40,760 guaranteed that this action is going to persist. 332 00:17:40,760 --> 00:17:43,930 Even if the system crashes, it will recover in the state 333 00:17:43,930 --> 00:17:46,030 as though that action had taken effect. 334 00:17:46,030 --> 00:17:48,100 And if we crash prior to the commit point 335 00:17:48,100 --> 00:17:52,281 then what we want to guarantee is that this action was not 336 00:17:52,281 --> 00:17:52,780 visible. 337 00:17:52,780 --> 00:17:55,190 When we recover we're going to undo any effects 338 00:17:55,190 --> 00:17:57,920 that anything in this action did. 
339 00:17:57,920 --> 00:18:00,510 If we were to crash right here, just 340 00:18:00,510 --> 00:18:02,160 before we wrote the commit record, 341 00:18:02,160 --> 00:18:05,562 then we would undo this whole action. 342 00:18:05,562 --> 00:18:07,520 And if we were to crash any time after we write 343 00:18:07,520 --> 00:18:08,936 the commit record then we're going 344 00:18:08,936 --> 00:18:11,561 to make sure that we redo any updates that the action may 345 00:18:11,561 --> 00:18:12,310 have needed to do. 346 00:18:12,310 --> 00:18:17,080 We're going to guarantee that this action is forced to disk. 347 00:18:17,080 --> 00:18:20,220 But notice that there are these two tentative commit records. 348 00:18:20,220 --> 00:18:24,284 What do these tentative commit records correspond to? 349 00:18:24,284 --> 00:18:25,950 What these tentative commit records mean 350 00:18:25,950 --> 00:18:30,170 is that A and B, this buy JetBlue and buy USAir, 351 00:18:30,170 --> 00:18:32,200 did all of the work that they had to do. 352 00:18:32,200 --> 00:18:36,030 So, in particular, it means that neither A nor B is going 353 00:18:36,030 --> 00:18:38,304 to have to acquire any more locks, 354 00:18:38,304 --> 00:18:39,970 they're not going to write any more data. 355 00:18:39,970 --> 00:18:42,984 They're at exactly this point where they're ready to commit. 356 00:18:42,984 --> 00:18:44,900 And the thing that's going to make them commit 357 00:18:44,900 --> 00:18:47,020 is the writing of this commit S log record. 358 00:18:49,770 --> 00:18:51,750 Effectively, A and B, the sort of whether 359 00:18:51,750 --> 00:18:54,497 or not A and B commit is now out of control of A and B 360 00:18:54,497 --> 00:18:55,830 once they write this log record. 361 00:18:55,830 --> 00:18:58,950 And it's completely in the hands of this outer supervisor module 362 00:18:58,950 --> 00:19:00,120 S. 
363 00:19:00,120 --> 00:19:02,090 And so the outer supervisor module 364 00:19:02,090 --> 00:19:04,610 S, of course it can commit, but you also 365 00:19:04,610 --> 00:19:07,820 have to realize that it's also OK if this outer module aborts. 366 00:19:07,820 --> 00:19:10,390 And if it aborts then it will write an abort log record 367 00:19:10,390 --> 00:19:13,440 and we will have to go and undo the effects of the whole thing. 368 00:19:13,440 --> 00:19:18,710 But we can still do that because we haven't, as of this point 369 00:19:18,710 --> 00:19:21,285 here, actually exposed any results outside of this action. 370 00:19:21,285 --> 00:19:23,910 So there is nobody else who has seen the effects of this thing. 371 00:19:23,910 --> 00:19:25,690 We're still isolated with respect 372 00:19:25,690 --> 00:19:28,500 to any outside action that might be running on the system. 373 00:19:28,500 --> 00:19:31,990 So it's OK if we abort any time up to this commit record. 374 00:19:31,990 --> 00:19:34,030 It also means that after this commit point, 375 00:19:34,030 --> 00:19:36,070 this is sort of the point where we're going 376 00:19:36,070 --> 00:19:37,520 to start exposing results. 377 00:19:37,520 --> 00:19:41,400 If we're locking we're going to release our write locks 378 00:19:41,400 --> 00:19:43,450 after this commit point. 379 00:19:43,450 --> 00:19:47,481 And the act of releasing the locks 380 00:19:47,481 --> 00:19:49,480 is what's going to make it so that other actions 381 00:19:49,480 --> 00:19:52,920 outside of this system can see the effects of A and B running. 382 00:20:04,720 --> 00:20:17,550 Just to make it clear, we say A and B commit or abort 383 00:20:17,550 --> 00:20:20,200 when S commits or aborts. 384 00:20:20,200 --> 00:20:23,544 I have just written CA here for commit or abort. 
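The rule that A and B commit or abort when S commits or aborts can be turned into a small recovery routine over the log. The following is a rough sketch under assumptions that are not from the lecture: the record layout (tuples such as `("UPDATE", action, key, old, new)`), the `parent` map, and the `recover` function name are all hypothetical. A sub-action's updates are redone only if it wrote a tentative commit record and its supervisor wrote a commit record; everything else is undone.

```python
def recover(log, parent, disk_state):
    """Rebuild the state after a crash from an undo/redo log.

    log        -- list of records, oldest first (hypothetical layout)
    parent     -- maps each action to its enclosing action, None for top level
    disk_state -- whatever happened to reach the disk before the crash
    """
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    tentative = {rec[1] for rec in log if rec[0] == "TC"}

    def survives(action):
        enclosing = parent.get(action)
        if enclosing is None:
            return action in committed
        # A nested action counts only if it tentatively committed AND
        # the action it is nested in survives.
        return action in tentative and survives(enclosing)

    state = dict(disk_state)
    # Undo pass: walk backwards, restoring old values written by losers.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and not survives(rec[1]):
            _, _, key, old, _ = rec
            state[key] = old
    # Redo pass: walk forwards, reapplying new values written by winners.
    for rec in log:
        if rec[0] == "UPDATE" and survives(rec[1]):
            _, _, key, _, new = rec
            state[key] = new
    return state


parent = {"S": None, "A": "S", "B": "S"}
log = [
    ("BEGIN", "S"),
    ("BEGIN", "A"),
    ("UPDATE", "A", "balance", 1000, 700),
    ("TC", "A"),
    ("BEGIN", "B"),
    ("UPDATE", "B", "balance", 700, 500),
    ("TC", "B"),
]
# Crash just before COMMIT S, with only A's update on disk: everything is undone.
print(recover(log, parent, {"balance": 700}))                    # {'balance': 1000}
# Crash just after COMMIT S: both updates are redone.
print(recover(log + [("COMMIT", "S")], parent, {"balance": 700}))  # {'balance': 500}
```

Note that the two tentative commit records on their own decide nothing; only the presence or absence of the commit record for S changes the outcome.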
385 00:20:23,544 --> 00:20:25,210 There is one other little detail that we 386 00:20:25,210 --> 00:20:27,580 need to point out which is that S 387 00:20:27,580 --> 00:20:37,770 can commit even if A or B fail. 388 00:20:37,770 --> 00:20:40,330 So this may seem a little bit counterintuitive, 389 00:20:40,330 --> 00:20:44,630 but the intuition here is that this action S, 390 00:20:44,630 --> 00:20:49,900 suppose that S tries to run A and it cannot get a hold 391 00:20:49,900 --> 00:20:52,480 of the tickets on JetBlue that it wanted. 392 00:20:52,480 --> 00:20:55,230 Well, S is free to go and try and make a reservation 393 00:20:55,230 --> 00:20:58,250 on some other airline that also satisfies the user's request. 394 00:20:58,250 --> 00:21:02,670 So the fact that A failed doesn't say anything about 395 00:21:02,670 --> 00:21:05,020 whether or not S is necessarily going to fail. 396 00:21:05,020 --> 00:21:07,812 S still gets to decide whether it fails or not. 397 00:21:07,812 --> 00:21:09,520 And this is kind of an important property 398 00:21:09,520 --> 00:21:13,120 because it means that these actions that are running inside 399 00:21:13,120 --> 00:21:15,950 of S are actually, while they are running, 400 00:21:15,950 --> 00:21:18,540 are isolated with respect to S and the other actions that 401 00:21:18,540 --> 00:21:19,569 are running on S. 402 00:21:19,569 --> 00:21:21,860 Any of the updates that they make while they're running 403 00:21:21,860 --> 00:21:25,170 aren't seen by S or any of the other actions. 404 00:21:25,170 --> 00:21:29,170 This is just one little thing to keep in mind. 405 00:21:35,810 --> 00:21:39,930 There is one last detail that I've sort of brushed over here 406 00:21:39,930 --> 00:21:42,400 that we hinted at a little bit a minute ago. 407 00:21:42,400 --> 00:21:48,590 And that's the problem which is what if A and B conflict 408 00:21:48,590 --> 00:21:50,690 with each other? 
409 00:21:50,690 --> 00:21:54,300 We said that A and B might both update the same bank account 410 00:21:54,300 --> 00:21:56,690 balance, right? 411 00:21:56,690 --> 00:21:58,330 Well, we have a little bit of a problem 412 00:21:58,330 --> 00:22:01,260 if that happens because I've got my S 413 00:22:01,260 --> 00:22:05,130 and I've got my A and my B. 414 00:22:05,130 --> 00:22:08,270 And it may be the case that suppose 415 00:22:08,270 --> 00:22:12,050 we start running A first, A gets the lock on the bank account 416 00:22:12,050 --> 00:22:14,322 and then B starts running and tries to get 417 00:22:14,322 --> 00:22:15,530 the lock on the bank account. 418 00:22:15,530 --> 00:22:17,520 And it waits for A because A is still holding 419 00:22:17,520 --> 00:22:19,230 this lock on the bank account. 420 00:22:19,230 --> 00:22:21,050 So we may have this situation where 421 00:22:21,050 --> 00:22:24,680 B is waiting for A to release the lock on the bank account. 422 00:22:24,680 --> 00:22:27,520 But the way that I've described this so far, 423 00:22:27,520 --> 00:22:33,226 it may not be clear that B is ever 424 00:22:33,226 --> 00:22:35,100 going to be able to actually obtain the lock. 425 00:22:35,100 --> 00:22:38,830 Because we said that these tentatively committed actions 426 00:22:38,830 --> 00:22:42,970 don't release their locks until after the commit point 427 00:22:42,970 --> 00:22:44,120 has been reached. 428 00:22:44,120 --> 00:22:46,631 It turns out that we need to sort of modify that statement 429 00:22:46,631 --> 00:22:47,380 just a little bit. 430 00:22:47,380 --> 00:22:49,109 And that kind of gets to the comment 431 00:22:49,109 --> 00:22:50,900 that was made before which points this out. 432 00:22:50,900 --> 00:22:56,460 What we want to say is that when an action like A 433 00:22:56,460 --> 00:23:03,820 tentatively commits, we want to make it 434 00:23:03,820 --> 00:23:11,690 so that S and its children can see A's updates. 
435 00:23:16,460 --> 00:23:18,230 After A has tentatively committed, 436 00:23:18,230 --> 00:23:21,070 the other actions that are underneath S 437 00:23:21,070 --> 00:23:23,660 should be able to see the updates that A made. 438 00:23:23,660 --> 00:23:26,230 So A is going to go ahead and withdraw the money 439 00:23:26,230 --> 00:23:29,110 from the bank account, and then it's going to say OK, I'm done, 440 00:23:29,110 --> 00:23:30,180 I've withdrawn my money. 441 00:23:30,180 --> 00:23:32,205 And now any other action within S 442 00:23:32,205 --> 00:23:33,830 that needs to run that maybe also needs 443 00:23:33,830 --> 00:23:35,611 to update the bank account will be allowed 444 00:23:35,611 --> 00:23:36,610 to go ahead and do that. 445 00:23:36,610 --> 00:23:39,280 So B could go ahead and run and update the bank account. 446 00:23:39,280 --> 00:23:42,790 Notice that we haven't said that A's updates are 447 00:23:42,790 --> 00:23:44,244 visible outside of S. 448 00:23:44,244 --> 00:23:46,160 Nobody else gets to see that this bank account 449 00:23:46,160 --> 00:23:46,993 balance got changed. 450 00:23:46,993 --> 00:23:50,800 It's just the action B that gets to see that the bank account 451 00:23:50,800 --> 00:23:52,720 balance is going to change. 452 00:23:52,720 --> 00:23:54,150 Effectively, what this amounts to 453 00:23:54,150 --> 00:23:56,380 is that when A tentatively commits 454 00:23:56,380 --> 00:24:00,896 it assigns all of its locks up to the S action. 455 00:24:08,210 --> 00:24:11,130 That's as much as I'm going to say 456 00:24:11,130 --> 00:24:13,660 about this notion of nested atomic actions. 457 00:24:13,660 --> 00:24:15,540 What this has given us is this way to have, 458 00:24:15,540 --> 00:24:18,550 on a single site, one action that 459 00:24:18,550 --> 00:24:20,310 is composed of multiple sub-actions 460 00:24:20,310 --> 00:24:23,150 where those sub-actions are isolated from each other. 
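The lock hand-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Action` and `LockTable` classes and their method names are hypothetical, not something from the lecture or the text, and a failed `acquire` simply returns `False` where a real system would block and retry.

```python
class Action:
    """Hypothetical nested action; S is the parent of A and B."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.locks = set()

    def ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


class LockTable:
    def __init__(self):
        self.owner = {}  # resource -> Action currently holding the lock

    def acquire(self, action, resource):
        holder = self.owner.get(resource)
        # The lock is free, already ours, or held by an ancestor that
        # inherited it from a tentatively committed sibling.
        if holder is None or holder is action or holder in action.ancestors():
            self.owner[resource] = action
            action.locks.add(resource)
            return True
        return False  # a real system would wait here

    def tentative_commit(self, action):
        # Hand every lock up to the parent instead of releasing it, so
        # siblings can proceed but nothing outside the parent can.
        for resource in action.locks:
            self.owner[resource] = action.parent
            action.parent.locks.add(resource)
        action.locks = set()


table = LockTable()
S = Action("S")
A, B, outsider = Action("A", S), Action("B", S), Action("outsider")
table.acquire(A, "account")   # A takes the lock first
table.tentative_commit(A)     # lock moves up to S
print(table.acquire(B, "account"), table.acquire(outsider, "account"))
```

The design choice to note is that the lock is never released at tentative commit; ownership moves to S, which is what keeps A's updates invisible outside S.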
461 00:24:23,150 --> 00:24:27,250 And all of the sub-actions either commit or abort 462 00:24:27,250 --> 00:24:28,340 together as a batch. 463 00:24:31,200 --> 00:24:33,200 But we said what we ultimately wanted 464 00:24:33,200 --> 00:24:35,360 was the ability to do this across multiple sites. 465 00:24:40,510 --> 00:24:45,980 Just to draw a simple architecture diagram 466 00:24:45,980 --> 00:24:49,550 about the multi-site case seeing what this looks like, 467 00:24:49,550 --> 00:24:53,450 suppose we have some action S and suppose 468 00:24:53,450 --> 00:25:03,800 we still have our A and our B. 469 00:25:03,800 --> 00:25:05,770 Now, just to conceptualize this, suppose 470 00:25:05,770 --> 00:25:11,080 that there is some network in the middle of these things. 471 00:25:11,080 --> 00:25:17,262 And this is a best-effort network 472 00:25:17,262 --> 00:25:19,220 so it has all the problems that we talked about 473 00:25:19,220 --> 00:25:21,400 that best-effort networks have. 474 00:25:21,400 --> 00:25:24,543 It has congestion, it has delays, it can lose packets. 475 00:25:28,366 --> 00:25:29,990 And the way to think about these things 476 00:25:29,990 --> 00:25:33,699 is that these actions are going to interact with each other. 477 00:25:33,699 --> 00:25:35,990 These nodes are going to interact with each other using 478 00:25:35,990 --> 00:25:37,620 RPCs. 479 00:25:37,620 --> 00:25:39,470 And they're just going to send actions, 480 00:25:39,470 --> 00:25:40,928 they're just going to send requests 481 00:25:40,928 --> 00:25:42,950 to each other and responses over these links. 482 00:25:42,950 --> 00:25:47,600 So S is going to send RPC to A saying reserve a seat for me, 483 00:25:47,600 --> 00:25:49,960 for example, and then A is going to send a reply 484 00:25:49,960 --> 00:25:53,364 back saying OK, I went ahead and reserved that seat. 
485 00:25:53,364 --> 00:25:54,780 So what I want to do is sort of go 486 00:25:54,780 --> 00:25:56,890 from this informal description of what 487 00:25:56,890 --> 00:26:00,080 we want in the multi-site case to an actual description of how 488 00:26:00,080 --> 00:26:02,305 the protocol works so you guys can see that there 489 00:26:02,305 --> 00:26:04,680 are some pretty subtle details that are involved in this. 490 00:26:04,680 --> 00:26:08,920 And it's worth pointing out that this situation is fairly 491 00:26:08,920 --> 00:26:11,150 similar to many of the things that, some 492 00:26:11,150 --> 00:26:11,940 of the problems that you're going 493 00:26:11,940 --> 00:26:14,314 to have to deal with in the context of the design project 494 00:26:14,314 --> 00:26:16,910 so it probably is a good idea for you to pay attention. 495 00:26:19,440 --> 00:26:23,240 So let's talk about how this protocol would work. 496 00:26:23,240 --> 00:26:31,160 The idea here is we want to provide -- 497 00:26:31,160 --> 00:26:34,570 Suppose we're still in this same travel site example. 498 00:26:34,570 --> 00:26:39,760 The client wants to make these reservations over this network, 499 00:26:39,760 --> 00:26:43,150 and the protocol we're going to use is a protocol called 500 00:26:43,150 --> 00:26:44,050 "two-phase commit". 501 00:26:53,610 --> 00:26:57,150 Suppose we have our node S which is the coordinator 502 00:26:57,150 --> 00:26:59,400 node, the node that represents the travel site, 503 00:26:59,400 --> 00:27:05,990 and we also have our two worker nodes that correspond 504 00:27:05,990 --> 00:27:11,020 to JetBlue and USAir A and B. 505 00:27:11,020 --> 00:27:20,180 I just want to show that each of these sites 506 00:27:20,180 --> 00:27:22,410 is going to maintain a bit of a log that 507 00:27:22,410 --> 00:27:25,012 reflects the state of the actions that it's running. 
508 00:27:25,012 --> 00:27:26,720 So I'm going to show the state of the log 509 00:27:26,720 --> 00:27:27,678 on each of these nodes. 510 00:27:32,440 --> 00:27:38,320 Suppose at some time S starts executing this action, 511 00:27:38,320 --> 00:27:41,600 let's call it T, so it's going to write a log record that 512 00:27:41,600 --> 00:27:44,900 says start T, and then it's going 513 00:27:44,900 --> 00:27:48,470 to request that each of the subordinate nodes, 514 00:27:48,470 --> 00:27:52,040 the sub-nodes goes ahead and does the processing, 515 00:27:52,040 --> 00:27:54,230 say purchases the ticket that it wants. 516 00:27:54,230 --> 00:28:00,740 It is going to send a message here saying, for example, 517 00:28:00,740 --> 00:28:04,220 say the message consists of do something, 518 00:28:04,220 --> 00:28:06,830 do X, purchase this ticket. 519 00:28:06,830 --> 00:28:08,450 And, at the same time, it may also 520 00:28:08,450 --> 00:28:11,700 send a message to B telling it to do something else. 521 00:28:11,700 --> 00:28:18,782 Say, for example, do Y. 522 00:28:18,782 --> 00:28:20,990 Now what's going to happen is that each of these guys 523 00:28:20,990 --> 00:28:23,270 is going to receive this request to do something, 524 00:28:23,270 --> 00:28:26,034 so it's going to write a log record that says start. 525 00:28:26,034 --> 00:28:27,950 It's going to keep some information about what 526 00:28:27,950 --> 00:28:29,754 it started doing, it's going to say X, 527 00:28:29,754 --> 00:28:31,670 and it's going to remember that maybe this was 528 00:28:31,670 --> 00:28:33,450 a part of transaction T. 529 00:28:33,450 --> 00:28:35,150 And, similarly, this guy is going 530 00:28:35,150 --> 00:28:42,130 to start Y which was a part of transaction T. 531 00:28:42,130 --> 00:28:44,362 And then these two As and Bs now are just 532 00:28:44,362 --> 00:28:45,820 going to start executing the action 533 00:28:45,820 --> 00:28:47,380 that they were asked to execute. 
534 00:28:47,380 --> 00:28:49,690 So they are going to, for example, purchase the ticket. 535 00:28:49,690 --> 00:28:50,610 They're going to run. 536 00:28:50,610 --> 00:28:51,984 They're going to acquire whatever 537 00:28:51,984 --> 00:28:53,840 locks that they need to process and then 538 00:28:53,840 --> 00:28:56,890 they are, at some point, going to enter the tentatively 539 00:28:56,890 --> 00:28:58,270 committed state. 540 00:28:58,270 --> 00:29:01,580 The same thing is going to happen over here. 541 00:29:01,580 --> 00:29:06,490 They are both going to run these actions. 542 00:29:06,490 --> 00:29:08,590 When they reach the tentatively committed state, 543 00:29:08,590 --> 00:29:10,048 what they're going to do is they're 544 00:29:10,048 --> 00:29:17,880 going to send a message back that says something like did X. 545 00:29:17,880 --> 00:29:19,970 This message is sometimes called a vote. 546 00:29:19,970 --> 00:29:22,100 Basically it says whether or not this site 547 00:29:22,100 --> 00:29:25,690 agrees that it was able to finish the work that it 548 00:29:25,690 --> 00:29:26,340 was asked to do. 549 00:29:26,340 --> 00:29:30,060 So if it votes yes that means, yes, I'm done, I'm ready to go. 550 00:29:30,060 --> 00:29:31,610 And if it votes no that means sorry, 551 00:29:31,610 --> 00:29:34,210 I couldn't do the thing that you wanted. 552 00:29:34,210 --> 00:29:41,690 So similarly B is going to send back its vote that says did Y. 553 00:29:41,690 --> 00:29:44,190 So let's suppose, in both these cases, both of these actions 554 00:29:44,190 --> 00:29:46,190 were able to do the work that they wanted to do. 555 00:29:56,484 --> 00:29:58,150 At this point now what's going to happen 556 00:29:58,150 --> 00:30:00,490 is that S is going to look at the votes from the actions 557 00:30:00,490 --> 00:30:04,540 that it has tasked, and it's going to decide whether or not 558 00:30:04,540 --> 00:30:07,480 this action is going to commit or is not going to commit.
559 00:30:07,480 --> 00:30:09,600 So S is the one that's responsible for deciding 560 00:30:09,600 --> 00:30:12,830 whether or not this entire action commits. 561 00:30:12,830 --> 00:30:17,035 And suppose it decides it's going to commit, 562 00:30:17,035 --> 00:30:18,410 it's going to write a record that 563 00:30:18,410 --> 00:30:25,470 says commit this transaction T. 564 00:30:25,470 --> 00:30:26,810 So this is now the commit point. 565 00:30:26,810 --> 00:30:29,330 As soon as S writes the commit record, that 566 00:30:29,330 --> 00:30:31,430 means this action is going to commit, 567 00:30:31,430 --> 00:30:34,630 it's going to be made visible to the outside world, 568 00:30:34,630 --> 00:30:36,420 all the work has been done. 569 00:30:36,420 --> 00:30:38,414 And it's OK if it does this because A and B are 570 00:30:38,414 --> 00:30:40,080 both in the tentatively committed state. 571 00:30:40,080 --> 00:30:41,820 They've said I'm ready to go, I've 572 00:30:41,820 --> 00:30:44,520 done all the work I need to do, I can commit whenever 573 00:30:44,520 --> 00:30:46,560 you tell me it's OK to commit. 574 00:30:46,560 --> 00:30:51,067 So, after this point, everything is going to commit. 575 00:30:51,067 --> 00:30:53,400 The reason this is called the two-phase commit protocol, 576 00:30:53,400 --> 00:30:56,050 typically this part is called phase one 577 00:30:56,050 --> 00:30:59,660 where we're deciding whether or not we agree to commit. 578 00:30:59,660 --> 00:31:04,100 And then here in this next step we enter phase two. 579 00:31:04,100 --> 00:31:06,902 So what do we have to do in phase two? 580 00:31:09,950 --> 00:31:13,280 Notice that these guys have tentatively committed. 
581 00:31:13,280 --> 00:31:15,030 A and B don't actually know whether or not 582 00:31:15,030 --> 00:31:17,160 the action has committed so they don't actually 583 00:31:17,160 --> 00:31:19,020 know whether they should release their locks 584 00:31:19,020 --> 00:31:21,220 and make their updates visible to the outside world. 585 00:31:21,220 --> 00:31:24,190 So we need to make sure that S tells A and B that OK, 586 00:31:24,190 --> 00:31:27,346 this action is done, I've committed and it's OK for you 587 00:31:27,346 --> 00:31:29,470 to also go ahead and commit and expose your results 588 00:31:29,470 --> 00:31:30,660 to the outside world. 589 00:31:30,660 --> 00:31:32,170 This is slightly different than the protocol 590 00:31:32,170 --> 00:31:33,930 we looked at before because these things are 591 00:31:33,930 --> 00:31:36,263 on different machines and so we have to pass information 592 00:31:36,263 --> 00:31:39,180 from S to A and B to let them know that this action is 593 00:31:39,180 --> 00:31:41,320 ready to go. 594 00:31:41,320 --> 00:31:48,250 S is going to send a message to A saying commit, 595 00:31:48,250 --> 00:31:54,300 A is going to write a log record that says commit, 596 00:31:54,300 --> 00:31:57,250 and then it's going to do the same thing, 597 00:31:57,250 --> 00:32:01,642 send the message to B that says commit. 598 00:32:01,642 --> 00:32:03,100 And, similarly, B is going to write 599 00:32:03,100 --> 00:32:09,870 a log record that says commit. 600 00:32:09,870 --> 00:32:13,085 This is the basic two-phase commit protocol. 601 00:32:15,610 --> 00:32:18,150 If you count the number of messages that you see here, 602 00:32:18,150 --> 00:32:19,990 if you have N sites, the number of messages 603 00:32:19,990 --> 00:32:21,490 you have to send in two-phase commit 604 00:32:21,490 --> 00:32:27,490 is 3N in the basic protocol without any loss. 
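As a rough sketch of the lossless protocol just described, the following Python models S, A and B as in-process objects, with method calls standing in for messages. The class names, the `can_do` flag, and the policy that S commits only when every worker votes yes are assumptions made for illustration; the lecture leaves the decision up to S.

```python
class Worker:
    def __init__(self, name, can_do=True):
        self.name, self.can_do, self.log = name, can_do, []

    def do(self, txn, work):
        self.log.append(("start", work, txn))
        # ... acquire locks, do the work, reach tentative commit ...
        return self.can_do  # the vote: True means "did it"

    def finish(self, txn, outcome):
        self.log.append((outcome, txn))  # locks would be released here


class Coordinator:
    def __init__(self, workers):
        self.workers, self.log, self.messages = workers, [], 0

    def run(self, txn, work):
        self.log.append(("start", txn))
        votes = []
        for w in self.workers:            # phase one
            votes.append(w.do(txn, work[w.name]))
            self.messages += 2            # one request, one vote
        outcome = "commit" if all(votes) else "abort"
        self.log.append((outcome, txn))   # the commit point is this record
        for w in self.workers:            # phase two
            w.finish(txn, outcome)
            self.messages += 1            # one outcome message
        return outcome


s = Coordinator([Worker("A"), Worker("B")])
print(s.run("T", {"A": "X", "B": "Y"}), s.messages)
```

Counting `messages` this way gives two per worker in phase one and one per worker in phase two, which is where the 3N figure comes from.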
605 00:32:27,490 --> 00:32:29,690 Notice I haven't said anything about what 606 00:32:29,690 --> 00:32:31,470 happens when a message is lost. 607 00:32:31,470 --> 00:32:33,530 And remember these best-effort networks 608 00:32:33,530 --> 00:32:37,050 have this property that messages can be lost, we can lose data, 609 00:32:37,050 --> 00:32:39,970 so we want to make sure that we understand 610 00:32:39,970 --> 00:32:42,994 how this protocol works in the face of data being lost. 611 00:32:42,994 --> 00:32:44,410 The other thing that we need to do 612 00:32:44,410 --> 00:32:46,040 is to make sure that we understand 613 00:32:46,040 --> 00:32:47,640 how this protocol works in the event 614 00:32:47,640 --> 00:32:50,560 that either S crashes or A and B crash 615 00:32:50,560 --> 00:32:52,800 at different points in the execution of the protocol. 616 00:32:52,800 --> 00:32:54,220 So that's what we're going to talk through now, sort 617 00:32:54,220 --> 00:32:56,345 of these nitty-gritty details about how we actually 618 00:32:56,345 --> 00:32:58,800 get this thing to work in the face of these properties 619 00:32:58,800 --> 00:33:00,680 that best-effort networks introduce. 620 00:33:10,070 --> 00:33:22,020 To remind you guys how we deal with loss 621 00:33:22,020 --> 00:33:23,830 in best-effort networks, I am just 622 00:33:23,830 --> 00:33:26,320 going to very quickly review exactly 623 00:33:26,320 --> 00:33:29,470 once RPC protocol that we talked about a few weeks ago. 624 00:33:29,470 --> 00:33:36,790 Exactly once RPC remember is a way 625 00:33:36,790 --> 00:33:39,850 to make it so that a procedure call gets executed once, 626 00:33:39,850 --> 00:33:42,780 a remote procedure call gets executed once, and only once, 627 00:33:42,780 --> 00:33:44,960 between a client and a server. 
628 00:33:44,960 --> 00:33:47,890 So if we've got our client and our server, 629 00:33:47,890 --> 00:33:50,610 what we do is keep at the client, 630 00:33:50,610 --> 00:33:52,800 we keep a list of messages, at the server 631 00:33:52,800 --> 00:33:53,930 we keep a list of "nonces." 632 00:33:56,500 --> 00:33:59,430 The client sends a request, a message, for example, 633 00:33:59,430 --> 00:34:04,310 to the server asking it to do something 634 00:34:04,310 --> 00:34:07,460 and it attaches a nonce to it, say N1. 635 00:34:07,460 --> 00:34:10,969 So the client puts in its message table message, 636 00:34:10,969 --> 00:34:13,250 N1, and stores that information. 637 00:34:13,250 --> 00:34:18,710 The server receives this request, 638 00:34:18,710 --> 00:34:22,260 it stores the nonce in its nonce table, N1, 639 00:34:22,260 --> 00:34:25,610 sends an acknowledgement and processes the request. 640 00:34:25,610 --> 00:34:30,980 The acknowledgement comes back, and maybe it's 641 00:34:30,980 --> 00:34:33,719 lost because these are best-effort networks. 642 00:34:33,719 --> 00:34:35,929 Remember we have this timeout. 643 00:34:35,929 --> 00:34:40,159 After some timeout period the client resends. 644 00:34:40,159 --> 00:34:43,500 This is message, N1 again. 645 00:34:43,500 --> 00:34:45,489 When the message gets resent, the server 646 00:34:45,489 --> 00:34:48,199 checks to see if this message is already in its nonce table. 647 00:34:48,199 --> 00:34:50,889 If it is, it doesn't process the message again but it sends 648 00:34:50,889 --> 00:34:53,889 the ACK for that message. 649 00:34:53,889 --> 00:34:56,342 Now this ACK message gets received, 650 00:34:56,342 --> 00:34:58,800 the client crosses it off its message list because it knows 651 00:34:58,800 --> 00:34:59,675 it's done processing. 652 00:35:04,800 --> 00:35:07,180 That's the basic exactly once RPC protocol.
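A minimal sketch of the exchange above, assuming a deterministic stand-in for the network: `LossyLink` is a made-up helper that drops the first few acknowledgements, and the server keeps the reply alongside each nonce so it can resend it (the lecture only says it keeps the nonce).

```python
class Server:
    def __init__(self):
        self.nonces = {}    # nonce -> reply already sent for it
        self.executed = 0   # how many requests were actually processed

    def handle(self, nonce, request):
        if nonce not in self.nonces:        # first time we see this nonce
            self.executed += 1
            self.nonces[nonce] = f"done:{request}"
        return self.nonces[nonce]           # the ACK, resent on duplicates


class LossyLink:
    """Hypothetical link that loses the first `drops` acknowledgements."""
    def __init__(self, server, drops):
        self.server, self.drops = server, drops

    def call(self, nonce, request):
        reply = self.server.handle(nonce, request)
        if self.drops > 0:
            self.drops -= 1
            return None                     # ACK lost; client times out
        return reply


class Client:
    def __init__(self, link):
        self.link, self.pending, self.next_nonce = link, {}, 1

    def send(self, request, max_tries=10):
        nonce = self.next_nonce
        self.next_nonce += 1
        self.pending[nonce] = request       # the client's message list
        for _ in range(max_tries):          # persistent retry
            reply = self.link.call(nonce, request)
            if reply is not None:
                del self.pending[nonce]     # cross it off the list
                return reply
        raise TimeoutError(request)


server = Server()
client = Client(LossyLink(server, drops=2))
print(client.send("reserve seat"), server.executed)
```

The request reaches the server three times here, but `executed` stays at one because the nonce table filters the duplicates.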
653 00:35:07,180 --> 00:35:12,220 And what this guarantees is that this persistent client is 654 00:35:12,220 --> 00:35:13,950 going to retry sending the request 655 00:35:13,950 --> 00:35:15,760 until it gets an acknowledgement. 656 00:35:15,760 --> 00:35:18,270 And the problem with that is that it can generate multiple, 657 00:35:18,270 --> 00:35:21,420 the server can hear this message multiple times 658 00:35:21,420 --> 00:35:23,539 so the server uses this nonce table to filter out 659 00:35:23,539 --> 00:35:24,330 duplicate messages. 660 00:35:38,520 --> 00:35:41,546 Let's see how we can use this notion of this exactly once 661 00:35:41,546 --> 00:35:42,920 in our two-phase commit protocol. 662 00:35:45,470 --> 00:35:50,300 I'm going to erase this and just redraw a similar example with 663 00:35:50,300 --> 00:35:55,570 a lossy protocol instead of a lossless protocol. 664 00:35:55,570 --> 00:35:58,580 And just to sort of make the notation a little bit simpler, 665 00:35:58,580 --> 00:36:02,340 let's now just suppose we have one worker site A. 666 00:36:02,340 --> 00:36:05,180 We're not going to show A and B. 667 00:36:05,180 --> 00:36:09,050 But this generalizes completely to as many As, Bs, Cs as we 668 00:36:09,050 --> 00:36:10,380 want. 669 00:36:10,380 --> 00:36:16,540 What's going to happen now is, what I want to do 670 00:36:16,540 --> 00:36:21,490 is I want to keep a list of pending actions at S, 671 00:36:21,490 --> 00:36:23,990 as well as the log at S. 672 00:36:23,990 --> 00:36:26,430 And I'm also at A going to keep a list of pending 673 00:36:26,430 --> 00:36:35,070 actions and the log at A. 674 00:36:35,070 --> 00:36:37,417 At some point S is going to go ahead and start 675 00:36:37,417 --> 00:36:39,750 processing the transaction again and it's going to write 676 00:36:39,750 --> 00:36:41,220 start T. 677 00:36:41,220 --> 00:36:44,390 It's going to add T to its list of pending actions.
678 00:36:44,390 --> 00:36:49,920 And then it's going to send this message that says do X to A. 679 00:36:49,920 --> 00:36:54,760 Of course, it may be the case that this message gets lost 680 00:36:54,760 --> 00:36:56,940 because this is a lossy network. 681 00:36:56,940 --> 00:36:59,560 So we're just going to use our persistent retry. 682 00:36:59,560 --> 00:37:02,690 We're going to make S persistently 683 00:37:02,690 --> 00:37:06,060 retry so some time later, after some timeout, 684 00:37:06,060 --> 00:37:09,100 it's going to resend this do X message. 685 00:37:09,100 --> 00:37:10,720 And it knows that it needs to time out 686 00:37:10,720 --> 00:37:12,806 because it sees that there is this action here 687 00:37:12,806 --> 00:37:13,680 that's still pending. 688 00:37:18,210 --> 00:37:23,520 Now this site A is going to receive this request. 689 00:37:23,520 --> 00:37:27,160 It's going to add X for transaction T 690 00:37:27,160 --> 00:37:30,290 to its pending list, it's going to write its start 691 00:37:30,290 --> 00:37:34,110 X of T record, and it's going to go ahead and process 692 00:37:34,110 --> 00:37:37,580 just like it did before until it tentatively commits. 693 00:37:41,930 --> 00:37:44,810 And now, once it is tentatively committed, 694 00:37:44,810 --> 00:37:48,920 it's going to go ahead and mark in its pending 695 00:37:48,920 --> 00:37:51,660 table that this action is tentatively committed 696 00:37:51,660 --> 00:37:58,320 and it's going to send a response back that says did X. 697 00:37:58,320 --> 00:38:02,470 Of course, this response can also be lost. 698 00:38:02,470 --> 00:38:06,750 So, again, remember we have S persistently retrying. 699 00:38:06,750 --> 00:38:09,490 So, at some point, it didn't hear this did X response. 700 00:38:09,490 --> 00:38:12,700 It's just going to say hey, do it again.
701 00:38:12,700 --> 00:38:15,919 And now when A receives this request, 702 00:38:15,919 --> 00:38:17,710 it's going to look up in its pending table, 703 00:38:17,710 --> 00:38:20,168 it's going to see that it has already tentatively committed 704 00:38:20,168 --> 00:38:20,720 this action. 705 00:38:20,720 --> 00:38:23,220 So it's just going to not process the action at all, 706 00:38:23,220 --> 00:38:25,520 it's just going to send back the request that says, 707 00:38:25,520 --> 00:38:33,010 this should say do X, it's going to say did X. 708 00:38:33,010 --> 00:38:36,770 Now, once S has received the tentative commits from all 709 00:38:36,770 --> 00:38:39,060 of the other actions, it can go ahead and write 710 00:38:39,060 --> 00:38:45,040 its commit record and we can go ahead and enter phase two, 711 00:38:45,040 --> 00:38:48,310 just like we did before. 712 00:38:50,907 --> 00:38:52,990 You can kind of see how this is going to work out. 713 00:38:52,990 --> 00:38:57,660 The process is just going to continue in the same way. 714 00:38:57,660 --> 00:38:59,641 After S writes its commit record, 715 00:38:59,641 --> 00:39:01,765 it's going to go ahead and send the commit message. 716 00:39:07,252 --> 00:39:09,460 And, of course, it is possible for the commit message 717 00:39:09,460 --> 00:39:11,800 to be lost. 718 00:39:11,800 --> 00:39:14,960 So, in this case, notice that the server doesn't actually 719 00:39:14,960 --> 00:39:17,430 know whether the commit message has been lost or not. 720 00:39:17,430 --> 00:39:21,030 And we haven't shown A sending out any responses 721 00:39:21,030 --> 00:39:23,820 back to the server after the commit message has been sent. 722 00:39:23,820 --> 00:39:26,470 So that means that A is the one that actually 723 00:39:26,470 --> 00:39:27,880 has to retry in this case. 724 00:39:27,880 --> 00:39:36,870 Now, A is just going to say hey, S, I did X. 725 00:39:36,870 --> 00:39:38,700 And that's OK. 
726 00:39:38,700 --> 00:39:40,480 Now what's going to happen is S is 727 00:39:40,480 --> 00:39:43,310 going to go ahead and look up in its table; when 728 00:39:43,310 --> 00:39:45,897 it wrote the commit record it changed 729 00:39:45,897 --> 00:39:48,230 the state of this transaction 730 00:39:48,230 --> 00:39:52,610 to committed, so S is just going to look up in its table, 731 00:39:52,610 --> 00:39:54,280 see that the transaction is committed 732 00:39:54,280 --> 00:39:58,590 and is going to send this message again. 733 00:39:58,590 --> 00:40:01,131 And now, hopefully, this time it gets through. 734 00:40:01,131 --> 00:40:03,380 And A can go ahead and change the state of this action 735 00:40:03,380 --> 00:40:07,640 to committed, it can release its locks and the protocol is done. 736 00:40:07,640 --> 00:40:11,220 One thing to note is that as soon 737 00:40:11,220 --> 00:40:13,930 as A knows that the action is committed, 738 00:40:13,930 --> 00:40:16,870 it doesn't need to keep any more state about the action 739 00:40:16,870 --> 00:40:17,950 around anymore. 740 00:40:17,950 --> 00:40:19,680 It knows it's committed. 741 00:40:19,680 --> 00:40:22,570 That means that S has definitely heard about the fact 742 00:40:22,570 --> 00:40:24,219 that it did X. 743 00:40:24,219 --> 00:40:25,760 And so A can go ahead and just forget 744 00:40:25,760 --> 00:40:28,020 any information it had in this pending transaction 745 00:40:28,020 --> 00:40:31,520 table about the action. 746 00:40:31,520 --> 00:40:33,340 We haven't quite solved the problem, 747 00:40:33,340 --> 00:40:37,030 because we still, in S, have to keep this information around 748 00:40:37,030 --> 00:40:39,790 about the fact that the transaction committed 749 00:40:39,790 --> 00:40:42,890 because we never actually know, in S, 750 00:40:42,890 --> 00:40:45,130 whether this final commit message got through or not.
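The duplicate handling in the lossy version can be sketched with two small state tables. Everything named here (`WorkerA`, `CoordinatorS`, `on_do`, `on_did`, `on_commit`) is hypothetical, there is a single worker as on the board, and timeouts are left out: the caller simply invokes a handler again to model a resend.

```python
class WorkerA:
    def __init__(self):
        self.pending = {}        # (txn, work) -> "running" | "tentative"
        self.log = []
        self.times_executed = 0

    def on_do(self, txn, work):
        key = (txn, work)
        if key not in self.pending:           # first copy of "do X"
            self.log.append(("start", work, txn))
            self.pending[key] = "running"
            self.times_executed += 1          # the real work happens once
            self.pending[key] = "tentative"
        return ("did", work, txn)             # duplicates just get the vote

    def on_commit(self, txn, work):
        # Pop so that a duplicate commit message is ignored.
        if self.pending.pop((txn, work), None) is not None:
            self.log.append(("commit", txn))  # release locks, forget state


class CoordinatorS:
    def __init__(self):
        self.state = {}   # txn -> "pending" | "committed"
        self.log = []

    def start(self, txn):
        self.log.append(("start", txn))
        self.state[txn] = "pending"

    def on_did(self, txn):
        if self.state[txn] == "pending":
            self.log.append(("commit", txn))  # commit point, written once
            self.state[txn] = "committed"
        return ("commit", txn)                # same answer on every retry


s, a = CoordinatorS(), WorkerA()
s.start("T")
a.on_do("T", "X")
a.on_do("T", "X")        # S timed out and resent "do X"
s.on_did("T")
s.on_did("T")            # A retried "did X" because the commit was lost
a.on_commit("T", "X")
print(a.times_executed, s.log)
```

One case this sketch does not handle is a delayed duplicate of do X arriving after A has already forgotten T.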
751 00:40:45,130 --> 00:40:48,900 So it's always possible that A could re-request 752 00:40:48,900 --> 00:40:50,810 the state of transaction T. 753 00:40:50,810 --> 00:40:54,290 And S needs to be able to answer that correctly. 754 00:40:54,290 --> 00:40:56,780 So this complicates this a little bit, 755 00:40:56,780 --> 00:40:58,330 and there are a couple of solutions 756 00:40:58,330 --> 00:41:00,345 that people have proposed. 757 00:41:00,345 --> 00:41:02,220 The obvious one is we just add an extra round 758 00:41:02,220 --> 00:41:04,247 of acknowledgements onto the end of this. 759 00:41:04,247 --> 00:41:06,830 So that's a simple thing we can do, is just have A acknowledge 760 00:41:06,830 --> 00:41:07,914 that it heard the message. 761 00:41:07,914 --> 00:41:09,954 And then, as soon as S has heard acknowledgements 762 00:41:09,954 --> 00:41:11,890 from all of its subordinates, it can go ahead 763 00:41:11,890 --> 00:41:13,510 and delete the information. 764 00:41:13,510 --> 00:41:15,000 There is another variant of this, 765 00:41:15,000 --> 00:41:16,958 which is talked about a little bit in the text, 766 00:41:16,958 --> 00:41:18,640 something called presumed commit where 767 00:41:18,640 --> 00:41:21,359 basically if S doesn't know anything about the action, 768 00:41:21,359 --> 00:41:22,900 it assumes that the action committed. 769 00:41:22,900 --> 00:41:26,284 So S can discard the facts about any committed actions. 770 00:41:26,284 --> 00:41:28,200 Getting that to work is a little bit trickier. 771 00:41:28,200 --> 00:41:34,270 And the details of it are a little bit complicated 772 00:41:34,270 --> 00:41:39,327 but you can sort of see the idea.
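The extra round of acknowledgements amounts to S tracking which subordinates have not yet confirmed the commit. A small sketch, with the `CommittedTable` class and its method names invented for illustration:

```python
class CommittedTable:
    """What S remembers about committed transactions (hypothetical)."""
    def __init__(self):
        self.waiting = {}  # txn -> workers that have not acknowledged yet

    def committed(self, txn, workers):
        self.waiting[txn] = set(workers)

    def on_ack(self, txn, worker):
        remaining = self.waiting.get(txn)
        if remaining is None:
            return                  # already forgotten: a duplicate ACK
        remaining.discard(worker)
        if not remaining:
            del self.waiting[txn]   # everyone has heard; S can forget T

    def knows(self, txn):
        return txn in self.waiting


table = CommittedTable()
table.committed("T", ["A", "B"])
table.on_ack("T", "A")
print(table.knows("T"))   # B has not acknowledged, so S still keeps T
table.on_ack("T", "B")
print(table.knows("T"))
```

The cost is one more message per subordinate; presumed commit avoids that cost at the price of the trickier bookkeeping mentioned above.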
773 00:41:39,327 --> 00:41:41,660 I just want to quickly spend a couple of minutes talking 774 00:41:41,660 --> 00:41:47,520 about what happens in the case when these systems crash 775 00:41:47,520 --> 00:41:49,139 during different phases of execution 776 00:41:49,139 --> 00:41:50,930 so you guys can get a sense of how recovery 777 00:41:50,930 --> 00:41:53,650 would work in this environment. 778 00:41:53,650 --> 00:41:55,145 Let's suppose that S crashes. 779 00:41:57,990 --> 00:42:00,550 And there are two situations we're worried about. 780 00:42:00,550 --> 00:42:04,870 Either it crashes before commit or it crashes after commit. 781 00:42:04,870 --> 00:42:11,230 If it crashes before commit that means 782 00:42:11,230 --> 00:42:19,900 that, S is the sort of lead transaction here, 783 00:42:19,900 --> 00:42:21,540 we want to treat this just like we 784 00:42:21,540 --> 00:42:24,581 would treat this in sort of traditional recoverable 785 00:42:24,581 --> 00:42:25,080 systems. 786 00:42:25,080 --> 00:42:27,700 So if we crash before the main commit, well, 787 00:42:27,700 --> 00:42:30,660 what we want to do is undo the effects of this transaction 788 00:42:30,660 --> 00:42:31,460 completely. 789 00:42:31,460 --> 00:42:36,350 So we're going to undo T. 790 00:42:36,350 --> 00:42:38,800 Notice, however, that if there are multiple As, 791 00:42:38,800 --> 00:42:43,480 Bs and Cs it may be the case that S crashes and some 792 00:42:43,480 --> 00:42:46,430 of the subordinates are still processing messages and send 793 00:42:46,430 --> 00:42:50,940 "finished processing" messages to S. 794 00:42:50,940 --> 00:42:54,970 That means when S crashes and recovers, it comes back up, 795 00:42:54,970 --> 00:43:02,580 it undoes the transaction and it remembers that T aborted.
796 00:43:02,580 --> 00:43:04,760 So it puts T in its transaction table 797 00:43:04,760 --> 00:43:07,280 so that when it gets a request from somebody saying hey, 798 00:43:07,280 --> 00:43:09,410 I finished doing this part of transaction T, 799 00:43:09,410 --> 00:43:13,890 it can tell that guy oh, by the way, that transaction aborted. 800 00:43:13,890 --> 00:43:16,077 Now, suppose that we crashed after the commit, well, 801 00:43:16,077 --> 00:43:17,410 you know what's going to happen. 802 00:43:17,410 --> 00:43:19,035 We want this transaction to be durable. 803 00:43:19,035 --> 00:43:21,080 We said that the commit is the commit point, 804 00:43:21,080 --> 00:43:23,900 we want this thing to appear to have happened. 805 00:43:23,900 --> 00:43:28,720 So we need to make sure that we run redo on T 806 00:43:28,720 --> 00:43:36,110 and that we remember that T committed. 807 00:43:41,110 --> 00:43:44,804 Now, if A crashes, the situation is a little bit easier. 808 00:43:44,804 --> 00:43:46,470 Basically, there are only two situations 809 00:43:46,470 --> 00:43:48,050 we have to worry about in A. 810 00:43:48,050 --> 00:43:50,070 It's either before or after we've 811 00:43:50,070 --> 00:43:51,690 gone into the tentative commit state. 812 00:43:58,010 --> 00:43:59,760 If it's before the tentative commit state, 813 00:43:59,760 --> 00:44:02,220 well, we're just going to undo the effects of this transaction 814 00:44:02,220 --> 00:44:02,750 completely. 815 00:44:02,750 --> 00:44:04,458 We're going to roll it back and basically 816 00:44:04,458 --> 00:44:06,050 going to forget about it. 817 00:44:06,050 --> 00:44:08,380 And it's going to be S's responsibility 818 00:44:08,380 --> 00:44:13,580 to try and redo this action with us, if it wants to, 819 00:44:13,580 --> 00:44:16,760 or S may go ask somebody else to do this part of the action. 
820 00:44:16,760 --> 00:44:18,640 If we crashed after the tentative commit, 821 00:44:18,640 --> 00:44:23,100 though, remember that at this point it's up to S. 822 00:44:23,100 --> 00:44:24,025 This is A crashes. 823 00:44:28,050 --> 00:44:30,490 At this point it's up to S. 824 00:44:30,490 --> 00:44:33,610 So, after we've reached the tentative commit, 825 00:44:33,610 --> 00:44:36,330 this action needs to go ahead and check with S 826 00:44:36,330 --> 00:44:38,580 and see what the final outcome of this thing was. 827 00:44:38,580 --> 00:44:40,780 It crashes, it comes back up, it sees 828 00:44:40,780 --> 00:44:44,109 it was in the tentative commit state for T, 829 00:44:44,109 --> 00:44:45,650 and so it sends a message to S saying 830 00:44:45,650 --> 00:44:47,110 hey, whatever happened to T? 831 00:44:47,110 --> 00:44:49,120 And then S sends a response back if it knows. 832 00:44:51,930 --> 00:44:55,300 This is the basic outline of how we do crash recovery 833 00:44:55,300 --> 00:44:57,454 and how we deal with lost messages 834 00:44:57,454 --> 00:44:58,870 in this two-phase commit protocol. 835 00:45:01,980 --> 00:45:04,400 With the last few minutes, I want to talk about something. 836 00:45:04,400 --> 00:45:06,650 This two-phase commit protocol seems really great. 837 00:45:06,650 --> 00:45:08,550 It seems like we've got this way to 838 00:45:08,550 --> 00:45:10,210 have actions that are distributed 839 00:45:10,210 --> 00:45:11,440 across multiple sites. 840 00:45:11,440 --> 00:45:13,980 We've made it so that 841 00:45:13,980 --> 00:45:17,590 we have this nice property that 842 00:45:17,590 --> 00:45:21,450 either neither A nor B commits 843 00:45:21,450 --> 00:45:23,170 or they definitely both commit. 844 00:45:23,170 --> 00:45:25,930 We have this way in which S is the ultimate arbiter 845 00:45:25,930 --> 00:45:28,050 and authority of what commits.
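The recovery rules for S and for A can be summarized as a scan over each site's log. This is a sketch under assumptions: log records are `(record, txn)` pairs, the worker is assumed to write a durable "tentative" record (the lecture only says A can see that it was in the tentative commit state), and the returned strings are just labels for what the site would do next.

```python
def recover_coordinator(log):
    """Decide, per transaction, what S does after a crash."""
    plan = {}
    for record, txn in log:
        if record == "start":
            # No commit record seen so far: roll back, but remember
            # the abort so late "did X" messages can be answered.
            plan[txn] = "undo, remember aborted"
        elif record == "commit":
            # Past the commit point: the outcome must be durable.
            plan[txn] = "redo, remember committed"
    return plan


def recover_worker(log):
    """Decide, per transaction, what a worker such as A does."""
    plan = {}
    for record, txn in log:
        if record == "start":
            plan[txn] = "undo and forget"    # S may ask again or go elsewhere
        elif record == "tentative":
            plan[txn] = "ask S for outcome"  # the decision belongs to S
        elif record in ("commit", "abort"):
            plan[txn] = "done"
    return plan


print(recover_coordinator([("start", "T"), ("commit", "T"), ("start", "U")]))
print(recover_worker([("start", "T"), ("tentative", "T")]))
```

The asymmetry is the point: S decides from its own log alone, while a worker that crashed after tentative commit cannot decide anything without asking S.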
846 00:45:28,050 --> 00:45:31,380 This seems like a really nice system that we built. 847 00:45:31,380 --> 00:45:31,940 And it is. 848 00:45:31,940 --> 00:45:36,180 And it has all these great properties, 849 00:45:36,180 --> 00:45:39,170 but there is one property that it doesn't have. 850 00:45:39,170 --> 00:45:55,200 The question is, say in this environment with A and B, 851 00:45:55,200 --> 00:46:12,620 do they make their results visible at the same time? 852 00:46:12,620 --> 00:46:14,346 I had written commit before. 853 00:46:14,346 --> 00:46:16,470 And, in some sense, they do commit at the same time 854 00:46:16,470 --> 00:46:19,030 because they're going to definitely commit at the point 855 00:46:19,030 --> 00:46:20,556 that this record gets written. 856 00:46:20,556 --> 00:46:22,930 But the answer to the question are their results visible, 857 00:46:22,930 --> 00:46:24,305 the question is are their results 858 00:46:24,305 --> 00:46:28,400 visible at the same time, if you stare at this for a little 859 00:46:28,400 --> 00:46:30,340 while you realize no because their results 860 00:46:30,340 --> 00:46:34,600 become visible at the point at which these A and B receive 861 00:46:34,600 --> 00:46:37,060 the commit message from S. 862 00:46:37,060 --> 00:46:39,980 Not at the point that S writes the commit record. 863 00:46:39,980 --> 00:46:43,230 So they're going to expose their results at a slightly different 864 00:46:43,230 --> 00:46:45,490 point in time. 865 00:46:45,490 --> 00:46:47,100 And, in fact, it could be a long time 866 00:46:47,100 --> 00:46:49,342 because it may be that A crashed after it went 867 00:46:49,342 --> 00:46:51,300 into the tentative commit state and it took it 868 00:46:51,300 --> 00:46:52,660 two days to recover. 869 00:46:52,660 --> 00:46:54,870 And then it came back up and then it checks with S 870 00:46:54,870 --> 00:46:56,690 and says hey, whatever happened to T?
871 00:46:56,690 --> 00:47:00,220 So it could be a very long time before, say, 872 00:47:00,220 --> 00:47:02,740 A commits, makes its results visible, 873 00:47:02,740 --> 00:47:06,560 whereas B may have done so immediately 874 00:47:06,560 --> 00:47:09,510 after it received the first commit message from S. 875 00:47:09,510 --> 00:47:16,560 You might ask the question is it possible to guarantee 876 00:47:16,560 --> 00:47:20,130 that A and B expose their results to the outside world 877 00:47:20,130 --> 00:47:21,470 at exactly the same instant. 878 00:47:21,470 --> 00:47:25,990 And the answer to this question, it turns out, is no. 879 00:47:25,990 --> 00:47:30,520 And this is kind of a fundamental result. 880 00:47:30,520 --> 00:47:32,650 The reason for this, let me just give you, 881 00:47:32,650 --> 00:47:35,560 there's a simple example that's talked about in the book. 882 00:47:35,560 --> 00:47:37,185 It's called the "two generals problem". 883 00:47:41,070 --> 00:47:43,850 And the idea is as follows. 884 00:47:43,850 --> 00:47:47,220 Suppose there are two generals and they have their armies. 885 00:47:47,220 --> 00:47:49,220 They're flanking somebody who they are attacking 886 00:47:49,220 --> 00:47:50,937 and they're on the sides of a valley 887 00:47:50,937 --> 00:47:53,520 and they're both going to dive into the valley with the armies 888 00:47:53,520 --> 00:47:55,182 at the same time. 889 00:47:55,182 --> 00:47:56,890 They're trying to agree what time they're 890 00:47:56,890 --> 00:47:57,960 going to do this at. 891 00:47:57,960 --> 00:47:59,880 And they want to make sure they both attack at the same time 892 00:47:59,880 --> 00:48:01,838 because, if they don't attack at the same time, 893 00:48:01,838 --> 00:48:03,830 they're afraid that one of them will lose.
894 00:48:03,830 --> 00:48:06,030 So the only way they have to communicate 895 00:48:06,030 --> 00:48:08,280 with each other is by sending these messengers, 896 00:48:08,280 --> 00:48:09,280 say, across the valleys. 897 00:48:09,280 --> 00:48:10,738 They have some guy who runs across. 898 00:48:10,738 --> 00:48:12,460 But, of course, this guy may not make it. 899 00:48:12,460 --> 00:48:16,090 He may collapse from exhaustion or he may get shot or whatever. 900 00:48:18,980 --> 00:48:21,130 So the first general sends a runner out 901 00:48:21,130 --> 00:48:24,260 to the second general that says we'll attack at dawn. 902 00:48:24,260 --> 00:48:26,330 And maybe that runner gets through. 903 00:48:26,330 --> 00:48:28,550 And then the second general sends a runner back that 904 00:48:28,550 --> 00:48:30,660 says yes, we'll attack at dawn. 905 00:48:30,660 --> 00:48:34,440 And maybe that runner gets through, or maybe he doesn't. 906 00:48:34,440 --> 00:48:36,270 The second runner doesn't get through, 907 00:48:36,270 --> 00:48:37,290 and the first general says man, I 908 00:48:37,290 --> 00:48:39,665 don't know if the second general heard about this or not. 909 00:48:39,665 --> 00:48:42,500 So he sends another runner that says we'll attack at dawn. 910 00:48:42,500 --> 00:48:44,760 And the second general says I already sent a runner 911 00:48:44,760 --> 00:48:46,135 but I guess he didn't get through 912 00:48:46,135 --> 00:48:48,490 and he sends another one back. 913 00:48:48,490 --> 00:48:50,415 And, ultimately, they both need to agree 914 00:48:50,415 --> 00:48:52,040 that they'll both attack at dawn so you 915 00:48:52,040 --> 00:48:54,500 need a certain number of runners to successfully get 916 00:48:54,500 --> 00:48:56,750 through in order for this to happen. 
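The runner exchange above can be sketched as a small simulation. This is a rough illustration, not anything from the book: the function name `runners_until_ack`, the independent per-runner loss probability `loss_prob`, and the `max_runners` cap are all assumptions made for the sketch.

```python
import random


def runners_until_ack(loss_prob, rng, max_runners=1000):
    """Count runners sent before general 1 hears an acknowledgement.

    Each runner is assumed to be lost independently with probability
    loss_prob. Returns None if max_runners is used up first.
    """
    sent = 0
    while sent < max_runners:
        sent += 1  # general 1 sends "we'll attack at dawn"
        if rng.random() < loss_prob:
            continue  # runner lost; general 1 hears nothing and resends
        if sent >= max_runners:
            return None  # no runner left to carry the reply
        sent += 1  # general 2 sends "yes, we'll attack at dawn"
        if rng.random() < loss_prob:
            continue  # reply lost; general 1 cannot tell, so he resends
        return sent
    return None


if __name__ == "__main__":
    rng = random.Random(6033)  # arbitrary seed so runs are repeatable
    print([runners_until_ack(0.3, rng) for _ in range(10)])
```

The count differs from run to run. Note that the sketch stops as soon as general 1 hears a reply, and at that moment general 2 still does not know whether his reply got through.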
917 00:48:56,750 --> 00:48:59,600 The issue here, though, is that suppose 918 00:48:59,600 --> 00:49:02,740 the generals have a fixed number of runners 919 00:49:02,740 --> 00:49:05,250 and they want to say what's the maximum number of runners 920 00:49:05,250 --> 00:49:07,510 that I could possibly ever need in order 921 00:49:07,510 --> 00:49:09,070 to agree on this thing? 922 00:49:09,070 --> 00:49:10,960 And, if you think about this for a minute, 923 00:49:10,960 --> 00:49:15,067 you will see that there's no finite bound 924 00:49:15,067 --> 00:49:16,900 on the number of runners that could possibly 925 00:49:16,900 --> 00:49:24,804 be needed, because it's always the case that a huge number 926 00:49:24,804 --> 00:49:26,470 of runners could be lost because this is 927 00:49:26,470 --> 00:49:28,450 this best-effort network where there's always 928 00:49:28,450 --> 00:49:30,160 some probability of something being lost. 929 00:49:30,160 --> 00:49:32,920 There is always this infinitesimal little chance 930 00:49:32,920 --> 00:49:36,770 that a million runners in a row wouldn't make it through. 931 00:49:36,770 --> 00:49:38,770 So this is essentially the two generals problem. 932 00:49:38,770 --> 00:49:42,190 And what it says is it's impossible for these two guys 933 00:49:42,190 --> 00:49:44,800 to guarantee that they will achieve consensus 934 00:49:44,800 --> 00:49:47,390 about something using a fixed number of messages, 935 00:49:47,390 --> 00:49:50,000 using a fixed number of runners. 936 00:49:50,000 --> 00:49:54,400 That means, in the case of two-phase commit, 937 00:49:54,400 --> 00:49:58,580 what that suggests is that this S or one of these clients 938 00:49:58,580 --> 00:50:00,550 may need to continually retransmit, say, 939 00:50:00,550 --> 00:50:04,062 an infinite number of times, a very large number of times 940 00:50:04,062 --> 00:50:05,770 before its request is actually processed.
941 00:50:05,770 --> 00:50:08,830 Because there's always this sort of small chance of a message 942 00:50:08,830 --> 00:50:11,310 being lost by the best-effort network. 943 00:50:11,310 --> 00:50:13,060 So that's the two generals problem 944 00:50:13,060 --> 00:50:15,306 and it's just sort of a nice result to keep in mind. 945 00:50:15,306 --> 00:50:17,180 It's one of those results in computer science 946 00:50:17,180 --> 00:50:19,590 that sometimes it turns out to be important. 947 00:50:19,590 --> 00:50:21,920 In practice, in this kind of environment 948 00:50:21,920 --> 00:50:24,220 it doesn't matter that much. 949 00:50:24,220 --> 00:50:26,340 The probabilities of loss are small. 950 00:50:26,340 --> 00:50:29,140 And so most of the time this is going 951 00:50:29,140 --> 00:50:32,730 to achieve consensus after a limited number of messages. 952 00:50:32,730 --> 00:50:35,000 So that's it for our discussion of fault tolerance 953 00:50:35,000 --> 00:50:35,950 and recovery. 954 00:50:35,950 --> 00:50:38,180 Next time we're going to start talking about security 955 00:50:38,180 --> 00:50:41,100 and protection of information. 956 00:50:41,100 --> 00:50:43,490 We will see you on Monday.