1 00:00:00,770 --> 00:00:06,000 Today's topic is one of the most important concepts 2 00:00:06,000 --> 00:00:08,940 in this area, and it is called atomicity. 3 00:00:08,940 --> 00:00:11,020 And what we are going to do is spend time 4 00:00:11,020 --> 00:00:13,160 understanding what this is as a concept 5 00:00:13,160 --> 00:00:19,240 and then understanding how to achieve atomicity in systems. 6 00:00:19,240 --> 00:00:24,120 And recall that the main goal is to handle "failures", 7 00:00:24,120 --> 00:00:27,636 and that is what we talked about the last time. 8 00:00:27,636 --> 00:00:30,010 And we came up with a bunch of different ways of thinking 9 00:00:30,010 --> 00:00:32,509 about failures and how to cope with it. 10 00:00:32,509 --> 00:00:40,110 And one idea that we saw the last time was an idea involving 11 00:00:40,110 --> 00:00:42,620 replicating a component, let's say 12 00:00:42,620 --> 00:00:49,070 a disk or any component whose failure you 13 00:00:49,070 --> 00:00:51,445 wish to cope with and vote on the results. 14 00:00:55,420 --> 00:00:58,640 And so the idea is that if you are not exactly sure what 15 00:00:58,640 --> 00:00:59,890 the right answer should be-- 16 00:00:59,890 --> 00:01:02,960 If you are not sure whether any given component is working 17 00:01:02,960 --> 00:01:05,030 correctly or not, replicate that component 18 00:01:05,030 --> 00:01:10,440 and then give them all the same input, see what output appears 19 00:01:10,440 --> 00:01:11,770 and then vote on the results. 20 00:01:11,770 --> 00:01:14,340 And we did see that these things are pretty sophisticated, 21 00:01:14,340 --> 00:01:16,840 but the main problem with replicate plus vote 22 00:01:16,840 --> 00:01:20,490 is that often it is extremely expensive to build and very, 23 00:01:20,490 --> 00:01:22,870 very hard to get right. 24 00:01:22,870 --> 00:01:25,230 And, second, it often does not actually work. 
25 00:01:25,230 --> 00:01:28,790 For example, if you just take a software program, 26 00:01:28,790 --> 00:01:33,240 a software module and you make 100 copies or 95 copies 27 00:01:33,240 --> 00:01:34,890 of that software module and give them 28 00:01:34,890 --> 00:01:37,050 all the same input and then vote on the output, 29 00:01:37,050 --> 00:01:38,860 if you have a bug in one of the modules 30 00:01:38,860 --> 00:01:41,290 and it is a bug that is actually replicated 31 00:01:41,290 --> 00:01:44,010 in all of the modules then all of the replicas 32 00:01:44,010 --> 00:01:46,230 are going to give you the same wrong answer. 33 00:01:46,230 --> 00:01:48,930 So the key assumption behind replicating and voting 34 00:01:48,930 --> 00:01:51,320 is that the replicas are independent of each other 35 00:01:51,320 --> 00:01:53,730 and have independent modes of failure. 36 00:01:53,730 --> 00:01:58,340 And that may not be true in all of your modules. 37 00:01:58,340 --> 00:02:00,915 And so the way we are going to deal with this problem, 38 00:02:00,915 --> 00:02:03,290 and even though it is possible to design software systems 39 00:02:03,290 --> 00:02:06,090 where the replicas are, in fact, independent of each other, 40 00:02:06,090 --> 00:02:08,430 it will turn out that it is quite expensive 41 00:02:08,430 --> 00:02:10,100 to do in many cases. 42 00:02:10,100 --> 00:02:14,470 So what we are going to do, to relax this assumption of having 43 00:02:14,470 --> 00:02:18,680 a system which handles failures by giving 44 00:02:18,680 --> 00:02:21,220 the same input to multiple replicas and then voting on the outputs, 45 00:02:21,220 --> 00:02:23,610 we are going to relax that and instead 46 00:02:23,610 --> 00:02:26,595 look at a different concept called "recoverability".
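To make the replicate-and-vote idea a little more concrete, here is a minimal sketch, not anything shown in the lecture. The function names (`vote`, `good_square`, `flaky_square`, `buggy_square`) are hypothetical and chosen only for illustration. It shows a majority vote masking one independent fault, and the same vote returning a wrong answer when every replica shares the same bug.

```python
from collections import Counter

def vote(outputs):
    """Return the strict-majority output, or raise if there is none."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among replicas")
    return value

def good_square(x):
    return x * x

def flaky_square(x):
    # Hypothetical independent fault: only this replica is wrong.
    return x * x + 1

def buggy_square(x):
    # Hypothetical design bug that is copied into every replica.
    return x + x

# Independent failure: two correct replicas outvote the faulty one.
independent = vote([f(3) for f in (good_square, good_square, flaky_square)])

# Replicated bug: all copies agree, so the vote returns the same wrong answer.
correlated = vote([f(3) for f in (buggy_square, buggy_square, buggy_square)])
```

In this sketch the vote only helps in the first case, which is the independence assumption described above.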
47 00:02:29,770 --> 00:02:31,880 And the idea here is, rather 48 00:02:31,880 --> 00:02:36,980 than to try to replicate modules so that to the higher layers 49 00:02:36,980 --> 00:02:40,200 it looks as if the underlying module has never failed 50 00:02:40,200 --> 00:02:43,190 because you have replicated it, the idea here is to allow 51 00:02:43,190 --> 00:02:45,500 the underlying module to fail. 52 00:02:45,500 --> 00:02:48,570 But have it fail, typically in a fail-fast manner 53 00:02:48,570 --> 00:02:50,640 so that you can detect the failure, 54 00:02:50,640 --> 00:02:53,050 and then arrange for that module to be restarted. 55 00:02:53,050 --> 00:02:55,440 And when it restarts the idea is to make it 56 00:02:55,440 --> 00:02:58,000 so that the module does something 57 00:02:58,000 --> 00:03:01,250 such that in the end the state of the system, 58 00:03:01,250 --> 00:03:03,220 after it does that thing, usually some kind 59 00:03:03,220 --> 00:03:06,900 of recovery procedure, is such that you can get back 60 00:03:06,900 --> 00:03:08,310 to using that module. 61 00:03:08,310 --> 00:03:10,230 So it is a little bit like rather than try 62 00:03:10,230 --> 00:03:13,907 to build, you know, the analogy might be something like this. 63 00:03:13,907 --> 00:03:16,240 You might imagine, let's say there is a little child who 64 00:03:16,240 --> 00:03:17,310 is learning to walk. 65 00:03:17,310 --> 00:03:19,040 One approach for nature to have adopted 66 00:03:19,040 --> 00:03:21,498 would have been to try to make it so the child never falls. 67 00:03:21,498 --> 00:03:23,830 And there is a lot of complexity associated with always 68 00:03:23,830 --> 00:03:25,385 keeping that child walking. 
69 00:03:25,385 --> 00:03:26,760 Or, alternatively, you could have 70 00:03:26,760 --> 00:03:29,580 a story or a method by which every once in a while 71 00:03:29,580 --> 00:03:33,380 the child falls but then has a plan to get up from that fall 72 00:03:33,380 --> 00:03:35,182 and then restart. 73 00:03:35,182 --> 00:03:37,140 So that is the plan that we are going to adopt. 74 00:03:37,140 --> 00:03:40,522 And this notion here is called recoverability. 75 00:03:40,522 --> 00:03:41,980 And the general plan is going to be 76 00:03:41,980 --> 00:03:48,730 that if you have a module M1 which invokes another module M2 77 00:03:48,730 --> 00:03:56,080 and M2 were to fail then the idea is that M2 fails 78 00:03:56,080 --> 00:03:59,450 and then it recovers and you restart the module. 79 00:03:59,450 --> 00:04:03,800 And you want to make sure that M2 is left in a situation, 80 00:04:03,800 --> 00:04:06,460 once it recovers, where there is no partial state. 81 00:04:06,460 --> 00:04:09,982 And I will define that more precisely as we go along today. 82 00:04:09,982 --> 00:04:12,190 But the main idea is going to be to insure that there 83 00:04:12,190 --> 00:04:15,010 is no vestige of previous computations 84 00:04:15,010 --> 00:04:17,176 that are in the middle of being run. 85 00:04:17,176 --> 00:04:19,050 So the state of the system, when it recovers, 86 00:04:19,050 --> 00:04:21,149 is at a well-understood point so that M1 87 00:04:21,149 --> 00:04:22,960 can continue to use that. 88 00:04:22,960 --> 00:04:25,050 So there is no "partial" state where 89 00:04:25,050 --> 00:04:27,160 partial is in quotes here. 90 00:04:27,160 --> 00:04:29,960 And we will talk about what it means for something 91 00:04:29,960 --> 00:04:31,969 to be in a partial state. 92 00:04:31,969 --> 00:04:33,760 The idea is to prevent that from happening. 
93 00:04:39,040 --> 00:04:41,830 So we are going to do this by starting 94 00:04:41,830 --> 00:04:43,400 with an example, the same example 95 00:04:43,400 --> 00:04:49,410 that I mentioned the last time which was a transfer of money 96 00:04:49,410 --> 00:04:52,797 from one bank account to another. 97 00:04:52,797 --> 00:04:54,880 There is a "from" account, there is a "to" account 98 00:04:54,880 --> 00:05:00,010 and some dollar "amount". 99 00:05:00,010 --> 00:05:03,820 And you want to transfer money from "from" to "to" 100 00:05:03,820 --> 00:05:06,292 and it is whatever the "amount" is. 101 00:05:06,292 --> 00:05:07,750 And the problem here is, of course, 102 00:05:07,750 --> 00:05:11,770 that in the middle of the transfer this procedure might fail, 103 00:05:11,770 --> 00:05:14,170 the system might crash and you might 104 00:05:14,170 --> 00:05:18,440 be left in a situation where a part of this transfer 105 00:05:18,440 --> 00:05:19,890 has already run. 106 00:05:19,890 --> 00:05:22,210 To take a specific example, here is 107 00:05:22,210 --> 00:05:24,750 an example of what the transfer procedure might look like. 108 00:05:24,750 --> 00:05:28,120 It takes a "from" and a "to" and an "amount". 109 00:05:28,120 --> 00:05:30,930 And the first thing it does is to read. 110 00:05:30,930 --> 00:05:33,260 Assume that all of this data is stored on disk. 111 00:05:33,260 --> 00:05:37,380 It reads from the "from" account and then it reduces, 112 00:05:37,380 --> 00:05:42,290 it debits the amount from the "from" account and then writes back. 113 00:05:42,290 --> 00:05:45,189 And it does the same thing to the "to" account, except that it credits the amount. 114 00:05:45,189 --> 00:05:46,980 So in the end, if this procedure completely 115 00:05:46,980 --> 00:05:52,010 ran, then the "from" account would be reduced by "amount" and the "to" 116 00:05:52,010 --> 00:05:54,530 account would be increased by "amount". 
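The slide itself is not part of this transcript, so the following is only a sketch of what a transfer procedure along the lines just described might look like. The `disk` dictionary is a hypothetical in-memory stand-in for the on-disk account records, and `read` and `write` are assumed helper names, not the ones used in the lecture.

```python
# Hypothetical in-memory stand-in for account records stored on disk.
disk = {"savings": 1000, "checking": 500}

def read(account):
    return disk[account]

def write(account, balance):
    disk[account] = balance

def transfer(from_acct, to_acct, amount):
    balance = read(from_acct)      # read the "from" account
    balance = balance - amount     # debit the amount
    write(from_acct, balance)      # write it back
    balance = read(to_acct)        # read the "to" account
    balance = balance + amount     # credit the amount
    write(to_acct, balance)        # write it back

transfer("savings", "checking", 100)
```

If the procedure runs to completion, the "from" account ends up reduced by the amount and the "to" account increased by it. The interesting question is what state is left behind if it stops partway through.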
117 00:05:54,530 --> 00:05:57,070 Of course, the problem is you might have a failure 118 00:05:57,070 --> 00:05:58,930 anywhere in the middle. 119 00:05:58,930 --> 00:06:01,450 And, as a concrete example, if a crash 120 00:06:01,450 --> 00:06:04,680 were to happen after the first three lines shown above, 121 00:06:04,680 --> 00:06:07,230 if you owned this account you would not be very happy 122 00:06:07,230 --> 00:06:09,530 because you just lost some money from an account 123 00:06:09,530 --> 00:06:12,000 and nothing happened. 124 00:06:12,000 --> 00:06:16,500 No other account got money added to it, 125 00:06:16,500 --> 00:06:18,490 and this is the problem that we want to avoid. 126 00:06:18,490 --> 00:06:21,770 If you think about this for a moment, what you would like 127 00:06:21,770 --> 00:06:25,910 intuitively is that if a crash like this were to happen 128 00:06:25,910 --> 00:06:28,570 and the system were to recover and come back up, 129 00:06:28,570 --> 00:06:30,990 there are really only two states that the system should 130 00:06:30,990 --> 00:06:32,700 be in for the system to really be correct 131 00:06:32,700 --> 00:06:35,540 and to meet what your intuition might expect. 132 00:06:35,540 --> 00:06:38,060 Either this procedure must completely be finished, that is 133 00:06:38,060 --> 00:06:40,950 the state of the system must be the same as if this procedure 134 00:06:40,950 --> 00:06:45,200 completely ran and finished, or the state of the system 135 00:06:45,200 --> 00:06:49,110 must be such that the procedure never ran at all. 136 00:06:49,110 --> 00:06:52,070 It is not at all OK to let the state of the system 137 00:06:52,070 --> 00:06:56,930 be equal to whatever the state was, in this example, 138 00:06:56,930 --> 00:07:00,050 at the time the crash happened. 139 00:07:00,050 --> 00:07:02,470 What you want is a kind of all or nothing behavior. 
140 00:07:02,470 --> 00:07:09,619 And, of course, if the crash happened as I have shown here, 141 00:07:09,619 --> 00:07:12,160 there is no way for you to have prevented those lines of code 142 00:07:12,160 --> 00:07:13,450 from being run. 143 00:07:13,450 --> 00:07:15,820 Those lines of code ran and then the crash happened. 144 00:07:15,820 --> 00:07:18,310 So what you really need is a way by which you 145 00:07:18,310 --> 00:07:19,800 can back out of these changes. 146 00:07:19,800 --> 00:07:21,425 What the system needs is a way by which, 147 00:07:21,425 --> 00:07:25,350 when the system crashes and then recovers from the crash, 148 00:07:25,350 --> 00:07:27,650 during failure recovery the system 149 00:07:27,650 --> 00:07:32,860 can back out of whatever changes it has made. 150 00:07:32,860 --> 00:07:35,279 In other words, what we want is a concept 151 00:07:35,279 --> 00:07:36,195 called recoverability. 152 00:07:47,640 --> 00:07:49,660 So a more precise definition of recoverability 153 00:07:49,660 --> 00:07:53,760 is shown on this slide, and let me just read it out. 154 00:07:53,760 --> 00:07:55,760 A composite sequence of steps, which we are also 155 00:07:55,760 --> 00:07:58,350 going to use the word "action" for, an action is 156 00:07:58,350 --> 00:08:01,040 recoverable if, from the point of view of the module that 157 00:08:01,040 --> 00:08:04,820 invokes this action, this sequence either 158 00:08:04,820 --> 00:08:09,250 always completes or aborts. 159 00:08:09,250 --> 00:08:12,780 That is, if it fails, it backs out and aborts in a way 160 00:08:12,780 --> 00:08:15,040 such that it appears that the sequence had never 161 00:08:15,040 --> 00:08:17,312 started to begin with. 
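As a rough illustration of this definition, and not the mechanism the lecture goes on to build, the sketch below simulates a crash after the debit and compares a plain transfer with one that backs out. The `Crash` exception, the `crash_midway` flag and the in-memory snapshot are all assumptions made for the example. A snapshot held in memory would not survive a real crash, so this only shows what the invoker should observe.

```python
class Crash(Exception):
    """Simulated failure in the middle of an action."""

def transfer(disk, from_acct, to_acct, amount, crash_midway=False):
    balance = disk[from_acct]
    balance -= amount
    disk[from_acct] = balance
    if crash_midway:
        raise Crash("failed after debiting, before crediting")
    disk[to_acct] = disk[to_acct] + amount

def recoverable_transfer(disk, from_acct, to_acct, amount, crash_midway=False):
    before = dict(disk)          # remember the state before the action
    try:
        transfer(disk, from_acct, to_acct, amount, crash_midway)
        return True              # completed: all of the changes are in place
    except Crash:
        disk.clear()
        disk.update(before)      # abort: back out every change
        return False

# Plain transfer: the crash leaves partial state behind.
partial = {"savings": 1000, "checking": 500}
try:
    transfer(partial, "savings", "checking", 100, crash_midway=True)
except Crash:
    pass

# Recoverable transfer: the invoker sees "all" or "nothing".
backed_out = {"savings": 1000, "checking": 500}
completed = recoverable_transfer(
    backed_out, "savings", "checking", 100, crash_midway=True
)
```

After the simulated crash, `partial` has lost $100 that went nowhere, while `backed_out` looks as if the action had never started.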
162 00:08:17,312 --> 00:08:18,770 And, in particular, what this means 163 00:08:18,770 --> 00:08:21,180 is that if a failure were to happen in the middle 164 00:08:21,180 --> 00:08:23,410 when the system recovers, it better 165 00:08:23,410 --> 00:08:25,840 have a plan of backing out the changes. 166 00:08:25,840 --> 00:08:29,975 In other words, of aborting this action. 167 00:08:29,975 --> 00:08:31,600 The way you think about recoverability, 168 00:08:31,600 --> 00:08:35,909 the simple way to think about it is do it all or not at all. 169 00:08:39,370 --> 00:08:42,200 And our goal is to try to somehow come up with a way 170 00:08:42,200 --> 00:08:43,700 to achieve this goal. 171 00:08:47,739 --> 00:08:49,780 And before we get into a solution to this problem 172 00:08:49,780 --> 00:08:52,710 there are a few other concepts to discuss, 173 00:08:52,710 --> 00:08:55,560 and they will turn out to be very related to each other. 174 00:08:55,560 --> 00:08:57,870 And the second concept after recoverability 175 00:08:57,870 --> 00:08:59,860 that is very closely related to this idea 176 00:08:59,860 --> 00:09:02,280 has to do with concurrent actions. 177 00:09:07,650 --> 00:09:09,890 Imagine for a moment that you had the same transfer 178 00:09:09,890 --> 00:09:15,220 procedure as in this example but you had two transfers running 179 00:09:15,220 --> 00:09:18,500 at the same time and they happened 180 00:09:18,500 --> 00:09:25,440 to act on the same data items like that. 181 00:09:25,440 --> 00:09:29,420 Let's say that the first transfer moved from a savings 182 00:09:29,420 --> 00:09:32,140 account to a checking account, it moved $100. 183 00:09:32,140 --> 00:09:34,940 And the second one moved from savings to checking, 184 00:09:34,940 --> 00:09:36,820 it moved $200. 185 00:09:36,820 --> 00:09:40,870 And let's say at the beginning S was $1,000. 
186 00:09:40,870 --> 00:09:45,200 And, of course, as you recall from several lectures 187 00:09:45,200 --> 00:09:47,340 ago, when you have these interleaved sequences, 188 00:09:47,340 --> 00:09:51,470 these two threads running, the steps 189 00:09:51,470 --> 00:09:53,356 that these threads are made of might 190 00:09:53,356 --> 00:09:55,230 be interleaved in arbitrary order if you don't 191 00:09:55,230 --> 00:09:58,180 have a plan to isolate them. 192 00:09:58,180 --> 00:10:00,950 And, in particular, you might have many results that show up. 193 00:10:00,950 --> 00:10:02,420 And one result that might show up 194 00:10:02,420 --> 00:10:05,000 is both of these transfers running concurrently 195 00:10:05,000 --> 00:10:09,680 read $1,000 from the "from" account and then both of them 196 00:10:09,680 --> 00:10:12,300 debit by $100 and $200 respectively. 197 00:10:12,300 --> 00:10:16,837 So at the end of it you might be left with either $800 or $900 198 00:10:16,837 --> 00:10:18,670 in the account, when the right answer, 199 00:10:18,670 --> 00:10:20,841 intuitively, is that if you ran 200 00:10:20,841 --> 00:10:22,590 both these transfers you would like to see 201 00:10:22,590 --> 00:10:26,550 $700 left in that account. 202 00:10:26,550 --> 00:10:28,320 So what you intuitively want here 203 00:10:28,320 --> 00:10:31,200 is if this is the first action, A1, 204 00:10:31,200 --> 00:10:34,860 and this is the second action, A2, what you would like to see 205 00:10:34,860 --> 00:10:35,590 is a sequence-- 206 00:10:35,590 --> 00:10:37,250 You don't actually care what the order 207 00:10:37,250 --> 00:10:39,465 is between these two transfers. 208 00:10:39,465 --> 00:10:40,840 I mean you are transferring money 209 00:10:40,840 --> 00:10:43,420 from one account to another and you are doing two of these. 
210 00:10:43,420 --> 00:10:45,190 You do not actually care in this example, 211 00:10:45,190 --> 00:10:47,690 and it will turn out, in all the examples 212 00:10:47,690 --> 00:10:51,727 that we are going to be talking about with this notion, 213 00:10:51,727 --> 00:10:54,060 that you are not really going to care what the order is. 214 00:10:54,060 --> 00:10:56,540 Either order is perfectly fine, but the result 215 00:10:56,540 --> 00:11:05,210 should be equivalent to either A1 before A2 or A2 216 00:11:05,210 --> 00:11:06,690 before A1. 217 00:11:12,360 --> 00:11:14,890 And that is what we would like. 218 00:11:14,890 --> 00:11:19,230 And, of course, one naïve way to achieve this is to ensure 219 00:11:19,230 --> 00:11:21,080 that exactly one action runs at a time, 220 00:11:21,080 --> 00:11:22,930 it finishes and then the second one runs, 221 00:11:22,930 --> 00:11:25,200 but that is kind of going to be no fun for us to do. 222 00:11:25,200 --> 00:11:26,700 It is the simplest correct solution, 223 00:11:26,700 --> 00:11:29,300 but we are going to want to improve concurrency as we had 224 00:11:29,300 --> 00:11:31,310 wanted to several lectures ago. 225 00:11:31,310 --> 00:11:32,810 So we are going to come up with ways 226 00:11:32,810 --> 00:11:34,930 of getting higher performance than running one 227 00:11:34,930 --> 00:11:35,670 after the other. 228 00:11:35,670 --> 00:11:39,350 But the net effect must be as if you ran it in some serial order, 229 00:11:39,350 --> 00:11:41,480 in some sequential order of the actions. 230 00:11:41,480 --> 00:11:44,470 That is, the result of running concurrent actions has 231 00:11:44,470 --> 00:11:47,040 to be the same as some serial ordering 232 00:11:47,040 --> 00:11:48,600 of the individual actions. 233 00:11:51,340 --> 00:11:56,044 And this idea of A1 before A2 or A2 before A1 has a name. 234 00:11:56,044 --> 00:11:57,085 It is called "isolation". 
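The two-transfer example can be checked by brute force. The sketch below is an illustration under simplifying assumptions: each transfer is reduced to one read and one write of the savings balance, and `run` and `interleavings` are hypothetical helper names. It enumerates every interleaving of the two actions and compares the outcomes with the two serial orders.

```python
def run(schedule, start=1000):
    """Run one interleaving of read/write steps on the savings balance."""
    savings = start
    local = {}                       # each action's private copy of the balance
    for action, step, amount in schedule:
        if step == "read":
            local[action] = savings
        else:                        # "write": store the debited private copy
            savings = local[action] - amount
    return savings

a1 = [("A1", "read", 100), ("A1", "write", 100)]   # debit $100
a2 = [("A2", "read", 200), ("A2", "write", 200)]   # debit $200

def interleavings(x, y):
    """Yield every merge of x and y that keeps each action's own step order."""
    if not x or not y:
        yield x + y
        return
    for rest in interleavings(x[1:], y):
        yield [x[0]] + rest
    for rest in interleavings(x, y[1:]):
        yield [y[0]] + rest

serial_results = {run(a1 + a2), run(a2 + a1)}          # A1 before A2, A2 before A1
all_results = {run(s) for s in interleavings(a1, a2)}  # no isolation at all
```

Both serial orders leave $700, while the unconstrained interleavings can also leave $800 or $900. An isolated execution is one whose result falls inside `serial_results`.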
235 00:12:04,670 --> 00:12:07,049 And you should distinguish that in your mind 236 00:12:07,049 --> 00:12:08,215 clearly from recoverability. 237 00:12:12,070 --> 00:12:14,080 So a more precise definition of isolation 238 00:12:14,080 --> 00:12:16,820 is essentially what I said before. 239 00:12:16,820 --> 00:12:21,930 A composite sequence of steps is isolated 240 00:12:21,930 --> 00:12:24,720 if its effect from the point of view of its invoker 241 00:12:24,720 --> 00:12:26,760 is the same as if the action occurred either 242 00:12:26,760 --> 00:12:30,140 completely before or completely after every other isolated 243 00:12:30,140 --> 00:12:30,640 action. 244 00:12:32,879 --> 00:12:34,420 And the simple way to understand this 245 00:12:34,420 --> 00:12:36,924 is you either do it all before or do it all after. 246 00:12:36,924 --> 00:12:39,590 That is, the net effect has to be the same as doing it all before 247 00:12:39,590 --> 00:12:40,641 or doing it all after. 248 00:12:40,641 --> 00:12:42,640 And it is different from recoverability which is 249 00:12:42,640 --> 00:12:44,480 really do it all or not at all. 250 00:12:50,650 --> 00:12:54,780 Now, when you have a system that satisfies 251 00:12:54,780 --> 00:12:57,190 both recoverability and isolation-- 252 00:12:57,190 --> 00:12:59,220 The way to understand this is both of these 253 00:12:59,220 --> 00:13:00,270 really, although they are talking 254 00:13:00,270 --> 00:13:02,561 about different concepts, this is saying all or nothing 255 00:13:02,561 --> 00:13:05,470 and this is saying all before or all after, both of these 256 00:13:05,470 --> 00:13:08,010 are getting at the same intuitive idea which 257 00:13:08,010 --> 00:13:12,010 is that somehow there is a sequence of steps, for example, 258 00:13:12,010 --> 00:13:14,620 in this transfer procedure there will be sequences of steps. 
259 00:13:14,620 --> 00:13:18,580 And somehow you want to make it look as if, for each action, 260 00:13:18,580 --> 00:13:21,880 the sequence of steps is not visible to somebody invoking 261 00:13:21,880 --> 00:13:25,240 the action. For recoverability, you do not want the person invoking 262 00:13:25,240 --> 00:13:26,600 this action, the invoker, 263 00:13:26,600 --> 00:13:28,120 to know that it is built out 264 00:13:28,120 --> 00:13:29,190 of a sequence of steps. 265 00:13:29,190 --> 00:13:30,490 And if a failure happens in the middle, 266 00:13:30,490 --> 00:13:32,240 you do not want the invoker of that action 267 00:13:32,240 --> 00:13:33,897 to see some partial state. 268 00:13:33,897 --> 00:13:35,980 Likewise, when you have concurrent actions running 269 00:13:35,980 --> 00:13:38,320 together, you do not want the different invokers 270 00:13:38,320 --> 00:13:41,560 of that action to somehow see this muddled result 271 00:13:41,560 --> 00:13:42,489 of the interleaving. 272 00:13:42,489 --> 00:13:44,030 You want them to see only the results 273 00:13:44,030 --> 00:13:47,766 of running these actions one after the other. 274 00:13:47,766 --> 00:13:49,140 What you are really trying to achieve 275 00:13:49,140 --> 00:13:50,848 for both of these concepts, although they 276 00:13:50,848 --> 00:13:53,600 are distinct concepts, is to hide the fact 277 00:13:53,600 --> 00:13:56,640 that this action is a composite sequence of steps. 278 00:13:56,640 --> 00:13:59,210 You want to make it look as if it is quite [UNINTELLIGIBLE]. 279 00:13:59,210 --> 00:14:00,830 And this idea of wanting something to look 280 00:14:00,830 --> 00:14:02,455 [UNINTELLIGIBLE] is called "atomicity". 281 00:14:08,340 --> 00:14:13,125 And we are going to be basically hiding the fact 282 00:14:13,125 --> 00:14:14,000 that it is composite. 
283 00:14:23,370 --> 00:14:27,376 So more precisely for this course, 284 00:14:27,376 --> 00:14:29,250 we are going to use the word "atomic" to mean 285 00:14:29,250 --> 00:14:31,979 recoverable and isolated. 286 00:14:31,979 --> 00:14:33,520 And I am going to say for this course 287 00:14:33,520 --> 00:14:38,170 because these terms have been used in various different ways 288 00:14:38,170 --> 00:14:41,150 for at least probably more than 30 years 289 00:14:41,150 --> 00:14:45,472 and I think it is about time we made these precise. 290 00:14:45,472 --> 00:14:47,430 In the literature, you will see the word atomic 291 00:14:47,430 --> 00:14:50,390 to often mean recoverable. 292 00:14:50,390 --> 00:14:52,230 And sometimes, and this is unfortunate, 293 00:14:52,230 --> 00:14:55,750 you will see the word consistent to mean isolated. 294 00:14:55,750 --> 00:14:58,540 And, in particular, you will run into this confusion 295 00:14:58,540 --> 00:15:04,380 when you read the paper for recitation on Thursday, 296 00:15:04,380 --> 00:15:06,530 the System R paper. 297 00:15:06,530 --> 00:15:08,884 The problem is those terms used historically 298 00:15:08,884 --> 00:15:10,550 have not been used in a very precise way 299 00:15:10,550 --> 00:15:12,280 so we will define it precisely. 300 00:15:12,280 --> 00:15:14,800 When we say something is atomic, in general 301 00:15:14,800 --> 00:15:17,275 we mean both recoverable and isolated. 302 00:15:17,275 --> 00:15:18,650 When we mean only one of them, we 303 00:15:18,650 --> 00:15:20,820 will say atomic with respect to recoverability 304 00:15:20,820 --> 00:15:24,180 or recoverable, atomic with respect to isolation 305 00:15:24,180 --> 00:15:25,950 or isolated. 306 00:15:25,950 --> 00:15:30,220 And, like I said, atomic means recoverable and isolated. 307 00:15:30,220 --> 00:15:31,720 The general plan is to hide the fact 308 00:15:31,720 --> 00:15:34,180 that an action is built out of composite sequence of steps. 
309 00:15:43,026 --> 00:15:44,900 Now, to add to this confusion of terminology, 310 00:15:44,900 --> 00:15:48,390 there are actually two other terms or two other properties 311 00:15:48,390 --> 00:15:52,290 that you often want from actions in addition 312 00:15:52,290 --> 00:15:54,040 to recoverability and isolation. 313 00:15:56,740 --> 00:15:59,140 And these two other properties are 314 00:15:59,140 --> 00:16:01,550 provided by many database systems 315 00:16:01,550 --> 00:16:05,740 which are one of the most common users of these concepts. 316 00:16:05,740 --> 00:16:09,100 The most common system that provides atomicity, one example 317 00:16:09,100 --> 00:16:10,010 is a database system. 318 00:16:10,010 --> 00:16:12,650 Now, many, many systems provide atomicity. 319 00:16:12,650 --> 00:16:15,660 For example, every computer does it in its instruction set. 320 00:16:15,660 --> 00:16:17,200 You often want your instructions, 321 00:16:17,200 --> 00:16:19,574 from the point of view of the invoker of the instruction, 322 00:16:19,574 --> 00:16:20,731 to be atomic. 323 00:16:20,731 --> 00:16:22,480 So we are going to be designing techniques 324 00:16:22,480 --> 00:16:25,282 that, in general, operate across the whole range of systems. 325 00:16:25,282 --> 00:16:27,240 But database systems are of particular interest 326 00:16:27,240 --> 00:16:30,680 because they are very common and they exercise these concepts 327 00:16:30,680 --> 00:16:32,550 to a high degree. 328 00:16:32,550 --> 00:16:35,689 And two other concepts that many systems provide, 329 00:16:35,689 --> 00:16:36,980 the first one is "consistency". 330 00:16:40,307 --> 00:16:42,890 And it is unfortunate that the word consistency was previously 331 00:16:42,890 --> 00:16:45,187 used, to some extent, to mean isolated. 332 00:16:45,187 --> 00:16:47,270 So it is important not to get into that confusion. 
333 00:16:47,270 --> 00:16:52,484 In some old papers when you see consistency, 334 00:16:52,484 --> 00:16:54,150 you should realize that what they are really 335 00:16:54,150 --> 00:16:58,640 talking about is isolation, A1 before A2 or A2 before A1. 336 00:16:58,640 --> 00:17:00,710 But what we will mean by consistency, and we 337 00:17:00,710 --> 00:17:03,330 will get into this next week, is that there 338 00:17:03,330 --> 00:17:07,450 is some invariant of the application that is using 339 00:17:07,450 --> 00:17:10,500 atomicity, and that invariant is maintained. 340 00:17:10,500 --> 00:17:12,880 For example, in a banking application, 341 00:17:12,880 --> 00:17:14,599 if you take the transfer examples, 342 00:17:14,599 --> 00:17:16,470 isolated means that you want the result 343 00:17:16,470 --> 00:17:20,010 to be as if the transfers ran in some serial order. 344 00:17:20,010 --> 00:17:22,069 Consistent means that there might be a high-level 345 00:17:22,069 --> 00:17:26,359 notion that the designer of this banking application 346 00:17:26,359 --> 00:17:29,350 might have wanted; for example, a bank might have a rule that 347 00:17:29,350 --> 00:17:34,050 says that at the end of each day every checking account should 348 00:17:34,050 --> 00:17:37,270 have an amount that is at least 10% 349 00:17:37,270 --> 00:17:40,060 of the corresponding savings account. 350 00:17:40,060 --> 00:17:43,990 Now, during the middle of the day 351 00:17:43,990 --> 00:17:46,100 there might be individual actions that 352 00:17:46,100 --> 00:17:49,330 transiently violate that rule. 353 00:17:49,330 --> 00:17:51,790 But, at various points, the designer 354 00:17:51,790 --> 00:17:56,112 might wish to ensure that the rule holds: the checking 355 00:17:56,112 --> 00:17:58,320 account must have at least a certain amount of money, 356 00:17:58,320 --> 00:18:00,540 some fraction of the savings account. 
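The checking-versus-savings rule can be written down as a predicate over the data. This is only a sketch of the idea: the account layout, the name `is_consistent` and the sample balances are assumptions for illustration, not how any particular database expresses such rules.

```python
def is_consistent(accounts):
    """Invariant: each checking balance is at least 10% of the matching savings."""
    return all(
        10 * acct["checking"] >= acct["savings"]
        for acct in accounts.values()
    )

accounts = {"alice": {"savings": 1000, "checking": 150}}

# An action in the middle of the day may violate the rule for a while...
accounts["alice"]["checking"] -= 100
mid_day_ok = is_consistent(accounts)      # checking is 50, below the 100 required

# ...as long as it is restored by the point where the rule is checked.
accounts["alice"]["checking"] += 80
end_of_day_ok = is_consistent(accounts)   # checking is 130, at least 100
```

Note that this check says nothing about the order in which actions ran, which is why consistency in this sense is a separate property from isolation.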
357 00:18:00,540 --> 00:18:04,280 Or in some payroll application for a company, 358 00:18:04,280 --> 00:18:07,670 they are modifying the payroll and giving raises 359 00:18:07,670 --> 00:18:09,115 to various people, but they might 360 00:18:09,115 --> 00:18:10,990 have a rule that says you could give whatever 361 00:18:10,990 --> 00:18:14,390 raise you want but every manager must make at least 5% more 362 00:18:14,390 --> 00:18:17,217 than all of his or her direct reports. 363 00:18:17,217 --> 00:18:18,550 You might have a rule like that. 364 00:18:18,550 --> 00:18:20,654 All of these are applications of an invariant that 365 00:18:20,654 --> 00:18:22,570 correspond to the consistency of the data that 366 00:18:22,570 --> 00:18:25,810 is being maintained in this example in a database. 367 00:18:25,810 --> 00:18:30,160 And you can use database systems to provide these consistency 368 00:18:30,160 --> 00:18:30,700 rules. 369 00:18:30,700 --> 00:18:32,290 But that is different from isolation. 370 00:18:32,290 --> 00:18:36,350 Isolation just says that there has 371 00:18:36,350 --> 00:18:41,510 to be some equivalent serial ordering in which things run. 372 00:18:41,510 --> 00:18:46,082 And the fourth property after recoverability, isolation 373 00:18:46,082 --> 00:18:47,415 and consistency is "durability". 374 00:18:52,350 --> 00:18:54,110 Durability basically says that the data 375 00:18:54,110 --> 00:18:56,650 should last for as long as-- 376 00:18:56,650 --> 00:18:59,950 It's an application-specific concept, but what it says 377 00:18:59,950 --> 00:19:02,780 is the data must last for as long 378 00:19:02,780 --> 00:19:05,460 as some pre-defined duration. 379 00:19:05,460 --> 00:19:07,460 For example, you might store data in a database. 380 00:19:07,460 --> 00:19:09,290 And, in many databases, you really 381 00:19:09,290 --> 00:19:11,170 want it to last "forever". 
382 00:19:11,170 --> 00:19:14,370 But in reality it is very hard to make things last forever 383 00:19:14,370 --> 00:19:17,800 so you might define that the data in this database 384 00:19:17,800 --> 00:19:21,000 must last for three years, and you work hard to preserve that. 385 00:19:21,000 --> 00:19:22,930 Or you might have an application that as long 386 00:19:22,930 --> 00:19:25,120 as the thread is running you want the data to last, 387 00:19:25,120 --> 00:19:27,294 but after the thread is terminated 388 00:19:27,294 --> 00:19:28,960 you do not actually care about the data. 389 00:19:28,960 --> 00:19:31,070 And that is a different notion of durability. 390 00:19:31,070 --> 00:19:33,860 But both of these have talked about the lifetime with which 391 00:19:33,860 --> 00:19:37,450 you want to preserve data. 392 00:19:37,450 --> 00:19:40,030 Now, when you have a system that provides recoverability 393 00:19:40,030 --> 00:19:43,880 and isolation, that is atomicity, consistency 394 00:19:43,880 --> 00:19:45,620 and durability, then we are going 395 00:19:45,620 --> 00:19:52,150 to call that a transaction. 396 00:19:52,150 --> 00:19:55,650 A set of actions, each of which is recoverable, 397 00:19:55,650 --> 00:19:58,090 that are isolated from each other, that 398 00:19:58,090 --> 00:20:00,510 has a notion of consistency and can achieve it 399 00:20:00,510 --> 00:20:04,620 and where the data has durability, those actions 400 00:20:04,620 --> 00:20:07,130 are called transactions. 401 00:20:07,130 --> 00:20:09,957 And many database systems work hard to provide transactions, 402 00:20:09,957 --> 00:20:11,915 which means they provide all of these features. 403 00:20:14,570 --> 00:20:16,260 But it is certainly possible, and we 404 00:20:16,260 --> 00:20:19,020 will look at many examples where you can just 405 00:20:19,020 --> 00:20:22,920 design systems that have just recoverability and isolation. 
406 00:20:22,920 --> 00:20:25,467 And we will not even worry about these other notions. 407 00:20:25,467 --> 00:20:26,800 That is what we will start with. 408 00:20:26,800 --> 00:20:28,510 We do not want to solve all of the problems at once. 409 00:20:28,510 --> 00:20:30,430 We will start with the easier set of problems 410 00:20:30,430 --> 00:20:31,513 and then build from there. 411 00:20:44,170 --> 00:20:46,540 Today, and on Wednesday, our plan 412 00:20:46,540 --> 00:20:49,000 is to come up with ways of achieving recoverability. 413 00:20:49,000 --> 00:20:50,833 So that is what we are going to start doing. 414 00:20:59,392 --> 00:21:00,850 The general approach for how we are 415 00:21:00,850 --> 00:21:02,600 going to achieve recoverability of modules 416 00:21:02,600 --> 00:21:04,570 is, and recall that the problem here 417 00:21:04,570 --> 00:21:08,322 is M2 fails and then M1 somehow discovers its failure 418 00:21:08,322 --> 00:21:10,030 and then when M2 restarts you do not want 419 00:21:10,030 --> 00:21:12,840 any partial state to be kept. 420 00:21:12,840 --> 00:21:17,449 The general plan is to design modules to be fail-fast. 421 00:21:17,449 --> 00:21:19,740 You need a way to discover that things are not working, 422 00:21:19,740 --> 00:21:21,930 and that is the scope of the kinds of systems we 423 00:21:21,930 --> 00:21:24,870 are going to be dealing with. 424 00:21:24,870 --> 00:21:27,910 And then once the system's failure is detected 425 00:21:27,910 --> 00:21:30,280 and you restart the system or it recovers, 426 00:21:30,280 --> 00:21:34,160 you run some kind of a repair procedure. 
427 00:21:34,160 --> 00:21:36,550 This is in general you run some kind of repair procedure 428 00:21:36,550 --> 00:21:40,340 that allows that failed module to recover 429 00:21:40,340 --> 00:21:45,510 and then it restarts where restarts 430 00:21:45,510 --> 00:21:48,320 means it allows, M1 in this case, 431 00:21:48,320 --> 00:21:52,140 allows invokers to start running on that system, on that module. 432 00:22:00,250 --> 00:22:02,564 We are going to do this in three steps. 433 00:22:02,564 --> 00:22:03,980 The first thing we are going to do 434 00:22:03,980 --> 00:22:07,590 is to look at a very specific special case of this problem 435 00:22:07,590 --> 00:22:11,410 which is realize that all of these having 436 00:22:11,410 --> 00:22:13,880 to do with partial state occur because there 437 00:22:13,880 --> 00:22:16,330 is some state, once a module has crashed 438 00:22:16,330 --> 00:22:18,080 there is some state that it has remaining. 439 00:22:18,080 --> 00:22:20,860 So if it just recovered and started running again 440 00:22:20,860 --> 00:22:23,010 without doing something then that partial state 441 00:22:23,010 --> 00:22:28,090 is visible to the invoker of that module. 442 00:22:28,090 --> 00:22:30,970 Now, if the state were all a volatile state like in just 443 00:22:30,970 --> 00:22:33,080 RAM, for example, and a thread crashed, 444 00:22:33,080 --> 00:22:35,330 if it was in its virtual memory and the thread crashed 445 00:22:35,330 --> 00:22:36,700 and it recovered then you do not really 446 00:22:36,700 --> 00:22:38,800 have to worry about this because all of the state 447 00:22:38,800 --> 00:22:41,507 anywhere has gone away. 448 00:22:41,507 --> 00:22:43,090 Primarily, we were worried about state 449 00:22:43,090 --> 00:22:47,130 that lasts across failures. 450 00:22:47,130 --> 00:22:50,420 And an example of that is the state 451 00:22:50,420 --> 00:22:53,710 that is maintained on this, just as a concrete example. 
452 00:22:53,710 --> 00:22:59,170 We are going to start first by obtaining a recoverable sector. 453 00:23:02,990 --> 00:23:06,700 Basically coming up with the scheme that allows us to do 454 00:23:06,700 --> 00:23:08,670 reads and writes of a single sector 455 00:23:08,670 --> 00:23:10,440 of a disk in a recoverable way. 456 00:23:10,440 --> 00:23:13,090 So we are going to define two procedures, a recoverable "put" 457 00:23:13,090 --> 00:23:14,870 that allows you to put stuff, write stuff 458 00:23:14,870 --> 00:23:17,270 onto a single sector of a disk and the recoverable 459 00:23:17,270 --> 00:23:18,760 "get" that allows you to read stuff 460 00:23:18,760 --> 00:23:24,980 of a single sector of a disk in a way that is recoverable. 461 00:23:24,980 --> 00:23:27,110 And the hard problem here is going 462 00:23:27,110 --> 00:23:30,150 to be that as the system is crashing, 463 00:23:30,150 --> 00:23:32,590 for a variety of reasons, bad data might 464 00:23:32,590 --> 00:23:34,760 get written to a sector. 465 00:23:34,760 --> 00:23:38,840 If you just took a regular sector of your disk, 466 00:23:38,840 --> 00:23:41,460 let's say that the operating system 467 00:23:41,460 --> 00:23:43,460 is trying to write something into a disk sector, 468 00:23:43,460 --> 00:23:45,910 somebody turns off the power and random stuff 469 00:23:45,910 --> 00:23:48,909 might get written out onto the disk. 470 00:23:48,909 --> 00:23:50,450 And so when the system comes back up, 471 00:23:50,450 --> 00:23:53,270 the reader of that sector might get some garbage value, 472 00:23:53,270 --> 00:23:55,410 a result of some partial write. 473 00:23:55,410 --> 00:23:57,710 So that is what we are going to try to avoid. 474 00:23:57,710 --> 00:24:00,990 So we will do that first. 475 00:24:00,990 --> 00:24:04,010 And that is for next time, to complete 476 00:24:04,010 --> 00:24:05,670 the recoverability story. 
477 00:24:05,670 --> 00:24:10,550 We are going to use this solution as a building-block 478 00:24:10,550 --> 00:24:12,500 for a more general solution because it is not 479 00:24:12,500 --> 00:24:15,090 going to be enough for us to just be able to read and write 480 00:24:15,090 --> 00:24:16,620 single sectors in a recoverable way 481 00:24:16,620 --> 00:24:19,612 because how many applications use only one sector of a disk? 482 00:24:19,612 --> 00:24:21,320 What you would like to do is to make sure 483 00:24:21,320 --> 00:24:22,630 that you have a general solution that 484 00:24:22,630 --> 00:24:25,350 works across all of the data that is being written and read. 485 00:24:25,350 --> 00:24:27,780 We are going to use that to come up with two schemes. 486 00:24:27,780 --> 00:24:31,260 The first scheme uses an idea called a "version history". 487 00:24:31,260 --> 00:24:40,470 And a second scheme uses an idea called "logging" using logs. 488 00:24:40,470 --> 00:24:42,640 And both of these schemes will turn out 489 00:24:42,640 --> 00:24:44,830 to be very general and useful and work, 490 00:24:44,830 --> 00:24:46,290 but both of these schemes basically 491 00:24:46,290 --> 00:24:50,690 will use this technique as a bootstrapping technique. 492 00:24:50,690 --> 00:24:53,150 And so we need a solution here anyway because we 493 00:24:53,150 --> 00:24:55,970 are going to build on that to develop a more sophisticated 494 00:24:55,970 --> 00:24:58,570 solution for the general case. 495 00:24:58,570 --> 00:25:02,314 And so today we are going to start with a special case. 496 00:25:02,314 --> 00:25:03,730 A, because it is a building block, 497 00:25:03,730 --> 00:25:06,340 and, B, because it will turn out to show us 498 00:25:06,340 --> 00:25:09,000 a rule that we are going to religiously follow in coming 499 00:25:09,000 --> 00:25:14,734 up with systematic solutions to work in a more general case 500 00:25:14,734 --> 00:25:16,650 when you have more than one sector being read.
501 00:25:24,710 --> 00:25:27,120 So let's write out the assumptions in the model 502 00:25:27,120 --> 00:25:29,711 here for this solution. 503 00:25:29,711 --> 00:25:31,460 The first assumption we are going to make, 504 00:25:31,460 --> 00:25:33,160 since we are dealing with recoverability and not 505 00:25:33,160 --> 00:25:33,820 with isolation. 506 00:25:33,820 --> 00:25:36,426 We are going to deal with isolation next week. 507 00:25:36,426 --> 00:25:37,800 The first assumption we will make 508 00:25:37,800 --> 00:25:44,161 is that there is no concurrency, and we will come up 509 00:25:44,161 --> 00:25:45,660 with different solutions for dealing 510 00:25:45,660 --> 00:25:49,210 with people concurrently trying to write the same sector. 511 00:25:54,922 --> 00:25:56,630 And this is an assumption we will revisit 512 00:25:56,630 --> 00:25:58,130 in a couple of weeks to show you how 513 00:25:58,130 --> 00:26:00,470 to actually achieve this goal. 514 00:26:00,470 --> 00:26:05,400 But we will assume that there are no hardware failures, 515 00:26:05,400 --> 00:26:06,330 no hardware errors. 516 00:26:11,900 --> 00:26:14,320 For example, the appendix to Chapter 8, 517 00:26:14,320 --> 00:26:17,280 which we have assigned for reading later 518 00:26:17,280 --> 00:26:21,500 on in the semester, actually shows two methods, 519 00:26:21,500 --> 00:26:24,300 "careful put" and "careful get" that actually 520 00:26:24,300 --> 00:26:26,530 deal with a variety of hardware problems. 521 00:26:26,530 --> 00:26:30,820 For example, every disk sector has a "checksum" on it. 522 00:26:30,820 --> 00:26:33,810 If you wrote bad data and something happened 523 00:26:33,810 --> 00:26:36,940 in the middle of that write and then someone went back and read 524 00:26:36,940 --> 00:26:38,920 that sector, they would discover that it is bad 525 00:26:38,920 --> 00:26:41,880 because the checksum would not match.
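To make the checksum idea concrete, here is a minimal sketch of how a reader can tell that a sector write was cut short. The lecture does not say which checksum a disk uses; CRC-32 from Python's `zlib`, the 512-byte `SECTOR_SIZE`, and the helper names `encode_sector` and `decode_sector` are all assumptions made for illustration. Real disks do this in the drive itself.

```python
import zlib

SECTOR_SIZE = 512  # hypothetical payload size of one sector, in bytes


def encode_sector(data: bytes) -> bytes:
    """Pad data to a full sector and append a 4-byte CRC-32 of the payload."""
    if len(data) > SECTOR_SIZE:
        raise ValueError("data does not fit in one sector")
    payload = data.ljust(SECTOR_SIZE, b"\x00")
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def decode_sector(raw: bytes):
    """Return ("OK", payload) if the stored checksum matches, else ("NOT_OK", None)."""
    if len(raw) != SECTOR_SIZE + 4:
        return "NOT_OK", None
    payload = raw[:SECTOR_SIZE]
    stored = int.from_bytes(raw[SECTOR_SIZE:], "big")
    if zlib.crc32(payload) != stored:
        return "NOT_OK", None
    return "OK", payload


good = encode_sector(b"balance=100")

# Simulate a write cut short by a power failure: the first 9 bytes come from
# the new write, the rest of the sector (including the checksum) is still old.
torn = encode_sector(b"balance=250")[:9] + good[9:]

print(decode_sector(good)[0])  # status of the intact sector
print(decode_sector(torn)[0])  # status of the torn sector
```

The torn sector is reported as not-OK instead of being returned as if it were valid data, which is the behavior the rest of the lecture relies on when it talks about a careful get returning a status.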
526 00:26:41,880 --> 00:26:44,030 Now, the appendix to this chapter, 9B, 527 00:26:44,030 --> 00:26:46,210 has a more careful description of how 528 00:26:46,210 --> 00:26:48,020 you deal with a variety of errors 529 00:26:48,020 --> 00:26:50,500 so that you can achieve this careful put and careful get 530 00:26:50,500 --> 00:26:53,749 of a disk sector. 531 00:26:53,749 --> 00:26:55,790 Assume for now that there are no hardware errors, 532 00:26:55,790 --> 00:26:57,987 there is no decay of data on the disk and so on. 533 00:26:57,987 --> 00:27:00,070 It will turn out the problem is still interesting, 534 00:27:00,070 --> 00:27:03,740 that it is not easy to achieve a recoverable put 535 00:27:03,740 --> 00:27:06,360 and get even though the hardware is fine. 536 00:27:06,360 --> 00:27:08,310 And that is because there are software errors. 537 00:27:13,590 --> 00:27:16,600 And, in particular, the model here 538 00:27:16,600 --> 00:27:18,310 is that you have some application 539 00:27:18,310 --> 00:27:20,930 and then you have the operating system. 540 00:27:20,930 --> 00:27:22,950 And the operating system has a buffer 541 00:27:22,950 --> 00:27:28,730 here of data that it is waiting to write onto disk. 542 00:27:28,730 --> 00:27:35,210 Then you have a disk and that is a disk sector. 543 00:27:35,210 --> 00:27:37,550 The problem might be that as a failure occurs 544 00:27:37,550 --> 00:27:40,355 there is something that happens, an error or something that gets 545 00:27:40,355 --> 00:27:41,730 triggered in the operating system 546 00:27:41,730 --> 00:27:43,521 so the buffer gets corrupted and then there 547 00:27:43,521 --> 00:27:46,950 is some bad data that gets written out onto the sector. 548 00:27:46,950 --> 00:27:50,977 That is the kind of problem that we want to protect against. 
549 00:27:50,977 --> 00:27:52,560 The fact that your hardware is perfect 550 00:27:52,560 --> 00:27:54,060 does not actually solve this problem 551 00:27:54,060 --> 00:27:56,769 because this buffer itself has been corrupted or something 552 00:27:56,769 --> 00:27:58,310 happens during the process of writing 553 00:27:58,310 --> 00:28:01,877 this buffer to the sector so the data itself is bad, 554 00:28:01,877 --> 00:28:03,710 and that is what we want to protect against. 555 00:28:09,180 --> 00:28:24,559 We are going to build on something that I have already 556 00:28:24,559 --> 00:28:25,100 talked about. 557 00:28:25,100 --> 00:28:31,350 We are going to build on two procedures, careful put that 558 00:28:31,350 --> 00:28:35,160 puts to a sector, it puts some data, 559 00:28:35,160 --> 00:28:40,760 and the corresponding careful get which reads from a sector 560 00:28:40,760 --> 00:28:42,710 and returns the data that is on that sector. 561 00:28:42,710 --> 00:28:44,190 And the assumption is that with careful 562 00:28:44,190 --> 00:28:46,377 put and get, once you give it some data there 563 00:28:46,377 --> 00:28:48,710 are no hardware failures for you to worry about anymore. 564 00:28:54,489 --> 00:28:56,530 The solution we are going to take to this problem 565 00:28:56,530 --> 00:28:59,570 is to realize that when a failure happens, 566 00:28:59,570 --> 00:29:01,880 for example, somebody turns off the power switch 567 00:29:01,880 --> 00:29:06,250 and this buffer gets corrupted, when the operating system does 568 00:29:06,250 --> 00:29:07,832 a write to that sector, the sector 569 00:29:07,832 --> 00:29:09,790 might be left in a state that does not actually 570 00:29:09,790 --> 00:29:11,456 correspond to the data that was intended 571 00:29:11,456 --> 00:29:13,570 to be put onto that sector.
572 00:29:13,570 --> 00:29:16,350 And so when the system recovers you are sort of stuck 573 00:29:16,350 --> 00:29:21,210 because this data in the sector contains some values in it that 574 00:29:21,210 --> 00:29:25,920 do not actually correspond to any actual intended put 575 00:29:25,920 --> 00:29:29,320 of the data, any intended write of the data. 576 00:29:29,320 --> 00:29:34,360 What this suggests is that a solution to this problem 577 00:29:34,360 --> 00:29:36,860 must involve a copy of some kind. 578 00:29:36,860 --> 00:29:40,340 You must make sure that if you have just one copy of the data 579 00:29:40,340 --> 00:29:42,890 and you write to it and something fails in the middle 580 00:29:42,890 --> 00:29:44,610 and you do not have a plan to back out 581 00:29:44,610 --> 00:29:49,279 to an earlier working version that was correct you are stuck. 582 00:29:49,279 --> 00:29:51,320 That suggests that we better have a solution that 583 00:29:51,320 --> 00:29:54,770 involves a copy of data. 584 00:29:54,770 --> 00:29:57,680 Later on we will see how to systematically [develop 585 00:29:57,680 --> 00:29:58,840 a rule?] based on this. 586 00:30:02,390 --> 00:30:04,520 The idea here is very simple. 587 00:30:04,520 --> 00:30:08,380 The way we are going to achieve a "recoverable get of a sector" 588 00:30:08,380 --> 00:30:13,480 is actually to build a single sector, a recoverable sector 589 00:30:13,480 --> 00:30:15,150 out of three sectors. 590 00:30:15,150 --> 00:30:19,410 The first sector here is going to have one copy of the data, 591 00:30:19,410 --> 00:30:22,306 the second sector is going to have another copy of the data 592 00:30:22,306 --> 00:30:24,180 and we are going to have a third sector which 593 00:30:24,180 --> 00:30:28,090 is going to act as a flag that allows us to choose one 594 00:30:28,090 --> 00:30:29,840 version or the other version. 
595 00:30:29,840 --> 00:30:34,430 Let me call this D0, let me call this D1 596 00:30:34,430 --> 00:30:38,620 and let me call this the "chooser". 597 00:30:42,900 --> 00:30:49,460 Assume that at some point in time D0 has proper data on it. 598 00:30:49,460 --> 00:30:52,140 The idea now is going to be that anybody reading it, 599 00:30:52,140 --> 00:30:55,180 the chooser is going to contain the value zero in it. 600 00:30:55,180 --> 00:30:58,810 Now, anybody reading is going to read from D0. 601 00:30:58,810 --> 00:31:03,150 Anybody writing in recoverable put 602 00:31:03,150 --> 00:31:05,260 is not allowed to write to D0 because that is 603 00:31:05,260 --> 00:31:06,990 what people are reading from. 604 00:31:06,990 --> 00:31:08,300 Instead, they will write to D1. 605 00:31:08,300 --> 00:31:09,800 When the chooser value is zero, they 606 00:31:09,800 --> 00:31:12,140 will start writing into D1. 607 00:31:12,140 --> 00:31:16,170 The plan is going to be that if that write succeeds properly 608 00:31:16,170 --> 00:31:18,300 then what we will do is go ahead and change 609 00:31:18,300 --> 00:31:22,390 the chooser from zero to a one, and then people 610 00:31:22,390 --> 00:31:24,395 will start reading from one. 611 00:31:24,395 --> 00:31:26,270 But if that write were to fail in the middle, 612 00:31:26,270 --> 00:31:28,370 if the power fails or something like that, 613 00:31:28,370 --> 00:31:33,290 D1 will be left in sort of a weird intermediate state. 614 00:31:33,290 --> 00:31:35,010 But that is OK because nobody is really 615 00:31:35,010 --> 00:31:36,620 going to be reading from D1. 616 00:31:36,620 --> 00:31:40,030 They are all going to be reading from D0 because the chooser has 617 00:31:40,030 --> 00:31:42,859 not yet been changed. 
618 00:31:42,859 --> 00:31:44,650 The only other thing we have to worry about 619 00:31:44,650 --> 00:31:47,690 is now we are OK, as long as the failure happens, 620 00:31:47,690 --> 00:31:50,202 if the failure happens in the middle 621 00:31:50,202 --> 00:31:52,660 here somewhere where we are writing D1 we are OK because we 622 00:31:52,660 --> 00:31:54,900 have not touched the chooser. 623 00:31:54,900 --> 00:31:57,770 If the failure happens at the end of writing D1-- 624 00:31:57,770 --> 00:32:00,590 So we have written D1 and then we have not yet 625 00:32:00,590 --> 00:32:03,990 started writing the chooser and a failure happens here, 626 00:32:03,990 --> 00:32:06,407 we are still OK because everybody 627 00:32:06,407 --> 00:32:07,490 will be reading from zero. 628 00:32:07,490 --> 00:32:09,550 And that is not going to have garbage in it. 629 00:32:09,550 --> 00:32:11,650 It is not going to have the latest value in it. 630 00:32:11,650 --> 00:32:12,380 But that is OK. 631 00:32:12,380 --> 00:32:15,160 We never said that we should see the latest value 632 00:32:15,160 --> 00:32:16,510 for recoverability to hold. 633 00:32:16,510 --> 00:32:19,960 It is going to be OK for us to be reading from D0 634 00:32:19,960 --> 00:32:21,830 and continue to read from D0. 635 00:32:21,830 --> 00:32:23,910 And really the correctness of this 636 00:32:23,910 --> 00:32:26,340 boils down to understanding what will happen 637 00:32:26,340 --> 00:32:28,850 when a failure happens during the middle of writing 638 00:32:28,850 --> 00:32:30,357 this sector. 639 00:32:30,357 --> 00:32:32,190 You are starting to write the chooser sector 640 00:32:32,190 --> 00:32:34,400 and the system fails. 641 00:32:34,400 --> 00:32:36,790 And we do not have to worry about that because now we 642 00:32:36,790 --> 00:32:38,831 have written D1 completely and a failure happened 643 00:32:38,831 --> 00:32:40,700 in the middle of that. 
644 00:32:40,700 --> 00:32:42,286 To understand that, we will get back 645 00:32:42,286 --> 00:32:43,910 to understanding the correctness of it, 646 00:32:43,910 --> 00:32:45,910 but it helps to see what pseudo code looks like. 647 00:32:49,710 --> 00:32:53,340 So that is what put looks like. 648 00:32:53,340 --> 00:32:56,137 To do a put, you first read the chooser sector 649 00:32:56,137 --> 00:32:57,720 and then you put into the other place. 650 00:33:01,556 --> 00:33:02,930 This "which" here is the thing that 651 00:33:02,930 --> 00:33:05,100 tells you what the value of the chooser sector is. 652 00:33:05,100 --> 00:33:09,620 It tells you which of the two copies to write into. 653 00:33:09,620 --> 00:33:13,200 And then you do the careful put: if "which" is zero, 654 00:33:13,200 --> 00:33:16,480 you put it to one, if "which" is one, you put it to zero. 655 00:33:16,480 --> 00:33:19,590 After that you flip the "which" bit and then 656 00:33:19,590 --> 00:33:22,540 you do a put onto the chooser sector. 657 00:33:22,540 --> 00:33:24,820 The get is actually easier. 658 00:33:24,820 --> 00:33:27,840 You just look at what the value is of the chooser sector 659 00:33:27,840 --> 00:33:31,530 and then get it from the corresponding place. 660 00:33:31,530 --> 00:33:34,590 Now, there is a line here, the second line of this pseudo code 661 00:33:34,590 --> 00:33:36,230 which says status "not-OK". 662 00:33:36,230 --> 00:33:38,470 So status not-OK is the key thing. 663 00:33:38,470 --> 00:33:40,360 Status not-OK is what happens when 664 00:33:40,360 --> 00:33:43,500 a failure happens in the middle of writing the chooser sector. 665 00:33:43,500 --> 00:33:46,950 Let's say a failure happens in this pseudo code. 666 00:33:46,950 --> 00:33:49,830 I already explained why there is no problem if a failure happens 667 00:33:49,830 --> 00:33:52,210 before you get to the last line, before you 668 00:33:52,210 --> 00:33:55,290 get to the careful put of the chooser sector.
669 00:33:55,290 --> 00:33:58,680 Until that line is executed nobody sees the new data. 670 00:33:58,680 --> 00:34:00,162 Everybody doing a get is continuing 671 00:34:00,162 --> 00:34:02,120 to see the old data, not the new data that just 672 00:34:02,120 --> 00:34:04,770 got written with careful put. 673 00:34:04,770 --> 00:34:08,460 After this careful put executes and returns then 674 00:34:08,460 --> 00:34:10,880 everybody is going to see the new data because the chooser 675 00:34:10,880 --> 00:34:13,401 sector has been correctly changed. 676 00:34:13,401 --> 00:34:14,900 The only tricky part to worry about, 677 00:34:14,900 --> 00:34:18,010 we have reduced this problem of the slightly more general case 678 00:34:18,010 --> 00:34:20,340 of writing these sectors and switching between them 679 00:34:20,340 --> 00:34:22,409 to this specific problem of figuring out 680 00:34:22,409 --> 00:34:24,950 what happens if a failure occurs in the middle of the chooser 681 00:34:24,950 --> 00:34:26,179 sector's write. 682 00:34:26,179 --> 00:34:28,620 If a failure happens here, one of the common things 683 00:34:28,620 --> 00:34:32,639 that could happen is that this particular sector's checksum 684 00:34:32,639 --> 00:34:37,690 does not match the data that is written here. 685 00:34:37,690 --> 00:34:39,469 So when you do a get of that sector 686 00:34:39,469 --> 00:34:41,895 here, in the first line up there, when 687 00:34:41,895 --> 00:34:43,270 you do a careful get of that, you 688 00:34:43,270 --> 00:34:45,429 will find that the checksum does not 689 00:34:45,429 --> 00:34:47,860 match so it returns a status of not-OK. 690 00:34:47,860 --> 00:34:49,400 If the status is not OK, you will 691 00:34:49,400 --> 00:34:52,920 have to figure out which of the two copies to use. 692 00:34:52,920 --> 00:34:54,949 Now, you can pick either, 693 00:34:54,949 --> 00:35:00,430 and you can arbitrarily pick to read the data from sector zero.
694 00:35:00,430 --> 00:35:02,937 But you could pick either of these. 695 00:35:02,937 --> 00:35:04,520 And the reason it is OK to pick either 696 00:35:04,520 --> 00:35:07,590 is that you know for sure that the failure must have happened here 697 00:35:07,590 --> 00:35:11,160 while writing this chooser sector. 698 00:35:11,160 --> 00:35:13,730 And because there are no concurrent threads going on, 699 00:35:13,730 --> 00:35:16,510 you are assured that there is no failure that happened here 700 00:35:16,510 --> 00:35:20,260 while writing D0, nor was there any failure 701 00:35:20,260 --> 00:35:24,672 that occurred here while writing D1 702 00:35:24,672 --> 00:35:26,130 because the assumption we have made 703 00:35:26,130 --> 00:35:28,500 is that there is no concurrency. 704 00:35:28,500 --> 00:35:30,580 If a system crashes and recovers and discovers 705 00:35:30,580 --> 00:35:34,320 that there is a failure, that the careful-get of the chooser 706 00:35:34,320 --> 00:35:36,450 sector did not quite work out, did not 707 00:35:36,450 --> 00:35:38,260 give you a status of OK, that it was not 708 00:35:38,260 --> 00:35:40,869 OK, then you know the failure happened while writing here. 709 00:35:40,869 --> 00:35:42,910 And what that means is it is perfectly OK for you 710 00:35:42,910 --> 00:35:44,430 to read from either version. 711 00:35:44,430 --> 00:35:48,400 Both of those correspond to a write 712 00:35:48,400 --> 00:35:53,320 to that individual sector that did not fail in the middle. 713 00:35:53,320 --> 00:35:56,520 And it does not matter which of the two you pick. 714 00:35:56,520 --> 00:35:59,890 That is the reason why this approach basically works. 715 00:36:02,580 --> 00:36:04,190 And if you look at this solution, 716 00:36:04,190 --> 00:36:07,010 this copy idea is actually a pretty critical idea 717 00:36:07,010 --> 00:36:10,515 for all of our solutions to achieving recoverability.
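The put and get procedures described above can be sketched as follows. This is a reconstruction from the spoken description, not the pseudo code shown in the lecture: the `Disk` class, the `TORN` marker standing in for a checksum mismatch, the `tear_next_write_to` crash switch, and the helper `read_chooser` are all hypothetical names introduced so the example can run on its own.

```python
class Crash(Exception):
    """Simulated power failure in the middle of a sector write."""


TORN = object()  # stands in for a sector whose checksum would not match


class Disk:
    """Toy disk with three named sectors: D0, D1 and the chooser."""

    def __init__(self, initial):
        self.sectors = {"D0": initial, "D1": initial, "chooser": 0}
        self.tear_next_write_to = None  # name of the sector to tear, or None

    def careful_put(self, sector, data):
        if self.tear_next_write_to == sector:
            self.tear_next_write_to = None
            self.sectors[sector] = TORN  # partial write left on the disk
            raise Crash(sector)
        self.sectors[sector] = data

    def careful_get(self, sector):
        data = self.sectors[sector]
        return ("NOT_OK", None) if data is TORN else ("OK", data)


def read_chooser(disk):
    status, which = disk.careful_get("chooser")
    # A torn chooser means the crash hit the chooser write itself, so both
    # D0 and D1 hold complete writes. Picking 0 is arbitrary but safe.
    return which if status == "OK" else 0


def recoverable_put(disk, data):
    which = read_chooser(disk)
    other = 1 - which
    disk.careful_put(f"D{other}", data)  # never write the copy readers use
    disk.careful_put("chooser", other)   # the switch readers depend on


def recoverable_get(disk):
    which = read_chooser(disk)
    _status, data = disk.careful_get(f"D{which}")
    return data


def outcome_after_crash(torn_sector):
    """Run one put of "new" over "old", crashing while writing torn_sector."""
    disk = Disk("old")
    disk.tear_next_write_to = torn_sector
    try:
        recoverable_put(disk, "new")
    except Crash:
        pass  # the system restarts; the disk keeps whatever was written
    return recoverable_get(disk)


for where in ("D1", "chooser", None):
    print(where, "->", outcome_after_crash(where))
```

In every crash case the reader gets a value that some put wrote completely, never the torn sector. Whether that value is the old or the new one depends on where the crash happened, which is all that recoverability promises here.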
718 00:36:10,515 --> 00:36:11,890 And it is going to lead to a rule 719 00:36:11,890 --> 00:36:14,140 that we are going to call the "Golden Rule 720 00:36:14,140 --> 00:36:16,740 of Recoverability". 721 00:36:16,740 --> 00:36:19,000 The rule says never modify the only copy. 722 00:36:22,770 --> 00:36:26,240 If you were asked to come up with a way 723 00:36:26,240 --> 00:36:29,449 to achieve something that is recoverable, one guideline, 724 00:36:29,449 --> 00:36:31,490 this is unfortunately not a sufficient condition. 725 00:36:31,490 --> 00:36:35,611 But a necessary condition is that if you have something, 726 00:36:35,611 --> 00:36:38,110 and you only have one copy of that which you end up writing, 727 00:36:38,110 --> 00:36:40,520 then chances are that if a failure happens in the middle 728 00:36:40,520 --> 00:36:44,390 of writing that one copy you cannot back out of it 729 00:36:44,390 --> 00:36:45,895 so your scheme would not work. 730 00:36:48,890 --> 00:36:50,640 So never modify the only copy of anything, 731 00:36:50,640 --> 00:36:52,330 that is the general rule. 732 00:36:57,680 --> 00:37:00,640 Now, there is another point to observe about 733 00:37:00,640 --> 00:37:03,840 this recoverable disk write. 734 00:37:03,840 --> 00:37:07,970 And that has to do with that careful put line. 735 00:37:07,970 --> 00:37:10,820 Right before that line, everybody else 736 00:37:10,820 --> 00:37:13,340 reading this recoverable sector using recoverable 737 00:37:13,340 --> 00:37:16,180 get sees the old version of data. 738 00:37:16,180 --> 00:37:19,260 Right after that line has finished, everybody reading it 739 00:37:19,260 --> 00:37:21,550 sees the new data. 740 00:37:21,550 --> 00:37:23,310 That line is an example of something 741 00:37:23,310 --> 00:37:26,670 that we will repeatedly visit and use 742 00:37:26,670 --> 00:37:29,880 called a "commit point".
743 00:37:29,880 --> 00:37:31,580 The successful completion of that line 744 00:37:31,580 --> 00:37:35,410 ensures that everybody else following doing gets 745 00:37:35,410 --> 00:37:40,280 will see the data that was written by this put. 746 00:37:40,280 --> 00:37:43,200 And before that line is run, everybody else following 747 00:37:43,200 --> 00:37:46,880 will see the older version of the data. 748 00:37:46,880 --> 00:37:49,680 Now, if a failure occurs in the middle of that line 749 00:37:49,680 --> 00:37:53,930 then the answer depends on what the recovery procedure does. 750 00:37:53,930 --> 00:37:55,710 And one approach might be that the invoker 751 00:37:55,710 --> 00:37:58,810 of this module, the person who originally did the disk write-- 752 00:37:58,810 --> 00:38:01,100 If a failure happens in the middle of the write, 753 00:38:01,100 --> 00:38:03,920 one plan might be that the invoker of that disk write, 754 00:38:03,920 --> 00:38:10,474 upon recovery, tries the write again, tries the put again. 755 00:38:10,474 --> 00:38:11,890 And the way he tries the put is he 756 00:38:11,890 --> 00:38:14,550 first does a get and sees what answer returns. 757 00:38:14,550 --> 00:38:16,140 If the answer is the new answer then 758 00:38:16,140 --> 00:38:17,760 he says OK everything is fine. 759 00:38:17,760 --> 00:38:19,430 If the answer is the old answer then 760 00:38:19,430 --> 00:38:22,060 he says I am going to retry the put. 761 00:38:22,060 --> 00:38:23,580 And this is an example of something 762 00:38:23,580 --> 00:38:25,746 we saw the last time which is "temporal redundancy". 763 00:38:25,746 --> 00:38:26,801 You can retry things. 764 00:38:26,801 --> 00:38:28,300 Not only can you replicate in space, 765 00:38:28,300 --> 00:38:31,024 but you can retry things in time which is the idea here 766 00:38:31,024 --> 00:38:32,273 for achieving fault-tolerance.
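The invoker-side retry described above might look like the sketch below. `FlakyStore` is a hypothetical stand-in for a recoverable sector whose put either takes full effect or leaves the old value in place, and `put_with_retry` and `max_attempts` are names chosen for this example. It also assumes that repeating the same put is harmless.

```python
class Crash(Exception):
    """Simulated failure in the middle of a put."""


class FlakyStore:
    """Toy recoverable sector: a put takes full effect or none at all.

    The first `failures` puts crash before reaching their commit point.
    """

    def __init__(self, value, failures):
        self.value = value
        self.failures = failures

    def put(self, data):
        if self.failures > 0:
            self.failures -= 1
            raise Crash()      # crashed before commit: old value remains
        self.value = data      # commit point reached

    def get(self):
        return self.value


def put_with_retry(store, data, max_attempts=5):
    """After each attempt, read back and retry only if the old value is there."""
    for attempt in range(1, max_attempts + 1):
        try:
            store.put(data)
        except Crash:
            pass               # "restart", then inspect the state below
        if store.get() == data:
            return attempt     # new answer: everything is fine
        # old answer: the put did not commit, so go around and try again
    raise RuntimeError("put did not succeed")


store = FlakyStore("old", failures=2)
attempts = put_with_retry(store, "new")
print(attempts, store.get())
```

The check with get is what makes the retry safe: the invoker never has to know where in the put the failure happened, only which of the two possible outcomes it left behind.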
767 00:38:41,980 --> 00:38:43,990 An example of this idea called a commit point 768 00:38:43,990 --> 00:38:46,350 is that careful put line. 769 00:38:46,350 --> 00:38:47,940 And, in general, a commit point is 770 00:38:47,940 --> 00:38:49,970 a point in a recoverable action, in this case. 771 00:38:49,970 --> 00:38:51,595 And it will turn out to be an idea that 772 00:38:51,595 --> 00:38:55,150 is useful for isolated actions and for transactions more 773 00:38:55,150 --> 00:38:55,840 generally. 774 00:38:55,840 --> 00:38:59,490 But a commit point is a point where before the commit point 775 00:38:59,490 --> 00:39:01,675 other people do not see the results of your action. 776 00:39:01,675 --> 00:39:03,300 And after the commit point successfully 777 00:39:03,300 --> 00:39:05,845 finishes everybody sees the results of your action, 778 00:39:05,845 --> 00:39:07,720 and that is the definition of a commit point. 779 00:39:29,869 --> 00:39:32,410 Now we have to generalize this idea because what we have seen 780 00:39:32,410 --> 00:39:33,496 is a scheme. 781 00:39:33,496 --> 00:39:35,120 By the way, is this clear to everybody? 782 00:39:35,120 --> 00:39:40,540 Do you have any questions about recoverable put and get? 783 00:39:40,540 --> 00:39:43,070 What does that mean? 784 00:39:43,070 --> 00:39:45,270 No questions or not clear? 785 00:39:45,270 --> 00:39:49,470 All right. 786 00:39:49,470 --> 00:39:50,810 Good. 787 00:39:50,810 --> 00:39:55,300 Now we have to generalize this idea 788 00:39:55,300 --> 00:39:58,195 because the class of programs where you could just sort 789 00:39:58,195 --> 00:40:00,320 of read and write from one sector is quite limited. 790 00:40:04,060 --> 00:40:06,810 And so to generalize this idea of what we are going to do 791 00:40:06,810 --> 00:40:10,210 is to change the programming model 792 00:40:10,210 --> 00:40:12,974 for writing recoverable actions a little bit. 
793 00:40:12,974 --> 00:40:14,890 Ideally, what you would like to be able to do, 794 00:40:14,890 --> 00:40:16,514 the model we are going to try to get at 795 00:40:16,514 --> 00:40:18,930 is to be able to take a procedure 796 00:40:18,930 --> 00:40:23,720 and begin recoverable action in front of that procedure, 797 00:40:23,720 --> 00:40:25,500 write code for that procedure and just 798 00:40:25,500 --> 00:40:28,710 say end recoverable action and sort of magically end up 799 00:40:28,710 --> 00:40:33,110 with a model where the set of steps in that action 800 00:40:33,110 --> 00:40:34,427 becomes recoverable. 801 00:40:34,427 --> 00:40:36,510 And it will turn out we have come very, very close 802 00:40:36,510 --> 00:40:38,720 to achieving this very general model 803 00:40:38,720 --> 00:40:41,450 by making some slight assumptions, 804 00:40:41,450 --> 00:40:43,590 or requiring the programmer to make 805 00:40:43,590 --> 00:40:46,455 some small assumptions in the way they write their programs. 806 00:40:50,390 --> 00:40:53,440 And this generalization to more general actions 807 00:40:53,440 --> 00:40:55,750 that are recoverable, generalizing 808 00:40:55,750 --> 00:41:00,415 from a single sector uses this idea of a commit point. 809 00:41:00,415 --> 00:41:01,790 The way this is going to work out 810 00:41:01,790 --> 00:41:06,340 is the programmer, for any recoverable action, 811 00:41:06,340 --> 00:41:09,640 he or she is going to end up writing this special function 812 00:41:09,640 --> 00:41:12,620 call called begin recoverable action 813 00:41:12,620 --> 00:41:17,260 and then writing the code for that recoverable action. 814 00:41:17,260 --> 00:41:21,130 And then at some point in the middle of this code calling 815 00:41:21,130 --> 00:41:25,200 a function called "commit". 816 00:41:25,200 --> 00:41:27,340 And the idea is that until this commit 817 00:41:27,340 --> 00:41:30,480 is called nobody else sees the results of this action. 
818 00:41:30,480 --> 00:41:32,570 Which means that if a failure happened, 819 00:41:32,570 --> 00:41:39,020 upon crash recovery or once the system restarts, 820 00:41:39,020 --> 00:41:41,490 the result would be as if none of the steps of this action 821 00:41:41,490 --> 00:41:43,130 ever happened. 822 00:41:43,130 --> 00:41:44,420 So they call commit. 823 00:41:44,420 --> 00:41:47,550 And then once commit finishes then no matter what happens, 824 00:41:47,550 --> 00:41:49,820 a failure could happen and the system restarts, 825 00:41:49,820 --> 00:41:52,210 but once commit is called and it returns then you 826 00:41:52,210 --> 00:41:55,090 are guaranteed that all other actions see the state 827 00:41:55,090 --> 00:41:58,760 changes made by this action. 828 00:41:58,760 --> 00:42:00,540 So this is a special call. 829 00:42:00,540 --> 00:42:03,300 And then after commit they might have some other lines 830 00:42:03,300 --> 00:42:08,130 that they write and then they end the recoverable action. 831 00:42:08,130 --> 00:42:10,480 Now, in many, many cases, the very last thing 832 00:42:10,480 --> 00:42:13,660 that is done before the end recoverable action 833 00:42:13,660 --> 00:42:15,650 is the commit. 834 00:42:15,650 --> 00:42:18,380 But, in general, you might have other things here. 835 00:42:18,380 --> 00:42:20,910 And it will turn out that you cannot do arbitrary things 836 00:42:20,910 --> 00:42:22,030 here. 837 00:42:22,030 --> 00:42:25,690 For example, you cannot do disk writes that you want to make 838 00:42:25,690 --> 00:42:28,540 recoverable over here because the moment you do that, 839 00:42:28,540 --> 00:42:30,910 by definition, if a crash happens after a commit, 840 00:42:30,910 --> 00:42:33,170 we do not have a plan to back out of it.
841 00:42:33,170 --> 00:42:34,750 Because the semantics were that once 842 00:42:34,750 --> 00:42:36,270 a commit is done then no matter what 843 00:42:36,270 --> 00:42:37,686 happens the state of the system is 844 00:42:37,686 --> 00:42:41,690 as if all of the things in this action finished. 845 00:42:41,690 --> 00:42:43,920 The discipline is going to be, this thing is called 846 00:42:43,920 --> 00:42:48,730 the "pre-commit phase" and this thing here is called 847 00:42:48,730 --> 00:42:49,790 the "post-commit phase". 848 00:42:53,830 --> 00:42:56,660 And so the idea is that in the pre-commit phase 849 00:42:56,660 --> 00:42:59,240 you should always be prepared to back out. 850 00:42:59,240 --> 00:43:02,930 Because, by definition, if the failure occurs before commit 851 00:43:02,930 --> 00:43:06,120 is called the result is going to be as if nothing ever happened, 852 00:43:06,120 --> 00:43:08,280 which means that any change you make here 853 00:43:08,280 --> 00:43:10,510 you better religiously follow the "never 854 00:43:10,510 --> 00:43:15,310 modify the only copy" rule and be prepared to back out. 855 00:43:15,310 --> 00:43:19,990 In the post-commit phase, conversely, you 856 00:43:19,990 --> 00:43:22,074 don't have the option to back out 857 00:43:22,074 --> 00:43:23,990 so you better make sure that once you get here 858 00:43:23,990 --> 00:43:26,270 you just run to completion. 859 00:43:26,270 --> 00:43:29,380 If a failure occurs out here and you restart, 860 00:43:29,380 --> 00:43:32,259 you better make sure that you can run to completion. 861 00:43:32,259 --> 00:43:34,050 In fact, there are a few other restrictions 862 00:43:34,050 --> 00:43:36,990 out in the post-commit phase. 863 00:43:36,990 --> 00:43:38,580 Let me do this by an example.
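As a rough illustration of the pre-commit and post-commit discipline just described, here is a minimal Python sketch. It is not how the lecture implements recoverable actions (that is left for a later session). `Store`, `run_action`, and `CrashBeforeCommit` are hypothetical names made up for this sketch, the state lives in memory rather than on disk, and a raised exception stands in for a crash. The only point is that every pre-commit change goes to a copy, so backing out means dropping the copy, and the commit point is a single assignment.

```python
import copy


class CrashBeforeCommit(Exception):
    """Hypothetical stand-in for a failure during the pre-commit phase."""


class Store:
    """Toy in-memory store; `committed` is the only state other actions see."""

    def __init__(self, data):
        self.committed = dict(data)

    def run_action(self, pre_commit, post_commit=None):
        # Pre-commit phase: never modify the only copy, work on a shadow.
        shadow = copy.deepcopy(self.committed)
        try:
            pre_commit(shadow)
        except CrashBeforeCommit:
            # Backing out is trivial: the shadow is simply dropped.
            return False
        # Commit point: one reference assignment installs every change.
        self.committed = shadow
        # Post-commit phase: no backing out from here on.
        if post_commit is not None:
            post_commit()
        return True


def transfer(accounts):
    accounts["savings"] -= 30
    accounts["checking"] += 30


def crashing_transfer(accounts):
    accounts["savings"] -= 30
    raise CrashBeforeCommit()  # fails between the debit and the credit


store = Store({"savings": 100, "checking": 0})
store.run_action(crashing_transfer)  # committed state is left untouched
store.run_action(transfer)           # both changes become visible together
print(store.committed)
```

The failed attempt leaves no trace of its half-finished debit, which is the "as if nothing ever happened" behavior expected before the commit point.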
864 00:43:38,580 --> 00:43:41,300 In the pre-commit phase, because you have to be prepared to back 865 00:43:41,300 --> 00:43:43,740 out, it often means in practice that you cannot be sending 866 00:43:43,740 --> 00:43:46,320 messages out onto the network. 867 00:43:46,320 --> 00:43:48,320 You can maintain your local state 868 00:43:48,320 --> 00:43:51,680 but you have a way to back out of that. 869 00:43:51,680 --> 00:43:54,330 But if you are sending messages out onto the network and you 870 00:43:54,330 --> 00:43:58,120 do not have a bigger story to deal with it-- 871 00:43:58,120 --> 00:44:01,459 We will talk later about nesting atomic actions within one 872 00:44:01,459 --> 00:44:03,500 another or nesting recoverable actions within one 873 00:44:03,500 --> 00:44:05,060 another in a few lectures from now. 874 00:44:05,060 --> 00:44:07,560 But, in the simple model, if you do anything that you cannot 875 00:44:07,560 --> 00:44:10,150 back out of such as sending a network packet then you are 876 00:44:10,150 --> 00:44:10,820 stuck. 877 00:44:10,820 --> 00:44:15,100 So all of that stuff like printing out checks or firing 878 00:44:15,100 --> 00:44:17,900 a bullet or things like that, that you cannot back out 879 00:44:17,900 --> 00:44:20,390 of, you better put out here. 880 00:44:20,390 --> 00:44:22,530 All the things that you can back out of go here. 881 00:44:22,530 --> 00:44:25,026 Likewise, nothing you can back out of can go here. 882 00:44:25,026 --> 00:44:27,150 Because, once you reach here and a failure happens, 883 00:44:27,150 --> 00:44:28,952 you have to continue to completion. 884 00:44:28,952 --> 00:44:31,160 What that means is in the post-commit phase, really, 885 00:44:31,160 --> 00:44:32,840 you cannot do very many things. 886 00:44:32,840 --> 00:44:36,560 I mean you can do things that do not really have, for example, 887 00:44:36,560 --> 00:44:38,994 you can do things that are OK to keep doing.
888 00:44:38,994 --> 00:44:40,910 For example, you can do idempotent operations 889 00:44:40,910 --> 00:44:43,505 so that if a failure happens here and you recover then 890 00:44:43,505 --> 00:44:45,130 you know that you are out at this point 891 00:44:45,130 --> 00:44:47,630 so you could keep retrying those actions over and over again 892 00:44:47,630 --> 00:44:50,950 until you ensure that it completes. 893 00:44:50,950 --> 00:44:52,964 But those are the only rules. 894 00:44:52,964 --> 00:44:55,130 There is a pre-commit phase and a post-commit phase. 895 00:44:55,130 --> 00:44:58,077 There is a commit that is explicitly called. 896 00:44:58,077 --> 00:44:59,660 Now, in addition there is another call 897 00:44:59,660 --> 00:45:02,180 that a programmer can make or that the system can 898 00:45:02,180 --> 00:45:04,690 invoke automatically and that is called "abort". 899 00:45:08,652 --> 00:45:11,110 For example, when you are moving money from savings account 900 00:45:11,110 --> 00:45:13,800 to checking account in that transfer example, 901 00:45:13,800 --> 00:45:17,440 if you discover in the middle here that you do not 902 00:45:17,440 --> 00:45:19,250 have enough funds to cover that transfer, 903 00:45:19,250 --> 00:45:26,000 you could just decide to abort the recoverable action. 904 00:45:26,000 --> 00:45:29,020 And what that means is that abort automatically 905 00:45:29,020 --> 00:45:30,890 will ensure that the state of the system 906 00:45:30,890 --> 00:45:34,300 is at the point right before the start 907 00:45:34,300 --> 00:45:35,504 of the recoverable action. 908 00:45:35,504 --> 00:45:37,670 Whatever changes were made in the middle until abort 909 00:45:37,670 --> 00:45:41,152 was called end up backing out. 910 00:45:41,152 --> 00:45:43,110 Now, abort might also be invoked by the system.
911 00:45:43,110 --> 00:45:46,300 In a database, there is somebody booking airline tickets, car 912 00:45:46,300 --> 00:45:48,800 reservations and all of that, and you discover in the middle 913 00:45:48,800 --> 00:45:50,870 that you are not actually able to find 914 00:45:50,870 --> 00:45:53,640 a hotel for the same dates. 915 00:45:53,640 --> 00:45:56,610 So you might just abort the whole process, control C 916 00:45:56,610 --> 00:45:58,320 the thread you are running, which 917 00:45:58,320 --> 00:46:00,460 means that all of the work that has been done 918 00:46:00,460 --> 00:46:01,430 has to be backed out. 919 00:46:01,430 --> 00:46:03,388 And so the system would normally implement that 920 00:46:03,388 --> 00:46:06,260 by aborting all of the changes that you have made so far. 921 00:46:06,260 --> 00:46:09,710 It will back out of your car reservation, 922 00:46:09,710 --> 00:46:11,750 back out of your airline reservation and so on. 923 00:46:11,750 --> 00:46:14,140 So abort is called in a few different contexts. 924 00:46:14,140 --> 00:46:16,090 Sometimes by the program itself, sometimes 925 00:46:16,090 --> 00:46:18,180 by the system to free up resources, 926 00:46:18,180 --> 00:46:22,400 sometimes by the user of your, say, transaction system. 927 00:46:22,400 --> 00:46:38,160 I am not going to get into how we implement recoverable action 928 00:46:38,160 --> 00:46:40,080 today, but this programming model 929 00:46:40,080 --> 00:46:41,300 is important to understand. 930 00:46:41,300 --> 00:46:43,040 I do want to mention one thing going back 931 00:46:43,040 --> 00:46:45,840 to this idea of isolation that we talked about. 
932 00:46:45,840 --> 00:46:47,490 If you recall, isolation is this idea 933 00:46:47,490 --> 00:46:51,240 that you have two actions or multiple actions whose net 934 00:46:51,240 --> 00:46:53,440 effect is as if they ran in some sequential order, 935 00:46:53,440 --> 00:46:55,910 some serial order, A1 before A2 or A2 936 00:46:55,910 --> 00:47:01,220 before A1 for all permutations of A1, A2, A3, etc. 937 00:47:01,220 --> 00:47:03,950 Now, this idea is actually very closely related 938 00:47:03,950 --> 00:47:06,625 but not the same as stuff we have seen before. 939 00:47:06,625 --> 00:47:08,000 Earlier in the semester we looked 940 00:47:08,000 --> 00:47:11,834 at ways in which you have multiple threads that 941 00:47:11,834 --> 00:47:13,500 need to be synchronized with each other. 942 00:47:13,500 --> 00:47:16,540 And we actually did look at isolation as a concept 943 00:47:16,540 --> 00:47:19,330 then but we specifically focused on things 944 00:47:19,330 --> 00:47:20,950 like sequence coordination where you 945 00:47:20,950 --> 00:47:23,470 want to have one thread run before the other thread or one 946 00:47:23,470 --> 00:47:25,650 thread run after the other thread. 947 00:47:25,650 --> 00:47:29,810 For example, in a producer-consumer relationship. 948 00:47:29,810 --> 00:47:34,790 The point is that in one significant respect, 949 00:47:34,790 --> 00:47:37,850 achieving this idea of isolation for actions 950 00:47:37,850 --> 00:47:42,040 is harder than achieving sequence coordination. 951 00:47:42,040 --> 00:47:44,600 And the reason it is harder is that everybody 952 00:47:44,600 --> 00:47:47,640 who writes an isolated action, in general, does not know, 953 00:47:47,640 --> 00:47:49,200 any given isolated action does not 954 00:47:49,200 --> 00:47:52,120 know what other actions there are in the system.
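The serial-order definition of isolation given above can be made concrete with a small Python sketch. The two actions `a1` and `a2` and the starting value 5 are made up for illustration and do not come from the lecture. The sketch lists the final values reachable by running the actions in some serial order, then shows an interleaving in which both actions read the old value before either writes, so that the result matches no serial order.

```python
from itertools import permutations


def a1(x):
    return x + 10   # hypothetical action A1: add 10


def a2(x):
    return x * 2    # hypothetical action A2: double


def serial_outcomes(actions, initial):
    """Final values reachable by running the actions in some serial order."""
    outcomes = set()
    for order in permutations(actions):
        value = initial
        for action in order:
            value = action(value)
        outcomes.add(value)
    return outcomes


# A1 then A2 gives (5 + 10) * 2 = 30; A2 then A1 gives 5 * 2 + 10 = 20.
allowed = serial_outcomes([a1, a2], 5)

# A non-isolated interleaving: both actions read 5 before either writes.
read_by_a1 = 5
read_by_a2 = 5
shared = a1(read_by_a1)    # A1 writes 15
shared = a2(read_by_a2)    # A2 overwrites with 10, so A1's update is lost

print(sorted(allowed), shared, shared in allowed)
```

Either serial result is acceptable under isolation. The interleaved result is not, because it equals neither of them.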
955 00:47:52,120 --> 00:47:55,000 So you might have 25 different actions all of which 956 00:47:55,000 --> 00:47:57,956 are touching the same data, but no single action 957 00:47:57,956 --> 00:47:59,580 is aware of all of these other actions. 958 00:47:59,580 --> 00:48:02,490 That is very different from sequence coordination. 959 00:48:02,490 --> 00:48:04,800 In sequence coordination, there is one or two 960 00:48:04,800 --> 00:48:07,917 or a small number of threads that are actually aware 961 00:48:07,917 --> 00:48:08,500 of each other. 962 00:48:08,500 --> 00:48:10,040 And there is a single programmer that is actually 963 00:48:10,040 --> 00:48:12,050 designing these things to specifically interact 964 00:48:12,050 --> 00:48:14,814 with each other in some fashion, so this thread runs and then 965 00:48:14,814 --> 00:48:16,230 this other one runs after the data 966 00:48:16,230 --> 00:48:18,080 has been produced and so on. 967 00:48:18,080 --> 00:48:20,230 In that sense, this kind of isolation 968 00:48:20,230 --> 00:48:25,550 is harder to achieve because each individual action does not 969 00:48:25,550 --> 00:48:26,980 know which other actions there are. 970 00:48:26,980 --> 00:48:31,312 But, yet, you want to achieve this sequential goal. 971 00:48:31,312 --> 00:48:32,770 Now, in one other respect, actually 972 00:48:32,770 --> 00:48:36,720 isolated actions are easier than sequence coordination. 973 00:48:36,720 --> 00:48:38,720 And the significant way in which they are easier 974 00:48:38,720 --> 00:48:40,178 is they are easier for programmers.
975 00:48:42,300 --> 00:48:45,130 Because we are not worried about coordinating different actions 976 00:48:45,130 --> 00:48:50,680 with each other, once you design a system that inside the system 977 00:48:50,680 --> 00:48:53,810 deals with ways of achieving isolation, 978 00:48:53,810 --> 00:48:56,360 the programmers do not have to think about locks and unlocks 979 00:48:56,360 --> 00:48:59,210 and acquiring and releasing locks or other ways 980 00:48:59,210 --> 00:49:04,155 in which they control access to variables that might be shared. 981 00:49:04,155 --> 00:49:06,530 What this means is that if we can design isolated actions 982 00:49:06,530 --> 00:49:10,610 right and we do not worry about any serial order, 983 00:49:10,610 --> 00:49:12,980 A1 can run before A2 or A2 before A1, 984 00:49:12,980 --> 00:49:15,940 then it makes life a lot easier for a programmer. 985 00:49:15,940 --> 00:49:17,610 And our goal is to come up with ways 986 00:49:17,610 --> 00:49:20,400 of achieving recoverability and isolation that 987 00:49:20,400 --> 00:49:22,990 require very little from a programmer that 988 00:49:22,990 --> 00:49:24,600 wants these properties. 989 00:49:24,600 --> 00:49:26,590 It is a little bit like pixie dust. 990 00:49:26,590 --> 00:49:28,410 You might write a general program 991 00:49:28,410 --> 00:49:31,000 and come in and just put a begin recoverable action, 992 00:49:31,000 --> 00:49:34,177 end recoverable action and make a few changes to your program. 993 00:49:34,177 --> 00:49:36,010 Or you might just say begin isolated action, 994 00:49:36,010 --> 00:49:37,450 end isolated action, and magically 995 00:49:37,450 --> 00:49:40,450 the system achieves isolation or recoverability for you.
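To give a feel for what wrapping code in begin recoverable action and end recoverable action might look like to a programmer, here is a hedged Python sketch built on a context manager. The names `recoverable_action` and `Abort` are hypothetical and not an API from the lecture. The sketch assumes the commit happens at the end of the block (the common case mentioned earlier), keeps state in an in-memory dictionary, and does not survive a real crash. It only illustrates the programming model, using the savings-to-checking transfer with an abort on insufficient funds.

```python
from contextlib import contextmanager


class Abort(Exception):
    """Hypothetical signal raised inside an action to back everything out."""


@contextmanager
def recoverable_action(state):
    # Begin recoverable action: hand the body a private working copy.
    working = dict(state)
    try:
        yield working
    except Abort:
        # Abort: discard the working copy; `state` is exactly as it was.
        return
    # End of block reached normally: treat this as the commit and install
    # the changes. (In this toy the install is not itself crash-atomic.)
    state.clear()
    state.update(working)


def transfer(accounts, amount):
    with recoverable_action(accounts) as acct:
        acct["savings"] -= amount
        if acct["savings"] < 0:
            raise Abort()          # not enough funds to cover the transfer
        acct["checking"] += amount


accounts = {"savings": 100, "checking": 0}
transfer(accounts, 30)     # commits
transfer(accounts, 500)    # aborts, leaving the balances unchanged
print(accounts)
```

The body of `transfer` contains no explicit undo logic. Backing out of the partial debit is handled entirely by the wrapper, which is the kind of low programmer burden being aimed for.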
996 00:49:43,640 --> 00:49:45,710 It can make life much easier for a programmer 997 00:49:45,710 --> 00:49:47,240 but it is a harder problem for us 998 00:49:47,240 --> 00:49:49,340 because no single action is aware of all 999 00:49:49,340 --> 00:49:51,571 of the other actions in the system. 1000 00:49:51,571 --> 00:49:54,070 Next time we will see how to achieve recoverability and then 1001 00:49:54,070 --> 00:49:55,680 isolation and transactions. 1002 00:50:07,190 --> 00:50:09,300 Design Project 2 is out on the website now. 1003 00:50:09,300 --> 00:50:12,300 And the main thing for you to make sure you do this week 1004 00:50:12,300 --> 00:50:16,700 is get project partners and send a list of team members 1005 00:50:16,700 --> 00:50:19,460 to your teaching assistant by Thursday's recitation. 1006 00:50:19,460 --> 00:50:21,010 Thanks.