1 00:00:00,780 --> 00:00:03,350 Good afternoon. 2 00:00:03,350 --> 00:00:06,302 So we're going to continue our discussion about atomicity 3 00:00:06,302 --> 00:00:07,510 and how to achieve atomicity. 4 00:00:07,510 --> 00:00:09,030 And today the focus is going to be 5 00:00:09,030 --> 00:00:11,860 on implementing this idea called recoverability, 6 00:00:11,860 --> 00:00:15,440 which we just described and defined the last time. 7 00:00:15,440 --> 00:00:18,710 So if you recall from last time, the idea 8 00:00:18,710 --> 00:00:23,550 is that when you have modules that interact with one another, 9 00:00:23,550 --> 00:00:27,590 and in this example M1 calls M2, and M2 fails somewhere 10 00:00:27,590 --> 00:00:30,800 in the middle of this invocation and it recovers, 11 00:00:30,800 --> 00:00:32,670 the goal here is to try to make sure 12 00:00:32,670 --> 00:00:36,420 that the invoker of this module, in this case M1, 13 00:00:36,420 --> 00:00:38,950 or all subsequent invokers of M1, 14 00:00:38,950 --> 00:00:41,260 don't see any partial results that 15 00:00:41,260 --> 00:00:45,670 were computed during this execution when M2 failed. 16 00:00:45,670 --> 00:00:49,320 And this was the idea that we called recoverability. 17 00:00:49,320 --> 00:00:50,900 And the definition of recoverability 18 00:00:50,900 --> 00:00:54,100 was that an action, which is made up 19 00:00:54,100 --> 00:00:56,340 of a composite sequence of steps is 20 00:00:56,340 --> 00:00:59,380 recoverable from the point of view of its invoker, 21 00:00:59,380 --> 00:01:03,480 if it looks to the invoker and to all subsequent invokers as 22 00:01:03,480 --> 00:01:05,900 if this action either completely occurred, 23 00:01:05,900 --> 00:01:08,770 or if it didn't completely occur and aborted, it 24 00:01:08,770 --> 00:01:12,360 aborted in such a way that all partial effects of that action 25 00:01:12,360 --> 00:01:14,770 were undone or backed out. 
26 00:01:14,770 --> 00:01:16,270 So in other words, recoverability is 27 00:01:16,270 --> 00:01:19,110 this idea that you either do it all, 28 00:01:19,110 --> 00:01:21,730 either complete the action, or do none of the action. 29 00:01:21,730 --> 00:01:23,560 But the effects are as if you were 30 00:01:23,560 --> 00:01:26,850 able to back out of the action. 31 00:01:26,850 --> 00:01:29,510 And we use this idea to then talk 32 00:01:29,510 --> 00:01:33,800 about a particular special case of [NOISE OBSCURES] 33 00:01:33,800 --> 00:01:36,720 to implement a recoverable sector, which 34 00:01:36,720 --> 00:01:39,680 is a single sector of a disk where 35 00:01:39,680 --> 00:01:43,820 what we were able to do was to ensure that everybody reading, 36 00:01:43,820 --> 00:01:46,710 we defined a put procedure and a get procedure. 37 00:01:46,710 --> 00:01:48,820 So, readers wouldn't [UNINTELLIGIBLE]. 38 00:01:48,820 --> 00:01:51,750 And we ensure that everybody doing a get 39 00:01:51,750 --> 00:01:54,470 would never see the partial results of any put. 40 00:01:54,470 --> 00:01:57,020 So, if a failure were to happen in the middle of a put, 41 00:01:57,020 --> 00:02:01,060 people doing a get wouldn't see these partial results. 42 00:02:01,060 --> 00:02:05,000 And, the main idea here was to actually maintain 43 00:02:05,000 --> 00:02:08,509 what is more generally known as a shadow version, or a shadow 44 00:02:08,509 --> 00:02:11,690 copy, or a shadow object of the data, 45 00:02:11,690 --> 00:02:14,620 and we maintained two versions of the data 46 00:02:14,620 --> 00:02:16,520 that we call D0 and D1. 47 00:02:16,520 --> 00:02:21,550 And, we maintain a sector that we 48 00:02:21,550 --> 00:02:24,410 call the chooser sector to choose between the two shadows. 
49 00:02:24,410 --> 00:02:29,250 And, what we were able to argue was that this chooser always 50 00:02:29,250 --> 00:02:32,090 points to the version that you want people to get from 51 00:02:32,090 --> 00:02:35,510 to read from, and so when someone does a put, 52 00:02:35,510 --> 00:02:38,000 the idea is first to write to the version that's 53 00:02:38,000 --> 00:02:39,540 not currently being read from. 54 00:02:39,540 --> 00:02:40,800 So the chooser points to zero. 55 00:02:40,800 --> 00:02:45,927 Then the putter would put data, write data into one. 56 00:02:45,927 --> 00:02:48,260 And if the failure happened in the middle of that write, 57 00:02:48,260 --> 00:02:50,010 there's no problem because people who read 58 00:02:50,010 --> 00:02:52,050 would still read from zero. 59 00:02:52,050 --> 00:02:53,630 And we reduce this case of proving 60 00:02:53,630 --> 00:02:55,430 this algorithm correct to the case 61 00:02:55,430 --> 00:02:58,201 when a failure happened in the middle of writing the chooser 62 00:02:58,201 --> 00:02:58,700 sector. 63 00:02:58,700 --> 00:03:01,820 And we were able to argue that as long as people, 64 00:03:01,820 --> 00:03:04,070 if a failure happened in the middle of writing here, 65 00:03:04,070 --> 00:03:05,910 either of these versions is correct 66 00:03:05,910 --> 00:03:07,730 because a failure by definition didn't 67 00:03:07,730 --> 00:03:10,429 happen in the middle of writing either of these two sectors. 68 00:03:10,429 --> 00:03:12,220 And therefore you could pick either of them 69 00:03:12,220 --> 00:03:15,250 and read from it. 70 00:03:15,250 --> 00:03:17,740 And during this process, we came up 71 00:03:17,740 --> 00:03:20,630 with this notion which we're going to generalize 72 00:03:20,630 --> 00:03:21,990 today called a commit point. 73 00:03:27,220 --> 00:03:30,030 The commit point is the point at which for any action, 74 00:03:30,030 --> 00:03:33,160 the results are visible to subsequent actions. 
75 00:03:33,160 --> 00:03:35,800 And if a failure happens before the commit point, 76 00:03:35,800 --> 00:03:37,650 then the idea is, in general, you 77 00:03:37,650 --> 00:03:41,190 would want people not to see the partial results that 78 00:03:41,190 --> 00:03:44,335 might have accumulated before the failure occurred. 79 00:03:44,335 --> 00:03:46,210 And in this particular case, the commit point 80 00:03:46,210 --> 00:03:49,770 is when the chooser sector gets written to point to the current version 81 00:03:49,770 --> 00:03:50,420 of the data, 82 00:03:50,420 --> 00:03:54,050 and that call to write the chooser sector returns. 83 00:03:54,050 --> 00:03:58,250 And if it returns, then you know that people doing a get 84 00:03:58,250 --> 00:04:00,360 will get from the version that just got written. 85 00:04:00,360 --> 00:04:02,590 So, in the implementation of recoverable put, 86 00:04:02,590 --> 00:04:05,960 the commit point was when this call returned. 87 00:04:08,940 --> 00:04:13,750 So now, the question for today is how we deal with larger 88 00:04:13,750 --> 00:04:16,910 actions -- 89 00:04:16,910 --> 00:04:23,140 -- because this is a plan that works pretty well for single 90 00:04:23,140 --> 00:04:24,660 sector puts and gets. 91 00:04:24,660 --> 00:04:27,680 So, we were able to make individual sector reads 92 00:04:27,680 --> 00:04:29,450 and writes recoverable. 93 00:04:29,450 --> 00:04:32,760 But if you think about any serious application or even any 94 00:04:32,760 --> 00:04:35,600 [toy?] application, in most cases you end up having more 95 00:04:35,600 --> 00:04:38,700 data than what fits into one single sector. 96 00:04:38,700 --> 00:04:41,060 And, you might have things touching data all 97 00:04:41,060 --> 00:04:41,920 over the place. 
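As a recap of the single-sector scheme before moving on to larger actions, here is a minimal sketch in Python. It is an in-memory model, not a disk implementation: the class name, the `crash_before_commit` flag, and the `SimulatedCrash` exception are all assumptions made for illustration, and it does not model a failure in the middle of writing the chooser itself.

```python
class SimulatedCrash(Exception):
    """Hypothetical stand-in for a failure in the middle of a put."""


class RecoverableSector:
    """In-memory model of two shadow versions (D0, D1) plus a chooser."""

    def __init__(self, initial):
        self.versions = [initial, None]  # D0 and D1
        self.chooser = 0                 # which version get() reads from

    def put(self, data, crash_before_commit=False):
        target = 1 - self.chooser        # the version NOT currently being read
        self.versions[target] = data     # a failure here is harmless to readers
        if crash_before_commit:
            raise SimulatedCrash()
        self.chooser = target            # commit point: chooser now names the new data

    def get(self):
        return self.versions[self.chooser]


sector = RecoverableSector("old")
try:
    sector.put("new", crash_before_commit=True)
except SimulatedCrash:
    pass
value_after_crash = sector.get()         # still the old data
sector.put("new")
value_after_commit = sector.get()        # now the new data
```

The one design point to notice is that the data write always goes to the copy the chooser does not name, so the chooser update is the only step that changes what a get can see.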
98 00:04:44,750 --> 00:04:50,360 And, our approach to doing this is to actually first define 99 00:04:50,360 --> 00:04:53,120 what a programmer must do, what somebody 100 00:04:53,120 --> 00:04:56,740 wishing to write a program that is a recoverable action 101 00:04:56,740 --> 00:04:57,480 must do. 102 00:04:57,480 --> 00:04:59,859 And then we're going to implement that underneath 103 00:04:59,859 --> 00:05:01,400 in a system so the programmer doesn't 104 00:05:01,400 --> 00:05:04,760 have to worry about implementing recoverability. 105 00:05:04,760 --> 00:05:07,940 So the idea here is for the programmer 106 00:05:07,940 --> 00:05:11,410 of a recoverable action, to start writing that action using 107 00:05:11,410 --> 00:05:15,660 a system call, a call that they call begin recoverable action, 108 00:05:15,660 --> 00:05:19,230 and then discipline herself or himself 109 00:05:19,230 --> 00:05:21,920 to write some software which has a small number of rules 110 00:05:21,920 --> 00:05:25,120 as to what can go in here. 111 00:05:25,120 --> 00:05:27,190 And then, explicitly, when they want 112 00:05:27,190 --> 00:05:29,680 to commit that recoverable action, 113 00:05:29,680 --> 00:05:32,100 make its results visible to subsequent actions, 114 00:05:32,100 --> 00:05:33,060 invoke commit. 115 00:05:36,870 --> 00:05:39,570 And then, they are allowed to do a little bit more work, 116 00:05:39,570 --> 00:05:41,090 or a lot of work here. 117 00:05:41,090 --> 00:05:44,260 But, there's very strict restrictions 118 00:05:44,260 --> 00:05:46,370 on what they can do after a commit. 119 00:05:46,370 --> 00:05:52,289 And then, they can end using end recoverable action. 120 00:05:52,289 --> 00:05:53,830 And this phase here before the commit 121 00:05:53,830 --> 00:05:55,121 is called the pre-commit phase. 122 00:05:55,121 --> 00:05:56,680 This is the post-commit phase. 
123 00:05:56,680 --> 00:05:59,200 And the idea here is if a failure occurred 124 00:05:59,200 --> 00:06:04,730 here or an abort occurred before the commit and this action 125 00:06:04,730 --> 00:06:06,920 was made to abort, then the system 126 00:06:06,920 --> 00:06:09,980 must restore the state of all of the variables, 127 00:06:09,980 --> 00:06:13,710 and all of the data that was touched here to the same state 128 00:06:13,710 --> 00:06:15,770 before this action even got invoked. 129 00:06:15,770 --> 00:06:17,270 OK, it's as if none of this happened. 130 00:06:17,270 --> 00:06:20,970 So this is the not at all part of this definition 131 00:06:20,970 --> 00:06:22,391 of recoverability. 132 00:06:22,391 --> 00:06:24,390 Once you reach this point of the commit returns, 133 00:06:24,390 --> 00:06:26,056 the only thing you're allowed to do here 134 00:06:26,056 --> 00:06:28,046 are things that cause you to complete. 135 00:06:28,046 --> 00:06:29,420 You're not allowed to abort here. 136 00:06:29,420 --> 00:06:30,920 You're not allowed to back out here. 137 00:06:30,920 --> 00:06:33,420 So once you reach the point, it means 138 00:06:33,420 --> 00:06:37,930 you're in the do it all part of do it all or none at all. 139 00:06:37,930 --> 00:06:41,110 So you have to complete all the way to the end. 
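The phase discipline just described can be sketched as a small state tracker. This is only an illustration of the rule that abort is legal before commit and not after it; the class and method names are hypothetical and are not the system calls from the lecture notes.

```python
class RecoverableAction:
    """Hypothetical tracker for the phases of one recoverable action."""

    def __init__(self):
        # Corresponds to calling "begin recoverable action".
        self.phase = "pre-commit"

    def commit(self):
        if self.phase != "pre-commit":
            raise RuntimeError("commit is only legal in the pre-commit phase")
        self.phase = "post-commit"   # from here on the action must run to the end

    def abort(self):
        if self.phase != "pre-commit":
            raise RuntimeError("cannot abort or back out after commit")
        self.phase = "aborted"       # the system must undo all partial effects

    def end(self):
        if self.phase != "post-commit":
            raise RuntimeError("end is only legal after commit")
        self.phase = "ended"
```

In this sketch, calling `abort()` after `commit()` raises an error, which mirrors the restriction that the post-commit phase may only do work that leads to completion.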
140 00:06:41,110 --> 00:06:43,100 And what this really means is that all 141 00:06:43,100 --> 00:06:45,342 of the data that you want to manipulate, 142 00:06:45,342 --> 00:06:47,550 and all of the resources that you want to accumulate, 143 00:06:47,550 --> 00:06:50,049 and we'll look at locks as a resource that you would like to 144 00:06:50,049 --> 00:06:52,710 accumulate in order to enforce isolation, which 145 00:06:52,710 --> 00:06:54,590 is a topic for next time, all that 146 00:06:54,590 --> 00:06:57,050 has to happen here so that once you reach this point 147 00:06:57,050 --> 00:06:59,540 and it ends, then even if a failure occurs when it 148 00:06:59,540 --> 00:07:01,980 restarts, you just have to crunch through 149 00:07:01,980 --> 00:07:03,900 and finish what was going on here. 150 00:07:03,900 --> 00:07:04,970 And that can just happen. 151 00:07:04,970 --> 00:07:06,966 There's nothing to acquire, no resources 152 00:07:06,966 --> 00:07:08,840 to get; all of the data variables have already 153 00:07:08,840 --> 00:07:13,500 been put in their correct situation in the correct state. 154 00:07:13,500 --> 00:07:14,960 So the interesting part really is 155 00:07:14,960 --> 00:07:17,260 what happens between the begin recoverable action 156 00:07:17,260 --> 00:07:19,050 and until the commit finishes. 157 00:07:19,050 --> 00:07:22,610 And that's really what we're going to focus on. 158 00:07:22,610 --> 00:07:25,370 Now in addition to commit, there is another call 159 00:07:25,370 --> 00:07:27,950 that we have to explicitly think about, and that's abort. 160 00:07:31,356 --> 00:07:32,980 And there's two or three different ways 161 00:07:32,980 --> 00:07:34,700 in which abort may be invoked. 162 00:07:34,700 --> 00:07:38,590 The first is a programmer who might herself or himself have 163 00:07:38,590 --> 00:07:40,030 abort in their code. 
164 00:07:40,030 --> 00:07:42,360 For example, in that bank transfer application, 165 00:07:42,360 --> 00:07:45,030 if you discover that your savings account doesn't 166 00:07:45,030 --> 00:07:48,140 have enough funds to cover a transfer, you read it, 167 00:07:48,140 --> 00:07:49,770 and then you maybe write something, 168 00:07:49,770 --> 00:07:52,329 and then you discover that you don't have the funds to cover 169 00:07:52,329 --> 00:07:52,870 the transfer. 170 00:07:52,870 --> 00:07:54,380 You might just abort. 171 00:07:54,380 --> 00:07:57,460 And the semantics of abort are that once abort 172 00:07:57,460 --> 00:07:59,420 is called by the programmer, they 173 00:07:59,420 --> 00:08:02,720 can be guaranteed that when the next person invokes 174 00:08:02,720 --> 00:08:07,040 a recoverable action that involves the same data items, 175 00:08:07,040 --> 00:08:11,470 those readers will see the same state as if this action never 176 00:08:11,470 --> 00:08:12,431 started. 177 00:08:12,431 --> 00:08:14,180 So what this means is that the system must 178 00:08:14,180 --> 00:08:16,510 have a plan of undoing and backing out 179 00:08:16,510 --> 00:08:18,240 of any changes that might have occurred 180 00:08:18,240 --> 00:08:21,162 before this abort is called. 181 00:08:21,162 --> 00:08:22,620 Another reason an abort might occur 182 00:08:22,620 --> 00:08:26,650 is that you're in a, for example, database application, 183 00:08:26,650 --> 00:08:28,490 and you're booking all sorts of things 184 00:08:28,490 --> 00:08:32,240 like plane tickets, and air tickets, and hotel 185 00:08:32,240 --> 00:08:33,640 reservations, and so on. 186 00:08:33,640 --> 00:08:35,970 And you book a few of them and then 187 00:08:35,970 --> 00:08:38,809 you discover you can't get one of the reservations 188 00:08:38,809 --> 00:08:39,490 that you want. 189 00:08:39,490 --> 00:08:42,480 You as a user might abort the whole transaction. 
190 00:08:42,480 --> 00:08:45,970 And that causes all the individual things that 191 00:08:45,970 --> 00:08:48,999 are in partial state to abort. 192 00:08:48,999 --> 00:08:50,540 Another reason why abort might happen 193 00:08:50,540 --> 00:08:52,590 is that, and we'll see this the next time when 194 00:08:52,590 --> 00:08:55,440 we talk about locking, anytime you have locks, 195 00:08:55,440 --> 00:08:58,250 we already saw that anytime you have locks you 196 00:08:58,250 --> 00:09:00,120 have the danger of deadlock. 197 00:09:00,120 --> 00:09:02,080 And one way in which the system implementing 198 00:09:02,080 --> 00:09:06,780 these atomic actions, both for isolation in particular, deals 199 00:09:06,780 --> 00:09:09,950 with deadlocks is when two or more actions are 200 00:09:09,950 --> 00:09:12,940 waiting for each other, waiting on locks that the others hold, 201 00:09:12,940 --> 00:09:15,430 you just abort one of them, or abort as many of them 202 00:09:15,430 --> 00:09:17,160 as needed for progress to happen. 203 00:09:17,160 --> 00:09:19,130 So the system might unilaterally decide 204 00:09:19,130 --> 00:09:20,404 to abort certain actions. 205 00:09:20,404 --> 00:09:22,820 And, what that means is that the system's abort had better 206 00:09:22,820 --> 00:09:25,410 have a plan to undo all partial changes that 207 00:09:25,410 --> 00:09:29,360 might have occurred before it returns from abort. 208 00:09:32,090 --> 00:09:34,274 OK, so that's the general model. 209 00:09:34,274 --> 00:09:35,690 So what we're going to do today is 210 00:09:35,690 --> 00:09:41,360 to understand what happens when data variables are written 211 00:09:41,360 --> 00:09:44,270 inside one of these recoverable actions: 212 00:09:44,270 --> 00:09:47,070 how commit is implemented, and how abort is implemented. 213 00:09:47,070 --> 00:09:48,160 And that's the plan. 
214 00:09:48,160 --> 00:09:49,636 And, once we do that, we will have 215 00:09:49,636 --> 00:09:50,760 implemented recoverability. 216 00:09:50,760 --> 00:09:56,420 So we're going to study two solutions to this problem. 217 00:09:56,420 --> 00:10:05,390 And the first solution uses an idea called version histories. 218 00:10:05,390 --> 00:10:07,940 And version histories really build on an idea 219 00:10:07,940 --> 00:10:11,970 that we did see last time when we talked 220 00:10:11,970 --> 00:10:14,810 about recoverable sector, which is this rule that we call 221 00:10:14,810 --> 00:10:17,250 the golden rule of recoverability, which 222 00:10:17,250 --> 00:10:20,340 says never modify the only copy because if you modify 223 00:10:20,340 --> 00:10:23,002 the only copy of something and a failure occurs, then you don't 224 00:10:23,002 --> 00:10:24,460 really have a way of backing it out 225 00:10:24,460 --> 00:10:27,920 because you don't know what the original value was. 226 00:10:27,920 --> 00:10:29,560 Version histories generalize the idea 227 00:10:29,560 --> 00:10:32,880 to say, never modify anything. 228 00:10:32,880 --> 00:10:35,140 So the idea is anytime you want to write a variable, 229 00:10:35,140 --> 00:10:37,090 you don't actually overwrite anything. 230 00:10:37,090 --> 00:10:38,870 You create another version of the variable 231 00:10:38,870 --> 00:10:41,860 and somehow arrange for the set of pointers 232 00:10:41,860 --> 00:10:44,740 that, for a variable to point to all 233 00:10:44,740 --> 00:10:47,130 of the versions of any given variable. 
234 00:10:47,130 --> 00:10:50,480 And to understand that, we need to understand the difference 235 00:10:50,480 --> 00:10:54,220 between conventional storage, like a conventional variable 236 00:10:54,220 --> 00:10:59,380 that is also called a cell store or a cell storage item, 237 00:10:59,380 --> 00:11:03,400 and a variable that allows you to implement versions which 238 00:11:03,400 --> 00:11:08,710 we're going to call [a journal?] based storage. 239 00:11:08,710 --> 00:11:10,750 So, cell storage is traditional storage. 240 00:11:10,750 --> 00:11:13,080 So if you have a variable, X, that's cell storage 241 00:11:13,080 --> 00:11:18,250 and you set X to some value, V, what ends up happening 242 00:11:18,250 --> 00:11:22,860 is that the cell that contains X is you 243 00:11:22,860 --> 00:11:25,101 write the value, V, into X. 244 00:11:25,101 --> 00:11:27,600 In other words, you overwrite whatever [there is?] you know, 245 00:11:27,600 --> 00:11:28,957 and replace it with V. 246 00:11:28,957 --> 00:11:31,540 And, this overwriting really is what causes the problem if you 247 00:11:31,540 --> 00:11:35,740 don't have another copy of this variable somehow maintained, 248 00:11:35,740 --> 00:11:38,590 overwriting means that this rule of recoverability 249 00:11:38,590 --> 00:11:41,570 is being violated. 250 00:11:41,570 --> 00:11:45,260 We're going to use the word install for these writes. 251 00:11:45,260 --> 00:11:48,500 So we'll be installing items into cell stores. 252 00:11:48,500 --> 00:11:52,370 So what that means is assigning a value to a cell store 253 00:11:52,370 --> 00:11:52,870 variable. 254 00:11:56,730 --> 00:11:59,520 And the problem is this gets in the way of the golden rule. 
255 00:11:59,520 --> 00:12:06,620 So what we're going to do is use these cell storage items 256 00:12:06,620 --> 00:12:09,170 that we know how to build that's the memory abstraction 257 00:12:09,170 --> 00:12:12,060 to build an expanded version called a journal 258 00:12:12,060 --> 00:12:15,217 storage of generalized storage in which nothing 259 00:12:15,217 --> 00:12:16,050 is ever overwritten. 260 00:12:18,615 --> 00:12:19,990 The way this works is that if you 261 00:12:19,990 --> 00:12:26,960 have X, the very first time you set X to some value, 262 00:12:26,960 --> 00:12:30,880 you end up creating a data structure in cell storage 263 00:12:30,880 --> 00:12:33,550 that looks like this. 264 00:12:33,550 --> 00:12:35,480 You have a value of V1. 265 00:12:35,480 --> 00:12:38,830 And you also keep track of the identifier of the action that 266 00:12:38,830 --> 00:12:39,460 created that. 267 00:12:39,460 --> 00:12:40,520 And, that'll turn out to be useful 268 00:12:40,520 --> 00:12:42,810 for us to know the identifiers of the actions that 269 00:12:42,810 --> 00:12:46,044 created any given variable. 270 00:12:46,044 --> 00:12:47,460 And how do you get these identifiers? 271 00:12:47,460 --> 00:12:51,180 When [begin RA?] is called, it returns an ID, OK, 272 00:12:51,180 --> 00:12:53,170 and the system knows that. 273 00:12:53,170 --> 00:12:57,630 And this ID is available to the program as well. 274 00:12:57,630 --> 00:13:00,880 Then the next version, if X gets set by any action 275 00:13:00,880 --> 00:13:05,260 to a different value, what you do is you create that as V2. 276 00:13:05,260 --> 00:13:08,740 And, you keep track of the identifier that maintains that. 277 00:13:08,740 --> 00:13:10,939 And then you got V3, and so on, all the way. 278 00:13:10,939 --> 00:13:12,730 And the current version, the latest version 279 00:13:12,730 --> 00:13:16,920 might be VN that was written by IDN. 
280 00:13:16,920 --> 00:13:18,570 Now if the same action repeatedly 281 00:13:18,570 --> 00:13:21,160 writes the same variable, you just create new versions. 282 00:13:21,160 --> 00:13:23,440 So it isn't like there's one version per action. 283 00:13:23,440 --> 00:13:25,640 It's just that there's one version every time 284 00:13:25,640 --> 00:13:26,580 you write something. 285 00:13:26,580 --> 00:13:30,182 So literally, nothing is overwritten. 286 00:13:30,182 --> 00:13:30,890 And so, that's X. 287 00:13:30,890 --> 00:13:35,580 So, X itself points to the head version, the very latest 288 00:13:35,580 --> 00:13:36,700 version that was written. 289 00:13:36,700 --> 00:13:38,130 And, you could imagine that there 290 00:13:38,130 --> 00:13:42,075 are these pointers pulling you back like a linked list. 291 00:13:42,075 --> 00:13:44,450 But the nice thing about it is this is the journal store. 292 00:13:44,450 --> 00:13:46,550 So, X itself is this whole thing. 293 00:13:50,580 --> 00:13:53,570 And, we'll implement two calls that when you have, 294 00:13:53,570 --> 00:13:56,130 this is basically a memory abstraction. 295 00:13:56,130 --> 00:13:59,020 So, you need to read and you need to write. 296 00:13:59,020 --> 00:14:01,410 So, for write, we're going to come up 297 00:14:01,410 --> 00:14:06,200 with a call called write journal, which in the notes 298 00:14:06,200 --> 00:14:08,740 I think has a slightly different name. 299 00:14:08,740 --> 00:14:10,640 I think they call it write new value. 300 00:14:10,640 --> 00:14:14,410 But write journal makes it clear that it's for journal store. 301 00:14:14,410 --> 00:14:16,280 And, this is easy. 302 00:14:16,280 --> 00:14:18,650 It's some data item, X. 303 00:14:18,650 --> 00:14:20,470 It's some value, V. 304 00:14:20,470 --> 00:14:24,620 And, it's the ID of the action that's doing the write. 305 00:14:24,620 --> 00:14:26,080 And this is very easy to implement. 
306 00:14:26,080 --> 00:14:28,110 All you do is you create a new version. 307 00:14:28,110 --> 00:14:32,250 And then you take the current thing that X is pointing to, 308 00:14:32,250 --> 00:14:35,950 and make the current version's next pointer point to that. 309 00:14:35,950 --> 00:14:38,210 And then you make X point to the new version. 310 00:14:38,210 --> 00:14:42,080 So, it's just a linked list thing, OK? 311 00:14:42,080 --> 00:14:45,000 And, in addition to write journal, 312 00:14:45,000 --> 00:14:48,750 we obviously need to implement read journal. 313 00:14:55,430 --> 00:14:59,270 And read journal is going to take a data item that you wish 314 00:14:59,270 --> 00:15:02,130 to read, X, and for reasons that will become clearer 315 00:15:02,130 --> 00:15:04,810 in a minute, it also takes the ID of the action that 316 00:15:04,810 --> 00:15:08,220 wants to do the read, OK? 317 00:15:08,220 --> 00:15:09,980 So if you want to read something, 318 00:15:09,980 --> 00:15:13,250 the idea is going to be the following: the idea is going 319 00:15:13,250 --> 00:15:17,340 to be that some of these actions are actions; 320 00:15:17,340 --> 00:15:20,550 some of these versions are going to have been written 321 00:15:20,550 --> 00:15:23,330 by actions that were committed. 322 00:15:23,330 --> 00:15:25,540 OK, and some of these versions are 323 00:15:25,540 --> 00:15:27,570 going to have been written by actions 324 00:15:27,570 --> 00:15:32,080 that started writing things and then maybe failed or aborted. 325 00:15:32,080 --> 00:15:34,887 So they never committed. 
326 00:15:34,887 --> 00:15:36,470 Now, clearly when you do read journal, 327 00:15:36,470 --> 00:15:38,761 you don't want to see the results of those actions that 328 00:15:38,761 --> 00:15:42,050 were never committed because what you want 329 00:15:42,050 --> 00:15:44,280 to see from the definition that we laid out 330 00:15:44,280 --> 00:15:46,370 are once you reach the commit point, 331 00:15:46,370 --> 00:15:48,130 you want to see the change is visible. 332 00:15:48,130 --> 00:15:50,832 Before that, you don't want anything visible. 333 00:15:50,832 --> 00:15:52,540 So as long as you can keep track of which 334 00:15:52,540 --> 00:15:55,270 of these actions committed, and which of these didn't commit, 335 00:15:55,270 --> 00:15:58,480 you can implement read journal by starting at the most 336 00:15:58,480 --> 00:16:01,660 recent version, and going backwards 337 00:16:01,660 --> 00:16:05,340 until you find the first version that corresponds 338 00:16:05,340 --> 00:16:10,400 to a value that was written by an action that was committed. 339 00:16:10,400 --> 00:16:13,670 So what you need to do is start from here and look at IDN. 340 00:16:13,670 --> 00:16:17,200 If IDN, you need to maintain another table that tells you 341 00:16:17,200 --> 00:16:19,664 whether IDN committed or not. 342 00:16:19,664 --> 00:16:21,330 If it committed, then return that value. 343 00:16:21,330 --> 00:16:23,140 If not, go back one. 344 00:16:23,140 --> 00:16:25,850 And, keep going until you find the most recent version 345 00:16:25,850 --> 00:16:29,700 that was written by a committed action. 346 00:16:29,700 --> 00:16:31,880 If you do that, then read journal clearly 347 00:16:31,880 --> 00:16:35,220 returns to you what you would want, 348 00:16:35,220 --> 00:16:36,860 which is the value that was written 349 00:16:36,860 --> 00:16:39,832 by the last committed action. 
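The journal store described so far can be sketched as a linked list of versions. This is a minimal Python sketch under stated assumptions: the names `Version` and `JournalVariable` are invented here, the set of committed IDs is passed in directly in place of the status table the lecture mentions, the action-ID argument to read journal is left out because its purpose has not been explained yet, and each new version is assumed to point back at the one it supersedes.

```python
class Version:
    """One entry in a variable's version history."""

    def __init__(self, value, action_id, older):
        self.value = value
        self.action_id = action_id   # ID of the action that wrote this version
        self.older = older           # link back to the previous version


class JournalVariable:
    """X itself: a pointer to the head (most recent) version."""

    def __init__(self):
        self.head = None


def write_journal(x, value, action_id):
    # Never overwrite: create a new version and make it the head.
    x.head = Version(value, action_id, x.head)


def read_journal(x, committed_ids):
    # Walk from the newest version backwards to the first committed one.
    version = x.head
    while version is not None:
        if version.action_id in committed_ids:
            return version.value
        version = version.older
    return None  # no committed version exists yet


x = JournalVariable()
write_journal(x, "v1", 1)
write_journal(x, "v2", 2)
visible = read_journal(x, committed_ids={1})   # action 2 has not committed
```

Because nothing is overwritten, the version written by action 1 is still reachable even though action 2 wrote after it.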
350 00:16:39,832 --> 00:16:41,540 The only other tweak that you want to do, 351 00:16:41,540 --> 00:16:44,040 and the reason why ID is passed as an argument to read 352 00:16:44,040 --> 00:16:46,629 journal is if the current action has already written, 353 00:16:46,629 --> 00:16:48,420 so let's say you are implementing an action 354 00:16:48,420 --> 00:16:51,750 and you set the value of X to 17, 355 00:16:51,750 --> 00:16:53,750 then when you read the value of X, 356 00:16:53,750 --> 00:16:55,369 you would want the value that you set. 357 00:16:55,369 --> 00:16:57,660 I mean, you wouldn't want the previous committed action; 358 00:16:57,660 --> 00:17:00,800 that's one way of defining read journal. 359 00:17:00,800 --> 00:17:05,180 So as you go from the most recent version to the oldest 360 00:17:05,180 --> 00:17:08,510 version, you look to see whether the value that you 361 00:17:08,510 --> 00:17:11,569 are reading now is a value that you set, your own action set. 362 00:17:11,569 --> 00:17:13,079 And if it was, just return that. 363 00:17:13,079 --> 00:17:14,912 And then, it'll return to you the last value 364 00:17:14,912 --> 00:17:16,592 that this action set. 365 00:17:16,592 --> 00:17:18,050 Otherwise, you keep going until you 366 00:17:18,050 --> 00:17:22,300 find the value set by the most recent committed action. 367 00:17:22,300 --> 00:17:25,550 And since we aren't dealing here with concurrent actions at all, 368 00:17:25,550 --> 00:17:31,056 right, we've already said last time that, until next Monday, 369 00:17:31,056 --> 00:17:33,430 we're only going to be dealing with one action at a time. 370 00:17:33,430 --> 00:17:35,030 There's no concurrent actions. 371 00:17:35,030 --> 00:17:37,856 Clearly, this algorithm will be correct. 372 00:17:37,856 --> 00:17:39,480 You start from the most recent version, 373 00:17:39,480 --> 00:17:42,530 keep going until you find the first version that was either 374 00:17:42,530 --> 00:17:44,740 [done?] 
by this action that's doing the read, 375 00:17:44,740 --> 00:17:49,820 or the first version that was written by an action that 376 00:17:49,820 --> 00:17:51,740 committed. 377 00:17:51,740 --> 00:17:55,540 So, clearly what this means is that you need a table 378 00:17:55,540 --> 00:17:59,010 that you have to maintain that stores the status 379 00:17:59,010 --> 00:18:00,140 of these different actions. 380 00:18:00,140 --> 00:18:02,410 It needs to store which actions committed, 381 00:18:02,410 --> 00:18:04,355 and which actions didn't commit. 382 00:18:04,355 --> 00:18:06,730 And that's going to be done using a data structure called 383 00:18:06,730 --> 00:18:07,730 the commit record table. 384 00:18:12,000 --> 00:18:13,980 And this is a very simple table. 385 00:18:13,980 --> 00:18:16,990 It just has ID1, ID2, all the way down 386 00:18:16,990 --> 00:18:18,300 to whatever ID's you have. 387 00:18:18,300 --> 00:18:22,130 Every time somebody calls begin RA, you return them an ID, 388 00:18:22,130 --> 00:18:25,222 and then you create this table that as soon 389 00:18:25,222 --> 00:18:27,680 as they create this action, you set their state to pending, 390 00:18:27,680 --> 00:18:31,090 which I'll call P, OK? 391 00:18:31,090 --> 00:18:35,190 And, any time an action commits, you replace this P 392 00:18:35,190 --> 00:18:38,160 with a C, which is a commit record. 393 00:18:38,160 --> 00:18:41,960 OK, and once it's replaced with a C for an action, 394 00:18:41,960 --> 00:18:47,130 this item is called the commit record for an action. 395 00:18:47,130 --> 00:18:49,650 So now, when you want to do read journal 396 00:18:49,650 --> 00:18:52,442 and you're looking to see whether for any given action, 397 00:18:52,442 --> 00:18:54,400 things were committed, the corresponding action 398 00:18:54,400 --> 00:18:56,460 is committed or not, you look at this. 399 00:18:56,460 --> 00:18:57,360 You see its IDN. 
400 00:18:57,360 --> 00:19:00,300 You look for IDN in this table, see if it's committed or not. 401 00:19:00,300 --> 00:19:02,980 If it's not committed, then you go to the previous version 402 00:19:02,980 --> 00:19:04,310 and you do the same thing. 403 00:19:04,310 --> 00:19:10,400 If it's committed, then you return it. 404 00:19:10,400 --> 00:19:12,910 Now, it's not actually clear why you need this pending thing 405 00:19:12,910 --> 00:19:13,410 here. 406 00:19:13,410 --> 00:19:16,740 But it'll turn out that you will require the pending thing when 407 00:19:16,740 --> 00:19:18,350 you deal with isolation on Monday. 408 00:19:18,350 --> 00:19:20,530 So for now, you don't have to worry about the fact 409 00:19:20,530 --> 00:19:24,350 that these pending things are there, OK? 410 00:19:24,350 --> 00:19:28,990 Now, suppose an action starts, and then it aborts. 411 00:19:28,990 --> 00:19:31,780 So I mentioned here that when an action starts and it aborts, 412 00:19:31,780 --> 00:19:34,890 the system has to do some kind of undoing of data in order 413 00:19:34,890 --> 00:19:36,560 for abort to be correctly implemented. 414 00:19:36,560 --> 00:19:38,910 So, the state of the system's restored to the state 415 00:19:38,910 --> 00:19:42,250 before the action even started. 416 00:19:42,250 --> 00:19:44,875 The nice thing about this way of implementing version histories 417 00:19:44,875 --> 00:19:46,291 and read journal is you don't have 418 00:19:46,291 --> 00:19:47,440 to do anything on an abort. 419 00:19:50,080 --> 00:19:53,250 If the application or the system called abort, 420 00:19:53,250 --> 00:19:56,900 nothing has to be done because read journal basically is just 421 00:19:56,900 --> 00:19:59,260 going scanning this backward, looking 422 00:19:59,260 --> 00:20:01,770 for whether the version was written by itself, 423 00:20:01,770 --> 00:20:05,080 that same action or looking for whether the version was written 424 00:20:05,080 --> 00:20:06,750 by a committed action. 
425 00:20:06,750 --> 00:20:09,350 So as long as you can find for any given ID 426 00:20:09,350 --> 00:20:12,960 whether it was committed or not, that's all you need. 427 00:20:12,960 --> 00:20:16,880 OK, but just for completeness, and this will become useful 428 00:20:16,880 --> 00:20:22,960 the next time, all we'll do when abort is called on an action, 429 00:20:22,960 --> 00:20:25,760 so abort takes the ID of the action as an argument, 430 00:20:25,760 --> 00:20:29,940 all we'll do is we'll replace, if ID7 aborts, 431 00:20:29,940 --> 00:20:32,340 we'll just replace the pending. 432 00:20:32,340 --> 00:20:35,510 We'll replace that with an abort, OK? 433 00:20:35,510 --> 00:20:38,440 So, this commit record table contains 434 00:20:38,440 --> 00:20:40,100 the status of the actions. 435 00:20:40,100 --> 00:20:44,990 And that status could either be committed, pending, or aborted. 436 00:20:44,990 --> 00:20:46,270 When it starts, it's pending. 437 00:20:46,270 --> 00:20:51,350 And then it's pending until either it aborts, in which case 438 00:20:51,350 --> 00:20:54,010 it's aborted, or it's committed. 439 00:20:54,010 --> 00:20:56,832 Now, if it just fails and you don't do anything about it, 440 00:20:56,832 --> 00:20:58,540 and there's no abort call, it'll continue 441 00:20:58,540 --> 00:21:00,750 to remain in the pending state. 442 00:21:00,750 --> 00:21:02,870 But that's OK because we're never really going 443 00:21:02,870 --> 00:21:04,896 to read the value of anything that's 444 00:21:04,896 --> 00:21:07,270 in the pending state, that was set by an action that's 445 00:21:07,270 --> 00:21:08,145 in the pending state. 446 00:21:12,210 --> 00:21:13,105 So is this clear? 447 00:21:16,280 --> 00:21:18,430 OK, this approach is actually quite reasonable 448 00:21:18,430 --> 00:21:20,520 except that it has a few problems. 449 00:21:20,520 --> 00:21:25,720 The first problem it has is, well, it 450 00:21:25,720 --> 00:21:26,910 has two related problems.
451 00:21:26,910 --> 00:21:29,370 And that's the first class of problems that it has is 452 00:21:29,370 --> 00:21:31,210 that although it looks like we've really 453 00:21:31,210 --> 00:21:35,882 nailed this problem of achieving recoverable storage using 454 00:21:35,882 --> 00:21:37,340 this journal storage idea, building 455 00:21:37,340 --> 00:21:40,340 general recoverable actions so that for any variable that's 456 00:21:40,340 --> 00:21:45,550 read inside here or read inside a recoverable action, 457 00:21:45,550 --> 00:21:47,970 you use this journal storage idea. 458 00:21:47,970 --> 00:21:50,470 It's not quite correct because you have to ask, 459 00:21:50,470 --> 00:21:55,840 what happens if the system fails while the system is 460 00:21:55,840 --> 00:21:57,610 writing this commit record? 461 00:21:57,610 --> 00:21:59,260 So, the application calls commit. 462 00:21:59,260 --> 00:22:01,910 The system's starting to write this commit record 463 00:22:01,910 --> 00:22:04,620 and it fails. 464 00:22:04,620 --> 00:22:06,350 Or you might more generally ask, what 465 00:22:06,350 --> 00:22:11,300 happens if I create this new version in write journal, 466 00:22:11,300 --> 00:22:14,280 and as I'm creating a new version of a variable, 467 00:22:14,280 --> 00:22:15,180 the system crashes. 468 00:22:15,180 --> 00:22:17,580 So some garbage got written here. 469 00:22:17,580 --> 00:22:21,360 Or more likely, some garbage got written not in here 470 00:22:21,360 --> 00:22:23,791 but as I was changing this pointer for X 471 00:22:23,791 --> 00:22:25,290 to point to the most recent version, 472 00:22:25,290 --> 00:22:26,190 some garbage got written. 473 00:22:26,190 --> 00:22:28,230 So, all subsequent reads of X don't quite work. 474 00:22:31,730 --> 00:22:34,000 The answer to this question is that we 475 00:22:34,000 --> 00:22:37,257 know how to solve this problem because that question is 476 00:22:37,257 --> 00:22:38,090 basically identical.
477 00:22:38,090 --> 00:22:39,674 Both of these are identical. 478 00:22:39,674 --> 00:22:41,590 If we know how to solve the problem of writing 479 00:22:41,590 --> 00:22:44,640 a single, recoverable sector, a single, small item of data, 480 00:22:44,640 --> 00:22:47,070 then we know how to solve these two problems because both 481 00:22:47,070 --> 00:22:50,520 of these are writing recoverably a small amount of data. 482 00:22:50,520 --> 00:22:52,530 In one case, a pointer that takes 483 00:22:52,530 --> 00:22:55,930 X to point to the most recent version, in another case 484 00:22:55,930 --> 00:22:59,740 it's a single data item that corresponds 485 00:22:59,740 --> 00:23:03,240 to the commit record in this commit record table. 486 00:23:03,240 --> 00:23:08,140 And so this shows this idea of bootstrap, 487 00:23:08,140 --> 00:23:11,610 that in order to build this atomic action, 488 00:23:11,610 --> 00:23:14,975 this recoverable action, we end up [SOUND OFF/THEN ON] and then 489 00:23:14,975 --> 00:23:17,350 you bootstrap on something that we know already how to do 490 00:23:17,350 --> 00:23:17,460 because there are these cases where you have to make sure 491 00:23:17,460 --> 00:23:17,520 that it writes to certain pointers, 492 00:23:17,520 --> 00:23:17,600 and some table items are done [commonly?]. 493 00:23:17,600 --> 00:23:17,720 And we know how to do that because we just told you 494 00:23:17,720 --> 00:23:17,780 how to do recoverable sectors. 495 00:23:17,780 --> 00:23:17,990 And you could just take [UNINTELLIGIBLE] objects 496 00:23:17,990 --> 00:23:18,050 for these items, and [UNINTELLIGIBLE PHRASE] 497 00:23:18,050 --> 00:23:18,470 to get this bootstrap. 498 00:23:18,470 --> 00:23:18,560 So that's the first thing, the first [step problem?]. 
499 00:23:18,560 --> 00:23:18,650 There's another problem, not so much a correctness problem, 500 00:23:18,650 --> 00:23:18,730 but a problem in general using these version 501 00:23:18,730 --> 00:23:18,800 histories in order to build recoverable actions. 502 00:23:18,800 --> 00:23:21,220 Any ideas on what that might be? 503 00:23:21,220 --> 00:23:25,490 Like, why would we want to use this? 504 00:23:25,490 --> 00:23:27,630 Is this a space? 505 00:23:27,630 --> 00:23:32,440 Well, you kind of can't really get around that. 506 00:23:32,440 --> 00:23:37,250 I mean, it's true that there are these older 507 00:23:37,250 --> 00:23:39,920 versions that you keep forever. 508 00:23:39,920 --> 00:23:43,130 But, there are organizations you can 509 00:23:43,130 --> 00:23:45,800 bring to bear that's [UNINTELLIGIBLE] 510 00:23:45,800 --> 00:23:51,140 beneath these old version that you can't really care about 511 00:23:51,140 --> 00:23:53,820 anymore because really the [UNINTELLIGIBLE] 512 00:23:53,820 --> 00:23:57,020 requires, at least for [UNINTELLIGIBLE PHRASE] 513 00:23:57,020 --> 00:24:01,300 about this when we talk about isolation tomorrow. 514 00:24:01,300 --> 00:24:03,970 But really, the [UNINTELLIGIBLE] only 515 00:24:03,970 --> 00:24:07,180 requires for a single action case 516 00:24:07,180 --> 00:24:09,310 the last committed version. 517 00:24:09,310 --> 00:24:14,660 So, you could garbage collect this stuff if you want. 518 00:24:14,660 --> 00:24:16,800 Yeah, it's really slow. 519 00:24:16,800 --> 00:24:20,000 So, for applications where you care 520 00:24:20,000 --> 00:24:22,670 about performance, a reasonable performance, 521 00:24:22,670 --> 00:24:25,880 [UNINTELLIGIBLE PHRASE] this is really slow. 522 00:24:25,880 --> 00:24:30,160 And naturally, it's not to say that this 523 00:24:30,160 --> 00:24:36,570 is a bad idea, an idea that shouldn't be used at all. 
524 00:24:36,570 --> 00:24:41,910 In fact, it's a perfectly good idea for many cases 525 00:24:41,910 --> 00:24:45,120 where you might, for various reasons, 526 00:24:45,120 --> 00:24:49,390 want to store restorative records of old data 527 00:24:49,390 --> 00:24:54,740 and you don't care about fast read or write performance. 528 00:24:54,740 --> 00:24:58,480 So it's perfectly good for certain applications. 529 00:24:58,480 --> 00:25:02,654 But it's not good for applications that want 530 00:25:02,654 --> 00:25:03,820 reasonably high performance. 531 00:25:03,820 --> 00:25:08,100 And the reason that this thing is slow 532 00:25:08,100 --> 00:25:11,840 is because if you think about it, 533 00:25:11,840 --> 00:25:17,720 it actually optimizes what you might think of as the uncommon case 534 00:25:17,720 --> 00:25:24,130 because what it ensures is that when you fail and you recover, 535 00:25:24,130 --> 00:25:27,340 you have to do no work. 536 00:25:27,340 --> 00:25:32,150 So crash recovery is really fast in this approach 537 00:25:32,150 --> 00:25:35,830 because there's nothing to be done for crash recovery. 538 00:25:35,830 --> 00:25:39,482 But reads and writes are slow because a read involves 539 00:25:39,482 --> 00:25:40,440 [traversing?] the list. 540 00:25:40,440 --> 00:25:44,070 A write involves [UNINTELLIGIBLE PHRASE]. 541 00:25:44,070 --> 00:25:46,770 And so, it almost optimizes the opposite 542 00:25:46,770 --> 00:25:47,730 of what you would want. 543 00:25:47,730 --> 00:25:49,105 If you want good performance, 544 00:25:49,105 --> 00:25:51,740 you want to follow the principle of optimizing the common case.
545 00:25:51,740 --> 00:25:59,480 And in order to optimize the common case, what it means, 546 00:25:59,480 --> 00:26:08,760 what you want to do here is to make the reads and writes 547 00:26:08,760 --> 00:26:13,760 really fast, and maybe pay the penalty 548 00:26:13,760 --> 00:26:20,180 of a little bit of extra turning in doing 549 00:26:20,180 --> 00:26:21,610 [UNINTELLIGIBLE PHRASE]. 550 00:26:21,610 --> 00:26:23,750 It's working now? 551 00:26:23,750 --> 00:26:25,180 [LAUGHTER] Hello? 552 00:26:25,180 --> 00:26:27,320 All right, thanks. 553 00:26:27,320 --> 00:26:32,910 OK, so what you want to do is optimize, whoa, it's loud. 554 00:26:32,910 --> 00:26:36,100 The integral of the volume over time is correct. 555 00:26:43,670 --> 00:26:47,450 OK, so the solution to this problem 556 00:26:47,450 --> 00:26:49,920 where we want to optimize the common case of reads 557 00:26:49,920 --> 00:26:54,010 and writes, but we are OK taking a bunch of time 558 00:26:54,010 --> 00:26:56,870 to do crash recovery is an idea called logging. 559 00:27:04,890 --> 00:27:09,360 So the way to think of a log is it's like a version history 560 00:27:09,360 --> 00:27:13,860 except you don't have a version for each variable. 561 00:27:13,860 --> 00:27:17,930 You think of it as an interleaved version data structure 562 00:27:17,930 --> 00:27:21,520 that interleaves all the version histories for all of the data 563 00:27:21,520 --> 00:27:24,860 that was ever written during an action, 564 00:27:24,860 --> 00:27:27,100 during all of the actions that ran. 565 00:27:27,100 --> 00:27:30,042 So what this means is that you can write the log sequentially. 566 00:27:30,042 --> 00:27:31,750 And you've seen this in yesterday's paper 567 00:27:31,750 --> 00:27:34,120 where they use logs for a different application 568 00:27:34,120 --> 00:27:37,930 for high performance in a file system for a system 569 00:27:37,930 --> 00:27:43,081 where writes normally would incur a lot of seeks.
570 00:27:43,081 --> 00:27:44,330 But you can use the same idea. 571 00:27:44,330 --> 00:27:48,070 In this case, we're going to use a log for crash recovery. 572 00:27:48,070 --> 00:27:50,670 But the fundamental property of a log data structure 573 00:27:50,670 --> 00:27:53,760 is that it needs to be written only sequentially. 574 00:27:53,760 --> 00:27:56,090 And we know that disks do that pretty fast. 575 00:27:56,090 --> 00:27:58,170 It's only when you have to seek and read 576 00:27:58,170 --> 00:28:00,600 small chunks of data with seeks that you end up 577 00:28:00,600 --> 00:28:03,580 being really slow. 578 00:28:03,580 --> 00:28:08,360 So we're going to use cell storage to satisfy our reads 579 00:28:08,360 --> 00:28:10,912 and writes. 580 00:28:10,912 --> 00:28:12,870 So all of those are going to go to cell store. 581 00:28:12,870 --> 00:28:14,911 [You don't read?] means you just read a variable. 582 00:28:14,911 --> 00:28:16,930 You don't traverse any linked lists. And for writes, 583 00:28:16,930 --> 00:28:18,350 you don't create any new versions. 584 00:28:18,350 --> 00:28:22,720 You just write into cell store. 585 00:28:22,720 --> 00:28:32,540 But then the log is going to be stored on a nonvolatile medium 586 00:28:32,540 --> 00:28:33,620 such as a disk. 587 00:28:33,620 --> 00:28:36,690 And it's written sequentially. 588 00:28:45,500 --> 00:28:52,390 So once we have those two, our plan is going to be as follows. 589 00:28:52,390 --> 00:28:55,440 And this plan is the same plan that's adopted. 590 00:28:55,440 --> 00:28:58,450 Although there are dozens of ways of doing 591 00:28:58,450 --> 00:29:01,340 log-based crash recovery, they all essentially follow 592 00:29:01,340 --> 00:29:04,740 the same basic plan. 593 00:29:04,740 --> 00:29:07,180 You read and write normally to cell storage. 594 00:29:07,180 --> 00:29:09,865 And you also write a copy of what 595 00:29:09,865 --> 00:29:10,990 you're reading and writing.
596 00:29:10,990 --> 00:29:13,180 You write an encoding of what you're writing, 597 00:29:13,180 --> 00:29:15,970 any updates that you make into the log. 598 00:29:15,970 --> 00:29:18,130 OK, and we'll talk in more detail 599 00:29:18,130 --> 00:29:20,050 about what exactly you write into the log 600 00:29:20,050 --> 00:29:22,530 and when you write into the log, OK? 601 00:29:22,530 --> 00:29:25,460 So that allows us to follow this golden rule of recoverability. 602 00:29:25,460 --> 00:29:28,050 It'll turn out that the log is a copy of the data. 603 00:29:28,050 --> 00:29:30,800 So you always have two copies of the data: one in cell storage, 604 00:29:30,800 --> 00:29:31,690 one on the log. 605 00:29:36,170 --> 00:29:39,340 So what happens when you fail? 606 00:29:39,340 --> 00:29:42,080 Well, when you fail, unlike in the version history case where 607 00:29:42,080 --> 00:29:45,900 you could fail and restart, and you don't have to do anything, 608 00:29:45,900 --> 00:29:52,750 here when you fail, the system runs a recovery procedure. 609 00:29:52,750 --> 00:29:55,480 And that recovery procedure recovers from the log 610 00:29:55,480 --> 00:29:57,384 that we have conveniently arranged to write 611 00:29:57,384 --> 00:29:58,550 in the non-volatile storage. 612 00:29:58,550 --> 00:30:01,070 So, it remains even after a crash, 613 00:30:01,070 --> 00:30:05,129 and it remains after a crash recovers. 614 00:30:05,129 --> 00:30:07,670 And there are two things to do while recovering from the log. 615 00:30:10,460 --> 00:30:15,819 For actions that didn't get to finish the commit, for actions 616 00:30:15,819 --> 00:30:17,860 that were uncommitted, which is that commit never 617 00:30:17,860 --> 00:30:21,230 returned, what we have to do is to look carefully 618 00:30:21,230 --> 00:30:26,100 to see whether the corresponding cell store had any updates that 619 00:30:26,100 --> 00:30:27,360 were made to it.
620 00:30:27,360 --> 00:30:28,900 And it'll turn out that the log is 621 00:30:28,900 --> 00:30:31,660 going to help us keep track of what items were updated 622 00:30:31,660 --> 00:30:33,160 by any given action. 623 00:30:33,160 --> 00:30:35,220 And what we're going to end up doing 624 00:30:35,220 --> 00:30:40,150 is for uncommitted actions, we're going to back out. 625 00:30:43,969 --> 00:30:46,510 In other words, we're going to undo any changes that it made, 626 00:30:46,510 --> 00:30:48,176 and the log is going to help us do that. 627 00:30:51,470 --> 00:30:54,330 And conversely, for committed actions, 628 00:30:54,330 --> 00:30:57,320 because the semantics we want are that once committed, 629 00:30:57,320 --> 00:31:01,490 you would like the changes to be visible to other people. 630 00:31:01,490 --> 00:31:03,610 For committed actions, what you would like to do 631 00:31:03,610 --> 00:31:05,630 are to make sure that the changes made 632 00:31:05,630 --> 00:31:07,820 by all committed actions are in fact 633 00:31:07,820 --> 00:31:10,880 installed in the cell store. 634 00:31:10,880 --> 00:31:12,860 And what this means is that if they turn out 635 00:31:12,860 --> 00:31:14,937 to not have been installed, and we're 636 00:31:14,937 --> 00:31:17,520 going to use the log to tell us whether they've been installed 637 00:31:17,520 --> 00:31:19,640 or not, we will redo those actions. 638 00:31:25,720 --> 00:31:27,820 And, the second thing we need to do 639 00:31:27,820 --> 00:31:31,720 is what happens if an abort is called 640 00:31:31,720 --> 00:31:34,880 either by the application or by the system. 641 00:31:34,880 --> 00:31:40,310 Well, in this case, what we have to do is to use the log, 642 00:31:40,310 --> 00:31:41,780 and to keep track, the log is going 643 00:31:41,780 --> 00:31:44,860 to help us keep track of the changes made by this action 644 00:31:44,860 --> 00:31:46,760 to the cell store. 
645 00:31:46,760 --> 00:31:49,459 The cell store itself doesn't have a notion of old or new 646 00:31:49,459 --> 00:31:50,500 because it's overwritten. 647 00:31:50,500 --> 00:31:52,270 So the log is going to tell us that. 648 00:31:52,270 --> 00:31:53,890 And when abort is called, we just 649 00:31:53,890 --> 00:31:58,120 want to back out by undoing the changes of the current action. 650 00:32:04,447 --> 00:32:05,280 And that's the plan. 651 00:32:09,154 --> 00:32:10,820 So the first thing we need to figure out 652 00:32:10,820 --> 00:32:12,070 is what this log looks like. 653 00:32:16,310 --> 00:32:18,180 So as we saw from this discussion, 654 00:32:18,180 --> 00:32:21,120 the log is going to be required for us to do two things. 655 00:32:21,120 --> 00:32:23,520 We're going to be undoing things from the log, 656 00:32:23,520 --> 00:32:28,260 and we're going to be redoing things from the log. 657 00:32:28,260 --> 00:32:32,440 So what that suggests is that any time you update cell store, 658 00:32:32,440 --> 00:32:34,930 you change X from 17 to 25. 659 00:32:34,930 --> 00:32:36,980 What you'd really like to maintain 660 00:32:36,980 --> 00:32:40,520 is what the value was before the change was made so that you can 661 00:32:40,520 --> 00:32:43,280 undo if you need to, and what the value 662 00:32:43,280 --> 00:32:46,940 is after the change was made so that you can redo if you have 663 00:32:46,940 --> 00:32:50,230 to if by chance the actual cell store didn't 664 00:32:50,230 --> 00:32:51,850 get written at the right time. 665 00:32:51,850 --> 00:32:54,450 So really the way to think about log-based crash 666 00:32:54,450 --> 00:32:56,420 recovery is that the log is really 667 00:32:56,420 --> 00:32:59,100 the authoritative version of the data. 668 00:32:59,100 --> 00:33:01,970 The cell store itself you should think of as a cache. 669 00:33:01,970 --> 00:33:03,424 And we've seen this idea before.
670 00:33:03,424 --> 00:33:05,340 The cell store you should think of as a cache. 671 00:33:05,340 --> 00:33:07,630 If a failure happens, you really have 672 00:33:07,630 --> 00:33:09,730 to be careful about trusting the cell store. 673 00:33:09,730 --> 00:33:12,280 And, you don't trust what's in the cell store. 674 00:33:12,280 --> 00:33:15,050 You start with a log, and by selectively 675 00:33:15,050 --> 00:33:16,890 undoing certain changes that were made 676 00:33:16,890 --> 00:33:19,090 and redoing certain changes, you produce 677 00:33:19,090 --> 00:33:22,970 a more pristine, correct version of the data, which corresponds 678 00:33:22,970 --> 00:33:25,430 to the changes made by all the committed actions being 679 00:33:25,430 --> 00:33:29,120 visible, and the changes made by all the uncommitted actions 680 00:33:29,120 --> 00:33:35,290 being wiped away to the previous version. 681 00:33:35,290 --> 00:33:37,040 OK, so what does the log look like? 682 00:33:37,040 --> 00:33:40,200 Well, as I've already said, a log is like a version history 683 00:33:40,200 --> 00:33:41,900 except it interleaves everything, 684 00:33:41,900 --> 00:33:42,960 and it's sequential. 685 00:33:42,960 --> 00:33:44,870 So it's really an append-only data structure. 686 00:33:48,410 --> 00:33:58,240 And there are a few different kinds of records 687 00:33:58,240 --> 00:34:00,110 that the log maintains. 688 00:34:00,110 --> 00:34:03,480 In particular, two are going to be interesting to us. 689 00:34:03,480 --> 00:34:10,550 So there are two types of records that we care about. 690 00:34:10,550 --> 00:34:14,860 The first type are update records, 691 00:34:14,860 --> 00:34:18,750 which are written to the log whenever 692 00:34:18,750 --> 00:34:22,050 a cell store item changes. 693 00:34:22,050 --> 00:34:25,860 So, if X goes from 17 to 25, what you would write 694 00:34:25,860 --> 00:34:27,850 is an update record that looks like this.
695 00:34:27,850 --> 00:34:31,989 You store the ID of the transaction, 696 00:34:31,989 --> 00:34:35,560 sorry, ID of the recoverable action that did the update. 697 00:34:35,560 --> 00:34:38,850 And then, you store two items. 698 00:34:38,850 --> 00:34:42,960 One of them is an undo item or an undo action, actually. 699 00:34:42,960 --> 00:34:49,140 And, an undo that might [save/say?], and a redo action. 700 00:34:54,610 --> 00:34:57,070 So what this means here is that let's say 701 00:34:57,070 --> 00:34:59,550 that the actual step of this action 702 00:34:59,550 --> 00:35:04,900 said X is assigned to some value, new. 703 00:35:04,900 --> 00:35:06,500 In the log, what you would write is 704 00:35:06,500 --> 00:35:09,220 keep track of old value, the current value of X, 705 00:35:09,220 --> 00:35:12,021 and make that the undo step. 706 00:35:12,021 --> 00:35:14,020 And then, keep track of the change that was made 707 00:35:14,020 --> 00:35:17,860 and make that the redo step. 708 00:35:17,860 --> 00:35:21,600 So now, after doing this, if the system were to fail, 709 00:35:21,600 --> 00:35:26,070 and this action 172 were to never commit then 710 00:35:26,070 --> 00:35:28,250 you can systematically start with the log, 711 00:35:28,250 --> 00:35:29,950 start with the latest item in the log 712 00:35:29,950 --> 00:35:34,670 and go backwards, and undo any changes made 713 00:35:34,670 --> 00:35:37,110 by actions that didn't commit. 714 00:35:37,110 --> 00:35:39,970 And conversely, and you might need to do this as well, 715 00:35:39,970 --> 00:35:42,780 you might want to look at all the actions that committed, 716 00:35:42,780 --> 00:35:45,480 and make sure that all those actions, those individual steps 717 00:35:45,480 --> 00:35:48,735 in those actions are redone so that once the crash recovers, 718 00:35:48,735 --> 00:35:50,360 you have a correct version of the data.
719 00:35:53,240 --> 00:35:55,170 Now the other thing that you will need, 720 00:35:55,170 --> 00:36:00,970 and you'll see why in a moment, is another kind of record 721 00:36:00,970 --> 00:36:07,214 in the log, which we're going to call the outcome record. 722 00:36:07,214 --> 00:36:08,630 And this outcome is the thing that 723 00:36:08,630 --> 00:36:11,554 keeps track of whether an action committed or not. 724 00:36:11,554 --> 00:36:13,720 Remember I said you're going to look through the log 725 00:36:13,720 --> 00:36:15,390 and figure out which actions committed, 726 00:36:15,390 --> 00:36:16,580 and which didn't commit. 727 00:36:16,580 --> 00:36:18,220 You need to store that somewhere. 728 00:36:18,220 --> 00:36:21,011 In particular, what that means is that when an action commits, 729 00:36:21,011 --> 00:36:23,510 you had better make sure that there is an item in the log 730 00:36:23,510 --> 00:36:25,660 because the log really is the only correct version 731 00:36:25,660 --> 00:36:26,900 of the data. 732 00:36:26,900 --> 00:36:29,100 So you have an outcome record, and this 733 00:36:29,100 --> 00:36:31,490 has an ID of the action. 734 00:36:31,490 --> 00:36:34,470 It might be 174. 735 00:36:34,470 --> 00:36:39,550 And, there's a status that might say committed. 736 00:36:42,450 --> 00:36:45,230 And other values for the status might be aborted 737 00:36:45,230 --> 00:36:49,250 is a possible value of the status. 738 00:36:49,250 --> 00:36:50,770 Another is pending. 739 00:36:50,770 --> 00:36:56,770 So for various reasons, what we will 740 00:36:56,770 --> 00:37:00,380 have is when begin recoverable action returns with an ID, 741 00:37:00,380 --> 00:37:03,360 we will create a log entry that says 742 00:37:03,360 --> 00:37:05,556 that this action has begun. 743 00:37:05,556 --> 00:37:06,930 So you might have a begin record. 744 00:37:06,930 --> 00:37:10,200 It's not that important to worry about for now.
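The two record types and the undo/redo idea can be put together in a small Python sketch. It is an illustration under assumptions, not the exact recovery procedure the lecture goes on to develop: the names Update, Outcome, and recover are hypothetical, the log is a plain in-memory list standing in for an append-only log on disk, and actions are assumed to have run one at a time. The ordering used here (undo uncommitted updates scanning backward, then redo committed updates scanning forward) is one simple choice.

```python
from dataclasses import dataclass


@dataclass
class Update:
    action_id: int
    name: str      # which cell-store item changed
    undo: object   # old value, used to back the change out
    redo: object   # new value, used to reinstall the change


@dataclass
class Outcome:
    action_id: int
    status: str    # "committed" or "aborted"


def recover(log, cell_store):
    """Rebuild a trustworthy cell store from the log after a crash."""
    committed = {r.action_id for r in log
                 if isinstance(r, Outcome) and r.status == "committed"}
    # Backward pass: back out every update whose action never committed.
    for r in reversed(log):
        if isinstance(r, Update) and r.action_id not in committed:
            cell_store[r.name] = r.undo
    # Forward pass: make sure every committed update is installed.
    for r in log:
        if isinstance(r, Update) and r.action_id in committed:
            cell_store[r.name] = r.redo
    return cell_store
```

For example, with the log `[Update(174, "y", 1, 2), Outcome(174, "committed"), Update(172, "x", 17, 25)]` and a post-crash cell store of `{"x": 25, "y": 1}`, the backward pass restores X to 17 because action 172 has no commit record, and the forward pass installs Y = 2 because action 174 committed.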
745 00:37:10,200 --> 00:37:13,540 But the status of a committed record and an aborted, 746 00:37:13,540 --> 00:37:17,300 and the update type are important to understand. 747 00:37:20,220 --> 00:37:23,800 So once you have this log structure understood, 748 00:37:23,800 --> 00:37:26,390 or the log data structure understood, 749 00:37:26,390 --> 00:37:28,780 what you have to think about are, there 750 00:37:28,780 --> 00:37:31,520 are two questions that you end up spending a lot of time 751 00:37:31,520 --> 00:37:35,430 thinking about in designing these log-based protocols. 752 00:37:35,430 --> 00:37:37,830 The first one is when to write the log. 753 00:37:45,040 --> 00:37:47,570 And the second one is, you know, I sort of 754 00:37:47,570 --> 00:37:49,030 said you just look through the log 755 00:37:49,030 --> 00:37:50,980 and undo the guys who didn't commit, 756 00:37:50,980 --> 00:37:52,990 and redo the people who committed. 757 00:37:52,990 --> 00:37:55,150 But you have to be very careful about doing that. 758 00:37:55,150 --> 00:37:58,020 And that corresponds to this question 759 00:37:58,020 --> 00:38:03,150 of exactly how to recover, how to systematically recover 760 00:38:03,150 --> 00:38:05,920 so the state of the system is as I have described before. 761 00:38:08,054 --> 00:38:09,970 So those are the questions we're going to deal 762 00:38:09,970 --> 00:38:11,220 with for the next few minutes. 763 00:38:15,890 --> 00:38:18,580 Let's do this with a specific example. 764 00:38:18,580 --> 00:38:20,190 And it will turn out the answer 765 00:38:20,190 --> 00:38:21,770 doesn't really depend on the example. 766 00:38:21,770 --> 00:38:25,680 But the example is good to give you the right intuition. 767 00:38:25,680 --> 00:38:28,690 And this example is actually a pretty common example 768 00:38:28,690 --> 00:38:30,140 of a disk-bound database.
769 00:38:34,800 --> 00:38:40,270 So a disk-bound database is one where 770 00:38:40,270 --> 00:38:43,930 you have applications writing to a database, which 771 00:38:43,930 --> 00:38:47,000 is where the cell storage is implemented. 772 00:38:47,000 --> 00:38:48,560 And the cell storage is on disk. 773 00:38:52,020 --> 00:38:57,680 So, you might have writes of cell items, X, 774 00:38:57,680 --> 00:39:00,240 and they go to a database. 775 00:39:00,240 --> 00:39:03,510 And similarly, in any disk-bound database 776 00:39:03,510 --> 00:39:05,240 that you want crash recovery for, 777 00:39:05,240 --> 00:39:06,600 you need to maintain a log. 778 00:39:06,600 --> 00:39:09,699 And for various reasons having to do primarily 779 00:39:09,699 --> 00:39:11,990 with dealing with failures of the disk hardware itself, 780 00:39:11,990 --> 00:39:15,020 it's very often useful to an experience 781 00:39:15,020 --> 00:39:18,360 to maintain the log on a different disk. 782 00:39:18,360 --> 00:39:20,230 So we'll maintain for this example 783 00:39:20,230 --> 00:39:22,330 the log on a different disk. 784 00:39:22,330 --> 00:39:26,970 So whenever write X is done, just looking at the log data 785 00:39:26,970 --> 00:39:31,030 structure, you need to write an update record 786 00:39:31,030 --> 00:39:33,160 and append that to the log. 787 00:39:33,160 --> 00:39:36,530 So at some point you would need to write this to the log. 788 00:39:36,530 --> 00:39:40,700 You need to log the update -- 789 00:39:40,700 --> 00:39:47,650 -- that says that X changed from something to something else. 790 00:39:47,650 --> 00:39:52,049 So the question is, when do you write both of these? 791 00:39:52,049 --> 00:39:54,340 So one approach might be that it really doesn't matter. 792 00:39:54,340 --> 00:39:56,990 As long as the log gets the data, you're fine. 793 00:39:56,990 --> 00:39:59,180 But that has a couple of problems.
794 00:39:59,180 --> 00:40:02,140 In particular, suppose you write X 795 00:40:02,140 --> 00:40:04,064 without writing the log entry. 796 00:40:04,064 --> 00:40:06,230 And as soon as you write X, before you have a chance 797 00:40:06,230 --> 00:40:10,140 to write to the log, you crash, or the system 798 00:40:10,140 --> 00:40:14,420 causes this program to abort, or the program itself aborts. 799 00:40:14,420 --> 00:40:17,250 It writes X and then it does some calculation 800 00:40:17,250 --> 00:40:20,300 and then it decides to abort. 801 00:40:20,300 --> 00:40:25,260 Now you are in trouble because the log hasn't kept track yet, 802 00:40:25,260 --> 00:40:27,360 the log hasn't had a chance of keeping 803 00:40:27,360 --> 00:40:31,240 track of what the old value was, which 804 00:40:31,240 --> 00:40:32,930 means that if you really want to restore 805 00:40:32,930 --> 00:40:36,730 this database by undoing this write to X, 806 00:40:36,730 --> 00:40:38,380 you have to do a whole lot of work. 807 00:40:38,380 --> 00:40:40,110 And it might be impossible to do it. 808 00:40:40,110 --> 00:40:42,900 If you didn't know, for example, what the current value was, 809 00:40:42,900 --> 00:40:44,470 there was absolutely no way for you 810 00:40:44,470 --> 00:40:48,890 to restore to the old value. 811 00:40:48,890 --> 00:40:52,910 So what this suggests is that you better not write 812 00:40:52,910 --> 00:40:55,550 to the cell store before you write to the log 813 00:40:55,550 --> 00:40:59,350 because if you wrote to the cell store before the log write, 814 00:40:59,350 --> 00:41:03,190 and the system crashed right after, or there was an abort, 815 00:41:03,190 --> 00:41:05,370 you won't really have a way in general 816 00:41:05,370 --> 00:41:09,010 of reverting to the version of the data item 817 00:41:09,010 --> 00:41:10,429 before this write. 818 00:41:10,429 --> 00:41:12,470 And you do need to revert because it just aborted 819 00:41:12,470 --> 00:41:12,970 or failed.
820 00:41:12,970 --> 00:41:18,280 So you need to back out of all changes that were made. 821 00:41:18,280 --> 00:41:21,630 So that suggests the first part of our protocol 822 00:41:21,630 --> 00:41:23,615 which we are going to call the WAL protocol. 823 00:41:26,809 --> 00:41:29,100 Actually, that is the WAL, I mean, not the first part. 824 00:41:29,100 --> 00:41:30,474 This suggests this WAL protocol. 825 00:41:30,474 --> 00:41:32,090 WAL stands for write-ahead logging. 826 00:41:38,940 --> 00:41:46,930 And the protocol says update the log or append to the log 827 00:41:46,930 --> 00:41:50,570 before you write to the cell store. 828 00:41:50,570 --> 00:41:51,630 It's what it says. 829 00:41:51,630 --> 00:41:58,440 Write ahead log says write the log before you write the cell 830 00:41:58,440 --> 00:42:00,500 store. 831 00:42:00,500 --> 00:42:03,630 The advantage of writing the log before you write to the cell 832 00:42:03,630 --> 00:42:09,930 store is that suppose now you set X to some value 833 00:42:09,930 --> 00:42:12,110 and then you crashed. 834 00:42:12,110 --> 00:42:15,400 Then you're guaranteed that if the cell store got written, 835 00:42:15,400 --> 00:42:21,080 the log got written, which means that if this action didn't 836 00:42:21,080 --> 00:42:22,960 commit, you can go through the log 837 00:42:22,960 --> 00:42:26,570 and undo that action because you know that the log entry got 838 00:42:26,570 --> 00:42:29,550 written correctly before the cell store got written. 839 00:42:29,550 --> 00:42:31,300 And if the log entry didn't get written, 840 00:42:31,300 --> 00:42:32,520 then you know the cell store didn't 841 00:42:32,520 --> 00:42:34,853 get written, which means you don't have to undo anything 842 00:42:34,853 --> 00:42:36,810 for that particular data item. 843 00:42:36,810 --> 00:42:39,040 So either way you're fine.
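The write-ahead ordering and the "either way you're fine" argument can be checked with a tiny Python sketch. Everything here is a hypothetical illustration: the names wal_write, undo_action, and Crash, the tuple layout of the update record, and the crash_after_log flag are assumptions, and the in-memory list and dict stand in for a log and a cell store that would really live on disk.

```python
class Crash(Exception):
    """Simulated failure between the log append and the cell-store write."""


def wal_write(log, cell_store, action_id, name, new_value, crash_after_log=False):
    old_value = cell_store.get(name)
    # Step 1: append the update record (with old and new values) to the log.
    log.append((action_id, name, old_value, new_value))
    if crash_after_log:
        raise Crash()
    # Step 2: only now install the new value in the cell store.
    cell_store[name] = new_value


def undo_action(log, cell_store, action_id):
    """Back out one action by scanning the log from newest to oldest."""
    for rec_id, name, old_value, _new_value in reversed(log):
        if rec_id == action_id:
            cell_store[name] = old_value
```

If the crash lands between the two steps, the cell store still holds the old value and the undo is a harmless no-op. If both steps completed, the logged old value is enough to restore the item. The case the protocol rules out is a changed cell store with no log record describing the change.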
844 00:42:39,040 --> 00:42:44,380 There is another part of this protocol 845 00:42:44,380 --> 00:42:46,300 that we're going to need to meet the semantics 846 00:42:46,300 --> 00:42:48,870 of a recoverable action that we wanted, 847 00:42:48,870 --> 00:42:51,550 which is that once you reach commit, 848 00:42:51,550 --> 00:42:54,110 you want the changes made by that action 849 00:42:54,110 --> 00:42:55,990 to be visible to all the other people, 850 00:42:55,990 --> 00:42:59,590 all of the other actions that are subsequent actions. 851 00:42:59,590 --> 00:43:02,990 And what that means is that before you return 852 00:43:02,990 --> 00:43:05,860 from the commit, you had better make sure 853 00:43:05,860 --> 00:43:09,740 that the commit record for this action is logged to the disk, 854 00:43:09,740 --> 00:43:15,950 is logged, because if you didn't do that, and you just returned, 855 00:43:15,950 --> 00:43:23,150 then you can't be guaranteed that all of the writes that 856 00:43:23,150 --> 00:43:25,270 were done to the cell item were actually 857 00:43:25,270 --> 00:43:26,780 put on to the cell store. 858 00:43:26,780 --> 00:43:28,530 There's no guarantee that these writes 859 00:43:28,530 --> 00:43:30,640 to the cell store actually got written to the cell 860 00:43:30,640 --> 00:43:32,640 store because all you are doing in this protocol 861 00:43:32,640 --> 00:43:34,600 is ensuring that the writes to the log 862 00:43:34,600 --> 00:43:36,150 are being written before the writes to the data. 
863 00:43:36,150 --> 00:43:38,233 Nobody is saying when the writes of the cell store 864 00:43:38,233 --> 00:43:39,880 really are happening and finishing, 865 00:43:39,880 --> 00:43:45,080 which means if the action commits, 866 00:43:45,080 --> 00:43:48,180 and you return committed to the user, to the application, 867 00:43:48,180 --> 00:43:50,280 then you had better have a way of making sure 868 00:43:50,280 --> 00:43:51,660 that if a failure now happened, 869 00:43:51,660 --> 00:43:54,290 the system when it recovers knows 870 00:43:54,290 --> 00:43:58,070 that this action committed, which means it follows, then, 871 00:43:58,070 --> 00:44:01,100 that if you want those semantics, you'd better 872 00:44:01,100 --> 00:44:05,490 write the commit record, the fact that this action committed, 873 00:44:05,490 --> 00:44:08,200 to the log before the commit returns. 874 00:44:08,200 --> 00:44:10,950 And really the only reason you need that is that 875 00:44:10,950 --> 00:44:13,520 we've established, we've decided, that we wanted the semantics 876 00:44:13,520 --> 00:44:15,978 that if an action commits, you want the results to be 877 00:44:15,978 --> 00:44:17,150 visible to everybody else. 878 00:44:17,150 --> 00:44:20,640 And later on, we'll see that this is related 879 00:44:20,640 --> 00:44:23,730 to this notion of durability. 880 00:44:23,730 --> 00:44:30,620 So write commit record before -- 881 00:44:35,120 --> 00:44:46,880 returning from commit. 882 00:44:46,880 --> 00:44:49,890 So two main ideas: write-ahead logging means 883 00:44:49,890 --> 00:44:52,060 make sure that you write the log, append to the log 884 00:44:52,060 --> 00:44:54,070 before you write to the cell store.
885 00:44:54,070 --> 00:44:57,400 And in order to make sure that committed actions, the results 886 00:44:57,400 --> 00:45:00,010 of committed actions are visible even after failure 887 00:45:00,010 --> 00:45:02,254 to subsequent actions, log the commit record 888 00:45:02,254 --> 00:45:03,670 before you return from the commit. 889 00:45:11,070 --> 00:45:12,570 So now we are actually in good shape 890 00:45:12,570 --> 00:45:17,950 to specify this recovery procedure 891 00:45:17,950 --> 00:45:20,870 that I've alluded to before because the log is 892 00:45:20,870 --> 00:45:23,940 going to contain these update records and these outcome 893 00:45:23,940 --> 00:45:25,390 records. 894 00:45:25,390 --> 00:45:27,830 And that's going to allow us to decide 895 00:45:27,830 --> 00:45:30,496 what to do upon crash recovery. 896 00:45:30,496 --> 00:45:31,870 And actually the only other piece 897 00:45:31,870 --> 00:45:35,299 we need is to decide what happens on an abort. 898 00:45:35,299 --> 00:45:37,090 And that's actually pretty straightforward. 899 00:45:37,090 --> 00:45:39,390 If the system calls abort, or if the user application 900 00:45:39,390 --> 00:45:42,840 calls abort on an action, what abort has to do 901 00:45:42,840 --> 00:45:44,500 is to look through the log. 902 00:45:44,500 --> 00:45:47,420 Remember that all of the writes have been written to the log. 903 00:45:47,420 --> 00:45:49,314 Any time a write happens, you don't actually 904 00:45:49,314 --> 00:45:50,730 care about when the write actually 905 00:45:50,730 --> 00:45:52,180 happens at the cell store. 906 00:45:52,180 --> 00:45:56,120 What you care about is that the write happens to the log 907 00:45:56,120 --> 00:45:58,600 before the write happens to the cell store.
908 00:45:58,600 --> 00:46:01,160 So, if an abort were called, all you have to do 909 00:46:01,160 --> 00:46:03,890 is to ensure that before abort returns, 910 00:46:03,890 --> 00:46:08,720 all of the actions done by, all of the steps taken 911 00:46:08,720 --> 00:46:12,250 by this action are undone, and the corresponding cell values 912 00:46:12,250 --> 00:46:12,920 are undone. 913 00:46:15,860 --> 00:46:19,010 And that's all you have to do when you implement abort. 914 00:46:22,550 --> 00:46:27,880 So one thing that I haven't really specified very clearly 915 00:46:27,880 --> 00:46:30,310 is when the actual writes happen to the disk 916 00:46:30,310 --> 00:46:31,960 or to any cell store. 917 00:46:31,960 --> 00:46:36,540 And it turns out that it really doesn't matter. 918 00:46:36,540 --> 00:46:38,791 If there's no failure, as long as you ensure, 919 00:46:38,791 --> 00:46:40,290 you could have caches in the middle. 920 00:46:40,290 --> 00:46:41,498 You could have anything else. 921 00:46:41,498 --> 00:46:44,940 So, as long as you ensure that, if there's no concurrency, 922 00:46:44,940 --> 00:46:46,290 we'll deal with that next time. 923 00:46:46,290 --> 00:46:47,995 But as long as you ensure that when 924 00:46:47,995 --> 00:46:50,120 you have actions that come one after the other that 925 00:46:50,120 --> 00:46:53,390 are recoverable, that the values that are read 926 00:46:53,390 --> 00:46:57,810 are only the values that were written by previously 927 00:46:57,810 --> 00:47:00,350 committed actions, then it really 928 00:47:00,350 --> 00:47:03,750 doesn't matter when those were actually written to disk. 929 00:47:03,750 --> 00:47:06,610 [NOISE OBSCURES] main thing that matters is 930 00:47:06,610 --> 00:47:11,040 make sure the log keeps track exactly of all the things 931 00:47:11,040 --> 00:47:13,180 to undo for uncommitted actions.
932 00:47:13,180 --> 00:47:14,680 And for things that got committed, 933 00:47:14,680 --> 00:47:19,610 to make sure that the log keeps track of the commit record 934 00:47:19,610 --> 00:47:20,780 before the commit returns. 935 00:47:24,730 --> 00:47:27,430 So given this story, the way the recovery procedure 936 00:47:27,430 --> 00:47:28,830 works is the following. 937 00:47:28,830 --> 00:47:31,676 The first step is the system fails, and then it recovers. 938 00:47:31,676 --> 00:47:32,800 You scan the log backwards. 939 00:47:39,690 --> 00:47:41,820 And as you are scanning the log backwards, 940 00:47:41,820 --> 00:47:45,430 you keep track of two kinds of actions. 941 00:47:45,430 --> 00:47:50,490 You keep track of actions that were either committed or were 942 00:47:50,490 --> 00:47:52,350 aborted, OK? 943 00:47:52,350 --> 00:47:55,620 And what that means is that for actions 944 00:47:55,620 --> 00:47:58,600 that were committed or aborted, the cell store 945 00:47:58,600 --> 00:48:01,400 for those actions is in a certain state 946 00:48:01,400 --> 00:48:03,400 or needs to be in a certain state. 947 00:48:03,400 --> 00:48:05,080 For committed actions, it needs to be 948 00:48:05,080 --> 00:48:08,470 in a state that's the result of finishing the committed action. 949 00:48:08,470 --> 00:48:10,410 And for the aborted actions, what it means 950 00:48:10,410 --> 00:48:12,780 is that when the abort returned and there 951 00:48:12,780 --> 00:48:15,310 was an aborted action, abort already 952 00:48:15,310 --> 00:48:17,249 undid the state of the cell store 953 00:48:17,249 --> 00:48:19,540 by the definition of the abort procedure.
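The commit and abort rules stated so far can be sketched roughly as follows. This is only an illustration under stated assumptions: the log is a plain Python list (a real system would force the record to disk before returning), and the function names and the record fields `type`, `action`, `status`, `name`, `undo`, and `redo` are hypothetical.

```python
def commit(log, action_id):
    # The commit record must be in the log before commit returns.
    log.append({"type": "outcome", "action": action_id, "status": "committed"})
    return "committed"


def abort(log, cells, action_id):
    # Walk the log backwards, restoring old values for this action's updates.
    for record in reversed(log):
        if record["type"] == "update" and record["action"] == action_id:
            cells[record["name"]] = record["undo"]
    # Only after the undo is complete is the abort outcome logged.
    log.append({"type": "outcome", "action": action_id, "status": "aborted"})
    return "aborted"


# Action 1 writes X and commits; action 2 writes X and Y, then aborts.
log = [{"type": "update", "action": 1, "name": "X", "undo": 0, "redo": 5}]
commit(log, action_id=1)
log.append({"type": "update", "action": 2, "name": "X", "undo": 5, "redo": 9})
log.append({"type": "update", "action": 2, "name": "Y", "undo": 1, "redo": 2})
cells = {"X": 9, "Y": 2}
abort(log, cells, action_id=2)
```

Because the abort record is appended only after the old values are back in place, a later reader of the log can treat any action with an abort record as already undone.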
954 00:48:19,540 --> 00:48:22,190 So what that means is for log records that 955 00:48:22,190 --> 00:48:24,834 contain a type outcome and the status abort, 956 00:48:24,834 --> 00:48:26,250 you don't have to do anything 957 00:48:26,250 --> 00:48:29,050 because the changes are already undone before 958 00:48:29,050 --> 00:48:31,400 that abort record was written. 959 00:48:31,400 --> 00:48:33,360 So what you do in scanning the log backwards 960 00:48:33,360 --> 00:48:35,550 is you build up two kinds of actions. 961 00:48:35,550 --> 00:48:38,140 You build up winners, which are actions 962 00:48:38,140 --> 00:48:42,355 that were committed or aborted. 963 00:48:45,300 --> 00:48:51,182 And you build up a list of losers that were none of these. 964 00:48:51,182 --> 00:48:52,890 In other words, they were pending actions, 965 00:48:52,890 --> 00:48:56,320 actions that were just pending during the failure, 966 00:48:56,320 --> 00:48:57,340 so they didn't commit. 967 00:48:57,340 --> 00:48:58,506 And they were never aborted. 968 00:49:06,150 --> 00:49:08,160 And so the plan now is to make sure 969 00:49:08,160 --> 00:49:10,820 that the cell store is correctly restored to the state that 970 00:49:10,820 --> 00:49:16,010 was before the crash where all of the committed actions' 971 00:49:16,010 --> 00:49:20,080 results are visible, and none of the uncommitted actions, 972 00:49:20,080 --> 00:49:22,430 you know, all of those are blown away. 973 00:49:22,430 --> 00:49:26,835 All you have to do is to [UNINTELLIGIBLE] 974 00:49:26,835 --> 00:49:27,460 were committed. 975 00:49:27,460 --> 00:49:29,220 You don't have to do anything for the aborted winners 976 00:49:29,220 --> 00:49:30,910 because they were already undone.
977 00:49:30,910 --> 00:49:37,370 So you have to redo committed winners, 978 00:49:37,370 --> 00:49:46,020 and you have to undo any changes made by losers, right, 979 00:49:46,020 --> 00:49:47,680 because these losers by definition 980 00:49:47,680 --> 00:49:50,320 were things that didn't commit and didn't abort. 981 00:49:50,320 --> 00:49:53,350 And the reason you only redo the committed winners rather than 982 00:49:53,350 --> 00:49:55,735 all winners is it makes no sense to redo aborted winners. 983 00:49:55,735 --> 00:49:58,110 And you don't need to undo them because they were already 984 00:49:58,110 --> 00:50:06,410 undone when the abort record was written to the log. 985 00:50:06,410 --> 00:50:08,460 So this is the basic idea for dealing 986 00:50:08,460 --> 00:50:11,350 with one of these databases. 987 00:50:11,350 --> 00:50:13,050 But there's five or six optimizations 988 00:50:13,050 --> 00:50:16,875 that end up making this kind of system go faster. 989 00:50:16,875 --> 00:50:18,750 You'll see some of these optimizations buried 990 00:50:18,750 --> 00:50:22,190 inside the System R paper, which is the discussion for tomorrow. 991 00:50:22,190 --> 00:50:25,750 But what I'll do on Monday, I'll spend five minutes talking 992 00:50:25,750 --> 00:50:28,230 about the most important optimizations, 993 00:50:28,230 --> 00:50:30,440 and I think the whole story will become clear. 994 00:50:30,440 --> 00:50:32,660 So the plan for the subsequent lectures on this topic 995 00:50:32,660 --> 00:50:34,850 is: on Monday we'll deal with isolation, 996 00:50:34,850 --> 00:50:37,580 and on Wednesday we'll continue to talk about isolation, 997 00:50:37,580 --> 00:50:41,140 and then talk about a different issue of consistency.
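The recovery procedure described above (scan the log backwards, classify actions as winners or losers, redo committed winners, undo losers) might be sketched like this. It is a sketch under assumptions, not the lecture's own code: the log is a Python list, the record fields `type`, `action`, `status`, `name`, `undo`, and `redo` are hypothetical, actions run one after the other with no concurrency, and the choice to undo losers during the backward scan and redo winners in a second forward pass is an assumption about ordering that the lecture does not spell out.

```python
def recover(log, cells):
    committed, aborted, losers = set(), set(), set()

    # Backward scan: an action's outcome record is seen before its updates.
    for record in reversed(log):
        action = record["action"]
        if record["type"] == "outcome":
            if record["status"] == "committed":
                committed.add(action)
            else:
                aborted.add(action)
        elif action not in committed and action not in aborted:
            # Update with no outcome record: a loser, so restore the old value.
            losers.add(action)
            cells[record["name"]] = record["undo"]

    # Forward scan: reinstall the values written by committed winners.
    # Aborted winners are skipped; abort already undid them.
    for record in log:
        if record["type"] == "update" and record["action"] in committed:
            cells[record["name"]] = record["redo"]

    return committed, aborted, losers


log = [
    {"type": "update", "action": 1, "name": "X", "undo": 0, "redo": 5},
    {"type": "outcome", "action": 1, "status": "committed"},
    {"type": "update", "action": 2, "name": "Y", "undo": 1, "redo": 2},
    {"type": "outcome", "action": 2, "status": "aborted"},
    {"type": "update", "action": 3, "name": "X", "undo": 5, "redo": 9},
]
cells = {"X": 9, "Y": 1}  # cell store contents found after the crash
committed, aborted, losers = recover(log, cells)
```

In this example action 3 was still pending at the crash, so its write to X is backed out, while action 1's committed value is reinstalled and action 2 is left alone.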