1
00:00:00,770 --> 00:00:06,000
Today's topic is one of
the most important concepts

2
00:00:06,000 --> 00:00:08,940
in this area, and it
is called atomicity.

3
00:00:08,940 --> 00:00:11,020
And what we are going
to do is spend time

4
00:00:11,020 --> 00:00:13,160
understanding what
this is as a concept

5
00:00:13,160 --> 00:00:19,240
and then understanding how to
achieve atomicity in systems.

6
00:00:19,240 --> 00:00:24,120
And recall that the main
goal is to handle "failures",

7
00:00:24,120 --> 00:00:27,636
and that is what we talked
about the last time.

8
00:00:27,636 --> 00:00:30,010
And we came up with a bunch
of different ways of thinking

9
00:00:30,010 --> 00:00:32,509
about failures and
how to cope with them.

10
00:00:32,509 --> 00:00:40,110
And one idea that we saw the
last time was an idea involving

11
00:00:40,110 --> 00:00:42,620
replicating a
component, let's say

12
00:00:42,620 --> 00:00:49,070
a disk or any component
whose failure you

13
00:00:49,070 --> 00:00:51,445
wish to cope with and
vote on the results.

14
00:00:55,420 --> 00:00:58,640
And so the idea is that if
you are not exactly sure what

15
00:00:58,640 --> 00:00:59,890
the right answer should be--

16
00:00:59,890 --> 00:01:02,960
If you are not sure whether
any given component is working

17
00:01:02,960 --> 00:01:05,030
correctly or not,
replicate that component

18
00:01:05,030 --> 00:01:10,440
and then give them all the same
input, see what output appears

19
00:01:10,440 --> 00:01:11,770
and then vote on the results.

20
00:01:11,770 --> 00:01:14,340
And we did see that these
things are pretty sophisticated,

21
00:01:14,340 --> 00:01:16,840
but the main problem
with replicate plus vote

22
00:01:16,840 --> 00:01:20,490
is that often it is extremely
expensive to build and very,

23
00:01:20,490 --> 00:01:22,870
very hard to get right.

24
00:01:22,870 --> 00:01:25,230
And, second, it often
does not actually work.

25
00:01:25,230 --> 00:01:28,790
For example, if you just
take a software program,

26
00:01:28,790 --> 00:01:33,240
a software module and you
make 100 copies or 95 copies

27
00:01:33,240 --> 00:01:34,890
of that software
module and give them

28
00:01:34,890 --> 00:01:37,050
all the same input and
then vote on the output,

29
00:01:37,050 --> 00:01:38,860
if you have a bug in
one of the modules

30
00:01:38,860 --> 00:01:41,290
and it is a bug that
is actually replicated

31
00:01:41,290 --> 00:01:44,010
in all of the modules
then all of the replicas

32
00:01:44,010 --> 00:01:46,230
are going to give you
the same wrong answer.

33
00:01:46,230 --> 00:01:48,930
So the key assumption behind
replicating and voting

34
00:01:48,930 --> 00:01:51,320
is that the replicas are
independent of each other

35
00:01:51,320 --> 00:01:53,730
and have independent
modes of failure.
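To make that assumption concrete, here is a minimal sketch in Python. It is not from the lecture, and the names vote, good_square and buggy_square are made up for illustration. It shows a majority vote masking one faulty replica, but not a bug that is shared by every replica.
```python
from collections import Counter
def vote(outputs):
    """Return the strict-majority output, or None if there is no majority."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None
def good_square(x):
    return x * x
def buggy_square(x):
    return x * x + 1  # hypothetical bug, for illustration only
# Independent failure: one bad replica out of three is outvoted.
independent = vote([f(4) for f in (good_square, good_square, buggy_square)])
# Common-mode failure: the same bug in every replica wins the vote.
common_mode = vote([f(4) for f in (buggy_square, buggy_square, buggy_square)])
print(independent, common_mode)
```
In the second case every replica agrees, so voting cannot tell that the answer is wrong.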

36
00:01:53,730 --> 00:01:58,340
And that may not be true
in all of your modules.

37
00:01:58,340 --> 00:02:00,915
And so the way we are going
to deal with this problem,

38
00:02:00,915 --> 00:02:03,290
and even though it is possible
to design software systems

39
00:02:03,290 --> 00:02:06,090
where the replicas are, in
fact, independent of each other,

40
00:02:06,090 --> 00:02:08,430
it will turn out that
it is quite expensive

41
00:02:08,430 --> 00:02:10,100
to do in many cases.

42
00:02:10,100 --> 00:02:14,470
So what we are going to do, to
relax this assumption of having

43
00:02:14,470 --> 00:02:18,680
a system which handles
failures by giving

44
00:02:18,680 --> 00:02:21,220
the same input to multiple
replicas and then voting on the output,

45
00:02:21,220 --> 00:02:23,610
we are going to relax
that and instead

46
00:02:23,610 --> 00:02:26,595
look at a different concept
called "recoverability".

47
00:02:29,770 --> 00:02:31,880
And the idea here
is, rather

48
00:02:31,880 --> 00:02:36,980
than to try to replicate modules
so that to the higher layers

49
00:02:36,980 --> 00:02:40,200
it looks as if the underlying
module has never failed

50
00:02:40,200 --> 00:02:43,190
because you have replicated
it, the idea here is to allow

51
00:02:43,190 --> 00:02:45,500
the underlying module to fail.

52
00:02:45,500 --> 00:02:48,570
But have it fail, typically
in a fail-fast manner

53
00:02:48,570 --> 00:02:50,640
so that you can
detect the failure,

54
00:02:50,640 --> 00:02:53,050
and then arrange for that
module to be restarted.

55
00:02:53,050 --> 00:02:55,440
And when it restarts
the idea is to make it

56
00:02:55,440 --> 00:02:58,000
so that the module
does something

57
00:02:58,000 --> 00:03:01,250
such that in the end
the state of the system,

58
00:03:01,250 --> 00:03:03,220
after it does that
thing, usually some kind

59
00:03:03,220 --> 00:03:06,900
of recovery procedure,
is such that you can get back

60
00:03:06,900 --> 00:03:08,310
to using that module.

61
00:03:08,310 --> 00:03:10,230
So it is a little bit
like rather than try

62
00:03:10,230 --> 00:03:13,907
to build, you know, the analogy
might be something like this.

63
00:03:13,907 --> 00:03:16,240
You might imagine, let's say
there is a little child who

64
00:03:16,240 --> 00:03:17,310
is learning to walk.

65
00:03:17,310 --> 00:03:19,040
One approach for
nature to have adopted

66
00:03:19,040 --> 00:03:21,498
would have been to try to make
it so the child never falls.

67
00:03:21,498 --> 00:03:23,830
And there is a lot of complexity
associated with always

68
00:03:23,830 --> 00:03:25,385
keeping that child walking.

69
00:03:25,385 --> 00:03:26,760
Or, alternatively,
you could have

70
00:03:26,760 --> 00:03:29,580
a story or a method by
which every once in a while

71
00:03:29,580 --> 00:03:33,380
the child falls but then has a
plan to get up from that fall

72
00:03:33,380 --> 00:03:35,182
and then restart.

73
00:03:35,182 --> 00:03:37,140
So that is the plan that
we are going to adopt.

74
00:03:37,140 --> 00:03:40,522
And this notion here is
called recoverability.

75
00:03:40,522 --> 00:03:41,980
And the general
plan is going to be

76
00:03:41,980 --> 00:03:48,730
that if you have a module M1
which invokes another module M2

77
00:03:48,730 --> 00:03:56,080
and M2 were to fail then
the idea is that M2 fails

78
00:03:56,080 --> 00:03:59,450
and then it recovers and
you restart the module.

79
00:03:59,450 --> 00:04:03,800
And you want to make sure that
M2 is left in a situation,

80
00:04:03,800 --> 00:04:06,460
once it recovers, where
there is no partial state.

81
00:04:06,460 --> 00:04:09,982
And I will define that more
precisely as we go along today.

82
00:04:09,982 --> 00:04:12,190
But the main idea is going
to be to insure that there

83
00:04:12,190 --> 00:04:15,010
is no vestige of
previous computations

84
00:04:15,010 --> 00:04:17,176
that are in the
middle of being run.

85
00:04:17,176 --> 00:04:19,050
So the state of the
system, when it recovers,

86
00:04:19,050 --> 00:04:21,149
is at a well-understood
point so that M1

87
00:04:21,149 --> 00:04:22,960
can continue to use that.

88
00:04:22,960 --> 00:04:25,050
So there is no
"partial" state where

89
00:04:25,050 --> 00:04:27,160
partial is in quotes here.

90
00:04:27,160 --> 00:04:29,960
And we will talk about
what it means for something

91
00:04:29,960 --> 00:04:31,969
to be in a partial state.

92
00:04:31,969 --> 00:04:33,760
The idea is to prevent
that from happening.

93
00:04:39,040 --> 00:04:41,830
So we are going to
do this by starting

94
00:04:41,830 --> 00:04:43,400
with an example,
and the same example

95
00:04:43,400 --> 00:04:49,410
that I mentioned the last time
which was a transfer of money

96
00:04:49,410 --> 00:04:52,797
from one bank
account to another.

97
00:04:52,797 --> 00:04:54,880
There is a "from" account,
there is a "to" account

98
00:04:54,880 --> 00:05:00,010
and some dollar "amount".

99
00:05:00,010 --> 00:05:03,820
And you want to transfer
money from "from" to "to"

100
00:05:03,820 --> 00:05:06,292
and it is whatever
the "amount" is.

101
00:05:06,292 --> 00:05:07,750
And the problem
here is, of course,

102
00:05:07,750 --> 00:05:11,770
that in the middle of transfer
this procedure might fail,

103
00:05:11,770 --> 00:05:14,170
the system might
crash and you might

104
00:05:14,170 --> 00:05:18,440
be left in a situation where
a part of this transfer

105
00:05:18,440 --> 00:05:19,890
has already run.

106
00:05:19,890 --> 00:05:22,210
To take a specific
example, here is

107
00:05:22,210 --> 00:05:24,750
an example of what the transfer
procedure might look like.

108
00:05:24,750 --> 00:05:28,120
It takes a "from" and
a "to" and an "amount".

109
00:05:28,120 --> 00:05:30,930
And the first thing
it does is to read.

110
00:05:30,930 --> 00:05:33,260
Assume that all of this
data is stored on disk.

111
00:05:33,260 --> 00:05:37,380
It reads from the "from"
account and then it reduces,

112
00:05:37,380 --> 00:05:42,290
it debits the amount from the
"from" account and then writes back.

113
00:05:42,290 --> 00:05:45,189
And it does the same
thing to the "to" account.

114
00:05:45,189 --> 00:05:46,980
So in the end, if this
procedure completely

115
00:05:46,980 --> 00:05:52,010
ran, then "from" account would
be reduced by "amount" and "to"

116
00:05:52,010 --> 00:05:54,530
account would be
enhanced by "amount".

117
00:05:54,530 --> 00:05:57,070
Of course, the problem is
you might have a failure

118
00:05:57,070 --> 00:05:58,930
anywhere in the middle.

119
00:05:58,930 --> 00:06:01,450
And, as a concrete
example, if a crash

120
00:06:01,450 --> 00:06:04,680
were to happen after the
first three lines shown above,

121
00:06:04,680 --> 00:06:07,230
if you owned this account
you would not be very happy

122
00:06:07,230 --> 00:06:09,530
because you just lost
some money from an account

123
00:06:09,530 --> 00:06:12,000
and nothing happened.

124
00:06:12,000 --> 00:06:16,500
No other account got
money added to it,

125
00:06:16,500 --> 00:06:18,490
and this is the problem
that we want to avoid.
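Here is a rough sketch of that transfer procedure in Python, under my own assumptions: a dict named disk stands in for the disk, and a hypothetical Crash exception and crash_after_debit flag stand in for a real system crash after the first three steps.
```python
class Crash(Exception):
    """Stands in for a system crash partway through the procedure."""
def transfer(disk, src, dst, amount, crash_after_debit=False):
    balance = disk[src]          # read the "from" account
    balance -= amount            # debit the amount
    disk[src] = balance          # write back
    if crash_after_debit:
        raise Crash("crashed after the first three steps")
    balance = disk[dst]          # same three steps for the "to" account
    balance += amount
    disk[dst] = balance
disk = {"savings": 1000, "checking": 0}
try:
    transfer(disk, "savings", "checking", 100, crash_after_debit=True)
except Crash:
    pass                         # the partial state is left behind on "disk"
total_after_crash = disk["savings"] + disk["checking"]
print(disk, total_after_crash)
```
After the simulated crash the debit has been written but the credit has not, so the total across both accounts has dropped by the amount.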

126
00:06:18,490 --> 00:06:21,770
If you think about this for
a moment, what you would like

127
00:06:21,770 --> 00:06:25,910
intuitively is that if a
crash like this were to happen

128
00:06:25,910 --> 00:06:28,570
and the system were to
recover and come back up,

129
00:06:28,570 --> 00:06:30,990
there are really only two
states that the system should

130
00:06:30,990 --> 00:06:32,700
be in for the system
to really be correct

131
00:06:32,700 --> 00:06:35,540
and to meet what your
intuition might expect.

132
00:06:35,540 --> 00:06:38,060
Either this procedure must
completely be finished, that is

133
00:06:38,060 --> 00:06:40,950
the state of the system must be
the same as if this procedure

134
00:06:40,950 --> 00:06:45,200
completely ran and finished,
or the state of the system

135
00:06:45,200 --> 00:06:49,110
must be such that the
procedure never ran at all.

136
00:06:49,110 --> 00:06:52,070
It is not at all OK to let
the state of the system

137
00:06:52,070 --> 00:06:56,930
be equal to whatever the
state was, in this example,

138
00:06:56,930 --> 00:07:00,050
at the time the crash happened.

139
00:07:00,050 --> 00:07:02,470
What you want is a kind of
all or nothing behavior.

140
00:07:02,470 --> 00:07:09,619
And, of course, if the crash
happened as I have shown here,

141
00:07:09,619 --> 00:07:12,160
there is no way for you to have
prevented those lines of code

142
00:07:12,160 --> 00:07:13,450
from being run.

143
00:07:13,450 --> 00:07:15,820
Those lines of code ran and
then the crash happened.

144
00:07:15,820 --> 00:07:18,310
So what you really need
is a way by which you

145
00:07:18,310 --> 00:07:19,800
can back out of these changes.

146
00:07:19,800 --> 00:07:21,425
What the system needs
is a way by which

147
00:07:21,425 --> 00:07:25,350
when the system crashes and
then recovers from the crash,

148
00:07:25,350 --> 00:07:27,650
during failure
recovery the system

149
00:07:27,650 --> 00:07:32,860
has to have a way to back out
of whatever changes it has made.

150
00:07:32,860 --> 00:07:35,279
In other words, what
we want is a concept

151
00:07:35,279 --> 00:07:36,195
called recoverability.

152
00:07:47,640 --> 00:07:49,660
So a more precise
definition of recoverability

153
00:07:49,660 --> 00:07:53,760
is shown on this slide, and
let me just read it out.

154
00:07:53,760 --> 00:07:55,760
A composite sequence of
steps, which we are also

155
00:07:55,760 --> 00:07:58,350
going to use the word
"action" for, an action is

156
00:07:58,350 --> 00:08:01,040
recoverable if, from the point
of view of the module that

157
00:08:01,040 --> 00:08:04,820
invokes this action,
this sequence either

158
00:08:04,820 --> 00:08:09,250
always completes or aborts.

159
00:08:09,250 --> 00:08:12,780
That is if it fails and then
backs out, aborts in a way

160
00:08:12,780 --> 00:08:15,040
such that it appears that
the sequence had never

161
00:08:15,040 --> 00:08:17,312
started to begin with.

162
00:08:17,312 --> 00:08:18,770
And, in particular,
what this means

163
00:08:18,770 --> 00:08:21,180
is that if a failure were
to happen in the middle

164
00:08:21,180 --> 00:08:23,410
when the system
recovers, it better

165
00:08:23,410 --> 00:08:25,840
have a plan of backing
out the changes.

166
00:08:25,840 --> 00:08:29,975
In other words, of
aborting this action.

167
00:08:29,975 --> 00:08:31,600
The way you think
about recoverability,

168
00:08:31,600 --> 00:08:35,909
the simple way to think about
it is do it all or not at all.
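As a toy illustration of "do it all or not at all", here is a sketch in Python that backs out a failed action by restoring a snapshot. This is only a stand-in that I am assuming for illustration: the snapshot lives in memory, which a real crash would wipe out, and the names recoverable, Crash and transfer are hypothetical.
```python
class Crash(Exception):
    """Stands in for a failure in the middle of an action."""
def recoverable(action, state, *args):
    snapshot = dict(state)           # remember the state before the action
    try:
        action(state, *args)         # do it all ...
    except Crash:
        state.clear()
        state.update(snapshot)       # ... or back out so it looks never started
        return False
    return True
def transfer(state, src, dst, amount, crash=False):
    state[src] -= amount
    if crash:
        raise Crash()
    state[dst] += amount
accounts = {"savings": 1000, "checking": 0}
first_ok = recoverable(transfer, accounts, "savings", "checking", 100, True)
after_abort = dict(accounts)         # state seen by the invoker after the abort
second_ok = recoverable(transfer, accounts, "savings", "checking", 100)
print(first_ok, after_abort, second_ok, accounts)
```
From the invoker's point of view, the first call never happened and the second call completed.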

169
00:08:39,370 --> 00:08:42,200
And our goal is to try to
somehow come up with a way

170
00:08:42,200 --> 00:08:43,700
to achieve this goal.

171
00:08:47,739 --> 00:08:49,780
And before we get into a
solution to this problem

172
00:08:49,780 --> 00:08:52,710
there are a few other
concepts to discuss,

173
00:08:52,710 --> 00:08:55,560
and they will turn out to be
very related to each other.

174
00:08:55,560 --> 00:08:57,870
And the second concept
after recoverability

175
00:08:57,870 --> 00:08:59,860
that is very closely
related to this idea

176
00:08:59,860 --> 00:09:02,280
has to do with
concurrent actions.

177
00:09:07,650 --> 00:09:09,890
Imagine for a moment that
you had the same transfer

178
00:09:09,890 --> 00:09:15,220
procedure as in this example but
you had two transfers running

179
00:09:15,220 --> 00:09:18,500
at the same time
and they happened

180
00:09:18,500 --> 00:09:25,440
to act on the same
data items like that.

181
00:09:25,440 --> 00:09:29,420
Let's say that the first
transfer moved from a savings

182
00:09:29,420 --> 00:09:32,140
account to a checking
account, it moved $100.

183
00:09:32,140 --> 00:09:34,940
And the second one moved
from savings to checking,

184
00:09:34,940 --> 00:09:36,820
it moved $200.

185
00:09:36,820 --> 00:09:40,870
And let's say at the
beginning S was $1,000.

186
00:09:40,870 --> 00:09:45,200
And, of course as you
recall from several lectures

187
00:09:45,200 --> 00:09:47,340
ago, when you have these
interleaved sequences,

188
00:09:47,340 --> 00:09:51,470
these two threads
running the steps

189
00:09:51,470 --> 00:09:53,356
that these threads
are made of might

190
00:09:53,356 --> 00:09:55,230
be interleaved in arbitrary
order if you don't

191
00:09:55,230 --> 00:09:58,180
have a plan to isolate them.

192
00:09:58,180 --> 00:10:00,950
And, in particular, you might
have many results that show up.

193
00:10:00,950 --> 00:10:02,420
And one result
that might show up

194
00:10:02,420 --> 00:10:05,000
is both of these transfers
running concurrently

195
00:10:05,000 --> 00:10:09,680
read $1,000 from "from"
account and then both of them

196
00:10:09,680 --> 00:10:12,300
debit by $100 and
$200 respectively.

197
00:10:12,300 --> 00:10:16,837
So at the end of it you might
be left with either $800 or $900

198
00:10:16,837 --> 00:10:18,670
left in the account
when the right answer is

199
00:10:18,670 --> 00:10:20,841
to have been left
intuitively, if you ran

200
00:10:20,841 --> 00:10:22,590
both these transfers
you would like to see

201
00:10:22,590 --> 00:10:26,550
$700 left in that account.
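The arithmetic can be checked with a small deterministic sketch in Python. It does not use real threads. The function run and the step names "read" and "write" are my own assumptions; each schedule is just an explicit ordering of the two actions' steps on the savings balance.
```python
def run(schedule, start=1000):
    """Apply the read/write steps of two debits in the given order."""
    account = start
    local = {}                       # each action's private copy of the balance
    debit = {"A1": 100, "A2": 200}
    for action, step in schedule:
        if step == "read":
            local[action] = account
        else:                        # "write": store the debited private copy
            account = local[action] - debit[action]
    return account
serial = run([("A1", "read"), ("A1", "write"), ("A2", "read"), ("A2", "write")])
a2_last = run([("A1", "read"), ("A2", "read"), ("A1", "write"), ("A2", "write")])
a1_last = run([("A1", "read"), ("A2", "read"), ("A2", "write"), ("A1", "write")])
print(serial, a2_last, a1_last)
```
When both actions read before either writes, whichever write lands last overwrites the other debit.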

202
00:10:26,550 --> 00:10:28,320
So what you
intuitively want here

203
00:10:28,320 --> 00:10:31,200
is if this is the
first action, A1,

204
00:10:31,200 --> 00:10:34,860
and this is the second action,
A2, what you would like to see

205
00:10:34,860 --> 00:10:35,590
is a sequence--

206
00:10:35,590 --> 00:10:37,250
You don't actually
care what the order

207
00:10:37,250 --> 00:10:39,465
is between these two transfers.

208
00:10:39,465 --> 00:10:40,840
I mean you are
transferring money

209
00:10:40,840 --> 00:10:43,420
from one account to another
and you are doing two of these.

210
00:10:43,420 --> 00:10:45,190
You do not actually
care in this example,

211
00:10:45,190 --> 00:10:47,690
and it will turn
out all the examples

212
00:10:47,690 --> 00:10:51,727
that we are going to be
talking about with this notion

213
00:10:51,727 --> 00:10:54,060
that you are not really going
to care what the order is.

214
00:10:54,060 --> 00:10:56,540
Either order is perfectly
fine, but the order

215
00:10:56,540 --> 00:11:05,210
should be as if it is equivalent
to either A1 before A2 or A2

216
00:11:05,210 --> 00:11:06,690
before A1.

217
00:11:12,360 --> 00:11:14,890
And that is what we would like.

218
00:11:14,890 --> 00:11:19,230
And, of course, some naïve way
to achieve this is to ensure

219
00:11:19,230 --> 00:11:21,080
that exactly one
action runs at a time,

220
00:11:21,080 --> 00:11:22,930
it finishes and then
the second one runs,

221
00:11:22,930 --> 00:11:25,200
but that is kind of going
to be no fun for us to do.
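That naive approach can be sketched in Python with a single lock around the whole action. This is an illustration under my own assumptions, not the lecture's code: the names balance, one_at_a_time and debit are hypothetical, and only the debit half of the transfer is shown.
```python
import threading
balance = {"savings": 1000}
one_at_a_time = threading.Lock()     # one global lock: one action at a time
def debit(amount):
    with one_at_a_time:              # the whole read-modify-write is serialized
        current = balance["savings"]
        balance["savings"] = current - amount
threads = [threading.Thread(target=debit, args=(a,)) for a in (100, 200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_savings = balance["savings"]
print(final_savings)
```
Either thread may go first, but the result is the same as some serial order, at the cost of no concurrency at all.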

222
00:11:25,200 --> 00:11:26,700
It is the right
simplest solution,

223
00:11:26,700 --> 00:11:29,300
but we are going to want to
improve concurrency as we had

224
00:11:29,300 --> 00:11:31,310
wanted to several lectures ago.

225
00:11:31,310 --> 00:11:32,810
So we are going to
come up with ways

226
00:11:32,810 --> 00:11:34,930
of getting higher
performance than running one

227
00:11:34,930 --> 00:11:35,670
after the other.

228
00:11:35,670 --> 00:11:39,350
But the net effect is as if you
ran it in some serial order,

229
00:11:39,350 --> 00:11:41,480
in some sequential
order of the actions.

230
00:11:41,480 --> 00:11:44,470
That is the result of
running concurrent actions has

231
00:11:44,470 --> 00:11:47,040
to be the same as
some serial ordering

232
00:11:47,040 --> 00:11:48,600
of the individual actions.

233
00:11:51,340 --> 00:11:56,044
And this idea of A1 before A2
or A2 before A1 has a name.

234
00:11:56,044 --> 00:11:57,085
It is called "isolation".

235
00:12:04,670 --> 00:12:07,049
And you should distinguish
that in your mind

236
00:12:07,049 --> 00:12:08,215
clearly from recoverability.

237
00:12:12,070 --> 00:12:14,080
So a more precise
definition of isolation

238
00:12:14,080 --> 00:12:16,820
is essentially
what I said before.

239
00:12:16,820 --> 00:12:21,930
The composite sequence
of steps is isolated

240
00:12:21,930 --> 00:12:24,720
if its effect from the
point of view of its invoker

241
00:12:24,720 --> 00:12:26,760
is the same as if the
action occurred either

242
00:12:26,760 --> 00:12:30,140
completely before or completely
after every other isolated

243
00:12:30,140 --> 00:12:30,640
action.

244
00:12:32,879 --> 00:12:34,420
And the simple way
to understand this

245
00:12:34,420 --> 00:12:36,924
is you either do it all
before or do it all after.

246
00:12:36,924 --> 00:12:39,590
That is the net effect has to be
the same as doing it all before

247
00:12:39,590 --> 00:12:40,641
or doing it all after.

248
00:12:40,641 --> 00:12:42,640
And it is different from
recoverability which is

249
00:12:42,640 --> 00:12:44,480
really do it all or not at all.

250
00:12:50,650 --> 00:12:54,780
Now, when you have a
system that satisfies

251
00:12:54,780 --> 00:12:57,190
both recoverability
and isolation--

252
00:12:57,190 --> 00:12:59,220
The way to understand
this is both of these

253
00:12:59,220 --> 00:13:00,270
really, although
they are talking

254
00:13:00,270 --> 00:13:02,561
about different concepts,
this is saying all or nothing

255
00:13:02,561 --> 00:13:05,470
and this is saying all before
or all after, both of these

256
00:13:05,470 --> 00:13:08,010
are getting at the same
intuitive idea which

257
00:13:08,010 --> 00:13:12,010
is that somehow there is a
sequence of steps, for example,

258
00:13:12,010 --> 00:13:14,620
in this transfer procedure there
will be sequences of steps.

259
00:13:14,620 --> 00:13:18,580
And somehow you want to make
it look as if, for each action,

260
00:13:18,580 --> 00:13:21,880
the sequence of steps is not
visible to somebody invoking

261
00:13:21,880 --> 00:13:25,240
the action because you do
not want the person invoking

262
00:13:25,240 --> 00:13:26,600
this action for recoverability.

263
00:13:26,600 --> 00:13:28,120
You do not want him to
know that it is built out

264
00:13:28,120 --> 00:13:29,190
of a sequence of steps.

265
00:13:29,190 --> 00:13:30,490
And if a failure
happens in the middle,

266
00:13:30,490 --> 00:13:32,240
you do not want the
invoker of that action

267
00:13:32,240 --> 00:13:33,897
to see some partial state.

268
00:13:33,897 --> 00:13:35,980
Likewise, when you have
concurrent actions running

269
00:13:35,980 --> 00:13:38,320
together, you do not want
the different invokers

270
00:13:38,320 --> 00:13:41,560
of that action to somehow
see this muddled result

271
00:13:41,560 --> 00:13:42,489
of the interleaving.

272
00:13:42,489 --> 00:13:44,030
You want them to
only see the results

273
00:13:44,030 --> 00:13:47,766
of running these actions
one after the other.

274
00:13:47,766 --> 00:13:49,140
What you are really
trying to achieve

275
00:13:49,140 --> 00:13:50,848
for both of these
concepts, although they

276
00:13:50,848 --> 00:13:53,600
are distinct concepts,
is to hide the fact

277
00:13:53,600 --> 00:13:56,640
that this action is a
composite sequence of steps.

278
00:13:56,640 --> 00:13:59,210
You want to make it look as if
it is quite [UNINTELLIGIBLE].

279
00:13:59,210 --> 00:14:00,830
And this idea of wanting
something to look

280
00:14:00,830 --> 00:14:02,455
[UNINTELLIGIBLE] is
called "atomicity".

281
00:14:08,340 --> 00:14:13,125
And we are going to be
basically hiding the fact

282
00:14:13,125 --> 00:14:14,000
that it is composite.

283
00:14:23,370 --> 00:14:27,376
So more precisely
for this course,

284
00:14:27,376 --> 00:14:29,250
we are going to use the
word "atomic" to mean

285
00:14:29,250 --> 00:14:31,979
recoverable and isolated.

286
00:14:31,979 --> 00:14:33,520
And I am going to
say for this course

287
00:14:33,520 --> 00:14:38,170
because these terms have been
used in various different ways

288
00:14:38,170 --> 00:14:41,150
for at least probably
more than 30 years

289
00:14:41,150 --> 00:14:45,472
and I think it is about
time we made these precise.

290
00:14:45,472 --> 00:14:47,430
In the literature, you
will see the word atomic

291
00:14:47,430 --> 00:14:50,390
to often mean recoverable.

292
00:14:50,390 --> 00:14:52,230
And sometimes, and
this is unfortunate,

293
00:14:52,230 --> 00:14:55,750
you will see the word
consistent to mean isolated.

294
00:14:55,750 --> 00:14:58,540
And, in particular, you
will run into this confusion

295
00:14:58,540 --> 00:15:04,380
when you read the paper
for recitation on Thursday,

296
00:15:04,380 --> 00:15:06,530
the System R paper.

297
00:15:06,530 --> 00:15:08,884
The problem is those
terms used historically

298
00:15:08,884 --> 00:15:10,550
have not been used
in a very precise way

299
00:15:10,550 --> 00:15:12,280
so we will define it precisely.

300
00:15:12,280 --> 00:15:14,800
When we say something
is atomic, in general

301
00:15:14,800 --> 00:15:17,275
we mean both recoverable
and isolated.

302
00:15:17,275 --> 00:15:18,650
When we mean only
one of them, we

303
00:15:18,650 --> 00:15:20,820
will say atomic with
respect to recoverability

304
00:15:20,820 --> 00:15:24,180
or recoverable, atomic
with respect to isolation

305
00:15:24,180 --> 00:15:25,950
or isolated.

306
00:15:25,950 --> 00:15:30,220
And, like I said, atomic means
recoverable and isolated.

307
00:15:30,220 --> 00:15:31,720
The general plan
is to hide the fact

308
00:15:31,720 --> 00:15:34,180
that an action is built out of
a composite sequence of steps.

309
00:15:43,026 --> 00:15:44,900
Now, to add to this
confusion of terminology,

310
00:15:44,900 --> 00:15:48,390
there are actually two other
terms or two other properties

311
00:15:48,390 --> 00:15:52,290
that you often want
from actions in addition

312
00:15:52,290 --> 00:15:54,040
to recoverability and isolation.

313
00:15:56,740 --> 00:15:59,140
And these two other
properties are

314
00:15:59,140 --> 00:16:01,550
provided by many
database systems

315
00:16:01,550 --> 00:16:05,740
which are one of the most
common users of these concepts.

316
00:16:05,740 --> 00:16:09,100
The most common system that
provides atomicity, one example

317
00:16:09,100 --> 00:16:10,010
is a database system.

318
00:16:10,010 --> 00:16:12,650
Now, many, many systems
provide atomicity.

319
00:16:12,650 --> 00:16:15,660
For example, every computer
does it in its instruction set.

320
00:16:15,660 --> 00:16:17,200
You often want
your instructions,

321
00:16:17,200 --> 00:16:19,574
from the point of view of the
invoker of the instruction,

322
00:16:19,574 --> 00:16:20,731
to be atomic.

323
00:16:20,731 --> 00:16:22,480
So we are going to be
designing techniques

324
00:16:22,480 --> 00:16:25,282
that, in general, operate across
the whole range of systems.

325
00:16:25,282 --> 00:16:27,240
But database systems are
of particular interest

326
00:16:27,240 --> 00:16:30,680
because they are very common
and they exercise these concepts

327
00:16:30,680 --> 00:16:32,550
to a high degree.

328
00:16:32,550 --> 00:16:35,689
And two other concepts
that many systems provide,

329
00:16:35,689 --> 00:16:36,980
the first one is "consistency".

330
00:16:40,307 --> 00:16:42,890
And it is unfortunate that the
word consistency was previously

331
00:16:42,890 --> 00:16:45,187
used, to some extent,
to mean isolated.

332
00:16:45,187 --> 00:16:47,270
So it is important not to
get into that confusion.

333
00:16:47,270 --> 00:16:52,484
In some old papers when
you see consistency,

334
00:16:52,484 --> 00:16:54,150
you should realize
that what they really

335
00:16:54,150 --> 00:16:58,640
are talking about is isolation,
A1 before A2 or A2 before A1.

336
00:16:58,640 --> 00:17:00,710
But what we will mean by
consistency, and we

337
00:17:00,710 --> 00:17:03,330
will get into this next
week, is that there

338
00:17:03,330 --> 00:17:07,450
is some invariant for the
application that is often using

339
00:17:07,450 --> 00:17:10,500
atomicity that is maintained.

340
00:17:10,500 --> 00:17:12,880
For example, in a
banking application,

341
00:17:12,880 --> 00:17:14,599
if you take the
transfer examples,

342
00:17:14,599 --> 00:17:16,470
isolated means that
you want the result

343
00:17:16,470 --> 00:17:20,010
to be as if the transfers
ran in some serial order.

344
00:17:20,010 --> 00:17:22,069
Consistent means that
there might be a high level

345
00:17:22,069 --> 00:17:26,359
notion that the designer
of this banking application

346
00:17:26,359 --> 00:17:29,350
might have wanted, such as a
bank might have a rule that

347
00:17:29,350 --> 00:17:34,050
says that at the end of each day
every checking account should

348
00:17:34,050 --> 00:17:37,270
have an amount that
is at least 10%

349
00:17:37,270 --> 00:17:40,060
of the corresponding
savings account.
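An invariant like that is just a predicate over the data. Here is a minimal sketch in Python, assuming each customer is a hypothetical (checking, savings) pair of whole-dollar amounts; the function name consistent is also my own.
```python
def consistent(accounts):
    """End-of-day rule: checking holds at least 10% of the matching savings."""
    # Compare 10 * checking with savings to stay in integer arithmetic.
    return all(10 * checking >= savings for checking, savings in accounts)
end_of_day_ok = consistent([(100, 1000), (50, 200)])
end_of_day_bad = consistent([(99, 1000), (50, 200)])
print(end_of_day_ok, end_of_day_bad)
```
Such a check is about what the application considers valid data, which is a separate question from the order in which actions ran.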

350
00:17:40,060 --> 00:17:43,990
Now, during the
middle of the day

351
00:17:43,990 --> 00:17:46,100
there might be
individual actions that

352
00:17:46,100 --> 00:17:49,330
transiently violate that rule.

353
00:17:49,330 --> 00:17:51,790
But, at various
points, the designer

354
00:17:51,790 --> 00:17:56,112
might wish to ensure that
a rule holds, such as: the checking

355
00:17:56,112 --> 00:17:58,320
account must have at least
a certain amount of money,

356
00:17:58,320 --> 00:18:00,540
some fraction of
the savings account.

357
00:18:00,540 --> 00:18:04,280
Or in some payroll
application for a company,

358
00:18:04,280 --> 00:18:07,670
they are modifying the
payroll and giving raises

359
00:18:07,670 --> 00:18:09,115
to various people,
but they might

360
00:18:09,115 --> 00:18:10,990
have a rule that says
you could give whatever

361
00:18:10,990 --> 00:18:14,390
raise you want but every manager
must make at least 5% more

362
00:18:14,390 --> 00:18:17,217
than all of his or
her direct reports.

363
00:18:17,217 --> 00:18:18,550
You might have a rule like that.

364
00:18:18,550 --> 00:18:20,654
All of these are examples
of invariants that

365
00:18:20,654 --> 00:18:22,570
correspond to the
consistency of the data that

366
00:18:22,570 --> 00:18:25,810
is being maintained in
this example in a database.

367
00:18:25,810 --> 00:18:30,160
And you can use database systems
to provide these consistency

368
00:18:30,160 --> 00:18:30,700
rules.

369
00:18:30,700 --> 00:18:32,290
But that is different
from isolation.

370
00:18:32,290 --> 00:18:36,350
Isolation just
says that there has

371
00:18:36,350 --> 00:18:41,510
to be some equivalent serial
ordering in which things run.

372
00:18:41,510 --> 00:18:46,082
And the fourth property after
recoverability, isolation

373
00:18:46,082 --> 00:18:47,415
and consistency is "durability".

374
00:18:52,350 --> 00:18:54,110
Durability basically
says that the data

375
00:18:54,110 --> 00:18:56,650
should last for as long as--

376
00:18:56,650 --> 00:18:59,950
It's an application-specific
concept, but what it says

377
00:18:59,950 --> 00:19:02,780
is the data must
last for as long

378
00:19:02,780 --> 00:19:05,460
as some pre-defined duration.

379
00:19:05,460 --> 00:19:07,460
For example, you might
store data in a database.

380
00:19:07,460 --> 00:19:09,290
And, in many
databases, you really

381
00:19:09,290 --> 00:19:11,170
want it to last "forever".

382
00:19:11,170 --> 00:19:14,370
But in reality it is very hard
to make things last forever

383
00:19:14,370 --> 00:19:17,800
so you might define that
the data in this database

384
00:19:17,800 --> 00:19:21,000
must last for three years, and
you work hard to preserve that.

385
00:19:21,000 --> 00:19:22,930
Or you might have an
application that as long

386
00:19:22,930 --> 00:19:25,120
as the thread is running
you want the data to last,

387
00:19:25,120 --> 00:19:27,294
but after the
thread is terminated

388
00:19:27,294 --> 00:19:28,960
you do not actually
care about the data.

389
00:19:28,960 --> 00:19:31,070
And that is a different
notion of durability.

390
00:19:31,070 --> 00:19:33,860
But both of these talk
about the lifetime for which

391
00:19:33,860 --> 00:19:37,450
you want to preserve data.

392
00:19:37,450 --> 00:19:40,030
Now, when you have a system
that provides recoverability

393
00:19:40,030 --> 00:19:43,880
and isolation, that is
atomicity, consistency

394
00:19:43,880 --> 00:19:45,620
and durability,
then we are going

395
00:19:45,620 --> 00:19:52,150
to call that a transaction.

396
00:19:52,150 --> 00:19:55,650
A set of actions, each
of which is recoverable,

397
00:19:55,650 --> 00:19:58,090
that are isolated
from each other, that

398
00:19:58,090 --> 00:20:00,510
has a notion of consistency
and can achieve it

399
00:20:00,510 --> 00:20:04,620
and where the data has
durability, those actions

400
00:20:04,620 --> 00:20:07,130
are called transactions.

401
00:20:07,130 --> 00:20:09,957
And many database systems work
hard to provide transactions,

402
00:20:09,957 --> 00:20:11,915
which means they provide
all of these features.

403
00:20:14,570 --> 00:20:16,260
But it is certainly
possible, and we

404
00:20:16,260 --> 00:20:19,020
will look at many examples
where you can just

405
00:20:19,020 --> 00:20:22,920
design systems that have just
recoverability and isolation.

406
00:20:22,920 --> 00:20:25,467
And we will not even worry
about these other notions.

407
00:20:25,467 --> 00:20:26,800
That is what we will start with.

408
00:20:26,800 --> 00:20:28,510
We do not want to solve all
of the problems at once.

409
00:20:28,510 --> 00:20:30,430
We will start with the
easier set of problems

410
00:20:30,430 --> 00:20:31,513
and then build from there.

411
00:20:44,170 --> 00:20:46,540
Today, and on
Wednesday, our plan

412
00:20:46,540 --> 00:20:49,000
is to come up with ways of
achieving recoverability.

413
00:20:49,000 --> 00:20:50,833
So that is what we are
going to start doing.

414
00:20:59,392 --> 00:21:00,850
The general approach
for how we are

415
00:21:00,850 --> 00:21:02,600
going to achieve
recoverability of modules

416
00:21:02,600 --> 00:21:04,570
is, and recall that
the problem here

417
00:21:04,570 --> 00:21:08,322
is M2 fails and then M1
somehow discovers its failure

418
00:21:08,322 --> 00:21:10,030
and then when it
restarts you do not want

419
00:21:10,030 --> 00:21:12,840
any partial state to be kept.

420
00:21:12,840 --> 00:21:17,449
The general plan is to design
modules to be fail-fast.

421
00:21:17,449 --> 00:21:19,740
You need a way to discover
that things are not working,

422
00:21:19,740 --> 00:21:21,930
and that is the scope of
the kinds of systems we

423
00:21:21,930 --> 00:21:24,870
are going to be dealing with.

424
00:21:24,870 --> 00:21:27,910
And then once the system's
failure is detected

425
00:21:27,910 --> 00:21:30,280
and then you restart the
system or it recovers,

426
00:21:30,280 --> 00:21:34,160
you run some kind of
a repair procedure.

427
00:21:34,160 --> 00:21:36,550
This is in general you run
some kind of repair procedure

428
00:21:36,550 --> 00:21:40,340
that allows that failed
module to recover

429
00:21:40,340 --> 00:21:45,510
and then it restarts
where restarts

430
00:21:45,510 --> 00:21:48,320
means it allows,
M1 in this case,

431
00:21:48,320 --> 00:21:52,140
allows invokers to start running
on that system, on that module.

432
00:22:00,250 --> 00:22:02,564
We are going to do
this in three steps.

433
00:22:02,564 --> 00:22:03,980
The first thing
we are going to do

434
00:22:03,980 --> 00:22:07,590
is to look at a very specific
special case of this problem

435
00:22:07,590 --> 00:22:11,410
which is to realize that
all of these problems having

436
00:22:11,410 --> 00:22:13,880
to do with partial state
occur because there

437
00:22:13,880 --> 00:22:16,330
is some state, once
a module has crashed

438
00:22:16,330 --> 00:22:18,080
there is some state
that it has remaining.

439
00:22:18,080 --> 00:22:20,860
So if it just recovered
and started running again

440
00:22:20,860 --> 00:22:23,010
without doing something
then that partial state

441
00:22:23,010 --> 00:22:28,090
is visible to the
invoker of that module.

442
00:22:28,090 --> 00:22:30,970
Now, if the state were all
volatile state, like in just

443
00:22:30,970 --> 00:22:33,080
RAM, for example,
and a thread crashed,

444
00:22:33,080 --> 00:22:35,330
if it was in its virtual
memory and the thread crashed

445
00:22:35,330 --> 00:22:36,700
and it recovered then
you do not really

446
00:22:36,700 --> 00:22:38,800
have to worry about this
because all of the state

447
00:22:38,800 --> 00:22:41,507
anywhere has gone away.

448
00:22:41,507 --> 00:22:43,090
Primarily, we were
worried about state

449
00:22:43,090 --> 00:22:47,130
that lasts across failures.

450
00:22:47,130 --> 00:22:50,420
And an example of
that is the state

451
00:22:50,420 --> 00:22:53,710
that is maintained on disk,
just as a concrete example.

452
00:22:53,710 --> 00:22:59,170
We are going to start first by
obtaining a recoverable sector.

453
00:23:02,990 --> 00:23:06,700
Basically coming up with the
scheme that allows us to do

454
00:23:06,700 --> 00:23:08,670
reads and writes
of a single sector

455
00:23:08,670 --> 00:23:10,440
of a disk in a recoverable way.

456
00:23:10,440 --> 00:23:13,090
So we are going to define two
procedures, a recoverable "put"

457
00:23:13,090 --> 00:23:14,870
that allows you to
put stuff, write stuff

458
00:23:14,870 --> 00:23:17,270
onto a single sector of a
disk and the recoverable

459
00:23:17,270 --> 00:23:18,760
"get" that allows
you to read stuff

460
00:23:18,760 --> 00:23:24,980
off a single sector of a disk
in a way that is recoverable.

461
00:23:24,980 --> 00:23:27,110
And the hard problem
here is going

462
00:23:27,110 --> 00:23:30,150
to be that as the
system is crashing,

463
00:23:30,150 --> 00:23:32,590
for a variety of
reasons, bad data might

464
00:23:32,590 --> 00:23:34,760
get written to a sector.

465
00:23:34,760 --> 00:23:38,840
If you just took a regular
sector of your disk,

466
00:23:38,840 --> 00:23:41,460
let's say that the
operating system

467
00:23:41,460 --> 00:23:43,460
is trying to write something
into a disk sector,

468
00:23:43,460 --> 00:23:45,910
somebody turns off the
power and random stuff

469
00:23:45,910 --> 00:23:48,909
might get written
out onto the disk.

470
00:23:48,909 --> 00:23:50,450
And so when the
system comes back up,

471
00:23:50,450 --> 00:23:53,270
the reader of that sector
might get some garbage value,

472
00:23:53,270 --> 00:23:55,410
a result of some partial write.

473
00:23:55,410 --> 00:23:57,710
So that is what we are
going to try to avoid.

474
00:23:57,710 --> 00:24:00,990
So we will do that first.

475
00:24:00,990 --> 00:24:04,010
And that is for next
time, to complete

476
00:24:04,010 --> 00:24:05,670
the recoverability story.

477
00:24:05,670 --> 00:24:10,550
We are going to use this
solution as a building-block

478
00:24:10,550 --> 00:24:12,500
for a more general
solution because it is not

479
00:24:12,500 --> 00:24:15,090
going to be enough for us to
just be able to read and write

480
00:24:15,090 --> 00:24:16,620
single sectors in
a recoverable way

481
00:24:16,620 --> 00:24:19,612
because how many applications
use only one sector of a disk?

482
00:24:19,612 --> 00:24:21,320
What you would like
to do is to make sure

483
00:24:21,320 --> 00:24:22,630
that you have a
general solution that

484
00:24:22,630 --> 00:24:25,350
works across all of the data
that is being written and read.

485
00:24:25,350 --> 00:24:27,780
We are going to use that to
come up with two schemes.

486
00:24:27,780 --> 00:24:31,260
The first scheme uses an idea
called a "version history".

487
00:24:31,260 --> 00:24:40,470
And a second scheme uses an idea
called "logging" using logs.

488
00:24:40,470 --> 00:24:42,640
And both of these
schemes will turn out

489
00:24:42,640 --> 00:24:44,830
to be very general
and useful and work,

490
00:24:44,830 --> 00:24:46,290
but both of these
schemes basically

491
00:24:46,290 --> 00:24:50,690
will use this technique as
a bootstrapping technique.

492
00:24:50,690 --> 00:24:53,150
And so we need a solution
here anyway because we

493
00:24:53,150 --> 00:24:55,970
are going to build on that to
develop a more sophisticated

494
00:24:55,970 --> 00:24:58,570
solution for the general case.

495
00:24:58,570 --> 00:25:02,314
And so today we are going to
start with a special case.

496
00:25:02,314 --> 00:25:03,730
A, because it is
a building block,

497
00:25:03,730 --> 00:25:06,340
and, B, because it will
turn out to show us

498
00:25:06,340 --> 00:25:09,000
a rule that we are going to
religiously follow in coming

499
00:25:09,000 --> 00:25:14,734
up with systematic solutions
to work in a more general case

500
00:25:14,734 --> 00:25:16,650
when you have more than
one sector being read.

501
00:25:24,710 --> 00:25:27,120
So let's write out the
assumptions in the model

502
00:25:27,120 --> 00:25:29,711
here for this solution.

503
00:25:29,711 --> 00:25:31,460
The first assumption
we are going to make,

504
00:25:31,460 --> 00:25:33,160
since we are dealing with
recoverability and not

505
00:25:33,160 --> 00:25:33,820
with isolation.

506
00:25:33,820 --> 00:25:36,426
We are going to deal
with isolation next week.

507
00:25:36,426 --> 00:25:37,800
The first assumption
we will make

508
00:25:37,800 --> 00:25:44,161
is that there is no
concurrency, and we will come up

509
00:25:44,161 --> 00:25:45,660
with different
solutions for dealing

510
00:25:45,660 --> 00:25:49,210
with people concurrently trying
to write the same sector.

511
00:25:54,922 --> 00:25:56,630
And this is an assumption
we will revisit

512
00:25:56,630 --> 00:25:58,130
in a couple of weeks
to show you how

513
00:25:58,130 --> 00:26:00,470
to actually achieve this goal.

514
00:26:00,470 --> 00:26:05,400
But we will assume that there
are no hardware failures,

515
00:26:05,400 --> 00:26:06,330
no hardware errors.

516
00:26:11,900 --> 00:26:14,320
For example, the
appendix to Chapter 8,

517
00:26:14,320 --> 00:26:17,280
which we have assigned
for reading later

518
00:26:17,280 --> 00:26:21,500
on in the semester,
actually shows two methods,

519
00:26:21,500 --> 00:26:24,300
"careful put" and "careful
get" that actually

520
00:26:24,300 --> 00:26:26,530
deal with a variety
of hardware problems.

521
00:26:26,530 --> 00:26:30,820
For example, every sector has
a disk "checksum" on it.

522
00:26:30,820 --> 00:26:33,810
If you wrote bad data
and something happened

523
00:26:33,810 --> 00:26:36,940
in the middle of that write and
then someone went back and read

524
00:26:36,940 --> 00:26:38,920
that sector, they would
discover that it is bad

525
00:26:38,920 --> 00:26:41,880
because the checksum
would not match.
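[Editor's sketch] To make the checksum idea concrete, here is a minimal sketch in Python. It is not the appendix's procedure. The dictionary standing in for the disk, the CRC32 checksum, and the names `careful_put`, `careful_get`, `OK` and `NOT_OK` are all assumptions made for illustration.

```python
import zlib

OK, NOT_OK = "OK", "NOT-OK"  # hypothetical status values


def careful_put(disk, sector, data: bytes) -> None:
    """Store the data together with a CRC32 checksum of that data."""
    disk[sector] = (data, zlib.crc32(data))


def careful_get(disk, sector):
    """Return (data, status); status is NOT_OK when the checksum does not match."""
    data, stored_crc = disk[sector]
    status = OK if zlib.crc32(data) == stored_crc else NOT_OK
    return data, status


disk = {}  # assumed stand-in for a real disk: sector number -> (data, checksum)
careful_put(disk, 7, b"balance=100")
good = careful_get(disk, 7)

# Simulate a write that was interrupted: the data changed, the checksum did not.
_, old_crc = disk[7]
disk[7] = (b"balance=1\x00\x00", old_crc)
bad = careful_get(disk, 7)
```

Here `good` carries status OK and `bad` carries status NOT-OK, which is all the reader of the sector needs in order to know the write did not complete.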

526
00:26:41,880 --> 00:26:44,030
Now, the appendix
to this chapter, 9B,

527
00:26:44,030 --> 00:26:46,210
has a more careful
description of how

528
00:26:46,210 --> 00:26:48,020
you deal with a
variety of errors

529
00:26:48,020 --> 00:26:50,500
so that you can achieve this
careful put and careful get

530
00:26:50,500 --> 00:26:53,749
of a disk sector.

531
00:26:53,749 --> 00:26:55,790
Assume for now that there
are no hardware errors,

532
00:26:55,790 --> 00:26:57,987
there is no decay of data
on the disk and so on.

533
00:26:57,987 --> 00:27:00,070
It will turn out the problem
is still interesting,

534
00:27:00,070 --> 00:27:03,740
that it is not easy to
achieve a recoverable put

535
00:27:03,740 --> 00:27:06,360
and get even though
the hardware is fine.

536
00:27:06,360 --> 00:27:08,310
And that is because there
are software errors.

537
00:27:13,590 --> 00:27:16,600
And, in particular,
the model here

538
00:27:16,600 --> 00:27:18,310
is that you have
some application

539
00:27:18,310 --> 00:27:20,930
and then you have
the operating system.

540
00:27:20,930 --> 00:27:22,950
And the operating
system has a buffer

541
00:27:22,950 --> 00:27:28,730
here of data that it is
waiting to write onto disk.

542
00:27:28,730 --> 00:27:35,210
Then you have a disk and
that is a disk sector.

543
00:27:35,210 --> 00:27:37,550
The problem might be
that as a failure occurs

544
00:27:37,550 --> 00:27:40,355
there is something that happens,
an error or something that gets

545
00:27:40,355 --> 00:27:41,730
triggered in the
operating system

546
00:27:41,730 --> 00:27:43,521
so the buffer gets
corrupted and then there

547
00:27:43,521 --> 00:27:46,950
is some bad data that gets
written out onto the sector.

548
00:27:46,950 --> 00:27:50,977
That is the kind of problem
that we want to protect against.

549
00:27:50,977 --> 00:27:52,560
The fact that your
hardware is perfect

550
00:27:52,560 --> 00:27:54,060
does not actually
solve this problem

551
00:27:54,060 --> 00:27:56,769
because this buffer itself has
been corrupted or something

552
00:27:56,769 --> 00:27:58,310
happens during the
process of writing

553
00:27:58,310 --> 00:28:01,877
this buffer to the sector
so the data itself is bad,

554
00:28:01,877 --> 00:28:03,710
and that is what we
want to protect against.

555
00:28:09,180 --> 00:28:24,559
We are going to build on
something that I have already

556
00:28:24,559 --> 00:28:25,100
talked about.

557
00:28:25,100 --> 00:28:31,350
We are going to build on two
procedures, careful put that

558
00:28:31,350 --> 00:28:35,160
puts to a sector,
it puts some data,

559
00:28:35,160 --> 00:28:40,760
and the corresponding careful
get which reads from a sector

560
00:28:40,760 --> 00:28:42,710
and returns the data
that is on that sector.

561
00:28:42,710 --> 00:28:44,190
And the assumption
is that careful

562
00:28:44,190 --> 00:28:46,377
put and get, once you
give it some data there

563
00:28:46,377 --> 00:28:48,710
are no hardware failures for
you to worry about anymore.

564
00:28:54,489 --> 00:28:56,530
The solution we are going
to take to this problem

565
00:28:56,530 --> 00:28:59,570
is to realize that
when a failure happens,

566
00:28:59,570 --> 00:29:01,880
for example, somebody
turns off the power switch

567
00:29:01,880 --> 00:29:06,250
and this buffer gets corrupted,
when the operating system does

568
00:29:06,250 --> 00:29:07,832
a write to that
sector, the sector

569
00:29:07,832 --> 00:29:09,790
might be left in a state
that does not actually

570
00:29:09,790 --> 00:29:11,456
correspond to the
data that was intended

571
00:29:11,456 --> 00:29:13,570
to be put onto that sector.

572
00:29:13,570 --> 00:29:16,350
And so when the system
recovers you are sort of stuck

573
00:29:16,350 --> 00:29:21,210
because this data in the sector
contains some values in it that

574
00:29:21,210 --> 00:29:25,920
do not actually correspond
to any actual intended put

575
00:29:25,920 --> 00:29:29,320
of the data, any intended
write of the data.

576
00:29:29,320 --> 00:29:34,360
What this suggests is that
a solution to this problem

577
00:29:34,360 --> 00:29:36,860
must involve a
copy of some kind.

578
00:29:36,860 --> 00:29:40,340
You must make sure that if you
have just one copy of the data

579
00:29:40,340 --> 00:29:42,890
and you write to it and
something fails in the middle

580
00:29:42,890 --> 00:29:44,610
and you do not have
a plan to back out

581
00:29:44,610 --> 00:29:49,279
to an earlier working version
that was correct you are stuck.

582
00:29:49,279 --> 00:29:51,320
That suggests that we
better have a solution that

583
00:29:51,320 --> 00:29:54,770
involves a copy of data.

584
00:29:54,770 --> 00:29:57,680
Later on we will see how
to systematically [develop

585
00:29:57,680 --> 00:29:58,840
a rule?] based on this.

586
00:30:02,390 --> 00:30:04,520
The idea here is very simple.

587
00:30:04,520 --> 00:30:08,380
The way we are going to achieve
a "recoverable get of a sector"

588
00:30:08,380 --> 00:30:13,480
is actually to build a single
sector, a recoverable sector

589
00:30:13,480 --> 00:30:15,150
out of three sectors.

590
00:30:15,150 --> 00:30:19,410
The first sector here is going
to have one copy of the data,

591
00:30:19,410 --> 00:30:22,306
the second sector is going to
have another copy of the data

592
00:30:22,306 --> 00:30:24,180
and we are going to have
a third sector which

593
00:30:24,180 --> 00:30:28,090
is going to act as a flag
that allows us to choose one

594
00:30:28,090 --> 00:30:29,840
version or the other version.

595
00:30:29,840 --> 00:30:34,430
Let me call this D0,
let me call this D1

596
00:30:34,430 --> 00:30:38,620
and let me call
this the "chooser".

597
00:30:42,900 --> 00:30:49,460
Assume that at some point in
time D0 has proper data on it.

598
00:30:49,460 --> 00:30:52,140
The idea now is going to
be that anybody reading it,

599
00:30:52,140 --> 00:30:55,180
the chooser is going to
contain the value zero in it.

600
00:30:55,180 --> 00:30:58,810
Now, anybody reading is
going to read from D0.

601
00:30:58,810 --> 00:31:03,150
Anybody writing
in recoverable put

602
00:31:03,150 --> 00:31:05,260
is not allowed to write
to D0 because that is

603
00:31:05,260 --> 00:31:06,990
what people are reading from.

604
00:31:06,990 --> 00:31:08,300
Instead, they will write to D1.

605
00:31:08,300 --> 00:31:09,800
When the chooser
value is zero, they

606
00:31:09,800 --> 00:31:12,140
will start writing into D1.

607
00:31:12,140 --> 00:31:16,170
The plan is going to be that
if that write succeeds properly

608
00:31:16,170 --> 00:31:18,300
then what we will do
is go ahead and change

609
00:31:18,300 --> 00:31:22,390
the chooser from zero to
a one, and then people

610
00:31:22,390 --> 00:31:24,395
will start reading from D1.

611
00:31:24,395 --> 00:31:26,270
But if that write were
to fail in the middle,

612
00:31:26,270 --> 00:31:28,370
if the power fails or
something like that,

613
00:31:28,370 --> 00:31:33,290
D1 will be left in sort of
a weird intermediate state.

614
00:31:33,290 --> 00:31:35,010
But that is OK because
nobody is really

615
00:31:35,010 --> 00:31:36,620
going to be reading from D1.

616
00:31:36,620 --> 00:31:40,030
They are all going to be reading
from D0 because the chooser has

617
00:31:40,030 --> 00:31:42,859
not yet been changed.

618
00:31:42,859 --> 00:31:44,650
The only other thing
we have to worry about

619
00:31:44,650 --> 00:31:47,690
is now we are OK, as long
as the failure happens,

620
00:31:47,690 --> 00:31:50,202
if the failure
happens in the middle

621
00:31:50,202 --> 00:31:52,660
here somewhere where we are
writing D1 we are OK because we

622
00:31:52,660 --> 00:31:54,900
have not touched the chooser.

623
00:31:54,900 --> 00:31:57,770
If the failure happens at
the end of writing D1--

624
00:31:57,770 --> 00:32:00,590
So we have written D1
and then we have not yet

625
00:32:00,590 --> 00:32:03,990
started writing the chooser
and a failure happens here,

626
00:32:03,990 --> 00:32:06,407
we are still OK
because everybody

627
00:32:06,407 --> 00:32:07,490
will be reading from D0.

628
00:32:07,490 --> 00:32:09,550
And that is not going
to have garbage in it.

629
00:32:09,550 --> 00:32:11,650
It is not going to have
the latest value in it.

630
00:32:11,650 --> 00:32:12,380
But that is OK.

631
00:32:12,380 --> 00:32:15,160
We never said that we
should see the latest value

632
00:32:15,160 --> 00:32:16,510
for recoverability to hold.

633
00:32:16,510 --> 00:32:19,960
It is going to be OK for
us to be reading from D0

634
00:32:19,960 --> 00:32:21,830
and continue to read from D0.

635
00:32:21,830 --> 00:32:23,910
And really the
correctness of this

636
00:32:23,910 --> 00:32:26,340
boils down to understanding
what will happen

637
00:32:26,340 --> 00:32:28,850
when a failure happens
during the middle of writing

638
00:32:28,850 --> 00:32:30,357
this sector.

639
00:32:30,357 --> 00:32:32,190
You are starting to
write the chooser sector

640
00:32:32,190 --> 00:32:34,400
and the system fails.

641
00:32:34,400 --> 00:32:36,790
And we do have to worry
about that because now we

642
00:32:36,790 --> 00:32:38,831
have written D1 completely
and a failure happened

643
00:32:38,831 --> 00:32:40,700
in the middle of that.

644
00:32:40,700 --> 00:32:42,286
To understand that,
we will get back

645
00:32:42,286 --> 00:32:43,910
to understanding the
correctness of it,

646
00:32:43,910 --> 00:32:45,910
but it helps to see what
pseudo code looks like.

647
00:32:49,710 --> 00:32:53,340
So that is what put looks like.

648
00:32:53,340 --> 00:32:56,137
To do a put, you first
read the chooser sector

649
00:32:56,137 --> 00:32:57,720
and then you put
into the other place.

650
00:33:01,556 --> 00:33:02,930
This "which" here
is the thing that

651
00:33:02,930 --> 00:33:05,100
tells you what the value
of the chooser sector is.

652
00:33:05,100 --> 00:33:09,620
It tells you which of the
two copies to write into.

653
00:33:09,620 --> 00:33:13,200
And then after you do the
careful put, if which is zero,

654
00:33:13,200 --> 00:33:16,480
you put it to one, if which
is one, you put it to zero.

655
00:33:16,480 --> 00:33:19,590
After that you
twiddle a bit and then

656
00:33:19,590 --> 00:33:22,540
you do a put onto
the chooser sector.

657
00:33:22,540 --> 00:33:24,820
The get is actually easier.

658
00:33:24,820 --> 00:33:27,840
You just look at what the
value is of the chooser sector

659
00:33:27,840 --> 00:33:31,530
and then get it from
the corresponding place.

660
00:33:31,530 --> 00:33:34,590
Now, there is a line here, the
second line of this pseudo code

661
00:33:34,590 --> 00:33:36,230
which says status "not-OK".

662
00:33:36,230 --> 00:33:38,470
So status not-OK
is the key thing.

663
00:33:38,470 --> 00:33:40,360
Status not-OK is
what happens when

664
00:33:40,360 --> 00:33:43,500
a failure happens in the middle
of writing the chooser sector.

665
00:33:43,500 --> 00:33:46,950
Let's say a failure happens
on this pseudo code,

666
00:33:46,950 --> 00:33:49,830
I already explained why there is
no problem if a failure happens

667
00:33:49,830 --> 00:33:52,210
until you get to the
last line, until you

668
00:33:52,210 --> 00:33:55,290
get to the careful put
of the chooser sector.

669
00:33:55,290 --> 00:33:58,680
Until that line is executed
nobody sees the new data.

670
00:33:58,680 --> 00:34:00,162
Everybody doing a
get is continuing

671
00:34:00,162 --> 00:34:02,120
to see the old data, not
the new data that just

672
00:34:02,120 --> 00:34:04,770
got written with careful put.

673
00:34:04,770 --> 00:34:08,460
After this careful put
executes and returns then

674
00:34:08,460 --> 00:34:10,880
everybody is going to see the
new data because the chooser

675
00:34:10,880 --> 00:34:13,401
sector has been
correctly changed.

676
00:34:13,401 --> 00:34:14,900
The only tricky
part to worry about,

677
00:34:14,900 --> 00:34:18,010
we have reduced this problem of
the slightly more general case

678
00:34:18,010 --> 00:34:20,340
of writing these sectors
and switching between them

679
00:34:20,340 --> 00:34:22,409
to this specific
problem of figuring out

680
00:34:22,409 --> 00:34:24,950
what happens if a failure occurs
in the middle of the chooser

681
00:34:24,950 --> 00:34:26,179
sector's write.

682
00:34:26,179 --> 00:34:28,620
If a failure happens here,
one of the common things

683
00:34:28,620 --> 00:34:32,639
that could happen is that this
particular sector's checksum

684
00:34:32,639 --> 00:34:37,690
does not match the data
that is written here.

685
00:34:37,690 --> 00:34:39,469
So when you do a
get of that sector

686
00:34:39,469 --> 00:34:41,895
here, in the first
line up there, when

687
00:34:41,895 --> 00:34:43,270
you do a careful
get of that, you

688
00:34:43,270 --> 00:34:45,429
will find that the
checksum does not

689
00:34:45,429 --> 00:34:47,860
match so it returns
a status of not-OK.

690
00:34:47,860 --> 00:34:49,400
If the status is
not OK, you will

691
00:34:49,400 --> 00:34:52,920
have to figure out which
of the two copies to put.

692
00:34:52,920 --> 00:34:54,949
Now, it turns out
you can pick either,

693
00:34:54,949 --> 00:35:00,430
and you can arbitrarily pick
to read the data from sector zero.

694
00:35:00,430 --> 00:35:02,937
But you could pick
either of these.

695
00:35:02,937 --> 00:35:04,520
And the reason it is
it OK to pick either

696
00:35:04,520 --> 00:35:07,590
is you know for sure that the
failure must have happened here

697
00:35:07,590 --> 00:35:11,160
while writing this
chooser sector.

698
00:35:11,160 --> 00:35:13,730
And because there are no
concurrent threads going on,

699
00:35:13,730 --> 00:35:16,510
you are assured that there is
no failure that happened here

700
00:35:16,510 --> 00:35:20,260
while writing D0, nor
was there any failure

701
00:35:20,260 --> 00:35:24,672
that occurred here
while writing D1

702
00:35:24,672 --> 00:35:26,130
because the assumption
we have made

703
00:35:26,130 --> 00:35:28,500
is that there is no concurrency.

704
00:35:28,500 --> 00:35:30,580
A system crashes and
recovers and discovers

705
00:35:30,580 --> 00:35:34,320
that there is a failure, or
the careful-get of the chooser

706
00:35:34,320 --> 00:35:36,450
sector did not quite
work out, did not

707
00:35:36,450 --> 00:35:38,260
give you a status of
OK, that it was not

708
00:35:38,260 --> 00:35:40,869
OK then you know the failure
happened while writing here.

709
00:35:40,869 --> 00:35:42,910
And what that means is it
is perfectly OK for you

710
00:35:42,910 --> 00:35:44,430
to read from either version.

711
00:35:44,430 --> 00:35:48,400
Both of those
correspond to a write

712
00:35:48,400 --> 00:35:53,320
to that individual sector that
did not fail in the middle.

713
00:35:53,320 --> 00:35:56,520
And it does not matter
which of the two you pick.

714
00:35:56,520 --> 00:35:59,890
That is the reason why this
approach basically works.
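[Editor's sketch] The put and get described above can be sketched in Python as follows. This is an illustration, not the pseudocode shown in the lecture. The `Disk` class, the `Crash` exception, the `crash_on` test hook, the sector numbers and `format_disk` are assumptions introduced here, and a torn write is simulated by storing data whose checksum cannot match.

```python
import zlib

OK, NOT_OK = "OK", "NOT-OK"
D0, D1, CHOOSER = 0, 1, 2  # hypothetical sector numbers


class Crash(Exception):
    """Simulated power failure in the middle of a write."""


class Disk:
    def __init__(self):
        self.sectors = {}
        self.crash_on = None  # test hook: the next write to this sector is torn

    def careful_put(self, sector, data: bytes):
        if sector == self.crash_on:
            self.crash_on = None
            # Torn write: garbage lands with a checksum that cannot match it.
            self.sectors[sector] = (b"garbage", zlib.crc32(b"garbage") ^ 1)
            raise Crash()
        self.sectors[sector] = (data, zlib.crc32(data))

    def careful_get(self, sector):
        data, crc = self.sectors[sector]
        return data, (OK if zlib.crc32(data) == crc else NOT_OK)


def format_disk(disk, initial: bytes):
    disk.careful_put(D0, initial)
    disk.careful_put(D1, initial)
    disk.careful_put(CHOOSER, b"0")


def current_copy(disk):
    value, status = disk.careful_get(CHOOSER)
    if status == NOT_OK:
        # The chooser write was torn, so both data copies are intact.
        return D0  # arbitrary but fixed choice
    return int(value)


def recoverable_put(disk, data: bytes):
    which = current_copy(disk)
    other = D1 if which == D0 else D0
    disk.careful_put(other, data)                   # readers still use `which`
    disk.careful_put(CHOOSER, str(other).encode())  # flip the chooser last


def recoverable_get(disk):
    data, _ = disk.careful_get(current_copy(disk))
    return data


disk = Disk()
format_disk(disk, b"old")
recoverable_put(disk, b"v1")

disk.crash_on = D0                 # crash while writing the unused copy
try:
    recoverable_put(disk, b"v2")
except Crash:
    pass
after_data_crash = recoverable_get(disk)

disk.crash_on = CHOOSER            # crash while writing the chooser itself
try:
    recoverable_put(disk, b"v3")
except Crash:
    pass
after_chooser_crash = recoverable_get(disk)
```

After the first crash a get still returns `b"v1"`, because the chooser was never touched. After the second crash the chooser is unreadable, but both data sectors hold complete writes, so returning either one is acceptable; this sketch happens to return the copy in D0.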

715
00:36:02,580 --> 00:36:04,190
And if you look
at this solution,

716
00:36:04,190 --> 00:36:07,010
this copy idea is actually
a pretty critical idea

717
00:36:07,010 --> 00:36:10,515
for all of our solutions to
achieving recoverability.

718
00:36:10,515 --> 00:36:11,890
And it is going
to lead to a rule

719
00:36:11,890 --> 00:36:14,140
that we are going to
call the "Golden Rule

720
00:36:14,140 --> 00:36:16,740
of Recoverability".

721
00:36:16,740 --> 00:36:19,000
The rule says never
modify the only copy.

722
00:36:22,770 --> 00:36:26,240
If you were asked to
come up with a way

723
00:36:26,240 --> 00:36:29,449
to achieve something that is
recoverable, one guideline,

724
00:36:29,449 --> 00:36:31,490
this is unfortunately not
a sufficient condition.

725
00:36:31,490 --> 00:36:35,611
But a necessary condition is
that if you have something,

726
00:36:35,611 --> 00:36:38,110
and you only have one copy of
that which you end up writing,

727
00:36:38,110 --> 00:36:40,520
then chances are that if a
failure happens in the middle

728
00:36:40,520 --> 00:36:44,390
of writing that one copy
you cannot back out of it

729
00:36:44,390 --> 00:36:45,895
so your scheme would not work.

730
00:36:48,890 --> 00:36:50,640
So never modify the
only copy of anything,

731
00:36:50,640 --> 00:36:52,330
that is the general rule.

732
00:36:57,680 --> 00:37:00,640
Now, there is another
point to observe about

733
00:37:00,640 --> 00:37:03,840
this recoverable disk write.

734
00:37:03,840 --> 00:37:07,970
And that has to do with
that careful put line.

735
00:37:07,970 --> 00:37:10,820
Right before that
line, everybody else

736
00:37:10,820 --> 00:37:13,340
reading this recoverable
sector using recoverable

737
00:37:13,340 --> 00:37:16,180
get sees the old
version of data.

738
00:37:16,180 --> 00:37:19,260
Right after that line has
finished, everybody reading it

739
00:37:19,260 --> 00:37:21,550
sees the new data.

740
00:37:21,550 --> 00:37:23,310
That line is an
example of something

741
00:37:23,310 --> 00:37:26,670
that we will repeatedly
visit and use

742
00:37:26,670 --> 00:37:29,880
called a "commit point".

743
00:37:29,880 --> 00:37:31,580
The successful
completion of that line

744
00:37:31,580 --> 00:37:35,410
ensures that everybody
else following doing gets

745
00:37:35,410 --> 00:37:40,280
will see the data that
was written by this put.

746
00:37:40,280 --> 00:37:43,200
And before that line is run,
everybody else following

747
00:37:43,200 --> 00:37:46,880
will see the older
version of the data.

748
00:37:46,880 --> 00:37:49,680
Now, if a failure occurs
in the middle of that line

749
00:37:49,680 --> 00:37:53,930
then the answer depends on what
the recovery procedure does.

750
00:37:53,930 --> 00:37:55,710
And one approach might
be that the invoker

751
00:37:55,710 --> 00:37:58,810
of this module, the person who
originally did the disk write--

752
00:37:58,810 --> 00:38:01,100
If a failure happens in
the middle of the write,

753
00:38:01,100 --> 00:38:03,920
one plan might be that the
invoker of that disk write,

754
00:38:03,920 --> 00:38:10,474
upon recovery, tries the write
again, tries the put again.

755
00:38:10,474 --> 00:38:11,890
And the way he
tries the put is he

756
00:38:11,890 --> 00:38:14,550
first does a get and
sees what answers return.

757
00:38:14,550 --> 00:38:16,140
If the answer is
the new answer then

758
00:38:16,140 --> 00:38:17,760
he says OK everything is fine.

759
00:38:17,760 --> 00:38:19,430
If the answer is
the old answer then

760
00:38:19,430 --> 00:38:22,060
he says I am going
to retry the put.

761
00:38:22,060 --> 00:38:23,580
And this is an
example of something

762
00:38:23,580 --> 00:38:25,746
we saw the last time which
is "temporal redundancy".

763
00:38:25,746 --> 00:38:26,801
You can retry things.

764
00:38:26,801 --> 00:38:28,300
Not only can you
replicate in space,

765
00:38:28,300 --> 00:38:31,024
but you can retry things in
time which is the idea here

766
00:38:31,024 --> 00:38:32,273
for achieving fault-tolerance.
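[Editor's sketch] The retry plan just described might look like this in Python. It is only a sketch under assumptions: `put_until_visible`, the `get` and `put` callables, and the in-memory `store` are hypothetical names, and a crash is simulated by raising `OSError` before the value is installed.

```python
def put_until_visible(get, put, new_value, max_tries=3):
    """Retry `put` until a following `get` returns `new_value`."""
    for _ in range(max_tries):
        if get() == new_value:
            return True            # the earlier put did reach its commit point
        try:
            put(new_value)         # old answer seen: try the put again
        except OSError:
            pass                   # treated as a crash; the loop checks again
    return get() == new_value


store = {"value": "old"}           # assumed stand-in for the recoverable sector
failures = [True]                  # the first put fails before the commit point


def get():
    return store["value"]


def put(value):
    if failures and failures.pop():
        raise OSError("simulated crash before the commit point")
    store["value"] = value


done = put_until_visible(get, put, "new")
```

The first attempt fails and leaves the old value visible, the second attempt installs the new value, and the final check confirms it, so `done` ends up true.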

767
00:38:41,980 --> 00:38:43,990
An example of this idea
called a commit point

768
00:38:43,990 --> 00:38:46,350
is that careful put line.

769
00:38:46,350 --> 00:38:47,940
And, in general,
a commit point is

770
00:38:47,940 --> 00:38:49,970
a point in a recoverable
action, in this case.

771
00:38:49,970 --> 00:38:51,595
And it will turn out
to be an idea that

772
00:38:51,595 --> 00:38:55,150
is useful for isolated actions
and for transactions more

773
00:38:55,150 --> 00:38:55,840
generally.

774
00:38:55,840 --> 00:38:59,490
But a commit point is a point
where before the commit point

775
00:38:59,490 --> 00:39:01,675
other people do not see
the results of your action.

776
00:39:01,675 --> 00:39:03,300
And after the commit
point successfully

777
00:39:03,300 --> 00:39:05,845
finishes everybody sees
the results of your action,

778
00:39:05,845 --> 00:39:07,720
and that is the definition
of a commit point.

779
00:39:29,869 --> 00:39:32,410
Now we have to generalize this
idea because what we have seen

780
00:39:32,410 --> 00:39:33,496
is a scheme.

781
00:39:33,496 --> 00:39:35,120
By the way, is this
clear to everybody?

782
00:39:35,120 --> 00:39:40,540
Do you have any questions
about recoverable put and get?

783
00:39:40,540 --> 00:39:43,070
What does that mean?

784
00:39:43,070 --> 00:39:45,270
No questions or not clear?

785
00:39:45,270 --> 00:39:49,470
All right.

786
00:39:49,470 --> 00:39:50,810
Good.

787
00:39:50,810 --> 00:39:55,300
Now we have to
generalize this idea

788
00:39:55,300 --> 00:39:58,195
because the class of programs
where you could just sort

789
00:39:58,195 --> 00:40:00,320
of read and write from one
sector is quite limited.

790
00:40:04,060 --> 00:40:06,810
And so to generalize this idea,
what we are going to do

791
00:40:06,810 --> 00:40:10,210
is to change the
programming model

792
00:40:10,210 --> 00:40:12,974
for writing recoverable
actions a little bit.

793
00:40:12,974 --> 00:40:14,890
Ideally, what you would
like to be able to do,

794
00:40:14,890 --> 00:40:16,514
the model we are
going to try to get at

795
00:40:16,514 --> 00:40:18,930
is to be able to
take a procedure

796
00:40:18,930 --> 00:40:23,720
and begin recoverable action
in front of that procedure,

797
00:40:23,720 --> 00:40:25,500
write code for that
procedure and just

798
00:40:25,500 --> 00:40:28,710
say end recoverable action
and sort of magically end up

799
00:40:28,710 --> 00:40:33,110
with a model where the set
of steps in that action

800
00:40:33,110 --> 00:40:34,427
becomes recoverable.

801
00:40:34,427 --> 00:40:36,510
And it will turn out we
will come very, very close

802
00:40:36,510 --> 00:40:38,720
to achieving this
very general model

803
00:40:38,720 --> 00:40:41,450
by making some
slight assumptions,

804
00:40:41,450 --> 00:40:43,590
or requiring the
programmer to make

805
00:40:43,590 --> 00:40:46,455
some small assumptions in the
way they write their programs.

806
00:40:50,390 --> 00:40:53,440
And this generalization
to more general actions

807
00:40:53,440 --> 00:40:55,750
that are recoverable,
generalizing

808
00:40:55,750 --> 00:41:00,415
from a single sector uses
this idea of a commit point.

809
00:41:00,415 --> 00:41:01,790
The way this is
going to work out

810
00:41:01,790 --> 00:41:06,340
is the programmer, for
any recoverable action,

811
00:41:06,340 --> 00:41:09,640
he or she is going to end up
writing this special function

812
00:41:09,640 --> 00:41:12,620
call called begin
recoverable action

813
00:41:12,620 --> 00:41:17,260
and then writing the code
for that recoverable action.

814
00:41:17,260 --> 00:41:21,130
And then at some point in the
middle of this code calling

815
00:41:21,130 --> 00:41:25,200
a function called "commit".

816
00:41:25,200 --> 00:41:27,340
And the idea is that
until this commit

817
00:41:27,340 --> 00:41:30,480
is called nobody else sees
the results of this action.

818
00:41:30,480 --> 00:41:32,570
Which means that if
a failure happened,

819
00:41:32,570 --> 00:41:39,020
upon crash recovery or
once the system restarts,

820
00:41:39,020 --> 00:41:41,490
the result would be as if none
of the steps of this action

821
00:41:41,490 --> 00:41:43,130
ever happened.

822
00:41:43,130 --> 00:41:44,420
So they call commit.

823
00:41:44,420 --> 00:41:47,550
And then once commit finishes,
no matter what happens,

824
00:41:47,550 --> 00:41:49,820
a failure could happen
and the system restarts,

825
00:41:49,820 --> 00:41:52,210
but once commit is called
and it returns then you

826
00:41:52,210 --> 00:41:55,090
are guaranteed that all
other actions see the state

827
00:41:55,090 --> 00:41:58,760
changes made by this action.

828
00:41:58,760 --> 00:42:00,540
So this is a special call.

829
00:42:00,540 --> 00:42:03,300
And then after commit they
might have some other lines

830
00:42:03,300 --> 00:42:08,130
that they write and then they
end the recoverable action.

831
00:42:08,130 --> 00:42:10,480
Now, in many, many cases,
the very last thing

832
00:42:10,480 --> 00:42:13,660
that is done before the
end recoverable action

833
00:42:13,660 --> 00:42:15,650
is the commit.

834
00:42:15,650 --> 00:42:18,380
But, in general, you might
have other things here.

835
00:42:18,380 --> 00:42:20,910
And it will turn out that you
cannot do arbitrary things

836
00:42:20,910 --> 00:42:22,030
here.

837
00:42:22,030 --> 00:42:25,690
For example, you cannot do disk
writes that you want to make

838
00:42:25,690 --> 00:42:28,540
recoverable over here because
the moment you do that,

839
00:42:28,540 --> 00:42:30,910
by definition, if a crash
happens after a commit,

840
00:42:30,910 --> 00:42:33,170
we do not have a plan
to back out of it.

841
00:42:33,170 --> 00:42:34,750
Because the semantics
were that once

842
00:42:34,750 --> 00:42:36,270
a commit is done
then no matter what

843
00:42:36,270 --> 00:42:37,686
happens the state
of the system is

844
00:42:37,686 --> 00:42:41,690
as if all of the things
in this action finished.

845
00:42:41,690 --> 00:42:43,920
The discipline is going to
be, this thing is called

846
00:42:43,920 --> 00:42:48,730
the "pre-commit phase" and
this thing here is called

847
00:42:48,730 --> 00:42:49,790
the "post-commit phase".

848
00:42:53,830 --> 00:42:56,660
And so the idea is that
in the pre-commit phase

849
00:42:56,660 --> 00:42:59,240
you should always be
prepared to back out.

850
00:42:59,240 --> 00:43:02,930
Because, by definition, if the
failure occurs before commit

851
00:43:02,930 --> 00:43:06,120
is called the result is going to
be as if nothing ever happened,

852
00:43:06,120 --> 00:43:08,280
which means that any
change you make here

853
00:43:08,280 --> 00:43:10,510
you better religiously
follow that "never

854
00:43:10,510 --> 00:43:15,310
modify the only copy" rule
and be prepared to back out.

855
00:43:15,310 --> 00:43:19,990
In the post-commit
phase, conversely, you

856
00:43:19,990 --> 00:43:22,074
don't have the
option to back out

857
00:43:22,074 --> 00:43:23,990
so you better make sure
that once you get here

858
00:43:23,990 --> 00:43:26,270
you just run to completion.

859
00:43:26,270 --> 00:43:29,380
If a failure occurs out
here and you restart,

860
00:43:29,380 --> 00:43:32,259
you better make sure that
you can run to completion.

861
00:43:32,259 --> 00:43:34,050
In fact, there are a
few other restrictions

862
00:43:34,050 --> 00:43:36,990
out in the post-commit phase.

863
00:43:36,990 --> 00:43:38,580
Let me do this by an example.

864
00:43:38,580 --> 00:43:41,300
In the pre-commit phase, because
you have to be prepared to back

865
00:43:41,300 --> 00:43:43,740
out, it often means in practice
that you cannot be sending

866
00:43:43,740 --> 00:43:46,320
messages out onto the network.

867
00:43:46,320 --> 00:43:48,320
You can maintain
your local state

868
00:43:48,320 --> 00:43:51,680
but you have a way
to back out of that.

869
00:43:51,680 --> 00:43:54,330
But if you are sending messages
out onto the network and you

870
00:43:54,330 --> 00:43:58,120
do not have a bigger
story to deal with it--

871
00:43:58,120 --> 00:44:01,459
We will talk later about nesting
atomic actions within one

872
00:44:01,459 --> 00:44:03,500
another or nesting
recoverable actions within one

873
00:44:03,500 --> 00:44:05,060
another in a few
lectures from now.

874
00:44:05,060 --> 00:44:07,560
But, in the simple model, if
you do anything that you cannot

875
00:44:07,560 --> 00:44:10,150
back out of such as sending
a network packet then you are

876
00:44:10,150 --> 00:44:10,820
stuck.

877
00:44:10,820 --> 00:44:15,100
So all of that stuff like
printing out checks or firing

878
00:44:15,100 --> 00:44:17,900
a bullet or things like that,
that you cannot back out

879
00:44:17,900 --> 00:44:20,390
of, you better put out here.

880
00:44:20,390 --> 00:44:22,530
All the things that you
can back out of go here.

881
00:44:22,530 --> 00:44:25,026
Likewise, nothing you can
back out of can go here.

882
00:44:25,026 --> 00:44:27,150
Because, once you reach
here and a failure happens,

883
00:44:27,150 --> 00:44:28,952
you have to continue
to completion.

884
00:44:28,952 --> 00:44:31,160
What that means is in the
post-commit phase, really,

885
00:44:31,160 --> 00:44:32,840
you cannot do very many things.

886
00:44:32,840 --> 00:44:36,560
I mean you can do things that
do not really have, for example,

887
00:44:36,560 --> 00:44:38,994
you can do things that
are OK to keep doing.

888
00:44:38,994 --> 00:44:40,910
For example, you can do
idempotent operations

889
00:44:40,910 --> 00:44:43,505
so that if a failure happens
here and you recover then

890
00:44:43,505 --> 00:44:45,130
you know that you
are out at this point

891
00:44:45,130 --> 00:44:47,630
so you could keep retrying those
actions over and over again

892
00:44:47,630 --> 00:44:50,950
until you ensure
that it completes.

893
00:44:50,950 --> 00:44:52,964
But those are the only rules.

894
00:44:52,964 --> 00:44:55,130
There is a pre-commit phase
and a post-commit phase.

895
00:44:55,130 --> 00:44:58,077
There is a commit that
is explicitly called.

896
00:44:58,077 --> 00:44:59,660
Now, in addition
there is another call

897
00:44:59,660 --> 00:45:02,180
that a programmer can make
or that the system can

898
00:45:02,180 --> 00:45:04,690
invoke automatically and
that is called "abort".

899
00:45:08,652 --> 00:45:11,110
For example, when you are moving
money from savings account

900
00:45:11,110 --> 00:45:13,800
to checking account in
that transfer example,

901
00:45:13,800 --> 00:45:17,440
if you discover in the
middle here that you do not

902
00:45:17,440 --> 00:45:19,250
have enough funds to
cover that transfer,

903
00:45:19,250 --> 00:45:26,000
you could just decide to
abort the recoverable action.

904
00:45:26,000 --> 00:45:29,020
And what that means is
that abort automatically

905
00:45:29,020 --> 00:45:30,890
will ensure that the
state of the system

906
00:45:30,890 --> 00:45:34,300
is at the point right
before the start

907
00:45:34,300 --> 00:45:35,504
of the recoverable action.

908
00:45:35,504 --> 00:45:37,670
Whatever changes were made
in the middle until abort

909
00:45:37,670 --> 00:45:41,152
was called end up backing out.

910
00:45:41,152 --> 00:45:43,110
Now, abort might also be
invoked by the system.

911
00:45:43,110 --> 00:45:46,300
In a database, there is somebody
booking airline tickets, car

912
00:45:46,300 --> 00:45:48,800
reservations and all of that,
and you discover in the middle

913
00:45:48,800 --> 00:45:50,870
that you are not
actually able to find

914
00:45:50,870 --> 00:45:53,640
a hotel for the same dates.

915
00:45:53,640 --> 00:45:56,610
So you might just abort the
whole process, Control-C

916
00:45:56,610 --> 00:45:58,320
the thread you
are running, which

917
00:45:58,320 --> 00:46:00,460
means that all of the
work that has been done

918
00:46:00,460 --> 00:46:01,430
has to be backed out.

919
00:46:01,430 --> 00:46:03,388
And so the system would
normally implement that

920
00:46:03,388 --> 00:46:06,260
by aborting all of the changes
that you have made so far.

921
00:46:06,260 --> 00:46:09,710
It will back out of
your car reservation,

922
00:46:09,710 --> 00:46:11,750
back out of your airline
reservation and so on.

923
00:46:11,750 --> 00:46:14,140
So abort is called in a
few different contexts.

924
00:46:14,140 --> 00:46:16,090
Sometimes by the program
itself, sometimes

925
00:46:16,090 --> 00:46:18,180
by the system to
free up resources,

926
00:46:18,180 --> 00:46:22,400
sometimes by the user of
your, say, transaction system.

927
00:46:22,400 --> 00:46:38,160
I am not going to get into how
we implement recoverable actions

928
00:46:38,160 --> 00:46:40,080
today, but this
programming model

929
00:46:40,080 --> 00:46:41,300
is important to understand.

930
00:46:41,300 --> 00:46:43,040
I do want to mention
one thing going back

931
00:46:43,040 --> 00:46:45,840
to this idea of isolation
that we talked about.

932
00:46:45,840 --> 00:46:47,490
If you recall,
isolation is this idea

933
00:46:47,490 --> 00:46:51,240
that you have two actions or
multiple actions whose net

934
00:46:51,240 --> 00:46:53,440
effect is as if they ran
in some sequential order,

935
00:46:53,440 --> 00:46:55,910
some serial order,
A1 before A2 or A2

936
00:46:55,910 --> 00:47:01,220
before A1, for any permutation
of A1, A2, A3, etc.

937
00:47:01,220 --> 00:47:03,950
Now, this idea is actually
very closely related

938
00:47:03,950 --> 00:47:06,625
but not the same as stuff
we have seen before.

939
00:47:06,625 --> 00:47:08,000
Earlier in the
semester we looked

940
00:47:08,000 --> 00:47:11,834
at ways in which you have
multiple threads that

941
00:47:11,834 --> 00:47:13,500
need to be synchronized
with each other.

942
00:47:13,500 --> 00:47:16,540
And we actually did look
at isolation as a concept

943
00:47:16,540 --> 00:47:19,330
then but we specifically
focused on things

944
00:47:19,330 --> 00:47:20,950
like sequence
coordination where you

945
00:47:20,950 --> 00:47:23,470
want to have one thread run
before the other thread or one

946
00:47:23,470 --> 00:47:25,650
thread run after
the other thread.

947
00:47:25,650 --> 00:47:29,810
For example, in a
producer-consumer relationship.

948
00:47:29,810 --> 00:47:34,790
The point is that in
one significant respect,

949
00:47:34,790 --> 00:47:37,850
achieving this idea of
isolation for actions

950
00:47:37,850 --> 00:47:42,040
is harder than achieving
sequence coordination.

951
00:47:42,040 --> 00:47:44,600
And the reason it is
harder is that everybody

952
00:47:44,600 --> 00:47:47,640
who writes an isolated action,
in general, does not know,

953
00:47:47,640 --> 00:47:49,200
any given isolated
action does not

954
00:47:49,200 --> 00:47:52,120
know what other actions
there are in the system.

955
00:47:52,120 --> 00:47:55,000
So you might have 25
different actions all of which

956
00:47:55,000 --> 00:47:57,956
are touching the same
data, but no single action

957
00:47:57,956 --> 00:47:59,580
is aware of all of
these other actions.

958
00:47:59,580 --> 00:48:02,490
That is very different
from sequence coordination.

959
00:48:02,490 --> 00:48:04,800
In sequence coordination,
there is one or two

960
00:48:04,800 --> 00:48:07,917
or a small number of threads
that are actually aware

961
00:48:07,917 --> 00:48:08,500
of each other.

962
00:48:08,500 --> 00:48:10,040
And there is a single
programmer that is actually

963
00:48:10,040 --> 00:48:12,050
designing these things
to specifically interact

964
00:48:12,050 --> 00:48:14,814
with each other in some fashion,
so this thread runs and then

965
00:48:14,814 --> 00:48:16,230
this other one
runs after the data

966
00:48:16,230 --> 00:48:18,080
has been produced and so on.

967
00:48:18,080 --> 00:48:20,230
In that sense, this
kind of isolation

968
00:48:20,230 --> 00:48:25,550
is harder to achieve because
each individual action does not

969
00:48:25,550 --> 00:48:26,980
know which other
actions there are.

970
00:48:26,980 --> 00:48:31,312
But, yet, you want to
achieve this sequential goal.

971
00:48:31,312 --> 00:48:32,770
Now, in one other
respect, actually

972
00:48:32,770 --> 00:48:36,720
isolated actions are easier
than sequence coordination.

973
00:48:36,720 --> 00:48:38,720
And the significant way
in which they are easier

974
00:48:38,720 --> 00:48:40,178
is they are easier
for programmers.

975
00:48:42,300 --> 00:48:45,130
Because we are not worried about
coordinating different actions

976
00:48:45,130 --> 00:48:50,680
with each other, once you design
a system that inside the system

977
00:48:50,680 --> 00:48:53,810
deals with ways of
achieving isolation,

978
00:48:53,810 --> 00:48:56,360
the programmers do not have to
think about locks and unlocks

979
00:48:56,360 --> 00:48:59,210
and acquiring and releasing
locks or other ways

980
00:48:59,210 --> 00:49:04,155
in which they control access to
variables that might be shared.

981
00:49:04,155 --> 00:49:06,530
What this means is that if we
can design isolated actions

982
00:49:06,530 --> 00:49:10,610
right and we do not worry
about any serial order,

983
00:49:10,610 --> 00:49:12,980
A1 can run before
A2 or A2 before A1,

984
00:49:12,980 --> 00:49:15,940
then it makes life a lot
easier for a programmer.

985
00:49:15,940 --> 00:49:17,610
And our goal is to
come up with ways

986
00:49:17,610 --> 00:49:20,400
of achieving recoverability
and isolation that

987
00:49:20,400 --> 00:49:22,990
require very little
from a programmer that

988
00:49:22,990 --> 00:49:24,600
wants these properties.

989
00:49:24,600 --> 00:49:26,590
It is a little bit
like pixie dust.

990
00:49:26,590 --> 00:49:28,410
You might write
a general program

991
00:49:28,410 --> 00:49:31,000
and come in and just put a
begin recoverable action,

992
00:49:31,000 --> 00:49:34,177
end recoverable action and make
a few changes to your program.

993
00:49:34,177 --> 00:49:36,010
Or you might just say
begin isolated action,

994
00:49:36,010 --> 00:49:37,450
end isolated action,
and magically

995
00:49:37,450 --> 00:49:40,450
the system achieves isolation
or recoverability for you.

996
00:49:43,640 --> 00:49:45,710
It can make life much
easier for a programmer

997
00:49:45,710 --> 00:49:47,240
but it is a harder
problem for us

998
00:49:47,240 --> 00:49:49,340
because no single
action is aware of all

999
00:49:49,340 --> 00:49:51,571
of the other actions
in the system.

1000
00:49:51,571 --> 00:49:54,070
Next time we will see how to
achieve recoverability and then

1001
00:49:54,070 --> 00:49:55,680
isolation and transactions.

1002
00:50:07,190 --> 00:50:09,300
Design Project 2 is
out on the website now.

1003
00:50:09,300 --> 00:50:12,300
And the main thing for you
to make sure you do this week

1004
00:50:12,300 --> 00:50:16,700
is get project partners and
send a list of team members

1005
00:50:16,700 --> 00:50:19,460
to your teaching assistant
by Thursday's recitation.

1006
00:50:19,460 --> 00:50:21,010
Thanks.