We are continuing our discussion of fault tolerance and atomicity. Teaching these lectures makes me feel, at the beginning, like those TV shows that always recap everything that happened so far in the season before a new episode. So we will do the same thing.

The story so far is that, in order to deal with failures, we came up with the idea of giving modules the property of atomicity, which actually has two aspects to it. One is an all-or-nothing aspect, which we call recoverability, and the other is a way to coordinate multiple concurrent activities so you get the illusion that they are all separate from each other. We call that isolation.

The basic rule for achieving recoverability was one rule we applied repeatedly, the "Golden Rule of Recoverability": never modify the only copy. We used that rule to build up the idea of a recoverable sector, and we used the recoverable sector to come up with two schemes for achieving recoverability. One used version histories, where you had a special case of the "never modify the only copy" rule, namely never modify anything: for any given variable you create lots and lots of versions, never updating anything in place. Then we decided that was inefficient, so we came up with a different way of achieving recoverability using logging.

Then, for isolation, we talked last time about serializability, where the goal is to allow the steps of the different actions to run in such a way that the result is as if they had run in some serial order. And we talked about a way of achieving that with cell storage. In particular, we talked about using locks as an abstraction, as a programming primitive, to achieve isolation. The key idea we saw was the connection between serializability and the absence of cycles in a data structure called the action graph.
As long as you could argue that, for a given method of locking, the resulting action graph had no cycles, you were guaranteed serializability, and therefore the scheme provided isolation. In particular, a scheme we looked at near the end was two-phase locking, where the idea is that you never acquire a lock for any item if you have already released any other lock in the atomic action so far.

That is why it is called two-phase locking: there is a lock acquisition phase, in which locks are only being acquired and nothing is being released, so the number of locks held only grows with time; and then there is a certain point in the action after which you can only release locks. The moment you have released any lock, you cannot acquire another one.

The way we argued that this protocol achieves isolation was to consider the action graph resulting from some execution of two-phase locking and argue that if there were a cycle in that action graph, then two-phase locking would have been violated. Therefore two-phase locking produces an action graph with no cycles and achieves serializability.

Two-phase locking is fine, and it is a really good idea if you are into using locks. It has the property that the action does not need to know at the beginning which data items it is going to access. All you need to do is make sure that you do not release anything until everything has been acquired; you do not have to know which locks to acquire before the start of the action. You just keep acquiring them on demand until, typically, you get to the commit point, and once you commit you can release the locks.

Now, in theory you can release a lock at any time once you are sure you are not going to acquire any more locks, but that theoretical approach only works if you are guaranteed that there will be no aborts.
In general, you cannot know beforehand when an action might abort. The system might decide to abort an action for a variety of reasons, and we will see some of those reasons today. In practice, what happens when you abort is that you go back through the log and run the undo steps associated with the steps that happened in the action. Because the undo goes into cell storage and uninstalls whatever changes were made, it had better be the case that, when abort starts undoing things, the cell storage items being undone still have their locks owned by the action that is doing the undo.

What this means is that if you have a set of statements doing, say, a read of x and a write of y, followed by the commit point, then the last read or write happens before the commit; after that you might be doing some computation that does not involve any reads or writes. The action might abort anywhere along the way, because the process, or this thread, might be terminated, and then the action will have to abort. So the lock on any data item that abort needs in order to undo the state changes had better not be released before that point. If the lock were released earlier, some other action could have acquired it and started working with the changes made by this action, and then it is too late to abort: someone else has already seen the changes. In fact, you cannot even be guaranteed that you can later regain the lock, and the results would be wrong.

So the two-phase locking rule really is that you cannot release any lock until all of the locks have been acquired.
And, moreover, any locks that are needed for abort to run successfully had better not be released until you are sure the action won't abort anymore. The only time you can be sure the action won't abort anymore is once the commit has been done. What that really means is that the release of the locks on all of the items required for undoing the action had better happen after the commit point; and, in addition, no locks at all should be released until all of the acquires have been done.

The reason I have said this in two parts is that if you are just reading a data item, you do not actually need to hold onto that item's lock in order to do the undo, because all you did was read x. No change happened to x. You do need to acquire the lock on x in order to read it, because you do not want other people making changes to it while you are reading it, but you do not need to keep holding it for the undo, because you are not writing x during the undo step.

So that is the amendment to the two-phase locking rule: locks on things you need in order to do undos should only be released after you are sure that no abort will happen, which means after the commit point.

This way of doing two-phase locking is actually a pretty good scheme, and it turns out that, in many ways, it is the most efficient and most general method. There may be special cases where other locking protocols perform better than two-phase locking, but if you have bought into using locks for concurrency control and you do not know very much about the nature of the actions involved, then two-phase locking is quite efficient. There are variants of two-phase locking, but by and large it is very hard to do much better than this in a general sense if you are using locking for concurrency control.
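To make the discipline concrete, here is a minimal sketch in Python. The Transaction class, the lock table, and the undo-log representation are hypothetical names chosen for illustration, not part of any particular system: locks are acquired on demand as items are touched, and nothing is released until commit, or until abort has finished its undo steps.

    import threading

    class Transaction:
        """Sketch of strict two-phase locking: locks are acquired on demand as
        data items are touched, and nothing is released until after commit, or
        until abort has finished running its undo steps."""

        def __init__(self, lock_table):
            self.lock_table = lock_table   # item name -> threading.Lock
            self.held = []                 # locks acquired so far (growing phase)
            self.undo_log = []             # (item, old value) pairs for abort

        def read(self, store, item):
            self._acquire(item)
            return store[item]

        def write(self, store, item, value):
            self._acquire(item)
            self.undo_log.append((item, store[item]))  # keep the old copy around
            store[item] = value

        def _acquire(self, item):
            lock = self.lock_table[item]
            if lock not in self.held:      # growing phase: acquire, never release
                lock.acquire()
                self.held.append(lock)

        def commit(self):
            self._release_all()            # the shrinking phase starts only here...

        def abort(self, store):
            for item, old in reversed(self.undo_log):
                store[item] = old          # undo still owns every lock it needs
            self._release_all()            # ...or here, once undo is finished

        def _release_all(self):
            for lock in self.held:
                lock.release()
            self.held.clear()

The point the sketch captures is the amended rule above: release happens in exactly one place, after the commit point or after the undo steps have run.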
But there is a set of problems. It is not the case that two-phase locking, as we have described it so far, completely solves the problem of ensuring that actions perform well. A particular problem that arises any time you use locks like this is deadlock. We have actually seen deadlocks before, in an earlier chapter, when we talked about synchronization of threads; it is exactly the same problem, and the way you deal with it is pretty much the same.

What is the problem here? Well, one action might do a read of x and a write of y, and another action might do a read of y and a write of x. Now the acquires get interleaved: the first action acquires lx and then goes to acquire ly, while the second action acquires ly and then goes to acquire lx. Once both actions have gotten that far, you are stuck: each one has to wait until the other releases its lock, and neither can make progress.

There are a few different ways of dealing with this. The simplest way, and the one that is often used in practice, both because it is simple and because once you implement it you do not have to do much else, is to just set timers on actions, in other words to time out. If you notice that an action has not made any progress for some period of time, via a timeout associated with the action, or perhaps because another thread notices it, then just go ahead and abort that action. It is perfectly OK to abort, and in this particular case aborting either of the actions is enough: the other will make progress, and the aborted action can then retry. So the first solution is to just use a timer.
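As a rough sketch of what that looks like in code, assuming hypothetical locks lx and ly for the two items in the example, one might wrap every acquire in a timeout and treat a timeout as a signal to abort and retry; this is only an illustration, not code from the lecture.

    import threading

    # Hypothetical locks for the two data items: action A acquires lx and then
    # wants ly, while action B acquires ly and then wants lx.
    lx, ly = threading.Lock(), threading.Lock()

    class DeadlockSuspected(Exception):
        """Raised when a lock cannot be obtained within the timeout."""

    def acquire_with_timeout(lock, seconds=3.0):
        # Rather than waiting forever, give up after a while; the caller then
        # aborts (undoing its changes, as described earlier) and retries.
        if not lock.acquire(timeout=seconds):
            raise DeadlockSuspected("no progress for %.1fs, aborting this action" % seconds)

    def action_a():
        acquire_with_timeout(lx)
        acquire_with_timeout(ly)      # times out if action B already holds ly
        # ... read x, write y, commit, then release both locks ...

    def action_b():
        acquire_with_timeout(ly)
        acquire_with_timeout(lx)      # times out if action A already holds lx
        # ... read y, write x, commit, then release both locks ...

If the two actions end up waiting on each other, one of them eventually gives up and aborts, which is exactly the timer-based resolution described above.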
There is a school of thought that believes that, in practice, deadlocks should not be very common. The reason is that a deadlock requires contention: multiple threads contending for the same resources, and more than one resource, because with just one resource you cannot really get a deadlock. So a deadlock means you are running multiple actions that are contending for a number of different shared objects.

What that suggests is that if there is that high a degree of concurrency and shared contention, it may be hard to get high performance anyway. A lot of people think that the right way to design applications is to try hard to ensure that the degree of sharing of objects is quite small. For example, rather than setting up a lock on an entire big database table, you might set up locks at finer granularities. If you lock at finer granularities, the chances of multiple actions wanting access to the exact same fine-grained entry in a table might be small. In that situation, given that deadlocks are rare, timing out every once in a while and aborting an action is not catastrophic. It is OK; it is a rare event. So rather than spending a whole lot of complexity dealing with that rare event, just go ahead and let an action that has not made any progress abort.

Moreover, these timers are necessary anyway, because an action might get stuck in an infinite loop, or get stuck in a situation where it is not really waiting for a lock, there is just a bug in it: it is not making any progress, maybe it is consuming resources, and no one else can make progress. So the system needs a way to abort those actions, and it needs a timeout mechanism anyway.
So why not just use that same mechanism to deal with deadlocks as well?

Probably a minority, but some other people believe that deadlocks will happen and, when they do happen, perhaps because the granularity of locking in your system is not fine, you do not want to get stuck. You want to do reasonably well rather than waiting for some long timeout period before aborting an action. People who believe that build a data structure called the "Waits-For Graph".

The best way to understand this is to imagine you have a database system that supports isolation, and any time you want to acquire a lock you send a message to an entity in the database system called the lock manager asking to acquire it, and any time you release a lock you do the same thing. For each lock, the lock manager can keep track of which of the concurrently running actions has acquired that lock and which actions are waiting for it. What you can do now is build up a graph of actions and locks and look to see whether there is a cycle: action A is waiting for lock B, lock B is held by action C, action C is waiting for lock D, and lock D is held by action A. When you have a cycle in this graph you know you have a deadlock, none of those actions can make progress, so go ahead and kill one. You can be sophisticated about deciding which one to kill. You might kill, for example, the one that has been waiting the shortest amount of time, because the others have been waiting longer and might make progress sooner, or you might have other policies for deciding which ones to kill.

In practice, both of these approaches are used, sometimes combined in the same system. For example, if you look at an Oracle database system, it uses primarily timers. At least from what I could tell, it does not seem to have any mechanism for really doing this check of a Waits-For graph.
It just uses timers. One of the oldest transaction processing systems was a system called CICS from IBM, which also basically used timers. But there are other systems: IBM has a system called DB2, and Microsoft has SQL Server, and both use this Waits-For data structure. In fact, Microsoft's system seems to have a hundred thousand different knobs for deciding how to handle deadlocks, including the ability to set various priorities on the different actions that might be running. It is not actually apparent that those knobs are useful for anything, or how you should set them, but they give you a lot of things you could set. Sounds familiar.

Now, you can combine these two, and I think certain products do combine these two ideas. One decision you have to make is when to check the Waits-For graph. An aggressive way of doing it is that the moment anybody does an acquire or a release, in particular an acquire, you update your lock manager's data structure and immediately look to see if you have a cycle. Of course that takes time and effort. You might decide not to bother with that and instead periodically look for cycles in the Waits-For graph when a timer fires, say every three seconds. So you might combine these ideas in a bunch of different ways.

Now, if you recall from several lectures ago, another way of dealing with deadlock is to put all of the locks that an action might acquire in a particular order and ensure that all of the actions acquire their locks in exactly that same order. That ensures there are no cycles, because everyone has to go in the same order, but the idea requires you to know beforehand which data items you wish to access. That is often not possible in the systems in which you care about isolation, so it is usually not adopted, at least not in database systems.

OK, so we talked about deadlocks.
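To make the Waits-For idea concrete, here is a minimal sketch of a lock manager that records wait edges and checks for a cycle on every acquire. The class, the record layout, and the abort-the-newest-waiter policy are illustrative choices, not how DB2 or SQL Server actually implement it.

    from collections import defaultdict

    class LockManager:
        """Sketch of a lock manager that maintains a waits-for graph."""

        def __init__(self):
            self.holder = {}                      # lock -> action currently holding it
            self.waiting_for = defaultdict(set)   # action -> set of actions it waits on

        def request(self, action, lock):
            owner = self.holder.get(lock)
            if owner is None or owner == action:
                self.holder[lock] = action        # granted immediately
                return True
            self.waiting_for[action].add(owner)   # record the wait edge
            if self._reaches(owner, action):      # did this edge close a cycle?
                self.waiting_for[action].discard(owner)
                raise RuntimeError("deadlock: abort %r (the newest waiter)" % action)
            return False                          # caller blocks until released

        def release(self, action, lock):
            if self.holder.get(lock) == action:
                del self.holder[lock]
            # a real manager would now grant the lock to some waiter and
            # remove the corresponding wait edges

        def _reaches(self, start, target):
            """Depth-first search over wait edges: can `start` reach `target`?"""
            stack, seen = [start], set()
            while stack:
                a = stack.pop()
                if a == target:
                    return True
                if a in seen:
                    continue
                seen.add(a)
                stack.extend(self.waiting_for.get(a, ()))
            return False

A lazier variant would skip the check inside request() and instead run the same reachability search over all recorded edges when a periodic timer fires.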
We talked about when you can release a lock: any lock that abort would need cannot, in practice, be released until the commit point. The last issue we need to talk about is an interaction between logs and locks.

We already saw what happens when you abort: you need to undo, so you had better make sure you hold the locks for those cell items in order to do the undo. But you do not have to abort; suppose instead the system crashes and recovers. When it recovers, it is going to run a recovery procedure, which is some combination of redoing the winners and undoing the losers. When it is undoing and redoing things, it needs access to items in the cell store. And we have already seen that when the system is running normally, in order to change items in cell storage you need to hold the locks. The question now is, during crash recovery, while the system is running this redo and undo procedure, where do you get these locks from, and do you need to acquire them at all?

In general, the answer might be that you need to be very careful and perhaps do need the locks when running recovery. But there is one simplification that systems typically make that eliminates the requirement, and that simplification is that during crash recovery you do not allow new actions to run. When a system crashes and is recovering, do not allow new actions to run until recovery is complete, and only then start new actions. What this means is that we just have to worry about ensuring isolation during recovery without new actions coming in and muddling things up.
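Here is a minimal sketch of such a recovery pass, assuming a hypothetical in-memory log of update records and commit markers; it is only meant to show the shape of redo-the-winners, undo-the-losers, run while no other actions are in the system.

    def recover(log, cell_store):
        """Sketch of a recovery pass over an undo/redo log. Records are assumed
        to be dicts like {"type": "UPDATE", "action": id, "item": name,
        "old": value, "new": value} plus {"type": "COMMIT", "action": id}
        markers. No locks are taken: no new actions run until this completes."""
        committed = {rec["action"] for rec in log if rec.get("type") == "COMMIT"}

        # Redo the winners: reapply, in log order, every change made by a
        # committed action, so cell storage reflects its effects.
        for rec in log:
            if rec.get("type") == "UPDATE" and rec["action"] in committed:
                cell_store[rec["item"]] = rec["new"]

        # Undo the losers: walk backwards and restore the old values written
        # by actions that never committed.
        for rec in reversed(log):
            if rec.get("type") == "UPDATE" and rec["action"] not in committed:
                cell_store[rec["item"]] = rec["old"]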
The question really to think about is whether, before the crash, you need the log to keep track of which locks were being held while the system was running normally, because the log is the only thing you have in order to do recovery. If the log had to encode the locks that were being held, that could be quite complicated and a little bit messy. But if you think about it, the nice thing is that we do not have to store the locks at all; the locks can be left out of the log entirely. The reason is that if you have a log with various redo and undo items, and some log entry records an update to an item x, then you know for sure that at the time this log entry was written, the action making the update did hold the lock on x, and that the change written to the log was, assuming the locking protocol was correct, isolated from everything else that was going on concurrently. So, although the locks are not explicit, the log encodes some serial order of execution that did provide isolation before the crash. Therefore, if you just go back through the log and make those changes in sequential order, you are assured that the changes you make are isolated from one another. So you do not have to worry about storing the locks in the log before the crash, and that makes life quite simple.

That wraps up the discussion of atomicity and, in particular, isolation. For the rest of today and next time we are going to be talking about some uses of atomicity. And the plan is the following. The first application of atomicity, which is actually the umbrella for a number of things we are going to look at, is a transaction.
A transaction is defined as an atomic action that has a few other properties. The first property is consistency and the second property is durability. And the second thing we are going to look at, next lecture actually, is atomicity when you have a distributed system: using atomicity on one computer to build a system that provides atomicity across a distributed system.

So we will talk about consistency for the rest of today, and the recitation tomorrow looks at a paper on reconciling replicas, which is a particular aspect of consistency. Then next week's lecture will talk about multi-site atomicity, and next week's recitation will talk about durability. Once we do all of that, that pretty much wraps up the fault-tolerance part of 6.033.

Let me first talk a little bit about transactions. A transaction is an atomic action that has two other properties associated with it. People in the literature often, in colloquial terms, refer to transactions as having the ACID property, where ACID stands for atomicity, consistency, isolation and durability. You will see this term a great deal in the literature, and people use it all the time. For various reasons, the way we have done things in this class, some of these terms are used in slightly different ways from the ACID usage. When most people, at least in distributed systems and database systems, use the word atomicity, what they mean is what we meant by recoverability: it is all or nothing. When they use the letter I for isolation, they mean exactly the same thing we did. And consistency and durability are going to mean exactly the same thing in both usages.

But really the point to notice is that the first two properties, atomicity and isolation, are independent of the application. They are simply properties of atomic actions: an atomic action can be recoverable and it can be isolated.
You do not have to worry about what the application is. It could be an application in a database system; it could be something in a processor where you are trying to provide recoverability or isolation for instructions. These are, in some sense, more fundamental, lower-layer properties than the other two.

What consistency means is that some application-specific invariant holds. Consistency of a transaction says that if a transaction commits, then some set of consistency invariants must hold; I will describe some examples of what this means. Consistency just says that there are some application-specific invariants that must hold.

And durability says that if a transaction commits, then the state changes it has made, the data items it has changed, have to last for some period of time, and the period of time they have to last for is defined by the application. There are many examples. A simple example of durability might be that the changes made by an atomic action just have to last until the entire thread finishes. At the other extreme, you could get into semantics of durability which say that the changes made by an atomic action have to last for three years, or five years, or forever, which is a really hard thing to solve. But you might define semantics that relate to the permanence of data: for how long do you want the changes that you made to last and be visible to other atomic actions?

There are two cases for consistency that we need to talk about. The first one is consistency in a centralized system. The most common example of this is in database systems that support transactions, where you might have rules, also called integrity rules, for deciding whether to allow a transaction to commit or not. Let me give you a couple of examples.
Let's say that you have a relational database system, where all of the data is stored in tables. For example, you might have a table storing a student ID, a student name and, let's say, the ID of the department that the student belongs to. And you might have another table in your system that stores a department ID and a department name.

Now, you might have a transaction that makes updates to entries in the student table; it might update one or more rows, or just specific cells, or it could add a new student ID, a name and some department ID.

The kind of constraint we are worried about, the kind of invariant, is something where the person who designed this database might say that you are not allowed to add a department ID that is nonexistent. What that means, with these two tables, is that you should not allow any transaction to write a department ID into the student table that is not already in the department table. So 43 and 25 might be in the department table, but a number that is not in that table should not be added to the student table. The transaction processing system will, in fact, not allow a transaction to commit if it is writing a value that is not in the other table. For those familiar with relational databases, call these two tables T1 and T2: department ID might be the primary key of T2 and be defined as what is called a foreign key in T1, which means you are not allowed to add a value to the foreign key column if it is not already in the table where that same column is a primary key. There are rules like this in most relational database systems, and a variety of rules like this all have to do with maintaining the integrity of the data that you add.
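As a rough sketch of the check just described, here is what a commit-time foreign-key test might look like; the in-memory tables and function names are made up for illustration, and a real relational system would express the same thing declaratively with a FOREIGN KEY constraint rather than hand-written code.

    # Hypothetical in-memory stand-ins for the two tables: T1 holds
    # (student_id, name, dept_id) rows and T2 maps dept_id -> department name.
    students = [(1, "Alice", 43), (2, "Bob", 25)]
    departments = {43: "EECS", 25: "Mathematics"}

    def check_integrity(new_rows, departments):
        """Commit-time check of the foreign-key invariant: every dept_id
        written into the student table must already be a key of T2."""
        for student_id, name, dept_id in new_rows:
            if dept_id not in departments:
                return False          # invariant violated: force an abort
        return True

    def commit(new_rows):
        if not check_integrity(new_rows, departments):
            raise RuntimeError("abort: unknown department ID")
        students.extend(new_rows)     # only now do the changes take effect

    commit([(3, "Carol", 43)])        # fine: department 43 exists
    # commit([(4, "Dave", 99)])       # would abort: 99 is not in T2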
Now, these rules have nothing to do with isolation. They have to do with atomicity, because the rules are typically checked at the commit point; until then anything could happen. So, right before you commit, there are these application-specific invariants on the data that you need to check. It has nothing to do with locks in particular; it sort of presumes atomicity, and on top of that it checks these application-specific rules.

And you can get quite sophisticated. Things like primary keys and foreign keys are checked by most transaction processing systems, but you could get much more sophisticated about these rules. For example, say you have a database storing employees and their salaries. You could have a rule that says any time an employee gets a raise, everybody else in the same peer group also gets some kind of raise, and you would not allow any transaction to commit that did not ensure that invariant holds. Checking such things can be quite difficult, and most systems do not do a really good job of it. The sets of rules they allow you to write are quite limited, because checking them is hard: when you are trying to commit a transaction you might have to check a large number of rules, and some of them could be both time-consuming and complicated. But the main point is that these rules are application-specific, and that is what defines consistency of the data that you have.

The more interesting case for consistency, and the thing that is going to occupy us for the rest of today and tomorrow, is consistency in distributed systems. In particular, when the same data gets distributed, typically for fault tolerance and for availability, to ensure that the data is available at different locations, you end up with consistency problems. And we have already seen a few examples of this.
One example is the Domain Name System, which maintains mappings between domain names and IP addresses. If you remember, in order to achieve availability and good performance, these mappings between DNS names and IP addresses were cached essentially on demand: whenever a name server on the Internet looked up a name, it cached the resulting mapping. So now you have to worry about whether the data cached somewhere out on the Internet is, in fact, the correct data, where correct is defined as the data being maintained by the primary name server.

If you think about what DNS did, it actually used a mechanism of expiration times to keep these caches consistent. What that means is that the only time you are guaranteed that the data in a cache is, in fact, the data stored at the primary name server for that name is when the expiration time runs out: the first access after the expiration time requires the name server to go back to the original primary name server and do a lookup of the name. The rest of the time you cannot actually be guaranteed that the data is consistent. In other words, you are not getting what is considered strong consistency.

What is strong consistency? One natural way to define what it means for data to be consistent in a distributed system is to say that any time any node does a read of some data, the read returns the result of the last write. That is one notion of consistency, and a system provides strong consistency if you can ensure that every read returns the result of the last write that was done on the data.

This is really hard to provide, because it typically means that the data is widely replicated or cached, and any time anybody changes the data you have to make sure that all of the copies get that change.
Even if you work really hard to invalidate all the entries when changes are made, there are small windows of vulnerability. In DNS, for example, even the first access you make to the server after the expiration time may not guarantee that the response you get back is the newest one, because the primary name server could send a response and, while it is traveling back to the person who made the query, the data could get changed at the primary name server. So it is really hard to guarantee this at all points in time in a distributed system. And it gets much harder when there are failures making certain copies unavailable, or, in the DNS case, making access to the primary unavailable.

In practice, in most systems, the kind of consistency people try to get is eventual consistency, or they try to approximate strong consistency in some other way. Eventual consistency is a somewhat looser notion. What it says is that there might be periods of time where the copies are not consistent, but the system is doing work in the background to make sure that all of the copies of a given data item end up the same and reflect the last write to that data. Again, the notion of eventual consistency depends a lot on the application, so to specify it precisely you have to look at it in the context of the application; different applications have different notions of consistency and eventual consistency.

So we looked at DNS as one example. Another example, which you might be familiar with, is Web caches. Your browser has a cache in it, for example, and there might be Web caches located elsewhere in the network that capture your requests. People use Web caches to save latency or to avoid slamming a Web server that might otherwise get overloaded. The semantics here are usually that you do not just return stale data.
If the data has changed on the Web server, you actually want to return the current data to the client. The way this is normally done is for the client, or for any cache, to first check with the Web server to see if the data has changed since the version that was cached. Let's say the cache went to the Web server at 9:00 in the morning, because it did not have the data, and got some data back; the data has a timestamp on it. The next time somebody makes a request to the cache, the cache does not just return the data immediately. What it usually does is go to the Web server and ask whether the data has changed since 9:00 in the morning. If the data has changed since then, it retrieves the new data from the server; if not, it goes ahead and returns the cached data to the client. This is also called "If-Modified-Since", because the cache is telling the server: send me the data if it has been modified since the last version I know I have. A convenient way to represent that is as a timestamp; it is just a version of the data.
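The HTTP header really is called If-Modified-Since. Here is a minimal sketch of a cache revalidating with it; the URL and the cached entry are made-up placeholders, and a 304 Not Modified response means the cached copy is still good.

    import urllib.request
    import urllib.error
    from email.utils import formatdate

    # Hypothetical cached entry: the body we fetched earlier and when we got it.
    cached = {
        "url": "http://example.com/page.html",
        "body": b"...cached copy...",
        "fetched_at": 1700000000.0,        # seconds since the epoch
    }

    def get(entry):
        req = urllib.request.Request(entry["url"])
        # "Send me the data only if it changed since the version I have."
        req.add_header("If-Modified-Since",
                       formatdate(entry["fetched_at"], usegmt=True))
        try:
            with urllib.request.urlopen(req) as resp:
                entry["body"] = resp.read()    # server sent a newer version
                return entry["body"]
        except urllib.error.HTTPError as err:
            if err.code == 304:                # 304 Not Modified
                return entry["body"]           # cached copy is still good
            raise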
752 00:38:52,120 --> 00:38:55,720 So in the presence of failures, say network partitions 753 00:38:55,720 --> 00:38:58,220 or failures of nodes, it turns out 754 00:38:58,220 --> 00:39:01,660 to be really hard to guarantee both high availability 755 00:39:01,660 --> 00:39:06,190 and strong consistency. 756 00:39:06,190 --> 00:39:09,410 As a sort of trivial example of this, 757 00:39:09,410 --> 00:39:11,080 suppose you have three copies of the data 758 00:39:11,080 --> 00:39:16,400 and you were not very careful about figuring out your 759 00:39:16,400 --> 00:39:17,160 write protocol. 760 00:39:17,160 --> 00:39:19,550 Let's say that your write protocol was to write 761 00:39:19,550 --> 00:39:22,090 to one copy, your read protocol 762 00:39:22,090 --> 00:39:24,360 was to just read from some other copy, 763 00:39:24,360 --> 00:39:26,780 and some process in the background 764 00:39:26,780 --> 00:39:29,370 propagated the data from the copy 765 00:39:29,370 --> 00:39:31,880 that the client wrote to, to all of the other copies. 766 00:39:31,880 --> 00:39:34,040 Then, during periods of time when the network 767 00:39:34,040 --> 00:39:36,350 was partitioned, you could end up in a situation 768 00:39:36,350 --> 00:39:40,390 where the version that a given client is reading 769 00:39:40,390 --> 00:39:44,290 is not actually the last version of the data that was written. 770 00:39:44,290 --> 00:39:47,970 In fact, if you have started thinking about DP2, Design Project 2, 771 00:39:47,970 --> 00:39:54,150 really, one part of it gets at how you manage replicated data. 772 00:39:54,150 --> 00:40:00,220 For example, when the utility that does the archiving 773 00:40:00,220 --> 00:40:02,040 publishes data, one approach it might take 774 00:40:02,040 --> 00:40:05,460 is to publish the data that it wants 775 00:40:05,460 --> 00:40:10,140 to archive to all of the copies, to all of the replica machines. 776 00:40:10,140 --> 00:40:15,490 And the read protocol might be to read from one of them. 777 00:40:15,490 --> 00:40:18,910 Now, if you ensure that the write protocol finishes 778 00:40:18,910 --> 00:40:22,610 and succeeds only when all of the replica machines 779 00:40:22,610 --> 00:40:26,800 are updated, then you can try to get at a decent version 780 00:40:26,800 --> 00:40:28,089 of consistency. 781 00:40:28,089 --> 00:40:30,380 But you need to be able to do that when failures occur. 782 00:40:30,380 --> 00:40:32,440 The network might fail or nodes might fail, 783 00:40:32,440 --> 00:40:37,010 and you need to figure out how to do that. 784 00:40:37,010 --> 00:40:38,990 But you might decide that writing to all N copies 785 00:40:38,990 --> 00:40:43,260 and reading from one copy is difficult or has high overhead, 786 00:40:43,260 --> 00:40:47,002 so you might think about ways of writing to certain subsets, 787 00:40:47,002 --> 00:40:48,460 writing to a subset of the machines 788 00:40:48,460 --> 00:40:50,060 and reading from a subset of the machines, 789 00:40:50,060 --> 00:40:51,685 to try to see whether you could come up 790 00:40:51,685 --> 00:40:58,052 with ways to get a consistent version of the data. 791 00:40:58,052 --> 00:41:00,510 Or you might decide that the right way to solve the problem 792 00:41:00,510 --> 00:41:02,960 is not to try to achieve really strong consistency 793 00:41:02,960 --> 00:41:06,750 in all situations but to relax the kind of consistency 794 00:41:06,750 --> 00:41:11,324 you want and maybe provide a different version of the semantics.
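Here is a toy sketch, in Python, of the kind of read and write protocols being described. The replica objects, version numbers, and quorum sizes are illustrative assumptions, not the DP2 specification; the sketch just shows "write to all N, read from one" as a special case of the more general rule that a write quorum of size W and a read quorum of size R overlap whenever W + R > N.

```python
# Toy model of replicated writes and reads, to make the tradeoff concrete.
# The replica set, versioning scheme, and quorum sizes are illustrative.

class Replica:
    def __init__(self, name):
        self.name = name
        self.version = 0        # version number of the value it holds
        self.value = None
        self.up = True          # a failed or partitioned replica is "down"

def write(replicas, value, new_version, w):
    """Send the write to every reachable replica; succeed only with >= w acks."""
    acks = 0
    for r in replicas:
        if r.up:
            r.version, r.value = new_version, value
            acks += 1
    return acks >= w            # otherwise the write fails (or must be retried)

def read(replicas, r_quorum):
    """Read from r_quorum reachable replicas and return the highest-versioned value."""
    answers = [(r.version, r.value) for r in replicas if r.up][:r_quorum]
    if len(answers) < r_quorum:
        raise RuntimeError("not enough replicas reachable")
    return max(answers, key=lambda a: a[0])[1]

# With N replicas, choosing W and R so that W + R > N guarantees that every
# read quorum overlaps every write quorum, so a read sees the latest write.
# "Write to all, read from one" is the special case W = N, R = 1.
replicas = [Replica(n) for n in ("a", "b", "c")]
assert write(replicas, "v1", new_version=1, w=2)
print(read(replicas, r_quorum=2))   # -> "v1"
```

The tradeoff described above shows up directly in this sketch: if some replicas are down or partitioned away, a large write quorum makes writes unavailable, while a small one risks reads that miss the latest write.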
795 00:41:11,324 --> 00:41:13,240 As long as you are precise about the semantics 796 00:41:13,240 --> 00:41:14,910 that your system provides, that might 797 00:41:14,910 --> 00:41:16,610 be a different but reasonable solution 798 00:41:16,610 --> 00:41:17,234 to the problem. 799 00:41:29,520 --> 00:41:32,670 So one interesting place in which people 800 00:41:32,670 --> 00:41:34,700 achieve reasonably strong consistency 801 00:41:34,700 --> 00:41:39,070 is in tightly coupled distributed systems -- not distributed 802 00:41:39,070 --> 00:41:43,030 systems spread across the Internet where the network could 803 00:41:43,030 --> 00:41:46,540 arbitrarily fail, but more tightly coupled systems 804 00:41:46,540 --> 00:41:48,750 like a multiprocessor. 805 00:41:51,490 --> 00:41:54,337 If you have a computer that has many processors -- 806 00:42:04,080 --> 00:42:07,460 And the abstraction here for this multiprocessor 807 00:42:07,460 --> 00:42:09,010 is that of shared memory. 808 00:42:09,010 --> 00:42:12,890 You actually have memory sitting outside here, 809 00:42:12,890 --> 00:42:15,300 and these processors are reading and writing 810 00:42:15,300 --> 00:42:19,070 data to this memory. 811 00:42:19,070 --> 00:42:22,530 The latency to get to memory and back is high. 812 00:42:22,530 --> 00:42:24,740 So, as you know, processors have caches on them. 813 00:42:31,810 --> 00:42:38,640 As long as the memory locations that are being written and read 814 00:42:38,640 --> 00:42:40,580 are not shared between them, these caches 815 00:42:40,580 --> 00:42:42,774 function just fine. 816 00:42:42,774 --> 00:42:44,440 And when there is an instruction running 817 00:42:44,440 --> 00:42:48,180 on one of these processors that wants to access some memory 818 00:42:48,180 --> 00:42:50,730 location, you can just read and write from the cache, 819 00:42:50,730 --> 00:42:53,440 so things just work out. 820 00:42:53,440 --> 00:42:57,120 The problem arises when there is a memory location being 821 00:42:57,120 --> 00:43:01,170 read here that actually was previously 822 00:43:01,170 --> 00:43:03,270 written by this other processor. 823 00:43:03,270 --> 00:43:06,090 And, if you read it here, then you 824 00:43:06,090 --> 00:43:07,990 might get an old version of the data. 825 00:43:07,990 --> 00:43:12,520 And if you think of memory, virtual memory, 826 00:43:12,520 --> 00:43:15,090 as the basic abstraction, then these are bad semantics 827 00:43:15,090 --> 00:43:18,210 because your programs wouldn't function the same way as they 828 00:43:18,210 --> 00:43:21,220 did when you just had one processor 829 00:43:21,220 --> 00:43:23,000 or when you didn't have the caches at all 830 00:43:23,000 --> 00:43:24,500 and you just went directly to memory 831 00:43:24,500 --> 00:43:26,650 from multiple processors. 832 00:43:26,650 --> 00:43:31,740 The question is how do you know whether the data in a cache 833 00:43:31,740 --> 00:43:34,060 is good or bad? 834 00:43:34,060 --> 00:43:37,940 Now, checking on every access, like in the Web caches case, 835 00:43:37,940 --> 00:43:39,610 whether the data has changed is not 836 00:43:39,610 --> 00:43:42,050 going to be useful here because the amount of work 837 00:43:42,050 --> 00:43:45,210 it takes to check something is about the same as the amount 838 00:43:45,210 --> 00:43:47,030 of work it takes to read or write something, 839 00:43:47,030 --> 00:43:51,350 because you have already taken the latency hit of going to memory. 840 00:43:51,350 --> 00:43:54,570 So that approach is not going to work.
841 00:43:54,570 --> 00:43:58,980 The solution that is followed in many systems 842 00:43:58,980 --> 00:44:00,300 is to use two ideas. 843 00:44:00,300 --> 00:44:03,850 The first idea is that of a "Write-Thru Cache". 844 00:44:03,850 --> 00:44:05,710 What a write-thru cache says is that if there 845 00:44:05,710 --> 00:44:12,580 is a write that happens here, a store instruction, 846 00:44:12,580 --> 00:44:14,400 the cache gets updated. 847 00:44:14,400 --> 00:44:16,730 But, in addition to the cache getting updated, 848 00:44:16,730 --> 00:44:18,960 the data also gets written through 849 00:44:18,960 --> 00:44:22,300 on the bus to the memory location here. 850 00:44:22,300 --> 00:44:28,150 So that is the first idea, to use a write-thru cache. 851 00:44:38,390 --> 00:44:43,570 The second idea is that, because this is a shared bus, all of these nodes 852 00:44:43,570 --> 00:44:45,339 can actually snoop on the bus and see 853 00:44:45,339 --> 00:44:47,880 what activity there is on it. 854 00:44:47,880 --> 00:44:50,010 It is a very special kind of network, as I said. 855 00:44:50,010 --> 00:44:51,640 You cannot apply this idea in general. 856 00:44:51,640 --> 00:44:53,400 It works here because it is a bus 857 00:44:53,400 --> 00:44:55,252 and nothing fails, 858 00:44:55,252 --> 00:44:56,710 or the assumption is that nothing fails, 859 00:44:56,710 --> 00:45:00,620 so everybody can check to see what is going on on the bus. 860 00:45:00,620 --> 00:45:02,720 And any time there is any activity on the bus that 861 00:45:02,720 --> 00:45:06,720 corresponds to something that is stored in any node's cache, 862 00:45:06,720 --> 00:45:08,210 you can do two things. 863 00:45:08,210 --> 00:45:10,830 You can invalidate that cache entry, 864 00:45:10,830 --> 00:45:14,590 or you can also see what the update is, go ahead 865 00:45:14,590 --> 00:45:17,200 and look at the change that was being made, 866 00:45:17,200 --> 00:45:19,610 and update your own cache. 867 00:45:19,610 --> 00:45:22,970 And this idea is sometimes called a "Snoopy Cache" 868 00:45:22,970 --> 00:45:24,610 because you have these caches that 869 00:45:24,610 --> 00:45:29,380 are snooping on activity that is occurring in your system. 870 00:45:32,100 --> 00:45:34,880 And this is one way in which you can achieve something that 871 00:45:34,880 --> 00:45:37,000 resembles strong consistency. 872 00:45:37,000 --> 00:45:39,490 But it actually turns out, if you think hard about it, 873 00:45:39,490 --> 00:45:41,437 that a precise version of strong consistency 874 00:45:41,437 --> 00:45:42,520 is really hard to achieve. 875 00:45:42,520 --> 00:45:44,780 In fact, it is very, very hard to even define 876 00:45:44,780 --> 00:45:49,100 what it means for any read to see 877 00:45:49,100 --> 00:45:51,000 the result of the last write because, when 878 00:45:51,000 --> 00:45:54,190 you have multiple people reading and writing things 879 00:45:54,190 --> 00:45:56,406 and you get down to the instruction level, 880 00:45:56,406 --> 00:45:58,280 it turns out to be really hard to even define 881 00:45:58,280 --> 00:45:59,370 the right semantics. 882 00:45:59,370 --> 00:46:04,010 A lot of people are working on this kind of thing.
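As a rough illustration of how these two ideas fit together, here is a toy simulation in Python of write-thru caches snooping on a shared bus. The classes, the invalidate-versus-update flag, and the lossless broadcast bus are assumptions made for the sketch, not a description of real coherence hardware.

```python
# Toy simulation of write-thru caches with bus snooping.
# Real cache-coherence hardware is far more involved; this only
# illustrates the two ideas from the lecture.

class Memory:
    def __init__(self):
        self.cells = {}

class Bus:
    def __init__(self):
        self.caches = []
    def attach(self, cache):
        self.caches.append(cache)
    def broadcast(self, sender, addr, value):
        for c in self.caches:
            if c is not sender:          # every other cache snoops the write
                c.snoop(addr, value)

class SnoopyCache:
    def __init__(self, memory, bus, invalidate=True):
        self.memory = memory
        self.bus = bus
        self.lines = {}                  # address -> cached value
        self.invalidate = invalidate     # invalidate on snoop, or update in place
        bus.attach(self)

    def load(self, addr):
        if addr not in self.lines:       # cache miss: go to memory
            self.lines[addr] = self.memory.cells.get(addr, 0)
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value                 # update own cache
        self.memory.cells[addr] = value          # write through to memory
        self.bus.broadcast(self, addr, value)    # and let everyone else see it

    def snoop(self, addr, value):
        if addr in self.lines:
            if self.invalidate:
                del self.lines[addr]     # drop the stale copy
            else:
                self.lines[addr] = value # or apply the update directly

mem, bus = Memory(), Bus()
p0, p1 = SnoopyCache(mem, bus), SnoopyCache(mem, bus)
p0.load(0x10)          # p0 caches location 0x10
p1.store(0x10, 42)     # p1 writes it: write-thru plus snoop invalidates p0's copy
print(p0.load(0x10))   # p0 misses, re-reads from memory -> 42
```

In the invalidating variant, the snooping cache simply drops its stale copy and re-reads from memory on the next load; in the updating variant, it applies the value it saw go by on the bus.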
883 00:46:04,010 --> 00:46:06,100 But this is a little bit of a special case 884 00:46:06,100 --> 00:46:07,850 because this kind of solution applies only 885 00:46:07,850 --> 00:46:11,240 in a very tightly coupled system where you do not really 886 00:46:11,240 --> 00:46:15,234 have failures and everybody can listen to everything else. 887 00:46:15,234 --> 00:46:16,900 But it is interesting to note that there 888 00:46:16,900 --> 00:46:18,400 are cases when you can achieve it, 889 00:46:18,400 --> 00:46:19,550 and that is why this is interesting: 890 00:46:19,550 --> 00:46:20,660 it is practically useful. 891 00:46:26,550 --> 00:46:31,780 So the main thing about Design Project 2 892 00:46:31,780 --> 00:46:33,910 that relates to the consistency discussion, 893 00:46:33,910 --> 00:46:36,620 or at least one part of it, 894 00:46:36,620 --> 00:46:40,340 in case it was not clear from the description of the project, 895 00:46:40,340 --> 00:46:43,300 is for you to think about what kind of consistency you want 896 00:46:43,300 --> 00:46:46,950 and to come up with ways to manage these different replicas. 897 00:46:46,950 --> 00:46:49,940 We are going to stop here. 898 00:46:49,940 --> 00:46:52,600 Next week we will talk about multi-site atomicity. 899 00:46:52,600 --> 00:46:54,220 Tomorrow's recitation is on a system 900 00:46:54,220 --> 00:46:58,130 called Unison which also looks at consistency when you have 901 00:46:58,130 --> 00:47:01,000 mobile computers that are trying to synchronize data 902 00:47:01,000 --> 00:47:02,850 with servers.