# 编程辅导 www.cardiff.ac.uk/medic/irg-clinicalepidemiology – cscodehelp代写

www.cardiff.ac.uk/medic/irg-clinicalepidemiology

Concurrency control

Information modelling
& database systems

Concurrency
large databases are used by many users
many users  many transactions
if these transactions are run sequentially, then long transactions will make others wait for long periods
therefore, it is desirable to let transactions run concurrently
… but we need to preserve isolation

let C be a column
transaction A: decrease the value of C by 5, i.e. C := C – 5
transaction B: increase the value of C by 5, i.e. C := C + 5
What would be the effect on C after completing transactions A and B?
(C – 5) + 5 = C  no change in the value of C

Problem: lost update
the final value of C has increased by 5 (i.e. C=12), but should not have changed (i.e. C=7–5+5=7)
Transaction A Time Transaction B e.g. C = 7
C = C – 5 t2 2
t4 C = C + 5 12
write(C) t5 2
t6 write(C) 12
COMMIT t7 2
t8 COMMIT 12

occurs when two different transactions are trying to update the same cell at the same time

the final value of C has not changed (i.e. C=7), but should have increased by 5 (i.e. C=12)
it should be as if transaction A never happened
Transaction A Time Transaction B e.g. C = 7
C = C – 5 t2 2
write(C) t3 2
t5 C = C + 5 7
t6 write(C) 7
ROLLBACK t7 7
t8 COMMIT 7

occurs when a transaction reads data that has been modified by another transaction, but not yet committed

Problem: inconsistent analysis
SUM should be C + D = 2 + 9 = 11
Transaction A Time Transaction B e.g. C = 7, D = 4
C = C – 5 t2 2
write(C) t3 2
t6 SUM = C + D 6
D = D + 5 t8 9
write(D) 9
print(SUM) 6

occurs when a transaction reads several values trying to aggregate them, but another transaction updates some of them

Concurrency control
a transaction may be correct in itself, but it can still produce incorrect result if its execution is interfered by other transactions
concurrency control is about managing simultaneous execution of multiple transactions without them interfering with one another
to solve the problem, transactions use locks on shared data items before operating on them

Serialisability

a schedule is a time–ordered execution of operations from a set of concurrent transactions
a serial schedule is a schedule in which …
operations of individual transactions are executed consecutively
… and do not interleave with operations from other transactions
… and each transaction commits before another one is allowed to begin

Serial schedule
serial schedules are guaranteed to avoid interference and keep the database consistent
however, databases need concurrent access, which means interleaving operations from different transactions
Transaction A Time Transaction B e.g. C = 7
C = C – 5 t2 2
write(C) t3 2
COMMIT t4 2
t6 C = C + 5 7
t7 write(C) 7
t8 COMMIT 7

Serialisability
two schedules are equivalent if they always have the same effect
a schedule is serialisable if it is equivalent to some serial schedule
if two transactions only read some data items, then the order is which they do it is not important
if transaction A reads and updates a data item C and transaction B reads and updates a different data item D, then again they can be scheduled in any order

Serial vs. serialisable
if two transactions only read some data items,
then the order is which they do it is not important
interleaved schedule
serial schedule
Transaction Operation

Transaction Operation

this schedule is serialisable

Conflict serialisability
Do two transactions have a conflict?
NO if they refer to different resources
YES if at least one is a write and
they use the same resource
a schedule is conflict serialisable if transactions in the schedule have a conflict, but the schedule is still serialisable

Conflict serialisable schedule
interleaved schedule
serial schedule
Transaction Operation
A write(X)
B write(X)
A write(Y)
B write(Y)

this schedule is serialisable even though A and B read and write the same resources X and Y, i.e. they have a conflict
Transaction Operation
A write(X)
A write(Y)
B write(X)
B write(Y)

Conflict serialisable schedules
the main focus of concurrency control
they allow for interleaving and at the same time they are guaranteed to behave like serial schedules
important questions:
How to determine whether a schedule is conflict serialisable?
How to construct conflict serialisable schedules?

Precedence graphs
to determine whether a schedule is serialisable or not, we build a precedence graph
nodes are transactions
edges are precedence: there is an edge from A to B if A must happen before B in any equivalent serial schedule
the schedule is serialisable if there are no cycles in its precedence graph

Precedence
let A and B be two transactions
let a be an action of A and b is an action of B
A takes precedence over B if:
a is ahead of b in the schedule
both a and b involve the same resource R
at least one of a and b is a write action
in other words, A  B if:
A read(R) followed by B write(R)
A write(R) followed by B read(R)
A write(R) followed by B write(R)

remember, A  B if:
A read(R) followed by B write(R)
A write(R) followed by B read(R)
A write(R) followed by B write(R)
Transaction Action
A write(Y)
D write(X)

this schedule is serialisable

remember, A  B if:
A read(R) followed by B write(R)
A write(R) followed by B read(R)
A write(R) followed by B write(R)
Transaction Action
A write(Y)
D write(X)
A write(Z)

this schedule is not serialisable

locking is a procedure used to control concurrent access to data by ensuring serialisability of concurrent transactions
in order to use a resource (e.g. table, row, etc.) a transaction must first acquire a lock on that resource

two types of locks:
read lock (shared lock or S–lock)
write lock (exclusive lock or X–lock)
read lock allows several transactions simultaneously to read a resource, but no transactions can change it at the same time
write lock allows one transaction exclusive access to write to a resource and no other transaction can read this resource at the same time
the lock manager in the DBMS assigns locks and records them in the data dictionary

Concurrency control by locking
let T be a transaction and R be a resource
if T holds a write lock on R, then no other transactions may lock R
if T holds a read lock on R, then no other transactions may write lock A
T must acquire a write lock on R before writing R
after using a lock on R, T must release the lock in order to free up R
if the requested lock is not available, transaction waits

Two–phase locking
a transaction follows the two-phase locking protocol (2PL) if all locking operations precede the first unlock operation in the transaction
two phases:
growing phase where locks are acquired on resources
shrinking phase where locks are released

A follows 2PL protocol
all of its locks are acquired before any of them is released
it releases its lock on X and then goes on acquire a lock on Y

Transaction A Transaction B
write-lock(Y) unlock(X)
unlock(X) write-lock(Y)
Y = Y + X Y = Y + X
write(Y) write(Y)
unlock(Y) unlock(Y)

Serialisability theorem
Any schedule of two–phased transactions is conflict serialisable.

Lost update cannot happen with 2PL
Comment Transaction A Transaction B Comment
cannot acquire write–lock(C) because B has read–lock(C) write(C)
write(C) cannot acquire write–lock(C) because A has read–lock(C)

Uncommitted update cannot happen with 2PL
Comment Transaction A Transaction B Comment
write–lock(C) write(C)
to release
write–lock(C)
locks released ROLLBACK

Inconsistent analysis cannot happen with 2PL
Comment Transaction A Transaction B Comment
write–lock(C) write(C)
to release
write–lock(C) and later write–lock(D)
sum = C + D
write–lock(D) write(D)

timestamping

the use of locks solves one problem (serialising schedules)
… but introduces another (deadlocked schedules)
deadlock is a situation in which two or more transactions are in a simultaneous wait state, each waiting for others to release a lock
transaction A has a lock on a resource C and is waiting for a lock on a resource D
transaction B has a lock on a resource D and is waiting for a lock on a resource C

Wait–for graphs
given a schedule, potential deadlocks can be detected using a wait–for graph (WFG)
nodes are transactions
there is an edge from transaction B to transaction A if B waits for A, i.e.
A holds a lock on a resource R
B is waiting for a lock on the resource R
B cannot get the lock on R unless A releases it
if the graph does not contain cycles, then the schedule will not be deadlocked

transaction B waits for A in any of the following scenarios:
A read–locks R, then B tries to write–lock it
A write–locks R, then B tries to read–lock it
A write–locks R, then B tries to write–lock it

Transaction Action Lock Waits for
A write(P) write–lock(P)
B write(R) D
D write(P) A
A write(Q) B

deadlocks can arise with two–phase locking
deadlock is less of a problem than an inconsistent database
we can detect and recover from deadlock
… but it would be nice to avoid it altogether
e.g. conservative two–phase locking
all locks must be acquired before the transaction starts
low ‘lock utilisation’ – transactions can hold on to locks for a long time, but not effectively use them much