CS代考 Database Design – cscodehelp代写 – INTERVIEW&CODEHELP

Database Design

Introduction to Transactions

Simple Model of a DB
DB: A collection of named data items

Data granularity: A field, a record, a table, a whole disk block

Basic operations: Retrieve/Change data, i.e. read/write
Transaction Management

What is a Transaction?
Transaction: A logical unit of DB processing
A set of operations
Includes one or more access operations
E.g., read (retrieval), write (insert or update), delete
move money from my savings account to my checking account
The two updates to my savings and checking account would be a single transaction

Transaction Management

What is a Transaction?
A DBMS will typically receive many transaction requests possibly at the same time (concurrently)

Transaction Manager: A component of the DB
Execute DB requests from one or more users
Protect the data from loss or damage

Transaction Management

What is a Transaction?
Specified as a stand alone operation in a high level language such as SQL
Either submitted interactively or embedded in a program
An application program may contain several transactions
Each transaction has boundaries … Begin/End
Transaction Management

Basic Transaction Operations
read_item(X) – Read a DB item named X into a program variable

write_item(X) – Writes the value of a program variable into the DB item named X
(INSERT, UPDATE, DELETE)
Transaction Management

Multi-user database systems
Suppose two “users” are accessing a banking database simultaneously

How can we make sure that the automatic process doesn’t check my accounts in the middle of the move?
This is known as “concurrency control”
I’m transferring money from savings to my checking account
Bank is checking to see if the total amount in all of my accounts is less than $250
If so, charge fee and send alert

Concurrency Control
Ensuring that processes running simultaneously (concurrently) behave in an appropriate way
Result from two processes running in an interleaved fashion is predictable

The final result of running A and B interleaved depends on how A and B are divided

Concurrency control makes this behavior predictable

Transaction Properties – ACID
To handle concurrency, transactions need to be:
The transaction needs to either be entirely performed or not performed at all – no half-done transactions are allowed
Consistent
Transactions that start from a database in a consistent state should end with the database in a consistent state

Transaction Properties – ACID
To handle concurrency, transactions need to be:
Transactions should always function as if they were executing in isolation from other transactions
Database changes made by a completed transaction must persist – even if the database fails

Concurrency Control Issues
Lost Update
Two processes update same data
Incorrect Summary
Summary process at the same time as update
Temporary Update (Dirty Read)
Write operation fails, requires undo
Transaction Management

Concurrency Control Issues
Lost updates
Two processes try to update the same item
Interleaved in a way that causes incorrect values
T1 – move money from checking to savings
T2 – add money to checking

Concurrency Control Issues
Incorrect Summary
One process is computing an aggregation of values
Another process is making changes while the computation is occurring
T1 – Move money from checking to savings
T3 – Compute the total in all accounts

Concurrency Control Issues
Temporary Update (Dirty Read)
T1 – take money out of the account
T1 performs a read, then a write
T2 – add money to the account
T2 reads, updates and writes the same item
Then T1 fails at its next step, needing to undo the write

What causes a Transaction to Fail?
Unfortunate events –
Computer failure (crash, out of memory)
Transaction/System Errors (database full, disk full)
Logical/exception Errors
insufficient funds in a banking transaction, no such record, referential integrity fail
Disk I/O failure
What should be done if a failure occurs?
Remember, the job of the Transaction Manager
Protect the data from loss or damage
SQLITE_FULL: database or disk full
SQLITE_IOERR: disk I/O error
SQLITE_BUSY: database in use by another process
SQLITE_NOMEM: out or memory

SQLITE_FULL: database or disk full
SQLITE_IOERR: disk I/O error
SQLITE_BUSY: database in use by another process
SQLITE_NOMEM: out or memory

Transaction Implementation – DBMS
How are transactions implemented at the DBMS level?
Need to be able to restore the database to the last commit point after a failure
DBMS keeps a system log (or journal) of transactions
Sequential, append-only file – all transactions appended to end of log file
Kept on disk – only a catastrophic disk failure prevents a recovery
In case of failure, system log can “play back” transactions back to the last transaction before failure

Transaction Implementation – DBMS
DBMS keeps a system log (or journal) of transactions
Need to be able to restore the database to the last commit point after a failure
Sequential, append-only file – all transactions appended to end of log file
Kept on disk – only a catastrophic disk failure prevents a recovery
In case of failure, system log can “play back” transactions back to the last transaction before failure

Transaction Schedules
We are most concerned with read (r) and write (w) operations from transactions:
r1(X) – T1 performs a read on data item X
w2(Y) – T2 performs a write on data item Y
We may also be concerned with the beginning and ending of transactions, and commit points:
b1 – T1 transaction begins
c2 – T2 transaction commits

Transaction Schedules
Some notation:
Each schedule is noted by an alphabetic subscript:
Sa, Sb, etc.
Each transaction noted by a numeric subscript:
T1, T2, etc.

Schedules (example)
The above pair of transactions would have the following notation:
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)

Schedules (another example)
Sb: r3(A); r1(X); w1(X); r3(X); r3(Y); r1(Y); w1(Y)

Transaction Schedules (Histories)
The order of operations of a sequence of transactions
Individual operations from multiple transactions can be interleaved together
Assume schedules have Total Ordering
For any two operations in the schedule, one MUST occur before the other

Transaction Schedules
Some notation:
Each schedule is noted by an alphabetic subscript:
Sa, Sb, etc.
Each transaction noted by a numeric subscript:
T1, T2, etc.
We are most concerned with read (r) and write (w) operations from transactions:
r1(X) – T1 performs a read on data item X
w2(Y) – T2 performs a write on data item Y
We may also be concerned with the beginning and ending of transactions, and commit points:
b1 – T1 transaction begins
c2 – T2 transaction commits
a1 – T1 transaction aborts (rollback)

Schedules (example)
The above pair of transactions would have the following notation:
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)

Schedules (another example)
Sb: r3(A); r1(X); w1(X); r3(X); r3(Y); r1(Y); w1(Y)

Transaction Schedules – Conflicts
Two operations in a schedule conflict if:
They belong to different transactions
They operate on the same database item
At least one of the two operations is a write operation

Transaction Schedules – Conflicts
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)

Two operations conflict if changing their order changes the final result of the schedule

Different transactions
Write operation
NO CONFLICT
Same transaction

Transaction Schedule – Conflicts
Suppose we have two operations:
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)

Two different schedules would be possible:
r2(X); w1(X) – first READ the original balance, then WRITE the new balance
w1(X); r2(X) – first WRITE the new balance, then READ the new balance

These two schedules give two different results
This kind of conflict is known as a read-write conflict
Write-write conflicts also possible

Transaction Schedules
We call a schedule a complete schedule if:
The operations in the schedule are exactly those of the transactions that make up the schedule
For any pair of operations within the same transaction, their order in the schedule is the SAME as their order in the transaction
For any pair of conflicting operations, one of the operations must occur before the other in the schedule

Complete Schedules
T1: r1(X); w1(X); r1(Y); w1(Y); c1;
T2: r2(X); w2(X); c2;

Some possible complete schedules for the above transactions
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y); c1; c2;
Sb: r1(X); w1(X); r1(Y); w1(Y); c1; r2(X); w2(X); c2;
Sf: r2(X); w2(X); c2; r1(X); w1(X); r1(Y); w1(Y); c1;
Sg: r2(X); r1(X); w2(X); w1(X); c2; r1(Y); w1(Y); c1;

This is NOT a possible complete schedule for the above:
Sh: r1(Y); r2(X); w2(X); w1(X); c2; r1(X); w1(Y); c1;
All of the operations are there, but NOT in the correct order for T1

Complete Schedules
Note condition 3
“For any pair of conflicting operations, one of the operations must occur before the other in the schedule”
Only requires that conflicting operations have a total ordering
Two read operations can happen in any order if they come from two different transactions
Remember – operations from same transaction MUST occur in the order given by the transaction
This means that a schedule is actually a partial order of the operations of a set of transactions
Partial order – some operation pairs have no specific order
In practice, most schedules are given a total order anyway

One way to characterize schedules – ease of recovery
How easy is it for a schedule to recover from transaction/system failures?
Possible but not simple?
Impossible?

Avoid schedules where recovery is impossible
Try to create schedules where recovery is simple

What makes a schedule recoverable?
Once a transaction T is committed, it should never be necessary to roll back T
This ensures Durability in our ACID properties
If a schedule behaves with regards to this rule, the schedule is recoverable
Schedules that do not have this property are nonrecoverable
DBMS software needs to avoid placing transactions into nonrecoverable schedules!

Nonrecoverable Schedules
T1: r1(X); w1(X); r1(Y); w1(Y); c1;
T2: r2(X); w2(X); c2;

This schedule would be recoverable
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y); c1; c2;

If we roll back just before the commit of T1, the result of T2 is unaffected – T2’s operations can be left in place

Nonrecoverable Schedules
T1: r1(X); w1(X); r1(Y); w1(Y); c1;
T2: r2(X); w2(X); c2;

This schedule would be nonrecoverable
Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1

Note that we’ve aborted T1 after committing T2
Rolling back T1 changes the result of T2
That first read for T2 would have a different result, since the first write for T1 was undone
But we’ve already committed T2! We can’t roll it back now

Nonrecoverable Schedules
T1: r1(X); w1(X); r1(Y); w1(Y); c1;
T2: r2(X); w2(X); c2;

We can modify Sc to be recoverable:
Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1
Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
What makes Sd recoverable when Sc isn’t?
Commit of T1 happens before the commit of T2
If we need to abort T1 before it finishes, T2 can be rolled back
If we need to abort T2 before it finishes but after T1 commits, there are no reads by T1 that are affected

Cascading Rollback
Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;

Sd is recoverable, but there’s another issue
What happens if we abort T1 right before the commit?
T2 also has to be rolled back to its beginning
First read by T2 reads something that T1 has written, but hasn’t committed
These kinds of rollbacks are called cascading rollbacks

Cascading Rollback
Cascadeless rollbacks – every transaction in the schedule reads only items that were written by transactions that have committed:

Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
Sd is not cascadeless

Se: r1(X); w1(X); r1(Y); w1(Y); c1; r2(X); w2(X); c2;
Se is cascadeless
By delaying the T2’s read from X until after the commit by T1, cascading rollback is eliminated
Cascading rollbacks are time consuming

Serializability
Recall: ACID
What do we mean by isolated ?
Transactions need to appear as if they were run by themselves – without other transactions operating
What does it take to enforce this?
The transaction needs to run without interference by other transactions
Does this mean that the only schedules we’re allowed to have are serial schedules?

Serial Schedules
The representations above are serial schedules
Either T1 executes, followed by T2…
… or T2 executes, followed by T1
Here isolation is obviously preserved
Each transaction is independent of the other
No lost updates, incorrect summaries or other issues
Upshot – any serial ordering of transactions is considered a correct ordering

Non-Serial Schedules
Now Consider Transactions T1 and T2 above
The representations above are non-serial schedules for T1 and T2
No longer completely isolated – transactions interleaved
Correctness is no longer so obvious
Could we have a lost update here? An incorrect summary?
Can we still interleave operations while preserving correctness?

Interleaving Operations
Can we interleave operations while preserving correctness?
First question – why do we care about interleaving?
Idle processors waste time!
We want to maximize our efficiency
Allow as much concurrency of operation as possible
Serial schedules are inefficient
If T1 is waiting for an I/O operation, it would be better to let T2 process if we can
Forcing every schedule to actually be a serial schedule is unacceptable from an efficiency standpoint

Interleaving Operations
So, can we interleave operations while preserving correctness?
Sometimes – if our operations are ordered in a way that makes them equivalent to a serial schedule

Interleaving
Consider these schedules
A & B are serial schedules
D is equivalent to a serial schedule,
C is vulnerable to a lost update and is not equivalent to a serial schedule

Serializable Schedules
A schedule of n transactions is a serializable schedule if it is equivalent to some serial schedule of the same n transactions

How do we test to see if a non-serial schedule is serializable?

Result Equivalence
Simplest way of testing for equivalence
Two schedules are result equivalent if they produce the same final state of the database
Not a good approach – equivalences may be accidental
Consider the schedules below – equivalent for X=100 but NOT in general
We want tests that would not consider these schedules to be equivalent

Conflict Equivalence
A better test for equivalence is conflict equivalence
Two schedules are conflict equivalent if they have the same operations and the order of any two conflicting operations is the same in both schedules
Recall: conflicting operations:
Belong to two different transactions
Access the same database item
At least one is a write operation
A schedule is considered conflict serializable if it is conflict equivalent to some serial schedule

Serializability
Schedule D above is conflict equivalent to schedule A – serializable
Schedule C above is not conflict equivalent to either possible serial schedule – not serializable