Serializability

From Wikipedia, the free encyclopedia

In databases and transaction processing, serializability is the property of a schedule (history) being serializable. It means equivalence (in its outcome, the resulting database state, the values of the database's data) to a serial schedule (serial schedule: No overlap in two transactions' execution time intervals; consecutive transaction execution). It relates to the isolation property of a transaction, and plays an essential role in concurrency control. Transactions are usually executed concurrently since their serial executions are typically extremely inefficient and thus impractical.

Contents

[edit] Correctness - Serializability

Serializability is the major criterion for the correctness of concurrent transactions' executions (i.e., transactions that have overlapping execution time intervals, and possibly access same shared resources), and a major goal for concurrency control. As such it is supported in all general purpose database systems. The rationale behind it is the following: If each transaction is correct by itself, then any serial execution of these transactions is correct. As a result, any execution that is equivalent (in its outcome) to a serial execution, is correct.

Schedules that are not serializable are likely to generate erroneous outcomes. Well known examples are with transactions that debit and credit accounts with money. If the related schedules are not serializable, then the total sum of money may not be preserved. Money could disappear, or be generated from nowhere. This is caused by one transaction writing, and "stepping on" and erasing what has been written by another transaction before it has become permanent in the database. This does not happen if serializability is maintained.

[edit] Correctness - Recoverability

In systems where transactions can abort (virtually all real systems), serializability by itself is not sufficient for correctness. Schedules also need to possess the Recoverability property. Recoverability means that committed transactions have not read data written by aborted transactions (whose effects do not exist in the resulting database states). While serializability can be compromised in many applications, compromising recoverability always violates the database's integrity.

[edit] Relaxing serializability

In many applications, unlike with finances, absolute correctness is not needed. For example, when retrieving a list of products according to specification, in most cases it does not matter much if a product, whose data was updated a short time ago, does not appear in the list, even if it meets the specification. It will typically appear in such a list when tried again a short time later. Commercial databases provide concurrency control with a whole range of (controlled) serializability violations (see isolation levels) in order to achieve higher performance, when the application can tolerate such violations. Higher performance means better transaction execution rate and shorter transaction response time (transaction duration).

[edit] View serializability and Conflict serializability

Two major types of serializability exist: View serializability, and Conflict serializability. Any schedule with the latter property also has the first property. However, conflict serializability is easier to achieve, and is widely utilized.

View serializability of a schedule is defined by equivalence to a serial schedule with the same transactions, such that respective transactions in the two schedules read and write the same data values ("view" the same data values).

Conflict serializability is defined by equivalence to a serial schedule with the same transactions, such that both schedules have the same sets of respective ordered (by time) pairs of conflicting operations (same precedence relations of respective conflicting operations). Two operations (read or write) are conflicting if they are of different transactions, upon the same data item, and at least one of them is write. A more general definition of conflicting operations (also for complex operations, which may consist each of several "simple" read/write operations) requires that they are noncommutative (changing their order also changes their combined result). Each such operation needs to be atomic by itself (by proper system support) in order to be commutative (nonconflicting) with the other. For example, the operations increment and decrement of a counter are both write operations, but do not need to be considered conflicting since they are commutative.

[edit] Testing conflict serializability

Schedule compliance with Conflict serializability can be tested as follows: The Conflict graph (Serializability graph) of the schedule for committed transactions, the directed graph representing precedence of transactions in the schedule, as reflected by precedence of conflicting operations in the transactions (transactions are nodes, precedence relations are directed edges), needs to be acyclic. This means that when a cycle of committed transactions is generated, serializability is violated. Thus conflict serializability mechanisms prevent cycles of committed transactions by aborting an undecided (neither committed, nor aborted) transaction (one is sufficient; at least one is aborted) on each such cycle, when generated, in order to break it. The probability of cycle generation is typically low, but nevertheless, such a situation is carefully handled, since correctness is involved. Many mechanisms do not maintain a conflict graph as a data structure, but rather prevent or break cycles implicitly (e.g., see SS2PL below). Transactions aborted due to serializability violation prevention are executed again.

[edit] Common mechanism - (Strong) Strict Two Phase Locking

(Strong) Strict Two Phase Locking (SS2PL) is a common mechanism (and schedule property) utilized to enforce in database systems both conflict serializability and Strictness, a special case of recoverability. The related schedule property is also referred to as Rigorousness. In this mechanism each data item is locked by a transaction before accessing it (any read or modify operation): The item is marked by a lock of a certain type, depending on operation (and the specific implementation; various models with different lock types exist). Access by another transaction may be blocked, typically upon conflict, depending on lock type and the other transaction's access operation type. All locked data on behalf of a transaction are released only after the transaction has ended (either committed or aborted).

Mutual blocking of two transactions or more results in a deadlock, where execution of these transactions is stalled, and no completion can be reached. A deadlock is a reflection of a potential cycle in the conflict graph, that would occur without the blocking. Deadlocks are resolved by aborting a transaction involved with such potential cycle (aborting one transaction per cycle is sufficient). Transactions aborted due to deadlock resolution are executed again.

[edit] Global serializability - Commitment ordering

Enforcing global serializability in a multidatabase system (typically distributed), where transactions span multiple databases (two or more), is problematic, since even if each database enforces serializability, the global schedule of all the databases is not necessarily serializable, and the needed communication between databases to reach conflict serializability using conflict information is excessive and unfeasible. An effective way to enforce conflict serializability globally in such a system is to enforce the Commitment ordering (CO, or Commit-order-serializability) property in each database. CO is a broad special case of conflict serializability, and if enforced locally in each database, also the global schedule possesses this property (CO). The only needed communication between the databases for this purpose is the (unmodified) messages of an atomic commitment protocol (e.g., the two phase commit protocol), already needed by each distributed transaction to reach atomicity. An effective local (to any single database) CO algorithm can run beside any local concurrency control mechanism (serializability enforcing mechanism) without interfering with its resource access scheduling strategy. As such CO provides a general, fully distributed solution (no central processing component or central data structure are needed) for guaranteeing global serializability in heterogeneous environments with different database system types and other multiple transactional objects (objects with states accessed and modified only by transactions) that may employ different serializability mechanisms.

CO by itself is not sufficient as a concurrency control mechanism, since it lacks the recoverability property, which should be supported as well.

SS2PL implies Commitment ordering, and any SS2PL compliant database can participate in multidatabase systems that utilize the CO solution without any modification or addition of a CO algorithm component.

With the Commitment ordering property the precedence (partial) order of transactions' commitment events is identical to the precedence (partial) order of the respective transactions as determined by their schedule's (acyclic) conflict graph. Any conflict serializable schedule can be made a CO compliant one, without aborting any transaction in the schedule, by delaying commitment events to comply with the needed partial order. The commitment event of a distributed transaction is always generated by some atomic commitment protocol (utilized to reach consensus among the transaction's components on whether to commit or abort it; this procedure is always carried out for distributed transactions, independently of concurrency control and CO). The atomic commitment protocol plays a central role in the distributed CO algorithm which enforces CO globally. In case of incompatible local commitment orders in two or more databases, which implies a global cycle (a cycle that spans two or more database) in the global conflict graph, the atomic commitment protocol breaks that cycle by aborting a transaction on the cycle.