Snapshot isolation
From Wikipedia, the free encyclopedia
In databases, snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database, and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot.
Snapshot isolation (SI) has been adopted by several major database management systems, such as InterBase, Oracle, PostgreSQL and Microsoft SQL Server. The main reason for its adoption is that it allows better performance than serializability, yet still avoids the kind of concurrency anomalies that cannot easily be worked around. SI has also been used to critique the ANSI SQL-92 standard's definition of isolation levels, as it exhibits none of the "anomalies" that the SQL standard prohibited, yet is not serializable (the anomaly-free isolation level defined by ANSI).
Snapshot isolation is called "serializable" by Oracle and PostgreSQL. There are arguments both for and against this decision; what is clear is that users must be aware of the distinction to avoid possible anomalous behaviour in their database logic.
Contents |
[edit] Definition
A transaction executing under snapshot isolation performs all reads and updates as if operating on a personal snapshot of the database, taken at the start of the transaction. Thus, any update performed during the transaction will be seen by subsequent reads within that same transaction, but no concurrent updates will be observed. When the transaction is completed, it will commit only if its updates do not conflict with any other transaction that has committed since the snapshot was taken. Such a write-write conflict will cause the transaction to abort.
If built on multiversion concurrency control (MCC), snapshot isolation allows transactions to proceed without worrying about concurrent operations, and more importantly without needing to re-verify all read operations when the transaction finally commits. The only information that must be stored during the transaction is a list of updates made, which can be scanned for conflicts fairly easily before being committed.
As a concrete example, imagine a bank storing two balances, X and Y, for two accounts held by a single person, Phil. The bank will allow X and Y to go into the red, provided the total held in both is never negative (i.e. X + Y >= 0 must hold). Suppose both X and Y start at $100. Now imagine Phil initiates two transactions concurrently, T1 withdrawing $200 from X, and T2 withdrawing $200 from Y.
If the database guaranteed serializable transactions, the simplest way of coding T1 is to deduct $200 from X, and then verify that X + Y >= 0 still holds, aborting if not. T2 similarly deducts $200 from Y and then verifies X + Y >= 0. Since the transactions must serialise, either T1 happens first, leaving X = -$100, Y = $100, and preventing T2 from succeeding (since X + (Y - $200) is now -$200), or T2 happens first and similarly prevents T1 from committing.
Under snapshot isolation, however, both T1 and T2 can operate on a private snapshot of X and Y: both deduct $200 (from X and Y respectively), and then both verify that the new total (respectively using the value of Y and X that held when the snapshot was taken) is zero. Since neither update conflicts, both commit successfully, leaving X = Y = -$100, and X + Y = -$200! This non-serializable anomaly is known as write skew.
[edit] Making Snapshot Isolation Serializable
Fekete et al.
have shown that potential inconsistency problems arising from write skew anomalies can be fixed by adding (otherwise unnecessary) updates to the transactions.- Materialize the conflict: Add a special conflict table, which both transactions update in order to create a direct write-write conflict.
- Promotion: Have one transaction "update" a read-only location (replacing a value with the same value) in order to create a direct write-write conflict (or use an equivalent promotion, e.g. Oracle's SELECT FOR UPDATE).
In the example above, we can materialize the conflict by adding a new table which makes the hidden constraint explicit, mapping each person to their total balance. Phil would start off with a total balance of $200, and each transaction would attempt to subtract $200 from this, creating a write-write conflict that would prevent the two from succeeding concurrently. This approach violates the normal form.
Alternatively, we can promote one of the transaction's reads to a write. For instance, T2 could set X = X, creating an artificial write-write conflict with T1 and, again, preventing the two from succeeding concurrently. This solution may not always be possible.
In general, therefore, snapshot isolation puts some of the problem of maintaining non-trivial constraints onto the user, who may not appreciate either the potential pitfalls or the possible solutions. The upside to this transfer is better performance.
[edit] History
Snapshot isolation arose from work on multiversion concurrency control (MCC) databases, where multiple versions of the database are maintained concurrently to allow readers to execute without colliding with writers. Such a system allows a natural definition and implementation of such an isolation level. Borland's MCC database, InterBase 4, provided SI as far back as 1994.
Unfortunately, the ANSI SQL-92 standard was written with a lock-based database in mind, and hence is rather vague when applied to MCC systems. Berenson et al. wrote a paper in 1995 critiquing the SQL standard, and cited snapshot isolation as an example of an isolation level that did not exhibit the standard anomalies described in the ANSI SQL-92 standard, yet still had anomalous behaviour when compared with serializable transactions.
- PHANTOM READ anomalies: Two identical "range queries" in a single transaction return different result sets.
- WRITE SKEW anomalies: Two transactions read a value each, then change the other's value, but neither sees the result of the other's update (e.g. T1 and T2 in the example above).
ANSI's "REPEATABLE READ" isolation level allows phantom reads, but prevents write skew. In contrast, snapshot isolation allows write skew, but prevents phantom reads. Serializable transactions allow neither.
[edit] References
- ^ Berenson, Bernstein, Gray, Melton, O'Neil and O'Neil. A Critique of ANSI SQL Isolation Levels. Retrieved on April 3, 2006.
- ^ Fekete, Liarokapis, O'Neil, O'Neil and Shasha. Making Snapshot Isolation Serializable. Retrieved on April 3, 2006.