Commitment ordering

From Wikipedia, the free encyclopedia

In databases and transaction processing, Commitment ordering (CO) is a property of a schedule (history). CO is a broad special case of serializability and effective means to achieve global serializability. It has been referred to also as commit ordering, and commit order serializability and strong recoverability (the latter is a misleading name since CO is incomparable with recoverability, and the term "strong" implies a special case. This means that a schedule with one property does not necessarily have the other property).

If transactions span two or more databases, enforcing serializability globally is problematic, since even if each database enforces serializability, the global schedule of all the databases involved with the transactions is not necessarily serializable, and the needed communication between databases to reach conflict serializability using conflict information is excessive and unfeasible. Enforcing CO is an effective way to enforce conflict serializability globally in a distributed system, where transactions span multiple databases, since enforcing CO locally in each database, also enforces it globally. Each database may use any, possibly different, type of concurrency control mechanism that provides conflict serializability, and has to enforce CO locally. Enforcing CO locally does not affect the data access scheduling strategy of the local concurrency control mechanism, since such a mechanism typically does not consider the commitment events (or their order). The CO solution requires no communication overhead, since it uses (unmodified) atomic commitment protocol messages only, already needed by each distributed transaction to reach atomicity. An atomic commitment protocol plays a central role in the distributed CO algorithm, which enforces CO globally, by breaking global cycles (cycles that span two or more database) in the global conflict graph. CO, its special cases, and its generalizations are compatible, and achieve global serializability while transparently being utilized together in a single heterogeonous distributed environment comprising objects with possibly different concurrency control mechanisms. As such, Commitment ordering, including its special cases, and together with its generalizations, provides a general, fully distributed solution (no central processing component or central data structure are needed) for guaranteeing global serializability in heterogeneous environments of federated databases (multidatabase systems) and other multiple transactional objects (objects with states accessed and modified only by transactions; e.g., in the framework of transactional processes).

Contents

[edit] The commitment ordering solution for global serializability

[edit] General characterization

Commitment ordering (CO) is a special case of conflict serializability. CO can be enforced with non-blocking mechanisms. In a CO schedule the commitment events' (partial) precedence order of the transactions corresponds to the precedence (partial) order of the respective transactions in the (directed) conflict graph (serializability graph), as induced by their conflicting access operations (usually read and write (insert/modify/delete) operations; CO also applies to higher level operations, where they are conflicting if noncommutative, as well as to conflicts between operations upon multi-version data). The commitment decision events are generated by either a local commitment mechanism, or an atomic commitment protocol, if different processes need to reach consensus on whether to commit or abort. The protocol may be distributed or centralized. Transactions may be committed concurrently, if the commit partial order allows (if they do not have conflicting operations). If different conflicting operations induce different partial orders of same transactions, then the conflict graph has cycles, and the schedule will violate serializability when all the transactions on a cycle are committed. In this case no partial order for commitment events can be found. Thus, cycles in the conflict graph need to broken by aborting transactions. However, any conflict serializable schedule can be made CO without aborting any transaction, by properly delaying commit events to comply with the transactions' precedence partial order.

CO enforcement by itself is not sufficient as a concurrency control mechanism, since CO lacks the recoverability property, which should be supported as well.

[edit] The distributed commitment ordering algorithm

A fully distributed Commitment ordering algorithm exists, that uses local CO of each participating database, and needs only (unmodified) atomic commitment protocol messages with no further communication. The distributed algorithm is the combination of local (to each database) CO algorithm processes, and an atomic commitment protocol (which can be fully distributed). Atomic commitment protocol is essential to enforce atomicity of each distributed transaction (to decide whether to commit or abort it; this procedure is always carried out for distributed transactions, independently of concurrency control and CO). A common example of an atomic commitment protocol is the two phase commit protocol, which is resilient to many types of system failure.

A local (in each database) CO algorithm determines the needed commitment order for the database (by the characterization of CO above, this order depends on the local precedence order of transactions, which results from the local data access scheduling mechanisms), and accordingly schedules the atomic commitment protocol votes-to-commit, for each (unaborted) distributed transaction. If a precedence relation (conflict) exists between two transactions, then the second will not be voted on before the first is completed (either committed or aborted), to prevent possible commit order violation by the atomic commitment protocol (commit order by the protocol is not necessarily the same as the voting order); if no precedence relation exists, both can be voted on concurrently. If participating databases in a same distributed transaction do not have compatible local commitment orders (no global partial order exists that can embed the local partial orders) for that transaction (without "knowing" it; typically no coordination between database systems exists on conflicts, since the needed communication is excessive and unfeasible), which means existence of a global cycle (a cycle involving two or more databases) in the global conflict graph, then the atomic commitment protocol will fail to collect all the votes needed to commit that transaction (at least one database will delay its vote for that transaction indefinitely, to comply with its own commitment order, since it will be waiting to the completion of another, preceding transaction on that global cycle, delayed indefinitely by another database with a different order). As a result the protocol will eventually abort a transaction on this global cycle (the specifics depend on the atomic commitment protocol's specific abort policies; a timeout mechanism is common; abort time can be shorten for CO by a dedicated mechanism). This will break the global cycle involving that distributed transaction, waiting transactions will be free to be voted on, and both global CO and global serializability will be maintained.

[edit] Enforcing commitment ordering locally

Commitment ordering can be enforced locally (in a single database) by a dedicated CO algorithm, or by any algorithm/protocol that provides any special case of CO. An important such protocol, being utilized extensively in database systems, which generates a CO schedule, is the strong strict two phase locking protocol (SS2PL: "release transaction's locks only after the transaction has been either committed or aborted"; see below). SS2PL is a proper subset of the intersection of 2PL and strictness.

[edit] A generic local commitment ordering algorithm

A generic CO algorithm is an algorithm that enforces exactly the CO property. It is nonblocking, and consists of aborting transactions (only if needed) before committing a transaction (or voting to commit it in a distributed environment). It aborts a (uniquely determined at any given time) minimal set of other undecided (neither committed, nor aborted) transactions that run locally and can cause serializability violation in the future (can later generate cycles of committed transactions in the conflict graph). This set consists of all undecided transactions with directed edges in the conflict graph to the transaction to be committed. The size of this set cannot increase when that transaction is waiting to be committed (in ready state: processing has ended), and typically decreases in time as its transactions are being decided. Thus, unless real-time constraints exist to complete transactions, it is preferred to wait with committing this transaction and let this set decrease in size. If another serializability mechanism exists locally (which eliminates cycles in the local conflict graph), or if no cycles involving that transaction exist, the set will be empty eventually. Otherwise the set will stabilize with transactions on local cycles, and aborts will have to occur to break the cycles (a database system can identify its local cycles). The generic CO algorithm does not affect local resource (data) access scheduling strategy, when it runs alongside of any other local concurrency control mechanism. It affects only the commit order, and for this reason it does not need to abort more transactions than those needed to be aborted for serializability violation prevention by any combined local concurrency control mechanism (e.g., aborts for deadlock resolution in blocking mechanisms). The net effect of CO may be, at most, a delay of commit events (or voting in a distributed environment), to comply with the needed commit order (but not more delay than its special cases, for example, SS2PL, and on the average significantly less).

From a software architecture point of view a CO component that implements the generic CO algorithm can be designed in a straightforward way as a mediator between a (single) database system and an atomic commitment protocol component. Its functions are to vote to commit on ready transactions (processing has ended) according to the local commitment order, to vote to abort on transactions for which the database system has initiated an abort (the database system can initiate abort for any transaction, for many reasons), and to pass the atomic commitment decision to the database system. For determining the commitment order the CO component maintains an updated representation of the local conflict graph of the undecided (neither committed nor aborted) transactions as a data structure. The CO component has a passive interface with its database system to receive conflict, "ready" (processing has ended; readiness to vote on a transaction), and abort notifications from the database system. It also interfaces with the atomic commitment protocol to vote and to receive the atomic commitment protocol's decision on each transaction. The decision is delivered from the CO component to the database system through their interface to execute it (this is the only needed notification to the database system; for a database's own initiated abort the decision (abort) is known, and the database system can abort a transaction immediately upon its decision to abort). The CO component, including its interfaces, can be enhanced, if it implements another variant of CO (see below), or plays a role in the database's concurrency control mechanism beyond voting in atomic commitment.

[edit] The situation when commitment ordering is a necessary condition for global serializability

If the databases that participate in distributed transactions (i.e., transactions that span more than a single database) do not use any shared concurrency control information and use unmodified atomic commitment protocol messages (for reaching atomicity), then maintaining (local) commitment ordering is a necessary condition for guaranteeing global serializability. This is a mathematical fact derived from the definitions of serializability and a transaction. It means that if not complying with CO, then global serializability cannot be guaranteed under this condition (the condition of no concurrency control information sharing between databases beyond atomic commit protocol messages). Atomic commitment is a minimal requirement for a distributed transaction since it is always needed, which is implied by the definition of transaction. Thus, complying with this requirement is probably the broadest definition of database independence and autonomy in the context of database concurrency control.

[edit] Related schedule properties and generalizations

[edit] Strong Strict two phase locking

Strong Strict Two Phase Locking (SS2PL; also referred to as Rigorousness or Rigorous scheduling) means that both read and write locks of a transaction are released only after the transaction has ended (either committed or aborted). The set of SS2PL schedules is a proper subset of the set of CO schedules. This property is widely utilized in database systems, and since it implies CO, databases that use it and participate in global transactions generate together a (globally) serializable schedule (when using any atomic commitment protocol, which is needed for atomicity in a multi-database environment). No database modificaion or addition is needed in this case to participate in a CO distributed solution (the set of undecided transactions to be aborted before voting in the local generic CO algorithm above is empty because of the locks, and hence such an algorithm is unnecessary in this case. A transaction can be voted on by a database system immediately after entering a "ready" state, i.e., completing running its task locally, and its locks are released by the database system only after it is decided by the atomic commitment protocol).

[edit] Strict commitment ordering

Strict Commitment Ordering (SCO) is the intersection of strictness (a special case of recoverability) and CO. It can use blocking mechanisms (locking) similar to those used for the popular SS2PL. Unlike SS2PL, SCO does not block on read-write conflicts (identical blocking for the other two conflict types: write-read, and write-write), and thus it allows shorter overall blocking in general, and more concurency (e.g., performance results for the most significant variant of "ordered shared locks," which is identical to SCO, support this). It is as practical as SS2PL since, like SS2PL, it provides besides serializability also strictness, which is utilized for efficient recovery of databases from failure.

SS2PL is a proper subset of SCO (which is another explanation why SCO is less constraining and provides more concurrency than SS2PL).

[edit] Extended commitment ordering

Extended Commitment Ordering (ECO) generalizes CO. When local transactions (transactions confined to a single database) can be distinguished from global (distributed) transactions (transactions that span two databases or more), commitment order is applied to global transactions only. This means that the temporal (partial) order of commit events of global transactions only (unimportant for local transactions) is consistent with the order of these transactions on the local (directed) conflict graph of each database in a system. A distributed algorithm to guarantee global ECO exists. As for CO, the algorithm needs only (unmodified) atomic commitment protocol messages. In order to guarantee global serializability, each database needs to guarantee also the serializability of its own local transactions by any (local) concurency control mechanism.

In summary, (local, which implies global) ECO together with local serializability, is a necessary and sufficient condition to guarantee global serializability when no concurrency control information beyond atomic commit messages is shared between databases, and local transactions (Vs. global) can be identified. This condition (ECO with local serializability) is weaker than CO, and allows more concurrency than CO at the cost of a more complicated local algorithm.

When all the transactions are assumed to be global (e.g., if no information is available about transactions being local), ECO reduces to CO.

Before a global transaction is committed, a generic local (to a database) ECO algorithm aborts a minimal set of undecided transactions (neither committed, nor aborted; either local transactions, or global that run locally), that can cause later a cycle in the conflict graph. This set of aborted transactions (not unique, contrary to CO) can be optimized , if each transaction is assigned with a weight (that can be determined by transaction's importance and by the computing resources already invested in the running transaction; optimization can be carried out, for example, by a reduction to the Max flow in networks broblem). Like for CO such a set is time dependent, does not increase for a ready distributed transaction, and becomes empty eventually. The local (to the database) concurrency control mechanism (separate from the ECO algorithm) ensures that local cycles are eliminated (unlike with CO, which implies serializability by itself). Local transactions can be always committed concurrently (even if a precedence relation exists, unlike CO). When the overall transactions' local partial order (which is determined by the local conflict graph, now only with possible temporary cycles, since a cycle eliminating local serializability mechanism exists) allows, also global transactions can be voted on to be committed concurrently (when all their transitively (indirect) preceding (via conflict) global transactions are committed, and all their transitively preceding local transactions are either committed or waiting (ready) to be committed, this, in analogy to the stronger distributed CO algorithm concurrent voting condition, where all the transitively preceding transactions need to be committed).

[edit] Multi-version commitment ordering

Multi-version Commitment Ordering (MVCO) is a generalization of CO for databases with multi-version resources. MVCO implies One-copy-serializability (1SER or 1SR) which is the generalization of serializability for multi-version resources (conflicts are generalized for different versions of a same data item, and conflict temporal order is replaced by version order, and possibly reversed, when different version accesses of that data item are conflicting). A distributed MVCO enforcing algorithm exists for a mixed environment with both single-version and multi-version resources. As for CO, the algorithm needs only (unmodified) atomic commitment protocol messages with no additional communication overhead. MVCO can be further generalized to employ the generalization of ECO.

See also Multiversion concurrency control.

[edit] On the interoperability among commitment ordering variants

With CO and its variants above (i.e., SS2PL, SCO, ECO, and MVCO) global serializability is achieved via atomic commitment protocol based distributed algorithms. For all the CO variants atomic commitment protocol is the instrument to eliminate global cycles (cycles that span two or more databases) in the global conflict graph (implicitly; no global data structure implementation is needed). In case of incompatible local commitment orders in two or more databases (when no global partial order can embed the local partial orders), which implies a global cycle in the global conflict graph, the atomic commitment protocol breaks that cycle by aborting an undecided transaction on it (see the distributed CO algorithm above). Differences between the various variants exist at the local level only (within the participating database systems). Each local CO instance of any variant has the same role, to determine the position of every global transaction (a transaction that spans two or more databases) within the local commitment order, i.e., to determine when it is the transaction's turn to be voted on locally in the atomic commitment protocol. Thus, all the CO variants exhibit the same behavior in regard to atomic commitment. This means that they are all interoperable via atomic commitment (using the same software interfaces, some already standardized for atomic commitment, primarily for the two phase commit protocol) and transparently can be utilized together in any distributed environment (while each CO variant instance is possibly associated with any relevant local concurrency control mechanism type). Any single global transaction can participate simultaneously in databases that may employ different CO variants.

Atomic commitment protocols are intended and designed to achieve atomicity without considering database concurrency control. These protocols can be specially enhanced for CO (including CO's various variants) to accelerate aborts used for breaking global cycles in the global conflict graph (for an earlier release of computing resources and typically locked data).

[edit] External links