Commitment ordering
From Wikipedia, the free encyclopedia
In databases and transaction processing, Commitment ordering (CO) is a serializability technique, as well as the name of the resulting transaction schedule (history) property. In a CO compliant schedule the order of commitment events of transactions is identical to the precedence order of the respective transactions. CO is a broad special case of conflict serializability, and effective means (reliable and high-performance) to achieve global serializability across any conflict serializablity based database systems, each augmented with a CO component. As such CO is instrumental for global concurrency control of multidatabase systems, possibly highly distributed, and other cases of modular concurrency control that require global serializability, e.g., environments with transactional objects. CO is the most general property (a necessary condition) that guarantees global serializability, if the database systems involved do not share concurrency control information beyond atomic commitment messages. It generalizes the popular Strong strict two phase locking (SS2PL) property, which in conjunction with the Two phase commit protocol is the de facto standard to achieve global serializability across (SS2PL based) database systems.
Contents
|
[edit] Overview
CO has been referred to also as commit ordering, and commit order serializability and strong recoverability. The latter is a misleading name since CO is incomparable with recoverability, and the term "strong" implies a special case. This means that a schedule with a strong recoverability property does not necessarily have the CO property, and vice versa.
In a federated database system or any other more loosely defined multidatabase system, which are typically distributed in a communication network, transactions span multiple (and possibly distributed) databases. Enforcing global serializability in such system is problematic. Even if every local schedule of a single database is serializable, the global schedule of a whole system is not necessarily serializable. The massive communication exchanges of conflict information needed between databases to reach conflict serializability would lead to unacceptable performance, primarily due to computer and communication latency. The problem of achieving global serializability effectively had been characterized as open until the end of 1991 (see Global serializability).
Enforcing CO is an effective way to enforce conflict serializability globally in a distributed system, since enforcing CO locally in each database, also enforces it globally. Each database may use any, possibly different, type of concurrency control mechanism that provides conflict serializability, and has to enforce CO locally. Enforcing CO locally does not affect the data access scheduling strategy of the local concurrency control mechanism, since such a mechanism typically does not consider the commitment events (or their order). The CO solution requires no communication overhead, since it uses (unmodified) atomic commitment protocol messages only, already needed by each distributed transaction to reach atomicity. An atomic commitment protocol plays a central role in the distributed CO algorithm, which enforces CO globally, by breaking global cycles (cycles that span two or more database) in the global conflict graph. CO, its special cases, and its generalizations are compatible, and achieve global serializability while transparently being utilized together in a single heterogeneous distributed environment comprising objects with possibly different concurrency control mechanisms. As such, Commitment ordering, including its special cases, and together with its generalizations, provides a general, high performance, fully distributed solution (no central processing component or central data structure are needed) for guaranteeing global serializability in heterogeneous environments of multidatabase systems and other multiple transactional objects (objects with states accessed and modified only by transactions; e.g., in the framework of transactional processes). The CO solution scales up with network size and the number of databases without any negative impact on performance (assuming the statistics of a single distributed transaction, e.g., the average number of databases involved with a single transaction, are unchanged).
[edit] The commitment ordering solution for global serializability
[edit] General characterization of CO
Commitment ordering (CO) is a special case of conflict serializability. CO can be enforced with non-blocking mechanisms (each transaction can complete its task without having its data-access blocked; however, commitment could be blocked). In a CO schedule the commitment events' (partial) precedence order of the transactions corresponds to the precedence (partial) order of the respective transactions in the (directed) conflict graph (serializability graph), as induced by their conflicting access operations (usually read and write (insert/modify/delete) operations; CO also applies to higher level operations, where they are conflicting if noncommutative, as well as to conflicts between operations upon multi-version data).
- Definition - Commitment ordering
- Let T1,T2 be two committed transactions in a schedule, such that T2 is in a conflict with T1 (T1 precedes T2). The schedule has the Commitment ordering (CO) property, if for every two such transactions T1 commits before T2 commits.
The commitment decision events are generated by either a local commitment mechanism, or an atomic commitment protocol, if different processes need to reach consensus on whether to commit or abort. The protocol may be distributed or centralized. Transactions may be committed concurrently, if the commit partial order allows (if they do not have conflicting operations). If different conflicting operations induce different partial orders of same transactions, then the conflict graph has cycles, and the schedule will violate serializability when all the transactions on a cycle are committed. In this case no partial order for commitment events can be found. Thus, cycles in the conflict graph need to be broken by aborting transactions. However, any conflict serializable schedule can be made CO without aborting any transaction, by properly delaying commit events to comply with the transactions' precedence partial order.
CO enforcement by itself is not sufficient as a concurrency control mechanism, since CO lacks the recoverability property, which should be supported as well.
[edit] The distributed CO algorithm
A fully distributed Global commitment ordering enforcement algorithm exists, that uses local CO of each participating database, and needs only (unmodified) atomic commitment protocol messages with no further communication. The distributed algorithm is the combination of local (to each database) CO algorithm processes, and an atomic commitment protocol (which can be fully distributed). Atomic commitment protocol is essential to enforce atomicity of each distributed transaction (to decide whether to commit or abort it; this procedure is always carried out for distributed transactions, independently of concurrency control and CO). A common example of an atomic commitment protocol is the two phase commit protocol, which is resilient to many types of system failure.
[edit] Enforcing global CO
In each database system a local CO algorithm determines the needed commitment order for that database. By the characterization of CO above, this order depends on the local precedence order of transactions, which results from the local data access scheduling mechanisms. Accordingly votes-to-commit in the atomic commitment protocol are scheduled for each (unaborted) distributed transaction. If a precedence relation (conflict) exists between two transactions, then the second will not be voted on before the first is completed (either committed or aborted), to prevent possible commit order violation by the atomic commitment protocol. Such can happen since the commit order by the protocol is not necessarily the same as the voting order. If no precedence relation exists, both can be voted on concurrently. This voting strategy ensures that also the atomic commitment protocol maintains commitment order, and it is a necessary condition for guaranteeing Global CO (and local CO of a database; without it both Global CO and Local CO, which means that each database is CO compliant, may be violated).
However, since database systems schedule their transactions independently, it is possible that the transactions' precedence orders in two databases or more are not compatible (no global partial order exists that can embed the respective local partial orders). With CO precedence orders are also the commitment orders. When participating databases in a same distributed transaction do not have compatible local precedence orders for that transaction (without "knowing" it; typically no coordination between database systems exists on conflicts, since the needed communication is massive and unacceptably degrades performance) it means that the transaction resides on a global cycle (involving two or more databases) in the global conflict graph. In this case the atomic commitment protocol will fail to collect all the votes needed to commit that transaction: At least one database will delay its vote for that transaction indefinitely, to comply with its own commitment (precedence) order, since it will be waiting to the completion of another, preceding transaction on that global cycle, delayed indefinitely by another database with a different order. This means a voting-deadlock situation involving the databases on that cycle. As a result the protocol will eventually abort some deadlocked transaction on this global cycle, since each such transaction is missing at least one participant's vote. Selection of the specific transaction on the cycle to be aborted depends on the atomic commitment protocol's abort policies. A timeout mechanism is common. However, it may result in more than one abort per cycle. Both preventing unnecessary aborts, and abort time shortening, can be achieved by a dedicated abort mechanism for CO. This abort will break the global cycle involving that distributed transaction, and deadlocked transactions, and possibly others in conflict with the deadlocked (and thus blocked) will be free to be voted on. It is worthwhile noting that each database involved with the voting-deadlock continues to vote regularly on transactions that are not in conflict with its deadlocked transaction, typically almost all the outstanding transactions. Thus, in case of incompatible local (partial) commitment orders, no action is needed since the atomic commitment protocol resolves it automatically by aborting a transaction that is the cause of incompatibility. This means that the above voting policy is also a sufficient condition for guaranteeing Global CO.
The following is concluded:
- The Global CO Enforcing Theorem
- Let T1,T2 be undecided (neither committed nor aborted) transactions in a database system that enforces CO for local transactions, such that T2 is global and in conflict with T1 (T1 precedes T2). Then, having T1 ended (either committed or aborted) before T2 is voted on to be committed, in each such database system in a multidatabase environment, is a necessary and sufficient condition for guaranteeing Global CO (the condition guarantees Global CO, which may be violated without it).
- Comment: The Local CO property of a global schedule means that each database is CO compliant. From the necessity discussion part above it directly follows that the theorem is true also when replacing "Global CO" with "Local CO" when global transactions are present. Together it means that Global CO is guaranteed if and only if Local CO is guaranteed (which is untrue for Global serializability and Local serializability: Global implies Local, but not the opposite).
Global CO implies Global serializability.
[edit] More on global cycles and voting-deadlocks
The above global cycle elimination process can be explained also by the following observation:
Assume first, for simplicity, that every transaction reaches the ready- to-commit state and is voted on by at least one database (this implies that no blocking by locks occurs). Define a "wait for vote to commit" graph as a directed graph with transactions as nodes, and a directed edge from any first transaction to a second transaction if the first transaction blocks the vote to commit of the second transaction. Such blocking happens only if the second transaction is in a conflict with the first transaction (see above). Thus the "wait for vote to commit" graph is identical to the global conflict graph (has exactly the same definition - see in Serializability). A cycle in the "wait for vote to commit" graph means a deadlock in voting. Hence there is a deadlock in voting if and only if there is a cycle in the conflict graph. Local cycles (confined to a single database) are eliminated by the local serializability mechanisms. Consequently only global cycles are left, which are then eliminated by the atomic commitment protocol when it aborts deadlocked transactions with missing (blocked) respective votes.
Secondly, also local commits are dealt with: Note that with CO also a regular local commit of a local transaction blocks local commits and votes upon conflicts, and the situation for global transactions does not change also without the simplifying assumption above: The final result is the same also with local commitment for local transactions, without voting in atomic commitment for them.
Finally, blocking by a lock (which has been excluded so far) needs to be considered: A lock blocks and prevents a conflict, and if released only after transaction end, it may block indirectly either a vote or a local commit of another transaction (which now cannot get to ready state), with the same effect as of a direct blocking of a vote or a local commit. In this case a cycle is generated in the conflict graph only if such a blocking by a lock is also considered a conflict. With such extended definition of a conflict, the conflict graph is augmented with events of blocking-by-a-lock as conflicts, to become an augmented conflict graph, which is actually a wait for graph for voting and local commit. For example, if all the databases on a global cycle are SS2PL based, then all the related vote blocking situations are caused by locks (this is the classical, and probably the only global deadlock situation dealt with in the research literature). This is a global deadlock case where each related database creates a portion of the cycle, and no complete cycle is generated in any respective local wait-for graph. Thus, both global serializability graph cycles and global deadlocks related to locking generate voting-deadlocks. Augmented conflict graph cycles provide complete characterization for voting deadlocks as a combination of the two situations (both real cycles and potential cycles due to locking in the conflict graph, and portions of the two types that add up to complete cycles). All these cycle types need to be broken to both maintain global serializability and resolve global locking deadlocks, and indeed they are broken by the atomic commitment protocol upon voting deadlock.
Comment: This observation also explains the correctness of Extended CO (ECO) below: Global transactions' voting order must follow the conflict graph order with vote blocking when order relation (graph path) exists between two global transactions. Local transactions are not voted on, and their (local) commits are not blocked upon conflicts. This results in same voting-deadlock situations and resulting global cycle elimination process for ECO.
The voting-deadlock situation can be summarized as follows:
- The CO Voting-Deadlock Theorem
- Let a multidatabase environment comprise CO compliant (which eliminates local cycles) database systems that enforce, each, Global CO (using the condition in the theorem above). Then a voting-deadlock occurs if and only if a global cycle (spans two or more databases) exists in the Global augmented conflict graph (also blocking by a lock is considered a conflict). If the cycle does not break by any abort, then all the global transactions on it are involved with the respective voting-deadlock, and eventually each has its vote blocked (either directly, or indirectly by a data-access lock); if a local transaction resides on the cycle, eventually it has its (local) commit blocked.
Also the following locking based special case is concluded:
- The CO Locking-based Global-Deadlock Theorem
- In a CO compliant multidatabase system a locking-based global-deadlock, involving at least one data-access lock, and two or more database systems, is a reflection of a global cycle in the Global augmented conflict graph, which results in a voting-deadlock, if not broken. Such cycle is not a cycle in the (regular) Global conflict graph (and does not affect serializability).
- Comment: Any blocking in the cycle that is not data-access lock is a direct blocking of either voting or local commit. Atomic commitment resolves all voting-deadlocks, including this locking-based type.
Global cycle elimination (here voting-deadlock resolution by atomic commitment) and resulting aborted transactions' re-executions are time consuming, regardless of concurrency control used. If databases schedule transactions independently, global cycles are unavoidable (any scheduling coordination, e.g., with Distributed lock manager, results in autonomy violation, and typically also in substantial performance penalty). However, their likelihood can be made very low by implementing database and transaction design guidelines that reduce the number of conflicts involving a global transaction, primarily by properly handling hot spots, and avoiding conflicts by using commutativity when possible (e.g., when extensively using counters, as in finances, and especially multi-transaction accumulation counters, which are typically hot spots).
Atomic commitment protocols are intended and designed to achieve atomicity without considering database concurrency control. They abort upon detecting or heuristically finding missing votes, and typically unaware of global cycles. These protocols can be specially enhanced for CO (including CO's variants below) to accelerate aborts used for breaking global cycles in the global augmented conflict graph (for better performance by earlier release upon transaction-end of computing resources and typically locked data). For example, locking based global deadlock detection methods, other than timeout, can be generalized to include also local commit and vote direct blocking.
[edit] Enforcing CO locally
Commitment ordering can be enforced locally (in a single database) by a dedicated CO algorithm, or by any algorithm/protocol that provides any special case of CO. An important such protocol, being utilized extensively in database systems, which generates a CO schedule, is the strong strict two phase locking protocol (SS2PL: "release transaction's locks only after the transaction has been either committed or aborted"; see below). SS2PL is a proper subset of the intersection of 2PL and strictness.
[edit] A generic local CO algorithm
A generic CO algorithm is an algorithm independent of implementation details, that enforces exactly the CO property. It does not block data access (nonblocking), and consists of aborting transactions (only if needed) before committing a transaction. It aborts a (uniquely determined at any given time) minimal set of other undecided (neither committed, nor aborted) transactions that run locally and can cause serializability violation in the future (can later generate cycles of committed transactions in the conflict graph). This set consists of all undecided transactions with directed edges in the conflict graph to the transaction to be committed. The size of this set cannot increase when that transaction is waiting to be committed (in ready state: processing has ended), and typically decreases in time as its transactions are being decided. Thus, unless real-time constraints exist to complete transactions, it is preferred to wait with committing this transaction and let this set decrease in size. If another serializability mechanism exists locally (which eliminates cycles in the local conflict graph), or if no cycles involving that transaction exist, the set will be empty eventually. Otherwise the set will stabilize with transactions on local cycles, and aborts will have to occur to break the cycles. Since in the case of CO conflicts generate blocking on commit, local cycles indicate local commit-deadlocks, and deadlock resolution techniques as in SS2PL can be used (e.g., timeout, wait-for graph). Practically an additional concurrency control mechanism is always utilized even solely to enforce Recoverability. The generic CO algorithm does not affect local resource (data) access scheduling strategy, when it runs alongside of any other local concurrency control mechanism. It affects only the commit order, and for this reason it does not need to abort more transactions than those needed to be aborted for serializability violation prevention by any combined local concurrency control mechanism (e.g., aborts for deadlock resolution in blocking mechanisms). The net effect of CO may be, at most, a delay of commit events (or voting in a distributed environment), to comply with the needed commit order (but not more delay than its special cases, for example, SS2PL, and on the average significantly less).
[edit] Implementation considerations - The Commitment Order Coordinator (COCO)
Consider a database system in a multidatabase environment. From a software architecture point of view a CO component that implements the generic CO algorithm locally, the Commitment Order Coordinator (COCO), can be designed in a straightforward way as a mediator between a (single) database system and an atomic commitment protocol component (however, the COCO is typically an integral part of the database system). The COCO's functions are to vote to commit on ready transactions (processing has ended) according to the local commitment order, to vote to abort on transactions for which the database system has initiated an abort (the database system can initiate abort for any transaction, for many reasons), and to pass the atomic commitment decision to the database system. For determining the commitment order the COCO maintains an updated representation of the local conflict graph of the undecided (neither committed nor aborted) transactions as a data structure (e.g., utilizing mechanisms similar to locking, but with no data-access blocking). The COCO component has a passive interface with its database system to receive conflict, "ready" (processing has ended; readiness to vote on a transaction), and abort notifications from the database system. It also interfaces with the atomic commitment protocol to vote and to receive the atomic commitment protocol's decision on each transaction. The decision is delivered from the COCO to the database system through their interface to execute it (this is the only needed notification to the database system; for a database's own initiated abort the decision (abort) is known, and the database system can abort a transaction immediately upon its decision to abort). The COCO, including its interfaces, can be enhanced, if it implements another variant of CO (see below), or plays a role in the database's concurrency control mechanism beyond voting in atomic commitment.
[edit] The situation when CO is a necessary condition for global serializability
If the databases that participate in distributed transactions (i.e., transactions that span more than a single database) do not use any shared concurrency control information and use unmodified atomic commitment protocol messages (for reaching atomicity), then maintaining (local) commitment ordering is a necessary condition for guaranteeing global serializability (obviously it is also a sufficient condition). This is a mathematical fact derived from the definitions of serializability and a transaction. It means that if not complying with CO, then global serializability cannot be guaranteed under this condition (the condition of no concurrency control information sharing between databases beyond atomic commit protocol messages). Atomic commitment is a minimal requirement for a distributed transaction since it is always needed, which is implied by the definition of transaction. Thus, defining database autonomy and independence as complying with this requirement is probably the broadest such definition in the context of database concurrency control. Using this definition the following is concluded:
- The CO Theorem
- CO compliance of every autonomous database system (or transactional object) in a multidatabase environment is a necessary condition for guaranteeing Global serializability. CO compliance of every database system is a sufficient condition for guaranteeing Global serializability.
Comment: The definition of autonomy above implies that transactions are scheduled in a way that local transactions (confined to a single database) cannot be identified as such by an autonomous database system. This is realistic for some transactional objects, but less realistic for general purpose database systems. If autonomy is augmented with the ability to identify local transactions, then compliance with a more general property, Extended commitment ordering (ECO, see below), is the necessary condition.
[edit] Summary
The Commitment ordering (CO) solution for Global serializability can be summarized as follows:
If each database (or any other transactional object) in a multidatabase environment complies with CO, i.e., arranges its local transactions' commitments and its votes on (global, distributed) transactions to the atomic commitment protocol according to the local (to the database) partial order induced by the local conflict graph (serializability graph) for the respective transactions, then Global CO and Global serializability are guaranteed. A database's CO compliance can be achieved effectively with any local conflict serializability based concurrency control mechanism, with neither affecting any transaction's execution process or scheduling, nor aborting it. Also the database's autonomy is not violated. The only low overhead incurred is detecting conflicts (e.g., as with locking, but with no data-access blocking; if not already detected for other purposes), and ordering votes and local transactions' commits according to the conflicts.
In case of incompatible partial orders of two or more databases (no global partial order can embed the respective local partial orders), a global cycle (spans two databases or more) in the global conflict graph is generated. This, together with CO, results in a cycle of blocked votes, and a voting-deadlock occurs for the databases on that cycle (however, allowed concurrent votings in each database, typically almost all the outstanding votings, continue to execute). In this case the atomic commitment protocol fails to collect all the votes needed for the blocked transactions on that global cycle, and consequently the protocol aborts some transaction with a missing vote. This breaks the global cycle, the voting-deadlock is resolved, and the related blocked votings are free to be executed. Breaking the global cycle in the global conflict graph ensures that both Global CO and Global serializability are maintained. Thus, in case of incompatible local (partial) commitment orders no action is needed since the atomic commitment protocol resolves it automatically by aborting a transaction that is a cause for the incompatibility. Furthermore, also global deadlocks due to locking (potential global cycles in the conflict graph) result in voting deadlocks and are resolved automatically by the same mechanism.
Local CO is a necessary condition for guaranteeing Global serializability, if the databases involved do not share any concurrency control information beyond (unmodified) atomic commitment protocol messages, i.e., if the databases are autonomous in the context of concurrency control. This means that every global serializability solution for autonomous databases must comply with CO. Otherwise global serializability may be violated.
The CO solution scales up with network size and the number of databases without performance penalty when it utilizes common distributed atomic commitment architecture.
[edit] CO variants: Interesting special cases and generalizations
[edit] Strong strict two phase locking (SS2PL)
Strong Strict Two Phase Locking (SS2PL; also referred to as Rigorousness or Rigorous scheduling) means that both read and write locks of a transaction are released only after the transaction has ended (either committed or aborted). The set of SS2PL schedules is a proper subset of the set of CO schedules. This property is widely utilized in database systems, and since it implies CO, databases that use it and participate in global transactions generate together a serializable global schedule (when using any atomic commitment protocol, which is needed for atomicity in a multi-database environment). No database modification or addition is needed in this case to participate in a CO distributed solution: The set of undecided transactions to be aborted before committing in the local generic CO algorithm above is empty because of the locks, and hence such an algorithm is unnecessary in this case. A transaction can be voted on by a database system immediately after entering a "ready" state, i.e., completing running its task locally. Its locks are released by the database system only after it is decided by the atomic commitment protocol, and thus the condition in the Global CO enforcing theorem above is kept. Interestingly, if a local timeout mechanism is used by a database system to resolve (local) SS2PL deadlocks, then aborting blocked transactions breaks not only potential local cycles in the global conflict graph (real cycles in the augmented conflict graph), but also database system's potential global cycles as a side effect, if the atomic commitment protocol's abort mechanism is relatively slow. Such idependent aborts by several entities typically may result in unnecessary aborts for more than one transaction per global cycle. The situation is different for a local wait-for graph based mechanisms: Such cannot identify global cycles, and the atomic commitment protocol will break the global cycle, if the resulting voting deadlock is not resolved earlier in another database.
[edit] Strict CO (SCO)
Strict Commitment Ordering (SCO) is the intersection of strictness (a special case of recoverability) and CO, and provides an upper bound for a schedule's concurrency when both properties exist. It can be implemented using blocking mechanisms (locking) similar to those used for the popular SS2PL. Unlike SS2PL, SCO does not block on read-write conflicts (identical blocking for the other two conflict types: write-read, and write-write), possibly blocking on commit instead. As a result it has shorter average blocking periods, and more concurrency (e.g., performance results for the most significant variant of "ordered shared locks," which is identical to SCO, support this). It is as practical as SS2PL since, like SS2PL, it provides besides serializability also strictness, which is widely utilized as a basis for efficient recovery of databases from failure.
SS2PL is a proper subset of SCO (which is another explanation why SCO is less constraining and provides more concurrency than SS2PL).
[edit] Extended CO (ECO)
[edit] General characterization of ECO
Extended Commitment Ordering (ECO) generalizes CO. When local transactions (transactions confined to a single database) can be distinguished from global (distributed) transactions (transactions that span two databases or more), commitment order is applied to global transactions only. Thus, for a local (to a database) schedule to have the ECO property, the chronological (partial) order of commit events of global transactions only (unimportant for local transactions) is consistent with their order on the respective local conflict graph.
- Definition - Extended commitment ordering
- Let T1,T2 be two committed global transactions in a schedule, such that a directed path of unaborted transactions exists in the conflict graph (precedence graph) from T1 to T2 (T1 precedes T2, possibly transitively, indirectly). The schedule has the Extended commitment ordering (ECO) property, if for every two such transactions T1 commits before T2 commits.
A distributed algorithm to guarantee global ECO exists. As for CO, the algorithm needs only (unmodified) atomic commitment protocol messages. In order to guarantee global serializability, each database needs to guarantee also the conflict serializability of its own transactions by any (local) concurrency control mechanism.
- The ECO Theorem
- (Local, which implies global) ECO together with local conflict serializability, is a sufficient condition to guarantee global conflict serializability (when local transactions, Vs. global, can be identified). When no concurrency control information beyond atomic commitment messages, except for identifying local transactions, is shared outside a database, it is also a necessary condition.
This condition (ECO with local serializability) is weaker than CO, and allows more concurrency at the cost of a little more complicated local algorithm (however, no practical overhead difference with CO exists).
When all the transactions are assumed to be global (e.g., if no information is available about transactions being local), ECO reduces to CO.
[edit] The ECO algorithm
Before a global transaction is committed, a generic local (to a database) ECO algorithm aborts a minimal set of undecided transactions (neither committed, nor aborted; either local transactions, or global that run locally), that can cause later a cycle in the conflict graph. This set of aborted transactions (not unique, contrary to CO) can be optimized , if each transaction is assigned with a weight (that can be determined by transaction's importance and by the computing resources already invested in the running transaction; optimization can be carried out, for example, by a reduction to the Max flow in networks problem). Like for CO such a set is time dependent, and becomes empty eventually. Practically, almost in all needed implementations a transaction should be committed only when the set is empty (and no set optimization is applicable). The local (to the database) concurrency control mechanism (separate from the ECO algorithm) ensures that local cycles are eliminated (unlike with CO, which implies serializability by itself; however, practically also for CO a local concurrency mechanism is utilized, at least to ensure Recoverability). Local transactions can be always committed concurrently (even if a precedence relation exists, unlike CO). When the overall transactions' local partial order (which is determined by the local conflict graph, now only with possible temporary local cycles, since cycles are eliminated by a local serializability mechanism) allows, also global transactions can be voted on to be committed concurrently (when all their transitively (indirect) preceding (via conflict) global transactions are committed, while transitively preceding local transactions can be at any state. This in analogy to the distributed CO algorithm's stronger concurrent voting condition, where all the transitively preceding transactions need to be committed).
The condition for guaranteeing Global ECO can be summarized similarly to CO:
- The Global ECO Enforcing Theorem
- Let T1,T2 be undecided (neither committed nor aborted) global transactions in a database system that ensures serializability locally, such that a directed path of unaborted transactions exists in the local conflict graph (that of the database itself) from T1 to T2. Then, having T1 ended (either committed or aborted) before T2 is voted on to be committed, in every such database system in a multidatabase environment, is a necessary and sufficient condition for guaranteeing Global ECO (the condition guarantees Global ECO, which may be violated without it).
Global ECO (all global cycles in the global conflict graph are eliminated by atomic commitment) together with Local serializability (i.e., each database system maintains serializability locally; all local cycles are eliminated) imply Global serializability (all cycles are eliminated).
Similarly to CO as well, the ECO voting-deadlock situation can be summarized as follows:
- The ECO Voting-Deadlock Theorem
- Let a multidatabase environment comprise database systems that enforce, each, both Global ECO (using the condition in the theorem above) and local conflict serializability (which eliminates local cycles in the global conflict graph). Then, a voting-deadlock occurs if and only if a global cycle (spans two or more databases) exists in the Global augmented conflict graph (also blocking by a lock is considered a conflict). If the cycle does not break by any abort, then all the global transactions on it are involved with the respective voting-deadlock, and eventually each has its vote blocked (either directly, or indirectly by a data-item lock). If a local transaction resides on the cycle, it may be in any unaborted state (running, ready, or committed; no commit blocking is needed).
This means that also global deadlocks due to locking generate voting deadlocks, and are automatically resolved by atomic commitment.
[edit] Multi-version CO (MVCO)
Multi-version Commitment Ordering (MVCO) is a generalization of CO for databases with multi-version resources. MVCO implies One-copy-serializability (1SER or 1SR) which is the generalization of serializability for multi-version resources (conflicts are generalized for different versions of a same data item, and conflict temporal order is replaced by version order, and possibly reversed, when different version accesses of that data item are conflicting). A distributed MVCO enforcing algorithm exists for a mixed environment with both single-version and multi-version resources. As for CO, the algorithm needs only (unmodified) atomic commitment protocol messages with no additional communication overhead. MVCO can be further generalized to employ the generalization of ECO.
See also Multiversion concurrency control.
[edit] On the interoperability among CO variants
With CO and its variants above (i.e., SS2PL, SCO, ECO, and MVCO) global serializability is achieved via atomic commitment protocol based distributed algorithms. For CO and all its variants atomic commitment protocol is the instrument to eliminate global cycles (cycles that span two or more databases) in the augmented (and regular) global conflict graph (implicitly; no global data structure implementation is needed). In case of incompatible local commitment orders in two or more databases (when no global partial order can embed the local partial orders), which implies a global cycle in the global conflict graph, the atomic commitment protocol breaks that cycle by aborting an undecided transaction on it (see The distributed CO algorithm above). Differences between the various variants exist at the local level only (within the participating database systems). Each local CO instance of any variant has the same role, to determine the position of every global transaction (a transaction that spans two or more databases) within the local commitment order, i.e., to determine when it is the transaction's turn to be voted on locally in the atomic commitment protocol. Thus, all the CO variants exhibit the same behavior in regard to atomic commitment. This means that they are all interoperable via atomic commitment (using the same software interfaces, some already standardized for atomic commitment, primarily for the two phase commit protocol) and transparently can be utilized together in any distributed environment (while each CO variant instance is possibly associated with any relevant local concurrency control mechanism type).
In summary, any single global transaction can participate simultaneously in databases that may employ each any, possibly different, CO variant (while concurrently running processes in each such database, and running concurrently with local and other global transactions in each such database). The atomic commitment protocol is indifferent to CO, and does not distinguish between the various CO variants. Any global cycle generated in the augmented global conflict graph may span databases of different CO variants, and generate a voting deadlock that is resolved by atomic commitment exactly the same way as in a single CO variant environment.
[edit] See also
[edit] References
- Yoav Raz: The Principle of Commitment Ordering, or Guaranteeing Serializability in a Heterogeneous Environment of Multiple Autonomous Resource Managers Using Atomic Commitment. Proceedings of the Eighteenth International Conference on Very Large Data Bases (VLDB), pp. 292-312, Vancouver, Canada, August 1992. (also DEC-TR 841, Digital Equipment Corporation, November 1990)
- Yoav Raz: Serializability by Commitment Ordering. Information Processing Letters, Volume 51, Number 5, pp. 257-264, September 1994. (Received August 1991)
- Yoav Raz: The Commitment Order Coordinator (COCO) of a Resource Manager, or Architecture for Distributed Commitment Ordering Based Concurrency Control, DEC-TR 843, Digital Equipment Corporation, December 1991.
- Yoav Raz: Locking Based Strict Commitment Ordering, or How to improve Concurrency in Locking Based Resource Managers, DEC-TR 844, December 1991.
- Yoav Raz: Extended Commitment Ordering or Guaranteeing Global Serializability by Applying Commitment Order Selectivity to Global Transactions, Proceedings of the Twelfth ACM Symposium on Principles of Database Systems, Washington, DC, pp. 83-96, May 1993. (also DEC-TR 842, November 1991)
- Yoav Raz: Commitment Ordering Based Distributed Concurrency Control for Bridging Single and Multi Version Resources, Proceedings of the Third IEEE International Workshop on Research Issues on Data Engineering: Interoperability in Multidatabase Systems, Vienna, Austria, pp. 189-198, April 1993.(also DEC-TR 853, July 1992)