Global serializability

From Wikipedia, the free encyclopedia

In databases and transaction processing, global serializability is a property of a global schedule of transactions. A global schedule is the unified schedule of all the individual database schedules in a multidatabase (federated database) environment. Complying with global serializability means that the global schedule is serializable, has the serializability property. This makes global serializability a major goal for global concurrency control in multidatabase systems.

In a federated database system or any other more loosely defined multidatabase system, which are typically distributed in a communication network, transactions span multiple (and possibly distributed) databases. Enforcing global serializability in such system, where different databases may use different types of concurrency control, is problematic. Even if every local schedule of a single database is serializable, the global schedule of a whole system is not necessarily serializable. The massive communication exchanges of conflict information needed between databases to reach conflict serializability would lead to unacceptable performance, primarily due to computer and communication latency.

Contents

[edit] The global serializability problem

The difficulties above translate into the following problem:

Find an efficient (high-performance and fault tolerant) method to enforce Global serializability (global conflict serializability) in a heterogeneous distributed environment of multiple autonomous database systems. The database systems may employ different concurrency control methods. No limitation is imposed on the operations of either local transactions (confined to a single database system) or global transactions (span two or more database systems).

Lack of an appropriate solution for the global serializability problem drove researches to look for alternatives to serializability as correctness criteria in a multidatabase environment (e.g., see below), and the problem was characterized as difficult and open. The following quotes demonstrate the mindset about it by the end of the year 1991:

"Without knowledge about local as well as global transactions, it is highly unlikely that efficient global concurrency control can be provided... Additional complications occur when different component DBMSs (Database Management Systems) and the FDBMSs (Federated Database Management Systems) support different concurrency mechanisms... It is unlikely that a theoretically elegant solution that provides conflict serializability without sacrificing performance (i.e., concurrency and/or response time) and availability exists."[1]
"Transaction management in a heterogeneous, distributed database system is a difficult issue. The main problem is that each of the local database management systems may be using a different type of concurrency control scheme. Integrating this is a challenging problem, made worse if we wish to preserve the local autonomy of each of the local databases, and allow local and global transactions to execute in parallel. One simple solution is to restrict global transactions to retrieve-only access. However, the issue of reliable transaction management in the general case, where global and local transactions are allowed to both read and write data, is still open."[2]

Several solutions, some partial, have been proposed for the global serializability problem. Among them:

The problem of Global serializability has been a quite intensively researched subject in the late 1980ies and early 1990ies. Commitment ordering has provided both a general solution to the problem, insight into it, and understanding about possible generalizations of strong strict two phase locking (SS2PL), which practically and almost exclusively has been utilized (in conjunction with the Two-phase commit protocol (2PC) ) since the 1980ies to achieve global serializability across databases. Since then the tremendous progress in computing power, storage, and communication networks, resulted in tremendous increases in both centralized databases' sizes, transaction rates, and remote access to database capabilities, as well as blurring the boundaries between centralized computing and distributed one over fast, low-latency local networks. These, together with progress in database vendors' distributed solutions (primarily the popular SS2PL with 2PC based, a de facto standard that also allows interoperability among different vendors' (SS2PL based) databases), workflow systems, and database replication technology, in most cases have provided satisfactory and sometimes better information technology solutions without multi database atomic distributed transactions over databases with different concurrency control (bypassing the problem above). As a result, the sense of urgency that existed with the problem at that period, and with high-performance distributed atomic transactions over databases with different concurrency control in general, has reduced.

[edit] The commitment ordering solution

Main article: Commitment ordering

Commitment ordering[3][4] (CO) is the only high-performance, fault tolerant solution that has been proposed as a fully distributed (no central computing component or data-structure are needed), general mechanism that can be combined seamlessly with any local (to a database) conflict serializability providing concurrency control mechanism (see technical summary). Since the CO property of a schedule is a necessary condition for global serializability of autonomous databases (in the context of concurrency control), it provides the only general solution for autonomous databases. Seemingly by sheer luck, the CO solution possesses many attractive properties: It maintains each database's autonomy, does not restrict or delay any data-access operation for either local or global transactions, and does not need any knowledge about the transactions. It requires no communication overhead since it uses already needed, unmodified atomic commitment protocol messages (using atomic commitment protocols makes it fault tolerant, assuming that database systems are fault tolerant). Furthermore, it automatically resolves global deadlocks due to locking. Since each global transaction is confined to certain relatively small numbers of databases and network nodes, the CO solution scales up effectively with network size and number of databases, almost without any negative impact on performance. The only overhead incurred by the CO solution is locally detecting conflicts (if not already done otherwise) and locally ordering in each database system the (local) commits of local transactions, and the votings for atomic commitment of global transactions. Such overhead is low. This makes CO instrumental for global concurrency control of multidatabase systems (e.g., federated database systems). The underlying theory of Commitment ordering is both sound and elegant.

Most existing database systems, including all major commercial database systems, are strong strict two phase locking (SS2PL) based and already CO compliant. Thus they can participate in a CO based solution for global serializability in multidatabase environments without any modification (except for the popular multiversioning, where additional CO aspects should be considered). Achieving global serializability across SS2PL based databases using atomic commitment (primarily using two phase commit, 2PC) has been employed for many years (i.e., using the same CO solution for a specific special case; however, no reference is known prior to CO, that notices this special case's automatic global deadlock resulotion by the atomic commitment protocol's augmented-conflict-graph global cycle elimination process). Virtually all existing distributed transaction processing environments and supporting products rely on SS2PL and provide 2PC. As a matter of fact SS2PL together with 2PC have become a de facto standard. It is a homogeneous concurrency control solution, suboptimal but still quite effective in many cases, which allows inter-operation among SS2PL complying different database system types, i.e, allows heterogeneity in aspects other than concurrency control. SS2PL is a very constraining schedule property, and "takes over" when combined with any other property. For example, when combined with any optimistic property, the result is not optimistic anymore, but rather characteristically SS2PL. On the other hand, CO does not change data-access scheduling patterns at all, and any combined property's characteristics remain unchanged. Since also CO uses atomic commitment (e.g., 2PC) for achiveng global serializability, as SS2PL, any CO compliant database system or transactional object can transparently join existing SS2PL based environments, use 2PC, and maintain global serializability without any environment change. This makes CO a straightforward, natural generalization of SS2PL for any conflict serializability based database system, for all practical purposes.

Commitment ordering has been quite widely known inside the transaction processing and databases communities at Digital Equipment Corporation (DEC) since 1990. It has been under company confidentiality due to patenting processes. CO was disclosed outside of DEC by lectures and technical reports' distribution to database researches in May 1991, immediately after its first patent filing.

[edit] Relaxing global serializability

Some techniques have been developed for relaxed global serializability (i.e., they do not guarantee global serializability; see also Relaxing serializability). Among them:

  • Quasi serializability
  • Two-level serializability

While local (to a database system) relaxed serializability methods compromise serializability for performance gain (and utilized only when the application allows), it is unclear that various proposed relaxed global serializability methods which compromise global serializability, provide any performance gain over commitment ordering which guarantees global serializability. Typically, the declared intention of such methods has not been performance gain over effective global serializability methods (which apparently have been unknown to the inventors), but rather correctness criteria alternatives due to lack of a known effective global serializability method. Oddly, some of them were introduced years after CO had been introduced, and some even quote CO without realizing that it provides an effective global serializability solution, and thus without providing any performance comparison with CO to justify them as alternatives to global serializability (only for some applications).

Classes of schedules defined by relaxed global serializability properties either contain the global serializability class, or are incomparable with it. What differentiates relaxed global conflict serializability (RGCSR) properties from relaxed conflict serializability (RCSR) properties that are not RGCSR is typically the different way global cycles (span two or more databases) in the global conflict graph are handled. No distinction between global and local cycles exists for RCSR properties that are not RGCSR. RCSR contains RGCSR. Typically RGCSR techniques eliminate local cycles, i.e., provide local serializability (which can be achieved effectively by regular, known concurrency control methods), however, obviously they do not eliminate all global cycles (which would achieve global serializability).

[edit] See also

[edit] References

  1. ^ Amit Sheth, James Larson, Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases, ACM Computing Surveys,Vol. 22, No 3, pp. 183-236, September 1990. quote from page 227
  2. ^ Abraham Silberschatz, Michael Stonebraker, and Jeffrey Ullman, Database Systems: Achievements and Opportunities, Communications of the ACM, Vol. 34, No. 10, pp. 110-120, October 1991. quote from page 120
  3. ^ Yoav Raz, The Principle of Commitment Ordering, or Guaranteeing Serializability in a Heterogeneous Environment of Multiple Autonomous Resource Managers Using Atomic Commitment, Proc. of the Eighteenth Int. Conf. on Very Large Data Bases, pp. 292-312, Vancouver, Canada, August 1992. (also DEC-TR 841, Digital Equipment Corporation, November 1990)
  4. ^ Yoav Raz, Serializability by Commitment Ordering, Information Processing Letters, Volume 51, Number 5, pp. 257-264, September 1994. (Received August 1991)