Atomic broadcast

In distributed systems, atomic broadcast or total order broadcast is a broadcast messaging protocol that ensures that messages are received reliably and in the same order by all participants.^[1] Distributed systems are ones where each computer runs independently toward a common goal, and as a result, designing a successful atomic broadcast system is a significant challenge.^[2]

Atomic broadcast is a fundamental problem in distributed computing. A successful system must be a reliable broadcast. In addition, such a system must satisfy the total order property. This means that if computer A sends message 1 first and message 2 second, a success means that computer B receives both messages and that it receives message 1 before message 2. Atomic broadcasts are simple when computers are correct, meaning that they never fail. However, real computers are faulty, and do fail, and even if failures are temporary, this is where the challenge results.^[2]

The following properties are usually required from an atomic broadcast protocol. Validity means that if a correct participant broadcasts a message, then all correct participants will eventually receive it. Uniform agreement means that if a participant delivers a message, then all correct participants will eventually deliver it as well. Uniform integrity means that any given message is delivered by each participant at most once, and only if it was previously broadcast.

The definitions for validity and integrity may be sometimes formulated in different way. E.g. Michel Raynal et al.^[3] and Schiper et al.^[4] define validity property of atomic broadcast slightly differently, but the main requirement that messages are broadcast in the correct order remains.

A number of protocols have been proposed for performing atomic broadcast, under various assumptions about the network, failure models, availability of hardware support for multicast, and so forth.^[1] One widely popular technology in which atomic broadcast is available as a primitive is virtual synchrony, a kind of computing 'model' used for fault tolerance and data replication in many real-world systems and products.

References

Défago, X., Schiper, A., and Urbán, P. 2004. Total order broadcast and multicast algorithms: Taxonomy and survey. ACM Comput. Surv. 36, 4 (Dec. 2004), 372-421. DOI=10.1145/1041680.1041682 (alternate source)

↑ 1.0 1.1 Défago et al.. 2004
↑ 2.0 2.1 Kshemkalyani, Ajay; Singhal, Mukesh (2008). Distributed Computing: Principles, Algorithms, and Systems (Google eBook). Cambridge University Press. pp. 583–585. ISBN 9781139470315.
↑ Rodrigues L, Raynal M.: Atomic Broadcast in Asynchronous Crash-Recovery Distributed Systems , ICDCS '00: Proceedings of The 20th International Conference on Distributed Computing Systems ( ICDCS 2000)
↑ Ekwall, R.; Schiper, A.: Solving Atomic Broadcast with Indirect Consensus. Dependable Systems and Networks, 2006. DSN 2006. International Conference on 2006.