Atomic broadcast
From Wikipedia, the free encyclopedia
To comply with Wikipedia's quality standards, this article may need to be rewritten. Please help improve this article. The discussion page may contain suggestions. |
This article needs additional citations for verification. Please help improve this article by adding reliable references. Unsourced material may be challenged and removed. (June 2007) |
The introduction to this article provides insufficient context for those unfamiliar with the subject. Please help improve the article with a good introductory style. |
In distributed systems, atomic broadcast or total order broadcast is a broadcast messaging protocol that ensures that messages are received reliably and in the same order by all participants (Défago et al. 2004).
This problem is usually considered in environments where participants can fail, for example, by crashing. Participants who never fail are called correct, the others are faulty. The following properties are usually required from an atomic broadcast protocol.
- Validity
- If a correct participant broadcasts a message, then all correct participants will eventually deliver it.
- Uniform Agreement
- If a participant delivers a message, then all correct participants will eventually deliver it as well.
- Uniform Integrity
- Any given message is delivered by each participant at most once, and only if it was previously broadcast.
- Uniform Total Order
- If some participant delivers message A after message B, then every participant delivers B only after it has delivered A.
A number of protocols have been proposed for performing atomic broadcast, under various assumptions about the network, failure models, availability of hardware support for multicast, and so forth (Défago et al. 2004). One widely popular technology in which atomic broadcast is available as a primitive is virtual synchrony, a kind of computing 'model' used for fault tolerance and data replication in many real-world systems and products.
[edit] How do atomic broadcast protocols work?
There are many protocols, but a very simple one would split the reliability aspects from the problem of ordering. A simple way to provide reliability is as follows.
- Add a header to the message listing the destinations to which it will be sent. For example, if the message is a string "Hello world", it will now have a list of destinations attached to it {Fred,Sarah,Sally,Dan}:"Hello world".
- Try to send the message directly to the destinations. Because networks are not reliable, some copies might not get through, so keep trying until the destinations acknowledge receipt.
- On receiving a message for the first time, send an acknowledgement back to the sender, and then send a copy to each of the other destinations listed in the header. For example, Fred should relay the message to Sarah, Sally and Dan. Again, he keeps trying until he is sure they have received the message.
- On receiving a duplicate of a message seen previously, acknowledge it, but then discard it without relaying it.
As you can see, each message is relayed by all the receivers and hence received N times, if there are N participants. The network actually experiences an O(N²) load. However, this was a very simple protocol, and there are ways to optimize atomic broadcast protocols for higher average performance. For example, we could have used IP multicast (if available), and could have delayed the relaying of messages briefly in the hope that the sender would succeed in getting messages through directly (obviously, that only helps if the sender then tells the destinations that it was successful!).
What about ordering? A simple solution would work this way:
- On receiving a message for the first time, save it in a wait queue.
- Some process is designated as the leader. Periodically, it should send out a list of the ordering to use for messages in its wait queue.
- Deliver messages from the wait queue in the order specified by the leader.
As you can see, ordering can be a little slow. At the very fastest, the leader will send out the recommended ordering to use the instant it first learns about a message. The average receiver would deliver a message 'one message hop' after receiving it. But one can fine-tune this sort of protocol and, in any case, many ordering protocols have been proposed; this is just one example.
But how do we pick the leader? We need to run a leader election protocol. Normally this protocol will run just once, at the start, and then will be idle unless the leader crashes. Systems that implement virtual synchrony automate the leader election mechanism and then provide built-in atomic broadcast protocols, with high speed ordering algorithms tuned to ensure that messages will get through with the smallest possible delay.
[edit] What sorts of issues arise?
We've touched on two of the many issues that real-world protocol builders need to address: 'fault-tolerance and ordering. Other important concerns involve ensuring high performance when sending a stream of multicast messages, minimizing the latency before delivery occurs, scaling in the number of receivers, and scaling in the numbers of groups that arise in the system as a whole.
Our protocol didn't run over TCP, but in some real-world settings, TCP is mandatory. For example, a multicast that runs directly over point to point message passing in this manner might have problems with network firewalls, some of which are designed to reject non-TCP messages.
In fact, firewalls and network address translators are big issues in the real Internet. In Internet settings, A may be able to establish a connection to B, but B might not be able to make a connection back to A. Thus our sender might be able to make TCP connections to the destinations, and yet the destinations might not be able to connect to one-another. Some machines may have problems with uplink speeds, or other sorts of performance issues (this can happen when a machine lives at the end of a wireless connection or a modem that receives from a cable or satellite, hence at high data rates, but transmits at low data rates). Thus, building a practical atomic broadcast protocol for use in Internet settings is a very complex undertaking, dominated by tough engineering challenges.
[edit] References
- Défago, X., Schiper, A., and Urbán, P. 2004. Total order broadcast and multicast algorithms: Taxonomy and survey. ACM Comput. Surv. 36, 4 (Dec. 2004), 372-421. DOI=10.1145/1041680.1041682 (alternate source)