Self-stabilization
Self-stabilization is a concept of fault tolerance in distributed computing. A distributed algorithm is self-stabilizing if, starting from an arbitrary state, it is guaranteed to converge to a legitimate state and to remain in the set of legitimate states thereafter. A state is legitimate if, starting from this state, the algorithm satisfies its specification. The property of self-stabilization enables a distributed algorithm to recover from any transient fault, regardless of its nature. Moreover, a self-stabilizing algorithm does not need to be initialized, since it eventually starts to behave correctly regardless of its initial state.
In 1974, E. W. Dijkstra presented the first self-stabilizing algorithms, which prompted further research in this area.[1] Dijkstra used the example of a "token ring": a network of computers ordered in a circle, such that exactly one of them is supposed to "hold a token" at any given time. Not holding a token is a correct state for each individual computer, since the token can be held by another computer. However, if every computer is in the state of "not holding a token", then the network as a whole is not in a correct state. Similarly, if more than one computer "has a token", then this is not a correct state for the network, although it cannot be observed to be incorrect by viewing any computer individually. Since every computer can "see" only the states of its two neighbors in the ring, it is hard for the computers to decide whether the network as a whole is in a correct state.
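A minimal simulation sketch of Dijkstra's K-state token ring is given below. Only the guard and move rules come from Dijkstra's 1974 paper; the Python setting, the function names (has_token, step), the ring size, and the random central-daemon scheduler are illustrative assumptions of this sketch.

```python
# Sketch of Dijkstra's K-state token ring; names are illustrative.
import random

N = 5              # number of machines in the ring
K = N + 1          # each machine holds a counter in {0, ..., K-1}; K > N suffices

def has_token(x, i):
    """Machine i is 'privileged' (holds the token) iff its guard holds."""
    if i == 0:
        return x[0] == x[N - 1]    # bottom machine: equal to its left neighbour
    return x[i] != x[i - 1]        # other machines: differ from their left neighbour

def step(x, i):
    """Execute machine i's move, assuming it is privileged."""
    if i == 0:
        x[0] = (x[0] + 1) % K
    else:
        x[i] = x[i - 1]

# Start from an arbitrary, possibly illegitimate, global state ...
x = [random.randrange(K) for _ in range(N)]
# ... and let a central daemon repeatedly pick some privileged machine to move.
for _ in range(200):                                        # ample for a 5-machine ring
    privileged = [i for i in range(N) if has_token(x, i)]   # never empty in this algorithm
    step(x, random.choice(privileged))

# Once the ring has stabilized, exactly one machine is privileged at any time.
print(sum(has_token(x, i) for i in range(N)))
```

Regardless of the initial counter values, the moves drive the ring towards configurations in which exactly one machine is privileged, which is what makes the algorithm self-stabilizing.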
The ability to recover without external intervention is very desirable in modern computer and telecommunications networks, since it would enable them to repair errors and return to normal operation on their own. Computers and networks can thus be made fault-tolerant. Hence, many years after Dijkstra's seminal paper, the concept is gaining in importance, as it provides a foundation for self-managing and fault-tolerant computer systems.
The time complexity of a self-stabilizing algorithm is measured in (asynchronous) rounds or cycles. A round is a shortest execution trace in which each processor executes at least one step. Similarly, a cycle is a shortest execution trace in which each processor executes at least one complete iteration of its repeatedly executed list of commands. It is also of interest to measure the output stabilization time. For that, a subset of the state variables is defined to be externally visible (the output), and certain states of the output are defined to be correct (legitimate). The set of the outputs of all the components of the system is said to have stabilized at the time it becomes correct, provided it stays correct indefinitely unless additional faults occur. The output stabilization time is the time (the number of (asynchronous) rounds) until the output stabilizes.[2]
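As a rough illustration of these measures, the following continuation of the token-ring sketch above (it reuses N, K, has_token, step and the random module from that sketch; legitimate and stabilization_rounds are illustrative names) counts rounds until the ring reaches a legitimate configuration. Letting every machine take one move per round is a simplification of the asynchronous-round definition above.

```python
# Continuation of the token-ring sketch; assumes N, K, has_token, step, random.

def legitimate(x):
    """In Dijkstra's ring, a configuration is legitimate iff exactly one machine is privileged."""
    return sum(has_token(x, i) for i in range(N)) == 1

def stabilization_rounds(x, max_rounds=1000):
    """Rounds until the configuration is legitimate; in this sketch a round lets
    every machine execute once (its move if privileged, otherwise a no-op)."""
    for r in range(max_rounds):
        if legitimate(x):
            return r
        for i in range(N):
            if has_token(x, i):
                step(x, i)
    return None

x = [random.randrange(K) for _ in range(N)]
print(stabilization_rounds(x))    # a small number of rounds for a 5-machine ring
```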
The first self-stabilizing algorithms did not detect errors explicitly in order to subsequently repair them. Instead, they constantly pushed the system towards a legitimate state, even without explicitly detecting error states. Since traditional methods for detecting an error (e.g.[3]) were often very difficult and time-consuming, such a behaviour was considered desirable.
New methods for lightweight error detection in self-stabilizing systems were later suggested,[4][2] under the names of local detection[4] and local checking.[2] The term local refers to a part of a computer network: when local detection is used, a computer is not required to communicate with the entire network in order to detect an error; the error can be detected by having each computer communicate only with its nearest neighbors. These local detection methods simplified the task of designing self-stabilizing algorithms considerably, because the error detection mechanism and the recovery mechanism can be designed separately. Newer algorithms based on these detection methods also turned out to be much more efficient.
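A toy illustration of the local-checking idea is sketched below. The example network, the root, and the names (graph, ROOT, dist, locally_correct, local_repair) are assumptions of this sketch, not the protocols of the cited papers; the point is only that each node detects and repairs an inconsistency by looking solely at its own variables and those of its immediate neighbours, here distance-to-root values.

```python
# Hypothetical local checking of a distance-to-root layer; names are illustrative.

graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}   # example network, node 0 is the root
ROOT = 0

def locally_correct(v, dist):
    """Detection: a predicate over v and its neighbours only."""
    if v == ROOT:
        return dist[v] == 0
    return dist[v] == 1 + min(dist[u] for u in graph[v])

def local_repair(v, dist):
    """Correction: also uses only information available at v and its neighbours."""
    dist[v] = 0 if v == ROOT else 1 + min(dist[u] for u in graph[v])

dist = {0: 5, 1: 0, 2: 9, 3: 2}        # arbitrary corrupted state
while not all(locally_correct(v, dist) for v in graph):
    for v in graph:
        if not locally_correct(v, dist):
            local_repair(v, dist)
print(dist)   # converges to the true distances from the root: {0: 0, 1: 1, 2: 1, 3: 2}
```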
Additional efficiency was introduced with the notion of time-adaptive protocols.[5] The idea behind these is that when only a small number of errors occurs, the recovery time should (and can) be made short. Dijkstra's original algorithms do not have this property.
A useful property of self-stabilizing algorithms is that they can be composed by layering if they do not exhibit any circular dependencies. The stabilization time of the composition is then bounded by the sum of the individual stabilization times of each layer.
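A minimal sketch of such a layered composition, building on the hypothetical distance-checking example above (again with illustrative names), is shown below: a second layer reads the first layer's stabilized distance values and stabilizes parent pointers on top of them, so the total stabilization time is at most the sum of the two layers' times.

```python
# Continuation of the local-checking sketch; assumes graph, ROOT, dist from above.
# Layer 2 reads only layer 1's output (dist) and its neighbours' states.
parent = {v: v for v in graph}           # arbitrary initial layer-2 state

def layer2_correct(v):
    if v == ROOT:
        return parent[v] == ROOT
    return parent[v] in graph[v] and dist[parent[v]] == dist[v] - 1

def layer2_repair(v):
    parent[v] = ROOT if v == ROOT else min(graph[v], key=lambda u: dist[u])

while not all(layer2_correct(v) for v in graph):
    for v in graph:
        if not layer2_correct(v):
            layer2_repair(v)
print(parent)    # a BFS tree rooted at ROOT, e.g. {0: 0, 1: 0, 2: 0, 3: 2}
```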
Definition
A system is self-stabilizing if and only if:
- Starting from any state, it is guaranteed that the system will eventually reach a correct state (convergence).
- Given that the system is in a correct state, it is guaranteed to stay in a correct state, provided that no fault happens (closure).
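These two conditions can be restated slightly more formally as follows; the notation is a sketch assumed here and is not taken from the article. Write Σ for the set of system configurations, L ⊆ Σ for the legitimate ones, and → for the system's transition relation.

```latex
% Convergence and closure, sketched in transition-system notation (assumed here).
\begin{align*}
\text{Convergence:}\quad & \text{every execution } \sigma_0 \to \sigma_1 \to \cdots
  \text{ starting from any } \sigma_0 \in \Sigma \text{ reaches some } \sigma_i \in L;\\
\text{Closure:}\quad & \text{for every } \sigma \in L,\ \sigma \to \sigma' \text{ implies } \sigma' \in L.
\end{align*}
```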
A system is said to be randomized self-stabilizing if and only if it is self-stabilizing and the expected number of rounds needed to reach a correct state is bounded by some constant k.[6]
A self-stabilizing algorithm is silent if and only if it converges to a global state where the values of communication registers used by the algorithm remain fixed.[7]
Related work
An extension of the concept of self-stabilization is that of superstabilization.[8] The intent here is to cope with dynamic distributed systems that undergo topological changes. In classical self-stabilization theory, arbitrary changes are viewed as errors where no guarantees are given until the system has stabilized again. With superstabilizing systems, there is a passage predicate that is always satisfied, while the system's topology is reconfigured.
References
1. E. W. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM 17(11): 643–644, 1974.
2. Baruch Awerbuch, Boaz Patt-Shamir, George Varghese. Self-Stabilization by Local Checking and Correction (Extended Abstract). FOCS 1991: 268–277.
3. Shmuel Katz, Kenneth J. Perry. Self-Stabilizing Extensions for Message-Passing Systems. Distributed Computing 7(1): 17–26, 1993.
4. Yehuda Afek, Shay Kutten, Moti Yung. The Local Detection Paradigm and Its Application to Self-Stabilization. Theoretical Computer Science 186(1–2): 199–229, 1997.
5. Shay Kutten, Boaz Patt-Shamir. Stabilizing Time-Adaptive Protocols. Theoretical Computer Science 220(1): 93–111, 1999.
6. Shlomi Dolev. Self-Stabilization. MIT Press, 2000.
7. Shlomi Dolev, Mohamed G. Gouda, Marco Schneider. Memory Requirements for Silent Stabilization. PODC 1996: 27–34. Online extended abstract.
8. Shlomi Dolev, Ted Herman. Superstabilizing Protocols for Dynamic Distributed Systems. Chicago Journal of Theoretical Computer Science, 4, December 1997. Special Issue on Self-Stabilization.