Deterministic finite automaton
In theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite state machine—is a finite state machine that accepts/rejects finite strings of symbols and only produces a unique computation (or run) of the automaton for each input string.[1] 'Deterministic' refers to the uniqueness of the computation. In search of simplest models to capture the finite state machines, McCulloch and Pitts were among the first researchers to introduce a concept similar to finite automaton in 1943.[2][3]
The figure illustrates a deterministic finite automaton using a state diagram. In the automaton, there are three states: S0, S1, and S2 (denoted graphically by circles). The automaton takes a finite sequence of 0s and 1s as input. For each state, there is a transition arrow leading out to a next state for both 0 and 1. Upon reading a symbol, a DFA jumps deterministically from a state to another by following the transition arrow. For example, if the automaton is currently in state S0 and current input symbol is 1 then it deterministically jumps to state S1. A DFA has a start state (denoted graphically by an arrow coming in from nowhere) where computations begin, and a set of accept states (denoted graphically by a double circle) which help define when a computation is successful.
A DFA is defined as an abstract mathematical concept, but is often implemented in hardware and software for solving various specific problems. For example, a DFA can model software that decides whether or not online user-input such as email addresses are valid.[4] (see: finite state machine for more practical examples).
DFAs recognize exactly the set of regular languages[1] which are, among other things, useful for doing lexical analysis and pattern matching. DFAs can be built from nondeterministic finite automata (NFAs) using the powerset construction method.
Formal definition
A deterministic finite automaton M is a 5-tuple, (Q, Σ, δ, q0, F), consisting of
- a finite set of states (Q)
- a finite set of input symbols called the alphabet (Σ)
- a transition function (δ : Q × Σ → Q)
- an initial or start state (q0 ∈ Q)
- a set of accept states (F ⊆ Q)
Let w = a1a2 ... an be a string over the alphabet Σ. The automaton M accepts the string w if a sequence of states, r0,r1, ..., rn, exists in Q with the following conditions:
- r0 = q0
- ri+1 = δ(ri, ai+1), for i = 0, ..., n−1
- rn ∈ F.
In words, the first condition says that the machine starts in the start state q0. The second condition says that given each character of string w, the machine will transition from state to state according to the transition function δ. The last condition says that the machine accepts w if the last input of w causes the machine to halt in one of the accepting states. Otherwise, it is said that the automaton rejects the string. The set of strings that M accepts is the language recognized by M and this language is denoted by L(M).
A deterministic finite automaton without accept states and without a starting state is known as a transition system or semiautomaton.
For more comprehensive introduction of the formal definition see automata theory.
Complete and incomplete deterministic finite automata
According to the above definition, deterministic finite automata are always complete: they define a transition for each state and each input symbol.
While this is the most common definition, some authors use the term deterministic finite automaton for a slightly different notion: an automaton that defines at most one transition for each state and each input symbol; the transition function is allowed to be partial. When no transition is defined, such an automaton halts.
Example
The following example is of a DFA M, with a binary alphabet, which requires that the input contains an even number of 0s.
M = (Q, Σ, δ, q0, F) where
- Q = {S1, S2},
- Σ = {0, 1},
- q0 = S1,
- F = {S1}, and
- δ is defined by the following state transition table:
0 1 S1 S2 S1 S2 S1 S2
The state S1 represents that there has been an even number of 0s in the input so far, while S2 signifies an odd number. A 1 in the input does not change the state of the automaton. When the input ends, the state will show whether the input contained an even number of 0s or not. If the input did contain an even number of 0s, M will finish in state S1, an accepting state, so the input string will be accepted.
The language recognized by M is the regular language given by the regular expression (1 + 0 (1*) 0)*, where "*" is the Kleene star, e.g., 1* denotes any non-negative number (possibly zero) of symbols "1".
Closure properties
If DFAs recognize the languages that are obtained by applying an operation on the DFA recognizable languages then DFAs are said to be closed under the operation. The DFAs are closed under the following operations.
- Union
- Intersection
- Concatenation
- Negation
- Kleene closure
- Reversal
- Init
- Quotient
- Substitution
- Homomorphism
Since DFAs are equivalent to nondeterministic finite automata (NFA), these closures may be proved using closure properties of NFA.
DFA as a transition monoid
Alternatively a run can be seen as a sequence of compositions of transition function with itself. Given an input symbol , one may write the transition function as , using the simple trick of currying, that is, writing for all . This way, the transition function can be seen in simpler terms: it's just something that "acts" on a state in Q, yielding another state. One may then consider the result of function composition repeatedly applied to the various functions , , and so on. Using this notion we define . Given a pair of letters , one may define a new function , by insisting that , where denotes function composition. Clearly, this process can be recursively continued. So, we have the following recursive definition
- where is empty string and
- where and .
is defined for all words . Repeated function composition forms a monoid. For the transition functions, this monoid is known as the transition monoid, or sometimes the transformation semigroup. The construction can also be reversed: given a , one can reconstruct a , and so the two descriptions are equivalent.
Local automata
A local automaton is a DFA for which all edges with the same label lead to a single vertex. Local automata accept the class of local languages, those for which membership of a word in the language is determined by a "sliding window" of length two on the word.[5][6]
A Myhill graph over an alphabet A is a directed graph with vertex set A and subsets of vertices labelled "start" and "finish". The language accepted by a Myhill graph is the set of directed paths from a start vertex to a finish vertex: the graph thus acts as an automaton.[5] The class of languages accepted by Myhill graphs is the class of local languages.[7]
Random DFA
When the start state and accept states are ignored, a DFA of -states and an alphabet of size can be seen as a digraph of vertices in which all vertices have out-arcs labeled (a -out digraph). It is known that when is a fixed integer, with high probability, the largest strongly connected component (SCC) in such a -out digraph chosen uniformly at random is of linear size and it can be reached by all vertices.[8] It has also been proven that if is allowed to increase as increases, then the whole digraph has a phase transition for strong connectivity similar to Erdős–Rényi model for connectivity.[9]
In a random DFA, the maximum number of vertices reachable from one vertex is very close to the number of vertices in the largest SCC with high probably.[8][10] This is also true for the largest induced sub-digraph of minimum in-degree one, which can be seen as a directed version of -core.[9]
Advantages and disadvantages
DFAs are one of the most practical models of computation, since there is a trivial linear time, constant-space, online algorithm to simulate a DFA on a stream of input. Also, there are efficient algorithms to find a DFA recognizing:
- the complement of the language recognized by a given DFA.
- the union/intersection of the languages recognized by two given DFAs.
Because DFAs can be reduced to a canonical form (minimal DFAs), there are also efficient algorithms to determine:
- whether a DFA accepts any strings
- whether a DFA accepts all strings
- whether two DFAs recognize the same language
- the DFA with a minimum number of states for a particular regular language
DFAs are equivalent in computing power to nondeterministic finite automata (NFAs). This is because, firstly any DFA is also an NFA, so an NFA can do what a DFA can do. Also, given an NFA, using the powerset construction one can build a DFA that recognizes the same language as the NFA, although the DFA could have exponentially larger number of states than the NFA.[11][12]
On the other hand, finite state automata are of strictly limited power in the languages they can recognize; many simple languages, including any problem that requires more than constant space to solve, cannot be recognized by a DFA. The classic example of a simply described language that no DFA can recognize is bracket or Dyck language, i.e., the language that consists of properly paired brackets such as word "(()())". Intuitively, no DFA can recognize the Dyck language because DFAs are not capable of counting: a DFA-like automaton needs to have a state to represent any possible number of "currently open" parentheses, meaning it would need an unbounded number of states. Another simpler example is the language consisting of strings of the form anbn for some finite but arbitrary number of a's, followed by an equal number of b's.[13]
See also
- Acyclic deterministic finite automata
- DFA minimization
- Monadic second-order logic
- Quantum finite automata
- Read-only right moving Turing Machines
- Separating words problem
- Turing machine
- Two-way deterministic finite automaton
Notes
- 1 2 Hopcroft 2001:
- ↑ McCulloch and Pitts (1943):
- ↑ Rabin and Scott (1959):
- ↑ Gouda, Prabhakar, Application of Finite automata
- 1 2 Lawson (2004) p.129
- ↑ Sakarovitch (2009) p.228
- ↑ Lawson (2004) p.128
- 1 2 Grusho, A. A. (1973). "Limit distributions of certain characteristics of random automaton graphs". Mathematical Notes of the Academy of Sciences of the USSR 4: 633–637. doi:10.1007/BF01095785.
- 1 2 Cai, X.S.; Devroye, L. "The graph structure of a deterministic automaton chosen at random: full version". arXiv:1504.06238.
- ↑ Carayol, Arnaud; Nicaud,, Cyril (2012). "Distribution of the number of accessible states in a random deterministic automaton".
- ↑ Sakarovitch (2009) p.105
- ↑ Lawson (2004) p.63
- ↑ Lawson (2004) p.46
References
- Hopcroft, John E.; Motwani, Rajeev; Ullman, Jeffrey D. (2001). Introduction to Automata Theory, Languages, and Computation (2 ed.). Addison Wesley. ISBN 0-201-44124-1. Retrieved 19 November 2012.
- Lawson, Mark V. (2004). Finite automata. Chapman and Hall/CRC. ISBN 1-58488-255-7. Zbl 1086.68074.
- McCulloch, W. S.; Pitts, E. (1943). "A logical calculus of the ideas imminent in nervous activity". Bulletin of Mathematical Biophysics: 541–544.
- Rabin, M. O.; Scott, D. (1959). "Finite automata and their decision problems.". IBM J. Res. Develop.: 114–125.
- Sakarovitch, Jacques (2009). Elements of automata theory. Translated from the French by Reuben Thomas. Cambridge: Cambridge University Press. ISBN 978-0-521-84425-3. Zbl 1188.68177.
- Sipser, Michael (1997). Introduction to the Theory of Computation. Boston: PWS. ISBN 0-534-94728-X.. Section 1.1: Finite Automata, pp. 31–47. Subsection "Decidable Problems Concerning Regular Languages" of section 4.1: Decidable Languages, pp. 152–155.4.4 DFA can accept only regular language
External links
|