Tree automaton

A tree automaton is a type of state machine. Tree automata deal with tree structures, rather than the strings of more conventional state machines.

The following article deals with branching tree automata, which correspond to regular languages of trees. For a different notion of tree automaton, see tree walking automaton.

As with classical automata, finite tree automata (FTA) can be either deterministic or nondeterministic. Depending on how the automaton processes the input tree, finite tree automata come in two types: (a) bottom-up, (b) top-down. The distinction matters: although non-deterministic (ND) top-down and ND bottom-up tree automata are equivalent in expressive power, deterministic top-down automata are strictly less powerful than their deterministic bottom-up counterparts, because tree properties specified by deterministic top-down tree automata can only depend on path properties. (Deterministic bottom-up tree automata are as powerful as ND tree automata.)

Definitions

A ranked alphabet is a pair of an ordinary alphabet F and a function Arity: F→ℕ. The arity of each letter in F fixes how many arguments it takes when it is used to build terms. Nullary elements (of zero arity) are also called constants. Terms built with unary symbols and constants can be considered as strings. Higher arities lead to proper trees.

A bottom-up finite tree automaton over F is defined as a tuple (Q, F, Q_f, Δ), where Q is a set of unary letters used as states, F is a ranked alphabet, Q_f ⊆ Q is a set of final states, and Δ is a set of transition rules of the form f(q₁(x₁),...,q_n(x_n)) → q(f(x₁,...,x_n)), for an n-ary f ∈ F, q, q_i ∈ Q, and x_i variables denoting subtrees. That is, members of Δ are rewrite rules from nodes whose children's roots are states, to nodes whose roots are states. Thus the state of a node is deduced from the states of its children.

For n=0, that is, for a constant symbol f, the above transition rule definition reads f() → q(f()); often the empty parentheses are omitted for convenience: f → q(f). Since these transition rules for constant symbols (leaves) do not require a state, no explicitly defined initial states are needed. A tree automaton is run on a ground term over F, starting at the leaves and moving upwards, associating a run state from Q with each subterm. The tree is accepted if its root is associated with an accepting state from Q_f.^[1]
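The bottom-up run described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tuple encoding of terms, the dictionary encoding of Δ, and the small example automaton (symbols one and plus, states even and odd, accepting trees with an even number of leaves) are hypothetical choices made for this sketch and are not taken from the cited literature.

```python
from itertools import product

# Assumed encoding: a term is a tuple (symbol, child_1, ..., child_n);
# a constant is a 1-tuple such as ("one",).
# Delta is a dict mapping (symbol, (q_1, ..., q_n)) to the set of states q
# allowed by rules f(q_1(x_1), ..., q_n(x_n)) -> q(f(x_1, ..., x_n)).

def reachable_states(term, rules):
    """Return every state that some run can associate with the root of term."""
    symbol, *children = term
    child_state_sets = [reachable_states(child, rules) for child in children]
    result = set()
    # For a constant, product() yields the single empty tuple (),
    # so leaves are handled without any initial state.
    for combo in product(*child_state_sets):
        result |= rules.get((symbol, combo), set())
    return result

def accepts(term, rules, final_states):
    """Accept if the root can be associated with a final state."""
    return bool(reachable_states(term, rules) & final_states)

# Hypothetical example automaton: parity of the number of leaves.
RULES = {
    ("one", ()): {"odd"},
    ("plus", ("even", "even")): {"even"},
    ("plus", ("even", "odd")): {"odd"},
    ("plus", ("odd", "even")): {"odd"},
    ("plus", ("odd", "odd")): {"even"},
}
FINAL = {"even"}

two_leaves = ("plus", ("one",), ("one",))
three_leaves = ("plus", ("one",), two_leaves)
print(accepts(two_leaves, RULES, FINAL))
print(accepts(three_leaves, RULES, FINAL))
```

Computing the set of all reachable states per subterm, rather than a single state, lets the same function handle nondeterministic rule sets.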

A top-down finite tree automaton over F is defined as a tuple (Q, F, Q_i, Δ), which differs from a bottom-up tree automaton in two ways. First, Q_i ⊆ Q, the set of its initial states, replaces Q_f; second, its transition rules are oriented conversely: q(f(x₁,...,x_n)) → f(q₁(x₁),...,q_n(x_n)), for an n-ary f ∈ F, q, q_i ∈ Q, and x_i variables denoting subtrees. That is, members of Δ are here rewrite rules from nodes whose roots are states to nodes whose children's roots are states. A top-down automaton starts at the root and moves downward along branches of the tree, inductively associating a state with each subterm along a run. A tree is accepted if every branch can be traversed in this way.^[3]

A bottom-up tree automaton is called deterministic (abbreviated DFTA) if no two rules from Δ have the same left-hand side; otherwise it is called nondeterministic (NFTA).^[4] Non-deterministic top-down tree automata have the same expressive power as non-deterministic bottom-up ones;^[5] the transition rules are simply reversed, and the final states become the initial states.
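The determinism condition for bottom-up automata is a simple syntactic check on Δ. A minimal sketch, assuming rules are stored as triples (symbol, child states, target state); the two rule sets are hypothetical examples:

```python
def is_deterministic(rules):
    """True if no two distinct rules share the same left-hand side."""
    distinct_rules = set(rules)
    left_hand_sides = {(symbol, child_states)
                       for symbol, child_states, _target in distinct_rules}
    # Each left-hand side must belong to exactly one rule.
    return len(left_hand_sides) == len(distinct_rules)

# Hypothetical rule sets over the symbols one (arity 0) and plus (arity 2).
DFTA_RULES = {
    ("one", (), "odd"),
    ("plus", ("even", "even"), "even"),
    ("plus", ("even", "odd"), "odd"),
    ("plus", ("odd", "even"), "odd"),
    ("plus", ("odd", "odd"), "even"),
}
# Adding a second rule for the constant one creates a clash on its left-hand side.
NFTA_RULES = DFTA_RULES | {("one", (), "even")}

print(is_deterministic(DFTA_RULES))
print(is_deterministic(NFTA_RULES))
```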

In contrast, deterministic top-down tree automata^{[note 1]} are less powerful than their bottom-up counterparts, because in a deterministic tree automaton no two transition rules have the same left-hand side. For tree automata, transition rules are rewrite rules; and for top-down ones, the left-hand side will be parent nodes. Consequently a deterministic top-down tree automaton will only be able to test for tree properties that are true in all branches, because the choice of the state to write into each child branch is determined at the parent node, without knowing the child branches' contents.
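A standard illustration of this limitation is the two-element language { f(a,b), f(b,a) }: any deterministic top-down automaton accepting both terms has a single rule for f at the initial state, so it must also accept f(a,a) and f(b,b). The brute-force sketch below checks this only for automata with at most two states; the encoding, the fixed initial state 0, and the restriction to two states are assumptions made to keep the search small, so it is an illustration rather than a proof.

```python
from itertools import product

STATES = (0, 1)          # assumption: at most two states, initial state 0
TERMS = {
    "f(a,a)": ("f", ("a",), ("a",)),
    "f(a,b)": ("f", ("a",), ("b",)),
    "f(b,a)": ("f", ("b",), ("a",)),
    "f(b,b)": ("f", ("b",), ("b",)),
}

def accepts(term, state, f_rule, leaf_ok):
    """Deterministic top-down run: at most one rule per (state, symbol)."""
    symbol, *children = term
    if not children:
        return (state, symbol) in leaf_ok
    child_states = f_rule[state]
    if child_states is None:          # no rule for f in this state
        return False
    return all(accepts(c, q, f_rule, leaf_ok)
               for c, q in zip(children, child_states))

def recognized_subsets():
    """Collect which subsets of TERMS some small automaton accepts exactly."""
    f_options = [None] + list(product(STATES, repeat=2))
    leaf_keys = [(q, s) for q in STATES for s in ("a", "b")]
    found = set()
    for f0, f1 in product(f_options, repeat=2):
        f_rule = {0: f0, 1: f1}
        for bits in product((False, True), repeat=len(leaf_keys)):
            leaf_ok = {k for k, bit in zip(leaf_keys, bits) if bit}
            found.add(frozenset(name for name, t in TERMS.items()
                                if accepts(t, 0, f_rule, leaf_ok)))
    return found

FOUND = recognized_subsets()
print(frozenset({"f(a,b)", "f(b,a)"}) in FOUND)
```

A bottom-up automaton has no such difficulty, since it can read both leaves before deciding on the state of the f node.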

Examples

Using the ranked alphabet F={ false,true,nil,cons(.,.) }, with cons having arity 2 and all other symbols having arity 0, a bottom-up tree automaton recognizing the set of all finite lists of boolean values can be defined as (Q, F, Q_f, Δ) with Q={ Bool,BList }, Q_f={ BList }, and Δ consisting of the rules

false	→	Bool(false)	(1),
true	→	Bool(true)	(2),
nil	→	BList(nil)	(3), and
cons(Bool(x₁),BList(x₂))	→	BList(cons(x₁,x₂))	(4).

An accepting example run is

	cons(	false,	cons(	true,	nil	))
⇒	cons(	false,	cons(	true,	BList(nil)	))	by (3)
⇒	cons(	false,	cons(	Bool(true),	BList(nil)	))	by (2)
⇒	cons(	false,	BList(cons(	true,	nil	)))	by (4)
⇒	cons(	Bool(false),	BList(cons(	true,	nil	)))	by (1)
⇒	BList(cons(	false,	cons(	true,	nil	)))	by (4), accepted.

Cf. the derivation of the same term from a regular tree grammar corresponding to the automaton, shown at Regular tree grammar#Examples.

A rejecting example run is

	cons(	false,	true	)
⇒	cons(	false,	Bool(true)	)	by (2)
⇒	cons(	Bool(false),	Bool(true)	)	by (1), no further rule applicable.
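Both example runs can be reproduced mechanically. The sketch below encodes rules (1) to (4) in Python; the tuple representation of terms and the dictionary representation of Δ are assumptions made for this illustration.

```python
from itertools import product

def reachable_states(term, rules):
    """States that a bottom-up run can associate with the root of term."""
    symbol, *children = term
    child_state_sets = [reachable_states(child, rules) for child in children]
    result = set()
    for combo in product(*child_state_sets):
        result |= rules.get((symbol, combo), set())
    return result

# Rules (1)-(4) of the boolean list automaton.
RULES = {
    ("false", ()): {"Bool"},                 # (1)
    ("true", ()): {"Bool"},                  # (2)
    ("nil", ()): {"BList"},                  # (3)
    ("cons", ("Bool", "BList")): {"BList"},  # (4)
}
FINAL = {"BList"}

def accepts(term):
    return bool(reachable_states(term, RULES) & FINAL)

FALSE, TRUE, NIL = ("false",), ("true",), ("nil",)
good = ("cons", FALSE, ("cons", TRUE, NIL))   # cons(false, cons(true, nil))
bad = ("cons", FALSE, TRUE)                   # cons(false, true)

print(accepts(good))
print(accepts(bad))
```

For the rejected term, no rule has the left-hand side cons(Bool, Bool), so the set of states reachable at the root is empty.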

Properties

Recognizability

For a bottom-up automaton, a ground term t (that is, a tree) is accepted if there exists a reduction that starts from t and ends with q(t), where q is a final state. For a top-down automaton, a ground term t is accepted if there exists a reduction that starts from q(t) and ends with t, where q is an initial state.

The tree language L(A) recognized by a tree automaton A is the set of all ground terms accepted by A. A set of ground terms is recognizable if there exists a tree automaton that recognizes it.

A linear (that is, arity-preserving) homomorphism preserves recognizability.^[6]

Completeness and Reduction

A non-deterministic finite tree automaton is complete if there is at least one transition rule available for every possible symbol-states combination. A state q is accessible if there exists a ground term t such that there exists a reduction from t to q(t). An NFTA is reduced if all its states are accessible. ^[7]
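These notions can be checked directly on the boolean list automaton from the Examples section. In the sketch below, the triple encoding of rules and the function names are assumptions made for illustration.

```python
from itertools import product

ARITY = {"false": 0, "true": 0, "nil": 0, "cons": 2}
STATES = {"Bool", "BList"}
# Rules as (symbol, child states, target state) triples.
RULES = {
    ("false", (), "Bool"),
    ("true", (), "Bool"),
    ("nil", (), "BList"),
    ("cons", ("Bool", "BList"), "BList"),
}

def accessible_states(rules):
    """Least set of states q with a reduction from some ground term t to q(t)."""
    accessible = set()
    changed = True
    while changed:
        changed = False
        for _symbol, child_states, target in rules:
            # A target becomes accessible once all child states are accessible;
            # rules for constants have no child states and fire immediately.
            if target not in accessible and all(q in accessible for q in child_states):
                accessible.add(target)
                changed = True
    return accessible

def is_reduced(states, rules):
    return accessible_states(rules) == set(states)

def is_complete(states, arity, rules):
    """True if every symbol-states combination has at least one rule."""
    left_hand_sides = {(symbol, child_states) for symbol, child_states, _t in rules}
    return all((symbol, combo) in left_hand_sides
               for symbol, n in arity.items()
               for combo in product(sorted(states), repeat=n))

print(is_reduced(STATES, RULES))
print(is_complete(STATES, ARITY, RULES))
```

The example automaton is reduced but not complete: for instance, no rule applies to cons(Bool, Bool).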

Pumping Lemma

Every sufficiently large^{[note 2]} ground term t in a recognizable tree language L can be split vertically into three parts^{[note 3]} such that arbitrary repetition ("pumping") of the middle part keeps the resulting term in L.^{[note 4]} ^[8]

For the language of all finite lists of boolean values from the above example, all terms beyond the height limit k=2 can be pumped, since they need to contain an occurrence of cons. For example,

cons(false,	cons(true,nil)	)	,
cons(false,cons(false,	cons(true,nil)	))	,
cons(false,cons(false,cons(false,	cons(true,nil)	)))	, ...

all belong to that language.
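The terms listed above arise from the decomposition C[.] = cons(false, .), C'[.] = cons(false, .) and u = cons(true, nil), with the middle context repeated 0, 1 and 2 times. A minimal Python sketch that builds C[C'ⁿ[u]] and checks membership with the boolean list automaton; the term encoding and helper names are assumptions made for illustration.

```python
from itertools import product

RULES = {
    ("false", ()): {"Bool"},
    ("true", ()): {"Bool"},
    ("nil", ()): {"BList"},
    ("cons", ("Bool", "BList")): {"BList"},
}
FINAL = {"BList"}

def reachable_states(term):
    symbol, *children = term
    result = set()
    for combo in product(*[reachable_states(c) for c in children]):
        result |= RULES.get((symbol, combo), set())
    return result

def accepts(term):
    return bool(reachable_states(term) & FINAL)

def outer_context(hole):
    """C[.] = cons(false, .)"""
    return ("cons", ("false",), hole)

def middle_context(hole):
    """C'[.] = cons(false, .), the nontrivial context that gets pumped."""
    return ("cons", ("false",), hole)

U = ("cons", ("true",), ("nil",))   # u = cons(true, nil)

def pumped(n):
    """Build C[C'^n[u]] by stacking n copies of the middle context."""
    term = U
    for _ in range(n):
        term = middle_context(term)
    return outer_context(term)

print(all(accepts(pumped(n)) for n in range(6)))
```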

Closure

The class of recognizable tree languages is closed under union, under complementation, and under intersection.
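Closure under intersection can be shown with a product construction, in which the product automaton runs both automata in parallel on pairs of states. The sketch below covers only intersection and is an illustration under assumptions: the dictionary encoding of Δ and the second automaton (which counts the parity of cons occurrences) are hypothetical.

```python
from itertools import product

def reachable_states(term, rules):
    symbol, *children = term
    result = set()
    for combo in product(*[reachable_states(c, rules) for c in children]):
        result |= rules.get((symbol, combo), set())
    return result

def accepts(term, rules, final_states):
    return bool(reachable_states(term, rules) & final_states)

def intersect(rules_a, final_a, rules_b, final_b):
    """Product automaton whose states are pairs (state of A, state of B)."""
    rules = {}
    for (sym_a, qs_a), targets_a in rules_a.items():
        for (sym_b, qs_b), targets_b in rules_b.items():
            if sym_a == sym_b and len(qs_a) == len(qs_b):
                key = (sym_a, tuple(zip(qs_a, qs_b)))
                rules.setdefault(key, set()).update(product(targets_a, targets_b))
    return rules, set(product(final_a, final_b))

# A: the boolean list automaton from the Examples section.
LIST_RULES = {
    ("false", ()): {"Bool"}, ("true", ()): {"Bool"}, ("nil", ()): {"BList"},
    ("cons", ("Bool", "BList")): {"BList"},
}
# B (hypothetical): accepts terms with an even number of cons symbols.
PARITY_RULES = {
    ("false", ()): {"E"}, ("true", ()): {"E"}, ("nil", ()): {"E"},
    ("cons", ("E", "E")): {"O"}, ("cons", ("E", "O")): {"E"},
    ("cons", ("O", "E")): {"E"}, ("cons", ("O", "O")): {"O"},
}
BOTH_RULES, BOTH_FINAL = intersect(LIST_RULES, {"BList"}, PARITY_RULES, {"E"})

FALSE, TRUE, NIL = ("false",), ("true",), ("nil",)
length_two = ("cons", FALSE, ("cons", TRUE, NIL))
length_one = ("cons", FALSE, NIL)
print(accepts(length_two, BOTH_RULES, BOTH_FINAL))
print(accepts(length_one, BOTH_RULES, BOTH_FINAL))
```

The product automaton here accepts exactly the boolean lists of even length, which is the intersection of the two languages.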

Myhill-Nerode theorem

A congruence on tree languages is an equivalence relation such that u₁ ≡ v₁ and ... and u_n ≡ v_n implies f(u₁,...,u_n) ≡ f(v₁,...,v_n). It is of finite index if its number of equivalence-classes is finite.

For a given tree-language L, a congruence can be defined by u ≡_L v if C[u] ∈ L ⇔ C[v] ∈ L for each context C.^{[note 5]}

The Myhill-Nerode theorem for tree automata states that the following three statements are equivalent:^[9]

L is a recognizable tree language
L is the union of some equivalence classes of a congruence of finite index
the relation ≡_L is a congruence of finite index

Notes

↑ In a strict sense, deterministic top-down automata are not defined by Comon et al.,^[2] but they are used there (Sect.1.6, Proposition 1.6.2, p.38). They accept the class of path-closed tree languages (Sect.1.8, Exercise 1.6, p.43-44).
↑ Formally: height(t) > k, with k > 0 depending only on L, not on t
↑ Formally: there is a context C[.], a nontrivial context C’[.], and a ground term u such that t = C[C’[u]]. A "context" C[.] is a tree with one hole (or, correspondingly, a term with one occurrence of one variable). A context is called "trivial" if the tree consists only of the hole node (or, correspondingly, if the term is just the variable). The notation C[t] means the result of inserting the tree t into the hole of C[.] (or, correspondingly, instantiating the variable to t). Comon et al. p.17 gives a formal definition.
↑ Formally: C[C’ⁿ[u]] ∈ L for all n ≥ 0. The notation Cⁿ[.] means the result of stacking n copies of C[.] one in another, cf. Comon et al. p.17.
↑ See note ^{[note 3]} for the notion of a context.

References

↑ Comon et al.^[2] Sect.1.1, p.20
↑ H. Comon, M. Dauchet, R. Gilleron, F. Jacquemard, D. Lugiez, S. Tison and M. Tommasi (Nov 2008). Tree Automata Techniques and Applications (PDF). Retrieved 11 February 2014.
↑ Comon et al.^[2] Sect.1.6, p.38
↑ Comon et al.^[2] Sect.1.1, p.23.
↑ Comon et al.^[2] Sect.1.6, Theorem 1.6.1, p.38
↑ Comon et al.^[2] Sect.1.4, p.31-32, Theorem 1.4.3. The book's notion of tree homomorphism is more general than that of the article "tree homomorphism".
↑ Comon et al.^[2] Sect.1.1, p.23-24
↑ Comon et al.^[2] Sect.1.2, p.29
↑ Comon et al.^[2] Sect.1.5, p.36

External links

All the information in this page was taken from Chapter 1 of http://tata.gforge.inria.fr

Implementations

(OCaml) Grappa - Ranked and Unranked Tree Automata Libraries (http://www.grappa.univ-lille3.fr/~filiot/tata/)
(OCaml) Timbuk - Tools for Reachability Analysis and Tree Automata Calculations (http://www.irisa.fr/celtique/genet/timbuk/)
(Java) LETHAL - Library for working with finite tree and hedge automata (http://lethal.sf.net/)
(Isabelle [OCaml, SML, Haskell]) - Machine-Checked Tree Automata Library (http://afp.sourceforge.net/entries/Tree-Automata.shtml)
(C++) VATA: A Library for Efficient Manipulation of Non-Deterministic Tree Automata - (http://www.fit.vutbr.cz/research/groups/verifit/tools/libvata/)
