Suffix trie

From Wikipedia, the free encyclopedia

A suffix tree of a string is a tree representing all the suffixes of that string. When a suffix tree for string A has been constructed, it can be used to determine whether another string B is a substring of string A.

Contents

[edit] Formal definition

Formally, a suffix tree has

  • A set of states Q
  • An initial state root
  • A set of finishing states
  • A transition function g: Q x \Sigma\, \rightarrow Q
  • A suffix function f: Q \rightarrow Q

The set of states of the suffix tree of a string A can be put in one-one correspondence with the substrings of A. The set of finishing states would correspond to the suffixes of the string A. Given a character, the transition function takes a state to another state. A suffix function represents suffix links going from one state to another state. f(x) = y means x = ay, where a \in \Sigma\,. For example, in the picture above, f(“cacao”) = “acao”

There is also an auxiliary state \perp that acts as the state before root.

[edit] Construction

Initial suffix trie
Initial suffix trie
Suffix trie with "c"
Suffix trie with "c"
Suffix trie with "ca"
Suffix trie with "ca"
Suffix trie with "cac"
Suffix trie with "cac"
Suffix trie with "caca"
Suffix trie with "caca"
Suffix trie with "cacao"
Suffix trie with "cacao"

To construct the suffix trie, we add the characters of the string one by one. Suppose the character we are adding is c, we do the following:

r = top
while there is no c-transition from r do
    create new state r’ and new transition g(r, c) = r’
    if r != top then create f(oldr’) = r’
    oldr’ = r’
    r = f(r)
f(oldr’) = g(r, c)
top = g(top, c)

After each iteration, top points to the state that represents the full string.

We illustrate the construction of a suffix trie by constructing a trie for “cacao”.

Step 1 First, start with the trie on the right. Initially, top is root. To add the character ‘c’, we follow the path formed by suffix links starting from root until we get a c-transition.

Step 2 After the character ‘c’ is added, the suffix trie looks like the second picture on the right.

Step 3 To add the character ‘a’, we start with state “c”. Since state “c” has no a-transition, we create a new state that goes from “c”, given the character ‘a’. We follow the suffix link of “c” and reach root. We see that root has no a-transition, so we create a new state that goes from root, given ‘a’. The suffix trie after this step looks like the third picture on the right.

Step 4 We do the same thing when we add ‘c’. Likewise, we start from “ca”, and we create new nodes when there are no c-transitions. We stop at root since g(root, c) is defined.

Step 5 This step is similar to the previous step. We create new states for “ca” and “a”, and when we can stop when get to “ca”, since there is a c-transition already defined there.

Step 6 The same thing is done when ‘o’ is added. From “caca” to “aca”, to “ca”, to “a” and to root, there are no o-transition. Therefore, new states and suffix links are created accordingly.

[edit] Conclusion

The algorithm to construct suffix tries runs in quadratic time and space. This algorithm can be modified into another algorithm for constructing suffix trees, which runs in linear time and space. Interested readers should consult Ukkonen’s paper or wikipedia’s entry on suffix trees.

[edit] Reference

E. Ukkonen. (1995). On-line construction of suffix trees. Algorithmica 14(3):249-260. PDF