Substring index
From Wikipedia, the free encyclopedia
A substring index is a data structure that supports substring search in a text or text collection in sublinear time. Given a document S of length n, or a set of documents of total length n, it can locate all occurrences of a pattern P in o(n) time, not counting the time needed to report each occurrence. (Here o(n) means growing strictly more slowly than n: a function f(n) with f(n)/n tending to 0 as n grows. See Big O notation.)
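As a minimal sketch of how such an index answers a query, the Python below builds a suffix array (the start positions of all suffixes of S, sorted lexicographically by suffix) and finds every occurrence of P by binary search over it. Because all suffixes that begin with P sit in one contiguous block of the sorted array, two binary searches find that block. Once the array exists, a query costs O(m log n) character comparisons for a pattern of length m, plus the cost of reporting the matches, which is o(n) for a fixed pattern. The function names are hypothetical, and the construction is a naive sort chosen for brevity; it is far slower than the construction algorithms used in practice.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix text itself.
    return sorted(range(len(text)), key=lambda i: text[i:])


def locate(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of every occurrence of pattern in text."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # sa[start:lo] holds exactly the suffixes that begin with pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(locate(text, sa, "ana"))  # [1, 3]
```

Truncating sorted suffixes to their first m characters keeps them in non-decreasing order, which is why the matching suffixes form a single contiguous range that binary search can find.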
The phrase full-text index is also often used for an index of all substrings of a text. However, the phrase is ambiguous, as it is also used for regular word indexes such as inverted files and signature files. See full text search.
Substring indexes include:
- Suffix tree
- Suffix array
- N-gram index, an inverted file for all N-grams of the text
- Compressed suffix array
- FM-index
- LZ-index
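The N-gram index entry can be illustrated with a small sketch: an inverted file that maps every length-n substring of the text to the positions where it starts. A pattern of length m ≥ n occurs at position p exactly when each of its overlapping n-grams (at offsets 0 to m − n) occurs at the matching offset from p, since together those n-grams cover the whole pattern. The function names `build_ngram_index` and `search` are hypothetical, and the sketch assumes the pattern is at least n characters long; shorter patterns would need extra handling that is omitted here.

```python
from collections import defaultdict


def build_ngram_index(text: str, n: int) -> dict[str, list[int]]:
    # Inverted file: each n-gram maps to the increasing list of positions where it starts.
    index = defaultdict(list)
    for i in range(len(text) - n + 1):
        index[text[i:i + n]].append(i)
    return dict(index)


def search(index: dict[str, list[int]], n: int, pattern: str) -> list[int]:
    """Return the start positions of pattern, using only the n-gram index."""
    m = len(pattern)
    if m < n:
        # Assumption of this sketch: patterns shorter than n are not supported.
        raise ValueError("this sketch only handles patterns of length >= n")
    # Posting sets for the pattern's n-grams at offsets 0 .. m - n.
    posting_sets = [set(index.get(pattern[i:i + n], ())) for i in range(m - n + 1)]
    # Candidates: positions where the pattern's first n-gram starts.
    candidates = index.get(pattern[:n], [])
    # Keep a candidate p only if every later n-gram occurs at offset p + i.
    return [p for p in candidates
            if all(p + i in posting_sets[i] for i in range(1, m - n + 1))]
```

For example, with n = 2 on the text "banana", the query "nan" checks candidates 2 and 4 (where "na" starts) and keeps only 2, because "an" starts at 3 but not at 5.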