Subsequence

From Wikipedia, the free encyclopedia

In mathematics, a subsequence of some sequence is a new sequence which is formed from the original sequence by deleting some of the elements without disturbing the relative positions of the remaining elements.

Formally, suppose that X is a set and that (ak)kK is a sequence in X, where K = {1,2,3,...,n} if (ak) is a finite sequence and K = N if (ak) is an infinite sequence. Then, a subsequence of (ak) is a sequence of the form (a_{n_r}) where (nr) is a strictly increasing sequence in the index set K.

Contents

[edit] Example

As an example,

< B,C,D,B >

is a subsequence of

< A,C,B,D,E,G,C,E,D,B,G > ,

with corresponding index sequence <3,7,9,10>.

Given two sequences X and Y, a sequence G is said to be a common subsequence of X and Y, if G is a subsequence of both X and Y. For example, if

X = < A,C,B,D,E,G,C,E,D,B,G > and
Y = < B,E,G,C,F,E,U,B,K >

then common subsequence of X and Y could be

G = < B,E,E >

This would not be the longest common subsequence, since G only has length 3, and the common subsequence < B,E,E,B > has length 4. The longest common subsequence of X and Y is < B,E,G,C,E,B >

[edit] Applications

Subsequences have applications to computer science, especially in the discipline of Bioinformatics, where computers are used to compare, analyze, and store DNA strands.

Take two strands of DNA, say :

ORG1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
ORG2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA

Subsequences are used to determine how similar the two strands of DNA are, using the DNA bases: adenine, guanine, cytosine and thymine.

[edit] Substring vs. subsequence

In computer science, string is often used as a synonym for sequence, but it is important to note that substring and subsequence are not synonyms. Substrings are consecutive parts of a string, while subsequences need not be. This means that a substring of a string is always a subsequence of the string, but the opposite is not true.[1]

[edit] See also

[edit] References

  1. ^ Gusfield, Dan [1997] (1999). Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. USA: Cambridge University Press, 4. ISBN 0-521-58519-8.

This article incorporates material from subsequence on PlanetMath, which is licensed under the GFDL.

In other languages