Shortest common supersequence

From Wikipedia, the free encyclopedia

The shortest common supersequence problem is closely related to the longest common subsequence (LCS) problem. Assume that two sequences X = < x₁,...,x_m > and Y = < y₁,...,y_n > are given.

A sequence U = < u₁,...,u_k > is a common supersequence of X and Y if U is a supersequence of both X and Y, that is, if both X and Y can be obtained from U by deleting zero or more elements.
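The definition can be checked mechanically. The following is a minimal sketch, assuming the sequences are Python strings (or any iterables of comparable symbols); the function names `is_subsequence` and `is_common_supersequence` are illustrative and not taken from the article.

```python
def is_subsequence(s, u):
    """Return True if s can be obtained from u by deleting zero or more elements."""
    it = iter(u)
    # `x in it` advances the iterator until it finds x, so the order of s is respected.
    return all(x in it for x in s)


def is_common_supersequence(u, x, y):
    """Return True if u is a supersequence of both x and y."""
    return is_subsequence(x, u) and is_subsequence(y, u)
```

For instance, the concatenation of X and Y always passes this check, which shows that a common supersequence of length m + n always exists; the problem is to find a shorter one.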

The shortest common supersequence (SCS) is a common supersequence of minimal length. In the shortest common supersequence problem, two sequences X and Y are given and the task is to find a shortest possible common supersequence of them. In general, an SCS is not unique.

For two input sequences, an SCS can easily be formed from an LCS. For example, if $X[1..m] = abcbdab$ and $Y[1..n] = bdcaba$, an LCS is $Z[1..r] = bcba$. Inserting the non-LCS symbols of X and Y into Z while preserving their order yields an SCS: $U[1..t] = abdcabdab$.
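The construction can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: it assumes the sequences are strings, uses the standard dynamic-programming table to obtain one LCS, and the names `lcs` and `scs_from_lcs` are hypothetical.

```python
def lcs(x, y):
    """Return one longest common subsequence of the strings x and y."""
    m, n = len(x), len(y)
    # length[i][j] = LCS length of the suffixes x[i:] and y[j:]
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if x[i] == y[j]:
                length[i][j] = length[i + 1][j + 1] + 1
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])
    # Walk the table from the front to recover one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if x[i] == y[j]:
            out.append(x[i])
            i += 1
            j += 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def scs_from_lcs(x, y, z):
    """Merge x and y around a common subsequence z, emitting each symbol of z once."""
    out, i, j = [], 0, 0
    for c in z:
        # Copy the symbols of x, then of y, that precede the next shared symbol.
        while x[i] != c:
            out.append(x[i])
            i += 1
        while y[j] != c:
            out.append(y[j])
            j += 1
        out.append(c)  # the shared symbol is written only once
        i += 1
        j += 1
    # Append whatever remains of x and y after the last shared symbol.
    out.extend(x[i:])
    out.extend(y[j:])
    return "".join(out)
```

With the inputs of the example, `scs_from_lcs("abcbdab", "bdcaba", "bcba")` returns `"abdcabdab"`. Because an LCS is generally not unique, `lcs` may return a different LCS than the one shown in the text; the resulting supersequence then differs as well but has the same length.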

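For two input sequences, $r + t = m + n$: an SCS built this way contains each of the $r$ LCS symbols once, together with the $m - r$ remaining symbols of X and the $n - r$ remaining symbols of Y, so $t = r + (m - r) + (n - r) = m + n - r$. However, for three or more input sequences this does not hold. Note also that the LCS and the SCS problems are not dual problems.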
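The length relation for two sequences can be checked numerically by computing both lengths independently. The sketch below assumes string inputs and uses the usual prefix-based dynamic-programming recurrences; the names `lcs_length` and `scs_length` are illustrative.

```python
def lcs_length(x, y):
    """Length r of a longest common subsequence of x and y."""
    prev = [0] * (len(y) + 1)  # row for the empty prefix of x
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            # Matching symbols extend the LCS; otherwise drop one symbol from x or y.
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def scs_length(x, y):
    """Length t of a shortest common supersequence of x and y."""
    # For the empty prefix of x, the SCS of "" and y[:j] is y[:j] itself.
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        cur = [i]  # SCS of x[:i] and "" is x[:i]
        for j, b in enumerate(y, start=1):
            # A matching symbol is written once; otherwise write one symbol of x or y.
            cur.append(prev[j - 1] + 1 if a == b else min(prev[j], cur[j - 1]) + 1)
        prev = cur
    return prev[-1]


x, y = "abcbdab", "bdcaba"
assert lcs_length(x, y) + scs_length(x, y) == len(x) + len(y)
```

Here `scs_length` does not call `lcs_length`, so the final assertion is a genuine cross-check of the identity on the example rather than a restatement of it.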

References

Michael R. Garey and David S. Johnson (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman. ISBN 0-7167-1045-5. A4.2: SR8, p. 228.