Shortest common supersequence

From Wikipedia, the free encyclopedia

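The shortest common supersequence problem is closely related to the longest common subsequence problem. Given two sequences $X = \langle x_1, \dots, x_m \rangle$ and $Y = \langle y_1, \dots, y_n \rangle$, a sequence $U = \langle u_1, \dots, u_k \rangle$ is a common supersequence of $X$ and $Y$ if $U$ is a supersequence of both $X$ and $Y$, that is, if both $X$ and $Y$ can be obtained from $U$ by deleting zero or more items.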
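The definition can be checked mechanically. The following is a minimal Python sketch, not a reference implementation; the function names `is_supersequence` and `is_common_supersequence` are chosen here for illustration only.

```python
def is_supersequence(u, x):
    """Return True if x can be obtained from u by deleting zero or more items."""
    it = iter(u)
    # `sym in it` consumes the iterator up to the first match, so each symbol
    # of x must be found strictly after the previous one (order is preserved).
    return all(sym in it for sym in x)


def is_common_supersequence(u, x, y):
    """Return True if u is a supersequence of both x and y."""
    return is_supersequence(u, x) and is_supersequence(u, y)
```

For instance, `is_supersequence("abc", "ac")` is true, while `is_supersequence("abc", "ca")` is false because the order of the symbols differs.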

The shortest common supersequence (scs) is a common supersequence of minimal length. In the shortest common supersequence problem, the two sequences X and Y are given and the task is to find a shortest possible common supersequence of these sequences. In general, the scs is not unique.

For two input sequences, an scs can be formed easily from a longest common subsequence (lcs). For example, if $X[1..m] = abcbdab$ and $Y[1..n] = bdcaba$, an lcs is $Z[1..r] = bcba$. By inserting the non-lcs symbols of both sequences while preserving the symbol order, we get the scs $U[1..t] = abdcabdab$.
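The construction can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than a canonical algorithm: the helper names `lcs` and `scs_from_lcs` are invented here, the lcs is computed with the standard dynamic-programming table, and the merge step copies the pending non-lcs symbols of $X$ before those of $Y$ (any consistent order would also work).

```python
def lcs(x, y):
    """Return one longest common subsequence of the strings x and y."""
    m, n = len(x), len(y)
    # length[i][j] = lcs length of the suffixes x[i:] and y[j:]
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if x[i] == y[j]:
                length[i][j] = 1 + length[i + 1][j + 1]
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])
    # Walk the table from the start to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if x[i] == y[j]:
            out.append(x[i])
            i += 1
            j += 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def scs_from_lcs(x, y, z):
    """Merge x and y around a common subsequence z, keeping symbol order."""
    out, i, j = [], 0, 0
    for sym in z:
        # Copy the symbols of x, then of y, that precede the next shared symbol.
        while x[i] != sym:
            out.append(x[i])
            i += 1
        while y[j] != sym:
            out.append(y[j])
            j += 1
        out.append(sym)  # the shared symbol is emitted only once
        i += 1
        j += 1
    # Append whatever remains of both inputs after the last shared symbol.
    out.extend(x[i:])
    out.extend(y[j:])
    return "".join(out)


print(scs_from_lcs("abcbdab", "bdcaba", "bcba"))  # prints "abdcabdab"
```

Passing the result of `lcs(x, y)` as `z` yields a shortest common supersequence; since the lcs is not unique, the particular scs produced depends on which lcs is used.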

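For two input sequences, $r + t = m + n$: an scs built from an lcs as above contains each lcs symbol once, plus the $m - r$ remaining symbols of $X$ and the $n - r$ remaining symbols of $Y$, so $t = r + (m - r) + (n - r) = m + n - r$. For three or more input sequences, however, this relation does not hold. Note also that the lcs and the scs problems are not dual problems.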
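The length relation for two sequences can be spot-checked numerically. The sketch below is an illustration, not part of the original text: it computes $r$ and $t$ independently with two textbook dynamic programs, and the names `lcs_length` and `scs_length` are hypothetical.

```python
def lcs_length(x, y):
    """Length r of a longest common subsequence of x and y."""
    prev = [0] * (len(y) + 1)  # row for the empty prefix of x
    for a in x:
        cur = [0]
        for j, b in enumerate(y, 1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def scs_length(x, y):
    """Length t of a shortest common supersequence of x and y."""
    prev = list(range(len(y) + 1))  # empty prefix of x: all of y[:j] is needed
    for i, a in enumerate(x, 1):
        cur = [i]  # empty prefix of y: all of x[:i] is needed
        for j, b in enumerate(y, 1):
            if a == b:
                cur.append(prev[j - 1] + 1)  # shared symbol counted once
            else:
                cur.append(1 + min(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


x, y = "abcbdab", "bdcaba"
r, t = lcs_length(x, y), scs_length(x, y)
print(r, t, r + t == len(x) + len(y))  # prints "4 9 True"
```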