Talk:Longest common subsequence problem/Archive

From Wikipedia, the free encyclopedia

Archive This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Contents

Repetitive language

The repeated use of the language "Computer scientists" with some verb (devised, agree) could be improved to just usign "Computer science".

Besides, the repeated contextual links to" Computer scientists" would make the user think they would click on Computer scientists to go to an entry of the computer scientists who invented, devised or agreed to the algorithm's solution, not to the literal entry "Computer scientists".

Besides this, a great entry. Hope I've helped.


Relevant External Links

The link to dedupe seems to be more of a plug for it then a honest attempt to provide a link to an external application of LCS... if no one complains I am going to remove it


Sequence or string?

I've read somewhere that the longest common substring is not necesarily the same as the longest common subsequence. Specifically a sequence apparently doesn't have to be contiguous. Is there any truth in this? Shinobu 23:30, 3 Apr 2005 (UTC)

Yes, you are correct: substrings are contiguous, subsequences need not be.

A new edit states that this article should not be confused with LCSubstring. However, when I read the article I kind of get the feeling this is the LCSubstring article... How to proceed? Does anyone know enough details on both to make the call? Shinobu 6 July 2005 21:55 (UTC)

Uh oh.. Didn't realise this before but there's already a Longest common substring problem article. It explains how to do it using trees which is probably a lot more efficient (and harder to grasp) than the method here (I saw you asking about trees on another talk page Shinobu so you may be interested). Now I've implemented the LCS algorithm as described on this article (the LCSubsequence article), and I'm using it in fully-working software - it is most definitely giving me substrings. Maybe I didn't implement understand correctly as stated here and may have accidentally modified it into a substring problem, but from what I can tell, both articles give differing algorithms for the same problem. I recommend that this article be merged with the other one (giving the easy solution first and then the tricky one) and a new stub article is made - and also that it's moved to the page without a hyphen! Geez, what a mess! —EatMyShortz 02:33, 15 February 2006 (UTC)
See substring for an explanation of the difference between a substring and a subsequence. -- Nils Grimsmo 19:04, 7 August 2006 (UTC)


These wiki-links should be added somewhere

Unfortunately I haven't got the time to do so in a neat fashion right now. Anyway, here goes: suffix tree suffix array Bye! Shinobu 23:43, 4 Apr 2005 (UTC)

Are you talking about an approximate solution? -- Nils Grimsmo 19:04, 7 August 2006 (UTC)


Removed C source

I removed the C source, partly because it should be on Wikisource and not here, partly because if we should have a source on this we should use a better method, partly because we already have pseudocode source and partly because it was not formatted correctly. Of course I encourage anyone to put an optimized suffix array version on Wikisource and add a link here. Shinobu 12:39, 17 Apr 2005 (UTC)

Wikisource isn't for source code - it's for old public domain texts (nothing to do with programming)! C code is often published on Wikipedia although pseudocode is generally preferable so I agree the C source should go. —EatMyShortz 02:04, 5 February 2006 (UTC)

Whether Wikisource is(n't) for source code is a matter of ongoing debate. Back when I first edited Wikipedia, back when I didn't even have a username, source code belonged on Wikisource. However, some people have since started objecting. Now I don't know anymore. Wikipedia references source code on Wikisource, Wikisource contains lots of source code, but some want to move it all to Wikibooks, Wikibooks says "No, that stuff should be on Wikisource." I think simple boilerplate code (bubblesort and the like) qualify as primary source texts, but I don't want to burn my fingers on this debate, so I refuse to participate in it. They can jolly well fight the fight themselves and afterwards clean up the bloody mess themselves. Shinobu 03:12, 15 February 2006 (UTC)

Hmm.. I suppose algorithms count as "source text", just so long as the word "source" isn't confused - the "source" in "WikiSource" means that it's being taken from a primary or secondary source (see Primary source for definition), whereas the word "source" in "source code" means the original form of a program before being compiled, so it's a coincidence that the word is being used in two ways - Anyway here's their policy: What Wikisource includes#Source codeEatMyShortz 13:45, 19 February 2006 (UTC)


Hyphen

This article should be called "longest common subsequence problem", not "longest-common subsequence problem". There is absolutely no reason to have a hyphen there.

I guess you're right. But before you decide to move this article remember that there is still some research going on that perhaps may force a rename to "Longest common substring problem" anyway. Stay tuned, Shinobu 16:46, 26 Apr 2005 (UTC)


The algorithm

Recently someone removed the algorithm, and then put it pack again, in such away as to remove the exernal links. I reverted the edit, but...

I think the algorithm reminds me very much of the LCSstring. I'll check if it is, and if so, I'll move it to the appropriate article. Shinobu 01:10, 27 August 2005 (UTC)

IMHO it would be best just to remove the algorithm and instead add a link to http://www.ics.uci.edu/~eppstein/161/960229.html , which is the same explanation but written so that it actually makes sense. 89.102.137.191 08:29, 6 July 2006 (UTC)

Source?

Is this algorithm cited? It seems to directly come from Cormen et al

Cleanup

This is a very poorly written page. The C code is exactly the same as the recursion above it, only less comprehensible. There is also no mention of the algorithms that are better than this one for large sequences. --Zero 11:21, 9 November 2005 (UTC)

Example: Incorrect B table?

From my understanding of the algorithm, the B table should have a lot less 0s. For instance anything to the right of a "left" cell should not say 0 - because at the very least it should look "left" to its left neighbour. Although the pictured B table doesn't relate to any specific example (perhaps it should), it seems invalid. Here's what I reckon a valid b-table might look like (based on the same b-table given, and I also made its corresponding C table):

C table:

\begin{bmatrix} ! & 0 & 0 & 0 & 0 & 0\\  
 0 & 1 & 1 & 1 & 1 & 1 \\  
 0 & 1 & 1 & 2 & 2 & 2\\  
 0 & 1 & 1 & 2 & 3 & 3 \\  
 0 & 1 & 1 & 2 & 3 & 3 \\
\end{bmatrix}

B table:

\begin{bmatrix} ! & 0 & 0 & 0 & 0 & 0\\  
 0 & UP-LEFT & LEFT & LEFT & LEFT & LEFT \\  
 0 & UP & UP & UP-LEFT & LEFT & LEFT\\  
 0 & UP & UP & UP & UP-LEFT & LEFT \\  
 0 & UP & UP & UP & UP & LEFT \\
\end{bmatrix}

It's quite a bit more confusing to look at but.. the other one seems wrong. I'll play more with the algorithm and see if I can confirm or deny this. Also what purpose does the '!' do? As far as I can tell you just stop when you hit a 0 in th b table. You don't necessarily finish at (0,0) anyway, that's only if the LCS starts at the beginning of the string! —EatMyShortz 02:15, 5 February 2006 (UTC)

Also why does the algorithm return c? As far as I can tell, it is no longer useful, and the b table alone is used to determine the LCS. —EatMyShortz 02:22, 5 February 2006 (UTC)

Actually the b table calculation is correct. The reason c is returned, in my opinion the wrong idea for using the algorithm, is that the idea is to only determine the length of the LCS not the LCS itself. Personally I do not see any reason for wanting to know anything but the LCS it self (see my stuff on finding the diff)

Direving the edit list

The following algorithm will give the list of edits needed to make Y equal to X (the diff algorithm):

Notation: [n,m] refers to a LCS component at X[n] and Y[n]

Assumptions:

   The LCS as found by findLCS() is placed on a stack moving from the lower-right corner and 

following the direction arrows to form the LCS.

oldElem <---- [-1,-1]

while stack is not empty

   elem <---- pop from stack
   // diff does not require LCS components but add to list here if desired
   for i=oldElem[n] to elem[n]
        add delete X(i) to list of edits
   for i=oldElem[m] to elem[m]
        add insert Y(i) to list of edits

Note: Diff(1) will treat each "gap" between LCS components as a "hunk"

Very poor page

"An old method of searching for LCS was to employ a brute force policy: ..." - says who? Is there a known example where this "old algorithm" was used? This page needs rewriting from scratch. --Zerotalk 06:52, 7 May 2006 (UTC)

I've changed it to "naive" - it isn't used to actually solve the problem, but to demonstrate that it's a) mathematically well-defined b) interesting, in that the obvious algorithm isn't practical