Talk:Merge algorithm
Nomenclature issue
I would like to know whether it is standard nomenclature to call the following "merge algorithms".
- given a set of current account balances and a set of transactions, both sorted by account number, produce the set of new account balances after the transactions are applied; this requires always advancing the "new transactions" pointer in preference to the "account number" pointer when the two have equal keys, and adding all the numbers on either tape with the same account number to produce the new balance.
- produce a sorted list of records with keys present in all the lists (equijoin); this requires outputting a record whenever the keys of all the p0..n are equal.
- similarly for finding the largest number on one tape smaller than each number on another tape (e.g. to figure out what tax bracket each person is in).
- similarly for computing set differences: all the records in one list with no corresponding records in another.
I checked several books on algorithms and in all of them "merge algorithm" is taking two or more ordered lists and producing one ordered list in linear time. Maybe the above algorithms could be in another article.
Pablo.cl 02:04, 24 May 2004 (UTC)
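For reference, the account-balance example in the list above can be sketched as follows. This is only an illustration, not code from the article: the function name `apply_transactions`, the `(account, amount)` tuple layout, and the choice to start unknown accounts at zero are all assumptions made here.

```python
def apply_transactions(balances, transactions):
    """Apply sorted transactions to sorted balances in one linear pass.

    balances:     list of (account, balance), sorted, one entry per account.
    transactions: list of (account, amount), sorted, duplicates allowed.
    Returns a new sorted list of (account, balance).
    """
    result = []
    i = j = 0
    while i < len(balances) or j < len(transactions):
        take_balance = j >= len(transactions) or (
            i < len(balances) and balances[i][0] <= transactions[j][0]
        )
        if take_balance:
            account, total = balances[i]
            i += 1
        else:
            # Assumption: a transaction for an unknown account starts from 0.
            account, total = transactions[j][0], 0
        # Keep advancing the transaction pointer while the keys are equal.
        while j < len(transactions) and transactions[j][0] == account:
            total += transactions[j][1]
            j += 1
        result.append((account, total))
    return result


balances = [(1, 100), (3, 50)]
transactions = [(1, -20), (1, 5), (2, 10), (3, 1)]
print(apply_transactions(balances, transactions))
```

The inner loop is the "advance the transactions pointer in preference to the account pointer" rule: the balance record is consumed once, then every transaction with the same key is folded into it.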
The only reference I've found so far that calls set difference a merge algorithm is Kragen Sitaker. I think the nomenclature is misleading, and that the examples in italics above, and also the middle one I copy below, should be in a separate article, tentatively called Sorted list algorithms.
I copy from kragen-hacks:
Some examples of merge algorithms:
- produce a single sorted list from multiple sorted lists. (This is the kind of merge used in mergesort.)
- produce the set union, set intersection, or set difference of two sorted lists. [Set intersection and set difference would be moved to Sorted list algorithms.]
- given a master file and an update file sorted in the same order, produce a new master file with all the updates applied.
Pablo.cl 01:50, 26 May 2004 (UTC)
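To make the set operations in the quoted list concrete, here is a minimal sketch of intersection and difference over two sorted lists. It assumes each list has no duplicate elements; the function names are hypothetical and are not taken from kragen-hacks or the article.

```python
def sorted_intersection(a, b):
    """Elements present in both sorted, duplicate-free lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    return out


def sorted_difference(a, b):
    """Elements of sorted list a that do not appear in sorted list b."""
    out = []
    i = j = 0
    while i < len(a):
        if j >= len(b) or a[i] < b[j]:
            out.append(a[i])  # nothing in b can match a[i] any more
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            i += 1  # a[i] is in b, so it is dropped
    return out


a = [1, 2, 4, 6]
b = [2, 3, 6, 7]
print(sorted_intersection(a, b))
print(sorted_difference(a, b))
```

Both functions walk the two lists with the same pointer-advancing loop as the ordinary merge; they differ only in which records are emitted.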
Nomenclature response
Pablo may be correct; perhaps updating a sorted master file from a sorted list of updates is not a common meaning for the term "merge", but the term "merge" does seem to be used for it in the wild ([1] says:
Batch Processing
Batch processing occurs when files are processed off-line, i.e. they are saved until some convenient time (e.g. at night) when the computer system is not otherwise being used.
Updating files: the updates, i.e. the changed records, are "batched" together, i.e. saved, and sorted into key order. Then the master file is read, one record at a time, until a record with the same key as the first transaction is found. This master file record is then updated and then the process is repeated. The updating is done as follows:
* For magnetic tape. Put the updates in a separate file and merge the master and the update to give a new master.
* For a disk, the easiest method is to associate a pointer with each record leading to the next one (linked list). Individual records can be scrubbed and replaced by reseting pointers.
In general, if only one master file is available, it is not altered, but a new master is produced by merging.
Clearly in this context the new master file should not contain both the record from the old master file and the record from the update file, which would be the necessary consequence of interpreting "merge" as "produce multiset union".)
And perhaps the Sort-merge join in Oracle and other RDBMSs' query evaluation isn't a common meaning for the term either; but certainly the merge in that case isn't concerned with producing a multiset union, but a subset of the multiset cartesian product.
Certainly when Knuth mentions "the idea of merging" on page 385 of volume 3 (2nd ed.), he's talking about merging two sorted sequences (of punched cards) to get a sorted sequence containing all the items of both sequences --- a sorted multiset union. Pages 197-207 also appear to be talking about producing the multiset union.
In any case, each of these things I called "merge algorithms" when I wrote this page can be conceptualized as the composition of some other function over sequences and the ordinary sorting merge, the one that produces a multiset union.
The kragen-hacks reference carries no weight in this case, because kragen-hacks is written by the same person who promulgated this questionable nomenclature in this Wikipedia article in the first place: me. If we decide to accept the term "merge" for this larger class of algorithms beyond just multiset union, it would have to be on the basis of documented terminology usage by people besides me. In my eyes, the fact that Knuth uses the term with a clearly more restricted meaning in mind creates a substantial burden of proof.
-- KragenSitaker
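The composition view described above, an ordinary sorting merge followed by another function over the merged sequence, can be illustrated with the master-file update. This is a sketch under assumptions: records are `(key, value)` pairs, keys are unique within each file, and `apply_updates` is a hypothetical name. It uses the standard library's `heapq.merge` as the plain multiset-union merge and `itertools.groupby` as the function applied afterwards.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def apply_updates(master, updates):
    """Return a new master in which update records replace old ones.

    master, updates: lists of (key, value), each sorted by key,
    with keys assumed unique within each list.
    """
    # Tag each record with its source so that, for equal keys, the
    # master record (tag 0) sorts before the update record (tag 1).
    tagged_master = ((key, 0, value) for key, value in master)
    tagged_updates = ((key, 1, value) for key, value in updates)

    # Step 1: the ordinary merge, producing a sorted multiset union.
    merged = heapq.merge(tagged_master, tagged_updates)

    # Step 2: a function over the merged sequence; keep the last
    # record of each run of equal keys, so the update wins.
    new_master = []
    for key, group in groupby(merged, key=itemgetter(0)):
        last = list(group)[-1]
        new_master.append((key, last[2]))
    return new_master


master = [(1, "a"), (2, "b"), (4, "d")]
updates = [(2, "B"), (3, "c")]
print(apply_updates(master, updates))
```

Swapping step 2 for a different grouping rule (for example, keeping only keys that occur in every input) gives the other variants discussed in this thread without touching the merge itself.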
Minor change to python example
The example originally had a[0] < b[0] rather than a[0] <= b[0] as the basis for using the item from the a[] array. This would not result in a stable sort, since equal elements would have been selected from the b[] array before those from the a[] array. The change ensures stability.
It's a bit of a shame that one sees so many assertions that merge sort is stable. As the uncorrected example shows, it's easily possible to write an unstable, but otherwise valid, merge sort. -- jpl
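The effect of the comparison operator can be seen in a small sketch. This is not the article's example; it is an index-based two-way merge written for illustration, with records assumed to be `(key, label)` pairs so that equal keys from different inputs can be told apart.

```python
def merge(a, b):
    """Stable two-way merge of lists of (key, label) sorted by key."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        # "<=" takes from a on ties, so records from a stay ahead of
        # equal-keyed records from b. Using "<" here would reverse that.
        if a[i][0] <= b[j][0]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


a = [(1, "a1"), (2, "a2")]
b = [(1, "b1"), (2, "b2")]
print(merge(a, b))
```

In a merge sort, a holds the earlier half of the input and b the later half, which is why ties must go to a for the sort as a whole to be stable.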
about the examples
I removed the comments about the STL merge implementation being more efficient than the presented source code because there's no guarantee of the STL's implementation performance, or approachability, for any specific task. The assertion I removed doesn't apply in all cases, and can't be presented as fact without much more context and discussion.
Meanwhile, I am wondering why the examples don't implement the pseudocode presented at the beginning of the article. The examples have no comments and don't explain what their parameters are. The C and C++ versions don't bother to check the length of the target array.
Should I add comments, and rewrite the samples to more closely match the pseudocode? Mikeblas 21:48, 28 December 2005 (UTC)
Sample code
I intend to remove the sample implementations section if and when Wikipedia:Articles for deletion/Insertion sort implementations is successful. It may be worthwhile to transwiki them to WikiBooks, WikiSource, or the Literate Programming wiki if anyone feels strongly about them. —donhalcon╤ 06:09, 5 March 2006 (UTC)
- That AFD was successful, but there are still a couple of sample implementations. What's going on with requiring references for sample code in articles? -- Mikeblas 15:43, 5 September 2007 (UTC)
BEST KNOWN MERGE ALGORITHM
I would like to know which is the fastest known merge algorithm. I've read a lot, but I haven't found an answer. While trying to find an optimal sorting algorithm, I came to the conclusion that the optimal merge is a generalization of the binary search algorithm. Does anyone think that may be true, or know whether it is already known? I've written this algorithm in Ada, and if someone finds the best known merge algorithm, I will benchmark my proof of concept against it. Meanwhile, I will try to compare it to the classic merge.--Azrael60 18:55, 26 September 2007 (UTC)
As a side note, I think the Hwang-Lin binary merging algorithm should be mentioned, along with the information-theoretic limit on the number of comparisons in the worst case.--Azrael60 (talk) 16:05, 13 January 2008 (UTC)
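For context on the information-theoretic limit mentioned above: a comparison-based merge of two sorted lists of lengths m and n with distinct elements must distinguish C(m+n, m) possible interleavings, so it needs at least ceil(log2 C(m+n, m)) comparisons in the worst case, while the classic two-pointer merge uses up to m+n-1. The sketch below only computes these two numbers; it does not implement Hwang-Lin, and the function names are made up for illustration. It assumes Python 3.8 or later for `math.comb`.

```python
from math import comb


def merge_lower_bound(m, n):
    """ceil(log2(C(m+n, m))): minimum worst-case comparisons."""
    interleavings = comb(m + n, m)
    # For x >= 1, (x - 1).bit_length() equals ceil(log2(x)) exactly,
    # which avoids floating-point rounding for large values.
    return (interleavings - 1).bit_length()


def classic_merge_worst_case(m, n):
    """Worst-case comparisons of the two-pointer merge (m, n >= 1)."""
    return m + n - 1


for m, n in [(4, 4), (1, 7), (2, 100)]:
    print(m, n, merge_lower_bound(m, n), classic_merge_worst_case(m, n))
```

When m and n are equal the two figures are close, but when one list is much shorter than the other the gap is large, which is the regime where binary-search-based merging becomes attractive.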