Schwartzian transform

From Wikipedia, the free encyclopedia

In computer science, the Schwartzian Transform is a Perl programming idiom used to improve the efficiency of sorting a list of items. This idiom is appropriate for comparison-based sorting when comparing a pair of elements is a compute-intensive operation that should be performed a minimal number of times. The Schwartzian Transform is notable in that it does not use named temporary arrays.

The idiom is named after Randal L. Schwartz, who first demonstrated it in Perl shortly after the release of Perl 5 in 1994. The term "Schwartzian Transform" applied solely to Perl programming for a number of years, but it has later been adopted by some users of other languages, such as Python, to refer to similar idioms in those languages. It should be noted, however, that the algorithm was already in use in other languages (under no specific name) before it was popularized among the Perl community in the form of that particular idiom by Schwartz. The term "Schwartzian transform" indicates a specific idiom, and not the algorithm in general.

The Schwartzian Transform is a version of a Lisp idiom known as decorate-sort-undecorate, which avoids recomputing the sort keys by temporarily associating them with the input items. This approach is similar to memoization, which avoids repeating the calculation of the key corresponding to a specific input value. By comparison, this idiom assures that each input item's key is calculated exactly once, which may still result in repeating some calculations if the input data contains duplicate items. And in the 1960s sort generators^[1] "own coding" features facilitated similar input/output transformations for ease of comparing records (among other things).

1 The Perl idiom
2 Example
3 History
4 References
5 External links

[edit] The Perl idiom

The general form of the Schwartzian Transform is:

@sorted = map { $_->[0] }
          sort { $a->[1] cmp $b->[1] }
          map { [$_, foo($_)] }
          @unsorted;

Where foo($_) represents an expression that takes $_ (each item of the list in turn) and produces the corresponding value that is to be compared in its sake.

Reading from right to left (or from the bottom to the top):

the original list @unsorted is fed into a map operation that transforms each item into a list consisting of itself and the calculated value that will determine its sort order (list of item => list of [item, value]);
then the list of lists produced by map is fed into sort, which sorts it according to the values previously calculated (list of [item, value] => sorted list of [item, value]);
finally, another map operation removes the values used for the sorting, producing the items of the original list in the sorted order (sorted list of [item, value] => sorted list of item).

The use of anonymous arrays ensures that memory will be reclaimed by the Perl garbage collector immediately after the sorting is done.

The computational savings obtained by the Schwartzian transform depend strongly on the structure of the inner function. For an efficient ordinary sort function, the number of invocations of the transform function goes from an average of $2 n log n$ to $n$ ; one should carefully consider on a case-by-case basis whether the extra implementation complexity is justified by this efficiency savings.

[edit] Example

For example, to sort a list of files by their modification times, a novice approach might be as follows:

 function naiveCompare(file a, file b) {
     return modificationTime(a) < modificationTime(b)
 }
 
 // Assume that sort(list, comparisonPredicate) sorts the given list using
 // the comparisonPredicate to compare two elements.
 sortedArray := sort(filesArray, naiveCompare)

Unless the modification times are memoized for each file, this method requires their re-computing every time a file is compared in the sort. Using the Schwartzian transform, the modification time is calculated only once per file.

For people who have trouble understanding a functional style of coding, this procedural pseudo-code might help understand how the algorithm works:

 for each file in filesArray
     insert array(file, modificationTime(file)) at end of transformedArray
 
 function simpleCompare(array a, array b) {
     return a[2] < b[2]
 }
 
 transformedArray := sort(transformedArray, simpleCompare)
 
 for each file in transformedArray
     insert file[1] at end of sortedArray

Note, however, that while the underlying algorithm is the same, this code is not an instance of the Schwartzian transform idiom. See, for example, the use of a temporary array.

[edit] History

The first known online appearance of the Schwartzian Transform is a December 16, 1994 posting by Randal Schwartz to a thread in comp.unix.shell, crossposted to comp.lang.perl. (The current version of the Perl Timeline is incorrect and refers to a later date in 1995.) The thread began with a question about how to sort a list of lines by their "last" word:

 adjn:Joshua Ng
 adktk:KaLap Timothy Kwong
 admg:Mahalingam Gobieramanan
 admln:Martha L. Nangalama

Schwartz responded with:

#!/usr/bin/perl
require 5; # new features, new bugs!
print
    map { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map { [$_, /(\S+)$/] }
    <>;

This code produces the result:

 admg:Mahalingam Gobieramanan
 adktk:KaLap Timothy Kwong
 admln:Martha L. Nangalama
 adjn:Joshua Ng

Schwartz noted in the post that he was "Speak[ing] with a lisp in Perl," a reference to the idiom's Lisp origins.

The term "Schwartzian Transform" itself was coined by Tom Christiansen in a followup reply. Later posts by Christiansen made it clear that he had not intended to name the construct, but merely to refer to it from the original post: his attempt to finally name it "The Black Transform" did not take hold ("Black" here being a pun on "schwar[t]z", which means black in German).

[edit] References

^ Donald Knuth, The Art of Computer Programming, volume 3 Sorting and Searching, section 5.4.6 subsection Sort generators, first edition, 1973, ISBN 0-201-03803-X, page 342