Three-way comparison

From Wikipedia, the free encyclopedia

In computer science, a three-way comparison takes two values A and B belonging to a type with a total order and determines whether A < B, A = B, or A > B in a single operation, in accordance with the mathematical law of trichotomy.

Machine-level computation

Many processors have instruction sets that support such an operation on primitive types. Some machines have signed integers based on a sign-and-magnitude or one's complement representation (see signed number representations), both of which allow distinct positive and negative zeros. This does not violate trichotomy as long as a consistent total order is adopted: either -0 = +0 or -0 < +0 is valid. Common floating-point types, however, have an exception to trichotomy: there is a special value "NaN" (Not a Number) such that x < NaN, x > NaN, and x = NaN are all false for every floating-point value x (including NaN itself).
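The NaN exception can be observed directly in any language with IEEE 754 floating-point numbers. The following is a minimal Python sketch; the helper name naive_cmp is hypothetical and only illustrates what happens when a three-way result is built from two-way tests without checking for NaN.

```python
nan = float("nan")
x = 1.0

# Every ordering test involving NaN is false, so none of the three
# trichotomy outcomes holds.
results = (x < nan, x == nan, x > nan)

# The same is true when NaN is compared with itself.
self_results = (nan < nan, nan == nan, nan > nan)


def naive_cmp(a, b):
    """Three-way comparison built from < and >; it cannot detect NaN."""
    return (a > b) - (a < b)


# naive_cmp reports 0 ("equal") for unordered operands, which is misleading.
misleading = naive_cmp(x, nan)
```

A three-way comparison over floating-point values therefore has to either treat NaN as a separate "unordered" outcome or impose an explicit ordering on it.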

High-level languages

In C, the functions strcmp and memcmp perform a three-way comparison between strings and memory buffers, respectively. They return a negative number when the first argument is lexicographically smaller than the second, zero when the arguments are equal, and a positive number otherwise. This convention of returning the "sign of the difference" is extended to arbitrary comparison functions by the standard sorting function qsort, which takes a comparison function as an argument and requires it to follow the same convention.
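The negative/zero/positive convention can be sketched in Python, where functools.cmp_to_key plays a role similar to the comparison-function argument of qsort. The function name compare_lex and the sample data are assumptions made for illustration; this is not how strcmp itself is implemented.

```python
from functools import cmp_to_key


def compare_lex(a: bytes, b: bytes) -> int:
    """Return a negative number, zero, or a positive number (strcmp-style)."""
    for x, y in zip(a, b):
        if x != y:
            return x - y  # difference of two byte values; only its sign matters
    # All shared positions are equal: the shorter buffer sorts first.
    return len(a) - len(b)


words = [b"pear", b"apple", b"fig", b"apples"]

# The sort routine only inspects the sign of the comparator's result.
sorted_words = sorted(words, key=cmp_to_key(compare_lex))
```

Callers should test the result only against zero (for example, compare_lex(a, b) < 0), since the convention does not promise the specific values -1 and 1.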

In Perl (numeric comparison only), Ruby, and Groovy, the spaceship operator ("<=>") returns the sign function (a.k.a. signum) values of -1, 0, or 1 depending on whether A < B, A = B, or A > B, respectively. In Python 2.x, the cmp function computes the same thing; it was removed in Python 3. In OCaml, the compare function returns a negative integer, zero, or a positive integer in the same three cases. In the Haskell standard library, the three-way comparison function compare is defined for all types in the Ord class; it returns a value of type Ordering, whose values are LT (less than), EQ (equal), and GT (greater than).

Many object-oriented languages have a three-way comparison method, which performs a three-way comparison between the object and another given object. For example, in Java, any class that implements the Comparable interface has a compareTo method which returns a negative integer, zero, or a positive integer. Similarly, in the .NET Framework, any class that implements the IComparable interface has such a CompareTo method.

Since Java 1.5, the sign can also be obtained by applying the static method Math.signum to the difference of the two values, provided that the difference can be computed without problems such as the arithmetic overflow discussed below. Many languages allow user-defined functions, so a compare(A, B) function can always be written; the question is whether its body can use a built-in three-way construct or must fall back on repeated two-way tests.

When implementing a three-way comparison where a three-way comparison operator or method is not already available, it is common to combine two comparisons, such as A = B and A < B, or A < B and A > B. A compiler may be able to optimize these two comparisons into a single three-way comparison at a lower level, but this is not a common optimization.
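As a minimal sketch of this fallback, the following Python functions build a three-way result from two ordinary comparisons. The names three_way and three_way_arith are hypothetical; the second form is the idiom commonly used as a replacement for the removed Python 2 cmp function.

```python
def three_way(a, b):
    """Return -1, 0, or 1 using at most two two-way comparisons."""
    if a < b:
        return -1
    if a > b:
        return 1
    return 0  # neither smaller nor larger: equal under a total order


def three_way_arith(a, b):
    """Same result, relying on Python booleans being the integers 0 and 1."""
    return (a > b) - (a < b)
```

Both versions assume the operands are totally ordered; for unordered values such as NaN they return 0 even though the operands are not equal.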

In some cases, three-way comparison can be simulated by subtracting B from A and examining the sign of the result, exploiting special instructions for examining the sign of a number. However, this requires the type of A and B to have a well-defined difference. Fixed-width signed integers may overflow when they are subtracted, floating-point numbers include the value NaN, whose sign is undefined, and character strings have no difference function corresponding to their total order. At the machine level, overflow is typically tracked and can be used to determine order after subtraction, but this information is not usually available to higher-level languages.
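The overflow problem can be illustrated with a small Python sketch. Python integers do not overflow, so the hypothetical helper wrap32 emulates 32-bit two's-complement wraparound; the assumption of wrapping arithmetic is for illustration only (in C, for instance, signed overflow is undefined behaviour rather than guaranteed wraparound).

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def wrap32(n: int) -> int:
    """Reduce n to a 32-bit two's-complement value, as wrapping hardware would."""
    return (n + 2**31) % 2**32 - 2**31


def sign(n: int) -> int:
    return (n > 0) - (n < 0)


def cmp_by_subtraction(a: int, b: int) -> int:
    """Sign of the wrapped 32-bit difference; wrong when a - b overflows."""
    return sign(wrap32(a - b))


def cmp_by_tests(a: int, b: int) -> int:
    """Overflow-free alternative using two ordinary comparisons."""
    return (a > b) - (a < b)


# INT_MAX - (-1) does not fit in 32 bits and wraps to INT_MIN,
# so the subtraction method claims INT_MAX < -1.
wrong = cmp_by_subtraction(INT_MAX, -1)
right = cmp_by_tests(INT_MAX, -1)
```

The subtraction method is safe only when the operand range guarantees that the difference fits in the integer type, for example when both values are known to be non-negative.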

One example of a three-way conditional provided by a programming language is Fortran's now-deprecated arithmetic IF statement, which evaluates an arithmetic expression and jumps to one of three labels according to whether the result is negative, zero, or positive:

     IF (expression) negative,zero,positive

As noted above, the common library function strcmp in C and related languages performs a three-way lexicographic comparison of strings; however, C lacks a general three-way comparison for other data types. C++ gained one with the <=> operator introduced in C++20.

Composite data types

Three-way comparisons compose easily: lexicographic comparisons of non-primitive data types can be built from them more directly than from two-way comparisons.

Here is a composition example in Perl.

    sub compare($$) {
        my ($a, $b) = @_;
        return $a->{unit} cmp $b->{unit}
            || $a->{rank} <=> $b->{rank}
            || $a->{name} cmp $b->{name};
    }

Note that cmp compares strings, whereas <=> compares numbers. Two-way equivalents tend to be less compact but not necessarily less legible. The above takes advantage of short-circuit evaluation of the || operator and the fact that 0 is considered false in Perl. As a result, if the first comparison finds the fields equal (and thus evaluates to 0), evaluation "falls through" to the second comparison, and so on, until a comparison yields a non-zero result or the end of the chain is reached.
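The same fall-through technique can be sketched in Python, whose or operator also short-circuits and treats 0 as false. The Record type, its sample fields, and the cmp helper below are assumptions for illustration; Python 3 has no built-in cmp.

```python
from collections import namedtuple

# Hypothetical record type with the same three fields as the Perl example.
Record = namedtuple("Record", ["unit", "rank", "name"])


def cmp(a, b):
    """Three-way comparison returning -1, 0, or 1."""
    return (a > b) - (a < b)


def compare(a, b):
    # `or` returns its first truthy operand, so a zero (tie) falls through
    # to the next field; the last comparison's result is returned as-is.
    return cmp(a.unit, b.unit) or cmp(a.rank, b.rank) or cmp(a.name, b.name)
```

For example, compare(Record("A", 2, "Zed"), Record("A", 2, "Amy")) ties on unit and rank and is decided by the name field.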

In some languages, including Python, Ruby, and Haskell, lists are compared lexicographically, which means that a chain of comparisons like the one above can be built by putting the values into lists in the desired order; for example, in Ruby:

    [a.unit, a.rank, a.name] <=> [b.unit, b.rank, b.name]
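A comparable sketch in Python uses tuples, which are also compared lexicographically. The Record type, the sort_key helper, and the sample data are hypothetical names introduced for this illustration.

```python
from collections import namedtuple

# Hypothetical record type for illustration.
Record = namedtuple("Record", ["unit", "rank", "name"])


def sort_key(r):
    """Fields in priority order; tuples compare element by element."""
    return (r.unit, r.rank, r.name)


def compare(a, b):
    """Three-way result derived from the lexicographic tuple comparison."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)


records = [
    Record("B", 1, "Ann"),
    Record("A", 2, "Bob"),
    Record("A", 1, "Cy"),
    Record("A", 1, "Al"),
]

# Sorting by the tuple key orders by unit, then rank, then name.
ordered = sorted(records, key=sort_key)
```

In Python the key-function form is usually preferred for sorting, since it avoids writing an explicit comparison function at all.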

This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.