Fisher-Yates shuffle

From Wikipedia, the free encyclopedia

The Fisher-Yates shuffle, named after Ronald Fisher and Frank Yates, also known as the Knuth shuffle, after Donald Knuth, is an algorithm for generating a random permutation of a finite set—in plain terms, for randomly shuffling the set. A variant of the Fisher-Yates shuffle, known as Sattolo's algorithm, may be used to generate cyclic permutations instead. Properly implemented, the Fisher-Yates shuffle is unbiased, so that every permutation is equally likely. The modern version of the algorithm is also rather efficient, requiring only time proportional to the number of items being shuffled and no additional storage space.

The basic process of Fisher-Yates shuffling is similar to randomly picking numbered tickets out of a hat, or cards from a deck, one after another until there are no more left. What the specific algorithm provides is a way of doing this numerically in an efficient and rigorous manner that, properly done, guarantees an unbiased result.

Fisher and Yates' original method

The Fisher-Yates shuffle, in its original form, was described in 1938 by Ronald A. Fisher and Frank Yates in their book Statistical tables for biological, agricultural and medical research.[1] (Later editions describe a somewhat different method attributed to C. R. Rao.) Their method was designed to be implemented using pencil and paper, with a precomputed table of random numbers as the source of randomness. The basic method given for generating a random permutation of the numbers 1–N goes as follows:

  1. Write down the numbers from one to N.
  2. Pick a random number k between one and the number of unstruck numbers remaining (inclusive).
  3. Counting from the low end, strike out the kth number not yet struck out, and write it down elsewhere.
  4. Repeat from step 2 until all the numbers have been struck out.
  5. The sequence of numbers written down in step 3 is now a random permutation of the original numbers.

Provided that the random numbers picked in step 2 above are truly random and unbiased, so will the resulting permutation be. Fisher and Yates took care to describe how to obtain such random numbers in any desired range from the supplied tables in a manner which avoids any bias. They also suggested the possibility of using a simpler method — picking random numbers from one to N and discarding any duplicates — to generate the first half of the permutation, and only applying the more complex algorithm to the remaining half, where picking a duplicate number would otherwise become frustratingly common.
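For illustration, the pencil-and-paper steps translate directly into code. The following Java sketch (not from the original source; the class and method names are arbitrary) also makes the cost of step 3 visible: removing the kth remaining number from a list takes linear time, so the whole procedure is quadratic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class OriginalMethod {
    // Direct transcription of Fisher and Yates' pencil-and-paper procedure.
    // List.remove(k) shifts the tail of the list, costing O(n) per draw
    // and O(n^2) overall.
    public static int[] shuffle(int n, Random rng) {
        List<Integer> unstruck = new ArrayList<>();
        for (int i = 1; i <= n; i++) unstruck.add(i);    // step 1: write down 1..N
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            int k = rng.nextInt(unstruck.size());        // step 2 (0-based here)
            result[i] = unstruck.remove(k);              // step 3: strike out, record
        }
        return result;                                   // step 5: a random permutation
    }
}
```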

The modern algorithm

The modern version of the Fisher-Yates shuffle, designed for computer use, was introduced by Richard Durstenfeld in 1964 in Communications of the ACM volume 7, issue 7, as "Algorithm 235: Random permutation",[2] and was popularized by Donald E. Knuth in volume 2 of his book The Art of Computer Programming as "Algorithm P".[3] Neither Durstenfeld nor Knuth, in the first edition of his book, acknowledged the earlier work of Fisher and Yates in any way, and may not have been aware of it. Subsequent editions of The Art of Computer Programming do, however, mention Fisher and Yates' contribution.[4]

The algorithm described by Durstenfeld differs from that given by Fisher and Yates in a small but significant way. Whereas a naive computer implementation of Fisher and Yates' method would spend needless time counting the remaining numbers in step 3 above, Durstenfeld's solution is to move the "struck" numbers to the end of the list by swapping them with the last unstruck number at each iteration. This reduces the algorithm's time complexity to O(n), compared to O(n^2) for the naive implementation.[5] The algorithm thus becomes, for a set of N elements:

  1. Let A1 := 1, A2 := 2 and so on up to AN := N, and let n := N.
  2. Pick a random number k between 1 and n inclusive.
  3. If k ≠ n, swap the values of Ak and An.
  4. Decrease n by one.
  5. Repeat from step 2 until n is less than 2.

The Fisher-Yates shuffle, as implemented by Durstenfeld, is an in-place shuffle. That is, given a preinitialized array, it shuffles the elements of the array in place, rather than producing a shuffled copy of the array. This can be an advantage if the array to be shuffled is large. An example implementation of Durstenfeld's algorithm in Java (with 0-based arrays) could be:

public static void shuffle(int[] array) {
    Random rng = new Random();   // i.e., java.util.Random.
    int n = array.length;        // The number of items left to shuffle (loop invariant).
    while (n > 1) {
        int k = rng.nextInt(n);  // 0 <= k < n.
        --n;                     // n is now the last pertinent index;
        int temp = array[n];     // swap array[n] with array[k].
        array[n] = array[k];
        array[k] = temp;
    }
}

The implementation above relies on Random.nextInt(int) providing sufficiently random and unbiased results; see below for potential problems if this is not the case.

Examples

Pencil-and-paper method

As an example, we'll permute the numbers from 1 to 8 using Fisher and Yates' original method. We'll start by writing the numbers out on a piece of scratch paper:

Range Roll Scratch Result
    1 2 3 4 5 6 7 8  

Now we roll a random number k from 1 to 8—let's make it 3—and strike out the kth (i.e. third) number (3, of course) on the scratch pad and write it down as the result; struck-out numbers are shown below in parentheses:

Range Roll Scratch Result
1–8 3 1 2 (3) 4 5 6 7 8 3

Now we pick a second random number, this time from 1 to 7: it turns out to be 4. Now we strike out the fourth number not yet struck off the scratch pad—that's number 5—and add it to the result:

Range Roll Scratch Result
1–7 4 1 2 (3) 4 (5) 6 7 8 3 5

Now we pick the next random number from 1 to 6, and then from 1 to 5, and so on, always repeating the strike-out process as above:

Range Roll Scratch Result
1–6 5 1 2 (3) 4 (5) 6 (7) 8 3 5 7
1–5 3 1 2 (3) (4) (5) 6 (7) 8 3 5 7 4
1–4 4 1 2 (3) (4) (5) 6 (7) (8) 3 5 7 4 8
1–3 1 (1) 2 (3) (4) (5) 6 (7) (8) 3 5 7 4 8 1
1–2 2 (1) 2 (3) (4) (5) (6) (7) (8) 3 5 7 4 8 1 6
    (1) (2) (3) (4) (5) (6) (7) (8) 3 5 7 4 8 1 6 2
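The table above can be replayed mechanically. In this Java sketch (illustrative only; the class name is arbitrary), the fixed rolls 3, 4, 5, 3, 4, 1, 2 are fed into the strike-out procedure and reproduce the result row for row:

```java
import java.util.ArrayList;
import java.util.List;

public class PencilAndPaperExample {
    // Each 1-based roll k strikes out the k-th number not yet struck.
    public static List<Integer> replay(int n, int[] rolls) {
        List<Integer> unstruck = new ArrayList<>();
        for (int i = 1; i <= n; i++) unstruck.add(i);
        List<Integer> result = new ArrayList<>();
        for (int k : rolls) result.add(unstruck.remove(k - 1));
        result.addAll(unstruck);  // the final remaining number needs no roll
        return result;
    }

    public static void main(String[] args) {
        // Prints [3, 5, 7, 4, 8, 1, 6, 2], the final result row of the table.
        System.out.println(replay(8, new int[]{3, 4, 5, 3, 4, 1, 2}));
    }
}
```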

Modern method

We'll now do the same thing using Durstenfeld's version of the algorithm: this time, instead of striking out the chosen numbers and copying them elsewhere, we'll swap them with the last number not yet chosen. We'll start by writing out the numbers from 1 to 8 as before. For clarity, we'll use a vertical bar (|) to separate the part of the list that has already been processed from the part that hasn't been permuted yet; of course, no such separator is actually used in the real algorithm:

Range Roll Scratch | Result
    1 2 3 4 5 6 7 8 |

We now roll a random number from 1 to 8: this time it's 6, so we swap the 6th and 8th numbers in the list:

Range Roll Scratch | Result
1–8 6 1 2 3 4 5 8 7 | 6

We roll the next random number from 1 to 7, and it turns out to be 2. Thus, we swap the 2nd and 7th numbers and move on:

Range Roll Scratch | Result
1–7 2 1 7 3 4 5 8 | 2 6

The next random number we roll is from 1 to 6, and just happens to be 6, which means we leave the 6th number in the list (which, after the swap above, is now number 8) in place and just move to the next step. Again, we proceed the same way until the permutation is complete:

Range Roll Scratch | Result
1–6 6 1 7 3 4 5 | 8 2 6
1–5 1 5 7 3 4 | 1 8 2 6
1–4 3 5 7 4 | 3 1 8 2 6
1–3 3 5 7 | 4 3 1 8 2 6
1–2 1 7 | 5 4 3 1 8 2 6

At this point there's nothing more that can be done, so the resulting permutation is 7 5 4 3 1 8 2 6.
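As with the pencil-and-paper example, this walkthrough can be replayed in code. A Java sketch (illustrative; the class name is arbitrary) applying the fixed rolls 6, 2, 6, 1, 3, 3, 1 from the tables:

```java
public class ModernExample {
    // Replays Durstenfeld's method: each 1-based roll k swaps A_k with A_n
    // (a no-op when k = n), after which n shrinks by one.
    public static int[] replay(int n, int[] rolls) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i + 1;
        for (int k : rolls) {
            int tmp = a[n - 1];
            a[n - 1] = a[k - 1];
            a[k - 1] = tmp;
            n--;
        }
        return a;
    }
}
```

Replaying the rolls above yields 7 5 4 3 1 8 2 6, matching the table.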

Variants

Sattolo's algorithm

A very similar algorithm was published in 1986 by Sandra Sattolo for generating uniformly distributed cyclic permutations.[6] The only difference between Durstenfeld's and Sattolo's algorithms is that in the latter, in step 2 above, the random number k is chosen from the range between 1 and n−1 (rather than between 1 and n) inclusive. To turn the Java example above into an example of Sattolo's algorithm, simply replace rng.nextInt(n) with rng.nextInt(n-1) in the code. This simple change modifies the algorithm so that the resulting permutation always consists of a single cycle.

In fact, as described below, it's quite easy to accidentally implement Sattolo's algorithm when the ordinary Fisher-Yates shuffle is intended. This will bias the results by causing the permutations to be picked from the smaller set of (N−1)! cyclic permutations instead of the full set of all N! possible permutations.

The fact that Sattolo's algorithm always produces a cyclic permutation, and that it produces each such permutation with equal probability, may not be immediately obvious. The former can be shown inductively: Assume that, before step 2 of the modified algorithm, the permutation has n distinct cycles, each containing exactly one member Ai for which i ≤ n. This is clearly true at the start, when Ai = i for all 1 ≤ i ≤ N, and n = N. Given the assumption, for any randomly chosen k < n, An and Ak must belong to distinct cycles, and thus swapping their values in step 3 will merge those cycles, reducing the number of distinct cycles by one. This merged cycle will have two members (An and Ak) with indices less than or equal to n, but will lose one of them when n is correspondingly decreased by one in step 4, and thus the assumption given above will continue to hold. Eventually, of course, n, and thus the number of cycles, will decrease to one, at which point the algorithm will terminate.

As for the equal probability of the permutations, it suffices to observe that the modified algorithm involves (N−1)! distinct possible sequences of swaps, each of which clearly produces a different permutation, and each of which occurs—assuming the random number source is unbiased—with equal probability. Since (N−1)!, the number of distinct permutations the algorithm can produce, is also known to be exactly the total number of cyclic permutations of N elements, it is clear that the algorithm must be able to produce them all.
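The single-cycle property can also be checked empirically. The sketch below (an illustration, not from the cited paper) implements Sattolo's variant and then follows the permutation from index 0; for an array of length N it always takes exactly N steps to return:

```java
import java.util.Random;

public class SattoloCheck {
    // Sattolo's variant: k is drawn from 0..n-2, so k can never equal n-1.
    public static void sattoloShuffle(int[] array, Random rng) {
        for (int n = array.length; n > 1; n--) {
            int k = rng.nextInt(n - 1);
            int tmp = array[n - 1];
            array[n - 1] = array[k];
            array[k] = tmp;
        }
    }

    // Length of the cycle through index 0, reading array[i] as "i maps to array[i]".
    // Assumes the array holds a permutation of 0..length-1.
    public static int cycleLength(int[] perm) {
        int steps = 0, i = 0;
        do { i = perm[i]; steps++; } while (i != 0);
        return steps;
    }
}
```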

Comparison with other shuffling algorithms

The Fisher-Yates shuffle is quite efficient; indeed, its asymptotic time and space complexity are optimal. Combined with a high-quality unbiased random number source, it is also guaranteed to produce unbiased results. Compared to some other solutions, it also has the advantage that, if only part of the resulting permutation is needed, it can be stopped halfway through, or even stopped and restarted repeatedly, generating the permutation incrementally as needed.
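The incremental property can be made concrete with a front-to-back variant of the shuffle (an equivalent formulation; the sketch below is illustrative and the names are arbitrary): stopping after m iterations leaves a uniformly random m-element sample, in random order, in the first m positions.

```java
import java.util.Random;

public class PartialShuffle {
    // Runs only the first m steps of a front-to-back Fisher-Yates shuffle.
    // Afterwards array[0..m-1] holds a random m-element sample of the array's
    // contents; the tail holds the unshuffled leftovers.
    public static void shufflePrefix(int[] array, int m, Random rng) {
        for (int i = 0; i < m; i++) {
            int k = i + rng.nextInt(array.length - i);  // i <= k < array.length
            int tmp = array[i];
            array[i] = array[k];
            array[k] = tmp;
        }
    }
}
```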

In high-level programming languages with a fast built-in sorting algorithm, an alternative method, where each element of the set to be shuffled is assigned a random number and the set is then sorted according to these numbers, may be faster in practice, despite having worse asymptotic time complexity (O(n log n) vs. O(n)). Like the Fisher-Yates shuffle, this method will also produce unbiased results if correctly implemented, and may be more tolerant of certain kinds of bias in the random numbers. However, care must be taken to ensure that the assigned random numbers are never duplicated, since sorting algorithms in general won't order elements randomly in case of a tie.
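A minimal sketch of this sort-based method (illustrative; it assumes 64-bit random keys, which makes ties unlikely but, as noted above, not impossible):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class SortShuffle {
    // Assigns each position a random 64-bit key and sorts positions by key.
    // A duplicate key would let the sort order those two elements non-randomly.
    public static int[] shuffled(int[] items, Random rng) {
        long[] keys = new long[items.length];
        Integer[] order = new Integer[items.length];
        for (int i = 0; i < items.length; i++) {
            keys[i] = rng.nextLong();
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingLong(i -> keys[i]));
        int[] out = new int[items.length];
        for (int i = 0; i < items.length; i++) out[i] = items[order[i]];
        return out;
    }
}
```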

A variant of the above method that has seen some use in languages that support sorting with user-specified comparison functions is to shuffle a list by sorting it with a comparison function that returns random values. However, this does not always work: with a number of commonly used sorting algorithms, the results end up biased due to internal asymmetries in the sorting implementation.[7]

Potential sources of bias

Care must be taken when implementing the Fisher-Yates shuffle, both in the implementation of the algorithm itself and in the generation of the random numbers it is built on, otherwise the results may show detectable bias. Several common sources of bias are listed below.

Implementation errors

A common error when implementing the Fisher-Yates shuffle is to pick the random numbers from the wrong range. The resulting algorithm may appear to work, but will produce biased results. For example, a common off-by-one error would be replacing k = rng.nextInt(n) with k = rng.nextInt(n-1) in the Java example above, so that k is always strictly less than the last pertinent index, n-1. (In Java, Random.nextInt(int) returns a random non-negative integer less than its argument.[8]) This turns the Fisher-Yates shuffle into Sattolo's algorithm, which only ever produces cyclic permutations: in particular, it is easy to see that, with this modification, the last element of the array can never end up in its original position.
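The "last element never stays put" symptom is easy to observe. This sketch (illustrative) applies the off-by-one version repeatedly to an identity array; position N−1 never retains its original value, because the first iteration always swaps it away and no later iteration touches it again:

```java
import java.util.Random;

public class OffByOneBug {
    // The buggy variant: rng.nextInt(n - 1) instead of rng.nextInt(n),
    // so k can never equal the last pertinent index n - 1.
    public static void buggyShuffle(int[] array, Random rng) {
        int n = array.length;
        while (n > 1) {
            int k = rng.nextInt(n - 1);
            --n;
            int tmp = array[n];
            array[n] = array[k];
            array[k] = tmp;
        }
    }
}
```

Running this on the array {0, 1, ..., 7} any number of times, the element 7 is never found at index 7 afterwards.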

Similarly, always selecting k from the entire range of valid array indexes on every iteration (i.e. using k = rng.nextInt(array.length) in the Java example above) also produces a result which is biased, albeit less obviously so. This can be seen from the fact that doing so yields N^(N−1) distinct, equally likely sequences of swaps (the loop makes N−1 random choices, each from N possibilities), whereas there are N! possible permutations of an N-element array. Since a power of N can never be evenly divisible by N! for N > 2 (as N! is divisible by N−1, which shares no prime factors with N), some permutations must be produced by more of these sequences of swaps than others.
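The counting argument can be verified exhaustively for N = 3 (an illustrative sketch; this naive variant makes one full-range choice per position, giving 3^3 = 27 equally likely swap sequences for only 3! = 6 permutations):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class FullRangeBias {
    // Enumerates all 27 swap sequences of the naive full-range shuffle of
    // {0, 1, 2} and tallies which permutation each one produces.
    public static Map<String, Integer> tally() {
        Map<String, Integer> counts = new HashMap<>();
        for (int k0 = 0; k0 < 3; k0++)
            for (int k1 = 0; k1 < 3; k1++)
                for (int k2 = 0; k2 < 3; k2++) {
                    int[] a = {0, 1, 2};
                    int[] ks = {k0, k1, k2};
                    for (int i = 0; i < 3; i++) {  // full range at every step
                        int tmp = a[i];
                        a[i] = a[ks[i]];
                        a[ks[i]] = tmp;
                    }
                    counts.merge(Arrays.toString(a), 1, Integer::sum);
                }
        return counts;
    }
}
```

Since 27 is not divisible by 6, the counts cannot all be equal; in this enumeration the identity permutation comes up 4 times while, for instance, [0, 2, 1] comes up 5 times.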

Modulo bias

Doing a Fisher-Yates shuffle involves picking uniformly distributed random integers from various ranges. Most random number generators, however—whether true or pseudorandom—will only directly provide numbers in some fixed range, such as, say, from 0 to 2^32−1. A simple and commonly used way to force such numbers into a desired smaller range is to apply the modulo operator; that is, to divide them by the size of the range and take the remainder. However, the need, in a Fisher-Yates shuffle, to generate random numbers in every range from 0–1 up to 0–N−1 practically guarantees that some of these ranges will not evenly divide the natural range of the random number generator. Thus, the remainders will not always be evenly distributed and, worse yet, the bias will be systematically in favor of small remainders.

For example, assume that your random number source gives numbers from 0 to 99 (as was the case for Fisher and Yates' original tables), and that you wish to obtain an unbiased random number from 0 to 15. If you simply divide the numbers by 16 and take the remainder, you'll find that the numbers 0–3 occur about 17% more often than others. This is because 16 does not evenly divide 100: the largest multiple of 16 less than or equal to 100 is 6×16 = 96, and it is the numbers in the incomplete range 96–99 that cause the bias. The simplest way to fix the problem is to discard those numbers before taking the remainder and to keep trying again until a number in the suitable range comes up. While in principle this could, in the worst case, take forever, in practice the expected number of retries will always be less than one.
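The rejection fix for the 0–99 example can be written out as follows (an illustrative sketch; a return value of −1 here stands for "discard and draw again"):

```java
public class RejectionSampling {
    // Reduces a draw from 0..99 to the range 0..15 without modulo bias:
    // draws in the incomplete block 96..99 are rejected (96 = 6 * 16 is the
    // largest multiple of 16 not exceeding 100).
    public static int reduce(int draw) {
        if (draw >= 96) return -1;  // caller must draw again
        return draw % 16;
    }
}
```

Enumerating all 100 possible draws shows that each remainder 0–15 now occurs exactly 6 times, whereas plain draw % 16 would give the remainders 0–3 seven occurrences each and the rest six.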

A related problem occurs with implementations that first generate a random floating-point number—usually in the range [0,1)—and then multiply it by the size of the desired range and round down. The problem here is that random floating-point numbers, however carefully generated, always have only finite precision. This means that there are only a finite number of possible floating point values in any given range, and if the range is divided into a number of segments that doesn't divide this number evenly, some segments will end up with more possible values than others. While the resulting bias will not show the same systematic downward trend as in the previous case, it will still be there.
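The effect can be modeled with an artificially low-precision generator (an illustrative sketch; it assumes a source that can return only the 100 values v/100 in [0,1)):

```java
public class FloatRangeBias {
    // Maps each of the 100 possible "floats" to a bucket 0..15 by the common
    // multiply-and-floor method, and returns how many values land in each bucket.
    public static int[] bucketCounts() {
        int[] counts = new int[16];
        for (int v = 0; v < 100; v++) {
            double u = v / 100.0;                 // one of 100 possible outputs
            counts[(int) Math.floor(u * 16)]++;   // scale to 0..15, round down
        }
        return counts;
    }
}
```

Since 16 does not divide 100, four of the buckets receive 7 values and the rest 6; unlike the modulo case, the heavier buckets (0, 4, 8 and 12) are spread across the range rather than clustered at the bottom.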

Limited PRNG state space

An additional problem occurs when the Fisher-Yates shuffle is used with a pseudorandom number generator: as the sequence of numbers output by such a generator is entirely determined by its internal state at the start of a sequence, a shuffle driven by such a generator cannot possibly produce more distinct permutations than the generator has distinct possible states. Even when the number of possible states exceeds the number of permutations, the irregular nature of the mapping from sequences of numbers to permutations means that some permutations will occur more often than others. Thus, to minimize bias, the number of states of the PRNG should exceed the number of permutations by at least several orders of magnitude.

For example, the built-in pseudorandom number generator provided by many programming languages and/or libraries may often have only 32 bits of internal state, which means it can only produce 2^32 different sequences of numbers. If such a generator is used to shuffle a deck of 52 playing cards, it can only ever produce a vanishingly small fraction of the 52! ≈ 2^225.6 possible permutations. Thus, it's impossible for a generator with less than 226 bits of internal state to produce all the possible permutations of a 52-card deck, and for a (reasonably) unbiased shuffle, the generator must have at least about 250 bits of state.
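The 52-card figure is easy to recompute (an illustrative sketch; 52! overflows fixed-width integers, so the logarithms are summed instead):

```java
public class DeckEntropy {
    // log2(n!) = log2(2) + log2(3) + ... + log2(n), computed in floating point.
    public static double log2Factorial(int n) {
        double bits = 0.0;
        for (int i = 2; i <= n; i++) bits += Math.log(i) / Math.log(2);
        return bits;
    }
}
```

log2Factorial(52) evaluates to about 225.58, which is where the 226-bit figure above comes from.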

A further problem occurs when a simple linear congruential PRNG is used with the divide-and-take-remainder method of range reduction described above. The problem here is that the low-order bits of a linear congruential PRNG with a power-of-two modulus are less random than the high-order ones: the low n bits of such a generator have a period of at most 2^n. When the divisor is a power of two, taking the remainder essentially means throwing away the high-order bits, such that one ends up with a significantly less random value.
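The weakness of the low-order bits is simple to demonstrate with a small power-of-two-modulus LCG (the multiplier and increment below are the classic ANSI C example constants; the sketch is illustrative):

```java
public class LcgLowBits {
    // x_{i+1} = (1103515245 * x_i + 12345) mod 2^31.
    // With a power-of-two modulus, the lowest n bits have period at most 2^n;
    // in particular the lowest bit simply alternates 0, 1, 0, 1, ...
    private long state;

    public LcgLowBits(long seed) { state = seed; }

    public long next() {
        state = (1103515245L * state + 12345L) % (1L << 31);
        return state;
    }
}
```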

Also, of course, no pseudorandom number generator can produce more distinct sequences than there are distinct seed values it may be initialized with. Thus, it doesn't matter much if a generator has 1024 bits of internal state if it is only ever initialized with a 32-bit seed.

References

  1. ^ Fisher, R.A.; Yates, F. [1938] (1948). Statistical tables for biological, agricultural and medical research, 3rd ed., London: Oliver & Boyd, pp. 26–27. OCLC 14222135.  (note: 6th edition, ISBN 0-02-844720-4, is available on the web, but gives a different shuffling algorithm by C. R. Rao)
  2. ^ Durstenfeld, Richard (July 1964). "Algorithm 235: Random permutation". Communications of the ACM 7 (7): 420. doi:10.1145/364520.364540. ISSN 0001-0782. 
  3. ^ Knuth, Donald E. (1969). The Art of Computer Programming volume 2: Seminumerical algorithms. Reading, MA: Addison-Wesley, 124–125. OCLC 85975465. 
  4. ^ Knuth [1969] (1998). The Art of Computer Programming vol. 2, 3rd ed., 145–146. ISBN 0-201-89684-2. OCLC 38207978. 
  5. ^ Black, Paul E. (2005-12-19). Fisher-Yates shuffle. Dictionary of Algorithms and Data Structures. National Institute of Standards and Technology. Retrieved on 2007-08-09.
  6. ^ Wilson, Mark C. (2004-06-21). "Overview of Sattolo's Algorithm" in Algorithms Seminar 2002–2004. F. Chyzak (ed.), summary by Éric Fusy. INRIA Research Report 5542: 105–108. ISSN 0249-6399. 
  7. ^ A simple shuffle that proved not so simple after all. require ‘brain’ (2007-06-19). Retrieved on 2007-08-09.
  8. ^ java.util.Random.nextInt(int). Java 2 Platform SE v1.4.2 documentation. Retrieved on 2007-08-09.