Pseudorandom number generator

From Wikipedia, the free encyclopedia

A pseudorandom number generator (PRNG) is an algorithm that generates a sequence of numbers which are not truly random. The outputs of pseudorandom number generators only approximate some of the properties of random numbers. Although truly random numbers are believed to be generatable using hardware random number generators, pseudo-random numbers are important in practice for simulations (eg, of physical systems with the Monte Carlo method), and are central in the practice and so in the theory of cryptography. Careful mathematical analysis is required to have any confidence a PRNG generates numbers that are sufficiently "random" to suit the intended use. Robert R. Coveyou of Oak Ridge National Laboratory once titled an article, "The generation of random numbers is too important to be left to chance."1 and John von Neumann put essentially the same idea in a slightly more colorful way, "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."2

Most pseudo-random generator algorithms produce sequences which are uniformly distributed by any of several tests. Common classes of these algorithms are linear congruential generators, lagged Fibonacci generators, linear feedback shift registers and generalised feedback shift registers. Recent instances of pseudo-random algorithms include Blum Blum Shub, Fortuna, and the Mersenne twister.

Contents

[edit] Inherent nonrandomness

Because any PRNG run on a deterministic computer (contrast quantum computer) is necessarily a deterministic algorithm, its output will inevitably have one property that a true random sequence can never have: periodicity. It is certain that, if such a generator has only a finite memory, it will, given sufficient time, revisit a previous internal state, after which it will repeat the prior sequence forever. Non-periodic generators for deterministic computers can be designed, but memory requirements would grow without limit during runs; this is not, of course, possible in real equipment. A PRNG can be started from an arbitrary starting point, using a 'random' seed state; it will always produce the same sequence thereafter when initialized with that state.

The undesirable consequences of this periodicity can sometimes be avoided in practice. The length of the maximum period typically doubles with each bit of 'state' added, and it is thus relatively easy to build PRNGs with periods so long no computer could complete a single period in the expected lifetime of the universe, and this is sufficient in some applications. It is an open question, and one central to the theory and practice of cryptography, whether there is any way to distinguish the output of a high-quality PRNG from a truly random sequence (ie, white noise or its equivalent) without knowing, for instance, the algorithm(s) used and the state with which it was initialized. The security of most cryptographic algorithms and protocols using PRNGs is based on the assumption that it is infeasible to distinguish use of a suitable PRNG from a random sequence. The simplest examples of this dependency are stream ciphers, which (most often) works by exclusive oring the plaintext of a message with the output of a PRNG, producing ciphertext. The design of adequate PRNGs is extremely difficult, and most programs use simpler PRNGs, even some cryptosystems which should certainly not do so.

R P Brent's algorithm can be used to determine the period length of deterministic PRNGs; the result may be surprising, but they can be used to assist in evaluating the acceptability of a PRNG algorithm to a particular use.

[edit] Problems with deterministic generators

In practice, the output from many common PRNGs exhibit artifacts which cause them to fail statistical pattern detection tests. These include, but are certainly not limited to

  • Shorter than expected periods for some seed states (ie, not a full period); such seed states may be called 'weak' in this context
  • Poor dimensional distribution of the output sequence
  • Correlation of successive values
  • Some bit sequences being 'more random' than others
  • Lack of uniformity of distribution

Defects exhibited by flawed PRNGs range from unnoticeable to absurdly obvious. The RANDU random number algorithm used for decades on mainframe computers was seriously flawed, and much research work of that period is less reliable than it might have been, and should have been, as a result. The RAND function included in current Microsoft operating systems has only 15 bits of state and thus repeats after generating 32768 bytes. Sometimes, but not always, problems with models are noticed and traced to inadequate generators, which in turn sometimes leads to improvements in PRNGs. This is certainly not as often noticed as it should be, and so it is better to use good PRNGs from the beginning.

[edit] Early approaches

An early computer-based PRNG, suggested by John von Neumann in 1946, is known as the middle-square method. It is very simple: take any number, square it, remove the middle digits of the resulting number as your "random number", then use that number as the seed for the next iteration. For example, squaring the number "1111" yields "1234321", which can be written as "01234321", an 8-digit number being the square of a 4-digit number. This gives "2343" as the "random" number. Repeating this procedure gives "4896" as the next result, and so on. Von Neumann used 10 digit numbers, but the process was the same.

A problem with the "middle square" method is that all sequences eventually repeat themselves, some very quickly, such as "0000". Von Neumann was aware of this, but he found the approach sufficient for his purposes, and was worried that mathematical "fixes" would simply hide errors rather than remove them. The middle square method's errors are easy to detect.

Von Neumann judged hardware random number generators unsuitable, for, if they did not record the output generated, they could not later be tested for errors. If they did record their output, they would exhaust the limited computer memories available then, and so the computer's ability to read and write numbers. If the numbers were written to cards, they would take very much longer to write and read. On the ENIAC computer he was using, the "middle square" method generated numbers at a rate some two hundred times faster than reading numbers in from punch cards.

The middle-square method has been supplanted for most purposes by more elaborate generators.

[edit] Mersenne twister

The 1997 invention of the Mersenne twister algorithm, by Makoto Matsumoto and Takuji Nishimura, avoids many of the problems with earlier generators. It has the colossal period of 219937-1 iterations (likely more than the number of computations which could be performed within the entire future existence of the universe), is proven to be equidistributed in (up to) 623 dimensions (for 32-bit values), and runs faster than all but the least statistically reasonable generators. It is now increasingly becoming the "random number generator of choice" for statistical simulations and generative modeling.

Although suitable for other purposes, the Mersenne twister is not suitable for use in cryptography. It is equivalent to a generalized feedback shift register, and its output, therefore, is predictable.

[edit] Decimal expansion

The decimal expansion of a reciprocal of the form 1/q, for "suitable" values of q, provides an easily-implemented source of (pseudo-)random numbers [1]. Matthews showed that a sufficient condition for the method to produce a stream of high quality random numbers of length q - 1 is that q = 2S + 1 where S is a Sophie Germain prime, a prime such that both S and 2S + 1 are prime, with S being of the form 3, 9 or 11 mod 20. Thus “suitable” prime numbers q are 7, 23, 47, 59, 167, 179, etc (corresponding to S = 3, 11, 23, 29, 83, 89, etc.). The result is a stream of length q - 1 digits (including leading zeros). So, for example, using q = 23 generates the random digits 0,4,3,4,7,8,2,6.....3,9,1,3.

[edit] Cryptographically secure pseudorandom number generators

Main article: Cryptographically secure pseudorandom number generator

A PRNG suitable for cryptographic applications is called a cryptographically secure PRNG (CSPRNG). The chief difference between a PRNG and a CSPRNG is simple: in a cryptographical setting the state of the PRNG is a secret, and guessing it from the PRNG output must be as hard as possible, whereas normal applications only care about the statistical properties of the PRNG output (adversaries cannot achieve anything useful by predicting future pseudorandom numbers).

Some classes of CSPRNGs include the following:

[edit] Evaluation criteria example

The German Institute for Security in Information Technology has established a four part criteria for quality of deterministic random number generators. They are summarized here:

  • K1 -- A sequence of random numbers with a high probablility of containing no identical consecutive elements.
  • K2 -- A sequence of numbers which is indistinguishable from 'true random' numbers according to specified statistical tests. The tests are the monobit test (equal numbers of ones and zeros in the sequence), poker test (a special instance of the chi-squared test), runs test (counts the frequency of runs of various lengths), longruns test (checks whether there exists any run of length 34 or greater in 20 000 bits of the sequence) -- both from BSI2 (AIS 20, v. 1, 1999) and FIPS (140-1, 1994), and the autocorrelation test. In essence, these requirements are a test of how well a bit sequence: has zeros and ones equally often; after a sequence of n zeros (or ones), the next bit a one (or zero) with probablility one-half; and any selected subsequence contains no information about the next element(s) in the sequence.
  • K3 -- It should be impossible for any attacker (for all practical purposes) to calculate, or otherwise guess, from any given sub-sequence, any previous or future values in the sequence, nor any inner state of the generator.
  • K4 -- It should be impossible, for all practical purposes, for an attacker to calculate, or guess from an inner state of the generator, any previous numbers in the sequence or any previous inner generator states.

For cryptographic applications, only generators meeting the K4 standard are really acceptable.

[edit] See also

[edit] Notes

1 Peterson. Ivars. The Jungles of Randomness: A Mathematical Safari. Wiley, NY, 1998. (pp. 178) ISBN 0-471-16449-6

2 "Various techniques used in connection with random digits", Applied Mathematics Series, no. 12, 36-38 (1951).

[edit] References

  • Michael Luby, Pseudorandomness and Cryptographic Applications, Princeton Univ Press, 1996. A definitive source of techniques for provably random sequences.
  • Donald Knuth. The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89684-2. Chapter 3, pp.1–193. Extensive coverage of statistical tests for non-randomness.
  • R. Matthews Maximally Periodic Reciprocals Bulletin of the Institute of Mathematics and its Applications 28 147-148 1992
  • J. Viega, Practical Random Number Generation in Software, in Proc. 19th Annual Computer Security Applications Conference, Dec. 2003.
  • John von Neumann, "Various techniques used in connection with random digits," in A.S. Householder, G.E. Forsythe, and H.H. Germond, eds., Monte Carlo Method, National Bureau of Standards Applied Mathematics Series, 12 (Washington, D.C.: U.S. Government Printing Office, 1951): 36-38.
  • NIST Recommendation for Random Number Generation Using Deterministic Random Bit Generators

[edit] External links