Algorithmically random sequence
From Wikipedia, the free encyclopedia
Intuitively, an algorithmically random sequence (or random sequence) is an infinite sequence of binary digits that appears random to any algorithm. The definition applies equally well to sequences on any finite set of characters. Random sequences are key objects of study in algorithmic information theory.
Because infinite sequences of binary digits can be identified with real numbers in the unit interval, random binary sequences are often called random real numbers. Additionally infinite binary sequences correspond to characteristic functions of sets of natural numbers; therefore those sequences might be seen as sets of natural numbers.
The class of all algorithmically random (binary) sequences is denoted by RAND.
Contents |
[edit] History
The first suitable definition of a random sequence was given by Per Martin-Löf in 1966. Earlier researchers such as Richard von Mises had attempted to formalize the notion of a test for randomness in order to define a random sequence as one that passed all tests for randomness; however, the precise notion of a randomness test was left vague. Martin-Löf's key insight was to use the theory of computation to formally define the notion of a test for randomness. This contrasts with the idea of randomness in probability; in that theory, no particular element of a sample space can be said to be random.
Martin-Löf randomness has since been shown to admit many equivalent characterizations -- in terms of compression, randomness tests, and gambling -- that bear little outward resemblance to the original definition, but each of which satisfy our intuitive notion of properties that random sequences ought to have: random sequences should be incompressible, they should pass statistical tests for randomness, and it should be difficult to make money betting on them. The existence of these multiple definitions of Martin-Löf randomness, and the stability of these definitions under different models of computation, give evidence that Martin-Löf randomness is a fundamental property of mathematics and not an accident of Martin-Löf's particular model.
[edit] Definitions
Martin-Löf's original definition of a random sequence was in terms of constructive null covers; he defined a sequence to be random if it is not contained in any such cover. Leonid Levin and Claus-Peter Schnorr proved a characterization in terms of Kolmogorov complexity: a sequence is random if there is a uniform bound on the compressibility of its initial segments. Schnorr gave a third equivalent definition in terms of martingales (a type of betting strategy). Li and Vitanyi's book An Introduction to Kolmogorov Complexity and Its Applications is an excellent introduction these ideas.
- Kolmogorov complexity (Schnorr 1973, Levin 1973): Kolmogorov complexity can be thought of as a lower bound on the algorithmic compressibility of a finite string. It assigns to each string w a natural number K(w) that, intuitively, measures that minimum length of a computer program that will output w when run with no input. Given a natural number c and a string w, we say that w is c-incompressible if .
- A sequence is random if and only if there is a constant c such that all prefixes of S are c-incompressible.
- Constructive null covers (Martin-Löf 1966): This is Martin-Löf's original definition. For a finite binary string w we let Cw denote the cylinder generated by w. This is the set of all infinite sequences beginning with w, which is a basic open set in Cantor space. The product measure μ(Cw) of the cylinder generated by w is defined to be 2-|w|. Every open subset of Cantor space is the union of a countable sequence of disjoint basic open sets, and the measure of an open set is exactly the sum of the measures of any such sequence. An effective open set is an open set that is the union of the sequence of basic open sets determined by a recursively enumerable sequence of binary strings. A constructive null cover or effective measure 0 set is a recursively enumerable sequence Ui of effective open sets (that is, a sequence of indices for open sets) such that and for each natural number i. Every effective null cover determines a Gδ set of measure 0.
- A sequence is defined to be random if it is not contained in any Gδ set determined by an effective null cover.
- Constructive martingales (Schnorr 1971): A martingale is a function such that, for all finite strings w, d(w) = (d(w0) + d(w1)) / 2. (This is Ville's original definition of a martingale, later extended by Joseph Leo Doob.) A constructive martingale d is a martingale that is constructive (also known as weakly computable, lower semi-computable, subcomputable): there exists a computable function such that, for all finite binary strings w and positive integers t,
- A martingale d is said to succeed on a sequence S if where is the first n bits of S.
- A sequence is random if and only if no constructive martingale succeeds on it.
[edit] Interpretations of the definitions
The Kolmogorov complexity characterization conveys the intuition that a random sequence is incompressible: no prefix can be produced by a program much shorter than the prefix.
The null cover characterization conveys the intuition that a random real number should not have any property that is “uncommon”. Each measure 0 set can be thought of as an uncommon property. It is not possible for a sequence to lie in no measure 0 sets, because each one-point set has measure 0. Martin-Löf's idea was to limit the definition to measure 0 sets that are effectively describable; the definition of an effective null cover determines a countable collection of effectively describable measure 0 sets and defines a sequence to be random if it does not lie in any of these particular measure 0 sets. Since the union of a countable collection of measure 0 sets has measure 0, this definition immediately leads to the theorem that there is a measure 1 set of random sequences.
The martingale characterization conveys the intuition that no effective procedure should be able to make money betting against a random sequence. A martingale d is a betting strategy. d reads a finite string w and bets money on the next bit. It bets some fraction of its money that the next bit will be 0, and then remainder of its money that the next bit will be 1. d doubles the money it placed on the bit that actually occurred, and it loses the rest. d(w) is the amount of money it has after seeing the string w. Since the bet placed after seeing the string w can be calculated from the values d(w), d(w0), and d(w1), calculating the amount of money it has is equivalent to calculating the bet. The martingale characterization says that no betting strategy implementable by any computer (even in the weak sense of constructive strategies, which are not necessarily computable) can make money betting on a random sequence.
[edit] Properties of random sequences
- There is a constructive null cover of RANDc (where RANDc denotes the complement of RAND). This means that all effective tests for randomness (that is, constructive null covers) are, in a sense, subsumed by this universal test for randomness, since any sequence that passes this single test for randomness will pass all tests for randomness. (Martin-Löf 1966)
- There is a universal constructive martingale d. This martingale is universal in the sense that, given any constructive martingale d, if d succeeds on a sequence, then d succeeds on that sequence as well. Thus, d succeeds on every sequence in RANDc (but, since d is constructive, it succeeds on no sequence in RAND). (Schnorr 1971)
- The class RAND is a subset of Cantor space, where refers to the second level of the arithmetical hierarchy. This is because a sequence S is in RAND if and only if there is some open set in the universal effective null cover that does not contain S; this property can be seen to be definable by a formula.
- There is a random sequence which is , that is, computable relative to an oracle for the Halting problem. (Schnorr 1971) Chaitin's Ω is an example of such a sequence.
- No random sequence is decidable, computably enumerable, or co-computably-enumerable. Since these correspond to the , , and levels of the arithmetical hierarchy, this means that is the lowest level in the arithmetical hierarchy where random sequences can be found.
- RANDc is a measure 0 subset of the set of all infinite sequences. This is implied by the existence of a universal null cover. It is also implied by the fact that each constructive null cover covers a measure 0 set, there are only countably many null covers, and a countable union of measure 0 sets has measure 0. This implies that RAND is a measure 1 subset of the set of all infinite sequences.
- Every random sequence is normal.
- Every sequence is Turing reducible to some random sequence. (Kučera 1985/1989, Gács 1986). Thus there are random sequences of arbitrarily high Turing degree.
[edit] References
- Gács, P. "Every sequence is reducible to a random one". Information and Control, 70(2/3):186-192, August/September 1986.
- Kučera, A. "Measure, -classes and complete extensions of PA". Recursion Theory Week, Lecture Notes in Mathematics, 1141:245–259, 1985.
- Kučera, A. "On the use of diagonally nonrecursive functions". In Studies in Logic and the Foundations of Mathematics, volume 129, pages 219–239. North-Holland, 1989.
- Levin, L. "On the notion of a random sequence". Soviet Mathematics Doklady, 14:1413-1416, 1973.
- Li M. and Vitanyi P. M. B. An Introduction to Kolmogorov Complexity and its Applications. Springer-Verlag, Berlin, 1997. Second Edition.
- Martin-Löf, P. "The definition of random sequences." Information and Control 9, 602-619 (1966).
- Schnorr, C. P., "A unified approach to the definition of a random sequence." Mathematical Systems Theory, 5(3) (1971), 246-258.
- Schnorr, C. P., "Process complexity and effective random tests." Journal of Computer and System Sciences, 7:376-388, 1973.
- Ville, J. "Etude critique de la notion de collectif." Gauthier-Villars, Paris, 1939.