Standard probability space


Discrete (that is, finite or countably infinite) probability spaces are technically simple. In contrast, non-discrete probability spaces are technically complicated objects of measure theory; some of them are pathologic, others are standard (in other words, Lebesgue-Rokhlin) probability spaces.

Non-discrete probability spaces are indispensable when formalizing various ideas; two examples:

- tossing a fair coin endlessly (Bernoulli process);
- a random variable distributed uniformly on a given interval.

Some probabilists hold that only standard probability spaces are pertinent to probability theory, and therefore regret that standardness is not included in the definition of `probability space'. Others disagree.


Arguments against standardness

- the definition of standardness is technically demanding;
- so are the theorems based on that definition;
- it is possible (and natural) to build all of probability theory without standardness;
- events and random variables are essential, while probability spaces are auxiliary and should not be taken too seriously.

Arguments in favour of standardness

- conditioning is easy and natural on standard probability spaces; otherwise it becomes obscure;
- the same holds for measure-preserving transformations between probability spaces, group actions on a probability space, etc.;
- ergodic theory uses standard probability spaces routinely and successfully;
- since these (auxiliary) probability spaces cannot be eliminated, we should make them as useful as possible.

Short history

Norbert Wiener constructed the Wiener process (also called `Brownian motion') in the form of a measurable map from the interval $\textstyle (0,1)$ to the space $\textstyle C[0,\infty)$ of continuous functions. The map sends the Lebesgue measure $\textstyle \text{mes}$ on $\textstyle (0,1)$ to the Wiener measure on $\textstyle C[0,\infty)$. Thus, the seemingly one-dimensional probability space $\textstyle ((0,1),\text{mes})$ turned out to be rich enough to carry a (non-degenerate) random element of the infinite-dimensional space $\textstyle C[0,\infty)$. See Lecture 1 in [7].

Vladimir Rokhlin created the theory of standard probability spaces in 1940 (see [1], p. 2); it was published in brief in 1947, in detail in 1949 in Russian and in 1952 in English, and reprinted in 1962 [1]. He showed that the probability space $\textstyle ((0,1),\text{mes})$ is sufficient for all `reasonable' purposes of probability theory, and has important advantages over general probability spaces. For modernized presentations see [2], [3], and Sect. 2.4 of [4].

Nowadays standard probability spaces may be (and often are) treated in the framework of descriptive set theory, via standard Borel spaces; see, for example, Sect. 17 of [5].

What is meant here by `pathologic'?

An example: a naive white noise

The space of all functions $\textstyle f : \mathbb{R} \to \mathbb{R}$ may be thought of as the product $\textstyle \mathbb{R}^\mathbb{R}$ of a continuum of copies of the real line $\textstyle \mathbb{R}$. One may equip $\textstyle \mathbb{R}$ with a probability measure, say, the standard normal distribution $\textstyle \gamma = N(0,1)$, and treat the space of functions as the product $\textstyle (\mathbb{R},\gamma)^\mathbb{R}$ of a continuum of identical probability spaces $\textstyle (\mathbb{R},\gamma)$. The product measure $\textstyle \gamma^\mathbb{R}$ is a probability measure on $\textstyle \mathbb{R}^\mathbb{R}$. Many non-experts are inclined to believe that $\textstyle \gamma^\mathbb{R}$ describes the so-called white noise.

However, it does not. For the white noise, its integral from $0$ to $1$ should be a random variable distributed $\textstyle N(0,1)$. In contrast, the integral (from $0$ to $1$) of $\textstyle f \in (\mathbb{R},\gamma)^\mathbb{R}$ is undefined. Even worse, $\textstyle f$ fails to be almost surely measurable. Still worse, the probability of $\textstyle f$ being measurable is undefined. And the worst thing: if $\textstyle X$ is a random variable distributed (say) uniformly on $\textstyle (0,1)$ and independent of $\textstyle f$, then $\textstyle f(X)$ is not a random variable at all! (It lacks measurability.)

Another example: a perforated interval

Let $\textstyle Z \subset (0,1)$ be a set whose inner Lebesgue measure is equal to $0$ but whose outer Lebesgue measure is equal to $1$ (thus, $\textstyle Z$ is nonmeasurable in the extreme). There exists a probability measure $\textstyle m$ on $\textstyle Z$ such that $\textstyle m(Z \cap A) = \text{mes} (A)$ for every Lebesgue measurable $\textstyle A \subset (0,1)$. Events and random variables on the probability space $\textstyle (Z,m)$ (treated $\textstyle \operatorname{mod} \, 0$) are in a natural one-to-one correspondence with events and random variables on the probability space $\textstyle ((0,1),\text{mes})$. Many non-experts are inclined to conclude that the probability space $\textstyle (Z,m)$ is as good as $\textstyle ((0,1),\text{mes})$.

However, it is not. A random variable $\textstyle X$ defined by $\textstyle X(\omega)=\omega$ is distributed uniformly on $\textstyle (0,1)$. The conditional measure, given $\textstyle X=x$, is just a single atom (at $\textstyle x$), provided that $\textstyle ((0,1),\text{mes})$ is the underlying probability space. However, if $\textstyle (Z,m)$ is used instead, then the conditional measure does not exist when $\textstyle x \notin Z$.

A perforated circle is constructed similarly. Its events and random variables are the same as on the usual circle. The group of rotations acts on them naturally. However, it fails to act on the perforated circle.

A definition

One of several well-known equivalent definitions of standardness is given below, after some preparations. All probability spaces are assumed to be complete.

Isomorphism

An isomorphism between two probability spaces $\textstyle (\Omega_1,\mathcal{F}_1,P_1)$, $\textstyle (\Omega_2,\mathcal{F}_2,P_2)$ is an invertible map $\textstyle f : \Omega_1 \to \Omega_2$ such that $\textstyle f$ and $\textstyle f^{-1}$ are both (measurable and) measure preserving maps.

Two probability spaces are isomorphic if there exists an isomorphism between them.

Isomorphism modulo zero

Two probability spaces $\textstyle (\Omega_1,\mathcal{F}_1,P_1)$, $\textstyle (\Omega_2,\mathcal{F}_2,P_2)$ are isomorphic $\textstyle \operatorname{mod} \, 0$ if there exist null sets $\textstyle A_1 \subset \Omega_1$, $\textstyle A_2 \subset \Omega_2$ such that the probability spaces $\textstyle \Omega_1 \setminus A_1$, $\textstyle \Omega_2 \setminus A_2$ are isomorphic (each being endowed with its natural sigma-field and probability measure).
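
A classical illustration, not taken from the references cited in this article: the binary-expansion map from the coin-tossing space $\textstyle \{0,1\}^\infty$ to $\textstyle (0,1)$ is an isomorphism $\textstyle \operatorname{mod} \, 0$, the discarded null sets being the eventually constant sequences on one side and the dyadic rationals on the other. The Python sketch below checks only a finite-depth shadow of this fact: each event fixed by the first $n$ tosses has probability $2^{-n}$ and corresponds to a dyadic interval of the same length, and these intervals tile $[0,1)$. The function names and the choice of depth are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product


def cylinder_probability(bits):
    """Fair-coin probability that the first len(bits) tosses equal bits."""
    return Fraction(1, 2) ** len(bits)


def dyadic_interval(bits):
    """Interval [a, b) of points whose binary expansion starts with bits."""
    a = sum(Fraction(b, 2 ** (k + 1)) for k, b in enumerate(bits))
    return a, a + Fraction(1, 2 ** len(bits))


def check_depth(n):
    """True if depth-n cylinders map to intervals of equal measure tiling [0, 1)."""
    pairs = [
        (dyadic_interval(bits), cylinder_probability(bits))
        for bits in product((0, 1), repeat=n)
    ]
    # Lebesgue measure of each interval equals the coin-tossing probability.
    lengths_match = all(b - a == p for (a, b), p in pairs)
    # The intervals cover [0, 1) with no gaps and no overlaps.
    intervals = sorted(iv for iv, _ in pairs)
    tiles = (
        intervals[0][0] == 0
        and intervals[-1][1] == 1
        and all(
            intervals[i][1] == intervals[i + 1][0]
            for i in range(len(intervals) - 1)
        )
    )
    return lengths_match and tiles
```

Exact rational arithmetic is used so that the equalities are checked without rounding. The check says nothing about the null sets themselves, which are invisible at any finite depth.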

Standard probability space

A probability space is standard if it is isomorphic $\textstyle \operatorname{mod} \, 0$ to an interval with Lebesgue measure, a finite or countable set of atoms, or a combination (disjoint union) of both.

See Sect. 2.4 (p. 20) of [1]; Proposition 6 (p. 249) and Remark 2 (p. 250) in [2]; and Theorem 4-3 in [3]. See also Sect. 17.F of [5], and [4] (especially Sect. 2.4 and Exercise 3.1(v)).

Verifying the standardness

Every probability distribution on the space $\textstyle \mathbb{R}^n$ turns it into a standard probability space. (Here, a probability distribution means a probability measure defined initially on the Borel sigma-algebra and completed.)

The same holds on every Polish space, see Sect. 2.7 (p. 24) of [1]; Example 1 (p. 248) in [2]; Theorem 2-3 in [3]; and Theorem 2.4.1 in [4].
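
On the real line, one concrete link with the interval is the quantile transform, a standard device that is not described in the sources cited here: if $F$ is a distribution function and $Q(u) = \inf\{x : F(x) \ge u\}$, then $Q$ sends the Lebesgue measure on $\textstyle (0,1)$ to the distribution with distribution function $F$. The Python sketch below is only a numerical sanity check of this push-forward property. The exponential distribution, the midpoint grid and all function names are assumptions chosen for the illustration.

```python
import math


def exponential_cdf(x, rate=1.0):
    """Distribution function F of the exponential distribution."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def exponential_quantile(u, rate=1.0):
    """Quantile function Q, the inverse of F on (0, 1)."""
    return -math.log1p(-u) / rate


def pushed_forward_mass(threshold, n=100_000, rate=1.0):
    """Approximate mes{u in (0,1) : Q(u) <= threshold} on a midpoint grid."""
    hits = sum(
        1
        for k in range(n)
        if exponential_quantile((k + 0.5) / n, rate) <= threshold
    )
    return hits / n
```

For each threshold $t$, `pushed_forward_mass(t)` should agree with `exponential_cdf(t)` up to the grid resolution.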

For example, the Wiener measure turns the Polish space $\textstyle C[0,\infty)$ (endowed with the topology of local uniform convergence) into a standard probability space.

Another example: for every sequence of random variables, their joint distribution turns the Polish space $\textstyle \mathbb{R}^\infty$ (of sequences; endowed with the product topology) into a standard probability space.

(Thus, the idea of dimension, very natural for topological spaces, is utterly inappropriate for standard probability spaces.)

The product of two standard probability spaces is a standard probability space.

The same holds for the product of countably many spaces, see Sect. 3.4 of [1], Proposition 12 in [2], and Theorem 2.4.3 in [4].
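
A familiar heuristic for why the square $\textstyle (0,1)^2$ with two-dimensional Lebesgue measure is no richer than the interval, given here as an illustration and not as the argument of the cited sources, is to interleave binary digits. At finite depth, interleaving is a bijection between pairs of $n$-bit words and $2n$-bit words, so a dyadic square of area $4^{-n}$ corresponds to a dyadic interval of length $4^{-n}$. The Python sketch below checks this finite-depth bijection; the function names are assumptions.

```python
from itertools import product


def interleave(x_bits, y_bits):
    """Merge two equal-length bit tuples as x1, y1, x2, y2, ..."""
    merged = []
    for xb, yb in zip(x_bits, y_bits):
        merged.extend((xb, yb))
    return tuple(merged)


def deinterleave(bits):
    """Split a bit tuple back into its even- and odd-position parts."""
    return tuple(bits[0::2]), tuple(bits[1::2])


def is_bijection_at_depth(n):
    """True if interleaving maps pairs of n-bit words one-to-one onto 2n-bit words."""
    words = list(product((0, 1), repeat=n))
    images = {interleave(x, y) for x in words for y in words}
    round_trip = all(
        deinterleave(interleave(x, y)) == (x, y) for x in words for y in words
    )
    # 2n-bit words number 4**n, so hitting 4**n distinct images means "onto".
    return round_trip and len(images) == 4 ** n
```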

A measurable subset of a standard probability space is a standard probability space. It is assumed that the set is not a null set, and is endowed with the conditional measure. See Sect. 2.3 (p. 14) of [1] and Proposition 5 in [2].

Using the standardness

Regular conditional probabilities

In the discrete setup, the conditional probability is another probability measure, and the conditional expectation may be treated as the (usual) expectation with respect to the conditional measure (see conditional expectation). In the non-discrete setup, conditioning is often treated indirectly, since the condition may have probability 0. As a result, a number of well-known facts have special `conditional' counterparts, for example: linearity of the expectation, Jensen's inequality, Hölder's inequality, the monotone convergence theorem, etc.

Given a random variable $\textstyle Y$ on a probability space $\textstyle (\Omega,\mathcal{F},P)$, it is natural to try constructing a conditional measure $\textstyle P_y$, that is, the conditional distribution of $\textstyle \omega \in \Omega$ given $\textstyle Y(\omega)=y$. In general this is impossible (see Sect. 4.1(c) in [6]). However, for a standard probability space $\textstyle (\Omega,\mathcal{F},P)$ this is possible, and well known as the canonical system of measures (see Sect. 3.1 of [1]), which is basically the same as conditional probability measures (see Sect. 3.5 in [4]), disintegration of measure (see Exercise (17.35) in [5]), and regular conditional probabilities (see Sect. 4.1(c) in [6]).

The conditional Jensen's inequality is just the (usual) Jensen's inequality applied to the conditional measure. The same holds for many other facts.
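
In a finite setting the conditional measure $\textstyle P_y$ can be written down directly, which makes this remark easy to check by hand. The Python sketch below uses a hypothetical four-point space whose outcomes are pairs $(x, y)$, with $X$ and $Y$ the coordinate maps; it builds $\textstyle P_y$ by restricting and renormalising, and then applies the ordinary Jensen inequality to it. The space, the probabilities and the function names are assumptions made for the illustration.

```python
from fractions import Fraction

# Hypothetical finite probability space: outcomes (x, y) with their probabilities.
P = {
    (-1, 0): Fraction(1, 6),
    (2, 0): Fraction(1, 3),
    (0, 1): Fraction(1, 4),
    (3, 1): Fraction(1, 4),
}


def conditional_measure(P, y):
    """The measure P_y: P restricted to the event {Y = y} and renormalised."""
    mass = sum(p for (_, yy), p in P.items() if yy == y)
    return {w: p / mass for w, p in P.items() if w[1] == y}


def expectation(measure, g):
    """Ordinary expectation of g with respect to a finite measure."""
    return sum(p * g(w) for w, p in measure.items())


def jensen_gap(P, y, phi):
    """E[phi(X) | Y = y] - phi(E[X | Y = y]); non-negative when phi is convex."""
    P_y = conditional_measure(P, y)
    mean = expectation(P_y, lambda w: w[0])
    return expectation(P_y, lambda w: phi(w[0])) - phi(mean)
```

With the convex function $\varphi(t) = t^2$, the gap is the conditional variance of $X$ given $Y = y$, so it is non-negative for each $y$.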

Measure preserving transformations

Given two probability spaces $\textstyle (\Omega_1,\mathcal{F}_1,P_1)$, $\textstyle (\Omega_2,\mathcal{F}_2,P_2)$ and a measure preserving map $\textstyle f : \Omega_1 \to \Omega_2$, the image $\textstyle f(\Omega_1)$ need not cover the whole $\textstyle \Omega_2$; it may miss a null set. It may seem that $\textstyle P_2(f(\Omega_1))$ has to be equal to 1, but it is not so. The outer measure of $\textstyle f(\Omega_1)$ is equal to 1, but the inner measure may differ. However, if the probability spaces $\textstyle (\Omega_1,\mathcal{F}_1,P_1)$, $\textstyle (\Omega_2,\mathcal{F}_2,P_2)$ are standard, then $\textstyle P_2(f(\Omega_1))=1$; see Theorem 3-2 in [3]. If $\textstyle f$ is also one-to-one, then every $\textstyle A \in \mathcal{F}_1$ satisfies $\textstyle f(A) \in \mathcal{F}_2$ and $\textstyle P_2(f(A))=P_1(A)$. Therefore $\textstyle f^{-1}$ is measurable (and measure preserving). See Sect. 2.5 (p. 20) of [1] and Theorem 3-5 in [3]. See also Proposition 9 in [2] (and the Remark after it).

Striving to get rid of null sets, mathematicians often use equivalence classes of measurable sets or functions. Equivalence classes of measurable subsets of a probability space form a normed complete Boolean algebra called the measure algebra (or metric structure). Every measure preserving map $\textstyle f : \Omega_1 \to \Omega_2$ leads to a homomorphism $\textstyle F$ of measure algebras; basically, $\textstyle F(B) = f^{-1}(B)$ for $\textstyle B\in\mathcal{F}_2$.

It may seem that every homomorphism of measure algebras has to correspond to some measure preserving map, but it is not so. However, for standard probability spaces each $\textstyle F$ corresponds to some $\textstyle f$. See Sect. 2.6 (p. 23) and 3.2 of [1] and Sect. 17.F of [5].
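
The passage from a map $\textstyle f$ to the homomorphism $\textstyle F(B) = f^{-1}(B)$ can be made concrete on finite spaces. If no point has probability zero, the measure algebra is simply the algebra of all subsets. The Python sketch below takes a hypothetical three-point space, a two-point space and a measure preserving map between them, and checks that taking preimages preserves measure, complements and unions. The spaces, the map and the function names are assumptions made for the illustration; the difficulties discussed above arise only for non-discrete spaces and are not visible here.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite spaces; the map f pushes P1 forward to P2.
P1 = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
P2 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
f = {"a": 0, "b": 0, "c": 1}


def measure(P, A):
    """Probability of the event A under the finite measure P."""
    return sum(P[w] for w in A)


def preimage(B):
    """F(B) = f^{-1}(B), an event of the first space."""
    return frozenset(w for w in f if f[w] in B)


def subsets(points):
    """All subsets of a finite set, as frozensets."""
    pts = list(points)
    return [
        frozenset(c)
        for c in chain.from_iterable(
            combinations(pts, r) for r in range(len(pts) + 1)
        )
    ]


def is_homomorphism():
    """True if B -> f^{-1}(B) preserves measure, complements and unions."""
    omega1, omega2 = frozenset(P1), frozenset(P2)
    events = subsets(omega2)
    return all(
        measure(P1, preimage(B)) == measure(P2, B)
        and preimage(omega2 - B) == omega1 - preimage(B)
        and all(preimage(B | C) == preimage(B) | preimage(C) for C in events)
        for B in events
    )
```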

Further reading

[1] V.A. Rohlin, "On the fundamental ideas of measure theory", Translations (American Mathematical Society) Series 1, Vol. 10, 1-54 (1962). Translated from Russian: В.А. Рохлин, "Об основных понятиях теории меры", Математический Сборник (новая серия) 25(67), 107-150 (1949).
[2] J. Haezendonck, "Abstract Lebesgue-Rohlin spaces", Bulletin de la Societe Mathematique de Belgique 25, 243-258 (1973).
[3] T. de la Rue, "Espaces de Lebesgue", Lecture Notes in Mathematics (Seminaire de Probabilites XXVII), Springer, Berlin, 1557, 15-21 (1993).
[4] K. Itô, "Introduction to probability theory", Cambridge Univ. Press 1984.
[5] A.S. Kechris, "Classical descriptive set theory", Springer 1995.
[6] R. Durrett, "Probability: theory and examples" (second edition), 1996.
[7] N. Wiener, "Nonlinear problems in random theory", M.I.T. Press 1958.
[8] Lectures of B. Tsirelson (especially, Sect. 2a and 2b).
