Continuous mapping theorem

In probability theory, the continuous mapping theorem states that continuous functions preserve limits even when their arguments are sequences of random variables. A continuous function, in Heine’s definition, is a function that maps convergent sequences into convergent sequences: if x_n → x then g(x_n) → g(x). The continuous mapping theorem states that this remains true if we replace the deterministic sequence {x_n} with a sequence of random variables {X_n}, and replace the standard notion of convergence of real numbers “→” with one of the types of convergence of random variables.

This theorem was first proved by Mann & Wald (1943), and it is therefore sometimes called the Mann–Wald theorem.^[1]

Statement

Let {X_n}, X be random elements taking values in a metric space S. Suppose a function g: S→S′ (where S′ is another metric space) has a set of discontinuity points D_g that satisfies Pr[X ∈ D_g] = 0. Then^[2]^[3]^[4]

$X_{n}\ {\xrightarrow {d}}\ X\quad \Rightarrow \quad g(X_{n})\ {\xrightarrow {d}}\ g(X);$
$X_{n}\ {\xrightarrow {p}}\ X\quad \Rightarrow \quad g(X_{n})\ {\xrightarrow {p}}\ g(X);$
$X_{n}\ {\xrightarrow {\!\!as\!\!}}\ X\quad \Rightarrow \quad g(X_{n})\ {\xrightarrow {\!\!as\!\!}}\ g(X).$
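The second implication can be illustrated numerically. The sketch below is only an illustration under assumed choices that do not come from the statement itself: X_n is the mean of n independent Uniform(0, 1) draws, the limit X is the constant 1/2, and g(x) = x². The function name `exceed_fraction` and its parameters are hypothetical. It estimates Pr(|g(X_n) − g(X)| > ε) by simulation, and the estimate should shrink as n grows.

```python
import random


def exceed_fraction(n, eps=0.05, reps=1000, seed=0):
    """Monte Carlo estimate of Pr(|g(X_n) - g(X)| > eps).

    Assumed setup (illustrative only): X_n is the mean of n independent
    Uniform(0, 1) draws, X = 1/2 is the constant limit, and g(x) = x**2.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    target = 0.5 ** 2          # g(X) for the constant limit X = 1/2
    hits = 0
    for _ in range(reps):
        x_n = sum(rng.random() for _ in range(n)) / n
        if abs(x_n ** 2 - target) > eps:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, exceed_fraction(n))
```

Because g is continuous everywhere here, D_g is empty and the assumption Pr[X ∈ D_g] = 0 holds trivially.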

Proof

This proof has been adapted from van der Vaart (1998, Theorem 2.3).

Spaces S and S′ are equipped with certain metrics. For simplicity we will denote both of these metrics using the |x−y| notation, even though the metrics may be arbitrary and not necessarily Euclidean.

Convergence in distribution

We will need a particular statement from the portmanteau theorem: that convergence in distribution $X_{n}{\xrightarrow {d}}X$ is equivalent to

\limsup _{{n\to \infty }}\operatorname {Pr}(X_{n}\in F)\leq \operatorname {Pr}(X\in F){\text{ for every closed set }}F.

Fix an arbitrary closed set F⊂S′. Denote by g⁻¹(F) the pre-image of F under the mapping g: the set of all points x ∈ S such that g(x)∈F. Consider a sequence {x_k} such that g(x_k) ∈ F and x_k → x. Then this sequence lies in g⁻¹(F), and its limit point x belongs to the closure of this set, $\overline{g^{-1}(F)}$ (by definition of the closure). The point x may be either:

a continuity point of g, in which case g(x_k) → g(x), and hence g(x)∈F because F is a closed set, and therefore in this case x belongs to the pre-image of F, or
a discontinuity point of g, so that x ∈ D_g.

Thus the following relationship holds:

\overline {g^{{-1}}(F)}\ \subset \ g^{{-1}}(F)\cup D_{g}\ .
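As a concrete instance of this inclusion, consider the following example, which is an assumed illustration and not part of the cited proof: take S = S′ = ℝ, let g be the indicator of the positive half-line, and let F = {1}. Here the closure of the pre-image adds exactly one point, and that point is the single discontinuity of g.

```latex
\begin{aligned}
g(x) &= \mathbf{1}\{x > 0\}, \qquad F = \{1\}, \qquad D_g = \{0\}, \\
g^{-1}(F) &= (0, \infty), \qquad
\overline{g^{-1}(F)} = [0, \infty) = g^{-1}(F) \cup D_g .
\end{aligned}
```

In this example the inclusion holds with equality; in general it may be strict.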

Consider the event {g(X_n)∈F}. The probability of this event can be estimated as

\operatorname {Pr}{\big (}g(X_{n})\in F{\big )}=\operatorname {Pr}{\big (}X_{n}\in g^{{-1}}(F){\big )}\leq \operatorname {Pr}{\big (}X_{n}\in \overline {g^{{-1}}(F)}{\big )},

and by the portmanteau theorem the limsup of the last expression is less than or equal to $\Pr\big(X \in \overline{g^{-1}(F)}\big)$. Using the inclusion derived in the previous paragraph, this quantity can be bounded as

{\begin{aligned}&\operatorname {Pr} {\big (}X\in {\overline {g^{-1}(F)}}{\big )}\leq \operatorname {Pr} {\big (}X\in g^{-1}(F)\cup D_{g}{\big )}\leq \\&\operatorname {Pr} {\big (}X\in g^{-1}(F){\big )}+\operatorname {Pr} (X\in D_{g})=\operatorname {Pr} {\big (}g(X)\in F{\big )}+0.\end{aligned}}

On plugging this back into the original expression, it can be seen that

\limsup_{n\to\infty} \Pr \big(g(X_n)\in F\big) \leq \Pr \big(g(X) \in F\big),

which, by the portmanteau theorem, implies that g(X_n) converges to g(X) in distribution.
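A numerical sanity check of this conclusion can be sketched as follows. The setup is an assumption made for illustration: X_n is a standardized sum of n Uniform(0, 1) draws, which converges in distribution to a standard normal Z by the central limit theorem, and g(x) = x². The helper name `prob_g_le` is hypothetical. The simulated value of Pr(g(X_n) ≤ t) should then be close to Pr(Z² ≤ t).

```python
import math
import random


def prob_g_le(n, t=1.0, reps=20000, seed=1):
    """Monte Carlo estimate of Pr(g(X_n) <= t) with g(x) = x**2.

    Assumed setup (illustrative only): X_n = (S_n - n/2) / sqrt(n/12),
    where S_n is a sum of n independent Uniform(0, 1) draws.
    """
    rng = random.Random(seed)
    scale = math.sqrt(n / 12.0)  # standard deviation of S_n
    hits = 0
    for _ in range(reps):
        s = sum(rng.random() for _ in range(n))
        x_n = (s - n / 2.0) / scale
        if x_n * x_n <= t:
            hits += 1
    return hits / reps


def limit_prob(t=1.0):
    """Pr(Z**2 <= t) = Pr(|Z| <= sqrt(t)) = erf(sqrt(t / 2)) for standard normal Z."""
    return math.erf(math.sqrt(t / 2.0))


if __name__ == "__main__":
    print(prob_g_le(30), limit_prob())
```

The two printed numbers agree only approximately, since the first carries both Monte Carlo error and the finite-n error of the normal approximation.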

Convergence in probability

Fix an arbitrary ε > 0. Then for any δ > 0 consider the set B_δ defined as

B_\delta = \big\{x\in S \mid x\notin D_g:\ \exists y\in S:\ |x-y|<\delta,\, |g(x)-g(y)|>\varepsilon\big\}.

This is the set of continuity points x of the function g(·) for which it is possible to find, within the δ-neighborhood of x, a point which maps outside the ε-neighborhood of g(x). By definition of continuity, this set shrinks as δ goes to zero, so that $\lim_{\delta\to 0} B_\delta = \varnothing$.
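To see how B_δ shrinks in a concrete case, the sketch below assumes g(x) = x² on the real line and a standard normal X; both choices and the function name `prob_B_delta` are illustrative and not taken from the source. For this g, the supremum of |x² − y²| over |x − y| < δ equals 2|x|δ + δ², so x ∈ B_δ exactly when |x| > (ε − δ²)/(2δ), and the probability of that event has a closed form.

```python
import math


def prob_B_delta(delta, eps=0.1):
    """Pr(X in B_delta) for the assumed example g(x) = x**2, X standard normal.

    x lies in B_delta iff 2*|x|*delta + delta**2 > eps, that is,
    iff |x| > (eps - delta**2) / (2*delta).
    """
    threshold = (eps - delta ** 2) / (2.0 * delta)
    if threshold <= 0:
        # delta is so large that (almost) every x has a nearby point
        # whose image is more than eps away.
        return 1.0
    # Pr(|X| > c) = erfc(c / sqrt(2)) for a standard normal X.
    return math.erfc(threshold / math.sqrt(2.0))


if __name__ == "__main__":
    for delta in (0.5, 0.1, 0.01):
        print(delta, prob_B_delta(delta))
```

Each B_δ in this example is non-empty, but the threshold grows without bound as δ → 0, so the sets decrease to the empty set and their probabilities tend to zero.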

Now suppose that |g(X) − g(X_n)| > ε. This implies that at least one of the following is true: either |X−X_n| ≥ δ, or X ∈ D_g, or X∈B_δ. In terms of probabilities this can be written as

\Pr {\big (}{\big |}g(X_{n})-g(X){\big |}>\varepsilon {\big )}\leq \Pr {\big (}|X_{n}-X|\geq \delta {\big )}+\Pr(X\in B_{\delta })+\Pr(X\in D_{g}).

On the right-hand side, the first term converges to zero as n → ∞ for any fixed δ, by the definition of convergence in probability of the sequence {X_n}. The second term does not depend on n and converges to zero as δ → 0, since the set B_δ shrinks to the empty set. The last term is identically equal to zero by the assumption of the theorem. Therefore, letting first n → ∞ and then δ → 0, we conclude that

\lim_{n\to\infty}\Pr \big(\big|g(X_n)-g(X)\big|>\varepsilon\big) = 0,

which means that g(X_n) converges to g(X) in probability.

Convergence almost surely

By definition of the continuity of the function g(·),

\lim _{{n\to \infty }}X_{n}(\omega )=X(\omega )\quad \Rightarrow \quad \lim _{{n\to \infty }}g(X_{n}(\omega ))=g(X(\omega ))

at each point X(ω) where g(·) is continuous. Therefore

{\begin{aligned}\operatorname {Pr} {\Big (}\lim _{n\to \infty }g(X_{n})=g(X){\Big )}&\geq \operatorname {Pr} {\Big (}\lim _{n\to \infty }g(X_{n})=g(X),\ X\notin D_{g}{\Big )}\\&\geq \operatorname {Pr} {\Big (}\lim _{n\to \infty }X_{n}=X,\ X\notin D_{g}{\Big )}=1.\end{aligned}}

The final equality holds because the intersection of two almost sure events is almost sure.

By definition, we conclude that g(X_n) converges to g(X) almost surely.
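The assumption Pr[X ∈ D_g] = 0 cannot be dropped. A minimal counterexample, chosen here for illustration and not taken from the cited sources, uses the deterministic sequence X_n = 1/n with limit X = 0 and the indicator g(x) = 1{x > 0}, whose only discontinuity is at 0, so that Pr[X ∈ D_g] = 1.

```python
def g(x):
    """Indicator of the open half-line (0, inf); discontinuous only at x = 0."""
    return 1.0 if x > 0 else 0.0


# X_n = 1/n converges to X = 0 (surely, hence in all three senses).
xs = [1.0 / n for n in range(1, 10001)]
values = {g(x) for x in xs}  # distinct values taken by g(X_n)

if __name__ == "__main__":
    print(values, g(0.0))  # prints {1.0} 0.0
```

Every g(X_n) equals 1 while g(X) equals 0, so g(X_n) does not converge to g(X) in any of the three modes.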

References

[1] Amemiya 1985, p. 88
[2] Van der Vaart 1998, p. 7, Theorem 2.3
[3] Billingsley 1969, p. 31, Corollary 1
[4] Billingsley 1999, p. 21, Theorem 2.7