Glivenko-Cantelli theorem

From Wikipedia, the free encyclopedia

In the theory of probability, the Glivenko-Cantelli theorem describes the asymptotic behaviour of the empirical distribution function as the number of independent and identically distributed (iid) observations grows: it converges uniformly to the true distribution function, almost surely. The analogous uniform convergence of more general empirical measures is the defining property of the Glivenko-Cantelli classes of functions or sets. The Glivenko-Cantelli classes arise in Vapnik-Chervonenkis (VC) theory, with applications to machine learning.

Glivenko-Cantelli theorem

Assume that X_1,X_2,\dots are iid random variables in \mathbb{R} with common cumulative distribution function F(x). The empirical distribution function for X_1,\dots,X_n is defined by

F_n(x)=\frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i),

where I_C is the indicator function of the set C. For every fixed x, F_n(x) is a sequence of random variables which converges to F(x) almost surely by the strong law of large numbers; that is, F_n converges to F pointwise. Glivenko and Cantelli strengthened this result by proving uniform convergence of F_n to F.

Theorem (Glivenko, Cantelli, 1933):

\|F_n - F\|_\infty = \sup_{x\in \mathbb{R}} |F_n(x) - F(x)| \longrightarrow 0 almost surely.
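
The statement can be checked numerically. The following is a minimal sketch, not part of the theorem itself: it assumes the X_i are uniform on [0,1], so that F(x) = x there, and the names sup_distance and uniform_cdf are illustrative. For a continuous F the supremum is attained at a jump of F_n, so it is enough to compare F with the values of F_n just before and at each sorted sample point.

```python
import random


def sup_distance(sample, cdf):
    """Return sup_x |F_n(x) - F(x)| for a continuous distribution function cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # F_n equals i/n just below the i-th sorted point (0-based)
        # and (i + 1)/n at it, so both gaps have to be checked.
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d


def uniform_cdf(x):
    """Distribution function of the uniform law on [0, 1] (an assumed example)."""
    return min(1.0, max(0.0, x))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    for n in (10, 100, 1000, 10000):
        sample = [rng.random() for _ in range(n)]
        print(n, sup_distance(sample, uniform_cdf))
```

Under these assumptions the printed distances should tend to shrink as n grows, in line with the theorem; the exact values depend on the seed.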

Empirical measures

One can generalize the empirical distribution function by replacing the set (-\infty,x] by an arbitrary set c from a class of sets \mathcal{C} to obtain an empirical measure indexed by c

P_n(c)=\frac{1}{n}\sum_{i=1}^n I_c(X_i), \quad c\in\mathcal{C}.

A further generalization is the map induced by P_n on measurable real-valued functions f, which is given by

f\mapsto P_nf=\int_S f\,dP_n=\frac{1}{n}\sum_{i=1}^n f(X_i), \quad f\in\mathcal{F}.

An important property of such classes is whether the strong law of large numbers holds uniformly over \mathcal{F} or \mathcal{C}.
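
Both definitions are plain sample averages. The sketch below illustrates this; the function names are hypothetical, and a set c is represented by a membership predicate, which is an assumption made only for the example.

```python
def empirical_mean(f, sample):
    """P_n f: the average of f over the observations X_1, ..., X_n."""
    return sum(f(x) for x in sample) / len(sample)


def indicator(c):
    """Turn a membership predicate c into the indicator function I_c."""
    return lambda x: 1.0 if c(x) else 0.0


def empirical_measure(c, sample):
    """P_n(c): the fraction of observations that fall in the set c."""
    return empirical_mean(indicator(c), sample)
```

Taking c to be the half-line (-\infty,x] recovers the empirical distribution function: empirical_measure(lambda y: y <= x, sample) is F_n(x).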

Glivenko-Cantelli class

Consider a set \mathcal{S} with a sigma algebra of Borel subsets A and a probability measure P. For a class of subsets,

{\mathcal C}\subset\{c:c \mbox{ is measurable subset of }\mathcal{S}\}

and a class of functions

\mathcal{F}\subset\{f:\mathcal{S}\to \mathbb{R}, f \mbox{ is measurable}\,\}

define random variables

\|P_n-P\|_{\mathcal C}=\sup_{c\in {\mathcal C}} |P_n(c)-P(c)|
\|P_n-P\|_{\mathcal F}=\sup_{f\in {\mathcal F}} |P_nf-\mathbb{E}f|

where P_n(c) is the empirical measure, P_nf is the corresponding map, and

\mathbb{E}f=\int_\mathcal{S} fdP, assuming that it exists.

Definitions

  • A class \mathcal C is called a Glivenko-Cantelli class (or GC class) with respect to a probability measure P if any of the following equivalent statements is true.
1. \|P_n-P\|_\mathcal{C}\to 0 almost surely as n\to\infty.
2. \|P_n-P\|_\mathcal{C}\to 0 in probability as n\to\infty.
3. \mathbb{E}\|P_n-P\|_\mathcal{C}\to 0, as n\to\infty (convergence in mean).
The Glivenko-Cantelli classes of functions are defined similarly.
  • A class is called a universal Glivenko-Cantelli class if it is a GC class with respect to any probability measure P on (S,A).
  • A class is called uniformly Glivenko-Cantelli if the convergence occurs uniformly over all probability measures P on (S,A):
\sup_{P\in \mathcal{P}(S,A)} \mathbb E \|P_n-P\|_\mathcal{C}\to 0;
\sup_{P\in \mathcal{P}(S,A)} \mathbb E \|P_n-P\|_\mathcal{F}\to 0.

Theorem (Vapnik and Chervonenkis, 1968)

A class of sets \mathcal{C} is uniformly GC if and only if it is a Vapnik-Chervonenkis class.

Examples

  • Let S=\mathbb R and {\mathcal C}=\{(-\infty,t]:t\in {\mathbb R}\}. The classical Glivenko-Cantelli theorem implies that this class is a universal GC class. Furthermore, by Kolmogorov's theorem,
\sup_{P\in \mathcal{P}(S,A)}\|P_n-P\|_{\mathcal C} \sim n^{-1/2}, that is, \mathcal{C} is a uniformly Glivenko-Cantelli class.
  • Let P be a nonatomic probability measure on S and let \mathcal{C} be the class of all finite subsets of S. Because A_n=\{X_1,\ldots,X_n\}\in \mathcal{C}, P(A_n)=0 and P_n(A_n)=1, we have \|P_n-P\|_{\mathcal C}=1, and so \mathcal{C} is not a GC class with respect to P.
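
The second example can be illustrated with a short sketch. It assumes P is the uniform distribution on [0,1], which is nonatomic, so P(A) = 0 for every finite set A; the function name is hypothetical. The finite set formed by the sample itself always has empirical measure 1, whatever the sample size.

```python
import random


def empirical_measure_of_set(finite_set, sample):
    """P_n(A): the fraction of observations that belong to the finite set A."""
    return sum(1 for x in sample if x in finite_set) / len(sample)


rng = random.Random(1)  # fixed seed so that runs are repeatable
sample = [rng.random() for _ in range(1000)]
a_n = set(sample)  # A_n = {X_1, ..., X_n}, a member of the class of finite sets

p_n = empirical_measure_of_set(a_n, sample)  # equals 1 for every sample
p = 0.0  # P(A_n) = 0, because P is assumed nonatomic and A_n is finite
gap = abs(p_n - p)  # a lower bound for the supremum over the class
```

Since the supremum over the class is at least this gap and can never exceed 1, it equals 1 for every n and does not converge to 0.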

See also