Empirical process

From Wikipedia, the free encyclopedia

For the process control topic, see Empirical process (process control model).

The study of empirical processes is a branch of mathematical statistics and a sub-area of probability theory. It generalizes the central limit theorem for empirical measures.

Definition

It is known that, under certain conditions, the empirical measures P_n converge uniformly to the probability measure P (see the Glivenko-Cantelli theorem). Empirical processes quantify the rate of this convergence.

A centered and scaled version of the empirical measure is the signed measure

G_n(A)=\sqrt{n}(P_n(A)-P(A))

It induces a map on measurable functions f given by

f\mapsto G_n f=\sqrt{n}(P_n-P)f=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n f(X_i)-\mathbb{E}f\right)

By the central limit theorem, G_n(A) converges in distribution to a normal random variable N(0,P(A)(1-P(A))) for a fixed measurable set A. Similarly, for a fixed function f, G_n f converges in distribution to a normal random variable N(0,\mathbb{E}(f-\mathbb{E}f)^2), provided that \mathbb{E}f and \mathbb{E}f^2 exist.
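The fixed-f statement can be checked numerically. The following is a minimal sketch, not a canonical implementation: it assumes X uniform on [0,1] and f(x)=x^2 (both chosen here only for illustration), so that \mathbb{E}f=1/3 and the limiting variance is \mathbb{E}f^2-(\mathbb{E}f)^2=1/5-1/9=4/45. The helper names g_n and simulate are hypothetical.

```python
import math
import random
import statistics


def g_n(f, mean_f, sample):
    """Evaluate G_n f = sqrt(n) * (P_n f - E f) on one sample."""
    n = len(sample)
    empirical_mean = sum(f(x) for x in sample) / n  # P_n f
    return math.sqrt(n) * (empirical_mean - mean_f)


def simulate(n=200, reps=2000, seed=0):
    """Draw `reps` independent copies of G_n f and summarize them.

    Assumptions for illustration: X ~ Uniform(0, 1), f(x) = x**2,
    hence E f = 1/3 and Var f = 4/45.
    """
    rng = random.Random(seed)

    def f(x):
        return x * x

    draws = [
        g_n(f, 1.0 / 3.0, [rng.random() for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.mean(draws), statistics.pvariance(draws)


if __name__ == "__main__":
    mean, var = simulate()
    print("sample mean of G_n f:    ", mean)
    print("sample variance of G_n f:", var)
    print("Var f = 4/45:            ", 4 / 45)
```

Under these assumptions the sample mean should be close to 0 and the sample variance close to 4/45, up to Monte Carlo error.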

Definition

\bigl(G_n(c)\bigr)_{c\in\mathcal{C}} is called an empirical process indexed by \mathcal{C}, a collection of measurable subsets of S, the space in which the observations X_i take values.
\bigl(G_nf\bigr)_{f\in\mathcal{F}} is called an empirical process indexed by \mathcal{F}, a collection of measurable functions from S to \mathbb{R}.

A significant result in the area of empirical processes is Donsker's theorem. It has led to the study of Donsker classes: classes for which the empirical processes indexed by them converge weakly to a certain Gaussian process. It can be shown that every Donsker class is a Glivenko-Cantelli class, but the converse is not true in general.
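For orientation, the limiting Gaussian process can be described through its covariance. The notation G for the limit is an assumption introduced here (it is not used elsewhere in this article); in the standard formulation, as in van der Vaart and Wellner, G is centered and shares the covariance structure of G_n:

\operatorname{Cov}(Gf,Gg)=\mathbb{E}(fg)-\mathbb{E}f\,\mathbb{E}g

Taking f=g=I_A gives the variance P(A)-P(A)^2=P(A)(1-P(A)), which agrees with the normal limit stated above for a fixed measurable set A.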

Example

As an example, consider empirical distribution functions. For real-valued iid random variables X_1,\ldots,X_n they are given by

F_n(x)=P_n((-\infty,x])=P_nI_{(-\infty,x]}.

In this case, the empirical process is indexed by the class \mathcal{C}=\{(-\infty,x]:x\in\mathbb{R}\}. It has been shown that \mathcal{C} is a Donsker class; in particular,

\sqrt{n}(F_n(x)-F(x)) converges weakly in \ell^\infty(\mathbb{R}) to a Brownian bridge B(F(x)), where F is the common distribution function of the X_i.
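One consequence of this convergence is that \sqrt{n}\sup_x|F_n(x)-F(x)| behaves, for large n, like the supremum of the absolute value of a Brownian bridge, whose 95% quantile is approximately 1.358 (the Kolmogorov distribution). The sketch below checks this by simulation. It assumes uniform samples on [0,1], so that F(x)=x there; the function names ks_statistic, uniform_cdf and coverage are hypothetical and chosen only for this illustration.

```python
import math
import random


def ks_statistic(sample, cdf):
    """Return sqrt(n) * sup_x |F_n(x) - F(x)| for a continuous cdf F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # F_n jumps from (i-1)/n to i/n at the i-th order statistic,
        # so the supremum is attained on one side of some jump.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return math.sqrt(n) * d


def uniform_cdf(x):
    """Distribution function of Uniform(0, 1) (illustrative choice of F)."""
    return min(1.0, max(0.0, x))


def coverage(n=200, reps=1000, threshold=1.358, seed=0):
    """Fraction of simulated statistics that fall at or below `threshold`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        if ks_statistic(sample, uniform_cdf) <= threshold:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("fraction below 1.358:", coverage())
```

If the approximation is good at this sample size, the printed fraction should be near 0.95, up to Monte Carlo error and a small finite-n correction.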

References

  • P. Billingsley, Probability and Measure, John Wiley and Sons, New York, third edition, 1995.
  • M.D. Donsker, Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems, Annals of Mathematical Statistics, 23:277-281, 1952.
  • R.M. Dudley, Central limit theorems for empirical measures, Annals of Probability, 6(6): 899-929, 1978.
  • R.M. Dudley, Uniform Central Limit Theorems, Cambridge Studies in Advanced Mathematics, 63, Cambridge University Press, Cambridge, UK, 1999.
  • Aad W. van der Vaart and Jon A. Wellner, Weak Convergence and Empirical Processes: With Applications to Statistics, 2nd ed., Springer, 2000. ISBN 978-0387946405
  • J. Wolfowitz, Generalization of the theorem of Glivenko-Cantelli. Annals of Mathematical Statistics, 25, 131-138, 1954.
