Law of large numbers/Proof

Main article: Law of large numbers

Given X1, X2, ..., an infinite sequence of i.i.d. random variables with finite expected value E(X1) = E(X2) = ... = μ < ∞, we are interested in the convergence of the sample average

\overline{X}_n=\tfrac1n(X_1+\cdots+X_n).
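
The convergence can be illustrated numerically. A minimal sketch, assuming numpy and taking the Xi to be exponentially distributed with mean μ = 1; the distribution is chosen purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    mu = 1.0  # mean of the Exp(1) distribution used for illustration

    # Sample averages for increasing n; each should be close to mu for large n.
    for n in (10, 100, 10_000, 1_000_000):
        x = rng.exponential(scale=1.0, size=n)  # X_1, ..., X_n i.i.d.
        print(n, x.mean())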

The weak law

Theorem: \overline{X}_n \, \xrightarrow{P} \, \mu \qquad\textrm{for}\qquad n \to \infty

Proof using Chebyshev's inequality

This proof uses the additional assumption of finite variance \operatorname{Var}(X_i)=\sigma^2 (the same for all i). The independence of the random variables implies that they are uncorrelated, so the variance of the sum X_1+\cdots+X_n is the sum of the individual variances, n\sigma^2, and therefore


\operatorname{Var}(\overline{X}_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.

By linearity of expectation, the mean of the sample average is the common mean μ of the sequence:


E(\overline{X}_n) = \mu.
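
Both identities can be checked by simulation. A minimal sketch, assuming numpy and again using Exp(1) variables, so that μ = 1 and σ² = 1 (an illustrative choice, not part of the proof):

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma2, n, reps = 1.0, 1.0, 50, 100_000

    # `reps` independent copies of the sample average of n Exp(1) variables.
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

    print(xbar.mean(), mu)         # empirical mean of the sample average, close to mu
    print(xbar.var(), sigma2 / n)  # empirical variance, close to sigma^2 / n = 0.02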

Applying Chebyshev's inequality to \overline{X}_n results in


\operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \leq \frac{\sigma^2}{{n\varepsilon^2}}.

This may be used to obtain the following:


\operatorname{P}( \left| \overline{X}_n-\mu \right| < \varepsilon) = 1 - \operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \geq 1 - \frac{\sigma^2}{\varepsilon^2 n}.

As n approaches infinity, the right-hand side approaches 1, and by the definition of convergence in probability (see Convergence of random variables) we obtain

\overline{X}_n \, \xrightarrow{P} \, \mu \qquad\textrm{for}\qquad n \to \infty
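
The bound itself can be compared with the actual tail probability by Monte Carlo. A minimal sketch, assuming numpy, Exp(1) summands (μ = σ² = 1) and ε = 0.1; all of these are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma2, eps, reps = 1.0, 1.0, 0.1, 1_000

    for n in (100, 1_000, 10_000):
        xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
        empirical = np.mean(np.abs(xbar - mu) >= eps)  # estimate of P(|X̄_n - mu| >= eps)
        bound = sigma2 / (n * eps ** 2)                # Chebyshev bound
        print(n, empirical, min(bound, 1.0))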

Proof using convergence of characteristic functions

By Taylor's theorem for complex functions, the characteristic function of any random variable X with finite mean μ can be written as

\varphi_X(t) = 1 + it\mu + o(t), \quad t \rightarrow 0.
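
The expansion can be checked against a distribution whose characteristic function is known in closed form. A minimal sketch, assuming X ~ Exp(1), for which φ_X(t) = 1/(1 − it) and μ = 1; the remainder φ_X(t) − 1 − itμ should vanish faster than t as t → 0:

    # Characteristic function of Exp(1): E[exp(itX)] = 1 / (1 - it), with mean mu = 1
    mu = 1.0

    def phi(t):
        return 1.0 / (1.0 - 1j * t)

    for t in (1e-1, 1e-2, 1e-3, 1e-4):
        remainder = phi(t) - (1.0 + 1j * t * mu)
        print(t, abs(remainder) / t)  # |remainder| / t -> 0, i.e. the remainder is o(t)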

All X1, X2, ... have the same characteristic function, so we will simply denote this φX.

Among the basic properties of characteristic functions are

\varphi_{\frac 1 n X}(t) = \varphi_X(\tfrac t n) \quad \textrm{and} \quad
 \varphi_{X+Y}(t) = \varphi_X(t)\,\varphi_Y(t) \quad \textrm{if } X \textrm{ and } Y \textrm{ are independent}.
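
Both properties can be illustrated with an empirical characteristic function. A minimal sketch, assuming numpy and independent Exp(1) samples (illustrative choices); the scaling identity holds exactly on a fixed sample, while the product identity holds up to Monte Carlo error:

    import numpy as np

    rng = np.random.default_rng(0)
    t, n, reps = 0.7, 3, 200_000

    x = rng.exponential(size=reps)
    y = rng.exponential(size=reps)  # independent of x

    def ecf(z, s):
        # Empirical characteristic function: average of exp(i s z) over the sample
        return np.mean(np.exp(1j * s * z))

    print(ecf(x / n, t), ecf(x, t / n))          # scaling property
    print(ecf(x + y, t), ecf(x, t) * ecf(y, t))  # independence property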

These rules can be used to calculate the characteristic function of \scriptstyle\overline{X}_n in terms of φX:

\varphi_{\overline{X}_n}(t)= \left[\varphi_X\left({t \over n}\right)\right]^n = \left[1 + i\mu{t \over n} + o\left({t \over n}\right)\right]^n \, \rightarrow \, e^{it\mu}, \quad \textrm{as} \quad n \rightarrow \infty.
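
This limit can also be checked numerically for a fixed t. A minimal sketch, again assuming X ~ Exp(1) with φ_X(t) = 1/(1 − it) and μ = 1, and taking t = 1:

    import numpy as np

    mu, t = 1.0, 1.0

    def phi(s):
        # Closed-form characteristic function of Exp(1)
        return 1.0 / (1.0 - 1j * s)

    target = np.exp(1j * t * mu)  # characteristic function of the constant mu
    for n in (10, 100, 1_000, 10_000):
        value = phi(t / n) ** n   # characteristic function of the sample average
        print(n, abs(value - target))  # decreases toward 0 as n grows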

The limit e^{it\mu} is the characteristic function of the constant random variable μ, and hence, by the Lévy continuity theorem, \scriptstyle\overline{X}_n converges in distribution to μ:

\overline{X}_n \, \xrightarrow{\mathcal D} \, \mu \qquad\textrm{for}\qquad n \to \infty.

Since μ is a constant, convergence in distribution to μ and convergence in probability to μ are equivalent (see Convergence of random variables). It follows that

\overline{X}_n \, \xrightarrow{P} \, \mu \qquad\textrm{for}\qquad n \to \infty.