Law of large numbers

From Wikipedia, the free encyclopedia

The law of large numbers is a fundamental concept in statistics and probability that describes how the average of a randomly selected sample from a large population is likely to be close to the average of the whole population. The term "law of large numbers" was introduced by S.D. Poisson in 1835 as he discussed a 1713 version of it put forth by James Bernoulli.[1]

In formal language:

If an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions becomes arbitrarily large.

In statistics, this means that the more units of something that are measured, the closer that sample's average is likely to be to the true average of all of the units, including those that were not measured. (The term "average" here means the arithmetic mean.)

For example, the average weight of 10 apples taken from a barrel of 100 apples is probably closer to the "real" average weight of all 100 apples than the average weight of 3 apples taken from that same barrel. This is because the sample of 10 is larger than the sample of 3 and therefore better represents the whole group. If you took a sample of 99 apples out of the 100, the average would be almost exactly the same as the average for all 100 apples. While this rule may appear self-evident, it allows statisticians to draw conclusions or make forecasts that would not be possible otherwise. In particular, it permits precise measurement of the likelihood that an estimate is close to the "right" or true number.
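
The effect of sample size can be illustrated with a small simulation. The sketch below, in Python, uses an invented barrel of apple weights drawn from a normal distribution; the weights, seed, and number of repetitions are arbitrary choices made only for illustration. It compares how far the mean of a sample of 3, 10, or 99 apples typically lands from the mean of all 100.

    import random

    random.seed(0)
    # A hypothetical barrel: 100 apple weights in grams (invented, normally distributed).
    barrel = [random.gauss(150, 20) for _ in range(100)]
    true_mean = sum(barrel) / len(barrel)

    for sample_size in (3, 10, 99):
        errors = []
        for _ in range(1000):
            # Draw a sample without replacement and record how far its mean is from the truth.
            sample = random.sample(barrel, sample_size)
            sample_mean = sum(sample) / sample_size
            errors.append(abs(sample_mean - true_mean))
        typical_error = sum(errors) / len(errors)
        print(f"sample of {sample_size:2d} apples: typical error {typical_error:.2f} g")

Larger samples give a smaller typical error, as the law predicts.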

However, in an infinite (or very large) sequence of observations, the value of any one individual observation cannot be predicted from past observations. The mistaken belief that past observations make such predictions possible is known as the gambler's fallacy.

There are two versions of the law of large numbers, one called the "weak" law and the other the "strong" law. This article will describe both versions in technical detail, but in essence the two are not different laws of nature; rather, they describe different senses in which the sample mean converges to the population mean. The weak law states that, for any fixed margin of error, the probability that the sample mean differs from the population mean by more than that margin approaches zero as the sample size grows. The strong law states that, with probability 1, the sample mean converges to the population mean as the sample size grows.

Closely related to the law of large numbers is the central limit theorem, which describes how sample means tend to be distributed approximately normally around the population mean, regardless of the shape of the population distribution, as sample sizes grow larger. (See central limit theorem for details of this application, including some important limitations.) This helps statisticians evaluate the reliability of their results, because it allows them to extrapolate conclusions from a sample to the population from which the sample was drawn with a stated degree of confidence. See statistical hypothesis testing for an example.

The phrase "law of large numbers" is also sometimes used in a less technical way to refer to the principle that the probability of any possible event (even an unlikely one) occurring at least once in a series increases with the number of events in the series. For example, the odds that you will win the lottery are very low; however, the odds that someone will win the lottery are quite good, provided that a large enough number of people purchased lottery tickets.

The remainder of this article will assume the reader has a familiarity with mathematical concepts and notation.

The weak law

The weak law of large numbers states that if X1, X2, X3, ... is an infinite sequence of random variables, all with the same expected value μ and the same finite variance σ², and uncorrelated (i.e., the correlation between any two of them is zero), then the sample average

\overline{X}_n=(X_1+\cdots+X_n)/n

converges in probability to μ. Somewhat less tersely: For any positive number ε, no matter how small, we have

\lim_{n\rightarrow\infty}\operatorname{P}\left(\left|\overline{X}_n-\mu\right|<\varepsilon\right)=1.
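
This convergence in probability can be checked empirically. The sketch below, in Python, estimates P(|X̄n − μ| < ε) by repeated sampling for several values of n; the use of Exponential(1) random variables, the choice ε = 0.1, and the number of trials are all arbitrary choices for illustration. The estimated probability rises toward 1 as n grows.

    import random

    random.seed(1)
    mu = 1.0        # true mean of an Exponential(1) random variable
    eps = 0.1       # the tolerance epsilon (arbitrary choice)
    trials = 1000   # independent sample means drawn for each n

    for n in (10, 100, 1000, 10000):
        hits = sum(
            abs(sum(random.expovariate(1.0) for _ in range(n)) / n - mu) < eps
            for _ in range(trials)
        )
        print(f"n = {n:5d}: estimated P(|mean - mu| < {eps}) = {hits / trials:.3f}")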

Proof

Chebyshev's inequality is used to prove this result. The finite variance \operatorname{Var}(X_i)=\sigma^2 (for all i) and the absence of correlation between the variables together imply that

\operatorname{Var}(\overline{X}_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.
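
In more detail (this is the standard variance calculation, not an additional assumption): because the Xi are uncorrelated, the variance of their sum is the sum of their variances, and dividing a random variable by n divides its variance by n², so

\operatorname{Var}(\overline{X}_n) = \operatorname{Var}\left(\frac{X_1+\cdots+X_n}{n}\right) = \frac{1}{n^2}\left(\operatorname{Var}(X_1)+\cdots+\operatorname{Var}(X_n)\right) = \frac{n\sigma^2}{n^2}.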

By linearity of expectation, the sample average has the same mean μ as each term of the sequence:

E(\overline{X}_n) = \mu.

Using Chebyshev's inequality on \overline{X}_n results in

\operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \leq \frac{\sigma^2}{{n\varepsilon^2}}.

This may be used to obtain the following:

\operatorname{P}( \left| \overline{X}_n-\mu \right| < \varepsilon) = 1 - \operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \geq 1 - \frac{\sigma^2}{\varepsilon^2 n}.

As n approaches infinity, the right-hand side approaches 1, and hence so does the probability on the left. This completes the proof.
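
To illustrate the bound with concrete (arbitrarily chosen) numbers: taking σ² = 1, ε = 0.1 and n = 10,000, Chebyshev's inequality gives

\operatorname{P}( \left| \overline{X}_n-\mu \right| \geq 0.1) \leq \frac{1}{10\,000 \times 0.1^2} = 0.01,

so with probability at least 99% the sample mean lies within 0.1 of μ.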

The result also holds in the 'infinite variance' case, provided the Xi are mutually independent and their (finite) mean μ exists.

A consequence of the weak law of large numbers is the asymptotic equipartition property.

The strong law

The strong law of large numbers states that if X1, X2, X3, ... is an infinite sequence of random variables that are pairwise independent and identically distributed with E(|Xi|) < ∞   (and where the common expected value is μ), then

\operatorname{P}\left(\lim_{n\rightarrow\infty}\overline{X}_n=\mu\right)=1,

i.e., the sample average converges almost surely to μ.

If we replace the finite expectation condition with a finite second moment condition, E(Xi²) < ∞ (which is the same as assuming that Xi has finite variance), then we obtain both almost sure convergence and convergence in mean square. In either case, these conditions also imply the weak law of large numbers, since almost sure convergence implies convergence in probability (as, indeed, does convergence in mean square).

This law justifies the intuitive interpretation of the expected value of a random variable as the "long-term average when sampling repeatedly".
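
A single long simulated run illustrates this interpretation. The sketch below, in Python, uses fair coin flips (so μ = 0.5; the seed, run length, and checkpoints are arbitrary choices) and tracks one running average, which settles ever closer to μ as flips accumulate. One run cannot, of course, prove almost sure convergence; it is only an illustration.

    import random

    random.seed(2)
    mu = 0.5                      # expected value of a single fair coin flip (heads = 1)
    checkpoints = {10, 100, 1000, 10000, 100000, 1000000}
    total = 0

    for n in range(1, 1000001):
        total += random.randint(0, 1)   # one fair coin flip
        if n in checkpoints:
            running_mean = total / n
            print(f"n = {n:7d}: running average = {running_mean:.4f}, "
                  f"distance from mu = {abs(running_mean - mu):.4f}")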

A weaker law and proof

Proofs of the above weak and strong laws of large numbers are rather involved. The conclusion of the slightly weaker form below is implied by the weak law above (since convergence in probability implies convergence in distribution), but it has a simpler proof.

Theorem. Let X1, X2, X3, ... be a sequence of independent and identically distributed random variables with finite common mean μ, and define the partial sum Sn := X1 + X2 + ... + Xn. Then Sn / n converges in distribution to μ.

Proof. (See [1], p. 174) By Taylor's theorem for complex functions, the characteristic function of any random variable, X, with finite mean μ, can be written as

\varphi(t) = 1 + it\mu + o(t), \quad t \rightarrow 0.

Then, since the characteristic function of the sum of independent random variables is the product of their characteristic functions, the characteristic function of  Sn / n  is

\left[\varphi\left({t \over n}\right)\right]^n = \left[1 + i\mu{t \over n} + o\left({t \over n}\right)\right]^n \, \rightarrow \, e^{it\mu}, \quad \textrm{as} \quad n \rightarrow \infty.
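
The final convergence is an instance of the elementary limit \left(1 + \frac{a_n}{n}\right)^n \rightarrow e^{a} whenever a_n \rightarrow a: writing the bracketed quantity as 1 + a_n/n with a_n = i\mu t + n\,o(t/n), we have, for each fixed t,

a_n = i\mu t + n\,o\left(\frac{t}{n}\right) \rightarrow i\mu t \quad \textrm{as} \quad n \rightarrow \infty,

since n\,o(t/n) \rightarrow 0 for fixed t.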

The limit e^{itμ} is the characteristic function of the constant random variable μ, and hence by the Lévy continuity theorem, Sn / n converges in distribution to μ. Note that the proof of the central limit theorem, which tells us more about the convergence of the average to μ (when the variance σ² is finite), follows a very similar approach.

References

  • Grimmett, G. R. and Stirzaker, D. R. (1992). Probability and Random Processes, 2nd Edition. Clarendon Press, Oxford. ISBN 0-19-853665-8.
