Completeness (statistics)

In statistics, completeness is a property of a statistic. Informally, it ensures that the statistic summarizes information about the unknown parameters characterizing the distribution of the underlying data without redundancy: no non-trivial function of a complete statistic has expectation zero for every value of the parameters.

It is closely related to statistical sufficiency and often occurs in conjunction with it.

Mathematical definition

Suppose a random variable X (which may be a sequence (X1, ..., Xn) of scalar-valued random variables) has a probability distribution belonging to a known family of probability distributions Pθ, parametrized by θ. Let s(X) be any statistic based on X.

Then s(X) is a complete statistic if and only if for every measurable function g,

\operatorname{E}_\theta(g(s(X))) = 0 \text{ for all } \theta \quad \Rightarrow \quad P_\theta(g(s(X)) = 0) = 1 \text{ for all } \theta

and is boundedly complete if the implication holds for all bounded g.
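
For instance, suppose X is a single observation from a Bernoulli distribution with success probability p, where 0 < p < 1, and take s(X) = X. The condition above then becomes a polynomial identity in p:

\operatorname{E}_p(g(X)) = (1-p)\,g(0) + p\,g(1) = 0 \text{ for all } p \in (0,1) \quad \Rightarrow \quad g(0) = g(1) = 0,

so the only function of X with expectation zero under every value of p is the zero function, and X is a complete statistic.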

Completeness of the family

For a given family of probability distributions, a complete sufficient statistic is not guaranteed to exist. In contrast, a minimal sufficient statistic always exists.

If a complete sufficient statistic does exist, it is also minimal sufficient (note that completeness does not necessarily imply sufficiency, and sufficiency does not necessarily imply completeness). Taking this fact into account, the family Pθ of distributions is called complete if and only if its minimal sufficient statistic is complete.

Heuristic approach

A sufficient statistic retains at least enough information from the data to estimate θ. A complete statistic retains no irrelevant information about θ (indeed, a complete statistic may retain no information at all). If these two classes of statistics intersect, the intersection consists of complete sufficient statistics: statistics that retain as much information about θ as possible from the data while retaining no irrelevant information.

Examples

Sum of normals

Suppose (X1, X2) are independent, identically distributed random variables, normally distributed with expectation θ and variance 1. The sum

s((X_1,\ X_2)) = X_1 + X_2\,\!

is a complete statistic. To show this, one demonstrates that there is no non-zero function g such that the expectation of

g(s(X_1,\ X_2)) = g(X_1+X_2)\,\!

remains zero regardless of the value of θ.

That fact may be seen as follows. The probability distribution of X1 + X2 is normal with expectation 2θ and variance 2. Its probability density function in x is therefore proportional to

\exp\left(-(x-2\theta)^2/4\right).

The expectation of g above would therefore be a constant times

\int_{-\infty}^\infty g(x)\exp\left(-(x-2\theta)^2/4\right)\,dx.

A bit of algebra reduces this to

k(\theta) \int_{-\infty}^\infty h(x)e^{x\theta}\,dx\,\!

where k(θ) is nowhere zero and

h(x)=g(x)e^{-x^2/4}.\,\!
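
Explicitly, the reduction follows from expanding the square in the exponent:

\exp\left(-(x-2\theta)^2/4\right) = \exp\left(-x^2/4 + x\theta - \theta^2\right) = e^{-\theta^2}\,e^{-x^2/4}\,e^{x\theta},

so that, up to the constant of proportionality, one may take k(θ) = e^{-θ²}.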

Viewed as a function of θ, the integral above is a two-sided Laplace transform of h, and it cannot be identically zero unless h(x) is zero almost everywhere. The exponential factor is nowhere zero, so this can happen only if g(x) is zero almost everywhere.

Counterexample 1

Again suppose (X1, X2) are independent, identically distributed random variables, normally distributed with expectation θ and variance 1.

Then

g((X_1,\ X_2)) = X_1 - X_2\,\!

is an unbiased estimator of zero. Therefore the pair (X1, X2) itself is not a complete statistic (though it is a sufficient statistic in a sample of size 2).
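
Indeed, by linearity of expectation,

\operatorname{E}_\theta(X_1 - X_2) = \operatorname{E}_\theta(X_1) - \operatorname{E}_\theta(X_2) = \theta - \theta = 0 \text{ for all } \theta,

while X1 - X2, being normally distributed with expectation 0 and variance 2, is not almost surely equal to zero; the implication in the definition of completeness therefore fails.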

Counterexample 2

Let U follow a Uniform[-½, ½] distribution and let X = U + θ, so that the distribution of X is parametrized by its mean θ = E(X).

Then, for g(x) = sin(2πx), E(g(X)) = 0 irrespective of θ. Therefore X itself is not a complete statistic for θ.
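
The claim can be checked by integrating against the density of X, which is uniform on [θ - ½, θ + ½]:

\operatorname{E}_\theta\left(\sin(2\pi X)\right) = \int_{\theta-1/2}^{\theta+1/2} \sin(2\pi x)\,dx = 0,

since sin(2πx) has period 1 and therefore integrates to zero over every interval of length 1, while sin(2πX) is not almost surely zero.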

Utility

Lehmann-Scheffé theorem

The major importance of completeness lies in the application of the Lehmann-Scheffé theorem, which states that a statistic that is unbiased, complete and sufficient for some parameter θ is the best unbiased estimator for θ, i.e. the one that has the smallest expected loss for any convex loss function (in typical practice, the smallest mean squared error) among all estimators with the same expected value.
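
In the sum-of-normals example above, for instance, the estimator

\hat\theta = (X_1 + X_2)/2, \qquad \operatorname{E}_\theta(\hat\theta) = \theta,

is unbiased for θ and, being a one-to-one function of the complete sufficient statistic X1 + X2, is itself complete and sufficient; by the theorem, the sample mean (X1 + X2)/2 is therefore the best unbiased estimator of θ.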

Basu's theorem

Completeness is also a prerequisite for the applicability of Basu's theorem: a statistic that is both complete and sufficient is independent of any ancillary statistic (a statistic whose distribution does not depend on the parameter θ).
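
In the sum-of-normals example above, for instance, the sum and the difference satisfy

X_1 + X_2 \sim N(2\theta,\ 2), \qquad X_1 - X_2 \sim N(0,\ 2),

so X1 - X2 is ancillary (its distribution does not involve θ), while X1 + X2 is complete and sufficient; Basu's theorem then shows that X1 + X2 and X1 - X2 are independent.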