Sufficiency (statistics)


In statistics, a statistic is sufficient for the parameter θ, which indexes the distribution family of the data, precisely when the data's conditional probability distribution, given the statistic's value, no longer depends on θ.

Intuitively, a sufficient statistic for θ captures all of the information about θ that is contained in the data. Both the statistic and θ can be vectors.

The concept is due to Sir Ronald Fisher.


Mathematical definition

A statistic T(X) is sufficient for θ precisely if the conditional probability distribution of the data X, given the statistic T(X), does not depend on the parameter θ, i.e.

\Pr(X=x|T(X)=t,\theta) = \Pr(X=x|T(X)=t), \,

or in shorthand

\Pr(x|t,\theta) = \Pr(x|t).\,
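As a concrete illustration of this definition, consider n = 3 independent Bernoulli(θ) trials with T(X) = X1 + X2 + X3 (a minimal Python sketch, not part of the formal definition; it anticipates the Bernoulli example treated below). Brute-force enumeration shows that the conditional distribution of X given T = t is the same for very different values of θ:

    from itertools import product

    def joint_pmf(x, theta):
        """Joint pmf of independent Bernoulli(theta) observations x (a 0/1 tuple)."""
        p = 1.0
        for xi in x:
            p *= theta if xi == 1 else (1 - theta)
        return p

    def conditional_given_T(t, theta, n=3):
        """Pr(X = x | T(X) = t, theta), computed directly from the definition."""
        outcomes = [x for x in product([0, 1], repeat=n) if sum(x) == t]
        pr_t = sum(joint_pmf(x, theta) for x in outcomes)   # Pr(T = t | theta)
        return {x: joint_pmf(x, theta) / pr_t for x in outcomes}

    print(conditional_given_T(t=2, theta=0.3))
    print(conditional_given_T(t=2, theta=0.9))
    # Both calls give probability 1/3 to each of (0,1,1), (1,0,1), (1,1,0):
    # the conditional distribution given T does not depend on theta.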

Fisher's factorization theorem

Fisher's factorization theorem provides a convenient characterization of a sufficient statistic. If the likelihood function of X is L(θ;x), then T is sufficient for θ if and only if functions g and h can be found such that

L(\theta;x)=h(x) \, g(T(x);\theta), \,\!

i.e. the likelihood L can be factored into a product such that one factor, h, does not depend on θ and the other factor, which does depend on θ, depends on x only through T(x).

Interpretation

A way to think about this is to consider varying x in such a way as to maintain a constant value of T(X) and ask whether such a variation has any effect on inferences one might make about θ. If the factorization criterion above holds, the answer is "none", because the dependence of the likelihood function L on θ is unchanged.
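This can be made concrete with a small Python sketch (again using the Bernoulli model of the examples below; the function name is illustrative): two different samples with the same value of T(x) have likelihood functions that agree at every θ, and so lead to identical inferences about θ.

    def bernoulli_likelihood(theta, x):
        """L(theta; x) for independent Bernoulli(theta) observations x."""
        L = 1.0
        for xi in x:
            L *= theta if xi == 1 else (1 - theta)
        return L

    x1 = (1, 1, 0, 0, 1)   # T(x1) = 3
    x2 = (0, 1, 1, 1, 0)   # a different sample with the same statistic, T(x2) = 3
    for theta in (0.1, 0.5, 0.8):
        L1 = bernoulli_likelihood(theta, x1)
        L2 = bernoulli_likelihood(theta, x2)
        assert abs(L1 - L2) < 1e-15   # equal up to floating-point rounding
    # Varying x while holding T(x) fixed leaves the likelihood, and hence any
    # likelihood-based inference about theta, unchanged.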

Proof

Sufficiency => Factorization

If T(x) is sufficient, then, writing t = T(x),

\begin{align} \Pr(x|\theta) = \Pr(x, T(x)=t|\theta) & = \Pr(x|t,\theta) \cdot \Pr(t|\theta) \\ & = \Pr(x|t)\, \Pr(t|\theta) & \mbox{ (by sufficiency)} \\ & = h(x) \, g(T(x);\theta), \end{align}

where h(x) = Pr(x|t) does not depend on θ, and g(t;θ) = Pr(t|θ).

Factorization => Sufficiency

On the other hand, if factorization holds, then

\Pr(x|t,\theta) = \frac{\Pr(x,t|\theta)}{\Pr(t|\theta)} = \frac{h(x)\,g(t;\theta)}{\sum_{y:T(y)=t} h(y)\,g(t;\theta)} = \frac{h(x)}{\sum_{y:T(y)=t} h(y)},

which does not depend on θ, so T is sufficient. (In the continuous case the sum is replaced by an integral over {y : T(y) = t}.)

Minimal sufficiency

A sufficient statistic is minimal sufficient if it can be represented as a function of any other sufficient statistic.

In other words, S(X) is minimal sufficient iff

  1. S(X) is sufficient, and
  2. if T(X) is sufficient, then there exists a function f such that S(X) = f(T(X)).

Intuitively, a minimal sufficient statistic achieves the greatest possible reduction of the data without losing any information about the parameter θ.
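The following Python sketch illustrates condition 2 in the Bernoulli setting used below (an informal illustration of the definition, not a proof of minimality). The vector of order statistics and the pair (ΣXi, X1) are both sufficient, since each determines the sum and hence the likelihood; the sum S(X) can be recovered from either one by applying a function, but neither can be recovered from the sum alone.

    x = (1, 0, 1, 1, 0)

    # Two sufficient statistics for the Bernoulli parameter p
    # (each determines sum(x), and the likelihood is p**sum(x) * (1-p)**(n-sum(x))):
    T1 = tuple(sorted(x))   # the order statistics
    T2 = (sum(x), x[0])     # the sum together with the first observation

    S = sum(x)              # the candidate minimal sufficient statistic

    # S is a function of each of them, as condition 2 requires:
    f1 = lambda t: sum(t)   # recovers S from T1
    f2 = lambda t: t[0]     # recovers S from T2
    assert f1(T1) == S and f2(T2) == S

    # The converse fails: knowing only S = 3 determines neither T1 nor T2,
    # so T1 and T2 are sufficient but not minimal.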

Examples

Bernoulli distribution

If X1, ..., Xn are independent Bernoulli-distributed random variables with expected value p, then the sum T(X) = X1 + ... + Xn is a sufficient statistic for p (here 'success' corresponds to Xi = 1 and 'failure' to Xi = 0; so T is the total number of successes).

This is seen by considering the joint probability distribution:

\Pr(X=x)=P(X_1=x_1,X_2=x_2,\ldots,X_n=x_n).

Because the observations are independent, this can be written as

p^{x_1}(1-p)^{1-x_1} p^{x_2}(1-p)^{1-x_2}\cdots p^{x_n}(1-p)^{1-x_n} \,\!

and, collecting powers of p and 1 − p, gives

p^{\sum x_i}(1-p)^{n-\sum x_i}=p^{T(x)}(1-p)^{n-T(x)} \,\!

which satisfies the factorization criterion, with h(x) = 1 (a constant function) and g(T(x);p) = p^{T(x)}(1-p)^{n-T(x)}.

Note the crucial feature: the unknown parameter p interacts with the observation x only via the statistic T(x) = Σ xi.
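The factorization can also be checked numerically; the sketch below (function names illustrative) verifies, for every 0/1 sequence of length n = 4 and several values of p, that the joint probability equals g(T(x); p) = p^{T(x)}(1 − p)^{n − T(x)} with h(x) = 1.

    from itertools import product

    def joint_pmf(x, p):
        """Joint pmf of independent Bernoulli(p) observations x."""
        out = 1.0
        for xi in x:
            out *= p if xi == 1 else (1 - p)
        return out

    def g(t, p, n):
        """The factor depending on p; it involves the data only through t = sum(x)."""
        return p**t * (1 - p)**(n - t)

    n = 4
    for p in (0.2, 0.5, 0.7):
        for x in product([0, 1], repeat=n):
            assert abs(joint_pmf(x, p) - g(sum(x), p, n)) < 1e-12   # h(x) = 1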

Uniform distribution

If X1, ..., Xn are independent and uniformly distributed on the interval [0,θ], then T(X) = max(X1, ..., Xn) is sufficient for θ.

To see this, consider the joint probability distribution:

\Pr(X=x)=P(X_1=x_1,X_2=x_2,\ldots,X_n=x_n).

Because the observations are independent, this can be written as

\frac{\operatorname{H}(\theta-x_1)}{\theta}\cdot \frac{\operatorname{H}(\theta-x_2)}{\theta}\cdot\,\cdots\,\cdot \frac{\operatorname{H}(\theta-x_n)}{\theta} \,\!

where H(x) is the Heaviside step function; the factors H(xi) are omitted because each observation is nonnegative with probability 1. This may be written as

\frac{\operatorname{H}\left(\theta-\max_i \{\,x_i\,\}\right)}{\theta^n}\,\!

which can be viewed as a function of only θ and maxi(Xi) = T(X). This shows that the factorization criterion is satisfied, again with h(x) = 1 (a constant function).
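Numerically, the last step amounts to the identity that a product of step functions H(θ − xi) equals the single step function H(θ − max xi); the Python sketch below checks the full factorization for a small sample and several values of θ (names are illustrative).

    def H(z):
        """Heaviside step function: 1 for z >= 0, else 0."""
        return 1.0 if z >= 0 else 0.0

    def joint_density(x, theta):
        """Product of the individual Uniform[0, theta] densities, written with Heaviside factors."""
        out = 1.0
        for xi in x:
            out *= H(theta - xi) / theta
        return out

    def factored_form(x, theta):
        """g(T(x); theta) with T(x) = max(x) and h(x) = 1."""
        return H(theta - max(x)) / theta**len(x)

    x = (0.8, 2.3, 1.1, 0.4)
    for theta in (1.0, 2.0, 2.5, 5.0):
        assert abs(joint_density(x, theta) - factored_form(x, theta)) < 1e-12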

Poisson distribution

If X1, ..., Xn are independent and have a Poisson distribution with parameter λ, then the sum T(X) = X1 + ... + Xn is a sufficient statistic for λ.

To see this, consider the joint probability distribution:

\Pr(X=x)=P(X_1=x_1,X_2=x_2,\ldots,X_n=x_n).

Because the observations are independent, this can be written as

{e^{-\lambda} \lambda^{x_1} \over x_1 !} \cdot  {e^{-\lambda} \lambda^{x_2} \over x_2 !} \cdot\,\cdots\,\cdot  {e^{-\lambda} \lambda^{x_n} \over x_n !} \,\!

which may be written as

e^{-n\lambda} \lambda^{(x_1+x_2+\cdots+x_n)} \cdot  {1 \over x_1 ! x_2 !\cdots x_n ! } \,\!

which shows that the factorization criterion is satisfied, where h(x) is the reciprocal of the product of the factorials.
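As with the other examples, the factorization can be verified numerically; the sketch below (names illustrative) checks that the product of Poisson pmfs equals e^{−nλ} λ^{Σxi} multiplied by the reciprocal of the product of factorials.

    import math

    def joint_pmf(xs, lam):
        """Product of independent Poisson(lam) pmfs."""
        out = 1.0
        for x in xs:
            out *= math.exp(-lam) * lam**x / math.factorial(x)
        return out

    def factored_form(xs, lam):
        """g(T(x); lambda) * h(x) with T(x) = sum(xs) and h(x) = 1 / prod(x_i!)."""
        g = math.exp(-len(xs) * lam) * lam**sum(xs)
        h = 1.0
        for x in xs:
            h /= math.factorial(x)
        return g * h

    xs = (2, 0, 3, 1)
    for lam in (0.5, 1.7, 4.0):
        assert math.isclose(joint_pmf(xs, lam), factored_form(xs, lam), rel_tol=1e-9)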

The Rao-Blackwell theorem

Sufficiency finds a useful application in the Rao-Blackwell theorem.

Since the conditional distribution of X given a sufficient statistic T(X) does not depend on θ, neither does the conditional expected value of g(X) given T(X), where g is any function well-behaved enough for the conditional expectation to exist. Consequently that conditional expected value is actually a statistic, and so is available for use in estimation.

The Rao-Blackwell theorem states that if g(X) is any kind of estimator of θ, then the conditional expectation of g(X) given a sufficient statistic T(X) is typically a better estimator of θ, and is never worse (its mean squared error is no larger). Sometimes one can very easily construct a very crude estimator g(X), and then evaluate that conditional expected value to obtain an estimator that is in various senses optimal.
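The following Python simulation sketches this recipe in the Bernoulli setting used above (illustrative only). The crude unbiased estimator g(X) = X1 is replaced by its conditional expectation given the sufficient statistic T(X) = ΣXi, which by symmetry equals T(X)/n, the sample mean; the simulation shows the reduction in variance.

    import random

    def mean(v):
        return sum(v) / len(v)

    def variance(v):
        m = mean(v)
        return sum((vi - m)**2 for vi in v) / len(v)

    def simulate(p=0.3, n=10, reps=100_000, seed=0):
        rng = random.Random(seed)
        crude, improved = [], []
        for _ in range(reps):
            x = [1 if rng.random() < p else 0 for _ in range(n)]
            crude.append(x[0])           # g(X) = X1: unbiased for p, but very noisy
            improved.append(sum(x) / n)  # E[g(X) | T(X)] = T(X)/n, the Rao-Blackwellized estimator
        print("crude:    mean %.4f  variance %.4f" % (mean(crude), variance(crude)))
        print("improved: mean %.4f  variance %.4f" % (mean(improved), variance(improved)))

    simulate()
    # Both estimators average close to p = 0.3, but the conditional-expectation
    # estimator has variance p(1-p)/n instead of p(1-p).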
