Score (statistics)

From Wikipedia, the free encyclopedia

In statistics, the score is the partial derivative, with respect to some parameter θ, of the logarithm (commonly the natural logarithm) of the likelihood function. If the observation is X and its likelihood is L(θ;X), then the score V can be found through the chain rule:

V = \frac{\partial}{\partial\theta} \log L(\theta;X) = \frac{1}{L(\theta;X)} \frac{\partial L(\theta;X)}{\partial\theta}.

Note that V is a function of both θ and the observation X. Because it depends on the unknown parameter θ, the score is not in general a statistic, and it need not be a sufficient statistic for θ.
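As a minimal sketch of the definition, the snippet below compares three ways of computing the score for a single observation. The exponential model with rate θ, and the function names `log_likelihood`, `analytic_score`, `numerical_score` and `chain_rule_score`, are illustrative assumptions and do not come from the article.

```python
import math

def log_likelihood(theta, x):
    # Assumed model (illustration only): exponential density f(x; theta) = theta * exp(-theta * x).
    return math.log(theta) - theta * x

def analytic_score(theta, x):
    # d/dtheta [log(theta) - theta * x] = 1/theta - x
    return 1.0 / theta - x

def numerical_score(theta, x, h=1e-6):
    # Central finite difference of the log-likelihood with respect to theta.
    return (log_likelihood(theta + h, x) - log_likelihood(theta - h, x)) / (2.0 * h)

def chain_rule_score(theta, x, h=1e-6):
    # (1 / L) * dL/dtheta, with dL/dtheta approximated by a central difference.
    def likelihood(t):
        return math.exp(log_likelihood(t, x))
    derivative = (likelihood(theta + h) - likelihood(theta - h)) / (2.0 * h)
    return derivative / likelihood(theta)

theta, x = 2.0, 0.3
print(analytic_score(theta, x), numerical_score(theta, x), chain_rule_score(theta, x))
```

The three values should agree up to finite-difference error, which illustrates that differentiating log L and dividing the derivative of L by L are the same operation.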

The expected value of V, written \mathbb{E}(V|\theta), is zero. To see this, rewrite the definition of expectation, using the fact that the probability density function is just L(θ;x), which is conventionally denoted by f(x;θ) (in which the dependence on x is more explicit). The corresponding cumulative distribution function is denoted by F(x;θ). With this change of notation, and writing f'_θ(x;θ) for the partial derivative of f with respect to θ,

\mathbb{E}(V|\theta) = \int_X \frac{f'_{\theta}(x; \theta)}{f(x; \theta)} \, dF(x;\theta) = \int_X \frac{f'_{\theta}(x; \theta)}{f(x; \theta)} f(x; \theta) \, dx = \int_X \frac{\partial f(x; \theta)}{\partial \theta} \, dx

where the integral runs over the whole sample space of X. If certain regularity conditions are met, so that differentiation with respect to θ and integration over x may be interchanged, the last integral may be rewritten as

\frac{\partial}{\partial\theta} \int_X f(x; \theta) \, dx = \frac{\partial}{\partial\theta}1 = 0.

It is worth restating the above result in words: the expected value of the score, evaluated at the true value of θ, is zero. Thus, if one were to sample repeatedly from some distribution and calculate the score at the true θ each time, the mean of the scores would tend to zero as the number of samples approached infinity.
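The repeated-sampling statement can be illustrated with a small simulation. This is only a sketch under assumptions: the exponential model with rate θ (whose score for one observation is 1/θ − x), the helper name `mean_score`, the sample size and the seed are all chosen here for illustration.

```python
import random

def score(theta, x):
    # Score of one observation under the assumed exponential model with rate theta.
    return 1.0 / theta - x

def mean_score(theta_true, theta_eval, n_samples, seed=0):
    # Draw samples from the model at theta_true and average the score evaluated at theta_eval.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.expovariate(theta_true)  # expovariate takes the rate parameter
        total += score(theta_eval, x)
    return total / n_samples

# Score evaluated at the true parameter: the average should be near zero.
print(mean_score(2.0, 2.0, 200_000))
# Score evaluated at a wrong parameter: the average should be near 1/3 - 1/2.
print(mean_score(2.0, 3.0, 200_000))
```

The second call shows why the qualifier about the true θ matters: at any other parameter value the mean score generally does not vanish.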

The variance of the score is known as the Fisher information and is written \mathcal{I}(\theta). Because the expectation of the score is zero, this may be written as

\mathcal{I}(\theta) = \mathbb{E} \left\{\left.  \left[   \frac{\partial}{\partial\theta} \log L(\theta;X)  \right]^2 \right|\theta\right\}.

Note that the Fisher information, as defined above, is not a function of a particular observation, as the random variable X has been averaged out. This concept of information is useful when comparing two methods of observation of some random process.
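A rough numerical check of the two expressions for the Fisher information (variance of the score, and expected squared score) might look like the following. The exponential model with rate θ is again an assumption made for illustration; for that model the variance of X is 1/θ², so the Fisher information of one observation is 1/θ². The function name `fisher_information_mc` is hypothetical.

```python
import random

def score(theta, x):
    # Score of one observation under the assumed exponential model with rate theta.
    return 1.0 / theta - x

def fisher_information_mc(theta, n_samples, seed=0):
    # Monte Carlo estimates of Var(V) and E(V^2), both at the true theta.
    rng = random.Random(seed)
    scores = [score(theta, rng.expovariate(theta)) for _ in range(n_samples)]
    mean = sum(scores) / n_samples
    variance = sum((s - mean) ** 2 for s in scores) / n_samples
    mean_square = sum(s * s for s in scores) / n_samples
    return variance, mean_square

theta = 2.0
variance, mean_square = fisher_information_mc(theta, 200_000)
# Both estimates should be close to the analytic value 1 / theta**2.
print(variance, mean_square, 1.0 / theta ** 2)
```

Because the mean score is zero, the two estimates differ only by the square of the sample mean, which is small for large samples.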

Example

Consider a Bernoulli process, with A successes and B failures; the probability of success is θ.

Then the likelihood L is

L(\theta;A,B)=\frac{(A+B)!}{A!B!}\theta^A(1-\theta)^B,

so the score V is given by

V=\frac{\partial}{\partial\theta}\log\left[L(\theta;A,B)\right]= \frac{1}{L}\frac{\partial L}{\partial\theta}.

This is a standard calculus problem: A and B are treated as constants. Then

V=\frac{A}{\theta}-\frac{B}{1-\theta}.
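For readers who want the intermediate step, the calculation can be written out as follows. It uses only the likelihood given above; the binomial coefficient does not depend on θ, so it vanishes on differentiation.

```latex
\log L(\theta;A,B) = \log\frac{(A+B)!}{A!B!} + A\log\theta + B\log(1-\theta),
\qquad
\frac{\partial}{\partial\theta}\log L(\theta;A,B)
  = 0 + \frac{A}{\theta} + B\cdot\frac{-1}{1-\theta}
  = \frac{A}{\theta}-\frac{B}{1-\theta}.
```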

So the score is zero when θ = A / (A + B), which is the maximum likelihood estimate of θ. We can now verify that the expectation of the score is zero. Writing n = A + B for the number of trials, and noting that the expectation of A is nθ and the expectation of B is n(1 − θ), we can see that the expectation of V is

\mathbb{E}(V|\theta)=\frac{n\theta}{\theta}-\frac{n(1-\theta)}{1-\theta}=0.

We can also check the variance of V. We know that A + B = n and the variance of A is nθ(1 − θ) so the variance of V is

\sigma^2(V)=\sigma^2\left(\frac{A}{\theta}-\frac{n-A}{1-\theta}\right)  =\sigma^2\left(A\left(\frac{1}{\theta}+\frac{1}{1-\theta}\right)\right) =\left(\frac{1}{\theta}+\frac{1}{1-\theta}\right)^2\sigma^2(A) =\frac{n}{\theta(1-\theta)}.
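Both results of this example can be checked exactly by summing over the binomial distribution of A for a fixed number of trials. The sketch below does this with the standard library only; the function names and the particular values n = 20 and θ = 0.3 are arbitrary choices made for illustration.

```python
import math

def binomial_pmf(a, n, theta):
    # Probability of a successes in n trials with success probability theta.
    return math.comb(n, a) * theta ** a * (1.0 - theta) ** (n - a)

def score(a, n, theta):
    # V = A / theta - B / (1 - theta), with B = n - A.
    return a / theta - (n - a) / (1.0 - theta)

def score_moments(n, theta):
    # Exact mean and variance of the score, summing over all possible values of A.
    mean = sum(binomial_pmf(a, n, theta) * score(a, n, theta) for a in range(n + 1))
    variance = sum(
        binomial_pmf(a, n, theta) * (score(a, n, theta) - mean) ** 2
        for a in range(n + 1)
    )
    return mean, variance

n, theta = 20, 0.3
mean, variance = score_moments(n, theta)
# The mean should be zero and the variance should match n / (theta * (1 - theta)),
# both up to floating-point rounding.
print(mean, variance, n / (theta * (1.0 - theta)))
```

Evaluating `score(6, 20, 0.3)` also gives zero up to rounding, consistent with the score vanishing at θ = A / (A + B).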

See also