Scoring algorithm


In statistics, Fisher's Scoring algorithm is a form of Newton's method used to solve maximum likelihood equations numerically.



Sketch of Derivation

Let Y_1,\ldots,Y_n be independent and identically distributed random variables with twice-differentiable p.d.f. f(y; θ), and suppose we wish to calculate the maximum likelihood estimator (M.L.E.) θ* of θ. First, suppose we have a starting point θ0 for our algorithm, and consider a Taylor expansion of the score function V(θ) (the gradient of the log-likelihood) about θ0:

V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0),

where

\mathcal{J}(\theta_0) = - \sum_{i=1}^n \left. \nabla \nabla^{\top} \right|_{\theta=\theta_0} \log f(Y_i ; \theta)

is the observed information matrix at θ0. Now, setting θ = θ*, using that V(θ*) = 0 and rearranging gives us:

\theta^* = \theta_{0} + \mathcal{J}^{-1}(\theta_{0})V(\theta_{0}).

We therefore use the algorithm

\theta_{m+1} = \theta_{m} + \mathcal{J}^{-1}(\theta_{m})V(\theta_{m}),

and under certain regularity conditions, it can be shown that \theta_m \rightarrow \theta^*.
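
As an illustration (not part of the original derivation), the following minimal Python sketch runs this iteration for an assumed i.i.d. Poisson(θ) sample, for which the score is V(θ) = Σ Y_i/θ − n and the observed information is \mathcal{J}(θ) = Σ Y_i/θ²; the model, data and function name are hypothetical.

    # Minimal sketch of the iteration theta_{m+1} = theta_m + J(theta_m)^{-1} V(theta_m)
    # for a hypothetical i.i.d. Poisson(theta) sample.
    import numpy as np

    def newton_scoring_poisson(y, theta0, tol=1e-10, max_iter=100):
        theta = theta0
        n, s = len(y), np.sum(y)
        for _ in range(max_iter):
            score = s / theta - n           # V(theta) = sum(y)/theta - n
            obs_info = s / theta ** 2       # J(theta) = sum(y)/theta^2, a scalar here
            step = score / obs_info         # J(theta)^{-1} V(theta)
            theta += step
            if abs(step) < tol:
                break
        return theta

    y = np.random.default_rng(0).poisson(3.0, size=500)
    print(newton_scoring_poisson(y, theta0=1.0))   # converges to the sample mean

Here the iteration converges to the sample mean, which is the M.L.E. of the Poisson mean.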

Fisher Scoring

In practice, \mathcal{J}(\theta) is usually replaced by \mathcal{I}(\theta)= \mathrm{E}[\mathcal{J}(\theta)], the Fisher information, thus giving us the Fisher scoring algorithm:

\theta_{m+1} = \theta_{m} + \mathcal{I}^{-1}(\theta_{m})V(\theta_{m}).
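
A common setting in which Fisher scoring is applied is logistic regression, where with the canonical logit link the expected and observed information coincide. The following Python sketch is an illustrative implementation under that assumed model; the simulated data and the function name are hypothetical, not taken from this article.

    # Minimal sketch of Fisher scoring for logistic regression:
    # beta_{m+1} = beta_m + I(beta_m)^{-1} V(beta_m),
    # with V(beta) = X^T (y - p) and I(beta) = X^T W X, W = diag(p(1-p)).
    import numpy as np

    def fisher_scoring_logistic(X, y, max_iter=25, tol=1e-8):
        beta = np.zeros(X.shape[1])
        for _ in range(max_iter):
            p = 1.0 / (1.0 + np.exp(-X @ beta))    # fitted probabilities
            score = X.T @ (y - p)                  # V(beta)
            W = p * (1.0 - p)                      # diagonal of the weight matrix
            info = X.T @ (X * W[:, None])          # I(beta) = X^T W X
            step = np.linalg.solve(info, score)    # I(beta)^{-1} V(beta)
            beta += step
            if np.linalg.norm(step) < tol:
                break
        return beta

    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(200), rng.normal(size=200)])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * X[:, 1]))))
    print(fisher_scoring_logistic(X, y))   # approximately recovers (0.5, 1.5)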


Application to Linear Models

The method of Fisher Scoring is often used in the theory of linear models. Suppose we have a standard linear model

Y = X\beta + \varepsilon,

where the errors \varepsilon_i \sim N(0,\sigma^2) are independent, and suppose we wish to estimate β. It can be shown[1] that

\beta^* \sim N(\beta, \sigma^2 \mathcal{I}^{-1}(\beta))

where β* is the M.L.E. of β. It is therefore desirable to find β* and use it as an estimator of β.
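
As a small illustration (using an assumed simulated design, not taken from the reference): for this model the score X^T(Y − Xβ)/σ² is linear in β and the information X^T X/σ² does not depend on β, so a single Fisher scoring step from any starting value returns the least-squares estimate (X^T X)^{-1} X^T Y. A minimal Python sketch:

    # Minimal sketch: one Fisher scoring step for the Gaussian linear model
    # lands on the least-squares / M.L.E. solution, regardless of the start.
    # (The simulated X, beta and sigma are illustrative assumptions.)
    import numpy as np

    rng = np.random.default_rng(2)
    n, p, sigma = 100, 3, 0.5
    X = rng.normal(size=(n, p))
    beta_true = np.array([1.0, -2.0, 0.5])
    Y = X @ beta_true + sigma * rng.normal(size=n)

    beta0 = np.zeros(p)                               # arbitrary starting value
    score = X.T @ (Y - X @ beta0) / sigma**2          # V(beta0)
    info = X.T @ X / sigma**2                         # I(beta0) = X^T X / sigma^2
    beta_star = beta0 + np.linalg.solve(info, score)  # one scoring step

    # Agrees with the ordinary least-squares estimate (X^T X)^{-1} X^T Y:
    print(np.allclose(beta_star, np.linalg.lstsq(X, Y, rcond=None)[0]))  # True

Because the update does not depend on the starting value, the algorithm converges in a single iteration for this model.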


References

  1. Davison, A. C. (2003). Statistical Models. Cambridge University Press.