MAP estimator


In statistics, a maximum a posteriori (MAP) estimate is obtained by maximizing the product of the likelihood function and a prior probability distribution, which is proportional to the posterior distribution of the parameter.

For example, if the density of the data is given by f, then the likelihood function is given by

L(\theta) = f(x_1,\dots,x_n \mid \theta).

When θ is unknown, the method of maximum likelihood uses the value of θ that maximizes L(θ) as an estimate of θ. This value is the maximum likelihood estimator (MLE) \widehat{\theta}_{MLE} of θ.

In contrast, the MAP estimator \widehat{\theta}_{MAP} of θ postulates the existence of an a priori distribution π(θ) and is given by

\widehat{\theta}_{MAP} \in \arg\max_{\theta} \pi(\theta) f(x_1, \dots, x_n \mid \theta).
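To make the contrast concrete, the following sketch (hypothetical data; NumPy and SciPy assumed available) computes both estimates numerically for a normal model with known variance and a normal prior on the mean, by maximizing the log-likelihood and the log-posterior respectively:

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Hypothetical data: IID draws from N(mu, sigma_v^2) with sigma_v known.
x = np.array([1.2, 0.7, 1.9, 1.1, 0.4])
sigma_v = 1.0   # known observation standard deviation
sigma_m = 2.0   # prior standard deviation; prior on mu is N(0, sigma_m^2)

def neg_log_likelihood(mu):
    # -log L(mu) = -sum_j log f(x_j | mu)
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma_v))

def neg_log_posterior(mu):
    # -[log pi(mu) + log L(mu)], the quantity whose minimizer is the MAP estimate
    return -norm.logpdf(mu, loc=0.0, scale=sigma_m) + neg_log_likelihood(mu)

mu_mle = minimize_scalar(neg_log_likelihood).x
mu_map = minimize_scalar(neg_log_posterior).x
print(mu_mle, mu_map)   # the MAP estimate is pulled toward the prior mean 0

The MAP estimate here is smaller in magnitude than the MLE because the N(0, σ_m²) prior shrinks the estimate toward zero.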

Example

Suppose that we are given a sequence (x_1, \dots, x_n) of IID N(\mu, \sigma_v^2) random variables and that the a priori distribution of μ is N(0, \sigma_m^2). We wish to find the MAP estimate of μ.

The function to be maximized is then given by

\pi(\mu) L(\mu) = \frac{1}{\sqrt{2 \pi}\, \sigma_m} \exp\left(-\frac{1}{2} \left(\frac{\mu}{\sigma_m}\right)^2\right) \prod_{j=1}^n \frac{1}{\sqrt{2 \pi}\, \sigma_v} \exp\left(-\frac{1}{2} \left(\frac{x_j - \mu}{\sigma_v}\right)^2\right),

and, after taking logarithms and discarding terms that do not depend on μ, maximizing this is equivalent to minimizing over μ the expression

\sum_{j=1}^n \left(\frac{x_j - \mu}{\sigma_v}\right)^2 + \left(\frac{\mu}{\sigma_m}\right)^2.
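Differentiating this expression with respect to μ and setting the derivative equal to zero gives the condition

\frac{\mu}{\sigma_m^2} = \sum_{j=1}^n \frac{x_j - \mu}{\sigma_v^2}.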

Solving for μ, we see that the MAP estimator for μ is given by

\hat{\mu}_{MAP} =     \frac{\sigma_m^2}{n \sigma_m^2 + \sigma_v^2 } \sum_{j=1}^n x_j.

Note that as \sigma_m \to \infty, \hat{\mu}_{MAP} \to \hat{\mu}_{MLE} = \frac{1}{n} \sum_{j=1}^n x_j, the sample mean.

The limiting case \sigma_m \to \infty corresponds to a non-informative prior and leads to an improper (ill-defined) a priori probability distribution.
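The closed-form expression and this limiting behaviour can be checked directly; in the sketch below (hypothetical data, NumPy assumed available) the MAP estimate approaches the sample mean, i.e. the MLE, as σ_m grows:

import numpy as np

x = np.array([1.2, 0.7, 1.9, 1.1, 0.4])   # hypothetical observations
n = len(x)
sigma_v = 1.0                              # known observation standard deviation

def mu_map(sigma_m):
    # Closed-form MAP estimate of the mean under an N(0, sigma_m^2) prior
    return sigma_m**2 / (n * sigma_m**2 + sigma_v**2) * np.sum(x)

mu_mle = np.mean(x)                        # the MLE of the mean is the sample mean
for sigma_m in (0.1, 1.0, 10.0, 1000.0):
    print(sigma_m, mu_map(sigma_m), mu_mle)
# As sigma_m grows, mu_map(sigma_m) approaches mu_mle (the flat-prior limit).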
