Bayes estimator


In decision theory and estimation theory, a Bayes estimator is an estimator or decision rule that maximizes the posterior expected value of a utility function or minimizes the posterior expected value of a loss function. (See also prior probability.)

Specifically, suppose an unknown parameter θ is known to have a prior distribution Π. Let δ = δ(x) be an estimator of θ (based on some measurements x), and let R(θ,δ) be a risk function, such as the mean squared error. The Bayes risk of δ is defined as E_Π{R(θ,δ)}, where the expectation is taken over the prior distribution Π of θ. An estimator δ is said to be a Bayes estimator if it minimizes the Bayes risk among all estimators.
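Equivalently, under standard regularity conditions the Bayes estimator can be found pointwise: for each observation x, one chooses the value that minimizes the posterior expected loss. Writing L(θ,θ̂) for the loss function, as in the examples below, this reads

\widehat{\theta}(x) = \arg\min_{\widehat{\theta}} \int L(\theta,\widehat{\theta})\, f(\theta \mid x)\, d\theta.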

If we take the mean squared error as the risk function, then it is not difficult to show that the Bayes estimate of the unknown parameter is simply the posterior mean,

\widehat{\theta}(x) = E[\theta \mid x] = \int \theta\, f(\theta \mid x)\, d\theta.

The posterior expected loss, in this case, is the posterior variance, and the Bayes risk is its expectation over the distribution of the data X.
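As a numerical sketch of this (not part of the original article; the Beta(3, 5) posterior and the grid settings below are illustrative assumptions), one can check that the posterior mean minimizes the posterior expected squared error:

import numpy as np
from scipy.stats import beta

# Illustrative assumption: the posterior f(theta|x) is Beta(3, 5).
a, b = 3.0, 5.0
grid = np.linspace(1e-6, 1 - 1e-6, 20001)  # grid over the support of theta
pdf = beta.pdf(grid, a, b)
dtheta = grid[1] - grid[0]

def posterior_expected_sq_loss(t):
    # Riemann-sum approximation of E[(theta - t)^2 | x].
    return np.sum((grid - t) ** 2 * pdf) * dtheta

candidates = np.linspace(0.01, 0.99, 981)  # candidate estimates, step 0.001
best = min(candidates, key=posterior_expected_sq_loss)

print("numerical minimizer:", best)            # approx. 0.375
print("posterior mean a/(a+b):", a / (a + b))  # exactly 3/8 = 0.375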

Other loss functions can be chosen, depending on how we measure the "distance" between the estimate and the unknown parameter. Some examples of these rules, together with the corresponding Bayes estimates, are given below (we denote the posterior cumulative distribution function by F):

1) A "linear" loss function, with a > 0, which yields the posterior median as the Bayes estimate:

L(\theta,\widehat{\theta}) = a|\theta-\widehat{\theta}|
F(\widehat{\theta}(x) \mid x) = \tfrac{1}{2}
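For instance (an illustrative sketch, reusing the assumed Beta(3, 5) posterior from above), the posterior median is just the point where the posterior CDF equals 1/2:

from scipy.stats import beta

# Illustrative assumption: Beta(3, 5) posterior.
median = beta.ppf(0.5, 3.0, 5.0)   # inverse CDF at 1/2
print(median)                      # approx. 0.364
print(beta.cdf(median, 3.0, 5.0))  # 0.5, i.e. F(median | x) = 1/2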

2) Another "linear" loss function, which assigns different "weights" a, b > 0 to underestimation and overestimation, respectively. It yields a quantile of the posterior distribution, and is a generalization of the previous loss function:

L(\theta,\widehat{\theta}) = \begin{cases} a|\theta-\widehat{\theta}| & \mbox{for } \theta-\widehat{\theta} \ge 0 \\ b|\theta-\widehat{\theta}| & \mbox{for } \theta-\widehat{\theta} < 0 \end{cases}
F(\widehat{\theta}(x) \mid x) = \frac{a}{a+b}
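Continuing the illustrative Beta(3, 5) sketch (the weights a = 3, b = 1 below are arbitrary assumptions): penalizing underestimation three times as heavily as overestimation moves the Bayes estimate up to the 3/4 posterior quantile:

from scipy.stats import beta

a_w, b_w = 3.0, 1.0           # assumed loss weights for under-/overestimation
q = a_w / (a_w + b_w)         # target posterior quantile, here 0.75
print(beta.ppf(q, 3.0, 5.0))  # 0.75-quantile of the Beta(3, 5) posterior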

3) The following loss function is trickier: it yields either the posterior mode, or a point close to it, depending on the curvature and properties of the posterior distribution. Small values of the parameter K > 0 are recommended, in order to use the mode as an approximation (L > 0):

L(\theta,\widehat{\theta}) = \begin{cases} 0 & \mbox{for } |\theta-\widehat{\theta}| < K \\ L & \mbox{for } |\theta-\widehat{\theta}| \ge K \end{cases}
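To see why (an illustrative sketch with the same assumed Beta(3, 5) posterior): the posterior expected loss equals L·(1 − P(|θ − θ̂| < K | x)), so minimizing it amounts to maximizing the posterior probability of the interval (θ̂ − K, θ̂ + K); for small K the maximizer sits essentially at the posterior mode:

import numpy as np
from scipy.stats import beta

# Illustrative assumption: Beta(3, 5) posterior; K is a small half-width.
a, b, K = 3.0, 5.0, 0.005
grid = np.linspace(0.0, 1.0, 10001)

# Posterior probability of (t - K, t + K) for each candidate estimate t;
# minimizing L * (1 - coverage) is the same as maximizing the coverage.
coverage = beta.cdf(grid + K, a, b) - beta.cdf(grid - K, a, b)
t_hat = grid[np.argmax(coverage)]

print("estimate:", t_hat)                    # close to the mode
print("exact mode:", (a - 1) / (a + b - 2))  # (a-1)/(a+b-2) = 1/3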

Other loss functions can be conceived, although the mean squared error is the most widely used and validated.

See also