Bayesian model comparison
A common problem in statistical inference is to use data to determine which of two competing models is more plausible. Frequentist statistics addresses this with hypothesis tests. There are several Bayesian approaches; one is through Bayes factors.
The posterior probability of a model given data, Pr(H|D), is given by Bayes' theorem:

    Pr(H|D) = Pr(D|H) Pr(H) / Pr(D).
The key data-dependent term Pr(D|H) is a likelihood, and is sometimes called the evidence for model H; evaluating it correctly is the key to Bayesian model comparison.
The evidence is usually the normalizing constant or partition function of another inference, namely the inference of the parameters of model H given the data D.
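Written out for a model H with parameter vector θ, Bayes' theorem for the parameters has the evidence as its normalizing constant:

    Pr(θ|D, H) = Pr(D|θ, H) Pr(θ|H) / Pr(D|H),

so the evidence is the marginal likelihood obtained by integrating the parameters out:

    Pr(D|H) = ∫ Pr(D|θ, H) Pr(θ|H) dθ.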
The plausibility of two different models H1 and H2, parametrised by model parameter vectors θ1 and θ2, is assessed by the Bayes factor

    K = Pr(D|H2) / Pr(D|H1)
      = ∫ Pr(D|θ2, H2) Pr(θ2|H2) dθ2 / ∫ Pr(D|θ1, H1) Pr(θ1|H1) dθ1.
Thus Bayesian model comparison does not depend on any single setting of the parameters of each model; instead, it considers the probability of the model averaged over all possible parameter values. Alternatively, the maximum likelihood estimate could be used for each parameter, which reduces the comparison to a classical likelihood ratio.
An advantage of Bayes factors is that they automatically, and quite naturally, include a penalty for excess model structure, and thus guard against overfitting.
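As a concrete illustration, here is a minimal Python sketch comparing two coin-flip models: H1 fixes the probability of heads at 1/2, while H2 gives it a uniform prior on [0, 1]. The models, data, and prior are illustrative assumptions, not taken from the article. The uniform-prior evidence has a closed form (a Beta function), so no numerical integration is needed; the binomial coefficient cancels because both models treat the flips as an ordered sequence.

    import math

    def log_evidence_fair(k, n):
        # Pr(D|H1): each of the n flips has probability 1/2, whatever k is
        return n * math.log(0.5)

    def log_evidence_uniform(k, n):
        # Pr(D|H2) = integral_0^1 p^k (1-p)^(n-k) dp = Beta(k+1, n-k+1),
        # evaluated with log-gamma for numerical stability
        return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)

    k, n = 62, 100  # assumed data: 62 heads in 100 flips
    log_K = log_evidence_uniform(k, n) - log_evidence_fair(k, n)
    print(f"Bayes factor K = Pr(D|H2)/Pr(D|H1) = {math.exp(log_K):.2f}")

Run as-is this prints K ≈ 2.21, weak evidence for the biased-coin model. With k = 50 it gives K ≈ 0.12: the simpler fair-coin model is preferred even though H2 contains it as a special case, which is the overfitting penalty at work.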
Another approach is to treat model comparison as a decision problem, computing the expected value or cost of each model choice.
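For instance, a minimal sketch of that calculation, where the evidences, equal model priors, and loss matrix are all assumed for illustration (none of these numbers come from the article):

    # Assumed inputs: evidences Pr(D|H) computed elsewhere, equal priors Pr(H),
    # and a made-up loss matrix; all values are illustrative.
    evidences = {"H1": 1.2e-31, "H2": 5.6e-31}
    priors = {"H1": 0.5, "H2": 0.5}

    # Posterior model probabilities via Bayes' theorem
    z = sum(evidences[h] * priors[h] for h in evidences)
    posterior = {h: evidences[h] * priors[h] / z for h in evidences}

    # loss[a][h]: cost of acting on model a when model h is true
    loss = {"H1": {"H1": 0.0, "H2": 10.0},
            "H2": {"H1": 1.0, "H2": 0.0}}

    # Pick the model with minimum posterior expected loss
    expected_loss = {a: sum(loss[a][h] * posterior[h] for h in posterior)
                     for a in loss}
    print(min(expected_loss, key=expected_loss.get))

Under a 0–1 loss this reduces to choosing the model with the highest posterior probability.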
Another approach is to use Minimum Message Length (MML).
See also
- Akaike information criterion
- Schwarz's Bayesian information criterion
- Conditional predictive ordinate
- Wallace's Minimum Message Length (MML)
- Model selection
References
- Gelman, A., Carlin, J., Stern, H., and Rubin, D. (1995). Bayesian Data Analysis. Chapman and Hall/CRC.
- Bernardo, J., and Smith, A.F.M. (1994). Bayesian Theory. John Wiley.
- Lee, P.M. (1989). Bayesian Statistics. Arnold.
- Denison, D.G.T., Holmes, C.C., Mallick, B.K., and Smith, A.F.M. (2002). Bayesian Methods for Nonlinear Classification and Regression. John Wiley.
- Duda, R.O., Hart, P.E., and Stork, D.G. (2000). Pattern Classification (2nd edition), Section 9.6.5, pp. 487–489. Wiley. ISBN 0-471-05669-3.
- Jaynes, E.T. (1994). Probability Theory: The Logic of Science, Chapter 24.
- MacKay, D.J.C. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press. ISBN 0-521-64298-1. (Also available online.)
External links
- The online textbook Information Theory, Inference, and Learning Algorithms, by David J.C. MacKay, discusses Bayesian model comparison in Chapter 28, p. 343.