Bayes factor

From Wikipedia, the free encyclopedia

In statistics, the use of Bayes factors is a Bayesian alternative to classical hypothesis testing^[1]^[2].

Consider a model selection problem in which we have to choose between two models M₁ and M₂ on the basis of a data vector x. The Bayes factor K is given by

$K = \frac{p(x|M_1)}{p(x|M_2)}.$

This is similar to a likelihood-ratio test, but instead of maximising the likelihood, Bayesians average it over the parameters. Generally, the models M₁ and M₂ will be parametrised by vectors of parameters θ₁ and θ₂; thus K is given by

$K = \frac{p(x|M_1)}{p(x|M_2)} = \frac{\int \,p(\theta_1|M_1)p(x|\theta_1, M_1)d\theta_1}{\int \,p(\theta_2|M_2)p(x|\theta_2, M_2)d\theta_2}.$

The logarithm of K is sometimes called the weight of evidence given by x for M₁ over M₂, measured in bits, nats, or bans, according to whether the logarithm is taken to base 2, base e, or base 10.
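The choice of unit only changes the base of the logarithm, which can be sketched in a few lines of Python. The function name `weight_of_evidence` and the unit labels are illustrative choices made here, not notation from the literature.

```python
import math

# Logarithm base associated with each unit named in the text.
LOG_BASES = {"bits": 2.0, "nats": math.e, "bans": 10.0}

def weight_of_evidence(k, unit="bans"):
    """Return log(K) for M1 over M2 in the requested unit."""
    if k <= 0:
        raise ValueError("a Bayes factor must be positive")
    # Change of base: log_b(K) = ln(K) / ln(b).
    return math.log(k) / math.log(LOG_BASES[unit])

if __name__ == "__main__":
    for unit in LOG_BASES:
        print(unit, weight_of_evidence(8.0, unit))
```

A value of K below 1 gives a negative weight of evidence, that is, evidence for M₂ over M₁.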

A value of K > 1 means that M₁ is more strongly supported by the data than M₂. Note that classical hypothesis testing gives one hypothesis (or model) preferred status (the 'null hypothesis'), and only considers evidence against it. Harold Jeffreys gave a scale for interpretation of K:

K	Weight of evidence (decibans)	Strength of evidence
< 1:1	< 0	Negative (supports M₂)
1:1 to 3:1	0 to 5	Barely worth mentioning
3:1 to 12:1	5 to 11	Positive
12:1 to 150:1	11 to 22	Strong
> 150:1	> 22	Very strong

The second column gives the corresponding weights of evidence in decibans (tenths of a power of 10). According to I. J. Good, a change in a weight of evidence of 1 deciban (i.e. a change in an odds ratio from evens to about 55:45) is about as finely as humans can reasonably perceive their degree of belief in a hypothesis in everyday use.
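As a rough illustration, the scale can be turned into a lookup. This is a minimal sketch: the names `decibans` and `strength_of_evidence` are hypothetical, and because the table does not say which band a boundary value such as exactly 3:1 belongs to, the code assumes each band includes its lower bound.

```python
import math

# (exclusive upper bound on K, label) for every band except the last one.
# Assumption: each band includes its lower bound and excludes its upper bound.
SCALE = [
    (1.0, "Negative (supports M2)"),
    (3.0, "Barely worth mentioning"),
    (12.0, "Positive"),
    (150.0, "Strong"),
]

def decibans(k):
    """Weight of evidence in decibans: ten times the base-10 logarithm of K."""
    return 10.0 * math.log10(k)

def strength_of_evidence(k):
    """Return the label of the band that contains K."""
    for upper, label in SCALE:
        if k < upper:
            return label
    return "Very strong"

if __name__ == "__main__":
    for k in (0.5, 2.0, 5.0, 50.0, 500.0):
        print(k, round(decibans(k), 1), strength_of_evidence(k))
```

Rounding `decibans(3)`, `decibans(12)` and `decibans(150)` to whole numbers reproduces the band edges 5, 11 and 22 in the second column.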

The use of Bayes factors or classical hypothesis testing takes place in the context of inference rather than decision-making under uncertainty. That is, we merely wish to find out which hypothesis is true, rather than to make a decision on the basis of this information. Frequentist statistics draws a strong distinction between these two because hypothesis tests are not coherent in the Bayesian sense. Bayesian procedures, including Bayes factors, are coherent, so there is no need to draw such a distinction. Inference is then simply regarded as a special case of decision-making under uncertainty in which the resulting action is to report a value. In a decision-making context, Bayesian statisticians might use a Bayes factor as part of making a choice, but would also combine it with a prior distribution and a loss function associated with making the wrong choice. In an inference context, the loss function would take the form of a scoring rule. Use of a logarithmic score function, for example, leads to the expected utility taking the form of the Kullback-Leibler divergence. If the logarithms are to the base 2, this is equivalent to Shannon information.
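For discrete distributions, the Kullback-Leibler divergence mentioned above can be computed directly. The sketch below is illustrative only: the function name `kl_divergence` is made up here, and it assumes both inputs are probability vectors of equal length with q positive wherever p is positive.

```python
import math

def kl_divergence(p, q, base=2.0):
    """D_KL(p || q) for discrete distributions given as equal-length sequences.

    Terms with p_i == 0 contribute nothing. The result is in bits when base
    is 2, matching the base-2 case mentioned in the text.
    """
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

if __name__ == "__main__":
    # Reporting a fair coin when the outcome is in fact certain.
    print(kl_divergence([1.0, 0.0], [0.5, 0.5]))
```

Under a logarithmic score, this quantity is the expected penalty for reporting q when outcomes are actually distributed according to p. It is zero only when the two distributions agree.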

Example

Suppose we have a random variable which produces either a success or a failure. We want to compare a model M₁ where the probability of success is q = ½, and another model M₂ where q is completely unknown and we take a prior distribution for q which is uniform on [0,1]. We take a sample of 200, and find 115 successes and 85 failures. The likelihood is

${{200 \choose 115}q^{115}(1-q)^{85}}.$

So we have

$P(X=115|M_1)={200 \choose 115}\left({1 \over 2}\right)^{200}=0.00595...,\,$

but

$P(X=115|M_2)=\int_{q=0}^1{200 \choose 115}q^{115}(1-q)^{85}dq = {1 \over 201} = 0.00497...\,.$

The ratio is then 1.197..., which is "barely worth mentioning" even if it points very slightly towards M₁.
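The two marginal likelihoods and their ratio can be checked with exact rational arithmetic. This is a minimal sketch using only the Python standard library, and the variable names are illustrative. It relies on the standard Beta-integral identity, under which the integral of the binomial likelihood over a uniform prior equals 1/(n + 1) whatever the number of successes.

```python
from fractions import Fraction
from math import comb, factorial

n, successes = 200, 115
failures = n - successes

# M1: q is fixed at 1/2, so the marginal likelihood is the binomial probability.
p_m1 = Fraction(comb(n, successes), 2 ** n)

# M2: uniform prior on q. The integral of q^s (1-q)^f over [0, 1] is
# s! f! / (s + f + 1)!, so the marginal likelihood collapses to 1 / (n + 1).
beta_integral = Fraction(factorial(successes) * factorial(failures),
                         factorial(n + 1))
p_m2 = comb(n, successes) * beta_integral

bayes_factor = p_m1 / p_m2

if __name__ == "__main__":
    print(float(p_m1))          # the text reports 0.00595...
    print(float(p_m2))          # the text reports 0.00497...
    print(float(bayes_factor))  # the text reports 1.197...
```

Because `p_m2` equals 1/201 regardless of the observed count, the Bayes factor here is simply 201 times the binomial probability under M₁.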

This is not the same as a classical likelihood ratio test, which would have found the maximum likelihood estimate for q, namely ¹¹⁵⁄₂₀₀ = 0.575, and from that get a ratio of 0.1045..., and so pointing towards M₂. Alternatively, Edwards's "exchange rate" of two units of likelihood per degree of freedom suggests that M₂ is preferable (just) to M₁, as $0.1045\ldots = e^{-2.25\ldots}$ and $2.25 > 2$: the extra likelihood compensates for the unknown parameter in M₂.
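The maximised likelihood ratio can be reproduced as follows. This is a sketch with illustrative variable names. It works on the log scale, where the binomial coefficient cancels between numerator and denominator.

```python
import math

n, successes = 200, 115
failures = n - successes

# Maximum likelihood estimate of q under M2.
q_hat = successes / n

# Log-likelihoods without the binomial coefficient, which is common to both.
log_lik_m1 = n * math.log(0.5)
log_lik_m2 = successes * math.log(q_hat) + failures * math.log(1.0 - q_hat)

# Ratio of the M1 likelihood to the maximised M2 likelihood.
log_ratio = log_lik_m1 - log_lik_m2
ratio = math.exp(log_ratio)

if __name__ == "__main__":
    print(q_hat)       # 0.575
    print(ratio)       # the text reports 0.1045...
    print(-log_ratio)  # log-likelihood units gained by M2; the text reports 2.25...
```

The last quantity is what gets compared with the threshold of 2 units per extra parameter. It exceeds the threshold only narrowly, which is why M₂ is preferred "just".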

A frequentist hypothesis test of M₁ (here considered as a null hypothesis) would have produced a more dramatic result, saying that M₁ could be rejected at the 5% significance level, since the probability of getting 115 or more successes from a sample of 200 if q = ½ is 0.0200..., and the two-tailed probability of getting a figure as extreme as or more extreme than 115 is 0.0400... Note that 115 is more than two standard deviations away from 100.
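The tail probabilities can be obtained by summing the binomial distribution exactly. The sketch below uses only the standard library, with illustrative variable names. It doubles the upper tail to get the two-tailed figure, which is valid here because the distribution is symmetric when q = ½.

```python
from fractions import Fraction
from math import comb, sqrt

n, observed = 200, 115

# P(X >= 115) under M1 (q = 1/2): sum the upper tail exactly.
upper_tail = Fraction(sum(comb(n, k) for k in range(observed, n + 1)), 2 ** n)

# With q = 1/2 the distribution is symmetric about n/2, so the two-tailed
# probability of a result at least as extreme as 115 is twice the upper tail.
two_tailed = 2 * upper_tail

# Distance of the observation from the mean, in standard deviations.
mean = n / 2
std_dev = sqrt(n * 0.5 * 0.5)
z = (observed - mean) / std_dev

if __name__ == "__main__":
    print(float(upper_tail))  # the text reports 0.0200...
    print(float(two_tailed))  # the text reports 0.0400...
    print(z)                  # a little over 2 standard deviations
```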

M₂ is a more complex model than M₁ because it has a free parameter which allows it to model the data more closely. The ability of Bayes factors to take this into account is a reason why Bayesian inference has been put forward as a theoretical justification for and generalisation of Occam's razor, reducing Type I errors.

See also

Bayesian model comparison

References

1. Goodman S (1999). "Toward evidence-based medical statistics. 1: The P value fallacy". Ann Intern Med 130 (12): 995-1004. PMID 10383371.
2. Goodman S (1999). "Toward evidence-based medical statistics. 2: The Bayes factor". Ann Intern Med 130 (12): 1005-13. PMID 10383350.