Beta-binomial model

In empirical Bayes methods, the Beta-binomial model is an analytic model in which the likelihood function L(x | θ) is specified by a binomial distribution

L(x|\theta) = \operatorname{Bin}(x,\theta)\,
 = {n\choose x}\theta^x(1-\theta)^{n-x}\,

and the conjugate prior π(θ | α,β) is a Beta distribution

\pi(\theta|\alpha,\beta) = \mathrm{Beta}(\alpha,\beta)\,
 = \frac{1}{\mathrm{B}(\alpha,\beta)} \theta^{\alpha-1}(1-\theta)^{\beta-1}.
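
For concreteness, the two densities can be evaluated numerically. The following Python sketch (illustrative, not part of the original article) uses SciPy; the values of n, x, \alpha and \beta are arbitrary example choices.

    # Illustrative sketch: evaluating the binomial likelihood and the Beta prior.
    # The values of n, x, alpha (a) and beta (b) are arbitrary examples.
    from scipy.stats import binom, beta

    n, x = 20, 7        # number of trials and observed successes (example values)
    a, b = 2.0, 6.0     # Beta prior hyperparameters (example values)
    theta = 0.3

    likelihood = binom.pmf(x, n, theta)   # L(x | theta)
    prior = beta.pdf(theta, a, b)         # pi(theta | alpha, beta)
    print(likelihood, prior)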

The Beta-binomial is a two-dimensional Multivariate Polya distribution, as the binomial and Beta distributions are special cases of the multinomial and Dirichlet distributions, respectively.

Derivation of the posterior and the marginal

It is convenient to reparameterize the distributions so that the mean of the prior is a single parameter. Let

\pi(\theta|\mu,M) = \mathrm{Beta}(\mu,M)\,
 = \frac{\Gamma(M)}{\Gamma(M\mu)\Gamma(M(1-\mu))} \theta^{M\mu-1}(1-\theta)^{M(1-\mu)-1}

where

 \mu = \frac{\alpha}{\alpha+\beta}\, and  M = \alpha+\beta\,

so that

E(\theta|\mu,M) = \mu\,
 Var(\theta|\mu,M) = \frac{\mu(1-\mu)}{M+1}
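
As an illustrative numerical example (not from the original text), the prior with \alpha = 2 and \beta = 6 corresponds to

 \mu = \frac{2}{2+6} = 0.25, \qquad M = 8, \qquad Var(\theta|\mu,M) = \frac{0.25 \times 0.75}{8+1} \approx 0.0208,

which agrees with the usual beta variance \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)} = \frac{12}{576}.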

The posterior distribution ρ(θ | x) is also a beta distribution

 \rho(\theta|x) \propto L(x|\theta)\,\pi(\theta|\mu,M)
 = \mathrm{Beta}(x+M\mu,\ n-x+M(1-\mu)) \,
 = \frac{\Gamma(M)}{\Gamma(M\mu)\Gamma(M(1-\mu))}{n\choose x}\theta^{x+M\mu-1}(1-\theta)^{n-x+M(1-\mu)-1}\,

while the marginal distribution m(x | μ,M) is given by

 m(x|\mu,M) = \int_{0}^{1} L(x|\theta)\,\pi(\theta|\mu, M)\, d\theta
 = \frac{\Gamma(M)}{\Gamma(M\mu)\Gamma(M(1-\mu))}{n\choose x} \int_{0}^{1} \theta^{x+M\mu-1}(1-\theta)^{n-x+M(1-\mu)-1}\, d\theta
 = \frac{\Gamma(M)}{\Gamma(M\mu)\Gamma(M(1-\mu))}{n\choose x} \frac{\Gamma(x+M\mu)\,\Gamma(n-x+M(1-\mu))}{\Gamma(n+M)}
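
The marginal m(x|\mu,M) is the beta-binomial probability mass function. The following Python sketch (illustrative, not part of the original article) evaluates it directly from the closed form above, using log-gamma functions for numerical stability; the parameter values are arbitrary examples.

    # Illustrative sketch: the marginal m(x | mu, M), i.e. the beta-binomial pmf,
    # computed from the closed-form expression derived above.
    from math import lgamma, exp, comb

    def beta_binomial_pmf(x, n, mu, M):
        """Marginal m(x | mu, M) under the reparameterized Beta(mu, M) prior."""
        a, b = M * mu, M * (1.0 - mu)      # back to the usual (alpha, beta)
        log_ratio = (lgamma(M) - lgamma(a) - lgamma(b)
                     + lgamma(x + a) + lgamma(n - x + b) - lgamma(n + M))
        return comb(n, x) * exp(log_ratio)

    # Example values; the pmf should sum to 1 over x = 0, ..., n.
    n, mu, M = 10, 0.25, 8.0
    print(sum(beta_binomial_pmf(x, n, mu, M) for x in range(n + 1)))  # ~1.0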


Moment estimates

Because the marginal likelihood is a complicated, non-linear function of gamma and digamma functions, it is difficult to obtain marginal maximum likelihood estimates (MMLE) for the mean and variance parameters \mu and M in closed form. Instead, we use the method of iterated expectations to compute the moments of the marginal distribution, which can then be matched to their sample counterparts.

Let us write our model as a two-stage compound sampling model, where for each group i = 1, \ldots, N we observe X_{i} successes out of n_{i} trials:

 X_{i}|\theta_{i} \sim \mathrm{Bin}(n_{i}, \theta_{i})\,
 \theta_{i} \sim \mathrm{Beta}(\mu,M),\ \mathrm{i.i.d.}\,

We can find iterated moment estimates for the mean and variance using the moments for the distributions in the two-stage model:

E\left(\frac{X}{n}\right) = E\left[E\left(\frac{X}{n}|\theta\right)\right] = E(\theta) = \mu
\mathrm{var}\left(\frac{X}{n}\right) = E\left[\mathrm{var}\left(\frac{X}{n}|\theta\right)\right] + \mathrm{var}\left[E\left(\frac{X}{n}|\theta\right)\right]
 = E\left[\left(\frac{1}{n}\right)\theta(1-\theta)\Big|\mu,M\right] + \mathrm{var}\left(\theta|\mu,M\right)
 = \frac{\mu(1-\mu)}{n} + \frac{n-1}{n}\,\frac{\mu(1-\mu)}{M+1}
 = \frac{\mu(1-\mu)}{n}\left(1+\frac{n-1}{M+1}\right).
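
A quick way to sanity-check these two formulas is to simulate the two-stage model and compare the empirical moments of X/n with the expressions above. The following Python sketch (illustrative only; all parameter values are arbitrary) does this with NumPy.

    # Illustrative sketch: Monte Carlo check of the marginal mean and variance
    # of X/n under the two-stage (compound sampling) model.
    import numpy as np

    rng = np.random.default_rng(0)
    mu, M, n, reps = 0.25, 8.0, 20, 200_000    # example values

    theta = rng.beta(M * mu, M * (1 - mu), size=reps)   # theta_i ~ Beta(mu, M)
    x = rng.binomial(n, theta)                          # X_i | theta_i ~ Bin(n, theta_i)
    p = x / n

    print(p.mean(), mu)                                           # both ~0.25
    print(p.var(), mu * (1 - mu) / n * (1 + (n - 1) / (M + 1)))   # should agree closely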

We now seek a point estimate \tilde{\theta_{i}} as a weighted average of the sample estimate \hat{\theta_{i}} and an estimate \hat{\mu} of the prior mean \mu. The sample estimate \hat{\theta_{i}} is given by

\hat{\theta_{i}} = \frac{x_{i}}{n_{i}}\,

Therefore we need point estimates for μ and M. The estimated mean \hat{\mu} is given as a weighted average

\hat{\mu} = \frac{\sum_{i=1}^{N} n_{i} \hat{\theta_{i}} }{\sum_{i=1}^{N} n_{i} } = \frac{\sum_{i=1}^{N} x_{i} }{\sum_{i=1}^{N} n_{i} }

The hyperparameter M is obtained using the moment estimates for the variance of the two-stage model:

s^{2} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{var}\left(\frac{x_{i}}{n_{i}}\right) = \frac{1}{N} \sum_{i=1}^{N} \frac{\hat{\mu}(1-\hat{\mu})}{n_{i}} \left[1+\frac{n_{i}-1}{\hat{M}+1}\right],

where s^{2} is estimated empirically as the weighted sample variance of the \hat{\theta_{i}},

s^{2} = \frac{N \sum_{i=1}^{N} n_{i} (\hat{\theta_{i}} - \hat{\mu})^{2} }{(N-1)\sum_{i=1}^{N} n_{i} }.

We can now solve for \hat{M}:

\hat{M} = \frac{\hat{\mu}(1-\hat{\mu}) - s^{2}}{s^{2} - \frac{\hat{\mu}(1 - \hat{\mu})}{N}\sum_{i=1}^{N} 1/n_{i} }.

Given point estimates for the prior hyperparameters, we can now plug in these values to find a point estimate for the posterior mean

\tilde{\theta_{i}} = E(\theta_{i}|x_{i},\hat{\mu},\hat{M}) = \frac{x_{i} + \hat{M}\hat{\mu}}{n_{i}+\hat{M}} = \frac{\hat{M}}{n_{i}+\hat{M}}\hat{\mu} + \frac{n_{i}}{n_{i}+\hat{M}}\frac{x_{i}}{n_{i}}
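
Putting the pieces together, the following Python sketch (illustrative only; the counts are invented) computes \hat{\mu}, s^{2}, \hat{M} and the shrunken point estimates \tilde{\theta_{i}} for a small set of groups. Note that the moment estimator can return a negative \hat{M} when the data show no overdispersion relative to the binomial.

    # Illustrative sketch: method-of-moments empirical Bayes estimates for
    # N groups with x_i successes out of n_i trials (invented example data).
    import numpy as np

    x = np.array([2.0, 20.0, 1.0, 15.0, 3.0])     # successes per group
    n = np.array([12.0, 40.0, 8.0, 25.0, 15.0])   # trials per group
    N = len(n)

    theta_hat = x / n                     # per-group sample proportions
    mu_hat = x.sum() / n.sum()            # weighted mean, hat{mu}

    # Weighted sample variance s^2 of the theta_hat around mu_hat.
    s2 = N * np.sum(n * (theta_hat - mu_hat) ** 2) / ((N - 1) * n.sum())

    q = mu_hat * (1.0 - mu_hat)
    M_hat = (q - s2) / (s2 - (q / N) * np.sum(1.0 / n))   # solve for hat{M}

    # Posterior point estimates (posterior means) for each group.
    theta_tilde = (x + M_hat * mu_hat) / (n + M_hat)
    print(mu_hat, M_hat)
    print(theta_tilde)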

Shrinkage factors

We may write the posterior estimate as a weighted average:

\tilde{\theta_{i}} = \hat{B_{i}}\hat{\mu} + (1-\hat{B_{i}})\hat{\theta_{i}}

where \hat{B_{i}} is called the shrinkage factor,

\hat{B_{i}} = \frac{\hat{M}}{\hat{M}+n_{i}}.
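
For instance (illustrative numbers only), with \hat{M} = 8 a large group with n_{i} = 40 has \hat{B_{i}} = 8/48 \approx 0.17, so its estimate stays close to the observed proportion x_{i}/n_{i}, while a small group with n_{i} = 2 has \hat{B_{i}} = 0.8 and is pulled strongly toward the pooled mean \hat{\mu}.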

Example

Maximum likelihood estimation

Maximum likelihood estimates from empirical data can be computed using the general methods for fitting multinomial Polya distributions described in Minka (2003).
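
As a hedged sketch of what such a fit can look like (this is the two-category specialization of the fixed-point iteration for Polya distributions, written here from the general description rather than copied from Minka 2003), one can iterate the following updates on the standard parameters \alpha and \beta; \hat{\mu} and \hat{M} are then \alpha/(\alpha+\beta) and \alpha+\beta.

    # Hedged sketch: fixed-point maximum likelihood updates for a beta-binomial
    # model, the two-category case of the Polya fixed-point iteration.
    import numpy as np
    from scipy.special import digamma

    def beta_binomial_mle(x, n, a=1.0, b=1.0, iters=1000, tol=1e-10):
        x, n = np.asarray(x, float), np.asarray(n, float)
        for _ in range(iters):
            denom = np.sum(digamma(n + a + b) - digamma(a + b))
            a_new = a * np.sum(digamma(x + a) - digamma(a)) / denom
            b_new = b * np.sum(digamma(n - x + b) - digamma(b)) / denom
            converged = abs(a_new - a) < tol and abs(b_new - b) < tol
            a, b = a_new, b_new
            if converged:
                break
        return a, b   # alpha_hat, beta_hat

    # Example (invented data): overdispersed counts from several groups.
    print(beta_binomial_mle([2, 20, 1, 15, 3], [12, 40, 8, 25, 15]))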

Improved estimates: James-Stein estimator

External links

  • Fastfit contains Matlab code for fitting Beta-Binomial distributions (in the form of two-dimensional Polya distributions) to data.

References

  • Minka, Thomas P. (2003). Estimating a Dirichlet distribution. Microsoft Technical Report.