Metropolis-Hastings algorithm

From Wikipedia, the free encyclopedia

The Proposal distribution Q proposes the next point that the random walk might move to.
Enlarge
The Proposal distribution Q proposes the next point that the random walk might move to.

In mathematics and physics, the Metropolis-Hastings algorithm is an algorithm used to generate a sequence of samples from a probability distribution that is difficult to directly sample from. This sequence can be used in Markov chain Monte Carlo to approximate the distribution (as with a histogram), or to compute an integral (such as an expected value). It is named after Nicholas Metropolis, who published it in 1953 for the specific case of the Boltzmann distribution, and W.K. Hastings,[1] who generalized it in 1970. The Gibbs sampling algorithm is a special case of the Metropolis-Hastings algorithm.

The Metropolis-Hastings algorithm can draw samples from any probability distribution p(x), requiring only that the density can be calculated at x. The algorithm generates a Markov chain in which each state xt depends only on the previous state xt − 1. The algorithm uses a proposal density Q(x';xt), which depends on the current state xt, to generate a new proposed sample x'. This proposal is 'accepted' as the next value (xt + 1=x') if u drawn from U(0,1) is

u < \frac{P(x')Q(x^t|x')}{P(x^t)Q(x'|x^t)}

For example, the proposal density could be a Gaussian function centred on the current state xt

Q( x'; x^t ) \sim N( x^t, \sigma^2 I)

reading Q(x';xt) as the probability density function for x' given the previous value xt. This proposal density would generate samples centred around the current state with variance σ2I. The original Metropolis algorithm calls for the proposal density to be symmetric ( Q(x;y) = Q(y;x) ); generalization by Hastings lifts this restriction. It is allowed for Q(x',xt) not to depend on x' at all, in which case the algorithm is called "Independence Chain Metropolis-Hastings" ( as opposed to "Random Walk Metropolis-Hastings" ). Independence chain M-H algorithm with suitable proposal density function can offer higher accuracy than random walk version, but it requires some a priori knowledge of the distribution.

Now, we draw a new proposal state x' with probability Q(x';xt) and then calculate a value

a = a_1 a_2\,

where

a_1 = \frac{P(x')}{P(x^t)}

is the likelihood ratio between the proposed sample x' and the previous sample xt, and

a_2 = \frac{Q( x^t; x' )}{Q(x';x^t)}

is the ratio of the proposal density in two directions (from xt to x' and vice versa). This is equal to 1 if the proposal density is symmetric. Then the new state xt + 1 is chosen with the rule

x^{t+1} = \left \{  \begin{matrix}   x'                            & \mbox{if }a > 1 \\   x'\mbox{ with probability }a, & \mbox{if }a < 1  \end{matrix} \right.

The Markov chain is started from a random initial value x0 and the algorithm is run for many iterations until this initial state is "forgotten". These samples, which are discarded, are known as burn-in. The algorithm works best if the proposal density matches the shape of the target distribution p(x), that is Q(x'; x^t) \approx p(x'), but in most cases this is unknown. If a Gaussian proposal is used the variance parameter σ2 has to be tuned during the burn-in period. This is usually done by calculating the acceptance rate, which is the fraction of proposed samples that is accepted in a window of the last N samples. This is usually set to be around 60%. If the proposal steps are too small the chain will mix slowly (i.e., it will move around the space slowly and converge slowly to p(x)). If the proposal steps are too large the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density so a1 will be very small.

[edit] See also

[edit] References

  • Bernd A. Berg. "Markov Chain Monte Carlo Simulations and Their Statistical Analysis". Singapore, World Scientific 2004.
  • Chib, Siddhartha and Edward Greenberg: "Understanding the Metropolis–Hastings Algorithm". American Statistician, 49(4), 327–335, 1995
  • W.K. Hastings. "Monte Carlo Sampling Methods Using Markov Chains and Their Applications", Biometrika, 57(1):97-109, 1970.
  • N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. "Equations of State Calculations by Fast Computing Machines". Journal of Chemical Physics, 21(6):1087-1092, 1953. [2]

[edit] External links

In other languages