Variational Bayesian methods

Variational Bayesian methods, also called ensemble learning, are a family of techniques for approximating intractable integrals arising in Bayesian statistics and machine learning. They can be used to lower-bound the marginal likelihood (i.e. the "evidence") of several models with a view to performing model selection, and they often provide an analytical approximation to the parameter posterior, which is useful for prediction.

Mathematical derivation

In variational inference, the posterior distribution over a set of latent variables X = \{X_1 \dots X_n\} given some data D is approximated by a variational distribution

P(X|D) \approx Q(X).

The variational distribution Q(X) is restricted to belong to a family of distributions of simpler form than P(X | D). This family is chosen so that Q can be made very similar to the true posterior. The difference between Q and the true posterior is measured by a dissimilarity function d(Q; P), and inference is performed by selecting the distribution Q that minimises d. One choice of dissimilarity function for which this minimisation is tractable is the Kullback–Leibler divergence (KL divergence), defined as

KL(Q || P) = \sum_X  Q(X) \log \frac{Q(X)}{P(X|D)}.
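For instance, when the latent variables take values on a finite set, the sum above can be evaluated directly. The following sketch (with hypothetical example distributions q and p_post, not taken from any particular model) computes KL(Q || P) for discrete distributions:

import numpy as np

def kl_divergence(q, p):
    """KL(Q || P) = sum_x Q(x) * log(Q(x) / P(x)) for discrete distributions.

    Terms with Q(x) = 0 contribute 0 by convention; assumes P(x) > 0
    wherever Q(x) > 0.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Hypothetical example: a posterior P(X|D) over three latent states
# and a candidate approximation Q(X).
p_post = np.array([0.7, 0.2, 0.1])
q = np.array([0.6, 0.3, 0.1])
print(kl_divergence(q, p_post))  # positive; zero only when Q equals P

The divergence is zero exactly when Q matches the posterior, and positive otherwise.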

We can write the log evidence as

\log P(D) = KL(Q||P) - \sum_X Q(X) \log \frac{Q(X)}{P(X,D)} = KL(Q||P) + \mathcal{L}(Q).
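
This decomposition follows from the identity P(X|D) = P(X,D) / P(D) together with the fact that Q is a probability distribution, so \sum_X Q(X) = 1:

KL(Q || P) = \sum_X Q(X) \log \frac{Q(X) P(D)}{P(X,D)} = \sum_X Q(X) \log \frac{Q(X)}{P(X,D)} + \log P(D),

and rearranging gives the expression above, where

\mathcal{L}(Q) = -\sum_X Q(X) \log \frac{Q(X)}{P(X,D)} = \sum_X Q(X) \log \frac{P(X,D)}{Q(X)}

is known as the negative variational free energy, or evidence lower bound.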

As the log evidence \log P(D) is fixed with respect to Q, maximising the final term \mathcal{L}(Q) minimises the KL divergence between Q and P. Because the KL divergence is non-negative, \mathcal{L}(Q) is also a lower bound on the log evidence. By an appropriate choice of Q, \mathcal{L}(Q) can be made tractable to compute and to maximise. Hence we obtain both a lower bound \mathcal{L}(Q) on the log evidence and an analytical approximation Q to the posterior.
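
To illustrate the bound itself, the following sketch assumes a single discrete latent variable and a hypothetical joint table p_joint for P(X, D). It evaluates \mathcal{L}(Q) for a few candidate distributions Q and compares it with \log P(D); the gap is exactly KL(Q || P), so the bound is tight only when Q equals the true posterior:

import numpy as np

# Hypothetical joint probabilities P(X = x, D) for a single discrete latent
# variable X with three states and a fixed observed dataset D.
p_joint = np.array([0.14, 0.04, 0.02])

log_evidence = np.log(p_joint.sum())     # log P(D)
p_posterior = p_joint / p_joint.sum()    # exact posterior P(X | D)

def elbo(q, p_joint):
    """L(Q) = sum_x Q(x) * log(P(x, D) / Q(x)), a lower bound on log P(D)."""
    q = np.asarray(q, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(p_joint[mask] / q[mask])))

# Candidate approximations Q(X); the bound is tight only at the true posterior.
for q in [np.array([1/3, 1/3, 1/3]), np.array([0.5, 0.3, 0.2]), p_posterior]:
    gap = log_evidence - elbo(q, p_joint)   # equals KL(Q || P), always >= 0
    print(f"L(Q) = {elbo(q, p_joint):.4f}, log P(D) = {log_evidence:.4f}, KL = {gap:.4f}")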
