Variational Bayesian methods

Variational Bayesian methods, also called ensemble learning, are a family of techniques for approximating intractable integrals arising in Bayesian statistics and machine learning. They can be used to lower-bound the marginal likelihood (i.e. the "evidence") of several models with a view to performing model selection, and they often provide an analytical approximation to the parameter posterior, which is useful for prediction.

Mathematical derivation

In variational inference, the posterior distribution over a set of latent variables X = \{X_1 \dots X_n\} given some data D is approximated by a variational distribution

P(X|D) \approx Q(X).

The variational distribution Q(X) is restricted to belong to a family of distributions of simpler form than P(X | D). This family is chosen so that Q can be made very similar to the true posterior. The difference between Q and the true posterior is measured by a dissimilarity function d(Q; P), and inference is performed by selecting the distribution Q that minimises d. One choice of dissimilarity function for which this minimisation is tractable is the Kullback–Leibler divergence (KL divergence), defined as

KL(Q || P) = \sum_X  Q(X) \log \frac{Q(X)}{P(X|D)}.
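For instance, when the latent variables take values on a finite set, the sum above can be evaluated directly. The following sketch (with hypothetical example distributions q and p_post, not taken from any particular model) computes KL(Q || P) for discrete distributions:

import numpy as np

def kl_divergence(q, p):
    """KL(Q || P) = sum_x Q(x) * log(Q(x) / P(x)) for discrete distributions.

    Terms with Q(x) = 0 contribute 0 by convention; assumes P(x) > 0
    wherever Q(x) > 0.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Hypothetical example: a posterior P(X|D) over three latent states
# and a candidate approximation Q(X).
p_post = np.array([0.7, 0.2, 0.1])
q = np.array([0.6, 0.3, 0.1])
print(kl_divergence(q, p_post))  # positive; zero only when Q equals P

The divergence is zero exactly when Q matches the posterior, and positive otherwise.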

We can write the log evidence as

\log P(D) = KL(Q||P) - \sum_X Q(X) \log \frac{Q(X)}{P(X,D)} = KL(Q||P) + \mathcal{L}(Q).
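
This decomposition follows from the identity P(X|D) = P(X,D) / P(D) together with the fact that Q is a probability distribution, so \sum_X Q(X) = 1:

KL(Q || P) = \sum_X Q(X) \log \frac{Q(X) P(D)}{P(X,D)} = \sum_X Q(X) \log \frac{Q(X)}{P(X,D)} + \log P(D),

and rearranging gives the expression above, where

\mathcal{L}(Q) = -\sum_X Q(X) \log \frac{Q(X)}{P(X,D)} = \sum_X Q(X) \log \frac{P(X,D)}{Q(X)}

is known as the negative variational free energy, or evidence lower bound.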

As the log evidence \log P(D) is fixed with respect to Q, maximising the final term \mathcal{L}(Q) minimises the KL divergence between Q and P. Because the KL divergence is non-negative, \mathcal{L}(Q) is also a lower bound on the log evidence. By an appropriate choice of Q, \mathcal{L}(Q) can be made tractable to compute and to maximise. Hence we obtain both a lower bound \mathcal{L}(Q) on the log evidence and an analytical approximation Q to the posterior.
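
To illustrate the bound itself, the following sketch assumes a single discrete latent variable and a hypothetical joint table p_joint for P(X, D). It evaluates \mathcal{L}(Q) for a few candidate distributions Q and compares it with \log P(D); the gap is exactly KL(Q || P), so the bound is tight only when Q equals the true posterior:

import numpy as np

# Hypothetical joint probabilities P(X = x, D) for a single discrete latent
# variable X with three states and a fixed observed dataset D.
p_joint = np.array([0.14, 0.04, 0.02])

log_evidence = np.log(p_joint.sum())     # log P(D)
p_posterior = p_joint / p_joint.sum()    # exact posterior P(X | D)

def elbo(q, p_joint):
    """L(Q) = sum_x Q(x) * log(P(x, D) / Q(x)), a lower bound on log P(D)."""
    q = np.asarray(q, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(p_joint[mask] / q[mask])))

# Candidate approximations Q(X); the bound is tight only at the true posterior.
for q in [np.array([1/3, 1/3, 1/3]), np.array([0.5, 0.3, 0.2]), p_posterior]:
    gap = log_evidence - elbo(q, p_joint)   # equals KL(Q || P), always >= 0
    print(f"L(Q) = {elbo(q, p_joint):.4f}, log P(D) = {log_evidence:.4f}, KL = {gap:.4f}")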
