Information geometry

Information geometry is a branch of mathematics that applies the techniques of differential geometry to probability theory. It derives its name from the use of the Fisher information as a Riemannian metric on families of probability distributions, which can thereby be studied as Riemannian manifolds. Notably, information geometry has been used to prove the higher-order efficiency properties of the maximum-likelihood estimator.

Information geometry reached maturity through the work of Shun'ichi Amari and other Japanese mathematicians in the 1980s. Amari and Nagaoka's book, Methods of Information Geometry[1], is currently the de facto reference book of the relatively young field due to its broad coverage of significant developments attained using the methods of information geometry up to the year 2000. Many of these developments were previously only available in Japanese-language publications.

Introduction

In information geometry, a family of probability distributions over the random variable (or vector) X is viewed as forming a manifold, M, with coordinate system \Xi = \{\xi \mid \xi \in \mathbb{R}^n\}. One possible coordinate system for the manifold is the set of free parameters of the probability distribution family. Each point P in the manifold M, with coordinate \xi, carries a function of the random variable (or vector), namely the probability distribution, which we write as P = p(x; \xi). The set of all such points P forms the manifold M.

For example, with the family of normal distributions, the ordered pair of the mean \mu and standard deviation \sigma forms one possible coordinate system, \xi = (\mu, \sigma). Each particular point in the manifold, such as \mu = \mu_0 and \sigma = \sigma_0, carries a specific normal distribution with mean \mu_0 and standard deviation \sigma_0, so that


p(x;\xi_0)=p(x; \mu_0, \sigma_0)= \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\!\left( -\frac{(x-\mu_0)^2}{2\sigma_0^2} \right).

To form a Riemannian manifold on which the techniques of differential geometry can be applied, a Riemannian metric must be defined. Information geometry takes the Fisher information as the "natural" metric, although it is not the only possible choice. In this role the Fisher information is referred to as the "Fisher metric" and, significantly, it is invariant under coordinate transformations.
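
Concretely, if new coordinates \rho = (\rho_1, \ldots, \rho_n) are introduced through a smooth reparametrization \xi = \xi(\rho), the components of the Fisher metric transform as a tensor,

g_{ab}(\rho) = \sum_{j,k} \frac{\partial \xi_j}{\partial \rho_a} \frac{\partial \xi_k}{\partial \rho_b}\, g_{jk}(\xi(\rho)),

so that quantities built from the metric, such as the lengths of curves within the family, do not depend on the parametrization chosen; this is the sense in which the metric is "invariant" under coordinate transformation.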

Examples

The main tenet of information geometry is that many important structures in probability theory, information theory and statistics can be treated as structures in differential geometry by regarding a space of probability distributions as a differentiable manifold endowed with a Riemannian metric and a family of affine connections distinct from the canonical affine connection. The e-affine connection and m-affine connection geometrize expectation and maximization, as in the expectation-maximization algorithm.
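
With \ell(x;\xi) = \log p(x;\xi), one standard way to write this family of connections, following the \alpha-connection convention of Amari and Nagaoka[1], is

\Gamma^{(\alpha)}_{ij,k}(\xi) = \int \left( \frac{\partial^2 \ell(x;\xi)}{\partial \xi_i \partial \xi_j} + \frac{1-\alpha}{2} \frac{\partial \ell(x;\xi)}{\partial \xi_i} \frac{\partial \ell(x;\xi)}{\partial \xi_j} \right) \frac{\partial \ell(x;\xi)}{\partial \xi_k}\, p(x;\xi)\, dx,

where \alpha = 1 gives the e-connection, \alpha = -1 gives the m-connection, and \alpha = 0 recovers the Levi-Civita connection of the Fisher metric.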

The importance of studying statistical structures as geometrical structures lies in the fact that geometric structures are invariant under coordinate transformations; the Fisher information metric, for example, is invariant in this sense.[1]

The statistician Ronald Fisher recognized in the 1920s that there is an intrinsic measure of the amount of information available to statistical estimators. The Fisher information matrix was shown by Cramér and Rao to be a Riemannian metric on the space of probability distributions, and it became known as the Fisher information metric.

The mathematician Nikolai Chentsov (Cencov) proved in the 1960s and 1970s that, on the space of probability distributions over a sample space containing at least three points, the Fisher information metric is the unique Riemannian metric invariant under sufficient statistics, and the \alpha-connections are the unique family of affine connections invariant in the same sense. Both of these uniqueness results hold, of course, only up to multiplication by a constant.

Amari and Nagaoka's work in the 1980s brought all these results together, introducing the concept of dual affine connections and clarifying the interplay among the metric, the affine connections, and divergences.

Amari and Kumon also showed that the asymptotic efficiency of estimators and tests can be expressed in terms of geometrical quantities.

Fisher information metric as a Riemannian metric

Information geometry makes frequent use of the Fisher information metric:

g_{jk}(\xi)=\int \frac{\partial \log p(x;\xi)}{\partial \xi_j} \frac{\partial \log p(x;\xi)}{\partial \xi_k} p(x;\xi)\, dx.

Substituting the log-likelihood \ell(x;\xi) = \log p(x; \xi), the formula becomes:

g_{jk}(\xi)=\int \frac{\partial \ell(x;\xi)}{\partial \xi_j} \frac{\partial \ell(x;\xi)}{\partial \xi_k} p(x;\xi)\, dx.
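
As an illustrative numerical check (a minimal sketch, not taken from the references; the function name, sample size, and test point are arbitrary), the integral above can be approximated by Monte Carlo for the normal family of the introduction, in the coordinates \xi = (\mu, \sigma), and compared against the known closed form g = diag(1/\sigma^2, 2/\sigma^2):

import numpy as np

def fisher_metric_normal(mu, sigma, n_samples=1_000_000, seed=0):
    # Monte Carlo estimate of g_{jk}(xi) = E[ (d l / d xi_j)(d l / d xi_k) ]
    # for the normal family p(x; mu, sigma), sampling x from p itself.
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n_samples)
    dl_dmu = (x - mu) / sigma**2                      # d l / d mu
    dl_dsigma = (x - mu)**2 / sigma**3 - 1.0 / sigma  # d l / d sigma
    scores = np.stack([dl_dmu, dl_dsigma])            # shape (2, n_samples)
    return scores @ scores.T / n_samples

mu0, sigma0 = 0.0, 2.0
print(fisher_metric_normal(mu0, sigma0))              # approximately [[0.25, 0], [0, 0.5]]
print(np.diag([1.0 / sigma0**2, 2.0 / sigma0**2]))    # exact Fisher metric at (mu0, sigma0)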

History

The history of information geometry is associated with the discoveries of, among many others, Ronald Fisher, Harald Cramér, Calyampudi Radhakrishna Rao, Nikolai Chentsov, Shun'ichi Amari, and Hiroshi Nagaoka.

Some applications

Natural gradient

An important concept in information geometry is the natural gradient. The concept and theory of the natural gradient suggest an adjustment to the energy function of a learning rule that takes into account the curvature of the underlying statistical manifold, by way of the Fisher information metric.

This concept has many important applications in blind signal separation, neural networks, artificial intelligence, and other engineering problems that deal with information. Experimental results have shown that application of the concept leads to substantial performance gains.
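
As a minimal sketch of the idea (not code from the cited works; the model, step size, and function names are illustrative), the natural gradient replaces the ordinary update by \xi \leftarrow \xi - \eta\, G(\xi)^{-1} \nabla L(\xi), where G is the Fisher information metric. The example below fits the mean and standard deviation of a normal model by natural-gradient descent on the negative log-likelihood:

import numpy as np

def natural_gradient_step(theta, grad_loss, fisher, lr=0.1):
    # One update theta <- theta - lr * G(theta)^{-1} grad L(theta).
    step = np.linalg.solve(fisher(theta), grad_loss(theta))  # avoids an explicit inverse
    return theta - lr * step

rng = np.random.default_rng(0)
data = rng.normal(1.5, 0.7, size=5000)

def grad_loss(theta):
    # Gradient of the average negative log-likelihood of a normal model.
    mu, sigma = theta
    d_mu = -np.mean(data - mu) / sigma**2
    d_sigma = 1.0 / sigma - np.mean((data - mu)**2) / sigma**3
    return np.array([d_mu, d_sigma])

def fisher(theta):
    # Fisher information metric of the normal family in (mu, sigma) coordinates.
    _, sigma = theta
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

theta = np.array([0.0, 1.0])
for _ in range(200):
    theta = natural_gradient_step(theta, grad_loss, fisher, lr=0.1)
print(theta)  # approaches the sample mean and standard deviation of the data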

Nonlinear filtering

Other applications concern the statistics of stochastic processes and approximate finite-dimensional solutions of the nonlinear filtering problem for stochastic processes. Since the nonlinear filtering problem admits an infinite-dimensional solution in general, a geometric structure on the space of probability distributions can be used to project the infinite-dimensional filter onto an approximate finite-dimensional one, leading to the projection filters introduced in 1987 by Bernard Hanzon.

References

  1. Shun'ichi Amari and Hiroshi Nagaoka, Methods of Information Geometry, Translations of Mathematical Monographs, vol. 191, American Mathematical Society, 2000 (ISBN 978-0821805312).
  2. Shun'ichi Amari, Differential-Geometrical Methods in Statistics, Lecture Notes in Statistics, Springer-Verlag, Berlin, 1985.
  3. S. Amari, "Information geometry of the EM and em algorithms for neural networks", Neural Networks, vol. 8, 1995, pp. 1379–1408.
