Chow-Liu tree

A second-order dependency tree representing the product below.

A Chow-Liu tree is an efficient method for constructing a second-order product approximation of a joint distribution, first described by Chow & Liu (1968). As with Bayesian networks in general, the goal of such a decomposition may be either data compression or inference.

The Chow-Liu representation

The Chow-Liu method describes a joint probability distribution P(X_{1},X_{2},\ldots,X_{n}) as a product of second-order conditional and marginal distributions. For example, the six-dimensional distribution P(X_{1},X_{2},X_{3},X_{4},X_{5},X_{6}) might be approximated as


P^{\prime}(X_{1},X_{2},X_{3},X_{4},X_{5},X_{6})=P(X_{6}|X_{5})P(X_{5}|X_{2})P(X_{4}|X_{2})P(X_{3}|X_{2})P(X_{2}|X_{1})P(X_{1})

where each new term in the product introduces just one new variable, and the product can be represented as a first-order dependency tree, as shown in the figure. The Chow-Liu algorithm (below) determines which conditional probabilities are to be used in the product approximation. In general, unless there are no third or higher-order interactions, the Chow-Liu approximation is indeed an approximation, and cannot capture the complete structure of the original distribution. Pearl (1988) provides a modern analysis of the Chow-Liu tree as a Bayesian network.
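As a rough illustration of how such a product is evaluated, the sketch below builds P^{\prime} for the six-variable example from the marginals of a given joint table. The names `PARENT`, `marginal` and `tree_approximation` are hypothetical, variables are 0-indexed (X_{1} is index 0), and every single-variable marginal is assumed to be strictly positive so that the conditionals are well defined.

```python
import itertools
import numpy as np

# Parent of each variable in the example tree (0-based: X1 -> 0, ..., X6 -> 5).
# None marks the root X1. This encoding is an assumption made for illustration.
PARENT = {0: None, 1: 0, 2: 1, 3: 1, 4: 1, 5: 4}

def marginal(joint, axes_kept):
    """Sum the joint table over every axis not listed in axes_kept."""
    drop = tuple(a for a in range(joint.ndim) if a not in axes_kept)
    return joint.sum(axis=drop)

def tree_approximation(joint, parent):
    """Return the tree-structured product P' as a table shaped like `joint`."""
    single = {v: marginal(joint, (v,)) for v in parent}
    pair = {
        (c, p): marginal(joint, tuple(sorted((c, p))))
        for c, p in parent.items() if p is not None
    }
    approx = np.zeros_like(joint)
    for x in itertools.product(*(range(k) for k in joint.shape)):
        prob = 1.0
        for c, p in parent.items():
            if p is None:
                prob *= single[c][x[c]]          # root term P(x_root)
            else:
                lo, hi = sorted((c, p))          # pair tables keep axis order
                # Conditional P(x_c | x_p) = P(x_c, x_p) / P(x_p).
                prob *= pair[(c, p)][x[lo], x[hi]] / single[p][x[p]]
        approx[x] = prob
    return approx
```

The resulting table sums to one and reproduces the pairwise marginals of the original joint along every edge of the tree, while higher-order interactions are generally lost.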

The Chow-Liu algorithm

Chow and Liu show how to select second-order terms for the product approximation so that among all such second-order approximations (first-order dependency trees), the constructed approximation P^{\prime} has the minimum Kullback-Leibler distance to the actual distribution P, and is thus the closest approximation in the classical information-theoretic sense. The Kullback-Leibler distance between a second-order product approximation and the actual distribution is shown to be


D(P\parallel P^{\prime})=-\sum I(X_{i};X_{j(i)})+\sum H(X_{i})-H(X_{1},X_{2},\ldots,X_{n})

where I(X_{i};X_{j(i)}) is the mutual information between variable X_{i} and its parent X_{j(i)} in the tree, and H(X_{1},X_{2},\ldots,X_{n}) is the joint entropy of the variable set \{X_{1},X_{2},\ldots,X_{n}\}. Since the terms \sum H(X_{i}) and H(X_{1},X_{2},\ldots,X_{n}) are independent of the dependency ordering in the tree, only the sum of the pairwise mutual informations, \sum I(X_{i};X_{j(i)}), determines the quality of the approximation. Thus, if every branch (edge) of the tree is given a weight equal to the mutual information between the variables at its vertices, the tree that provides the optimal second-order approximation to the target distribution is simply the maximum-weight tree.

The equation above also highlights the role of the dependencies in the approximation. When no dependencies are specified, the first term in the equation is absent, the approximation is based on first-order marginals only, and the distance between the approximation and the true distribution is due to the redundancies that are not accounted for when the variables are treated as independent. As second-order dependencies are specified, the approximation begins to capture some of that structure and the distance between the two distributions is reduced.
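The decomposition of the distance can be traced in a few lines. The following is a sketch rather than the derivation as given by Chow and Liu. It assumes the variables are labelled so that X_{1} is the root, and writes j(i) for the index of the parent of X_{i} in the tree (a notation introduced here for illustration), so that P^{\prime}(x)=P(x_{1})\prod_{i=2}^{n}P(x_{i}\mid x_{j(i)}).

```latex
\begin{aligned}
D(P\parallel P^{\prime})
  &= \sum_{x} P(x)\log\frac{P(x)}{P^{\prime}(x)}
   = -H(X_{1},X_{2},\ldots,X_{n}) - \sum_{x} P(x)\log P^{\prime}(x), \\
\log P^{\prime}(x)
  &= \log P(x_{1}) + \sum_{i=2}^{n}\log P(x_{i}\mid x_{j(i)})
   = \sum_{i=2}^{n}\log\frac{P(x_{i},x_{j(i)})}{P(x_{i})\,P(x_{j(i)})}
     + \sum_{i=1}^{n}\log P(x_{i}), \\
-\sum_{x} P(x)\log P^{\prime}(x)
  &= -\sum_{i=2}^{n} I(X_{i};X_{j(i)}) + \sum_{i=1}^{n} H(X_{i}).
\end{aligned}
```

Substituting the last line into the first gives the three terms of the distance, and shows that only the mutual information sum depends on which edges the tree contains.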

Chow and Liu provide a simple algorithm for constructing the optimal tree: at each stage of the procedure, the algorithm adds to the tree the pair of variables with the largest mutual information among the pairs not yet considered, skipping any pair that would create a cycle. See the original paper, Chow & Liu (1968), for full details.
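The greedy construction can be sketched as a maximum-weight spanning tree search. The code below is a minimal illustration, not the procedure as written in the original paper: it estimates mutual information from empirical counts over a list of discrete samples and uses a Kruskal-style pass with a union-find structure to reject edges that would close a cycle. The function names `mutual_information` and `chow_liu_edges` are hypothetical.

```python
import itertools
from collections import Counter
from math import log

def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    pij = Counter((row[i], row[j]) for row in data)
    return sum(
        (c / n) * log((c / n) / ((pi[a] / n) * (pj[b] / n)))
        for (a, b), c in pij.items()
    )

def chow_liu_edges(data):
    """Return the undirected edges of a maximum-weight spanning tree."""
    n_vars = len(data[0])
    # Score every pair, then visit pairs from highest to lowest weight.
    scored = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in itertools.combinations(range(n_vars), 2)),
        reverse=True,
    )
    # Union-find forest used to reject edges that would close a cycle.
    root = list(range(n_vars))

    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]  # path halving
            v = root[v]
        return v

    edges = []
    for _weight, i, j in scored:
        ri, rj = find(i), find(j)
        if ri != rj:
            root[ri] = rj
            edges.append((i, j))
    return edges
```

The returned edges are undirected. Choosing any variable as the root and orienting the edges away from it yields a product of the form shown earlier. With finite samples the mutual information estimates are noisy, so the recovered tree is only as reliable as the counts behind it.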

Variations on Chow-Liu trees

The obvious problem arises when the actual distribution is not in fact a second-order dependency tree. In some cases this can still be addressed by fusing or aggregating densely connected subsets of variables to obtain a "large-node" Chow-Liu tree (Huang, King & Lyu 2002), or by extending the idea of greedy maximum branch weight selection to non-tree (multiple-parent) structures (Williamson 2000). Similar techniques of variable substitution and construction are common in the Bayesian network literature, for example for dealing with loops; see Pearl (1988).

References

  • Chow, C. K. & C. N. Liu (1968), "Approximating discrete probability distributions with dependence trees", IEEE Transactions on Information Theory IT-14 (3): 462-467.
  • Huang, Kaizhu; Irwin King & Michael R. Lyu (2002), "Constructing a large node Chow-Liu tree based on frequent itemsets", in Wang, Lipo; Rajapakse, Jagath C.; Fukushima, Kunihiko; Lee, Soo-Young & Yao, Xin, Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02), Singapore, 498-502.
  • Pearl, Judea (1988), Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, San Mateo, CA: Morgan Kaufmann.
  • Williamson, Jon (2000), "Approximating discrete probability distributions with Bayesian networks", Proceedings of the International Conference on Artificial Intelligence in Science and Technology, Tasmania, 16-20.

See also