Markov network

A Markov network, or Markov random field, is a model of the (full) joint probability distribution of a set \mathcal{X} of random variables. A Markov network is similar to a Bayesian network in its representation of dependencies. It can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies); on the other hand, it cannot represent certain dependencies that a Bayesian network can (such as induced dependencies).

Formal Definition

Formally, a Markov network consists of:

  • an undirected graph G = (V,E), where each vertex v ∈ V represents a random variable in \mathcal{X} and each edge {u,v} ∈ E represents a dependency between the random variables u and v,
  • a set of potential functions φk (also called factors or clique potentials), where each φk has as its domain some clique (or subclique) k of G. Each φk is a mapping from joint assignments of the elements of k to the non-negative real numbers.

Joint Distribution Function

The joint distribution represented by a Markov network is given by:

 P(X=x) = \frac{1}{Z} \prod_{k} \phi_k(x_{\{k\}})

where x_{\{k\}} is the state of the random variables in the kth clique, and Z is a normalizing constant (also called the partition function) given by

 Z = \sum_{x \in \mathcal{X}} \prod_{k} \phi_k(x_{\{k\}}).
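
To make the formula concrete, here is a minimal sketch (not part of the article): a chain-shaped network over three binary variables A - B - C with two hypothetical pairwise potentials, where the product of clique potentials and the partition function Z are computed by brute-force enumeration.

```python
# Minimal sketch: three binary variables A - B - C with hypothetical pairwise potentials.
from itertools import product

# phi_AB and phi_BC are illustrative clique potentials; any non-negative values work.
phi_AB = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}
phi_BC = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def unnormalized(a, b, c):
    # Product of clique potentials: prod_k phi_k(x_{k})
    return phi_AB[(a, b)] * phi_BC[(b, c)]

# Partition function Z: sum of the unnormalized product over all joint assignments.
Z = sum(unnormalized(a, b, c) for a, b, c in product((0, 1), repeat=3))

def joint(a, b, c):
    # P(X = x) = (1/Z) * prod_k phi_k(x_{k})
    return unnormalized(a, b, c) / Z

print(joint(0, 0, 0))  # probability of the assignment A=0, B=0, C=0
```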

In practice, a Markov network is often conveniently expressed as a log-linear model, given by

 P(X=x) = \frac{1}{Z} \exp \left( \sum_{k} w_k \phi_k(x_{\{k\}}) \right)

with normalizing constant Z = \sum_{x \in \mathcal{X}} \exp \left(\sum_{k} w_k \phi_k(x_{\{k\}})\right). In this context, the wk are real-valued weights and the φk are feature functions mapping some subset of the variables to the reals. These models are especially convenient for their interpretation: each weight directly scales the contribution of its feature to the log-probability. A log-linear model can provide a much more compact representation for many distributions, especially when variables have large domains. Log-linear models are also convenient because their negative log-likelihoods are convex functions of the weights. Unfortunately, even though the negative log-likelihood of a log-linear Markov network is convex, evaluating the likelihood or its gradient requires inference in the model, which is in general computationally infeasible.
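
A minimal sketch of the log-linear form, again assuming a toy three-variable chain; the feature functions and weights below are illustrative choices, not from the article.

```python
import math
from itertools import product

# Hypothetical indicator features on the cliques {A,B} and {B,C}, with weights w_k.
features = [
    lambda a, b, c: 1.0 if a == b else 0.0,  # f_1: A agrees with B
    lambda a, b, c: 1.0 if b == c else 0.0,  # f_2: B agrees with C
]
weights = [1.6, 1.1]

def score(a, b, c):
    # sum_k w_k * f_k(x)
    return sum(w * f(a, b, c) for w, f in zip(weights, features))

# Partition function: Z = sum_x exp(sum_k w_k f_k(x))
Z = sum(math.exp(score(a, b, c)) for a, b, c in product((0, 1), repeat=3))

def joint(a, b, c):
    # P(X = x) = exp(sum_k w_k f_k(x)) / Z
    return math.exp(score(a, b, c)) / Z

print(joint(1, 1, 1))
```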

Independencies in a Markov Network

A node is conditionally independent of another node in the Markov network given a set of nodes S if every path between the two nodes passes through a node in S. In particular, every node in a Markov network is conditionally independent of every other node given the nodes in its Markov blanket, which in a Markov network is simply its set of neighbors in the graph.
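
The separation criterion can be checked mechanically: remove the conditioning set S from the graph and test whether the two nodes are still connected. A minimal sketch (the graph and node names below are hypothetical):

```python
from collections import deque

def separated(adjacency, u, v, S):
    """True if every path between u and v passes through a node in S,
    i.e. u and v are disconnected once S is removed from the graph."""
    blocked = set(S)
    if u in blocked or v in blocked:
        return True
    seen, queue = {u}, deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return False  # found a path that avoids S
        for neighbor in adjacency[node]:
            if neighbor not in blocked and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return True

# Chain A - B - C: A and C are separated by S = {B}, but not by the empty set.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(separated(adjacency, "A", "C", {"B"}))   # True
print(separated(adjacency, "A", "C", set()))   # False
```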

Inference

As in a Bayesian network, one may calculate the conditional distribution of a set of nodes V' = {v_1,...,v_i} given values for another set of nodes W' = {w_1,...,w_j} by summing over all possible assignments to the remaining nodes u \notin V' \cup W'; this is called exact inference. However, exact inference is a #P-complete problem, and thus computationally intractable in the general case. Approximation techniques such as Markov chain Monte Carlo and loopy belief propagation are often more feasible in practice. Some particular subclasses of MRFs, such as trees, admit polynomial-time inference algorithms; discovering such subclasses is an active research topic. There are also subclasses of MRFs that permit efficient MAP, or most likely assignment, inference; examples include associative Markov networks.
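
As an illustration of exact inference by summation, here is a minimal sketch for the same hypothetical three-variable chain used above: the conditional distribution of one variable is obtained by summing the unnormalized joint over every assignment consistent with the evidence and then renormalizing. This brute-force approach is only practical for very small models.

```python
from itertools import product

# Hypothetical pairwise potentials for the chain A - B - C (same toy model as above).
phi_AB = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}
phi_BC = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def unnormalized(state):
    return phi_AB[(state["A"], state["B"])] * phi_BC[(state["B"], state["C"])]

def conditional(query_var, evidence):
    """P(query_var | evidence), computed by summing the unnormalized joint
    over all assignments that agree with the evidence, then renormalizing."""
    totals = {0: 0.0, 1: 0.0}
    for assignment in product((0, 1), repeat=3):
        state = dict(zip("ABC", assignment))
        if any(state[var] != value for var, value in evidence.items()):
            continue  # skip assignments inconsistent with the evidence
        totals[state[query_var]] += unnormalized(state)
    norm = sum(totals.values())  # local normalization replaces the global Z
    return {value: weight / norm for value, weight in totals.items()}

print(conditional("A", {"C": 1}))  # P(A | C = 1)
```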

Conditional Random Fields

One notable variant of a Markov network is a conditional random field, in which each random variable may also be conditioned upon a set of global observations o. In this model, each function φk is a mapping from all assignments to both the clique k and the observations o to the nonnegative real numbers. This form of the Markov network may be more appropriate for producing discriminative classifiers, which do not model the distribution over the observations.
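
A minimal sketch of the idea (the chain structure, potentials, and weights below are illustrative assumptions, not from the article): each clique potential also sees a global observation o, and the partition function sums only over label assignments, giving a distribution P(x | o) rather than a joint distribution over labels and observations.

```python
import math
from itertools import product

def potential(y_left, y_right, o):
    # Hypothetical log-linear clique potential depending on two labels and the observation o.
    w_pair, w_obs = 1.0, 2.0
    return math.exp(w_pair * (y_left == y_right) + w_obs * (y_right == o))

def conditional(labels, o):
    """P(y | o) for a small chain of binary labels given a single binary observation o."""
    def score(y):
        value = 1.0
        for i in range(len(y) - 1):
            value *= potential(y[i], y[i + 1], o)
        return value
    # The partition function now depends on the observation o.
    Z_o = sum(score(y) for y in product((0, 1), repeat=len(labels)))
    return score(labels) / Z_o

print(conditional((1, 1, 1), o=1))
```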
