A Markov random field, Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. It is also referred to as a Gibbs random field in case the probability distribution is positive, because, according to the Hammersley–Clifford theorem, it can then be represented by a Gibbs measure. A Markov random field is similar to a Bayesian network in its representation of dependencies. It can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies); on the other hand, it can't represent certain dependencies that a Bayesian network can (such as induced dependencies). The prototypical Markov random field is the Ising model; indeed, the Markov random field was introduced as the general setting for the Ising model[1]. A Markov random field is used to model various low- to mid-level tasks in image processing and computer vision.[2] For example, MRFs are used for image restoration, image completion, segmentation, texture synthesis, super-resolution and stereo matching.
Contents |
Given an undirected graph G = (V, E), a set of random variables X = (Xv)v ∈ V indexed by V form a Markov random field with respect to G if they satisfy the following equivalent Markov properties:
As the Markov properties of an arbitrary probability distribution can be difficult to establish, a commonly used class of Markov random fields are those that can be factorized according to the cliques of the graph.
Given a set of random variables X = (Xv)v ∈ V for which the joint density (with respect to a product measure) can be factorized over the cliques of G:
then X forms a Markov random field with respect to G, where cl(G) is the set of cliques of G (the definition is equivalent if only maximal cliques are used). The functions φC are often referred to as factor potentials or clique potentials.
Although some MRFs do not factorize (a simple example can be constructed on a cycle of 4 nodes[3]), in certain cases they can be shown to be equivalent conditions:
When such a factorization does exist, it is possible to construct a factor graph for the network.
A log-linear model is a Markov random field with feature functions such that the full-joint distribution can be written as
with partition function
where is the set of possible assignments of values to all the network's random variables. Usually, the feature functions are defined such that they are indicators of the clique's configuration, i.e. if corresponds to the i-th possible configuration of the k-th clique and 0 otherwise. This model is thus equivalent to the general formulation above if and the weight of a feature corresponds to the logarithm of the corresponding clique potential, i.e. , where is the i-th possible configuration of the k-th clique, i.e. the i-th value in the domain of the clique . Note that this is only possible if all clique potentials are non-zero, i.e. if none of the elements of are assigned a probability of 0.
Log-linear models are especially convenient for their interpretation. A log-linear model can provide a much more compact representation for many distributions, especially when variables have large domains. They are convenient too because their negative log likelihoods are convex. Unfortunately, though the likelihood of a log-linear Markov network is convex, evaluating the likelihood or gradient of the likelihood of a model requires inference in the model, which is in general computationally infeasible.
A multivariate normal distribution forms a Markov random field with respect to a graph G = (V, E) if the missing edges correspond to zeros on the precision matrix (the inverse covariance matrix):
As in a Bayesian network, one may calculate the conditional distribution of a set of nodes given values to another set of nodes in the Markov random field by summing over all possible assignments to ; this is called exact inference. However, exact inference is a #P-complete problem, and thus computationally intractable in the general case. Approximation techniques such as Markov chain Monte Carlo and loopy belief propagation are often more feasible in practice. Some particular subclasses of MRFs, such as trees see Chow-Liu tree, have polynomial-time inference algorithms; discovering such subclasses is an active research topic. There are also subclasses of MRFs that permit efficient MAP, or most likely assignment, inference; examples of these include associative networks.
One notable variant of a Markov random field is a conditional random field, in which each random variable may also be conditioned upon a set of global observations . In this model, each function is a mapping from all assignments to both the clique k and the observations to the nonnegative real numbers. This form of the Markov network may be more appropriate for producing discriminative classifiers, which do not model the distribution over the observations.