Pattern theory
From Wikipedia, the free encyclopedia
Pattern theory, formulated by Ulf Grenander, is a mathematical formalism to describe knowledge of the world as patterns. It differs from other approaches to artificial intelligence in that it does not begin by prescribing algorithms and machinery to recognize and classify patterns; rather, it prescribes a vocabulary to articulate and recast the pattern concepts in precise language.
In addition to the new algebraic vocabulary, its statistical approach was novel in its aim to:
- Identify the hidden variables of a data set using real-world data rather than the artificial stimuli that were commonplace at the time.
- Formulate prior distributions for hidden variables and models for the observed variables that form the vertices of a Gibbs-like graph.
- Study the randomness and variability of these graphs.
- Create the basic classes of stochastic models applied by listing the deformations of the patterns.
- Synthesize (sample) from the models, not just analyze signals with them.
Broad in its mathematical coverage, Pattern Theory spans algebra and statistics, as well as local topological and global entropic properties.
The Brown University Pattern Theory Group was formed in 1972 by Ulf Grenander. Many mathematicians are currently working in this group, noteworthy among them being the Fields Medalist David Mumford. Mumford regards Grenander as his "guru" in this subject.
Algebraic Foundations
We begin with an example to motivate the algebraic definitions that follow.
- Example 1 Grammar
- If we want to represent language patterns, the most immediate candidate for primitives might be words. However, phrases such as "in order to" immediately make clear that words are inappropriate as atomic primitives. In searching for other primitives, we might try the rules of grammar. Grammars can be represented as Finite State Automata or Context-Free Grammars. Below is a sample finite state grammar automaton.
- The following phrases are generated from a few simple rules of the automaton and programming code in pattern theory:
- the boy who owned the small cottage went to the deep forest
- the prince walked to the lake
- the girl walked to the lake and the princess went to the lake
- the pretty prince walked to the dark forest
- To create such sentences, rewriting rules in Finite State Automata act as "generators" to create the sentences as follows: if the machine starts in state 1, it goes to state 2 and writes the word "the". From state 2, it writes one of four words: prince, boy, princess, girl. Such a simplistic automaton occasionally generates more awkward sentences:
- the evil evil prince walked to the lake
- the prince walked to the dark forest and the prince walked to a forest and the princess who lived in some big small big cottage who owned the small big small house went to a forest
- From the finite state diagram we can infer the generators (shown on the left in the original figure) that create the signal. A generator is a 4-tuple: current state, next state, word written, and the probability of the written word when there are multiple choices. A minimal code sketch of this generation process appears after this example.
- Imagine that a "configuration" of generators is strung together linearly so that its output forms a sentence; each generator "bonds" to the generators before and after it. Denote these bonds as 1a, 1b, 2a, 2b, …, 12a, 12b. Each numerical label corresponds to the automaton's state, and the letters "a" and "b" correspond to the inbound and outbound bonds. Then the bond table (shown on the right in the original figure) is equivalent to the automaton diagram. For the sake of simplicity, only half of the bond table is shown; the table is actually symmetric.
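Since the automaton diagram itself is not reproduced here, the following is a minimal Python sketch of the generation process under an assumed transition table; the states, words, and probabilities are illustrative stand-ins for the figure's generators, each written as a 4-tuple (current state, next state, word, probability).

```python
import random

# Hypothetical reconstruction of a small finite-state grammar.  Each generator
# is a 4-tuple (current_state, next_state, word, probability); the states and
# words below are illustrative, not the article's exact automaton.
GENERATORS = [
    (1, 2, "the", 1.0),
    (2, 3, "prince", 0.25), (2, 3, "boy", 0.25),
    (2, 3, "princess", 0.25), (2, 3, "girl", 0.25),
    (3, 4, "walked", 0.5), (3, 4, "went", 0.5),
    (4, 5, "to the", 1.0),
    (5, 0, "lake", 0.5), (5, 0, "forest", 0.5),   # state 0 = stop
]

def generate_sentence(start_state=1, stop_state=0):
    """Chain generators into a configuration; its output (image) is the sentence."""
    state, words = start_state, []
    while state != stop_state:
        # Candidate generators whose inbound bond matches the current state.
        choices = [g for g in GENERATORS if g[0] == state]
        probs = [g[3] for g in choices]
        _, state, word, _ = random.choices(choices, weights=probs, k=1)[0]
        words.append(word)
    return " ".join(words)

print(generate_sentence())   # e.g. "the girl went to the lake"
```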
As one can tell from this example, and as is typical of the signals we study, identifying the primitives and bond tables requires some thought. The example highlights another important fact not readily apparent in other signal modalities: a configuration is not the signal we observe; rather, we observe its image as a sentence. Herein lies a significant justification for distinguishing an observable from a non-observable construct. Additionally, it gives us an algebraic structure to associate our Hidden Markov Models with. In sensory modalities such as the vision example below, the hidden configurations and observed images are so similar that such a distinction may not seem justified. Fortunately, we have the grammar example to remind us of this distinction.
Motivated by the example, we have the following definitions:
1. A generator g ∈ G, drawn in the original figure as a value with labeled bonds, is the primitive of Pattern Theory that generates the observed signal. Structurally, it is a value with interfaces, called bonds, which connect the g's to form a signal generator. Two neighboring generators are connected when their bond values are the same. Similarity self-maps s: G -> G express the invariances of the world we are looking at, such as rigid body transformations or scaling.
2. Bonds glue generators into a configuration, c, which creates the signal against a backdrop Σ, with global features described locally by a bond coupling table called ρ. The boolean function ρ is the principal component of the regularity 4-tuple <G,S,ρ,Σ>. Regularity is designed to capture the notion of the global feature of interest on a local scale. (A minimal data-structure sketch of generators, bonds, and a regularity check follows these definitions.)
3. An image (C mod R) captures the notion of an observed configuration, as opposed to one which exists independently of any perceptual apparatus. Images are configurations distinguished only by their external bonds, inheriting the configuration's composition and similarity transformations. Formally, images are equivalence classes partitioned by an identification rule "~" with 3 properties:
- ext(c) = ext(c') whenever c~c'
- sc~sc' whenever c~c'
- σ(c1, c2) ~ σ(c1', c2') whenever c1 ~ c1', c2 ~ c2' are all regular.
A configuration corresponding to a physical stimulus may have many images, corresponding to the identification rules of many different observers' perceptions.
4. A pattern is the repeatable components of an image, defined as the S-invariant subset of an image. Similarities are reference transformations we use to define patterns, e.g. rigid body transformations. At first glance, this definition seems suited only to texture patterns, where the minimal sub-image is repeated over and over again. If we were to view an image of an object such as a dog, the image is not repeated, yet it seems familiar and ought to be a pattern.
5. A deformation is a transformation of the original image that accounts for the noise in the environment and error in the perceptual apparatus. Grenander identifies 4 types of deformations: noise and blur, multi-scale superposition, domain warping, and interruptions.
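A minimal data-structure sketch of these definitions, assuming a linear configuration in which each generator exposes one inbound and one outbound bond; the class names and the toy bond table are illustrative, not Grenander's notation.

```python
from dataclasses import dataclass
from typing import Hashable, List

@dataclass(frozen=True)
class Generator:
    value: Hashable      # the content this generator contributes to the signal
    in_bond: Hashable    # inbound bond value
    out_bond: Hashable   # outbound bond value

def regular(config: List[Generator], rho) -> bool:
    """Boolean regularity: every neighboring bond pair must satisfy the
    bond-coupling table rho (here a function of two bond values)."""
    return all(rho(a.out_bond, b.in_bond) for a, b in zip(config, config[1:]))

# Toy bond table: neighboring generators connect when their bond values agree.
rho_equal = lambda out_b, in_b: out_b == in_b

config = [Generator("the", 1, 2), Generator("prince", 2, 3), Generator("walked", 3, 4)]
print(regular(config, rho_equal))   # True: each outbound bond matches the next inbound bond
```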
- Example 2 Directed boundary
- This configuration of generators generating the image is created by primitives woven together by the bonding table, and perceived by an observer with the identification rule that maps non-"0" and non-"1" generators to a single boundary element. Nine other generators, not depicted, are created by rotating each of the non-"0"/"1" generators by 90 degrees. Keeping the feature of "directed boundaries" in mind, the generators are devised with some thought and are interpreted as follows: the "0" generator corresponds to interior elements, "1" to the exterior, "2" and its rotations are straight elements, and the remainder are the turning elements.
- With Boolean regularity defined as the product over all neighboring bonds, any configuration with even a single generator violating the bond table is discarded from consideration. Thus only features in their purest form, with all neighboring generators adhering to the bond table, are allowed. This stringent condition can be relaxed using probability measures instead of Boolean bond tables.
- The new regularity no longer dictates a perfect directed boundary; instead, it defines the probability of a configuration in terms of the Acceptor function A(). Such configurations are allowed to have impurities and imperfections with respect to the feature of interest. (A minimal sketch of Boolean versus probabilistic regularity follows this example.)
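A minimal sketch of the relaxation from Boolean regularity to an acceptor-weighted one; the particular acceptor values and the unnormalized product form are illustrative assumptions rather than the article's definition.

```python
def boolean_regularity(bond_pairs, bond_table):
    """Strict regularity: every neighboring bond pair must be allowed by the bond table."""
    return all(bond_table(a, b) for a, b in bond_pairs)

def acceptor_weight(bond_pairs, acceptor):
    """Relaxed regularity: multiply acceptor values A(a, b) over all neighboring bonds.
    Imperfect configurations get a small but nonzero weight; normalizing these weights
    over all configurations turns them into probabilities."""
    weight = 1.0
    for a, b in bond_pairs:
        weight *= acceptor(a, b)
    return weight

# Illustrative acceptor: matching bonds get weight 1.0, mismatches a small penalty.
acceptor = lambda a, b: 1.0 if a == b else 0.1

pairs = [(2, 2), (3, 3), (4, 5)]                         # one violating bond pair
print(boolean_regularity(pairs, lambda a, b: a == b))    # False: the configuration is discarded
print(acceptor_weight(pairs, acceptor))                  # 0.1: allowed, but down-weighted
```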
With the benefit of being given generators and complete bond tables, a difficult part of pattern analysis is already done. In tackling a new class of signals and features, the task of devising the generators and bond table is much more difficult.
Again, just as in the grammar example, identifying the generators and bond tables requires some thought. Just as subtle is the fact that a configuration is not the signal we observe; rather, we observe its image, as silhouette projections of the identification rule.
Topology
to be expanded
Entropy
PT defines order in terms of the feature of interest given by p(c).
- Energy(c) = -log P(c)
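Equivalently, under the Gibbs-like form assumed for these models, low energy corresponds to high probability; a minimal statement, assuming the standard Gibbs normalization (Z is the normalizing constant, not something defined above):

```latex
% Gibbs form: low energy corresponds to high probability.
p(c) = \frac{1}{Z}\, e^{-E(c)}, \qquad Z = \sum_{c'} e^{-E(c')},
\qquad \text{so that } E(c) = -\log p(c) - \log Z .
```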
Statistics
Grenander’s Pattern Theory treatment of Bayesian inference seems to be skewed towards image reconstruction (e.g. content addressable memory): given a deformed image, recover the original image I. However, Mumford’s interpretation of Pattern Theory is broader, and he defines PT to include many more well-known statistical methods. Mumford’s criteria for inclusion of a topic as Pattern Theory are those methods "characterized by common techniques and motivations", such as the HMM, the EM algorithm, and the dynamic programming circle of ideas. Topics in this section will reflect Mumford's treatment of Pattern Theory. His principles of statistical Pattern Theory are the following:
- Use real world signals rather than constructed ones to infer the hidden states of interest.
- Such signals contain too much complexity and artifacts to succumb to a purely deterministic analysis, so employ stochastic methods too.
- Respect the natural structure of the signal, including any symmetries, independence of parts, and marginals on key statistics. Validate by sampling from the derived models and infer hidden states with Bayes’ rule.
- Across all modalities, a limited family of deformations distort the pure patterns into real world signals.
- Stochastic factors affecting an observation show strong conditional independence.
Statistical PT makes ubiquitous use of conditional probability in the form of Bayes’ theorem and Markov Models. Both of these concepts are used to express the relation between hidden states (configurations) and observed states (images). Markov Models also capture the local properties of the stimulus, reminiscent of the purpose of the bond table for regularity.
The generic set-up is the following: let s be the hidden state of the data that we wish to know and i the observed image. Bayes’ theorem gives:
- p(s | i) p(i) = p(s, i) = p(i | s) p(s)
- To analyze the signal (recognition): fix i, maximize p, infer s.
- To synthesize signals (sampling): fix s, generate i's, and compare with real world images (a minimal sketch of both uses follows this list).
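A minimal sketch of both uses on a toy discrete model; the two-state prior and likelihood table are illustrative assumptions, not taken from the article.

```python
import random

# Toy hidden/observed model: s is a hidden state, i an observed image (both discrete).
p_s = {"s0": 0.7, "s1": 0.3}                               # prior p(s)
p_i_given_s = {("i0", "s0"): 0.9, ("i1", "s0"): 0.1,       # likelihood p(i | s)
               ("i0", "s1"): 0.2, ("i1", "s1"): 0.8}

def analyze(i):
    """Recognition: fix the observed i and pick s maximizing p(s | i) ∝ p(i | s) p(s)."""
    return max(p_s, key=lambda s: p_i_given_s[(i, s)] * p_s[s])

def synthesize(s, n=5):
    """Sampling: fix the hidden s and generate observations i ~ p(i | s)."""
    images = ["i0", "i1"]
    weights = [p_i_given_s[(i, s)] for i in images]
    return random.choices(images, weights=weights, k=n)

print(analyze("i1"))        # 's1': the hidden state best explaining the observation
print(synthesize("s0"))     # mostly 'i0' samples, to be compared with real-world images
```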
The following conditional probability examples illustrate these methods in action:
Conditional probability for local properties
N-gram Text Strings: See Mumford's Pattern Theory by Examples, Chapter 1.
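As a rough illustration of the idea (only loosely related to Mumford's chapter), a bigram model estimates the conditional probability of a word given its predecessor from adjacent-pair counts:

```python
from collections import Counter, defaultdict

def bigram_model(words):
    """Estimate p(next | current) from adjacent word pairs (a first-order Markov model)."""
    counts = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()} for a, c in counts.items()}

corpus = "the prince walked to the lake and the girl went to the forest".split()
model = bigram_model(corpus)
print(model["the"])    # conditional distribution over words following "the"
print(model["to"])     # {'the': 1.0}
```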
MAP ~ MDL (MDL offers a glimpse of why the MAP probabilistic formulation makes sense analytically)
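The correspondence can be stated in one line: taking negative logarithms turns the MAP criterion into a minimum-description-length criterion, with the two terms read as code lengths for the data given the hidden state and for the hidden state itself (a standard identification, sketched here rather than quoted from a source):

```latex
\hat{s}_{\mathrm{MAP}}
  = \arg\max_{s} p(s \mid i)
  = \arg\max_{s} p(i \mid s)\, p(s)
  = \arg\min_{s} \Bigl[ \underbrace{-\log p(i \mid s)}_{\text{code length of data given } s}
                        \underbrace{{}-\log p(s)}_{\text{code length of } s} \Bigr]
```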
Conditional probability for Hidden-observed states
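The canonical example of the hidden-observed relationship mentioned above is the Hidden Markov Model, whose MAP hidden sequence is found by dynamic programming (the Viterbi algorithm). The following is a minimal sketch with an illustrative two-state model, not an example taken from Grenander or Mumford.

```python
import math

# Illustrative two-state HMM (not from the article): the hidden states play the role
# of the configuration, the observations that of the image; Viterbi finds the MAP path.
states = ["A", "B"]
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}

def viterbi(obs):
    """Dynamic programming over log-probabilities: best[s] = max log p(path ending in s)."""
    best = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []
    for o in obs[1:]:
        prev, best, ptr = best, {}, {}
        for s in states:
            p, arg = max((prev[r] + math.log(trans[r][s]), r) for r in states)
            best[s] = p + math.log(emit[s][o])
            ptr[s] = arg
        back.append(ptr)
    # Trace back the most probable hidden sequence.
    last = max(best, key=best.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["x", "y", "y"]))   # MAP hidden state sequence for the observations
```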
Further reading
- Grenander, Ulf and Miller, Michael (2007). Pattern Theory: From Representation to Inference. Oxford University Press. ISBN 9780199297061 (paperback).
- Grenander, Ulf (1993). General Pattern Theory. Oxford Science Publications.
- Grenander, Ulf (1996). Elements of Pattern Theory. Johns Hopkins University Press.