Dirichlet process
From Wikipedia, the free encyclopedia
Let S be a set equipped with a suitable σ-algebra. A Dirichlet process over S is a stochastic process whose sample path is a probability distribution on S. The finite dimensional distributions are from the Dirichlet distribution: If M is a finite measure on S, and X is a random distribution drawn from a Dirichlet process, written as
then for any partition of S, say , we have that
The Dirichlet process was formally introduced by Thomas Ferguson[1] in 1973.
Contents |
[edit] The Chinese Restaurant Process
The above formal definition of the Dirichlet process does not lend itself to an understanding of the distribution's properties. A more intuitive approach is to consider a sampling scheme known as the Chinese Restaurant Process. This scheme results in a set of samples from S with an equivalent distribution to that which would be obtained by first sampling a distribution on S from a Dirichlet process, and then drawing i.i.d. samples from that distribution.
Suppose that J samples, have already been obtained. According to the Chinese Restaurant Process, the sample should be drawn from
where δθ is an atomic distribution centred on θ. Interpreting this, two properties are clear:
- Even if S is an uncountable set, there is a finite probability that two samples will have exactly the same value. Samples from a Dirichlet process are therefore discrete.
- The Dirichlet process exhibits a self-reinforcing property; the more often a given value has been sampled in the past, the more likely it is to be sampled again.
The name 'Chinese Restaurant Process' is derived from the following analogy: imagine an infinitely large restaurant containing an infinite number of tables, and able to serve an infinite number of dishes. The restaurant in question operates a somewhat unusual seating policy whereby new diners are seated either at a currently occupied table with probability proportional to the number of guests already seated there, or at an empty table with probability proportional to a constant. Guests who sit at an occupied table must order the same dish as those currently seated, whereas guests allocated a new table are served a dish at random according to the chef's taste. The distribution of dishes after J guests are served is a sample drawn as described above. The Chinese Restaurant Process is related to the Polya Urn sampling scheme for finite Dirichlet distributions.
[edit] Stick-breaking Construction
A third approach to the Dirichlet process is provided by the so-called stick-breaking construction. Let be a set of random variables so that
where α is the normalisation constant for the measure M, so that M = αMnorm. Define according to
and let be a set of samples from Mnorm. The distribution given by is then a sample from the corresponding Dirichlet process. This method provides an explicit construction of the non-parametric sample, and makes clear the fact that the samples are discrete.
The name 'stick-breaking' comes from the interpretation of the β variables as the lengths of the pieces of a unit-length stick, the remainder of which is repeatedly broken according to samples from a beta distribution.
[edit] Applications of the Dirichlet Process
As draws from a Dirichlet process are discrete, an important use is as a prior probability in infinite mixture models. In this case, S is the parametric set of component distributions. The generative process is therefore that a sample is drawn from a Dirichlet process, and in turn for each data point a value is drawn from this sample distribution and used as the component distribution for that data point. The fact that there is no limit to the number of distinct components which may be generated makes this kind of model ideal for the case when the number of mixture components is not well-defined in advance. For example, the infinite mixture of Gaussians model [2].
The infinite nature of these models also lends them to Natural Language Processing applications, where it is often desirable to treat the vocabulary as an infinite, discrete set.
[edit] Related Distributions
- The Pitman-Yor distribution (also known as the 'two-parameter Poisson-Dirichlet process') is a generalisation of the Dirichlet process.
- The hierarchical Dirichlet process extends the ordinary Dirichlet process for modelling grouped data.
[edit] External links
- http://www.cs.toronto.edu/~beal/npbayes/ - Webpage for the NIPS 2003 workshop on non-parametric Bayesian methods.
- http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps - Michael Jordan's NIPS 2005 tutorial: Nonparametric Bayesian Methods: Dirichlet Processes, Chinese Restaurant Processes and All That.
[edit] References
- ^ Ferguson, Thomas (1973). "Bayesian analysis of some nonparametric problems". Annals of Statistics 1: 209--230. doi: .
- ^ Rasmussen, Carl (2000). "The Infinite Gaussian Mixture Model". Advances in Neural Information Processing Systems (NIPS) 12.