Dirichlet distribution

From Wikipedia, the free encyclopedia

Several images of the probability density of the Dirichlet distribution when K = 3, for various parameter vectors α. Clockwise from top left: α = (6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4).

In probability and statistics, the Dirichlet distribution (after Johann Peter Gustav Lejeune Dirichlet), often denoted Dir(α), is a family of continuous multivariate probability distributions parametrized by a vector α of positive reals. It is the multivariate generalization of the beta distribution, and the conjugate prior of the multinomial distribution in Bayesian statistics.

Probability density function

The probability density function of the Dirichlet distribution of order K is the following function of a K-dimensional vector x = (x1, ..., xK) with xi ≥ 0:

f(x; \alpha) \propto \prod_{i=1}^K x_i^{\alpha_i - 1} \;\delta\left(1 -\sum_{i=1}^K x_i\right)

where α = (α1, ..., αK) is a parameter vector with αi > 0. The Dirac delta δ ensures that the density is zero unless

\sum_{i=1}^K x_i = 1.\,\!

The normalizing constant is the multinomial beta function, which is expressed in terms of the gamma function:

\frac{\prod_{i=1}^K \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^K \alpha_i\right)} = \mathrm{B}(\alpha).

The density can therefore be written as the function

g(x; \alpha) = \frac{1}{\mathrm{B}(\alpha)} \prod_{i=1}^K x_i^{\alpha_i-1}

with domain the set of K-component vectors x over the nonnegative reals with |x|1 = 1 (i.e. the (K − 1)-simplex).
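The density above can be evaluated directly from the formula. The following sketch, using only the Python standard library, computes the log-density (working in log space avoids overflow in the gamma functions); the function name and tolerance are illustrative, not from the article.

```python
# Sketch: evaluating g(x; alpha) = (1/B(alpha)) * prod_i x_i^(alpha_i - 1)
# on the (K-1)-simplex, via log Gamma (math.lgamma).
from math import lgamma, exp, log

def dirichlet_log_pdf(x, alpha):
    """Log-density of Dir(alpha) at a point x on the (K-1)-simplex."""
    if abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must lie on the simplex (components summing to 1)")
    # log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)
    log_beta = sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))
    return sum((a - 1.0) * log(xi) for a, xi in zip(alpha, x)) - log_beta

# Example: K = 3, alpha = (6, 2, 2), evaluated at the mean alpha / alpha_0
pdf = exp(dirichlet_log_pdf((0.6, 0.2, 0.2), (6.0, 2.0, 2.0)))
```

As a sanity check, Dir(1, 1, 1) is uniform on the 2-simplex with constant density 1/B(1, 1, 1) = Γ(3) = 2.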

Properties

Let X = (X_1, \ldots, X_K)\sim\operatorname{Dir}(\alpha) and \alpha_0 = \sum_{i=1}^K\alpha_i, then

\mathrm{E}[X_i|\alpha] = \frac{\alpha_i}{\alpha_0},
\mathrm{Var}[X_i|\alpha] = \frac{\alpha_i (\alpha_0-\alpha_i)}{\alpha_0^2 (\alpha_0+1)},
\mathrm{Cov}[X_i, X_j|\alpha] = \frac{- \alpha_i \alpha_j}{\alpha_0^2 (\alpha_0+1)} \quad (i \neq j).

When all αi > 1, the mode of the distribution is the vector (x1, ..., xK) with

x_i = \frac{\alpha_i - 1}{\alpha_0 - K}.
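The moment and mode formulas above translate directly into code. This sketch computes them for a given parameter vector; the helper name is illustrative.

```python
# Sketch: mean, variance, and mode of Dir(alpha), following the formulas
# E[X_i] = alpha_i / alpha_0, Var[X_i] = alpha_i (alpha_0 - alpha_i) / (alpha_0^2 (alpha_0 + 1)),
# and mode_i = (alpha_i - 1) / (alpha_0 - K) when every alpha_i > 1.
def dirichlet_moments(alpha):
    a0 = sum(alpha)
    mean = [a / a0 for a in alpha]
    var = [a * (a0 - a) / (a0**2 * (a0 + 1)) for a in alpha]
    # The mode exists only when every alpha_i > 1
    mode = ([(a - 1) / (a0 - len(alpha)) for a in alpha]
            if all(a > 1 for a in alpha) else None)
    return mean, var, mode

mean, var, mode = dirichlet_moments([6.0, 2.0, 2.0])
# mean = [0.6, 0.2, 0.2]; alpha_0 = 10, so mode = [5/7, 1/7, 1/7]
```

Note that the mean and mode differ unless all αi are equal, and the mode is undefined when some αi ≤ 1.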

The Dirichlet distribution is conjugate to the multinomial distribution in the following sense: if

\beta = (\beta_1, \ldots, \beta_{K}), \quad \beta \mid X \sim \operatorname{Mult}(X),

where βi is the number of occurrences of i in a sample of n points from the discrete distribution on {1, ..., K} defined by X, then

X | \beta \sim \operatorname{Dir}(\alpha + \beta).

This relationship is used in Bayesian statistics to estimate the hidden parameters, X, of a discrete probability distribution given a collection of n samples. Intuitively, if the prior is represented as Dir(α), then Dir(α + β) is the posterior following a sequence of observations with histogram β.

Connections to other distributions

If, for i\in\{1,2,\ldots,K\},

Y_i\sim\operatorname{Gamma}(\textrm{shape}=\alpha_i,\textrm{scale}=1) independently,

then

V=\sum_{i=1}^K Y_i\sim\operatorname{Gamma}(\textrm{shape}=\sum_{i=1}^K\alpha_i,\textrm{scale}=1), and
(X_1,\ldots,X_K) = (Y_1/V,\ldots,Y_K/V)\sim \operatorname{Dir}(\alpha_1,\ldots,\alpha_K).

Though the Xi are not independent of one another, they can be seen to be generated from a set of K independent gamma random variables. Unfortunately, since the sum V is lost in the process of forming X = (X1, ..., XK), it is not possible to recover the original gamma random variables from these values alone. Nevertheless, because independent random variables are simpler to work with, this reparametrization can still be useful for proofs about properties of the Dirichlet distribution.

Random number generation

A method to sample a random vector x=(x_1, \ldots, x_K) from the K-dimensional Dirichlet distribution with parameters (\alpha_1, \ldots, \alpha_K) follows immediately from this connection. First, draw K independent random samples y_1, \ldots, y_K from gamma distributions each with density

\frac{y_i^{\alpha_i-1} \; e^{-y_i}}{\Gamma (\alpha_i)}, \!

and then set

x_i = y_i/\sum_{j=1}^K y_j. \!
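The two steps just described can be sketched with the standard library's gamma sampler, random.gammavariate; the function name is illustrative.

```python
# Sketch: sample from Dir(alpha) by drawing K independent Gamma(alpha_i, 1)
# variates and normalizing by their sum.
import random

def sample_dirichlet(alpha, rng=random):
    """Draw one sample from the Dirichlet distribution with parameters alpha."""
    y = [rng.gammavariate(a, 1.0) for a in alpha]  # shape alpha_i, scale 1
    total = sum(y)
    return [yi / total for yi in y]

x = sample_dirichlet([6.0, 2.0, 2.0])
# x lies on the 2-simplex: components are nonnegative and sum to 1
```

Repeating this draw many times and averaging the samples approximates the mean α/α0 = (0.6, 0.2, 0.2).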

Intuitive interpretation of the parameters

For the case K = 3, the probability density function is defined on a filled triangle (the 2-simplex), and is unimodal when all αi > 1. The vector α/α0 is the mean (not the mode) of the distribution.

References

Luc Devroye, Non-Uniform Random Variate Generation, Springer-Verlag, 1986. http://cg.scs.carleton.ca/~luc/rnbookindex.html
