Differential entropy
Differential entropy (also referred to as continuous entropy) is a concept in information theory which tries to extend the idea of (Shannon) entropy, a measure of average surprisal of a random variable, to continuous probability distributions.
Definition
Let X be a random variable with a probability density function f whose support is a set $\mathcal{X}$. The differential entropy h(X) or h(f) is defined as

$$h(X) = -\int_{\mathcal{X}} f(x)\log f(x)\,dx.$$
As with its discrete analog, the units of differential entropy depend on the base of the logarithm, which is usually 2 (i.e., the units are bits). See logarithmic units for logarithms taken in different bases. Related concepts such as joint, conditional differential entropy, and relative entropy are defined in a similar fashion. One must take care in trying to apply properties of discrete entropy to differential entropy, since probability density functions can be greater than 1, and differential entropy can therefore be negative. For example, Uniform(0,1/2) has differential entropy $\int_0^{1/2} -2\log 2\,dx = -\log 2$.
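As a quick illustration of the definition (not part of the original article), the following sketch evaluates $-\int f \log_2 f$ numerically for the Uniform(0, 1/2) density and compares it with the closed form of $-1$ bit; it assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.integrate import quad

# Density of Uniform(0, 1/2): f(x) = 2 on (0, 1/2), 0 elsewhere.
def f(x):
    return 2.0

# Differential entropy in bits: h(X) = -∫ f(x) log2 f(x) dx over the support.
h_bits, _ = quad(lambda x: -f(x) * np.log2(f(x)), 0.0, 0.5)

print(h_bits)                      # ≈ -1.0 bit: differential entropy can be negative
print(np.isclose(h_bits, -1.0))    # True
```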
The definition of differential entropy above can be obtained by partitioning the range of X into bins of length Δ with associated sample points iΔ within the bins, for X Riemann integrable. This gives a quantized version of X, defined by $X^\Delta = i\Delta$ if $i\Delta \le X < (i+1)\Delta$. Then the entropy of $X^\Delta$ is

$$H(X^\Delta) = -\sum_i f(i\Delta)\Delta \log\big(f(i\Delta)\Delta\big) = -\sum_i f(i\Delta)\Delta \log f(i\Delta) - \sum_i f(i\Delta)\Delta \log\Delta.$$

The first term approximates the differential entropy, while the second term is approximately $-\log\Delta$, since $\sum_i f(i\Delta)\Delta \to \int f(x)\,dx = 1$ as $\Delta \to 0$. Note that this procedure suggests that the differential entropy of a discrete random variable should be $-\infty$.
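A minimal numerical sketch of the relation $H(X^\Delta) \approx h(X) - \log\Delta$ (not from the original article; it assumes NumPy and SciPy) quantizes a standard normal, whose differential entropy is $\tfrac{1}{2}\log(2\pi e)$ nats, with successively finer bins:

```python
import numpy as np
from scipy.stats import norm

# Quantize a standard normal to bins of width delta and compare the discrete
# entropy H(X^Δ) with h(X) - log Δ, where h(X) = 0.5*log(2*pi*e) nats.
h_true = 0.5 * np.log(2 * np.pi * np.e)

for delta in (0.5, 0.1, 0.01):
    edges = np.arange(-10, 10 + delta, delta)   # covers essentially all of the mass
    p = np.diff(norm.cdf(edges))                # exact bin probabilities
    p = p[p > 0]
    H_quant = -np.sum(p * np.log(p))            # discrete entropy of X^Δ, in nats
    print(delta, H_quant, h_true - np.log(delta))   # the last two columns agree closely
```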
Note that the continuous mutual information I(X;Y) has the distinction of retaining its fundamental significance as a measure of discrete information since it is actually the limit of the discrete mutual information of partitions of X and Y as these partitions become finer and finer. Thus it is invariant under quite general transformations of X and Y, and still represents the amount of discrete information that can be transmitted over a channel that admits a continuous space of values.
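The sketch below (again not from the article) illustrates this invariance numerically. It uses a hypothetical helper `mi_binned`, written only for this example, which partitions each variable into equal-mass (quantile) bins and computes the discrete mutual information of the binned pair; because strictly increasing transformations preserve the ordering of the samples, the resulting partition, and hence the discrete mutual information, is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100_000, 20                        # samples, bins per variable

# A correlated Gaussian pair (X, Y).
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)

def mi_binned(a, b, k):
    """Discrete mutual information (nats) of a and b after quantile-binning each into k cells."""
    qa = np.quantile(a, np.linspace(0, 1, k + 1))
    qb = np.quantile(b, np.linspace(0, 1, k + 1))
    ia = np.clip(np.searchsorted(qa, a, side="right") - 1, 0, k - 1)
    ib = np.clip(np.searchsorted(qb, b, side="right") - 1, 0, k - 1)
    joint = np.zeros((k, k))
    np.add.at(joint, (ia, ib), 1)                       # joint histogram
    p = joint / n
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz]))

print(mi_binned(x, y, k))                 # MI of the partitioned (X, Y)
print(mi_binned(x**3, np.exp(y), k))      # identical: monotone maps preserve the partition
```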
Properties of differential entropy
- For two densities f and g, the relative entropy satisfies $D_{\mathrm{KL}}(f \| g) = \int f(x)\log\frac{f(x)}{g(x)}\,dx \ge 0$, with equality if f = g almost everywhere. Similarly, for two random variables X and Y, $h(X \mid Y) \le h(X)$ and $h(X, Y) = h(X) + h(Y \mid X) = h(Y) + h(X \mid Y)$, with equality if and only if X and Y are independent.
- The chain rule for differential entropy holds as in the discrete case:
  $$h(X_1, \ldots, X_n) = \sum_{i=1}^{n} h(X_i \mid X_1, \ldots, X_{i-1}) \le \sum_{i=1}^{n} h(X_i).$$
- Differential entropy is translation invariant, i.e., $h(X + c) = h(X)$ for a constant c.
- Differential entropy is in general not invariant under arbitrary invertible maps. In particular, for a nonzero constant a, $h(aX) = h(X) + \log|a|$; for a vector-valued random variable X and an invertible matrix A, $h(AX) = h(X) + \log\left|\det A\right|$ (both cases are illustrated numerically after this list).
- If a random vector X has mean zero and covariance matrix K, then $h(X) \le \frac{1}{2}\log\big((2\pi e)^n \det K\big)$, with equality if and only if X is jointly Gaussian.
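The following sketch (not part of the original article) checks the translation and scaling properties against the closed-form Gaussian entropies; it assumes SciPy, whose `entropy()` method returns differential entropy in nats.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

sigma, a, c = 1.5, 3.0, 7.0

# h(X) for X ~ N(0, sigma^2).
h_X = norm(loc=0.0, scale=sigma).entropy()

# Translation invariance: h(X + c) = h(X).
print(np.isclose(norm(loc=c, scale=sigma).entropy(), h_X))                        # True

# Scaling: h(aX) = h(X) + log|a|, since aX ~ N(0, (|a| sigma)^2).
print(np.isclose(norm(scale=abs(a) * sigma).entropy(), h_X + np.log(abs(a))))     # True

# Matrix case: if X ~ N(0, K), then AX ~ N(0, A K A^T) and h(AX) = h(X) + log|det A|.
K = np.array([[2.0, 0.3], [0.3, 1.0]])
A = np.array([[1.0, 2.0], [0.5, -1.0]])
h_vecX = multivariate_normal(mean=np.zeros(2), cov=K).entropy()
h_AX = multivariate_normal(mean=np.zeros(2), cov=A @ K @ A.T).entropy()
print(np.isclose(h_AX, h_vecX + np.log(abs(np.linalg.det(A)))))                   # True
```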
Example: Exponential distribution
Let X be an exponentially distributed random variable with parameter λ, that is, with probability density function

$$f(x) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0.$$

Its differential entropy is then

$$h_e(X) = -\int_0^\infty \lambda e^{-\lambda x} \log\left(\lambda e^{-\lambda x}\right) dx = -\log\lambda \int_0^\infty \lambda e^{-\lambda x}\,dx + \lambda \int_0^\infty x\,\lambda e^{-\lambda x}\,dx = -\log\lambda + 1 = \log\frac{e}{\lambda}.$$
Here, $h_e(X)$ is used rather than h(X) to make explicit that the logarithm is taken to base e, which simplifies the calculation.
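A short numerical cross-check of $h_e(X) = 1 - \log\lambda$ (not part of the original article; it assumes NumPy and SciPy, and the value of λ is an arbitrary choice):

```python
import numpy as np
from scipy.stats import expon
from scipy.integrate import quad

lam = 2.5                                   # rate parameter λ (arbitrary choice)

# Closed form: h_e(X) = 1 - ln λ  (nats).
h_closed = 1.0 - np.log(lam)

# Direct numerical integration of -∫ f ln f over [0, ∞).
f = lambda x: lam * np.exp(-lam * x)
h_quad, _ = quad(lambda x: -f(x) * np.log(f(x)), 0, np.inf)

# SciPy's built-in differential entropy (its exponential uses scale = 1/λ).
h_scipy = expon(scale=1.0 / lam).entropy()

print(h_closed, h_quad, h_scipy)            # all ≈ 1 - ln(2.5) ≈ 0.0837 nats
```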
Differential entropies for various distributions
In the table below, $\Gamma(x) = \int_0^\infty e^{-t} t^{x-1}\,dt$ is the gamma function, $\psi(x) = \frac{d}{dx}\ln\Gamma(x)$ is the digamma function, $B(p,q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}$ is the beta function, and $\gamma_E$ is Euler's constant.
Distribution Name | Probability density function (pdf) | Entropy in nats |
---|---|---|
Uniform | $f(x) = \frac{1}{b-a}$ for $a \le x \le b$ | $\ln(b-a)$ |
Normal | $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ | $\frac{1}{2}\ln\left(2\pi e \sigma^2\right)$ |
Exponential | $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$ | $1 - \ln\lambda$ |
Rayleigh | $f(x) = \frac{x}{\sigma^2} e^{-x^2/(2\sigma^2)}$, $x \ge 0$ | $1 + \ln\frac{\sigma}{\sqrt{2}} + \frac{\gamma_E}{2}$ |
Beta | $f(x) = \frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}$ for $0 \le x \le 1$ | $\ln B(p,q) - (p-1)\left[\psi(p) - \psi(p+q)\right] - (q-1)\left[\psi(q) - \psi(p+q)\right]$ |
Cauchy | $f(x) = \frac{\gamma}{\pi(\gamma^2 + x^2)}$ | $\ln(4\pi\gamma)$ |
Chi | $f(x) = \frac{2}{2^{k/2}\Gamma(k/2)}\, x^{k-1} e^{-x^2/2}$, $x \ge 0$ | $\ln\frac{\Gamma(k/2)}{\sqrt{2}} - \frac{k-1}{2}\psi\left(\frac{k}{2}\right) + \frac{k}{2}$ |
Chi-squared | $f(x) = \frac{1}{2^{k/2}\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}$, $x \ge 0$ | $\ln\left(2\Gamma\left(\frac{k}{2}\right)\right) + \left(1 - \frac{k}{2}\right)\psi\left(\frac{k}{2}\right) + \frac{k}{2}$ |
Erlang | $f(x) = \frac{\lambda^k}{(k-1)!}\, x^{k-1} e^{-\lambda x}$, $x \ge 0$ | $(1-k)\psi(k) + \ln\frac{\Gamma(k)}{\lambda} + k$ |
F | $f(x) = \frac{n_1^{n_1/2} n_2^{n_2/2}}{B(n_1/2,\, n_2/2)} \frac{x^{n_1/2 - 1}}{(n_2 + n_1 x)^{(n_1+n_2)/2}}$, $x \ge 0$ | $\ln\left(\frac{n_2}{n_1} B\left(\frac{n_1}{2}, \frac{n_2}{2}\right)\right) + \left(1 - \frac{n_1}{2}\right)\psi\left(\frac{n_1}{2}\right) - \left(1 + \frac{n_2}{2}\right)\psi\left(\frac{n_2}{2}\right) + \frac{n_1+n_2}{2}\psi\left(\frac{n_1+n_2}{2}\right)$ |
Gamma | $f(x) = \frac{x^{k-1} e^{-x/\theta}}{\theta^k \Gamma(k)}$, $x \ge 0$ | $\ln\left(\theta\,\Gamma(k)\right) + (1-k)\psi(k) + k$ |
Laplace | $f(x) = \frac{1}{2b} e^{-\frac{\lvert x-\mu\rvert}{b}}$ | $1 + \ln(2b)$ |
Logistic | $f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}$ | $2$ |
Lognormal | $f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$, $x > 0$ | $\mu + \frac{1}{2}\ln\left(2\pi e \sigma^2\right)$ |
Maxwell-Boltzmann | $f(x) = \frac{1}{a^3}\sqrt{\frac{2}{\pi}}\, x^2 e^{-x^2/(2a^2)}$, $x \ge 0$ | $\ln\left(a\sqrt{2\pi}\right) + \gamma_E - \frac{1}{2}$ |
Generalized normal | $f(x) = \frac{2\beta^{\alpha/2}}{\Gamma(\alpha/2)}\, x^{\alpha-1} e^{-\beta x^2}$, $x \ge 0$ | $\ln\frac{\Gamma(\alpha/2)}{2\sqrt{\beta}} - \frac{\alpha-1}{2}\psi\left(\frac{\alpha}{2}\right) + \frac{\alpha}{2}$ |
Pareto | $f(x) = \frac{a x_m^a}{x^{a+1}}$, $x \ge x_m$ | $\ln\frac{x_m}{a} + 1 + \frac{1}{a}$ |
Student's t | $f(x) = \frac{\left(1 + x^2/\nu\right)^{-\frac{\nu+1}{2}}}{\sqrt{\nu}\, B\left(\frac{\nu}{2}, \frac{1}{2}\right)}$ | $\frac{\nu+1}{2}\left[\psi\left(\frac{\nu+1}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right] + \ln\left(\sqrt{\nu}\, B\left(\frac{\nu}{2}, \frac{1}{2}\right)\right)$ |
Triangular | $f(x) = \begin{cases}\frac{2(x-a)}{(b-a)(c-a)} & a \le x \le c \\ \frac{2(b-x)}{(b-a)(b-c)} & c < x \le b\end{cases}$ | $\frac{1}{2} + \ln\frac{b-a}{2}$ |
Weibull | $f(x) = \frac{k}{\lambda^k}\, x^{k-1} e^{-(x/\lambda)^k}$, $x \ge 0$ | $\frac{(k-1)\gamma_E}{k} + \ln\frac{\lambda}{k} + 1$ |
Multivariate normal | $f(\mathbf{x}) = \frac{\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol\mu)\right)}{(2\pi)^{N/2}\, \lvert\Sigma\rvert^{1/2}}$ | $\frac{1}{2}\ln\left((2\pi e)^N \det\Sigma\right)$ |
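A few entries of the table can be spot-checked against SciPy's built-in differential entropies (a sketch added here, not part of the original article; it assumes NumPy and SciPy, and the parameter values are arbitrary choices):

```python
import numpy as np
from scipy import stats
from scipy.special import gamma as Gamma, digamma

sigma, lam, b, k, theta = 1.3, 2.0, 0.7, 4.0, 1.8
g = np.euler_gamma                                   # Euler's constant γ_E

checks = {
    # (SciPy's .entropy() in nats, closed form from the table)
    "normal":      (stats.norm(scale=sigma).entropy(),     0.5 * np.log(2 * np.pi * np.e * sigma**2)),
    "exponential": (stats.expon(scale=1/lam).entropy(),    1 - np.log(lam)),
    "laplace":     (stats.laplace(scale=b).entropy(),      1 + np.log(2 * b)),
    "rayleigh":    (stats.rayleigh(scale=sigma).entropy(), 1 + np.log(sigma / np.sqrt(2)) + g / 2),
    "gamma":       (stats.gamma(k, scale=theta).entropy(), np.log(theta * Gamma(k)) + (1 - k) * digamma(k) + k),
    "chi-squared": (stats.chi2(k).entropy(),               np.log(2 * Gamma(k/2)) + (1 - k/2) * digamma(k/2) + k/2),
}

for name, (scipy_val, table_val) in checks.items():
    print(f"{name:12s} {scipy_val:.6f} {table_val:.6f} {np.isclose(scipy_val, table_val)}")
```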
See also
References
- Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. New York: Wiley, 1991. ISBN 0-471-06259-6.
- Lazo, A. and P. Rathie. "On the entropy of continuous probability distributions." IEEE Transactions on Information Theory, 24(1): 120–122, 1978.