Conditional entropy

In information theory, the conditional entropy (or equivocation) quantifies the remaining entropy (i.e. uncertainty) of a random variable Y given that the value of a second random variable X is known. It is referred to as the entropy of Y conditional on X, and is written H(Y | X). Like other entropies, the conditional entropy is measured in bits, nats, or bans.

Given discrete random variables X with support \mathcal X and Y with support \mathcal Y, the conditional entropy of Y given X is defined as:

\begin{align}
H(Y|X)\ &\stackrel{\mathrm{def}}{=}\sum_{x\in\mathcal X}\,p(x)\,H(Y|X=x)\\
&=-\sum_{x\in\mathcal X}p(x)\sum_{y\in\mathcal Y}\,p(y|x)\,\log\,p(y|x)\\
&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}\,p(y,x)\,\log\,p(y|x)\\
&=-E_{p(x,y)}\log\,p(y|x).
\end{align}
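
To make the definition concrete, the following is a minimal sketch in Python that evaluates H(Y|X) in bits directly from a joint probability table, using p(y|x) = p(y,x)/p(x); the toy table p_xy and the helper name conditional_entropy are hypothetical, chosen only for illustration.

    import math

    # Hypothetical toy joint distribution p(x, y) over X, Y in {0, 1}.
    p_xy = {
        (0, 0): 0.25, (0, 1): 0.25,
        (1, 0): 0.50, (1, 1): 0.00,
    }

    def conditional_entropy(p_xy):
        """H(Y|X) = -sum over x, y of p(x,y) * log2 p(y|x), in bits."""
        # Marginal p(x), obtained by summing the joint table over y.
        p_x = {}
        for (x, _), p in p_xy.items():
            p_x[x] = p_x.get(x, 0.0) + p
        h = 0.0
        for (x, y), p in p_xy.items():
            if p > 0:  # terms with p(x, y) = 0 contribute nothing
                h -= p * math.log2(p / p_x[x])  # p(y|x) = p(y,x)/p(x)
        return h

    print(conditional_entropy(p_xy))  # 0.5

For this table, X = 1 forces Y = 0 (no remaining uncertainty), while X = 0 leaves Y a fair coin, so H(Y|X) = 0.5·1 + 0.5·0 = 0.5 bits.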

From this definition and the definition of conditional probability, p(y|x) = p(y,x)/p(x), the chain rule for conditional entropy is

H(Y|X)\,=\,H(Y,X)-H(X).

This is true because

\begin{align}
H(Y|X)&=-E_{p(x,y)}\log\,p(y|x)\\
&=-E_{p(x,y)}\log\left(\frac{p(y,x)}{p(x)}\right)\\
&=-E_{p(x,y)}(\log p(y,x)-\log p(x))\\
&=-E_{p(x,y)}\log p(y,x)+E_{p(x)}\log p(x)\\
&=H(Y,X)-H(X).
\end{align}

(In the second-to-last step, the expectation of \log p(x) can be taken with respect to p(x) alone because \log p(x) does not depend on y.)
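
The chain rule can be checked numerically on the same hypothetical joint table as in the sketch above: computing H(Y,X) and H(X) directly and subtracting reproduces the value of H(Y|X) obtained from the definition.

    import math

    def entropy(dist):
        """Shannon entropy in bits of a distribution given as {outcome: probability}."""
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    # Same hypothetical joint table as in the earlier sketch.
    p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.50, (1, 1): 0.00}

    # Marginal distribution of X.
    p_x = {}
    for (x, _), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p

    print(entropy(p_xy) - entropy(p_x))  # H(Y,X) - H(X) = 1.5 - 1.0 = 0.5 bits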

Intuitively, the combined system contains H(X,Y) bits of information: we need H(X,Y) bits of information to reconstruct its exact state. If we learn the value of X, we have gained H(X) bits of information, and the system has H(Y | X) bits of uncertainty remaining. H(Y | X) = 0 if and only if the value of Y is completely determined by the value of X. At the other extreme, H(Y | X) = H(Y) if and only if Y and X are independent random variables.
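
The two extremes can be illustrated by continuing the earlier sketch (the toy tables below are hypothetical, and conditional_entropy is the helper defined there): when Y is a deterministic function of X the conditional entropy vanishes, and when Y is independent of X it equals H(Y).

    # Y completely determined by X (here Y = X): H(Y|X) = 0.
    determined = {(0, 0): 0.5, (1, 1): 0.5}
    print(conditional_entropy(determined))   # 0.0

    # Y independent of X (four equally likely pairs): H(Y|X) = H(Y) = 1 bit.
    independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
    print(conditional_entropy(independent))  # 1.0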

In quantum information theory, the conditional entropy is generalized to the conditional quantum entropy.
