Joint entropy


Joint entropy is an entropy measure used in information theory. It measures the uncertainty contained in a joint system of two random variables. If the random variables are X and Y, the joint entropy is written H(X,Y). Like other entropies, the joint entropy can be measured in bits, nats, or hartleys, depending on the base of the logarithm.


Background

Given a random variable X, the entropy H(X) describes our uncertainty about the value of X. If X consists of several events x, each occurring with probability p_x, then the entropy of X is

H(X) = -\sum_x p_x \log_2(p_x)
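As an illustration, the following minimal Python sketch evaluates this formula for a distribution given as a table of probabilities; the function name entropy and the dictionary representation are choices made here purely for illustration.

    # Shannon entropy in bits of a discrete distribution, given as a dict
    # mapping each outcome x to its probability p_x; zero-probability terms are skipped.
    from math import log2

    def entropy(p):
        return -sum(p_x * log2(p_x) for p_x in p.values() if p_x > 0)

    print(entropy({"heads": 0.5, "tails": 0.5}))   # 1.0 (one bit for a fair coin)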

Consider another random variable Y, containing events y occurring with probabilities p_y. Y has entropy H(Y).

However, if X and Y describe related events, the total entropy of the combined system may be less than H(X) + H(Y). For example, imagine we choose an integer between 1 and 8, with equal probability for each integer. Let X represent whether the integer is even, and Y represent whether the integer is prime. One-half of the integers between 1 and 8 are even, and one-half are prime, so H(X) = H(Y) = 1. However, if we know that the integer is even, there is only a 1 in 4 chance that it is also prime; the two distributions are related. The total entropy of the combined system is therefore less than 2 bits, and we need a way of measuring this total entropy.
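The sketch below continues the Python example above and checks the two marginal entropies by listing the eight equally likely outcomes (1 is not counted as a prime, as noted in the next section); the names outcomes, p_x, and p_y are introduced here only for illustration.

    # X = parity and Y = primality of an integer drawn uniformly from {1, ..., 8}.
    from collections import Counter

    primes = {2, 3, 5, 7}
    outcomes = [("even" if n % 2 == 0 else "odd",
                 "prime" if n in primes else "not prime") for n in range(1, 9)]

    # Marginal distributions of X and Y (each integer has probability 1/8).
    p_x = {x: c / 8 for x, c in Counter(x for x, _ in outcomes).items()}
    p_y = {y: c / 8 for y, c in Counter(y for _, y in outcomes).items()}
    print(entropy(p_x), entropy(p_y))   # 1.0 1.0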

Definition

We solve this by considering each pair of possible outcomes (x,y). If each pair of outcomes occurs with probability p_{x,y}, the joint entropy is defined as

H(X,Y) = -\sum_{x,y} p_{x,y} \log_2(p_{x,y})
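In code, this is the same formula applied to a distribution over pairs. A minimal sketch, reusing the entropy helper defined earlier (joint_entropy and p_xy are illustrative names):

    # Joint entropy in bits: p_xy maps each pair (x, y) to its probability p_{x,y}.
    # Identical in form to entropy() above, but over a distribution of pairs.
    def joint_entropy(p_xy):
        return -sum(p * log2(p) for p in p_xy.values() if p > 0)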

In the example above, 1 is not counted as a prime, so the joint probability distribution becomes:

P(even, prime) = P(odd, not prime) = 1/8

P(even, not prime) = P(odd, prime) = 3/8

Thus, the joint entropy is

H(X,Y) = -2 \cdot \frac{1}{8}\log_2(1/8) - 2 \cdot \frac{3}{8}\log_2(3/8) \approx 1.8 \text{ bits}.
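Continuing the Python sketches above, the joint distribution of the example can be tabulated from the eight outcomes and its entropy checked numerically (p_xy is an illustrative name):

    # Joint distribution of (X, Y) for the even/prime example, and its joint entropy.
    p_xy = {pair: c / 8 for pair, c in Counter(outcomes).items()}
    print(p_xy[("even", "prime")], p_xy[("even", "not prime")])   # 0.125 0.375
    print(joint_entropy(p_xy))   # 1.811..., i.e. about 1.8 bits, less than H(X) + H(Y) = 2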

Properties

Greater than subsystem entropies

The joint entropy is always at least as large as the entropy of either of the original variables; adding a new variable to a system can never reduce the uncertainty.

H(X,Y) \geq H(X)

This inequality is an equality if and only if Y is a (deterministic) function of X.

If Y is a (deterministic) function of X, we also have

H(X) \geq H(Y)
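The first inequality can be checked on the even/prime example by reusing the quantities computed in the sketches above (by symmetry the same check works with X and Y swapped):

    # The joint entropy is at least as large as each marginal entropy.
    print(joint_entropy(p_xy) >= entropy(p_x))   # True  (1.811... >= 1.0)
    print(joint_entropy(p_xy) >= entropy(p_y))   # True  (1.811... >= 1.0)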


Subadditivity

Two systems, considered together, can never have more entropy than the sum of their individual entropies. This is an example of subadditivity.

H(X,Y) \leq H(X) + H(Y)

This inequality is an equality if and only if X and Y are statistically independent.
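In the even/prime example the inequality is strict, because parity and primality are dependent; for two independent variables, such as two fair coin flips, it holds with equality. A check reusing the sketches above (coins is an illustrative name):

    # Subadditivity: H(X,Y) <= H(X) + H(Y); strict for the dependent even/prime example.
    print(joint_entropy(p_xy), entropy(p_x) + entropy(p_y))   # 1.811... 2.0

    # Two independent fair coins: joint entropy equals the sum of the marginal entropies.
    coins = {(a, b): 0.25 for a in "HT" for b in "HT"}
    print(joint_entropy(coins))   # 2.0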

Bounds

Like other entropies, the joint entropy is always nonnegative: H(X,Y) \geq 0.

Relations to other entropy measures

The joint entropy is used in the definitions of the conditional entropy:

H(X|Y) = H(X,Y) - H(Y)

and the mutual information:

I(X;Y) = H(X) + H(Y) - H(X,Y)
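For the even/prime example, these identities give H(X|Y) = 1.811... - 1, about 0.81 bits, and I(X;Y) = 1 + 1 - 1.811..., about 0.19 bits. The sketch below reuses the quantities from the earlier Python snippets:

    # Conditional entropy and mutual information from the identities above (in bits).
    print(joint_entropy(p_xy) - entropy(p_y))                  # H(X|Y) = 0.811...
    print(entropy(p_x) + entropy(p_y) - joint_entropy(p_xy))   # I(X;Y) = 0.188...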

In quantum information theory, the joint entropy is generalized into the joint quantum entropy.
