Pointwise mutual information

Pointwise mutual information (PMI),[1] or point mutual information, is a measure of association used in information theory and statistics.

Definition

The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence given their joint distribution and the probability of their coincidence that would be expected if X and Y were independent (i.e. given only their individual distributions). Mathematically:


\operatorname{pmi}(x;y) \equiv \log\frac{p(x,y)}{p(x)p(y)} = \log\frac{p(x|y)}{p(x)} = \log\frac{p(y|x)}{p(y)}.

The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes (with respect to the joint distribution p(x,y)).

The measure is symmetric (\operatorname{pmi}(x;y)=\operatorname{pmi}(y;x)). It can take positive or negative values, and equals zero exactly when p(x,y) = p(x)p(y), as happens for every pair of outcomes when X and Y are independent. Although individual PMI values may be negative or positive, their expected value over all joint events (the mutual information) is non-negative. PMI is maximized when x and y are perfectly associated (i.e. p(x|y) = 1 or p(y|x) = 1), yielding the following bounds:


-\infty \leq \operatorname{pmi}(x;y) \leq \min\left[ -\log p(x), -\log p(y) \right] .

Finally, \operatorname{pmi}(x;y) will increase if p(x|y) is fixed but p(x) decreases.

Here is an example to illustrate:

 x   y   p(x, y)
 0   0   0.1
 0   1   0.7
 1   0   0.15
 1   1   0.05

Using this table we can marginalize to get the following additional table for the individual distributions:

      p(x)   p(y)
 0    0.8    0.25
 1    0.2    0.75

With this example, we can compute four values for pmi(x;y). Using base-2 logarithms:

pmi(x=0;y=0) = -1
pmi(x=0;y=1) ≈ 0.222392421
pmi(x=1;y=0) ≈ 1.584962501
pmi(x=1;y=1) ≈ -1.584962501

(For reference, the mutual information \operatorname{I}(X;Y) would then be 0.214170945)
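
These calculations can be reproduced with a short script. The following Python sketch (variable names are chosen here purely for illustration) builds the joint and marginal tables above and prints each pmi value together with the mutual information:

from math import log2

# Joint distribution p(x, y) from the example table above.
p_xy = {
    (0, 0): 0.10,
    (0, 1): 0.70,
    (1, 0): 0.15,
    (1, 1): 0.05,
}

# Marginal distributions obtained by summing out the other variable.
p_x = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in p_xy.items() if yi == y) for y in (0, 1)}

def pmi(x, y):
    # Pointwise mutual information in bits (base-2 logarithm).
    return log2(p_xy[(x, y)] / (p_x[x] * p_y[y]))

for x, y in sorted(p_xy):
    print(f"pmi(x={x};y={y}) = {pmi(x, y):+.9f}")

# Mutual information: expected value of pmi under the joint distribution.
mi = sum(p * pmi(x, y) for (x, y), p in p_xy.items())
print(f"I(X;Y) = {mi:.9f}")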

Similarities to mutual information

Pointwise mutual information satisfies many of the same relationships as mutual information. In particular,


\begin{align}
\operatorname{pmi}(x;y) &= h(x) + h(y) - h(x,y) \\ 
 &= h(x) - h(x|y) \\ 
 &= h(y) - h(y|x)
\end{align}

where h(x) is the self-information of x, i.e. -\log_2 p(X=x), and h(x,y) and h(x|y) are defined analogously from p(x,y) and p(x|y).
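
As a quick sanity check, the first identity can be verified numerically on the example distribution from the previous section; the sketch below assumes base-2 logarithms throughout:

from math import log2

# Example joint distribution and marginals from the previous section.
p_xy = {(0, 0): 0.10, (0, 1): 0.70, (1, 0): 0.15, (1, 1): 0.05}
p_x = {0: 0.8, 1: 0.2}
p_y = {0: 0.25, 1: 0.75}

def h(p):
    # Self-information, in bits, of an event with probability p.
    return -log2(p)

# Check pmi(x;y) = h(x) + h(y) - h(x,y) for every outcome pair.
for (x, y), p in p_xy.items():
    pmi = log2(p / (p_x[x] * p_y[y]))
    assert abs(pmi - (h(p_x[x]) + h(p_y[y]) - h(p))) < 1e-12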

Normalized pointwise mutual information (npmi)

Pointwise mutual information can be normalized to the range [-1,+1], giving -1 (in the limit) for outcomes that never occur together, 0 for independence, and +1 for complete co-occurrence.



\operatorname{npmi}(x;y) = \frac{\operatorname{pmi}(x;y)}{-\log \left[ p(x, y) \right] }
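
Applied to the earlier example, npmi is a one-line extension of the pmi computation. The sketch below (again with illustrative variable names) divides each pmi value by -\log_2 p(x,y); note that the base of the logarithm cancels in the ratio:

from math import log2

# Example joint distribution and marginals from the earlier section.
p_xy = {(0, 0): 0.10, (0, 1): 0.70, (1, 0): 0.15, (1, 1): 0.05}
p_x = {0: 0.8, 1: 0.2}
p_y = {0: 0.25, 1: 0.75}

def npmi(x, y):
    # pmi(x;y) divided by -log p(x,y).
    pmi = log2(p_xy[(x, y)] / (p_x[x] * p_y[y]))
    return pmi / -log2(p_xy[(x, y)])

for x, y in sorted(p_xy):
    print(f"npmi(x={x};y={y}) = {npmi(x, y):+.3f}")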

Chain-rule for pmi

Pointwise mutual information follows the chain rule, that is,

\operatorname{pmi}(x;yz) = \operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y)

This is easily proven by:


\begin{align}
\operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y) & {} = \log\frac{p(x,y)}{p(x)p(y)} + \log\frac{p(x,z|y)}{p(x|y)p(z|y)} \\ 
& {} = \log \left[ \frac{p(x,y)}{p(x)p(y)} \frac{p(x,z|y)}{p(x|y)p(z|y)} \right] \\ 
& {} = \log \frac{p(x|y)p(y)p(x,z|y)}{p(x)p(y)p(x|y)p(z|y)} \\
& {} = \log \frac{p(x,z|y)}{p(x)p(z|y)} \\
& {} = \log \frac{p(x,yz)}{p(x)p(yz)} \\
& {} = \operatorname{pmi}(x;yz)
\end{align}
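
The identity can also be checked numerically. The sketch below uses a hypothetical three-variable joint distribution, chosen only for illustration, and confirms that \operatorname{pmi}(x;yz) equals \operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y) for every outcome triple:

from math import log2

# A hypothetical joint distribution p(x, y, z) over three binary variables,
# chosen here purely to illustrate the identity; any valid distribution works.
p_xyz = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15,
    (0, 1, 0): 0.20, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.15,
    (1, 1, 0): 0.10, (1, 1, 1): 0.20,
}

def marginal(keep):
    # Sum out all coordinates not listed in `keep`.
    out = {}
    for key, p in p_xyz.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

p_x, p_y = marginal([0]), marginal([1])
p_xy, p_yz = marginal([0, 1]), marginal([1, 2])

for (x, y, z), p in p_xyz.items():
    pmi_x_yz = log2(p / (p_x[(x,)] * p_yz[(y, z)]))
    pmi_x_y = log2(p_xy[(x, y)] / (p_x[(x,)] * p_y[(y,)]))
    # pmi(x;z|y) = log p(x,z|y) / (p(x|y) p(z|y)), all conditioned on the same y.
    pmi_x_z_given_y = log2((p / p_y[(y,)]) /
                           ((p_xy[(x, y)] / p_y[(y,)]) * (p_yz[(y, z)] / p_y[(y,)])))
    assert abs(pmi_x_yz - (pmi_x_y + pmi_x_z_given_y)) < 1e-12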

References

  1. Kenneth Ward Church and Patrick Hanks (March 1990). "Word association norms, mutual information, and lexicography". Comput. Linguist. 16 (1): 22–29.
