Multivariate mutual information

In information theory there have been various attempts over the years to extend the definition of mutual information to more than two random variables. These attempts have met with a great deal of confusion and a realization that interactions among many random variables are poorly understood.

Definition

The conditional mutual information can be used to inductively define a multivariate mutual information (MMI) in a set- or measure-theoretic sense in the context of information diagrams. In this sense we define the multivariate mutual information as follows:

I(X_1;\ldots;X_{n+1}) = I(X_1;\ldots;X_n) - I(X_1;\ldots;X_n|X_{n+1}),

where

I(X_1;\ldots;X_n|X_{n+1}) = \mathbb E_{X_{n+1}}\big(I(X_1;\ldots;X_n)|X_{n+1}\big).

This definition is identical to that of interaction information except for a change in sign in the case of an odd number of random variables.

Properties

Multivariate mutual information and conditional multivariate mutual information can be decomposed into alternating sums of entropies (Jakulin & Bratko 2003). The general expression for the interaction information on the variable set \mathcal{V}=\{X_{1},X_{2},\ldots ,X_{n}\} in terms of the entropies of its subsets is

I(\mathcal{V})\equiv -\sum_{\mathcal{T}\subseteq \mathcal{V}}(-1)^{\left\vert\mathcal{V}\right\vert -\left\vert \mathcal{T}\right\vert}H(\mathcal{T}),

an alternating (inclusion-exclusion) sum over all subsets \mathcal{T}\subseteq \mathcal{V}, where \left\vert \mathcal{V}\right\vert =n. The multivariate mutual information carries the opposite overall sign when n is odd (see the remark above), so its conditional form decomposes as

I(X_1;\ldots;X_n|Y) = -\sum_{T \subseteq \{1,\ldots,n\} } (-1)^{|T|} H(T|Y),

and the unconditional form is obtained by dropping the conditioning variable Y.
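As an illustrative sketch (the helper names are hypothetical, not from the source), the multivariate mutual information of a discrete joint distribution can be evaluated directly as the alternating entropy sum I(X_1;\ldots;X_n) = -\sum_{\emptyset \neq T \subseteq \{1,\ldots,n\}} (-1)^{|T|} H(X_T), the unconditional counterpart of the conditional formula above:

```python
from itertools import combinations
from math import log2

def entropy(joint, idx):
    """H of the marginal of the variables at positions idx.
    joint maps outcome tuples (x_1, ..., x_n) to probabilities."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def mmi(joint, n):
    """Multivariate mutual information I(X_1;...;X_n) as the
    alternating sum -sum over nonempty T of (-1)^|T| H(X_T)."""
    return -sum((-1) ** r * entropy(joint, T)
                for r in range(1, n + 1)
                for T in combinations(range(n), r))

# Sanity check: two perfectly correlated bits give I(X;Y) = 1 bit.
print(mmi({(0, 0): 0.5, (1, 1): 0.5}, 2))  # 1.0
```

For n = 2 this reduces to the ordinary mutual information H(X) + H(Y) - H(X,Y), which is what the sanity check exercises.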

Example of positive multivariate mutual information

Positive MMI is typical of common-cause structures. For example, clouds cause rain and also block the sun; the correlation between rain and darkness is therefore partly accounted for by the presence of clouds, so that I(rain;dark|cloud) \leq I(rain;dark). Since the MMI is the difference I(rain;dark) - I(rain;dark|cloud), the result is positive MMI I(rain;dark;cloud).
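A minimal numeric sketch of this common-cause structure (the probabilities below are hypothetical, chosen only so that rain and dark are conditionally independent given cloud):

```python
from math import log2

# Hypothetical numbers: cloud ~ Bernoulli(1/2); given cloud, rain and
# dark are conditionally independent, each more likely under clouds.
P_GIVEN_CLOUD = {0: 0.1, 1: 0.8}   # P(rain=1|cloud) = P(dark=1|cloud)

joint = {}  # maps (rain, dark, cloud) -> probability
for c in (0, 1):
    for r in (0, 1):
        for d in (0, 1):
            pr = P_GIVEN_CLOUD[c] if r else 1 - P_GIVEN_CLOUD[c]
            pd = P_GIVEN_CLOUD[c] if d else 1 - P_GIVEN_CLOUD[c]
            joint[(r, d, c)] = 0.5 * pr * pd

def H(idx):
    """Entropy of the marginal on the given variable positions."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

i_rd = H((0,)) + H((1,)) - H((0, 1))                     # I(rain;dark) > 0
i_rd_c = H((0, 2)) + H((1, 2)) - H((0, 1, 2)) - H((2,))  # I(rain;dark|cloud) = 0
mmi = i_rd - i_rd_c   # positive: clouds account for the rain-dark correlation
```

Because rain and dark are independent given cloud, the conditional term vanishes and the MMI equals the (positive) marginal mutual information between rain and dark.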

Example of negative multivariate mutual information

The case of negative MMI is infamously counterintuitive. A prototypical example of negative I(X;Y;Z) has X as the output of an XOR gate whose inputs Y and Z are independent random bits. In this case I(Y;Z) is zero, but I(Y;Z|X) is positive (1 bit), since once the output X is known, the value on input Y completely determines the value on input Z. Since I(Y;Z|X)>I(Y;Z), the result is negative MMI I(X;Y;Z). It may seem that this example relies on a peculiar ordering of X,Y,Z, but the symmetry of the definition of I(X;Y;Z) shows that the same negative value results regardless of which variable we take as the interloper or conditioning variable. For example, input Y and output X are also independent until input Z is fixed, at which point they are totally dependent.
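The XOR example can be checked directly; a short sketch (tuple positions are (X, Y, Z), and the variable names are illustrative):

```python
from math import log2

# X = Y XOR Z with Y, Z independent uniform bits.
joint = {(y ^ z, y, z): 0.25 for y in (0, 1) for z in (0, 1)}

def H(idx):
    """Entropy of the marginal on the given variable positions."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

i_yz = H((1,)) + H((2,)) - H((1, 2))                     # I(Y;Z) = 0
i_yz_x = H((0, 1)) + H((0, 2)) - H((0, 1, 2)) - H((0,))  # I(Y;Z|X) = 1 bit
print(i_yz - i_yz_x)  # I(X;Y;Z) = -1.0
```

Here knowing X creates exactly one bit of dependence between the otherwise independent inputs, so the MMI is -1 bit.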

This situation is an instance where fixing the common effect X of causes Y and Z induces a dependency among the causes that did not formerly exist. This behavior is colloquially referred to as explaining away and is thoroughly discussed in the Bayesian Network literature (e.g., Pearl 1988). Pearl's example is auto diagnostics: A car's engine can fail to start (X) due either to a dead battery (Y) or due to a blocked fuel pump (Z). Ordinarily, we assume that battery death and fuel pump blockage are independent events, because of the essential modularity of such automotive systems. Thus, in the absence of other information, knowing whether or not the battery is dead gives us no information about whether or not the fuel pump is blocked. However, if we happen to know that the car fails to start (i.e., we fix common effect X), this information induces a dependency between the two causes battery death and fuel blockage. Thus, knowing that the car fails to start, if an inspection shows the battery to be in good health, we conclude the fuel pump is blocked.

Battery death and fuel blockage are thus dependent, conditional on their common effect car starting. The obvious directionality in the common-effect graph belies a deep informational symmetry: If conditioning on a common effect increases the dependency between its two parent causes, then conditioning on one of the causes must create the same increase in dependency between the second cause and the common effect. In Pearl's automotive example, if conditioning on car starts induces I(X;Y;Z) bits of dependency between the two causes battery dead and fuel blocked, then conditioning on fuel blocked must induce I(X;Y;Z) bits of dependency between battery dead and car starts. This may seem odd because battery dead and car starts are governed by the implication battery dead \rightarrow car doesn't start. However, these variables are still not totally correlated because the converse is not true. Conditioning on fuel blocked removes the major alternate cause of failure to start, and strengthens the converse relation and therefore the association between battery dead and car starts.

Bounds

The bounds for the three-variable case are

-\min\{ I(X;Y|Z),\ I(Y;Z|X),\ I(X;Z|Y) \} \leq I(X;Y;Z) \leq \min\{ I(X;Y),\ I(Y;Z),\ I(X;Z) \}
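These bounds can be spot-checked numerically; the sketch below (all names illustrative) draws a random joint distribution over three binary variables and verifies both inequalities:

```python
import random
from math import log2

random.seed(1)
w = [random.random() for _ in range(8)]          # unnormalized weights
total = sum(w)
joint = {(x, y, z): w[4 * x + 2 * y + z] / total
         for x in (0, 1) for y in (0, 1) for z in (0, 1)}

def H(idx):
    """Entropy of the marginal on the given variable positions."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def mi(a, b):      # I(X_a; X_b)
    return H((a,)) + H((b,)) - H((a, b))

def cmi(a, b, c):  # I(X_a; X_b | X_c)
    return H((a, c)) + H((b, c)) - H((0, 1, 2)) - H((c,))

mmi = mi(0, 1) - cmi(0, 1, 2)     # I(X;Y;Z)
lower = -min(cmi(0, 1, 2), cmi(1, 2, 0), cmi(0, 2, 1))
upper = min(mi(0, 1), mi(1, 2), mi(0, 2))
assert lower <= mmi <= upper
```

Both bounds follow from the non-negativity of (conditional) mutual information applied to each of the three symmetric ways of writing I(X;Y;Z) as a difference.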


Difficulties

A complication is that this multivariate mutual information (as well as the interaction information) can be positive, negative, or zero, which makes this quantity difficult to interpret intuitively. In fact, for n random variables, there are 2^n-1 degrees of freedom for how they might be correlated in an information-theoretic sense, corresponding to each non-empty subset of these variables. These degrees of freedom are bounded by the various inequalities in information theory.

