Conditional independence


These pictures represent the probabilities of events A, B and C by the areas shaded red, blue and green respectively, with respect to the total area. In the first example A and B are conditionally independent given C and also given not C; in the second they are conditionally independent only given C, because \Pr(A \cap B \mid \mbox{not } C) \neq \Pr(A \mid \mbox{not } C)\Pr(B \mid \mbox{not } C).

In probability theory, two events A and B are conditionally independent given a third event C precisely if the occurrence or non-occurrence of A and the occurrence or non-occurrence of B are independent events in their conditional probability distribution given C. In the standard notation of probability theory,

\Pr(A \cap B \mid C) = \Pr(A \mid C)\Pr(B \mid C),

or equivalently (provided \Pr(B \mid C) > 0),

\Pr(A \mid B \cap C) = \Pr(A \mid C).
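
A short derivation of this equivalence (assuming \Pr(B \mid C) > 0, and using only the definition of conditional probability):

    \Pr(A \mid B \cap C) = \frac{\Pr(A \cap B \cap C)}{\Pr(B \cap C)} = \frac{\Pr(A \cap B \mid C)\,\Pr(C)}{\Pr(B \mid C)\,\Pr(C)} = \frac{\Pr(A \mid C)\Pr(B \mid C)}{\Pr(B \mid C)} = \Pr(A \mid C).

Conversely, multiplying \Pr(A \mid B \cap C) = \Pr(A \mid C) by \Pr(B \mid C) and using \Pr(A \mid B \cap C)\Pr(B \mid C) = \Pr(A \cap B \mid C) recovers the product form.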

Two random variables X and Y are conditionally independent given an event C if they are independent in their conditional probability distribution given C. Two random variables X and Y are conditionally independent given a third random variable W if, for any measurable set S of possible values of W, X and Y are conditionally independent given the event [W \in S].
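
For discrete random variables this definition can be checked directly from the joint probability table: X and Y are conditionally independent given W exactly when \Pr(X = x, Y = y \mid W = w) = \Pr(X = x \mid W = w)\Pr(Y = y \mid W = w) for every (x, y, w) with \Pr(W = w) > 0. A minimal Python sketch of such a check (illustrative only, not part of the article; the table layout p[x, y, w] is an assumption):

    import numpy as np

    def conditionally_independent(p, tol=1e-12):
        # p[x, y, w] holds Pr(X = x, Y = y, W = w); axes 0, 1, 2 index X, Y, W.
        p = np.asarray(p, dtype=float)
        for w in range(p.shape[2]):
            pw = p[:, :, w].sum()                    # Pr(W = w)
            if pw <= tol:
                continue                             # nothing to check on a null event
            joint = p[:, :, w] / pw                  # Pr(X = x, Y = y | W = w)
            px = joint.sum(axis=1, keepdims=True)    # Pr(X = x | W = w)
            py = joint.sum(axis=0, keepdims=True)    # Pr(Y = y | W = w)
            if not np.allclose(joint, px * py, atol=1e-9):
                return False
        return True

    # A table of the form Pr(W) Pr(X | W) Pr(Y | W) is conditionally independent
    # by construction, even though X and Y are typically dependent marginally.
    pw   = np.array([0.5, 0.5])
    px_w = np.array([[0.9, 0.1], [0.2, 0.8]])        # row w, column x
    py_w = np.array([[0.7, 0.3], [0.1, 0.9]])        # row w, column y
    p = np.einsum('w,wx,wy->xyw', pw, px_w, py_w)
    print(conditionally_independent(p))              # True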

Conditional independence of more than two events, or of more than two random variables, is defined analogously.


Uses in Bayesian statistics

Let p be the proportion of voters who will vote "yes" in an upcoming referendum. In taking an opinion poll, one chooses n voters randomly from the population. For i = 1, ..., n, let Xi = 1 or 0 according to whether the ith chosen voter will or will not vote "yes".

In a frequentist approach to statistical inference one would not attribute any probability distribution to p (unless the probabilities could be somehow interpreted as relative frequencies of occurrence of some event or as proportions of some population) and one would say that X1, ..., Xn are independent random variables.

By contrast, in a Bayesian approach to statistical inference, one would assign a probability distribution to p regardless of the non-existence of any such "frequency" interpretation, and one would construe the probabilities as degrees of belief that p is in any interval to which a probability is assigned. In that model, the random variables X1, ..., Xn are not independent, but they are conditionally independent given the value of p. In particular, if a large number of the Xs are observed to be equal to 1, that would imply a high conditional probability, given that observation, that p is near 1, and thus a high conditional probability, given that observation, that the next X to be observed will be equal to 1.
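
A small simulation sketch of this point (hypothetical, not part of the article; the uniform prior on p and the choice of six voters per poll are assumptions made for illustration): conditionally on p the draws are independent coin flips, but marginally an observed run of 1s raises the predictive probability that the next X equals 1.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical setup: a uniform prior on p, six voters per poll, many polls.
    n_polls, n_voters = 200_000, 6
    p = rng.uniform(0.0, 1.0, size=n_polls)

    # Given p, the Xi are independent Bernoulli(p) draws.
    X = rng.random((n_polls, n_voters)) < p[:, None]

    # Marginally the Xi are dependent: seeing the first five voters say "yes"
    # raises the probability that the sixth does, well above its overall rate.
    first_five_yes = X[:, :5].all(axis=1)
    print("Pr(X6 = 1)                     ~", X[:, 5].mean())
    print("Pr(X6 = 1 | X1 = ... = X5 = 1) ~", X[first_five_yes, 5].mean())
    # With a uniform prior these are about 1/2 and 6/7 (Laplace's rule of succession).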

Rules of conditional independence

A set of rules governing statements of conditional independence has been derived from the basic definition.[1][2] If we write X \perp Y \mid Z to mean that X is conditionally independent of Y given Z, then the following rules hold:

Symmetry: X \perp Y \mid Z  \implies Y \perp X \mid Z

Decomposition: Y,W \perp X  \mid Z  \implies Y \perp X \mid Z and  W \perp X \mid Z

Weak Union:  X \perp Y,W \mid Z \implies X \perp Y \mid Z,W

Contraction:  X \perp W \mid Z, Y and  X \perp Y \mid Z \implies X \perp W,Y\mid Z

If the joint probability of every configuration of X, Y, Z and W is strictly greater than zero (a strictly positive distribution), then the following rule also holds.

Intersection:  X \perp Y \mid Z, W and  X \perp W \mid Z, Y \implies X \perp Y, W \mid Z
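
As a numerical sanity check of decomposition and weak union (an illustrative sketch, not from the cited references; the variable shapes and the construction Pr(Z) Pr(X | Z) Pr(Y, W | Z) are assumptions): build a joint table in which X \perp Y,W \mid Z holds by construction, then verify the derived statements directly.

    import numpy as np

    rng = np.random.default_rng(1)

    def normalize(a, axes):
        return a / a.sum(axis=axes, keepdims=True)

    def cond_indep(q, tol=1e-9):
        # Check axis 0 independent of axis 1 given axis 2, for a three-axis joint table q.
        for c in range(q.shape[2]):
            pc = q[:, :, c].sum()
            if pc <= 1e-15:
                continue
            joint = q[:, :, c] / pc
            m0 = joint.sum(axis=1, keepdims=True)
            m1 = joint.sum(axis=0, keepdims=True)
            if not np.allclose(joint, m0 * m1, atol=tol):
                return False
        return True

    # Joint table Pr(X, Y, W, Z) = Pr(Z) Pr(X | Z) Pr(Y, W | Z), so that
    # X is independent of (Y, W) given Z by construction.  X, Y, W binary, Z ternary.
    pz    = normalize(rng.random(3), 0)              # Pr(Z)
    px_z  = normalize(rng.random((2, 3)), 0)         # Pr(X | Z), columns sum to 1
    pyw_z = normalize(rng.random((2, 2, 3)), (0, 1)) # Pr(Y, W | Z)
    p = np.einsum('z,xz,ywz->xywz', pz, px_z, pyw_z) # axes: x, y, w, z

    # Decomposition: X independent of Y given Z (marginalize out W, condition on Z alone).
    print(cond_indep(p.sum(axis=2)))                 # True

    # Weak union: X independent of Y given (Z, W) (merge the W and Z axes into one condition).
    print(cond_indep(p.reshape(2, 2, 2 * 3)))        # True

The positivity assumption in the intersection rule is genuinely needed: in the standard degenerate counterexample where X = Y = W (with Z trivial), X \perp Y \mid W and X \perp W \mid Y both hold, since conditioning on either copy determines X, yet X is clearly not independent of Y,W.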

References

  1. Dawid, A. P. (1979). "Conditional Independence in Statistical Theory". Journal of the Royal Statistical Society, Series B, 41, 1–31.
  2. Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
