Inclusion–exclusion principle

In combinatorics, the inclusion–exclusion principle (also known as the sieve principle) is an equation relating the sizes of two sets, their intersection, and their union. It states that if A and B are two finite sets, then

|A \cup B| = |A| + |B| - |A \cap B|.

In words, the number of elements in the union of the two sets is the sum of the numbers of elements in each set, minus the number of elements that lie in both. Similarly, for three sets A, B and C,

|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|.

This can be seen by counting how many times each region of a Venn diagram of the three sets is included on the right-hand side.

More generally, for finite sets A1, ..., An, one has the identity


\begin{align}
\biggl|\bigcup_{i=1}^n A_i\biggr| & {} =\sum_{i=1}^n\left|A_i\right|
-\sum_{i,j\,:\,1 \le i < j \le n}\left|A_i\cap A_j\right| \\
& {}\qquad +\sum_{i,j,k\,:\,1 \le i < j < k \le n}\left|A_i\cap A_j\cap A_k\right|-\ \cdots\ + \left(-1\right)^{n-1} \left|A_1\cap\cdots\cap A_n\right|.
\end{align}

The name comes from the idea that the principle is based on over-generous inclusion, followed by compensating exclusion. When n > 2 the exclusion of the pairwise intersections is (possibly) too severe, and the correct formula is as shown with alternating signs.
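
For concreteness, here is a minimal Python sketch (not part of the original text; the three sets are arbitrary choices) that evaluates the alternating sum over all non-empty index subsets and compares it with the size of the union computed directly.

    from itertools import combinations

    def union_size_by_inclusion_exclusion(sets):
        # Alternating sum over all non-empty index subsets I of |A_I|.
        n = len(sets)
        total = 0
        for k in range(1, n + 1):
            for idx in combinations(range(n), k):
                intersection = set.intersection(*(sets[i] for i in idx))
                total += (-1) ** (k - 1) * len(intersection)
        return total

    A = [{1, 2, 3, 4}, {3, 4, 5}, {1, 5, 6, 7}]   # hypothetical example sets
    assert union_size_by_inclusion_exclusion(A) == len(set.union(*A))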

This formula is attributed to Abraham de Moivre; it is sometimes also named for Daniel da Silva, Joseph Sylvester or Henri Poincaré.

Proof

Let A denote the union \scriptstyle \cup_{i=1}^n A_i of the sets A1, ..., An. To prove the inclusion–exclusion principle in general, we first have to verify the identity

1_A  =\sum_{k=1}^n (-1)^{k-1}\sum_{\scriptstyle I\subset\{1,\ldots,n\}\atop\scriptstyle|I|=k} 1_{A_I}\qquad(*)

for indicator functions, where

A_I = \bigcap_{i\in I} A_i.

There are at least two ways to do this:

First possibility: It suffices to do this for every x in the union of A1, ..., An. Suppose x belongs to exactly m sets with 1 ≤ m ≤ n, for simplicity of notation say A1, ..., Am. Then the identity at x reduces to

1 =\sum_{k=1}^m (-1)^{k-1}\sum_{\scriptstyle I\subset\{1,\ldots,m\}\atop\scriptstyle|I|=k} 1.

The number of subsets of cardinality k of an m-element set is the combinatorial interpretation of the binomial coefficient \textstyle\binom mk . Since \textstyle1=\binom m0 , we have

\binom m0 =\sum_{k=1}^m (-1)^{k-1}\binom mk.

Moving all terms to the left-hand side of the equation, we obtain the expansion of (1 − 1)^m given by the binomial theorem, which is zero; hence we see that (*) is true at x.
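
As a quick numerical sanity check of the binomial identity above (a sketch added here, with the range of m chosen arbitrarily):

    from math import comb

    # Check binom(m, 0) = sum_{k=1}^{m} (-1)^(k-1) * binom(m, k) for small m.
    for m in range(1, 12):
        assert comb(m, 0) == sum((-1) ** (k - 1) * comb(m, k) for k in range(1, m + 1))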

Second possibility: The following function is identically zero

(1_A-1_{A_1})(1_A-1_{A_2})\cdots(1_A-1_{A_n})\,=\,0,

because: if x is not in A, then all factors are 0 − 0 = 0; and otherwise, if x does belong to some Am, then the corresponding mth factor is 1 − 1 = 0. By expanding the product on the left-hand side, equation (*) follows.

Use of (*): To prove the inclusion–exclusion principle for the cardinality of sets, sum the equation (*) over all x in the union of A1, ..., An. To derive the version used in probability, take the expectation in (*). In general, integrate the equation (*) with respect to μ. Always use linearity.
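
For instance, the following sketch checks (*) pointwise on small, arbitrarily chosen sets: for every x in the union, the alternating sum of the indicators of the intersections A_I equals 1.

    from itertools import combinations

    A = [{1, 2, 3}, {2, 4}, {3, 4, 5}]   # assumed example sets
    n = len(A)
    for x in set.union(*A):
        alternating_sum = 0
        for k in range(1, n + 1):
            for I in combinations(range(n), k):
                A_I = set.intersection(*(A[i] for i in I))
                alternating_sum += (-1) ** (k - 1) * (1 if x in A_I else 0)
        assert alternating_sum == 1   # matches 1_A(x) for x in the union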

Alternative proof

Pick an element contained in the union of all sets and let A_1, A_2, \dots, A_t be the individual sets containing it. (Note that t > 0.) Since the element is counted precisely once by the left-hand side of the equation, we need to show that it is counted only once by the right-hand side. By the binomial theorem,

 (1-1)^t= \binom{t}{0} - \binom{t}{1} + \binom{t}{2} - \cdots + (-1)^{t}\binom{t}{t}.

Using the convention that \binom{t}{0} = 1 and rearranging terms, we have


\begin{align}
1 & = \binom{t}{1} - \binom{t}{2} + \cdots + (-1)^{t+1}\binom{t}{t}\\
  & = |\{A_i \mid 1 \leq i \leq t\}| - |\{A_i \cap A_j \mid 1 \leq i < j \leq t\}| + \cdots + (-1)^{t+1}|\{A_1 \cap A_2 \cap \cdots \cap A_t\}|,
\end{align}

and so the chosen element is indeed counted only once by the right-hand side of the proposed equation.

Example

Suppose there is a deck of n cards, each numbered from 1 to n. A card numbered m is said to be in the correct position if it is the mth card in the deck. How many orderings, W, of the cards have at least one card in the correct position?

Begin by defining the set Am of all orderings of the cards in which the mth card is in the correct position. Then the number of orderings, W, with at least one card in the correct position is

W = \biggl|\bigcup_{m=1}^nA_m\biggr|.

Apply the principle of inclusion–exclusion:


\begin{align}
W & = \sum_{m_1=1}^n \left| A_{m_1} \right|  \\
  & - \sum_{m_1,m_2\,:\,1 \le m_1 < m_2 \le n} \left|A_{m_1} \cap A_{m_2} \right|  \\
  & + \sum_{m_1,m_2,m_3\,:\,1 \le m_1 < m_2 < m_3 \le n} \left|A_{m_1} \cap A_{m_2} \cap A_{m_3} \right|  \\
  & - \cdots  \\
  & + (-1)^{p-1} \sum_{m_1,\ldots,m_p\,:\,1 \le m_1 < \cdots < m_p \le n} \left|A_{m_1} \cap \cdots \cap A_{m_p} \right|  \\
  & \cdots. \\
\end{align}

Each term \left|A_{m_1} \cap \cdots \cap A_{m_p}\right| counts the shuffles having the p values m1, ..., mp in the correct positions. Note that the number of shuffles with p values correct only depends on p, not on the particular values of m. For example, the number of shuffles having the 1st, 3rd, and 17th cards in the correct positions is the same as the number of shuffles having the 2nd, 5th, and 13th cards in the correct positions. It only matters that of the n cards, 3 were chosen to be in the correct positions. Thus there are {n \choose p} equal terms in the pth summation (see combination).


\begin{align}
W & = {n \choose 1} \left|A_1 \right|  \\
  & - {n \choose 2} \left|A_1 \cap A_2 \right|  \\
  & + {n \choose 3} \left|A_1 \cap A_2 \cap A_3 \right|  \\
  & - \cdots  \\
  & + (-1)^{p-1} {n \choose p} \left|A_1 \cap \cdots \cap A_p \right|  \\
  & \cdots. \\
\end{align}

\left|A_1 \cap \cdots \cap A_p \right| is the number of orderings having the first p cards in the correct positions, which is equal to the number of ways of ordering the remaining n − p cards, or (n − p)!. Thus we finally get:
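
A brute-force check of this count, for the arbitrarily chosen values n = 6 and p = 2:

    from itertools import permutations
    from math import factorial

    n, p = 6, 2   # assumed small values
    fixed_first_p = sum(1 for perm in permutations(range(n))
                        if all(perm[i] == i for i in range(p)))
    assert fixed_first_p == factorial(n - p)   # remaining n - p cards are free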


\begin{align}
W & = {n \choose 1} (n-1)!  \\
  & - {n \choose 2} (n-2)!  \\
  & + {n \choose 3} (n-3)!  \\
  & - \cdots   \\
  & + (-1)^{p-1} {n \choose p} (n-p)!  \\
  & \cdots \\
W & = \sum_{p=1}^n (-1)^{p-1} {n \choose p} (n-p)!. \\
\end{align}

Noting that {n \choose p} = \frac{n!}{p!(n-p)!}, this reduces to


\begin{align}
W & = \sum_{p=1}^n (-1)^{p-1}\, \frac{n!}{p!}.
\end{align}
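
For small n this closed form can be checked against a brute-force count over all permutations (a sketch, not part of the original derivation):

    from itertools import permutations
    from math import factorial

    for n in range(1, 8):
        brute = sum(1 for perm in permutations(range(n))
                    if any(perm[i] == i for i in range(n)))
        formula = sum((-1) ** (p - 1) * factorial(n) // factorial(p)
                      for p in range(1, n + 1))
        assert brute == formula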

A permutation where no card is in the correct position is called a derangement. Taking n! to be the total number of permutations, the probability Q that a random shuffle produces a derangement is given by

Q = 1 - \frac{W}{n!} = \sum_{p=0}^n \frac{(-1)^p}{p!},

which is a partial sum of the Taylor series of e^{−1}. Thus the probability of guessing an order for a shuffled deck of cards and being incorrect about every card is approximately 1/e, or about 37%.

In probability

In probability, for events A1, ..., An in a probability space \scriptstyle(\Omega,\mathcal{F},\mathbb{P}), the inclusion–exclusion principle becomes for n = 2

\mathbb{P}(A_1\cup A_2)=\mathbb{P}(A_1)+\mathbb{P}(A_2)-\mathbb{P}(A_1\cap A_2),

for n = 3

\begin{align}\mathbb{P}(A_1\cup A_2\cup A_3)&=\mathbb{P}(A_1)+\mathbb{P}(A_2)+\mathbb{P}(A_3)\\
&\qquad-\mathbb{P}(A_1\cap A_2)-\mathbb{P}(A_1\cap A_3)-\mathbb{P}(A_2\cap A_3)\\
&\qquad+\mathbb{P}(A_1\cap A_2\cap A_3)
\end{align}

and in general

\begin{align}
\mathbb{P}\biggl(\bigcup_{i=1}^n A_i\biggr) & {} =\sum_{i=1}^n \mathbb{P}(A_i)
-\sum_{i,j\,:\,i<j}\mathbb{P}(A_i\cap A_j) \\
&\qquad+\sum_{i,j,k\,:\,i<j<k}\mathbb{P}(A_i\cap A_j\cap A_k)-\ \cdots\ +(-1)^{n-1}\, \mathbb{P}\biggl(\bigcap_{i=1}^n A_i\biggr),
\end{align}

which can be written in closed form as

\mathbb{P}\biggl(\bigcup_{i=1}^n A_i\biggr)  =\sum_{k=1}^n (-1)^{k-1}\sum_{\scriptstyle I\subset\{1,\ldots,n\}\atop\scriptstyle|I|=k} \mathbb{P}(A_I),

where the last sum runs over all subsets I of the indices 1, ..., n which contain exactly k elements, and

A_I:=\bigcap_{i\in I} A_i

denotes the intersection of all those Ai with index in I.

According to the Bonferroni inequalities, truncating the sum after its first k levels gives alternately an upper bound (k odd) and a lower bound (k even) for the left-hand side. This can be used in cases where the full formula is too cumbersome.
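
A small numerical sketch of these bounds, using an assumed finite uniform sample space and arbitrarily chosen events; the partial sums obtained by truncating after k levels bound the probability of the union from above for odd k and from below for even k.

    from itertools import combinations

    N = 30                                     # assumed uniform sample space {0, ..., 29}
    events = [set(range(0, N, 2)), set(range(0, N, 3)), set(range(0, N, 5)), {1, 7, 11}]
    n = len(events)

    def P(E):
        return len(E) / N                      # uniform probability

    exact = P(set.union(*events))
    partial = 0.0
    for k in range(1, n + 1):
        level = sum(P(set.intersection(*(events[i] for i in I)))
                    for I in combinations(range(n), k))
        partial += (-1) ** (k - 1) * level
        if k % 2 == 1:
            assert partial >= exact - 1e-12    # odd truncation: upper bound
        else:
            assert partial <= exact + 1e-12    # even truncation: lower bound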

For a general measure space (S,Σ,μ) and measurable subsets A1, ..., An of finite measure, the above identities also hold when the probability measure \mathbb{P} is replaced by the measure μ.

Special case

If, in the probabilistic version of the inclusion–exclusion principle, the probability of the intersection AI only depends on the cardinality of I, meaning that for every k in {1, ..., n} there is an ak such that

a_k=\mathbb{P}(A_I)\quad\text{for every}\quad I\subset\{1,\ldots,n\}\quad\text{with}\quad |I|=k,

then the above formula simplifies to

\mathbb{P}\biggl(\bigcup_{i=1}^n A_i\biggr)  =\sum_{k=1}^n (-1)^{k-1}\binom nk a_k

due to the combinatorial interpretation of the binomial coefficient \scriptstyle\binom nk.

Similarly, when the cardinality of the union of finite sets A1, ..., An is of interest, and these sets are a family with regular intersections, meaning that for every k in {1, ..., n} the intersection

A_I:=\bigcap_{i\in I} A_i

has the same cardinality, say ak = |AI|, irrespective of the k-element subset I of {1, ..., n}, then

\biggl|\bigcup_{i=1}^n A_i\biggr|  =\sum_{k=1}^n (-1)^{k-1}\binom nk a_k.
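
As an illustration of this symmetric case (the choice of sets is an assumption made for this sketch), let A_i be the set of binary strings of length n with a 0 in position i; every k-fold intersection then has size a_k = 2^(n−k), and the formula returns 2^n − 1, the number of strings that are not all ones.

    from math import comb

    n = 6
    a = [2 ** (n - k) for k in range(n + 1)]           # a_k = |A_I| for |I| = k
    union = sum((-1) ** (k - 1) * comb(n, k) * a[k] for k in range(1, n + 1))
    assert union == 2 ** n - 1     # strings with at least one 0: all but 111...1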

An analogous simplification is possible in the case of a general measure space (S,Σ,μ) and measurable subsets A1, ..., An of finite measure.

Diluted inclusion–exclusion principle

Let A1, ..., An be arbitrary sets and p1, ..., pn real numbers in the closed unit interval [0,1]. Then, for every even number k in {0, ..., n}, the indicator functions satisfy the inequality:[1]

1_{A_1\cup\cdots\cup A_n}\ge\sum_{j=1}^k (-1)^{j-1}\sum_{1\le i_1<\cdots<i_j\le n} p_{i_1}\dots p_{i_j}\,1_{A_{i_1}\cap\cdots\cap A_{i_j}}.
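
The inequality can be spot-checked numerically. In the sketch below the sets A_i and the weights p_i are arbitrary assumptions; the right-hand side is evaluated pointwise for every even truncation level k.

    from itertools import combinations
    from math import prod

    A = [{1, 2, 3}, {2, 4, 5}, {3, 5, 6}]      # assumed sets
    p = [0.5, 0.8, 0.3]                        # assumed weights in [0, 1]
    n = len(A)
    points = set.union(*A) | {0}               # include a point outside the union

    for x in points:
        lhs = 1 if any(x in Ai for Ai in A) else 0
        for k in range(0, n + 1, 2):           # even truncation levels k
            rhs = 0.0
            for j in range(1, k + 1):
                for I in combinations(range(n), j):
                    if all(x in A[i] for i in I):
                        rhs += (-1) ** (j - 1) * prod(p[i] for i in I)
            assert lhs >= rhs - 1e-12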

Other forms

The principle is sometimes stated in the following form: if

g(A)=\sum_{S\,:\,S\subseteq A}f(S)

then

f(A)=\sum_{S\,:\,S\subseteq A}(-1)^{\left|A\right|-\left|S\right|}g(S)\qquad(**)

We now show that the combinatorial and the probabilistic versions of the inclusion–exclusion principle are instances of (**). Take \underline{m} = \{1,2,\ldots,m\}, f(\underline{m}) = 0, and

f(S)=\bigg|\bigcap_{i \in \underline{m} \backslash S} A_i \bigg\backslash \bigcup_{i \in S} A_i\bigg| \qquad\text{and}\qquad f(S)=\mathbb{P}\bigg(\bigcap_{i \in \underline{m} \backslash S} A_i \bigg\backslash \bigcup_{i \in S} A_i\bigg)

respectively, for all sets S with S \subsetneq \underline{m}. Then we obtain

g(A)=\bigg|\bigcap_{i \in \underline{m} \backslash A} A_i\bigg|,~~ g(\underline{m}) = \bigg|\bigcup_{i \in \underline{m}} A_i \bigg| \qquad\text{and}\qquad g(A)=\mathbb{P}\bigg(\bigcap_{i \in \underline{m} \backslash A} A_i\bigg),~~ g(\underline{m}) = \mathbb{P}\bigg(\bigcup_{i \in \underline{m}} A_i\bigg)

respectively, for all sets A with A \subsetneq \underline{m}. This is because an element a of \bigcap_{i \in \underline{m} \backslash A} A_i may also be contained in other A_i (those with i \in A), and as S runs through all subsets of A (as in the definition of g(A)), the \cap\backslash\!\!\cup expression runs exactly through all possible extensions of the family \{A_i \mid i \in \underline{m} \backslash A\} by other A_i, counting a only for the set S that matches the membership behavior of a.

Since f(\underline{m}) = 0, we obtain from (**) with A = \underline{m} that

\sum_{T\,:\,\underline{m} \supseteq T \supsetneq \varnothing}(-1)^{\left|T\right|-1} g(\underline{m} \backslash T) = \sum_{S\,:\,\varnothing \subseteq S \subsetneq \underline{m}}(-1)^{m-\left|S\right|-1}g(S) = g(\underline{m}),

and by interchanging sides, the combinatorial and the probabilistic versions of the inclusion–exclusion principle follow.
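
The inversion (**) itself is easy to check on a small ground set; in the sketch below the ground set and the values of f are arbitrary assumptions.

    from itertools import chain, combinations

    ground = (1, 2, 3)                      # assumed small ground set

    def subsets(A):
        return chain.from_iterable(combinations(A, r) for r in range(len(A) + 1))

    f = {S: 3 * len(S) + sum(S) for S in subsets(ground)}       # arbitrary values
    g = {A: sum(f[S] for S in subsets(A)) for A in subsets(ground)}

    for A in subsets(ground):
        recovered = sum((-1) ** (len(A) - len(S)) * g[S] for S in subsets(A))
        assert recovered == f[A]            # f(A) = sum over S in A of (-1)^(|A|-|S|) g(S)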

If one regards a number n as the set of its prime factors, then (**) is a generalization of the Möbius inversion formula for square-free natural numbers. Therefore (**) can be seen as the Möbius inversion formula for the incidence algebra of the partially ordered set of all subsets of A.

To generalize the full version of the Möbius inversion formula, (**) must be extended to multisets. For multisets instead of sets, (**) becomes

f(A)=\sum_{S\,:\,S\subseteq A}\mu(A - S)g(S)\qquad(***)

where A − S is the multiset for which (A - S) \uplus S = A, and \mu(T) = (-1)^{\left|T\right|} if T is a set (that is, a multiset without repeated elements), while \mu(T) = 0 if T has a repeated element.

Notice that \mu(A - S) is just the (-1)^{\left|A\right|-\left|S\right|} of (**) in case A − S is a set.

Proof of (***): Substitute

g(S)=\sum_{T\,:\,T\subseteq S}f(T)

on the right-hand side of (***). Notice that f(A) appears exactly once on each side of (***). So we must show that for all T with T\subsetneq A, the terms f(T) cancel out on the right-hand side of (***). For that purpose, fix such a T and fix an arbitrary element a \in A with a \not\in T.

Notice that A − S must be a set for each appearance, positive or negative, of f(T) on the right-hand side of (***), obtained by way of a multiset S with T \subseteq S \subseteq A. Now each appearance of f(T) obtained by way of an S for which A − S is a set containing a cancels with the appearance obtained by way of the corresponding S (with one extra copy of a) for which A − S is a set not containing a. This gives the desired result.

Applications

In many cases where the principle could give an exact formula (in particular, counting prime numbers using the sieve of Eratosthenes), the resulting formula is of little practical use because the number of terms it contains is excessive. Even if each term individually can be estimated accurately, the accumulation of errors may mean that the inclusion–exclusion formula isn't directly applicable. In number theory, this difficulty was addressed by Viggo Brun. After a slow start, his ideas were taken up by others, and a large variety of sieve methods developed. These may, for example, try to find upper bounds for the "sieved" sets, rather than an exact formula.

Counting derangements

A well-known application of the inclusion–exclusion principle is to the combinatorial problem of counting all derangements of a finite set. A derangement of a set A is a bijection from A onto itself that has no fixed points. Via the inclusion–exclusion principle one can show that if the cardinality of A is n, then the number of derangements is [n!/e], where [x] denotes the nearest integer to x.

This is also known as the subfactorial of n, written !n. It follows that if all bijections are assigned the same probability then the probability that a random bijection is a derangement quickly approaches 1/e as n grows.
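
A brute-force sketch (feasible for small n only) confirming that the inclusion–exclusion count of derangements matches the nearest integer to n!/e:

    from itertools import permutations
    from math import e, factorial

    for n in range(1, 9):
        derangements = sum(1 for perm in permutations(range(n))
                           if all(perm[i] != i for i in range(n)))
        assert derangements == round(factorial(n) / e)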

Counting intersections

The principle of inclusion–exclusion, combined with de Morgan's theorem, can be used to count the intersection of sets as well. Let \scriptstyle\overline{A}_k represent the complement of Ak with respect to some universal set A such that \scriptstyle A_k\, \subseteq\, A for each k. Then we have


\bigcap_{i=1}^n A_i = \overline{\bigcup_{i=1}^n \overline{A}_i}

thereby turning the problem of finding an intersection into the problem of finding a union.
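
A short sketch of this complementation trick, with an assumed universal set U and arbitrarily chosen subsets: the size of the intersection is recovered as |U| minus the inclusion–exclusion count of the union of the complements.

    from itertools import combinations

    U = set(range(12))                                  # assumed universal set
    A = [set(range(0, 12, 2)), set(range(0, 12, 3)), {0, 6, 8}]
    comp = [U - Ai for Ai in A]                         # complements within U

    n = len(A)
    union_of_complements = 0
    for k in range(1, n + 1):
        for I in combinations(range(n), k):
            union_of_complements += (-1) ** (k - 1) * len(
                set.intersection(*(comp[i] for i in I)))

    assert len(U) - union_of_complements == len(set.intersection(*A))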

Citations

  1. Fernández, Fröhlich & Sokal (1992), Proposition 12.6.
