Coupon collector's problem (generating function approach)

From Wikipedia, the free encyclopedia

The coupon collector's problem can be solved in several different ways. The generating function approach is a combinatorial technique that allows to obtain precise results.

We introduce the probability generating function (PGF) $G (z)$ where $[z q] G (z)$ is the probability that we take q steps to collect the n coupons i.e. $T = q$ and the expectation is given by

$\operatorname{E}(T) = \left. \frac{\mathrm{d}}{\mathrm{d}z} G(z) \right|_{z=1}.$

We can calculate $G (z)$ explicitly. We have

$G(z) = \frac{n}{n} z \frac{1}{1-\frac{1}{n} z} \frac{n-1}{n} z \frac{1}{1-\frac{2}{n} z} \frac{n-2}{n} z \frac{1}{1-\frac{3}{n} z} \frac{n-3}{n} z \cdots \frac{1}{1-\frac{n-1}{n} z} \frac{n-(n-1)}{n} z.$

To see what this means, note that

$\frac{1}{1 - p z} = 1 + p z + p^2 z^2 + p^3 z^3 + \cdots,$

so that this is the PGF of an event that has probability p occurring zero or more times, with the exponent of z counting the number of times. We split the sequence of coupons into segments. A new segment begins every time a new coupon is retrieved for the first time. The PGF is the product of the PGFs of the individual segments. Applying this to $G (z)$ , we see that it represents the following sequence of events:

retrieve the first coupon (no restrictions at this time)
retrieve the first coupon some number of times
retrieve the second coupon (probability $(n - 1) / n)$ )
retrieve a mix of the first and second coupons some number of times
retrieve the third coupon (probability $(n - 2) / n$ )
retrieve a mix of the first, second, and third coupons some number of times
retrieve the fourth coupon (probability $(n - 3) / n$ )
$\ldots$
retrieve the last coupon (probability $(n - (n - 1)) / n$ ).

We simplify $G (z)$ before we compute the expectation, getting

$G(z) = z^n \frac{n-1}{n-z} \frac{n-2}{n-2z} \frac{n-3}{n-3z} \cdots \frac{n-(n-1)}{n-(n-1)z}$ .

Now we use the fact that

$\frac{\mathrm{d}}{\mathrm{d}z} \frac{n-k}{n-kz} = \frac{k(n-k)}{(n-kz)^2}$

to obtain the derivative of $G (z)$

$\frac{\mathrm{d}}{\mathrm{d}z} G(z) = G(z) \left( \frac{n}{z} + \frac{1}{n-z} + \frac{2}{n-2z} + \frac{3}{n-3z} \cdots + \frac{n-1}{n-(n-1)z} \right)$ .

This yields

$\operatorname{E}(T) = \left. \frac{\mathrm{d}}{\mathrm{d}z} G(z) \right|_{z=1} = G(1) \left( n + \frac{1}{n-1} + \frac{2}{n-2} + \frac{3}{n-3} \cdots + \frac{n-1}{n-(n-1)} \right)$

$\operatorname{E}(T) = n + \sum_{k=1}^{n-1} \frac{k}{n-k}.$

Finally, some simplification:

$\sum_{k=1}^{n-1} \frac{k}{n-k} = \sum_{k=1}^{n-1} \left( \frac{k}{n-k} - \frac{n}{n-k} \right) + n H_{n-1} = n H_{n-1} - (n-1)$

so that

$\operatorname{E}(T) = n + n H_{n-1} - (n-1) = n H_{n-1} + 1 = n H_n, \quad \mbox{QED.}$

The PGF $G (z)$ makes it possible to obtain an exact value for the variance. Start with

$\operatorname{Var}(T) = \operatorname{E}(T(T-1)) + \operatorname{E}(T) - \operatorname{E}(T)^2,$

which consists entirely of factorial moments that can be calculated from the PGF. We already have the value of $\operatorname{E}(T)$ . For the remainder, use

$\operatorname{E}(T(T-1)) = \left. \left( \frac{\mathrm{d}}{\mathrm{d}z} \right)^2 G(z) \right|_{z=1}.$

The derivative is

$\begin{align} & G(z) \left( \frac{n}{z} + \frac{1}{n-z} + \frac{2}{n-2z} + \frac{3}{n-3z} \cdots + \frac{n-1}{n-(n-1)z} \right)^2 \\ + & \; G(z) \left( - \frac{n}{z^2} + \frac{1^2}{(n-z)^2} + \frac{2^2}{(n-2z)^2} + \frac{3^2}{(n-3z)^2} \cdots + \frac{(n-1)^2}{(n-(n-1)z)^2} \right) \end{align}$

Evaluation at $z = 1$ yields

$\begin{align} & n^2 H_n^2 - n + \sum_{k=1}^{n-1} \frac{k^2}{(n-k)^2} = n^2 H_n^2 - n + \sum_{k=1}^{n-1} \frac{(n-k)^2}{k^2} \\ = & \; n^2 H_n^2 - n + n^2 H_{n-1}^{(2)} - 2 n H_{n-1} + (n-1). \end{align}$

The conclusion is that

$\begin{align} \operatorname{Var}(T) & = \; n^2 H_n^2 - 1 + n^2 H_{n-1}^{(2)} - 2 n H_{n-1} + n H_{n-1} + 1 - n^2 H_n^2 \\ & = \; n^2 H_{n-1}^{(2)} - n H_{n-1} < \frac{\pi^2}{6} n^2, \quad \mbox{QED.} \end{align}$

Categories: Probability theory | Combinatorics | Articles containing proofs

Coupon collector's problem (generating function approach)

From Wikipedia, the free encyclopedia

Views

Navigation

Interaction

Search