Wikipedia:Reference desk/Science/Birthday probability question

From Wikipedia, the free encyclopedia

This came up at my conscientious objector place. We know that if there is a group of over 365 people, at least two people will have the same birthday. But if we pose the question "backwards", i.e. how large a group of people has to be so that every day of the year is a birthday of someone? The correct answer, of course, is "infinite", as there is nothing preventing, for example, everyone from being born on the same day.

But given the number of people, what is the probability of every day in the year being someone's birthday? For 1 to 364 people, it is 0, i.e. such a thing is impossible. For exactly 365 people, it is 1/(365!), i.e. 1 divided by the factorial of 365. But what is the probability for larger groups? (For simplicity, we ignore leap years.) JIP | Talk 16:38, 27 October 2005 (UTC)

Your math basis is incorrect. The probability that 365 people have distinct birthdays is 365!/365^365. (1/365! is the probability that you take 365 people with distinct birthdays and, picking them one at a time, correctly pick them in birthday order). Let's work with smaller numbers: assume a 3-sided coin (it's more interesting than a two-sided, but the numbers are small). Leaving order intact, there are 3*3*3 or 27 combinations of flips with results a, b, and c. Of these, only 3! (abc, acb, bac, bca, cab, cba) have all 3 results, for 3!/3^3 probability. For a fourth flip, there are 3^4 or 81 combinations. Every 3-ple satisfying 3 flips will work with any fourth value (that's 6*3), and every 3-ple with 2 distinct values will satisfy for one particular fourth value (that's 18*1), a grand total of 36 working combinations. For a fifth, all the working 4-tuples with any (36*3), plus all the failed 2-item 3-tuples with the needed third value (18*2*1), plus the 1-item 3-tuples with a distinct fourth value plus the needed fifth (3*2*1), a grand total of 150/243. If somebody else can come along and make this into a series... you're better with symbols than me. — Lomn | Talk / RfC 17:56, 27 October 2005 (UTC)
I've got a recursive solution, but I'm not sure how (or if it's possible) to get a closed formula. Let f(M,N) be the number of ways of distributing M distinct balls among N distinct bins, with each bin containing at least one ball; and let g(M,N) be the number of ways of distributing M distinct balls among N distinct bins, possibly having one or more bins empty. (Throughout this, consider "balls" and "bins" to always refer to distinct balls, such as individually numbered balls, and distinct bins, so I don't have to keep specifying "distinct.")
Thus, the probability you are looking for is \frac{f(M,N)}{g(M,N)}. If you want to know the probability of having every day represented in a group of 1000 people, you are looking for \frac{f(1000,365)}{g(1000,365)}.
g is easy: g(M,N) = NM.
f is the difficult one. One way to compute the number of ways of distributing M balls among N bins, with at least one ball in each bin, then subtracting the number of ways of distributing M balls among N bins, with at least one bin being empty.
Let's consider 7 balls placed into 4 bins. The number of ways of distributing them, with no restriction, is g(7,4) = 47 = 16384. Now, we need to subtract the number of those ways which leave one or more bins empty. Let's consider the separate cases of exactly one bin empty, exactly two bins empty, and exactly three bins empty.
The number of ways of placing 7 balls into 4 bins, leaving exactly 1 specific bin empty, is f(7,3). So the number of ways to place 7 balls into 4 bins, leaving exactly 1 (but not any specific bin) empty, is 4 \cdot f(7,3).
Similarly, to place 7 balls into 4 bins, leaving 2 specifc bins (and no others) empty, is f(7,2). Since there are 6 different ways to choose 2 bins to remain empty, the number of ways to place 7 balls in 4 bins, leaving exactly 2 (but any 2) empty is 6 \cdot f(7,2).
Likewise, there are 4 \cdot f(7,1) ways to place 7 balls in 4 bins with exactly three of them empty. Of course, f(7,1) = 1.
Thus, f(7,4)=4^7-4 \cdot f(7,3)-6 \cdot f(7,2)-4 \cdot f(7,1)
To generalize:
f(M,1) = 1
f(M,N)=N^M-\sum_{i=1}^{N-1} {N \choose i} f(M,i) for N > 1, where {N \choose i} = \frac{N!}{i!(N-i)!}
It's easier to see with an example; let's look at 7 people and 4 distinct birthdates, so we're looking for \frac{f(7,4)}{g(7,4)}.
g(7,4) = 47 = 16384
f(7,1) = 1
f(7,2)=2^7-{2 \choose 1}f(7,1)=128-2 \cdot 1=126
f(7,3)=3^7-{3 \choose 2}f(7,2)-{3 \choose 1}f(7,1)=2187-3 \cdot 126-3 \cdot 1=1806
f(7,4)=4^7-{4 \choose 3}f(7,3)-{4 \choose 2}f(7,2)-{4 \choose 1}f(7,1)=16384-4 \cdot 1806-6 \cdot 126-4 \cdot 1=8400
Thus, the probability we are looking for is \frac{8400}{16384}\approx0.51
Chuck 20:09, 27 October 2005 (UTC)

[edit] A non-recursive solution

A naive application of the even-odd rule gives

P(m,n) = \sum_{k=0}^{n} (-1)^k {n \choose k} \left( \frac{n-k}{n} \right)^m

where P(m,n) is the probability of m people having all of n possible birthdays. At least for P(4,7) this formula gives the same answer as above, 525/1024 = 8400/16384, so I'm fairly confident it's right. (Note that the k=0 term is always 1, and the k=n term always 0 unless m=0.) I'd test it on larger numbers if I had a calculator that could evaluate binomial coefficients directly. —Ilmari Karonen 23:54, 27 October 2005 (UTC)

To explain that formula a little: {n \choose k} is obviously the number of ways to choose k days out of n, while \left( \frac{n-k}{n} \right)^m is the probability that no birthday falls on any of k given days out of n. The even-odd rule then says that -\sum_{k=1}^{n} (-1)^k {n \choose k} \left( \frac{n-k}{n} \right)^m is the probability of some day being no-one's birthday. Subtracting that from 1 gives P(m,n) above. —Ilmari Karonen 00:10, 28 October 2005 (UTC)

I went ahead and wrote a Perl program to calculate this. The core function is:

 sub p {
     my ($m, $n) = @_;
     my $p = Math::BigRat->new(1);
     my $c = Math::BigRat->new(1);
     for my $k (1 .. $n) {
         $c *= ($n+1-$k);
         $c /= $k;
         $p += Math::BigRat->new($n-$k."/".$n)**$m * $c * ($k%2?-1:1);
     }
     return $p;
 }

The use of Math::BigRat is necessary, since otherwise one gets roundoff errors pretty quickly. I've done some optimization to avoid repeated calculations, but it's still quite slow. I'm running the calculation for P(365,365) while I'm writing this... —Ilmari Karonen 01:15, 28 October 2005 (UTC)

Just so you know, P(365,365) ≈ 1.4549552156187×10-157. Took me just under an hour to compute. —Ilmari Karonen 02:28, 28 October 2005 (UTC)

Overnight, I realized that one needs to extend the sum to k=n (and note that 00=1) in order to get sensible answers for m=0. Admittedly this is a somewhat academic issue, since the m=0 case is trivial anyway, and for all other values of m that term is zero. I've nonetheless made the change to the formula and code above.

With the modification above, there's a really nice proof based on the construction of Pascal's triangle that P(0,n) = δ0n. I'm still trying to figure out a similar proof for m>0. —Ilmari Karonen 10:02, 28 October 2005 (UTC)

I just realized that the computation can be made much faster by taking the common factor 1/nm out of the loop. This (and a few other minor optimizations) gives the following improved routine:
 sub p {
     my ($m, $n) = @_;
     my $p = Math::BigRat->new(0);
     my $c = Math::BigRat->new($n);
     for my $k (1 .. $n) {
         $c *= ($n-$k);
         $c /= $k;
         $p += Math::BigRat->new($n-$k)**($m-1) * $c * ($k%2?-1:1);
     }
     return 1 + $p / Math::BigRat->new($n)**$m;
 }
With this code computing P(365,365) only takes about 15 seconds! I now feel stupid for not figuring this out sooner. —Ilmari Karonen 16:01, 28 October 2005 (UTC)

Maybe you should wait for a draft before you start being so objectionable. :-) Nelson Ricardo 03:59, 28 October 2005 (UTC)