Negative binomial distribution

From Wikipedia, the free encyclopedia

Negative binomial
Probability mass function
Cumulative distribution function
Parameters	$r > 0$ (real) $p \in (0,1)$ (real)
Support	$k \in \{0,1,2,\ldots\}\!$
Probability mass function (pmf)	$\frac{\Gamma(r+k)}{k!\,\Gamma(r)}\,p^r\,(1-p)^k \!$
Cumulative distribution function (cdf)	$I_p(r,k+1)\!$
Mean	$\frac{r(1-p)}{p}\!$
Median
Mode	$\lfloor(r-1)\,(1-p)/p\rfloor$ if $r > 1$; $0$ if $r \leq 1$
Variance	$r\,\frac{1-p}{p^2}\!$
Skewness	$\frac{2-p}{\sqrt{r\,(1-p)}}\!$
Excess Kurtosis	$\frac{6}{r} + \frac{p^2}{r\,(1-p)}\!$
Entropy
mgf	$\left(\frac{p}{1-(1-p) e^t}\right)^r \!$
Char. func.	$\left(\frac{p}{1-(1-p) e^{i\,t}}\right)^r \!$

In probability and statistics, the negative binomial distribution is a discrete probability distribution. The Pascal distribution (after Blaise Pascal) is a special case of the negative binomial. There is a convention among engineers, climatologists, and others to reserve "negative binomial" for the case of an integer-valued parameter r, and to use "Polya" (for George Pólya) for the real-valued case summarized in the table above. The Polya distribution models occurrences of "contagious" discrete events, such as tornado outbreaks, more accurately than the Poisson distribution does.

1 Specification of the negative binomial distribution
- 1.1 Probability mass function
- 1.2 Cumulative distribution function
2 Occurrence
- 2.1 Waiting time in a Bernoulli process
- 2.2 Overdispersed Poisson
3 Related distributions
4 Properties
- 4.1 Relation to other distributions
- 4.2 Relation to the binomial theorem
5 Examples

Specification of the negative binomial distribution

Probability mass function

The family of negative binomial distributions is a two-parameter family; several parameterizations are in common use. One very common parameterization employs two real-valued parameters p and r with 0 < p < 1 and r > 0. Under this parameterization, the probability mass function of a random variable with a NegBin(r, p) distribution takes the following form:

$f(k;r,p) = \frac{\Gamma(r+k)}{k!\;\Gamma(r)} \; p^r \, (1-p)^k \!$

for k = 0,1,2,... (Γ is the gamma function).
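The mass function above can be evaluated directly. The following is a minimal sketch in Python using only the standard library; the function name `nb_pmf` is illustrative and not taken from any particular package. It works in log space, on the assumption that large values of k or r would otherwise overflow the gamma function.

```python
import math

def nb_pmf(k: int, r: float, p: float) -> float:
    """P(K = k) for NegBin(r, p): Gamma(r+k) / (k! Gamma(r)) * p^r * (1-p)^k."""
    if k < 0:
        return 0.0
    # log of Gamma(r+k) / (k! Gamma(r)); lgamma(k+1) is log(k!)
    log_coef = math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))

# A non-integer r is allowed. The masses should sum to (nearly) 1,
# and the mean should be close to r(1-p)/p.
r, p = 2.5, 0.4
total = sum(nb_pmf(k, r, p) for k in range(500))
mean = sum(k * nb_pmf(k, r, p) for k in range(500))
print(total, mean, r * (1 - p) / p)
```

Truncating the sums at 500 terms is an arbitrary choice that is ample for these parameter values, since the terms decay geometrically.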

Under an alternative parameterization, let

$p = \frac{\omega}{\lambda+\omega} \!$ and $r = \omega, \!$

and so the mass function becomes

$g(k) = \frac{\lambda^k}{k!} \times \frac{\Gamma(\omega+k)}{\Gamma(\omega)\;(\lambda+\omega)^k} \times \frac{1}{\left(1+\frac{\lambda}{\omega}\right)^{\omega}} \!$

where λ and ω are positive real parameters. Under this parameterization, we have

$\lim_{\omega\to\infty} g(k) = \frac{\lambda^k}{k!} \times 1 \times \frac{1}{\exp(\lambda)} \!$

which is precisely the mass function of a Poisson-distributed random variable with Poisson rate λ. In other words, the alternatively parameterized negative binomial distribution converges to the Poisson distribution and ω controls the deviation from the Poisson. This makes the negative binomial distribution suitable as a robust alternative to the Poisson, which approaches the Poisson for large ω, but which has larger variance than the Poisson for small ω.
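The convergence can be observed numerically. The sketch below, which assumes Python and uses illustrative function names, evaluates g(k) in the (λ, ω) parameterization and reports the largest pointwise gap to the Poisson mass function for increasing ω. It also prints the variance, which under this parameterization works out to r(1 − p)/p² = λ + λ²/ω.

```python
import math

def nb_pmf_lambda_omega(k, lam, omega):
    """g(k) with p = omega / (lam + omega) and r = omega, computed in log space."""
    log_g = (
        k * math.log(lam) - math.lgamma(k + 1)
        + math.lgamma(omega + k) - math.lgamma(omega)
        - k * math.log(lam + omega)
        - omega * math.log1p(lam / omega)
    )
    return math.exp(log_g)

def poisson_pmf(k, lam):
    """Poisson mass function lam^k exp(-lam) / k!."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def max_gap(lam, omega, k_max=60):
    """Largest absolute difference between the two mass functions for k <= k_max."""
    return max(
        abs(nb_pmf_lambda_omega(k, lam, omega) - poisson_pmf(k, lam))
        for k in range(k_max + 1)
    )

lam = 3.0
for omega in (1.0, 10.0, 100.0, 10_000.0):
    variance = lam + lam ** 2 / omega
    print(f"omega={omega:>8}: max gap={max_gap(lam, omega):.2e}, variance={variance:.4f}")
```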

The negative binomial distribution also arises as a continuous mixture of Poisson distributions where the mixing distribution of the Poisson rate is a gamma distribution. Formally, this means that the mass function of the negative binomial distribution can also be written as

$\begin{aligned}
f(k) &= \int_0^{\infty} \mathrm{Poisson}(k \mid \lambda) \times \mathrm{Gamma}(\lambda \mid r, (1-p)/p) \; \mathrm{d}\lambda \\
&= \int_0^{\infty} \frac{\lambda^k}{k!} \exp(-\lambda) \times \frac{\lambda^{r-1} \exp(-\lambda p/(1-p))}{\Gamma(r)\;((1-p)/p)^r} \; \mathrm{d}\lambda \\
&= \frac{1}{k!\;\Gamma(r)} \; p^r \; \frac{1}{(1-p)^r} \; \int_0^{\infty} \lambda^{(r+k)-1} \, \exp(-\lambda/(1-p)) \;\mathrm{d}\lambda \\
&= \frac{1}{k!\;\Gamma(r)} \; p^r \; \frac{1}{(1-p)^r} \; (1-p)^{r+k} \; \Gamma(r+k) \\
&= \frac{\Gamma(r+k)}{k!\;\Gamma(r)} \; p^r \, (1-p)^k.
\end{aligned}$

Because of this, the negative binomial distribution is also known as the gamma-Poisson (mixture) distribution.
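The mixture identity can be checked by approximating the integral numerically. The sketch below assumes Python and a plain midpoint rule; the function names, the truncation point `upper`, and the step count are illustrative choices, not part of the derivation. The gamma distribution is taken to have shape r and scale (1 − p)/p, matching the density written out above.

```python
import math

def nb_pmf(k, r, p):
    """Closed-form NegBin(r, p) mass function."""
    log_coef = math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))

def mixture_pmf(k, r, p, upper=80.0, steps=100_000):
    """Midpoint-rule approximation of the Poisson-gamma mixture integral.

    The integrand decays like exp(-lam / (1 - p)), so truncating at
    `upper` is an assumption that holds for moderate k + r.
    """
    theta = (1.0 - p) / p              # gamma scale parameter
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h            # midpoint of the i-th subinterval
        log_poisson = k * math.log(lam) - lam - math.lgamma(k + 1)
        log_gamma = ((r - 1.0) * math.log(lam) - lam / theta
                     - math.lgamma(r) - r * math.log(theta))
        total += math.exp(log_poisson + log_gamma)
    return total * h

r, p = 2.5, 0.4
for k in range(4):
    print(k, mixture_pmf(k, r, p), nb_pmf(k, r, p))
```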

Cumulative distribution function

The cumulative distribution function can be expressed in terms of the regularized incomplete beta function:

$F(k) = I_{p}(r, k+1). \!$

Occurrence

Waiting time in a Bernoulli process

The NegBin(r, p) distribution is the probability distribution of a certain number of failures and successes in a series of independent and identically distributed Bernoulli trials. Specifically, for k+r Bernoulli trials with success probability p, the negative binomial gives the probability of k failures and r successes, with success on the last trial. In other words, the negative binomial distribution is the probability distribution of the number of failures before the rth success in a Bernoulli process, with probability p of success on each trial.

Consider the following example. Suppose we repeatedly throw a die, and consider a "1" to be a "success". The probability of success on each trial is 1/6. The number of trials needed to get three successes belongs to the infinite set { 3, 4, 5, 6, ... }. That number of trials is a (displaced) negative-binomially distributed random variable. The number of failures before the third success belongs to the infinite set { 0, 1, 2, 3, ... }. That number of failures is also a negative-binomially distributed random variable.
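The die example can be simulated. The following is a rough sketch in Python; the function name, the seed, and the sample size are arbitrary choices made for illustration. It counts failures before the third "1" and compares the sample mean with the theoretical mean r(1 − p)/p = 15; adding r = 3 gives the mean number of trials.

```python
import random

def failures_before_rth_success(r, p, rng):
    """Run Bernoulli(p) trials until the r-th success; return the failure count."""
    successes = 0
    failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

rng = random.Random(0)                 # fixed seed so the run is repeatable
r, p = 3, 1 / 6                        # three "1"s on a fair die
samples = [failures_before_rth_success(r, p, rng) for _ in range(20_000)]

mean_failures = sum(samples) / len(samples)
mean_trials = mean_failures + r        # the displaced variable: trials = failures + r
print(mean_failures, mean_trials)
```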

A Bernoulli process is a discrete time process, and so the number of trials, failures, and successes are integers. For the special case where r is an integer, the negative binomial distribution is known as the Pascal distribution. In this case the gamma function is not needed to express the probability mass function, and factorials or binomial coefficients can be used instead:

$f(k) = \frac{(k+r-1)!}{k!\;(r-1)!} \; p^r \, (1-p)^k = {k+r-1 \choose r-1} \; p^r \, (1-p)^k \!$

A further specialization occurs when r = 1: in this case we get the probability distribution of failures before the first success (i.e. the probability of success on the (k+1)th trial), which is a geometric distribution. To wit:

$f(k) = {k+1-1 \choose 1-1} \; p^1 \, (1-p)^k = p \, (1-p)^k \!$
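For integer r, the binomial-coefficient form and the gamma-function form should agree, and r = 1 should reduce to the geometric mass function. A short check, sketched in Python with illustrative function names (it assumes Python 3.8 or later for `math.comb`):

```python
import math

def pascal_pmf(k: int, r: int, p: float) -> float:
    """Binomial-coefficient form, valid for positive integer r."""
    return math.comb(k + r - 1, r - 1) * p ** r * (1 - p) ** k

def gamma_form_pmf(k: int, r: float, p: float) -> float:
    """Gamma-function form, valid for any real r > 0."""
    log_coef = math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))

p = 0.3
# Largest relative disagreement between the two forms over a small grid.
worst = max(
    abs(pascal_pmf(k, r, p) - gamma_form_pmf(k, r, p)) / pascal_pmf(k, r, p)
    for r in range(1, 6)
    for k in range(0, 21)
)
print(worst)

# r = 1 gives the geometric distribution p (1 - p)^k.
geometric_ok = all(
    math.isclose(pascal_pmf(k, 1, p), p * (1 - p) ** k) for k in range(21)
)
print(geometric_ok)
```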

Overdispersed Poisson

The negative binomial distribution, especially in its alternative parameterization described above, can be used as an alternative to the Poisson distribution. It is especially useful for discrete data over an unbounded positive range whose sample variance exceeds the sample mean. If a Poisson distribution is used to model such data, the model mean and variance are equal. In that case, the observations are overdispersed with respect to the Poisson model. Since the negative binomial distribution has one more parameter than the Poisson, the second parameter can be used to adjust the variance independently of the mean.
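One simple way to use the extra parameter is to match the first two sample moments. This method-of-moments fit is not described in the text above and is offered only as an illustration; the function name and the count data are hypothetical. Since the mean is r(1 − p)/p and the variance is r(1 − p)/p², equating them to the sample mean m and sample variance v gives p = m/v and r = m²/(v − m), which requires v > m.

```python
from statistics import fmean, variance

def fit_negbin_moments(counts):
    """Method-of-moments estimates (r, p); requires sample variance > sample mean."""
    m = fmean(counts)
    v = variance(counts)               # unbiased sample variance
    if v <= m:
        raise ValueError("data are not overdispersed relative to the Poisson")
    p = m / v
    r = m * m / (v - m)
    return r, p

# Hypothetical overdispersed counts, invented for this sketch.
counts = [0, 1, 0, 3, 7, 2, 0, 12, 1, 4]
r_hat, p_hat = fit_negbin_moments(counts)

# The fitted distribution reproduces the sample mean and variance.
fitted_mean = r_hat * (1 - p_hat) / p_hat
fitted_var = r_hat * (1 - p_hat) / p_hat ** 2
print(r_hat, p_hat, fitted_mean, fitted_var)
```

Maximum likelihood is often preferred in practice; the moment fit is shown here only because it follows directly from the mean and variance formulas.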

Related distributions

The geometric distribution is a special case of the negative binomial distribution, with

$\mathrm{Geometric}(p) = \mathrm{NegBin}(1, p).\,$

The negative binomial distribution converges to the Poisson distribution in the following sense:

$\mathrm{Poisson}(\lambda) = \lim_{r \to \infty} \mathrm{NegBin}(r, r/(\lambda+r)).\,$

Properties

Relation to other distributions

If X_r is a random variable following the negative binomial distribution with parameters r and p, where r is a positive integer, then X_r is a sum of r independent variables following the geometric distribution with parameter p. By the central limit theorem, X_r is therefore approximately normal for sufficiently large r.

Furthermore, if Y_{s+r} is a random variable following the binomial distribution with parameters s + r and p, then

$\begin{aligned}
\Pr(X_r \leq s) &= I_p(r, s+1) \\
&= 1 - I_{1-p}(s+1, r) \\
&= 1 - I_{1-p}((s+r)-(r-1), (r-1)+1) \\
&= 1 - \Pr(Y_{s+r} \leq r-1) \\
&= \Pr(Y_{s+r} \geq r) \\
&= \Pr(\text{after } s+r \text{ trials, there are at least } r \text{ successes}).
\end{aligned}$
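The first and last lines of this chain can be compared numerically for integer r without evaluating the incomplete beta function. The sketch below assumes Python 3.8 or later; both function names are illustrative. It sums the Pascal mass function on one side and the binomial upper tail on the other.

```python
import math

def nb_cdf(s: int, r: int, p: float) -> float:
    """Pr(X_r <= s): at most s failures before the r-th success (integer r)."""
    return sum(
        math.comb(k + r - 1, r - 1) * p ** r * (1 - p) ** k
        for k in range(s + 1)
    )

def binom_at_least(n: int, r: int, p: float) -> float:
    """Pr(Y_n >= r) for Y_n ~ Binomial(n, p)."""
    return sum(
        math.comb(n, j) * p ** j * (1 - p) ** (n - j)
        for j in range(r, n + 1)
    )

p = 0.35
# Largest disagreement between the two sides over a small grid of (r, s).
worst = max(
    abs(nb_cdf(s, r, p) - binom_at_least(s + r, r, p))
    for r in range(1, 7)
    for s in range(0, 16)
)
print(worst)
```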

In this sense, the negative binomial distribution is the "inverse" of the binomial distribution.

The sum of independent negative-binomially distributed random variables with the same value of the parameter p but the "r-values" r₁ and r₂ is negative-binomially distributed with the same p but with "r-value" r₁ + r₂.
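This additivity can be checked by convolving two mass functions directly. A small sketch in Python, with illustrative function names and arbitrarily chosen non-integer "r-values":

```python
import math

def nb_pmf(k, r, p):
    """NegBin(r, p) mass function for real r > 0."""
    log_coef = math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))

def sum_pmf(k, r1, r2, p):
    """Pr(X1 + X2 = k) for independent X1 ~ NegBin(r1, p), X2 ~ NegBin(r2, p)."""
    return sum(nb_pmf(j, r1, p) * nb_pmf(k - j, r2, p) for j in range(k + 1))

r1, r2, p = 1.5, 2.25, 0.3
# Largest gap between the convolution and NegBin(r1 + r2, p).
worst = max(
    abs(sum_pmf(k, r1, r2, p) - nb_pmf(k, r1 + r2, p)) for k in range(30)
)
print(worst)
```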

The negative binomial distribution is infinitely divisible, i.e., if X has a negative binomial distribution, then for any positive integer n, there exist independent identically distributed random variables X₁, ..., X_n whose sum has the same distribution that X has. Under the real-valued parameterization above, each of these is NegBin(r/n, p); they are Pascal-distributed (that is, have an integer "r-value") only if n is a divisor of r (more on this below).

Relation to the binomial theorem

Suppose X is a random variable with a negative binomial distribution with parameters r and p, and write q = 1 − p. The statement that the sum from x = 0 to infinity of the probability Pr[X = x] is equal to 1 can be shown by a bit of algebra to be equivalent to the statement that (1 − q)^{−r} is what Newton's binomial theorem says it should be.

Suppose Y is a random variable with a binomial distribution with parameters n and p. The statement that the sum from y = 0 to n, of the probability Pr[Y = y], is equal to 1, says that 1 = (p + (1 − p))ⁿ is what the strictly finitary binomial theorem of rudimentary algebra says it should be.

Thus the negative binomial distribution bears the same relationship to the negative-integer-exponent case of the binomial theorem that the binomial distribution bears to the positive-integer-exponent case.

Assume p + q = 1. Then the binomial theorem of elementary algebra implies that

$1=1^n=(p+q)^n=\sum_{x=0}^n {n \choose x} p^x q^{n-x}.$

This can be written in a way that may at first appear to some to be incorrect, and perhaps perverse even if correct:

$(p+q)^n=\sum_{x=0}^\infty {n \choose x} p^x q^{n-x},$

in which the upper bound of summation is infinite. If the binomial coefficient is defined by

${n \choose x}={n! \over x!(n-x)!}$

then it does not make sense when x > n, since factorials of negative numbers are not defined. But one may also read it as

${n \choose x}={n(n-1)(n-2)\cdots(n-x+1) \over x!}.$

In that case it is defined even when n is negative or is not an integer. But in our case of the binomial distribution it is zero when x > n. So why would we write the result in that form, with a seemingly needless sum of infinitely many zeros? The answer comes when we generalize the binomial theorem of elementary algebra to Newton's binomial theorem. Then we can say, for example

$(p+q)^{8.3}=\sum_{x=0}^\infty {8.3 \choose x} p^x q^{8.3-x}.$

Now suppose r > 0 and we use a negative exponent:

$1=p^r p^{-r}=p^r (1-q)^{-r}=p^r\sum_{x=0}^\infty {-r \choose x} (-q)^x.$

Then all of the terms are positive, and the term

$p^r {-r \choose x} (-q)^x$

is just the probability that the number of failures before the rth success is equal to x, provided r is an integer. (If r is a negative non-integer, so that the exponent is a positive non-integer, then some of the terms in the sum above are negative, so we do not have a probability distribution on the set of all nonnegative integers.)

Now we also allow non-integer values of r. Then we have a proper negative binomial distribution, which is a generalization of the Pascal distribution, which coincides with the Pascal distribution when r happens to be a positive integer.
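The correspondence between the Newton series term and the mass function can be checked numerically. The sketch below assumes Python; `gen_binom` is an illustrative name for the generalized binomial coefficient a(a − 1)···(a − x + 1)/x!, which is defined for any real a.

```python
import math

def gen_binom(a: float, x: int) -> float:
    """Generalized binomial coefficient a(a-1)...(a-x+1) / x! for integer x >= 0."""
    result = 1.0
    for i in range(x):
        result *= (a - i) / (i + 1)
    return result

def newton_term(x: int, r: float, p: float) -> float:
    """The term p^r * C(-r, x) * (-q)^x with q = 1 - p."""
    q = 1.0 - p
    return p ** r * gen_binom(-r, x) * (-q) ** x

def nb_pmf(k, r, p):
    """Gamma-function form of the NegBin(r, p) mass function."""
    log_coef = math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))

r, p = 2.5, 0.4                        # a non-integer r
terms = [newton_term(x, r, p) for x in range(40)]
matches = all(
    math.isclose(t, nb_pmf(x, r, p), rel_tol=1e-9) for x, t in enumerate(terms)
)
print(matches, all(t > 0 for t in terms))

# For a nonnegative integer upper index the coefficient vanishes once x > n.
print(gen_binom(5, 2), gen_binom(5, 7))
```

The two sign factors, one from the coefficient with negative upper index and one from (−q)^x, cancel, which is why every term is positive.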

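Recall from above that the sum of independent negative-binomially distributed random variables with the same value of the parameter p but the "r-values" r₁ and r₂ is negative-binomially distributed with the same p but with "r-value" r₁ + r₂.

This property persists when the definition is thus generalized, and affords a quick way to see that the negative binomial distribution is infinitely divisible.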

Examples

(After a problem by Dr. Diane Evans, professor of mathematics at Rose-Hulman Institute of Technology)

Pat is required to sell candy bars to raise money for the 6th grade field trip. There are thirty houses in the neighborhood, and Pat is not supposed to return home until five candy bars have been sold. So the child goes door to door, selling candy bars. At each house, there is a 0.4 probability of selling one candy bar and a 0.6 probability of selling nothing.

What's the probability mass function for selling the last candy bar at the nth house?

Recall that the NegBin(r, p) distribution describes the probability of k failures and r successes in k+r Bernoulli(p) trials with success on the last trial. Selling five candy bars means getting five successes. The number of trials (i.e. houses) this takes is therefore k+5 = n. The random variable we are interested in is the number of houses, so we substitute k = n − 5 into a NegBin(5, 0.4) mass function and obtain the following mass function of the distribution of houses (for n ≥ 5):

$f(n) = {(n-5) + 5 - 1 \choose 5-1} \; 0.4^5 \; 0.6^{n-5} = {n-1 \choose 4} \; 2^5 \; \frac{3^{n-5}}{5^n}$

What's the probability that Pat finishes on the tenth house?

$f(10) = 0.1003290624 \,$

What's the probability that Pat finishes on or before reaching the eighth house?

To finish on or before the eighth house, Pat must finish at the fifth, sixth, seventh, or eighth house. Sum those probabilities:

$f(5) = 0.01024 \,$

$f(6) = 0.03072 \,$

$f(7) = 0.055296 \,$

$f(8) = 0.0774144 \,$

$\sum_{j=5}^8 f(j) = 0.1736704$

What's the probability that Pat exhausts all 30 houses in the neighborhood without selling five candy bars?

$1-\sum_{j=5}^{30} f(j) = 1 - I_{0.4}(5, 30-5+1) \approx 1 - 0.99849 = 0.00151$
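The three answers above can be reproduced with a few lines of code. This is a minimal sketch in Python (3.8 or later for `math.comb`); the function name `houses_pmf` is illustrative, and the final probability is obtained by direct summation rather than through the incomplete beta function.

```python
import math

def houses_pmf(n: int, r: int = 5, p: float = 0.4) -> float:
    """Probability that the r-th sale happens exactly at house n."""
    if n < r:
        return 0.0
    return math.comb(n - 1, r - 1) * p ** r * (1 - p) ** (n - r)

# Finishing exactly at the tenth house.
p_tenth = houses_pmf(10)

# Finishing at or before the eighth house.
p_by_eighth = sum(houses_pmf(n) for n in range(5, 9))

# Visiting all 30 houses without reaching five sales.
p_not_done_by_30 = 1.0 - sum(houses_pmf(n) for n in range(5, 31))

print(p_tenth, p_by_eighth, p_not_done_by_30)
```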
