Geometric distribution

[Infobox plots: probability mass function and cumulative distribution function]

Both variants are tabulated below: X, the number of trials (support {1, 2, 3, ...}), and Y, the number of failures (support {0, 1, 2, 3, ...}).

Parameters: 0 < p \leq 1, success probability (real), for both variants
Support: X: k trials, k \in \{1,2,3,\dots\}; Y: k failures, k \in \{0,1,2,3,\dots\}
Probability mass function (pmf): X: (1-p)^{k-1}\,p; Y: (1-p)^k\,p
Cumulative distribution function (CDF): X: 1-(1-p)^k; Y: 1-(1-p)^{k+1}
Mean: X: \frac{1}{p}; Y: \frac{1-p}{p}
Median: X: \left\lceil \frac{-1}{\log_2(1-p)} \right\rceil; Y: \left\lceil \frac{-1}{\log_2(1-p)} \right\rceil - 1 (in either case, not unique if -1/\log_2(1-p) is an integer)
Mode: X: 1; Y: 0
Variance: \frac{1-p}{p^2} for both variants
Skewness: \frac{2-p}{\sqrt{1-p}} for both variants
Excess kurtosis: 6+\frac{p^2}{1-p} for both variants
Entropy: \tfrac{-(1-p)\log_2 (1-p) - p \log_2 p}{p} for both variants
Moment-generating function (mgf): X: \frac{pe^t}{1-(1-p) e^t}, for t < -\ln(1-p); Y: \frac{p}{1-(1-p)e^t}
Characteristic function: X: \frac{pe^{it}}{1-(1-p)\,e^{it}}; Y: \frac{p}{1-(1-p)\,e^{it}}

In probability theory and statistics, the geometric distribution is either of two discrete probability distributions:

  * the probability distribution of the number X of Bernoulli trials needed to get one success, supported on the set {1, 2, 3, ...};
  * the probability distribution of the number Y = X − 1 of failures before the first success, supported on the set {0, 1, 2, 3, ...}.

Which of these one calls "the" geometric distribution is a matter of convention and convenience.

These two different geometric distributions should not be confused with each other. Often, the name shifted geometric distribution is adopted for the former (the distribution of the number X); however, to avoid ambiguity, it is wise to indicate which is intended by mentioning the support explicitly.

The geometric distribution gives the probability that the first occurrence of success requires k independent trials, each with success probability p. If the probability of success on each trial is p, then the probability that the k-th trial is the first success is

\Pr(X = k) = (1-p)^{k-1}\,p\,

for k = 1, 2, 3, ....

The above form of the geometric distribution is used for modeling the number of trials up to and including the first success. By contrast, the following form is used for modeling the number of failures before the first success:

\Pr(Y=k) = (1 - p)^k\,p\,

for k = 0, 1, 2, 3, ....

In either case, the sequence of probabilities is a geometric sequence.

For example, suppose an ordinary die is thrown repeatedly until the first time a "1" appears. The probability distribution of the number of times it is thrown is supported on the infinite set { 1, 2, 3, ... } and is a geometric distribution with p = 1/6.
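To make the two conventions concrete, here is a minimal Python sketch (the function names are our own, purely illustrative) that evaluates both pmfs for this die example:

def pmf_trials(k, p):
    # Pr(X = k): first success on the k-th trial, k = 1, 2, 3, ...
    return (1 - p) ** (k - 1) * p

def pmf_failures(k, p):
    # Pr(Y = k): k failures before the first success, k = 0, 1, 2, ...
    return (1 - p) ** k * p

p = 1 / 6  # probability of rolling a "1"
print([round(pmf_trials(k, p), 4) for k in range(1, 5)])
# [0.1667, 0.1389, 0.1157, 0.0965] -- each term is 5/6 of the previous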

Moments and cumulants

The expected value of a geometrically distributed random variable X is 1/p and the variance is (1 − p)/p^2:

\mathrm{E}(X) = \frac{1}{p},
 \qquad\mathrm{var}(X) = \frac{1-p}{p^2}.

Similarly, the expected value of the geometrically distributed random variable Y = X − 1 (where Y corresponds to the failures variant tabulated above) is q/p = (1 − p)/p, and its variance is (1 − p)/p^2:

\mathrm{E}(Y) = \frac{1-p}{p},
 \qquad\mathrm{var}(Y) = \frac{1-p}{p^2}.

Let μ = (1 − p)/p be the expected value of Y. Then the cumulants \kappa_n of the probability distribution of Y satisfy the recursion

\kappa_{n+1} = \mu(\mu+1) \frac{d\kappa_n}{d\mu}.
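As a check, one step of the recursion recovers the variance: starting from \kappa_1 = \mu and noting that \mu + 1 = 1/p,

\kappa_2 = \mu(\mu+1) \frac{d\kappa_1}{d\mu} = \mu(\mu+1) = \frac{1-p}{p}\cdot\frac{1}{p} = \frac{1-p}{p^2},

in agreement with the variance of Y given above.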

Outline of proof: That the expected value is (1 − p)/p can be shown in the following way. Let Y be as above. Then


\begin{align}
\mathrm{E}(Y) & {} =\sum_{k=0}^\infty (1-p)^k p\cdot k \\
& {} =p\sum_{k=0}^\infty(1-p)^k k \\
& {} = p (1-p) \left[\frac{d}{dp}\left(-\sum_{k=0}^\infty (1-p)^k\right)\right] \\
& {} =-p(1-p)\frac{d}{dp}\frac{1}{p}=\frac{1-p}{p}.
\end{align}

(The interchange of summation and differentiation is justified by the fact that convergent power series converge uniformly on compact subsets of the set of points where they converge.)
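These moments can also be sanity-checked numerically. The following is a rough Monte Carlo sketch (not part of the derivation; function and variable names are ours):

import random

def sample_trials(p):
    # Simulate X: count Bernoulli(p) trials until the first success.
    k = 1
    while random.random() >= p:
        k += 1
    return k

p, n = 0.25, 200_000
xs = [sample_trials(p) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(mean, 1 / p)            # both close to 4.0
print(var, (1 - p) / p ** 2)  # both close to 12.0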

Parameter estimation

For both variants of the geometric distribution, the parameter p can be estimated by equating the expected value with the sample mean. This is the method of moments, which in this case happens to yield maximum likelihood estimates of p.

Specifically, for the first variant let k = k_1, ..., k_n be a sample where k_i ≥ 1 for i = 1, ..., n. Then p can be estimated as

\widehat{p} = \left(\frac1n \sum_{i=1}^n k_i\right)^{-1} = \frac{n}{\sum_{i=1}^n k_i }. \!
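In code this estimator is a one-liner; a minimal sketch (illustrative names):

def estimate_p_trials(ks):
    # Method-of-moments / ML estimate of p from trial counts k_i >= 1.
    return len(ks) / sum(ks)

print(estimate_p_trials([1, 2, 2, 5]))  # 4 / 10 = 0.4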

In Bayesian inference, the Beta distribution is the conjugate prior distribution for the parameter p. If this parameter is given a Beta(α, β) prior, then the posterior distribution is

p \sim \mathrm{Beta}\left(\alpha+n,\ \beta+\sum_{i=1}^n (k_i-1)\right). \!

The posterior mean E[p] approaches the maximum likelihood estimate \widehat{p} as α and β approach zero.

In the alternative case, let k_1, ..., k_n be a sample where k_i ≥ 0 for i = 1, ..., n. Then p can be estimated as

\widehat{p} = \left(1 + \frac1n \sum_{i=1}^n k_i\right)^{-1} = \frac{n}{\sum_{i=1}^n k_i + n}. \!

The posterior distribution of p given a Beta(α, β) prior is

p \sim \mathrm{Beta}\left(\alpha+n,\ \beta+\sum_{i=1}^n k_i\right). \!

Again the posterior mean E[p] approaches the maximum likelihood estimate \widehat{p} as α and β approach zero.
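A sketch of both conjugate updates, assuming a Beta(alpha, beta) prior on p (function and variable names are ours):

def posterior_trials(alpha, beta, ks):
    # Trials variant: each k_i >= 1 counts trials up to the first success.
    return alpha + len(ks), beta + sum(k - 1 for k in ks)

def posterior_failures(alpha, beta, ks):
    # Failures variant: each k_i >= 0 counts failures before the first success.
    return alpha + len(ks), beta + sum(ks)

a, b = posterior_trials(1.0, 1.0, [1, 2, 2, 5])
print(a, b, a / (a + b))  # posterior mean E[p] = 5/12 ~ 0.417 (MLE was 0.4)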

Other properties

The probability generating functions of X and Y are, respectively,

\begin{align}
G_X(s) & = \frac{s\,p}{1-s\,(1-p)}, \\[10pt]
G_Y(s) & = \frac{p}{1-s\,(1-p)}, \quad |s| < (1-p)^{-1}.
\end{align}
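Moments follow by differentiating the generating function at s = 1; for instance, for X,

\mathrm{E}(X) = G_X'(1) = \left.\frac{p}{\bigl(1-s\,(1-p)\bigr)^2}\right|_{s=1} = \frac{p}{p^2} = \frac{1}{p}.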
The decimal digits of the geometrically distributed random variable Y form a sequence of independent (and not identically distributed) random variables. For example, the hundreds digit D has this probability distribution:

\Pr(D=d) = {q^{100d} \over 1 + q^{100} + q^{200} + \cdots + q^{900}},

where q = 1 − p, and similarly for the other digits, and, more generally, similarly for numeral systems with bases other than 10. When the base is 2, this shows that a geometrically distributed random variable can be written as a sum of independent random variables whose probability distributions are indecomposable.


\{(p-1) \Pr (k)+\Pr (k+1)=0,\Pr (0)=p\}
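This recurrence gives a simple iterative way to tabulate the pmf; a minimal sketch:

def pmf_failures_table(p, kmax):
    # Tabulate Pr(Y = 0..kmax) via Pr(Y=k+1) = (1-p) Pr(Y=k), Pr(Y=0) = p.
    probs = [p]
    for _ in range(kmax):
        probs.append(probs[-1] * (1 - p))
    return probs

print(pmf_failures_table(0.5, 4))  # [0.5, 0.25, 0.125, 0.0625, 0.03125]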

Related distributions

  * If Y_1, ..., Y_r are independent geometrically distributed random variables with parameter p, then the sum

    Z = \sum_{m=1}^r Y_m

    follows a negative binomial distribution with parameters r and p.[1]

  * If Y_1, ..., Y_r are independent geometrically distributed random variables (possibly with different success parameters p_m), then their minimum

    W = \min_{m \in \{1, \dots, r\}} Y_m

    is also geometrically distributed, with parameter p = 1-\prod_m(1-p_{m}).

  * Suppose 0 < r < 1, and for k = 1, 2, 3, ... the random variable X_k has a Poisson distribution with expected value r^k/k. Then

    \sum_{k=1}^\infty k\,X_k

    has a geometric distribution taking values in the set {0, 1, 2, ...}, with expected value r/(1 − r).

  * The exponential distribution is the continuous analogue of the geometric distribution: if X is an exponentially distributed random variable with parameter λ, then

    Y = \lfloor X \rfloor,

    where \lfloor \cdot \rfloor is the floor (or greatest integer) function, is a geometrically distributed random variable with parameter p = 1 − e^{−λ} (thus λ = −ln(1 − p)[2]) taking values in the set {0, 1, 2, ...}. This can be used to generate geometrically distributed pseudorandom numbers by first generating exponentially distributed pseudorandom numbers from a uniform pseudorandom number generator: \lfloor \ln(U) / \ln(1-p)\rfloor is geometrically distributed with parameter p if U is uniformly distributed in [0,1], as sketched below.
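A minimal sketch of this generator (our own function name; standard library only):

import math
import random

def geometric_failures(p):
    # Inverse-transform sampling: floor(ln(U) / ln(1-p)) with U uniform.
    u = 1.0 - random.random()  # in (0, 1], avoids log(0)
    return math.floor(math.log(u) / math.log(1.0 - p))

draws = [geometric_failures(0.3) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to (1 - 0.3)/0.3 ~ 2.33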


The connection with the exponential distribution can also be seen directly from the survival function: with p = 1/n,

\Pr(X>a) = (1-p)^a = \left(1-\frac{1}{n}\right)^{n\cdot\frac{a}{n}} = \left[\left(1-\frac{1}{n}\right)^n\right]^{\frac{a}{n}} \xrightarrow[n\to\infty]{} \left(e^{-1}\right)^{\frac{a}{n}} = e^{-\frac{a}{n}},

which is the survival function of an exponential distribution with rate 1/n.

References

  1. Pitman, Jim (1993). Probability. New York: Springer. p. 372.
  2. Wolfram Alpha, inversion of p = 1 − e^(−λ): http://www.wolframalpha.com/input/?i=inverse+p+%3D+1+-+e^-l
