Yule-Simon distribution

Yule-Simon
Probability mass function
Yule-Simon PMF on a log-log scale. (Note that the function is only defined at integer values of k. The connecting lines do not indicate continuity.)
Cumulative distribution function
Yule-Simon CDF. (Note that the function is only defined at integer values of k. The connecting lines do not indicate continuity.)
Parameters: \rho > 0 (shape, real)
Support: k \in \{1, 2, \dots\}
Probability mass function (pmf): \rho\,\mathrm{B}(k, \rho+1)
Cumulative distribution function (cdf): 1 - k\,\mathrm{B}(k, \rho+1)
Mean: \frac{\rho}{\rho-1} for \rho > 1
Median
Mode: 1
Variance: \frac{\rho^2}{(\rho-1)^2\,(\rho-2)} for \rho > 2
Skewness: \frac{(\rho+1)^2\,\sqrt{\rho-2}}{(\rho-3)\,\rho} for \rho > 3
Excess kurtosis: \rho+3+\frac{11\rho^3-49\rho-22}{(\rho-4)\,(\rho-3)\,\rho} for \rho > 4
Entropy
Moment-generating function (mgf): \frac{\rho}{\rho+1}\,{}_2F_1(1,1; \rho+2; e^t)\,e^t
Characteristic function: \frac{\rho}{\rho+1}\,{}_2F_1(1,1; \rho+2; e^{i\,t})\,e^{i\,t}

In probability and statistics, the Yule-Simon distribution is a discrete probability distribution named after Udny Yule and Herbert Simon. Simon originally called it the Yule distribution.

The probability mass function of the Yule-Simon(ρ) distribution is

f(k;\rho) = \rho\,\mathrm{B}(k, \rho+1),

for integer k \geq 1 and real ρ > 0, where B is the beta function. Equivalently, the pmf can be written in terms of the falling factorial as


f(k;\rho) = \frac{\rho\,\Gamma(\rho+1)}{(k+\rho)^{\underline{\rho+1}}},

where Γ is the gamma function. Thus, if ρ is an integer,


f(k;\rho) = \frac{\rho\,\rho!\,(k-1)!}{(k+\rho)!}.
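As an illustration, the formulas above can be evaluated numerically. The following is a minimal Python sketch, not a reference implementation; the function names (log_beta, yule_simon_pmf, yule_simon_pmf_integer_rho) are hypothetical and chosen only for this example. It computes the beta function through the log-gamma function, which is assumed here as a convenient way to avoid overflow for large k.

```python
from math import exp, factorial, lgamma


def log_beta(a: float, b: float) -> float:
    """Return log B(a, b), computed from log-gamma to avoid overflow."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def yule_simon_pmf(k: int, rho: float) -> float:
    """f(k; rho) = rho * B(k, rho + 1) for integer k >= 1 and real rho > 0."""
    if k < 1 or rho <= 0:
        raise ValueError("need integer k >= 1 and rho > 0")
    return rho * exp(log_beta(k, rho + 1.0))


def yule_simon_pmf_integer_rho(k: int, rho: int) -> float:
    """Closed form rho * rho! * (k-1)! / (k+rho)!, valid for positive integer rho."""
    if k < 1 or rho < 1:
        raise ValueError("need integers k >= 1 and rho >= 1")
    return rho * factorial(rho) * factorial(k - 1) / factorial(k + rho)
```

For integer ρ the two functions should agree up to floating-point rounding, and the partial sums of yule_simon_pmf over k = 1, ..., n should match 1 - n B(n, ρ+1).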

The probability mass function f has the property that for sufficiently large k we have


f(k;\rho) \approx \frac{\rho\,\Gamma(\rho+1)}{k^{\rho+1}} \propto \frac{1}{k^{\rho+1}}.

This means that the tail of the Yule-Simon distribution is a realization of Zipf's law: f(k;ρ) can be used to model, for example, the relative frequency of the kth most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of k.
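The quality of the power-law approximation can be checked numerically. The sketch below (Python, with hypothetical function names introduced only for this example) computes the ratio of the exact pmf to the asymptotic form ρ Γ(ρ+1) / k^(ρ+1); under the approximation stated above, this ratio should approach 1 as k grows.

```python
from math import exp, lgamma, log


def yule_simon_pmf(k: int, rho: float) -> float:
    """Exact pmf rho * B(k, rho + 1), evaluated through log-gamma."""
    return rho * exp(lgamma(k) + lgamma(rho + 1.0) - lgamma(k + rho + 1.0))


def zipf_tail(k: int, rho: float) -> float:
    """Asymptotic form rho * Gamma(rho + 1) / k**(rho + 1)."""
    return rho * exp(lgamma(rho + 1.0) - (rho + 1.0) * log(k))


def tail_ratio(k: int, rho: float) -> float:
    """Ratio exact / asymptotic; values near 1 mean the power law fits well."""
    return yule_simon_pmf(k, rho) / zipf_tail(k, rho)
```

For ρ = 1 the ratio simplifies to k / (k + 1), so the approximation underestimates nothing and overestimates the pmf by a factor that shrinks like 1/k.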

Occurrence

The Yule-Simon distribution arises as a continuous mixture of geometric distributions. Specifically, assume that W follows an exponential distribution with rate ρ (equivalently, scale 1/ρ), with density h:

W \sim \mathrm{Exponential}(\rho)
h(w;\rho) = \rho \, \exp(-\rho\,w)

Then, conditional on W, a Yule-Simon distributed variable K follows a geometric distribution with success probability exp(-W):

K \sim \mathrm{Geometric}(\exp(-W))

The pmf of a geometric distribution is

g(k; p) = p\,(1-p)^{k-1}

for k\in\{1,2,\dots\}. The Yule-Simon pmf is then the following exponential-geometric mixture distribution:

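f(k;\rho) = \int_0^{\infty} g(k;\exp(-w))\,h(w;\rho)\,dw

Substituting u = exp(-w) turns this integral into \rho \int_0^1 u^{\rho}\,(1-u)^{k-1}\,du = \rho\,\mathrm{B}(k, \rho+1), which recovers the pmf given above.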
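The mixture representation suggests a simple two-stage way to draw Yule-Simon variates: draw W from the exponential distribution, then draw K from a geometric distribution with success probability exp(-W). The following Python sketch illustrates this under stated assumptions; the function name sample_yule_simon is hypothetical, and the geometric stage uses inverse-transform sampling, which is one possible choice rather than a prescribed method.

```python
import math
import random


def sample_yule_simon(rho: float, rng: random.Random) -> int:
    """Draw one Yule-Simon(rho) variate via the exponential-geometric mixture."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    w = rng.expovariate(rho)   # W ~ Exponential with rate rho
    p = math.exp(-w)           # success probability of the geometric stage
    if p >= 1.0:
        return 1               # w rounded to 0: success on the first trial
    if p == 0.0:
        # exp(-w) underflowed; K would be too large to represent (sketch limitation)
        raise OverflowError("success probability underflowed to zero")
    u = 1.0 - rng.random()     # uniform on (0, 1]
    # Inverse transform for the geometric distribution on {1, 2, ...}
    return math.floor(math.log(u) / math.log1p(-p)) + 1
```

As a sanity check, for ρ = 3 the fraction of draws equal to 1 should be close to ρ/(ρ+1) = 0.75 and the sample mean close to ρ/(ρ-1) = 1.5.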

Generalizations

The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule-Simon(ρ, α) distribution is defined as


f(k;\rho,\alpha) = \frac{\rho}{1-\alpha^{\rho}}\,\mathrm{B}_{1-\alpha}(k, \rho+1),

with 0 \leq \alpha < 1. For α = 0 the ordinary Yule-Simon(ρ) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.
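The generalized pmf can be evaluated numerically once the incomplete beta function B_x(a, b) = \int_0^x t^{a-1} (1-t)^{b-1} dt is available. The Python sketch below approximates that integral with a composite Simpson rule; this quadrature, and the function names simpson, incomplete_beta and generalized_yule_simon_pmf, are assumptions made for the example. A library routine such as SciPy's regularized scipy.special.betainc could be used instead.

```python
def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def incomplete_beta(x: float, a: float, b: float) -> float:
    """Approximate B_x(a, b), the integral of t**(a-1) * (1-t)**(b-1) over [0, x]."""
    return simpson(lambda t: t ** (a - 1) * (1.0 - t) ** (b - 1), 0.0, x)


def generalized_yule_simon_pmf(k: int, rho: float, alpha: float) -> float:
    """f(k; rho, alpha) = rho / (1 - alpha**rho) * B_{1-alpha}(k, rho + 1)."""
    if k < 1 or rho <= 0 or not (0.0 <= alpha < 1.0):
        raise ValueError("need k >= 1, rho > 0 and 0 <= alpha < 1")
    return rho / (1.0 - alpha ** rho) * incomplete_beta(1.0 - alpha, k, rho + 1.0)
```

With α = 0 the function should reproduce the ordinary Yule-Simon pmf, and for α > 0 the probabilities over k = 1, 2, ... should still sum to 1, with terms decaying roughly like (1-α)^k for large k.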

Plot of the Yule-Simon(1) distribution (red) and its asymptotic Zipf law (blue)

References

  • Herbert A. Simon, On a Class of Skew Distribution Functions, Biometrika 42(3/4): 425–440, December 1955.
  • Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica. New York: Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule distribution".)