Yule-Simon distribution


Yule-Simon
Probability mass function
Figure: Yule-Simon PMF on a log-log scale. (The function is defined only at integer values of k; the connecting lines do not indicate continuity.)
Cumulative distribution function
Figure: Yule-Simon CDF. (The function is defined only at integer values of k; the connecting lines do not indicate continuity.)
Parameters: \rho > 0 (shape, real)
Support: k \in \{1, 2, \dots\}
Probability mass function (pmf): \rho\,\mathrm{B}(k, \rho+1)
Cumulative distribution function (cdf): 1 - k\,\mathrm{B}(k, \rho+1)
Mean: \frac{\rho}{\rho-1} for \rho > 1
Median
Mode: 1
Variance: \frac{\rho^2}{(\rho-1)^2\,(\rho-2)} for \rho > 2
Skewness: \frac{(\rho+1)^2\,\sqrt{\rho-2}}{(\rho-3)\,\rho} for \rho > 3
Excess kurtosis: \rho+3+\frac{11\rho^3-49\rho-22}{(\rho-4)\,(\rho-3)\,\rho} for \rho > 4
Entropy
Moment-generating function (mgf): \frac{\rho}{\rho+1}\,{}_2F_1(1,1; \rho+2; e^t)\,e^t
Characteristic function: \frac{\rho}{\rho+1}\,{}_2F_1(1,1; \rho+2; e^{i\,t})\,e^{i\,t}

In probability and statistics, the Yule-Simon distribution is a discrete probability distribution named after Udny Yule and Herbert Simon. Simon originally called it the Yule distribution.

The probability mass function of the Yule-Simon(ρ) distribution is

f(k;\rho) = \rho\,\mathrm{B}(k, \rho+1), \,

for integer k \geq 1 and real ρ > 0, where B is the beta function. Equivalently, the pmf can be written in terms of the falling factorial (generalized to a non-integer exponent through the gamma function) as

f(k;\rho) = \frac{\rho\,\Gamma(\rho+1)}{(k+\rho)^{\underline{\rho+1}}}  , \,

where Γ is the gamma function. Thus, if ρ is an integer,

f(k;\rho) = \frac{\rho\,\rho!\,(k-1)!}{(k+\rho)!}  . \,
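The beta-function form and the factorial form above can be checked against each other numerically. The following is a minimal sketch using only the Python standard library; the function names yule_simon_pmf and yule_simon_pmf_integer_rho are illustrative and not taken from any library. The beta function is evaluated through log-gamma so that large k does not overflow.

```python
import math


def yule_simon_pmf(k: int, rho: float) -> float:
    """f(k; rho) = rho * B(k, rho + 1), with B computed via log-gamma."""
    if k < 1 or rho <= 0:
        raise ValueError("need integer k >= 1 and rho > 0")
    # log B(k, rho+1) = lgamma(k) + lgamma(rho+1) - lgamma(k+rho+1)
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


def yule_simon_pmf_integer_rho(k: int, rho: int) -> float:
    """Closed form rho * rho! * (k-1)! / (k+rho)!, valid for integer rho >= 1."""
    if k < 1 or rho < 1:
        raise ValueError("need integers k >= 1 and rho >= 1")
    numerator = rho * math.factorial(rho) * math.factorial(k - 1)
    return numerator / math.factorial(k + rho)


if __name__ == "__main__":
    # Compare the two forms for a few integer values of rho.
    for rho in (1, 2, 5):
        for k in (1, 2, 10, 50):
            a = yule_simon_pmf(k, rho)
            b = yule_simon_pmf_integer_rho(k, rho)
            print(rho, k, a, b)
```

For non-integer ρ only the first function applies; the second is a consistency check for the integer case.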

The probability mass function f has the property that for sufficiently large k we have

f(k;\rho)  \approx \frac{\rho\,\Gamma(\rho+1)}{k^{\rho+1}}  \propto \frac{1}{k^{\rho+1}}  . \,
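To see how quickly the power-law approximation takes hold, one can compute the ratio of the exact pmf to the asymptotic expression. This is a small sketch with the standard library only; tail_ratio and power_law_tail are hypothetical helper names introduced here for illustration.

```python
import math


def yule_simon_pmf(k: int, rho: float) -> float:
    """Exact pmf rho * B(k, rho + 1), evaluated via log-gamma."""
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


def power_law_tail(k: int, rho: float) -> float:
    """Asymptotic form rho * Gamma(rho + 1) / k**(rho + 1)."""
    return rho * math.gamma(rho + 1) / k ** (rho + 1)


def tail_ratio(k: int, rho: float) -> float:
    """Exact pmf divided by its power-law approximation; tends to 1 as k grows."""
    return yule_simon_pmf(k, rho) / power_law_tail(k, rho)


if __name__ == "__main__":
    for k in (10, 100, 1000, 10000):
        print(k, tail_ratio(k, 1.0), tail_ratio(k, 2.5))
```

For ρ = 1 the exact pmf is 1/(k(k+1)), so the ratio equals k/(k+1), which makes the rate of convergence explicit in that case.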

This means that the tail of the Yule-Simon distribution is a realization of Zipf's law: f(k;ρ) can be used to model, for example, the relative frequency of the kth most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of k.

Occurrence

The Yule-Simon distribution arises as a continuous mixture of geometric distributions. Specifically, assume that W follows an exponential distribution with scale 1/ρ (equivalently, rate ρ):

W \sim \mathrm{Exponential}(\rho)\,
h(w;\rho) = \rho \, \exp(-\rho\,w)\,

Then, conditional on W, a Yule-Simon distributed variable K has the following geometric distribution:

K \sim \mathrm{Geometric}(\exp(-W))\,

The pmf of a geometric distribution is

g(k; p) = p  \, (1-p)^{k-1}\,

for k\in\{1,2,\dots\}. The Yule-Simon pmf is then the following exponential-geometric mixture distribution:

f(k;\rho)  = \int_0^{\infty} \,\,\, g(k;\exp(-w))\,h(w;\rho)\,dw \,
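Substituting u = exp(-w) turns the integral above into ρ ∫_0^1 u^ρ (1-u)^{k-1} du = ρ B(k, ρ+1), which recovers the pmf. The mixture also gives a direct way to draw samples: draw W, then draw K from the geometric distribution with success probability exp(-W). The following is a minimal sketch of that two-step procedure using the Python standard library; the function name sample_yule_simon and the inversion method used for the geometric draw are choices made here for illustration, not something prescribed by the text.

```python
import math
import random


def sample_yule_simon(rho: float, rng: random.Random) -> int:
    """Draw one Yule-Simon(rho) variate via the exponential-geometric mixture."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    w = rng.expovariate(rho)      # W ~ Exponential with rate rho
    p = math.exp(-w)              # success probability of the geometric step
    if p >= 1.0:
        return 1                  # w == 0: success on the first trial
    if p == 0.0:
        # Assumption of this sketch: give up when exp(-w) underflows
        # (only plausible for very small rho).
        raise OverflowError("success probability underflowed to zero")
    # Geometric on {1, 2, ...} by inversion: P(K > k) = (1 - p)**k.
    u = 1.0 - rng.random()        # uniform on (0, 1]
    k = math.ceil(math.log(u) / math.log1p(-p))
    return max(1, k)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_yule_simon(3.0, rng) for _ in range(100_000)]
    # The empirical frequency of K = 1 should be close to rho / (rho + 1).
    print(draws.count(1) / len(draws))
```

Because f(1;ρ) = ρ B(1, ρ+1) = ρ/(ρ+1), the frequency of the value 1 in a large sample gives a quick sanity check on the sampler.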

Generalizations

Simon also hinted at a two-parameter generalization of the Yule-Simon distribution, in which the beta function is replaced by an incomplete beta function. The probability mass function of the generalized Yule-Simon(ρ, α) distribution is defined as

f(k;\rho,\alpha) = \frac{\rho}{1-\alpha^{\rho}} \;         \mathrm{B}_{1-\alpha}(k, \rho+1)  ,  \,

with 0 \leq \alpha < 1. For α = 0 the ordinary Yule-Simon(ρ) distribution is obtained as a special case.
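The generalized pmf can be evaluated numerically from the definition B_x(a, b) = ∫_0^x t^{a-1} (1-t)^{b-1} dt of the (unregularized) incomplete beta function. The sketch below uses a plain composite Simpson rule from scratch so that it needs only the standard library; this quadrature is an assumption made for illustration and is crude (a library routine for the incomplete beta function would normally be preferable, especially for non-integer ρ with α close to 0). The names incomplete_beta and generalized_yule_simon_pmf are hypothetical.

```python
import math


def incomplete_beta(x: float, a: float, b: float, n: int = 2000) -> float:
    """Unregularized B_x(a, b) by composite Simpson; assumes a >= 1, b >= 1."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals

    def integrand(t: float) -> float:
        return t ** (a - 1) * (1.0 - t) ** (b - 1)

    h = x / n
    total = integrand(0.0) + integrand(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def generalized_yule_simon_pmf(k: int, rho: float, alpha: float) -> float:
    """f(k; rho, alpha) = rho / (1 - alpha**rho) * B_{1-alpha}(k, rho + 1)."""
    if k < 1 or rho <= 0 or not (0.0 <= alpha < 1.0):
        raise ValueError("need k >= 1, rho > 0 and 0 <= alpha < 1")
    norm = rho / (1.0 - alpha ** rho)
    return norm * incomplete_beta(1.0 - alpha, k, rho + 1)


if __name__ == "__main__":
    # alpha = 0 should reproduce the ordinary Yule-Simon pmf rho * B(k, rho + 1).
    for k in (1, 2, 3):
        print(k, generalized_yule_simon_pmf(k, 2.0, 0.0))
    # For alpha > 0 the probabilities should still sum to (approximately) one.
    print(sum(generalized_yule_simon_pmf(k, 1.5, 0.5) for k in range(1, 80)))
```

The normalizing constant follows from summing the integrand over k: the geometric series gives ∫_0^{1-α} (1-t)^{ρ-1} dt = (1 - α^ρ)/ρ.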

Figure: Plot of the Yule-Simon(1) distribution (red) and its asymptotic Zipf law (blue).

References

  • Herbert A. Simon, On a Class of Skew Distribution Functions, Biometrika 42(3/4): 425–440, December 1955.
  • Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica. New York: Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule distribution".)