Hermite distribution

Hermite
Probability mass function

[Plot of the probability mass function. The horizontal axis is the index k, the number of occurrences. The function is defined only at integer values of k; the connecting lines are guides for the eye.]

Cumulative distribution function

[Plot of the cumulative distribution function. The horizontal axis is the index k, the number of occurrences. The CDF is discontinuous at the integers of k and flat everywhere else, because a Hermite-distributed variable takes on only integer values.]

Notation \mathrm{Herm}(a_1,a_2)
Parameters a_1 \ge 0,\ a_2 \ge 0
Support k \in \{0, 1, 2, \ldots\}
pmf e^{-(a_1+a_2)} \sum_{j=0}^{\lfloor k/2\rfloor} \frac{a_1^{k-2j}a_2^j}{(k-2j)!\,j!}
CDF e^{-(a_1+a_2)} \sum_{i=0}^{\lfloor x\rfloor} \sum_{j=0}^{\lfloor i/2\rfloor} \frac{a_1^{i-2j}a_2^j}{(i-2j)!\,j!}
Mean a_1 + 2a_2
Variance a_1 + 4a_2
Skewness \frac{a_1 + 8a_2}{(a_1+4a_2)^{3/2}}
Ex. kurtosis \frac{a_1 + 16a_2}{(a_1+4a_2)^2}
MGF \exp(a_1(e^t-1)+a_2(e^{2t}-1))
CF \exp(a_1(e^{it}-1)+a_2(e^{2it}-1))
PGF \exp(a_1(s-1)+a_2(s^2-1))

In probability theory and statistics, the Hermite distribution, named after Charles Hermite, is a discrete probability distribution used to model count data with more than one parameter. The distribution is flexible in that it can accommodate moderate over-dispersion in the data. The Hermite distribution is a special case of the Poisson binomial distribution, when n = 2.

Kemp and Kemp[1] called it the "Hermite distribution" because its probability function and its moment generating function can be expressed in terms of the coefficients of (modified) Hermite polynomials.

History

The distribution first appeared in the paper Applications of Mathematics to Medical Problems,[2] by Anderson Gray McKendrick in 1926. In this work the author explains several mathematical methods that can be applied to medical research. In one of these methods he considered the bivariate Poisson distribution and showed that the distribution of the sum of two correlated Poisson variables follows a distribution that would later be known as the Hermite distribution.

As a practical application, McKendrick considered the distribution of counts of bacteria in leucocytes. Using the method of moments he fitted the data with the Hermite distribution and found the model more satisfactory than fitting it with a Poisson distribution.

The distribution was formally introduced and published by C. D. Kemp and Adrienne W. Kemp in 1965 in their work Some Properties of 'Hermite' Distribution. The work focuses on the properties of this distribution, for instance a necessary condition on the parameters, their maximum likelihood estimators (MLE), and the analysis of the probability generating function (PGF) and how it can be expressed in terms of the coefficients of (modified) Hermite polynomials. An example used in this publication is the distribution of counts of bacteria in leucocytes that McKendrick had studied, but Kemp and Kemp estimate the model using the maximum likelihood method.

The Hermite distribution is a special case of the discrete compound Poisson distribution with only two parameters.[3][4]

The same authors published in 1966 the paper An Alternative Derivation of the Hermite Distribution.[5] In this work they established that the Hermite distribution can also be obtained formally by combining a Poisson distribution with a normal distribution.

In 1971, Y. C. Patel[6] did a comparative study of various estimation procedures for the Hermite distribution in his doctoral thesis. It included maximum likelihood, moment estimators, mean and zero frequency estimators and the method of even points.

In 1974, Gupta and Jain[7] did a research on a generalized form of Hermite distribution.

In probabilistic number theory, due to Bekelis's work,[8] if a strongly additive function f_x(m), x \ge 1, takes only values in \{0,1,2\} on the primes p, then, under some conditions, the frequency distribution of f_x(m) converges to a Hermite distribution as x \to \infty.[9]

Definition

Probability mass function

Let X1 and X2 be two independent Poisson variables with parameters a1 and a2. The probability distribution of the random variable Y = X1 + 2X2 is the Hermite distribution with parameters a1 and a2, and its probability mass function is given by[10]

 p_n = P(Y=n) = e^{-(a_1+a_2)} \sum_{j=0}^{\lfloor n/2\rfloor} \frac{a_1^{n-2j}a_2^j}{(n-2j)!\,j!}

where \lfloor n/2\rfloor denotes the integer part of n/2.

Since the Hermite distribution is a special case of the discrete compound Poisson distribution, there are at least ten approaches to proving its probability mass function.[9]

The probability generating function of this probability mass function is,[10]

 G_Y(s) = \sum_{n=0}^\infty p_n s^n = \exp(a_1(s-1)+a_2(s^2-1))
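As a sanity check, the series for p_n can be compared with the convolution definition Y = X1 + 2X2. The sketch below (Python, standard library only; the function names are illustrative, not from any cited source) evaluates the series and checks it against the convolution of two Poisson pmfs.

```python
from math import exp, factorial

def hermite_pmf(n, a1, a2):
    """P(Y = n) for Y ~ Herm(a1, a2): the series over j = 0..floor(n/2)."""
    return exp(-(a1 + a2)) * sum(
        a1 ** (n - 2 * j) * a2 ** j / (factorial(n - 2 * j) * factorial(j))
        for j in range(n // 2 + 1)
    )

def poisson_pmf(k, lam):
    """Standard Poisson pmf, used to build Y = X1 + 2*X2 by convolution."""
    return exp(-lam) * lam ** k / factorial(k)

a1, a2 = 1.0, 0.5
for n in range(10):
    series = hermite_pmf(n, a1, a2)
    convolution = sum(poisson_pmf(n - 2 * k, a1) * poisson_pmf(k, a2)
                      for k in range(n // 2 + 1))
    assert abs(series - convolution) < 1e-12
```

The two computations agree term by term, since conditioning on X2 = k leaves X1 = n - 2k.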

Notation

When a random variable Y = X1 + 2X2 follows a Hermite distribution, where X1 and X2 are two independent Poisson variables with parameters a1 and a2, we write

Y\ \sim\ \mathrm{Herm}(a_1,a_2)\,

Properties

Moment and cumulant generating functions

The moment generating function of a random variable X is defined as the expected value of e^{tX}, as a function of the real parameter t. For a Hermite distribution with parameters a1 and a2, the moment generating function exists and is equal to

 M(t) = G (e^t) = \exp(a_1(e^t-1)+a_2(e^{2t}-1))

The cumulant generating function is the logarithm of the moment generating function and is equal to [4]

 K(t) = \log(M(t)) = a_1(e^t-1)+a_2(e^{2t}-1)

If we consider the coefficient of t^r/r! in the expansion of K(t) we obtain the r-th cumulant

 k_r = a_1 + 2^r a_2

Hence the mean and the succeeding three moments about it are

Order Moment Cumulant
1 \mu_1 = k_1 = a_1 +2a_2 \mu
2 \mu_2 = k_2 = a_1 +4a_2 \sigma^2
3 \mu_3 = k_3 = a_1 +8a_2 k_3
4 \mu_4 = k_4+3k_2^2 = a_1 +16a_2 + 3 (a_1+4a_2)^2 k_4
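The first two cumulants can be verified numerically by computing the mean and variance directly from the pmf. A small sketch (Python; the pmf helper is illustrative and repeated here so the snippet is self-contained):

```python
from math import exp, factorial

def hermite_pmf(n, a1, a2):
    """Hermite pmf: exp(-(a1+a2)) * sum_j a1^(n-2j) a2^j / ((n-2j)! j!)."""
    return exp(-(a1 + a2)) * sum(
        a1 ** (n - 2 * j) * a2 ** j / (factorial(n - 2 * j) * factorial(j))
        for j in range(n // 2 + 1))

a1, a2 = 1.2, 0.7
support = range(120)  # the tail beyond 120 is negligible for these parameters
mean = sum(n * hermite_pmf(n, a1, a2) for n in support)
var = sum((n - mean) ** 2 * hermite_pmf(n, a1, a2) for n in support)

# k_1 = a1 + 2*a2 and k_2 = a1 + 4*a2, matching the table above
assert abs(mean - (a1 + 2 * a2)) < 1e-8
assert abs(var - (a1 + 4 * a2)) < 1e-8
```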

Skewness

The skewness is the third central moment divided by the 3/2 power of the variance, and for the Hermite distribution it is,[4]

\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{(a_1+8a_2)}{(a_1+4a_2)^{3/2}}

Kurtosis

The kurtosis is the fourth central moment divided by the square of the variance, and for the Hermite distribution it is,[4]

\beta_2= \frac{\mu_4}{\mu_2^2} = \frac{a_1+16a_2+3(a_1+4a_2)^2}{(a_1+4a_2)^2} = \frac {a_1+16a_2}{(a_1+4a_2)^2}+3

The excess kurtosis is a correction that makes the kurtosis of the normal distribution equal to zero, and for the Hermite distribution it is,

\gamma_2= \frac{\mu_4}{\mu_2^2}-3 = \frac {a_1+16a_2}{(a_1+4a_2)^2}

Characteristic function

For a discrete distribution, the characteristic function of any real-valued random variable is defined as the expected value of e^{itX}, where i is the imaginary unit and t \in \mathbb{R},

\phi(t)= E[e^{itX}] = \sum_{j=0}^\infty e^{ijt}P[X=j]

This function is related to the moment generating function via \phi_X(t) = M_X(it). Hence for this distribution the characteristic function is,[1]

\phi_X(t) = \exp(a_1(e^{it}-1)+a_2(e^{2it}-1))

Cumulative distribution function

The cumulative distribution function is,[1]

 \begin{align}
F(x;a_1,a_2)& = P(X \leq x)\\
& = \exp (-(a_1+a_2)) \sum_{i=0}^{\lfloor x\rfloor} \sum_{j=0}^{\lfloor i/2\rfloor} \frac{a_1^{i-2j}a_2^j}{(i-2j)!\,j!}
\end{align}
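The double sum translates directly into code. A minimal sketch (Python; the function name is illustrative), treating x as any real number:

```python
from math import exp, factorial, floor

def hermite_cdf(x, a1, a2):
    """F(x; a1, a2): outer sum over integers i <= x, inner sum over j."""
    if x < 0:
        return 0.0
    total = 0.0
    for i in range(floor(x) + 1):
        for j in range(i // 2 + 1):
            total += a1 ** (i - 2 * j) * a2 ** j / (factorial(i - 2 * j) * factorial(j))
    return exp(-(a1 + a2)) * total
```

For integer x, F(x) - F(x - 1) recovers the pmf, and F is constant between integers, matching the jumps described in the infobox caption.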

Other properties

[Example of multi-modal data: the Hermite distribution with a1 = 0.1, a2 = 1.5.]

The index of dispersion, the ratio of the variance to the mean, is

 d = \frac{\mathrm{Var}(Y)}{E(Y)} = \frac{a_1+4a_2}{a_1+2a_2} = 1 + \frac{2a_2}{a_1+2a_2}

Since a_1, a_2 \ge 0, we always have 1 \le d \le 2, so the Hermite distribution can model equi-dispersed up to moderately over-dispersed counts.

Parameter estimation

Method of moments

The mean and the variance of the Hermite distribution are \mu = a_1+2a_2 and \sigma^2 = a_1+4a_2, respectively. Equating these to the sample moments gives the two equations,


   \begin{cases}
        \bar{x} = a_1 + 2a_2 \\
        \sigma^2 = a_1 + 4a_2
      \end{cases}

Solving these two equations we get the moment estimators \hat{a_1} and \hat{a_2} of a1 and a2.[6]

\hat{a_1} = 2 \bar{x}- \sigma^2
\hat{a_2} = \frac {\sigma^2 - \bar{x}}{2}

Since a1 and a2 are both nonnegative, the estimators \hat{a_1} and \hat{a_2} are admissible (≥ 0) only if  \bar{x} < \sigma^2 < 2 \bar{x}.
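A minimal implementation of the moment estimators, including the admissibility check (Python; the function name is illustrative):

```python
def hermite_moment_estimators(sample):
    """Return (a1_hat, a2_hat) from the sample mean and (biased) sample variance."""
    m = len(sample)
    xbar = sum(sample) / m
    s2 = sum((x - xbar) ** 2 for x in sample) / m
    if not (xbar < s2 < 2 * xbar):
        raise ValueError("moment estimators not admissible: need xbar < s2 < 2*xbar")
    return 2 * xbar - s2, (s2 - xbar) / 2

# Example: a moderately over-dispersed count sample
a1_hat, a2_hat = hermite_moment_estimators([0, 1, 1, 2, 2, 2, 3, 4, 4, 6])
```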

Maximum likelihood

Given a sample X1, ..., Xm of independent random variables each having a Hermite distribution, we wish to estimate the parameters a1 and a2. The mean and the variance of the distribution are \mu = a_1+2a_2 and \sigma^2 = a_1+4a_2, respectively. Rewriting these two equations in terms of \mu and the dispersion index d = \sigma^2/\mu,


   \begin{cases}
        a_1 = \mu (2-d) \\
        a_2 = \frac{\mu(d-1)}{2}
      \end{cases}

We can parameterize the probability function by μ and d

P(X=x)= \exp\left(-\left(\mu(2-d)+ \frac{\mu(d-1)}{2}\right)\right) \sum_{j=0}^{[x/2]} \frac{(\mu(2-d))^{x-2j}\left(\frac{\mu(d-1)}{2}\right)^j}{(x-2j)!j!}

Hence the log-likelihood function is,[11]

 \begin{align}
\mathcal{L}(x_1,\ldots,x_m;\mu,d)& = \log(L(x_1,\ldots,x_m;\mu,d))\\
& = m\mu \left(-1 + \frac{d-1}{2}\right) + \log(\mu(2-d)) \sum_{i=1}^m x_i + \sum_{i=1}^m \log(q_i(\theta))
\end{align}

where \theta = \frac{a_2}{a_1^2} = \frac{d-1}{2\mu(2-d)^2}, q_i(\theta) = \sum_{j=0}^{\lfloor x_i/2\rfloor} \frac{\theta^j x_i!}{(x_i-2j)!\,j!}, and additive terms not involving the parameters have been dropped.

From the log-likelihood function, the likelihood equations are,[11]

\frac{\partial \mathcal{L}}{\partial \mu} = m \left(-1 + \frac{d-1}{2}\right) + \frac{1}{\mu} \sum_{i=1}^m x_i - \frac{d-1}{2 \mu^2(2-d)^2} \sum_{i=1}^m \frac{q_i^{'}(\theta)}{q_i(\theta)}
 \frac{\partial \mathcal{L}}{\partial d} = m \frac{\mu}{2} - \frac{\sum_{i=1}^m x_i}{2-d} + \frac{d}{2\mu (2-d)^3} \sum_{i=1}^m \frac{q_i^{'}(\theta)}{q_i(\theta)}

Straightforward calculations show that,[11]

\sum_{i=1}^m \frac{q_i^{'}(\tilde{\theta})}{q_i(\tilde{\theta})}= m(\bar{x}(2-d))^2

where \tilde{\theta} = \frac{d-1}{2\bar{x}(2-d)^2}

The likelihood equations do not always have a solution, as the following proposition shows,

Proposition:[11] Let X1, ..., Xm come from a generalized Hermite distribution with fixed n. Then the MLEs of the parameters are \hat{\mu} and \tilde{d} if and only if m^{(2)}/\bar{x}^2 > 1, where m^{(2)} = \sum_{i=1}^m x_i(x_i-1)/m denotes the empirical factorial moment of order 2.
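The likelihood equations have no closed-form solution, so in practice the log-likelihood is maximized numerically. The sketch below (Python; a deliberately crude grid search, not the procedure of the cited references) fixes μ at the sample mean, an assumption motivated by the proposition's setting, and searches d over the interval (1, 2); a full implementation would maximize jointly over (μ, d).

```python
from math import exp, factorial, log

def hermite_log_likelihood(sample, mu, d):
    """Log-likelihood in the (mu, d) parametrization: a1 = mu(2-d), a2 = mu(d-1)/2."""
    a1, a2 = mu * (2 - d), mu * (d - 1) / 2
    ll = 0.0
    for x in sample:
        p = exp(-(a1 + a2)) * sum(
            a1 ** (x - 2 * j) * a2 ** j / (factorial(x - 2 * j) * factorial(j))
            for j in range(x // 2 + 1))
        ll += log(p)
    return ll

def hermite_mle(sample, steps=200):
    """Crude grid search over d strictly inside (1, 2), mu fixed at the sample mean."""
    xbar = sum(sample) / len(sample)
    grid = [1 + (k + 1) / (steps + 1) for k in range(steps)]
    d_hat = max(grid, key=lambda d: hermite_log_likelihood(sample, xbar, d))
    return xbar, d_hat

# Sample with m2 / xbar^2 > 1, so an interior maximum exists per the proposition
mu_hat, d_hat = hermite_mle([0, 0, 1, 2, 2, 3, 4, 6])
```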

Zero frequency and the mean estimators

A usual choice for discrete distributions is to equate the zero relative frequency of the data set with the probability of zero under the assumed distribution. Observing that f_0 = \exp(-(a_1+a_2)) and \mu = a_1+2a_2, and following the example of Y. C. Patel (1976), the resulting system of equations is,


   \begin{cases}
        \bar{x}=a_1+2a_2 \\
        f_0 =\exp(-(a_1+a_2))
      \end{cases}

Solving this system we obtain the zero-frequency and mean estimators \hat{a_1} and \hat{a_2} of a1 and a2,[6]

\hat{a_1}=-(\bar{x}+2\log(f_0))
\hat{a_2} = \bar{x}+\log(f_0)

where f_0 = \frac{n_0}{n} is the relative frequency of zeros in the sample, with n_0 > 0.

These estimators are admissible (both nonnegative) only if

 -\log\left(\frac{n_0}{n}\right) < \bar{x} < -2\log\left(\frac{n_0}{n}\right)

It can be seen that for distributions with a high probability at 0, the efficiency of these estimators is high.
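These estimators are straightforward to compute; a short sketch (Python; the function name is illustrative), requiring at least one zero in the sample:

```python
from math import log

def zero_frequency_estimators(sample):
    """a1_hat = -(xbar + 2*log(f0)), a2_hat = xbar + log(f0), with f0 = n0/n."""
    n = len(sample)
    n0 = sum(1 for x in sample if x == 0)
    if n0 == 0:
        raise ValueError("the zero-frequency method needs at least one zero count")
    xbar = sum(sample) / n
    f0 = n0 / n
    return -(xbar + 2 * log(f0)), xbar + log(f0)

a1_hat, a2_hat = zero_frequency_estimators([0, 0, 1, 2, 3])
# By construction a1_hat + 2*a2_hat equals the sample mean.
```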

Testing Poisson assumption

When the Hermite distribution is used to model a data sample, it is important to check whether the Poisson distribution is enough to fit the data. Following the parametrized probability mass function used to calculate the maximum likelihood estimator, the hypotheses to test are,


   \begin{cases}
        H_0: d=1 \\
        H_1: d> 1
      \end{cases}

Likelihood-ratio test

The likelihood-ratio test statistic[11] for the Hermite distribution is,

W = 2(\mathcal{L}(X;\hat{\mu},\hat{d})-\mathcal{L}(X;\hat{\mu},1))

where \mathcal{L}(\cdot) is the log-likelihood function. As d = 1 belongs to the boundary of the parameter domain, under the null hypothesis W does not have the asymptotic \chi_1^2 distribution one might expect. It can be established that the asymptotic distribution of W is a 50:50 mixture of the constant 0 and the \chi_1^2 distribution. The \alpha upper-tail percentage points for this mixture are the same as the 2\alpha upper-tail percentage points for a \chi_1^2; for instance, for \alpha = 0.01, 0.05 and 0.10 they are 5.41189, 2.70554 and 1.64237, respectively.
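Since the null distribution is this 50:50 mixture, the p-value of an observed W > 0 is half the \chi_1^2 tail probability, computable with the identity P(\chi_1^2 > w) = erfc(√(w/2)). A minimal helper (Python; the name is illustrative):

```python
from math import erfc, sqrt

def lrt_pvalue(W):
    """P-value P(mixture >= W): 0.5 * P(chi2_1 > W) for W > 0, using
    P(chi2_1 > w) = erfc(sqrt(w / 2)); for W <= 0 the p-value is 1."""
    if W <= 0:
        return 1.0
    return 0.5 * erfc(sqrt(W / 2))
```

Evaluating it at the percentage points quoted above recovers α = 0.01, 0.05 and 0.10.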

The "score" or Lagrange multiplier test

The score statistic is,[11]

S_2 = 2m \left[\frac{m^{(2)}-\bar{x}^2}{2\bar{x}}\right]^2 = \frac{m(\tilde{d}-1)^2}{2}

where m is the number of observations.

The asymptotic distribution of the score test statistic under the null hypothesis is a \chi_1^2 distribution. It may be convenient to use a signed version of the score test, that is, \operatorname{sgn}(m^{(2)} - \bar{x}^2)\sqrt{S_2}, which asymptotically follows a standard normal distribution.
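A sketch of the score statistic and its signed version (Python; names are illustrative), with a one-sided normal p-value computed from the complement of Φ:

```python
from math import copysign, erfc, sqrt

def hermite_score_test(sample):
    """S = 2m * ((m2 - xbar^2) / (2*xbar))^2, plus sgn(m2 - xbar^2) * sqrt(S)."""
    m = len(sample)
    xbar = sum(sample) / m
    m2 = sum(x * (x - 1) for x in sample) / m  # empirical factorial moment of order 2
    S = 2 * m * ((m2 - xbar ** 2) / (2 * xbar)) ** 2
    z = copysign(sqrt(S), m2 - xbar ** 2)      # asymptotically N(0, 1) under H0
    p = 0.5 * erfc(z / sqrt(2))                # one-sided p-value, P(Z > z)
    return S, z, p

S, z, p = hermite_score_test([0, 0, 1, 2, 2, 3, 4, 6])
```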

References

  1. Kemp, C. D.; Kemp, A. W. (1965). "Some Properties of the 'Hermite' Distribution". Biometrika 52 (3-4): 381–394. doi:10.1093/biomet/52.3-4.381.
  2. McKendrick, A. G. (1926). "Applications of Mathematics to Medical Problems". Proceedings of the Edinburgh Mathematical Society 44: 98–130. doi:10.1017/s0013091500034428.
  3. Zhang, Huiming; Liu, Yunxiao; Li, Bo (2014). "Notes on discrete compound Poisson model with applications to risk theory". Insurance: Mathematics and Economics 59: 325–336. doi:10.1016/j.insmatheco.2014.09.012.
  4. Johnson, N. L.; Kemp, A. W.; Kotz, S. (2005). Univariate Discrete Distributions, 3rd Edition, Wiley. ISBN 978-0-471-27246-5.
  5. Kemp, Adrienne W.; Kemp, C. D. (1966). "An alternative derivation of the Hermite distribution". Biometrika 53 (3-4): 627–628. doi:10.1093/biomet/53.3-4.627.
  6. Patel, Y. C. (1976). "Even Point Estimation and Moment Estimation in Hermite Distribution". Biometrics 32 (4): 865–873. doi:10.2307/2529270.
  7. Gupta, R. P.; Jain, G. C. (1974). "A Generalized Hermite Distribution and Its Properties". SIAM Journal on Applied Mathematics 27: 359–363. doi:10.1137/0127027.
  8. Bekelis, D. (1996). "Convolutions of the Poisson laws in number theory". In Analytic & Probabilistic Methods in Number Theory: Proceedings of the 2nd International Conference in Honour of J. Kubilius, Lithuania 4: 283–296.
  9. Zhang, H.; He, J.; Huang, H. (2013). "On Nonnegative Integer-Valued Lévy Processes and Applications in Probabilistic Number Theory and Inventory Policies". American Journal of Theoretical and Applied Statistics 2: 110–121. doi:10.11648/j.ajtas.20130205.11.
  10. Kotz, Samuel (1982–1989). Encyclopedia of Statistical Sciences. John Wiley. ISBN 0471055522.
  11. Puig, P. (2003). "Characterizing Additively Closed Discrete Models by a Property of Their Maximum Likelihood Estimators, with an Application to Generalized Hermite Distributions". Journal of the American Statistical Association 98: 687–692. doi:10.1198/016214503000000594.