Moment (mathematics)

In mathematics, a moment is a specific quantitative measure, used in both mechanics and statistics, of the shape of a set of points. If the points represent mass, then the zeroth moment is the total mass, the first moment divided by the total mass is the center of mass, and the second moment is the rotational inertia. If the points represent probability density, then the zeroth moment is the total probability (i.e. one), the first moment is the mean, the second central moment is the variance, the third standardized moment is the skewness, and the fourth standardized moment (often shifted so that the normal distribution has the value zero) is the kurtosis. The mathematical concept is closely related to the concept of moment in physics.

For a distribution of mass or probability on a bounded interval, the collection of all the moments (of all orders, from $0$ to $\infty$) uniquely determines the distribution (Hausdorff moment problem). The same is not true on unbounded intervals (Hamburger moment problem).

Significance of the moments

The $n$-th moment of a real-valued continuous function $f(x)$ of a real variable about a value $c$ is

\mu _{n}=\int _{-\infty }^{\infty }(x-c)^{n}\,f(x)\,dx.

It is possible to define moments for random variables in a more general fashion than moments for real values; see the section on central moments in metric spaces. The moment of a function, without further explanation, usually refers to the above expression with $c = 0$.
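The defining integral can be approximated numerically. The following is a minimal sketch, not a production routine: the function names `moment` and `uniform_density`, the truncation of the integral to a finite interval, and the choice of the midpoint rule are all assumptions made for illustration.

```python
def moment(f, n, c=0.0, lo=0.0, hi=1.0, steps=100_000):
    """Approximate the n-th moment of f about c with the midpoint rule.

    The integral over the real line is truncated to [lo, hi], so f is
    assumed to be negligible (here: exactly zero) outside that interval.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += (x - c) ** n * f(x)
    return total * h


def uniform_density(x):
    """Density of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


zeroth = moment(uniform_density, 0)            # total probability
mean = moment(uniform_density, 1)              # first moment about zero
second_raw = moment(uniform_density, 2)        # second moment about zero
variance = moment(uniform_density, 2, c=mean)  # second moment about the mean
```

For the uniform density on $[0, 1]$ the $n$-th moment about zero is $1/(n+1)$, so the computed values should be close to 1, 1/2 and 1/3, and the second moment about the mean should be close to 1/12.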

For the second and higher moments, the central moments (moments about the mean, with $c$ being the mean) are usually used rather than the moments about zero, because they provide clearer information about the distribution's shape.

Other moments may also be defined. For example, the $n$-th inverse moment about zero is $\operatorname {E} \left[X^{-n}\right]$ and the $n$-th logarithmic moment about zero is $\operatorname {E} \left[\ln ^{n}(X)\right].$

The $n$-th moment about zero of a probability density function $f(x)$ is the expected value of $X^{n}$ and is called a raw moment or crude moment.^[1] The moments about its mean $\mu$ are called central moments; these describe the shape of the function, independently of translation.

If $f$ is a probability density function, then the value of the integral above is called the $n$-th moment of the probability distribution. More generally, if $F$ is a cumulative probability distribution function of any probability distribution, which may not have a density function, then the $n$-th moment of the probability distribution is given by the Riemann–Stieltjes integral

\mu '_{n}=\operatorname {E} [X^{n}]=\int _{-\infty }^{\infty }x^{n}\,dF(x)\,

where X is a random variable that has this cumulative distribution F, and $E$ is the expectation operator or mean.

When

\operatorname {E} \left[\left|X^{n}\right|\right]=\int _{-\infty }^{\infty }|x^{n}|\,dF(x)=\infty ,

then the moment is said not to exist. If the $n$-th moment about any point exists, so does the $(n - 1)$-th moment (and thus, all lower-order moments) about every point.
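As an illustration that is not taken from the text above, the standard Cauchy distribution, with density $f(x) = 1/(\pi (1+x^{2}))$, is a commonly cited case in which even the first moment does not exist:

```latex
\operatorname{E}\left[|X|\right]
  = \int_{-\infty}^{\infty} \frac{|x|}{\pi (1+x^{2})}\,dx
  = \frac{2}{\pi} \int_{0}^{\infty} \frac{x}{1+x^{2}}\,dx
  = \frac{1}{\pi} \lim_{R\to\infty} \ln\left(1+R^{2}\right)
  = \infty .
```

Since the existence of a moment implies the existence of all lower-order moments, it follows that no higher-order moment of this distribution exists either.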

The zeroth moment of any probability density function is 1, since the area under any probability density function must be equal to one.

Significance of moments (raw, central, normalised) and cumulants (raw, normalised), in connection with named properties of distributions
Moment number	Raw moment	Central moment	Normalised moment	Raw cumulant	Standardised cumulant
1	mean	0	0	mean	N/A
2	–	variance	1	variance	1
3	–	–	skewness	–	skewness
4	–	–	(non-excess or historical) kurtosis	–	excess kurtosis
5	–	–	hyperskewness	–	–
6	–	–	hyperflatness	–	–
7+	–	–	–	–	–

Mean

The first raw moment is the mean, usually denoted $\mu \equiv \mu _{1}\equiv \operatorname {E} [X].$

Variance

The second central moment is the variance. Its positive square root is the standard deviation $\sigma \equiv \left(\operatorname {E} [(X-\mu )^{2}]\right)^{1/2}.$

Normalised moments

The normalised $n$-th central moment or standardised moment is the $n$-th central moment divided by $\sigma^{n}$; the normalised $n$-th central moment of the random variable $X$ is ${\frac {\mu _{n}}{\sigma ^{n}}}={\frac {\operatorname {E} [(X-\mu )^{n}]}{\sigma ^{n}}}.$

These normalised central moments are dimensionless quantities, which represent the distribution independently of any linear change of scale.
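The scale invariance can be checked on a small discrete distribution. This is a sketch under stated assumptions: the helper names `central_moment` and `standardized_moment`, the representation of a distribution as parallel lists of values and probabilities, and the example distribution are all hypothetical.

```python
import math


def central_moment(values, probs, n):
    """n-th central moment of a discrete distribution (values with probabilities)."""
    mu = sum(p * x for x, p in zip(values, probs))
    return sum(p * (x - mu) ** n for x, p in zip(values, probs))


def standardized_moment(values, probs, n):
    """n-th central moment divided by sigma**n; requires nonzero variance."""
    sigma = math.sqrt(central_moment(values, probs, 2))
    return central_moment(values, probs, n) / sigma ** n


# Hypothetical example: a two-point distribution with P(X = 1) = 0.2.
values, probs = [0.0, 1.0], [0.8, 0.2]
second = standardized_moment(values, probs, 2)  # equals 1 by construction
third = standardized_moment(values, probs, 3)

# Shifting and rescaling by a positive factor (here Y = 10 X + 3)
# leaves the standardized moment unchanged.
rescaled = [10.0 * x + 3.0 for x in values]
third_rescaled = standardized_moment(rescaled, probs, 3)
```

For this two-point distribution the third central moment is 0.096 and $\sigma^{3} = 0.064$, so `third` should be close to 1.5 both before and after rescaling. A negative scale factor would flip the sign of the odd-order standardized moments.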

For an electric signal, the first moment is its DC level, and the 2nd moment is proportional to its average power.^[2]^[3]

Skewness

The third central moment is a measure of the lopsidedness of the distribution; any symmetric distribution will have a third central moment, if defined, of zero. The normalised third central moment is called the skewness, often denoted $\gamma$. A distribution that is skewed to the left (the tail of the distribution is longer on the left) will have a negative skewness. A distribution that is skewed to the right (the tail of the distribution is longer on the right) will have a positive skewness.

For distributions that are not too different from the normal distribution, the median will be somewhere near $\mu - \gamma \sigma /6$; the mode about $\mu - \gamma \sigma /2$.

Kurtosis

The fourth central moment is a measure of the heaviness of the tail of the distribution, compared to the normal distribution of the same variance. Since it is the expectation of a fourth power, the fourth central moment, where defined, is always nonnegative; and except for a point distribution, it is always strictly positive. The fourth central moment of a normal distribution is $3\sigma^{4}$.

The kurtosis $\kappa$ is defined to be the normalised fourth central moment minus 3 (equivalently, it is the fourth cumulant divided by the square of the variance). Some authorities do not subtract three, but it is usually more convenient to have the normal distribution at the origin of coordinates.^[4]^[5] If a distribution has heavy tails, the kurtosis will be high (sometimes called leptokurtic); conversely, light-tailed distributions (for example, bounded distributions such as the uniform) have low kurtosis (sometimes called platykurtic).

The kurtosis can be positive without limit, but $\kappa$ must be greater than or equal to $\gamma^{2} - 2$; equality only holds for binary distributions. For unbounded skew distributions not too far from normal, $\kappa$ tends to be somewhere between $\gamma^{2}$ and $2\gamma^{2}$.

The inequality can be proven by considering

\operatorname {E} [(T^{2}-aT-1)^{2}]

where $T = (X - \mu)/\sigma$. This is the expectation of a square, so it is non-negative for all $a$; however, it is also a quadratic polynomial in $a$. Its discriminant must be non-positive, which gives the required relationship.
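Written out, using $\operatorname{E}[T] = 0$, $\operatorname{E}[T^{2}] = 1$, $\operatorname{E}[T^{3}] = \gamma$ and $\operatorname{E}[T^{4}] = \kappa + 3$ (the last from the definition of $\kappa$ as the normalised fourth central moment minus 3), the steps can be sketched as follows:

```latex
\begin{aligned}
\operatorname{E}\left[(T^{2}-aT-1)^{2}\right]
  &= \operatorname{E}[T^{4}] - 2a\operatorname{E}[T^{3}]
     + (a^{2}-2)\operatorname{E}[T^{2}] + 2a\operatorname{E}[T] + 1 \\
  &= a^{2} - 2\gamma a + (\kappa + 2) \;\geq\; 0 \quad \text{for all } a, \\
(2\gamma)^{2} - 4(\kappa + 2) &\leq 0
  \quad\Longrightarrow\quad \kappa \geq \gamma^{2} - 2 .
\end{aligned}
```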

Mixed moments

Mixed moments are moments involving multiple variables.

Some examples are covariance, coskewness and cokurtosis. While there is a unique covariance, there are multiple coskewnesses and cokurtoses.

Higher moments

High-order moments are moments beyond 4th-order moments. As with variance, skewness, and kurtosis, these are higher-order statistics, involving non-linear combinations of the data, and can be used for description or estimation of further shape parameters. The higher the moment, the harder it is to estimate, in the sense that larger samples are required in order to obtain estimates of similar quality. This is due to the excess degrees of freedom consumed by the higher orders. Further, they can be subtle to interpret, often being most easily understood in terms of lower order moments – compare the higher derivatives of jerk and jounce in physics. For example, just as the 4th-order moment (kurtosis) can be interpreted as "relative importance of tails versus shoulders in causing dispersion" (for a given dispersion, high kurtosis corresponds to heavy tails, while low kurtosis corresponds to broad shoulders), the 5th-order moment can be interpreted as measuring "relative importance of tails versus center (mode, shoulders) in causing skew" (for a given skew, high 5th moment corresponds to heavy tail and little movement of mode, while low 5th moment corresponds to more change in shoulders).

Transformation of center

Since:

(x-b)^{n}=(x-a+a-b)^{n}=\sum _{i=0}^{n}{{n} \choose {i}}(x-a)^{i}(a-b)^{n-i}

where ${\dbinom {n}{i}}$ is the binomial coefficient, it follows that the moments about b can be calculated from the moments about a by:

E[(x-b)^{n}]=\sum _{i=0}^{n}{{n} \choose {i}}E[(x-a)^{i}](a-b)^{n-i}
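The change of center translates directly into code. The sketch below assumes the moments about $a$ are supplied as a list indexed by order; the function name `shift_moments` and the fair-coin example are hypothetical.

```python
from math import comb


def shift_moments(moments_about_a, a, b):
    """Convert moments about a into moments about b.

    moments_about_a[i] must hold E[(x - a)**i] for i = 0, ..., n.
    Returns a list of the same length holding E[(x - b)**k].
    """
    shifted = []
    for k in range(len(moments_about_a)):
        shifted.append(sum(
            comb(k, i) * moments_about_a[i] * (a - b) ** (k - i)
            for i in range(k + 1)
        ))
    return shifted


# Hypothetical example: a fair coin taking the values 0 and 1.
# Its raw moments (about a = 0) of orders 0..3 are 1, 1/2, 1/2, 1/2.
raw = [1.0, 0.5, 0.5, 0.5]
central = shift_moments(raw, a=0.0, b=0.5)  # moments about the mean 1/2
```

Here `central` should come out as 1, 0, 1/4 and 0 for orders 0 to 3: the first central moment vanishes, the second is the variance of the coin, and the third is zero because the distribution is symmetric.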

Cumulants

The first raw moment and the second and third unnormalized central moments are additive in the sense that if X and Y are independent random variables then

{\begin{aligned}m_{1}(X+Y)&=m_{1}(X)+m_{1}(Y)\\\operatorname {Var} (X+Y)&=\operatorname {Var} (X)+\operatorname {Var} (Y)\\\mu _{3}(X+Y)&=\mu _{3}(X)+\mu _{3}(Y)\end{aligned}}

(These can also hold for variables that satisfy weaker conditions than independence. The first always holds; if the second holds, the variables are called uncorrelated.)

In fact, these are the first three cumulants and all cumulants share this additivity property.
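The additivity can be verified exactly on small discrete distributions. The sketch below is illustrative only: the dictionaries `X` and `Y`, the helper names, and the use of exact fractions are assumptions. It also relies on the standard identity that the fourth cumulant equals $\mu_{4} - 3\mu_{2}^{2}$, which is not stated in the text above.

```python
from fractions import Fraction as F
from itertools import product


def convolve(px, py):
    """Distribution of X + Y for independent X and Y given as {value: probability}."""
    out = {}
    for (x, p), (y, q) in product(px.items(), py.items()):
        out[x + y] = out.get(x + y, 0) + p * q
    return out


def central_moment(dist, n):
    """n-th central moment of a discrete distribution {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * (x - mean) ** n for x, p in dist.items())


def fourth_cumulant(dist):
    # Standard identity: kappa_4 = mu_4 - 3 * mu_2**2.
    return central_moment(dist, 4) - 3 * central_moment(dist, 2) ** 2


# Hypothetical distributions chosen only for illustration.
X = {0: F(1, 2), 1: F(1, 2)}
Y = {0: F(1, 4), 1: F(1, 2), 3: F(1, 4)}
S = convolve(X, Y)  # distribution of the sum of independent X and Y

mu3_additive = central_moment(S, 3) == central_moment(X, 3) + central_moment(Y, 3)
mu4_additive = central_moment(S, 4) == central_moment(X, 4) + central_moment(Y, 4)
k4_additive = fourth_cumulant(S) == fourth_cumulant(X) + fourth_cumulant(Y)
```

With these inputs the third central moment and the fourth cumulant are additive, while the fourth central moment is not: for independent variables it picks up the extra cross term $6\,\mu_{2}(X)\,\mu_{2}(Y)$.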

Sample moments

For all $k$, the $k$-th raw moment of a population can be estimated using the $k$-th raw sample moment

{\frac {1}{n}}\sum _{i=1}^{n}X_{i}^{k}

applied to a sample $X_{1}, \ldots , X_{n}$ drawn from the population.

It can be shown that the expected value of the raw sample moment is equal to the $k$-th raw moment of the population, if that moment exists, for any sample size $n$. It is thus an unbiased estimator. This contrasts with the situation for central moments, whose computation uses up a degree of freedom by using the sample mean. So for example an unbiased estimate of the population variance (the second central moment) is given by

{\frac {1}{n-1}}\sum _{i=1}^{n}(X_{i}-{\bar {X}})^{2}

in which the previous denominator $n$ has been replaced by the degrees of freedom $n - 1$, and in which ${\bar {X}}$ refers to the sample mean. This estimate of the population moment is greater than the unadjusted observed sample moment by a factor of ${\tfrac {n}{n-1}},$ and it is referred to as the "adjusted sample variance" or sometimes simply the "sample variance".
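Both estimators are short to write down. The following sketch uses hypothetical data and helper names (`raw_sample_moment`, `adjusted_sample_variance`) and compares the results against the two variance conventions in Python's standard `statistics` module.

```python
import statistics


def raw_sample_moment(sample, k):
    """k-th raw sample moment: the average of the k-th powers."""
    return sum(x ** k for x in sample) / len(sample)


def adjusted_sample_variance(sample):
    """Unbiased variance estimate with denominator n - 1 (needs n >= 2)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


# Hypothetical data, chosen so that the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

sample_mean = raw_sample_moment(data, 1)
unadjusted = raw_sample_moment(data, 2) - sample_mean ** 2  # denominator n
adjusted = adjusted_sample_variance(data)                   # denominator n - 1

# The standard library offers both conventions for comparison.
assert abs(unadjusted - statistics.pvariance(data)) < 1e-12
assert abs(adjusted - statistics.variance(data)) < 1e-12
```

For these eight values the sample mean is 5, the unadjusted variance is 4 and the adjusted variance is 32/7, so the two differ by the factor $n/(n-1) = 8/7$.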

Problem of moments

The problem of moments seeks characterizations of sequences { μ′_n : n = 1, 2, 3, ... } that are sequences of moments of some function f.

Partial moments

Partial moments are sometimes referred to as "one-sided moments." The $n$-th order lower and upper partial moments with respect to a reference point $r$ may be expressed as

\mu _{n}^{-}(r)=\int _{-\infty }^{r}(r-x)^{n}\,f(x)\,dx,

\mu _{n}^{+}(r)=\int _{r}^{\infty }(x-r)^{n}\,f(x)\,dx.

Partial moments are normalized by being raised to the power $1/n$. Because they focus purely on upside or downside, partial moments have been used in the definition of some financial metrics, such as the Sortino ratio. The upside potential ratio may be expressed as a ratio of a first-order upper partial moment to a normalized second-order lower partial moment.
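For data rather than a density, a natural sample analogue replaces each integral by an average over the observations. That substitution, the function names, and the return series below are assumptions made for this sketch; they are not taken from the definitions above.

```python
def lower_partial_moment(sample, r, n):
    """Sample analogue of the n-th lower partial moment about r (n >= 1)."""
    return sum(max(r - x, 0.0) ** n for x in sample) / len(sample)


def upper_partial_moment(sample, r, n):
    """Sample analogue of the n-th upper partial moment about r (n >= 1)."""
    return sum(max(x - r, 0.0) ** n for x in sample) / len(sample)


def upside_potential_ratio(sample, r):
    """First-order upper partial moment divided by the normalized
    (power 1/2) second-order lower partial moment.

    Undefined (division by zero) if no observation falls below r.
    """
    downside = lower_partial_moment(sample, r, 2) ** 0.5
    return upper_partial_moment(sample, r, 1) / downside


# Hypothetical periodic returns and a reference point of zero.
returns = [-0.02, 0.01, 0.03, -0.01, 0.04]
ratio = upside_potential_ratio(returns, r=0.0)
```

With these five returns the first-order upper partial moment is 0.016 and the second-order lower partial moment is 0.0001, whose square root is 0.01, so `ratio` should be close to 1.6.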

Central moments in metric spaces

Let $(M, d)$ be a metric space, and let $B(M)$ be the Borel $\sigma$-algebra on $M$, the $\sigma$-algebra generated by the $d$-open subsets of $M$. (For technical reasons, it is also convenient to assume that $M$ is a separable space with respect to the metric $d$.) Let $1 \leq p \leq \infty$.

The $p$-th central moment of a measure $\mu$ on the measurable space $(M, B(M))$ about a given point $x_{0} \in M$ is defined to be

\int _{M}d(x,x_{0})^{p}\,\mathrm {d} \mu (x).

$\mu$ is said to have finite $p$-th central moment if the $p$-th central moment of $\mu$ about $x_{0}$ is finite for some $x_{0} \in M$.

This terminology for measures carries over to random variables in the usual way: if $(\Omega, \Sigma, \mathbf{P})$ is a probability space and $X : \Omega \to M$ is a random variable, then the $p$-th central moment of $X$ about $x_{0} \in M$ is defined to be

\int _{M}d(x,x_{0})^{p}\,\mathrm {d} \left(X_{*}(\mathbf {P} )\right)(x)\equiv \int _{\Omega }d(X(\omega ),x_{0})^{p}\,\mathrm {d} \mathbf {P} (\omega ),

and $X$ has finite $p$-th central moment if the $p$-th central moment of $X$ about $x_{0}$ is finite for some $x_{0} \in M$.
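For a measure concentrated on finitely many points the integral reduces to a weighted sum. The sketch below assumes such a discrete measure on the plane with the Euclidean metric; the function name `central_moment_about` and the example points are hypothetical, and only finite values of $p$ are handled.

```python
import math


def central_moment_about(points, weights, x0, p, d=math.dist):
    """p-th central moment about x0 of the discrete measure that puts
    mass weights[i] on points[i], for a metric d (Euclidean by default)."""
    return sum(w * d(x, x0) ** p for x, w in zip(points, weights))


# Hypothetical example: two equally weighted points in the plane.
points = [(0.0, 0.0), (3.0, 4.0)]
weights = [0.5, 0.5]
second = central_moment_about(points, weights, x0=(0.0, 0.0), p=2)
```

Here the distances from the reference point are 0 and 5, so the second central moment about the origin is $0.5 \cdot 0 + 0.5 \cdot 25 = 12.5$. Any other metric can be passed in through the `d` parameter.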

References

[1] Raw Moments at MathWorld. http://mathworld.wolfram.com/RawMoment.html
[2] Clive Maxfield; John Bird; Tim Williams; Walt Kester; Dan Bensky (2011). Electrical Engineering: Know It All. Newnes. p. 884. ISBN 978-0-08-094966-6.
[3] Ha H. Nguyen; Ed Shwedyk (2009). A First Course in Digital Communications. Cambridge University Press. p. 87. ISBN 978-0-521-87613-1.
[4] Casella, George; Berger, Roger L. (2002). Statistical Inference (2 ed.). Pacific Grove: Duxbury. ISBN 0-534-24312-6.
[5] Balanda, Kevin P.; MacGillivray, H. L. (1988). "Kurtosis: A Critical Review". The American Statistician. American Statistical Association. 42 (2): 111–119. JSTOR 2684482. doi:10.2307/2684482.

External links

Hazewinkel, Michiel, ed. (2001) [1994], "Moment", Encyclopedia of Mathematics, Springer Science+Business Media B.V. / Kluwer Academic Publishers, ISBN 978-1-55608-010-4
Moments at Mathworld
Higher Moments

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.