Statistical dispersion

In statistics, statistical dispersion (also called statistical variability or variation) is variability or spread in a variable or a probability distribution. Common examples of measures of statistical dispersion are the variance, standard deviation and interquartile range.

Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions.

1 Measures of statistical dispersion
2 Sources of statistical dispersion
3 A partial ordering of dispersion
4 See also
5 References

Measures of statistical dispersion

A measure of statistical dispersion is a real number that is zero if all the data are identical, and increases as the data become more diverse. It cannot be less than zero.

Most measures of dispersion have the same scale as the quantity being measured. In other words, if the measurements have units, such as metres or seconds, the measure of dispersion has the same units. Such measures of dispersion include:

Standard deviation
Interquartile range or Interdecile range
Range
Mean difference
Median absolute deviation
Average absolute deviation (or simply called average deviation)
Distance standard deviation

These are frequently used (together with scale factors) as estimators of scale parameters, in which capacity they are called estimates of scale. Robust measures of scale are those unaffected by a small fraction of outliers.

All the above measures of statistical dispersion have the useful property that they are location-invariant, as well as linear in scale. So if a random variable X has a dispersion of S_X then a linear transformation Y = aX + b for real a and b should have dispersion S_Y = |a|S_X.

Other measures of dispersion are dimensionless (scale-free). In other words, they have no units even if the variable itself has units. These include:

Coefficient of variation
Quartile coefficient of dispersion
Relative mean difference, equal to twice the Gini coefficient

There are other measures of dispersion:

Variance (the square of the standard deviation) — location-invariant but not linear in scale.
Variance-to-mean ratio — mostly used for count data when the term coefficient of dispersion is used and when this ratio is dimensionless, as count data are themselves dimensionless: otherwise this is not scale-free.

Some measures of dispersion have specialized purposes, among them the Allan variance and the Hadamard variance.

For categorical variables, it is less common to measure dispersion by a single number. See qualitative variation. One measure that does so is the discrete entropy.

Sources of statistical dispersion

In the physical sciences, such variability may result from random measurement errors: instrument measurements are often not perfectly precise, i.e., reproducible, and there is additional inter-rater variability in interpreting and reporting the measured results. One may assume that the quantity being measured is stable, and that the variation between measurements is due to observational error. A system of a large number of particles is characterized by the mean values of a relatively few number of macroscopic quantities such as temperature, energy, and density. The standard deviation is an important measure in Fluctuation theory, which explains many physical phenomena, including why the sky is blue^[1] .

In the biological sciences, the quantity being measured is seldom unchanging and stable, and the variation observed might additionally be intrinsic to the phenomenon: It may be due to inter-individual variability, that is, distinct members of a population differing from each other. Also, it may be due to intra-individual variability, that is, one and the same subject differing in tests taken at different times or in other differing conditions. Such types of variability are also seen in the arena of manufactured products; even there, the meticulous scientist finds variation.

In economics, finance, and other disciplines, regression analysis attempts to explain the dispersion of a dependent variable, generally measured by its variance, using one or more independent variables each of which itself has positive dispersion. The fraction of variance explained is called the coefficient of determination.

A partial ordering of dispersion

A mean-preserving spread (MPS)^[2] is a change from one probability distribution A to another probability distribution B, where B is formed by spreading out one or more portions of A's probability density function while leaving the mean (the expected value) unchanged. The concept of a mean-preserving spread provides a partial ordering of probability distributions according to their dispersions: of two probability distributions, one may be ranked as having more dispersion than the other, or alternatively neither may be ranked as having more dispersion.

References

^ McQuarrie, Donald A. (1976). Statistical Mechanics. NY: Harper & Row. ISBN 06-044366-9.
^ Rothschild, Michael, and Stiglitz, Joseph, "Increasing risk I: A definition," Journal of Economic Theory, 1970, 225–243.

Statistics

Descriptive statistics

Continuous data

Location	Mean (Arithmetic, Geometric, Harmonic) Median Mode

Dispersion	Range Standard deviation Coefficient of variation Percentile Interquartile range

Shape	Variance Skewness Kurtosis Moments L-moments

Count data

Index of dispersion

Summary tables

Dependence

Statistical graphics

Data collection

Designing studies	Effect size Standard error Statistical power Sample size determination

Survey methodology	Sampling Stratified sampling Opinion poll Questionnaire

Controlled experiment	Design of experiments Randomized experiment Random assignment Replication Blocking Factorial experiment Optimal design

Uncontrolled studies	Natural experiment Quasi-experiment Observational study

Statistical inference

Statistical theory	Sampling distribution Sufficient statistic Meta-analysis

Bayesian inference	Bayesian probability Prior Posterior Credible interval Bayes factor Bayesian estimator Maximum posterior estimator

Frequentist inference	Confidence interval Hypothesis testing Likelihood-ratio

Specific tests	Z-test (normal) Student's t-test F-test Pearson's chi-squared test Wald test Mann–Whitney U Shapiro–Wilk Signed-rank Kolmogorov–Smirnov test

General estimation	Bias Robustness Efficiency Maximum likelihood Method of moments Minimum distance Density estimation

Correlation and regression analysis

Correlation	Pearson product-moment correlation Partial correlation Confounding variable Coefficient of determination

Regression analysis	Errors and residuals Regression model validation Mixed effects models Simultaneous equations models

Linear regression	Simple linear regression Ordinary least squares General linear model Bayesian regression

Non-standard predictors	Nonlinear regression Nonparametric Semiparametric Isotonic Robust

Generalized linear model	Exponential families Logistic (Bernoulli) Binomial Poisson

Partition of variance	Analysis of variance (ANOVA) Analysis of covariance Multivariate ANOVA Degrees of freedom

Categorical, multivariate, time-series, or survival analysis

Categorical data	Cohen's kappa Contingency table Graphical model Log-linear model McNemar's test

Multivariate statistics	Multivariate regression Principal components Factor analysis Cluster analysis Copulas

Time series analysis	Decomposition (Trend, Stationary process) ARMA model ARIMA model Vector autoregression Spectral density estimation

Survival analysis	Survival function Kaplan–Meier Logrank test Failure rate Proportional hazards models Accelerated failure time model

Applications

Biostatistics	Bioinformatics Biometrics Clinical trials & studies Epidemiology Medical statistics

Engineering statistics	Chemometrics Methods engineering Probabilistic design Process & Quality control Reliability System identification

Social statistics	Actuarial science Census Crime statistics Demography Econometrics National accounts Official statistics Population Psychometrics

Spatial statistics	Cartography Environmental statistics Geographic information system Geostatistics Kriging

Category
Portal
Outline
Index

Statistical dispersion

Contents

Measures of statistical dispersion

Sources of statistical dispersion

A partial ordering of dispersion

See also

References