Statistical dispersion
From Wikipedia, the free encyclopedia
In statistics, (statistical) dispersion (also called statistical variability or variation) is variability or spread in a variable or a probability distribution. Common examples of measures of statistical dispersion are the variance, standard deviation and interquartile range.
Measures of statistical dispersion
A measure of statistical dispersion is a nonnegative real number that is zero if all the data are identical and increases as the data become more diverse.
Most measures of dispersion have the same scale as the quantity being measured. In other words, if the measurements have units, such as metres or seconds, the measure of dispersion has the same units. Such measures of dispersion include:
- Standard deviation
- Interquartile range
- Range
- Mean difference
- Median absolute deviation
- Average absolute deviation (or simply average deviation)
All the above measures of statistical dispersion have the useful property that they are location-invariant and linear in scale. That is, if a random variable X has a dispersion of S_X, then the linear transformation Y = aX + b, for real a and b, has dispersion S_Y = |a| S_X.
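The relation for Y = aX + b stated above can be checked numerically. The following is a minimal sketch using only the Python standard library. The helper names, the sample data, and the specific conventions chosen (population standard deviation, the "inclusive" quantile method, and the mean difference averaged over all ordered pairs) are assumptions made for illustration; other common conventions scale in the same way.

```python
import math
import statistics


def value_range(xs):
    # Range: largest value minus smallest value.
    return max(xs) - min(xs)


def interquartile_range(xs):
    # Third quartile minus first quartile ("inclusive" method is an assumption).
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return q3 - q1


def mean_difference(xs):
    # Mean absolute difference over all ordered pairs (one common convention).
    n = len(xs)
    return sum(abs(x - y) for x in xs for y in xs) / (n * n)


def median_absolute_deviation(xs):
    # Median of the absolute deviations from the median.
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])


def average_absolute_deviation(xs):
    # Mean of the absolute deviations from the mean.
    mean = statistics.fmean(xs)
    return statistics.fmean([abs(x - mean) for x in xs])


MEASURES = {
    "standard deviation": statistics.pstdev,
    "interquartile range": interquartile_range,
    "range": value_range,
    "mean difference": mean_difference,
    "median absolute deviation": median_absolute_deviation,
    "average absolute deviation": average_absolute_deviation,
}


def check_scaling(data, a, b):
    # For each measure S, test whether S(aX + b) equals |a| * S(X).
    transformed = [a * x + b for x in data]
    return {
        name: math.isclose(measure(transformed), abs(a) * measure(data))
        for name, measure in MEASURES.items()
    }


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]  # illustrative sample
results = check_scaling(data, a=-3.0, b=10.0)
for name, ok in results.items():
    print(f"{name}: scales by |a| -> {ok}")
```

A negative value of a is used deliberately: the shift b drops out entirely, and only the absolute value of a affects the result.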
Other measures of dispersion are dimensionless (scale-free). In other words, they have no units even if the variable itself has units. These include:
- Coefficient of variation
- Quartile coefficient of dispersion
- Relative mean difference, equal to twice the Gini coefficient
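As an illustration of what "dimensionless" means in practice, the sketch below computes the three measures listed above for the same lengths expressed in metres and in centimetres. The function names and the sample values are hypothetical. The sketch also assumes positive data, the population standard deviation, the "inclusive" quantile method, and a mean difference averaged over all ordered pairs.

```python
import math
import statistics


def coefficient_of_variation(xs):
    # Standard deviation divided by the mean.
    return statistics.pstdev(xs) / statistics.fmean(xs)


def quartile_coefficient_of_dispersion(xs):
    # (Q3 - Q1) / (Q3 + Q1), using the "inclusive" quantile method.
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)


def relative_mean_difference(xs):
    # Mean absolute difference over all ordered pairs, divided by the mean.
    n = len(xs)
    mean_diff = sum(abs(x - y) for x in xs for y in xs) / (n * n)
    return mean_diff / statistics.fmean(xs)


lengths_m = [1.52, 1.60, 1.68, 1.75, 1.83]    # illustrative lengths in metres
lengths_cm = [100.0 * x for x in lengths_m]   # the same lengths in centimetres

for measure in (
    coefficient_of_variation,
    quartile_coefficient_of_dispersion,
    relative_mean_difference,
):
    same = math.isclose(measure(lengths_m), measure(lengths_cm))
    print(f"{measure.__name__}: unchanged by unit conversion -> {same}")
```

Each measure is a ratio of two quantities carrying the same units, so the conversion factor cancels.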
There are other measures of dispersion:
- Variance (the square of the standard deviation): location-invariant but not linear in scale.
- Variance-to-mean ratio: mostly used for count data, where it is also called the coefficient of dispersion. For count data the ratio is dimensionless, because counts are themselves dimensionless; for data with units it is not scale-free.
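The scaling behaviour of these two measures can be sketched as follows. The data, the constants, and the helper name variance_to_mean_ratio are illustrative assumptions, and the population variance is used throughout.

```python
import math
import statistics


def variance_to_mean_ratio(xs):
    # Population variance divided by the mean (assumes a nonzero mean).
    return statistics.pvariance(xs) / statistics.fmean(xs)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # illustrative sample
a, b = 3.0, 10.0

# Variance ignores the shift b but scales by a**2 rather than |a|.
shifted_scaled = [a * x + b for x in data]
variance_scales_quadratically = math.isclose(
    statistics.pvariance(shifted_scaled), a**2 * statistics.pvariance(data)
)

# Under a pure change of units (b = 0) the variance-to-mean ratio scales by a,
# so it is not scale-free for data that carry units.
rescaled = [a * x for x in data]
vmr_scales_linearly = math.isclose(
    variance_to_mean_ratio(rescaled), a * variance_to_mean_ratio(data)
)

print(variance_scales_quadratically, vmr_scales_linearly)
```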
For categorical variables, it is less common to measure dispersion by a single number; see qualitative variation. One measure that does so is the discrete entropy.
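A minimal sketch of the discrete entropy as a dispersion measure for categorical data is given below. It assumes the Shannon entropy of the observed category frequencies, measured in bits; the function name and the example labels are illustrative.

```python
import math
from collections import Counter


def discrete_entropy(labels):
    # Shannon entropy, in bits, of the empirical category frequencies.
    counts = Counter(labels)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())


all_same = ["A", "A", "A", "A"]   # no dispersion
skewed = ["A", "A", "A", "B"]     # some dispersion
balanced = ["A", "B", "C", "D"]   # maximal dispersion for four categories

print(discrete_entropy(all_same))   # 0.0
print(discrete_entropy(skewed))
print(discrete_entropy(balanced))   # 2.0
```

As with the numerical measures, the value is zero when all observations are identical and grows as the observations spread more evenly across categories.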
Sources of statistical dispersion
In the physical sciences, such variability may result only from random measurement errors: instrument measurements are often not perfectly precise, i.e., reproducible. One may assume that the quantity being measured is unchanging and stable, and that the variation between measurements is due to observational error.
In the biological sciences, this assumption often does not hold: the variation observed may be intrinsic to the phenomenon, because distinct members of a population differ from one another. The same is seen in manufactured products; even there, the meticulous scientist finds variation.
The simple model of a stable quantity is preferred when it is tenable. Each phenomenon must be examined to see if it warrants such a simplification.