Homoscedasticity

From Wikipedia, the free encyclopedia

Plot with random data showing homoscedasticity.

In statistics, a sequence or a vector of random variables is homoscedastic if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The alternative spelling homo- or heteroskedasticity is equally correct and is also used frequently.

As described by Joshua Isaac Walters, "the assumption of homoscedasticity simplifies mathematical and computational treatment and usually leads to adequate estimation results (e.g. in data mining) even if the assumption is not true." Serious violations in homoscedasticity (assuming a distribution of data is homoscedastic when in actuality it is heteroscedastic) result in underemphasizing the Pearson coefficient.

In a scatterplot of data, homoscedasticity looks like an oval (most x values are concentrated around the mean of y, with fewer and fewer x values as y becomes more extreme in either direction). If a scatterplot looks like any geometric shape other than an oval, the rules of homoscedasticity may have been violated.

1 Assumptions of a regression model
2 Testing
3 Homoscedastic distributions
4 See also

[edit] Assumptions of a regression model

In simple linear regression analysis, one assumption of the fitted model is that the standard deviation of the error terms are constant and does not depend on the x-value. Consequently, each probability distributions for each y (response variable) has the same standard deviation regardless of the x-value (predictor). In short, this assumption is homoskedasticity.

[edit] Testing

Residuals can be tested for homoscedasticity using the Breusch-Pagan test, which regresses square residuals to independent variables. The BP test is sensitive to normality so for general purpose the Koenkar-Basset or generalized Breusch-Pagan test statistic is used. For testing for groupwise heteroscedasticity, the Goldfeld-Quandt test is needed.

[edit] Homoscedastic distributions

Two or more normal distributions, $N (μ i,Σ i)$ , are homoscedastic if they share a common covariance (or correlation) matrix, $\Sigma_i = \Sigma_j,\ \forall i,j$ . Homoscedastic distributions are especially useful to derive statistical pattern recognition and machine learning algorithms. One popular example is Fisher's linear discriminant analysis.