Mean squared error

From Wikipedia, the free encyclopedia

For meanings of the 3-letter abbreviation see MSE

In statistics, the mean squared error or MSE of an estimator is the expected value of the square of the "error." The error is the amount by which the estimator differs from the quantity to be estimated. The difference occurs because of randomness or because the estimator does not account for information that could produce a more accurate estimate.

Formally, the MSE of an estimator T of an unobservable parameter θ is

\operatorname{MSE}(T)=\operatorname{E}((T-\theta)^2)
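As a small illustration of the definition, the expectation can be computed exactly when the sample space is finite. The sketch below assumes a setting that is not in the text above: T is the proportion of ones in n independent Bernoulli(θ) draws, and the names exact_mse and proportion are hypothetical helpers.

```python
from itertools import product


def exact_mse(estimator, theta, n):
    """Return E((T - theta)^2) for T = estimator(sample), where the sample is
    n independent Bernoulli(theta) draws, by enumerating every possible sample."""
    total = 0.0
    for sample in product((0, 1), repeat=n):
        # Probability of observing this particular sequence of 0s and 1s.
        prob = 1.0
        for x in sample:
            prob *= theta if x == 1 else 1.0 - theta
        total += prob * (estimator(sample) - theta) ** 2
    return total


def proportion(sample):
    """Estimator T: the fraction of ones in the sample."""
    return sum(sample) / len(sample)


mse = exact_mse(proportion, theta=0.5, n=2)
print(mse)  # 0.125
```

With θ = 0.5 and n = 2 the result agrees with θ(1 − θ)/n, the variance of the sample proportion, as expected for an unbiased estimator.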

It can be shown that the MSE equals the variance of the estimator plus the square of its bias, or

\operatorname{MSE}(T)=\operatorname{var}(T)+(\operatorname{bias}(T))^2
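The identity follows by adding and subtracting E(T) inside the square. A short derivation, assuming the usual definition bias(T) = E(T) − θ, which the text does not spell out:

```latex
\begin{aligned}
\operatorname{MSE}(T) &= \operatorname{E}\left((T-\theta)^2\right) \\
&= \operatorname{E}\left(\left((T-\operatorname{E}(T))+(\operatorname{E}(T)-\theta)\right)^2\right) \\
&= \operatorname{E}\left((T-\operatorname{E}(T))^2\right)
   + 2(\operatorname{E}(T)-\theta)\operatorname{E}(T-\operatorname{E}(T))
   + (\operatorname{E}(T)-\theta)^2 \\
&= \operatorname{var}(T)+(\operatorname{bias}(T))^2
\end{aligned}
```

The cross term vanishes because E(T − E(T)) = 0, and E(T) − θ is a constant, so it passes through the expectation unchanged.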

The root mean squared error (RMSE), also called the root mean squared deviation (RMSD), is the square root of the MSE. It should not be confused with the expected value of the absolute value of the error, which in general is not equal to the RMSE.
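The difference between the RMSE and the mean absolute error can be seen on a handful of numbers. The sketch below uses a made-up list of errors and plain averages in place of expected values, so it is an empirical analogue rather than the population quantities.

```python
import math

# Hypothetical errors (estimate minus true value), chosen only for illustration.
errors = [1.0, -1.0, 3.0, -3.0]

mse = sum(e ** 2 for e in errors) / len(errors)   # mean of squared errors
rmse = math.sqrt(mse)                             # root mean squared error
mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error

print(mse)   # 5.0
print(rmse)  # square root of 5, about 2.236
print(mae)   # 2.0
```

Squaring gives large errors more weight, so the RMSE is never smaller than the mean absolute error, and the two coincide only when all errors have the same magnitude.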


Examples

Suppose we have a random sample of size n from a normally distributed population, X_1,\dots,X_n\sim\operatorname{N}(\mu,\sigma^2).

Some commonly used estimators of the true parameters of the population, μ and σ², are:

True value | Estimator | Mean squared error
θ = μ | T = the sample mean, an unbiased estimator of μ: \overline{X}=\frac{1}{n}\sum_{i=1}^n X_i | \operatorname{MSE}(\overline{X})=\operatorname{E}((\overline{X}-\mu)^2)=\left(\frac{\sigma}{\sqrt{n}}\right)^2=\frac{\sigma^2}{n}
θ = σ² | T = the sample variance, an unbiased estimator of σ²: S^2=\frac{1}{n-1}\sum_{i=1}^n\left(X_i-\overline{X}\,\right)^2 | \operatorname{MSE}(S^2)=\operatorname{E}((S^2-\sigma^2)^2)=\operatorname{var}(S^2)=\frac{2\sigma^4}{n-1}

Notice how these examples also illustrate one facet of the bias-variance decomposition. The MSE of an unbiased estimator is just its variance, because its bias is zero. The MSE of a biased estimator has a non-zero squared-bias term in addition to the variance term.
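The examples can be checked by simulation. The following is a minimal Monte Carlo sketch, not an exact computation: the parameter values, the seed, and the helper names (simulate_mse, sample_mean, unbiased_variance, biased_variance) are assumptions made for illustration. The variance estimator that divides by n instead of n − 1 is included as an example of a biased estimator.

```python
import random


def sample_mean(xs):
    return sum(xs) / len(xs)


def unbiased_variance(xs):
    """Sample variance S^2 with divisor n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def biased_variance(xs):
    """Variance estimator with divisor n (biased downward by sigma^2 / n)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def simulate_mse(estimator, true_value, mu, sigma, n, trials, seed=0):
    """Average of (estimate - true_value)^2 over repeated normal samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total += (estimator(sample) - true_value) ** 2
    return total / trials


# Illustrative settings; any mu, sigma > 0 and n >= 2 would do.
mu, sigma, n, trials = 1.0, 2.0, 10, 20000

mse_mean = simulate_mse(sample_mean, mu, mu, sigma, n, trials)
mse_s2 = simulate_mse(unbiased_variance, sigma ** 2, mu, sigma, n, trials)
mse_biased = simulate_mse(biased_variance, sigma ** 2, mu, sigma, n, trials)

# Theoretical values for normal samples:
#   sigma^2 / n                  = 0.4     (sample mean)
#   2 sigma^4 / (n - 1)          = 3.556   (S^2, unbiased)
#   (2n - 1) sigma^4 / n^2       = 3.04    (divisor n: variance plus squared bias)
print(mse_mean, mse_s2, mse_biased)
```

The simulated values should land close to the theoretical ones, up to Monte Carlo noise. For the divisor-n estimator the MSE is the sum of a variance term, 2(n − 1)σ⁴/n², and a squared-bias term, σ⁴/n².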

Applications

  • In statistical modelling, the MSE is the average of the squared differences between the actual observations and the responses predicted by the model. It is used to assess whether the model fits the data and whether the model can be simplified by removing terms.
  • In bioinformatics, the RMSD is a measure of the average distance between the backbone atoms of superimposed proteins.
  • In GIS, the RMSE is one measure used to assess the accuracy of spatial analysis and remote sensing.
  • In imaging science, the RMSD is one measure used to assess how well an image reconstruction method performs relative to the original image.
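In the modelling sense used in these applications, the MSE and RMSE are computed from observed values and model predictions. The sketch below is a minimal illustration with made-up data and two hypothetical candidate models (a line through the origin and a constant), not a recommended model-selection procedure.

```python
import math


def mean_squared_error(observed, predicted):
    """Average squared difference between observations and predictions."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Candidate 1: y = 2x (hypothetical fitted line).
linear_pred = [2.0 * xi for xi in x]
# Candidate 2: a constant model that always predicts the mean of y.
mean_y = sum(y) / len(y)
constant_pred = [mean_y] * len(y)

mse_linear = mean_squared_error(y, linear_pred)
mse_constant = mean_squared_error(y, constant_pred)
rmse_linear = math.sqrt(mse_linear)

print(mse_linear, mse_constant, rmse_linear)
```

Here the constant model has a much larger MSE than the linear one, which suggests that dropping the slope term would cost a great deal of fit on this data. The RMSE is in the same units as y, which often makes it easier to interpret than the MSE.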
