Winsorising

From Wikipedia, the free encyclopedia

The distribution of many statistics can be heavily influenced by outliers. Winsorising (also spelled Winsorizing, and also called Winsorisation) is the transformation of statistical data by limiting extreme values in order to reduce the effect of possibly spurious outliers. A typical strategy is to set all outliers to a specified percentile of the data; for example, a 90% Winsorisation sets all data below the 5th percentile to the 5th percentile, and all data above the 95th percentile to the 95th percentile. Winsorised estimators are usually more robust to outliers than their unwinsorised counterparts. Note that Winsorising is not equivalent to simply excluding data; for example, a Winsorised mean is not the same as a truncated mean. A truncated mean discards the extreme values altogether, whereas a Winsorised mean replaces them with the nearest retained values, so the sample size is unchanged and the replaced values still contribute to the estimate.
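The 90% Winsorisation described above, and its difference from a truncated mean, can be illustrated with a minimal Python sketch. The function names (`percentile`, `winsorise`, `truncated_mean`) and the sample data are hypothetical, chosen only for illustration. The percentile rule used here (linear interpolation between closest ranks) is an assumption; other percentile conventions give slightly different cut-offs.

```python
from statistics import mean


def percentile(sorted_data, p):
    """Return the p-th percentile (0-100) of already-sorted data.

    Assumption: linear interpolation between the two closest ranks.
    Other common definitions of a percentile exist.
    """
    if not sorted_data:
        raise ValueError("percentile of empty data is undefined")
    rank = (len(sorted_data) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_data) - 1)
    fraction = rank - lower
    return sorted_data[lower] + fraction * (sorted_data[upper] - sorted_data[lower])


def winsorise(data, lower_pct=5.0, upper_pct=95.0):
    """Clamp every value into the [lower_pct, upper_pct] percentile range.

    The output has the same length as the input: extreme values are
    replaced by the cut-off values, not removed.
    """
    ordered = sorted(data)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(x, low), high) for x in data]


def truncated_mean(data, lower_pct=5.0, upper_pct=95.0):
    """Mean of only those values inside the percentile range; the rest are dropped."""
    ordered = sorted(data)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return mean([x for x in data if low <= x <= high])


# Hypothetical sample: twenty moderate values plus one gross outlier.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
          12, 14, 16, 18, 20, 25, 30, 35, 40, 50, 1000]

winsorised = winsorise(sample)          # 1 becomes 2, 1000 becomes 50
plain_mean = mean(sample)               # pulled upward by the outlier
winsorised_mean = mean(winsorised)      # all 21 values still contribute
trimmed_mean = truncated_mean(sample)   # only the 19 retained values contribute
```

For this sample the Winsorised mean (about 17.43) and the truncated mean (about 16.53) differ, and both are far below the ordinary mean (about 62.62), which is dominated by the single outlier.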

The procedure is named for the engineer-turned-biostatistician Charles P. Winsor (1895-1951).

References

  • Simplified Estimation from Censored Normal Samples, W. J. Dixon, The Annals of Mathematical Statistics, 31, pp. 385-391, 1960
  • The Future of Data Analysis, J. W. Tukey, The Annals of Mathematical Statistics, 33, p. 18, 1962