Freedman's paradox

In statistical analysis, Freedman's paradox,[1] named after David Freedman, is a problem in model selection whereby predictor variables with no explanatory power can appear artificially important. Freedman demonstrated, through simulation and asymptotic calculation, that this commonly occurs when the number of variables is similar to the number of data points. Information-theoretic estimators have since been developed in an attempt to reduce this problem,[2] along with the accompanying issue of model selection bias,[3] whereby the estimated effects of predictor variables that are only weakly related to the response variable are biased.
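
The effect described above can be illustrated with a small simulation. The following is a minimal sketch, not Freedman's own code: it assumes NumPy is available, and the function names (`ols_t_stats`, `screen_and_refit`), the sample sizes, and the t-statistic cutoffs (about 1.15 for a rough 25% two-sided screen, 2.0 for a rough 5% level) are illustrative choices. The response is drawn independently of every predictor, so any apparent significance is spurious.

```python
import numpy as np


def ols_t_stats(X, y):
    """Fit y on X by ordinary least squares (with intercept).

    Returns the t-statistics of the slope coefficients and R^2.
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    rss = resid @ resid
    dof = n - A.shape[1]                          # residual degrees of freedom
    cov = (rss / dof) * np.linalg.inv(A.T @ A)    # estimated covariance of beta
    t = beta / np.sqrt(np.diag(cov))
    r2 = 1.0 - rss / np.sum((y - y.mean()) ** 2)
    return t[1:], r2                              # drop the intercept's t-statistic


def screen_and_refit(n=100, p=50, t_screen=1.15, t_report=2.0, seed=0):
    """Regress pure noise on pure noise, screen by |t|, then refit.

    All parameter defaults are assumptions made for illustration.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)                    # drawn independently of X

    # First pass: all p predictors.
    t_first, r2_first = ols_t_stats(X, y)

    # Screening: keep predictors that look mildly significant.
    keep = np.abs(t_first) > t_screen

    # Second pass: refit using only the retained predictors.
    t_second, r2_second = ols_t_stats(X[:, keep], y)

    return {
        "r2_first": float(r2_first),
        "n_kept": int(keep.sum()),
        "n_significant": int(np.sum(np.abs(t_second) > t_report)),
        "r2_second": float(r2_second),
    }


if __name__ == "__main__":
    for seed in range(3):
        print(seed, screen_and_refit(seed=seed))
```

In runs of this sketch, the first fit typically shows a substantial R² even though no predictor is related to the response (under the null its expected value is p/(n − 1)), and the share of retained predictors flagged in the second fit tends to be well above the nominal 5%. Exact figures vary with the seed.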

References

  1. Freedman, D. A. (1983) "A note on screening regression equations." The American Statistician, 37, 152–155.
  2. Lukacs, P. M., Burnham, K. P. & Anderson, D. R. (2010) "Model selection bias and Freedman's paradox." Annals of the Institute of Statistical Mathematics, 62(1), 117–125. doi:10.1007/s10463-009-0234-4
  3. Burnham, K. P. & Anderson, D. R. (2002) Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed. Springer-Verlag.