Stepwise regression

Stepwise regression most often refers to an algorithm proposed by Efroymson (1960): an automatic procedure for statistical model selection in cases where there are many potential explanatory variables and no underlying theory on which to base the selection. The procedure is used primarily in regression analysis, though the basic approach is applicable in many forms of model selection.

This is a variation on forward selection. At each stage in the process, after a new variable is added, a test is made to check whether some variables can be deleted without appreciably increasing the residual sum of squares (RSS). The procedure terminates when the selection criterion is (locally) optimized, or when the available improvement falls below some critical value.
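For concreteness, the following is a minimal sketch of this add-then-prune loop in Python, using partial F-tests on the RSS. The design matrix X (one column per candidate variable, no intercept), the response y, and the thresholds F_IN and F_OUT are illustrative assumptions, not part of Efroymson's original formulation:

```python
import numpy as np

# Illustrative entry/removal thresholds for the partial F statistic;
# choosing F_OUT < F_IN prevents the loop from cycling.
F_IN, F_OUT = 4.0, 3.9


def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on X[:, cols]."""
    if not cols:
        return float(y @ y)
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    r = y - X[:, cols] @ beta
    return float(r @ r)


def stepwise(X, y):
    n, p = X.shape
    selected = []
    while True:
        # Forward step: add the unused variable with the largest
        # partial F statistic, if any clears the entry threshold.
        k = len(selected)
        rss_cur = rss(X, y, selected)
        best_j, best_f = None, F_IN
        for j in range(p):
            if j in selected:
                continue
            rss_new = rss(X, y, selected + [j])
            f = (rss_cur - rss_new) / (rss_new / (n - k - 1))
            if f > best_f:
                best_j, best_f = j, f
        if best_j is None:
            break  # no candidate improves the fit enough: stop
        selected.append(best_j)
        # Backward step: delete variables whose removal would not
        # appreciably increase the RSS.
        while len(selected) > 1:
            k = len(selected)
            rss_cur = rss(X, y, selected)
            worst_j, worst_f = None, F_OUT
            for j in selected:
                rss_drop = rss(X, y, [c for c in selected if c != j])
                f = (rss_drop - rss_cur) / (rss_cur / (n - k))
                if f < worst_f:
                    worst_j, worst_f = j, f
            if worst_j is None:
                break
            selected.remove(worst_j)
    return selected
```

Setting F_OUT strictly below F_IN ensures that a variable which has just entered cannot immediately be deleted again, since its removal F statistic equals its entry F statistic.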

Stepwise regression procedures are used in data mining, but critics regard them as a paradigmatic example of data dredging, with intense computation often being an inadequate substitute for subject-area expertise.

Necessity and sufficiency of each variable are usually judged by F-tests, t-tests, adjusted R-square, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), Mallows' Cp, the false discovery rate, or any of several other model selection criteria, stopping rules, or measures of goodness of fit.
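For example, under a Gaussian linear model with n observations and k fitted parameters, the AIC and BIC stopping rules reduce, up to an additive constant shared by all candidate models, to simple functions of the RSS. A minimal sketch of these standard forms:

```python
import math

def aic(rss, n, k):
    """Akaike information criterion for a Gaussian linear model,
    up to an additive constant common to all candidate models."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """Bayesian information criterion; its k*log(n) penalty punishes
    model size more heavily than AIC once n exceeds e**2 (about 7.4)."""
    return n * math.log(rss / n) + k * math.log(n)
```

A criterion-based stepwise search then adds or deletes whichever variable most decreases the chosen criterion, stopping when no single change decreases it further.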

See also

  • Backward regression
  • Forward regression

References

  • Efroymson, M. A. (1960). "Multiple regression analysis". In Ralston, A. and Wilf, H. S., editors, Mathematical Methods for Digital Computers. Wiley.