Regularization (machine learning)

In statistics and machine learning, regularization is any method of preventing a model from overfitting the data. It is also used to solve ill-conditioned parameter-estimation problems. Typical examples of regularization in statistical machine learning include ridge regression, the lasso, and the L2-norm penalty in support vector machines.
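
As a sketch of the general idea (a common formulation, not tied to any single method), many regularized estimators for the linear model minimize a data-fit term plus a penalty term weighted by a tuning parameter \lambda \ge 0, for example

    \hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \|Y - X\beta\|_2^2 + \lambda \|\beta\|_2^2,
    \hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \|Y - X\beta\|_2^2 + \lambda \|\beta\|_1,

where larger values of \lambda shrink the estimated coefficients more strongly.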

Regularization methods are also used for model selection, where they work by implicitly or explicitly penalizing models according to the number of their parameters. For example, Bayesian learning methods make use of a prior probability that (usually) assigns lower probability to more complex models. Well-known model selection techniques include the Akaike information criterion (AIC), minimum description length (MDL), and the Bayesian information criterion (BIC). Alternative methods of controlling overfitting include cross-validation.
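
For concreteness, the usual definitions of two of these criteria are, with \hat{L} the maximized likelihood of the model, k its number of parameters, and n the number of observations,

    \mathrm{AIC} = 2k - 2\ln\hat{L},
    \mathrm{BIC} = k\ln n - 2\ln\hat{L},

so both reward goodness of fit while charging a cost that grows with model complexity.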

Examples of applying different regularization methods to the linear model are given in the table below; a minimal code sketch of the ridge case follows the table:

Model                  Fit measure                      Entropy measure
AIC/BIC                \|Y-X\beta\|_2                   \|\beta\|_0
Ridge regression       \|Y-X\beta\|_2                   \|\beta\|_2
Lasso[1]               \|Y-X\beta\|_2                   \|\beta\|_1
RLAD[2]                \|Y-X\beta\|_1                   \|\beta\|_1
Dantzig Selector[3]    \|X^\top (Y-X\beta)\|_\infty     \|\beta\|_1
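
As an illustration of the ridge regression row above, the following is a minimal NumPy sketch (the data X, Y and the penalty weight lam are placeholders, not taken from the article) that computes the ridge estimate from its closed form \hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top Y:

    import numpy as np

    def ridge_fit(X, Y, lam):
        """Closed-form ridge regression: minimizes ||Y - X b||_2^2 + lam * ||b||_2^2."""
        n_features = X.shape[1]
        # Solve (X^T X + lam * I) b = X^T Y instead of forming an explicit inverse.
        A = X.T @ X + lam * np.eye(n_features)
        return np.linalg.solve(A, X.T @ Y)

    # Usage on synthetic data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    beta_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
    Y = X @ beta_true + 0.1 * rng.normal(size=100)
    print(ridge_fit(X, Y, lam=1.0))

Solving the regularized linear system directly, rather than inverting the matrix, is the standard numerically stable choice.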

References

  1. ^ Tibshirani, Robert (1996). "Regression Shrinkage and Selection via the Lasso". Journal of the Royal Statistical Society, Series B (Methodological) 58 (1): 267–288.
  2. ^ Wang, Li; Gordon, Michael D.; Zhu, Ji (December 2006). "Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning". Sixth International Conference on Data Mining: 690–700. doi:10.1109/ICDM.2006.134.
  3. ^ Candes, Emmanuel; Tao, Terence (2007). "The Dantzig selector: Statistical estimation when p is much larger than n". Annals of Statistics 35 (6): 2313–2351. doi:10.1214/009053606000001523.