Generalized additive model

In statistics, a generalized additive model (GAM) is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions. GAMs were originally developed by Trevor Hastie and Robert Tibshirani^[1] to blend properties of generalized linear models with additive models.

The model relates a univariate response variable, Y, to some predictor variables, x_i. An exponential family distribution is specified for Y (for example normal, binomial or Poisson distributions) along with a link function g (for example the identity or log functions) relating the expected value of Y to the predictor variables via a structure such as

g(\operatorname{E}(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_m(x_m).

The functions f_i(x_i) may have a specified parametric form (for example a polynomial, or a coefficient depending on the levels of a factor variable), or may be specified non-parametrically or semi-parametrically, simply as 'smooth functions' to be estimated by non-parametric means. A typical GAM might therefore use a scatterplot smoothing function, such as a locally weighted mean, for f_1(x_1), and a factor model for f_2(x_2). This flexibility to allow non-parametric fits, with relaxed assumptions on the actual relationship between response and predictor, provides the potential for better fits to data than purely parametric models, but arguably with some loss of interpretability.
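As a minimal sketch of the structure above, the following Python snippet evaluates the additive predictor for a single observation and maps it back to the mean scale through a log link. The component functions, their coefficients, the factor levels and the intercept value are all hypothetical and chosen only for illustration; in a real GAM they would be estimated from data.

```python
import math

def additive_predictor(beta0, components, x):
    """Return beta0 + f_1(x_1) + ... + f_m(x_m) for one observation x."""
    return beta0 + sum(f(xj) for f, xj in zip(components, x))

# Hypothetical component functions (assumptions, not estimated from data).
def f1(x1):
    # A term with a specified parametric form: a quadratic polynomial.
    return 0.5 * x1 - 0.1 * x1 ** 2

def f2(level):
    # A factor term: one coefficient per level of a categorical predictor.
    return {"a": 0.0, "b": 0.3, "c": -0.2}[level]

def inverse_log_link(eta):
    # With a log link, g(mu) = log(mu), so mu = exp(eta).
    return math.exp(eta)

eta = additive_predictor(1.0, [f1, f2], (2.0, "b"))  # linear-predictor scale
mu = inverse_log_link(eta)                           # expected value of Y
```

Each predictor enters only through its own function, so the contribution of x_1 to the predictor can be inspected or plotted separately from that of x_2.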

Estimation

The original GAM estimation method was the backfitting algorithm,^[1] which provides a very general modular estimation method capable of using a wide variety of smoothing methods to estimate the f_i(x_i). A disadvantage of backfitting is that it is difficult to integrate with well-founded methods for choosing the degree of smoothness of the f_i(x_i). As a result, alternative methods have been developed in which smooth functions are represented semi-parametrically, using penalized regression splines,^[2] in order to allow computationally efficient estimation of the degree of smoothness of the model components using generalized cross-validation^[3] or similar criteria.
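The modular character of backfitting can be illustrated with a short Python sketch for the simplest case, a normal response with identity link. Each component is updated in turn by smoothing the partial residuals (the response minus the intercept and all other components) against its own predictor. The smoother used here, a Gaussian-kernel weighted mean with a fixed bandwidth, is an assumption made for brevity, as are the simulated data and the fixed number of iterations; any other scatterplot smoother could be substituted for `kernel_smooth`.

```python
import math
import random

def kernel_smooth(x, r, bandwidth):
    """Gaussian-kernel weighted mean of r against x, evaluated at the points x."""
    fitted = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        fitted.append(sum(wi * ri for wi, ri in zip(w, r)) / sum(w))
    return fitted

def backfit(columns, y, bandwidth=0.3, n_iter=20):
    """Backfitting for y = beta0 + sum_j f_j(x_j) + noise (identity link)."""
    n, m = len(y), len(columns)
    beta0 = sum(y) / n                      # intercept: mean of the response
    f = [[0.0] * n for _ in range(m)]       # component values at the data points
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            # Partial residuals: remove the intercept and every other component.
            partial = [
                y[i] - beta0 - sum(f[k][i] for k in range(m) if k != j)
                for i in range(n)
            ]
            fj = kernel_smooth(xj, partial, bandwidth)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]  # centre each component
    return beta0, f

# Simulated data (hypothetical): two smooth effects plus normal noise.
random.seed(1)
n = 200
x1 = [random.uniform(-2, 2) for _ in range(n)]
x2 = [random.uniform(-2, 2) for _ in range(n)]
y = [1.0 + math.sin(a) + 0.5 * b ** 2 + random.gauss(0, 0.2)
     for a, b in zip(x1, x2)]

beta0, f = backfit([x1, x2], y)
fitted = [beta0 + f[0][i] + f[1][i] for i in range(n)]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
tss = sum((yi - beta0) ** 2 for yi in y)                # total sum of squares
```

The bandwidth is fixed in advance and has to be supplied for every component. The loop itself offers no criterion for choosing it, which is the difficulty with smoothness selection noted above.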

Overfitting can be a problem with GAMs.^[4] The number of smoothing parameters can be specified, and it should be kept reasonably small, certainly well below the degrees of freedom offered by the data. Cross-validation can be used to detect or reduce overfitting with GAMs (as with other statistical methods). Other models, such as GLMs, may be preferable unless a GAM substantially improves predictive ability on validation sets for the application in question.
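How cross-validation exposes overfitting can be sketched in Python with a single smooth term standing in for a full GAM. The kernel smoother, the three candidate bandwidths, the simulated data and the fold assignment are all assumptions chosen for illustration. The in-sample error keeps falling as the fit becomes less smooth, whereas the k-fold cross-validated error penalizes a fit that merely follows the noise.

```python
import math
import random

def kernel_predict(x_train, y_train, x_new, bandwidth):
    """Gaussian-kernel weighted mean of y_train, evaluated at each point of x_new."""
    preds = []
    for x0 in x_new:
        w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x_train]
        preds.append(sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w))
    return preds

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def cv_mse(x, y, bandwidth, k=5):
    """k-fold cross-validated mean squared prediction error."""
    n = len(x)
    sq_err = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        preds = kernel_predict([x[i] for i in train], [y[i] for i in train],
                               [x[i] for i in held_out], bandwidth)
        sq_err += sum((y[i] - p) ** 2 for i, p in zip(held_out, preds))
    return sq_err / n

# Simulated data (hypothetical): a sine curve observed with normal noise.
random.seed(0)
n = 100
x = [i / (n - 1) for i in range(n)]
y = [math.sin(2 * math.pi * xi) + random.gauss(0, 0.3) for xi in x]

bandwidths = [0.005, 0.05, 1.0]   # very wiggly, moderate, nearly constant
train_err = {h: mse(y, kernel_predict(x, y, x, h)) for h in bandwidths}
cv_err = {h: cv_mse(x, y, h) for h in bandwidths}
```

Comparing `train_err` with `cv_err` across the bandwidths shows the gap between in-sample and out-of-sample error, and a large gap at the smallest bandwidth is the signature of overfitting.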

References

1. Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman & Hall/CRC. ISBN 978-0-412-34390-2.
2. Wood, S. N. (2006). Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC. ISBN 978-1-58488-474-3.
3. Wood, S. N. (2000). "Modelling and smoothing parameter estimation with multiple quadratic penalties". Journal of the Royal Statistical Society: Series B 62 (2): 413–428.
4. Wood, S. N. (2008). "Fast stable direct fitting and smoothness selection for generalized additive models". Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70 (3): 495–518. doi:10.1111/j.1467-9868.2007.00646.x.

External links

gam, an R package for GAMs by backfitting
mgcv, an R package for GAMs using penalized regression splines
