Variance inflation factor

From Wikipedia, the free encyclopedia

In statistics, the variance inflation factor (VIF) is a measure of the severity of multicollinearity. More precisely, the VIF is an index that measures how much the variance of an estimated regression coefficient (the square of its standard error) is increased because of collinearity. Consider the following regression equation with k independent variables:

Y = β0 + β1 X1 + β2 X2 + ... + βk Xk + ε

VIF can be calculated in three steps:


Step one

One can calculate k different VIFs, one for each Xi, by first running an ordinary least squares regression with Xi as a function of all the other explanatory variables in the first equation.
If i = 1, for example, the equation would be

X1 = c0 + α2 X2 + α3 X3 + ... + αk Xk + e

where c0 is a constant and e is the error term.
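
The auxiliary regression of this step can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the function name auxiliary_r_squared and the simulated columns x1, x2, x3 are assumptions made for the example, and x3 is deliberately constructed to be nearly a linear combination of the other two.

```python
import numpy as np

def auxiliary_r_squared(X, i):
    """R^2 from regressing column i of X on all other columns plus a constant."""
    X = np.asarray(X, dtype=float)
    target = X[:, i]
    others = np.delete(X, i, axis=1)
    # Prepend a column of ones so the fit includes the constant c0.
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Simulated data (hypothetical): x3 is almost x1 + x2, so it is highly collinear.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

r2_x3 = auxiliary_r_squared(X, 2)
print(f"R^2 for x3 regressed on x1 and x2: {r2_x3:.4f}")
```

An R² close to 1 here signals that the chosen variable is almost fully explained by the other regressors.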

Step two

Then one can calculate the VIF for \hat\beta_i with the following formula:

\mathrm{VIF}(\hat\beta_i) = \frac{1}{1 - R^2_i}

where R^2_i is the coefficient of determination of the regression equation in step one.
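
Steps one and two can be combined into a single loop over the columns of the design matrix. The sketch below assumes NumPy and uses a hypothetical helper named vif together with simulated data; it follows the formula above directly rather than any particular package's implementation.

```python
import numpy as np

def vif(X):
    """Return VIF_i = 1 / (1 - R_i^2) for every column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        target = X[:, i]
        # Auxiliary regression: column i on a constant and all other columns.
        design = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[i] = 1.0 / (1.0 - r2)
    return out

# Simulated data (hypothetical): x3 is almost x1 + x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
print(vifs)
```

Because R^2_i lies between 0 and 1, every VIF is at least 1, and it grows without bound as R^2_i approaches 1.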

Step three

Assess the severity of multicollinearity from the size of VIF(\hat\beta_i). A common rule of thumb is that multicollinearity is high if VIF(\hat\beta_i) > 5. A cutoff value of 10 has also been proposed (see Kutner, Nachtsheim and Neter in the references).

Some software instead reports the tolerance, which is the reciprocal of the VIF, that is, 1 - R^2_i. Which of the two quantities to report is mostly a matter of the researcher's preference.
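
The rules of thumb and the tolerance can be illustrated with a few lines of Python. The helper names tolerance and flag_high_vif and the VIF values in the dictionary are made up for this sketch; the cutoffs 5 and 10 are the ones mentioned above.

```python
def tolerance(vif_value):
    """Tolerance is the reciprocal of the VIF, i.e. 1 - R_i^2."""
    return 1.0 / vif_value

def flag_high_vif(vifs, cutoff=5.0):
    """Return the variables whose VIF exceeds the chosen cutoff."""
    return {name: v for name, v in vifs.items() if v > cutoff}

# Hypothetical VIF values, for illustration only.
vifs = {"x1": 1.2, "x2": 7.5, "x3": 12.0}

flagged_5 = flag_high_vif(vifs, cutoff=5.0)    # stricter rule of thumb
flagged_10 = flag_high_vif(vifs, cutoff=10.0)  # more lenient cutoff
tolerances = {name: tolerance(v) for name, v in vifs.items()}

print(sorted(flagged_5), sorted(flagged_10))
print(tolerances)
```

A VIF cutoff of 5 corresponds to a tolerance below 0.2, and a cutoff of 10 to a tolerance below 0.1.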

Interpretation

The square root of the variance inflation factor indicates how many times larger the standard error of a coefficient is, compared with what it would be if that variable were uncorrelated with the other independent variables in the equation.
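
One way to see this is the standard expression for the sampling variance of an ordinary least squares coefficient, sketched here under the usual assumption of homoskedastic errors. The symbols \sigma^2 (error variance), n (number of observations), X_{ij} (observation j of variable Xi) and \bar{X}_i (sample mean of Xi) are notation introduced for this illustration.

```latex
\operatorname{Var}(\hat\beta_i)
  = \frac{\sigma^2}{\sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2}
    \cdot \frac{1}{1 - R_i^2}
```

The first factor is the variance the coefficient would have if R^2_i were 0, and the second factor is the VIF. Taking square roots shows that the standard error scales with the square root of the VIF.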

Example

If the variance inflation factor of an independent variable were 5.27 (\sqrt{5.27} \approx 2.3), the standard error of the coefficient of that independent variable would be 2.3 times as large as it would be if that independent variable were uncorrelated with the other independent variables.
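
The arithmetic of this example can be checked directly. The short sketch below also backs out the auxiliary R^2 implied by a VIF of 5.27 by inverting the formula VIF = 1 / (1 - R^2); the variable names are chosen for illustration.

```python
import math

vif = 5.27  # the value used in the example above

# Factor by which the standard error is inflated.
se_ratio = math.sqrt(vif)

# R^2 of the auxiliary regression that would produce this VIF.
implied_r2 = 1.0 - 1.0 / vif

print(f"standard error inflated by a factor of {se_ratio:.2f}")
print(f"implied auxiliary R^2: {implied_r2:.3f}")
```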


References

Longnecker, M.T. & Ott, R.L.: "A First Course in Statistical Methods", page 615. Thomson Brooks/Cole, 2004.
Studenmund, A.H.: "Using Econometrics: A Practical Guide", 5th edition, pages 258-259. Pearson International Edition, 2006.
Hair, J.F., Anderson, R., Tatham, R.L. & Black, W.C.: "Multivariate Data Analysis". Prentice Hall: Upper Saddle River, NJ, 2006.
Marquardt, D.W.: "Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear Estimation". Technometrics 12(3), pages 591, 605-607, 1970.
Allison, P.D.: "Multiple Regression: A Primer", page 142. Pine Forge Press: Thousand Oaks, CA, 1999.
Kutner, Nachtsheim, Neter: "Applied Linear Regression Models", 4th edition. McGraw-Hill Irwin, 2004.