Dummy variable

From Wikipedia, the free encyclopedia

This article is not about "dummy variables" as that term is usually understood in mathematics. See free variables and bound variables.

In regression analysis, a dummy variable (also known as an indicator variable) is one that takes the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome. For example, in econometric time series analysis, dummy variables may be used to indicate the occurrence of wars or major strikes. Adding dummy variables usually increases model fit (the coefficient of determination), but at the cost of fewer degrees of freedom and some loss of generality of the model. A model with too many dummy variables fits its sample closely but supports few general conclusions.
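As a minimal sketch of how a single dummy shifts the outcome, the following regresses a series on a constant and one 0/1 flag. The data and the names `growth`, `strike` and `fit_single_dummy` are hypothetical and chosen only for illustration. It relies on the standard result that, with one dummy and a constant, the least-squares intercept is the mean of the group coded 0 and the dummy coefficient is the difference between the two group means.

```python
from statistics import mean

# Hypothetical data: yearly output growth and a 0/1 flag marking strike years.
growth = [2.0, 2.4, 0.5, 2.2, 0.9, 2.6]
strike = [0, 0, 1, 0, 1, 0]


def fit_single_dummy(y, d):
    """Least-squares fit of y on a constant and one 0/1 dummy d.

    With a single dummy, the intercept equals the mean of the d == 0 group
    and the dummy coefficient equals the difference in group means.
    """
    base = mean(v for v, flag in zip(y, d) if flag == 0)
    flagged = mean(v for v, flag in zip(y, d) if flag == 1)
    return base, flagged - base


intercept, shift = fit_single_dummy(growth, strike)
print(f"intercept = {intercept:.2f}, shift in strike years = {shift:.2f}")
```

The coefficient on the dummy is read as the average shift in the outcome when the flagged condition is present, relative to when it is absent.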

Dummy variables may be extended to more complex cases. For example, seasonal effects may be captured by creating a dummy variable for each season. In the fixed effects estimator for panel data, a dummy is created for each cross-sectional unit (e.g. firms or countries) or for each period in a pooled time series. In such regressions, however, either the constant term or one of the dummies has to be removed.
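The seasonal case can be sketched as follows, assuming a regression that keeps its constant term and therefore drops one season as the base category. The function name `seasonal_dummies` and the choice of winter as the base are illustrative assumptions, not part of any particular library.

```python
SEASONS = ["winter", "spring", "summer", "autumn"]


def seasonal_dummies(observations, base="winter"):
    """Encode season labels as 0/1 columns, omitting the base season.

    Returns the column names and one row of dummies per observation.
    An observation in the base season is encoded as all zeros.
    """
    columns = [s for s in SEASONS if s != base]
    rows = [[1 if obs == s else 0 for s in columns] for obs in observations]
    return columns, rows


columns, rows = seasonal_dummies(["winter", "summer", "autumn", "spring"])
for row in rows:
    print(dict(zip(columns, row)))
```

Each remaining coefficient is then interpreted relative to the omitted season.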

When the dummies form a complete set, so that every observation belongs to exactly one dummy category, either the constant term or one of the dummies has to be excluded. If a constant term is included in the regression, it is important to exclude one of the dummy variables, making it the base category against which the others are assessed. If all the dummy variables are included, their sum equals 1 for every observation, which is exactly the regressor X₀ = 1 that multiplies the constant term B₀, resulting in perfect multicollinearity. This is referred to as the dummy variable trap.
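The trap can be checked numerically by comparing the rank of the design matrix with its number of columns. The sketch below uses a small hypothetical three-level category and a hand-written exact-arithmetic rank routine (`matrix_rank` is an illustrative helper, not a library function). With a constant plus all three dummies, the four columns have rank 3, so the coefficients are not uniquely determined. Dropping one dummy restores full column rank.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gauss-Jordan elimination in exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical categorical data with three levels.
categories = ["a", "b", "c", "a", "c", "b"]
levels = ["a", "b", "c"]

# Constant column plus a dummy for every level: the dummies sum to the constant.
full = [[1] + [1 if c == lv else 0 for lv in levels] for c in categories]
# Constant column plus dummies for all levels except the base level "a".
reduced = [[1] + [1 if c == lv else 0 for lv in levels[1:]] for c in categories]

full_rank = matrix_rank(full)
reduced_rank = matrix_rank(reduced)
print(f"full: {len(full[0])} columns, rank {full_rank}")
print(f"reduced: {len(reduced[0])} columns, rank {reduced_rank}")
```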