Unit-weighted regression

From Wikipedia, the free encyclopedia

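In statistics, unit-weighted regression is perhaps the simplest form of multiple regression analysis, a method in which two or more variables are used to predict the value of an outcome. Instead of estimating a separate weight for each predictor, unit-weighted regression gives every predictor the same weight of one.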

At a conceptual level, the example of weight loss can illustrate the idea of multiple regression. If a group of people join a weight loss program, we might wish to predict who would lose weight. The outcome is weight loss. We might find that those who lost weight were likely to increase their fruit intake, to exercise more, and to substitute low-calorie drinks for sugary drinks. The point is that several variables can be considered at the same time for their effect on an outcome of interest.

1 Beta weights
2 Model specification
3 Unit weights
4 Literature review
5 Example
6 Citations
7 External links

Beta weights

In the standard form of multiple regression, each predictor is multiplied by a number called its beta weight. The prediction is obtained by adding these products (and usually a constant as well). In the weight loss example above, suppose that reducing sugary drinks led to twice as much weight loss as each of the other variables. In that case, the beta weight for reducing sugary drinks would be twice as large as the weights for the other variables.
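
The weighted-sum prediction can be sketched in a few lines of Python. The predictor names, the weights (with reducing sugary drinks weighted twice as heavily as each other predictor), and the person's z scores are all hypothetical; setting every weight to one gives the unit-weighted prediction for comparison.

```python
def predict(values, weights, constant=0.0):
    """Prediction = constant + sum of (weight * predictor value)."""
    return constant + sum(weights[name] * values[name] for name in weights)

# Hypothetical beta weights: cutting sugary drinks counts twice as much as each other predictor.
beta_weights = {"fruit": 0.5, "exercise": 0.5, "fewer_sugary_drinks": 1.0}
# Unit weights: every predictor gets a weight of one.
unit_weights = {name: 1.0 for name in beta_weights}

person = {"fruit": 1.0, "exercise": 0.0, "fewer_sugary_drinks": 2.0}  # hypothetical z scores
print(predict(person, beta_weights))  # 2.5
print(predict(person, unit_weights))  # 3.0
```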

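When the weights are chosen to give the best prediction by some criterion, the model is called a proper linear model. Standard multiple regression chooses its beta weights by the least-squares criterion (minimizing the sum of squared prediction errors), so it is a proper linear model. By contrast, unit-weighted regression, whose weights are fixed at one rather than estimated from the data, is called an improper linear model.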
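
To make the contrast concrete, here is a minimal pure-Python sketch of how a proper linear model picks its weights by least squares, by solving the normal equations. The data and function names are hypothetical, and a real analysis would use a statistics package; an improper unit-weighted model would skip this estimation step entirely and set every weight to one.

```python
def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting; returns x such that a x = b."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def least_squares_weights(rows, y):
    """Constant and weights that minimize the sum of squared errors (normal equations)."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 column carries the constant
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data built exactly as y = 1 + 2*x1 + 1*x2, so least squares recovers those weights.
rows = [(0, 1), (1, 0), (2, 2), (3, 1), (4, 3)]
y = [2, 3, 7, 8, 12]
print([round(w, 6) for w in least_squares_weights(rows, y)])  # [1.0, 2.0, 1.0]
```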

Model specification

Standard multiple regression rests on a major assumption: that all the important predictors are in the equation. This is the assumption of correct model specification. A model is correctly specified when all the relevant predictors are in the equation and no irrelevant predictors are.

However, in the social sciences, it is rare for a study to include all the important predictors of a behavioral outcome. Therefore, most models are misspecified, and when a model is misspecified, the estimates of the beta weights are not accurate. Adding or removing a single variable can make the beta weights fluctuate wildly; this instability is sometimes called the problem of bouncing betas. It is this problem that makes unit-weighted regression a useful method.
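
With two standardized predictors, the beta weights have a closed form, beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12^2), which makes it easy to see how including a second, correlated predictor moves the first predictor's beta weight. The correlations below are hypothetical, chosen only for illustration.

```python
def two_predictor_betas(r_y1, r_y2, r_12):
    """Standardized beta weights for y regressed on x1 and x2, computed from correlations."""
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

# Hypothetical correlations: x1 and x2 both relate to y and strongly to each other.
r_y1, r_y2, r_12 = 0.5, 0.6, 0.8

beta1_alone = r_y1  # with x1 as the only predictor, its standardized beta equals r_y1
beta1, beta2 = two_predictor_betas(r_y1, r_y2, r_12)
print(beta1_alone)      # 0.5
print(round(beta1, 3))  # 0.056
print(round(beta2, 3))  # 0.556
```

Including x2 cuts the beta weight of x1 from 0.5 to about 0.06, even though the correlation of x1 with the outcome is unchanged; a unit-weighted model gives x1 a weight of one either way.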

Unit weights

Unit-weighted regression proceeds in three steps. First, predictors for the outcome of interest are selected; ideally, there should be good empirical or theoretical reasons for the selection. Second, continuous predictor variables are converted to z scores. Third, the predictors are added together, so that each receives a weight of one; the sum is called the variate. This variate is used to predict the outcome, which is also expressed in z scores. The relationship of the variate to the outcome is assessed with the Pearson correlation coefficient r.
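
Steps two and three, plus the Pearson r, can be sketched in Python as follows. This is a minimal illustration rather than a library routine: the function names are hypothetical, step one (choosing predictors) is left to the analyst, and z scores use the population standard deviation (`statistics.pstdev`), a choice that does not affect the correlation.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw values to z scores (population standard deviation)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def unit_weighted_regression(predictors, outcome):
    """predictors: one list of values per selected predictor (step 1 done beforehand)."""
    z_cols = [z_scores(col) for col in predictors]   # step 2: convert to z scores
    variate = [sum(vals) for vals in zip(*z_cols)]   # step 3: add them, weight 1 each
    return variate, pearson_r(variate, z_scores(outcome))
```

For example, `unit_weighted_regression([[1, 2, 3, 4], [2, 4, 6, 8]], [10, 20, 30, 40])` returns a variate that sums to zero and an r of 1 (up to rounding), because both predictors and the outcome rise in lockstep.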

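One small variation on unit-weighted regression is to set each weight not to one, but to one divided by the number of predictors. Thus, with three predictors, the weight of each variable is 1/3; with four predictors, the weight is 1/4; and so on. The value of this variation is that the variate becomes the average of the z scores, so it stays on the same scale as the individual predictors. Strictly speaking, it is not itself a z score: its mean is zero, but its standard deviation is below one unless the predictors are perfectly correlated. Because rescaling does not change a correlation, this variation gives the same Pearson r with the outcome as the simple sum.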
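
A quick numerical check of the averaged variant on hypothetical data: averaging the z scores leaves the variate with mean zero, but with two predictors correlated at 0.6 its standard deviation is about 0.894 (the square root of 0.8) rather than one, while dividing by the number of predictors leaves the correlation with the outcome unchanged. Function names are illustrative, and z scores use the population standard deviation.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize using the population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def summed_and_averaged(predictors):
    """Unit-weighted variate (weights 1) and averaged variate (weights 1/k)."""
    z_cols = [z_scores(col) for col in predictors]
    k = len(z_cols)
    summed = [sum(vals) for vals in zip(*z_cols)]
    averaged = [s / k for s in summed]
    return summed, averaged

# Hypothetical data: the two predictors correlate 0.6.
predictors = [[1, 2, 3, 4], [2, 1, 4, 3]]
outcome = [3, 1, 4, 1]
summed, averaged = summed_and_averaged(predictors)
print(round(pstdev(averaged), 3))  # 0.894, i.e. sqrt(0.8), not 1
print(abs(pearson_r(summed, outcome) - pearson_r(averaged, outcome)) < 1e-9)  # True
```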

A second variation occurs when predictors are binary. In this case, each predictor is scored as one (present) or zero (absent), and the variate is simply the number of predictors present.

Literature review

The idea of unit-weighted regression was introduced in 1938 by Samuel Stanley Wilks, a leading statistician with a special interest in multivariate analysis. Wilks described how unit weights could be used in practical settings where data were not available to estimate beta weights. For example, a small college may want to select good students for admission but have no money to gather data and conduct a standard multiple regression analysis. In this case, the school could combine several predictors, such as high school grades, SAT scores, and teacher ratings, giving each a unit weight. Wilks showed mathematically why unit weights should work well in practice.
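
A small calculation in the spirit of Wilks's argument (not a reproduction of his derivation): when predictors are positively intercorrelated, a unit-weighted composite and a composite with quite different positive weights are themselves highly correlated. The sketch assumes four z-scored predictors that all correlate 0.5 with one another and uses a hypothetical unequal weight vector; the correlation between the weighted sums a·z and b·z follows from the predictors' correlation matrix R as a'Rb / sqrt((a'Ra)(b'Rb)).

```python
def composite_correlation(a, b, R):
    """Correlation between weighted sums a.z and b.z of z-scored variables with correlation matrix R."""
    def cov(u, v):
        return sum(u[i] * R[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5

k, rho = 4, 0.5  # assumed: four predictors, every pair correlated 0.5
R = [[1.0 if i == j else rho for j in range(k)] for i in range(k)]
unit = [1.0] * k
unequal = [0.5, 1.0, 1.5, 2.0]  # hypothetical weights; the largest is 4x the smallest
print(round(composite_correlation(unit, unequal, R), 3))  # 0.981
```

Even with weights that differ by a factor of four, the two composites correlate at about 0.98, so ranking applicants by either would give very similar results.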

Frank Schmidt in 1971 conducted a simulation study of unit weights. His results showed that Wilks was indeed correct and that unit weights tend to perform well in simulations of practical studies.

Robyn Dawes in 1979 discussed the use of unit weights in applied studies, referring to "the robust beauty of improper linear models". Jacob Cohen in 1990 also discussed the value of unit weights and noted their practical utility. Indeed, he wrote, "As a practical matter, most of the time, we are better off using unit weights" (p. 1306).

Dave Kerby in 2003 showed that unit weights compare well with standard regression, using a cross-validation study: he derived beta weights in one sample and applied them to a second sample. The outcome of interest was suicidal thinking, and the predictor variables were broad personality traits (the Big Five). Kerby also showed how classification and regression tree (CART) analysis could be combined with unit weights to further simplify unit-weighted regression. In this approach, the variate is merely a unit-weighted count of the significant predictors.
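
A cross-validation comparison of the kind described above can be sketched with simulated data. Everything here is an assumption for illustration: the data-generating model (two predictors correlated at 0.5, true weights 0.4 and 0.3, unit-variance noise), the sample size of 60, the seed, and the function names. It does not reproduce Kerby's data or results, and which composite correlates more highly with the outcome will vary from seed to seed.

```python
import random
from statistics import mean, pstdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate(n, rng):
    """Hypothetical data: two predictors correlated 0.5 and a noisy outcome."""
    x1, x2, y = [], [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)
        a = (shared + rng.gauss(0, 1)) / 2 ** 0.5  # unit variance
        b = (shared + rng.gauss(0, 1)) / 2 ** 0.5  # unit variance, corr(a, b) = 0.5
        x1.append(a)
        x2.append(b)
        y.append(0.4 * a + 0.3 * b + rng.gauss(0, 1))
    return x1, x2, y

def standardized_betas(x1, x2, y):
    """Two-predictor standardized beta weights from sample correlations."""
    r_y1, r_y2, r_12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom

def cross_validate(n=60, seed=1):
    """Derive beta weights in sample A, then compare both composites in sample B."""
    rng = random.Random(seed)
    x1a, x2a, ya = simulate(n, rng)  # derivation sample
    x1b, x2b, yb = simulate(n, rng)  # validation sample
    b1, b2 = standardized_betas(x1a, x2a, ya)
    # Standardize the validation predictors with the derivation sample's means and SDs.
    m1, s1, m2, s2 = mean(x1a), pstdev(x1a), mean(x2a), pstdev(x2a)
    z1 = [(v - m1) / s1 for v in x1b]
    z2 = [(v - m2) / s2 for v in x2b]
    beta_composite = [b1 * u + b2 * v for u, v in zip(z1, z2)]
    unit_composite = [u + v for u, v in zip(z1, z2)]
    return pearson_r(beta_composite, yb), pearson_r(unit_composite, yb)

r_beta, r_unit = cross_validate()
print(f"beta weights: r = {r_beta:.3f}; unit weights: r = {r_unit:.3f}")
```

Rerunning `cross_validate` with other seeds or sample sizes shows how much the comparison between the two composites varies from sample to sample.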

Example

An example may clarify how unit weights can be useful in practice.

Brenna Bry and colleagues (1982) addressed the question of what causes drug use in adolescents. Previous research had used multiple regression, and with this method it is natural to look for the best predictor, the one with the highest beta weight. One previous study had found that early use of alcohol was the best predictor. Another study had found that alienation from parents was the best predictor. Still another study had found that low grades in school were the best predictor. This failure to replicate was clearly a problem, and one that bouncing betas could explain.

Bry and colleagues suggested a different approach. Instead of looking for the best predictor, they looked at the number of predictors. In other words, they gave a unit weight to each predictor. Their study had six predictors: 1) grades in school, 2) affiliation with religion, 3) age of alcohol use, 4) psychological distress, 5) self-esteem, and 6) alienation from parents. Each risk factor was scored as one (present) or zero (absent). For example, grades in school were scored as one when the grades were Ds or Fs. The results showed that the number of risk factors was a good predictor of drug use: adolescents with more risk factors were more likely to use drugs.
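
Scoring the risk factors in this way can be sketched as follows. Only the grades rule (Ds or Fs count as present) comes from the study description; the identifiers are hypothetical, and the other five factors are assumed to arrive already coded as 1 (present) or 0 (absent), because their cutoffs are not given here.

```python
# Hypothetical identifiers for the six risk factors; each value is 1 (present) or 0 (absent).
RISK_FACTORS = ("grades", "religion", "alcohol_age", "distress", "self_esteem", "parent_alienation")

def score_grades(letter_grade):
    """Grades count as a risk factor (1) when they are Ds or Fs, otherwise 0."""
    return 1 if letter_grade.strip().upper() in ("D", "F") else 0

def risk_count(flags):
    """Unit-weighted variate: every risk factor present adds exactly 1.

    Factors missing from `flags` are treated as absent (0).
    """
    return sum(flags.get(name, 0) for name in RISK_FACTORS)

adolescent = {"grades": score_grades("D"), "distress": 1, "parent_alienation": 1}
print(risk_count(adolescent))  # 3
```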

The model underlying the approach of Bry and colleagues was that drug users do not differ from non-users in any one special way. Rather, they differ in the number of problems they must face. "The number of factors an individual must cope with is more important than exactly what those factors are" (p. 277). Given this model, unit-weighted regression is an appropriate method of analysis.

Citations

Bry, Brenna H., McKeon, P., & Pandina, R. J. (1982). Extent of drug use as a function of number of risk factors. Journal of Abnormal Psychology, volume 91, pages 273-279.
Cohen, Jacob. (1990). Things I have learned (so far). American Psychologist, volume 45, pages 1304-1312.
Dawes, Robyn M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, volume 34, pages 571-582.
Kerby, Dave S. (2003). CART analysis with unit-weighted regression to predict suicidal ideation from Big Five traits. Personality and Individual Differences, volume 35, pages 249-261.
Schmidt, Frank L. (1971). The relative efficiency of regression and simple unit predictor weights in applied differential psychology. Educational and Psychological Measurement, volume 31, pages 699-714.
Wilks, S. S. (1938). Weighting systems for linear functions of correlated variables when there is no dependent variable. Psychometrika, volume 3, pages 23-40.