Regression toward the mean
In statistics, regression toward the mean, sometimes called the regression effect in other disciplines, is a principle relating a measurement that is used to split a population into groups and a second measurement of the groups thereby created. It states that if a group is selected because its members' first measurements were higher (or lower) than the overall average, the group's second measurement is expected to be closer to the overall average than its first. Other things being equal, the degree of regression toward the mean becomes more extreme as the distance of the first measurement from the average becomes larger. And the less reproducible the measurement, that is, the more randomness there is in the quantity measured, the more regression toward the mean is to be expected in the second measurement.
Explanation
Consider, for example, students who take a midterm and a final exam. Students who got an extremely high score on the midterm will probably get a good score on the final exam as well, but we expect their scores, on the whole, to be closer to the average than their midterm scores were. This is because there are more students of unexceptional skill than students of exceptional skill. Because of that imbalance, an exceptional score is more likely to have come from a less-skilled student who got lucky than an unexceptional score is to have come from an exceptionally able student who was unlucky. There are simply more unexceptional students who have a chance of getting lucky than there are exceptional students who might be unlucky.
Of course, any given student is equally likely to have good or bad luck. It is the uneven division of students into two groups that makes some cases of good luck matter, while other cases of bad luck (those of the unexceptional students who also had bad luck) have no effect on the test results of the students who performed well. Dividing the students according to whether or not they tested exceptionally well puts more students into the unexceptional category, so there are more opportunities for good luck to shift someone from the larger group into the smaller one than for bad luck to shift someone from the smaller group into the larger one. Examining only those students who tested well is therefore likely to reveal more good luck than bad. Since some good luck was probably involved in achieving the exceptional midterm scores, that luck cannot be counted on for the final, and the scores of our specially selected group of students will probably be closer to average on the final.
Consider now the group of students who did not get an exceptional score on the midterm. Clearly, their average score is lower than the average score of all students. But luck tends to even out: there is as much good luck as bad luck overall, and the high-scoring group has already been shown to contain a surplus of lucky students, so the low-scoring group must contain a preponderance of unlucky students. Their bad luck will not last either, and the low-scoring group will score better on the final exam, on average, closer to the overall student average than they did on the midterm. This group of students also regresses toward the mean.
The change in the unexceptional students' average score, however, will not be as large as that of the students who did exceptionally well on the midterm. This makes sense: the group of students who did not get an exceptional score is the larger group, so it is harder to make a big change to its overall average. The "quantity of luck", good or bad, in the two groups is the same, so when that luck is distributed among the larger number of students, each student is less affected. Another way to approach the same notion is to note that the average score of the students who performed unexceptionally on the midterm is not nearly as far from the overall average as that of the students who performed exceptionally. Equivalent proportional changes in the two groups therefore represent very different absolute changes: if the overall average is 70, a small group averaging 90 that moves halfway back to the mean changes by 10 points, while a large group averaging 66 that moves halfway changes by only 2.
What makes regression toward the mean "work", in these examples and in general, is the re-testing of a select group, a group chosen specifically for its non-averageness. Note that the order of the testing does not matter: among those who get exceptionally high final exam scores, the average midterm score will not have been as far above average as the final exam score, since some of those students obtained high scores on the final through luck that they did not have on the midterm. Nor does it matter whether the select group is above or below the average; unusually low scores also regress toward the mean. Thus, if students who obtain a very low midterm score are selected, their average on the final exam is expected to go up from their low midterm score and be closer to the average for all students.
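This mechanism can be made concrete with a small simulation. The sketch below (in Python; the cohort size, the score scale centred on 70, and the equal spreads of skill and luck are illustrative assumptions, not figures from the text) models each exam score as a fixed skill plus independent luck, then retests the top midterm scorers:

```python
import random

random.seed(42)

N = 100_000  # hypothetical cohort size

# Each student has a fixed skill; each exam adds independent "luck".
skill = [random.gauss(70, 10) for _ in range(N)]
midterm = [s + random.gauss(0, 10) for s in skill]
final = [s + random.gauss(0, 10) for s in skill]

avg = lambda xs: sum(xs) / len(xs)

# Select the students who scored in the top 10% on the midterm.
cutoff = sorted(midterm)[int(0.9 * N)]
top = [i for i in range(N) if midterm[i] >= cutoff]

print(f"all students:  midterm {avg(midterm):.1f}, final {avg(final):.1f}")
print(f"top 10% group: midterm {avg([midterm[i] for i in top]):.1f}, "
      f"final {avg([final[i] for i in top]):.1f}")
# The selected group's final average falls back toward the overall mean,
# even though no student's underlying skill changed between the exams.
```

With these assumptions, where luck accounts for half the score variance, the selected group lands roughly halfway back toward the overall mean on the final; the fraction recovered is governed by how much of the variance is luck.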
The regression effect also explains the commonplace observation that the offspring of two championship athletes, or of two geniuses, is usually a child who is above average but less talented than either parent. In such cases the tested population, the parents, who are in effect being tested for the ability to have talented children, is preselected to be non-average with respect to getting a good combination of genes, having a healthy developmental environment, a suitable personality, and so forth. Just as in academic test-taking, some luck is involved in being above average in these respects.
History
The first regression line drawn on biological data was a plot of seed weights presented by Francis Galton at a Royal Institution lecture in 1877. Galton had seven sets of sweet pea seeds labelled K to Q, and in each packet the seeds were of the same weight. He chose sweet peas on the advice of his cousin Charles Darwin and the botanist Joseph Dalton Hooker, as sweet peas tend not to self-fertilise and seed weight varies little with humidity. He distributed these packets to a group of friends throughout Great Britain, who planted them. At the end of the growing season the plants were uprooted and returned to Galton. The seeds were distributed because when Galton had tried this experiment himself at Kew Gardens in 1874, the crop had failed.
He found that the weights of the offspring seeds were normally distributed, like those of their parents, and that if he plotted the mean diameter of the offspring seeds against the mean diameter of their parents he could draw a straight line through the points: the first regression line. He also found on this plot that the mean size of the offspring seeds tended toward the overall mean size. He initially referred to the slope of this line as the "coefficient of reversion". Once he discovered that the effect was not a heritable property but a consequence of his manipulations of the data, he changed the name to the "coefficient of regression". The result was important because it appeared to conflict with contemporary thinking on evolution and natural selection. He went on to do extensive work in quantitative genetics, and in 1888 coined the term "co-relation" and used the now-familiar symbol "r" for this value.
In additional work he investigated geniuses in various fields and noted that their children, while typically gifted, were almost invariably closer to the average than their exceptional parents. He later described the same effect more numerically by comparing fathers' heights to their sons' heights: the heights of sons both of unusually tall fathers and of unusually short fathers were typically closer to the mean height than their fathers' heights.
Ubiquity
It is important to realize that regression toward the mean is unrelated to the progression of time: the fathers of exceptionally tall people also tend to be closer to the mean than their sons. The overall variability of height among fathers and sons is the same.
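A small sketch (Python; the correlation of 0.5 and the centimetre height scale are illustrative assumptions, not values from Galton's data) shows this symmetry directly: selecting exceptionally tall fathers, or exceptionally tall sons, pulls the other variable toward the mean in just the same way.

```python
import random

random.seed(1)

r, mu, sigma, N = 0.5, 175.0, 7.0, 200_000  # assumed correlation and scale (cm)

pairs = []
for _ in range(N):
    # Jointly normal father/son heights with correlation r and equal variance,
    # built from a shared component plus independent noise.
    shared = random.gauss(0, 1)
    father = mu + sigma * (r**0.5 * shared + (1 - r)**0.5 * random.gauss(0, 1))
    son = mu + sigma * (r**0.5 * shared + (1 - r)**0.5 * random.gauss(0, 1))
    pairs.append((father, son))

avg = lambda xs: sum(xs) / len(xs)

tall_fathers = [(f, s) for f, s in pairs if f > mu + 2 * sigma]
tall_sons = [(f, s) for f, s in pairs if s > mu + 2 * sigma]

print("tall fathers: father avg %.1f, son avg %.1f"
      % (avg([f for f, _ in tall_fathers]), avg([s for _, s in tall_fathers])))
print("tall sons:    father avg %.1f, son avg %.1f"
      % (avg([f for f, _ in tall_sons]), avg([s for _, s in tall_sons])))
# Whichever generation is selected for extremes, the other generation is
# closer to the mean: the effect is symmetric and unrelated to time's arrow.
```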
In its original formulation, regression toward the mean concerns two correlated measurements of the same trait with the same reliability. This restriction is not necessary, however: regression occurs between any pair of predicting and predicted variables, whether or not they can be viewed as measurements of one underlying trait. The implicit assumption that is required is that the standard deviations of the predicting and predicted variables are comparable, or that the variables have been transformed or interpreted so as to be comparable.
A later formulation attributes regression toward the mean to measurement error in the predicting variable, which attenuates the regression coefficient. This interpretation is not necessary either; in the original height example, for instance, the measurement error in length is negligible.
Mathematical derivation
Let X and Y be zero-mean jointly Gaussian random variables with the same variance and correlation coefficient r. The Cauchy–Schwarz inequality shows that |r| ≤ 1. By Gaussianity, the expected value of Y conditioned on the value of X is linear in X; more precisely, E[Y|X] = rX, so for |r| < 1 the estimated value of Y is closer to the mean 0 than the observed value X. Similar results can be obtained for more general classes of distributions. For example, let (X, Y) be jointly normal as above, and define W = AX and Z = AY, where A is any absolutely integrable scalar random variable independent of X and Y. The variables W and Z have zero mean but are not Gaussian. Nevertheless, it is possible to prove that the linear regression property still holds, E[Z|W] = rW, and once again regression toward the mean is observed.
The example illustrates a general feature: regression toward the mean is more pronounced the less the two variables are correlated, i.e. the smaller |r| is.
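For reference, the conditional-expectation step above is the standard conditional-mean formula for the bivariate normal distribution; written out in LaTeX, specializing the general formula to the zero-mean, equal-variance case:

```latex
% Conditional mean of a bivariate normal pair (standard result),
% specialized to E[X] = E[Y] = 0 and Var(X) = Var(Y) = sigma^2.
\[
  \mathbb{E}[Y \mid X]
    = \mu_Y + r\,\frac{\sigma_Y}{\sigma_X}\,(X - \mu_X)
    = 0 + r\,\frac{\sigma}{\sigma}\,(X - 0)
    = rX .
\]
% For |r| < 1 the prediction rX lies strictly between X and the mean 0,
% and it lies the closer to 0 the smaller |r| is:
\[
  \lvert \mathbb{E}[Y \mid X] \rvert
    = \lvert r \rvert\,\lvert X \rvert
    < \lvert X \rvert
  \qquad (\lvert r \rvert < 1,\ X \neq 0).
\]
```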
The phenomenon of regression toward the mean is related to Stein's example.
Regression fallacies
Misunderstandings of the principle (known as "regression fallacies") have repeatedly led to mistaken claims in the scientific literature.
An extreme example is Horace Secrist's 1933 book The Triumph of Mediocrity in Business, in which the statistics professor collected mountains of data to prove that the profit rates of competitive businesses tend toward the average over time. In fact, there is no such effect; the variability of profit rates is almost constant over time. Secrist had only described the ordinary regression toward the mean. One exasperated reviewer likened the book to "proving the multiplication table by arranging elephants in rows and columns, and then doing the same for numerous other kinds of animals".
A different regression fallacy occurs in the following example. We want to test whether a certain stress-reducing drug increases reading skills of poor readers. Pupils are given a reading test. The lowest 10% scorers are then given the drug, and tested again, with a different test that also measures reading skill. We find that the average reading score of our group has improved significantly. This however does not show anything about the effectiveness of the drug: even without the drug, the principle of regression toward the mean would have predicted the same outcome. (The solution is to introduce a control group, compare results between the group to which drugs were administered and the control group, and make no comparisons with the original population. This removes the bias between the groups compared.)
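The size of this spurious "improvement" is easy to demonstrate. The sketch below (Python; the population size, the test scale, and the noise level are illustrative assumptions) retests the bottom decile without administering any drug at all:

```python
import random

random.seed(7)

N = 50_000
ability = [random.gauss(100, 15) for _ in range(N)]  # true reading ability

# Two equivalent tests: each observed score is ability plus independent noise.
test1 = [a + random.gauss(0, 10) for a in ability]
test2 = [a + random.gauss(0, 10) for a in ability]  # no drug administered

cutoff = sorted(test1)[int(0.1 * N)]
poor = [i for i in range(N) if test1[i] <= cutoff]  # lowest 10% on test 1

avg = lambda xs: sum(xs) / len(xs)
print(f"bottom 10% group: test 1 {avg([test1[i] for i in poor]):.1f}, "
      f"test 2 {avg([test2[i] for i in poor]):.1f}")
# The group "improves" markedly on the second test with no intervention,
# which is exactly the gain the fallacious study would credit to the drug.
```

A control group drawn from the same bottom decile would show this same untreated gain, which is why the valid comparison is between the treated and control groups rather than against the original population.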
The calculation and interpretation of "improvement scores" on standardized educational tests in Massachusetts probably provides another example of the regression fallacy. In 1999, schools were given improvement goals. For each school, the Department of Education tabulated the difference in the average score achieved by students in 1999 and in 2000. It was quickly noted that most of the worst-performing schools had met their goals, which the Department of Education took as confirmation of the soundness of their policies. However, it was also noted that many of the supposedly best schools in the Commonwealth, such as Brookline High School (with 18 National Merit Scholarship finalists) were declared to have failed. As in many cases involving statistics and public policy, the issue is debated, but "improvement scores" were not announced in subsequent years and the findings appear to be a case of regression to the mean.
In sports
Statistical analysts have long recognized the effect of regression to the mean in sports; they even have a special name for it: the "sophomore slump". For example, Carmelo Anthony of the NBA's Denver Nuggets had an outstanding rookie season in 2004. It was so outstanding, in fact, that he could not be expected to repeat it: in 2005, Anthony's numbers dropped slightly from his torrid rookie season. The reasons offered for the "sophomore slump" abound, as sports are all about adjustment and counter-adjustment, but luck-based excellence as a rookie is as good a reason as any.

Of course, not just "sophomores" experience regression to the mean. Any athlete who posts a significant outlier, whether as a rookie (young players are generally not as good as those in their prime seasons) or after their prime years (for most sports, the mid to late twenties), can be expected to perform more in line with their established standards of performance. The trick for sports executives, then, is to determine whether a player's play in the previous season was indeed an outlier or whether the player has established a new level of play. This is not easy. Melvin Mora of the Baltimore Orioles put up a season in 2003, at age 31, so far above his performance in prior seasons that analysts assumed it had to be an outlier... but in 2004, Mora was even better. Mora, then, had truly established a new level of production, though he will likely regress toward his more reasonable 2003 numbers in 2005. Conversely, Kurt Thomas of the New York Knicks significantly ramped up his production in 2001, at an age (29) when players typically begin to decline. Sure enough, in the following season Thomas was his old self again, having regressed to the mean of his established level of play. John Hollinger has an alternative name for regression to the mean: the "fluke rule". Whatever one calls it, though, regression to the mean is a fact of life, and also of sports.
Regression to the mean in sports performance is, in all probability, the origin of the "Sports Illustrated jinx" superstition. Athletes believe that appearing on the cover of Sports Illustrated jinxes their future performance, when the apparent jinx is simply an artifact of regression.
References
- J. M. Bland and D. G. Altman. "Statistics Notes: Regression towards the mean", British Medical Journal 308:1499, 1994. (Article, including a diagram of Galton's original data, online at: [1])
- Francis Galton. "Regression Towards Mediocrity in Hereditary Stature", Journal of the Anthropological Institute 15:246-263, 1886. (Facsimile at: [2])
- Stephen M. Stigler. Statistics on the Table, Harvard University Press, 1999. (See Chapter 9.)
External links
- A non-mathematical explanation of regression toward the mean.
- A simulation of regression toward the mean.
- Amanda Wachsmuth, Leland Wilkinson, Gerard E. Dallal. Galton's Bend: An Undiscovered Nonlinearity in Galton's Family Stature Regression Data and a Likely Explanation Based on Pearson and Lee's Stature Data (A modern look at Galton's analysis.)
- Massachusetts standardized test scores, interpreted by a statistician as an example of regression: see discussion in sci.stat.edu and its continuation.