Simpson's paradox

From Wikipedia, the free encyclopedia

Simpson's paradox (or the Yule-Simpson effect) is a statistical paradox described by E. H. Simpson in 1951^[1] and G. U. Yule in 1903, in which the successes of several groups seem to be reversed when the groups are combined. This seemingly impossible result is encountered surprisingly often in social science and medical statistics, and occurs when a weighting variable which is not relevant to the individual group assessment must be used in the combined assessment.

1 Explanation by example
2 Real-world examples
3 "Lurking variable"
4 See also
5 References
6 External links

Explanation by example

To illustrate the paradox, suppose two people, Lisa and Bart, are let loose on Wikipedia. In the first week, Lisa improves 60 percent of the articles she edits while Bart improves 90 percent of the articles he edits. In the second week, Lisa improves just 10 percent of the articles she edits, while Bart improves 30 percent.

Both times, Bart improved a much higher percentage of articles than Lisa—yet when the two tests are combined, Lisa has improved a much higher percentage than Bart!

	Week 1	Week 2	Total
Lisa	60.0%	10.0%	55.5%
Bart	90.0%	30.0%	35.5%

This strange-looking result comes about because the two people worked on very different numbers of articles each week, information that the table of percentages leaves out. In the first week, Lisa edits 100 articles, improving 60 of them, while Bart edits just 10 articles, improving all but one. In the second week, Lisa edits only 10 articles, improving one, while Bart edits 100 articles, improving 30. When the two weeks' worth of work is combined, both have edited the same number of articles, yet Lisa improved 55% of them (61 in total) while Bart improved only 35% of them (39 in total).

	Week 1	Week 2	Total
Lisa	60 / 100	1 / 10	61 / 110
Bart	9 / 10	30 / 100	39 / 110
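
The totals in the table can be checked with a few lines of Python. This is only a minimal sketch: the names `counts`, `weekly_rates` and `pooled_rate` are illustrative choices, not part of any library, and the counts are the ones given in the table above.

```python
from fractions import Fraction

# (improved, edited) counts per week, taken from the table above.
counts = {
    "Lisa": [(60, 100), (1, 10)],
    "Bart": [(9, 10), (30, 100)],
}

def weekly_rates(weeks):
    """Success rate for each week taken separately."""
    return [Fraction(improved, edited) for improved, edited in weeks]

def pooled_rate(weeks):
    """Success rate over all weeks: total improved / total edited."""
    improved = sum(i for i, _ in weeks)
    edited = sum(e for _, e in weeks)
    return Fraction(improved, edited)

for name, weeks in counts.items():
    per_week = ", ".join(f"{float(r):.1%}" for r in weekly_rates(weeks))
    print(f"{name}: weekly {per_week}; pooled {float(pooled_rate(weeks)):.1%}")
```

The pooled rate is computed from summed counts rather than by averaging the weekly percentages, which is exactly where the two people's different workloads enter.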

To recap, introducing some notation that will be useful later:

In the first week

$S_A(1) = 60\%$ — Lisa improved 60% of the many articles she edited.
$S_B(1) = 90\%$ — Bart had a 90% success rate during that time.

Success is associated with Bart.

In the second week

$S_A(2) = 10\%$ — Lisa managed 10% in her busy life.
$S_B(2) = 30\%$ — Bart achieved a 30% success rate.

Success is associated with Bart.

On both occasions Bart's edits were more successful than Lisa's. But if we combine the two sets, we see that Lisa and Bart both edited 110 articles, and:

$S_A = \begin{matrix}\frac{61}{110}\end{matrix}$ — Lisa improved 61 articles.
$S_B = \begin{matrix}\frac{39}{110}\end{matrix}$ — Bart improved only 39.
$S_A > S_B$, so success is now associated with Lisa.

Bart is better for each set but worse overall!

The arithmetical basis of the paradox is uncontroversial. If $S_B(1) > S_A(1)$ and $S_B(2) > S_A(2)$ we feel that $S_B$ must be greater than $S_A$. However, if different weights are used to form the overall score for each person then this feeling may be disappointed. Here the first test is weighted $\begin{matrix}\frac{100}{110}\end{matrix}$ for Lisa and $\begin{matrix}\frac{10}{110}\end{matrix}$ for Bart while the weights are reversed on the second test.

$S_A = \begin{matrix}\frac{100}{110}\end{matrix}S_A(1) + \begin{matrix}\frac{10}{110}\end{matrix}S_A(2)$

$S_B = \begin{matrix}\frac{10}{110}\end{matrix}S_B(1) + \begin{matrix}\frac{100}{110}\end{matrix}S_B(2)$
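
As a quick check, substituting the weekly rates from the example into these two weighted sums should reproduce the combined totals. The weights cancel the weekly denominators, leaving the raw counts over 110:

```latex
\begin{align*}
S_A &= \tfrac{100}{110}\cdot\tfrac{60}{100} + \tfrac{10}{110}\cdot\tfrac{1}{10}
     = \tfrac{60 + 1}{110} = \tfrac{61}{110} \approx 55.5\% \\
S_B &= \tfrac{10}{110}\cdot\tfrac{9}{10} + \tfrac{100}{110}\cdot\tfrac{30}{100}
     = \tfrac{9 + 30}{110} = \tfrac{39}{110} \approx 35.5\%
\end{align*}
```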

By more extreme reweighting A's overall score can be pushed up towards 60% and B's down towards 30%.
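
The effect of the weights can be explored numerically. The sketch below assumes, as in the example, that Bart's weights mirror Lisa's (if Lisa does a share `w` of her edits in week 1, Bart does a share `1 - w` of his); the function name `overall` and the chosen values of `w` are illustrative only.

```python
def overall(weight_week1, rate_week1, rate_week2):
    """Weighted average of two weekly rates; the two weights sum to 1."""
    return weight_week1 * rate_week1 + (1 - weight_week1) * rate_week2

# Weekly rates from the example (A = Lisa, B = Bart).
S_A1, S_A2 = 0.60, 0.10
S_B1, S_B2 = 0.90, 0.30

# w is the share of Lisa's edits made in week 1. Bart's weights are mirrored
# (an assumption matching the example, where w = 100/110).
for w in (0.5, 100 / 110, 0.99, 0.999):
    s_a = overall(w, S_A1, S_A2)
    s_b = overall(1 - w, S_B1, S_B2)
    print(f"w = {w:.3f}: S_A = {s_a:.3f}, S_B = {s_b:.3f}")
```

With equal weights (`w = 0.5`) Bart stays ahead overall. Under the mirrored-weights assumption the ordering flips once `w` exceeds 8/11 (about 0.727), and as `w` approaches 1 the two scores approach 60% and 30%.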

Who is more accomplished? Lisa and Bart's mutual friends think Lisa is better—her overall success rate is higher. But it is possible to have told the story in a way which would make it appear obvious that Bart is more diligent.

Real-world examples

The batting average paradox

A common example of the paradox in the United States involves batting averages in baseball. It is possible (and on rare occasions it has actually happened) for one player to hit for a higher batting average than another player during the first half of the year, and to do so again during the second half, but to have a lower batting average for the entire year, as shown in this example:

            First Half     Second Half      Total season  
Player A     4/10 (.400)   25/100 (.250)    29/110 (.264)
Player B   35/100 (.350)    2/10  (.200)    37/110 (.336)
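
Written out, the comparison in the table looks as follows. The season figure is formed by adding hits and at-bats separately, not by averaging the two half-season averages, which is why the ordering can flip:

```latex
\begin{align*}
\text{First half:}  \quad & \tfrac{4}{10} = .400 > .350 = \tfrac{35}{100} \\
\text{Second half:} \quad & \tfrac{25}{100} = .250 > .200 = \tfrac{2}{10} \\
\text{Season:}      \quad & \tfrac{4 + 25}{10 + 100} = \tfrac{29}{110} \approx .264
                            < .336 \approx \tfrac{37}{110} = \tfrac{35 + 2}{100 + 10}
\end{align*}
```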

Sports sabermetrician Bill James has called attention to this phenomenon.

A kidney stone treatment example

This is a real-life example from a medical study^[2] comparing the success rates of two treatments for kidney stones.

The first table shows the overall success rates and numbers of treatments for both treatments.

**success rates (successes/total)**
Treatment A	Treatment B
78% (273/350)	83% (289/350)

This seems to show treatment B is more effective. If we include data about kidney stone size, however, the same set of treatments reveals a different answer.

**Results accounting for stone size**
	Small stones	Large stones
Treatment A	Group 1: 93% (81/87)	Group 3: 73% (192/263)
Treatment B	Group 2: 87% (234/270)	Group 4: 69% (55/80)

The information about stone size has reversed our conclusion about the effectiveness of each treatment: treatment A is now seen to be more effective in both cases. In this example, the importance of the lurking variable (or confounding variable), stone size, was not recognized until its effects were included.

Which treatment is considered better is determined by an inequality between two ratios (successes/total). The reversal of the inequality between the ratios, which creates Simpson's paradox, happens because two effects occur together:

The sizes of the groups which are combined when the lurking variable is ignored are very different. Doctors tend to give the severe cases the better treatment (group 3 in the table above), and the milder cases the inferior treatment (group 2). Therefore, the totals are dominated by these two groups, and not by the much smaller groups 1 and 4.
The lurking variable has a large effect on the ratios, i.e. the success rate is more strongly influenced by the severity of the case than by the choice of treatment. Therefore, group 3 does worse than group 2.
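
The reversal of the inequality can be checked directly from the counts in the tables above. The following is a minimal sketch; `rate`, `pooled` and `is_reversal` are illustrative helper names, not functions from any statistics package.

```python
from fractions import Fraction

# (successes, total) per treatment and stone size, from the tables above.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    return Fraction(successes, total)

def pooled(groups):
    """Pool all strata for one treatment: sum successes and sum totals."""
    successes = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return rate(successes, total)

def is_reversal(first, second):
    """True if `first` wins in every stratum but loses (or ties) when pooled."""
    wins_each = all(rate(*first[k]) > rate(*second[k]) for k in first)
    return wins_each and pooled(first) <= pooled(second)

for size in ("small", "large"):
    a, b = rate(*data["A"][size]), rate(*data["B"][size])
    print(f"{size} stones: A {float(a):.0%}, B {float(b):.0%}")
print(f"pooled: A {float(pooled(data['A'])):.0%}, B {float(pooled(data['B'])):.0%}")
print("Simpson reversal:", is_reversal(data["A"], data["B"]))
```

The pooled figures are dominated by the two largest groups (263 large-stone cases for treatment A, 270 small-stone cases for treatment B), which matches the first of the two effects described above.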

The Berkeley sex bias case

One of the best-known real-life examples of Simpson's paradox occurred when U. C. Berkeley was sued for bias against women applying to graduate school. The admission figures showed that men applying were more likely than women to be admitted, and the difference was so large that it was unlikely to be due to chance.^[3] However, when the individual departments were examined, it was found that no department was significantly biased against women; in fact, most departments had a small (and not very significant) bias against men.

The explanation turned out to be that women tended to apply to departments with low rates of admission, while men tended to apply to departments with high rates of admission.

2006 US school study

In July 2006, the United States Department of Education released a study^[4] documenting student performance in reading and math in different school settings. It reported that while the math and reading levels for students at grades 4 and 8 were uniformly higher in private/parochial schools than in public schools, repeating the comparisons on demographic subgroups showed much smaller differences, which were nearly equally divided in direction.

"Lurking variable"

Simpson's paradox shows us an extreme example of the importance of including data about possible confounding variables when attempting to calculate correlations.

The "lurking variable" principle also works with the Electoral College, which determines the winner of United States presidential elections. For example, if Candidate A wins 35 of the states and Candidate B wins 15 of the states, the color-coded map will appear to be a landslide for Candidate A; but if Candidate A's states are less populated and Candidate B's states are more populated, it is still possible for Candidate B to win. The lurking variable is the differing number of electoral votes each state carries.
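
A toy calculation makes the point concrete. The vote counts below are hypothetical, chosen only for illustration (they are not real electoral vote allocations), and they assume every state won by a given candidate carries the same number of votes.

```python
# Hypothetical numbers, chosen only to illustrate the point; they are not
# real electoral vote allocations.
states_won = {"A": 35, "B": 15}
votes_per_state = {"A": 5, "B": 20}  # assumed uniform within each candidate's states

electoral_votes = {c: states_won[c] * votes_per_state[c] for c in states_won}
leader_by_states = max(states_won, key=states_won.get)
leader_by_votes = max(electoral_votes, key=electoral_votes.get)

print("states won:     ", states_won, "->", leader_by_states)
print("electoral votes:", electoral_votes, "->", leader_by_votes)
```

Counting states treats every state as one unit, while the actual tally weights each state by its electoral votes, so the two rankings need not agree.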

See also

Low birth weight paradox, an example of Simpson's paradox in action.

References

^ Simpson, E. H. (1951). "The Interpretation of Interaction in Contingency Tables". Journal of the Royal Statistical Society, Ser. B 13: 238-241.
^ Charig CR, Webb DR, Payne SR, Wickham OE. "Comparison of treatment of renal calculi by operative surgery, percutaneous nephrolithotomy, and extracorporeal shock wave lithotripsy." BMJ 1986;292:879-82.
^ P.J. Bickel, E.A. Hammel and J.W. O'Connell (1975). "Sex Bias in Graduate Admissions: Data From Berkeley". Science 187: 398-404.
^ H. Braun, F. Jenkins and W. Grigg (2006). "Comparing Private Schools and Public Schools Using Hierarchical Linear Modeling". U.S. Department of Education, National Center for Education Statistics, Institute of Education Sciences. Washington, DC: U.S. Government Printing Office.

External links

For a brief history of the origins of the paradox, see the entries on Simpson's Paradox and Spurious Correlation in Earliest known uses of some of the words of mathematics: S.