Anscombe's quartet
Anscombe's quartet comprises four datasets that have nearly identical simple statistical properties yet look very different when graphed. Each dataset consists of eleven (x, y) points. The statistician F. J. Anscombe constructed them in 1973 to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on a dataset's statistical properties.
For all four datasets:
Property | Value |
---|---|
Mean of each x variable | 9.0 |
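Variance of each x variable (population variance, dividing by n = 11) | 10.0 |
Mean of each y variable | 7.5 (to two decimal places) |
Variance of each y variable (population variance, dividing by n = 11) | 3.75 (to two decimal places) |
Correlation between each x and y variable | 0.816 (to three decimal places) |
Linear regression line | y = 3 + 0.5x (coefficients rounded to two decimal places) |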
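The quantities in the table can be written out explicitly. The following is a sketch that assumes the population convention for variance (dividing by n = 11 rather than n - 1), which is the convention that reproduces the values 10.0 and 3.75; the symbols \bar{x}, \sigma_x^2, r, a, and b are notation introduced here for illustration and do not come from the original text.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i,
&\quad
\sigma_x^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \\[4pt]
r &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \,\sum_{i=1}^{n} (y_i - \bar{y})^2}},
&\quad
b &= r\,\frac{\sigma_y}{\sigma_x},
\qquad
a = \bar{y} - b\,\bar{x}.
\end{aligned}
```

Substituting the tabulated values gives b ≈ 0.816 × √(3.75 / 10.0) ≈ 0.50 and a ≈ 7.5 - 0.50 × 9.0 = 3.0, which is consistent with the regression line y = a + bx = 3 + 0.5x.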
Edward Tufte uses the quartet on the first page of the first chapter of his book The Visual Display of Quantitative Information to emphasize the importance of looking at one's data before analyzing it.
The datasets are as follows. The x values are the same for the first three datasets.
I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
---|---|---|---|---|---|---|---|
10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |
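The shared summary statistics can be checked directly from the table. The following is a minimal sketch using only the Python standard library; the names `QUARTET` and `summarize` are hypothetical helpers introduced here, and the variances use the population convention (dividing by n), which is an assumption chosen because it matches the tabulated values 10.0 and 3.75.

```python
from statistics import fmean, pvariance

# Data transcribed from the table above; datasets I-III share the same x values.
X_SHARED = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics of one dataset as a dict.

    Variances and covariance divide by n (population convention).
    """
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x, var_y = pvariance(x), pvariance(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    slope = cov / var_x                  # least-squares slope of y on x
    intercept = mean_y - slope * mean_x  # least-squares intercept
    return {
        "mean_x": mean_x,
        "var_x": var_x,
        "mean_y": mean_y,
        "var_y": var_y,
        "corr": cov / (var_x * var_y) ** 0.5,
        "slope": slope,
        "intercept": intercept,
    }


# Print one row of rounded statistics per dataset.
for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four rows agree only after rounding, which is why the statistics alone cannot distinguish the datasets; plotting each (x, y) pair as a scatter plot is what reveals the differences.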
References
- Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician, 27 (February), 17-21.
- Tufte, Edward R. (2001). The Visual Display of Quantitative Information, 2nd Edition. Cheshire, CT: Graphics Press. ISBN 0961392142.