Anscombe's quartet
From Wikipedia, the free encyclopedia
Anscombe's quartet comprises four datasets that have identical simple statistical properties, yet look very different when inspected graphically. Each dataset consists of eleven (x, y) points. They were constructed in 1973 by the statistician F.J. Anscombe. The quartet demonstrates the effect of outliers on the statistical properties of a dataset, and the importance of looking at data before analysing it.
I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
---|---|---|---|---|---|---|---|
10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |
For all four datasets:
- mean of the x values = 9.0
- mean of the y values = 7.5
- equation of the least-squares regression line: y = 3 + 0.5x
- sum of squares of the x values about their mean = 110.0
- regression sum of squares (variation in y accounted for by x) = 27.5
- residual sum of squares (about the regression line) = 13.75
- correlation coefficient = 0.82
- coefficient of determination = 0.67
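The values listed above can be checked directly from the table. The following is a minimal sketch in plain Python, using the textbook formulas for simple least-squares regression. The names `QUARTET`, `X123` and `summary` are illustrative choices, not taken from Anscombe's paper. The four datasets agree with the listed values only to about the precision shown, not to every decimal place.

```python
from math import sqrt

# Data transcribed from the table above; datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(xs, ys):
    """Return the summary statistics listed in the article for one dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)   # sum of squares of x about its mean
    ss_y = sum((y - mean_y) ** 2 for y in ys)   # total sum of squares of y
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / ss_x                         # least-squares slope
    intercept = mean_y - slope * mean_x         # least-squares intercept
    ss_reg = slope * s_xy                       # regression sum of squares
    ss_res = ss_y - ss_reg                      # residual sum of squares
    r = s_xy / sqrt(ss_x * ss_y)                # correlation coefficient
    return {
        "mean_x": mean_x, "mean_y": mean_y,
        "slope": slope, "intercept": intercept,
        "ss_x": ss_x, "ss_reg": ss_reg, "ss_res": ss_res,
        "r": r, "r_squared": r * r,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summary(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Printing the rounded values side by side makes it easy to see that no single summary number separates the four datasets.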
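When presented graphically, the four datasets are seen to be very different: dataset I scatters about a straight line, dataset II follows a smooth curve, dataset III is almost exactly linear apart from one outlier, and dataset IV has every x equal to 8.0 except for a single point at x = 19.0.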
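As a rough way to look at the data without a plotting library, the sketch below renders each dataset as a crude text scatter plot. The function name `ascii_scatter`, the grid size and the axis ranges are illustrative assumptions. A proper plotting package would give a much clearer picture.

```python
# Data transcribed from the table above; datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ascii_scatter(xs, ys, width=40, height=12, x_range=(0, 20), y_range=(0, 14)):
    """Return a crude text scatter plot; each '*' marks one or more points.

    The default ranges are chosen (as an assumption) to cover all four
    datasets, so the plots share the same axes and can be compared by eye.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"       # grid row 0 is the top of the plot
    body = ["|" + "".join(line).rstrip() for line in grid]
    return "\n".join(body) + "\n+" + "-" * width


for name, (xs, ys) in QUARTET.items():
    print("Dataset", name)
    print(ascii_scatter(xs, ys))
    print()
```

All four plots use the same axis ranges, so differences in shape reflect the data rather than the scaling.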
Edward Tufte uses the quartet on the first page of the first chapter of his book, The Visual Display of Quantitative Information, to emphasise the importance of looking at one's data before analysing it.
External references
- F.J. Anscombe, "Graphs in Statistical Analysis," American Statistician, 27 (February 1973), 17-21.