Compositional data

From Wikipedia, the free encyclopedia

In statistics, compositional data are data in which each data point is an n-tuple of nonnegative numbers that sum to 1. Typically, each component p_i of a data point (p_1, ..., p_n) gives the proportion (or "percentage") of a statistical unit that falls into the i-th of n categories.
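
The definition can be checked mechanically. The following is a minimal Python sketch, not a standard library routine: the function name is_composition and the tolerance tol are assumptions introduced here, the tolerance being needed only because floating-point sums are rarely exactly 1.

```python
import math


def is_composition(point, tol=1e-9):
    """Return True if `point` is a valid compositional data point.

    Hypothetical helper: every component must be nonnegative and the
    components must sum to 1, up to the assumed tolerance `tol`.
    """
    nonnegative = all(p >= 0 for p in point)
    sums_to_one = math.isclose(sum(point), 1.0, rel_tol=0.0, abs_tol=tol)
    return nonnegative and sums_to_one


if __name__ == "__main__":
    print(is_composition((0.1, 0.3, 0.6)))   # components sum to 1
    print(is_composition((0.5, 0.6)))        # components sum to 1.1
    print(is_composition((-0.2, 1.2)))       # one component is negative
```

The tuple length n is not fixed by the check, so the same function applies to triples, quadruples, and longer tuples.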

For example,

  • Each data point may correspond to a rock composed of three different minerals; a rock of which 10% is the first mineral, 30% is the second, and the remaining 60% is the third would correspond to the triple (0.1, 0.3, 0.6); a data set would contain one such triple for each rock in a sample of rocks.
  • Each data point may correspond to a town; a town in which 35% of the people are Christians, 55% are Muslims, 6% are Jews, and the remaining 4% are others would correspond to the quadruple (0.35, 0.55, 0.06, 0.04); a data set would correspond to a list of towns.
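
The two examples above can be written out as a short Python sketch that converts percentages into the corresponding tuples. The helper name from_percentages and its tolerance are assumptions made for illustration; the numbers are the ones given in the examples.

```python
def from_percentages(percentages, tol=1e-9):
    """Convert percentages that total 100 into proportions that total 1.

    Hypothetical helper: rejects negative values and totals other than 100.
    """
    if any(p < 0 for p in percentages):
        raise ValueError("percentages must be nonnegative")
    if abs(sum(percentages) - 100.0) > tol:
        raise ValueError("percentages must sum to 100")
    return tuple(p / 100.0 for p in percentages)


# The rock made of three minerals: 10%, 30%, and 60%.
rock = from_percentages((10, 30, 60))

# The town: 35% Christians, 55% Muslims, 6% Jews, 4% others.
town = from_percentages((35, 55, 6, 4))

if __name__ == "__main__":
    print(rock)
    print(town)
```

A data set is then a list of such tuples, all of the same length, with one tuple per rock or per town.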

Ternary plots are often used to analyze compositional data that have three components.
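
A ternary plot places each triple inside an equilateral triangle, using the three proportions as barycentric weights on the corners. The Python sketch below computes the plotting coordinates under one possible convention, chosen here as an assumption: the first component's corner is at (0, 0), the second's at (1, 0), and the third's at (1/2, sqrt(3)/2). The function name ternary_xy is likewise hypothetical, and plotting libraries may orient the triangle differently.

```python
import math


def ternary_xy(point):
    """Map a triple (a, b, c) with a + b + c = 1 to 2-D plot coordinates.

    Assumed corner placement: a -> (0, 0), b -> (1, 0),
    c -> (0.5, sqrt(3) / 2). The result is the weighted average of the
    three corners with weights a, b, and c.
    """
    a, b, c = point
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2.0) * c
    return (x, y)


if __name__ == "__main__":
    # The rock example: 10%, 30%, and 60% of three minerals.
    print(ternary_xy((0.1, 0.3, 0.6)))
```

A triple with all of its weight on one component lands on that component's corner, and a triple with one component equal to zero lands on the opposite edge.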

References

  • The Statistical Analysis of Compositional Data by John Aitchison (Chapman & Hall)