Rand index
From Wikipedia, the free encyclopedia
The Rand index or Rand measure is a measure of the similarity between two clusterings (partitions) of the same data set.
Definition
Given a set S of n elements and two partitions of S to compare, X and Y, we define the following:
- a, the number of pairs of elements in S that are in the same set in X and in the same set in Y
- b, the number of pairs of elements in S that are in different sets in X and in different sets in Y
- c, the number of pairs of elements in S that are in the same set in X and in different sets in Y
- d, the number of pairs of elements in S that are in different sets in X and in the same set in Y
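The Rand index, R, is:
$$R = \frac{a + b}{a + b + c + d} = \frac{a + b}{\binom{n}{2}}$$
Here a + b + c + d equals the total number of unordered pairs of elements in S, which is n(n - 1)/2.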
Intuitively, one can think of a + b as the number of agreements between X and Y and c + d as the number of disagreements between X and Y.
The Rand index has a value between 0 and 1, with 0 indicating that the two clusterings do not agree on any pair of elements and 1 indicating that the two clusterings are exactly the same.
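The pair counts above can be computed directly by enumerating all unordered pairs of elements. The following is a minimal sketch, not a reference implementation. It assumes each partition is given as a list of cluster labels, one per element and in the same element order, and the function names `pair_counts` and `rand_index` are chosen here for illustration only. It computes R as (a + b) / (a + b + c + d).

```python
from itertools import combinations


def pair_counts(labels_x, labels_y):
    """Return (a, b, c, d) for two clusterings given as label sequences.

    Assumption: labels_x[i] and labels_y[i] are the cluster labels that
    partitions X and Y assign to the same element i.
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both clusterings must label the same elements")

    a = b = c = d = 0
    # Visit every unordered pair of elements exactly once.
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1  # same set in X, same set in Y
        elif not same_x and not same_y:
            b += 1  # different sets in X, different sets in Y
        elif same_x:
            c += 1  # same set in X, different sets in Y
        else:
            d += 1  # different sets in X, same set in Y
    return a, b, c, d


def rand_index(labels_x, labels_y):
    """Return the Rand index R = (a + b) / (a + b + c + d)."""
    a, b, c, d = pair_counts(labels_x, labels_y)
    total = a + b + c + d
    if total == 0:
        raise ValueError("at least two elements are needed to form a pair")
    return (a + b) / total


# Illustrative data: six elements, X has two clusters and Y has three.
x = [0, 0, 0, 1, 1, 1]
y = [0, 0, 1, 1, 2, 2]
print(pair_counts(x, y))  # (2, 8, 4, 1)
print(rand_index(x, y))   # 10 / 15, about 0.667
```

In this example there are 15 pairs in total, of which a + b = 10 are agreements and c + d = 5 are disagreements. Enumerating pairs takes time quadratic in n, which is fine for small inputs. Because only co-membership matters, the label values themselves are arbitrary: relabelling the clusters of either partition leaves R unchanged.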
References
- W. M. Rand, Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, pp. 846–850 (1971).
- K. Y. Yeung, W. L. Ruzzo, Details of the Adjusted Rand index and Clustering algorithms, Bioinformatics.