Affinity propagation

In statistics and data mining, affinity propagation (AP) is a clustering algorithm based on the concept of "message passing" between data points.[1] Unlike clustering algorithms such as k-means or k-medoids, AP does not require the number of clusters to be determined or estimated before running the algorithm. Like k-medoids, AP finds "exemplars", members of the input set that are representative of clusters.[1]

Algorithm

Let x1 through xn be a set of data points, with no assumptions made about their internal structure, and let s be a function that quantifies the similarity between any two points, such that s(xi, xj) > s(xi, xk) iff xi is more similar to xj than to xk.

The algorithm proceeds by alternating two message passing steps, to update two matrices:[1]

Both matrices are initialized to all zeroes, and can be viewed as log-probability tables. The algorithm then performs the following updates iteratively:

a(i,k) \leftarrow \min \left( 0, r(k,k) + \sum_{i' \not\in \{i,k\}} \max(0, r(i',k)) \right) for i \neq k and
a(k,k) \leftarrow \sum_{i' \neq k} \max(0, r(i',k)).

Applications

The inventors of affinity propagation showed it is better for certain computer vision and computational biology tasks, e.g. clustering of pictures of human faces and identifying regulated transcripts, than k-means,[1] even when k-means was allowed many random restarts and initialized using PCA.[2] A study comparing AP and Markov clustering on protein interaction graph partitioning found Markov clustering to work better for that problem.[3] A semi-supervised variant has been proposed for text mining applications.[4]

Software

References

  1. 1.0 1.1 1.2 1.3 Brendan J. Frey; Delbert Dueck (2007). "Clustering by passing messages between data points". Science 315 (5814): 972–976. doi:10.1126/science.1136800. PMID 17218491.
  2. Delbert Dueck; Brendan J. Frey (2007). Non-metric affinity propagation for unsupervised image categorization. Int'l Conf. on Computer Vision.
  3. James Vlasblom; Shoshana Wodak (2009). "Markov clustering versus affinity propagation for the partitioning of protein interaction graphs". BMC Bioinformatics 10 (1): 99. doi:10.1186/1471-2105-10-99.
  4. Renchu Guan; Xiaohu Shi; Maurizio Marchese; Chen Yang; Yanchun Liang (2011). "Text Clustering with Seeds Affinity Propagation". IEEE Transactions on Knowledge & Data Engineering 23 (4): 627–637.
  5. Clustering.jl www.github.com
  6. "Clustering — scikit-learn 0.14.1 documentation". Retrieved 15 July 2014.
  7. apcluster cran.r-project.org>