Jensen–Shannon divergence

In probability theory and statistics, the Jensen–Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as the information radius (IRad)[1] or total divergence to the average[2]. It is based on the Kullback–Leibler divergence, with the notable (and useful) difference that it is always finite: it is bounded above by ln 2 (or by 1 when the base-2 logarithm is used).

Definition

Consider the set M_+^1(A) of probability distributions, where A is a set equipped with a σ-algebra.

The Jensen–Shannon divergence JSD: M_+^1(A) \times M_+^1(A) \rightarrow [0,\infty) is a symmetrized and smoothed version of the Kullback–Leibler divergence D(P \parallel Q). It is defined by

JSD(P \parallel Q) = \frac{1}{2} D(P \parallel M) + \frac{1}{2} D(Q \parallel M),

where M = \frac{1}{2}(P + Q).
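
The definition translates directly into code. The following is a minimal sketch in Python, assuming NumPy is available; the function names kl_divergence and jensen_shannon_divergence and the example distributions p and q are illustrative and not part of the article.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats, summing only over
    outcomes where p_i > 0 (terms with p_i = 0 contribute nothing)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def jensen_shannon_divergence(p, q):
    """JSD(P || Q) = 1/2 D(P || M) + 1/2 D(Q || M), with M = (P + Q)/2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Illustrative example: two distributions over three outcomes.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(jensen_shannon_divergence(p, q))  # ~0.3466 nats, finite even though D(P || Q) is infinite
print(jensen_shannon_divergence(q, p))  # same value: the JSD is symmetric
print(np.log(2))                        # upper bound in nats (~0.6931)
```

Because the mixture M is strictly positive wherever P or Q is, both Kullback–Leibler terms are finite, which is why the Jensen–Shannon divergence remains finite even when D(P \parallel Q) does not.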

See also

Kullback–Leibler divergence, for details on calculating the Jensen–Shannon divergence.

References

  1. ^ Schütze, Hinrich; Manning, Christopher D. (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass.: MIT Press. p. 304. ISBN 0-262-13360-1.
  2. ^ Dagan, Ido; Lee, Lillian; Pereira, Fernando (1997). "Similarity-Based Methods for Word Sense Disambiguation". Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics. pp. 56–63.