F-divergence
From Wikipedia, the free encyclopedia
In probability theory, an f-divergence is a function $I_f(P,Q)$ that measures the difference between two probability distributions P and Q. Intuitively, the divergence is an average, weighted by Q, of the function f applied to the likelihood ratio of P to Q.
These divergences were introduced and studied independently by Csiszár (1967) and Ali and Silvey (1966) and are sometimes known as Csiszár f-divergences or Ali-Silvey distances.
Definition
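Let P and Q be two probability distributions over a space Ω such that P is absolutely continuous with respect to Q. Then, for a convex function f such that f(1) = 0, the f-divergence $I_f(P,Q)$ is defined as

$$I_f(P,Q) = \int_\Omega f\left(\frac{dP}{dQ}\right) dQ,$$

where dP/dQ is the Radon–Nikodym derivative of P with respect to Q.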
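For distributions on a finite set, the Radon–Nikodym derivative dP/dQ is simply the ratio p(x)/q(x) on outcomes with q(x) > 0, so the integral reduces to a finite sum. The following is a minimal sketch under that assumption; the helper names `f_divergence` and `kl_generator` and the example distributions are illustrative, not taken from any library.

```python
import math
from typing import Callable, Sequence


def f_divergence(f: Callable[[float], float],
                 p: Sequence[float], q: Sequence[float]) -> float:
    """Return I_f(P, Q) = sum over x of q(x) * f(p(x) / q(x)) on a finite space.

    Outcomes with q(x) == 0 contribute nothing, but absolute continuity
    (P << Q) requires p(x) == 0 there as well, so that is checked.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for px, qx in zip(p, q):
        if qx == 0.0:
            if px != 0.0:
                raise ValueError("P is not absolutely continuous with respect to Q")
            continue
        total += qx * f(px / qx)  # f evaluated at the likelihood ratio dP/dQ
    return total


def kl_generator(t: float) -> float:
    """f(t) = t ln t, extended by continuity with f(0) = 0."""
    return t * math.log(t) if t > 0 else 0.0


# Hypothetical example distributions on a three-point space.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
print(f_divergence(kl_generator, p, q))  # ln 2, about 0.6931
print(f_divergence(kl_generator, q, q))  # 0.0, because f(1) = 0
```

The second call illustrates why the condition f(1) = 0 is imposed: when P = Q the likelihood ratio is identically 1, so the divergence vanishes.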
If P and Q are both absolutely continuous with respect to a reference measure μ on Ω, then their probability densities p and q satisfy dP = p dμ and dQ = q dμ. In this case the f-divergence can be written as

$$I_f(P,Q) = \int_\Omega f\left(\frac{p(x)}{q(x)}\right) q(x)\, d\mu(x).$$
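When Ω is the real line and μ is Lebesgue measure, the density form can be approximated numerically. The sketch below assumes two normal distributions, N(0, 1) and N(1, 1), truncates the integral to the interval [-10, 11] (which holds essentially all the mass of both), and applies composite Simpson's rule; the function names, the example distributions, and the truncation bounds are illustrative choices. With f(t) = t ln t the result should be close to the known KL divergence between equal-variance normals, (m1 - m2)^2 / (2 sigma^2) = 0.5, and with f(t) = (√t - 1)^2 it should be close to 2(1 - exp(-1/8)).

```python
import math
from typing import Callable


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd^2) with respect to Lebesgue measure."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def f_divergence_density(f: Callable[[float], float],
                         p: Callable[[float], float],
                         q: Callable[[float], float],
                         a: float, b: float, n: int = 2000) -> float:
    """Approximate the integral of f(p(x)/q(x)) q(x) dx over [a, b] by Simpson's rule.

    The true integral runs over the whole line, so [a, b] should cover
    essentially all of the mass of P and Q. n must be even.
    """
    if n % 2 != 0:
        raise ValueError("n must be even for Simpson's rule")

    def integrand(x: float) -> float:
        qx, px = q(x), p(x)
        if qx == 0.0:
            # P << Q requires p(x) == 0 wherever q(x) == 0.
            if px != 0.0:
                raise ValueError("p(x) > 0 where q(x) == 0")
            return 0.0
        return f(px / qx) * qx

    h = (b - a) / n
    total = integrand(a) + integrand(b)
    for i in range(1, n):
        total += (4.0 if i % 2 == 1 else 2.0) * integrand(a + i * h)
    return total * h / 3.0


def p(x: float) -> float:
    return normal_pdf(x, 0.0, 1.0)  # assumed example: P = N(0, 1)


def q(x: float) -> float:
    return normal_pdf(x, 1.0, 1.0)  # assumed example: Q = N(1, 1)


def kl_generator(t: float) -> float:
    return t * math.log(t) if t > 0 else 0.0


def hellinger_generator(t: float) -> float:
    return (math.sqrt(t) - 1.0) ** 2


kl = f_divergence_density(kl_generator, p, q, -10.0, 11.0)
hellinger_sq = f_divergence_density(hellinger_generator, p, q, -10.0, 11.0)
print(kl)            # close to 0.5
print(hellinger_sq)  # close to 2 * (1 - exp(-1/8)), about 0.2350
```

Simpson's rule is used here only because it is short and accurate for smooth, rapidly decaying integrands; any quadrature over the support of Q would serve.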
Instances of f-divergences
Many common divergences, such as the Kullback–Leibler (KL) divergence, the Hellinger distance, and the total variation distance, are special cases of the f-divergence, each corresponding to a particular choice of f. The following table lists several common divergences between probability distributions together with the function f to which each corresponds (cf. Liese and Vajda, 2006).
Divergence | Corresponding f(t) |
---|---|
KL-divergence | $t \ln t$ |
Hellinger distance (squared) | $(\sqrt{t} - 1)^2$ |
Total variation | $\frac{1}{2}\lvert t - 1\rvert$ |
Pearson χ² divergence | $(t - 1)^2$ |
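Substituting f(t) into q(x) f(p(x)/q(x)) recovers the familiar formulas: t ln t gives p ln(p/q), (√t - 1)^2 gives (√p - √q)^2, ½|t - 1| gives ½|p - q|, and (t - 1)^2 gives (p - q)^2 / q. The sketch below checks this numerically for two hypothetical distributions on a three-point space, assuming q(x) > 0 everywhere. Normalizations vary between authors (for example, a factor ½ is sometimes included in the squared Hellinger distance or dropped from total variation); the sketch assumes the ones just listed.

```python
import math
from typing import Callable, Dict, Sequence

# Generators f(t), each convex with f(1) = 0 (normalizations as stated above).
GENERATORS: Dict[str, Callable[[float], float]] = {
    "KL": lambda t: t * math.log(t) if t > 0 else 0.0,
    "Hellinger (squared)": lambda t: (math.sqrt(t) - 1.0) ** 2,
    "Total variation": lambda t: 0.5 * abs(t - 1.0),
    "Pearson": lambda t: (t - 1.0) ** 2,
}


def f_divergence(f: Callable[[float], float],
                 p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of q(x) * f(p(x) / q(x)); assumes q(x) > 0 for every x."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))


def direct_values(p: Sequence[float], q: Sequence[float]) -> Dict[str, float]:
    """The same four quantities computed from their usual closed-form sums."""
    pairs = list(zip(p, q))
    return {
        "KL": sum(px * math.log(px / qx) for px, qx in pairs if px > 0),
        "Hellinger (squared)": sum((math.sqrt(px) - math.sqrt(qx)) ** 2 for px, qx in pairs),
        "Total variation": 0.5 * sum(abs(px - qx) for px, qx in pairs),
        "Pearson": sum((px - qx) ** 2 / qx for px, qx in pairs),
    }


# Hypothetical example distributions.
p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]
direct = direct_values(p, q)
for name, f in GENERATORS.items():
    via_f = f_divergence(f, p, q)
    print(f"{name}: {via_f:.6f} (direct: {direct[name]:.6f})")
```

Each line should print the same value twice, since the two computations are algebraically identical term by term.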
References
- S. M. Ali and S. D. Silvey (1966). A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society, Series B, Vol. 28, No. 1, pp. 131–142.
- I. Csiszár (1967). Information-type measures of difference of probability distributions and indirect observation. Studia Sci. Math. Hungar., Vol. 2, pp. 299–318.
- I. Csiszár and P. Shields (2004). Information Theory and Statistics: A Tutorial. Foundations and Trends in Communications and Information Theory, Vol. 1, No. 4, pp. 417–528.
- F. Liese and I. Vajda (2006). On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, Vol. 52, No. 10, pp. 4394–4412.