Wasserstein metric

From Wikipedia, the free encyclopedia

In mathematics, the Wasserstein metric is a metric on the space of probability measures on a given metric space.

The Wasserstein distance is named for the Russian mathematician L.N. Vasershtein; the usage of "Wasserstein" can be attributed to a German-influenced transliteration of the Cyrillic lettering. Following Ambrosio, Gigli & Savaré, this article acknowledges that the "Vasershtein" spelling is more correct, but that the "Wasserstein" spelling is more widespread in English-language publications.

The distance was first introduced by L.N. Vasershtein in 1969; R.L. Dobrushin coined the term "Wasserstein/Vasershtein distance" in 1970.

Definition

Let (M,d) be a metric space for which every probability measure on M is a Radon measure (a so-called Radon space). For p \geq 1, let \mathcal{P}_{p} (M) denote the collection of all probability measures \mu on M with finite pth moment, that is, for which \int_{M} d(x, x_{0})^{p} \, \mathrm{d} \mu (x) < +\infty for some x_{0} \in M.

Then the pth Wasserstein distance between two probability measures \mu, \nu \in \mathcal{P}_{p} (M) is defined as

\left( \inf_{\gamma \in \Gamma (\mu, \nu)} \int_{M \times M} d(x, y)^{p} \, \mathrm{d} \gamma (x, y) \right)^{1/p},

where \Gamma (\mu, \nu) denotes the collection of all measures on M \times M with marginals \mu and \nu on the first and second factors respectively.

The above distance is usually denoted W_{p} (\mu, \nu) (typically among authors who prefer the "Wasserstein" spelling) or \ell_{p} (\mu, \nu) (typically among authors who prefer the "Vasershtein" spelling).

It can be shown that W_{p} satisfies all the axioms of a metric on \mathcal{P}_{p} (M).
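
For measures on the real line the infimum in the definition can be evaluated directly. The following is a minimal sketch, assuming M = \mathbb{R} with d(x, y) = |x - y| and that \mu and \nu are empirical measures putting mass 1/n on each of n sample points; under those assumptions an optimal coupling pairs the sorted samples in order. The function name wasserstein_1d is hypothetical and chosen only for this illustration.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """W_p between the empirical measures of two equal-size samples on the real line.

    Assumes each sample point carries mass 1/n and d(x, y) = |x - y|.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    # On the real line, an optimal coupling matches the i-th smallest point
    # of one sample with the i-th smallest point of the other.
    pairs = zip(sorted(xs), sorted(ys))
    # Integral of d(x, y)^p against that coupling: each pair has mass 1/n.
    cost = sum(abs(x - y) ** p for x, y in pairs) / n
    return cost ** (1.0 / p)


if __name__ == "__main__":
    sample = [0.0, 1.0, 3.0]
    shifted = [x + 5.0 for x in sample]
    print(wasserstein_1d(sample, shifted, p=1))
    print(wasserstein_1d(sample, shifted, p=2))
```

In this usage example the second sample is the first one translated by 5, so every matched pair is at distance 5 and the sketch returns 5 for each p. For general metric spaces or unequal weights the infimum is a linear program, and this shortcut does not apply.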

Properties

Dual representation of W_{1}

The following dual representation of W_{1} is a special case of the duality theorem of Kantorovich and Rubinstein (1958): when \mu and \nu have bounded support,

W_{1} (\mu, \nu) = \sup \left\{ \left. \int_{M} f(x) \, \mathrm{d} (\mu - \nu) (x) \right| \mathrm{continuous\,} f : M \to \mathbb{R}, \mathrm{Lip} (f) \leq 1 \right\},

where Lip(f) denotes the minimal Lipschitz constant for f.

Compare this with the definition of the Radon metric:

\rho (\mu, \nu) := \sup \left\{ \left. \int_{M} f(x) \, \mathrm{d} (\mu - \nu) (x) \right| \mathrm{continuous\,} f : M \to [-1, 1] \subsetneq \mathbb{R} \right\}.

If the metric d is bounded by some constant C, then

2W_{1} (\mu, \nu) \leq C\rho (\mu, \nu),

and so convergence in the Radon metric (also known as strong convergence) implies convergence in the Wasserstein metric, but not vice versa.
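
As a worked illustration, consider point masses \mu = \delta_{x} and \nu = \delta_{y} at two points x, y \in M. This example, and the choice M = [0, 1] in its last line, are assumptions made here for convenience rather than taken from the cited references. The only measure on M \times M with these marginals is \delta_{(x, y)}, so both sides of the dual representation can be computed by hand and compared with the Radon metric:

```latex
% Primal side: the single coupling \delta_{(x, y)} gives
W_{1} (\delta_{x}, \delta_{y}) = d(x, y)

% Dual side: any f with \mathrm{Lip} (f) \leq 1 has f(x) - f(y) \leq d(x, y),
% with equality for f(z) = d(z, y), so
\sup_{\mathrm{Lip} (f) \leq 1} \bigl( f(x) - f(y) \bigr) = d(x, y)

% Radon metric for x \neq y: f(z) = (d(z, y) - d(z, x)) / d(x, y) is continuous,
% takes values in [-1, 1], and has f(x) = 1, f(y) = -1, so
\rho (\delta_{x}, \delta_{y}) = 2

% With M = [0, 1], d(x, y) = |x - y|, C = 1, x = 1/n and y = 0:
W_{1} (\delta_{1/n}, \delta_{0}) = \frac{1}{n} \to 0,
\qquad
\rho (\delta_{1/n}, \delta_{0}) = 2 \quad \text{for all } n
```

Here the inequality reads 2/n \leq 2, which holds for every n, while the sequence \delta_{1/n} converges to \delta_{0} in the Wasserstein metric but not in the Radon metric.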

Separability and completeness

For any p \geq 1, the metric space \left( \mathcal{P}_{p} (M), W_{p} \right) is separable and complete whenever (M,d) is separable and complete.

References

  • Ambrosio, L., Gigli, N. & Savaré, G. (2005). Gradient Flows in Metric Spaces and in the Space of Probability Measures. ETH Zürich, Birkhäuser Verlag, Basel. ISBN 3-764-32428-7.
  • Rüschendorf, L. (2001). "Wasserstein metric". Encyclopaedia of Mathematics. SpringerLink.