Probability metric

A probability metric is a function defining a distance between random variables or random vectors. In particular, a probability metric does not, in general, satisfy the identity of indiscernibles condition required of the metric of a metric space.

Probability metric of random variables

A probability metric D between two random variables X and Y may be defined as:

D(X, Y) = \int_{-\infty}^\infty \int_{-\infty}^\infty |x-y|F(x, y) \, dx\, dy ,

where F(x, y) denotes the joint probability density function of the random variables X and Y. In particular, if X and Y are independent of each other, the equation above reduces to:

D(X, Y) = \int_{-\infty}^\infty \int_{-\infty}^\infty |x-y|f(x)g(y) \, dx\, dy

where f(x) and g(y) are the probability density functions of X and Y respectively.
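
Where no closed form is available, the double integral above can be evaluated numerically. The following is a minimal sketch (not part of the original article; the densities and the truncated integration domain are illustrative assumptions) using SciPy's dblquad for two independent variables:

    # A minimal sketch, assuming SciPy is available; the densities and the
    # truncated integration domain [-10, 10] are illustrative choices.
    from scipy.integrate import dblquad
    from scipy.stats import norm

    def probability_metric(f, g, lo=-10.0, hi=10.0):
        """Approximate D(X, Y) = integral of |x - y| f(x) g(y) dx dy
        for independent X, Y with densities f and g."""
        value, _err = dblquad(lambda y, x: abs(x - y) * f(x) * g(y), lo, hi, lo, hi)
        return value

    # Example: two unit-variance normal densities with means 0 and 1.
    f = norm(loc=0.0, scale=1.0).pdf
    g = norm(loc=1.0, scale=1.0).pdf
    print(probability_metric(f, g))  # ~1.399, matching the closed form D_NN below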

One may easily show that such a probability metric satisfies the identity of indiscernibles condition of the metric if and only if both of its arguments X, Y are certain events, described by Dirac delta probability density functions. In this case:

D_{\delta\delta}(X, Y) = \int_{-\infty}^\infty \int_{-\infty}^\infty |x-y|\delta(x-\mu_x)\delta(y-\mu_y) \, dx\, dy = |\mu_x-\mu_y|

the probability metric simply reduces to the metric between the expected values μ_x, μ_y of the variables X and Y, and obviously:

D_{\delta\delta}(X, X) = \int_{-\infty}^\infty \int_{-\infty}^\infty |x-x'|\delta(x-\mu_x)\delta(x'-\mu_x) \, dx\, dx' = |\mu_x-\mu_x| = 0 .

For all other cases:

D\left(X, X\right) > 0 .
Probability metric between two random variables X and Y, both having normal distributions and the same standard deviation σ = 0, σ = 0.2, σ = 0.4, σ = 0.6, σ = 0.8, σ = 1 (beginning with the bottom curve). m_xy = |μ_x − μ_y| denotes the distance between the means of X and Y.
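
As a quick numerical illustration (an addition to this rewrite; the means, standard deviations and sample size are assumptions), one can approximate certain events by normal densities with shrinking standard deviation and watch D(X, Y) approach |μ_x − μ_y| while D(X, X) remains strictly positive for every σ > 0:

    # A quick Monte Carlo sketch; means, sigmas and sample size are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    mu_x, mu_y, n = 2.0, 5.0, 1_000_000
    for sigma in (1.0, 0.1, 0.001):
        x  = rng.normal(mu_x, sigma, n)   # samples of X
        y  = rng.normal(mu_y, sigma, n)   # samples of Y, independent of X
        x2 = rng.normal(mu_x, sigma, n)   # an independent copy of X
        # E|X - Y| tends to |mu_x - mu_y| = 3 and E|X - X'| tends to 0
        # as sigma -> 0, while both stay positive for every sigma > 0.
        print(sigma, np.abs(x - y).mean(), np.abs(x2 - x).mean())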

Example: two continuous random variables with normal distributions (NN)

If both probability density functions of the random variables X and Y are normal distributions (N) having the same standard deviation σ, integrating D\left(X, Y\right) yields:

D_{NN}(X, Y) = \mu_{xy} + \frac{2\sigma}{\sqrt\pi}\operatorname{exp}\left(-\frac{\mu_{xy}^2}{4\sigma^2}\right)-\mu_{xy} \operatorname{erfc} \left(\frac{\mu_{xy}}{2\sigma}\right)

where:

\mu_{xy} = \left|\mu_x-\mu_y\right|,

and \operatorname{erfc}(x) is the complementary error function.

In this case, the "zero value" of the metric D_{NN}(X, Y) amounts to:

\lim_{\mu_{xy}\to 0} D_{NN}(X, Y) = D_{NN}(X, X) = \frac{2\sigma}{\sqrt\pi} .
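
The closed form can be checked against a direct Monte Carlo estimate of E|X − Y|. The sketch below (an addition to this rewrite; the parameter values are illustrative) compares the two and prints the "zero value" limit:

    # A verification sketch, assuming NumPy/SciPy; parameters are illustrative.
    import numpy as np
    from scipy.special import erfc

    def d_nn(mu_xy, sigma):
        """Closed form D_NN for two normals with equal standard deviation sigma."""
        return (mu_xy
                + 2.0 * sigma / np.sqrt(np.pi) * np.exp(-mu_xy**2 / (4.0 * sigma**2))
                - mu_xy * erfc(mu_xy / (2.0 * sigma)))

    rng = np.random.default_rng(0)
    sigma, mu_xy, n = 1.0, 0.5, 1_000_000
    x = rng.normal(0.0, sigma, n)
    y = rng.normal(mu_xy, sigma, n)
    print(d_nn(mu_xy, sigma))                            # closed form
    print(np.abs(x - y).mean())                          # Monte Carlo estimate
    print(d_nn(0.0, sigma), 2 * sigma / np.sqrt(np.pi))  # the "zero value" limit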

Example: two continuous random variables with uniform distributions (RR)

If both random variables X and Y are characterized by uniform distributions (R) of the same standard deviation σ, integrating D\left(X, Y\right) yields:

D_{RR}(X, Y) = \begin{cases} \frac{24\sqrt{3}\sigma^3-\mu_{xy}^3+6\sqrt{3}\sigma\mu_{xy}^2}{36\sigma^2}, & \mu_{xy}<2\sqrt{3}\sigma \\ \mu_{xy}, & \mu_{xy} \ge 2\sqrt{3}\sigma \end{cases} .

The minimal value of this kind of probability metric amounts to:

D_{RR}(X, X) = \frac{2\sigma}{\sqrt{3}} .
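
Again, the piecewise formula can be compared with a Monte Carlo estimate. In the sketch below (an addition to this rewrite, with illustrative parameters), the uniform interval half-width is chosen as √3·σ so that the standard deviation equals σ:

    # A verification sketch; parameters and sample size are assumptions.
    import numpy as np

    def d_rr(mu_xy, sigma):
        """Piecewise closed form D_RR for two equal-sigma uniform distributions."""
        w = 2.0 * np.sqrt(3.0) * sigma      # interval width giving std dev sigma
        if mu_xy >= w:
            return mu_xy
        return (24.0 * np.sqrt(3.0) * sigma**3
                - mu_xy**3
                + 6.0 * np.sqrt(3.0) * sigma * mu_xy**2) / (36.0 * sigma**2)

    rng = np.random.default_rng(0)
    sigma, mu_xy, n = 1.0, 1.0, 1_000_000
    half = np.sqrt(3.0) * sigma             # half-width of each interval
    x = rng.uniform(-half, half, n)
    y = rng.uniform(mu_xy - half, mu_xy + half, n)
    print(d_rr(mu_xy, sigma))                           # closed form
    print(np.abs(x - y).mean())                         # Monte Carlo estimate
    print(d_rr(0.0, sigma), 2 * sigma / np.sqrt(3.0))   # minimal value D_RR(X, X)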


Probability metric of random vectors

Equidistant surface for the Euclidean metric d^{2}(\mathbf{x},\mathbf{0}), \left(\mathbf{x,0}\right) \in \mathbb{R}^2.
Equidistant surface for the Euclidean probability metric D_{R\delta}^{2}(\mathbf{X},\mathbf{0}), \left(\mathbf{X,0}\right): \Omega \to \mathbb{R}^2.

The probability metric of random variables may be extended into a metric D(X, Y) of random vectors X, Y by substituting |x − y| with any metric operator d(x, y):

D(\mathbf{X}, \mathbf{Y}) =\int_{\Omega} \int_{\Omega} d(\mathbf{x}, \mathbf{y})F(\mathbf{x}, \mathbf{y}) \, d\Omega_x \, d\Omega_y

where F(x, y) is the joint probability density function of the random vectors X and Y. For example, substituting d(x, y) with the Euclidean metric, provided the vectors X and Y are mutually independent, yields:

D(\mathbf{X}, \mathbf{Y}) =\int_{\Omega} \int_{\Omega} \sqrt{\sum_i|x_i-y_i|^2} F(\mathbf{x})G(\mathbf{y}) \, d\Omega_x \, d \Omega_y.
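
A Monte Carlo sketch of this integral (an addition to this rewrite; the dimension, distributions and mean shift are illustrative assumptions) for two independent bivariate normal vectors:

    # A Monte Carlo estimate of D(X, Y) under the Euclidean metric;
    # the distributions and the shift vector are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    # X and Y are independent 2-dimensional standard normal vectors,
    # with the mean of Y shifted by (1, 0).
    x = rng.normal(0.0, 1.0, size=(n, 2))
    y = rng.normal(0.0, 1.0, size=(n, 2)) + np.array([1.0, 0.0])
    print(np.linalg.norm(x - y, axis=1).mean())  # estimate of D(X, Y)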

Probability metric of random vectors - the Euclidean form

If the random vectors X and Y are not only mutually independent but all components of each vector are also mutually independent, the probability metric for random vectors may be defined as:

D^{(p)}(\mathbf{X}, \mathbf{Y}) = \left( {\sum_i{D_{**}(X_i, Y_i)}^p}   \right)^{\frac1p}

where:

D_{**}(X_i, Y_i)

is a particular form of the probability metric of random variables, chosen depending on the distributions of the particular components X_i and Y_i of the vectors X, Y.
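
As a sketch of how the components combine (an addition to this rewrite; the choice of D_NN for every component and the parameter values are assumptions), with p = 2 giving the Euclidean form:

    # Combining componentwise probability metrics D_**(X_i, Y_i) with a p-norm;
    # using the normal-case closed form D_NN for every component is an assumption.
    import numpy as np
    from scipy.special import erfc

    def d_nn(mu_xy, sigma):
        """Closed-form probability metric for two equal-sigma normal components."""
        return (mu_xy
                + 2.0 * sigma / np.sqrt(np.pi) * np.exp(-mu_xy**2 / (4.0 * sigma**2))
                - mu_xy * erfc(mu_xy / (2.0 * sigma)))

    def d_p(component_metrics, p=2):
        """Combine per-component probability metric values with a p-norm."""
        values = np.asarray(component_metrics, dtype=float)
        return (values**p).sum() ** (1.0 / p)

    # Three independent normal components with sigma = 1 and mean distances
    # 0.0, 0.5 and 2.0 between the corresponding components of X and Y.
    print(d_p([d_nn(m, 1.0) for m in (0.0, 0.5, 2.0)], p=2))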

