Probability metric

From Wikipedia, the free encyclopedia

A probability metric is a function defining a distance between random variables or random vectors. Unlike the metric of a metric space, a probability metric does not in general satisfy the identity of indiscernibles condition.

Probability metric of random variables

A probability metric D between two random variables X and Y may be defined as:

D(X, Y) = E(|X - Y|).\,

If the joint probability distribution is absolutely continuous, this is the same as

\int_{-\infty}^\infty \int_{-\infty}^\infty |x-y|F(x, y) \, dx\, dy,

where F(x, y) denotes the joint probability density function of the random variables X and Y. If X and Y are independent, the joint density factorizes and the expression above becomes:

D(X, Y) = \int_{-\infty}^\infty \int_{-\infty}^\infty |x-y|f(x)g(y) \, dx\, dy

where f(x) and g(y) are the probability density functions of X and Y respectively.
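The double integral for independent variables can be approximated numerically. The following is a minimal sketch rather than a reference implementation: the function name `probability_metric`, the midpoint rule, and the finite limits `lo` and `hi` are assumptions made here for illustration, and both densities are assumed to be negligible outside those limits.

```python
def probability_metric(f, g, lo, hi, n=400):
    """Approximate D(X, Y) = integral of |x - y| f(x) g(y) dx dy.

    f, g   -- probability density functions of the independent X and Y
    lo, hi -- finite integration limits (assumed to cover both supports)
    n      -- number of midpoint cells per axis (illustrative default)
    """
    h = (hi - lo) / n
    points = [lo + (i + 0.5) * h for i in range(n)]
    fx = [f(x) for x in points]
    gy = [g(y) for y in points]
    total = 0.0
    for x, wx in zip(points, fx):
        if wx == 0.0:
            continue  # cells with zero density contribute nothing
        for y, wy in zip(points, gy):
            total += abs(x - y) * wx * wy
    return total * h * h


def uniform_01(x):
    """Density of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


# Two independent variables, each uniform on [0, 1].
d_same = probability_metric(uniform_01, uniform_01, 0.0, 1.0)
```

For two independent variables that are uniform on [0, 1] the exact value of E(|X - Y|) is 1/3, which the sketch reproduces up to the discretisation error of the grid.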

Such probability metrics do not in general satisfy the identity of indiscernibles condition of a metric. They satisfy it if and only if both arguments X and Y are certain events described by Dirac delta probability density functions. In this case:

D_{\delta\delta}(X, Y) = \int_{-\infty}^\infty \int_{-\infty}^\infty |x-y|\delta(x-\mu_x)\delta(y-\mu_y) \, dx\, dy = |\mu_x-\mu_y|

the probability metric reduces to the ordinary metric between the expected values μx and μy of the variables X and Y, and consequently:

D_{\delta\delta}(X, X) = \int_{-\infty}^\infty \int_{-\infty}^\infty |x-x'|\delta(x-\mu_x)\delta(x'-\mu_x) \, dx\, dx' = |\mu_x-\mu_x| = 0.

In all other cases, with the two arguments treated as independent variables having the same distribution:

D\left(X, X\right) > 0.
Probability metric between two random variables X and Y, both having normal distributions and the same standard deviation σ = 0, 0.2, 0.4, 0.6, 0.8, 1 (beginning with the bottom curve). mxy = |μx − μy| denotes the distance between the means of X and Y.

Example: two continuous random variables with normal distributions (NN)

If the random variables X and Y are independent and both are normally distributed (N) with the same standard deviation σ, then evaluating D(X, Y) yields

D_{NN}(X, Y) = \mu_{xy} + \frac{2\sigma}{\sqrt\pi}\operatorname{exp}\left(-\frac{\mu_{xy}^2}{4\sigma^2}\right)-\mu_{xy} \operatorname{erfc} \left(\frac{\mu_{xy}}{2\sigma}\right)

where

\mu_{xy} = \left|\mu_x-\mu_y\right|,

erfc(x) is the complementary error function, and the subscript NN indicates the type of the metric.

In this case the "zero value" of the probability metric D_{NN}(X, Y) is:

\lim_{\mu_{xy}\to 0} D_{NN}(X, Y) = D_{NN}(X, X) = \frac{2\sigma}{\sqrt\pi}.
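The closed form for D_{NN} is straightforward to evaluate with the standard library. The sketch below is illustrative only: the function name `d_nn` and the sample parameter values are assumptions made here, and σ is assumed to be strictly positive.

```python
import math


def d_nn(mu_x, mu_y, sigma):
    """Closed-form D_NN for independent normal X, Y with common sigma > 0."""
    mu_xy = abs(mu_x - mu_y)
    return (mu_xy
            + 2.0 * sigma / math.sqrt(math.pi)
            * math.exp(-mu_xy ** 2 / (4.0 * sigma ** 2))
            - mu_xy * math.erfc(mu_xy / (2.0 * sigma)))


sigma = 0.5
zero_value = d_nn(1.0, 1.0, sigma)   # equal means: reduces to 2*sigma/sqrt(pi)
far_apart = d_nn(0.0, 10.0, sigma)   # widely separated means: close to |mu_x - mu_y|
```

With equal means the exponential term equals 1 and the erfc term vanishes, leaving 2σ/√π. For widely separated means both correction terms become negligible and the value approaches the distance between the means.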

Example: two continuous random variables with uniform distributions (RR)

If the random variables X and Y are independent and both are uniformly distributed (R) with the same standard deviation σ, integrating D(X, Y) yields:

D_{RR}(X, Y) = \begin{cases} \frac{24\sqrt{3}\sigma^3-\mu_{xy}^3+6\sqrt{3}\sigma\mu_{xy}^2}{36\sigma^2}, & \mu_{xy}<2\sqrt{3}\sigma, \\ \mu_{xy}, & \mu_{xy} \ge 2\sqrt{3}\sigma. \end{cases}

The minimal value of this kind of probability metric is:

D_{RR}(X, X) = \frac{2\sigma}{\sqrt{3}}.

Probability metric of discrete random variables

If the random variables X and Y are independent and have discrete probability distributions, the probability metric D may be defined as:

D(X, Y) = \sum_{i} \sum_{j} |x_i-y_j|P(X=x_i)P(Y=y_j)\,.

For example, for two independent Poisson-distributed random variables X and Y with means λx and λy, the equation above becomes:

D_{PP}(X, Y) = \sum_{x=0}^\infty\sum_{y=0}^\infty |x-y|\frac{{\lambda_x}^x{\lambda_y}^ye^{-(\lambda_x+\lambda_y)}}{x!y!}.
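In practice the double sum for D_{PP} is evaluated with a finite upper limit. The sketch below truncates both sums at `n_max`; this cutoff, its default value, and the function names `poisson_pmf` and `d_pp` are assumptions made here for illustration. The probabilities are built by the recurrence P(k) = P(k-1)·λ/k to avoid large factorials.

```python
import math


def poisson_pmf(lam, n_max):
    """Return [P(X = 0), ..., P(X = n_max)] for a Poisson variable with mean lam."""
    probs = [math.exp(-lam)]
    for k in range(1, n_max + 1):
        probs.append(probs[-1] * lam / k)
    return probs


def d_pp(lam_x, lam_y, n_max=100):
    """Truncated double sum of |x - y| P(X = x) P(Y = y) for independent X, Y."""
    px = poisson_pmf(lam_x, n_max)
    py = poisson_pmf(lam_y, n_max)
    return sum(abs(x - y) * p * q
               for x, p in enumerate(px)
               for y, q in enumerate(py))


d_equal_means = d_pp(3.0, 3.0)   # positive even though both means coincide
d_degenerate = d_pp(0.0, 3.0)    # X is always 0, so this is the mean of Y
```

The cutoff should be large compared with both means so that the neglected tail probabilities are negligible.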

Probability metric of random vectors

equidistant surface for Euclidean metric d^{2}(\mathbf{x},\mathbf{0}), \left(\mathbf{x,0}\right) \in \mathbb{R}^2
equidistant surface for Euclidean probability metric D_{R\delta}^{2}(\mathbf{X},\mathbf{0}), \left(\mathbf{X,0}\right): \Omega \to \mathbb{R}^2

The probability metric of random variables may be extended to a metric D(X, Y) of random vectors X, Y by replacing |x − y| with any metric d(x, y):

D(\mathbf{X}, \mathbf{Y}) =\int_{\Omega} \int_{\Omega} d(\mathbf{x}, \mathbf{y})F(\mathbf{x}, \mathbf{y}) \, d\Omega_x \, d\Omega_y,

where F(x, y) is the joint probability density function of the random vectors X and Y. For example, substituting the Euclidean metric for d(x, y) and assuming that the vectors X and Y are mutually independent yields:

D(\mathbf{X}, \mathbf{Y}) =\int_{\Omega} \int_{\Omega} \sqrt{\sum_i|x_i-y_i|^2} F(\mathbf{x})G(\mathbf{y}) \, d\Omega_x \, d \Omega_y.

Probability metric of random vectors - the Euclidean form

If the random vectors X and Y are not only mutually independent but also all components of each vector are mutually independent, the probability metric for random vectors may be defined as:

D_{**}^{(p)}(\mathbf{X}, \mathbf{Y}) = \left( {\sum_i{D_{**}(X_i, Y_i)}^p}   \right)^{\frac1p}

where:

D_{**}(X_i, Y_i)\,

is a particular form of the probability metric of random variables, chosen according to the distributions of the components Xi and Yi of the vectors X and Y.
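Once the per-component metrics D(X_i, Y_i) are known, combining them is a plain p-norm. The following sketch assumes the component values have already been computed by whichever scalar formula suits each pair; the function name `d_vector` and the sample values are hypothetical.

```python
def d_vector(component_metrics, p=2):
    """Combine per-component probability metrics D(X_i, Y_i) into D^(p)(X, Y)."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(d ** p for d in component_metrics) ** (1.0 / p)


# Hypothetical per-component values D(X_1, Y_1) = 3 and D(X_2, Y_2) = 4.
d_euclid = d_vector([3.0, 4.0], p=2)
d_taxicab = d_vector([3.0, 4.0], p=1)
```

Choosing p = 2 gives the Euclidean form named in the heading, while p = 1 simply adds the component metrics.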

Physical interpretation

The probability metric may be interpreted as a distance between particles in quantum mechanics, where a particle is described by a wavefunction ψ and the probability dP that the particle is present in a given volume of space dV is:

dP = |\psi(x, y, z)|^2  dV\,.

A quantum particle in a box

Consider a quantum particle (X) in a box of length L. If the wavefunction of this particle has the form:

\psi_m(x) = \sqrt{\frac{2}{L}} \sin{\left(\frac{m \pi x}{L} \right)}, \,

then the probability metric between this particle and any point \xi \in (0, L)\, of the box is:

D(X, \xi) = \int\limits_{0}^L |x-\xi||\psi_m(x)|^2dx = \frac{\xi^2}{L} - \xi +L\left(\frac{1}{2}-\frac{\sin^2(\frac{m\pi\xi}{L})}{m^2\pi^2}\right).

From the properties of the probability metric it follows that the distance between an edge of the box (ξ = 0 or ξ = L) and a given point, added to the probability metric between this point and the particle, differs from the probability metric between the edge of the box and the particle. For example, for a quantum particle at energy level m = 2:

d(0,0.2L) + D(X, 0.2L) \approx 0.2L + 0.3171L = 0.517L \neq  D(X, 0) = D(X, L) = 0.5L\,.

The probability metric between the particle and the edge of the box (0.5L) is nonetheless independent of the particle's energy level.
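The closed form for D(X, ξ) can be checked against a direct numerical evaluation of the integral. The sketch below is illustrative: the function names `d_box` and `d_box_numeric`, the midpoint rule, and the grid size are assumptions made here, and the box length is set to 1 for convenience.

```python
import math


def d_box(xi, length, m):
    """Closed-form D(X, xi) for the m-th stationary state in a box of given length."""
    s = math.sin(m * math.pi * xi / length)
    return (xi ** 2 / length - xi
            + length * (0.5 - s ** 2 / (m ** 2 * math.pi ** 2)))


def d_box_numeric(xi, length, m, n=20000):
    """Midpoint-rule approximation of the same integral, used as a cross-check."""
    h = length / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        psi_sq = (2.0 / length) * math.sin(m * math.pi * x / length) ** 2
        total += abs(x - xi) * psi_sq
    return total * h


L = 1.0
closed = d_box(0.2 * L, L, 2)          # the m = 2 example from the text
numeric = d_box_numeric(0.2 * L, L, 2)
edge = d_box(0.0, L, 2)                # distance to the edge of the box
```

At ξ = 0 the sine term vanishes for every m, which is why the distance to the edge does not depend on the energy level.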

Two quantum particles in a box

Probability metric D(X, Y) between two particles X, Y in a potential well for the first ten energy values m, n of these particles.

The distance between two particles in a one-dimensional box of length L, having the time-independent wavefunctions:

\psi_m(x) = \sqrt{\frac{2}{L}} \sin{\left(\frac{m \pi x}{L} \right)}, \,
\psi_n(y) = \sqrt{\frac{2}{L}} \sin{\left(\frac{n \pi y}{L} \right)}, \,

may be defined in terms of probability metric of independent random variables as:

\begin{align}
&{} D(X, Y) = \int\limits_{0}^L \int\limits_{0}^L |x-y||\psi_m(x)|^2|\psi_n(y)|^2 \, dx\, dy \\
&{} = L\left(\frac{1}{3} - \frac{m^2 + n^2}{2m^2n^2\pi^2} - \frac{\delta_{mn}}{4m^2\pi^2} \right),
\end{align}

where \delta_{mn} is the Kronecker delta (1 if m = n and 0 otherwise). The distance between particles X and Y is smallest for m = 1 and n = 1, that is, for the minimum energy levels of these particles, and is:

\min(D(X, Y)) = L\left(\frac{1}{3}-\frac{5}{4\pi^2} \right) \approx 0.21L \,.

In keeping with the properties of the probability metric, the minimum distance is nonzero. For higher energy levels the distance approaches L/3, the value for two independent positions distributed uniformly over the well. It can never exceed the length of the well, because |x − y| ≤ L everywhere inside the box.
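The double integral defining D(X, Y) for the two particles can be cross-checked numerically. The sketch below is a plain midpoint-rule approximation on a square grid; the function name `d_two_particles`, the grid size, and the default box length of 1 are assumptions made here for illustration.

```python
import math


def d_two_particles(m, n, length=1.0, cells=400):
    """Midpoint-rule estimate of the integral of |x - y| |psi_m(x)|^2 |psi_n(y)|^2."""
    h = length / cells
    xs = [(i + 0.5) * h for i in range(cells)]
    # Position probability densities of the two stationary states.
    pm = [(2.0 / length) * math.sin(m * math.pi * x / length) ** 2 for x in xs]
    pn = [(2.0 / length) * math.sin(n * math.pi * x / length) ** 2 for x in xs]
    total = 0.0
    for x, wx in zip(xs, pm):
        for y, wy in zip(xs, pn):
            total += abs(x - y) * wx * wy
    return total * h * h


d_ground = d_two_particles(1, 1)   # both particles in the lowest state
d_mixed = d_two_particles(1, 2)    # one particle in the first excited state
```

Increasing `cells` refines the estimate at a cost that grows with the square of the grid size.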
