In computer science, the earth mover's distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric. Informally, if the distributions are interpreted as two different ways of piling up a certain amount of dirt over the region D, the EMD is the minimum cost of turning one pile into the other; where the cost is assumed to be amount of dirt moved times the distance by which it is moved [1].
The above definition is valid only if the two distributions have the same integral (informally, if the two piles have the same amount of dirt), as in normalized histograms or probability density functions. In that case, the EMD is equivalent to the 1st Mallows distance or 1st Wasserstein distance between the two distributions [2] [3].
Contents |
Some applications may require the comparison of distributions with different total masses. One approach is to allow for a partial match, where dirt from the most massive distribution is rearranged to make the least massive, and any leftover "dirt" is discarded at no cost. Under this approach, the EMD is no longer a true distance between distributions. Another approach is to allow for mass to be created or destroyed, on a global and/or local level, as an alternative to transportation, but with a cost penalty. In that case one must specify a real parameter σ, the ratio between the cost of creating or destroying one unit of "dirt", and the cost of transporting it by a unit distance. This is equivalent to minimizing the sum of the earth moving cost plus σ times the L1 distance between the rearranged pile and the second distribution.
If the domain D is discrete, the EMD can be computed by solving an instance transportation problem, which can be solved by the so-called Hungarian algorithm. In particular, if D is a one-dimensional array of "bins" the EMD can be efficiently computed by scanning the array and keeping track of how much dirt needs to be transported between consecutive bins.
An early application of the EMD in computer science was to compare two grayscale images that may differ due to dithering, blurring, or local deformations [4]. In this case, the region is the image's domain, and the total amount of light (or ink) is the "dirt" to be rearranged.
The EMD is widely used in content-based image retrieval to compute distances between the color histograms of two digital images. In this case, the region is the RGB color cube, and each image pixel is a parcel of "dirt". The same technique can be used for any other quantitative pixel attribute, such as luminance, gradient, apparent motion in a video frame, etc..
More generally, the EMD is used in pattern recognition to compare generic summaries or surrogates of data records called signatures. A typical signature consists of list of pairs ((x1,m1), … (xn,mn)), where each xi is a certain "feature" (e.g., color in an image, letter in a text, etc.), and mi is "mass" (how many times that feature occurs the record). Alternatively, xi may be the centroid of a data cluster, and mi the number of entities in that cluster. To compare two such signatures with the EMD, one must define a distance between features, which is interpreted as the cost of turning a unit mass of one feature into a unit mass of the other. The EMD between two signatures is then the minimum cost of turning one of them into the other.
The concept was first introduced by Gaspard Monge in 1781 [5]. The use of the EMD as a distance measure for monochromatic images was described in 1989 by S. Peleg, M. Werman and H. Rom [4]. The name "earth movers' distance" was proposed by J. Stolfi in 1994 [6], and was used in print in 1998 by Y. Rubner, C. Tomasi and L. G. Guibas[7].