In statistics, the Hodges–Lehmann estimator is a robust and nonparametric estimator of a population's location parameter, the "pseudo-median", which is closely related to the population median. The Hodges–Lehmann estimator is used not only for the pseudo-median of a single population but also for the differences between members of two populations. The Hodges–Lehmann estimator was proposed in 1963 independently by Pranab Kumar Sen and by Joseph Hodges and Erich Lehmann, and so it is also called the "Hodges–Lehmann–Sen estimator".[1]
The Hodges–Lehmann estimator estimates the difference between the values in two sets of data. If the two sets of data contain m and n data points respectively, then their Cartesian product contains m × n pairs of points (one from each set); each such pair defines one difference of values. The Hodges–Lehmann estimator for the difference is defined as the median of the m × n differences.[2]
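The two-sample definition above can be sketched directly in Python. This is a minimal brute-force illustration, not a reference implementation: the function name `hodges_lehmann_shift` and the sign convention (first sample minus second) are assumptions made here, not taken from the source.

```python
from itertools import product
from statistics import median


def hodges_lehmann_shift(xs, ys):
    """Median of all m * n pairwise differences x - y.

    The name and the sign convention (xs minus ys) are illustrative
    assumptions, not part of the original definition.
    """
    # One difference for each pair in the Cartesian product of the samples.
    diffs = [x - y for x, y in product(xs, ys)]
    return median(diffs)


# Two small samples with m = 3 and n = 2, giving 6 differences.
shift = hodges_lehmann_shift([1.0, 3.0, 10.0], [0.0, 2.0])
print(shift)
```

This version stores all m × n differences, so it is suitable only for small samples.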
A second estimator that has also been called "Hodges–Lehmann" defines a location estimate for a single dataset.[3][4] If the dataset contains n data points, there are n(n + 1)/2 unordered pairs of points, counting the pair of each item with itself. For each such pair, the average is computed; the median of these n(n + 1)/2 averages is defined to be the Hodges–Lehmann estimator of location. These pairwise averages are called the "Walsh averages".
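As a hedged sketch of the one-sample computation, the Walsh averages can be enumerated with the standard library and their median taken. The function names `walsh_averages` and `hodges_lehmann_location` are hypothetical, chosen here for illustration.

```python
from itertools import combinations_with_replacement
from statistics import median


def walsh_averages(xs):
    """All n(n + 1)/2 pairwise averages, each point also paired with itself."""
    return [(a + b) / 2 for a, b in combinations_with_replacement(xs, 2)]


def hodges_lehmann_location(xs):
    """Median of the Walsh averages (illustrative name, brute-force approach)."""
    return median(walsh_averages(xs))


# n = 3 data points give 3 * 4 / 2 = 6 Walsh averages.
sample = [1.0, 2.0, 10.0]
averages = walsh_averages(sample)
estimate = hodges_lehmann_location(sample)
print(len(averages), estimate)
```

Enumerating every pair takes time and memory quadratic in n, which is acceptable for small datasets but would need a more careful algorithm for large ones.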
The Hodges–Lehmann statistic estimates the population's "pseudo-median",[5] a location parameter that is closely related to the median. The difference between the median and pseudo-median is relatively small, and so this distinction is neglected in elementary discussions. Like the spatial median,[6] the pseudo-median is well defined for all distributions of random variables having dimension two or greater; for one-dimensional distributions, a pseudo-median always exists but need not be unique. Like the median, the pseudo-median is defined even for heavy-tailed distributions that lack any (finite) mean.[7]
The one-sample Hodges–Lehmann statistic need not estimate any population mean, which for many distributions does not exist. The two-sample Hodges–Lehmann estimator need not estimate the difference of two means or the difference of two (pseudo-)medians; rather, it estimates the differences between paired random variables drawn respectively from the two populations.[2]
The Hodges–Lehmann univariate statistics have several generalizations in multivariate statistics:[8]