Hodges–Lehmann estimator

In statistics, the Hodges–Lehmann estimator is a robust and nonparametric estimator of a population's location parameter, the "pseudo-median", which is closely related to the population median. The estimator is used not only for the pseudo-median of a single population but also for estimating the differences between members of two populations. It was proposed in 1963 independently by Pranab Kumar Sen and by Joseph Hodges and Erich Lehmann, and so it is also called the "Hodges–Lehmann–Sen estimator".^[1]

1 Computations
2 Estimating the population median
3 In general statistics
4 See also
5 Notes
6 References

Computations

The Hodges–Lehmann estimator estimates the difference between the values in two sets of data. If the two sets of data contain m and n data points respectively, then their Cartesian product contains m × n pairs of points (one from each set); each such pair defines one difference of values. The Hodges–Lehmann estimator for the difference is defined as the median of the m × n differences.^[2]
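The two-sample computation can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function name `hodges_lehmann_two_sample`, the sample values, and the sign convention (second sample minus first) are assumptions made for the example.

```python
from itertools import product
from statistics import median


def hodges_lehmann_two_sample(xs, ys):
    """Median of all m * n pairwise differences y - x.

    The direction of the difference (y minus x) is an assumption of
    this sketch; swapping it only flips the sign of the estimate.
    """
    # One difference per pair in the Cartesian product of the two samples.
    diffs = [y - x for x, y in product(xs, ys)]
    return median(diffs)


# Hypothetical data: m = 3 and n = 4 points, so 12 differences.
xs = [1.0, 2.0, 3.0]
ys = [4.0, 6.0, 9.0, 10.0]
shift = hodges_lehmann_two_sample(xs, ys)
print(shift)  # 5.5
```

Forming every pair takes time and memory proportional to m × n, which is adequate for small samples; the sketch makes no attempt at a faster algorithm.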

A second type of estimate that has also been called "Hodges–Lehmann" defines a location estimate for a single dataset.^[3]^[4] In this case, if the dataset contains n data points, then there are n(n + 1)/2 unordered pairs of its points, including the pair of each item with itself. For each such pair, the average is computed; the median of these n(n + 1)/2 averages is defined to be the Hodges–Lehmann estimator of location. These pairwise averages are called the "Walsh averages".
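The one-sample computation can be sketched similarly. The function names `walsh_averages` and `hodges_lehmann_one_sample` and the sample values are hypothetical, chosen only for illustration; `itertools.combinations_with_replacement` is used here because it yields each unordered pair once, including each item paired with itself.

```python
from itertools import combinations_with_replacement
from statistics import median


def walsh_averages(data):
    """Averages of all n(n + 1)/2 unordered pairs, self-pairs included."""
    return [(a + b) / 2 for a, b in combinations_with_replacement(data, 2)]


def hodges_lehmann_one_sample(data):
    """Median of the Walsh averages of a single dataset."""
    return median(walsh_averages(data))


# Hypothetical data: n = 4 points give 4 * 5 / 2 = 10 Walsh averages.
sample = [1.0, 2.0, 4.0, 9.0]
averages = walsh_averages(sample)
estimate = hodges_lehmann_one_sample(sample)
print(len(averages))  # 10
print(estimate)       # 3.5
```

For this sample the estimate (3.5) lies between the sample median (3.0) and the sample mean (4.0).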

Estimating the population median

The Hodges–Lehmann statistic estimates the population's "pseudo-median",^[5] a location parameter that is closely related to the median. The difference between the median and pseudo-median is relatively small (the two coincide for symmetric distributions), and so this distinction is neglected in elementary discussions. Like the spatial median,^[6] the pseudo-median is well defined for all distributions of random variables having dimension two or greater; for one-dimensional distributions, a pseudo-median always exists but need not be unique. Like the median, the pseudo-median is defined even for heavy-tailed distributions that lack any (finite) mean.^[7]

The one-sample Hodges–Lehmann statistic need not estimate any population mean, which for many distributions does not exist. The two-sample Hodges–Lehmann estimator need not estimate the difference of two means or the difference of two (pseudo-)medians; rather, it estimates the median of the differences between paired random variables, one drawn from each of the two populations.^[2]
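A small numerical illustration of this distinction, at the level of samples rather than populations, is sketched below. The data and helper names are hypothetical, and the difference is taken as second sample minus first, which is an assumption of the example. For these values the median of the pairwise differences, the difference of the sample medians, and the difference of the one-sample Hodges–Lehmann estimates are all different.

```python
from itertools import combinations_with_replacement, product
from statistics import median


def hl_one_sample(data):
    """Median of the Walsh averages (pairwise means, self-pairs included)."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(data, 2))


def hl_two_sample(xs, ys):
    """Median of all pairwise differences y - x (sign convention assumed)."""
    return median(y - x for x, y in product(xs, ys))


# Hypothetical samples chosen so that the three quantities disagree.
xs = [0, 1, 10]
ys = [0, 5, 6]

median_of_differences = hl_two_sample(xs, ys)
difference_of_medians = median(ys) - median(xs)
difference_of_pseudo_medians = hl_one_sample(ys) - hl_one_sample(xs)

print(median_of_differences)         # 0
print(difference_of_medians)         # 4
print(difference_of_pseudo_medians)  # 1.0
```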

In general statistics

The Hodges–Lehmann univariate statistics have several generalizations in multivariate statistics:^[8]

Multivariate ranks and signs^[9]
Spatial sign tests and spatial medians^[6]
Spatial signed-rank tests^[10]
Comparisons of tests and estimates^[11]
Several-sample location problems^[12]

Notes

1. Lehmann (2006, pp. 176 and 200–201)
2. Everitt (2002), entry for "Hodges-Lehmann estimator"
3. Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-850994-4. Entry for "Hodges-Lehmann one-sample estimator"
4. Hodges & Lehmann (1963)
5. Hettmansperger & McKean (1998, pp. 2–4)
6. Oja (2010, p. 71)
7. Hettmansperger & McKean (1998, pp. 2–4 and 355–356)
8. Oja (2010, pp. 2–3)
9. Oja (2010, p. 34)
10. Oja (2010, pp. 83–94)
11. Oja (2010, pp. 98–102)
12. Oja (2010, pp. 160, 162, and 167–169)

References

Everitt, B. S. (2002). The Cambridge Dictionary of Statistics. CUP. ISBN 0-521-81099-X.
Hettmansperger, T. P.; McKean, J. W. (1998). Robust nonparametric statistical methods. Kendall's Library of Statistics. 5 (First ed., rather than Taylor and Francis (2010) second ed.). London: Edward Arnold. pp. xiv+467. ISBN 0-340-54937-8, 0-471-19479-4. MR 1604954.
Hodges, J. L.; Lehmann, E. L. (1963). "Estimation of location based on ranks". Annals of Mathematical Statistics 34 (2): 598–611. doi:10.1214/aoms/1177704172. JSTOR 2238406. MR 152070. Zbl 0203.21105. PE euclid.aoms/1177704172. http://projecteuclid.org/euclid.aoms/1177704172.
Lehmann, Erich L. (2006). Nonparametrics: Statistical methods based on ranks. With the special assistance of H. J. M. D'Abrera (Reprinting of 1988 revision of 1975 Holden-Day ed.). New York: Springer. pp. xvi+463. ISBN 978-0-387-35212-1, 0-387-35212-0. MR 395032.
Oja, Hannu (2010). Multivariate nonparametric methods with R: An approach based on spatial signs and ranks. Lecture Notes in Statistics. 199. New York: Springer. pp. xiv+232. doi:10.1007/978-1-4419-0468-3. ISBN 978-1-4419-0467-6. MR 2598854.
Sen, Pranab Kumar (December 1963). "On the estimation of relative potency in dilution(-direct) assays by distribution-free methods". Biometrics 19 (4): 532–552. doi:10.2307/2527532. JSTOR 2527532. Zbl 0119.15604.