H-index

From Wikipedia, the free encyclopedia

This article is about the index on scientific prolificacy. For the economic measure, see Herfindahl index.

The h-index is an index for quantifying the scientific productivity of physicists and other scientists based on their publication record. It was suggested in 2005 by Jorge E. Hirsch of the University of California, San Diego.

1 Definition and Purpose
2 Examples of ranking with h index
3 See also
4 References
- 4.1 Notes

Definition and Purpose

The index is calculated based on the distribution of citations received by a given researcher's publications. Hirsch writes:

A scientist has index h if h of his N_p papers have at least h citations each, and the other (N_p - h) papers have at most h citations each.

In other words, a scholar with an index of h has published h papers with at least h citations each.[1] Thus, the h-index is the result of the balance between the number of publications and the number of citations per publication. The index is designed to improve upon simpler measures such as the total number of citations or publications, to distinguish truly influential scientists from those who simply publish many papers. The index is also not affected by single papers that have many citations. The index works properly only for comparing scientists working in the same field; citation conventions differ widely among different fields.
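The definition above can be sketched in a few lines of Python. This is only an illustrative sketch, not Hirsch's own procedure: the function name `h_index` and the sample citation counts are made up for the example.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only decrease from here, so h cannot grow further
    return h


# Hypothetical citation counts, one entry per paper.
print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([1000, 2, 1]))      # 2
```

The second call illustrates the point made in the text: a single paper with very many citations barely moves the index.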

Web-based programs are available to calculate a scientist's h-index directly: H-index calculator and H-index calculator mirror. The h-index can also be determined manually using free Internet databases, such as Google Scholar, and serves as an alternative to more traditional journal impact factor metrics, which are not freely available. Because only the most highly cited articles contribute to the h-index, determining it is a relatively simple process. Hirsch has demonstrated that $h$ has high predictive value for whether or not a scientist has won honors such as National Academy membership or the Nobel Prize. In physics, a moderately productive scientist should have an $h$ equal to the number of years of service, while biomedical scientists tend to have higher values.

Advantages

The main disadvantages of the older bibliometric indicators, such as the total number of papers or the total number of citations, are that the former does not account for the quality of scientific publications, while the latter is disproportionately affected by participation in a single publication of major influence. The h-index is intended to measure simultaneously the quality and sustainability of scientific output, as well as, to some extent, the diversity of scientific research. In particular, the h-index is much less affected by methodological papers proposing successful new techniques, methods or approximations. For example, one of the most cited condensed matter theorists, John P. Perdew, has been very successful in devising new approximations within the widely used density functional theory. He has published 3 papers cited more than 5000 times and 2 cited more than 4000 times. Several thousand papers using density functional theory are published every year, most of them citing at least one paper by Perdew. His total citation count is close to 39,000, while his h-index, 51, is large but not exceptional. In contrast, the condensed matter theorist with the highest h-index (94), Marvin L. Cohen, has a lower citation count of 35,000. One can argue that in this case the h-index reflects the broader impact of Cohen's papers in solid state physics.

The h-index can also be calculated as a function of time, in two different ways. Hirsch originally proposed that h depends linearly on the age of a researcher; in this case the time derivative allows scientists of different ages to be compared. Another possibility is to calculate h using only papers published within a particular time period, for instance within the last 10 years, thus measuring current productivity as opposed to lifetime achievement.
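Both time-dependent variants can be sketched as follows. This is a rough illustration under stated assumptions: the function names, the `(year, citations)` data layout, the sample data, and the exact window rule (a paper counts if it is fewer than `window` years old) are all choices made for the example, not taken from Hirsch's paper. The first function simply divides h by career length, which corresponds to the slope of the linear model.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # The test `count >= rank` holds for the first h ranks and fails afterwards,
    # so counting the ranks that pass it gives h.
    return sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)


def h_per_year(papers, current_year):
    """h divided by the years elapsed since the first publication (assumed rule)."""
    first_year = min(year for year, _ in papers)
    career_years = max(current_year - first_year, 1)  # avoid division by zero
    return h_index([count for _, count in papers]) / career_years


def windowed_h_index(papers, current_year, window=10):
    """h computed only from papers published within the last `window` years."""
    recent = [count for year, count in papers if current_year - year < window]
    return h_index(recent)


# Hypothetical record: (publication year, citations received).
record = [(1990, 50), (1995, 30), (2001, 12), (2003, 9), (2004, 2)]

print(h_per_year(record, current_year=2006))        # 0.25
print(windowed_h_index(record, current_year=2006))  # 2
```

In this made-up record the lifetime h-index is 4 over a 16-year career, while only three papers fall inside the 10-year window and they yield an h of 2.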

Criticism

It is not difficult to come up with situations in which $h$ may provide misleading information about a scientist's output. Most importantly, the fact that $h$ is bounded by the total number of publications means that scientists with a short career are at an inherent disadvantage, regardless of the importance of their discoveries. For example, Évariste Galois's h-index is 2, and will remain so forever. Had Albert Einstein died in early 1906, his h-index would be stuck at 4 or 5, despite his being widely acknowledged as one of the most important physicists, even considering only his publications to that date.

Additionally, some potential drawbacks of the impact factor apply equally to the h-index. For example, review articles are usually cited more often than original articles, so a hypothetical author who wrote only review articles would have a higher h-index than authors who actually contributed original research.

Furthermore, it has been pointed out that while the h-index de-emphasizes singular successful publications in favor of sustained productivity, it does so too strongly. Indeed, two scientists may have the same h-index, say $h = 30$, but one has 10 papers cited more than 200 times and the other has none. Clearly the scientific output of the former is more valuable. Several recipes to correct for this have been proposed, but none has gained universal recognition.
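The situation described here is easy to reproduce with invented numbers. The sketch below builds two hypothetical citation records, each with 30 papers, that share $h = 30$ even though one contains 10 papers cited 250 times; the function name and all counts are assumptions made for the illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # `count >= rank` is true for exactly the first h ranks.
    return sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)


# Hypothetical scientist A: 10 heavily cited papers plus 20 solid ones.
scientist_a = [250] * 10 + [30] * 20
# Hypothetical scientist B: 30 papers, none cited more than 30 times.
scientist_b = [30] * 30

print(h_index(scientist_a), sum(scientist_a))  # 30 3100
print(h_index(scientist_b), sum(scientist_b))  # 30 900
```

The index is identical for both records, while the total citation counts differ by more than a factor of three, which is the loss of information the criticism points to.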

The h-index is also affected by implementation issues. Some automated searching processes find citations to papers going back many years, while others will only yield recent papers or citations. This issue is less important for people whose publication record started after automated indexing began around 1990. A manual search of a citation source may identify a substantial number of citations that are not quite correct, and were therefore not automatically matched to the correct paper.

General problems associated with any bibliometric index, namely the necessity to measure scientific impact by one number, apply here as well. For instance, comparing two condensed matter theorists with the highest h-index, Marvin Cohen and Philip Anderson, we observe that they have the same h-index within 3%, although the latter is a Nobel Prize winner and a founder of entire new fields in condensed matter theory.

Essentially, while the h-index is an interesting attempt to 'measure' scientific productivity, it should be obvious that attempting to condense a human activity as complex as the formal acquisition of knowledge down to a single numeric metric must lose almost all of the important information about that person's endeavours in the process. Two general dangers are present here:

- Career progression and other aspects of a person's life may be damaged by the use of a simple metric in a decision-making process by someone who has neither the time nor the intelligence to consider more appropriate decision metrics.
- Scientists may respond to this by maximising their h-index to the detriment of doing more justifiable work. This effect of using simple metrics for making management decisions has often been found to be an unintended consequence of metric-based decision making; for instance, governments routinely operate policies designed to minimise not crime, but crime figures.

Modifications of h-index and m value

Proposals to modify the h-index in order to emphasize different features have been made.[2] A modification of the h-index (actually of the m-value) has been proposed for assessing clinical scientists who spend a large proportion of their time treating patients. It is called the v-index.^[1]

Examples of ranking with h index

Some physicists with high h-indices

Based on the SPIRES HEP database (particle and high-energy physics, as of August 2005):

Edward Witten: h = 110 (132 as of December 2005)
John Ellis: h = 101
Steven Weinberg: h = 88
Dimitri Nanopoulos: h = 86
Cumrun Vafa: h = 85

Based on the ISI Web of Science, according to the original paper (all physics):

Alan J. Heeger: h = 107
Marvin L. Cohen: h = 94
Arthur C. Gossard: h = 94
Philip W. Anderson: h = 91
Manuel Cardona: h = 90

Based on the ISI Web of Knowledge (all fields):

Watt W. Webb: h = 72
Klaus Ploog: h = 71
Arthur Gossard: h = 64
Daniel Chemla: h = 61
David A. B. Miller: h = 58

Physical chemists from Berkeley and Stanford with high h-indices

The following is a list of some American physical chemists with high h-indices. It has been compiled from "The Everyday Scientist" and ISI Web of Science, using data for only professors at Stanford and U.C. Berkeley:

Richard N. Zare: h = 95
Gabor A. Somorjai: h = 90
Harden M. McConnell: h = 89
Graham R. Fleming: h = 75
Richard A. Mathies: h = 68

Biologists with high h-indices

An initial attempt has been made to apply the physics-oriented h-index to the life sciences (a blanket term for biology, botany, medicine, and so forth). Only ten names were listed in the reference, but a more thorough attempt would certainly extend the list. The following list is based on publications from 1983 to 2002.

Solomon H. Snyder: h = 191
David Baltimore: h = 160
Robert Gallo: h = 154
Pierre Chambon: h = 153
Bert Vogelstein: h = 151

Computer scientists with high h-indices

The following are six computer scientists with high h-indices (from http://www.cs.ucla.edu/~palsberg/h-number.html).

Hector Garcia-Molina: h = 70
Deborah Estrin: h = 68
Scott Shenker: h = 65
Jeffrey D. Ullman: h = 65
Don Towsley: h = 65
Robert Tarjan: h = 64

Economics researchers with high h-indices

The following are five economics researchers with high h-indices as measured by the University of Connecticut's RePEc Author Service (as of December 2006)[3]:

Scientists in other fields with high h-indices

Marcus Raichle: h = 89 (Neurologist and neuroscientist)
Endel Tulving: h = 65 (Cognitive psychologist)
Daniel Schacter: h = 64 (Cognitive psychologist)

See also

Bibliometrics
H-index calculator
A Rational Indicator of Scientific Creativity
Erdős number
A simple web script to compute a (raw) h-index based on Google Scholar
The H-index for computer science
A MATLAB script to compute the h-index
h-b index
Publish or Perish calculates various statistics, including the h-index and the g-index using Google Scholar data
The HView visualizer showing a sorted histogram of citations showing the h-number as the biggest square included in the histogram
Yet another web script highlighting the article(s) to cite to raise the h-number

References

Hirsch, Jorge E., (2005), "An index to quantify an individual's scientific research output". Retrieved from arXiv February 13, 2006.
Sidiropoulos A., Katsaros D. and Manolopoulos Y., (2006), Generalized h-index for disclosing latent facts in citation networks.
Kelly C. D. and Jennions M.D. (2006), The h index and career assessment by numbers: a paper expounding certain problems of the h-index.
"Impact factor," Science 309:1181, 19 August 2005.
"An index to quantify an individual's scientific research output," PNAS 102(46):16569-16572, November 15 2005.
H values for Stanford p-chem professors from "The Everyday Scientist"
Lehmann S. L., Lautrup B. E., and Jackson A. D. (December 2006). "Measures for measures". Nature 444 (7122): 1003–1004. DOI:10.1038/4441003a.

Notes

^ Jayant S Vaidya (December 2005). "V-index: A fairer index to quantify an individual's research output capacity". BMJ 331: 1339-c-1340-c.
