Author Name Disambiguation

Author name disambiguation is a type of record linkage applied to scholarly documents, where the goal is to find all mentions of the same author and cluster them together. Authors of scholarly documents often share names, which makes it hard to distinguish each author's work. Author name disambiguation therefore aims to find all publications that belong to a given author and to distinguish them from the publications of other authors who share the same name.

Author names can be ambiguous for several reasons. For example, an individual may publish under multiple names because of spelling variants, misspellings, a name change due to marriage, or inconsistent use of middle names and initials.[1]
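As an illustration of how surface variants of one name can be brought together, the following minimal Python sketch reduces a name to a "first initial plus last name" key. The function name `blocking_key` and the sample names are hypothetical and do not come from the cited works. The sketch only covers formatting variants such as initials, "Last, First" ordering, and diacritics; it does not handle misspellings or name changes.

```python
import re
import unicodedata


def blocking_key(name: str) -> str:
    """Reduce a personal name to 'first-initial last-name' in lowercase ASCII."""
    # Decompose accented characters and drop the combining marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Reorder "Last, First" into "First Last".
    if "," in ascii_name:
        last, _, first = ascii_name.partition(",")
        ascii_name = f"{first} {last}"
    # Keep alphabetic tokens only, which also discards periods after initials.
    tokens = re.findall(r"[a-z]+", ascii_name.lower())
    if not tokens:
        return ""
    if len(tokens) == 1:
        return tokens[0]
    return f"{tokens[0][0]} {tokens[-1]}"


variants = ["Jane Q. Smith", "Smith, J.", "J. Smith"]
keys = {blocking_key(v) for v in variants}
```

All three variants map to the same key. Distinct people named, say, John Smith and Jane Smith would also share that key, which is why such a key can only group candidate mentions and cannot by itself decide who wrote what.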

Typical approaches to author name disambiguation rely on information about the authors, such as their affiliations, email addresses, years of publication, co-authors, and topic information, to distinguish between authors. This information can be used to train a machine learning classifier that decides whether two author mentions refer to the same author.[2][3] Other approaches use heuristics to distinguish between authors.
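The pairwise decision described above can be sketched in Python as follows. This is a minimal illustration of the heuristic variant, not the method of the cited papers: the `Mention` record, the feature names, the weights, and the thresholds are all assumptions chosen for the example. In a learned approach, the output of `pair_features` would instead be passed to a trained classifier. Pairs judged to be the same author are then merged into clusters with a simple union-find.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    """One author mention on one paper (hypothetical record layout)."""
    paper_id: str
    name: str
    affiliation: str = ""
    email: str = ""
    year: int = 0
    coauthors: frozenset = frozenset()


def jaccard(a, b) -> float:
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def pair_features(m1: Mention, m2: Mention) -> dict:
    """Similarity features for a pair of mentions."""
    return {
        "same_email": float(bool(m1.email) and m1.email.lower() == m2.email.lower()),
        "affil_overlap": jaccard(m1.affiliation.lower().split(),
                                 m2.affiliation.lower().split()),
        "coauthor_overlap": jaccard(m1.coauthors, m2.coauthors),
        "year_gap": float(abs(m1.year - m2.year)),
    }


def same_author(m1: Mention, m2: Mention) -> bool:
    """Heuristic decision; the weights and thresholds are illustrative only."""
    f = pair_features(m1, m2)
    if f["same_email"]:
        return True
    score = 0.5 * f["affil_overlap"] + 0.5 * f["coauthor_overlap"]
    return score >= 0.3 and f["year_gap"] <= 15


def cluster(mentions):
    """Merge mentions linked by same_author() using union-find."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if same_author(mentions[i], mentions[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i, m in enumerate(mentions):
        groups.setdefault(find(i), []).append(m.paper_id)
    return sorted(groups.values())


mentions = [
    Mention("p1", "J. Smith", "University of Example", "js@example.org", 2010,
            frozenset({"A. Lee", "B. Chen"})),
    Mention("p2", "John Smith", "Example University", "", 2012,
            frozenset({"A. Lee"})),
    Mention("p3", "J. Smith", "Institute of Other Studies", "", 2011,
            frozenset({"C. Park"})),
]
clusters = cluster(mentions)
```

With this made-up data, `p1` and `p2` share affiliation words and a co-author and end up in one cluster, while `p3` remains separate. Comparing every pair is quadratic in the number of mentions, so a practical system would first restrict comparisons to mentions with similar names.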

References

  1. "Author name disambiguation". Annual Review of Information Science and Technology. doi:10.1002/aris.2009.1440430113. Retrieved 2015-04-20.
  2. Treeratpituk, Pucktada; Giles, C. Lee (2009). Disambiguating authors in academic publications using random forests (PDF). Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM. pp. 39–48. CiteSeerX 10.1.1.147.3500. doi:10.1145/1555400.1555408.
  3. Khabsa, Madian; Treeratpituk, Pucktada; Giles, C. Lee (2014). Large scale author name disambiguation in digital libraries (PDF). Proceedings of the IEEE International Conference on Big Data. IEEE. pp. 41–42. CiteSeerX 10.1.1.687.6830. doi:10.1109/BigData.2014.7004487.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.