eTBLAST
From Wikipedia, the free encyclopedia
eTBLAST is a text similarity search engine currently offering access to the MEDLINE database, the National Institutes of Health (NIH) CRISP database, the Institute of Physics (IOP) database, and the NASA technical reports database. It is continuously expanding with additional text-based databases. The eTBLAST server compares a user's natural text query to target databases using a hybrid search algorithm consisting of a low-sensitivity weighted keyword-based first pass followed by a novel sentence-alignment based second pass. eTBLAST is a free web-based service of The Innovation Laboratory at the University of Texas Southwestern Medical School.
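The two-pass design described above can be illustrated with a minimal sketch. This is not eTBLAST's actual implementation: the function names, the IDF-style keyword weighting, the naive sentence splitter, and the use of Python's `difflib.SequenceMatcher` as a stand-in for the sentence-alignment step are all assumptions made for illustration. A cheap keyword score shortlists candidates, and only that shortlist is rescored with the more expensive sentence-level comparison.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an illustrative assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def idf_weights(docs):
    """Smoothed inverse document frequency for every term in the collection."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((n + 1) / (count + 1)) + 1.0 for term, count in df.items()}


def keyword_score(query, doc, idf):
    """First pass: sum of the weights of terms shared by query and document."""
    shared = set(tokenize(query)) & set(tokenize(doc))
    return sum(idf.get(term, 0.0) for term in shared)


def alignment_score(query, doc):
    """Second pass: for each query sentence, take the best word-level match
    ratio against any document sentence, then average over query sentences.
    SequenceMatcher is a stand-in for a real sentence-alignment algorithm."""
    doc_sents = [tokenize(s) for s in split_sentences(doc)]
    query_sents = [tokenize(s) for s in split_sentences(query)]
    if not doc_sents or not query_sents:
        return 0.0
    best = [max(SequenceMatcher(None, q, d).ratio() for d in doc_sents)
            for q in query_sents]
    return sum(best) / len(best)


def hybrid_search(query, docs, first_pass_k=3):
    """Return (alignment_score, doc_index) pairs for the first-pass shortlist,
    best match first."""
    idf = idf_weights(docs)
    ranked = sorted(range(len(docs)),
                    key=lambda i: keyword_score(query, docs[i], idf),
                    reverse=True)
    candidates = ranked[:first_pass_k]
    rescored = [(alignment_score(query, docs[i]), i) for i in candidates]
    return sorted(rescored, reverse=True)
```

Calling `hybrid_search(query, docs, first_pass_k=2)` on a small list of abstracts returns at most two scored candidates. The `first_pass_k` cutoff is the hypothetical knob that trades recall for speed.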
eTBLAST, as a text similarity engine, made possible a large study of duplicate publications and potential plagiarism in the biomedical literature. Thousands of randomly sampled MEDLINE abstracts were submitted to eTBLAST, and those with the highest similarity to other records were studied and entered into an online database. This study is ongoing, with the database maturing as the entries are manually inspected and classified. The work revealed several trends, including an increasing rate of duplication in the biomedical literature, as reported in the journals Bioinformatics and Nature.
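The sampling procedure described above can be sketched roughly as follows. The similarity function (a word-level `difflib.SequenceMatcher` ratio), the `threshold` value, and the names `similarity` and `flag_candidates` are illustrative assumptions, not details of the published study. The sketch draws a random sample of abstracts, finds each one's most similar neighbour, and keeps the pairs that score above a threshold as candidates for manual inspection.

```python
import random
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity ratio in [0, 1]; a stand-in for the engine's score."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def flag_candidates(abstracts, sample_size, threshold=0.8, seed=0):
    """Sample abstract IDs, find each one's nearest neighbour in the collection,
    and return (query_id, match_id, score) for pairs at or above the threshold."""
    rng = random.Random(seed)
    sample_ids = rng.sample(sorted(abstracts), min(sample_size, len(abstracts)))
    flagged = []
    for qid in sample_ids:
        best_id, best_score = None, 0.0
        for other_id, text in abstracts.items():
            if other_id == qid:
                continue  # never compare an abstract with itself
            score = similarity(abstracts[qid], text)
            if score > best_score:
                best_id, best_score = other_id, score
        if best_id is not None and best_score >= threshold:
            flagged.append((qid, best_id, round(best_score, 3)))
    return flagged
```

The flagged pairs are only candidates. As in the study, deciding whether a pair is a true duplicate would still require manual inspection.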
Interface
Because eTBLAST is a text-similarity engine rather than a simple keyword-based search tool, it is claimed that the user need not identify and manipulate query keywords and Boolean operators, as must be done for other search engines.
eTBLAST aims to help the user rapidly find references, evaluate novelty, find experts and journals in a given topical area, and track the popularity of the topic as defined by the user's query.
A typical query of 100 words takes 1 to 2 minutes to return results after a comparison to MEDLINE, which as of January 1, 2007 contained over 16 million records.
References
- Mounir Errami and Harold R. Garner, A tale of two citations. Nature, 2008 Jan 24;451(7177):397-9.
- Mounir Errami, Justin M. Hicks, Wayne Fisher, David Trusty, Tara C. Long, Jonathan D. Wren and Harold R. Garner, Déjà vu--a study of duplicate citations in Medline. Bioinformatics, 2008 Jan 15;24(2):243-9.
- Mounir Errami, Jonathan D. Wren, Justin M. Hicks and Harold R. Garner, eTBLAST: a web server to identify expert reviewers, appropriate journals and similar publications. Nucleic Acids Research, 2007 Apr.
- James Lewis, Stephan Ossowski, Justin Hicks, Mounir Errami and Harold R. Garner, Text similarity: an alternative way to search MEDLINE. Bioinformatics, 2006 Sep 15;22(18):2298-304.
- eTBLAST was highlighted in the NetWatch column in Science, May 14, 2004, http://www.sciencemag.org/content/vol304/issue5673/netwatch.shtml
- Alexander Pertsemlidis and Harold R. Garner, Text comparison based on dynamic programming. IEEE Engineering in Medicine and Biology Magazine, 2004 Nov-Dec;23(6):66-71.