National Centre for Text Mining
The National Centre for Text Mining (NaCTeM) is the world’s first publicly funded text mining (TM) centre. It was established to provide support, advice, and information on TM technologies and to disseminate information from the larger TM community, while also providing tailored services and tools in response to the requirements of the local British academic community.
The software tools and services that NaCTeM supplies allow researchers to apply text mining techniques to problems within their areas of interest; examples of these tools are highlighted below. In addition to providing services, the centre is also involved in, and makes significant contributions to, the text mining research community both nationally and internationally.
The Centre is located in the Manchester Interdisciplinary Biocentre and is operated by two universities. The School of Computer Science at the University of Manchester [1] leads the consortium and contributes expertise in information extraction, natural language processing, and parallel and distributed data mining. The Special Collections and Archives section of the University of Liverpool [2] library contributes experience and expertise in information retrieval systems.
In addition to the main consortium partners, there are a number of self-funded associates from world-leading groups, including the San Diego Supercomputer Center’s Data Intensive Computing group, led by Reagan Moore; the University of California, Berkeley [3] information retrieval specialists, led by Ray Larson; and the Tsujii Lab at the University of Tokyo. Professor Tsujii's lab is a world-leading research group in the field of computational linguistics, especially in the application of such technology to the biomedical domain.
Applications
TerMine is a domain-independent method for automatic term recognition that finds the most important terms in a document and automatically ranks them.
Cheshire-Termine is a search engine for Medline documents that generates lists of significant terms relevant to the search results. It combines the best features of browse and search techniques for drilling down to a more focused set of documents.
AcroMine finds all known expanded forms of acronyms as they have appeared in Medline entries. Conversely, it can be used to find possible acronyms of expanded forms as they have previously appeared in Medline, and it disambiguates them.
Medie is an intelligent search engine for the semantic retrieval of sentences containing biomedical correlations from Medline abstracts.
Info-PubMed provides information on, and graphical representations of, biomedical interactions extracted from Medline using deep semantic parsing technology. This is supplemented by a term dictionary of over 200,000 protein/gene names and by the identification of disease types and organisms.