Text analytics
From Wikipedia, the free encyclopedia
Text analytics is a form of information extraction: its goal is to automatically extract structured or semi-structured information from unstructured, machine-readable documents.
A typical application is to scan a set of documents written in a natural language and populate a database or search index with the extracted information. Current approaches to text analytics use natural language processing (NLP) techniques tailored to specialized domains.
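The scan-and-populate application described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a description of any particular system. The regular expressions `DATE_RE` and `MONEY_RE`, the functions `extract` and `populate`, and the `facts` table are all hypothetical, and real text analytics systems rely on far richer NLP models than two hand-written patterns. The sketch uses only the standard-library `re` and `sqlite3` modules.

```python
import re
import sqlite3

# Hypothetical patterns for two kinds of expressions that extractors often target:
# ISO-style dates (temporal) and dollar amounts (numerical).
# Real systems use much richer grammars or trained models.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
MONEY_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")


def extract(doc_id, text):
    """Yield (doc_id, kind, value) tuples found in one unstructured document."""
    for m in DATE_RE.finditer(text):
        yield (doc_id, "DATE", m.group())
    for m in MONEY_RE.finditer(text):
        yield (doc_id, "MONEY", m.group())


def populate(conn, documents):
    """Scan every document and store the extracted facts as structured rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts (doc_id TEXT, kind TEXT, value TEXT)"
    )
    for doc_id, text in documents.items():
        conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", extract(doc_id, text))
    conn.commit()


if __name__ == "__main__":
    docs = {
        "memo1": "On 2008-03-14 the board approved a budget of $1,250,000.",
        "memo2": "The contract signed 2008-04-01 is worth $300.50.",
    }
    conn = sqlite3.connect(":memory:")
    populate(conn, docs)
    for row in conn.execute("SELECT doc_id, kind, value FROM facts ORDER BY doc_id, kind"):
        print(row)
```

Once the facts are in a table, ordinary SQL queries (for example, all amounts mentioned in a given memo) replace searching through the raw text.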
Typical subtasks are:
- Named entity recognition: recognition of entity names (such as those of people and organizations), place names, temporal expressions, and certain types of numerical expressions.
- Coreference: identification of chains of noun phrases that refer to the same object. Anaphora, in which an expression such as a pronoun refers back to an earlier one, is one type of coreference.
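Both subtasks can be illustrated with a toy Python sketch. It finds named entities by dictionary (gazetteer) lookup, then resolves pronouns, a common form of anaphora, by linking each one to the nearest preceding entity of a compatible type. The gazetteer, the pronoun-to-type table, and the functions `find_mentions` and `coreference_chains` are hypothetical. This most-recent-antecedent heuristic is a deliberately simplified stand-in for the statistical and rule-based resolvers used in practice, and it will mislink pronouns in many ordinary sentences.

```python
import re

# Hypothetical gazetteer: NER by exact dictionary lookup (a toy stand-in for trained recognizers).
GAZETTEER = {
    "Alice Smith": "PERSON",
    "Acme Corp": "ORGANIZATION",
    "Paris": "LOCATION",
}
# Assumed mapping from pronouns to the entity types they may refer to.
PRONOUN_TYPES = {
    "she": {"PERSON"}, "her": {"PERSON"}, "he": {"PERSON"}, "him": {"PERSON"},
    "it": {"ORGANIZATION", "LOCATION"}, "its": {"ORGANIZATION", "LOCATION"},
}


def find_mentions(text):
    """Return (start, surface, type) for gazetteer names and pronouns, in text order."""
    mentions = []
    for name, etype in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            mentions.append((m.start(), name, etype))
    for m in re.finditer(r"\b\w+\b", text):
        if m.group().lower() in PRONOUN_TYPES:
            mentions.append((m.start(), m.group(), "PRONOUN"))
    return sorted(mentions)


def coreference_chains(text):
    """Map each named entity to its chain of mentions, linking every pronoun
    to the nearest preceding entity whose type the pronoun allows."""
    chains = {}
    seen = []  # named entities in the order they appear
    for _, surface, mtype in find_mentions(text):
        if mtype != "PRONOUN":
            seen.append((surface, mtype))
            chains.setdefault(surface, []).append(surface)
            continue
        allowed = PRONOUN_TYPES[surface.lower()]
        for name, etype in reversed(seen):
            if etype in allowed:
                chains[name].append(surface)
                break  # pronouns with no compatible antecedent are left unlinked
    return chains


if __name__ == "__main__":
    print(coreference_chains("Alice Smith visited Paris for Acme Corp. She said it was hiring."))
```

On the sample sentence, "She" joins the chain for Alice Smith and "it" joins the chain for Acme Corp. If the sentence is reordered so that Paris is mentioned last ("Alice Smith visited Acme Corp in Paris."), the same heuristic attaches "it" to Paris instead, which is why practical resolvers draw on syntactic and semantic features rather than recency alone.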
See also
- Noisy text analytics
- Information extraction
- Computational linguistics
- Natural language processing
- Named entity recognition
- Text mining
External links
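- MUC (Message Understanding Conferences): http://www.itl.nist.gov/iaui/894.02/related_projects/muc/
- ACE (Automatic Content Extraction), Linguistic Data Consortium: http://projects.ldc.upenn.edu/ace/
- ACE (Automatic Content Extraction), NIST: http://www.itl.nist.gov/iad/894.01/tests/ace/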
Commercial software and applications
- TEMIS: a software vendor offering information discovery solutions for the information intelligence needs of businesses.