Piranha (software)
Piranha is a text mining system developed for the United States Department of Energy (DOE) by Oak Ridge National Laboratory (ORNL). The software processes large volumes of unrelated free-text documents and shows the relationships among them, a technique valuable across numerous scientific and data domains, from health care fraud to national security. The results are presented to business and government analysts as clusters prioritized by relevance. Piranha has six main strengths:
- Collecting and extracting: Millions of documents can be collected from numerous sources, such as databases and social media, and text can be extracted from hundreds of file formats. This information can then be translated into any number of languages.
- Storing and indexing: Documents can be stored and indexed as needed in search servers, relational databases, etc.
- Recommending: The most valuable information is recommended to particular users.
- Categorizing: Items are grouped using supervised and semi-supervised machine learning methods and targeted search lists.
- Clustering: Similarity is used to create a hierarchical grouping of documents.
- Visualizing: Relationships among documents are shown so that users can quickly recognize connections.
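The clustering strength, in which similarity is used to build a hierarchy of documents, can be illustrated with a small sketch. This is not Piranha's implementation, which the article does not describe. It is a generic agglomerative clustering over bag-of-words vectors with cosine similarity, and the function names (`vectorize`, `cosine`, `cluster`) and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs):
    """Agglomerative clustering: repeatedly merge the two most similar groups.

    Returns a nested tuple of document indices describing the hierarchy,
    or None when no documents are given.
    """
    if not docs:
        return None
    # Each group is (tree of indices, summed term vector of its members).
    groups = [(i, vectorize(d)) for i, d in enumerate(docs)]
    while len(groups) > 1:
        # Pick the pair of groups whose summed vectors are most similar.
        i, j = max(
            ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
            key=lambda p: cosine(groups[p[0]][1], groups[p[1]][1]),
        )
        merged = ((groups[i][0], groups[j][0]), groups[i][1] + groups[j][1])
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups[0][0]


# Hypothetical sample documents, chosen only for illustration.
docs = [
    "fraud detected in health care billing claims",
    "health care billing fraud claims investigated",
    "border security threat report",
    "national security threat assessment report",
]
tree = cluster(docs)
print(tree)  # ((0, 1), (2, 3))
```

In this toy run the two health care fraud documents merge first, then the two security documents, and the two groups join last. A system working at the scale described here would presumably need weighted terms and a far more scalable algorithm than this quadratic pairwise search.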
This work has resulted in four issued patents (7,072,883; 7,315,858; 7,693,903; 7,805,446) and four pending patents, several commercial licenses (including Pro2Serve and TextOre), a spin-off company (Global Security Information Analysts LLC, GSIA), an R&D 100 Award, and scores of peer-reviewed research publications.
Awards
- 2007 R&D 100 Award, R&D Magazine
Patents
- U.S. Patent 7,072,883 – System for gathering and summarizing internet information
- U.S. Patent 7,315,858 – Method for gathering and summarizing internet information
- U.S. Patent 7,693,903
- U.S. Patent 7,805,446 – Agent-based method for distributed clustering of textual information
- U.S. Patent 7,937,389 – Dynamic reduction of dimensions of a document vector in a document search and retrieval system
- U.S. Patent 8,473,314 – Method and system for determining precursors of health abnormalities from processing medical records
External links
- DOE Energy Innovation Portal (2014) Agent-Based Software for Gathering and Summarizing Textual and Internet Information.
- ORNL Piranha website