Feature Selection Toolbox

A screenshot shows the full user interface of FST1. At left is a log window with feature selection results. At center right is a result table window. At bottom right is a graphic projection of data and mixture model components. On top of it is the dialog for setting parameters of optimal subset search methods.
Developer(s): UTIA, Czech Academy of Sciences
Stable release: 3.1.1 / 9 September 2012
Written in: C++
Operating system: Cross-platform (v3)
Type: Machine learning, pattern recognition
License: Free for non-commercial use
Website: fst.utia.cz

Feature Selection Toolbox (FST) is software primarily for feature selection in machine learning,[1] written in C++ and developed at the Institute of Information Theory and Automation (UTIA) of the Czech Academy of Sciences.

Version 1

The first generation of Feature Selection Toolbox (FST1) was a Windows application with a user interface that let users apply several sub-optimal, optimal and mixture-based feature selection methods to data stored in a simple proprietary text-based flat-file format.[2]

Version 3

The third generation of Feature Selection Toolbox (FST3) was a library without a user interface, written to be more efficient and versatile than the original FST1.[3]

FST3 supports several standard data mining tasks, specifically data preprocessing and classification, but its main focus is feature selection. For feature selection it implements both common and less common techniques, with particular emphasis on threaded implementations of various sequential search methods (a form of hill-climbing). Implemented methods include individual feature ranking, floating search, oscillating search (suitable for very high-dimensional problems) in randomized or deterministic form, optimal methods of the branch and bound type, probabilistic class distance criteria, various classifier accuracy estimators, feature subset size optimization, feature selection with pre-specified feature weights, criteria ensembles, hybrid methods, detection of all equivalent solutions, and two-criterion optimization. FST3 is more narrowly specialized than popular software such as the Waikato Environment for Knowledge Analysis (Weka), RapidMiner or PRTools.[4]
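To make the idea of sequential search as hill-climbing concrete, the following is a minimal sketch of plain sequential forward selection, the simplest member of that family. It is not FST3 code and does not use the FST3 API: the names `Criterion` and `sequential_forward_selection` are hypothetical, and the criterion is assumed to be any function that scores a candidate feature subset, with higher scores being better. Floating and oscillating search extend this basic step with backtracking, which the sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical criterion type (an assumption, not an FST3 type):
// scores a candidate feature subset, higher is better.
using Criterion = std::function<double(const std::vector<std::size_t>&)>;

// Plain sequential forward selection: start from the empty subset and
// greedily add, one at a time, the feature that maximizes the criterion.
std::vector<std::size_t> sequential_forward_selection(
    std::size_t num_features, std::size_t target_size,
    const Criterion& criterion) {
    assert(target_size <= num_features);

    std::vector<std::size_t> selected;
    std::vector<bool> used(num_features, false);

    while (selected.size() < target_size) {
        double best_score = -std::numeric_limits<double>::infinity();
        std::size_t best_feature = num_features;  // sentinel: none found yet

        for (std::size_t f = 0; f < num_features; ++f) {
            if (used[f]) continue;
            selected.push_back(f);               // tentatively add feature f
            const double score = criterion(selected);
            selected.pop_back();                 // undo the tentative add
            if (score > best_score) {
                best_score = score;
                best_feature = f;
            }
        }

        if (best_feature == num_features) break;  // no candidate could be scored
        used[best_feature] = true;
        selected.push_back(best_feature);         // commit the best single step
    }
    return selected;
}
```

Each pass evaluates the criterion once per unused feature, so selecting d of D features costs on the order of d·D criterion evaluations. Because the search never revisits an earlier choice, it can stop at a subset that is only locally best.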

By default, the techniques implemented in the toolbox assume that the data is available as a single flat file, either in a simple proprietary format or in Weka's ARFF format, where each data point is described by a fixed number of numeric attributes. FST3 is provided without a user interface and is intended for users familiar with both machine learning and C++ programming. The older FST1 software is more suitable for simple experiments or educational purposes because it can be used without writing any C++ code.
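As an illustration of the kind of input described above, here is a minimal sketch of reading an ARFF file whose attributes are all numeric. It is not the FST3 loader: `NumericDataset` and `read_numeric_arff` are hypothetical names, and the sketch deliberately ignores parts of ARFF that a real reader would need, such as nominal (class) attributes, quoted attribute names, missing values marked with `?`, and the sparse data format.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container (an assumption, not an FST3 type).
struct NumericDataset {
    std::vector<std::string> attribute_names;
    std::vector<std::vector<double>> rows;  // one fixed-width row per data point
};

static std::string lowercase(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Reads a dense ARFF stream in which every attribute is numeric.
// Throws std::runtime_error on non-numeric attributes or ragged rows;
// std::stod throws on fields that are not numbers.
NumericDataset read_numeric_arff(std::istream& in) {
    NumericDataset data;
    std::string line;
    bool in_data = false;

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        const std::size_t first = line.find_first_not_of(" \t");
        // Skip blank lines and '%' comment lines.
        if (first == std::string::npos || line[first] == '%') continue;

        if (!in_data) {
            std::istringstream header(line);
            std::string keyword, name, type;
            header >> keyword;
            keyword = lowercase(keyword);  // ARFF keywords are case-insensitive
            if (keyword == "@attribute") {
                header >> name >> type;
                type = lowercase(type);
                if (type != "numeric" && type != "real" && type != "integer")
                    throw std::runtime_error("non-numeric attribute: " + name);
                data.attribute_names.push_back(name);
            } else if (keyword == "@data") {
                in_data = true;
            }
            continue;  // other header lines, such as @relation, are ignored
        }

        // Data section: one comma-separated row of numbers per line.
        std::vector<double> row;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ','))
            row.push_back(std::stod(field));
        if (row.size() != data.attribute_names.size())
            throw std::runtime_error("row width does not match attribute count");
        data.rows.push_back(row);
    }
    return data;
}
```

The width check enforces the "fixed number of numeric attributes" assumption directly: any row with a different number of values is rejected rather than padded or truncated.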

References

  1. Petr Somol; Jana Novovičová; Pavel Pudil (2010). "Efficient Feature Subset Selection and Subset Size Optimization" (PDF). Pattern Recognition Recent Advances, INTECH, ISBN 978-953-7619-90-9. pp. 75–97.
  2. Petr Somol; Pavel Pudil (2002). "Feature Selection Toolbox" (PDF). Pattern Recognition, vol. 35, no. 12, Elsevier. pp. 2749–2759.
  3. Petr Somol; Pavel Vácha; Stanislav Mikeš; Jan Hora; Pavel Pudil; Pavel Žid (2010). "Introduction to Feature Selection Toolbox 3 – The C++ Library for Subset Search, Data Modeling and Classification" (PDF). UTIA Tech. Report No. 2287. pp. 1–12. Retrieved 2010-11-02.
  4. PRTools

Official website: fst.utia.cz

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.