Scikit-learn

From Wikipedia, the free encyclopedia

scikit-learn
Original author(s): David Cournapeau
Initial release: June 2007[1]
Stable release: 0.13.1 (February 23, 2013)
Preview release: 0.14a1 (July 29, 2013)
Written in: Python, Cython, C and C++
Operating system: Linux, Mac OS X, Microsoft Windows
Type: Library for machine learning
License: BSD License
Website: scikit-learn.org

scikit-learn (formerly scikits.learn) is an open-source machine learning library for the Python programming language.[2] It features various classification, regression and clustering algorithms, including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
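As a rough illustration of this interoperability, the sketch below fits a logistic regression classifier and a k-means clustering on a plain NumPy array. The toy data, variable names and parameter values are assumptions made for the example, and the imports follow recent scikit-learn releases rather than the 0.13 series described here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy data (assumed for illustration): two well-separated 2-D blobs.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 2)),   # blob around (0, 0)
    rng.normal(5.0, 0.5, size=(20, 2)),   # blob around (5, 5)
])
y = np.array([0] * 20 + [1] * 20)

# Supervised learning: estimators take NumPy arrays and return NumPy arrays.
clf = LogisticRegression().fit(X, y)
pred = clf.predict(np.array([[0.0, 0.0], [5.0, 5.0]]))

# Unsupervised learning: cluster the same array without using the labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_labels = km.labels_
```

Both estimators accept the array directly and expose their results (`pred`, `cluster_labels`) as NumPy arrays, so they can be passed on to other NumPy or SciPy routines without conversion.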

Overview

The scikit-learn project started as scikits.learn, a Google Summer of Code project by David Cournapeau. Its name stems from the notion that it is a "SciKit" (SciPy Toolkit), a separately developed and distributed third-party extension to SciPy. The original codebase was later extensively rewritten by other developers. Of the various scikits, scikit-learn and scikit-image were described as "well-maintained and popular" in November 2012.[3]

As of 2013, scikit-learn is under active development and is sponsored by INRIA and occasionally Google (through the Google Summer of Code).[4] Its users include Evernote, which uses a naive Bayes classifier from the library to distinguish recipes from other user posts,[5] and Mendeley, which builds recommender systems from scikit-learn's SGD regression algorithm.[6] The Python Natural Language Toolkit (NLTK) includes a wrapper that allows scikit-learn classifiers to be used through the nltk.classify API.[7]
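To give a sense of what a naive Bayes text classifier of this kind might look like, here is a minimal sketch that separates recipe-like notes from other notes. It is not Evernote's actual system: the six training notes, the labels and the pipeline layout are all invented for illustration, and the imports follow recent scikit-learn releases.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training notes; real data would be far larger.
notes = [
    "mix flour sugar and eggs then bake for 30 minutes",
    "chop onions and simmer the soup with garlic",
    "stir butter into the sauce and season with salt",
    "meeting notes: quarterly budget and hiring plan",
    "flight confirmation and hotel booking for conference",
    "todo: renew passport and call the bank",
]
labels = ["recipe", "recipe", "recipe", "other", "other", "other"]

# Bag-of-words counts feed a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(notes, labels)

prediction = model.predict(["bake the eggs with butter and salt"])
print(prediction[0])
```

The pipeline keeps vectorization and classification together, so the same object can be fitted on raw strings and then applied to new notes.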

The scikit-learn API has been adopted by wise.io, which offers a proprietary implementation of random forests called wiseRF.[8][9] wise.io's business partner Continuum IO claimed data throughput of up to 7.5 times that of scikit-learn's implementation;[10] the scikit-learn developers have since claimed to have optimized their implementation to be competitive with wise.io's, except in terms of memory use.[11]
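The practical meaning of "adopting the API" can be sketched as follows: code written against the `fit`/`predict` convention works with any object that follows it. The `MajorityClassifier` class and the `train_accuracy` helper below are hypothetical names invented for this example; they stand in for a third-party estimator and are not wiseRF or any part of scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class MajorityClassifier:
    """Hypothetical third-party estimator: always predicts the most common label."""

    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self  # returning self allows fit(...).predict(...) chaining

    def predict(self, X):
        return np.full(len(X), self.majority_)


def train_accuracy(estimator, X, y):
    """Fit any fit/predict-style estimator and score it on its own training data."""
    return float(np.mean(estimator.fit(X, y).predict(X) == y))


# Toy one-feature data set (assumed for illustration).
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
y = np.array([0, 0, 0, 1, 1])

forest_acc = train_accuracy(RandomForestClassifier(n_estimators=10, random_state=0), X, y)
baseline_acc = train_accuracy(MajorityClassifier(), X, y)
```

Because `train_accuracy` relies only on the two method names, swapping one implementation for another requires no change to the calling code.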

Implementation

scikit-learn is largely written in Python, with some core algorithms written in Cython for performance. Support vector machines are implemented by a Cython wrapper around LIBSVM. Logistic regression and linear support vector machines are implemented by a Cython wrapper around LIBLINEAR.
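From the user's side, these wrapped libraries are reached through ordinary estimator classes. The sketch below assumes the class names used in recent scikit-learn releases (`SVC`, `LinearSVC`, `LogisticRegression`) and selects the LIBLINEAR-backed solver explicitly with `solver="liblinear"`, a parameter that postdates the version described in this article. The data set is a toy assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC

# Toy data (assumed): one cluster near the origin, one near (4, 4).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

models = {
    "SVC": SVC(kernel="rbf"),                                      # kernel SVM (LIBSVM-backed)
    "LinearSVC": LinearSVC(),                                      # linear SVM (LIBLINEAR-backed)
    "LogisticRegression": LogisticRegression(solver="liblinear"),  # LIBLINEAR solver
}

queries = np.array([[0.0, 0.0], [5.0, 5.0]])
predictions = {
    name: model.fit(X, y).predict(queries).tolist()
    for name, model in models.items()
}
```

All three estimators share the same `fit`/`predict` interface, so the choice of underlying C or C++ library is not visible in the calling code.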

References

  1. "Welcome to the SciPy Toolkits". 7 October 2009. Retrieved 7 June 2013. 
  2. Fabian Pedregosa; Gaël Varoquaux; Alexandre Gramfort; Vincent Michel; Bertrand Thirion; Olivier Grisel; Mathieu Blondel; Peter Prettenhofer; Ron Weiss; Vincent Dubourg; Jake Vanderplas; Alexandre Passos; David Cournapeau (2011). "Scikit-learn: Machine Learning in Python". Journal of Machine Learning Research 12: 2825–2830. 
  3. Eli Bressert (2012). SciPy and NumPy: an overview for developers. O'Reilly. p. 43. 
  4. "About Us". http://scikit-learn.org. Retrieved 3 May 2013. 
  5. Mark Ayzenshtat (22 January 2013). "Stay classified". Evernote Techblog. Retrieved 4 May 2013. 
  6. Mark Levy (2013). "Efficient Top-N Recommendation by Linear Regression". ACM RecSys Large Scale Recommender System workshop. 
  7. "scikitlearn Module". NLTK 2.0 Documentation. Retrieved 4 May 2013. 
  8. "wiserf". wise.io. Retrieved 22 January 2014. 
  9. Lars Buitinck; Gilles Louppe; Mathieu Blondel; Fabian Pedregosa; Andreas Mueller; Olivier Grisel; Vlad Niculae et al. (2013). "API design for machine learning software: experiences from the scikit-learn project". ECML PKDD Workshop on Languages for Machine Learning.
  10. Joseph W. Richards (27 November 2012). "wiseRF Use Cases and Benchmarks". Continuum IO. Retrieved 22 January 2014. 
  11. Gaël Varoquaux (8 August 2013). "Scikit-learn 0.14 release: features and benchmarks". Retrieved 22 January 2014. 

This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.