Shogun (toolbox)


Shogun is an open-source toolbox written in C++. It offers numerous algorithms and data structures for machine learning problems.


Description

The focus of Shogun is on kernel machines, such as support vector machines, for regression and classification problems. Shogun also offers a full implementation of hidden Markov models. The core software is written in C++ and offers interfaces for MATLAB, Octave, Python and R. Shogun has been under active development since 1999. It is used worldwide as a base for research and education, and its users also contribute to the core package.

A screenshot taken under MacOS

Supported algorithms

Currently Shogun supports the following algorithms:

  • Kernel Ridge Regression,
  • Hidden Markov Models,
  • Support Vector Machine,
  • K-Nearest Neighbors,
  • Linear Discriminant Analysis, and
  • Kernel Perceptrons.

Many different kernels are implemented, ranging from kernels for numerical data (such as Gaussian or linear kernels) to kernels on special data (such as strings over certain alphabets). The currently implemented kernels for numeric data include:

  • linear
  • Gaussian
  • polynomial
  • sigmoid
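
The four numeric kernels listed above can be sketched in a few lines of plain Python. This is an illustration of the textbook definitions, not Shogun's own code or API: the function names and the parameters `width`, `degree`, `offset`, `gamma` and `coef0` are assumptions chosen for the example, and the Gaussian kernel uses one common parametrisation, exp(-||x - y||^2 / width).

```python
import math


def dot(x, y):
    """Inner product of two equal-length numeric vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # k(x, y) = <x, y>
    return dot(x, y)


def gaussian_kernel(x, y, width=1.0):
    # k(x, y) = exp(-||x - y||^2 / width); other conventions scale width differently
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)


def polynomial_kernel(x, y, degree=2, offset=1.0):
    # k(x, y) = (<x, y> + offset)^degree
    return (dot(x, y) + offset) ** degree


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # k(x, y) = tanh(gamma * <x, y> + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)
```

Each function takes two feature vectors and returns a single similarity value, which is all a kernel machine needs from the data.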

The supported kernels for special data include:

  • Spectrum
  • Weighted Degree
  • Weighted Degree with Shifts

The latter group of kernels allows processing of arbitrary sequences over fixed alphabets, such as DNA sequences or whole email texts.
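
As a rough illustration of how a string kernel compares sequences, the sketch below implements a basic spectrum kernel: it counts every substring of length k (a k-mer) in each sequence and takes the inner product of the two count vectors. The names `kmer_counts` and `spectrum_kernel` and the default `k=3` are assumptions made for this example and are not taken from Shogun. The weighted degree kernels are not shown.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count all contiguous substrings of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of the k-mer count vectors of s and t."""
    counts_s = kmer_counts(s, k)
    counts_t = kmer_counts(t, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(count * counts_t[kmer] for kmer, count in counts_s.items())
```

Because the function only needs substrings, it works on any alphabet, for instance `spectrum_kernel("ACGTAC", "TACGTT", k=2)` for DNA or the same call on two plain-text strings.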

Special features

Because Shogun was developed with bioinformatics applications in mind, it can process huge datasets of up to 10 million samples. Shogun supports precalculated kernels. It is also possible to use a combined kernel, i.e. a kernel consisting of a linear combination of arbitrary kernels over different domains. The coefficients, or weights, of the linear combination can be learned as well; for this purpose Shogun offers multiple kernel learning functionality.
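
The idea of a combined kernel and of a precalculated kernel matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not Shogun's interface: each sample is assumed to be a pair of a numeric vector and a sequence, the helper names are hypothetical, and the weights are fixed by hand rather than learned by multiple kernel learning.

```python
import math
from collections import Counter


def gaussian_kernel(x, y, width=1.0):
    """Kernel on numeric vectors: exp(-||x - y||^2 / width)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)


def spectrum_kernel(s, t, k=2):
    """Kernel on strings: inner product of k-mer count vectors."""
    counts_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    counts_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(c * counts_t[kmer] for kmer, c in counts_s.items())


def combined_kernel(a, b, weights=(0.5, 0.5)):
    """Weighted sum of one kernel per domain; a and b are (vector, sequence) pairs."""
    w_numeric, w_string = weights
    return (w_numeric * gaussian_kernel(a[0], b[0])
            + w_string * spectrum_kernel(a[1], b[1]))


def gram_matrix(samples, kernel):
    """Precalculate the full kernel matrix so a learner never re-evaluates the kernel."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Hypothetical toy data: a numeric feature vector paired with a DNA string.
samples = [([0.0, 1.0], "ACGT"), ([0.0, 1.0], "ACGT"), ([2.0, 0.0], "TTGA")]
K = gram_matrix(samples, combined_kernel)
```

With non-negative weights the sum of valid kernels is again a valid kernel, which is why the weights can be treated as parameters to learn.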

References

  • C. Cortes and V. N. Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
  • S. Sonnenburg, G. Rätsch, C. Schäfer and B. Schölkopf. Large Scale Multiple Kernel Learning. Journal of Machine Learning Research, 7:1531-1565, July 2006. K. Bennett and E. P.-Hernandez, editors.
  • T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 169-184, Cambridge, MA, 1999. MIT Press.
  • C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001.
