Fisher kernel

From Wikipedia, the free encyclopedia

In mathematics, the Fisher kernel, named in honour of Sir Ronald Fisher, is a kernel function that measures the similarity of two objects on the basis of a generative statistical model fitted to the data. It was introduced in 1998 by Tommi Jaakkola and David Haussler.[1]

The Fisher kernel combines the advantages of generative statistical models (such as hidden Markov models) with those of discriminative methods (such as support vector machines):

  • generative models can process data of variable length (adding or removing data is well supported);
  • discriminative methods can use flexible criteria and tend to yield better classification results.

Derivation

Fisher score

The Fisher kernel makes use of the Fisher score, defined as


U_X = \nabla_{\theta} \log P(X|\theta)

where θ is a vector of parameters of the model, and \log P(X|\theta) is the log-likelihood of the data X under the probabilistic model.
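As a concrete illustration (not from the article), the score has a closed form when the model is a univariate Gaussian with θ = (μ, σ²) and X is a set of i.i.d. samples; the two components below come from differentiating the Gaussian log-likelihood. The function name and parametrisation are illustrative:

```python
import numpy as np

def fisher_score(x, mu, var):
    """Fisher score U_X = grad_theta log P(X | theta) for i.i.d. samples x
    under a univariate Gaussian with theta = (mu, var)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # d/d(mu) of the log-likelihood
    d_mu = np.sum(x - mu) / var
    # d/d(var) of the log-likelihood
    d_var = -n / (2 * var) + np.sum((x - mu) ** 2) / (2 * var ** 2)
    return np.array([d_mu, d_var])
```

At the maximum-likelihood estimate the score vanishes, which is a quick sanity check: `fisher_score([1.0, 2.0, 3.0], 2.0, 2/3)` returns a vector close to `[0.0, 0.0]`.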

Fisher kernel

The Fisher kernel is defined as


K(X_i, X_j) = U_{X_{i}}^T I^{-1} U_{X_{j}}

where I is the Fisher information matrix.
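The definition above can be sketched in a few lines of NumPy. Since the exact Fisher information is often intractable, the sketch also shows the common empirical estimate I ≈ (1/N) Σ_k U_k U_kᵀ built from sample scores; that estimate, and the helper names, are practical conventions rather than part of the definition:

```python
import numpy as np

def empirical_fisher_info(U):
    """Empirical estimate I = (1/N) sum_k U_k U_k^T from a matrix of
    N Fisher score vectors (one per row) -- a common practical stand-in
    for the exact Fisher information matrix."""
    U = np.asarray(U, dtype=float)
    return U.T @ U / U.shape[0]

def fisher_kernel(u_i, u_j, info):
    """K(X_i, X_j) = U_{X_i}^T I^{-1} U_{X_j}."""
    # solve(info, u_j) computes I^{-1} u_j without forming the inverse
    return float(u_i @ np.linalg.solve(info, u_j))
```

With `info = np.eye(d)` the kernel reduces to a plain dot product of the two score vectors, a simplification sometimes used in practice.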

Applications

Information retrieval

The Fisher kernel is the kernel for a generative probabilistic model. As such, it constitutes a bridge between generative and discriminative models of documents.[2] Fisher kernels exist for numerous models, notably tf–idf,[3] naive Bayes, and probabilistic latent semantic indexing (PLSI).

Notes and references

  1. ^ Jaakkola, T.; Haussler, D. (1998). "Exploiting Generative Models in Discriminative Classifiers".
  2. ^ "Generative vs Discriminative Approaches to Entity Recognition from Label-Deficient Data" (2003).
  3. ^ Elkan, C. (2005). "Deriving TF-IDF as a Fisher kernel".