Dimensionality reduction

From Wikipedia, the free encyclopedia

In statistics, dimensionality reduction is the process of reducing the number of variables under consideration; its techniques can be divided into two categories: feature selection and feature extraction. Dimensionality reduction is also a phenomenon discussed widely in physics, whereby a physical system exists in three dimensions, but its properties behave like those of a lower-dimensional system. It has been experimentally realised at the quantum critical point in an insulating magnet called 'Han purple'.

Feature selection

Main article: Feature selection

Feature selection approaches try to find a subset of the original features. Two broad strategies are filter approaches, which score features independently of any model (e.g. by information gain), and wrapper approaches, which search over feature subsets using the model itself (e.g. with a genetic algorithm). Finding the best subset is a combinatorial optimization problem.
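A minimal sketch of a filter approach, using the absolute correlation of each feature with the target as the per-feature score (a simpler stand-in for criteria such as information gain; the data and the function name are illustrative):

```python
import numpy as np

def filter_select(X, y, k):
    """Filter-style feature selection: score each feature independently
    by its absolute Pearson correlation with the target, keep the top k."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]  # indices of the k best features

# Toy data: feature 0 drives y, features 1 and 2 are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)
best = filter_select(X, y, 1)
print(best)
```

Because each feature is scored in isolation, a filter is cheap but can miss features that are only useful in combination; a wrapper addresses that at much higher cost.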

It is sometimes the case that data analysis such as regression or classification can be carried out in the reduced space more accurately than in the original space.

Feature extraction

Main article: Feature extraction

Feature extraction applies a mapping from the multidimensional space to a space of fewer dimensions. In other words, the original feature space is transformed, for example by a linear transformation such as principal components analysis.
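In the linear case, the mapping is simply multiplication by a matrix. A minimal illustration (the matrix W here is chosen by hand for the example; a method such as principal components analysis would instead learn it from the data):

```python
import numpy as np

# A linear feature-extraction map from R^3 to R^2 is a 2 x 3 matrix W.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])   # keeps dim 1, sums dims 2 and 3
x = np.array([2.0, -1.0, 3.0])   # original 3-dimensional feature vector
z = W @ x                        # extracted 2-dimensional features
print(z)                         # -> [2. 2.]
```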

Consider a string of beads, first 100 black and then 100 white. If the string is wadded up, a classification boundary between black and white beads will be very complicated in three dimensions. However, there is a mapping from three dimensions to one dimension, namely distance along the string, which makes the classification trivial. Unfortunately, a simplification as dramatic as that is rarely possible in practice.

If the correlation matrix of the data is constructed and its eigenvectors computed (see principal components analysis) and listed in order of decreasing eigenvalue, then (usually) just the first few eigenvectors suffice to reconstruct a large fraction of the variance of the original data. Moreover, the first few eigenvectors can often be interpreted in terms of the large-scale physical behaviour of the system. The original space (with dimension equal to the number of variables) has thus been reduced (with data loss, but hopefully retaining the most important variance) to the space spanned by a few eigenvectors.
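This procedure can be sketched in a few lines of NumPy (the synthetic data, with one latent factor driving three observed variables, are an assumption for illustration):

```python
import numpy as np

# Toy data: 3 observed variables driven mostly by one latent factor t.
rng = np.random.default_rng(1)
t = rng.normal(size=500)
X = np.column_stack([t + 0.1 * rng.normal(size=500),
                     -t + 0.1 * rng.normal(size=500),
                     0.5 * t + 0.1 * rng.normal(size=500)])

R = np.corrcoef(X, rowvar=False)       # correlation matrix of the variables
evals, evecs = np.linalg.eigh(R)       # eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1]        # list in decreasing eigenvalue order
evals, evecs = evals[order], evecs[:, order]

explained = evals / evals.sum()        # variance fraction per eigenvector
# Project the standardized data onto the first eigenvector only:
Z = ((X - X.mean(0)) / X.std(0)) @ evecs[:, :1]
print(explained)
```

Here the first eigenvector captures nearly all of the variance, so replacing the three variables by the single projected coordinate Z loses little information.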

A dimensionality reduction without loss of information is possible if the data in question fall exactly on a smooth, locally flat subspace; the reduced dimensions are then simply coordinates on that subspace. More commonly, data are noisy and no exact mapping exists, so some information must be lost. Dimensionality reduction is effective if the loss of information from mapping to a lower-dimensional space is outweighed by the gain from simplifying the problem.

Feature extraction methods can be classified as linear or nonlinear methods. Linear methods attempt to find a globally flat subspace, while nonlinear methods attempt to find a locally flat subspace. As is the case with other problems, linear methods are simpler and more completely understood, while nonlinear methods are more general and more difficult to analyze.
