Dimension reduction
In statistics, dimension reduction is the process of reducing the number of random variables under consideration; it can be divided into feature selection and feature extraction. In physics, dimension reduction is a widely discussed phenomenon whereby a physical system exists in three dimensions but its properties behave like those of a lower-dimensional system. It has been experimentally realised at the quantum critical point in an insulating magnet called 'Han Purple'.
Feature selection
Feature selection approaches try to find a subset of the original variables (also called features or attributes). Two strategies are the filter approach (e.g. scoring features by information gain) and the wrapper approach (e.g. searching the space of subsets with a genetic algorithm). See also combinatorial optimization problems.
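As an illustration, a filter-style selection might be sketched in Python with scikit-learn, scoring each feature by its mutual information (a close relative of information gain) with the class label; the dataset and the number of retained features below are arbitrary choices, not prescribed by any particular method:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Filter approach: score each feature independently by its mutual
# information with the class label, then keep the k highest-scoring ones.
X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.get_support())  # boolean mask over the original 4 features
```

A wrapper approach would instead repeatedly train the downstream model on candidate subsets and keep the subset that performs best, which is more expensive but tailored to the model.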
It is sometimes the case that data analysis such as regression or classification can be done in the reduced space more accurately than in the original space.
Feature extraction
Feature extraction applies a mapping from the multidimensional space to a space of fewer dimensions. This means that the original feature space is transformed, for example by a linear transformation via principal components analysis.
Consider a string of beads, first 100 black and then 100 white. If the string is wadded up, a classification boundary between black and white beads will be very complicated in three dimensions. However, there is a mapping from three dimensions to one dimension, namely distance along the string, which makes the classification trivial. Unfortunately, a simplification as dramatic as that is rarely possible in practice.
The main linear technique for dimensionality reduction, principal components analysis (PCA), performs a linear mapping of the data to a lower-dimensional space in such a way that the variance of the data in the low-dimensional representation is maximized. In practice, the covariance (or correlation) matrix of the data is constructed and the eigenvectors of this matrix are computed. The eigenvectors that correspond to the largest eigenvalues (the principal components) can then be used to reconstruct a large fraction of the variance of the original data. Moreover, the first few eigenvectors can often be interpreted in terms of the large-scale physical behaviour of the system. The original space (with dimensionality equal to the number of variables) has been reduced (with data loss, but hopefully retaining the most important variance) to the space spanned by a few eigenvectors.
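The procedure can be illustrated with a minimal NumPy sketch (this version diagonalises the covariance matrix of the centred data; standardising each variable first would give the correlation-matrix variant, and the toy data here are an arbitrary illustration):

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components (minimal sketch)."""
    # Center the data and build the covariance matrix.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    # Eigendecomposition; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]              # principal directions
    explained = eigvals[order] / eigvals.sum()  # fraction of variance kept
    return Xc @ components, components, explained

rng = np.random.default_rng(0)
# Rank-3 data embedded in 10 dimensions.
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))
X_reduced, components, explained = pca(X, n_components=3)
print(explained.sum())  # close to 1: three components capture the variance
```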
Principal component analysis can be employed in a nonlinear way by means of the kernel trick. The resulting technique, known as Kernel PCA, is capable of constructing nonlinear mappings that maximize the variance in the data. Other nonlinear techniques include locally linear embedding and its relatives (LLE, Hessian LLE, Laplacian Eigenmaps, and LTSA). These techniques construct a low-dimensional data representation using a cost function that retains local properties of the data; in fact, they can be viewed as defining a graph-based kernel for Kernel PCA. In this way, the techniques are capable of unfolding datasets such as the Swiss roll, as sketched below. Techniques that employ neighborhood graphs in order to retain global properties of the data include Isomap and Maximum Variance Unfolding.
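Both families are available in scikit-learn, as in the following sketch; the RBF kernel width and the neighbourhood size below are arbitrary illustrative choices that would normally be tuned:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA
from sklearn.manifold import LocallyLinearEmbedding

# The Swiss roll: a 2-D manifold coiled up in 3-D space.
X, color = make_swiss_roll(n_samples=1500, random_state=0)

# Kernel PCA with an RBF kernel: ordinary PCA carried out in an
# implicit high-dimensional feature space via the kernel trick.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_kpca = kpca.fit_transform(X)

# Locally linear embedding retains local neighbourhood structure and
# can "unfold" the roll into its underlying 2-D parameterisation.
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=12)
X_lle = lle.fit_transform(X)
```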
A completely different approach to nonlinear dimensionality reduction is the use of autoencoders, a special kind of feed-forward neural network whose narrow middle layer provides the low-dimensional representation. Although the idea of autoencoders is quite old, training deep autoencoders has only recently become possible through the use of Restricted Boltzmann machines.
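A minimal single-hidden-layer autoencoder can be sketched in plain NumPy (an illustrative toy with arbitrary sizes and learning rate; practical autoencoders are deeper and, as noted above, may require pre-training with Restricted Boltzmann machines):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[:, 5:] = X[:, :5] @ rng.normal(size=(5, 5))  # introduce correlations

n_in, n_hidden = X.shape[1], 2                       # compress 10 dims to 2
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))    # encoder weights
W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))    # decoder weights
lr = 0.01

# Train by plain gradient descent on the squared reconstruction error.
for epoch in range(2000):
    H = np.tanh(X @ W1)      # encode: nonlinear hidden representation
    X_hat = H @ W2           # decode: linear reconstruction of the input
    err = X_hat - X
    # Backpropagate the loss through both layers.
    grad_W2 = H.T @ err / len(X)
    grad_H = err @ W2.T * (1 - H**2)   # tanh derivative
    grad_W1 = X.T @ grad_H / len(X)
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

codes = np.tanh(X @ W1)      # 2-dimensional representation of the data
```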
See also
- Feature space
- Data mining
- Feature selection
- Information gain
- Chi-square distribution
- Recursive feature elimination (SVM based)
External links
- Dimensionality Reduction: A Comparative Review
- A survey of dimension reduction techniques (US DOE Office of Scientific and Technical Information, 2002)
- Unsupervised Learning of Image Manifolds by Semidefinite Programming
- Locally Linear Embedding
- A Global Geometric Framework for Nonlinear Dimensionality Reduction
- Dimensional reduction at a quantum critical point (realisation of dimensional reduction in a magnet)
- Matlab Toolbox for Dimensionality Reduction