Deep learning

Deep learning is a subfield of machine learning based on algorithms that learn multiple levels of representation in order to model complex relationships among data. Higher-level features and concepts are defined in terms of lower-level ones, and such a hierarchy of features is called a deep architecture; see Bengio (2009) for a review of the field. These models have proven to be effective feature extractors for high-dimensional, structured data (Hinton, 2009).[1] Most of them are based on unsupervised learning of representations, which makes them particularly useful for extracting generic abstractions and features from a large corpus of examples, even when those examples are not necessarily labeled (as in semi-supervised learning) and not necessarily drawn from the task of immediate interest (as in multi-task learning).
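
The notion of a feature hierarchy can be made concrete with a small sketch: each layer computes a nonlinear transform of the layer below, so deeper outputs are functions of progressively more abstract features. The code below is a minimal illustration, not a trained model; the layer sizes and random weights are placeholders (in practice each layer would be learned, for example by the layer-wise procedure described in the next paragraph).

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative layer sizes: e.g., 784 pixels in, 16 abstract features out.
    layer_sizes = [784, 256, 64, 16]
    weights = [0.01 * rng.standard_normal((m, n))
               for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

    def features(x, depth):
        # Level-`depth` features are a nonlinear function of level depth-1,
        # which is itself a function of the level below, and so on.
        for W in weights[:depth]:
            x = np.tanh(x @ W)
        return x

    x = rng.random((1, 784))        # e.g., a flattened image
    low_level = features(x, 1)      # defined directly over the input
    high_level = features(x, 3)     # defined over lower-level features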

Attempts to train deep architectures (mostly neural networks) before 2006 generally failed, except in the special case of convolutional neural networks. One of the earliest successful implementations of a deep model (Hinton et al., 2006) learns the distribution of high-level image (or other data) features using successive layers of binary latent variables, although real-valued variables may also be used. The approach of Hinton et al. (2006) uses a restricted Boltzmann machine (Smolensky, 1986) to model each new layer of higher-level features. Each added layer, if trained properly, guarantees an increase in the lower bound of the log-likelihood of the data, thus improving the model. Once sufficiently many layers have been learned, the deep architecture may be used as a generative model by reproducing the data when sampling down from the top-level feature activations (an "ancestral pass").
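
A minimal sketch of the building block in this procedure follows: a restricted Boltzmann machine with binary visible and hidden units, trained by one-step contrastive divergence (CD-1), the gradient approximation commonly used for RBMs. The layer sizes, learning rate, and toy data are illustrative assumptions, not values from the sources cited above.

    import numpy as np

    rng = np.random.default_rng(0)

    n_visible, n_hidden = 64, 32            # illustrative sizes
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)             # visible-unit biases
    b_hid = np.zeros(n_hidden)              # hidden-unit biases

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample(p):
        # Draw binary states from Bernoulli probabilities.
        return (rng.random(p.shape) < p).astype(float)

    def cd1_update(v0, lr=0.1):
        # One CD-1 step on a batch of binary visible vectors v0.
        global W, b_vis, b_hid
        # Positive phase: infer hidden features from the data.
        p_h0 = sigmoid(v0 @ W + b_hid)
        h0 = sample(p_h0)
        # Negative phase: one Gibbs step reconstructs the data.
        v1 = sample(sigmoid(h0 @ W.T + b_vis))
        p_h1 = sigmoid(v1 @ W + b_hid)
        # The difference of correlations approximates the
        # log-likelihood gradient.
        n = v0.shape[0]
        W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
        b_vis += lr * (v0 - v1).mean(axis=0)
        b_hid += lr * (p_h0 - p_h1).mean(axis=0)

    # Toy usage: random binary "data"; real inputs would be, e.g.,
    # binarized image patches.
    data = (rng.random((100, n_visible)) < 0.5).astype(float)
    for epoch in range(10):
        cd1_update(data)

Stacking such machines, by training a new RBM on the hidden activations of the one below it, gives the greedy layer-wise training described above.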

References

  1. ^ Hinton, G. E. (2009). "Deep belief networks". Scholarpedia. http://www.scholarpedia.org/article/Deep_belief_networks
