Recursive neural network

Not to be confused with Recurrent neural network.

A recursive neural network (RNN) is a kind of deep neural network created by applying the same set of weights recursively over a structure, to produce a structured prediction over variable-length input, or a scalar prediction on it, by traversing a given structure in topological order. RNNs have been successful in learning sequence and tree structures in natural language processing, mainly phrase and sentence continuous representations based on word embedding. RNNs have first been introduced to learn distributed representations of structure, such as logical terms.^[1]

Architectures

Basic Architecture

A simple recursive neural network architecture

In the most simple architecture, nodes are combined into parents using a weight matrix that is shared across the whole network, and a non-linearity such as tanh. If c₁ and c₂ are n-dimensional vector representation of nodes, their parent will also be an n-dimensional vector, calculated as

$p_{1,2} = \tanh\left(W[c_1 ; c_2]\right)$

Where W is a learned $n\times 2n$ weight matrix.

This architecture, with a few improvements, has been used for successfully parsing natural scenes and for syntactic parsing of natural language sentences.^[2]

Recursive Neural Tensor Network

These networks use a single, tensor-based composition function for all nodes in the tree.^[3]

Training

Stochastic gradient descent

Typically, stochastic gradient descent (SGD) is used to train the network. The gradient is computed using backpropagation through structure (BPTS), a variant of backpropagation through time used for recurrent neural networks.

Related models

Recurrent neural networks are in fact recursive neural networks with a particular structure: that of a linear chain. Whereas recursive neural networks operate on any hierarchical structure, combining child representations into parent representations, recurrent neural networks operate on the linear progression of time, combining the previous time step and a hidden representation into the representation for the current time step.

References

↑ Goller, C.; Küchler, A. "Learning task-dependent distributed representations by backpropagation through structure". Neural Networks, 1996., IEEE. doi:10.1109/ICNN.1996.548916.
↑ Socher, Richard; Lin, Cliff; Ng, Andrew Y.; Manning, Christopher D. "Parsing Natural Scenes and Natural Language with Recursive Neural Networks". The 28th International Conference on Machine Learning (ICML 2011).
↑ Socher, Richard; Perelygin, Alex; Y. Wu, Jean; Chuang, Jason; D. Manning, Christopher; Y. Ng, Andrew; Potts, Christopher. "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" (PDF). EMNLP 2013.