Word embedding

Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) in which words (and possibly phrases) from the vocabulary are mapped to vectors of real numbers in a continuous vector space whose dimensionality is much lower than the vocabulary size.
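
The mapping can be pictured as a lookup table from vocabulary words to dense, real-valued vectors. A minimal sketch in Python is shown below; the four-dimensional vectors are invented purely for illustration (embeddings used in practice are learned from data and typically have tens to hundreds of dimensions), and cosine similarity is one common way to compare them.

    import numpy as np

    # Toy embedding table: each vocabulary word maps to a dense,
    # low-dimensional real-valued vector (values invented for illustration).
    embedding = {
        "king":  np.array([0.52, 0.91, -0.30, 0.11]),
        "queen": np.array([0.48, 0.95, -0.28, 0.15]),
        "apple": np.array([-0.60, 0.10, 0.77, -0.42]),
    }

    def cosine_similarity(u, v):
        """Cosine of the angle between two word vectors."""
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Semantically related words tend to end up with similar vectors.
    print(cosine_similarity(embedding["king"], embedding["queen"]))  # high
    print(cosine_similarity(embedding["king"], embedding["apple"]))  # lower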

There are several methods for generating this mapping. They include neural networks,[1] dimensionality reduction on the word co-occurrence matrix,[2] and explicit representation in terms of the context in which words appear.[3]
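
As an illustration of the dimensionality-reduction approach, the sketch below builds a word co-occurrence matrix from a toy corpus and applies a truncated singular value decomposition; this is a generic LSA-style reduction rather than the Hellinger PCA of reference [2], and the corpus, window size, and dimensionality are chosen only for demonstration.

    import numpy as np

    # Toy corpus; a real application would use a much larger text collection.
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "the cat chased the dog",
    ]

    # Build a symmetric word co-occurrence matrix with a one-token window.
    tokens = [sentence.split() for sentence in corpus]
    vocab = sorted({w for sent in tokens for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in tokens:
        for i, w in enumerate(sent):
            for j in (i - 1, i + 1):
                if 0 <= j < len(sent):
                    counts[index[w], index[sent[j]]] += 1

    # Truncated SVD: the leading left singular vectors, scaled by their
    # singular values, serve as low-dimensional word vectors.
    U, S, Vt = np.linalg.svd(counts)
    k = 2  # embedding dimensionality, far smaller than the vocabulary size
    word_vectors = U[:, :k] * S[:k]

    for word in ("cat", "dog", "mat"):
        print(word, word_vectors[index[word]])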

Word and phrase embeddings, when used as the underlying input representation, have been shown to boost performance on NLP tasks such as syntactic parsing[4] and sentiment analysis.[5]

References

  1. Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg; Dean, Jeffrey (2013). "Distributed Representations of Words and Phrases and their Compositionality". arXiv:1310.4546 [cs.CL].
  2. Lebret, Rémi; Collobert, Ronan (2013). "Word Embeddings through Hellinger PCA". arXiv:1312.5542 [cs.CL].
  3. Levy, Omer; Goldberg, Yoav (2014). "Linguistic Regularities in Sparse and Explicit Word Representations" (PDF). Proceedings of the Eighteenth Conference on Computational Natural Language Learning, Baltimore, Maryland, USA. Association for Computational Linguistics.
  4. Socher, Richard; Bauer, John; Manning, Christopher; Ng, Andrew (2013). "Parsing with Compositional Vector Grammars" (PDF). Proceedings of the ACL conference.
  5. Socher, Richard; Perelygin, Alex; Wu, Jean; Chuang, Jason; Manning, Chris; Ng, Andrew; Potts, Chris (2013). "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" (PDF). Conference on Empirical Methods in Natural Language Processing (EMNLP).