Word embedding

Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) in which words (and possibly phrases) from the vocabulary are mapped to vectors of real numbers in a continuous vector space whose dimensionality is much lower than the vocabulary size.
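
The mapping can be pictured as a lookup table from vocabulary words to dense, real-valued vectors. A minimal sketch in Python is shown below; the four-dimensional vectors are invented purely for illustration (embeddings used in practice are learned from data and typically have tens to hundreds of dimensions), and cosine similarity is one common way to compare them.

    import numpy as np

    # Toy embedding table: each vocabulary word maps to a dense,
    # low-dimensional real-valued vector (values invented for illustration).
    embedding = {
        "king":  np.array([0.52, 0.91, -0.30, 0.11]),
        "queen": np.array([0.48, 0.95, -0.28, 0.15]),
        "apple": np.array([-0.60, 0.10, 0.77, -0.42]),
    }

    def cosine_similarity(u, v):
        """Cosine of the angle between two word vectors."""
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Semantically related words tend to end up with similar vectors.
    print(cosine_similarity(embedding["king"], embedding["queen"]))  # high
    print(cosine_similarity(embedding["king"], embedding["apple"]))  # lower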

There are several methods for generating this mapping. They include neural networks,[1] dimensionality reduction on the word co-occurrence matrix,[2] and explicit representation in terms of the context in which words appear.[3]
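
As an illustration of the dimensionality-reduction approach, the sketch below builds a word co-occurrence matrix from a toy corpus and applies a truncated singular value decomposition; this is a generic LSA-style reduction rather than the Hellinger PCA of reference [2], and the corpus, window size, and dimensionality are chosen only for demonstration.

    import numpy as np

    # Toy corpus; a real application would use a much larger text collection.
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "the cat chased the dog",
    ]

    # Build a symmetric word co-occurrence matrix with a one-token window.
    tokens = [sentence.split() for sentence in corpus]
    vocab = sorted({w for sent in tokens for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in tokens:
        for i, w in enumerate(sent):
            for j in (i - 1, i + 1):
                if 0 <= j < len(sent):
                    counts[index[w], index[sent[j]]] += 1

    # Truncated SVD: the leading left singular vectors, scaled by their
    # singular values, serve as low-dimensional word vectors.
    U, S, Vt = np.linalg.svd(counts)
    k = 2  # embedding dimensionality, far smaller than the vocabulary size
    word_vectors = U[:, :k] * S[:k]

    for word in ("cat", "dog", "mat"):
        print(word, word_vectors[index[word]])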

Word and phrase embeddings, when used as the underlying input representation, have been shown to boost performance on NLP tasks such as syntactic parsing[4] and sentiment analysis.[5]

References

  1. Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg; Dean, Jeffrey (2013). "Distributed Representations of Words and Phrases and their Compositionality". arXiv:1310.4546 [cs.CL].
  2. Lebret, Rémi; Collobert, Ronan (2013). "Word Embeddings through Hellinger PCA". arXiv:1312.5542 [cs.CL].
  3. Levy, Omer; Goldberg, Yoav (2014). "Linguistic Regularities in Sparse and Explicit Word Representations" (PDF). Proceedings of the Eighteenth Conference on Computational Natural Language Learning, Baltimore, Maryland, USA. Association for Computational Linguistics.
  4. Socher, Richard; Bauer, John; Manning, Christopher; Ng, Andrew (2013). "Parsing with Compositional Vector Grammars" (PDF). Proceedings of the ACL conference.
  5. Socher, Richard; Perelygin, Alex; Wu, Jean; Chuang, Jason; Manning, Chris; Ng, Andrew; Potts, Chris (2013). "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" (PDF). Conference on Empirical Methods in Natural Language Processing (EMNLP).