Word embedding
Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing in which words from the vocabulary (and possibly phrases thereof) are mapped to vectors of real numbers in a continuous vector space whose dimension is low relative to the vocabulary size ("continuous space").
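For illustration, the sketch below shows what such a mapping looks like in practice: a small lookup table from words to dense real-valued vectors, compared with cosine similarity. The vocabulary, vector values, and dimensionality here are invented purely for this example and are not taken from any particular model.

```python
import numpy as np

# Toy embedding table: each vocabulary word maps to a dense, low-dimensional
# real-valued vector (4 dimensions here; trained models typically use 50-1000).
# The words and numbers below are illustrative only.
embeddings = {
    "king":  np.array([0.50, 0.68, -0.59, 0.10]),
    "queen": np.array([0.54, 0.71, -0.55, 0.60]),
    "apple": np.array([-0.42, 0.03, 0.81, -0.23]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Semantically related words tend to receive higher cosine similarity.
print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["apple"]))
```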
There are several methods for generating this mapping. They include neural networks,[1] dimensionality reduction on the word co-occurrence matrix,[2] and explicit representation in terms of the context in which words appear.[3]
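As a sketch of the count-based family of methods, the following example builds a word co-occurrence matrix from a toy corpus and applies a truncated SVD to obtain low-dimensional word vectors. This is a generic dimensionality-reduction illustration, not the specific Hellinger PCA procedure of Lebret and Collobert[2] or any neural method; the corpus, window size, and dimensionality are assumptions chosen for brevity.

```python
import numpy as np

# Toy corpus: a few tokenized sentences.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Symmetric word-word co-occurrence counts within a +/-1 token window.
window = 1
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                C[index[w], index[sent[j]]] += 1

# Truncated SVD: keep the top-k left singular vectors, scaled by the
# singular values, as k-dimensional word embeddings.
k = 2
U, S, _ = np.linalg.svd(C, full_matrices=False)
word_vectors = U[:, :k] * S[:k]

for w in ("cat", "dog", "mat"):
    print(w, word_vectors[index[w]].round(2))
```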
Word and phrase embeddings, when used as the underlying input representation, have been shown to improve performance on NLP tasks such as syntactic parsing[4] and sentiment analysis.[5]
References
- ↑ Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg; Dean, Jeffrey (2013). "Distributed Representations of Words and Phrases and their Compositionality". arXiv:1310.4546 [cs.CL].
- ↑ Lebret, Rémi; Collobert, Ronan (2013). "Word Embeddings through Hellinger PCA". arXiv:1312.5542 [cs.CL].
- ↑ Levy, Omer; Goldberg, Yoav (2014). "Linguistic Regularities in Sparse and Explicit Word Representations" (PDF). Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL), Baltimore, Maryland, USA. Association for Computational Linguistics.
- ↑ Socher, Richard; Bauer, John; Manning, Christopher; Ng, Andrew (2013). "Parsing with Compositional Vector Grammars" (PDF). Proceedings of the ACL Conference.
- ↑ Socher, Richard; Perelygin, Alex; Wu, Jean; Chuang, Jason; Manning, Chris; Ng, Andrew; Potts, Chris (2013). "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" (PDF). Conference on Empirical Methods in Natural Language Processing (EMNLP).