Word embedding

Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) in which words or phrases from the vocabulary are mapped to vectors of real numbers in a space whose dimensionality is low relative to the vocabulary size (a "continuous space").
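As a minimal illustration of such a mapping (using made-up toy vectors rather than vectors learned from data, and a far smaller vocabulary and dimensionality than real systems), an embedding can be viewed as a lookup table from vocabulary items to dense real-valued vectors:

    import numpy as np

    # Toy embedding table: each word in a tiny vocabulary is mapped to a
    # 3-dimensional real-valued vector. Practical embeddings use vocabularies
    # of tens of thousands to millions of words and 50-1000 dimensions.
    embedding = {
        "king":  np.array([0.8, 0.3, 0.1]),
        "queen": np.array([0.7, 0.4, 0.2]),
        "apple": np.array([0.1, 0.9, 0.5]),
    }

    def vector(word):
        """Return the continuous-space representation of a word."""
        return embedding[word]

    print(vector("king"))   # -> [0.8 0.3 0.1]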

Methods to generate this mapping include neural networks,[1] dimensionality reduction on the word co-occurrence matrix,[2][3][4] and explicit representation in terms of the context in which words appear.[5]
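As a rough sketch of the second family of methods, the following toy code builds a symmetric word-word co-occurrence count matrix from a tiny corpus and reduces its dimensionality with a truncated singular value decomposition; the published variants[2][3][4] differ in the weighting applied to the counts (e.g. Hellinger distance or pointwise mutual information) and in the exact factorization used.

    import numpy as np

    # Tiny corpus; real methods use corpora with millions of sentences.
    corpus = [
        "the cat sat on the mat".split(),
        "the dog sat on the log".split(),
        "the cat chased the dog".split(),
    ]

    vocab = sorted({w for sent in corpus for w in sent})
    index = {w: i for i, w in enumerate(vocab)}

    # Symmetric co-occurrence counts within a +/-2-word window.
    counts = np.zeros((len(vocab), len(vocab)))
    window = 2
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    counts[index[w], index[sent[j]]] += 1

    # Truncated SVD: keep the top-k left singular vectors, scaled by the
    # singular values, as k-dimensional word embeddings.
    k = 2
    U, S, _ = np.linalg.svd(counts)
    embeddings = U[:, :k] * S[:k]

    print(vocab)
    print(embeddings.round(2))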

Word and phrase embeddings, when used as the underlying input representation, have been shown to boost performance in NLP tasks such as syntactic parsing[6] and sentiment analysis.[7]
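The cited models[6][7] compose word vectors with recursive neural networks; as a much simpler illustration of using embeddings as an input representation (not the method of those papers), a common baseline is to average the word vectors of a sentence and feed the resulting fixed-length vector to an ordinary classifier:

    import numpy as np

    def sentence_features(sentence, embedding, dim=3):
        """Represent a sentence as the average of its word embeddings.

        `embedding` is assumed to be a dict from word to numpy vector,
        such as the toy table above; out-of-vocabulary words are skipped.
        """
        vectors = [embedding[w] for w in sentence if w in embedding]
        if not vectors:
            return np.zeros(dim)
        return np.mean(vectors, axis=0)

    # The fixed-length output can serve as the input representation for,
    # e.g., a sentiment classifier.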

References

  1. Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg; Dean, Jeffrey (2013). "Distributed Representations of Words and Phrases and their Compositionality". arXiv:1310.4546 [cs.CL].
  2. Lebret, Rémi; Collobert, Ronan (2013). "Word Embeddings through Hellinger PCA". arXiv:1312.5542 [cs.CL].
  3. Levy, Omer; Goldberg, Yoav (2014). Neural Word Embedding as Implicit Matrix Factorization (PDF). NIPS.
  4. Li, Yitan; Xu, Linli (2015). Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective (PDF). International Joint Conference on Artificial Intelligence (IJCAI).
  5. Levy, Omer; Goldberg, Yoav (2014). Linguistic Regularities in Sparse and Explicit Word Representations (PDF). CoNLL. pp. 171–180.
  6. Socher, Richard; Bauer, John; Manning, Christopher; Ng, Andrew (2013). Parsing with compositional vector grammars (PDF). Proc. ACL Conf.
  7. Socher, Richard; Perelygin, Alex; Wu, Jean; Chuang, Jason; Manning, Chris; Ng, Andrew; Potts, Chris (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (PDF). EMNLP.