Gensim
Original author(s) | Radim Řehůřek |
---|---|
Developer(s) | various |
Stable release | 0.10.3 / 17 November 2014 |
Development status | active |
Written in | Python |
Platform | cross-platform |
Type | Natural language processing |
License | LGPL |
Website | radimrehurek |
Gensim is an open-source toolkit for vector space modeling and topic modeling. It is implemented in the Python programming language and uses NumPy and SciPy, with optional Cython for performance. It is designed for large text collections and relies on efficient online algorithms, which process documents incrementally rather than requiring the whole corpus to be held in memory.
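The idea behind online processing can be illustrated with a minimal sketch in plain Python. This is not Gensim's own code, and the names `stream_documents` and `RunningVocabulary` are hypothetical, chosen only for this example. Documents are read one at a time from a generator and the statistics are updated incrementally, so the full collection never has to be loaded at once.

```python
from collections import Counter


def stream_documents(lines):
    """Yield one tokenized document at a time instead of building a list."""
    for line in lines:
        tokens = line.lower().split()
        if tokens:  # skip empty lines
            yield tokens


class RunningVocabulary:
    """Document-frequency counts that are updated one document at a time."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()

    def update(self, tokens):
        # Count each distinct term once per document.
        self.num_docs += 1
        self.doc_freq.update(set(tokens))


# A stand-in for a large corpus; in practice this could be a file object
# iterated line by line.
raw_lines = [
    "human machine interface",
    "graph of trees",
    "",
    "graph minors survey",
]

vocab = RunningVocabulary()
for doc in stream_documents(raw_lines):
    vocab.update(doc)
```

Because `stream_documents` is a generator, replacing `raw_lines` with an open file keeps memory use roughly constant regardless of corpus size.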
Gensim includes implementations of tf–idf, random projections, word embeddings with Google's word2vec algorithm[1] (reimplemented and optimized in Cython), hierarchical Dirichlet processes (HDP), latent semantic analysis (LSA) and latent Dirichlet allocation (LDA), including distributed parallel versions.[2]
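As an illustration of the first technique in this list, the following is a minimal sketch of tf–idf weighting in plain Python. It shows one common variant (raw term count times the natural logarithm of the inverse document frequency, followed by scaling to unit length) and is not claimed to match Gensim's implementation or defaults. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, scaled to unit length."""
    num_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in documents:
        term_freq = Counter(tokens)
        # Weight = term count * log(N / document frequency).
        weights = {
            term: count * math.log(num_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Scale the vector to unit Euclidean length, if it is not all zeros.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {term: w / norm for term, w in weights.items()}
        vectors.append(weights)
    return vectors


docs = [
    ["graph", "trees", "graph"],
    ["graph", "minors", "survey"],
    ["human", "computer", "survey"],
]
vectors = tfidf_vectors(docs)
```

In this variant a term that occurs in every document receives a weight of zero, since it does not help to distinguish one document from another.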
Gensim has been used in a number of commercial as well as academic applications.[3][4] The code is hosted on GitHub[5] and a support forum is maintained on Google Groups.[6]
Gensim accompanies Radim Řehůřek's PhD dissertation, Scalability of Semantic Analysis in Natural Language Processing (2011).[7]
Gensim's tagline is "Topic Modelling for Humans".[8]
References
- [1] Deep learning with word2vec and gensim
- [2] Radim Řehůřek and Petr Sojka (2010). Software framework for topic modelling with large corpora. Proc. LREC Workshop on New Challenges for NLP Frameworks.
- [3] Interview with Radim Řehůřek, creator of gensim
- [4] gensim academic citations
- [5] gensim source code
- [6] gensim mailing list
- [7] Rehurek, Radim (2011). "Scalability of Semantic Analysis in Natural Language Processing". http://radimrehurek.com/. Retrieved 27 January 2015. "my open-source gensim software package that accompanies this thesis"
- [8] Rehurek, Radim. "Gensim". http://radimrehurek.com/. Retrieved 27 January 2015. "Gensim's tagline: 'Topic Modelling for Humans'"