Gensim

Original author(s)	Radim Řehůřek
Developer(s)	RaRe Technologies, various
Initial release	2009

Stable release	0.13.4 / 25 December 2016

Repository	github.com/RaRe-Technologies/gensim
Development status	active
Written in	Python
Operating system	Linux, Windows, macOS
Platform	cross-platform
Type	Information retrieval
License	LGPL
Website	radimrehurek.com/gensim/

Gensim is an open-source vector space modeling and topic modeling toolkit implemented in Python. It uses NumPy and SciPy, and optionally Cython, for performance. Gensim is designed to handle large text collections by using data streaming and efficient incremental algorithms. This differentiates it from most other scientific software packages, which target only batch, in-memory processing.
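The streaming idea can be illustrated with a minimal sketch in plain Python. It is not Gensim's own code: the names `StreamedCorpus` and `document_frequencies` are hypothetical, and an in-memory string stands in for a large file on disk. The corpus object yields one tokenized document at a time, and the statistics are updated incrementally, so the whole collection never has to be held in memory.

```python
import io
from collections import Counter


class StreamedCorpus:
    """Hypothetical helper: an iterable that yields one tokenized document at a time."""

    def __init__(self, open_source):
        # open_source is a callable returning a fresh iterator over lines,
        # so the corpus can be iterated more than once.
        self.open_source = open_source

    def __iter__(self):
        for line in self.open_source():
            # One line is treated as one document (an assumption of this sketch).
            yield line.lower().split()


def document_frequencies(corpus):
    """Count, in a single pass, how many documents each token appears in."""
    doc_freq = Counter()
    num_docs = 0
    for tokens in corpus:
        doc_freq.update(set(tokens))  # each token counted once per document
        num_docs += 1
    return doc_freq, num_docs


# A tiny in-memory sample stands in for a large text file.
SAMPLE = "human machine interface\ngraph of trees\ngraph minors survey\n"
corpus = StreamedCorpus(lambda: io.StringIO(SAMPLE))
doc_freq, num_docs = document_frequencies(corpus)
```

With a real collection, the callable passed to `StreamedCorpus` would open a file instead, for example `lambda: open(path, encoding="utf-8")`, where `path` is a placeholder.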

Main features

Gensim includes implementations of tf-idf, random projections, word2vec and doc2vec algorithms,^[1] hierarchical Dirichlet processes (HDP), latent semantic analysis (LSA) and latent Dirichlet allocation (LDA), including distributed parallel versions.^[2]

Some of the online algorithms in Gensim were also published in the 2011 PhD dissertation Scalability of Semantic Analysis in Natural Language Processing by Radim Řehůřek, the creator of Gensim.^[3]

Uses of Gensim

Gensim has been used and cited in over 500 commercial and academic applications.^[4]^[5] The software has been covered in several news articles, podcasts and interviews since 2009.^[6]^[7]^[8]

Free and commercial support

The open-source code is developed and hosted on GitHub,^[9] and public support forums are maintained on Google Groups^[10] and Gitter.^[11]

Gensim is commercially supported by RaRe Technologies (rare-technologies.com), which also provides student mentorships and academic thesis projects for Gensim through its Student Incubator programme.^[12]

References

[1] Deep learning with word2vec and gensim
[2] Radim Řehůřek and Petr Sojka (2010). Software framework for topic modelling with large corpora. Proc. LREC Workshop on New Challenges for NLP Frameworks
[3] Řehůřek, Radim (2011). "Scalability of Semantic Analysis in Natural Language Processing" (PDF). Retrieved 27 January 2015. "my open-source gensim software package that accompanies this thesis"
[4] Gensim academic citations
[5] Commercial adopters of gensim
[6] Podcast.__init__ episode #71 on gensim
[7] Interview with Radim Řehůřek, creator of Gensim
[8] http://decisionstats.com/2015/12/07/decisionstats-interview-radim-rehurek-gensim-python/
[9] Gensim source code on GitHub
[10] Gensim mailing list on Google Groups
[11] Gensim chat room on Gitter
[12] Gensim open source Incubator

External links

Official website

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.