Xapian
From Wikipedia, the free encyclopedia
Xapian is an open source probabilistic information retrieval library, released under the GPL. That is, it is a full-text search engine library for programmers.
It is written in C++, with bindings that allow use from Perl, Python, PHP, Java, Tcl, C#, and Ruby. Xapian is highly portable and runs on Linux, Mac OS X, FreeBSD, NetBSD, OpenBSD, Solaris, HP-UX, Tru64, IRIX, and probably other Unix platforms, as well as Microsoft Windows.
Xapian is designed to be a highly adaptable toolkit to allow developers to easily add advanced indexing and search facilities to their own applications. Its features include:
- Transactions: if a database update fails in the middle of a transaction, the database is guaranteed to remain in a consistent state.
- Simultaneous search and update, with new documents being immediately visible.
- Support for large databases: Xapian has been shown to scale to hundreds of millions of documents.
- Accurate probabilistic ranking: more relevant documents are listed first.
- Phrase and proximity searching.
- Relevance feedback, which improves ranking and can be used to expand a query, find related documents, categorise documents, etc.
- Structured Boolean queries, e.g. "race AND condition NOT horse"
- Wildcard search, e.g. "wiki*"
- Omega, a packaged solution for adding a search engine to a web site or intranet. Omega can easily be extended and adapted to fit changing requirements.
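The Boolean, wildcard and ranking features listed above can be illustrated with a small, self-contained sketch. It does not use Xapian or its API: the corpus, the helper names (`docs_with`, `docs_with_prefix`, `bm25`) and the parameter values are all hypothetical, and BM25 is used here only as one well-known probabilistic weighting formula. The sketch assumes whitespace tokenisation and an in-memory inverted index, which is far simpler than what a real search library does.

```python
import math
from collections import Counter, defaultdict

# Toy corpus; document ids and texts are invented for illustration.
DOCS = {
    1: "race condition in the thread scheduler",
    2: "horse race results and race condition of the track",
    3: "wiki markup and wikipedia editing",
    4: "a data race is one kind of race condition",
}

def tokenize(text):
    """Lower-case and split on whitespace (no stemming, no stopwords)."""
    return text.lower().split()

# Inverted index: term -> {doc_id: term frequency in that document}
index = defaultdict(dict)
doc_len = {}
for doc_id, text in DOCS.items():
    terms = tokenize(text)
    doc_len[doc_id] = len(terms)
    for term, tf in Counter(terms).items():
        index[term][doc_id] = tf

avg_len = sum(doc_len.values()) / len(doc_len)

def docs_with(term):
    """Set of document ids that contain the exact term."""
    return set(index.get(term, {}))

def docs_with_prefix(prefix):
    """Wildcard search such as "wiki*": union over all terms with the prefix."""
    found = set()
    for term in index:
        if term.startswith(prefix):
            found |= docs_with(term)
    return found

def bm25(doc_id, terms, k1=1.2, b=0.75):
    """BM25 score of one document for a list of query terms."""
    n_docs = len(DOCS)
    score = 0.0
    for term in terms:
        postings = index.get(term, {})
        tf = postings.get(doc_id, 0)
        if tf == 0:
            continue
        df = len(postings)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        length_norm = k1 * (1 - b + b * doc_len[doc_id] / avg_len)
        score += idf * tf * (k1 + 1) / (tf + length_norm)
    return score

# Structured Boolean query: "race AND condition NOT horse"
matches = (docs_with("race") & docs_with("condition")) - docs_with("horse")

# Rank the Boolean matches so that higher-scoring documents come first.
ranked = sorted(matches, key=lambda d: bm25(d, ["race", "condition"]), reverse=True)

# Wildcard query: "wiki*"
wiki_matches = docs_with_prefix("wiki")

print(ranked, wiki_matches)
```

In this sketch the Boolean operators map directly onto set intersection and difference over posting lists, while the ranking step only reorders documents that already satisfy the Boolean filter.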
A growing number of organisations and projects are known to use Xapian, including Orange, Gmane and Die Zeit.
External links
- http://www.xapian.org
- Oligarchy Ltd. and Lemur Consulting Ltd. offer commercial support, consultancy and bespoke development for Xapian.