Xapian

From Wikipedia, the free encyclopedia

Xapian is an open-source probabilistic information retrieval library, released under the GPL. In other words, it is a full-text search engine library for programmers.

It is written in C++, with bindings that allow use from Perl, Python, PHP, Java, Tcl, C#, and Ruby. Xapian is highly portable and runs on Linux, Mac OS X, FreeBSD, NetBSD, OpenBSD, Solaris, HP-UX, Tru64, IRIX, and likely other Unix platforms, as well as Microsoft Windows.

Xapian is designed as a highly adaptable toolkit that lets developers add advanced indexing and search facilities to their own applications. Its features include:

  • Transactions: if a database update fails in the middle of a transaction, the database is guaranteed to remain in a consistent state.
  • Simultaneous search and update, with new documents being immediately visible.
  • Support for large databases: Xapian has been shown to scale to hundreds of millions of documents.
  • Accurate probabilistic ranking: more relevant documents are listed first.
  • Phrase and proximity searching.
  • Relevance feedback, which improves ranking and can be used to expand a query, find related documents, categorise documents, and so on.
  • Structured Boolean queries, e.g. "race AND condition NOT horse"
  • Wildcard search, e.g. "wiki*"
  • Omega, a packaged solution for adding a search engine to a web site or intranet. Omega can easily be extended and adapted to fit changing requirements.
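
The two query examples in the list above ("race AND condition NOT horse" and "wiki*") can be illustrated with a toy inverted index. The following is a minimal sketch in plain Python and does not use Xapian or reflect its internals; the corpus, the document ids, and the helper names build_index and wildcard are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus; ids and texts are invented for illustration.
DOCS = {
    1: "race condition in the thread scheduler",
    2: "horse race results and betting odds",
    3: "wiki markup guide",
    4: "wikipedia article on the race condition bug",
}


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def wildcard(index, pattern):
    """Expand a trailing-* pattern to the union of all matching terms' postings."""
    prefix = pattern.rstrip("*")
    matches = set()
    for term, postings in index.items():
        if term.startswith(prefix):
            matches |= postings
    return matches


INDEX = build_index(DOCS)

# "race AND condition NOT horse": intersect two posting sets, subtract the third.
boolean_hits = (INDEX["race"] & INDEX["condition"]) - INDEX["horse"]

# "wiki*": every document containing a term that starts with "wiki".
wildcard_hits = wildcard(INDEX, "wiki*")

print(sorted(boolean_hits))
print(sorted(wildcard_hits))
```

In this sketch a Boolean query reduces to set operations on posting lists, and a wildcard is expanded to the set of indexed terms sharing a prefix. A real engine would add tokenisation, stemming, ranking and on-disk storage on top of this idea.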

A growing number of organisations and projects are known to use Xapian, including Orange, Gmane and Die Zeit.
