Sphinx (search engine)

Sphinx
Developer(s) Andrew Aksyonoff
Initial release 2001
Stable release 2.2.7 / 20 January 2015
Preview release 2.3.2-dev / 3 March 2015
Development status Active
Written in C++
Operating system Linux, Windows, Solaris, FreeBSD, NetBSD, Mac OS, AIX
Type Search and index
License GPLv2 and commercial
Website www.sphinxsearch.com

Sphinx is a free and open-source full-text search engine that provides search functionality to client applications.

Overview

Like other DBMSs, Sphinx can be used as a stand-alone server. It can communicate with other DBMSs through the native protocols of MySQL, MariaDB and PostgreSQL, or through ODBC with ODBC-compliant DBMSs. Sphinx can also be used as a storage engine ("SphinxSE") for MySQL and its forks; MariaDB, a fork of MySQL, is distributed with SphinxSE.[1]

SphinxAPI

If Sphinx is run as a stand-alone server, an application can connect to it through SphinxAPI. Official implementations of the API are available for PHP, Java, Perl, Ruby and Python. Unofficial implementations for other languages, as well as various third-party[2] plugins and modules, are also available. Other data sources can be indexed via a pipe in a custom XML format.[3]
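The custom XML format mentioned above is xmlpipe2. The following Python sketch shows roughly what a producer for such a stream might look like. The function name xmlpipe2_stream, the sample documents, and the field and attribute names are illustrative assumptions; only the sphinx:docset, sphinx:schema, sphinx:field, sphinx:attr and sphinx:document elements come from the xmlpipe2 format itself.

```python
from xml.sax.saxutils import escape, quoteattr


def xmlpipe2_stream(documents, fields, attrs):
    """Return an xmlpipe2 document stream as a string.

    documents: iterable of dicts holding "id" plus field/attribute values
    fields:    list of full-text field names (hypothetical names)
    attrs:     dict mapping attribute name -> xmlpipe2 type, e.g. "int"
    """
    out = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>"]

    # Schema block: declares which elements are fields and which are attributes.
    out.append("<sphinx:schema>")
    for name in fields:
        out.append(f"<sphinx:field name={quoteattr(name)}/>")
    for name, typ in attrs.items():
        out.append(f"<sphinx:attr name={quoteattr(name)} type={quoteattr(typ)}/>")
    out.append("</sphinx:schema>")

    # One sphinx:document element per document, keyed by its id.
    for doc in documents:
        out.append(f'<sphinx:document id={quoteattr(str(doc["id"]))}>')
        for name in list(fields) + list(attrs):
            if name in doc:
                out.append(f"<{name}>{escape(str(doc[name]))}</{name}>")
        out.append("</sphinx:document>")

    out.append("</sphinx:docset>")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = [
        {"id": 1, "title": "Garden shovel", "body": "Steel blade & ash handle", "price": 25},
        {"id": 2, "title": "Laptop", "body": "13-inch screen", "price": 900},
    ]
    print(xmlpipe2_stream(sample, ["title", "body"], {"price": "int"}), end="")
```

A script like this would typically be named in the xmlpipe_command setting of an xmlpipe2 source, so that the indexer reads the stream from the script's standard output.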

SphinxQL

The Sphinx search daemon supports the MySQL binary network protocol and can be accessed with the regular MySQL API. Sphinx also supports a subset of SQL, called SphinxQL. SphinxQL supports standard querying of all index types with SELECT, modifying RealTime indexes with INSERT, REPLACE and DELETE, and more.
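Because the daemon speaks the MySQL protocol, an ordinary MySQL client library can in principle send SphinxQL statements. The sketch below assumes the third-party pymysql package, a daemon listening for SphinxQL on port 9306, and a hypothetical index named products; none of these come from the text above, and the connection code has not been verified against a live server.

```python
def build_select(index, text, limit=20):
    """Build a parameterized SphinxQL full-text SELECT.

    The index name cannot be passed as a bound parameter, so it is
    validated here instead (letters, digits and underscores only).
    """
    if not index or not index.replace("_", "").isalnum():
        raise ValueError(f"suspicious index name: {index!r}")
    sql = f"SELECT id FROM {index} WHERE MATCH(%s) LIMIT %s"
    return sql, (text, int(limit))


def build_insert(index, doc_id, title):
    """Build an INSERT for a RealTime index with a hypothetical 'title' field."""
    if not index or not index.replace("_", "").isalnum():
        raise ValueError(f"suspicious index name: {index!r}")
    sql = f"INSERT INTO {index} (id, title) VALUES (%s, %s)"
    return sql, (int(doc_id), title)


def run(sql, params, host="127.0.0.1", port=9306):
    """Send one statement to searchd over the MySQL protocol.

    Assumption: pymysql is installed and searchd accepts MySQL-protocol
    connections on the given port.
    """
    import pymysql  # third-party; imported lazily so the builders work without it

    conn = pymysql.connect(host=host, port=port, user="")
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    finally:
        conn.close()


# Example usage (requires a running searchd):
#   rows = run(*build_select("products", "red shovel", limit=5))
```

Keeping the query builders separate from the network call lets the statements be inspected or logged without a server.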

SphinxSE

When using MariaDB or MySQL, searchd can also be queried through a table that uses the SphinxSE engine. The Sphinx query is passed to searchd via the table's reserved query field.
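As a rough illustration, the value placed in the query field is the search text followed by semicolon-separated options such as mode=any. The helper below is a hypothetical sketch that assembles such a string; the table name t1 in the usage comment is an assumption, and search text containing semicolons is rejected here rather than escaped.

```python
def sphinxse_query(text, **options):
    """Assemble the string passed in a SphinxSE table's query column.

    Example: sphinxse_query("test it", mode="any") -> "test it;mode=any"
    """
    if ";" in text:
        # This sketch does not attempt to escape the option separator.
        raise ValueError("search text must not contain ';' in this sketch")
    parts = [text] + [f"{name}={value}" for name, value in options.items()]
    return ";".join(parts)


# Example usage with any MySQL client cursor (table name t1 is hypothetical):
#   cursor.execute("SELECT * FROM t1 WHERE query=%s",
#                  (sphinxse_query("test it", mode="any"),))
```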

Full-text fields

Full-text fields (or just fields for brevity) are the textual document contents that get indexed by Sphinx, and can be (quickly) searched for keywords. Fields are named, and you can limit your searches to a single field (e.g. search through "title" only) or a subset of fields (e.g. to "title" and "abstract" only). Sphinx's index format generally supports up to 256 fields. Note that the original contents of the fields are not stored in the Sphinx index. The text that you send to Sphinx gets processed, and a full-text index (a special data structure that enables quick searches for a keyword) gets built from that text. But the original text contents are then simply discarded. Sphinx assumes that you store those contents elsewhere anyway.
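The idea of a per-field full-text index that discards the original text can be illustrated with a toy inverted index. This is a conceptual sketch only and says nothing about Sphinx's actual index format; the class name TinyIndex and its methods are made up for the example.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps (field, keyword) -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.field_names = set()

    def add(self, doc_id, fields):
        """Index a document given as {field_name: text}.

        Only keywords are recorded; the original text is not kept.
        """
        for name, text in fields.items():
            self.field_names.add(name)
            for word in re.findall(r"\w+", text.lower()):
                self.postings[(name, word)].add(doc_id)

    def search(self, keyword, fields=None):
        """Return sorted ids of documents containing keyword.

        fields limits the search to the named fields; None means all fields.
        """
        keyword = keyword.lower()
        names = self.field_names if fields is None else fields
        hits = set()
        for name in names:
            hits |= self.postings.get((name, keyword), set())
        return sorted(hits)


if __name__ == "__main__":
    idx = TinyIndex()
    idx.add(1, {"title": "Garden shovel", "abstract": "Steel blade"})
    idx.add(2, {"title": "Laptop", "abstract": "Light as a shovel"})
    print(idx.search("shovel"))                    # all fields
    print(idx.search("shovel", fields=["title"]))  # title only
```

After indexing, the structure can answer which documents contain a keyword in a given field, but it cannot reproduce the documents themselves, which is why the contents have to be stored elsewhere.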

Attributes

Attributes are additional values associated with each document that can be used to perform additional filtering and sorting during search. Attributes are named. Attribute names are case insensitive. Attributes are not full-text indexed; they are stored in the index as is. Currently supported attribute types are:

More about JSON Attributes in Sphinx

Sphinx, like classic SQL databases, works with a so-called fixed schema, that is, a set of attribute columns. These work well when most of the data you store actually has values. However, mapping sparse data to static columns can be very cumbersome. Assume, for example, that you’re running a price comparison or an auction site with many different product categories. Some attributes, such as the price or the vendor, are common to all goods. But for laptops you also need to store the weight, screen size, HDD type, RAM size, and so on, while for shovels you probably want to store the color, the handle length, and so on. The attributes are manageable within a single category, but the distinct fields needed for all the goods across all the categories are legion. A JSON attribute can be used to overcome this. Inside the JSON attribute you don’t need a fixed structure: you can have various keys that may or may not be present in all documents. When you filter on one of these keys, Sphinx ignores documents that don’t have the key in the JSON attribute and works only with those documents that have it.
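The filtering behaviour described above (documents lacking the key are skipped) can be mimicked in plain Python. This is an illustration of the semantics only, not of Sphinx's query syntax or implementation; the function name, key names and sample data are invented for the example.

```python
import json


def filter_by_json_key(docs, key, predicate):
    """Return ids of documents whose JSON attribute has `key` and passes `predicate`.

    docs is an iterable of (doc_id, json_text) pairs. Documents without
    the key are ignored rather than treated as errors.
    """
    matches = []
    for doc_id, raw in docs:
        attrs = json.loads(raw)
        if key not in attrs:
            continue  # sparse data: this document simply has no such key
        if predicate(attrs[key]):
            matches.append(doc_id)
    return matches


if __name__ == "__main__":
    catalog = [
        (1, '{"weight_kg": 1.4, "ram_gb": 16}'),      # laptop
        (2, '{"color": "red", "handle_cm": 120}'),    # shovel
        (3, '{"weight_kg": 2.6, "ram_gb": 8}'),       # laptop
    ]
    # Only documents 1 and 3 have weight_kg; of those, only 1 is under 2 kg.
    print(filter_by_json_key(catalog, "weight_kg", lambda w: w < 2))
```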

License

Sphinx is dual licensed:

  1. GNU General Public License version 2.
  2. Commercial licensing, available for use cases that fall outside the terms of the GNU GPLv2.

Sphinx use examples

Feature list

Performance and scalability

See also

References

  1. "AskMonty: About SphinxSE". http://kb.askmonty.org. Monty Program AB. Retrieved 2013-08-16.
  2. "Sphinx Wiki: Third Party Tools". http://sphinxsearch.com. Sphinx Search Wiki. Retrieved 2013-08-16.
  3. "xmlpipe2". http://sphinxsearch.com. Sphinx Search Documentation. Retrieved 2013-08-16.
  4. "JSON Attributes in Sphinx 2.1.1". http://sphinxsearch.com. Sphinx Search Blog. Retrieved 2013-08-16.
  5. "Full JSON Support in Trunk". http://sphinxsearch.com. Sphinx Search Blog. Retrieved 2013-08-16.
  6. "Sphinx at Craigslist". http://craigslist.org. Craigslist. Retrieved 2013-08-17.
  7. "GM Recruitment". http://www.aleph-networks.com. Aleph-networks. Retrieved 2012-10-01.
  8. "Lighting Fast PHP Site Search". http://tradebit.com. Tradebit. Retrieved 2013-08-17.
  9. "Sphinx Search beta for Vbulletin 4.0". http://vbulletin.com. Vbulletin. Retrieved 2013-08-17.
  10. "Sphinx Search Extension for MediaWiki". http://mediawiki.org. MediaWiki: Svemir Brkic, Paul Grinberg. Retrieved 2013-08-17.
  11. "Powered by Sphinx Search: Boardreader". http://sphinxsearch.com. Sphinx Search. Retrieved 2013-08-17.
  12. "About Sphinx". http://sphinxsearch.com. Sphinx Search. Retrieved 2013-08-16.
  13. "Powered by Sphinx". http://sphinxsearch.com. Sphinx Search. Retrieved 2013-08-16.
  14. "Craigslist: Factsheet". http://www.craigslist.org. Craigslist. Retrieved 2013-08-16.

External links


Further reading