Search appliance

From Wikipedia, the free encyclopedia

A search appliance (SA) is a type of computer appliance that is attached to a corporate network to index the content shared across that network, much as a web search engine indexes the Web.[1][2]


Architecture

A search appliance usually consists of several components: a gathering component, a standardizing component, a data storage area, a search component, a search interface component, and a management interface component:[3]

  • The gathering component is usually a web crawler or file crawler that traverses a network or the Internet and gathers files and data from specified locations. These might include SMB shared directories, NFS shared directories, databases, and web pages. The crawler may either copy the files to the search appliance or copy only their metadata.
  • The standardizing component takes the data from the gathering component, converts it into a standardized format, and places it in the data storage area.
  • The data storage component holds metadata about the files and might also hold copies of the files or data themselves.
  • The search component searches the stored file metadata and passes the matches to the search interface as query results. It can also provide links either to the copies of the files stored on the search appliance or to the original files in their source locations.
  • The search interface is the component where users compose their search queries. It provides instructions to the search component and displays query results to the user.
  • The management interface lets administrators manage user accounts and permissions, add and delete search indexes, schedule crawl jobs, and perform other related functions.
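
The way these components fit together can be illustrated with a minimal Python sketch. It is not taken from any real appliance: the names Record, gather, standardize, Store and build_index are hypothetical, the "network location" is assumed to be a local directory tree, and the index is a plain in-memory dictionary of words. The copy_content flag mirrors the choice between copying files and copying only their metadata.

```python
import os
import re
from dataclasses import dataclass


@dataclass
class Record:
    """Standardized form of one gathered file (hypothetical schema)."""
    path: str       # link back to the original file
    size: int       # a piece of file metadata
    text: str = ""  # optional copy of the file content


def gather(root):
    """Gathering component: walk a directory tree and yield file paths."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def standardize(path, copy_content=True):
    """Standardizing component: convert one file into a Record."""
    text = ""
    if copy_content:
        with open(path, encoding="utf-8", errors="replace") as handle:
            text = handle.read()
    return Record(path=path, size=os.path.getsize(path), text=text)


class Store:
    """Data storage and search components, kept in memory for brevity."""

    def __init__(self):
        self.records = {}  # path -> Record
        self.index = {}    # word -> set of paths containing that word

    def add(self, record):
        self.records[record.path] = record
        # Index words from the file name and, if it was copied, the content.
        source = os.path.basename(record.path) + " " + record.text
        for word in re.findall(r"\w+", source.lower()):
            self.index.setdefault(word, set()).add(record.path)

    def search(self, query):
        """Return the paths of files that contain every word in the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)


def build_index(root, copy_content=True):
    """Run the gather -> standardize -> store pipeline over one directory."""
    store = Store()
    for path in gather(root):
        store.add(standardize(path, copy_content))
    return store
```

Under these assumptions, build_index(some_directory).search("sales report") would return the paths of the files whose name or content contains both words. A search interface would display those paths as links to the original files, and a management interface would decide when build_index is run again.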

Commercial examples

  • Google Search Appliance is an SA from Google. It is supplied in two models: a 2U model (GB-7007) capable of indexing up to 10 million documents, and a 5U model (GB-9009) capable of indexing up to 30 million documents.[4]
  • The Perfect Search Appliance is another example of an appliance that searches files. It stores file metadata in an index on the appliance. A web server on the appliance uses that metadata to return relevant search results in response to user queries, and provides links to the original files.[6][7]
  • Clusterpoint Search Appliance is a software-only document and file indexing and search solution based on Clusterpoint Server, a hybrid NoSQL database management system that combines a database with a fast full-text search engine in a single software platform. The software package includes a crawler, a web server, a high-speed scalable document-oriented database, a management application, and a customizable web GUI-based search application. The software can be downloaded and installed on commodity hardware, where it operates as a server-based enterprise search appliance. Multiple cluster nodes installed with the software and configured to work together form a single large searchable cluster database. The indexing application crawls specified file systems and web resources at scheduled intervals, then extracts and stores the original documents, their text content, and their metadata in this cluster database for searching. Customers can apply their own custom rankings to display results in order of relevance, using the Clusterpoint API and Clusterpoint Information Ranking formulas.
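
The general idea of a custom ranking, as mentioned in the last entry, can be sketched in a few lines of Python. This is a generic illustration only: it does not use the Clusterpoint API or its ranking formulas, and the rank function, the field names "title" and "body", and the default weights are all assumptions made for the example. Each document is scored by a weighted count of query-term matches per field, and results are returned highest score first.

```python
def rank(results, query, weights=None):
    """Order documents by a weighted count of query-term matches per field."""
    # Hypothetical default: a match in the title counts three times as much
    # as a match in the body.
    weights = weights or {"title": 3.0, "body": 1.0}
    terms = query.lower().split()

    def score(doc):
        total = 0.0
        for field_name, weight in weights.items():
            words = doc.get(field_name, "").lower().split()
            total += weight * sum(words.count(term) for term in terms)
        return total

    # Highest score first; documents with equal scores keep their input order.
    return sorted(results, key=score, reverse=True)
```

Passing a different weights dictionary changes the order of the same result set, which is the kind of adjustment a customer-defined ranking allows.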

References

  1. "Google and Thunderstone deliver plug and search to the enterprise", Infoworld.com, October 2004.
  2. "Google's Mini search appliance", ZDnet.com, April 2005.
  3. https://doc.perfectsearchcorp.com/?q=content/search-appliance-21-configuration-guide
  4. "Google Releases New Versions of Its Search Appliance", Computerworld.
  5. Product Description
  6. http://www.perfectsearchcorp.com
  7. https://doc.perfectsearchcorp.com

This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.