Federated search

From Wikipedia, the free encyclopedia

Federated search is an emerging feature of automated, web-based library and information management systems. A system that offers it is often called a portal, to distinguish it from a simple web-based search engine.

User objectives

The goal of federated searching is to let a user search multiple independent, separately hosted data sources or databases with a single query. Traditional search engines, such as Google, can search, retrieve and access only sources that are available on the Internet. The large volume of documents housed in proprietary databases is closed to them unless the documents are also published on a website. A library, university or private firm must therefore first purchase access from the individual data source vendors or providers, which license access to the information in their databases.

When a cluster of databases is purchased in this manner without a federated search layer, a user cannot search a selection of them with one query string. Typically the user must select a specific database, search it, collect and evaluate the results, and then repeat the procedure with the next database. This process is time-consuming and inefficient, in part because many duplicate entries may be found. Moreover, each database may have different search features and options, which affect the results a user retrieves. Often the user must spend time learning the unique features of each data source before being able to search it accurately and reliably.

The process

Federated search is implemented as a computer program that lets users query multiple data sources with a single query string entered in a single interface. The process has two steps.

In the first step, the user enters a search query in the portal interface’s search box, and the search string is sent to every individual database that is included in the portal's federated search list. This connection must be programmed by the portal vendor and is often called the 'search module'. Each individual database must also be linked to each portal user's web IP address. Sophisticated portals often let the user select or deselect the data sources to be queried. Vendor portals generally include reference databases, public-access web-based library catalogues (OPACs), web-based search engines such as Google, and in-house or corporate data sources.

In the second step, the individual data sources send a list of results for the query back to the portal's interface. The user can review this 'hit list', which is usually just the number of articles retrieved from each source. The user then follows a hyperlink to the complete result list for each source; if only one record is returned, the link leads directly to that record. Some portals 'screen scrape' the actual database results and do not let the user enter the data source's own application directly.

Portals offer additional features, but the basic idea is the same: to improve the accuracy and relevance of individual searches and to reduce the time required to search for resources.
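The two steps described above can be sketched in Python as follows. This is a minimal illustration, not a description of any vendor's search module: the `SOURCES` dictionary, the source names and the substring matching are all hypothetical stand-ins for real licensed databases and their search interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for licensed databases, catalogues, etc.
SOURCES = {
    "reference_db": ["Federated search in libraries", "Metadata harvesting"],
    "opac": ["Introduction to federated search", "Cataloguing rules"],
    "intranet": ["Quarterly report"],
}

def search_source(name, query):
    """Return the records in one source whose title contains the query."""
    needle = query.lower()
    return [title for title in SOURCES[name] if needle in title.lower()]

def federated_search(query, selected=None):
    """Step one: send the query to every selected source.
    Step two: collect the result list that each source sends back."""
    names = list(selected) if selected is not None else list(SOURCES)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        futures = {name: pool.submit(search_source, name, query) for name in names}
        return {name: future.result() for name, future in futures.items()}

def hit_list(results):
    """Summarise the results as the number of hits per source."""
    return {name: len(records) for name, records in results.items()}
```

Calling `hit_list(federated_search("federated search"))` yields one count per source, and passing `selected=["opac"]` mimics a user deselecting the other data sources.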

Applications

One application of federated searching is the meta-search engine; however, this is not a complete solution, as many documents are not currently indexed. These unindexed documents are known as the invisible web. Many more information sources are not yet stored in electronic form. Google Scholar is an example of a project trying to address this.

When the search vocabulary or data model of the search system differs from that of one or more of the foreign target systems, the query must be translated into the vocabulary of each of those target systems. This can be done using simple data-element translation, or it may require semantic translation.
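Simple data-element translation can be pictured as renaming the fields of a query for each target. The sketch below assumes a query expressed as a dictionary of field names and values; the target names and field codes in `FIELD_MAPS` are hypothetical, and semantic translation is not covered.

```python
# Hypothetical field names; real target systems define their own vocabularies.
FIELD_MAPS = {
    "catalogue": {"title": "ti", "author": "au"},
    "article_db": {"title": "TITLE", "author": "CREATOR"},
}

def translate_query(query, target):
    """Rename portal fields to the target's fields, dropping unsupported ones."""
    mapping = FIELD_MAPS[target]
    return {mapping[field]: value for field, value in query.items() if field in mapping}
```

Here a field that the target does not support is silently dropped; a real portal might instead reject the query or report the omission to the user.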

Quotes

Federated searching consists of transforming a query and broadcasting it to a group of disparate databases with the appropriate syntax, merging the results collected from the databases, presenting them in a succinct and unified format with minimal duplication, and allowing the library patron to sort the merged result set by various criteria (definition by Peter Jacso, 2004).
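The merging, de-duplication and sorting steps named in this definition might look like the following Python sketch. The record layout (dictionaries with `title` and `year` keys) and the duplicate rule (same title and year, ignoring case and surrounding whitespace) are assumptions made for illustration only.

```python
def merge_results(result_sets, sort_key="title"):
    """Merge per-source record lists, drop duplicates, and sort the result."""
    merged = {}
    for source, records in result_sets.items():
        for record in records:
            # Assumed duplicate rule: same title and year,
            # ignoring case and surrounding whitespace.
            key = (record["title"].strip().lower(), record["year"])
            entry = merged.setdefault(key, {**record, "sources": []})
            entry["sources"].append(source)
    return sorted(merged.values(), key=lambda r: r[sort_key])
```

Each merged record keeps a `sources` list, so the user can still see which databases returned it, and `sort_key` lets the merged set be ordered by a different field such as `year`.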

See also