Relevance (information retrieval)
From Wikipedia, the free encyclopedia
In computer science, and particularly in search engines, relevance is a numerical score assigned to a search result, representing how well the result meets the information need of the user who issued the search query. In many cases, a result's relevance determines the order in which it is presented to the user.
In academic information retrieval, the word relevance has been used in system evaluation for over forty years, going back to the Cranfield Experiments of the early 1960s. In the relatively new realm of commercial search, among web search engine companies, search engine optimizers, and the press, the incorrect form relevancy is increasingly used in place of the correct form relevance. One can often tell which community an information retrieval practitioner comes from by which form of the word he or she uses. Wikipedia's search facility itself once used the incorrect relevancy.
Algorithms for relevance
In the simplest case, relevance can be calculated by counting how many times a query term appears in a document (term frequency), possibly combined with how discriminative that query term is across the searched collection (inverse document frequency). The combination is often called term frequency-inverse document frequency, or TF-IDF.
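The simple scoring scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tf_idf_scores`, the whitespace tokenization, the raw-count term frequency, and the unsmoothed logarithmic inverse document frequency are choices made for this example, not the formula of any particular search engine.

```python
import math
from collections import Counter


def tf_idf_scores(query, documents):
    """Score each document by summing TF-IDF weights over the query terms.

    Hypothetical helper for illustration: tokenization is a plain
    lowercase whitespace split, TF is the raw count, and
    IDF is log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if doc_freq[term] == 0:
                continue  # term occurs nowhere in the collection
            idf = math.log(n_docs / doc_freq[term])
            score += term_counts[term] * idf
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "information retrieval ranks documents",
]
scores = tf_idf_scores("cat mat", docs)

# Present results in decreasing order of estimated relevance.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Here the first document ranks highest because it contains both query terms, and the rarer term "mat" contributes more to its score than the more common "cat".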
Since search engines and other businesses rely upon the accuracy of their results, many additional, more complex algorithms have been developed to estimate result relevance. Many of these algorithms, particularly those used by search engines, are hidden from the public, because a user who knows the details of a search algorithm can artificially boost the ranking of his or her own content.
Relevance calculation is often misinterpreted by the press. For example, it has often been said that when Google first appeared it was far ahead of its competitors because it, unlike any other search engine, ranked web pages by relevance. This is not true, since every search engine ranks by relevance. Google had simply come up with a fairly new way of estimating relevance, namely PageRank. Even a search engine that uses only TF-IDF ranks by relevance.
Clustering and relevance
The cluster hypothesis in information retrieval says that two documents that are similar to each other have a high likelihood of being relevant to the same information need. Topic clustering and document filtering algorithms are often described as grouping relevant documents together. What is actually meant is that these algorithms group similar documents together. Two (or more) documents are never relevant to each other. They may be similar to each other, but they are only ever relevant to a user's information need. If there is no user information need, there is no relevance.[citation needed]
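The distinction can be made concrete with a small sketch of a document-to-document similarity measure. Cosine similarity over raw term counts is used here purely as an assumed example of such a measure; the function name `cosine_similarity` and the whitespace tokenization are hypothetical choices for this illustration. Note that the function takes two documents and no query: it measures similarity, not relevance.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two documents' term-count vectors.

    Hypothetical helper for illustration: tokens are produced by a
    plain lowercase whitespace split.
    """
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())

    # Dot product over the terms the two documents share.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


similar = cosine_similarity("search engine ranking", "search engine results")
unrelated = cosine_similarity("search engine ranking", "chocolate cake recipe")
```

A clustering algorithm built on such a measure would place the first pair of documents together without ever consulting an information need. Under the cluster hypothesis, documents grouped this way are then likely to be relevant to the same need once a user expresses one.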