Ranking function
In information retrieval, a ranking function is a function used by search engines to rank matching documents according to their relevance to a given search query.
Once a search engine has identified a set of potentially relevant documents, it faces the task of determining which documents are most relevant, so that they may be listed first. This is typically done by assigning a numerical score to each document based on a ranking function, which incorporates features of the document, the query, and the overall document collection.
Some very simple ranking functions include:
- The constant ranking function assigning the same score to all documents.
- The term frequency ranking function counting the number of times that each query term occurs in the document, then summing these.
- The tf-idf ranking function computing the product of the term frequency and inverse document frequency for each query term, then summing these.
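The three simple functions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes documents and queries are already tokenized into lists of terms, and it assumes one common form of inverse document frequency, log(N / df). The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def constant_score(query_terms, doc_terms):
    # Every document receives the same score, so the ranking is arbitrary.
    return 1.0

def tf_score(query_terms, doc_terms):
    # Sum, over the query terms, of how often each term occurs in the document.
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)

def idf(term, corpus):
    # Assumed idf variant: log(N / df); terms absent from the corpus get 0.
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0

def tfidf_score(query_terms, doc_terms, corpus):
    # Sum, over the query terms, of term frequency times inverse document frequency.
    counts = Counter(doc_terms)
    return sum(counts[t] * idf(t, corpus) for t in query_terms)

# Hypothetical toy corpus of pre-tokenized documents.
corpus = [
    ["cat", "sat", "mat"],
    ["cat", "cat", "dog"],
    ["dog", "barked"],
]
query = ["cat", "mat"]

tf_scores = [tf_score(query, doc) for doc in corpus]
tfidf_scores = [tfidf_score(query, doc, corpus) for doc in corpus]
```

On this toy corpus, term frequency alone gives the first two documents the same score, while tf-idf ranks the first document higher because it contains the rarer term "mat".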
More sophisticated ranking functions include:
- Okapi BM25: a variant of tf-idf. As of 2008, it represented the state of the art and was used in many practical applications.
- Machine-learned ranking formulas, obtained automatically from training data by machine learning methods.
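As a rough illustration of the first item, the following Python sketch scores a document with one widely used form of Okapi BM25. Several details are assumptions rather than anything fixed by the text above: the parameter defaults k1 = 1.2 and b = 0.75, the smoothed idf variant log((N - df + 0.5) / (df + 0.5) + 1), and the representation of documents as lists of terms. The function name is hypothetical.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    # corpus is a list of tokenized documents; k1 and b are assumed defaults.
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    counts = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        # Assumed smoothed idf variant, which is never negative.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        tf = counts[term]
        # Length normalization: longer-than-average documents are penalized.
        norm = tf + k1 * (1.0 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1.0) / norm
    return score

# Hypothetical toy corpus of pre-tokenized documents.
corpus = [
    ["cat", "sat", "mat"],
    ["cat", "cat", "dog"],
    ["dog", "barked"],
]
scores = [bm25_score(["cat"], doc, corpus) for doc in corpus]
```

Unlike plain tf-idf, the contribution of a term saturates as its frequency grows: in this example the document containing "cat" twice scores higher than the one containing it once, but less than twice as high.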
Ranking functions are evaluated by a variety of means. One of the simplest is the precision of the first k top-ranked results for some fixed k, averaged over many queries; for example, the average proportion of the top 10 results that are relevant.
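This measure can be sketched as follows. The sketch assumes that each query comes with a ranked list of document identifiers and a set of identifiers judged relevant, and that the count is always divided by k, even when fewer than k results are returned (one common convention). The function names and sample data are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    # Fraction of the first k ranked results that are relevant.
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k

def mean_precision_at_k(runs, k):
    # runs is a list of (ranked_ids, relevant_ids) pairs, one per query.
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)

# Hypothetical judgments for two queries.
runs = [
    (["a", "b", "c"], {"a", "b"}),
    (["c", "d", "e"], {"d"}),
]
average = mean_precision_at_k(runs, 2)
```

Here the first query has both of its top two results relevant and the second has one of two, so the average precision at k = 2 is 0.75.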
Frequently, computation of ranking functions can be simplified by taking advantage of the observation that only the relative order of scores matters, not their absolute value. Hence terms or factors that are independent of the document may be removed, and terms or factors that are independent of the query may be precomputed and stored with the document.
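To make this concrete, the sketch below uses cosine similarity between weighted term vectors as the ranking function. That choice is an assumption made purely for illustration; the text above does not prescribe it. The query norm is the same positive factor for every document, so dropping it leaves the order unchanged, and each document norm is independent of the query, so it can be computed once and stored with the document. All names and data are hypothetical.

```python
import math

def norm(vec):
    return math.sqrt(sum(w * w for w in vec.values()))

def dot(query_vec, doc_vec):
    return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())

def cosine(query_vec, doc_vec):
    # Full score: recomputes both norms for every (query, document) pair.
    return dot(query_vec, doc_vec) / (norm(query_vec) * norm(doc_vec))

def build_index(docs):
    # Query-independent factor: each document norm is precomputed and stored.
    return {doc_id: (vec, norm(vec)) for doc_id, vec in docs.items()}

def simplified_score(query_vec, indexed_doc):
    # Document-independent factor (the query norm) is dropped entirely.
    vec, doc_norm = indexed_doc
    return dot(query_vec, vec) / doc_norm

# Hypothetical term-weight vectors.
docs = {
    "d1": {"cat": 1.0, "mat": 1.0},
    "d2": {"cat": 2.0, "dog": 1.0},
    "d3": {"dog": 1.0, "barked": 1.0},
}
query = {"cat": 1.0, "mat": 2.0}

index = build_index(docs)
full_order = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
fast_order = sorted(index, key=lambda d: simplified_score(query, index[d]), reverse=True)
```

The two orderings agree even though the simplified scores differ from the full ones. Note that a document-independent factor can only be dropped safely when doing so preserves order, as with a positive multiplier or an added constant.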