SMART Information Retrieval System

The SMART (System for the Mechanical Analysis and Retrieval of Text) Information Retrieval System is an information retrieval system developed at Cornell University in the 1960s. Many important concepts in information retrieval were developed as part of research on the SMART system, including the vector space model, relevance feedback, and Rocchio classification.

Gerard Salton led the group that developed SMART. Other contributors included Mike Lesk.

The SMART system also provides a set of corpora, queries and reference rankings, taken from different subjects.

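Part of the SMART system's legacy is the so-called SMART notation, a mnemonic scheme for denoting tf-idf weighting variants in the vector space model. A combination of weights is written ddd.qqq, where the first three letters represent the term weighting of the document vector and the last three letters represent the term weighting of the query vector. Within each triple, the letters give the term frequency, document frequency, and normalization components, in that order. For example, lnc.ltc denotes logarithmic term frequency, no document frequency weighting, and cosine normalization for documents, combined with logarithmic term frequency, idf weighting, and cosine normalization for queries. The letters for a term t and document d are as follows:[1]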

Term frequency

  n (natural): \text{tf}_{t,d}
  l (logarithm): 1 + \log(\text{tf}_{t,d})
  a (augmented): 0.5 + \frac{0.5 \times \text{tf}_{t,d}}{\max_t(\text{tf}_{t,d})}
  b (boolean): \begin{cases} 1, & \text{if } \text{tf}_{t,d} > 0 \\ 0, & \text{otherwise} \end{cases}
  L (log average): \frac{1 + \log(\text{tf}_{t,d})}{1 + \log(\text{ave}_{t \in d}(\text{tf}_{t,d}))}

Document frequency

  n (no): 1
  t (idf): \log\frac{N}{\text{df}_t}
  p (prob idf): \max\left(0, \log\frac{N - \text{df}_t}{\text{df}_t}\right)

Normalization

  n (none): 1
  c (cosine): \frac{1}{\sqrt{w_1^2 + w_2^2 + \dots + w_M^2}}
  b (byte size): 1/\textit{CharLength}^\alpha, \alpha < 1

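where tf_{t,d} is the term frequency of term t in document d, df_t is the number of documents in the collection that contain t, N is the total number of documents in the collection, and w_1, ..., w_M are the weights of the M terms in the vector being normalized.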
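
The notation can be illustrated with a short Python sketch. It is not code from the SMART system itself: the function names (tf_weight, df_weight, weight_vector, score) and the toy collection are hypothetical. It assumes natural logarithms, gives absent terms a weight of 0, skips query terms that never occur in the collection, and leaves out byte-size normalization because that needs character lengths and a choice of \alpha.

```python
import math
from collections import Counter


def tf_weight(code, tf, max_tf, ave_tf):
    """Term-frequency component (first letter of a SMART triple)."""
    if tf == 0:
        return 0.0  # assumption: terms absent from the text get weight 0
    if code == "n":  # natural
        return float(tf)
    if code == "l":  # logarithm
        return 1.0 + math.log(tf)
    if code == "a":  # augmented
        return 0.5 + 0.5 * tf / max_tf
    if code == "b":  # boolean
        return 1.0
    if code == "L":  # log average
        return (1.0 + math.log(tf)) / (1.0 + math.log(ave_tf))
    raise ValueError(f"unknown term-frequency code: {code!r}")


def df_weight(code, df, n_docs):
    """Document-frequency component (second letter of a SMART triple)."""
    if code == "n":  # no document-frequency weighting
        return 1.0
    if code == "t":  # idf
        return math.log(n_docs / df)
    if code == "p":  # prob idf, clipped at 0
        if n_docs <= df:
            return 0.0
        return max(0.0, math.log((n_docs - df) / df))
    raise ValueError(f"unknown document-frequency code: {code!r}")


def weight_vector(triple, term_counts, doc_freq, n_docs):
    """Return {term: weight} for one text under a three-letter code."""
    tf_code, df_code, norm_code = triple
    if norm_code not in ("n", "c"):
        raise ValueError(f"unsupported normalization code: {norm_code!r}")
    if not term_counts:
        return {}
    max_tf = max(term_counts.values())
    ave_tf = sum(term_counts.values()) / len(term_counts)
    weights = {}
    for term, tf in term_counts.items():
        df = doc_freq.get(term, 0)
        if df == 0 and df_code != "n":
            continue  # term unseen in the collection: idf is undefined
        weights[term] = (tf_weight(tf_code, tf, max_tf, ave_tf)
                         * df_weight(df_code, df, n_docs))
    if norm_code == "c":  # cosine: divide by the Euclidean length
        length = math.sqrt(sum(w * w for w in weights.values()))
        if length > 0:
            weights = {t: w / length for t, w in weights.items()}
    return weights


def score(scheme, doc_counts, query_counts, doc_freq, n_docs):
    """Dot product of document and query vectors under 'ddd.qqq'."""
    doc_code, query_code = scheme.split(".")
    d = weight_vector(doc_code, doc_counts, doc_freq, n_docs)
    q = weight_vector(query_code, query_counts, doc_freq, n_docs)
    return sum(w * d.get(term, 0.0) for term, w in q.items())


if __name__ == "__main__":
    # Toy collection, invented purely for illustration.
    docs = [Counter(text.split()) for text in
            ("car insurance auto insurance", "best car", "auto repair")]
    # Iterating a Counter yields each distinct term once per document.
    doc_freq = Counter(term for doc in docs for term in doc)
    query = Counter("best car insurance".split())
    for i, doc in enumerate(docs):
        print(i, round(score("lnc.ltc", doc, query, doc_freq, len(docs)), 3))
```

Keeping the three components as separate functions mirrors the structure of the notation: changing the scheme string, for instance from lnc.ltc to nnn.ntn, changes the weighting without touching the scoring code.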

References

  1. Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich (2008), "Document and query weighting schemes", Introduction to Information Retrieval, Cambridge University Press

This article is issued from Wikipedia - version of the Sunday, December 20, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.