Signature files
From Wikipedia, the free encyclopedia
A signature file is a technique used for document retrieval. The idea is to build a quick-and-dirty filter that retains every document matching the query and, ideally, only a few that do not. This is done by creating a signature for each document, typically a hash-coded version of its content; one common method is superimposed coding. A post-processing step then discards the false alarms, that is, the documents whose signatures pass the filter although they do not actually match the query. Because this structure is in most cases inferior to inverted files in speed, size, and functionality, it is not widely used. With proper parameters, however, it can outperform inverted files in certain environments.
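The filter-then-verify idea can be illustrated with a minimal Python sketch of superimposed coding. It is an illustration under stated assumptions, not the method of any particular system: the signature width (`SIGNATURE_BITS`), the number of bits set per word (`BITS_PER_WORD`), the use of SHA-256 as the hash, whitespace tokenization, and all function names are choices made here for the example.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed signature width, chosen for illustration
BITS_PER_WORD = 3    # assumed number of bit positions hashed per word


def word_signature(word: str) -> int:
    """Hash one word to a bit mask with at most BITS_PER_WORD bits set."""
    mask = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % SIGNATURE_BITS
        mask |= 1 << position
    return mask


def document_signature(text: str) -> int:
    """Superimpose (bitwise OR) the signatures of all words in the text."""
    signature = 0
    for word in text.split():
        signature |= word_signature(word)
    return signature


def search(query: str, documents: list[str], signatures: list[int]) -> list[int]:
    """Return the ids of documents that contain every query word."""
    query_sig = document_signature(query)
    query_words = set(query.lower().split())
    hits = []
    for doc_id, doc_sig in enumerate(signatures):
        # Filter step: every bit of the query signature must be set in the
        # document signature. A true match can never fail this test.
        if (doc_sig & query_sig) != query_sig:
            continue
        # Post-processing step: check the text itself to discard false alarms.
        if query_words <= set(documents[doc_id].lower().split()):
            hits.append(doc_id)
    return hits


documents = [
    "the quick brown fox",
    "signature files filter documents",
    "inverted files index terms",
]
signatures = [document_signature(d) for d in documents]
result = search("signature files", documents, signatures)
```

Because each document signature is the OR of its word signatures, a document containing all query words always passes the bit test; unrelated words can set the same bits by coincidence, which is why the final check against the text is needed. A wider signature or fewer bits per word lowers the false-alarm rate at the cost of space or selectivity.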
References
- Christos Faloutsos and Stavros Christodoulakis, Signature files: An access method for documents and its analytical performance evaluation. ACM Transactions on Information Systems (TOIS), Vol. 2, No. 4 (1984), pp. 267-288.
- Justin Zobel, Alistair Moffat and Kotagiri Ramamohanarao, Inverted files versus signature files for text indexing. ACM Transactions on Database Systems (TODS), Vol. 23, Issue 4 (1998), pp. 453-490.
- Ben Carterette and Fazli Can, Comparing inverted files and signature files for searching a large lexicon. Information Processing and Management, Vol. 41, No. 3 (2005), pp. 613-633.