Random indexing

Random indexing is a dimension reduction method and computational framework for distributional semantics. It is based on three insights: that very-high-dimensional vector space model implementations are impractical; that models need not grow in dimensionality when new items (e.g. new terminology) are encountered; and that a high-dimensional model can be projected into a space of lower dimensionality while approximately preserving L2 distances, provided the resulting dimensions are chosen appropriately. The last insight is the original point of the random projection approach to dimension reduction, first formulated as the Johnson–Lindenstrauss lemma. Locality-sensitive hashing has some of the same starting points. Random indexing, as used in the representation of language, originates from the work of Pentti Kanerva on sparse distributed memory, and can be described as an incremental formulation of a random projection.
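The incremental formulation described above can be illustrated with a minimal sketch. It assumes the commonly described scheme in which each item receives a fixed sparse ternary "index vector" the first time it is seen, and each word's "context vector" is the running sum of the index vectors of its neighbours. The names DIM, NONZERO, WINDOW, make_index_vector, train and cosine, as well as the parameter values, are illustrative assumptions and are not taken from the cited works.

```python
import random
from collections import defaultdict

DIM = 512      # assumed dimensionality of the reduced space (fixed up front)
NONZERO = 8    # assumed number of non-zero entries per index vector
WINDOW = 2     # assumed context window size on each side of a word


def make_index_vector(rng, dim=DIM, nonzero=NONZERO):
    """Return a sparse ternary index vector as {position: +1 or -1}."""
    positions = rng.sample(range(dim), nonzero)
    return {p: rng.choice((-1, 1)) for p in positions}


def train(sentences, dim=DIM, nonzero=NONZERO, window=WINDOW, seed=0):
    """Accumulate context vectors incrementally over tokenised sentences."""
    rng = random.Random(seed)
    index_vectors = {}
    context_vectors = defaultdict(lambda: [0] * dim)
    for tokens in sentences:
        # A new item only needs a new index vector; dim never changes.
        for tok in tokens:
            if tok not in index_vectors:
                index_vectors[tok] = make_index_vector(rng, dim, nonzero)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    # Add the neighbour's index vector to this word's context vector.
                    for pos, sign in index_vectors[tokens[j]].items():
                        context_vectors[word][pos] += sign
    return dict(context_vectors)


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Calling train on a list of tokenised sentences returns one vector of length dim per word, regardless of vocabulary size. Words that occur in similar contexts tend to receive similar vectors, which can be compared with cosine.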

It can also be verified that random indexing is a random projection technique for the construction of Euclidean spaces, i.e. L2-normed vector spaces.[1] In Euclidean spaces, random projections are explained using the Johnson–Lindenstrauss lemma.[2]
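For reference, the lemma can be stated in its standard form as follows. The symbols (the point set X, the map f, the distortion ε and the dimensions d and k) are notation introduced here for illustration, as are the matrices F, R and C in the final line, which express the view of random indexing as a projection of a co-occurrence matrix by a random matrix whose rows are the index vectors.

```latex
% Johnson–Lindenstrauss lemma (standard statement; notation assumed here)
For any $0 < \varepsilon < 1$ and any set $X$ of $n$ points in $\mathbb{R}^d$,
there is a linear map $f : \mathbb{R}^d \to \mathbb{R}^k$ with
$k = O(\varepsilon^{-2} \log n)$ such that, for all $u, v \in X$,
\[
  (1 - \varepsilon)\,\lVert u - v \rVert_2^2
  \;\le\; \lVert f(u) - f(v) \rVert_2^2
  \;\le\; (1 + \varepsilon)\,\lVert u - v \rVert_2^2 .
\]

% Random indexing viewed as a projection (sketch):
% F is the n-by-d co-occurrence matrix, R a d-by-k random matrix.
\[
  C = F R, \qquad F \in \mathbb{R}^{n \times d},\quad
  R \in \mathbb{R}^{d \times k},\quad k \ll d .
\]
```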

TopSig[3] extends the random indexing model to produce bit vectors that are compared using the Hamming distance. It is used to improve the performance of information retrieval and document clustering. In a similar line of research, Random Manhattan Integer Indexing[4] has been proposed to improve the performance of methods that employ the Manhattan distance between text units.
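As a rough illustration of comparing bit vectors with the Hamming distance, the sketch below reduces a real-valued vector to its sign pattern and counts differing bits. This is a generic sign-thresholding scheme assumed for illustration; it is not the TopSig procedure itself, and the names to_signature and hamming are hypothetical.

```python
def to_signature(vector):
    """Pack the sign pattern of a vector into an int (bit i is set if vector[i] > 0)."""
    bits = 0
    for i, value in enumerate(vector):
        if value > 0:
            bits |= 1 << i
    return bits


def hamming(sig_a, sig_b):
    """Number of bit positions in which two signatures differ."""
    return bin(sig_a ^ sig_b).count("1")


# Two toy document vectors (made-up values, for illustration only).
doc_a = [3, -1, 0, 2, -4, 1]
doc_b = [2, -2, 1, 1, -1, -3]
distance = hamming(to_signature(doc_a), to_signature(doc_b))  # 2
```

Packing signatures into integers keeps the comparison to a single XOR followed by a bit count, which is the main practical appeal of bit-vector representations.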

References

  1. QasemiZadeh, B. & Handschuh, S. (2014) Random Manhattan Indexing, in Proceedings of the 25th International Workshop on Database and Expert Systems Applications.
  2. Johnson, W. & Lindenstrauss, J. (1984) Extensions of Lipschitz mappings into a Hilbert space, in Contemporary Mathematics. American Mathematical Society, vol. 26, pp. 189–206.
  3. Geva, S. & De Vries, C.M. (2011) TopSig: Topology Preserving Document Signatures, in Proceedings of Conference on Information and Knowledge Management 2011, 24–28 October 2011, Glasgow, Scotland.
  4. QasemiZadeh, B. & Handschuh, S. (2014) , in Proceedings of EMNLP'14.