Nearest neighbor search

From Wikipedia, the free encyclopedia

Nearest Neighbor Search (NNS), also known as proximity search or closest point search, is a numerical optimization problem for finding closest points in multidimensional metric spaces. The problem is: given a set S of points in d-dimensional space V and a query point q ∈V, find the closest point in S to q. In most cases, V is taken to be d-dimensional Euclidean space and distance is measured by Euclidean distance or Manhattan distance.

Contents

[edit] Applications

The Nearest Neighbor Search problem arises in several fields including

[edit] Methods

Various solutions to the NNS problem have been proposed. The quality and usefulness of the algorithm are determined by the time complexity of queries as well as the space complexity of any search data structures that must be maintained.

[edit] Linear Search

The simplest solution to the NNS problem is to compute the distance from the query point to every other point in the database, keeping track of the "best so far". This algorithm, sometimes referred to as the naive approach, works for small databases but quickly becomes intractable as either the size or the dimensionality of the problem becomes large. Linear search has a running time of O(N*d) where N is the cardinality of S and d is the dimensionality of S. There are no search data structures to maintain, so linear search has no space complexity beyond the storage of the database.

[edit] Space Partitioning

Several space-partitioning methods have been developed for solving the NNS problem. Perhaps the simplest is the kd-Tree, which iteratively bisects the search space into two regions containing half of the points of the parent region. Queries are performed via traversal of the tree from the root to a leaf by evaluating the query point at each split. Query time complexity is O(log N).

[edit] Locality sensitive hashing

Locality sensitive hashing (LSH) is a technique for grouping points in space into 'buckets'.

[edit] Variants

There are two primary variants of the NNS problem. The first is k-Nearest Neighbor Search, which identifies the top k nearest neighbors to the query. This technique is commonly used in predictive analytics to estimate or classify a point based on the consensus of its neighbors.

The second variant is ε/Approximate Nearest Neighbor Search, which is becoming an increasingly popular tool for fighting the "curse of dimensionality".

[edit] Related Papers

  • Arya, S., D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions. Journal of the ACM, vol. 45, no. 6, pp. 891-923