Cover tree
From Wikipedia, the free encyclopedia
The cover tree is a special type of data structure in computer science that is specifically designed to facilitate the speed-up of a nearest neighbor search. It was introduced by Alina Beygelzeimer, John Langford, and Sham Kakade.
The tree can be thought of as a hierarchy of levels with the top level containing the root point and the bottom level containing every point in the metric space. Each level C is associated with an integer value i that decrements by one as the tree is descended. Each level C in the cover tree has three important properties:
- Nesting:
- Covering: For every point , there exists a point such that the distance from p to q is less than or equal to 2i and exactly one such q is a parent of p.
- Separation: For all points , the distance from p to q is greater than 2i.
[edit] Complexity Analysis
- Searches: Like other metric trees the cover tree allows for nearest neighbor searches in O(k*log n) where k is a constant associated with the dimensionality of the dataset and n is the cardinality. This maximum bound on search times is much better than naive linear searches which operate in O(k*n). However, in high-dimensional metric spaces the k constant is non-trivial, which means it cannot be ignored in complexity analysis. Unlike other metric trees, the cover tree has a theoretical bound on its constant that is based on the dataset's expansion constant or doubling constant. The bound on search time is O(c12 log n) where c is the expansion constant of the dataset.
- Insertions: Although cover trees provide faster searches than the naive approach, this advantage must be weighed with the additional cost of maintaining the data structure. In a naive approach adding a new point to the dataset is trivial because order does not need to be preserved, but in a cover tree it can take O(c6 log n) time. However, this is an upper-bound, and a number of techniques have been implemented that reduce the average-case time complexity [1].
- Space: The cover tree uses implicit representation to keep track of repeated points. Thus, it only requires O(n) space.
[edit] See also
[edit] References
- JL's Cover Tree page. John Langford's page links to papers and code.