R-tree

From Wikipedia, the free encyclopedia

This article is about the data structure. For the type of metric space, see Real tree.
Simple example of R-tree for a 2D rectangles
Simple example of R-tree for a 2D rectangles

R-trees are tree data structures that are similar to B-trees, but are used for spatial access methods i.e., for indexing multi-dimensional information; for example, the (X, Y) coordinates of geographical data. A common real-world usage for an R-tree might be: "Find all museums within 2 miles of my current location".

The data structure splits space with hierarchically nested, and possibly overlapping, minimum bounding rectangles (otherwise known as bounding boxes).

Each node of an R-tree has a variable number of entries (up to some pre-defined maximum). Each entry within a non-leaf node stores two pieces of data; a way of identifying a child node, and the bounding box of all entries within this child node.

The insertion and deletion algorithms use the bounding boxes from the nodes to ensure that "nearby" elements are placed in the same leaf node (in particular, a new element will go into the leaf node that requires the least enlargement in its bounding box). Each entry within a leaf node stores two pieces of information; a way of identifying the actual data element (which, alternatively, may be placed directly in the node), and the bounding box of the data element.

Similarly, the searching algorithms (for example; intersection, containment, nearest) use the bounding boxes to decide whether or not to search inside a child node. In this way, most of the nodes in the tree are never "touched" during a search. Like B-trees, this makes R-trees suitable for databases, where nodes can be paged to disk when needed.

Different algorithms can be used to "split" nodes when they become too full, resulting in the "Quadratic" and "Linear" R-tree sub-types.

R-trees do not historically guarantee good worst-case performance, but generally performed well with real-world data. However, a new algorithm was published in 2004 that defines the Priority R-Tree, which claims to be as efficient as the currently most efficient methods and is at the same time worst-case optimal.

Contents

[edit] Variants

[edit] Algorithm

[edit] Search

The input is a search rectangle (Query box). Searching is quite similar to searching in a B-tree. The search starts from the root node of the tree. Every internal node contains a set of rectangles and pointers to the corresponding child node and every leaf node contains the rectangles of spatial objects (the pointer to some spatial object can be there). For every rectangle in a node, it has to be decided if it overlaps the search rectangle or not. If yes, the corresponding child node has to be searched also. Searching is done like this in a recursive manner until all overlapping nodes have been traversed. When a leaf node is reached, bounding boxes (rectangles) are tested against the search rectangle and their objects (if there are any) are fetched for testing into result set whether they intersect the search rectangle.

[edit] Insertion

To insert an object, the tree is traversed from the root node. All rectangles in the current internal node are examined. The constraint of least coverage is employed to insert an object, i.e., the box that needs least enlargement to enclose the new object is selected. In the case there is more than one rectangle meets the first criterion, the one with the smallest area is chosen. Inserting continue recursively in chosen node. Once a leaf node is reached, a straightforward insertion is made if the leaf node is not full. The leaf node needs splitting if it is full, before the insertion is made. A few splitting algorithms have been proposed for good R-tree performance.

[edit] Bulk-loading

[edit] See Also

[edit] References

[edit] External links

In other languages