B+ tree
From Wikipedia, the free encyclopedia
In computer science, a B+ tree is a type of tree, which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel index, with maximum and minimum bounds on the number of keys in each index segment (usually called a 'block' or 'node'). In a B+ tree, in contrast to a B-tree, all records are stored at the lowest level of the tree; only keys are stored in interior blocks.
The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context. Given a storage system with a block size of b, a B+ tree which stores a number of keys equal to a multiple of b will be very efficient when compared to a binary search tree (the corresponding data structure for non-block-oriented storage contexts).
The NTFS filesystem (for Microsoft Windows), ReiserFS filesystem (for Linux), XFS filesystem (for IRIX and Linux), and the JFS2 filesystem (for AIX, OS/2 and Linux) all use this type of tree for block indexing. Relational databases also often use this type of tree for table indices (in fact, file systems and relational database tables can be thought of as differently optimised variants on the same idea).
Contents |
[edit] Details
The maximum number of keys or records that can be stored in a block is called the order of the B+ tree; the minimum number is 1 / 2 of the maximum (if the order is odd, this will be rounded either up or down, consistently throughout the tree). Usually the order is denoted by the variable b. The minimum restriction is lifted for the root index, because the tree may contain too few entries to fully populate this index.
For example, if the order of a B+ tree is b, each node (except for the root) may have between (b / 2) + 1 and b keys; the root may have between 2 and b.
The number of keys that may be indexed using a B+ tree is a function of the order of the tree and its height.
[edit] Performance and Statistics
For a b-order B+ tree with h levels of index:
- The maximum number of records stored is n = bh
- The minimum number of keys is 2(b / 2)h − 1
- The space required to store the tree is O(n)
- Inserting a record requires O(logbn) operations in the worst case
- Finding a record requires O(logbn) operations in the worst case
- Removing a (previously located) record requires O(logbn) operations in the worst case
- Performing a range query requires O(logbn + k) operations in the worst case, where k is the number of elements occurring within the range.
[edit] Relationship to Other Data Structures
The B+ tree (and most other "B..." trees) are all specialisations of the (a,b)-tree, in which the minimum and maximum order is explicitly defined (the a and b, respectively).
The B+ tree is a variant of the B-tree, in which interior nodes store both keys and records; in this sense, the B+ tree is a specialisation of the B-tree.
The B# Tree is a specialisation of the B+ tree, which adds additional restrictions.
The binary search tree is a relative of the B+ tree; specifically, it is a B+ tree where b = 2.
The ReiserFS chief developer makes reference to dancing trees, which are apparently a variant on the B+ tree; however he does not provide enough detail to demonstrate the nature of the relationship.[citation needed]
[edit] Implementation Details
The leaves (the bottom-most index blocks) of the B+ tree are often linked to one another in a linked list; this makes range queries simpler and more efficient (though the aforementioned upper bound can be achieved even without this addition). This does not substantially increase space consumption or maintenance on the tree.
If a storage system has a block size of B bytes, and the keys to be stored have a size of k, arguably the most efficient B+ tree is one where b = (B / k) − 1. Although theoretically the one-off is unnecessary, in practise there is often a little extra space taken up by the index blocks (for example, the linked list references in the leaf blocks). Having an index block which is slightly larger than the storage system's actual block represents a significant performance decrease; therefore erring on the side of caution is preferable.
[edit] History
The B+ tree was first described in the paper "Rudolf Bayer, Edward M. McCreight: Organization and Maintenance of Large Ordered Indices. Acta Informatica 1: 173-189 (1972)".