Log-structured merge-tree
Log-structured merge-tree | ||
---|---|---|
Type | Hybrid (two tree-like components) | |
Invented | 1996 | |
Invented by | Patrick O'Neil, Edward Cheng, Dieter Gawlick, Elizabeth O'Neil | |
Time complexity in big O notation | ||
Average | Worst case | |
Space | ||
Search | ||
Insert | ||
Delete |
In computer science, the Log-Structured Merge-Tree (or LSM-tree) is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. LSM-trees maintain data in two separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches.
The LSM-tree is a hybrid data structure. It is composed of two tree-like structures, known as the C0 and C1 components. C0 is smaller and entirely resident in memory, whereas C1 is resident on disk. New records are inserted into the memory-resident C0 component. If the insertion causes the C0 component to exceed a certain size threshold, a contiguous segment of entries is removed from C0 and merged into C1 on disk. The performance characteristics of LSM-trees stem for the fact that each component is tuned to the characteristics of its underlying storage medium, and that data is efficiently migrated across media in rolling batches, using an algorithm reminiscent of merge sort.
LSM-trees are used in database management systems such as LevelDB, SQLite4, RocksDB and Apache Cassandra.
References
- O'Neil, P.; Cheng, E.; Gawlick, D.; O'Neil, E. (1996). "The log-structured merge-tree (LSM-tree)". Acta Informatica 33 (4): 351. doi:10.1007/s002360050048.
- Li, Y.; He, B.; Luo, Q.; Yi, K. (2009). "Tree Indexing on Flash Disks". 2009 IEEE 25th International Conference on Data Engineering. p. 1303. doi:10.1109/ICDE.2009.226. ISBN 978-1-4244-3422-0.
|