Lightning Memory-Mapped Database
Original author(s) | Howard Chu |
---|---|
Developer(s) | Symas |
Development status | production |
Written in | C |
Operating system | Unix, Linux, Windows, AIX, Sun Solaris, SCO Unix, Mac OS, iOS |
Type | Embedded database |
License | OpenLDAP Public License |
Website | symas |
Lightning Memory-Mapped Database (LMDB) is a software library that provides a high-performance embedded transactional database in the form of a key-value store. LMDB is written in C, with API bindings for several programming languages. It stores arbitrary key/data pairs as byte arrays, has a range-based search capability, supports multiple data items for a single key, and has a special mode for appending records at the end of the database (MDB_APPEND), which gives a dramatic write performance increase over other similar stores.[1] LMDB is not a relational database; like Berkeley DB and other similar databases such as dbm, it is strictly a key-value store.
LMDB may also be used concurrently in a multi-threaded or multi-processing environment, with read performance scaling linearly by design. An LMDB database may have only one writer at a time; however, unlike in many similar key-value stores, write transactions do not block readers, nor do readers block writers. LMDB is also unusual in that multiple applications on the same system may simultaneously open and use the same LMDB store, as a means to scale up performance. LMDB also does not require a transaction log, because it maintains data integrity inherently by design; this improves write performance, since data need not be written twice.
History
LMDB's design was first discussed in a 2009 post to the OpenLDAP developer mailing list,[2] in the context of exploring solutions to the cache management difficulty caused by the project's dependence on Berkeley DB. A specific goal was to replace the multiple layers of configuration and caching inherent to Berkeley DB's design with a single, automatically managed cache under the control of the host operating system.
Development subsequently began, initially as a fork of a similar implementation from the OpenBSD ldapd project.[3] The first publicly available version appeared in the OpenLDAP source repository in June 2011.[4]
The project was known as MDB until November 2012, after which it was renamed in order to avoid conflicts with existing software.[5]
Technical Description
Internally, LMDB uses B+Tree data structures. The efficiency of its design and its small footprint had the unintended side effect of providing good write performance as well. LMDB has an API similar to those of Berkeley DB and dbm. LMDB treats the computer's memory as a single address space, shared across multiple processes or threads using shared memory with copy-on-write semantics (known historically as a single-level store). Most earlier computing architectures had 32-bit address spaces, which impose a hard limit of 4 GB on the size of any database mapped directly into a single-level store, so the effectiveness of the technique was strictly limited. Today's 64-bit processors, however, mostly implement 48-bit address spaces, giving access to 47-bit addresses or 128 terabytes of database size,[6] which makes databases using shared memory useful once again in real-world applications.
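The size limits quoted above follow directly from the number of usable address bits. A minimal arithmetic sketch (the helper name `addressable_bytes` is hypothetical and used only for illustration; the figures are binary units, i.e. GiB and TiB):

```python
def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses that fit in `bits` address bits."""
    return 2 ** bits


GIB = 2 ** 30  # bytes per gibibyte
TIB = 2 ** 40  # bytes per tebibyte

# A 32-bit address space caps a directly mapped database at 4 GiB.
limit_32_gib = addressable_bytes(32) // GIB

# 47 usable address bits allow a map of up to 128 TiB.
limit_47_tib = addressable_bytes(47) // TIB

print(limit_32_gib, "GiB")
print(limit_47_tib, "TiB")
```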
Specific noteworthy technical features of LMDB are:
- Its use of B+Tree. With an LMDB instance being in shared memory and the B+Tree block size being set to the OS page size, access to an LMDB store is extremely memory efficient.[7]
- New data is written without overwriting or moving existing data. This results in guaranteed data integrity and reliability without requiring transaction logs or cleanup services.
- The provision of a unique append-write mode (MDB_APPEND),[8] which is implemented by allowing the new record to be added directly to the end of the B+Tree. This reduces the number of page read and write operations, resulting in greatly increased performance, but makes the programmer responsible for ensuring key integrity (keys must be supplied in sorted order).
- Copy-on-write semantics help ensure data integrity as well as providing transactional guarantees and simultaneous access by readers without requiring any locking, even by the current writer. New memory pages required internally during data modifications are allocated through copy-on-write semantics by the underlying OS. The LMDB library itself never modifies older data being accessed by readers, because it cannot do so: any shared-memory update automatically creates a completely independent copy of the memory page being written to.
- As LMDB is memory-mapped, it can return direct pointers to the memory addresses of keys and values through its API, thereby avoiding unnecessary and expensive copying of memory. This results in greatly increased performance (especially when the stored values are extremely large) and expands the potential use cases for LMDB.
- LMDB also tracks unused memory pages, using a B+Tree to keep track of pages freed (no longer needed) during transactions. By tracking unused pages the need for garbage-collection (and a garbage collection phase which would consume CPU cycles) is completely avoided. Transactions which need new pages are first given pages from this unused free pages tree; only after these are used up will it expand into formerly unused areas of the underlying memory-mapped file. On a modern filesystem with sparse file support this helps minimise actual disk usage.
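The interplay of copy-on-write updates and free-page reuse described in the list above can be illustrated with a toy model. This is a sketch under strong simplifying assumptions, not LMDB's implementation: the whole "database" fits in one page, pages live in a Python list rather than a memory-mapped file, and readers pin pages with a reference count (the class `ToyCowStore` and all of its methods are hypothetical names).

```python
class ToyCowStore:
    """Toy single-page copy-on-write store with a free-page list."""

    def __init__(self):
        self.pages = [()]   # page number -> immutable tuple of (key, value)
        self.root = 0       # page holding the latest committed version
        self.free = []      # page numbers that may be reused
        self.pinned = {}    # page number -> number of readers using it

    def _alloc(self, data):
        # Prefer a freed page; grow the store only when none is available.
        if self.free:
            pgno = self.free.pop()
            self.pages[pgno] = data
        else:
            pgno = len(self.pages)
            self.pages.append(data)
        return pgno

    def begin_read(self):
        # A reader works on the version that was current when it started.
        pgno = self.root
        self.pinned[pgno] = self.pinned.get(pgno, 0) + 1
        return pgno

    def end_read(self, pgno):
        self.pinned[pgno] -= 1
        if self.pinned[pgno] == 0:
            del self.pinned[pgno]
            if pgno != self.root:
                self.free.append(pgno)  # last reader gone: page is reusable

    def get(self, pgno, key):
        return dict(self.pages[pgno]).get(key)

    def put(self, key, value):
        # Copy the current page, modify the copy, then switch the root.
        items = dict(self.pages[self.root])
        items[key] = value
        new_root = self._alloc(tuple(sorted(items.items())))
        old_root, self.root = self.root, new_root
        if old_root not in self.pinned:
            self.free.append(old_root)  # nobody reads the old version


store = ToyCowStore()
store.put("a", 1)
snapshot = store.begin_read()          # reader sees a == 1
store.put("a", 2)                      # writer does not touch the reader's page
old_value = store.get(snapshot, "a")
new_value = store.get(store.root, "a")
store.end_read(snapshot)
store.put("b", 3)                      # reuses the page the reader released
page_count = len(store.pages)
print(old_value, new_value, page_count)
```

In this run the reader keeps seeing its original value while the writer commits a new version, and three writes are served from only two pages because freed pages are handed out again before the store grows.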
Concurrency
LMDB allows multiple threads within multiple processes to coordinate simultaneous access to a database. Readers scale linearly by design. While write transactions are globally serialized via a mutex, read-only transactions operate in parallel, including in the presence of a write transaction, and are entirely wait-free except for the first read-only transaction on a thread. Each thread reading from a database gains ownership of an element in a shared memory array, which it may update to indicate when it is within a transaction. Writers scan the array to determine the oldest database version that must be preserved, without requiring direct synchronization with active readers.
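The reader array described above can be sketched as follows. This is an illustrative model only: the class `ToyReaderTable`, its methods, and the use of integer transaction ids are assumptions made for the example, and a plain Python list stands in for the shared memory array.

```python
from typing import List, Optional


class ToyReaderTable:
    """Toy reader table: one slot per reader thread."""

    def __init__(self, size: int) -> None:
        # None means the slot's owner is not inside a read transaction.
        self.slots: List[Optional[int]] = [None] * size
        self.claimed = 0

    def claim_slot(self) -> int:
        # One-time step per reader thread. In a real shared table this is
        # the only reader-side step that would need synchronization.
        if self.claimed == len(self.slots):
            raise RuntimeError("reader table is full")
        slot = self.claimed
        self.claimed += 1
        return slot

    def begin_read(self, slot: int, txn_id: int) -> None:
        # The owner publishes the version it is reading; no lock is taken.
        self.slots[slot] = txn_id

    def end_read(self, slot: int) -> None:
        self.slots[slot] = None

    def oldest_active(self, latest_txn_id: int) -> int:
        # Writer side: scan the slots for the oldest version still in use.
        active = [t for t in self.slots if t is not None]
        return min(active, default=latest_txn_id)


table = ToyReaderTable(size=4)
r1 = table.claim_slot()
r2 = table.claim_slot()
table.begin_read(r1, 7)
table.begin_read(r2, 9)
oldest_with_both = table.oldest_active(latest_txn_id=10)
table.end_read(r1)
oldest_after_r1 = table.oldest_active(latest_txn_id=10)
table.end_read(r2)
oldest_idle = table.oldest_active(latest_txn_id=10)
print(oldest_with_both, oldest_after_r1, oldest_idle)
```

A writer in this model would keep every page belonging to versions at or after the returned id and could reuse anything older.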
Performance
In 2011 Google published software which allowed users to generate micro-benchmarks comparing LevelDB's performance to SQLite and Kyoto Cabinet in different scenarios.[9] In 2012 Symas added support for LMDB and Berkeley DB and made the updated benchmarking software publicly available.[10] The resulting benchmarks showed that LMDB outperformed all other databases in read and batch write operations. SQLite with LMDB excelled on write operations, and particularly so on synchronous/transactional writes.
The benchmarks showed the underlying filesystem as having a big influence on performance. JFS with an external journal performs well, especially compared to other modern systems like Btrfs and ZFS.[11][12] Zimbra has tested back-mdb vs back-hdb performance in OpenLDAP, with LMDB clearly outperforming the BDB-based back-hdb.[13] Many other OpenLDAP users have observed similar benefits.[14]
Since the initial benchmarking work done in 2012, multiple follow-on tests have been conducted with additional database engines for both in-memory[15] and on-disk[16] workloads, characterizing the performance across multiple CPUs and record sizes. These tests show that LMDB performance is unmatched on all in-memory workloads, and excels in all disk-bound read workloads, as well as disk-bound write workloads using large record sizes. The benchmark driver code was subsequently published on GitHub[17] and further expanded in database coverage.
Reliability
LMDB was designed from the start to resist data loss in the face of system and application crashes. Its copy-on-write approach never overwrites currently-in-use data. Avoiding overwrites means the structure on disk/storage is always valid, so application or system crashes can never leave the database in a corrupted state. In its default mode, at worst a crash can lose data from the last not-yet-committed write transaction. Even with all asynchronous modes enabled, only a catastrophic OS failure or a hardware power-loss event, rather than a mere application crash, could potentially result in any data corruption. In an early 2014 study of crash behavior of a dozen software packages including LevelDB and Berkeley DB, LMDB was the only software that showed no data loss or corruption.[18]
Two academic papers presented at the USENIX OSDI symposium in 2014 also covered failure modes of database engines, including LMDB.[19][20] Both papers claim to identify failures in LMDB. However, follow-up technical discussions with the developers of LMDB showed the Pillai et al. paper to be mistaken.[21] The conclusion of the Zheng et al. paper depends on whether fsync or fdatasync is used; using fsync ameliorates the problem. The choice between fsync and fdatasync is a compile-time switch. fsync is not the default in current GNU/Linux builds of LMDB, but it is the default on Mac OS X, *BSD, Android, and Windows. Default GNU/Linux builds of LMDB are therefore the only ones vulnerable to the problem discovered by the Zheng et al. researchers; GNU/Linux users may simply rebuild LMDB to use fsync instead.[22]
License Issues
In June 2013, Oracle changed Berkeley DB's license from the Sleepycat license to the GNU Affero General Public License,[23] thus restricting its use in a wide variety of applications. This caused the Debian project to exclude Berkeley DB version 6.0 and later. The new license was also criticized as unfriendly to commercial redistributors. A discussion arose over whether the same re-licensing could happen to LMDB as well. Author Howard Chu made clear that LMDB is part of the OpenLDAP project, which had its BSD-style license before he joined, and that it will stay that way. Checking in code does not transfer copyright to anybody, which makes a move like Oracle's impossible.[24][25][26][27][28][29][30][31][32]
The Berkeley DB license issue has caused major GNU/Linux distributions such as Debian to completely phase out their use of Berkeley DB, with a preference for LMDB.[33]
API and Uses
There are wrappers for several programming languages, such as C++,[34][35] Python,[36][37] Lua,[38] Go,[39] Ruby,[40] Objective-C,[41] JavaScript,[42] C#,[43] Perl,[44] and PHP.[45]
A complete list of wrappers may be found on the main web site.[46]
Howard Chu ported SQLite 3.7.7.1 to use LMDB instead of its original B-tree code, calling the end result SQLightning.[47] One cited insert test of 1000 records was 20 times faster than the original SQLite with its B-tree implementation.[48] LMDB is available as a backing store for other open source projects including Cyrus SASL,[49] Heimdal Kerberos,[50] and OpenDKIM.[51] It is also available in some other NoSQL projects such as MemcacheDB[52] and Mapkeeper.[53] LMDB was also used to make the in-memory store Redis persist data on disk. The existing back-end in Redis showed pathological behaviour in rare cases, and a replacement was sought. LMDB's API was criticized as baroque, forcing a lot of coding to get simple things done; however, its performance and reliability during testing were considerably better than those of the alternative back-end stores that were tried.[54]
An independent third-party software developer used the Python bindings to LMDB[55] in a high-performance environment and published, on the technical news site Slashdot, how the system managed to sustain 200,000 simultaneous read, write and delete operations per second (a total of 600,000 database operations per second).[56][57]
An up-to-date list of applications using LMDB is maintained on the main web site.[58]
Application Support
Many popular free software projects distribute or include support for LMDB, often as the primary or sole storage mechanism.
- The Debian,[59] Ubuntu,[60] Fedora,[61] and OpenSuSE[62] operating systems.
- OpenLDAP, for which LMDB was originally developed via back-mdb.[63]
- Postfix via the lmdb_table adapter.[64]
- PowerDNS, the DNS server used by Wikimedia.
- CFEngine uses LMDB by default since version 3.6.0.[65]
- Shopify use LMDB in their SkyDB system.[66]
- Caffe is a deep learning framework, in active development by the Berkeley Vision and Learning Center (BVLC) and by community contributors.
- FineDB is a NoSQL server built on LMDB, which compares itself, amongst others, with Redis, CouchBase, MongoDB.[67]
- Extenium, an ACID-compliant NoSQL database, is implemented on top of LMDB.
- The Apache module mod_authn_lmdb[68] provides Apache user authentication using an LMDB database as a credential store.
- The GNU Guile generic graph database sph-dg builds upon LMDB.
- InfluxDB, a time series, events, and metrics database.
- UrBackup employs LMDB as a cache backend.
Technical reviews of LMDB
LMDB makes unusual (novel) use of well-known computer science techniques such as copy-on-write semantics and B+Trees to provide atomicity and reliability guarantees, together with performance that can be hard to accept given the library's relative simplicity and the fact that no other similar key-value store offers the same guarantees or overall performance, even though the authors explicitly state in presentations that LMDB is read-optimised, not write-optimised. Additionally, as LMDB was primarily developed for use in OpenLDAP, its developers are focussed mainly on the development and maintenance of OpenLDAP, not on LMDB per se. Because the developers spent limited time presenting the first benchmark results, the presentation was criticized for not stating limitations and for giving a "silver bullet impression" not adequate to address an engineer's attitude.[69] The concerns raised were later addressed to the reviewer's satisfaction by the key developer behind LMDB.[70][71]
The presentation did spark other database developers to dissect the code in depth to understand how and why it works. Reviews range from brief[72] to in-depth. The RavenDB author wrote a 12-part series of articles on his analysis of LMDB, beginning July 9, 2013. His conclusion was along the lines of "impressive codebase ... dearly needs some love", mainly because of overly long methods and code duplication.[73] This review, conducted by a .NET developer with no former experience of C, concluded on August 22, 2013 with: "beyond my issues with the code, the implementation is really quite brilliant. The way LMDB manages to pack so much functionality by not doing things is quite impressive... I learned quite a lot from the project, and it has been frustrating, annoying and fascinating experience".[74]
Multiple other reviews cover LMDB[75] in various languages including Chinese.[76][77]
References
- ↑ LMDB Reference Guide. Retrieved on 2014-10-19
- ↑ back-mdb - futures. Retrieved on 2014-10-19
- ↑ MDB: A Memory-Mapped Database and Backend for OpenLDAP. Retrieved 2014-10-19
- ↑ First public version of MDB source code. Retrieved 2014-10-19
- ↑ MDB renamed to LMDB. Retrieved 2014-10-19
- ↑ Chu, Howard (2011). MDB: A Memory-Mapped Database and Backend for OpenLDAP (PDF). LDAPCon.
- ↑ B+Tree#Implementation
- ↑ LMDB Reference Guide. Retrieved on 2014-10-19
- ↑ "LevelDB Benchmarks". http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html. Google, Inc.
- ↑ Chu, Howard. "Database Microbenchmarks". Symas Corp. Retrieved 8 August 2014.
- ↑ "MDB Microbenchmarks". Symas Corp., 2012-09
- ↑ Database Microbenchmarks, Symas Corp., 2012-07.
- ↑ "OpenLDAP MDB vs HDB performance". Zimbra, Inc.
- ↑ OpenLDAP LMDB vs BDB
- ↑ Chu, Howard. "In-Memory Microbenchmark". Symas Corp.
- ↑ Chu, Howard. "On-Disk Microbenchmark". Symas Corp.
- ↑ "Benchmark Drivers". https://github.com/hyc/leveldb/tree/benches/doc/bench.
- ↑ Application-Level Crash Vulnerabilities (PDF). University of Wisconsin.
- ↑ "Usenix 2014, All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications".
- ↑ "Usenix 2014, Torturing Databases for Fun and Profit".
- ↑ "Archive of discussion regarding the Usenix 2014 pillai paper".
- ↑ "LMDB Crash consistency discussion".
- ↑ "Berkeley DB Release Announcement". Oracle.
- ↑ "Berkeley DB 6.0 license change to AGPLv3".
- ↑ "Oracle switches Berkeley DB license". InfoWorld.
- ↑ "Oracle Quietly Switches BerkeleyDB to AGPL". Slashdot.
- ↑ Programmers in Ukraine
- ↑ "Oracle passe Berkeley DB sous licence GNU AGPL". Le Monde Informatique.
- ↑ abclinuxu.cz
- ↑ "Debian, Berkeley DB, and AGPLv3". LWN.net.
- ↑ "Berkeley DB 6.0 license change to AGPLv3". LWN.net.
- ↑ "Re: Berkeley DB 6.0 license change to AGPLv3". LWN.net.
- ↑ Surý, Ondřej (June 19, 2014). "New project goal: Get rid of Berkeley DB (post jessie)". debian-devel (Mailing list).
- ↑ LMDB C++11 wrapper, 2015-04
- ↑ LMDB C++ wrapper, 2012-11.
- ↑ LMDB Python wrapper, 2013-02
- ↑ py-lmdb. Retrieved on 2014-10-20.
- ↑ LMDB Lua wrapper, 2013-04.
- ↑ LMDB Go wrapper, 2013-04
- ↑ LMDB Ruby wrapper, 2013-02
- ↑ LMDB Objective-C wrapper, 2013-04
- ↑ LMDB Node.js wrapper, 2013-05
- ↑ LMDB .Net wrapper, 2013-06
- ↑ LMDB Perl wrapper, 2013-08
- ↑ LMDB PHP wrapper, 2015-04
- ↑ "List of API wrappers for LMDB".
- ↑ SQLightning: a port of SQLite to use LMDB
- ↑ SQLightning tests.
- ↑ Cyrus SASL
- ↑ Heimdal Kerberos
- ↑ OpenDKIM
- ↑ LMDB in MemcacheDB
- ↑ Mapkeeper
- ↑ "Second Strike With Lightning". Anchor.
- ↑ "Python bindings to LMDB".
- ↑ "Python-LMDB in a high-performance environment on Slashdot".
- ↑ "Open letter to Howard Chu and David Wilson regarding Python-LMDB".
- ↑ "List of projects using LMDB".
- ↑ liblmdb0 in Debian Jessie. Retrieved 2014-10-20.
- ↑ lmdb in Ubuntu 14.04 LTS
- ↑ LMDB in Fedora 20. Retrieved 2014-10-20.
- ↑ lmdb in OpenSUSE. Retrieved 2014-10-20.
- ↑ OpenLDAP back-mdb. Retrieved 2014-10-20
- ↑ Postfix lmdb_table(5). Retrieved 2014-10-20
- ↑ What's new in CFEngine 3.6
- ↑ SkyDB on LMDB
- ↑ FineDB Comparison and Performance
- ↑ "Updating custom apache modules".
- ↑ "LMDB: The Leveldb Killer?".
- ↑ "Response to LMDB review".
- ↑ LMDB: The Leveldb Killer?. Retrieved 2014-10-20.
- ↑ "Lightning Memory-Mapped Database".
- ↑ "Reviewing Lightning memory-mapped database library: Partial".
- ↑ "Some final notes about LMDB review".
- ↑ "LMDB". Sampath Herga.
- ↑ "lmdb".
- ↑ lmdb