FLAIM Database Engine

From Wikipedia, the free encyclopedia

FLAIM
OS: Cross-platform
Use: Development Library
License: GPL
Website: FLAIM

Overview

FLAIM is an embeddable database technology developed by Novell and released as an open source project in 2006. "FLAIM" is an acronym for FLexible Adaptable Information Management, terms that describe the fundamental design goals of the technology. Although FLAIM provides many traditional database features (e.g., transactions, recovery, reliability, scalability), it was conceived with a broader view toward the flexibility and adaptability offered by an XML data model. FLAIM predates its open source release: products such as Novell's GroupWise and eDirectory have used it for over 15 years, with user licenses totaling well into the hundreds of millions.

FLAIM is similar to other embeddable database engines such as SQLite and Sleepycat/Oracle's Berkeley DB. To access the functionality offered by FLAIM, an application merely needs to link against either a static or dynamic version of the FLAIM library. FLAIM has been ported to a wide variety of 32- and 64-bit platforms, including SUSE Linux Enterprise Server, OpenSUSE, NetWare, Microsoft Windows, Fedora Core, Ubuntu Linux, Sun Solaris, AIX, Mac OS X, and HP-UX.

FLAIM Features

Transactions

  • Transaction begin, commit, and abort. A rollback log is used both to abort transactions and to recover after a crash.
  • Transaction types:
    • Update. Update, read, and query operations are allowed.
    • Read. Only read and query operations are allowed. A read transaction provides a read-consistent snapshot of the database as of the moment the transaction starts.
    • Automatic. A single update operation can be told to begin and end (commit or abort) its own transaction automatically if no transaction has been explicitly started.
  • Automatic rollback of failed transactions (due to application failures or CPU failures).
  • Periodic checkpoints to minimize recovery time after a system crash.
  • No limit on size of update transactions.
  • ACID principles supported: Atomicity, Consistency, Isolation, Durability.
  • Group commit allows multiple update transactions to be committed to disk at once, improving update performance.
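Group commit can be illustrated with a minimal, single-threaded sketch. The `GroupCommitLog` class and its methods are hypothetical and not part of FLAIM's API. The vector of strings stands in for the roll-forward log file, and a real implementation would also make each committing thread wait until the flush covering its record has completed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical group-commit log (not FLAIM's API). Committing transactions
// append their commit records to a shared buffer, and one flush() makes all
// of them durable with a single simulated disk write.
class GroupCommitLog {
public:
    // Queue a transaction's commit record; it is not durable until flush().
    void commit(const std::string& record) { pending_.push_back(record); }

    // Write every pending record in one I/O; returns how many transactions
    // became durable.
    std::size_t flush() {
        if (pending_.empty()) return 0;
        std::string buffer;
        for (const std::string& r : pending_) buffer += r;
        writes_.push_back(buffer);  // stands in for one write() plus fsync()
        std::size_t n = pending_.size();
        pending_.clear();
        return n;
    }

    std::size_t diskWrites() const { return writes_.size(); }
    const std::string& lastWrite() const { return writes_.back(); }

private:
    std::vector<std::string> pending_;
    std::vector<std::string> writes_;
};
```

Three commits followed by one flush cost one simulated disk write instead of three, which is where the performance gain comes from.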

Roll-forward Logging

  • A roll-forward log minimizes the data that must be written to commit a transaction.
  • The roll-forward log is used in automatic recovery after a crash: transactions committed since the last checkpoint are redone.
  • Multiple roll-forward log files may be used to support continuous backup. Files are numbered sequentially and also carry serial numbers, which guarantee proper sequencing and prevent spoofing. Up to 4 billion log files are supported, so capacity is practically unlimited.
  • Option to use only a single roll-forward log file, for applications that do not need continuous backup.
  • Roll-forward log files may be stored on a separate disk from the rest of the database.
  • Minimal transaction logging: only deltas are logged for record modifications, and only DRNs (data record numbers) are logged for record deletions.
  • Aborted transactions can be logged for debugging purposes, but by default they are not logged.
  • Support for logging application data.
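The idea of logging only deltas for record modifications can be sketched as follows, assuming a record is an opaque byte string and a delta is a single changed span found by trimming the common prefix and suffix. `Delta`, `makeDelta`, and `applyDelta` are hypothetical names; the article does not describe FLAIM's actual delta format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical delta: replace `oldLen` bytes at `offset` with `bytes`.
struct Delta {
    std::size_t offset;
    std::size_t oldLen;
    std::string bytes;
};

// Find the single changed span by trimming the common prefix and suffix.
Delta makeDelta(const std::string& before, const std::string& after) {
    std::size_t prefix = 0;
    const std::size_t maxPrefix = std::min(before.size(), after.size());
    while (prefix < maxPrefix && before[prefix] == after[prefix]) ++prefix;

    std::size_t suffix = 0;  // must not overlap the prefix in either string
    while (suffix < before.size() - prefix && suffix < after.size() - prefix &&
           before[before.size() - 1 - suffix] == after[after.size() - 1 - suffix])
        ++suffix;

    return Delta{prefix, before.size() - prefix - suffix,
                 after.substr(prefix, after.size() - prefix - suffix)};
}

// Redo: apply a logged delta to the before-image of the record.
std::string applyDelta(const std::string& before, const Delta& d) {
    assert(d.offset + d.oldLen <= before.size());
    std::string result = before;
    result.replace(d.offset, d.oldLen, d.bytes);
    return result;
}
```

Changing `age=30` to `age=31` inside a longer record produces a delta of one replaced byte, so the log entry stays small regardless of record size.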

Database Reliability and Recovery

  • Automatic database recovery after a system crash. The rollback log is used to roll the database back to the last consistent checkpointed state; the roll-forward log is then used to redo transactions that were committed after the last checkpoint.
  • Recovery is idempotent: if a crash occurs during recovery, recovery resumes when the database is next opened.
  • Reliability has been tested with an automated pull-the-plug test, which randomly cycles the power on the server during high-volume updates to exercise database recovery. Thousands of pull-the-plug iterations have been performed.
  • Handling of disk-full conditions and other disk errors. The database attempts to stall new update transactions until a disk-full condition is resolved, without requiring a shutdown.
  • Protection against media failure. If customers take hot backups and keep roll-forward logs on a different volume from the database, two simultaneous disk failures would be required to lose any data.
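One common way to make redo idempotent is to stamp each record with the sequence number of the last log entry applied to it and skip entries that are not newer. The sketch below assumes the database has already been rolled back to the checkpoint and shows only the redo pass. `Record`, `LogEntry`, and `recover` are hypothetical, and this is a standard technique rather than a description of FLAIM's internals.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical record image stamped with the last log sequence number (LSN)
// applied to it.
struct Record {
    std::string value;
    std::uint64_t lsn = 0;
};

// Hypothetical roll-forward log entry; not FLAIM's on-disk format.
struct LogEntry {
    std::uint64_t lsn;    // position in the log, strictly increasing
    std::uint64_t txnId;
    bool isCommit;        // true: commit marker for txnId
    std::string key;      // record changed by the transaction
    std::string value;    // new record value
};

// Redo pass: reapply changes of committed transactions. `db` is assumed to
// be at (or past) the checkpoint state already.
void recover(std::map<std::string, Record>& db, const std::vector<LogEntry>& rfl) {
    std::set<std::uint64_t> committed;
    for (const LogEntry& e : rfl)
        if (e.isCommit) committed.insert(e.txnId);

    for (const LogEntry& e : rfl) {
        if (e.isCommit || committed.count(e.txnId) == 0) continue;  // skip uncommitted work
        Record& r = db[e.key];
        if (r.lsn >= e.lsn) continue;  // already applied, so rerunning is harmless
        r.value = e.value;
        r.lsn = e.lsn;
    }
}
```

Because already-applied entries are skipped, a crash partway through recovery simply means the next run redoes whatever is left.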

Checksumming

  • Block checksum set on all blocks in the database when writing to disk.
  • Block checksum verified when reading blocks from disk.
  • Checksum used to automatically detect corruption.
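The write-then-verify cycle for block checksums might look like this. The block layout, the function names, and the choice of FNV-1a as the checksum are assumptions for illustration; the article does not say which algorithm FLAIM uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBlockSize = 4096;  // FLAIM supports 4 KB or 8 KB blocks

// Hypothetical block layout: payload plus a stored checksum.
struct Block {
    std::array<std::uint8_t, kBlockSize> data{};
    std::uint32_t checksum = 0;
};

// FNV-1a over the payload; chosen for illustration only, not FLAIM's algorithm.
std::uint32_t computeChecksum(const Block& b) {
    std::uint32_t h = 2166136261u;
    for (std::uint8_t byte : b.data) {
        h ^= byte;
        h *= 16777619u;
    }
    return h;
}

// Called just before the block is written to disk.
void sealForWrite(Block& b) { b.checksum = computeChecksum(b); }

// Called just after the block is read from disk; false means corruption.
bool verifyAfterRead(const Block& b) { return b.checksum == computeChecksum(b); }
```

Each FNV-1a step is a bijection on the running hash for a given input byte, so changing any single payload byte always changes the final value and a flipped bit is always caught.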

Concurrency

  • One writer, multiple readers.
  • Readers don't block writers (they never lock items in the database).
  • Writers don't block readers.
  • Read consistency for readers: each reader gets a stable, consistent snapshot of the database. The rollback log is used to provide block multi-versioning.
  • Uncommitted data is not visible to other transactions.
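Read consistency through block multi-versioning can be sketched with an in-memory store that keeps older block images, playing the role the rollback log plays in FLAIM. A reader's snapshot is the last committed transaction number when it starts, and it always sees the newest image committed at or before that point. `VersionedStore` and its methods are hypothetical, and reclamation of old versions is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical multi-version block store illustrating read consistency.
// Old block images play the role FLAIM's rollback log plays.
class VersionedStore {
public:
    using TxnId = std::uint64_t;

    // A reader's snapshot is simply the last committed transaction ID.
    TxnId beginRead() const { return lastCommitted_; }

    // Return the newest version committed at or before the snapshot.
    // Assumes the block exists and has a version visible to the snapshot.
    std::string read(std::uint32_t blockId, TxnId snapshot) const {
        const auto& versions = blocks_.at(blockId);
        for (auto it = versions.rbegin(); it != versions.rend(); ++it)
            if (it->first <= snapshot) return it->second;
        assert(false && "no version visible to this snapshot");
        return {};
    }

    // The single writer publishes all its block images at commit time, so
    // uncommitted data is never visible to readers.
    TxnId commit(const std::map<std::uint32_t, std::string>& writes) {
        TxnId id = ++lastCommitted_;
        for (const auto& [blockId, image] : writes)
            blocks_[blockId].emplace_back(id, image);  // older images are kept
        return id;
    }

private:
    TxnId lastCommitted_ = 0;
    std::map<std::uint32_t, std::vector<std::pair<TxnId, std::string>>> blocks_;
};
```

Neither side takes a lock: a reader that started before a commit keeps seeing the old image, while readers that start afterward see the new one.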

Fields and Records

  • Variable-length fields. Text and binary fields up to 4 GB per field.
  • All fields are tagged, so each record is self-describing: there is no schema for record structure, because the structure is embedded in each record, much as in XML.
  • Nested sub-records, N-levels deep.
  • Repeating fields and repeating sub-records.
  • No storage used for omitted fields or to pad text fields to a fixed length.
  • Unregistered fields (can store fields that are not defined in the dictionary).
  • Data types: text, numeric, binary, context, blob.
  • Text type: Unicode.
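A self-describing record can be sketched as a tree of tagged fields serialized in tag-length-value form. The `Field` structure, the byte layout, and the numeric tags are hypothetical and do not reflect FLAIM's storage format; the point is that the structure travels with the data, repeated tags are allowed, and absent fields occupy no bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical self-describing field: a tag, a value, and nested sub-fields.
struct Field {
    std::uint16_t tag;
    std::string value;
    std::vector<Field> children;
};

namespace {
void putU16(std::string& out, std::uint16_t v) {
    out.push_back(static_cast<char>(v & 0xFF));
    out.push_back(static_cast<char>(v >> 8));
}
void putU32(std::string& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}
// Read a little-endian unsigned integer of `bytes` bytes and advance `pos`.
std::uint32_t getUint(const std::string& in, std::size_t& pos, int bytes) {
    assert(pos + bytes <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < bytes; ++i)
        v |= static_cast<std::uint32_t>(static_cast<unsigned char>(in[pos++])) << (8 * i);
    return v;
}
}  // namespace

// Layout: tag (2 bytes), value length (4), value, child count (2), children.
// Only fields that are present get written, so omitted fields cost nothing.
void encode(const Field& f, std::string& out) {
    putU16(out, f.tag);
    putU32(out, static_cast<std::uint32_t>(f.value.size()));
    out += f.value;
    putU16(out, static_cast<std::uint16_t>(f.children.size()));
    for (const Field& c : f.children) encode(c, out);
}

Field decode(const std::string& in, std::size_t& pos) {
    Field f;
    f.tag = static_cast<std::uint16_t>(getUint(in, pos, 2));
    std::uint32_t len = getUint(in, pos, 4);
    assert(pos + len <= in.size());
    f.value = in.substr(pos, len);
    pos += len;
    std::uint32_t count = getUint(in, pos, 2);
    for (std::uint32_t i = 0; i < count; ++i) f.children.push_back(decode(in, pos));
    return f;
}
```

For example, a person record might use tag 1 for the record, 2 for a name, 3 for a phone number (repeated twice), and 4 for an address sub-record holding a city field with tag 5; these tag numbers are made up.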

Containers

  • Allow an application to partition data records physically and/or logically.
  • Multiple containers per database.
  • Multiple record types per container.

Indexing

  • Compound indexes; component fields may be of any FLAIM data type except BLOB.
  • Optional and/or required fields in compound indexes (key not generated if required field missing).
  • Existence indexes (indexes the presence of a field versus the field’s content).
  • Case-insensitive and case-sensitive collation.
  • Case-insensitive collation with case preserved (post indexes).
  • White space compression, other special indexing rules.
  • Cross-record type indexes.
  • Counter indexes.
  • Sub-string indexing.
  • Each-word indexing.
  • Unique indexes.
  • Support for many international languages and collating sequences, including Arabic, Hebrew, Asian (Japanese, Korean, Chinese), etc.
  • Each index in a database can have its own international language.
  • Fast updating of large reference sets.
  • Keys up to 640 bytes long, key truncation supported.
  • Multiple indexes per container and/or per record type.
  • Left-end compression of index keys.
  • Compression of index reference sets.
  • APIs for reading of indexes directly (keys and references).
  • Dynamically updated when records are added, modified, or deleted.
  • Background indexing threads.
  • Indexing can be suspended and resumed, and indexes can be taken “offline.”
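Left-end compression stores each key as the number of leading bytes it shares with the previous key plus the remaining bytes. The sketch below assumes keys are processed in sorted order, as within an index block; `CompressedKey`, `compressKeys`, and `expandKeys` are hypothetical names, not FLAIM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical compressed key: bytes shared with the previous key, plus the rest.
struct CompressedKey {
    std::size_t sharedPrefix;
    std::string suffix;
};

// Keys are assumed to arrive in sorted order, as they do within an index block.
std::vector<CompressedKey> compressKeys(const std::vector<std::string>& keys) {
    std::vector<CompressedKey> out;
    std::string prev;
    for (const std::string& k : keys) {
        std::size_t n = 0;
        while (n < prev.size() && n < k.size() && prev[n] == k[n]) ++n;
        out.push_back({n, k.substr(n)});
        prev = k;
    }
    return out;
}

// Rebuild full keys by reusing the prefix of the previously expanded key.
std::vector<std::string> expandKeys(const std::vector<CompressedKey>& in) {
    std::vector<std::string> out;
    std::string prev;
    for (const CompressedKey& c : in) {
        assert(c.sharedPrefix <= prev.size());
        prev = prev.substr(0, c.sharedPrefix) + c.suffix;
        out.push_back(prev);
    }
    return out;
}
```

Sorted keys such as `smith`, `smithers`, and `smithson` share long prefixes, so only `ers` and `son` need to be stored for the second and third keys.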

Dynamic Dictionary

  • Add, modify, drop indexes, containers, field definitions.
  • Comment fields allowed in all dictionary records.

Query Capabilities

  • Rich set of query expression operators:
    • Comparison operators (equal, not equal, less than, less than or equal, greater than, greater than or equal, match, match begin, contains, match end). Text comparison operators include wildcard matching.
    • Arithmetic operators (unary minus, multiply, divide, mod, plus, minus).
    • Logical operators (not, and, or).
    • Parentheses (used to alter normal operator precedence).
  • A simple, powerful mechanism for building query expressions programmatically:
    • Expression does not have to be passed in as a string.
    • Allows program to add operators, operands, and parentheses to the expression in infix order.
    • Allows program to easily use program variables which contain comparison values or field names.
    • Allows use of values that are not easily formatted into a string (such as binary).
  • Advanced query optimization: FLAIM automatically selects indexes, etc., based on cost estimation.
  • Index specification: an application may specify an index instead of letting FLAIM choose one.
  • Embedded application-defined predicate callbacks.
  • Powerful navigational calls for retrieving and browsing through query results (retrieve first, last, next, previous, and current records). Only records which satisfy query expression are retrieved.
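Building an expression in infix order without a query string can be illustrated with a small builder that converts the calls to postfix using Dijkstra's shunting-yard algorithm and then evaluates the result against a record. `QueryBuilder`, `Op`, and the integer-only field values are assumptions for this sketch, which covers only binary operators and parentheses; it is not FLAIM's query API and performs no index selection.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stack>
#include <string>
#include <variant>
#include <vector>

// Hypothetical operator set and builder; not FLAIM's query API.
enum class Op { Mul, Add, Sub, Eq, Lt, Gt, And, Or, LParen };

class QueryBuilder {
public:
    using Record = std::map<std::string, std::int64_t>;

    QueryBuilder& field(const std::string& name) { out_.push_back(name); return *this; }
    QueryBuilder& value(std::int64_t v) { out_.push_back(v); return *this; }
    QueryBuilder& open() { ops_.push(Op::LParen); return *this; }

    // Shunting-yard step: emit pending operators of equal or higher precedence.
    QueryBuilder& op(Op o) {
        while (!ops_.empty() && ops_.top() != Op::LParen && prec(ops_.top()) >= prec(o)) {
            out_.push_back(ops_.top());
            ops_.pop();
        }
        ops_.push(o);
        return *this;
    }

    QueryBuilder& close() {
        while (!ops_.empty() && ops_.top() != Op::LParen) {
            out_.push_back(ops_.top());
            ops_.pop();
        }
        assert(!ops_.empty() && "unbalanced parenthesis");
        ops_.pop();  // discard the matching "("
        return *this;
    }

    // Evaluate the postfix expression against one record; non-zero means match.
    std::int64_t evaluate(const Record& rec) {
        while (!ops_.empty()) {  // flush operators still pending
            out_.push_back(ops_.top());
            ops_.pop();
        }
        std::vector<std::int64_t> st;
        for (const Token& t : out_) {
            if (const auto* v = std::get_if<std::int64_t>(&t)) {
                st.push_back(*v);
            } else if (const auto* f = std::get_if<std::string>(&t)) {
                st.push_back(rec.at(*f));  // field value taken from the record
            } else {
                assert(st.size() >= 2);
                std::int64_t b = st.back(); st.pop_back();
                std::int64_t a = st.back(); st.pop_back();
                st.push_back(apply(std::get<Op>(t), a, b));
            }
        }
        assert(st.size() == 1);
        return st.back();
    }

private:
    using Token = std::variant<std::int64_t, std::string, Op>;

    static int prec(Op o) {
        switch (o) {
            case Op::Mul: return 5;
            case Op::Add: case Op::Sub: return 4;
            case Op::Eq: case Op::Lt: case Op::Gt: return 3;
            case Op::And: return 2;
            case Op::Or: return 1;
            default: return 0;
        }
    }

    static std::int64_t apply(Op o, std::int64_t a, std::int64_t b) {
        switch (o) {
            case Op::Mul: return a * b;
            case Op::Add: return a + b;
            case Op::Sub: return a - b;
            case Op::Eq: return a == b;
            case Op::Lt: return a < b;
            case Op::Gt: return a > b;
            case Op::And: return a && b;
            case Op::Or: return a || b;
            default: assert(false && "not a binary operator"); return 0;
        }
    }

    std::vector<Token> out_;
    std::stack<Op> ops_;
};
```

For example, `age > 30 AND (dept == 7 OR salary * 2 > 100)` is built by chaining `field("age").op(Op::Gt).value(30).op(Op::And).open()...close()`, and each comparison value can come straight from a program variable rather than from formatted text.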

Read and Update Operations

  • Reading data records directly from containers (including dictionary container).
  • Reading of indexes directly (keys and references).
  • Advanced querying capabilities.
  • Navigating forward and backward through containers and indexes.
  • Update operations: add, modify, and delete (including dictionary records).

Caching

  • Block cache, shared by all threads in a process: up to 4 GB on 32-bit machines, much more on 64-bit machines.
  • Record cache.
  • Cache poisoning prevention.
  • Cache statistics available: hits, faults, hit looks, fault looks.
  • Memory fragmentation prevention. A background thread continually moves cached items to eliminate fragmentation.
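A block cache that reports hits and faults can be sketched as a least-recently-used (LRU) map. The LRU policy, the `BlockCache` class, and the loader callback are assumptions for illustration; the article does not describe FLAIM's replacement policy or its poisoning-prevention and defragmentation logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical LRU block cache with hit/fault counters.
class BlockCache {
public:
    using Loader = std::function<std::string(std::uint32_t)>;

    BlockCache(std::size_t capacity, Loader loadFromDisk)
        : capacity_(capacity), load_(std::move(loadFromDisk)) { assert(capacity_ > 0); }

    const std::string& get(std::uint32_t blockId) {
        auto it = index_.find(blockId);
        if (it != index_.end()) {
            ++hits_;
            lru_.splice(lru_.begin(), lru_, it->second);  // mark most recently used
            return it->second->second;
        }
        ++faults_;
        if (lru_.size() == capacity_) {  // evict the least recently used block
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        lru_.emplace_front(blockId, load_(blockId));
        index_[blockId] = lru_.begin();
        return lru_.front().second;
    }

    std::uint64_t hits() const { return hits_; }
    std::uint64_t faults() const { return faults_; }

private:
    using Entry = std::pair<std::uint32_t, std::string>;
    std::size_t capacity_;
    Loader load_;
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<std::uint32_t, std::list<Entry>::iterator> index_;
    std::uint64_t hits_ = 0;
    std::uint64_t faults_ = 0;
};
```

The list keeps recency order and the hash map gives constant-time lookup, so both a hit and a fault cost O(1).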

Optimized Disk Reading/Writing

  • Direct I/O, bypassing the file system cache.
  • Asynchronous writes.
  • Sorting of blocks to minimize disk head movement. FLAIM also attempts to coalesce adjacent dirty blocks into larger write buffers, and will fill a write buffer with non-dirty blocks already in cache if that results in a more efficient write.
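The sort-and-coalesce step can be sketched as a write planner that turns a set of dirty block numbers into contiguous runs, bridging short gaps when the blocks in the gap are already cached so their clean copies can be written in the same I/O. `WriteRun`, `planWrites`, and the `maxGap` limit are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// A contiguous range of blocks written with one I/O: [first, first + count).
struct WriteRun {
    std::uint32_t first;
    std::uint32_t count;
};

// Hypothetical planner: sort dirty blocks by address, merge adjacent ones,
// and bridge gaps of up to `maxGap` blocks whose clean copies are cached.
std::vector<WriteRun> planWrites(std::vector<std::uint32_t> dirty,
                                 const std::set<std::uint32_t>& cachedClean,
                                 std::uint32_t maxGap) {
    std::sort(dirty.begin(), dirty.end());
    dirty.erase(std::unique(dirty.begin(), dirty.end()), dirty.end());

    std::vector<WriteRun> runs;
    for (std::uint32_t b : dirty) {
        if (!runs.empty()) {
            WriteRun& last = runs.back();
            std::uint32_t end = last.first + last.count;  // first block past the run
            bool merge = (b - end) <= maxGap;
            // Only bridge the gap if every block in it has a clean cached copy.
            for (std::uint32_t g = end; merge && g < b; ++g)
                merge = cachedClean.count(g) > 0;
            if (merge) {
                last.count = b - last.first + 1;  // extend the run through b
                continue;
            }
        }
        runs.push_back({b, 1});
    }
    return runs;
}
```

With dirty blocks 3, 4, 7, and 10, clean cached blocks 5 and 6, and a `maxGap` of 2, the planner issues one write for blocks 3 through 7 and a separate write for block 10, since blocks 8 and 9 are not in cache.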

Database Validation and Repair

  • Routine for checking the physical structure of the database. It verifies links between blocks, B-tree structure, block checksums, field and record structures, index keys and reference sets, and data in fields.
  • Routine for checking indexes. It ensures that every key that ought to be in an index is in fact there, and that the index contains no extra keys. Index problems can be repaired in-line during checking: extra keys are deleted and missing keys are added automatically.
  • Routine for repairing a database. It can rebuild from a completely corrupted file, or even a zero-length file.
  • Callback facility in all functions to report progress. Allows application to display progress and cancel out if desired. Corruptions are also reported via the callback so that an application can create a detailed log of corruptions found if desired.
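The index check can be sketched as regenerating the keys implied by the records and comparing them with the keys actually present. This sketch assumes one indexed text field per record and represents an index entry as a (key, record number) pair; `Entry`, `IndexCheckResult`, and `checkIndex` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical index entry: (key, record number).
using Entry = std::pair<std::string, std::uint32_t>;

struct IndexCheckResult {
    std::set<Entry> missing;  // keys the records imply but the index lacks
    std::set<Entry> extra;    // keys in the index that no record implies
};

// Regenerate the expected keys from the records and compare with the index.
// When `repair` is true, add missing keys and delete extra ones in place.
IndexCheckResult checkIndex(const std::map<std::uint32_t, std::string>& records,
                            std::set<Entry>& indexEntries, bool repair) {
    std::set<Entry> expected;
    for (const auto& [recNum, fieldValue] : records)
        expected.insert({fieldValue, recNum});

    IndexCheckResult r;
    for (const Entry& e : expected)
        if (!indexEntries.count(e)) r.missing.insert(e);
    for (const Entry& e : indexEntries)
        if (!expected.count(e)) r.extra.insert(e);

    if (repair) {
        for (const Entry& e : r.extra) indexEntries.erase(e);
        indexEntries.insert(r.missing.begin(), r.missing.end());
    }
    return r;
}
```

Reporting `missing` and `extra` separately matches the two kinds of problems the article lists, and a real checker could pass each one to a progress callback before repairing it.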

Backup/Restore

  • Hot backup. Backups can be performed without taking the database offline and without stopping updates.
  • Continuous backup. Roll-forward logs can be managed in a way that allows them to serve as a “continuous” backup of the database. No committed transaction will be lost.
  • Incremental backups, which minimize what must be backed up: only blocks changed since the last backup are copied.
  • Capture of output during backup using callbacks. This allows an application to capture backup output and stream it directly to tape or other backup media without first staging it in an intermediate disk file. An application could even send backup data across a network connection to be stored on a remote device. FLAIM uses double-buffering so that an output device can be kept busy while FLAIM fetches the next set of blocks to back up. This helps prevent a streaming tape device from stalling, which can dramatically improve backup throughput.
  • All blocks in backup include a checksum to ensure that data is reliable when restored.
  • Simple block compression used to minimize size of backup.
  • Use of serial numbers in roll-forward log files and backups to ensure identifiability when restoring. Database also has a serial number.
  • Restore from full backup, multiple incremental backups, and/or roll-forward logs - all in one call.
  • Streaming input during restore using callbacks. This allows an application to restore backed-up data directly from tape or other backup media without first staging it in an intermediate file. An application could also restore directly from a remote location by bringing the data over a network connection. FLAIM uses double-buffering so that an input device can be kept busy while FLAIM writes blocks from a backup to the database. This helps prevent a streaming tape device from stalling, which can dramatically improve restore throughput.
  • Status callbacks during backup/restore so that application can report progress and/or abort the backup or restore operation.
  • Partial restore supported. An application can stop a restore operation either (1) after a full or incremental backup has been restored, or (2) after any transaction in the roll-forward log has been redone.
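One way to select blocks for an incremental backup is to record, for each block, the transaction that last changed it and copy only blocks changed after the previous backup. The structures and functions below are hypothetical, and the article does not say how FLAIM tracks changed blocks. Restore applies the full backup and then each incremental in order.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical block: payload plus the ID of the transaction that last changed it.
struct StoredBlock {
    std::uint64_t lastTxn;
    std::string data;
};

using Database = std::map<std::uint32_t, StoredBlock>;  // block number -> block

struct Backup {
    std::uint64_t throughTxn;                    // covers changes up to this transaction
    std::map<std::uint32_t, std::string> blocks;
};

// Copy only blocks changed after `sinceTxn`; a full backup uses sinceTxn = 0.
Backup takeBackup(const Database& db, std::uint64_t sinceTxn, std::uint64_t currentTxn) {
    Backup b{currentTxn, {}};
    for (const auto& [num, blk] : db)
        if (blk.lastTxn > sinceTxn) b.blocks[num] = blk.data;
    return b;
}

// Apply one backup on top of what has been restored so far (oldest first).
void applyBackup(std::map<std::uint32_t, std::string>& target, const Backup& b) {
    for (const auto& [num, data] : b.blocks) target[num] = data;
}
```

Passing the previous backup's `throughTxn` as `sinceTxn` chains incrementals together, so each one contains only the blocks touched since the one before it.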

Database Monitoring, Statistics Collection

  • APIs to collect detailed statistics on disk I/O activity and transaction activity.
  • APIs to monitor cache utilization, including bytes used, number of blocks and records cached, cache hits, faults, etc.
  • APIs to collect detailed information about queries: which indexes were used, how many keys and records were fetched, how many records failed the criteria, etc. This supports analyzing query efficiency and troubleshooting query performance problems.

Database Size

  • A database may grow to 8 terabytes or 4 terabytes, depending on the platform. Up to 4096 files may be created, and each file is limited to roughly 2 GB or 4 GB, depending on operating system limitations.
  • Number of records up to 4 billion per container.
  • The database grows as needed; there is no need to preallocate disk space. However, extending a file by a large amount is more efficient than extending it by a small amount, so FLAIM typically extends a file by 8 MB at a time.
  • Routine for reclaiming unused database blocks and log areas and returning them to the OS. Space may be reclaimed without taking the database offline.
  • Benchmarks and comparisons show FLAIM databases to be 25 to 40% smaller than those of other databases.
  • Database block size can be set to 4 KB or 8 KB when the database is created.
  • Sophisticated block splitting and block combining to maximize block utilization.
  • Roughly 70% utilization in index blocks.
  • Roughly 80-90% utilization in data blocks.
  • Left end compression of index keys.
  • Compression of index reference sets.

Cross Platform

  • Database files are binary-portable across all supported platforms; no conversion is needed when moving a database file from one platform to another. Little-endian format is used for most internal integer values.
  • Platforms: NetWare, Windows (NT, 2000, XP 64-bit), Unix (Solaris, AIX, HP-UX), Linux, and Mac OS X (both PowerPC and Intel). 64-bit builds are supported on Windows, Linux, and Unix platforms where available.
  • The source code is written in C++, with one code base for all platforms, which makes it easy to build FLAIM libraries for new platforms; porting to a new platform is generally an hour or two of work.
  • Operating system services are abstracted into common interfaces or C++ classes, so upper layers of code need not deal with operating system differences. This platform-specific code is maintained in a handful of files. Abstractions exist for disk I/O, memory management, semaphores and mutexes, and so forth.
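Writing integers in a fixed little-endian byte order with shifts, rather than copying them from memory, is what makes a file binary-portable: the bytes come out the same on big- and little-endian CPUs. `encodeLE32` and `decodeLE32` are illustrative helpers, not FLAIM's own routines.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Produce the on-disk bytes of a 32-bit value, least significant byte first,
// independent of the host CPU's byte order.
std::array<std::uint8_t, 4> encodeLE32(std::uint32_t v) {
    return {static_cast<std::uint8_t>(v), static_cast<std::uint8_t>(v >> 8),
            static_cast<std::uint8_t>(v >> 16), static_cast<std::uint8_t>(v >> 24)};
}

// Rebuild the value from little-endian bytes, again independent of the host.
std::uint32_t decodeLE32(const std::array<std::uint8_t, 4>& b) {
    return static_cast<std::uint32_t>(b[0]) |
           (static_cast<std::uint32_t>(b[1]) << 8) |
           (static_cast<std::uint32_t>(b[2]) << 16) |
           (static_cast<std::uint32_t>(b[3]) << 24);
}
```

Because shifts operate on values rather than on memory layout, no `#ifdef` for byte order is needed.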

Utilities

  • Database checking utility (checkdb).
  • Database rebuild utility (rebuild).
  • Database browser/editor utility (dbshell). Can retrieve, add, modify, and delete records, perform transactions, perform queries, etc.
  • Low-level viewers: Physical structure viewer/editor (view) and roll-forward log viewer/searcher (rflview).
  • Text user interface (TUI) for all platforms, supporting colors, rudimentary windowing, keyboard access, and multiple screens. A common cross-platform abstraction for these services hides platform-specific details.
  • All utilities build and work on all platforms and have the same look and feel.
