Versant Object Database

Developer(s): Versant Corporation
Stable release: 8.0.2.15[1] / October 1, 2012
Development status: Active
Written in: Java, C, C#, C++, Smalltalk, Python
Operating system: Cross-platform; Solaris, Linux, Windows (NT through Vista), AIX, HP-UX (32- and 64-bit on all platforms)
Type: Object database
License: All rights reserved
Website: www.versant.com

Versant Object Database (VOD) is an object database software product developed by Versant Corporation.

The Versant Object Database enables developers using object oriented languages to transactionally store their information by allowing the respective language to act as the Data Definition Language (DDL) for the database. In other words, the memory model is the database schema model.[2]

In general, persistence in VOD is implemented by declaring a list of persistent classes and then providing a transaction demarcation application programming interface (API) to the application's use cases. Each language integration adheres to the constructs of its language, including its syntactic sugar and directives.

Additional APIs exist, beyond simple transaction demarcation, providing for the more advanced capabilities necessary to address practical issues found when dealing with performance optimization and scalability for systems with large amounts of data, many concurrent users, network latency, disk bottlenecks, etc.

Feature highlights

Supported languages

Primary supported languages are Java, C# and C++. Versant also has language support for Smalltalk and Python.

Query systems

VOD supports queries via a server-side indexing and query execution engine. Query support includes both a Versant-specific and a standards-based query language syntax. Versant provides this query capability in a number of forms depending on the developer's chosen language binding. For example, in Java VOD provides VQL (Versant Query Language), JDOQL, EJB QL and OQL. In C++ Versant provides VQL and OQL, and in C# it provides VQL, OQL and LINQ. VOD optimizes query execution based on the available attribute indexes. Versant also supports standard SQL queries against the Versant database using ODBC/JDBC drivers.

Versant Query Language

The native Versant Query Language (VQL) is similar to SQL92. It is a string based implementation which allows parameterized runtime binding. The difference is that instead of targeting tables and columns, it targets classes and attributes.

Other object-oriented elements apply to query processing. For example, a query targeting a superclass will return all instances of concrete subclasses that satisfy the query predicate. VOD is a distributed database: a logical database can be composed of many physical database nodes, with queries performed in parallel.
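The superclass behaviour described above can be illustrated with a minimal in-memory sketch. It does not use any Versant API: the `Instrument`, `Stock` and `Bond` classes and the `query` helper are hypothetical names chosen for illustration, and a plain Python list stands in for the database.

```python
from dataclasses import dataclass


@dataclass
class Instrument:            # hypothetical superclass
    symbol: str
    price: float


@dataclass
class Stock(Instrument):     # hypothetical concrete subclass
    pass


@dataclass
class Bond(Instrument):      # hypothetical concrete subclass
    coupon: float = 0.0


def query(store, target_class, predicate):
    """Return stored objects that are instances of target_class
    (subclasses included) and that satisfy the predicate."""
    return [obj for obj in store
            if isinstance(obj, target_class) and predicate(obj)]


store = [Stock("ACME", 120.0), Bond("GOV10", 98.5, coupon=2.5), Stock("TINY", 3.0)]

# Targeting the superclass matches instances of both subclasses.
expensive = query(store, Instrument, lambda o: o.price > 50)
```

Here the query targets `Instrument`, yet the result contains a `Stock` and a `Bond`, which mirrors how a class-targeted query differs from a table-targeted one.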

Versant query support includes most of the core concepts found in relational query languages including: pattern matching, join, set operators, orderby, existence, distinct, projections, numerical expressions, indexing, cursors, etc.

Indexing

VOD supports indexes on large collections. However it is not necessary to have a collection in order to have a queryable object with a usable index. Unlike other OODB implementations, any object in a Versant database is indexable and accessible via query. Indexes can be placed on attributes of classes and those classes can then be the target of a query operation. Indexes can be hash, b-tree, unique, compound, virtual and can be created online either using a utility, via a graphical user interface or via an API call.

Large collection support

VOD provides pagination support for large collections using a special node based implementation. These collections are designed in such a way that access is done so that only nodes needed by the client are brought resident into memory, instead of having to load the entire collection.

These large collections are created and operated on just as other persistent collection classes. The interface is also consistent with the appropriate language constructs. For example C++ Standard Template Library, Java iterators, C# enumerables, etc.

Collections of objects by default are only a collection of object identifiers. So, these can be very large, yet have a small resident memory footprint. To iterate the collection, objects are dereferenced into client memory space in either a configurable batch mode or one at a time. A query on the collection can be done using the “in” operator (or other set based operators like subset_of, superset_of, etc.) without loading the collection to the client memory space.
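A rough sketch of this idea, assuming nothing about the real Versant classes: the collection below stores only identifiers and fetches objects in batches while it is iterated. `ToyServer`, `OidCollection` and `batch_size` are hypothetical names, and a dictionary stands in for the database server.

```python
class ToyServer:
    """Stands in for the database server and counts round trips."""

    def __init__(self, objects):
        self.objects = objects          # oid -> object state
        self.round_trips = 0

    def fetch(self, oids):
        self.round_trips += 1
        return [self.objects[oid] for oid in oids]


class OidCollection:
    """Holds only identifiers; objects are dereferenced in batches on iteration."""

    def __init__(self, server, oids, batch_size=1):
        self.server = server
        self.oids = list(oids)
        self.batch_size = batch_size

    def __len__(self):
        return len(self.oids)           # no object needs to be loaded for this

    def __iter__(self):
        for start in range(0, len(self.oids), self.batch_size):
            batch = self.oids[start:start + self.batch_size]
            yield from self.server.fetch(batch)


server = ToyServer({oid: {"oid": oid} for oid in range(10)})
items = list(OidCollection(server, range(10), batch_size=4))
```

With ten identifiers and a batch size of four, the iteration needs three round trips instead of ten; with the default batch size of one it would load one object at a time.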

Data replication

VOD offers several replication mechanisms; the choice depends on the motivation for replicating: high availability, distribution, or integration.

High availability

Versant does synchronous pair replication. Full replication for fault tolerance requires only the installation of one configuration file that specifies the buddy node names. New connections notice the replica file and, on connect, check it for a buddy pair; if one exists, they connect to both buddies. In a distributed database there may be many buddy pairs. All transactional changes are then committed synchronously to the buddy database server processes.

If either database in the buddy pair becomes unreachable, in-flight transactions are handled so that there is no commit failure; instead, they continue against the node that is still alive in the buddy pair. On the machine where the node is still alive and processing transactions, a new process starts that monitors for the crashed database to become accessible again. Once the previously failed node is alive, the monitoring process starts replicating all changes that have occurred since the time of failure to bring the two buddies back into full synchronization. Once they are in full sync, a flag is set and on the next transaction clients move back to full synchronous operation. All of this is handled without any user involvement.
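The failover and resynchronization sequence can be sketched as a toy simulation. This is not Versant code: `Node`, `BuddyPair`, `commit` and `resync` are hypothetical names, the nodes are plain dictionaries, and the monitoring process is reduced to an explicit `resync` call.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class BuddyPair:
    """Toy synchronous pair: commits go to both nodes while both are alive."""

    def __init__(self, a, b):
        self.nodes = (a, b)
        self.backlog = []               # changes a failed buddy has missed

    def commit(self, changes):
        live = [n for n in self.nodes if n.alive]
        if not live:
            raise RuntimeError("both buddies are down")
        for node in live:
            node.data.update(changes)
        if len(live) == 1:              # degraded mode: remember what to replay
            self.backlog.append(dict(changes))

    def resync(self):
        """Replay missed changes, in order, once both buddies are reachable."""
        if all(n.alive for n in self.nodes):
            for changes in self.backlog:
                for node in self.nodes:
                    node.data.update(changes)
            self.backlog.clear()


a, b = Node("a"), Node("b")
pair = BuddyPair(a, b)
pair.commit({"x": 1})                   # written to both buddies
b.alive = False                         # buddy b fails
pair.commit({"x": 2, "y": 3})           # still succeeds on the surviving buddy
b.alive = True                          # buddy b comes back
pair.resync()                           # b catches up; pair is in sync again
```

The point of the sketch is that the commit during the outage does not fail, and that the recovered node receives exactly the changes it missed.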

In the case of extreme failure, like a broken disk drive, etc., the replicated node can be recreated from an online backup of the live node. Simply install a new disk drive, take an online backup of the live node, restore on the failed machine, start the monitor to sync the last few transactions and restore full replication at clients.

Distribution

Distribution is handled using Versant Asynchronous Replication (VAR), a channel driven, master-slave or peer-to-peer replication framework with rule based conflict detection and resolution.

An administrator uses a utility to define replication channels. Channels are named entities that define a scope of replication within a physical node. The “scope” can be anything from full database replication to something as fine grained as anything definable by a Versant query. Once the channels are defined, applications can register as listeners on these channels, at which point changes from those channels begin to flow to the respective clients.

These channels provide both persistence and reliable messaging. In the event that a connection is lost between a registered listener and a channel, ongoing changes will be guaranteed delivery once the connection is re-established. There are multiple transport protocols that can be configured for optimization in highly reliable LAN networks or high reliability in unreliable WAN type of environments.
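As a hedged illustration of scoped channels with guaranteed delivery, the following toy model queues changes per listener and hands them over once the listener is connected again. `Channel`, `register`, `publish` and `deliver` are hypothetical names, not the VAR API, and the scope is modelled as a simple predicate rather than a Versant query.

```python
class Channel:
    """Toy replication channel that keeps undelivered changes per listener."""

    def __init__(self, name, scope):
        self.name = name
        self.scope = scope              # predicate selecting changes in scope
        self.pending = {}               # listener -> queued changes
        self.connected = {}             # listener -> connection state

    def register(self, listener):
        self.pending[listener] = []
        self.connected[listener] = True

    def publish(self, change):
        if self.scope(change):
            for queue in self.pending.values():
                queue.append(change)

    def deliver(self, listener):
        """Hand over queued changes, but only while the listener is connected."""
        if not self.connected[listener]:
            return []
        delivered, self.pending[listener] = self.pending[listener], []
        return delivered


channel = Channel("trades", lambda change: change["class"] == "Trade")
channel.register("site_b")
channel.publish({"class": "Trade", "id": 1})
channel.connected["site_b"] = False             # connection lost
channel.publish({"class": "Trade", "id": 2})
channel.publish({"class": "User", "id": 7})     # outside the channel's scope
missed = channel.deliver("site_b")              # nothing arrives while disconnected
channel.connected["site_b"] = True              # connection re-established
caught_up = channel.deliver("site_b")
```

After reconnecting, the listener receives both trades in order, and the out-of-scope change never enters the channel.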

In bi-directional channel replication, a set of conflict detection rules is put in place so that conflicting changes can be resolved at runtime without disrupting channel activity. There are other forms of data distribution.

Integration

Usually, integration requires some kind of custom code. Users can connect to both relational and Versant databases using ORM products. They can load objects either from a relational database or Versant and then with some minor code implementation, disconnect those objects from the source and write them to a target. This can be used for import/export in a batch processing mode for integration with other database systems.

Data distribution architecture

VOD handles distributed data processing using a distributed two-phase commit protocol across multiple connected databases. In this process, VOD uses an internal resource manager to handle the distributed transactions. Versant also supports the XA protocol, which allows external transaction monitors to control the transactional context, so that VOD can, for example, plug into a CORBA or J2EE application server.

Versant allows object relationships to span physical resource (database) nodes. Object graphs can reference shared information that resides in other databases, and the resolution of that information is transparent at runtime. For example, several physical databases may hold user information models that are partitioned by account number and hold aggregations on account activities such as trades, while further databases hold the actual trade models; these users and trades can be related. A query can run across all of the user databases and return a user (or set of users); then, as messages involving trades are sent to the user objects, the trade models are automatically resolved across the distribution. After any of those objects are updated, Versant ensures at commit time that all changes commit back to their respective physical nodes in a fully ACID two-phase commit process.
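The two-phase commit mentioned here follows a well-known pattern: every node first votes on whether it can commit, and changes are applied only if all nodes vote yes. The sketch below shows that generic pattern with hypothetical `Participant` and `two_phase_commit` names; it is not Versant's internal resource manager and ignores failures between the two phases.

```python
class Participant:
    """Toy database node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.committed = {}
        self.pending = None

    def prepare(self, changes):
        if self.can_commit:
            self.pending = dict(changes)
        return self.can_commit

    def commit(self):
        self.committed.update(self.pending)
        self.pending = None

    def rollback(self):
        self.pending = None


def two_phase_commit(work):
    """work is a list of (participant, changes) pairs."""
    prepared = []
    for participant, changes in work:          # phase 1: collect votes
        if participant.prepare(changes):
            prepared.append(participant)
        else:
            for p in prepared:                 # any "no" vote aborts everyone
                p.rollback()
            return False
    for p in prepared:                         # phase 2: all voted yes
        p.commit()
    return True


users, trades = Participant("users"), Participant("trades")
ok = two_phase_commit([(users, {"u1": "Ann"}), (trades, {"t1": "buy 10"})])

broken = Participant("trades2", can_commit=False)
failed = two_phase_commit([(users, {"u2": "Bob"}), (broken, {"t2": "sell 5"})])
```

In the second call the `users` node had already prepared its change, but the refusal of the other node rolls it back, so no node ends up with a partial update.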

Object IDs are guaranteed to be unique across all physical nodes. Objects could be “moved” from one physical node to another without any application code changes required.

Schema evolution

Schema evolution is handled via a normal update of the application's class models and then applying those changes to the operational database. Those schema changes can be applied to an existing database either via a utility or API. The result is a versioning of the database schema.

Existing objects in the database are lazily evolved to the latest schema version. No object is actually evolved unless it is made dirty (marked for update) and committed back to the database. In general this means an application with the new schema will not cause evolution, except for new and updated objects.

There are utilities that can “crawl” a database, slowly evolving all instances to the latest version by grabbing sets of them, marking them dirty, and committing. This is sometimes desired for embedded or real-time systems where performance and space need to be optimized.
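Lazy evolution and the crawl can be sketched with a toy store in which each record carries a schema version. Everything here is an assumption made for illustration: `ToyStore`, `upgrade`, `crawl`, the version numbers and the added `email` attribute do not come from Versant.

```python
CURRENT_VERSION = 2


def upgrade(record):
    """Hypothetical evolution step: version 2 adds an 'email' attribute."""
    if record["version"] < CURRENT_VERSION:
        record = {**record, "email": None, "version": CURRENT_VERSION}
    return record


class ToyStore:
    def __init__(self, records):
        self.records = records          # oid -> stored record

    def read(self, oid):
        # The client sees the new shape, but storage is left untouched.
        return upgrade(self.records[oid])

    def commit(self, oid, record):
        # Only a dirty, committed object is rewritten in the new version.
        self.records[oid] = upgrade(record)

    def crawl(self, batch_size=100):
        """Evolve all remaining old instances, one batch at a time."""
        stale = [oid for oid, rec in self.records.items()
                 if rec["version"] < CURRENT_VERSION]
        for start in range(0, len(stale), batch_size):
            for oid in stale[start:start + batch_size]:
                self.commit(oid, self.read(oid))


store = ToyStore({1: {"version": 1, "name": "Ann"},
                  2: {"version": 1, "name": "Bob"}})
ann = store.read(1)
ann["name"] = "Anne"
store.commit(1, ann)                    # object 1 is evolved, object 2 is not
```

After this run only the updated object is stored in the new version; calling `store.crawl()` would evolve the remaining one.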

In most cases, older clients get patch updates with the new schema in conjunction with updates to the server, so that the client's schema version stays in sync with the database server. Alternatively, Versant’s loose schema mapping facility can be used. This is enabled by a flag in the client so that it does not complain about a mismatch in schema version and instead filters the incoming objects to match the old schema. Using this facility requires some forethought to avoid any unintended side effects.
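The filtering idea behind loose schema mapping can be shown in a few lines. This is a conceptual sketch only: `loose_map` and the attribute names are hypothetical, and objects are represented as dictionaries.

```python
OLD_CLIENT_ATTRIBUTES = {"name", "balance"}     # hypothetical old schema


def loose_map(incoming, known_attributes):
    """Keep only the attributes that the old client schema knows about."""
    return {key: value for key, value in incoming.items()
            if key in known_attributes}


server_object = {"name": "Ann", "balance": 10, "email": "ann@example.com"}
client_view = loose_map(server_object, OLD_CLIENT_ATTRIBUTES)
```

The old client simply never sees the newer `email` attribute, which also hints at the kind of side effect that needs forethought when such a client works with objects whose newer attributes it cannot see.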

The process goes as follows:

  1. Class definitions are updated (for example by adding new subclasses or by adding, renaming or removing attributes) and the application is recompiled. When the application connects to a Versant database, a schema version mismatch will be detected and you would normally get an error unless you take some action to avoid the mismatch.
  2. The schema mismatch can be avoided using a number of techniques.
    1. A utility can be used to describe the new schema to the database. The utility will show a list of incompatibilities and ask how you want them to be resolved. Your action will depend on whether you are in development, QA, production, etc. Actions such as dropping the existing class, evolving the schema version while keeping all existing objects, or renaming and retyping are all possible.
    2. The evolution process can be automated via connection options. This is normally used in development mode and allows the schema to evolve automatically on connect whenever there is a mismatch, preserving the existing objects.
    3. Specific APIs can be used to dynamically evolve the database schema. This is an advanced topic involving what are called Versant runtime classes. Basically, you can create a completely dynamic schema structure for the database so that new classes and attributes can be created on the fly.
  3. If clients with the older schema continue to operate on the database, loose_schema_mapping in the application profile file should be set to true.
  4. Optionally, a utility can be started to crawl the database and force version migration of all existing instances.

The general guidelines for schema evolution are that any schema changes can be made and existing instances preserved, without having to write custom evolution code, with the exception of two things:

  1. Changes to the middle of an inheritance hierarchy. Inserting a new class into the middle of a hierarchy is impossible without losing your existing objects, unless custom code is written to do this operation in a series of steps.
  2. Incompatible type changes like Array to a String.

All other forms of evolution, such as renaming attributes, deleting leaf classes, adding leaf classes, adding new classes, or adding or removing attributes, can be done online and without custom code. If actions like setting non-standard default values for newly added attributes are necessary, this can be done in callback functions within the objects. There is a set of standard object lifecycle callbacks that get invoked in activities like cache load. Those callbacks can be used to check for default values and take action if necessary.

Persistent object lifecycle

The lifecycle of an object load can be controlled on a use case basis.

By default, objects are loaded only when they are sent a message. This includes the default behavior for queries, which return only a collection of references to the objects that satisfied the query predicate, not the actual objects. When an object is loaded, all its non-reference attributes (primitives) are also loaded, and the remaining reference types follow the same pattern as the referencing object.

When a message is sent to an object, VOD looks into internal structures to see if the object is already in client memory. If not, VOD does an RPC to load the object. At the time VOD loads the object, it also looks at the connection's locking strategy to decide how to deal with locking the object on load. VOD supports both global locking strategies that can be applied to a connection and extremely fine-grained control to override behavior for a particular use case.

Once an object is loaded and locked it stays in the client cache, with an equivalent lock in the server, until one of a number of events occurs.

The most common event is that the current transaction ends with a commit. In the default case, this releases the lock and removes the object from memory. However, there are forms of commit that do combinations of things, such as keeping the cache and the locks and starting a new transaction, or keeping the cache but releasing the locks and starting a new transaction. These forms and others are used to optimize cache effectiveness when using non-default locking strategies like optimistic locking, or when you have a series of transactions that form a task and operate on the same set of objects.

Another possibility is that your client cache starts to get full. In this case, VOD may decide to swap objects back to the server process to make space and do some work that will have to be done at commit anyway. VOD does this in a fully transactional way, so that even if modified objects get swapped to the server, they will still be undone if the transaction is rolled back. Also, you have the ability to “pin” objects into the client cache to prevent swapping of important sets of objects, enabling the use of direct memory pointers without concern for memory faults.
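Swapping and pinning can be pictured with a small cache model. The sketch assumes a least-recently-added eviction order, which the text does not specify, and it leaves out the transactional undo entirely. `ClientCache`, `put` and `pin` are hypothetical names, and a dictionary stands in for the server process.

```python
from collections import OrderedDict


class ClientCache:
    """Toy client cache that swaps unpinned objects out when it is full."""

    def __init__(self, capacity, server):
        self.capacity = capacity
        self.server = server            # dict standing in for the server process
        self.objects = OrderedDict()    # oid -> object, oldest first
        self.pinned = set()

    def pin(self, oid):
        self.pinned.add(oid)

    def put(self, oid, obj):
        self.objects[oid] = obj
        self.objects.move_to_end(oid)
        while len(self.objects) > self.capacity:
            victim = next((o for o in self.objects if o not in self.pinned), None)
            if victim is None:
                break                   # everything is pinned; nothing can be swapped
            self.server[victim] = self.objects.pop(victim)


server = {}
cache = ClientCache(capacity=2, server=server)
cache.put(1, "order book")
cache.pin(1)
cache.put(2, "trade A")
cache.put(3, "trade B")                 # cache is over capacity: oid 2 is swapped out
```

Object 1 is the oldest entry but stays resident because it is pinned, so the next oldest unpinned object is swapped to the server instead.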

Another possible event is a query call which has the option set to flush the cache of objects in the target class, so that changed objects currently in your cache become part of the current query execution evaluation.

Other possibilities include API calls that result in explicit release of the object, like a call to refresh or a call to release.

There are many ways to override the default behavior, and they are commonly used for performance tuning on a use case basis. For example, if you are going to iterate over a collection of 1000 objects, you don’t want to do 1000 RPCs. Giving the collection of references to a call to groupRead will use a single RPC and load all objects. Similarly, you can make a call to getClosure, which will use groupRead behavior to load all referenced objects in a graph from the starting point, down to your specified level of reachability. Further, queries have options to set a lock and load result sets rather than just references, or to use cursors. There are APIs to explicitly load objects into cache and set higher lock levels than the connection defaults, etc.
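The benefit of grouped reads can be made concrete with a toy model that counts round trips. The names `ToyGraphServer`, `group_read` and `load_closure` are hypothetical and only loosely modelled on the groupRead and getClosure calls mentioned above; in particular, issuing one grouped read per level of the graph is an assumption of this sketch, not a statement about how Versant implements it.

```python
class ToyGraphServer:
    """Stands in for the server; each call counts as one RPC."""

    def __init__(self, graph):
        self.graph = graph              # oid -> list of referenced oids
        self.rpc_count = 0

    def group_read(self, oids):
        self.rpc_count += 1
        return {oid: self.graph[oid] for oid in oids}


def load_closure(server, root, depth):
    """Load root plus everything reachable within `depth` reference hops,
    using one grouped read per level instead of one read per object."""
    cache, frontier = {}, [root]
    for _ in range(depth + 1):
        # Drop duplicates and objects that are already cached.
        frontier = [oid for oid in dict.fromkeys(frontier) if oid not in cache]
        if not frontier:
            break
        loaded = server.group_read(frontier)
        cache.update(loaded)
        frontier = [ref for refs in loaded.values() for ref in refs]
    return cache


graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
server = ToyGraphServer(graph)
cache = load_closure(server, root=1, depth=2)
```

Four objects are loaded with three round trips here; loading them one message at a time would take four, and the gap widens quickly for wide graphs such as a collection of 1000 references.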

Achieving persistence

For users of C++, Versant requires that the uppermost class in an inheritance hierarchy inherit from a base class “PObject”, which handles database activities.

Next, a file, schema.imp, is set up to declare which classes in the model are to be made persistent. That file is used in a pre-compilation phase in which the code Versant needs for persistence is added to the persistent classes. Finally, the resulting schema.cxx file is compiled and linked with the application.

The pre-compilation phase is done with a utility, although it is typically integrated into the visual development environment so that it runs automatically whenever a build is done.

When using Java or .NET, this same procedure described above with C++ is accomplished using post-processing byte code enhancement. One sets up a file that declares which classes are to be persistent and then uses a utility, or API, or IDE integration to enhance the classes before running or debugging.

Versant provides other Java APIs based on the JDO and JPA standards. In those versions of the API, the system adheres to the way the standards define persistence declarations, whether through XML or annotations. Enhancement is then done using a utility (similarly with .NET) or, more commonly, with an Eclipse plug-in or Microsoft Visual Studio integration during the build process.

Integration with relational databases

A large percentage of Versant’s customers do some form of integration with relational tables. This can be accomplished in a couple of ways depending on the requirements such as: on-line/off-line, batch based, transactional, etc.

XA

Versant supports the XA protocol for distributed transactions. This allows participation in online distributed transactions with relational databases. The interaction with the relational tables can take many forms from custom code to ORM solutions to J2EE application servers (Entity Relationship Modeling) to message passing to ORBs, etc. The XA API allows the Versant database to act as a resource controlled by an external transaction monitor coordinating changes to both Versant and relational databases in the same transactional context.

ORM

Versant can interact with relational databases using Java ORM technology such as JDO (Java Data Objects) and Hibernate JPA. These standards-based implementations have the ability to detach objects from their transactional context and then attach them to another connection. There are restrictions in that Versant requires the application to use a concept known as database identity in order for replication to work with relations intact. Versant does not support the ORM form of application identity in anything other than a disconnected data form.

XML

Versant has tools that enable the import and export of XML data. For example, batch-based replication of data can be accomplished by exporting objects from the Versant database as XML, applying an XSLT transform if necessary, and then importing into relational tables. The opposite direction is also possible. With Java, the most common approach using XML is to dynamically replicate information using JAXB, which converts objects into and out of an XML form at runtime. Using JAXB, the Versant database only needs to work with objects rather than importing an XML form. In essence, XML coming from relational databases is converted to objects at runtime using JAXB, and those objects are then persisted into the Versant database.

Custom code

Users of C++ are especially challenged in integrating with relational databases. Versant provides consulting to help these customers with their integration challenges, but does not make those solutions, which require customization for each application, available in a productized form.

Transactions

Versant by default is always implicitly in a transaction when connected to the database. In addition, VOD supports the XA protocol and applies it to certain standards-based APIs, such as JDO and JPA, which require explicit transaction demarcation. There is also a non-implicit form of transaction in which transaction begin and end must be declared.

To discard from memory objects that have been modified in the current transaction, you can either issue a rollback, which discards them globally for the current transaction and implicitly starts another transaction, or use specific calls that discard them individually or globally within the same transaction.

Locking and caching strategies

Versant by default uses a pessimistic locking strategy to ensure that objects in the database server are in sync with client access in an ACID way. This is done by using a combination of locks against both schema and instance objects.

The database server process maintains lock request queues at the object level to control concurrent access to the same object. A request for update will establish a queue if there are any existing readers of an object. The request either goes through when all current readers release their locks or times out (in which case an exception that the client can handle is thrown). Locks are generally released at transaction boundaries. When a queue is established by an update request, all subsequent requests fall in the queue behind the update request. Once the update request has been filled, all read requests in the queue rush in, get their read lock and return the object, and if there are no other updates, the queue disappears. In this architecture, locks are taken at the object level, so false waits and false deadlocks do not occur.
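The queueing behaviour can be sketched for a single object as follows. This is a simplified model with hypothetical names (`ObjectLock`, `request`, `release`); it omits timeouts and lock upgrades and is not Versant's lock manager.

```python
from collections import deque


class ObjectLock:
    """Toy per-object lock with a FIFO request queue."""

    def __init__(self):
        self.readers = set()
        self.writer = None
        self.queue = deque()

    def request(self, client, mode):
        """mode is 'read' or 'update'; returns True if granted immediately."""
        if not self.queue and self._compatible(mode):
            self._grant(client, mode)
            return True
        self.queue.append((client, mode))   # wait behind earlier requests
        return False

    def release(self, client):
        self.readers.discard(client)
        if self.writer == client:
            self.writer = None
        # Grant queued requests in order until one of them has to wait.
        while self.queue and self._compatible(self.queue[0][1]):
            self._grant(*self.queue.popleft())

    def _compatible(self, mode):
        if mode == "read":
            return self.writer is None
        return self.writer is None and not self.readers

    def _grant(self, client, mode):
        if mode == "read":
            self.readers.add(client)
        else:
            self.writer = client


lock = ObjectLock()
granted = [lock.request("r1", "read"),      # granted: nothing conflicts
           lock.request("w1", "update"),    # queued behind reader r1
           lock.request("r2", "read")]      # queued behind the update request
lock.release("r1")                          # w1 now holds the update lock
holder_after_r1 = lock.writer
lock.release("w1")                          # r2 gets its read lock; queue is empty
```

Because the queue belongs to one object, a reader of a different object is never delayed by this update request, which is the property the text describes as avoiding false waits.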

Other ways of keeping client caches in sync are, for example, an optimistic locking strategy, using a classic timestamp mechanism. VOD also provides forms of client cache synchronization using multi-cast. Additionally it provides an event mechanism where clients can register for triggering events within the database server to be used for synchronization or for business logic work flow.
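For comparison with the pessimistic default, an optimistic check can be sketched with a version counter standing in for the timestamp. `VersionedStore` and `StaleObjectError` are hypothetical names; the real mechanism and its API may differ.

```python
class StaleObjectError(Exception):
    """Raised when an object changed after the client read it."""


class VersionedStore:
    def __init__(self):
        self.rows = {}                       # oid -> (version, value)

    def read(self, oid):
        return self.rows.get(oid, (0, None))

    def write(self, oid, read_version, value):
        current_version, _ = self.rows.get(oid, (0, None))
        if current_version != read_version:
            raise StaleObjectError(f"object {oid} changed since it was read")
        self.rows[oid] = (current_version + 1, value)


store = VersionedStore()
store.write("acct", 0, 100)
version_a, _ = store.read("acct")            # client A reads version 1
version_b, _ = store.read("acct")            # client B reads version 1
store.write("acct", version_a, 150)          # A commits first
try:
    store.write("acct", version_b, 90)       # B's copy is now stale
    conflict = False
except StaleObjectError:
    conflict = True
```

No lock is held between read and write; the conflict is detected only at commit time, and the second client has to refresh its copy and retry.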

Scalability

Storage

Versant supports multiple-file and multiple-process configurations. Data is stored in one or more files, and there are supporting files for the logging subsystem (logical and physical log files). These logging files are used for high performance and scalability under concurrent user loads and for online database backup processes.

Clients

Versant is a multi-user client-server database and has production applications with thousands of concurrently connected users. Versant can also run linked and embedded in the same address space as the application process, so it can also serve as an embedded database.

Performance

Versant uses internal performance and scalability benchmarks to monitor and measure behavior over time across releases, patches and generations of new hardware.

Versant has done other non-standard benchmarking activities in a public forum.[3][4]

Versant ran the OO7 benchmark in the early 1990s but currently does not provide any comparisons, because there are no industry benchmarks that make sense for object databases.

One of the candidates considered was TPC-E, which was supposed to be the new standard OLTP database benchmark, with new complex models aimed at being representative of today’s computing environment. TPC-E is based on a financial trading system model. Still, comparative results could not be obtained. The reason is that the TPC specifies requirements regarding what part of the code resides in the “driver” of the benchmark and what part resides in “database” functionality. However, the interface between the driver and the application logic is defined entirely at the data level. This means that when measuring relational access you would not incur any overhead for mapping into a C++ object. The mapping of the raw data into whatever form the driver needed to implement the business logic was completely outside of the benchmark measurements. With an object database, by contrast, you need to un-map the C++ objects into the driver data structures and, in doing so, measure the cost of that activity as part of the benchmark timings.

But this is the opposite of a real world application where people write object oriented applications resulting in object oriented models. In a relational database, you need to map/un-map from objects to the relational data structures. The TPC-E was written in a way as to exclude the “mapping effect” from the measurements, which by the very nature of how an object database works means the TPC-E was written in a way that forces measurement of an “un-mapping effect”, an activity which does not occur in a real world application. Thus with TPC-E, the true cost of computing is removed for relational and even worse added to object databases.

Add-on Modules

Versant provides add-on modules for deployment or access to its Object Database.

  • V/Management Center: V/MC delivers real-time views of performance data and analytical information about the Versant Object Database. For example, it alerts administrators about potential issues before the database availability is affected. It's designed as an Eclipse-based RCP client.
  • Versant Compact: Online Database Maintenance.
  • Versant FTS: High Availability Database Server.
  • Versant Async Server: Production Database Replication.
  • Versant HA Backup: High Availability Backup Solution.
  • Versant SQL: SQL Access & Reporting.

Applications

Usually, the “best kind of application” for a Versant database is one that requires an application-specific database of an OLTP nature. There are certain application characteristics where Versant technology provides better performance and scalability than traditional relational technology: complex models, large amounts of data, and large numbers of concurrent users.

Thus, VOD is found in applications within many different vertical industries: global trading platforms for large stock exchanges, network management for large telecommunications providers, intelligence analytics for defense agencies, reservation systems for large airline/hotel companies, risk management analytics for banking and transportation organizations, massively multiplayer online game systems, network security and fraud detection, local number portability, advanced simulations, social networking, etc.

References

  1. "VOD patches and releases announcements". Versant. Retrieved 18 October 2012. 
  2. "TechView Product Report: Versant Object Database", odbms.org. Retrieved 6 October 2010.
  3. "Poleposition, the open source database benchmark", polepos.org. Retrieved 24 February 2011.
  4. "Accelerating IBM WebSphere Application Server Performance with Versant enJin", ibm.com. Retrieved 6 October 2010.
This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.