Developer(s) | Versant Corporation |
---|---|
Stable release | 8 / February 16, 2010 |
Development status | Active |
Written in | Java, C, C#, C++, Smalltalk, Python |
Operating system | Cross-platform: Solaris, Linux, Windows (NT through Vista), AIX, HP-UX (32-bit and 64-bit on all platforms) |
Type | Object Database |
License | All rights reserved |
Website | www.versant.com |
Versant Object Database (also known as VOD or just "Versant") is an enterprise-grade object database from Versant Corporation that supports massive concurrency and large data sets.
The Versant Object Database (VOD) enables developers using OO languages such as Java, C# and C++ to transactionally store their information by allowing the respective language to act as the Data Definition Language (DDL) for the database. In other words, the memory model is the database schema model.[1]
In general, persistence with VOD is achieved by declaring the list of classes in your model that will be made persistent, then providing a relatively thin transaction demarcation API to your use cases. Each language integration adheres to the constructs of that language, including its syntactic sugar and directives.
Additional APIs exist, beyond simple transaction demarcation, providing for the more advanced capabilities necessary to address practical issues found when dealing with performance optimization and scalability for systems with large amounts of data, lots of concurrent users, network latency, disk bottlenecks, etc.
Primary supported languages are Java, C# and C++. Versant also has language support for Smalltalk and Python users.
VOD supports queries via a server side indexing and query execution engine. Query support includes both a Versant-specific and a standards-based query language syntax. Versant provides this query capability in a number of forms depending on the developer's chosen language binding. For example, in Java VOD provides VQL (Versant Query Language), JDOQL, EJBQL and OQL. In C++ Versant provides VQL and OQL, with C# support for VQL, OQL and LINQ. VOD will do optimization of query execution based on available attribute indexes. Versant also has support for standard SQL queries against the Versant database using ODBC/JDBC drivers through the Versant SQL product.
The native Versant Query Language (VQL) is very similar to SQL92. It is a string based implementation which allows parameterized runtime binding. The difference is that instead of targeting tables and columns, it targets classes and attributes.
Other elements of the OO paradigm apply to the query processing. For example, a query targeting a super class will return all instances of concrete subclasses that satisfy the query predicate. VOD is a distributed database, meaning you can define a logical database composed of many physical database nodes, so queries are performed in parallel when using logical databases.
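The two ideas above (a predicate with parameters bound at run time, and a superclass target that also matches subclass instances) can be sketched in a few lines of Python. This is only a conceptual model, not the Versant API or VQL syntax: the `Instrument`, `Stock` and `Bond` classes, the `query` function and the in-memory `store` list are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Instrument:            # hypothetical superclass
    symbol: str
    price: float

@dataclass
class Stock(Instrument):     # hypothetical concrete subclasses
    pass

@dataclass
class Bond(Instrument):
    coupon: float = 0.0

def query(store, target_class, predicate, **params):
    """Return every stored instance of target_class, or of any subclass,
    for which predicate(obj, params) holds."""
    return [obj for obj in store
            if isinstance(obj, target_class) and predicate(obj, params)]

def cheaper_than(obj, params):
    # The predicate is written once; the limit is bound at run time.
    return obj.price < params["limit"]

store = [Stock("ACME", 12.5), Bond("GOV10", 99.0, coupon=2.5), Stock("INIT", 3.0)]

# Targeting the superclass matches Stock and Bond instances alike.
hits = query(store, Instrument, cheaper_than, limit=50.0)
```

Here the query targets attributes of classes rather than columns of tables, which is the distinction the text draws between VQL and SQL.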
Versant query support includes most of the core concepts found in relational query languages including: pattern matching, join, set operators, orderby, existence, distinct, projections, numerical expressions, indexing, cursors, etc.
VOD supports indexes on large collections. However it is not necessary to have a collection in order to have a queryable object with a usable index. Unlike other OODB implementations, any object in a Versant database is indexable and accessible via query. Indexes can be placed on attributes of your classes and those classes can then be the target of a query operation. Indexes can be hash, b-tree, unique, compound, virtual and can be created online either using a utility, via a graphical user interface or via an API call.
VOD provides pagination support for large collections using a special node based implementation. These “Large” collections are designed in such a way that access is done so that only nodes needed by the client are brought resident into memory, instead of having to load the entire collection.
These large collections are created and operated on just as the other persistent collection classes. The API is also consistent with the appropriate language constructs i.e. C++ STL, Java iterators, C# enumerables, etc.
A further point is that normal collections of objects by default are only a collection of object identifiers. So, these can be very large, yet have a small resident memory footprint. As you iterate the collection, objects are dereferenced into client memory space in either a configurable batch mode or one at a time. So, even in the case where you are dealing with collections of billions of objects, you can easily handle the load/release memory cycles as you iterate the contents.
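The batch-mode iteration described above can be illustrated with a small Python sketch. It assumes a collection that holds only identifiers and a hypothetical `fetch_batch` function standing in for a round trip to the server; neither name comes from Versant.

```python
from itertools import islice

def iterate_in_batches(oids, fetch_batch, batch_size=100):
    """Yield objects for a sequence of identifiers, fetching batch_size at a time."""
    it = iter(oids)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # One round trip per chunk; earlier chunks can be garbage collected
        # once the caller has moved past them.
        yield from fetch_batch(chunk)

# Hypothetical "server": maps identifiers to objects and counts round trips.
server = {oid: {"oid": oid, "value": oid * 2} for oid in range(1000)}
stats = {"round_trips": 0}

def fetch_batch(chunk):
    stats["round_trips"] += 1
    return [server[oid] for oid in chunk]

total = sum(obj["value"]
            for obj in iterate_in_batches(range(1000), fetch_batch, batch_size=250))
```

Only one batch of objects needs to be resident at a time, while the identifier collection itself stays small.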
If you are trying to query the contents of the collection, that can be done on the server using the “in” operator ( or other set based operators like subset_of, superset_of, etc. ) without ever having to load the collection to the client memory space.
There are several mechanisms for replication in VOD, and the choice depends on the motivation behind the replication: high availability, distribution, or integration.
Versant does synchronous pair replication. This requires zero code changes: installing one configuration file that specifies the buddy node names puts the database into full replication for fault tolerance. It works as follows. New connections notice the existence of the replica file and, on connect, check the file for a buddy pair; if one exists, they connect to both buddies. Note that this could be a distributed database, so there may be many buddy pairs. All transactional changes are then committed synchronously to the buddy database server processes.
If either database in the buddy pair becomes unreachable, in-flight transactions are handled so that there is no commit failure; instead, they continue against the node that is still alive. On the machine where the node is still alive and processing transactions, a new process starts that monitors for the crashed database to become accessible again. Once the previously failed node is alive, the monitoring process replicates all changes that have occurred since the time of failure to bring the two buddies back into full synchronization. Once they are in full sync, a flag is set, and on the next transaction clients move back to full synchronous operation. All of this is handled without any user involvement.
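The failover and catch-up cycle can be modeled with a toy Python sketch. It is an assumption-laden illustration rather than Versant's implementation: the `Node` and `BuddyPair` classes and the in-memory `backlog` are hypothetical, and the monitoring process is reduced to an explicit `resync` call.

```python
class Node:
    def __init__(self, name):
        self.name, self.data, self.alive = name, {}, True

class BuddyPair:
    """Toy model: commits go to both nodes; while one is down, changes are
    logged and replayed when it returns."""
    def __init__(self, a, b):
        self.nodes = (a, b)
        self.backlog = []            # changes the failed node has missed

    def commit(self, changes):
        live = [n for n in self.nodes if n.alive]
        if not live:
            raise RuntimeError("both buddies are unreachable")
        for node in live:
            node.data.update(changes)
        if len(live) < 2:            # degraded mode: remember what to replay
            self.backlog.append(dict(changes))

    def resync(self):
        """Replay missed changes once the failed node is reachable again."""
        if all(n.alive for n in self.nodes):
            for changes in self.backlog:
                for node in self.nodes:
                    node.data.update(changes)
            self.backlog.clear()

    @property
    def in_sync(self):
        return not self.backlog and self.nodes[0].data == self.nodes[1].data

a, b = Node("a"), Node("b")
pair = BuddyPair(a, b)
pair.commit({"x": 1})        # synchronous commit to both buddies
b.alive = False
pair.commit({"y": 2})        # no commit failure; only node a applies it
degraded = not pair.in_sync
b.alive = True
pair.resync()                # catch-up, then back to full synchronous mode
```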
In the case of extreme failure, like a broken disk drive, etc., the replicated node can be recreated from an online backup of the live node. Simply install a new disk drive, take an online backup of the live node, restore on the failed machine, start the monitor to sync the last few transactions and restore full replication at clients.
Distribution is handled using Versant Asynchronous Replication (VAR). This is a channel-driven, master-slave or peer-to-peer replication framework with rule-based conflict detection and resolution. An administrator uses a utility to define replication channels. Channels are named entities that define a scope of replication within a physical node. The “scope” can be anything from full database replication to something as fine-grained as anything definable by a Versant query. Once the channels are defined, applications can register as listeners on these channels, at which point changes from those channels begin to flow to the respective clients.
These channels provide both persistence and reliable messaging. So, if a connection is lost between a registered listener and a channel, ongoing changes are guaranteed to be delivered once the connection is re-established. Multiple transport protocols can be configured, either to optimize for highly reliable LANs or to provide reliability in unreliable WAN environments. With bi-directional channel replication, a set of conflict detection rules is put in place so that conflicting changes can be resolved at runtime without disrupting channel activity. Note: There are other forms of data distribution for various operational purposes.
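A minimal Python sketch of a scoped channel with queued delivery follows. It is a conceptual model only: the `Channel` class, its methods and the in-memory queues are hypothetical and say nothing about how VAR is actually configured or implemented.

```python
from collections import deque

class Channel:
    """Toy replication channel: a scope predicate plus a per-listener queue."""
    def __init__(self, name, scope):
        self.name, self.scope = name, scope
        self.pending = {}      # listener id -> deque of undelivered changes
        self.connected = {}    # listener id -> callback, or None while offline

    def register(self, listener_id, callback):
        self.pending.setdefault(listener_id, deque())
        self.connected[listener_id] = callback
        self._drain(listener_id)          # deliver anything missed while offline

    def disconnect(self, listener_id):
        self.connected[listener_id] = None

    def publish(self, change):
        if not self.scope(change):
            return                        # outside this channel's scope
        for listener_id, queue in self.pending.items():
            queue.append(change)
            self._drain(listener_id)

    def _drain(self, listener_id):
        callback = self.connected.get(listener_id)
        queue = self.pending[listener_id]
        while callback is not None and queue:
            callback(queue.popleft())

received = []
channel = Channel("trades", scope=lambda change: change["cls"] == "Trade")
channel.register("site-b", received.append)
channel.publish({"cls": "Trade", "id": 1})
channel.disconnect("site-b")
channel.publish({"cls": "Trade", "id": 2})   # queued while the listener is offline
channel.publish({"cls": "User", "id": 3})    # filtered out by the scope
before_reconnect = [c["id"] for c in received]
channel.register("site-b", received.append)  # reconnect: queued change is delivered
```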
Usually, integration requires some kind of custom code. If you are using Java or C#, you can connect to both relational and Versant databases using ORM products. You can then load objects either from a relational database or Versant and then with some minor code implementation, disconnect those objects from the source and write them to a target. This can be used for import/export purposes in a batch processing mode for integration with other database systems. If you are using C++, Versant provides a special consulting solution that will allow your application to talk with a relational database.
VOD handles distributed data processing using a distributed two-phase commit protocol across multiple connected databases. In this process, VOD uses an internal resource manager that handles the distributed transactions. Versant also supports the XA protocol, allowing external transaction monitors to control the transactional context, so, for example, you can plug into a CORBA or J2EE application server.
Versant allows object relationships to span physical resource (database) nodes. So, you can have shared information referenced from object graphs that reside in other databases and resolution of that information is transparent at runtime. For example, you may have several physical databases holding user information models that are partitioned by account number holding aggregations on account activities such as trades and then have some more databases holding actual trade models and these users and trades can be related. So, you might do a query across all of the user databases and return a user ( or set of users ), then as you perform message sends to the user objects involving trades, the trade models will automatically be resolved across the distribution. If you perform updates of any of those objects, at commit time Versant will ensure that all changes commit back to their respective physical nodes in a completely ACID 2phase commit process.
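The commit step at the end of this scenario follows the general two-phase commit pattern, which can be sketched in Python as below. The `Participant` class and `two_phase_commit` function are hypothetical illustrations of the textbook protocol, not Versant's resource manager.

```python
class Participant:
    """Toy database node taking part in a distributed transaction."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
        self.committed, self.staged = {}, None

    def prepare(self, changes):
        if not self.can_commit:
            return False                 # vote "no"
        self.staged = dict(changes)
        return True                      # vote "yes"

    def commit(self):
        self.committed.update(self.staged)
        self.staged = None

    def rollback(self):
        self.staged = None

def two_phase_commit(work):
    """work maps each participant to the changes destined for it."""
    prepared = []
    for participant, changes in work.items():    # phase 1: collect votes
        if participant.prepare(changes):
            prepared.append(participant)
        else:
            for p in prepared:                   # any "no" aborts everyone
                p.rollback()
            return False
    for participant in prepared:                 # phase 2: commit everywhere
        participant.commit()
    return True

users, trades = Participant("users"), Participant("trades")
ok = two_phase_commit({users: {"u1": "alice"}, trades: {"t1": 100}})

failing = Participant("failing", can_commit=False)
aborted = not two_phase_commit({users: {"u2": "bob"}, failing: {"t2": 5}})
```

Either every node applies its share of the changes or none does, which is the property the text refers to as an ACID two-phase commit.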
Note: Object IDs are guaranteed to be unique across all physical nodes. Objects can be “moved” ( migrateobjs( objs[ ], fromDB, targetDB ) ) from one physical node to another without any application code changes.
Schema evolution is handled via the normal update of your application class models and then applying those changes to the operational database. Those schema changes can be applied to an existing database either via utility or API. The result is a versioning of the database schema which occurs in sub second time regardless of the actual size of the database.
Existing objects in the database are then lazily evolved to the latest schema version. No object is actually evolved unless it is made dirty ( marked for update ) and committed back to the database. So, in general, running an application with the new schema will not cause evolution, except for new and updated objects.
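Lazy evolution can be illustrated with a short Python sketch. It assumes, for illustration only, that stored records carry a version number and that old records are presented in the current schema on load; the `UPGRADES` table and the `load` and `commit` helpers are hypothetical.

```python
SCHEMA_VERSION = 2

# Hypothetical upgrade steps: version n -> n + 1.
UPGRADES = {1: lambda rec: {**rec, "email": None}}

def load(stored):
    """Present a stored record in the current schema without rewriting it."""
    rec, version = dict(stored["fields"]), stored["version"]
    while version < SCHEMA_VERSION:
        rec = UPGRADES[version](rec)
        version += 1
    return rec

def commit(db, key, rec, dirty):
    """Only dirty objects are written back, and only they get the new version."""
    if dirty:
        db[key] = {"version": SCHEMA_VERSION, "fields": dict(rec)}

db = {"p1": {"version": 1, "fields": {"name": "Ada"}},
      "p2": {"version": 1, "fields": {"name": "Bob"}}}

p1 = load(db["p1"])
p1["email"] = "ada@example.org"
commit(db, "p1", p1, dirty=True)      # updated, so evolved in the database

p2 = load(db["p2"])
commit(db, "p2", p2, dirty=False)     # read only: stays at version 1
```

Changing `SCHEMA_VERSION` costs the same regardless of how many records exist, which mirrors the sub-second schema versioning described earlier.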
There are utilities that can “crawl” a database, slowly evolving all instances to the latest version by grabbing sets of them, marking them dirty, and committing. This is sometimes desired for embedded or real-time systems where every ounce of performance and space needs to be optimized.
In most cases, older clients get patch updates with the new schema in conjunction with updates to the server. So, the client’s schema version is in sync with the database server. However, you can also use Versant’s loose schema mapping facility. This is enabled by a flag in the client so that it does not complain about a mismatch in schema version and instead filters the incoming objects to match the old schema. Clearly, using this facility requires some forethought to avoid any unintended side effects.
The process goes as follows:
The general guidelines for schema evolution are that any schema changes can be made and existing instances preserved, without having to write custom evolution code, with the exception of two things. 1) Changes to the middle of an inheritance hierarchy. So, you cannot do something like insert a new class into the middle of a hierarchy, without losing your existing objects, unless you write custom code to do this operation in a series of steps. 2) Incompatible type changes like Array to a String.
All other forms of evolution like renaming attributes, deleting leaf classes, adding leaf classes, adding new classes, adding or removing attributes, etc. can be done online and without custom code. If you have a need to do things like set non standard default values for newly added attributes, you can do this in callback functions within your objects. There are a set of standard object lifecycle callbacks that get invoked in activities like cache load. You can use those callbacks to check for default values and take action if necessary.
The lifecycle of an object load can be controlled on a use case basis.
By default, objects are loaded only when they are sent a message. This includes the default behavior for queries, which return only a collection of references to the objects that satisfied the query predicate, not the actual objects. When an object is loaded, all its non-reference attributes (primitives) are also loaded, and the remaining reference types follow the same pattern as the referencing object.
When a message is sent to an object, VOD looks into internal structures to see if the object is already in client memory. If not, VOD does an RPC to load the object. At the time VOD loads the object, it also looks at the connection’s locking strategy to decide how to deal with locking the object on load. VOD supports both global locking strategies that can be applied to a connection and extremely fine-grained control to override behavior for a particular use case.
Once an object is loaded and locked it stays in the client cache, with an equivalent lock in the server, until one of a number of events occurs.
The most common event is that the current transaction ends with a commit. In the default case, this releases the lock and the object from memory. However, there are forms of commit that do combinations of things, such as keeping the cache and the locks and starting a new transaction, or keeping the cache but releasing the locks and starting a new transaction. These forms and others are used to optimize cache effectiveness when using non-default locking strategies like optimistic locking, or when you have a series of transactions that form a task and operate on the same set of objects.
Another possibility is that your client cache starts to get full. In this case, VOD may decide to swap objects back to the server process to make space and do some work that will have to be done at commit anyway. VOD does this in a fully transactional way, so that even if modified objects get swapped to the server, they will still be undone if the transaction is rolled back. Also, you have the ability to “pin” objects into the client cache to prevent swapping of important sets of objects, enabling the use of direct memory pointers without concern for memory faults.
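The interaction between swapping and pinning can be sketched in Python as follows. This is a toy model under stated assumptions: the `ClientCache` class is hypothetical, the eviction policy (least recently inserted, unpinned first) is chosen for illustration, and `swap_out` stands in for sending an object back to the server process.

```python
from collections import OrderedDict

class ClientCache:
    """Toy bounded cache: the oldest unpinned objects are swapped out first."""
    def __init__(self, capacity, swap_out):
        self.capacity, self.swap_out = capacity, swap_out
        self.objects = OrderedDict()
        self.pinned = set()

    def put(self, oid, obj):
        self.objects[oid] = obj
        self.objects.move_to_end(oid)
        self._make_room()

    def pin(self, oid):
        self.pinned.add(oid)             # pinned objects are never swapped

    def _make_room(self):
        while len(self.objects) > self.capacity:
            victim = next((o for o in self.objects if o not in self.pinned), None)
            if victim is None:
                raise MemoryError("cache full and every object is pinned")
            self.swap_out(victim, self.objects.pop(victim))

swapped = {}                             # stands in for the server process
cache = ClientCache(capacity=2, swap_out=swapped.__setitem__)
cache.put(1, "a")
cache.pin(1)
cache.put(2, "b")
cache.put(3, "c")                        # over capacity: object 2 is swapped, 1 stays
```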
Another possible event is a query call which has the option set to flush the cache of objects in the target class, so that changed objects currently in your cache become part of the current query execution evaluation.
Other possibilities include API calls that result in explicit release of the object, like a call to refresh or a call to release.
There are many ways to override the default behavior. Those are in fact commonly used to performance tune on a use case basis. For example, if you are going to iterate over a collection of 1000 objects, you don’t want to do 1000 RPC’s. So, you can give the collection of references to a call to groupRead which will use a single RPC and load all objects. Similarly, you can make a call to getClosure which will use groupRead behavior to load all referenced objects in a graph from the starting point, down to your specified level of reachability. Further, queries have options to set a lock and load result sets rather than just references or to use cursors. There are API’s to explicitly load objects into cache and set higher lock levels than the connection defaults, etc.
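The closure-loading idea can be sketched in Python as a level-by-level traversal that issues one grouped read per level. The function names below echo groupRead and getClosure from the text, but the signatures, the dictionary-based `graph` and the `refs` field are assumptions made for this illustration, not the Versant API.

```python
def get_closure(root_oid, group_read, depth):
    """Load root_oid and everything reachable within depth reference hops,
    using one group_read call per level instead of one call per object."""
    loaded, frontier = {}, [root_oid]
    for _ in range(depth + 1):
        # Skip objects already loaded and drop duplicate references.
        batch = list(dict.fromkeys(oid for oid in frontier if oid not in loaded))
        if not batch:
            break
        objects = group_read(batch)              # one round trip per level
        loaded.update(objects)
        frontier = [ref for obj in objects.values() for ref in obj["refs"]]
    return loaded

# Hypothetical object graph held by the "server".
graph = {1: {"refs": [2, 3]}, 2: {"refs": [4]}, 3: {"refs": [4]},
         4: {"refs": [5]}, 5: {"refs": []}}
calls = []

def group_read(oids):
    calls.append(list(oids))
    return {oid: graph[oid] for oid in oids}

closure = get_closure(1, group_read, depth=2)
```

Four objects are loaded with three round trips here; loading them one message at a time would have taken four, and the gap widens quickly for wide graphs.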
For users of C++, Versant requires that the upper most class in an inheritance hierarchy inherit from a base class “PObject”, which handles database activities.
Then there is a file, schema.imp, that declares which classes in your model are to be made persistent. That file is used in a pre-compilation phase in which the code Versant needs is added to your persistent classes. Finally, the resulting schema.cxx file is compiled and linked with your application.
The pre-compilation phase is done with a utility, though this is typically set up in your visual development environment so that it runs automatically whenever you do a build.
When using Java or .NET, the procedure described above for C++ is accomplished using post-processing bytecode enhancement. You set up a file that declares which classes are to be persistent and then use a utility, an API, or IDE integration to enhance the classes before running or debugging.
Versant provides other Java APIs based on the JDO and JPA standards. In those versions of the API, the system adheres to the standards defined for declaring persistence, whether through XML or annotations. Enhancement is then done using a utility (similarly with .NET) or, more commonly, an Eclipse plug-in or Microsoft Visual Studio integration that performs the enhancement automatically during the build process.
A large percentage of Versant’s customers do some form of integration with relational tables. This can be accomplished in a couple of ways depending on the requirements such as: on-line/off-line, batch based, transactional, etc.
Versant supports the XA protocol for distributed transactions. This allows Versant to participate in online distributed transactions with relational databases. The form of interaction with the relational tables can take many forms from custom code to ORM solutions to J2EE application servers ( Entity Relationship Modeling ) to message passing to ORB’s, etc. The XA API allows the Versant database to act as a resource controlled by an external transaction monitor coordinating changes to both Versant and relational databases in the same transactional context.
Versant can interact with relational databases using Java ORM technology such as JDO (Java Data Objects) and Hibernate JPA. These standards based implementations have the ability to detach objects from their transactional context and then attach them to another connection. There are restrictions in that Versant requires the application to use a concept known as database identity in order for replication to work with relations intact. Versant does not support the ORM form of application identity in anything other than a disconnected data form.
Versant has tools that enable the import and export of XML data. So, for example, batch based replication of data can be accomplished by exporting objects from the Versant database in the form of XML, if necessary applying an XSLT transform and then importing into relational tables. Of course, the opposite is also possible. In addition, with Java, the most common approach using XML is to dynamically replicate information using JAXB which runtime converts objects into and out of an XML form. Using JAXB, the Versant database only needs to work with objects rather than importing an XML form. In essence, XML coming from relational databases are converted to objects at runtime using JAXB and those objects are then persisted into the Versant database.
Users of C++ are especially challenged in integrating with relational databases. Versant provides frameworks through its consulting services to help these customers with their integration challenges, but because those solutions require customization for each application, it does not make them available in a productized form.
Versant by default is always implicitly in a transaction when connected to the database. In addition, VOD supports the XA protocol and applies that to certain standards-based APIs, such as JDO and JPA, which require explicit transaction demarcation. So, there is a non-implicit form of transaction where transaction begin/end must be declared.
In order to discard from memory objects that have been modified in the current transaction you can either do it globally for the current transaction by issuing a rollback which also implicitly starts another transaction or you can do it in isolation or globally using specific calls within the same transaction.
Versant by default uses a pessimistic locking strategy to ensure that objects in the database server are in sync with client access in an ACID way. This is done by using a combination of locks against both schema and instance objects.
In brief, the database server process maintains lock request queues at the object level to control concurrent access to the same object. A request for update establishes a queue if there are any existing readers of an object. The request either goes through when all current readers release their locks or times out ( an exception is thrown, which the client can handle ). Locks are generally released at transaction boundaries. When a queue is established by an update request, all subsequent requests fall in the queue behind the update request. Once the update request has been filled, all read requests in the queue rush in and get their read lock, return the object, and, if there are no other updates, the queue disappears. In this architecture, locks are taken at the object level, so false waits and false deadlocks do not occur.
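The queueing behavior can be modeled with a small Python sketch. It is a simplified, single-threaded illustration: the `ObjectLock` class is hypothetical, there are only two lock modes, and timeouts are left out.

```python
from collections import deque

class ObjectLock:
    """Toy per-object lock: shared readers, one writer, FIFO wait queue."""
    def __init__(self):
        self.readers, self.writer = set(), None
        self.queue = deque()             # (client, mode) requests that must wait

    def request(self, client, mode):
        """Return True if granted immediately, False if queued."""
        if not self.queue and self._compatible(mode):
            self._grant(client, mode)
            return True
        self.queue.append((client, mode))    # fall in behind earlier requests
        return False

    def release(self, client):
        self.readers.discard(client)
        if self.writer == client:
            self.writer = None
        # Grant from the head of the queue while requests are compatible.
        while self.queue and self._compatible(self.queue[0][1]):
            self._grant(*self.queue.popleft())

    def _compatible(self, mode):
        if mode == "read":
            return self.writer is None
        return self.writer is None and not self.readers

    def _grant(self, client, mode):
        if mode == "read":
            self.readers.add(client)
        else:
            self.writer = client

lock = ObjectLock()
first_read = lock.request("r1", "read")      # granted at once
update = lock.request("w1", "update")        # queued: a reader holds the object
later_read = lock.request("r2", "read")      # queued behind the update request
lock.release("r1")                           # update request is now filled
writer_after_r1 = lock.writer
lock.release("w1")                           # queued reader rushes in
```

Because each object has its own queue, a request waits only on clients that hold that same object.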
Note that Versant supports other ways of keeping client caches in sync. For example, VOD can use an optimistic locking strategy, using a classic timestamp mechanism. VOD also provides forms of client cache synchronization using multi-cast. Additionally it provides an event mechanism where clients can register for triggering events within the database server to be used for synchronization or for business logic work flow.
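Optimistic locking of the kind mentioned above can be sketched in Python as a check-on-write scheme. The sketch assumes a simple integer counter in place of a timestamp, and the `Store` class and `StaleObjectError` exception are hypothetical names, not part of any Versant API.

```python
class StaleObjectError(Exception):
    """Raised when a client writes an object it read at an older version."""

class Store:
    def __init__(self):
        self.rows = {}                       # oid -> (version, value)

    def read(self, oid):
        return self.rows[oid]                # client keeps the version it saw

    def write(self, oid, seen_version, value):
        current, _ = self.rows.get(oid, (0, None))
        if current != seen_version:          # someone else committed in between
            raise StaleObjectError(f"{oid}: saw v{seen_version}, store has v{current}")
        self.rows[oid] = (current + 1, value)

store = Store()
store.write("acct", 0, 100)                  # initial version becomes 1

version_a, _ = store.read("acct")            # two clients read without locking
version_b, _ = store.read("acct")

store.write("acct", version_a, 90)           # first writer wins
try:
    store.write("acct", version_b, 80)       # second writer holds a stale version
    conflict = False
except StaleObjectError:
    conflict = True
```

No locks are held between read and write; the conflict surfaces only at commit time, and the losing client must reload and retry.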
Versant supports multiple file and multiple process configurations. Data storage is done in a single file or in multiple files, and there are supporting files for the logging subsystem ( logical and physical log files ). These logging files are used for high performance and scalability under concurrent user loads and for online database backup processes.
Versant is a multi-user client-server database and has production applications with thousands of concurrently connected users. That being said, Versant can also run linked and embedded in the same address space as your application process, so it can also serve as an embedded database.
Versant uses internal performance and scalability benchmarks to monitor and measure behavior over time across releases, patches and generations of new hardware. It’s an ongoing effort to improve in performance and scalability.
Versant has done other non-standard benchmarking activities in a public forum.[2][3]
At one point in time, Versant ran the OO7 benchmark, but that is long outdated, originating back in the early 1990s. There are currently no industry benchmarks that make sense for object databases, so Versant doesn’t run any of the benchmarks recognized as standards in the database industry.
Versant took a serious look at the latest TPC-E, which was supposed to be the new OLTP standard database benchmark with new complex models aimed at being representative of today’s computing environment. The TPC-E is based on a financial trading system model.
Unfortunately, it was impossible to get real comparative results from TPC. The reason is that the TPC specifies requirements regarding what part of the code resides in the “driver” of the benchmark and what part resides in “database” functionality. However, the driver to application logic interface is completely defined at the data level. This means that when measuring relational access, you would not incur any overhead for mapping into a C++ object. The mapping of the raw data into whatever form was necessary in the driver to implement the business logic was completely outside of the benchmark measurements. With an object database, by contrast, you need to un-map the C++ objects into the driver data structures and, in doing so, measure the cost of that activity as part of the benchmark timings.
This is exactly the opposite of a real-world application. In the real world, people write object-oriented applications resulting in object-oriented models. If you choose a relational database, you need to map/un-map from your objects to the relational data structures. The TPC-E was written in a way that excludes the “mapping effect” from the measurements, which, by the very nature of how an object database works, means the TPC-E forces measurement of an “un-mapping effect”, an activity which does not occur in a real-world application.
Thus with TPC-E, the true cost of computing is removed for relational databases and, even worse, added for object databases. It was very disappointing because Versant had high hopes of finally delivering a public result representative of the technology’s true value proposition, the reason why VOD has been selected for so many of the world’s most demanding applications.
Versant provides Add-on Modules for deployments or access to Versant Object Database.
Usually, the “best kind of application” for a Versant database is one requiring an application-specific database of an OLTP nature. In other words, a non-traditional I.T. type of transactional application. That being said, there are certain characteristics which, when exhibited in an application, indicate a stronger value add by Versant.
Those characteristics are: complex models, large amount of data, large number of concurrent users. Any one of those three characteristics starts down a path of Versant value and at the extreme end, where you have all of those characteristics, Versant provides clear distinguishing value. The whole reason Versant still exists is that it provides better performance and scalability for applications with the above characteristics over traditional relational technology.
To that end, Versant is found in applications within many different vertical industries where those characteristics come into play. Versant runs global trading platforms for the world’s largest stock exchanges, network management for the world’s largest telecommunications providers, intelligence analytics for defense agencies, reservation systems for the largest airline and hotel companies, risk management analytics for banking and transportation organizations, massive multi-player gaming systems, network security and fraud detection, local number portability, advanced simulations, social networking, and so on, as a representative set of industries and applications exhibiting those characteristics.