Object-relational mapping

From Wikipedia, the free encyclopedia

For another use of "ORM" see Object role modelling.

Object-relational mapping (also known as ORM, O/RM, and O/R mapping) is a programming technique for converting data between the incompatible type systems of relational databases and object-oriented programming languages. This creates, in effect, a "virtual object database" which can be used from within the programming language. Both free and commercial packages that perform object-relational mapping are available, although some programmers opt to create their own ORM tools.

Problem description

Data management tasks in object-oriented (OO) programming are typically implemented by manipulating objects, which are almost always non-scalar values. For example, consider an address book entry which represents a single person along with zero or more phone numbers and zero or more addresses. This could be modeled in an object-oriented implementation by a "person object" with "slots" to hold the data that comprise the entry: the person's name, a list (or array) of phone numbers, and a list of addresses. The list of phone numbers would itself contain "phone number objects" and so on. The address book entry is treated as a single value by the programming language (it can be referenced by a single variable, for instance). Various methods can be associated with the object, such as a method to return the preferred phone number, the home address, and so on.
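The address book entry described above might be modeled roughly as follows. This is only an illustrative sketch: the class names, field names, and the rule for picking a preferred number are assumptions made for the example, not part of any particular ORM package.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PhoneNumber:
    number: str
    kind: str = "home"        # hypothetical label, e.g. "home" or "work"
    preferred: bool = False


@dataclass
class Address:
    street: str
    city: str
    kind: str = "home"


@dataclass
class Person:
    """One address book entry: a name plus zero or more phones and addresses."""
    name: str
    phone_numbers: List[PhoneNumber] = field(default_factory=list)
    addresses: List[Address] = field(default_factory=list)

    def preferred_phone(self) -> Optional[PhoneNumber]:
        # Return the number flagged as preferred, else the first one, else None.
        for phone in self.phone_numbers:
            if phone.preferred:
                return phone
        return self.phone_numbers[0] if self.phone_numbers else None

    def home_address(self) -> Optional[Address]:
        # Return the first address labelled "home", or None if there is none.
        for address in self.addresses:
            if address.kind == "home":
                return address
        return None


# The whole entry is a single non-scalar value held by one variable.
entry = Person(
    name="Ada Example",
    phone_numbers=[
        PhoneNumber("555-0100"),
        PhoneNumber("555-0101", kind="work", preferred=True),
    ],
    addresses=[Address("1 Sample Street", "Springfield")],
)
```

Here `entry` holds the name, the list of phone number objects, and the list of address objects together, and the methods answer questions about the entry as a whole.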

However, many popular database products, such as SQL DBMSs, can only store and manipulate scalar values such as integers and strings organized within tables.

The programmer must either convert the object values into groups of simpler values for storage in the database (and convert them back upon retrieval), or only use simple scalar values within the program. Object-relational mapping is used to implement the first approach.

The crux of the problem is translating those objects to forms which can be stored in the database, and which can later be retrieved easily, while preserving the properties of the objects and their relationships; these objects are then said to be persistent.
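A hand-written version of this conversion can be sketched with Python's built-in sqlite3 module. The table layout and the helper names `create_schema`, `save_person`, and `load_person` are hypothetical choices for the example; an ORM package would generate equivalent statements itself.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str
    phone_numbers: List[str] = field(default_factory=list)


def create_schema(conn: sqlite3.Connection) -> None:
    # Two tables of scalar columns stand in for one nested object.
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE phone (
            id INTEGER PRIMARY KEY,
            person_id INTEGER NOT NULL REFERENCES person(id),
            number TEXT NOT NULL
        );
        """
    )


def save_person(conn: sqlite3.Connection, person: Person) -> int:
    # Break the object into scalar values: one person row, one row per phone.
    cur = conn.execute("INSERT INTO person (name) VALUES (?)", (person.name,))
    person_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO phone (person_id, number) VALUES (?, ?)",
        [(person_id, number) for number in person.phone_numbers],
    )
    conn.commit()
    return person_id


def load_person(conn: sqlite3.Connection, person_id: int) -> Person:
    # Reassemble the object from the rows that were stored for it.
    row = conn.execute(
        "SELECT name FROM person WHERE id = ?", (person_id,)
    ).fetchone()
    if row is None:
        raise KeyError(person_id)
    phones = conn.execute(
        "SELECT number FROM phone WHERE person_id = ? ORDER BY id", (person_id,)
    ).fetchall()
    return Person(name=row[0], phone_numbers=[number for (number,) in phones])


conn = sqlite3.connect(":memory:")
create_schema(conn)
pid = save_person(conn, Person("Ada Example", ["555-0100", "555-0101"]))
restored = load_person(conn, pid)
```

The round trip stores one person as three rows in two tables and rebuilds an equal object on retrieval. Writing this by hand for every class is the work that object-relational mapping aims to automate.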

Implementations

The most common type of database used is the relational database, which predates the rise of object-oriented programming in the 1990s. Relational databases use a series of tables to organize data. Data in different tables are associated through the use of declarative constraints, rather than explicit pointers or links. The same data that can be stored in a single object value would likely need to be stored across several of these tables.
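The difference between a declarative constraint and an explicit pointer can be seen in a small sketch, here using SQLite through Python's sqlite3 module. The table and column names are assumptions for the example. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE phone (
        id INTEGER PRIMARY KEY,
        -- The association is declared as a constraint on a plain integer column.
        person_id INTEGER NOT NULL REFERENCES person(id),
        number TEXT NOT NULL
    )
    """
)

conn.execute("INSERT INTO person (id, name) VALUES (1, 'Ada Example')")
conn.execute("INSERT INTO phone (person_id, number) VALUES (1, '555-0100')")


def try_insert_orphan(connection: sqlite3.Connection) -> bool:
    """Try to store a phone row for a person that does not exist."""
    try:
        connection.execute(
            "INSERT INTO phone (person_id, number) VALUES (99, '555-0199')"
        )
    except sqlite3.IntegrityError:
        return False  # the database rejected the dangling reference
    return True


orphan_accepted = try_insert_orphan(conn)
```

The phone row does not point at the person row. It carries a matching value, and the database checks that value against the declared constraint.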

An object-relational mapping implementation should systematically and predictably choose which tables to use and generate the necessary SQL. The object-relational impedance mismatch between object-oriented languages, such as Java or C++, and data stored in a relational database management system (RDBMS), such as Oracle or IBM DB2, presents a number of challenges in achieving this.

The real value of an ORM tool lies in saving time, simplifying development (the ORM tool handles the complexity for the developer), increasing performance or scalability, and minimizing architectural challenges that stem from limitations of the tool or of the developer's experience.

Many packages have been developed to reduce the tedium of developing object-relational mapping systems by providing libraries of classes which are able to perform mappings automatically. Given a list of tables in the database, and objects in the program, they will automatically map requests from one to the other. Asking a person object for its phone numbers will result in the proper query being created and sent, and the results being translated directly into phone number objects inside the program. [1]
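The behaviour of asking a person object for its phone numbers can be sketched as a property that runs the query on first access. This is a simplified illustration rather than the design of any specific package: the `Person` class, its `phone_numbers` property, and the schema are hypothetical, and sqlite3's `set_trace_callback` is used only to record which statements reach the database.

```python
import sqlite3


class PhoneNumber:
    def __init__(self, number: str) -> None:
        self.number = number


class Person:
    def __init__(self, conn: sqlite3.Connection, person_id: int, name: str) -> None:
        self._conn = conn
        self.id = person_id
        self.name = name
        self._phones = None  # nothing has been fetched yet

    @property
    def phone_numbers(self):
        # First access builds and sends the query; later accesses reuse the result.
        if self._phones is None:
            rows = self._conn.execute(
                "SELECT number FROM phone WHERE person_id = ? ORDER BY id",
                (self.id,),
            ).fetchall()
            self._phones = [PhoneNumber(number) for (number,) in rows]
        return self._phones


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE phone (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(id),
        number TEXT NOT NULL
    );
    INSERT INTO person (id, name) VALUES (1, 'Ada Example');
    INSERT INTO phone (person_id, number) VALUES (1, '555-0100'), (1, '555-0101');
    """
)

statements = []                             # SQL text seen by the database
conn.set_trace_callback(statements.append)

person = Person(conn, 1, "Ada Example")
numbers = [phone.number for phone in person.phone_numbers]
numbers_again = [phone.number for phone in person.phone_numbers]
```

The calling code only reads an attribute. The query is created, sent, and translated into phone number objects behind that attribute access, and the second read is served from memory.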

From a programmer's perspective, the system should look like a persistent object store. One can create objects and work with them as one would normally, and they automatically end up in the database.

In practice, however, things are never quite that simple. All ORM systems tend to make themselves visible in various ways, reducing to some degree one's ability to ignore the database. Worse, the translation layer can be slow and inefficient (notably in terms of the SQL it writes), resulting in programs that are slower and use more memory than code written "by hand."

A number of ORM systems have been created over the years, but their effect on the market seems mixed. NeXT's Enterprise Objects Framework (EOF) was once considered one of the best such systems, but it never achieved broad market share, chiefly because it was tightly tied to NeXT's entire toolkit, OpenStep. It was later integrated into NeXT's WebObjects, the first object-oriented Web application server. Since Apple Computer bought NeXT in 1997, EOF has provided the technology behind the company's e-commerce Web site, the .Mac services and the iTunes Music Store. Apple provides EOF in two implementations: the Objective-C implementation that comes with the Apple Developer Tools and the pure Java implementation that comes in WebObjects 5.X.

Enterprise Objects Framework has influenced and inspired many subsequent ORM efforts, including open source Apache Cayenne. Cayenne has similar goals to EOF and aims to meet the JPA standard.

An alternative approach is being taken with technologies such as RDF and SPARQL, and the concept of the "triplestore". RDF is a serialization of the subject-predicate-object concept, RDF/XML is an XML representation of it, SPARQL is an SQL-like query language, and a triplestore is a general description of any database that deals with a triple.

More recently, a similar system has started to evolve in the Java world, known as Java Data Objects (JDO). Unlike EOF, JDO is a standard, and several implementations are available from different vendors. The Enterprise Java Beans 3.0 (EJB3) specification also covers this same area. The two standards bodies have been in conflict over which standard should take precedence. JDO has several commercial implementations, while EJB 3.0 is still under development. However, most recently another new standard has been announced by the JCP to bring these two standards together and make the future standard something that works with various Java architectures.

Another example to mention is Hibernate, a popular O/R mapping framework in the Java world that has features similar to EJB3. NHibernate is a free, open-source port of Hibernate to the Microsoft .NET platform.

Service Data Objects is another standard driven by the need of delivering updatable datagraphs to business level components written in any programming language. Then the O/R mapping is done at the data access layer level, possibly driven by an enterprise Metadata repository, and reusable by every client application.

Non-SQL databases

Another solution would be to use an object-oriented database management system, which, as the name implies, is a database designed specifically for working with object-oriented values. Using an OODBMS would eliminate the need for converting data to and from its SQL form, as the data would be stored in its original object representation.

Databases such as Caché do not require manual ORM. SQL access to non-scalar values is already built in. Caché allows the developer to design any combination of OO and table structured storage within the database instead of resorting to external tool sets.

Object-oriented databases have yet to come into widespread use. One of their main limitations is that switching from an SQL DBMS to a purely object-oriented DBMS means you lose the capability to create SQL queries, a tried and tested method for retrieving ad-hoc combinations of data. For this reason, many programmers find themselves more at home with an object-SQL mapping system, even though most commercial object-oriented databases are able to process SQL queries to a limited extent. Caché has a built-in SQL parser so that interrogations on the object may be done in a straightforward SQL manner.

Other post-relational databases such as Matisse also utilise a SQL engine to interpret the object store.

Criticism

Some have proposed that the promotion of object-relational mapping tools is symptomatic of an intent to solve the wrong side of the object-relational impedance mismatch issue. The information principle underpinning relational databases implies that object orientation itself is inadequate for the full needs of data manipulation, and it is that 'paradigm' as a whole that should be addressed. If this were the case, ORM would be left redundant. In this view, the "impedance mismatch" and the supposed need for object-relational mapping arise from the mistaken equation of object and relation (table or view in SQL terms). The correct mapping in the relational model is between object and type. Also, ORM systems tend to perform worse than writing SQL directly for more complex tasks. However, most ORM systems allow writing raw SQL to some degree.

Because of the complexity associated with high-performance physical data models, it is often impossible to build any usable interface to a relational database which is easily navigable by a naive user. This is simply a consequence of having a high-quality, normalized relational database. When interfacing with a database, multiple tables must therefore be "merged" in the user's view, or alternatively, information must be abstracted into data formats such as XML, whose data structures are more easily managed by object-oriented methodologies.

These needs are almost ubiquitous in the enterprise environment, even when ORM tools are not being used. This suggests that many, if not all, data-driven systems with high-level interfaces which are not "ORM" in name must implement ORM on some level (especially since many enterprise systems are built on object-oriented frameworks such as Java). For example, a join operation on two related tables may be considered a type of object-relational mapping, since it merges lower-level data tuples into a more easily navigable, higher-order structure. Thus, a join is a way of hardcoding object-relational logic into SQL.
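The view of a join as a small piece of object-relational mapping can be illustrated with a sketch that joins two tables and folds the flat result rows into nested records. The schema, the sample data, and the helper name `group_rows` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE phone (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(id),
        number TEXT NOT NULL
    );
    INSERT INTO person (id, name) VALUES (1, 'Ada Example'), (2, 'Bob Example');
    INSERT INTO phone (person_id, number) VALUES (1, '555-0100'), (1, '555-0101');
    """
)

# The join yields one flat tuple per (person, phone) pair.
# LEFT JOIN keeps people who have no phone rows at all.
rows = conn.execute(
    """
    SELECT person.id, person.name, phone.number
    FROM person
    LEFT JOIN phone ON phone.person_id = person.id
    ORDER BY person.id, phone.id
    """
).fetchall()


def group_rows(flat_rows):
    """Fold flat join tuples into one nested record per person."""
    people = {}
    for person_id, name, number in flat_rows:
        record = people.setdefault(person_id, {"name": name, "phones": []})
        if number is not None:  # NULL marks a person without phones
            record["phones"].append(number)
    return list(people.values())


people = group_rows(rows)
```

The SQL join does the merging of tuples, and the few lines of grouping code supply the higher-order structure that application code navigates.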

ORM tools have been criticized for their tendency to eclipse the work done by database administrators for optimization and performance. Most tools address this issue by also allowing hardcoded SQL and stored procedure calls, and by implementing dirty solutions for data acquisition and insertion, such as transaction management. In addition, the convenience of caching custom-selected data sets in memory using ORM methodologies makes the implementation of dirty solutions at the database level unnecessary in many environments.

In this context, an ORM tool is simply any tool which joins relational data in the context of a business object required for some real-world application.

References

  1. Animation showing how an object-relational mapping utility works
