Dimensional database

From Wikipedia, the free encyclopedia

A dimensional database is one which, rather than storing data in multiple two-dimensional tables (as a relational database does), represents key data entities as different dimensions. That is, multi-dimensional database systems extend the relational system to provide a multi-dimensional view of the data (Rand). For example, in multi-dimensional analysis, data entities such as products, regions, customers and dates may each represent a different dimension. This intrinsic feature of the database structure is covered in more depth in later sections of this article.

Some advantages of this database model are:

The ability to analyse large amounts of data with very fast response times.
The ability to "slice and dice" through data, and to "drill down" or "roll up" through the various dimensions of the defined data structure.
The ability to quickly identify trends or problem areas that would otherwise have been overlooked in an industry environment.

1 Description
2 History
3 Why use dimensional databases?
4 Advantages
5 See also

Description

Multi-dimensional data structures can be implemented with multi-dimensional databases, or in a relational database management system using techniques such as the "star schema" and the "snowflake schema" (Weldon 1995).

The star schema is a means of storing data based on a set of known database dimensions, attempting to store a multi-dimensional data structure in a two-dimensional relational database management system (RDBMS). A star schema model is a representation of a central fact table with foreign keys to many dimension tables. The snowflake schema is a normalized implementation of dimensional data with foreign keys in the primary dimension tables referencing additional dimensional data. A snowflake does not increase the dimensionality of the model as the dimensionality (or grain) is defined by the dimensional foreign keys in the fact table. Use of snowflakes in a relational dimensional model is generally discouraged as it can have a significant impact on query performance. Normally snowflakes are eliminated by denormalizing the 'outlying' dimensional data into a primary dimension table.
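The star schema described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration rather than a recommended design: the table names (fact_sales, dim_product, dim_region, dim_date), the columns and the sample figures are all hypothetical and do not come from the cited sources.

```python
import sqlite3

# Hypothetical schema: all table names, column names and values are
# illustrative assumptions, not taken from the article's sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Central fact table: its foreign keys define the grain
    -- (one row per product, region and month).
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        region_id  INTEGER REFERENCES dim_region(region_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );

    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_region  VALUES (1, 'North'), (2, 'South');
    INSERT INTO dim_date    VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO fact_sales  VALUES
        (1, 1, 1, 100.0), (1, 2, 1, 50.0),
        (2, 1, 1, 75.0),  (1, 1, 2, 120.0);
""")


def sales_by_region(connection):
    """Total sales per region, joining the fact table to one dimension table."""
    rows = connection.execute("""
        SELECT r.name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_region AS r ON r.region_id = f.region_id
        GROUP BY r.name
        ORDER BY r.name
    """).fetchall()
    return dict(rows)


totals = sales_by_region(conn)
print(totals)
```

In this sketch every analytical query starts from the fact table and joins outward to whichever dimension tables it needs. A snowflake variant would split, say, dim_region into further normalized tables, adding joins without changing the grain of fact_sales.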

History

The relational database model uses a two-dimensional structure of rows and columns to store data, in tables of records corresponding to real-world entities. Tables can be linked by common key values. Edgar F. Codd first designed this model in 1970, while working for IBM, and its simplicity revolutionized database usage at the time. Codd's work was in many ways ahead of its time, as computing power could not support the overheads of his database system (Hasan 1999).

By the 1980s computing power had grown to the point where these overheads were no longer a problem, and today relational database management systems (RDBMS) are available on local desktops as well as on large organisational database servers.

Why use dimensional databases?

The techniques of entity-relationship (ER) modelling and the structuring of data in normalised tables have become popular with trained database administrators and designers, who routinely use relational DBMS to store huge volumes of organizational data with very high transaction rates.

Although relational databases are deceptively simple to design and operate, that simplicity breaks down for the end-user when it comes to running queries. Accessing data from relational databases may require complex joins of many tables and is distinctly non-trivial for untrained end-users, who may be forced to hire IT professionals to write such queries in a query language such as SQL. When statements that modify data or structure are run, such as INSERT, DELETE and ALTER TABLE, the consequences of a mistake are far greater on a live system.

In a multi-dimensional database system, the data is presented to the user as a hypercube, or multi-dimensional array, in which each individual data value is contained within a cell accessible by multiple indexes.
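As a rough illustration of a cell "accessible by multiple indexes", the following sketch models a small three-dimensional cube as a nested Python list. The dimensions (products, regions, months), the figures and the helper names cell, slice_by_month and roll_up_regions are assumptions made for this example; real multi-dimensional databases use their own storage formats and query languages.

```python
# Hypothetical dimensions and figures, for illustration only.
products = ["Widget", "Gadget"]
regions = ["North", "South"]
months = ["Jan", "Feb", "Mar"]

# cube[p][r][m] holds one sales value per (product, region, month) cell.
cube = [
    [[100, 120, 90], [50, 60, 70]],  # Widget: North, South
    [[75, 80, 85], [40, 30, 20]],    # Gadget: North, South
]


def cell(product, region, month):
    """Look up a single cell by one member from each dimension."""
    p = products.index(product)
    r = regions.index(region)
    m = months.index(month)
    return cube[p][r][m]


def slice_by_month(month):
    """Fix the month dimension, leaving a product-by-region table (a "slice")."""
    m = months.index(month)
    return {
        product: {region: cube[p][r][m] for r, region in enumerate(regions)}
        for p, product in enumerate(products)
    }


def roll_up_regions(product, month):
    """Aggregate away the region dimension for one product and month (a "roll up")."""
    p = products.index(product)
    m = months.index(month)
    return sum(cube[p][r][m] for r in range(len(regions)))


print(cell("Widget", "South", "Feb"))
print(slice_by_month("Jan"))
print(roll_up_regions("Gadget", "Mar"))
```

No join is needed in any of these operations: each dimension member maps directly to an array position, which is the property the hypercube presentation relies on.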

The multi-dimensional array structure represents a higher level of organization than the relational table. The structure itself represents a more intelligent view of the data it contains, because our perspectives of this data are embedded directly into the structure as dimensions, as opposed to being placed into fields.

Advantages

Apart from the inherent advantages of using a multi-dimensional array structure, multi-dimensional databases also offer the following advantages.

Enhanced Data Presentation and Navigation: Multi-dimensional databases produce intuitive, spreadsheet-like views of the data. Such views are difficult to generate in relational systems without complex SQL queries, and some requests, e.g. the top ten exam results, had no direct expression in early versions of standard SQL.
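To make the idea of a spreadsheet-like view concrete, the sketch below pivots flat records into a cross-tabulation and picks out the highest scores. The exam data and the function names cross_tab and top_n are hypothetical, and the code only imitates in plain Python the kind of output a multi-dimensional tool might present.

```python
from collections import defaultdict

# Hypothetical flat records: (student, subject, score).
records = [
    ("Ann", "Maths", 91), ("Ann", "Physics", 78),
    ("Bob", "Maths", 64), ("Bob", "Physics", 88),
    ("Cy", "Maths", 73), ("Cy", "Physics", 95),
]


def cross_tab(rows):
    """Pivot flat rows into a student-by-subject table of scores."""
    table = defaultdict(dict)
    for student, subject, score in rows:
        table[student][subject] = score
    return dict(table)


def top_n(rows, n):
    """Return the n rows with the highest scores, best first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


view = cross_tab(records)
best = top_n(records, 3)
print(view)
print(best)
```

Here each student becomes a row and each subject a column, much as in a spreadsheet, while top_n answers a ranking question of the "top ten exam results" kind on a smaller scale.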

Ease of Maintenance: Multi-dimensional databases are very easy to maintain because data is stored in the same way as it is viewed, that is, according to its fundamental attributes, so no additional computational overhead is required for queries of the database. Compare this to relational systems, where complex indexing and joins may be needed, requiring significant maintenance and overhead.

Increased Performance: Multi-dimensional databases achieve performance levels well in excess of those of relational systems meeting similar data storage requirements. These high performance levels encourage and enable OLAP (online analytical processing) applications. Performance can be improved in relational systems through database tuning, but a database cannot be tuned for every possible on-the-fly query. In relational systems, tuning is quite specific, which decreases flexibility, and it also requires expensive database specialists.

In summary, multi-dimensional database systems are a complementary technology to relational systems, and in some circumstances it makes more sense to use multi-dimensional arrays rather than relational tables.

Where multi-dimensional systems excel over their relational counterparts is in data presentation and analysis, where the data in question lends itself to multi-dimensional treatment, such as where complex inter-relationships exist.

The top-level views of data over many combinations of dimensions make multi-dimensional systems particularly useful for trend analysis over time by the management staff of organizations, because the data can be viewed in a more intuitive way.