Dimensional database
From Wikipedia, the free encyclopedia
A dimensional database is one which, rather than storing data in multiple two dimensional tables (as a relational database does), represents key data entities as different dimensions.
It should be noted that the idea of a table in a relational database being a two dimensional structure is a fallacy. Consider that the dimensions of a cube can be represented in a relation (the mathematical structure on which tables in a relational database are based). If you accept that a table in a relational database is two dimensional then you must also accept that cubes are two dimensional, a clearly absurd position. Thus on closer inspection a standard relational database already is multidimensional and the "dimensional database" is merely a reporting tool. So, so called multidimensional database systems offer a reporting tool that sits on top of a relational system providing a more compact view of multi-dimensional data. For example, in multi-dimensional analysis, data entities such as products, regions, customers, dates etc. may all represent different dimensions. The dimensions are essentially attributes from a relational point of view of course. This intrinsic feature of the database structure will be covered in depth in subsequent sections of this article.
Some further advantages to this database model are:
- The ability to analyse large amounts of data with very fast response times.
- To "slice and dice" through data, and "drill down or roll up" through various dimensions of the defined data structure.
- To quickly identify trends or problem areas that would have been otherwise overlooked in an industry environment.
Contents |
[edit] Description
Multi-dimensional data structures can be implemented with multi-dimensional databases, or else they can also be implemented in a relational database management system using relations.
The star schema is a means of storing data based on a set of known database dimensions, attempting to store a multi-dimensional data structure in a two-dimensional relational database management system (RDBMS). A star schema model is a representation of a central fact table with foreign keys to many dimension tables. The snowflake schema is a normalized implementation of dimensional data with foreign keys in the primary dimension tables referencing additional dimensional data. A snowflake does not increase the dimensionality of the model as the dimensionality (or grain) is defined by the dimensional foreign keys in the fact table. As the relational model is also multi-dimensional it also follows that so-called multi-dimensional database also cannot increase the "dimensionality" of the data. Use of snowflakes in a relational dimensional model is generally discouraged as it can have a significant impact on query performance. Normally snowflakes are eliminated by denormalizing the 'outlying' dimensional data into a primary dimension table.
[edit] History
The relational database model uses a structure of attributes within tuples within relations to represent data. As such the relation model requires no further enhancements to make it "multidimensional". Edgar F. Codd first designed this model in 1970, while working for IBM, and its simplicity revolutionized database usage at the time. Codd's work was in many ways ahead of its time, as computing power could not support the overheads of his database system (Hasan 1999).
In the 1980s the power of computers had grown to the point where these overheads were no longer a problem, and today relational database management systems (RDBMS) are available on local desktops, as well as large organisational database management servers.
[edit] Why use dimensional databases?
The techniques of entity-relationship (ER) modelling and the structuring of data in normalised tables have become popular with trained database administrators and designers, who routinely use relational DBMS to store huge volumes of organizational data with very high transaction rates.
Although deceptively simple to design and operate, relational database simplicity for the end-user does fall down when it comes to running queries. Accessing data from relational databases may require complex joins of many tables and is distinctly non-trivial for untrained end-users, who may be forced to hire IT professionals to structure such queries in a query language, such as SQL. When queries of a writing nature are run, such as INSERT, DELETE and ALTER TABLE, the consequences of getting it wrong are greatly increased when they are employed on a live system environment. Though of course all these difficulties can be avoided simply by creating a view in the relational database which the user can then query comfortably as if it was a single table using standard and widely available query by example style tools.
In a multi-dimension database system, the data is presented to the user in such a way as to represent a hypercube, or multi-dimensional array, where each individual data value is contained within a cell accessible by multiple indexes.
The multi-dimensional array structure represents a report on relational tables with the attribute names placed somewhere other than at the head of each column in the report. The result is a more readable view of the data because our perspectives of the data are more compactly represented in the report.
[edit] Advantages
Apart from the inherent advantages of using a multi-dimensional array structure, multi-dimensional databases also contain the following advantages.
- Enhanced Data Presentation and Navigation: Intuitive spreadsheet-like views of the data are the output of multi-dimensional databases. Such views are difficult to generate in relational systems without the use of complex SQL queries.
- Ease of Maintenance: Multi-dimensional databases are very easy to maintain, because data is stored in the same way as it is viewed, that is according to its fundamental attributes, so no additional computational overhead is required for queries of the database. Compare this to relational system, where complex indexing and joins may be used that require significant maintenance and overhead.
- Increased Performance: Multi-dimensional database achieve performance levels that are well in excess of that of relational systems performing similar data storage requirements. These high performance levels encourage and enable OLAP applications. Performance can be improved in relational systems through database tuning, but the database cannot be tuned for every possible on-the-fly query. In relational systems, tuning is quite specific, therefore decreasing flexibility, and also requires expensive database specialists.
In summary, multi-dimensional database systems are a complementary technology to entity relational systems, and in some circumstances it makes more sense to use multi-dimensional arrays rather than relational tables.
Where multi-dimensional systems excel over their relational system counterparts is in the area of data presentation and analysis, where the data in question leads itself to being suitable for multi-dimensional systems, such as where complex inter-relationships exist.
The top-level views of data over many combinations of dimensions make multi-dimensional systems particularly useful for trend analysis over time by management staff of organizations, due to the ease of viewing the data in a more naturally intuitive way.
[edit] Disadvantages
Data redundancy. In order to create the "dimensional" database you have to copy all of your data somewhere else with all the resulting risks of inconsistencies that result from such activity.
Also someone still needs to write the queries on the relational database to load the data in the "dimensional" database so no effort is saved here.