ROLAP
From Wikipedia, the free encyclopedia
ROLAP stands for Relational Online Analytical Processing.
ROLAP is an alternative to the MOLAP (Multidimensional OLAP) technology. While both ROLAP and MOLAP analytic tools are designed to allow analysis of data through the use of a multidimensional data model, ROLAP differs significantly in that it does not require the pre-computation and storage of information. Instead, ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it. With ROLAP, it is possible to create additional database tables (summary tables or aggregations) which summarize the data at any desired combination of dimensions.
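The query-generation step described above can be sketched as follows. This is a minimal illustration, not how any particular ROLAP product works: the schema (`fact_sales`, `dim_product`), the `LEVELS` mapping, and the helper names `build_query` and `query_cube` are all hypothetical, and an in-memory SQLite database stands in for the relational source.

```python
import sqlite3

# Hypothetical star schema: one fact table joined to one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT, name TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, region TEXT, amount REAL);
    INSERT INTO dim_product VALUES
        (1, 'Bikes', 'Road bike'), (2, 'Bikes', 'City bike'), (3, 'Helmets', 'Sport helmet');
    INSERT INTO fact_sales VALUES
        (1, 'North', 100.0), (2, 'North', 50.0), (3, 'South', 20.0), (1, 'South', 30.0);
""")

# Dimension levels exposed by the (assumed) model, mapped to SQL columns.
# Using a fixed mapping also keeps user input out of the SQL text.
LEVELS = {"category": "p.category", "product": "p.name", "region": "f.region"}

def build_query(levels):
    """Generate the SQL that aggregates the measure at the requested levels."""
    if not levels:
        raise ValueError("at least one dimension level is required")
    cols = ", ".join(LEVELS[level] for level in levels)
    return (
        f"SELECT {cols}, SUM(f.amount) AS total "
        "FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id "
        f"GROUP BY {cols} ORDER BY {cols}"
    )

def query_cube(levels):
    """Run the generated query at request time; nothing is precomputed."""
    return conn.execute(build_query(levels)).fetchall()

by_category = query_cube(["category"])
by_category_region = query_cube(["category", "region"])
print(by_category)
print(by_category_region)
```

The same detail rows answer both requests; only the generated GROUP BY clause changes with the level the user asks for.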
Although ROLAP uses a relational database as its source, the database generally must be carefully designed for ROLAP use. A database that was designed for OLTP will not function well as a ROLAP database, so ROLAP still involves creating an additional copy of the data. However, because that copy is an ordinary relational database, a variety of technologies can be used to populate it.
ROLAP vs. MOLAP
The discussion of the advantages and disadvantages of ROLAP below focuses on what is true of the most widely used ROLAP and MOLAP tools available today. In some cases there will be tools that are exceptions to a generalization made here.
Advantages of ROLAP
- ROLAP is considered to be more scalable in handling large data volumes, especially models with dimensions of very high cardinality (i.e., millions of members).
- With a variety of data loading tools available, and the ability to fine-tune the ETL code to the particular data model, load times are generally much shorter than with the automated MOLAP loads.
- The data is stored in a standard relational database and can be accessed by any SQL reporting tool (the tool does not have to be an OLAP tool).
- ROLAP tools are better at handling non-aggregatable facts (e.g. textual descriptions). MOLAP tools tend to suffer from slow performance when querying these elements.
- By decoupling the data storage from the multi-dimensional model, it is possible to successfully model data that would not otherwise fit into a strict dimensional model.
Disadvantages of ROLAP
- There is a general consensus in the industry that ROLAP tools have slower performance than MOLAP tools. However, see the discussion below about ROLAP performance.
- The loading of aggregate tables must be managed by custom ETL code. The ROLAP tools do not help with this task. This means additional development time and more code to support.
- Many ROLAP dimensional model implementors skip the step of creating aggregate tables. Query performance then suffers because the larger detailed tables must be queried. This can be partially remedied by adding aggregate tables; however, it is still not practical to create aggregate tables for all combinations of dimensions and attributes.
- ROLAP relies on the general-purpose database for querying and caching, and therefore several special techniques employed by MOLAP tools are not available (such as special hierarchical indexing). However, modern ROLAP tools take advantage of the latest improvements in the SQL language, such as the CUBE and ROLLUP operators, DB2 Cube Views, and other SQL OLAP extensions. These SQL improvements can narrow the advantage of MOLAP tools.
- Since ROLAP tools rely on SQL for all of the computations, they are not suitable when the model is heavy on calculations which don't translate well into SQL. Examples of such models include budgeting, allocations, financial reporting and other scenarios.
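The aggregate-table points in the list above can be illustrated with a small sketch: a hand-written ETL step that rebuilds one summary table, and a lookup that uses the summary only when it covers the requested dimensions. The table names, the `SOURCES` list, and the functions `load_monthly_aggregate` and `total_by` are assumptions made for this example, with SQLite standing in for the warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (sale_date TEXT, store TEXT, product TEXT, amount REAL);
    INSERT INTO fact_sales VALUES
        ('2005-01-03', 'S1', 'P1', 10.0),
        ('2005-01-17', 'S1', 'P2', 5.0),
        ('2005-02-02', 'S2', 'P1', 7.0),
        ('2005-02-09', 'S1', 'P1', 3.0);
""")

def load_monthly_aggregate(conn):
    """Custom ETL step: rebuild the (month, store) summary from the detail rows."""
    conn.executescript("""
        DROP TABLE IF EXISTS agg_sales_month_store;
        CREATE TABLE agg_sales_month_store AS
        SELECT substr(sale_date, 1, 7) AS month, store, SUM(amount) AS amount
        FROM fact_sales
        GROUP BY substr(sale_date, 1, 7), store;
    """)

# Candidate tables, smallest first, with the SQL expression for each dimension.
SOURCES = [
    ("agg_sales_month_store", {"month": "month", "store": "store"}),
    ("fact_sales", {"month": "substr(sale_date, 1, 7)", "store": "store", "product": "product"}),
]

def total_by(conn, dims):
    """Answer from the first (smallest) table that covers every requested dimension."""
    if not dims:
        raise ValueError("at least one dimension is required")
    for table, exprs in SOURCES:
        if all(d in exprs for d in dims):
            cols = ", ".join(exprs[d] for d in dims)
            sql = f"SELECT {cols}, SUM(amount) FROM {table} GROUP BY {cols} ORDER BY {cols}"
            return table, conn.execute(sql).fetchall()
    raise KeyError(f"no table covers dimensions {dims}")

load_monthly_aggregate(conn)
used_table, monthly = total_by(conn, ["month"])
detail_table, by_product = total_by(conn, ["product"])
print(used_table, monthly)
print(detail_table, by_product)
```

A request by month is served from the summary table, while a request by product falls back to the detail table because no aggregate covers it. This is the case the list describes as impractical to eliminate for every combination of dimensions.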
Performance of ROLAP
OLAP Survey
In the OLAP industry, ROLAP is usually perceived as being able to scale to large data volumes but as suffering from slower query performance than MOLAP. The OLAP Survey, the largest independent survey across all major OLAP products, was conducted for five years (2001 to 2005) and consistently found that companies using ROLAP report slower performance than those using MOLAP.
However, as with any survey there are a number of subtle issues that must be taken into account when interpreting the results.
- ROLAP tools are generally selected by companies with larger volumes of data (high-cardinality dimensions) because of ROLAP's superior scalability, and the same survey consistently confirms this. In the OLAP Survey 3 results, ROLAP tools had a median data volume of 312 GB compared to 4 GB for MOLAP tools. [1] Larger data volumes lead to longer query times.
- The survey also shows that ROLAP tools have 7 times more users than MOLAP tools within each company. Systems with more users will tend to suffer more performance problems at peak usage times.
- There is also a question about complexity of the model, measured both in number of dimensions and richness of calculations. The survey does not offer a good way to control for these variations in the data being analyzed.
Downside of flexibility
Some companies select ROLAP because they intend to re-use existing relational database tables, which are frequently not optimally designed for OLAP use. The superior flexibility of ROLAP tools allows this less-than-optimal design to work, but performance suffers. MOLAP tools, in contrast, would force the data to be re-loaded into an optimal OLAP design.
Trends
The undesirable trade-off between additional ETL cost and slow query performance has ensured that most commercial OLAP tools now use a "Hybrid OLAP" (HOLAP) approach, which allows the model designer to decide which portion of the data will be stored in MOLAP and which portion in ROLAP.
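As a rough sketch of the hybrid idea, the example below keeps one designer-chosen slice (totals by year) precomputed in memory, standing in for a MOLAP store, and answers everything else with SQL at request time. The split, the schema, and the function name `total` are hypothetical; real HOLAP products make this division in their own ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (year INTEGER, store TEXT, amount REAL);
    INSERT INTO fact_sales VALUES (2004, 'S1', 10.0), (2004, 'S2', 4.0), (2005, 'S1', 6.0);
""")

# "MOLAP" portion (assumed choice): yearly totals computed once, ahead of any query.
precomputed_by_year = dict(
    conn.execute("SELECT year, SUM(amount) FROM fact_sales GROUP BY year")
)

def total(year, store=None):
    """Serve yearly totals from the precomputed store, finer detail from SQL."""
    if store is None:
        return "molap", precomputed_by_year[year]
    row = conn.execute(
        "SELECT SUM(amount) FROM fact_sales WHERE year = ? AND store = ?",
        (year, store),
    ).fetchone()
    return "rolap", row[0]

print(total(2004))
print(total(2004, "S2"))
```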
Products
Examples of commercial products using ROLAP include Microsoft Analysis Services, MicroStrategy, Business Objects, and Oracle BI (formerly Siebel Analytics). There is also an open source ROLAP server, Mondrian.