User:Ben White/WIP
From Wikipedia, the free encyclopedia
Contents |
[edit] Essbase
[edit] History and Motivation
Although Essbase has been categorised as a general-purpose multidimensional database, it was originally developed to address the scalability issues associated with spreadsheets such as Lotus 1-2-3 and Microsoft Excel. Indeed, the patent covering Essbase uses spreadsheets as a motivating example to illustrate the need for such a system. [1]
In this context, "multi-dimensional" refers to the representation of financial data in spreadsheet format. A typical spreadsheet may display time intervals along column headings, and account names on row headings. For example:
Jan | Feb | Mar | Total | |
---|---|---|---|---|
Quantity | 1000 | 2000 | 3000 | 6000 |
Sales | $100 | $200 | $300 | $600 |
Expenses | $80 | $160 | $240 | $480 |
Profit | $20 | $40 | $60 | $120 |
If a user wants to break down these values by Region, for example, this typically involves the duplication of this table on multiple spreadsheets:
|
|
|
An alternative representation of this structure would be a three-dimensional spreadsheet grid, giving rise to the idea that "Time", "Account", and "Region" are dimensions. As further dimensions are added to the system, it becomes very difficult to maintain spreadsheets that correctly represent the multi-dimensional values. Multidimensional databases such as Essbase provide a data store for values that exist, at least conceptually, in a multi-dimensional hypercube.
[edit] Sparsity
A technical problem faced by multidimensional databases is the physical representation of data as the number and size of dimensions increases. Say the above example was extended to add a "Customer" and "Product" dimension:
Dimension | Number of dimension values |
---|---|
Accounts | 4 |
Time | 4 |
Region | 3 |
Customer | 10,000 |
Product | 5,000 |
If the multidimensional database reserved storage space for every possible value, it would need to store 2,400,000,000 (4 × 4 × 3 × 10000 × 5000) cells. If each cell is represented as a 64-bit floating point value, this equates to a memory requirement of at least 17 gigabytes. In practice, of course, the number of combinations of "Customer" and "Product" that contain meaningful values will be a tiny subset of the total space. This property of multi-dimensional spaces is referred to as sparsity.
[edit] Aggregation
OLAP systems generally provide for multiple levels of detail within each dimension by arranging the members of each dimension into one or more hierarchies. A Time dimension, for example, may be represented as a hierarchy starting with "Total Time", and breaking down into multiple years, then quarters, then months. An Accounts dimension may start with "Profit", which breaks down into "Sales" and "Expenses", and so on.
In the example above, if "Product" represents individual product SKUs, analysts may want to also be able to report using aggregations such as "Product Group", "Product Family", "Product Line", etc. Similarly, for "Customer", natural aggregations may arrange customers according to geographic location or industry.
The number of aggregate values implied by a set of input data can be surprisingly large. If the Customer and Product dimensions are each in fact six "generations" deep, then 36 (6 × 6) aggregate values are affected by a single data point. It follows that if all these aggregate values are to be stored, the amount of space required is proportional to the product of the depth of all aggregating dimensions. For large databases, this can cause the effective storage requirements to be many hundred times the size of the data being aggregated.
[edit] Block Storage
Since version 7, Essbase has supported two "storage options" which take advantage of sparsity to minimise the amount of physical memory and disk space required to represent large multidimensional spaces. The Essbase patent[1] describes the original method, which aimed to reduce the amount of physical memory required without increasing the time required to look up closely-related values. With the introduction of alternative storage options, this was named Block Storage Option (Essbase ASO), and later referred to as Essbase Analytics in marketing material.
Put briefly, Essbase requires the developer to tag dimensions as "dense" or "sparse". The system then arranges data to represent the hypercube into "blocks", where each block is multi-dimensional array made up of "dense" dimensions, and space is allocated for every potential cell in that block. Sparsity is exploited because the system only creates blocks when required. In the example above, say the developer has tagged "Accounts" and "Time" as "dense", and "Region", "Customer, and "Product" as "sparse". If there are, say, 12,000 combinations of Region, Customer and Product that contain data, then only 12,000 blocks will be created, each block large enough to store every possible combination of Accounts and Time. The number of cells stored is therefore 192000 (4 × 4 × 12000), requiring under 2 megabytes of memory, plus the size of the index used to look up the appropriate blocks.
Because this implementation is hidden from front-end tools (i.e., a report that attempts to retrieve data from non-existent cells merely sees "null" values), the full hypercube can be navigated naturally, and it is possible to load values into any cell interactively.
[edit] Calculation Engine
Calculations in Essbase BSO can be specified as:
- the aggregation of values through dimensional hierarchies;
- stored calculations on dimension members;
- "dynamically calculated" dimension members; or
- procedural "calculation scripts" that act on values stored in the database.
The first method (dimension aggregation) is implicitly performed through addition, or by selectively tagging branches of the hierarchy to be subtracted, multiplied, divided or ignored. Also, the result of this aggregation can be stored in the database, or calculated dynamically on demand -- members must be tagged as "Stored" or "Dynamic Calc." to specify which method is to be used.
The second method (stored calculations) uses a formula against each calculated dimension member -- when Essbase calculates that member, the result is stored against that member just like a data value.
The third method (dynamic calculation) is specified in exactly the same format as stored calculations, but are calculated when a value addressed by that member is accessed by a user, and are not stored.
The fourth method (calculation scripts) uses a procedural programming language specific to the Essbase calculation engine. This type of calculation may act upon any data value in the hypercube, and can therefore be used to perform calculations that cannot be expressed as a simple formula.
A calculation script must also be executed to trigger the calculation of aggregated values or stored calculations as described above -- a built-in calculation script (called the "default calculation") can be used to execute this type of calculation.
[edit] Aggregate Storage
Although Block Storage effectively minimizes storage requirements without impacting retrieval time, it is limited by its treatment of aggregate data in large applications, motivating the introduction of a second storage engine, named Aggregate Storage Option (Essbase ASO) or more recently, Enterprise Analytics. This storage option makes the database behave much more similarly to OLAP databases like SQL Server Analysis Services.
Following a data load, Essbase ASO does not store any aggregate values, but instead calculates them on demand. For large databases, where the time required to generate these values is inconvenient, the database can materialize one or more aggregate "views", made up of one aggregate level from each dimension (for example, the database may calculate all combinations of the fifth generation of Product with the third generation of Customer), and these views are then used to generate other aggregate values where possible. This process can be partially automated, where the administrator specifies the amount of disk space that may be used, and the database generates views according to actual usage.
The major drawback of this approach is that the cube cannot be treated for calculation purposes as a single large hypercube, because aggregate values cannot be directly controlled, so write-back from front-end tools, and complex calculations that cannot be expressed as MDX expressions are not possible.
[edit] Calculation Engine
Calculations in Essbase ASO can be specified as:
- the aggregation of values through dimensional hierarchies; or
- dynamically calculated dimension members.
The first method (dimension aggregation) is basically the same as for Essbase BSO.
The second method (dynamic calculations) are specified by entering a MDX expression against the dimension member.