Slowly Changing Dimension

From Wikipedia, the free encyclopedia

Dimension is a term in data management and data warehousing that refers to logical groupings of data such as geographical location, customer information, or product information. Slowly Changing Dimensions (SCD) are dimensions that have data that slowly changes.

For example, you may have a Dimension in your database that tracks the sales records of your company's salesmen. Creating sales reports seems simple enough, until a sales person is transferred from one regional office to another. How do you record such a change in your sales Dimension?

You could sum or average the sales by salesperson, but if you use that to compare the performance of sales people, that might give misleading information. If the sales person that was tranferred used to work in a hot market where sales were easy, and now works in a market where sales are infrequent, his totals will look much stronger than the other salespeople in his new region, even if they are just as good. Or you could create a second sales-person record and treat the transferred person as a new sales person, but that creates problems also.

Dealing with these issues involves SCD management methodologies referred to as Type 0, 1, 2, 3, and 4.


Contents

[edit] Type 0

Type 0 is used, somewhat infrequently, to refer to an SCD where no effort has been made to deal with the issues.


[edit] Type 1

The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all. This is most appropriate when correcting certain types of data errors, such as the spelling of a name. (Assuming you won't ever need to know how it used to be misspelled in the past.)

Another example would be of a database table that keeps supplier information.

Supplier_key Supplier_Name Supplier_State

001 Phlogistical Supply Company CA

Now imagine that this supplier moves their headquarters to Illinois. The updated table would simply overwrite this record:

Supplier_key Supplier_Name Supplier_State

001 Phlogistical Supply Company IL

The obvious disadvantage to this method of managing SCDs is that there is no historical record kept in the data warehouse. You can't tell if your suppliers are tending to move to the Midwest, for example. But an advantage to this is that these are very easy to maintain.

[edit] Type 2

The Type 2 method tracks historical data by creating multiple records in the dimensional tables with separate keys.

In the same example, if the supplier moves to Illinois, the table would look like this:

Supplier_key Supplier_Name Supplier_State

001 Phlogistical Supply Company CA

002 Phlogistical Supply Company IL


[edit] Type 3

The Type 3 method track changes using separate columns. Where the original table structure in Type 1 and Type 2 was very similar, Type 3 will add additional columns to the tables:

Supplier_key Supplier_Name Original_Supplier_State Effective_Date Current_Supplier_State

001 Phlogistical Supply Company CA 12-22-2004 IL

Note that this record can not track all historical changes, such as when a supplier moves twice.


[edit] Type 4

The Type 4 method is usually just referred to as using "history tables", where one table keeps the current data, and an additional table is used to keep a record of some or all changes.

Following the example above, the original table might be called Supplier and the history table might be called Supplier_History.


[edit] Combining Types

Different types can be applied to different columns of a table. For example, we can apply Type 1 to Supplier Name column and Type 2 to Supplier State column of the same table, the Supplier table.


[edit] Alternative Typing

Todd Schraml has a well-written article available at Data Management Review.