Surrogate key

From Wikipedia, the free encyclopedia

A surrogate key in a database is a unique identifier for either an entity in the modelled world or an object in the database. The surrogate key is not derived from application data.

1 History
2 Definition
3 Surrogates in practice
4 Advantages of Surrogate Keys
5 Disadvantages of Surrogate Keys
6 See also
7 References

History

The concept of a surrogate is discussed by Langefors (1968) and Engles (1972), but the terms entity-surrogate and surrogate were coined by Hall, Owlett and Todd in a paper published in Nijssen (1976). They explain that "surrogates are representatives where the real thing cannot represent itself," and "that every entity of the outside world is associated with a surrogate which stands for that object in the model. If we wish to refer to an object in the model, we refer to its surrogate using one or more properties to identify uniquely the surrogate required."

On this final point, the authors are saying that if a surrogate value is insufficient to uniquely identify an object in the database, then one or more additional entity attributes may be used.

Definition

There appear to be two definitions of a surrogate in the literature. We shall call these surrogate (1) and surrogate (2):

Surrogate (1) This definition is based on that given by Hall, Owlett and Todd (1976). Here a surrogate represents an entity in the outside world. The surrogate is internally generated by the system but is nevertheless visible to the user or application.

Surrogate (2) This definition is based on that given by Wieringa and de Jonge (1991). Here a surrogate represents an object in the database itself. The surrogate is internally generated by the system and is invisible to the user or application.

We shall adopt the surrogate (1) definition throughout this article, largely because it is oriented towards the data model rather than the storage model. See Date (1998).

An important distinction exists between a surrogate and a primary key, depending on whether the database is a current database or a temporal database. A current database stores only currently valid data, so there is a one-to-one correspondence between a surrogate in the modelled world and the primary key of some object in the database. In this case the surrogate may be used as a primary key, hence the term surrogate key. In a temporal database, however, there is a many-to-one relationship between primary keys and the surrogate. Since there may be several objects in the database corresponding to a single surrogate, the surrogate cannot be used as a primary key; another attribute is required, in addition to the surrogate, to uniquely identify each object.

Although Hall et al. (1976) say nothing about this, other authors have argued that a surrogate should satisfy the following constraints:

the value is unique system-wide, hence never reused
the value is system generated
the value is not manipulable by the user or application
the value contains no semantic meaning
the value is not visible to the user or application
the value is not composed of several values from different domains.

Surrogates in practice

In a current database, the surrogate key is the primary key, generated by the database management system and not derived from any application data in the database. The only significance of the surrogate key is to act as the primary key.

A surrogate key is frequently a sequential number (e.g. a Sybase or SQL Server "identity column", an Oracle SEQUENCE, or a column defined with AUTO_INCREMENT in MySQL), but it does not have to be. Keeping the key independent of all other columns insulates the database relationships from changes in data values or database design (making the database more agile) and guarantees uniqueness.
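As a minimal sketch of a sequential surrogate key, the following uses Python's built-in sqlite3 module with an in-memory database. The customer and purchase tables, their columns, and the sample values are hypothetical and chosen only for illustration. The system assigns customer_id, and changing the e-mail address leaves the relationship between the two tables intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, system generated
    email       TEXT NOT NULL UNIQUE                -- application data (natural candidate key)
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    item        TEXT NOT NULL
);
""")

# The application supplies only the data; the database supplies the key.
cur = conn.execute("INSERT INTO customer (email) VALUES (?)", ("ann@example.com",))
ann_id = cur.lastrowid
conn.execute(
    "INSERT INTO purchase (customer_id, item) VALUES (?, ?)", (ann_id, "book")
)

# Changing the application data does not touch the surrogate key
# or the foreign key that refers to it.
conn.execute(
    "UPDATE customer SET email = ? WHERE customer_id = ?",
    ("ann@example.org", ann_id),
)

rows = conn.execute("""
    SELECT c.email, p.item
    FROM purchase AS p
    JOIN customer AS c ON c.customer_id = p.customer_id
""").fetchall()
print(rows)
```

Other database systems offer equivalent mechanisms under different names; the SQLite syntax here stands in for them.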

In a temporal database, it is necessary to distinguish between the surrogate key and the primary key. Typically, every row would have both a primary key and a surrogate key. The primary key identifies the unique row in the database, whereas the surrogate key identifies the unique entity in the modelled world; the two keys are not the same. For example, table Staff may contain two rows for "John Smith": one for his employment between 1990 and 1999, and another for his employment between 2001 and 2006. The surrogate key is identical (non-unique) in both rows; the primary key, however, is unique to each row.
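The Staff example can be sketched as follows, again using Python's sqlite3 module. The column names row_id, staff_sk, start_year and end_year are assumptions made for this illustration; the article does not prescribe a layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (
    row_id     INTEGER PRIMARY KEY,  -- primary key: unique per row
    staff_sk   INTEGER NOT NULL,     -- surrogate: one value per real-world person
    name       TEXT NOT NULL,
    start_year INTEGER NOT NULL,
    end_year   INTEGER NOT NULL
);
INSERT INTO staff (staff_sk, name, start_year, end_year) VALUES
    (1, 'John Smith', 1990, 1999),
    (1, 'John Smith', 2001, 2006);
""")

# Two rows, two distinct primary keys, one shared surrogate.
rows = conn.execute(
    "SELECT row_id, staff_sk FROM staff ORDER BY row_id"
).fetchall()
distinct_people = conn.execute(
    "SELECT COUNT(DISTINCT staff_sk) FROM staff"
).fetchone()[0]
print(rows, distinct_people)
```

An alternative design, not shown here, would make the pair (staff_sk, start_year) the primary key instead of adding a separate row identifier.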

Some database designers use surrogate keys religiously regardless of the suitability of other candidate keys, while others will use a key already present in the data, if there is one.

A surrogate may also be called a:

surrogate key
entity identifier
system-generated key
database sequence number
synthetic key
technical key
arbitrary unique identifier

Some of these terms describe the way of generating new surrogate values rather than the nature of the surrogate concept.

Possible mechanisms for generating surrogates include:

Universally Unique Identifiers (UUIDs)
Globally Unique Identifiers (GUIDs)
Object Identifiers (OIDs)
Sybase or SQL Server identity column
Oracle SEQUENCE
PostgreSQL serial
MySQL AUTO_INCREMENT
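Two of the approaches listed above, a sequence number and a UUID, can be sketched in application code with Python's standard library. The function names next_sequence_key and new_uuid_key are hypothetical. In practice the database management system usually generates the value, so this is an illustration of the idea rather than a recommended implementation.

```python
import itertools
import uuid

# A process-local counter standing in for a database sequence.
# Unlike a real sequence, it is not shared between processes or persisted.
_sequence = itertools.count(start=1)


def next_sequence_key() -> int:
    """Return the next sequential surrogate value."""
    return next(_sequence)


def new_uuid_key() -> str:
    """Return a random (version 4) UUID as a surrogate value."""
    return str(uuid.uuid4())


first = next_sequence_key()
second = next_sequence_key()
random_key = new_uuid_key()
print(first, second, random_key)
```

Neither value carries any information about the row it will identify, which is the property the surrogate concept relies on.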

Advantages of Surrogate Keys

Immutability: generally surrogate keys do not change while the row exists. This has two advantages:

Database applications won't lose their "handle" on the row because the data changes;
Many database systems do not have good support for cascading updates of keys across foreign key relations. This results in difficulty in modifying the key data when natural keys are used.
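The difficulty with changing a natural key can be illustrated with a small sketch using Python's sqlite3 module. The tables and values are hypothetical. Here the e-mail address serves as the primary key and is referenced by a foreign key with no cascading update declared, so the attempt to change it is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    email TEXT PRIMARY KEY               -- natural key used as the primary key
);
CREATE TABLE purchase (
    customer_email TEXT NOT NULL REFERENCES customer(email),
    item           TEXT NOT NULL
);
INSERT INTO customer VALUES ('ann@example.com');
INSERT INTO purchase VALUES ('ann@example.com', 'book');
""")

try:
    # The key value is referenced by a purchase row, and no
    # ON UPDATE CASCADE was declared, so the change is refused.
    conn.execute(
        "UPDATE customer SET email = 'ann@example.org' "
        "WHERE email = 'ann@example.com'"
    )
    update_rejected = False
except sqlite3.IntegrityError:
    update_rejected = True

print(update_rejected)
```

With a surrogate key as the referenced column, the same change would be an ordinary update of a non-key column.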

Performance: surrogate keys are often composed of a compact data type, such as a four-byte integer. In theory, this allows the database system to index and operate on the value much faster than it could on multiple columns or large data types. In many cases, however, this is premature optimization, as no real overall performance gain results.

Compatibility: several database application development systems, drivers, and object-relational mapping systems, such as Ruby on Rails or Hibernate, depend on the use of integer or GUID surrogate keys in order to support database-system-agnostic operation and object-to-row mapping.

Disadvantages of Surrogate Keys

Disassociation: because the surrogate key is completely unrelated to the data of the row to which it is attached, it is possible for the key to become disassociated from that row, or confused with a surrogate key from another row. This can result in persistent data loss bugs in database applications which are extremely hard to trace. Another problem with disassociated keys is that SQL optimization can be very difficult because there is no semantic data in the indexes to help the optimizer make sensible judgments.

Normalization: in poor database designs, the presence of a surrogate key can cause the developer or database administrator to forget to establish, or accidentally remove, the natural key of the table. This results in a situation where duplicate rows in the table are impossible to identify, and garbage gets introduced into the database.
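The following sketch, using Python's sqlite3 module, contrasts a table that has only a surrogate key with one that also declares the natural key as UNIQUE. The table names country_loose and country_strict and the iso_code column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country_loose (
    country_id INTEGER PRIMARY KEY,   -- surrogate key only
    iso_code   TEXT NOT NULL          -- natural key left unenforced
);
CREATE TABLE country_strict (
    country_id INTEGER PRIMARY KEY,   -- surrogate key
    iso_code   TEXT NOT NULL UNIQUE   -- natural key still enforced
);
""")

# Without a constraint on the natural key, the same country can be
# inserted twice; each copy simply receives a new surrogate value.
for _ in range(2):
    conn.execute("INSERT INTO country_loose (iso_code) VALUES ('FR')")
loose_count = conn.execute(
    "SELECT COUNT(*) FROM country_loose WHERE iso_code = 'FR'"
).fetchone()[0]

# With the UNIQUE constraint, the duplicate is rejected.
conn.execute("INSERT INTO country_strict (iso_code) VALUES ('FR')")
try:
    conn.execute("INSERT INTO country_strict (iso_code) VALUES ('FR')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(loose_count, duplicate_rejected)
```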

Business Process Modeling: since surrogate keys are not contextually meaningful, substantial flaws can occur when initially mapping business requirements to a data schema. This is different from a natural key, which intrinsically incorporates one or more business rules.

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.

See also

References

Nijssen, G.M. (1976). Modelling in Data Base Management Systems. North-Holland Pub. Co. ISBN 0-7204-0459-2.

Engles, R.W. (1972). A Tutorial on Data-Base Organization. Annual Review in Automatic Programming, Vol. 7, Part 1, Pergamon Press, Oxford, pp. 1–64.

Langefors, B. (1968). Elementary Files and Elementary File Records. Proceedings of File 68, an IFIP/IAG International Seminar on File Organisation, Amsterdam, November, pp. 89–96.

Wieringa and de Jonge (1991). The identification of objects and roles: Object identifiers revisited.

Date, C.J. (1998). Relational Database Writings 1994–1997. Chapters 11 and 12.

Carter, Breck. Intelligent Versus Surrogate Keys. Retrieved on 2006-12-03.

Berkus, Josh. Database Soup: Primary Keyvil, Part I. Retrieved on 2006-12-03.
