Primary key

From Wikipedia, the free encyclopedia

In database design, a primary key is a value that can be used to identify a unique row in a table. Attributes are associated with it. Examples of primary keys are Social Security numbers (associated to a specific person) or ISBNs (associated to a specific book).

In the relational model of data, a primary key is a candidate key chosen as the main method of uniquely identifying a tuple in a relation. Practical telephone books and dictionaries cannot use names or words or Dewey Decimal System numbers as candidate keys because they do not uniquely identify telephone numbers or words. A primary key (as well as a unique key) can be referenced by a foreign key.

Primary Keys are defined in the ANSI SQL Standard, through the PRIMARY KEY constraint. The syntax to add such a constraint to an existing table is defined in SQL:2003 like this:

ALTER TABLE <table identifier> ADD [ CONSTRAINT <constraint identifier> ] PRIMARY KEY ( <column expression> {, <column expression>}... )

The primary key can also be specified directly during table creation. In the SQL Standard, primary keys may consist of one or multiple columns. Each column participating in the primary key is implicitly defined as NOT NULL. If the primary key consists only of a single column, the column can be marked as such using the following syntax:

CREATE TABLE table_name (
   id_col  INT  PRIMARY KEY,
   col2    CHARACTER VARYING(20),
   ...
)

Note that the relational model, as expressed through relational calculus and relational algebra, does not distiguish between primary keys and other kinds of keys. In the fourth normal form and higher, no relation has more than one key, so the concept of a "primary key" becomes irrelevant. Primary keys were added to the SQL standard mainly as a convenience to the application programmer and because applications usually use at most third normal form.

[edit] Surrogate Primary Keys

In some design situations the natural key that uniquely identifies a tuple in a relation is difficult to use for software development. For example, it may involve multiple columns or large text fields. A surrogate key can be used as the primary key. In other situations there may be more than one candidate key for a relation, and no candidate key is obviously preferred. A surrogate key may be used as the primary key to avoid giving one candidate key artificial primacy over the others.

Since Primary Keys exist primarily as a convenience to the programmer, surrogate primary keys are used extensively, and in many cases exclusively, in database application design. See surrogate key article for further discussion of this.

Due to the popularity of surrogate primary keys, many developers and in some cases even theoreticians have come to regard surrogate primary keys as an inalienable part of the relational data model. This is probably due largely to a transfer of Object-Oriented Programming principles to the relational database to create a kind of hybrid model. In this hybrid model, these additional restrictions are placed on primary keys:

Primary keys should be immutable, that is, not change until the record is destroyed.
Primary keys should be anonymous integer or numeric identifiers.

However, neither of these restrictions are part of the relational model or any SQL standard.