Unique key

In database relational modeling and implementation, a unique key is a set of zero, one or more attributes, the value(s) of which are guaranteed to be unique for each tuple (row) in a relation. The value or combination of values of unique key attributes for any tuple cannot be duplicated for any other tuple in that relation.

When more than one column is combined to form a unique key, their combined value is used to access each row and maintain uniqueness. Values are not combined, they are compared using their data types.

When a column or set of columns is defined as unique to the database management system, the system verifies that each set of value(s) is unique before assigning the unique constraint. After the column(s) are defined as unique, an error will occur if an insertion is attempted with values that already exist. Some systems do not allow key values to be updated, but all systems will not allow duplicates, ensuring that uniqueness is maintained.

Summary

In an entity-relationship diagram of a data model, one or more unique keys may be declared for each data entity. Each unique key is composed from one or more data attributes of that data entity. The set of unique keys declared for a data entity is often referred to as the candidate keys for that data entity. From the set of candidate keys, a single unique key is selected and declared the primary key for that data entity. In an entity relationship diagram, each entity relationship uses a unique key, most often the primary key, of one data entity and copies the unique key data attributes to another data entity to which it relates. This inheritance of the unique key data attributes is referred to as a foreign key and is used to provide data access paths between data entities. Once the data model is instantiated into a database, each data entity usually becomes a database table, unique keys become unique indexes associated with their assigned database tables, and entity relationships become foreign key constraints. In integrated data models,[1] commonality relationships[2] do not become foreign key constraints since commonality relationships are a peer-to-peer type of relationship.

The primary key may consist of a single attribute or a multiple attributes in combination. For example, a birthday could be shared by many people and so would not be a prime candidate for the primary key, but a Social Security Number or driver's license number would be ideal since it correlates to one single data value.

There are several types of unique keys used in database modeling and implementations.[3]

Key Name Definition
Simple A key made from only one attribute.
Concatenated A key made from more than one attribute joined together as a single key, such as part or whole name with a system generated number appended as often used for E-mail addresses.
Compound A key made from at least two attributes or simple keys, only simple keys exist in a compound key.
Composite A key containing at least one compound key with at least one other attribute or simple key (this is an extension of a compound key).
Natural A key made from data that exists outside the current database. In other words, the data is not system generated, such as a social security number imported from another system.
Surrogate An artificial key made from data that is system assigned or generated when another candidate key exists. Surrogate keys are usually numeric ID values and often used for performance reasons.
Candidate A key that may become the primary key.
Primary The key that is selected as the primary key. Only one key within an entity is selected to be the primary key. This is the key that is allowed to migrate to other entities to define the relationships that exist among the entities. When the data model is instantiated into a physical database, it is the key that the system uses the most when accessing the table, or joining the tables together when selecting data.
Alternate A non-primary key that can be used to identify only one row in a table. Alternate keys may be used like a primary key in a single-table select.
Foreign A unique key that has migrated to another entity.

At the most basic definition, "a key is a unique identifier",[4] so unique key is redundant. Keys that are within their originating entity are unique within that entity. Keys that migrate to another entity may or may not be unique, depending on the design and how they are used in the other table. Foreign keys may be the primary key in another table; for example a PersonID may become the EmployeeID in the Employee table. In this case, the EmployeeID is both a foreign key and the unique primary key, meaning that the tables have a 1:1 relationship. In the case where the person entity contained the biological father ID, the father ID would not be expected to be unique because a father may have more than one child.

Here is an example of a primary key becoming a foreign key on a related table. ID migrates from the Author table to the Book table.

Author TABLE Schema:
 
Author(ID, Name, Address, Born)
 
Book TABLE Schema:
 
Book(ISBN, AuthorID, Title, Publisher, Price)

Here ID serves as the primary key in the table 'Author', but also as AuthorID serves as a Foreign Key in the table 'Book'. The Foreign Key serves as the link, and therefore the connection, between the two related tables in this sample database.

In a relational database, a candidate key uniquely identifies each row of data values in a database table. A candidate key comprises a single column or a set of columns in a single database table. No two distinct rows or data records in a database table can have the same data value (or combination of data values) in those candidate key columns since NULL values are not used. Depending on its design, a database table may have many candidate keys but at most one candidate key may be distinguished as the primary key.

A unique key constraint does not imply the NOT NULL constraint in practice. Because NULL is not an actual value (it represents the lack of a value), when two rows are compared, and both rows have NULL in a column, the column values are not considered to be equal. Thus, in order for a unique key to uniquely identify each row in a table, NULL values must not be used. According to the SQL[5] standard and Relational Model theory, a unique key (unique constraint) should accept NULL in several rows/tuples — however not all RDBMS implement this feature according to the SQL standard.[6][7]

A unique key should uniquely identify all possible rows that exist in a table and not only the currently existing rows . Examples of unique keys are Social Security numbers (associated with a specific person[8]) or ISBNs (associated with a specific book). Telephone books and dictionaries cannot use names, words, or Dewey Decimal Classification system numbers as candidate keys because they do not uniquely identify telephone numbers or words.

A table can have at most one primary key, but it may have more than one candidate key. A primary key is a combination of columns which uniquely specify a row; it is a special case of unique keys. One difference is that primary keys have an implicit NOT NULL constraint while unique keys do not. Thus, the values in unique key columns may or may not be NULL, and in fact such a column may contain at most one NULL fields.[9] Another difference is that primary keys must be defined using another syntax.

The relational model, as expressed through relational calculus and relational algebra, does not distinguish between primary keys and other kinds of keys. Primary keys were added to the SQL standard mainly as a convenience to the application programmer.

Unique keys as well as primary keys may be logically referenced by foreign keys, but most RDBMS only allow a foreign key constraint against a primary key.

Defining primary keys

Primary keys are defined in the ANSI SQL Standard, through the PRIMARY KEY constraint. The syntax to add such a constraint to an existing table is defined in SQL:2003 like this:

  ALTER TABLE <TABLE identifier> 
      ADD [ CONSTRAINT <CONSTRAINT identifier> ] 
      PRIMARY KEY ( <COLUMN expression> {, <COLUMN expression>}... )

The primary key can also be specified directly during table creation. In the SQL Standard, primary keys may consist of one or multiple columns. Each column participating in the primary key is implicitly defined as NOT NULL. Note that some RDBMS require explicitly marking primary key columns as NOT NULL.

  CREATE TABLE TABLE_NAME (
 
     ...
  )

If the primary key consists only of a single column, the column can be marked as such using the following syntax:

  CREATE TABLE TABLE_NAME (
     id_col  INT  PRIMARY KEY,
     col2    CHARACTER VARYING(20),
     ...
  )

Differences between Primary Key and Unique Key:

Primary Key
1. A primary key cannot allow null (a primary key cannot be defined on columns that allow nulls).
2. Each table cannot have more than one primary key.
3. On some RDBMS a primary key generates a clustered index by default.

Unique Key
1. A unique key can allow null (a unique key can be defined on columns that allow nulls.)
2. Each table can have multiple unique keys.
3. On some RDBMS a unique key generates a nonclustered index by default.

Defining unique keys

The definition of unique keys is syntactically very similar to primary keys.

  ALTER TABLE <TABLE identifier> 
      ADD [ CONSTRAINT <CONSTRAINT identifier> ] 
      UNIQUE ( <COLUMN expression> {, <COLUMN expression>}... )

Likewise, unique keys can be defined as part of the CREATE TABLE SQL statement.

  CREATE TABLE TABLE_NAME (
     id_col   INT,
     col2     CHARACTER VARYING(20),
     key_col  SMALLINT,
     ...
     CONSTRAINT key_unique UNIQUE(key_col),
     ...
  )
  CREATE TABLE TABLE_NAME (
     id_col  INT  PRIMARY KEY,
     col2    CHARACTER VARYING(20),
     ...
     key_col  SMALLINT UNIQUE,
     ...
  )

Surrogate keys

Main article: Surrogate key

In some circumstances the natural key that uniquely identifies a tuple in a relation may be cumbersome to use for software development. For example, it may involve multiple columns or large text fields. In such cases, a surrogate key can be used instead as the primary key. In other situations there may be more than one candidate key for a relation, and no candidate key is obviously preferred. A surrogate key may be used as the primary key to avoid giving one candidate key artificial primacy over the others.

Since primary keys exist primarily as a convenience to the programmer, surrogate primary keys are often used, in many cases exclusively, in database application design.

Due to the popularity of surrogate primary keys, many developers and in some cases even theoreticians have come to regard surrogate primary keys as an inalienable part of the relational data model. This is largely due to a migration of principles from the Object-Oriented Programming model to the relational model, creating the hybrid object-relational model. In the ORM, these additional restrictions are placed on primary keys:

However, neither of these restrictions is part of the relational model or any SQL standard. Due diligence should be applied when deciding on the immutability of primary key values during database and application design. Some database systems even imply that values in primary key columns cannot be changed using the UPDATE SQL statement.

Alternate key

Typically, one candidate key is chosen as the primary key. Other candidate keys become alternate keys, each of which may have a unique index assigned to it in order to prevent duplicates (a duplicate entry is not valid in a unique column).[10]

Alternate keys may be used like the primary key when doing a single-table select or when filtering in a where clause, but are not typically used to join multiple tables.

See also

References

  1. Data Model Integration | The Integration of Data Models
  2. Commonality Relationships | Commonality Constraints
  3. "Choosing a Primary Key: Natural or Surrogate?". AgileData.org. Retrieved August 28, 2014.
  4. Awad, Elias (1985), Systems Analysis and Design, Second Edition, Richard D. Irwin, Inc., ISBN 0-256-02824-9
  5. Summary of ANSI/ISO/IEC SQL
  6. Constraints - SQL Database Reference Material - Learn sql, read an sql manual, follow an sql tutorial, or learn how to structure an SQL query
  7. Comparison of different SQL implementations
  8. SSN uniqueness: Rare SSN duplicates do exist in the field, a condition that led to problems with early commercial computer systems that relied on SSN uniqueness. Practitioners are taught that well-known duplications in SSN assignments occurred in the early days of the SSN system. This situation points out the complexity of designing systems that assume unique keys in real-world data.
  9. MySQL 5.5 Reference Manual :: 12.1.14. CREATE TABLE Syntax "For all engines, a UNIQUE index permits multiple NULL values for columns that can contain NULL."
  10. Alternate key - Oracle FAQ

External links