Second normal form

Second normal form (2NF) is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.[1]

A table that is in first normal form (1NF) must meet additional criteria if it is to qualify for second normal form. Specifically: a table is in 2NF if and only if it is in 1NF and no non-prime attribute is functionally dependent on any proper subset of any candidate key of the table. A non-prime attribute of a table is an attribute that is not a part of any candidate key of the table.

Put simply, a table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is dependent on the whole of every candidate key.
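The definition can be tested mechanically against sample rows. The following is a minimal Python sketch, not an established routine: the helper names fd_holds and partial_dependencies are hypothetical, rows are assumed to be dictionaries, the candidate keys are assumed to be supplied by the caller, and the small orders table is invented for illustration. For simplicity the sketch ignores the empty subset of a key. A dependency that holds in one sample is only evidence, since a functional dependency is a statement about every valid state of the table.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if the sample rows are consistent with lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(rows, candidate_keys):
    """List (subset, attribute) pairs that would violate 2NF."""
    attributes = list(rows[0])
    prime = {a for key in candidate_keys for a in key}
    non_prime = [a for a in attributes if a not in prime]
    violations = []
    for key in candidate_keys:
        # Non-empty proper subsets of the candidate key.
        for size in range(1, len(key)):
            for subset in combinations(key, size):
                for attr in non_prime:
                    if fd_holds(rows, subset, attr):
                        violations.append((subset, attr))
    return violations


# Invented sample: ("order", "item") is the only candidate key,
# and "customer" depends on "order" alone.
orders = [
    {"order": 1, "item": "pen", "customer": "Ann"},
    {"order": 1, "item": "ink", "customer": "Ann"},
    {"order": 2, "item": "pen", "customer": "Bob"},
]
violations = partial_dependencies(orders, [("order", "item")])
print(violations)  # [(('order',), 'customer')]
```

An empty result means that no partial dependency was found in the sample, which is consistent with 2NF but does not prove it.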

Example

Consider a table describing employees' skills:

Employees' Skills
Employee | Skill          | Current Work Location
Brown    | Light Cleaning | 73 Industrial Way
Brown    | Typing         | 73 Industrial Way
Harrison | Light Cleaning | 73 Industrial Way
Jones    | Shorthand      | 114 Main Street
Jones    | Typing         | 114 Main Street
Jones    | Whittling      | 114 Main Street

Neither {Employee} nor {Skill} is a candidate key for the table. This is because a given Employee might need to appear more than once (the employee might have multiple Skills), and a given Skill might need to appear more than once (it might be possessed by multiple Employees). Only the composite key {Employee, Skill} qualifies as a candidate key for the table.

The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely Employee. Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are represented: we are told three times that Jones works at 114 Main Street, and twice that Brown works at 73 Industrial Way. This redundancy makes the table vulnerable to update anomalies: it is, for example, possible to update Jones' work location on his "Shorthand" and "Typing" records and not update his "Whittling" record. The resulting data would imply contradictory answers to the question "What is Jones' current work location?", unless it is meant that Jones exercises different skills at different locations.
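The update anomaly described above can be reproduced in a few lines. This is an illustrative Python sketch that models the table as a list of tuples; the new address "9 New Road" is made up for the example.

```python
# Rows of the Employees' Skills table: (employee, skill, location).
employees_skills = [
    ("Brown", "Light Cleaning", "73 Industrial Way"),
    ("Brown", "Typing", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
]

# A careless update: change Jones' location on two of his three records.
# "9 New Road" is a made-up address used purely for illustration.
updated = [
    (emp, skill, "9 New Road")
    if emp == "Jones" and skill in ("Shorthand", "Typing")
    else (emp, skill, loc)
    for emp, skill, loc in employees_skills
]

# The table now gives two answers for Jones' current work location.
jones_locations = {loc for emp, _, loc in updated if emp == "Jones"}
print(sorted(jones_locations))  # ['114 Main Street', '9 New Road']
```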

A 2NF alternative to this design would represent the same information in two tables: an "Employees" table with candidate key {Employee}, and an "Employees' Skills" table with candidate key {Employee, Skill}:

Employees
Employee | Current Work Location
Brown    | 73 Industrial Way
Harrison | 73 Industrial Way
Jones    | 114 Main Street

Employees' Skills
Employee | Skill
Brown    | Light Cleaning
Brown    | Typing
Harrison | Light Cleaning
Jones    | Shorthand
Jones    | Typing
Jones    | Whittling

Neither of these tables can suffer from update anomalies.
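As a rough illustration, the decomposition can be expressed as two projections of the original table, and joining them on Employee recovers the original rows. The Python sketch below assumes the table is held as a list of tuples; the variable names and the address "9 New Road" are invented for the example.

```python
# Original single-table design: (employee, skill, location).
employees_skills = [
    ("Brown", "Light Cleaning", "73 Industrial Way"),
    ("Brown", "Typing", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
]

# "Employees" table: one location per employee (candidate key {Employee}).
employees = {emp: loc for emp, _, loc in employees_skills}
# "Employees' Skills" table: candidate key {Employee, Skill}.
skills = [(emp, skill) for emp, skill, _ in employees_skills]

# Joining the two tables on Employee reproduces the original rows.
rejoined = [(emp, skill, employees[emp]) for emp, skill in skills]
print(rejoined == employees_skills)  # True

# A location change is now a single update ("9 New Road" is made up).
employees["Jones"] = "9 New Road"
jones_locations = {employees[emp] for emp, _ in skills if emp == "Jones"}
print(jones_locations)  # {'9 New Road'}
```

Because each employee's location is stored once, there is no second copy that an update could miss.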

Not all 2NF tables are free from update anomalies, however. An example of a 2NF table which suffers from update anomalies is:

Tournament Winners
Tournament           | Year | Winner         | Winner Date of Birth
Des Moines Masters   | 1998 | Chip Masterson | 14 March 1977
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975
Cleveland Open       | 1999 | Bob Albertson  | 28 September 1968
Des Moines Masters   | 1999 | Al Fredrickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977

Even though Winner and Winner Date of Birth are determined by the whole key {Tournament, Year} and not part of it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple records. This leads to an update anomaly: if updates are not carried out consistently, a particular winner could be shown as having two different dates of birth.

The underlying problem is the transitive dependency to which the Winner Date of Birth attribute is subject. Winner Date of Birth actually depends on Winner, which in turn depends on the key {Tournament, Year}.
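The chain of dependencies can be checked against the sample rows. The Python sketch below uses a hypothetical helper, fd_holds, that tests whether the rows are consistent with a functional dependency; it confirms only what the sample shows, not what must hold for every possible state of the table.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the sample rows are consistent with lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


columns = ("Tournament", "Year", "Winner", "Winner Date of Birth")
data = [
    ("Des Moines Masters", 1998, "Chip Masterson", "14 March 1977"),
    ("Indiana Invitational", 1998, "Al Fredrickson", "21 July 1975"),
    ("Cleveland Open", 1999, "Bob Albertson", "28 September 1968"),
    ("Des Moines Masters", 1999, "Al Fredrickson", "21 July 1975"),
    ("Indiana Invitational", 1999, "Chip Masterson", "14 March 1977"),
]
winners = [dict(zip(columns, row)) for row in data]

# No partial dependency: neither part of the key determines Winner alone.
by_tournament = fd_holds(winners, ("Tournament",), "Winner")
by_year = fd_holds(winners, ("Year",), "Winner")

# The whole key determines Winner, and Winner determines the date of birth.
by_key = fd_holds(winners, ("Tournament", "Year"), "Winner")
by_winner = fd_holds(winners, ("Winner",), "Winner Date of Birth")

print(by_tournament, by_year, by_key, by_winner)  # False False True True
```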

This problem is addressed by third normal form (3NF).

2NF and candidate keys

A functional dependency on part of any candidate key is a violation of 2NF. In addition to the primary key, the table may contain other candidate keys; it is necessary to establish that no non-prime attributes have part-key dependencies on any of these candidate keys.

Multiple candidate keys occur in the following table:

Electric Toothbrush Models
Manufacturer | Model       | Model Full Name      | Manufacturer Country
Forte        | X-Prime     | Forte X-Prime        | Italy
Forte        | Ultraclean  | Forte Ultraclean     | Italy
Dent-o-Fresh | EZbrush     | Dent-o-Fresh EZbrush | USA
Kobayashi    | ST-60       | Kobayashi ST-60      | Japan
Hoch         | Toothmaster | Hoch Toothmaster     | Germany
Hoch         | X-Prime     | Hoch X-Prime         | Germany

Even if the designer has specified the primary key as {Model Full Name}, the table is not in 2NF. {Manufacturer, Model} is also a candidate key, and Manufacturer Country is dependent on a proper subset of it: Manufacturer. To make the design conform to 2NF, it is necessary to have two tables:

Electric Toothbrush Manufacturers
Manufacturer | Manufacturer Country
Forte        | Italy
Dent-o-Fresh | USA
Kobayashi    | Japan
Hoch         | Germany

Electric Toothbrush Models
Manufacturer | Model       | Model Full Name
Forte        | X-Prime     | Forte X-Prime
Forte        | Ultraclean  | Forte Ultraclean
Dent-o-Fresh | EZbrush     | Dent-o-Fresh EZbrush
Kobayashi    | ST-60       | Kobayashi ST-60
Hoch         | Toothmaster | Hoch Toothmaster
Hoch         | X-Prime     | Hoch X-Prime
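The need to examine every candidate key, not only the primary key, can be illustrated with a short Python sketch. The helpers fd_holds and partial_dependencies are hypothetical names, the candidate keys are assumed to be supplied by the caller, and the check reflects only the sample rows. Only non-empty proper subsets of each key are considered.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if the sample rows are consistent with lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(rows, candidate_keys):
    """List (subset, attribute) pairs that would violate 2NF."""
    prime = {a for key in candidate_keys for a in key}
    non_prime = [a for a in rows[0] if a not in prime]
    return [
        (subset, attr)
        for key in candidate_keys
        for size in range(1, len(key))
        for subset in combinations(key, size)
        for attr in non_prime
        if fd_holds(rows, subset, attr)
    ]


columns = ("Manufacturer", "Model", "Model Full Name", "Manufacturer Country")
data = [
    ("Forte", "X-Prime", "Forte X-Prime", "Italy"),
    ("Forte", "Ultraclean", "Forte Ultraclean", "Italy"),
    ("Dent-o-Fresh", "EZbrush", "Dent-o-Fresh EZbrush", "USA"),
    ("Kobayashi", "ST-60", "Kobayashi ST-60", "Japan"),
    ("Hoch", "Toothmaster", "Hoch Toothmaster", "Germany"),
    ("Hoch", "X-Prime", "Hoch X-Prime", "Germany"),
]
models = [dict(zip(columns, row)) for row in data]
keys = [("Model Full Name",), ("Manufacturer", "Model")]

# Looking at the primary key alone finds nothing: it has no proper subset.
only_primary = partial_dependencies(models, [("Model Full Name",)])
# Including the second candidate key exposes the partial dependency.
all_keys = partial_dependencies(models, keys)
print(only_primary)  # []
print(all_keys)      # [(('Manufacturer',), 'Manufacturer Country')]

# After moving Manufacturer Country to its own table, nothing is reported.
models_2nf = [{c: row[c] for c in columns[:3]} for row in models]
after_split = partial_dependencies(models_2nf, keys)
print(after_split)   # []
```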

References

  1. Codd, E.F. "Further Normalization of the Data Base Relational Model." (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems," New York City, May 24th-25th, 1971.) IBM Research Report RJ909 (August 31st, 1971). Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
