First normal form

From Wikipedia, the free encyclopedia

First normal form (1NF) is a normal form used in database normalization. First normal form excludes the possibility of repeating groups by requiring that each field in a database hold an atomic value, and that records be defined in such a way as to be uniquely identifiable by means of a primary key.

Contents

[edit] Breaches of 1NF

[edit] Repeating groups

Repeating groups might occur on a ledger sheet in which each entry on the sheet features variable numbers of sub-entries. For example:

Transactions
Customer Day Amount
Pete Monday 19.00

-28.20

Pete Wednesday -84.00
Sarah Friday 100.00

150.00

-40.00

The 'Amount' sub-entries constitute repeating groups; therefore the information has not been presented in 1NF. Repeating groups can be eliminated simply by transforming each sub-entry into a single entry:

Transactions
Transaction ID Customer Day Amount
1 Pete Monday 19.00
2 Pete Monday -28.20
3 Pete Wednesday -84.00
4 Sarah Friday 100.00
5 Sarah Friday 150.00
6 Sarah Friday -40.00

[edit] Absence of unique identifiers

Suppose that in the Transactions example, two transactions for the same amount are recorded for the same person on the same day:

Transactions
Customer Day Amount
Pete Monday 19.00
Pete Monday 19.00

As represented above, the two transactions are indistinguishable except by ordinal position. In other words, we cannot use the values within the transaction entries themselves to discriminate one transaction from the other. Because the representation above lacks a unique identifier, i.e. a column or set of columns whose values are guaranteed to be different for every entry, the representation breaches 1NF. Normalizing the representation to 1NF requires a unique identifier to be included:

Transactions
Transaction ID Customer Day Amount
1 Pete Monday 19.00
2 Pete Monday 19.00

[edit] 1NF in the context of relational database tables

Most RDBMS products permit the user to define a table that lacks a primary key. Such a table would breach 1NF.

The prohibition against repeating groups, on the other hand, is implicit in the way a relational database represents information. The nature of a database table in an RDBMS is that within each record of the table, a single field can only ever contain a single value. More precisely, it can only ever contain a single instance of the data type to which the column in question belongs. For example, if a column called Favorite Number is defined as an integer, each record is restricted to having one integer (or Null) in the Favourite Number field. Therefore – so long as primary keys are employed – it is technically impossible to design a relational database table that violates 1NF. Chris Date has argued that this holds true even when the RDBMS permits complex data types such as arrays.[1]

Nevertheless, it is possible to design a database table that violates 1NF in spirit. Typical violations involve a) encoding multiple meaningful values in a single field, and b) using multiple columns in a given table to represent the same type of fact.

[edit] Multiple meaningful values in a single field

Some databases effectively violate 1NF by storing a sequence of values within a single field, for example as a comma-separated list:

Foods not eaten, by person
Person Foods Not Eaten
Jim Liver, Goat's cheese
Alice Broccoli
Norman Pheasant, Liver, Peas

Such a design would make it extremely difficult to write a query which involved the matching of identical foods-not-eaten. The design can be altered in such a way as to make it conform to 1NF:

Foods not eaten, by person
Person Food Not Eaten
Jim Liver
Jim Goat's cheese
Alice Broccoli
Norman Pheasant
Norman Liver
Norman Peas

[edit] Multiple columns representing the same type of fact

Also in contravention of 1NF is a design in which multiple columns in a table represent exactly the same type of fact:

Facts About People
Person Favorite Color Food Not Eaten (1) Food Not Eaten (2) Food Not Eaten (3)
Jim Green Liver Goat's cheese
Alice Fuchsia Broccoli
Norman Blue Pheasant Liver Peas
Emily Yellow

This is a poor design, even if we believe that no more than three foods-not-eaten will ever need to be recorded for any person. Suppose, for instance, we wanted to identify all pairs of people who shared at least one food-not-eaten. A food designation may appear in any of three separate columns, so a query would have to be written to check for food-not-eaten matches in 9 (3 x 3) separate pairs of columns!

[edit] Further reading

[edit] External links

Topics in Database normalization

First normal form | Second normal form | Third normal form
Boyce-Codd normal form | Fourth normal form | Fifth normal form | Domain/key normal form | Sixth normal form
Denormalization