First normal form

From Wikipedia, the free encyclopedia

First normal form (1NF) is a normal form used in database normalization. First normal form excludes the possibility of repeating groups by requiring that each field in a database hold an atomic value, and that records be defined in such a way as to be uniquely identifiable by means of a primary key.

1 Breaches of 1NF
- 1.1 Repeating groups
- 1.2 Absence of unique identifiers
2 1NF in the context of relational database tables
- 2.1 Multiple meaningful values in a single field
- 2.2 Multiple columns representing the same type of fact
3 Further reading
4 External links

[edit] Breaches of 1NF

[edit] Repeating groups

Repeating groups might occur on a ledger sheet in which each entry on the sheet features variable numbers of sub-entries. For example:

Transactions
Customer	Day	Amount
Pete	Monday	19.00 -28.20
Pete	Wednesday	-84.00
Sarah	Friday	100.00 150.00 -40.00

The 'Amount' sub-entries constitute repeating groups; therefore the information has not been presented in 1NF. Repeating groups can be eliminated simply by transforming each sub-entry into a single entry:

Transactions
Transaction ID	Customer	Day	Amount
1	Pete	Monday	19.00
2	Pete	Monday	-28.20
3	Pete	Wednesday	-84.00
4	Sarah	Friday	100.00
5	Sarah	Friday	150.00
6	Sarah	Friday	-40.00

[edit] Absence of unique identifiers

Suppose that in the Transactions example, two transactions for the same amount are recorded for the same person on the same day:

Transactions
Customer	Day	Amount
Pete	Monday	19.00
Pete	Monday	19.00

As represented above, the two transactions are indistinguishable except by ordinal position. In other words, we cannot use the values within the transaction entries themselves to discriminate one transaction from the other. Because the representation above lacks a unique identifier, i.e. a column or set of columns whose values are guaranteed to be different for every entry, the representation breaches 1NF. Normalizing the representation to 1NF requires a unique identifier to be included:

Transactions
Transaction ID	Customer	Day	Amount
1	Pete	Monday	19.00
2	Pete	Monday	19.00

[edit] 1NF in the context of relational database tables

Most RDBMS products permit the user to define a table that lacks a primary key. Such a table would breach 1NF.

The prohibition against repeating groups, on the other hand, is implicit in the way a relational database represents information. The nature of a database table in an RDBMS is that within each record of the table, a single field can only ever contain a single value. More precisely, it can only ever contain a single instance of the data type to which the column in question belongs. For example, if a column called Favorite Number is defined as an integer, each record is restricted to having one integer (or Null) in the Favourite Number field. Therefore – so long as primary keys are employed – it is technically impossible to design a relational database table that violates 1NF. Chris Date has argued that this holds true even when the RDBMS permits complex data types such as arrays.[1]

Nevertheless, it is possible to design a database table that violates 1NF in spirit. Typical violations involve a) encoding multiple meaningful values in a single field, and b) using multiple columns in a given table to represent the same type of fact.

[edit] Multiple meaningful values in a single field

Some databases effectively violate 1NF by storing a sequence of values within a single field, for example as a comma-separated list:

Foods not eaten, by person
Person	Foods Not Eaten
Jim	Liver, Goat's cheese
Alice	Broccoli
Norman	Pheasant, Liver, Peas

Such a design would make it extremely difficult to write a query which involved the matching of identical foods-not-eaten. The design can be altered in such a way as to make it conform to 1NF:

Foods not eaten, by person
Person	Food Not Eaten
Jim	Liver
Jim	Goat's cheese
Alice	Broccoli
Norman	Pheasant
Norman	Liver
Norman	Peas

[edit] Multiple columns representing the same type of fact

Also in contravention of 1NF is a design in which multiple columns in a table represent exactly the same type of fact:

Facts About People
Person	Favorite Color	Food Not Eaten (1)	Food Not Eaten (2)	Food Not Eaten (3)
Jim	Green	Liver	Goat's cheese
Alice	Fuchsia	Broccoli
Norman	Blue	Pheasant	Liver	Peas
Emily	Yellow

This is a poor design, even if we believe that no more than three foods-not-eaten will ever need to be recorded for any person. Suppose, for instance, we wanted to identify all pairs of people who shared at least one food-not-eaten. A food designation may appear in any of three separate columns, so a query would have to be written to check for food-not-eaten matches in 9 (3 x 3) separate pairs of columns!