First normal form
From Wikipedia, the free encyclopedia
First normal form (1NF) is a normal form used in database normalization. First normal form excludes the possibility of repeating groups by requiring that each field in a database hold an atomic value, and that records be defined in such a way as to be uniquely identifiable by means of a primary key.
Contents |
[edit] Breaches of 1NF
[edit] Repeating groups
Repeating groups might occur on a ledger sheet in which each entry on the sheet features variable numbers of sub-entries. For example:
Customer | Day | Amount |
---|---|---|
Pete | Monday | 19.00
-28.20 |
Pete | Wednesday | -84.00 |
Sarah | Friday | 100.00
150.00 -40.00 |
The 'Amount' sub-entries constitute repeating groups; therefore the information has not been presented in 1NF. Repeating groups can be eliminated simply by transforming each sub-entry into a single entry:
Transaction ID | Customer | Day | Amount |
---|---|---|---|
1 | Pete | Monday | 19.00 |
2 | Pete | Monday | -28.20 |
3 | Pete | Wednesday | -84.00 |
4 | Sarah | Friday | 100.00 |
5 | Sarah | Friday | 150.00 |
6 | Sarah | Friday | -40.00 |
[edit] Absence of unique identifiers
Suppose that in the Transactions example, two transactions for the same amount are recorded for the same person on the same day:
Customer | Day | Amount |
---|---|---|
Pete | Monday | 19.00 |
Pete | Monday | 19.00 |
As represented above, the two transactions are indistinguishable except by ordinal position. In other words, we cannot use the values within the transaction entries themselves to discriminate one transaction from the other. Because the representation above lacks a unique identifier, i.e. a column or set of columns whose values are guaranteed to be different for every entry, the representation breaches 1NF. Normalizing the representation to 1NF requires a unique identifier to be included:
Transaction ID | Customer | Day | Amount |
---|---|---|---|
1 | Pete | Monday | 19.00 |
2 | Pete | Monday | 19.00 |
[edit] 1NF in the context of relational database tables
Most RDBMS products permit the user to define a table that lacks a primary key. Such a table would breach 1NF.
The prohibition against repeating groups, on the other hand, is implicit in the way a relational database represents information. The nature of a database table in an RDBMS is that within each record of the table, a single field can only ever contain a single value. More precisely, it can only ever contain a single instance of the data type to which the column in question belongs. For example, if a column called Favorite Number is defined as an integer, each record is restricted to having one integer (or Null) in the Favourite Number field. Therefore – so long as primary keys are employed – it is technically impossible to design a relational database table that violates 1NF. Chris Date has argued that this holds true even when the RDBMS permits complex data types such as arrays.[1]
Nevertheless, it is possible to design a database table that violates 1NF in spirit. Typical violations involve a) encoding multiple meaningful values in a single field, and b) using multiple columns in a given table to represent the same type of fact.
[edit] Multiple meaningful values in a single field
Some databases effectively violate 1NF by storing a sequence of values within a single field, for example as a comma-separated list:
Person | Foods Not Eaten |
---|---|
Jim | Liver, Goat's cheese |
Alice | Broccoli |
Norman | Pheasant, Liver, Peas |
Such a design would make it extremely difficult to write a query which involved the matching of identical foods-not-eaten. The design can be altered in such a way as to make it conform to 1NF:
Person | Food Not Eaten |
---|---|
Jim | Liver |
Jim | Goat's cheese |
Alice | Broccoli |
Norman | Pheasant |
Norman | Liver |
Norman | Peas |
[edit] Multiple columns representing the same type of fact
Also in contravention of 1NF is a design in which multiple columns in a table represent exactly the same type of fact:
Person | Favorite Color | Food Not Eaten (1) | Food Not Eaten (2) | Food Not Eaten (3) |
---|---|---|---|---|
Jim | Green | Liver | Goat's cheese | |
Alice | Fuchsia | Broccoli | ||
Norman | Blue | Pheasant | Liver | Peas |
Emily | Yellow |
This is a poor design, even if we believe that no more than three foods-not-eaten will ever need to be recorded for any person. Suppose, for instance, we wanted to identify all pairs of people who shared at least one food-not-eaten. A food designation may appear in any of three separate columns, so a query would have to be written to check for food-not-eaten matches in 9 (3 x 3) separate pairs of columns!
[edit] Further reading
- Litt's Tips: Normalization
- Rules Of Data Normalization
- Date, C. J., & Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
- Date, C. J. (1999), An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
- Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM, vol. 26, pp. 120-125
- Date, C.J., & Darwen, H., & Pascal, F. Database Debunkings
[edit] External links
- Database Normalization Basics by Mike Chapple (About.com)
- An Introduction to Database Normalization by Mike Hillyer.
- Normalization by ITS, University of Texas.
- Rules of Data Normalization by Data Model.org
- A tutorial on the first 3 normal forms by Fred Coulson
- Free PDF poster available by Marc Rettig
- Description of the database normalization basics by Microsoft
Topics in Database normalization |
First normal form | Second normal form | Third normal form |