First normal form
First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.[1]
Edgar Codd, in a 1971 conference paper, defined a relation in first normal form to be one such that none of the domains of that relation should have elements which are themselves sets.[2]
First normal form is an essential property of a relation in a relational database. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement.
Examples
The following scenario illustrates how a database design might violate first normal form.[3][4]
Domains and values
Suppose a designer wishes to record the names and telephone numbers of customers. He defines a customer table which looks like this:
Customer ID | First Name | Surname | Telephone Number |
---|---|---|---|
123 | Robert | Ingram | 555-861-2025 |
456 | Jane | Wright | 555-403-1659 |
789 | Maria | Fernandez | 555-808-9633 |
The designer then becomes aware of a requirement to record multiple telephone numbers for some customers. He reasons that the simplest way of doing this is to allow the "Telephone Number" field in any given record to contain more than one value:
Customer ID | First Name | Surname | Telephone Number |
---|---|---|---|
123 | Robert | Ingram | 555-861-2025 |
456 | Jane | Wright | 555-403-1659 555-776-4100 |
789 | Maria | Fernandez | 555-808-9633 |
Assuming, however, that the Telephone Number column is defined on some telephone number-like domain, such as the domain of 12-character strings, the representation above is not in first normal form.
A design that complies with 1NF
In the first normal form, the previous table can be represented in the following way.
Customer ID | First Name | Surname | Telephone Number |
---|---|---|---|
123 | Robert | Ingram | 555-861-2025 |
456 | Jane | Wright | 555-403-1659 |
456 | Jane | Wright | 555-776-4100 |
789 | Maria | Fernandez | 555-808-9633 |
However, this database design does not meet the more stringent requirements of second normal form.[5]
A design that also complies with higher normal forms
Another design for the same data makes use of two tables: a Customer Name table and a Customer Telephone Number table.
|
|
This design is in first normal form. Indeed, repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number link appears on its own record. With Customer ID as key, a one-to-many relationship exists between the two tables. A record in the "parent" table, Customer Name, can have many telephone number records in the "child" table, Customer Telephone Number, but each telephone number belongs to one, and only one customer. It is worth noting that this design meets the additional requirements for second and third normal form.
Atomicity
Edgar F. Codd's definition of 1NF makes reference to the concept of 'atomicity'. Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS."[6] Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)"[7] meaning a field should not be divided into parts with more than one kind of data in it such that what one part means to the DBMS depends on another part of the same field.
Hugh Darwen and Chris Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood.[8][9] In particular, the notion of a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:
- A character string would seem not to be atomic, as the RDBMS typically provides operators to decompose it into substrings.
- A fixed-point number would seem not to be atomic, as the RDBMS typically provides operators to decompose it into integer and fractional components.
- An ISBN would seem not to be atomic, as it includes language and publisher identifiers.
Date suggests that "the notion of atomicity has no absolute meaning":[10] a value may be considered atomic for some purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to atomicity. Columns of any conceivable data type (from string types and numeric types to array types and table types) are then acceptable in a 1NF table—although perhaps not always desirable; for example, it would be more desirable to separate a Customer Name field into two separate fields as First Name, Surname.
First normal form, as defined by Chris Date, permits relation-valued attributes (tables within tables). Date argues that relation-valued attributes, by means of which a field within a table can contain a table, are useful in rare cases.[11]
1NF tables as representations of relations
According to Date's definition, a table is in first normal form if and only if it is "isomorphic to some relation", which means, specifically, that it satisfies the following five conditions:[12]
- There's no top-to-bottom ordering to the rows.
- There's no left-to-right ordering to the columns.
- There are no duplicate rows.
- Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
- All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].
Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in first normal form.
Examples of tables (or views) that would not meet this definition of first normal form are:
- A table that lacks a unique key. Such a table would be able to accommodate duplicate rows, in violation of condition 3.
- A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view.[13] This violates condition 1. The tuples in true relations are not ordered with respect to each other.
- A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which requires every field to contain exactly one value from its column's domain. It should be noted, however, that this aspect of condition 4 is controversial. It marks an important departure from Codd's later vision of the relational model,[14] which made explicit provision for nulls.[15]
References
- ↑ Elmasri, Ramez and Navathe, Shamkant B. (July 2003). Fundamentals of Database Systems, Fourth Edition. Pearson. p. 315. ISBN 0321204484.
It states that the domain of an attribute must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.
- ↑ E. F. Codd (Oct 1972), Further normalization of the database relational model, Courant Institute: Prentice-Hall, ISBN 013196741X,
A relation is in first normal form if it has the property that none of its domains has elements which are themselves sets.
- ↑ studytonight.com
- ↑ stackoverflow.com
- ↑ Indeed, assuming that different customers can in principle have the same name (although there are no such in the current table), this database design is not in second normal form since "First Name" and "Surname" are non-prime attributes that depend only on Customer ID, not the whole candidate key (Customer ID, Telephone number).
- ↑ Codd, E. F. The Relational Model for Database Management Version 2 (Addison-Wesley, 1990).
- ↑ Codd, E. F. The Relational Model for Database Management Version 2 (Addison-Wesley, 1990), p. 6.
- ↑ Darwen, Hugh. "Relation-Valued Attributes; or, Will the Real First Normal Form Please Stand Up?", in C. J. Date and Hugh Darwen, Relational Database Writings 1989-1991 (Addison-Wesley, 1992).
- ↑ "[F]or many years," writes Date, "I was as confused as anyone else. What's worse, I did my best (worst?) to spread that confusion through my writings, seminars, and other presentations." Date, C. J. ["What First Normal Form Really Means"] in Date on Database: Writings 2000-2006 (Springer-Verlag, 2006), p. 108
- ↑ Date, C. J. ["What First Normal Form Really Means"] p. 112.
- ↑ Date, C. J. ["What First Normal Form Really Means"] pp. 121–126.
- ↑ Date, C. J. ["What First Normal Form Really Means"] pp. 127–128.
- ↑ Such views cannot be created using SQL that conforms to the SQL:2003 standard.
- ↑ "Codd first defined the relational model in 1969 and didn't introduce nulls until 1979" Date, C. J. SQL and Relational Theory (O'Reilly, 2009), Appendix A.2.
- ↑ The third of Codd's 12 rules states that "Null values ... [must be] supported in a fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type." Codd, E. F. "Is Your DBMS Really Relational?" Computerworld, October 14, 1985.
Further reading
- Litt's Tips: Normalization
- Date, C. J., & Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
- Date, C. J. (1999), An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
- Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM, vol. 26, pp. 120–125
|