Fourth normal form

From Wikipedia, the free encyclopedia

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce-Codd normal form (BCNF). Whereas the second, third, and Boyce-Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X →→ Y, X is a superkey—that is, X is either a candidate key or a superset thereof.[1]

Contents

[edit] Multivalued dependencies

If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z, then, in the context of a particular row, we can refer to the data beneath each group of headings as x, y, and z respectively. A multivalued dependency X →→ Y signifies that if we choose any x actually occurring in the table (call this choice xc), and compile a list of all the xcyz combinations that occur in the table, we will find that xc is associated with the same y entries regardless of z.

A trivial multivalued dependency X →→ Y is one in which Y consists of all columns not belonging to X. That is, a subset of attributes in a table has a trivial multivalued dependency on the remaining subset of attributes.

A functional dependency is a special case of multivalued dependency. In a functional dependency XY, every x determines exactly one y, never more than one.

[edit] Example

Consider the following example:

Pizza Delivery Permutations
Restaurant Pizza Variety Delivery Area
A1 Pizza Thick Crust Springfield
A1 Pizza Thick Crust Shelbyville
A1 Pizza Thick Crust Capital City
A1 Pizza Stuffed Crust Springfield
A1 Pizza Stuffed Crust Shelbyville
A1 Pizza Stuffed Crust Capital City
Elite Pizza Thin Crust Capital City
Elite Pizza Stuffed Crust Capital City
Vincenzo's Pizza Thick Crust Springfield
Vincenzo's Pizza Thick Crust Shelbyville
Vincenzo's Pizza Thin Crust Springfield
Vincenzo's Pizza Thin Crust Shelbyville

Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.

The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery Area}. Therefore it meets all normal forms up to BCNF. It does not, however, meet 4NF. The problem is that the table features two non-trivial multivalued dependencies on the {Restaurant} attribute (which is not a superkey). The dependencies are:

  • {Restaurant} →→ {Pizza Variety}
  • {Restaurant} →→ {Delivery Area}

These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties of pizza a restaurant offers are independent from the areas to which the restaurant delivers. This state of affairs leads to redundancy in the table: for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza start producing Cheese Crust pizzas then we will need to add multiple rows, one for each of A1 Pizza's delivery areas.

To satisfy 4NF, we must place the facts about varieties offered into a different table from the facts about delivery areas:

Varieties By Restaurant
Restaurant Pizza Variety
A1 Pizza Thick Crust
A1 Pizza Stuffed Crust
Elite Pizza Thin Crust
Elite Pizza Stuffed Crust
Vincenzo's Pizza Thick Crust
Vincenzo's Pizza Thin Crust
Delivery Areas By Restaurant
Restaurant Delivery Area
A1 Pizza Springfield
A1 Pizza Shelbyville
A1 Pizza Capital City
Elite Pizza Capital City
Vincenzo's Pizza Springfield
Vincenzo's Pizza Shelbyville

In contrast, if the pizza varieties offered by a restaurant sometimes did vary from one delivery area to another, the original three-column table would satisfy 4NF.

Ronald Fagin demonstrated[2] that it is always possible to achieve 4NF. Rissanen's theorem is also applicable on multivalued dependencies.

[edit] 4NF in practice

A 1992 paper by Margaret S. Wu notes that the teaching of database normalization typically stops short of 4NF, perhaps because of a belief that tables violating 4NF (but meeting all lower normal forms) are rarely encountered in business applications. This belief may not be accurate, however. Wu reports that in a study of forty organizational databases, over 20% contained one or more tables that violated 4NF while meeting all lower normal forms.[3]

[edit] References

  1. ^ "A relation schema R* is in fourth normal form (4NF) if, whenever a nontrivial multivalued dependency X →→ Y holds for R*, then so does the functional dependency X → A for every column name A of R*. Intuitively all dependencies are the result of keys." Fagin, Ronald (September 1977). "Multivalued Dependencies and a New Normal Form for Relational Databases". ACM Transactions on Database Systems 2 (1): 267. 
  2. ^ Fagin, 268
  3. ^ Wu, Margaret S. (March 1992). "The Practical Need for Fourth Normal Form". ACM SIGCSE Bulletin 24 (1): 19-23. 

[edit] See also

[edit] Further reading

Topics in Database normalization

First normal form | Second normal form | Third normal form
Boyce-Codd normal form | Fourth normal form | Fifth normal form | Domain/key normal form | Sixth normal form
Denormalization

Languages