Talk:Database normalization

From Wikipedia, the free encyclopedia

Archives: 1

Contents

[edit] Is that really 3NF?

I'm not really an expert on the subject but from my understanding, I suspect there's an error in the BNCF section. There's an example provided in the BCNF section that is as follows...

BEGIN QUOTE

Example of a relation that is in 3NF form but not in BCNF:

Relation: {A,B,C,D} AB is a candidate key, BC is candidate key and A->C.

END QUOTE

For a relation to be in 3NF, it has to satisfy the conditions of 2NF too, and therefore 1NF.

In this example, AB is a candidate key, and A->C so there's partial dependency if that is chosen as the PK. BC is a candidate key, and C->A (since A->C) so there's partial dependency if that becomes the PK. Doesn't that violate the conditions for 2NF, and therefore the 3NF?

K nitin r 06:55, 10 July 2006 (UTC)


I guess the example isn't very clear. There's a better example available at:

[1]

K nitin r 07:20, 10 July 2006 (UTC)

[edit] Normalization: not just for relational databases...or is it?

I've used the phrase "poorly normalized" as a way of characterizing all kinds of things (the duplication of content between wikipedia articles, for example). This article seems to be talking specifically about the relational model. This article spends its time talking about the formal characteristization of redundancy in relational databases. Do other database forms speak of "normalization"?

In any case, someone made a stub for "redundancy (database)" which I redirected here. That's based on the belief that there isn't enough to talk about on the subject of redundancy that isn't taken care of here or in the general definition of the term. Metaeducation 03:56, 25 October 2005 (UTC)

    • I have been a professional in database technology for 20+ yrs. Prior to there being a relational model I used normalization to design databases. It was most beneficial when designing for a CODASYL database, but it was also helpful for designing a TOTAL or an IMS database. It was even helpful when designing a set of VSAM files (which aren't managed by a DBMS at all). Take a look at IDEF for a description of some of the history of data modeling was applied prior to the advent of relational technology. KeyStroke 04:36, 25 October 2005 (UTC)


[edit] I find this page very hard to understand...

...and I've been working with relational databases for years. I wonder why.

Could you be a little more specific? Remember, we are always seeking to improve the article, so if you could do a little thinking: what was hard to understand? — Ambush Commander(Talk) 03:15, 14 November 2005 (UTC)
I didn't really understand anything either. Now, I don't know much about databases at all, so maybe that's the reason. Honestly, I never understood the whole point of databases at all. People say normalization is one important reason for using databases, but this page didn't enlighten me much. --Apoc2400 02:47, 21 July 2006 (UTC)

I'm not sure I understood this, either, I'm pretty new to relational databases - only been working with them a few years, self-taught on mySQL, but I've had some experience with logic and symbolic logic. I can see how normalization helps the programmer - but it seems to me that normalization is more of a shortcut for programmers than a more AI heavy approach to actually analyzing and reconsidering the presented data might be. But I might be way off-base - I mean, I started learning with flat-files and personally "tweaking" each one - I could see how this could be cumbersome for larger data-sets, but normalization seems like it would lose some very important "facts" (or, in some cases, "opinions" or preferences of the person entering the data) coming in from the different data sources. I'll look into this further, and am limiting myself to only editing the article for readability. Perhaps one of the more experienced database engineers could steer me towards some methods of data reconciliation that doesn't involve removing redundancies, but makes use of them? gnomelock 06:21, 16 August 2006 (UTC)

[edit] How about this?

Look at the examples given on this page:

http://www.troubleshooters.com/littstip/ltnorm.html

Look at the examples, not the text. They are coherent. There are no examples of non-normalized within the normalized forms. Only one example per form. What do you think? I say do it.


[edit] 5th Normal Form Subsection

Help!

I found the subsection on 5th normal form. Generally speaking, the subject is a welcome addition to complete the article. The contributor signed his addition in the article with

             BY MAHESH M
             MTECH -COMPUTER SCIENCE(BMSCE)

Mahesh M. seems to have supplied the makings of an understandable example — something involving students and classes — but the typesetting of the example data is bad.

This needs an editor to fix the typesetting, who clearly understands the 5th Normal Form. Unfortunately, I only understand enough to sort of get the idea that the contribution seems like it's good, if only the typesetting was clear. Tom Lougheed 18:30, 4 January 2006 (UTC)

The normal form subsection has been replaced, yet again, with a different set of information. Sigh... — Ambush Commander(Talk) 01:32, 8 January 2006 (UTC)
Yes, I replaced it because after reading the paper in the references and also my old textbook "Database System Concepts" by Abraham Silberschatz et. al., it wasn't clear to me that the person knew what he was talking about. Also, when I looked through the history right before he made his submission he wrote: "The information is not sufficient please any one update it.". This combined with the fact that the typesetting was bad, led me to think that he most likely had some school assignment involving the material, and either wasn't sure that his answer was right or didn't know what to put. Although some of the definitions looked consistent with "Database System Concepts" I guess I just felt it would be more confusing than enlightening. Jjjjjjjjjj 02:05, 8 January 2006 (UTC)
This section is wrong. Go ahead and compute any join of the three decomposed relation as the author suggests and you'll find that you do not recover the original relation: you will insert additional tuples that don't belog. In fact, the original relation *is* in 5NF, since you cannot further decompose the relation without losing information. I'll try to fix it at some point.
I've fixed this, I think. Hopefully the Psychiatrist example makes things more clear. --Nabav 14:07, 15 August 2006 (UTC)

[edit] Style guide for this article

The manner in which tables are specified in this article number at about three. This is unacceptable. We should adopt a standard style for all tables in this article, and then switch them all to it. Although, as a developer, I would prefer working with SQL-style table definitions, I recognize that real tables with fake data help demonstrate the relationships better. In that case, would we use class="wikitable" to format those tables? Just throwing out some ideas. — Ambush Commander(Talk) 01:31, 8 January 2006 (UTC)

I have added cleanup to this article for this very reason. More discussion please! :-) — Ambush Commander(Talk) 21:55, 24 January 2006 (UTC)
Gonna be bold. — Edward Z. Yang(Talk) 01:02, 14 April 2006 (UTC)

[edit] If it is not relational how can it be normalized?

In this edit I removed:

—especially in systems that do not use the relational model (such as OLAP).

How can a system be normalized if it is not relational? Does this mean that non-relational systems (like OLAP) are denormalized to the extreme? For now, it has been removed: the abstract notion of the desirability of denormalization is good enough, methinks. — Ambush Commander(Talk) 21:52, 24 January 2006 (UTC)

Long prior to the acceptance of RDBMSs we in Data Processing (as it was called then) used normalization for various kinds of databases and even non-database systems. I, myself, have used normalization for two kinds of CODASYL databases as well as for two kinds of Hierarchical databases. Normalization is a discipline for organizing data of any kind. It does not have to be implemented into an RDBMS to be normalized. KeyStroke 17:54, 28 February 2006 (UTC))

[edit] 5NF is a bad example

Kate knowing Classical Guitar and Jazz Violin does not imply in any way that Kate knows Jazz Guitar or Classical Violin. Somebody who understands this stuff better, please clean this up :) -- Vstarre 17:46, 7 March 2006 (UTC)

Hmm, I think that needs a completely new example. I will see what I can do to improve this. Jatos 11:41, 12 May 2006 (UTC)

Yep, try (james, piano, jazz) (james, trumpet, classical) (kate, trumpet, jazz) (kate, piano, classical). Join it back together and what do you get? A mess.

That's because in some respects, "jazz drumkit" and "rock violin" are still instruments. (e.g. a rock guitar and a classical guitar are two very different instruments.) --DavidHOzAu 11:24, 19 May 2006 (UTC)

I am only a student and I might be completely on the wrong track here, but possible alternatives are using an attribute for the instrument's type (wood, brass, strings, percussive, electronic) or shape (saxaphone, guitar, piano, drums). My reasoning is that once an instrument is a wooden instrument, it stays a wooden instrument. Also, learning to play a brass instrument is quite different to playing the drums, yet there isn't much difference between playing, say, a grand piano and an electric piano. Team this up with more than one instrument for a type, (e.g. violin, cello,) and the 5NF form will start to look much better. Please correct if I am wrong. --DavidHOzAu 11:24, 19 May 2006 (UTC)

Indeed, types of instruments such as brass, woodwind, strings would be better. The instrument - genre table is entirely bogus. --DavidConrad 21:23, 13 July 2006 (UTC)


This has the makings of a good example but I think goes slightly astray; for example it is not correct that “Joining these three tables together will return the original relation” (normalization is a reversible process). If the example were slightly different and the original table represented all permitted combinations, including the fact that Lois could play jazz piano, it would not be in fifth normal form and could be replaced by the three tables as shown. If the original table cannot be derived from (decomposed into) two or more simpler tables it is already in fifth normal form.

Assuming that the intention of the original table is to show actual combinations (not permitted), and is therefore already in fifth normal form, there is still value in creating separate tables to represent the constraints, but they do not allow the original table to be replaced. In fact these tables substitute for the permitted table that is otherwise concealed within (but derivable from) the table of actual occurrences (e.g. earlier it was possible to derive the fact that Lois could play jazz piano from: (a) Lois plays piano (b) piano is a jazz instrument (c) Lois plays jazz. There are two sets of multi-valued facts (actual and permitted) involving the same three values. To achieve fourth normal form they must be separated; to achieve fifth normal form the permitted table must be simplified into three pairs of tables!

There is a readable discussion of normalization in 'Data Modeling Essentials' by Graeme Simsion (http://www.simsion.com.au/)

Robertwilliamsau 08:10, 11 August 2006 (UTC)


[Hmmm. In the unnormalized table, Lois does NOT play Jazz Piano. But she appears to do so, in the normalized version.] User:206.11.112.251 16:59, 13 July 2006

[ Comment: As the comment above suggests, the three tables as presented do not accurately model the initial unnormalized table.

We can, however, "normalize" the initial table simply by interpreting it differently. Let's assume that we have three tables: a musician table, an instrument table and a genre table.

Each of these tables should have a primary key, which is generally numeric, but for this example assume that the primary keys in the musician table are "James", "Kate" and "Lois". The primary keys for the instrument table are "Piano", "Trumpet", "Drums", "Clarinet", "Saxophone", "Violin" and "Guitar". The primary keys for the genre table are "Classical", "Jazz", and "Rock".

With these assumptions, the above "unnormalized" table becomes a three-way many-to-many relationship table that is exactly the correct way to model the data. ] User:SteveHL 23:31, 8 August 2006


On reading the William Kent article and example, I think the problem with the example given is that 5NF only applies if there is a data rule such as "If person A plays instrument P and style Z, person A plays instrument I in style Z". This is (in real life) not true in the case of instruments and musical styles, which is why the example is unconvincing. The example given on the William Kent page talks about agents, vehicle manufacturers and vehicle types, but is also unconvincing in my opinion. I'm trying to think of a plausible example from a different situation so here are some suggestions, none of which I'm quite convinced about:

  • If engineer A knows the use of tool I and knows the maintenance of vehicle type P, then engineer A can use tool I to maintain vehicles of type P
  • If cook A knows the use of utensil I and knows the style of cooking P, then cook A can use utensil I to cook in style P.
  • If person A works for company I and has skill P, then person A uses skill P in the employ of company I.
  • If a company A makes product I and sells to market P, then company A sells product I in market P.
  • If a theme park A has customer I and ride P, then customer I can go on ride P.
  • If a person I knows language A and book P is in language A, person I can read book P.

The only ones of those where I think the rule is plausible are the last two, in particular the languages/books rule. But is it just a bit too artificial to have a table listing 'books people can read' with a separate entry for each person-language-book combination, to then normalise? --VinceBowdren 12:35, 9 August 2006 (UTC)


I've altered the example to something I reckon is more realistic. --Nabav 17:30, 14 August 2006 (UTC)


[edit] Minimal Cover

Minimal Cover should be in mentioned [2] --Tim 17:45, 14 March 2006 (UTC)

Agreed, but if we go into minimal cover, then we need to cover F and X closures. No pun intended. Railgun 18:05, 14 March 2006 (UTC)
The canonical (minimal) cover is not even mentioned in the functional dependency page, nor are closures. They would have to be elaborated there first. And because closures play a big role from BNCF and onward, it would make sense to explain them here as well. Then again, this is another reason to split the article because not all normal forms work with functional dependencies.--Citral 21:56, 14 August 2006 (UTC)

[edit] BCNF needs good example

Is it possible to put a practical/real example to understand BCNF? Bcos I'm finding it little difficult to understand this with just ABCDs! Many thanks for whoever does this.

I agree! The presentation lapses into shorthand that no-one besides an expert is likely to understand. I am, however, curious; is this the relational algebra of which one hears but almost never sees?
In its earliest form, this article was completely mathematically oriented, an example: http://en.wikipedia.org/w/index.php?title=Database_normalization&oldid=8906405 I really couldn't tell you. — Edward Z. Yang(Talk) 21:20, 4 June 2006 (UTC)
An example has now been provided. --Nabav 14:14, 15 August 2006 (UTC)

[edit] No definition of multi-valued dependancies

The Fourth Normal form is described as being similar to the third normal form following the removal of multi-valued dependancies, which are not defined or explained at all.

I'm fixing that with the creation of the multivalued dependency page. Cheers, --Citral 12:22, 11 August 2006 (UTC)

[edit] expert?

Anyone know why this article has an "expert" tag on it? What was specifically in need of review? -- Mikeblas 15:01, 6 June 2006 (UTC)

While normal forms one through three are fairly sufficient for most needs, the examples for 4NF and 5NF are horrendous. That's primarily the reason. — Edward Z. Yang(Talk) 19:35, 6 June 2006 (UTC)

[edit] 2nd / 3rd NF

I'm revising for my exams on databases at the moment... and I'm sure that 2NF and 3NF are identical in this article - not the same as in my book... or am I just reading it all wrong? Paul 14:16, 17 June 2006 (UTC)

Yea, the formal definitions out of Elmasari [ISBN 0321122267] are easier to understand than this article. I think this is a perfect example of good brainstorming that is now in need of good editing. I'd like to see the normal forms defined both informally (for the non-techies) as well as their formal definitions. It's all there, just not organized. It's easy to say 2NF is no partial key dependencies and 3NF no transitive - but as we use to say, "that's the undergrad definition." Railgun 14:29, 17 June 2006 (UTC)

[edit] some more examples

You may find the examples here useful: http://www.robin-beaumont.co.uk/rbeaumont/virtualclassroom/chap7/s7/index.htm

[edit] Split the article

I am in favour of splitting each of the normal forms into a separate article. I think the separate articles should contain all the specifics and formal definitions, and we can maintain a pithy, abstract summary about each here. Any detractors? — Direvus 19:06, 6 August 2006 (UTC)

I agree: normal forms are a complex isssue that requires more in depth explanation of each. There might be a general article about normalization but there should also be a more profound article on each normal form. Relational databases are quite important these days and normalization is a big issue. --200.122.153.226 17:53, 12 August 2006 (UTC) (Crio de La Paz).

Is there any reason not to split this article? I think it's more than safe to go about splitting now. --Citral 22:01, 14 August 2006 (UTC)

I've started the split by moving the 5NF section out to a new article. --VinceBowdren 10:15, 15 August 2006 (UTC)

I support splitting each normal form as a seperate article as they are one of complex to understand yet very important topics in database designing. --Rahullahurikar 12:55, 2 September 2006 (UTC)

I too am for splitting each norm into separate articles.

I agree, great idea. 219.88.83.179 22:08, 6 October 2006 (UTC)

[edit] No defn of join dependencies

5NF seems to be defined in terms of join dependencies, which we don't have a definition of yet. --VinceBowdren 10:19, 15 August 2006 (UTC)

[edit] Template and category for split articles

The Databases template is a bit big, and looks like overkill for the separate articles for 1NF, 2NF etc. How about we keep that template on the main article, but introduce a new one for the split articles, looking something like this:

Topics in Database normalization

First normal form | Second normal form | Third normal form
Boyce-Codd normal form | Fourth normal form | Fifth normal form | Sixth normal form

Also a category called 'Database normalization' (probably a subcategory of the Database constraints category), to link them all together. --VinceBowdren 12:13, 16 August 2006 (UTC)

Done, see Template:Database normalization. I made the category a subcategory of the Databases category in the end, seemed more appropriate there. --VinceBowdren 19:07, 21 August 2006 (UTC)

[edit] Denormalization

Just happened across a separate article about Denormalization. I've now linked to it from here, so now we should get on with replacing the long description of denormalization here with a potted summary, and expanding the separate article. --VinceBowdren 18:57, 21 August 2006 (UTC)

[edit] Useful references

Just came across this [3]. I haven't yet edited any of the articles as a result, so I'm just linking it here for now. --VinceBowdren 08:43, 22 August 2006 (UTC)

And another one: [4] --VinceBowdren 08:52, 22 August 2006 (UTC)

[edit] minor edit

I changed the beginning bit from:

==Description=smita

to

==Description== --Wakimakirolls 20:13, 14 October 2006 (UTC)