Talk:Database normalization

From Wikipedia, the free encyclopedia

Archives: 1 2

Contents

[edit] Spelling: normalize vs normalise

'Normalize' is perfectly correct ([1]) spelling of the word. Changing it would be no improvement, and would render the article inconsistent with most usage, and inconsistent with the other wikipedia pages on database theory. --VinceBowdren 20:39, 7 January 2007 (UTC)

[edit] Incomplete Information and potential source for tremendous bias

This is the first article I've encountered in the Wikipedia that I have the potential to help with. I'm not sure of the best way to go about doing so. Some of these comments also apply to the dimensional database article too.

In the Products section, Cache is listed, but not the original MUMPS that it was based on, also the current crop of multi-valued aka post-relational aka multidimensional databases (OpenQM, QM, uniVerse, UniDATA, mvBase, mvEnterprise, jBase, D3, etc.) are not mentioned.

There are also links to web pages by Fabian Pascal. He is a paid author and is known for his mischaracterizations or, and rants against, multi-value databases. His typical "argument" style consists of making a statement, and when it's argued, his reply is essentially no more than "The math proves it and if you don't see that you're too stupid to ague with." Also, when people point inaccuracies and mischaracterizations in his examples, the typical response is along the lines of, "You're too stupid to know your own product." Finally, when people point out his bias, he defends himself by trashing one of the SQL-style databases that he doesn't derive income from.

ElRay 16:27, 7 February 2007 (UTC) Ray

Erm... that's a very strong assertion. Can you source that statement? (BTW, I took the liberty of converting some of your text to links)
Regarding the Products section, I believe what they're trying to do is list some of the most widely used databases: from my experience, the post-relational databases haven't gained too much traction yet. For a complete list of DBMS, you'd want to consult Category:Database management systems. Anyway, it doesn't have anything to do with this article: it's a template that's transcluded throughout here. You'd want to bring your objections to this page: Template talk:Databases.
I'm glad to see someone from the field volunteering to help out this article. It is in desperate need of expert attention. If you have any more questions, don't hesitate to ask! — Edward Z. Yang(Talk) 02:34, 8 February 2007 (UTC)

[edit] Teaching the Wrong Thing

Please do not teach the wrong way how to apply normalization. Starting with the universal relation (including all attributes) as a starting point is the worst way to go. Every expert in the field will tell You: Never start modeling with the universal relation! Don't try to normalize in praxis, since every thing is decomposed and normalized almost always. Just tell this your readers: Normalization is only theory failing always when practical problems are considered. --213.61.130.220 09:09, 1 March 2007 (UTC)

As a software architect (whatever that means ;-) and designer, specialising in database design for the last 12 years, I agree with 213.61.130.220 (gasp). Let's try not to obfuscate the real issue here, which is how does database normalization help real software projects! Let's not pull out the old overworn examples. Let's use instead a real-world approach, which starts best by gathering business knowledge of the entities involved and then testing the documented assertions against the available data, then reconsidering where attributes should normalise to.
The true benefits of real-world normalization only become evident as you consider pros and cons of alternative designs using various normal forms. How can we demonstrate this to our readers?
Can we start to collect here, some business cases and design cases, for use of the more esoteric normal forms? Under what circumstances does the cost-benefit favour these approaches?
-- Matthew 1130 12:44, 1 March 2007 (UTC)
213.61.130.220 says "Don't try to normalize in praxis, since every thing is decomposed and normalized almost always." By "everything", if I'm interpreting him correctly, he means "everything that's been modelled using common sense". This accords with what's often said about normalization, that it's merely formalized common sense. If you design sensibly, your design will come out normalized. I acknowledge this in the article in the passage which reads, "A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms." Also see the introduction to the example, in which it is acknowledged that the example is somewhat contrived for the purposes of illustrating the way the normal forms allow anomalies to be overcome; it is not meant to reflect what typically happens in design. Remember: the article is primarily concerned with explaining what it means to achieve the various normal forms. How one achieves them is another story. Finally, let me reply to Matthew's point about the more esoteric normal forms. It is actually very hard not to achieve these normal forms. In the vast majority of cases, a design adheres to those esoteric normal forms already, regardless of whether the designer has any knowledge of them. See the passage just referenced. There would not be business cases and design cases for meeting those esoteric normal forms, because except in rare cases, one meets them without trying! --Nabav 17:34, 12 March 2007 (UTC)
Thank you for clarification, Nabav. I agree in general. Why not to include your complete paragraph above into the article, since it's the thruth about normalization for me and for others? Why is 'normalization' (not to be confused with 'normal forms') still in the textbooks after 30 years of the relational model forcing Wikipedia to follow this tradition? What are the historical, academic or economic reasons? Who ever writes down functional dependencies, to say nothing on multivalued dependencies, accept in lessons or exams, however? Why using contrived examples and justifying this academic tradition by apologizing in the first paragraph (and by for the sake of this example phrases, which sound like a confession anyway)? Why not to delete the normalization example section completely instead of apologizing, justifying and confessing? Even in most textbooks, designers are first guided to model with entities and relationships or some linguistic approach in mind. It can be proven that some restricted formalism always lead to relational models satisfying at least BCNF, not to talk about PJ/NF or inclusion dependencies ... --213.61.130.220 09:56, 15 March 2007 (UTC)
These are good points, and have made me think about what 'normalization' is essentially about for me. I have some ideas for how the article could be changed to address some of your concerns. Unfortunately I don't have a great deal of time at the moment, but as a long-term ambition I would like to a) make the article convey a better sense of what normalization is (and isn't), and b) add material about the relation between normal forms and E/R modelling, and other approaches to modelling. --Nabav 07:17, 22 March 2007 (UTC)

[edit] Examples

Wouldn't it be easier to have E-R diagrams instead of the actual tables? It would be much more intuitive for someone who doesn't fully grasp the data in the table at first glance. --68.239.240.144 00:11, 21 March 2007 (UTC)

the thing about database normalization is that it's very dependant on the data in tables and implications that can be drawn from the data. ER diagrams don't convey nearly the same message. :( McKay 14:31, 21 March 2007 (UTC)