Talk:PostgreSQL

From Wikipedia, the free encyclopedia

Contents

[edit] Joins?

Does anyone know about the automagick joins feature in this article? There doesn't seem to be one, but I'm not PostgreSQL expert. Eurleif 15:21, Jan 18, 2004 (UTC)

[edit] what's with the criticism section?

I took the criticism section completely out (was renamed shortcomings). It was poorly written and inaccurate. The information that was presented there was more approraite for a PostgreSQL faq entry than on this page.

The few bits of good information (for example: a slony mention) would be more appopriate in a 3rd party support section, etc.

Merlin

It looks like it was written by a Mysql fanatic who dug anything he could in order to make Pgsql look bad. Lack of replication support and native win32 ports are indeed important issues, but the rest doesn't merit being mentioned, IMO.

I agree that the latest item on the "Criticisms" list is badly worded (how about "Due to PostgreSQL's MVCC transaction design, some operations (like count(*)) are slower than one might expect at first."), but I find the criticism list good. All the DBMS pages should have sections on the good/bad sides of the DBMS. That way, Wikipedia may become an interesting and useful encyclopedia, and not just yet another an advocacy advertising pillar. I always feel sceptical when software descriptions don't tell about weaknesses: The rest of the information in such pages becomes dubious. Hopefully, the "Criticisms" section will shrink when PostgreSQL 7.5 is released.

TroelsArvin 18:40, 2 Jun 2004 (UTC)

regarding Due to PostgreSQL's MVCC. I think the specific technical reasons for one or another shortcoming should not be mentioned because it would make the article too bloated. It should be just a list of things that an average user should be aware about when making a decision to use PGSQL. If you don't like wording, reword it. User:Gene s
Uh, when is an "average user" in charge of picking database software? cbraga 13:19, Jun 3, 2004 (UTC)
That depends on a definition of an average user Gene s 14:14, 3 Jun 2004 (UTC)
I think the prevous critisim section was better than the present watered-down version. It's also not a good idea to use statements like Mysql fanatic unless you have any facts to substantiate it. I belive it's wrong to scrap factually correct information because it "makes something look bad". If you don't like wording, reword it. Renaming section is fine, otherwise I want to roll back the latest change.

User:Gene s

Well, I was precisely rewording it. The 'Criticism' section has too many details and does not fit well in this article. It gives the impression that the software is incomplete and unuseable, which is not the case. It mentions many database terms such as COUNT(), 'materialized views', 'updateable views' in a prominent manner which the 'average user' will have no idea what they're about. It mentions replication without giving any hint of why it's important. I really don't think you were justified in rolling back the changes to the original version which I consider inadequate for the reasons I just mentioned, so I'll put mine back on. I suppose it could be complemented if you feel the need, however please stick to the important points. Sorry about the 'fanatic' thing. cbraga 13:19, Jun 3, 2004 (UTC)
No, that's not rewording. You took a factually correct section written by many people over an extended period of time and replaced it with your own which is not as precise or deep. This is not a PGSL advocacy group. It does not matter if correct information makes PGSQL look good or bad. The thing that matters is if it's correct or wrong, readable or not. So, there are two ways to deal with it. (1) Professional, by trying to find a compromise. (2) Unprofessional, by having a roll back fight. I propose the (1) - since you were the last one to roll back my and other people changes, you restore it back and then reword it keeping all the items on the list.

Gene s 14:14, 3 Jun 2004 (UTC)

Well then call it rewriting. Look, sometimes it is necessary to rewrite a whole section of an article because it is not adequate. Such was the case with the criticism section for the reasons I stated before. I don't really care if many shorcomings are listed, only that they are factual and actually relevant to the article. Some of the previous itens were simple inappropriate here. Others gave the false impression that PostgreSQL is an incomplete and unusable product. So I rewrote them into something better: factual, correct and concise. I'm sure you can see that the fact that the original section was written by many people over an extended period of time does not mean it will be well written in the end. In this case, it was not. Also, 'Criticism' is a very poor title choice. You can't criticise software any more than you criticise a hammer. Software and hammers have shortcomings. Thanks. cbraga 23:43, Jun 3, 2004 (UTC)
I have no objection to changing the title. I object to removal of valid points. PGSQL is not suitable for some applications and it should be noted. It's not a good idea to hide problems and let people discover them well into the project. Let's rewrite it some more and put back valid criticism. Gene s 07:07, 4 Jun 2004 (UTC)

I think it's reasonable to compare and contrast the "criticisms" that have been placed in this article with those in the article for MySQL. The MySQL article describes actual criticisms expressed by experts in the field, such as C. J. Date, and the way that MySQL's creators have responded to them. It goes on to describe objective errors in MySQL documentation, and the fact that they have been remedied. It also treats with the response of a part of the user community to MySQL AB's license change. It is not a list of Wikipedia editors' complaints about the product, but a description of a sort of dialogue that has gone on between MySQL's creators and the database community (including MySQL users).

Wikipedia is not supposed to present "original research" or the bare opinions of Wikipedia editors. When we call an article section "Criticisms" it needs to refer to criticisms made by authorities in writing elsewhere. Our own complaints do not fit the bill. --FOo 02:59, 4 Jun 2004 (UTC)

This is a very good idea. Could you provide links to such authoritive opinions so they can be incorporated into the article? I can't completely agree that user opinions are not worthy of inclusion. And I think that issues considered important by PGSQL developers deserve close attention. For example the matter with poor performance of aggregates takes nearly half of the agregates page in the documentation. Need for VACUUM is also a specific feature of PGSQL which is unexpected.
Gene s 07:07, 4 Jun 2004 (UTC)


The statement and the database functions normally while it runs regarding VACUUM is flat out incorrect with PGSQL 7.4 Gene s 14:32, 4 Jun 2004 (UTC)

Quoting from http://www.postgresql.org/docs/7.4/static/maintenance.html#ROUTINE-VACUUMING
Beginning in PostgreSQL 7.2, the standard form of VACUUM can run in parallel with normal database operations (selects, inserts, updates, deletes, but not changes to table definitions). Routine vacuuming is therefore not nearly as intrusive as it was in prior releases, and it's not as critical to try to schedule it at low-usage times of day.
cbraga 16:00, Jun 4, 2004 (UTC)
Can run in parallel is not the same as database functions normally while it runs. Not as critical is simply becuase earlier the database was completely unuseable while it ran. Now it's kind of useful if load is light. It does affect performance A LOT, even with 7.4. Why would it otherwise be adviseable to schedule it at night if it weren't affecting the performance?
Gene s 07:58, 8 Jun 2004 (UTC)
Trivially, any other use of the database server "affects performance". That is, if we measure "performance" by responsiveness to user queries, then the presence of any other activity on the server "affects performance". This would include the activity of another user, or the running of a database dump for backup, or even unrelated activity on the same computer as the database server, as well as a VACUUM.
It's a principle of system administration that one schedules non-time-critical batch processes for times when few time-critical interactive processes are running. This is exactly as true of VACUUM as it is of full backups or any other big, heavy batch process. Making VACUUM sound exceptional in this regard is misleading. --FOo 14:12, 8 Jun 2004 (UTC)
Absolutely correct. Just like stating that it does not affect performance is also misleading (it was so in the previous edition). The current edition seems to fit the bill. Gene s 14:21, 8 Jun 2004 (UTC)

[edit] MVCC

The article currently states:

"Concurrency is managed via a Multi-Version Concurrency Control (MVCC) design, which ensures excellent performance even under heavy concurrent access"

Isn't it the case that MVCC improves _read_-concurrency (a read will never block or be blocked) while write concurrency isn't affected?

MVCC improves overall concurrency, since without it a single write operation will lock the full table before proceeding. With MVCC, multiple reads and writes can go on in parallel.
Surely that would depend on how it's implemented? A single write could lock only the row, only the page or perhaps the whole table, depending on the details of the implementation. You don't need to be in the SQL-92 read uncommitted mode, which MVCC appears to be, to have multiple concurrent reads and writes. Jamesday 15:13, 17 July 2005 (UTC)
The whole selling point of MVCC is that you can guarantee read committed isolation without read locks. Or in the serializable isolation level, read locks are acquired, but they do not conflict with concurrent write locks, thus again, improving concurrency. Read [1] for details. -- intgr 10:46, 31 January 2007 (UTC)

[edit] Flawed view of the relational model

I changed a few grand (and incorrect) statements wrt the relational model. For example,

Primary among these was the relational model's inability to understand "types", combination of simpler data that make up a single unit. Today we typically refer to these as objects.

Was totally incorrect becuase:

  • he relational model deals with types quite well.
  • In 1985, there was no working implementation of the relational model (AFIK there still isn't one)
  • General sentiments refelect a very common falacy of confusing SQL databases with relational databases (SQL != Relational).
  • "units" of data are most certainly not referred to as "objects" except within the narrow realm of oo programming. Many statements in the article reflect a heavy bias towards OO philosophies which are certainly not universal in the database world.

    Merlin

[edit] VACUUM

Vacuuming doesn't actually remove the old data, only marks it such that the space can be reused by new data. Hence the size of the database files on disk does not change. To actually shrink those files you need to do VACUUM FULL which will lock the whole database. Also vacuum full is not usually of any advantadge since an active database would grow again. cbraga 02:28, Jun 11, 2004 (UTC)

[edit] Type inheritance?

Could someone please confirm that this is actually correct? It seems to state outright that PostgreSQL has type inheritance, and I'm not sure that it does. Currently the 5th paragraph in the Description section is:

PostgreSQL also allows types to include inheritance, one of the major concepts in object-oriented programming. For instance, one could define a post_code type, and then create us_zip_code and canadian_postal_code based on them. Addresses could then be specialized for us_address and canadian_address, including specialized rules to validate the data in each case.

It's possible that it's referring to table inheritance already described elsewhere, which I don't personally consider quite the same as type inheritance. To the best of my admittedly restricted knowledge, PostgreSQL doesn't have type inheritance. At least, I can't seem to find anything definitive about it on the CREATE TYPE documentation page. It only talks about composite types, base types and array types. There's no mention of inheritance in any of them. Is it an undocumented feature?

Izogi 01:56, 20 Jul 2004 (UTC)

[edit] Illustra

This doesn't have anything to do with PostgreSQL

Illustra's product was first introduced in 1991, where it was used in the Sequoia 2000 project late that year. By 1995 the product had added an ability to write plug-in modules they referred to as DataBlades. Unlike other plug-in technologies, with DataBlades external authors could write code to create new low-level datatypes, and tell the database how to store, index and manage it. For instance, one of the most popular DataBlades was used to create a time-series, a list of one particular variable over time, often with gaps. For instance, the price of a stock over time changes, but there are times, like weekends, where the data does not change and there is no entry. Traditional databases have difficultly handling this sort of task; while they can find a record for a particular date, finding the one that is "active" in one of the gaps is time consuming. With the Time Series DataBlade, this was fast and easy.

DataBlades were incredibly successful and started to generate considerable industry "buzz", eventually leading Informix to purchase the company outright in 1996. Industry insiders claimed that it would not be possible to merge the two products, but in fact this was fairly easy because both were based on the original Ingres code and concepts. Informix released their Illustra-based Universal Server in 1997, leaving them in an unchallenged position in terms of technical merit. Roadrunner 22:53, 29 Jul 2004 (UTC)


[edit] PostgreSQL v MySQL??!!

We need section about it! where the comparations??


Added typical PostgreSQL vs. MySQL flame. I tried to summarize the typical PostgeSQL v. MySQL battle (and to point out that there is a flame war). Ultimately every criticism I've ever seen on the issue was boiled down to one side saying that PostgreSQL was bloated and the other side saying that MySQL was a toy.

Roadrunner 23:10, 29 Jul 2004 (UTC)

You took a section about specific PgSQL shorcomings and turned it into a "typical PostgreSQL vs. MySQL flame" which is cleary NOT what it should be. Your edition appears to be a regression. Gene s 04:06, 30 Jul 2004 (UTC)

[edit] "direct understanding of relationships"

I've removed the following text from the page, on the grounds that it is vague and nonsensical. AFAIK, PostgreSQL does not actually have a "direct understanding of the relationships that exist between tables." I suspect that some earlier version of POSTGRES or a related product may have had the capability to do this, but AFAIK PostgreSQL does not. The text is vague about the definition of the tables involved and how exactly this "direct relationship" is established, so perhaps I'm missing something. If someone would care to (a) provide links to the PostgreSQL documentation that discusses this feature (b) provide working SQL that actually takes advantage of this alleged functionality, I'd be happy to re-add the text and rework it to be less vague. Neilc 07:13, 28 Sep 2004 (UTC)

This paragraph is just awful: "The SQL data stores simple data types in "flat tables", requiring the user to gather together related information using queries. " etc A relation is a subset of a cartesian product. Sounds like this was written by an uninformed amateur. user:gtoomey 220.240.152.221 15:13, 19 Jan 2005 (UTC)

Another very useful feature of PostgreSQL involves direct understanding of the relationships that exist between tables. People in the real world typically have several addresses, which the relational model approaches by storing the addresses in one table and the rest of the user information in another. The addresses become "related" to a particular user by storing some unique information, say the user's name, in the address table itself. In order to find all the addresses for "Bob Smith", the user writes a query that "joins" the data back together, by selecting a particular name from the users table and then searching for that name in the address table. Doing a search for all the users in New York can become somewhat complex, requiring the database to find all the user names in the address table, then search the user table for those users. A typical search might look like this:

 SELECT u.* FROM user u, address a WHERE a.city='New York' AND a.user_name=u.user_name

PostgreSQL can explicitly define the relationship between users and addresses. Once defined, the address becomes a property of the user, so the search can be greatly simplified to:

 SELECT * FROM user WHERE address.city='New York'

This code requires no "join": the database itself understands the user.address relationship.

A related example shows the usefulness of types. If one uses PostgreSQL to do:

 SELECT address FROM user

then the database filters the results automatically, returning only those addresses for users, not those for companies or other objects that might also use the address table.

[edit] MVCC vs. locks vs. read uncommitted

The recent revision that claimed that MVCC was another name for the "read uncommitted" SQL standard isolation level was mistaken. Postgres provides two isolation levels (read-committed, the default, and serializable); these two levels are implemented via MVCC (MVCC is an implementation technique, whereas the isolation levels describe what behavior is exposed to the user). Also, MVCC does not mean that out of date information is ever returned, merely that multiple versions of tuples are stored in order to decrease locking requirements. MVCC also does not mean that locks are not used -- Postgres uses both locks and MVCC (row-level, page-level, and table-level locks depending on the circumstance). See the chapter on concurrency in the Postgres docs for more information. Neilc 15:18, 17 July 2005 (UTC)

I checked that on the IRC channel prior to making any changes and was told that yes, out of date rows (rows changed by operations started after the beginning of the query) would be returned. The manual page you referenced says the same thing: "This means that while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data". Not sure where MVCC belongs - does it really belong there or should it be in some technology section? Probably too obscure to really be in the article, since it seems common to at least PostgreSQl and MySQL.
Talking here about how a query can get a non-exclusive read lock to ensure it's seeing the latest verion of a row, without using SELECT .. FOR UPDATE to get an exclusive write lock and serialised, no concurrency, access it doesn't need. Looks as though that's new for 8.1, currently a development rather than production version. I know the MySQL article distinguishes between what's in development versions and what's in production - should we consistently do it to include development features or consistently do it to set them apart so people know what's released? I know I prefer keeping them apart, since it can take a while for a development version to enter full production release. Jamesday 15:18, 20 July 2005 (UTC)
I'm confused as to what you're talking about. Postgres' behavior is quite standard as far as I know, and does not depend on any new features in 8.1 (8.1 introduces SELECT FOR SHARE, but that is rarely required by applications). As far as "out of date rows" go, it depends on the isolation level. A transaction in serializable isolation level sees a consistent snapshot; if it attempts to UPDATE a row that has been modified by a committed transaction that began after it began (i.e. after its snapshot was defined), it will abort itself (since the two transactions conflict). In read-committed isolation level, a query's snapshot is consistent for the length of the query; new rows added by a committed transaction during the query's execution will not be visible to the transaction, but that is what one would expect. Write conflicts are handled via blocking one of the transactions involved until one transaction has been committed, and then checking the WHERE of the update/delete to see if it is still applicable. The docs describe this in more detail, so I won't go on -- my point is just that I don't see how this is surprising / confusing / undesirable behavior. Neilc 22:25, 24 September 2005 (UTC)
My two cents on MVCC - the previous paragraph by Neilc seems correct to me. Some people seem to be misunderstanding what MVCC is. MVCC never returns "out of date rows" - the data returned is always consistent with what was visible at the start of the transaction. True, it is possible that the data may have been changed by another transaction by this point, but as far as the current transaction is concerned this modification has not yet taken place. MVCC provides an elegant mechanism for achieving such consistency without locking data. Mike.

[edit] addons vs. builtin features

Is there merit in distinguishing between these two? For example, some of the "features" in the first list are actually implemented as addons (e.g. PL/PHP, PL/Java, PL/sh, PL/R, PL/Ruby), but I'm not sure there's much to be gained by making the distinction explicit. Anyway, if people think they should be separated, I won't object, but I just wanted to note that there are more addons than just PostGIS. Neilc 15:25, 17 July 2005 (UTC)

Of course. One is what the program as supplied by the vendor does, the other is what you can get other products to do. I don't know PostgreSQL well enough to know all of the add-ons, though, so I only did it for one I was fairly sure of. Would be nice if someone distinguished for the rest, so we have it clear what PostgreSQL's standard distribution does and what other tools do. Jamesday 15:18, 20 July 2005 (UTC)

[edit] History and description

Would not the history section be better after the description and features? It's a fairly dry section, and I can't imagine that's what most people want to read about first.

The description section is also poorly written and needs some cleaning up, as it ranges all over the place, leaning into a little advocacy at times, veering into obscure tangents (e.g. reasons why oo dbs are difficult), and overall is hard to read, especially for a non-technical person.

I also nominate this as the worst sentence: 'The SQL data stores simple data types in "flat tables"' :)

Turnstep 13:44, July 19, 2005 (UTC)

Agreed about history and advocacy - no surprise though - PostgreSQL has some really passionate supporters. Trying to get a non-technical overview is tough for this but it's worth trying, even if the article does have to go into some depth later, because it's the depths which the people who know the field need to know. That discussion of locking above is pretty obscure to most people not involved inwriting programs which use databases but it makes a massive difference when you're handling lots of operations simultaneously and need a guarantee that you're seeing the current data - it's the sort of thing you need when showing stock levels. Jamesday 15:18, 20 July 2005 (UTC)

[edit] Major Rearrangement

I finally got around to cleaning up the page a lot. It's still in rough shape, but most of the long, boring, lecture like sections are gone. Still needs a lot of work though, but I'm done for now, so have at it. Turnstep 01:15, 6 October 2005 (UTC)

[edit] Split off/Create Postgres

I think that there should be a separate Postgres article, with more details on Postgres... 132.205.45.110 18:08, 19 October 2005 (UTC)

Separate from what? Turnstep 18:29, 19 October 2005 (UTC)

[edit] market statistics for pgres growth...

i'm writing an open source article for a major online publication and would love to get some statistics for growth in pgres usage, or website stats for the postgresql website or anything that would show some measurement of interst level that goes back at least 5 years.

any ideas?

Such enquiries should be addressed to josh AT postgresql . org --Gsherry 04:55, 14 February 2006 (UTC)

[edit] Administration tools

I propose a section listing (and perhaps briefly describing) all major command line and GUI administration tools available for PostgreSQL. Unfortunately I do not have the knowledge to reliably constuct such a section myself. Mike.

I do not think that would fly well; with the PostgreSQL Management Tools article recently deleted (per Wikipedia is not a link directory), I wouldn't suggest anyone to attempt it either. Rather, I'll simply link to PostgreSQL at the Open Directory Project (suggest site) . -- intgr 11:29, 31 January 2007 (UTC)

[edit] relational database rewrite

I am trying to rewrite relational database and am soliciting opinions. I am particularly interested in bringing in the practical and popular definitions of the term to counter the current article's domination by the "theoretical" crowd. Ideogram 11:13, 11 June 2006 (UTC)

Relational databases, like any other mathematical construct, are not defined by what is popular. The ignorant may think that pi == 22/7 or that SQL NULLs are a relational feature, but that doesn't make it so. --FOo 23:09, 11 June 2006 (UTC)
The term "relational database" is not owned by the mathematicians. The vast majority of people are introduced to the term by claims that Oracle, SQL Server, and PostgreSQL are relational databases. An encyclopedia article is not the place to push your POV that popular usage is "ignorant". Ideogram 00:22, 12 June 2006 (UTC)
The relational model is a mathematical construct. A relational database is a piece of software. Ideogram 00:31, 12 June 2006 (UTC)

[edit] License?

The FAQ describes the license as "classic BSD", but it clearly isn't. It's a simple permissive license, more like the MIT license. It seems like it would be more clear to edit the article here to reflect this rather than calling it a BSD-style license. Can anyone point to any version of BSD ever released under this license? Or maybe the license text in the PostgreSQL FAQ is incorrect? --Johnsu01 17:49, 24 July 2006 (UTC)

Well, the Postgres developers certainly consider it to be "BSD licensed" (not "classic BSD" in the "4 clause BSD" sense). I think the license is functionally equivalent to the first two clauses of the original BSD license. Can we just call it a "BSD-style" license and leave it at that? Neilc 00:36, 25 July 2006 (UTC)

[edit] Prominent Users

Under prominent users Sony Online is listed as a PostgreSQL user which is technically incorrect. Sony Online are using EnterpriseDB (http://www.enterprisedb.com), which is based on PostgreSQL. Apropriate changes should be made.

How about we just remove it alltogether? It'll get filled up with cruft like this. — ceejayoz talk 21:33, 7 November 2006 (UTC)

[edit] Triggers on views

test=# create trigger bar BEFORE INSERT ON foo EXECUTE PROCEDURE eek();
ERROR: "foo" is not a table
84.160.218.201 18:58, 14 October 2006 (UTC)

Yeah, from an implementation point-of-view, attaching triggers to views doesn't make a lot of sense (triggers are fired by the executor, but by the time the query plan reaches the executor, references to views have already been replaced with the view's definition). Neilc 19:27, 14 October 2006 (UTC)