PostgreSQL

PostgreSQL

Developer(s)	PostgreSQL Global Development Group
Stable release	8.4.4 / 8.3.11 / 8.2.17 / 8.1.21 / 8.0.25 / 7.4.29 / May 17, 2010; 8 months ago (2010-05-17)
Preview release	9.0 beta4 / August 2, 2010; 6 months ago (2010-08-02)
Written in	C
Operating system	Cross-platform
Type	ORDBMS
License	PostgreSQL licence^[1]^[2]^[3]
Website	http://www.postgresql.org/

PostgreSQL, often simply Postgres, is an object-relational database management system (ORDBMS).^[4] It is released under an MIT-style license and is thus free and open source software. As with many other open-source programs, PostgreSQL is not controlled by any single company - a global community of developers and companies develops the system.

Product name

The mixed-capitalization of the PostgreSQL name can confuse some people on first viewing. The several pronunciations of "SQL" can lead to this confusion. PostgreSQL's developers pronounce it /ˈpoʊstɡrɛs ˌkjuː ˈɛl/; (Audio sample, 5.6k MP3). It is abbreviated as "Postgres", its original name. Because of ubiquitous support for the SQL Standard amongst most relational databases, the community considered changing the name back to Postgres. However, the PostgreSQL Core Team announced in 2007 that the product would continue to use the name PostgreSQL.^[5] The name refers to the project's origins as a "post-Ingres" database, the original authors having also developed the Ingres database. (The name Ingres was an abbreviation for INteractive Graphics REtrieval System.)

History

PostgreSQL evolved from the Ingres project at University of California, Berkeley. In 1982, the project leader, Michael Stonebraker, left Berkeley to make a proprietary version of Ingres. He returned to Berkeley in 1985 and started a post-Ingres project to address the problems with contemporary database systems that had become increasingly clear during the early 1980s. The new project, Postgres, aimed to add the fewest features needed to completely support types. These features included the ability to define types and to fully describe relationships – something used widely before but maintained entirely by the user. In Postgres, the database "understood" relationships, and could retrieve information in related tables in a natural way using rules. Postgres used many of the ideas of Ingres, but not its code.

Starting in 1986, the team published a number of papers describing the basis of the system, and by 1988 had a prototype version. The team released version 1 to a small number of users in June 1989, then version 2 with a re-written rules system in June 1990. Version 3, released in 1991, again re-wrote the rules system, and added support for multiple storage managers and an improved query engine. By 1993 the great number of users began to overwhelm the project with requests for support and features. After releasing version 4 — primarily a cleanup — the project ended.

But open-source developers could obtain copies and develop the system further, because Berkeley had released Postgres under a MIT-style license. In 1994, Berkeley graduate students Andrew Yu and Jolly Chen replaced the Ingres-based QUEL query language interpreter with one for the SQL query language, creating Postgres95. The code was released on the web.

In July 1996, Marc Fournier at Hub.Org Networking Services provided the first non-university development server for the open-source development effort. Along with Bruce Momjian and Vadim B. Mikheev, work began to stabilize the code inherited from Berkeley. The first open-source version was released on August 1, 1996.

In 1996, the project was renamed to PostgreSQL to reflect its support for SQL. The first PostgreSQL release formed version 6.0 in January 1997. Since then, the software has been maintained by a group of database developers and volunteers around the world, coordinating via the Internet.

The PostgreSQL project continues to make major releases (approximately annually) and minor "bugfix" releases, all available under the same license. Code comes from contributions from proprietary vendors, support companies, and open-source programmers at large.

Features

Functions

Functions allow blocks of code to be executed by the server. Although these blocks can be written in SQL, the lack of basic programming operations which existed prior to version 8.4, such as branching and looping, has driven the adoption of other languages inside of functions. Some of the languages can even execute inside of triggers. Functions in PostgreSQL can be written in the following languages:

A built-in language called PL/pgSQL which resembles Oracle's procedural language PL/SQL.
Scripting languages are supported through PL/Lua, PL/Perl, plPHP, PL/Python, PL/Ruby, PL/sh, PL/Tcl and PL/Scheme.
Compiled languages C, C++, Java (via PL/Java)
The statistical language R through PL/R.

PostgreSQL supports row-returning functions, where the output of the function is a set of values which can be treated much like a table within queries. Custom aggregates and window functions can also be defined.

Functions can be defined to execute with the privileges either of the caller or of the user who defined the function. Functions are sometimes referred to as stored procedures, although there is a slight technical distinction between the two.

Indexes

PostgreSQL includes built-in support for B+-tree, hash, GiST and GiN indexes. In addition, user-defined index methods can be created, although this is quite an involved process. Indexes in PostgreSQL also support the following features:

Expression indexes can be created with an index of the result of an expression or function, instead of simply the value of a column.
Partial indexes, which only index part of a table, can be created by adding a WHERE clause to the end of the CREATE INDEX statement. This allows a smaller index to be created.
The planner is capable of using multiple indexes together to satisfy complex queries, using temporary in-memory bitmap index operations.

Triggers

Triggers are events triggered by the action of SQL DML statements. For example, an INSERT statement might activate a trigger that checked if the values of the statement were valid. Most triggers are only activated by either INSERT or UPDATE statements.

Triggers are fully supported and can be attached to tables but not to views. Views can have rules, though. Multiple triggers are fired in alphabetical order. In addition to calling functions written in the native PL/PgSQL, triggers can also invoke functions written in other languages like PL/Perl.

MVCC

PostgreSQL manages concurrency through a system known as Multi-Version Concurrency Control (MVCC), which gives each user a "snapshot" of the database, allowing changes to be made without being visible to other users until a transaction is committed. This largely eliminates the need for read locks, and ensures the database maintains the ACID principles in an efficient manner.

Rules

Rules allow the "query tree" of an incoming query to be rewritten. One common usage is to implement views, including updatable views. Rules, or more properly, "Query Re-Write Rules", are attached to a table/class and "Re-Write" the incoming DML (select, insert, update, and/or delete) into one or more queries that either replace the original DML statement or execute in addition to it. Query Re-Write occurs after DML statement parsing, but, before query planning.

Data types

A wide variety of native data types are supported, including:

Variable length arrays (including text and composite types) up to 1GB in total storage size.
Arbitrary precision numerics
Geometric primitives
IPv4 and IPv6 addresses
CIDR blocks and MAC addresses
XML supporting Xpath queries (as of 8.3)
UUID (as of 8.3)

In addition, users can create their own data types which can usually be made fully indexable via PostgreSQL's GiST infrastructure. Examples of these include the geographic information system (GIS) data types from the PostGIS project for PostgreSQL.

User-defined objects

New types of almost all objects inside the database can be created, including:

Casts
Conversions
Data types
Domains
Functions, including aggregate functions
Indexes
Operators (existing ones can be overloaded)
Procedural languages

Inheritance

Tables can be set to inherit their characteristics from a "parent" table. Data in child tables will appear to exist in the parent tables, unless data is selected from the parent table using the ONLY keyword, i.e. select * from ONLY PARENT_TABLE. Adding a column in the parent table will cause that column to appear in the child table.

Inheritance can be used to implement table partitioning, using either triggers or rules to direct inserts to the parent table into the proper child tables.

As of 2010^[update] this feature is not fully supported yet—in particular, table constraints are not currently inheritable. As of the 8.4 release, all check constraints and not-null constraints on a parent table are automatically inherited by its children. Other types of constraints (unique, primary key, and foreign key constraints) are not inherited.

Inheritance provides a way to map the features of generalization hierarchies depicted in Entity Relationship Diagrams (ERD) directly into the PostgreSQL database.

Other features

Referential integrity constraints including foreign key constraints, column constraints, and row checks
Views. Although views are by default not update-able, they can be made so by creating "insert", "update", and/or "delete" Query Re-Write Rules on the view.
Inner, outer (full, left and right), and cross joins
Sub-selects
- Correlated sub-queries ^[6]
Transactions
Supports most of the major features of SQL:2008 standard ^[7]
Encrypted connections via SSL
Binary and textual large-object storage
Online backup
Domains
Tablespaces
Savepoints
Point-in-time recovery
Two-phase commit
TOAST (The Oversized-Attribute Storage Technique) is used to transparently store large table attributes (such as big MIME attachments or XML messages) in a separate area, with automatic compression.
Regular expressions ^[8]
Common table expressions
Embedded SQL is implemented using preprocessor. SQL code is first written embedded into C code. Then code is run through ECPG preprocessor, which replaces SQL with calls to code library. Then code can be compiled using C compiler. Embedding works also with C++ but it does not recognize all C++ constructs.
Full text search

Replication

The currently in beta testing PostgreSQL 9.0 includes built-in binary replication, based on shipping the changes made to all database blocks to other systems asynchronously after commmit. Adding synchronous replication is on the roadmap for the 9.1 release.

9.0 also introduces the ability to run read-only queries against these replicated slaves, where earlier versions would only allow that after promoting them to be a new master. This allows splitting read traffic among multiple nodes efficiently. Earlier replication software that allowed similar read scaling normally relied on adding replication triggers to the master, introducing additional load onto it.

There are several asynchronous master/slave replication packages also available. These remain useful even after introduction of the expanded core capabilities, for situations where binary replication of an entire database cluster isn't the appropriate approach:

Londiste (permissive license)
Slony-I (same license as PostgreSQL)
Mammoth Replicator. (same license as PostgreSQL^[9], formerly proprietary)
Bucardo (http://bucardo.org/) (Simplified BSD license; also does master to master replication)

There are also proxy (middleware) tools that enable replication, failover or load management and balancing for PostgreSQL:

PGPool-II.
Sequoia available for a number of different server besides PostgreSQL.

Some synchronous multi-master derivatives or extensions exist, including

pgcluster (same license as PostgreSQL)
Postgres-R (in early stages of development)
Postgres-XC (in early stages of development)

Add-ons

Performance wizard - included with any Postgres download from EnterpriseDB. Source code also available.
MySQL migration wizard - included with any Postgres download from EnterpriseDB. Source code also available.
Geographic objects via PostGIS. GPL.
Shortest-Path-Algorithms with pgRouting using PostGIS. GPL.

Database administration front-ends

Open source

psql

The primary front-end for PostgreSQL is the psql command-line program, which can be used to enter SQL queries directly, or execute them from a file. In addition, psql provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks; for example tab completion of object names and SQL syntax.

pgAdmin

pgAdmin is a free and open source graphical front-end administration tool for PostgreSQL, which is supported on most popular computer platforms. The program is available in more than a dozen languages. The first prototype, named pgManager, was written for PostgreSQL 6.3.2 from 1998, and rewritten and released as pgAdmin under the GPL License in later months. The second incarnation (named pgAdmin II) was a complete rewrite, first released on 16 January 2002. The current version is pgAdmin III, which was originally released under the Artistic License and is now released under the same license as PostgreSQL. Unlike prior versions that were written in Visual Basic, pgAdmin III is written in C++, using the wxWidgets framework allowing it to run on most common operating systems.

phpPgAdmin

phpPgAdmin is a web-based administration tool for PostgreSQL written in PHP and based on the popular phpMyAdmin interface originally written for MySQL administration.^[10]

Proprietary

A number of companies offer proprietary tools for PostgreSQL. They often consist of a universal core that is adapted for various specific database products. These tools mostly share the administration features with the open source tools but offer improvements in data modeling, importing, exporting or reporting.

See also: Comparison of database tools

Benchmarks and Performance

Many informal performance studies of PostgreSQL have been done^[11]. Performance improvements aimed at improving scalability started heavily with version 8.1, and running simple benchmarks the current version 8.4 has been shown to be at least 7.5 times faster on both read and write workloads compared with version 8.0.^[12]

The first industry-standard and peer-validated benchmark was completed in June 2007 using the Sun Java System Application Server (proprietary version of GlassFish) 9.0 Platform Edition, UltraSPARC T1 based Sun Fire server and Postgres 8.2.^[13] This result of 778.14 SPECjAppServer2004 JOPS@Standard compares favourably with the 874 JOPS@Standard with Oracle 10 on an Itanium based HP-UX system.^[11]

In August 2007, Sun submitted an improved benchmark score of 813.73 SPECjAppServer2004 JOPS@Standard. With the system under test at a reduced price, the price/performance improved from $US 84.98/JOPS to $US 70.57/JOPS.^[14]

The default configuration of PostgreSQL only uses a small amount of dedicated memory for performance critical purposes such as caching database blocks and sorting. This limitation is primarily because many operating systems will not allow allocating large blocks of shared memory by default, which means the database requires kernel changes before it can be tuned properly. Advice on basic recommended performance practice covering all recent versions of PostgreSQL is available at Tuning Your PostgreSQL Server.

Prominent users

Yahoo! for web user behavioral analysis, storing two petabytes and claimed to be the largest data warehouse using a heavily modified version of PostgreSQL with an entirely different column-based storage engine and different query processing layer. While for performance, storage, and query purposes the database bears little resemblance to PostgreSQL, the front-end maintains compatibility so that Yahoo can use many off-the-shelf tools already written to interact with PostgreSQL.^[15]^[16]
MySpace, a popular social networking website, is using Aster nCluster Database for data warehousing, which is built on unmodified PostgreSQL.^[17]^[18]
OpenStreetMap, a collaborative project to create a free editable map of the world.^[19]
Afilias, domain registries for .org, .info and others.^[20]
Sony Online multiplayer online games.^[21]
BASF, shopping platform for their agribusiness portal.^[22]
hi5.com social networking portal.^[23]
reddit.com social news website.^[24]
Skype VoIP application, central business databases.^[25]
Sun xVM, Sun's virtualization and datacenter automation suite.^[26]
Evergreen, an open source integrated library system providing an Online Public Access Catalog and cataloging, management, and other functions for hundreds of libraries in the United States, Canada, and elsewhere.
MusicBrainz, open online music encyclopedia.
International Space Station for collecting telemetry data in orbit and replicating to the ground.^[27]
MyYearbook social networking site^[28]

Awards

As of 2008^[update] PostgreSQL has received the following awards:^[29]

1999 LinuxWorld Editor's Choice Award for Best Database
2000 Linux Journal Editors' Choice Awards for Best Database
2002 Linux New Media Editors Choice Award for Best Database
2003 Linux Journal Editors' Choice Awards for Best Database
2004 Linux New Media Award For Best Database
2004 Linux Journal Editors' Choice Awards for Best Database
2004 ArsTechnica Best Server Application Award
2005 Linux Journal Editors' Choice Awards for Best Database
2006 Linux Journal Editors' Choice Awards for Best Database
2008 Developer.com Product of the Year, Database Tool

Proprietary derivatives and support

Although the license allowed proprietary products based on Postgres, the code did not develop in the proprietary space at first — somewhat surprisingly considering the advantages Postgres offered. The main offshoot originated when Paula Hawthorn (an original Ingres team member who moved from Ingres) and Michael Stonebraker formed Illustra Information Technologies to make a proprietary product based on Postgres.

In 2000, former Red Hat investors created the company Great Bridge to make a proprietary product based on PostgreSQL and compete against proprietary database vendors. Great Bridge sponsored several PostgreSQL developers and donated many resources back to the community,^[30] but by late 2001 closed due to tough competition from companies like Red Hat and to poor market conditions.^[31]

In 2001, Command Prompt, Inc. released Mammoth PostgreSQL, a proprietary product based on PostgreSQL. In 2008, Command Prompt, Inc. released the source under the original license. Command Prompt, Inc. continues to support the PostgreSQL community actively through developer sponsorships and projects including PL/Perl, PL/php, and hosting of community projects such as the PostgreSQL Build Farm.

In January 2005, PostgreSQL received backing by database vendor Pervasive Software, known for its Btrieve product which was ubiquitous on the Novell NetWare platform. Pervasive announced commercial support and community participation and achieved some success. In July 2006, Pervasive left the PostgreSQL support market.^[32]

In mid-2005 two other companies announced plans to make proprietary products based on PostgreSQL with focus on separate niche markets. EnterpriseDB added functionality to allow applications written to work with Oracle to be more readily run with PostgreSQL. Greenplum contributed enhancements directed at data warehouse and business intelligence applications, including the BizGres project.

In October 2005, John Loiacono, executive vice president of software at Sun Microsystems, commented: "We're not going to OEM Microsoft but we are looking at PostgreSQL right now,"^[33] although no specifics were released at that time. By November 2005, Sun had announced support for PostgreSQL.^[34] By June 2006, Sun Solaris 10 (6/06 release) shipped with PostgreSQL.

In August 2007, EnterpriseDB announced^[35] the Postgres Resource Center [4] and EnterpriseDB Postgres, designed as a fully configured distribution of PostgreSQL including many contrib modules and add-on components. EnterpriseDB Postgres was renamed to Postgres Plus in March 2008. Postgres Plus is available in two versions: Postgres Plus Standard Server which has all the features of PostgreSQL plus additional QA testing, integrated components, tuning and one-click install, and Postgres Plus Advanced Server which has all the features of Postgres Standard Server plus Oracle compatibility, scalability features, and DBA and developer tools. Both versions are available for free and are fully supported, though the free version of Postgres Plus Advanced Server is restricted by a "limited use" license, which is defined as "confined to a single CPU, utilizing 1 GB RAM, storing no more than 6GB of data in a NON-PRODUCTION environment."

References

↑ "PostgreSQL licence approved by OSI". 2010-02-18. http://www.crynwr.com/cgi-bin/ezmlm-cgi?17:mmp:969. Retrieved 2010-02-18.
↑ "OSI PostgreSQL licence". 2010-02-20. http://www.opensource.org/licenses/postgresql. Retrieved 2010-02-20.
↑ http://www.postgresql.org/about/licence
↑ What is PostgreSQL?
↑ "Project name - statement from the core team". archives.postgresql.org. 2007-11-16. http://archives.postgresql.org/pgsql-advocacy/2007-11/msg00109.php. Retrieved 2007-11-16.
↑ [1]
↑ [2]
↑ [3]
↑ Command prompt's Mammoth replicator page lists replicator's availability as "open source"
↑ phpPgAdmin Project (2008-04-25). "About phpPgAdmin". http://phppgadmin.sourceforge.net/?page=about. Retrieved 2008-04-25.
↑ ^11.0 ^11.1 Josh Berkus (2007-07-06). "PostgreSQL publishes first real benchmark". http://blogs.ittoolbox.com/database/soup/archives/postgresql-publishes-first-real-benchmark-17470. Retrieved 2007-07-10.
↑ Josh Berkus (2009-09-29). "PostgreSQL history". http://suckit.blog.hu/2009/09/29/postgresql_history. Retrieved 2010-08-28.
↑ "SPECjAppServer2004 Result". SPEC. 2007-07-06. http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070606-00065.html. Retrieved 2007-07-10.
↑ "SPECjAppServer2004 Result". SPEC. 2007-07-04. http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070703-00073.html. Retrieved 2007-09-01.
↑ Eric Lai (2008-05-22). "Size matters: Yahoo claims 2-petabyte database is world's biggest, busiest". Computerworld. http://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyId=18&articleId=9087918&intsrc=hm_topic.
↑ Thomas Claburn (2008-05-21). "Yahoo Claims Record With Petabyte Database". InformationWeek. http://www.informationweek.com/news/showArticle.jhtml?articleID=207801579.
↑ Emmanuel Cecchet (2009-05-21). "Building PetaByte Warehouses with Unmodified PostgreSQL". PGCon 2009. http://www.pgcon.org/2009/schedule/events/196.en.html.
↑ "MySpace.com scales analytics for all their friends". case study. http://www.asterdata.com/resources/downloads/casestudies/myspace_aster.pdf.
↑ "Database - OpenStreetMap". http://wiki.openstreetmap.org/wiki/Database.
↑ PostgreSQL affiliates .ORG domain
↑ Sony Online opts for open-source database over Oracle
↑ A Web Commerce Group Case Study on PostgreSQL
↑ Rodney Gedda (2007-05-30). "Open source PostgreSQL hits another Hi5". Computerworld. http://www.linuxworld.com/news/2007/053007-open-source-postgresql-hits-another.html. Retrieved 2007-10-23.
↑ "Reddit FAQ". Reddit. 2010. http://www.reddit.com/help/faq. Retrieved 2010-8-14.
↑ "PostgreSQL at Skype". Skype Developer Zone. 2006. https://developer.skype.com/SkypeGarage/DbProjects/SkypePostgresqlWhitepaper. Retrieved 2007-10-23.
↑ "How Much Are You Paying For Your Database?". Sun Microsystems blog. 2007. http://blogs.sun.com/marchamilton/entry/how_much_are_you_paying. Retrieved 2007-12-14.
↑ Duncavage, Daniel P. (2010-07-13). "NASA needs Postgres - Nagios help". http://archives.postgresql.org/pgsql-general/2010-07/msg00394.php.
↑ Roy, Gavin M.. "PostgreSQL at myYearbook.com". https://www.postgresqlconference.org/2010/east/talks/postgresql_at_myyearbook.com.
↑ "PostgreSQL, Award Winning Software". postgresql.org. 2008-01-19. http://www.postgresql.org/about/awards. Retrieved 2008-01-31.
↑ Maya Tamiya (2001-01-10). "Interview: Bruce Momjian". LWN.net. http://lwn.net/2001/features/Momjian/. Retrieved 2007-09-07.
↑ Great Bridge (2001-09-06). "Great Bridge ceases operations". Press release. http://findarticles.com/p/articles/mi_m0EIN/is_2001_Sept_6/ai_77869978/print. Retrieved 2007-09-07.
↑ John Farr (2006-07-25). "Open letter to the PostgreSQL Community". Pervasive Software. http://web.archive.org/web/20070225141652/http://www.pervasive-postgres.com/letter.asp. Retrieved 2007-02-13.
↑ Rodney Gedda (2005-10-05). "Sun's software chief eyes databases, groupware". Computerworld. http://www.computerworld.com.au/index.php/id;116679278;fp;16;fpid;0. Retrieved 2007-02-13.
↑ Sun Microsystems (2005-11-17). "Sun Announces Support for Postgres Database on Solaris 10". Press release. http://www.sun.com/smi/Press/sunflash/2005-11/sunflash.20051117.1.html. Retrieved 2007-02-13.
↑ EnterpriseDB (2007-08-07). "EnterpriseDB Announces First-Ever Professional-Grade PostgreSQL Distribution for Linux". Press release. http://www.enterprisedb.com/news_events/press_releases/07_08_07.do. Retrieved 2007-08-07.

External links

PostgreSQL website
Professional technical support for PostgreSQL
Official PostgreSQL wiki, including Performance Optimization and Detailed installation guides sections.
Installing Postgres Plus Standard Server v8.3 on Windows Tutorial
Getting Started with Postgres Plus Tutorial for Windows
PostgreSQL at the Open Directory Project
Database Journal articles on PostgreSQL
Linux Productivity Magazine: a complete issue on PostgreSQL
PostgreSQL Guide
PostgreSQL gotchas, documented but counterintuitive behavior
Test_PGC, Example embedded SQL/C program for PostgreSQL showing database operations and SQLSTATE testing.