Database connection

In computer science, a database connection is the means by which a database server and its client software communicate with each other. The term is used whether or not the client and the server are on different machines.

The client uses a database connection to send commands to and receive replies from the server. A database is stored as a file or a set of files on magnetic disk or tape, optical disk, or some other secondary storage device. The information in these files may be broken down into records, each of which consists of one or more fields.

Fields are the basic units of data storage, and each field typically contains information pertaining to one aspect or attribute of the entity described by the database. Records are also organized into tables that include information about relationships between their various fields. Although the term "database" is applied loosely to any collection of information in computer files, a database in the strict sense provides cross-referencing capabilities.

Connections are a key concept in data-centric programming. Since some DBMSs require considerable time to connect, connection pooling is used to improve performance. No command can be performed against a database without an "open and available" connection to it.
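The requirement for an open connection can be illustrated with a minimal sketch. It assumes Python's standard sqlite3 module and an in-memory SQLite database standing in for a database server; the table name t is made up for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
rows = conn.execute("SELECT id FROM t").fetchall()

conn.close()

try:
    # The connection is closed, so this command cannot be performed.
    conn.execute("SELECT id FROM t")
    failed = False
except sqlite3.ProgrammingError:
    # sqlite3 refuses to operate on a closed connection.
    failed = True
```

Other drivers report the same condition with their own error types.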

Connections are built by supplying an underlying driver or provider with a connection string, which is used to address a specific database or server and to provide instance and user authentication credentials (for example, Server=sql_box;Database=Common;User ID=uid;Pwd=password;).

Once a connection has been built, it can be opened and closed at will, and properties (such as the command time-out length, or transaction, if one exists) can be set. The connection string consists of a set of key/value pairs, dictated by the data access interface of the data provider.
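The key/value structure of a connection string can be sketched as follows. The function parse_connection_string is a hypothetical helper, not part of any driver, and it ignores the quoting and escaping rules that real providers define.

```python
def parse_connection_string(text: str) -> dict[str, str]:
    """Split 'key=value;key=value;' into a dict (no quoting or escaping handled)."""
    pairs = {}
    for part in text.split(";"):
        if not part.strip():
            continue  # skip the empty piece left by a trailing ';'
        key, _, value = part.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


settings = parse_connection_string(
    "Server=sql_box;Database=Common;User ID=uid;Pwd=password;"
)
```

In practice the driver or provider performs this parsing itself, and the set of recognized keys depends on the provider.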

Some databases, such as PostgreSQL, only allow one operation to be performed at a time on each connection. If a request for data (a SQL Select statement) is sent to the database and a result set is returned, the connection is open but not available for other operations until the client finishes consuming the result set.

Other databases, such as SQL Server 2005 (and later), do not impose this limitation. However, databases that allow multiple concurrent operations on each connection usually incur far more overhead than those that only allow one operation at a time.

Pooling

Database connections are finite and expensive and can take a disproportionately long time to create relative to the operations performed on them. It is very inefficient for an application to create and close a database connection whenever it needs to update a database.

Connection pooling is a technique designed to alleviate this problem. A pool of database connections is created and then shared among the applications that need to access the database. When an application needs database access, it requests a connection from the pool. When it is finished, it returns the connection to the pool, where it becomes available for use by other applications.
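The request-and-return cycle described above can be sketched as follows. This is a simplified illustration, not the design of any particular pooling library: the ConnectionPool class and its acquire and release methods are hypothetical names, and SQLite in-memory connections stand in for connections to a database server.

```python
import queue
import sqlite3


class ConnectionPool:
    """Hypothetical minimal pool: connections are created once and then reused."""

    def __init__(self, size: int, database: str = ":memory:"):
        # A LIFO queue hands out the most recently returned connection first.
        self._idle = queue.LifoQueue()
        for _ in range(size):
            # check_same_thread=False lets a connection be used from other threads.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    def acquire(self):
        """Take an idle connection out of the pool (blocks if none is idle)."""
        return self._idle.get()

    def release(self, conn):
        """Return a connection so that other callers can use it."""
        self._idle.put(conn)


pool = ConnectionPool(size=2)

first = pool.acquire()
answer = first.execute("SELECT 1").fetchone()[0]
pool.release(first)

# The next request is served by an existing connection; nothing new is opened.
second = pool.acquire()
```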

The connection object obtained from the connection pool is often a wrapper around the actual database connection. The wrapper handles its relationship with the pool internally and hides the details of the pool from the application. For example, the wrapper object can implement a "close" method that can be called just like the "close" method on the database connection. Unlike the method on the database connection, the method on the wrapper may not actually close the database connection, but might instead return it to the pool. The application does not need to be aware of the connection pooling when it calls the methods on the wrapper object.
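A wrapper of this kind might look like the following sketch. PooledConnection is a hypothetical class written for illustration, and a plain queue stands in for the pool. The point shown is that close() on the wrapper hands the underlying connection back instead of closing it.

```python
import queue
import sqlite3


class PooledConnection:
    """Hypothetical wrapper: close() returns the real connection to the pool."""

    def __init__(self, raw, idle):
        self._raw = raw    # the actual database connection
        self._idle = idle  # queue of idle connections, standing in for the pool

    def execute(self, sql, params=()):
        # Delegate to the real connection.
        return self._raw.execute(sql, params)

    def close(self):
        # Do not close the real connection; hand it back for reuse.
        if self._raw is not None:
            self._idle.put(self._raw)
            self._raw = None


idle = queue.Queue()
idle.put(sqlite3.connect(":memory:"))

conn = PooledConnection(idle.get(), idle)
value = conn.execute("SELECT 42").fetchone()[0]
conn.close()  # looks like an ordinary close to the application

# The underlying connection is back in the pool and still usable.
reused = idle.get()
still_open = reused.execute("SELECT 1").fetchone()[0]
```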

This approach encourages the practice of opening a connection in an application only when needed, and closing it as soon as the work is done, rather than holding a connection open for the entire life of the application. In this manner, a relatively small number of connections can service a large number of requests. This is also called multiplexing.

In a client–server architecture, on the other hand, a persistent connection is typically used so that server state can be managed. This "state" includes server-side cursors, temporary products, connection-specific functional settings, and so on.

It is desirable to set a limit on the number of connections in the pool. Using too many connections may cause thrashing rather than getting more useful work done. If an operation is attempted while all connections are in use, the operation can block until a connection is returned to the pool, or an error may be returned.
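Both behaviors, blocking and returning an error, can be sketched with a bounded pool. The names BoundedPool and PoolExhausted are hypothetical, and the time-out value is arbitrary. With no time-out, acquire waits until a connection is released. With a time-out, it raises an error once the wait expires.

```python
import queue
import sqlite3


class PoolExhausted(Exception):
    """Raised when no connection becomes available within the time-out."""


class BoundedPool:
    """Hypothetical pool holding a fixed number of connections."""

    def __init__(self, max_size: int):
        self._idle = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._idle.put(sqlite3.connect(":memory:", check_same_thread=False))

    def acquire(self, timeout=None):
        # timeout=None blocks until a connection is released.
        try:
            return self._idle.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhausted("all connections are in use") from None

    def release(self, conn):
        self._idle.put(conn)


pool = BoundedPool(max_size=1)
held = pool.acquire()

try:
    pool.acquire(timeout=0.05)  # the only connection is already in use
    exhausted = False
except PoolExhausted:
    exhausted = True

pool.release(held)
again = pool.acquire(timeout=0.05)  # succeeds once the connection is back
```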
