Database administrator

From Wikipedia, the free encyclopedia

A database administrator (DBA) is a person who is responsible for the environmental aspects of a database. In general, these include:

  • Recoverability - Creating and testing backups
  • Integrity - Verifying or helping to verify data integrity
  • Security - Defining and/or implementing access controls to the data
  • Availability - Ensuring maximum uptime
  • Performance - Ensuring maximum performance given budgetary constraints
  • Development and testing support - Helping programmers and engineers use the database efficiently

The role of a database administrator has changed with the technology of database management systems (DBMSs) and with the needs of the owners of the databases. For example, although logical and physical database design are traditionally the duties of a database analyst or database designer, a DBA may be asked to perform those duties.

Duties

The duties of a database administrator vary and depend on the job description, corporate and Information Technology (IT) policies and the technical features and capabilities of the DBMS being administered. They nearly always include disaster recovery (backups and testing of backups), performance analysis and tuning, and some database design.

Definition of Database

A database is a collection of related information, accessed and managed by its DBMS. After experimenting with hierarchical and networked DBMSs during the 1970s, the IT industry became dominated by relational DBMSs (or object-relational DBMSs) such as Oracle, Sybase, and, later on, Microsoft SQL Server.

In a strictly technical sense, for any database to be defined as a "Truly Relational Model Database Management System," it should, ideally, adhere to the twelve rules defined by Edgar F. Codd, a pioneer in the field of relational databases. To date, while many products come close, none on the market adheres fully to those rules, just as none is fully ANSI SQL compliant.

IBM and Oracle were the earliest on the RDBMS scene, and many others have followed. It is unlikely that miniSQL still exists in its original form, but Monty's MySQL is still extant and thriving, along with the Ingres-descended PostgreSQL. On the desktop PC, Alpha Five and Microsoft Access (the versions from 1995 onward, not the earlier ones) were, despite various limitations, technically the closest thing to 'Truly Relational' DBMSs. Visual FoxPro and many other desktop products marketed at that time were far less compliant with Codd's rules.

A relational DBMS manages information about types of real-world things (entities) in the form of tables that represent the entities. A table is like a spreadsheet; each row represents a particular entity (instance), and each column represents a type of information about the entity (domain). Sometimes entities are made up of smaller related entities, such as orders and order lines, so one of the challenges of a multi-user DBMS is to provide data about related entities as of a single instant of logical consistency.
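The orders and order lines example can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a recommended schema: the table names, column names, and the place_order helper are assumptions made up for this example. The order and its lines are written in one transaction, so a failure part-way leaves no half-written order behind.

```python
import sqlite3

# Hypothetical schema for illustration; all names here are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL
    );
    CREATE TABLE order_lines (
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        line_no  INTEGER NOT NULL,
        item     TEXT NOT NULL,
        quantity INTEGER NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );
""")

def place_order(conn, order_id, customer, lines):
    """Insert an order and all of its lines as one transaction."""
    # Used as a context manager, the connection commits on success and
    # rolls back if an exception is raised inside the block.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer))
        conn.executemany(
            "INSERT INTO order_lines VALUES (?, ?, ?, ?)",
            [(order_id, n, item, qty) for n, (item, qty) in enumerate(lines, 1)],
        )

place_order(conn, 1, "Alice", [("widget", 2), ("gadget", 1)])

# Each row is one entity instance; the join reassembles the related entities.
rows = conn.execute("""
    SELECT o.order_id, o.customer, l.line_no, l.item, l.quantity
    FROM orders AS o JOIN order_lines AS l ON l.order_id = o.order_id
    ORDER BY l.line_no
""").fetchall()
```

Here rows holds one tuple per order line, each carrying the columns of its parent order alongside its own.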

Properly managed relational databases minimize the need for application programs to contain information about the physical storage of the data they access. To maximize the isolation of programs from data structures, relational DBMSs restrict data access to SQL, a nonprocedural language that limits the programmer to specifying desired results. Because SQL statements and their results can be exchanged as messages, this interface was a building block for the decentralization of computer hardware: a program and a data structure with such a minimal point of contact can feasibly reside on separate computers.
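A small sketch of this isolation, using Python's sqlite3 module and a made-up orders table (the table, columns, and index name are assumptions for illustration). The query states only the desired result, and the same SQL text keeps working after the physical structure changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Alice", 30), ("Bob", 20), ("Alice", 15)],
)

# The query names the desired result only: no file, index, or loop order.
QUERY = "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"

before = conn.execute(QUERY).fetchall()

# A physical change (a new index) requires no change to the program's SQL.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = conn.execute(QUERY).fetchall()
```

Both before and after contain the same totals per customer; only the way the DBMS may choose to reach the data has changed.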

Recoverability

Recoverability means that, if a data entry error, program bug or hardware failure occurs, the DBA can bring the database backward in time to its state at an instant of logical consistency before the damage was done. Recoverability activities include making database backups and storing them in ways that minimize the risk that they will be damaged or lost, such as placing multiple copies on removable media and storing them outside the affected area of an anticipated disaster. Recoverability is the DBA’s most important concern.

Recoverability, also sometimes called "disaster recovery," involves two primary activities: taking backups and testing recovery from them.

A database backup consists of timestamped data combined with database logs, which are applied to make the data consistent as of a particular moment in time. It is possible to back up only the data, without timestamps or logs, but the DBA must take the database offline to do so.

A recovery test consists of restoring the data, then applying the logs to bring the restored database to consistency at a chosen point in time, up to the last transaction in the logs. Alternatively, an offline backup can be restored simply by copying the data into place on another copy of the database.

If a DBA (or any administrator) implements a recoverability plan without recovery tests, there is no guarantee that the backups are valid. In practice, in all but the most mature RDBMS packages, backups can rarely be trusted without extensive testing to be sure that no bugs or human error have corrupted them.
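The backup-then-test cycle can be sketched with SQLite's online backup API as exposed by Python's sqlite3 module (Connection.backup, Python 3.7 or later). This is only an illustration under simplifying assumptions: it copies a whole database and does not replay logs to a point in time, and the take_backup and recovery_test helpers, the orders table, and the file name are hypothetical.

```python
import os
import sqlite3
import tempfile

def take_backup(source, backup_path):
    """Copy a live database to backup_path using SQLite's online backup API."""
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)
    finally:
        target.close()

def recovery_test(backup_path, expected_rows):
    """Restore the backup into a scratch database and verify its contents."""
    backup = sqlite3.connect(backup_path)
    restored = sqlite3.connect(":memory:")
    try:
        backup.backup(restored)
        # integrity_check reports 'ok' in its first row for a sound database.
        status = restored.execute("PRAGMA integrity_check").fetchone()[0]
        count = restored.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
        return status == "ok" and count == expected_rows
    finally:
        backup.close()
        restored.close()

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
source.executemany("INSERT INTO orders (customer) VALUES (?)", [("Alice",), ("Bob",)])
source.commit()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "orders.bak")
    take_backup(source, path)
    backup_is_valid = recovery_test(path, expected_rows=2)
```

The point of the second function is that a backup counts as good only after it has been restored somewhere and checked, not merely after the copy finished without error.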

Integrity

Integrity means that the database, or the programs that create its content, embody means of preventing users who provide data from breaking the system’s business rules. For example, a retailer may have a business rule that only individual customers can place orders; and so every order must identify one and only one customer. Oracle Server and other relational DBMSs enforce this type of business rule with constraints, which are configurable implicit queries. To continue the example, in the process of inserting a new order the database may query its customer table to make sure that the customer identified by the order exists.
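The customer and order rule can be sketched as a foreign key constraint. The example uses SQLite through Python's sqlite3 module rather than Oracle, and the table names, column names, and try_insert_order helper are assumptions for illustration. Note that SQLite enforces foreign keys only when the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    );
    INSERT INTO customers VALUES (1, 'Alice');
""")

def try_insert_order(conn, order_id, customer_id):
    """Return True if the order was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False

existing_customer_ok = try_insert_order(conn, 100, 1)   # customer 1 exists
missing_customer_ok = try_insert_order(conn, 101, 42)   # no such customer
```

The first insert is accepted and the second is rejected by the database itself, so the rule holds no matter which program supplies the data.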

Security

Security means that users’ ability to access and change data conforms to the policies of the business and the delegation decisions of its managers. Like other metadata, a relational DBMS manages security information in the form of tables. These tables are the “keys to the kingdom” and so it is important to protect them from intruders.

Availability

Availability means that authorized users can access and change data as needed to support the business. Increasingly, businesses are coming to expect their data to be available at all times (“24x7”, or 24 hours a day, 7 days a week). The IT industry has responded to the availability challenge with hardware and network redundancy and increasing online administrative capabilities.

Performance

Performance means that the database does not cause unreasonable online response times, and it does not cause unattended programs to run for an unworkable period of time. In complex client/server and three-tier systems, the database is just one of many elements that determine the performance that online users and unattended programs experience. Performance is a major motivation for the DBA to become a generalist and coordinate with specialists in other parts of the system outside of traditional bureaucratic reporting lines.

Techniques for database performance tuning have changed as DBAs have become more sophisticated in their understanding of what causes performance problems and in their ability to diagnose them.

In the 1990s, DBAs often focused on the database as a whole, and looked at database-wide statistics for clues that might help them find out why the system was slow. Also, the actions DBAs took in their attempts to solve performance problems were often at the global, database level, such as changing the amount of computer memory available to the database, or changing the amount of memory available to any database program that needed to sort data.

Around the year 2000, many of the most fundamental assumptions about database performance tuning were discovered to be myths. Most famously, the database buffer cache hit ratio, once thought to be the most reliable way to measure database performance, was found to be a completely meaningless statistic.

As of 2005, the fog has lifted. DBAs understand that performance problems must first be diagnosed, and that this is best done by examining individual SQL statements, not the database as a whole. Various tools, some included with the database and some available from third parties, provide a behind-the-scenes look at how the database is handling a SQL statement, shedding light on what is taking so long.

Once the problem has been identified, the individual SQL statement can be tuned, usually by rewriting it, using hints, adding or modifying indexes, or sometimes modifying the database tables themselves.
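As a rough sketch of diagnosing and tuning a single statement, the example below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The orders table, the index name, and the query_plan helper are assumptions for illustration; other DBMSs offer their own plan-inspection tools, and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i) for i in range(1000)],
)

QUERY = "SELECT amount FROM orders WHERE customer_id = ?"

def query_plan(conn, sql, params):
    """Return the plan steps SQLite reports for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return "; ".join(row[-1] for row in rows)

# Diagnose: without an index the filter is expected to scan the whole table.
plan_before = query_plan(conn, QUERY, (7,))

# Tune: add an index on the filtered column, then look at the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (7,))
```

Comparing plan_before with plan_after shows the effect of the change on this one statement, which is the statement-level view described above rather than a database-wide statistic.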

Development/Testing Support

Development and testing support is typically what the database administrator regards as his or her least important duty, while results-oriented managers consider it the DBA’s most important duty. Support activities include collecting sample production data for testing new and changed programs and loading it into test databases; consulting with programmers about performance tuning; and making table design changes to provide new kinds of storage for new program functions.
