Index (database)
From Wikipedia, the free encyclopedia
A database index is a data structure that improves the speed of lookup operations on a table. Indexes can be created using one or more columns. The disk space required to store an index is typically less than that required by the table, since an index usually contains only the key fields by which the table is to be arranged and excludes the other columns. In a relational database, an index is a copy of part of a table.
Some databases extend the power of indexes further by allowing indexes to be created on functions or expressions. For example, an index could be created on upper(last_name), which would store only the uppercase versions of the last_name field in the index[citation needed].
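A minimal sketch of an expression index, using Python's built-in sqlite3 module. The table contents and the index name idx_people_upper_last are made up for illustration, and the example assumes the bundled SQLite is version 3.9 or later, which is when indexes on expressions became available. Other database systems use different syntax or may not support this feature.

```python
import sqlite3

# In-memory database; table contents and index name are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "Finkelstein"), ("Bob", "finkelstein"), ("Cy", "Jones")],
)

# Index on an expression rather than on a bare column.
conn.execute("CREATE INDEX idx_people_upper_last ON people (upper(last_name))")

# A query that repeats the same expression is a candidate for the index.
rows = conn.execute(
    "SELECT first_name FROM people WHERE upper(last_name) = ? ORDER BY first_name",
    ("FINKELSTEIN",),
).fetchall()
matches = [row[0] for row in rows]
print(matches)
```

The query has to repeat the indexed expression for the index to be considered. A plain comparison on last_name would not match the index on upper(last_name).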
Indexes are defined as unique or non-unique. A unique index acts as a constraint on the table by preventing duplicate entries in the index and thus in the indexed columns of the table.
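The constraint behaviour of a unique index can be sketched with Python's sqlite3 module. The customers table and the index name idx_customers_email are hypothetical, and the error raised on a duplicate differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email_address TEXT)")

# The unique index doubles as a constraint on email_address.
conn.execute(
    "CREATE UNIQUE INDEX idx_customers_email ON customers (email_address)"
)
conn.execute("INSERT INTO customers VALUES (1, 'a@example.com')")

try:
    # Same email_address as row 1, so the index rejects the insert.
    conn.execute("INSERT INTO customers VALUES (2, 'a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(duplicate_rejected, row_count)
```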
Architecture
Index architectures are classified as clustered or non-clustered. A clustered index is built on the same key by which the data is ordered on disk.[citation needed] In some relational database management systems, such as Microsoft SQL Server, the leaf nodes of a clustered index hold the actual data rather than pointers to data that resides elsewhere, as is the case with a non-clustered index. Because the leaf level of a clustered index is the data itself, the table is stored in index order, and so a table can have only one clustered index, whereas it can have many non-clustered indexes (up to a limit set by the particular RDBMS).
A non-clustered index (also called an unclustered index) can be built on any key. Because a clustered index usually stores the actual records within its data structure, it can be much faster than a non-clustered index.[citation needed] A non-clustered index stores only record IDs and requires at least one additional I/O operation to retrieve the actual record. 'Intrinsic' might be a better adjective than 'clustered', indicating that the index is an integral part of the data structure storing the table.
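The difference between the two architectures can be illustrated with a toy in-memory model. This is a sketch only: sorted Python lists stand in for the on-disk tree, and all names (heap_rows, name_index, lookup_clustered and so on) are hypothetical rather than taken from any particular RDBMS.

```python
from bisect import bisect_left

# "Heap" table: rows stored in arrival order; the list position is the row ID.
heap_rows = [
    {"id": 7, "name": "Bo"},
    {"id": 1, "name": "Ada"},
    {"id": 3, "name": "Cy"},
]

# Non-clustered index on name: sorted (key, row ID) pairs, no row data.
name_index = sorted((row["name"], rid) for rid, row in enumerate(heap_rows))


def lookup_nonclustered(name):
    """Find the row ID in the index, then fetch the row from the heap."""
    pos = bisect_left(name_index, (name,))
    if pos < len(name_index) and name_index[pos][0] == name:
        rid = name_index[pos][1]
        return heap_rows[rid]  # the extra step: follow the pointer
    return None


# Clustered index on id: the rows themselves are kept in key order.
clustered_rows = sorted(heap_rows, key=lambda row: row["id"])
# Stand-in for the key ordering that a real tree would maintain internally.
clustered_keys = [row["id"] for row in clustered_rows]


def lookup_clustered(key):
    """The search position already holds the row, so no second fetch."""
    pos = bisect_left(clustered_keys, key)
    if pos < len(clustered_keys) and clustered_keys[pos] == key:
        return clustered_rows[pos]
    return None


print(lookup_nonclustered("Cy"))
print(lookup_clustered(7))
```

In this model the rows can be physically sorted by only one key at a time, which mirrors why a table can have just one clustered index.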
Indexes can be implemented using a variety of data structures. Popular choices include balanced trees, B+ trees and hashes.[citation needed]...
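As a rough illustration of why the choice of data structure matters, the sketch below builds a hash-style index (a Python dict) and an ordered index (a sorted list searched with bisect, standing in for a tree) over the same made-up column. The names and data are assumptions for the example. The hash form answers equality lookups directly, while the ordered form also supports range queries.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

ages = [31, 25, 42, 25, 37]  # column values; the list position is the row ID

# Hash index: key -> list of row IDs. Good for equality, no ordering.
hash_index = defaultdict(list)
for rid, age in enumerate(ages):
    hash_index[age].append(rid)

# Ordered index: (key, row ID) pairs kept sorted, as a tree would keep them.
ordered_index = sorted((age, rid) for rid, age in enumerate(ages))
ordered_keys = [age for age, _ in ordered_index]


def rows_equal(age):
    """Equality lookup through the hash index."""
    return hash_index.get(age, [])


def rows_between(low, high):
    """Range lookup (inclusive) through the ordered index."""
    lo = bisect_left(ordered_keys, low)
    hi = bisect_right(ordered_keys, high)
    return [rid for _, rid in ordered_index[lo:hi]]


print(rows_equal(25))
print(rows_between(30, 40))
```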
Column order
The order in which columns are listed in the index definition is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, on most databases it is not possible, or not efficient, to retrieve the set of row identifiers using only the second or a later indexed column.
For example, imagine a phone book that is organized by city first, then by last name, and then by first name. Given the city, you can easily extract the list of all phone numbers for that city. However, in this phone book it would be very tedious to find all the phone numbers for a given last name: you would have to look within each city's section for the entries with that last name. Some databases can do this; others simply will not use the index.
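The phone book analogy can be sketched in a few lines of Python. A sorted list of tuples stands in for a composite index on (city, last name, first name). The entries and function names are invented for the example.

```python
from bisect import bisect_left, bisect_right

# Composite index ordered by (city, last_name, first_name); data is made up.
phone_book = sorted([
    ("Austin", "Lee", "Kim", "555-0101"),
    ("Austin", "Ng", "Al", "555-0102"),
    ("Boston", "Lee", "Jo", "555-0103"),
    ("Boston", "Poe", "Ed", "555-0104"),
])
cities = [entry[0] for entry in phone_book]  # leading column, already sorted


def numbers_in_city(city):
    """Leading column: binary search finds one contiguous run of entries."""
    lo = bisect_left(cities, city)
    hi = bisect_right(cities, city)
    return [entry[3] for entry in phone_book[lo:hi]]


def numbers_for_last_name(last_name):
    """Non-leading column: matches are scattered, so every entry is checked."""
    return [entry[3] for entry in phone_book if entry[1] == last_name]


print(numbers_in_city("Boston"))
print(numbers_for_last_name("Lee"))
```

The second function still returns the right answer, but only by examining the whole book, which is the equivalent of scanning the entire index.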
Applications and limitations
Indexes are useful for many applications but come with some limitations. Consider the following SQL statement: SELECT first_name FROM people WHERE last_name = 'Finkelstein'; To process this statement without an index, the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index, the database simply follows the B-tree data structure until the Finkelstein entry has been found; this is much less computationally expensive than a full table scan.
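One way to observe the difference is to ask the database for its query plan before and after creating an index. The sketch below does this with SQLite through Python's sqlite3 module. The index name idx_people_last_name and the helper plan() are hypothetical, and the wording of the plan text varies between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "Finkelstein"), ("Bob", "Jones"), ("Cy", "Smith")],
)

QUERY = "SELECT first_name FROM people WHERE last_name = 'Finkelstein'"


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY)  # no index yet: the table has to be scanned
conn.execute("CREATE INDEX idx_people_last_name ON people (last_name)")
plan_after = plan(QUERY)   # the planner can now search the index

result = conn.execute(QUERY).fetchall()
print(plan_before)
print(plan_after)
print(result)
```

The query result is the same in both cases. Only the access path reported by the planner changes.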
Consider this SQL statement: SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com'; This query would yield an email address for every customer whose email address ends with "@yahoo.com", but even if the email_address column has been indexed, the database must still perform a full table scan. This is because the index orders values by comparing characters from left to right. With a wildcard at the beginning of the search term, the database software is unable to use the underlying B-tree data structure. This problem can be solved by adding another index created on reverse(email_address) and using a SQL query like this: SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse('%@yahoo.com'); This puts the wildcard at the rightmost part of the pattern (now moc.oohay@%), which the index on reverse(email_address) can satisfy.
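The reversal trick can be illustrated without a database. In the sketch below, a sorted list of reversed strings stands in for the index on reverse(email_address), and a suffix search becomes a prefix search over that list. The sample addresses and the function name emails_ending_with are invented for the example.

```python
from bisect import bisect_left

emails = ["ann@yahoo.com", "bob@example.org", "cy@yahoo.com", "di@yahoo.co.uk"]

# Stand-in for an index on reverse(email_address):
# sorted (reversed value, original value) pairs.
reversed_index = sorted((email[::-1], email) for email in emails)


def emails_ending_with(suffix):
    """Turn the suffix match into a prefix match on the reversed index."""
    prefix = suffix[::-1]  # "@yahoo.com" becomes "moc.oohay@"
    pos = bisect_left(reversed_index, (prefix,))
    matches = []
    # All entries sharing the prefix are adjacent, so walk until it stops.
    while pos < len(reversed_index) and reversed_index[pos][0].startswith(prefix):
        matches.append(reversed_index[pos][1])
        pos += 1
    return matches


print(emails_ending_with("@yahoo.com"))
```

Because the matching entries are adjacent in the reversed ordering, the search touches only those entries instead of every address.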
See also
- How to measure Index Selectivity
- Indexes FillFactor
- Bitmap Index