Hash join

From Wikipedia, the free encyclopedia

The Hash Join is an example of a join algorithm and is used in the implementation of a relational database management system.

The basic problem of a join algorithm is to find, for each distinct value of the join attribute, the set of tuples in each relation which display that value.

Applying the hash join algorithm on an inner join of two relations proceeds as follows: first prepare a hash table for the smaller relation, by applying a hash function to the join attribute of each row. Then scan the larger relation and find the relevant rows by looking in the hash table.

Hash joins require an equi-join predicate (a predicate comparing values from one table with values from the other table using the equals operator '='). Hash joins have the advantage that each table is read only once and that no sort operations are required. In their simplest form they require that the smaller table fits in memory. Various schemes (e.g. Grace Join or Hybrid Hash Join) have been proposed to handle cases where the smaller table does not fit into memory.

Hash joins can also be evaluated for an anti-join predicate (a predicate selecting values from one table when no related values are found in the other). Applying this algorithm proceeds as follows: first prepare a hash table for the 'not in' side of the join. Then scan the other table, selecting any rows where the join attribute hashes to an empty entry in the hash table.

Hash joins were first mentioned in 1984 in a paper "Hashing Methods and Relational Algebra Operations" by Kjell Bratbergsengen.

In other languages