Datalog

From Wikipedia, the free encyclopedia

Datalog is a query and rule language for deductive databases that syntactically is a subset of Prolog. Its origins date back to the beginnning of logic programming, but it became prominent as a separate area around 1978 when Hervé Gallaire and Jack Minker organized a workshop on logic and databases. The term Datalog was coined in the mid 1980's by a group of researchers interested in database theory.

1 Features, limitations and extensions
2 Example
3 Systems implementing Datalog
4 See also
5 References

[edit] Features, limitations and extensions

Query evaluation with Datalog is sound and complete and can be done efficiently even for large databases. Query evaluation is usually done using bottom up strategies. For restricted forms of datalog that don't allow any function symbols, safety of query evaluation is guaranteed.

In contrast to Prolog, it

disallows complex terms as arguments of predicates, e.g. P(1, 2) is admissible but not P(f1(1), 2),
imposes certain stratification restrictions on the use of negation and recursion, and
only allows range restricted variables, i.e. each variable in the conclusion of a rule must also appear in a not negated clause in the premise of this rule.

Datalog was popular in academic database research but never succeeded in becoming part of a commercial database system. Advantages of Datalog over SQL such as the clean semantics or recursive queries were not sufficient. Modern database systems, however, include ideas and algorithms developed for Datalog: the SQL99 standard, which includes recursive queries and the Magic Sets algorithm initially developed for the faster evaluation of Datalog queries, is implemented in IBMs DB2.

Extensions to Datalog were made to make it object-oriented, or to allow disjunctions as heads of clauses. Both extensions have major impacts on the definition of the semantics and the implementation of a corresponding Datalog interpreter.

[edit] Example

Example Datalog program:

parent(bill,mary).
parent(mary,john).
ancestor(X,Y) :- ancestor(X,Z),ancestor(Z,Y).
ancestor(X,Y) :- parent(X,Y).

The ordering of the clauses is irrelevant in Datalog in contrast to Prolog which depends on the ordering of clauses for computing the result of the query call.

[edit] Systems implementing Datalog

Most implementations of Datalog stem from university projects.^[1] Here is a short list of a few open source systems that are either based on Datalog or are providing a Datalog interpreter:

bddbddb, an implementation of Datalog done at the Stanford University. It is mainly used to query Java bytecode including points-to analysis on large Java programs.
ConceptBase, a deductive and object-oriented database system based on a Datalog query evaluator. It is mainly used for conceptual modeling and meta-modeling.
DES, an open-source implementation of Datalog to be used for teaching Datalog in courses.
XSB, is a Logic Programming and Deductive Database system for Unix and Windows.
.QL is a modern, object-oriented variant of Datalog created by Semmle, that runs on any of the major relational database systems (SQL Server, Postgres, ...) A version of .QL for querying Java source code is freely downloadable from Semmle's website.