Datalog

From Wikipedia, the free encyclopedia

Datalog is a query and rule language for deductive databases that syntactically is a subset of Prolog. Its origins date back to the beginning of logic programming, but it became prominent as a separate area around 1978 when Hervé Gallaire and Jack Minker organized a workshop on logic and databases. The term Datalog was coined in the mid 1980s by a group of researchers interested in database theory.

1 Features, limitations and extensions
2 Example
3 Systems implementing Datalog
4 See also
5 References

[edit] Features, limitations and extensions

Query evaluation with Datalog is sound and complete and can be done efficiently even for large databases. Query evaluation is usually done using bottom up strategies.

In contrast to Prolog, it

disallows complex terms as arguments of predicates, e.g. P(1, 2) is admissible but not P(f1(1), 2),
imposes certain stratification restrictions on the use of negation and recursion, and
only allows range restricted variables, i.e. each variable in the conclusion of a rule must also appear in a not negated clause in the premise of this rule.

Datalog was popular in academic database research but never succeeded in becoming part of a commercial database system, despite its advantages (compared to other database languages such as SQL) such as recursive queries and a clean semantics. Even so, some widely used database systems include ideas and algorithms developed for Datalog. For example, the SQL:1999 standard includes recursive queries, and the Magic Sets algorithm (initially developed for the faster evaluation of Datalog queries) is implemented in IBM's DB2.

Two extensions that have been made to Datalog include an extension to allow object-oriented programming and an extension to allow disjunctions as heads of clauses. Both extensions have major impacts on the definition of Datalog's semantics and on the implementation of a corresponding Datalog interpreter.

[edit] Example

Example Datalog program:

parent(bill,mary).
parent(mary,john).

These two lines define two facts, i.e. things that always hold. They can be intuitively understood as: the parent of bill is mary and the parent of mary is john.

ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- ancestor(X,Z),ancestor(Z,Y).

These two lines describe the rules that define the ancestor relationship. A rule consists of two main parts separated by the :- symbol. The part to the left of this symbol is the head, the part to the right the body of the rule. A rule is read (and can be intuitively understood) as <head> if it is known that <body>. Uppercase letters stand for variables. Hence in the example the first rule can be read as X is the ancestor of Y if it is known that X is the parent of Y. And the second rule as X is the ancestor of Y if it is known that X is the ancestor of some Z and Z is the ancestor of Y. The ordering of the clauses is irrelevant in Datalog in contrast to Prolog which depends on the ordering of clauses for computing the result of the query call.

Datalog distinguishes between extensional and intensional predicate symbols. While extensional predicate symbols are only defined by facts, intensional predicate symbols are defined only by rules. In the example above ancestor is an intensional predicate symbol, and parent is extensional. Predicates may also be defined by facts and rules and therefore neither be purely extensional nor intensional, but any datalog program can be rewritten into an equivalent program without such predicate symbols with duplicate roles.

?- ancestor(bill,X).

The query above asks for all ancestors of bill and would return mary and john when posed against a Datalog system containing the facts and rules described above.

[edit] Systems implementing Datalog

Most implementations of Datalog stem from university projects.^[1] Here is a short list of systems that are either based on Datalog or are providing a Datalog interpreter:

bddbddb, an implementation of Datalog done at Stanford University. It is mainly used to query Java bytecode including points-to analysis on large Java programs.
ConceptBase, a deductive and object-oriented database system based on a Datalog query evaluator. It is mainly used for conceptual modeling and meta-modeling.
IRIS, an open-source Datalog engine implemented in java. IRIS extends Datalog with function symbols, built-in predicates, locally stratified or un-stratified logic programs (using the well-founded semantics), unsafe rules and XML schema data types.
DES, an open-source implementation of Datalog to be used for teaching Datalog in courses.
XSB, a logic programming and deductive database system for Unix and Windows.
.QL, an object-oriented variant of Datalog created by Semmle.
Datalog [1], a lightweight deductive database system written in Lua.
SecPAL a security policy language developed by Microsoft Research [2]
DLV is a datalog extension that supports disjunctive head clauses.