Algebraic data type

From Wikipedia, the free encyclopedia

In computer programming, an algebraic data type is a datatype each of whose values is data from other datatypes wrapped in one of the constructors of the datatype. Any wrapped data is arguments to the constructor. In contrast to other datatypes, the constructor is not executed and the only way to operate on the data is to unwrap the constructor using pattern matching.

The most common algebraic data type is a list with two constructors: Nil or [] for an empty list, and Cons (an abbreviation of constructor), ::, or : for the combination of a new element with a shorter list (for example (Cons 1 '(2 3 4)) or 1:[2,3,4]).

Special cases of algebraic types are product types (only one constructor) and enumeration types (many constructors with no arguments). Algebraic types are one kind of constructed type (i.e. a type formed by combining other types).

An algebraic data type may also be an abstract data type (ADT) if it is exported from a module without its constructors. Values of such a type can only be manipulated using functions defined in the same module as the type itself.

In set theory the equivalent of an algebraic data type is a discriminated union – a set whose elements consist of a tag (equivalent to a constructor) and an object of a type corresponding to the tag (equivalent to the constructor arguments).

Contents

[edit] An example

For example, in Haskell we can define a new algebraic data type, Tree:

data Tree = Empty 
          | Leaf Int 
          | Node Tree Tree

or in OCaml syntax:

type tree = Empty 
          | Leaf of int 
          | Node of tree * tree

or in Visual Prolog syntax:

domains
    tree =
        empty();
        leaf(integer Leaf);
        node(tree Left, tree Right).


Here, Empty, Leaf and Node are the constructors. Somewhat similar to a function, a constructor is applied to arguments of an appropriate type, then yielding an instance of the data type to which the constructor belongs. For instance, Leaf has the something like a “functional type” Int -> Tree meaning that giving an integer as an argument to Leaf produces a value of the type Tree. As Node takes two arguments of the type Tree itself, the datatype is recursive.

Operations on algebraic data types can be defined by using pattern matching to retrieve the arguments. For example, consider a function to find the depth of a Tree:

depth :: Tree -> Int
depth Empty       = 0
depth (Leaf n)    = 1
depth (Node l r) = 1 + max (depth l) (depth r)

Thus, a Tree given to depth can be constructed using any of Empty, Leaf or Node and we must match for any of them respectively to deal with all cases. In case of Node, the pattern extracts the subtrees l and r for further processing.

[edit] Theory

A general algebraic data type is a (possibly recursive) sum type of product types. Each constructor tags a product type to separate it from others, or if there is only one constructor, the data type is a product type. Further, the parameter types of a constructor are the factors of the product type. A parameterless constructor corresponds to the empty product. If a datatype is recursive, the entire sum of products is wrapped in a recursive type, and each constructor also rolls the datatype into the recursive type.

For example, the Haskell datatype:

  data List a = Nil | Cons a (List a)

is represented in type theory as \lambda \alpha. \mu \beta. 1 + \alpha \times \beta with constructors \mathrm{nil}_\alpha = \mathrm{roll}\ (\mathrm{inl}\ \langle\rangle) and \mathrm{cons}_\alpha\ x\ l = \mathrm{roll}\ (\mathrm{inr}\  \langle x, l\rangle).

The Haskell List datatype can also be represented in type theory in a slightly different form, as follows: \mu \phi. \lambda a. 1 + \alpha \times \phi\ \alpha. (Note how the μ and λ contructs are reversed relative to the original.) The original formation specified a type function whose body was a recursive type; the revised version specifies a recursive function on types. (I use the type variable φ to suggest a function rather than a "base type" like β: φ is like a Greek "f".) Note that we must also now apply the function φ to its argument type α in the body of the type.

For the purposes of the List example, these two formulations are not significantly different; but the second form allows one to express so-called nested data types, i.e., those where the recursive type differs parametrically from the original. (For more information on nested data types, see the works of Richard Bird, Lambert Meertens and Ross Paterson.)

[edit] See also

[edit] Reference

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.