List comprehension

From Wikipedia, the free encyclopedia

In some programming languages, list comprehension is a syntactic construct for creating a list based on existing lists, analogous to the set-builder notation (set comprehension), that is, the mathematical notation such as the following:

S=\{\,2*x\mid x \in \mathbb{N}, x^2>3\,\}

For an example, in Haskell's list comprehension syntax, the example set-builder construct above would be similarily written as:

S = [ 2*x | x<-[0..], x^2>3 ] 

Here, the list [0..] represents N, and x^2>3 represents the conditional. The example could be read: "S is the list of all 2*x where x is an item in the list of natural numbers, and x squared is greater than 3."


Contents

[edit] History

The earliest reference to the list comprehension notation is in Rod Burstall and John Darlington's description of their programming language, NPL from 1977, but SETL already had a similar construct. In the computer algebra system AXIOM, a similar construct processes streams.

Comprehensions were proposed as a query notation for databases [1] and were implemented in the Kleisli database query language [2].

[edit] Forms of list comprehension

[edit] In Haskell

In Haskell, list comprehensions can also be written as expressions involving the higher-order functions map and filter. For example, S above can also be written as

S = map (2*) (filter (\x -> x^2 > 3) [0..])

However, Haskell's list comprehension syntax permits any number of comma-separated qualifiers after the pipe |. A qualifier can be a generator drawing items from a list, a guard doing filtering, or a local declaration using let. Certain lists are too complex to express by other means.

[edit] In Python

Further information: Python programming language

The Python programming language has a corresponding syntax for expressing list comprehensions. The near-equivalent example in Python is as follows:

L = range(100)   # Finite list from 0 to 99, since values are computed up-front.
S = [2*x for x in L if x**2 > 3]

The xrange function may be used when lazy evaluation is desired. Unlike range, which creates the entire list up front (using Python's own list type), an xrange is a generator; as such, it generates the elements of the range as they are iterated upon (as by the list comprehenson).

A generator expression may be used in Python 2.4 to achieve functional equivalence with S using a generator to iterate an infinite list:

from itertools import count
S = (2*x for x in count() if x**2 > 3)

[edit] In Erlang

S = [2*X || X <- L, X*X > 3]

[edit] In PostScript

PostScript has no list comprehension syntax as such, but its stack-based operation and the illusory nature of its array syntax create one.

In PostScript, an array is defined by the syntax:

[1 2 3]

But this is somewhat misleading to somebody who does not know PostScript. It appears to be a real array-literal syntax; it appears that the interpreter will actually recognize this attempt to create an array and create one atomically.

This is not the case. In fact, as defined by Adobe's PostScript Level 3 specification, “[” and “]” are operators: [ pushes a mark onto the stack, and ] creates an array from every item on the stack down to the nearest mark (and pushes it). This is perfectly legal PostScript:

/begin_list [ def          % 1
/end_list (]) cvn load def % 2
begin_list 1 2 3 end_list  % 3
  1. Defines a variable named “begin_list” containing a mark.
  2. Converts the string “]” to a name (“/foo” is called a name in PostScript), retrieves the value associated with that name (the ] operator), and defines a variable named “end_list” containing the retrieved operator. Saying “end_list” in our code will now execute the ] operator.
  3. Creates a simple array using our newly-invented array syntax.

The effect of this is that with no special cases or special syntax in the interpreter, the results of any PostScript code — not necessarily a simple sequence of objects — can be entered into an array. Thus, a simple list comprehension:

[ 1 1 5 { } for ]

The for operator takes four arguments: start, increment, stop, and a proc to execute. Each successive value is pushed onto the stack, and then the proc is executed; our proc is empty, which means that the pushed values are not consumed, which means that they build up in the array.

Implementing the example from the previous two sections:

[ 0 1 max { dup 2 exp 3 gt { } { pop } ifelse } for ] % 1
  1. Assuming that /max has previously been defined to a maximum value to square.

Note that, since the building of the array is not done lazily (that is, as items are retrieved), the array must be finite or memory will be exceeded.

The code is as follows:

  1. For every number from 0 to the maximum:
    1. Duplicate the number on the stack. Stack contents: x x.
    2. Raise the topmost element on the stack to the 2nd power (that is, square it). Stack contents: x x2.
    3. Compare the topmost element on the stack (x2) to 3. The result of this conversion, which is a bool, is the first argument to the ifelse operator. Stack contents: x x2 (x2 > 3).
    4. First proc (and second argument to ifelse): If x2 > 3, do nothing (leave x on the stack). Thus x will be included in the array.
    5. Second proc (and third argument to ifelse): Else (that is, if x2 ≤ 3), pop x from the stack. Thus x will be excluded from the array.
    6. The ifelse operator is executed. It consumes the previous three items, and decides between the two procs based on the bool.

[edit] Monad comprehension

Monad comprehension is a generalization of list comprehension to other monads in functional programming. The list comprehension syntax can be understood as operations in a monad, for the earlier example written in the monadic do-notation:

do x <- [0..]
   guard (x^2>3)
   return (2*x)

The use of guard requires that the monad has the additional operation mzero. Otherwise, the list comprehension becomes a do-block, each qualifier becomes an operation, and the result item is moved from the front to the final return operation.

Sometime before the Haskell 98 standard the list comprehension syntax was first generalized to monad comprehension but then specialized back as it caused type ambiguity: ['c'] wouldn't mean a list of one element, the character c, but the result of return 'c' in any monad.

[edit] Parallel list comprehension

The Glasgow Haskell Compiler has an extension called parallel list comprehension (also known as zip-comprehension) that permits multiple independent branches of qualifiers within the list comprehension syntax. Whereas qualifiers separated by commas are dependent, qualifier branches separated by pipes are evaluated in parallel. First consider the following list comprehension with conventional dependent qualifiers:

[(x,y) | x <- a, y <- b]

This takes items from lists a and b and produces a list of pairs of the items. For each item from a, each item from b is drawn in turn to get all combinations of the items. This specific case is the Cartesian product from set theory and relational algebra.

Now consider the following list comprehension with independent qualifiers provided by the extension:

[(x,y) | x <- a | y <- b]

The difference is that this doesn't produce all combinations. Instead, the first branch produces one item in turn from a and the second branch from b, and the result is a pairing of each item from a with one item from b. This ressembles a zipper in aligning the items from the two lists.

Considering a rewrite to higher-order functions, parallel comprehension adds zipWith to the earlier map and filter. The example parallel comprehension could simply be rewritten as follows:

zipWith (,) a b

In the general case, each parallel branch is rewritten into a separate list comprehension, the result lists are zipped into an aligned list, and an outer list comprehension turns it into the result list.

[edit] See also

[edit] References

  • List Comprehension in The Free On-line Dictionary of Computing, Editor Denis Howe.
  • Phil Trinder [3] Comprehensions, a query notation for DBPLs, Proceedings of the third international workshop on Database programming languages : bulk types & persistent data: bulk types & persistent data, Nafplion, Greece, Pages 55-68, 1992.
  • Philip Wadler. Comprehending Monads. Proceedings of the 1990 ACM Conference on LISP and Functional Programming, Nice. 1990.
  • Limsoon Wong [4] The Functional Guts of the Kleisli Query System, International Conference on Functional Programming, Proceedings of the fifth ACM SIGPLAN international conference on Functional programming, Pages 1-10, 2000.

Haskell:

Python:

PostScript:

Common Lisp

Axiom:

  • Axiom stream examples: [5]

[edit] External links

In other languages