Concatenative programming language

From Wikipedia, the free encyclopedia

The concatenative or stack-based programming languages are ones in which the concatenation of two pieces of code expresses the composition of the functions they express. These languages use a stack to store the arguments and return values of operations.

The most widespread concatenative language is the page description language PostScript, a restricted version of which is used in PDF. However, Postcript code are usually generated by programs written in other languages. Other well-known concatenative languages include Forth[1][2], and the RPL used on Hewlett-Packard HP-28 and HP-48 scientific calculators.

The RPL used on Hewlett-Packard HP-28 and HP-48 scientific calculators is a concatenative garbage-collected language that has been popular amongst generations of electrical engineers. Other concatenative languages include Joy, Cat, and Factor[3][4], [5][6].

The concatenative languages may be regarded as a language family, similar to the Lisp languages. As with the Lisps (including Scheme), there are strong "family resemblances" within the group. However, there are also major differences in the design, implementation, and purposes of these languages. RPL and many versions of Forth itself are targeted at small and embedded devices; likewise PostScript is targeted largely at a specific type of device, namely printers. Other concatenative languages, however, are general-purpose or research-focused, as with Joy and Cat.

Contents

[edit] History and description

Forth was the first language of this sort, but Joy was the first language to call itself concatenative. The creator of Joy, Manfred von Thun, has written much about concatenative theory.

Formally speaking, a programming language is concatenative (and not applicative) when:[citation needed]

  • The elementary well-formed expressions of the language are monadic functions of one argument and one return value.
  • If X and Y are well-formed expressions, then the concatenation of X and Y is well-formed.
  • If Z is the concatenation of X and Y, then the value of Z is the composition of the values of X and Y.

In this definition, there is no mention of the stack; however, all concatenative languages do use a stack, as it is the most straightforward way to pass data from one operation to the next. The term "concatenative" is not universally accepted as a particularly useful term, with many users of these languages preferring to call them "stack-based".

Operators or functions in these languages are usually referred to as words, and this article will refer to them as such.

[edit] Postfix notation

Code in concatenative languages resemble the postfix notation or reverse Polish notation (RPN) form of arithmetic. Arguments, or code that produces arguments, comes first, followed by the words that use those arguments. Literals, such as numbers, place values on the stack; other words may consume some items from the stack and produce others.

An example:

2 3 + .

In Forth and Factor, this is a valid program: 2 and 3 are literals, words that simply push a value onto the stack. + is a mathematical operator; it removes the top two values from the stack and pushes their sum. Finally, the . (dot) operator removes the top value and prints it to the display. Thus, this program will print the number 5.

[edit] Flow control

In order to be Turing-complete, a language must have some means of flow control -- the ability to make branches or choices based on the value of an expression. Concatenative languages implement flow control in different ways. Contrast the following examples in Forth and Factor:

= IF 23 ELSE 17 THEN
= [ 23 ] [ 17 ] if

Both examples do the same thing: compare the top two items on the stack; if they are equal, the value 23 is left on the stack; if they are not equal, the value 17 is left. In Forth, the word IF is special; it causes the code prior to the ELSE to be executed if the condition is true, or the code after the ELSE if it is false. In Factor, the word if is just another function; it consumes the condition and two quotations, or quoted code fragments; it executes the first quotation or the second based on whether the condition is true. The ifelse operator in PostScript is similar.

The formalization of code quotations, a feature ultimately inherited from Lisp, distinguishes the later concatenative languages such as Factor and PostScript from earlier ones such as Forth. Words that operate on quotations are sometimes called combinators.

[edit] Definitions and parsing words

The concatenative languages differ widely in the syntax used to define new words. The following pieces of code in Cat, Forth or Factor, and Joy all do the same thing: define a new word called add100 which adds 100 to the value on the top of the stack.

define add100  { 100 + }
: add100  100 + ;
DEFINE add100 ==  100 + .

In each case, the body of the definition is the same: the words 100 +. What differ are the parsing words used to tell the language that a definition is taking place. Parsing words are somewhat analogous to keywords in C or special operators in Lisp: rather than expressing operations on the stack, they control how the parser deals with the following words.

It is possible to define new parsing words. This ability gives concatenative languages their equivalent of Lisp macros: programmers can define new syntax.

[edit] Stack effect

Concatenative programming makes use of the notion of the stack effect of a word. A word's stack effect is a description of what values are consumed from the stack, and what values are left on the stack, when the word is executed. Typically, a word will have a consistent stack effect: it will always consume the same number of arguments and leave the same number of results.

[edit] External links