INTERCAL, a programming language parody, is an esoteric programming language that was created by Don Woods and James M. Lyon, two Princeton University students, in 1972. It satirizes aspects of the various programming languages at the time,[1] as well as the proliferation of proposed language constructs and notations in the 1960s. Consequently, the humor may appear rather dated to modern readers brought up with C or Java.
According to the original manual by the authors,[2]
“ | The full name of the compiler is "Compiler Language With No Pronounceable Acronym," which is, for obvious reasons, abbreviated "INTERCAL." | ” |
There are two currently maintained versions of INTERCAL: C-INTERCAL, formerly maintained by Eric S. Raymond,[3] and CLC-INTERCAL, maintained by Claudio Calvelli.[4]
Contents |
INTERCAL is intended to be completely different from all other computer languages. Common operations in other languages have cryptic and redundant syntax in INTERCAL. From the INTERCAL Reference Manual:[2]
“ | It is a well-known and oft-demonstrated fact that a person whose work is incomprehensible is held in high esteem. For example, if one were to state that the simplest way to store a value of 65536 in a 32-bit INTERCAL variable is:
any sensible programmer would say that that was absurd. Since this is indeed the simplest method, the programmer would be made to look foolish in front of his boss, who would of course happened to turn up, as bosses are wont to do. The effect would be no less devastating for the programmer having been correct. |
” |
The INTERCAL Reference Manual contains many paradoxical, nonsensical, or otherwise humorous instructions (in the manner of the game Mornington Crescent):
“ | Caution! Under no circumstances confuse the mesh with the interleave operator, except under confusing circumstances! | ” |
The manual also contains a "tonsil", as explained in this footnote: "4) Since all other reference manuals have Appendices, it was decided that the INTERCAL manual should contain some other type of removable organ."[2]
INTERCAL has many other features designed to make it even more aesthetically unpleasing to the programmer: it uses statements such as "READ OUT", "IGNORE", "FORGET", and modifiers such as "PLEASE". This last keyword provides two reasons for the program's rejection by the compiler: if "PLEASE" does not appear often enough, the program is considered insufficiently polite, and the error message says this; if too often, the program could be rejected as excessively polite. Although this feature existed in the original INTERCAL compiler, it was undocumented.[5]
The INTERCAL manual gives unusual names to all non-alphanumeric ASCII characters: single and double quotes are "sparks" and "rabbit ears" respectively. (The exception is the ampersand: as the Jargon File states, "what could be sillier?") The assignment operator, represented as an equals sign (INTERCAL's "half mesh") in many other programming languages, is in INTERCAL a left-arrow, "<-", referred to as "gets" and made up of an "angle" and a "worm".
The original Princeton implementation used punched cards and the EBCDIC character set. To allow INTERCAL to run on computers using ASCII, substitutions for two characters had to be made: $ substituted for ¢ as the mingle operator, "represent[ing] the increasing cost of software in relation to hardware", and ? was substituted for ∀ as the unary exclusive-or operator to "correctly express the average person's reaction on first encountering exclusive-or".[2] In recent versions of C-INTERCAL, the older operators are supported as alternatives; INTERCAL programs may now be encoded in ASCII, Latin-1, or UTF-8.[5]
The Usenet newsgroup alt.lang.intercal is devoted to the study and appreciation of INTERCAL and other esoteric languages.
Despite the language's intentionally obtuse and wordy syntax, INTERCAL is nevertheless Turing-complete: given enough memory, INTERCAL can solve any problem that a Universal Turing machine can solve. Most implementations of INTERCAL do this very slowly, however. A Sieve of Eratosthenes benchmark, computing all prime numbers less than 65536, was tested on a Sun SPARCStation-1. In C, it took less than half a second; the same program in INTERCAL took over seventeen hours.[6]
It should be noted that almost any programming language allows notational horrors as great as or greater than INTERCAL's, as demonstrated in contests such as the International Obfuscated C Code Contest. However, these are generally intentional efforts to create unreadable code, in contrast to INTERCAL's design, which forces virtually all code to be unreadable.
According to the INTERCAL manual, "the aim in designing INTERCAL was to have no precedents", supposedly neither in flow control features, nor in data manipulation operators. The designers were relatively successful; however, one known precedent is a machine instruction [7] in a Soviet mainframe computer BESM-6, released in 1967, that is effectively equivalent to INTERCAL's "select" operator.
The original Woods–Lyon INTERCAL was very limited in its input/output capabilities: the only acceptable input were numbers with the digits spelled out, and the only output was an extended version of Roman numerals. A while later, there was an 'Atari implementation', about which notes are provided in the INTERCAL reference manual; it 'differs from the original Princeton version primarily in the use of ASCII rather than EBCDIC'.[2]
The C-INTERCAL reimplementation, being available on the Internet, has made the language more popular with devotees of esoteric programming languages.[8] The C-INTERCAL dialect has a few differences from original INTERCAL and introduced a few new features, such as a COME FROM statement and a means of doing text I/O based on the Turing Text Model.[5]
The authors of C-INTERCAL also created the TriINTERCAL variant, based on the Ternary numeral system and generalizing INTERCAL's set of operators.[5]
A more recent variant is Threaded Intercal, which extends the functionality of COME FROM to support multithreading.[9]
INTERCAL-72 (the original version of INTERCAL) had only four data types, the 16-bit integer (represented with a .
, called a 'spot'), the 32-bit integer (:
, a 'twospot'), the array of 16-bit integers (,
, a 'tail'), and the array of 32-bit integers (;
, a 'hybrid'). There are 65535 available variables of each type, numbered from .1
to .65535
for 16-bit integers, for instance. However, each of these variables has its own stack on which it can be pushed and popped (STASHed and RETRIEVEd, in INTERCAL terminology), increasing the possible complexity of data structures.[2] (More modern versions of INTERCAL have by and large kept the same data structures, with appropriate modifications; TriINTERCAL, which modifies the radix with which numbers are represented, can use a 10-trit type rather than a 16-bit type,[5] and CLC-INTERCAL implements many of its own data structures, such as 'classes and lectures', by making the basic data types store more information rather than adding new types.[4]) Arrays are dimensioned by assigning to them as if they were a scalar variable. Constants can also be used, and are represented by a #
('mesh') followed by the constant itself, written as a decimal number; only integer constants from 0 to 65535 are supported.[2]
There are only five operators in INTERCAL-72; implementations vary in which characters represent which operation, and many accept more than one character, so more than one possibility is given for many of the operators.
Sources for this table:[2][4][5]
Operator | INTERCAL-72 characters | Atari characters | C-INTERCAL characters | CLC-INTERCAL characters |
---|---|---|---|---|
INTERLEAVE / MINGLE | c backspace / |
$ |
¢ , $ , c backspace / |
¢ |
SELECT | ~ |
~ |
~ |
~ |
AND | & |
& |
& |
& |
OR | V |
V |
V |
V |
XOR | V backspace - |
? |
V backspace - , ? , ∀ |
V backspace - , ¥ |
Contrary to most other languages, AND, OR, and XOR are unary operators, which work on consecutive bits of their argument; the most significant bit of the result is the operator applied to the most significant and least significant bits of the input, the second-most-significant bit of the result is the operator applied to the most and second-most significant bits, the third-most-significant bit of the result is the operator applied to the second-most and third-most bits, and so on. The operator is placed between the punctuation mark specifying a variable name or constant and the number that specifies which variable it is, or just inside grouping marks (i.e. one character later than it would be in programming languages like C.) SELECT and INTERLEAVE (which is also known as MINGLE) are infix binary operators; SELECT takes the bits of its first operand that correspond to '1' bits of its second operand and removes the bits that correspond to '0' bits, shifting towards the least significant bit and padding with zeroes (so 51 (110011 in binary) SELECT 21 (10101 in binary) is 5 (101 in binary)); MINGLE alternates bits from its first and second operands (in such a way that the least significant bit of its second operand is the least significant bit of the result). There is no operator precedence; grouping marks must be used to disambiguate the precedence where it would otherwise be ambiguous (the grouping marks available are '
('spark'), which matches another spark, and "
('rabbit ears'), which matches another rabbit ears; the programmer is responsible for using these in such a way that they make the expression unambiguous.)[2]
INTERCAL statements all start with a 'statement identifier'; in INTERCAL-72, this can be DO
, PLEASE
, or PLEASE DO
, all of which mean the same to the program (but using one of these too heavily causes the program to be rejected, an undocumented feature in INTERCAL-72 that was mentioned in the C-INTERCAL manual[5]), or an inverted form (with NOT
or N'T
appended to the identifier).[2] Backtracking INTERCAL, a modern variant, also allows variants using MAYBE
(possibly combined with PLEASE or DO) as a statement identifier, which introduces a choice-point.[10] Before the identifier, an optional line number (an integer enclosed in parentheses) can be given; after the identifier, a percent chance of the line executing can be given in the format %50
, which defaults to 100%.[2]
In INTERCAL-72, the main control structures are NEXT, RESUME, and FORGET. DO (line) NEXT
branches to the line specified, remembering the next line that would be executed if it weren't for the NEXT on a call stack (other identifiers than DO can be used on any statement, DO is given as an example); DO FORGET expression
removes expression entries from the top of the call stack (this is useful to avoid the error that otherwise happens when there are more than 80 entries), and DO RESUME expression
removes expression entries from the call stack and jumps to the last line remembered.[2]
C-INTERCAL also provides the COME FROM instruction, written DO COME FROM (line)
; CLC-INTERCAL and the most recent C-INTERCAL versions also provide computed COME FROM (DO COME FROM expression
) and NEXT FROM, which is like COME FROM but also saves a return address on the NEXT STACK.[4]
Alternative ways to affect program flow, originally available in INTERCAL-72, are to use the IGNORE and REMEMBER instructions on variables (which cause writes to the variable to be silently ignored and to take effect again, so that instructions can be disabled by causing them to have no effect), and the ABSTAIN and REINSTATE instructions on lines or on types of statement, causing the lines to have no effect or to have an effect again respectively.[2]
Input (using the WRITE IN
instruction) and output (using the READ OUT
instruction) do not use the usual formats; in INTERCAL-72, WRITE IN inputs a number written out as digits in English (such as SIX FIVE FIVE THREE FIVE), and READ OUT outputs it in 'butchered' Roman numerals.[2] More recent versions have their own I/O systems.[4][5] Comments can be achieved by using the inverted statement identifiers involving NOT or N'T; these cause lines to be initially ABSTAINed so that they have no effect.[2] (A line can be ABSTAINed from even if it doesn't have valid syntax; syntax errors happen at runtime, and only then when the line is un-ABSTAINed.)[2]
The traditional "Hello, world!" program demonstrates how different INTERCAL is from standard programming languages. In C, it could read as follows:
#include <stdio.h> int main() { printf("Hello, world!\n"); return 0; }
The equivalent program in C-INTERCAL is longer and harder to read:
DO ,1 <- #13 PLEASE DO ,1 SUB #1 <- #238 DO ,1 SUB #2 <- #108 DO ,1 SUB #3 <- #112 DO ,1 SUB #4 <- #0 DO ,1 SUB #5 <- #64 DO ,1 SUB #6 <- #194 DO ,1 SUB #7 <- #48 PLEASE DO ,1 SUB #8 <- #22 DO ,1 SUB #9 <- #248 DO ,1 SUB #10 <- #168 DO ,1 SUB #11 <- #24 DO ,1 SUB #12 <- #16 DO ,1 SUB #13 <- #162 PLEASE READ OUT ,1 PLEASE GIVE UP
Hello world cannot be implemented in INTERCAL-72 since the language lacks a facility for text output. Only "butchered" Roman numerals can be printed.[2]
In the article "A Box, Darkly: Obfuscation, Weird Languages, and Code Aesthetics",[8] INTERCAL is described under the heading "Abandon all sanity, ye who enter here: INTERCAL". The compiler and commenting strategy are among the "weird" features described:
“ | The compiler, appropriately named "ick," continues the parody. Anything the compiler can’t understand, which in a normal language would result in a compilation error, is just skipped. This "forgiving" feature makes finding bugs very difficult; it also introduces a unique system for adding program comments. The programmer merely inserts non-compileable text anywhere in the program, being careful not to accidentally embed a bit of valid code in the middle of their comment. | ” |