C trigraph
From Wikipedia, the free encyclopedia
In the C family of programming languages, a trigraph is a sequence of three characters, the first two of which are both question marks, that represents a single character.
The reason for their existence is that the basic character set of C is a subset of the ASCII character set, but nine of its characters lie outside the smaller ISO 646 invariant character set. ISO 646 is largely equivalent to ASCII, except that certain punctuation characters present in ASCII may be replaced by "national characters": users of non-English languages are free to reassign those code positions to additional letters needed in their languages. This poses a problem for C programming, because the displaced punctuation characters are heavily used in C. The ANSI C committee invented trigraphs so that programs could be written using any version of the ISO 646 character set. Non-ASCII ISO 646 character sets are not much used today; trigraphs were removed from C++ in C++17 and from C in C23.
Trigraphs may also be useful with some EBCDIC code pages that lack characters such as { and }.
Trigraphs are rarely used outside compiler test suites. Many compilers either have an option to turn recognition of trigraphs off, or disable trigraphs by default and require an option to turn them on. Some can issue warnings when they encounter trigraphs in source files. Borland supplied a separate program, the trigraph preprocessor, to be used only when trigraph processing is desired.
Processing of trigraphs may be considered a performance burden on compilers as every character in every source file has to be checked to see if it introduces a trigraph. However, the source characters have to be examined individually anyway, so the additional overhead is small.
Trigraph sequences
The C preprocessor replaces all occurrences of the following nine trigraph sequences with their single-character equivalents in the first phase of translation, before any other processing.
    Trigraph  Equivalent
    ========  ==========
    ??=       #
    ??/       \
    ??'       ^
    ??(       [
    ??)       ]
    ??!       |
    ??<       {
    ??>       }
    ??-       ~
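Note that ??? is not a trigraph sequence. However, the last three characters of ???= do form the trigraph ??=, so that four-character sequence becomes ?#.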
Note also that the problematic characters are nevertheless required to exist within the implementation, in both the source and execution character sets.
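The replacement step can be sketched as a single left-to-right pass over the source text. The function name replace_trigraphs and the two lookup strings are illustrative assumptions, not part of any standard or compiler API, and the sketch covers only trigraph replacement, not the rest of the first translation phase. The code deliberately contains no literal trigraphs, so it compiles the same way whether or not the compiler recognizes them.

```c
#include <string.h>

/* Hypothetical lookup strings: the third character of each trigraph and
   the character it stands for, at matching positions (table order). */
static const char trigraph_from[] = "=/'()!<>-";
static const char trigraph_to[]   = "#\\^[]|{}~";

/* Replace every trigraph in text, in place. Each replacement shrinks the
   text by two characters, so the write position never passes the read
   position and one buffer suffices. A run of three question marks is not
   a trigraph: the first '?' is copied unchanged and scanning resumes at
   the next character, which may begin a real trigraph. */
void replace_trigraphs(char *text)
{
    const char *src = text;
    char *dst = text;

    while (*src != '\0') {
        /* src[2] is read only when src[1] is '?', so it is in bounds; the
           '\0' check matters because strchr would match the terminator. */
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0') {
            const char *p = strchr(trigraph_from, src[2]);
            if (p != NULL) {
                *dst++ = trigraph_to[p - trigraph_from];
                src += 3;
                continue;
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

Applied to a buffer holding n??(4??), for instance, it leaves n[4]; a buffer holding ???= becomes ?#.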
The ??/ trigraph can be used to introduce an escaped newline for line splicing; this must be taken into account for correct and efficient handling of trigraphs within the preprocessor. It can also cause surprises, particularly within comments. For example,
    // Will the next line be executed????????????????/
    a++;
is a single logical comment line: the final ??/ becomes a backslash that splices the next line onto the comment, so a++; is never executed. Similarly,
    /??/
    * A comment *??/
    /
is a correctly formed block comment, which reads /* A comment */ after trigraph replacement and line splicing.
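The behavior of the comment examples follows from running the first two translation phases in order: trigraph replacement, then deletion of every backslash immediately followed by a newline. The sketch below models just those two steps as passes over a modifiable buffer; the function names are hypothetical. It does not treat comments specially, which matches the standard, since comments are only recognized in a later phase (phase 3).

```c
#include <string.h>

/* Hypothetical lookup strings, in trigraph-table order: third character
   of each trigraph and the character it stands for. */
static const char trigraph_from[] = "=/'()!<>-";
static const char trigraph_to[]   = "#\\^[]|{}~";

/* Phase 1 (sketch): replace trigraphs in place, scanning left to right. */
static void replace_trigraphs(char *text)
{
    const char *src = text;
    char *dst = text;
    while (*src != '\0') {
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0') {
            const char *p = strchr(trigraph_from, src[2]);
            if (p != NULL) {
                *dst++ = trigraph_to[p - trigraph_from];
                src += 3;
                continue;
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}

/* Phase 2 (sketch): delete each backslash that is immediately followed by
   a newline, together with that newline, joining the two physical lines. */
static void splice_lines(char *text)
{
    const char *src = text;
    char *dst = text;
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;
            continue;
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}

/* Apply phase 1 and then phase 2 to a modifiable string, in place. */
void first_two_phases(char *text)
{
    replace_trigraphs(text);
    splice_lines(text);
}
```

Run on the block comment example, the two passes turn its three physical lines into the single line /* A comment */; run on the first example, they join a++; onto the end of the // comment.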
Example
An example of a C program that uses all the defined trigraphs:
    ??=include <stdio.h>                        /* # */
    int main(void)
    ??<                                         /* { */
        char n??(5??);                          /* [ and ] */
        n??(4??) = '0' - (??-0 ??' 1 ??! 2);    /* ~, ^ and | */
        printf("%c??/n", n??(4??));             /* ??/ = \ */
        return 0;
    ??>                                         /* } */
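With trigraph processing enabled (for example through GCC's -trigraphs option), the program prints 2 followed by a newline. The arithmetic can be checked with ordinary characters; the function name example_digit is hypothetical, and the explicit parentheses only spell out the grouping that operator precedence already gives the original expression, since ^ binds more tightly than |. The value of ~0 assumes two's-complement integers, which C23 requires.

```c
/* The example's expression with its trigraphs written as ordinary
   characters. Step by step on a two's-complement machine:
     ~0          is -1
     -1 ^ 1      is -2   (^ is evaluated before |)
     -2 | 2      is -2   (the bit worth 2 is already set in -2)
     '0' - (-2)  is '0' + 2, which is '2' (digit codes are contiguous) */
char example_digit(void)
{
    char n[5];
    n[4] = '0' - ((~0 ^ 1) | 2);
    return n[4];
}
```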
Disambiguation
A programmer may want to place two question marks together without having the compiler treat them as introducing a trigraph. The C grammar does not permit two consecutive ? tokens, so the only places in a C file where two question marks in a row may appear are multi-character constants, string literals, and comments. To place two consecutive question marks safely within a string literal, the literal can be split so that the compiler concatenates the pieces, as in "...?""?...", or the second question mark can be written with the escape sequence \?, as in "...?\?...".
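The concatenation trick can be checked alongside the \? escape sequence, which standard C also provides for writing a question mark in a literal. Trigraphs are replaced in the first translation phase, while adjacent string literals are not concatenated until phase 6, so splitting the pair across two literals keeps the question marks apart when trigraphs are scanned. The variable and function names below are illustrative.

```c
#include <string.h>

/* The three characters ?, ?, = written two ways that never form a trigraph. */
static const char concatenated[] = "?""?=";   /* literals joined in phase 6 */
static const char escaped[]      = "?\?=";    /* \? is the escape for '?'   */

/* Returns nonzero when both spellings yield the same three-character string. */
int spellings_agree(void)
{
    return strlen(concatenated) == 3 && strcmp(concatenated, escaped) == 0;
}
```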
Alternatives
In 1994 a normative amendment to the C standard, included in C99, supplied so-called digraphs as more readable alternatives to trigraphs. They are:
    Digraph  Equivalent
    =======  ==========
    <:       [
    :>       ]
    <%       {
    %>       }
    %:       #
    %:%:     ##
Unlike trigraphs, digraphs are not substituted within quoted strings, character constants, or comments.
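The array and brace punctuation of the trigraph example can be rewritten with digraphs, which have been part of standard C since that amendment and so do not depend on the special options that trigraphs often need. Digraphs provide no spelling for ~, ^ or |, so the sketch below also uses the compl, xor and bitor macros from <iso646.h>, a header added by the same normative amendment. The function name is hypothetical; the operator names come from that standard header.

```c
%:include <iso646.h>

/* Digraphs: %: is #, <: and :> are [ and ], <% and %> are { and }.
   <iso646.h> supplies compl (~), xor (^) and bitor (|). */
char digraph_example_digit(void)
<%
    char n<:5:>;
    n<:4:> = '0' - ((compl 0 xor 1) bitor 2);   /* same value as before: '2' */
    return n<:4:>;
%>
```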