C preprocessor

From Wikipedia, the free encyclopedia

The C preprocessor (cpp) is the preprocessor for the C programming language. In many C implementations, it is a separate program invoked by the compiler as the first part of translation. The preprocessor handles directives for source file inclusion (#include), macro definitions (#define), and conditional inclusion (#if). The language of preprocessor directives is not strictly specific to the grammar of C, so the C preprocessor can also be used independently to process other types of files.

The transformations it makes on its input form the first four of C's so-called Phases of Translation. Though an implementation may choose to perform some or all phases simultaneously, it must behave as if it performed them one-by-one in order.

Phases

  1. Trigraph Replacement - The preprocessor replaces trigraph sequences with the characters they represent.
  2. Line Splicing - Physical source lines that are continued with escaped newline sequences are spliced to form logical lines.
  3. Tokenization - The preprocessor breaks the result into preprocessing tokens and whitespace. It replaces comments with whitespace.
  4. Macro Expansion and Directive Handling - Preprocessing directive lines, including file inclusion and conditional compilation, are executed. The preprocessor simultaneously expands macros and, since the 1999 version of the C standard (C99), handles _Pragma operators.
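
The effect of phases 2 and 3 can be observed from ordinary C code. The following is a minimal sketch; the names STR, spliced and commented are illustrative and not part of any standard header.

```c
/* Turns its argument's tokens into a string literal (phase 4). */
#define STR(x) #x

/* Phase 2: the backslash-newline pair is deleted, joining the two
   physical lines into one logical line, even inside a string literal. */
static const char *spliced = "hello, \
world";

/* Phase 3: the comment is replaced by whitespace before macros are
   expanded, so the stringified text contains a single space. */
static const char *commented = STR(a/* comment */b);
```

Under these assumptions, spliced holds the text hello, world and commented holds the text a b.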

Including files

The most common use of the preprocessor is to include another file:

#include <stdio.h>

int main (void)
{
    printf("Hello, world!\n");
    return 0;
}

The preprocessor replaces the line #include <stdio.h> with the system header file of that name, which declares the printf() function amongst other things. More precisely, the entire text of the file 'stdio.h' replaces the #include directive.

The directive can also be written with double quotes, e.g. #include "stdio.h". Angle brackets were originally used to indicate 'system' include files and double quotes to indicate user-written include files, and it is good practice to retain this distinction; with the quoted form, many implementations search the directory of the including file before the system directories. C compilers and programming environments generally provide a way for the programmer to specify where include files can be found. This is commonly done through a command-line flag, which can be parameterized from a makefile so that, for instance, a different set of include files can be swapped in for different operating systems.

By convention, include files are given a .h extension, and files not included by others are given a .c extension. However, there is no requirement that this be observed. Files with other extensions are occasionally included; in particular, a .def extension may denote a file designed to be included multiple times, each time expanding the same repetitive content.

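Because the same header can be reached through several #include directives, headers commonly use #include guards or #pragma once (a widely supported but non-standard directive) to prevent double inclusion.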
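
As a rough sketch of how an #include guard works, the block below simulates including a hypothetical header point.h twice within a single file. The guard macro POINT_H, the struct point type and the helper point_sum are assumptions made for illustration.

```c
/* First "inclusion" of the hypothetical header point.h. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

/* Second "inclusion": POINT_H is already defined, so the body is
   skipped and struct point is not defined a second time. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

/* Ordinary code that uses the type declared by the header. */
static int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Without the guard, the second copy of the struct definition would be a redefinition error.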

Conditional compilation

The #if, #ifdef, #ifndef, #else, #elif and #endif directives can be used for conditional compilation.

#define __WINDOWS__

#ifdef __WINDOWS__
#include <windows.h>
#else
#include <unistd.h>
#endif

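The first line defines a macro __WINDOWS__. The macro could be defined implicitly by the compiler, or specified on the compiler's command line, perhaps to control compilation of the program from a makefile. (Identifiers beginning with two underscores are reserved for the implementation, so a program would not normally define such a name itself; it is defined here only to keep the example short.)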

The subsequent code tests if a macro __WINDOWS__ is defined. If it is, as in this example, the file <windows.h> is included, otherwise <unistd.h>.
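
Conditions can also be integer constant expressions tested with #if and #elif. The following sketch assumes a hypothetical configuration macro LOG_LEVEL, which a build system might supply; the names LOG_NAME and log_name are likewise illustrative.

```c
/* Fall back to a default when the build system did not set LOG_LEVEL. */
#ifndef LOG_LEVEL
#define LOG_LEVEL 1
#endif

/* Exactly one of the three branches survives preprocessing. */
#if LOG_LEVEL >= 2
#define LOG_NAME "verbose"
#elif LOG_LEVEL == 1
#define LOG_NAME "normal"
#else
#define LOG_NAME "quiet"
#endif

static const char *log_name(void)
{
    return LOG_NAME;
}
```

With the default value of 1, only the middle branch is kept and log_name returns the text normal.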

Macro definition and expansion

There are two types of macros, object-like and function-like. Function-like macros take parameters; object-like macros don't. The generic syntax for declaring an identifier as a macro of each type is, respectively,

 #define <identifier> <replacement token list>
 #define <identifier>(<parameter list>) <replacement token list>

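Note that in the definition of a function-like macro there must not be any whitespace between the macro identifier and the left parenthesis; otherwise the parenthesis becomes part of the replacement token list of an object-like macro.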

Wherever the identifier appears in the source code it is replaced with the replacement token list, which can be empty. For an identifier declared to be a function-like macro, it is only replaced when the following token is also a left parenthesis that begins the argument list of the macro invocation. The exact procedure followed for expansion of function-like macros with arguments is subtle.
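
The rule that a function-like macro is only replaced when followed by a left parenthesis can be illustrated with a macro that shares its name with a function. This is a sketch only; the function square and its deliberately different macro are invented for the example.

```c
/* A real function, defined before the macro of the same name exists. */
static int square(int x)
{
    return x * x;
}

/* A function-like macro with the same name; it returns a different
   value on purpose, so that it is obvious which one was used. */
#define square(x) (-1)

/* "square" is followed by "(", so the macro is expanded. */
static int use_macro(void)
{
    return square(4);
}

/* "square" is followed by ")", so the macro is not expanded and the
   real function is called. */
static int use_function(void)
{
    return (square)(4);
}
```

Here use_macro returns -1 and use_function returns 16.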

Object-like macros are conventionally used as part of good programming practice to create symbolic names for constants, e.g.

#define PI 3.14159

instead of hard-coding those numbers throughout one's code.

An example of a function-like macro is:

#define RADTODEG(x) ((x) * 57.29578)

This defines a radians-to-degrees conversion which can be used subsequently, e.g. RADTODEG(34). It is expanded in place, so the programmer does not need to scatter copies of the multiplication constant throughout the code. The macro here is written in all uppercase to emphasize that it is a macro, not a compiled function.

Precedence

Note that the macro uses parentheses both around the argument and around the entire expression. Omitting either of these can lead to unexpected results. For example:

  • Without parentheses around the argument:
      • Macro defined as #define RADTODEG(x) (x * 57.29578)
      • RADTODEG(a + b) expands to (a + b * 57.29578)
  • Without parentheses around the whole expression:
      • Macro defined as #define RADTODEG(x) (x) * 57.29578
      • 1 / RADTODEG(a) expands to 1 / (a) * 57.29578

Neither of these gives the intended result.
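
The difference can be checked numerically. In this sketch the two faulty variants are given the hypothetical names RADTODEG_NO_ARG and RADTODEG_NO_OUTER so that all three definitions can coexist.

```c
#define RADTODEG(x)          ((x) * 57.29578)
#define RADTODEG_NO_ARG(x)   (x * 57.29578)   /* argument not parenthesized */
#define RADTODEG_NO_OUTER(x) (x) * 57.29578   /* whole expression not parenthesized */

/* (a + b) * 57.29578, as intended. */
static double sum_ok(double a, double b)  { return RADTODEG(a + b); }

/* a + (b * 57.29578): only b is converted. */
static double sum_bad(double a, double b) { return RADTODEG_NO_ARG(a + b); }

/* 1 / (a * 57.29578), as intended. */
static double inv_ok(double a)  { return 1 / RADTODEG(a); }

/* (1 / a) * 57.29578: the division binds first. */
static double inv_bad(double a) { return 1 / RADTODEG_NO_OUTER(a); }
```

For a = b = 1, sum_ok yields about 114.59 while sum_bad yields about 58.30; for a = 2, inv_ok yields about 0.0087 while inv_bad yields about 28.65.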

Multiple evaluation of side effects

Another example of a function-like macro is:

#define MIN(a,b) ((a)>(b)?(b):(a))

This illustrates one of the dangers of using function-like macros. One of the arguments, a or b, will be evaluated twice when this "function" is called. So, if the expression MIN(++firstnum,secondnum) is evaluated, then firstnum may be incremented twice, not once as would be expected.
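
The double evaluation can be made visible with a small sketch. The helper min_increments is hypothetical; it returns the final value of firstnum and stores the macro's result through the pointer.

```c
#define MIN(a,b) ((a)>(b)?(b):(a))

static int min_increments(int *result)
{
    int firstnum = 1, secondnum = 10;

    /* Expands to ((++firstnum)>(secondnum)?(secondnum):(++firstnum)).
       The comparison increments firstnum to 2; since 2 > 10 is false,
       the last operand is evaluated and increments it again to 3. */
    *result = MIN(++firstnum, secondnum);
    return firstnum;
}
```

After the call, both firstnum and the stored result are 3, although a single increment would have given 2.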

A safer way to achieve the same, shown here for the analogous max macro, is to use a typeof construct:

#define max(a,b) \
       ({ typeof (a) _a = (a); \
           typeof (b) _b = (b); \
         _a > _b ? _a : _b; })

This causes each argument to be evaluated only once, and the macro remains independent of the argument types. The construct is not legal ANSI C: both the typeof keyword and the placing of a compound statement within parentheses are non-standard extensions implemented in the popular GNU C compiler (gcc). The same general problem can also be solved with a static inline function, which is standard in C99 (and available as an extension in gcc) and is typically as efficient as a #define. An inline function allows the compiler to check and coerce parameter types. In this particular example that appears to be a disadvantage, since the max macro as shown works equally well with different parameter types, but in general type checking is an advantage.

Within ANSI C, there's no reliable general solution to the issue of side-effects in macro arguments.
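
Where C99 is available, the inline-function alternative might look like the following sketch. The names min_int and increments_with_function are assumptions for illustration, and the function is fixed to the int type, unlike the macro.

```c
/* A type-specific replacement for MIN; each argument is evaluated
   exactly once, before the function body runs. */
static inline int min_int(int a, int b)
{
    return a < b ? a : b;
}

static int increments_with_function(int *result)
{
    int firstnum = 1, secondnum = 10;

    /* ++firstnum is evaluated once, so firstnum ends up as 2. */
    *result = min_int(++firstnum, secondnum);
    return firstnum;
}
```

Here firstnum is incremented only once, and the stored result is 2.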

Token concatenation

Token concatenation is one of the most subtle, and most easily abused, features of the C macro preprocessor. Two tokens can be 'glued' together using the ## preprocessor operator, so that two macro arguments (or an argument and fixed text) form a single token in the preprocessed code. This can be used to construct elaborate macros which act much like C++ templates (without many of their benefits).

For instance:

#define MYCASE(item,id) \
   case id: \
     item##_##id = id;\
   break 

  switch(x) {
      MYCASE(widget,23);
  }

The line MYCASE(widget,23); gets expanded here into case 23: widget_23 = 23; break;. (The semicolon following the invocation of MYCASE becomes the semicolon that completes the break statement.)
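
A compilable sketch of the same idea follows. The macros DECLARE_COUNTER and BUMP and the resulting variable counter_widget are hypothetical names chosen for this example.

```c
/* Pastes "counter_" and the argument into one identifier. */
#define DECLARE_COUNTER(name) static int counter_##name = 0
#define BUMP(name)            (++counter_##name)

/* Expands to: static int counter_widget = 0; */
DECLARE_COUNTER(widget);

static int bump_widget_twice(void)
{
    BUMP(widget);          /* (++counter_widget) */
    return BUMP(widget);   /* the counter is now 2 */
}
```

The first call to bump_widget_twice returns 2, showing that both invocations refer to the same pasted identifier.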

Semicolons

One stylistic note about the above macro is that the semicolon on the last line of the macro definition is omitted so that the macro looks 'natural' when written. It could be included in the macro definition, but then there would be lines in the code without semicolons at the end which would throw off the casual reader. Worse, the user could be tempted to include semicolons anyway; in most cases this would be harmless (an extra semicolon denotes an empty statement) but it would cause errors in control flow blocks:

#define PRETTY_PRINT(s) \
   printf ("Message: \"%s\"\n", s);

  if (n < 10)
    PRETTY_PRINT("n is less than 10");
  else
    PRETTY_PRINT("n is at least 10");

This expands to give two statements, the intended printf and an empty statement, in each branch of the if/else construct. The empty statement after the first branch ends the if statement, so the else no longer has an if to attach to, and the compiler gives an error message similar to the following (from gcc 4.1.1):

error: expected expression before ‘else’
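
For comparison, the sketch below leaves the semicolon out of the macro definition, so the semicolon written by the caller completes the statement. To keep the result checkable it formats into a buffer instead of printing; the names SET_MESSAGE, message and describe are illustrative.

```c
#include <stdio.h>

static char message[64];

/* No trailing semicolon in the definition: the caller supplies it. */
#define SET_MESSAGE(s) \
   snprintf(message, sizeof message, "Message: \"%s\"", (s))

static const char *describe(int n)
{
    if (n < 10)
        SET_MESSAGE("n is less than 10");
    else
        SET_MESSAGE("n is at least 10");
    return message;
}
```

Each branch now contains exactly one statement, so the else attaches to the if as intended.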

Multiple lines

The macro can be extended over as many lines as required using a backslash escape at the end of the line. The macro ends on the first line which does not end in a backslash.

Used judiciously, multi-line macros can reduce repetition in the source of a C program, which can help its readability and maintainability.

Multiple statements

Inconsistent use of multiple-statement macros can result in unintended behaviour. The code

#define CMDS \
   foo(); \
   bar()

  if (var == 13)
    CMDS;
  else
    return;

will expand to

  if (var == 13)
    foo();
  bar();
  else
    return;

which is a syntax error. The macro can be made safe by writing it as

#define CMDS \
  do { \
    foo(); \
    bar(); \
  } while (0)

If the do and while (0) are omitted, leaving only the braces, then the semicolon in the macro's invocation above becomes an empty statement after the block, causing a syntax error at the else.
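
The safe form can be exercised as in the following sketch, in which foo and bar are stand-ins that merely record that they were called; the counter calls and the function run are assumptions for illustration.

```c
static int calls;

static void foo(void) { calls += 1; }
static void bar(void) { calls += 10; }

/* The do { ... } while (0) wrapper makes the expansion one statement
   that is completed by the caller's semicolon. */
#define CMDS \
  do { \
    foo(); \
    bar(); \
  } while (0)

static int run(int var)
{
    calls = 0;
    if (var == 13)
        CMDS;
    else
        return -1;
    return calls;
}
```

With var equal to 13 both functions run and run returns 11; otherwise the else branch is taken and it returns -1.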

Quoting macro arguments

Although macro expansion does not occur within a quoted string, the text of a macro argument can be quoted and treated as a string literal by using the "#" operator (also known as the stringification operator). For example, with the macro

#define QUOTEME(x) #x

the code

printf("%s\n", QUOTEME(1+2));

will expand to

printf("%s\n", "1+2");

This capability can be used with automatic string concatenation to make debugging macros. For example, the macro in

#define dumpme(x, fmt) printf("%s:%d: %s=" fmt, __FILE__, __LINE__, #x, x)

int some_function(void) {
    int foo = 0;
    /* [a lot of complicated code goes here] */
    dumpme(foo, "%d");  /* prints file:line: foo=<value> */
    /* [more complicated code goes here] */
    return foo;
}

would print the name of an expression and its value, along with the file name and the line number.
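
One detail worth illustrating is that the argument of # is not macro-expanded before it is quoted. The sketch below assumes a hypothetical macro ANSWER to show this, together with the concatenation of adjacent string literals.

```c
#define QUOTEME(x) #x
#define ANSWER 42

/* The argument's spelling is quoted as written. */
static const char *expr_text = QUOTEME(1+2);

/* ANSWER is not expanded before quoting, so this is "ANSWER", not "42". */
static const char *name_text = QUOTEME(ANSWER);

/* Adjacent string literals are merged into one by the compiler. */
static const char *joined_text = "value of " QUOTEME(ANSWER);
```

Under these definitions the three strings are 1+2, ANSWER and value of ANSWER respectively.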

Variadic macros

Main article: Variadic macro

Macros that can take a varying number of arguments (variadic macros) are not allowed in C89, but were introduced by a number of compilers and standardised in C99. Variadic macros are particularly useful when writing wrappers to printf, for example when logging warnings and errors.
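
A minimal C99 sketch of such a wrapper is shown below. The macro LOG_WARN and the helper warn_disk are hypothetical; the macro assumes that its first variadic argument is a string literal format and that buf is an array, and it writes into the buffer rather than printing so that the result can be inspected.

```c
#include <stdio.h>

/* __VA_ARGS__ stands for everything passed after buf. The format
   literal is joined to the "warning: " prefix by string concatenation. */
#define LOG_WARN(buf, ...) \
   snprintf((buf), sizeof (buf), "warning: " __VA_ARGS__)

static const char *warn_disk(int percent)
{
    static char buf[64];

    LOG_WARN(buf, "disk %d%% full", percent);
    return buf;
}
```

Calling warn_disk(93) produces the text warning: disk 93% full.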

X-Macros

One little-known usage-pattern of the C preprocessor is known as "X-Macros". X-Macros are the practice of using the #include directive multiple times on the same source header file, each time in a different environment of defined macros.

File: commands.def

COMMAND(ADD, "Addition command")
COMMAND(SUB, "Subtraction command")
COMMAND(XOR, "Exclusive-or command")

A source file that includes it:

enum command_indices {
#define COMMAND(name, description)              COMMAND_##name ,
#include "commands.def"
#undef COMMAND
    COMMAND_COUNT /* The number of existing commands */
};

char *command_descriptions[] = {
#define COMMAND(name, description)              description ,
#include "commands.def"
#undef COMMAND
    NULL
};

result_t handler_ADD (state_t *state)
{
  /* code for ADD here */
}

result_t handler_SUB (state_t *state)
{
  /* code for SUB here */
}

result_t handler_XOR (state_t *state)
{
  /* code for XOR here */
}

typedef result_t (*command_handler_t)(state_t *);

command_handler_t command_handlers[] = {
#define COMMAND(name, description)              &handler_##name ,
#include "commands.def"
#undef COMMAND
    NULL
};

New commands may then be defined by changing the command list in the X-Macro header file (often named with a .def extension), and defining a new command handler of the proper name. The command descriptions list, handler list, and enumeration are updated automatically by the preprocessor. In many cases, however, full automation is not possible, as is the case with the definitions of the handler functions.
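
The same pattern can be sketched in a single file by keeping the list in a macro instead of a separate .def file. This is a variant of the technique, not the multi-file form described above; COLOR_LIST, AS_ENUM, AS_TEXT and the colour names are invented for the example.

```c
/* The list is written once; X is the macro applied to every entry. */
#define COLOR_LIST(X) \
    X(RED,   "red")   \
    X(GREEN, "green") \
    X(BLUE,  "blue")

/* First expansion: one enumerator per entry. */
enum color {
#define AS_ENUM(name, text) COLOR_##name,
    COLOR_LIST(AS_ENUM)
#undef AS_ENUM
    COLOR_COUNT /* The number of entries in the list */
};

/* Second expansion: one string per entry, in the same order. */
static const char *const color_names[] = {
#define AS_TEXT(name, text) text,
    COLOR_LIST(AS_TEXT)
#undef AS_TEXT
};
```

Adding an entry to COLOR_LIST updates both the enumeration and the name table, so color_names[COLOR_GREEN] is always the text green.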

User-defined compilation errors and warnings

The #error directive makes the compiler emit an error message and stop compilation.

#error "Gosh!"

This prints "Gosh!" in the compiler output and halts compilation at that point. This is useful for checking whether a given line is being compiled at all. It is also useful in a heavily parameterized body of code, to make sure a particular #define has been introduced from the makefile, e.g.:

#ifdef WINDOWS
    ... /* Windows specific code */
#elif defined(UNIX)
    ... /* Unix specific code */
#else
    #error "What's your operating system?"
#endif

Some implementations provide a non-standard #warning directive to print out a warning message in the compiler output, but not stop the compilation process. A typical use is to warn about the usage of some old code, which is now unfavored and only included for compatibility reasons, e.g.:

#warning "Do not use ABC, which is deprecated. Use XYZ instead."

Although the text following the #error or #warning directive does not have to be quoted, it's good practice to do so. Otherwise, there may be problems with apostrophes and other characters that the preprocessor tries to interpret.
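
#error can also validate the value of a configuration macro. In the following sketch BUFFER_SIZE is a hypothetical setting that would normally come from the build system; it is defined inline here only so that the fragment compiles on its own.

```c
/* Normally supplied by the build system; defined here for the sketch. */
#define BUFFER_SIZE 256

/* Stop compilation early if the setting is missing or too small. */
#ifndef BUFFER_SIZE
#error "BUFFER_SIZE must be defined"
#elif BUFFER_SIZE < 16
#error "BUFFER_SIZE must be at least 16"
#endif

static char buffer[BUFFER_SIZE];
```

With the value 256 neither #error is reached; changing the definition to a value below 16 would make compilation fail with the second message.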

Compiler-specific preprocessor features

The #pragma directive is a compiler-specific directive which compiler vendors may use for their own purposes. For instance, a #pragma is often used to suppress specific error messages or to manage heap and stack debugging.

C99 introduced a few standard #pragma directives, taking the form #pragma STDC …, which are used to control the floating-point implementation.

Standard positioning macros

Certain symbols are predefined in ANSI C. Two useful ones are __FILE__ and __LINE__, which expand into the current file and line number. For instance:

// debugging macros so we can pin down message provenance at a glance
#define WHERESTR "[file %s, line %d] "
#define WHEREARG __FILE__,__LINE__

printf(WHERESTR ": hey, x=%d\n", WHEREARG, x);

This prints the value of x, preceded by the file and line number, allowing quick access to which line the message was produced on. Note that the WHERESTR argument is concatenated with the string following it.

Compiler-specific predefined macros

Compiler-specific predefined macros are usually listed in the compiler documentation, although this is often incomplete. The Pre-defined C/C++ Compiler Macros project lists "various pre-defined compiler macros that can be used to identify standards, compilers, operating systems, hardware architectures, and even basic run-time libraries at compile-time".

Some compilers can be made to dump at least some of their useful predefined macros, for example:

  • GNU C Compiler: gcc -dM -E - < /dev/null
  • HP-UX ANSI C compiler: cc -v fred.c (where fred.c is a simple test file)
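
Such macros are typically tested with #if defined(...). The sketch below relies on the commonly documented macros __clang__, __GNUC__ and _MSC_VER; the names COMPILER_NAME and compiler_name are illustrative, and the list of compilers is not exhaustive.

```c
/* __clang__ is tested first because Clang also defines __GNUC__. */
#if defined(__clang__)
#define COMPILER_NAME "clang"
#elif defined(__GNUC__)
#define COMPILER_NAME "gcc"
#elif defined(_MSC_VER)
#define COMPILER_NAME "msvc"
#else
#define COMPILER_NAME "unknown"
#endif

static const char *compiler_name(void)
{
    return COMPILER_NAME;
}
```

Which branch is selected depends entirely on the compiler used to build the file.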

As a general-purpose preprocessor

Since the C preprocessor can be invoked independently to process files other than those containing to-be-compiled source code, it can also be used as a "general purpose preprocessor" for other types of text processing. One particularly notable example is the now-deprecated imake system; more examples are listed at General purpose preprocessor.
