Undefined behavior

In computer programming, undefined behavior is a feature of some programming languages—most famously C.[1] In these languages, to simplify the specification and allow some flexibility in implementation, the specification leaves the results of certain operations specifically undefined.

For example, in C, reading an automatic variable before it has been initialized yields undefined behavior, as do integer division by zero and indexing an array outside of its defined bounds (see buffer overflow). This frees the compiler to do whatever is easiest or most efficient when such a program is submitted. Once undefined behavior has occurred, the behavior of the rest of the program is undefined as well. In particular, the compiler is never required to diagnose undefined behavior; programs invoking it may therefore appear to compile and even run without errors at first, only to fail on another system, or even on another date. As far as the language specification is concerned, anything could happen when undefined behavior occurs, including nothing at all.
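
Because the compiler is not required to diagnose an out-of-bounds access, the check has to be written by the programmer. The following is a minimal sketch of such a check, not a standard facility; the helper name get_checked and its signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of the C library): copies arr[index]
   into *out only when index is within bounds. On failure it returns
   false and leaves *out untouched. */
bool get_checked(const int *arr, size_t len, size_t index, int *out)
{
    assert(arr != NULL && out != NULL);  /* precondition of this sketch */
    if (index >= len) {
        return false;  /* arr[index] would be out of bounds, so refuse */
    }
    *out = arr[index];
    return true;
}
```

A caller would initialize the destination variable before the call (for example, int value = 0;) so that it never holds an indeterminate value, even when the lookup fails.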

Under some circumstances there can be specific restrictions on undefined behavior. For example, the instruction set specifications of a CPU might leave the behavior of some forms of an instruction undefined. If the CPU supports memory protection, however, the specification will probably include a blanket rule stating that no user-accessible instruction may cause a hole in the operating system's security. An actual CPU would then be permitted to corrupt any or all user registers in response to such an instruction, but would not be allowed to, for example, switch into supervisor mode.

Examples in C and C++

Attempting to modify a string literal causes undefined behavior:[2]

char * p = "wikipedia"; // ill-formed C++11, deprecated C++98/C++03
p[0] = 'W'; // undefined behavior

One way to prevent this is to define p as an array instead of a pointer, so that the characters of the literal are copied into modifiable storage:

char p[] = "wikipedia"; /* RIGHT */
p[0] = 'W';

In C++, one can use std::string from the standard library instead:

std::string s = "wikipedia"; /* RIGHT */
s[0] = 'W';

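Integer division by zero results in undefined behavior. (On implementations that follow IEEE 754, dividing a nonzero float, double or long double by zero instead yields an infinity, and dividing zero by zero yields NaN.)[3]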

return x/0; // undefined behavior
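
One way to keep an integer division well defined is to test the operands first. The sketch below is illustrative only; the helper name checked_div is an assumption, and it also rejects INT_MIN / -1, whose mathematical result does not fit in an int and is therefore undefined as well.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores num / den in *quotient and returns true,
   or returns false without dividing when the division would be undefined. */
bool checked_div(int num, int den, int *quotient)
{
    assert(quotient != NULL);  /* precondition of this sketch */
    if (den == 0) {
        return false;  /* division by zero */
    }
    if (num == INT_MIN && den == -1) {
        return false;  /* result would overflow int */
    }
    *quotient = num / den;
    return true;
}
```

The caller decides what a failed division means for the program, for instance by reporting an error, instead of leaving the outcome to the implementation.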

Certain pointer operations may result in undefined behavior:[4]

int arr[4] = {0, 1, 2, 3};
int* p = arr + 5;  // undefined behavior
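
By contrast, a pointer to the position one past the last element (arr + 4 for the array above) may be formed and compared, as long as it is not dereferenced. A minimal sketch of that idiom follows; the function name sum_ints is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums len elements using a one-past-the-end
   pointer as the loop bound. */
int sum_ints(const int *arr, size_t len)
{
    assert(arr != NULL);  /* precondition of this sketch */
    const int *end = arr + len;  /* valid to form and compare, never dereferenced */
    int total = 0;
    for (const int *p = arr; p != end; ++p) {
        total += *p;  /* p always points at a real element here */
    }
    return total;
}
```

Called as sum_ints(arr, 4) on the array above, the loop stops at arr + 4 and never forms arr + 5.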

Reaching the end of a value-returning function (other than main()) without a return statement results in undefined behavior in C++; in C, the behavior is undefined only if the caller uses the function's return value:

int f()
{
}  /* undefined behavior */

Section 2.12 of The C Programming Language by Kernighan and Ritchie cites the following examples of code that have undefined behavior:

printf("%d %d\n", ++n, power(2, n));    /* WRONG */

and

a[i] = i++;
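
Both examples read and modify the same variable without anything fixing the order of the two operations. The usual remedy is to split the expression into separate statements. The sketch below applies that to the second example; the function name store_index_then_advance is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a well-defined replacement for a[i] = i++;
   It stores the old index at a[i] and returns the advanced index. */
int store_index_then_advance(int *a, int i)
{
    assert(a != NULL && i >= 0);  /* precondition of this sketch */
    a[i] = i;      /* the store is complete at the end of this statement */
    return i + 1;  /* the index is advanced only afterwards */
}
```

The first example can be treated the same way by incrementing n in its own statement before the call to printf.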

Risks of undefined behavior

HTML versions 4 and earlier left error handling undefined. Over time, pages started relying on the unspecified error recovery implemented in popular browsers. This caused difficulties for vendors of less popular browsers, who were forced to reverse-engineer and implement bug-compatible error recovery. The result was a de facto standard that is much more complicated than it would have been had this behavior been specified from the start.

Compiler easter eggs

In some languages (including C), even the compiler is not bound to behave in a sensible manner once undefined behavior has been invoked. A related Easter egg is the behavior of early versions of the GCC C compiler when given a program containing the #pragma directive, which has implementation-defined behavior according to the C standard. ("Implementation-defined" is more restrictive than "undefined": it requires the implementation to document what it does.) In practice, many C implementations recognize, for example, #pragma once as a rough equivalent of #include guards. GCC 1.21, however, upon finding a #pragma directive, would instead attempt to launch commonly distributed Unix games such as NetHack and Rogue, or start Emacs running a simulation of the Towers of Hanoi.[5]

References

  1. ^ Lattner, Chris (May 13, 2011). "What Every C Programmer Should Know About Undefined Behavior". LLVM Project Blog. LLVM.org. http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html. Retrieved May 24, 2011. 
  2. ^ ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §2.13.4 String literals [lex.string] para. 2
  3. ^ ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §5.6 Multiplicative operators [expr.mul] para. 4
  4. ^ ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §5.7 Additive operators [expr.add] para. 5
  5. ^ "A Pragmatic Decision" quotes the March 1988 issue of UNIX Review magazine, which referred to GCC version 1.17 but got the order wrong. "Everything2: #pragma" gives the correct order. The actual code is in file "cccp.c" in the GCC 1.21 distribution: http://www.oldlinux.org/Linux.old/gnu/gcc-1/

External links