Inline expansion
In computing, inline expansion, or inlining, is a compiler optimization that "expands" a function call site into the body of the function being called. The intent of performing this optimization is to improve runtime performance, at the possible cost of increasing the size of the final program.
While inlining removes the cost of the function call and return instructions, these are often small savings. The major savings often come from the additional optimizations that become possible on the inlined function body (e.g., a constant passed as an argument can often be propagated to all uses of the matching parameter). Inlining often, but not always, increases the size of the generated code. This expansion occasionally reduces runtime performance, for instance because of loss of locality of reference.
Ordinarily, when a function is invoked, control is transferred to its definition by a branch or call instruction; with inlining, the function is "spliced in" to the caller.
Some languages (e.g., C and C++) support an inline keyword which, when it appears in a function definition, acts as a hint to the compiler that the function should be inlined. Compilers use a variety of mechanisms, including such hints from developers, to decide which function calls should be inlined.
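For example, a C99 function can be marked inline (a minimal sketch; the function names here are illustrative, and the compiler remains free to ignore the hint):

/* "static inline" is the portable C99 idiom: the definition is local
   to this translation unit, and the keyword merely suggests inlining. */
static inline int square(int x)
{
    return x * x;
}

int sum_of_squares(int a, int b)
{
    /* Both calls are candidates for inline expansion. */
    return square(a) + square(b);
}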
In the context of functional programming languages, inline expansion is often referred to as beta reduction, a term used in the lambda calculus, the formal language underlying these languages.
Inlining might be performed manually, as a one-time operation on the source code, using copy-and-paste programming. However, other methods of controlling inlining (see below) are preferable, because they avoid the situation where a (possibly modified) copy-pasted version of a function is overlooked when fixing a bug in the original.
Implementation
Once the compiler has decided to inline a particular function, it is usually a simple matter to do so. Depending on whether one wants cross-language inline functions, the inlining can be done with either a high-level intermediate representation, like abstract syntax trees, or a low-level intermediate representation. In either case, one simply computes the arguments, stores them in variables corresponding to the function's arguments, and then inserts the body of the function at the call site.
Function inlining can also be performed at link time, which enables inlining of functions whose source is not available, such as library functions (see link-time optimization), and at run time, which enables the use of dynamic profiling information to make better decisions about which functions to inline, as in the Java HotSpot compiler.
Here's a simple example of inline expansion performed "by hand" at the source level in the C programming language:
int pred(int x)
{
    if (x == 0)
        return 0;
    else
        return x - 1;
}
Before inlining:
int f(int y)
{
    return pred(y) + pred(0) + pred(y + 1);
}
After inlining:
int f(int y)
{
    int temp = 0;
    if (y == 0)
        temp += 0;
    else
        temp += y - 1;
    if (0 == 0)
        temp += 0;
    else
        temp += 0 - 1;
    if (y + 1 == 0)
        temp += 0;
    else
        temp += (y + 1) - 1;
    return temp;
}
Note that this is only an example; in an actual C application, it would be preferable to use an inlining language feature such as parameterized macros or inline functions to tell the compiler to perform this transformation. Below, we discuss some of the optimizations that can be performed on this code to simplify it.
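As an illustration of a parameterized macro, pred() above could be written so that the preprocessor performs the substitution (a sketch; the PRED name is hypothetical):

/* The preprocessor pastes this text at each use, so expansion happens
   before the compiler runs. Because (x) appears twice, an argument
   with side effects, e.g. PRED(i++), would be evaluated twice, a
   classic macro pitfall that true inline functions avoid. */
#define PRED(x) ((x) == 0 ? 0 : (x) - 1)

int f(int y)
{
    return PRED(y) + PRED(0) + PRED(y + 1);
}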
Benefits
Inline expansion itself is an optimization, since it eliminates call overhead, but it is much more important as an enabling transformation. That is, once the body of the function is expanded in the context of its call site, often with arguments that may be fixed constants, the code is opened to a variety of new optimizations that were not possible before. For example, a branch using an argument may turn out to be always true or always false in this one case, allowing dead code elimination, loop-invariant statements may be moved outside a loop, or a variable may become a candidate for induction variable elimination.
In our C example, we see that optimization opportunities abound. We can reduce it in the following steps:
- The temp += 0 statements do nothing. We can remove them.
- The condition 0 == 0 is always true, so we can keep just the true branch (which does nothing).
- The condition y + 1 == 0 is equivalent to y == -1.
- The expression (y + 1) - 1 reduces to simply y (assuming wraparound overflow semantics).
- The expressions y and y + 1 cannot both equal zero. This leaves three cases we can explicitly consider.
Our new function looks like:
int f(int y)
{
    if (y == 0)
        return y;         /* or return 0 */
    else if (y == -1)
        return y - 1;     /* or return -2 */
    else
        return y + y - 1;
}
Problems
Replacing a call site with an expanded function body can present several problems that may make this "optimization" actually hurt performance:
- In applications where code size is more important than speed, such as many embedded systems, inlining is usually disadvantageous except for very small functions.
- The increase in code size may cause a small, critical section of code to no longer fit in the cache, causing cache misses and slowdown.
- The added variables from the inlined procedure may consume additional registers, and in an area where register pressure is already high this may force spilling, which causes additional RAM accesses.
- A language specification may allow a program to make additional assumptions about arguments to procedures which it can no longer make after the procedure is inlined.
- If code size is increased too much, resource constraints such as RAM size may be exceeded, leading to programs that either cannot run or that cause thrashing. Today, this is unlikely to be an issue with desktop or server computers except at extreme levels of inlining, but it can still be an issue for embedded systems.
Typically, a compiler is aware of these issues and strives to choose which functions to inline so that performance is improved in most cases rather than degraded.
Furthermore, it is not always possible to inline a subroutine. Consider a subroutine that calls itself recursively until a particular piece of input data is received from a peripheral. Because the compiler is not omniscient, it cannot know in general when this process will end, so it would never finish inlining if it were designed to inline every single subroutine invocation. Thus, compilers for languages that support recursion must restrict what they will automatically choose to inline, to avoid getting stuck in that kind of infinite inlining loop.
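A contrived C sketch makes the problem concrete (read_byte() is a hypothetical routine standing in for a peripheral read):

/* wait_for_sentinel() calls itself until a sentinel byte arrives.
   The number of recursive calls depends on runtime input, so the
   compiler cannot expand them all; at best it can inline a fixed
   number of levels and leave a residual call. */
extern int read_byte(void);          /* assumed device-input routine */

int wait_for_sentinel(void)
{
    int b = read_byte();
    if (b == 0xFF)                   /* sentinel received; stop */
        return b;
    return wait_for_sentinel();      /* depth unknown until runtime */
}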
Selection methods and language support
Many compilers aggressively inline functions wherever it is beneficial to do so. Although this can lead to larger executables, aggressive inlining has become more and more desirable as growth in memory capacity has outpaced growth in CPU speed. Automatic inlining is a critical optimization in functional and object-oriented programming languages, which rely on it to give enough context to their typically small functions for classical optimizations to be effective.
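A small accessor of the kind object-oriented code encourages illustrates why (a C sketch; the point_x name is illustrative):

/* In isolation, point_x() hides the field access behind a call. Once
   inlined, the caller's optimizer can see the loads directly and apply
   classical optimizations such as common-subexpression elimination. */
struct point { int x; int y; };

static inline int point_x(const struct point *p)
{
    return p->x;
}

int doubled_x(struct point p)
{
    /* After inlining, this reduces to p.x + p.x. */
    return point_x(&p) + point_x(&p);
}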
External links
- "Eliminating Virtual Function Calls in C++ Programs" by Gerald Aigner and Urs Hölzle
- "Reducing Indirect Function Call Overhead In C++ Programs" by Brad Calder and Dirk Grumwald
- ALTO - A Link-Time Optimizer for the DEC Alpha
- Article "Advanced techniques" by John R. Levine
- Article "Inlining Semantics for Subroutines which are Recursive" by Henry G. Baker
- Article "Whole Program Optimization with Visual C++ .NET" by Brandon Bray