Loop splitting

Loop splitting is a compiler optimization technique. It attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range.

Loop peeling

Loop peeling is a special case of loop splitting which splits any problematic first (or last) few iterations from the loop and performs them outside of the loop body.

Suppose a loop was written like this:

 int p = 10;
 for (int i=0; i<10; ++i)
 {
   y[i] = x[i] + x[p];
   p = i;
 }

Notice that p = 10 only for the first iteration, and for all other iterations, p = i - 1. A compiler can take advantage of this by unwinding (or "peeling") the first iteration from the loop.

After peeling the first iteration, the code would look like this:

 y[0] = x[0] + x[10];
 for (int i=1; i<10; ++i)
 {
   y[i] = x[i] + x[i-1];
 }

This equivalent form eliminates the need for the variable p inside the loop body.

Loop peeling was introduced in gcc in version 3.4.

Brief history of the term

Apparently the term was for the first time used by Cannings, Thompson and Skolnick [1] in their 1976 paper on computational models for (human) inheritance. There the term was used to denote a method for collapsing phenotypic information onto parents. From there the term was used again in their papers, including their seminal paper on probability functions on complex pedigrees.[2]

In compiler technology, the term first turned up in late 1980s papers on VLIW and superscalar compilation, including [3] and.[4]

Further reading

  1. Cannings, C.; Thompson, EA; Skolnick, HH (1976). "The recursive derivation of likelihoods on complex pedigrees". Advances in Applied Probability 8 (4): 622–625. doi:10.2307/1425918.
  2. Cannings, C.; Thompson, EA; Skolnick, HH (1978). "Probability functions on complex pedigrees". Advances in Applied Probability 10 (1): 26–61. doi:10.2307/1426718.
  3. Callahan, D; Kennedy, K (1988). "Compiling Programs for Distributed-memory Multiprocessors". The Journal of Supercomputing 2 (2): 151–169. doi:10.1007/BF00128175.
  4. Mahlke, SA; Lin, DC; Chen, WY; Hank, RE; Bringman, RA (1992). Effective compiler support for predicated execution using the hyperblock. 25th Annual International Symposium on Microarchitecture. pp. 45––54.