Explicitly Parallel Instruction Computing

From Wikipedia, the free encyclopedia

Explicitly Parallel Instruction Computing (EPIC) is a computing paradigm that researchers began to investigate in the 1990s. It is also called independence architecture. Intel and HP used it as the basis for Intel’s IA-64 architecture, which is implemented in Intel’s Itanium and Itanium 2 lines of server processors. The goal of EPIC was to increase the ability of microprocessors to execute software instructions in parallel by having the compiler, rather than complex on-die circuitry, identify and exploit opportunities for parallel execution. This was intended to let performance scale more rapidly in future processor designs without resorting to ever-higher clock frequencies, which have since become problematic because of the associated power and cooling issues.

Roots in VLIW

Out-of-order execution and speculative execution have been used successfully for many years to increase the parallel execution of software code in mainstream microprocessors. However, because these approaches were becoming increasingly complex to scale, the processor industry in the mid-1990s started to re-examine instruction sets that explicitly encode multiple operations per instruction. The basis for this research is VLIW (very long instruction word), in which multiple operations are encoded in every instruction and then processed by multiple execution units.

One goal of this strategy is to move the complexity of instruction scheduling from the CPU hardware to the software compiler, which can schedule instructions statically (with the help of trace feedback information). This eliminates the need for complex scheduling circuitry in the CPU, which frees up space and power for other functions, including additional execution resources. An equally important goal is to further exploit instruction-level parallelism (ILP) by using the compiler to find and exploit additional opportunities for parallel execution.
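The idea of static scheduling can be illustrated with a minimal sketch of a greedy list scheduler that packs operations into long instruction words. The operation names, the `schedule` function, and the assumption that every operation completes in one cycle are all illustrative and not taken from any real compiler.

```python
def schedule(ops, deps, width):
    """Greedily pack ops into long instruction words of `width` slots.

    ops   -- operation names in program order (hypothetical names)
    deps  -- dict mapping an op to the set of ops whose results it needs
    width -- number of execution units, i.e. slots per instruction word
    Assumes every operation has a latency of one cycle.
    """
    done = set()              # ops issued in an earlier word
    remaining = list(ops)
    words = []
    while remaining:
        # An op is ready once all of its producers were issued earlier.
        ready = [op for op in remaining if deps.get(op, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependency")
        word = ready[:width]  # fill at most `width` slots
        words.append(word)
        done.update(word)
        remaining = [op for op in remaining if op not in word]
    return words


ops = ["load_a", "load_b", "add", "load_c", "mul", "store"]
deps = {"add": {"load_a", "load_b"}, "mul": {"add", "load_c"}, "store": {"mul"}}

for word in schedule(ops, deps, width=3):
    print(word)
```

All of the dependency analysis happens before the program runs, so the hardware in this model only has to issue each word as a unit.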

VLIW (at least in its original forms) has several shortcomings that kept it from becoming mainstream:

  • VLIW instruction sets are not backward compatible between implementations. When a wider implementation (one with more execution units) is built, its instruction set is not backward compatible with that of older, narrower implementations.
  • Load responses from a memory hierarchy that includes CPU caches and DRAM do not return to the processor after a deterministic delay. This makes static scheduling of load instructions by the compiler very difficult.

Moving Beyond VLIW

EPIC architectures add several features to get around the deficiencies of VLIW:

  • Each group of multiple software instructions is called a bundle. Each bundle carries information indicating whether the subsequent bundle depends on its operations. With this capability, future implementations can be built to issue multiple bundles in parallel. The dependency information is calculated by the compiler, so the hardware does not have to perform operand dependency checking.
  • A speculative load instruction is used as a type of data prefetch. This prefetch increases the chances of a primary cache hit for normal loads.
  • A check load instruction complements speculative loads by verifying that a speculatively executed load did not depend on an earlier store.
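The bundle dependency information described in the first item can be sketched as follows. This is a simplified model under stated assumptions: each bundle is reduced to a pair of register sets (reads, writes), only read-after-write and write-after-write hazards are tracked, and the names `mark_stops` and `issue_groups` are hypothetical rather than part of any real toolchain.

```python
def mark_stops(bundles):
    """Compiler side: flag bundle i when bundle i+1 must not issue with it.

    bundles -- list of (reads, writes) pairs of register-name sets.
    A flag is set when the next bundle touches a register written since
    the last flagged bundle.
    """
    stops = [False] * len(bundles)
    written = set()               # registers written since the last stop
    for i, (reads, writes) in enumerate(bundles):
        if i > 0 and written & (reads | writes):
            stops[i - 1] = True   # dependency found: end the group here
            written = set()
        written |= writes
    return stops


def issue_groups(stops, max_bundles):
    """Hardware side: issue consecutive bundles together up to a flag.

    max_bundles -- how many bundles this implementation can issue at once.
    No operand comparison is needed; only the precomputed flags are read.
    """
    groups, current = [], []
    for i, stop in enumerate(stops):
        current.append(i)
        if stop or len(current) == max_bundles:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


bundles = [
    ({"r1"}, {"r2"}),
    ({"r3"}, {"r4"}),
    ({"r2"}, {"r5"}),   # reads r2, which the first bundle writes
    ({"r6"}, {"r7"}),
]
stops = mark_stops(bundles)
print(issue_groups(stops, max_bundles=1))  # narrow implementation
print(issue_groups(stops, max_bundles=2))  # wider implementation
```

The same flagged bundle stream is consumed by both the narrow and the wide model, which is the property that lets one binary run on implementations of different widths.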

The EPIC architecture also includes a grab-bag of architectural concepts to increase ILP:

  • Predicated execution is used to decrease the occurrence of branches and to increase the speculative execution of instructions. With this feature, branch conditions are converted to predicate registers, which are used to discard the results of executed instructions from the side of the branch that is not taken.
  • Delayed exceptions (using a Not-A-Thing bit within the general-purpose registers) also allow more speculative execution past possible exceptions.
  • Very large architectural register files avoid the need for register renaming.
  • Multi-way branch instructions
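Predicated execution, the first item above, can be illustrated with a small interpreter for a made-up instruction format. The tuple layout, the `cmp.gt` and `sub` mnemonics, and the always-true predicate `p0` are assumptions chosen for this sketch and do not reproduce the encoding of a real instruction set.

```python
import operator

ALU = {"add": operator.add, "sub": operator.sub}


def run_predicated(program, regs):
    """Execute every instruction; commit a result only if its guard is true.

    program -- list of (guard, op, dst, src1, src2) tuples (hypothetical format)
    regs    -- dict of initial register values
    """
    regs = dict(regs)
    preds = {"p0": True}   # guard used for unconditional instructions
    for guard, op, dst, src1, src2 in program:
        if op == "cmp.gt":
            # dst names two predicates: (condition true, condition false)
            taken = regs[src1] > regs[src2]
            if preds[guard]:
                preds[dst[0]], preds[dst[1]] = taken, not taken
            continue
        result = ALU[op](regs[src1], regs[src2])  # computed on both paths
        if preds[guard]:                          # discarded if guard is false
            regs[dst] = result
    return regs


# Branch-free version of: r = a - b if a > b else b - a
ABS_DIFF = [
    ("p0", "cmp.gt", ("p1", "p2"), "a", "b"),
    ("p1", "sub", "r", "a", "b"),
    ("p2", "sub", "r", "b", "a"),
]

print(run_predicated(ABS_DIFF, {"a": 7, "b": 3})["r"])
print(run_predicated(ABS_DIFF, {"a": 3, "b": 7})["r"])
```

Both subtractions are executed on every run, and the predicate alone decides which result reaches register `r`, so the instruction stream contains no branch.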

The IA-64 architecture also added register rotation, a concept from digital signal processing that supports software pipelining of loops without unrolling them and renaming registers by hand.
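A minimal sketch of how register rotation supports a software-pipelined loop is shown below. The `RotatingRegs` class, the two-stage pipeline, and the `scaled_copy` function are hypothetical; they model the renaming effect only and not the actual IA-64 register file.

```python
class RotatingRegs:
    """Register file whose logical names shift by one on every rotation."""

    def __init__(self, size):
        self.phys = [None] * size
        self.base = 0

    def _slot(self, n):
        return (self.base + n) % len(self.phys)

    def __getitem__(self, n):
        return self.phys[self._slot(n)]

    def __setitem__(self, n, value):
        self.phys[self._slot(n)] = value

    def rotate(self):
        # After rotation, the value that was visible as r[n] is visible as r[n + 1].
        self.base = (self.base - 1) % len(self.phys)


def scaled_copy(xs, factor):
    """Two-stage pipeline: stage 1 loads x[i], stage 2 scales x[i - 1]."""
    r = RotatingRegs(2)
    out = []
    # One extra trip drains the last loaded value from the pipeline.
    for i in range(len(xs) + 1):
        if i >= 1:            # stage 2: consume the value loaded on the previous trip
            out.append(r[1] * factor)
        if i < len(xs):       # stage 1: load the current element
            r[0] = xs[i]
        r.rotate()            # rename registers instead of copying values
    return out


print(scaled_copy([1, 2, 3], 2))
```

The loop body always writes `r[0]` and reads `r[1]`, so the same code serves every iteration; without rotation, the compiler would have to unroll the loop and give each in-flight value its own register name.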

Ongoing Research and Development

  • The IMPACT project at the University of Illinois at Urbana-Champaign, led by Wen-mei Hwu, has been the source of much influential research on this topic.
  • The PlayDoh architecture from HP Labs is another major research project.
  • Gelato.org is an open source development community in which academic and commercial researchers are working to develop more effective compilers for Linux applications running on Itanium servers.


See also