Power Processing Element

The 90 nm Cell BE processor. The PPE is the upper fourth of the processor.
Produced: 2005 to present
Marketed by: IBM, Sony, Microsoft
Designed by: IBM
Max. CPU clock rate: 2.8 GHz to 3.2 GHz
Min. feature size: 90 nm to 45 nm
Instruction set: Power Architecture
Microarchitecture: PPU
Cores: 1
L1 cache: 32 KB instruction + 32 KB data
GPU: Xenos (XCGPU variant only)
Application: Game consoles, HPC
Variants: Cell BE, XCPU, XCGPU, PowerXCell 8i

The Power Processing Unit (PPU) is a 64-bit, dual-threaded, in-order Power Architecture microprocessor core designed by IBM, primarily for the PlayStation 3 and Xbox 360 game consoles. It has also found use in high-performance computing, in supercomputers such as the record-setting IBM Roadrunner.

In most implementations, the PPU is paired with a 512 KB L2 cache to form the Power Processing Element (PPE).

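The PPU is used as a main CPU core in three different processor designs: the Cell Broadband Engine (Cell BE), the XCPU (also known as Xenon) of the Xbox 360, and the PowerXCell 8i. The XCGPU variant combines the XCPU and the Xenos GPU on a single chip.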

Main features

Functional units

In-order execution

The PPU is an in-order processor, but it has some unusual features that let it obtain some of the benefits of out-of-order execution without expensive reordering hardware. On an L1 cache miss, it can continue executing past the miss, stalling only when an instruction actually depends on the missing load. It can send up to eight load instructions to the L2 cache out of order. It also has an instruction delay pipe, a side path that lets it execute instructions that would normally cause pipeline stalls without holding up the rest of the pipeline.
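The difference between stalling on a miss and stalling only on use can be illustrated with a toy timing model. The sketch below is hypothetical and heavily simplified, not IBM's actual pipeline: instructions issue in program order, at most one per cycle, and a cache-missing load is given an assumed latency of 20 cycles. In the "blocking" model, each instruction waits for the previous one to complete; in the "stall-on-use" model, it waits only for the operands it actually reads. The eight-outstanding-load limit and the delay pipe are not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Instr:
    dest: Optional[str]            # register written, or None
    srcs: Tuple[str, ...] = ()     # registers read
    latency: int = 1               # cycles until the result is ready (large for a cache miss)


def run(program: Sequence[Instr], stall_on_use: bool) -> int:
    """Return the cycle at which every result in `program` is available."""
    ready = {}   # register name -> cycle its value becomes available
    cycle = 0    # earliest cycle the next instruction may issue (program order)
    finish = 0
    for ins in program:
        # An instruction can never start before its source operands are ready.
        start = max([cycle] + [ready.get(r, 0) for r in ins.srcs])
        done = start + ins.latency
        if ins.dest is not None:
            ready[ins.dest] = done
        finish = max(finish, done)
        # Stall-on-use: the next instruction may issue on the following cycle.
        # Blocking: the next instruction waits until this one has completed.
        cycle = start + 1 if stall_on_use else done
    return finish


MISS = 20  # assumed L1-miss latency in cycles (illustrative only)

# One missing load, five independent instructions, then a consumer of the load.
program = [Instr("r1", latency=MISS)]
program += [Instr(f"r{i}") for i in range(2, 7)]
program.append(Instr("r7", srcs=("r1",)))

print("blocking:    ", run(program, stall_on_use=False))
print("stall-on-use:", run(program, stall_on_use=True))
```

Under these assumptions the blocking model finishes at cycle 26 and the stall-on-use model at cycle 21: the five independent instructions execute in the shadow of the miss, and only the instruction that reads `r1` waits for it.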

Multithreading

The PPU runs two hardware threads simultaneously. The main registers used for code execution are duplicated for each thread, as are the exception- and interrupt-handling registers and several essential arrays and queues. The two threads can raise exceptions simultaneously, and each performs branch prediction using its own branch history. The execution units and caches are shared rather than duplicated, however, so the PPU remains a single-core design.[1]
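The split between duplicated per-thread state and a shared execution engine can be pictured with a toy model. In the hypothetical sketch below, each `HwThread` object stands in for the duplicated state (reduced to a program counter and a "busy until" cycle), while a single issue slot per cycle stands in for the shared pipeline. The round-robin policy that rotates between ready threads is an assumption of this sketch, not something stated above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HwThread:
    """Per-thread state, which the PPU duplicates for each hardware thread.

    Reduced here to a program counter and a 'busy until' cycle; the hardware
    also duplicates registers, exception state and branch history.
    """
    name: str
    program: List[int]   # cycles each instruction blocks this thread (1 = normal, >1 = e.g. a cache miss)
    pc: int = 0
    busy_until: int = 0

    def finished(self) -> bool:
        return self.pc >= len(self.program)


def run_shared_pipeline(threads: List[HwThread]) -> List[str]:
    """Model one shared issue slot: at most one instruction per cycle,
    rotating between threads and skipping any thread that is stalled."""
    trace: List[str] = []
    cycle = 0
    last = -1  # index of the thread that issued most recently
    while not all(t.finished() for t in threads):
        issued = "idle"
        for step in range(1, len(threads) + 1):
            i = (last + step) % len(threads)
            t = threads[i]
            if not t.finished() and t.busy_until <= cycle:
                t.busy_until = cycle + t.program[t.pc]
                t.pc += 1
                issued, last = t.name, i
                break
        trace.append(issued)
        cycle += 1
    return trace


a = HwThread("A", [1, 5, 1])     # A's second instruction stalls it for 5 cycles
b = HwThread("B", [1, 1, 1, 1])
print(run_shared_pipeline([a, b]))
```

In this example the trace is A, B, A, B, B, B, idle, A: while thread A is stalled, thread B uses the shared issue slot on consecutive cycles, and the slot goes idle only when neither thread can issue.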

Floating-point capacity

Together, its 64-bit floating-point unit and its 128-bit VMX unit (which implements the AltiVec instruction set) can perform a theoretical 12 floating-point operations per cycle, because all Power Architecture floating-point units support multiply-add instructions and are at least 64 bits wide. At 3.2 GHz, that gives 3.2 billion cycles per second × 12 = 38.4 billion floating-point operations per second (38.4 GFLOPS).
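As a check on the arithmetic, the sketch below reproduces the figure of 12 operations per cycle under one assumed breakdown: each unit is counted as a number of 32-bit lanes (two for the 64-bit FPU, four for the 128-bit VMX unit), and each lane performs one multiply-add per cycle, counted as two operations. The text does not spell out this breakdown, so treat it as an illustration of how the figure can arise rather than a statement about the hardware.

```python
# Assumed breakdown (not stated explicitly in the text): each unit is counted as
# (width / 32) lanes, and every lane does one multiply-add (2 operations) per cycle.
LANE_BITS = 32
OPS_PER_FMA = 2  # a multiply-add counts as a multiply plus an add

UNIT_WIDTHS_BITS = {"FPU": 64, "VMX": 128}


def flops_per_cycle() -> int:
    """Theoretical floating-point operations per cycle across both units."""
    return sum((width // LANE_BITS) * OPS_PER_FMA for width in UNIT_WIDTHS_BITS.values())


def peak_flops(clock_hz: int) -> int:
    """Theoretical peak floating-point operations per second at a given clock."""
    return clock_hz * flops_per_cycle()


print(flops_per_cycle())                  # 12
print(peak_flops(3_200_000_000) / 1e9)    # 38.4 (GFLOPS at 3.2 GHz)
print(peak_flops(2_800_000_000) / 1e9)    # 33.6 (GFLOPS at 2.8 GHz)
```

The 2.8 GHz line applies the same per-cycle figure to the lower end of the clock range given in the infobox.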

In the PowerXCell 8i processor, the PPU is enhanced to perform double-precision floating-point operations in a single cycle, a change tailored to high-performance computing in supercomputers.

In the Xbox 360's XCPU, the VMX unit is enhanced with 128 vector registers (standard AltiVec defines 32) and is not fully compatible with standard AltiVec.

References