POWER4
From Wikipedia, the free encyclopedia
The POWER4 is a microprocessor that implements the 64-bit PowerPC instruction set architecture. Released by IBM in 2001, it succeeded the earlier POWER3 design. The POWER4 is a multicore chip: a single die contains two PowerPC cores.
Functional Layout
Each POWER4 chip contains two processor cores, each a 64-bit implementation of the PowerPC AS architecture. The two cores share a unified L2 cache that is divided into three equal parts. Each part has its own independent L2 controller, which can feed 32 bytes of data per cycle. The Core Interface Unit (CIU) connects each L2 controller to the data cache or the instruction cache of either core. The Non-Cacheable (NC) Unit handles instruction-serializing functions and performs any non-cacheable operations in the storage topology. The L3 cache controller is on the chip, but the L3 cache memory itself is off-chip.

The GX bus controller manages communication with I/O devices over two 4-byte-wide GX buses, one incoming and one outgoing. The Fabric Controller is the master controller for the network of buses: it handles communication for the L1 and L2 controllers, between POWER4 chips in 4-way, 8-way, 16-way and 32-way configurations, and between POWER4 MCMs.

For testing and diagnostics, the chip provides Trace-and-Debug logic used for First Failure Data Capture, a Built-In Self-Test (BIST) function and a Performance Monitoring Unit (PMU). Power-On Reset (POR) is also supported.
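As a rough illustration of the L2 figures above, the following Python sketch multiplies the three L2 controllers by their 32 bytes per cycle to obtain a theoretical aggregate peak. The function names and the 1.3 GHz example clock are assumptions for illustration, as is the reading that "per cycle" refers to the processor clock; the calculation also assumes all three controllers deliver data in every cycle at once, which real workloads do not sustain.

```python
# Hypothetical sketch: theoretical aggregate L2 bandwidth of one POWER4 chip.
# Assumptions: all three L2 controllers supply their peak 32 bytes in every
# cycle simultaneously, and "cycle" means the processor clock.

L2_PARTS = 3                   # the L2 cache is split into three equal parts
BYTES_PER_CYCLE_PER_PART = 32  # each part's controller feeds 32 bytes/cycle


def peak_l2_bytes_per_cycle() -> int:
    """Aggregate bytes per cycle across all L2 controllers."""
    return L2_PARTS * BYTES_PER_CYCLE_PER_PART


def peak_l2_gb_per_s(clock_ghz: float) -> float:
    """Aggregate peak in GB/s (1 GB = 10**9 bytes) at the given clock.

    bytes/cycle * (clock_ghz * 10**9 cycles/s) / 10**9 = bytes/cycle * clock_ghz
    """
    return peak_l2_bytes_per_cycle() * clock_ghz


if __name__ == "__main__":
    print(peak_l2_bytes_per_cycle())          # 96
    print(round(peak_l2_gb_per_s(1.3), 1))    # 124.8
```

The result is an upper bound on what the three controllers could deliver together, not a measured figure.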
Execution Units
Each POWER4 core implements a superscalar microarchitecture, using high-frequency, speculative, out-of-order execution across eight independent execution units: two floating-point units (FP1-2), two load-store units (LD1-2), two fixed-point units (FX1-2), one branch unit (BR) and one condition-register unit (CR). Excluding the BR and CR units, these execution units can complete up to eight operations per clock:
- each floating-point unit can complete one fused multiply-add per clock (two operations),
- each load-store unit can complete one instruction per clock,
- each fixed-point unit can complete one instruction per clock.
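The eight-operations-per-clock figure follows directly from the per-unit rates listed above. The Python sketch below, whose names are hypothetical, sums those rates and also derives the per-core peak floating-point rate, counting a fused multiply-add as two floating-point operations; the clock value passed in is only an example.

```python
# Sketch: peak operations per clock for one POWER4 core, from the per-unit
# rates listed above. BR and CR units are excluded, as in the text.

# unit type -> (number of units, operations completed per unit per clock)
EXECUTION_UNITS = {
    "floating-point": (2, 2),   # one fused multiply-add = two operations
    "load-store": (2, 1),
    "fixed-point": (2, 1),
}


def peak_ops_per_clock(units: dict) -> int:
    """Sum of (unit count * operations per unit) over all unit types."""
    return sum(count * ops for count, ops in units.values())


def peak_gflops_per_core(clock_ghz: float) -> float:
    """Peak floating-point rate of one core in GFLOPS at the given clock."""
    fp_units, flops_per_fma = EXECUTION_UNITS["floating-point"]
    return fp_units * flops_per_fma * clock_ghz


if __name__ == "__main__":
    print(peak_ops_per_clock(EXECUTION_UNITS))   # 8
    print(peak_gflops_per_core(1.3))
```

At an example clock of 1.3 GHz this gives 5.2 GFLOPS per core, a theoretical peak that assumes both floating-point units complete a fused multiply-add in every cycle.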
The main phases of each core's pipeline are:
- Branch Prediction
- Instruction Fetch
- Decode, Crack and Group Formation
- Group Dispatch and Instruction Issue
- Load/Store Unit Operation, including handling of the load-hit-store, store-hit-load and load-hit-load ordering hazards
- Instruction Execution Pipeline
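Load-hit-store, store-hit-load and load-hit-load are ordering hazards between memory accesses to the same address that execute out of program order: a younger load that executes before an older store reads stale data (store-hit-load); a younger load that executes after an older store, but before the store's data reaches the cache, must take the data from the store queue (load-hit-store); and a younger load that executes before an older load to the same address can observe memory out of order if another processor writes in between (load-hit-load). The Python sketch below classifies a pair of accesses under a deliberately simplified model, which is an assumption rather than IBM's actual logic: each access records its program-order position and execution cycle, and a store also records when its data reaches the cache. The class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Access:
    """One memory access in a simplified out-of-order model (hypothetical)."""
    kind: str                    # "load" or "store"
    address: int
    program_order: int           # smaller value = older instruction
    exec_cycle: int              # cycle in which the access executed
    cache_write_cycle: int = 0   # stores only: cycle the data reaches the cache


def classify_hazard(a: Access, b: Access) -> Optional[str]:
    """Name the ordering hazard between two accesses, or return None."""
    if a.address != b.address:
        return None
    older, younger = (a, b) if a.program_order < b.program_order else (b, a)

    if older.kind == "store" and younger.kind == "load":
        if younger.exec_cycle < older.exec_cycle:
            # The load ran before the older store and read stale data.
            return "store-hit-load"
        if younger.exec_cycle < older.cache_write_cycle:
            # The store has executed but its data is not yet in the cache,
            # so the load must take the data from the store queue.
            return "load-hit-store"
        return None

    if older.kind == "load" and younger.kind == "load":
        if younger.exec_cycle < older.exec_cycle:
            # Loads ran out of order; this only matters if another
            # processor stores to the address between them.
            return "load-hit-load"
    return None


if __name__ == "__main__":
    store = Access("store", 0x100, program_order=1, exec_cycle=10, cache_write_cycle=20)
    early_load = Access("load", 0x100, program_order=2, exec_cycle=5)
    late_load = Access("load", 0x100, program_order=2, exec_cycle=12)
    print(classify_hazard(store, early_load))   # store-hit-load
    print(classify_hazard(store, late_load))    # load-hit-store
```

The model only names the hazard; the real hardware must also detect it and then forward data or flush and re-execute the affected instructions.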
Multi-Chip Configuration
The POWER4 was the first commercially available microprocessor to incorporate two cores on a single die. It was also packaged in a Multi-Chip Module (MCM) containing four POWER4 chips, and therefore eight cores, in a single package.
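Because each POWER4 chip carries two cores and an MCM holds four chips, one MCM provides eight cores. The Python sketch below uses that arithmetic to count chips and MCMs for the larger multiprocessor configurations mentioned in the functional layout, under the hypothetical assumption that every chip sits on a fully populated four-chip MCM. Smaller systems, such as 4-way configurations, may be packaged differently, so the function rejects core counts that are not a multiple of eight. The function name is illustrative only.

```python
CORES_PER_CHIP = 2   # two cores on each POWER4 die
CHIPS_PER_MCM = 4    # four POWER4 chips per multi-chip module


def packaging(ways: int) -> tuple:
    """Return (chips, MCMs) for a system with `ways` cores.

    Assumes every chip sits on a fully populated four-chip MCM.
    """
    cores_per_mcm = CORES_PER_CHIP * CHIPS_PER_MCM   # 8
    if ways <= 0 or ways % cores_per_mcm != 0:
        raise ValueError(f"{ways}-way does not fill whole MCMs")
    return ways // CORES_PER_CHIP, ways // cores_per_mcm


if __name__ == "__main__":
    for ways in (8, 16, 32):
        chips, mcms = packaging(ways)
        print(f"{ways}-way: {chips} chips on {mcms} MCM(s)")
```

Under these assumptions a 32-way system corresponds to 16 chips on four MCMs.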
Parametrics
Parameter | Value | Notes |
---|---|---|
Clock frequency | >1.3 GHz | |
Power | 115 W | at 1.5 V, 1.1 GHz |
Transistors | 174 million | |
Gate length | 90 nm | |
Gate oxide thickness | 2.3 nm | |
Dielectric constant | ~4.2 | |
Vdd | 1.6 V | |

Metal layer | Pitch | Thickness |
---|---|---|
M1 | 500 nm | 310 nm |
M2 | 630 nm | 310 nm |
M3-M5 | 630 nm | 420 nm |
M6 (MQ) | 1260 nm | 920 nm |
M7 (LM) | 1260 nm | 920 nm |