NetBurst

From Wikipedia, the free encyclopedia

The Intel NetBurst microarchitecture, called P68 inside Intel, was the successor to the P6 microarchitecture in Intel's x86 family of CPUs. The first CPU to use the architecture was the Willamette core, released in late 2000. Willamette was the first of the Pentium 4 CPUs, and all subsequent Pentium 4 variants were also based on NetBurst. In mid-2001, Intel released the Foster core, also based on NetBurst, thereby moving the Xeon CPUs to the new architecture as well. Pentium 4-based Celeron CPUs also use the NetBurst architecture.

NetBurst is sometimes referred to as the Intel P7, Intel 80786, or i786 architecture when compared with previous generations[citation needed]. These are not official names; P7 was in fact used internally at Intel for what became the Itanium architecture.

Technology

The NetBurst architecture includes features such as "Hyper Pipelined Technology" and "Rapid Execution Engine", both of which made their first appearance in this microarchitecture.

Hyper Pipelined Technology

Intel chose this name for the 20-stage pipeline of the Willamette core. This is a significant increase over the Pentium III, which had 10 stages in its pipeline. The Prescott core, the last major redesign of the Pentium 4, has a 31-stage pipeline. Although a longer pipeline has some disadvantages, chiefly fewer instructions per cycle (IPC), the greater number of stages allows the CPU to reach higher clock speeds, which in principle offset the performance lost to the reduced IPC. The lower IPC is an indirect consequence of pipeline depth and a matter of design compromise (a small number of long pipelines achieves a lower IPC than a greater number of short pipelines). Another drawback of a deeper pipeline is that more stages must be flushed when the branch predictor makes a mistake, which increases the penalty paid for a misprediction. To address this issue, Intel devised the "Rapid Execution Engine" and invested heavily in its branch prediction technology, which Intel claims reduces mispredictions by 33% compared with the Pentium III.[1]
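The trade-off between pipeline depth, clock speed and misprediction penalty can be illustrated with a toy throughput model. This is only a sketch: it assumes that a mispredicted branch costs as many cycles as the pipeline has stages, and the clock speeds, branch fraction and misprediction rate are invented for illustration rather than measured on any real CPU.

```python
def instructions_per_second(clock_hz, base_ipc, branch_fraction,
                            mispredict_rate, flush_penalty_cycles):
    """Estimate throughput with a simple stall model (illustrative only)."""
    # Cycles per instruction when every branch is predicted correctly.
    base_cpi = 1.0 / base_ipc
    # Extra cycles per instruction lost to pipeline flushes.
    flush_cpi = branch_fraction * mispredict_rate * flush_penalty_cycles
    return clock_hz / (base_cpi + flush_cpi)


# Hypothetical workload: 20% of instructions are branches, 5% are mispredicted.
# Assumption: the flush penalty equals the pipeline depth in cycles.
shallow = instructions_per_second(1.0e9, 1.0, 0.20, 0.05, 10)
deep_same_clock = instructions_per_second(1.0e9, 1.0, 0.20, 0.05, 20)
deep_faster_clock = instructions_per_second(1.5e9, 1.0, 0.20, 0.05, 20)

print(f"10 stages at 1.0 GHz: {shallow / 1e9:.2f} G instr/s")
print(f"20 stages at 1.0 GHz: {deep_same_clock / 1e9:.2f} G instr/s")
print(f"20 stages at 1.5 GHz: {deep_faster_clock / 1e9:.2f} G instr/s")
```

In this model the deeper pipeline is slower at the same clock speed, and comes out ahead only if the extra stages buy a sufficiently large increase in clock speed.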

Critique: While this architecture achieved substantial increases in clock speed, its overall performance is considered to have been substantially lower than that of AMD's competing products at the time. This was because the pipeline had to be flushed so often after erroneous branch predictions. It did, however, lead Intel to develop a highly accurate branch predictor, capable of about 90% prediction accuracy. Development of this algorithm has likely benefited the performance of the Core microarchitecture, which more closely resembles the architecture of the Pentium III.[citation needed]

Rapid Execution Engine

With this technology, the ALUs in the CPU core operate at twice the core clock frequency. In a 3.5 GHz CPU, for example, the ALUs effectively run at 7 GHz. The aim is to compensate for the low IPC; it also considerably improves the integer performance of the CPU. The downside is that certain instructions are now much slower (both relatively and absolutely) than before, which makes optimization for multiple target CPUs difficult. Shift and rotate operations are one example: they suffer from the lack of a barrel shifter, which was present on every x86 CPU beginning with the 386 (and is also present on Athlon and Hammer).
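The arithmetic behind the doubled ALU clock is simple enough to write out. The helper names below are hypothetical and the sketch only restates the 2x relationship described above; it does not model actual instruction latencies.

```python
def alu_clock_ghz(core_clock_ghz, multiplier=2):
    """Effective ALU frequency when the ALUs run at a multiple of the core clock."""
    return core_clock_ghz * multiplier


def alu_tick_ns(core_clock_ghz, multiplier=2):
    """Duration of one ALU clock tick in nanoseconds (1 / frequency in GHz)."""
    return 1.0 / alu_clock_ghz(core_clock_ghz, multiplier)


# The example from the text: a 3.5 GHz core with ALUs at twice the core clock.
print(alu_clock_ghz(3.5), "GHz effective ALU clock")
print(f"{alu_tick_ns(3.5):.3f} ns per ALU tick")
```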

Execution Trace Cache

In place of a conventional level 1 instruction cache, Intel incorporated what it calls an Execution Trace Cache. This cache stores decoded micro-operations, so that when an instruction is executed again, the CPU can read the decoded micro-ops directly from the trace cache instead of fetching and decoding the instruction a second time, thereby saving a considerable amount of time. Moreover, the micro-ops are cached along their predicted path of execution, which means that when the CPU fetches instructions from the cache, they are already in the correct order of execution.
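The idea of caching decoded micro-ops can be sketched with a small software analogy. Everything here is hypothetical: the class, the decode table and the micro-op names are invented for illustration and do not reflect Intel's actual micro-op encoding, cache organization or replacement policy.

```python
# Hypothetical decode table: x86-like instruction text -> invented micro-ops.
TOY_DECODE_TABLE = {
    "add eax, [mem]": ("load tmp, [mem]", "add eax, tmp"),
    "inc ecx": ("add ecx, 1",),
    "jnz loop": ("jnz loop",),
}


def toy_decode(instruction):
    """Stand-in for the (slow) instruction decoder."""
    return TOY_DECODE_TABLE[instruction]


class ToyTraceCache:
    """Caches decoded micro-op sequences, keyed by the trace's start address."""

    def __init__(self, decode):
        self._decode = decode
        self._traces = {}
        self.decode_passes = 0  # how many times the decoder had to run

    def fetch(self, address, predicted_path):
        """Return the micro-ops for the instructions along the predicted path."""
        trace = self._traces.get(address)
        if trace is None:
            # Miss: decode once and store the micro-ops in predicted order.
            self.decode_passes += 1
            trace = tuple(uop for insn in predicted_path
                          for uop in self._decode(insn))
            self._traces[address] = trace
        return trace


cache = ToyTraceCache(toy_decode)
loop_body = ["add eax, [mem]", "inc ecx", "jnz loop"]
first = cache.fetch(0x1000, loop_body)   # miss: instructions are decoded
second = cache.fetch(0x1000, loop_body)  # hit: decoder is bypassed
print(first)
print("decoder runs:", cache.decode_passes)
```

The second fetch returns the stored micro-ops without invoking the decoder again, which is the saving the trace cache is designed to provide for repeatedly executed code such as loops.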

Despite all these enhancements, the NetBurst architecture created obstacles for engineers trying to scale up its performance. With this architecture, Intel aimed to reach clock speeds of 10 GHz, but as clock speeds rose, Intel found it increasingly difficult to keep power dissipation within acceptable limits. Intel reached its limit at 3.8 GHz and encountered problems achieving even that. As a result, Intel decided to abandon NetBurst and developed a newer microarchitecture, the Core microarchitecture (inspired by the P6 core used from the Pentium Pro through the "Tualatin" Pentium III-S, and most directly by the Pentium M).

Revisions

Intel replaced the original Willamette core with a redesigned version of the NetBurst architecture, called Northwood, in January 2002. The Northwood design combined a larger cache, a smaller 130 nm fabrication process, and hyper-threading technology (although initially this feature was disabled on all models except the 3.06 GHz one) to produce a more modern, higher-performing version of the NetBurst architecture.

In February 2004, Intel introduced another, more radical revision of the architecture called Prescott. Prescott was produced on a 90 nm process and included several major design changes: an even larger cache (from 512 KiB in Northwood to 1 MiB, and later 2 MiB), a much longer instruction pipeline (31 stages, compared with 20 in Northwood), a heavily improved branch predictor, the SSE3 SIMD instructions, and later EM64T, Intel's branding for its compatible implementation of AMD64, the 64-bit version of the x86 architecture. As with hyper-threading, all Prescott chips have the hardware to support EM64T, but it was initially enabled only on high-end Xeon processors before being officially introduced in processors with the Pentium brand.

Despite its many new features, Prescott often performed worse than a similarly clocked Northwood, and many engineers felt that the real-world performance of the processor was compromised by the attempt to achieve the highest possible clock speed.[citation needed] Power consumption and heat dissipation also became major issues with Prescott, one of the hottest-running and most power-hungry microprocessors in history. Power and heat concerns prevented Intel from releasing a Prescott clocked above 3.8 GHz, or a mobile version of the core. This led some computer enthusiasts to coin the term "the Intel Face-Plant", mocking the apparent failure of Prescott.

Intel also released a dual-core version of the NetBurst architecture called Smithfield, which is two Prescott cores on a single die, and later Presler, which consists of two Cedar Mill cores on two separate dies (Cedar Mill being the 65 nm die shrink of Prescott).

Future

Intel replaced NetBurst with the Intel Core microarchitecture, released in July 2006, which is more directly derived from 1995's Pentium Pro or 2001's Pentium III-S than from NetBurst.

Presler, a Pentium D core released in early 2006, was widely touted by analysts as the last of the NetBurst line, though the final NetBurst chip was actually the Cedar Mill-based Celeron D 365, clocked at 3.60 GHz. The "Conroe" version of the Intel Core 2 processor, which uses the Intel Core microarchitecture, is the successor to Presler.
