Manycore processing unit

From Wikipedia, the free encyclopedia

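A many-core processing unit (MPU) is a type of microprocessor characterized by a combination of features that typically includes many standard RISC processor cores on a single chip, integrated memory controllers, streaming packet IO hardware and, often, packet-processing acceleration hardware.

MPUs have emerged since 2000 as a new class of processor for embedded applications in telecommunications, networking and other fields.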


History and evolution

Origins: network processing units (NPUs)

The 1990s saw the emergence of a class of device called the network processing unit (NPU). NPUs were promoted as a flexible technology for high-speed processing of packet network data. Vendors offered them as a replacement for ASIC and FPGA designs that would shorten development time by substituting software development for costly hardware design and validation.

NPUs did not achieve the market success their vendors had hoped for[1]. Nor did they fully deliver on their promises of broad applicability and ease of use, because of the difficulty of mapping NPUs onto specific applications[2]. Consequently, most of the start-up companies that brought these products to market are now defunct, a decline also driven by the boom and bust of the 1990s technology market.

Emergence of the MPU

Intel’s offering in the NPU market, the IXP[3] product line, consisted of devices with many processor cores, which Intel called micro-engines. The micro-engines were specialized for NPU tasks and used a special, IXP-specific instruction set, but they were general enough to allow the IXP to be programmed for a wider range of applications than some other NPUs. However, because of the micro-engines’ specialized nature, users had to learn to program highly pipelined designs in a new environment, with new tools, on a specialized target.

SiByte[4], a start-up subsequently acquired by Broadcom, took a similar path but used many standard MIPS cores instead of the IXP family’s proprietary micro-engines. This offered several important benefits. A standard development tool set could be used, including the GNU compiler, although this was not always seamless and some users found it necessary to program in MIPS assembly language to meet performance targets. It was also possible to run the Linux operating system and applications on SiByte devices. SiByte was the first product in the new class of many-core processing unit devices, or MPUs.

Modern MPU market

Following the Broadcom SiByte, several MPUs have come to market:

  • Cavium Networks’ OCTEON[5]
  • Raza Microelectronics’ XLR[6]
  • Sun Microsystems’ Niagara UltraSPARC[7]
  • PA-Semi’s PWRficient[8]

Defining the MPU processor class

The common characteristics that define the MPU processor class are examined in turn.

Not all MPUs exhibit all the characteristics presented below but all meet enough of them to be identified as MPUs.

System on a chip

In contrast to Intel Architecture (IA) and PowerPC general purpose processors (GPPs), MPUs are aimed at embedded applications. As such, MPUs integrate many peripheral functions that GPPs do not, and as a result most can be regarded as system-on-a-chip (SoC) devices.

Many cores

Each MPU product family currently offers up to 8 or 16 processor cores. This contrasts with multi-core general purpose IA and PowerPC processors, which typically have two processor cores and occasionally four.

Standard RISC instruction set

MPU cores have standard instruction sets:

  • MIPS64: Broadcom SiByte, Cavium OCTEON, Raza XLR
  • PowerPC: PA-Semi PWRficient
  • SPARC: Sun UltraSPARC

All of these are industry-standard Reduced Instruction Set Computer (RISC) processor cores.

This characteristic contrasts with NPUs, which, to the extent that they were programmable, used specialized proprietary instruction sets. It also contrasts with multi-core IA processors, which use the IA complex instruction set computer (CISC) instruction set.

Integrated memory controllers

The performance of typical MPU applications, such as packet processing and network control protocols (e.g. signalling and call control), is often sensitive to first-access memory latency, i.e. the time taken to access data that is not cached on chip, because these workloads have high cache miss rates. Low first-access latency is sometimes more important than peak memory bandwidth. To achieve it, MPUs integrate their memory controllers on chip. This distinguishes them from Intel and IBM general purpose processors, which use separate memory controller devices adjacent to the processors and are optimized more for maximum bulk memory throughput.
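Why a high miss rate makes first-access latency matter can be seen from the standard average memory access time (AMAT) model, AMAT = hit time + miss rate × miss penalty. The sketch below evaluates it for two hypothetical miss penalties, a lower one standing in for an integrated memory controller and a higher one for an external controller. All timing figures and the function name `amat_ns` are illustrative assumptions, not measurements of any real MPU or GPP.

```python
# Average memory access time (AMAT) model:
#   AMAT = hit_time + miss_rate * miss_penalty
# All numbers below are hypothetical and chosen only to show the trend.


def amat_ns(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Return the average memory access time in nanoseconds."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


HIT_NS = 1.0                # assumed on-chip cache hit time
INTEGRATED_PENALTY = 60.0   # assumed miss penalty, integrated controller
EXTERNAL_PENALTY = 100.0    # assumed miss penalty, external controller

# Compare a cache-friendly workload with a high-miss-rate one.
for miss_rate in (0.02, 0.20):
    integrated = amat_ns(HIT_NS, miss_rate, INTEGRATED_PENALTY)
    external = amat_ns(HIT_NS, miss_rate, EXTERNAL_PENALTY)
    print(f"miss rate {miss_rate:.0%}: "
          f"integrated {integrated:.1f} ns, external {external:.1f} ns")
```

Under these assumed figures the gap between the two designs is 0.8 ns at a 2% miss rate but 8 ns at a 20% miss rate, because the miss penalty is weighted by the miss rate.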

Integrated streaming packet IO hardware

Embedded packet processing and network control applications carry heavy packet IO loads, so many MPUs implement streaming packet interface functions in on-chip hardware to offload these tasks from software running on the processor cores. Hardware termination of layer-2 protocols (e.g. the Ethernet MAC layer), combined with packet input and output queues, is typical. By contrast, general purpose processors normally use memory-address-space-oriented interfaces such as PCI, PCI Express or HyperTransport.
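The division of labour between interface hardware and processor cores can be pictured as a bounded first-in, first-out queue: the interface side deposits whole packets after layer-2 termination, and the cores remove them. The sketch below is a hypothetical software model of such an input queue with tail drop on overflow; the class names, the capacity and the drop policy are assumptions for illustration and do not describe any vendor's actual queue hardware or programming interface.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    port: int       # hypothetical ingress port number
    payload: bytes  # packet contents after layer-2 termination


class PacketQueue:
    """Bounded FIFO standing in for an on-chip packet input queue.

    Hypothetical model only: it does not describe any vendor's actual
    queue hardware or programming interface.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.dropped = 0
        self._slots: deque = deque()

    def enqueue(self, pkt: Packet) -> bool:
        """Interface side: append a packet, or drop it if the queue is full."""
        if len(self._slots) >= self.capacity:
            self.dropped += 1  # tail drop: the arriving packet is discarded
            return False
        self._slots.append(pkt)
        return True

    def dequeue(self) -> Optional[Packet]:
        """Core side: return the oldest queued packet, or None if empty."""
        return self._slots.popleft() if self._slots else None

    def __len__(self) -> int:
        return len(self._slots)


# Usage: a two-slot queue receives three packets before any core polls it.
rx = PacketQueue(capacity=2)
for i in range(3):
    rx.enqueue(Packet(port=0, payload=bytes([i])))

first = rx.dequeue()
print(rx.dropped, first.payload)
```

In this model the cores never touch interface registers or per-byte transfers; they only see complete packets, which is the kind of offload the paragraph above describes.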

Packet processing acceleration hardware

Many MPU applications can benefit from specialized hardware that accelerates common packet-processing tasks.
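As one illustration of the kind of per-packet work such hardware can take off the cores, the sketch below computes the Internet checksum defined in RFC 1071, which protects IPv4 headers. Choosing this function is an assumption made for illustration; it is not a claim about which functions any particular MPU accelerates. In software, every 16-bit word of the data passes through a core, and that per-byte cost is what dedicated hardware is meant to remove.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries out of the low 16 bits back in (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Example IPv4 header with its checksum field (bytes 10-11) zeroed.
header = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
csum = internet_checksum(bytes(header))
header[10:12] = csum.to_bytes(2, "big")
print(hex(csum))                         # 0xb861
print(internet_checksum(bytes(header)))  # 0 once the checksum is filled in
```

Recomputing the checksum over a header that already contains the correct value yields zero, which is how a receiver verifies it.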

MPU applications

In telecommunications and networking equipment

MPUs are emerging as a class of technology relevant to embedded applications in telecommunications and networking equipment.

The high integration and specialized features that characterize MPU-class devices make them more efficient, and thus more cost-effective, than general purpose processors in many of these applications.

MPUs are also general enough as processors (they can run standard SMP Linux or lightweight simple executives) that a single MPU-based computer design can address a wide range of these applications using different software loads. This reduces the number of different module types needed in a product line that addresses both control and packet-processing network functions, which in turn reduces development costs and inventory volumes.