Manycore processing unit
From Wikipedia, the free encyclopedia
A many-core processing unit (or MPU for short) is a type of microprocessor characterized by
- many microprocessor cores with standard instruction sets,
- an integrated low-latency memory controller,
- hardware acceleration features for packet handling.
Since 2000, MPUs have emerged as a new class of processor for embedded applications in telecommunications, networking and other fields.
History and evolution
Origins: network processing units (NPU)
The 1990s saw the emergence of a class of device called the Network Processing Unit (NPU). NPUs were marketed as a flexible technology for high-speed processing of packet network data that could replace ASIC and FPGA designs and shorten development time by substituting software development for costly hardware design and validation.
NPUs did not achieve the market success their vendors had hoped for[1]. Nor did they fully deliver on the promises of broad applicability or ease of use, owing to the difficulty of mapping NPUs to specific applications[2]. Consequently, most of the start-up companies that brought these products to market are now defunct, a process also driven by the technology market boom of the late 1990s and the bust that followed.
Emergence of the MPU
Intel’s offering in the NPU market, the IXP[3] product line, was a family of devices with many processor cores that Intel called micro-engines. The micro-engines were specialized for NPU tasks and used a special, IXP-specific instruction set, but they were general enough to allow the IXP to be programmed for a wider range of applications than some other NPUs. However, because of the specialized nature of the micro-engines, users had to learn to program highly pipelined designs in a new environment, with new tools, on a specialized target.
SiByte[4], a start-up subsequently acquired by Broadcom, took a similar path but used many standard MIPS cores instead of the proprietary micro-engines of the IXP family. This offered several important benefits. A standard development toolset, including the GNU compiler, could be used (although this was not always seamless, and some users found it necessary to program in MIPS assembly language to meet performance targets). It was also possible to run the Linux operating system and Linux applications on SiByte devices. SiByte's device was the first product in the new class of many-core processing unit devices, or MPUs.
Modern MPU market
Following Broadcom's SiByte devices, several other MPUs have come to market:
- Cavium Networks’ OCTEON[5]
- Raza Microelectronics’ XLR[6]
- Sun Microsystems’ Niagara UltraSPARC[7]
- P.A. Semi’s PWRficient[8]
Defining the MPU processor class
The common characteristics that define the MPU processor class are examined in turn below.
Not every MPU exhibits all of these characteristics, but each exhibits enough of them to be identified as an MPU.
System on a chip
Unlike Intel Architecture (IA) and PowerPC general-purpose processors (GPPs), MPUs are aimed at embedded applications. As such, MPUs integrate many peripheral functions that GPPs do not, and as a result most can be regarded as system-on-a-chip (SoC) devices.
Many cores
MPU product families currently offer up to 8 or 16 processor cores. This contrasts with multi-core general-purpose IA and PowerPC processors, which typically have two processor cores and occasionally four.
Standard RISC instruction set
MPU cores use standard instruction sets:
- MIPS (Broadcom SiByte, Cavium OCTEON, Raza XLR)
- SPARC (Sun Niagara)
- PowerPC (PWRficient)
All of these are industry-standard Reduced Instruction Set Computer (RISC) instruction sets.
This characteristic contrasts with NPUs, which, to the extent that they were programmable, used specialized proprietary instruction sets. It also contrasts with multi-core IA processors, which use the IA complex instruction set computer (CISC) instruction set.
Integrated memory controllers
The performance of typical MPU applications, such as packet processing and network control protocols (e.g. signalling and call control), is often sensitive to first-access memory latency, i.e. the time taken to access memory that is not cached on chip, because these workloads have high cache miss rates. For such workloads, first-access latency is sometimes more important than peak memory bandwidth. To achieve low first-access latency, MPUs integrate their memory controllers on chip. This distinguishes them from Intel and IBM general-purpose processors, which use separate memory controller devices adjacent to the processor and are optimized more for maximum bulk memory throughput.
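The effect of first-access latency can be illustrated with the standard average memory access time model. The numbers below are hypothetical, chosen only to show the arithmetic; they are not measurements of any particular MPU.

```latex
% T_hit:  on-chip cache hit time
% m:      cache miss rate (fraction of accesses that miss on chip)
% T_miss: first-access latency to off-chip memory (miss penalty)
\[
  T_{\text{avg}} = T_{\text{hit}} + m \, T_{\text{miss}}
\]
% Hypothetical values: T_hit = 2 ns, m = 0.2
\[
  T_{\text{miss}} = 100\,\text{ns}: \quad
  T_{\text{avg}} = 2 + 0.2 \times 100 = 22\,\text{ns}
\]
\[
  T_{\text{miss}} = 60\,\text{ns}: \quad
  T_{\text{avg}} = 2 + 0.2 \times 60 = 14\,\text{ns}
\]
```

When the miss rate is high, the term m T_miss dominates, so reducing miss latency shortens the average access time almost proportionally. Peak bandwidth does not appear in this simple model at all; it only becomes relevant when many misses are in flight at once, which the model ignores.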
Integrated streaming packet IO hardware
Embedded packet processing and network control applications have heavy packet IO loads, so many MPUs implement streaming packet interface functions in on-chip hardware to offload this work from software running on the processor cores. Typical features are layer-two protocol termination in hardware (e.g. the Ethernet MAC layer) combined with packet input and output queues. By contrast, general-purpose processors normally use memory-address-space-oriented interfaces such as PCI, PCI Express or HyperTransport.
Packet processing acceleration hardware
Many MPU applications benefit from specialized hardware that accelerates common packet processing tasks:
- Traffic management, such as class-of-service queues with congestion controls like tail drop and random early detection (RED)
- Scheduling algorithms such as strict priority and weighted fair queueing
- Security functions, including bulk encryption and decryption, random number generation and packet authentication hash computation
- Packet parsing and inspection algorithms
- Regular expression processing
- Compression and decompression, necessary for inspection of compressed data
- Fast interfaces to external search devices such as ternary content-addressable memories, deep packet parsers, longest prefix match engines etc.
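Of the traffic management functions listed above, tail drop and random early detection are simple enough to sketch in software. The following Python model is a simplified, hypothetical illustration of the per-packet decision that such hardware makes. It keeps an exponentially weighted moving average of the queue length and drops arriving packets with a probability that rises linearly between two thresholds. The class name, parameter names and default values are assumptions made for this sketch, and it omits refinements of the full RED algorithm, such as the count-based spacing of drops.

```python
import random
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


def tail_drop_enqueue(queue: deque, packet, capacity: int) -> bool:
    """Tail drop: accept the packet only if the queue is not full."""
    if len(queue) >= capacity:
        return False
    queue.append(packet)
    return True


@dataclass
class RedQueue:
    """Simplified random early detection (RED) queue.

    All parameter names and defaults are illustrative assumptions.
    """
    min_th: float = 5.0     # below this average length, never drop early
    max_th: float = 15.0    # at or above this average length, always drop
    max_p: float = 0.1      # drop probability as the average approaches max_th
    weight: float = 0.002   # EWMA weight for the average queue length
    capacity: int = 20      # hard limit; tail drop applies when full
    avg: float = 0.0        # current moving average of the queue length
    queue: deque = field(default_factory=deque)

    def drop_probability(self) -> float:
        """Linear ramp from 0 at min_th up to max_p just below max_th."""
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def enqueue(self, packet, rng: Optional[random.Random] = None) -> bool:
        """Return True if the packet was queued, False if it was dropped."""
        # Update the moving average from the instantaneous queue length.
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)
        # A full queue always falls back to tail drop.
        if len(self.queue) >= self.capacity:
            return False
        # Early drop with the current RED probability.
        draw = (rng or random).random()
        if draw < self.drop_probability():
            return False
        self.queue.append(packet)
        return True

    def dequeue(self):
        """Remove and return the oldest packet, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None
```

The moving average lets the queue absorb short bursts without early drops while still signalling sustained congestion. A hardware implementation would apply the same comparison and random draw at line rate; the Python version shows only the decision logic.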
MPU applications
In telecommunications and networking equipment
MPUs are emerging as a class of technology relevant to embedded applications in telecommunications and networking equipment such as:
- Signalling applications, often considered control plane applications, including softswitches, control functions in the IP Multimedia Subsystem (IMS) such as the call session control functions (x-CSCF), signalling gateways (SGW) and mobile switching centres (MSC).
- Bearer applications, which pass or manipulate bearer traffic, including circuit and IP media gateways, access network aggregators, base station controllers (BSC) and radio network controllers (RNC).
- Transport applications that are part of the access network, including IP DSLAMs, optical network terminations and optical line terminations.
- Base station applications, a specialized class covering the needs of wireless base stations serving WiMAX, 4G and 3GPP/3GPP2 networks.
The high integration and specialized features that characterize MPU-class processors make them more efficient, and thus more cost-effective, than general-purpose processors in many of these applications.
MPUs are also general enough as processors (they can run standard SMP Linux or lightweight simple executives) that a single MPU-based computer design can address a wide selection of the applications listed above by using different software loads. This reduces the number of different module types needed in a product line that addresses both control and packet processing network functions, lowering development costs and inventory volumes.