Heterogeneous computing
Heterogeneous computing refers to systems that use more than one kind of processor. These are multi-core systems that gain performance not just by adding cores, but also by incorporating specialized processing capabilities to handle particular tasks. Heterogeneous System Architecture (HSA) systems utilize multiple processor types (typically CPUs and GPUs), usually on the same silicon die, to combine the strengths of both: the GPU, apart from its well-known 3D graphics rendering capabilities, can also perform mathematically intensive computations on very large data sets, while the CPU runs the operating system and performs traditional serial tasks.
By the end of 2010, nearly all new desktop computers had multi-core processors, with dual-core and even quad-core processors entering the mainstream of affordable computing. Still, multi-core processing posed challenges of its own: the extra cores and the cache memory required to feed their instruction pipelines came at the cost of both larger processors and higher power consumption.
Meanwhile, the multi-core era also saw some interesting developments in GPUs, which were growing in sophistication and complexity, spurred on by advances in semiconductor technology. GPUs have vector processing capabilities that enable them to perform parallel operations on very large sets of data – and to do it at much lower power consumption relative to the serial processing of similar data sets on CPUs. This is what allows GPUs to drive capabilities such as highly realistic, multiple-display stereoscopic gaming. And while their value was initially derived from the ability to improve 3D graphics performance by offloading graphics from the CPU, they became increasingly attractive for more general purposes, such as addressing data-parallel programming tasks.
The early efforts to leverage GPUs for general-purpose computing coincided with a notable shift in consumer culture. There was a dramatic increase in the availability and quality of digital content, coupled with a growing consumer appetite for rich visual experiences such as video playback and viewing content in HD. At the same time, the emergence of mainstream operating system support for advanced multitasking began to require processing efficiency of an entirely new magnitude.
Common features
The drive to improve performance and the continuing constraints on power and scalability in multi-core CPU development have led semiconductor, software and systems designers increasingly to look to the vector processing capabilities of GPUs.
Vector processors like those in advanced GPUs have up to thousands of individual compute cores, which can operate simultaneously. This makes GPUs ideally suited for computing tasks that deal with a combination of very large data sets and intensive numerical computation requirements.
GPUs are also rapidly advancing to do even more for less. Vector processing is not always the answer, however: for small data arrays, the overhead of setting up vector processing can easily outweigh the time saved. That is why the scalar approach used by CPUs remains the better fit for certain problems and algorithms, and why heterogeneous computing, which brings together the strengths of both CPUs and GPUs, is essential to driving faster and more powerful processor designs.
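This trade-off can be illustrated with a small sketch, not tied to any particular GPU API: the data-parallel path pays a fixed setup cost, modeled here with worker threads, so a size threshold decides whether work stays on the scalar path or is dispatched to the parallel one. The threshold value and the thread-based stand-in for a vector unit are illustrative assumptions.

```cpp
// Illustrative sketch: choosing between a scalar (CPU-style) path and a
// data-parallel path based on problem size. The thread pool below is only a
// stand-in for the setup cost of offloading work to a vector unit or GPU, and
// the threshold value is an assumption, not a measured figure.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

// Scalar path: a simple sequential reduction with no setup overhead.
double sum_scalar(const std::vector<double>& data) {
    return std::accumulate(data.begin(), data.end(), 0.0);
}

// Data-parallel path: splits the array across workers. Creating the workers
// models the fixed cost of launching an offloaded computation.
double sum_parallel(const std::vector<double>& data, std::size_t workers = 4) {
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const std::size_t begin = w * chunk;
            const std::size_t end = std::min(begin + chunk, data.size());
            for (std::size_t i = begin; i < end; ++i) partial[w] += data[i];
        });
    }
    for (auto& t : pool) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}

// Heterogeneous dispatch: small inputs stay on the scalar path because the
// setup overhead of the parallel path would outweigh any time saved.
double sum_dispatch(const std::vector<double>& data) {
    constexpr std::size_t kOffloadThreshold = 100000;  // assumed cutoff
    return data.size() < kOffloadThreshold ? sum_scalar(data)
                                           : sum_parallel(data);
}

int main() {
    std::vector<double> small(100, 1.0), large(1000000, 1.0);
    std::cout << sum_dispatch(small) << ' ' << sum_dispatch(large) << '\n';
}
```

Beyond the dispatch decision, the compute elements of a heterogeneous system can differ in several areas: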
- ISA or instruction set architecture
- Compute elements may have different instruction set architectures, leading to binary incompatibility.
- ABI or application binary interface
- Compute elements may interpret memory in different ways. This may include endianness, calling convention, and memory layout, and depends on both the architecture and the compiler being used (a byte-order conversion sketch follows this list).
- API or application programming interface
- Library and OS services may not be uniformly available to all compute elements.
- Low-Level Implementation of Language Features
- Language features such as functions and threads are often implemented using function pointers, a mechanism which requires additional translation or abstraction when used in heterogeneous environments.
- Memory Interface and Hierarchy
- Compute elements may have different cache structures and cache coherency protocols, and memory access may be uniform (UMA) or non-uniform (NUMA). Differences can also be found in the ability to read arbitrary data lengths, as some processors/units can only perform byte, word, or burst accesses.
- Interconnect
- Compute elements may have differing types of interconnect aside from basic memory/bus interfaces. This may include dedicated network interfaces, direct memory access (DMA) devices, mailboxes, FIFOs, and scratchpad memories.
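As a concrete illustration of the ABI point above, the sketch below converts a 32-bit value to a fixed byte order before placing it in a buffer shared with another compute element. The premise that the other element is big-endian, and the helper names, are assumptions made only for this example.

```cpp
// Illustrative sketch of one ABI concern from the list above: if two compute
// elements interpret memory with different byte orders, data exchanged through
// shared buffers must be converted explicitly. The "device is big-endian"
// premise is an assumption made for the example.
#include <cstdint>
#include <cstring>
#include <iostream>

// Portable 32-bit byte swap.
std::uint32_t byteswap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Detect the host's endianness at run time.
bool host_is_little_endian() {
    std::uint32_t probe = 1;
    std::uint8_t first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Convert a value from host order to the (assumed big-endian) device order
// before writing it into a buffer that the other compute element will read.
std::uint32_t to_device_order(std::uint32_t host_value) {
    return host_is_little_endian() ? byteswap32(host_value) : host_value;
}

int main() {
    std::uint32_t length = 0x00000010;  // e.g., a message length field
    std::uint32_t wire = to_device_order(length);
    std::cout << std::hex << wire << '\n';
}
```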
Heterogeneous platforms often require the use of multiple compilers in order to target the different types of compute elements found in such platforms. This results in a more complicated development process than for homogeneous systems, as multiple compilers and linkers must be used together in a cohesive way to properly target a heterogeneous platform. Interpretive techniques can be used to hide heterogeneity, but the cost (overhead) of interpretation often requires just-in-time compilation mechanisms, which result in a more complex run-time system that may be unsuitable in embedded or real-time scenarios.
Heterogeneous computing platforms
Heterogeneous computing platforms can be found in every domain of computing—from high-end servers and high-performance computing machines all the way down to low-power embedded devices including mobile phones and tablets.
- Embedded Systems (DSP and Mobile Platforms)
- Reconfigurable Computing
- Xilinx Platform FPGAs (Virtex-II Pro, Virtex-4 FX, Virtex-5 FXT)[2] and Zynq platforms[3]
- Intel "Stellarton" (Atom + Altera FPGA)
- Networking
- Intel IXP Network Processors
- General Purpose Computing, Gaming, and Entertainment Devices
- Intel Sandy Bridge, Ivy Bridge, and Haswell CPUs
- AMD APUs[4]
- IBM Cell,[5] found in the PlayStation 3
- SpursEngine, a variant of the IBM Cell processor
- Emotion Engine,[6] found in the PlayStation 2
Hybrid-core computing
Hybrid-core computing is the technique of extending a commodity instruction set architecture (e.g. x86) with application-specific instructions to accelerate application performance. It is a form of heterogeneous computing[7] wherein asymmetric computational units coexist with a "commodity" processor.
Hybrid-core processing differs from general heterogeneous computing in that the computational units share a common logical address space, and an executable is composed of a single instruction stream—in essence a contemporary coprocessor. The instruction set of a hybrid-core computing system contains instructions that can be dispatched either to the host instruction set or to the application-specific hardware.
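This dispatch model can be sketched in C++ as follows. The HAVE_POPCNT_ACCEL macro and the accel_popcount64() intrinsic are hypothetical placeholders standing in for an application-specific instruction; only the commodity-ISA fallback is real code here.

```cpp
// Illustrative sketch of the hybrid-core idea: a single instruction stream in
// which a hot operation can be served either by an application-specific
// instruction or by a commodity-ISA code path. The accelerator macro and
// intrinsic below are hypothetical placeholders; the fallback is what runs.
#include <cstdint>
#include <iostream>

std::uint64_t popcount64(std::uint64_t x) {
#if defined(HAVE_POPCNT_ACCEL)
    // Hypothetical application-specific instruction exposed as an intrinsic.
    return accel_popcount64(x);
#else
    // Commodity-ISA fallback: plain C++ bit counting.
    std::uint64_t count = 0;
    while (x) {
        x &= x - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
#endif
}

int main() {
    std::cout << popcount64(0xF0F0F0F0F0F0F0F0ull) << '\n';  // prints 32
}
```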
Typically, hybrid-core computing is best deployed where most computational cycles are spent in a few identifiable kernels, as is often seen in high-performance computing applications. Acceleration is especially pronounced when the kernel's logic maps poorly to a sequence of commodity-processor instructions and/or maps well to the application-specific hardware.
Hybrid-core computing is used to accelerate applications beyond what is currently physically possible with off-the-shelf processors, or to lower power and cooling costs in a data center by reducing the computational footprint (i.e., to circumvent obstacles such as the power and density challenges faced by today's commodity processors).[8]
Programming heterogeneous computing architectures
Programming heterogeneous machines is difficult, since developing programs that make the best use of the characteristics of different processors increases the programmer's burden; specialized APIs can reduce this burden, and, depending on the task (serial, graphics, web), such APIs can be designed for portability.[9] Balancing the application workload across the different processors can also be problematic.[10] A common approach is for users to program against high-level abstractions while an intelligent compiler or runtime chooses the best implementation based on the context.[11]
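A minimal sketch of this building-block approach follows; the processor labels, the selection rule, and both implementations are illustrative assumptions rather than any particular framework's API.

```cpp
// Conceptual sketch: each operation ("building block") has one native
// implementation registered per processor type, the programmer calls only the
// abstract operation, and a selector (standing in for an intelligent compiler
// or runtime) picks which registration to run. Everything here is illustrative.
#include <cstddef>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Kernel = std::function<void(std::vector<float>&)>;

// One implementation per processor type for the "normalize" building block.
std::map<std::string, Kernel> normalize_impls = {
    {"cpu", [](std::vector<float>& v) {   // scalar reference version
         float sum = 0;
         for (float x : v) sum += x;
         if (sum != 0) for (auto& x : v) x /= sum;
     }},
    {"gpu", [](std::vector<float>& v) {   // placeholder for an offloaded version
         float sum = 0;
         for (float x : v) sum += x;
         if (sum != 0) for (auto& x : v) x /= sum;
     }},
};

// Stand-in for context-based selection by a compiler or runtime.
std::string choose_processor(std::size_t n) {
    return n > 50000 ? "gpu" : "cpu";
}

// The only call the application programmer makes.
void normalize(std::vector<float>& v) {
    normalize_impls[choose_processor(v.size())](v);
}

int main() {
    std::vector<float> v{1, 1, 2};
    normalize(v);
    std::cout << v[2] << '\n';  // prints 0.5
}
```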
In 2012, a group of companies formed the HSA Foundation.[12]
See also
- Asymmetric multiprocessing
- Dataflow programming
- High-performance reconfigurable computing
- Jungle computing
- Reconfigurable computing
References
- ↑ Cray Computers. "Cray XD1 Datasheet." Retrieved March 22, 2013
- ↑ Ron Wilson, EDN. "Xilinx FPGA introductions hint at new realities." February 2, 2009 Retrieved June 10, 2010.
- ↑ Mike Demler, EDN. "Xilinx integrates dual ARM Cortex-A9 MPCore with 28-nm, low-power programmable logic." March 1, 2011. Retrieved March 1, 2011.
- ↑ "What is Heterogeneous System Architecture (HSA)?". AMD. March 31, 2013. Retrieved March 31, 2013.
- ↑ "A novel SIMD architecture for the Cell heterogeneous chip-multiprocessor" (PDF). Hot Chips 17. August 15, 2005. Retrieved 1 January 2006.
- ↑ "Vector Unit Architecture for Emotion Synthesis". IEEE Micro. March 2000. Retrieved March 31, 2013.
- ↑ "Heterogeneous Processing: a Strategy for Augmenting Moore's Law". Linux Journal, 1/2/2006. http://www.linuxjournal.com/article/8368
- ↑ Fred Pollack, Director of Microprocessor Research Labs. "New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies." http://research.ac.upc.edu/HPCseminar/SEM9900/Pollack1.pdf
- ↑ Kunzman, D. M.; Kale, L. V. (2011). "Programming Heterogeneous Systems". 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum. p. 2061. doi:10.1109/IPDPS.2011.377. ISBN 978-1-61284-425-1.
- ↑ Siegfried Benkner, Sabri Pllana, Jesper Larsson Träff, Philippas Tsigas, Andrew Richards, Raymond Namyst, Beverly Bachmayer, Christoph Kessler, David Moloney, Peter Sanders (2012). "The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous many-core Architectures". Advances in Parallel Computing, IOS Press 22: 361–368. doi:10.3233/978-1-61499-041-3-361.
- ↑ John Darlinton, Moustafa Ghanem, Yike Guo, Hing Wing To (1996), "Guided Resource Organisation in Heterogeneous Parallel Computing", Journal of High Performance Computing 4 (1): 13–23
- ↑ "The 'third era' of app development will be fast, simple, and compact."