AMD FireStream

AMD FireStream is AMD's brand name for its Radeon-based product line targeting stream processing and/or GPGPU in supercomputers. The line was originally developed by ATI Technologies, which AMD acquired in 2006, and was previously branded as both ATI FireStream and AMD Stream Processor.^[1] The AMD FireStream can also be used as a floating-point co-processor for offloading CPU calculations, as part of AMD's Torrenza initiative.

Overview

Since the release of the Radeon R520 and GeForce G70 GPU cores, programmable shader architectures with high floating-point (FP) throughput have drawn attention from academic and commercial groups, primarily for their ability to process data beyond their original purpose of rendering visual effects. In response, more resources were allocated to developing GPGPU products, which compute general-purpose mathematical workloads, to take over heavy calculations previously run on mainstream servers, desktop central processing units (CPUs), and specialized floating-point math co-processors. GPGPUs were projected to deliver performance gains of more than a factor of 10 over CPU-only implementations.

Earlier GPGPU experiments date back to the early 2000s. BionicFX experimented with processing audio data on a GeForce 6800 video card and announced its Audio Video EXchange (AVEX) framework,^[2] while ATI performed similar trials at about the same time. Another example is the Folding@home distributed computing project from Stanford University, the first software to use the Radeon R580 and other ATI GPU cores, together with a special beta version of the ATI Catalyst driver (version 6.5), to perform computations unrelated to graphics. Since May 2006, it has used GPU cores to accelerate simulations of protein folding in order to investigate protein-related diseases. At that time, the ATI FireStream was still in its planning stages.

With the acquisition of ATI complete, AMD officially announced a rebranding and, on November 15, 2006, unveiled the AMD Stream Processor (originally the ATI FireStream) as the industry's first commercially available hardware stream processing solution. Based on the ATI Radeon X1900 video card, the AMD Stream Processor is a specialized add-on card built around the R580 graphics processing unit (GPU), but it targets complex floating-point calculations in scientific and financial fields instead of 3D graphics acceleration. AMD claimed that the processor offered eight times the floating-point performance of traditional graphics data processing.^[3]

ATI had invested considerable research and development (R&D) effort in a GPGPU product before its purchase by AMD,^[4] and in 2006 announced the adoption of the stream processing/GPGPU concept in its GPU core codenamed R580.

The brand was renamed again, to AMD FireStream, with the second generation of stream processors (built on a 55 nm process), released on November 8, 2007.^[5] AMD's plans at the time included a stream processor on an MXM module for embedded applications, and next-generation products in the fourth quarter of 2008.^[6]

Hardware

AMD stream processing lineup

The hardware specifications of stream processors released by AMD (and previously ATI) are summarized as follows:

Generation	Model	Video card equivalent	GPU core	Max. threads	SPUs ^NB1	Core clock (MHz)	Memory bandwidth (GB/s)	Memory type	Memory bus width (bits)	Memory amount (MiB)	Memory clock (MHz)	FP32 GFLOPS	FP64 GFLOPS	Peak TDP (W)	Others
1st ^NB2	580^[6]/2U^[9]	Radeon X1900 XTX	R580	512	48	600	83.2	GDDR3	256	1024	650	375^[10]	N/A	≤165
2nd ^NB2	9170^[8]^[11]	Radeon HD 3870	RV670	?	64 (320)	800	51.2	GDDR3	256	2048	800	512	102.4 ^NB3^[12]	≤105
3rd ^NB2	9250^[13]	Radeon HD 4850	RV770	16,384^[14]	160 (800)	625	63.5	GDDR3	256	1024	993	1000	200 ^NB3	≤150
3rd ^NB2	9270^[15]	Radeon HD 4870	RV770	16,384^[14]	160 (800)	750	108.8	GDDR5	256	2048	850	1200	240 ^NB3	<160
4th ^NB2	9350	Radeon HD 5850	Cypress (RV870)	31,744^[16]	288 (1440)	700	128	GDDR5	256	2048	1000	2016	403.2	150	codenamed Kestrel
4th ^NB2	9370	Radeon HD 5870	Cypress (RV870)	31,744^[16]	320 (1600)	825	147.2	GDDR5	256	4096	1150	2640	528	225	codenamed Osprey

Notes:

NB1: Stream Processing Unit (SPU) counts apply only to DirectX 10-compatible and later hardware, which uses unified shaders. The figure in parentheses is the total number of individual stream cores, five per SPU in these designs. Note also that the Stream Processing Units in ATI hardware are architecturally different from NVIDIA's Stream Processors in Tesla products: NVIDIA's SPs run in a "hot clock" domain at a higher frequency than the rest of the core, while ATI's SPUs run at the core clock and have no hot clock domain.

NB2: The first generation of products originally used the ATI FireStream brand and was rebranded as AMD Stream Processor after AMD's acquisition of ATI. AMD counts the RV670-based AMD FireStream 9170 as the second generation because no R600-based AMD Stream Processors were released under the stream processing lineup (although prototype cards with configurations similar to the FireGL V8650, but without video outputs, were publicly demonstrated). Since the FireGL 2007 series, the high-end and ultra-high-end FireGL products have supported stream processing. This feature is also available on all ATI FirePro cards.

NB3: Estimated to be one-fifth of the theoretical figure for single-precision operations.
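The FP32, FP64 and bandwidth columns for the unified-shader models can be reproduced with simple arithmetic. The sketch below assumes that each stream core (the figure in parentheses in the SPUs column) completes one multiply-add, i.e. two floating-point operations, per core clock; that double precision runs at one-fifth of that rate, as NB3 states; and that GDDR3 and GDDR5 move two and four transfers per listed memory clock, respectively. These assumptions are chosen because they match the table, not taken from AMD documentation, and the `Card` type and function names are hypothetical. The R580-based first generation is left out because it predates unified shaders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    model: str
    stream_cores: int   # figure in parentheses in the SPUs column
    core_mhz: float     # core clock
    mem_type: str       # "GDDR3" or "GDDR5"
    bus_bits: int       # memory bus width
    mem_mhz: float      # listed memory clock


# Assumed transfers per listed memory clock for each memory type.
TRANSFERS_PER_CLOCK = {"GDDR3": 2, "GDDR5": 4}


def fp32_gflops(c: Card) -> float:
    # Assumption: one multiply-add (2 FLOPs) per stream core per core clock.
    return c.stream_cores * 2 * c.core_mhz / 1000


def fp64_gflops(c: Card) -> float:
    # NB3: double precision estimated at one-fifth of the FP32 peak.
    return fp32_gflops(c) / 5


def bandwidth_gbs(c: Card) -> float:
    # Bytes per transfer * transfers per second, in decimal GB/s.
    return c.bus_bits / 8 * c.mem_mhz * TRANSFERS_PER_CLOCK[c.mem_type] / 1000


CARDS = [
    Card("9170", 320, 800, "GDDR3", 256, 800),
    Card("9250", 800, 625, "GDDR3", 256, 993),
    Card("9270", 800, 750, "GDDR5", 256, 850),
    Card("9350", 1440, 700, "GDDR5", 256, 1000),
    Card("9370", 1600, 825, "GDDR5", 256, 1150),
]

if __name__ == "__main__":
    for c in CARDS:
        print(f"FireStream {c.model}: {fp32_gflops(c):.1f} FP32 GFLOPS, "
              f"{fp64_gflops(c):.1f} FP64 GFLOPS, {bandwidth_gbs(c):.1f} GB/s")
```

Under these assumptions every FLOPS figure matches the table exactly; the 9250's bandwidth works out to about 63.55 GB/s, which the table lists as 63.5. The match in decimal units is also why the bandwidth column is given in GB/s rather than GiB/s.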

Software

The AMD FireStream was launched with a wide range of software platform support. One of the supporting firms was PeakStream (acquired by Google in June 2007), which was the first to provide an open beta version of software supporting Close to Metal (CTM) and the AMD FireStream, as well as x86 and Cell (Cell Broadband Engine) processors. Running PeakStream's software, the FireStream was claimed to be 20 times faster than regular CPUs in typical applications. RapidMind also provided stream processing software that worked with ATI and NVIDIA GPUs, as well as Cell processors.^[17]

Software Development Kit

AMD eventually abandoned its short-lived Close to Metal API and focused on OpenCL. AMD first released its Stream Computing SDK (v1.0) in December 2007 under the AMD EULA, for Windows XP.^[17] The SDK includes "Brook+", an AMD hardware-optimized version of the Brook language developed at Stanford University, which is itself an open-source variant of ANSI C optimized for stream computing. The AMD Core Math Library (ACML) and AMD Performance Library (APL), both with optimizations for the AMD FireStream, and the COBRA video library (later renamed "Accelerated Video Transcoding", or AVT) for video transcoding acceleration were also to be included. Another important part of the SDK, the Compute Abstraction Layer (CAL), is a software development layer that provides low-level access to the GPU architecture, through the CTM hardware interface, for performance-tuning software written in various high-level programming languages.
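The stream model that Brook and Brook+ are built around can be illustrated without GPU hardware. A kernel is a function applied independently to each element position of one or more input streams, so every output element can in principle be computed in parallel. The following Python sketch emulates that model sequentially on the CPU; `run_kernel` and `saxpy` are hypothetical names chosen for illustration and are not part of Brook+, CAL or any AMD SDK.

```python
from typing import Callable, Sequence


def run_kernel(kernel: Callable[..., float], *streams: Sequence[float]) -> list[float]:
    """Apply `kernel` independently at each element position of the input streams.

    Hypothetical helper that emulates the stream model sequentially on the CPU.
    """
    lengths = {len(s) for s in streams}
    if len(lengths) != 1:
        raise ValueError("expected one or more input streams of equal length")
    # Each output depends only on the inputs at its own position, which is what
    # would let a stream processor evaluate all positions at the same time.
    return [kernel(*elems) for elems in zip(*streams)]


def saxpy(alpha: float) -> Callable[[float, float], float]:
    # Builds a per-element kernel computing alpha * x + y.
    def kernel(x: float, y: float) -> float:
        return alpha * x + y
    return kernel
```

For example, `run_kernel(saxpy(2.0), [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])` returns `[12.0, 24.0, 36.0]`. Because a kernel never sees neighboring outputs, the order in which positions are processed does not matter.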

In August 2011, AMD released version 2.5 of the AMD APP Software Development Kit,^[17] which includes support for OpenCL 1.1, a parallel computing framework developed by the Khronos Group. Compute shaders, exposed through DirectCompute in Microsoft's DirectX 11 API, are supported by graphics drivers with DirectX 11 support.

AMD APP SDK

Original author(s)	Advanced Micro Devices
Stable release	2.9.1 / August 28, 2014
Preview release	3.0 Beta / December 9, 2014
Operating system	Linux, Microsoft Windows
Type	software development kit
License	?
Website	developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/

The AMD Stream SDK (also known as ATI Stream SDK) was replaced by the AMD APP SDK, available for 32-bit and 64-bit Microsoft Windows and Linux. APP stands for "Accelerated Parallel Processing",^[18] and the SDK also targets Heterogeneous System Architecture (HSA) platforms, not only GPUs.^[19]

AMD intends developers to use the AMD APP SDK with the Video Codec Engine (VCE) hybrid mode to create hybrid encoders that pair custom motion estimation, inverse discrete cosine transform and motion compensation with hardware entropy encoding, in order to achieve faster-than-real-time encoding.

APP SDK 3.0 Beta supports OpenCL™ 2.0 and the Catalyst Omega 14.12 driver.

The AMD APP SDK v3.0 Beta includes OpenCL™ samples as well as accelerated libraries such as Bolt (an open-source C++ template library) and the OpenCL™-accelerated OpenCV (Open Source Computer Vision) library.

Advantages

In a system demonstrated by AMD^[20] with two dual-core AMD Opteron processors and two Radeon R600 GPU cores running Microsoft Windows XP Professional, 1 teraFLOPS (TFLOPS) was achieved using universal multiply-add (MADD) calculations. By comparison, an Intel Core 2 Quad Q9650 3.0 GHz processor of the time could achieve 48 GFLOPS.^[21]
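The 48 GFLOPS figure for the Q9650 is consistent with a simple peak calculation, assuming each of its four cores completes four double-precision floating-point operations per cycle (one two-wide SSE add and one two-wide SSE multiply). The resulting ratio to the 1 TFLOPS demonstration is only indicative, since the two figures need not refer to the same precision or workload.

```latex
\begin{aligned}
4\ \text{cores} \times 3.0\ \text{GHz} \times 4\ \tfrac{\text{FLOP}}{\text{cycle}} &= 48\ \text{GFLOPS} \\
\frac{1000\ \text{GFLOPS}}{48\ \text{GFLOPS}} &\approx 20.8
\end{aligned}
```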

Demonstrations showed that, in Kaspersky SafeStream anti-virus scanning tests optimized for AMD stream processors, a system with two AMD stream processors and two Opteron processors reached a throughput of 6.2 Gbit/s (775 MB/s), 21 times faster than other dual-processor systems. The stream processor system also showed only 1-2% CPU use, indicating significant offloading of work from the CPU to the stream processors.^[22]

Limitations

Recursive functions are not supported in Brook+ because all function calls are inlined at compile time. Using CAL, functions (recursive or otherwise) are supported to 32 levels.^[23]
Only bilinear texture filtering is supported; mipmapped textures and anisotropic filtering are not supported at this time.
Functions cannot take a variable number of arguments, a restriction similar to the one on recursive functions.
Conversion of floating-point numbers to integers on GPUs is done differently than on x86 CPUs; it is not fully IEEE-754 compliant.
Global synchronization on the GPU is inefficient, so a kernel must be split into multiple kernels, with synchronization performed on the CPU between them. Given the variable number of multiprocessors and other factors, there may be no perfect solution to this problem.
The bus bandwidth and latency between the CPU and the GPU may become a bottleneck, which may be alleviated in the future by introducing interconnects with higher bandwidth.
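The global-synchronization limitation can be illustrated with a multi-pass sum reduction. In this minimal CPU-side sketch, each call to the hypothetical `reduce_pass` stands in for one kernel launch in which work-groups sum their own chunks without communicating, and the host loop in `multi_pass_sum` acts as the global barrier by starting the next pass only after the previous one has finished. The loop is iterative rather than recursive, in keeping with Brook+'s lack of recursion. The group size and function names are assumptions for illustration, not part of any AMD API.

```python
from typing import List, Tuple


def reduce_pass(data: List[float], group_size: int) -> List[float]:
    # One simulated kernel launch: each work-group sums its own chunk.
    # Groups do not communicate, so no global barrier is needed inside a pass.
    return [sum(data[i:i + group_size]) for i in range(0, len(data), group_size)]


def multi_pass_sum(data: List[float], group_size: int = 4) -> Tuple[float, int]:
    """Sum `data` with repeated passes; return (total, number_of_passes)."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    if not data:
        return 0.0, 0
    passes = 0
    # The host-side loop is the global synchronization point: pass k+1 starts
    # only after every partial sum from pass k has been produced.
    while len(data) > 1:
        data = reduce_pass(data, group_size)
        passes += 1
    return data[0], passes
```

For example, `multi_pass_sum([1.0] * 16)` returns `(16.0, 2)`. Each pass shrinks the data by roughly a factor of `group_size`, so the number of CPU round trips grows logarithmically with the input size, and each round trip adds the launch and bus latency noted above.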

References

[1] AMD Press Release
[2] ExtremeTech report, retrieved July 17, 2007
[3] AMD “Close to Metal” Technology Unleashes the Power of Stream Computing: AMD Press Release, November 14, 2006.
[4] ATI DPVM SIGGRAPH 2006 sketch PDF (134 KiB), ATI DPVM SIGGRAPH 2006 Presentation PDF (671 KiB)
[5] "AMD Delivers First Stream Processor with Double Precision Floating Point Technology". AMD. November 8, 2007. Retrieved November 12, 2007.
[6] AMD WW HPC 2007 presentation PDF (8.81 MiB), page 37 of 53
[7] AnandTech report: ATI's Stream Processing & Folding@home, September 30, 2006
[8] Business Wire coverage, retrieved November 8, 2007
[9] ATI Vendor ID page, retrieved February 26, 2008. "Product name, GPU, PCI Device ID: ATI FireStream 2U, R580, 724E; ATI FireStream 2U Secondary, R580, 726E"
[10] R580 shader core FLOPs
[11] AMD FireStream 9170 - Product page
[12] "AMD's RV670 does double-precision at half the speed". Tigervision Media. February 1, 2008.
[13] AMD FireStream 9250 - Product page
[14] "Entering the Golden Age of Heterogeneous Computing", Michael Mantor, Senior GPU Compute Architect / Fellow, AMD Graphics Product Group, slide 11 of 71
[15] AMD FireStream 9270 - Product page
[16] "Heterogeneous Computing: OpenCL™ and the ATI Radeon™ HD 5870 (“Evergreen”) Architecture", Advanced Micro Devices, slide 56 of 80
[17] AMD APP SDK download page and Stream Computing SDK EULA, retrieved December 29, 2007
[18] "AMD APP SDK OpenCL™ Accelerated Parallel Processing".
[19] http://stackoverflow.com/questions/9473420/whats-the-difference-between-amds-app-sdk-and-amd-atis-stream-technology
[20] HardOCP report, retrieved July 17, 2007
[21] Intel microprocessor export compliance metrics
[22] The Inquirer report, retrieved September 12, 2007
[23] AMD Intermediate Language Reference Guide, August 2008

External links

Applications

Folding@home
