OpenCL
OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus APIs that are used to define and then control the platforms. OpenCL provides parallel computing using task-based and data-based parallelism. It has been adopted by Intel, AMD, Nvidia, and ARM.
OpenCL gives any application access to the graphics processing unit (GPU) for non-graphical computing, extending the use of the GPU beyond graphics (general-purpose computing on graphics processing units, or GPGPU). Academic researchers have investigated automatically compiling OpenCL programs into application-specific processors running on FPGAs,[1] and commercial FPGA vendors are developing tools to translate OpenCL to run on their FPGA devices.[2]
OpenCL is analogous to the open industry standards OpenGL and OpenAL, for 3D graphics and computer audio, respectively. OpenCL is managed by the non-profit technology consortium Khronos Group.
History
OpenCL was initially developed by Apple Inc., which holds trademark rights, and refined into an initial proposal in collaboration with technical teams at AMD, IBM, Intel, and Nvidia. Apple submitted this initial proposal to the Khronos Group. On 16 June 2008 the Khronos Compute Working Group was formed[3] with representatives from CPU, GPU, embedded-processor, and software companies. This group worked for five months to finish the technical details of the specification for OpenCL 1.0 by 18 November 2008.[4] This technical specification was reviewed by the Khronos members and approved for public release on 8 December 2008.[5]
OpenCL 1.0
OpenCL 1.0 was released with Mac OS X Snow Leopard. According to an Apple press release:[6]
Snow Leopard further extends support for modern hardware with Open Computing Language (OpenCL), which lets any application tap into the vast gigaflops of GPU computing power previously available only to graphics applications. OpenCL is based on the C programming language and has been proposed as an open standard.
AMD decided to support OpenCL (and DirectX 11) instead of the now deprecated Close to Metal in its Stream framework.[7][8] RapidMind announced its adoption of OpenCL underneath its development platform to support GPUs from multiple vendors with one interface.[9] On 9 December 2008, Nvidia announced its intention to add full support for the OpenCL 1.0 specification to its GPU Computing Toolkit.[10] On 30 October 2009, IBM released its first OpenCL implementation as a part of the XL compilers.[11]
OpenCL 1.1
OpenCL 1.1 was ratified by the Khronos Group on 14 June 2010[12] and adds significant functionality for enhanced parallel programming flexibility and performance, including:
- New data types including 3-component vectors and additional image formats;
- Handling commands from multiple host threads and processing buffers across multiple devices;
- Operations on regions of a buffer including read, write and copy of 1D, 2D or 3D rectangular regions;
- Enhanced use of events to drive and control command execution;
- Additional OpenCL built-in C functions such as integer clamp, shuffle and asynchronous strided copies;
- Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL and OpenGL events.
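The rectangular-region operations are exposed through commands such as clEnqueueReadBufferRect, which treat a linear buffer as rows and slices. As a rough illustration only, the following plain-C sketch models that addressing on the host without calling OpenCL. It assumes the OpenCL 1.1 convention that origin[0] and region[0] are measured in bytes while the other two components count rows and slices, so a region starts at byte origin[2] * slice_pitch + origin[1] * row_pitch + origin[0]. The function name copy_rect is hypothetical, and zero pitches (which OpenCL replaces with computed defaults) are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU-side model of a rectangular copy between two linear byte
 * buffers. origin[0] and region[0] are in bytes; origin[1]/region[1] count
 * rows and origin[2]/region[2] count slices. */
void copy_rect(const unsigned char *src, const size_t src_origin[3],
               size_t src_row_pitch, size_t src_slice_pitch,
               unsigned char *dst, const size_t dst_origin[3],
               size_t dst_row_pitch, size_t dst_slice_pitch,
               const size_t region[3])
{
    /* each row of the region must fit within a row pitch */
    assert(region[0] <= src_row_pitch && region[0] <= dst_row_pitch);
    for (size_t z = 0; z < region[2]; ++z) {
        for (size_t y = 0; y < region[1]; ++y) {
            /* byte offset = slice * slice_pitch + row * row_pitch + x */
            size_t s = (src_origin[2] + z) * src_slice_pitch
                     + (src_origin[1] + y) * src_row_pitch + src_origin[0];
            size_t d = (dst_origin[2] + z) * dst_slice_pitch
                     + (dst_origin[1] + y) * dst_row_pitch + dst_origin[0];
            memcpy(dst + d, src + s, region[0]); /* copy one row of the region */
        }
    }
}
```

For example, copying a 2 × 2 byte region at origin (1, 1, 0) out of a 4 × 4 buffer (row pitch 4, slice pitch 16) into a packed 2 × 2 buffer picks out the elements at byte offsets 5, 6, 9 and 10.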
OpenCL 1.2
On 15 November 2011, the Khronos Group announced the OpenCL 1.2 specification,[13] which added significant functionality over the previous versions in terms of performance and features for parallel programming. The most notable features include:
- Device partitioning: the ability to partition a device into sub-devices so that work assignments can be allocated to individual compute units. This is useful for reserving areas of the device to reduce latency for time-critical tasks.
- Separate compilation and linking of objects: the functionality to compile OpenCL into external libraries for inclusion into other programs.
- Enhanced image support: 1.2 adds support for 1D images and 1D/2D image arrays. Furthermore, the OpenGL sharing extensions now allow for OpenGL 1D textures and 1D/2D texture arrays to be used to create OpenCL images.
- Built-in kernels: custom devices that contain specific unique functionality are now integrated more closely into the OpenCL framework. Kernels can be called to use specialised or non-programmable aspects of underlying hardware. Examples include video encoders/decoders and digital signal processors.
- DirectX functionality: DX9 media surface sharing allows for efficient sharing between OpenCL and DX9 or DXVA media surfaces. Equally, for DX11 seamless sharing between OpenCL and DX11 surfaces is enabled.
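In OpenCL 1.2, device partitioning is requested through clCreateSubDevices with a partitioning property. One such property, CL_DEVICE_PARTITION_EQUALLY, asks for as many sub-devices as can be created with n compute units each, drawn from the device's CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS; any compute units left over when n does not divide that total are not used. The plain-C sketch below only models this arithmetic on the host and does not call OpenCL; the function name partition_equally is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the CL_DEVICE_PARTITION_EQUALLY rule: split
 * max_compute_units into as many sub-devices of n compute units as fit.
 * Returns the sub-device count; if unused is not NULL, stores the number
 * of compute units left idle. */
unsigned partition_equally(unsigned max_compute_units, unsigned n,
                           unsigned *unused)
{
    assert(n > 0); /* a partition of zero compute units is meaningless */
    unsigned count = max_compute_units / n;
    if (unused != NULL)
        *unused = max_compute_units - count * n; /* remainder stays unused */
    return count;
}
```

With 10 partitionable compute units and n = 4, the model yields two sub-devices and leaves two compute units idle. On an actual OpenCL 1.2 device, the same request would be expressed as the property list {CL_DEVICE_PARTITION_EQUALLY, n, 0} passed to clCreateSubDevices.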
The OpenCL specification is under development at Khronos, which is open to any interested company to join.
Implementation
- On 10 December 2008, AMD and Nvidia held the first public OpenCL demonstration, a 75-minute presentation at SIGGRAPH Asia 2008. AMD showed a CPU-accelerated OpenCL demo demonstrating the scalability of OpenCL on one or more cores, while Nvidia showed a GPU-accelerated demo.[14][15]
- On 16 March 2009, at the 4th Multicore Expo, Imagination Technologies announced the PowerVR SGX543MP, the company's first GPU to feature OpenCL support.[16]
- On 26 March 2009, at GDC 2009, AMD and Havok demonstrated the first working implementation of OpenCL, accelerating Havok Cloth on an AMD Radeon HD 4000 series GPU.[17]
- On 20 April 2009, Nvidia announced the release of its OpenCL driver and SDK to developers participating in its OpenCL Early Access Program.[18]
- On 5 August 2009, AMD unveiled the first development tools for its OpenCL platform as part of its ATI Stream SDK v2.0 Beta Program.[19]
- On 28 August 2009, Apple released Mac OS X Snow Leopard, which contains a full implementation of OpenCL.[20]
- OpenCL in Snow Leopard is supported on the NVIDIA GeForce 320M, GeForce GT 330M, GeForce 9400M, GeForce 9600M GT, GeForce 8600M GT, GeForce GT 120, GeForce GT 130, GeForce GTX 285, GeForce 8800 GT, GeForce 8800 GS, Quadro FX 4800, Quadro FX5600, ATI Radeon HD 4670, ATI Radeon HD 4850, Radeon HD 4870, ATI Radeon HD 5670, ATI Radeon HD 5750, ATI Radeon HD 5770 and ATI Radeon HD 5870.[21]
- On 28 September 2009, NVIDIA released its own OpenCL drivers and SDK implementation.
- On 13 October 2009, AMD released the fourth beta of the ATI Stream SDK 2.0, which provides a complete OpenCL implementation on both R700/R800 GPUs and SSE3 capable CPUs. The SDK is available for both Linux and Windows.[22]
- On 26 November 2009, NVIDIA released drivers for OpenCL 1.0 (rev 48).
- The Apple,[23] Nvidia,[24] RapidMind[25] and Gallium3D[26] implementations of OpenCL are all based on the LLVM compiler technology and use the Clang compiler as their frontend.
- On 27 October 2009, S3 released its first product supporting native OpenCL 1.0: the Chrome 5400E embedded graphics processor.[27]
- On 10 December 2009, VIA released its first product supporting OpenCL 1.0: the ChromotionHD 2.0 video processor included in the VN1000 chipset.[28]
- On 21 December 2009, AMD released the production version of the ATI Stream SDK 2.0,[29] which provides OpenCL 1.0 support for R800 GPUs and beta support for R700 GPUs.
- On 1 June 2010, ZiiLABS released details of its first OpenCL implementation for the ZMS processor for handheld, embedded and digital home products.[30]
- On 30 June 2010, IBM released a fully conformant version of OpenCL 1.0.[31]
- On 13 September 2010, Intel released details of its first OpenCL implementation for the Sandy Bridge chip architecture, which integrates Intel's newest graphics chip technology directly onto the central processing unit.[32]
- On 15 November 2010, Wolfram Research released Mathematica 8 with OpenCLLink package.
- On 3 March 2011, the Khronos Group announced the formation of the WebCL working group to explore defining a JavaScript binding to OpenCL. This creates the potential to harness GPU and multi-core CPU parallel processing from a Web browser.[33][34]
- On 31 March 2011, IBM released a fully conformant version of OpenCL 1.1.[31][35]
- On 25 April 2011, IBM released OpenCL Common Runtime v0.1 for Linux on x86 Architecture.[36]
- On 4 May 2011, Nokia Research released an open source WebCL extension for the Firefox web browser, providing a JavaScript binding to OpenCL.[37]
- On 1 July 2011, Samsung Electronics released an open source prototype implementation of WebCL for WebKit, providing a JavaScript binding to OpenCL.[38]
- On 8 August 2011, AMD released the OpenCL-driven AMD Accelerated Parallel Processing (APP) Software Development Kit (SDK) v2.5, replacing the ATI Stream SDK.[39]
- On 12 December 2011, AMD released AMD APP SDK v2.6[40] which contained a preview of OpenCL 1.2.
OpenCL language
The programming language used to write computation kernels is based on C99 with some limitations and additions. It omits function pointers, recursion, bit fields, variable-length arrays, and the standard C99 header files.[41] To make parallelism easy to express, the language adds vector types and operations, synchronization, and functions for working with work-items and work-groups.[42] It has the memory region qualifiers __global, __local, __constant, and __private, and it also adds many built-in functions.
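The work-item functions relate to each other in a fixed way: in a one-dimensional range with no global offset, get_global_id(0) equals get_group_id(0) * get_local_size(0) + get_local_id(0). The plain-C sketch below emulates that indexing serially on the CPU to show how a data-parallel kernel body is applied once per work-item. It is not OpenCL C, and the names struct work_item, vec_add_item and run_ndrange_1d are hypothetical stand-ins for the built-ins; a real device may run work-items in parallel and in any order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the IDs an OpenCL device gives each work-item. */
struct work_item {
    size_t global_id, local_id, group_id, local_size;
};

/* Body of a data-parallel "kernel": each work-item adds one pair of elements.
 * A comparable OpenCL C kernel (hypothetical name vec_add) would be:
 *   __kernel void vec_add(__global const float *a, __global const float *b,
 *                         __global float *c)
 *   { size_t i = get_global_id(0); c[i] = a[i] + b[i]; }
 */
static void vec_add_item(const struct work_item *wi,
                         const float *a, const float *b, float *c)
{
    size_t i = wi->global_id;
    c[i] = a[i] + b[i];
}

/* Serially emulates a 1-D N-D range of global_size work-items split into
 * work-groups of local_size work-items (no global offset). */
void run_ndrange_1d(size_t global_size, size_t local_size,
                    const float *a, const float *b, float *c)
{
    /* OpenCL 1.x requires the global size to be a multiple of the local size */
    assert(local_size > 0 && global_size % local_size == 0);
    for (size_t g = 0; g < global_size / local_size; ++g) {
        for (size_t l = 0; l < local_size; ++l) {
            /* global ID = group ID * local size + local ID */
            struct work_item wi = { g * local_size + l, l, g, local_size };
            vec_add_item(&wi, a, b, c);
        }
    }
}
```

Because the emulation runs work-items in a fixed serial order, it cannot expose the ordering and synchronization issues (handled with barrier() inside a work-group) that arise on a real device.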
Example: computing the FFT
This example loads a fast Fourier transform (FFT) implementation and executes it.[43] The host code that sets up and launches the kernel comes first; the kernel itself follows.
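#include <CL/cl.h> // on Mac OS X: #include <OpenCL/opencl.h>
// Error checking is omitted. srcA (num_entries complex input values stored as
// interleaved floats), dst (a hypothetical output array of the same size),
// num_entries (assumed to be a multiple of 1024) and fft1D_1024_kernel_src
// (the kernel below as a const char *) are assumed to be declared elsewhere.
// create a compute context with a GPU device (first platform, first GPU)
cl_platform_id platform;
cl_device_id device;
clGetPlatformIDs(1, &platform, NULL);
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
// create a command queue on that device
cl_command_queue queue = clCreateCommandQueue(context, device, 0, NULL);
// allocate the buffer memory objects (each complex value is two floats)
cl_mem memobjs[2];
memobjs[0] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(float)*2*num_entries, srcA, NULL);
memobjs[1] = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(float)*2*num_entries, NULL, NULL);
// create the compute program
cl_program program = clCreateProgramWithSource(context, 1, &fft1D_1024_kernel_src, NULL, NULL);
// build the compute program executable
clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
// create the compute kernel
cl_kernel kernel = clCreateKernel(program, "fft1D_1024", NULL);
// choose the work sizes before setting the args, since args 2 and 3 depend on them.
// Each work-group of 64 work-items transforms one 1024-point block
// (16 points per work-item), so num_entries / 16 work-items are launched.
size_t local_work_size[1] = { 64 };
size_t global_work_size[1] = { num_entries / 16 };
// set the args values; args 2 and 3 are __local buffers, so only a size is given
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobjs[0]);
clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&memobjs[1]);
clSetKernelArg(kernel, 2, sizeof(float)*(local_work_size[0]+1)*16, NULL);
clSetKernelArg(kernel, 3, sizeof(float)*(local_work_size[0]+1)*16, NULL);
// execute the kernel over a one-dimensional N-D range
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_work_size, local_work_size, 0, NULL, NULL);
// copy the result back to the host (blocking read)
clEnqueueReadBuffer(queue, memobjs[1], CL_TRUE, 0, sizeof(float)*2*num_entries, dst, 0, NULL, NULL);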
The kernel that performs the actual calculation (based on "Fitting FFT onto the G80 Architecture"):[44]
// This kernel computes an FFT of length 1024. The 1024-point FFT is decomposed into
// calls to a radix-16 function, another radix-16 function and then a radix-4 function.
// globalLoads, fftRadix16Pass, twiddleFactorMul, localShuffle, fftRadix4Pass and
// globalStores are helper functions that are not shown here.
__kernel void fft1D_1024 (__global float2 *in, __global float2 *out,
                          __local float *sMemx, __local float *sMemy) {
    int tid = get_local_id(0);
    int blockIdx = get_group_id(0) * 1024 + tid;
    float2 data[16];
    // starting index of data to/from global memory
    in = in + blockIdx; out = out + blockIdx;
    globalLoads(data, in, 64); // coalesced global reads
    fftRadix16Pass(data);      // in-place radix-16 pass
    twiddleFactorMul(data, tid, 1024, 0);
    // local shuffle using local memory
    localShuffle(data, sMemx, sMemy, tid, (((tid & 15) * 65) + (tid >> 4)));
    fftRadix16Pass(data);               // in-place radix-16 pass
    twiddleFactorMul(data, tid, 64, 4); // twiddle factor multiplication
    localShuffle(data, sMemx, sMemy, tid, (((tid >> 4) * 64) + (tid & 15)));
    // four radix-4 function calls
    fftRadix4Pass(data);      // radix-4 function number 1
    fftRadix4Pass(data + 4);  // radix-4 function number 2
    fftRadix4Pass(data + 8);  // radix-4 function number 3
    fftRadix4Pass(data + 12); // radix-4 function number 4
    // coalesced global writes
    globalStores(data, out, 64);
}
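The kernel's indexing implies a fixed division of labour: each work-group of 64 work-items transforms one 1024-point block, and each work-item holds 16 points in its private data array. Assuming, as the stride argument suggests, that globalLoads(data, in, 64) reads data[i] = in[i * 64] for i = 0..15 (its body is not shown), the plain-C sketch below checks that this pattern touches every point of a block exactly once, and that at each step neighbouring work-items read neighbouring addresses, which is what allows the loads to be coalesced. The names load_index and check_load_pattern are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define FFT_N 1024   /* points per work-group (one FFT) */
#define WG_SIZE 64   /* work-items per work-group */
#define PER_ITEM 16  /* points held by each work-item */

/* Index within the block that work-item tid loads into data[i], assuming
 * globalLoads(data, in, 64) reads in[i * 64] after `in` was advanced by tid. */
static int load_index(int tid, int i)
{
    return tid + i * WG_SIZE;
}

/* Returns true if the assumed load pattern covers all FFT_N points exactly
 * once and adjacent work-items read adjacent elements at every step. */
bool check_load_pattern(void)
{
    assert(WG_SIZE * PER_ITEM == FFT_N);
    int hits[FFT_N] = {0};
    for (int tid = 0; tid < WG_SIZE; ++tid)
        for (int i = 0; i < PER_ITEM; ++i)
            hits[load_index(tid, i)]++;
    for (int k = 0; k < FFT_N; ++k)
        if (hits[k] != 1)
            return false; /* a point was skipped or loaded twice */
    for (int i = 0; i < PER_ITEM; ++i)
        for (int tid = 0; tid + 1 < WG_SIZE; ++tid)
            if (load_index(tid + 1, i) != load_index(tid, i) + 1)
                return false; /* neighbouring work-items are not contiguous */
    return true;
}
```

The same reasoning applies to globalStores(data, out, 64), which uses the same stride for the writes.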
A full, open source implementation of an OpenCL FFT can be found on Apple's website.[45]
OpenCL conformant products
The Khronos Group maintains an extended list of OpenCL conformant products; see OpenCL Conformant Products.
Synopsis of OpenCL conformant products[46]
AMD APP SDK (supports OpenCL CPU and accelerated processing unit devices):
- X86 + SSE2 (or higher) compatible CPUs, 64-bit and 32-bit;[47] Linux 2.6 PC, Windows Vista/7 PC
- AMD Fusion E-350, E-240, C-50, C-30 with HD 6310/HD 6250
- AMD Radeon/Mobility HD 6800, HD 5x00 series GPU, iGPU HD 6310/HD 6250
- ATI FirePro Vx800 series GPU
Intel OpenCL SDK 1.1[48] (supports only Intel Core-based CPU devices):
- Intel CPUs with SSE 4.1, SSE 4.2 or AVX support;[49][50] Microsoft Windows, Linux
- Intel Core i7, i5, i3; 2nd Generation Intel Core i7/5/3
- Intel Core 2 Solo, Duo, Quad, Extreme
- Intel Xeon 7x00, 5x00, 3x00 (Core based)
IBM servers with OpenCL Development Kit for Linux on Power, running on Power VSX:[51][52]
- IBM Power 755 (PERCS), 750
- IBM BladeCenter PS70x Express
- IBM BladeCenter JS2x, JS43
- IBM BladeCenter QS22
IBM OpenCL Common Runtime (OCR):[53]
- X86 + SSE2 (or higher) compatible CPUs, 64-bit and 32-bit;[54] Linux 2.6 PC
- AMD Fusion, NVIDIA ION and Intel Core i7, i5, i3; 2nd Generation Intel Core i7/5/3
- AMD Radeon, NVIDIA GeForce and Intel Core 2 Solo, Duo, Quad, Extreme
- ATI FirePro, NVIDIA Quadro and Intel Xeon 7x00, 5x00, 3x00 (Core based)
NVIDIA OpenCL Driver and Tools:[55]
- NVIDIA Tesla C/D/S
- NVIDIA GeForce GTS/GT/GTX
- NVIDIA ION
- NVIDIA Quadro FX/NVX/Plex
References
- ^ Jääskeläinen, Pekka O.; de La Lama, Carlos S.; Huerta, Pablo; Takala, Jarmo H. (July 2010). "OpenCL-based design methodology for application-specific processors". 2010 International Conference on Embedded Computer Systems (SAMOS) (IEEE): 223–230. doi:10.1109/ICSAMOS.2010.5642061. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5642061. Retrieved 17 February 2011.
- ^ "Jobs at Altera". Archived from the original on 21 July 2011. http://web.archive.org/web/20110721110044/http://tbe.taleo.net/NA3/ats/careers/requisition.jsp?org=ALTERA&cws=1&rid=938.
- ^ "Khronos Launches Heterogeneous Computing Initiative" (Press release). Khronos Group. 16 June 2008. http://www.khronos.org/news/press/releases/khronos_launches_heterogeneous_computing_initiative/. Retrieved 18 June 2008.
- ^ "OpenCL gets touted in Texas". MacWorld. 20 November 2008. http://www.macworld.com/article/136921/2008/11/opencl.html?lsrc=top_2. Retrieved 12 June 2009.
- ^ "The Khronos Group Releases OpenCL 1.0 Specification" (Press release). Khronos Group. 8 December 2008. http://www.khronos.org/news/press/releases/the_khronos_group_releases_opencl_1.0_specification/. Retrieved 12 June 2009.
- ^ "Apple Previews Mac OS X Snow Leopard to Developers" (Press release). Apple Inc. 9 June 2008. http://www.apple.com/pr/library/2008/06/09snowleopard.html. Retrieved 9 June 2008.
- ^ "AMD Drives Adoption of Industry Standards in GPGPU Software Development" (Press release). AMD. 6 August 2008. http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~127451,00.html. Retrieved 14 August 2008.
- ^ "AMD Backs OpenCL, Microsoft DirectX 11". eWeek. 6 August 2008. http://www.eweek.com/c/a/Desktops-and-Notebooks/AMD-Backing-OpenCL-and-Microsoft-DirectX-11/. Retrieved 14 August 2008.
- ^ "HPCWire: RapidMind Embraces Open Source and Standards Projects". HPCWire. 10 November 2008. http://www.hpcwire.com/topic/applications/RapidMind_Embraces_Open_Source_and_Standards_Projects.html. Retrieved 11 November 2008.
- ^ "NVIDIA Adds OpenCL To Its Industry Leading GPU Computing Toolkit" (Press release). Nvidia. 9 December 2008. http://www.nvidia.com/object/io_1228825271885.html. Retrieved 10 December 2008.
- ^ "OpenCL Development Kit for Linux on Power". alphaWorks. 30 October 2009. http://www.alphaworks.ibm.com/tech/opencl. Retrieved 30 October 2009.
- ^ Khronos Drives Momentum of Parallel Computing Standard with Release of OpenCL 1.1 Specification
- ^ Khronos Releases OpenCL 1.2 Specification
- ^ "OpenCL Demo, AMD CPU". 10 December 2008. http://www.youtube.com/watch?v=sLv_fhQlqis. Retrieved 28 March 2009.
- ^ "OpenCL Demo, NVIDIA GPU". 10 December 2008. http://www.youtube.com/watch?v=PJ1jydg8mLg. Retrieved 28 March 2009.
- ^ "Imagination Technologies launches advanced, highly-efficient POWERVR SGX543MP multi-processor graphics IP family". Imagination Technologies. 19 March 2009. http://www.imgtec.com/News/Release/index.asp?NewsID=449. Retrieved 30 January 2011.
- ^ "AMD and Havok demo OpenCL accelerated physics". PC Perspective. 26 March 2009. http://www.pcper.com/comments.php?nid=6954. Retrieved 28 March 2009.
- ^ "NVIDIA Releases OpenCL Driver To Developers". NVIDIA. 20 April 2009. http://www.nvidia.com/object/io_1240224603372.html. Retrieved 27 April 2009.
- ^ "AMD does reverse GPGPU, announces OpenCL SDK for x86". Ars Technica. 5 August 2009. http://arst.ch/5te. Retrieved 6 August 2009.
- ^ Dan Moren; Jason Snell (8 June 2009). "Live Update: WWDC 2009 Keynote". macworld.com. MacWorld. http://www.macworld.com/article/140897/2009/06/keynote.html. Retrieved 12 June 2009.
- ^ "Mac OS X Snow Leopard – Technical specifications and system requirements". Apple Inc. 23 March 2011. http://www.apple.com/macosx/specs.html. Retrieved 23 March 2011.
- ^ "ATI Stream Software Development Kit (SDK) v2.0 Beta Program". http://developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/default.aspx#one. Retrieved 14 October 2009.
- ^ "Apple entry on LLVM Users page". http://llvm.org/Users.html#Apple. Retrieved 29 August 2009.
- ^ "Nvidia entry on LLVM Users page". http://llvm.org/Users.html. Retrieved 6 August 2009.
- ^ "Rapidmind entry on LLVM Users page". http://llvm.org/Users.html. Retrieved 1 October 2009.
- ^ "Zack Rusin's blog post about the Gallium3D OpenCL implementation". http://zrusin.blogspot.com/2009/02/opencl.html. Retrieved 1 October 2009.
- ^ "S3 Graphics launched the Chrome 5400E embedded graphics processor". http://www.s3graphics.com/en/news/news_detail.aspx?id=44. Retrieved 27 October 2009.
- ^ "VIA Brings Enhanced VN1000 Graphics Processor". http://www.via.com.tw/en/resources/pressroom/pressrelease.jsp?press_release_no=4327. Retrieved 10 December 2009.
- ^ "ATI Stream SDK v2.0 with OpenCL 1.0 Support". http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx. Retrieved 23 October 2009.
- ^ http://www.ziilabs.com/opencl
- ^ a b "Khronos Group Conformant Products". http://www.khronos.org/adopters/conformant-products/#topencl.
- ^ "Intel discloses new Sandy Bridge technical details". http://news.cnet.com/8301-13924_3-20016302-64.html. Retrieved 13 September 2010.
- ^ WebCL related stories
- ^ Khronos Releases Final WebGL 1.0 Specification
- ^ "OpenCL Development Kit for Linux on Power". http://www.alphaworks.ibm.com/tech/opencl.
- ^ "About the OpenCL Common Runtime for Linux on x86 Architecture". https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/Wbf059a58a9b9_459d_aca4_493655c96370/page/OpenCL%20Common%20Runtime.
- ^ Nokia Research releases WebCL prototype
- ^ Samsung's WebCL Prototype for WebKit
- ^ [1]
- ^ AMD APP SDK v2.6
- ^ AMD. Introduction to OpenCL Programming 201005, pp. 89-90.
- ^ AMD. Introduction to OpenCL Programming 201005, pp. 89-90.
- ^ "OpenCL". SIGGRAPH2008. 14 August 2008. http://s08.idav.ucdavis.edu/munshi-opencl.pdf. Retrieved 14 August 2008.
- ^ "Fitting FFT onto G80 Architecture" (PDF). Vasily Volkov and Brian Kazian, UC Berkeley CS258 project report. May 2008. http://www.cs.berkeley.edu/~kubitron/courses/cs258-S08/projects/reports/project6_report.pdf. Retrieved 14 November 2008.
- ^ "OpenCL on FFT". Apple. 16 November 2009. https://developer.apple.com/mac/library/samplecode/OpenCL_FFT/index.html. Retrieved 7 December 2009.
- ^ "Conformant Products". http://www.khronos.org/conformance/adopters/conformant-products/. Retrieved 11 August 2011.
- ^ "OpenCL and the AMD APP SDK". AMD Developer Central. developer.amd.com. http://developer.amd.com/documentation/articles/pages/OpenCL-and-the-AMD-APP-SDK.aspx. Retrieved 11 August 2011.
- ^ "About Intel OpenCL SDK 1.1". software.intel.com. intel.com. http://software.intel.com/en-us/articles/opencl-sdk/. Retrieved 11 August 2011.
- ^ "Product Support". http://software.intel.com/en-us/articles/opencl-sdk-frequently-asked-questions/#12. Retrieved 11 August 2011.
- ^ "Intel OpenCL SDK - Release Notes". http://software.intel.com/en-us/articles/opencl-release-notes/. Retrieved 11 August 2011.
- ^ "Announcing OpenCL Development Kit for Linux on Power v0.3". http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14600651&tstart=0. Retrieved 11 August 2011.
- ^ "IBM releases OpenCL Development Kit for Linux on Power v0.3 - OpenCL 1.1 conformant release available". OpenCL Lounge. ibm.com. https://www.ibm.com/developerworks/mydeveloperworks/blogs/80367538-d04a-47cb-9463-428643140bf1/entry/ibm_releases_opencl_development_kit_for_linux_on_power_v0_3_opencl_1_1_conformant_release_available6?lang=en. Retrieved 11 August 2011.
- ^ "IBM releases OpenCL Common Runtime for Linux on x86 Architecture". https://www.ibm.com/developerworks/mydeveloperworks/blogs/80367538-d04a-47cb-9463-428643140bf1/entry/ibm_releases_opencl_common_runtime_for_linux_on_x86_architecture4?lang=en. Retrieved 10 September 2011.
- ^ "OpenCL and the AMD APP SDK". AMD Developer Central. developer.amd.com. http://developer.amd.com/documentation/articles/pages/OpenCL-and-the-AMD-APP-SDK.aspx. Retrieved 10 September 2011.
- ^ "Nvidia Releases OpenCL Driver". http://www.tomshardware.com/news/Nvidia-Cuda-OpenCL-SDK,7596.html. Retrieved 11 August 2011.