OpenCL

Original author(s) Apple Inc.
Developer(s) Khronos Group
Stable release 1.2 / 15 November 2011
Operating system Cross-platform
Type API
License Royalty Free
Website www.khronos.org/opencl
www.khronos.org/webcl

OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus APIs that are used to define and then control the platforms. OpenCL provides parallel computing using task-based and data-based parallelism. It has been adopted by Intel, AMD, Nvidia, and ARM.

OpenCL gives any application access to the graphics processing unit (GPU) for non-graphical computing, extending the use of the GPU beyond graphics (general-purpose computing on graphics processing units). Academic researchers have investigated automatically compiling OpenCL programs into application-specific processors running on FPGAs,[1] and commercial FPGA vendors are developing tools to translate OpenCL to run on their FPGA devices.[2]

OpenCL is analogous to the open industry standards OpenGL and OpenAL, for 3D graphics and computer audio, respectively. OpenCL is managed by the non-profit technology consortium Khronos Group.

History

OpenCL was initially developed by Apple Inc., which holds trademark rights, and refined into an initial proposal in collaboration with technical teams at AMD, IBM, Intel, and Nvidia. Apple submitted this initial proposal to the Khronos Group. On 16 June 2008 the Khronos Compute Working Group was formed[3] with representatives from CPU, GPU, embedded-processor, and software companies. This group worked for five months to finish the technical details of the specification for OpenCL 1.0 by 18 November 2008.[4] This technical specification was reviewed by the Khronos members and approved for public release on 8 December 2008.[5]

OpenCL 1.0

OpenCL 1.0 was released with Mac OS X Snow Leopard. According to an Apple press release:[6]

Snow Leopard further extends support for modern hardware with Open Computing Language (OpenCL), which lets any application tap into the vast gigaflops of GPU computing power previously available only to graphics applications. OpenCL is based on the C programming language and has been proposed as an open standard.

AMD decided to support OpenCL (and DirectX 11) instead of the now-deprecated Close to Metal in its Stream framework.[7][8] RapidMind announced its adoption of OpenCL underneath its development platform to support GPUs from multiple vendors with one interface.[9] On 9 December 2008, Nvidia announced its intention to add full support for the OpenCL 1.0 specification to its GPU Computing Toolkit.[10] On 30 October 2009, IBM released its first OpenCL implementation as a part of the XL compilers.[11]

OpenCL 1.1

OpenCL 1.1 was ratified by the Khronos Group on 14 June 2010.[12] It adds significant functionality for enhanced parallel programming flexibility and performance.

OpenCL 1.2

On 15 November 2011, the Khronos Group announced the OpenCL 1.2 specification,[13] which added significant functionality over the previous versions in terms of performance and features for parallel programming.

The OpenCL specification is under development at Khronos, which any interested company may join.

Implementation

OpenCL in Snow Leopard is supported on the NVIDIA GeForce 320M, GeForce GT 330M, GeForce 9400M, GeForce 9600M GT, GeForce 8600M GT, GeForce GT 120, GeForce GT 130, GeForce GTX 285, GeForce 8800 GT, GeForce 8800 GS, Quadro FX 4800, Quadro FX 5600, ATI Radeon HD 4670, ATI Radeon HD 4850, ATI Radeon HD 4870, ATI Radeon HD 5670, ATI Radeon HD 5750, ATI Radeon HD 5770 and ATI Radeon HD 5870.[21]
The Apple,[23] Nvidia,[24] RapidMind[25] and Gallium3D[26] implementations of OpenCL are all based on the LLVM compiler technology and use the Clang compiler as their frontend.

OpenCL language

The programming language used to write computation kernels is based on C99, with some restrictions and additions. It omits function pointers, recursion, bit fields, variable-length arrays, and the standard C99 header files.[41] The language is extended to make parallelism easy to use, with vector types and operations, synchronization, and functions for working with work-items and work-groups.[42] It has memory region qualifiers: __global, __local, __constant, and __private. Many built-in functions are also added.
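
As an illustration of the kernel style described above, the following is a minimal sketch in plain C that emulates on the host how a kernel body is applied once per work-item. The empty macros kernel and global (OpenCL C accepts these spellings as well as __kernel and __global), the get_global_id stub, and the names scale and run_scale are all assumptions made for this example so that it compiles without an OpenCL runtime; they are not part of any real implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the kernel below compiles as ordinary C (illustration only).
 * In OpenCL C, kernel and global are language keywords and get_global_id()
 * is a built-in work-item function. */
#define kernel
#define global
static size_t emulated_global_id;

size_t get_global_id(unsigned int dim)
{
    (void)dim;                 /* only dimension 0 is emulated here */
    return emulated_global_id;
}

/* Kernel body written in OpenCL C style: each work-item scales one element. */
kernel void scale(global const float *in, global float *out, float factor)
{
    size_t i = get_global_id(0);
    out[i] = factor * in[i];
}

/* Hypothetical host-side driver: runs the kernel once per work-item,
 * sequentially, where an OpenCL device would run work-items in parallel. */
void run_scale(const float *in, float *out, float factor, size_t n)
{
    assert(in != NULL && out != NULL);
    for (emulated_global_id = 0; emulated_global_id < n; ++emulated_global_id)
        scale(in, out, factor);
}
```

Calling run_scale(in, out, 2.0f, 4) on a four-element input doubles each element. On a real device the loop disappears: the runtime launches one work-item per index and each work-item obtains its own index from get_global_id.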

Example: computing the FFT

This example loads a fast Fourier transform (FFT) implementation and executes it. The FFT implementation is presented below.[43]

  // create a compute context with GPU device
  context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);
 
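  // create a command queue on the first device in the context
  // (a NULL device is invalid; devices is assumed to be a cl_device_id array declared elsewhere)
  clGetContextInfo(context, CL_CONTEXT_DEVICES, sizeof(devices), devices, NULL);
  queue = clCreateCommandQueue(context, devices[0], 0, NULL);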
 
  // allocate the buffer memory objects
  memobjs[0] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(float)*2*num_entries, srcA, NULL);
  memobjs[1] = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(float)*2*num_entries, NULL, NULL);
 
  // create the compute program
  program = clCreateProgramWithSource(context, 1, &fft1D_1024_kernel_src, NULL, NULL);
 
  // build the compute program executable
  clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
 
  // create the compute kernel
  kernel = clCreateKernel(program, "fft1D_1024", NULL);
 
  // choose the work sizes first, because the local-memory arguments below depend on local_work_size
  global_work_size[0] = num_entries;
  local_work_size[0] = 64;
 
  // set the args values (arguments 2 and 3 reserve __local memory, so their value pointer is NULL)
  clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobjs[0]);
  clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&memobjs[1]);
  clSetKernelArg(kernel, 2, sizeof(float)*(local_work_size[0]+1)*16, NULL);
  clSetKernelArg(kernel, 3, sizeof(float)*(local_work_size[0]+1)*16, NULL);
 
  // execute the kernel over a one-dimensional N-D range of work-items
  clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_work_size, local_work_size, 0, NULL, NULL);
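
The sizes used in the host code can be checked with a little arithmetic. The sketch below, in plain C with no OpenCL dependency, computes the number of work-groups in the launch and the byte size requested for each __local buffer. The function names work_group_count and local_buffer_bytes are hypothetical, introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups in a one-dimensional N-D range. OpenCL 1.x requires
 * the global work size to be a multiple of the local work size. */
size_t work_group_count(size_t global_work_size, size_t local_work_size)
{
    assert(local_work_size > 0);
    assert(global_work_size % local_work_size == 0);
    return global_work_size / local_work_size;
}

/* Bytes requested for each __local float buffer (kernel arguments 2 and 3),
 * mirroring sizeof(float)*(local_work_size[0]+1)*16 in the host code. */
size_t local_buffer_bytes(size_t local_work_size)
{
    return sizeof(float) * (local_work_size + 1) * 16;
}
```

With a local work size of 64, each local buffer holds 65 × 16 = 1040 floats, and a global work size of 1024 (an assumed value for num_entries) gives 16 work-groups. Because the kernel keeps 16 values per work-item, a single 1024-point transform needs only 64 work-items; the source does not state how num_entries relates to the number of transforms, so the value used here is purely illustrative.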

The actual calculation (based on Fitting FFT onto the G80 Architecture):[44]

  // This kernel computes FFT of length 1024. The 1024 length FFT is decomposed into
  // calls to a radix 16 function, another radix 16 function and then a radix 4 function
 
  __kernel void fft1D_1024 (__global float2 *in, __global float2 *out,
                          __local float *sMemx, __local float *sMemy) {
    int tid = get_local_id(0);
    int blockIdx = get_group_id(0) * 1024 + tid;
    float2 data[16];
 
    // starting index of data to/from global memory
    in = in + blockIdx;  out = out + blockIdx;
 
    globalLoads(data, in, 64); // coalesced global reads
    fftRadix16Pass(data);      // in-place radix-16 pass
    twiddleFactorMul(data, tid, 1024, 0);
 
    // local shuffle using local memory
    localShuffle(data, sMemx, sMemy, tid, (((tid & 15) * 65) + (tid >> 4)));
    fftRadix16Pass(data);               // in-place radix-16 pass
    twiddleFactorMul(data, tid, 64, 4); // twiddle factor multiplication
 
    localShuffle(data, sMemx, sMemy, tid, (((tid >> 4) * 64) + (tid & 15)));
 
    // four radix-4 function calls
    fftRadix4Pass(data);      // radix-4 function number 1
    fftRadix4Pass(data + 4);  // radix-4 function number 2
    fftRadix4Pass(data + 8);  // radix-4 function number 3
    fftRadix4Pass(data + 12); // radix-4 function number 4
 
    // coalesced global writes
    globalStores(data, out, 64);
  }
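
The index expressions in the kernel can be evaluated on the host to see which offsets each work-item uses. The following plain C sketch reproduces only the arithmetic that appears in the kernel. The function names are hypothetical, and what globalLoads and localShuffle do with these offsets is not shown in the source, so nothing is assumed about it here.

```c
#include <assert.h>

/* Offset of a work-item's first element in global memory, as in
 * blockIdx = get_group_id(0) * 1024 + tid. */
int start_index(int group_id, int tid)
{
    return group_id * 1024 + tid;
}

/* Offset passed to the first localShuffle call. */
int shuffle_offset_first(int tid)
{
    assert(tid >= 0 && tid < 64);
    return ((tid & 15) * 65) + (tid >> 4);
}

/* Offset passed to the second localShuffle call. */
int shuffle_offset_second(int tid)
{
    assert(tid >= 0 && tid < 64);
    return ((tid >> 4) * 64) + (tid & 15);
}
```

For the 64 work-items of a group, the first expression ranges from 0 to 978 and the second from 0 to 207. The stride of 65 in the first expression appears to correspond to the (local_work_size[0]+1) factor used when the host code sizes the local buffers, and both maxima lie below the 1040 floats reserved for each buffer.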

A full, open source implementation of an OpenCL FFT can be found on Apple's website.[45]

OpenCL conformant products

The Khronos Group maintains an extended list of OpenCL conformant products; see OpenCL Conformant Products.

Synopsis of OpenCL conformant products[46]

AMD APP SDK (supports OpenCL CPU and accelerated processing unit devices)
  Requirements: X86 + SSE2 (or higher) compatible CPUs, 64-bit and 32-bit;[47] Linux 2.6 PC, Windows Vista/7 PC
  Devices: AMD Fusion E-350, E-240, C-50, C-30 with HD 6310/HD 6250; AMD Radeon/Mobility HD 6800, HD 5x00 series GPU, iGPU HD 6310/HD 6250; ATI FirePro Vx800 series GPU

Intel OpenCL SDK 1.1[48] (supports only OpenCL Intel Core based CPU device)
  Requirements: Intel CPUs with SSE 4.1, SSE 4.2 or AVX support;[49][50] Microsoft Windows, Linux
  Devices: Intel Core i7, i5, i3; 2nd Generation Intel Core i7/5/3; Intel Core 2 Solo, Duo, Quad, Extreme; Intel Xeon 7x00, 5x00, 3x00 (Core based)

IBM Servers with OpenCL Development Kit for Linux on Power, running on Power VSX[51][52]
  Devices: IBM Power 755 (PERCS), 750; IBM BladeCenter PS70x Express; IBM BladeCenter JS2x, JS43; IBM BladeCenter QS22

IBM OpenCL Common Runtime (OCR)[53]
  Requirements: X86 + SSE2 (or higher) compatible CPUs, 64-bit and 32-bit;[54] Linux 2.6 PC
  Devices: AMD Fusion, NVIDIA ION and Intel Core i7, i5, i3; 2nd Generation Intel Core i7/5/3; AMD Radeon, NVIDIA GeForce and Intel Core 2 Solo, Duo, Quad, Extreme; ATI FirePro, NVIDIA Quadro and Intel Xeon 7x00, 5x00, 3x00 (Core based)

NVIDIA OpenCL Driver and Tools[55]
  Devices: NVIDIA Tesla C/D/S; NVIDIA GeForce GTS/GT/GTX; NVIDIA ION; NVIDIA Quadro FX/NVX/Plex

References

  1. ^ Jääskeläinen, Pekka O.; de La Lama, Carlos S.; Huerta, Pablo; Takala, Jarmo H. (July 2010). "OpenCL-based design methodology for application-specific processors". 2010 International Conference on Embedded Computer Systems (SAMOS) (IEEE): 223–230. doi:10.1109/ICSAMOS.2010.5642061. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5642061. Retrieved 17 February 2011. 
  2. ^ "Jobs at Altera". Archived from the original on 21 July 2011. http://web.archive.org/web/20110721110044/http://tbe.taleo.net/NA3/ats/careers/requisition.jsp?org=ALTERA&cws=1&rid=938. 
  3. ^ "Khronos Launches Heterogeneous Computing Initiative" (Press release). Khronos Group. 16 June 2008. http://www.khronos.org/news/press/releases/khronos_launches_heterogeneous_computing_initiative/. Retrieved 18 June 2008. 
  4. ^ "OpenCL gets touted in Texas". MacWorld. 20 November 2008. http://www.macworld.com/article/136921/2008/11/opencl.html?lsrc=top_2. Retrieved 12 June 2009. 
  5. ^ "The Khronos Group Releases OpenCL 1.0 Specification" (Press release). Khronos Group. 8 December 2008. http://www.khronos.org/news/press/releases/the_khronos_group_releases_opencl_1.0_specification/. Retrieved 12 June 2009. 
  6. ^ "Apple Previews Mac OS X Snow Leopard to Developers" (Press release). Apple Inc. 9 June 2008. http://www.apple.com/pr/library/2008/06/09snowleopard.html. Retrieved 9 June 2008. 
  7. ^ "AMD Drives Adoption of Industry Standards in GPGPU Software Development" (Press release). AMD. 6 August 2008. http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~127451,00.html. Retrieved 14 August 2008. 
  8. ^ "AMD Backs OpenCL, Microsoft DirectX 11". eWeek. 6 August 2008. http://www.eweek.com/c/a/Desktops-and-Notebooks/AMD-Backing-OpenCL-and-Microsoft-DirectX-11/. Retrieved 14 August 2008. 
  9. ^ "HPCWire: RapidMind Embraces Open Source and Standards Projects". HPCWire. 10 November 2008. http://www.hpcwire.com/topic/applications/RapidMind_Embraces_Open_Source_and_Standards_Projects.html. Retrieved 11 November 2008. 
  10. ^ "NVIDIA Adds OpenCL To Its Industry Leading GPU Computing Toolkit" (Press release). Nvidia. 9 December 2008. http://www.nvidia.com/object/io_1228825271885.html. Retrieved 10 December 2008. 
  11. ^ "OpenCL Development Kit for Linux on Power". alphaWorks. 30 October 2009. http://www.alphaworks.ibm.com/tech/opencl. Retrieved 30 October 2009. 
  12. ^ Khronos Drives Momentum of Parallel Computing Standard with Release of OpenCL 1.1 Specification
  13. ^ Khronos Releases OpenCL 1.2 Specification
  14. ^ "OpenCL Demo, AMD CPU". 10 December 2008. http://www.youtube.com/watch?v=sLv_fhQlqis. Retrieved 28 March 2009. 
  15. ^ "OpenCL Demo, NVIDIA GPU". 10 December 2008. http://www.youtube.com/watch?v=PJ1jydg8mLg. Retrieved 28 March 2009. 
  16. ^ "Imagination Technologies launches advanced, highly-efficient POWERVR SGX543MP multi-processor graphics IP family". Imagination Technologies. 19 March 2009. http://www.imgtec.com/News/Release/index.asp?NewsID=449. Retrieved 30 January 2011. 
  17. ^ "AMD and Havok demo OpenCL accelerated physics". PC Perspective. 26 March 2009. http://www.pcper.com/comments.php?nid=6954. Retrieved 28 March 2009. 
  18. ^ "NVIDIA Releases OpenCL Driver To Developers". NVIDIA. 20 April 2009. http://www.nvidia.com/object/io_1240224603372.html. Retrieved 27 April 2009. 
  19. ^ "AMD does reverse GPGPU, announces OpenCL SDK for x86". Ars Technica. 5 August 2009. http://arst.ch/5te. Retrieved 6 August 2009. 
  20. ^ Dan Moren; Jason Snell (8 June 2009). "Live Update: WWDC 2009 Keynote". macworld.com. MacWorld. http://www.macworld.com/article/140897/2009/06/keynote.html. Retrieved 12 June 2009. 
  21. ^ "Mac OS X Snow Leopard – Technical specifications and system requirements". Apple Inc. 23 March 2011. http://www.apple.com/macosx/specs.html. Retrieved 23 March 2011. 
  22. ^ "ATI Stream Software Development Kit (SDK) v2.0 Beta Program". http://developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/default.aspx#one. Retrieved 14 October 2009. 
  23. ^ "Apple entry on LLVM Users page". http://llvm.org/Users.html#Apple. Retrieved 29 August 2009. 
  24. ^ "Nvidia entry on LLVM Users page". http://llvm.org/Users.html. Retrieved 6 August 2009. 
  25. ^ "Rapidmind entry on LLVM Users page". http://llvm.org/Users.html. Retrieved 1 October 2009. 
  26. ^ "Zack Rusin's blog post about the Gallium3D OpenCL implementation". http://zrusin.blogspot.com/2009/02/opencl.html. Retrieved 1 October 2009. 
  27. ^ "S3 Graphics launched the Chrome 5400E embedded graphics processor". http://www.s3graphics.com/en/news/news_detail.aspx?id=44. Retrieved 27 October 2009. 
  28. ^ "VIA Brings Enhanced VN1000 Graphics Processor". http://www.via.com.tw/en/resources/pressroom/pressrelease.jsp?press_release_no=4327. Retrieved 10 December 2009. 
  29. ^ "ATI Stream SDK v2.0 with OpenCL 1.0 Support". http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx. Retrieved 23 October 2009. 
  30. ^ http://www.ziilabs.com/opencl
  31. ^ a b "Khronos Group Conformant Products". http://www.khronos.org/adopters/conformant-products/#topencl. 
  32. ^ "Intel discloses new Sandy Bridge technical details". http://news.cnet.com/8301-13924_3-20016302-64.html. Retrieved 13 September 2010. 
  33. ^ WebCL related stories
  34. ^ Khronos Releases Final WebGL 1.0 Specification
  35. ^ "OpenCL Development Kit for Linux on Power". http://www.alphaworks.ibm.com/tech/opencl. 
  36. ^ "About the OpenCL Common Runtime for Linux on x86 Architecture". https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/Wbf059a58a9b9_459d_aca4_493655c96370/page/OpenCL%20Common%20Runtime. 
  37. ^ Nokia Research releases WebCL prototype
  38. ^ Samsung's WebCL Prototype for WebKit
  39. ^ [1]
  40. ^ AMD APP SDK v2.6
  41. ^ AMD. Introduction to OpenCL Programming 201005, page 89-90
  42. ^ AMD. Introduction to OpenCL Programming 201005, page 89-90
  43. ^ "OpenCL". SIGGRAPH2008. 14 August 2008. http://s08.idav.ucdavis.edu/munshi-opencl.pdf. Retrieved 14 August 2008. 
  44. ^ "Fitting FFT onto G80 Architecture" (PDF). Vasily Volkov and Brian Kazian, UC Berkeley CS258 project report. May 2008. http://www.cs.berkeley.edu/~kubitron/courses/cs258-S08/projects/reports/project6_report.pdf. Retrieved 14 November 2008. 
  45. ^ "OpenCL on FFT". Apple. 16 November 2009. https://developer.apple.com/mac/library/samplecode/OpenCL_FFT/index.html. Retrieved 7 December 2009. 
  46. ^ "Conformant Products". http://www.khronos.org/conformance/adopters/conformant-products/. Retrieved 11 August 2011. 
  47. ^ "OpenCL and the AMD APP SDK". AMD Developer Central. developer.amd.com. http://developer.amd.com/documentation/articles/pages/OpenCL-and-the-AMD-APP-SDK.aspx. Retrieved 11 August 2011. 
  48. ^ "About Intel OpenCL SDK 1.1". software.intel.com. intel.com. http://software.intel.com/en-us/articles/opencl-sdk/. Retrieved 11 August 2011. 
  49. ^ "Product Support". http://software.intel.com/en-us/articles/opencl-sdk-frequently-asked-questions/#12. Retrieved 11 August 2011. 
  50. ^ "Intel OpenCL SDK - Release Notes". http://software.intel.com/en-us/articles/opencl-release-notes/. Retrieved 11 August 2011. 
  51. ^ "Announcing OpenCL Development Kit for Linux on Power v0.3". http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14600651&tstart=0. Retrieved 11 August 2011. 
  52. ^ "IBM releases OpenCL Development Kit for Linux on Power v0.3 - OpenCL 1.1 conformant release available". OpenCL Lounge. ibm.com. https://www.ibm.com/developerworks/mydeveloperworks/blogs/80367538-d04a-47cb-9463-428643140bf1/entry/ibm_releases_opencl_development_kit_for_linux_on_power_v0_3_opencl_1_1_conformant_release_available6?lang=en. Retrieved 11 August 2011. 
  53. ^ "IBM releases OpenCL Common Runtime for Linux on x86 Architecture". https://www.ibm.com/developerworks/mydeveloperworks/blogs/80367538-d04a-47cb-9463-428643140bf1/entry/ibm_releases_opencl_common_runtime_for_linux_on_x86_architecture4?lang=en. Retrieved 10 September 2011. 
  54. ^ "OpenCL and the AMD APP SDK". AMD Developer Central. developer.amd.com. http://developer.amd.com/documentation/articles/pages/OpenCL-and-the-AMD-APP-SDK.aspx. Retrieved 10 September 2011. 
  55. ^ "Nvidia Releases OpenCL Driver". http://www.tomshardware.com/news/Nvidia-Cuda-OpenCL-SDK,7596.html. Retrieved 11 August 2011. 
