CUDA

From Wikipedia, the free encyclopedia

CUDA ("Compute Unified Device Architecture") is a GPGPU technology that allows a programmer to use the C programming language to code algorithms for execution on the graphics processing unit (GPU). CUDA was developed by Nvidia, and using it requires an Nvidia GPU and special stream processing drivers. CUDA works with the new GeForce 8 Series, featuring G8X GPUs; Nvidia states that programs developed for the GeForce 8 series will also work without modification on all future Nvidia video cards[citation needed]. CUDA gives developers access to the native instruction set and memory of the massively parallel computational elements in CUDA GPUs. Using CUDA, Nvidia GeForce-based GPUs effectively become programmable open architectures like today's CPUs (central processing units). By opening up the architecture, CUDA provides developers with both a low-level API and a high-level API for deterministic, repeatable access to the hardware, which is necessary to develop essential high-level programming tools such as compilers, debuggers, math libraries, and application platforms.
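
As an illustration of what "C with extensions" looks like in practice, the following is a minimal sketch of an element-wise vector addition. The names vector_add and run_vector_add are hypothetical and chosen only for this example; the block size of 256 threads is likewise an assumption, not a recommendation from Nvidia.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void vector_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // guard the partial last block
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: runs the kernel on n elements and returns the
// number of results that differ from the CPU sum (0 means all match),
// or -1 if n is not positive or an allocation or CUDA call fails.
int run_vector_add(int n)
{
    if (n <= 0) return -1;
    size_t bytes = (size_t)n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    if (!h_a || !h_b || !h_c) {
        free(h_a); free(h_b); free(h_c);
        return -1;
    }
    for (int i = 0; i < n; ++i) {
        h_a[i] = (float)i;
        h_b[i] = 2.0f * (float)i;
    }

    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    int mismatches = -1;

    // Allocate device memory and copy the inputs to the GPU.
    cudaError_t err = cudaMalloc((void **)&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_c, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;                          // assumed block size
        int blocks = (n + threads - 1) / threads;   // round up to cover n
        vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
        err = cudaGetLastError();                   // catches launch errors
    }
    // Copy the result back (this call also waits for the kernel to finish).
    if (err == cudaSuccess) err = cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) {
        mismatches = 0;
        for (int i = 0; i < n; ++i) {
            if (h_c[i] != h_a[i] + h_b[i]) ++mismatches;
        }
    }

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);    // freeing NULL is a no-op
    free(h_a); free(h_b); free(h_c);
    return mismatches;
}
```

Calling run_vector_add(1000) from a host main function exercises the whole path: allocation, upload, kernel launch, readback, and comparison against the CPU result.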

The initial CUDA SDK was made public on 15 February 2007.[1] The CUDA compiler is based on Open64.

Hardware

The 8-Series (G8X) GeForce-based GPUs from Nvidia are the first series of GPUs to support the CUDA SDK. The 8-Series (G8X) GPUs feature hardware support for 32-bit (single-precision) floating-point vector processors, using the CUDA SDK as the API. CUDA supports the C "double" data type; however, on G8X series GPUs these values are demoted to 32-bit floats. Due to the highly parallel nature of vector processors, GPU-assisted hardware stream processing can have a huge impact in specific data processing applications. It is anticipated in the computer gaming industry that graphics cards may be used in future game physics calculations (physical effects like debris, smoke, fire, and fluids). CUDA has also been used to accelerate non-graphical applications in computational biology and other fields by an order of magnitude or more.[2][3]
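
To give a feel for what demoting a "double" to a 32-bit float costs, the sketch below measures the rounding error for a single value. It runs on the host and only models the rounding step; the helper name demotion_error is hypothetical, and the code does not reproduce how the G8X compiler itself performs the demotion.

```cuda
#include <math.h>

// Hypothetical helper: absolute error introduced when a double is stored
// as a 32-bit float. This models only the rounding step, on the host.
double demotion_error(double x)
{
    float demoted = (float)x;            // keep only single precision
    return fabs(x - (double)demoted);    // what was lost in the conversion
}
```

Values that fit exactly in single precision, such as 0.5, lose nothing. A value such as 0.1 picks up a small error, and 16777217.0 (2^24 + 1) is off by a whole unit, because a 32-bit float has only 24 bits of significand.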

Advantages

CUDA has several advantages over traditional general purpose computation on GPUs (GPGPU) using graphics APIs.

  • It uses the standard C language, with some simple extensions.
  • Scattered writes – code can write to arbitrary addresses in memory.
  • Shared memory – CUDA exposes a fast shared memory region (16KB in size) that can be shared amongst threads. This can be used as a user-managed cache, enabling higher bandwidth than is possible using texture lookups. For an example, see [4].
  • Faster downloads to and readbacks from the GPU.
  • Full support for integer and bitwise operations.
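
The shared memory point can be sketched with a block-level sum over integers. Each block stages its inputs in shared memory and then combines them with a tree reduction. The names block_sum, gpu_sum, and BLOCK_SIZE are hypothetical, and the choice of 256 threads per block (1 KB of the 16 KB region) is an assumption made for the example.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

#define BLOCK_SIZE 256   // threads per block; must be a power of two here

// Each block stages up to BLOCK_SIZE inputs in shared memory, then sums
// them with a tree reduction and writes one partial sum per block.
__global__ void block_sum(const int *in, int *partial, int n)
{
    __shared__ int tile[BLOCK_SIZE];                // user-managed on-chip cache
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0;        // pad the last block with zeros
    __syncthreads();                                // wait until the tile is full

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        }
        __syncthreads();                            // finish this level first
    }
    if (threadIdx.x == 0) {
        partial[blockIdx.x] = tile[0];              // one result per block
    }
}

// Hypothetical host helper: sums n integers on the GPU and stores the total
// in *result. Returns 0 on success and -1 on bad arguments or a CUDA error.
int gpu_sum(const int *h_in, int n, int *result)
{
    if (n <= 0 || h_in == NULL || result == NULL) return -1;
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    size_t in_bytes = (size_t)n * sizeof(int);
    size_t part_bytes = (size_t)blocks * sizeof(int);
    int *h_part = (int *)malloc(part_bytes);
    if (h_part == NULL) return -1;

    int *d_in = NULL, *d_part = NULL;
    cudaError_t err = cudaMalloc((void **)&d_in, in_bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_part, part_bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_in, h_in, in_bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        block_sum<<<blocks, BLOCK_SIZE>>>(d_in, d_part, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(h_part, d_part, part_bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) {
        int total = 0;
        for (int b = 0; b < blocks; ++b) total += h_part[b];  // finish on the CPU
        *result = total;
    }

    cudaFree(d_in); cudaFree(d_part);
    free(h_part);
    return (err == cudaSuccess) ? 0 : -1;
}
```

The per-block partial sums are added on the CPU here to keep the sketch short; a fuller version might launch the kernel a second time over the partial sums instead.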

Limitations

  • Only bilinear texture filtering is supported – mipmapped textures and anisotropic filtering are not supported at this time.
  • Texture rendering is not supported.
  • Recursive functions are not supported and must be converted to loops.
  • Floating-point arithmetic deviates from the IEEE 754 standard in several ways: denormals and signalling NaNs are not supported; only two IEEE rounding modes are supported (chop and round-to-nearest even), and those are specified on a per-instruction basis rather than in a control word; and the precision of division and square root is slightly lower than single precision.
  • The bus bandwidth and latency between the CPU and the GPU may be a bottleneck.
  • Threads must run in groups of at least 32 that execute identical instructions simultaneously. Branching does not impact performance significantly, provided that each group of 32 threads takes the same code path; but the SIMD execution model becomes a significant limitation for any inherently divergent task (e.g., traversing a ray tracing acceleration data structure).
  • CUDA-enabled GPUs are only available from Nvidia (GeForce 8000 series and above, Quadro and Tesla[1]).
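
The recursion limitation above can be worked around by rewriting the function as a loop. The sketch below does this for Euclid's greatest common divisor algorithm, which is often written recursively. The names gcd_loop, gcd_kernel, and run_gcd are hypothetical and exist only for this example.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Recursive form, shown only for comparison:
//   gcd(a, b) = (b == 0) ? a : gcd(b, a % b)
// Loop form of the same algorithm, callable from host and device code.
__host__ __device__ unsigned int gcd_loop(unsigned int a, unsigned int b)
{
    while (b != 0) {
        unsigned int r = a % b;   // same step as the recursive call
        a = b;
        b = r;
    }
    return a;
}

// Kernel: one thread computes the GCD of one pair.
__global__ void gcd_kernel(const unsigned int *a, const unsigned int *b,
                           unsigned int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = gcd_loop(a[i], b[i]);
    }
}

// Hypothetical host helper: fills h_out[i] with gcd(h_a[i], h_b[i]).
// Returns 0 on success and -1 on bad arguments or a CUDA error.
int run_gcd(const unsigned int *h_a, const unsigned int *h_b,
            unsigned int *h_out, int n)
{
    if (n <= 0 || h_a == NULL || h_b == NULL || h_out == NULL) return -1;
    size_t bytes = (size_t)n * sizeof(unsigned int);
    unsigned int *d_a = NULL, *d_b = NULL, *d_out = NULL;

    cudaError_t err = cudaMalloc((void **)&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_out, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;                          // assumed block size
        int blocks = (n + threads - 1) / threads;
        gcd_kernel<<<blocks, threads>>>(d_a, d_b, d_out, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_out);
    return (err == cudaSuccess) ? 0 : -1;
}
```

The loop runs a different number of times for different inputs, so this example also touches on the divergence limitation: threads in the same group of 32 that finish early wait until the slowest thread in the group leaves the loop.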

References

  1. CUDA for GPU Computing
  2. Schatz, M.C.; Trapnell, C.; Delcher, A.L.; Varshney, A. (2007). "High-throughput sequence alignment using Graphics Processing Units". BMC Bioinformatics 8: 474. doi:10.1186/1471-2105-8-474.
  3. Manavski, Svetlin A.; Giorgio Valle (2008). "CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment". BMC Bioinformatics 9 (Suppl 2): S10. doi:10.1186/1471-2105-9-S2-S10.
  4. Silberstein, Mark (2007). Efficient computation of Sum-products on GPUs.
