Nvidia Tesla
Tesla is Nvidia's third brand of graphics processing units (GPUs), designed for general-purpose GPU computing. It is based on high-end GPUs from the G80 series onward, with elements from the Quadro series. The Tesla series takes its name from the pioneering electrical engineer Nikola Tesla.
Tesla overview
With their very high computational power (measured in floating point operations per second or FLOPS) compared to microprocessors, the Tesla products target the high performance computing market.[1] As of 2012, Nvidia Teslas power some of the world's fastest supercomputers, including Titan at Oak Ridge National Laboratory and Tianhe-1A, in Tianjin, China.
The main difference between Tesla products and the consumer-level GeForce cards and professional-level Quadro cards was the inability to output images to a display, but the latest Tesla C-class products include one dual-link DVI port.[2] For equivalent single-precision throughput, Fermi-based Nvidia GeForce cards deliver only a quarter of the double-precision performance. Tesla products are primarily used:[3]
- in simulations and in large-scale calculations (especially floating-point calculations)
- for high-end image generation for applications in professional and scientific fields
- with OpenCL or CUDA.
Nvidia intends to embed ARMv8 processor cores in future Tesla GPUs as part of Project Denver.[4] This will be a 64-bit follow-on to the 32-bit Tegra chips.
Tesla itself will be followed in 2016 by Volta, which is expected to deliver 1 TB/s of memory bandwidth.[5]
Market
The defense industry currently accounts for less than a sixth of Tesla sales, but Sumit Gupta predicts further sales to the geospatial intelligence market.[6]
Specifications and configurations
Configuration | Model | # of GPUs | Core clock (MHz, each) | Shaders: thread processors (total) | Shaders: clock (MHz, each) | Memory: bandwidth max (GB/s) | Memory: bus type | Memory: bus width (bit, each GPU) | Memory: total size (MiB) | Memory: clock (MHz) | Peak processing power (GFLOPS)[7]: single precision (SP) total (MUL+ADD+SF) | Peak processing power (GFLOPS): single precision (SP) MAD (MUL+ADD) | Peak processing power (GFLOPS): double precision (DP) FMA | Compute capability (note 4) | TDP (watts) | Form factor and features |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GPU Computing Processor (note 1) | C870 | 1 | 600 | 128 | 1350 | 76.8 | GDDR3 | 384 | 1536 | 1600 | 518.4 | 345.6 | 0 | 1.0 | 170.9 | Full-height video card |
Deskside Supercomputer (note 1) | D870 | 2 | 600 | 2 × 128 (256) | 1350 | 153.6 | GDDR3 | 384 | 3072 | 1600 | 1036.8 | 691.2 | 0 | 1.0 | 520 | Deskside system or Rack unit |
GPU Computing Server (note 1) | S870 | 4 | 600 | 4 × 128 (512) | 1350 | 307.2 | GDDR3 | 384 | 6144 | 1600 | 2073.6 | 1382.4 | 0 | 1.0 | | 1U Rack |
C1060 Computing Processor (note 2) | C1060 | 1 | 602 | 240 | 1300 | 102.4 | GDDR3 | 512 | 4096 | 1600 | 933.12 | 622.08 | 77.76 | 1.3 | 187.8 | 2 slot video card |
S1075 1U[8] GPU Computing Server (notes 3, 4) | S1070 | 4 | 602 | 4 × 240 (960) | 1440 | 409.6 | GDDR3 | 512 | 16384 | 1600 | 4147.2 | 2764.8 | 345.6 | 1.3 | | 1U Rack, IEEE 754-2008 capabilities |
C2050/C2070/C2075 GPU Computing Processor | C2050/C2070/C2075 | 1 | 575 | 448 | 1150 | 144 | GDDR5 | 384 | 3072/6144 (note 5) | 1500 [9][10] | 1288 | 1030.4 (note 6) | 515.2 | 2.0 | 238/247/225 | Full-height video card, IEEE 754-2008 FMA capabilities |
M2050 GPU Computing Module | M2050 | 1 | 575 | 448 | 1150 | 148.4 | GDDR5 | 384 | 3072 (note 5) | 1546 | 1288 | 1030.4 (note 6) | 515.2 | 2.0 | 225 | Computing Module, IEEE 754-2008 FMA capabilities |
M2070/M2070Q[11] GPU Computing Module | M2070/M2070Q | 1 | 575 | 448 | 1150 | 150.336 | GDDR5 | 384 | 6144 (note 5) | 1566 | 1288 | 1030.4 (note 6) | 515.2 | 2.0 | 225 | Computing Module, IEEE 754-2008 FMA capabilities |
M2090[12][13][14] GPU Computing Module | M2090 | 1 | 650 | 512 | 1301 | 177 | GDDR5 | 384 | 6144 (note 5) | 1848 | ? | 1332.2 | 666.1 | 2.0 | 225 | Computing Module, IEEE 754-2008 FMA capabilities |
S2050 1U GPU Computing System | S2050 | 4 | 575 | 4 × 448 (1792) | 1150 | 4 × 148.4 (593.6) | GDDR5 | 384 | 12288 (note 5) | 3092 | 5152 | 4121.6 (note 6) | 2060.8 | 2.0 | 900 | 1U Rack, IEEE 754-2008 FMA capabilities |
K10 GPU Computing Module | K10 / GK104 | 2 | 745 | 1536 per GPU | 745 | 160 per GPU | GDDR5 | 256 per GPU | 4096 per GPU | 2500 | 2288 per GPU | - | 95 per GPU | 3.0 | 225 | Computing Module, IEEE 754-2008 FMA capabilities |
K20 GPU Computing Module | GK110 | 1 | 706 | 2496 | 706 | 208 | GDDR5 | 320 | 5120 | 2600 | 3520 | - | 1170 | 3.5 | 225 | Computing Module, IEEE 754-2008 FMA capabilities |
K20X GPU Computing Module | GK110 | 1 | 732 | 2688 | 732 | 250 | GDDR5 | 384 | 6144 | 2600 | 3950 | - | 1310 | 3.5 | 235[15] | Computing Module, IEEE 754-2008 FMA capabilities |
K40 GPU Computing Module | GK110 | 1 | | 2880 | 745 | 288 | GDDR5 | 384 | 12288 | 3004 | | | | 3.5 | 245 | Computing Module, IEEE 754-2008 FMA capabilities |
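The peak throughput and bandwidth columns in the table can be reproduced from the shader and memory columns with simple arithmetic. The sketch below is illustrative only: the function names and the per-clock factors (2 operations for a MAD, 3 for MUL+ADD+SF, and 2 transfers per clock for the GDDR5 rows) are assumptions inferred from the column labels and from the table values, not figures taken from Nvidia documentation.

```python
# Sketch: recompute two derived table columns from the raw columns.
# All constants and names here are illustrative assumptions.

FLOPS_PER_CLOCK_MAD = 2    # a multiply-add counted as two operations
FLOPS_PER_CLOCK_TOTAL = 3  # multiply-add plus one extra MUL/SF operation


def peak_gflops(thread_processors: int, shader_clock_mhz: float,
                flops_per_clock: int) -> float:
    """Peak GFLOPS = processors x shader clock (MHz) x operations per clock / 1000."""
    return thread_processors * shader_clock_mhz * flops_per_clock / 1000.0


def memory_bandwidth_gbs(bus_width_bits: int, memory_clock_mhz: float,
                         transfers_per_clock: int = 1) -> float:
    """Peak GB/s = bus width in bytes x memory clock (MHz) x transfers per clock / 1000."""
    return bus_width_bits / 8 * memory_clock_mhz * transfers_per_clock / 1000.0


if __name__ == "__main__":
    # C870 row: 128 thread processors at 1350 MHz, 384-bit bus, 1600 MHz memory clock.
    print(peak_gflops(128, 1350, FLOPS_PER_CLOCK_MAD))    # compare with the SP MAD column
    print(peak_gflops(128, 1350, FLOPS_PER_CLOCK_TOTAL))  # compare with the SP total column
    print(memory_bandwidth_gbs(384, 1600))                # compare with the bandwidth column
    # K20 row (GDDR5): the listed clock matches the table only with 2 transfers per clock.
    print(memory_bandwidth_gbs(320, 2600, transfers_per_clock=2))
```

Not every row reproduces exactly from the listed values. With the C1060's listed 1300 MHz shader clock, for instance, the same formula gives 936 GFLOPS rather than the tabulated 933.12.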
Notes
- 1 Specifications not specified by Nvidia are assumed to be based on the GeForce 8800GTX
- 2 Specifications not specified by Nvidia are assumed to be based on the GeForce GTX 285
- 3 A host system/server is required to connect to the 1U GPU computing server by the PCI Express card (similar set-up as the Nvidia Quadro Plex)
- 4 Core architecture version according to the CUDA programming guide.
- 5 With ECC on, a portion of the dedicated memory is used for ECC bits, so the available user memory is reduced by 12.5%. (e.g. 3 GB total memory yields 2.625 GB of user available memory.)
- 6 Fermi implements the new fused multiply–add (FMA) instruction for both 32-bit single-precision and 64-bit double-precision floating point numbers (GT200 supported FMA only in double precision) that improves upon multiply-add by retaining full precision in the intermediate stage.[16]
- Performance figures are for single-precision except where noted.
- Nvidia Tesla supercomputers with up to eight Fermi GPUs are also available from manufacturers.
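Notes 5 and 6 can be illustrated with a short calculation. The sketch below is a host-side emulation in Python double precision, not GPU code: the helper names are hypothetical, and the fused path is emulated with exact rational arithmetic so that rounding happens only once, which is the property note 6 attributes to FMA.

```python
from fractions import Fraction

ECC_OVERHEAD = 0.125  # note 5: 12.5% of dedicated memory holds ECC bits


def usable_memory_gb(total_gb: float, ecc_enabled: bool = True) -> float:
    """Memory left for user data once the ECC share is subtracted."""
    return total_gb * (1 - ECC_OVERHEAD) if ecc_enabled else total_gb


def multiply_add_unfused(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the addition."""
    return a * b + c


def multiply_add_fused(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


if __name__ == "__main__":
    print(usable_memory_gb(3.0))  # the 3 GB example from note 5

    # The exact product a*b is 1 - 2**-60, which rounds to 1.0 as a double,
    # so the unfused version loses the small term entirely.
    a, b, c = 1 + 2.0**-30, 1 - 2.0**-30, -1.0
    print(multiply_add_unfused(a, b, c))
    print(multiply_add_fused(a, b, c))
```

The chosen inputs are contrived to make the difference visible; in typical workloads the two results differ by at most a rounding error in the last bit.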
See also
- Nvidia Tesla Personal Supercomputer
- Comparison of Nvidia graphics processing units
- GeForce 8 series
- GeForce 200 Series
- GeForce 400 Series
- GeForce 500 Series
- GeForce 600 Series
- GeForce 700 Series
- GeForce 800 Series
- CUDA
- GPGPU
- OpenCL
- Stream processing
- Intel Xeon Phi, a direct competitor in the HPC market
References
- [1] High Performance Computing - Supercomputing with Tesla GPUs
- [2]
- [3] Tesla Technical Brief (PDF)
- [4] "Nvidia to Integrate ARM Processors in Tesla."
- [5] "NVIDIA's Volta GPU Launches In 2016, Delivers 1TB/s Of Memory Bandwidth."
- [6] "Nvidia chases defense, intelligence ISVs with GPUs."
- [7] Nvidia Announces Tesla 20 Series
- [8] Difference between Tesla S1070 and S1075
- [9] "TESLA™ C2050 / C2070 GPU Computing Processor". NVIDIA. Retrieved 8 January 2013.
- [10] "NVIDIA® TESLA™ C2075 COMPANION PROCESSOR". NVIDIA. Retrieved 8 January 2013.
- [11] NVidia Tesla M2050 & M2070/M2070Q Specs Online
- [12] TESLA M2090 Product brief
- [13] http://www.nvidia.com/docs/IO/43395/Tesla-M2090-Board-Specification.pdf
- [14] http://www.nvidia.com/docs/IO/105880/DS-Tesla-M-Class-Aug11.pdf
- [15] http://www.nvidia.com/content/PDF/kepler/Tesla-K20X-BD-06397-001-v05.pdf
- [16] NVIDIA Fermi Compute Architecture Whitepaper (PDF, 855 KiB), page 13 of 22
External links
- NVIDIA Product Overview and Technical Brief
- NVIDIA's Tesla homepage
- Nvidia Tesla C2050 / C2070 GPU Computing Processor
- Nvidia Tesla S2050 GPU Computing System
- Nvidia Tesla C1060 Computing Processor
- Nvidia Tesla S1070
- Nvidia Tesla M1060 Processor
- Nvidia Nsight