Tensor processing unit
A tensor processing unit (TPU) is an application-specific integrated circuit (ASIC) developed by Google specifically for machine learning. Compared to a graphics processing unit, it is designed for a higher volume of reduced-precision computation (e.g. as little as 8-bit precision[1]) with higher IOPS per watt, and it lacks hardware for rasterisation and texture mapping.[2] The chip was designed specifically for Google's TensorFlow framework; however, Google still uses CPUs and GPUs for other types of machine learning.[3] Other vendors are also introducing AI accelerator designs, aimed at the embedded and robotics markets.
Google has stated that its proprietary tensor processing units were used in the AlphaGo versus Lee Sedol series of man-machine Go games.[2] Google has also used TPUs for Google Street View text processing and was able to find all the text in the Street View database in less than five days. In Google Photos, an individual TPU can process over 100 million photos a day. TPUs are also used in RankBrain, which Google uses to provide search results.[4] The tensor processing unit was announced in 2016 at Google I/O, although the company stated that TPUs had been used inside its data centers for over a year before that.[2][3]
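To illustrate what reduced-precision computation means in practice, the sketch below maps floating-point values to signed 8-bit integer codes and back. The symmetric single-scale scheme and the names `quantize` and `dequantize` are assumptions made for illustration; the sources cited here do not describe the quantization method Google uses.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integer codes using one symmetric scale factor.

    Assumption: symmetric scaling around zero, chosen only for illustration.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    largest = max(abs(v) for v in values)
    scale = largest / qmax if largest else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.82, -0.41, 0.05, -1.27, 0.33]
codes, scale = quantize(weights)
restored = dequantize(codes, scale)
worst_error = max(abs(a - b) for a, b in zip(weights, restored))

print(codes)
print(worst_error)
```

Each value is stored in one byte instead of four, at the cost of a rounding error of at most half the scale factor per value.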
The TPU ASICs are mounted in a heatsink assembly, which can fit in a hard drive slot within a data center rack, according to Google Distinguished Hardware Engineer Norm Jouppi.[3]
Architecture
First generation
The first-generation TPU is an 8-bit matrix multiply engine, driven with CISC instructions by the host processor across a PCIe 3.0 bus. It is manufactured on a 28 nm process with a die size ≤ 331 mm². It runs at a clock speed of 700 MHz and has a thermal design power of 28-40 W. It has 28 MiB of on-chip memory and 4 MiB of 32-bit accumulators that take the results of a 256×256 array of 8-bit multipliers. Instructions transfer data to or from the host, perform matrix multiplies or convolutions, and apply activation functions.[5]
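The data path described above (8-bit operands, 32-bit accumulators, then an activation function) can be sketched in plain Python. This is a functional illustration only, not a model of the hardware: the signed 8-bit range, the choice of ReLU as the activation, and the function names are assumptions, and `peak_macs_per_second` assumes every multiplier completes one multiply-accumulate per clock cycle.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def check_int8(matrix):
    """Reject any operand that would not fit in a signed 8-bit value."""
    for row in matrix:
        for value in row:
            if not INT8_MIN <= value <= INT8_MAX:
                raise ValueError(f"{value} does not fit in 8 bits")


def matmul_int8(a, b):
    """Multiply 8-bit integer matrices, summing products in 32-bit range."""
    check_int8(a)
    check_int8(b)
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    result = []
    for row in a:
        out_row = []
        for j in range(len(b[0])):
            acc = 0  # stands in for one 32-bit accumulator
            for k in range(inner):
                acc += row[k] * b[k][j]
            if not INT32_MIN <= acc <= INT32_MAX:
                raise OverflowError("accumulator exceeded 32 bits")
            out_row.append(acc)
        result.append(out_row)
    return result


def relu(matrix):
    """Example activation function (an assumption): clamp negatives to zero."""
    return [[max(0, v) for v in row] for row in matrix]


def peak_macs_per_second(rows=256, cols=256, clock_hz=700e6):
    """Upper bound, assuming one multiply-accumulate per multiplier per cycle."""
    return rows * cols * clock_hz


activations = [[1, -2, 3], [4, 5, -6]]
weights = [[7, -8], [9, 10], [-11, 12]]
outputs = relu(matmul_int8(activations, weights))

print(outputs)
print(peak_macs_per_second())
```

Accumulating in 32 bits matters because the sum of many 8-bit products quickly exceeds the 16 bits that a single product needs.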
Second generation
The second-generation TPU was announced in May 2017.[6] The individual TPU ASICs are rated at 45 TFLOPS and arranged into 4-chip modules rated at 180 TFLOPS. These modules are then assembled into 256-chip pods with 11.5 PFLOPS of performance. Notably, while the first-generation TPUs were limited to integers, the second-generation TPUs can also calculate in floating point. This makes the second-generation TPUs useful for both training and inference of machine learning models. Google has stated that these second-generation TPUs will be available on the Google Compute Engine for use in TensorFlow applications.[7]
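The module and pod figures follow from the per-chip rating by simple multiplication, as the short check below shows. The constant names are illustrative, and the calculation assumes performance scales linearly with chip count, which is how the quoted peak figures appear to be derived.

```python
CHIP_TFLOPS = 45        # rating of one second-generation TPU ASIC
CHIPS_PER_MODULE = 4
CHIPS_PER_POD = 256

# Peak figures, assuming perfectly linear scaling across chips.
module_tflops = CHIP_TFLOPS * CHIPS_PER_MODULE
modules_per_pod = CHIPS_PER_POD // CHIPS_PER_MODULE
pod_pflops = CHIP_TFLOPS * CHIPS_PER_POD / 1000  # 1 PFLOPS = 1000 TFLOPS

print(module_tflops)    # 180
print(modules_per_pod)  # 64
print(pod_pflops)       # 11.52
```

The result of 11.52 PFLOPS matches the quoted pod figure of 11.5 PFLOPS after rounding.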
See also
- Vision processing unit, a similar device specialised for vision processing.
- TrueNorth, a similar device that simulates spiking neurons instead of low-precision tensors.
- Neural processing unit
References
- Armasu, Lucian (2016-05-19). "Google's Big Chip Unveil For Machine Learning: Tensor Processing Unit With 10x Better Efficiency (Updated)". Tom's Hardware. Retrieved 2016-06-26.
- Jouppi, Norm (May 18, 2016). "Google supercharges machine learning tasks with TPU custom chip". Google Cloud Platform Blog. Google. Retrieved 2017-01-22.
- "Google's Tensor Processing Unit explained: this is what the future of computing looks like". TechRadar. Retrieved 2017-01-19.
- "Google's Tensor Processing Unit could advance Moore's Law 7 years into the future". PCWorld. Retrieved 2017-01-19.
- "In-Datacenter Performance Analysis of a Tensor Processing Unit".
- Bright, Peter (17 May 2017). "Google brings 45 teraflops tensor flow processors to its compute cloud". Ars Technica. Retrieved 30 May 2017.
- Kennedy, Patrick (17 May 2017). "Google Cloud TPU Details Revealed". Serve The Home. Retrieved 30 May 2017.