CLMUL instruction set
Carry-less Multiplication (CLMUL) is an extension to the x86 instruction set used by microprocessors from Intel and AMD. Intel proposed it in March 2008[1] and first made it available in the Westmere processors announced in early 2010. Its purpose is to speed up applications that perform block cipher encryption in Galois/Counter Mode (GCM), which depends on finite field multiplication. Multiplication in the finite field GF(2^k) can be implemented more efficiently[2] with the CLMUL instructions than with the traditional instruction set.[3] Another application is the fast calculation of CRC values.[4]
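To make the link between carry-less multiplication and finite field multiplication concrete, the following is a minimal software sketch in Python, not a description of how the hardware or any particular library works. The function names `clmul`, `gf_reduce` and `gf_mul` are hypothetical, and the small field GF(2^8) with the AES reduction polynomial is chosen only to keep the numbers short; GCM itself works in GF(2^128).

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product: multiply a and b as polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR stands in for addition, so no carries propagate
        a <<= 1
        b >>= 1
    return result


def gf_reduce(value: int, modulus: int) -> int:
    """Reduce a polynomial modulo `modulus` (both encoded as bit masks)."""
    degree = modulus.bit_length() - 1
    while value.bit_length() > degree:
        # Align the modulus with the top set bit of value and cancel it.
        shift = value.bit_length() - modulus.bit_length()
        value ^= modulus << shift
    return value


def gf_mul(a: int, b: int, modulus: int) -> int:
    """Multiply in GF(2^k): carry-less multiply, then reduce."""
    return gf_reduce(clmul(a, b), modulus)


# Illustrative choice: x^8 + x^4 + x^3 + x + 1, the AES field polynomial.
AES_POLY = 0x11B

if __name__ == "__main__":
    print(hex(clmul(0b11, 0b11)))             # (x + 1)^2 = x^2 + 1
    print(hex(gf_mul(0x57, 0x83, AES_POLY)))
```

The sketch splits the field multiplication into the two steps that matter here: the carry-less product, which is the part a CLMUL instruction computes in hardware, and the reduction modulo the field polynomial, which software still has to perform separately.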
New instructions
The extension adds a single instruction, PCLMULQDQ, which computes the 128-bit carry-less product of two 64-bit values. The destination is a 128-bit XMM register. The source may be another XMM register or a memory operand. An immediate operand selects which 64-bit half of each 128-bit operand is multiplied. Mnemonics for specific values of the immediate operand are also defined:
Instruction | Opcode | Description |
---|---|---|
PCLMULQDQ xmmreg,xmmrm,imm | [rmi: 66 0f 3a 44 /r ib] | Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2). |
PCLMULLQLQDQ xmmreg,xmmrm | [rm: 66 0f 3a 44 /r 00] | Multiply the low halves of the two registers. |
PCLMULHQLQDQ xmmreg,xmmrm | [rm: 66 0f 3a 44 /r 01] | Multiply the high half of the destination register by the low half of the source register. |
PCLMULLQHQDQ xmmreg,xmmrm | [rm: 66 0f 3a 44 /r 10] | Multiply the low half of the destination register by the high half of the source register. |
PCLMULHQHQDQ xmmreg,xmmrm | [rm: 66 0f 3a 44 /r 11] | Multiply the high halves of the two registers. |
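The half selection described in the table can be modelled in software. The sketch below is an illustrative Python model, not Intel's reference pseudocode: the names `clmul64`, `pclmulqdq` and `ALIASES` are hypothetical, and 128-bit registers are represented as plain Python integers. It assumes that bit 0 of the immediate selects the half of the destination operand and bit 4 selects the half of the source operand, which matches the hexadecimal immediate bytes 00, 01, 10 and 11 listed above.

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit values (result fits in 128 bits)."""
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR instead of add: no carries between bit positions
        a <<= 1
        b >>= 1
    return result


def pclmulqdq(dest: int, src: int, imm8: int) -> int:
    """Software model: dest and src are 128-bit integers, imm8 picks halves."""
    # Assumption: imm8 bit 0 picks the destination half, bit 4 the source half.
    a = (dest >> 64) if imm8 & 0x01 else dest
    b = (src >> 64) if imm8 & 0x10 else src
    return clmul64(a, b)


# Mnemonic aliases from the table, mapped to their immediate byte.
ALIASES = {
    "PCLMULLQLQDQ": 0x00,  # low(dest)  x low(src)
    "PCLMULHQLQDQ": 0x01,  # high(dest) x low(src)
    "PCLMULLQHQDQ": 0x10,  # low(dest)  x high(src)
    "PCLMULHQHQDQ": 0x11,  # high(dest) x high(src)
}

if __name__ == "__main__":
    dest = (0x3 << 64) | 0x5  # high half 3, low half 5
    src = (0x6 << 64) | 0x9   # high half 6, low half 9
    for name, imm in ALIASES.items():
        print(name, hex(pclmulqdq(dest, src, imm)))
```

Choosing a different value for each half of the two inputs makes it easy to see which pair of halves each mnemonic multiplies.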
CPUs with CLMUL instruction set
- Intel
  - Westmere processor (March 2010)
  - Sandy Bridge processor
  - Ivy Bridge processor
  - Haswell processor
  - Broadwell processor
- AMD
  - Bulldozer processor (2011)[5]
  - Piledriver-based processors (including newer AMD A-series APUs)
  - Jaguar-based processors[6]
The presence of the CLMUL instruction set can be checked by testing the PCLMULQDQ CPU feature bit, which the CPUID instruction reports in bit 1 of ECX when called with EAX = 1.
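A feature check of this kind can be sketched as follows. Python cannot execute CPUID directly, so this hedged example assumes the ECX value from CPUID leaf 1 has been obtained elsewhere, where PCLMULQDQ support is reported in bit 1. As an alternative it parses the `flags` line of `/proc/cpuinfo`, where Linux lists the feature as `pclmulqdq`. The function names are hypothetical, and the file-based check applies only to Linux on x86.

```python
from pathlib import Path
from typing import Optional

PCLMULQDQ_ECX_BIT = 1  # CPUID leaf 1 (EAX = 1), ECX bit 1


def ecx_reports_clmul(ecx: int) -> bool:
    """Test the PCLMULQDQ bit in an ECX value returned by CPUID leaf 1."""
    return bool((ecx >> PCLMULQDQ_ECX_BIT) & 1)


def cpuinfo_reports_clmul(cpuinfo_text: str) -> bool:
    """Look for the 'pclmulqdq' flag in /proc/cpuinfo-style text (Linux)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return "pclmulqdq" in value.split()
    return False


def host_supports_clmul(path: str = "/proc/cpuinfo") -> Optional[bool]:
    """Return True/False on Linux, or None if the file cannot be read."""
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    return cpuinfo_reports_clmul(text)


if __name__ == "__main__":
    print(host_supports_clmul())
```

Returning `None` when the file is unavailable keeps "not supported" distinct from "could not determine", which matters on non-Linux systems.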
See also
- Finite field arithmetic
- AES instruction set
- FMA3 instruction set
- FMA4 instruction set
- AVX instruction set
References
1. "Intel Software Network". Intel. Retrieved 2008-04-05.
2. "Intel Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode - Rev 2".
3. Detailed description of instructions on Intel website.
4. "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ".
5. Dave Christie (6 May 2009). "Striking a balance". AMD Developer blogs. Retrieved 2011-03-11.
6. "Slide detailing improvements of Jaguar over Bobcat". AMD. Retrieved 2013-08-03.