VEX prefix
The VEX prefix (from "vector extensions") and the VEX coding scheme comprise an extension to the x86 and x86-64 instruction set architecture for microprocessors from Intel, AMD and others.
Features
The VEX coding scheme allows the definition of new instructions and the extension or modification of previously existing instruction codes. This serves the following purposes:
- The opcode map is extended to make space for future instructions.
- It allows instruction codes to have up to five operands, where the original scheme allows only two operands (in rare cases three operands).
- It allows the size of SIMD vector registers to be extended from the 128-bit XMM registers to 256-bit registers named YMM. There is room for further extensions of the register size in the future.
- It allows existing two-operand instructions to be modified into non-destructive three-operand forms where the destination register is different from both source registers. For example, c = a + b instead of a = a + b (where register a is changed by the instruction).
The VEX prefix replaces the most commonly used instruction prefix bytes and escape codes. In many cases, the number of prefix bytes and escape bytes that are replaced is the same as the number of bytes in the VEX prefix, so that the total length of the VEX-encoded instruction is the same as the length of the legacy instruction code. In other cases, the VEX-encoded version is longer or shorter than the legacy code. In 32-bit mode, VEX-encoded instructions can only access the first 8 YMM/XMM registers; the encodings for the other registers would be interpreted as the legacy LDS and LES instructions, which are not supported in 64-bit mode.
The two-byte VEX prefix contains the following components:
- A bit named R, similar to the REX.R prefix bit used in the x86-64 instruction set extension.
- Four bits named v, specifying a second source register operand.
- A bit named L specifying 256-bit vector length.
- Two bits named p to replace operand size prefixes and operand type prefixes (66, F2, F3).
The three-byte VEX prefix additionally contains:
- Three bits named X, B and W, also similar to the corresponding bits in the REX prefix.
- Five bits named m. Two of the m bits are used to replace existing escape codes and to specify the length of the instruction. The remaining three m bits are reserved for future use, such as specifying vector lengths greater than 256 bits, specifying different instruction lengths, or extending the opcode space. However, as of 2013, Intel decided to introduce a new encoding scheme, the EVEX prefix, rather than expand the use of the remaining m bits.
Technical description
# of bytes | 0,2,3 | 1 | 1 | 0,1 | 0,1,2,4 | 0,1 |
---|---|---|---|---|---|---|
[Prefixes] | [VEX] | OPCODE | ModR/M | [SIB] | [DISP] | [IMM] |
The VEX coding scheme uses a code prefix consisting of 2 or 3 bytes which is added to existing or new instruction codes.[1]
In x86 architecture, instructions with a memory operand may use the ModR/M byte which specifies the addressing mode. This byte has three bit fields:
- mod, bits [7:6] - combined with the r/m field, encodes either 8 registers or 24 addressing modes. It also encodes opcode information for some instructions.
- reg/opcode, bits [5:3] - specifies either a register or three more bits of opcode information, as determined by the primary opcode byte.
- r/m, bits [2:0] - can specify a register as an operand, or combine with the mod field to encode an addressing mode.
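The three fields can be pulled apart with shifts and masks. The following is a minimal sketch; the function names `decode_modrm` and `is_register_operand` are illustrative and not taken from any particular disassembler.

```python
def decode_modrm(byte: int) -> dict:
    """Split a ModR/M byte into its mod, reg/opcode and r/m fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModR/M must fit in one byte")
    return {
        "mod": (byte >> 6) & 0b11,   # bits [7:6]
        "reg": (byte >> 3) & 0b111,  # bits [5:3], register or opcode extension
        "rm": byte & 0b111,          # bits [2:0]
    }


def is_register_operand(byte: int) -> bool:
    """mod = 11 selects one of the 8 register encodings rather than memory."""
    return decode_modrm(byte)["mod"] == 0b11


if __name__ == "__main__":
    print(decode_modrm(0xC1))          # 0xC1 is binary 11 000 001
    print(is_register_operand(0x44))   # 0x44 is binary 01 000 100
```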
The base-plus-index and scale-plus-index forms of 32-bit addressing (encoded with r/m = 100 and mod ≠ 11) require another addressing byte, the SIB byte. It has the following fields:
- scale factor, encoded with bits [7:6]
- index register, bits [5:3]
- base register, bits [2:0].
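As a rough illustration, the SIB condition and field layout can be expressed as follows. The helper names are hypothetical. The mapping of the two scale bits to the factors 1, 2, 4 and 8 is the standard x86 convention and is not spelled out in the list above.

```python
def needs_sib(modrm: int) -> bool:
    """A SIB byte follows when r/m = 100 and mod is not 11 (32-bit addressing)."""
    mod = (modrm >> 6) & 0b11
    rm = modrm & 0b111
    return rm == 0b100 and mod != 0b11


def decode_sib(byte: int) -> dict:
    """Split a SIB byte into scale factor, index register and base register."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("SIB must fit in one byte")
    return {
        # Assumption labeled above: the 2-bit field ss encodes the factor 2**ss.
        "scale": 1 << ((byte >> 6) & 0b11),  # bits [7:6]
        "index": (byte >> 3) & 0b111,        # bits [5:3]
        "base": byte & 0b111,                # bits [2:0]
    }


if __name__ == "__main__":
    print(needs_sib(0x04))    # 00 000 100: memory operand with SIB
    print(decode_sib(0x88))   # 10 001 000
```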
To use 64-bit addressing and the additional registers present in the x86-64 architecture, the REX prefix was introduced, which provides additional space for encoding addressing modes. Bit-field W expands the operand size to 64 bits, R expands reg, B expands r/m or reg (depending on the opcode format used), and X and B expand index and base in the SIB byte. However, the REX prefix is encoded quite inefficiently, wasting half of its 8 bits.
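The "wasted" half is the fixed high nibble: a REX prefix has the bit pattern 0100WRXB, so only four of its eight bits carry information. A minimal sketch under that layout (the function names are illustrative):

```python
def decode_rex(byte: int) -> dict:
    """Extract W, R, X and B from a REX prefix byte (0x40 to 0x4F)."""
    if (byte & 0xF0) != 0x40:
        raise ValueError("not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 64-bit operand size
        "R": (byte >> 2) & 1,  # extends ModR/M reg
        "X": (byte >> 1) & 1,  # extends SIB index
        "B": byte & 1,         # extends ModR/M r/m, SIB base or opcode reg
    }


def extend_register(ext_bit: int, field: int) -> int:
    """Combine an extension bit with a 3-bit field into a 4-bit register number."""
    return ((ext_bit & 1) << 3) | (field & 0b111)


if __name__ == "__main__":
    rex = decode_rex(0x4D)                  # binary 0100 1101
    print(rex)
    print(extend_register(rex["R"], 0b010)) # reg field 010 with R set
```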
3-byte VEX | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|---|---|---|
Byte 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
Byte 1 | R̅ | X̅ | B̅ | m4 | m3 | m2 | m1 | m0 |
Byte 2 | W | v̅3 | v̅2 | v̅1 | v̅0 | L | p1 | p0 |

2-byte VEX | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|---|---|---|
Byte 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
Byte 1 | R̅ | v̅3 | v̅2 | v̅1 | v̅0 | L | p1 | p0 |
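The two layouts above can be unpacked with a few shifts and masks. The sketch below is illustrative only: `decode_vex` is a hypothetical helper, it assumes the bytes are already known to form a VEX prefix, and it returns the overlined (inverted) fields in their logical, un-inverted form. For the 2-byte form it fills in the fields that are absent with the values they are taken to imply (X = B = W = 0 and m = 1).

```python
def decode_vex(data: bytes) -> dict:
    """Unpack a 2- or 3-byte VEX prefix into logical (un-inverted) fields."""
    if len(data) >= 2 and data[0] == 0xC5:
        b1 = data[1]
        return {
            "length": 2,
            "R": ((b1 >> 7) & 1) ^ 1,          # stored inverted
            "X": 0, "B": 0, "W": 0, "m": 1,    # not encoded; implied values
            "vvvv": ((b1 >> 3) & 0xF) ^ 0xF,   # stored inverted
            "L": (b1 >> 2) & 1,
            "pp": b1 & 0b11,
        }
    if len(data) >= 3 and data[0] == 0xC4:
        b1, b2 = data[1], data[2]
        return {
            "length": 3,
            "R": ((b1 >> 7) & 1) ^ 1,
            "X": ((b1 >> 6) & 1) ^ 1,
            "B": ((b1 >> 5) & 1) ^ 1,
            "W": (b2 >> 7) & 1,
            "m": b1 & 0x1F,
            "vvvv": ((b2 >> 3) & 0xF) ^ 0xF,
            "L": (b2 >> 2) & 1,
            "pp": b2 & 0b11,
        }
    raise ValueError("not a VEX prefix")


if __name__ == "__main__":
    print(decode_vex(bytes.fromhex("c5f8")))
    print(decode_vex(bytes.fromhex("c4e27d")))
```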
The VEX prefix provides a compact representation of the REX prefix, as well as various other prefixes, to expand the addressing mode, register enumeration, and operand size and width:
- The R̅, X̅ and B̅ bits are the inverses of the REX prefix's R, X and B bits. They provide a fourth (high) bit for the register index fields (R extends ModRM reg; X extends SIB index; B extends ModRM r/m, SIB base, or the opcode reg field), allowing access to 16 instead of 8 registers. The W bit is equivalent to the REX prefix's W bit and specifies a 64-bit operand; for non-integer instructions, it is a general opcode extension bit.
- v̅ is the inverse of an additional source register index.
- m replaces leading opcode escape bytes. The values 1, 2 and 3 are equivalent to the escape sequences 0F, 0F 38 and 0F 3A; all other values are reserved. The 2-byte VEX prefix always implies the 0F escape.
- L indicates the vector length; 0 for 128-bit SSE (XMM) registers, and 1 for 256-bit AVX (YMM) registers.
- p encodes additional prefix bytes. The values 0, 1, 2, and 3 correspond to implied prefixes of none, 66, F3, and F2. These encode the operand type for SSE instructions: packed single, packed double, scalar single and scalar double, respectively.
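The field meanings listed above are enough to sketch a small prefix encoder. Everything named here (`encode_vex`, its parameters, and the two lookup tables) is hypothetical. Register extension bits and the extra source register are passed as logical values and inverted on the way out. Choosing the 2-byte form whenever X, B and W are zero and the escape is 0F is an assumption about what a typical assembler would do, not something the text prescribes.

```python
M_FOR_ESCAPE = {"0F": 1, "0F 38": 2, "0F 3A": 3}
PP_FOR_PREFIX = {None: 0, "66": 1, "F3": 2, "F2": 3}


def encode_vex(escape="0F", prefix=None, vector_bits=128,
               src2=0, r=0, x=0, b=0, w=0) -> bytes:
    """Build a VEX prefix from logical field values (illustrative sketch)."""
    m = M_FOR_ESCAPE[escape]
    pp = PP_FOR_PREFIX[prefix]
    if vector_bits not in (128, 256):
        raise ValueError("vector length must be 128 or 256 bits")
    if not 0 <= src2 <= 15:
        raise ValueError("source register index must be 0..15")
    l_bit = 1 if vector_bits == 256 else 0
    inv_v = (~src2) & 0xF                       # v is stored inverted
    inv_r, inv_x, inv_b = r ^ 1, x ^ 1, b ^ 1   # R, X, B are stored inverted
    low = (inv_v << 3) | (l_bit << 2) | pp
    # Assumed policy: use the short form when nothing requires the long one.
    if m == 1 and x == 0 and b == 0 and w == 0:
        return bytes([0xC5, (inv_r << 7) | low])
    byte1 = (inv_r << 7) | (inv_x << 6) | (inv_b << 5) | m
    return bytes([0xC4, byte1, (w << 7) | low])


if __name__ == "__main__":
    print(encode_vex().hex())                       # no prefix, 0F map, 128-bit
    print(encode_vex("0F 38", "66", 256).hex())     # 66 prefix, 0F 38 map, 256-bit
```

With these inputs the first call yields the 2-byte form and the second is forced into the 3-byte form, because the 0F 38 map can only be expressed through the m field.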
The VEX opcode bytes, C4h and C5h, are the same as those used by the LES and LDS instructions, respectively. These instructions are not supported in 64-bit mode, while in 32-bit mode the ModRM byte that follows them cannot be of the form 11xxxxxx (which would specify a register operand). Various bits are inverted to ensure that the second byte of a VEX prefix is always of this form in 32-bit mode.
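The disambiguation rule described above amounts to a test on the top two bits of the byte after C4h or C5h. A minimal sketch, with a hypothetical function name and a boolean `long_mode` parameter standing in for the processor mode:

```python
def classify_c4_c5(first: int, second: int, long_mode: bool) -> str:
    """Decide whether C4h/C5h starts a VEX prefix or a legacy LES/LDS."""
    if first not in (0xC4, 0xC5):
        raise ValueError("first byte must be C4h or C5h")
    if long_mode:
        return "VEX"                 # LES/LDS do not exist in 64-bit mode
    if (second & 0xC0) == 0xC0:
        return "VEX"                 # 11xxxxxx is not a valid LES/LDS ModRM
    return "LES" if first == 0xC4 else "LDS"


if __name__ == "__main__":
    print(classify_c4_c5(0xC5, 0xF8, long_mode=False))
    print(classify_c4_c5(0xC5, 0x78, long_mode=False))
    print(classify_c4_c5(0xC5, 0x78, long_mode=True))
```

The second byte 78h has its top (inverted R) bit clear, which would select a register numbered 8 or higher. In 32-bit mode that pattern is read as a legacy instruction, which is why only the first 8 registers are reachable there.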
Instructions that need more than three operands have an extra suffix byte specifying one or two additional register operands. Instructions coded with the VEX prefix can have up to five operands. At most one of the operands can be a memory operand, and at most one of the operands can be an immediate constant of 4 or 8 bits. The remaining operands are registers.
The AVX instruction set is the first instruction set extension to use the VEX coding scheme. The AVX instructions have up to four operands. The AVX instruction set allows the VEX prefix to be applied only to instructions using the SIMD XMM registers. However, the VEX coding scheme has space for applying the VEX prefix to other instructions as well in future instruction sets.
Legacy SIMD instructions with a VEX prefix added are equivalent to the same instructions without VEX prefix with the following differences:
- The VEX-encoded instruction can have one more operand, making it non-destructive.
- A 128-bit XMM instruction without VEX prefix leaves the upper half of the full 256-bit YMM register unchanged, while the VEX-encoded version sets the upper half to zero.
Instructions that use the whole 256-bit YMM register should not be mixed with non-VEX instructions that leave the upper half of the register unchanged, for reasons of efficiency.
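The difference in how the upper half is treated can be modeled with plain integers. This is only a toy model: a 256-bit YMM register is represented as a Python `int`, and the two hypothetical functions stand for "write a 128-bit result with a legacy encoding" and "write a 128-bit result with a VEX encoding".

```python
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1
UPPER128 = MASK256 ^ MASK128   # bits 128..255


def write_low128_legacy(ymm: int, result: int) -> int:
    """Legacy 128-bit write: upper half of the YMM register is preserved."""
    return (ymm & UPPER128) | (result & MASK128)


def write_low128_vex(ymm: int, result: int) -> int:
    """VEX-encoded 128-bit write: upper half of the YMM register is zeroed."""
    return result & MASK128


if __name__ == "__main__":
    old = (0xAAAA << 128) | 0x1111
    print(hex(write_low128_legacy(old, 0x2222)))
    print(hex(write_low128_vex(old, 0x2222)))
```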
History
- In August 2007, AMD proposed the SSE5 instruction set extension, which includes a new coding scheme for instructions with three operands using an extra byte named DREX. It was intended for the Bulldozer processor core, due to begin production in 2011.[2][3]
- In March 2008, Intel proposed the AVX instruction set, using the new VEX coding scheme.[4]
- In August 2008, commentators deplored the expected incompatibility between AMD and Intel instruction sets, and proposed that AMD revise their plans and replace the DREX scheme with the more flexible and extensible VEX scheme.[5]
- In May 2009, AMD announced a revision of the proposed SSE5 instruction set to make it compatible with the AVX instruction set and the VEX coding scheme. The revised SSE5 is called XOP.[6]
- In January 2011, support for the AVX instruction set arrived in Intel's Sandy Bridge microprocessor architecture.
- In 2011, support for the AVX, XOP and FMA4 instruction sets, all using the VEX scheme, arrived in the AMD Bulldozer processor.[7]
- In 2013, support for the FMA3 instruction set arrived in Intel Haswell processors.
References
- [1] Intel Corporation (January 2009). "Intel Advanced Vector Extensions Programming Reference".
- [2] "128-Bit SSE5 Instruction Set". AMD Developer Central. Retrieved 2009-06-02.
- [3] Hruska, Joel (November 14, 2008). "AMD Fusion now pushed back to 2011". Ars Technica.
- [4] "Intel Software Network". Intel. Retrieved 2008-04-05.
- [5] "AMD and Intel incompatible - What to do?". AMD Developer Forums. Retrieved 2012-08-10.
- [6] "AMD64 Architecture Programmer’s Manual Volume 4: 128-Bit and 256-Bit Media Instructions" (PDF). AMD. December 22, 2010.
- [7] "Striking a balance". Dave Christie, AMD Developer blogs. Retrieved 2012-08-10.