Orthogonal instruction set
From Wikipedia, the free encyclopedia
Orthogonal instruction set is a term used in computer engineering. A computer's instruction set is said to be orthogonal if any instruction can use data of any type via any addressing mode. Generally, having an orthogonal instruction set is a desirable attribute for any computer architecture.
The definition of orthogonality
All computer architectures define the set of basic (fundamental) instructions that a computer conforming to that architecture must be capable of processing. These instructions define the basic operations that the computer can perform and the types of data that the computer can process. For example, a computer may be able to add and subtract, and it can process data that is in its registers and in its main memory. It does not follow, however, that the computer can perform additions both on data in its registers and on data in its main memory; in fact, many well-known computers cannot do both.
Specifically, a computer's instruction set is said to be orthogonal if any instruction can use data of any type via any addressing mode. This terminology results from considering a computer instruction as a bit vector whose components can be divided into several fields:
- The instruction's operation code (opcode), specifying the operation to be performed (such as "add two numbers")
- The type of data to be operated upon (such as a "one byte signed integer")
- The sources and/or destinations for the data to be operated upon (such as "data from Register 1 and Register 2 with the result to be stored at memory address 12345")
If each of these fields can take any of its possible values regardless of the values chosen for the other fields, the instruction set is said to be "orthogonal". (The term comes from viewing the several fields as individual dimensions in space; if each dimension can vary fully without affecting any other dimension, the dimensions (axes) are said to be orthogonal to each other.)
However, if it turns out that, for example, floating point numbers can only be added when contained in Registers 0, 1, and 2, but integer numbers can be added when contained in any register, that computer's instruction set is said to not be orthogonal because the programmer's choice of data type (floating versus integer) affects which registers the programmer can then choose to use.
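The distinction can be sketched in a few lines of Python. The toy machine below is hypothetical (the opcode names, the two data types, and the eight registers are assumptions made for illustration); it treats the fields as independent axes and checks whether every combination is legal, using the floating-point restriction described above as the non-orthogonal case.

```python
from itertools import product

# Hypothetical toy machine: these names are illustrative, not a real ISA.
OPCODES = ["ADD", "SUB"]
DATA_TYPES = ["int", "float"]
REGISTERS = list(range(8))  # registers 0..7

def all_combinations():
    """Every (opcode, type, register) triple: the full Cartesian product."""
    return set(product(OPCODES, DATA_TYPES, REGISTERS))

def legal_restricted(opcode, dtype, reg):
    """The rule from the text: floats only in registers 0, 1, and 2."""
    return dtype == "int" or reg in (0, 1, 2)

def is_orthogonal(is_legal):
    """Orthogonal here means every combination in the product is legal."""
    return all(is_legal(*combo) for combo in all_combinations())

orthogonal_machine = is_orthogonal(lambda op, dtype, reg: True)
restricted_machine = is_orthogonal(legal_restricted)
legal_count = sum(legal_restricted(*c) for c in all_combinations())

print(orthogonal_machine)                          # True
print(restricted_machine)                          # False
print(legal_count, "of", len(all_combinations()))  # 22 of 32
```

In the restricted machine, 10 of the 32 combinations are unavailable, so the choice of data type constrains the choice of register.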
The analogy is to a set of mathematical basis vectors: just as each coordinate along independent basis vectors can be chosen freely, an orthogonal instruction set lets every combination of opcode, data type, register, and addressing mode be encoded independently of the others.
Orthogonality in practice
In many CISC computers, an instruction could access either registers or memory, usually in several different ways. This made the CISC machines easier to program: rather than remembering thousands of individual instruction opcodes, a programmer working with an orthogonal instruction set had to remember just thirty to a hundred operation codes ("ADD", "SUBTRACT", "MULTIPLY", "DIVIDE", etc.) and a set of three to ten addressing modes ("FROM REGISTER 0", "FROM REGISTER 1", "FROM MEMORY", etc.). The DEC PDP-11 and Motorola 68000 computer architectures are examples of nearly orthogonal instruction sets.
The PDP-11
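The saving can be illustrated with simple arithmetic. The figures below are assumptions picked from within the ranges quoted above, and the count of separate instructions assumes a two-operand machine in which each pairing of an operation with a source mode and a destination mode would otherwise need its own opcode.

```python
# Illustrative figures chosen from the ranges quoted above (assumptions).
operations = 50        # within "thirty to a hundred operation codes"
addressing_modes = 8   # within "three to ten addressing modes"

# Orthogonal: learn each operation and each mode once, then combine freely.
items_to_learn_orthogonal = operations + addressing_modes

# Without orthogonality, assuming two operands per instruction, every
# (operation, source mode, destination mode) pairing is a separate opcode.
distinct_instructions = operations * addressing_modes * addressing_modes

print(items_to_learn_orthogonal)  # 58
print(distinct_instructions)      # 3200
```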
With the exception of its floating point instructions, the PDP-11 was very strongly orthogonal. Every integer instruction could operate on either 1-byte or 2-byte integers and could access data stored in registers, stored as part of the instruction, stored in memory, or stored in memory and pointed to by addresses in registers. Even the program counter (PC) and the stack pointer could be affected by the ordinary instructions using all of the ordinary addressing modes.
The MC68000
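As a sketch of how this orthogonality appears at the bit level, the following Python function splits a 16-bit PDP-11 double-operand instruction word into its fields: a byte/word bit, a 3-bit operation code, and a 3-bit mode plus 3-bit register for each of the source and destination. The function name and dictionary keys are illustrative choices; note also that bit 15 selects the byte form for most, but not all, double-operand operations.

```python
def decode_double_operand(word):
    """Split a 16-bit PDP-11 double-operand instruction word into fields."""
    return {
        "byte": (word >> 15) & 0o1,      # byte variant for most operations
        "opcode": (word >> 12) & 0o7,    # e.g. 1 = MOV
        "src_mode": (word >> 9) & 0o7,   # source addressing mode
        "src_reg": (word >> 6) & 0o7,    # source register (7 is the PC)
        "dst_mode": (word >> 3) & 0o7,   # destination addressing mode
        "dst_reg": word & 0o7,           # destination register
    }

mov_r0_r1 = decode_double_operand(0o010001)      # MOV R0, R1
movb_autoinc = decode_double_operand(0o112203)   # MOVB (R2)+, R3
mov_immediate = decode_double_operand(0o012700)  # MOV #n, R0

print(mov_r0_r1)
print(movb_autoinc)
print(mov_immediate)
```

The last example shows data "stored as part of the instruction": an immediate operand is simply the autoincrement mode (mode 2) applied to register 7, the program counter, rather than a separate kind of instruction.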
By comparison, Motorola's designers attempted to make the assembly language orthogonal while the underlying machine language was somewhat less so. Unlike the PDP-11, the MC68000 used separate registers to store data and the addresses of data in memory; the assembly language "hid" some of this separation from the programmer. Many programmers disliked the "near" orthogonality, while others were grateful for the attempt.
At the bit level, the person writing the assembler (or debugging machine code) would clearly see that a single assembly-language "instruction" could become any of several different op-codes. It was quite a good compromise because it gave almost the same convenience as a truly orthogonal machine, and yet also gave the CPU designers freedom to use the bits in the instructions more efficiently than a purely orthogonal approach might have.
The 8080 and follow-on designs
By comparison, the Intel 8080 8-bit microprocessor was a highly non-orthogonal machine. An assembly-language programmer needed to be constantly mindful of what operations were legal on which registers. Most operations could only be performed on data in the A (accumulator) register while other operations could only be performed on the H/L pair and so forth. This was probably a necessary tradeoff given that the 8080's 8-bit instruction word could only encode a total of 256 machine language instructions (compared to the 65,536 possibilities available with the PDP-11 or the 4 billion instructions available with the MC68000). In the interest of program compatibility, the 8086 family maintained much of this non-orthogonality even though it led to an instruction set that some computer scientists derided as being "baroque".
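The encoding counts quoted above follow from the number of bits available. The short sketch below assumes that the "4 billion" figure corresponds to 32 bits of instruction encoding; the text itself does not state the width.

```python
def encodings(bits):
    """Number of distinct bit patterns in an instruction word of this width."""
    return 2 ** bits

print(encodings(8))    # 256
print(encodings(16))   # 65536
print(encodings(32))   # 4294967296
```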
Into the RISC age
A fully orthogonal architecture may not be the most "bit efficient" architecture. In the late 1970s, research at IBM (and similar projects elsewhere) demonstrated that the majority of these "orthogonal" addressing modes were ignored by most programs. Some of the bits used to express the fully orthogonal instruction set could perhaps instead be used to express more virtual address bits or to select from among more registers.
In the RISC age, computer designers strove to achieve a balance that they thought better. In particular, most RISC computers, while still highly orthogonal with regard to which instructions can process which data types, have reverted to "load/store" architectures. In these architectures, only a very few memory reference instructions can access main memory, and only for the purpose of loading data into registers or storing register data back into main memory; only a few addressing modes may be available, and these modes may vary depending on whether the instruction refers to data or involves a transfer of control (jump). All other instructions in the computer's instruction set require their data to be in registers before they can operate on it. This tradeoff is made explicitly to enable the use of much larger register sets, extended virtual addresses, and longer immediate data (data stored directly within the computer instruction).
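The load/store discipline can be sketched with a tiny interpreter. The machine below is hypothetical (the instruction names, tuple format, and register count are assumptions made for illustration): only LOAD and STORE touch memory, while ADD works purely on registers.

```python
def run(program, memory):
    """Execute a list of instruction tuples on a toy load/store machine."""
    regs = [0] * 8
    for op, *args in program:
        if op == "LOAD":       # LOAD rd, addr: memory -> register
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":    # STORE rs, addr: register -> memory
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":      # ADD rd, ra, rb: registers only
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        else:
            raise ValueError(f"unknown operation: {op}")
    return regs, memory

memory = {100: 2, 104: 3, 108: 0}
program = [
    ("LOAD", 1, 100),     # r1 <- memory[100]
    ("LOAD", 2, 104),     # r2 <- memory[104]
    ("ADD", 3, 1, 2),     # r3 <- r1 + r2
    ("STORE", 3, 108),    # memory[108] <- r3
]
regs, memory = run(program, memory)
print(memory[108])  # 5
```

Adding two values held in memory takes four instructions here, whereas a machine with memory operands on ADD could do it in one; the sketch does not model the encoding bits that the restriction frees for other uses.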