Addressing mode

From Wikipedia, the free encyclopedia

Addressing modes, a concept from computer science, are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The addressing modes of a given instruction set architecture define how machine language instructions in that architecture identify the operand (or operands) of each instruction. An addressing mode specifies how to calculate the effective memory address of an operand by using information held in registers and/or constants contained within a machine instruction or elsewhere.

In computer programming, addressing modes are primarily of interest to compiler writers and to those who write code directly in assembly language.

Caveats

There is no generally accepted way of naming the various addressing modes. In particular, different authors and/or computer manufacturers may give different names to the same addressing mode, or the same names to different addressing modes. Furthermore, functionality that one architecture treats as a single addressing mode may, in another architecture, be covered by two or more addressing modes. For example, some complex instruction set computer (CISC) architectures, such as the Digital Equipment Corporation (DEC) VAX, treat registers and literal/immediate constants as just another addressing mode. Others, such as the IBM System/390 and most reduced instruction set computer (RISC) designs, encode this information within the instruction code. Thus, the latter machines have three distinct instruction codes for copying one register to another, copying a literal constant into a register, and copying the contents of a memory location into a register, while the VAX has only a single "MOV" instruction.

The addressing modes listed below are divided into code addressing and data addressing. Most computer architectures maintain this distinction, but there are, or have been, some architectures which allow (almost) all addressing modes to be used in any context.

The instructions shown below are purely representative in order to illustrate the addressing modes, and do not necessarily apply to any particular computer.

Useful side effect

Some computers have a Load Effective Address instruction. This calculates the effective operand address but, instead of acting on that memory location, loads the address that would have been accessed into a register. This can be useful when passing the address of an array element to a subroutine. It can also be a way of doing more calculation than normal in one instruction; for example, using it with the addressing mode 'base+index+offset' adds two registers and a constant together in a single instruction.
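The difference between an ordinary load and a load of the effective address can be sketched with a toy model. This is only an illustration: the register names, the memory contents, and the function names effective_address, load, and load_effective_address are all made up here and do not correspond to any particular instruction set.

```python
# Toy machine: registers and memory are plain Python dictionaries (assumed model).
registers = {"r1": 1000, "r2": 24, "r3": 0}
memory = {1032: 77}

def effective_address(base, index, offset):
    """base+index+offset addressing: sum two registers and a constant."""
    return registers[base] + registers[index] + offset

def load(dest, base, index, offset):
    """Ordinary load: fetch the word stored at the effective address."""
    registers[dest] = memory[effective_address(base, index, offset)]

def load_effective_address(dest, base, index, offset):
    """Keep the computed address itself; memory is never touched."""
    registers[dest] = effective_address(base, index, offset)

load("r3", "r1", "r2", 8)
loaded_value = registers["r3"]        # contents of location 1032

load_effective_address("r3", "r1", "r2", 8)
loaded_address = registers["r3"]      # r1 + r2 + 8, computed in one step
```

In this sketch the second call never reads memory, which is why the same mechanism can double as a three-operand add.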

How many address modes?

Different computer architectures vary greatly as to the number of addressing modes they provide. At the cost of a few extra instructions, and perhaps an extra register, it is normally possible to use the simpler addressing modes instead of the more complicated modes. It has proven much easier to design pipelined CPUs if the only addressing modes available are simple ones.

Most RISC machines have only about five simple addressing modes, while CISC machines such as the DEC VAX supermini have over a dozen addressing modes, some of which are quite complicated. The IBM System/360 mainframe had only three addressing modes; a few more have been added for the System/390.

When there are only a few addressing modes, the particular addressing mode required is usually encoded within the instruction code (e.g. IBM System/390, most RISC). But when there are many addressing modes, a specific field is often set aside in the instruction to specify the addressing mode. The DEC VAX allowed multiple memory operands for almost all instructions and so reserved the first few bits of each operand specifier to indicate the addressing mode for that particular operand.

Even on a computer with many addressing modes, measurements of actual programs indicate that the simple addressing modes listed below account for some 90% or more of all addressing modes used. Since most such measurements are based on code generated from high-level languages by compilers, this may reflect to some extent the limitations of the compilers being used.

Simple addressing modes for code

Absolute

   +----+------------------------------+
   |jump|           address            | 
   +----+------------------------------+

Effective address = address as given in instruction.

Program relative

   +------+-----+-----+----------------+
   |jumpEQ| reg1| reg2|         offset |    jump relative if reg1=reg2
   +------+-----+-----+----------------+

Effective address = offset plus address of next instruction.

The offset is usually signed; a 16-bit offset field, for example, covers the range -32768 to +32767.

This is particularly useful in connection with conditional jumps, because you usually only want to jump to some nearby instruction (in a high-level language most if or while statements are reasonably short). Measurements of actual programs suggest that an 8 or 10 bit offset is large enough for some 90% of conditional jumps.

Another advantage of program-relative addressing is that the code may be position-independent, i.e. it can be loaded anywhere in memory without the need to adjust any addresses.
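The effective-address rule for program-relative jumps can be sketched as follows. This is a minimal model under stated assumptions: a 4-byte instruction size and a 16-bit two's-complement offset field are chosen only for illustration, and the function names sign_extend and branch_target are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

def branch_target(pc, instruction_size, offset_field, offset_bits=16):
    """Effective address = signed offset + address of the next instruction."""
    next_instruction = pc + instruction_size
    return next_instruction + sign_extend(offset_field, offset_bits)

# Assumed 4-byte instructions; the offset field holds the raw 16-bit pattern.
forward = branch_target(pc=0x1000, instruction_size=4, offset_field=0x0010)
backward = branch_target(pc=0x1000, instruction_size=4, offset_field=0xFFF0)

# The same instruction loaded at a different address jumps the same distance.
relocated = branch_target(pc=0x5000, instruction_size=4, offset_field=0x0010)
```

Because only the distance is encoded, forward - 0x1000 and relocated - 0x5000 are equal, which is the position-independence property described above.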

Register indirect

   +-------+-----+
   |jumpVia| reg | 
   +-------+-----+

Effective address = contents of specified register.

The effect is to transfer control to the instruction whose address is in the specified register. Such an instruction is often used for returning from a subroutine call, since the actual call would usually have placed the return address in a register.

Simple addressing modes for data

Register

   +------+-----+-----+-----+
   | mul  | reg1| reg2| reg3|      reg1 := reg2 * reg3;
   +------+-----+-----+-----+

This 'addressing mode' does not have an effective address and is not considered to be an addressing mode on some computers.

In this example, all the operands are in registers, and the result is placed in a register.

Base plus offset, and variations

   +------+-----+-----+----------------+
   | load | reg | base|     offset     | 
   +------+-----+-----+----------------+

Effective address = offset plus contents of specified base register.

The offset is usually a signed 16-bit value, though the 80386 expanded it to 32 bits; x86-64 kept the 32-bit offset rather than widening it further.

If the offset is zero, this becomes an example of register indirect addressing; the effective address is just that in the base register.

On many RISC machines, register 0 is fixed at the value 0. If register 0 is used as the base register, this becomes an example of absolute addressing. However, only a small portion of memory can be accessed this way (the first 32 Kbytes and possibly the last 32 Kbytes).

The 16-bit offset may seem very small in relation to the size of current computer memories, which is why the 80386 expanded it to 32 bits (x86-64 did not widen it further). It could be worse: IBM System/360 mainframes have only a positive 12-bit offset, 0 to 4095. However, the principle of locality of reference applies: over a short time span, most of the data items you wish to access are fairly close to each other.

Example 1: Within a subroutine you will mainly be interested in the parameters and the local variables, which will rarely exceed 64 Kbytes, for which one base register suffices. If this routine is a class method in an object-oriented language, you will need a second base register pointing at the attributes for the current object (this or self in some high level languages).

Example 2: If the base register contains the address of a record or structure, the offset can be used to select a field from that record (most records/structures are less than 32 Kbytes in size).
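Example 2 can be sketched in a few lines. The record layout below (field names and byte offsets), the little-endian 4-byte word, and the helper names store_word and load_word are assumptions made for this illustration only.

```python
# Byte-addressed toy memory; the record layout is invented for this sketch.
memory = bytearray(64)
FIELD_OFFSETS = {"x": 0, "y": 4, "flags": 8}    # byte offsets within one record

def store_word(address, value):
    """Write a 4-byte little-endian word (assumed word size and byte order)."""
    memory[address:address + 4] = value.to_bytes(4, "little")

def load_word(base, offset):
    """Effective address = offset + contents of the base register."""
    address = base + offset
    return int.from_bytes(memory[address:address + 4], "little")

record_base = 16                                 # value held in the base register
store_word(record_base + FIELD_OFFSETS["y"], 1234)

y_value = load_word(record_base, FIELD_OFFSETS["y"])
# With an offset of zero the mode degenerates to register indirect addressing.
same_via_indirect = load_word(record_base + 4, 0)
```

The base register is set up once per record, and each field is then reached with a small constant offset encoded in the instruction.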

Immediate/literal

   +------+-----+-----+----------------+
   | add  | reg1| reg2|    constant    |    reg1 := reg2 + constant;
   +------+-----+-----+----------------+

This 'addressing mode' does not have an effective address, and is not considered to be an addressing mode on some computers.

The constant might be signed or unsigned.

Instead of using an operand from memory, the value of the operand is held within the instruction itself. On the DEC VAX machine, the literal operand sizes could be 6, 8, 16, or 32 bits long.

Other addressing modes for code and/or data

Absolute/Direct

   +------+-----+--------------------------------------+
   | load | reg |         address                      | 
   +------+-----+--------------------------------------+

Effective address = address as given in instruction.

This requires space in an instruction for quite a large address. It is often available on CISC machines which have variable length instructions.

Some RISC machines have a special Load Upper Literal instruction which places a 16-bit constant in the top half of a register. An OR literal instruction can be used to insert a 16-bit constant in the lower half of that register, so that a full 32-bit address can then be used via the register-indirect addressing mode, which itself is provided as 'base-plus-offset' with an offset of 0.
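The two-instruction sequence for building a full 32-bit address can be modelled roughly as below. The function names load_upper_literal and or_literal are hypothetical stand-ins for the instructions described above, and the example address is arbitrary.

```python
MASK32 = 0xFFFFFFFF

def load_upper_literal(constant16):
    """Place a 16-bit constant in the top half of a 32-bit register; low half is zero."""
    return ((constant16 & 0xFFFF) << 16) & MASK32

def or_literal(register_value, constant16):
    """OR a zero-extended 16-bit constant into the low half of the register."""
    return (register_value | (constant16 & 0xFFFF)) & MASK32

address = 0x12348ABC                      # arbitrary 32-bit address for the example
reg = load_upper_literal(address >> 16)   # reg now holds the upper half, low half zero
upper_only = reg
reg = or_literal(reg, address & 0xFFFF)   # reg now holds the full address
```

An OR with a zero-extended literal is used in this sketch; if the second step were an add of a sign-extended literal instead, the upper half would need adjusting whenever bit 15 of the lower half is set, as it is in this example.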

Indexed absolute

   +------+-----+-----+--------------------------------+
   | load | reg |index|  32-bit address                | 
   +------+-----+-----+--------------------------------+

Effective address = address plus contents of specified index register.

This also requires space in an instruction for quite a large address. The address could be the start of an array or vector, and the index could select the particular array element required. The index register may need to have been scaled to allow for the size of each array element.

Note that this is more or less the same as base-plus-offset addressing mode, except that the offset in this case is large enough to address any memory location.

Base plus index

   +------+-----+-----+-----+
   | load | reg | base|index| 
   +------+-----+-----+-----+

Effective address = contents of specified base register plus contents of specified index register.

The base register could contain the start address of an array or vector, and the index could select the particular array element required. The index register may need to have been scaled to allow for the size of each array element. This could be used for accessing elements of an array passed as a parameter.

Base plus index plus offset

   +------+-----+-----+-----+----------------+
   | load | reg | base|index|  16-bit offset | 
   +------+-----+-----+-----+----------------+

Effective address = offset plus contents of specified base register plus contents of specified index register.

The base register could contain the start address of an array or vector of records, the index could select the particular record required, and the offset could select a field within that record. The index register may need to have been scaled to allow for the size of each record.
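As a rough sketch of the array-of-records case, the calculation might look like this. The record size, the field offsets, and the base address are invented for the example, and effective_address is a hypothetical helper name.

```python
RECORD_SIZE = 12                                  # bytes per record (assumed)
FIELD_OFFSET = {"id": 0, "score": 4, "next": 8}   # byte offsets within a record (assumed)

def effective_address(base, index, offset):
    """Effective address = offset + base register + index register."""
    return base + index + offset

array_base = 0x2000                      # base register: start of the array
record_number = 3
index = record_number * RECORD_SIZE      # index register, scaled in software beforehand

ea = effective_address(array_base, index, FIELD_OFFSET["score"])
```

Here the multiplication by the record size is done by the program before the access, which is the software scaling the paragraph above refers to.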

Scaled

   +------+-----+-----+-----+
   | load | reg | base|index| 
   +------+-----+-----+-----+

Effective address = contents of specified base register plus scaled contents of specified index register.

The base register could contain the start address of an array or vector, and the index could contain the number of the particular array element required.

This addressing mode dynamically scales the value in the index register to allow for the size of each array element, e.g. if the array elements are double precision floating-point numbers occupying 8 bytes each then the value in the index register is multiplied by 8 before being used in the effective address calculation. The scale factor is normally restricted to being a power of two so that shifting rather than multiplication can be used (shifting is usually faster than multiplication).
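The shift-based scaling can be sketched as follows. The function name scaled_effective_address and the base address are assumptions for this example; the power-of-two check mirrors the restriction described above.

```python
def scaled_effective_address(base, index, element_size):
    """Effective address = base + index scaled by the element size."""
    if element_size <= 0 or element_size & (element_size - 1):
        raise ValueError("scale factor must be a power of two")
    shift = element_size.bit_length() - 1     # e.g. 8-byte elements -> shift left by 3
    return base + (index << shift)

# Element 5 of an array of 8-byte double precision numbers starting at 0x4000.
ea = scaled_effective_address(base=0x4000, index=5, element_size=8)
```

The index register holds the plain element number, and the shift replaces the multiplication that software would otherwise have to perform.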

Register indirect

   +------+-----+-----+
   | load | reg | base| 
   +------+-----+-----+

Effective address = contents of base register.

A few computers have this as a distinct addressing mode. Many computers just use base plus offset with an offset value of 0.

Register autoincrement indirect

   +------+-----+-----+
   | load | reg | base| 
   +------+-----+-----+

Effective address = contents of base register.

After determining the effective address, the value in the base register is incremented by the size of the data item that is to be accessed.

Within a loop, this addressing mode can be used to step through all the elements of an array or vector. A stack can be implemented by using this in conjunction with the next addressing mode (autodecrement).

In high-level languages it is often thought to be a good idea that functions which return a result should not have side effects (lack of side effects makes program understanding and validation much easier). This addressing mode has a side effect in that the base register is altered. If the subsequent memory access causes an error (e.g. page fault, bus error, address error) leading to an interrupt, then restarting the instruction becomes much more problematic since one or more registers may need to be set back to the state they were in before the instruction originally started.

There have been at least two computer architectures which have had implementation problems with regard to recovery from interrupts when this addressing mode is used:

  • Motorola 68000. Could have one or two autoincrement register operands. The 68010+ resolved the problem by saving the processor's internal state on bus or address errors.
  • DEC VAX. Could have up to 6 autoincrement register operands. Each operand access could cause 2 page faults (if operands happened to straddle a page boundary). Of course the instruction itself could be over 50 bytes long and might straddle a page boundary as well!

Autodecrement register indirect

   +------+-----+-----+
   | load | reg | base| 
   +------+-----+-----+

Before determining the effective address, the value in the base register is decremented by the size of the data item which is to be accessed.

Effective address = new contents of base register.

Within a loop, this addressing mode can be used to step backwards through all the elements of an array or vector. A stack can be implemented by using this in conjunction with the previous addressing mode (autoincrement).

See also the discussion on side-effects under the autoincrement addressing mode.
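A stack built from the autodecrement and autoincrement modes can be sketched like this. The 4-byte word size, the downward-growing stack, the register name sp, and the function names are assumptions chosen for the illustration.

```python
WORD = 4                         # size of the data item in bytes (assumed)
memory = {}
registers = {"sp": 0x8000}

def store_autodecrement(base, value):
    """Decrement the base register first, then use its new contents as the address."""
    registers[base] -= WORD
    memory[registers[base]] = value

def load_autoincrement(base):
    """Use the base register as the address, then increment it by the item size."""
    value = memory[registers[base]]
    registers[base] += WORD
    return value

store_autodecrement("sp", 11)        # push 11
store_autodecrement("sp", 22)        # push 22
first = load_autoincrement("sp")     # pop: last value pushed
second = load_autoincrement("sp")    # pop: value pushed before it
```

Both functions modify the base register as a side effect, which is exactly what complicates restarting an instruction after a fault.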

Memory indirect

Any of the addressing modes mentioned in this article could have an extra bit to indicate indirect addressing, i.e. the address calculated by using some addressing mode is the address of a location (often 32 bits or a complete word) which contains the actual effective address.

Indirect addressing may be used for code and/or data. It can make implementation of pointers or references very much easier, and can also make it easier to call subroutines which are not otherwise addressable. There is a performance penalty due to the extra memory access involved.
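The extra memory access can be made visible with a small model. The memory contents, the access counter, and the function names read and load_indirect are invented for this sketch.

```python
memory = {
    0x100: 0x200,    # pointer cell: holds the actual effective address
    0x200: 42,       # the operand itself
}
access_count = 0

def read(address):
    """Read one memory cell and count the access."""
    global access_count
    access_count += 1
    return memory[address]

def load_indirect(address):
    """The computed address names a cell that contains the effective address."""
    effective_address = read(address)     # extra memory access
    return read(effective_address)        # access to the operand itself

value = load_indirect(0x100)
indirect_accesses = access_count          # two reads for one operand
```

A direct load of location 0x200 would need a single read; the indirect form doubles that in this model, which is the performance penalty mentioned above.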

Some early minicomputers (e.g. DEC PDP-8, Data General Nova) had only a few registers and only a limited addressing range (8 bits). Hence the use of memory indirect addressing was almost the only way of referring to any significant amount of memory.

PC-based addressing

The x86-64 architecture supports RIP-based addressing, which uses the 64-bit program counter (instruction pointer) RIP as a base register. This allows for position-independent code.

Obsolete addressing modes

The addressing modes listed here were used between 1950 and 1980, but most are no longer available on current computers. This list is by no means complete; many other interesting or peculiar addressing modes have been used from time to time, e.g. absolute plus logical OR of 2 or 3 index registers.

Multi-level memory indirect

If the word size is larger than the address size, then the word referenced for memory-indirect addressing could itself have an indirect flag set to indicate another memory indirect cycle. Care is needed to ensure that a chain of indirect addresses does not refer to itself; if it did, you could get an infinite loop while trying to resolve an address.

The DEC PDP-10 computer with 18-bit addresses and 36-bit words allowed multi-level indirect addressing with the possibility of using an index register at each stage as well.
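Following a chain of indirect words can be sketched as below. Each word is modelled as a pair (indirect flag, address field), which is a simplified layout assumed for this example rather than any machine's real word format; indexing at each stage is left out. The cycle check is an addition for illustration only, showing the self-referencing case the text warns about.

```python
# Each memory word is (indirect_flag, address_field); an assumed, simplified layout.
memory = {
    10: (True, 20),
    20: (True, 30),
    30: (False, 40),    # chain ends here: 40 is the effective address
    50: (True, 60),
    60: (True, 50),     # cycle: 50 -> 60 -> 50 -> ...
}

def resolve(address):
    """Follow indirect words until one has its indirect flag clear."""
    visited = set()
    indirect = True
    while indirect:
        if address in visited:
            raise RuntimeError("indirect chain refers to itself")
        visited.add(address)
        indirect, address = memory[address]
    return address

effective = resolve(10)       # walks 10 -> 20 -> 30 and yields 40

try:
    resolve(50)
    looped = False
except RuntimeError:
    looped = True             # without the check this call would never return
```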

Memory-mapped registers

On some computers the registers were regarded as occupying the first 8 or 16 words of memory (e.g. ICL 1900, DEC PDP-10). This meant that there was no need for a separate 'Add register to register' instruction; you could just use the 'Add memory to register' instruction.

In the case of early models of the PDP-10, which did not have any cache memory, you could actually load a tight inner loop into the first few words of memory (the fast registers in fact), and have it run much faster than if it was in magnetic core memory.

Later models of the DEC PDP-11 series mapped the registers onto addresses in the input/output area, but this was primarily intended to allow remote diagnostics. Confusingly, the 16-bit registers were mapped onto consecutive 8-bit byte addresses.

Memory indirect, auto inc/dec

On some early minicomputers (e.g. DEC PDP-8, Data General Nova), there were typically 16 special memory locations. When accessed via memory indirect addressing, 8 would automatically increment after use and 8 would automatically decrement after use. This made it very easy to step through memory in loops without using any registers.

Zero page

In the MOS Technology 6502, the first 256 bytes of memory (the zero page) could be accessed very rapidly. The reason was that the 6502 had few registers that were not special-function registers, so the zero page served in their place. A zero-page access used an 8-bit address, saving one clock cycle compared with using a 16-bit address. An operating system would use much of the zero page, so it was not as useful as it might have seemed.

Scaled index with bounds checking

This is similar to scaled index addressing, except that the instruction has two extra operands (typically constants), and the hardware would check that the index value was between these bounds.

Another variation uses vector descriptors to hold the bounds; this makes it easy to implement dynamically allocated arrays and still have full bounds checking.
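A rough software model of the bounds-checked form is shown below. Treating the bounds as inclusive, raising IndexError to stand in for a hardware trap, and the function name checked_scaled_address are all assumptions of this sketch.

```python
def checked_scaled_address(base, index, element_size, lower, upper):
    """Scaled index addressing that traps when index lies outside lower..upper."""
    if not lower <= index <= upper:
        # Stand-in for the hardware trap; bounds are assumed to be inclusive.
        raise IndexError(f"index {index} outside bounds {lower}..{upper}")
    return base + index * element_size

# A 10-element array of 8-byte items starting at 0x1000.
ok = checked_scaled_address(0x1000, 3, 8, lower=0, upper=9)

try:
    checked_scaled_address(0x1000, 10, 8, lower=0, upper=9)
    trapped = False
except IndexError:
    trapped = True
```

In the descriptor variation, the base, bounds, and element size would come from the descriptor rather than from the instruction, but the check itself would be the same.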

Register indirect to byte within word

The DEC PDP-10 computer used 36-bit words. It had a special addressing mode which allowed memory to be treated as a sequence of bytes (bytes could be any size from 1 bit to 36 bits). A 1-word sequence descriptor held the current word address within the sequence, a bit position within a word, and the size of each byte.

Instructions existed to load and store bytes via this descriptor, and to increment the descriptor to point at the next byte (bytes were not split across word boundaries). Much DEC software used five 7-bit bytes per word (plain ASCII characters), with 1 bit unused per word. Implementations of C had to use four 9-bit bytes per word, since C assumes that you can access every bit of memory by accessing consecutive bytes.
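The idea of a sequence descriptor can be sketched as follows. This is a simplified model, not the PDP-10's actual descriptor encoding: the descriptor is a dictionary holding a word address, the number of bits to the left of the current byte, and the byte size, and the names pack_ascii, load_byte, and increment are hypothetical.

```python
WORD_BITS = 36

def pack_ascii(chars):
    """Pack up to five 7-bit characters into one 36-bit word, left-justified.
    Five bytes use 35 bits, so the lowest bit of the word stays unused."""
    word = 0
    for i, ch in enumerate(chars):
        word |= (ord(ch) & 0x7F) << (WORD_BITS - 7 * (i + 1))
    return word

def load_byte(memory, desc):
    """Fetch the byte that the descriptor currently points at."""
    shift = WORD_BITS - desc["bit"] - desc["size"]
    return (memory[desc["word"]] >> shift) & ((1 << desc["size"]) - 1)

def increment(desc):
    """Advance to the next byte; bytes are never split across a word boundary."""
    desc["bit"] += desc["size"]
    if desc["bit"] + desc["size"] > WORD_BITS:
        desc["word"] += 1
        desc["bit"] = 0

memory = [pack_ascii("HELLO"), pack_ascii("WORLD")]
desc = {"word": 0, "bit": 0, "size": 7}    # assumed descriptor layout

text = ""
for _ in range(10):
    text += chr(load_byte(memory, desc))
    increment(desc)
```

After the fifth byte only one bit remains in the word, so the increment step skips it and moves to the next word, which is the unused bit per word mentioned above.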

Index next instruction

The Elliott 503 computer had 39-bit words, only used absolute addressing, and did not have any index registers. To avoid the need for self-modifying programs, it had an instruction:

add the contents of this location to the address of the next instruction.

The effect of this was that any location could be used as an index, at the cost of reduced speed, of course.