Belt machine

In computer engineering and in programming language implementations, a belt machine is a real or emulated computer that uses a FIFO queue rather than individual machine registers to evaluate each sub-expression in the program. A belt computer is programmed with an instruction set that specifies arguments explicitly but results implicitly.[1]

The common alternative to belt machines are register machines, in which each instruction explicitly names the specific registers to use for operand argument and result locations. Belt machines are related to stack machines, which specify both arguments and results implicitly using a pushdown stack. Other alternatives are accumulator machines, which have only one visible general-purpose temp register, and memory-to-memory machines, which have no visible temp registers.

A belt machine implements temporary storage with a fixed-length FIFO queue, or "belt" by analogy to a conveyor belt. The operands of the arithmetic logic units (ALUs) and other functional units may be taken from any position on the belt, and the result from the computation is "dropped" (stored) in the front position of the belt, advancing the belt to make room. As the belt is fixed length, drops in the front are matched by older operands falling off the back; pushed-off operands become inaccessible and must be explicitly saved if they are still needed for later computation. Most operations of the instruction set work only with data on the belt, not on data registers or main memory cells.

For a typical instruction like Add, both argument operands come from explicitly named positions on the belt, and the result is dropped on the front, ready for the next instruction. Operations with multiple results simply drop more values at the belt front. Most belt instructions are encoded as just an opcode and two belt positions, with no additional fields to specify a result register, memory address, or literal constant. This encoding is easily extended to richer operations with more than two inputs or more than one result. Constant operands are dropped by separate Load Immediate instructions. All accessing of program variables in main memory RAM is segregated into separate Load or Store instructions containing one memory address or some way to calculate that address from belt operands.

All belt machines have variants of the load/store opcodes for accessing local variables and the heap. This can be by offsets from a pointer on the belt, or by offsets from various special-purpose base registers. Similarly there will be instructions to branch to an address taken from the belt, in addition to branches relative to the program counter.

Temporal addressing

Because each drop of a result moves the previous belt content along to later positions in the queue, a particular operand continually changes its position (and hence address) as a consequence of later execution. In effect, an access to the operand at position zero is a request for the most recent value dropped to the belt, while (for example) a reference to position five is to the sixth most recent drop. Thus the addresses of belt operands reflect the belt history back in time; this is temporal addressing. It is difficult for human programmers to keep track of belt contents, and hence operand addresses, when writing assembly code for a belt machine. However it is not difficult for a compiler to track the changing contents and emit the correct position addresses in generated code.

Spill and fill

The belt is fixed length and can may not be long enough to hold all live transient operands before they are pushed off the end. If an operand is needed for longer than its belt lifetime it must be saved while still on the belt (spill) and later restored to the belt when needed again (fill). This situation is equivalent to the need to spill registers to memory when a program runs out of registers in a general-register machine. Spilled operands may be written to memory using normal store instructions, and restored using normal load instructions, or spill and fill may use special-purpose storage and associated operations that are faster or offer other advantages over load and store.

Freedom from hazard

The operands on the belt are read-only; new results do not overwrite previous values. The belt is thus a single-assignment structure, and is immune to the data hazards that must be dealt with by modern out-of-order general-register machines..

Compact object code

Dense machine code was very valuable in the 1960s, when main memory was very expensive and very limited even on mainframes. It became important again on the initially-tiny memories of minicomputers and then microprocessors. Density remains important today, for smartphone applications, Java applications downloaded into browsers over slow Internet connections, and in ROMs for embedded applications. A more general advantage of increased density is improved effectiveness of caches and instruction prefetch.

Belt machines have smaller instructions than other common styles of machines, due to not requiring a destination address for results. This saving can make a significant difference for fixed-length instruction formats, which normally use power-of-two instruction widths. If there are thirty-two addressable elements (registers on a general-register machine, belt positions on a belt machine) then each element address occupies five bits in the instruction, needing 15 bits for the three-address format of a general-register machine, but only 10 bits using the two-address format of a belt machine. Because bits are also needed for opcode and other information in the instruction, the (power-of-two constrained) instruction width often determines the maximal number of addressable elements possible in a design. Typically a belt machine instruction can support the encoding of double the number of addressable elements compared to a general-register machine of the same instruction width. There are similar gains in variable-length instruction encodings.

Overall, belt machine code is not as compact as that for stack machines, which use no operand addresses at all but frequently must introduce stack-manipulation instructions that are not needed on a belt machine. The instructions for accumulator or memory-to-memory machines are not padded out with multiple register fields. Instead, they use compiler-managed anonymous variables for subexpression values. These temps require extra memory reference instructions which take more code space than for the belt machine. Consequently they produce compact code when used for small memory address spaces, but have no advantage over belt encoding in larger spaces.

Implementation

While a belt machine presents an operand queue as the program model, there is not necessarily a physical queue (shift register) in the implemented hardware. Instead, a belt design may use an implementation analogous to the register renaming common in modern general-register machines. Live data values are kept in conveniently addressable physical (individual registers, register files, SRAM, or operand forwarding) and generally not moved for the duration of their lifetime on the belt. Logic in decode maps logical belt positions in the instructions to physical locations; the mapping is updated to reflect the changes of logical position arising from newly dropped results.

The belt-machine architecture was created by startup Mill Computing, Inc. for use in their Mill family of general-purpose CPUs.[2]

References