Threaded code
From Wikipedia, the free encyclopedia
- For multi-threaded programming, see Thread (computer science).
In computer science, the term threaded code refers to a compiler implementation technique where the generated code has a form that essentially consists entirely of calls to subroutines. The code may be processed by an interpreter, or may simply be a sequence of machine code call instructions.
Compared to alternative code generation techniques, threaded code is often very compact, at the expense of slower execution speed. However, code that is sufficiently compact to fit in a computer processor's cache can be executed with fewer cache misses and improved performance. In these cases threaded code can be both smaller and faster than non-threaded code,
Threaded code is most well known as the implementation technique commonly used in the Forth. It was also used in early versions of the B programming languages, as well as many implementations of BASIC, and some implementations of COBOL and other languages for small minicomputers.
Contents |
[edit] History leading to threaded code
Early computers supported relatively little memory (16-bit machines can only address 64 kilobytes) and it was very expensive. Consequently a lot of time was spent trying to find ways to reduce the size of programs so they would fit in the memory available.
Instead of writing out every step of such operations in every part of the program where it was needed (possibly using a macro), programmers saved memory by writing out every step of such operations exactly once and placing it in a subroutine.
This process (refactoring) is still commonly used today by all programmers, although today they do it for different reasons.
The top-level application usually consists of nothing but subroutine calls. And many of those subroutines, in turn, also consist of nothing but subroutine calls.
Some early computers such as the RCA 1802 required several instructions to call a subroutine. In the top-level application and in many subroutines, that sequence is repeated over and over again, only the subroutine address changing from one call to the next. Using expensive memory to store the same thing over and over again seems wasteful -- is there any way to store this information exactly once?
[edit] Threaded code
To save even more space, programmers squeezed those lists of subroutine calls into simple lists of subroutine addresses (leaving out the "call" instruction on each one), and used a small interpreter (later called a virtual machine) to call each subroutine in turn. This is called Direct Threaded Code (DTC).
Charles H. Moore invented an even more compact notation in 1970 for his Forth virtual machine: indirect threaded code (ITC). Originally, Moore invented this because it was easy and fast on NOVA minicomputers, which have an indirection bit in every address. He said (in published remarks, Byte Magazine's Forth Issue) that he found it so convenient that he propagated it into all later Forth designs.
Some Forth compilers compile Forth programs into direct-threaded code, while others make indirect-threaded code. The programs act the same either way.
[edit] Later developments
Not content to stop there, programmers have developed other related techniques to make programs even more compact, although they are slower than threaded code.
The idea of program compression has led to the idea of Kolmogorov complexity.
[edit] Threading models
Practically all executable threaded code uses one or another of these methods for calling subroutines (each method is called a "threading model").
[edit] Indirect threading
The list of addresses in code point to an address at the start of a data area. The address at the start of the data area points at machine code to use the data. This "extra" pointer in indirect threading serves as an "executable data type" to define custom interpreters for data. The address at the start of each data area points to the code shared by all data areas of that type. More compact, yet slightly slower than direct-threaded code. Faster than byte code. No type interpretation is usually needed. Older Forth systems produced indirect-threaded code.
[edit] Direct threading
The addresses in the code are actually the address of machine language. This is a compromise between speed and space. The indirect data pointer is lost, at some loss in the language's flexibility, and this may need to be corrected by a type tag in the data areas, with an auxiliary table. Some Forth systems produce direct-threaded code. On many machines direct-threading is faster than subroutine threading (see reference below).
[edit] Subroutine threading
So-called "subroutine-threaded code" consists of a series of native machine language "call" instructions. This is not threaded code in the original sense, since the instructions are directly executed, without any interpretation. Early compilers for ALGOL, Fortran, Cobol and some Forth systems often produced subroutine-threaded code. The code in many of these systems operated on a first-in-first-out stack of operands, which had well-developed compiler theory. Many people believe that this is the fastest threading model, since all modern processors have special hardware support for such subroutine "call" instructions. But according to measurements by Anton Ertl, "in contrast to popular myths, subroutine threading is usually slower than direct threading."
[edit] Token threading
Token threaded code uses lists of 8 or 12-bit indexes to a table of pointers. Token threaded code is notably compact, without much special effort by a programmer. It is usually half to three-fourths the size of other threaded-codes, which are themselves a quarter to an eighth the size of compiled code. The table's pointers can either be indirect or direct. Some Forth compilers produce token threaded code. Some programmers consider the "p-code" generated by some Pascal compilers, as well as the byte codes used by .NET, Java, Basic and some C compilers to be token-threading.
[edit] Huffman threading
Huffman threaded code consists of lists of Huffman codes. A Huffman code is a variable length bit string used to identify a unique item. A Huffman-threaded interpreter locates subroutines using an index table or tree of pointers that can be navigated by the Huffman code. Huffman threaded code is one of the most compact representations known for a computer program. Basically the index and codes are organized by measuring the frequency that each subroutine occurs in the code. Frequent calls are given the shortest codes. Operations with approximately equal frequencies are given codes with nearly equal bit-lengths. Most Huffman-threaded systems have been implemented as direct-threaded Forth systems, and used to pack large amounts of slow-running code into small, cheap microcontrollers. Most published uses have been in toys, calculators or watches.
[edit] Lesser used threading
- String threading, where operations are identified by strings, usually looked-up by a hash table. This was used in Charles H. Moore's earliest Forth implementations and in the University of Illinois's experimental hardware-interpreted computer language. It is also used in Bashforth.
- return threading
- Call Threaded Code uses a list of addresses that refers directly to machine language primitives, just like Direct Threaded code. However, the interpreter, instead of jumping to the primitive, calls it with a microprocessor subroutine call instruction instead. This system is slower than direct threading, and may even be slower than indirect threading on some systems. However, it is convenient because it allows for implementations written entirely in high-level languages, like C or Oberon. For example, here is an implementation written in both C and Oberon (untested):
/* In C */ extern int stillInterpreting; extern int dataStack[], returnStack[]; extern int dataStackPointer, returnStackPointer; typedef void WORD( int **, int **, int *, int * ); stillInterpreting = 1; while( stillInterpreting ) { WORD *nextWord = (WORD *)(*instructionPointer++); nextWord( &dataStackPointer, &returnStackPointer, dataStack, returnStack ); }
/* In Oberon */ MODULE CallThreadingInterpreter; CONST MaxStackSize* = 256; TYPE ThreadState* = POINTER TO ThreadStateDesc; Word* = PROCEDURE( current: ThreadState ); StackSpace* = ARRAY MaxStackSize OF INTEGER; ThreadStateDesc* = RECORD dataStackPointer, returnStackPointer: INTEGER; dataStack, returnStack: StackSpace; instructionPointer: INTEGER; program: ARRAY OF Word; END; VAR stillInterpreting*: BOOLEAN; PROCEDURE Interpret*( current: ThreadState ); VAR w: Word; BEGIN stillInterpreting := TRUE; WHILE stillInterpreting DO w := current.program[ current.instructionPointer ]; INC( current.instructionPointer ); w( current ); END; END Interpret; END CallThreadedInterpreter.
Note how in both cases no direct assembly language is required.
[edit] Common amenities
Separating the data and return stacks in a machine eliminates a great deal of stack management code, substantially reducing the size of the threaded code. The dual-stack principle was originated three times independently: for Burroughs large systems, Forth and PostScript, and is used in some Java virtual machines.
Three registers are often present in a threaded virtual machine. Another one exists for passing data between subroutines ('words'). These are:
- ip or i (instruction pointer)
- w (work pointer)
- rp or r (return stack pointer)
- sp or s (parameter stack pointer for passing parameters between words)
Often, threaded virtual machines such as implementations of Forth have a simple virtual machine at heart, consisting of three primitives. Those are:
- nest, also called docol
- unnest, or semi_s (;s)
- next
In an indirect-threaded virtual machine, the one given here, the operations are:
next: (ip)+ -> w ; jmp (w)+ nest: ip -> -(rp) ; w -> ip ; next unnest: (rp)+ -> ip ; next
This is perhaps the simplest and fastest interpreter or virtual machine.
[edit] See also
[edit] External links
- Anton Ertl's explanatory page What is Threaded Code? describes different threading techniques and provides further references.
- The Development of the C Language by Dennis M. Ritchie describes B (a precursor of C) as implemented using "threaded code".
- Thinking Forth Project includes the seminal (but out of print) book Thinking Forth by Leo Brodie published in 1984.