Self-hosting

From Wikipedia, the free encyclopedia

Self-hosting is the use of a computer program as part of the toolchain or operating system that produces new versions of that same program, for example a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger systems. Other programs that are typically self-hosting include kernels, assemblers, and shells.

If a system is so new that no software has been written for it, software is first developed on another, already self-hosting system and placed on a storage device that the new system can read. Development continues this way until the new system can reliably host its own development. Development of the Linux operating system, for example, was initially hosted on a Minix system. Writing new software development tools "from the metal" (that is, without using another host system) is rare and in many cases impractical.

Several programming languages are self-hosting, in the sense that a compiler for the language, written in the same language, is available. The first compiler for a new programming language can be written in another language (in rare cases, machine language) or produced using bootstrapping. Self-hosting languages include Lisp, Forth, C, Pascal, Modula-2, Oberon, Smalltalk, OCaml, and FreeBASIC.

Reliance on self-hosting programming tools is a potential security risk, as demonstrated by the Thompson hack, in which a compromised compiler reinserts a backdoor each time it compiles its own source code.

History

The first self-hosting compiler (excluding assemblers) was written for Lisp by Hart and Levin at MIT in 1962.[1] Because Lisp interpreters already existed but Lisp compilers did not, they used an original method to compile their compiler. Like any other Lisp program, the compiler could be run in a Lisp interpreter, so they ran it in the interpreter and gave it its own source code to compile.

"The compiler as it exists on the standard compiler tape is a machine language program that was obtained by having the S-expression definition of the compiler work on itself through the interpreter." (AI Memo 39)[2]

This technique is only possible when an interpreter already exists for the very same language that is to be compiled. It borrows directly from the notion of running a program on itself as input, which is also used in various proofs in theoretical computer science, such as the proof that the halting problem is undecidable.
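
The idea of running a compiler on its own source through an interpreter can be sketched in a few lines of Python. This is only an analogy under stated assumptions, not a reconstruction of the Lisp compiler: the Python interpreter stands in for the Lisp interpreter, a Python code object stands in for the machine language program, and the toy compiler simply delegates to Python's built-in compile(). (CPython itself compiles source to bytecode internally, so the "interpreted" stage is illustrative.) The names COMPILER_SOURCE, compile_program, and load are hypothetical.

```python
# Source text of a toy "compiler", written in the language it compiles (Python).
# Assumption: the built-in compile() stands in for real code generation.
COMPILER_SOURCE = '''
def compile_program(source_text):
    """Translate source text into a code object (the 'machine language' here)."""
    return compile(source_text, "<toy>", "exec")
'''


def load(program):
    """Run a program (source text or code object) and return the names it defines."""
    namespace = {}
    exec(program, namespace)
    return namespace


# Stage 0: the interpreter runs the compiler directly from its source text.
interpreted_compiler = load(COMPILER_SOURCE)["compile_program"]

# Stage 1: the interpreted compiler is given its own source code to compile.
compiled_form = interpreted_compiler(COMPILER_SOURCE)

# Stage 2: the compiled form is loaded and now acts as the compiler.
compiled_compiler = load(compiled_form)["compile_program"]

# The compiled compiler can compile and run an ordinary program...
answer = load(compiled_compiler("result = 6 * 7"))["result"]

# ...and recompiling its own source yields the same bytecode as stage 1.
is_fixed_point = compiled_compiler(COMPILER_SOURCE).co_code == compiled_form.co_code
```

After stage 2 the source-level definition is no longer needed to run the compiler. The final comparison is a common sanity check in bootstrapping: a compiler built by itself should reproduce itself when compiling the same source again.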
