Code generation (compiler)

From Wikipedia, the free encyclopedia

This article is about machine code generation with a compiler. For other uses, see Code generation.

In computer science, code generation is the process by which a compiler's code generator converts a syntactically-correct program into a series of instructions that could be executed by a machine. Sophisticated compilers may use several cascaded code generation stages to fully compile code; this is due to the fact that algorithms for code optimization are more readily applicable in an intermediate code form, and also facilitates a single compiler that can target multiple architectures called target machines as only the final code generation stage (the backend) would need to change from target to target.

The input to the code generator stage typically consists of a parse tree, abstract syntax tree, or intermediate language code (often in three address code form). Since the target machine may be a physical machine such as a microprocessor, or an abstract machine such as a virtual machine or an intermediate language, (human-readable code), the output of code generator could be machine code, assembly code, code for an abstract machine (like JVM), or anything between.

In a more general sense, code generation is used to produce programs in some automatic manner, reducing the need for human programmers to write code manually. Code generations can be done either at runtime, including load time, or compiler time. Just-in-time compilers are an example of a code generator that produce native or nearly native code from byte-code or the like when programs are loaded onto the compilers. On the other hand, a compiler-compiler, (yacc, for example) almost always generates code at compiler time. A preprocessor is an example of the simplest code generator, which produces target code from the source code by replacing predefined keywords.

When code generation occurs at runtime, it is important that it is efficient in space and time. For example, when regular expressions are interpreted and used to generate code at runtime, a non-determistic finite state machine instead of deterministic one is often generated because usually the former can be created more quickly and occupies less memory space than the latter. Despite it generally generating less efficient code, code generation at runtime can take the advantage of being done at runtime. Some people cite this fact to note that a JIT compiler can generate more efficient code than a compiler invoked before runtime, since it is more knowledgeable about the context and the execution path of the program than when the compiler generates code at compile time.

[edit] Major tasks in code generation

In addition to basic conversion from intermediate representation into machine instructions, code generator also tries to use faster instructions, use fewer instructions, exploit available fast registers and avoid redundant computations.

  • Instruction selection. With the diverse instructions supported in a target machine, instruction selection deal with the problem of which instructions to use
  • Instruction scheduling. In what order to run the instructions.
  • Register allocation. The speed gap between processors and memory is partially bridged by the registers in processors. How to put useful variables into registers has a great impact of the final performance.

The usual method is a state machine, or weak artificial intelligence scheme that selects and combines templates for computations.

[edit] See also

[edit] External links

In other languages