Talk:Compiler

From Wikipedia, the free encyclopedia

This article is within the scope of WikiProject Computer science.
??? This article has not yet received a rating on the assessment scale.
??? This article has not yet received an importance rating on the assessment scale.

Contents

[edit] where is runtime system/library ?

I think it is indispensable for some compiler dealing with Fortran 90 and OpenMP.

[edit] Moved from original Compilers page

Compilers - Tools for creating machine-readable programs

A compiler is a software programs which takes as its input a set of modules written in a ProgrammingLanguage and creates a machine readable binary executable image file.

Computer hardware can only understand very specific binary machine language instructions. Each instruction must be written in an extremely precise format and must conform to extremely limited abilities of the computer's CPU (Central Processing Unit). Such binary machine language is too cumbersome for humans to read, write, and understand directly.

Compilers have the ability to understand the more abstract, symbolic, human-friendly programming languages and to generate a low level binary machine language program which executes *exactly* what the programming language specifies.

[edit] Moved from original Talk page

test Right, I forgot about that. Most of my experience programming was in the mid-80s, tinkering around with BASIC. Not exactly a powerhouse language.  :-D

Compilers do not always take source code and output executable machine code. The program may have to be processed by a linkage editor and/or loader program first.----

... Steve Gibson (http://grc.com/) releases his programs in machine language. Not sure how he does it... If he actually writes in it or what. I would imagine not but haven't asked. I suppose it's possible, though that would probably make him a certifiable genius.  :-D

Yes he writes in assembler. See How to write Windows programs in assembler. There is even an link to a free assembler for 386 Intel processors and beyond (486, Pentium etc.).

It would be nice to have an assembler entry with some example programs.

No. Steve Gibson packs his executables with a custom/obsolete version of UPX and claims them to be Assembly. Please verify this for yourself. Goat-see 05:00, 8 May 2004 (UTC)

[edit] Moved from Compile (computing)

In telecommunication, the term compile has the following meanings:

1. To translate a computer program expressed in a high-level language into a program expressed in a lower level language, such as an intermediate language, assembly language, or a machine language.

   * See: Compiler 

2. To prepare a machine language program from a computer program written in another programming language by making use of the overall logic structure of the program or by generating more than one computer instruction for each symbolic statement as well as performing the function of an assembler.

Source: from Federal Standard 1037C

[edit] Semantic analysis

Shouldn't there be anything said abort semantic analysis, thats an important part of the frontend to. Type-checking and such.

[edit] Source code/language

I changed "source code" to "source language" because a compiler can be pretty anything that translates a language to another one. This includes a lot of translations which are not source code, such as binary-to-binary translations and just-in-time compilation processes. "Source code" in its normal meaning refers to a list of instructions in a language understandable to humans. Source language can be anything that can be griped by formal language definitions (and that is pretty much anything).

[edit] Compiled vs interpreted

The article makes a distinction between compiled and interpreted languages. While most languages are either primarily interpreted, or primarily compiled, any language that can be compiled can be interpreted, and (modulo eval statements and the like) any language that can be interpreted can be compiled. In practice, there are compilers for normally-interpreted languages such as Perl (which is compiled to bytecode anyway), LISP, and Prolog, and interpreters for such commonly-compiled languages as C and C++. -- Robert Merkel

[edit] Definition of "compiler"

The original definition of compiler is not necessarily bound to generation of executeable/object code. The general definition in the dragon book (which would worth a mention on its own on the compiler main page) goes roughly as "a compiler reads a program written in one language - the source language - and translates it into an equivalent program written in another language - the target language". Also a rough lineup of the history of compiler development, maybe split from "first FORTRAN compiler to dragon book" and "dragon book until now" would be nice. -- scut

From language compiler: a programming language compiler is an application that translates text of computer program written in some human-understandable programming language into the native code (machine code) of a particular physical processor, or into some virtual (software) machine code (e.g. Java compiler). --Anonymous

[edit] Need help in compiler construction

Hi, I need some help in my compiler construction course. I acquire the famous Dragon Book but it seems to me very difficult as it contains all examples in pascal & I had C/C++ background. So if you know any good book or link then please share with me. Thanks in advance, Iqbal

This is not a compiler discussion forum. It is for discussion of the Compiler entry of Wikipedia.
Oh, and the Dragon book is definitely the book you want. Learn Pascal. It's not that hard. -P3d0 20:15, 9 Jun 2004 (UTC)

On the other hand, Wikibooks:Compiler construction *is* a compiler discussion forum. --DavidCary 04:00, 27 Jun 2005 (UTC)

[edit] Pascal

"The Pascal programming language is well known for this capability, and in fact many Pascal compilers are themselves written in the Pascal language because of the rigid specification of the language and the capability to use a single pass to compile Pascal language programs."

I don't understand the logic here. --Grouse 13:37, 10 Jul 2004 (UTC)

Nor do I. I've heard similar sentiments expressed as an article of faith in Pascal - 20 years ago. Charmingly quaint now, but quaintness is not part of the mission, and it should go away or be rephrased. Stan 19:10, 10 Jul 2004 (UTC)
I suspect the real reason "many Pascal compilers are themselves written in the Pascal language" merely because Pascal is used in the Dragon book. But the authors picked Pascal because of those above reasons.
Let me expand on what I *think* that sentence in the article is trying to say:
  • If I had to write a compiler for some given language from scratch, I wouldn't implement the compiler in Pascal. I would pick some more modern language (perhaps using Lex, AWK, Bison, etc.), something incredibly sophisticated and full-featured. However,
  • if I had to write a compiler, and I could pick what language it compiled, I'd pick something really simple and easy like assembly language or L00P. I'd still try to implement the compiler in some more modern language.
  • The idea of a "self-hosted compiler" is intellectually challenging[1]. If a language is really great, wouldn't it make it easy to write everything in it ? Even compilers ? Even it's own compiler ? But if I get to pick the language, I would never pick some more modern language -- there's far too much stuff to implement. But I wouldn't write such a thing for L00P, either -- there's only a few things to implement, but the code is so difficult to read, write, and test. I think Pascal is close to the "sweet spot", and Dragon book seems to agree. If I had to rank languages from "easiest to write a self-hosted compiler" to "most difficult", I would guess something like Forth, Pascal, Lisp, assembly language, BASIC, Small C ...
Does that make sense ? Of course, I'm far too long-winded -- is there a better way to rephrase this ?
On the other hand, perhaps my understanding is quaint and out of date. Please bring me up-to-date. Perhaps there is some *other* language that is now considered better for learning about self-hosting? Perhaps the idea of "self-hosting" is now considered unimportant?
--DavidCary 04:00, 27 Jun 2005 (UTC)

I mostly agree with David. Also keep in mind that in the old days, a language that was not self hosting was considered a failure, or at best not truely a general purpose language. (since if GP, one could implement a compiler in it). Pascal is fairly nice to create a compiler for imperative languages.

Writing compilers in assembler is insane, since the refactoring for each new iteration (e.g. with better optimizations) would kill you. I know, since I tried :-)

The *nix LALR parser tools (Yacc and its GNU variant Bison and Lex) IMHO have as problem that they don't really simplify relative clean languages, also the quality of errormessages is often quite bad. This is one of the reasons why most commercial compilers (including GCC since either 4.0 or one of the later 3.x releases) are recursive descent.

I'm not a OOP purist, but OOP (read basic inheritance) does help when writing a compiler, so for that original Pascal disqualifies, and I wouldn't use it today. A modernised dialect like Delphi or Free Pascal are quite ok though. (Unsigned comment)

  • If you want to check out this link what you'll find is a 350K, 12,537 line listing of the source to a Pascal Compiler for the PDP-10. It's not OOP, and it is self-compiling. Probably something for people who are very masochistic. :) Speaking of Masochism, Nicklaus Worth reported that (and if you'd ever written programs on a CDC Mainframe, as I have, you'd understand why he did) the original Pascal compiler was written in Fortran. Paul Robinson (Rfc1394) 04:45, 26 March 2007 (UTC)

[edit] Weird, weird, weird

Normal humans, as opposed to persons knowledgeable in computer science, would expect an article title compilation to be about what that word meant until last week. But those who have studied computer science (most of them, anyway) deny as a matter of religious dogma that the world existed before last week. Michael Hardy 20:45, 19 Jul 2004 (UTC)

??? Stan 01:00, 20 Jul 2004 (UTC)
Um... there is no article named compilation. When I try to find one, I only hit a redirect to compiler.
What do you really want ? --DavidCary 06:00, 26 Jul 2004 (UTC)
Michael, if you're unhappy with the redirect and have better ideas for it, disambiguate and break the redirect yourself. I don't think there's much room for something other than a dictionary definition, if I understand you correctly... Dysprosia 07:24, 26 Jul 2004 (UTC)

For the record compilers have been around since at least the 1960s.

It would be nice to have a discussion of the origins of the words "compiler" and "to compile" as it applies to computers. Who used the term first? Before there were computers these words were in use, usually associated with "compile a list". The first computer programmers were very literate folks. While defining this new field they made very creative use of the English language. --- And by the way, for the record compilers have been around since at least the late 1950's. --rchrd 01:50, 11 July 2006 (UTC)

[edit] Aren't compilers system software?

  • Until quite recently, I lived under the impression that assemblers, compilers, linkers, etc, belonged to the category of system software. But the wikipedia category of the same name seems to indicate otherwise (only OSs, drivers, and related SW are included). Could someone enlighten me as to this matter? I just haven't been used to think of compilers etc as applications (implying that apps are for -eh- "end users", as opposed to programmers/hackers). At the university where I got my CEng degree, we even had a course called System programming which specifically addressed compiler writing and connected topics (no wonder I'm confused...). --Wernher 23:36, 24 Jul 2004 (UTC)
(What the &^%$! was that last edit all about? I think some of the discussion materal got deleted) Anyway, yes, it's usual to put compilers in system software if you're only allowed to have app and system software categories, otherwise you'd finesse the question by putting it in programming tools and not asking too many questions. :-) Stan 03:32, 25 Jul 2004 (UTC)
On the top level Software is devided to only two categories: system and application. All other categories are subcategories of this two. Kenny 10:11, 2004 Aug 8 (UTC)
Yep, no problem. I did. BTW, I've made a major effort today/night in (sub)categorizing all the stuff in Category:Software (too much was lying around in the Software 'root directory'; in my opinion the subcategories should at least be used if we create them in the first place...). I hope my work led to a more logical and easily navigable sw information collection than before. --Wernher 03:30, 26 Jul 2004 (UTC)
  • I'd disagree strongly that compilers should be counted as system software. I would expect that category to contain the software that keeps the 'system' running, ie. mainly the OS functionality. A compiler, while creating code which may perform that function, is not necessary in any way to operate a 'system' (although it may make system calls during operation, etc). --[[User:VampWillow|VampWillow]] 21:44, 7 Aug 2004 (UTC)
Excuse me, IMHO, VampWillow is not accuanted enougth with CS. In CS compilers etc are under system software. Kenny 10:11, 2004 Aug 8 (UTC)
  • Compilers can be system software (typical case:Unix). However they don't need to be. E.g. something like Delphi doesn't have any direct connection to the OS builders. I'd rather put them under apps myself too. Kenny, could you produce a reference for your argumentation? That is more useful than yes/no fights. (unsigned entry)
  • I think I can add some light (and less heat) to this discussion by adding the following. Originally, when large computers were sold, they came with a base set of system software to be able to use them, including an assembler, Fortran, Cobol, Basic (interpreted or compiled or both) and RPG III compilers, or you bought them as add-ons from the manufacturer of your computer and operating system. At that time you could classify compilers as system software because they were either sold as a package as part of the operating system or were sold to sophisticated, technical users.
Today, compilers are sold over the counter in retail stores - or downloadable for free - by third parties who make them and have nothing to do with the computer manufacturer or operating system vendor and in some cases are available for multiple operating systems (Free Pascal, Free Basic and Java, among others), and thus would no longer qualify as "system software." Paul Robinson (Rfc1394) 19:29, 26 March 2007 (UTC)

[edit] n-pass vs "n-parse"

Hmmm, how is it that the Dragon Book uses pass rather than parse in the relevant context? I would like to believe that Aho, Sethi, and Ullman knew what they were writing about. Strange, or what, VampWillow? --Wernher 21:15, 7 Aug 2004 (UTC)

Ah well, these 'Johnny come lately's do get things wrong occasionally, but as I was taight how to write compilers (and the rest) back in the mid-70s, and *the* book on the subject was the 1971 Compiler Construction for Digital Computers by David Gries which uses 'parse' then, I think, I rest my case!
It is, however, a very common mistake, as with newer compilers repeating their transit through the source code then the original usage of parsing the lexical strings to create the output streams/code/etc rather than being well-written and just going through the once, then the original spelling got lost somewhat (and some youngsters like to think they invented computer languages! ;-) --[[User:VampWillow|VampWillow]] 21:40, 7 Aug 2004 (UTC)
May be somewere it is misstake, but not here. Kenny 10:19, 2004 Aug 8 (UTC)
um, given that you are unable to spell 'mistake' and acknowledge in your userpage that I am not fluent in english, so please feel free to edit my comments in wikipedia I think it was rather pre-emptive of you to revert the correction made, especially whilst discussion is on-going. As such I have removed the revertion. --[[User:VampWillow|VampWillow]] 11:23, 8 Aug 2004 (UTC)

OK kids, let's hold off on the edit war and finish the discussion please. I think the use of "multi-parse" must be something that died relatively quickly, because I started learning about compilers in 1975 and don't remember hearing of that usage (thought I had a copy of Gries, but can't seem to find it). I suspect that "pass" came to be favored because a step that processes intermediate code into better intermediate is not really "parsing" anymore, plus preprocessing and assembly steps aren't really doing parsing either. Since our usual rule is to use the most common term, I think we should use "pass", but at the same time include a note that some sources use "parse" to mean the same thing (since there's a confusion about this, the WP article can be valuable in clearing it up). Stan 13:53, 8 Aug 2004 (UTC)

I have now entered a comprehensive note about the pass/parse usage, with which I hope we may reach a concensus regarding this question. I find the discussion very interesting, as I had in fact not thought of pass as emanating from parse before (quite plausible using English pronounciation). Whether this is the case for all use of pass in compilers, or whether pass has also been used independently from the beginning, I think is an open question. Also allow me to admit that I should perhaps have added <slightly sarcastic>txt</slightly sarcastic> tags around my starting comment of this thread. I did not intent to be 'evil'/sarcastic as such. --Wernher 00:37, 9 Aug 2004 (UTC)
"parse" is the source of the deriviation "pass" as to parse something is to work through it, and the first parse would grab syntatic elements, second parse build tables, etc. as each parse would re-inspect the data. When you try to think carefully about it you can't really make the word "pass" do the same function in an active sense. I suspect there could even be a language (en-us/en-uk) issue in here too, maybe? but. 'Object code generation' btw is done during a parse of the stored data (and again 'pass' doesn't really make sense!). PS. Nothing wrong with being evil occasionally! --[[User:VampWillow|VampWillow]] 10:24, 9 Aug 2004 (UTC)
"Parse" and "pass" have nothing to do with each etymologically, so saith OED. I got an opportunity to look at Gries (via a colleague at Apple even older than I am :-) ) last week and not only does Gries also use "pass" in the "multiple passes" sense, but there is no reference to "multiple parses" that I could find. So I think we need to get a book and page number that justifies this alternate sense of "parse" - don't want this article to become the source of a lemming-suicide-type legend! Stan 21:14, 15 Aug 2004 (UTC)
Coming late to this discussion, but I also find the footnote to be fairly tangential and a bit unlikely. I've never heard of parse and pass being conflicted; they seem very distinct to me. To make things more confusing, the Dragon Book (Aho, Sethi, and Ullman) does not refer to either multi-pass or multi-parse. It defines the former, but then discards it as being too narrow for their purposes:
"Several phases of compilation are usually implemented in a single pass consisting of reading an input file and writing an output file. In practice, there is great variation in the way the phases of a compiler are grouped into passes, so we prefer to organize our discussion of compiling around phases rather than passes." -- Compilers: Principles, Techniques, and Tools, 1988 edition, page 20.
Here, a pass is primarily a means of counting the number of times data is read and written -- it is mainly a way to count the expensive file I/O loops. The dragon book uses "phases" where this wikipedia article uses "passes". Given the importance of this book in the general CS populace, it might even make sense to add a footnote about "phases", to replace the one on "parses"! But anyway, unless someone has references to a book that uses "multi-parse", I'm going to say that the footnote is unnecessary. And since Stan asked for a reference years ago, and it didn't materialize, it seems safe to delete this footnote now. So I did! If that's inappropriate, you can revert it and yell at me. Captain Wingo 11:37, 5 December 2006 (UTC) [Edit - changed "1998 edition" to "1988 edition" :-)

There is a difference; I've never heard the word "parse" used in the context of multiple repetition of source-code processing by a compiler; the term has always been "pass", e.g. a 1-pass compiler, a 2-pass compiler, etc. Now, on each pass the compiler may - and probably does - parse the source, but I've never heard of a 1-parse or 2-parse reference. I mean, I heard the first Fortran compiler had to squeeze into such a small area that it was something like over 50 passes! Paul Robinson (Rfc1394) 19:35, 26 March 2007 (UTC)

[edit] threaded and incremental: the same ?

The article currently mentions

but I'm still a bit fuzzy on the difference between them. I think I understand threaded code, but incremental compiler is simply a redirect to compiler, which has no more information.

What's the difference ?

--DavidCary 01:57, 6 Nov 2004 (UTC)

A threaded compiler can compile parts in paralel, or parse and do codegeneration/optimization (for previously compiled parts) in paralel. The former is more likely.

An incremental compiler only (re)compiles the needed parts of a source. One could discuss about the granularity of what is the minimal part that must be recompiled (procedure or whole compilation unit).

These two things are IMHO not related

[edit] First self-hosting compiler?

I'm wondering if anyone knows of a self-hosting compiler prior to Hart and Levin's 1962 Lisp compiler described in Artificial Intelligence Memo 39, "The New Compiler" (767 kB scanned PDF). The memo describes this compiler as "the first compiler that has ever compiled itself by being executed interpretively." However, I'm not sure if this is to be taken as meaning that it is the first self-hosting compiler ... or just the first one to combine interpretation and compilation in the way that Lisp systems have ever since.

The only languages for which we have cites for earlier compilers are FORTRAN and COBOL; were there (prior to 1962) any FORTRAN compilers written in FORTRAN, or COBOL compilers in COBOL? I find this question interesting in part because it casts light on just how odd is the misconception many people have, that Lisp languages are interpreted ... :) --FOo 02:25, 16 Nov 2004 (UTC)

Probably not any compilers written in Fortran or Cobol around 1960, for one thing it would have been hard to code things like symbol tables. Assembly was how one did system programming back then. However, Backus in HoPL mentions knowing about a Corrado Böhm language ca 1953 for which a self-compiler had been published, don't know any more about that. Some of the earliest languages were not that different from assembly, and certainly self-assembling assemblers go back into the early 50s. Stan 04:12, 16 Nov 2004 (UTC)
Following some citations suggests this paper, but it's not publicly available to non-ACM-members. Grumble, grumble. I expect it's probably fair to say that Hart and Levin's Lisp is the first recognizable modern language with a self-hosting compiler, since the sample code presented in their memo is within a keyword or two of working in Common Lisp or Scheme. If we're going to say "the FORTRAN team led by John Backus at IBM is generally credited as having introduced the first complete compiler, in 1957" then I feel fair calling Lisp in '62 the first self-hosting one. --FOo 05:16, 16 Nov 2004 (UTC)

I have heard that there was one self-hosted Cobol compiler, supposedly it was about 800,000 lines of code. But I don't think there's ever been a self-hosted Fortran compiler, or at least none that was advertised. Now, the Ratfor preprocessor, I believe that was written in Fortran. (I looked it up; according to the article, Ratfor was written in Ratfor, and could translate itself to Fortran). Paul Robinson (Rfc1394) 19:40, 26 March 2007 (UTC)

[edit] how is "threaded code compiler" different from any other kind of compiler?

the article currently claims

  • Threaded code compiler (or interpreter), like most implementations of FORTH
    • This kind of compiler can be thought of as a database lookup program. It just replaces given strings in the source with given binary code.

While this is certainly true for most Forth compilers, how is this any different from any other kind of compiler ? For example, doesn't *every* compiler replace the "while" string in a do{...}while(); loop with the appropriate machine-language binary code for test-and-jump ?

The level of this binary code can vary;

Huh?

in fact, some FORTH compilers can compile programs that don't even need an operating system.

I certainly think it's very interesting that some compilers can create programs that don't need an operating system.. Cross-compilers for microcontrollers produce programs that run "on bare metal"; any RTOS for those systems is usually re-compiled and mixed with the application. Visual Basic and Java compilers, on the other hand, create programs that require some sort of operating system. What would you call these two kinds of compiler?.

I suggest replacing that with:

  • Threaded code compiler (or interpreter), like most implementations of FORTH, generates threaded code.
    • This kind of compiler focuses on producing extremely small output to fit into limited memory systems. Often people make the compiler itself very small, so it also can fit into the target system (as a native compiler). Typically single-pass. Often places output directly into RAM where it can immediately be executed (like a just-in-time compiler) -- this allows programmers to write functions that are executed during the compilation of later functions. (Most other compilers require the entire program to be completely compiled, before any of it is executed).
  • (some good name here) compiler generates "hosted" applications that require some sort of operating system.
  • (some other good name here) compiler generates "freestanding" code that runs "on the bare metal" without any operating system. People who write operating systems use this kind of compiler.

--DavidCary 04:00, 27 Jun 2005 (UTC)

I think the freestanding vs hosted discussion is pretty useless. It is only a gradual difference in how much work you must do to create a freestanding app. (iow if I must fix the RTL?) It seems to me a hosted compiler can become freestanding if the vendor postdelivers a minimal RTL with no OS ties, so a categorisation hosted or not seems only temporarily valid at best.

[edit] IL vs IR

I changed some occurances of IL generation as a compiler phase to "intermediate representation", since that is more correct. The intermediate representation from which code is generated doesn't have to be a language, even in the broadest sense of the word. And more important, it often isn't. I kept IL as an example of an IR though, and also mentioned compile graphs and trees.

Maybe I didn't get all occurances though.

[edit] Figure 1 should be explained

Hi, I am creating an OpenOffice.org source drawing in order to translate the figure to other languages. The draw in spanish and a source file link is going to be put in commons. I realize that the figure is not explained in the article text as the good practice sign to. For example, "target code generator" or "machine code" are not sufficient treat. --GengisKanhg (my talk) 13:32, 27 September 2005 (UTC)

[edit] incorrect "cross compiler" comment

In the introduction section, the phrase "Compilers also exist which translate from one high level language to another (cross compilers), or[...]" is highly inaccurate. "Cross compiler" refers to something very different, as described in the immediately following section. Having worked in the industry for many years, I've heard lots of names for a high-level-to-high-level translator, but cross compiler was never one of them. I've deleted the parenthetical comment.

[edit] WikiProject Efficiency

Wikipedia:WikiProject Efficiency I'm looking at the list of related compiler articles and merging or redirecting the weaker stubs: did 2 already. Cwolfsheep 23:32, 4 June 2006 (UTC)

[edit] duplication

the "types of compilers" section is a duplication of text at the very top of the page.

[edit] Bad Link URL

The link to CodePedia needs a www. prefix. It give me a "Host not found" message. I'll fix it. --rchrd 01:54, 11 July 2006 (UTC)

[edit] The last sentence in the "Compiled vs Interpreted Languages"

The last sentence in the "Compiled vs Interpreted languages" section is: "To implement these features in a compiled language, programs must usually be shipped with a runtime environment that includes the compiler itself." Shouldn't that be: "... shipped with a runtime environment that includes the interpreter itself?" Or am I just splitting hairs? -Matt B. 03:14, 17 March 2007 (UTC)

No, "compiler" is correct. An example is Steel Bank Common Lisp, or many other Lisp systems. The runtime environment includes a compiler, which programs can use to compose and compile functions.
(For clarification, current versions of SBCL also include an interpreter. But the COMPILE function is most definitely part of the runtime, and that function is a compiler.) --FOo 04:25, 17 March 2007 (UTC)