Talk:Second-generation programming language


The idea of "generations" of programming languages appears to have arisen as a bit of marketing jargon particularly around the epoch of the so-called "fourth-generation" languages. The proposed distinctions imply that trends in language popularity are progressive rather than being driven by a combination of marketing fads and shifting requirements.

It is increasingly obvious, however, that this is not the case: while there is a broad general trend towards greater abstraction from the hardware, it is not monotonic. For instance, see the decline in popularity of the more abstract language Lisp in favor of the closer-to-hardware C and C++ in the 1980s and '90s. Nor is there a consistent trend towards application specificity; see, for instance, the displacement of special-purpose COBOL by general-purpose Java in business applications.

Of course, changes in language popularity are not driven entirely by marketing. COBOL lacks standard libraries to talk to Internet clients; Java has them. As talking to the Internet becomes more important for the problem domain, usage migrates to a language where it is natural: Java. Likewise in other domains: biological science programming, once dominated by Fortran, acquires a need for text processing due to the rising importance of genomics, and begins to migrate to Perl. These changes are not toward greater application specificity, but rather toward closer fit to changing application requirements.

(Indeed, the newly adopted languages often lack application-specific features the old ones have: Java has no native fixed-point decimal data type, a COBOL feature valuable for business applications.)

What's my point? The idea of successive "generations" of programming languages replacing one another at higher levels of abstraction and application specificity is not historically accurate after, say, 1960. (COBOL, Fortran, and Lisp all existed in 1960.) Wikipedia should not present it uncritically, but rather note it wherever it appears as folk-history and marketing jargon rather than historical reality. --FOo 15:23, 8 Dec 2003 (UTC)

Scope/accuracy of this article

This article, the other articles in the programming language generation series, and the first comment on this discussion page, all characterize language generations in a way that seems strange to me.

In particular, the idea of calling C a second-generation language is bizarre. Whatever C's relatively low level, that classification flies in the face of how the term has always been used. The mere fact that it is possible (and important!) to write optimizers for C, and that assembly language programmers shake (or rather, shook) their heads at the inefficiencies of C code generation, would seem to bear this out.

My understanding of the history of this term, though I have not yet been able to find good sources for it, is as follows:

  • First generation languages are simply the numerical machine code of a particular processor. In general-purpose computing, machine code was only used on the very first computers. Those computers themselves were termed "first generation" hardware, and were based on vacuum tubes or mechanical relays.
  • Second generation languages are symbolic assembly languages that provide a mnemonic sugaring for machine language. By the 1950s, as second-generation (transistorized) hardware became dominant, essentially all processors were shipped with assemblers.
  • Third generation languages were first developed in the 1950s. They originally were just called "higher level languages", in contrast to "assembly language"; "generations" was still primarily a term applied to hardware. Important early HLL efforts, as everybody knows, included FORTRAN, COBOL, PL/I, ALGOL, and LISP.
  • The term fourth generation language was introduced, or at least became popular, in the 1970s to describe nonprocedural languages: languages in which significant processing occurred "behind the scenes" instead of being directly specified by the programmer. [update: Though according to a citation on the 4GL article, James Martin asserts that the term was in use in the 1960s.] These included constraint-based languages, output-oriented languages (e.g. report writers), application generators, and database languages, among others. A few random names include RAMIS, NOMAD, FOCUS, QUEL, SQL [update: some object to considering SQL and other query languages as 4GLs, but these were in fact archetypes of the concept at the time; note that they were called fourth-generation languages, not fourth-generation programming languages], Smalltalk, and the programming languages associated with most database systems of the day, as well as domain-oriented systems like SPSS. Most were attempts to allow end users to specify processing requirements in the language of their fields, bringing the computer to a subject matter expert rather than forcing such users to become computer experts. Many of these languages were interpreters. The distinction between fourth generation and third generation languages reinforced the sense that assemblers were second-generation languages. Note that powerful earlier languages such as LISP, Simula, and even FORTRAN and Algol (when considered alongside their function libraries) were capable of great complexity and sophistication, but the term 4GL was generally reserved for languages geared to an end-user problem space.
  • The term fifth generation language was invented to describe various approaches to language design after – well, after the term 4GL came into wide use. It emerged around the time of Japan's famous fifth generation computer project of 1982. It has been variously applied to the language concept du jour since then. [update: Its value would seem to be in identifying languages that aren't like the 3GL procedural languages nor the 4GL nonprocedural/domain-oriented languages.]

I present this for what it's worth. Having programmed using languages in all generations, fought the good fight to introduce higher-level languages into second-generation shops, and participated in the development of so-called fourth-generation and fifth-generation languages, I would hope this description isn't too far off.

I agree with the comment above that we need to distinguish the historical use of nth-generation language from neologisms based on programming complexity/hierarchy. However, since the latter seems to be the current trend in practice, perhaps both views need to be presented, on these pages and on a new article about programming language generations that all of these should use as a main article. Trevor Hanson 18:18, 7 October 2007 (UTC)

I'm not sure exactly when these terms were coined. I'm quite certain that machine code and assembler language were both developed without the inventors realising that these terms applied (a bit like World War One not being known by that name until after WWII) and that the terms have come about from a historical perspective, probably when people started looking at languages like COBOL and needed a way to distinguish them from what had come before. As far as I am concerned, these are abstract terms which attempt to characterise a language by the proximity of its syntax to the underlying machine code. I can see the point of view that compares C with a 2GL because there is often a one-to-one relationship between a command and a machine instruction, but I think the portability of C definitely makes it a 3GL. I'm sure that a case can be made for specific languages falling between generations, or having attributes which make it difficult to say which generation they truly belong to. However, I don't think this is the point of the series of 1GL, 2GL, 3GL ... articles because they are, or should be, discussing abstract concepts, rather than categorising individual programming languages.
Potentially you could merge this whole series into a single programming language generations article. SilentC 03:33, 8 October 2007 (UTC)

Something worth considering: It may well be that these terms are ill-defined; that they are inconsistent, self-undermining, and ultimately nonsense.

As I understand it, these terms were largely advanced by software marketing folks. They wanted their own companies' software tools considered to be of a higher "generation" than other people's tools -- particularly database companies pushing "fourth generation languages", meaning "report generators" ... with the implication of higher generation being, of course, that these were newer, more advanced, more powerful, more productive tools.

Naturally, these terms caught on with people who like to think of themselves as expert in the newest, coolest tools. But that's precisely because they aren't very descriptive terms. They're mostly wind. --FOo 05:23, 8 October 2007 (UTC)

Nevertheless, these terms have been used for a long time – if Martin is correct, the term 4GL dates from the 60s – and not just for marketing purposes, and NOT simply to describe the proximity of a language to the hardware. They largely paralleled the generations of hardware, which really DID have generations in a measurable sense; though this no doubt made it harder to be precise about what exactly constituted a 3GL versus a 4GL. There are assertions here and elsewhere that these were primarily marketing terms – and "mostly wind" – and there is no doubt there was plenty of wind on the topic. Yet I don't think the terms are devoid of meaning; and they certainly are not devoid of history. Our primary focus here should probably be on historical usage, with a nod to whatever currency they may have in contemporary academic taxonomy of programming languages. Does somebody have Jean Sammet's book? What does she say about language generations? Trevor Hanson 07:53, 8 October 2007 (UTC)
And just to remind everybody, the term is 'nth generation language' rather than, say, 'nth-ring meta-level language'. This would intrinsically seem to refer to an evolutionary process of language design over time, rather than a measure of complexity or abstraction. Later generations have built on their predecessors. In this view, a later language that is radically different from its predecessors would be described as a different generation; if not, it's the same generation. I think this is how the term has primarily been used. But again, we probably need to defer to how contemporary educators teach computer languages and computer science history. It doesn't matter so much what we may individually remember or believe about this hierarchy, or its non-existence. Trevor Hanson 08:05, 8 October 2007 (UTC)
Well, sorry, but no, I disagree with that. The terms might have been derived from the terminology used in hardware, but every definition of language generation that I have ever seen discusses the languages encompassed in terms of their relationship to machine code. Yes, each generation builds on the one before, but the building has little to do with the underlying hardware and everything to do with the layers of abstraction between the machine and the programmer. 1GL = machine code. 2GL = human-readable mnemonics for machine code. 3GL = abstraction from the machine layer. 4GL = syntax understood by non-programmers. 5GL = visual tools which create underlying code. You can waffle and pontificate as much as you like, but that is the simple definition of the concept that these articles should describe. I'm not debating that the evolution of hardware has enabled these later generations to come to be, but that is not the point. As a pure abstract concept, language generation is simply about how detailed a knowledge the programmer must have of the actual machine instructions that underlie his or her code. SilentC 23:38, 8 October 2007 (UTC)
Sorry, I didn't make myself clear. Of course what you're describing is how we tend to use these terms today (and your summary is good); moreover, the terms are probably only useful in terms of that hierarchy. I also agree that the focus of these articles should be on levels of abstraction. However, I submit that a Wikipedia article about these terms needs to be very clear on their historical use, and I don't believe that their reflection of levels of abstraction away from the hardware became explicit until relatively recently, i.e. the 80s. I think some languages that today we'd consider fourth-generation languages were originally considered third-generation, strictly because of the time at which they were developed. APL and Simula, and perhaps Lucid, CLU, SNOBOL, and Smalltalk, might be examples. At any rate, any taxonomy that might call C a 2nd generation language would seem to fly in the face of historical usage, and this fact needs to be clear. Trevor Hanson 00:56, 9 October 2007 (UTC)
Fair enough. Apologies if I came on a bit strong. This is why I think it would be better if the articles were all merged into an article on generations of programming language: there's obviously some historical context that needs to be wrapped around the whole topic and it probably wouldn't be out of place to mention that the terms have been 'misappropriated' by marketing departments. So it could start out by explaining what we mean by a language generation, and how the term came into common use. Then it could expand on the 5 generations. I think any statement along the lines of 'some people consider C to be a 2GL' can either be omitted, or if kept, require an academic reference. I believe it is generally accepted that C is a 3GL. SilentC 02:25, 9 October 2007 (UTC)
Yes, I think we're actually on the same page, or would be if I could get some sleep. My starting point was the condition of the current articles, and the need to get some eyeballs on them. (Though there is one thing I like about them: that nice little template on the bottom. I have to agree with you about merging the five into a single article. But the template looks so...logical. :) Maybe we could keep the template, if we cut them all down to little stublets that link to the main article, mostly existing for the purpose of animating the template. Each could show a code snippet of the same algorithm, programmed in languages with different levels of abstraction. James Martin did that in a 4GL book back in the 70s.) Anyway, I would argue that your crisp description of language abstraction levels (modulo a little hand waving about how to describe 4GL and 5GL) is a good counterargument to the position that these terms are either meaningless or not useful. Abstraction seems like a very helpful way to view language relationships. Trevor Hanson 09:16, 9 October 2007 (UTC)
"Each could show a code snippet of the same algorithm, programmed in languages with different levels of abstraction." Now that is a nice idea. But please, not Hello World! I don't see an issue with keeping the sub-articles. Some editors have a thing about stubs but I believe they often serve a purpose. Bit busy at the moment, but can probably help out in the next couple of weeks if someone else doesn't get to it first. SilentC 23:01, 9 October 2007 (UTC)