Talk:Chomsky hierarchy
From Wikipedia, the free encyclopedia
Does anyone else find the title of this page ironic? Chomsky hierarchy--I know it isn't a hierarchy over people, but still, it is kinda funny.
- What is funny about a categorization scheme for grammars of different languages? Perhaps you are referring to Chomsky's advocation of anarchosyndicalism which is inherently opposite a hierarchy. —optikos 02:32, 19 September 2006 (UTC)
I've added a table to Chomsky hierarchy, and moved some of the information from the bullets to the table. I thought it would be easier to see the relationships at a glance if they were in a table. Feel free to resurrect the old bullets (below) if you think that's more readable. -LC
I've reinserted the old description because I feel it is more understandable and gives more explanation for people who don't know the hierarchy already. -- JanHidders
- well then I'd hate to see the old description because the current one is incomprehensible. --Ignignot 03:26, Dec 13, 2004 (UTC)
I've fixed the definitions (nonempty γ, Type-1 includes S->&epsilon). I also removed the redundant A->a, for simplicity, and for consistency with some theory books.
I'm curious about the names "Chomsky-n" and "(CHn)". I've always heard people talk of "Type-n" grammars and languages, but not "Chomsky-n" languages. None of my books have it either. A web search turns up lots of hits on "Type-0 grammar" but none on "Chomsky-0". The same is true for the papers archived at [1]. Is this term in widespread use? Or did a textbook coin it for internal use? -LC
I agree with the repair of the Type-1 grammer definition but then you should add a remark at CH1 that the rule S -> ε is also allowed. That way it is indeed the most common form. (I just did some checking in the library.) But for regular grammars the most common form is where all three types of rules are allowed. An important exception is the book by Hopcroft and Ullman Introduction to Automata and Language Theory where they only allow A -> aB and A -> a with the remark that S -> ε is also allowed. Actually I like their approach the best:
- type-0 : no restrictions
- type-1 : αAβ -> αγβ (with γ <> ε) and maybe S -> ε
- type-2 : A -> γ (with γ <> ε) and maybe S -> ε
- type-3 : A -> a or A -> aB and maybe S -> ε
Two arguments: (1) the book is a classic, (2) it makes it immediately clear that every higher type grammar is more restrictive than the lower type grammars. The discussion about the alternative definitions and why they are equivalent can always be done in the articles on the specific type of grammer. Deal? :-)
As far as the "Chomsky-n" names are concerned, I didn't introduce those and I also haven't seen them anywhere before. The hierarchy is usually presented as a hierarchy of grammars and not of languages, so I feel we should do the same. -- JanHidders
I agree. These definitions make it clearer that each is a superset of the next. That's nice. I'll go ahead and change the table to use those definitions. The "maybe S->ε" part can go in a sentence after the table, since it also needs to explain that if you use that form, you can't have S on the right side for Type-1.
I changed it from a hierarchy of languages to a hierarchy of grammars, and got rid of the "Chomsky-n" and "CHn" names before, but it slipped back in when the bullets were re-expanded. I'll make that change again. -LC
The definition of regular languages that was in the table (namely, using A --> a and not A --> \epsilon) was technically wrong, since it will not allow the generation of any regular language that contains the empty word. I've changed this to allow the A --> \epsilon production rule. The discussion above about different ways to present regular languages might be interesting to push in the article, but for now, the diagram should not be incorrect. -dam
Shouldn't the third production in the example grammar be changed as follows:
OLD:
- BA -> AB
NEW:
- AB -> AABB
I don't see where BA can ever be generated when starting with S. This change seems like it would make it so that the grammar could indeed produce { anbn | n ∈ Integers & n >= 0 } as described. Right now it looks like it can only produces { ε, ab } (never being able to use several of the productions).
- It is correct. BA can be generated using the first production twice. For example
S -> ABS -> ABABS -> AABBS -> AABb -> AAbb -> Aabb -> aabb.
However, it would be nicer to give an example of a language that is not context-free. --Zero 12:25, 25 Oct 2003 (UTC)
Alternatively, the same language could be codified with a context-free grammar of only two productions and one nonterminal: S -> aSb; S -> ε.
Useful links for further work
I'm not an expert on formal grammars, so forgive me if I'm wrong, but with the example grammar given for anbn:
S -> ABS
S -> ε (with ε the empty string)
BA -> AB
BS -> b
Bb -> bb
Ab -> ab
Aa -> aa
it seems you can reach a state where no more productions are possible yet you still have non-terminals in the string:
S ABS (S -> ABS) AB (S -> ε)
There's no production rule for AB, nor A alone or B alone. Is there an error?
Also I agree that the simple context-free grammar S -> aSb; S -> ε may be a better example. What do people think?
Chopchopwhitey 20:49, 22 Nov 2003 (UTC)
- It is not an error to be able to reach a point where no productions apply. The definition of the language generated by the grammar is the set of strings of all terminals that can be produced. It doesn't matter what other strings can be produced. --Zero 02:43, 23 Nov 2003 (UTC)
-
- Ok, thanks for clearing that up for me. --Chopchopwhitey 08:28, 23 Nov 2003 (UTC)
The linked page pushdown automaton makes a distinction between the deterministic and non-deterministic varieties of PAs and the languages that can be generated by each. Would it be correct to say this causes a splitting of Type-2 grammars into:
Type-2a - recognizable by NPA
Type-2b - recognizable by DPA ?
A sentence about this would be helpful.
marsh @{1} mysteray \.{1} com
- Not exactly. It would be more correct to say that you have Type-2 grammars (recognizable by NPAs) and Type-2.5 grammars (recognizable by DPAs) where the first describes a proper superset of the second, and the second describes a proper superset of the languages described by Type-3 grammars. -- Jan Hidders 22:52, 16 Feb 2004 (UTC)
The names "type n grammar" and "type n language" for n in 0..3 are used by Chomsky in his original article (On Certain Formal Properties of Grammars, Information and Control 1959), so I think it is best to use them here as well. The purpose of the article is to prove that the four types of languages are separated by proper inclusions, so I think it is best to keep discussion of other types of grammars and languages introduced later (deterministic context-free languages, LR(k) and LL(k) languages, simple context-free languages, bracketed languages) out of this entry. -- rp 9 mar 2004
- Then where should other formal languages be discussed, for example, indexed grammars/languages (Type 1.5) (Aho, 1968), mildly context-sensitive grammars/languages (1.75) (Joshi et al, 1975), and deterministic PDAs (2.5), among others. They certainly deserve treatment somewhere in a hierarchy of formal languages, even if Chomsky didn't discuss them in his 1959 and 1963 papers. --jonsafari 23:03, 20 May 2006 (UTC)
In his original article, Chomsky does not consider empty productions or languages with empty strings, but the fact that empty string generation can always be "pushed out" to a single production is so fundamental that perhaps it must be discussed here rather than in a separate node. -- rp 9 mar 2004
Mention in made in the Chomsky hierarchy entry of a "containment hierarchy". What is it? How does a containment hierarchy differ from other types of formal hierarchies? What are the other types of formal hierarchies?
- If you need an explanation of a term that is linked to another Wikipedia article, then click on that link and read about the concept at the other article. This matter is fully addressed in containment hierarchy. —optikos 02:32, 19 September 2006 (UTC)
Contents |
[edit] Finitely many productions.
I've added to the given definition of a grammar the stipulation that the set of productions must be finite. This restriction needs to be placed either on all grammars, or on each of the classes used to define the Chomsky hierarchy for the article to be correct. I've chosen the former since this is consistent with the definition in formal grammar. Of course one can argue that there's nothing in principle wrong with the idea of an "infinite grammar" and that this idea is useful in certain contexts, but I think that debate is better had in the context of that article! Best wishes, Cambyses 10:21, 21 July 2005 (UTC)
[edit] Schützenberger
The article says nothing about Marcel Schützenberger. I added a line, but it seems there should be more info (some references?) on this. Mhym 19:55, 14 March 2006 (UTC)
[edit] Categorization addition
Given how important the hierarchy is to computer science, especially in the development of programming languages shouldn't there be a categorization link to either Category: Computer Science of Category: Programming Languages?
[edit] Why does this matter?
I came across this somewhat accidentally, and it left me wondering: why does this matter? Not that I doubt that it does, of course. It's just that the article doesn't say much about why this division of grammars is interesting or important. If someone knows, it would be great if they could add it. William Pietri 06:06, 10 July 2006 (UTC)
[edit] Page name change
Back in March, an editor moved this page from Chomsky hierarchy to Chomsky–Schützenberger hierarchy.
The name "Chomsky–Schützenberger hierarchy" has very little currency, as Google search will immediately verify. Hits for "Chomsky hierarchy" outnumber "Chomsky-Schützenberger hierarchy" by 68,000 to 15.
Regardless of the justness or aptness for the name "Chomsky hierarchy", that is what the subject of the article is called, and that is what the article should be named.
There are many similar cases; for example the Zorn's lemma article is under "Zorn's lemma", not "Kuratowski's lemma" or "Kuratowski-Zorn lemma" or "Zorn-Kuratowski lemma", because, in English, the lemma in question is universally known as "Zorn's lemma", despite the fact that Zorn enunciated a different (albiet related) maximal principle, and that Zorn's work was indisputably anticipated by Kuratowski.
Accordingly, I am moving this article back to Chomsky hierarchy. If Schützenberger made significant contributions to it, the article should say so in detail. But it should not be titled "Chomsky-Schützenberger hierarchy" when hardly anyone calls it that.
-- Dominus 13:49, 13 September 2006 (UTC)
Addendum: All of the cited references are for Chomsky, none for Schützenberger. The "external link" refers to "Chomsky hierarchy", no mention of Schützenberger.
-- Dominus 13:53, 13 September 2006 (UTC)
- I wholeheartedly agree with this analysis and this corrective action. —optikos 02:13, 19 September 2006 (UTC)
[edit] Not too technical
Unless someone can explicitly itemize how this article is too technical or does not properly link to prerequisite background material needed to understand this widely-taught computer science concept, then the { { technical } } demerit needs to be removed, because this article is now very accessible and provides numerous links to prerequisite background material. This article is babified as much as it can possibly be and still discuss anything remotely related to the Chomsky hierarchy of grammars and the computability/expressivity implications. (And this article does a very bad job in the computability/expressivity department just to keep it babified for the masses.) Quite honestly we computer scientists do not take kindly to complaints such as "ooOOOOooo, it makes my head hurt" regarding our subject matter. I have heard ever since diagramming sentences in 6th grade how much people hate structured grammar in every one of its forms of presentation. It is time for the topic to get some respect! —optikos 02:32, 19 September 2006 (UTC)