Talk:Context-free grammar

From Wikipedia, the free encyclopedia

Socrates This article is within the scope of the WikiProject Philosophy, which collaborates on articles related to philosophy. To participate, you can edit this article or visit the project page for more details.
??? This article has not yet received a rating on the quality scale.
??? This article has not yet received an importance rating on the importance scale.
This article is within the scope of WikiProject Computer science, which aims to create a comprehensive computer science reference for Wikipedia. Visit the project page for more information and to join in on related discussions.
B rated as B-Class on the assessment scale
Top rated as Top-importance on the assessment scale

Contents

[edit] Derivations and syntax trees

The article presents this example:

For example, if we take the following grammar:

(1) S → S + S
(2) S → 1

and the string "1 + 1 + 1" then the left derivation of this string is the list [ (1), (1), (2), (2), (2) ]. Analogously the rightmost derivation is defined as the list that we get if we always replace the rightmost nonterminal first. In this case this would be the list [ (1), (2), (1), (2), (2)].

However, we could just as easily replace the rightmost S with S+S for the rightmost case, or we could replace the leftmost S with 1 for the leftmost case. In other words, this section should be rewritten with a better (less ambiguous) example RaulMiller 02:32, 21 Sep 2005 (UTC)
The "1 + 1 + 1"-example is really hard to understand. There is (if I understood it correct) two ways to do rightmost derivation, and two ways to do leftmost derivation. In adittion the method that gives [ (1), (2), (1), (2), (2) ] actually applies to both leftmost- and rightmost derivation? Things would probably be better explained to the reader if another example is used. [Guest user], 3 Dec 2005

It is necessary to add, that arrows outgoing from node are ordered. AlexShkotin 05:14, 6 September 2006 (UTC)

[edit] Decidability

It is not possible to construct a general algorithm which takes as input two context-free grammars and decides whether the two grammars generate the same language; nor is it decidable whether their languages have a single string in common. It is however possible to decide whether a given context-free grammar generates a non-empty language or not, and it is also possible to decide algorithmically whether a given context-free grammar generates an infinite language or not.

Is it possible to decide whether given context-free grammar is equivalent to given regular grammar ? In particular is it possible to decide whethere there is single string that doesn't belong to context-free language (that is, whether or not is it equivalent to .*) ? Taw 14:41 Apr 6, 2003 (UTC)

The answer for your first question is Pumping lemma. -- Sundar 05:08, Feb 15, 2005 (UTC)
And the answer to your second question is yes. See Rice's Theorem. -- JBolla
It seems both answers are wrong. It is undecidable whether a given context-free grammar generates all words over its alphabet. Jochgem 23:45, 11 March 2007 (UTC)
The first answer is indeed wrong as well: it is undecidable whether any given context-free language is regular (I was surprised to learn this, too. Google for references). The pumping lemma isn't good enough: some non-regular languages can be pumped. Rp (talk) 23:29, 11 April 2008 (UTC)

[edit] Syntax or semantics

Context-free grammars are important because they are powerful enough to describe the syntax of programming languages

(Well...context-free grammars can describe most of the syntax of programming languages. For example, any programming language that requires that variables be declared before use is not fully describable via a context-free grammar, essentially because there can be arbitrary amounts of source code between declaration and use--see the reference to the "pumping lemma" below. Such constraints are typically swept over into the corner, labelled "semantics," and dealt with outside of the parser mechanism proper. In the example, "semantic action" code is attached to the grammar rule that recognizes identifiers to check against a symbol table.) --24.36.158.101

Right, the requirement to declare before use is called "semantics" rather than "syntax", so the sentence in the article is correct: the syntax of programming languages is describable by context-free grammars. --LC
Not exactly - it's debatable whether "declare-before-use" is really "syntax" or "semantics". In C and C++, for example, it's impossible to check whether a source file is syntactically correct without building a symbol table while parsing that can at least distinguish between typedef'd and "regular" identifiers. The syntax of C and C++ critically depends on the distinction betwen those two forms of identifiers, which depends on what you're calling semantic information. In addition, there's plenty of other such stuff in various languages that's usually "swept into the corner" but is unquestionably syntax. For example: the ad hoc precedence rules built into parser generators to resolve dangling else and similar conflicts; the whole "longest match" convention that is built into lexical analyzers; the layout mechanisms of many more recent languages such as ML, Haskell, and Python. Then there's the fact that the C++ grammar has ambiguities that cannot be resolved by any finite amount of lookahead, and so the C++ standard simply states ad hoc disambiguation rules which implementors have to follow by making the parser capable of looking ahead an arbitrary distance in certain situations. As far as I can tell, practical languages that really follow a "pure" context-free syntactic paradigm are rather few and far between. Brynosaurus 15:00, 13 Aug 2004 (UTC)
It's my understanding that because of these facts C, C++, and the other languages do not have context-free grammars. A context-free grammar is an ideal for a computer programming language which is seldom achieved - basically because many things which are very useful for programming languages are not possible in a context-free grammar.
This is the strict sense of "context-free" grammar. When programmers think about "grammar", "syntax", and "semantics" they usually have much looser usage of these terms. — Hippietrail 02:13, 14 Aug 2004 (UTC)
Quite so. It is certainly almost universal practice to use BNF notation informally as a method of concisely describing or suggesting the general, overall structure of a language's syntax. But it is almost never the case that these BNF notations actually formally specify the syntax of the language by themselves. Brynosaurus 06:18, 14 Aug 2004 (UTC)
I have added some words in the article in the hope of clearing up the confusion. It is a fact that the equivocation of 'syntax' and 'context-free grammar' is common in computer science.

[edit] Phrase structure grammars

Phrase structure grammar and context-free grammars


There is a redirect from Phrase structure grammar to this page.

Does this mean that Phrase structure grammars are just context-free grammars in any case? Isn't there more to say about Phrase structure grammars?

PSG isn't a precise term. CFGs are definitely PSGs, but there are also CSGs (context-sensitive grammars), for example, which could also be described as PSGs. The redirect seems sensible enough to me for the moment, since it's hard to imagine that an article on PSGs could be written which didn't just duplicate material in Context-free grammar and Chomsky hierarchy. I agree though, that the redirect is misleading. Perhaps it would be better to have a stub with links to Context-free grammar and Chomsky hierarchy? Cadr 23:11, 14 Feb 2005 (UTC)

It ought to be a stub if they are not equivalent.Otherwise it is misleading. -- Sundar 05:08, Feb 15, 2005 (UTC)

I agree with Sundar. –jonsafari 22:38, 18 June 2007 (UTC)

I did some more looking around on this. Here's the score from the most reputable sources I could find:

  • Chomsky Three models for the description of language (1956) introduces the term "phrase structure grammar". Here he is clearly referring to a CFG.
  • Jurafsky and Martin's Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition (2000) says PSG is a synonym for CFG.
  • JE Hopcroft and JD Ullman, Introduction to Automata Theory, Languages, and Computation (1979) might in some sense be more reputable than Jurafsky and Martin, and says PSG refers to anything in the Chomsky hierarchy. This might reflect a change in usage between the introduction of the term and this work's publication.
  • Lewis and Papadimitriou Elements of the Theory of Computation (1998) doesn't address the issue.

So this seems like a disputed usage. At any rate, no one uses the term in any current work. The stub idea seems reasonable. kraemer 00:39, 20 June 2007 (UTC)

I changed it to a stub. Cadr 18:21, 21 June 2007 (UTC)
As far as I can tell: computer science doesn't use the term phrase structure grammar at all, while it is the standard term in (generative) linguistics. Hence the confusion. Rp (talk) 23:32, 11 April 2008 (UTC)

[edit] Natural languages

It would be nice to include some statement about whether natural languages are context-free in this article. On this topic, the article Bambara language has the following note:

In mathematical linguistics Bambara is regarded with interest, since for only very few languages it was possible to show that they were not context-free. For Zurich German and Dutch the proof is based on sentence construction, whereas the proof for Bambara is based on word construction.

I wonder what is statement is really supposed to say, because I find it a bit difficult to believe. Can it really be so hard to find an example showing that English is not context-free? --Saforrest 12:37, Apr 19, 2005 (UTC)

[edit] BNF and Production rules

The article states that "BNF (Backus-Naur Form) is the most common notation used to express context-free grammars.", but still the article make use of production rules like V→w instead of V::=w, is that because production rules and BNF are interchangeable ? --Robsy 11:32, 2 May 2006 (UTC)

No, it is because theoretical articles about context-free grammars use the arrow, while the BNF notation is pretty much restricted to manuals or textbooks that describe the syntax of a concrete language. The reason for the difference, by the way, is that BNF was designed for ASCII-only text. Rp 09:28, 25 July 2007 (UTC)

[edit] Lojban

I doubt that Lojban is actually context-free. It probably were context-free if none of the terminator cmavo were elidable. Icek 22:53, 28 June 2006 (UTC)

It is parsable at the word-level with an LALR parser (there is at least one implementation of a parser done using Yacc). However, should it really be listed on the context free grammar page (I'd argue not)? Saying it has "an immense expressive power" is definitely POV and there is no citation, so I'm removing that part. JordanDeLong 03:34, 24 August 2006 (UTC)
AFAIK it is not possible without some preprocessing. From the Lojban Reference Grammar[1]: Lojban is not in itself LALR1. There are words whose grammatical function is determined by following tokens. As a result, parsing of the YACC grammar must take place in two steps. [...].
The problem that I see with the alleged context-freeness of Lojban is as follows: Consider a Lojban sumti (an argument to a predicate) which contains subordinate clauses which contain the words be (marks the beginning of the relative clause) and be'o (which ends a relative clause). The subordinate clause may itself contain subordinate clauses and so on. However deeply nested the subordinate clauses are, all the be'o terminators may be elided if there is a single ku terminator terminating the sumti (and if there are no ambiguities stemming from multiple subordinate clauses on the same level, for further details see [2], section 7). Icek 20:15, 25 August 2006 (UTC)
Got ya: I think you're right. I went ahead and removed it from the page. It may need some other items in "Other Examples" now, though. JordanDeLong 02:55, 26 August 2006 (UTC)

[edit] How useful is this page?

You either understand all this jargon, in which case you already know what a "context free grammar" is...or you don't, and this is Greek to you.

Useless.

The comment above is useless. I found the explanation on this page clear and helpful. --Whisk3rs 00:07, 17 May 2007 (UTC)

I strongly disagree. An encyclopedic article is supposed to be understandable for the interested layman. This article in its present state is not; it presupposes familiarity with the basic notions of formal language theory. The article would greatly benefit from clear examples, and external references to more introductory material. Rp 09:32, 25 July 2007 (UTC)
As someone who has studied this topic in depth, it's hard for me to get perspective on just what is hard for the layman to understand. Do we have a layman that is willing to make specific criticisms that I can address? --P3d0 06:15, 27 July 2007 (UTC)

[edit] Clarify clarify and clarify more

A starting point would be to add an explanation of the used symbol S to the first example. It represents a grammar? In that case, would G be more appropriate? No? Why? Also, the use of the arrow symbol (product?) could be linked to an article describing that symbol etc. The use of symbols instead of human language warrants for an easy reference material to symbols used. Otherwise the use of symbols serve only an obfuscation purpose, being informative only to those already adept in the topic, which doubtfully is the purpose of Wikipedia.

[edit] AnBnCn Example and PEGs

This particular language can be generated by a parsing expression grammar, which is a relatively new formalism that is particularly well-suited to programming languages.

I've cut this from the head of the article. PEGs are relatively new, and I see no reason why they would be differentiated from any of the number of well-established formalisms that can also accept the language AnBnCn, especially since this is the canonical type 1 language (context-sensitive).

QTJ 20:24, 11 October 2006 (UTC)

Another example of a context-free language is \{ b^n a^m b^{2n} : n \ge 0, m \ge 0 \}. This is not a regular language, but it is context free as it can be generated by the following CFG (Context Free Grammar):

   S → bSbb | A
   A → aA | ε 

In my opinion the grammar in this example is not Context Free, because CFL does not have capability of remembering the number of symbols it has seen in b which is n, and then match it to last b containing 2n symbols. Please clarify my doubt.

The grammar proves you wrong! It is definitely a context-free grammar: the possible rewritings of S and A do not depend on the context they appear in. Rp 20:21, 28 July 2007 (UTC) What you appear to mean is that the language is not regular; which is indeed the case. Rp (talk) 15:32, 16 May 2008 (UTC)