Talk:LR parser
From Wikipedia, the free encyclopedia
It's a bit long now. Perhaps there should be a separate article about LR(0) parsers that deals with the LR(0) algorithm. -- Jan Hidders 10:01 Aug 18, 2002 (PDT)
This article is one of the better explanations of LR parsing I have seen. My only complaint is the dual use of the "*" character as (1) the current position in a in the LR(0) item, and (2) as the multiplication operator. That leads to items like "E * * B" (an actual example from the text) which is confusing. Perhaps we could just use a period? --P3d0 22:35, 23 Aug 2003 (UTC)
(Scratch that; apparently Konqueror renders · as an asterisk. --P3d0 23:16, 23 Aug 2003 (UTC))
Looks like the "note on SLR and LALR" overlaps a bit with the section on conflicts. Perhaps they could be combined. --P3d0 01:38, 26 Oct 2003 (UTC)
It's a really minor thing, but could we maybe change the grammars which use nonterminals "E" and "B" to use some other letters? They often look awfully similar on my screen. Marnanel 05:53, 4 Feb 2004 (UTC)
Contents |
[edit] Programming Languages not LR
I'm looking for a reference for the claim that Perl and C cannot be parsed by an LR parser. Does somebody know one? -- Jan Hidders 14:12, 21 Jul 2004 (UTC)
- The one thing that I know of is the dangling else problem. Because of this problem, not only can C not be parsed by a (pure) LR parser, it is not even an unambiguous grammar. I'm sure a Google search for dangling else will produce a large number of references, and if you're lucky, some of them may even have references with regards to other reasons C cannot be parsed by an LR parser (if there are indeed other reasons; I would imagine there are, but I don't know). CyborgTosser (Only half the battle) 07:51, 17 Apr 2005 (UTC)
-
- It's possible to create unambiguous grammar for languages with if-else construction, but requires introducing a new nonterminal symbol: all_instructions_but_if_with_else. However such grammar becomes unnatural and less readible and that's why most parsers use another ambiguousness handling techniques, like associativity declarations.
- Typedefs are another issue with C. They make the language context-dependent. It's discussed a little bit here [1], but I think there's a better reference somewhere. --Werdnagreb 19:06, 5 May 2005 (UTC)
- The article now refers to Perl and C++. C++ is probably a better example than C, as I know C parsers get around both of the points noted above with fairly inocuous kludges, whereas C++ is trickier. It might be worth mentioning that early versions of Fortran were even bigger offenders, although the problems go beyond difficulty of parsing. CyborgTosser (Only half the battle) 01:07, 8 May 2005 (UTC)
[edit] Minor improvements to the article
This is, indeed, a great explanation of the LR(0) algorithm. Nevertheless, I'd like to suggest a few changes that may make it better:
a) Change the term "begin state" to "start state", as it seems to be the most correct form to reference the, err, start state of a grammar.
b) the section "Conflicts in the constructed tables" needs some editing, as it seems. For example, it begins with the phrase "The automaton that is constructed is by the way it is constructed always a deterministic automaton", which, although correct, sounds confusing (at least to me). A better wording might be: "The steps presented so far always construct a deterministic automaton." or "The automaton contructed by looking for transitions between item sets will always be a deterministic one."
c) Besides (still in the section referred above), we find several references to the grammar not being LL(0). Because we're discussing LR(0), it looks like a typo. It would be better if an explanation of how the non-LL(0)-ness of the grammar afftects the LR(0) parser was given.
d) Finally -- and maybe this is just in my system, I guess that using • instead of · to represent the dot would make it more visble.
Great article. --Branco 13:39, 23 October 2005 (UTC)
[edit] Discussion of LR(1) Construction Needed
Although LR(0) item set construction is important bacause it is the basis for the SLR(1) parser construction algorithm as well as most LALR(1) construction algorithms, it should be pointed out that very few parsers are based on LR(0) itself. Indeed, LALR(1) parsers such as those produced by Yacc and GNU bison seem to be the most commonly used LR(k)-based parsers. Discussing LR(1) item set constuction would not only describe the LR(k) methodology when lookahead is a component of an item (as it always is except for the trivial case of k = 0), it would provide the basis for understanding the LALR(1) construction method and why an LR(1) grammar may not be LALR(1) or why an LALR(1) grammar may not be SLR(1).
It might also be pointed out that although a grammar might be LR(k) for some k > 1, there will always be LR(1), LALR(1), and SLR(1) grammars for the same language. --Raaronson 16:58, 27 December 2005 (UTC)
[edit] Discussion of construction of grammars for LR parsers?
It might be worthwhile to have a section about what sort of grammars work best with LR parsers: yes, they can parse all LR grammars which is a strictly-defined set, but some work better than others. E.g.:
list ::= item ',' list list ::= item '.'
is an LR(1) grammar that would need to shift the entire input onto the stack before reducing it, whereas:
list ::= list-items '.' list-items ::= item list-items ::= list-items ',' item
is a grammar for the same language that allows the parser to reduce after each item. I just added a note to left recursion that mentions this, although not in as much detail. The same reference [2] is probably useful here. JulesH 08:06, 4 August 2006 (UTC)
[edit] removed sentence
I removed from section 1.2:
In practice the parser will usually try to recover from a syntax error by ignoring unexpected symbols and/or inserting missing symbols, and continue parsing, but this is outside the scope of this article.
This paragraph is like taken from a tutorial. Well, I imagine it's only because of the "outside of scope" thing. Whatever. --euyyn 22:14, 4 September 2006 (UTC)
[edit] Web resource for LR(k) parser construction
These are what I found when searching on the Internet for my project on implementing a LR(1) yacc:
- Practical LR(k) Parser Construction [3]. By David R. Tribble, 12/12/2004. This is a long and complete introduction, although many places are labeled as "+INCOMPLETE".
- The Honalee LR(k) Algorithm [4]. By David R. Tribble, 4/27/2006.
BTW, in "E → E + B", maybe B can be changed to T so it looks different from E. "Item" is called "Configuration" in some early papers in the 1970s.
-- oxc 10:04, 7 December 2006 (UTC)
[edit] Closure of item sets
I am not convinced that the example is entirely consistent with the given definition of closure.
In particular, in construction of the item sets the rule "S → E •" is added to Item set 3, and two others are added to complete the closure. However this rule has empty follow set, so closure as defined (as I understand) will not add a rule such as "E → E • * B" to the item set.
It is entirely possible that I have misunderstood some part of this -- if so perhaps this is an indicator that the explanation needs clearing up a little (or that I should pay attention in lectures :p ), but a comment either way by someone more knowledgable than I would be appreciated. JoeKearney 19:52, 6 April 2007 (UTC)
- With a bit more thought, I believe that what were trying to say is this:
-
If we're at the state of "S → E •" then we want (i.e. the closure includes) all rules to parse anything in the follow-set of E.
- In particular it includes rules 1 and 2: "E → E • * B" and "E → E • + B".
- I believe the definition of the closure doesn't fully account for this, but I'm not certain. If I have a spark of inspiration, or (more likely) if someone would care to point one way or the other I'll edit to make this more clear. JoeKearney 20:19, 6 April 2007 (UTC)