Left recursion

From Wikipedia, the free encyclopedia

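In computer science, left recursion is a special case of recursion in which a nonterminal of a formal grammar can derive a sentential form that begins with that same nonterminal.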

A formal grammar that contains left recursion cannot be parsed directly by a naive recursive descent parser, because the parser would call the same procedure again without consuming any input. In contrast, left recursion is preferred for LALR parsers because it results in lower stack usage than right recursion.

Definition

"A grammar is left-recursive if we can find some non-terminal A which will eventually derive a sentential form with itself as the left-symbol." [JPR02, http://www.cs.may.ie/~jpower/Courses/parsing/parsing.pdf#search='indirect%20left%20recursion']

Immediate left recursion

Immediate left recursion occurs in rules of the form

A \rightarrow A\alpha\,|\,\beta

where α and β are sequences of nonterminals and terminals, and β does not start with A.

For example, the rule

Expr \rightarrow Expr\,+\,Term

is immediately left-recursive. A recursive descent parser for this rule might look like:

function Expr() {
    Expr();
    match('+');
    Term();
}

and would fall into infinite recursion, because Expr() calls itself before consuming any input.
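The failure can be reproduced with a minimal Python sketch of the pseudocode above. The token list, the `make_parser` and `demo` helpers, and the choice to treat Term as a single "int" token are assumptions made for illustration; in Python the runaway recursion surfaces as a RecursionError rather than a hang.

```python
def make_parser(tokens):
    """Build a naive recursive descent parser for Expr -> Expr '+' Term.

    Hypothetical helper: Term is simplified to a single "int" token.
    """
    pos = 0

    def match(expected):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != expected:
            raise SyntaxError(f"expected {expected!r}")
        pos += 1

    def term():
        match("int")

    def expr():
        expr()        # left-recursive call: no token has been consumed yet
        match("+")    # never reached
        term()        # never reached

    return expr


def demo():
    """Run the parser and report how it terminates."""
    parse_expr = make_parser(["int", "+", "int"])
    try:
        parse_expr()
    except RecursionError:
        return "RecursionError"
    return "parsed"
```

Calling `demo()` returns "RecursionError": the input position never advances, so every call to `expr` is made in exactly the same state as the previous one.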

Indirect left recursion

Indirect left recursion in its simplest form can be written as:

A \rightarrow B\alpha\,|\,C

B \rightarrow A\beta\,|\,D

possibly giving the derivation A \Rightarrow B\alpha \Rightarrow A\beta\alpha \Rightarrow \ldots

More generally, for the nonterminals A_0, A_1, ..., A_n, indirect left recursion can be defined as being of the form:

A_0 \rightarrow A_1\alpha_1\,|...

A_1 \rightarrow A_2\alpha_2\,|...

...

A_n \rightarrow A_0\alpha_{(n+1)}\,|...

where α_1, α_2, ..., α_{n+1} are sequences of nonterminals and terminals.
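Both forms of left recursion can be detected by following only the leftmost symbol of each production. The sketch below assumes a grammar stored as a dictionary from nonterminal names to lists of productions (tuples of symbols), and assumes there are no ε-productions, so only the first symbol of a production can end up leftmost. The function name and the terminal productions given to C and D are illustrative assumptions.

```python
def left_recursive_nonterminals(grammar):
    """Return the nonterminals A for which A derives a form starting with A.

    Assumes no ε-productions, so only the first symbol of each production
    needs to be followed.
    """
    # leftmost[A] = nonterminals that start at least one A-production
    leftmost = {
        a: {p[0] for p in prods if p and p[0] in grammar}
        for a, prods in grammar.items()
    }
    result = set()
    for start in grammar:
        seen, stack = set(), list(leftmost[start])
        while stack:                      # depth-first search over leftmost symbols
            b = stack.pop()
            if b in seen:
                continue
            seen.add(b)
            stack.extend(leftmost[b])
        if start in seen:                 # start is reachable from itself
            result.add(start)
    return result


# The simplest indirect case from the text, with α = a and β = b.
# C and D are given terminal productions here purely for illustration.
grammar = {
    "A": [("B", "a"), ("C",)],
    "B": [("A", "b"), ("D",)],
    "C": [("c",)],
    "D": [("d",)],
}
```

For this grammar, `left_recursive_nonterminals(grammar)` returns the set containing A and B, even though neither nonterminal has an immediately left-recursive production.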

Removing left recursion

Removing immediate left recursion

The general algorithm to remove immediate left recursion follows. Several improvements to this method have been made.

For each rule of the form

A \rightarrow A\alpha_1\,|\,...\,|\,A\alpha_n\,|\,\beta_1\,|\,...\,|\,\beta_m

Where:

  • A is a left-recursive nonterminal
  • each α_i is a nonempty sequence of nonterminals and terminals (\alpha_i \ne \epsilon)
  • each β_j is a sequence of nonterminals and terminals that does not start with A.

Replace the A-productions by the productions:

A \rightarrow \beta_1A^\prime\, |\, ...\,  |\,  \beta_mA^\prime

And create a new nonterminal A′ with the productions:

A^\prime \rightarrow \epsilon\, |\, \alpha_1A^\prime\,  |\,  ...\, |\, \alpha_nA^\prime

This newly created symbol is often called the "tail", or the "rest".
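The replacement above can be sketched in Python as follows. The grammar representation (productions as tuples of symbol names, with the empty tuple standing for ε) and the function name are assumptions made for this example; the sketch also assumes that the name A' is not already used in the grammar.

```python
EPSILON = ()  # the empty production, written ε in the text


def remove_immediate_left_recursion(a, productions):
    """Rewrite A -> A α_i | β_j as A -> β_j A' and A' -> ε | α_i A'.

    `productions` is a list of tuples of symbols; every α_i is assumed nonempty.
    """
    alphas = [p[1:] for p in productions if p and p[0] == a]   # the α_i
    betas = [p for p in productions if not p or p[0] != a]      # the β_j
    if not alphas:
        return {a: list(productions)}     # nothing to do: A is not left-recursive
    tail = a + "'"                        # the new "tail" nonterminal A'
    return {
        a: [beta + (tail,) for beta in betas],
        tail: [EPSILON] + [alpha + (tail,) for alpha in alphas],
    }


result = remove_immediate_left_recursion(
    "Expr", [("Expr", "+", "Term"), ("Term",)]
)
```

Here `result` maps Expr to the single production Term Expr', and Expr' to the two productions ε and + Term Expr'.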

Removing indirect left recursion

If the grammar has no ε-productions (no productions of the form A \rightarrow ... | \epsilon | ...) and is not cyclic (no derivations of the form A \Rightarrow ... \Rightarrow A for any nonterminal A), this general algorithm may be applied to remove indirect left recursion:

Arrange the nonterminals in some (any) fixed order A_1, ..., A_n.

for i = 1 to n {
  for j = 1 to i - 1 {
    • let the current A_j productions be
      A_j \rightarrow \delta_1 | ... | \delta_k
    • replace each production A_i \rightarrow A_j \gamma by
      A_i \rightarrow \delta_1\gamma | ... | \delta_k\gamma
  }
  • remove direct left recursion for A_i
}
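A compact Python sketch of this procedure follows. It assumes the same hypothetical representation as before (a dictionary from nonterminal names to lists of symbol tuples, with the empty tuple for ε) and removes direct left recursion for A_i once, after all substitutions for that i have been made, which is the usual textbook arrangement. Function names and the sample grammar are illustrative only.

```python
def remove_immediate(a, productions):
    """A -> A α | β  becomes  A -> β A'  and  A' -> ε | α A'."""
    alphas = [p[1:] for p in productions if p and p[0] == a]
    betas = [p for p in productions if not p or p[0] != a]
    if not alphas:
        return {a: productions}
    tail = a + "'"                        # assumed not to clash with existing names
    return {
        a: [beta + (tail,) for beta in betas],
        tail: [()] + [alpha + (tail,) for alpha in alphas],
    }


def remove_left_recursion(grammar, order):
    """Remove direct and indirect left recursion.

    Assumes the input grammar has no ε-productions and no cycles.
    """
    g = {a: list(prods) for a, prods in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for p in g[ai]:
                if p and p[0] == aj:
                    # A_i -> A_j γ  becomes  A_i -> δ γ  for every A_j -> δ
                    expanded.extend(delta + p[1:] for delta in g[aj])
                else:
                    expanded.append(p)
            g[ai] = expanded
        g.update(remove_immediate(ai, g[ai]))
    return g


# A -> B a | c,  B -> A b | d  (indirectly left-recursive)
grammar = {"A": [("B", "a"), ("c",)], "B": [("A", "b"), ("d",)]}
result = remove_left_recursion(grammar, ["A", "B"])
```

With the order A, B the A-productions are left unchanged, while B becomes B -> c b B' | d B' with the new tail B' -> ε | a b B'. A different ordering of the nonterminals gives a different but equivalent grammar.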

Pitfalls

The above transformations remove left recursion by creating a right-recursive grammar, but this changes the associativity of the rules: left recursion yields left associativity, and right recursion yields right associativity. For example, we start out with the grammar:

Expr \rightarrow Expr\,+\,Term\,|\,Term

Term \rightarrow Term\,*\,Factor\,|\,Factor

Factor \rightarrow (Expr)\,|\,Int

After applying the standard transformations to remove left recursion, we have the following grammar:

Expr \rightarrow Term Expr'

Expr' \rightarrow + Term Expr'\,|\,\epsilon

Term \rightarrow Factor Term'

Term' \rightarrow * Factor Term'\,|\,\epsilon

Factor \rightarrow (Expr)\,|\,Int

Parsing the string 'a + a + a' (where each a is an Int token) with the first grammar in an LALR parser (which can handle left-recursive grammars) results in the parse tree:

                           Expr
                         /      \
                       Expr  + Term
                     /  |  \        \
                   Expr + Term    Factor
                    |       |        |
                  Term    Factor    Int
                    |        |
                  Factor    Int
                    |
                   Int  

This parse tree grows to the left, indicating that the '+' operator is left associative, representing (a + a) + a.

But now that we've changed the grammar, our parse tree looks like this (the Term' nodes, each of which derives ε, are omitted for brevity):

                            Expr
                           /    \
                        Term     Expr'
                          |     /  |  \
                      Factor   +  Term  Expr'
                          |        |   /  |  \
                         Int   Factor +  Term  Expr'
                                   |      |      |
                                  Int  Factor    ε
                                          |
                                         Int

We can see that the tree grows to the right, representing a + (a + a). We have changed the associativity of our operator '+'; it is now right-associative. While this does not change the value of a sum, the result would be significantly different if the operator were subtraction.
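The difference is easy to check numerically. The following Python sketch folds a list of operands in both groupings; the helper names `eval_left` and `eval_right` and the sample numbers are chosen for illustration only.

```python
from functools import reduce
import operator


def eval_left(operands, op):
    """Group as ((a op b) op c), as a left-growing parse tree does."""
    return reduce(op, operands)


def eval_right(operands, op):
    """Group as (a op (b op c)), as a right-growing parse tree does."""
    *init, last = operands
    return reduce(lambda acc, x: op(x, acc), reversed(init), last)


sum_left = eval_left([8, 3, 2], operator.add)     # (8 + 3) + 2
sum_right = eval_right([8, 3, 2], operator.add)   # 8 + (3 + 2)
diff_left = eval_left([8, 3, 2], operator.sub)    # (8 - 3) - 2
diff_right = eval_right([8, 3, 2], operator.sub)  # 8 - (3 - 2)
```

Both sums evaluate to 13, but `diff_left` is 3 while `diff_right` is 7.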

The problem is that normal arithmetic requires left associativity. Several solutions are: (a) rewrite the grammar to be left recursive and use a parser that accepts left recursion, or (b) rewrite the grammar with more nonterminals to force the correct precedence and associativity, or (c) if using Yacc or Bison, use the operator declarations %left, %right and %nonassoc, which tell the parser generator which associativity to force.
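For a hand-written recursive descent parser there is a further common workaround, not among the options listed above: keep the transformed grammar, but pass the tree built so far into the procedure for Expr' so that each operator is attached to the left. The Python sketch below assumes a simplified setting in which Term is a single integer token and in which '-' is accepted alongside '+'; both simplifications, and all function names, are assumptions of this example.

```python
def parse_expr(tokens):
    """Parse Expr -> Term Expr' and return a left-associative nested tuple.

    Simplifying assumption: Term is a single integer token.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def term():
        nonlocal pos
        tok = peek()
        if not isinstance(tok, int):
            raise SyntaxError(f"expected an integer, got {tok!r}")
        pos += 1
        return tok

    def expr_tail(left):
        # Expr' -> op Term Expr' | ε, with the tree built so far passed as `left`
        nonlocal pos
        op = peek()
        if op not in ("+", "-"):
            return left                       # Expr' -> ε
        pos += 1
        right = term()
        return expr_tail((op, left, right))   # attach to the left, then continue

    tree = expr_tail(term())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


tree = parse_expr([8, "-", 3, "-", 2])
```

Here `tree` is ("-", ("-", 8, 3), 2), which corresponds to (8 - 3) - 2 even though the parser follows the right-recursive grammar.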
