Bottom-up parsing

From Wikipedia, the free encyclopedia

Bottom-up parsing is a strategy for analyzing unknown data relationships that attempts to identify the most fundamental units first, and then to infer higher-order structures from them. It occurs in the analysis of both natural languages and computer languages.

See also: top-down parsing

Within computer science, bottom-up parsing is also known as "shift-reduce parsing"

Contents

[edit] Linguistics

In linguistics, an example of bottom-up parsing would be analyzing a sentence by identifying words first, and then using properties of the words to infer grammatical relations and phrase structures to build a parse tree of the complete sentence.

[edit] Computer Science

In programming language compilers, bottom-up parsing is a parsing method that works by identifying terminal symbols first, and combines them successively to produce nonterminals. The productions of the parser can be used to build a parse tree of a program written in human-readable source code that can be compiled to assembly language or pseudocode.

Different computer languages require different parsing techniques, although it is not uncommon to use a parsing technique that is more powerful than that actually required.

It is common for bottom-up parsers to take the form of general parsing engines, that can either parse or generate a parser for a specific programming language given a specification of its grammar. Perhaps the most well known generalized parser generator is YACC.


[edit] Type of Bottom-up Parsers

The common classes of bottom-up parsing are:

  • LR parser
    • LR(0) - No lookahead symbol
    • SLR(1) - Simple with one lookahead symbol
    • LALR(1) - Lookahead bottom up, not as powerful as full LR(1) but simpler to implement. YACC uses this language.
    • LR(1) - Most general language, but most complex to implement.
    • LR(n) - where n is an integer>=0 indicates an LR parser with n lookahead symbols; while languages can be designed that require more than 1 lookahead practical languages try to avoid this because increasing n can threoretically require exponentially more code and data space (in practice, this may not be as bad).
  • Precedence parser

The Parser performs one of two actions (beside accept). These are "Shift" and "Reduce".

  • Shift means moving a symbol from the input to the stack
  • Reduce means matching a set of symbols in the stack for a more general symbol

For example see figure 1.

[edit] An Example of shift-reduce parsing

Figure 1.

 Take the language:
 S --> AB
 A --> a
 B --> b

 And the input:
 ab

 Then the bottom up parsing is:

    Stack           Input
 ----------+     +-+-+------
           |     |a|b|
 ----------+     +-+-+------
           Shift a
 --------+-+     +-+--------
         |a|     |b|
 --------+-+     +-+--------
      Reduce a (A --> a)
 --------+-+     +-+--------
         |A|     |b|
 --------+-+     +-+--------
           Shift b
 ------+-+-+     +----------
       |A|b|     |
 ------+-+-+     +----------
      Reduce b (B --> b)
 ------+-+-+     +----------
       |A|B|     |
 ------+-+-+     +----------
     Reduce AB (S --> AB)
 --------+-+     +----------
         |S|     |
 --------+-+     +----------
           Accept

[edit] External links

  • [1] An example of shift-reduce parsing (which is a type of bottom up parsing), with a small grammar, state diagram, and c language code to implement the parser
  • [2] course notes on shift reduce parsing
  • [3] A good non-technical tutorial in the context of natural (human) languages
  • [4] A discussion of shift-reduce conflicts in bottom up parsers. A knowledgeable but technical article.