Bottom-up parsing

From Wikipedia, the free encyclopedia

This article or section is in need of attention from an expert on the subject.
WikiProject Computer science or the Computer science Portal may be able to help recruit one.
If a more appropriate WikiProject or portal exists, please adjust this template accordingly.

Bottom-up parsing is a strategy for analyzing unknown data relationships that attempts to identify the most fundamental units first, and then to infer higher-order structures from them. It occurs in the analysis of both natural languages and computer languages.

See also: top-down parsing

Within computer science, bottom-up parsing is also known as "shift-reduce parsing"

1 Linguistics
2 Computer Science
3 Type of Bottom-up Parsers
4 An Example of shift-reduce parsing
5 External links

[edit] Linguistics

In linguistics, an example of bottom-up parsing would be analyzing a sentence by identifying words first, and then using properties of the words to infer grammatical relations and phrase structures to build a parse tree of the complete sentence.

[edit] Computer Science

In programming language compilers, bottom-up parsing is a parsing method that works by identifying terminal symbols first, and combines them successively to produce nonterminals. The productions of the parser can be used to build a parse tree of a program written in human-readable source code that can be compiled to assembly language or pseudocode.

Different computer languages require different parsing techniques, although it is not uncommon to use a parsing technique that is more powerful than that actually required.

It is common for bottom-up parsers to take the form of general parsing engines, that can either parse or generate a parser for a specific programming language given a specification of its grammar. Perhaps the most well known generalized parser generator is YACC.

[edit] Type of Bottom-up Parsers

The common classes of bottom-up parsing are:

LR parser
- LR(0) - No lookahead symbol
- SLR(1) - Simple with one lookahead symbol
- LALR(1) - Lookahead bottom up, not as powerful as full LR(1) but simpler to implement. YACC uses this language.
- LR(1) - Most general language, but most complex to implement.
- LR(n) - where n is an integer>=0 indicates an LR parser with n lookahead symbols; while languages can be designed that require more than 1 lookahead practical languages try to avoid this because increasing n can threoretically require exponentially more code and data space (in practice, this may not be as bad).
Precedence parser
- Simple precedence parser
- Operator-precedence parser
- Extended precedence parser

The Parser performs one of two actions (beside accept). These are "Shift" and "Reduce".

Shift means moving a symbol from the input to the stack
Reduce means matching a set of symbols in the stack for a more general symbol

For example see figure 1.

[edit] An Example of shift-reduce parsing

Figure 1.

 Take the language:
 S --> AB
 A --> a
 B --> b

 And the input:
 ab

 Then the bottom up parsing is:

    Stack           Input
 ----------+     +-+-+------
           |     |a|b|
 ----------+     +-+-+------
           Shift a
 --------+-+     +-+--------
         |a|     |b|
 --------+-+     +-+--------
      Reduce a (A --> a)
 --------+-+     +-+--------
         |A|     |b|
 --------+-+     +-+--------
           Shift b
 ------+-+-+     +----------
       |A|b|     |
 ------+-+-+     +----------
      Reduce b (B --> b)
 ------+-+-+     +----------
       |A|B|     |
 ------+-+-+     +----------
     Reduce AB (S --> AB)
 --------+-+     +----------
         |S|     |
 --------+-+     +----------
           Accept

[edit] External links

[1] An example of shift-reduce parsing (which is a type of bottom up parsing), with a small grammar, state diagram, and c language code to implement the parser