Bottom-up parsing
From Wikipedia, the free encyclopedia
Bottom-up parsing is a strategy for analyzing unknown data relationships that attempts to identify the most fundamental units first, and then to infer higher-order structures from them. It occurs in the analysis of both natural languages and computer languages.
- See also: top-down parsing
Within computer science, bottom-up parsing is also known as "shift-reduce parsing"
Contents |
[edit] Linguistics
In linguistics, an example of bottom-up parsing would be analyzing a sentence by identifying words first, and then using properties of the words to infer grammatical relations and phrase structures to build a parse tree of the complete sentence.
[edit] Computer Science
In programming language compilers, bottom-up parsing is a parsing method that works by identifying terminal symbols first, and combines them successively to produce nonterminals. The productions of the parser can be used to build a parse tree of a program written in human-readable source code that can be compiled to assembly language or pseudocode.
Different computer languages require different parsing techniques, although it is not uncommon to use a parsing technique that is more powerful than that actually required.
It is common for bottom-up parsers to take the form of general parsing engines, that can either parse or generate a parser for a specific programming language given a specification of its grammar. Perhaps the most well known generalized parser generator is YACC.
[edit] Type of Bottom-up Parsers
The common classes of bottom-up parsing are:
- LR parser
- LR(0) - No lookahead symbol
- SLR(1) - Simple with one lookahead symbol
- LALR(1) - Lookahead bottom up, not as powerful as full LR(1) but simpler to implement. YACC uses this language.
- LR(1) - Most general language, but most complex to implement.
- LR(n) - where n is an integer>=0 indicates an LR parser with n lookahead symbols; while languages can be designed that require more than 1 lookahead practical languages try to avoid this because increasing n can threoretically require exponentially more code and data space (in practice, this may not be as bad).
- Precedence parser
- Simple precedence parser
- Operator-precedence parser
- Extended precedence parser
The Parser performs one of two actions (beside accept). These are "Shift" and "Reduce".
- Shift means moving a symbol from the input to the stack
- Reduce means matching a set of symbols in the stack for a more general symbol
For example see figure 1.
[edit] An Example of shift-reduce parsing
Figure 1.
Take the language: S --> AB A --> a B --> b And the input: ab Then the bottom up parsing is: Stack Input ----------+ +-+-+------ | |a|b| ----------+ +-+-+------ Shift a --------+-+ +-+-------- |a| |b| --------+-+ +-+-------- Reduce a (A --> a) --------+-+ +-+-------- |A| |b| --------+-+ +-+-------- Shift b ------+-+-+ +---------- |A|b| | ------+-+-+ +---------- Reduce b (B --> b) ------+-+-+ +---------- |A|B| | ------+-+-+ +---------- Reduce AB (S --> AB) --------+-+ +---------- |S| | --------+-+ +---------- Accept
[edit] External links
- [1] An example of shift-reduce parsing (which is a type of bottom up parsing), with a small grammar, state diagram, and c language code to implement the parser
- [2] course notes on shift reduce parsing
- [3] A good non-technical tutorial in the context of natural (human) languages
- [4] A discussion of shift-reduce conflicts in bottom up parsers. A knowledgeable but technical article.