Top-down parsing
From Wikipedia, the free encyclopedia
Top-down parsing is a strategy of analyzing unknown data relationships by hypothesizing general parse tree structures and then considering whether the known fundamental structures are compatible with the hypothesis. It occurs in the analysis of both natural languages and computer languages.
- See also: bottom-up parsing
[edit] Programming language application
A compiler parses input from a programming language to assembly language or an internal representation by matching the incoming symbols to Backus-Naur form production rules. An LL parser, also called a top-bottom parser or top-down parser, applies each production rule to the incoming symbols by working from the left-most symbol yielded on a production rule and then proceeding to the next production rule for each non-terminal symbol encountered. In this way the parsing starts on the Left of the result side (right side) of the production rule and evaluates non-terminals from the Left first and, thus, proceeds down the parse tree for each new non-terminal before continuing to the next symbol for a production rule.
For example:
- A → aBC
- B → c | cd
- C → df | eg
would match A → aBC and attempt to match B → c | cd next. Then C → df | eg would be tried. As one may expect, some languages are more ambiguous than others. For a non-ambiguous language in which all productions for a non-terminal produce distinct strings: the string produced by one production will not start with the same symbol as the string produced by another production. A non-ambiguous language may be parsed by an LL(1) grammar where the (1) signifies the parser reads ahead one token at a time. For an ambiguous language to be parsed by an LL parser, the parser must lookahead >1 symbol, e.g. LL(3).
The common solution is to use an LR parser (also known as bottom-up or shift-reduce parser).
[edit] See also
- Recursive descent parser - another type of top down parser.