LRGen
From Wikipedia, the free encyclopedia
LRGen | |
---|---|
Developed by | Paul B Mann |
Latest release | 8.3.021 |
OS | Microsoft Windows |
Genre | Parser generator |
License | BSD |
Website | LRGEN.COM |
LRGen is a parser generator and lexer generator that was designed to produce high-performance language processors through the generation of finite-state machines, optimized for size and speed.
Contents |
[edit] Features
LRGen generates portable LR parsers and LR lexers as source-code files for use with any C/C++ compiler on any operating system. It can also generate code in other programming languages. From 1985 to 2005, the product was called LALR. It has the following distinguishing features:
[edit] Four-matrix parser tables
Four-matrix parser tables (4MPT) make use of shift-reduce operations and allow chain-reduction elimination, not possible with YACC-style parser tables. 4MPT's achieve a speed of 2 to 4 times the speed of YACC-generated parsers, according to benchmarks done by Parsetec in 1989. The ACM paper Very Fast LR Parsing [1] claimed, in 1986, that YACC generates the fastest LR table-driven parsers. For an in depth discussion of the theory that lead to the invention of 4MPT, see Optimization Of Parser Tables For Portable Compilers [2].
[edit] LR(1) grammars
LRGen reads LR(1) grammars and creates minimal LR(1) parsers which are approximately the same size as LALR parsers. It uses a state-merging algorithm, similar to Pager's Lane Tracing Algorithm [3], which preserves the LR(1) class of languages and avoids the huge size of canonical LR(1) parser tables. Note: Some LR(1) grammars may require slight modifications to be acceptable.
[edit] TBNF grammar notation
LRGen reads TBNF grammar notation for creating LR parsers. TBNF is a modern, powerful EBNF notation for defining LR language translators. TBNF permits easy handling of context sensitive issues (like the typedef problem in the C language) with special semantic symbols. TBNF permits automatic abstract-syntax tree (AST) construction through the use of several AST-creation operators. TBNF permits automatic intermediate-code generation through the use of tree-traversal notation and output strings. See the paper A Translational BNF Grammar Notation [4] published in SIGPLAN Notices, April 2006.
[edit] LBNF grammar notation
LRGen reads LBNF grammar notation for creating LR lexers. LBNF is a complete BNF grammar notation which allows more flexibility and readability in the definition of lexical analyzers. Most other systems use regular expressions which can be limiting and difficult to read. LBNF can easily define such things as nested comments.
[edit] Parser skeletons
LRGen uses a parser skeleton concept to generate source code in any programming language. While other systems may generate binary files usable by any programming language, they do not allow you the advantage of having the parser tables in source code for any programming language. Source-code tables offer higher-performance by the use of function pointers, not possible with binary tables. Users have created parsers in C, C++, Pascal, IBM Assembly Language, C# (sharp), and others.
[edit] Performance issues
If speed is a priority, fast DFA lexers can be generated which make use of a one-matrix lexer table (1MLT) structure. Tests show the DFA lexers to run 5.5 times the speed of optimized LR lexers.
LRGen builds parsers quickly. Grammars with 2,000 rules can be processed in about 1 second, or 2.5 seconds with all optimizations turned on. This is partly due to the use of DeRemer and Pennello's DiGraph Algorithm [5].
[edit] Operation
[edit] References
- ^ Very Fast LR Parsing
- ^ Optimization Of Parser Tables For Portable Compilers
- ^ The Lane Tracing Algorithm For Constructing LR(k) Parsers
- ^ A Translational BNF Grammar Notation (TBNF)
- ^ Efficient Computation Of LALR(1) Look-Aheads Sets