The lexer hack
From Wikipedia, the free encyclopedia
In the parsing of computer programming languages, "the lexer hack" (as opposed to "a lexer hack") is the common name for a widely used solution to the problem that arises when a lexer based on a regular grammar must classify identifier tokens in ANSI C as either variable names or type names. Whether an identifier names a type depends on the typedef declarations in scope, which is context that a regular grammar cannot track.
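The ambiguity can be seen in a small C fragment. The sketch below is illustrative only; the names T, x, target, declares_pointer and multiplies are chosen for this example and do not come from any particular compiler or test suite. The token sequence "T * x" appears in both functions, but it is a declaration in one and an expression in the other, depending on what T means at that point.

```c
typedef int T;              /* at file scope, T names a type */

/* Here T is a typedef name, so "T * x" starts a declaration:
   x is declared as a pointer to int. */
int declares_pointer(void)
{
    int target = 7;
    T * x = &target;
    return *x;
}

/* Here an inner declaration hides the typedef, so T is an ordinary
   variable and the same tokens "T * x" form a multiplication. */
int multiplies(void)
{
    int T = 6, x = 7;
    return T * x;
}
```

A lexer that looks only at the characters of T has no way to tell these two cases apart; the difference lies entirely in the declarations seen earlier.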
Solutions
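The solution generally consists of feeding information from the parser's symbol table back into the lexer: as the parser processes declarations, it records which identifiers name types, and the lexer consults that record to decide which kind of token to emit. This tight coupling of the lexer and parser is generally regarded as inelegant, which is why it is called a "hack".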
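The feedback from parser to lexer can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by any real compiler: the token kinds, the fixed-size table, and the function names declare_typedef and classify_identifier are all hypothetical, and scoping (which a real C front end must handle) is left out.

```c
#include <string.h>

/* Hypothetical token kinds; a real C front end has many more. */
enum token_kind { TOK_IDENTIFIER, TOK_TYPE_NAME };

/* Toy symbol table: a flat list of typedef names, with no scoping. */
#define MAX_TYPEDEFS 64
#define MAX_NAME_LEN 32

static char typedef_names[MAX_TYPEDEFS][MAX_NAME_LEN];
static int typedef_count = 0;

/* Parser side: called after a declaration such as "typedef int T;"
   has been parsed. Returns 0 on success, or -1 if the table is full
   or the name does not fit. */
int declare_typedef(const char *name)
{
    if (typedef_count >= MAX_TYPEDEFS || strlen(name) >= MAX_NAME_LEN)
        return -1;
    strcpy(typedef_names[typedef_count++], name);
    return 0;
}

/* Lexer side: the feedback step. After scanning an identifier, the
   lexer looks it up in the parser's table to choose the token kind. */
enum token_kind classify_identifier(const char *name)
{
    int i;
    for (i = 0; i < typedef_count; i++) {
        if (strcmp(typedef_names[i], name) == 0)
            return TOK_TYPE_NAME;
    }
    return TOK_IDENTIFIER;
}
```

In this sketch the lexer's answer for a given name changes once the parser has called declare_typedef for it, so the lexer's output depends on parser state rather than on the input characters alone.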
This problem does not arise, and hence no "hack" is needed, when lexerless parsing techniques are used.
Citations
- http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elkhound/index.html
- http://cs.nyu.edu/rgrimm/papers/pldi06.pdf
- http://cens.ioc.ee/local/man/CompaqCompilers/ladebug/ladebug-manual-details.html
- http://www.springerlink.com/index/YN4GQ2YMNQUY693L.pdf
- http://news.gmane.org/find-root.php?group=gmane.comp.lang.groovy.jsr&article=843&type=blog
- http://groups.google.com/group/comp.compilers/browse_frm/thread/db7f68e9d8b49002/fa20bf5de9c73472?lnk=st&q=%2B%22the+lexer+hack%22&rnum=1&hl=en#fa20bf5de9c73472