Scannerless parsing

In computer science, scannerless parsing (also called lexerless parsing) refers to the use of a single formalism to express both the lexical and context-free syntax used to parse a language.
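To make the definition concrete, here is a minimal sketch of a scannerless recursive-descent parser for a small arithmetic grammar. The grammar, the `Parser` class, and the `evaluate` function are hypothetical illustrations, not taken from any particular tool. Whitespace and digit sequences, which a conventional design would hand to a separate lexer, appear here as ordinary grammar rules that read characters directly.

```python
# Minimal scannerless recursive-descent parser (illustrative sketch).
# Hypothetical grammar, written directly over characters:
#   input  := ws expr
#   expr   := term (("+" | "-") ws term)*
#   term   := factor ("*" ws factor)*
#   factor := (number | "(" ws expr ")") ws
#   number := digit+
#   ws     := (" " | "\t" | "\n")*
# The lexical rules (number, ws) live in the same grammar; there is no tokenizer.


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        # Current character, or "" at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def ws(self) -> None:
        # Lexical rule: skip any run of whitespace characters.
        while self.peek() in (" ", "\t", "\n"):
            self.pos += 1

    def number(self) -> int:
        # Lexical rule: one or more ASCII digits, consumed greedily.
        start = self.pos
        while self.peek() != "" and self.peek() in "0123456789":
            self.pos += 1
        if start == self.pos:
            raise ParseError(f"expected digit at position {self.pos}")
        return int(self.text[start:self.pos])

    def factor(self) -> int:
        # A parenthesized expression or a number, plus trailing whitespace.
        if self.peek() == "(":
            self.pos += 1
            self.ws()
            value = self.expr()
            if self.peek() != ")":
                raise ParseError(f"expected ')' at position {self.pos}")
            self.pos += 1
        else:
            value = self.number()
        self.ws()
        return value

    def term(self) -> int:
        # Left-associative multiplication.
        value = self.factor()
        while self.peek() == "*":
            self.pos += 1
            self.ws()
            value *= self.factor()
        return value

    def expr(self) -> int:
        # Left-associative addition and subtraction.
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            self.ws()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def parse(self) -> int:
        self.ws()
        value = self.expr()
        if self.pos != len(self.text):
            raise ParseError(f"unexpected {self.peek()!r} at position {self.pos}")
        return value


def evaluate(text: str) -> int:
    return Parser(text).parse()
```

For example, `evaluate(" (1+2) * 3 ")` returns 9. `evaluate("2 3")` raises `ParseError`, because the grammar itself decides where whitespace may appear, not a separate tokenizer.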

This parsing strategy is suitable when a clear lexer–parser distinction is unneeded. Examples of languages for which this is appropriate include TeX, most wiki grammars, makefiles, and simple per-application control languages.


Advantages

Disadvantages

Required extensions

When parsed at the character level, most popular programming languages are no longer strictly context-free. Visser identified five key extensions to classical context-free syntax that handle almost all common non-context-free constructs arising in practice.
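The following sketch shows why a character-level grammar needs such extensions. It uses the familiar case of keywords versus identifiers. The plain rule `ident := letter+` matches every letter prefix of `ifx`, so `ifx` could also be read as the keyword `if` followed by `x`. The sketch emulates two restrictions of the kind used in scannerless grammar formalisms. The first is a follow restriction, which forces the longest match. The second is a reject rule, which excludes reserved words. The keyword set and all function names are hypothetical, and the code is plain Python rather than any specific formalism's notation.

```python
from typing import List, Optional, Tuple

# Hypothetical set of reserved words for illustration.
KEYWORDS = frozenset({"if", "then", "else"})


def char_at(text: str, i: int) -> str:
    # Character at position i, or "" past the end of input.
    return text[i] if i < len(text) else ""


def is_letter(ch: str) -> bool:
    return ch != "" and "a" <= ch <= "z"


def unrestricted_matches(text: str, pos: int) -> List[str]:
    """Every prefix that the plain rule  ident := letter+  could match at pos."""
    matches = []
    end = pos
    while is_letter(char_at(text, end)):
        end += 1
        matches.append(text[pos:end])
    return matches


def restricted_ident(text: str, pos: int) -> Optional[Tuple[str, int]]:
    """ident := letter+ with two added restrictions:
    - follow restriction: the match may not be followed by another letter,
      which forces the longest match;
    - reject rule: the match may not be a reserved keyword.
    Returns (identifier, end position), or None if no identifier starts here.
    """
    for candidate in unrestricted_matches(text, pos):
        end = pos + len(candidate)
        if is_letter(char_at(text, end)):
            continue  # follow restriction violated: a letter comes next
        if candidate in KEYWORDS:
            return None  # rejected: keywords are not identifiers
        return candidate, end
    return None
```

Here the unrestricted rule admits `"i"`, `"if"`, and `"ifx"` at the start of `"ifx = 1"`, while `restricted_ident("ifx = 1", 0)` yields only `("ifx", 3)`. `restricted_ident("if x", 0)` returns `None`, leaving `if` to be parsed as a keyword.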

Implementations

Notes

Further reading

Visser, E. (1997b). Scannerless generalized-LR parsing. Technical Report P9707, Programming Research Group, University of Amsterdam.