Talk:Scannerless parsing
Added a disadvantages section and edited the advantages. This article needs balancing: scannerless parsing is a technique that makes sense in limited circumstances, usually when the language being parsed is very simple. Remember, there is a reason the lexer/parser distinction was made in the first place.
In particular:
- lexer/parser distinction not necessary: actually, yes it is, depending on what your needs are. As I said above, there was a very good reason it was developed; combining the two functions (scanning and parsing) becomes messy with more complex languages, and harder to understand, develop, and maintain. Moved this into the introduction and changed the wording to explain when this technique is appropriate (as opposed to implying it is a universal truth).
- no keywordification: keywords are often included as a feature, and having a separate lexer and parser doesn't mean you have to have keywords; scannerless parsing can do without them simply because it has fewer of the design constraints that make keywords attractive to implement in the first place. Also, many people would rightly consider keywords a feature, not a requirement; look up the early Fortran days to get an inkling why. As such, I moved this information into the "token classification not required" advantage.
12.165.250.13 (talk) 20:54, 28 May 2008 (UTC)
I've been observing this article for a while, and I've been dismayed at how poor the article still is. It contains a number of factual mistakes and does not really explain anything. I'm reluctant to improve the article myself though, because I'm one of the researchers publishing on the merits of scannerless parsing. I have a few problems with the article:
- There is no decent explanation of the scanning/parsing process. This article should explain how, in the traditional scanner/parser division, a scanner splits a character stream into tokens, and how the parser consumes those tokens.
- The article does not give any actual examples of cases where scannerless parsing is useful. The current list of applications is not correct. In fact, scannerless parsing is mostly useful for languages with a complex, context-sensitive lexical syntax. Typically, these are languages that involve a mixture of different sublanguages. We've published a series of papers on this: "Concrete Syntax for Objects" (OOPSLA'04) and "Declarative, Formal, and Extensible Syntax Definition for AspectJ - A Case for Scannerless Generalized-LR Parsing". The second paper in particular illustrates how the traditional scanner/parser separation breaks down on languages with a complex context-sensitive lexical syntax.
- The 'required extensions' section is largely focused on language extensions in SDF/SGLR. Some of these extensions are not related to scannerless parsing at all, in particular preference attributes (more an aspect of GLR) and per-production transitions (related to the priorities mechanism, which is unrelated to scannerless parsing).
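To make the first point concrete, here is a minimal sketch (in Python, with a toy arithmetic grammar and made-up names) of the traditional two-phase design that the article should explain: a scanner turns the character stream into tokens, and a separate parser consumes those tokens.

```python
# Hypothetical sketch of the traditional scanner/parser division.
# The grammar and token names are illustrative, not from any real tool.
import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),
]

def scan(text):
    """Scanner: split the character stream into (kind, value) tokens."""
    pos = 0
    while pos < len(text):
        for kind, pattern in TOKEN_SPEC:
            m = re.match(pattern, text[pos:])
            if m:
                if kind != "SKIP":  # whitespace is discarded by the scanner
                    yield (kind, m.group())
                pos += m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r}")

def parse(tokens):
    """Parser: consume tokens for expr := term ('+' term)*, term := NUMBER ('*' NUMBER)*."""
    toks = list(tokens) + [("EOF", "")]
    i = 0
    def term():
        nonlocal i
        value = int(toks[i][1]); i += 1
        while toks[i][0] == "TIMES":
            i += 1
            value *= int(toks[i][1]); i += 1
        return value
    value = term()
    while toks[i][0] == "PLUS":
        i += 1
        value += term()
    return value

print(parse(scan("2 + 3 * 4")))  # 14
```

Note that the parser never sees individual characters, only tokens; that separation is exactly what scannerless parsing removes.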
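And to illustrate where that division breaks down, here is a hypothetical sketch of the context-sensitivity problem: the same characters must be tokenized differently depending on syntactic context, so a context-free scanner cannot decide alone. This is loosely modeled on the AspectJ situation (where, e.g., "get" is a pointcut keyword but an ordinary Java identifier); the table and function names are made up for illustration.

```python
# Hypothetical illustration: "get" is an identifier in Java context but a
# keyword inside a pointcut, so token classification depends on context.
CONTEXT_KEYWORDS = {
    "java":     set(),           # "get" is just an identifier here
    "pointcut": {"get", "set"},  # but a keyword inside a pointcut
}

def classify(word, context):
    """A standalone scanner lacks this context; it would need parser feedback."""
    if word in CONTEXT_KEYWORDS[context]:
        return ("KEYWORD", word)
    return ("IDENT", word)

print(classify("get", "java"))      # ('IDENT', 'get')
print(classify("get", "pointcut"))  # ('KEYWORD', 'get')
```

A scannerless parser sidesteps this because lexical decisions are made by the same grammar that tracks the syntactic context.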
Martin Bravenboer (talk) 15:11, 30 May 2008 (UTC)