[spirit][xml] Scanner/Parser Separation

Hi,

I've given a lot of thought lately to Boost.Xml and how it could be implemented.

In the typical lex/yacc-like scanner/parser separation, XML requires a very complicated scanner: if the parser is supposed to be streaming, the scanner must replace all entity references with their replacement token stream. This is because they can appear almost anywhere, and are mostly not directly recognized by the XML grammar.

Entity reference substitution is rather complicated, as are the rules affecting it. For example, pure token stream insertion by the scanner doesn't work: the scanner must keep track of whether an entity is allowed in a certain place (they may appear only /almost/ anywhere). That depends not only on the previous tokens but also, for example, on whether the current context is that of an internal or an external DTD subset. In addition, the scanner should check several constraints on the replacement text, such as parentheses nesting in content expressions.

The original idea was to implement Boost.Xml using Spirit with the existing XML grammar. I've given up on that. I believe it is not possible, with any reasonable effort, to implement a completely compliant XML parser with the merged scanner/parser system that is natural to Spirit. The grammar would have to account for entity references in too many places, replacing character references at the scanner level would present problems with the characters representing <, > and &, and so on.

So I'm wondering, does Spirit support in any way the separation of scanner and parser? Is it possible to write a Spirit grammar specification that acts on some token type instead of characters? How much effort would that be? Has anyone done something similar previously?

Sebastian Redl
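To make the character reference problem concrete, here is a minimal sketch in plain C++ (no Spirit). Everything in it is an assumption made for illustration: the names TokenKind, Token, substitute_first and scan are hypothetical, only decimal ASCII references such as &#60; are handled, and the input is assumed to be well formed. It compares substituting references in the raw text before tokenizing with letting the scanner decode them and emit the result as character data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch; not taken from Spirit or any XML library.
enum class TokenKind { TagOpen, TagClose, Text };

struct Token {
    TokenKind kind;
    std::string text;
};

// Decode the digits of a decimal character reference ("60" -> '<').
// Assumption: ASCII code points only, to keep the sketch short.
char decode_char_ref(const std::string& digits) {
    int code = std::stoi(digits);
    assert(code > 0 && code < 128);
    return static_cast<char>(code);
}

// Naive approach: replace every "&#NN;" in the raw input before tokenizing.
std::string substitute_first(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        if (in.compare(i, 2, "&#") == 0) {
            std::size_t end = in.find(';', i);
            if (end != std::string::npos) {
                out += decode_char_ref(in.substr(i + 2, end - i - 2));
                i = end + 1;
                continue;
            }
        }
        out += in[i++];
    }
    return out;
}

// Toy scanner: '<' and '>' become delimiter tokens, everything else is Text.
// With decode_refs set, references are decoded here and appended to Text,
// so a decoded '<' can never be mistaken for markup.
std::vector<Token> scan(const std::string& in, bool decode_refs) {
    std::vector<Token> tokens;
    auto add_text = [&tokens](char c) {
        if (tokens.empty() || tokens.back().kind != TokenKind::Text)
            tokens.push_back({TokenKind::Text, ""});
        tokens.back().text += c;
    };
    for (std::size_t i = 0; i < in.size();) {
        if (decode_refs && in.compare(i, 2, "&#") == 0) {
            std::size_t end = in.find(';', i);
            if (end != std::string::npos) {
                add_text(decode_char_ref(in.substr(i + 2, end - i - 2)));
                i = end + 1;
                continue;
            }
        }
        if (in[i] == '<') tokens.push_back({TokenKind::TagOpen, "<"});
        else if (in[i] == '>') tokens.push_back({TokenKind::TagClose, ">"});
        else add_text(in[i]);
        ++i;
    }
    return tokens;
}

// Count the tokens of one kind.
std::size_t count_kind(const std::vector<Token>& tokens, TokenKind kind) {
    std::size_t n = 0;
    for (const Token& t : tokens)
        if (t.kind == kind) ++n;
    return n;
}
```

For the input <a>1 &#60; 2</a>, scanning the output of substitute_first yields three TagOpen tokens, because the decoded < now looks like markup. Calling scan on the original input with decode_refs set yields two, with the < kept inside a Text token. General entity references are harder than this, since their replacement text can itself contain markup, which is the complication described above.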

Sebastian Redl wrote:
> Hi,

Hi,

> So I'm wondering, does Spirit support in any way the separation of scanner and parser?

Yes.

> Is it possible to write a Spirit grammar specification that acts on some token type instead of characters?

Yes.

> How much effort would that be?

Not much.

> Has anyone done something similar previously?

Wave.

Regards,
--
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net
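
The reply gives no code, so as a rough illustration of a grammar that acts on a token type instead of characters, here is a minimal hand-written sketch. It deliberately does not use Spirit's API, and it is not how Wave is implemented. The Kind, Tok and TokenParser names are hypothetical, and the tokens are assumed to come from a separate scanner that has already resolved tag names and references.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type produced by a separate scanner stage.
enum class Kind { StartTag, EndTag, Text };

struct Tok {
    Kind kind;
    std::string value;  // element name for tags, character data for Text
};

// Recursive-descent parser whose input alphabet is Tok, not char.
// Grammar: element := StartTag (element | Text)* EndTag
// The parser keeps a reference to the token vector, which must outlive it.
class TokenParser {
public:
    explicit TokenParser(const std::vector<Tok>& tokens)
        : tokens_(tokens), pos_(0) {}

    // True if the whole token sequence is exactly one well-nested element.
    bool parse_document() {
        pos_ = 0;
        return parse_element() && pos_ == tokens_.size();
    }

private:
    bool parse_element() {
        if (!at(Kind::StartTag)) return false;
        const std::string name = tokens_[pos_++].value;
        // Consume content until the next EndTag at this nesting level.
        while (pos_ < tokens_.size() && !at(Kind::EndTag)) {
            if (at(Kind::Text)) ++pos_;
            else if (!parse_element()) return false;
        }
        // The closing tag must exist and match the opening name.
        if (!at(Kind::EndTag) || tokens_[pos_].value != name) return false;
        ++pos_;
        assert(pos_ <= tokens_.size());
        return true;
    }

    // True if the current token exists and has the given kind.
    bool at(Kind k) const {
        return pos_ < tokens_.size() && tokens_[pos_].kind == k;
    }

    const std::vector<Tok>& tokens_;
    std::size_t pos_;
};
```

Given the token sequence StartTag a, Text, StartTag b, EndTag b, EndTag a, parse_document() returns true. It returns false for mismatched nesting or a missing end tag. The parser never looks at individual characters, so all reference handling stays in the scanner.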