
Dick Hadsell wrote: <snip>
> disappointed, because I was hoping to use Spirit or something like it, to give me some independence from Lex/Yacc's dictatorial control of the input source.
> Your project sounds like it would solve the worst of the problems I have in trying to move to Spirit.
The current API we are working with templatises the input so that, in theory, it will work with any character-like input, in much the same way as std::basic_string<>. We are still working on getting the DFA to work generically, rather than just explicitly with char and wchar_t, but I think we should have some success. At present the lexer is strongly typed on this character type, in the same way as std::basic_string<>, but I don't necessarily see that as a problem.
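To make the shape of that concrete, here is a rough sketch of the sort of interface I mean - the names and the trivial whitespace tokeniser are purely illustrative, not our actual API; the real implementation would drive a DFA instead:

    #include <string>
    #include <vector>

    // Sketch only - illustrative names, not the real API. The lexer is
    // parameterised on the character type, like std::basic_string<>.
    template <typename CharT, typename Traits = std::char_traits<CharT> >
    class basic_lexer
    {
    public:
        typedef std::basic_string<CharT, Traits> string_type;

        struct token
        {
            int         id;     // token identifier
            string_type text;   // matched text
        };

        // Tokenise [first, last). A real implementation would run a DFA
        // here; this trivial whitespace split just shows the shape.
        template <typename ForwardIterator>
        std::vector<token> tokenise(ForwardIterator first,
                                    ForwardIterator last) const
        {
            std::vector<token> result;
            while (first != last)
            {
                if (*first == CharT(' ')) { ++first; continue; }
                token t;
                t.id = 0;
                while (first != last && *first != CharT(' '))
                    t.text += *first++;
                result.push_back(t);
            }
            return result;
        }
    };

    typedef basic_lexer<char>    lexer;    // narrow-character lexer
    typedef basic_lexer<wchar_t> wlexer;   // wide-character lexer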
> I broke up the problem into 3 steps. In the first phase the program uses a Spirit grammar to generate a list of tokens with info similar to <snip>
Depending on the type of grammar, I think you should easily achieve a 6x or greater performance boost. If the input has long repetitive sections, you could probably optimise this stage further so that the lexer does most of the work - for example, if you have long lists of numbers or similar.

I'm not sure how well this would work with Spirit until I try it, but it should be possible to switch control part way through a parse to a very quick and efficient parser that just throws the tokens at a visitor until a particular section is finished. This could probably be done by writing a new parser type in Spirit. The idea would be to process long lists of numbers, strings, or similar, where those lists have a clearly defined end token.

Dave Handley
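P.S. Something like the loop below is what I have in mind for the token-throwing stage - purely a sketch, and the token struct and visitor interface are invented for illustration (they are not part of Spirit or of our lexer):

    // Sketch only. Drains tokens straight into a visitor until a
    // designated end token is seen, bypassing the full grammar for
    // long homogeneous sections such as lists of numbers.
    template <typename TokenIterator, typename Visitor>
    TokenIterator drain_until(TokenIterator first, TokenIterator last,
                              int end_token_id, Visitor& visit)
    {
        while (first != last && first->id != end_token_id)
        {
            visit(*first);  // hand each token to the visitor unparsed
            ++first;
        }
        return first;       // positioned at the end token (or at last)
    }

For a long list of numbers, the visitor might do nothing more than convert each token's text and push it into a vector, which is where the saving over full grammar-driven parsing would come from.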