
Dick Hadsell wrote: <snip>
> disappointed, because I was hoping to use Spirit or something like it, to give me some independence from Lex/Yacc's dictatorial control of the input source.
> Your project sounds like it would solve the worst of the problems I have in trying to move to Spirit.
The current API we are working with templatises the input so that, in theory, it will work with any character-like input, in much the same way as std::basic_string<>. We are still working on getting the DFA to work generically, rather than just explicitly with char and wchar_t, but I think we should have some success. At present the lexer is strongly typed on this character type, in the same way as std::basic_string<>, but I don't necessarily see that as a problem.
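To make the shape of that concrete, here is a rough sketch of the sort of interface I mean - the names and the trivial whitespace tokeniser are purely illustrative, not our actual API; the real implementation would drive a DFA instead:

    #include <string>
    #include <vector>

    // Sketch only - illustrative names, not the real API. The lexer is
    // parameterised on the character type, like std::basic_string<>.
    template <typename CharT, typename Traits = std::char_traits<CharT> >
    class basic_lexer
    {
    public:
        typedef std::basic_string<CharT, Traits> string_type;

        struct token
        {
            int         id;     // token identifier
            string_type text;   // matched text
        };

        // Tokenise [first, last). A real implementation would run a DFA
        // here; this trivial whitespace split just shows the shape.
        template <typename ForwardIterator>
        std::vector<token> tokenise(ForwardIterator first,
                                    ForwardIterator last) const
        {
            std::vector<token> result;
            while (first != last)
            {
                if (*first == CharT(' ')) { ++first; continue; }
                token t;
                t.id = 0;
                while (first != last && *first != CharT(' '))
                    t.text += *first++;
                result.push_back(t);
            }
            return result;
        }
    };

    typedef basic_lexer<char>    lexer;    // narrow-character lexer
    typedef basic_lexer<wchar_t> wlexer;   // wide-character lexer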
> I broke up the problem into 3 steps. In the first phase the program uses a Spirit grammar to generate a list of tokens with info similar to <snip>
Depending on the type of grammar, I think you should easily achieve a 6x or greater performance boost. If the input has long repetitive sections, you could probably optimise this stage further so that the lexer does most of the work - for example, if you have long lists of numbers or similar.

I'm not sure how well this would work with Spirit until I try it, but it should be possible to switch control part way through a parse to a very quick and efficient parser that just throws the tokens at a visitor until a particular section is finished. This could probably be done by writing a new parser type in Spirit. The idea would be to process long lists of numbers, strings, or similar, where those lists have a clearly defined end token.

Dave Handley
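P.S. Something like the loop below is what I have in mind for the token-throwing stage - purely a sketch, and the token struct and visitor interface are invented for illustration (they are not part of Spirit or of our lexer):

    // Sketch only. Drains tokens straight into a visitor until a
    // designated end token is seen, bypassing the full grammar for
    // long homogeneous sections such as lists of numbers.
    template <typename TokenIterator, typename Visitor>
    TokenIterator drain_until(TokenIterator first, TokenIterator last,
                              int end_token_id, Visitor& visit)
    {
        while (first != last && first->id != end_token_id)
        {
            visit(*first);  // hand each token to the visitor unparsed
            ++first;
        }
        return first;       // positioned at the end token (or at last)
    }

For a long list of numbers, the visitor might do nothing more than convert each token's text and push it into a vector, which is where the saving over full grammar-driven parsing would come from.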