Interest in a fast lexical analyser compatible with Spirit

A colleague and I are working on a fast lexical analyser designed to work alongside Spirit. We currently have some working prototypes, and as such I thought that I would gauge interest in this library. We plan to produce a DFA based lexical analyser that provides output as a set of iterable polymorphic flyweighted tokens. These could then be provided as input to Spirit instead of character iterators (albeit with the addition of a token_p parser in Spirit).

The objectives of our project are as follows:

1) As fast as lex/flex.
2) Simple to use.
3) Rules to generate tokens can be provided both statically and dynamically. Static definition would be through an offline pre-process stage much like lex.
4) Easy to interface with Spirit.

I understand from Joel's posts below (both on this group and on CodeProject) that he is already planning some work to this end. My understanding of expression template based lexers is that they can outperform DFAs under some circumstances, but in other situations DFAs can provide the best performance. I am therefore recommending this in addition to Joel and Eric's work rather than instead of it.

Finally, as a taster of the work we are doing, I have performed some rather unscientific and arbitrary performance tests on a 20Mb VRML 1.0 file. The performance of flex, Spirit, and our new library are as follows:

flex: < 1 second
new library: about 1 second
Spirit: 40-50 seconds

These results were produced with no semantic actions to slow down any of the 3 solutions. They are also not conclusive, since we are almost an order of magnitude slower on an Athlon 64 bit machine at present.

Dave Handley
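As a rough illustration of the proposal, a token stream handed to a parser in place of character iterators might look something like the sketch below. All of the names here (token, token_id, make_token) are hypothetical and are not the interface of the library being described, and the token_p parser mentioned above does not yet exist in Spirit.

    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical token identifiers (illustrative only).
    enum token_id { tok_keyword, tok_identifier, tok_number };

    // A lightweight token: an id plus a pointer/length into the original
    // buffer, so the lexeme text itself is never copied.
    struct token
    {
        token_id    id;
        const char* begin;   // first character of the lexeme
        std::size_t length;  // length of the lexeme

        std::string text() const { return std::string(begin, length); }
    };

    token make_token(token_id id, const char* begin, std::size_t length)
    {
        token t = { id, begin, length };
        return t;
    }

    int main()
    {
        // The DFA lexer would fill a container like this; a parser would
        // then walk token iterators instead of character iterators.
        const char input[] = "Separator { x 1.5 }";
        std::vector<token> tokens;
        tokens.push_back(make_token(tok_identifier, input + 0,  9)); // "Separator"
        tokens.push_back(make_token(tok_keyword,    input + 10, 1)); // "{"
        tokens.push_back(make_token(tok_identifier, input + 12, 1)); // "x"
        tokens.push_back(make_token(tok_number,     input + 14, 3)); // "1.5"
        tokens.push_back(make_token(tok_keyword,    input + 18, 1)); // "}"

        for (std::vector<token>::const_iterator it = tokens.begin();
             it != tokens.end(); ++it)
        {
            std::cout << it->id << ": " << it->text() << '\n';
        }
        return 0;
    }

Because each token stores only an id and a pointer/length into the original buffer, tokens stay cheap to copy, which is the point of the flyweighted design.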
Relevant posts:

Joel de Guzman wrote:

I guess if you use string literals you'll have this problem no matter what, though I'm not sure the use of string literals should be a foregone conclusion. If you consider parser generators like YACC, they cleanly separate lexical analysis from parsing and the parser itself is never working with string literals.
Again, true. However, one of the original intents of Spirit is to be usable in micro parsing tasks. When parsing small grammars like, say, a complex number, you do not want to write a separate front-end lexer.
Anyway, I do plan to write an optional but separate lexer sometime.
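The micro-parsing case Joel refers to is the kind of thing Spirit can already do inline, with no separate lexer; the complex-number parser from the Spirit documentation is the canonical example. A minimal sketch, assuming the 2004-era header path (newer Boost releases ship Spirit classic as <boost/spirit/include/classic_core.hpp> in namespace boost::spirit::classic):

    #include <boost/spirit/core.hpp>  // Spirit "classic"; see note above on
                                      // header paths in newer Boost releases
    #include <complex>
    #include <iostream>

    // Tiny semantic action: store the parsed number into a bound double.
    struct set_double
    {
        double& target;
        explicit set_double(double& t) : target(t) {}
        void operator()(double v) const { target = v; }
    };

    // Parse "r", "(r)" or "(r, i)" in place -- no separate lexer involved.
    bool parse_complex(char const* str, std::complex<double>& c)
    {
        using namespace boost::spirit;
        double r = 0.0, i = 0.0;
        bool ok = parse(str,
                (
                    '(' >> real_p[set_double(r)]
                        >> !(',' >> real_p[set_double(i)])
                        >> ')'
                )
                | real_p[set_double(r)],
                space_p).full;
        if (ok)
            c = std::complex<double>(r, i);
        return ok;
    }

    int main()
    {
        std::complex<double> c;
        if (parse_complex("(1.5, -2.5)", c))
            std::cout << c << '\n';   // prints (1.5,-2.5)
        return 0;
    }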
Joel de Guzman wrote:
As for the lexer, I am working with Eric Niebler, author of Greta and xpressive (google "regular expression c++"), with the hopes of developing a powerful expression-template-based lexer front end.

----- Original Message -----
From: "Dave Handley" <dave@dah.me.uk>
To: <boost@lists.boost.org>
Sent: Monday, December 27, 2004 9:18 AM
Subject: [boost] Interest in a fast lexical analyser compatible with Spirit
Finally, as a taster of the work we are doing, I have performed some rather unscientific and arbitrary performance tests on a 20Mb VRML 1.0 file. The performance of flex, Spirit, and our new library are as follows:
flex: < 1 second
new library: about 1 second
Spirit: 40-50 seconds
Would you please post the grammar productions you used for Spirit? Would you consider testing the yard parser as well (http://yard-parser.sf.net)? Are there other advantages of your tool over Flex, other than providing a nice interface to Spirit?

Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com

Dave Handley wrote:
A colleague and I are working on a fast lexical analyser designed to work alongside Spirit. We currently have some working prototypes, and as such I thought that I would gauge interest in this library. We plan to produce a DFA based lexical analyser that provides output as a set of iterable polymorphic flyweighted tokens. These could then be provided as input to Spirit instead of character iterators (albeit with the addition of a token_p parser in Spirit).
I'm definitely interested to have a look at your library. Besides my general interest in Spirit, I'd like to try it out as an alternative lexing component for Wave. I'm pretty sure this should be interesting for you as well, because two different lexers are already implemented there, which gives a good opportunity to compare them in a real environment.
The objectives of our project are as follows:
1) As fast as lex/flex.
2) Simple to use.
3) Rules to generate tokens can be provided both statically and dynamically. Static definition would be through an offline pre-process stage much like lex.
4) Easy to interface with Spirit.
As for a dynamic DFA based lexer, Wave already uses the Spirit based SLEX, but a static DFA based solution is very interesting to look at. Is the DFA generated at compile time?
Finally, as a taster of the work we are doing, I have performed some rather unscientific and arbitrary performance tests on a 20Mb VRML 1.0 file. The performance of flex, Spirit, and our new library are as follows:
flex: < 1 second
new library: about 1 second
Spirit: 40-50 seconds
We definitely should try the new upcoming Spirit-2 code base as well, since it should be a lot faster than the current version. Is it possible to have a look at your test code as well? This way we could try to make a comparison as soon as the Spirit-2 code base evolves.
These results were produced with no semantic actions to slow down any of the 3 solutions. They are also not conclusive, since we are almost an order of magnitude slower on an Athlon 64 bit machine at present.
Regards,
Hartmut

Dave Handley wrote:
A colleague and I are working on a fast lexical analyser designed to work alongside Spirit. We currently have some working prototypes, and as such I thought that I would gauge interest in this library. We plan to produce a DFA based lexical analyser that provides output as a set of iterable polymorphic flyweighted tokens. These could then be provided as input to Spirit instead of character iterators (albeit with the addition of a token_p parser in Spirit). The objectives of our project are as follows:
1) As fast as lex/flex.
2) Simple to use.
3) Rules to generate tokens can be provided both statically and dynamically. Static definition would be through an offline pre-process stage much like lex.
4) Easy to interface with Spirit.
I am looking for exactly what you propose. I have been trying to replace my Lex/Yacc parser with Spirit but have run into performance as the major problem.

I broke up the problem into 3 steps. In the first phase the program uses a Spirit grammar to generate a list of tokens with info similar to the info generated by Lex. In phase 2 another Spirit grammar generates a parse tree from the tokens, with rules equivalent to the Yacc rules. Phase 3 evaluates the parse tree, with results equivalent to the Yacc actions.

After getting the bugs out, I compared the performance and found that the total time for processing a test case of input scripts took 6 times as long using this scheme. Since some of the time is spent in running the virtual machine that follows the parsing steps, the actual Spirit performance is much worse. I found that much of the slowdown was due to Phase 1, the grammar that scans the text input a la Lex. After a lot of tweaking I was able to bring down the performance hit to a bit over 3 times worse than what I am trying to replace. However, this is still unacceptable.

I am disappointed, because I was hoping to use Spirit, or something like it, to give me some independence from Lex/Yacc's dictatorial control of the input source. Your project sounds like it would solve the worst of the problems I have in trying to move to Spirit.

--
Dick Hadsell
914-259-6320  Fax: 914-259-6499
Reply-to: hadsell@blueskystudios.com
Blue Sky Studios  http://www.blueskystudios.com
44 South Broadway, White Plains, NY 10601
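The three phases Dick describes have roughly the following shape. Everything below is a deliberately trivial, hypothetical skeleton to show the structure of such a pipeline; it is not his actual code, and in his setup phases 1 and 2 are Spirit grammars rather than hand-written loops.

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    struct token { std::string text; };               // phase 1 output (lex-like)
    struct tree  { std::vector<std::string> items; }; // phase 2 output (parse tree)

    // Phase 1: scan the raw text into tokens. In the setup described above
    // this phase is a Spirit grammar, and it is where most of the slowdown was.
    std::vector<token> tokenize(const std::string& input)
    {
        std::vector<token> out;
        std::istringstream in(input);
        std::string word;
        while (in >> word) { token t; t.text = word; out.push_back(t); }
        return out;
    }

    // Phase 2: build a parse tree from the tokens (Yacc-like rules).
    tree build_tree(const std::vector<token>& tokens)
    {
        tree root;
        for (std::size_t i = 0; i < tokens.size(); ++i)
            root.items.push_back(tokens[i].text);
        return root;
    }

    // Phase 3: walk the tree and perform the actions (the "virtual machine").
    void evaluate(const tree& root)
    {
        for (std::size_t i = 0; i < root.items.size(); ++i)
            std::cout << root.items[i] << '\n';
    }

    int main()
    {
        evaluate(build_tree(tokenize("a b c")));
        return 0;
    }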

A colleague and I are working on a fast lexical analyser designed to work alongside Spirit. We currently have some working prototypes, and
There currently seems to be some interest in alternatives/improvements to Spirit, so I thought I'd mention Hapy: http://hapy.sourceforge.net/

I've used it on a couple of projects and found it as easy to use as Spirit but much faster to compile. I don't know about run-time speed comparisons: in all my Spirit/Hapy projects the use of the data takes much more time than the parsing of it.

Darren

Darren Cook wrote:
A colleague and I are working on a fast lexical analyser designed to work alongside Spirit. We currently have some working prototypes, and
There currently seems to be some interest in alternatives/improvements to Spirit, so I thought I'd mention Hapy: http://hapy.sourceforge.net/
"The Hapy library would not exist if Spirit would generate correct parsers by default..." Whoa. Them's fightin' words! -- Dave Abrahams Boost Consulting http://www.boost-consulting.com
participants (6)

- christopher diggins
- Darren Cook
- Dave Handley
- David Abrahams
- Hartmut Kaiser
- Richard Hadsell