New subject: Interest in a fast lexical analyser compatible

27 Dec 2004

Hartmut Kaiser wrote:
...
I'm definitely interested to have a look at your library. Besides my general
interest in Spirit I'd like to try it out as an alternative lexing component
for Wave. I'm pretty sure this should be interesting for you as well,
because there are already two different lexers implemented, which gives a
good opportunity to compare them in a real environment.

I have to confess to not knowing much about Wave, but I would be willing to 
look in more detail at using this library for Wave.  Once we have our first 
stable version of the software we will be happy to let you have a look at it 
in more detail.  I expect this to happen within the next month or so.
...
As for a dynamic DFA based lexer, Wave already uses the Spirit based SLEX,
but a static DFA based solution is very interesting to look at. Is the DFA
generated at compile time?

By static and dynamic, I mean compile-time and run-time.  The system is 
designed to generate the DFA at run-time.  But we are currently discussing a 
method whereby the same code base could be used to generate code-stubs for 
compile-time DFA creation.  I am keen to support both.
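
To make the run-time case concrete, here is a minimal sketch of a table-driven DFA whose transition table is filled in when the program runs.  It is not lextl code: the names Kind, State, Table, build_table and next_token are all made up for illustration, and the DFA is written by hand for identifiers and unsigned integers rather than generated from regular expressions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical token kinds for this sketch; not taken from lextl.
enum class Kind { Error, Identifier, Integer };

// States of a hand-written DFA for identifiers and unsigned integers.
enum State : unsigned char { Start, InIdent, InInt, Dead, StateCount };

// One row per state, one column per possible input byte.
using Table = std::array<std::array<unsigned char, 256>, StateCount>;

// Fill the transition table at run time.  A compile-time variant would
// emit the same table as static data instead of computing it here.
Table build_table() {
    Table t;
    for (auto& row : t) row.fill(Dead);
    for (int c = 0; c < 256; ++c) {
        const bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
        const bool digit = (c >= '0' && c <= '9');
        if (alpha) { t[Start][c] = InIdent; t[InIdent][c] = InIdent; }
        if (digit) { t[Start][c] = InInt; t[InInt][c] = InInt; t[InIdent][c] = InIdent; }
    }
    return t;
}

// Longest-match scan of one token starting at pos.  On success pos is moved
// past the token; on Kind::Error pos is left unchanged, so the caller has to
// skip the offending character (whitespace, for instance) itself.
Kind next_token(const Table& t, const std::string& in, std::size_t& pos) {
    assert(pos <= in.size());
    unsigned char state = Start;
    std::size_t i = pos;
    while (i < in.size()) {
        const unsigned char next = t[state][static_cast<unsigned char>(in[i])];
        if (next == Dead) break;
        state = next;
        ++i;
    }
    pos = i;
    if (state == InIdent) return Kind::Identifier;
    if (state == InInt) return Kind::Integer;
    return Kind::Error;
}
```

The inner loop is one table lookup per character regardless of how the table was produced, which is why the same scanning code could in principle serve both the run-time and the compile-time route.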
...
We definitely should try the new upcoming Spirit-2 code base as well, since
it should be a lot faster than the current version. Is it possible to have a
look at your test code as well? This way we could try to make a comparison
as soon as the Spirit-2 codebase evolves.

I would be quite keen to see how Spirit-2 performs on similar tests.  If the 
interest is there, then we will quite happily post the code into the Boost 
Yahoo group once we have completed a bit of tidying up in the New Year.  We 
need to put some effort into writing more detailed test cases.  At present, 
we are only directly comparing lexical analysis, and have not looked at the 
performance of the interface in real detail.  We would like to test the 
system properly with a complete parse.  To do this I think there are a 
number of useful test cases:

1)    Flex and bison/lex and yacc.
2)    Spirit - without any assistance from any lexer.
3)    Spirit 2 once available.
4)    Our library (called lextl at present) with yacc/bison.
5)    Lextl with Spirit.
6)    Lextl with Spirit 2.

Case 1 would be the control, and sets the target performance for the other 
cases to achieve.  I am confident that case 4 should achieve the same speed, 
and I think there is a good likelihood of cases 5 and 6 achieving it too.  
The key question is whether 5 outperforms 2 and 6 outperforms 3.  If this is 
the case, then I think we have a viable library.  My current test file, a 
20 MB VRML 1.0 file, reduces to about 2e6 tokens when whitespace is stripped 
and the file is lexed.  By using flyweighted tokens, we are hoping to keep 
the overhead of Spirit parsing tokens instead of characters to a minimum, so 
the reduction from 2e7 to 2e6 input entities should give an order of 
magnitude speed increase to Spirit.  At least that is my hope :-)
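
As a rough illustration of what I mean by flyweighted tokens, here is a minimal sketch in which every distinct lexeme is stored once in a pool and a token is only a small id into that pool.  This is not the lextl implementation: LexemePool, Token and tokenize are hypothetical names, and the whitespace splitter stands in for a real lexer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical flyweight pool: each distinct lexeme is stored once and
// tokens refer to it by a small integer id.
class LexemePool {
public:
    // Return the id of text, adding it to the pool on first sight.
    std::size_t intern(const std::string& text) {
        auto it = ids_.find(text);
        if (it != ids_.end()) return it->second;
        const std::size_t id = lexemes_.size();
        lexemes_.push_back(text);
        ids_.emplace(text, id);
        return id;
    }
    const std::string& text(std::size_t id) const {
        assert(id < lexemes_.size());
        return lexemes_[id];
    }
    std::size_t size() const { return lexemes_.size(); }

private:
    std::unordered_map<std::string, std::size_t> ids_;
    std::vector<std::string> lexemes_;
};

// A token is just an id, so copying or comparing one is an integer operation.
struct Token { std::size_t lexeme; };

// Stand-in lexer: split on spaces, tabs and newlines, dropping the whitespace.
std::vector<Token> tokenize(const std::string& in, LexemePool& pool) {
    auto is_space = [](char c) { return c == ' ' || c == '\t' || c == '\n'; };
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        while (i < in.size() && is_space(in[i])) ++i;   // skip whitespace
        const std::size_t start = i;
        while (i < in.size() && !is_space(in[i])) ++i;  // consume one lexeme
        if (i > start) out.push_back(Token{pool.intern(in.substr(start, i - start))});
    }
    return out;
}
```

A parser fed from tokenize sees one entity per token rather than one per character, and repeated node names such as Cube in a VRML file share a single pool entry, so the token stream itself stays small.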

Dave Handley
