RE: [boost] Interest in a fast lexical analyser compatible

Hartmut Kaiser wrote:
I'm definitely interested in having a look at your library. Besides my general interest in Spirit, I'd like to try it out as an alternative lexing component for Wave. I'm pretty sure this should be interesting for you as well, because two different lexers are already implemented there, which gives a good opportunity to compare them in a real environment.
I have to confess to not knowing much about Wave, but I would be willing to look in more detail at using this library for Wave. Once we have our first stable version of the software we will be happy to let you have a look at it in more detail. I expect this to happen within the next month or so.
As for a dynamic DFA-based lexer, Wave already uses the Spirit-based SLEX, but a static DFA-based solution is very interesting to look at. Is the DFA generated at compile time?
By static and dynamic, I mean compile-time and run-time. The system is designed to generate the DFA at run-time. But we are currently discussing a method whereby the same code base could be used to generate code stubs for compile-time DFA creation. I am keen to support both.
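To illustrate what run-time DFA generation means here, the following is a minimal sketch and not lextl's actual interface, which has not been posted. All names (`State`, `Table`, `build_table`, `match`) are hypothetical. The transition table is filled in while the program runs, so the token rules (here just identifiers and integers) could in principle come from user input rather than from the compiler.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration only; this is not lextl's interface.
enum State : int { Start = 0, InIdent = 1, InNumber = 2, Dead = 3, NumStates = 4 };

// One row per state, one column per input byte.
using Table = std::vector<std::array<int, 256>>;

// Run-time step: the table is built when the program executes.
Table build_table() {
    Table t(NumStates);
    for (auto& row : t) row.fill(Dead);
    for (int c = 0; c < 256; ++c) {
        const bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
        const bool digit = (c >= '0' && c <= '9');
        if (alpha) { t[Start][c] = InIdent; t[InIdent][c] = InIdent; }
        if (digit) { t[Start][c] = InNumber; t[InNumber][c] = InNumber; t[InIdent][c] = InIdent; }
    }
    return t;
}

// Longest match starting at pos. Returns the number of characters consumed
// and reports the last state reached (Start means nothing matched).
std::size_t match(const Table& t, const std::string& in, std::size_t pos, int& final_state) {
    assert(pos <= in.size());
    int s = Start;
    std::size_t i = pos;
    while (i < in.size()) {
        const int next = t[s][static_cast<unsigned char>(in[i])];
        if (next == Dead) break;  // no transition: stop at the longest match
        s = next;
        ++i;
    }
    final_state = s;
    return i - pos;
}
```

A compile-time variant would fix the same transitions in generated source code instead of filling a table when the program starts.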
We definitely should try the new upcoming Spirit-2 code base as well, since it should be a lot faster than the current version. Is it possible to have a look at your test code as well? This way we could try to make a comparison as soon as the Spirit-2 codebase evolves.
I would be quite keen to see how Spirit-2 performs on similar tests. If the interest is there, then we will quite happily post the code into the boost yahoo group once we have completed a bit of tidying up in the New Year.
We need to put some effort into writing more detailed test cases. At present, we are only directly comparing lexical analysis, and have not looked at the performance of the interface in real detail. We have a desire to properly test the system with a complete parse. To do this I think there are a number of useful test cases:
1) Flex and bison/lex and yacc.
2) Spirit - without any assistance from any lexer.
3) Spirit 2 once available.
4) Our library (called lextl at present) with yacc/bison.
5) Lextl with Spirit.
6) Lextl with Spirit 2.
Case 1 would be the control - and the target performance for other cases to achieve. I am confident that case 4 should achieve the same speed, also I think there is a good likelihood of cases 5 and 6 achieving the same speed. The key question is whether 5 outperforms 2 and 6 outperforms 3. If this is the case, then I think we have a viable library.
My current test file which is a 20Mb VRML1.0 file reduces to about 2e6 tokens when whitespace is stripped and the file is lexed. By using flyweighted tokens, we are hoping to reduce the overhead for Spirit to parse tokens instead of characters to a minimum, so the reduction from 2e7 to 2e6 input entities should give an order of magnitude speed increase to Spirit. At least that is my hope :-)
Dave Handley
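The flyweighted tokens mentioned above could be sketched roughly as follows. This is an assumption about the general flyweight idea (shared lexeme text stored once, tokens carrying only a small handle), not lextl's real token type. The names `Token`, `LexemePool`, `intern`, and `text` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical token: a kind tag plus a handle into a shared pool, so
// copying or comparing tokens never touches the underlying characters.
struct Token {
    int kind;
    std::size_t lexeme_id;
};

// Stores each distinct lexeme once and hands out stable integer ids.
class LexemePool {
public:
    std::size_t intern(const std::string& text) {
        const auto it = index_.find(text);
        if (it != index_.end()) return it->second;  // already stored: reuse id
        const std::size_t id = lexemes_.size();
        lexemes_.push_back(text);
        index_.emplace(text, id);
        return id;
    }

    const std::string& text(std::size_t id) const {
        assert(id < lexemes_.size());
        return lexemes_[id];
    }

    std::size_t size() const { return lexemes_.size(); }

private:
    std::vector<std::string> lexemes_;
    std::unordered_map<std::string, std::size_t> index_;
};
```

With a pool like this, a keyword that occurs many thousands of times in a VRML file is stored once, and a parser consuming the token stream compares integers rather than strings. The hoped-for gain quoted above is simply the ratio of input entities: about 2e7 characters against about 2e6 tokens, a factor of 10.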

Dave Handley wrote:
I have to confess to not knowing much about Wave, but I would be willing to look in more detail at using this library for Wave. Once we have our first stable version of the software we will be happy to let you have a look at it in more detail. I expect this to happen within the next month or so.
I'd be willing to write the interfacing stub to plug your library into Wave.
By static and dynamic, I mean compile-time and run-time. The system is designed to generate the DFA at run-time. But we are currently discussing a method whereby the same code base could be used to generate code stubs for compile-time DFA creation. I am keen to support both.
The two different lexers I was using in the Wave library were a re2c-based lexer (a static, switch-based lexer, extremely fast and compact) and a SLEX-based lexer (run-time generated DFA). I haven't done serious speed measurements, but the numbers I've got so far showed similar timings for both, with a slight advantage for re2c (as expected). I'd expect your library to be very similar in speed as well. But just out of curiosity, I'm very interested in seeing the static DFA generation version :-)
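For contrast with a run-time generated DFA, a static switch-based scanner hard-codes its transitions in source. The sketch below is hand-written in that general style and is not actual re2c output or Wave's lexer. `Kind` and `scan` are hypothetical names, and the token set (identifiers and integers) is chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Kind { Ident, Number, None };

// Scans one token starting at pos and advances pos past it.
// The transitions live in the switch, so they are fixed at compile time.
Kind scan(const std::string& in, std::size_t& pos) {
    assert(pos <= in.size());
    enum LocalState { Start, InIdent, InNumber };
    LocalState state = Start;
    std::size_t i = pos;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        const bool digit = (c >= '0' && c <= '9');
        const bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
        bool consumed = false;
        switch (state) {
        case Start:
            if (alpha)      { state = InIdent;  consumed = true; }
            else if (digit) { state = InNumber; consumed = true; }
            break;
        case InIdent:
            consumed = alpha || digit;
            break;
        case InNumber:
            consumed = digit;
            break;
        }
        if (!consumed) break;  // no transition: the token ends here
        ++i;
    }
    pos = i;
    if (state == InIdent) return Kind::Ident;
    if (state == InNumber) return Kind::Number;
    return Kind::None;
}
```

Because the compiler sees every transition, it can turn the switch into jumps with no table lookups, which is one plausible reason for the slight speed advantage reported for re2c above.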
We definitely should try the new upcoming Spirit-2 code base as well, since it should be a lot faster than the current version. Is it possible to have a look at your test code as well? This way we could try to make a comparison as soon as the Spirit-2 codebase evolves.
I would be quite keen to see how Spirit-2 performs on similar tests. If the interest is there, then we will quite happily post the code into the boost yahoo group once we have completed a bit of tidying up in the New Year.
I'm definitely interested.
We need to put some effort into writing more detailed test cases. At present, we are only directly comparing lexical analysis, and have not looked at the performance of the interface in real detail. We have a desire to properly test the system with a complete parse. To do this I think there are a number of useful test cases:
1) Flex and bison/lex and yacc.
2) Spirit - without any assistance from any lexer.
3) Spirit 2 once available.
4) Our library (called lextl at present) with yacc/bison.
5) Lextl with Spirit.
6) Lextl with Spirit 2.
Sounds sensible.
Regards
Hartmut

Dave Handley wrote:
I would be quite keen to see how Spirit-2 performs on similar tests. If the interest is there, then we will quite happily post the code into the boost yahoo group once we have completed a bit of tidying up in the New Year. We need to put some effort into writing more detailed test
Hi Dave, I'm definitely interested! As you already know, I'm focusing now on optimization and performance. To that end, I'm targeting 1) predictive parsing and 2) lexical analysis. Any help I could get on these two areas would be very welcome.
Cheers,
--
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net
Participants (3): Dave Handley, Hartmut Kaiser, Joel de Guzman