Using lexertl instead of Spirit

In answer to Paul Giaccone,
The main selling point of lexertl as far as boost goes is probably speed. lexertl lexers are both fast to construct (typically under 100 milliseconds on a modern machine) and fast to tokenise input (the generated state machine uses the flex technique of equivalence classes).
I personally like the fact that you can dump the DFA as data and process it later with any language you like too.
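(As a concrete illustration of the construct-then-tokenise workflow described above, here is a minimal sketch based on lexertl's documented usage. The header and class names are taken from a later version of the library than the one discussed in this thread, so treat them as assumptions; the library also lets you export the generated state machine tables, which is the "dump the DFA as data" point mentioned above.)

```cpp
#include <iostream>
#include <string>

#include "lexertl/generator.hpp"
#include "lexertl/lookup.hpp"

int main()
{
    lexertl::rules rules;
    lexertl::state_machine sm;

    // One regex per token kind, each paired with a user-chosen id.
    rules.push("[a-z]+", 1);   // words
    rules.push("[0-9]+", 2);   // numbers

    // Build the DFA once up front - this is the "fast to construct" step.
    lexertl::generator::build(rules, sm);

    std::string input("abc012def345");
    lexertl::smatch results(input.begin(), input.end());

    // Tokenise by repeatedly running the DFA over the input.
    lexertl::lookup(sm, results);

    while (results.id != 0)
    {
        std::cout << "id " << results.id << ": '" << results.str() << "'\n";
        lexertl::lookup(sm, results);
    }
    return 0;
}
```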

ben@benhanson.net wrote:
In answer to Paul Giaccone,
The main selling point of lexertl as far as boost goes is probably speed. lexertl lexers are both fast to construct (typically under 100 milliseconds on a modern machine) and fast to tokenise input (the generated state machine uses the flex technique of equivalence classes).
I personally like the fact that you can dump the DFA as data and process it later with any language you like too.
Spirit is a parser; please don't compare apples and oranges. You cannot implement, say, Wave, with just a lexer. I suggest you don't go the "this-is-better-that-is-better" route. Spirit has its own lexer too, FYI. It's called SLex. See Slex in http://tinyurl.com/29mcn
Regards,
-- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
ben@benhanson.net wrote:
In answer to Paul Giaccone,
The main selling point of lexertl as far as boost goes is probably speed. lexertl lexers are both fast to construct (typically under 100 milliseconds on a modern machine) and fast to tokenise input (the generated state machine uses the flex technique of equivalence classes).
I personally like the fact that you can dump the DFA as data and process it later with any language you like too.
Spirit is a parser; please don't compare apples and oranges. You cannot implement, say, Wave, with just a lexer. I suggest you don't go the "this-is-better-that-is-better" route. Spirit has its own lexer too, FYI. It's called SLex. See Slex in http://tinyurl.com/29mcn
Oh and BTW, if you want to talk about speed, matching the speed of Flex is not good enough. The thing to beat is Re2C! Hartmut and I shall see how lexertl fares soon. Wave has an adaptable front end where you can choose your own lexer. Re2C (http://re2c.org/) is one of them.
Regards,
-- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
The main selling point of lexertl as far as boost goes is probably speed. lexertl lexers are both fast to construct (typically under 100 milliseconds on a modern machine) and fast to tokenise input (the generated state machine uses the flex technique of equivalence classes).
I personally like the fact that you can dump the DFA as data and process it later with any language you like too.
Spirit is a parser; please don't compare apples and oranges. You cannot implement, say, Wave, with just a lexer. I suggest you don't go the "this-is-better-that-is-better" route. Spirit has its own lexer too, FYI. It's called SLex. See Slex in http://tinyurl.com/29mcn
Oh and BTW, if you want to talk about speed, matching the speed of Flex is not good enough. The thing to beat is Re2C! Hartmut and I shall see how lexertl fares soon. Wave has an adaptable front end where you can choose your own lexer. Re2C (http://re2c.org/) is one of them.
Yes, and Slex is the other one. BTW, Slex seems to be very similar to lexertl (with the exception of not allowing optimization of the constructed state machine tables - but this is not a principal issue, merely a lack of time).
Wave is based on a modular (layered) design. The lexer sits on top of the input (character) stream, producing C++ tokens, exposed via an iterator interface. The preprocessing component consumes the lexer iterators and is almost completely independent. The only dependency is that both have to use the same token type (which is a template parameter to both). It is very easy to interface a different lexer to the preprocessor. The cpp_tokens example (libs/wave/samples/cpp_tokens) demonstrates this by using the Slex based lexer. I'll elaborate on this in another mail during the next couple of days.
Regards
Hartmut
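(A rough sketch of what that plugging-in looks like, modelled on Wave's quick_start sample: the token type and the lexer's iterator type are the two points where a different lexer can be substituted. The typedef and header names below are from current Boost.Wave and the input text is invented, so this is an assumption-laden sketch rather than the exact code of the samples mentioned above.)

```cpp
#include <iostream>
#include <string>

#include <boost/wave.hpp>
#include <boost/wave/cpplexer/cpp_lex_token.hpp>    // shared token type
#include <boost/wave/cpplexer/cpp_lex_iterator.hpp> // default (re2c based) lexer

// The token type is shared by the lexer and the preprocessor.
typedef boost::wave::cpplexer::lex_token<> token_type;

// The lexer is exposed as an iterator over tokens; using a different lexer
// (e.g. the Slex based one from the cpp_tokens sample) means using a
// different lex_iterator type here.
typedef boost::wave::cpplexer::lex_iterator<token_type> lex_iterator_type;

// The preprocessor (wave::context) takes the underlying character iterator
// and the lexer iterator as template parameters.
typedef boost::wave::context<std::string::iterator, lex_iterator_type>
    context_type;

int main()
{
    std::string input("#define N 3\nint x = N;\n");
    context_type ctx(input.begin(), input.end(), "<input>");

    // Iterating over the context yields preprocessed tokens produced by
    // whichever lexer was plugged in above.
    for (context_type::iterator_type it = ctx.begin(); it != ctx.end(); ++it)
        std::cout << (*it).get_value();
    return 0;
}
```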

David Abrahams wrote:
"Hartmut Kaiser" <hartmut.kaiser@gmail.com> writes:
Yes, and Slex is the other one
Not to mention XPressive?
Well, XPressive is not technically a lexer. I'm urging Eric to make it.
Regards,
-- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
David Abrahams wrote:
"Hartmut Kaiser" <hartmut.kaiser@gmail.com> writes:
Yes, and Slex is the other one Not to mention XPressive?
Well, XPressive is not technically a lexer. I'm urging Eric to make it.
I remember discussing this with you and Hartmut, but I don't recall the outcome. I think we decided xpressive needed some features before it could be usable as a lexer, but I don't recall what they are. If you remind me, I can revisit this. Certainly, a small, fast, DFA-based back-end for xpressive would be ideal. It's been a while since I had that kind of free time, though.
-- Eric Niebler Boost Consulting www.boost-consulting.com

Joel de Guzman <joel@boost-consulting.com> writes:
David Abrahams wrote:
"Hartmut Kaiser" <hartmut.kaiser@gmail.com> writes:
Yes, and Slex is the other one
Not to mention XPressive?
Well, XPressive is not technically a lexer. I'm urging Eric to make it.
And I guess the NFA nature of what XPressive builds probably makes it a speed loser when compared to some DFA approaches.
-- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
Yes, and Slex is the other one
Not to mention XPressive?
Xpressive is not really usable as a lexer, and Eric is aware of that. I have a Wave lexer implemented with Xpressive here on my hard disk, and it functions well; it is just three orders of magnitude slower than, for instance, the re2c based one. The main reasons are:
- no optimization between different regexes used for token representation (no internal NFA/DFA generation)
- no way to tell which alternative matched when using regexes containing alternatives
The first rules out using separate regexes, one for each token; the second prevents us from using one giant regex with alternatives... Both are probably natural restrictions stemming from the fact that Xpressive is a regex library, not a lexer generator. The same issues would probably occur if we were trying to use Boost.Regex for this task.
Regards
Hartmut
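(To make the first of those two points concrete, here is a hypothetical sketch of the ruled-out "separate regexes, one for each token" approach: every pattern is compiled and tried independently at the current position, so nothing is shared between patterns the way a single generated DFA would share states. The token names and patterns are invented for illustration; the xpressive calls themselves are from the library's documented interface.)

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

#include <boost/xpressive/xpressive.hpp>

namespace xpr = boost::xpressive;

int main()
{
    // One independently compiled regex per token kind - xpressive cannot
    // merge these into a single state machine, so each one is tried in turn.
    std::vector<std::pair<std::string, xpr::sregex> > tokens;
    tokens.push_back(std::make_pair("identifier",
        xpr::sregex::compile("[A-Za-z_][A-Za-z_0-9]*")));
    tokens.push_back(std::make_pair("integer", xpr::sregex::compile("[0-9]+")));
    tokens.push_back(std::make_pair("whitespace", xpr::sregex::compile("\\s+")));

    std::string input("int i = 42;");
    std::string::const_iterator cur = input.begin(), end = input.end();

    while (cur != end)
    {
        bool matched = false;

        for (std::size_t i = 0; i < tokens.size(); ++i)
        {
            xpr::smatch what;

            // match_continuous forces the match to start exactly at 'cur'.
            if (xpr::regex_search(cur, end, what, tokens[i].second,
                    xpr::regex_constants::match_continuous))
            {
                std::cout << tokens[i].first << ": '" << what.str() << "'\n";
                cur = what[0].second;   // advance past the matched token
                matched = true;
                break;
            }
        }

        if (!matched)
        {
            // Characters no token regex covers ('=' and ';' here).
            std::cout << "unmatched: '" << *cur << "'\n";
            ++cur;
        }
    }
    return 0;
}
```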

Hartmut Kaiser wrote:
David Abrahams wrote:
Yes, and Slex is the other one Not to mention XPressive?
Xpressive is not really usable as a lexer, and Eric is aware of that. I have a Wave lexer implemented with Xpressive here on my hard disk, and it functions well; it is just three orders of magnitude slower than, for instance, the re2c based one. The main reasons are:
- no optimization between different regexes used for token representation (no internal NFA/DFA generation)
- no way to tell which alternative matched when using regexes containing alternatives
The first rules out using separate regexes, one for each token; the second prevents us from using one giant regex with alternatives...
Both are probably natural restrictions stemming from the fact that Xpressive is a regex library, not a lexer generator. The same issues would probably occur if we were trying to use Boost.Regex for this task.
Ah, yes. I remember now. And I was going to implement a special matcher that was a trie for token literals, to reduce the need for so many alternates.
Could you send me your code to integrate xpressive and Wave? It seems unlikely that I'd be able to do better than re2c with xpressive, but it might be interesting nonetheless.
What would be nice is a DSEL that generates optimal DFA-based lexers. But given the sheer number of DFA states some lexers generate, I wonder if an expression template approach is even viable.
-- Eric Niebler Boost Consulting www.boost-consulting.com

Eric Niebler wrote:
Ah, yes. I remember now. And I was going to implement a special matcher that was a trie for token literals, to reduce the need for so many alternates.
Could you send me your code to integrate xpressive and Wave? It seems unlikely that I'd be able to do better than re2c with xpressive, but it might be interesting nonetheless.
What would be nice is a DSEL that generates optimal DFA-based lexers. But given the sheer number of DFA states some lexers generate, I wonder if an expression template approach is even viable.
I've added the token_statistics sample to Wave (in the Boost CVS::HEAD) containing a lexer based on Xpressive. The sample itself doesn't do anything fancy yet, mainly because I stopped working on the functionality when I got stuck with Xpressive. Additionally, I added the test_xlex_lexer.cpp test app in the libs/wave/test/testlexers/ directory, verifying the lexer in the context of Wave.
Regards
Hartmut
participants (5)
- ben@benhanson.net
- David Abrahams
- Eric Niebler
- Hartmut Kaiser
- Joel de Guzman