
Hartmut Kaiser wrote:
I'd be willing to write the interfacing stub to plug your library into Wave.
Thanks - I think this should be relatively easy, because we currently use a forward iterator across the tokens to generate suitable output for Spirit. One requirement for the Boost community to accept this library will probably be that its iterators meet the Standard's requirements for forward iterators - and I note that this is one of your key requirements in Wave.
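To make the forward iterator requirement concrete, here is a minimal sketch of a multi-pass token iterator. It is not lextl or Wave code: token_stream and its whitespace splitting are hypothetical stand-ins for a real lexer. Tokens are cached in a std::deque as they are produced, so copies of an iterator can be advanced independently and equal iterators refer to the same token object.

```cpp
#include <cstddef>
#include <deque>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical stand-in for a lexer: "tokens" are whitespace-separated words.
class token_stream {
public:
    explicit token_stream(const std::string& text) : in_(text) {}

    class iterator {
    public:
        // The member types that std::iterator_traits looks for.
        using iterator_category = std::forward_iterator_tag;
        using value_type        = std::string;
        using difference_type   = std::ptrdiff_t;
        using pointer           = const std::string*;
        using reference         = const std::string&;

        iterator() = default;  // a default-constructed iterator is the end iterator

        reference operator*() const { return owner_->cache_[index_]; }
        pointer operator->() const { return &owner_->cache_[index_]; }

        // Incrementing the end iterator is undefined, as for any iterator.
        iterator& operator++() { ++index_; normalise(); return *this; }
        iterator operator++(int) { iterator old(*this); ++*this; return old; }

        friend bool operator==(const iterator& a, const iterator& b) {
            return a.owner_ == b.owner_ && a.index_ == b.index_;
        }
        friend bool operator!=(const iterator& a, const iterator& b) {
            return !(a == b);
        }

    private:
        friend class token_stream;
        iterator(token_stream* owner, std::size_t index)
            : owner_(owner), index_(index) { normalise(); }

        // Turn into the end iterator once the input is exhausted.
        void normalise() {
            if (!owner_->fill_to(index_)) { owner_ = nullptr; index_ = 0; }
        }

        token_stream* owner_ = nullptr;
        std::size_t index_ = 0;
    };

    iterator begin() { return iterator(this, 0); }
    iterator end() { return iterator(); }

private:
    // Lex on demand until token `index` exists; false if the input ran out.
    bool fill_to(std::size_t index) {
        std::string word;
        while (cache_.size() <= index && (in_ >> word)) cache_.push_back(word);
        return index < cache_.size();
    }

    std::istringstream in_;
    std::deque<std::string> cache_;  // push_back keeps references to elements valid
};
```

The cache is what provides the multi-pass guarantee here; an iterator that stored only the current token inside itself would behave like an input iterator.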
The two different lexers I was using in the Wave library were a re2c-based lexer (a static, switch-based lexer, extremely fast and compact) and a SLEX-based lexer (runtime generated DFA). I haven't done serious speed measurements, but the numbers I've got so far showed similar timings for both, with a slight advantage for re2c (as expected). I'd expect your library to be very similar in speed as well. But just out of curiosity I'm very interested in seeing the static DFA generation version :-)
Can these lexers be effectively used with any Spirit grammar? I'll download Wave over the next few days and have a look at how much overlap there is between our library and the lexers within Wave.

My plan for the static DFA version (which we haven't fully discussed in our internal design discussions yet, so it may change significantly) is to use the memento pattern to reflect the internal state of the DFA, along with the production rules. These could then be saved or loaded, or serialised to a C++ file, in a rather similar way to flex. This would mean that the lextl classes would be both an inline and an offline tool. In principle this gives some very desirable advantages:

1) The API remains the same in both compile-time and run-time versions.
2) You can very easily swap between run-time and compile-time versions - for example, during development you may use run-time creation to speed up the process of developing a complex grammar, then switch to compile-time for your production release.
3) You have a number of options for compile-time versions. The grammar can be in a configuration file, or compiled into the program directly.
at least you could have used the symbol parser ... <snip>
I've changed the Spirit grammar to use the symbol parser, and dropped the time from 40-50 seconds to about 6-7 seconds. Thanks for the tip.
Is there any documentation available?
No, I'm sorry, we don't have any documentation at present, because we are still only a little beyond the prototyping stage. We still have to refine the API a little, and complete some optimisation work, before we publish an early Alpha. As I said in my original post, I'm trying to gauge interest at the moment, and I'm pleased to see that we are generating a little interest.

As a flavour, the current API would result in some code looking like:

    syntax<> mySyntax;
    mySyntax.add_rule( rule<token<1> >( "Group" ) );
    mySyntax.add_rule( rule<token<2> >( "Separator" ) );
    mySyntax.add_rule( rule<token<3> >( "Switch" ) );
    // Lots more symbols for the rest of VRML...
    mySyntax.add_rule( rule<token_float<69> >( "([-+]?((([0-9]+[.]?)|([0-9]*[.][0-9]+))([eE][-+]?[0-9]+)?))" ) );
    mySyntax.add_rule( rule<token_string<70> >( "#[^\\n\\r]*[\\n\\r]" ) );
    mySyntax.add_rule( rule<token_string<71> >( "\"[^\"\\n\\r]*[\"\\n\\r]" ) );
    mySyntax.add_rule( rule<token_string<72> >( "[a-zA-Z_][a-zA-Z0-9_]*" ) );
    mySyntax.add_rule( rule<token<73> >( "[ \\t\\n\\r]+" ) );

    lexer<> myLexer( mySyntax );
    std::ifstream fsp( "test.vrml" );
    myLexer.set_source( fsp ); // Also it will have a char iterator interface
    lexer<>::token_iterator iter = myLexer.begin();
    for ( ; iter != myLexer.end(); ++iter )
        // Do something with the tokens...
        ;

Essentially you create a syntax to which you add rules. Each rule specifies the token that it will create and the regular expression to be matched. Special versions of tokens exist that will automatically generate a float, int or string from the matched expression. You then construct a lexer from the syntax, set a suitable source input, and iterate over the tokens. The tokens are polymorphic so you can easily access the type and data, they are statically typed so you can use types to match the tokens, or you can dispatch tokens to functions through a visitor framework.
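As a rough illustration of dispatching tokens through a visitor, here is a minimal, self-contained sketch. None of these names come from lextl: token, float_token, comma_token, list_end_token, token_visitor, vec3_collector and collect are all hypothetical. The collector gathers float tokens into 3D vectors, treating a comma token as the end of one vector and a list termination token as the end of the list.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

struct float_token;
struct comma_token;
struct list_end_token;

// One visit overload per token type the visitor cares about.
struct token_visitor {
    virtual ~token_visitor() = default;
    virtual void visit(const float_token&) = 0;
    virtual void visit(const comma_token&) = 0;
    virtual void visit(const list_end_token&) = 0;
};

// Hypothetical polymorphic token base: accept() performs the double dispatch.
struct token {
    virtual ~token() = default;
    virtual void accept(token_visitor& v) const = 0;
};

struct float_token : token {
    explicit float_token(float v) : value(v) {}
    void accept(token_visitor& v) const override { v.visit(*this); }
    float value;
};

struct comma_token : token {
    void accept(token_visitor& v) const override { v.visit(*this); }
};

struct list_end_token : token {
    void accept(token_visitor& v) const override { v.visit(*this); }
};

// Collects floats into 3D vectors: components are separated by whitespace
// (already dropped by the lexer), vectors by commas, and "]" ends the list.
class vec3_collector : public token_visitor {
public:
    void visit(const float_token& t) override {
        // Components beyond the third are ignored in this sketch.
        if (count_ < 3) current_[count_++] = t.value;
    }
    void visit(const comma_token&) override { flush(); }
    void visit(const list_end_token&) override { flush(); }

    const std::vector<std::array<float, 3> >& vectors() const { return vectors_; }

private:
    // Store the pending vector only if it is complete, then start a new one.
    void flush() {
        if (count_ == 3) vectors_.push_back(current_);
        count_ = 0;
    }

    std::array<float, 3> current_{};
    std::size_t count_ = 0;
    std::vector<std::array<float, 3> > vectors_;
};

// Usage: feed every token to the visitor and return what it collected.
std::vector<std::array<float, 3> >
collect(const std::vector<std::unique_ptr<token> >& tokens) {
    vec3_collector collector;
    for (const auto& t : tokens) t->accept(collector);
    return collector.vectors();
}
```

The loop in collect never inspects token types itself; each token selects the matching visit overload, which is what keeps the handling statically typed.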
You can also use dynamically typed tokens if you want - although you lose much of the benefit of strong typing.

I especially like the idea of using visitors to distinguish between tokens. In my example of a VRML file, this means that the long float lists can be handled with a visitor that contains visit functions for a float token, a comma token (note that in VRML vector lists the elements of a 3D vector are separated by whitespace, and the 3D vectors themselves are separated by commas) and a list termination token (e.g. a "]"). The visitor would then automatically store the floats in a suitable data structure, such as a list of 3D vectors.

Dave