
On 6/8/2010 4:22 AM, John Bytheway wrote:
On 08/06/10 01:34, Eric Niebler wrote:
I haven't forgotten. IIRC, lexertl is like dynamic xpressive: it accepts strings as input and builds a FSA. I was speculating about what the lexertl equivalent of static xpressive would look like and whether it would be even faster. Then static xpressive could use static lexertl for simple regexes, and dynamic xpressive could use the dynamic one.
I once wrote a compile-time lexer compiler, which seems to be the sort of thing you're suggesting. It worked like this:
typedef lexer< rule<literal<'k','e','y','w','o','r','d'>, construct<KEYWORD> >,
mpl::string could help you here.
rule<one_or_more<range<'a','z'> >, construct<ID, mpl::true_> >, rule<alternative_literal<' ','\n','\t','\r'>, discard >
keyword_lexer;
<snip>
This defines a lexer recognizing whitespace-separated lower-case words, but treating the word 'keyword' as special. Each rule of the lexer is a regex followed by what to do when that regex is matched. The first rule constructs a KEYWORD token, the second constructs an ID token (and the mpl::true_ means it passes the range which matched the regex to the constructor), the third matches without creating a token.
The library will take this lexer specification and turn it into an NFA, then transform that into a DFA, then encode that as a transition table in an array, *all at compile time*.
Wow! This is something like what I had in mind. <snip>
The down side with this approach, of course, is the absolutely massive compile-time cost. Compiling the above example with g++ 4.4.3 takes over 6 minutes and over 1.3GiB of memory (and this is after I've put in effort to optimize it). The time I could live with, but the memory requirements are absurd. Any lexer of real-world complexity would be quite impossible to compile.
Ouch, that's a serious problem. Maybe there is a more efficient way to get to a DFA. Or maybe the world isn't ready for compile-time DFAs.
However, it might just be possible to use it for simple regexes, rather than full-blown lexers. Anyone think that might be worth investigating?
If the compile times are that bad, I wouldn't be able to use it. I can't use it as-is regardless, because static xpressive doesn't have compile-time access to the string literals; they're still char*. -- Eric Niebler BoostPro Computing http://www.boostpro.com