
On 08/06/10 14:13, Eric Niebler wrote:
On 6/8/2010 4:22 AM, John Bytheway wrote:
I once wrote a compile-time lexer compiler, which seems to be the sort of thing you're suggesting. It worked like this:
typedef lexer< rule<literal<'k','e','y','w','o','r','d'>, construct<KEYWORD> >,
mpl::string could help you here.
Indeed. I think it didn't exist when I wrote this code.
The library will take this lexer specification and turn it into an NFA, then transform that into a DFA, then encode that as a transition table in an array, *all at compile time*.
Wow! This is something like what I had in mind.
Well, I'm glad you seem impressed. It wasn't trivial to make this work :).
<snip>
The down side with this approach, of course, is the absolutely massive compile-time cost. Compiling the above example with g++ 4.4.3 takes over 6 minutes and over 1.3GiB of memory (and this is after I've put in effort to optimize it). The time I could live with, but the memory requirements are absurd. Any lexer of real-world complexity would be quite impossible to compile.
Ouch, that's a serious problem. Maybe there is a more efficient way to get to a DFA. Or maybe the world isn't ready for compile-time DFAs.
I suspect there's a decent constant factor that could be squeezed out, but I don't think there are any serious algorithmic improvements. OTOH, metaprogramming algorithmic complexity is confusing at the best of times, so I may be wrong. Once g++ supports C++0x constexpr functions I intend to revisit this; they might well provide substantial improvement. That said, I'll try clang now and see how that fares...
However, it might just be possible to use it for simple regexes, rather than full-blown lexers. Anyone think that might be worth investigating?
If the compile times are that bad, I wouldn't be able to use it. I can't use it as-is regardless, because static xpressive doesn't have compile-time access to the string literals; they're still char*.
Yeah, that is an annoyance. I don't think there's anything to be done about it, though. John Bytheway