Re: [boost] [regex] any interest in finite automata-based regexengine?

27 Oct 2004

      ...
The stuff I offer is dedicated to two tasks:
 * building an ANFA (that's Augmented NFA) from an expression
   tree of a given regex;
 * running the result against a given input string.
What this code desperately needs is the following:
 * syntactical front-end: a class that would parse the actual
   regex string and build its expression tree;
 * character back-end: a class that would allow checking whether
   a given character is contained in a given character set,
   respecting encodings, locales etc.
Boost.regex employs quite a general approach to these components.
Reusing them and connecting my code to them is what I have in
mind.
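
As a rough illustration of the two tasks and the two missing components
listed above, here is a minimal C++ sketch. It is not the ANFA code offered
in this thread and it does not use any Boost.regex internals: it is a plain
Thompson-style NFA, and every name in it (CharSet, Node, Nfa, build,
closure, matches, lit, bin, star) is hypothetical. The expression tree is
built by hand, standing in for the syntactic front-end, and CharSet is a
bare character range, standing in for a character back-end that would
respect encodings and locales.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical character back-end: answers "is c in this set?".
// A real one would consult encodings and locales; this is a plain range.
struct CharSet {
    char lo, hi;
    bool contains(char c) const { return lo <= c && c <= hi; }
};

// Hypothetical expression tree, as a syntactic front-end might produce it.
struct Node {
    enum Kind { Set, Concat, Alt, Star };
    Kind kind;
    CharSet set;                        // used by Set only
    std::unique_ptr<Node> left, right;  // Star uses left only
};
using P = std::unique_ptr<Node>;

P lit(char lo, char hi) { P n(new Node); n->kind = Node::Set; n->set = {lo, hi}; return n; }
P bin(Node::Kind k, P l, P r) {
    P n(new Node); n->kind = k; n->left = std::move(l); n->right = std::move(r); return n;
}
P star(P l) { return bin(Node::Star, std::move(l), nullptr); }

// NFA state: either consumes one character from `set`, or has epsilon moves.
struct State {
    bool consumes = false;
    CharSet set{};
    int next = -1;         // target after consuming a character
    std::vector<int> eps;  // epsilon targets
};
struct Nfa {
    std::vector<State> states;
    int add() { states.emplace_back(); return static_cast<int>(states.size()) - 1; }
};
struct Frag { int start, accept; };

// Task 1: build an NFA fragment from an expression tree.
Frag build(const Node& n, Nfa& m) {
    assert(n.kind == Node::Set || n.left);  // non-leaf nodes need a left child
    int s = m.add(), a = m.add();
    switch (n.kind) {
    case Node::Set:
        m.states[s].consumes = true; m.states[s].set = n.set; m.states[s].next = a;
        break;
    case Node::Concat: {
        Frag l = build(*n.left, m), r = build(*n.right, m);
        m.states[s].eps.push_back(l.start);
        m.states[l.accept].eps.push_back(r.start);
        m.states[r.accept].eps.push_back(a);
        break; }
    case Node::Alt: {
        Frag l = build(*n.left, m), r = build(*n.right, m);
        m.states[s].eps = {l.start, r.start};
        m.states[l.accept].eps.push_back(a);
        m.states[r.accept].eps.push_back(a);
        break; }
    case Node::Star: {
        Frag l = build(*n.left, m);
        m.states[s].eps = {l.start, a};
        m.states[l.accept].eps = {l.start, a};
        break; }
    }
    return {s, a};
}

// Add s and everything reachable from it by epsilon moves to `out`.
void closure(const Nfa& m, int s, std::set<int>& out) {
    if (!out.insert(s).second) return;
    for (int t : m.states[s].eps) closure(m, t, out);
}

// Task 2: run the NFA against the whole input, tracking a set of live
// states instead of backtracking.
bool matches(const Node& re, const std::string& input) {
    Nfa m;
    Frag f = build(re, m);
    std::set<int> cur;
    closure(m, f.start, cur);
    for (char c : input) {
        std::set<int> nxt;
        for (int s : cur)
            if (m.states[s].consumes && m.states[s].set.contains(c))
                closure(m, m.states[s].next, nxt);
        cur.swap(nxt);
    }
    return cur.count(f.accept) != 0;
}
```

For example, with the tree for (a|b)*c built as
bin(Node::Concat, star(bin(Node::Alt, lit('a','a'), lit('b','b'))), lit('c','c')),
matches accepts "abbac" and rejects "abca". Because the matcher only ever
holds a set of live states, each input character is processed once and the
work per character is bounded by the number of states.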
The only snag is that I'm not familiar with the Boost.regex internals, so
any help in that area would be appreciated.

The regex internals are in the process of being completely rewritten (the
code is in CVS on the regex5 branch). I hope to merge this into the main
trunk in the next few weeks; mainly it's the docs that I need to bring up to
date.

Regex parsing and state machine construction should now be quite 
straightforward to understand (within limits for a regex engine obviously!), 
so I would urge you to take a look (I can send you a zip if you don't have 
cvs access).

I think the main problem is providing the same feature set as the existing 
engine - my understanding is that no machine can have the complexity you 
claim and still match backrefs, or even, I believe, wide characters (because 
the character set is too large to realistically build a table-based NFA). 
Is that correct?
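
To put rough numbers on the table-size concern, here is a small C++ sketch
of how a dense transition table grows with the alphabet. The function name,
the 4-byte entry size and the 100-state machine are assumptions chosen for
illustration only; nothing here is taken from either engine.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed by a dense transition table: one entry per (state, character).
// entry_bytes is the size of one stored target-state index (assumed 4 here).
constexpr std::uint64_t dense_table_bytes(std::uint64_t states,
                                          std::uint64_t alphabet,
                                          std::uint64_t entry_bytes = 4) {
    return states * alphabet * entry_bytes;
}

constexpr std::uint64_t narrow_alphabet  = 256;       // 8-bit characters
constexpr std::uint64_t utf16_alphabet   = 65536;     // 16-bit code units
constexpr std::uint64_t unicode_alphabet = 0x110000;  // all Unicode code points
```

Under those assumptions a 100-state machine needs 102,400 bytes for 8-bit
characters, 26,214,400 bytes for 16-bit code units, and 445,644,800 bytes if
every Unicode code point gets its own column. Storing transitions as
character ranges or sets instead of one column per character avoids this
growth, at the cost of a membership test per transition.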

BTW, I have always thought that there was room for multiple regex engines in 
Boost that would offer increasingly fewer features, but gain in worst-case 
performance.

I suppose I should have tried to separate the parser from the back-end state 
machine format more, so that different engines can be plugged in at will, 
but there are only so many times I think I can stand to rewrite this stuff 
:-/

John.

John Maddock