Re: [boost] [regex] any interest in finite automata-based regex engine?

3 Nov 2004

      On Tue, 2 Nov 2004 10:11:45 +0300, Vladimir Pozdyayev <ardatur@mail.ru> wrote:
...
To Eric Niebler and John Maddock on the subject of "something
completely different":
Oh well. Mea culpa. Thanks.
And now for something more completely different.
JM> I forgot to say: without understanding the algorithm you are using, and the
JM> data it needs, it's hard to know whether or not Boost.Regex can support it,
JM> I think you may have been asked this already, but do you have a description
JM> anywhere of the algorithm you are using?
http://groups.yahoo.com/group/boost/files/anfa-regex/
EN> ... xpressive's
EN> compile-time design allows for multiple back-ends (NFA, backtracking
EN> NFA, DFA) without paying for ones that aren't used. However, xpressive
EN> is hard-coded to use a backtracking NFA for now, and there is no clean
EN> interface for plugging in a different back-end...
Maybe it's good time to officially separate regex components from
each other and define the way they should interact, thus allowing
users to connect them at their preference?
I can name three of them: syntax parser, state machine, and character
sets. The first two are interacting through either tree data
structures (passive/declarative way), or callbacks imitating the
parser shift-reduce activity (active/imperative way; the better one,
I think). The other two interact through "bool contains( char_t c )"
(and, possibly, additional functions to deal with more subtle stuff).
I understand that some algorithms could prefer to treat those three
as if they were closely related... well, whether it will become a
problem is a question of design. Opinions on this, as well as on the
whole subject, are welcome.
For what it's worth, I think that's a really promising idea.  As is, the
library is a black box.  Nobody can introduce a new syntax parser
without access to the library internals; this is directly contrary to 
the way that the STL separates containers and algorithms.

The question is whether it's possible to define an interface, or a
family of interface, for creating and executing NDFAs.  (Or maybe
even DFAs in some simple cases?)  Certainly the fact that better
algorithms are possible for regexes that don't have backreferences
than for regexes that do is interesting; to me, at least, this suggests
that we might want to provide both algorithms and let users choose
whichever one is appropriate.

      --Matt

Re: [boost] [regex] any interest in finite automata-based regex engine?

Matt Austern