
On Tue, 2 Nov 2004 10:11:45 +0300, Vladimir Pozdyayev <ardatur@mail.ru> wrote:
To Eric Niebler and John Maddock on the subject of "something completely different":
Oh well. Mea culpa. Thanks.
And now for something more completely different.
JM> I forgot to say: without understanding the algorithm you are using, and the JM> data it needs, it's hard to know whether or not Boost.Regex can support it, JM> I think you may have been asked this already, but do you have a description JM> anywhere of the algorithm you are using?
http://groups.yahoo.com/group/boost/files/anfa-regex/
EN> ... xpressive's EN> compile-time design allows for multiple back-ends (NFA, backtracking EN> NFA, DFA) without paying for ones that aren't used. However, xpressive EN> is hard-coded to use a backtracking NFA for now, and there is no clean EN> interface for plugging in a different back-end...
Maybe it's good time to officially separate regex components from each other and define the way they should interact, thus allowing users to connect them at their preference?
I can name three of them: syntax parser, state machine, and character sets. The first two are interacting through either tree data structures (passive/declarative way), or callbacks imitating the parser shift-reduce activity (active/imperative way; the better one, I think). The other two interact through "bool contains( char_t c )" (and, possibly, additional functions to deal with more subtle stuff). I understand that some algorithms could prefer to treat those three as if they were closely related... well, whether it will become a problem is a question of design. Opinions on this, as well as on the whole subject, are welcome.
For what it's worth, I think that's a really promising idea. As is, the library is a black box. Nobody can introduce a new syntax parser without access to the library internals; this is directly contrary to the way that the STL separates containers and algorithms. The question is whether it's possible to define an interface, or a family of interface, for creating and executing NDFAs. (Or maybe even DFAs in some simple cases?) Certainly the fact that better algorithms are possible for regexes that don't have backreferences than for regexes that do is interesting; to me, at least, this suggests that we might want to provide both algorithms and let users choose whichever one is appropriate. --Matt