
I've studied the interfaces you suggested, and here are my observations. Please correct me if I'm wrong.
<perl_matcher> is a collection of algorithms that reach into the internal data structures of the underlying <basic_regex> and use them to perform the matching.
Yes, it's responsible for the actual matching.
<basic_regex_creator> is a "syntax features to internals" converter which is called directly by the parser. In theory, implementing it should be the better way to initialize customized structures. However, the way it is used is somewhat tricky: it does not fill its own data structures, but those of the class that invoked the parser, the <basic_regex_implementation>. So this one would need reimplementing, too.
Now for the real problem. Both <perl_matcher> and <basic_regex_creator> deal with the already compiled state machine or its elements. The first one works directly with the regex internals; the second one gets <append_state> and similar calls from the parser. The trouble is that some of my algorithms' calculations have to be performed directly on the expression tree, and the compiled state machine won't help there. Is it possible to restore the tree from the information provided by the library? That is, given the regex "((?:a|b)*?)(b+)", end up with an object like
new cat( new match( new kleene_lazy( new alt( new charset( "a" ), new charset( "b" ) ) ) ), new match( new repeat( new charset( "b" ) ) ) )
I see what you mean, but no: there is never a parse tree like that. It has never been necessary (until now, obviously).
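For illustration only, here is a minimal sketch of the kind of tree the question describes for "((?:a|b)*?)(b+)". Nothing here is part of Boost.Regex: the names node, make_node, charset, dump and example_tree are hypothetical, and the node kinds are simply copied from the expression in the question. It only shows the shape of such an object, not a way to obtain it from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression-tree node; not a Boost.Regex type.
struct node;
using node_ptr = std::shared_ptr<const node>;

struct node {
    std::string kind;                // "cat", "alt", "charset", ...
    std::string text;                // characters, used by "charset" only
    std::vector<node_ptr> children;  // operands, empty for leaves
};

// Inner node with the given operands.
node_ptr make_node(std::string kind, std::initializer_list<node_ptr> children) {
    return std::make_shared<node>(node{std::move(kind), std::string(), children});
}

// Leaf node holding a set of characters.
node_ptr charset(std::string chars) {
    return std::make_shared<node>(node{"charset", std::move(chars), {}});
}

// Render the tree as kind(child,child,...) for inspection.
std::string dump(const node& n) {
    if (n.children.empty()) return n.kind + "(" + n.text + ")";
    std::string out = n.kind + "(";
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i != 0) out += ",";
        out += dump(*n.children[i]);
    }
    return out + ")";
}

// The tree from the question, built by hand.
node_ptr example_tree() {
    return make_node("cat", {
        make_node("match", {
            make_node("kleene_lazy", {
                make_node("alt", { charset("a"), charset("b") }) }) }),
        make_node("match", {
            make_node("repeat", { charset("b") }) })
    });
}

// Self-check: the hand-built tree has the nesting given in the question.
void check_example_tree() {
    assert(dump(*example_tree()) ==
           "cat(match(kleene_lazy(alt(charset(a),charset(b)))),"
           "match(repeat(charset(b))))");
}
```

An algorithm that needs the tree could then walk node::children recursively, the same way dump does.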
And now for something completely different.
The following program outputs ' aa', where the first char is \0. If we replace <smatch> by <cmatch>, the output is ok. That holds for regex5 as well as for the regex library in Boost 1.31.0. Am I missing something? (MSVC 7.0)
#include <boost/regex.hpp>
#include <iostream>

main()
{
    boost::smatch m;
    boost::regex_match( "aaa", m, boost::regex( ".*" ) );
    std::cout << m[ 0 ] << "\n";
}
Well, smatch is the wrong type to use in this situation: std::string::const_iterator and const char* are not the same type. You should be using match_results<const char*> in this case, for which cmatch is a typedef (I hope that makes sense). For conforming compilers, the code you posted does not compile (which is the correct behaviour), but some workarounds for bugs present in VC6 and VC7 cause a temporary string to be created in this case, and the call goes through an overload that ideally would not have been found. So the iterators you get back are iterators into a string that has already been destroyed.
John.
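A minimal sketch of the two consistent pairings described above: const char* input with cmatch, and std::string input with smatch. It is written against std::regex rather than Boost.Regex. That is an assumption made so the example needs only the standard library; std::regex standardized the same match_results / cmatch / smatch interface, and the boost:: names are used the same way. The helper names match_c_string, match_std_string and self_check are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// const char* input pairs with cmatch, i.e. match_results<const char*>.
std::string match_c_string(const char* text) {
    const std::regex re(".*");
    std::cmatch m;
    if (!std::regex_match(text, m, re)) return std::string();
    return m[0].str();  // copy the match out while `text` is still valid
}

// std::string input pairs with smatch. The sub-matches are iterators
// into `text`, so `text` has to outlive every use of `m`.
std::string match_std_string(const std::string& text) {
    const std::regex re(".*");
    std::smatch m;
    if (!std::regex_match(text, m, re)) return std::string();
    return m[0].str();
}

// Self-check with the input from the posted program.
void self_check() {
    assert(match_c_string("aaa") == "aaa");
    assert(match_std_string(std::string("aaa")) == "aaa");
}
```

In both helpers the matched text is copied into a std::string before the function returns, so no iterator into a destroyed buffer escapes.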