[regex] matching info with merged regex

Hi boost.regex gurus,

I'm stuck with a problem dealing with some kind of regex merging (using Boost 1.33). I don't know if the way I took is viable, so any ideas and advice will be appreciated.

To give you some insight, I have a set of a hundred matching (ei) and formatting (ri) "rules", e.g.

e1 : (a)(?=ll)
r1 : (?1o)

e1/r1 should mean "the 'a' in a string like 'all' should be replaced by 'o'".

I merge all my e/r pairs into one big regex using regex_merge (for performance), so the resulting matching/formatting expressions look like:

e : e1|e2|...|en
r : r1r2...rn

I'm getting weird behaviour with this, as the resulting string is sometimes filled with sequences like 'u4u5u6u7u8u9u' or other "trash".

So to debug this, I'd like to know which rule (i.e. which ei) matched which part of the string.

I'm unsure whether it's possible to get some kind of iterator over the rules that have matched when using regex_merge?

I also looked at the match_results returned by the simpler regex_match(), but I can't figure out how to tell which part of my matching regex matched (i.e. which ei).

Otherwise, is there a way to analyse or dump the matching/replacing behaviour of such a complex regex?

Thanks,
Line.

Line Oddskool wrote:
Hi boost.regex gurus,
I'm stuck with a problem dealing with some kind of regex merging (using boost 1.33). I don't know if the way I took is viable, so any ideas and advice will be appreciated.
To give you some insight, I have a set of a hundred matching (ei) and formatting (ri) "rules", e.g.
e1 : (a)(?=ll) r1 : (?1o)
e1/r1 should mean "matching 'a' of some string like 'all' should be replaced by 'o'"
I merge all my e/r pairs into one big regex using regex_merge (for performance), so the resulting matching/formatting expressions look like:
e : e1|e2|...|en r: r1r2...rn
I'm getting weird behaviour with this, as the resulting string is sometimes filled with sequences like 'u4u5u6u7u8u9u' or other "trash".
So to debug this, I'd like to know which rule (i.e. which ei) matched on what part of the string.
I'm unsure if it's possible to get some kind of iterator on the rules that have matched using regex_merge ?
I also looked at the match_results returned by the simpler method regex_match(), but I can't figure out how to know which part of my matching regex matched (i.e. which ei) ?
Unless you really meant regex_match, regex_search would be the analogue of regex_replace (the new name for regex_merge). The way to find out which sub-expression matched is simply:

    match_results<something> what;
    ...
    for(unsigned i = 1; i < what.size(); ++i)
    {
       if(what[i].matched)
          std::cout << "sub-expression " << i << " matched " << what[i] << std::endl;
    }
Otherwise, is there a way to analyse or dump the matching/replacing behaviour of such a complex regex ?
'Fraid not, you would likely be swamped with so much data that it probably wouldn't be that useful in any case :-(

You could also try a binary-search reduction on the problem: split the regex in two and find which half has the issue, then split again, and so on...

HTH, John.
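The binary-search reduction suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `merge_rules` and `bisect_rules` are hypothetical helper names (not part of Boost.Regex), the caller supplies a `misbehaves` predicate that builds a regex from the merged pattern, runs the replacement on a known-bad input and reports whether the trash still appears, and exactly one rule is assumed to be responsible. The matching format strings would have to be split the same way, and capture-group numbers shift whenever rules are dropped from the front of the alternation.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: join rules[lo] .. rules[hi-1] into "e_lo|...|e_hi-1".
std::string merge_rules(const std::vector<std::string>& rules,
                        std::size_t lo, std::size_t hi) {
    std::string merged;
    for (std::size_t i = lo; i < hi; ++i) {
        if (!merged.empty()) merged += '|';
        merged += rules[i];
    }
    return merged;
}

// Hypothetical helper: narrow down the index of the rule that triggers the
// problem. Assumes `rules` is non-empty and that exactly one rule is at
// fault, so that `misbehaves` is true for a merged pattern if and only if
// that rule is part of it.
std::size_t bisect_rules(
    const std::vector<std::string>& rules,
    const std::function<bool(const std::string&)>& misbehaves) {
    std::size_t lo = 0;
    std::size_t hi = rules.size();
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (misbehaves(merge_rules(rules, lo, mid))) {
            hi = mid;  // the culprit is in the first half
        } else {
            lo = mid;  // otherwise it must be in the second half
        }
    }
    return lo;
}
```

With a hundred rules this needs about seven replacement runs instead of one per rule. If several rules only misbehave in combination, the single-culprit assumption breaks and the result is not meaningful.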

John Maddock wrote:
The way to find out which sub-expression matched is simply:
    match_results<something> what;
    ...
    for(unsigned i = 1; i < what.size(); ++i)
    {
       if(what[i].matched)
          std::cout << "sub-expression " << i << " matched " << what[i] << std::endl;
    }
First, thanks for the reply, it's nearly what I need ;)

Actually, either I misunderstood something, or this code gives me only the first part of the regex that matches?

e.g. if I try

<code>
std::string myString = "jayjay";
std::string myRegexSearch = "(ay)|(j)(?=[aeiouy])";
boost::regex* myRegexp = new boost::regex(myRegexSearch, boost::regex::normal);
boost::cmatch what;
boost::regex_search(myString, what, *myRegexp);
for(unsigned _ = 1; _ < what.size(); ++_)
{
   if(what[_].matched)
      printf("rule [%d] matched '%s'\n", _, what[_]);
}
delete myRegexp;
</code>

The output is

rule [2] matched 'jayjay'

where I expected it to tell me "rule 1" and "rule 2" matched!

Am I missing something?

Thanks again,
Line.

Line Oddskool wrote:
John Maddock wrote:
The way to find out which sub-expression matched is simply:
    match_results<something> what;
    ...
    for(unsigned i = 1; i < what.size(); ++i)
    {
       if(what[i].matched)
          std::cout << "sub-expression " << i << " matched " << what[i] << std::endl;
    }
First, thanks for the reply, it's nearly what I need ;)
Actually, either I misunderstood something, or this code gives me only the first part of the regex that matches?
e.g. if I try
<code>
std::string myString = "jayjay";
std::string myRegexSearch = "(ay)|(j)(?=[aeiouy])";
boost::regex* myRegexp = new boost::regex(myRegexSearch, boost::regex::normal);
boost::cmatch what;
boost::regex_search(myString, what, *myRegexp);
for(unsigned _ = 1; _ < what.size(); ++_)
{
   if(what[_].matched)
      printf("rule [%d] matched '%s'\n", _, what[_]);
}
delete myRegexp;
</code>
The output is
rule [2] matched 'jayjay'
where I expected it to tell me "rule 1" and "rule 2" matched!
Am I missing something?
Well, I'm amazed your computer didn't explode in a ball of flame when you passed a structure (boost::sub_match) by value to printf :-) Hint: match_results::operator[] does *not* return a null-terminated string.

But apart from that, if you want to search through all the occurrences of the expression in the text, then use regex_iterator:

    std::string mystring = whatever;
    boost::regex e(myregex);
    // the iterator takes the text range first, then the expression
    boost::sregex_iterator i(mystring.begin(), mystring.end(), e), j;
    while(i != j)
    {
       for(unsigned sub = 1; sub < i->size(); ++sub)
       {
          if((*i)[sub].matched)
             std::cout << (*i)[sub] << std::endl;
       }
       ++i; // advance to the next match
    }

HTH, John.
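For reference, here is a self-contained sketch of the regex_iterator approach applied to the "jayjay" example. To keep it compilable without Boost it uses std::regex, whose iterator and sub_match interface mirrors the Boost.Regex one used in this thread; that substitution is an assumption of the sketch, not what was posted. `matched_groups` is a hypothetical helper name. Instead of printing, it collects (group index, matched text) pairs, converting each sub_match with str() so that a real string is obtained.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// For every match of `pattern` in `text`, record which capture groups
// took part in it, as (group index, matched text) pairs in match order.
std::vector<std::pair<std::size_t, std::string>>
matched_groups(const std::string& text, const std::string& pattern) {
    std::vector<std::pair<std::size_t, std::string>> hits;
    const std::regex re(pattern);
    const std::sregex_iterator end;  // default-constructed: end of sequence
    for (std::sregex_iterator it(text.begin(), text.end(), re); it != end; ++it) {
        // Group 0 is the whole match, so the rule groups start at 1.
        for (std::size_t sub = 1; sub < it->size(); ++sub) {
            if ((*it)[sub].matched) {
                // str() copies the sub-match into a std::string; use
                // .str().c_str() if a printf-style "%s" argument is needed.
                hits.emplace_back(sub, (*it)[sub].str());
            }
        }
    }
    return hits;
}
```

Called as matched_groups("jayjay", "(ay)|(j)(?=[aeiouy])"), this should report group 2 ("j"), group 1 ("ay"), group 2 ("j"), group 1 ("ay"), i.e. both rules fire, one per match. Note that a group index equals a rule index only while every rule contains exactly one capturing group; otherwise a lookup table from group number to rule is needed.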
participants (2)
- John Maddock
- Line Oddskool