
Line Oddskool wrote:
Hi boost.regex gurus,
I'm stuck with a problem dealing with some kind of regex merging (using boost 1.33). I don't know if the way I took is viable, so any ideas and advice will be appreciated.
To give you some insight, I have a set of a hundred matching (ei) and formatting (ri) "rules", e.g.
e1 : (a)(?=ll) r1 : (?1o)
e1/r1 should mean "matching 'a' of some string like 'all' should be replaced by 'o'"
I merge all my e/r into one big regex using regex_merge (for performance), so the resulting matching/formatting regex is like:
e : e1|e2|...|en r: r1r2...rn
I'm getting weird behaviour with this, as the resulting string is sometimes filled with sequences like 'u4u5u6u7u8u9u' or other "trash".
So to debug this, I'd like to know which rule (i.e. which ei) matched on what part of the string.
I'm unsure if it's possible to get some kind of iterator on the rules that have matched using regex_merge ?
I also looked at the match_results returned by the simpler method regex_match(), but I can't figure out how to know which part of my matching regex matched (i.e. which ei) ?
Unless you really meant regex_match, regex_search would be the analogue of regex_replace (the new name for regex_merge). The way to find out which sub-expression matched is simply:

   match_results<something> what;
   ...
   for(unsigned i = 1; i < what.size(); ++i)
   {
      if(what[i].matched)
         std::cout << "sub-expression " << i << " matched " << what[i] << std::endl;
   }
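To make the loop above concrete, here is a minimal sketch that walks every match of a merged expression and records which capture group took part and where. It uses std::regex rather than Boost.Regex so that it compiles without extra dependencies; that substitution is an assumption of this sketch, although boost::match_results offers the same size(), operator[] and matched members. The names Hit and trace_matches, and the second rule in the usage note, are hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// One record per participating capture group (hypothetical helper type).
struct Hit {
    std::size_t group;     // sub-expression number within the merged regex
    std::size_t position;  // offset of the captured text in the input
    std::string text;      // the captured text itself
};

// Visit every match of the merged expression and collect the capture groups
// that actually matched. Groups are numbered by opening parenthesis across
// the whole merged pattern, not per rule.
std::vector<Hit> trace_matches(const std::string& input, const std::regex& merged) {
    std::vector<Hit> hits;
    for (std::sregex_iterator it(input.begin(), input.end(), merged), end; it != end; ++it) {
        const std::smatch& what = *it;
        for (std::size_t i = 1; i < what.size(); ++i) {
            if (what[i].matched) {
                const std::size_t offset =
                    static_cast<std::size_t>(std::distance(input.begin(), what[i].first));
                hits.push_back({i, offset, what[i].str()});
            }
        }
    }
    return hits;
}
```

For example, with the merged pattern "(a)(?=ll)|(o)(?=ld)" and the input "all old", trace_matches reports group 1 on "a" at offset 0 and group 2 on "o" at offset 4. Note that the second rule's group is numbered 2 in the merged expression, even though it would be group 1 if that rule were compiled on its own.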
Otherwise, is there a way to analyse or dump the matching/replacing behaviour of such a complex regex ?
'Fraid not, you would likely be swamped with so much data that it probably wouldn't be that useful in any case :-( You could also try a binary-search reduction on the problem: split the regex in two and find which half has the issue, then split again and so on...
HTH, John.
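The binary-search reduction could be sketched roughly as follows. This is only an illustration under strong assumptions: that a single rule is responsible for the bad output, and that you can supply a predicate (called is_bad here, a hypothetical name) which merges a subset of rules, runs the replacement, and reports whether the "trash" appears. The helper names merge_rules and find_bad_rule are likewise made up for this sketch.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Join rule patterns into one alternation: e1|e2|...|en.
std::string merge_rules(const std::vector<std::string>& rules) {
    std::string merged;
    for (std::size_t i = 0; i < rules.size(); ++i) {
        if (i != 0) merged += '|';
        merged += rules[i];
    }
    return merged;
}

// Bisect the rule list and return the index of the rule whose presence makes
// is_bad report a failure. Assumes the full set is bad and that exactly one
// rule is responsible.
std::size_t find_bad_rule(
    const std::vector<std::string>& rules,
    const std::function<bool(const std::vector<std::string>&)>& is_bad) {
    std::size_t lo = 0;
    std::size_t hi = rules.size();  // the culprit lies in [lo, hi)
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        // Test only the first half of the current range.
        const std::vector<std::string> first_half(rules.begin() + lo, rules.begin() + mid);
        if (is_bad(first_half)) {
            hi = mid;  // the problem is in the first half
        } else {
            lo = mid;  // otherwise it must be in the second half
        }
    }
    return lo;
}
```

In practice each half also needs its own format string built from the matching ri, with the sub-expression numbers adjusted to the smaller regex. If the problem only shows up when two rules interact, a plain bisection like this will not isolate it.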