Re: [boost] Re: [regex] regex_replace with format string replaced asitis

9 Jun 2005

      ...
well, since what[] is already an array of n subexpressions, during finding 
of matches, I suppose, you set all the fields of what[i], if the i-th 
subexpression matched.  While you're doing this, you could also store the 
index i into another structure of what.
I hope I explained this correctly...
Yes, but it's not that simple...

1) As well as adding matches to the list, matches can become "unmatched" 
during backtracking.
2) The time taken for a match is completely dominated by the time taken for 
calls to new and delete, especially for simple expressions a call to new is 
about 10x as long as the time to find a match, so if you reuse your 
match_results structures (so the matcher doesn't have to allocate any 
memory), then matching is extremely fast.  Using a dynamic memory structure 
(like std::set or std::map) to store just those sub-expressions that matched 
would completely blow this out of the water.  And yes, for some folks this 
is very important....

An alternative would be to extend the existing structure to hold the extra 
information, the trouble is, almost any way I can think of arranging this 
will have rather a negative speed impact (don't forget you have to remove as 
well as add entries).  Probably the fastest method is to just post-process 
the results by doing a linear scan through them, and that's something you 
can do right now!  You'd need hundreds of sub-expressions before this became 
unduly inefficient - scanning through a vector really is very fast.

There is one option I can think of that might work: add a linked list to the 
existing submatches, so that each links forward to the next matched 
subexpression and back to the previous matched subexpression.  However, this 
means that only bi-directional (not random access ) would be possible to 
"just the matched" sub-expressions.  It would also double the size of the 
sub_match structure, which itself has important implications: because these 
structures are pushed into the stack (on Windows at least), it basically 
halves the maximum complexity of expression you can match, more excessive 
memory usage also has a performance impact.  For this reason the capturing 
code (see repeated captures in libs/regex/doc/captures.html) isn't enabled 
by default: it also doubles the size of sub_match, with all the issues that 
raises.

Sorry if I'm being unduly pessimistic, but in all honesty, the most 
efficient method really is to just scan the match_results structure after 
the event, this has the advantage as well that you don't pay for what you 
don't use.

John.

Re: [boost] Re: [regex] regex_replace with format string replaced asitis

John Maddock