
well, since what[] is already an array of n subexpressions, during finding of matches, I suppose, you set all the fields of what[i], if the i-th subexpression matched. While you're doing this, you could also store the index i into another structure of what.
I hope I explained this correctly...
Yes, but it's not that simple... 1) As well as adding matches to the list, matches can become "unmatched" during backtracking. 2) The time taken for a match is completely dominated by the time taken for calls to new and delete, especially for simple expressions a call to new is about 10x as long as the time to find a match, so if you reuse your match_results structures (so the matcher doesn't have to allocate any memory), then matching is extremely fast. Using a dynamic memory structure (like std::set or std::map) to store just those sub-expressions that matched would completely blow this out of the water. And yes, for some folks this is very important.... An alternative would be to extend the existing structure to hold the extra information, the trouble is, almost any way I can think of arranging this will have rather a negative speed impact (don't forget you have to remove as well as add entries). Probably the fastest method is to just post-process the results by doing a linear scan through them, and that's something you can do right now! You'd need hundreds of sub-expressions before this became unduly inefficient - scanning through a vector really is very fast. There is one option I can think of that might work: add a linked list to the existing submatches, so that each links forward to the next matched subexpression and back to the previous matched subexpression. However, this means that only bi-directional (not random access ) would be possible to "just the matched" sub-expressions. It would also double the size of the sub_match structure, which itself has important implications: because these structures are pushed into the stack (on Windows at least), it basically halves the maximum complexity of expression you can match, more excessive memory usage also has a performance impact. For this reason the capturing code (see repeated captures in libs/regex/doc/captures.html) isn't enabled by default: it also doubles the size of sub_match, with all the issues that raises. Sorry if I'm being unduly pessimistic, but in all honesty, the most efficient method really is to just scan the match_results structure after the event, this has the advantage as well that you don't pay for what you don't use. John.