[RegEx] Does the match_extra flag follow the specification?

3 Nov 2006

The Boost documentation (http://www.boost.org/libs/regex/doc/match_flag_type.html) says:

“match_extra Instructs the matching engine to retain all available capture information; if a capturing group is repeated then information about every repeat is available via match_results::captures() or sub_match_captures().”

For me this was THE great feature: a way to link related pieces of captured information together.

But the behavior of this flag with the search algorithm was not what I expected.

Instead of giving information about every repeat, sub_match_captures() contains all the captures obtained for the corresponding sub-expression (as the documentation of sub_match's captures member says: http://www.boost.org/libs/regex/doc/sub_match.html).

A repeat of a capturing group is not the same thing as a capture, and because regex behaves this way I cannot link pieces of information that were captured in the same repeat.

For example (using named capture syntax, which Boost does not support today, to make the regular expression clearer):

^(?<time>[^ ]+)(?: (?<attr>[A-Za-z]+)=(?:"(?<qvalue>[^"]+)"|(?<svalue>[^ ]+)))+

which is intended to parse lines like this one:

12/05/2006_12:04:25 id=5 msg="this is a problem" user=paul

The captures I get for this example are:

time={'12/05/2006_12:04:25'}

attr={'id','msg','user'}

qvalue={'this is a problem'}

svalue={'5','paul'}

whereas I was expecting:

time={'12/05/2006_12:04:25'}

attr={'id','msg','user'}

qvalue={null,'this is a problem',null}

svalue={'5',null,'paul'}

The data I get is “useless” because the structure is lost: there is no way to link “paul” to “user”, or “msg” to “this is a problem”.
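To make the lost pairing concrete, here is a minimal sketch of one possible workaround. It does not use match_extra at all: it matches the timestamp once and then walks the attributes one match at a time, so each name stays next to its value. It assumes standard C++11 std::regex rather than Boost, and the names ParsedLine and parse_line are hypothetical, chosen only for this illustration. Unlike the single expression above, it does not require the attributes to be contiguous.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: the timestamp plus (name, value) pairs in input order.
struct ParsedLine {
    std::string time;
    std::vector<std::pair<std::string, std::string>> attrs;
};

// Parses lines such as:
//   12/05/2006_12:04:25 id=5 msg="this is a problem" user=paul
// Returns false if the line does not start with a timestamp token.
bool parse_line(const std::string& line, ParsedLine& out) {
    // Step 1: the leading timestamp (everything up to the first space).
    static const std::regex time_re(R"re(^([^ ]+))re");
    // Step 2: a single name=value pair.
    // Group 1 = name, group 2 = quoted value, group 3 = unquoted value.
    static const std::regex attr_re(R"re( ([A-Za-z]+)=(?:"([^"]+)"|([^ ]+)))re");

    std::smatch m;
    if (!std::regex_search(line, m, time_re)) {
        return false;
    }
    out.time = m[1].str();
    out.attrs.clear();

    // Each iteration is one "repeat", so name and value are never separated.
    std::sregex_iterator it(m[0].second, line.end(), attr_re);
    const std::sregex_iterator end;
    for (; it != end; ++it) {
        const std::smatch& a = *it;
        const std::string value = a[2].matched ? a[2].str() : a[3].str();
        out.attrs.emplace_back(a[1].str(), value);
    }
    return true;
}
```

Calling parse_line on the sample line fills attrs with the pairs (id, 5), (msg, this is a problem) and (user, paul), which is the association that the flat capture lists cannot express.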

Sorry for my English, which may be a source of misunderstanding, but it would be good if the documentation and the implementation agreed, or if the library behaved as I was expecting.

I understand that the behavior I was expecting has a limitation: it does not preserve the underlying structure when a repeated group is nested inside another repeated group.

There are several ways to avoid losing these relationships between the data (with varying degrees of relevance):

- Build a hierarchical tree of captures (a syntax tree).

- Provide an iterator over all captures that keeps track of their order of appearance.

- Allow named captures with duplicate group names.
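As a rough illustration of the second idea (captures kept in order of appearance), the sketch below assumes that each capture can be described by its group, its offset in the input, and its text. The Capture struct and link_by_position are hypothetical names invented for this example, not part of Boost.Regex. Since each Boost capture is a sub_match holding iterators into the input, such offsets could probably be computed from the existing capture lists, but that is an assumption I have not verified against match_extra.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: which group captured, where in the input, and what text.
struct Capture {
    std::string group;
    std::size_t offset;
    std::string text;
};

// Sorts all captures by position, then pairs each "attr" capture with the
// value capture ("qvalue" or "svalue") that immediately follows it.
std::vector<std::pair<std::string, std::string>>
link_by_position(std::vector<Capture> caps) {
    std::sort(caps.begin(), caps.end(),
              [](const Capture& a, const Capture& b) { return a.offset < b.offset; });

    std::vector<std::pair<std::string, std::string>> linked;
    for (std::size_t i = 0; i + 1 < caps.size(); ++i) {
        const Capture& next = caps[i + 1];
        if (caps[i].group == "attr" &&
            (next.group == "qvalue" || next.group == "svalue")) {
            linked.emplace_back(caps[i].text, next.text);
        }
    }
    return linked;
}
```

Fed with the flat per-group lists from the example (plus offsets), this restores the pairs id/5, msg/this is a problem and user/paul. It only handles one level of repetition; nested repeated groups would still need something like the tree from the first idea.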

So, is it the documentation that needs fixing, or is this a bug?

Alquier Luc

