Re: [boost] [Review] xpressive

16 Sep 2005

      From: Eric Niebler <eric@boost-consulting.com>
...
Daryle Walker wrote:
...
On 9/15/05 1:10 PM, "Eric Niebler" <eric_at_[hidden]> wrote:
...
In practice, the only reason why you might iterate over
 all sub-matches is to print them out. Otherwise, the sub-matches are accessed
 randomly, because (for example) the 1st sub-match is a date and the 3rd
 sub-match is an email address, and I'm not interested in the 2nd. See?
<snip>
...
It looks like the current setup is not STL-friendly. Most of the "what"
list is one type of thing, the in-order pieces of the regex parse. The
first item of the list doesn't match that pattern (since it's the whole
That's completely STL-friendly: there are iterators.  When using
STL-style algorithms, one must determine the applicable range.
For many uses, I'll grant that you'd want to skip the first
element, but that hardly constitutes being unfriendly to the STL.
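
To illustrate what "determine the applicable range" means here, a minimal sketch follows. It assumes std::regex (the standardized descendant of the TR1 regex mentioned in this thread) rather than xpressive itself, and the helper name captures_only is hypothetical. The point is only that skipping the whole-match element is a matter of starting one past begin():

```cpp
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect only the captured groups,
// skipping element 0 (the whole match).
std::vector<std::string> captures_only(const std::smatch& what)
{
    std::vector<std::string> out;
    if (what.empty())
        return out;  // the match failed, so there is nothing to collect

    // [begin() + 1, end()) is the range holding the sub-matches proper.
    for (auto it = std::next(what.begin()); it != what.end(); ++it)
        out.push_back(it->str());
    return out;
}
```

Any STL-style algorithm can be given the same pair of iterators, std::next(what.begin()) and what.end(), in place of the hand-written loop.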
...
...
parse). I'm guessing that this "old" way wasn't a problem because people
expected 1-based arrays, so the 0-index could be special. That doesn't work
in a 0-based array culture, like C++ (or C). C++ people would expect the
0-index element to match the general rule of the list. This mixing of
element types mixes concerns (violating "keep it simple, silly"). An
STL-friendly alternative would be to have separate member functions for the
whole-parse and the list-of-parse-pieces, then have a special function
(member or non-member) that generates a regex-culture combined list.
You could fatten the interface that way, but it really wouldn't
gain much and can certainly lead to confusion because of the
differing indices based upon which interface one uses.

Since each user of the library could choose a different
interface, maintenance would be more difficult due to requiring
knowledge of all of the interfaces and knowing which was employed
in a given case.
...
I agree, if we were only concerned about satisfying people familiar 
with C++ culture. But we are also trying to satisfy people familiar with 
regex culture. Every regex package out there I know of that supports 
back-references begins numbering captures at 1. I don't know why. But I 
do know that to break with that tradition now would cause massive 
confusion. Besides, I'm trying to minimize the differences between 
xpressive's interface and TR1 regex.
There are numerous examples of using the 0th element to be "the
whole thing" and then the parts being elements 1 through N.  For
example awk uses $0 for the entire line and $1 through $(NF) for
the fields matched by the field separator.  IIRC, JavaScript's RE
support provides the entire matched string in element 0 of the
result, with the captures in elements 1 through N.
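
The same convention can be sketched in C++. This again assumes std::regex rather than xpressive; the Parsed struct, the parse_line function, and the pattern are all hypothetical, chosen to mirror Eric's example of wanting the 1st and 3rd sub-matches and ignoring the 2nd:

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the pieces we care about.
struct Parsed {
    std::string whole;  // element 0: the entire match
    std::string date;   // element 1: first capture
    std::string email;  // element 3: third capture
};

// Hypothetical function: returns false if the line does not match.
bool parse_line(const std::string& line, Parsed& out)
{
    // Group 1 is a date, group 2 a name we do not need, group 3 an email.
    static const std::regex re(R"((\d{4}-\d{2}-\d{2}) (\w+) (\S+@\S+))");

    std::smatch what;
    if (!std::regex_search(line, what, re))
        return false;

    out.whole = what[0].str();  // "the whole thing"
    out.date  = what[1].str();  // captures are numbered from 1
    out.email = what[3].str();  // random access: group 2 is simply skipped
    return true;
}
```

Here what.size() is the number of capture groups plus one, which is the same shape as awk's $0 followed by $1 through $(NF).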
...
Is it a wart? OK, I agree. But frankly, I don't feel that this is an 
ugly enough wart for me to break with established practice.
I think it was a wise decision.

-- 
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;