Re: [boost] [Review] xpressive

19 Sep 2005


      Rob Stewart <stewart@sig.com> writes:
...
From: Daryle Walker <darylew@hotmail.com>
...
On 9/16/05 12:49 PM, "Rob Stewart" <stewart@sig.com> wrote:
...
From: Eric Niebler <eric@boost-consulting.com>
Martin Bonner had the same reaction as I did when I read that:
why not just pass ++pieces.begin()?
Actually boost::next(pieces.begin()) would be more general.
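A minimal sketch of that suggestion, for concreteness. It uses std::next, the C++11 standard counterpart of boost::next, so that it builds without Boost. The helper name captures_only and the vector of strings are hypothetical stand-ins for the match results under discussion, not xpressive's API. The function form is more general because ++pieces.begin() does not compile when the iterator type is a plain pointer: a built-in rvalue cannot be incremented.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: copy every element except the first ("whole match") one.
// `pieces` stands in for a match-results sequence; it is not xpressive's type.
std::vector<std::string> captures_only(const std::vector<std::string>& pieces)
{
    assert(!pieces.empty());  // element 0 must exist before it can be skipped
    // std::next returns an advanced copy of the iterator, so this also works
    // when the iterator type is a raw pointer.
    return std::vector<std::string>(std::next(pieces.begin()), pieces.end());
}
```

Passing a three-element sequence (whole match plus two captures) to captures_only yields just the two captures.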
...
Furthermore, as I said, one
must always determine the applicable range to pass to an
algorithm.  Sure it's nice to pass the whole range, but you can't
always do that.  The downside here is that you'd rarely want to do
that (maybe to copy the strings for some post processing or doing
I/O).
...
...
...
...
parse). I'm guessing that this "old" way wasn't a problem because people
expected 1-based arrays, so the 0-index could be special. That doesn't work
in a 0-based array culture, like C++ (or C). C++ people would expect the
0-index element to match the general rule of the list. This mixing of
element types mixes concerns (violating "keep it simple, silly"). An
STL-friendly alternative would be to have separate member functions for the
whole-parse and the list-of-parse-pieces, then have a special function
(member or non-member) that generates a regex-culture combined list.
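The alternative described above might look roughly like the following sketch. Every name in it (parse_result, whole, pieces, combined) is hypothetical and chosen only for illustration; none of it is taken from xpressive or from any proposed interface.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: the whole parse and the pieces are kept apart.
class parse_result
{
public:
    parse_result(std::string whole, std::vector<std::string> pieces)
        : whole_(std::move(whole)), pieces_(std::move(pieces)) {}

    // The entire matched text.
    const std::string& whole() const { return whole_; }
    // Only the parse pieces, 0-based like any other STL sequence.
    const std::vector<std::string>& pieces() const { return pieces_; }

private:
    std::string whole_;
    std::vector<std::string> pieces_;
};

// Non-member function that builds the regex-culture list:
// element 0 is the whole parse, elements 1..N are the pieces.
std::vector<std::string> combined(const parse_result& r)
{
    std::vector<std::string> out;
    out.reserve(r.pieces().size() + 1);
    out.push_back(r.whole());
    out.insert(out.end(), r.pieces().begin(), r.pieces().end());
    assert(out.size() == r.pieces().size() + 1);  // one extra slot for the whole parse
    return out;
}
```

Note that the same piece sits at index 0 in pieces() and at index 1 in combined(), which is exactly the index difference between the two interfaces.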
You could fatten the interface that way, but it really wouldn't
gain much and can certainly lead to confusion because of the
differing indices based upon which interface one uses.
I'm guessing that C++ people would use the single-step interface and not
bother with numeric indices, and Regex people would do the reverse.  And I
suspect that the C++ format is internally generated anyway and just hidden
before the whole-string piece is prepended to it.  The only "flaw" is that
numeric indices require random-access iteration, which also provides a single-step
interface, because random access is a superset of forward iteration.
What about "C++ people" that are also "Regex people?"  Which do
they use?  Note also what I said here:
...
...
Since each user of the library could choose a different
interface, maintenance would be more difficult due to requiring
knowledge of all of the interfaces and knowing which was employed
in a given case.
Restating what I said, I think most people would pick a C++ culture
interface at every step or a Regex culture interface at every step.
If a "C++ person" chose the C++ interface and a maintainer was a
"Regex person," confusion would ensue.
...
...
There are numerous examples of using the 0th element to be "the
whole thing" and then the parts being elements 1 through N.  For
example awk uses $0 for the entire line and $1 through $(NF) for
the fields matched by the field separator.  IIRC, JavaScript's RE
support provides the entire matched string in element 0 of the
result, with the captures in elements 1 through N.
I'm guessing that these may not be independent examples, but simple
borrowing of an interface.  In other words, doing it just to follow
precedent.  Maybe JavaScript's RE does it this way only because regular
"regex" does.  And maybe "awk" does it because "regex" does.  (Or since I
don't know too much about Unix history, the order could be reversed so
"regex" copied the idea from "awk" instead.  And then "awk" would have done
it to save resources, which were tight back then.)
You may be right, but is it wise to part with decades of
precedent?
Besides, I think Eric pointed out the biggest reason to keep the
1-based capture interface: the captures in the RE are 1-based, so
those accessed from C++ should be, too.
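To illustrate the correspondence, here is a small sketch using std::regex (C++11) as a stand-in; it is not xpressive, though the indexing convention shown is the same one under discussion. The helper name group is hypothetical. Inside the pattern, \1 refers to the first parenthesized group, and the same group is read back from the match results at index 1, while index 0 holds the whole match.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: search `text` for `pattern` and return sub-match `n`.
// Index 0 is the whole match; indices 1..N are the capture groups, numbered
// the same way as the \1..\N backreferences inside the pattern.
// Returns an empty string when the pattern does not match at all.
std::string group(const std::string& text, const std::string& pattern, std::size_t n)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return std::string();
    assert(n < m.size());  // m.size() is the number of groups plus one
    return m[n].str();
}
```

With the pattern (\w+) \1, which matches a repeated word, group(text, pattern, 1) returns the word that \1 referred to, and group(text, pattern, 0) returns the full repeated pair.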
-- 
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com
