
On 9/16/05 12:49 PM, "Rob Stewart" <stewart@sig.com> wrote:
From: Eric Niebler <eric@boost-consulting.com>
Daryle Walker wrote:
On 9/15/05 1:10 PM, "Eric Niebler" <eric_at_[hidden]> wrote:
In practice, the only reason why you might iterate over all sub-matches is to print them out. Otherwise, the sub-matches are accessed randomly, because (for example) the 1st sub-match is a date and the 3rd sub-match is an email address, and I'm not interested in the 2nd. See? <snip>
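To make the random-access case concrete, here is a minimal sketch using std::regex (the standardized descendant of TR1 regex). The pattern, the `DateAndEmail` struct, and the `extract_date_and_email` function are assumptions made up for this illustration; nothing here comes from xpressive itself.

```cpp
#include <regex>
#include <string>

// Hypothetical result type, for this example only.
struct DateAndEmail {
    bool matched;
    std::string date;
    std::string email;
};

// Pulls sub-matches 1 and 3 out of a line and ignores sub-match 2.
DateAndEmail extract_date_and_email(const std::string& line) {
    // Group 1: a date, group 2: a word we do not care about, group 3: an email address.
    static const std::regex pattern(R"((\d{4}-\d{2}-\d{2}) (\w+) (\S+@\S+))");
    std::smatch what;
    if (!std::regex_search(line, what, pattern)) {
        return {false, "", ""};
    }
    // what[0] would be the whole match; the captures start at index 1.
    return {true, what[1].str(), what[3].str()};
}
```

Called as `extract_date_and_email("2005-09-16 rob stewart@sig.com")`, the sketch reads the date from index 1 and the address from index 3 without ever iterating over the list.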
It looks like the current setup is not STL-friendly. Most of the "what" list is one type of thing, the in-order pieces of the regex parse. The first item of the list doesn't match that pattern (since it's the whole
That's completely STL-friendly: there are iterators. When using STL-style algorithms, one must determine the applicable range. For many uses, I'll grant that you'd want to skip the first element, but that hardly constitutes being unfriendly to the STL.
Making something like: std::copy( pieces.begin(), pieces.end(), destination ); completely useless isn't unfriendly? You'll have to use a (mutable) object to store "pieces.begin()" so you can increment it before the copy, or use a "+ 1" if the "what" list supports random iteration.
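For reference, the skip-the-first-element workaround might look like the sketch below. It uses std::regex rather than xpressive, and `captures_only` is a hypothetical helper name; `std::next` is a C++11 facility that was not available when this thread was written, so treat this as a present-day illustration of the point.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Copies only the captured pieces (indices 1..N) into a vector,
// leaving out the whole-match element at index 0.
std::vector<std::string> captures_only(const std::string& text, const std::regex& re) {
    std::vector<std::string> out;
    std::smatch pieces;
    if (std::regex_search(text, pieces, re)) {
        // std::next(pieces.begin()) steps past the whole match without
        // needing a named, mutable iterator variable.
        std::copy(std::next(pieces.begin()), pieces.end(), std::back_inserter(out));
    }
    return out;
}
```

The range adjustment is a single expression, but it does have to be remembered at every call site that wants only the captures.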
parse). I'm guessing that this "old" way wasn't a problem because people expected 1-based arrays, so the 0-index could be special. That doesn't work in a 0-based array culture, like C++ (or C). C++ people would expect the 0-index element to match the general rule of the list. This mixing of element types mixes concerns (violating "keep it simple, silly"). An STL-friendly alternative would be to have separate member functions for the whole-parse and the list-of-parse-pieces, then have a special function (member or non-member) that generates a regex-culture combined list.
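A rough sketch of what such a split interface could look like, layered on top of std::regex. Everything named here (`split_match`, `whole()`, `pieces()`, `combined()`, `search_split`) is hypothetical and invented for illustration; neither TR1 regex nor xpressive provides these functions.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper that separates the whole match from the captured pieces.
class split_match {
public:
    explicit split_match(const std::smatch& what) {
        if (what.empty()) return;              // the match failed: leave everything empty
        whole_ = what[0].str();                // regex-culture element 0
        for (std::size_t i = 1; i < what.size(); ++i)
            pieces_.push_back(what[i].str());  // regex-culture elements 1..N
    }

    // The text matched by the entire expression.
    const std::string& whole() const { return whole_; }

    // The captures only, indexed from 0 in the usual C++ way.
    const std::vector<std::string>& pieces() const { return pieces_; }

    // The regex-culture view: whole match first, then the captures.
    std::vector<std::string> combined() const {
        std::vector<std::string> all;
        all.push_back(whole_);
        all.insert(all.end(), pieces_.begin(), pieces_.end());
        return all;
    }

private:
    std::string whole_;
    std::vector<std::string> pieces_;
};

// Hypothetical convenience function: search and wrap the result in one step.
split_match search_split(const std::string& text, const std::regex& re) {
    std::smatch what;
    std::regex_search(text, what, re);
    return split_match(what);
}
```

With this shape, `std::copy(m.pieces().begin(), m.pieces().end(), destination)` needs no range adjustment, at the cost that `pieces()[0]` and `combined()[1]` name the same capture.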
You could fatten the interface that way, but it really wouldn't gain much and can certainly lead to confusion because of the differing indices based upon which interface one uses.
I'm guessing that C++ people would use the single-step interface and not bother with numeric indices, and Regex people would do the reverse. And I suspect that the C++ format is internally generated anyway and just hidden before the whole-string piece is prepended to it. The only "flaw" is that numeric indices require random-access iteration, which brings a single-step interface along with it, because random access is a superset of forward iteration.
Since each user of the library could choose a different interface, maintenance would be more difficult due to requiring knowledge of all of the interfaces and knowing which was employed in a given case.
Restating what I said, I think most people would pick a C++ culture interface at every step or a Regex culture interface at every step.
I agree, if we were only concerned about satisfying people familiar with C++ culture. But we are also trying to satisfy people familiar with regex culture. Every regex package out there I know of that supports back-references begins numbering captures at 1. I don't know why. But I do know that to break with that tradition now would cause massive confusion. Besides, I'm trying to minimize the differences between xpressive's interface and TR1 regex.
There are numerous examples of using the 0th element to be "the whole thing" and then the parts being elements 1 through N. For example awk uses $0 for the entire line and $1 through $(NF) for the fields matched by the field separator. IIRC, JavaScript's RE support provides the entire matched string in element 0 of the result, with the captures in elements 1 through N.
I'm guessing that these may not be independent examples, but simple borrowing of an interface. In other words, doing it just to follow precedent. Maybe JavaScript's RE does it this way only because regular "regex" does. And maybe "awk" does it because "regex" does. (Or, since I don't know too much about Unix history, the order could be reversed, so "regex" copied the idea from "awk" instead. And then "awk" would have done it to save resources, which were tight back then.)
Is it a wart? OK, I agree. But frankly, I don't feel that this is an ugly enough wart for me to break with established practice.
I think it was a wise decision.
-- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com