Re: [boost] Re: new version of xpressive available

18 May 2005

      From: "Eric Niebler" <eric@boost-consulting.com>
...
Rob Stewart wrote:
...
Why does the regex_token_iterator<> ctor use a magic number like
-1 to indicate behavior rather than a named value?  (I just
clicked through to the reference and see that it takes a
regex_constants::match_flag_type, but
http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/examp...
shows passing -1 -- with an explanatory comment -- instead.  This
leads to confusion.)
Again, I'm just following the standard here, but providing a named 
constant would be a nice addition. The -1 is an optional 4th parameter, 
and the match_flag_type is an optional 5th parameter -- so there should 
be no confusion.
Apparently, I can't count.  I was matching the -1 with the
match_flag_type parameter.  Whatever the type, it ought to use
named values.  Perhaps there's time to improve the proposed
interface, too?
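
To make the suggestion concrete, here is a minimal sketch using std::sregex_token_iterator, which follows the same proposal and takes the submatch index as its optional 4th parameter. The constant fields_between_matches and the helper split() are hypothetical names of my own; nothing like them exists in the standard or in xpressive.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical named constant: neither the standard library nor
// xpressive defines it. -1 asks the iterator for the text *between*
// matches rather than for the matches themselves.
constexpr int fields_between_matches = -1;

// Split `text` on every match of `delimiter` and return the pieces.
std::vector<std::string> split(const std::string& text,
                               const std::regex& delimiter)
{
    // The 4th constructor argument selects which submatch to yield;
    // the optional 5th (omitted here) would be the match_flag_type.
    std::sregex_token_iterator first(text.begin(), text.end(), delimiter,
                                     fields_between_matches);
    std::sregex_token_iterator last;  // default-constructed end iterator
    return std::vector<std::string>(first, last);
}
```

With a name in place of the bare -1, the call site documents itself and the explanatory comment in the example becomes unnecessary.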
...
...
The following items are from the "Perl syntax vs. Static
xpressive syntax" table in
http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/creat...:
You seem to suggest that the xpressive equivalent of Perl's
   "a|b" must be spelled "a | b" but as far as I can see, the
   whitespace is irrelevant, so calling attention to it suggests
   a difference that doesn't exist.
Naturally whitespace is irrelevant. That's how C++ works. I don't think 
this should be a source of confusion for people.
Of course.  I was just pointing out that the Perl syntax was
shown without whitespace (necessary) and the C++ with (not
necessary).  Many people writing C++ can be confused over matters like
this.  Of course, if you show the xpressive version as "a|b" such
people won't think they can write "a | b."  Doing so, however,
does avoid a gratuitous difference, don't you think?

Maybe add a note clarifying that while spaces are significant in a
Perl or, for that matter, a dynamic xpressive RE, they aren't
significant in a static xpressive RE other than in literals.
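
As a rough illustration of the dynamic half of that note, the sketch below uses std::regex rather than xpressive (an assumption made only to keep the example standard C++), and matches() is a hypothetical helper. In a pattern string, the spaces become part of the alternatives.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` matches the dynamic (string) pattern.
bool matches(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

// In a pattern string the whitespace is part of the expression:
//   "a|b"   has the alternatives "a" and "b"
//   "a | b" has the alternatives "a " and " b"
// so matches("a | b", "a") is false while matches("a|b", "a") is true.
```

In a static xpressive expression the | is an ordinary C++ operator, so the spacing around it is a matter of formatting only.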
...
...
"bos" and "eos" are a little odd.  First, it seems like
   "sequence" should be "input."  Second, I usually think of
   SOF/EOF and SOL/EOL pairs rather than BOF/EOF and BOL/EOL.
   Thus, I'd have gone with "soi" and "eoi" at the least.
   Unfortunately, in an effort to keep them short, they aren't
   terribly mnemonic.  How about "start" and "end" (or "beg" and
   "end" if you want to go with just three letters)?
The regex std proposal has match flags match_not_bol and match_not_eol, 
so I'm reusing this terminology. Boost.Regex also has match_not_bob for 
"beginning of buffer". This is not proposed for standardization, and I 
don't think the term "buffer" is appropriate anyway. You like "input" 
but I prefer "sequence". I dislike "input" because it might suggest to 
people that input iterators are acceptable to the regex algorithms, 
whereas a bidirectional sequence is what is required.
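
For readers who have not met the match flags mentioned above, here is a small sketch of match_not_bol as it ended up in std::regex_constants. The helper caret_matches_at_start() is a hypothetical name, and std::regex stands in for xpressive here.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Search `text` for "^a". With match_not_bol the first character is
// treated as if it were not at the beginning of a line, so ^ cannot
// match at the start of the sequence.
bool caret_matches_at_start(
    const std::string& text,
    std::regex_constants::match_flag_type flags =
        std::regex_constants::match_default)
{
    static const std::regex re("^a");
    return std::regex_search(text, re, flags);
}
```

Calling caret_matches_at_start("abc") succeeds, while passing std::regex_constants::match_not_bol as the second argument makes the same search fail.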
What about "beg" and "end?"  I realize they aren't reusing the
proposed terminology, but they avoid the "sequence/buffer/input"
issue.
...
...
Considering how much you compare xpressive to Perl's REs, I'm
   surprised you opted for ~_d instead of _D, for example.  I'm
   not saying that would be better, but the disconnect from Perl
   didn't seem necessary in this case.
It is necessary. _D is an illegal identifier, reserved to the 
implementation. All identifiers that begin with an underscore and a 
capital letter are illegal in user code. Even if that were not the case, 
ALL CAPS is reserved for macros by convention. That's how I ended up 
with ~_d.
Doh!  Where was my mind?  Of course that's not a legal
identifier.  Clearly I was doing too many things at once at that
time.

(I'd hardly consider that all caps thus implying a macro,
however.)

-- 
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;