Re: [boost] Re: new version of xpressive available

19 May 2005

      ...
...
This one applies to Boost.RegEx, too, but I'll ask you: Why have
both regex_match() and regex_search() when the latter can behave
like the former by adding two anchors?
This is true. I'm following the lead of the regex std proposal here, but 
I've never felt comfortable with regex_match, to be honest. A common 
newbie mistake is to use regex_match instead of regex_search. Perl, for 
instance, doesn't distinguish between "search" and "match" operations, and 
"search" is the default. What makes it worse is that in Perl circles, the 
semantic equivalent of regex_search is called /matching/, hence the 
disconnect. Not sure what to do. Perhaps John could comment.
If I remember correctly the original terminology was inherited from the GNU 
regex package, and later got refined as a result of user feedback.  But 
Eric's correct: it is a major source of confusion *for those migrating from 
Perl*.

The aim was that the code should be quite explicit about what it's doing: a 
programmer that sees regex_match would know that the code is looking to 
match all of the text and not just some part of it.
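
To make the distinction concrete, here is a minimal sketch. It assumes the std::regex interface that grew out of the std proposal (Boost.Regex and xpressive use the same function names), and the helper names are made up for illustration only.

```cpp
#include <regex>
#include <string>

// True only if the whole of `text` matches `pattern`.
bool matches_whole(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// True if some part of `text` matches `pattern` (the Perl-style default).
bool matches_somewhere(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// regex_search made to behave like regex_match by adding two anchors.
// The non-capturing group keeps any alternation in `pattern` inside the anchors.
bool matches_whole_via_search(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex("^(?:" + pattern + ")$"));
}
```

For single-line text, matches_whole and matches_whole_via_search should agree. The explicit name simply states the intent at the call site.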
...
...
Why does the regex_token_iterator<> ctor use a magic number like
-1 to indicate behavior rather than a named value?  (I just
clicked through to the reference and see that it takes a
regex_constants::match_flag_type, but
http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/examp...
shows passing -1 -- with an explanatory comment -- instead.  This
leads to confusion.)
Again, I'm just following the standard here, but providing a named 
constant would be a nice addition. The -1 is an optional 4th parameter, 
and the match_flag_type is an optional 5th parameter -- so there should be 
no confusion.
The -1 means "the thing before 0" and 0 is the whole of what matched, so -1 
is the string before the bit that matched.  Well that's the logic anyway. 
Doesn't seem to have caused any confusion in practice, but there's no harm 
in adding a named constant.
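
As a rough sketch of what a named constant could look like, the following uses std::sregex_token_iterator, which follows the same convention. The name text_between_matches and the tokens helper are hypothetical. They are not part of xpressive, Boost.Regex, or the standard.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical named constant standing in for the magic number -1.
constexpr int text_between_matches = -1;
// 0 selects the whole of each match, as described above.
constexpr int whole_match = 0;

// Collects the tokens selected by `submatch` for every match of `pattern`.
std::vector<std::string> tokens(const std::string& text,
                                const std::string& pattern,
                                int submatch) {
    const std::regex re(pattern);  // must outlive the iterators below
    std::sregex_token_iterator first(text.begin(), text.end(), re, submatch);
    std::sregex_token_iterator last;  // default-constructed end iterator
    return std::vector<std::string>(first, last);
}
```

Called with text_between_matches, the helper splits the text on the pattern. Called with whole_match, it returns the matched pieces themselves.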
...
The regex std proposal has match flags match_not_bol and match_not_eol, so 
I'm reusing this terminology. Boost.Regex also has match_not_bob for 
"beginning of buffer". This is not proposed for standardization, and I 
don't think the term "buffer" is appropriate anyway. You like "input" but 
I prefer "sequence". I dislike "input" because it might suggest to people 
that input iterators are acceptable to the regex algorithms, whereas a 
bidirectional sequence is what is required.
Historically, those terms (or very similar) are used by GNU regex and the 
BSD (Henry Spencer) packages.  Renaming them would probably start a 
bicycle-shed style discussion I guess.  Good names are hard, especially if 
the answer isn't immediately obvious!
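
For readers unfamiliar with the flags being named here, this is a small sketch of what match_not_bol does. It assumes the std::regex spelling of the flag, and the helper name and its parameters are invented for the example.

```cpp
#include <regex>
#include <string>

// Looks for digits at the start of a line within `chunk`.
// When `chunk` is a piece cut from the middle of a longer sequence, its first
// character is not a real line start, so match_not_bol tells the engine that
// '^' must not match there.
bool finds_leading_digits(const std::string& chunk, bool chunk_starts_line) {
    static const std::regex re("^\\d+");
    const std::regex_constants::match_flag_type flags =
        chunk_starts_line ? std::regex_constants::match_default
                          : std::regex_constants::match_not_bol;
    return std::regex_search(chunk, re, flags);
}
```

The same text gives different answers depending on the flag, which is why the name of the flag matters for readability.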

HTH,

John.

John Maddock