
This one applies to Boost.Regex, too, but I'll ask you: Why have both regex_match() and regex_search() when the latter can behave like the former by adding two anchors?
This is true. I'm following the lead of the regex std proposal here, but I've never felt comfortable with regex_match, to be honest. A common newbie mistake is to use regex_match where regex_search was intended. Perl, for instance, doesn't distinguish between "search" and "match" operations, and "search" is the default. What makes it worse is that in Perl circles, the semantic equivalent of regex_search is called /matching/, hence the disconnect. I'm not sure what to do. Perhaps John could comment.
If I remember correctly, the original terminology was inherited from the GNU regex package, and later got refined as a result of user feedback. But Eric's correct: it is a major source of confusion *for those migrating from Perl*. The aim was that the code should be quite explicit about what it's doing: a programmer who sees regex_match would know that the code is looking to match all of the text and not just some part of it.
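To make the distinction concrete, here is a minimal sketch using the standardized std::regex equivalents rather than xpressive itself. The helper names (matches_whole, matches_anywhere, matches_whole_via_search) are made up for illustration, and the anchored variant assumes single-line input, since how ^ and $ treat embedded newlines can vary between implementations.

```cpp
#include <regex>
#include <string>

// True only if the whole of `text` matches `pattern` (regex_match semantics).
bool matches_whole(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// True if some part of `text` matches `pattern` (regex_search semantics).
bool matches_anywhere(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// regex_search made to behave like regex_match by adding two anchors.
// The non-capturing group keeps alternations in `pattern` inside the anchors.
bool matches_whole_via_search(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex("^(?:" + pattern + ")$"));
}
```

With the pattern "[0-9]+" and the text "abc123", matches_anywhere succeeds while both whole-text variants fail, which is exactly the surprise someone coming from Perl tends to hit when reaching for regex_match.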
Why does the regex_token_iterator<> ctor use a magic number like -1 to indicate behavior rather than a named value? (I just clicked through to the reference and see that it takes a regex_constants::match_flag_type, but http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/examp... shows passing -1 -- with an explanatory comment -- instead. This leads to confusion.)
Again, I'm just following the standard here, but providing a named constant would be a nice addition. The -1 is an optional 4th parameter, and the match_flag_type is an optional 5th parameter -- so there should be no confusion.
The -1 means "the thing before 0" and 0 is the whole of what matched, so -1 is the string before the bit that matched. Well that's the logic anyway. Doesn't seem to have caused any confusion in practice, but there's no harm in adding a named constant.
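As a rough illustration of what a named constant could look like, here is a sketch against std::sregex_token_iterator, which takes the same optional submatch index as its 4th parameter. The constant name prefix_before_match and the helper split_on are hypothetical; neither exists in the standard library, Boost.Regex, or xpressive.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical named stand-in for the magic number: submatch -1 selects
// the text *before* each match, i.e. the pieces between delimiters.
constexpr int prefix_before_match = -1;

// Split `text` on every match of `delimiter_pattern`.
std::vector<std::string> split_on(const std::string& text,
                                  const std::string& delimiter_pattern) {
    const std::regex delimiter(delimiter_pattern);
    std::vector<std::string> pieces;
    std::sregex_token_iterator it(text.begin(), text.end(), delimiter,
                                  prefix_before_match);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        pieces.push_back(it->str());  // each token is a sub_match
    }
    return pieces;
}
```

Calling split_on("a, b,c", ",\\s*") yields the three pieces "a", "b" and "c"; the call site now says what the -1 is for without needing an explanatory comment.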
The regex std proposal has match flags match_not_bol and match_not_eol, so I'm reusing this terminology. Boost.Regex also has match_not_bob for "beginning of buffer". This is not proposed for standardization, and I don't think the term "buffer" is appropriate anyway. You like "input" but I prefer "sequence". I dislike "input" because it might suggest to people that input iterators are acceptable to the regex algorithms, whereas a bidirectional sequence is what is required.
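For readers who haven't met these flags, a small sketch of how they behave in the standardized std::regex form may help. The helper names are invented for illustration, and the last function only shows that a non-contiguous bidirectional sequence such as std::list<char> is accepted; it is not a recommendation to store text that way.

```cpp
#include <list>
#include <regex>
#include <string>

// "^a" matches at the start of the sequence unless match_not_bol says the
// first character must not be treated as the beginning of a line.
bool starts_with_a(const std::string& text, bool start_is_bol) {
    static const std::regex re("^a");
    const auto flags = start_is_bol ? std::regex_constants::match_default
                                    : std::regex_constants::match_not_bol;
    return std::regex_search(text, re, flags);
}

// "c$" matches at the end of the sequence unless match_not_eol says the
// last character must not be treated as the end of a line.
bool ends_with_c(const std::string& text, bool end_is_eol) {
    static const std::regex re("c$");
    const auto flags = end_is_eol ? std::regex_constants::match_default
                                  : std::regex_constants::match_not_eol;
    return std::regex_search(text, re, flags);
}

// The algorithms only need bidirectional iterators, so a std::list<char>
// works as the searched sequence.
bool contains_digit(const std::list<char>& chars) {
    static const std::regex re("[0-9]");
    return std::regex_search(chars.begin(), chars.end(), re);
}
```

These flags matter mostly when a caller searches a window inside a larger text and does not want the window's edges mistaken for line boundaries.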
Historically, those terms (or very similar ones) are used by GNU regex and the BSD (Henry Spencer) packages. Renaming them would probably start a bicycle-shed style discussion, I guess. Good names are hard, especially if the answer isn't immediately obvious! HTH, John.