
From: "Eric Niebler" <eric@boost-consulting.com>
Rob Stewart wrote:
Why does the regex_token_iterator<> ctor use a magic number like -1 to indicate behavior rather than a named value? (I just clicked through to the reference and see that it takes a regex_constants::match_flag_type, but http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/examp... shows passing -1 -- with an explanatory comment -- instead. This leads to confusion.)
Again, I'm just following the standard here, but providing a named constant would be a nice addition. The -1 is an optional 4th parameter, and the match_flag_type is an optional 5th parameter -- so there should be no confusion.
Apparently, I can't count. I was matching the -1 with the match_flag_type parameter. Whatever the type, it ought to use named values. Perhaps there's time to improve the proposed interface, too?
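For readers unfamiliar with the -1 convention: the submatch index passed to a token iterator selects which part of each match to yield, and -1 selects the text *between* matches, which turns the iterator into a splitter. The convention carried over into C++11's std::regex, which grew out of the same standardization effort, so the sketch below uses std::sregex_token_iterator rather than xpressive. The name `split_on_separator` is hypothetical. It illustrates the kind of named constant being asked for here and is not defined by the proposal, std::regex, or xpressive.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical named constant: neither the proposal nor std::regex defines
// this name. It only gives the "magic" -1 a readable spelling.
constexpr int split_on_separator = -1;

// Returns the pieces of `text` that lie between matches of `sep`.
// Passing -1 as the submatch index asks the token iterator for the
// unmatched gaps (plus any trailing remainder) instead of the matches.
std::vector<std::string> split(const std::string& text, const std::regex& sep) {
    std::sregex_token_iterator first(text.begin(), text.end(), sep,
                                     split_on_separator);
    std::sregex_token_iterator last;  // default-constructed end iterator
    return std::vector<std::string>(first, last);
}
```

For example, splitting "a, b,c" on the pattern `,\s*` yields "a", "b" and "c". A named constant changes nothing about the behavior; it only documents the intent at the call site.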
The following items are from the "Perl syntax vs. Static xpressive syntax" table in http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/creat...:
You seem to suggest that the xpressive equivalent of Perl's "a|b" must be spelled "a | b" but as far as I can see, the whitespace is irrelevant, so calling attention to it suggests a difference that doesn't exist.
Naturally whitespace is irrelevant. That's how C++ works. I don't think this should be a source of confusion for people.
Of course. I was just pointing out that the Perl syntax was shown without whitespace (necessary) and the C++ with it (not necessary). Many people writing C++ can be confused by matters like this. Admittedly, if you show the xpressive version as "a|b", such people won't realize they can write "a | b". Doing so, however, does avoid a gratuitous difference, don't you think? Perhaps add a note clarifying that while spaces are significant in a Perl RE or, for that matter, a dynamic xpressive RE, they aren't significant in a static xpressive RE except within literals.
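To make the dynamic-versus-static distinction concrete: in a string-based pattern, spaces are ordinary characters, so "a | b" is a different regex from "a|b". The sketch below uses std::regex (ECMAScript grammar) as a stand-in for a dynamic RE; the helper name `matches_dynamic` is hypothetical. In a static xpressive expression, by contrast, the spaces fall between C++ tokens and the compiler ignores them.

```cpp
#include <regex>
#include <string>

// Returns true if the whole of `input` matches the string pattern.
// In a dynamic pattern, whitespace is literal: "a | b" means
// "a followed by a space" OR "a space followed by b".
bool matches_dynamic(const std::string& pattern, const std::string& input) {
    return std::regex_match(input, std::regex(pattern));
}
```

So `matches_dynamic("a|b", "a")` is true, while `matches_dynamic("a | b", "a")` is false, because neither alternative of the spaced pattern is the single character "a".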
"bos" and "eos" are a little odd. First, it seems like "sequence" should be "input." Second, I usually think of SOF/EOF and SOL/EOL pairs rather than BOF/EOF and BOL/EOL. Thus, I'd have gone with "soi" and "eoi" at the least. Unfortunately, because they are kept so short, none of these names is terribly mnemonic. How about "start" and "end" (or "beg" and "end" if you want to go with just three letters)?
The regex std proposal has match flags match_not_bol and match_not_eol, so I'm reusing this terminology. Boost.Regex also has match_not_bob for "beginning of buffer". This is not proposed for standardization, and I don't think the term "buffer" is appropriate anyway. You like "input" but I prefer "sequence". I dislike "input" because it might suggest to people that input iterators are acceptable to the regex algorithms, whereas a bidirectional sequence is what is required.
What about "beg" and "end"? I realize they aren't reusing the proposed terminology, but they avoid the "sequence/buffer/input" issue.
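Both points above can be seen in C++11's std::regex, which adopted the proposal's flag names in std::regex_constants. The sketch below shows that match_not_bol stops `^` from matching at the start of the range, that match_not_eol does the same for `$` at the end, and that the algorithms accept any bidirectional range, such as a std::list<char>, rather than a single-pass input range. The helper names are hypothetical and exist only for illustration; nothing here is xpressive's API.

```cpp
#include <list>
#include <regex>
#include <string>

// With match_not_bol, the first character of the range is not treated as
// the beginning of a line, so '^' cannot match there.
bool starts_with_a(const std::string& s, bool not_bol) {
    auto flags = not_bol ? std::regex_constants::match_not_bol
                         : std::regex_constants::match_default;
    return std::regex_search(s, std::regex("^a"), flags);
}

// With match_not_eol, the end of the range is not treated as the end of a
// line, so '$' cannot match there.
bool ends_with_c(const std::string& s, bool not_eol) {
    auto flags = not_eol ? std::regex_constants::match_not_eol
                         : std::regex_constants::match_default;
    return std::regex_search(s, std::regex("c$"), flags);
}

// The algorithms take a pair of bidirectional iterators, so a linked list
// of characters is an acceptable "sequence" even though it is not a string.
bool search_list(const std::list<char>& chars, const std::regex& re) {
    return std::regex_search(chars.begin(), chars.end(), re);
}
```

This is the requirement the "sequence" naming is meant to convey: the matcher may need to step backward, which an input iterator cannot do.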
Considering how much you compare xpressive to Perl's REs, I'm surprised you opted for ~_d instead of _D, for example. I'm not saying that would be better, but the disconnect from Perl didn't seem necessary in this case.
It is necessary. _D is an illegal identifier, reserved to the implementation. All identifiers that begin with an underscore and a capital letter are illegal in user code. Even if that were not the case, ALL CAPS is reserved for macros by convention. That's how I ended up with ~_d.
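A small sketch of why the ~_d spelling works within the language rules: an identifier that starts with an underscore followed by a lowercase letter is only reserved in the global namespace, so it is usable inside a named namespace, and an overloaded operator~ can supply the negation that Perl spells \D. Everything below (the namespace `toy`, `char_class`, `is_digit`) is hypothetical and is not how xpressive implements its character classes.

```cpp
#include <cctype>

namespace toy {  // hypothetical namespace, not boost::xpressive

// A minimal character-class object: a predicate plus a "negated" bit.
struct char_class {
    bool (*pred)(unsigned char);
    bool negated;
    bool matches(char c) const {
        return pred(static_cast<unsigned char>(c)) != negated;
    }
};

inline bool is_digit(unsigned char c) { return std::isdigit(c) != 0; }

// `_d` (underscore + lowercase) is allowed inside a named namespace.
// `_D` (underscore + uppercase) is reserved in every scope.
const char_class _d = {&is_digit, false};

// operator~ flips the class, giving a legal spelling for Perl's \D.
inline char_class operator~(char_class c) { return {c.pred, !c.negated}; }

}  // namespace toy
```

With this, `toy::_d` matches '7' but not 'x', and `~toy::_d` does the reverse; operator~ is found by argument-dependent lookup because char_class lives in namespace toy.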
Doh! Where was my mind? Of course that's not a legal identifier. Clearly I was doing too many things at once at that time. (I'd hardly consider "_D" to be all caps, thus implying a macro, however.)

-- 
Rob Stewart stewart@sig.com
Software Engineer http://www.sig.com
Susquehanna International Group, LLP using std::disclaimer;