xpressive not in boost.1.33, but what about 1.34?

I suppose xpressive has not been accepted into boost yet (or I missed library review). This would mean that we will not see it in boost 1.33. Is there any chance we would see it in boost 1.34? B. -- Bronek Kozicki brok@rubikon.pl http://b.kozicki.pl/

Bronek Kozicki wrote:
I suppose xpressive has not been accepted into boost yet (or I missed library review). This would mean that we will not see it in boost 1.33. Is there any chance we would see it in boost 1.34?
You haven't missed anything. xpressive is still under active development. You can see a list of ToDo's for v1.0 at http://tinyurl.com/3wlgn. What I do with xpressive once I reach v1.0 is an open question. Incidentally, I've just completed a major rewrite of xpressive's innards. The meta-programming stuff is now based on MPL and Fusion and is very flexible, clean and idiomatic, but there shouldn't be any change from a user's perspective. Look for a new release Real Soon Now. -- Eric Niebler Boost Consulting www.boost-consulting.com

Eric Niebler wrote:
I've just completed a major rewrite of xpressive's innards. The meta-programming stuff is now based on MPL and Fusion and is very flexible, clean and idiomatic, but there shouldn't be any change from a user's perspective. Look for a new release Real Soon Now.
The new release of xpressive is available now. It represents a complete rewrite of all of xpressive's meta-programming. This is an important milestone on the way to 1.0. I encourage anybody who has been using xpressive to grab the latest version and let me know of any problems. Download xpressive.zip at: http://boost-sandbox.sf.net/vault/index.php?directory=eric_niebler Docs at: http://boost-sandbox.sf.net/libs/xpressive -- Eric Niebler Boost Consulting www.boost-consulting.com

On Wed, Apr 27, 2005 at 04:51:58PM -0700, Eric Niebler <eric@boost-consulting.com> wrote:
The new release of xpressive is available now. It represents a complete rewrite of all of xpressive's meta-programming. This is an important milestone on the way to 1.0. I encourage anybody who has been using xpressive to grab the latest version and let me know of any problems.
Looks nice! Could it be possible to write up a small template framework to create scanners for spirit with xpressive? I would love a syntax like:

    template<typename IteratorT>
    struct my_scanner<IteratorT> : public boost::xpressive<my_scanner<IteratorT> >
    {
        struct result_type
            : public boost::xpressive_variant<string, int, pair<IteratorT,IteratorT> >
        {};

        enum token_type { number, literal };

        struct scanner_definition
        {
            scanner_definition()
            {
                // pseudocode:
                matcher<string,literal> = as_xpr('"') >> _ >> '"';
                matcher<int,number> = range('0', '9');
                // ...
            }
        };
    };

Another thing: I always thought one should not use identifier names starting with an underscore like: "_b"? Andreas Pokorny

Andreas Pokorny wrote:
On Wed, Apr 27, 2005 at 04:51:58PM -0700, Eric Niebler <eric@boost-consulting.com> wrote:
The new release of xpressive is available now. It represents a complete rewrite of all of xpressive's meta-programming. This is an important milestone on the way to 1.0. I encourage anybody who has been using xpressive to grab the latest version and let me know of any problems.
Looks nice! Could it be possible to write up a small template framework to create scanners for spirit with xpressive?
About a year ago, Hartmut Kaiser wrote an xpressive_p() parser for Spirit which wrapped an xpressive regex. It worked a bit like the current regex_p parser. Would that meet your needs? I don't know what became of it. Perhaps Hartmut knows.
Another thing: I always thought one should not use identifier names starting with an underscore like: "_b"?
IIRC, you're not allowed to use names like _B (one underscore followed by a capital) or names like __b (two underscores). I think names like _b are OK. If they're not, we're in trouble because Boost.Bind, Boost.Lambda and Boost.MPL use _1 as a placeholder. -- Eric Niebler Boost Consulting www.boost-consulting.com

Eric Niebler wrote:
Another thing: I always thought one should not use identifier names starting with an underscore like: "_b"?
IIRC, you're not allowed to use names like _B (one underscore followed by a capital) or names like __b (two underscores). I think names like _b are OK. If they're not, we're in trouble because Boost.Bind, Boost.Lambda and Boost.MPL use _1 as a placeholder.
I think they're okay as long as they're not global or in namespace std. Jonathan

Eric Niebler wrote:
IIRC, you're not allowed to use names like _B (one underscore followed by a capital) or names like __b (two underscores). I think names like _b are OK. If they're not, we're in trouble because Boost.Bind, Boost.Lambda and Boost.MPL use _1 as a placeholder.
_b is only reserved in the global namespace.
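
The reservation rules mentioned in this sub-thread can be summarised in a short sketch. The namespace `demo` and the type `placeholder` are hypothetical names made up for illustration; they are not taken from xpressive or any other Boost library.

```cpp
#include <cassert>

// Hypothetical namespace and type, for illustration only.
namespace demo {
    struct placeholder { int index; };

    // Fine: underscore followed by a lowercase letter or digit,
    // declared inside a namespace rather than at global scope.
    const placeholder _b = {0};
    const placeholder _1 = {1};

    // Reserved to the implementation everywhere, so not usable here:
    //   const placeholder _B  = {2};  // underscore + capital letter
    //   const placeholder __b = {3};  // double underscore
}

// Reserved at global scope: any name beginning with an underscore.
//   const demo::placeholder _b = {4};

// Small self-check showing that the namespace-scoped names are usable.
inline void check_placeholders() {
    assert(demo::_b.index == 0);
    assert(demo::_1.index == 1);
}
```

This is why placeholders such as _1 are declared inside a namespace (possibly an unnamed one) instead of directly in the global namespace.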

Hartmut Kaiser wrote:
About a year ago, Hartmut Kaiser wrote an xpressive_p() parser for Spirit which wrapped an xpressive regex. It worked a bit like the current regex_p parser. Would that meet your needs? I don't know what became of it. Perhaps Hartmut knows.
These are still in the Spirit CVS under boost/spirit/utility and boost/spirit/utility/impl. I'm attaching the two relevant files for your convenience (but I didn't check whether these compile today). Regards Hartmut

'Nuther new and much-improved version of xpressive in the sandbox. Many bugs fixed, moderately faster compiles, and more features. In particular:
- Nested inverse character sets finally work.
- Match flags match_partial and match_continuous are implemented.
Download xpressive.zip at: http://boost-sandbox.sf.net/vault/index.php?directory=eric_niebler
Docs at: http://boost-sandbox.sf.net/libs/xpressive
-- Eric Niebler Boost Consulting www.boost-consulting.com

From: "Eric Niebler" <eric@boost-consulting.com>
I was reading through a portion of the docs and a few issues came to mind.

This one applies to Boost.RegEx, too, but I'll ask you: Why have both regex_match() and regex_search() when the latter can behave like the former by adding two anchors?

Why does the regex_token_iterator<> ctor use a magic number like -1 to indicate behavior rather than a named value? (I just clicked through to the reference and see that it takes a regex_constants::match_flag_type, but http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/examp... shows passing -1 -- with an explanatory comment -- instead. This leads to confusion.)

The following items are from the "Perl syntax vs. Static xpressive syntax" table in http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/creat...:

You seem to suggest that the xpressive equivalent of Perl's "a|b" must be spelled "a | b" but as far as I can see, the whitespace is irrelevant, so calling attention to it suggests a difference that doesn't exist.

"bos" and "eos" are a little odd. First, it seems like "sequence" should be "input." Second, I usually think of SOF/EOF and SOL/EOL pairs rather than BOF/EOF and BOL/EOL. Thus, I'd have gone with "soi" and "eoi" at the least. Unfortunately, in an effort to keep them short, they aren't terribly mnemonic. How about "start" and "end" (or "beg" and "end" if you want to go with just three letters)?

. appears twice in the table with two different equivalences. It may be that the two are effectively the same, but they aren't grouped and the "Meaning" doesn't point out their equivalence.

Considering how much you compare xpressive to Perl's REs, I'm surprised you opted for ~_d instead of _D, for example. I'm not saying that would be better, but the disconnect from Perl didn't seem necessary in this case. (I do recognize that you're using ~ to mean negation of the following subexpression in many other cases, so perhaps you just determined that being consistent in expressing negation was more important.)

For "[abc]," you show two different xpressive equivalents, each in its own row of the table. Why not combine them into a single row? (Same for any other cases like that.)

A tool that converts a Perl-style RE to xpressive (static notation certainly, and dynamic if there are any differences) would be quite helpful (for those that know Perl's REs).

-- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Thanks for the feedback. Answers inline... Rob Stewart wrote:
From: "Eric Niebler" <eric@boost-consulting.com>
I was reading through a portion of the docs and a few issues came to mind.
This one applies to Boost.RegEx, too, but I'll ask you: Why have both regex_match() and regex_search() when the latter can behave like the former by adding two anchors?
This is true. I'm following the lead of the regex std proposal here, but I've never felt comfortable with regex_match, to be honest. A common noobie mistake is to use regex_match instead of regex_search. Perl, for instance, doesn't distinguish between "search" and "match" operations, and "search" is the default. What makes it worse is that in Perl circles, the semantic equivalent of regex_search is called /matching/, hence the disconnect. Not sure what to do. Perhaps John could comment.
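
To make the difference concrete, here is a minimal sketch. It uses std::regex, the standardized descendant of the regex std proposal mentioned above, rather than xpressive itself, on the assumption that the two algorithms behave analogously; the helper function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True only if the *entire* string consists of digits.
bool matches_whole(const std::string& s) {
    static const std::regex digits("[0-9]+");
    return std::regex_match(s, digits);
}

// True if a run of digits occurs *anywhere* in the string.
bool found_anywhere(const std::string& s) {
    static const std::regex digits("[0-9]+");
    return std::regex_search(s, digits);
}

// regex_search with two anchors added behaves like regex_match here.
bool found_anchored(const std::string& s) {
    static const std::regex anchored("^[0-9]+$");
    return std::regex_search(s, anchored);
}

// Self-check on two single-line inputs.
inline void check_match_vs_search() {
    assert(!matches_whole("abc123"));
    assert(found_anywhere("abc123"));
    assert(!found_anchored("abc123"));
    assert(matches_whole("123") && found_anywhere("123") && found_anchored("123"));
}
```

A Perl user reaching for "match" would usually want the behaviour of found_anywhere, which is the newcomer mistake described above.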
Why does the regex_token_iterator<> ctor use a magic number like -1 to indicate behavior rather than a named value? (I just clicked through to the reference and see that it takes a regex_constants::match_flag_type, but http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/examp... shows passing -1 -- with an explanatory comment -- instead. This leads to confusion.)
Again, I'm just following the standard here, but providing a named constant would be a nice addition. The -1 is an optional 4th parameter, and the match_flag_type is an optional 5th parameter -- so there should be no confusion.
The following items are from the "Perl syntax vs. Static xpressive syntax" table in http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/creat...:
You seem to suggest that the xpressive equivalent of Perl's "a|b" must be spelled "a | b" but as far as I can see, the whitespace is irrelevant, so calling attention to it suggests a difference that doesn't exist.
Naturally whitespace is irrelevant. That's how C++ works. I don't think this should be a source of confusion for people.
"bos" and "eos" are a little odd. First, it seems like "sequence" should be "input." Second, I usually think of SOF/EOF and SOL/EOL pairs rather than BOF/EOF and BOL/EOL. Thus, I'd have gone with "soi" and "eoi" at the least. Unfortunately, in an effort to keep them short, they aren't terribly mnemonic. How about "start" and "end" (or "beg" and "end" if you want to go with just three letters)?
The regex std proposal has match flags match_not_bol and match_not_eol, so I'm reusing this terminology. Boost.Regex also has match_not_bob for "beginning of buffer". This is not proposed for standardization, and I don't think the term "buffer" is appropriate anyway. You like "input" but I prefer "sequence". I dislike "input" because it might suggest to people that input iterators are acceptable to the regex algorithms, whereas a bidirectional sequence is what is required.
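
For readers unfamiliar with the bol/eol flags referred to here, a small sketch using std::regex (an assumption: it stands in for the proposal's interface, which xpressive follows) shows what match_not_bol and match_not_eol do. The helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

namespace rc = std::regex_constants;

// Does "^abc" match, given the flags? With match_not_bol the first
// character of the sequence is not treated as a beginning of line.
bool begins_with_abc(const std::string& s,
                     rc::match_flag_type flags = rc::match_default) {
    static const std::regex re("^abc");
    return std::regex_search(s, re, flags);
}

// Does "def$" match, given the flags? With match_not_eol the end of
// the sequence is not treated as an end of line.
bool ends_with_def(const std::string& s,
                   rc::match_flag_type flags = rc::match_default) {
    static const std::regex re("def$");
    return std::regex_search(s, re, flags);
}

// Self-check on a single-line input.
inline void check_bol_eol_flags() {
    assert(begins_with_abc("abcdef"));
    assert(!begins_with_abc("abcdef", rc::match_not_bol));
    assert(ends_with_def("abcdef"));
    assert(!ends_with_def("abcdef", rc::match_not_eol));
}
```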
. appears twice in the table with two different equivalences. It may be that the two are effectively the same, but they aren't grouped and the "Meaning" doesn't point out their equivalence.
Yes the docs are misleading here. In perl, . can have two meanings, depending on the /s modifier. xpressive's docs should be more specific.
Considering how much you compare xpressive to Perl's REs, I'm surprised you opted for ~_d instead of _D, for example. I'm not saying that would be better, but the disconnect from Perl didn't seem necessary in this case.
It is necessary. _D is an illegal identifier, reserved to the implementation. All identifiers that begin with an underscore and a capital letter are illegal in user code. Even if that were not the case, ALL CAPS is reserved for macros by convention. That's how I ended up with ~_d.
For "[abc]," you show two different xpressive equivalents, each in its own row of the table. Why not combine them into a single row? (Same for any other cases like that.)
Sure.
A tool that converts a Perl-style RE to xpressive (static notation certainly, and dynamic if there are any differences) would be quite helpful (for those that know Perl's REs).
Total agreement. It's on my list, but reaching v1.0 is a higher priority for me right now. Thanks! -- Eric Niebler Boost Consulting www.boost-consulting.com

From: "Eric Niebler" <eric@boost-consulting.com>
Rob Stewart wrote:
Why does the regex_token_iterator<> ctor use a magic number like -1 to indicate behavior rather than a named value? (I just clicked through to the reference and see that it takes a regex_constants::match_flag_type, but http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/examp... shows passing -1 -- with an explanatory comment -- instead. This leads to confusion.)
Again, I'm just following the standard here, but providing a named constant would be a nice addition. The -1 is an optional 4th parameter, and the match_flag_type is an optional 5th parameter -- so there should be no confusion.
Apparently, I can't count. I was matching the -1 with the match_flag_type parameter. Whatever the type, it ought to use named values. Perhaps there's time to improve the proposed interface, too?
The following items are from the "Perl syntax vs. Static xpressive syntax" table in http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/creat...:
You seem to suggest that the xpressive equivalent of Perl's "a|b" must be spelled "a | b" but as far as I can see, the whitespace is irrelevant, so calling attention to it suggests a difference that doesn't exist.
Naturally whitespace is irrelevant. That's how C++ works. I don't think this should be a source of confusion for people.
Of course. I was just pointing out that the Perl syntax was shown without whitespace (necessary) and the C++ with (not necessary). Many writing C++ can be confused over matters like this. Of course, if you show the xpressive version as "a|b" such people won't think they can write "a | b." Doing so, however, does avoid a gratuitous difference, don't you think? Maybe a note clarifying that while spaces are significant in a Perl or, for that matter, a dynamic xpressive RE, they aren't significant in a static xpressive RE other than in literals.
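
The point about significant spaces in a string-based (dynamic) regex can be checked with a short sketch. It uses std::regex as a stand-in for a dynamic regex, which is an assumption; in a static xpressive regex the pattern is built from C++ operators, so spaces around | are ordinary C++ whitespace. The helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Whole-string match of `text` against a pattern given as a string.
bool matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// Self-check: spaces inside the pattern string are literal characters.
inline void check_whitespace_significance() {
    assert(matches("a|b", "a"));     // plain alternation
    assert(matches("a|b", "b"));
    assert(!matches("a | b", "a"));  // alternatives are now "a " and " b"
    assert(matches("a | b", "a "));
    assert(matches("a | b", " b"));
}
```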
"bos" and "eos" are a little odd. First, it seems like "sequence" should be "input." Second, I usually think of SOF/EOF and SOL/EOL pairs rather than BOF/EOF and BOL/EOL. Thus, I'd have gone with "soi" and "eoi" at the least. Unfortunately, in an effort to keep them short, they aren't terribly mnemonic. How about "start" and "end" (or "beg" and "end" if you want to go with just three letters)?
The regex std proposal has match flags match_not_bol and match_not_eol, so I'm reusing this terminology. Boost.Regex also has match_not_bob for "beginning of buffer". This is not proposed for standardization, and I don't think the term "buffer" is appropriate anyway. You like "input" but I prefer "sequence". I dislike "input" because it might suggest to people that input iterators are acceptable to the regex algorithms, whereas a bidirectional sequence is what is required.
What about "beg" and "end?" I realize they aren't reusing the proposed terminology, but they avoid the "sequence/buffer/input" issue.
Considering how much you compare xpressive to Perl's REs, I'm surprised you opted for ~_d instead of _D, for example. I'm not saying that would be better, but the disconnect from Perl didn't seem necessary in this case.
It is necessary. _D is an illegal identifier, reserved to the implementation. All identifiers that begin with an underscore and a capital letter are illegal in user code. Even if that were not the case, ALL CAPS is reserved for macros by convention. That's how I ended up with ~_d.
Doh! Where was my mind? Of course that's not a legal identifier. Clearly I was doing too many things at once at that time. (I'd hardly consider that all caps thus implying a macro, however.) -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Rob Stewart wrote:
From: "Eric Niebler" <eric@boost-consulting.com>
Rob Stewart wrote:
Perhaps there's time to improve the proposed interface, too?
It's too late for TR1 but not for C++0X.
Maybe a note clarifying that while spaces are significant in a Perl or, for that matter, a dynamic xpressive RE, they aren't significant in a static xpressive RE other than in literals.
But this is obvious, and I don't think pointing out the obvious makes for better documentation.
What about "beg" and "end?" I realize they aren't reusing the proposed terminology, but they avoid the "sequence/buffer/input" issue.
Beginning of what? End of what? The line? The word? The sequence/input/buffer? beg/end are not specific enough, nor are they more memorable than bos/eos, IMO. Besides, "end" is too common to make it a namespace-scoped constant, and "beg" is a word of its own with a meaning distinct from "begin". Nope, bos/eos are it. -- Eric Niebler Boost Consulting www.boost-consulting.com

This one applies to Boost.RegEx, too, but I'll ask you: Why have both regex_match() and regex_search() when the latter can behave like the former by adding two anchors?
This is true. I'm following the lead of the regex std proposal here, but I've never felt comfortable with regex_match, to be honest. A common noobie mistake is to use regex_match instead of regex_search. Perl, for instance, doesn't distinguish between "search" and "match" operations, and "search" is the default. What makes it worse is that in Perl circles, the semantic equivalent of regex_search is called /matching/, hence the disconnect. Not sure what to do. Perhaps John could comment.
If I remember correctly the original terminology was inherited from the GNU regex package, and later got refined as a result of user feedback. But Eric's correct it is a major source of confusion *for those migrating from Perl*. The aim was that the code should be quite explicit about what it's doing: a programmer that sees regex_match would know that the code is looking to match all of the text and not just some part of it.
Why does the regex_token_iterator<> ctor use a magic number like -1 to indicate behavior rather than a named value? (I just clicked through to the reference and see that it takes a regex_constants::match_flag_type, but http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/examp... shows passing -1 -- with an explanatory comment -- instead. This leads to confusion.)
Again, I'm just following the standard here, but providing a named constant would be a nice addition. The -1 is an optional 4th parameter, and the match_flag_type is an optional 5th parameter -- so there should be no confusion.
The -1 means "the thing before 0" and 0 is the whole of what matched, so -1 is the string before the bit that matched. Well that's the logic anyway. Doesn't seem to have caused any confusion in practice, but there's no harm in adding a named constant.
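
A brief sketch of the two submatch values discussed above, using std::sregex_token_iterator from the standard library as a stand-in for the xpressive iterator (an assumption). The constant `text_before_match` is a hypothetical name illustrating the suggested named value; it is not part of any library.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical named constant for the "magic" -1 submatch index.
const int text_before_match = -1;

// Collect the tokens produced for the given submatch index.
std::vector<std::string> tokens(const std::string& text,
                                const std::regex& re,
                                int submatch) {
    std::sregex_token_iterator first(text.begin(), text.end(), re, submatch);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}

// Self-check: 0 yields the matches, -1 yields the text between them.
inline void check_token_iterator() {
    const std::regex separator(", *");
    const std::string text = "a, b,c";

    const std::vector<std::string> matched = tokens(text, separator, 0);
    assert(matched.size() == 2 && matched[0] == ", " && matched[1] == ",");

    const std::vector<std::string> fields =
        tokens(text, separator, text_before_match);
    assert(fields.size() == 3);
    assert(fields[0] == "a" && fields[1] == "b" && fields[2] == "c");
}
```

With 0 the iterator yields what the regex matched (the separators); with -1 it yields the pieces around the matches, which is the usual way to split a string.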
The regex std proposal has match flags match_not_bol and match_not_eol, so I'm reusing this terminology. Boost.Regex also has match_not_bob for "beginning of buffer". This is not proposed for standardization, and I don't think the term "buffer" is appropriate anyway. You like "input" but I prefer "sequence". I dislike "input" because it might suggest to people that input iterators are acceptable to the regex algorithms, whereas a bidirectional sequence is what is required.
Historically, those terms (or very similar) are used by GNU regex and the BSD (Henry Spencer) packages. Renaming them would probably start a bicycle-shed style discussion I guess. Good names are hard, especially if the answer isn't immediately obvious! HTH, John.
participants (8)
- Andreas Pokorny
- Bronek Kozicki
- Eric Niebler
- Hartmut Kaiser
- John Maddock
- Jonathan Turkanis
- Peter Dimov
- Rob Stewart