[xpressive] Is there a way to test for an empty regex?

What is the best way to test for an empty boost::xpressive::regex object? That is, if I have code like:
boost::xpressive::sregex re;
...
// Test if re is empty and should not be used to perform a regex_search
if (?????)
Thanks,
Michael Goldshteyn

Michael Goldshteyn wrote:
What is the best way to test for an empty boost::xpressive::regex object?
That is, if I have code like:
boost::xpressive::sregex re;
...
// Test if re is empty and should not be used to perform a regex_search
if (?????)
There is, although it's not obvious from the documentation that it's possible. You can see from the postcondition on the default constructor of basic_regex that this->regex_id() == 0 for default-constructed regex objects and != 0 for non-default constructed regex objects. So your test above can be:
if( 0 != re.regex_id() )
HTH,
--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

"Eric Niebler" <eric@boost-consulting.com> wrote in message news:4924898A.4020101@boost-consulting.com...
Michael Goldshteyn wrote:
What is the best way to test for an empty boost::xpressive::regex object?
That is, if I have code like:
boost::xpressive::sregex re;
...
// Test if re is empty and should not be used to perform a regex_search
if (?????)
There is, although it's not obvious from the documentation that it's possible. You can see from the postcondition on the default constructor of basic_regex that this->regex_id() == 0 for default-constructed regex objects and != 0 for non-default constructed regex objects. So your test above can be:
if( 0 != re.regex_id() )
HTH,
Thanks for the speedy reply. I did notice the postcondition on the constructor and the fact that regex_id() will return 0 for a non-initialized regex, but was hoping that there was a more intuitive way to go about this test, since readers of the code may get confused without a comment. How hard would it be to add an empty() function, with a signature similar to the one in boost::regex, before Boost 1.38.0 comes out? This functionality, at least in my humble opinion, is very useful.
The signature for the function in boost::regex is:
bool empty() const;
Thanks,
Michael Goldshteyn

Michael Goldshteyn wrote:
"Eric Niebler" <eric@boost-consulting.com> wrote in message
if( 0 != re.regex_id() )
Thanks for the speedy reply. I did notice the postcondition on the constructor and the fact that the regex_id() will return 0 for a non-initialized regex, but was hoping that there was a more intuitive way to go about this test, since readers of the code may get confused without a comment. How hard would it be to add an empty() function with a signature that is similar to the one in boost::regex before Boost 1.38.0 comes out, since this functionality, at least in my humble opinion is very useful.
The signature for the function in boost::regex is:
bool empty() const;
boost::regex goes to some length to present itself as a souped up container of characters. You can assign a character range to it, get begin() and end() iterators for stepping through the characters with which the regex was initialized, etc. Given that, see if you can answer this without checking the docs:
assert(std::string().empty()); // OK
assert(std::string("").empty()); // OK
assert(boost::regex().empty()); // Is this true???
assert(boost::regex("").empty()); // How about this???
Unsurprisingly, regex::empty() is not part of the standard regex interface in C++0x. I don't like regex::empty() and I'm not inclined to add it. Sorry. You already have a way to get the information you're interested in. If you would like to give it a pretty name, by all means...
template<class Iter>
bool is_invalid(xpressive::basic_regex<Iter> const &rex)
{
    return 0 == rex.regex_id();
}
--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

"Eric Niebler" <eric@boost-consulting.com> wrote in message news:4925187A.6080402@boost-consulting.com...
Michael Goldshteyn wrote:
"Eric Niebler" <eric@boost-consulting.com> wrote in message
if( 0 != re.regex_id() )
Thanks for the speedy reply. I did notice the postcondition on the constructor and the fact that the regex_id() will return 0 for a non-initialized regex, but was hoping that there was a more intuitive way to go about this test, since readers of the code may get confused without a comment. How hard would it be to add an empty() function with a signature that is similar to the one in boost::regex before Boost 1.38.0 comes out, since this functionality, at least in my humble opinion is very useful.
The signature for the function in boost::regex is:
bool empty() const;
boost::regex goes to some length to present itself as a souped up container of characters. You can assign a character range to it, get begin() and end() iterators for stepping through the characters with which the regex was initialized, etc. Given that, see if you can answer this without checking the docs:
assert(std::string().empty()); // OK
assert(std::string("").empty()); // OK
assert(boost::regex().empty()); // Is this true???
assert(boost::regex("").empty()); // How about this???
Yes, the behavior of the last assert is surprising, so I do see your point. "empty" is certainly not the right name for such a function, at least in the context of standard container functionality.
Unsurprisingly, regex::empty() is not part of the standard regex interface in C++0x. I don't like regex::empty() and I'm not inclined to add it. Sorry. You already have a way to get the information you're interested in. If you would like to give it a pretty name, by all means...
template<class Iter> bool is_invalid(xpressive::basic_regex<Iter> const &rex) { return 0 == rex.regex_id(); }
empty() may not be a good name for this function, but agree with me that encapsulating the functionality of testing whether an re object actually holds a regular expression should be more intuitive than:
0 == rex.regex_id(); // Test if rex actually contains a regular expression
Perhaps something like unfilled(), bare(), or as you suggested is_invalid() should be added to the actual implementation, instead of user code? I would further argue that something so trivial and tightly coupled to the implementation (i.e., regex_id() being zero) should be a member function and not stand alone. Opinions on this topic from others would also be greatly appreciated!
Thanks,
Michael Goldshteyn

Michael Goldshteyn wrote:
empty() may not be a good name for this function, but agree with me that encapsulating the functionality of testing whether an re object actually holds a regular expression should be more intuitive than:
---------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0 == rex.regex_id(); // Test if rex actually contains a regular expression
Actually, I don't agree. Conceptually, all regex objects hold a regular expression; that is, every regex object matches a subset of all possible character sequences. For default-constructed regexes, the subset happens to be empty.
I'm having a hard time imagining why you want to know that a given regex was default-constructed (or was copied from such a regex). Is it to avoid the cost of calling a regex algorithm when you know it would fail? The first thing these algorithms do is check to see if the regex_id() is 0 and if so, immediately return, so there isn't really any overhead you'd be saving (aside from resetting the match_results, which is cheap).
So, not only do you already have access to the information you want, it also doesn't seem to be particularly useful information. If the cost of resetting the match_results concerns you, I could see about reorganizing the code in the algorithms to avoid even this tiny cost. Might be a good change to make anyway.
Perhaps something like unfilled(), bare(), or as you suggested is_invalid() should be added to the actual implementation, instead of user code? I would further argue that something so trivial and tightly coupled to the implementation (i.e., regex_id() being zero) should be a member function and not stand alone.
What is it about checking the regex_id() that bothers you? Is it the constant 0? You could instead write it like ...
if(rex.regex_id() == sregex().regex_id())
Default-constructing a regex is very cheap, FWIW.
--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

Eric Niebler wrote:
Michael Goldshteyn wrote:
Perhaps something like unfilled(), bare(), or as you suggested is_invalid() should be added to the actual implementation, instead of user code? I would further argue that something so trivial and tightly coupled to the implementation (i.e., regex_id() being zero) should be a member function and not stand alone.
What is it about checking the regex_id() that bothers you? Is it the constant 0? You could instead write it like ...
if(rex.regex_id() == sregex().regex_id())
Default-constructing a regex is very cheap, FWIW.
I'm not very familiar with xpressive, but looking at it from the outside, it would never occur to me that "rex.regex_id() == 0" is an emptiness (or whatever) check. Sorry, this looks more like a hack to me.
I think a clear statement is needed: is a default-constructed regex valid? If not, then why does it have a default constructor? If it is, the default-constructed state should be easily detectable, since this is a common practice in just about any domain:
- Boost.Regex, containers, strings, Boost.Function provide empty()
- Boost.Optional, Boost.SmartPtr provide unspecified bool conversions and operator!, or equivalent facilities.
Personally, I like the empty() naming, but if a more precise name can be found, I would be fine with it. And I agree with Michael, such a function should be a member of the class.
As for application, the empty state of a regex may be useful when the expression may or may not be compiled. It's a sort of implicit optional, but a simpler and more efficient one. However, I'm sure it's not the only valid use case.

Andrey Semashev wrote:
I'm not very familiar with xpressive, but looking at it from the outside, it would never occur to me that "rex.regex_id() == 0" is an emptiness (or whatever) check. Sorry, this looks more like a hack to me.
I think a clear statement is needed: is a default-constructed regex valid?
Yes!
If not, then why does it have a default constructor? If it is, the default-constructed state should be easily detectable,
Why? Does a std::pair tell you whether it was default-constructed? It's just a value. How it was constructed doesn't matter.
since this is a common practice in just about any domain: - Boost.Regex,
We've already covered why regex::empty is confusing.
containers, strings,
These are sequences. empty() tests whether the sequence is empty. A regex is not a sequence. At least not for xpressive.
Boost.Function provide empty() - Boost.Optional, Boost.SmartPtr provide unspecified bool conversions and operator!, or equivalent facilities.
These types all have invalid singular values. Regexes do not.
Personally, I like empty() naming, but if a more precise name can be found, I would be fine. And I agree with Michael, such a function should be a member of the class.
As for application, the empty state of regex may be useful when the expression may, or may not be compiled. It's a sort of implicit optional, but a simpler and more efficient one. However, I'm sure it's not the only valid use case.
A regex is not like an optional. It does not have an invalid state. I'm sorry to disappoint you guys. I'm not adding an empty() member function.
--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
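
If what the calling code needs is "a pattern that may or may not have been configured", one option is to state that explicitly with an optional wrapper instead of inferring it from the regex. The sketch below is an illustration only and was not proposed in the thread: it uses std::optional (C++17) and std::regex as stand-ins so that it compiles without Boost, and LineFilter is a hypothetical type. The same shape should carry over to an optional holding an xpressive sregex.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical type: "no pattern configured" is a distinct, explicit state
// rather than something inferred from the regex object itself.
struct LineFilter
{
    std::optional<std::regex> pattern;   // std::nullopt means "no filter"

    // With no pattern every line passes; otherwise the line must contain a match.
    bool accepts(std::string const& line) const
    {
        if (!pattern)
            return true;
        return std::regex_search(line, *pattern);
    }
};
```

Here the unconfigured state gets its own meaning (accept everything), which differs from a regex that matches nothing.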

I have submitted Ticket #2519 to more formally report this matter and its suggested resolution.
https://svn.boost.org/trac/boost/ticket/2519
Thanks,
Michael Goldshteyn
participants (3)
- Andrey Semashev
- Eric Niebler
- Michael Goldshteyn