[xpressive] regex_token_iterator - bug, feature, misunderstanding query?

OK, so I want to use the sregex_token_iterator functionality to split a data string. The data string contains:

/a/b//c/

The delimiter is the forward slash and I do want empty strings. I expect to get:

{}{a}{b}{}{c}{}

What I actually get is:

{}{a}{b}{}{c}

The empty string after {c}, which I expect because the data string ended in a forward slash, is missing. What do I have to do to get the empty string after {c} if the data string ends in a forward slash?

The code is as follows:

---
#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_token_iterator.hpp>

int main(int argc, char *argv[])
{
  // Split the path
  using namespace boost::xpressive; // For simplicity

  sregex levelSplitter(as_xpr('/'));
  std::string nodePath("/a/b//c/");

  sregex_token_iterator begin(nodePath.begin(),nodePath.end(),levelSplitter,-1);
  sregex_token_iterator end;

  for (sregex_token_iterator iCur=begin;iCur!=end;++iCur)
    std::cout << '{' << *iCur << '}';

  std::cout << std::endl;

  return 0;
}
---

Thanks,

Michael Goldshteyn

Michael Goldshteyn wrote:
OK, so I want to use the sregex_token_iterator functionality to split a data string. The data string contains:
/a/b//c/
The delimiter is the forward slash and I do want empty strings. I expect to get:
{}{a}{b}{}{c}{}
What I actually get is:
{}{a}{b}{}{c}
The empty string after {c}, which I expect because the data string ended in a forward slash, is missing. What do I have to do to get the empty string after {c} if the data string ends in a forward slash?
<snip>

This is by design. It behaves the same as Boost.Regex and perl's split() function. Try running this perl code:

  $str = '/a/b//c/';
  @rg = split(/\//, $str);
  foreach(@rg)
  {
    printf("{%s}", $_);
  }

It prints:

  {}{a}{b}{}{c}

I'm not 100% sure I understand this behavior myself, but the C++0x standard is very clear about this case. 28.12.2.4/5-6 about regex_token_iterator::operator++ says:
Otherwise, if any of the values stored in subs is equal to -1 and prev->suffix().length() is not 0 the operator sets *this to a suffix iterator that points to the range [prev->suffix().first, prev->suffix().second). Otherwise, sets *this to an end-of-sequence iterator.
In your case, subs[0] is -1 and prev->suffix().length() is 0 after matching the trailing '/', so *this becomes the end-of-sequence iterator and we're done. I don't myself remember the rationale for requiring the suffix to be non-empty. Perhaps it is for parity with perl.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
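The rule quoted above can be seen in isolation with a minimal sketch. It uses std::regex from C++11 rather than xpressive, on the assumption that the shipped standard kept the quoted wording for regex_token_iterator; the helper name split_tokens is hypothetical and not part of either library.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect the fields produced by a token iterator
// in "split" mode (submatch index -1 selects the text between matches).
std::vector<std::string> split_tokens(const std::string& input,
                                      const std::regex& delim)
{
    std::vector<std::string> fields;
    std::sregex_token_iterator it(input.begin(), input.end(), delim, -1);
    std::sregex_token_iterator end;
    for (; it != end; ++it)
        fields.push_back(it->str());
    // After the last delimiter match, operator++ only yields the suffix
    // when it is non-empty, so a trailing delimiter adds no final field.
    return fields;
}
```

With a "/" delimiter, both "/a/b//c/" and "/a/b//c" yield the same five fields, so the presence of a trailing delimiter cannot be recovered from the token sequence alone.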

"Eric Niebler" <eric@boost-consulting.com> wrote in message news:4936C62D.3000502@boost-consulting.com...
Michael Goldshteyn wrote: ... In your case, subs[0] is -1 and prev->suffix().length() is 0 after matching the trailing '/', so *this becomes the end-of-sequence iterator and we're done. I don't myself remember the rationale for requiring the suffix to be non-empty. Perhaps it is for parity with perl.
-- Eric Niebler BoostPro Computing http://www.boostpro.com
Thanks for the insightful response. I'll just work around it.

Michael Goldshteyn
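One possible shape for such a workaround, as a sketch only: walk the delimiter matches with a plain regex iterator and always emit the text after the last match, even when it is empty. This is not the workaround used by anyone in the thread. It is written against std::regex so that it is self-contained; xpressive's sregex_iterator is expected to allow the same loop, but that is an assumption. The function name split_keep_trailing is hypothetical, and the delimiter pattern is assumed never to match the empty string.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split on delim and keep every field, including
// an empty one after a trailing delimiter.
// Assumes delim never matches an empty string.
std::vector<std::string> split_keep_trailing(const std::string& input,
                                             const std::regex& delim)
{
    std::vector<std::string> fields;
    std::string::const_iterator field_start = input.begin();

    std::sregex_iterator it(input.begin(), input.end(), delim);
    std::sregex_iterator end;
    for (; it != end; ++it)
    {
        // Text between the previous delimiter and this one.
        fields.emplace_back(field_start, (*it)[0].first);
        field_start = (*it)[0].second;
    }

    // Always emit the remainder, even if it is empty.
    fields.emplace_back(field_start, input.end());
    return fields;
}
```

For "/a/b//c/" with a "/" delimiter this returns six fields, the last of which is empty; for "/a/b//c" it returns five. Note that an empty input produces a single empty field, which may or may not be what a caller wants.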
participants (2)
- Eric Niebler
- Michael Goldshteyn