Re: [boost] [regex and string algo] again strange split behaviour

> It looks more like a bug than by design if you ask me.
I don't think so - this behaviour is specified in the standardization proposal. To quote from http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1429.htm, chapter "RE.8.2 Template class regex_token_iterator":
"If the end of sequence is reached (regex_search returns false), the iterator becomes equal to the end-of-sequence iterator value, unless the sub-expression being enumerated has index -1: In which case the iterator enumerates one last string that contains all the characters from the end of the last regular expression match to the end of the input sequence being enumerated, provided that this would not be an empty string."
!!! "provided that this would not be an empty string" !!!
> I don't agree. I think this is counterintuitive, both for string_algo and regex
Maybe, but the behaviour described above has a good chance of being standardized, and boost::split() should do the same thing.
Jan
-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thore Karlsen
Posted At: Wednesday, July 13, 2005 6:03 PM
Posted To: Boost Developer
Conversation: [boost] [regex and string algo] again strange split behaviour
Subject: Re: [boost] [regex and string algo] again strange split behaviour
On Wed, 13 Jul 2005 16:42:06 +0200, "Jan Hermelink" <Jan.Hermelink@metalogic.de> wrote:
> The 1.32 behaviour is compatible with Boost.Regex:
> boost::regex_token_iterator in splitting mode returns for
> "abc/abc/" -> 2 tokens
> This is compatible with the regex standardization proposal.
> I think the behaviour of boost::split should be the same.
I don't agree. I think this is counterintuitive, both for string_algo and regex. It looks more like a bug than by design if you ask me.
--
Be seeing you.

On Wed, 13 Jul 2005 19:00:12 +0200, "Jan Hermelink" <Jan.Hermelink@metalogic.de> wrote:
>> It looks more like a bug than by design if you ask me.
> I don't think so - this behaviour is specified in the standardization proposal. To quote from http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1429.htm, chapter "RE.8.2 Template class regex_token_iterator":
> "If the end of sequence is reached (regex_search returns false), the iterator becomes equal to the end-of-sequence iterator value, unless the sub-expression being enumerated has index -1: In which case the iterator enumerates one last string that contains all the characters from the end of the last regular expression match to the end of the input sequence being enumerated, provided that this would not be an empty string."
> !!! "provided that this would not be an empty string" !!!
How about this string: "/abc/abc". Would this result in "", "abc", "abc"? Yet "abc/abc/" would result in "abc", "abc"? That seems terribly unbalanced to me, and this is not the behavior I would expect.
>> I don't agree. I think this is counterintuitive, both for string_algo and regex
> Maybe, but the behaviour described above has a good chance of being standardized, and boost::split() should do the same thing.
I still don't agree based on that argument. Two wrongs don't make a right. I'd like to know the reason why the empty string is explicitly excluded in the paragraph above, though.
--
Be seeing you.
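The asymmetry questioned in the message above can be checked with a short sketch. It assumes a C++11 compiler and uses std::sregex_token_iterator, the standard-library counterpart of the regex_token_iterator described in the quoted proposal, rather than Boost.Regex itself. The helper name regex_split is made up for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or the standard library).
// Enumerates sub-expression index -1, i.e. the text between matches of `sep`.
std::vector<std::string> regex_split(const std::string& input,
                                     const std::string& sep)
{
    const std::regex re(sep);
    std::vector<std::string> tokens;

    std::sregex_token_iterator it(input.begin(), input.end(), re, -1);
    const std::sregex_token_iterator end; // end-of-sequence iterator

    for (; it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

With this helper, regex_split("/abc/abc", "/") yields an empty token followed by two "abc" tokens, while regex_split("abc/abc/", "/") yields only the two "abc" tokens, because the final empty string is suppressed by the "provided that this would not be an empty string" clause.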

On Wed, 13 Jul 2005 19:00:12 +0200, "Jan Hermelink" <Jan.Hermelink@metalogic.de> wrote:
> "If the end of sequence is reached (regex_search returns false), the iterator becomes equal to the end-of-sequence iterator value, unless the sub-expression being enumerated has index -1: In which case the iterator enumerates one last string that contains all the characters from the end of the last regular expression match to the end of the input sequence being enumerated, provided that this would not be an empty string."
> !!! "provided that this would not be an empty string" !!!
Why does it have to jibe with regex? If you want regex, use regex. If you want split-like-a-script, then use split...
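For comparison, a split that treats both ends of the input the same way might look like the sketch below. It only illustrates the symmetric behaviour argued for earlier in the thread; it is not the implementation of boost::split(), and the function name split_keep_empty is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits on a single separator character and keeps
// every field, including empty ones at the start or end of the input.
std::vector<std::string> split_keep_empty(const std::string& input, char sep)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;

    for (;;) {
        const std::string::size_type pos = input.find(sep, start);
        if (pos == std::string::npos) {
            // Last field: everything after the final separator, possibly empty.
            fields.push_back(input.substr(start));
            break;
        }
        fields.push_back(input.substr(start, pos - start));
        start = pos + 1; // continue just past the separator
    }
    return fields;
}
```

Under this sketch, "/abc/abc" and "abc/abc/" both produce three fields, with the empty field first in one case and last in the other.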

"Peter Dimov" <pdimov@mmltd.net> writes:
> Jan Hermelink wrote:
>> Maybe, but the behaviour described above has a good chance of being standardized, and boost::split() should do the same thing.
> Repeating a standardized mistake doesn't necessarily make us right.
              ^
 not-yet------^
And we ought to consider whether TR1 needs fixing. It's not quite the same case, of course, since the cited functionality uses an iterator.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
participants (5)
- David Abrahams
- Jan Hermelink
- Jody Hagins
- Peter Dimov
- Thore Karlsen