Re: [boost] [regex and string algo] again strange split behaviour

The 1.32 behaviour is compatible with Boost.Regex:
boost::regex_token_iterator in splitting mode returns for
"abc/abc/" -> 2 tokens
This is compatible with the regex standardization proposal.
I think the behaviour of boost::split should be the same.
Regards,
Jan
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Stefan Slapeta Posted At: Wednesday, July 13, 2005 3:13 PM Posted To: Boost Developer Conversation: [boost] [string algo] again strange split behaviour Subject: [boost] [string algo] again strange split behaviour
hi,
I assume the behaviour of boost::split not to return the last token has been changed (in CVS). However, a side effect seems to be that there is also one token returned for _empty_ strings, which is very questionable IMO!
summary (if '/' is the separator):
boost 1.32: "" -> 0 tokens, "abc/abc" -> 2 tokens, "abc/abc/" -> 2 tokens
CVS: "" -> 1 token (!), "abc/abc" -> 2 tokens, "abc/abc/" -> 3 tokens
should be IMO: "" -> 0 tokens, "abc/abc" -> 2 tokens, "abc/abc/" -> 3 tokens
Thoughts?
Stefan
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Wed, 13 Jul 2005 16:42:06 +0200, "Jan Hermelink" <Jan.Hermelink@metalogic.de> wrote:
The 1.32 behaviour is compatible with Boost.Regex:
boost::regex_token_iterator in splitting mode returns for
"abc/abc/" -> 2 tokens
This is compatible with the regex standardization proposal.
I think the behaviour of boost::split should be the same.
I don't agree. I think this is counterintuitive, both for string_algo and regex. It looks more like a bug than by design if you ask me. -- Be seeing you.

Hi,
This is an interesting observation. So it seems reasonable to drop back to the 1.32 version.
Ideas?
Regards,
Pavol
On Wed, Jul 13, 2005 at 04:42:06PM +0200, Jan Hermelink wrote:
The 1.32 behaviour is compatible with Boost.Regex:
boost::regex_token_iterator in splitting mode returns for
"abc/abc/" -> 2 tokens
This is compatible with the regex standardization proposal.
I think the behaviour of boost::split should be the same.
Regards,
Jan
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Stefan Slapeta Posted At: Wednesday, July 13, 2005 3:13 PM Posted To: Boost Developer Conversation: [boost] [string algo] again strange split behaviour Subject: [boost] [string algo] again strange split behaviour
hi,
I assume the behaviour of boost::split not to return the last token has been changed (in CVS). However, a side effect seems to be that there is also one token returned for _empty_ strings, which is very questionable IMO!
summary (if '/' is the separator):
boost 1.32: "" -> 0 tokens "abc/abc" -> 2 tokens "abc/abc/" -> 2 tokens
CVS: "" -> 1 token (!) "abc/abc" -> 2 tokens "abc/abc/" -> 3 tokens
should be IMO: "" -> 0 tokens "abc/abc" -> 2 tokens "abc/abc/" -> 3 tokens
Thoughts?
Stefan

Pavol Droba wrote:
Hi,
This is an interesting observation. So it seems reasonable to drop back to 1.32 version.
The only reasonable thing is for split to return N+1 pieces for a string with N separators, if you ask me. :-) This is lossless in the sense that it allows you to reconstruct the original string. The non-reasonable behavior can easily be implemented on top of that, but not vice versa.

On Wed, Jul 13, 2005 at 09:22:17PM +0300, Peter Dimov wrote:
Pavol Droba wrote:
Hi,
This is an interesting observation. So it seems reasonable to drop back to 1.32 version.
The only reasonable thing is for split to return N+1 pieces for a string with N separators, if you ask me. :-) This is lossless in the sense that it allows you to reconstruct the original string. The non-reasonable behavior can easily be implemented on top of that, but not vice versa.
Reading your replies and thinking about it a little bit, I have come to the conclusion that the definition above is really the only reasonable one. That was actually also the reason why I altered the behaviour in this release: I considered the former one wrong.
But there is still the issue of the empty string. I think that both approaches - returning no token - returning one empty token - have some meaning.
The second one is very similar to returning an empty token at the end if the input string ends with a separator.
The first one simply says that nothing equals nothing. Still, I prefer the second approach, since it is more in line with the current reasoning.
Does anybody have some arguments/reasoning that can help here?
Regards,
Pavol

On Wed, 13 Jul 2005 23:08:19 +0200, Pavol Droba <droba@topmail.sk> wrote:
Hi,
This is an interesting observation. So it seems reasonable to drop back to 1.32 version.
The only reasonable thing is for split to return N+1 pieces for a string with N separators, if you ask me. :-) This is lossless in the sense that it allows you to reconstruct the original string. The non-reasonable behavior can easily be implemented on top of that, but not vice versa.
Reading your replies and thinking about it a little bit, I have come to the conclusion that the definition above is really the only reasonable one. That was actually also the reason why I altered the behaviour in this release: I considered the former one wrong.
But there is still the issue of the empty string. I think that both approaches - returning no token - returning one empty token -
have some meaning.
The second one is very similar to returning an empty token at the end if the input string ends with a separator.
The first one simply says that nothing equals nothing. Still, I prefer the second approach, since it is more in line with the current reasoning.
Does anybody have some arguments/reasoning that can help here?
I agree that both have some meaning, but like you, I also prefer the second approach. I think Peter brings up a good point when he says that split should return N+1 pieces for a string with N separators. I feel the same way about this, so I vote that you don't change anything. -- Be seeing you.

"Peter Dimov" <pdimov@mmltd.net> writes:
Pavol Droba wrote:
Hi,
This is an interesting observation. So it seems reasonable to drop back to 1.32 version.
The only reasonable thing is for split to return N+1 pieces for a string with N separators, if you ask me. :-) This is lossless in the sense that it allows you to reconstruct the original string. The non-reasonable behavior can easily be implemented on top of that, but not vice versa.
I realize it may not be obvious that I have a strong opinion on this, and since there still seems to be some debate about it going on: I agree with Peter. It's the _only_ reasonable thing. -- Dave Abrahams Boost Consulting www.boost-consulting.com

On Wed, 13 Jul 2005 19:28:23 +0200, Pavol Droba <droba@topmail.sk> wrote:
Hi,
This is an interesting observation. So it seems reasonable to drop back to 1.32 version.
Ideas?
What other libraries/languages behave this way? Python doesn't, Perl doesn't (just checked). How about "/abc//"? "", "abc", ""? I still don't think that it makes sense to arbitrarily drop only the last empty token. One interesting thing to note on the original topic is that Perl doesn't return an empty match for an empty string, unlike Python. -- Be seeing you.
participants (5)
- David Abrahams
- Jan Hermelink
- Pavol Droba
- Peter Dimov
- Thore Karlsen