Re: [boost] [regex and string algo] again strange split behaviour

The 1.32 behaviour is compatible with Boost.Regex:
boost::regex_token_iterator in splitting mode returns for
"abc/abc/" -> 2 tokens
This is compatible with the regex standardization proposal.
I think the behaviour of boost::split should be the same.
Regards,
Jan
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Stefan Slapeta Posted At: Wednesday, July 13, 2005 3:13 PM Posted To: Boost Developer Conversation: [boost] [string algo] again strange split behaviour Subject: [boost] [string algo] again strange split behaviour
hi,
I assume the behaviour of boost::split not to return the last token has been changed (in CVS). However, a side effect seems to be that there is also one token returned for _empty_ strings, which is very questionable IMO!
summary (if '/' is the separator):
boost 1.32: "" -> 0 tokens, "abc/abc" -> 2 tokens, "abc/abc/" -> 2 tokens
CVS: "" -> 1 token (!), "abc/abc" -> 2 tokens, "abc/abc/" -> 3 tokens
should be IMO: "" -> 0 tokens, "abc/abc" -> 2 tokens, "abc/abc/" -> 3 tokens
Thoughts?
Stefan
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Wed, 13 Jul 2005 16:42:06 +0200, "Jan Hermelink" <Jan.Hermelink@metalogic.de> wrote:
The 1.32 behaviour is compatible with Boost.Regex:
boost::regex_token_iterator in splitting mode returns for
"abc/abc/" -> 2 tokens
This is compatible with the regex standardization proposal.
I think the behaviour of boost::split should be the same.
I don't agree. I think this is counterintuitive, both for string_algo and regex. It looks more like a bug than by design if you ask me. -- Be seeing you.

Hi,
This is an interesting observation. So it seems reasonable to drop back to the 1.32 version.
Ideas?
Regards,
Pavol
On Wed, Jul 13, 2005 at 04:42:06PM +0200, Jan Hermelink wrote:
The 1.32 behaviour is compatible with Boost.Regex:
boost::regex_token_iterator in splitting mode returns for
"abc/abc/" -> 2 tokens
This is compatible with the regex standardization proposal.
I think the behaviour of boost::split should be the same.
Regards,
Jan
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Stefan Slapeta Posted At: Wednesday, July 13, 2005 3:13 PM Posted To: Boost Developer Conversation: [boost] [string algo] again strange split behaviour Subject: [boost] [string algo] again strange split behaviour
hi,
I assume the behaviour of boost::split not to return the last token has been changed (in CVS). However, a side effect seems to be that there is also one token returned for _empty_ strings, which is very questionable IMO!
summary (if '/' is the separator):
boost 1.32: "" -> 0 tokens "abc/abc" -> 2 tokens "abc/abc/" -> 2 tokens
CVS: "" -> 1 token (!) "abc/abc" -> 2 tokens "abc/abc/" -> 3 tokens
should be IMO: "" -> 0 tokens "abc/abc" -> 2 tokens "abc/abc/" -> 3 tokens
Thoughts?
Stefan

Pavol Droba wrote:
Hi,
This is an interesting observation. So it seems reasonable to drop back to 1.32 version.
The only reasonable thing is for split to return N+1 pieces for a string with N separators, if you ask me. :-) This is lossless in the sense that it allows you to reconstruct the original string. The non-reasonable behavior can easily be implemented on top of that, but not vice versa.

On Wed, Jul 13, 2005 at 09:22:17PM +0300, Peter Dimov wrote:
Pavol Droba wrote:
Hi,
This is an interesting observation. So it seems reasonable to drop back to 1.32 version.
The only reasonable thing is for split to return N+1 pieces for a string with N separators, if you ask me. :-) This is lossless in the sense that it allows you to reconstruct the original string. The non-reasonable behavior can easily be implemented on top of that, but not vice versa.
Reading your replies and thinking about it a little bit, I have come to the conclusion that the definition above is really the only reasonable one. That was actually also the reason why I altered the behaviour in this release: I considered the former one wrong.
But there is still the issue of the empty string. I think that both approaches - returning no token - returning one empty token - have some meaning.
The second one is very similar to returning an empty token at the end if the input string ends with a separator.
The first one simply says that nothing equals nothing. Still, I prefer the second approach, since it is more in line with the current reasoning.
Does anybody have some arguments/reasoning that can help here?
Regards,
Pavol

On Wed, 13 Jul 2005 23:08:19 +0200, Pavol Droba <droba@topmail.sk> wrote:
Hi,
This is an interesting observation. So it seems reasonable to drop back to 1.32 version.
The only reasonable thing is for split to return N+1 pieces for a string with N separators, if you ask me. :-) This is lossless in the sense that it allows you to reconstruct the original string. The non-reasonable behavior can easily be implemented on top of that, but not vice versa.
Reading your replies and thinking about it a little bit, I have come to the conclusion that the definition above is really the only reasonable one. That was actually also the reason why I altered the behaviour in this release: I considered the former one wrong.
But there is still the issue of the empty string. I think that both approaches - returning no token - returning one empty token -
have some meaning.
The second one is very similar to returning an empty token at the end if the input string ends with a separator.
The first one simply says that nothing equals nothing. Still, I prefer the second approach, since it is more in line with the current reasoning.
Does anybody have some arguments/reasoning that can help here?
I agree that both have some meaning, but like you, I also prefer the second approach. I think Peter brings up a good point when he says that split should return N+1 pieces for a string with N separators. I feel the same way about this, so I vote that you don't change anything. -- Be seeing you.

"Peter Dimov" <pdimov@mmltd.net> writes:
Pavol Droba wrote:
Hi,
This is an interesting observation. So it seems reasonable to drop back to 1.32 version.
The only reasonable thing is for split to return N+1 pieces for a string with N separators, if you ask me. :-) This is lossless in the sense that it allows you to reconstruct the original string. The non-reasonable behavior can easily be implemented on top of that, but not vice versa.
I realize it may not be obvious that I have a strong opinion on this, and since there still seems to be some debate about it going on: I agree with Peter. It's the _only_ reasonable thing. -- Dave Abrahams Boost Consulting www.boost-consulting.com

On Wed, 13 Jul 2005 19:28:23 +0200, Pavol Droba <droba@topmail.sk> wrote:
Hi,
This is an interesting observation. So it seems reasonable to drop back to 1.32 version.
Ideas?
What other libraries/languages behave this way? Python doesn't, Perl doesn't (just checked). How about "/abc//"? "", "abc", ""? I still don't think that it makes sense to arbitrarily drop only the last empty token. One interesting thing to note on the original topic is that Perl doesn't return an empty match for an empty string, unlike Python. -- Be seeing you.
participants (5)
- David Abrahams
- Jan Hermelink
- Pavol Droba
- Peter Dimov
- Thore Karlsen