Result of boost::split against empty string changed in post-1.45 release


Mateusz Loskot

11 Jun 2012

2:58 p.m.

Hi,

I have a simple program that runs boost::split against an empty string. I'm testing with Boost 1.45 and later versions like 1.46 or 1.49. As stated in the // comment below, there is a difference in behaviour depending on the Boost version:

// https://gist.github.com/2910461
#include <cassert>
#include <string>
#include <vector>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
int main(int argc, wchar_t* argv[])
{
    std::wstring s;
    std::vector<std::wstring> a;
    assert(s.empty());
    boost::split(a, s, boost::is_any_of(L", "), boost::token_compress_on);
    // Boost 1.45: true
    // Boost 1.46+: false
    // Assertion failed: s.empty() == a.empty(), file boost_split_empty_string.cpp, line 16
    assert(s.empty() == a.empty());
    return 0;
}

IMHO, the behaviour exposed in newer versions is a bug. Could anyone shed light on what happened in post-1.46?

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net


Olaf van der Spek

11 Jun 2012

3:03 p.m.

On Mon, Jun 11, 2012 at 4:58 PM, Mateusz Loskot <mateusz@loskot.net> wrote:

> ...
> IMHO, the behaviour exposed in newer versions is a bug. Could anyone shed light on what happened in post-1.46?

What does a contain?

--
Olaf

Mateusz Loskot

3:05 p.m.

On 11 June 2012 16:03, Olaf van der Spek <ml@vdspek.org> wrote:

> On Mon, Jun 11, 2012 at 4:58 PM, Mateusz Loskot <mateusz@loskot.net> wrote:
>> ...
>> IMHO, the behaviour exposed in newer versions is a bug. Could anyone shed light on what happened in post-1.46?
>
> What does a contain?

a [1]("")

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

Pavol Droba

6:09 p.m.

On Mon, 11 Jun 2012 16:58:40 +0200, Mateusz Loskot <mateusz@loskot.net> wrote:

> ...
> IMHO, the behaviour exposed in newer versions is a bug. Could anyone shed light on what happened in post-1.46?

Hi,

Please search the Boost archives. There was an extensive discussion about this particular issue.

The current behavior is consistent with the assertion that split always returns n+1 tokens, where n is the number of separators in the input string.

Best Regards,
Pavol.
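The n+1 rule described above can be illustrated with a small standard-library model. This is only a sketch of the rule as stated in the thread, not Boost's implementation: `split_keep_empty` and its `compress` flag are hypothetical names, and the flag is assumed to behave like `token_compress_on` by treating a run of adjacent separators as one.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): splits `input` at every character
// found in `separators` and keeps empty tokens, so the result always holds
// (number of separators) + 1 elements. With `compress` set, a run of adjacent
// separators is counted as a single separator.
std::vector<std::string> split_keep_empty(const std::string& input,
                                          const std::string& separators,
                                          bool compress = false)
{
    std::vector<std::string> tokens;
    std::string current;
    bool previous_was_separator = false;
    for (char c : input) {
        const bool is_separator = separators.find(c) != std::string::npos;
        if (!is_separator) {
            current += c;
        } else if (!(compress && previous_was_separator)) {
            tokens.push_back(current);  // close the token that ends at this separator
            current.clear();
        }
        previous_was_separator = is_separator;
    }
    tokens.push_back(current);  // the final token, which may be empty
    return tokens;
}
```

Under this model an empty input contains zero separators, so the result holds exactly one (empty) token, which matches the `a [1]("")` reported earlier in the thread.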

Olaf van der Spek

8:05 p.m.

On Mon, Jun 11, 2012 at 8:09 PM, Pavol Droba <droba@topmail.sk> wrote:

> Please search the Boost archives. There was an extensive discussion about this particular issue.
>
> The current behavior is consistent with the assertion that split always returns n+1 tokens where n is number of separators in the input string.

It's not what I'd expect. It would also be handy to include a link to the discussion in the docs for future reference.

--
Olaf

Mateusz Loskot

8:26 p.m.

On 11 June 2012 19:09, Pavol Droba <droba@topmail.sk> wrote:

> Please search the Boost archives. There was an extensive discussion about this particular issue.

Pavol,

You are right, there are some discussions about this behavior, for example

http://lists.boost.org/Archives/boost/2005/01/79380.php
https://svn.boost.org/trac/boost/ticket/534

and others. Thanks for pointing that out.

I have to admit that to me the current behavior is not intuitive. Also, I have failed to find any example of it in the docs.

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

Marshall Clow

12 Jun 2012

3:48 a.m.

On Jun 11, 2012, at 1:26 PM, Mateusz Loskot wrote:

> I have to admit that to me the current behavior is not intuitive. Also, I have failed to find any example of it in the docs.

Mateusz --

If you would be willing to write up a paragraph making this clear, I will put it into the docs.

-- Marshall

Marshall Clow, Idio Software <mailto:mclow.lists@gmail.com>

A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
-- Yu Suzuki

Mateusz Loskot

8:19 a.m.

On 12 June 2012 04:48, Marshall Clow <mclow.lists@gmail.com> wrote:

> If you would be willing to write up a paragraph making this clear, I will put it into the docs.

Marshall,

I'd rather leave it to those who introduced the changed behaviour, because they're aware of all the reasons behind it.

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

Pavol Droba

16 Jun 2012

8:03 p.m.

Hello,

On Mon, 11 Jun 2012 22:26:33 +0200, Mateusz Loskot <mateusz@loskot.net> wrote:

> I have to admit that to me the current behavior is not intuitive. Also, I have failed to find any example of it in the docs.

I will not argue whether the behavior is intuitive or not. Unfortunately, this term is different from person to person.

What is important is that the current setup is "correct". It follows a well-defined constraint and it is deterministic.

And actually the rationale behind it is quite simple. Imagine that you are parsing a CSV file. You need to get exactly the same number of elements regardless of whether the last token is empty or not.

It is much easier to remove the empty tokens than to guess whether they were supposed to be in the result or not.

On the other hand, you are right, the documentation should be more explicit.

Best Regards,
Pavol
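The point that dropping empty tokens afterwards is the easy direction can be sketched as a post-processing step on an already-split row. `drop_empty_tokens` is a hypothetical helper introduced only for illustration; it uses the standard library alone and assumes the row was produced by some splitter that keeps empty fields.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: removes every empty token from an already-split row.
// A caller that wants a fixed column count (as in the CSV case) simply does
// not call it; a caller that does not care about empty fields does.
std::vector<std::string> drop_empty_tokens(std::vector<std::string> tokens)
{
    tokens.erase(std::remove_if(tokens.begin(), tokens.end(),
                                [](const std::string& t) { return t.empty(); }),
                 tokens.end());
    return tokens;
}
```

The reverse operation is not possible: once empty fields have been discarded by the splitter, their positions in the row cannot be recovered.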

Olaf van der Spek

9:23 p.m.

On Sat, Jun 16, 2012 at 10:03 PM, Pavol Droba <droba@topmail.sk> wrote:

> And actually the rationale behind it is quite simple. Imagine that you are parsing a CSV file. You need to get the exactly same number of elements regardless whether the last token is empty or not.
>
> It is much easier to remove the empty tokens than to guess whether they were supposed to be in the result or not.
>
> On the other hand, you are right, the documentation should be more explicit.

It's not about that, is it? It's about the case where the input is empty.

--
Olaf

Mateusz Loskot

17 Jun 2012

12:42 a.m.

On 16 June 2012 22:23, Olaf van der Spek <ml@vdspek.org> wrote:

> It's not about that, is it? It's about the case where the input is empty.

And there seems to be more than one valid or practical approach possible, indeed. By the way, it looks like Python has chosen a similar approach to Boost:

>>> "".split(',')
['']
>>> len("".split(','))
1

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net
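For callers who prefer the other approach (no tokens at all for an empty input), one possible workaround is a thin guard around whatever splitter is in use. The sketch below is an assumption-laden illustration, not something proposed in the thread: `split_unless_empty` is a hypothetical name, and `splitter` stands for any callable that maps a string to a vector of tokens.

```cpp
#include <string>
#include <vector>

// Hypothetical wrapper: returns no tokens for an empty input and otherwise
// defers to `splitter`, any callable taking a std::string and returning a
// std::vector<std::string>.
template <typename Splitter>
std::vector<std::string> split_unless_empty(const std::string& input, Splitter splitter)
{
    if (input.empty()) {
        return std::vector<std::string>();  // the result callers of the old behaviour expect
    }
    return splitter(input);
}
```

In practice the callable would presumably wrap boost::split with the desired predicate and compress mode; keeping the guard separate leaves the library's n+1 contract untouched for everyone else.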

Mateusz Loskot

12:38 a.m.

On 16 June 2012 21:03, Pavol Droba <droba@topmail.sk> wrote:

> On Mon, 11 Jun 2012 22:26:33 +0200, Mateusz Loskot <mateusz@loskot.net> wrote:
>> I have to admit that to me the current behavior is not intuitive. Also, I have failed to find any example of it in the docs.
>
> I will not argue whether the behavior is intuitive or not.

Sure, and I'm not trying to either.

> Unfortunately this term is different from person to person.
>
> Important is that the current setup is "correct". It follows a well defined constraint and it is deterministic.

Yes, it's only a matter of awareness of this behaviour.

> And actually the rationale behind it is quite simple. Imagine that you are parsing a CSV file. You need to get the exactly same number of elements regardless whether the last token is empty or not.
>
> It is much easier to remove the empty tokens than to guess whether they were supposed to be in the result or not.

That is a good point indeed.

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net
