Result of boost::split against empty string changed in post-1.45 release

Hi,

I have a simple program that runs boost::split against an empty string. I'm testing with Boost 1.45 and later versions such as 1.46 and 1.49. As stated in the // comment below, the behaviour differs depending on the Boost version:

// https://gist.github.com/2910461
#include <cassert>
#include <string>
#include <vector>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>

int main()
{
    std::wstring s;
    std::vector<std::wstring> a;
    assert(s.empty());
    boost::split(a, s, boost::is_any_of(L", "), boost::token_compress_on);
    // Boost 1.45: true
    // Boost 1.46+: false
    // Assertion failed: s.empty() == a.empty(), file boost_split_empty_string.cpp, line 16
    assert(s.empty() == a.empty());
    return 0;
}

IMHO, the behaviour exposed in newer versions is a bug. Could anyone shed light on what happened in post-1.46?

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

On Mon, Jun 11, 2012 at 4:58 PM, Mateusz Loskot <mateusz@loskot.net> wrote:
> IMHO, the behaviour exposed in newer versions is a bug. Could anyone shed light on what happened in post-1.46?
What does a contain?
--
Olaf

On 11 June 2012 16:03, Olaf van der Spek <ml@vdspek.org> wrote:
> What does a contain?
It holds one element, an empty string:
a [1]("")
Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

On Mon, 11 Jun 2012 16:58:40 +0200, Mateusz Loskot <mateusz@loskot.net> wrote:
> IMHO, the behaviour exposed in newer versions is a bug. Could anyone shed light on what happened in post-1.46?
Hi,
Please search the Boost archives. There was an extensive discussion about this particular issue.
The current behavior is consistent with the assertion that split always returns n+1 tokens, where n is the number of separators in the input string.
Best Regards,
Pavol.
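
The n+1 rule can be illustrated without Boost. The following is a minimal sketch, not Boost's implementation: `split_keep_empty` is a hypothetical helper written only for this illustration, and it does not model `token_compress_on`. It treats every separator character as a boundary, so an input with n separators always produces n+1 tokens, and an empty input (n = 0) produces one empty token.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): splits `input` at every character
// found in `separators` and keeps empty tokens, so the result always has
// (number of separators in input) + 1 elements.
std::vector<std::string> split_keep_empty(const std::string& input,
                                          const std::string& separators) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = input.find_first_of(separators, start);
        if (pos == std::string::npos) {
            // Whatever follows the last separator is a token, even if empty.
            tokens.push_back(input.substr(start));
            return tokens;
        }
        tokens.push_back(input.substr(start, pos - start));
        start = pos + 1;
    }
}
```

Under this rule, `split_keep_empty("", ",")` yields a single empty token, which matches the result reported above for Boost 1.46 and later.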

On Mon, Jun 11, 2012 at 8:09 PM, Pavol Droba <droba@topmail.sk> wrote:
> The current behavior is consistent with the assertion that split always returns n+1 tokens where n is number of separators in the input string.
It's not what I'd expect. It would also be handy to include a link to the discussion in the docs for future reference.
--
Olaf

On 11 June 2012 19:09, Pavol Droba <droba@topmail.sk> wrote:
> Please search the Boost archives. There was an extensive discussion about this particular issue.
Pavol,
You are right, there are some discussions about this behavior, for example:
http://lists.boost.org/Archives/boost/2005/01/79380.php
https://svn.boost.org/trac/boost/ticket/534
and others. Thanks for pointing that out.
I have to admit that to me the current behavior is not intuitive. Also, I have failed to find any example of it in the docs.
Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

On Jun 11, 2012, at 1:26 PM, Mateusz Loskot wrote:
> I have to admit that to me the current behavior is not intuitive. Also, I have failed to find any example of it in the docs.
Mateusz --
If you would be willing to write up a paragraph making this clear, I will put it into the docs.
-- Marshall
Marshall Clow, Idio Software <mclow.lists@gmail.com>
A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
-- Yu Suzuki

On 12 June 2012 04:48, Marshall Clow <mclow.lists@gmail.com> wrote:
> Mateusz --
> If you would be willing to write up a paragraph making this clear, I will put it into the docs.
Marshall,
I'd rather leave it to those who introduced the changed behaviour, because they are aware of all the reasons behind it.
Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

Hello,
On Mon, 11 Jun 2012 22:26:33 +0200, Mateusz Loskot <mateusz@loskot.net> wrote:
> I have to admit that to me the current behavior is not intuitive. Also, I have failed to find any example of it in the docs.
I will not argue whether the behavior is intuitive or not. Unfortunately, that term means different things to different people.
What matters is that the current behavior is "correct": it follows a well-defined constraint and it is deterministic.
And actually the rationale behind it is quite simple. Imagine that you are parsing a CSV file. You need to get exactly the same number of elements regardless of whether the last token is empty or not.
It is much easier to remove the empty tokens than to guess whether they were supposed to be in the result or not.
On the other hand, you are right, the documentation should be more explicit.
Best Regards,
Pavol
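
The "remove the empty tokens afterwards" step mentioned above can be sketched with the standard erase-remove idiom. This is an illustration only: `drop_empty_tokens` is a hypothetical helper name, and it assumes the tokens have already been produced by some split call into a `std::vector<std::string>`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: removes every empty token from an already-split
// vector, preserving the order of the remaining tokens.
void drop_empty_tokens(std::vector<std::string>& tokens) {
    tokens.erase(std::remove_if(tokens.begin(), tokens.end(),
                                [](const std::string& t) { return t.empty(); }),
                 tokens.end());
}
```

A caller that wants an empty result for an empty input can apply this after splitting, while a CSV parser can skip it and keep the fixed field count.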

On Sat, Jun 16, 2012 at 10:03 PM, Pavol Droba <droba@topmail.sk> wrote:
> And actually the rationale behind it is quite simple. Imagine that you are parsing a CSV file. You need to get the exactly same number of elements regardless whether the last token is empty or not.
> It is much easier to remove the empty tokens than to guess whether they were supposed to be in the result or not.
> On the other hand, you are right, the documentation should be more explicit.
It's not about that, is it? It's about the case where the input is empty.
--
Olaf

On 16 June 2012 22:23, Olaf van der Spek <ml@vdspek.org> wrote:
> It's not about that, is it? It's about the case where the input is empty.
Indeed, and there seems to be more than one valid or practical approach. By the way, it looks like Python has chosen a similar approach to Boost:
    >>> "".split(',')
    ['']
    >>> len("".split(','))
    1
Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

On 16 June 2012 21:03, Pavol Droba <droba@topmail.sk> wrote:
> I will not argue whether the behavior is intuitive or not.
Sure, and I'm not trying to either.
> Unfortunately this term is different from person to person.
> Important is that the current setup is "correct". It follows a well defined constraint and it is deterministic.
Yes, it's only a matter of being aware of this behaviour.
> And actually the rationale behind it is quite simple. Imagine that you are parsing a CSV file. You need to get the exactly same number of elements regardless whether the last token is empty or not.
> It is much easier to remove the empty tokens than to guess whether they were supposed to be in the result or not.
That is a good point indeed.
Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net
Participants (4): Marshall Clow, Mateusz Loskot, Olaf van der Spek, Pavol Droba