whitespace and user-defined types in lexical_cast

I am having some trouble with lexical_cast that I think illustrates a problem with the underlying implementation. My basic problem is that I'd like lexical_cast to "work" for user-defined types (UDTs) that conform to lexical_cast's documented specification of OutputStreamable, InputStreamable, CopyConstructible and DefaultConstructible. For example: struct Udt{ int a, b Udt(int _a, int _b) : a(_a), b(_b){} Udt() : a(0), b(0){} }; std::ostream& operator<<(std::ostream& s, const Udt& u){ return s << u.a << " " << u.b; } std::istream& operator>>(std::istream& s, Udt& u){ return s >> u.a >> u.b; } I believe I should be able to write: bool operator==(const Udt &u1, const Udt &u2){ return u1.a==u2.a && u1.b==u2.b; } Udt(3,19) == lexical_cast<Udt>("3 19"); In fact, with the current (CVS) implementation of lexical_cast the above expression throws a bad_lexical_cast because of lexical_cast's insistence on unsetting the skipws flag in the internal lexical_stream. I could fix this by doing s.setf(std::ios::skipws) in the Udt extractor, but that just seems wrong. Nothing in lexical_cast's documentation suggests that the author of Udt should do that. The patch below (which applies to the CVS tree, NOT to boost 1_33) fixes this problem and makes the above lexical_cast "work". It limits the skipws manipulation to the case where the output type is a char or a wchar_t. This change preserves the desired behavior of lexical_cast<char>(" "), but it does not introduce the confusing whitespace senstivity when the target is not a char. It has the side-effect of changing the behavior of some of the existing unit tests so that they return reasonable results rather than throwing exceptions. I believe that users will find this behavior less surprisng. E.g., with the patch, lexical_cast<int>(" 123") returns 123 rather than throwing an exception, so the patch also contains changes to the unit tests that reflect this behavioral change. The patch also contains some new checks to make sure that strings with leading, trailing and embedded spaces convert "correctly". Finally, there is a new unit test: libs/conversion/tst/lexical_cast_udt_test.cpp that verifies the new behavior for user defined types. I hope this patch, or something like it, can make it into a future release of boost. John Salmon ------- [jsalmon@river boost]$ diff -Nau /dev/null libs/conversion/test/lexical_cast_udt_test.cpp --- /dev/null 2004-02-23 16:02:56.000000000 -0500 +++ libs/conversion/test/lexical_cast_udt_test.cpp 2007-01-27 18:23:12.234392725 -0500 @@ -0,0 +1,48 @@ +#include <boost/config.hpp> +#include <boost/test/unit_test.hpp> +#include <boost/lexical_cast.hpp> +#include <cfloat> +#include <iostream> +using namespace boost; + +void test_udt(); + +unit_test::test_suite *init_unit_test_suite(int, char *[]) +{ + unit_test_framework::test_suite *suite = + BOOST_TEST_SUITE("lexical_cast unit test"); + suite->add(BOOST_TEST_CASE(&test_udt)); + + return suite; +} + +// Udt: a simple user-defined type that models InputStreamable, +// OutputStreamable, CopyConstructable and DefaultConstructable. +// I.e., it should "work" with lexcical_cast. +struct Udt{ + int a, b; + Udt(int _a, int _b) : a(_a), b(_b){} + Udt() : a(0), b(0){} +}; + +std::ostream& operator<<(std::ostream& s, const Udt& f){ + return s << f.a << " " << f.b; +} + +std::istream& operator>>(std::istream& s, Udt& f){ + return s >> f.a >> f.b; +} + +bool operator==(const Udt& f1, const Udt& f2){ + return f1.a == f2.a && f1.b == f2.b; +} + +void test_udt() +{ + Udt f(13, -11); + BOOST_CHECK_EQUAL(f, lexical_cast<Udt>(lexical_cast<std::string>(f))); + BOOST_CHECK_EQUAL(f, lexical_cast<Udt>("13 -11")); + Udt g(99999, 0); + BOOST_CHECK_EQUAL(g, lexical_cast<Udt>(lexical_cast<std::string>(g))); + BOOST_CHECK_EQUAL(g, lexical_cast<Udt>("99999 0")); +} [jsalmon@river boost]$ Index: boost/lexical_cast.hpp =================================================================== RCS file: /cvsroot/boost/boost/boost/lexical_cast.hpp,v retrieving revision 1.33 diff -u -r1.33 lexical_cast.hpp --- boost/lexical_cast.hpp 16 Jan 2007 21:03:47 -0000 1.33 +++ boost/lexical_cast.hpp 27 Jan 2007 23:47:03 -0000 @@ -25,6 +25,7 @@ #include <boost/mpl/if.hpp> #include <boost/throw_exception.hpp> #include <boost/type_traits/is_pointer.hpp> +#include <boost/type_traits/is_same.hpp> #include <boost/call_traits.hpp> #include <boost/static_assert.hpp> #include <boost/detail/lcast_precision.hpp> @@ -528,7 +529,8 @@ public: lexical_stream(char_type* = 0, char_type* = 0) { - stream.unsetf(std::ios::skipws); + if( is_same<char, Target>::value || is_same<wchar_t, Target>::value) + stream.unsetf(std::ios::skipws); lcast_set_precision(stream, (Source*)0, (Target*)0); } ~lexical_stream() @@ -693,7 +695,8 @@ this->setg(start, start, finish); std::basic_istream<CharT> stream(static_cast<Base*>(this)); - stream.unsetf(std::ios::skipws); + if( is_same<char, InputStreamable>::value || is_same<wchar_t, InputStreamable>::value) + stream.unsetf(std::ios::skipws); lcast_set_precision(stream, (InputStreamable*)0); #if (defined _MSC_VER) # pragma warning( pop ) Index: libs/conversion/lexical_cast_test.cpp =================================================================== RCS file: /cvsroot/boost/boost/libs/conversion/lexical_cast_test.cpp,v retrieving revision 1.24 diff -u -r1.24 lexical_cast_test.cpp --- libs/conversion/lexical_cast_test.cpp 28 Oct 2006 19:33:32 -0000 1.24 +++ libs/conversion/lexical_cast_test.cpp 27 Jan 2007 23:47:03 -0000 @@ -140,14 +140,12 @@ BOOST_CHECK_EQUAL(1, lexical_cast<int>(true)); BOOST_CHECK_EQUAL(0, lexical_cast<int>(false)); BOOST_CHECK_EQUAL(123, lexical_cast<int>("123")); - BOOST_CHECK_THROW( - lexical_cast<int>(" 123"), bad_lexical_cast); + BOOST_CHECK_EQUAL(123, lexical_cast<int>(" 123")); BOOST_CHECK_THROW(lexical_cast<int>(""), bad_lexical_cast); BOOST_CHECK_THROW(lexical_cast<int>("Test"), bad_lexical_cast); BOOST_CHECK_EQUAL(123, lexical_cast<int>("123")); BOOST_CHECK_EQUAL(123, lexical_cast<int>(std::string("123"))); - BOOST_CHECK_THROW( - lexical_cast<int>(std::string(" 123")), bad_lexical_cast); + BOOST_CHECK_EQUAL(123, lexical_cast<int>(std::string(" 123"))); BOOST_CHECK_THROW( lexical_cast<int>(std::string("")), bad_lexical_cast); BOOST_CHECK_THROW( @@ -215,6 +213,12 @@ BOOST_CHECK_EQUAL(" ", lexical_cast<std::string>(" ")); BOOST_CHECK_EQUAL("", lexical_cast<std::string>("")); BOOST_CHECK_EQUAL("Test", lexical_cast<std::string>(std::string("Test"))); + BOOST_CHECK_EQUAL("TrailingSpace ", lexical_cast<std::string>(std::string("TrailingSpace "))); + BOOST_CHECK_EQUAL("TrailingSpace ", lexical_cast<std::string>("TrailingSpace ")); + BOOST_CHECK_EQUAL(" LeadingSpace ", lexical_cast<std::string>(std::string(" LeadingSpace "))); + BOOST_CHECK_EQUAL(" LeadingSpace ", lexical_cast<std::string>(" LeadingSpace ")); + BOOST_CHECK_EQUAL(" Embedded Space ", lexical_cast<std::string>(std::string(" Embedded Space "))); + BOOST_CHECK_EQUAL(" Embedded Space ", lexical_cast<std::string>(" Embedded Space ")); BOOST_CHECK_EQUAL(" ", lexical_cast<std::string>(std::string(" "))); BOOST_CHECK_EQUAL("", lexical_cast<std::string>(std::string(""))); } @@ -358,7 +362,7 @@ void test_no_whitespace_stripping() { - BOOST_CHECK_THROW(lexical_cast<int>(" 123"), bad_lexical_cast); + BOOST_CHECK_EQUAL(123, lexical_cast<int>(" 123")); BOOST_CHECK_THROW(lexical_cast<int>("123 "), bad_lexical_cast); } Index: libs/conversion/test/Jamfile =================================================================== RCS file: /cvsroot/boost/boost/libs/conversion/test/Jamfile,v retrieving revision 1.10 diff -u -r1.10 Jamfile --- libs/conversion/test/Jamfile 20 Jan 2007 13:17:35 -0000 1.10 +++ libs/conversion/test/Jamfile 27 Jan 2007 23:47:03 -0000 @@ -32,6 +32,7 @@ [ run lexical_cast_loopback_test.cpp <lib>../../test/build/boost_unit_test_framework ] [ run lexical_cast_abstract_test.cpp <lib>../../test/build/boost_unit_test_framework ] [ run lexical_cast_noncopyable_test.cpp <lib>../../test/build/boost_unit_test_framework ] + [ run lexical_cast_udt_test.cpp <lib>../../test/build/boost_unit_test_framework ] ; } Index: libs/conversion/test/Jamfile.v2 =================================================================== RCS file: /cvsroot/boost/boost/libs/conversion/test/Jamfile.v2,v retrieving revision 1.7 diff -u -r1.7 Jamfile.v2 --- libs/conversion/test/Jamfile.v2 20 Jan 2007 13:17:35 -0000 1.7 +++ libs/conversion/test/Jamfile.v2 27 Jan 2007 23:47:03 -0000 @@ -24,6 +24,7 @@ [ run lexical_cast_loopback_test.cpp ../../test/build//boost_unit_test_framework/<link>static ] [ run lexical_cast_abstract_test.cpp ../../test/build//boost_unit_test_framework/<link>static ] [ run lexical_cast_noncopyable_test.cpp ../../test/build//boost_unit_test_framework/<link>static ] + [ run lexical_cast_udt_test.cpp ../../test/build//boost_unit_test_framework/<link>static ] ;

john@thesalmons.org wrote:
I am having some trouble with lexical_cast that I think illustrates a problem with the underlying implementation. My basic problem is that I'd like lexical_cast to "work" for user-defined types (UDTs) that conform to lexical_cast's documented specification of OutputStreamable, InputStreamable, CopyConstructible and DefaultConstructible.
[ skiped ]
std::istream& operator>>(std::istream& s, Udt& u){ return s >> u.a >> u.b; }
I believe that this implementation is incorrect. If you put a writespace in output operator, you should be prepared for skipws flag and read the writespace explicitly: std::istream& operator>>(std::istream& s, Udt& u){ s >> u.a; if((s.flags() & std::ios_base::skipws) == 0) { char whitespace; s >> whitespace; } return s >> u.b; } -- Alexander Nasonov http://nasonov.blogspot.com There comes a time in every rightly constructed boy`s life when he has a raging desire to go somewhere and dig for hidden treasure. -- Henry Kissinger -- This quote is generated by: /usr/pkg/bin/curl -L http://tinyurl.com/veusy \ | sed -e 's/^document\.write(.//' -e 's/.);$/ --/' \ -e 's/<[^>]*>//g' -e 's/^More quotes from //' \ | fmt | tee ~/.signature-quote

On 1/28/07, Alexander Nasonov <alnsn@yandex.ru> wrote:
john@thesalmons.org wrote:
I am having some trouble with lexical_cast that I think illustrates a problem with the underlying implementation. My basic problem is that I'd like lexical_cast to "work" for user-defined types (UDTs) that conform to lexical_cast's documented specification of OutputStreamable, InputStreamable, CopyConstructible and DefaultConstructible.
[ skiped ]
std::istream& operator>>(std::istream& s, Udt& u){ return s >> u.a >> u.b; }
I believe that this implementation is incorrect. If you put a writespace in output operator, you should be prepared for skipws flag and read the writespace explicitly:
std::istream& operator>>(std::istream& s, Udt& u){ s >> u.a; if((s.flags() & std::ios_base::skipws) == 0) { char whitespace; s >> whitespace; } return s >> u.b; }
Yes. It's possible to write more "defensive" inserters and extractors. That's beside the point. The patch allows lexical_cast to work with Udts whose implementations are not so careful. It does so while passing all existing regressions (except for two which are modified to return a very reasonable result rather than throwing a somewhat surprising exception). It has no runtime overhead (assuming the compiler is smart enough to process the if() at compile-time). If anything it makes lexical_cast infinitesimally faster in the majority of cases because it avoids a call to stream.unsetf. Many extractors and inserters "out there" are not as defensive as perhaps they should be. Many authors assume that the streams they're working with are in their default state with respect to whitespace, precision, numeric base, etc. Yes. They should be more careful. Nevertheless, it seems to me that lexical_cast should do its best to work with/around such common deficiencies. This is similar to the rationale for why lexical_cast takes extraordinary measures to work with floating point types. Nothing in lexical_cast's documentation even hints that one should beware of whitespace when reading and writing from streams. There is no reason for lexical_cast to insist on this degree of care and attention to detail in the implementations of user-defined types. The naive pair of operator<< and operator>> in the original post work perfectly well when operating on iostreams in their standard, default state. They work perfectly well for the user who never heard of lexical_cast who uses the standard istringstream idiom: istringstream iss("13 19"); Udt Foo; iss >> Foo; assert( Foo == Udt(13, 19) ); Shouldn't lexical_cast do just as well? When I see something like the above in a colleague's code, and I suggest lexical_cast, should I really have to also warn him about carefully handling whitespace in the Udt extractor? John Salmon
-- Alexander Nasonov http://nasonov.blogspot.com

John Salmon <john <at> thesalmons.org> writes:
Yes. It's possible to write more "defensive" inserters and extractors. That's beside the point. The patch allows lexical_cast to work with Udts whose implementations are not so careful. It does so while passing all existing regressions (except for two which are modified to return a very reasonable result rather than throwing a somewhat surprising exception).
I don't have an access to my home system to check if your patch doesn't break this: BOOST_CHECK_EQUAL(" a b c " == lexical_cast<std::string>(" a b c ")); Does it?
Nothing in lexical_cast's documentation even hints that one should beware of whitespace when reading and writing from streams. There is no reason for lexical_cast to insist on this degree of care and attention to detail in the implementations of user-defined types. The naive pair of operator<< and operator>> in the original post work perfectly well when operating on iostreams in their standard, default state. They work perfectly well for the user who never heard of lexical_cast who uses the standard istringstream idiom:
istringstream iss("13 19"); Udt Foo; iss >> Foo; assert( Foo == Udt(13, 19) );
Shouldn't lexical_cast do just as well? When I see something like the above in a colleague's code, and I suggest lexical_cast, should I really have to also warn him about carefully handling whitespace in the Udt extractor?
I'll think about it. But noskipws is there for a good reason. You can search the boost archive. -- Alexander Nasonov

On 1/30/07, Alexander Nasonov <alnsn@yandex.ru> wrote:
John Salmon <john <at> thesalmons.org> writes:
Yes. It's possible to write more "defensive" inserters and extractors. That's beside the point. The patch allows lexical_cast to work with Udts whose implementations are not so careful. It does so while passing all existing regressions (except for two which are modified to return a very reasonable result rather than throwing a somewhat surprising exception).
I don't have an access to my home system to check if your patch doesn't break this: t> BOOST_CHECK_EQUAL(" a b c " == lexical_cast<std::string>(" a b c "));
Does it?
The patch satisfies that test on my system (Linux, gcc-3.3.3). I was surprised that such a test was not already in the regression suite. Something like it obviously should be, so I added some very similar tests just to be sure. These are in the original patch: diff -u -r1.24 lexical_cast_test.cpp --- libs/conversion/lexical_cast_test.cpp 28 Oct 2006 19:33:32 -0000 1.24 +++ libs/conversion/lexical_cast_test.cpp 27 Jan 2007 23:47:03 -0000 @@ -215,6 +213,12 @@ BOOST_CHECK_EQUAL(" ", lexical_cast<std::string>(" ")); BOOST_CHECK_EQUAL("", lexical_cast<std::string>("")); BOOST_CHECK_EQUAL("Test", lexical_cast<std::string>(std::string("Test"))); + BOOST_CHECK_EQUAL("TrailingSpace ", lexical_cast<std::string>(std::string("TrailingSpace "))); + BOOST_CHECK_EQUAL("TrailingSpace ", lexical_cast<std::string>("TrailingSpace ")); + BOOST_CHECK_EQUAL(" LeadingSpace ", lexical_cast<std::string>(std::string(" LeadingSpace "))); + BOOST_CHECK_EQUAL(" LeadingSpace ", lexical_cast<std::string>(" LeadingSpace ")); + BOOST_CHECK_EQUAL(" Embedded Space ", lexical_cast<std::string>(std::string(" Embedded Space "))); + BOOST_CHECK_EQUAL(" Embedded Space ", lexical_cast<std::string>(" Embedded Space ")); BOOST_CHECK_EQUAL(" ", lexical_cast<std::string>(std::string(" "))); BOOST_CHECK_EQUAL("", lexical_cast<std::string>(std::string("")));
Nothing in lexical_cast's documentation even hints that one should beware of whitespace when reading and writing from streams. There is no reason for lexical_cast to insist on this degree of care and attention to detail in the implementations of user-defined types. The naive pair of operator<< and operator>> in the original post work perfectly well when operating on iostreams in their standard, default state. They work perfectly well for the user who never heard of lexical_cast who uses the standard istringstream idiom:
istringstream iss("13 19"); Udt Foo; iss >> Foo; assert( Foo == Udt(13, 19) );
Shouldn't lexical_cast do just as well? When I see something like the above in a colleague's code, and I suggest lexical_cast, should I really have to also warn him about carefully handling whitespace in the Udt extractor?
I'll think about it. But noskipws is there for a good reason. You can search the boost archive.
Yes. There have been a few discussions over the years. There are two basic themes: 1 - somebody thinks that lexical_cast ought to work on a type whose operator>> isn't smart enough to handle a non-default skipws flag. The response is typically "your operator>> is broken", followed by more or less discussion about whether lexical_cast should accommodate "broken" operator>> anyway. We are repeating this theme. 2 - a desire for consistency between the handling of leading whitespace and trailing whitespace. I would suggest that the fact that questions of type 1 keep coming up is an indication that the current behavior is surprising and alternatives should be explored. If we can change it so that it is less surprising to "newbies" without breaking anything else and without introducing new complexity, I think we should. I would suggest that the answer to questions of type 2 is to observe that it's not trailing whitespace that causes the exception, it's trailing *anything*. As such, it should be clearly documented. In fact, the proposed text for the specification of lexical_cast in TR2 makes this very clear. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1973.html Boost's documentation should be reconciled with the TR2 proposal. Here's an excerpt: <snip> lexical_cast behavior is specified in terms of operator<< and operator>> on a std::basic_stringstream object. Implementations are not required to actually use a std::basic_stringstream object to achieve the required behavior. Implementations are permitted to provide specializations of the lexical_cast template. [Note: Implementations may use this "as if" leeway to achieve efficiency. -- end note.] template<typename Target, typename Source> Target lexical_cast(const Source& arg); Effects: * Inserts arg into an empty std::basic_stringstream object via operator<<. * Extracts the result, of type Target, from the std::basic_stringstream object via operator>>. Throws: bad_lexical_cast if: * Source is a pointer type. * fail() for the std::basic_stringstream object is true after either operator<< or operator>> is applied. * get() for the std::basic_stringstream object is not std::char_traits<char_type>::eof() after both operator<< and operator>> are applied. Returns: The result as created by the effects. Remarks: If Target is either std::string or std::wstring, stream extraction takes the whole content of the string, including spaces, rather than relying on the default operator>> behavior. </snip>
-- Alexander Nasonov
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (3)
-
Alexander Nasonov
-
John Salmon
-
john@thesalmons.org