Re: [boost] whitespace and user-defined types in lexical_cast

28 Jan 2007

On 1/28/07, Alexander Nasonov <alnsn@yandex.ru> wrote:
...
john@thesalmons.org wrote:
...
I am having some trouble with lexical_cast that I think illustrates
a problem with the underlying implementation.  My basic
problem is that I'd like lexical_cast to "work" for user-defined
types (UDTs) that conform to lexical_cast's documented specification of
OutputStreamable, InputStreamable, CopyConstructible and
DefaultConstructible.
[ skipped ]
...
std::istream& operator>>(std::istream& s, Udt& u){
    return s >> u.a >> u.b;
}
I believe that this implementation is incorrect. If you put whitespace
in the output operator, you should be prepared for the skipws flag being
cleared and read the whitespace explicitly:
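std::istream& operator>>(std::istream& s, Udt& u){
    s >> u.a;
    // With skipws cleared, the stream will not skip the separator for us,
    // so consume one character explicitly before reading the second field.
    if((s.flags() & std::ios_base::skipws) == 0)
    {
        char whitespace;
        s >> whitespace;
    }
    return s >> u.b;
}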
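To make the suggestion above concrete, here is a minimal sketch of that extractor exercised with skipws both set and cleared. The Udt definition, parse_defensively and check_defensive are hypothetical stand-ins (the original type was elided from the quote), and the field separator is assumed to be a single space.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Udt in the thread; the real definition was elided.
struct Udt {
    int a;
    int b;
    Udt() : a(0), b(0) {}
};

// The "defensive" extractor quoted above.
std::istream& operator>>(std::istream& s, Udt& u) {
    s >> u.a;
    if ((s.flags() & std::ios_base::skipws) == 0) {
        char whitespace;
        s >> whitespace;  // with skipws cleared this reads exactly one character
    }
    return s >> u.b;
}

// Hypothetical helper: parse `text` into `out`, optionally clearing skipws first.
bool parse_defensively(const std::string& text, bool skip_whitespace, Udt& out) {
    std::istringstream iss(text);
    if (!skip_whitespace)
        iss.unsetf(std::ios_base::skipws);
    return !(iss >> out).fail();
}

// Small self-check covering both stream states.
void check_defensive() {
    Udt u;
    assert(parse_defensively("13 19", true, u) && u.a == 13 && u.b == 19);
    Udt v;
    assert(parse_defensively("13 19", false, v) && v.a == 13 && v.b == 19);
}
```

Note that this version consumes exactly one character, so with skipws cleared an input with two spaces between the fields would still fail to parse.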
Yes.  It's possible to write more "defensive" inserters and
extractors.  That's beside the point.  The patch allows lexical_cast
to work with UDTs whose implementations are not so careful.  It does
so while passing all existing regression tests (except for two, which
are modified to return a very reasonable result rather than throwing a
somewhat surprising exception).  It has no runtime overhead (assuming
the compiler is smart enough to evaluate the if() at compile time).  If
anything, it makes lexical_cast infinitesimally faster in the majority
of cases because it avoids a call to stream.unsetf.

Many extractors and inserters "out there" are not as defensive as
perhaps they should be.   Many authors assume that the streams they're
working with are in their default state with respect to whitespace,
precision, numeric base, etc.  Yes.  They should be more careful.
Nevertheless, it seems to me that lexical_cast should do its best to
work with/around such common deficiencies.  This is similar to the
rationale for why lexical_cast takes extraordinary measures to work
with floating point types.

Nothing in lexical_cast's documentation even hints that one should
beware of whitespace when reading and writing from streams.  There is
no reason for lexical_cast to insist on this degree of care and
attention to detail in the implementations of user-defined types.  The
naive pair of operator<< and operator>> in the original post works
perfectly well when operating on iostreams in their standard, default
state.  It works perfectly well for a user who has never heard of
lexical_cast and uses the standard istringstream idiom:

std::istringstream iss("13 19");
Udt Foo;
iss >> Foo;
assert( Foo == Udt(13, 19) );

Shouldn't lexical_cast do just as well?  When I see something like the
above in a colleague's code, and I suggest lexical_cast, should I
really have to also warn him about carefully handling whitespace in
the Udt extractor?
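The difference can be reproduced without lexical_cast at all. The sketch below uses the naive extractor from the original post and parses the same text twice: once on a default-state stream, and once after clearing skipws, which is assumed here to be what the stream.unsetf call mentioned earlier does. The Udt definition, parse_udt and check_naive are hypothetical stand-ins, since the original type was elided from the thread.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Udt in the thread; the real definition was elided.
struct Udt {
    int a;
    int b;
    Udt() : a(0), b(0) {}
    Udt(int a_, int b_) : a(a_), b(b_) {}
};

bool operator==(const Udt& x, const Udt& y) {
    return x.a == y.a && x.b == y.b;
}

// The naive extractor from the original post: relies on skipws being set.
std::istream& operator>>(std::istream& s, Udt& u) {
    return s >> u.a >> u.b;
}

// Hypothetical helper: parse `text` into `out`, optionally clearing skipws first.
bool parse_udt(const std::string& text, bool skip_whitespace, Udt& out) {
    std::istringstream iss(text);
    if (!skip_whitespace)
        iss.unsetf(std::ios_base::skipws);
    return !(iss >> out).fail();
}

// Small self-check: default state succeeds, cleared skipws fails.
void check_naive() {
    Udt ok;
    assert(parse_udt("13 19", true, ok) && ok == Udt(13, 19));
    Udt bad;
    assert(!parse_udt("13 19", false, bad));  // second field hits the space and fails
}
```

With skipws cleared, the read of the second int starts at the space, finds no digits, and sets failbit, which is the kind of failure a caller of lexical_cast would see as an exception.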

John Salmon
...
--
Alexander Nasonov
http://nasonov.blogspot.com