
Phil Endecott wrote:
** Formatting of user-defined types often broken in practice.
The ability to write overloaded functions to format user-defined types for text I/O is attractive in theory, but in practice it always lets me down somewhere. My main complaint is that neither of these works:
typedef std::set<thing> things_t;
std::ostream& operator<<(std::ostream& os, const things_t& things) { .... } // doesn't work because things_t is a typedef
I see no specific reason why that would fail, as long as there isn't an operator << for std::set<thing> somewhere already. It's even legal, I think, because std::set<thing> depends on a type not in namespace std. (You can't overload for std::set<int>, for example, by the rules of the standard.)
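For illustration, something along these lines should compile and do what you want, with thing standing in for your own type; the typedef is irrelevant, since overload resolution only ever sees the underlying type std::set<thing>:

#include <iostream>
#include <set>

namespace myns {
    struct thing {
        int value;
        bool operator<(const thing& rhs) const { return value < rhs.value; }
    };

    typedef std::set<thing> things_t;   // just another name for std::set<thing>

    // Found via argument-dependent lookup because thing lives in myns.
    std::ostream& operator<<(std::ostream& os, const things_t& things)
    {
        os << '{';
        for (things_t::const_iterator it = things.begin(); it != things.end(); ++it)
            os << ' ' << it->value;
        return os << " }";
    }
}

int main()
{
    myns::things_t ts;
    myns::thing t = { 42 };
    ts.insert(t);
    std::cout << ts << '\n';   // prints "{ 42 }"
}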
uint8_t i; cout << i; // doesn't work because uint8_t is actually a char
Yes, that's annoying. In my opinion, it's a defect in the standard that unsigned and signed char are treated as characters instead of small integers. Characters are what plain char is for.
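The usual workaround today is to force an integer promotion before the value reaches operator<<, e.g.:

#include <iostream>
#include <stdint.h>

int main()
{
    uint8_t i = 65;
    std::cout << i << '\n';                        // prints 'A': uint8_t is (usually) unsigned char
    std::cout << static_cast<unsigned>(i) << '\n'; // prints 65
    std::cout << +i << '\n';                       // unary + promotes to int, also prints 65
}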
When I do have a class, I often find that there is more than one way in which I'd like to format it, but there is only one operator<< to overload. And often I want to put the result of the formatting into a string, not a stream.
I have an idea for a formatting system that should address all these issues. Basically, a format string would be able to specify, in an extensible and type-safe way, how to format an object. The format string would be used to look up a formatter in some sort of registry.
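Just to sketch the direction (the names and the string-keyed lookup below are purely illustrative, not the actual design):

#include <iostream>
#include <map>
#include <string>

// A "formatter" for T is just a function that writes a T to a stream.
template <typename T>
struct formatter_registry {
    typedef void (*formatter)(std::ostream&, const T&);
    static std::map<std::string, formatter>& table()
    {
        static std::map<std::string, formatter> t;
        return t;
    }
    static void add(const std::string& name, formatter f) { table()[name] = f; }
};

// Look up a named formatter for the value's type; fall back to operator<<.
template <typename T>
void format_to(std::ostream& os, const std::string& name, const T& value)
{
    typedef std::map<std::string, typename formatter_registry<T>::formatter> map_t;
    const map_t& m = formatter_registry<T>::table();
    typename map_t::const_iterator it = m.find(name);
    if (it != m.end())
        it->second(os, value);
    else
        os << value;
}

struct date { int year, month, day; };

std::ostream& operator<<(std::ostream& os, const date& d)
{ return os << d.day << '/' << d.month << '/' << d.year; }

void iso_date(std::ostream& os, const date& d)
{ os << d.year << '-' << d.month << '-' << d.day; }

int main()
{
    formatter_registry<date>::add("iso", &iso_date);
    date d = { 2007, 12, 2 };
    format_to(std::cout, "iso", d); std::cout << '\n';      // 2007-12-2
    format_to(std::cout, "default", d); std::cout << '\n';  // 2/12/2007 via operator<<
}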
** lexical_cast<> uses streams; it should be the reverse.
Currently we implement formatters that output to streams. We implement lexical_cast using stringstreams. Surely it would be preferable to implement formatters as specialisations of lexical_cast to a string (or character sequence / output iterator / whatever) and to implement formatted output to streams on top of that. I suppose you could argue that the stream model is better for very large amounts of output since you don't accumulate it all in a temporary string, but I've never encountered a case where that would matter.
I have written in another post why I think the stream interface is better. Efficiency is one part of the issue. Another is that the code is simpler that way for the library implementer, and the difference is transparent for the library user. Also, it means that it's easier to switch the string type used (something that is not uncommon).
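For illustration, string output falls out of a stream-based formatter in a few lines; to_string below is just an ad-hoc helper doing essentially what lexical_cast does internally with a stringstream:

#include <iostream>
#include <sstream>
#include <string>

struct point { int x, y; };

// The one formatter the user writes: stream-based.
std::ostream& operator<<(std::ostream& os, const point& p)
{ return os << '(' << p.x << ',' << p.y << ')'; }

// String conversion layered on top of the stream formatter.
template <typename T>
std::string to_string(const T& value)
{
    std::ostringstream os;
    os << value;
    return os.str();
}

int main()
{
    point p = { 3, 4 };
    std::cout << p << '\n';         // stream output
    std::string s = to_string(p);   // string output needs no second formatter
    std::cout << s << '\n';
}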
** Formatting state has the wrong scope
void f() { scoped_fmt_state(cout,hex); cout << ....; if (...) throw; cout << .....; }
Hmm, I think that's too much work. I'd be happy with NO formatting state in the stream, and to use explicit formatting when I want it:
cout << hex(x); OR cout << format("%08x",x); OR fprintf(stdout,"%08x",x);
I absolutely agree. Stateful formatting is generally not good. The only state that should be involved in formatting is the locale in use.
(And it _is_ type-safe if you are using a compiler that treats it as special.)
... _and_ if you use a string literal as the formatting string. Far from guaranteed, especially when localizing.
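For what it's worth, a stateless hex(x) in that spirit can be written today as a small wrapper whose inserter restores the stream flags afterwards; hex_fmt is just an illustrative name chosen to avoid clashing with std::hex:

#include <iostream>
#include <ios>

// Wrapper carrying a value that should be printed in hex.
template <typename T>
struct hex_fmt_t { T value; };

template <typename T>
hex_fmt_t<T> hex_fmt(T value) { hex_fmt_t<T> h = { value }; return h; }

// The inserter changes the base for this one value only and restores the
// old flags afterwards, so no formatting state lingers in the stream.
template <typename T>
std::ostream& operator<<(std::ostream& os, const hex_fmt_t<T>& h)
{
    std::ios::fmtflags old = os.flags();
    os << std::hex << std::showbase << h.value;
    os.flags(old);
    return os;
}

int main()
{
    int x = 255;
    std::cout << hex_fmt(x) << ' ' << x << '\n';   // "0xff 255": decimal is back automatically
}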
** Too much disconnect between POSIX file descriptors and std::streams
I cannot bring myself to see this specific issue as a defect. Addressing it would mean platform coupling.
I have quite a lot of code that uses sockets and serial ports, does ioctls on file descriptors, and things like that. So I have a FileDescriptor class that wraps a file descriptor with methods that implement simple error-trapping wrappers around the POSIX function calls.
Is there any specific reason you cannot implement a streambuffer that acts on a file descriptor? A streambuffer, despite its name, doesn't have to buffer data.
Currently, there's a strong separation between what I can do to a FileDescriptor (i.e. reads and writes) and what I can do to a stream. There is no reason why this has to be the case. It should be possible to add buffering to a FileDescriptor *and only add buffering*, and it should be possible to do formatted I/O on a non-buffered FileDescriptor.
Yes. It is possible now. It should be easier with my system.
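For reference, a minimal unbuffered output streambuf over a POSIX descriptor is only a few lines; this is a bare sketch with next to no error handling, not part of any proposed library:

#include <ostream>
#include <streambuf>
#include <unistd.h>   // write()

// Every character handed to this streambuf is passed straight to write();
// no buffer area is set up, so nothing is actually buffered.
class fd_streambuf : public std::streambuf {
public:
    explicit fd_streambuf(int fd) : fd_(fd) {}
protected:
    virtual int_type overflow(int_type c)
    {
        if (c == traits_type::eof())
            return traits_type::not_eof(c);
        char ch = traits_type::to_char_type(c);
        return ::write(fd_, &ch, 1) == 1 ? c : traits_type::eof();
    }
    virtual std::streamsize xsputn(const char* s, std::streamsize n)
    {
        return ::write(fd_, s, n);
    }
private:
    int fd_;
};

int main()
{
    fd_streambuf buf(1);        // 1 == STDOUT_FILENO
    std::ostream out(&buf);
    out << "hello from a file descriptor\n";
}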
** Character sets need support
This is a hugely complex area which native English speakers are uniquely unqualified to talk about.
Luckily, I'm not a native English speaker. I have some experience with the issues involved, although my experience is limited to German umlauts. I have experienced the pains of unexpected encoding use in web applications. This is why I really, really think that all C++ types involved in text handling need to be tagged with the encoding used.
I think that a starting point would be for someone to write a Boost interface to iconv (I have an example that makes functors for iconv conversions), and to write a tagged-string class that knows its encoding (either a compile-time type tag or a run-time enumeration tag or both). Ideally we'd spend a couple of years getting used to using that, and then consider how it can best integrate with IO.
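To illustrate the compile-time variant of the tag (only a sketch with invented names; real conversions between encodings would go through iconv or similar):

#include <iostream>
#include <string>

// Empty structs serve as compile-time encoding tags.
struct latin1 {};
struct utf8 {};

// A string that knows its encoding at compile time.
template <typename Encoding>
class tagged_string {
public:
    explicit tagged_string(const std::string& bytes) : bytes_(bytes) {}
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// Mixing encodings without an explicit conversion refuses to compile.
template <typename Encoding>
tagged_string<Encoding> operator+(const tagged_string<Encoding>& a,
                                  const tagged_string<Encoding>& b)
{
    return tagged_string<Encoding>(a.bytes() + b.bytes());
}

int main()
{
    tagged_string<utf8>   a("Gr\xC3\xBC\xC3\x9F");   // "Grüß" in UTF-8
    tagged_string<latin1> b("Gr\xFC\xDF");           // the same text in Latin-1
    tagged_string<utf8>   c = a + a;                  // fine
    // tagged_string<utf8> d = a + b;                 // error: encodings differ
    std::cout << c.bytes().size() << " bytes\n";
}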
I don't want to wait that long ;) I have in fact considered this issue and have drawn the outline of such a character handling and conversion library. In fact, a subset of it is absolutely needed for the text layer of my I/O plans.

Sebastian Redl