Re: [boost] [rfc] I/O Library Design

18 Jun 2007

      Sebastian Redl <sebastian.redl@getdesigned.at> writes:
...
A few weeks ago, a discussion that followed the demonstration of the
binary_iostream library made me think about the standard C++ I/O and
what I would expect from an I/O model.
...
Now I have a preliminary design document ready and would like to have
some feedback from the Boost community on it.
...
The document can be found here:
http://windmuehlgasse.getdesigned.at/newio/
...
I'd especially like input on the unresolved issues, but all comments are
welcome, even if you tell me that what I'm doing is completely pointless
and misguided. (At least I'd know not to waste my time with refining and
implementing the design. :-))
I am pleased to see you taking an interest in a new I/O library for C++.
The existing C++ I/O facilities have always bothered me, but I've never
gotten around to trying to write something better.  I have a number of
comments.  They aren't particularly well structured, because I didn't
bother to try to reorganize them after initially just writing down
thoughts as they occurred to me.

- I think it is important to look at the boost iostreams architecture,
  and make sure to include or reuse any of the ideas or even actual code
  if possible.  One idea from that library to consider is the
  direct/indirect device distinction.

- Binary transport layer issue:

  Make the "binary transport layer" the "byte transport layer" to make
  it clear that it is for bytes.

  Platforms with unusual features, like 9-bit bytes or an inability to
  handle types smaller than 32 bits, can possibly still implement the
  interface for a text/character transport layer, perhaps on top of
  some other lower-level transport that need not be part of the boost
  library.  Clearly, the text encoding and decoding would have to be
  done differently anyway.

- Asynchronous issue:

  Asynchronous I/O is extremely useful, but it also requires a very
  different architecture: something like asio's io_service is needed to
  manage requests, and a function to call on completion or error must
  be provided.

  One issue is that there are very large differences between platforms
  (Windows and Linux).  On Linux, asynchronous I/O via efficient polling
  for readiness is possible for sockets and pipes using epoll (and
  somewhat less efficiently using select and poll), but these mechanisms
  cannot be used for regular files.  I think there may be other
  asynchronous I/O mechanisms on Linux that do support regular files, at
  least on some filesystems, but which are not very easily compatible
  with epoll and other methods suitable for sockets.  Furthermore, even
  if read and write are asynchronous, open will always be synchronous on
  Linux.  It may not be feasible, therefore, to implement a proper
  asynchronous I/O interface on Linux.  Even on Windows, I believe it may
  not be possible to get asynchronous open.

  Thus, I think I agree that it would be better to avoid including an
  asynchronous I/O interface in this library, although probably a bit
  more thought should go into the decision before it is made.

- Seeking:

  Maybe make multiple mark/reset use the same interface as seeking, for
  simplicity.  Just define that a seeking device has the additional
  restriction that the mark type is an offset, and the argument to seek
  need not be the result of a call to tell.

  Another issue is whether to standardize the return type from tell,
  like std::streampos in the C++ iostreams library.
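
  To make the mark/reset idea concrete, here is a minimal sketch.  All
  of the names in it (memory_source, mark_type, tell, seek, peek_byte)
  are made up for illustration and are not taken from the design
  document.  The point is only that generic code can be written against
  tell/seek as mark/reset, while a seekable device additionally promises
  that mark_type is an offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory byte source. None of these names come from the
// design document; they only illustrate using one interface for both
// mark/reset and seeking.
class memory_source {
public:
    // For a seekable device the mark type is simply an offset.
    typedef std::size_t mark_type;

    explicit memory_source(const std::vector<unsigned char>& data)
        : data_(data), pos_(0) {}

    // Read up to n bytes into out; returns the number of bytes read.
    std::size_t read(unsigned char* out, std::size_t n) {
        std::size_t avail = data_.size() - pos_;
        std::size_t count = n < avail ? n : avail;
        for (std::size_t i = 0; i < count; ++i) out[i] = data_[pos_ + i];
        pos_ += count;
        return count;
    }

    // "mark": report the current position.
    mark_type tell() const { return pos_; }

    // "reset": return to a mark. Because mark_type is an offset here, any
    // in-range offset is accepted, not only values obtained from tell().
    bool seek(mark_type m) {
        if (m > data_.size()) return false;
        pos_ = m;
        return true;
    }

private:
    std::vector<unsigned char> data_;
    std::size_t pos_;
};

// Generic code written against mark/reset only: look at the next byte and
// restore the position afterwards. Returns -1 at end of stream.
template <typename Source>
int peek_byte(Source& src) {
    typename Source::mark_type m = src.tell();  // mark
    unsigned char b = 0;
    int result = (src.read(&b, 1) == 1) ? static_cast<int>(b) : -1;
    src.seek(m);                                // reset
    return result;
}
```

  A device that supports only mark/reset could use an opaque mark_type
  and still work with peek_byte unchanged.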

- Binary formatting (perhaps the name data format would be better?):

  I think it is important to provide a way to format
  {uint,int}{8,16,32,64}_t as either little or big endian two's
  complement (and possibly also one's complement).  It might be useful
  to look at the not-yet-official boost endian library in the vault.

  A similar variety of output formats for floating point types should
  also be supported.

  It is also important to provide the most efficient output format as
  an option (i.e. writing the in-memory representation of the type
  directly, via e.g. reinterpret_cast).  It should probably also be
  possible to determine using the library at compile time what the
  native format is.  It is not clear what to do about the issue of some
  platforms not using any standard format as their native format.
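
  As a rough sketch of what fixed-endian formatting could look like for
  one integer width (the helper names are hypothetical, and the other
  widths would follow the same pattern): the fixed-format functions use
  shifts, so they do not depend on the host byte order.  For the native
  format I have used memcpy in place of reinterpret_cast, since it
  avoids alignment and aliasing problems.  The byte order check shown
  here runs at run time; a compile-time version would need platform
  macros or, in C++20, std::endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helpers; the names are illustrative only.

// Store v as 4 bytes, least significant byte first.
inline void store_le32(std::uint32_t v, unsigned char* out) {
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFFu);
}

// Store v as 4 bytes, most significant byte first.
inline void store_be32(std::uint32_t v, unsigned char* out) {
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = static_cast<unsigned char>((v >> (8 * (3 - i))) & 0xFFu);
}

inline std::uint32_t load_le32(const unsigned char* in) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return v;
}

inline std::uint32_t load_be32(const unsigned char* in) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[i]) << (8 * (3 - i));
    return v;
}

// Signed values: conversion to uint32_t is defined modulo 2^32, which
// yields the two's complement bit pattern on any host.
inline void store_le32_signed(std::int32_t v, unsigned char* out) {
    store_le32(static_cast<std::uint32_t>(v), out);
}

// Native format: copy the in-memory representation unchanged.
inline void store_native32(std::uint32_t v, unsigned char* out) {
    std::memcpy(out, &v, sizeof v);
}

// Run-time probe of the host byte order (assumes the host is either
// little or big endian, not a mixed-endian machine).
inline bool native_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

  With helpers like these, store_le32(0x11223344, buf) leaves the bytes
  44 33 22 11 in buf regardless of the platform, whereas
  store_native32 gives whatever the host uses.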

- Header vs Precompiled:

  I think as much should be separately compiled as possible, but I also
  think that type erasure should not be used in any case where it will
  significantly compromise performance.

- The "byte" stream and the character stream, while conceptually
  different, should probably both be considered just "streams" of
  particular POD types.  The interfaces will in general be exactly the
  same as far as reading, writing, seeking, filtering.
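
  A small sketch of what I mean, with hypothetical class and function
  names (vector_source, vector_sink, copy_all): the source and sink are
  templates on the element type, and a single generic algorithm serves
  both a byte stream and a stream of 32-bit character values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical source over any POD element type T.
template <typename T>
class vector_source {
public:
    typedef T value_type;

    explicit vector_source(const std::vector<T>& data)
        : data_(data), pos_(0) {}

    // Read up to n elements into out; returns the number read.
    std::size_t read(T* out, std::size_t n) {
        std::size_t avail = data_.size() - pos_;
        std::size_t count = n < avail ? n : avail;
        for (std::size_t i = 0; i < count; ++i) out[i] = data_[pos_ + i];
        pos_ += count;
        return count;
    }

private:
    std::vector<T> data_;
    std::size_t pos_;
};

// Hypothetical sink over any POD element type T.
template <typename T>
class vector_sink {
public:
    typedef T value_type;

    void write(const T* in, std::size_t n) {
        data_.insert(data_.end(), in, in + n);
    }

    const std::vector<T>& data() const { return data_; }

private:
    std::vector<T> data_;
};

// One algorithm for byte streams and character streams alike: copy
// everything from src to dst and return the number of elements copied.
template <typename Source, typename Sink>
std::size_t copy_all(Source& src, Sink& dst) {
    typename Source::value_type buf[64];
    std::size_t total = 0;
    std::size_t n = 0;
    while ((n = src.read(buf, 64)) != 0) {
        dst.write(buf, n);
        total += n;
    }
    return total;
}
```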

- Text transport:

  I don't think this layer should be restricted to Unicode encodings.
  Rather, a text transport should just be a "stream" of type T, where T
  might be uint8_t, uint16_t, uint32_t depending on the character
  encoding.  For full generality, the library should provide facilities
  for converting between any two of a large list of encodings.  (For
  simplicity, some of these conversions might internally be implemented
  by converting first to one encoding, like UTF-16, and then converting
  to the other encoding, if a direct conversion is not coded specially.)

  I think it is important to require that all of a minimal set of
  encodings are supported, where this minimal set should include at
  least all of the common Unicode encodings, and perhaps all of the
  ISO-8859-* encodings as well, in addition to ASCII.
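
  To illustrate the pivot idea from above, here is a sketch that
  converts ISO-8859-1 to UTF-8 by going through UTF-16.  The function
  names are hypothetical, and replacing unpaired surrogates with U+FFFD
  is my own choice of error policy for the example, not something the
  design prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

typedef std::vector<std::uint16_t> utf16_units;

// ISO-8859-1 to UTF-16: every byte maps to the code point of the same
// value, so each byte becomes exactly one UTF-16 code unit.
inline utf16_units latin1_to_utf16(const std::string& in) {
    utf16_units out;
    for (std::size_t i = 0; i < in.size(); ++i)
        out.push_back(static_cast<unsigned char>(in[i]));
    return out;
}

// UTF-16 to UTF-8. Unpaired surrogates are replaced with U+FFFD.
inline std::string utf16_to_utf8(const utf16_units& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with a following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high one
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// No direct ISO-8859-1 to UTF-8 converter is written; the conversion is
// composed from the two steps through the pivot encoding.
inline std::string latin1_to_utf8(const std::string& in) {
    return utf16_to_utf8(latin1_to_utf16(in));
}
```

  With N encodings this needs only a decoder and an encoder per
  encoding (2N functions) instead of one function per pair, at the cost
  of an intermediate buffer.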

- Text formatting:

  For text formatting, I think it would be very useful to look at the
  IBM ICU library.  It may in fact make sense to leave text formatting
  as a separate library (for example, as a Unicode library), since it is
  somewhat encoding specific, and a huge task by itself and not very
  related to this I/O library.  As long as the I/O library provides a
  suitable character stream interface, an arbitrary formatting facility
  can be used on top of it.

-- 
Jeremy Maitin-Shepard