
Dear All,

I've been very pleased with the discussion of the iostreams library so far. I'd like to thank all those who have posted reviews and comments.

One thing that has been lacking, to a large extent, is a detailed examination of the source code. I understand that this may seem a daunting task, since the library is quite large. Therefore, I would like to point out the relatively small bits of the source which contain the core implementation. I'll also give a brief description of the filtering framework, to address the concern raised by Carlo Wood that large amounts of data are being copied needlessly. Finally -- the easiest part -- I'll mention what I think are the most important outstanding issues concerning the library interface.

-----------------------------------------------------

Part I: Overview of the source.

Even if you only look at one of the following three sections, I'd be very happy :-)

A. Filter and resource support. This provides the infrastructure for the generic i/o operations read, write, close, etc. It contains no definitions of streams or stream buffers:

    <boost/io/categories.hpp>
    <boost/io/concepts.hpp>
    <boost/io/operations.hpp>
    <boost/io/io_traits.hpp>
    <boost/io/detail/ios_traits.hpp>
    <boost/io/detail/wrap_unwrap.hpp>

B. Policy-based streams and stream buffers. This provides the implementation of the fundamental library component streambuf_facade:

    <boost/io/detail/adapters/filter_adapter.hpp>
    <boost/io/detail/adapters/resource_adapter.hpp>
    <boost/io/detail/streambufs/direct_streambuf.hpp>
    <boost/io/detail/streambufs/indirect_streambuf.hpp>
    <boost/io/streambuf_facade.hpp>

C. Filtering streams and stream buffers. This provides the infrastructure for chaining filters:

    <boost/io/detail/chain.hpp>
    <boost/io/detail/streambufs/chainbuf.hpp>
    <boost/io/filtering_streambuf.hpp>

-----------------------------------------------------

Part II: Implementation of filter chains.

The following diagram should be viewed with a fixed-width font:

      streambuf1        streambuf2        streambuf3
       filter1     ->    filter2     ->    resource
    [------------]    [------------]    [------------]
    buf1 (size n1)    buf2 (size n2)    buf3 (size n3)
          ^
          |
          |
    filtering_stream

Here the end user writes to the filtering_stream at the bottom of the diagram, and the data is passed through streambuf1, streambuf2 and streambuf3, each of which contains a character buffer and either a filter or a resource.

The end user, through the filtering stream's functions and operators, writes *directly* to buf1. When buf1 is full, the buffer's contents are passed to filter1 as follows:

    filter1.write(streambuf2, buf1, buf1 + n1);

This writes *directly* to buf2. When buf2 is full, filter2 is called:

    filter2.write(streambuf3, buf2, buf2 + n2);

This writes *directly* to buf3. When buf3 is full, resource is called:

    resource.write(buf3, buf3 + n3);

In the general case, data is copied no more than absolutely necessary.
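To make the above concrete, here is a small, compilable sketch of the write-chaining idea. It is *not* code from the library: stdout_resource and toupper_filter are hypothetical stand-ins, and only the shape of write(next, begin, end) mirrors the calls shown above. The private buffer plays the role of buf2, and the inner loop shows the "flush downstream when full" behavior:

    #include <cctype>
    #include <cstdio>
    #include <cstring>

    // Final link in the chain: writes a character range to stdout.
    // Plays the role of 'resource' in the diagram above.
    struct stdout_resource {
        void write(const char* begin, const char* end)
        {
            std::fwrite(begin, 1, static_cast<std::size_t>(end - begin), stdout);
        }
    };

    // A filter that uppercases data and forwards it downstream. 'Next'
    // may be another filter link or the resource itself.
    template<typename Next>
    struct toupper_filter {
        void write(Next& next, const char* begin, const char* end)
        {
            char buf[64];                  // plays the role of buf2
            while (begin != end) {
                std::size_t n = 0;
                while (begin != end && n < sizeof(buf))
                    buf[n++] = static_cast<char>(
                        std::toupper(static_cast<unsigned char>(*begin++)));
                next.write(buf, buf + n);  // flush downstream when full
            }
        }
    };

    int main()
    {
        stdout_resource resource;
        toupper_filter<stdout_resource> filter;
        const char* msg = "hello, filter chain\n";
        filter.write(resource, msg, msg + std::strlen(msg));
    }

Note that toupper_filter is precisely the kind of filter described in case 2 below: it modifies characters one by one, so the copy into its private buffer is, strictly speaking, avoidable.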
There are some special cases where the copying is wasteful, though. For instance:

1. If filter1 is just a passive observer -- simply counting the occurrences of '\n', say, or copying the data to a logging stream -- then the end user should really be writing directly to buf2. Filter1 could process the data using the same buffer as filter2.

2. Same as above, except that filter1 also modifies the data in place, making character-by-character modifications (e.g., a toupper filter). This can be handled the same way.

3. If resource is a stream or stream buffer, it can be assumed to do its own buffering. In that case, filter2 should write directly to resource, instead of to streambuf3:

    filter2.write(resource, buf2, buf2 + n2);

These three cases can be handled easily by modifying the existing framework. I didn't add special treatment because the idea occurred to me rather late in development.

There is another class of cases in which the current setup is wasteful, but I think it is rather domain-specific:

4. Most filters in the chain modify only small parts of the character sequence, leaving big chunks unchanged. For example, a web server might have a chain of filters which add headers to an HTTP message but leave the body untouched. In this case, it might be better to allow the filters to send the body through the chain simply as a pointer to a file or memory block. If at some point a filter needs to modify the body, it would read the actual data, modify it, and send it to the next filter as a sequence of characters rather than as a pointer.

Optimizing case 4 would require a major library extension. My feeling is that it is not necessary at this point, but I'd like to know what others think.

-----------------------------------------------------

Part III: Interface questions.

1. How should the library handle read and write requests which return fewer characters than requested, even though there has been no error and EOF has not been reached? Some answer to this question is necessary to allow the library to be extended later to handle models other than ordinary blocking i/o. I mention three possibilities here, http://tinyurl.com/6r8p2, but only two are realistic. I'm interested to know how important people think this issue is, and what the best way to resolve it would be.

2. The stack interface. Is the interface to the underlying filter chains rich enough? Originally it was similar to std::list, so that you could disconnect chains at arbitrary points, store them, and reattach them later. I decided there wasn't much use for this, so I simplified the interface.

3. Exceptions. James Kanze has argued repeatedly that protected stream buffer functions should not throw exceptions (http://tinyurl.com/5o34x). I try to make the case for exceptions here: http://tinyurl.com/6r8p2. What do people think?

4. Closable. Many output filters need to write additional information to a stream just before it closes. For instance, a gzip_compressor needs to write a checksum and the message length. This is one of the motivations for the 'Closable' concept (http://tinyurl.com/3pg5j). Sometimes, however, the beginning -- rather than the end -- of a character sequence requires special treatment. For instance, a gzip_compressor must write the gzip header before any compressed data is written. Currently this has to be implemented using a 'first-time switch' (a sketch appears in the postscript below). So the question is: should an open() function be added to the Closable interface, to eliminate the need for first-time switches? Alternatively, should there be a separate Openable concept?

-----------------------------------------------------

Best Regards,
Jonathan
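P.S. For those who want question 4 made concrete, here is a minimal sketch of the first-time-switch pattern. It is *not* the library's gzip_compressor: framed_filter and its header/trailer strings are illustrative, and the write(next, begin, end) shape is borrowed from the Part II sketch above.

    #include <cstdio>
    #include <cstring>

    // Downstream consumer, as in the Part II sketch.
    struct stdout_resource {
        void write(const char* begin, const char* end)
        {
            std::fwrite(begin, 1, static_cast<std::size_t>(end - begin), stdout);
        }
    };

    // A hypothetical output filter that must write a header before the
    // first character and a trailer when the stream closes. The bool
    // member is the 'first-time switch' that an open() notification
    // would make unnecessary.
    template<typename Next>
    class framed_filter {
    public:
        framed_filter() : started_(false) { }

        void write(Next& next, const char* begin, const char* end)
        {
            if (!started_) {                    // first-time switch
                static const char header[] = "--header--\n";  // e.g. gzip header
                next.write(header, header + std::strlen(header));
                started_ = true;
            }
            next.write(begin, end);             // pass data through unchanged
        }

        void close(Next& next)                  // the Closable notification
        {
            static const char trailer[] = "--trailer--\n";    // e.g. CRC + length
            next.write(trailer, trailer + std::strlen(trailer));
            started_ = false;                   // ready for reuse
        }

    private:
        bool started_;                          // true once the header is out
    };

    int main()
    {
        stdout_resource out;
        framed_filter<stdout_resource> f;
        const char* msg = "payload\n";
        f.write(out, msg, msg + std::strlen(msg));
        f.close(out);   // emits the trailer, as a Closable filter would
    }

With an open() notification in the interface, the started_ flag and the test in write() would disappear; the header would simply be written from open().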