
Dear All,

I've been very pleased with the discussion of the iostreams library so far. I'd like to thank all those who have posted reviews and comments.

One thing that has been lacking, to a large extent, is a detailed examination of the source code. I understand that this may seem a daunting task, since the library is quite large. Therefore, I would like to point out the relatively small bits of the source which contain the core implementation. I'll also give a brief description of the filtering framework, to address the concern raised by Carlo Wood that large amounts of data are being copied needlessly. Finally -- the easiest part -- I'll mention what I think are the most important outstanding issues concerning the library interface.

-----------------------------------------------------

Part I: Overview of the source.

Even if you only look at one of the following three sections, I'd be very happy :-)

A. Filter and resource support. This provides the infrastructure for the generic i/o operations read, write, close, etc. It contains no definitions of streams or stream buffers:

    <boost/io/categories.hpp>
    <boost/io/concepts.hpp>
    <boost/io/operations.hpp>
    <boost/io/io_traits.hpp>
    <boost/io/detail/ios_traits.hpp>
    <boost/io/detail/wrap_unwrap.hpp>

B. Policy-based streams and stream buffers. This provides the implementation of the fundamental library component streambuf_facade:

    <boost/io/detail/adapters/filter_adapter.hpp>
    <boost/io/detail/adapters/resource_adapter.hpp>
    <boost/io/detail/streambufs/direct_streambuf.hpp>
    <boost/io/detail/streambufs/indirect_streambuf.hpp>
    <boost/io/streambuf_facade.hpp>

C. Filtering streams and stream buffers. This provides the infrastructure for chaining filters:

    <boost/io/detail/chain.hpp>
    <boost/io/detail/streambufs/chainbuf.hpp>
    <boost/io/filtering_streambuf.hpp>

-----------------------------------------------------

Part II: Implementation of filter chains.

The following diagram should be viewed with a fixed-width font:

      streambuf1        streambuf2        streambuf3
       filter1     ->    filter2     ->    resource
    [------------]    [------------]    [------------]
    buf1 (size n1)    buf2 (size n2)    buf3 (size n3)
          ^
          |
          |
    filtering_stream

Here the end user writes to the filtering_stream at the bottom of the diagram, and the data is passed through streambuf1, streambuf2 and streambuf3, each of which contains a character buffer and either a filter or a resource.

The end user, through the filtering stream's functions and operators, writes *directly* to buf1. When buf1 is full, the buffer's contents are passed to filter1 as follows:

    filter1.write(streambuf2, buf1, buf1 + n1);

This writes *directly* to buf2. When buf2 is full, filter2 is called:

    filter2.write(streambuf3, buf2, buf2 + n2);

This writes *directly* to buf3. When buf3 is full, resource is called:

    resource.write(buf3, buf3 + n3);

In the general case, data is copied no more than absolutely necessary.
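To make the above concrete, here is a small, compilable sketch of the write-chaining idea. It is *not* code from the library: stdout_resource and toupper_filter are hypothetical stand-ins, and only the shape of write(next, begin, end) mirrors the calls shown above. The private buffer plays the role of buf2, and the inner loop shows the "flush downstream when full" behavior:

    #include <cctype>
    #include <cstdio>
    #include <cstring>

    // Final link in the chain: writes a character range to stdout.
    // Plays the role of 'resource' in the diagram above.
    struct stdout_resource {
        void write(const char* begin, const char* end)
        {
            std::fwrite(begin, 1, static_cast<std::size_t>(end - begin), stdout);
        }
    };

    // A filter that uppercases data and forwards it downstream. 'Next'
    // may be another filter link or the resource itself.
    template<typename Next>
    struct toupper_filter {
        void write(Next& next, const char* begin, const char* end)
        {
            char buf[64];                  // plays the role of buf2
            while (begin != end) {
                std::size_t n = 0;
                while (begin != end && n < sizeof(buf))
                    buf[n++] = static_cast<char>(
                        std::toupper(static_cast<unsigned char>(*begin++)));
                next.write(buf, buf + n);  // flush downstream when full
            }
        }
    };

    int main()
    {
        stdout_resource resource;
        toupper_filter<stdout_resource> filter;
        const char* msg = "hello, filter chain\n";
        filter.write(resource, msg, msg + std::strlen(msg));
    }

Note that toupper_filter is precisely the kind of filter described in case 2 below: it modifies characters one by one, so the copy into its private buffer is, strictly speaking, avoidable.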
There are some special cases where the copying is wasteful, though. For instance:

1. If filter1 is just a passive observer -- simply counting the occurrences of '\n', say, or copying the data to a logging stream -- then the end user should really be writing directly to buf2. Filter1 could process the data using the same buffer as filter2.

2. Same as above, except that filter1 also modifies the data in place, making character-by-character modifications (e.g., a toupper filter). This can be handled the same way.

3. If resource is a stream or stream buffer, it can be assumed to do its own buffering. In that case, filter2 should write directly to resource, instead of to streambuf3:

    filter2.write(resource, buf2, buf2 + n2);

These three cases can be handled easily by modifying the existing framework. I didn't add special treatment because the idea occurred to me rather late in development.

There is another class of cases in which the current setup is wasteful, but I think it is rather domain-specific:

4. Most filters in the chain modify only small parts of the character sequence, leaving big chunks unchanged. For example, a web server might have a chain of filters which add headers to an HTTP message but leave the body untouched. In this case, it might be better to allow the filters to send the body through the chain simply as a pointer to a file or memory block. If at some point a filter needs to modify the body, it would read the actual data, modify it, and send it to the next filter as a sequence of characters rather than as a pointer.

Optimizing case 4 would require a major library extension. My feeling is that it is not necessary at this point, but I'd like to know what others think.

-----------------------------------------------------

Part III: Interface questions.

1. How should the library handle read and write requests which return fewer characters than requested, even though there has been no error and EOF has not been reached? Some answer to this question is necessary to allow the library to be extended later to handle models other than ordinary blocking i/o. I mention three possibilities here, http://tinyurl.com/6r8p2, but only two are realistic. I'm interested to know how important people think this issue is, and what the best way to resolve it would be.

2. The stack interface. Is the interface to the underlying filter chains rich enough? Originally it was similar to std::list, so that you could disconnect chains at arbitrary points, store them, and reattach them later. I decided there wasn't much use for this, so I simplified the interface.

3. Exceptions. James Kanze has argued repeatedly that protected stream buffer functions should not throw exceptions (http://tinyurl.com/5o34x). I try to make the case for exceptions here: http://tinyurl.com/6r8p2. What do people think?

4. Closable. Many output filters need to write additional information to a stream just before it closes. For instance, a gzip_compressor needs to write a checksum and the message length. This is one of the motivations for the 'Closable' concept (http://tinyurl.com/3pg5j). Sometimes, however, the beginning -- rather than the end -- of a character sequence requires special treatment. For instance, a gzip_compressor must write the gzip header before any compressed data is written. Currently this has to be implemented using a 'first-time switch' (a sketch appears in the postscript below). So the question is: should an open() function be added to the Closable interface, to eliminate the need for first-time switches? Alternatively, should there be a separate Openable concept?

-----------------------------------------------------

Best Regards,
Jonathan
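P.S. For those who want question 4 made concrete, here is a minimal sketch of the first-time-switch pattern. It is *not* the library's gzip_compressor: framed_filter and its header/trailer strings are illustrative, and the write(next, begin, end) shape is borrowed from the Part II sketch above.

    #include <cstdio>
    #include <cstring>

    // Downstream consumer, as in the Part II sketch.
    struct stdout_resource {
        void write(const char* begin, const char* end)
        {
            std::fwrite(begin, 1, static_cast<std::size_t>(end - begin), stdout);
        }
    };

    // A hypothetical output filter that must write a header before the
    // first character and a trailer when the stream closes. The bool
    // member is the 'first-time switch' that an open() notification
    // would make unnecessary.
    template<typename Next>
    class framed_filter {
    public:
        framed_filter() : started_(false) { }

        void write(Next& next, const char* begin, const char* end)
        {
            if (!started_) {                    // first-time switch
                static const char header[] = "--header--\n";  // e.g. gzip header
                next.write(header, header + std::strlen(header));
                started_ = true;
            }
            next.write(begin, end);             // pass data through unchanged
        }

        void close(Next& next)                  // the Closable notification
        {
            static const char trailer[] = "--trailer--\n";    // e.g. CRC + length
            next.write(trailer, trailer + std::strlen(trailer));
            started_ = false;                   // ready for reuse
        }

    private:
        bool started_;                          // true once the header is out
    };

    int main()
    {
        stdout_resource out;
        framed_filter<stdout_resource> f;
        const char* msg = "payload\n";
        f.write(out, msg, msg + std::strlen(msg));
        f.close(out);   // emits the trailer, as a Closable filter would
    }

With an open() notification in the interface, the started_ flag and the test in write() would disappear; the header would simply be written from open().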