
Daryle, I think this discussion is getting overheated. (See, e.g., the long code excerpts containing 'jon_xxx'.) If I was a bit harsh in my first comments on your library, I'm sorry. I did vote to include a large percentage of it.

On 8/30/04 12:01 PM, "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
"Daryle Walker" <darylew@hotmail.com> wrote:
1. Aren't memory-mapped files and file descriptors highly platform specific?
Yes, just like threads, sockets and directory iteration.
Code that works with them would have to be non-portable, so I don't think they're appropriate for Boost.
It achieves portability the same way boost.thread and boost.filesystem do: by having separate implementations for different systems. See http://www.boost.org/more/imp_vars.htm ("Implementation variations").
But for the thread and file-system libraries, we can define default behavior.
We can do this for memory mapped files as well. Either including the appropriate header could cause a static assertion, or construction of mapped file resources could fail at runtime. Right now I've followed the example of Boost.Filesystem and assumed that every system is either Windows or Posix. This can easily be changed to produce more informative errors. Good point.
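The header-level check might look something like this (an untested sketch; the configuration macros shown are illustrative only, not what the library actually uses):

    #include <boost/static_assert.hpp>

    // Hypothetical guard in the memory-mapped-file header; firing a
    // static assertion on inclusion when the platform is unsupported:
    #if !defined(BOOST_WINDOWS) && !defined(BOOST_HAS_UNISTD_H)
        BOOST_STATIC_ASSERT(false);  // memory-mapped files unsupported here
    #endif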
Thread-less environments act as if no spare threads can be allocated.
That's not the approach of Boost.Thread, IIRC. If thread support is unavailable, you get a preprocessor error (at least on Windows).
All file-systems can simulate a tree/container calculus, so a portable interface can be defined.
Again, Boost.Filesystem doesn't do this.
But memory-mapped files and file descriptors are totally meaningless on some environments; what would the code map to in those cases?
See above.
2. This library does what a lot of other text-I/O libraries do: try to fit in "kewl" compression schemes. The problem is that the types of compression here are binary-oriented; they convert between sets of byte streams. However, characters are not bytes (although characters, like other types, are stored as bytes).
Are you saying there are problems with the implementation of the compression filters, e.g., that they make unwarranted assumptions about 'char'? If so, please let me know. I'm sure it can be fixed.
I'm complaining that binary I/O should _not_ be treated as a variant of text I/O (which your library assumes).
All I/O is treated as streams of characters. When these streams of characters require special 'textual' interpretation, you can use a newline_filter, for line-ending conversion, or a converter, for code conversion.
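For example, to read a text file with line endings normalized (a sketch; I'm assuming here that newline_filter takes a flag naming the target convention):

    filtering_istream in;
    in.push(newline_filter(newline::posix));  // convert line endings to '\n'
    in.push(file_source("doc.txt"));
    // read text from in.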
Binary I/O only concerns itself with bytes, which is too low-level for text I/O. There can and should be bridging code, but the concepts of text sources/sinks should be distinct from binary sources/sinks.
This just doubles the number of concepts, for little gain.
I don't see the iostream framework as relating to text streams only: streams can handle text and binary. In some cases, you want text and binary to work together. E.g., suppose you have a compressed text file ("essay.z") and you want to read a 'family-friendly' version of it. You can do so as follows:
    filtering_istream in;
    in.push(regex_filter(regex("damn"), "darn"));
    in.push(zlib_decompressor());
    in.push(file_source("essay.z"));
    // read from in.
Isn't this perfectly natural and convenient? What's wrong with using the decompressor and the regex filter in the same chain?
By itself, nothing. But these compression schemes only work with bytes, so you have hidden at least one text <-> binary converter in your code.
(BTW, the file_source above should have been opened in binary mode.) All that's assumed in this example is that the characters in the essay file can be mapped directly to chars. If they can't, one would have to add a layer of code conversion (using converter) after the decompression, and use a wide-character filtering stream and a wide-character regex_filter.

If the above example were disallowed, then in the common case that output is stored in a form which can be directly mapped to the internal character set without code conversion, the user would be forced to insert a do-nothing adapter. The current library trusts users to know when they are dealing with data which must be converted to a wide character type before it can be processed by text-oriented filters.
Can I rephrase this as follows: InputFilters and OutputFilters are a useful addition to the standard library, but Sources and Sinks just duplicate functionality already present? If this is not your point, please correct me.
Yes, that's my point. I looked through your code, and thought "this is just a rearrangement of what's already in streams and stream-buffers". I got really convinced of this once I saw that you added member functions for locale control.
I found I had to add this, rather late in development, to implement converting streams and stream buffers (which still aren't finished). What's wrong with locales? You say it like it's a dirty word.
I've recently noticed that even your documentation for the Resource and Filter concepts admits that they're just like certain C++ or C I/O functions.
You mean when I say, for example, "Filters are class types which define one or more member functions get, put, read, write and seek having interfaces resembling the functions fgetc, fputc, fread, fwrite and fseek from <stdio.h>"? The functions boost::io::read, boost::io::write, etc., are indeed generic versions of these familiar functions. I mention the familiar functions as a way to introduce readers to the generic versions. The benefits of generic programming are well known, I hope.
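For instance, reading a block from an arbitrary Source looks almost exactly like the fread call it generalizes (a sketch; I'm assuming a FILE* f and a Source src, e.g. a file_source, are in scope, and that the signature mirrors its stdio counterpart as described above):

    char buf[100];
    std::size_t     n1 = std::fread(buf, 1, 100, f);      // stdio
    std::streamsize n2 = boost::io::read(src, buf, 100);  // any Source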
There are two main reasons to write Sources and Sinks instead of stream buffers:
1. Sources and Sinks express just the core functionality of a component. Usually you have to implement just one or two functions with very natural interfaces. You don't have to worry about buffering or about putting back characters. I would have thought it would be obvious that it's easier to write:
    template<typename Ch>
    struct null_buf {
        typedef Ch       char_type;
        typedef sink_tag category;
        void write(const Ch*, std::streamsize) { }
    };
than to write your null_buf, which is 79 lines long.
That's really misleading. The null-sink I have does a lot more. I keep track of how many characters passed through (i.e., a value-added function), and I optimize for single vs. multiple character output.
Okay:

    template<typename Ch>
    class null_buf {
    public:
        typedef Ch       char_type;
        typedef sink_tag category;
        null_buf() : count_(0) { }
        void write(const Ch*, std::streamsize n) { count_ += n; }
        int count() const { return count_; }
    private:
        int count_;
    };

This will lead to a stream buffer which keeps track of how many characters pass through, is optimized for single vs. multiple character output, *and* is buffered by default.
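To use it, just plug it into a facade (an untested sketch):

    streambuf_facade< null_buf<char> > buf;
    std::ostream out(&buf);
    out << "hello, world!";
    out.flush();           // flush first, since the facade buffers by default
    int n = buf->count();  // 13; the resource is reached via operator->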
Also, I'm verbose in my writing style. If I wanted to be compact I could just do:
//========================================================================
template< typename Ch, class Tr = std::char_traits<Ch> >
class basic_nullbuf : public std::basic_streambuf<Ch, Tr> {
protected:
    // These typedefs re-export the base's types for two-phase lookup:
    typedef Tr                    traits_type;
    typedef typename Tr::int_type int_type;

    // Overridden virtual functions
    virtual int_type overflow( int_type c = traits_type::eof() )
        { return traits_type::not_eof( c ); }
};
But that doesn't do what my version, listed above, does.
And for those of you who think that "traits_type" is scary: get over it! Using the obvious substitutes of "==", "<", "(int)", etc. is just sloppy and WRONG. The whole point of the traits class is so that a character type isn't forced to define those operators. Worse, those operators could exist but be inappropriate. For example, Josuttis' STL book has a string type that implements case-insensitive comparisons with a custom traits type. Using operator== directly would have missed that. Ignoring the policies of the traits type's creator could betray his/her vision of usage.
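For example (a sketch), a generic search must compare through the traits class to respect the traits author's policy:

    #include <string>

    // With Josuttis' case-insensitive traits, Tr::eq('A', 'a') is true
    // even though 'A' == 'a' is false.
    template<typename Ch, typename Tr>
    bool contains(const std::basic_string<Ch, Tr>& s, Ch c)
    {
        typedef typename std::basic_string<Ch, Tr>::const_iterator iter;
        for (iter it = s.begin(); it != s.end(); ++it)
            if (Tr::eq(*it, c))   // NOT: *it == c
                return true;
        return false;
    }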
In early versions of my library, filters and resources had traits types as well as character types. Prompted by remarks of Gennadiy Rozental, I made a careful study and found that traits could be eliminated from the public interface of the filter/resource module of the library without sacrificing generality or correctness, except in the case of the return type of get, which is still std::char_traits<char_type>::int_type. Even this could be eliminated by having get return optional<char> (see the sketch below). For a more ambitious proposal along these lines, see http://tinyurl.com/6r8p2. Of course, filter and resource authors may need to use char_traits to implement member functions read, write, etc. .... But I'm not sure I see where this discussion is going.
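Here is what the optional<char> variation would look like, by the way (a sketch of the idea only; this is *not* the library's current interface):

    #include <boost/optional.hpp>

    struct my_source {
        typedef char       char_type;
        typedef source_tag category;
        // An empty optional signals end-of-sequence, so neither
        // char_traits<char>::int_type nor eof() is needed:
        boost::optional<char> get();
    };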
2. Sources and sinks can be reused in cases where standard streams and stream buffers are either unnecessary or are not the appropriate abstraction. For example, suppose you want to write the concatenation of three files to a string. You can do so like this:
    string s;
    boost::io::copy(
        concatenate(
            file_source("file1"),
            file_source("file2"),
            file_source("file3")
        ),
        back_insert_resource(s)
    );
A straw-man? Wouldn't an iterator-based solution have been better? (There are stream(-buffer) iterators, and (string) insert iterators. If the Boost iterator library provides a chaining iterator type, then the standard copying procedure could be used.)
It's tempting to try to do everything using iterators. In fact, Robert Ramey's original suggestion to expand the library to handle filtering suggested that it be based on iterator adapters. (http://lists.boost.org/MailArchives/boost/msg48300.php) The problem with this approach is that it misses the opportunity for many important optimizations that can be made when one is presented with a contiguous buffer full of characters, instead of one character at a time.
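To make the difference concrete (a sketch; 'src' and 'snk' stand for any Source and Sink):

    // Character-at-a-time, which is what an iterator-based design implies:
    int c;
    while ((c = src.get()) != EOF)
        snk.put(c);

    // Buffer-at-a-time; each call can process a whole block at once,
    // using memcpy, tight loops, etc.:
    char buf[4096];
    std::streamsize n;
    while ((n = src.read(buf, 4096)) > 0)
        snk.write(buf, n);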
The whole framework seems like "I/O done 'right'", a "better" implementation of the ideas/concepts shown in the standard I/O framework.
I'd say thanks here if 'right' and 'better' weren't in quotes ;-)
It looked like you changed the interface just to change the interface, not out of any actual need. What about the following (untested) code:
I'm going to ignore the code, which seems sarcastic. (Don't name stuff after me until I'm dead.) Instead, let me quote part of my response to Dietmar Kuehl: Jonathan Wrote:
... The protected virtual interface of basic_streambuf is, IMO, quite strange. The functions have weird names: underflow, uflow, pbackfail, overflow, showmanyc, xsputn, xsgetn, seekoff, etc. -- the functions read, write, and seek are much more intuitive. The specifications of the standard functions are tricky, too. For example, overflow (one of the better-named functions) is specified roughly like this:
virtual int_type overflow(int_type c = traits_type::eof());
"If c is not eof, attempts to insert into the output sequence the result of converting c to a character. If this can't be done, returns eof or throws an exception. Otherwise returns any value other than eof."
Contrast this with
void write(const char_type* s, std::streamsize n);
"Writes the sequence of n characters starting at s to the output sequence, throwing an exception in case of error."
What I've tried to do with the library is to factor out the essential functionality necessary to define a stream buffer. I've found that in most cases writing a stream buffer can be reduced to implementing one or two functions with simple names and specifications. It seems like an obvious win to me. <snip lots of code>
The price is a code size many times larger than the conventional system,
Are you talking about the size of the library or the size of the generated code?
The size of the library.
1. The library is big partly because it contains a lot of special-purpose components, such as compression filters. You don't pay for them if you don't use them.

2. The support for the generic read and write operations is quite lightweight.

3. If you use the library just to define new stream buffer types, then in addition to (2) the main code comes from <boost/io/detail/streambufs/indirect_streambuf.hpp>, which is the generic streambuf implementation, and from <boost/io/detail/adapters/resource_adapter.hpp> and <boost/io/detail/adapters/filter_adapter.hpp>, which are lightweight wrappers that allow indirect_streambuf to interact with filters and resources using a single interface.

4. If you want to chain filters, then in addition to (2) and (3), the main code comes from <boost/io/detail/chain.hpp>, which at 16k is a small price to pay for a flexible filtering framework.
and a large chunk of it is a "poor man's" reflection system.
Do you mean the i/o categories? This follows the example of the standard library and the boost iterator library. It's better than reflection, since you can't get accidental conformance.
No, I'm talking about the code you used to get the existing standard I/O framework to inter-operate with your framework.
Specifically?
... The sample stream-buffer in More-I/O generally had added-value member functions attached, that perform inspection or (limited) reconfiguration. Those member functions also have to be manually carried over to the final derived stream class. ... The Iostreams framework seems to totally ignore the issue! ...
With a streambuf_facade or stream_facade you can access the underlying resource directly using operators * and ->. E.g.,
    stream_facade<tcp_resource> tcp("www.microsoft.com", 80);
    ...
    if (tcp->input_closed()) {
        ...
    }
Maybe I should stress this more in the documentation. (I imagine some people won't like the use of operators * and -> here, but these can be replaced by a member function such as resource().)
I didn't like the iterator "look" those operations have.
Noted.
Also, is a stream-façade an actual stream?
Yes.
1. Are there really any important sources/sinks that can't be put through the existing Standard I/O framework?
The standard library handles non-blocking, asynchronous and multiplexed i/o awkwardly at best. In contrast, for a generic i/o framework, adding such support should be fairly straightforward. We just need to introduce the right concepts.
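For instance (purely hypothetical; nothing like this exists in the library today), a non-blocking Source might be given its own category and a way to say "no characters available yet" that is distinct from end-of-sequence:

    struct my_nonblocking_source {
        typedef char            char_type;
        typedef nonblocking_tag category;  // hypothetical category tag
        // Returns the number of characters read: 0 means 'would block',
        // -1 means end-of-sequence.
        std::streamsize read(char* s, std::streamsize n);
    };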
2. An existing source/sink, if it wants to work with Standard C++, would work with the standard framework already.
To summarize: an existing source/sink, if it wants to work with the standard framework, already works with the standard framework?
You have a potential problem: standard C++ I/O is "too hard". But you got the wrong solution: throw away standard I/O's legacy and start over from scratch (but include transition code).
I hope it's possible to improve some of the standard library I/O framework in the future. Perhaps experience with the current library will help form the basis for a proposal. But that's not the point of the current library. The point is to make easy what is currently not-so-easy, and to reduce the difficulty of what is currently very difficult.
This is independent of the decisions on memory-mapped files, file descriptors, binary I/O, and filters. Couldn't all of those have been implemented around the standard framework?
Of course -- with massive code duplication.

Jonathan