
"Daryle Walker" <darylew@hotmail.com> wrote:
On 9/5/04 8:58 PM, "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
On 8/30/04 12:01 PM, "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
"Daryle Walker" <darylew@hotmail.com> wrote:
1. Aren't memory-mapped files and file descriptors highly platform specific?
But for the thread and file-system libraries, we can define default behavior.
We can do this for memory-mapped files as well. Either including the appropriate header could cause a static assertion, or construction of mapped file resources could fail at runtime. Right now I've followed the example of Boost.Filesystem and assumed that every system is either Windows or Posix. This can easily be changed to produce more informative errors. Good point.
An object that can never be configured to work (for those deficient platforms) isn't very useful.
On those platforms, yes. On supported platforms, it can be very useful.
I know that thread (and rarely file-system) classes have the same potential drawback, but I feel that threads and file systems are more general "computer science concepts" than memory mapped files, and so allowances could be made for the latter class ideas.
Threads and filesystem support are good additions to boost (and would be to the standard) because they are useful, not because they are general "computer science concepts".
Thread-less environments act as if no spare threads can be allocated.
That's not the approach of Boost.Thread, IIRC. If thread support is unavailable, you get a preprocessor error (at least on Windows).
Maybe that should be considered a bug.
It's useful in contexts where thread support can be turned on or off with a command-line switch. It's probably a bad approach on systems which don't support threads at all.
Binary I/O only concerns itself with bytes, which is too low-level for text I/O. There can and should be bridging code, but the concepts of text sources/sinks should be distinct from binary sources/sinks.
This just doubles the number of concepts, for little gain.
Not separating concepts that have notable distinctions is not a service. (That's why I separated regular pointer-based streams from the ones for pointers-to-const in my library. The "savings" in making only one set of class code wasn't worth mixing the semantics of the two stream types.)
What's wrong with this analogy: Saying that a sequence of characters represents 'text' is like saying that a sequence of characters represents a 'picture' (i.e., that it conforms to some image file format specification, such as jpeg, png, etc.) In order to interpret the data properly, the user must know something about its internal structure, and must in general apply an additional layer of software for the content to be usable. In the case of a sequence of characters representing Chinese text, the user must apply code conversion to produce a wide character representation. In the case of a sequence of characters representing a jpeg image, the user must apply a jpeg interpreter to produce an object representing the image size, pixel data, etc. In the first case, it would be naive to expect that sending the raw character sequence to std::cout will print Chinese characters to the console. In the second case, it would be naive to expect that sending the raw character sequence to std::cout will display a jpeg image on the console. So, do we need another family of resource concepts for 'pictures'? <snip history of C and C++ text/binary distinction>
If you're going to start over from scratch with I/O, why not go all the way and finally split-off binary I/O? Stop it from being treated as "text I/O with funny settings".
I'm not starting from scratch. I'm trying to make it easier to use the existing framework. (In the future, the library may be extended beyond the existing framework.)
filtering_istream in;
in.push(regex_filter(regex("damn"), "darn"));
in.push(zlib_decompressor());
in.push(file_source("essay.z"));
// read from in.
All that's assumed in this example is that the characters in the essay file can be mapped directly to chars. If they can't, one would have to add a layer of code conversion (using converter) after the decompression, and use a wide-character filtering stream and wide-character regex_filter.
That's a major implicit assumption.
It's not fundamentally different from the assumption that a sequence of characters contains a gif image.

filtering_istream in;
in.push(gif_to_jpeg());
in.push(file_source("pony.gif"));
// read jpeg data from in.

Trust the programmer.
Can I rephrase this as follows: InputFilters and OutputFilters are a useful addition to the standard library, but Sources and Sinks just duplicate functionality already present? If this is not your point, please correct me.
Yes, that's my point. I looked through your code, and thought "this is just a rearrangement of what's already in streams and stream-buffers". I got really convinced of this once I saw that you added member functions for locale control.
I found I had to add this, rather late in development, to implement converting streams and stream buffers (which still aren't finished). What's wrong with locales? You say it like it's a dirty word.
I have no problems with locales. I was noting that the more features you added to the base classes, the more they looked like the rearrangements of the standard I/O base classes.
Localizability is an optional behavior. Most filters and resources won't implement it. Filters and resources *do not* have to derive from the convenience base classes source, sink, input_filter, etc. Since localizability was so easy to add as a no-op, I gave these base classes no-op implementations of imbue and i/o categories refining localizable_tag. Programmers will rarely use this feature, but it imposes no runtime overhead and very little compile-time overhead, so I don't see any problem.
I've recently noticed that even your documentation for the Resource and Filter concepts admit that they're just like certain C++ or C I/O functions.
You mean when I say, for example,
"Filters are class types which define one or more member functions get, put, read, write and seek having interfaces resembling the functions fgetc, fputc, fread, fwrite and fseek from <stdio.h>"
?
Yes. But I was thinking more of the equivalent paragraph you gave in the documentation about Resources.
I think I need to change this part of the documentation. Unlike fread, etc, the basic_streambuf member functions can't be assumed to be familiar to most programmers. I should probably use istream::read, istream::write, etc. The reason I didn't is that these functions don't have the right return types, which is not a good reason since neither does streambuf::sputn.
template<typename Ch>
class null_buf {
public:
    typedef Ch char_type;
    typedef sink_tag category;
    null_buf() : count_(0) { }
    void write(const Ch*, std::streamsize n) { count_ += n; }
    int count() const { return count_; }
private:
    int count_;
};
This will lead to a stream buffer which keeps track of how many characters pass through, is optimized for single vs. multiple character output, *and* is buffered by default.
I don't see any buffering. (I guess it'll be in whatever class you hook this up to, like "streambuf_facade".)
Right.
Which version, the first or second?
The second.
(Hopefully the first, since I wrote my code above after the first version, and you wrote the second as a response.) If it's the first, then what is my version missing? (If it's the second, then look at the version of the code under my review before comparing.)
I did. That's how I knew it was 79 lines long. It doesn't provide buffering, as far as I can tell.
The traits type carries the policies for comparing and copying (and EOF issues). Does the user have the option for overriding policies so they're not based on "std::char_traits<Ch>"?
As I said, the only place character traits are used in the public interface of filters and resources is in the return type of get. For this purpose, std::char_traits<Ch>::int_type should always be sufficient. At any rate, I'm considering changing it either to optional<char> or to a class type that can store a char, an eof indicator, or a 'no input available -- try back later' indicator. Then there would be absolutely no use of character traits. If you want to define a streambuf_facade with a custom char_traits type, you can do so using the second template parameter:

template< typename T,
          typename Tr = ...,
          typename Alloc = ...,
          typename Mode = ... >
class streambuf_facade;
What I've tried to do with the library is to factor out the essential functionality necessary to define a stream buffer. I've found that in most cases writing a stream buffer can be reduced to implementing one or two functions with simple names and specifications. It seems like an obvious win to me.
But is it always worth the extra layer of indirection you introduce (when you need to interface with standard-looking I/O)?
The indirection, mostly contained in <boost/io/operations.hpp>, is fairly lightweight. Users never need to look at it. I'm not sure why you're so concerned about it.
[SNIP concerns about total code size (in terms of header text length)]
and a large chunk of it is a "poor man's" reflection system.
Do you mean the i/o categories? This follows the example of the standard library and the boost iterator library. It's better than reflection, since you can't get accidental conformance.
No, I'm talking about the code you used to get the existing standard I/O framework to inter-operate with your framework.
Specifically?
Just the large amount of "detail"-level headers.
Fairly typical for boost, I'm afraid.
[SNIP about forwarding to the base-stream's value-added functions and on the nature of the stream facades.]
1. Are there really any important sources/sinks that can't be put through the existing Standard I/O framework?
The standard library handles non-blocking, asynchronous and multiplexed i/o awkwardly at best. In contrast, for a generic i/o framework, adding such support should be fairly straightforward. We just need to introduce the right concepts.
Whoa.
I just had my "a-ha" moment.
I thought you re-did the interface for streaming concepts just to be arbitrary. But you actually did it because you have issues with the architectural philosophy used by the standard I/O framework, right?! You want to fix the problems with current streaming by re-imagining the architecture (i.e. starting from scratch), and you decided to re-do the interface to match.
As I said above, I don't think I'm redoing it from scratch -- I'm just generalizing a little. Later, I might generalize even more.
I guess one issue is that you're extending functionality through templates, while the standard framework uses virtual member functions.
I don't think virtual functions are an issue. Virtual function calls are only slightly more expensive than ordinary (non-inlined) function calls, and one can't expect all function calls to be inlined when you have a chain of non-trivial filters. One must rely on buffering to mitigate the function call overhead. Since the static types of the filtering streams and stream buffers do not depend on the static types of the filters and resources in the underlying chain, some type of runtime indirection, such as virtual functions, is required. I'm actually taking advantage of the streambuf virtual functions as a feature -- not a liability. If I didn't have basic_streambuf to serve as the 'glue' for filter chains, I'd have to write my own version, probably using virtual functions.
2. An existing source/sink, if it wants to work with Standard C++, would work with the standard framework already.
To summarize: an existing source/sink, if it wants to work with the standard framework, already works with the standard framework?
I meant that existing libraries would have already chosen to base their I/O around the standard framework, if they had no need to customize the I/O experience.
If the library is accepted -- and becomes widely used -- I expect that developers will want to write sources and sinks instead of stream buffers. Existing stream buffers can be rewritten as sources or sinks fairly easily in many cases.
You have a potential problem: standard C++ I/O is "too hard". But you got the wrong solution: throw away the standard I/O's legacy and start over from scratch (but include transition code).
I hope it's possible to improve some of the standard library I/O framework in the future. Perhaps experience with the current library will help form the basis for a proposal. But that's not the point of the current library. The point is to make easy what is currently not-so-easy, and to reduce the difficulty of what is currently very difficult.
I gave an example (the code you snipped) of how the simplified core interface could be integrated with the standard framework. What are the other difficulties?
I don't understand what's wrong with the way I've done it.
This is independent of the decisions on memory-mapped files, file descriptors, binary I/O, and filters. Couldn't all of those have been implemented around the standard framework?
Of course -- with massive code duplication.
Duplication where? (My question above assumed that your new architecture never existed and you built your other stuff around the standard framework.)
Right. A lot of typical stream buffer implementation is boilerplate, esp. if buffering is used.
About the Overlap Between Our Contributions
A bunch of people during my I/O review wanted to defer decisions to see your I/O review. I'm not sure that there's a need to pick one or the other, given how they work.
The review managers will sort this out.
I had no intention of redoing the concepts of I/O, so all my sources and sinks extend the standard framework.
You built a whole new framework, hopefully to address problems with the standard framework.
Again, I just wanted to make the standard framework easier to use.
You built your sources and sinks to work with your framework. And you added adaptors so the new-I/O classes can work with std-I/O classes.
It's really the other way around. And the adapters are so thin you could crush them just by leaning against them ;-)
There's no problem with efficiency if new-I/O is used throughout the user's code, since you use a lot of template goodness.

However, if the user needs to interface with std-I/O, at the user end or the final destination end, they will have to take a performance hit, since std-I/O will call virtual functions which you can't remove. (The guy who writes the "xpressive" library seems to have techniques around the problem, but I'm not sure they can be applied here. [I don't know what the techniques are.] The std-I/O virtual call dispatch takes place in the standard stream classes, so the "xpressive" technique can't work if code changes are needed.)

In these mixed cases, using the new framework can be a win if the time spent on the task itself outweighs the time spent in the adaptor code. If the task at hand has a std-I/O interface, doesn't touch the issues that new-I/O was meant to solve, and can be succinctly expressed with std-I/O, then there is no advantage to making and/or using a new-I/O version, since the layer of indirection given by the adaptor class is the bigger bottleneck. (The pointer-based streams are an example of this.)
I think there's a basic misunderstanding here. The adapters generally have no virtual functions and function calls through the adapters are optimized away entirely. (I've confirmed this on several compilers. It should be true for any decent optimizing compiler.) There is currently an inefficiency when you add a standard stream or stream buffer to the end of a filtering stream, as I describe in the message "IOStreams Formal review -- Guide for Reviewers". This will be eliminated entirely if the library is accepted.
The point is that one set of classes doesn't preclude the usage of the other. Each one has situations where it's the better solution.
As far as I can tell, the two valid points you have made, w.r.t. our two contributions, are:

1. Using my library to define a null_buf, pointerbuf or value_buf causes more code to be included. This is a legitimate criticism, but I don't think you've made the case that the amount of code included is so enormous that there should be two versions of the same components in boost.

2. The object code will be slightly larger when using a streambuf_facade. (Actually, I'm not sure you made that point, but I think it's correct.) This can be mitigated somewhat if it turns out to be a problem, but I don't think you have shown yet that it is.

Best Regards,
Jonathan