Re: [boost] Re: Re: Re: IOStreams formal review -- extended

16 Sep 2004

      From: "Jonathan Turkanis" <technews@kangaroologic.com>
...
"Rob Stewart" <stewart@sig.com> wrote in message:
...
...
"Rob Stewart" <stewart@sig.com> wrote in message:
...
From: "Jonathan Turkanis" <technews@kangaroologic.com>:
...
"Rob Stewart" <stewart@sig.com> wrote in message:
...
If both remain, then each needs more
information and rationale so users understand which to choose for
a given use case.
This should be the explanation:
"If you have an existing streambuf implementation and you can't or don't
    want to reimplement it as a Resource, use streambuf_wrapping.hpp;
    otherwise, you should probably reimplement it as a Resource and use
    stream_facade."
I'm not sure Daryle would agree with that.  Anyway, my point is
that if Boost accepts both libraries, then the two of you need to
determine the synergies and differences between your libraries
and ensure users understand the value of each approach.
I see your point now -- someone wants (no apostrophe ;-) to write a
streambuf/stream pair.
    Daryle: Write the streambuf from scratch, then use streambuf_wrapping.hpp
    Jonathan: Write a Resource, then use streambuf_facade and stream_facade
I agree with Jonathan :-)
Imagine that.
...
I've already given most of the reasons in my reply to Dietmar Kuehl, so if you
don't mind, I'll quote myself (sorry for the length):
[snip rationale for IOStreams Library vs. MoreIO]

You'll need Daryle's side of this when providing the rationale
for choosing between the libraries.

BTW, the reasons you cite are significant and certainly cause me
to favor your approach over Daryle's.  Unfortunately, I don't
recall feedback from him to counter those specific claims aside
from preference.  (If preference remains the reason to keep both
libraries, then your rationale and any from Daryle should make
the case for each library and leave to the user the choice based
upon preference.)
...
I should add: even if someone reads one of the several available books on the
standard iostreams library and decides to write a stream buffer from scratch,
there's a good chance the implementation will suffer one of the following
problems:
(i) buffering will be omitted, since it's hard to do correctly.
(ii) buffering will be provided, but mistakes in pointer arithmetic will cause
subtle errors
(iii) sub-optimal algorithms will be used.
These are excellent reasons to hide the details of buffering in a
framework and should be part of your rationale.
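To make problems (i) and (ii) concrete, here is a minimal sketch of a hand-written buffered output stream buffer using only the standard library. The class name buffered_string_sink, the std::string target, and the tiny buffer size are assumptions made for illustration; none of this comes from either library under review. Note how much of the code is put-area bookkeeping of the kind a framework could hide.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical example class: buffers characters and appends full
// buffers to a std::string standing in for a real device.
class buffered_string_sink : public std::streambuf {
public:
    explicit buffered_string_sink(std::string& target) : target_(target) {
        reset_put_area();
    }
    ~buffered_string_sink() override { flush_buffer(); }

protected:
    // Called when the put area is full.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            // Safe: epptr() points at the slot reserved by reset_put_area().
            *pptr() = traits_type::to_char_type(ch);
            pbump(1);
        }
        flush_buffer();
        return traits_type::not_eof(ch);
    }

    // Called by std::ostream::flush().
    int sync() override {
        flush_buffer();
        return 0;
    }

private:
    // Keep the last slot out of the put area so overflow() has room for ch.
    void reset_put_area() { setp(buffer_, buffer_ + buffer_size - 1); }

    void flush_buffer() {
        std::ptrdiff_t n = pptr() - pbase();
        assert(n >= 0 && n <= static_cast<std::ptrdiff_t>(buffer_size));
        target_.append(pbase(), static_cast<std::size_t>(n));
        reset_put_area();
    }

    static constexpr std::size_t buffer_size = 8;  // tiny, to exercise overflow()
    char buffer_[buffer_size];
    std::string& target_;
};
```

Wrapping an instance in a std::ostream and writing more than eight characters exercises overflow(); calling flush() on the stream pushes the remainder through sync(). An off-by-one in reset_put_area() is exactly the sort of subtle pointer error described in (ii).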
...
Note that two of Daryle's stream buffers suffer from defect (i).
I doubt that he omitted buffering because it is difficult.
...
"Peekable" does not imply being able to put back a character.
...
...
...
Many applications have only one level of undo and don't allow
everything to be undone.  Consequently, I don't think this is
much of a problem.  How about "revertable?"
This still sounds too general. Maybe 'PutbackResource'?
That, of course, doesn't follow the "able" convention you've
established.  Otherwise, it does get right to the point clearly.
I know -- that's because 'Putbackable' is ugly, even when joined to 'Resource.'
Here are some other ideas (based on your suggestions and a thesaurus):
RevertableSource, RestorableSource, UndoableSource, ReinsertableSource.
*Between* these, I prefer UndoableSource.  (I wrote "among"
 first, but couldn't resist.)
...
In addition, allowing filters to be pushed after a resource would give many new
users the impression that they can add filters *after* i/o is in progress. As
has been discussed during the review, this is not currently supported; support
can be added in limited circumstances, but not generally.
Consider:
filtering_ostream out;
  out.push(file_sink("log"));
  out.push(base_64_encoder());
  out << "hello world!\n"; // stream is implicitly 'open'
  out.push(zlib_compressor()); // error!
This won't be a problem with complete() or add_resource().
If you mean that the above should be rewritten
filtering_ostream out;
   out.push(file_sink("log"));
   out.complete(base_64_encoder());
   out << "hello world!\n";
   out.push(zlib_compressor()); // error!
you may be right that users would be less likely to make this mistake. I don't
see how add_resource would help at all.
Because "add_resource" was offered as a synonym for "complete."
But here, the component being added with add_resource (the base_64_encoder) is
not a resource at all!
But file_sink is.
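A minimal sketch of why a terminal complete() call might prevent the late-push mistake, written with the resource added last. The class toy_chain and its members are hypothetical stand-ins, not the library's interface: filters are plain string transformations and the resource is a callback.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy chain; names echo the thread but this is not the real API.
class toy_chain {
public:
    using filter = std::function<std::string(const std::string&)>;
    using sink = std::function<void(const std::string&)>;

    // Adds a filter; rejected once the chain has been completed.
    void push(filter f) {
        if (complete_)
            throw std::logic_error("chain is complete; cannot push a filter");
        filters_.push_back(std::move(f));
    }

    // Adds the resource and closes the chain to further changes.
    void complete(sink s) {
        if (complete_)
            throw std::logic_error("chain already has a resource");
        sink_ = std::move(s);
        complete_ = true;
    }

    // Runs data through the filters in push order, then into the resource.
    void write(const std::string& data) {
        if (!complete_)
            throw std::logic_error("no resource; call complete() first");
        assert(static_cast<bool>(sink_));
        std::string s = data;
        for (const auto& f : filters_) s = f(s);
        sink_(s);
    }

private:
    std::vector<filter> filters_;
    sink sink_;
    bool complete_ = false;
};
```

Because i/o is impossible before complete() and push() is rejected after it, the "push after output has started" case cannot arise in this sketch. Whether the real library could enforce the same rule is a separate question.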
...
...
...
I believe the current stack-like interface is elegant and intuitive. Reversing
the order will also be confusing if I adopt JC van Winkel's pipe notation, which
I plan to do. If I adopt both changes, the following would be equivalent:
filtering_ostream out;
   out.push(file_sink("log"));
   out.push(base_64_encoder());
   out.complete(newline_filter(newline::windows));
---
filtering_ostream out(
      newline_filter(newline::windows) |
      base_64_encoder() |
      file_sink("log") );
The first example is using the proposed, new syntax, so I'd
prefer to see it written like this:
filtering_ostream out;
   out.push(base_64_encoder());
   out.push(file_sink("log"));
   out.complete(newline_filter(newline::windows));
Then, the second, which is confusing as written, should be:
filtering_ostream out(
      base_64_encoder() |
      file_sink("log") |
      newline_filter(newline::windows));
Then, the two are quite similar.
This seems totally screwy to me. ;-)
Well, duh!  Let me try that again:

   filtering_ostream out;
   out.push(base_64_encoder());
   out.push(newline_filter(newline::windows));
   out.complete(file_sink("log"));

   filtering_ostream out(
      base_64_encoder() |
      newline_filter(newline::windows) |
      file_sink("log"));
...
This seems totally screwy to me. ;-) There are two reasonable conventions:
    I. Push the resource first, then push the filters, in order, starting with
the one furthest from the user.
    II. Push the filters first, in order (the reverse of I), starting with the
one closest to the user, then push the resource.
There are other reasonable conventions.  I like this one:

   III. Push the filters first, in order of data flow, followed
        by the resource.

That is, if you're filtering input, then the filter connected to
the Source's output is the first in the list.  If you're
filtering output, then the filter connected to the Sink's input
is the last in the list.  Put another way, the first filter to
see data appears first, the last one to see data appears last.
The only "odd" thing is that the resource always goes last.  (Odd
because for an input stream, you'd ideally want the resource to
be first.)
...
II is the convention I adopted, for reasons already explained. In the above
example, there are two possibilities:
    I.   file_sink <-- base64_encoder <-- newline_filter
    II.  newline_filter --> base64_encoder --> file_sink
III. (same as II in this case)
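The conventions differ only in the order the filters are listed; the data path is the same. A small sketch, with hypothetical marker functions standing in for newline_filter and base64_encoder (they only wrap the text in a tag so the order of application is visible):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using stage = std::function<std::string(const std::string&)>;

// Conventions II and III for output: stages are listed in data-flow order,
// so the first stage listed is the first to see the data.
std::string run_in_data_flow_order(const std::vector<stage>& stages,
                                   const std::string& data) {
    std::string s = data;
    for (const auto& f : stages) {
        assert(static_cast<bool>(f));  // every stage must be callable
        s = f(s);
    }
    return s;
}

// Convention I: stages are listed starting from the resource end,
// so the list is reversed before the data flows through it.
std::string run_resource_first_order(const std::vector<stage>& stages,
                                     const std::string& data) {
    std::vector<stage> reversed(stages.rbegin(), stages.rend());
    return run_in_data_flow_order(reversed, data);
}

// Hypothetical stand-ins that tag the text instead of transforming it.
std::string mark_newline(const std::string& s) { return "nl(" + s + ")"; }
std::string mark_base64(const std::string& s) { return "b64(" + s + ")"; }
```

Listing {mark_newline, mark_base64} under II and {mark_base64, mark_newline} under I sends the data through the newline stage first and the base64 stage second in both cases.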
...
I can't see any justification for putting the resource in the middle, as you
have done.
Nor can I.  I was clearly distracted when I wrote that.
...
_______________________________
basic_newline_filter
...
...
Under your proposal, would a typical construction of a newline_filter look like
this:
newline_filter(write_CR, accept_LF | accept_CR | accept_CRLF )
instead of
newline_filter(write_CR | accept_LF | accept_CR | accept_CRLF )
Yes.
That sounds like a good idea. Then there would be two constructors, used as
follows:
newline_filter(write_LF | accept_LF | accept_CR | accept_CRLF);
    newline_filter(posix);
I guess this is what you already said.
Bingo.
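A rough sketch of the two-constructor idea, one taking a flag combination and one taking a named preset. Everything here is an assumption for illustration: the flag values, the preset enum, the class name newline_config, and in particular the guess that posix means "write LF, accept everything".

```cpp
#include <cassert>

namespace sketch {

// Hypothetical flag values; the real library may choose differently.
enum newline_flags : unsigned {
    accept_LF   = 1u << 0,
    accept_CR   = 1u << 1,
    accept_CRLF = 1u << 2,
    write_LF    = 1u << 3,
    write_CR    = 1u << 4,
    write_CRLF  = 1u << 5,
    accept_mask = accept_LF | accept_CR | accept_CRLF,
    write_mask  = write_LF | write_CR | write_CRLF
};

// Hypothetical named presets.
enum class preset { posix, windows };

class newline_config {
public:
    // Constructor 1: an explicit combination of flags.
    explicit newline_config(unsigned flags) : flags_(flags) {
        unsigned w = flags_ & write_mask;
        // Exactly one write_* flag must be set.
        assert(w != 0 && (w & (w - 1)) == 0);
    }

    // Constructor 2: a preset, expanded to the equivalent flags.
    explicit newline_config(preset p) : newline_config(to_flags(p)) {}

    unsigned flags() const { return flags_; }
    unsigned write_mode() const { return flags_ & write_mask; }

private:
    // Assumption: each preset accepts every convention and writes its own.
    static unsigned to_flags(preset p) {
        unsigned target = (p == preset::posix) ? write_LF : write_CRLF;
        return target | accept_mask;
    }

    unsigned flags_;
};

}  // namespace sketch
```

Keeping the preset as a distinct type lets the two constructors overload cleanly, and the flag constructor is the natural place to reject combinations such as two write_* flags at once.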
...
...
...
sometimes necessary, e.g., to achieve a good compression ratio, to allow
symmetric filters to output fewer characters than possible. In that case, one
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I'd like to see that!
This is the case with zlib. The longer it can store up input, the better the
You wrote, "output fewer characters than possible."  That's what
I'd like to see! ;-)
...
...
I don't quite understand your point, but that's immaterial.  It
sounds like something like this would work:
std::pair<streamsize, streamsize>
   filter(char const * input, streamsize n,
      char const * output);
Provided those interfaces are close, wouldn't this make writing
symmetric filters easier?
There are still two problems:
1. the output buffer is const, which seems wrong.
Quite right.  I just type "const" out of habit and then remove it
when appropriate.  In this case, I failed to remove it when
appropriate.
...
2. the filter has no way of knowing the size of one of the two provided
buffers, depending on the interpretation of the streamsize parameter.
I was just making a tacit assumption that the input and output
buffers were the same size.
...
So putting aside the issue of flushing, your suggested interface should be
std::pair<streamsize, streamsize>
    filter( char const* input, streamsize input_size,
             char* output, streamsize output_size );
I consider this interface pretty much equivalent to mine. In fact, I considered
having SymmetricFilters return std::pair<streamsize, streamsize> -- I can't
remember why I chose the present interface. At any rate, I consider them
equivalent and don't see how your version makes things easier.
I don't think I even looked at your SymmetricFilter stuff, but
yes, they do appear to be equivalent.
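For illustration, a minimal sketch of a filter with the four-argument signature above, setting flushing aside as in the discussion. The function name crlf_filter and its behaviour (expanding '\n' to "\r\n") are assumptions chosen because the output can fill up before the input is exhausted, which is why both counts are returned.

```cpp
#include <cassert>
#include <ios>
#include <utility>

// Hypothetical example: copies input to output, expanding '\n' to "\r\n".
// Returns {characters consumed from input, characters written to output}.
std::pair<std::streamsize, std::streamsize>
crlf_filter(char const* input, std::streamsize input_size,
            char* output, std::streamsize output_size) {
    assert(input_size >= 0 && output_size >= 0);
    std::streamsize in = 0;
    std::streamsize out = 0;
    while (in < input_size) {
        std::streamsize needed = (input[in] == '\n') ? 2 : 1;
        if (output_size - out < needed)
            break;  // output full; the caller retries with the rest of the input
        if (input[in] == '\n')
            output[out++] = '\r';
        output[out++] = input[in++];
    }
    return std::make_pair(in, out);
}
```

With the input "a\nb\n" and a four-character output buffer, the call consumes three characters and writes four; the caller then passes the unconsumed tail in a second call. A compressor such as zlib would also keep internal state between calls, which this stateless sketch does not model.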
...
There's another problem with throwing out the current InputFilter and
OutputFilter concepts. A filter which performs both input and output with two
separate character sequences -- currently called InoutFilter but soon to be
renamed BidirectionalFilter -- needs some way to know whether it's being asked
to perform input or output. So the full interface becomes:
boost::optional<char_type>
     filter(char_type ch, ios::openmode);
for one-character-at-a-time filtering, and
std::pair<streamsize, streamsize>
    filter( char const* input, streamsize input_size,
             char* output, streamsize output_size, ios::openmode );
for multi-character filtering.
I think you'd just have two instances of the same filter when you
want bidirectional filtering.  The framework would take care of
inserting each instance into the correct data stream.
...
There are several choices for this type of passage:
1. Use the passive voice everywhere.
2. Use 'we' -- this sounds natural to me because it's used in mathematical
papers.
3. Use 'you'
4. Use 'the user'
What about in the ordinary case (not comments, not tutorials)?
...
What is an "ordinary" case?  Personal correspondence?  Scientific
report?  Essay on the current geopolitical state of the world?
Reference documentation.
I was considering that equivalent to a tutorial.

-- 
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;