Re: [boost] Re: IOStreams formal review -- extended

14 Sep 2004


      From: "Jonathan Turkanis" <technews@kangaroologic.com>
...
"Rob Stewart" <stewart@sig.com> wrote in message
news:200409131343.i8DDhOa15411@lawrencewelk.systems.susq.com...
...
_________________________________________________________________
Should the library be accepted into Boost?
Absent Daryle Walker's strong comments against this library, I'd
have said, "Absolutely."  Given those comments, I must be more
guarded.
The MoreIO library did less to simplify creating and managing
streams and the cognitive complexity thereof, and it does not
provide the filtering that the Iostreams library offers.  Still,
MoreIO offers some facilities and an approach that leads to
questions regarding whether these two are competing or
complementary libraries.  While more investigation regarding the
strengths and synergies of the two libraries seems warranted, it
may be best to accept both and sort out the matter later.
There is no overlap between the manipulators from More IO and the present
library. I voted to accept all the manipulators (partly as a result of your
comments) so I don't see any problems there.
Agreed.
...
streambuf_wrapping doesn't really overlap with the present library either. I
voted to reject because I thought it didn't simplify writing streams enough to
justify including it. If it is accepted, however, I don't see any real conflict.
I see a conflict because of the dramatically different approaches
to "wrapping" a streambuf.  If both remain, then each needs more
information and rationale so users understand which to choose for
a given use case.
...
So the real question is: if one or more of null_buf, pointerbuf array_buf and
value_buf (I may be spelling these wrong) is accepted, would it be advantageous
to reimplement it using the present library.
There are two issues:
1. Performance
Reimplementing pointerbuf and array_buf using the present library should produce
exactly the same runtime performce, since resources which present their
character sequences as in-memory arrays are specially optimized (see
direct_streambuf.fppp at http://tinyurl.com/3qrsx).
Great.
...
Reimplementing null_buf and value_buf should improve runtime performance, since
the implementations from More IO are not buffered.
The question, then, is whether MoreIO should add buffering to
remove that difference.
...
The reimplemented versions of null_buf and value_buf will have somewhat larger
generated code size. Perhaps this would also be true for pointerbuf and
array_buf.
With the addition of buffering, this nebulous difference may be
moot.  Real numbers will make it possible to judge the value of
both approaches, of course.
...
Using the reimplemented versions will also cause more source code to be included
during compilation. This is because Daryle's implementations are self contained
while and mine depend on parts of type traits and mpl as well as a certain
amount of internal infrastructure. Daryle clearly feels this is an important
issue; I think the amount of code included is in keeping with other boost
libraries.
If yours were orders of magnitude longer to compile, this might
be an issue, but I doubt that's the case.
...
2. Interface.
Streams and stream buffers generated using the present library have a simple
open/is_open/close interface similar to std::fstream. I think including the
streams and stream buffers from More IO as is, rather than giving them a
compatible interface, will be confusing to users.
This is one of the reasons why I'm concerned about keeping both.
I like the simplicity of your approach, especially since the code
size and performance is nearly the same.  Nevertheless, without
hard numbers on these differences, it's hard to judge
objectively.  There's also the issue of whether MoreIO would be
of more benefit to those doing embedded C++, for example.
...
...
_______________________________
Design Evaluation
The library is well conceived and executed.  The design seems
sound.  There are still questions regarding the overlap and
differences between this library and MoreIO.  There are
fundamental, philosophical differences between the libraries.
Could you elaborate?
Daryle thinks the Standard IOStreams I/F is acceptable as is and
worked within it.  You hide the complexities of that I/F behind a
minimal, but sufficient -- at least until async is dealt with
fully -- I/F.  His approach is documented in the standard and
many C++ books.
...
...
_________________________________________________________________
Library Name
I agree with the sentiment of other reviewers in that the library
is misnamed.  Since this library provides filtering, complete
with chaining, as well as to make it easy to create streambufs
and streams, the current name fails to capture all of these
features.
I agree, but couldn't think of anything better.
...
Since the Standard Library provides streambufs,
streams, manipulators, etc., and yet it is called the IOStreams
Library, perhaps there's nothing wrong with simply calling this
library Boost.IOStreams.  (I do think this library should use the
same capitalization: "IOStreams" not "Iostreams.")
Steve Teal's "C++ IOStreams Handbook" and Langer and Kreft's "Standard C++
IOStreams and Locales" use the spelling 'IOStreams'. The C++ standard,
however.uses 'Iostreams'. (See 27.1 [lib.iostreams.requirements]). I think
'IOStreams' looks a bit funny, but I'm happy to use it if that's what people
like.
I'd seen it spelled "IOStreams" so many places that I was sure it
was that way in the standard.  Go figure.
...
...
_________________________________________________________________
Design Comments
_______________________________
Miscellaneous
Wouldn't the InoutResource concept be better named
"BidirectionalResource," though that is longer?  (Same for
InoutFilter.)
Bidirectional was the name I first gave it (http://tinyurl.com/4gskg). I changed
it because the filters and resources in question manage two separate sequences,
whereas bidirectional iterators represent a single sequence. I chose 'inout'
because such components act like an input component glued to an output
component.
I don't like the name much, and would happily revert to bidirectional, I people
don't feel that it's misleading.
Generally, "bidirectional" doesn't convey any information
regarding a common or separate buffer, so I think it works fine.
...
...
"Peekable" does not imply being able to put back a character.
I know. The natural choice would have been Putbackable, which sounds terrible. I
choice Peekable because if you can put back a character than you can peek ahead
in the stream by reading a character and then putting it back.
...
Wouldn't "Undoable" be a better name for this optional behavior?
It sounds a bit to general. For instance, an Undoable Sink might allow
characters already written to be cancelled.
Many applications have only one level of undo and don't allow
everything to be undone.  Consequently, I don't think this is
much of a problem.  How about "revertable?"
...
...
The normal template parameter abbreviation for a character type
is "Char."  I don't think abbreviating it to "Ch" is helpful.
The standard uses charT; so do Langer and Kreft. The Dinkumware docs use Elem. I
don't recall seeing Char, but I'm not against it.
You're not using the "T" suffix, so "Char" is the same as "charT"
in my estimation.
...
...
There's no information provided on the performance of this
library.  How much overhead does it add compared to custom
solutions?
Performance comparable to hand-written components is listed as one of the main
design goals. (Rationale-->Design Goals-->Item 3, at http://tinyurl.com/3z9ou).
The footnote to that item (http://tinyurl.com/4th3v) mentions some performace
comparisons I performed.
The information, as I recall, wasn't rigorous, or at least not
reported that way.  Also, I think it was limited to a few
compilers and one or two streams.
...
To summarize, I found that for output, stream_facade<file_descriptor_sink> was
slightly faster than the Dinkumware std::ofstream (but it does no
synchonization.) Since the overhead of filesystem access might be expected to
dominate in that comparison, I also tested a custom ostringstream against the
Dinkumware component, and found identical performance.
Yes, I recall that now.  It does suggest that your library does
about as well as can be done.  This should be discussed and
highlighted better, I think.  Also, a wider range of tests on
many compilers would lend greater credibility to the conclusions.
...
It would have been natural to test input and random access as well. I didn't do
this because I wasn't convinced that the tests I was running (from
<libs/io/test/detail/verification.hpp>) had any relation to typical use cases.
(BTW, I expect random-access performance won't be so good right now, since I
haven't optimized streambuf_facade::seekoff for short, frequent seeks. This can
be done if necessary.)
I didn't test filtering performance because at the time I didn't have anything
to compare it with except for James Kanze's implementation, which uses
pre-standard streams. Now that I have JC van Winkel's filtering implementation
and Robert Ramey's Dataflow Iterators, I can run some head-to-head tests. I have
spent a lot of effort on optimization, and expect my implementation to compare
favorably.
That sounds good.
...
...
For example, how much memory and runtime overhead is
there wrt MoreIO's mechanisms?
There should be slightly more memory overhead with the reimplementation of
null_buf and value_buf using my library, since it provides buffering by default.
The only fair comparison is, I think with both buffered.
Conversely, what about a version from your library that isn't
buffered?
...
...
The filter chain ordering for input versus output is awkward.  It
should always be from the beginning to end; let the library
reverse the order.
I'm surprised nobody mentioned this earlier.
Having the resource at the end dramatically simplifies the interface. It allows
an instance filtering_stream of filtering streambuf to be considered 'open' as
soon as a resource is pushed, and 'closed' as soon as it is popped. This allows
a simple stack-like interface. The alternative would be
I understand.
...
(i) to add open/close functions (which are part of the interface of
streambuf_facade, but would currently be redundant for filtering streams, OR
I don't think this would be a problem, but I don't see -- and I'm
not going to go look right now, sorry -- why it is redundant for
filtering streams.  Also, couldn't the function be called
"complete" or "add_resource?"
...
(ii) to have a first-time switch which automatically 'opens' the filtering
stream as soon as the first i/o is performed.
This will add memory and runtime overhead it would be best to
avoid.
...
In addition, allowing filters to be pushed after a resource would give many new
users the impression that they can add filters *after* i/o is in progress. As
has been discussed during the review, this is not currently supported; support
can be added in limited circumstances, but not generally.
Consider:
filtering_ostream out;
  out.push(file_sink("log"));
  out.push(base_64_encoder());
  out << "hello world!\n"; // stream is implicity 'open'
  out.push(zlib_compressor()); // error!
This won't be a problem with complete() or add_resource().
...
...
_______________________________
boost::io::converter
"converter" isn't well-named.
How about code_converter?
That seems pretty reasonable, but I'm not very lucid right
now. ;-)
...
...
It only widens characters for the
resources it converts, so it should be called something else.
converter is basically an easy to use wrapper for codecvt, which is the standard
library's component for code conversion. So 'code_coverter' (or converter, for
short) seems appropriate. George Garner suggested that reverse code conversion
be supported too, and I am inclined to agree, though it is much less useful.
Reasonable.
...
...
generic "widen" which ensures that the resulting streambuf or
stream uses wide characters.
In theory, yes. If you supply a locale containing a codecvt<wchar_t, wchar_t>,
then you could use
converter<wfile_source>
to "widen" a wide-character stream. However, you can't depend on an
implementation providing specializations  codecvt<wchar_t, wchar_t> or even
codecvt<char, wchar_t>.
I'm inclined to parameterize converter by a codevt type, as suggested by George
Garner:
template< typename Resource,
                    typename Codecvt = fetch_from_global_locale >
If Codecvt::extern_type does not match the character type of Resource, but
Codecvt::inter_type does, the codecvt will be automatically 'reversed'.
           ^^^^^^^^^^
int_type, I presume
That's pretty reasonable, I think.
...
...
_______________________________
basic_newline_filter
I realize that this is only an example, but there is no apparent
protection from misconfiguring the constructor flags.  Grouping
the related options into enumerated types and taking separate
parameters for each group would make it safer to use.  (You can
overload the bitwise OR operator to permit combining them
conveniently.)  That is, since print_CR, print_LF, and print_CRLF
are mutually exclusive, they should be part of an enumerated type
that does not permit bitwise OR'ing.  Since the posix, mac, and
windows values are meant to supplant all of the other values,
they should be part of their own enumerated type and should be
used in a separate constructor.  The remaining options can be
part of a third enumerated type, with bitwise OR support, that
constitutes the second argument to the existing constructor.
I was trying to keep things simple. Specifying more than one of print_CR,
print_LF and print_CRLF yeilds a runtime error. Even if print_xxx had it's own
enumeration type, it would still be possible to specify illegal values.
If no operators are defined for the enumerated type, then one
must go out of one's way to provide illegal values.
...
...
The print_CR, print_CRLF,
and print_LF values would be better if named write_CR,
write_CRLF, and write_LF, since no printing is occuring, at least
in general.
How about output_xxx instead of print_xxx?
If you prefer "output" to "write" that's fine.
...
...
_________________________________________________________________
boost::io::reverse
Isn't the direction of a filter independent of the filtering?
Yes, in theory, put in practice some filters are much easier to express as an
output filter than an input filter, or vice versa.
...
That is, why isn't a filter's interface based upon these
semantics:
char_type filter(char_type ch);
    streamsize filter(char const * input, streamsize n,
       char const * output);
Then, the source and destination of the data is under the control
of the filter() function and means that whether the filtering is
used on input or output is an orthogonal decision.  This
interface should be every bit as efficient as what you've
specified, but makes filters more straightforward and flexible.
This is what symmetric filter does (http://tinyurl.com/3lz82). I'm planning to
optimize the case where a symmetric filter is added to a filter chain, so that
symmetric filters should be ruthlessly efficient.
The problem with making symmetric filters the default (or exclsuive) filter
concept is that it's unexpectedly difficult to implement them correctly.
I don't recall reading information on this rationale.  It should
be part of the docs, maybe a FAQ.  However, I'd like more
information on why filters are difficult to write
bidirectionally.  The I/F I mention above seems quite simple.

(Note, I"m not suggesting that filters support simultaneous
bidirectionality by default.  For that, I can understand a
special category or mode tag.)
...
...
_________________________________________________________________
Documentation Comments
...
When formatting source code may I suggest not indenting twice for
braces?  Doing so pushes code too far right and reduces
readability due to long lines.  (This occurs on tutorial.html and
read.html, for example.)
Do you mean:
struct tolower_filter : public buffered_input_filter {
        template<typename Source>
        streamsize read(Source& src, char* s, streamsize n)
            {
                streamsize result = boost::io::read(src, s, n);
                for (streamsize z = 0; z < result; ++z)
                    s[z] = tolower(s[z]);
                return result;
            }
    };
Should be
struct tolower_filter : public buffered_input_filter {
        template<typename Source>
        streamsize read(Source& src, char* s, streamsize n)
        {
            streamsize result = boost::io::read(src, s, n);
            for (streamsize z = 0; z < result; ++z)
                s[z] = tolower(s[z]);
            return result;
        }
    };
?
Yes.
...
...
_________________________________________________________________
menu.html
_______________________________
Contents
This is confusing.  Apparently, the leading "+" means a different
page, whereas entries without the "+" are on the current page,
but there is no legend explaining that.
'+' means that the entry is expandible by clicking. '-' means the entry is
collapsable. See http://tinyurl.com/65234.
OK.  I never hovered over them, so I never noticed that they were
active.
...
...
There's nothing wrong with what you've done, but why not use the
phrase, "standard streambuf" instead of "standard stream buffer?"
It's shorter and in keeping with what anyone with some
familiarity of Standard IOStreams would expect.
'Stream buffer' is used by the standard and by Langer and Kreft, presumably
because streambuf is the specialization of basic_streambuf for the type char.
OK.  I guess I usually just abbreviate it.
...
...
_______________________________
Footnotes
I don't see any text that references the footnotes at the bottom
of the page, so their presence is confusing.
I forgot to remove them when I moved Tutorial to its own page.
...
(Further reading
reveals that these footnotes appear at the bottom of every page,
I hope not!
No, you're right.  They were at the bottom of a number of pages,
but I never came back to this comment to note that.  (I
generalized too soon.)
...
...
_________________________________________________________________
tutorial.html
...
" Both streambuf_facade and stream_facade have constructors and
overloads of open which forward their arguments to filter or
                                                 ^^^^^^^^^
to the filter
resource constructor. Therefore, the first example could have
                                      ^^^^^
previous
I see a grammatical problem here --  "forward their arguments to filter or
resource constructor" -- but I don't understand what changes you are suggesting.
"...forward their arguments to the filter or resource
constructor.  Therefore, the previous example could...."
...
...
"We could also add any number of InputFilters to the chain before
adding the Source."
...
"They could also use the following convenience typedefs:"
I think the first sentence was changed without updating the second.
There are several choices for this type of passage:
1. Use the passive voice everywhere.
2. Use 'we' -- this sounds natural to me because it's used in mathematical
papers.
3. Use 'you'
4. Use 'the user'
Which should one prefer? Settling on one of the above and using it consistently
should resolve many of your (snipped) suggestions/objections below.
I use we in comments (design notes, and such), and you in
tutorial type text.  After all, I'm teach "you," the reader in
such text.  I think you should write the docs as if you are
writing to a single reader.  That makes it easiest to get things
consistent.
...
...
_______________________________
Efficient Filtering: Buffered Filters
...
In such cases, reading a character from an ordinary
    InputFilter means calling its get member function, thus
    incurring the overhead of a (virtual?) function call.
It's not virtual.
Hence my making it parenthetical.
...
...
_______________________________
Regex OutputFilter
"This example uses a regex_filter to remove C and C++-style
comments from a code excerpt. Although somewhat more
          ^^^^^^^^^^^^^^^^^^^
omit
sophisticated than the Uncommenting InputFilter, above, the
What's the problem here?
It sounds as though the filter is only good for a hard coded
string or something.  Omitting "from a code excerpt" makes it
sound more general.
...
...
_______________________________
Overview
"...an InputFilter, which filters input read from a Source, and
an OutputFilter, which filters output written to a Sink."
I suggest a little rephrasing to make this read better and fill
in a couple of gaps:
an InputFilter, which filters input read from a Source or
    another InputFilter, and an OutputFilter, which filters output
    read from a Source or another OutputFilter, before it is
    written to a Sink.
But that would be false. In order to chain filters, the library has to wrap
filters with adapters so that they become resources.
Suffice it to say that I wrote it that way because I thought that
was accurate.  Therefore, it wasn't sufficiently clear to prevent
my jumping to the wrong conclusion.
...
...
"SeekableResource" refers to a read/write "head," but few, if
any, Resources for which this library will be used are physical
devices with read/write heads.  I suggest using "location,"
"position," or "offset" in lieu of "head" here and in other parts
of the documentation.
How about 'stream position indicator'?
Why "indicator?"
...
...
_______________________________
Filter Concepts
"Notifications" is unexpected and you refer to a stream.  Try
this:
Closable: A Filter or Resource which is told to close
    immediately before a Source or Sink is closed.
How about:
Closable: A Filter or Resource with a member function close which is called
when the end of a character sequence is reached.
Good.
...
...
In "Direct," you refer to a socket-like interface, but not
everyone will understand what you mean.  I suggest using the
phrase, "function-based interface."
I see your point, but Direct Resources also have a function based interface.
Well, you need something more specific and plain than
"socket-like interface."
...
...
_________________________________________________________________
modes.html
...
"Readers new to the Iostreams Library should feel free to
  ^^^^^^^                                     ^^^^^^^^^^^^
  Users                                       omit
concentrate primarly on these two modes."
How would you phrase the whole sentence?
"Users new to the IOStreams Library should concentrate primarily
on these two modes."
...
...
_______________________________
Definitions of the Modes
"This is useful to help reduce...different of filter types."
                                            ^^
omit
I don't follow.
Change to, "This is useful to help reduce...different filter
types."
...
...
_______________________________
The Metafunction mode
You need to include an example and the signature of mode so we
know how to use it.
This section contains a link to the reference docs for mode:
http://tinyurl.com/5oqsy. Wouldn't it suffice to provide an example there?
I did see mode.html later, but I don't think I noticed the link
when I was looking at this page.  An example on that other page
is the right place, but maybe you can call attention to the link?
...
...
_______________________________
Reference
...
"Policy-based stream buffer template with an interface...."
                             ^^^^^^^^
class template
"Policy-based class template deriving from a specialization of basic_streambuf
with an interface...." ?
Good.
...
...
______
Synopsis
Using "T" as the name of a policy class template parameter is
confusing.  "T" is, of course, commonly used for a data type
stored in a container and for similar purposes.  The closest
thing to "T" in streambuf_facade is the "Ch" nested type in the
policy class.  Change "T" to a more meaningful name.
I adopted this convention from "C++ Templates", p. 10. FilterOrResource would be
more decriptive, but it's too long. What would you suggest?
You need a supercategory of Filter and Resource.  "Component?"

-- 
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;