Re: [boost] Re: IOStreams formal review start

30 Aug 2004

      On Mon, Aug 30, 2004 at 11:32:28AM -0600, Jonathan Turkanis wrote:
...
...
Apart from some broken links and typos in the documentation/comments,
Would you please point them out?
On http://home.comcast.net/~jturkanis/iostreams/libs/io/doc/classes/alphabetica...

converting_stream and converting_streambuf do not have hyperlinks.
stream_facade and streambuf_facade link to non-existing pages.

http://home.comcast.net/~jturkanis/iostreams/boost/io/filtering_stream.hpp

contains a comment:

// Macro: BOOST_IO_DEFINE_FILTER_STERAM(name_, mode_)

while the macro signature is actually:

#define BOOST_IO_DEFINE_FILTER_STREAM(name_, chain_type_, default_char_)

[..snip..]
...
For now, let me just make these points:
1. There is already a mechanism to avoid copying data in certain cases: by
implementing resources which model the concept Direct.
I understand from the documentation that this (Array Resources) is a pre-allocated
buffer of fixed size.  It has to be fixed of course, otherwise you need to move
it when a larger size is needed.  However, what if the buffer runs full?  My
dbstreambuf implementation consists of a list of dynamically allocated memory
blocks: a new block of memory is allocated when the buffer needs to grow.
As a result it is possible for a piece of data (which I call a 'message')
that should be contiguous for easy processing not to be (when it spans
two or more internal blocks); but, by allocating blocks of a size that is
considerably larger than the average message size, and by automatically
starting at the beginning of a block if the block becomes entirely empty,
in practice very little copying (to make a message contiguous) is needed.
I don't think this is possible with the Direct concept currently provided.

The basic idea that you merely return a pointer to the message inside
the stream buf and then process it seems covered, but there is more to it.
Please inform me if the following is also possible with the Direct concept:
What libcw is aiming for is that data (from file/socket descriptors) is
read into a buffer in memory - and then no more copying is needed at all.
This means that if you 'read' a 'message' (where what a 'message' is, is
determined by a custom virtual function 'decode' in a derived class)
then you only return a pointer, and advance an internal pointer so that
the next 'message' will get the subsequent data.  However - that message
is *still* in the buffer and may not be overwritten until it is truly
done with.  Therefore, messages are passed as objects with a reference
counter that inform the underlying (now seemingly unrelated) streambuf
when the data may be overwritten and/or freed.  The application would
process the 'message' and destruct the message object once it is totally
done with it.  You will understand that this is also the reason that
it is rather important that the buffer can 'grow': Even if on average
you process as many messages as are being received - there will
normally always be unprocessed messages in the buffer, disallowing it
to start writing again at the beginning of the buffer.  And therefore
every new message needs to be appended at the end.. until you reach
the end of the buffer.  At that point a 'buffer full' is unacceptable
because it is NOT really entirely full - you are merely only using
the bytes at the end of it.
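
To make the idea a bit more concrete, here is a minimal sketch of such a
buffer.  It is NOT libcw's code or interface: the names Block, Message and
BlockBuffer are hypothetical, std::shared_ptr (C++11) merely stands in for
the reference counter, and it assumes single-threaded use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not libcw's interface.
struct Block {
    explicit Block(std::size_t n) : data(n), used(0) {}
    std::vector<char> data;  // fixed-size storage, never resized after construction
    std::size_t used;        // number of bytes written so far
};

// A message is a view into a block plus a reference that keeps the block alive.
class Message {
public:
    Message(std::shared_ptr<Block> b, const char* p, std::size_t n)
        : block_(std::move(b)), ptr_(p), len_(n) {}
    const char* data() const { return ptr_; }
    std::size_t size() const { return len_; }
private:
    std::shared_ptr<Block> block_;  // released when the message is destructed
    const char* ptr_;
    std::size_t len_;
};

class BlockBuffer {
public:
    explicit BlockBuffer(std::size_t block_size)
        : block_size_(block_size),
          current_(std::make_shared<Block>(block_size)) {}

    // Store n bytes contiguously and return a message that points at them.
    Message append(const char* src, std::size_t n) {
        assert(src != nullptr);
        if (current_->used + n > current_->data.size()) {
            if (current_.use_count() == 1 && n <= current_->data.size())
                current_->used = 0;  // no message refers to this block: start over
            else                     // still in use (or too small): grow with a new block
                current_ = std::make_shared<Block>(std::max(block_size_, n));
        }
        char* dst = current_->data.data() + current_->used;
        std::memcpy(dst, src, n);
        current_->used += n;
        return Message(current_, dst, n);
    }

    const Block* current_block() const { return current_.get(); }

private:
    std::size_t block_size_;
    std::shared_ptr<Block> current_;
};
```

A block that the buffer has moved away from is freed as soon as the last
message referring to it is destructed; the single copy into the block stands
in for the read() from the descriptor.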

[...]
...
streambuf_facade would look like this:
template< typename Resource,
          typename Tr = ...,
          typename Buffering = basic_buffering<Resource>,
          ... >
class streambuf_facade;
This would allow essentially any buffering policy to be employed.
Including the one I described above?  Having a linked list of
allocated memory blocks and reference counting 'message' objects
that reserve parts of it and communicate with the buffer about
those parts really being free for reuse?
...
The main
application I have in mind is cases where the underlying resource should be
accessed in multiples of a certain block size.
That block size (my message size I think) does not have to be fixed.
There are many protocols out there that have variable sized messages! ;)
...
In fact, I have already (mostly) implemented such an approach, but I have not
incorporated it into the library for several reasons:
- The buffering policy has a rather bulky interface which I think I may be able
to simplify
- I'm not convinced yet that it's a performance win -- only tests will tell. If
it makes only a small difference in a few cases, it may not be worth
complicating the library.
Well, this only makes sense for large servers with thousands of connections
that all burst data in huge quantities... exactly the kind of applications I like
to write ;).  There are two major cpu hogs in that case: 1) finding which
filedescriptor is ready, 2) moving data in memory.
The first can be solved by not using ancient interfaces like select() or poll()
but the more modern ones like kqueue.
...
To summarize, I'd like to make streambuf_facade flexible enough so that you
don't have to substitute your own home-brewed version. This is *not* a criticism
of your library: if you have good ideas about how to make streambufs more
efficient, I'd like to incorporate them directly into streambuf_facade -- 
possibly as buffering policies -- with your permission.
My ideas are free :p.  You won't be able to use libcw ('s code) anyway
because it wasn't written with a friendly interface in mind - I designed
it with two goals: 1) Speed, 2) The ability to adapt to yet-unknown demands,
in other words 'flexibility' at the user level (or 'one size fits all',
but that really sounds too bad :p).  As a result, the interface is so
complex that someone who doesn't understand it (and that is everyone
else besides me) will call it bloated ;)
...
...
Another thing that is bothering me is that the whole
presence of anything 'stream-like' (ostream/istream) seems
not in the right place here.  This is not only because
the std::ostream/std::istream classes are merely 'hooks' to
hook into the operator<< and operator>> functions which
are primarily intended for text (human readable representations)
while this library is about binary data - but more importantly
because everything this library does is related to and at the
level of streambuf's (which DO have a binary interface).
This fact is most apparent by considering the fact that this
code should work:
filtered_ostream fout;
fout.push(filter);
fout.push(cout);
It works (with 'filtering_ostream'). What's wrong with it?
...
std::ostream& out(fout); // Only have/use the std::ostream base class.
out << "Hello World"; // This must use the filter.
A much more logical API would therefore be:
filtered_streambuf fbuf;
fbuf.push(filter);
std::streambuf& buf(fbuf);
And then using 'buf' as the streambuf for some ostream if operator<<
inserters are desirable.
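
A minimal sketch of that streambuf-first arrangement, using only the
standard library.  The class upper_streambuf and the function greet are
hypothetical illustrations and not part of the library under review; the
filter simply upper-cases whatever passes through it.

```cpp
#include <cctype>
#include <ostream>
#include <sstream>
#include <streambuf>

// Hypothetical filter: upper-cases each character and forwards it to a sink.
class upper_streambuf : public std::streambuf {
public:
    explicit upper_streambuf(std::streambuf* sink) : sink_(sink) {}

protected:
    // No put area is set up, so every character written arrives here.
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        const unsigned char uc =
            static_cast<unsigned char>(traits_type::to_char_type(c));
        return sink_->sputc(static_cast<char>(std::toupper(uc)));
    }

    int sync() override { return sink_->pubsync(); }

private:
    std::streambuf* sink_;  // not owned
};

// A serializer written only against std::ostream, as users normally do.
void greet(std::ostream& out) { out << "Hello World"; }
```

Used as follows, the text that reaches the sink is "HELLO WORLD", even
though greet() only ever sees a plain std::ostream:

```cpp
std::ostringstream sink;
upper_streambuf fbuf(sink.rdbuf());
std::streambuf& buf(fbuf);
std::ostream out(&buf);
greet(out);
```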
Could you rephrase this whole argument? I don't think I follow it.
I am afraid I cannot explain it ... it's experience :/.
By providing an interface foobar_stream while you really
only need to provide foobar_streambuf you do something
that makes my alarm bells go off. The word "inflexible"
comes to mind.  This will lead to problems of the kind
that a user wants to do something but can't.  You are
limiting yourself too much this way.

Another thing, and I can explain that better, is that
users only write serializers for std::ostream (and please
don't ask them to do that again for filtering_ostream!).
Therefore, if there has to be a filtering_ostream then it
MUST be derived from std::ostream AND still work
(the same) if all you have - at any moment after construction
and initialization - is a pointer to the std::ostream
base class.

...
From that follows that if you don't NEED initialization
for it - then you don't need the whole ostream class.
However, what convinced me more is the notion of what
an ostream really is: a hook to the operator<< functions.
If you only need a hook and users will only write
operator<<(std::ostream& ... functions, then why would
you ever need something else than an ostream?  It is
just too illogical.  When I look at the interface of
this library and see filtering_ostream then that strikes
me as "impossible", you just CANNOT need that. So, why
is it there?  You already answered that yourself
later by the way: to make it easier for the user.
You provide a filtering_ostream as wrapper around
std::ostream so that the initialization functions for
the *streambuf* look nicer (and surely, yes, this cleans
up code that uses ostreams).

Therefore, my only objection against filtering_ostream
is that it HIDES the real interface: filtering_streambuf...
which it doesn't hide, as you told me now ;).

So, you can consider this objection to be void.  But
I still think you should make it a bit more clear that
filtering_ostream is just candy, convenience - and not
hide the real thing (filtering_streambuf) behind it in
all your examples and documentation.  I completely missed it!

[...]
...
...
To summarize:
- I think that the stream interface should be ripped out and replaced
  by one that is an equivalent streambuf.  Providing a stream interface
  should be merely a 'convenience' interface and not the main API.
The stream interface already *is* just a convenience, as explained above.
Perhaps the misunderstanding stems from the fact that in the examples I tend to
use streams, since they are more familiar to most users than stream buffers.
'Ripping out' the stream interface would simply mean omitting the two files
stream_facade.hpp and filtering_stream.hpp, for a combined total of about 11k
;-)
Ok
...
...
- This streambuf interface should use a 'Streambuf' template parameter
  for its base class that only defaults to std::streambuf (and may demand
  that it is derived from std::streambuf if that is really necessary) but
  allows the base class to be replaced with a custom implementation.
I think a buffering policy is the way to go.
If you can add the functionality of libcw's dbstream into this
library - then it will become my favourite boost lib ;)

Err... probably not.  There is another thing that I'd be missing.
But I can't ask to add that too; it's too... ugly (I wanted to say
complex).  When I use iostream classes I need TWO streambufs
(one buffer for the input and another for the output). This is
not supported by std::iostream because it only has a single (virtual)
std::ios base class and thus only a single streambuf pointer.
You can read on the url to libcw that I gave in the previous post
how I solved that, but believe me it makes the interface very hard
to understand unless you drag in the whole API that I designed
around it - and I don't think that will merge nicely with your
IOStreams anymore (?).

-- 
Carlo Wood <carlo@alinoe.com>