
On Mon, Aug 30, 2004 at 11:32:28AM -0600, Jonathan Turkanis wrote:
Apart from some broken links and typos in the documentation/comments,
Would you please point them out?
On http://home.comcast.net/~jturkanis/iostreams/libs/io/doc/classes/alphabetica... converting_stream and converting_streambuf do not have hyperlinks. stream_facade and streambuf_facade link to nonexistent pages.

http://home.comcast.net/~jturkanis/iostreams/boost/io/filtering_stream.hpp contains a comment:

    // Macro: BOOST_IO_DEFINE_FILTER_STERAM(name_, mode_)

while the macro signature is actually:

    #define BOOST_IO_DEFINE_FILTER_STREAM(name_, chain_type_, default_char_)

[..snip..]
For now, let me just make these points:
1. There is already a mechanism to avoid copying data in certain cases: by implementing resources which model the concept Direct.
I understand from the documentation that this (Array Resources) is a pre-allocated buffer of fixed size. It has to be fixed of course, otherwise you need to move it when a larger size is needed. However, what if the buffer runs full? My dbstreambuf implementation consists of a list of dynamically allocated memory blocks: a new block of memory is allocated when the buffer needs to grow. As a result it is possible for a piece of data (which I call a 'message') that should be contiguous for easy processing not to be (when it spans two or more internal blocks). But by allocating blocks that are considerably larger than the average message size, and by automatically starting again at the beginning of a block when the block becomes entirely empty, in practice very little copying (to make a message contiguous) is needed. I don't think this is possible with the Direct concept currently provided. The basic idea that you merely return a pointer to the message inside the streambuf and then process it seems covered, but there is more to it. Please inform me if the following is also possible with the Direct concept:

What libcw is aiming for is that data (from file/socket descriptors) is read into a buffer in memory - and then no more copying is needed at all. This means that if you 'read' a 'message' (where what a 'message' is, is determined by a custom virtual function 'decode' in a derived class) then you only return a pointer, and advance an internal pointer so that the next 'message' will get the subsequent data. However - that message is *still* in the buffer and may not be overwritten until it is truly done with. Therefore, messages are passed as objects with a reference counter that inform the underlying (now seemingly unrelated) streambuf when the data may be overwritten and/or freed. The application would process the 'message' and destruct the message object once it is totally done with it.

You will understand that this is also the reason that it is rather important that the buffer can 'grow': Even if on average you process as many messages as are being received - there will normally always be unprocessed messages in the buffer, preventing it from starting to write again at the beginning of the buffer. And therefore every new message needs to be appended at the end... until you reach the end of the buffer. At that point a 'buffer full' is unacceptable because it is NOT really entirely full - you are merely using only the bytes at the end of it. [...]
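[Editorial sketch] The scheme described above (a growing list of blocks, messages that are just pointers into a block, and reference counting that tells the buffer when a block may be reused or freed) can be illustrated roughly as follows. This is not libcw's dbstreambuf: the names Block, Message and BlockBuffer are hypothetical, std::shared_ptr stands in for the reference counter, the code is single-threaded, and for simplicity a message that does not fit in the current block starts a fresh block instead of spanning two. The memcpy in append() stands in for the one read from a descriptor into the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical block of raw storage; 'used' counts the bytes already filled.
struct Block {
    explicit Block(std::size_t n) : data(n), used(0) {}
    std::vector<char> data;
    std::size_t used;
};

// A message is only a view into a block. Holding the shared_ptr keeps the
// block alive until the last message that refers to it is destroyed.
class Message {
public:
    Message(std::shared_ptr<Block> b, std::size_t off, std::size_t len)
        : block_(std::move(b)), off_(off), len_(len) {}
    const char* data() const { return block_->data.data() + off_; }
    std::size_t size() const { return len_; }
private:
    std::shared_ptr<Block> block_;
    std::size_t off_;
    std::size_t len_;
};

class BlockBuffer {
public:
    explicit BlockBuffer(std::size_t block_size) : block_size_(block_size) {}

    // Store n bytes and hand back a message pointing at them.
    Message append(const char* p, std::size_t n) {
        assert(n <= block_size_);
        // No live message refers to the current block: start at its beginning again.
        if (tail_ && tail_.use_count() == 1)
            tail_->used = 0;
        // Not enough room left: grow by allocating a new block. The old block
        // is freed automatically once its last message is destroyed.
        if (!tail_ || tail_->used + n > block_size_)
            tail_ = std::make_shared<Block>(block_size_);
        const std::size_t off = tail_->used;
        std::memcpy(tail_->data.data() + off, p, n);
        tail_->used += n;
        return Message(tail_, off, n);
    }

private:
    std::size_t block_size_;
    std::shared_ptr<Block> tail_;   // block currently being filled
};
```

With a block size of 8, two 4-byte messages end up adjacent in the same block and a third message triggers a new block, while the first two stay valid for as long as their Message objects exist.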
streambuf_facade would look like this:
template< typename Resource, typename Tr = ..., typename Buffering = basic_buffering<Resource>, ... > class streambuf_facade;
This would allow essentially any buffering policy to be employed.
Including the one I described above? Having a linked list of allocated memory blocks and reference counting 'message' objects that reserve parts of it and communicate with the buffer about those parts really being free for reuse?
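[Editorial sketch] To make the idea of a buffering policy concrete, here is a minimal illustration of a stream buffer whose storage comes from a policy class. It is not the proposed streambuf_facade interface, which the thread does not spell out: policy_streambuf, fixed_buffering and string_sink are hypothetical names, and the policy contract assumed here (begin() and end() returning a writable range) is invented for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical policy: owns one fixed array and exposes it as [begin, end).
template<std::size_t N>
struct fixed_buffering {
    static_assert(N > 0, "buffer must not be empty");
    std::array<char, N> storage;
    char* begin() { return storage.data(); }
    char* end()   { return storage.data() + N; }
};

// Hypothetical sink: collects everything written to it.
struct string_sink {
    std::string data;
    void write(const char* s, std::streamsize n) {
        data.append(s, static_cast<std::size_t>(n));
    }
};

// Output-only stream buffer; where the bytes are staged is decided by Buffering.
template<typename Sink, typename Buffering>
class policy_streambuf : public std::streambuf {
public:
    explicit policy_streambuf(Sink& sink) : sink_(sink) {
        setp(buffer_.begin(), buffer_.end());
    }
    ~policy_streambuf() override { flush_buffer(); }

protected:
    // Called when the put area is full: drain it, then store the new character.
    int_type overflow(int_type ch) override {
        flush_buffer();
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            sputc(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }
    int sync() override { flush_buffer(); return 0; }

private:
    void flush_buffer() {
        assert(pptr() >= pbase());
        sink_.write(pbase(), pptr() - pbase());
        setp(buffer_.begin(), buffer_.end());   // reset the put area
    }
    Sink& sink_;
    Buffering buffer_;
};
```

A policy like the linked list of blocks asked about here would need a richer contract than begin()/end(), since it has to hand out new storage instead of reusing one array; that is the part this sketch leaves open.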
The main application I have in mind is cases where the underlying resource should be accessed in multiples of a certain block size.
That block size (my message size I think) does not have to be fixed. There are many protocols out there that have variable-sized messages! ;)
In fact, I have already (mostly) implemented such an approach, but I have not incorporated it into the library for several reasons:
- The buffering policy has a rather bulky interface which I think I may be able to simplify
- I'm not convinced yet that it's a performance win -- only tests will tell. If it makes only a small difference in a few cases, it may not be worth complicating the library.
Well, this only makes sense for large servers with thousands of connections that all burst data in huge quantities... exactly the kind of applications I like to write ;). There are two major CPU hogs in that case: 1) finding which file descriptor is ready, 2) moving data in memory. The first can be solved by not using ancient interfaces like select() or poll() but the more modern ones like kqueue.
To summarize, I'd like to make streambuf_facade flexible enough so that you don't have to substitute your own home-brewed version. This is *not* a criticism of your library: if you have good ideas about how to make streambufs more efficient, I'd like to incorporate them directly into streambuf_facade -- possibly as buffering policies -- with your permission.
My ideas are free :p. You won't be able to use libcw ('s code) anyway because it wasn't written with a friendly interface in mind - I designed it with two goals: 1) Speed, 2) The ability to adapt to yet-unknown demands, in other words 'flexibility' at the user level (or 'one size fits all', but that really sounds too bad :p). As a result, the interface is so complex that someone who doesn't understand it (and that is everyone else besides me) will call it bloated ;)
Another thing that is bothering me is that the whole presence of anything 'stream-like' (ostream/istream) seems out of place here. This is not only because the std::ostream/std::istream classes are merely 'hooks' into the operator<< and operator>> functions, which are primarily intended for text (human-readable representations) while this library is about binary data - but more importantly because everything this library does is related to, and at the level of, streambufs (which DO have a binary interface). This is most apparent when you consider that this code should work:
filtered_ostream fout; fout.push(filter); fout.push(cout);
It works (with 'filtering_ostream'). What's wrong with it?
std::ostream& out(fout); // Only have/use the std::ostream base class.
out << "Hello World"; // This must use the filter.
A much more logical API would therefore be:
filtered_streambuf fbuf; fbuf.push(filter);
std::streambuf& buf(fbuf);
And then using 'buf' as the streambuf for some ostream if operator<< inserters are desirable.
Could you rephrase this whole argument? I don't think I follow it.
I am afraid I cannot explain it ... it's experience :/. By providing an interface foobar_stream while you really only need to provide foobar_streambuf you do something that makes my alarm bells go off. The word "inflexible" comes to mind. This will lead to problems of the kind where a user wants to do something but can't. You are limiting yourself too much this way. Another thing, and I can explain that better, is that users only write serializers for std::ostream (and please don't ask them to do that again for filtering_ostream!). Therefore, if there has to be a filtering_ostream then it MUST be derived from std::ostream AND still work (the same) if all you have - at any moment after construction and initialization - is a pointer to the std::ostream base class.

From that follows that if you don't NEED initialization for it - then you don't need the whole ostream class. However, what convinced me more is the notion of what an ostream really is: a hook for the operator<< functions. If you only need a hook and users will only write operator<<(std::ostream& ... functions, then why would you ever need something other than an ostream? It is just too illogical. When I look at the interface of this library and see filtering_ostream then that strikes me as "impossible", you just CANNOT need that. So, why is it there? You already answered that yourself later by the way: to make it easier for the user. You provide a filtering_ostream as a wrapper around std::ostream so that the initialization functions for the *streambuf* look nicer (and surely, yes, this cleans up code that uses ostreams). Therefore, my only objection against filtering_ostream is that it HIDES the real interface: filtering_streambuf... which it doesn't hide, as you told me now ;). So, you can consider this objection to be void. But I still think you should make it a bit more clear that filtering_ostream is just candy, convenience - and not hide the real thing (filtering_streambuf) behind it in all your examples and documentation. I completely missed it! [...]
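[Editorial sketch] The requirement stated above, that filtering must keep working when all the caller has is a std::ostream, holds automatically when the filtering lives in the streambuf. The following standard-library-only illustration shows this. It does not use the library under review: upper_streambuf, greet and demo are hypothetical names, and the upper-casing filter is just a stand-in for any filter.

```cpp
#include <cassert>
#include <cctype>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical filter: forwards every character, upper-cased, to a target streambuf.
class upper_streambuf : public std::streambuf {
public:
    explicit upper_streambuf(std::streambuf* target) : target_(target) {
        assert(target_ != nullptr);
    }

protected:
    // No put area is set, so every character written arrives here.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const unsigned char uc =
            static_cast<unsigned char>(traits_type::to_char_type(ch));
        return target_->sputc(static_cast<char>(std::toupper(uc)));
    }
    int sync() override { return target_->pubsync(); }

private:
    std::streambuf* target_;
};

// Code that only knows about std::ostream, like a user-written operator<<.
void greet(std::ostream& out) { out << "Hello World"; }

// Builds the chain at the streambuf level and writes through a plain ostream.
std::string demo() {
    std::ostringstream sink;
    upper_streambuf fbuf(sink.rdbuf());
    std::streambuf& buf = fbuf;   // only the std::streambuf base is visible
    std::ostream out(&buf);       // an ordinary ostream hooked onto it
    greet(out);
    out.flush();
    return sink.str();
}
```

Because greet() sees nothing but a std::ostream&, existing inserters need no changes; a stream class on top of the filtering streambuf only saves the caller from constructing the ostream by hand.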
To summarize:
- I think that the stream interface should be ripped out and replaced by one that is an equivalent streambuf. Providing a stream interface should be merely a 'convenience' interface and not the main API.
The stream interface already *is* just a convenience, as explained above. Perhaps the misunderstanding stems from the fact that in the examples I tend to use streams, since they are more familiar to most users than stream buffers. 'Ripping out' the stream interface would simply mean omitting the two files stream_facade.hpp and filtering_stream.hpp, for a combined total of about 11k ;-)
Ok
- This streambuf interface should use a 'Streambuf' template parameter for its base class that only defaults to std::streambuf (and may demand that it is derived from std::streambuf if that is really necessary) but allows the base class to be replaced with a custom implementation.
I think a buffering policy is the way to go.
If you can add the functionality of libcw's dbstream into this library - then it will become my favourite boost lib ;) Err... probably not. There is another thing that I'd be missing. But I can't ask to add that too; it's too... ugly (I wanted to say complex). When I use iostream classes I need TWO streambufs (one buffer for the input and another for the output). This is not supported by std::iostream because it only has a single (virtual) std::ios base class and thus only a single streambuf pointer. You can read at the URL for libcw that I gave in the previous post how I solved that, but believe me, it makes the interface very hard to understand unless you drag in the whole API that I designed around it - and I don't think that will merge nicely with your IOStreams anymore (?).

--
Carlo Wood <carlo@alinoe.com>