Iostreams filter buffering policy leads to extremely small reads.

Jonathan,

The Iostreams library provides for built-in buffering of filters which in principle should permit efficient use of filters even with streams that do not support filtering. However, in practice it does not appear to live up to expectations. More specifically, the following line in indirect_streambuf.hpp can lead to extremely fragmented reads to lower filters or devices:

    std::streamsize indirect_streambuf<T, Tr, Alloc, Mode>::xsgetn
        (char_type* s, std::streamsize n)
    {
        [...]
        streamsize amt = obj().read(s + avail, n - avail, next_); // n - avail may equal 10 or less
        [...]
    }

This is true even if the filter buffer size is set to 1 MiB or more. Large buffer sizes (> 2 MiB) greatly enhance read and write performance on many modern operating systems. This is particularly felt when reading and writing large files (> 4 GiB). Ideally, if you set a filter buffer policy (e.g. to 1 MiB) you would like all reads to lower filters and devices to request that value, except possibly for the last read.

Fortunately, it is relatively trivial to disable Iostreams buffering altogether and write a buffering_shim_filter. But then I do not understand what purpose Iostreams buffering serves. I Googled through the online documentation and I didn't find a detailed discussion of its objectives, though there were some comments that touched on the subject matter during the review process. Perhaps you can elaborate further on this subject.

Regards,

George.

George M. Garner Jr. wrote:
Jonathan,
Hi George,
The Iostreams library provides for built-in buffering of filters which in principle should permit efficient use of filters even with streams that do not support filtering. However, in practice it does not appear to live up to expectations. More specifically, the following line in indirect_streambuf.hpp can lead to extremely fragmented reads to lower filters or devices:

    std::streamsize indirect_streambuf<T, Tr, Alloc, Mode>::xsgetn
        (char_type* s, std::streamsize n)
    {
        [...]
        streamsize amt = obj().read(s + avail, n - avail, next_); // n - avail may equal 10 or less
        [...]
    }

This is true even if the filter buffer size is set to 1 MiB or more.
This seems to be an optimization gone awry. The intended buffering policy, reflected correctly (I hope) in underflow(), is to fill the input buffer as soon as a read request is received, regardless of the size of the request, and to fill read requests from the buffer until it is empty, at which point it will be filled again.

The implementation of xsgetn works well for large read requests, since a single read is performed rather than a sequence of reads. You seem to be right, though, that it is unsatisfactory for small reads. Whether this is bad for performance depends on the size of n in typical filtering situations. I believe that n should typically equal the buffer size for all filters and devices in a chain other than the first, but that for the first filter or device, n will tend to reflect the i/o operations performed by the end user, and so may turn out to be small. I'd be interested to know if this is consistent with your experience.

One fix would be to have xsgetn fill read requests directly from the source only if they are large, and otherwise use the underflow strategy. I'm inclined, however, just to scrap xsgetn (and perhaps xsputn), and rely solely on underflow (and overflow). I'd appreciate it if you would comment out the declaration and implementation of xsgetn and see if you still experience problems.
Large buffer sizes (> 2 MiB) greatly enhance read and write performance on many modern operating systems. This is particularly felt when reading and writing large files (> 4 GiB). Ideally, if you set a filter buffer policy (e.g. to 1 MiB) you would like all reads to lower filters and devices to request that value, except possibly for the last read.
Right.
Fortunately, it is relatively trivial to disable Iostreams buffering altogether and write a buffering_shim_filter.
You can simply set the buffer size to zero when you add a filter to a chain.
But then I do not understand what purpose Iostreams buffering serves. I Googled through the online documentation and I didn't find a detailed discussion of its objectives, though there were some comments that touched on the subject matter during the review process. Perhaps you can further elaborate this subject matter.
The purpose is to minimize the number of function calls (for filters and devices) and to minimize the number of potentially expensive accesses to external devices (mostly for devices). Thanks for digging into the iostreams internals!
Regards,
George.
Jonathan

Jonathan,
I'd appreciate it if you would comment out the declaration and implementation of xsgetn and see if you still experience problems.
It makes the problem worse. Then I get reads for a single character from std::streambuf<>::uflow():

    virtual int_type uflow()
    {   // get a character from stream, point past it
        return (_Traits::eq_int_type(_Traits::eof(), underflow())
            ? _Traits::eof()
            : _Traits::to_int_type(*_Gninc()));
    }
Fortunately, it is relatively trivial to disable Iostreams buffering altogether and write a buffering_shim_filter.
You can simply set the buffer size to zero when you add a filter to a chain.
Actually not! I was much chagrined to find that you cannot disable input buffering. Setting the input buffer and pback buffers to 0 actually results in an input buffer size of either 4 or 1, depending on whether or not I disable your STLPort workaround. :-( I do think it should be possible for someone not using STLPort to disable input buffering. The following modified code appears to work:

    template<typename T, typename Tr, typename Alloc, typename Mode>
    void indirect_streambuf<T, Tr, Alloc, Mode>::open
        (const T& t, std::streamsize buffer_size, std::streamsize pback_size)
    {
        using namespace std;

        // Normalize buffer sizes.
        buffer_size = (buffer_size != -1) ?
            buffer_size :
            is_filter<T>::value ? default_filter_buffer_size : default_buffer_size;
        pback_size = (pback_size != -1) ? pback_size : default_pback_buffer_size;

        // Construct input buffer.
        if (can_read()) {
            // Begin gmg modifications here.
            // It should be possible to disable this workaround for people
            // not using STLPort.
    #ifndef FORGET_STLPORT
            pback_size_ = std::max( static_cast<streamsize>(2), // STLPort needs 2.
                                    pback_size );
    #endif // FORGET_STLPORT
            if (streamsize size = pback_size_ + buffer_size) {
                in().realloc(size);
                if (!shared_buffer()) init_get_area();
            }
        }
        // End gmg modifications here.

        // Construct output buffer.
        if (can_write() && !shared_buffer()) {
            if (buffer_size != 0) out().realloc(buffer_size);
            init_put_area();
        }
        storage_ = wrapper(t);
        flags_ |= f_open;
        if (can_write() && buffer_size) flags_ |= f_output_buffered;
    }

    template<typename T, typename Tr, typename Alloc, typename Mode>
    typename indirect_streambuf<T, Tr, Alloc, Mode>::int_type
    indirect_streambuf<T, Tr, Alloc, Mode>::underflow()
    {
        using namespace std;
        // Begin gmg modifications here.
        buffer_type& buf = in();
        // If buffering is disabled return eof().
        if (buf.size() == 0) return traits_type::eof(); // gmg
        if (!gptr()) init_get_area();
        //buffer_type& buf = in();
        // End gmg modifications here.
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());

        // Fill putback buffer.
        streamsize keep = std::min( static_cast<streamsize>(gptr() - eback()),
                                    pback_size_ );
        if (keep)
            traits_type::move( buf.data() + (pback_size_ - keep),
                               gptr() - keep, keep );

        // Set pointers to reasonable values in case read throws.
        setg( buf.data() + pback_size_ - keep,
              buf.data() + pback_size_,
              buf.data() + pback_size_ );

        // Read from source.
        streamsize chars =
            obj().read(buf.data() + pback_size_, buf.size() - pback_size_, next_);
        setg(eback(), gptr(), buf.data() + pback_size_ + chars);
        return chars != 0 ?
            traits_type::to_int_type(*gptr()) :
            traits_type::eof();
    }

    template<typename T, typename Tr, typename Alloc, typename Mode>
    std::streamsize indirect_streambuf<T, Tr, Alloc, Mode>::xsgetn
        (char_type* s, std::streamsize n)
    {
        using namespace std;
        buffer_type& buf = in();
        if (!gptr() && buf.size() > 0) init_get_area();
        streamsize total = 0;
        do {
            // Fill request from buffer if anything is available.
            if (streamsize avail =
                    std::min(n, static_cast<streamsize>(egptr() - gptr())))
            {
                traits_type::copy(s, gptr(), avail);
                gbump((int) avail);
                total += avail;
                s += avail;
                n -= avail;
            } else {
                // Either buffering is disabled or the buffer has been
                // exhausted. If the buffer size is less than or equal to n,
                // fill the request directly from the source. This includes
                // the case where input buffering has been disabled.
                if (buf.size() <= n) {
                    streamsize to_read = buf.size();
                    if (to_read == 0) to_read = n;
                    if (streamsize amt = obj().read(s, to_read, next_)) {
                        s += amt;
                        n -= amt;
                        total += amt;
                    } else {
                        // Something is screwed. Get out of dodge.
                        break;
                    }
                } else {
                    // The number of bytes requested is less than the buffer
                    // size. Call underflow to refill the input buffer. s and
                    // n will be updated from the (now full) input buffer on
                    // the next iteration of the do-loop.
                    if (traits_type::eq_int_type(traits_type::eof(), underflow()))
                        break;
                }
            }
        } while (n > 0);
        return total;
    }

This appears to work so far.

Regards,

George.

George M. Garner Jr. wrote:
Jonathan,
I'd appreciate it if you would comment out the declaration and implementation of xsgetn and see if you still experience problems.
It makes the problem worse. Then I get reads for a single character from std::streambuf<>::uflow():
    virtual int_type uflow()
    {   // get a character from stream, point past it
        return (_Traits::eq_int_type(_Traits::eof(), underflow())
            ? _Traits::eof()
            : _Traits::to_int_type(*_Gninc()));
    }
underflow or uflow should only be called when the input buffer is empty. The only effect of removing xsgetn should be that the default xsgetn kicks in. The default reads up to n characters "as if by repeated calls to sbumpc()," which means it reads from the buffer, filling it as necessary using uflow.
Fortunately, it is relatively trivial to disable Iostreams buffering altogether and write a buffering_shim_filter.
You can simply set the buffer size to zero when you add a filter to a chain.
Actually not! I was much chagrined to find that you cannot disable input buffering. Setting the input buffer and pback buffers to 0 actually results in an input buffer size of either 4 or 1, depending on whether or not I disable your STLPort workaround.
Unfortunately I'm not sure it's really a "workaround." As far as I can tell, STLPort's implementation is conforming, which means the buffer-sizing policy is necessary to handle an arbitrary conforming standard library.
:-( I do think it should be possible for someone not using STLPort to disable input buffering.
The stream buffers in a chain provide the guarantee that a single character can always be put back, so a buffer size of at least one is necessary. I guess I could relax this requirement for stream buffers which are not part of a chain, but I'd have to be convinced that maintaining a putback buffer is sometimes a performance bottleneck.
The following modified code appears to work:
Thanks for the code. Unfortunately I can't test it now, since I'm in the middle of adding support for non-blocking i/o and the codebase is in flux. Would it be fair to characterize your code as follows?

1. underflow and xsgetn both contain special code to handle the unbuffered case.
2. xsgetn fills the request by invoking the underlying source only if n is big; otherwise it fills the request from the buffer, calling underflow as necessary.

Regarding 1, I'd like to handle the unbuffered case by writing a separate stream buffer, unbuffered_streambuf, which combines the unbuffered sections from the various indirect_streambuf virtual functions. This would offer a tiny performance advantage, because there would be no check for the unbuffered case while performing i/o, but mostly it would make the code easier to read. Unfortunately, the necessity to deal with STLPort-type implementations makes this impractical. If I could be convinced that STLPort's behavior is non-conforming, I might do this. That still leaves the question whether the "unbuffered" streambuf should have a putback buffer of size 1, as is usual.

Regarding 2, your implementation of xsgetn is essentially the same as a quality default implementation, except for the optimization for very large n. In the absence of data showing the large-n optimization is significant, I'd prefer to use the default implementation to spare code size.

I'm interested to know whether I've understood your code correctly, whether you think always providing a putback buffer of size one is unreasonable, and whether you think the optimization for large n is critical.

Best Regards, Jonathan

Jonathan,

In the interests of time, I think the most important thing you could do right now is to make it easy to declare filtering_xxstream classes with specialized indirect_streambufs/direct_streambufs. It shouldn't be necessary to replace the entire plumbing just for me to make the few modifications that we have discussed. At the moment the choice of indirect/direct streambuf appears to be hardwired into streambuf_facade_traits. Or perhaps you can explain how to specialize just this class without replacing the filter plumbing.

Regards,

George.

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:d416hh$qb5$1@sea.gmane.org...

George M. Garner Jr. wrote:
Jonathan,
In the interests of time, I think the most important thing you could do right now is to make it easy to declare filtering_xxstream classes with specialized indirect_streambufs/direct_streambufs. It shouldn't be necessary to replace the entire plumbing just for me to make the few modifications that we have discussed. At the moment the choice of indirect/direct streambuf appears to be hardwired into streambuf_facade_traits. Or perhaps you can explain how to specialize just this class without replacing the filter plumbing.
You can't just replace the stream buffers in a chain with arbitrary stream buffers; they must have special properties in order to work together correctly. The type of changes you have proposed would best be encapsulated by a "buffering policy"; in other words, streambuf_facade would be given a second major template parameter:

    template<typename Component, typename Buffering, ...>
    class streambuf_facade;

The buffering policy would dictate whether buffers are present and the strategy for making use of the buffers. I implemented this last summer -- I'd summarize the interface, but it's buried somewhere in my local CVS repository. What I found was that to be sufficiently general the interface was extremely complex. If your goal is to write a stream buffer to access a particular device, it's far easier to write it from scratch than to implement a model of Device and a buffering policy. If your goal is to produce more efficient filter chains, the ability to specify custom buffering policies might be an advantage, but my judgment was that the current two-size-fits-all buffering policy was good enough for the initial release, and possibly for good. Rather than adding an additional policy parameter, I'd rather work on fine-tuning the two existing policies.
Regards,
George.
Jonathan
participants (2)
- George M. Garner Jr.
- Jonathan Turkanis