
Hello,

I've recently used iostreams for easier gzipping and bzip2ing of output streams. However, I'd also like to be able to read any file compressed with either gzip or bzip2, by analyzing the input stream and deducing which decompressor to use. I don't want to open the file twice or to seek in the stream, as this might not be possible, e.g. when reading from the network.

My solution is to create a custom streambuf, which is told to read and retain the first few bytes to determine which decompressor to push onto the boost::iostreams::filtering_streambuf. Then the actual reading can take place: the custom streambuf first returns the retained bytes, then just streams the rest of the actual input streambuf.

As far as I can tell, this works nicely. I wonder if anybody else has a better solution for this? Perhaps there is some capability of iostreams that I've overlooked?

Below is the code; feel free to use it as you wish. I don't know if I need to override xsgetc(), underflow() and uflow(), so they are currently just non-working stubs. So far nothing seems to have triggered any of them: only xsgetn() seems to be used. I haven't found any actual errors, though. Both gzip and bzip2 files seem to decompress.

Otherwise the code can be improved a lot, e.g. by registering separate compression detectors instead of hard-coding them in the streambuf. I'm also a bit lost as to where and when to use streambufs instead of streams, e.g. for m_input.

Apologies if the code is too long for your mailing list; I didn't find any guidelines for this.
Regards,
Marcus

#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/noncopyable.hpp>
#include <cstdio>    // EOF
#include <iostream>
#include <string>

class general_decompressor_streambuf
    : public std::basic_streambuf<char, std::char_traits<char> >,
      public boost::noncopyable
{
private:
    std::streambuf& m_input;
    static const int BUFFER_SIZE = 5;
    unsigned char m_read[BUFFER_SIZE];
    std::streamsize m_readlen;  // number of bytes actually retained
    std::streamsize m_readpos;  // next retained byte to hand out
    std::string m_compression_type;

public:
    general_decompressor_streambuf(std::streambuf& i)
        : m_input(i), m_readlen(0), m_readpos(0) {}
    ~general_decompressor_streambuf() throw () {}

    std::string get_compression_type() const { return m_compression_type; }

    void resolve_compressor(boost::iostreams::filtering_streambuf<boost::iostreams::input>& sb)
    {
        // Retain up to BUFFER_SIZE leading bytes; give up on detection if
        // the input is shorter than that.
        while (m_readlen < BUFFER_SIZE) {
            int c = m_input.sbumpc();
            if (c == EOF)
                return;
            m_read[m_readlen++] = static_cast<unsigned char>(c);
        }
        if (m_read[0] == 037 && m_read[1] == 0213) {
            // gzip magic bytes: 0x1f 0x8b
            m_compression_type = "GZIP";
            sb.push(boost::iostreams::gzip_decompressor());
        } else if (m_read[0] == 'B' && m_read[1] == 'Z' && m_read[2] == 'h') {
            // bzip2 magic bytes: "BZh"
            m_compression_type = "BZIP2";
            sb.push(boost::iostreams::bzip2_decompressor());
        }
        // Otherwise: unknown format, push no decompressor.
    }

    std::streamsize xsgetn(char* s, std::streamsize n)
    {
        std::streamsize cnt = 0;
        // First hand out the retained bytes, never more than n of them.
        while (m_readpos < m_readlen && cnt < n) {
            *s++ = static_cast<char>(m_read[m_readpos++]);
            ++cnt;
        }
        if (cnt == n)
            return cnt;
        // Then stream the rest straight from the underlying streambuf.
        return cnt + m_input.sgetn(s, n - cnt);
    }

    // Non-working stubs: they only log and peek at the input, they neither
    // replay the retained bytes nor advance the stream.
    int xsgetc() { std::cerr << "xsgetc" << std::endl; return m_input.sgetc(); }
    int underflow() { std::cerr << "underflow" << std::endl; return m_input.sgetc(); }
    int uflow() { std::cerr << "uflow" << std::endl; return m_input.sgetc(); }
};

int main()
{
    general_decompressor_streambuf buffering_in_streambuf(*std::cin.rdbuf());
    boost::iostreams::filtering_streambuf<boost::iostreams::input> cmpr;
    // Detect the format first, so the decompressor sits in front of the source.
    buffering_in_streambuf.resolve_compressor(cmpr);
    cmpr.push(buffering_in_streambuf);
    std::istream i(&cmpr);
    boost::iostreams::copy(i, std::cout);
    return 0;
}
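
On the open question of whether underflow() and uflow() need overriding: the following is only a sketch of one possible arrangement, using nothing but the standard library (no Boost), and it is not the class posted above. The names prefix_replay_streambuf and detect_compression are hypothetical. The idea is to expose the retained bytes as the streambuf's get area with setg(), so that character-wise reads are served from them first, and to override underflow() to fetch one character at a time from the source afterwards. The default uflow() of std::basic_streambuf calls underflow() and then advances the get pointer, so under this arrangement it should not need its own override. xsgetn() is overridden only to keep bulk reads efficient.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: classify a prefix by its magic bytes.
// Returns "GZIP", "BZIP2", or an empty string for anything else.
inline std::string detect_compression(const std::string& p) {
    if (p.size() >= 2 && static_cast<unsigned char>(p[0]) == 0x1f &&
        static_cast<unsigned char>(p[1]) == 0x8b)
        return "GZIP";
    if (p.size() >= 3 && p.compare(0, 3, "BZh") == 0)
        return "BZIP2";
    return "";
}

// Hypothetical class: retains up to peek_len leading bytes of src,
// replays them to the reader, then forwards the rest of src.
class prefix_replay_streambuf : public std::streambuf {
public:
    prefix_replay_streambuf(std::streambuf& src, std::size_t peek_len)
        : m_src(src), m_ch(0) {
        // A source shorter than peek_len simply yields a shorter prefix.
        while (m_prefix.size() < peek_len) {
            int_type c = m_src.sbumpc();
            if (traits_type::eq_int_type(c, traits_type::eof())) break;
            m_prefix.push_back(traits_type::to_char_type(c));
        }
        // Expose the prefix as the initial get area so it is served first.
        char* b = &m_prefix[0];
        setg(b, b, b + m_prefix.size());
    }
    prefix_replay_streambuf(const prefix_replay_streambuf&) = delete;
    prefix_replay_streambuf& operator=(const prefix_replay_streambuf&) = delete;

    // The retained bytes, for inspection by a format detector.
    const std::string& prefix() const { return m_prefix; }

protected:
    // Called once the get area is exhausted: pull one character from the
    // source into a one-character get area.
    int_type underflow() override {
        int_type c = m_src.sbumpc();
        if (traits_type::eq_int_type(c, traits_type::eof())) return c;
        m_ch = traits_type::to_char_type(c);
        setg(&m_ch, &m_ch, &m_ch + 1);
        return traits_type::to_int_type(m_ch);
    }

    // Bulk read: drain whatever is still in the get area, then delegate.
    std::streamsize xsgetn(char* s, std::streamsize n) override {
        assert(n >= 0);
        std::streamsize got = 0;
        std::streamsize avail = egptr() - gptr();
        if (avail > 0) {
            got = std::min(avail, n);
            std::copy(gptr(), gptr() + got, s);
            gbump(static_cast<int>(got));
        }
        if (got < n) got += m_src.sgetn(s + got, n - got);
        return got;
    }

private:
    std::streambuf& m_src;
    std::string m_prefix;  // retained leading bytes (never resized after setg)
    char m_ch;             // one-character buffer used by underflow()
};
```

With something along these lines, the detection step would become detect_compression(buf.prefix()), and the object could presumably be pushed onto the filtering_streambuf in the same way as the class above. I have not tried this sketch together with the Boost decompressors.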