Performance problem in iostreams::stream

We use iostreams::stream to read from a legacy file format. The code that reads from the stream looks a little like this:

    void ReadStuff(std::istream &stream, Stuff &stuff, OtherStuff &otherStuff)
    {
        StreamDirectory streamDir;
        stream >> streamDir;
        stream.seekg(streamDir.GetOffset(STUFF_TAG));
        stream >> stuff;
        stream.seekg(streamDir.GetOffset(OTHER_STUFF_TAG));
        stream >> otherStuff;
    }

I.e. the stream starts with a directory that describes the items following it and holds the offsets to them. The stream and the code are also organized so that, in most cases, the calls to seekg target the current position of the stream. In other words, seekg shouldn't have to do anything.

If the stream is buffered, the code runs into serious performance problems. Eventually the seekg call ends up in this function:

    template<typename T, typename Tr, typename Alloc, typename Mode>
    typename indirect_streambuf<T, Tr, Alloc, Mode>::pos_type
    indirect_streambuf<T, Tr, Alloc, Mode>::seek_impl
        (stream_offset off, BOOST_IOS::seekdir way, BOOST_IOS::openmode which)
    {
        if (pptr() != 0)
            this->BOOST_IOSTREAMS_PUBSYNC(); // sync() confuses VisualAge 6.
        if (way == BOOST_IOS::cur && gptr())
            off -= static_cast<off_type>(egptr() - gptr());
        setg(0, 0, 0);
        setp(0, 0);
        return obj().seek(off, way, which, next_);
    }

As far as I can see, it just dumps the internal buffer and passes the call on. A few calls further down we end up in the stream source's seek function. This wouldn't be so bad if it weren't for the fact that the stream source's offset isn't the same as the stream's offset, since the stream is buffered.

Perhaps a little example would clarify this. Assume that we read 4 bytes from a stream with a 10k buffer. The stream fills its buffer from the underlying source, which means that after the read the stream has a full 10k buffer and an offset of 4 into that buffer. The underlying stream source has an offset pointer that points 10k+1 bytes into the underlying stream.
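
The state described in this example can be reproduced with the standard library alone. The following is only a sketch, not the Boost.Iostreams code: `MemorySource`, `BufferedReader`, and `ReadFourBytes` are hypothetical names invented for illustration, and the 100k in-memory "file" is an assumption. After reading 4 bytes through a 10k buffer, the stream has consumed 4 bytes while the source's next read would start at offset 10240 (counting from zero).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical stand-in for the stream source: an in-memory "file" that
// remembers where its next read will start.
struct MemorySource {
    std::string data;
    std::size_t offset = 0;

    std::size_t read(char* dst, std::size_t n) {
        assert(offset <= data.size());
        n = std::min(n, data.size() - offset);
        std::memcpy(dst, data.data() + offset, n);
        offset += n;
        return n;
    }
};

// Minimal buffered streambuf: it refills its whole buffer from the source
// whenever the get area runs dry.
class BufferedReader : public std::streambuf {
public:
    BufferedReader(MemorySource& src, std::size_t bufSize)
        : src_(src), buf_(bufSize) {}

    // Bytes already handed to the caller out of the current buffer.
    std::size_t consumed() const {
        return static_cast<std::size_t>(gptr() - eback());
    }

protected:
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        const std::size_t n = src_.read(buf_.data(), buf_.size());
        if (n == 0) return traits_type::eof();
        setg(buf_.data(), buf_.data(), buf_.data() + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    MemorySource& src_;
    std::vector<char> buf_;
};

struct Offsets {
    std::size_t streamPos;  // position as the reader of the stream sees it
    std::size_t sourcePos;  // position of the underlying source
};

// Reads 4 bytes through a 10k buffer and reports both positions.
Offsets ReadFourBytes() {
    MemorySource src;
    src.data.assign(100 * 1024, 'x');  // assumed 100k of dummy data
    BufferedReader buf(src, 10 * 1024);
    std::istream in(&buf);

    char four[4];
    in.read(four, 4);
    return {buf.consumed(), src.offset};
}
```

Calling `ReadFourBytes()` gives a stream position of 4 and a source position of 10240, which is the gap that a later seek has to reconcile.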
If we now call seekg to position the file at offset 4 (which is already the current position), the stream throws away its buffer and we end up in the stream source, whose file position is 10k+1, so it also throws away its internal buffers and seeks back in the underlying stream to offset 4. In this way, what should have been a no-op turns into something very time-consuming.

Previously we used a class that inherited directly from std::basic_streambuf and contained horrible code that no one really understood, so switching to boost::iostreams was a blessing. Unfortunately the boost::iostreams implementation is 10 times slower than our old class when it is buffered and 50% slower when it is unbuffered. When reading files that are a couple of GB, that really matters.

/Jerker
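
To make the cost of such a redundant seek concrete, here is a standard-library-only sketch that compares a stream buffer that always discards its buffer on a seek with one that first checks whether the target lies inside the data it already holds. It is not how Boost.Iostreams is implemented and not a patch for it. `CountingSource`, `SeekAwareBuf`, and `CountSeeks` are hypothetical names, and the buffer and file sizes are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical in-memory source that counts how often it really seeks.
struct CountingSource {
    std::string data;
    std::size_t offset = 0;
    int seeks = 0;

    std::size_t read(char* dst, std::size_t n) {
        n = std::min(n, data.size() - offset);
        std::memcpy(dst, data.data() + offset, n);
        offset += n;
        return n;
    }
    void seek(std::size_t pos) { offset = std::min(pos, data.size()); ++seeks; }
};

class SeekAwareBuf : public std::streambuf {
public:
    SeekAwareBuf(CountingSource& src, std::size_t bufSize, bool shortCircuit)
        : src_(src), buf_(bufSize), shortCircuit_(shortCircuit) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        const std::size_t start = src_.offset;  // file offset of buf_[0]
        const std::size_t n = src_.read(buf_.data(), buf_.size());
        if (n == 0) return traits_type::eof();
        bufStart_ = static_cast<std::streamoff>(start);
        setg(buf_.data(), buf_.data(), buf_.data() + n);
        return traits_type::to_int_type(*gptr());
    }

    pos_type seekpos(pos_type pos, std::ios_base::openmode) override {
        const std::streamoff target = pos;
        if (target < 0) return pos_type(off_type(-1));
        if (shortCircuit_ && eback() != nullptr) {
            const std::streamoff lo = bufStart_;
            const std::streamoff hi = lo + (egptr() - eback());
            if (target >= lo && target <= hi) {
                // Target is inside the buffered range: just move the get pointer.
                setg(eback(), eback() + (target - lo), egptr());
                return pos;
            }
        }
        setg(nullptr, nullptr, nullptr);             // discard the buffer
        src_.seek(static_cast<std::size_t>(target)); // and seek the source
        return pos;
    }

private:
    CountingSource& src_;
    std::vector<char> buf_;
    bool shortCircuit_;
    std::streamoff bufStart_ = 0;
};

// Reads 4 bytes, then twice "seeks" to the position the stream is already at.
// Returns the number of real seeks the source had to perform.
int CountSeeks(bool shortCircuit) {
    CountingSource src;
    src.data.resize(100 * 1024);
    for (std::size_t i = 0; i < src.data.size(); ++i)
        src.data[i] = static_cast<char>('a' + i % 26);

    SeekAwareBuf buf(src, 10 * 1024, shortCircuit);
    std::istream in(&buf);
    char four[4];

    in.read(four, 4);
    in.seekg(4);
    in.read(four, 4);
    assert(std::memcmp(four, src.data.data() + 4, 4) == 0);
    in.seekg(8);
    in.read(four, 4);
    assert(std::memcmp(four, src.data.data() + 8, 4) == 0);
    return src.seeks;
}
```

In this sketch `CountSeeks(false)` reports two real seeks on the source (each followed by a full 10k refill), while `CountSeeks(true)` reports none, and both variants read the same bytes.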
participants (1)
- Jerker Öhman