
On Sat, Jan 29, 2011 at 2:59 PM, Vicente Botet <vicente.botet@wanadoo.fr> wrote:
Cory Nelson wrote:
On Thu, Jan 27, 2011 at 6:49 AM, Vicente Botet <vicente.botet@wanadoo.fr> wrote:
I'm working on a frame concept, which is something like a bidirectional buffer. It allows prepending and appending binary data, and obtaining/removing the header and the trailer.
In addition, a frame can be split into fragments, and fragments can be joined to make a frame.
All of these features are needed when working with protocol stacks.
You'll have a winner if you expand that to allow bidirectional iteration over both bytes and fragments (a fragment then being just a raw array, giving random-access iteration within it). VERY useful for high-performance incremental parsing; I would definitely give my +1 for Boost inclusion.
I wanted to hide whether the memory is contiguous or not. Only when the user needs contiguous memory will the library ensure it on the resulting data. Of course, this contiguous memory will allow random access.
Could you give an example of how you want to use the features you need? That will help me see whether the current interface is enough or whether it must be extended.
I've designed a similar buffer class before, with I/O and incremental parsing in mind. In addition to what you describe, it also had reserve(size_t), commit(size_t), release(size_t), and access to the reserved area. I'm still not 100% sure whether this type of access is appropriate for a general buffer, but allowing it would cost nothing to those who aren't using it, and it would eliminate a ton of code duplication for a specific io::buffer, since the two are basically the same with the internals a bit more exposed.

1) Incremental parsing.

    enum result { ok, need_more };

    template<typename Iterator>
    result parse(Iterator first, Iterator last);

I can call parse() on the first fragment (just using pointers as iterators) for high performance. Only if the first fragment doesn't contain enough data to parse (returning need_more) will I call parse() on the entire buffer, which will be slower (like deque iterators) but rare. If that still returns need_more, I'll perform more I/O to fill the buffer with more data.

Think of, e.g., HTTP or XML, where the individual pieces (like a header or <element>) are quite small but you'll want to read in large page-sized fragments to reduce I/O calls. A single fragment will hold a lot of these pieces, and there's no reason to use slow iterators to get at them.

2) Scatter/gather I/O.

I will typically have a pool of page-sized fragments. A slow stream might only work on one or two fragments at a time, but for a fast one I might want to reduce I/O calls and work with many fragments at once.

If a stream is coming in fast, I can reserve() 64KB worth of fragments, request a scatter into the reserved area, then commit() whatever amount of data it actually gave me. Whatever didn't get committed will still be usable for the next I/O. I then release() however much data I was able to parse.
This can be implemented pretty cleanly with hierarchical iterators, like so:

    class fragment {
        T* begin();
        T* end();
    };

    class iterator {
        fragment& fragment();
    };

    class buffer {
        iterator begin();          // iterate over the committed range only.
        iterator end();

        iterator reserved_begin(); // iterate over the reserved (uninitialized/unconstructed) range only.
        iterator reserved_end();

        iterator to_iterator(fragment& f, T* iter); // creates a slower iterator from a fragment iterator.

        size_type size();          // committed size.
        size_type reserved_size();
        size_type capacity();      // size() + reserved_size().

        size_type reserve(size_type minsize); // ensures there are at least 'minsize' elements reserved, and returns the new reserved_size().
        size_type commit(size_type size);     // commits 'size' elements from the reserved space, returns the new size().
        size_type release(size_type size);    // releases 'size' elements from committed space, returns the new size().
    };

--
Cory Nelson
http://int64.org