Re: [boost] small but handy (to me) data buffer class

26 Mar 2004

      From: "Matthew Hurd" <matt@finray.net>
...
...
On Behalf Of Rob Stewart
Fancy meeting you here.  We work at the same company, SIG, just on 
different
sides of the world.  I work on the top side in Sydney and you work down
north ;-)
I suspected it was you, but didn't want to assume it given the
non-SIG address.
...
...
Isn't that the purpose of shared_ptr and shared_array?
A shared_ptr idiom works just dandy.  Trouble is you have to add all the
alternatives with a handle-like idiom, or some such, which drives you to
policies and you end up with something similar in complexity anyway but
perhaps not quite as to the point.
Seeing the examples and code, I see that you're doing more than
just managing the deletion of such buffers; you providing the
means of allocation, the size, the data type, etc.
...
Being able to annotate quite clearly and simply the appropriate 
semantics
for data block ownership for network libs drove my turning 
  struct thing { size_t size; char * data; }
into several more lines.  Win32/64, bsd, ACE, Ensemble, etc all have 
varying
semantics for handling data blocks which at least with simple policies 
you
can approach declaratively.  At least that was the idea, not sure it is 
not
overkill though as it is all much ado about nothing.
The policy based approach is quite flexible.  Too flexible I
think.  See below.
...
data<> usual;
data<256, create_refer, kill_delete > take_responsibility_for;
data<256, create_new, kill_delete > page6502;
data<4096, create_new, kill_delete, __int8, __int64 > vm_page;
data<MTU_SIZE, create_refer, kill_free, char*, __int64 > dodgy;
data<1e10, create_refer, kill_free, char*, __int64 > 
big_sucker_zero_copy;
data_standard_ptr     dont_copy___share_the_love;  // non-copyable
I've found the iterator on the buffer via the neat 
boost::iterator_facade is
handy syntax sugar too.  (If I did it correctly, I was a facade 
virgin.)
/*---------------------------------------------------------------------
  created:	2004-3-24   18:30
  filename: 	data.hpp
  author:		Matt Hurd
purpose:	I'm sick of getting confused about who owns what buffer
                as I'm not too clever about remembering stuff.
This 7 way configurable data buffer that 'allows' binary
                compatibility with a WSABUF on win32 solves a few simple
                but annoying issues for me.
Also I've found it has acted as nice simple intro for 
                a couple of colleagues w.r.t. to policy based design.
7?  3 create policies and 3 delete policies == 9 options
                2 don't make sense, new with free and malloc with delete.
                7 that could make sense though I'm not sure the malloc 
                with free should ever be used when you have a perfectly 
                good new with delete ;-)
This is where I disagree with the flexibility.  I think the
allocation and deallocation routines should be paired in a single
policy.  That way, to create a new combination, one must do so
expressly, not accidentally as the current design allows.
...
PS:  A vector<Base> won't do because of the ownership
                transfer semantic issue when integrating with third
                party libraries.
-----------------------------------------------------------------------*/
#ifndef DATA_HPP_2004324
#define DATA_HPP_2004324
#include <boost/shared_ptr.hpp>
#include <boost/static_assert.hpp>
#include <boost/iterator/iterator_facade.hpp>
namespace net {
//-----------------------------------------------------------------------
----
//   Create policies
// new space, allocate with new
template < int MaxSize, class T, typename SizeType  >
Why restrict MaxSize to be of type int?  You can eliminate
SizeType; see below.
...
struct create_new
{
    static T * create()
    {
        return new T[MaxSize];
    }
static T * create( const T * source, SizeType size)
Why not make this a function template based upon SizeType?  The
compiler can even deduce the type needed and the policy user
won't need to specify a type.
...
{
        return static_cast<T*>(memcpy( create(), source,  size * 
sizeof(T) ));
Um, what if size > MaxSize?
...
}
protected:
    ~create_new() {}
};
// new space, allocate with malloc
template < int MaxSize, class T, typename SizeType  >
struct create_malloc
{
    static T * create()
    {
        return static_cast<T*>(malloc(MaxSize * sizeof(T) ));
    }
static T * create( const T * source, SizeType size)
    {
        return static_cast<T*>(memcpy( create(), source,  size * 
sizeof(T) ));
Same problem here.
...
}
protected:
    ~create_malloc() {}
};
// already allocated just refer to the space
template < int MaxSize, class T, typename SizeType  >
struct create_refer
"create_refer" didn't carry any meaning for me. "create_nothing"
works a little better.  Still, it's a strange "creation" policy
and helps to make my "combine them" case.  (See below.)
...
{
    static T * create()
    {
        // shouldn't be called
        BOOST_STATIC_ASSERT(false);
That will cause a compile-time assertion whenever this class is
instantiated, won't it?  The only way that would be safe is if
create() were a function template that was only instantiated if
used.  To prevent calling this function, why not omit it?
...
}
static T * create( T * source, SizeType size)
    {
        return source;
    }
protected:
    ~create_refer() {}
};
//-----------------------------------------------------------------------
----
//   Kill policies
// data owns the block, allocated with new T[]
It's comments like that that make me want to combine the
policies.
...
template< class T >
struct kill_delete
{
    static void kill( T * p )
    {
        delete [] p;
    }
protected:
    ~kill_delete() {}
};
So here's my take:

template < typename T, typename MaxSize >
struct storage_array_new
{
   static T * create()
   {
      return new T[MaxSize];
   }

   template < typename SizeType >
   static T * create(T * source, SizeType size)
   {
      BOOST_ASSERT(size <= MaxSize);
      return static_cast<T*>(memcpy(create(), source,  size * sizeof(T)));
   }

   static void kill(T * p)
   {
       delete [] p;
   }

protected:
   ~storage_array_new() {}
};
...
// data owns the block, allocated with malloc
template< class T >
struct kill_free
{
    static void kill( T * p )
    {
        free(p);
    }
protected:
    ~kill_free() {}
};
// data is not to own the data block, don't delete it
template< class T >
struct kill_nothing
{
    static void kill( T * p )
    {
}
protected:
    ~kill_nothing() {}
};
Combining your create_refer with kill_nothing would be more
meaningful than either separately.  Can you think of an occasion
when you'd want to allocate a buffer but not release it?

template < typename T, typename MaxSize >
struct storage_reference_only
{
   template < typename SizeType >
   static T * create(T * source, SizeType size)
   {
      BOOST_ASSERT(size <= MaxSize);
      return source;
   }

   static void kill(T * p)
   {
   }

protected:
   ~storage_reference_only() {}
};
...
//-----------------------------------------------------------------------
//   The data
template< 
    size_t MaxSize = 4096
Oops!  You used size_t (std::size_t, I assume) for MaxSize, but
int for the corresponding argument in the creation policy
classes.  I also don't think a default size is appropriate.
There are way too many scenarios in which that default is wrong
and too few in which it is right.
...
, template <int, class, class > class CreatePolicy = create_new
    , template <class> class KillPolicy = kill_delete
    , typename Base = char
^^^^
Bad choice of name.  This is the data type, so "T" is customary
and "Base" connotes base class.
...
, typename SizeType = size_t
...
class data : public CreatePolicy< MaxSize, Base, SizeType >, KillPolicy<
^^^^^^
Why?  Do you mean for clients to be able to call create() and
kill() or are those just for data?
...
Base >
My version:

template
   < typename MaxSize
   , template < typename, typename > class StoragePolicy =
      storage_array_new
   , typename T = char
...
class data : StoragePolicy< MaxSize, T >, boost::noncopyable

If only boost::noncopyable used base class chaining to ensure
this MI doesn't bloat the object (thus losing your binary
compatibility goal).  Lacking that, perhaps the thing to do is to
make the storage policy chainable:

namespace detail
{
   struct end_of_base_class_chain { };
}

template
   < typename T
   , typename MaxSize
   , class Base = detail::end_of_base_class_chain
...
struct storage_example : Base
{ ... };
...
// implicit conversion is a bad idea, but I'm addicted to its 
handiness, someone slap me ;-)
    operator Base * ()
    {
        return data_;
    }
Consider yourself slapped!  This is rarely a good idea.
...
SizeType size( ) const
    {
        return number_of_things_;
^^^^^^^^^^^^^^^^^
Wouldn't "size_" work?
...
SizeType number_of_bytes() const
    {
        return number_of_things_ * sizeof(Base);
    }
Perhaps "byte_count" would work?
...
Base * overwrite( SizeType data_size, const Base * source )
    {
        assert(data_size <= MaxSize);
        size(data_size);
        return static_cast<Base *>(memcpy( data_, source,  
number_of_bytes() ));
This duplicates logic already in some of the creation policies.
Perhaps you should put "copy" in the suggested storage policy.
(Another good reason for combining them!)
...
typedef iterator_type< Base > iterator;
    typedef iterator_type< Base const > const_iterator;
iterator begin()   { return iterator(data_); }
    iterator end()   { return iterator(data_ + number_of_things_); }
const_iterator begin() const  { return const_iterator(data_); }
    const_iterator end() const { return const_iterator(data_ + 
number_of_things_); }
Iterators are certainly compelling justification for this class.
...
private:
    // non copyable - if you're thinking of copying:
    //                     a)  think again, 
    //                     b)  use a shared_ptr<data...> to reference 
count
    //                     c)  if you really have to have the same data 
twice
    //                         use the constructor to clone
data( const data & source );
    data& operator=( const data & rhs );
Use boost::noncopyable.

Separate issue: where are the typedefs, etc. for generic code?
You should have data reveal its data type, its MaxSize, even its
storage policy.  They're liable to be useful.
...
typedef data<>  data_standard;
typedef boost::shared_ptr< data_standard > data_standard_ptr;
These typedef's don't belong in this header.  Leave it to clients
to establish their own idea(s) of what a "standard" version of
data is.

-- 
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;