
On Sun, Feb 12, 2006 at 03:43:12PM -0800, Robert Ramey wrote:
troy d. straszheim wrote:
    template<class Archive, class OStream>
    BOOST_ARCHIVE_OR_WARCHIVE_DECL(void)
    basic_binary_oprimitive<Archive, OStream>::save(const char * s)
    {
        std::size_t l = std::strlen(s);
        this->This()->save(l);
        save_binary(s, l);
    }
One can fix this by implementing one's own portable binary primitives (which is what we've done), but that does involve duplicating a lot of library code in order to change a few lines. The special type for collection size (if applied consistently) seems cleaner.
I'm curious as to what problem using std::size_t created. That is, why did you feel you had to change it?
Gah. Looking at it again, I probably felt I had to change it in the binary primitives due to sleep deprivation, or just from having stared at the code for too long. There's no reason to implement separate portable primitives... You can simply add save/load for const char*, const std::string&, etc, in the most-derived portable_binary_(i|o)archive, and use whatever type you like to represent the string's size (there's a rough sketch of this at the end of this message). For the platforms that I need to be portable across, this is workable.

The save_collection stuff,

    template<class Archive, class Container>
    inline void save_collection(Archive & ar, const Container &s)
    {
        // record number of elements
        unsigned int count = s.size();
        ar << make_nvp("count", const_cast<const unsigned int &>(count));
        // ...

is fine for my portability purposes, since unsigned int is the same size everywhere I have to run.

But in general, the size_t problem I've hit looks like this: if you feel that you can't afford the extra time and space overhead of storing a one-byte size with every single int/long/long long in the archive, then you must decide how much storage each primitive gets when the archive is created and record it in the archive header or something. Take these 3 platforms:

                       intel32/glibc       intel64/glibc    ppc/darwin
    sizeof(int)        4                   4                4
    sizeof(long)       4                   8                4
    sizeof(long long)  8                   8                8
    uint32_t           unsigned int        unsigned int     unsigned int
    uint64_t           unsigned long long  unsigned long    unsigned long long
    size_t             unsigned int        unsigned long    unsigned long

Even if you could afford to keep size_t at 32 bits because your containers are never that big, or could afford to bump size_t up to 64 bits because you're not too short on disk space, you have problems. There's no way to do anything consistent with size_t if the archive doesn't know it is size_t. If you decide how big size_t will be on disk when the archive is created, then you have to shrink or expand all the types that size_t might be, before you write them and after you read them. You can either shrink it to 32 bits when saving on 64-bit machines (say, using numeric_limits<> to range-check and throw if something is out of range; second sketch at the end of this message), or save as 64 bits and shrink to 32 bits when loading on 32-bit platforms...

If you reduce "unsigned long" to 32 bits in the archive, you do get a consistent size for size_t across all platforms... but then you have no way to save 64-bit ints, because on intel64, uint64_t is unsigned long. If you increase unsigned long to 64 bits on disk, then intel64 and ppc have a consistent container size in the archive, but intel32 doesn't, since size_t there is unsigned int. You could bump up unsigned int *and* unsigned long to 64 bits; then you have a consistent container size across all three, but even more space overhead than the size-byte-per-primitive approach.

There may be some other cases, I dunno. The whole thing is kinda messy.

-t
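
A rough sketch of the first idea, adding a string save in the most-derived archive with a fixed-width length field. This is illustration only: the class name and helpers are invented, a real portable_binary_oarchive would derive from the binary archive templates rather than wrap a bare ostream, and endianness is ignored, as it is in the rest of this thread.

    #include <cstddef>
    #include <iostream>
    #include <limits>
    #include <stdexcept>
    #include <string>
    #include <boost/cstdint.hpp>

    // Illustration only: a stripped-down "most-derived" archive that picks a
    // fixed on-disk width (32 bits here) for string lengths, independent of
    // what std::size_t happens to be on the platform doing the writing.
    class portable_binary_oarchive_sketch {
    public:
        explicit portable_binary_oarchive_sketch(std::ostream & os) : m_os(os) {}

        void save(const std::string & s) {
            // the length is represented by a type we choose, not std::size_t
            if (s.size() > std::numeric_limits<boost::uint32_t>::max())
                throw std::range_error("string too long for a 32 bit length field");
            boost::uint32_t l = static_cast<boost::uint32_t>(s.size());
            save_binary(&l, sizeof(l));
            save_binary(s.data(), s.size());
        }

        void save(const char * s) {
            save(std::string(s));   // same on-disk format as std::string
        }

    private:
        void save_binary(const void * p, std::size_t n) {
            m_os.write(static_cast<const char *>(p),
                       static_cast<std::streamsize>(n));
        }
        std::ostream & m_os;
    };

The matching load side would read the 32-bit length first and then that many bytes, so both sides agree on the width no matter what size_t is locally.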
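And a sketch of the second option, deciding the on-disk width up front: every size_t is written as 32 bits, the save side range-checks with numeric_limits<> and throws on 64-bit platforms, and the load side is then identical everywhere. Again, the function names are made up and endianness is ignored.

    #include <cstddef>
    #include <iostream>
    #include <limits>
    #include <stdexcept>
    #include <boost/cstdint.hpp>

    // Illustration of "decide the on-disk width when the archive is created":
    // every size_t goes to disk as 32 bits.  A 64-bit saver has to range-check
    // and throw; the loader is the same on every platform.
    inline void save_size(std::ostream & os, std::size_t n) {
        if (n > static_cast<std::size_t>(
                    std::numeric_limits<boost::uint32_t>::max()))
            throw std::range_error("container size does not fit in 32 bits");
        boost::uint32_t on_disk = static_cast<boost::uint32_t>(n);
        os.write(reinterpret_cast<const char *>(&on_disk), sizeof(on_disk));
    }

    inline std::size_t load_size(std::istream & is) {
        boost::uint32_t on_disk;
        is.read(reinterpret_cast<char *>(&on_disk), sizeof(on_disk));
        return static_cast<std::size_t>(on_disk);   // widening is always safe
    }

Applying the same narrowing to every unsigned long is exactly the "reduce unsigned long to 32 bits" case above, with the drawback already noted: intel64 then has no way to round-trip a genuine 64-bit value.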