Re: [boost] Re: Re: Formal review: serialization

Vladimir Prus wrote:
Robert Ramey wrote:
I guess I have two questions: 1. Won't serialization fail in some way if I just seek the stream to the position found in the index and try reading? 2. For random access I need to make sure that all saved objects have an export key. How do I do that? Not necessarily out-of-the-box, but where can I plug in the check?
I believe the original question referred to the ability to "browse" the archive for purposes of debugging etc. My view would be that the best way to do this would be to create an xml_archive. Your use case suggested that roughly 30 MB of data would probably end up as a 90 MB XML data file. The reason for thinking in XML is that I would hope there exists an "XML browser for large files" which would be suitable for this case. If such a thing doesn't exist, I would prefer spending time making that general-purpose tool (and maybe a commercial product) to making a custom archive of narrow purpose.

Random access into an archive would require some thought. First, it's not clear how it would be used. In general archives are nested structures; one could de-serialize an inner piece - but to where? The original data structure was embedded in something else which is not now there. So this can really only be considered in the context of a specific application. Here are a couple of plausible scenarios.

a) A general-purpose archive browser - this could browse the archive in a random manner but wouldn't actually de-serialize the data. I don't see any real problem here. One would need to create an index either as a side effect of archive creation or with a second pass over the final archive.

b) Using serialization for data-state logging:

Log(Archive ar, statedata){
    // save seek point to index
    // append to archive
    ar << statedata;
}

Recover(Archive ar, statedata &, seekpoint){
    // set stream seek point
    ar >> statedata;
}

I could envision something like this being made to work.

So I would say that generally, serialization would be expected to be a serial operation (surprise!). On the other hand, in certain special situations it might be possible/convenient to provide for some random access, but I would expect that to be application specific.
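For what it's worth, a minimal sketch of scenario (b), assuming a seekable stream and one self-contained text archive per record (so that seeking to a stored offset always lands on an archive header); the StateData type and the seek-point table are placeholders, not anything the library provides:

#include <fstream>
#include <vector>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>

struct StateData {
    int x;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int /*version*/) {
        ar & x;
    }
};

std::vector<std::streampos> seek_points;         // one seek point per record

void Log(std::ofstream & os, const StateData & sd) {
    seek_points.push_back(os.tellp());           // save seek point to index
    boost::archive::text_oarchive oa(os);        // fresh archive per record
    oa << sd;                                    // append to archive
}

void Recover(std::ifstream & is, StateData & sd, std::size_t record) {
    is.seekg(seek_points.at(record));            // set stream seek point
    boost::archive::text_iarchive ia(is);        // read back a single record
    ia >> sd;
}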
The reason why I think the other one is important is that it's actually saving/loading support for plain C++ arrays -- which is a rather basic thing.
Hmmm - the library already implements serialization of plain C++ arrays by serializing each element.
For *fixed-size* arrays. But not for dynamically allocated arrays. BTW, it seems we need two wrappers for completeness:
one for dynamic arrays with element-wise save
I would envision one using

ar << element_count;
for(i = 0; i < element_count; i++)
    ar << element[i];

I'm not sure that it's worth adding and documenting such an obvious and trivial thing as a separate wrapper.
and another for dynamic arrays with binary save.
I see the binary object as filling that role.

ar << make_binary_object(dynamic_array_address, element_size * element_count);

I once did propose a manual section named "Serialization Wrappers" which would have a couple of examples and highlight the fact that nvp and binary_object are instances of this concept. The idea was received unenthusiastically at the time, but now I'm becoming convinced it's more effective to explain and illustrate the concept than to try to anticipate all the wrappers that users might need. Actually, I think having such a section would have avoided the confusion surrounding the intended usage and purpose of binary_object.
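For illustration, here is roughly what the two shipped wrappers look like side by side; make_nvp and make_binary_object are the library's factory functions, and the surrounding function is just a made-up example:

#include <cstddef>
#include <boost/serialization/nvp.hpp>
#include <boost/serialization/binary_object.hpp>

template<class Archive>
void save_block(Archive & ar, char * buf, std::size_t n) {
    // nvp wraps a value together with a name (used by the xml archives)
    ar << boost::serialization::make_nvp("size", n);
    // binary_object wraps a raw block of memory of known size
    ar << boost::serialization::make_binary_object(buf, n);
}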
I agree. I actually have a *crazy* but cute idea that one could use the file offset as the object id. How are object ids assigned, and can I customize that process? That would keep overhead at an absolute minimum.
Object ids are assigned sequentially starting with 0. They are only used for classes which require tracking (e.g. when instances are serialized through pointers). They are used as indices into a vector, so using these indices keeps overhead to a minimum. There are cases when an object id is assigned but not written to the archive. I don't see these as having much utility outside of serialization.
Ah, I've missed that. Do I need to provide both 'type' and 'value'? Can't the serialization library work with just one?
Could be. I just provided both so that I could interoperate with mpl without having to think about each specific case.
It is also necessary to provide an overload for char*. Doesn't it count as a primitive type?
Default implementation for char * will work as it does for other pointers.
Eh... I thought that serialization of pointers to builtin types is just not allowed.
The default implementation level for pointers to primitives is set to not-serializable. This is resettable, as it is for any other type. I doubt that one would want to do it for something like int or double, as I would expect a lot of unintended side effects. One might want to use STRONG_TYPEDEF to define another int or double type and reset the implementation level of that - but that's going off on a tangent.
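To make the tangent concrete, a hedged sketch (names are made up, and the header location varies between Boost versions): wrapping int in a strong typedef gives a distinct class type that carries its own serialization settings, so plain int and int* are left untouched.

#include <boost/strong_typedef.hpp>   // boost/serialization/strong_typedef.hpp in later releases

BOOST_STRONG_TYPEDEF(int, counted_int)

namespace boost { namespace serialization {

// counted_int is a class type, so it gets the ordinary class defaults
// (including tracking) rather than the primitive-pointer restriction.
template<class Archive>
void serialize(Archive & ar, counted_int & i, const unsigned int /*version*/) {
    ar & i.t;    // BOOST_STRONG_TYPEDEF keeps the wrapped value in member 't'
}

}} // namespace boost::serialization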
Actually, my archive initially had only one (non-templated) 'save', for unsigned. I got a compile error until I declared a 'save' for const char*. I'm not sure why.
It's probably because of the above (a pointer to a primitive type). Attempts to serialize instances of types whose implementation level is set to not-serializable will result in compile-time assertions. (These assertions are sometimes the reason for the deeply nested mpl errors.)
I came to conclude it presented a big security risk. The problem is the following:
later:
char str[MAX_STRING_SIZE];
ar >> static_cast<char *>(str); // to avoid str being treated as an array
suppose the text archive gets corrupted to:
3000 abc............
The archive will load with a buffer overrun - a security risk.
Right. I think this problem can be addressed with a wrapper for dynamic array
char* str(0);
ar >> make_dynarray_wrapper(str);
so that the library allocates the string itself.
How does it know what size to make the array? Maybe you mean

ar >> make_dynarray_wrapper(str, number_of_elements * element_size);

Of course, for char arrays one could use

ar << binary_object(str, element_count * element_size);

and it would still be portable.
Can I define only one 'load' for unsigned int?
As opposed to?
As opposed to overloads for all builtin types. For polymorphic_archive we can't have a templated function which 'falls back' anywhere. We need to have a closed set of virtual functions, and I wonder what the minimal set is.
the rest. BTW, this provided a huge benefit. In the original version from last year I got into a never-ending battle specifying virtual functions, which depended on the compiler - long long, etc. It was hopeless - moving to templates solved that.
:-( I guess we're back to those problems.
I have recently got the polymorphic archive working on my machine. Only a couple of really small changes to the library code were required. For the list of primitive types I included all portable C++ primitive types (that is, no long long, __int64, etc.). I'm not sure how much interest the polymorphic archive will engender, and it's a little hard to understand until you've actually spent enough time with the library to appreciate the limitations of templated code in certain scenarios. So although it's clean (and clever), it's not going to be easy to explain. Robert Ramey
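A rough sketch of how code against a polymorphic archive might look; the header and class names here are assumptions modeled on the existing non-polymorphic archive classes, not something this post confirms:

#include <fstream>
#include <boost/archive/polymorphic_oarchive.hpp>
#include <boost/archive/polymorphic_text_oarchive.hpp>

// compiled once, in one translation unit; no template on the archive type
void save_count(boost::archive::polymorphic_oarchive & oa, const int & count) {
    oa << count;
}

int main() {
    std::ofstream ofs("log.txt");
    boost::archive::polymorphic_text_oarchive oa(ofs);   // concrete archive
    save_count(oa, 42);                                   // used through the abstract base
}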

On Thu, 22 Apr 2004 09:53:30 -0700, Robert Ramey wrote
I have recently got the polymorphic archive working on my machine. Only a couple of really small changes to the library code were required. For the list of primitive types I included all portable C++ primitive types (that is, no long long, __int64, etc.).
Use boost::int64_t and boost::uint64_t and all your problems will be solved :-) Date-time uses these and it works fine on all the boost test platforms. Jeff
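For concreteness, the suggestion amounts to something like this (assuming the platform provides a 64-bit integer type, in which case <boost/cstdint.hpp> defines these typedefs); the function is just an example:

#include <boost/cstdint.hpp>

template<class Archive>
void serialize_totals(Archive & ar, boost::int64_t & total, boost::uint64_t & count) {
    ar & total;    // maps to whatever 64-bit type the platform provides
    ar & count;
}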

Robert Ramey wrote:
I guess I have two questions: 1. Won't serialization fail in some way if I just seek the stream to the position found in the index and try reading? 2. For random access I need to make sure that all saved objects have an export key. How do I do that? Not necessarily out-of-the-box, but where can I plug in the check?
Random access into an archive would require some thought. First, it's not clear how it would be used. In general archives are nested structures; one could de-serialize an inner piece - but to where? The original data structure was embedded in something else which is not now there. So this can really only be considered in the context of a specific application. Here are a couple of plausible scenarios.
a) A general-purpose archive browser - this could browse the archive in a random manner but wouldn't actually de-serialize the data. I don't see any real problem here. One would need to create an index either as a side effect of archive creation or with a second pass over the final archive.
I'd rather like to deserialize the data when needed.
b) Using serialization for data-state logging:
Log(Archive ar, statedata){
    // save seek point to index
    // append to archive
    ar << statedata;
}
Recover(Archive ar, statedata &, seekpoint){
    // set stream seek point
    ar >> statedata;
}
I could envision something like this being made to work.
Yes, I'm looking for something like this.
So I would say that generally, serialization would be expected to be a serial operation (surprise!). On the other hand, in certain special situations it might be possible/convenient to provide for some random access, but I would expect that to be application specific.
Hmm... gotta look into this. What about my second question:
2. For random access I need to make sure that all saved objects have an export key. How do I do that? Not necessarily out-of-the-box, but where can I plug in the check?
one for dynamic arrays with element-wise save
I would envision one using
ar << element_count;
for(i = 0; i < element_count; i++)
    ar << element[i];
I'm not sure that it's worth adding and documenting such an obvious and trivial thing as a separate wrapper.
But on loading you need to add a 'new[]' call. So load and save become non-symmetric, and you need to split serialize, which is rather inconvenient.
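To spell out the asymmetry, a sketch of what the split member functions end up looking like for a dynamically allocated array (the Buffer class is just an example, not anything from the library):

#include <cstddef>
#include <boost/serialization/access.hpp>
#include <boost/serialization/split_member.hpp>

class Buffer {
public:
    Buffer() : n_(0), data_(0) {}
    ~Buffer() { delete [] data_; }
private:
    std::size_t n_;
    double * data_;

    friend class boost::serialization::access;

    template<class Archive>
    void save(Archive & ar, const unsigned int /*version*/) const {
        ar << n_;
        for(std::size_t i = 0; i < n_; ++i)
            ar << data_[i];
    }
    template<class Archive>
    void load(Archive & ar, const unsigned int /*version*/) {
        ar >> n_;
        delete [] data_;
        data_ = new double[n_];            // the new[] that breaks the symmetry
        for(std::size_t i = 0; i < n_; ++i)
            ar >> data_[i];
    }
    BOOST_SERIALIZATION_SPLIT_MEMBER()
};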
and another for dynamic arrays with binary save.
I see the binary object as filling that role.
ar << make_binary_object(dynamic_array_address, element_size * element_count);
I once did propose a manual section named "Serialization Wrappers" which would have a couple of examples and highlight the fact that nvp and binary_object are instances of this concept. The idea was received unenthusiastically at the time, but now I'm becoming convinced it's more effective to explain and illustrate the concept than to try to anticipate all the wrappers that users might need. Actually, I think having such a section would have avoided the confusion surrounding the intended usage and purpose of binary_object.
Sure, docs rarely hurt.
I agree. I actually have a *crazy* but cute idea that one could use the file offset as the object id. How are object ids assigned, and can I customize that process? That would keep overhead at an absolute minimum.
Object ids are assigned sequentially starting with 0. They are only used for classes which require tracking (e.g. when instances are serialized through pointers). They are used as indices into a vector, so using these indices keeps overhead to a minimum. There are cases when an object id is assigned but not written to the archive. I don't see these as having much utility outside of serialization.
So, it's not easily possible to plug a different algorithm?
Ah, I've missed that. Do I need to provide both 'type' and 'value'? Can't the serialization library work with just one?
Could be. I just provided both so that I could interoperate with mpl without having to think about each specific case.
But for the user this can be inconvenient.
Actually, my archive initially had only one (non-templated) 'save', for unsigned. I got a compile error until I declared a 'save' for const char*. I'm not sure why.
It's probably because of the above (a pointer to a primitive type). Attempts to serialize instances of types whose implementation level is set to not-serializable will result in compile-time assertions. (These assertions are sometimes the reason for the deeply nested mpl errors.)
In fact, it looked like the library tried to save a char* somewhere... I'll take a second look.
Right. I think this problem can be addressed with a wrapper for dynamic array
char* str(0);
ar >> make_dynarray_wrapper(str);
so that the library allocates the string itself.
How does it know what size to make the array? Maybe you mean
ar >> make_dynarray_wrapper(str, number_of_elements * element_size);
I actually meant:

int size;
char* str;
ar & make_dynarray_wrapper(str, size);

On load, 'size' is initialized to the size of the data and 'str' is new[]-ed.
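make_dynarray_wrapper does not exist in the library; purely as a sketch of what such a wrapper could look like (loading goes through a named wrapper object to sidestep the wrapper-trait machinery the real nvp/binary_object use):

#include <cstddef>
#include <boost/serialization/split_free.hpp>

template<class T>
struct dynarray_wrapper {
    T * & p;
    std::size_t & n;
    dynarray_wrapper(T * & p_, std::size_t & n_) : p(p_), n(n_) {}
};

template<class T>
dynarray_wrapper<T> make_dynarray_wrapper(T * & p, std::size_t & n) {
    return dynarray_wrapper<T>(p, n);
}

namespace boost { namespace serialization {

template<class Archive, class T>
void save(Archive & ar, const dynarray_wrapper<T> & w, const unsigned int) {
    ar << w.n;                            // element count first
    for(std::size_t i = 0; i < w.n; ++i)
        ar << w.p[i];
}

template<class Archive, class T>
void load(Archive & ar, dynarray_wrapper<T> & w, const unsigned int) {
    ar >> w.n;
    delete [] w.p;
    w.p = new T[w.n];                     // the library, not the caller, allocates
    for(std::size_t i = 0; i < w.n; ++i)
        ar >> w.p[i];
}

template<class Archive, class T>
void serialize(Archive & ar, dynarray_wrapper<T> & w, const unsigned int version) {
    split_free(ar, w, version);
}

}} // namespace boost::serialization

// usage on load:
//   std::size_t size = 0;
//   char * str = 0;
//   dynarray_wrapper<char> w = make_dynarray_wrapper(str, size);
//   ar >> w;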
Of course, for char arrays one could use
ar << binary_object(str, element_count * element_size);
And it would still be portable.
Again, how would I load the data? I'd need to new[] the array myself, and this leads to split serialization.
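For reference, a sketch of the split load path being described, assuming the element count was written before the binary block (make_binary_object is the library's factory for the binary_object wrapper):

#include <cstddef>
#include <boost/serialization/binary_object.hpp>

template<class Archive>
void load_chars(Archive & ar, char * & str, std::size_t & count) {
    ar >> count;                     // count must have been saved separately
    delete [] str;
    str = new char[count];           // caller-side new[] forces the split
    ar >> boost::serialization::make_binary_object(str, count);
}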
I have recently got the polymorphic archive working on my machine. Only a couple of really small changes to the library code were required. For the list of primitive types I included all portable C++ primitive types (that is, no long long, __int64, etc.). I'm not sure how much interest the polymorphic archive will engender, and it's a little hard to understand until you've actually spent enough time with the library to appreciate the limitations of templated code in certain scenarios. So although it's clean (and clever), it's not going to be easy to explain.
This is great news. Is it possible to make BOOST_EXPORT always register classes with the polymorphic archive and use the polymorphic archive as a fallback for serializing classes? - Volodya
participants (3)
- Jeff Garland
- Robert Ramey
- Vladimir Prus