Re: [boost] Re: Re: Formal review: serialization

Vladimir Prus wrote:
Robert Ramey wrote:
I guess I have two questions: 1. Won't serialization fail in some way if I just seek the stream to the position found in the index and try reading? 2. For random access I need to make sure that all saved objects have an export key. How do I do that? Not necessarily out-of-the-box, but where can I plug in the check?
I believe the original question referred to the ability to "browse" the archive for purposes of debugging etc. My view would be that the best way to do this would be to create an xml_archive. Your use case suggested that roughly 30 MB of data would probably end up as a 90 MB XML data file. The reason for thinking in XML is that I would hope there exists an "XML browser for large files" which would be suitable for this case. If such a thing doesn't exist, I would prefer spending time making that general-purpose tool (and maybe a commercial product) to making a custom archive of narrow purpose.

Random access into an archive would require some thought. First, it's not clear how it would be used. In general archives are nested structures; one could de-serialize an inner piece - but to where? The original data structure was embedded in something else which is not now there. So this can really only be considered in the context of a specific application. Here are a couple of plausible scenarios.

a) A general-purpose archive browser - this could browse the archive in a random manner but wouldn't actually de-serialize the data. I don't see any real problem here. One would need to create an index either as a side effect of archive creation or with a second pass over the final archive.

b) Using serialization for data-state logging:

Log(Archive ar, statedata){
    // save seek point to index
    // append to archive
    ar << statedata;
}

Recover(Archive ar, statedata &, seekpoint){
    // set stream seek point
    ar >> statedata;
}

I could envision something like this being made to work.

So I would say that generally, serialization would be expected to be a serial operation (surprise!). On the other hand, in certain special situations it might be possible/convenient to provide for some random access, but I would expect that to be application specific.
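For what it's worth, a minimal sketch of scenario (b), assuming a seekable stream and one self-contained text archive per record (so that seeking to a stored offset always lands on an archive header); the StateData type and the seek-point table are placeholders, not anything the library provides:

#include <fstream>
#include <vector>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>

struct StateData {
    int x;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int /*version*/) {
        ar & x;
    }
};

std::vector<std::streampos> seek_points;         // one seek point per record

void Log(std::ofstream & os, const StateData & sd) {
    seek_points.push_back(os.tellp());           // save seek point to index
    boost::archive::text_oarchive oa(os);        // fresh archive per record
    oa << sd;                                    // append to archive
}

void Recover(std::ifstream & is, StateData & sd, std::size_t record) {
    is.seekg(seek_points.at(record));            // set stream seek point
    boost::archive::text_iarchive ia(is);        // read back a single record
    ia >> sd;
}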
The reason why I think the other one is important is that it's actually saving/loading support for plain C++ arrays -- which is a rather basic thing.
Hmmm - the library already implements serialization of plain C++ arrays by serializing each element.
For *fixed-size* arrays. But not for dynamically allocated arrays. BTW, it seems we need two wrappers for completeness:
one for dynamic arrays with element-wise save
I would envision one using

ar << element_count;
for(i = 0; i < element_count; i++)
    ar << element[i];

I'm not sure that it's worth adding and documenting such an obvious and trivial thing as a separate wrapper.
and another for dynamic arrays with binary save.
I see the binary object as filling that role.

ar << make_binary_object(dynamic_array_address, element_size * element_count);

I once did propose a manual section named "Serialization Wrappers" which would have a couple of examples and highlight the fact that nvp and binary_object are instances of this concept. The idea was received unenthusiastically at the time, but now I'm becoming convinced it's more effective to explain and illustrate the concept than to try to anticipate all the wrappers that users might need. Actually, I think having such a section would have avoided the confusion surrounding the intended usage and purpose of binary_object.
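For illustration, here is roughly what the two shipped wrappers look like side by side; make_nvp and make_binary_object are the library's factory functions, and the surrounding function is just a made-up example:

#include <cstddef>
#include <boost/serialization/nvp.hpp>
#include <boost/serialization/binary_object.hpp>

template<class Archive>
void save_block(Archive & ar, char * buf, std::size_t n) {
    // nvp wraps a value together with a name (used by the xml archives)
    ar << boost::serialization::make_nvp("size", n);
    // binary_object wraps a raw block of memory of known size
    ar << boost::serialization::make_binary_object(buf, n);
}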
I agree. I actually have a *crazy* but cute idea that one could use the file offset as the object id. How are object ids assigned, and can I customize that process? That would keep overhead at an absolute minimum.
Object ids are assigned sequentially starting with 0. They are only used for classes which require tracking (e.g. when instances are serialized through pointers). They are used as indices into a vector, so using these indices keeps overhead to a minimum. There are cases when an object id is assigned but not written to the archive. I don't see these as having much utility outside of serialization.
Ah, I've missed that. Do I need to provide both 'type' and 'value'? Can't the serialization library work with just one?
Could be. I just provided both so that I could interoperate with mpl without having to think about each specific case.
It is also necessary to provide an overload for char*. Doesn't it count as a primitive type?
Default implementation for char * will work as it does for other pointers.
Eh... I thought that serialization of pointers to builtin types is just not allowed.
The default implementation level for pointers to primitives is set to not-serializable. This is resettable, as it is for any other type. I doubt that one would want to do it for something like int or double, as I would expect a lot of unintended side effects. One might want to use STRONG_TYPEDEF to define another int or double type and reset the implementation level of that - but that's going off on a tangent.
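To make the tangent concrete, a hedged sketch (names are made up, and the header location varies between Boost versions): wrapping int in a strong typedef gives a distinct class type that carries its own serialization settings, so plain int and int* are left untouched.

#include <boost/strong_typedef.hpp>   // boost/serialization/strong_typedef.hpp in later releases

BOOST_STRONG_TYPEDEF(int, counted_int)

namespace boost { namespace serialization {

// counted_int is a class type, so it gets the ordinary class defaults
// (including tracking) rather than the primitive-pointer restriction.
template<class Archive>
void serialize(Archive & ar, counted_int & i, const unsigned int /*version*/) {
    ar & i.t;    // BOOST_STRONG_TYPEDEF keeps the wrapped value in member 't'
}

}} // namespace boost::serialization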
Actually, my archive initially had only one (non-templated) 'save', for unsigned. I got a compile error until I declared a 'save' for const char*. I'm not sure why.
It's probably because of the above (a pointer to a primitive type). Attempts to serialize instances of types whose implementation level is set to not-serializable will result in compile-time assertions. (These assertions are sometimes the reason for the deeply nested mpl errors.)
I came to conclude it presented a big security risk. The problem is the following:
later:
char str[MAX_STRING_SIZE];
ar >> static_cast<char *>(str); // to avoid str being treated as an array
suppose the text archive gets corrupted to:
3000 abc............
The archive will load with a buffer overrun - a security risk.
Right. I think this problem can be addressed with a wrapper for dynamic array
char* str(0);
ar >> make_dynarray_wrapper(str);
so that the library allocates the string itself.
How does it know what size to make the array? Maybe you mean

ar >> make_dynarray_wrapper(str, number_of_elements * element_size);

Of course, for char arrays one could use

ar << binary_object(str, element_count * element_size);

and it would still be portable.
Can I define only one 'load' for unsigned int?
As opposed to?
As opposed to overloads for all builtin types. For polymorphic_archive we can't have a templated function which 'falls back' anywhere. We need to have a closed set of virtual functions, and I wonder what the minimal set is.
the rest. BTW, this provided a huge benefit. In the original version from last year I got into a never-ending battle specifying virtual functions, which depended on the compiler - long long, etc. It was hopeless - moving to templates solved that.
:-( I guess we're back to those problems.
I have recently got the polymorphic archive working on my machine. Only a couple of really small changes to the library code were required. For the list of primitive types I included all portable C++ primitive types (that is, no long long, __int64, etc.). I'm not sure how much interest the polymorphic archive will engender, and it's a little hard to understand until you've actually spent enough time with the library to appreciate the limitations of templated code in certain scenarios. So although it's clean (and clever), it's not going to be easy to explain. Robert Ramey
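A rough sketch of how code against a polymorphic archive might look; the header and class names here are assumptions modeled on the existing non-polymorphic archive classes, not something this post confirms:

#include <fstream>
#include <boost/archive/polymorphic_oarchive.hpp>
#include <boost/archive/polymorphic_text_oarchive.hpp>

// compiled once, in one translation unit; no template on the archive type
void save_count(boost::archive::polymorphic_oarchive & oa, const int & count) {
    oa << count;
}

int main() {
    std::ofstream ofs("log.txt");
    boost::archive::polymorphic_text_oarchive oa(ofs);   // concrete archive
    save_count(oa, 42);                                   // used through the abstract base
}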

On Thu, 22 Apr 2004 09:53:30 -0700, Robert Ramey wrote
I have recently got the polymorphic archive working on my machine. Only a couple of really small changes to the library code were required. For the list of primitive types I included all portable C++ primitive types (that is, no long long, __int64, etc.).
Use boost::int64_t and boost::uint64_t and all your problems will be solved :-) Date-time uses these and it works fine on all the boost test platforms. Jeff
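For concreteness, the suggestion amounts to something like this (assuming the platform provides a 64-bit integer type, in which case <boost/cstdint.hpp> defines these typedefs); the function is just an example:

#include <boost/cstdint.hpp>

template<class Archive>
void serialize_totals(Archive & ar, boost::int64_t & total, boost::uint64_t & count) {
    ar & total;    // maps to whatever 64-bit type the platform provides
    ar & count;
}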

Robert Ramey wrote:
I guess I have two questions: 1. Won't serialization fail in some way if I just seek the stream to the position found in the index and try reading? 2. For random access I need to make sure that all saved objects have an export key. How do I do that? Not necessarily out-of-the-box, but where can I plug in the check?
Random access into an archive would require some thought. First, it's not clear how it would be used. In general archives are nested structures; one could de-serialize an inner piece - but to where? The original data structure was embedded in something else which is not now there. So this can really only be considered in the context of a specific application. Here are a couple of plausible scenarios.
a) A general-purpose archive browser - this could browse the archive in a random manner but wouldn't actually de-serialize the data. I don't see any real problem here. One would need to create an index either as a side effect of archive creation or with a second pass over the final archive.
I'd rather like to deserialize the data when needed.
b) Using serialization for data-state logging:
Log(Archive ar, statedata){
    // save seek point to index
    // append to archive
    ar << statedata;
}
Recover(Archive ar, statedata &, seekpoint){
    // set stream seek point
    ar >> statedata;
}
I could envision something like this being made to work.
Yes, I'm looking for something like this.
So I would say that generally, serialization would be expected to be a serial operation (surprise!). On the other hand, in certain special situations it might be possible/convenient to provide for some random access, but I would expect that to be application specific.
Hmm... gotta look into this. What about my second question:
2. For random access I need to make sure that all saved objects have an export key. How do I do that? Not necessarily out-of-the-box, but where can I plug in the check?
one for dynamic arrays with element-wise save
I would envision one using
ar << element_count;
for(i = 0; i < element_count; i++)
    ar << element[i];
I'm not sure that it's worth adding and documenting such an obvious and trivial thing as a separate wrapper.
But on loading you need to add a 'new[]' call. So load and save become non-symmetric, and you need to split serialize, which is rather inconvenient.
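To spell out the asymmetry, a sketch of what the split member functions end up looking like for a dynamically allocated array (the Buffer class is just an example, not anything from the library):

#include <cstddef>
#include <boost/serialization/access.hpp>
#include <boost/serialization/split_member.hpp>

class Buffer {
public:
    Buffer() : n_(0), data_(0) {}
    ~Buffer() { delete [] data_; }
private:
    std::size_t n_;
    double * data_;

    friend class boost::serialization::access;

    template<class Archive>
    void save(Archive & ar, const unsigned int /*version*/) const {
        ar << n_;
        for(std::size_t i = 0; i < n_; ++i)
            ar << data_[i];
    }
    template<class Archive>
    void load(Archive & ar, const unsigned int /*version*/) {
        ar >> n_;
        delete [] data_;
        data_ = new double[n_];            // the new[] that breaks the symmetry
        for(std::size_t i = 0; i < n_; ++i)
            ar >> data_[i];
    }
    BOOST_SERIALIZATION_SPLIT_MEMBER()
};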
and another for dynamic arrays with binary save.
I see the binary object as filling that role.
ar << make_binary_object(dynamic_array_address, element_size * element_count);
I once did propose a manual section named "Serialization Wrappers" which would have a couple of examples and highlight the fact that nvp and binary_object are instances of this concept. The idea was received unenthusiastically at the time, but now I'm becoming convinced it's more effective to explain and illustrate the concept than to try to anticipate all the wrappers that users might need. Actually, I think having such a section would have avoided the confusion surrounding the intended usage and purpose of binary_object.
Sure, docs rarely hurt.
I agree. I actually have a *crazy* but cute idea that one could use the file offset as the object id. How are object ids assigned, and can I customize that process? That would keep overhead at an absolute minimum.
Object ids are assigned sequentially starting with 0. They are only used for classes which require tracking (e.g. when instances are serialized through pointers). They are used as indices into a vector, so using these indices keeps overhead to a minimum. There are cases when an object id is assigned but not written to the archive. I don't see these as having much utility outside of serialization.
So, it's not easily possible to plug a different algorithm?
Ah, I've missed that. Do I need to provide both 'type' and 'value'? Can't the serialization library work with just one?
Could be. I just provided both so that I could interoperate with mpl without having to think about each specific case.
But for the user this can be inconvenient.
Actually, my archive initially had only one (non-templated) 'save', for unsigned. I got a compile error until I declared a 'save' for const char*. I'm not sure why.
It's probably because of the above (a pointer to a primitive type). Attempts to serialize instances of types whose implementation level is set to not-serializable will result in compile-time assertions. (These assertions are sometimes the reason for the deeply nested mpl errors.)
In fact, it looked like the library tried to save a char* somewhere... I'll take a second look.
Right. I think this problem can be addressed with a wrapper for dynamic array
char* str(0);
ar >> make_dynarray_wrapper(str);
so that the library allocates the string itself.
How does it know what size to make the array? Maybe you mean
ar >> make_dynarray_wrapper(str, number_of_elements * element_size);
I actually meant:

int size;
char* str;
ar & make_dynarray_wrapper(str, size);

On load, 'size' is initialized to the size of the data and 'str' is new[]-ed.
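make_dynarray_wrapper does not exist in the library; purely as a sketch of what such a wrapper could look like (loading goes through a named wrapper object to sidestep the wrapper-trait machinery the real nvp/binary_object use):

#include <cstddef>
#include <boost/serialization/split_free.hpp>

template<class T>
struct dynarray_wrapper {
    T * & p;
    std::size_t & n;
    dynarray_wrapper(T * & p_, std::size_t & n_) : p(p_), n(n_) {}
};

template<class T>
dynarray_wrapper<T> make_dynarray_wrapper(T * & p, std::size_t & n) {
    return dynarray_wrapper<T>(p, n);
}

namespace boost { namespace serialization {

template<class Archive, class T>
void save(Archive & ar, const dynarray_wrapper<T> & w, const unsigned int) {
    ar << w.n;                            // element count first
    for(std::size_t i = 0; i < w.n; ++i)
        ar << w.p[i];
}

template<class Archive, class T>
void load(Archive & ar, dynarray_wrapper<T> & w, const unsigned int) {
    ar >> w.n;
    delete [] w.p;
    w.p = new T[w.n];                     // the library, not the caller, allocates
    for(std::size_t i = 0; i < w.n; ++i)
        ar >> w.p[i];
}

template<class Archive, class T>
void serialize(Archive & ar, dynarray_wrapper<T> & w, const unsigned int version) {
    split_free(ar, w, version);
}

}} // namespace boost::serialization

// usage on load:
//   std::size_t size = 0;
//   char * str = 0;
//   dynarray_wrapper<char> w = make_dynarray_wrapper(str, size);
//   ar >> w;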
Of course, for char arrays one could use
ar << binary_object(str, element_count * element_size);
And it would still be portable.
Again, how would I load the data? I'd need to new[] the array myself, and this leads to split serialization.
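For reference, a sketch of the split load path being described, assuming the element count was written before the binary block (make_binary_object is the library's factory for the binary_object wrapper):

#include <cstddef>
#include <boost/serialization/binary_object.hpp>

template<class Archive>
void load_chars(Archive & ar, char * & str, std::size_t & count) {
    ar >> count;                     // count must have been saved separately
    delete [] str;
    str = new char[count];           // caller-side new[] forces the split
    ar >> boost::serialization::make_binary_object(str, count);
}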
I have recently got the polymorphic archive working on my machine. Only a couple of really small changes to the library code were required. For the list of primitive types I included all portable C++ primitive types (that is, no long long, __int64, etc.). I'm not sure how much interest the polymorphic archive will engender, and it's a little hard to understand until you've actually spent enough time with the library to appreciate the limitations of templated code in certain scenarios. So although it's clean (and clever), it's not going to be easy to explain.
This is great news. Is it possible to make BOOST_EXPORT always register classes with the polymorphic archive and use the polymorphic archive as a fallback for serializing classes? - Volodya
participants (3)
- Jeff Garland
- Robert Ramey
- Vladimir Prus