
Vladimir Prus wrote:
Robert Ramey wrote:
-- (critical) The primary issue with the library is lack of reference documentation ...
OK, I don't see a problem with this - thanks to Dave for the hint on how to handle it.
For those stubs, the requirements docs that Dave mentioned would be nice. However, there's code that the user calls, and I'd like to see docs for that, too.
For example:
binary_object make_binary_object(void* ptr, unsigned size);
    Returns: binary_object(ptr, size);
binary_object::save(Archive& ar, unsigned version)
    Effects: Calls ar.save_binary(m_ptr, m_size);
binary_object::load(Archive& ar, unsigned version)
    Effects: Calls ar.load_binary(m_ptr, m_size);
Aside - I wouldn't expect load/save/load_binary/save_binary to ever be invoked by a library user.
It would be nice to avoid asking the user to do BOOST_CLASS_EXPORT for all possible argument types. What's desirable is:
    template<class T>
    void register_rpc_function(const char* name, void (*f)(const T&))
    {
        functions[name] = ... ;
        boost::serialization::export_class< function_call_1<T> >::instantiate();
    }
I can't put BOOST_CLASS_EXPORT inside register_rpc_function now.
This situation is discussed in the documentation under the heading "Template Serialization Traits". There is an example that assigns traits to the nvp<T> template. BOOST_CLASS_EXPORT is just a syntactic shorthand for the specialization above. Uh-oh - I just looked at the definition of BOOST_CLASS_EXPORT and I see it's not as obvious as it is for the other traits. I'll take a look at this.
I'm afraid that no matter how smart the XML reader is, it would still have to scan the entire file.
So what's wrong with that? I would guess that such an XML reader already exists somewhere and you can get everything for free. Assuming this doesn't address your need and you want to do some programming, one could create another XML archive version. This would create two output files: one exactly as it is now, along with another "index" file. When I was considering the options regarding XML output I briefly considered the possibility of creating two files - one like we have now - along with an optional parallel file containing the corresponding XML schema describing the XML archive. I decided to keep things simpler. But I think you see the idea.
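Something along these lines - just a sketch, the helper name is made up, and it assumes that offsets taken from the underlying stream are actually usable for seeking later:

    #include <fstream>
    #include <boost/archive/xml_oarchive.hpp>
    #include <boost/serialization/nvp.hpp>

    // hypothetical helper: save one record to the XML archive and note
    // the stream offset it begins at in a separate "index" file
    template<class T>
    void save_indexed(std::ofstream & os, boost::archive::xml_oarchive & oa,
                      std::ofstream & index, const T & t)
    {
        index << static_cast<std::streamoff>(os.tellp()) << '\n'; // where this record starts
        oa << boost::serialization::make_nvp("record", t);        // the record itself
    }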
I can expand the doc a bit. binary_object is just a wrapper around a size and a pointer to permit them to be handled as a pair. It presumes the pointer already points to allocated storage.
This last sentence is exactly what I'd like added to docs.
OK that's easy. Sorry for the confusion.
It also presumes that the size of the thing it's pointing to is the same on save and load. It's very lightweight.
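For what it's worth, usage would look something like this (just a sketch; the struct and member names are made up):

    #include <boost/serialization/binary_object.hpp>

    struct packet
    {
        unsigned char buffer[256];   // already-allocated storage

        template<class Archive>
        void serialize(Archive & ar, const unsigned int /* version */)
        {
            // treat the buffer as one opaque blob of bytes
            ar & boost::serialization::make_binary_object(buffer, sizeof(buffer));
        }
    };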
I don't see a need for the others, but of course any user can make the wrappers he needs.
The reason I think the other is important is that it's actually saving/loading support for plain C++ arrays -- which is a rather basic thing.
Hmmm - the library already implements serialization of plain C++ arrays by serializing each element. This is the more general solution as it calls serialization for each element. The only time one might want to do save/load for a whole array is for a non-portable binary file.

Re: save/load asymmetry for polymorphic pointers.
As I've said previously, this is a bug which is easy to make and very hard to debug. I think we can only wait for others to express an opinion, as we fail to convince each other.
Well, THAT we can agree on.
BTW, one usage of XML archives did occur to me. By checking the name tag on input, we can implement a crude check that save and load operations are synchronized. This would effectively be a debug mode for archives and might be useful.
But you won't catch a case where you save one type and load another.
True, but I think it would help a lot - and practically free to implement.
So, to get zero overhead I need to tweak the base class and disable tracking of pointers. Let me try that... yes, the results are nice. Only one extra element (class id) per saved item.
If you're not serializing pointers, the class_id isn't written to the archive even once. The object id is required for tracking, the class id is required for pointers. If versioning is used, the class id is used once. I do not believe that there is any information stored in an archive which is not used.
BTW, how do I set the tracking level and implementation level for a template class? I think I can partially specialize 'tracking_level', but it should be mentioned in the docs.
See docs section "Template Serialization Traits"
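Roughly, the partial specialization would look like this (a sketch following the pattern in the library headers; my_wrapper stands in for your own template):

    #include <boost/mpl/int.hpp>
    #include <boost/serialization/tracking.hpp>

    // my_wrapper is a stand-in for your own template
    template<class T> class my_wrapper;

    namespace boost { namespace serialization {

    // never track instances of my_wrapper<T>, whatever T is
    template<class T>
    struct tracking_level< my_wrapper<T> >
    {
        typedef mpl::integral_c_tag tag;
        typedef mpl::int_<track_never> type;
        BOOST_STATIC_CONSTANT(int, value = type::value);
    };

    }} // namespace boost::serialization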
- The documentation should really state the minimal set of 'load'/'save' overloads which will make an archive usable. For example, it's probably not necessary to provide a separate overload for 'bool', right?
The document states:
" However, we're not quite done. The above code addresses serialization of all non-primitive types. To be complete, each primitive type must either be covered by a definition of template<typename T> void load(T & t); or an overload of the >> operator"
It's also necessary to provide an overload for char*. Doesn't it count as a primitive type?
The default implementation for char * will work as it does for other pointers - which is probably not what one has in mind. I have code in there for handling it as a C string (it's commented out). I tested it and it worked, but I came to conclude it presented a big security risk. The problem is the following:

    char * str = "abc";
    ar << str;    // no problem

The (text) archive looks like:

    3 abc

Later:

    char str[MAX_STRING_SIZE];
    ar >> static_cast<char *>(str);  // to avoid str being treated as an array

Suppose the text archive gets corrupted to:

    3000 abc............

The archive will load with a buffer overrun - a security risk. So then one should dynamically allocate the storage according to the size - that is, one should be using std::string. So I decided to comment out the code that handles char *.
" If all primitive types have been accounted for, any program with serialization defined should work with the new archive."
Maybe I can expand upon that a little to something like:
"Any program with serialization defined should work with the new archive as long as every primitive type has a matching save/load function prototype or template."
Can I define only one 'load' for unsigned int?
As opposed to? The way I implemented the included archives, I specified load for those types requiring special treatment and used a template as a fallback for the rest. BTW this provided a huge benefit. In the original version of last year I got into a never-ending battle to specify virtual functions, which were dependent on the compiler - long long, etc. It was hopeless - moving to templates solved that.
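In other words, something like this (a sketch only - the class name and the string handling are made up):

    #include <cstddef>
    #include <istream>
    #include <string>

    class my_text_iarchive
    {
        std::istream & m_is;
    public:
        explicit my_text_iarchive(std::istream & is) : m_is(is) {}

        // special treatment: strings may contain spaces, so read length then data
        void load(std::string & s)
        {
            std::size_t size;
            m_is >> size;
            m_is.get();               // skip the separator
            s.resize(size);
            if(size)
                m_is.read(&s[0], static_cast<std::streamsize>(size));
        }

        // everything else - int, unsigned, double, long long, ... - falls through here
        template<class T>
        void load(T & t){ m_is >> t; }
    };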
I've concluded this myself. It's pretty easy given the bjam setup.
Maybe it can be made even easier with a bjam argument or environment variable. It just needs to be explained.
Right.
BTW, as a bjam expert, you might want to suggest how to do this in a cool way. I would love to have a shell script which lets me do

    test_archive <toolset> <archive>

but I don't see how to fix up the bjam files to support this - maybe you could look into it.
Do you mean you plan to always serialize char* as arrays and remove 'load'/'save' overloads for char*? Again, I'm not sure I understand the motivation.
See above.

Robert Ramey

Robert Ramey wrote:
For example:
binary_object make_binary_object(void* ptr, unsigned size);
    Returns: binary_object(ptr, size);
binary_object::save(Archive& ar, unsigned version)
    Effects: Calls ar.save_binary(m_ptr, m_size);
Aside - I wouldn't expect load/save/load_binary/save_binary to ever be invoked by a library user.
But make_binary_object is likely to be called, and to understand what it does you need docs for binary_object::save.
It would be nice to avoid asking the user to do BOOST_CLASS_EXPORT for all possible argument types. What's desirable is:
    template<class T>
    void register_rpc_function(const char* name, void (*f)(const T&))
    {
        functions[name] = ... ;
        boost::serialization::export_class< function_call_1<T> >::instantiate();
    }
I can't put BOOST_CLASS_EXPORT inside register_rpc_function now.
This situation is discussed in the documentation under the heading "Template Serialization Traits". There is an example that assigns traits to the nvp<T> template. BOOST_CLASS_EXPORT is just a syntactic shorthand for the specialization above. Uh-oh - I just looked at the definition of BOOST_CLASS_EXPORT and I see it's not as obvious as it is for the other traits.
Exactly. Not to mention that BOOST_CLASS_EXPORT:
1. Instantiates a function
2. Instantiates a helper class
which is considerably more complex than specializing a trait class and cannot be done in an arbitrary place.
I'm afraid that no matter how smart the XML reader is, it would still have to scan the entire file.
So what's wrong with that? I would guess that such an XML reader already exists somewhere and you can get everything for free. Assuming this doesn't address your need and you want to do some programming, one could create another XML archive version. This would create two output files.
I guess I have two questions:
1. Won't serialization fail in some way if I just seek the stream to the position found in the index and try reading?
2. For random access I need to make sure that all saved objects have an export key. How do I do that? Not necessarily out-of-the-box, but where can I plug in the check?
The reason I think the other is important is that it's actually saving/loading support for plain C++ arrays -- which is a rather basic thing.
Hmmm - the library already implements serialization of plain C++ arrays by serializing each element.
For *fixed-size* arrays. But not for dynamically allocated arrays. BTW, it seems we need two wrappers for completeness: one for dynamic arrays with element-wise save and another for dynamic arrays with binary save.
But you won't catch a case where you save one type and load another.
True, but I think it would help a lot - and practically free to implement.
That would be good.
So, to get zero overhead I need to tweak the base class and disable tracking of pointers. Let me try that... yes, the results are nice. Only one extra element (class id) per saved item.
If you're not serializing pointers, the class_id isn't written to the archive even once. The object id is required for tracking, the class id is required for pointers. If versioning is used, the class id is used once. I do not believe that there is any information stored in an archive which is not used.
I agree. I actually have a *crazy* but cute idea that one could use the file offset as the object id. How are object ids assigned, and can I customize that process? That would keep the overhead at an absolute minimum.
BTW, how do I set the tracking level and implementation level for a template class? I think I can partially specialize 'tracking_level', but it should be mentioned in the docs.
See docs section "Template Serialization Traits"
Ah, I've missed that. Do I need to provide both 'type' and 'value'? Can't the serialization library work with just one?
It's also necessary to provide an overload for char*. Doesn't it count as a primitive type?
The default implementation for char * will work as it does for other pointers.
Eh... I thought that serialization of pointers to builtin types is just not allowed. Actually, my archive initially had only one (non-templated) 'save' for unsigned. I got a compile error until I declared 'save' for const char*. I'm not sure why.
I came to conclude it presented a big security risk. The problem is the following:
Later:

    char str[MAX_STRING_SIZE];
    ar >> static_cast<char *>(str);  // to avoid str being treated as an array

Suppose the text archive gets corrupted to:

    3000 abc............

The archive will load with a buffer overrun - a security risk.
Right. I think this problem can be addressed with a wrapper for dynamic arrays:

    char* str(0);
    ar >> make_dynarray_wrapper(str);

so that the library allocates the string itself.
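Just to illustrate the allocation policy such a wrapper would use on the load side (make_dynarray_wrapper and this helper are hypothetical, not library code):

    #include <cstddef>

    // hypothetical: the load side reads the size first and allocates to fit
    template<class Archive>
    void load_dynarray(Archive & ar, char * & str)
    {
        std::size_t size = 0;
        ar >> size;                   // element count written by the save side
        str = new char[size + 1];     // allocated by the library, not a fixed buffer
        ar.load_binary(str, size);    // raw characters follow the count
        str[size] = '\0';
    }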
Can I define only one 'load' for unsigned int?
As opposed to?
As opposed to overloads for all builtin types. For a polymorphic_archive we can't have a templated function which falls back anywhere. We need to have a closed set of virtual functions, and I wonder what the minimal set is.
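To make the question concrete, the interface would have to look something like this (purely hypothetical, not the library's actual polymorphic archive):

    #include <string>

    // a closed set of virtuals - no template fallback is possible here
    class polymorphic_oarchive_interface
    {
    public:
        virtual ~polymorphic_oarchive_interface() {}
        virtual void save(bool t) = 0;
        virtual void save(char t) = 0;
        virtual void save(int t) = 0;
        virtual void save(unsigned int t) = 0;
        virtual void save(long t) = 0;
        virtual void save(unsigned long t) = 0;
        virtual void save(float t) = 0;
        virtual void save(double t) = 0;
        virtual void save(const std::string & t) = 0;
        // ... the open question is which of these are actually required
    };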
... used a template as a fallback for the rest. BTW this provided a huge benefit. In the original version of last year I got into a never-ending battle to specify virtual functions, which were dependent on the compiler - long long, etc. It was hopeless - moving to templates solved that.
:-( I guess we're back to those problems.
I've concluded this myself. It's pretty easy given the bjam setup. Maybe it can be made even easier with a bjam argument or environment variable. It just needs to be explained.
Right.
BTW, as a bjam expert, you might want to suggest how to do this in a cool way. I would love to have a shell script which lets me do

    test_archive <toolset> <archive>

but I don't see how to fix up the bjam files to support this - maybe you could look into it.
Maybe it could be possible.

- Volodya