[boost] Re: [serialize] xml archives and base 64 encoded binary data

25 Oct 2004


      "Russell Hind" <rh_gmane@mac.com> wrote in message
news:cl7qf0$b1a$1@sea.gmane.org...
...
Robert Ramey wrote:
...
It would seem that you're using the xml archive for purposes other than
for serialization.  Of course I don't see any problem with this (until
one decides to edit it and change its schema).  But I am curious what
use you've found for it.  I originally did it only to satisfy boost
nit-pickers, as I felt it was an inefficient way to implement
serialization.  I've since found it useful for debugging archives.  It
seems to be compatible with xml viewers, so it's useful for rendering
archives in a visible way.  So, after all, I have to concede that the
nit-pickers do have a point.  I have a sneaking suspicion that it will
turn up in all kinds of unexpected places and I'm wondering what those
might be.
I've been using our in-house implemented serialization stuff for a few
years, which offers similar functionality to yours.  Unfortunately ours
was very geared towards quickly dealing with large (>1Gb) files that
have 100,000's of pointer-based objects stored in them, so it was tied
to a specific app.
The system we are dealing with at the moment only generates smaller
files (20Mb or so), so boost::serialization will hopefully support it
nicely.  It also gives the advantage of XML/text archives as well as
binary.
We have an R&D group who only use python for testing purposes and want
to read in our data files for extra processing and trying out new ideas.
Binary files are by far the most efficient, but describing the
structure of a binary archive to someone who only uses python isn't easy
at all.  So XML seems like the way to go, as they can visually look at
it and see the information they want to pick out easily.
Can't you use boost python to call boost serialization from within
python via some wrapper function?  Wouldn't this make the whole process
totally painless?  I believe someone else (I forget who) was doing this
with good success.
...
Our data consists of many settings, 3d model information, user comments
etc, all of which are textual so XML/text supports them well, but three
quarters of the data is vectors of floating point scan data.  Writing
these textually would lead to an over-the-top archive.  Complete binary
would mean passing it to python users would be a pain, so XML with
encoding seems like a good solution.
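
To make the idea concrete from the Python side, here is a minimal
sketch of keeping textual fields readable while storing a float vector
as base64 inside XML.  The element names ("scan", "comment", "samples"),
the "count" attribute and the little-endian 32-bit float layout are
assumptions for illustration only; they are not the layout that boost
xml archives actually produce.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def floats_to_element(name, values):
    """Pack floats as little-endian 32-bit values and store them as base64 text."""
    raw = struct.pack("<%df" % len(values), *values)
    elem = ET.Element(name, count=str(len(values)), encoding="base64")
    elem.text = base64.b64encode(raw).decode("ascii")
    return elem


def element_to_floats(elem):
    """Reverse of floats_to_element: decode the base64 text back into floats."""
    raw = base64.b64decode(elem.text)
    count = int(elem.get("count"))
    return list(struct.unpack("<%df" % count, raw))


# Hypothetical document: a readable comment plus an encoded sample vector.
root = ET.Element("scan")
ET.SubElement(root, "comment").text = "settings and comments stay readable"
root.append(floats_to_element("samples", [0.5, 1.25, -3.0]))

xml_text = ET.tostring(root, encoding="unicode")
decoded = element_to_floats(ET.fromstring(xml_text).find("samples"))
```

Base64 costs about four characters per three bytes, so a 32-bit float
takes five to six characters however many digits its decimal form would
need.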
When the files get bigger, we can put them through a zip because the
python lot could still handle un-zipping and then reading xml so that
isn't an issue.
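
A rough sketch of that zip round trip using only the Python standard
library.  The member name "archive.xml" and the sample document are
made up for the example, and real scan data will compress far less well
than this repetitive sample.

```python
import io
import zipfile


def zip_xml(xml_text, member="archive.xml"):
    """Write the XML text into an in-memory zip file and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member, xml_text)
    return buf.getvalue()


def unzip_xml(zipped, member="archive.xml"):
    """Read the named member back out of the zip bytes as text."""
    with zipfile.ZipFile(io.BytesIO(zipped)) as zf:
        return zf.read(member).decode("utf-8")


# Hypothetical, highly repetitive sample document.
sample = "<scan>" + "<v>1.0</v>" * 1000 + "</scan>"
zipped = zip_xml(sample)
restored = unzip_xml(zipped)
```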
If it wasn't for the need to let our R&D group have access to data in
this way, then I would go for a binary format but I'm hoping that
ultimately zipped XML won't be a lot larger for our files (hoping to
test in the next few days).
The urgency of getting serialization up and running is that I've shied
away from introducing our serialization stuff into the project and
generating files in its format, because I was hoping that boost
serialization would be out in time (we ship in December) and we could
move to that, as it is a much more flexible system than our in-house
one.
...
You have a couple of options:
a) Make your own derivation of xml_(i/o)archive which uses your own
version of write/read_binary.  Advantage - wouldn't touch the current
archive classes.  The manual describes how to do this.
b) Just fix the current code that does the read/write_binary text data.
You could roll this into your own version of 1.32 and be on your way.
This is implemented as part of the dataflow iterators and I don't think
it is very difficult, except that understanding my dataflow iterator
idea would take some investment of effort that might not be worthwhile.
There is already a test for serialization of binary data, so even that
is done.  The reason I don't do it now is that it starts a whole chain
reaction regarding testing on all the platforms that boost supports, and
it is a very inconvenient time to do this.  Also, no one raised the
issue until now.
Fixing the current code would be my ideal solution; I'll just have to
see how much time I get to look into this.  If not, for now, I'm sure
the python lot can handle adding the necessary padding characters in.
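
For the Python side, a minimal sketch of that workaround.  It assumes
the only defect in the encoded text is that the trailing '=' padding is
missing; the helper name is invented for the example.

```python
import base64


def decode_unpadded_base64(text):
    """Decode base64 text whose trailing '=' padding may be missing."""
    text = text.strip()
    # Base64 text must be a multiple of 4 characters long; top it up with '='.
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded)


# "YWJjZA==" is the standard encoding of b"abcd"; drop the padding and recover it.
recovered = decode_unpadded_base64("YWJjZA")
```

Text that already carries its padding passes through unchanged, so the
same helper should keep working if the padding is later fixed at the
source.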
I take it the archive version will be increased for the next release if
something like this changes so current files will be compatible?
Thanks
Russell