Re: [Boost-users] [Serialization] A few questions about boost::serialization

4 Aug 2008


Just an FYI, boost::serialization is not a SAX parser like libxml2, so it will be orders of
magnitude faster than one. However, you will lose some flexibility if, for some
reason, there is an error in the serialized stream due to human interaction (say, someone hand-editing it). A
SAX parser could move on and ignore the error; with boost::serialization it's
kind of all or nothing. Unless Matthias or Robert know something that I
don't, as they are the authors, while I'm just an avid user.
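A minimal sketch of why binary input tends to be all or nothing, assuming a hypothetical length-prefixed layout (this is not Boost.Serialization's actual archive format). A binary stream has no delimiters or tags to resynchronize on, so once a length field is damaged every later read is misaligned, and the only safe response is to abort the whole load. The helpers write_string, read_string and load_all are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical layout: each string is [uint32 length in native byte order][bytes].
void write_string(std::vector<char>& out, const std::string& s) {
    const std::uint32_t n = static_cast<std::uint32_t>(s.size());
    const char* p = reinterpret_cast<const char*>(&n);
    out.insert(out.end(), p, p + sizeof n);
    out.insert(out.end(), s.begin(), s.end());
}

// Reads the string starting at `pos` and advances `pos`; throws if the
// buffer is too short for the length prefix or for the payload it announces.
std::string read_string(const std::vector<char>& in, std::size_t& pos) {
    std::uint32_t n = 0;
    if (in.size() - pos < sizeof n) throw std::runtime_error("truncated length");
    std::memcpy(&n, in.data() + pos, sizeof n);
    pos += sizeof n;
    if (in.size() - pos < n) throw std::runtime_error("truncated payload");
    std::string s(in.data() + pos, n);
    pos += n;
    return s;
}

// All or nothing: there is no delimiter to skip ahead to, so the first bad
// record aborts the whole load and no partial result is returned.
std::vector<std::string> load_all(const std::vector<char>& in) {
    std::vector<std::string> result;
    std::size_t pos = 0;
    while (pos < in.size()) result.push_back(read_string(in, pos));
    return result;
}
```

Corrupting a single length byte in such a buffer makes load_all throw instead of returning the records that were still intact, whereas a SAX-style parser could skip ahead to the next tag.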

Cheers,
Tim

On Mon, Aug 4, 2008 at 10:31 AM, <Sebastian.Karlsson@mmorpgs.org> wrote:
...
1) In the overview section, performance is nowhere to be seen as a goal,
...
...
...
...
which for my use case is very important. If I were to use the binary
archive, how well would it perform in comparison to a hand-crafted,
optimized serialization approach? I've seen in the examples that strings
seem to be used to identify data; won't this create a large overhead for
both deserialization and storage?
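On the storage question: as far as I know, only Boost's XML archive writes the names passed through BOOST_SERIALIZATION_NVP (as element tags); the binary and text archives leave them out, so the field names live in the serialize() code rather than in the data. The hypothetical sketch below (plain C++, not Boost code; Vector3, to_tagged_text and to_binary are made-up names) shows the difference for a three-float vector: the tagged text form repeats every name and spells out each number, while the binary form is just the raw bytes.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

struct Vector3 { float x, y, z; };

// XML-style: each value is wrapped in its field name, similar in spirit
// to what an XML archive does with named values.
std::string to_tagged_text(const Vector3& v) {
    return "<x>" + std::to_string(v.x) + "</x>" +
           "<y>" + std::to_string(v.y) + "</y>" +
           "<z>" + std::to_string(v.z) + "</z>";
}

// Binary: only the raw bytes, in native byte order; the field order is
// fixed by the code, not recorded in the data.
std::vector<char> to_binary(const Vector3& v) {
    std::vector<char> out;
    for (float f : {v.x, v.y, v.z}) {
        const char* p = reinterpret_cast<const char*>(&f);
        out.insert(out.end(), p, p + sizeof f);
    }
    return out;
}
```

For {3, 3, 3} the tagged form is 45 characters against 3 * sizeof(float) bytes for the binary form, and reading it back also means scanning for tags and converting text to numbers.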

Performance is a secondary goal that I have worked on, especially the
serialization of large dense arrays. This is now as fast as any hand-crafted
approach. What data structures are you interested in?
Matthias
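For the dense-array case mentioned above, a hand-crafted binary approach usually boils down to writing the element count followed by the whole contiguous block in a single copy, instead of one call per element; this is roughly the kind of baseline the array optimizations are being compared against. The sketch assumes a hypothetical layout (a native-endian 64-bit count, then the raw doubles) and made-up function names. Raw bytes like these are only portable between machines with the same endianness and floating-point format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical layout: [uint64 element count][count * sizeof(double) raw bytes],
// both in native byte order.
std::vector<char> save_dense(const std::vector<double>& v) {
    const std::uint64_t n = v.size();
    std::vector<char> out(sizeof n + v.size() * sizeof(double));
    std::memcpy(out.data(), &n, sizeof n);
    // One bulk copy for the whole array instead of one write per element.
    if (!v.empty()) std::memcpy(out.data() + sizeof n, v.data(), v.size() * sizeof(double));
    return out;
}

std::vector<double> load_dense(const std::vector<char>& in) {
    std::uint64_t n = 0;
    if (in.size() < sizeof n) throw std::runtime_error("truncated header");
    std::memcpy(&n, in.data(), sizeof n);
    // Check the announced count against the bytes actually present
    // (dividing avoids overflow in n * sizeof(double)).
    if ((in.size() - sizeof n) / sizeof(double) < n) throw std::runtime_error("truncated data");
    std::vector<double> v(static_cast<std::size_t>(n));
    if (n != 0) std::memcpy(v.data(), in.data() + sizeof n, v.size() * sizeof(double));
    return v;
}
```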

I'm reading an XML file into a custom tree data structure, parsing the
string representations into their correct types, stored as boost::any. I'm
hoping that deserialization using boost::serialization will be considerably
faster than using libxml2, which I use to parse the XML file. The node data
in this structure pretty much looks like:
    std::vector< DataCollection > children;          // Naturally, all the children of this node
    std::string name;                                // This is the tag name in XML
    boost::any value;                                // This is <b>value</b> in XML
    std::map< std::string, boost::any > attributes;  // Not entirely surprising: the attributes of the XML node
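A minimal sketch of the node structure above in compilable C++17, with std::any standing in for boost::any, plus a hypothetical depth-first binary writer and reader for the name and children (put_u32, get_u32, save_tree and load_tree are made-up names, not Boost API). The value and attributes members are deliberately left out: a std::any by itself carries no information a reader could use to rebuild the stored object, which is the crux of the questions that follow.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// The node layout from the post, with std::any in place of boost::any.
struct DataCollection {
    std::vector<DataCollection> children;        // OK in C++17 despite the incomplete type
    std::string name;
    std::any value;                              // not serialized below
    std::map<std::string, std::any> attributes;  // not serialized below
};

// Append / read a uint32 in native byte order.
void put_u32(std::vector<char>& out, std::uint32_t n) {
    const char* p = reinterpret_cast<const char*>(&n);
    out.insert(out.end(), p, p + sizeof n);
}

std::uint32_t get_u32(const std::vector<char>& in, std::size_t& pos) {
    std::uint32_t n = 0;
    if (in.size() - pos < sizeof n) throw std::runtime_error("truncated");
    std::memcpy(&n, in.data() + pos, sizeof n);
    pos += sizeof n;
    return n;
}

// Depth-first: name length, name bytes, child count, then each child in order.
void save_tree(const DataCollection& node, std::vector<char>& out) {
    put_u32(out, static_cast<std::uint32_t>(node.name.size()));
    out.insert(out.end(), node.name.begin(), node.name.end());
    put_u32(out, static_cast<std::uint32_t>(node.children.size()));
    for (const auto& child : node.children) save_tree(child, out);
}

// Reads the node at `pos` in the same order save_tree wrote it.
DataCollection load_tree(const std::vector<char>& in, std::size_t& pos) {
    DataCollection node;
    const std::uint32_t len = get_u32(in, pos);
    if (in.size() - pos < len) throw std::runtime_error("truncated");
    node.name.assign(in.data() + pos, len);
    pos += len;
    const std::uint32_t count = get_u32(in, pos);
    for (std::uint32_t i = 0; i < count; ++i) node.children.push_back(load_tree(in, pos));
    return node;
}
```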
The values stored in boost::any will be fairly lightweight, so I would
reckon that the majority of data read will actually be std::string, both for keys
into the attributes and for the names of the nodes. So I guess I'm having
a little bit of everything, hehe.
Since I won't send this data over a network, and if I make a build for
another system I can just ship different data files, I'm more interested in
speed and the flexibility that boost::serialization offers. I'd be very
interested in your changes, Matthias.

There are not many optimizations for XML files: most of the overhead is
in parsing the strings. If you are interested in performance, a binary
archive will always be faster than an XML one. Most of the
optimizations for binary archives are already in Boost 1.35.
I have a couple of questions:
1. Why are your attributes a std::map< std::string, boost::any > and
not a std::map< std::string, std::string >? How do you find out which
type to use?
2. Why is your value a boost::any? How do you know the type to use?
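One way to answer "how do you know the type?" in a binary file is to write a small type tag in front of each value and have the reader map the tag back to a type. The sketch below is hypothetical: it uses std::any in place of boost::any, supports only int, double and std::string, and the Tag values and function names are made up. As far as I know, Boost.Serialization ships support for boost::variant (boost/serialization/variant.hpp) but not for boost::any, so with a closed set of value types a variant may be the simpler route.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <typeinfo>
#include <vector>

// Hypothetical type tags; writer and reader must agree on them.
enum class Tag : std::uint8_t { Int = 1, Double = 2, String = 3 };

template <class T>
void put_raw(std::vector<char>& out, const T& v) {
    const char* p = reinterpret_cast<const char*>(&v);
    out.insert(out.end(), p, p + sizeof v);
}

template <class T>
T get_raw(const std::vector<char>& in, std::size_t& pos) {
    if (in.size() - pos < sizeof(T)) throw std::runtime_error("truncated");
    T v;
    std::memcpy(&v, in.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Writes a tag byte, then the payload; throws for types without a tag.
void save_any(std::vector<char>& out, const std::any& a) {
    if (a.type() == typeid(int)) {
        put_raw(out, Tag::Int);
        put_raw(out, std::any_cast<int>(a));
    } else if (a.type() == typeid(double)) {
        put_raw(out, Tag::Double);
        put_raw(out, std::any_cast<double>(a));
    } else if (a.type() == typeid(std::string)) {
        const auto& s = std::any_cast<const std::string&>(a);
        put_raw(out, Tag::String);
        put_raw(out, static_cast<std::uint32_t>(s.size()));
        out.insert(out.end(), s.begin(), s.end());
    } else {
        throw std::runtime_error("no tag registered for this type");
    }
}

// Reads the tag, then rebuilds a value of the matching type.
std::any load_any(const std::vector<char>& in, std::size_t& pos) {
    switch (get_raw<Tag>(in, pos)) {
        case Tag::Int: return get_raw<int>(in, pos);
        case Tag::Double: return get_raw<double>(in, pos);
        case Tag::String: {
            const auto n = get_raw<std::uint32_t>(in, pos);
            if (in.size() - pos < n) throw std::runtime_error("truncated");
            std::string s(in.data() + pos, n);
            pos += n;
            return s;
        }
    }
    throw std::runtime_error("unknown tag");
}
```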

When I parse the XML file with libxml2, I have a list where the different
types have registered a regex filter, which is used to find the real
type. Let's say you have, for example, <elem position="3 3 3">; that will
match the vector3 filter and construct a boost::any holding that vector3. I
have a pretty neat system running here where new types just need to
register with FilterList. My DataCollection then has a
Type& GetAttribute< Type >( const std::string& ), which basically wraps the any_cast and asserts
that the typeids match. This way I get pretty decent type safety, and
since the client knows what type to expect, it works out in the end.
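A minimal sketch of the regex-filter scheme described above, assuming C++17 (std::regex, and std::any in place of boost::any). Filter, FilterList, Vector3, make_default_filters and this free-function form of GetAttribute are hypothetical reconstructions from the description, not the poster's actual code. The first registered pattern that matches the whole attribute string decides the type, and anything unmatched stays a std::string.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <typeinfo>
#include <utility>
#include <vector>

struct Vector3 { float x, y, z; };  // hypothetical stand-in for the poster's vector3

// A registered type: a pattern plus a converter from matching text.
struct Filter {
    std::regex pattern;
    std::function<std::any(const std::string&)> make;
};

class FilterList {
public:
    void add(const std::string& re, std::function<std::any(const std::string&)> make) {
        filters_.push_back({std::regex(re), std::move(make)});
    }
    // The first filter whose regex matches the whole string wins.
    std::any parse(const std::string& text) const {
        for (const auto& f : filters_)
            if (std::regex_match(text, f.pattern)) return f.make(text);
        return text;  // fall back to the raw string
    }
private:
    std::vector<Filter> filters_;
};

// Example registrations: an integer, and a "vector3" of three numbers.
FilterList make_default_filters() {
    FilterList filters;
    filters.add(R"(-?\d+)", [](const std::string& s) { return std::any(std::stoi(s)); });
    filters.add(R"(-?\d+(\.\d+)?(\s+-?\d+(\.\d+)?){2})", [](const std::string& s) {
        std::istringstream in(s);
        Vector3 v{};
        in >> v.x >> v.y >> v.z;
        return std::any(v);
    });
    return filters;
}

// Mirrors the described GetAttribute<Type>: assert the stored type, then cast.
// at() throws std::out_of_range for a missing key.
template <class Type>
Type& GetAttribute(std::map<std::string, std::any>& attributes, const std::string& key) {
    std::any& a = attributes.at(key);
    assert(a.type() == typeid(Type));
    return std::any_cast<Type&>(a);
}
```

The assert disappears in release builds (NDEBUG), but std::any_cast still throws std::bad_any_cast on a type mismatch, so a wrong guess by the client never silently reinterprets the value.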
I don't really know how boost::serialization works under the hood, but I was
expecting to get a healthy speed-up due to:
A) libxml2 needs to parse the string data, locating the start/end of XML
elements, which I presume is pretty costly since it has to search through all the
string data.
B) When I use libxml2, it first parses data into a string, which I then need
to extract and match at runtime to construct the real type.
C) I'm hoping the binary archive will take up less memory, resulting in
less I/O. For example, I strip the XML formatting.
I'm also entertaining the thought of storing much more complex objects
from my application, kind of using the binary archive as a cache. I haven't
really explored that area all that much yet, though.
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
-- 
Regards,
Timothy St. Clair
[timothysc@gmail.com]
