1) In the overview section, performance is nowhere to be seen as a goal, which for my use case is very important. If I were to use the binary archive, how well would it perform in comparison to a hand-crafted, optimized serialization approach? I've seen in the examples that strings seem to be used to identify data; won't this create a large overhead for both deserialization and storage?
Performance is a secondary goal that I have worked on, especially the serialization of large dense arrays. This is now as fast as any hand-crafted approach. What data structures are you interested in?
Matthias
I'm reading an XML file into a custom tree data structure, parsing the string representations into their correct types, which are stored as boost::any. I'm hoping that deserialization using boost::serialization will be considerably faster than using libxml2, which I use to parse the XML file. The node data in this structure pretty much looks like:
    std::vector< DataCollection > children;          // Naturally all the children of this node
    std::string name;                                // This is the tag name in xml
    boost::any value;                                // This is <b>value</b> in xml
    std::map< std::string, boost::any > attributes;  // Not entirely surprising, the attributes of the xml node
The values stored in boost::any will be fairly lightweight, so I would reckon that the majority of data read will actually be std::string, for keys into the attributes as well as the name of the node. So I guess I'm having a little bit of everything, hehe.
Since I won't send this data over the network, and if I make a build for another system I can just ship different data files, I'm more interested in speed and the flexibility which boost::serialization offers. I'd be very interested in your changes, Matthias.
There are not many optimizations for XML files: most of the overhead is in parsing the strings. If you are interested in performance, a binary archive will always be faster than an XML one. Most of the optimizations for binary archives are already in Boost 1.35.
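To illustrate why a binary layout avoids the string-parsing cost, here is a minimal sketch of length-prefixed string storage. This is not the on-disk format of the Boost binary archive; WriteString and ReadString are hypothetical helper names, and the sketch assumes native byte order, which is only acceptable when the data is not moved between systems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: append a string as a 32-bit length followed by its raw bytes.
// Native byte order is assumed, so the buffer is not portable across architectures.
void WriteString(std::vector<char>& out, const std::string& s) {
    std::uint32_t n = static_cast<std::uint32_t>(s.size());
    const char* p = reinterpret_cast<const char*>(&n);
    out.insert(out.end(), p, p + sizeof n);
    out.insert(out.end(), s.begin(), s.end());
}

// Hypothetical helper: read a string written by WriteString and advance pos past it.
// The reader knows the length up front, so it never scans for delimiters.
std::string ReadString(const std::vector<char>& in, std::size_t& pos) {
    std::uint32_t n = 0;
    assert(pos + sizeof n <= in.size());
    std::memcpy(&n, in.data() + pos, sizeof n);
    pos += sizeof n;
    assert(pos + n <= in.size());
    std::string s(in.data() + pos, n);
    pos += n;
    return s;
}
```

A reader of this layout copies exactly the bytes it needs, whereas an XML parser has to inspect every character to find where an element or attribute ends.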
I have a couple of questions:
1. Why are your attributes a std::map< std::string, boost::any > and not a std::map< std::string, std::string >? How do you find out which type to use?
2. Why is your value a boost::any? How do you know the type to use?
When I parse the XML file with libxml2, I have a list where the different types have registered a regex filter, which is used to find the real type. Let's say you have for example <elem position="3 3 3">; that will match the vector3 filter and construct a boost::any holding that vector3. I have a pretty neat system running here where new types just need to register at FilterList. My DataCollection then has a Type& GetAttribute< Type >( const std::string& ), which basically wraps the any_cast and asserts that the typeids match. This way I get pretty decent type safety, and since the client knows what type to expect, it works out in the end.
I don't really know how boost::serialization works under the hood, but I was expecting to get a healthy speed-up due to:
A) libxml2 needs to parse the string data, locating the start/end of xml elements, which I'm presuming is pretty costly since it searches through all the string data.
B) When I use libxml2, it first parses data into a string, which I then need to extract and match at runtime to construct the real type.
C) I'm hoping the binary archive will take up less space, resulting in less I/O. I strip the xml formatting, for example.
I'm also entertaining the thought of having much more complex objects stored from my application, kind of using the binary archive as a cache. I haven't really explored that area all that much yet, though.
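For readers following along, the filter registration and typed lookup described above could be sketched roughly as follows. This is a guess at the shape of the system, not the poster's actual code: std::any stands in for boost::any, and Vector3, Filter, Register, Parse, SetAttribute and MakeDefaultFilters are hypothetical names. Only FilterList, DataCollection and GetAttribute come from the post.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <map>
#include <regex>
#include <string>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical 3-component vector type, for illustration only.
struct Vector3 { float x, y, z; };

// One registered filter: a pattern plus a converter for strings that match it.
struct Filter {
    std::regex pattern;
    std::function<std::any(const std::smatch&)> convert;
};

class FilterList {
public:
    void Register(const std::string& pattern,
                  std::function<std::any(const std::smatch&)> convert) {
        filters_.push_back({std::regex(pattern), std::move(convert)});
    }

    // Returns the value built by the first matching filter,
    // or the raw string if no filter matches.
    std::any Parse(const std::string& text) const {
        std::smatch m;
        for (const Filter& f : filters_) {
            if (std::regex_match(text, m, f.pattern)) return f.convert(m);
        }
        return std::any(text);
    }

private:
    std::vector<Filter> filters_;
};

class DataCollection {
public:
    void SetAttribute(const std::string& key, std::any value) {
        attributes_[key] = std::move(value);
    }

    // Wraps any_cast and asserts that the stored type matches the requested one.
    template <typename Type>
    Type& GetAttribute(const std::string& key) {
        std::any& a = attributes_.at(key);
        assert(a.type() == typeid(Type));
        return std::any_cast<Type&>(a);
    }

private:
    std::map<std::string, std::any> attributes_;
};

// Hypothetical setup: a vector3 filter and an int filter, tried in that order.
FilterList MakeDefaultFilters() {
    FilterList filters;
    const std::string num = R"((-?\d+(?:\.\d+)?))";
    filters.Register(num + R"(\s+)" + num + R"(\s+)" + num,
                     [](const std::smatch& m) {
                         return std::any(Vector3{std::stof(m[1].str()),
                                                 std::stof(m[2].str()),
                                                 std::stof(m[3].str())});
                     });
    filters.Register(R"(-?\d+)", [](const std::smatch& m) {
        return std::any(std::stoi(m[0].str()));
    });
    return filters;
}
```

With this setup, filters.Parse("3 3 3") yields a Vector3, and a node that stores it under "position" can hand it back through GetAttribute< Vector3 >( "position" ). The regex matching is the runtime cost mentioned in B); a binary archive that stores the already-typed value would skip it on load.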