[Serialization] A few questions about boost::serialization
Hi,
I'm considering using boost::serialization but have a few questions which I would love to get answered.
1) In the overview section, performance is nowhere to be seen as a goal, which for my use case is very important. If I were to use the binary archive, how well would it perform in comparison to a hand-crafted, optimized serialization approach? I've seen in the examples that strings seem to be used to identify data; won't this create a large overhead for both deserialization and storage?
2) I saw the thread about the reason for boost::any not being serializable. But I don't really care about portability, so in that case, is it possible to implement a non-intrusive serialization approach for boost::any? I would really appreciate a little snippet for that if that is the case. If that's not possible, my last option would naturally be to make an intrusive implementation. I see no obvious approach here since I can't make a virtual template function, so I'm kind of hoping someone could push me in the right direction if it comes to this.
Besides this, I can only say that the library has a very slick interface and I really dig the design :)
// Sebastian Karlsson
Comments Below.
On Mon, Aug 4, 2008 at 3:48 AM,
Hi,
I'm considering using boost::serialization but have a few questions which I would love to get answered.
1) In the overview section, performance is nowhere to be seen as a goal, which for my use case is very important. If I were to use the binary archive, how well would it perform in comparison to a hand-crafted, optimized serialization approach?
When I compared it against a brute-force, naive "<< / >> stream overload" method of serialization, the naive method was about 2-2.5 times faster than boost::serialization. I firmly believe that I lose performance to gain flexibility, and depending on your requirements this may or may not be acceptable.
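For readers unfamiliar with the "stream overload" style mentioned above, here is a minimal sketch of what such hand-written serialization can look like. The `Record` type, its fields, and the `round_trip` helper are hypothetical and exist only for illustration; the thread does not show the code that was actually benchmarked.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record type, used only for illustration.
struct Record {
    int id = 0;
    double weight = 0.0;
    std::string label;  // assumed to contain no whitespace in this sketch
};

// Write the fields as whitespace-separated text.
std::ostream& operator<<(std::ostream& os, const Record& r) {
    return os << r.id << ' ' << r.weight << ' ' << r.label;
}

// Read the fields back in the same order they were written.
std::istream& operator>>(std::istream& is, Record& r) {
    return is >> r.id >> r.weight >> r.label;
}

// Round-trip helper: serialize to an in-memory stream and parse it back.
Record round_trip(const Record& in) {
    std::stringstream buffer;
    buffer << in;
    Record out;
    buffer >> out;
    return out;
}
```

This style has no versioning, no pointer tracking, and no polymorphism support, which is the flexibility being traded away for speed.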
I've seen in the examples that strings seem to be used to identify data; won't this create a large overhead for both deserialization and storage?
This depends on your implementation. It's your choice on how you wish to "chunk" your data.
2) I saw the thread about the reason for boost::any not being serializable. But I don't really care about portability, so in that case, is it possible to implement a non-intrusive serialization approach for boost::any? I would really appreciate a little snippet for that if that is the case. If that's not possible, my last option would naturally be to make an intrusive implementation. I see no obvious approach here since I can't make a virtual template function, so I'm kind of hoping someone could push me in the right direction if it comes to this.
Someone else would have to chime in here, but you may want to take a look at tuple serialization examples.
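One common way to approach the boost::any question, sketched here under several assumptions, is to write a small type tag in front of each value and look up a registered save/load pair by that tag. This sketch uses `std::any` (C++17) as a stand-in for boost::any and plain iostreams rather than the Boost.Serialization archive interface; `AnyRegistry`, `AnyCodec`, and `register_pod` are hypothetical names, and the raw byte copy is deliberately non-portable, as the question allows.

```cpp
#include <any>
#include <cstdint>
#include <functional>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <typeindex>
#include <typeinfo>

// Save/load functions for one registered type (hypothetical helper).
struct AnyCodec {
    std::function<void(std::ostream&, const std::any&)> save;
    std::function<std::any(std::istream&)> load;
};

// Hypothetical registry mapping a one-byte tag to the codec for a type.
class AnyRegistry {
public:
    // Register a trivially copyable type under a caller-chosen tag.
    template <class T>
    void register_pod(std::uint8_t tag) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "raw byte copy needs a trivially copyable type");
        tags_[std::type_index(typeid(T))] = tag;
        codecs_[tag] = AnyCodec{
            [](std::ostream& os, const std::any& a) {
                const T& v = std::any_cast<const T&>(a);
                os.write(reinterpret_cast<const char*>(&v), sizeof v);
            },
            [](std::istream& is) {
                T v{};
                is.read(reinterpret_cast<char*>(&v), sizeof v);
                return std::any(v);
            }};
    }

    // Write the tag byte, then the value bytes.
    void save(std::ostream& os, const std::any& a) const {
        auto it = tags_.find(std::type_index(a.type()));
        if (it == tags_.end()) throw std::runtime_error("unregistered type");
        os.put(static_cast<char>(it->second));
        codecs_.at(it->second).save(os, a);
    }

    // Read the tag byte and dispatch to the matching loader.
    std::any load(std::istream& is) const {
        const int tag = is.get();
        if (tag == std::char_traits<char>::eof())
            throw std::runtime_error("unexpected end of stream");
        return codecs_.at(static_cast<std::uint8_t>(tag)).load(is);
    }

private:
    std::map<std::type_index, std::uint8_t> tags_;
    std::map<std::uint8_t, AnyCodec> codecs_;
};
```

The registry sidesteps the "virtual template function" problem by resolving the type at runtime through the tag; every type that may appear inside the any has to be registered before saving or loading.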
Besides this, I can only say that the library has a very slick interface and I really dig the design :)
// Sebastian Karlsson _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
-- Regards, Timothy St. Clair [timothysc@gmail.com]
On Aug 4, 2008, at 10:48 AM, Sebastian.Karlsson@mmorpgs.org wrote:
Hi,
I'm considering using boost::serialization but have a few questions which I would love to get answered.
1) In the overview section, performance is nowhere to be seen as a goal, which for my use case is very important. If I were to use the binary archive, how well would it perform in comparison to a hand-crafted, optimized serialization approach? I've seen in the examples that strings seem to be used to identify data; won't this create a large overhead for both deserialization and storage?
Performance is a secondary goal that I have worked on, especially the serialization of large dense arrays. This is now as fast as any hand-crafted approach. What data structures are you interested in?
Matthias
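The thread does not show how the library handles dense arrays internally, but the general idea behind fast array serialization can be sketched as follows: write the element count, then copy the whole contiguous buffer with one call instead of looping over elements. The function names `save_dense` and `load_dense` are hypothetical, and the format assumes the reader and writer share the same platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Write the element count, then the whole array with a single write() call.
void save_dense(std::ostream& os, const std::vector<double>& v) {
    const std::uint64_t n = v.size();
    os.write(reinterpret_cast<const char*>(&n), sizeof n);
    os.write(reinterpret_cast<const char*>(v.data()),
             static_cast<std::streamsize>(n * sizeof(double)));
}

// Read the count, size the vector once, then fill it with a single read() call.
std::vector<double> load_dense(std::istream& is) {
    std::uint64_t n = 0;
    is.read(reinterpret_cast<char*>(&n), sizeof n);
    std::vector<double> v(static_cast<std::size_t>(n));
    is.read(reinterpret_cast<char*>(v.data()),
            static_cast<std::streamsize>(n * sizeof(double)));
    return v;
}
```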
I'm reading an XML file into a custom tree data structure, parsing the string representations into their correct types, stored as boost::any. I'm hoping that deserialization using boost::serialization will be considerably faster than using libxml2, which I use to parse the XML file. The node data in this structure pretty much looks like:
vector< DataCollection > children; // Naturally, all the children of this node
std::string name; // This is the tag name in XML
boost::any value; // This is <b>value</b> in XML
std::map< std::string, boost::any > attributes; // Not entirely surprisingly, the attributes of the XML node
The values stored in boost::any will be fairly lightweight, so I would reckon that the majority of the data read will actually be std::string, for keys into the attributes as well as the name of the node. So I guess I'm having a little bit of everything, hehe.
Since I won't send this data over the network, and if I make a build for another system I can just ship different data files, I'm more interested in speed and the flexibility which boost::serialization offers. I'd be very interested in your changes, Matthias.
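To illustrate why a binary format can avoid most of the string searching an XML parser does, here is a hand-rolled sketch of a recursive tree written with length-prefixed strings. It is not Boost.Serialization code: `Node` is a reduced, hypothetical version of the structure above (only `name` and `children`), and `write_u32`, `read_u32`, `save`, and `load` are names invented for this example.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Reduced, hypothetical version of the tree node: name and children only.
struct Node {
    std::string name;
    std::vector<Node> children;
};

void write_u32(std::ostream& os, std::uint32_t v) {
    os.write(reinterpret_cast<const char*>(&v), sizeof v);
}

std::uint32_t read_u32(std::istream& is) {
    std::uint32_t v = 0;
    is.read(reinterpret_cast<char*>(&v), sizeof v);
    return v;
}

// Each string is stored as a length followed by its bytes, so the reader
// never has to scan for delimiters.
void save(std::ostream& os, const Node& n) {
    write_u32(os, static_cast<std::uint32_t>(n.name.size()));
    os.write(n.name.data(), static_cast<std::streamsize>(n.name.size()));
    write_u32(os, static_cast<std::uint32_t>(n.children.size()));
    for (const Node& child : n.children) save(os, child);
}

// Mirror of save(): read the name, then recurse for each child.
Node load(std::istream& is) {
    Node n;
    n.name.resize(read_u32(is));
    is.read(&n.name[0], static_cast<std::streamsize>(n.name.size()));
    const std::uint32_t count = read_u32(is);
    n.children.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) n.children.push_back(load(is));
    return n;
}
```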
On Aug 4, 2008, at 4:20 PM, Sebastian.Karlsson@mmorpgs.org wrote:
I'm reading an XML file into a custom tree data structure, parsing the string representations into their correct types, stored as boost::any. I'm hoping that deserialization using boost::serialization will be considerably faster than using libxml2, which I use to parse the XML file. The node data in this structure pretty much looks like:
vector< DataCollection > children; // Naturally, all the children of this node
std::string name; // This is the tag name in XML
boost::any value; // This is <b>value</b> in XML
std::map< std::string, boost::any > attributes; // Not entirely surprisingly, the attributes of the XML node
The values stored in boost::any will be fairly lightweight, so I would reckon that the majority of the data read will actually be std::string, for keys into the attributes as well as the name of the node. So I guess I'm having a little bit of everything, hehe.
Since I won't send this data over the network, and if I make a build for another system I can just ship different data files, I'm more interested in speed and the flexibility which boost::serialization offers. I'd be very interested in your changes, Matthias.
There are not many optimizations for XML files: most of the overhead is in parsing the strings. If you are interested in performance, a binary archive will always be faster than an XML one. Most of the optimizations for binary archives are already in Boost 1.35.
I have a couple of questions:
1. Why are your attributes a std::map< std::string, boost::any > and not a std::map< std::string, std::string >? How do you find out which type to use?
2. Why is your value a boost::any? How do you know the type to use?
Matthias
When I parse the XML file with libxml2, I have a list where the different types have registered a regex filter, which is used to find the real type. Let's say you have, for example, <elem position="3 3 3">; that will match the vector3 filter and construct a boost::any holding that vector3. I have a pretty neat system running here where new types just need to register with FilterList. My DataCollection then has a Type& GetAttribute< Type >( const std::string& ), which basically wraps the any_cast and asserts that the typeids match. This way I get pretty decent type safety, and since the client knows what type to expect, it works out in the end.
I don't really know how boost::serialization works under the hood, but I was expecting to get a healthy speedup due to:
A) libxml2 needs to parse the string data, locating the start/end of XML elements, which I presume is pretty costly since it searches through all the string data.
B) When I use libxml2, it first parses data into a string, which I then need to extract and match at runtime to construct the real type.
C) I'm hoping the binary archive will take up less memory, resulting in less I/O. I strip the XML formatting, for example.
I'm also entertaining the thought of storing much more complex objects from my application, kind of using the binary archive as a cache. I haven't really explored that area all that much yet, though.
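The filter-and-typed-getter scheme described above could look roughly like the following sketch. It is a guess at the shape of the design, not the poster's code: `std::any` (C++17) stands in for boost::any, and `Vector3`, the members of `FilterList` and `DataCollection`, `SetAttribute`, and `make_default_filters` are hypothetical. Only the names FilterList, DataCollection, and GetAttribute come from the message.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <map>
#include <regex>
#include <string>
#include <typeinfo>
#include <utility>
#include <vector>

struct Vector3 { double x, y, z; };  // hypothetical value type

// Each registered filter pairs a regex with a factory that builds the typed value.
class FilterList {
public:
    using Factory = std::function<std::any(const std::smatch&)>;

    void add(const std::string& pattern, Factory make) {
        filters_.emplace_back(std::regex(pattern), std::move(make));
    }

    // The first filter matching the whole text wins; otherwise keep a std::string.
    std::any parse(const std::string& text) const {
        std::smatch m;
        for (const auto& f : filters_) {
            if (std::regex_match(text, m, f.first)) return f.second(m);
        }
        return std::any(text);
    }

private:
    std::vector<std::pair<std::regex, Factory>> filters_;
};

class DataCollection {
public:
    void SetAttribute(const std::string& key, std::any value) {
        attributes_[key] = std::move(value);
    }

    // Wraps any_cast and asserts that the stored type matches the requested one.
    template <class Type>
    Type& GetAttribute(const std::string& key) {
        std::any& slot = attributes_.at(key);
        assert(slot.type() == typeid(Type));
        return std::any_cast<Type&>(slot);
    }

private:
    std::map<std::string, std::any> attributes_;
};

// Registers one filter: three space-separated numbers become a Vector3.
FilterList make_default_filters() {
    FilterList filters;
    const std::string num = R"(([-+]?\d+(?:\.\d+)?))";
    filters.add(num + " " + num + " " + num, [](const std::smatch& m) {
        return std::any(Vector3{std::stod(m[1].str()), std::stod(m[2].str()),
                                std::stod(m[3].str())});
    });
    return filters;
}
```

With such a design, the regex matching happens once at XML import time; a binary cache would store the already-typed values and skip this step on later loads.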
Just an FYI, boost::serialization is not a SAX parser like libxml2, so it will be orders of magnitude faster. However, you will lose some flexibility if, for some reason, there is an error in your output stream due to human interaction. A SAX parser could move on and ignore the error; it's kind of all or nothing with boost::serialization. Unless Matthias or Robert know something that I don't, as they are the authors, while I'm just an avid user.
Cheers,
Tim
On Mon, Aug 4, 2008 at 10:31 AM,
When I parse the XML file with libxml2, I have a list where the different types have registered a regex filter, which is used to find the real type. Let's say you have, for example, <elem position="3 3 3">; that will match the vector3 filter and construct a boost::any holding that vector3. I have a pretty neat system running here where new types just need to register with FilterList. My DataCollection then has a Type& GetAttribute< Type >( const std::string& ), which basically wraps the any_cast and asserts that the typeids match. This way I get pretty decent type safety, and since the client knows what type to expect, it works out in the end.
I don't really know how boost::serialization works under the hood, but I was expecting to get a healthy speedup due to: A) libxml2 needs to parse the string data, locating the start/end of XML elements, which I presume is pretty costly since it searches through all the string data. B) When I use libxml2, it first parses data into a string, which I then need to extract and match at runtime to construct the real type. C) I'm hoping the binary archive will take up less memory, resulting in less I/O. I strip the XML formatting, for example.
I'm also entertaining the thought of storing much more complex objects from my application, kind of using the binary archive as a cache. I haven't really explored that area all that much yet, though.
-- Regards, Timothy St. Clair [timothysc@gmail.com]
On Aug 4, 2008, at 5:42 PM, Tim St. Clair wrote:
Just an FYI, boost::serialization is not a SAX parser like libxml2, so it will be orders of magnitude faster. However, you will lose some flexibility if, for some reason, there is an error in your output stream due to human interaction. A SAX parser could move on and ignore the error; it's kind of all or nothing with boost::serialization.
That is also my understanding.
Unless Matthias or Robert know something that I don't, as they are the authors, while I'm just an avid user.
Actually, this is Robert's library. I am just responsible for some performance improvements. Matthias
participants (3)
- Matthias Troyer
- Sebastian.Karlsson@mmorpgs.org
- Tim St. Clair