Updated performance results using Boost Serialization 1.41

I've recently updated the performance section of a comparison between the Boost Serialization library and the C++ Middleware Writer -- http://webEbenezer.net/comparison.html#perf. The new tests were done on Fedora 12 and Windows Vista. The previous version of that file is here: http://webEbenezer.net/comp138.html#perf. The most dramatic change occurred on Windows. Previously the Boost versions were around 2.7 times slower than the Ebenezer versions; now they are between 3.8 and 4.0 times slower. I believe that difference is due to our dropping return codes in favor of exceptions. I'm not sure why it shows up more on Windows than on Linux.

Cheers,
Brian Wood
Ebenezer Enterprises
http://www.webEbenezer.net

"When a man's ways please the L-RD, he makes even his enemies to be at peace with him." Proverbs 16:7

Our measurements show Boost.Serialization spends most of its time creating (constructing) archives, and there it's mainly initializing the locale... Not sure about the exceptions; I have no related data.

Regards
Hartmut

-------------------
Meet me at BoostCon
http://boostcon.com

On Dec 4, 2009, at 7:11 PM, Hartmut Kaiser wrote:
Our measurements show Boost.Serialization spends most of its time creating (constructing) archives and there it's mainly initializing the locale...
The locale initialization can be avoided if your application can get away with using the boost::archive::no_codecvt construction flag. (Assuming that's still behaving the same as when I looked into this performance area back around boost 1.33.) These threads may also be of interest:

http://lists.boost.org/Archives/boost/2005/07/90814.php
http://lists.boost.org/Archives/boost/2006/03/102133.php

I keep promising myself that I will soon collect sufficient round tuits to do something about this, but so far those promises remain broken. Maybe once I upgrade my team to boost 1.41.
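Something along those lines would look roughly like this (an untested sketch; it assumes a text archive written to an in-memory stream, with a plain int standing in for real data):

    #include <sstream>
    #include <string>
    #include <boost/archive/text_oarchive.hpp>

    std::string write_payload(int value)
    {
        std::ostringstream os;
        // no_codecvt skips the locale/codecvt setup normally done
        // in the archive constructor.
        boost::archive::text_oarchive oa(os, boost::archive::no_codecvt);
        oa << value;
        return os.str();
    }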

Kim Barrett wrote:
On Dec 4, 2009, at 7:11 PM, Hartmut Kaiser wrote:
Our measurements show Boost.Serialization spends most of its time creating (constructing) archives and there it's mainly initializing the locale...
The locale initialization can be avoided if your application can get away with using the boost::archive::no_codecvt construction flag. (Assuming that's still behaving the same as when I looked into this performance area back around boost 1.33.)
These threads may also be of interest:
http://lists.boost.org/Archives/boost/2005/07/90814.php
http://lists.boost.org/Archives/boost/2006/03/102133.php
I keep promising myself that I will soon collect sufficient round tuits to do something about this, but so far those promises remain broken. Maybe once I upgrade my team to boost 1.41.
I'm aware of this suggestion and its motivation. I'm not convinced that this is the best way to address it. In your particular application, I don't think it should be necessary to "reset" the archive as long as you're not serializing any pointers. If one is not serializing any pointers, then no tracking is done, so a "reset" operation should be superfluous. The best way to handle this is still an open question to me.

Robert Ramey
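(An aside, not from the original messages: by default the library tracks a type only if it is serialized through a pointer somewhere in the program, and tracking can also be switched off per type. A minimal sketch of the latter, assuming a simple serializable struct:)

    #include <iostream>
    #include <boost/archive/text_oarchive.hpp>
    #include <boost/serialization/tracking.hpp>

    struct sample
    {
        int x;
        template<class Archive>
        void serialize(Archive &ar, const unsigned int /*version*/) { ar & x; }
    };

    // Never track sample objects, even if they are also serialized
    // through pointers elsewhere.
    BOOST_CLASS_TRACKING(sample, boost::serialization::track_never)

    void write(std::ostream &os, const sample &s)
    {
        boost::archive::text_oarchive oa(os);
        oa << s;   // serialized by value; no object-tracking bookkeeping is kept
    }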

On Dec 7, 2009, at 12:29 PM, Robert Ramey wrote:
I'm aware of this suggestion and its motivation. I'm not convinced that this is the best way to address it.
I'd be happy to discuss alternatives.
In your particular application, I don't think it should be necessary to "reset" the archive as long as you're not serializing any pointers. If one is not serializing any pointers, then no tracking is done, so a "reset" operation should be superfluous.
I hadn't realized that archive reuse might already be an option when there's been no pointer tracking. (Of course, that probably also requires that one is using the boost::archive::no_header option, but that is true for my use-cases where this performance issue is of concern.)

That restriction of no pointer serialization is not met by my application. Serialization of pointers isn't common, but some of the data types being serialized are complex and do include pointers, and in some cases may be polymorphic and serialized via base pointers. It might be possible to tag the complex types in some fashion though, and do archive reconstruction when dealing with those. Or it might be possible to detect that the complex case has been encountered (by peeking inside the archive) and force a reconstruction on the next attempt to reuse. I will add this to my notes about this issue, to think about some more when I finally find some time for it.

Kim Barrett wrote:
On Dec 7, 2009, at 12:29 PM, Robert Ramey wrote:
I'm aware of this suggestion and its motivation. I'm not convinced that this is the best way to address it.
I'd be happy to discuss alternatives.
In your particular application, I don't think it should be necessary to "reset" the archive as long as you're not serializing any pointers. If one is not serializing any pointers, then no tracking is done, so a "reset" operation should be superfluous.
I hadn't realized that archive reuse might already be an option when there's been no pointer tracking. (Of course, that probably also requires that one is using the boost::archive::no_header option, but that is true for my use-cases where this performance issue is of concern.)
That restriction of no pointer serialization is not met by my application. Serialization of pointers isn't common, but some of the data types being serialized are complex and do include pointers, and in some cases may be polymorphic and serialized via base pointers. It might be possible to tag the complex types in some fashion though, and do archive reconstruction when dealing with those. Or it might be possible to detect that the complex case has been encountered (by peeking inside the archive) and force a reconstruction on the next attempt to reuse. I will add this to my notes about this issue, to think about some more when I finally find some time for it.
The "real" solution which I envision is just to suppress tracking in an archive. I've considered different syntaxes for doing this and the best way to implement this. serialization of rvalues also touches on this subject so the best way to do this isn't as obvious as it might seem to a casual user. If such facility were implemented, one would proceed something like this. create an output stream. This would not be a file stream but rather be plugged into a communication channel. If asio doesn't have something like already, I'm sure a "channel_buffer" would easily be crafted. Open this stream and connect to the other application. Then open the archive using this ostream as an argument. On the other side of the channel open an archive using an istream (with this channel_buffer type). Then your in business !!. The sending side just uses the << operator to send any data it want's while the other side just uses the >> operator to reconstruct any data sent. Easy as pie ! So, to my way of thinking, that's the real solution. Robert Ramey

On Dec 7, 2009, at 7:08 PM, Robert Ramey wrote:
The "real" solution which I envision is just to suppress tracking in an archive. [...]
If such a facility were implemented, one would proceed something like this.
Create an output stream. This would not be a file stream but rather would be plugged into a communication channel. If asio doesn't have something like this already, I'm sure a "channel_buffer" could easily be crafted. Open this stream and connect to the other application.
Then open the archive using this ostream as an argument. On the other side of the channel, open an archive using an istream (with this channel_buffer type).
Then you're in business! The sending side just uses the << operator to send any data it wants, while the other side just uses the >> operator to reconstruct any data sent. Easy as pie!
So, to my way of thinking, that's the real solution.
While that is an interesting solution to some problems, I don't think it actually helps with my use-cases (multiple). As mentioned earlier, turning off pointer tracking is not an option for me in some cases. I can envision various ways to deal with that. If one had a means for determining whether serializing a data structure used or would use tracking, one could use this approach now, with no [further] changes to the serialization library.

However, none of my existing use-cases are stream oriented. They are instead all transaction / packet oriented. This means there needs to be a clear boundary: all output generated by the archive has been flushed to the buffer collecting the data, any "end of archive" data has been written, and any "beginning of archive" data needs to be written again for the next transaction / packet. Right now I'm accomplishing that by deleting the old archive and creating a new one. I'm looking for either a lighter weight alternative to that (preferably), or a way to make that delete / recreate lighter weight.
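Roughly, the delete / recreate approach per packet looks something like this (a simplified sketch; it assumes each packet is a self-contained, headerless text archive written into its own string buffer):

    #include <sstream>
    #include <string>
    #include <boost/archive/text_oarchive.hpp>

    // Build one self-contained packet per call; the archive (and any
    // tracking state) lives only for the duration of this function.
    template <class T>
    std::string make_packet(const T &payload)
    {
        std::ostringstream buffer;
        {
            boost::archive::text_oarchive oa(buffer, boost::archive::no_header);
            oa << payload;
        }   // archive destroyed here; the next packet starts from a clean state
        return buffer.str();
    }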

Kim Barrett wrote:
While that is an interesting solution to some problems, I don't think it actually helps with my use-cases (multiple).
As mentioned earlier, turning off pointer tracking is not an option for me in some cases. I can envision various ways to deal with that. If one had a means for determining whether serializing a data structure used or would use tracking, one could use this approach now, with no [further] changes to the serialization library.
However, none of my existing use-cases are stream oriented. They are instead all transaction / packet oriented. This means there needs to be a clear boundary: all output generated by the archive has been flushed to the buffer collecting the data, any "end of archive" data has been written, and any "beginning of archive" data needs to be written again for the next transaction / packet. Right now I'm accomplishing that by deleting the old archive and creating a new one. I'm looking for either a lighter weight alternative to that (preferably), or a way to make that delete / recreate lighter weight.
How about flush()? I implemented such a thing back in 2005:

http://lists.boost.org/Archives/boost/2005/03/81544.php

I would still find it useful. I have a lot of code that relies on anonymous scopes to control construction/destruction of archives; the code would look a lot more straightforward if I could just flush(). The use case can be modeled the same way as sending packets over the network: each packet needs to have internal pointer tracking, but independent of the others.

-t

On Dec 9, 2009, at 3:25 PM, troy d. straszheim wrote:
Kim Barrett wrote:
However, none of my existing use-cases are stream oriented. They are instead all transaction / packet oriented. This means there needs to be a clear boundary: all output generated by the archive has been flushed to the buffer collecting the data, any "end of archive" data has been written, and any "beginning of archive" data needs to be written again for the next transaction / packet. Right now I'm accomplishing that by deleting the old archive and creating a new one. I'm looking for either a lighter weight alternative to that (preferably), or a way to make that delete / recreate lighter weight.
How about flush()? I implemented such a thing back in 2005:
http://lists.boost.org/Archives/boost/2005/03/81544.php
I would still find it useful. I have a lot of code that relies on anonymous scopes to control construction/destruction of archives; the code would look a lot more straightforward if I could just flush(). The use case can be modeled the same way as sending packets over the network: each packet needs to have internal pointer tracking, but independent of the others.
This is similar to the "reset" operation that I suggested, also back in 2005. Your flush() operation keeps the class information and only discards object identity information. That doesn't work if some of the receivers might not get every bit of data, as can happen when using an unreliable network protocol such as UDP. That is one of the use-cases of interest to me. There are others that are similarly unreliable, in the sense that receivers will always get complete "packets" of data, but might not get all of them.

Kim Barrett wrote:
On Dec 7, 2009, at 7:08 PM, Robert Ramey wrote: ...
So, to my way of thinking, that's the real solution.
While that is an interesting solution to some problems, I don't think it actually helps with my use-cases (multiple).
As mentioned earlier, turning off pointer tracking is not an option for me in some cases. I can envision various ways to deal with that. If one had a means for determining whether serializing a data structure used or would use tracking, one could use this approach now, with no [further] changes to the serialization library.
However, none of my existing use-cases are stream oriented. They are instead all transaction / packet oriented. This means there needs to be a clear boundary: all output generated by the archive has been flushed to the buffer collecting the data, any "end of archive" data has been written, and any "beginning of archive" data needs to be written again for the next transaction / packet. Right now I'm accomplishing that by deleting the old archive and creating a new one. I'm looking for either a lighter weight alternative to that (preferably), or a way to make that delete / recreate lighter weight.
Hmmm - faced with this,

    struct transaction {
        transaction(data1 &d1, data2 &d2, ....);
        // or transaction(data1 d1, data2 d2, ...)

        template<class Archive>
        void serialize(Archive &ar, const unsigned int version){
            ar & d1;
            ar & d2;
            ...
        }
    };

I would think the most natural solution would be, on the sending side:

    main(){
        tcpostream tos;
        text_oarchive oa(tos);
        loop(){
            transaction t(d1, d2, ..);
            oa << t;
        }
    }

And on the receiving side:

    main(){
        tcpistream tis;
        text_iarchive ia(tis);
        while(tis.open()){
            transaction t(d1, d2, ..);
            ia >> t;
        }
    }

The only fly in this ointment is that tracking will imply that only the first transaction's data is actually sent. So to implement this, only tracking has to be suppressed. This is related to the issue of serialization of rvalues, which conflicts with the idea of tracking in a fundamental way. If this were addressed, one could just use

    oa << transaction(d1, d2, ..);

This would be pretty efficient in that all the class information would be sent only once. Note that MPI also has a special archive for this sort of stuff; you might want to check that out.

Robert Ramey

On Dec 9, 2009, at 3:51 PM, Robert Ramey wrote:
This would be pretty efficient in that all the class information would be sent only once. Note that MPI also has a special archive for this sort of stuff; you might want to check that out.
I didn't know about that special MPI archive. I think I will need to go look at it. Not a complete solution for my uses, but definitely sounds like a useful piece.
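(For reference, the special archive being referred to is presumably Boost.MPI's packed archive support and its skeleton/content mechanism, which transmits a structure's layout once and only the raw data on subsequent sends. A rough sketch based on the documented interface, using a std::vector<double> as the payload:)

    #include <vector>
    #include <boost/mpi.hpp>
    #include <boost/serialization/vector.hpp>

    // Sender: transmit the structure (size, layout) once, then only the
    // contents on each subsequent send.
    void send_many(boost::mpi::communicator &world, std::vector<double> &values)
    {
        world.send(1, 0, boost::mpi::skeleton(values));         // structure, sent once
        for (int i = 0; i < 100; ++i) {
            // ... update values in place ...
            world.send(1, 1, boost::mpi::get_content(values));  // data only
        }
    }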
participants (5):
- Brian Wood
- Hartmut Kaiser
- Kim Barrett
- Robert Ramey
- troy d. straszheim