
David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
Furthermore, it's not a fair comparison unless you first measure the number of bytes you have to save so you can preallocate the buffer. In general the only way to do that is with a special counting archive, so you have to account for the time taken up by the counting. Of course we did that test too. The code and test results are attached.
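(For concreteness: the counting pass described above amounts to something like the following sketch. The counting_sink and buffer_sink types are made-up stand-ins for illustration only, not actual archive classes from the library.)

    #include <cstddef>
    #include <vector>

    // Hypothetical first pass: an "archive" that only tallies the bytes
    // that would be written.
    struct counting_sink {
        std::size_t size = 0;
        void save_binary(const void* /*data*/, std::size_t n) { size += n; }
    };

    // Hypothetical second pass: write into a buffer preallocated to the
    // size found by the counting pass.
    struct buffer_sink {
        std::vector<char> buffer;
        explicit buffer_sink(std::size_t n) { buffer.reserve(n); }
        void save_binary(const void* data, std::size_t n) {
            const char* p = static_cast<const char*>(data);
            buffer.insert(buffer.end(), p, p + n);
        }
    };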
Without seeing the implementation of binary_oprimitive you plan to use, I can only speculate about what would be the closest test. Assuming that performance is an issue, I wouldn't expect you to use the current binary_oarchive, which is based on stream i/o. So if performance is an important factor, it shouldn't be used for benchmarking. I presume that is why Matthias chose not to use it. On the other hand, it's not clear why one would choose to use a buffer based on std::vector<char> for this purpose either. I chose an implementation which I thought would be closest to the one that would actually end up being used for a network protocol. The question is what the time difference is between one invocation of save_binary with data N bytes long and N invocations of save_binary 1 byte long. That is really all that is being measured here. So using an implementation of save_binary based on stream write isn't very interesting unless one is actually going to use that implementation. Of course I don't really know whether you are going to do that - I just presumed you weren't.
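For concreteness, the measurement described reduces roughly to the following sketch, with a hypothetical std::vector<char>-backed sink standing in for whatever binary_oprimitive implementation would actually be used:

    #include <cstddef>
    #include <vector>

    // Hypothetical vector<char>-backed sink; not an actual archive class.
    struct vector_sink {
        std::vector<char> buffer;
        void save_binary(const void* data, std::size_t n) {
            const char* p = static_cast<const char*>(data);
            buffer.insert(buffer.end(), p, p + n);
        }
    };

    // Case 1: one invocation of save_binary with N bytes.
    void save_block(vector_sink& ar, const char* data, std::size_t n) {
        ar.save_binary(data, n);
    }

    // Case 2: N invocations of save_binary, 1 byte each.
    void save_bytewise(vector_sink& ar, const char* data, std::size_t n) {
        for (std::size_t i = 0; i != n; ++i)
            ar.save_binary(&data[i], 1);
    }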
In case it isn't obvious to you by now, Matthias Troyer is a world-class expert in high performance computing. You don't get to be a recognized authority in that area without developing the ability to create tests that accurately measure performance. You also develop some generalized knowledge about what things will lead to slowdowns and speedups. It's really astounding that you manage to challenge every assertion he makes in a domain where he is an expert and you are not, especially in a domain with so many subtle pitfalls waiting for the naive tester.
Wow - well, the benchmark was posted and I took that as an indication that it was OK to check it out. Sorry about that - just go back to the std::vector<char> implementation of the buffer and we'll let it go at that.
a) The usage of save_array does not have a huge effect on performance. It IS measurable. It seems to save about 1/3 of the time over using a loop of saves in the best case. (1)
In the best case, even with your flawed test, it's a factor of 2 as shown above.
which is a heck of a lot less than 10x
b) In the worst case, it's even slower than a loop of saves!!! (2) and even slower than the raw serialization system (3)
That result is completely implausible. If you can get someone else to reproduce it using a proper test protocol I'll be *quite* impressed.
Well, at least we can agree on that. We've corrected the benchmark and made a few more runs. The anomaly above disappears; things still vary, but they don't change all that much. BTW, the program has a value type which can be set to either char or double, which tests different primitives. If the results the rest of us are showing are very different from yours, that might be an explanation.
c) The overhead of the serialization library isn't too bad. It does show up when saving 100M characters one by one, but generally it doesn't seem to be a big issue.
In my view, it does support my contention that implementing save_array - regardless of how it is in fact implemented - represents a premature optimization. I suspect that the net benefit in the kind of scenario in which you envision using it will be very small.
Obviously, this test raises more questions than it answers.
Like what questions?
a) Like the anomaly above - which I don't think is an issue anymore.
b) Will the current stream-based implementation of binary_oarchive be used, or would it be substituted with a different one?
c) What would the results be for the actual archive you plan to use?

Robert Ramey