
Robert Ramey wrote:
Ian McCulloch wrote:
[...]
Secondly, the buffer in the oprimitive class has much less functionality than either the vector<char> buffer or the buffer I used previously (http://lists.boost.org/Archives/boost/2005/11/97156.php). In particular, it does not check for buffer overflow when writing, so it has no capability for automatic resizing or flushing and is only useful if you know in advance the maximum size of the serialized data. This kind of buffer is of rather limited use, so I don't think this is a fair comparison.
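[To make the distinction concrete, here is a minimal sketch of the two kinds of buffer being compared; the class names fixed_buffer and growing_buffer are invented for illustration and this is not the actual oprimitive or Boost code.]

    #include <cstddef>
    #include <cstring>
    #include <vector>

    // Fixed-size buffer in the style of the benchmark: the caller must
    // guarantee that the serialized data fits, since save() never checks.
    class fixed_buffer {
        char* data_;
        std::size_t pos_;
    public:
        explicit fixed_buffer(char* storage) : data_(storage), pos_(0) {}
        void save(const void* p, std::size_t n) {
            std::memcpy(data_ + pos_, p, n);   // no overflow check at all
            pos_ += n;
        }
    };

    // vector<char>-backed buffer: slightly more work per call, but it can
    // grow (or could flush) when the data does not fit.
    class growing_buffer {
        std::vector<char> data_;
    public:
        void save(const void* p, std::size_t n) {
            const char* c = static_cast<const char*>(p);
            data_.insert(data_.end(), c, c + n); // resizes as needed
        }
    };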
I think it's much closer to the binary archive implementation than the current binary_oarchive is.
I don't understand that sentence, sorry. Which binary archive implementation?
I also think it's fairly close to what an archive class would look like for a message-passing application.
Surely it depends on the usage pattern? If you are sending fixed-size messages, then sure, a fixed-size buffer with no overflow checks will be fastest. If you are sending variable-size messages with no particular upper bound on the message size, then it is a trade-off whether you use a resizable buffer or count the number of items you need to serialize beforehand (see the sketch below). I wouldn't like to guess which is the more 'typical' use; both are important cases.
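[For the variable-size case, the "count beforehand" alternative might look roughly like the following two-pass sketch; the message type and serialize() helper are made up for illustration and are not taken from either benchmark.]

    #include <cstddef>
    #include <cstring>
    #include <string>
    #include <vector>

    struct message {                 // toy variable-size message
        std::string name;
        std::vector<double> samples;
    };

    // Generic serialization routine; a Sink only needs save(ptr, n).
    template <class Sink>
    void serialize(const message& m, Sink& sink) {
        std::size_t len = m.name.size();
        sink.save(&len, sizeof(len));
        sink.save(m.name.data(), len);
        std::size_t count = m.samples.size();
        sink.save(&count, sizeof(count));
        sink.save(m.samples.data(), count * sizeof(double));
    }

    struct counting_sink {           // pass 1: just add up the sizes
        std::size_t size = 0;
        void save(const void*, std::size_t n) { size += n; }
    };

    struct raw_sink {                // pass 2: unchecked writes into preallocated memory
        char* p;
        void save(const void* src, std::size_t n) { std::memcpy(p, src, n); p += n; }
    };

    std::vector<char> pack(const message& m) {
        counting_sink counter;
        serialize(m, counter);                // measure
        std::vector<char> buf(counter.size);  // allocate exactly once
        raw_sink writer{buf.data()};
        serialize(m, writer);                 // write, no overflow checks needed
        return buf;
    }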
The real difference here is that save_binary would be implemented in such a way that the overhead per call is pretty small. Maybe not quite as small as here, but much smaller than the overhead associated with ostream::write.
OK, but even with the ideal fixed-size buffer, the difference you measured between the serialization library and save_array for out-of-cache arrays of char is:

Time using serialization library: 1.922
Time using direct call to save_array: 0.25

That is almost a factor of 8 (1.922 / 0.25 ≈ 7.7). For a buffer that has more overhead, no matter how small, that extra overhead will translate directly into an increase in that factor.
In my view, it does support my contention that implementing save_array - regardless of how it is in fact implemented - represents a premature optimization. I suspect that the net benefit in the kind of scenario you envision using it will be very small.
Note, however, that in this case save_array() is purely memory-bandwidth limited. It would be interesting if you repeated the benchmark with a much smaller array size: you should see several jumps in performance corresponding to the various caches (L1, L2, TLB, perhaps others), although in any particular benchmark some of these thresholds might be hard to see. You will need to put the serialization into a loop to get the CPU time up to a sensible number, and do a loop or two before starting the timer so that the data is already in the cache (a sketch follows below). In the fixed-size-buffer scenario this is actually not too far from a realistic benchmark. I know (roughly) what the result will be; if you still stand by your previous comment, then obviously you do not.
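[A minimal sketch of that measurement loop, assuming the code under test is reachable through a single call; here a placeholder memcpy stands in for save_array / the archive under test.]

    #include <chrono>
    #include <cstring>
    #include <iostream>
    #include <vector>

    void save_array(char* dst, const char* src, std::size_t n) {
        std::memcpy(dst, src, n);            // placeholder for the real call
    }

    int main() {
        for (std::size_t n : {4u * 1024u, 64u * 1024u, 1024u * 1024u, 16u * 1024u * 1024u}) {
            std::vector<char> src(n, 'x'), dst(n);
            const int reps = static_cast<int>((64u * 1024u * 1024u) / n) + 1;

            // Warm-up: bring src/dst into the caches (to the extent they fit).
            for (int i = 0; i < 2; ++i) save_array(dst.data(), src.data(), n);

            auto t0 = std::chrono::steady_clock::now();
            for (int i = 0; i < reps; ++i) save_array(dst.data(), src.data(), n);
            auto t1 = std::chrono::steady_clock::now();

            std::chrono::duration<double> dt = t1 - t0;
            std::cout << n << " bytes: "
                      << (double(n) * reps / dt.count()) / 1e9 << " GB/s\n";
        }
    }

[Varying the array size across the values above is what should expose the cache thresholds as jumps in the reported bandwidth.]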
So I believe that the above results give a much more accurate picture than the previous ones of the effect of applying the proposed enhancement.
Fine. I am glad you finally agree with the 10x slowdown figure (well, if you want to be picky, a 7.688x slowdown on your Windows XP box and 9.8512x on my Linux/Opteron box). [...]
Interestingly, on this platform/compiler combination it still takes 1.11 seconds without the bug fix in save_binary() ;) In that case, I would guess your Windows compiler is doing some optimization that gcc is not.
Thanks for doing this - it is very helpful.
Sure you're compiling at maximum optimization, -O3?
Of course. -O3 gives no difference from -O2, a small difference from -O1, and a huge difference from -O0. But when there is a bug in the benchmark, any result is possible ;) Quite possibly your compiler simply noticed that the same memory location was being overwritten repeatedly and chose to store it in a register instead? Anyway, since you took no special care to ensure that the compiler didn't optimize away code, it would have been quite legitimate for your benchmark to report zero time for all tests. In the absence of such care, you at least need to check the assembly output carefully to make sure the benchmark is really testing what you think it is testing (one common safeguard is sketched below).
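[One common safeguard, sketched here with a made-up workload rather than the actual benchmark code: route each iteration's result through a volatile object and print a value derived from it, so the compiler cannot legally discard the work that produced it.]

    #include <chrono>
    #include <iostream>
    #include <vector>

    volatile char sink;   // writes to a volatile object cannot be optimized away

    int main() {
        std::vector<char> data(1 << 20, 42);
        auto t0 = std::chrono::steady_clock::now();

        unsigned long total = 0;
        for (int rep = 0; rep < 100; ++rep) {
            for (char c : data) total += static_cast<unsigned char>(c);
            sink = static_cast<char>(total);     // force each iteration's result to be used
            data[rep % data.size()] ^= 1;        // keep the inner loop from being hoisted
        }

        auto t1 = std::chrono::steady_clock::now();
        // Printing the accumulated result also makes the work observable.
        std::cout << "total = " << total << ", time = "
                  << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    }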
In any case, this is not untypical of my personal experience with benchmarks: they vary a lot depending on extraneous variables. Our results seem pretty comparable, though.
Robert Ramey
Cheers, Ian