
On 07/09/2011 21:03, Phil Endecott wrote:
Increasing the block size doesn't make any significant difference; reducing it below 4096 bytes does slow it down.
So the overhead of byteswapping compared to I/O, for a file cached in memory, is between about 25% (case 3) and 150% (case 4) on this system.
So std::reverse is six times slower than std::transform in that benchmark. Not entirely unexpected, especially on ARM.
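For anyone who wants to try the comparison themselves, here is a rough sketch of the two approaches as I understand them. It is not the benchmark code from the post: the 32-bit word size, the function names and the bswap32 helper are all my own assumptions, and the actual test may have been structured differently.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original post): swap the bytes of one
// 32-bit word using shifts and masks.
inline std::uint32_t bswap32(std::uint32_t x) {
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

// Variant A: reverse the bytes of each word in place with std::reverse.
// Accessing the object through unsigned char* is permitted by the aliasing rules.
void swap_with_reverse(std::vector<std::uint32_t>& words) {
    for (std::uint32_t& w : words) {
        unsigned char* p = reinterpret_cast<unsigned char*>(&w);
        std::reverse(p, p + sizeof w);
    }
}

// Variant B: apply the word-level swap to every element with std::transform.
void swap_with_transform(std::vector<std::uint32_t>& words) {
    std::transform(words.begin(), words.end(), words.begin(), bswap32);
}
```

Both variants produce the same result; the difference is that variant A works a byte at a time through memory, while variant B keeps each word in a register, which is one plausible reason for a gap like the one reported. Timings will of course depend on the compiler, flags and target.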
So, as expected, the amount of CPU time used scales approximately as before, but the elapsed time doesn't change because it's limited by the SATA interface or the SSD to around 50 MB/sec.
Personally I think these savings are worthwhile, and I believe that a library developer should normally assume that potential users of a library will have applications that need optimal performance, even if the developer is happy with something more modest.
The conclusion seems to be "doesn't matter for bandwidth, but does for latency". I take it that high-speed trading stuff needs the lowest latency possible?