
On Thu, Jan 26, 2012 at 3:35 AM, Daniel James <dnljms@gmail.com> wrote:
On 25 January 2012 19:05, Grund, Holger (ISGT) <Holger.Grund@morganstanley.com> wrote:
My guess would be that there is a 64-bit division for transforming the hash to a bucket index, which is killing perf (though one would wonder why there are no peephole optimizations for an instruction like MOD(1,x) in the codegen).
You're right. I tried removing all the modulus calculations from the container, and the difference became much smaller. (Removing them is fine in this case since it's only ever looking for hash values < 4. I'm also using a modified benchmark which looks up i % 4 rather than 1, to stop things getting optimised away.) I didn't realise how expensive they are on 64-bit computers. I think I might change the design of the container to reduce the number of times they're used, although that will make some other things a bit slower.
Is it possible to always use a power-of-2 number of buckets and bitwise operations instead of modulus division?
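
For reference, here is a minimal sketch of what that would look like, assuming the bucket count is always kept at a power of two. The helper names (round_up_pow2, bucket_mod, bucket_mask) are made up for illustration and are not taken from the container's actual code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only; not the container's real code.

// Round n up to the nearest power of two.
// Assumes 1 <= n <= the largest power of two representable in std::size_t,
// otherwise the shift would overflow.
inline std::size_t round_up_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// General case: works for any non-zero bucket count, but needs an
// integer division, which is the expensive part discussed above.
inline std::size_t bucket_mod(std::size_t hash, std::size_t bucket_count) {
    return hash % bucket_count;
}

// Power-of-two case: hash % bucket_count equals hash & (bucket_count - 1),
// so a single AND replaces the division.
inline std::size_t bucket_mask(std::size_t hash, std::size_t bucket_count) {
    // Precondition: bucket_count is a non-zero power of two.
    assert(bucket_count != 0 && (bucket_count & (bucket_count - 1)) == 0);
    return hash & (bucket_count - 1);
}
```

With a requested size of 100 buckets, round_up_pow2 gives 128, and bucket_mask(h, 128) returns the same index as bucket_mod(h, 128) for any h. The catch is that masking only looks at the low bits of the hash, so a hash function whose values differ mainly in the high bits would pile up in a few buckets. A prime bucket count with modulus is more forgiving there.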