On 07/27/2014 05:20 PM, Niall Douglas wrote:
1. Items in a bucket were kept via an atomic forward pointer to the next item and one used cmpxchg to insert and remove items. This had really great, even superb, performance for load factors near one, but became pathological for badly distributed hash functions (I'm looking at you libstdc++). Once you rehashed the hash function to make it better distributed, performance nose dived. I was tempted to override std::hash with a decent implementation ...
I would not rule out this solution. You can just tell users to supply their own hash function to solve the distribution problem.
Final proposed design (comments welcome):
If value_type has a big sizeof(), a very good idea is to make the above a unique_ptr as otherwise you can't pack many item_type's into a cache line (AFIO's value_type is a std::pair
> which is 24 bytes, so item_type is 32 bytes which works well for my purposes).
You add storage traits so users can decide for themselves. This trait would contain the stored type and to/from conversion functions.
* To insert or find an item involves determining the bucket from the modulus of the hash by the number of buckets, locking the bucket and scanning the linear list for an empty item or the right item as appropriate. We know empty items from a magic hash value which is defined to some magic constant (advice on the best magic hash value least likely to turn up from std::hash is welcome). find() starts
std::hash already does a modulo operation, so you could simply do another to free a number for the magic value. For example, auto magic = std::numeric_limitHash::result_type::max(); hash %= magic; It will change the distribution, however, as as both 0 and max are mapped to 0.