On 28/07/2014 03:20, Niall Douglas wrote:
* item_type is allocated via a rebind of the supplied allocator, and looks like:
struct item_type { value_type p; size_t hash; };
I presume this will support partial construction to handle the case where value_type does not have a default constructor, is move-only, etc.?
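One common way to do that (a sketch only, with raw storage in the slot instead of a live value_type; the names are mine, not from the proposal):

    #include <cstddef>
    #include <new>
    #include <utility>

    template <class value_type>
    struct item_type
    {
        // Raw, suitably aligned storage; a value is only placement-new'd into
        // it when the slot becomes occupied, so value_type needs no default
        // constructor and may be move-only.
        alignas(value_type) unsigned char storage[sizeof(value_type)];
        std::size_t hash;

        value_type *get() noexcept
        {
            return reinterpret_cast<value_type *>(storage);
        }
        template <class... Args>
        void construct(std::size_t h, Args &&...args)
        {
            ::new (static_cast<void *>(storage)) value_type(std::forward<Args>(args)...);
            hash = h;
        }
        void destroy() noexcept
        {
            get()->~value_type();
        }
    };

That sidesteps the default-constructor requirement entirely and works for move-only types, at the cost of managing the value's lifetime by hand.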
* To insert or find an item, determine the bucket from the hash modulo the number of buckets, lock the bucket, and scan the linear list for an empty item or the matching item as appropriate. Empty items are recognised by a magic hash value set to some magic constant (advice on the best magic hash value, i.e. the one least likely to come out of std::hash, is welcome).
Isn't it by definition (admittedly reality is less ideal) the case that all hash values are equally probable? What happens if a real item does end up with the magic "empty" hash value? I think it'd be safer to have a separate flag for this case.
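For illustration, the flag variant might look something like this (again a sketch with hypothetical names; the scan logic is only indicated in comments):

    #include <cstddef>

    enum class slot_state : unsigned char { empty, occupied };

    // Slot layout with an explicit occupancy marker; the stored hash is then
    // only an optimisation to skip full key comparisons, and no hash value
    // has to be reserved as a sentinel.
    template <class value_type>
    struct item_type
    {
        value_type p;        // or raw storage as in the previous sketch
        std::size_t hash;
        slot_state state;
    };

    // Inside the bucket scan (pseudocode for the comparison logic):
    //   if (slot.state == slot_state::empty)            -> free slot, insert here
    //   else if (slot.hash == h && slot key equals key)  -> found the item

With this arrangement a key whose real hash happens to collide with the sentinel can never be mistaken for an empty slot.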
* Erasing an item is easy: iterators store an iterator to the bucket and an offset into the linear table. Erasure involves constructing a temporary value_type, locking the bucket, performing a *move* construction out of the table into the temporary, and resetting the hash to empty. Destruction can then occur without holding the bucket lock. If the item just erased is the last in the bucket, the bucket's size (not capacity) is reduced by one, and this is repeated until the bucket size is the minimum possible.
If the hash is reset to empty and the lock released before destruction occurs, doesn't that mean that something else could try to concurrently construct into the same storage? After all, the item slot will now look empty to insert(). Similarly the bucket size could be reduced by one just as insert() puts something into that bucket slot (thinking that it was still within the valid range, so not trying to increment it itself).
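A minimal sketch of an ordering that closes that window, reusing the hypothetical slot layout and state flag from the sketches above (the per-bucket std::mutex and the Bucket members are also my invention, not the proposed design):

    #include <cstddef>
    #include <mutex>
    #include <optional>
    #include <utility>

    // std::optional lets the moved-out value outlive the lock even if
    // value_type is not default-constructible (boost::optional would serve
    // the same purpose before C++17).
    template <class value_type, class Bucket>
    void erase_slot(Bucket &bucket, std::size_t offset)
    {
        std::optional<value_type> victim;   // destroyed last, i.e. after unlock
        {
            std::lock_guard<std::mutex> g(bucket.lock);
            auto &slot = bucket.items[offset];
            victim.emplace(std::move(*slot.get())); // move the value out
            slot.destroy();                         // destroy the moved-from shell under the lock
            slot.state = slot_state::empty;         // only now may insert() reuse the slot
            if (offset + 1 == bucket.size)          // shrink under the same lock,
                --bucket.size;                      // so insert() can't race on it
        }
        // victim's (potentially expensive) destructor runs here, outside the lock
    }

The costly part of destruction still happens outside the lock, but the slot is only advertised as empty once nothing of the old object remains in it, and the size adjustment is serialised with insert() by the same bucket lock.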
* size() under this design becomes O(bucket count), as we sum the count members of each bucket. empty() is worst-case O(bucket count) when the map is empty, best-case O(1) when the first bucket is not empty, and average-case O(bucket count / (2 * item count)).
You could possibly use a relaxed count at the top level to improve this. size() and empty() become statistical rather than prescriptive in the presence of concurrent writes anyway, so a stale value is not harmful. Conversely, though, updating a shared counter will cause cache-contention performance loss on insert/erase, which may not be worth it for the benefit of statistical functions that are less likely to even be called in a concurrent design.
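A sketch of what such an approximate counter might look like (my naming, not part of the proposed design):

    #include <atomic>
    #include <cstddef>

    struct approx_size_counter
    {
        std::atomic<std::ptrdiff_t> n{0};

        // Called by insert()/erase() after they succeed. Relaxed ordering is
        // enough because the value is only ever read as a statistical hint.
        void add(std::ptrdiff_t delta) noexcept
        {
            n.fetch_add(delta, std::memory_order_relaxed);
        }
        std::size_t size() const noexcept
        {
            std::ptrdiff_t v = n.load(std::memory_order_relaxed);
            return v > 0 ? static_cast<std::size_t>(v) : 0;
        }
        bool empty() const noexcept { return size() == 0; }
    };

Sharding the counter (one per group of buckets, summed on demand) would reduce the cache-line contention at the cost of a little extra work in size(), which is essentially the trade-off described above.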
Thoughts on this design before I write a final implementation are welcome. I don't plan to complete my implementation past the minimum needed for proposed Boost.AFIO, so if someone else wishes to finish it and make it part of official Boost, that would be great.
I presume you've looked at other common implementations of this sort of thing, e.g. libcds?