On 28/07/2014 03:20, Niall Douglas wrote:
* item_type is allocated via a rebind of the supplied allocator, and looks like:
struct item_type { value_type p; size_t hash; };
I presume this will support partial construction to handle the case where value_type does not have a default constructor, is move-only, etc.?
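One common way to do that (a sketch only, with raw storage in the slot instead of a live value_type; the names are mine, not from the proposal):

    #include <cstddef>
    #include <new>
    #include <utility>

    template <class value_type>
    struct item_type
    {
        // Raw, suitably aligned storage; a value is only placement-new'd into
        // it when the slot becomes occupied, so value_type needs no default
        // constructor and may be move-only.
        alignas(value_type) unsigned char storage[sizeof(value_type)];
        std::size_t hash;

        value_type *get() noexcept
        {
            return reinterpret_cast<value_type *>(storage);
        }
        template <class... Args>
        void construct(std::size_t h, Args &&...args)
        {
            ::new (static_cast<void *>(storage)) value_type(std::forward<Args>(args)...);
            hash = h;
        }
        void destroy() noexcept
        {
            get()->~value_type();
        }
    };

That sidesteps the default-constructor requirement entirely and works for move-only types, at the cost of managing the value's lifetime by hand.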
* To insert or find an item, determine the bucket from the hash modulo the number of buckets, lock the bucket, and scan the linear list for an empty item or the matching item as appropriate. Empty items are recognised by a magic hash value set to some magic constant (advice on the best magic hash value, i.e. the one least likely to come out of std::hash, is welcome).
Isn't it by definition (admittedly reality is less ideal) the case that all hash values are equally probable? What happens if a real item does end up with the magic "empty" hash value? I think it'd be safer to have a separate flag for this case.
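For illustration, the flag variant might look something like this (again a sketch with hypothetical names; the scan logic is only indicated in comments):

    #include <cstddef>

    enum class slot_state : unsigned char { empty, occupied };

    // Slot layout with an explicit occupancy marker; the stored hash is then
    // only an optimisation to skip full key comparisons, and no hash value
    // has to be reserved as a sentinel.
    template <class value_type>
    struct item_type
    {
        value_type p;        // or raw storage as in the previous sketch
        std::size_t hash;
        slot_state state;
    };

    // Inside the bucket scan (pseudocode for the comparison logic):
    //   if (slot.state == slot_state::empty)            -> free slot, insert here
    //   else if (slot.hash == h && slot key equals key)  -> found the item

With this arrangement a key whose real hash happens to collide with the sentinel can never be mistaken for an empty slot.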
* Erasing an item is easy: iterators store an iterator to the bucket and an offset into the linear table. Erasure involves constructing a temporary value_type, locking the bucket, performing a *move* construction out of the table into the temporary, and resetting the hash to empty. Destruction can then occur without holding the bucket lock. If the item just erased is the last in the bucket, the bucket's size (not capacity) is reduced by one, and this is repeated until the bucket size is the minimum possible.
If the hash is reset to empty and the lock released before destruction occurs, doesn't that mean that something else could try to concurrently construct into the same storage? After all, the item slot will now look empty to insert(). Similarly the bucket size could be reduced by one just as insert() puts something into that bucket slot (thinking that it was still within the valid range, so not trying to increment it itself).
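A minimal sketch of an ordering that closes that window, reusing the hypothetical slot layout and state flag from the sketches above (the per-bucket std::mutex and the Bucket members are also my invention, not the proposed design):

    #include <cstddef>
    #include <mutex>
    #include <optional>
    #include <utility>

    // std::optional lets the moved-out value outlive the lock even if
    // value_type is not default-constructible (boost::optional would serve
    // the same purpose before C++17).
    template <class value_type, class Bucket>
    void erase_slot(Bucket &bucket, std::size_t offset)
    {
        std::optional<value_type> victim;   // destroyed last, i.e. after unlock
        {
            std::lock_guard<std::mutex> g(bucket.lock);
            auto &slot = bucket.items[offset];
            victim.emplace(std::move(*slot.get())); // move the value out
            slot.destroy();                         // destroy the moved-from shell under the lock
            slot.state = slot_state::empty;         // only now may insert() reuse the slot
            if (offset + 1 == bucket.size)          // shrink under the same lock,
                --bucket.size;                      // so insert() can't race on it
        }
        // victim's (potentially expensive) destructor runs here, outside the lock
    }

The costly part of destruction still happens outside the lock, but the slot is only advertised as empty once nothing of the old object remains in it, and the size adjustment is serialised with insert() by the same bucket lock.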
* size() under this design becomes O(bucket count), as we sum the count members of each bucket. empty() is worst-case O(bucket count) when the map is empty, best-case O(1) when the first bucket is not empty, and average-case O(bucket count / (2 * item count)).
You could possibly use a relaxed count at the top level to improve this. size() and empty() become statistical rather than prescriptive in the presence of concurrent writes anyway, so a stale value is not harmful. Conversely, though, updating a shared counter will cause cache-contention performance loss on insert/erase, which may not be worth it for the benefit of statistical functions that are less likely to even be called in a concurrent design.
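A sketch of what such an approximate counter might look like (my naming, not part of the proposed design):

    #include <atomic>
    #include <cstddef>

    struct approx_size_counter
    {
        std::atomic<std::ptrdiff_t> n{0};

        // Called by insert()/erase() after they succeed. Relaxed ordering is
        // enough because the value is only ever read as a statistical hint.
        void add(std::ptrdiff_t delta) noexcept
        {
            n.fetch_add(delta, std::memory_order_relaxed);
        }
        std::size_t size() const noexcept
        {
            std::ptrdiff_t v = n.load(std::memory_order_relaxed);
            return v > 0 ? static_cast<std::size_t>(v) : 0;
        }
        bool empty() const noexcept { return size() == 0; }
    };

Sharding the counter (one per group of buckets, summed on demand) would reduce the cache-line contention at the cost of a little extra work in size(), which is essentially the trade-off described above.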
Thoughts on this design before I write a final implementation are welcome. I don't plan to complete my implementation past the minimum needed for proposed Boost.AFIO, so if someone else wishes to finish it and make it part of official Boost, that would be great.
I presume you've looked at other common implementations of this sort of thing, e.g. libcds?