Re: [boost][multi_index][hash][rfc] Hashed indices preview +Boost.Hash

Hi Howard, I'm glad you jumped into this discussion!

----- Original message -----
From: Howard Hinnant <hinnant@twcny.rr.com>
Date: Sunday, February 20, 2005 5:31 pm
Subject: Re: [boost] [multi_index][hash][rfc] Hashed indices preview +Boost.Hash
On Feb 20, 2005, at 10:21 AM, JOAQUIN LOPEZ MUÑOZ wrote:
1.4. Usual implementations use one pointer per bucket, whereas Boost.MultiIndex has two pointers per bucket. This addresses potential performance problems of insert() and erase() under low load conditions, as discussed in http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1756.pdf issue 6.14. Is this advisable?
Well it's not unreasonable, you need what you need.
BTW, have you read the committee's resolution to this issue? It doesn't make any sense to me (Bill Wade's concern, OTOH, seems very reasonable.)
Imho the committee's resolution is quite logical.
The hash containers exist for one reason: as an optimized solution, in both memory consumption and speed, compared to the tree-based containers. The speed department especially concerns lookup, insert and erase (in that order), but not iteration (there are other containers which optimize iteration, vector being the extreme example). The coder who carefully tunes his hash function and loading will be rewarded with significant optimization. The coder who doesn't may be presented with a solution that performs worse than the tree-based containers, and is better off sticking with the O(log N) solution.
Imho, the committee should do absolutely nothing to penalize the coder who has carefully tuned his hash function and loading, in either memory consumption or speed. Of course, if hash container writers would like to add safeguards which may also add such overhead, I see little rationale for the committee to object to that either.
Doubling the per-element storage overhead for a perfect distribution with a loading of 1 is a serious hit in the memory consumption department, and may well lead to a performance hit due to more L2 cache misses. Adding bidirectional iteration, again at the cost of doubling the per-element overhead in a well-tuned application, makes little sense to me. The container is unordered. When you iterate in the forward direction, you access the elements in a generally unpredictable, unordered order. What motivation is there (motivation high enough to double per-element overhead) to iterate in the reverse of the generally unpredictable, unordered order?
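To put rough numbers on the memory argument, here is an illustrative sketch (not from the thread; the layouts and sizes are assumptions for a typical 64-bit platform with 8-byte pointers) of the per-element overhead with one versus two housekeeping pointers per node:

    // Illustrative node layouts only -- not any library's actual internals.
    struct classic_node              // "classic" singly-linked hash node
    {
        classic_node* next;          // 1 pointer of bookkeeping per element
        int           value;         // sizeof is typically 16 bytes
    };

    struct bidirectional_node        // node supporting bidirectional iteration
    {
        bidirectional_node* next;    // 2 pointers of bookkeeping per element
        bidirectional_node* prev;
        int                 value;   // sizeof is typically 24 bytes
    };

With a well-tuned table at a load factor near 1, the extra pointer per element is pure overhead for lookup-heavy workloads, which is the cache-footprint concern raised above.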
Again, I have no wish for the committee to outlaw such hash containers. But I strongly feel that neither should they outlaw the more traditional hash container design which favors uncompromising performance when everything is tuned just right.
What strikes me about the resolution is that it doesn't seem to address Bill Wade's concern: in a minimum-overhead "classic" hash table implementation (one pointer per element + one pointer per bucket), iteration is not going to be amortized constant under low load conditions. This is in very strong disagreement with what the std requires of forward iterators. I think this cannot just be swept under the rug: if the committee is going to accept it, at least it should be noted somewhere.

The implementation I adopted (2 pointers per node + 2 pointers per bucket) has O(1) iteration, insert() and erase() whatever the load conditions (provided the hash function behaves, of course). I agree with you that bidirectional iteration is of little value; my main motivation was to achieve O(1). So, I'll be happy to go for a lighter implementation if these issues can be solved in another way --or if there's consensus that costly iteration under low load conditions is acceptable.

PS. The worst of all implementations seems to be Dinkumware's, where iteration is O(1) but insertion can degrade to O(bucket_count()^2) after a previous rehash, as Wade points out. It also puzzled me that the committee NADed the issue without at least a comment on this.

Looking fwd to your comments,

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
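For readers following the thread, here is a minimal sketch of one layout consistent with the description above (2 pointers per node + 2 pointers per bucket); it is an assumption made for illustration, not the actual Boost.MultiIndex code:

    // Illustrative only: one plausible reading of "2 pointers per node +
    // 2 pointers per bucket", not Boost.MultiIndex internals.
    struct node
    {
        node* prev;    // all elements threaded on a single doubly-linked list
        node* next;
        // ... element value lives here ...
    };

    struct bucket
    {
        node* first;   // first element hashed to this bucket (or null)
        node* last;    // last element hashed to this bucket (or null)
    };

    // - Iteration follows node::next, which only ever visits elements, so
    //   each step is O(1) even when most buckets are empty.
    // - erase() unlinks prev/next in O(1) and, via the hash, fixes up the
    //   owning bucket's first/last if the erased node sat at either end.
    // - insert() links the new node into its bucket's segment of the list,
    //   again O(1) once the bucket has been located.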

On Feb 20, 2005, at 12:57 PM, JOAQUIN LOPEZ MUÑOZ wrote:
What strikes me about the resolution is that it doesn't seem to address Bill Wade's concern: in a minimum-overhead "classic" hash table implementation (one pointer per element + one pointer per bucket), iteration is not going to be amortized constant under low load conditions. This is in very strong disagreement with what the std requires of forward iterators. I think this cannot just be swept under the rug: if the committee is going to accept it, at least it should be noted somewhere.
<nod> I would favor a resolution which stated that complexity requirements for hash containers assume a perfect distribution with a load factor of 1. Complexities under other conditions could be undefined, or perhaps implementation defined. -Howard

JOAQUIN LOPEZ MUÑOZ wrote:
The implementation I adopted (2 pointers per node + 2 pointers per bucket) has O(1) iteration, insert() and erase() whatever the load conditions (provided the hash function behaves, of course). I agree with you that bidirectional iteration is of little value; my main motivation was to achieve O(1). So, I'll be happy to go for a lighter implementation if these issues can be solved in another way --or if there's consensus that costly iteration under low load conditions is acceptable.
FWIW, if you can find a spare bit in a pointer, I've thought of a non-portable way to get something closer to O(1) without the extra memory load. The nodes are stored in a singly-linked list which includes the buckets, so the last node in a bucket points to the next bucket. The spare bit is used to indicate whether a pointer points to a node or a bucket. So the data structure is something like:

    struct node; struct bucket;

    struct node_or_bucket
    {
        node_or_bucket* next_;
        void set(node* x);    // defined below: next_ points to a node
        void set(bucket* x);  // defined below: next_ points to a bucket
        bool is_next_bucket() const
        { return (reinterpret_cast<std::uintptr_t>(next_) & 1) != 0; }
        node_or_bucket* get() const
        { return reinterpret_cast<node_or_bucket*>(
              reinterpret_cast<std::uintptr_t>(next_) & ~std::uintptr_t(1)); }
    };

    struct node : node_or_bucket { T value_; };   // T is the element type
    struct bucket : node_or_bucket {};

    // The low bit of next_ tags its target: 0 = node, 1 = bucket.
    // Non-portable: assumes at least 2-byte alignment; uses <cstdint>.
    inline void node_or_bucket::set(node* x)   { next_ = x; }
    inline void node_or_bucket::set(bucket* x)
    { next_ = reinterpret_cast<node_or_bucket*>(
          reinterpret_cast<std::uintptr_t>(x) | 1); }

    bucket* buckets_;  // Array of buckets.
    bucket* start_;    // First bucket.

You can tell when a bucket ends because the pointer will be to a bucket (or perhaps null, although a circular list would avoid that). This also means that the node that comes before any node is always from the same bucket, so erase only has to look at a single bucket.

Unfortunately, this can leave an empty bucket in the linked list. As more empty buckets are left, iteration gets slower, so they need to be cleaned up. It would probably be possible for insert & erase to remove any they chance across, and erase could do a complete clean-up whenever the number of empty buckets reaches a certain percentage of the current size.

I'm not going to do this for now, as it wouldn't be remotely appropriate for boost. And in many cases, this might end up slower than the current implementation. I expect that the majority of accesses of an unordered container are going to be by key.

Daniel
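To make the iteration cost concrete, here is a hypothetical sketch of how an increment operation might work under Daniel's scheme, reusing the node_or_bucket helpers above; the function name next_node and the control flow are assumptions for illustration, not code from his post:

    // Advance from p along next_, skipping over bucket records -- including
    // empty buckets left behind by erase -- until the next real node (or the
    // end of the list) is reached.
    node* next_node(node_or_bucket* p)
    {
        bool at_bucket = p->is_next_bucket();  // does p->next_ point to a bucket?
        node_or_bucket* q = p->get();          // follow p->next_
        while (q && at_bucket)                 // q is a bucket record: keep going
        {
            at_bucket = q->is_next_bucket();
            q = q->get();
        }
        return static_cast<node*>(q);          // a node, or null at the end
    }

Every empty bucket left in the chain adds one extra step to this loop, which is exactly why Daniel notes that empty buckets need to be cleaned up for iteration to stay fast.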
participants (3)
- Daniel James
- Howard Hinnant
- JOAQUIN LOPEZ MUÑOZ