[intrusive] linear hashing
Hi, Can anyone explain the performance characteristics of linear hashing. I read the Wikipedia article and I'm still not sure. The article mentions growing the hash table incrementally by 1, but the instrusive unordered_set doesn't grow/rehash unless you tell it to? When does it make sense to use the linear hashing option? Why is it not on by default? Is it because it slows down the address lookup into the table. Is the ideal bucket size still 1 per expected element. Thanks -ephi
El 29/12/2010 4:48, Ephraim Sinowitz escribió:
Hi, Can anyone explain the performance characteristics of linear hashing. I read the Wikipedia article and I'm still not sure. The article mentions growing the hash table incrementally by 1, but the instrusive unordered_set doesn't grow/rehash unless you tell it to? When does it make sense to use the linear hashing option? Why is it not on by default? Is it because it slows down the address lookup into the table. Is the ideal bucket size still 1 per expected element.
The Wikipedia article explains it quite well: Linear Hashing: The cost of hash table expansion is spread out across each hash table insertion operation, as opposed to being incurred all at once. Linear hashing is therefore well suited for interactive applications. Linear hashing typically requires power of two length bucket array, but those buckets are not fully used from the beginning, as it is when using non-linear hashing (which typically uses prime length bucket array). Although the bucket array, say, has 16 buckets, the implementation uses first just one or two, then when after incremental rehashing uses three, etc.. incrementally, until it fills those 16 buckets. Then for a new incremental rehash, it allocates a new bucket array with length 32, but starts using only 17, and continues with incremental hashing until reaching 32. I think Dinkum STL used this incremental rehashing. The key is that in each incremental hashing, not all elements are rehashed, but just elements of a single bucket, distributing hashing impact in all insertions. For more information on hashing alternatives see the original standard hashing container proposal (chapter Control of Hash Resizing): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html Now, intrusive containers don't allocate memory at all, so incremental rehashing must be triggered by the user using "incremental_rehash(bool)" (use an additional bucket, that is, incremental rehash) and "incremental_rehash(bucket_traits)" (to update the new bucket array with an array that should be twice/half the size of the previous one, to allow more incremental rehashings in the future). I admit that this is not explained at all with an example, so I will note this issue in my to do list. As in Intrusive incremental (linear) hashing is triggered by the user, I think most people should go with non-linear hashing (which is the default), except those that want to build a non-intrusive container based on intrusive that should have incremental hashing performance needs. Best, Ion
participants (2)
-
Ephraim Sinowitz
-
Ion Gaztañaga