Re: [boost] [multi_index] benchmarking

----- Original Message -----
From: Maxim Yegorushkin <maxim.yegorushkin@gmail.com>
Date: Tuesday, May 30, 2006 10:25 pm
Subject: Re: [boost] [multi_index] benchmarking
Darren Cook wrote:
Here are the results for the CVS head version with your modification, which show that your analysis was correct:
It would be interesting to see how your results compare to Google's (open source) hash map: http://goog-sparsehash.sourceforge.net/ http://goog-sparsehash.sourceforge.net/doc/performance.html
I've not used them but they seem to give some good improvements (memory or speed, but not both) over std_hash and std_map.
Surprisingly google::dense_hash_set often outperforms all other containers in both speed and memory usage.

[...]

99997 elements; 10 iterations

[...]

memory test (lower is better)
gs_hash  [ 21.64%] OOOOOOOO   432760 bytes
gd_hash  [ 52.43%] OOOOOOOOOOOOOOOOOOOO  1048576 bytes
std_hash [ 61.58%] OOOOOOOOOOOOOOOOOOOOOOOO  1231568 bytes
ext_hash [ 79.32%] OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO  1586428 bytes
mi_hash  [ 79.32%] OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO  1586440 bytes
mi_set   [ 80.00%] OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO  1599968 bytes
std_set  [100.00%] OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO  1999940 bytes
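A usage note on the winner: google::dense_hash_set keeps its elements in a single open-addressed array, and its API requires reserving one key value as an "empty slot" marker before the first insertion. A minimal sketch, assuming the sparsehash headers of that era; the sentinel value -1 is an arbitrary illustrative choice:

    #include <google/dense_hash_set>

    int main()
    {
        google::dense_hash_set<int> s;
        s.set_empty_key(-1);  // required before any insert; -1 can then
                              // never be stored as a real element
        s.insert(42);
        return s.count(42) == 1 ? 0 : 1;  // exit status 0 on success
    }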
Hello Maxim,

gd_hash is expected to be faster than other hash containers, but in general it occupies more memory -- only that your test is particularly favorable to gd_hash. Let me explain:

For a regular, bucket-based singly-linked hashed container, the memory consumption for n elements can be calculated as (assuming a 32-bit architecture where pointers are 4 bytes):

  R(n) = n * (sizeof(element) + 4*(1 + 1/lf))

where lf is the load factor, typically oscillating between 0.5 and 1. Setting n = 99997, element = unsigned int, lf = 0.75, we get

  R = 99997 * (4 + 4*(1 + 1/0.75)) = 1333293

which lies between the actual values for std_hash and ext_hash/mi_hash.

Now, for a Google dense hash set the memory consumption is given by

  D(n) = 2^ceil(1 + log2 n) * sizeof(element)

where ^ denotes exponentiation, ceil(x) is the smallest integer not less than x, and log2 is the binary logarithm. With the same settings as above we get

  D = 2^ceil(1 + log2 99997) * 4 = 2^ceil(1 + 16.61) * 4 = 2^18 * 4 = 1048576

that is, exactly what your figures show.

Why is your test favorable to gd_hash? Because sizeof(element) is small; if you had used a not-too-large class having, say, four int-sized members, i.e. sizeof(element) = 4*4 = 16, the resulting numbers change a lot:

  R = 99997 * (16 + 4*(1 + 1/0.75)) = 2533257

  D = 2^ceil(1 + log2 99997) * 16 = 2^18 * 16 = 4194304

That is, R(n) grows as n*(C + sizeof(element)) with C a constant, while D(n) has the form K*n*sizeof(element) with K always >= 2. Hence, when sizeof(element) is sufficiently large, gd_hash takes nearly twice the space of a regular hash container.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
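For readers who want to check the arithmetic, the two formulas can be evaluated with a small standalone program; this is only a transcription of the equations above, not code from any of the libraries discussed:

    #include <cmath>
    #include <cstdio>

    // R(n): bucket-based singly-linked hashed container, 32-bit
    // pointers: payload plus a 4-byte node pointer plus 4 bytes of
    // bucket array amortized over the load factor lf.
    double R(double n, double elem_size, double lf)
    {
        return n * (elem_size + 4.0 * (1.0 + 1.0 / lf));
    }

    // D(n): Google dense hash set: a table of 2^ceil(1 + log2 n)
    // slots (capacity at least twice n), each holding one element.
    double D(double n, double elem_size)
    {
        return std::pow(2.0, std::ceil(1.0 + std::log(n) / std::log(2.0)))
               * elem_size;
    }

    int main()
    {
        std::printf("R(99997, 4-byte)  = %.0f bytes\n", R(99997, 4, 0.75));  // ~1333293
        std::printf("D(99997, 4-byte)  = %.0f bytes\n", D(99997, 4));        // 1048576
        std::printf("R(99997, 16-byte) = %.0f bytes\n", R(99997, 16, 0.75)); // ~2533257
        std::printf("D(99997, 16-byte) = %.0f bytes\n", D(99997, 16));       // 4194304
        return 0;
    }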

Surprisingly google::dense_hash_set often outperforms all other containers in both speed and memory usage.
[...]
Hello Maxim, gd_hash is expected to be faster than other hash containers, but in general it occupies more memory -- only that your test is particularly favorable to gd_hash, ...
Very interesting results (and explanation). Thanks.

Recently, when I write in C++ instead of a quick-and-easy scripting language, it is often because I need a lookup table (i.e. a hash map) with millions of entries, where performance really starts to matter, so seeing where each solution is better or worse is very useful to me.

Darren

JOAQUIN LOPEZ MUÑOZ wrote:
Hello Maxim, gd_hash is expected to be faster than other hash containers, but in general it occupies more memory -- only that your test is particularly favorable to gd_hash, let me explain:
[...]
That is, R(n) grows as n*(C + sizeof(element)) with C a constant, while D(n) has the form K*n*sizeof(element) with K always >= 2. Hence, when sizeof(element) is sufficiently large, gd_hash takes nearly twice the space of a regular hash container.
Thank you for the illuminating explanation.
participants (3)
- Darren Cook
- JOAQUIN LOPEZ MUÑOZ
- Maxim Yegorushkin