Re: [boost] [multi_index] benchmarking

----- Original Message -----
From: Maxim Yegorushkin <maxim.yegorushkin@gmail.com>
Date: Tuesday, May 30, 2006 10:25 pm
Subject: Re: [boost] [multi_index] benchmarking
Darren Cook wrote:
Here are the results for the CVS head version with your modification, which show that your analysis was correct:
It would be interesting to see how your results compare to Google's (open source) hash map: http://goog-sparsehash.sourceforge.net/ http://goog-sparsehash.sourceforge.net/doc/performance.html
I've not used them but they seem to give some good improvements (memory or speed, but not both) over std_hash and std_map.
Surprisingly google::dense_hash_set often outperforms all other containers in both speed and memory usage.

[...]

99997 elements; 10 iterations

[...]

memory test (lower is better)
gs_hash  [ 21.64%] OOOOOOOO   432760 bytes
gd_hash  [ 52.43%] OOOOOOOOOOOOOOOOOOOO  1048576 bytes
std_hash [ 61.58%] OOOOOOOOOOOOOOOOOOOOOOOO  1231568 bytes
ext_hash [ 79.32%] OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO  1586428 bytes
mi_hash  [ 79.32%] OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO  1586440 bytes
mi_set   [ 80.00%] OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO  1599968 bytes
std_set  [100.00%] OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO  1999940 bytes
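A usage note on the winner: google::dense_hash_set keeps its elements in a single open-addressed array, and its API requires reserving one key value as an "empty slot" marker before the first insertion. A minimal sketch, assuming the sparsehash headers of that era; the sentinel value -1 is an arbitrary illustrative choice:

    #include <google/dense_hash_set>

    int main()
    {
        google::dense_hash_set<int> s;
        s.set_empty_key(-1);  // required before any insert; -1 can then
                              // never be stored as a real element
        s.insert(42);
        return s.count(42) == 1 ? 0 : 1;  // exit status 0 on success
    }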
Hello Maxim,

gd_hash is expected to be faster than other hash containers, but in general it occupies more memory -- only that your test is particularly favorable to gd_hash. Let me explain:

For a regular, bucket-based singly-linked hashed container, the memory consumption for n elements can be calculated as (assuming a 32-bit architecture where pointers are 4 bytes):

  R(n) = n * (sizeof(element) + 4*(1 + 1/lf))

where lf is the load factor, typically oscillating between 0.5 and 1. Setting n = 99997, element = unsigned int, lf = 0.75, we get

  R = 99997 * (4 + 4*(1 + 1/0.75)) = 1333293

which lies between the actual values for std_hash and ext_hash/mi_hash.

Now, for a Google dense hash set the memory consumption is given by

  D(n) = 2^ceil(1 + log2 n) * sizeof(element)

where ^ denotes exponentiation, ceil(x) is the smallest integer not less than x, and log2 is the binary logarithm. With the same settings as above we get

  D = 2^ceil(1 + log2 99997) * 4 = 2^ceil(1 + 16.61) * 4 = 2^18 * 4 = 1048576

that is, exactly what your figures show.

Why is your test favorable to gd_hash? Because sizeof(element) is small; if you had used a not-too-large class having, say, four int-sized members, i.e. sizeof(element) = 4*4 = 16, the resulting numbers change a lot:

  R = 99997 * (16 + 4*(1 + 1/0.75)) = 2533257

  D = 2^ceil(1 + log2 99997) * 16 = 2^18 * 16 = 4194304

That is, R(n) grows as n*(C + sizeof(element)) with C a constant, while D(n) has the form K*n*sizeof(element) with K always >= 2. Hence, when sizeof(element) is sufficiently large, gd_hash takes nearly twice the space of a regular hash container.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
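For readers who want to check the arithmetic, the two formulas can be evaluated with a small standalone program; this is only a transcription of the equations above, not code from any of the libraries discussed:

    #include <cmath>
    #include <cstdio>

    // R(n): bucket-based singly-linked hashed container, 32-bit
    // pointers: payload plus a 4-byte node pointer plus 4 bytes of
    // bucket array amortized over the load factor lf.
    double R(double n, double elem_size, double lf)
    {
        return n * (elem_size + 4.0 * (1.0 + 1.0 / lf));
    }

    // D(n): Google dense hash set: a table of 2^ceil(1 + log2 n)
    // slots (capacity at least twice n), each holding one element.
    double D(double n, double elem_size)
    {
        return std::pow(2.0, std::ceil(1.0 + std::log(n) / std::log(2.0)))
               * elem_size;
    }

    int main()
    {
        std::printf("R(99997, 4-byte)  = %.0f bytes\n", R(99997, 4, 0.75));  // ~1333293
        std::printf("D(99997, 4-byte)  = %.0f bytes\n", D(99997, 4));        // 1048576
        std::printf("R(99997, 16-byte) = %.0f bytes\n", R(99997, 16, 0.75)); // ~2533257
        std::printf("D(99997, 16-byte) = %.0f bytes\n", D(99997, 16));       // 4194304
        return 0;
    }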

Surprisingly google::dense_hash_set often outperforms all other containers in both speed and memory usage.
[...]
Hello Maxim, gd_hash is expected to be faster than other hash containers, but in general it occupies more memory -- only that your test is particularly favorable to gd_hash, ...
Very interesting results (and explanation). Thanks.

Recently, when I write in C++ instead of a quick-and-easy scripting language, it is often because I need a lookup table (i.e. a hash map) with millions of entries, where performance really starts to matter, so seeing where each solution is better or worse is very useful to me.

Darren

JOAQUIN LOPEZ MUÑOZ wrote:
Hello Maxim, gd_hash is expected to be faster than other hash containers, but in general it occupies more memory -- only that your test is particularly favorable to gd_hash, let me explain:
[...]
That is, R(n) grows as n*(C + sizeof(element)) with C a constant, while D(n) has the form K*n*sizeof(element) with K always >= 2. Hence, when sizeof(element) is sufficiently large, gd_hash takes nearly twice the space of a regular hash container.
Thank you for the illuminating explanation.
participants (3)
- Darren Cook
- JOAQUIN LOPEZ MUÑOZ
- Maxim Yegorushkin