Re: [boost] [multi_index] benchmarking

29 May 2006

      Maxim Yegorushkin ha escrito:
...
JOAQUIN LOPEZ MU?Z wrote:
[...]
...
...
If so,
I cannot find a reason why mi_set performs in general
sensibly better than std_set, since the data structures
and algorithms are basically equivalent, and B.MI
introduces some metaprogramming overhead that the
optimizer usually doesn't clean away entirely. I'd
have hoped for a small decrease in performance
for B.MI, though not as high as that observed in
mi_hash wrt std_hash. Do you have an idea anout this?
The attached results show that B.MI is faster on insert and erase, but slightly
slower on find hit/miss than std::set. My guesswork is that std::set probably
does more(?) balancing on insert/erase.
I doubt that. The basic RB-tree algorithms are based on SGI STL
both in the case of B.MI and libstdc++. I might be wrong, of course,
and libstdc++ might have changed a lot wrt to its SGI ancestor.

[...]
...
...
I must say that B.MI does not intend to outperform
single-index std containers since in general the
data structures are equivalent; it is when
multi-indexing comes into play that I expect a
large impact in favor of B.MI.
I see nothing what could prevent B.MI from being as fast as the
standard containers.
The metaprogramming scaffolding constructed by B.MI with the
aim of allowing multi-indexing is not always completely optimized
away by the compiler, basically, most operations go through a lot more
(ideally inlined) functions. GCC is traditionally poorer at optimizing
than say MSVC or ICC. But the range of degradation I expect is
0-10%, not as high as the 250+% on insert that your tests show for
mi_hash...

...which has gotten me examining the test to determine what's going
weird. I think I found it: your insert test consists of invocations of
the form

  h.insert(begin, end);

for which std_hash (and presumably ext_hash) are smart enough to
pre-rehash based on n=std::distance(begin,end) when this distance's
calculation is affordable. B.MI hashed indices don't do that, so the
same call incurs several rehashing ops from size()=0 till size()=n.
Could you do me a favor? Please replace your local copy of
hashed indices range insert member function (at line 249 of
the RC_1_34_0 version of boost/multi_index/hashed_index.hpp):

  template<typename InputIterator>
  void insert(InputIterator first,InputIterator last)
  {
    BOOST_MULTI_INDEX_HASHED_INDEX_CHECK_INVARIANT;
    iterator hint=end();
    for(;first!=last;++first)hint=insert(hint,*first);
  }

with the following:

  template<typename InputIterator>
  void insert(InputIterator first,InputIterator last)
  {
    BOOST_MULTI_INDEX_HASHED_INDEX_CHECK_INVARIANT;

    // WARNING: following line only for testing purposes.
    reserve(size()+std::distance(first,last));

    iterator hint=end();
    for(;first!=last;++first)hint=insert(hint,*first);
  }

and rerun your tests? If my analysis is correct, figures for
insert(mi_hash) should be then on par with or only slightly worse
than std_hash and ext_hash. If this hunch proves right, it is certainly
an opportunity for huge performance improvements and I will
deal with it (in a proper way, the patch above  is only a hack meant
for testing purposes).

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

Re: [boost] [multi_index] benchmarking

Joaquín Mª López Muñoz