New subject: [MultiIndex] re-indexing on load

28 Apr 2008

      Hi Joaquin.
...
As you have found out, loading involves some amount of reindexing,
though the algorithm is designed to minimize this, let me summarize
how serialization of multi_index_containers is performed:
Saving a multi_index_container internally consists in:
1. Saving each of the elements, in the order they are traversed
with index #0.
2. For each index other than #0, some "readjusting" info is saved if
necessary. More on this later.
Loading on the other hand reduces to:
1. Loading each element and inserting them with hinted insertion
at the end of the container: for index #0, hinting the end() position
reduces the nomber of key comparisons performed: for the rest
of key based indices, hinting end() usually does not have any impact
and insertion takes as many comparison ops as a regular insertion.
2. After step 1 the elements in index #0 are listed in the exact
order they were originally saved, as it should be. As for the
rest of the indices, they can be differences in traversal order
for non key-based indices (sequenced and random_access)
and for ordered non-unique indices (equivalent elements will be
adjacent as ordered non-unique indices guarantee, but they might
be shuffled with respect to the original situation). The "adjusting"
info stored in saving step 2 is used here to restore the exact
original traversing order for every index of the container.
So, to try to minimize your loading time you'd need to first make
a guess about where the most overhead lies: is it in step 1
or step 2? Step 2 can only be significant if any of your indices
*other than #0* is of type ordered_non_unique, sequenced
or random_access. Assuming you indeed have a situation where
step 2 causes some impact, you can:
a) if affordable, move the problematic index to position #0, so that it
won't get any adjustment later.
b) replace Boost.MultiIndex built-in serialization with the a handmade
method so as to avoid step 2 altogether. Please find attached a sketchy
implementation example of this idea. Note that this method does
not guarantee exact traversal order replication for indices other
than #0.
In any case, do not expect spectacular improvements after applying
any of the approaches above, as more likely most of the overhead
comes from step 1 anyway. A more radical solution, but also one
with a very high impact in your application code, is to put the
multi_index_container in a memory-mapped file, so that serialization
reduces to dumping the memory segment. You can find an example
of this technique at:
http://www.boost.org/doc/libs/1_35_0/libs/multi_index/doc/examples.html#exam...
Thanks for the explanation and ideas.

Based on the information you provided, I guessed that moving from hashed
indices (where the hash function is relatively expensive) to
sorted_non_unique (with a cheap comparison function) would improve
things; and indeed that change has got the load hugely quicker.

I then tried the 'poor man serialization' example that you attached.  As
you suspected, the performance effect of this was pretty small and so I
guess it is indeed step 1 that is causing the issue (I have several
ordered_non_uniques).  I do like the memory mapped file idea, but as you
say, the impact on the code is large; I think too large for the time
being, though it may be sometime I come back to if needs be in the
future.

I think, for the time being at least, the performance is ok now (using
ordered rather than hashed indices).  
Is there any chance that future versions of MI might be able to store
the hashes/complete ordering whilst serializing, or is that Just Not
Possible (TM)?

Thanks again,
Mark

Re: [Boost-users] [MultiIndex] re-indexing on load

Mark Westcott

joaquin＠tid.es

tags

participants (2)