
Hello Mark, ________________________________________ De: boost-users-bounces@lists.boost.org [boost-users-bounces@lists.boost.org] En nombre de Mark Westcott [mark-boost@befishy.co.uk] Enviado el: viernes, 25 de abril de 2008 18:29 Para: boost-users@lists.boost.org Asunto: [Boost-users] [MultiIndex] re-indexing on load
Hi, Loading a multi_index_container from an archive seems to be involving a re-index of that archive. I'm setting breakpoints on my key extract functions, and they're getting called when i load the container from the archive.
I would have thought that the serialization would mimic the copy constructor in this respect, and not do any re-indexing?
This is causing me significant performance problems at the moment. Is there any workaround?
As you have found out, loading involves some amount of reindexing, though the algorithm is designed to minimize this, let me summarize how serialization of multi_index_containers is performed: Saving a multi_index_container internally consists in: 1. Saving each of the elements, in the order they are traversed with index #0. 2. For each index other than #0, some "readjusting" info is saved if necessary. More on this later. Loading on the other hand reduces to: 1. Loading each element and inserting them with hinted insertion at the end of the container: for index #0, hinting the end() position reduces the nomber of key comparisons performed: for the rest of key based indices, hinting end() usually does not have any impact and insertion takes as many comparison ops as a regular insertion. 2. After step 1 the elements in index #0 are listed in the exact order they were originally saved, as it should be. As for the rest of the indices, they can be differences in traversal order for non key-based indices (sequenced and random_access) and for ordered non-unique indices (equivalent elements will be adjacent as ordered non-unique indices guarantee, but they might be shuffled with respect to the original situation). The "adjusting" info stored in saving step 2 is used here to restore the exact original traversing order for every index of the container. So, to try to minimize your loading time you'd need to first make a guess about where the most overhead lies: is it in step 1 or step 2? Step 2 can only be significant if any of your indices *other than #0* is of type ordered_non_unique, sequenced or random_access. Assuming you indeed have a situation where step 2 causes some impact, you can: a) if affordable, move the problematic index to position #0, so that it won't get any adjustment later. b) replace Boost.MultiIndex built-in serialization with the a handmade method so as to avoid step 2 altogether. Please find attached a sketchy implementation example of this idea. Note that this method does not guarantee exact traversal order replication for indices other than #0. In any case, do not expect spectacular improvements after applying any of the approaches above, as more likely most of the overhead comes from step 1 anyway. A more radical solution, but also one with a very high impact in your application code, is to put the multi_index_container in a memory-mapped file, so that serialization reduces to dumping the memory segment. You can find an example of this technique at: http://www.boost.org/doc/libs/1_35_0/libs/multi_index/doc/examples.html#exam... I'm sorry I can't be more helpful. Please report back whether any of this can be of some use to you. Thanks for using Boost.MultiIndex, Joaquín M López Muñoz Telefónica, Investigación y Desarrollo