Please Help: Experienced very bad performance with Multi-index

Below is a sample program that I am running. If I create a multi-index as defined below, the performance as the number of entries increases gets to the point of being unusable. Examples: Num Entries Time to insert last entry Time to insert all entries 100 452 usecs 30.4 millisecs 1000 4.6 millisecs 2.4 seconds 10000 52.2 millisecs 266.92 seconds (over 4.5 minutes) This performance does not seem to be correct to me. Deletions from a multiindex based on a key, were taking super long as well (not shown in the below program). To delete all 1000 entries based on being below a certain ordered value was taking over 2 minutes. Is there something that is wrong with the below program that is causing these issues? Thanks for any help. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Cut Here XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX #if !defined(NDEBUG) #define BOOST_MULTI_INDEX_ENABLE_INVARIANT_CHECKING #define BOOST_MULTI_INDEX_ENABLE_SAFE_MODE #endif #include <boost/multi_index_container.hpp> #include <boost/multi_index/member.hpp> #include <boost/multi_index/ordered_index.hpp> #include <boost/multi_index/hashed_index.hpp> #include <algorithm> #include <iostream> #include <iterator> #include <string> #include <sys/time.h> using boost::multi_index_container; using namespace boost::multi_index; using namespace std; struct ErRecord { int value1; std::string value2; int value3; int value4; ErRecord() : value1(0), value2(""), value3(0), value4(0) { } ErRecord(int _value1, const std::string& _value2, const int _value3, int _value4) : value1(_value1), value2(_value2), value3(_value3), value4(_value4) { } }; //@@== Tags struct value1 {}; struct value2 {}; struct value4 {}; struct value3 {}; typedef multi_index_container < ErRecord, indexed_by < ordered_non_unique < tag<value1>, BOOST_MULTI_INDEX_MEMBER(ErRecord, int, value1) >, ordered_non_unique < tag<value2>, BOOST_MULTI_INDEX_MEMBER(ErRecord, std::string, value2) >, ordered_non_unique < tag<value3>, BOOST_MULTI_INDEX_MEMBER(ErRecord, int, value3) >, ordered_non_unique < tag<value4>, BOOST_MULTI_INDEX_MEMBER(ErRecord, int, value4) >
ErRecordMap;
int main(int argc, char* argv[]) { int numloops = 100; if (argc > 1) numloops = atoi(argv[1]); ErRecordMap em; char buf[50]; int i = 0; int sec; int usec; unsigned long long u64; struct timeval tv_start, tv_end; struct timeval tv_before, tv_after; gettimeofday( &tv_start, 0 ); while (1) { sprintf(buf, "value2-%d", i); gettimeofday( &tv_before, 0 ); em.insert(ErRecord(i, string(buf), i, i)); gettimeofday( &tv_after, 0 ); if( i == numloops ) { sec = tv_after.tv_sec - tv_before.tv_sec; usec = tv_after.tv_usec - tv_before.tv_usec; if (usec < 0) { sec--; usec += 1000000; } u64 = sec * 1000000 + usec; cout << "Time to insert last entry: " << sec << ":" << usec << endl; } ++i; if( i > numloops ) { break; } } gettimeofday( &tv_end, 0 ); sec = tv_end.tv_sec - tv_start.tv_sec; usec = tv_end.tv_usec - tv_start.tv_usec; if (usec < 0) { sec--; usec += 1000000; } u64 = sec * 1000000 + usec; cout << "total: " << numloops << "," << sec << ":" << usec << endl; cout << "average: " << numloops << "," << u64/numloops << "usec" << endl; return 0; } XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Cut Here XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Hi Robert, Jarolin, Robert <Robert.Jarolin <at> trueposition.com> writes:
Below is a sample program that I am running. If I create a multi-index as defined below, the performance as the number of entries increases gets to the point of being unusable.
[...]
#if !defined(NDEBUG) #define BOOST_MULTI_INDEX_ENABLE_INVARIANT_CHECKING #define BOOST_MULTI_INDEX_ENABLE_SAFE_MODE #endif
I strongly suspect this is the problem. Are you doing your performance tests in debug mode? If so, invariant and safe mode (specially the former) have a huge impact on performance. They shouldn't be used in production, release mode. Try either not using these modes or, better yet, testing with release settings, which in your particular case implies disabling these modes as well. Please report back. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Thank you. Removing those seemed to help significantly. I am now having issues with removal up to an upper_bound value. You mentioned "testing with release settings". How do I enable "release settings"? Thanks -----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Joaquin M Lopez Munoz Sent: Wednesday, July 29, 2009 4:58 PM To: boost@lists.boost.org Subject: Re: [boost]Please Help: Experienced very bad performance with Multi-index Hi Robert, Jarolin, Robert <Robert.Jarolin <at> trueposition.com> writes:
Below is a sample program that I am running. If I create a multi-index as defined below, the performance as the number of entries increases gets to the point of being unusable.
[...]
#if !defined(NDEBUG) #define BOOST_MULTI_INDEX_ENABLE_INVARIANT_CHECKING #define BOOST_MULTI_INDEX_ENABLE_SAFE_MODE #endif
I strongly suspect this is the problem. Are you doing your performance tests in debug mode? If so, invariant and safe mode (specially the former) have a huge impact on performance. They shouldn't be used in production, release mode. Try either not using these modes or, better yet, testing with release settings, which in your particular case implies disabling these modes as well. Please report back. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Actually, I found the issue. I will try to explain it as follows: Assume that I have data that needs to be hashed by 4 different keys (key1, key2, key3, key4). If there are situations where key3 and key4 are non unique and I make them the same value for every entry, when I remove an entry from my table that has a lot of entries (ex. 10000), it may take seconds to for the removal to take place. Is there a way to improve on this? Something like be able to tell the multiindex to not always hash a key if the value is not set (example, if integer, value of -1 means not set)? Thanks for any help. -----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Steven Watanabe Sent: Thursday, July 30, 2009 3:10 PM To: boost@lists.boost.org Subject: Re: [boost] Please Help: Experienced very bad performance withMulti-index AMDG Jarolin, Robert wrote:
How do I enable "release settings"?
That depends on your compiler and build system. In Christ, Steven Watanabe _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Jarolin, Robert <Robert.Jarolin <at> trueposition.com> writes:
Actually, I found the issue. I will try to explain it as follows:
Assume that I have data that needs to be hashed by 4 different keys (key1, key2, key3, key4). If there are situations where key3 and key4 are non unique and I make them the same value for every entry, when I remove an entry from my table that has a lot of entries (ex. 10000), it may take seconds to for the removal to take place.
Hashed non-unique indices have this problem, when deleting an entry with a much repeated key performance can be linear. On the other hand, several *seconds* to delete an entry on a container with tens of thousands of entries is *way* too long even taking this problem into account. Which compiler/ environment are you using? Did you manage to find how to activate release settings?
Is there a way to improve on this? Something like be able to tell the multiindex to not always hash a key if the value is not set (example, if integer, value of -1 means not set)?
I see a number of possibilities (once you've ruled out the debug vs. release issue, which I strongly suggest you take a look at before going any further): 1. Replace the indices for key3 and key4 with ordered indices. This is a trade-off, since some oeprations will get slower (lookup), so you'll have to measure and decide. 2. If I understood you right, you don't always care about the key3 or key4 value of a given element. When this is the case, set them to a random value to avoid repetitions. 3. This is drastic, but a colleague recently modified the lib source code so as to turn singly-linked hashed indices into doubly-linked indices, which don't have the deletion problem described above. Details can be found at http://lists.boost.org/Archives/boost/2009/04/151184.php (read the whole thread). HTH, Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Joaquin, Thank you for the help you are providing. To answer your first question, I am still uncertain how to put the boost library into a release mode for the usage, though I no longer have those #define statements that caused the excessive debug overhead. I happened to go a route similar to what you stated in 2 below. I implemented a static counter that is incremented then used to be put in place for the empty fields, so that they can be more uniquely hashed. This has changed the performance from taking around 93 seconds to remove 100 entries from a list of 20,000 to around 1-2 milliseconds. I only wish there was a way to tell the boost multiindex to not create a hash on the item if it is a certain defined default value (ex. Empty string, -1, ...) As for choice 3, I do not want to modify the boost library myself, due to the overhead of having to maintain those changes for future upgrades. Again, thanks for your help. -----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Joaquin M Lopez Munoz Sent: Thursday, July 30, 2009 4:06 PM To: boost@lists.boost.org Subject: Re: [boost] Please Help: Experienced very bad performance withMulti-index Jarolin, Robert <Robert.Jarolin <at> trueposition.com> writes:
Actually, I found the issue. I will try to explain it as follows:
Assume that I have data that needs to be hashed by 4 different keys (key1, key2, key3, key4). If there are situations where key3 and key4 are non unique and I make them the same value for every entry, when I remove an entry from my table that has a lot of entries (ex. 10000), it may take seconds to for the removal to take place.
Hashed non-unique indices have this problem, when deleting an entry with a much repeated key performance can be linear. On the other hand, several *seconds* to delete an entry on a container with tens of thousands of entries is *way* too long even taking this problem into account. Which compiler/ environment are you using? Did you manage to find how to activate release settings?
Is there a way to improve on this? Something like be able to tell the multiindex to not always hash a key if the value is not set (example, if integer, value of -1 means not set)?
I see a number of possibilities (once you've ruled out the debug vs. release issue, which I strongly suggest you take a look at before going any further): 1. Replace the indices for key3 and key4 with ordered indices. This is a trade-off, since some oeprations will get slower (lookup), so you'll have to measure and decide. 2. If I understood you right, you don't always care about the key3 or key4 value of a given element. When this is the case, set them to a random value to avoid repetitions. 3. This is drastic, but a colleague recently modified the lib source code so as to turn singly-linked hashed indices into doubly-linked indices, which don't have the deletion problem described above. Details can be found at http://lists.boost.org/Archives/boost/2009/04/151184.php (read the whole thread). HTH, Joaquín M López Muñoz Telefónica, Investigación y Desarrollo _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Jarolin, Robert <Robert.Jarolin <at> trueposition.com> writes:
Joaquin, Thank you for the help you are providing. To answer your first question, I am still uncertain how to put the boost library into a release mode for the usage, though I no longer have those #define statements that caused the excessive debug overhead.
It is not about putting the *Boost library* into release mode, but about compiling your code with release settings, usually implying that the optimizer is used among several other settings oriented toards making the resulting binary code faster and lighter. Which environment are you using?
I happened to go a route similar to what you stated in 2 below[...] I only wish there was a way to tell the boost multiindex to not create a hash on the item if it is a certain defined default value (ex. Empty string, -1, ...)
Some hash value has to be associated, as the element is to be inserted anywhere into the index. Your mechanism is a good compromise. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Ok. I thought there was some options to specify for boost that removed any and all non-release related settings. If not, then we are probably fine with our current test and release based settings. I was hoping that there was some ability to specify your own hashing function to be called for each hash. Add to that, I was hoping that the supplied hash function was given the ability to return some BOOST defined value like DO_NOT_HASH that would tell boost not to hash that value in the entry when building its table info. That would be ideal, but if that (or something similar) is not possible, then I guess my compromise is the way for me to go. Again, thanks. -----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Joaquin M Lopez Munoz Sent: Friday, July 31, 2009 12:49 PM To: boost@lists.boost.org Subject: Re: [boost] Please Help: Experienced very bad performance withMulti-index Jarolin, Robert <Robert.Jarolin <at> trueposition.com> writes:
Joaquin, Thank you for the help you are providing. To answer your first question, I am still uncertain how to put the boost library into a release mode for the usage, though I no longer have those #define statements that caused the excessive debug overhead.
It is not about putting the *Boost library* into release mode, but about compiling your code with release settings, usually implying that the optimizer is used among several other settings oriented toards making the resulting binary code faster and lighter. Which environment are you using?
I happened to go a route similar to what you stated in 2 below[...] I only wish there was a way to tell the boost multiindex to not create a hash on the item if it is a certain defined default value (ex. Empty string, -1, ...)
Some hash value has to be associated, as the element is to be inserted anywhere into the index. Your mechanism is a good compromise. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Jarolin, Robert <Robert.Jarolin <at> trueposition.com> writes:
Ok. I thought there was some options to specify for boost that removed any and all non-release related settings. If not, then we are probably fine with our current test and release based settings.
OK. I still think the times you were getting were way too long even considering the duplicate key problem, but anyway.
I was hoping that there was some ability to specify your own hashing function to be called for each hash.
Yes, you can do that by passing a custom hash predicate in the speciication of the hashed index, as explained at http://www.boost.org/libs/multi_index/doc/tutorial/indices.html#hash_spec But even so the hash predicate is always expected to return a meaningful hash code for each value passed, so what you've got now is probably as good as you can get. Good luck with your project, Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
participants (3)
-
Jarolin, Robert
-
Joaquin M Lopez Munoz
-
Steven Watanabe