complexity problem using multi_index insert()
Hello,
this is my third e-mail regarding multi_index containers.
I'm using them for the process of clustering a huge amount of data (posts
from an aple blog from the last 4-5 years)
I'm counting Euclidean distance between two clusters, which I store in a
multi_index container like the above
*typedef multi_index_container<
cluster_d,
indexed_by<
ordered_unique<
tag
cluster_index;*
My problem is that I have to count 9500! (! = factorial) distances, and
store them in the multi_index container.
The process of counting distances is costing me 13minutes ( that is because
the dimension of each of the two vectors is nearly 1000 or more for which i
have to access STL's maps).
Another 9 minutes takes the insertion using "std::pair
Andrew Kokkalis
Hello,this is my third e-mail regarding multi_index containers.
You're welcome, come here for help as frequently as you need.
I'm using them for the process of clustering a huge amount of data (posts from an aple blog from the last 4-5 years)I'm counting Euclidean distance between two clusters, which I store in a multi_index container like the above [...] My problem is that I have to count 9500! (! = factorial) distances, and store them in the multi_index container.
I think I'm misunderstanding this: do you mean your container is holding 9500! elements? As 9500! ~ 9*10^33664 this can't possibly be the case. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
I'm using them for the process of clustering a huge amount of data (posts from an aple blog from the last 4-5 years)I'm counting Euclidean distance between two clusters, which I store in a multi_index container like the above [...] My problem is that I have to count 9500! (! = factorial) distances, and store them in the multi_index container.
I think I'm misunderstanding this: do you mean your container is holding 9500! elements? As 9500! ~ 9*10^33664 this can't possibly be the case.
Is it impossible for a container such as the above to hold that much elements? during insertion ram is over 2.4 gb (only for this program). I'm running now the program with different parameters and see the number of elements that are stored.
2010/2/28 Andrew Kokkalis
I'm using them for the process of clustering a huge amount of data
(posts from an aple blog from the last 4-5 years)I'm counting Euclidean distance between two clusters, which I store in a multi_index container like the above [...] My problem is that I have to count 9500! (! = factorial) distances, and store them in the multi_index container.
I think I'm misunderstanding this: do you mean your container is holding 9500! elements? As 9500! ~ 9*10^33664 this can't possibly be the case.
Is it impossible for a container such as the above to hold that much elements? during insertion ram is over 2.4 gb (only for this program).
I'm running now the program with different parameters and see the number of elements that are stored.
Obviously it is not factorial! Once again i was wrong and fool!
The elements in my container are 46,238,536 (n^2 - n ) / 2 so complexity
for storage is O(n^2)
And the problem is that inserting those few elements into the multi_index
container takes almost 9 minutes.
As I said before I'm using "std::pair
Andrew Kokkalis
Joaquín M López writes:
I think I'm misunderstanding this: do you mean your container is holding 9500! elements? As 9500! ~ 9*10^33664 this can't possibly be the case.
Obviously it is not factorial! Once again i was wrong and fool The elements in my container are 46,238,536 (n^2 - n ) / 2 so complexity for storage is O(n^2)And the problem is that inserting those few elements into the multi_index container takes almost 9 minutes. As I said before I'm using "std::pair
insert(const value_type& x);".Is there a faster way to insert?
I take it you're inserting the elements in some sort
of double loop like this, right?
for(int i=0;i
And the problem is that inserting those few elements into the multi_index container takes almost 9 minutes. <...>
In addition to Joaquín's advice: ensure your benchmarks are done in "release mode", i.e. with full optimization. Besides, if you compile in MSVC and stl containers are somehow related to this operation, you might also want to disabe so called "secured stl", which has enormous impact on the performance: #define _SECURE_SCL 0
And the problem is that inserting those few elements into the multi_index container takes almost 9 minutes. <...>
In addition to Joaquín's advice: ensure your benchmarks are done in "release mode", i.e. with full optimization. Besides, if you compile in MSVC and stl containers are somehow related to this operation, you might also want to disabe so called "secured stl", which has enormous impact on the performance: #define _SECURE_SCL 0
Thank you very much!This helped a lot!
Regarding _SECURE_SCL we are using ubuntu linux and Eclipse for C++ so i guess the above won't work. Is there anything similar for eclipse? While googling i found many links from the archive of this mailing list, but nothing relevant. Now from 22 minutes I'm down to 5! I'm already very pleased but I would like to become greedy!
________________________________________ De: boost-users-bounces@lists.boost.org [boost-users-bounces@lists.boost.org] En nombre de Igor R [boost.lists@gmail.com] Enviado el: domingo, 28 de febrero de 2010 15:36 Para: boost-users@lists.boost.org Asunto: Re: [Boost-users] complexity problem using multi_index insert()
Regarding _SECURE_SCL we are using ubuntu linux and Eclipse for C++ so i guess the above won't work.
No, _SECURE_SCL is a feature of STL coming with MSVC, it's not relevant for gcc/linux. So all you can do is to use the hinted insert() and compile with -O3.
Andrew, have you implemented both measures (hinted insert and -O3) or just the latter? Have you measured the inmpact on performance of each of them? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
Andrew, have you implemented both measures (hinted insert and -O3) or just the latter? Have you measured the inmpact on performance of each of them?
yes I'm using both. hinted insert made it faster by 3-4 minutes while both of them made it faster by 14 minutes Previously, I was compiling it with -g3
Andrew Kokkalis
Andrew, have you implemented both measures (hinted insert and -O3) or just the latter? Have you measured the inmpact on performance of each of them?
yes I'm using both.hinted insert made it faster by 3-4 minutes while both of them made it faster by 14 minutes Previously, I was compiling it with -g3
Being yours an allocation intensive app, you might consider using fast malloc replacements, like for instance Google's TCMalloc: http://goog-perftools.sourceforge.net/doc/tcmalloc.html http://code.google.com/p/google-perftools/ Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
participants (4)
-
Andrew Kokkalis
-
Igor R
-
Joaquin M Lopez Munoz
-
JOAQUIN M. LOPEZ MUÑOZ