complexity problem using multi_index insert()

older
[units] conversion_factor in the...

Andrew Kokkalis

27 Feb 2010 27 Feb '10

7:39 p.m.

Hello, this is my third e-mail regarding multi_index containers. I'm using them for the process of clustering a huge amount of data (posts from an aple blog from the last 4-5 years) I'm counting Euclidean distance between two clusters, which I store in a multi_index container like the above *typedef multi_index_container< cluster_d, indexed_by< ordered_unique< tag<cluster_pair>, composite_key< cluster_d, BOOST_MULTI_INDEX_MEMBER(cluster_d, int, cluster1), BOOST_MULTI_INDEX_MEMBER(cluster_d, int, cluster2)> >, ordered_non_unique< tag<cluster_distance>, BOOST_MULTI_INDEX_MEMBER(cluster_d, float, distance)> >

...

cluster_index;*

My problem is that I have to count 9500! (! = factorial) distances, and store them in the multi_index container. The process of counting distances is costing me 13minutes ( that is because the dimension of each of the two vectors is nearly 1000 or more for which i have to access STL's maps). Another 9 minutes takes the insertion using "std::pair<iterator,bool> insert(const value_type& x);". I would like to know if there is another way to do faster insertion. I've seen you use mailing list for much complicated problems, and I'm sorry for being a noob.But since i chose to use boost I'd like to stick with that, so I'm kindly asking for your help! Thank you in advance Andrew Kokkalis

Attachments:

attachment.html (text/html — 1.6 KB)

Show replies by date

Joaquin M Lopez Munoz

27 Feb 27 Feb

8:14 p.m.

New subject: [Boost-users] complexity problem using multi_index insert()

Andrew Kokkalis <andrewkokkalis <at> gmail.com> writes:

...

Hello,this is my third e-mail regarding multi_index containers.

You're welcome, come here for help as frequently as you need.

...

I'm using them for the process of clustering a huge amount of data (posts from an aple blog from the last 4-5 years)I'm counting Euclidean distance between two clusters, which I store in a multi_index container like the above [...] My problem is that I have to count 9500! (! = factorial) distances, and store them in the multi_index container.

I think I'm misunderstanding this: do you mean your container is holding 9500! elements? As 9500! ~ 9*10^33664 this can't possibly be the case. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Andrew Kokkalis

28 Feb 28 Feb

9:59 a.m.

...

...
I'm using them for the process of clustering a huge amount of data (posts from an aple blog from the last 4-5 years)I'm counting Euclidean distance between two clusters, which I store in a multi_index container like the above [...] My problem is that I have to count 9500! (! = factorial) distances, and store them in the multi_index container.

I think I'm misunderstanding this: do you mean your container is holding 9500! elements? As 9500! ~ 9*10^33664 this can't possibly be the case.

Is it impossible for a container such as the above to hold that much elements? during insertion ram is over 2.4 gb (only for this program). I'm running now the program with different parameters and see the number of elements that are stored.

Andrew Kokkalis

10:52 a.m.

2010/2/28 Andrew Kokkalis <andrewkokkalis@gmail.com>

...

...
I'm using them for the process of clustering a huge amount of data

...
(posts from an aple blog from the last 4-5 years)I'm counting Euclidean distance between two clusters, which I store in a multi_index container like the above [...] My problem is that I have to count 9500! (! = factorial) distances, and store them in the multi_index container.

I think I'm misunderstanding this: do you mean your container is holding 9500! elements? As 9500! ~ 9*10^33664 this can't possibly be the case.

Is it impossible for a container such as the above to hold that much elements? during insertion ram is over 2.4 gb (only for this program).

I'm running now the program with different parameters and see the number of elements that are stored.

Obviously it is not factorial! Once again i was wrong and fool! The elements in my container are 46,238,536 (n^2 - n ) / 2 so complexity for storage is O(n^2) And the problem is that inserting those few elements into the multi_index container takes almost 9 minutes. As I said before I'm using "std::pair<iterator,bool> insert(const value_type& x);". Is there a faster way to insert?

Joaquin M Lopez Munoz

11:46 a.m.

New subject: [Boost-users] complexity problem using multi_index insert()

Andrew Kokkalis <andrewkokkalis <at> gmail.com> writes:

...

Joaquín M López writes:

...
I think I'm misunderstanding this: do you mean your container is holding 9500! elements? As 9500! ~ 9*10^33664 this can't possibly be the case.

Obviously it is not factorial! Once again i was wrong and fool The elements in my container are 46,238,536 (n^2 - n ) / 2 so complexity for storage is O(n^2)And the problem is that inserting those few elements into the multi_index container takes almost 9 minutes. As I said before I'm using "std::pair<iterator,bool> insert(const value_type& x);".Is there a faster way to insert?

I take it you're inserting the elements in some sort of double loop like this, right? for(int i=0;i<cluster1_elements;++i){ for(int j=0;j<cluster2_elements;++j){ container.insert(cluster_d(i,j,distance(i,j))); } } If so, note that the elements are inserted in the same order as maintained by your first index, so you can take advantage of hinted insertion: container.insert(cluster_d(i,j,distance(i,j)),container.end()); If the double loop is set up the other way around, i.e. first cluster2 then cluster1, please reverse it. This should reduce insertion times by a noticeable amount (I dare say 25% or so, but you'll be able to report back when you try it). Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Igor R

1:14 p.m.

...

And the problem is that inserting those few elements into the multi_index container takes almost 9 minutes. <...>

In addition to Joaquín's advice: ensure your benchmarks are done in "release mode", i.e. with full optimization. Besides, if you compile in MSVC and stl containers are somehow related to this operation, you might also want to disabe so called "secured stl", which has enormous impact on the performance: #define _SECURE_SCL 0

Andrew Kokkalis

2:26 p.m.

...

...
And the problem is that inserting those few elements into the multi_index container takes almost 9 minutes. <...>

In addition to Joaquín's advice: ensure your benchmarks are done in "release mode", i.e. with full optimization. Besides, if you compile in MSVC and stl containers are somehow related to this operation, you might also want to disabe so called "secured stl", which has enormous impact on the performance: #define _SECURE_SCL 0

Thank you very much!This helped a lot!

Regarding _SECURE_SCL we are using ubuntu linux and Eclipse for C++ so i guess the above won't work. Is there anything similar for eclipse? While googling i found many links from the archive of this mailing list, but nothing relevant. Now from 22 minutes I'm down to 5! I'm already very pleased but I would like to become greedy!

Igor R

2:36 p.m.

...

Regarding _SECURE_SCL we are using ubuntu linux and Eclipse for C++ so i guess the above won't work.

No, _SECURE_SCL is a feature of STL coming with MSVC, it's not relevant for gcc/linux. So all you can do is to use the hinted insert() and compile with -O3.

JOAQUIN M. LOPEZ MUÑOZ

2:41 p.m.

________________________________________ De: boost-users-bounces@lists.boost.org [boost-users-bounces@lists.boost.org] En nombre de Igor R [boost.lists@gmail.com] Enviado el: domingo, 28 de febrero de 2010 15:36 Para: boost-users@lists.boost.org Asunto: Re: [Boost-users] complexity problem using multi_index insert()

...

...
Regarding _SECURE_SCL we are using ubuntu linux and Eclipse for C++ so i guess the above won't work.

No, _SECURE_SCL is a feature of STL coming with MSVC, it's not relevant for gcc/linux. So all you can do is to use the hinted insert() and compile with -O3.

Andrew, have you implemented both measures (hinted insert and -O3) or just the latter? Have you measured the inmpact on performance of each of them? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Andrew Kokkalis

3:26 p.m.

...

Andrew, have you implemented both measures (hinted insert and -O3) or just the latter? Have you measured the inmpact on performance of each of them?

yes I'm using both. hinted insert made it faster by 3-4 minutes while both of them made it faster by 14 minutes Previously, I was compiling it with -g3

Joaquin M Lopez Munoz

3:42 p.m.

New subject: [Boost-users] complexity problem using multi_index insert()

Andrew Kokkalis <andrewkokkalis <at> gmail.com> writes:

...

...
Andrew, have you implemented both measures (hinted insert and -O3) or just the latter? Have you measured the inmpact on performance of each of them?

yes I'm using both.hinted insert made it faster by 3-4 minutes while both of them made it faster by 14 minutes Previously, I was compiling it with -g3

Being yours an allocation intensive app, you might consider using fast malloc replacements, like for instance Google's TCMalloc: http://goog-perftools.sourceforge.net/doc/tcmalloc.html http://code.google.com/p/google-perftools/ Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

5644

Age (days ago)

5645

Last active (days ago)

List overview

Download

10 comments

4 participants

participants (4)

Andrew Kokkalis
Igor R
Joaquin M Lopez Munoz
JOAQUIN M. LOPEZ MUÑOZ