Efficient implementation of associative_property_map
Dear Boost users, for an application in computational biology I am intendig to use the BGL to compute strongly connected components of a graph with about 600.000 nodes. In the BGL book on page 103 it is mentioned, that "The choice to use std::map to implement the property map is rather inefficient in this case...". Since my graph is large I'm a bit worried about this statement. Could anybody of you please give me a hint, what other datastructure I might use here, to make it efficient? all the best Martin -- ----------------------------------------------------------------------------- Martin Okrslar MPI for Molecular Genetics phone: ++ 49 + 30 / 8413-1166 Computational Molecular Biology Fax: ++ 49 + 30 / 8413-1152 Ihnestrasse 73 email: okrslar@molgen.mpg.de D-14195 Berlin URL: http://cmb.molgen.mpg.de -----------------------------------------------------------------------------
I'd think a hashed dictionary would be needed for such a large data set. There is a caviot though, weither or not a good hash function can be created. The good thing is that its easy to experiment with different STL associative containers, as long as you have SGI's STL or a derivative/ or good alternative. In SGI, the container is hash_map. Martin Okrslar wrote:
Dear Boost users,
for an application in computational biology I am intendig to use the BGL to compute strongly connected components of a graph with about 600.000 nodes.
In the BGL book on page 103 it is mentioned, that "The choice to use std::map to implement the property map is rather inefficient in this case...". Since my graph is large I'm a bit worried about this statement.
Could anybody of you please give me a hint, what other datastructure I might use here, to make it efficient?
all the best Martin
-- ----------------------------------------------------------------------------- Martin Okrslar MPI for Molecular Genetics phone: ++ 49 + 30 / 8413-1166 Computational Molecular Biology Fax: ++ 49 + 30 / 8413-1152 Ihnestrasse 73 email: okrslar@molgen.mpg.de D-14195 Berlin URL: http://cmb.molgen.mpg.de -----------------------------------------------------------------------------
*Yahoo! Groups Sponsor* ADVERTISEMENT http://rd.yahoo.com/M=246920.2960106.4328965.2848452/D=egroupweb/S=170500678...
Info: http://www.boost.org Wiki: http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl Unsubscribe: mailto:boost-users-unsubscribe@yahoogroups.com
Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service http://docs.yahoo.com/info/terms/.
--- In Boost-Users@yahoogroups.com, Jeff Holle
I'd think a hashed dictionary would be needed for such a large data set. There is a caviot though, weither or not a good hash function can be created.
The good thing is that its easy to experiment with different STL associative containers, as long as you have SGI's STL or a derivative/ or good alternative.
In SGI, the container is hash_map.
We don't have SGI STL around here. I've seen that before, but I hoped that there would be an other way than changing my STL version... It's amazing that there is no hashtable in the stdc++, isn't it? Martin
On Wed, 2003-03-05 at 13:50, okrslar wrote:
--- In Boost-Users@yahoogroups.com, Jeff Holle
wrote: I'd think a hashed dictionary would be needed for such a large data set. There is a caviot though, weither or not a good hash function can be created.
The good thing is that its easy to experiment with different STL associative containers, as long as you have SGI's STL or a derivative/ or good alternative.
In SGI, the container is hash_map.
We don't have SGI STL around here. I've seen that before, but I hoped that there would be an other way than changing my STL version... It's amazing that there is no hashtable in the stdc++, isn't it?
More or less... :) Hashed containers were introduced too late to make it into the standard IIRC, allthough most STL implementations have them and most STL books document them. I believe you can be pretty certain they'll make it into the next standard though :) I'd be interested to know which STL implementation it is you are using that don't have hashed containers? I would strongly consider switching to STLport for instance. Regards, -- Tarjei
Tarjei Knapstad wrote: [snip]
We don't have SGI STL around here. I've seen that before, but I hoped that there would be an other way than changing my STL version... It's amazing that there is no hashtable in the stdc++, isn't it?
More or less... :) Hashed containers were introduced too late to make it into the standard IIRC, allthough most STL implementations have them and most STL books document them.
I believe you can be pretty certain they'll make it into the next standard though :)
I'd be interested to know which STL implementation it is you are using that don't have hashed containers? I would strongly consider switching to STLport for instance.
I am what comes with gcc 3.2. Now that you told me, I had a look into our /package/gcc-3.2/linux/include/c++/3.2/ext/ directory, and there is it. Great! Thanks! Martin
Regards, -- Tarjei
Info: http://www.boost.org Wiki: http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl Unsubscribe: mailto:boost-users-unsubscribe@yahoogroups.com
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
-- ----------------------------------------------------------------------------- Martin Okrslar MPI for Molecular Genetics phone: ++ 49 + 30 / 8413-1166 Computational Molecular Biology Fax: ++ 49 + 30 / 8413-1152 Ihnestrasse 73 email: okrslar@molgen.mpg.de D-14195 Berlin URL: http://cmb.molgen.mpg.de -----------------------------------------------------------------------------
At 08:14 AM 3/5/2003, Tarjei Knapstad wrote:
On Wed, 2003-03-05 at 13:50, okrslar wrote:
--- In Boost-Users@yahoogroups.com, Jeff Holle
wrote: I'd think a hashed dictionary would be needed for such a large data set. There is a caviot though, weither or not a good hash function can be created.
The good thing is that its easy to experiment with different STL associative containers, as long as you have SGI's STL or a derivative/ or good alternative.
In SGI, the container is hash_map.
We don't have SGI STL around here. I've seen that before, but I hoped that there would be an other way than changing my STL version... It's amazing that there is no hashtable in the stdc++, isn't it?
More or less... :) Hashed containers were introduced too late to make it into the standard IIRC, allthough most STL implementations have them and most STL books document them.
I believe you can be pretty certain they'll make it into the next standard though :)
Yes, Matt Austern has written a nice hash table proposal for the upcoming Standard Library Technical Report. He plans to present version 3 of his proposal in Oxford in April, and I'd be very surprised if it doesn't get accepted, and probably at this meeting. --Beman
Tarjei Knapstad
We don't have SGI STL around here. I've seen that before, but I hoped that there would be an other way than changing my STL version... It's amazing that there is no hashtable in the stdc++, isn't it?
More or less... :) Hashed containers were introduced too late to make it into the standard IIRC, allthough most STL implementations have them and most STL books document them.
A caution: while hashtable lookups are theoretically O(1), it can be very difficult in practice to come up with a hash function that will cause a hashtable implementation to outperform a custom-built associative data structure. One thing I've used successfully is a sorted vector of sorted vectors. A couple of binary searches can be very fast, and have nice locality properties if the vectors don't grow too large. -- Dave Abrahams Boost Consulting www.boost-consulting.com
Another caution. Hash containers are much more memory efficient than red/black implementations. This can be significant with 600000 elements... Don't remember the actual ratio but it somewhere like 3:1. David Abrahams wrote:
Tarjei Knapstad
writes: We don't have SGI STL around here. I've seen that before, but I hoped that there would be an other way than changing my STL version... It's amazing that there is no hashtable in the stdc++, isn't it?
More or less... :) Hashed containers were introduced too late to make it into the standard IIRC, allthough most STL implementations have them and most STL books document them.
A caution: while hashtable lookups are theoretically O(1), it can be very difficult in practice to come up with a hash function that will cause a hashtable implementation to outperform a custom-built associative data structure. One thing I've used successfully is a sorted vector of sorted vectors. A couple of binary searches can be very fast, and have nice locality properties if the vectors don't grow too large.
-- Dave Abrahams Boost Consulting www.boost-consulting.com
*Yahoo! Groups Sponsor* ADVERTISEMENT http://rd.yahoo.com/M=246920.2960106.4328965.2848452/D=egroupweb/S=170500678...
Info: http://www.boost.org Wiki: http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl Unsubscribe: mailto:boost-users-unsubscribe@yahoogroups.com
Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service http://docs.yahoo.com/info/terms/.
On Tuesday 04 March 2003 02:37 pm, Martin Okrslar wrote:
In the BGL book on page 103 it is mentioned, that "The choice to use std::map to implement the property map is rather inefficient in this case...". Since my graph is large I'm a bit worried about this statement.
Could anybody of you please give me a hint, what other datastructure I might use here, to make it efficient?
Do you need the data structure to be external to the graph? The fastest property map is an internal property map, where the value is stored along with the vertex, edge, or graph the property described. Constant lookup time, without any hash functions or magic. Doug
--- In Boost-Users@yahoogroups.com, Douglas Gregor
On Tuesday 04 March 2003 02:37 pm, Martin Okrslar wrote:
In the BGL book on page 103 it is mentioned, that "The choice to use std::map to implement the property map is rather inefficient in this case...". Since my graph is large I'm a bit worried about this statement.
Could anybody of you please give me a hint, what other datastructure I might use here, to make it efficient?
Do you need the data structure to be external to the graph? The fastest property map is an internal property map, where the value is stored along with the vertex, edge, or graph the property described. Constant lookup time, without any hash functions or magic.
What I need is an associative_property_map, so your solution seems to be the right choice for me. Thanks! Martin
Doug
participants (7)
-
Beman Dawes
-
David Abrahams
-
Douglas Gregor
-
Jeff Holle
-
Martin Okrslar
-
okrslar
-
Tarjei Knapstad