
Michael Fawcett wrote:
In my application I need to load millions of vertices into a graph. I'm using BGL for this. Each edge has around 20-30 attributes associated with it and each attribute has a unique ID (not necessarily sequential or 0-based).
I'm currently using this multi_index_container for the attributes:
typedef detail::mutable_pair<long, _variant_t> id_attribute_pair_type;

typedef boost::multi_index_container<
  id_attribute_pair_type,
  boost::multi_index::indexed_by<
    boost::multi_index::hashed_unique<
      boost::multi_index::member<
        id_attribute_pair_type, long, &id_attribute_pair_type::first
      >
    >
  >
> map_type;
which emulates:
typedef std::map<long, _variant_t> map_type;
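(The mutable_pair type itself is not shown here; presumably it follows the std::map emulation example from the Boost.MultiIndex documentation, roughly along these lines:

template<typename T1, typename T2>
struct mutable_pair
{
  typedef T1 first_type;
  typedef T2 second_type;

  mutable_pair() : first(T1()), second(T2()) {}
  mutable_pair(const T1& f, const T2& s) : first(f), second(s) {}

  T1         first;
  mutable T2 second;  // mutable so the mapped value can be changed through iterators
};

The key, first, stays constant through the index while second remains writable, which is what makes the container usable as a map.)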
If I check memory consumption after loading the graph using the multi_index_container, I'm at 190 MB; using std::map it's at 124 MB. I realize this is because a std::map is sized exactly to its contents while a hash map grows its bucket array in chunks, but is there a way to reserve a size ahead of time, since I know exactly how many attributes there will be?
There is no way to fine-tune the size of the bucket array: it is, in general, the smallest number from an internal list of primes, each roughly double the previous one, that is compatible with the specified maximum load factor. With the default maximum load factor mlf = 1.0, the size of the bucket array can range approximately between n and 2n, where n is the number of elements.

You can use max_load_factor(z) to set the maximum load factor slightly below 1.0 and see whether that improves the situation (that is, whether the bucket array stays at the size immediately preceding the one you have now). The member function bucket_count() gives you the size of the bucket array. Is it indeed much larger than the number of elements?

Nevertheless, the differences in memory consumption do not look consistent to me: on 32-bit architectures a std::map has an overhead of 12-16 bytes per element (16 in most cases, 12 if some optimizations for the so-called "color" internal parameter are applied). For a hashed index the overhead (with mlf = 1.0) should be between 8 and 12 bytes per element. We are missing something here. Can you provide more detailed info on how you're measuring memory consumption? Is there any aspect you might not be taking into account?

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
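As a minimal, self-contained sketch of the suggestion above (the 0.95 value and the std::pair stand-in for mutable_pair<long, _variant_t> are only illustrative):

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/member.hpp>
#include <iostream>
#include <utility>

// simplified stand-in for the id/attribute pair used in the original post
typedef std::pair<long, long> pair_type;

typedef boost::multi_index_container<
  pair_type,
  boost::multi_index::indexed_by<
    boost::multi_index::hashed_unique<
      boost::multi_index::member<pair_type, long, &pair_type::first>
    >
  >
> map_type;

int main()
{
  map_type m;

  // Lower the maximum load factor slightly below 1.0 before loading,
  // so the container may settle on the previous (smaller) prime bucket size.
  m.max_load_factor(0.95f);

  for(long i = 0; i < 1000000; ++i) m.insert(pair_type(i, i));

  // Compare the bucket array size against the number of elements;
  // with mlf close to 1.0, bucket_count() should lie roughly in [n, 2n).
  std::cout << "elements: " << m.size()         << "\n"
            << "buckets:  " << m.bucket_count() << "\n"
            << "load:     " << m.load_factor()  << "\n";
}

If bucket_count() comes out much larger than size(), the bucket array is the place to look; if not, the extra memory is coming from somewhere else.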