
Michael Fawcett wrote:
In my application I need to load millions of vertices into a graph. I'm using BGL for this. Each edge has around 20-30 attributes associated with it and each attribute has a unique ID (not necessarily sequential or 0-based).
I'm currently using this multi_index_container for the attributes:
typedef detail::mutable_pair<long, _variant_t> id_attribute_pair_type;

typedef boost::multi_index_container<
  id_attribute_pair_type,
  boost::multi_index::indexed_by<
    boost::multi_index::hashed_unique<
      boost::multi_index::member<
        id_attribute_pair_type, long, &id_attribute_pair_type::first
      >
    >
  >
> map_type;
which emulates:
typedef std::map<long, _variant_t> map_type;
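(The mutable_pair type itself is not shown here; presumably it follows the std::map emulation example from the Boost.MultiIndex documentation, roughly along these lines:

template<typename T1, typename T2>
struct mutable_pair
{
  typedef T1 first_type;
  typedef T2 second_type;

  mutable_pair() : first(T1()), second(T2()) {}
  mutable_pair(const T1& f, const T2& s) : first(f), second(s) {}

  T1         first;
  mutable T2 second;  // mutable so the mapped value can be changed through iterators
};

The key, first, stays constant through the index while second remains writable, which is what makes the container usable as a map.)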
If I check memory consumption after loading the graph using the multi_index_container, I'm at 190 MB; using std::map it's at 124 MB. I realize this is because a std::map is sized exactly to its contents while a hash map grows its bucket array in chunks, but is there a way to reserve a size ahead of time, since I know exactly how many attributes there will be?
There is no way to fine-tune the size of the bucket array: it is, in general, the smallest number from an internal list of primes, each roughly double the previous one, that is compatible with the specified maximum load factor. With the default maximum load factor mlf = 1.0, the size of the bucket array can range approximately between n and 2n, where n is the number of elements.

You can use max_load_factor(z) to set the maximum load factor slightly below 1.0 and see whether that improves the situation (that is, whether the bucket array stays at the size immediately preceding the one you have now). The member function bucket_count() gives you the size of the bucket array. Is it indeed much larger than the number of elements?

Nevertheless, the differences in memory consumption do not look consistent to me: on 32-bit architectures a std::map has an overhead of 12-16 bytes per element (16 in most cases, 12 if some optimizations for the so-called "color" internal parameter are applied). For a hashed index the overhead (with mlf = 1.0) should be between 8 and 12 bytes per element. We are missing something here. Can you provide more detailed info on how you're measuring memory consumption? Is there any aspect you might not be taking into account?

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
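As a minimal, self-contained sketch of the suggestion above (the 0.95 value and the std::pair stand-in for mutable_pair<long, _variant_t> are only illustrative):

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/member.hpp>
#include <iostream>
#include <utility>

// simplified stand-in for the id/attribute pair used in the original post
typedef std::pair<long, long> pair_type;

typedef boost::multi_index_container<
  pair_type,
  boost::multi_index::indexed_by<
    boost::multi_index::hashed_unique<
      boost::multi_index::member<pair_type, long, &pair_type::first>
    >
  >
> map_type;

int main()
{
  map_type m;

  // Lower the maximum load factor slightly below 1.0 before loading,
  // so the container may settle on the previous (smaller) prime bucket size.
  m.max_load_factor(0.95f);

  for(long i = 0; i < 1000000; ++i) m.insert(pair_type(i, i));

  // Compare the bucket array size against the number of elements;
  // with mlf close to 1.0, bucket_count() should lie roughly in [n, 2n).
  std::cout << "elements: " << m.size()         << "\n"
            << "buckets:  " << m.bucket_count() << "\n"
            << "load:     " << m.load_factor()  << "\n";
}

If bucket_count() comes out much larger than size(), the bucket array is the place to look; if not, the extra memory is coming from somewhere else.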