On Wed, Nov 5, 2008 at 5:27 AM,
There is no way to fine-tune the size of the bucket array: in general it is the smallest value from a list of prime numbers, each roughly double the previous, that is compatible with the specified maximum load factor. With the default maximum load factor mlf=1.0, the size of the bucket array can range approximately between n and 2n, where n is the number of elements. You can use max_load_factor(z) to set the maximum load factor slightly above 1.0 and see whether that improves the situation (that is, whether the bucket array stays at the size immediately preceding the one you have now). The member function bucket_count() gives you the size of the bucket array. Is it indeed much larger than the number of elements?
It is indeed much larger. My particular test graph has 22 attributes, and bucket_count() is 53. I investigated further, and this is because 53 is the smallest size bucket_array_base allows. Should I just stick with std::map when I know that N will be less than 53, or would modifying prime_list[] make sense?
Nevertheless, the differences in memory consumption do not look consistent to me: a std::map has an overhead of 12-16 bytes per element on 32-bit architectures (16 in most cases; 12 if some optimizations for the so-called "color" internal parameter are applied). For a hashed index the overhead (with mlf=1.0) should be between 8 and 12 bytes per element. We are missing something here. Can you provide more detailed info on how you're measuring memory consumption? Is there any aspect you might not be taking into account?
I'm using the MS-specific _CrtMemDumpStatistics function immediately after loading the graph. I've also used VTune, but that reports system-wide memory usage, so it's hard to be more detailed than "x uses more than y" with that profiler. --Michael Fawcett