[Multi-index] hashed_unique memory consumption vs std::map

In my application I need to load millions of vertices into a graph. I'm using BGL for this. Each edge has around 20-30 attributes associated with it, and each attribute has a unique ID (not necessarily sequential or 0-based). I'm currently using this multi_index_container for the attributes:

typedef detail::mutable_pair<long, _variant_t> id_attribute_pair_type;
typedef boost::multi_index_container<
  id_attribute_pair_type,
  boost::multi_index::indexed_by<
    boost::multi_index::hashed_unique<
      boost::multi_index::member<
        id_attribute_pair_type, long, &id_attribute_pair_type::first
      >
    >
  >
> map_type;

which emulates:

typedef std::map<long, _variant_t> map_type;

If I check memory consumption after loading the graph, using multi_index I'm at 190MB. Using std::map it's at 124MB. I realize this is because std::map is perfectly sized and a hash map will grow in chunks, but is there a way I can reserve a size ahead of time, since I know exactly how many attributes there will be?

My graph loading performance does not seem to change between multi_index and map; however, subsequent search operations (shortest_path, etc.) seem to favor multi_index.

Thanks for such a great library. I've used it in many projects.

--Michael Fawcett

Michael Fawcett wrote:
In my application I need to load millions of vertices into a graph. I'm using BGL for this. Each edge has around 20-30 attributes associated with it and each attribute has a unique ID (not necessarily sequential or 0-based).
I'm currently using this multi_index_container for the attributes:
typedef detail::mutable_pair<long, _variant_t> id_attribute_pair_type;
typedef boost::multi_index_container<
  id_attribute_pair_type,
  boost::multi_index::indexed_by<
    boost::multi_index::hashed_unique<
      boost::multi_index::member<
        id_attribute_pair_type, long, &id_attribute_pair_type::first
      >
    >
  >
> map_type;
which emulates:
typedef std::map<long, _variant_t> map_type;
If I check memory consumption after loading the graph using multi_index I'm at 190MB. Using std::map it's at 124MB. I realize this is because std::map is perfectly sized and a hashmap will grow in chunks, but is there a way I can reserve a size ahead of time since I know exactly how many attributes there will be?
There is no way to fine-tune the size of the bucket array: it is in general the minimum of a list of prime numbers, each roughly double the prior, that is compatible with the maximum load factor specified. With the default maximum load factor mlf = 1.0, the size of the bucket array can range approximately between n and 2n, where n is the number of elements. You can use max_load_factor(z) to set the maximum load factor slightly below 1.0 to see if that improves the situation (that is, if the bucket size stays at the size immediately preceding the one you have now). The member function bucket_count() gives you the size of the bucket array. Is this indeed much larger than the number of elements?

Nevertheless, the differences in memory consumption do not look consistent to me: a std::map has an overhead of 12-16 bytes per element (16 in most cases, 12 if some optimizations for a so-called "color" internal parameter are applied) on 32-bit architectures. For a hashed index the overhead (with mlf = 1.0) should be between 8 and 12 bytes per element. We are missing something here. Can you provide more detailed info on how you're measuring memory consumption? Are there any aspects you might not be taking into account?

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

On Wed, Nov 5, 2008 at 5:27 AM, <joaquin@tid.es> wrote:
There is no way to fine-tune the size of the bucket array: it is in general the minimum of a list of prime numbers, each roughly double the prior, that is compatible with the maximum load factor specified. With the default maximum load factor mlf = 1.0, the size of the bucket array can range approximately between n and 2n, where n is the number of elements. You can use max_load_factor(z) to set the maximum load factor slightly below 1.0 to see if that improves the situation (that is, if the bucket size stays at the size immediately preceding the one you have now). The member function bucket_count() gives you the size of the bucket array. Is this indeed much larger than the number of elements?
It is indeed much larger. I have 22 attributes with my particular test graph. bucket_count() is 53. I investigated further and this is because 53 is the lowest possible size for bucket_array_base. Should I just stick to using std::map when I know that N will be less than 53, or would modifying prime_list[] make sense?
Nevertheless, the differences in memory consumption do not look consistent to me: a std::map has an overhead of 12-16 bytes per element (16 in most cases, 12 if some optimizations for a so-called "color" internal parameter are applied) for 32-bit architectures. For a hashed index the overhead (with lf=1.0) should be between 8 and 12 bytes per element. We are missing something here. Can you provide more detailed info on how you're measuring memory consumption? Are there any aspect you might not be taking into account?
I'm using the MS-specific _CrtMemDumpStatistics function immediately after loading the graph. I've also used VTune, but that reports system-wide memory usage, so it's hard to be more detailed than "x uses more than y" with that profiler.

--Michael Fawcett

Michael Fawcett <michael.fawcett <at> gmail.com> writes:
On Wed, Nov 5, 2008 at 5:27 AM, <joaquin <at> tid.es> wrote:
The member function bucket_count() gives you the size of the bucket array. Is this indeed much larger than the number of elements?
It is indeed much larger. I have 22 attributes with my particular test graph. bucket_count() is 53.
Oh, sorry, I didn't understand your scenario at first. I thought you had one container with millions of elements, when what you have is millions of containers with a few elements (22) each. Is that right?
I investigated further and this is because 53 is the lowest possible size for bucket_array_base. Should I just stick to using std::map when I know that N will be less than 53, or would modifying prime_list[] make sense?
Absolutely, just add some prime which is roughly 53/2, say 23. I'm sorry this is forcing you to change B.MI source code. Please report back whether this improves things.

[rest snipped as it's clear to me now]

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

On Wed, Nov 5, 2008 at 2:38 PM, Joaquin M Lopez Munoz <joaquin@tid.es> wrote:
Oh, sorry, I didn't understand your scenario at first. I thought you have one container with millions of elements, when what you have is millions of containers with a few elements (22) each. Is that right?
Yes exactly, sorry I wasn't clear.
Absolutely, just add some prime which is roughly 53/2, say 23. I'm sorry this is forcing you to change B.MI source code. Please report back whether this improves things.
I was trying this as you responded. I tried 23 and things improved greatly. It now uses less memory than std::map (as you suspected) by about 1%, or 1.2MB, in my small graph test case. This will go up to about 120MB in my large graph test case, so that's very helpful.

For more gains over std::map: graph load times decreased by 3.4%, graph search times decreased by 17%, and graph save times decreased by 14%.

By the way, I've always been impressed by how quickly you respond on these mailing lists in the past. You continue to impress ;)

Thanks,
--Michael Fawcett

Michael Fawcett <michael.fawcett <at> gmail.com> writes:
On Wed, Nov 5, 2008 at 2:38 PM, Joaquin M Lopez Munoz <joaquin <at> tid.es> wrote:
Absolutely, just add some prime which is roughly 53/2, say 23. I'm sorry this is forcing you to change B.MI source code. Please report back whether this improves things.
I was trying this as you responded. I tried 23 and things improved greatly. It now uses less memory than std::map (as you suspected) by about 1%, or 1.2MB in my small graph test case. This will go up to about 120MB in my large graph test case, so that's very helpful.
Ummm... it should still be less than that, as the overhead of the hashed index is now very close to 2 pointers per element, while std::map incurs at best 3 pointers per element. Can you declare some iterator "it" to the hashed index and evaluate the expression

sizeof(*(it.get_node()))

and compare with

sizeof(id_attribute_pair_type)?
For more gains over std::map -
Graph load times decreased by 3.4%. Graph search time decreased by 17%. Graph save times decreased by 14%.
By the way, I've always been impressed by how quickly you respond on these mailing lists in the past. You continue to impress ;)
Nothing to be proud of --I just spend too much time on the Internet :)

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

On Wed, Nov 5, 2008 at 3:22 PM, Joaquin M Lopez Munoz <joaquin@tid.es> wrote:
Ummm... it should still be less than that, as the overhead of the hashed index is now very close to 2 pointers per element, while std::map incurs at best 3 pointers per element.
Can you declare some iterator "it" to the hashed index and evaluate the following expression
sizeof(*(it.get_node()));
and compare with
sizeof(id_attribute_pair_type)
You're right. The first prints out 32, the second prints out 24.

In my test graph there are 106,096 edges. Each edge has 22 attributes.

In the hashed_index case, this results in a bucket array with bucket_count() == 23 and 8 bytes of overhead per node, with a node (in my case) at 24 bytes. So that means 106,096 * 23 * 32 == 78,086,656 bytes.

In the std::map case, this results in 22 nodes at 12 bytes of overhead per node: 106,096 * 22 * 36 == 84,028,032 bytes.

I'm thinking I probably missed a digit in my previous email.

--Michael Fawcett

Michael Fawcett <michael.fawcett <at> gmail.com> writes:
On Wed, Nov 5, 2008 at 3:22 PM, Joaquin M Lopez Munoz <joaquin <at> tid.es> wrote:
Can you declare some iterator "it" to the hashed index and evaluate the following expression
sizeof(*(it.get_node()));
and compare with
sizeof(id_attribute_pair_type)
You're right. The first prints out 32, the second prints out 24.
You're on a 32-bit platform, right? Then the first sizeof() should ideally be 24+4=28, not 32. I think you're having a padding issue due to the presence of a long type in id_attribute_pair_type. Can you try changing the order in which long and _variant_t are declared within id_attribute_pair_type? This might avoid padding and get you sizeof(*(it.get_node()))=28. If not, well, we're out of luck.
In my test graph there are 106,096 edges in the graph. Each edge has 22 attributes.
In the hashed_index case, this results in a bucket_array with bucket_count() == 23 and 8 bytes overhead per node with a node (in my case) at 24 bytes. So that means 106,096 * 23 * 32 == 78,086,656 bytes.
Not entirely correct. The right formula is

n_edges * (n_attributes * sizeof(node) + bucket_count * 4)

which in our case is

106,096 * (22 * 32 + 23 * 4) = 84,452,416
In the std::map case, this results in 22 nodes at 12 bytes overhead per node. 106,096 * 22 * 36 == 84,028,032 bytes.
Not consistent with the previous calculation and your reported saving of 1% in memory consumption :-/ Also, you might want to investigate whether the overhead for std::map is 12 bytes per element or 16 bytes per element --the latter is more likely.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

On Wed, Nov 5, 2008 at 4:18 PM, Joaquin M Lopez Munoz <joaquin@tid.es> wrote:
You're in a 32-bit platform, right? Then, the first sizeof() should ideally be 24+4=28, not 32. I think you're having a padding issue due to the presence of a long type in id_attribute_pair_type. Can you try changing the order in which long and _variant_t are declared within id_attribute_pair_type? This might avoid padding and get you a sizeof(*(it.get_node()))=28. If not, well, we're out of luck.
Yes, 32-bit platform. We're out of luck - I tried both ways. 32 bytes for both.
Not entirely correct. The right formula is
n_edges*(n_attributes*sizeof(node)+ bucket_count*4)
which in our case is
106,096*(22*32 + 23*4) = 84,452,416
In the std::map case, this results in 22 nodes at 12 bytes overhead per node. 106,096 * 22 * 36 == 84,028,032 bytes.
Not consistent with the previous calculation and your reported saving of 1% in memory consumption :-/ Also, you might want to investigate whether the overhead for std::map is 12 bytes per element or 16 bytes per element --the latter is more likely.
Well, the numbers are different because I was reporting total program memory usage for both implementations, while the formulas are for just that particular data structure. The percentages, however, should have been consistent. I'm not sure why they weren't, but I've since modified the program so much that it would be difficult to go back and test.

I investigated the node structure that my std::map implementation is using. It has an overhead of 3 pointers and 2 chars. This is with Visual Studio 2005 SP1.

--Michael Fawcett

Michael Fawcett <michael.fawcett <at> gmail.com> writes:
I investigated the node structure that my std::map implementation is using. It has an overhead of 3 pointers and 2 chars. This is with Visual Studio 2005 SP1.
Are you building in release with _SECURE_SCL #defined to 0? If not, you have extra size overhead from debug and/or checked iterators being enabled. See the following links for more information:

http://msdn.microsoft.com/en-us/library/aa985965.aspx
http://msdn.microsoft.com/en-us/library/aa985982.aspx

On Thu, Nov 6, 2008 at 4:27 PM, Adam Merz <adammerz@hotmail.com> wrote:
Michael Fawcett <michael.fawcett <at> gmail.com> writes:
I investigated the node structure that my std::map implementation is using. It has an overhead of 3 pointers and 2 chars. This is with Visual Studio 2005 SP1.
Are you building in release with _SECURE_SCL #defined to 0? If not, you have extra size overhead from debug and/or checked iterators being enabled.
I'm building in Debug mode with _SECURE_SCL = 0 and _HAS_ITERATOR_DEBUGGING = 0. I'm in Debug mode so that the _Crt* debug functions aren't compiled away, since I was using those to track memory usage. Do you know of a better way to track memory usage in Windows? Task Manager is obviously not up to the task, VTune reports it on a system-wide basis, not per-process, and the _Crt* functions require modifying your code.

--Michael Fawcett
participants (4)
- Adam Merz
- Joaquin M Lopez Munoz
- joaquin@tid.es
- Michael Fawcett