[unordered] Buffered functions?

Hi all, Can someone please explain why the unordered containers store two copies of the hash and equality functions, and why they aren't using compressed_pair to take advantage of the EBO for these? Thanks, -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams skrev:
Hi all,
Can someone please explain why the unordered containers store two copies of the hash and equality functions, and why they aren't using compressed_pair to take advantage of the EBO for these?
Its been discussed before, and here are the answers I got from the author: 1. storing two functors had to do with implementing exception safety if copying of the functors may throw. Was it swap() that was the problem? 2. the space optimization only gave little less space. -Thorsten

2008/11/24 Thorsten Ottosen <thorsten.ottosen@dezide.com>:
1. storing two functors had to do with implementing exception safety if copying of the functors may throw. Was it swap() that was the problem?
Here's the original post where I described the problem: http://lists.boost.org/Archives/boost/2005/02/80211.php I think it also applies to assignment.
2. the space optimization only gave little less space.
The cost from not using EBO amounts to a few bytes - which can sometimes be offset due to alignment. Since 4 bytes is typically required per bucket it doesn't seem like a large amount. Early versions used boost::compressed_pair but that had poor support for Visual C++ 6 (which I supported at the time). I've been thinking about reintroducing EBO, but there have also been plenty of other more pressing tasks. I'd probably implement it manually though, as I don't think using compressed_pair would buy much. I can't imagine a situation where the hash function will be empty and the equality predicate won't so there isn't much point in the extra complexity and dependency. Similarly, I'm not aware of an allocator that is empty for a bucket, but not for a node. Daniel

on Mon Nov 24 2008, "Daniel James" <daniel_james-AT-fmail.co.uk> wrote:
2008/11/24 Thorsten Ottosen <thorsten.ottosen@dezide.com>:
1. storing two functors had to do with implementing exception safety if copying of the functors may throw. Was it swap() that was the problem?
Here's the original post where I described the problem:
http://lists.boost.org/Archives/boost/2005/02/80211.php
I think it also applies to assignment.
This looks like a bug in the standard. Equality and hashing should be required to be nothrow-swappable. I've raised that issue on the committee reflector.
2. the space optimization only gave little less space.
The cost from not using EBO amounts to a few bytes - which can sometimes be offset due to alignment. Since 4 bytes is typically required per bucket it doesn't seem like a large amount.
Some people really care about the space taken up by an empty container.
Early versions used boost::compressed_pair but that had poor support for Visual C++ 6 (which I supported at the time).
I hope that's a non-issue now :-)
I've been thinking about reintroducing EBO, but there have also been plenty of other more pressing tasks.
I'd probably implement it manually
Implement _what_ manually?
though, as I don't think using compressed_pair would buy much. I can't imagine a situation where the hash function will be empty and the equality predicate won't so there isn't much point in the extra complexity and dependency. Similarly, I'm not aware of an allocator that is empty for a bucket, but not for a node.
It seems like you think compressed pair can only compress away one member, but IIUC you can chain them and then derive your hash table from that to compress as much as you like. Of course it would be nice to have a compressed_tuple that handles it all for us, but I thikn you _can_ do it. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Nov 25, 2008, at 4:42 PM, David Abrahams wrote:
on Mon Nov 24 2008, "Daniel James" <daniel_james-AT-fmail.co.uk> wrote:
2008/11/24 Thorsten Ottosen <thorsten.ottosen@dezide.com>:
1. storing two functors had to do with implementing exception safety if copying of the functors may throw. Was it swap() that was the problem?
Here's the original post where I described the problem:
http://lists.boost.org/Archives/boost/2005/02/80211.php
I think it also applies to assignment.
This looks like a bug in the standard. Equality and hashing should be required to be nothrow-swappable. I've raised that issue on the committee reflector.
Fwiw, I support nothrow-swappable and encourage others to express their opinions on this issue. This is a decision that will impact all of us.
2. the space optimization only gave little less space.
The cost from not using EBO amounts to a few bytes - which can sometimes be offset due to alignment. Since 4 bytes is typically required per bucket it doesn't seem like a large amount.
Some people really care about the space taken up by an empty container.
<nod> Believe it or not vector<unordered_set<A>> is a viable container, even when many of those inner containers are empty. If the overhead(unordered_set<A>) can be 5 words instead of 9 words (or more), then the ability to fit a data structure into nearly half the space is a significant size *and* speed optimization (for some clients). It is never the number of words that matter in shaving off the space overhead of a container. It is the percent difference in space overhead between an optimized and unoptimized implementation that matters. Because for the clients who care, they have a very large number of containers which statically have a small size(). If you can cut the overhead(container) by 25% or more, it is a big win.
It seems like you think compressed pair can only compress away one member, but IIUC you can chain them and then derive your hash table from that to compress as much as you like. Of course it would be nice to have a compressed_tuple that handles it all for us, but I thikn you _can_ do it.
I have high hopes that std::tuple will become the compressed_pair of the future. Technically, and especially with variadic template language support, it is not that hard to do (tuple is a pain no matter what without variadic template language support). Standards-wise, the current C++0X draft (N2798) does not outlaw, nor mandate, the empty member optimization for tuple, and I hope to keep it that way. Recent straw polls on the LWG have been favorable to allowing the empty member optimization for std::tuple. -Howard

Howard Hinnant wrote:
On Nov 25, 2008, at 4:42 PM, David Abrahams wrote:
on Mon Nov 24 2008, "Daniel James" <daniel_james-AT-fmail.co.uk> wrote:
2008/11/24 Thorsten Ottosen <thorsten.ottosen@dezide.com>:
1. storing two functors had to do with implementing exception safety if copying of the functors may throw. Was it swap() that was the problem?
Here's the original post where I described the problem:
http://lists.boost.org/Archives/boost/2005/02/80211.php
I think it also applies to assignment.
This looks like a bug in the standard. Equality and hashing should be required to be nothrow-swappable. I've raised that issue on the committee reflector.
Fwiw, I support nothrow-swappable and encourage others to express their opinions on this issue. This is a decision that will impact all of us.
I agree. This will simplify our lives. Now I would like to see a real case where the comparison function or the hash function throws. Regards,
2. the space optimization only gave little less space.
The cost from not using EBO amounts to a few bytes - which can sometimes be offset due to alignment. Since 4 bytes is typically required per bucket it doesn't seem like a large amount.
Some people really care about the space taken up by an empty container.
I was the review manager for the unordered library and I remember the issue was discussed in the review. My personal view was to avoid having buffered functions but unordered library took the refreshing approach of having stronger guarantees, something that I also like. For example, unordered_multisets have an additional pointer to group equal values in a singly linked list so that bucket traversing is linear to the number of different values instead of the number of values in the bucket. It's a nice feature and useful in several scenarios (I even added that option to Intrusive unordered containers), but you have to pay for it ;-)
I have high hopes that std::tuple will become the compressed_pair of the future. Technically, and especially with variadic template language support, it is not that hard to do (tuple is a pain no matter what without variadic template language support). Standards-wise, the current C++0X draft (N2798) does not outlaw, nor mandate, the empty member optimization for tuple, and I hope to keep it that way. Recent straw polls on the LWG have been favorable to allowing the empty member optimization for std::tuple.
-Howard
I hope EBO gets into tuple. Tuple seems quite lightweight for compilation times in C++0x but let's avoid using tuple to do some EBO in unordered because pushing all the preprocessor machinery to do simple EBO it's a bit of overkill. Regards, ion

on Wed Nov 26 2008, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:
For example, unordered_multisets have an additional pointer to group equal values in a singly linked list so that bucket traversing is linear to the number of different values instead of the number of values in the bucket. It's a nice feature and useful in several scenarios (I even added that option to Intrusive unordered containers), but you have to pay for it ;-)
Hmm, that doesn't make a lot of sense to me. Lots of equal values in a hash table can quickly destroy its O(1) lookup characteristics. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

2008/11/26 David Abrahams <dave@boostpro.com>:
on Wed Nov 26 2008, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:
For example, unordered_multisets have an additional pointer to group equal values in a singly linked list so that bucket traversing is linear to the number of different values instead of the number of values in the bucket. It's a nice feature and useful in several scenarios (I even added that option to Intrusive unordered containers), but you have to pay for it ;-)
Hmm, that doesn't make a lot of sense to me. Lots of equal values in a hash table can quickly destroy its O(1) lookup characteristics.
That's what it avoids. Because equal values are grouped together, you only need to examine a single key for all the equal values. Here's a quick diagram of a bucket in an unordered_multiset (assuming that the hash function is terrible and 5,3 and 2 all end up in the same bucket): http://www.calamity.org.uk/x/bucket.png The black links represent the normal single linked list. The red links are a circular list running backwards through equal values. This results in a red link from the first node in a group of equals values to the last node - this lets you quickly skip over equal values. This structure is easy to maintain and has some nice results, such as a stable order when inserting. The only tricky part was erasing by iterator. Daniel

on Wed Nov 26 2008, "Daniel James" <daniel_james-AT-fmail.co.uk> wrote:
2008/11/26 David Abrahams <dave@boostpro.com>:
on Wed Nov 26 2008, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:
For example, unordered_multisets have an additional pointer to group equal values in a singly linked list so that bucket traversing is linear to the number of different values instead of the number of values in the bucket. It's a nice feature and useful in several scenarios (I even added that option to Intrusive unordered containers), but you have to pay for it ;-)
Hmm, that doesn't make a lot of sense to me. Lots of equal values in a hash table can quickly destroy its O(1) lookup characteristics.
That's what it avoids. Because equal values are grouped together, you only need to examine a single key for all the equal values.
I understand.
Here's a quick diagram of a bucket in an unordered_multiset (assuming that the hash function is terrible and 5,3 and 2 all end up in the same bucket):
Yes, I figured it was something like that. It still doesn't make sense to me. If the number of repeated values in a bucket is small, it won't hurt to traverse all of them. If it is really large enough that traversing all equal values to get to the next value is expensive, it will destroy O(1) lookup in any buckets with collisions. Did anyone profile this in a real application and find it to be a win?
The black links represent the normal single linked list. The red links are a circular list running backwards through equal values. This results in a red link from the first node in a group of equals values to the last node - this lets you quickly skip over equal values. This structure is easy to maintain and has some nice results, such as a stable order when inserting.
Stable order when inserting is nice, but ought to be achievable without the overhead of an extra pointer per node. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

2008/11/27 David Abrahams <dave@boostpro.com>:
Yes, I figured it was something like that. It still doesn't make sense to me. If the number of repeated values in a bucket is small, it won't hurt to traverse all of them. If it is really large enough that traversing all equal values to get to the next value is expensive, it will destroy O(1) lookup in any buckets with collisions.
That's always an issue with hash tables. But you can improve matters by using a better hash function or lowering the maximum load factor. But if your container is slowing down because you have elements with equal keys there's nothing you can do. Maybe you could use unordered_map<key, vector<value> >?
Did anyone profile this in a real application and find it to be a win?
No, and nobody has found it to be a loss either. It isn't about optimization, it's about meeting the complexity requirements.
Stable order when inserting is nice, but ought to be achievable without the overhead of an extra pointer per node.
Stable order is not just nice, it's a requirement. But it was never the main purpose of this design, I implemented it before the standard changed, but it is a nice side product. Daniel

on Thu Nov 27 2008, "Daniel James" <daniel_james-AT-fmail.co.uk> wrote:
2008/11/27 David Abrahams <dave@boostpro.com>:
Yes, I figured it was something like that. It still doesn't make sense to me. If the number of repeated values in a bucket is small, it won't hurt to traverse all of them. If it is really large enough that traversing all equal values to get to the next value is expensive, it will destroy O(1) lookup in any buckets with collisions.
That's always an issue with hash tables. But you can improve matters by using a better hash function or lowering the maximum load factor.
Yes, but if you improve it enough, you don't end up with multiple key values in a single bucket, so your extra pointer is wasted.
But if your container is slowing down because you have elements with equal keys there's nothing you can do.
Please define "container is slowing down." I can't picture what you mean.
Maybe you could use unordered_map<key, vector<value> >?
Did anyone profile this in a real application and find it to be a win?
No, and nobody has found it to be a loss either.
It's a manifest loss; I don't need to profile it to see that the overhead per stored element can be significant!
It isn't about optimization, it's about meeting the complexity requirements.
Where did these requirements come from? In generic programming, complexity requirements should never be so strict that they impose significant efficiency cost. If that requirement is in the standard, it's a bug.
Stable order when inserting is nice, but ought to be achievable without the overhead of an extra pointer per node.
Stable order is not just nice, it's a requirement.
Again, whence the requirement?
But it was never the main purpose of this design, I implemented it before the standard changed,
Changed?
but it is a nice side product.
Sorry, I'm confused. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams wrote:
on Wed Nov 26 2008, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:
For example, unordered_multisets have an additional pointer to group equal values in a singly linked list so that bucket traversing is linear to the number of different values instead of the number of values in the bucket. It's a nice feature and useful in several scenarios (I even added that option to Intrusive unordered containers), but you have to pay for it ;-)
Hmm, that doesn't make a lot of sense to me. Lots of equal values in a hash table can quickly destroy its O(1) lookup characteristics.
Ok, but unordered_multimap is supposed to deal with equal values just like multimap, so I think you have use cases where you might introduce many "equal" values (for example a lot of requests/packets... from the same client, user, or whatever). If you chain groups together you might save a lot of comparisons. In case you introduce a lot of values with the same key you penalize all the keys that share the same bucket. Again, this depends on the application, but having multiple equal keys seems ok to me, and the container will surely not rehash if a lot of equals keys are introduced and other buckets are empty (load would be still low). Not to mention that with these group pointers equal_range is O(1). Regards, Ion

on Tue Nov 25 2008, Howard Hinnant <howard.hinnant-AT-gmail.com> wrote:
<nod> Believe it or not vector<unordered_set<A>> is a viable container, even when many of those inner containers are empty. If the overhead(unordered_set<A>) can be 5 words instead of 9 words (or more), then the ability to fit a data structure into nearly half the space is a significant size *and* speed optimization (for some clients).
It is never the number of words that matter in shaving off the space overhead of a container. It is the percent difference in space overhead between an optimized and unoptimized implementation that matters. Because for the clients who care, they have a very large number of containers which statically have a small size(). If you can cut the overhead(container) by 25% or more, it is a big win.
One thing I've always wondered about this: in that case -- at least when you're not also trying to use the small object optimization -- why not always use a Pimpl idiom where an empty container (with no capacity) can be represented by a single null pointer?
It seems like you think compressed pair can only compress away one member, but IIUC you can chain them and then derive your hash table from that to compress as much as you like. Of course it would be nice to have a compressed_tuple that handles it all for us, but I thikn you _can_ do it.
I have high hopes that std::tuple will become the compressed_pair of the future. Technically, and especially with variadic template language support, it is not that hard to do (tuple is a pain no matter what without variadic template language support). Standards-wise, the current C++0X draft (N2798) does not outlaw, nor mandate, the empty member optimization for tuple,
Except that there is no "empty member optimization," right? Unless this is one of the langauge changes I've failed to notice (and there are a few), I think what you mean is that tuple is free to use derivation to take full advantage of the EBO, right?
and I hope to keep it that way. Recent straw polls on the LWG have been favorable to allowing the empty member optimization for std::tuple.
Until we get CWG to agree that the EMO exists, I doubt straw polls in LWG can make much difference ;-) -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Dec 27, 2008, at 6:57 PM, David Abrahams wrote:
on Tue Nov 25 2008, Howard Hinnant <howard.hinnant-AT-gmail.com> wrote:
<nod> Believe it or not vector<unordered_set<A>> is a viable container, even when many of those inner containers are empty. If the overhead(unordered_set<A>) can be 5 words instead of 9 words (or more), then the ability to fit a data structure into nearly half the space is a significant size *and* speed optimization (for some clients).
It is never the number of words that matter in shaving off the space overhead of a container. It is the percent difference in space overhead between an optimized and unoptimized implementation that matters. Because for the clients who care, they have a very large number of containers which statically have a small size(). If you can cut the overhead(container) by 25% or more, it is a big win.
One thing I've always wondered about this: in that case -- at least when you're not also trying to use the small object optimization -- why not always use a Pimpl idiom where an empty container (with no capacity) can be represented by a single null pointer?
I've simply had clients complain to me: why is sizeof(vector) 4 words instead of 3? Please make it 3. Given that complaint, I can attempt to generate reasonable use cases which motivate such a complaint. But my basic premise is that those complaining know their needs far better than I do. I've never actually had anyone complain to me about the sizeof an unordered_set. However it seems logical to me that if clients are concerned about the overhead of one container, they could be concerned about the overhead of another (and thus any) container. Concerning your suggestion about container<unique_ptr<unordered_set<A>>> as a substitute for container<unordered_set<A>> (or equivalent), my best guess is that there may be clients with use cases where the median size of the inner container is small but non-zero, and thus the extra heap allocation and indirection ends up costing more. Suffice it to say that from a general purpose library point of view, more overhead is never better if you can get the equivalent functionality/performance out of less overhead. And historically, the design of the std::containers has given precedence to speed and space performance as opposed to turning basic exception safety into strong exception safety (and I agree with that design philosophy for a general purpose library).
It seems like you think compressed pair can only compress away one member, but IIUC you can chain them and then derive your hash table from that to compress as much as you like. Of course it would be nice to have a compressed_tuple that handles it all for us, but I thikn you _can_ do it.
I have high hopes that std::tuple will become the compressed_pair of the future. Technically, and especially with variadic template language support, it is not that hard to do (tuple is a pain no matter what without variadic template language support). Standards-wise, the current C++0X draft (N2798) does not outlaw, nor mandate, the empty member optimization for tuple,
Except that there is no "empty member optimization," right? Unless this is one of the langauge changes I've failed to notice (and there are a few), I think what you mean is that tuple is free to use derivation to take full advantage of the EBO, right?
and I hope to keep it that way. Recent straw polls on the LWG have been favorable to allowing the empty member optimization for std::tuple.
Until we get CWG to agree that the EMO exists, I doubt straw polls in LWG can make much difference ;-)
I used the term "empty member optimization" to refer to the library technique for optimization as opposed to a language optimization. Your assumption regarding my meaning referring to derivation under the covers is correct. As long as we're on the subject, I should warn off naive attempts at this optimization which look like: namespace std { template <class T, class A> class vector : private A { ... }; } // std Such an implementation of "EMO" is error prone as it associates the namespace of the client-supplied A with that of the vector. I'm not preaching to Dave with this statement as he has a proven track record in this department with his implementation of boost::noncopyable (http://www.boost.org/doc/libs/1_37_0/libs/utility/utility.htm#Class_noncopya... ). I mention this warning aimed at the widest possible audience. Instead one should use the techniques employed by boost::compressed_pair, or go back to (as far as I know) the originator of this technique, Nathan Myers (http://www.cantrip.org/emptyopt.html ). -Howard

on Sat Dec 27 2008, Howard Hinnant <hinnant-AT-twcny.rr.com> wrote:
On Dec 27, 2008, at 6:57 PM, David Abrahams wrote:
One thing I've always wondered about this: in that case -- at least when you're not also trying to use the small object optimization -- why not always use a Pimpl idiom where an empty container (with no capacity) can be represented by a single null pointer?
I've simply had clients complain to me: why is sizeof(vector) 4 words instead of 3? Please make it 3.
I take it you mean that nobody has ever complained to you that sizeof(vector) should have been one word.
Given that complaint, I can attempt to generate reasonable use cases which motivate such a complaint. But my basic premise is that those complaining know their needs far better than I do.
I've never actually had anyone complain to me about the sizeof an unordered_set. However it seems logical to me that if clients are concerned about the overhead of one container, they could be concerned about the overhead of another (and thus any) container.
By the same token, it seems logical to me that if reducing the size of a vector to 3 words matters, reducing it to one word matters even more ;-)
Concerning your suggestion about container<unique_ptr<unordered_set<A>>> as a substitute for container<unordered_set<A>> (or equivalent),
That's a twisty way to say it. Just to be sure we understand one another, I'll clarify that I never suggested about asking clients to use the unique_ptr approach. Rather, my suggestion was that the library use unique_ptr (or equivalent) under the covers to minimize the size of empty data structures.
my best guess is that there may be clients with use cases where the median size of the inner container is small but non-zero, and thus the extra heap allocation and indirection ends up costing more.
I'm guessing that most of those clients either haven't had the opportunity to make the measurement, or else they use unique_ptr (or equivalent) explicitly to get the effect they want. Also, the lack, up to now, of a conforming auto_ptr-sized thingy that can be stored in containers has surely impacted the measurements people are able to conveniently make.
Suffice it to say that from a general purpose library point of view, more overhead is never better if you can get the equivalent functionality/performance out of less overhead. And historically, the design of the std::containers has given precedence to speed and space performance as opposed to turning basic exception safety into strong exception safety (and I agree with that design philosophy for a general purpose library).
Me too...
It seems like you think compressed pair can only compress away one member, but IIUC you can chain them and then derive your hash table from that to compress as much as you like. Of course it would be nice to have a compressed_tuple that handles it all for us, but I thikn you _can_ do it.
I have high hopes that std::tuple will become the compressed_pair of the future. Technically, and especially with variadic template language support, it is not that hard to do (tuple is a pain no matter what without variadic template language support). Standards-wise, the current C++0X draft (N2798) does not outlaw, nor mandate, the empty member optimization for tuple,
Except that there is no "empty member optimization," right? Unless this is one of the langauge changes I've failed to notice (and there are a few), I think what you mean is that tuple is free to use derivation to take full advantage of the EBO, right?
and I hope to keep it that way. Recent straw polls on the LWG have been favorable to allowing the empty member optimization for std::tuple.
Until we get CWG to agree that the EMO exists, I doubt straw polls in LWG can make much difference ;-)
I used the term "empty member optimization" to refer to the library technique for optimization as opposed to a language optimization. Your assumption regarding my meaning referring to derivation under the covers is correct. As long as we're on the subject, I should warn off naive attempts at this optimization which look like:
namespace std {
template <class T, class A> class vector : private A { ... };
} // std
Such an implementation of "EMO" is error prone as it associates the namespace of the client-supplied A with that of the vector. I'm not preaching to Dave with this statement as he has a proven track record in this department with his implementation of boost::noncopyable (http://www.boost.org/doc/libs/1_37_0/libs/utility/utility.htm#Class_noncopya... ).
Probably you should be preaching to me. That protection was added in 2004, and although I worry about unintended ADL a lot, the technique I used there wouldn't help anyone trying to leverage EBO. In fact overuse of boost::noncopyable can undermine EBO for reasons pointed out in...
I mention this warning aimed at the widest possible audience. Instead one should use the techniques employed by boost::compressed_pair, or go back to (as far as I know) the originator of this technique, Nathan Myers (http://www.cantrip.org/emptyopt.html ).
...Nathan's article. Nice article, Nathan! I never saw that before. The conundrum here is that if we use that technique for tuple, then a tuple of empty types can never be empty itself. I'm starting to get ill just thinking about a generic framework for composing "logically empty" types that undoes this "EBO erasure" effect when necessary. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Dec 28, 2008, at 1:25 AM, David Abrahams wrote:
The conundrum here is that if we use that technique for tuple, then a tuple of empty types can never be empty itself. I'm starting to get ill just thinking about a generic framework for composing "logically empty" types that undoes this "EBO erasure" effect when necessary.
Perhaps I'm misunderstanding. Reverting from English to C++ in the hopes of clarification. :-) The C++0X program and its output below look quite reasonable to me: #include <tuple> #include <type_traits> #include <cstdio> struct empty1 {}; struct empty2 {}; struct empty3 {}; int main() { typedef std::tuple<empty1> T1; std::printf("is_empty<tuple<empty1>> = %d\n", std::is_empty<T1>::value); typedef std::tuple<empty1, empty2> T2; std::printf("is_empty<tuple<empty1, empty2>> = %d\n", std::is_empty<T2>::value); typedef std::tuple<T2, empty3> T3; std::printf("is_empty<tuple<tuple<empty1, empty2>, empty3>> = %d \n", std::is_empty<T3>::value); typedef std::tuple<int, T3> T4; std::printf("is_empty<tuple<int, tuple<tuple<empty1, empty2>, empty3>>> = %d\n", std::is_empty<T4>::value); std::printf("sizeof(int) = %zu, sizeof(tuple<int, tuple<tuple<empty1, empty2>, empty3>>) = %zu\n", sizeof(int), sizeof(T4)); } is_empty<tuple<empty1>> = 1 is_empty<tuple<empty1, empty2>> = 1 is_empty<tuple<tuple<empty1, empty2>, empty3>> = 1 is_empty<tuple<int, tuple<tuple<empty1, empty2>, empty3>>> = 0 sizeof(int) = 4, sizeof(tuple<int, tuple<tuple<empty1, empty2>, empty3>>) = 4 I currently favor the "shallow hierarchy" design Doug outlined in c+ +std-lib-18989. Though EMO isn't shown in that description, it is pretty simple to add to that design. -Howard

on Sun Dec 28 2008, Howard Hinnant <hinnant-AT-twcny.rr.com> wrote:
On Dec 28, 2008, at 1:25 AM, David Abrahams wrote:
The conundrum here is that if we use that technique for tuple, then a tuple of empty types can never be empty itself. I'm starting to get ill just thinking about a generic framework for composing "logically empty" types that undoes this "EBO erasure" effect when necessary.
Perhaps I'm misunderstanding. Reverting from English to C++ in the hopes of clarification. :-) The C++0X program and its output below look quite reasonable to me:
<snip> Reasonable, but unless I'm missing something, impossible, unless the tuple itself is derived from all those empty bases. A class containing an empty member is not itself empty, IIUC.
I currently favor the "shallow hierarchy" design Doug outlined in c+ +std-lib-18989. Though EMO isn't shown in that description, it is pretty simple to add to that design.
/me rummages through his committee message archive... As far as I can tell from that thread, your solution and Doug's shallow refinement show exactly the problem you were trying to warn about: any empty element types become (private) bases of the tuple type itself, leading to unintended namespace associations. As far as I can tell this is no different in spirit from namespace std { template <class T, class A> class vector : private A { ... }; } // std Such an implementation of "EMO" is error prone as it associates the namespace of the client-supplied A with that of the vector And, BTW, this is all way too hard; at its limit it leads to building every generic class template out of tuples instead of ordinary members. We probably should have an EMO in the core language after all. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Dec 28, 2008, at 2:41 PM, David Abrahams wrote:
on Sun Dec 28 2008, Howard Hinnant <hinnant-AT-twcny.rr.com> wrote:
On Dec 28, 2008, at 1:25 AM, David Abrahams wrote:
The conundrum here is that if we use that technique for tuple, then a tuple of empty types can never be empty itself. I'm starting to get ill just thinking about a generic framework for composing "logically empty" types that undoes this "EBO erasure" effect when necessary.
Perhaps I'm misunderstanding. Reverting from English to C++ in the hopes of clarification. :-) The C++0X program and its output below look quite reasonable to me:
<snip>
Reasonable, but unless I'm missing something, impossible, unless the tuple itself is derived from all those empty bases. A class containing an empty member is not itself empty, IIUC.
Ah, I see what you are saying. Yes, I agree.
I currently favor the "shallow hierarchy" design Doug outlined in c+ +std-lib-18989. Though EMO isn't shown in that description, it is pretty simple to add to that design.
/me rummages through his committee message archive...
As far as I can tell from that thread, your solution and Doug's shallow refinement show exactly the problem you were trying to warn about: any empty element types become (private) bases of the tuple type itself, leading to unintended namespace associations. As far as I can tell this is no different in spirit from
namespace std {
template <class T, class A> class vector : private A { ... };
} // std
Such an implementation of "EMO" is error prone as it associates the namespace of the client-supplied A with that of the vector
Quite so. I forgot one other language rule when I posted the vector example. [basic.lookup.argdep]/p2/b2:
... Furthermore, if T is a class template specialization, its associated namespaces and classes also include: the namespaces and classes associated with the types of the template arguments provided for template type parameters (excluding template template parameters);
Demonstration: typedef char one; struct two {one _[2];}; namespace std { template <class T, class A> struct vector { }; template <class T, class A> one test(vector<T, A>); } // std namespace mine { struct A {}; two test(std::vector<int, A>); } // mine #include <cstdio> int main() { std::printf("%zu\n", sizeof(test(std::vector<int, mine::A>()))); } Output: 2 I.e. The namespace of A is associated with std::vector<T, A>, whether or not vector derives from A. Silly me for forgetting that. /Some/ day I may master C++...
And, BTW, this is all way too hard; at its limit it leads to building every generic class template out of tuples instead of ordinary members.
This conclusion would be an option for class template designers, not a requirement. And it would be easier than constantly reinventing EMO. Back when c++std-lib-18989 was written (about 18 months ago) I implemented such a tuple, and used that implementation in preparing my previous message demonstrating a space-optimizing tuple. You are quite correct that the tuple does indeed end up deriving (indirectly in my example implementation) from client-supplied empty members. I think perhaps one important caveat is that the tuple *does not* derive from client-supplied types which have virtual functions (at least no known implementation of polymorphic types qualify as "empty"). Said differently, this implementation of tuple only derives from empty class types, and not from non-empty class types. Because of this caveat, and because of the associated namespace rule involving template parameters of template class types, I have not yet come up with a way for clients to detect the size optimization with the following exceptions (which I hope are acceptable to clients): 1. The tuple size optimization can be detected with sizeof. 2. The tuple size optimization can be detected by comparing the addresses of empty members and seeing that they may not be unique with respect to other tuple members: #include <tuple> #include <cstdio> struct empty1 {}; struct empty2 {}; int main() { typedef std::tuple<empty1, empty2> T2; T2 t; std::printf("address of std::get<0>(t) is %p\n", &std::get<0>(t)); std::printf("address of std::get<1>(t) is %p\n", &std::get<1>(t)); } address of std::get<0>(t) is 0xbffff65f address of std::get<1>(t) is 0xbffff65f "Unacceptable" means of detecting this optimization would include a change in overload resolution due to associated namespaces depending on whether or not the space optimization was performed (which would likely be error prone).
We probably should have an EMO in the core language after all.
I would probably be in favor of such a core change. Indeed, in the "next language" I imagine that a tuple-like construct might be a built- in type which lays the foundation for the layout of every client- defined aggregate type (complete with built-in type and value introspection). Imagine if you could inspect the member types of a struct as easily as you can a std::tuple. :-) -Howard

on Sun Dec 28 2008, Howard Hinnant <hinnant-AT-twcny.rr.com> wrote:
I.e. The namespace of A is associated with std::vector<T, A>, whether or not vector derives from A. Silly me for forgetting that. /Some/ day I may master C++...
Oh, me too, I forgot that.
And, BTW, this is all way too hard; at its limit it leads to building every generic class template out of tuples instead of ordinary members.
This conclusion would be an option for class template designers, not a requirement. And it would be easier than constantly reinventing EMO.
Yes, but, why should we have to worry about that?
Back when c++std-lib-18989 was written (about 18 months ago) I implemented such a tuple, and used that implementation in preparing my previous message demonstrating a space-optimizing tuple. You are quite correct that the tuple does indeed end up deriving (indirectly in my example implementation) from client-supplied empty members. I think perhaps one important caveat is that the tuple *does not* derive from client-supplied types which have virtual functions (at least no known implementation of polymorphic types qualify as "empty"). Said differently, this implementation of tuple only derives from empty class types, and not from non-empty class types.
Understood.
Because of this caveat, and because of the associated namespace rule involving template parameters of template class types, I have not yet come up with a way for clients to detect the size optimization with the following exceptions (which I hope are acceptable to clients):
1. The tuple size optimization can be detected with sizeof. 2. The tuple size optimization can be detected by comparing the addresses of empty members and seeing that they may not be unique with respect to other tuple members:
Hmm, int f(void*); char* f(some_empty_type*); int x = f(tuple_containing_some_empty_type); // compilation error.
#include <tuple> #include <cstdio>
struct empty1 {}; struct empty2 {};
int main() { typedef std::tuple<empty1, empty2> T2; T2 t; std::printf("address of std::get<0>(t) is %p\n", &std::get<0>(t)); std::printf("address of std::get<1>(t) is %p\n", &std::get<1>(t)); }
address of std::get<0>(t) is 0xbffff65f address of std::get<1>(t) is 0xbffff65f
"Unacceptable" means of detecting this optimization would include a change in overload resolution due to associated namespaces depending on whether or not the space optimization was performed (which would likely be error prone).
It's not just a matter of what namespaces are associated, but also which functions in those namespaces will match.
We probably should have an EMO in the core language after all.
I would probably be in favor of such a core change. Indeed, in the "next language" I imagine that a tuple-like construct might be a built- in type which lays the foundation for the layout of every client- defined aggregate type (complete with built-in type and value introspection). Imagine if you could inspect the member types of a struct as easily as you can a std::tuple. :-)
Yup. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Dec 28, 2008, at 4:55 PM, David Abrahams wrote:
on Sun Dec 28 2008, Howard Hinnant <hinnant-AT-twcny.rr.com> wrote:
And, BTW, this is all way too hard; at its limit it leads to building every generic class template out of tuples instead of ordinary members.
This conclusion would be an option for class template designers, not a requirement. And it would be easier than constantly reinventing EMO.
Yes, but, why should we have to worry about that?
<practical> Because language supported EMO isn't a realistic possibility in C++0X but allowing (not requiring) EMO emulation in tuple is a realistic possibility (N2800 arguably already allows the latter). </practical> That being said, if you can get an easier to use solution into C++0X, then you're the man! :-)
Because of this caveat, and because of the associated namespace rule involving template parameters of template class types, I have not yet come up with a way for clients to detect the size optimization with the following exceptions (which I hope are acceptable to clients):
1. The tuple size optimization can be detected with sizeof. 2. The tuple size optimization can be detected by comparing the addresses of empty members and seeing that they may not be unique with respect to other tuple members:
Hmm,
int f(void*); char* f(some_empty_type*);
int x = f(tuple_containing_some_empty_type); // compilation error.
Excellent test case, thanks! Ok, I've made a slight tweak to my example implementation, and to the test case. The tuple still does space compression, but not to the extent as shown previously. Still, I believe it equals the performance of boost::compressed_pair: #include <tuple> #include <cstdio> struct empty1 {}; struct empty2 {}; struct empty3 {}; template <class Tuple, unsigned I = 0, unsigned S = std::tuple_size<Tuple>::value> struct print_sizeof_elements { void operator()() const { std::printf("sizeof element %u is %zu\n", I, sizeof(typename std::tuple_element<I, Tuple>::type)); print_sizeof_elements<Tuple, I+1>()(); } }; template <class Tuple, unsigned I> struct print_sizeof_elements<Tuple, I, I> { void operator()() const { } }; template <class Tuple> void test(const char* name) { std::printf("---- %s -----\n", name); std::printf("is_empty<%s> = %d\n", name, std::is_empty<Tuple>::value); std::printf("tuple_size<%s> = %zu\n", name, std::tuple_size<Tuple>::value); print_sizeof_elements<Tuple>()(); std::printf("sizeof(%s) = %zu\n", name, sizeof(Tuple)); }; int f(void*) {std::printf("int f(void*)\n");} char* f(empty1*) {std::printf("char* f(empty1*)\n");} int main() { test<std::tuple<empty1> >("std::tuple<empty1>"); test<std::tuple<empty1, empty2> >("std::tuple<empty1, empty2>"); test<std::tuple<empty1, empty2, empty3> >("std::tuple<empty1, empty2, empty3>"); test<std::tuple<int, empty1> >("std::tuple<int, empty1>"); test<std::tuple<int, empty1, empty2> >("std::tuple<int, empty1, empty2>"); std::tuple<empty1> t; f(&t); } ---- std::tuple<empty1> ----- is_empty<std::tuple<empty1>> = 0 tuple_size<std::tuple<empty1>> = 1 sizeof element 0 is 1 sizeof(std::tuple<empty1>) = 1 ---- std::tuple<empty1, empty2> ----- is_empty<std::tuple<empty1, empty2>> = 0 tuple_size<std::tuple<empty1, empty2>> = 2 sizeof element 0 is 1 sizeof element 1 is 1 sizeof(std::tuple<empty1, empty2>) = 1 ---- std::tuple<empty1, empty2, empty3> ----- is_empty<std::tuple<empty1, empty2, empty3>> = 0 tuple_size<std::tuple<empty1, empty2, empty3>> = 3 sizeof element 0 is 1 sizeof element 1 is 1 sizeof element 2 is 1 sizeof(std::tuple<empty1, empty2, empty3>) = 1 ---- std::tuple<int, empty1> ----- is_empty<std::tuple<int, empty1>> = 0 tuple_size<std::tuple<int, empty1>> = 2 sizeof element 0 is 4 sizeof element 1 is 1 sizeof(std::tuple<int, empty1>) = 4 ---- std::tuple<int, empty1, empty2> ----- is_empty<std::tuple<int, empty1, empty2>> = 0 tuple_size<std::tuple<int, empty1, empty2>> = 3 sizeof element 0 is 4 sizeof element 1 is 1 sizeof element 2 is 1 sizeof(std::tuple<int, empty1, empty2>) = 4 int f(void*) Note that as you asserted previously, a tuple is never empty. However, for this tuple it is possible that the sum of the sizeof its elements is less than the sizeof the tuple. This means that (for example), one could write create a smart pointer like so: template <class T, class D = default_delete<T>> class unique_ptr { tuple<T*, D> data_; public: ... }; And not have to worry about whether D is a function pointer or empty policy class, and still get sizeof(unique_ptr) == sizeof(T*) when D / is/ an empty policy class. The implementation technique for tuple which is used is: template <class ..._Tp> class tuple { typedef __tuple_impl<typename __make_tuple_indices<sizeof... (_Tp)>::type, _Tp...> base; base base_; // a member, not a base class! public: ... }; where __tuple_impl subsequently derives from __tuple_leaf: template<size_t ..._Indx, class ..._Tp> struct __tuple_impl<__tuple_indices<_Indx...>, _Tp...> : public __tuple_leaf<_Indx, _Tp>... { ... }; and __tuple_leaf either derives from _Tp if _Tp is an empty class, else it contains _Tp as a member.
#include <tuple> #include <cstdio>
struct empty1 {}; struct empty2 {};
int main() { typedef std::tuple<empty1, empty2> T2; T2 t; std::printf("address of std::get<0>(t) is %p\n", &std::get<0>(t)); std::printf("address of std::get<1>(t) is %p\n", &std::get<1>(t)); }
address of std::get<0>(t) is 0xbffff65f address of std::get<1>(t) is 0xbffff65f
"Unacceptable" means of detecting this optimization would include a change in overload resolution due to associated namespaces depending on whether or not the space optimization was performed (which would likely be error prone).
It's not just a matter of what namespaces are associated, but also which functions in those namespaces will match.
<nod> Thanks for the good test case. Please come up with more! :-) -Howard

on Sun Dec 28 2008, Howard Hinnant <hinnant-AT-twcny.rr.com> wrote:
On Dec 28, 2008, at 4:55 PM, David Abrahams wrote:
on Sun Dec 28 2008, Howard Hinnant <hinnant-AT-twcny.rr.com> wrote:
And, BTW, this is all way too hard; at its limit it leads to building every generic class template out of tuples instead of ordinary members.
This conclusion would be an option for class template designers, not a requirement. And it would be easier than constantly reinventing EMO.
Yes, but, why should we have to worry about that?
<practical>
Because language supported EMO isn't a realistic possibility in C++0X but allowing (not requiring) EMO emulation in tuple is a realistic possibility (N2800 arguably already allows the latter).
</practical>
That being said, if you can get an easier to use solution into C++0X, then you're the man! :-)
"Even I" can't slip that feature in without anyone noticing at this late date.
Hmm,
int f(void*); char* f(some_empty_type*);
int x = f(tuple_containing_some_empty_type); // compilation error.
Excellent test case, thanks!
Ok, I've made a slight tweak to my example implementation, and to the test case. The tuple still does space compression, but not to the extent as shown previously. Still, I believe it equals the performance of boost::compressed_pair:
I think I'd choose the space compression over the overloading protection in this case. Apart from the fact that it's too darned easy, there's no particular reason anyone *has* to make unqualified calls on raw tuples, and it should be possible to explicitly qualify your way out of any such problems.
Note that as you asserted previously, a tuple is never empty. However, for this tuple it is possible that the sum of the sizeof its elements is less than the sizeof the tuple. This means that (for example), one could write create a smart pointer like so:
template <class T, class D = default_delete<T>> class unique_ptr { tuple<T*, D> data_; public: ... };
And not have to worry about whether D is a function pointer or empty policy class, and still get sizeof(unique_ptr) == sizeof(T*) when D / is/ an empty policy class.
If tuple is to be an all-purpose space compression utility, it should at least compose efficiently. You can always do the above when you want to present a higher-level interface that doesn't create unintended base classes.
It's not just a matter of what namespaces are associated, but also which functions in those namespaces will match.
<nod> Thanks for the good test case. Please come up with more! :-)
How about this one: struct empty1 {}; struct empty2 {}; struct empty3 {}; struct A { empty1 x1; empty2 x2; }; struct B { A x3; empty3 x4; }; assert(sizeof(A) == 1 && sizeof(B) == 1); ? ;-) -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams wrote:
on Sun Dec 28 2008, Howard Hinnant <hinnant-AT-twcny.rr.com> wrote:
Because language supported EMO isn't a realistic possibility in C++0X but allowing (not requiring) EMO emulation in tuple is a realistic possibility (N2800 arguably already allows the latter).
</practical>
That being said, if you can get an easier to use solution into C++0X, then you're the man! :-)
"Even I" can't slip that feature in without anyone noticing at this late date.
I think we should care about C compatibility. If C does not allow EMO, then I would suggest using struct/class keywords to make struct C compatible whereas "class" could implement EMO and other future features. Regards, Ion
participants (6)
-
Daniel James
-
David Abrahams
-
Howard Hinnant
-
Howard Hinnant
-
Ion Gaztañaga
-
Thorsten Ottosen